# Mean-field Chaos Diffusion Models

Sungwoo Park 1, Dongjun Kim 2, Ahmed M. Alaa 1,3

In this paper, we introduce a new class of score-based generative models (SGMs) designed to handle high-cardinality data distributions by leveraging concepts from mean-field theory. We present mean-field chaos diffusion models (MF-CDMs), which address the curse of dimensionality inherent in high-cardinality data by utilizing the propagation of chaos property of interacting particles. By treating high-cardinality data as a large stochastic system of interacting particles, we develop a novel score-matching method for infinite-dimensional chaotic particle systems and propose an approximation scheme that employs a subdivision strategy for efficient training. Our theoretical and empirical results demonstrate the scalability and effectiveness of MF-CDMs for managing large high-cardinality data structures, such as 3D point clouds.

1. Introduction

Generative models serve as a fundamental focus in machine learning, aiming to learn a high-dimensional probability density function. Among contenders such as normalizing flows (Rezende & Mohamed, 2015) and energy-based models (Zhao et al., 2016), score-based generative models (SGMs) have gained widespread recognition for their capabilities across various domains, such as images (Song et al., 2021b), time series (Tashiro et al., 2021; Park et al., 2023), graphs (Jo et al., 2022), and point clouds (Zeng et al., 2022). The key idea of SGMs is to conceptualize a combination of forward and reverse diffusion processes as generative models. In the forward dynamics, the data density is progressively corrupted by following a Markov probability trajectory, eventually transforming into a Gaussian density. Subsequently, denoising score networks sequentially remove noise in the reverse dynamics, aiming to restore the original state.

1 Department of Electrical Engineering and Computer Sciences, UC Berkeley; 2 Department of Computer Science, Stanford; 3 UCSF. Correspondence to: Ahmed M. Alaa. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Despite the remarkable empirical successes, recent theoretical studies (De Bortoli, 2022; Chen et al., 2023) have highlighted limitations on the scalability of SGMs when applied to high-dimensional and high-cardinality data structures. To tackle this challenge, a series of studies (Lim et al., 2023; Kerrigan et al., 2023; Dutordoir et al., 2023; Hagemann et al., 2023) broadens the scope of diffusion models, introducing new methods for data representation in an infinite-dimensional function space. These macroscopic approaches fully mitigate dimensionality issues in diffusion modeling; however, they make strong assumptions on the function-valued representations of the input data, which limits their applicability to practical settings such as modeling 3D point clouds.

Figure 1. 3D representations of (ν^N, µ).

This paper introduces another strategy: managing high-cardinality data through the lens of mean-field theory (MFT) and restructuring existing SGMs. MFT has long been recognized as a powerful analytical tool for large-scale particle systems in multiple disciplines, such as statistical physics (Kadanoff, 2009), biology (Koehl & Delarue, 1994), and macroeconomics (Lachapelle et al., 2010).
Among the diverse concepts developed in MFT, our interest specifically focuses on the property called propagation of chaos (PoC) (Sznitman, 1991a; Gottlieb, 1998), which describes the statistical independence and symmetry that emerge in proximity to the mean-field limit of a large particle system. While the direct integration of PoC into conventional SGMs poses a considerable challenge due to the infinite dimensionality, our systematic approach begins by defining denoising models with interacting N-particle diffusion dynamics (i.e., ν^N, the black dots in Fig. 1). We then explore ways to approximate its mean-field limit (i.e., N → ∞, µ, the organ surface in Fig. 1), which can possess extensive representational capabilities. This work is centered on two key contributions:

Mean-field Score Matching. We introduce a variational framework on Wasserstein space by applying the Itô-Wentzell-Lions formula and derive a mean-field score matching (MF-SM) objective that generalizes conventional SGMs to mean-field particle systems. We provide a mean-field analysis of the asymptotic behavior of the proposed framework to elucidate its effectiveness in learning large-cardinality data distributions.

Subdivision for Efficiency. To ease the computational complexity, we introduce a subdivision of chaotic entropy, which establishes piece-wise discontinuous gradient flows and efficiently approximates the true discrepancy in a divide-and-conquer manner.

2. Mean-field Chaos Diffusion Models

2.1. Score-based Generative Models

Before presenting our proposed method, we provide a brief background on SGMs. For notations not discussed here, refer to the detailed descriptions in the Appendix. Let us consider a probability space (Y, F_t, P) and two respective diffusion paths for the variables t and u := T − t:

  dX_u = f_u(X_u) du + σ_u dB_u,   X_u, X_t ∈ Y,   (1)
  dX_t = [f_t(X_t) − σ_t² ∇log ζ_t(X_t)] dt + σ_t dB_t.   (2)

A pair of Markovian probability measures (ζ_s, ν_t) corresponding to the system of the above SDEs, called forward-reverse SDEs (i.e., FR-SDEs), describes the noising and denoising processes, respectively. A primitive form of the standard objective of SGMs is to minimize the discrepancy (e.g., relative entropy, H) between the data generative model ν_T and the target data ζ_0 at the terminal state of the reverse dynamics, t = T:

  (P0)  min_{ν_[0,T]} H[ν_T | ζ_0],   (3)

where ν_[0,T] denotes a path measure on the interval [0, T]. As the direct calculation is intractable, Song et al. (2021b) have shown that optimizing an alternative tractable formulation, known as the score-matching objective, can minimize the discrepancy between ν_T and ζ_0. The goal of SGMs is then to train a score network s_θ to approximate the score function (i.e., ∇log ζ_t):

  J_SM(θ) := E_{t, X_t}[ ‖s_θ(t, X_t) − ∇log ζ_t(X_t)‖² ].   (4)

Given the basic machinery defined above, one question naturally arises considering the goal outlined in the introduction:

Q1. How can we restructure existing diffusion models to preserve robust performance when dim(Y) → ∞?

Throughout the paper, we address this fundamental question using principles of MFT. As a first step, we begin by dissecting a decomposition of generic FR-SDEs defined on Y (e.g., R^{Nd}) into a mean-field interacting N-particle system on the space X (e.g., R^d).
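To make the conventional objective in Eq. 4 concrete before generalizing it, the following is a minimal PyTorch sketch of denoising score matching under a VP-type forward process. It is an illustration only; the schedule constants (β_min, β_max, T = 1) are placeholder assumptions, not the authors' implementation.

```python
import torch

def vp_perturb(x0, t, beta_min=0.1, beta_max=20.0):
    """Closed-form VP forward marginal: x_t = m(t) * x0 + s(t) * eps."""
    ibeta = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2  # int_0^t beta(s) ds
    mean_coef = torch.exp(-0.5 * ibeta)[:, None]
    std = torch.sqrt(1.0 - torch.exp(-ibeta))[:, None]
    eps = torch.randn_like(x0)
    xt = mean_coef * x0 + std * eps
    return xt, -eps / std            # second output: grad log zeta_t(x_t | x0)

def j_sm(score_net, x0):
    """Monte Carlo estimate of the score-matching objective in Eq. 4."""
    t = torch.rand(x0.shape[0]) * (1.0 - 1e-4) + 1e-4   # t ~ U(0, 1]
    xt, target = vp_perturb(x0, t)
    return ((score_net(t, xt) - target) ** 2).sum(-1).mean()
```

The conditional score −ε/s(t) is the standard denoising target; averaging over (t, X_t) recovers Eq. 4 up to an additive constant.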
2.2. Mean-field Stochastic Differential Equations

Our new definition of SDEs, called mean-field stochastic differential equations (MF-SDEs), takes a microscopic perspective on modeling diffusion processes:

Definition 2.1. (Mean-field SDEs). For the atomless Polish space X, let {B_t^{i,N}}_{i≤N} be a set of independent Wiener processes on the probability space (X, F_t, P). Then, we define the N-particle system as follows:

  dX_u^{i,N} = f_u(X_u^{i,N}) du + σ_u dB_u^{i,N},   X_u, X_t ∈ X,   (5)
  dX_t^{i,N} = [f_t(X_t^{i,N}) − σ_t² ∇log ζ_t(X_t^{i,N})] dt + σ_t dB_t^{i,N},   (6)

where the initial states of each dynamics are i.i.d. standard Gaussian random vectors, i.e., X_0^{i,N} ~ N[I_d].

The proposed dynamics explicitly delineate the N individual rules of each particle, modeling detailed inter-associations between particles. Upon the structure of MF-SDEs in Definition 2.1, the N-particle system is endowed with a weak probabilistic structure ϱ_t^N in the Nd-dimensional coordinate system x^N = (x_1, …, x_N) ∈ X^N and admits a joint density defined as follows:

  X_t^N ~ ν_t^N := Law(X_t^{1,N}, …, X_t^{N,N}) = ϱ_t^N dx^N,   (7)
  ϱ_t^{M,N}(x^M) = ∫_{X^{N−M}} ϱ_t^N(x^N) dx_{M+1} ⋯ dx_N.   (8)

Furthermore, the set of N particles in the proposed system is exchangeable, satisfying the following symmetry property for any given permutation τ ∈ S_N:

  ϱ_t^N(x_1, …, x_N) = ϱ_t^N(x_{τ(1)}, …, x_{τ(N)}).   (9)

Empirical Measures as Data. Compared to the data description ν_t of the macroscopic approach in FR-SDEs, our framework interprets a single instance (e.g., a point cloud) as an empirical random measure ν_t^N, in which particles (e.g., points) are represented as marginal random variables X_t^{i,N}:

  P_2(Y) ∋ ν_T  (FR-SDEs)   ⟷   ν_T^N := ϱ^N dx^N ∈ P_2(X^N)  (MF-SDEs).   (10)

It is clear from the context that the term cardinality stands for the degree of N, and the proposed interpretation features two key points. First, our method simply augments the particle count N when handling high-resolution data instances, keeping the dimensionality d = dim(X) fixed. This modeling can explicitly expose the effect of increasing cardinality in the analysis, as opposed to FR-SDEs, which adjust the dimensionality of the ambient space Nd = dim(Y) without comprehensive details. Second, the data representations ν_T^N naturally inherit the permutation invariance that is essential for efficient learning of unstructured data (e.g., sets, point clouds) (Niu et al., 2020; Kim et al., 2021), as it postulates exchangeability between the particles (e.g., elements, points), as depicted in Eq. 9. Throughout, this paper focuses on unstructured data generation to fully leverage this symmetry property.

2.3. Propagation of Chaos and Chaotic Entropy

While we have established a system of individual particles to provide flexible representations, our next step is to adjust the original problem of entropy estimation in (P0) to the N-particle system. To do so, we consider the N-particle relative entropy as a tool for comparing the discrepancy between target and generative representations:

  H(ν_T^N | ζ_0^⊗N) = (1/N) ∫ log( ϱ_T^N / ζ_0^⊗N ) ϱ_T^N dx^N.   (11)

As the forward diffusion process is defined as a time-varying Ornstein-Uhlenbeck process (e.g., the VP-SDE (Song et al., 2021c)), its density for N particles can be represented as a product of Gaussian measures ζ_t^⊗N defined as:

  dζ_t^⊗N(x^N) := ∏_{j=1}^{N} N( x_j; m_ζ(t), σ_ζ²(t) I_d ) dx_j,   (12)

where the mean vector m_ζ(t) and covariance matrix σ_ζ²(t) I_d of the forward noising Gaussian process ζ_t are determined by the selection of the model parameters.
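Because the forward dynamics in Eq. 5 act on each particle independently, noising a point cloud amounts to running N parallel copies of a one-particle SDE whose terminal law is the product Gaussian ζ_T^⊗N of Eq. 12. A minimal NumPy sketch, assuming a linear β_t schedule (constants illustrative, not the paper's):

```python
import numpy as np

def forward_mf_sde(cloud, n_steps=1000, beta_min=0.1, beta_max=20.0, seed=0):
    """Euler-Maruyama simulation of the forward MF-SDE (Eq. 5) for one
    point-cloud instance of shape (N, d); every row is one particle X^{i,N}."""
    rng = np.random.default_rng(seed)
    x = np.array(cloud, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = (k + 0.5) * dt
        beta = beta_min + (beta_max - beta_min) * t   # beta_t schedule
        x += -0.5 * beta * x * dt                     # VP drift, applied per particle
        x += np.sqrt(beta * dt) * rng.standard_normal(x.shape)
    return x   # approximately N(0, I_d) per particle, i.e., zeta_T^(tensor N) of Eq. 12

# cloud = np.random.rand(2048, 3); noisy = forward_mf_sde(cloud)
```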
Propagation of Chaos. Now, we address the question raised in Sec. 2.1 by bringing attention to a concept in MFT known as propagation of chaos, suggested by Kac (Kac, 1956).

Definition 2.2. (Kac's Chaos). We say that the sequence of marginal measures {ν_t^{M,N}}_{M≤N} is µ_t-chaotic if the following holds a.s. [t] for all continuous and bounded test functions ϕ, in the weak sense:

  ⟨ν_t^{M,N}, ϕ⟩ → ⟨µ_t^⊗M, ϕ⟩  as N → ∞,   1 ≤ M ≤ N.   (13)

The µ_t-chaotic measures {ν_t^{M,N}}_{M≤N} begin to behave as if they are statistically indistinguishable from their mean-field limit µ_t, in the weak sense, for infinitely large cardinality (i.e., N → ∞). With the fact¹ that our N-particle system already enjoys chaoticity, this work exploits the property presented in Eq. 13 to alleviate the analytic and computational complexities of generative modeling with infinitely many particles: a finite number (e.g., M) of chaotic SDEs can be utilized for training and sampling high-cardinality data instances (e.g., µ_T) with only marginal errors. We delve into the detailed theoretical rationale in Sec. 4.1.

¹ Please refer to Proposition A.3 for details.

Chaotic Entropy. To formalize the problem by leveraging Kac's chaos, we articulate our objective as the minimization of chaotic entropy (Jabin & Wang, 2017; Hauray & Mischler, 2014), which entails the convergence property H(ν_T^N | ζ_0^⊗N) → H(µ_T | ζ_0) as N → ∞. In particular, we propose a new challenging problem: extrapolating the macroscopic modeling in problem (P0) to its microscopic counterpart for infinitely many exchangeable particles:

  (P1)  min_{µ_[0,T]} H(µ_T | ζ_0) = min_{ν_[0,T]} lim_{N→∞} H(ν_T^N | ζ_0^⊗N).   (14)

The equality holds as the PoC property guarantees the weak convergence ν_T^N ⇀ µ_T. To highlight our approach to the chaotic entropy minimization problem, we have designated our methodology as mean-field chaos diffusion models (MF-CDMs). The latter portion of this paper is dedicated to tackling both the theoretical and numerical issues associated with solving problem (P1), by progressively generalizing the main concepts in SGMs.

Table 1. The list of key concepts in SGMs for N → ∞, mapping each problem formulation to its section: ν_T with VP-SDE, (P0) and our (P1) (Sec. 2); J_MF, our (P2) (Sec. 3); subdivided H_T(ν_T), our (P3) (Sec. 4); supporting derivations in the Appendix.

Table 1 outlines how the redefined problems in subsequent sections broaden the application of SGMs under the mean-field assumption, featuring the following two key aspects.

(1) SGMs with Chaotic Entropy. Due to the intrinsic symmetry in Eq. 9, a straightforward derivation of a score-based objective with chaotic relative entropy is non-trivial. Section 3 presents the concept of probability measure flows and proposes the mean-field score matching objective (i.e., J_MF) that offers a tractable evaluation of chaotic entropy.

(2) Handling Large Cardinality. Section 4 introduces a novel numerical approximation scheme termed subdivision of entropy, designed to simplify the complex problem presented in (P1) into manageable sub-problems in (P3), efficiently overcoming computational complexity.
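As an aside, the decorrelation that Eq. 13 formalizes can be observed numerically even for a toy McKean-Vlasov system. The sketch below (illustrative only, not from the paper) evolves particles attracted to their empirical mean and estimates the correlation between tagged particle pairs, which decays as N grows:

```python
import numpy as np

def pair_correlation(N, n_steps=500, dt=0.01, seed=0):
    """Toy propagation-of-chaos check: particles drifting toward the empirical
    mean become asymptotically independent as N grows (cf. Eq. 13)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(N)
    for _ in range(n_steps):
        x += -(x - x.mean()) * dt + np.sqrt(dt) * rng.standard_normal(N)
    pairs = x[: 2 * (N // 2)].reshape(-1, 2)        # disjoint tagged pairs
    return np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]

# for N in (10, 100, 1000, 10000): print(N, pair_correlation(N))
```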
3. Training MF-CDMs with Chaotic Entropy

Analysis based on the coordinate system in Eq. 7 rapidly becomes impractical with varying N, owing to the curse of dimensionality. To circumvent this issue, we explore an equivalent representation of the N-particle system in the space of probability measures: the Wasserstein space P_2(X), a domain in which both ν_t^N and µ_t inherently lie.

3.1. Denoising Wasserstein Gradient Flows

We denote by P_2 the Wasserstein space consisting of absolutely continuous measures with bounded second moments, i.e., P_2(X) := {ν; dν = ϱ dx, ∫ d_X²(x, x_0) dν(x) < ∞}, and the metric space (P_2(X), W_2) can be equipped with the 2-Wasserstein distance W_2 (Santambrogio, 2017). This geometric realization allows a functional E : P_2 → R to flow along the gradient direction of energy reduction:

  ∇_{P_2}E(ϱ) = −∇·( ϱ ∇ (δE/δϱ)(x) ),

where the first variation δE/δϱ (Santambrogio, 2015) is defined by ∫ (δE/δϱ)(x) ϕ(x) dx = lim_{ε→0} (d/dε) E(ϱ + εϕ) for all ϕ ∈ C_0^∞(X) satisfying ∫ ϕ = 0. To reformulate the MF-SDE in a distributional sense, we adopt the concept of Wasserstein gradient flows (WGFs) in Eq. 15 corresponding to the denoising N-particle MF-SDEs in Eq. 6:

  ∂_t ν_t^N = −∇_{P_2} E[ν_t^N],   t ∈ [0, T],   (15)
  E[ν_t^N] = ∫ [ V^N(t, x^N, ν_t^N) + (σ_t²/2) log ϱ_t^N ] dν_t^N.   (16)

We specify the functional V^N by extending the concept of the variance-preserving SDE (Song et al., 2021c) to the proposed mean-field system. Notably, we consider potential functions V^N : [0, T] × X^N → R for N-particle configurations, termed the mean-field VP-SDE (MF VP-SDE), which can be characterized by

  V^N(t, x^N) = f_t^N(x^N) + σ_t² log ζ_{T−t}^⊗N(x^N),   (17)

where we define the drift potential as f_t^N = β_t ‖x^N‖²_E / 4, and the volatility constant is simply set to β_t = σ_t² for the pre-defined hyperparameter β_t.

  ∂_t ϱ_t^N = L_t^{N,*} ϱ_t^N  (MF-SDEs)   ⟺ (Prop. A.3)   ∂_t ν_t^N = −∇_{P_2}E[ν_t^N]  (dWGFs).   (18)

Denoising WGFs. Eq. 18 shows that the Liouville equation associated with the MF-SDE on the left-hand side can be identified with the proposed WGF on the right-hand side. This implies that our WGF can substitute for the MF-SDE as a denoising scheme for generative results. From now on, we utilize the denoising WGF (dWGF) as our primary tool and derive variational equations in the next section.

3.2. Mean-field Score Matching

This section examines a variational equation associated with chaotic entropy. The core idea is to capture infinitesimal changes in the Wasserstein metric by applying the Itô-Wentzell-Lions formula (Dos Reis & Platonov, 2023; Guo et al., 2023) to our dWGFs and to derive tractable upper bounds.

Theorem 3.1. (Wasserstein Variational Equations) Let M := M(ζ_0) < ∞ be the squared second moment of the target data instance ζ_0. We refer to the N-particle relative entropy as

  H_t^N(ν_t^N) := H(ν_t^N | ζ_{T−t}^⊗N).   (19)

Then, for arbitrary temporal variables 0 ≤ s < t ≤ T and some numerical constants C_0 ∈ O(√d + M²), C_1 ∈ O(T), we have the variational equation

  H_t^N(ν_t^N) ≤ H_s^N(ν_s^N) + C_0 ∫_s^t O( E‖∇_{P_2}H_r^N‖²_E ) dr + C_1 ∫_s^t O( E‖∇_x ∇_{P_2}H_r^N‖²_F ) dr.   (20)

As shown in Theorem 3.1, the geometric deviation in the Wasserstein space affects the norm of the gradient ∇_{P_2}H_r^N on the right-hand side. This indicates that our variational equation exploits geometric information around the law of particles induced by the Wasserstein gradient (i.e., ∇_{P_2}H_t). This approach is opposed to conventional methodologies (Song et al., 2021b; Dockhorn et al., 2022) that employ variational equations concerning the temporal derivative (i.e., ∂_t H_t). Section A.6 provides an in-depth discussion of the dissimilarity between these two approaches. As a comprehensive restatement, we refine the right-hand side of Eq. 20 as a Sobolev norm of score functions.

Corollary 3.2. Let ‖·‖_W be the norm defined on the Sobolev space W^{1,2}(X^N, ν_t^N), and let G_t = ∇log ϱ_t^N − ∇log ζ_{T−t}^⊗N. Then, the N-particle entropy can be upper-bounded as follows:

  H_T^N(ν_T^N) ≤ M ∫_0^T ‖G_t‖²_W dt.   (21)

Recall that the Sobolev norm of a vector-valued function h ∈ W^{1,2} is defined as ‖h‖²_W = E[ ‖h‖²_E + ‖∇h‖²_F ]. Corollary 3.2 asserts that the minimization of the N-particle relative entropy is achievable when the Sobolev norm on the right-hand side tends to zero.
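The Sobolev norm in Corollary 3.2 combines an L² term with a Jacobian (Frobenius) term; both admit straightforward Monte Carlo estimates. Below is a hedged PyTorch sketch of such an estimator for an arbitrary vector field, with the Jacobian term assembled from per-output gradients via automatic differentiation:

```python
import torch

def sobolev_norm_sq(h, x):
    """Monte Carlo estimate of ||h||_W^2 = E[||h||_E^2 + ||grad h||_F^2],
    the norm appearing in Corollary 3.2; x holds samples from nu_t^N."""
    x = x.clone().requires_grad_(True)
    hx = h(x)                                 # (B, d) vector-field values
    l2 = (hx ** 2).sum(-1)                    # ||h(x)||_E^2
    frob = torch.zeros_like(l2)
    for i in range(hx.shape[-1]):             # row-by-row Jacobian Frobenius norm
        g, = torch.autograd.grad(hx[:, i].sum(), x, retain_graph=True)
        frob = frob + (g ** 2).sum(-1)
    return (l2 + frob).mean()

# example: sobolev_norm_sq(torch.sin, torch.randn(4096, 3))
```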
Motivated by recent studies (Dockhorn et al., 2022; Song et al., 2021b), we leverage the inequality in Eq. 21 to derive our mean-field score matching (MF-SM) objective by substituting the score function ∇log ϱ_t^N with score networks s_θ.

Definition 3.3. (Mean-field Score Matching) Let us define score networks s_θ : Θ × [0, T] × X^N × P_2 → X^N satisfying mild regularity conditions. Then, we propose the score-matching objective

  J_MF^N(θ, ν^N_[0,T]) := E_{t∼p(t)}[ ‖s_θ(t, X_t^N, ν_t^N) − ∇log ζ_{T−t}^⊗N(X_t^N)‖² ],   (22)

where p(t) is the uniform density on [0, T], and we specify the denoising score networks s_θ as

  s_θ(t, x^N, ν_t^N) = A_θ(t, x^N) + B_θ[ν_t^N](x^N).   (23)

Design of Mean-field Interaction. In constructing s_θ, we incorporate mean-field interactions to encapsulate the information of external forces exerted by neighboring particles. To be more specific, we propose a local convolution-based interaction model inspired by grouping operations (Qi et al., 2017a;b; Wang et al., 2019a) in architectures for 3D point clouds:

  B_θ[ν_t^N](x^N) := [B_θ ⋆_B ν_t^N](x^N).   (24)

Here, ⋆_B denotes a truncated convolution operation with respect to the Euclidean ball B_R of radius R. This modeling signifies that interaction with particles outside the convolution domain is excluded in probability. One may intuitively view this operation as an infinite-dimensional positional encoding, which encapsulates information about geometrically proximate particles. Section A.3 elaborates on the design of the two functions A_θ and B_θ[ν_t^N].

Variational Equation for µ_T. From the result obtained in Corollary 3.2, we extend the concept of variational equations to the mean-field limit µ_T in the subsequent result:

Proposition 3.4. There exist numerical constants C_2, C_3, C_4 > 0 such that the N-particle relative entropy for infinite cardinality N → ∞ can be bounded:

  H_T^∞(µ_T)  [(P1)]  ≤  (1/√(Nd)) J_MF^N(θ, ν^N_[0,T]) + σ_ζ^{−2}(T) · O( C_2/N + C_3 N^{−1/2} + C_4 N^{−3/2} )  [cardinality errors: E(N)].   (25)

Proposition 3.4 shows that the minimization problem (P1) on the left-hand side can be upper-bounded by MF-SM and the cardinality errors E(N) on the right-hand side. It is worth noting that our variational framework enhances conventional score matching, particularly for the representation of data with high cardinality. The coefficient 1/√(Nd) induces robust score estimation and renders the proposed framework robust to large cardinality N, a property not present in conventional SGMs. As a consequence of this result, the chaotic entropy minimization problem (P1) can be restructured to involve MF-SM:

  (P2)  min_θ lim_{N→∞} J_MF^N(θ, ν^N_[0,T]).   (26)

The restructured objective reveals that the score network s_θ is trained to restore the vector field f_t − β_t s_θ ≈ ∇V and thereby reconstruct the target instance µ_T via sampling dWGFs. Unfortunately, optimizing (P2) may confront intractability at large cardinality, as our score network s_θ takes inputs defined on the Nd-dimensional space (e.g., X_t^N ∈ X^N).
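To ground Definition 3.3 and the interaction design of Eq. 24, the following PyTorch sketch implements a score model of the form s_θ = A_θ + B_θ[ν] for a single cloud, with the interaction realized as a radius-truncated convolution over the empirical measure. All layer sizes, the radius, and the network shapes are illustrative assumptions, not the paper's architecture:

```python
import torch

class MeanFieldScore(torch.nn.Module):
    """s_theta(t, x^N, nu^N) = A_theta(t, x^N) + [B_theta * nu^R](x^N), cf. Eqs. 23-24."""
    def __init__(self, d=3, hidden=128, radius=0.2):
        super().__init__()
        self.radius = radius
        self.A = torch.nn.Sequential(torch.nn.Linear(d + 1, hidden),
                                     torch.nn.SiLU(), torch.nn.Linear(hidden, d))
        self.B = torch.nn.Sequential(torch.nn.Linear(d, hidden),
                                     torch.nn.SiLU(), torch.nn.Linear(hidden, d))

    def forward(self, t, x):                           # x: (N, d), one instance
        a = self.A(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))
        diff = x[:, None, :] - x[None, :, :]           # pairwise x_i - y_j
        mask = (diff.norm(dim=-1) < self.radius).float()   # chi_{B_R} truncation
        w = mask / mask.sum(-1, keepdim=True)          # renormalized truncated measure
        return a + torch.einsum('ij,ijd->id', w, self.B(diff))

def j_mf(score_net, x_t, t, target_score):
    """Single-sample MF-SM estimate (Eq. 22); target_score = grad log zeta_{T-t}."""
    return ((score_net(t, x_t) - target_score) ** 2).sum(-1).mean()

# net = MeanFieldScore(); loss = j_mf(net, x_t, torch.rand(1), target)
```

Since A_θ acts particle-wise and the interaction depends on x^N only through its empirical measure, the model is permutation-equivariant and reducible in the sense of Section A.3.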
4. Subdivision of Chaotic Entropy

Our next step is to design an approximation framework that transforms the score-matching objective into computationally tractable variants. Let N = {N_k; N_K = N} be a set of non-decreasing cardinalities and T = {t_k; t_K = T} a partition of the interval [0, T], where k ∈ {0, …, K}. Then, we subdivide Eq. 25 into K sub-sequences to obtain alternative, computable upper bounds:

Proposition 4.1. (Subdivision) Under the assumption of reducibility² and N_{k+1} = bN_k for some b > 0, the chaotic entropy can be split into K sub-problems:

  H_T^∞(µ_T) ≤ lim_{K→∞} Σ_k [ σ_ζ^{−2}(T) E(N_{k+1})  (cf. Eq. 25)  +  J_MF(N_k, θ, ν^{N_k}_{[t_k, t_{k+1}]}) ]   (subdivision errors w.r.t. J_MF^N(θ, ν^N_[0,T])).   (27)

We observe that the chaotic entropy can be approximated by aggregating K sub-problems of MF-SM, each corresponding to a unique cardinality N_k and a specific interval [t_k, t_{k+1}]. This implies that a divide-and-conquer strategy can effectively address problem (P2) by treating the sub-problems J_MF(N_k, ·, ·) individually. In the decomposed upper bound in Eq. 27, the particle branching ratio b moderates the impact of the large-cardinality sub-problems on the score estimation, leading to improved robustness against N. Our final objective function in (P3) reflects the subdivision of chaotic entropy; the summation is taken over only finitely many K sub-problems, leveraging the canceling effect gained from the branching ratio:

  (P3)  min_θ Σ_k (1/b^k) J_MF(N_k, θ, ν^{N_k}_{[t_k, t_{k+1}]}).   (28)

Section A.9.1 contains the detailed algorithmic procedure for training score networks s_θ with the objective (P3).

² See Section A.3 for the detailed definition and discussion.

Particle Branching Function Ψ_θ. The discontinuity of the K piece-wise dWGFs {ν_t^{N_k}, t ∈ [t_k, t_{k+1}]} associated with the individual sub-problems makes sampling schemes intractable, necessitating a way to glue the pieces together and prevent abrupt changes in distribution. As a remedy, we introduce the particle branching function Ψ_θ to connect the end of the previous segment of flows (e.g., ν_{t_k}^{N_k}) with the start of the next flows (e.g., ν_{t_k}^{N_{k+1}}). In a distributional sense, this operation can be represented as a product with a push-forward measure:

  ( Id^{⊗(b−1)N_k} ⊗ Ψ_θ^{⊗N_k} )_# ν_{t_k}^{N_k} =: ν̂_{t_k}^{bN_k} = ν_{t_k}^{bN_k},   N_{k+1} = bN_k,   (29)

where (·)_# stands for the push-forward operator and Id is the identity operator. As a consequence of particle branching, the intermediate flow of probability measures, presented as a solution to the dWGFs for N_k particles (i.e., ν_{t_k}^{N_k}), is augmented with another (b−1)N_k particles, yielding a new flow with enhanced cardinality N_{k+1} = bN_k. Proposition A.8 reveals the explicit form of the optimal particle branching.

Sampling Denoising Dynamics. After training the denoising MF-SDEs/WGFs with the triplet (N, T, b), we sample the chaotic dynamics by progressively increasing the cardinality in the middle of the denoising process. The procedure begins by taking initial Gaussian noise distributed as ζ_T^⊗N_0 and propagating particles via the Euler scheme with the score network s_θ until reaching the next branching step at T − t_1, at which point the particle count branches from N_0 to bN_0 = N_1. Iterating this, we achieve the desired number of chaotic particles. Figure 2 provides an illustrative overview of the sampling procedure with particle branching along the denoising WGFs. Section A.9.2 contains the detailed algorithmic procedure.

Figure 2. Illustrative overview of denoising MF-SDEs/WGFs. MF-SDEs governing M particles evolve with respect to the vector fields f_t^M + s_θ^M over the interval [t_{k−1}, t_k], interacting with proximate particles lying in B_R. The illustration depicts the scenario in which the particle branching function Ψ_θ transforms the density of M = 3 particles into an expanded density of N = 6 particles (e.g., branching ratio b = 2) after the time t_k, resulting in the joint density ϱ_t^N.
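The sampling loop just described can be sketched compactly. In the NumPy illustration below, the branching map Ψ_θ is replaced by a hypothetical duplicate-and-jitter stand-in (the paper's Proposition A.8 gives the optimal form), and the schedule constants are illustrative:

```python
import numpy as np

def sample_with_branching(score_fn, n0=1250, d=3, b=2,
                          branch_steps=(50, 100, 150, 200), n_steps=250,
                          beta=lambda t: 0.1 + 19.9 * t, seed=0):
    """Euler steps on the reverse dynamics with particle branching: the cloud
    starts as n0 Gaussian particles and is enlarged by a factor b at each
    branching step, reaching n0 * b**len(branch_steps) particles."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n0, d))                    # zeta_T: Gaussian start
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * dt                                # reverse time T -> 0
        bta = beta(t)
        drift = -0.5 * bta * x - bta * score_fn(t, x)   # f - sigma^2 * s (cf. Eq. 2)
        x += -drift * dt + np.sqrt(bta * dt) * rng.standard_normal(x.shape)
        if k in branch_steps:                           # particle branching step
            jitter = 0.01 * rng.standard_normal(((b - 1) * x.shape[0], d))
            x = np.concatenate([x, np.repeat(x, b - 1, axis=0) + jitter], axis=0)
    return x
```

With the defaults above, 1,250 particles doubled four times yield 2.0e+4 particles, matching the schedule reported for Figure 4.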
4.1. Mean-field Analysis of MF-CDMs

As this work primarily capitalizes on the mean-field property, this section explores the theoretical implications and benefits of incorporating principles of PoC into the framework of SGMs. The subsequent theoretical findings provide insights that address the question (i.e., Q1) posed earlier in Section 2.1.

Theorem 4.2. (informal) Let f := f(κ) > 0 be a numerical constant depending on the log-Sobolev³ constant κ with respect to the proposed dWGFs. Given mild regularity conditions on s_θ, we have the short-tailed concentration probability bound

  P[ H(ν_t^{M,N} | µ_t^⊗M) ≥ ε ]  ≤  O(ε^{−d}) · O( exp( −M f(κ) ε² + M f(κ) h(R) ) )   as M ≪ N → ∞,   (30)

for a numerical constant h(R) depending on the radius R > 0 of the convolution truncation defined in Section A.3.

³ Please refer to Sec. A.8 for the detailed definition.

Concentration of Chaotic Entropy. The short-tailed concentration of chaotic entropy in Eq. 30 confirms that a relatively small number of particles M suffices to reconstruct the mean-field surface µ_t even when the total cardinality diverges to infinity (N → ∞). In addition, it demonstrates that the infinite-cardinality constraint (i.e., lim_{N→∞}) specified in (P2) can be circumvented by the subdivision of chaotic entropy in (P3), as score estimation errors are tolerable in practice with a finite number of sub-problems |N| < ∞ and particle counts {N_k}_{k≤K}.

Theorem 4.3. (informal) Let us define F_t := ‖G_t‖²_E + ‖∇G_t‖²_F, so that E_{ν^N_[0,T]} E_{t∼p(t)} F_t(X_t^N) = J_MF^N(θ, ν^N_[0,T]). Then there exist constants C_5, C_6 > 0 and q ∈ N_+, q > 4, such that

  P( |E_t F_t(X_t^N) − J_MF(N = 1, θ, µ_[0,T])| ≥ ε ) ≤ C_5 ε^{−2} ( N^{−1/2} + C_6 N^{−(q−4)/(2q)} ).   (31)

Concentration of MF-SM. Our second observation, Eq. 31, elucidates that MF-SM naturally concentrates on its mean-field limit µ_t with asymptotically stable probability upper bounds. This shows the remarkable robustness of our objective function when N → ∞, an extreme condition to which conventional score-matching objectives J_SM in Eq. 4 are highly vulnerable because of the absence of guaranteed stability, as contrasted by Eq. 31.
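The qualitative content of Theorem 4.2, that moderate M already tracks the mean-field limit, is easy to probe numerically. The toy NumPy sketch below (illustrative, not the paper's experiment) measures the exact one-dimensional W_2 between an M-particle empirical measure and a dense sample standing in for the limit µ_t; the discrepancy decays as M grows:

```python
import numpy as np

def w2_1d(a, b):
    """Exact 2-Wasserstein distance between equal-size 1-d empirical measures."""
    return np.sqrt(np.mean((np.sort(a) - np.sort(b)) ** 2))

rng = np.random.default_rng(0)
ref = rng.standard_normal(200_000)          # dense proxy for the limit mu_t
for M in (100, 1_000, 10_000, 100_000):
    est = np.mean([w2_1d(rng.standard_normal(M), rng.choice(ref, M))
                   for _ in range(5)])
    print(f"M={M:>6d}  W2 ~ {est:.4f}")
```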
5. Related Works

Mean-field Dynamics in Generative Models. Modeling score-based generative models via population dynamics (Koshizuka & Sato, 2023; Chen et al., 2021; Shi et al., 2023) has gained attention recently. Among these, mean-field dynamics through particle interaction was explored in (Liu et al., 2022), where the Schrödinger bridge was integrated to handle mean-field games for approximating large-population data distributions. (Lu et al., 2023) derived score transportation directly from the mean-field Fokker-Planck equation, where particle interaction was derived for score-based learning. While these works primarily focus on an analytic perspective and assume an infinite-dimensional setting associated with high-dimensional PDEs, our method adopts PoC as a limit algorithm to reduce the potential complexity encountered in dealing with PDEs.

Diffusion Models for Unstructured Data. Recent studies have demonstrated the exceptional performance of diffusion dynamics in point-cloud synthesis (Luo & Hu, 2021; Zhou et al., 2021; Zeng et al., 2022; Tyszkiewicz et al., 2023), with a focus on architectural designs that impose structural constraints on unstructured data formats. Another stream of research (Hoogeboom et al., 2022; Xu et al., 2023) considered global geometric constraints to capitalize on the equivariance property in the modeling of point clouds. Despite their superior performance, the aforementioned methods face a limitation in the maximum capacity of cardinality owing to rigid structural constraints on localization. In contrast, our method employs a flexible localization using mean-field interaction, requiring only a weak probabilistic structure over the particle set while consistently assuring robust performance.

6. Empirical Study

This section provides a numerical validation of the efficacy of integrating MFT into the SGM framework, particularly in extreme scenarios of large cardinality, where previous works struggle to achieve robust performance.

Benchmarks. We compare our MF-CDMs with well-recognized score-based generative models, VP-SDE (Song et al., 2021c) and CLD (Dockhorn et al., 2022), and with diffusion models for 3D point clouds: DPM (Luo & Hu, 2021), LION (Zeng et al., 2022), and PVD (Zhou et al., 2021). For information on the implementation of the score networks, hyperparameters, and dataset statistics with pre-processing, please refer to Sec. A.9.

6.1. Synthetic Dataset: Robustness Analysis

The first experiment is designed to evaluate the impact of dimensionality (i.e., d) and cardinality (i.e., N) on the robustness of benchmark SGMs when dealing with unstructured data. For this purpose, we generate a synthetic dataset with an equi-weighted Gaussian mixture {Y_n}_{n≤N} ~ GMM_d := (1/8) Σ_{a=1}^{8} N[m_a, σ_a I_d], where the Gaussian parameters (m_a, σ_a) are randomly selected within unit cubes [−1, 1]^d. The challenge arises because all elements {Y_n} satisfy p(Y_m) = p(Y_n) for any m ≠ n ≤ N, and this interchangeability makes it difficult to extract meaningful local associations among the elements, which is essential for efficient learning. To evaluate performance, we employ a tool from optimal transport, the sliced 2-Wasserstein distance (i.e., SW2) (Kolouri et al., 2019), known for its efficiency in capturing discrepancies between unstructured data instances, especially at high cardinality.

Figure 3. (Left) Scalability to data complexity: performance comparisons with varying data dimensionality (i.e., d) and cardinality (i.e., N). (Right) Ablation study on hyperparameters: performance variation of MF-CDMs with respect to the branching ratio b ∈ {1, 2, 4, 8} and the number of branching steps |K| ∈ {1, 2, 4, 8}.

Table 2. Performance evaluation on the synthetic data. We measure performance across different data complexities (N, d) using the sliced 2-Wasserstein distance scaled by a factor of 10². The best results are highlighted in bold.

  Method     (10³, 5)   (10³, 32)   (10⁵, 5)   (10⁵, 32)
  VP-SDEs     2.198      2.683       6.943      7.542
  CLD         2.387      2.826       6.411      7.131
  DPM         1.924      2.007       6.847      7.448
  LION        1.841      1.919       5.234      6.105
  MF-CDMs     2.017      2.413       3.167      4.059

Results. Fig. 3 and Table 2 present quantitative results as the set cardinality and dimensionality vary within the ranges N ∈ {10³, 10⁵} and d ∈ {5, 32}. We note that other methods can easily surpass ours at small cardinality (i.e., N = 10³), as the proposed mean-field modeling loses its strength and entails excessive computational complexity in that regime. While existing methods show promising results in low-cardinality experiments, their performance significantly deteriorates under conditions of extreme cardinality (i.e., N = 10⁵). This decline stems from the lack of an explicit treatment of the curse of dimensionality and, consequently, ineffective modeling of the inter-associations among elements. In comparison with the benchmarks, our method demonstrates robust performance, significantly outperforming all other benchmarks by a large margin in scenarios with N = 10⁵. Since our methodology extends VP-SDEs through the integration of PoC in the reverse dynamics, the performance gain of MF-CDMs over VP-SDEs implies that the chaotic modeling significantly enhances the robustness of conventional SGMs.
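The evaluation metric of Table 2 admits a compact implementation: sliced W2 averages exact one-dimensional Wasserstein distances over random projections. A minimal NumPy sketch (equal-sized clouds assumed; the projection count is an illustrative choice):

```python
import numpy as np

def sliced_w2(x, y, n_proj=128, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance (Kolouri et al., 2019) between
    two equal-size point sets: average 1-d W2 over random projections."""
    rng = np.random.default_rng(seed)
    acc = 0.0
    for _ in range(n_proj):
        v = rng.standard_normal(x.shape[1])
        v /= np.linalg.norm(v)
        acc += np.mean((np.sort(x @ v) - np.sort(y @ v)) ** 2)  # 1-d W2^2 via quantiles
    return np.sqrt(acc / n_proj)

# gen, tgt: (N, d) arrays of generated and target particles, equal N
# print(1e2 * sliced_w2(gen, tgt))   # scaled by 10^2 as in Table 2
```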
6.2. Real-world Dataset: 3D Point-cloud Generation

In the second experiment, we benchmark the empirical performance of MF-CDMs alongside existing SGMs for 3D shape diffusion on two datasets: ShapeNet (Chang et al., 2015) and MedShapeNet (Li et al., 2023), with each 3D point-cloud instance consisting of N = 1.0e+4 and N = 2.0e+4 points, respectively. The data cardinality in our experiments is up to 10 times larger than in standard setups, which typically focus on scenarios with a relatively limited number of points (e.g., 2048). For a fair comparison, we utilized the evaluation metrics suggested in (Yang et al., 2019) to compare the benchmarks (i.e., MMD-EMD, MMD-CD). Owing to the numerical instability of these metrics when applied to high-cardinality objects, we randomly subsampled 2048 points from both the generated ν_T^N and the target µ_T^⊗N objects and performed numerical comparisons.

Table 3. Performance evaluation of 3D point-cloud generation on the ShapeNet/MedShapeNet datasets. The best results are highlighted in bold. The EMD and CD metrics are both scaled by 10².

              ShapeNet         MedShapeNet
  Methods     EMD / CD         EMD / CD
  VP-SDEs     4.860 / 4.585    6.387 / 4.616
  CLD         4.083 / 5.865    8.647 / 5.632
  DPM         3.058 / 3.269    6.139 / 3.248
  PVD         3.445 / 3.032    6.386 / 5.902
  LION        3.248 / 3.248    6.221 / 5.135
  MF-CDMs     2.627 / 1.877    4.046 / 2.764

Results. Table 3 summarizes performance comparisons with the benchmarks. Without requiring any strong localization modules, our MF-CDMs surpass all other benchmarks on both datasets, showing their efficiency in real-world settings. It is worth highlighting that task-oriented methods, such as PVD and LION, have achieved state-of-the-art performance on the ShapeNet dataset with 2048 points. However, they suffer from a drastic performance decline when applied to the MedShapeNet dataset, as they depend on fixed localization modules primarily optimized for low-cardinality data. We also posit that our superiority stems from the concentration property of large particle systems, as supported by our theoretical findings in Section 4.1.

Figure 4. Qualitative results on the MedShapeNet dataset. Both µ_T^⊗N and ν_T^{N_K} illustrate the target and generated 3D shapes; the displayed liver object in the MedShapeNet dataset comprises a high-cardinality point set of nearly 2.0e+4 points.

Figure 4 provides a visualization of the intermediate 3D shapes during the denoising process with dWGFs. The simulation of the dWGFs starts with N_0 = 1.25e+3 particles, and the number of particles is doubled (e.g., b = 2) at each of the branching steps k ∈ {50, 100, 150, 200}, reaching N := N_K = 2.0e+4 at the end of the process. The final illustrative result, ν_T^{N_K}, closely resembles the target 3D anatomic structure µ_T^⊗N (i.e., a liver).

7. Conclusion

In this study, we propose MF-CDMs, a novel class of SGMs designed for the efficient generation of unstructured data instances with infinite dimensionality. Beginning with the original entropy minimization problem (P0), we gradually enlarge our discussion to leverage principles of MFT and pose the advanced problems (P1)-(P3) to deal with curse-of-dimensionality issues. Our theoretical results reveal that MF-CDMs naturally inherit chaoticity, ensuring the robust behavior of our model at infinite cardinality.
Experimental results on both synthetic and 3D shape datasets empirically validate the superior capability of our framework in generating data instances. In future work, we hope to apply our methodology across diverse tasks in scientific domains, such as physical simulation of large-particle dynamical systems (Karniadakis et al., 2021) and large-molecule polymer generation (Anstine & Isayev, 2023).

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

Adams, R. A. and Fournier, J. J. Sobolev spaces. Elsevier, 2003.

Anonymous. Score-based generative models break the curse of dimensionality in learning a family of sub-gaussian distributions. In Submitted to The Twelfth International Conference on Learning Representations, 2023. Under review.

Anstine, D. M. and Isayev, O. Generative models as an emerging paradigm in the chemical sciences. Journal of the American Chemical Society, 145(16):8736-8750, 2023.

Bakry, D. On Sobolev and logarithmic Sobolev inequalities for Markov semigroups. New Trends in Stochastic Analysis (Charingworth, 1994), pp. 43-75, 1997.

Bakry, D., Gentil, I., Ledoux, M., et al. Analysis and geometry of Markov diffusion operators, volume 103. Springer, 2014.

Beaulieu-Jones, B. K., Wu, Z. S., Williams, C., Lee, R., Bhavnani, S. P., Byrd, J. B., and Greene, C. S. Privacy-preserving generative deep neural networks support clinical data sharing. Circulation: Cardiovascular Quality and Outcomes, 12(7):e005122, 2019.

Bensoussan, A., Frehse, J., Yam, P., et al. Mean field games and mean field type control theory, volume 101. Springer, 2013.

Bensoussan, A., Frehse, J., and Yam, S. C. P. On the interpretation of the master equation. Stochastic Processes and their Applications, 127(7):2093-2137, 2017.

Bolley, F. Quantitative concentration inequalities on sample path space for mean field interaction. ESAIM: Probability and Statistics, 14:192-209, 2010.

Bolley, F., Guillin, A., and Villani, C. Quantitative concentration inequalities for empirical measures on non-compact spaces. Probability Theory and Related Fields, 137:541-593, 2007.

Bossy, M. and Talay, D. A stochastic particle method for the McKean-Vlasov and the Burgers equation. Mathematics of Computation, 66(217):157-192, 1997.

Brezis, H. Functional analysis, Sobolev spaces and partial differential equations, volume 2. Springer, 2011.

Cardaliaguet, P. Notes on mean field games. Technical report, 2010.

Cardaliaguet, P. and Lehalle, C.-A. Mean field game of controls and an application to trade crowding. Mathematics and Financial Economics, 12:335-363, 2018.

Carmona, R. and Delarue, F. Probabilistic analysis of mean-field games. SIAM Journal on Control and Optimization, 51(4):2705-2734, 2013.

Carmona, R. and Delarue, F. Forward-backward stochastic differential equations and controlled McKean-Vlasov dynamics. 2015.

Carmona, R. and Laurière, M. Convergence analysis of machine learning algorithms for the numerical solution of mean field control and games I: the ergodic case. SIAM Journal on Numerical Analysis, 59(3):1455-1485, 2021.

Carmona, R. and Laurière, M. Convergence analysis of machine learning algorithms for the numerical solution of mean field control and games II: the finite horizon case. The Annals of Applied Probability, 32(6):4065-4105, 2022.
Carmona, R., Delarue, F., et al. Probabilistic theory of mean field games with applications I-II. Springer, 2018.

Carrillo, J. A., McCann, R. J., and Villani, C. Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates. Revista Matemática Iberoamericana, 19(3):971-1018, 2003.

Carrillo, J. A., McCann, R. J., and Villani, C. Contractions in the 2-Wasserstein length space and thermalization of granular media. Archive for Rational Mechanics and Analysis, 179:217-263, 2006.

Chaintron, L.-P. and Diez, A. Propagation of chaos: A review of models, methods and applications. Kinetic and Related Models, 15(6):1017-1173, 2022.

Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., et al. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012, 2015.

Chen, H., Lee, H., and Lu, J. Improved analysis of score-based generative modeling: User-friendly bounds under minimal smoothness assumptions. In International Conference on Machine Learning, pp. 4735-4763. PMLR, 2023.

Chen, T., Liu, G.-H., and Theodorou, E. A. Likelihood training of Schrödinger bridge using forward-backward SDEs theory. arXiv preprint arXiv:2110.11291, 2021.

Collet, J.-F. and Malrieu, F. Logarithmic Sobolev inequalities for inhomogeneous Markov semigroups. ESAIM: Probability and Statistics, 12:492-504, 2008.

Daskalakis, C., Goldberg, P. W., and Papadimitriou, C. H. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195-259, 2009.

De Bortoli, V. Convergence of denoising diffusion models under the manifold hypothesis. arXiv preprint arXiv:2208.05314, 2022.

Del Moral, P. Mean field simulation for Monte Carlo integration. CRC Press, 2013.

Dembo, A. and Zeitouni, O. Large deviations techniques and applications, volume 38. Springer Science & Business Media, 2009.

Dockhorn, T., Vahdat, A., and Kreis, K. Score-based generative modeling with critically-damped Langevin diffusion. In International Conference on Learning Representations, 2022.

Dos Reis, G. and Platonov, V. Itô-Wentzell-Lions formula for measure dependent random fields under full and conditional measure flows. Potential Analysis, 59(3):1313-1344, 2023.

dos Reis, G., Engelhardt, S., and Smith, G. Simulation of McKean-Vlasov SDEs with super-linear growth. IMA Journal of Numerical Analysis, 42(1):874-922, 2022.

Dutordoir, V., Saul, A., Ghahramani, Z., and Simpson, F. Neural diffusion processes, 2023.

Ethier, S. N. and Kurtz, T. G. Markov processes: characterization and convergence. John Wiley & Sons, 2009.

Fournier, N. and Guillin, A. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3-4):707-738, 2015.

Franceschi, J.-Y., Gartrell, M., Santos, L. D., Issenhuth, T., de Bézenac, E., Chen, M., and Rakotomamonjy, A. Unifying GANs and score-based diffusion as generative particle models, 2023.

Germain, M., Mikael, J., and Warin, X. Numerical resolution of McKean-Vlasov FBSDEs using neural networks. Methodology and Computing in Applied Probability, 24(4):2557-2586, 2022.

Gottlieb, A. D. Markov transitions and the propagation of chaos. University of California, Berkeley, 1998.

Guillin, A., Liu, W., Wu, L., and Zhang, C. The kinetic Fokker-Planck equation with mean field interaction. Journal de Mathématiques Pures et Appliquées, 150:1-23, 2021.
Guo, X., Pham, H., and Wei, X. Itô's formula for flows of measures on semimartingales. Stochastic Processes and their Applications, 159:350-390, 2023.

Hagemann, P., Ruthotto, L., Steidl, G., and Yang, N. T. Multilevel diffusion: Infinite dimensional score-based diffusion models for image generation. arXiv preprint arXiv:2303.04772, 2023.

Han, J., Hu, R., and Long, J. Learning high-dimensional McKean-Vlasov forward-backward stochastic differential equations with general distribution dependence. arXiv preprint arXiv:2204.11924, 2022.

Hauray, M. and Mischler, S. On Kac's chaos and related problems. Journal of Functional Analysis, 266(10):6055-6157, 2014.

Ho, J. and Salimans, T. Classifier-free diffusion guidance, 2022.

Hoogeboom, E., Satorras, V. G., Vignac, C., and Welling, M. Equivariant diffusion for molecule generation in 3D. In International Conference on Machine Learning, pp. 8867-8887. PMLR, 2022.

Jabin, P.-E. and Wang, Z. Mean field limit for stochastic particle systems. Active Particles, Volume 1: Advances in Theory, Models, and Applications, pp. 379-402, 2017.

Jo, J., Lee, S., and Hwang, S. J. Score-based generative modeling of graphs via the system of stochastic differential equations. In International Conference on Machine Learning, pp. 10362-10383. PMLR, 2022.

Kac, M. Foundations of kinetic theory. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, volume 3, pp. 171-197, 1956.

Kadanoff, L. P. More is the same; phase transitions and mean field theories. Journal of Statistical Physics, 137:777-797, 2009.

Kaissis, G. A., Makowski, M. R., Rückert, D., and Braren, R. F. Secure, privacy-preserving and federated machine learning in medical imaging. Nature Machine Intelligence, 2(6):305-311, 2020.

Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L. Physics-informed machine learning. Nature Reviews Physics, 3(6):422-440, 2021.

Karras, T., Aila, T., Laine, S., and Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.

Kerrigan, G., Ley, J., and Smyth, P. Diffusion generative models in infinite dimensions. In International Conference on Artificial Intelligence and Statistics, pp. 9538-9563. PMLR, 2023.

Kim, J., Yoo, J., Lee, J., and Hong, S. SetVAE: Learning hierarchical composition for generative modeling of set-structured data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15059-15068, 2021.

Kloeckner, B. A geometric study of Wasserstein spaces: Euclidean spaces. Annali della Scuola Normale Superiore di Pisa - Classe di Scienze, 9(2):297-323, 2010.

Koehl, P. and Delarue, M. Application of a self-consistent mean field theory to predict protein side-chains conformation and estimate their conformational entropy. Journal of Molecular Biology, 239(2):249-275, 1994.

Kolouri, S., Nadjahi, K., Simsekli, U., Badeau, R., and Rohde, G. Generalized sliced Wasserstein distances. Advances in Neural Information Processing Systems, 32, 2019.

Koshizuka, T. and Sato, I. Neural Lagrangian Schrödinger bridge: Diffusion modeling for population dynamics. In The Eleventh International Conference on Learning Representations, 2023.

Kunita, H. Stochastic flows and stochastic differential equations, volume 24. Cambridge University Press, 1997.

Lachapelle, A., Salomon, J., and Turinici, G. Computation of mean field equilibria in economics. Mathematical Models and Methods in Applied Sciences, 20(04):567-588, 2010.
Ledoux, M. Concentration of measure and logarithmic Sobolev inequalities. In Séminaire de Probabilités XXXIII, pp. 120-216. Springer, 2006.

Lee, H., Lu, J., and Tan, Y. Convergence for score-based generative modeling with polynomial complexity. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022.

Li, J., Pepe, A., Gsaxner, C., Luijten, G., Jin, Y., Ambigapathy, N., Nasca, E., Solak, N., Melito, G. M., Memon, A. R., et al. MedShapeNet: a large-scale dataset of 3D medical shapes for computer vision. arXiv preprint arXiv:2308.16139, 2023.

Lim, S., Yoon, E., Byun, T., Kang, T., Kim, S., Lee, K., and Choi, S. Score-based generative modeling through stochastic evolution equations in Hilbert spaces. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

Liu, G.-H., Chen, T., So, O., and Theodorou, E. Deep generalized Schrödinger bridge. In Advances in Neural Information Processing Systems, 2022.

Lott, J. Some geometric calculations on Wasserstein space. Communications in Mathematical Physics, 277(2):423-437, 2008.

Lu, C., Zheng, K., Bao, F., Chen, J., Li, C., and Zhu, J. Maximum likelihood training for score-based diffusion ODEs by high order denoising score matching. In International Conference on Machine Learning, pp. 14429-14460. PMLR, 2022.

Lu, J., Wu, Y., and Xiang, Y. Score-based transport modeling for mean-field Fokker-Planck equations. arXiv preprint arXiv:2305.03729, 2023.

Luo, S. and Hu, W. Diffusion probabilistic models for 3D point cloud generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2837-2845, 2021.

Malagò, L., Montrucchio, L., and Pistone, G. Wasserstein Riemannian geometry of Gaussian densities. Information Geometry, 1:137-179, 2018.

Malrieu, F. Logarithmic Sobolev inequalities for some nonlinear PDEs. Stochastic Processes and their Applications, 95(1):109-132, 2001.

Malrieu, F. Convergence to equilibrium for granular media equations and their Euler schemes. The Annals of Applied Probability, 13(2):540-560, 2003.

Niu, C., Song, Y., Song, J., Zhao, S., Grover, A., and Ermon, S. Permutation invariant graph generation via score-based generative modeling. In International Conference on Artificial Intelligence and Statistics, pp. 4474-4484. PMLR, 2020.

Øksendal, B. Stochastic differential equations. Springer, 2003.

Otto, F. and Villani, C. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. Journal of Functional Analysis, 173(2):361-400, 2000.

Panaretos, V. M. and Zemel, Y. Statistical aspects of Wasserstein distances. Annual Review of Statistics and Its Application, 6:405-431, 2019.

Park, S., Park, B., Lee, M., and Lee, C. Neural stochastic differential games for time-series analysis. In Proceedings of the 40th International Conference on Machine Learning, ICML'23, 2023.

Pidstrigach, J., Marzouk, Y., Reich, S., and Wang, S. Infinite-dimensional diffusion models for function spaces. arXiv preprint arXiv:2302.10130, 2023.

Qi, C. R., Yi, L., Su, H., and Guibas, L. J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30, 2017a.

Qi, C. R., Yi, L., Su, H., and Guibas, L. J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30, 2017b.
Rasul, K., Seward, C., Schuster, I., and Vollgraf, R. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. In International Conference on Machine Learning, pp. 8857-8868. PMLR, 2021.

Rezende, D. and Mohamed, S. Variational inference with normalizing flows. In International Conference on Machine Learning, pp. 1530-1538. PMLR, 2015.

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684-10695, 2022.

Ruthotto, L., Osher, S. J., Li, W., Nurbekyan, L., and Fung, S. W. A machine learning framework for solving high-dimensional mean field game and mean field control problems. Proceedings of the National Academy of Sciences, 117(17):9183-9193, 2020.

Santambrogio, F. Optimal transport for applied mathematicians. Birkhäuser, NY, 55(58-63):94, 2015.

Santambrogio, F. {Euclidean, metric, and Wasserstein} gradient flows: an overview. Bulletin of Mathematical Sciences, 7:87-154, 2017.

Shi, Y., De Bortoli, V., Campbell, A., and Doucet, A. Diffusion Schrödinger bridge matching. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a.

Song, Y., Durkan, C., Murray, I., and Ermon, S. Maximum likelihood training of score-based diffusion models. Advances in Neural Information Processing Systems, 34:1415-1428, 2021b.

Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021c.

Song, Y., Dhariwal, P., Chen, M., and Sutskever, I. Consistency models, 2023.

Strocchi, M., Augustin, C. M., Gsell, M. A., Karabelas, E., Neic, A., Gillette, K., Razeghi, O., Prassl, A. J., Vigmond, E. J., Behar, J. M., et al. A publicly available virtual cohort of four-chamber heart meshes for cardiac electromechanics simulations. PLoS ONE, 15(6):e0235145, 2020.

Sznitman, A.-S. Topics in propagation of chaos. Lecture Notes in Mathematics, pp. 165-251, 1991a.

Sznitman, A.-S. Topics in propagation of chaos. In École d'Été de Probabilités de Saint-Flour XIX - 1989, pp. 165-251. Springer Berlin Heidelberg, 1991b.

Tashiro, Y., Song, J., Song, Y., and Ermon, S. CSDI: Conditional score-based diffusion models for probabilistic time series imputation. Advances in Neural Information Processing Systems, 34:24804-24816, 2021.

Thorpe, M., Nguyen, T. M., Xia, H., Strohmer, T., Bertozzi, A., Osher, S., and Wang, B. GRAND++: Graph neural diffusion with a source term. In International Conference on Learning Representations, 2022.

Tyszkiewicz, M. J., Fua, P., and Trulls, E. GECCO: Geometrically-conditioned point diffusion models. arXiv preprint arXiv:2303.05916, 2023.

Villani, C. Hypocoercivity, 2006.

Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M., and Solomon, J. M. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):1-12, 2019a.

Wang, Y. et al. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 38(5):146, 2019b.

Xu, M., Powers, A. S., Dror, R. O., Ermon, S., and Leskovec, J. Geometric latent diffusion models for 3D molecule generation. In International Conference on Machine Learning, pp. 38592-38610. PMLR, 2023.
Yang, G., Huang, X., Hao, Z., Liu, M.-Y., Belongie, S., and Hariharan, B. PointFlow: 3D point cloud generation with continuous normalizing flows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4541-4550, 2019.

Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R. R., and Smola, A. J. Deep sets. Advances in Neural Information Processing Systems, 30, 2017.

Zeng, X., Vahdat, A., Williams, F., Gojcic, Z., Litany, O., Fidler, S., and Kreis, K. LION: Latent point diffusion models for 3D shape generation. In Advances in Neural Information Processing Systems, 2022.

Zhao, J., Mathieu, M., and LeCun, Y. Energy-based generative adversarial networks. In International Conference on Learning Representations, 2016.

Zhou, L., Du, Y., and Wu, J. 3D shape generation and completion through point-voxel diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5826-5835, 2021.

A. Appendix

A.1. Notations

Throughout the paper, we adhere to the following notations:

- Without loss of generality, we employ the same notation for the tensor product across different objects, including functions and probability measures, denoted f ⊗ g, µ ⊗ ν. For any member f of the class of continuous, bounded, and integrable functions, we denote the N-fold self-product and its integral as

  f^⊗N(x^N) = [f(x_1), …, f(x_N)],   ∫ f^⊗N(x^N) µ^⊗N(dx^N) = ∏_i ∫ f(x_i) µ(dx_i).   (32)

- We denote the coordinate system for the N-particle system as x^N = (x_1, …, x_N) ∈ X^N, where each component is represented as x_i ∈ X, i ≤ N.

- For a probability measure ν and an integrable test function f, we simply denote the integral by ⟨ν, f⟩ := ∫ f dν.

- The law of the N-particle joint density, ν_t^N, falls within the 2-Wasserstein space, which specifically contains absolutely continuous measures, represented by P_2 ≡ P_{2,ac}. We routinely presume the absolute continuity of all probability measures in this context.

- The N-particle mean-field dynamics is represented as X^N ~ ν^N ∈ P_{2,ac}(X^N). Following from absolute continuity, we define the density representation with the Radon-Nikodym derivative: dν^N = ϱ^N dx^N. The first M components of the N particles are denoted by X^{M,N} ~ ν^{M,N} ∈ P_{2,ac}(X^M) with dν^{M,N} = ϱ^{M,N} dx^M. The N-fold product of a probability measure ν is denoted by ν^⊗N ∈ P_{2,ac}(X^N) with dν^⊗N = ϱ^⊗N dx^N.

- The Euclidean and Frobenius norms are denoted ‖a‖_E and ‖A‖_F, respectively. Sym(d) is the set of symmetric matrices of size (d × d); GL(d) is the general linear group of size (d × d).

- Within our mathematical context, the symbols are defined as follows: D and D_ϱ for abstract and functional derivatives, respectively; ∇_x := ∇ for the Euclidean gradient; ∇_{P_2} for the Wasserstein gradient; and ∂_t for the temporal derivative. For simplicity, the Jacobian matrix of a vector-valued object h is interchangeably denoted ∇h := J_h.

- In the paper, G_t represents the deviation of score functions for an N-particle system, G_t^{i,N} denotes the projection of these functions onto the i-th component, and G_t^∞ corresponds to its mean-field limit.

- L^p(X) denotes the L^p function space on X; Lip(f) is the Lipschitz constant of a continuous and bounded function f. N denotes the index set for cardinality, and N_+ is the set of positive integers.

- For the maximum and minimum of two real values, we follow the conventional notation max(a, b) = a ∨ b, min(a, b) = a ∧ b.
A.2. Assumptions and Lemmas

We establish the following assumptions to facilitate existing theoretical frameworks of MFT in analyzing the behavior of the proposed MF-SDEs/dWGFs.

1. (H1). We always assume large cardinality in the data representation, i.e., N ≫ d.

2. (H2). For all j ≤ N, A and the mean-field interaction B satisfy Lipschitz continuity with respect to both X and P_{2,ac}:

  ‖[B ⋆ ν](x_j) − [B ⋆ ν′](y_j)‖²_E ≤ C_B ( ‖x_j − y_j‖²_E + W_2²(ν, ν′) ),   (33)
  ‖A(s, x_j) − A(t, y_j)‖²_E ≤ C_A ‖x_j − y_j‖²_E + C_A ( |s − t| ∨ |s − t|^{1/2} )².   (34)

By the definition and assumptions above, Lipschitz continuity for the score networks is naturally inferred:

  ‖s_θ(s, x_j, ν) − s_θ(t, y_j, ν′)‖²_E ≤ 2(C_A ∨ C_B) ‖x_j − y_j‖²_E + C_A ( |s − t| ∨ |s − t|^{1/2} )² + C_B W_2²(ν, ν′).   (35)

We assume that the second moment of the proposed score networks is bounded:

  ‖s_θ(t, x_j, ν)‖²_E ≤ D (1 + ‖x_j‖²_E).   (36)

3. (H3). There exist real-valued functions A, B ∈ C²(X) and A^N, B^N ∈ C²(X^N) such that

  A_θ = ∇A,   A_θ^N = ∇A^N,   B_θ = ∇B,   B_θ^N = ∇B^N,   (37)

and those functions are uniformly convex. Equivalently, there exist constants γ_A, γ_B, γ′_B > 0 such that the Hessian matrices satisfy

  ∇²A ⪰ γ_A I_d,   −γ′_B I_d ⪯ ∇²B ⪯ γ_B I_d.   (38)

4. (H4). Almost surely, we can always find score networks θ ∈ Θ that can replace the score function of the MF-SDEs:

  P( s_θ(t, x^N, ν_t^N) = ∇log ϱ_t^N(x^N) ) = 1,   ∀N ∈ N.   (39)

5. (H5). For any t ∈ [0, T], there exists a constant q > 2, q ≠ 4, such that the solution µ_t to the non-linear Fokker-Planck equation has finite q-th moment, i.e., (E_{µ_t}[ ‖x‖^q ])^{1/q} < ∞.

6. (H6). For some constant a > 0, the following numerical estimate is bounded for any 1 ≤ M ≤ N:

  E_{x∼ν_t^{M,N}} exp( a ‖x‖²_E ) < ∞,   ν_t^{M,N}(dx^M) = ϱ_t^{M,N}(x^M) dx^M.   (40)

Lemma A.1. (Grönwall's Lemma, Theorem 5.1 (Ethier & Kurtz, 2009)). Assume h : [0, T] → R is a bounded non-negative measurable function on [0, T] and g : [0, T] → R is a non-negative integrable function. If the following inequality holds for a constant B > 0,

  h(t) ≤ B + ∫_0^t g(s) h(s) ds,   then   h(t) ≤ B exp( ∫_0^t g(s) ds ),   t ∈ [0, T].   (41)

A.3. Exchangeability, Chaoticity, Reducibility

In this section, we discuss three core properties (exchangeability, chaoticity, reducibility) of the proposed mean-field N-particle system, which are referenced often in the subsequent proofs.

Exchangeability. We first show the universal exchangeability property of sample particles:

Proposition A.2. (Exchangeability of the N-particle system.) Let X_t^N ~ ν_t^N be a solution to the mean-field SDEs defined in Eq. 6. Assume that X_0^N ~ N^⊗N[I_d]. Then, the particles {X_t^{i,N}}_{i≤N} are exchangeable at any time t ∈ (0, T].

Proof. Since the infinitesimal generator of the N-particle system lies in the set L_t^N ∈ {L; τ^{−1}Lτ = L, ∀τ ∈ S_N}, all solutions ϱ_t^N (or ν_t^N) to the Liouville equation in Eq. 18 are trivially symmetric measures at any time t ∈ (0, T] whenever the initial state ϱ_0^N (or ν_0^N) is symmetric. The initial constraint ϱ_0^N = N^⊗N ensures exchangeability of the set of initial states, since samples drawn from any two projected components π_i^N N^⊗N and π_j^N N^⊗N are i.i.d. for any pair (i, j) ∈ N_+ × N_+, meaning that those random variables are exchangeable.

The rationale behind the equality in Eq. 9 is based on the result of Proposition A.2: since the initial state of the denoising process assumes i.i.d. Gaussianity, and the associated generator L_t^N acts concurrently on every particle, the joint law remains symmetric.
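Exchangeability is easy to test empirically: a score model consistent with Eq. 9 must commute with particle permutations. A small PyTorch check (illustrative; `score_net` stands for any candidate (t, x) → (N, d) map):

```python
import torch

def check_permutation_equivariance(score_net, n=128, d=3, t=0.5, tol=1e-5):
    """Numerical check of the symmetry behind Proposition A.2 / Eq. 9:
    the model should satisfy s(t, x[perm]) == s(t, x)[perm]."""
    x = torch.randn(n, d)
    perm = torch.randperm(n)
    t = torch.tensor([t])
    out_of_perm = score_net(t, x[perm])     # permute, then evaluate
    perm_of_out = score_net(t, x)[perm]     # evaluate, then permute
    return (out_of_perm - perm_of_out).abs().max().item() < tol
```

For instance, the MeanFieldScore sketch in Section 3.2 passes this check (up to floating-point noise), since A acts particle-wise and B depends on x only through the permutation-invariant empirical measure.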
Consequently, we define the truncated convolution for the $N$-particle system as
$$[B_\theta \ast_B \nu^N_t](x^N) = \frac{1}{N}\big[[B_\theta \ast \hat\nu^N_t](x_1), \ldots, [B_\theta \ast \hat\nu^N_t](x_N)\big]^T, \quad (43)$$
where $\hat\nu^N_t = (1/N)\sum^N_i \delta_{X^{i,N}_t}$ is an empirical projection of $\nu^N_t \in \mathcal{P}_2(\mathcal{X}^N)$ onto $\hat\nu^N_t \in \mathcal{P}(\mathcal{P}_2(\mathcal{X}))$. Then, each component in Eq. 43 can be represented as
$$[B_\theta \ast_B \hat\nu^N_t](x_j) = \int B_\theta(x_j - y_j)\,d\nu^R_t[x_j](y_j), \quad 1 \leq j \leq N, \quad (44)$$
where $B_\theta : \mathbb{R}^d \to \mathbb{R}^d$ are score networks parameterized by $\theta \in \Theta$. Here, the truncated measure $\nu^R_t[x_j](y_j)$ with respect to the centered particle $x_j \in \mathbb{R}^d$ is defined as
$$d\nu^R_t[x_j](y_j) = \frac{\chi_{B^{x_j}_R}\,\hat\nu^N_t(dy_j)}{\hat\nu^N_t[B^{x_j}_R]}, \quad (45)$$
where $B^{x_j}_R$ is a Euclidean ball of radius $R$ centered at $x_j$ and $\chi_A$ represents the indicator function of a set $A \subset \mathbb{R}^d$.

Reducibility. We say that a function $h : \mathcal{X}^N \to \mathcal{X}^N$ is reducible if there exists at least one $\mathcal{X}$-valued function $\hat h : [0, T] \times \mathcal{X} \to \mathcal{X}$ such that $h = \hat h^{\otimes N}$ uniformly, where the function product $\hat h^{\otimes N}(t, x^N) \in \mathcal{X}^N$ is defined in Eq. 32. With this definition, the notion of reducibility can be formalized through the kernel of the following functional $R$ on a Sobolev space:
$$R(h) := \inf_{\hat h \in W^{1,2}}\big\|h(t, x^N) - \hat h^{\otimes N}(t, x^N)\big\|_W. \quad (46)$$
Any function $h$ in the kernel of this functional, i.e., $\mathrm{Ker}(R) = \{h;\ R(h) = 0,\ h \in W^{1,2}(\mathcal{X}^N)\}$, operates in a particle-wise manner, acting on each particle in parallel. By direct calculation, one can show that our score networks (i.e., $s_\theta := A_\theta + [B_\theta \ast_B \nu^N_t]$) are reducible and ready to be implemented for our purpose, as Proposition A.3 assures chaoticity. Furthermore, one can easily show that the vector fields $V^N$ of the mean-field VP-SDE in Eq. 17 also satisfy reducibility. The reducibility condition, in particular, results in substantial computational efficiency in the modeling of score networks $A_\theta, B_\theta \in \mathrm{Ker}(R)$. It permits point-wise operation through GPU-based calculations, thus accelerating the sampling process of the $N$-particle system in high-cardinality environments. The reducibility property is critical in our approach, ensuring the particles' chaotic behavior and scalability in practical numerical implementations.

Chaoticity. We conclusively demonstrate that our $N$-particle system, modeled by MF-SDEs, not only achieves $\mu_T$-chaos but also exhibits stability in its limit behavior.

Proposition A.3. (Equivalence) Assuming mild Lipschitz continuity, the following three statements are equivalent:
1. The N-particle entropy in Eq. 11 becomes chaotic if the score networks $s_\theta$ are reducible.
2. A joint probability density $\varrho^N_T$ solving the Liouville equation in Eq. 18 is $\mu_T$-chaotic.
3. The solution to the dWGF for the N-particle system in Eq. 18 becomes $\mu_t$-chaotic if the score networks $s_\theta$ are reducible.

Proof. The classical result on the propagation of chaos (Jabin & Wang, 2017), combined with the Lipschitz continuity assumption in (H2), assures that the denoising dynamics with reducible score networks induce chaoticity, as exchangeability is already satisfied by the result of Proposition A.2. By the result suggested in Theorem 1.4 (Hauray & Mischler, 2014), Kac's chaos (i.e., $\mu_T = \lim_{N\to\infty}\nu^N_T$) identically implies chaotic entropy under the Lipschitz continuity assumptions.
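To make the reducible design above concrete, here is a minimal PyTorch sketch (ours) of $s_\theta = A_\theta + [B_\theta \ast_B \nu^N_t]$: the particle-wise network $A_\theta$ of Eq. 42 plus the radius-truncated convolution of Eqs. 43-45 against the empirical measure. The layer sizes, activations, and class name are illustrative assumptions, not the paper's exact architecture (which follows DPM/DGCNN-style modules, cf. Sec. A.9):

```python
import torch
import torch.nn as nn

class ReducibleScore(nn.Module):
    """Sketch of a reducible score network s_theta = A_theta + [B_theta *_B nu].

    A_theta acts on each particle independently (Eq. 42); B_theta models
    pairwise interactions through a truncated convolution against the
    empirical measure (Eqs. 43-45). Sizes/activations are placeholders.
    """
    def __init__(self, d, hidden=64):
        super().__init__()
        # A_theta: particle-wise drift network (time + position -> R^d).
        self.A = nn.Sequential(nn.Linear(d + 1, hidden), nn.SiLU(), nn.Linear(hidden, d))
        # B_theta: interaction kernel evaluated on particle differences x_j - y_j.
        self.B = nn.Sequential(nn.Linear(d, hidden), nn.SiLU(), nn.Linear(hidden, d))

    def forward(self, t, x, radius):
        # x: (N, d) particle system; t: scalar in [0, T].
        N, d = x.shape
        tcol = torch.full((N, 1), float(t))
        a = self.A(torch.cat([tcol, x], dim=1))            # (N, d), particle-wise
        # Pairwise differences and truncation to the ball B(x_j, R) (Eq. 45).
        diff = x[:, None, :] - x[None, :, :]               # (N, N, d)
        mask = (diff.norm(dim=-1) <= radius).float()       # indicator chi_{B_R}
        kernel = self.B(diff.reshape(-1, d)).reshape(N, N, d)
        # Normalized truncated convolution against the empirical measure.
        weights = mask / mask.sum(dim=1, keepdim=True).clamp(min=1.0)
        b = (weights[..., None] * kernel).sum(dim=1)       # (N, d)
        return a + b
```

Because $A_\theta$ acts per particle and $B_\theta$ per particle pair, the same weights serve any cardinality $N$, which is exactly the scalability benefit of the reducible structure claimed above.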
A.4. Wasserstein Variation Equation

Gradient Flows on $\mathcal{P}_{2,ac}$, Itô's Flows of Measures. With mild assumptions on the regularity of energy functionals (e.g., functional differentiability), the Wasserstein gradient can be identified with the Lions L-derivative (Cardaliaguet, 2010) by utilizing the Gâteaux (or Fréchet) derivative of the semi-martingale lifting. To be more specific, Theorem A.4 reveals the fundamental structure by which Eq. 50 can be rewritten in an alternative form from a functional-analytic perspective.

Theorem A.4. (Carmona et al., 2018) Let us assume that the functional $E$ has a first variation $\frac{\delta E}{\delta\mu}$ for any $\mu \in K \subset \mathcal{P}_{2,ac}$, and define the spatial gradient of the first variation as
$$\mathcal{P}_{2,ac} \times \mathbb{R}^d \ni (\mu, x) \mapsto \nabla_x\frac{\delta E}{\delta\mu}[\mu](x) \in \mathbb{R}^d. \quad (47)$$
Assume that this mapping is jointly continuous in $(\mu, x)$, well-defined, of at most linear growth in $\mathbb{R}^d$, and uniformly bounded on the subset $K \subset \mathcal{P}_{2,ac}$. Then the Lions L-derivative is identical to the spatial gradient of the first variation.

For a test function $\varphi$ and a solution $\varrho_t$ to the dWGFs for $N \to \infty$ (e.g., the McKean-Vlasov equation), we apply the Gâteaux derivative to the infinite-dimensional energy functional $E : [0, T] \times L^2(\mathcal{X}) \to \mathbb{R}$:
$$dE(t, \varrho_t) = \mathbb{E}\Big[\nabla_x D_\varrho E(t, \varrho_t, x)\cdot V(t, x, \nu_t) + \frac{1}{2}\mathrm{Tr}\big[\Sigma(t, x)\Sigma(t, x)^T\,\nabla^2_x D_\varrho E(t, \varrho_t, x)\big]\Big]\,dt. \quad (48)$$
A variety of notions for the derivatives in Eq. 48 have been explored in the literature (Guo et al., 2023; Carmona et al., 2018; Dos Reis & Platonov, 2023; Santambrogio, 2015). We examine the identities among them as follows:
$$\nabla_x D_\varrho E\big|_{t,x,\varrho=\varrho_t}\ \overset{\text{Sec. 7.2 (Santambrogio, 2015)}}{=}\ \nabla_x\frac{\delta E}{\delta\varrho}\Big|_{t,x,\varrho_t}\ \overset{\text{Theorem A.4}}{=}\ \nabla_{\mathcal{P}_2}E\big|_{t,x,\varrho_t}. \quad (49)$$
Assuming appropriate regularity conditions for each energy functional, we find that the three distinct notions of derivatives in Eq. 49 are congruent. This observation leads us to an alternative definition of the functional derivative and its role in defining the evolution of measures over time.

Definition A.5. (Itô's Flows of Measures) Given a semi-martingale $X_{(\cdot)}$ with finite variation $\mathbb{E}[\mathrm{Var}(V)] < \infty$ and finite quadratic variation $\mathbb{E}[d[X_{(\cdot)}, X_{(\cdot)}]] < \infty$, the time-varying energy functional $E : [0, T] \times \mathcal{P}_{2,ac} \to \mathbb{R}$, $E \in C^{1,1}(\mathcal{P}_2(\mathcal{X}))$, associated with differential calculus on the Wasserstein space $\mathcal{P}_{2,ac}$, evolves according to the dynamics
$$dE(t, \mathrm{Law}(X_t)) = \mathbb{E}\Big[\nabla_x\frac{\delta E}{\delta\mu}(t, \mathrm{Law}(X_t))\cdot dX_t + \frac{1}{2}\nabla^2_x\frac{\delta E}{\delta\mu}(t, \mathrm{Law}(X_t))\,d[X_t, X_t]\Big], \quad (50)$$
where $\nabla, \nabla^2$ are gradient and Hessian operators, and the expectation is taken with respect to the law of the semi-martingale $X_{(\cdot)}$.

Definition A.5 is a pivotal tool in our paper, as it offers a closed form for the upper bounds of our variational equation. The following variation equation clearly delineates that the normalized entropy is influenced by fluctuations of the Wasserstein metric. We are now ready to derive our Wasserstein variation equation for the functional $E = \mathcal{H}^N_t$ with the aforementioned notions:

Theorem 3.1 (Variation Equations for N-particle Relative Entropy). For arbitrary temporal variables $0 \leq s < t \leq T$, there exist constants $C_0, C_1 > 0$ satisfying the following variational equation:
$$\mathcal{H}^N_t(\nu^N_t) \leq \mathcal{H}^N_s(\nu^N_s) + C_0\int_s^t\sqrt{\mathbb{E}\big\|\nabla_{\mathcal{P}_2}\mathcal{H}^N_u\big\|^2_E}\,du + C_1\int_s^t\sqrt{\mathbb{E}\big\|\nabla_x\nabla_{\mathcal{P}_2}\mathcal{H}^N_u\big\|^2_F}\,du. \quad (51)$$

Proof. We start by deriving the proposed score-matching objective. Let us consider a semi-martingale $\nu_t \sim X_t$, $dX_t = V\,dt + \Sigma_t\,dW_t$, for $V := V^1$ in Eq. 144 with progressively measurable processes $f_t$, $\nabla\log\zeta_{T-t}$. We define the time-varying energy functional $E$ as the relative entropy
$$E(t, \mu_t) = H(\mu_t|\zeta_{T-t}) := H(t, \mu_t) := H_t. \quad (52)$$
With the notation $\nu^N_t = \mathrm{Law}(X^N_t)$, the functional $H_t$ evolves by Itô's flow of measures introduced in Definition A.5, associated with the Wasserstein gradient flow in Eq. 15:
$$dH(t, \nu^N_t) = \mathbb{E}\Big[\nabla_x\frac{\delta H}{\delta\mu}(t, \nu^N_t)\cdot dX^N_t + \frac{1}{2}\nabla^2_x\frac{\delta H}{\delta\mu}(t, \nu^N_t)\,d[X^N_t, X^N_t]\Big]. \quad (53)$$
Then, direct application of the variation equation in Definition A.5 to the entropy $H_t$ gives
$$H_t \leq H_s + \int_s^t\mathbb{E}\Big[\Big\langle\nabla\log\frac{\varrho^N_u}{\zeta^{\otimes N}_{T-u}},\,V^N_u\Big\rangle\Big]du + \frac{1}{2}\int_s^t\mathbb{E}\Big[\Big\|\nabla^2\log\frac{\varrho^N_u}{\zeta^{\otimes N}_{T-u}}\Big\|_F\,\big\|\Sigma_u\Sigma_u^T\big\|_F\Big]du, \quad (54)$$
where $\nabla, \nabla^2$ denote the Euclidean gradient and Hessian operators with respect to the spatial axis, and $\|\cdot\|_F$ is the Frobenius norm. The first equality holds as the Wasserstein gradient is identified with the spatial gradient of the first variation. Note that the first variation of entropy-type functionals can be directly obtained from Section 8.2 (Santambrogio, 2015):
$$\frac{\delta H[\nu^N_t|\zeta^{\otimes N}_{T-t}]}{\delta\mu}(x^N) = \log\varrho^N_t(x^N) - \log\zeta^{\otimes N}_{T-t}(x^N) + 1, \quad (55)$$
$$\nabla_x\frac{\delta H[\nu^N_t|\zeta^{\otimes N}_{T-t}]}{\delta\mu} = \nabla_{\mathcal{P}_2}H_t[\nu^N_t] = \nabla\log\varrho^N_t(x^N) - \nabla\log\zeta^{\otimes N}_{T-t}(x^N), \quad (56)$$
$$\nabla_x\nabla_{\mathcal{P}_2}H_t[\nu^N_t] = \nabla^2\log\varrho^N_t(x^N) - \nabla^2\log\zeta^{\otimes N}_{T-t}(x^N). \quad (57)$$
For the deterministic log-probabilities $\log\varrho_t$ and $\log\zeta_{T-t}$, the expectations of the martingale terms vanish:
$$\mathbb{E}\Big[\int\nabla\log\varrho^N_u\,\Sigma_u dW_u\Big] = \mathbb{E}\,\mathbb{E}\Big[\int\nabla\log\varrho^N_u\,\Sigma_u dW_u\,\Big|\,\mathcal{F}_u\Big] = 0, \qquad \mathbb{E}\Big[\int\nabla\log\zeta^{\otimes N}_{T-u}\,\Sigma_u dW_u\Big] = 0.$$
For the time-varying diffusion matrix $\Sigma_t$, the quadratic variation can be calculated as
$$d[X^N_{(\cdot)}, X^N_{(\cdot)}]^T = \Big(\int\Sigma_{(\cdot)}\Sigma^T_{(\cdot)}\,dt\Big)^T = \int\Sigma_{(\cdot)}\Sigma^T_{(\cdot)}\,dt, \qquad \Sigma_{(\cdot)}\Sigma^T_{(\cdot)} \in \mathrm{Sym}(d). \quad (58)$$
Let us define the $\mathcal{X}$-valued function $G_t = s_\theta - \nabla\log\zeta_{T-t}$, and recall the canonical norm on the weighted Sobolev space $W^w_{\alpha,p}(\mathcal{X}^N)$ with respect to a multi-index $\alpha$:
$$\|G_t\|_{W^p_\alpha} = \Big(\int\|G_t\|^p_E\,w_0\,d\nu_t + \sum_{|\alpha| \leq L}\int\|D^\alpha G_t\|^p\,w_\alpha\,d\nu_t\Big)^{1/p}, \quad (59)$$
where $D^\alpha$ stands for higher-order weak partial derivatives of degree at most $L$, $D^\alpha\varphi = \partial^L\varphi/\partial x^{\alpha_1}_1\cdots\partial x^{\alpha_L}_L$, defined through
$$\int u\,D^\alpha\varphi\,dx^N = (-1)^{|\alpha|}\int\varphi\,D^\alpha u\,dx^N. \quad (60)$$
With the aforementioned notations and definitions, for $|\alpha| = 1$, $p = 2$, the right-hand side can be rewritten with the weighted Sobolev norm:
$$H_t \leq H_s + \int_s^t\Big(\int\big\|\nabla\log\varrho^N_u - \nabla\log\zeta^{\otimes N}_{T-u}\big\|^2_E\,w_0(x^N)\,d\nu^N_u\Big)^{1/2}du + \int_s^t\Big(\int\big\|\nabla^2\log\varrho^N_u - \nabla^2\log\zeta^{\otimes N}_{T-u}\big\|^2_F\,w_1(x^N)\,d\nu^N_u\Big)^{1/2}du \leq H_s + \int_s^t\|G_u\|_{W^{1,2}_w}\,du, \quad (61)$$
with the weight functions
$$w_0(t, x^N) = \|V^N\|_E, \qquad w_1(t) = \mathbb{E}\big\|\Sigma_t\Sigma_t^T\big\|_F. \quad (62)$$
To simplify the weighted norm and derive Eq. 20, we apply Hölder's inequality to the first term in the last line of Eq. 54; the constant $C_0$ can then be controlled by
$$C_0 \leq \Big(\int w_0^2(t, X^N_t)\,d\nu_t(X^N_t)\Big)^{1/2} \leq \Big(2\,\mathbb{E}_{\nu^N_t}\|X^N_t\|^2_E + \mathbb{E}_{\nu^N_t}\big\|\nabla\log\zeta^{\otimes N}_{T-t}(X^N_t)\big\|^2\Big)^{1/2} \leq \Big(C_3\sup_{t\in[0,T]}\mathbb{E}_{\nu^N_t}\|X^N_t\|^2_E + \mathbb{E}\|Y^N\|^2_E\,C_4\Big)^{1/2} \leq \Big[C_3C_2e^{C_2T}(1 + \mathbb{E}\|X^N_0\|^2_E) + \mathbb{E}\|Y^N\|^2_E\,C_4\Big]^{1/2} \lesssim \Big[C_3C_2e^{C_2T}(1 + dN) + NM_2(2, \zeta_0)C_4\Big]^{1/2} \lesssim \sqrt{N}\sqrt{d \vee M_2(2, \zeta_0)},$$
where we denote by $M(r, \nu)$ the $r$-th moment of the measure $\nu$ and use the fact that $\mathbb{E}\|Y^N\|^2_E = N\mathbb{E}|Y|^2 = NM_2(2, \zeta_0)$. The constants $C_3, C_4$ depend on the choice of the hyperparameters $\beta_{\min}$ and $\beta_{\max}$:
$$C_3 = \sup_{t\in[0,T]}\Big[2 + \frac{1}{\sigma_\zeta(T-t)}\Big], \qquad C_4 = \sup_{t\in[0,T]}\sigma_\zeta(T-t)^{-1}. \quad (65)$$
Applying Hölder's inequality again to the quadratic-variation term, we obtain an analogous upper bound in which $C_1$ can be bounded by
$$C_1 \leq \Big(\int\big\|\Sigma_s\Sigma_s^T\big\|_F\,ds\Big)^{1/2} \leq \frac{1}{N}\,T\sup_{t\in[0,T]}\big\|\Sigma_t\Sigma_t^T\big\|_F \leq d\sqrt{2T\big((1 - T)\beta_{\min} + T\beta_{\max}\big)}. \quad (67)$$
By using these results, Eq. 61 can be further refined as
$$\int\|G_t\|_{W^{1,2}_w}\,dt \lesssim \sqrt{N}\int\Big(\mathbb{E}\big\|\nabla\log\varrho^N_t - \nabla\log\zeta^{\otimes N}_{T-t}\big\|^2 + \mathbb{E}\big\|\nabla^2\log\varrho^N_t - \nabla^2\log\zeta^{\otimes N}_{T-t}\big\|^2\Big)^{1/2}dt. \quad (68)$$
By rewriting the inequality above, the proof is complete.
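Concretely, the $W^{1,2}$-type bound above suggests an estimator penalizing both the score deviation and its Jacobian deviation; a minimal PyTorch sketch (ours), with the Jacobian term approximated by a single Hutchinson probe and `target_score` / `target_hessian_diag` standing for the closed-form Gaussian targets:

```python
import torch

def sobolev_deviation(score_net, t, x, target_score, target_hessian_diag):
    """Monte-Carlo estimate of ||G_t||^2 in W^{1,2}: first-order score deviation
    plus a second-order (Jacobian) deviation, as in the bound above.
    The Hutchinson probe assumes symmetric Jacobian targets (e.g., Gaussians).
    """
    x = x.detach().requires_grad_(True)
    s = score_net(t, x)                                   # (N, d)
    first_order = ((s - target_score) ** 2).sum(dim=1)    # ||G_t||_E^2
    # Single-probe Hutchinson estimate of the Jacobian deviation in ||.||_F.
    v = torch.randn_like(x)
    jvp = torch.autograd.grad((s * v).sum(), x, create_graph=True)[0]
    second_order = ((jvp - target_hessian_diag * v) ** 2).sum(dim=1)
    return (first_order + second_order).mean()
```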
Remark. By the Sobolev embedding theorem (Brezis & Brézis, 2011), it is trivial to observe that the Sobolev space can be embedded into the $L^6$-space, i.e., $W^{1,2} \hookrightarrow L^6$, assuring a lower bound $\|G_t\|_{L^6} \leq S\|G_t\|_{W^{1,2}}$ with a numerical constant $S > 0$ related to the diameter of $\Omega_\mathcal{X}$, when one restricts to an open and bounded subset $\Omega_\mathcal{X} \subset \mathcal{X}$. Since Hölder's inequality naturally gives another embedding $L^6 \hookrightarrow L^2$, the chain of two embeddings bridges the gap between conventional score matching and the proposed MF-SM.

Corollary 3.2. (Sobolev Score Matching) Let $\|\cdot\|_W$ be a norm defined on the Sobolev space $W^{1,2}(\mathcal{X}^N, \nu^N_t)$ and $M := M(\zeta_0) < \infty$ be the second moment of the target data instance $Y \sim \zeta_0$. Then we have
$$\mathcal{H}^N_T(\nu^N_t) \leq M\int_0^T\big\|\nabla\log\varrho^N_t - \nabla\log\zeta^{\otimes N}_{T-t}\big\|_W\,dt. \quad (69)$$

Proof. The proof is a direct consequence of Theorem 3.1 and the subsequent inequalities:
$$\mathcal{H}^N_T \leq \int_0^T\|G_t\|_{W^{1,2}_w}\,dt + \mathcal{H}^N_0 \lesssim \sqrt{N}\int_0^T\Big(\mathbb{E}\big\|\nabla\log\varrho^N_t - \nabla\log\zeta^{\otimes N}_{T-t}\big\|^2 + \mathbb{E}\big\|\nabla^2\log\varrho^N_t - \nabla^2\log\zeta^{\otimes N}_{T-t}\big\|^2\Big)^{1/2}dt \lesssim \sqrt{\frac{d}{N}M_2(2, \zeta_0)}\,\big[\beta_{\min}T(1 - T) + T^2\beta_{\max}\big]^2\int_0^T\|G_t\|^2_{W^{1,2}}\,dt \lesssim M_2(2, \zeta_0)\int_0^T\big\|\nabla\log\varrho^N_t - \nabla\log\zeta^{\otimes N}_{T-t}\big\|^2_{W^{1,2}}\,dt, \quad (70)$$
where we assume $T = 1.0$, $d = 3$, and $M_2 \vee \beta_{\max} \geq d$ in the last line. The relative entropy of the $N$-particle system approaches zero when the right-hand side of Eq. 70 converges to zero. With the fact that $\mathcal{H}^N_0 = 0$ by the assumptions on the initial states of the dWGFs, the proof is complete.

A.5. Variation Equation in the Infinite Particle System

Time-inhomogeneous Markov Process for the N-particle System. While the proposed denoising MF VP-SDEs are modeled as time-inhomogeneous Markovian dynamics, this section starts by providing basic material for further understanding and analysis of the asymptotic behavior of the proposed MF-SDEs/dWGFs. Given the structure of MF-SDEs with joint density $\varrho^N_t$, the entire $N$-particle system possesses a $\mathcal{P}_N(\mathcal{X})$-valued Markovian property, where its semi-group and infinitesimal generator are given by
$$L^N_t\varrho^N_t(x^N) = \sum_i L^i_t\varrho^N_t(x_1, \ldots, x_i, x_{i+1}, \ldots, x_N), \quad (71)$$
$$L^i_t\varrho^{i,N}_t = \nabla_{\mathcal{P}_2}E_t\,\varrho^{i,N}_t = \nabla\cdot\big(\varrho^{i,N}_t\,\nabla_{x_i}V^N\big) + \frac{\beta}{2}\nabla^2_{x_i}\varrho^{i,N}_t. \quad (72)$$
Note that the Liouville equation in Sec. 3.1, representing the probabilistic formulation of MF-SDEs, is based on the infinitesimal generator defined as above. For function families $f, g \in \mathrm{Dom}(L^N_t)$, we associate the infinitesimal generator with its first- and second-order carré du champ operators (Bakry, 1997) $\Gamma, \Gamma_2$, defined by
$$\Gamma(t)(f, g) := \frac{1}{2}\big(L^N_t(fg) - f\,L^N_tg - g\,L^N_tf\big), \quad (73)$$
$$\Gamma_2(t)(f) := \frac{1}{2}\big(L^N_t\Gamma(f) - 2\Gamma(f, L^N_tf)\big). \quad (74)$$
Recall that we say the diffusion $L^N$ for the probability measure of a time-homogeneous Markov process enjoys the logarithmic Sobolev inequality if $\Gamma_2(f) \geq \upsilon\,\Gamma(f, f)$ for some $\upsilon \in \mathbb{R}_+$. The goal is to generalize this type of functional inequality to time-inhomogeneous dynamics. To this end, consider a diffusion process with infinitesimal generator $L_t$ of the form
$$L^1_tf = \sum_{a,b \leq d}[\sigma\sigma^T]_{ab}(t)\,\partial_a\partial_bf + \sum_{a \leq d}v_a(t, x)\,\partial_af, \quad (75)$$
where the infinitesimal generator $L^1_t$ is associated with an SDE of the following type:
$$dX^1_t = v(t, X^1_t, \nu^1_t)\,dt + \sigma(t)\,dW_t. \quad (76)$$
Let $P^N_tf(x) = \mathbb{E}[f(X^N_t)\,|\,X_0 = x]$ be the semi-group related to $L^N_t$. By direct calculation, the first- and second-order carré du champ operators can be estimated as
$$\Gamma(t)(f, f) = [\sigma\sigma^T](t)\,\|\nabla f\|^2_E, \quad (77)$$
$$\Gamma_2(t)(f) = \|\nabla^2f\|^2_F - \nabla f\cdot J(v_t)\,\nabla f, \quad (78)$$
$$\partial_t\Gamma(t)(f) = \partial_t[\sigma\sigma^T](t)\,\|\nabla f\|^2_E, \qquad f \in \mathrm{Dom}(L^N_t), \quad (79)$$
where $J$ denotes the Jacobian operator. Then the time-inhomogeneous semigroup $P_t$ is said to satisfy a log-Sobolev inequality if the Bakry-Émery criterion in Eq. 80 holds for any suitable $f$:
$$\Gamma_2(t)(f) + \frac{1}{2}\partial_t\Gamma(t)(f) \geq \kappa(t)\,\Gamma(t)(f). \quad (80)$$

Generalized Logarithmic Sobolev Inequality. Under the condition described in Eq. 80, Theorem 3.10 (Collet & Malrieu, 2008) ensures the existence of a $\Phi$-logarithmic Sobolev inequality:
$$\mathrm{Ent}^\Phi_{\nu_t}(g) \leq c(t)\,P_t\big(\Phi''(g)\,\Gamma(t)(g)\big), \qquad c(t) = \int_0^t\exp\Big(-2\int_v^t\kappa(u)\,du\Big)dv, \quad (81)$$
where $\Phi : \mathbb{R} \to \mathbb{R}$ is a smooth convex function and the $\Phi$-entropy is given by
$$\mathrm{Ent}^\Phi_\nu(f) = \int\Phi(f)\,d\nu - \Phi\Big(\int f\,d\nu\Big). \quad (82)$$
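For intuition about the size of $c(t)$ in Eq. 81, it can be evaluated numerically; a small NumPy sketch (ours) under the VP schedule of Table 4, using the curvature reading $\kappa(t) = \beta_t(1/2 + 1/\sigma^2_\zeta(t))$ obtained later in the proof of Theorem 4.3 (this choice of $\kappa$ is our assumption):

```python
import numpy as np

# Sketch: numerically evaluate the time-inhomogeneous log-Sobolev constant
# c(t) = int_0^t exp(-2 int_v^t kappa(u) du) dv (Eq. 81) for the VP schedule.
beta_min, beta_max, T = 0.1, 20.0, 1.0
ts = np.linspace(1e-3, T, 1000)
dt = ts[1] - ts[0]
beta = beta_min + ts * (beta_max - beta_min)
int_beta = np.cumsum(beta) * dt               # crude Riemann sum of int beta_s ds
sigma2 = 1.0 - np.exp(-int_beta)              # sigma_zeta^2(t), cf. Eq. 147
kappa = beta * (0.5 + 1.0 / sigma2)           # assumed curvature estimate

def c_of_t(i):
    K = np.cumsum(kappa) * dt                 # int_0^t kappa(u) du on the grid
    inner = K[i] - K[:i + 1]                  # int_v^t kappa(u) du
    return np.trapz(np.exp(-2.0 * inner), ts[:i + 1])

print(c_of_t(len(ts) - 1))                    # c(T)
```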
Define $LS_\Phi[c(t)]$, with respect to $c(t)$ in Eq. 81, as the constant related to the generalized $\Phi$-log-Sobolev inequality. Then the constant $LS_\Phi$ associated with a product measure is readily derived using the subsequent result:

Lemma A.6. (Stability under products) (Bakry et al., 2014) If $(\mathcal{X}^N, \mu_1, L^{1,N}_t)$ and $(\mathcal{X}^N, \mu_2, L^{2,N}_t)$ satisfy logarithmic Sobolev inequalities $LS_\Phi[c_1(t)]$ and $LS_\Phi[c_2(t)]$ respectively, then the product $(\mathcal{X}^N \times \mathcal{X}^N, \mu_1 \otimes \mu_2, L^{1,N}_t \oplus L^{2,N}_t)$ satisfies a logarithmic Sobolev inequality $LS_\Phi[\max(c_1(t), c_2(t))]$.

By the result of Lemma A.6, it is straightforward to show that the $N$-product of Gaussian measures in the forward noising process, $\zeta^{\otimes N}_t$, preserves the log-Sobolev constant of its single component $\zeta_t$.

Theorem A.7. (HWI inequality) (Otto & Villani, 2000) Let $d\nu \propto e^{-W}dx$ be a probability measure on $\mathcal{X}$ with finite second moments, such that $W \in C^2(\mathcal{X})$, $\nabla^2W \succeq \kappa I_d$, $\kappa \in \mathbb{R}$. Then $\nu$ satisfies the log-Sobolev inequality with constant $LS(\kappa, \infty)$, and for any other absolutely continuous measure $\nu_0$, the following inequality holds:
$$H(\nu_0|\nu) \leq W_2(\nu_0, \nu)\sqrt{I(\nu_0|\nu)} - \frac{\kappa}{2}W_2^2(\nu_0, \nu). \quad (83)$$
The inequality above equally indicates that
$$H(\nu_0|e^{-W}) - H(\nu_1|e^{-W}) \leq W_2(\nu_0, \nu_1)\sqrt{I(\nu_0|e^{-W})} - \frac{\kappa}{2}W_2^2(\nu_0, \nu_1), \quad (84)$$
where $H$, $I$ denote the non-normalized relative entropy and relative Fisher information, respectively.

Remark. It should be emphasized that the functionals described in Theorem A.7 are presented in non-normalized forms, while the $N$-particle entropy in Eq. 11 is defined as its normalized counterpart. This distinction in notation, while subtle, is made explicit by the context and is intentionally simplified here for brevity.

Proposition 3.4. The N-particle entropy for infinite cardinality $N$ has the upper bound
$$\mathcal{H}_T(\mu_T) \leq \sqrt{Nd}\,J^N_{MF}(\theta, [0, T]) + \kappa_\zeta\,O\Big(\frac{C_2}{N} + \frac{C_3}{N^{1/2}} + \frac{C_4}{N^{3/2}}\Big) \xrightarrow{N \to \infty} 0. \quad (85)$$
We define the numerical constants $C_2 := C_2[\beta_T, C_B, \sigma_\zeta(T)]$ and $C_3 := C_3[D, \sigma_\zeta(T), M, \beta_T, m_\zeta(T)]$, where each of $\beta_T, C_B, \sigma_\zeta, D, M(2, \zeta_0), m_\zeta$ is independent of the data cardinality $N$.

Proof. For any fixed $t \in [0, T]$, let us repurpose the stationary density of the time-varying Ornstein-Uhlenbeck process for the VP-SDE:
$$m(x) \propto e^{-W_\zeta(t, x)} = e^{-\sum^N_j\kappa_\zeta\|x_j - Ym_\zeta(t)\|^2_E}, \quad (86)$$
where we denote $\kappa_\zeta = \sigma^{-2}_\zeta(t)$. By direct calculation, one has
$$\nabla^2W_\zeta(t, x_j) \succeq \kappa_\zeta I_d, \qquad \nabla^2W^N_\zeta(t, x^N) \succeq \kappa_\zeta I_{Nd}. \quad (87)$$
Consider $\mu_t$ as a solution to the Liouville equation in Eq. 18 in the limit $N \to \infty$, and let $\mu^{\otimes N}_t$ be the $N$-product of $\mu_t$. For any $N \leq |\mathcal{N}|$, the normalized variant of the HWI inequality in Theorem A.7 shows that the following holds for any $N$:
$$Ne_N := NH(\mu^{\otimes N}_t|m\,dx^N) - NH(\nu^N_t|m\,dx^N) \leq \underbrace{W_2(\nu^N_t, \mu^{\otimes N}_t)}_{(A)}\,\underbrace{\sqrt{I(\nu^N_t|m\,dx^N)}}_{(B)} - \underbrace{\frac{\kappa}{2}W_2^2(\nu^N_t, \mu^{\otimes N}_t)}_{(A')}. \quad (88)$$
We first derive the upper bound of (B) by estimating $I$, which stands for the relative Fisher information. Assuming $s^N_\theta \in \mathrm{Ker}(G)$ and Eq. 39, we have
$$I(\nu^N_t|m\,dx^N) := \int\Big\|\nabla\log\frac{\varrho^N_t}{e^{-W}}\Big\|^2d\nu^N_t \leq \int\|\nabla\log\varrho^N_t\|^2_E\,d\nu^N_t + \int\|\nabla W_\zeta\|^2_E\,d\nu^N_t \leq D\big(1 + \mathbb{E}\|X^N_t\|^2_E\big) + 4\kappa^2_\zeta\big(\mathbb{E}\|X^N_t\|^2_E + NM_2(2, \zeta_0)m^2_\zeta(t)\big) \leq \big(D + 4\kappa^2_\zeta\big)(1 + D)\big[NM_2(2, \zeta_0) + Nd\beta^2_T + D\big]e^D + NM_2(2, \zeta_0)m^2_\zeta(t). \quad (89)$$
As a next step, we investigate the upper bound of the 2-Wasserstein distance involved in (A) and (A'). First, we define the two dynamics $(X^N_t, \bar X^N_t)$ as
$$\begin{cases} dX^N_t = f^N_t(X^N_t)\,dt - \beta_ts_\theta(t, X^N_t, \nu^N_t)\,dt + \sqrt{\beta_t}\,dB^N_t, \\ d\bar X^N_t = f^N_t(\bar X^N_t)\,dt - \beta_ts_\theta(t, \bar X^N_t, \mu^{\otimes N}_t)\,dt + \sqrt{\beta_t}\,dB^N_t. \end{cases} \quad (90)$$
By using Itô's formula and the Burkholder-Davis-Gundy inequality, one can deduce that
$$\mathbb{E}\sup_{s \leq t}\|X^N_s - \bar X^N_s\|^2_E \leq 4T\sup_{t\in[0,T]}\beta^2_t\int_0^T\mathbb{E}\big\|s^N_\theta(t, X^N_t, \nu^N_t) - s^N_\theta(t, \bar X^N_t, \mu^{\otimes N}_t)\big\|^2dt \leq C_B\int_0^TW_2^2(\nu^N_t, \mu^{\otimes N}_t)\,dt + 2(C_A \vee C_B)\int_0^T\mathbb{E}\sup_{s \leq t}\|X^N_s - \bar X^N_s\|^2_E\,dt. \quad (91)$$
With the fact that $\mathrm{Lip}(s_\theta) = \mathrm{Lip}(s^N_\theta)$, applying Grönwall's Lemma gives
$$\sup_tW_2^2(\nu^N_t, \mu^{\otimes N}_t) \leq \mathbb{E}\sup_t\|X^N_t - \bar X^N_t\|^2_E \leq 4\beta^2_TTC_B\exp\big(8\beta^2_TT^2(C_A \vee C_B)\big)\int_0^TW_2^2(\nu^N_t, \mu^{\otimes N}_t)\,dt \leq a + 4\beta^2_TTC_B\exp\big(8\beta^2_T(C_A \vee C_B)\big)\int_0^T\sup_{s \leq t}W_2^2(\nu^N_s, \mu^{\otimes N}_s)\,dt \leq e^{4\beta^2_TC_B}, \quad (92)$$
since $a > 0$ is an arbitrary positive constant; the final term is optimized by setting $a = \exp(\exp(-8\beta_T(C_A \vee C_B))) - 1$ in the third inequality, and we apply Grönwall's Lemma again. By rewriting the HWI inequality in Eq. 88 and setting $t = T$, we have
$$\mathcal{H}_T(\mu_T) \leq \underbrace{\mathcal{H}_T(\nu^N_T)}_{\text{Corollary 3.2}} + e_N. \quad (93)$$
It is noteworthy that the first term on the right-hand side can be controlled by Corollary 3.2. The error term $e_N$, called the cardinality error, is determined by aggregating Eq. 92 and Eq. 89, and is inversely proportional to the cardinality $N$:
$$e_N \leq \frac{1}{N}e^{2\beta^2_TC_B}\Big(\frac{\kappa_\zeta}{2}e^{2\beta^2_TC_B} + \sqrt{D + 4\kappa^2_\zeta}\sqrt{(1 + D)\big[NM_2(2, \zeta_0) + Nd\beta^2_T + D\big]e^D + NM_2(2, \zeta_0)m^2_\zeta(T)}\Big) \leq \kappa_\zeta O\Big(\frac{C_2}{N} + \frac{C_3}{N^{1/2}} + \frac{C_4}{N^{3/2}}\Big) \xrightarrow{N \to \infty} 0, \qquad \kappa_\zeta(T) := \sigma^{-2}(T), \quad (94)$$
where $C_2 := C_2[\beta_T, C_B]$, $C_3 := C_3[D, M, \beta_T, m_\zeta(T)]$, and $C_4 := C_4[D, M, \beta_T, m_\zeta(T)]$. The proof is complete by rewriting Eq. 93 with the $e_N$ computed above.

This section explicates the division of the chaotic entropy into $K$ smaller sub-problems, each with notably low cardinality $\{N_k\}_{k \leq K}$. The foundation of the proof relies on the strategic use of the HWI inequality.

Proposition 4.1. (Subdivision of Chaotic Entropy) Let $\mathcal{N} = \{N_k\}$ be a set of strictly increasing cardinalities and $\mathcal{T} = \{t_k\}$ a partition of the interval $[0, T]$, where $k \in \{0, \ldots, K\}$. Under the condition $s_\theta \in \mathrm{Ker}(G)$, the chaotic entropy can be split into $K$ sub-problems:
$$\mathcal{H}_T(\mu_T) \leq \lim_{K\to\infty}\sum_k\Big[\kappa_\zeta O\Big(\frac{C_2}{N_{k+1}} + \frac{C_3}{N^{1/2}_{k+1}} + \frac{C_4}{N^{3/2}_{k+1}}\Big) + J_{MF}(N_k, \theta, [t_k, t_{k+1}])\Big]. \quad (95)$$
The damping ratio $b \in \mathbb{N}_+$, $N_{k+1} = b\,N_k$, controls the influence of each sub-problem.

Proof. Let us specify the cardinality set as $\mathcal{N} = \{N_k;\ N_{k+1} = b\,N_k,\ k \in \{1, \ldots, K\},\ b \in \mathbb{N}_+\}$, where we set $\sup\mathcal{N} = N$. We have
$$H(\nu^{N_{k+1}}_t|m\,dx^{N_{k+1}}) \geq b\,H(\nu^{N_k}_t|m\,dx^{N_k}) - e_{N_{k+1}}. \quad (96)$$
Note that the $N$-particle relative entropy for a measure product can be decomposed into $b$ copies of the original measure:
$$H(\nu^{N_{k+1}}_t) = H([\nu^{N_k}_t]^{\otimes b}) = \int\log[\varrho^{N_k}_t]^{\otimes b}(x^{N_{k+1}})\,d[\nu^{N_k}_t]^{\otimes b}(x^{N_{k+1}}) - \int\log\zeta^{\otimes N_{k+1}}_{T-t}(x^{N_{k+1}})\,d[\nu^{N_k}_t]^{\otimes b}(x^{N_{k+1}}) = b\,H([\nu^{N_k}_t]). \quad (97)$$
The equality can be easily seen by showing that
$$\int\log[\varrho^{N_k}_t]^{\otimes b}(x^{N_{k+1}})\,d[\nu^{N_k}_t]^{\otimes b}(x^{N_{k+1}}) = \int\Big(\sum^b_{i=1}\log\varrho^{N_k}_t(\pi^i_{N_k}x^{N_{k+1}})\Big)d[\nu^{N_k}_t]^{\otimes b}(x^{N_{k+1}}) = b\int_{\mathcal{X}^{N_k}}\log\varrho^{N_k}_t\,d\nu^{N_k}_t(x^{N_k}), \quad (98)$$
and the log-probability with the projected component can be calculated as
$$\int\Big(\sum^b_{i=1}\log\zeta^{\otimes N_k}_{T-t}(\pi^i_{N_k}x^{N_{k+1}})\Big)d[\nu^{N_k}_t]^{\otimes b}(x^{N_{k+1}}) = b\int_{\mathcal{X}^{N_k}}\log\zeta^{\otimes N_k}_{T-t}(x^{N_k})\,d\nu^{N_k}_t(x^{N_k}). \quad (99)$$
The above calculations are valid for any subsequent elements $N_k < N_{k+1}$ and $\pi^i_{N_k}x^{N_{k+1}} = (x_{ib}, \ldots, x_{(i+1)b}) \in \mathcal{X}^{N_k}$. By rewriting Eq. 96, we have
$$H(N_k, t, \nu^{N_k}_t|m\,dx^{N_k}) \leq \frac{1}{b}\big[H(N_{k+1}, t, \nu^{N_{k+1}}_t|m\,dx^{N_{k+1}}) + e_{N_{k+1}}\big]. \quad (100)$$
Let $t_k \leq t_{k+1}$ be subsequent elements of the partition $\mathcal{T}$. Combining Eq. 96 and Eq. 70, we can show the following:
$$H(N_0, t_0, \nu^{N_0}_{t_0}|\zeta^{\otimes N_0}_0)\ \overset{\text{Eq. 96}}{\leq}\ \frac{1}{b}H(N_1, t_0, \nu^{N_1}_{t_0}|\zeta^{\otimes N_1}_0) + e_{N_1}\ \overset{\text{Eq. 70}}{\leq}\ \frac{1}{b}\Big[H(N_1, t_1, \nu^{N_1}_{t_1}|\zeta^{\otimes N_1}_0) + M_2(2, \zeta_0)\sqrt{\frac{d}{N_1}}\int_{t_0}^{t_1}\|G_t\|^{2,\nu^{N_1}_{t_1}}_{W^{1,2}}\,dt\Big] + e_{N_1}. \quad (101)$$
Note that the Sobolev norm is taken with respect to the law of the temporal marginals of the Cauchy sequence $(X^{k,N}_{(\cdot)})_{(N)}$ at timestamp $t = t_{k+1}$ under the cardinality condition $N = N_{k+1}$, i.e., $\nu^{N_{k+1}}_{t_{k+1}}$. Given the fact that $t_{N_K} = T$, one can iterate the recursion until reaching the target cardinality $N_k \nearrow N_K$:
$$\underbrace{H(N_0, 0, \nu^{N_0}_0|\zeta^{\otimes N_0}_0)}_{=0} \leq H(N_K, T, \nu^{N_K}_T|\zeta_T) + M_2(2, \zeta_0)\sum_k\sqrt{\frac{d}{N_{k+1}}}\int_{t_k}^{t_{k+1}}\|G_t\|^{2,\nu^{N_{k+1}}_{t_{k+1}}}_{W^{1,2}}\,dt + \sum_ke_{N_{k+1}}. \quad (102)$$
We show that the left-hand side equals $0$ by the assumption that the initial states are distributed by the standard Gaussian:
$$H(N, 0, \nu^N_0|\zeta^{\otimes N}_0) = H\big(\mathcal{N}^{\otimes N}[I_d]\,\big|\,\mathcal{N}^{\otimes N}[I_d]\big) = 0, \quad \forall N \in \mathbb{N}. \quad (103)$$
Combining this result and rearranging the terms on both sides of Eq. 102 yields the inequality
$$H(\nu^{N_K}_T|\zeta^{\otimes N_K}_0) = H(N_K, T, \nu^{N_K}_T|\zeta_T) \leq \sum_k\big[J_{MF}(N_k, \theta, [t_k, t_{k+1}]) + e_{N_{k+1}}\big]. \quad (104)$$
Recall the fact that the chaotic entropy converges, as PoC is guaranteed:
$$H(\mu_T|\zeta_0) = \lim_{N\to\infty}H(\nu^N_T|\zeta^{\otimes N}_0). \quad (105)$$
To summarize, we have the desired result and complete the proof:
$$\mathcal{H}_T(\mu_T) \leq M_2(2, \zeta_0)\sum_k\Big[J_{MF}(N_k, \theta, [t_k, t_{k+1}]) + \kappa_\zeta O\Big(\frac{C_2}{N_{k+1}} + \frac{C_3}{N^{1/2}_{k+1}} + \frac{C_4}{N^{3/2}_{k+1}}\Big)\Big]. \quad (106)$$

A.6. Comparison of Variational Equations

With the definition of the non-normalized relative entropy $H$, we derive the variational equation with respect to the temporal derivative. Let $\rho_t, \zeta_t$ be density representations of the forward and reverse diffusion dynamics of the FR-SDEs. Taking a temporal derivative (i.e., $\partial_t$) gives the following equality:
$$H(\rho_0|\zeta_T) = \int_0^T\partial_tH(\rho_t|\zeta_{T-t})\,dt + H(\rho_T|\zeta_0). \quad (107)$$
By rearranging both terms above and using the divergence theorem, one can obtain the closed form of the relative entropy as
$$H(\nu_0|\zeta_T) = \frac{\sigma^2}{2}\int_0^T\mathbb{E}_{Y_t\sim\rho_tdx}\big[\|\nabla\log\rho_t - \nabla\log\zeta_t\|^2\big]\,dt, \quad \text{VP-SDE (Song et al., 2021b)}. \quad (108)$$
On the other hand, the proposed Wasserstein variational equation gives the inequality
$$H(\nu^N_0|\zeta^{\otimes N}_T) \leq M\int_0^T\big\|\nabla\log\varrho^N_t - \nabla\log\zeta^{\otimes N}_{T-t}\big\|_W\,dt, \quad \text{MF-CDMs}. \quad (109)$$
Given the definitions above, we detail three notable differences here:

Impact of Cardinality N. In contrast to conventional score-matching objectives, which are incapable of revealing the impact of data cardinality, our score-matching formula in Eq. 109, derived from the Wasserstein variational equation, explicitly exposes the detailed dependence on the particle count.

Cardinality-Adaptive Discrepancy. As can be seen, existing approaches based on the temporal derivative, as in Eq. 108, overlook the influence of data dimensionality in the estimation of the discrepancy. In contrast, the proposed variational equation based on the Itô-Wentzell-Lions formula (known as Itô's flow of measures) in Eq. 109 effectively cancels the dimensionality effect. Moreover, the proposed parameterization of the score function, endowed with the reducible structure outlined in the preceding section, provides clarity on the architecture's scalability for increasing $N$, contrasting with the heuristic model choices prevalent in existing architecture design.

Higher-order Information. As a result of the geometric deviation induced by the Itô-Wentzell-Lions formula, our methodology adopts the Sobolev norm on $W^{1,2}$. It additionally compares the second derivatives of score functions, applying more stringent constraints to achieve a higher level of accuracy in estimating the discrepancy. Meanwhile, the computational overhead remains minimal, as the Hessian of the log-probability exhibits at most constant complexity, i.e., $\nabla^2\zeta^{\otimes N}_{T-t} \propto \sigma^{-2}_\zeta O(1)$. This simplicity in computation ensures efficiency in practical applications.
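Since the forward marginals $\zeta_t$ are Gaussian, both the first- and second-order targets entering Eq. 109 are available in closed form; the following NumPy sketch (ours) computes them from the schedule of Eqs. 146-147 below and makes the constant-complexity Hessian explicit:

```python
import numpy as np

def gaussian_score_targets(x, y, t, beta_min=0.1, beta_max=20.0):
    """Closed-form first/second-order targets for the VP forward kernel
    zeta_t = N(m(t) * y, sigma^2(t) I) (cf. Eqs. 146-147). The Hessian of
    log zeta_t is the constant -I / sigma^2, so the second-order term in the
    W^{1,2} objective adds only O(1) overhead, as noted above.
    """
    int_beta = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2
    m = np.exp(-0.5 * int_beta)               # m_zeta(t)
    sigma2 = 1.0 - np.exp(-int_beta)          # sigma_zeta^2(t)
    score = -(x - m * y) / sigma2             # grad log zeta_t(x)
    hessian_diag = -np.ones_like(x) / sigma2  # diagonal of grad^2 log zeta_t
    return score, hessian_diag
```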
A.7. Particle Branching and the Monge-Ampère Equation

The following result shows that the Monge-Ampère equation sheds light on the precise way in which optimal particle branching modifies the score function, especially when the score networks solve the proposed MF-SM objective optimally.

Proposition A.8. For the optimal parameter profile $\theta = \theta^*$ solving the proposed MF-SM objective, we have
$$\nabla\log\varrho^{N_k}_t(x^{N_k}) = \begin{cases} \nabla\log\varrho^{N_k,bN_k}_t(\Phi_{\theta^*})(x^{N_k}) + \nabla\log\det(J\Phi_{\theta^*})(x^{N_k}), \\ \nabla\log\varrho^{cN_k,bN_k}_t(x^{N_k}), \quad 1 < c \leq b \in \mathbb{N}_+. \end{cases} \quad (110)$$
For affine transforms $\Phi_\theta(x) = F_\theta x + e_\theta$ with any neural parameters $F_\theta \in \mathrm{GL}(d)$ and $e_\theta \in \mathbb{R}^d$, the gradient of the log-determinant vanishes (i.e., $\nabla\log\det(J\Phi) = 0$) almost everywhere $[\nu^{N_k}_t]$.

Proof. Assume the scenario where the branching ratio is $b = 2$, i.e., the number of particles is doubled after branching. Considering the necessity for the push-forward mapping to be optimal, in the case of the optimal parameter $\theta^*$, which solves the problem (P2), one has the following representation for arbitrary $M, N$ satisfying $N_k < bN_k \leq N$:
$$(\mathrm{Id}_{b-1} \otimes \Psi_{\theta^*})_\#\nu^{N_k}_t = \nu^{bN_k}_t. \quad (111)$$
By Brenier's theorem on optimal transport maps, there exists a convex $\phi$ such that $\nabla\phi$ optimally transports $\nu^{N_k}_t$, i.e., $(\nabla\phi)_\#\nu^{N_k}_t = \nu^{N_k,bN_k}_t$. On the other hand, the optimal particle branching function needs to assure the following equalities:
$$\Phi_{\theta^*\#}\,\nu^{N_k}_t = \nu^{N_k,bN_k}_t, \qquad (\Phi_{\theta^*})^{-1}_\#\,\nu^{N_k,bN_k}_t = \nu^{N_k}_t. \quad (112)$$
Whenever we can specify $\Phi_{\theta^*} = \nabla\phi$ almost everywhere, we have the second-order partial differential equation known as the Monge-Ampère equation:
$$\varrho^{N_k}_t = \varrho^{N_k,bN_k}_t(\Phi_{\theta^*})\det(J\Phi_{\theta^*}), \qquad \log\varrho^{N_k}_t(x^{N_k}) = \log\varrho^{N_k,bN_k}_t(\Phi_{\theta^*}) + \log\det(J\Phi_{\theta^*}). \quad (113)$$
The result is a restatement of the above equality. For an affine transformation $\Phi_\theta(x) = F_\theta x + e_\theta$ with neural parameters $F_\theta \in \mathrm{GL}(d)$ and $e_\theta \in \mathbb{R}^d$, the log-determinant $\log\det(J\Phi_\theta) = \log|\det F_\theta|$ is constant in $x$, so its gradient vanishes and the result follows.

A.8. Chaotic Convergence of dWGFs

This section provides comprehensive proofs for the two concentration results presented in Sec. 4.1.

Theorem 4.2. (Concentration of Chaotic Dynamics) For a constant $f(\kappa)$ dependent on the log-Sobolev constant $\kappa$, and $h(R)$ dependent on the radius of convolution, we have
$$\mathbb{P}\big[H(\nu^{M,N}_t|\mu^{\otimes M}_t) \geq \varepsilon\big] \leq O(\varepsilon^{-d})\,O\big(\exp\big(-Mf(\kappa)\varepsilon^2 - Mf(\kappa)h(R)\big)\big). \quad (114)$$

Remark. Since the proof is a direct modification of results in (Bolley et al., 2007; Bolley, 2010), for the sake of simplicity we only provide the modified descriptions; the details can be found in the literature.

Proof. We first assume $N \geq M$ and define the deviation between the two vector fields for the $N$- and $M$-particle systems on $\mathcal{X}^M$:
$$\delta V_t := s^{M,N}_\theta(t, \cdot, \nu^N_t) - s_\theta(t, \cdot, \nu^M_t),$$
where $s^{M,N}_\theta$ denotes the first $M$ components of $s_\theta$ among the $N$ components. By the Girsanov theorem (Øksendal, 2003) and the exchangeability induced by the reducibility of $s_\theta$, the Radon-Nikodym derivative can be represented as
$$\frac{d\nu^M_t}{d\nu^{M,N}_t} = \exp\Big(\int_{[0,t]}\delta V^i_s\,dW_s - \frac{1}{2\sigma^2_t}\int\|\delta V^i_s\|^2_E\,ds\Big), \quad 1 \leq i \leq M, \quad (115)$$
where $\delta V^i_t$ is the $i$-th component of $\delta V_t$, $W_{(\cdot)}$ is a $\nu^{M,N}_{[0,T]}$-adapted Brownian motion, and thus $d\nu^N_t/d\nu^M_t$ is a $\nu^{M,N}_t$-martingale. Assuming $(2\sigma_t)^{-1} \leq D_\sigma$ for a numerical constant $D_\sigma$, the definition of the normalized entropy gives
$$H(\nu^M_t|\nu^{M,N}_t) = \frac{1}{M}\mathbb{E}\Big[\log\frac{d\nu^M_t}{d\nu^{M,N}_t}\Big] \leq \frac{1}{M}\sum_{i \leq M}\int_{[0,T]}\mathbb{E}_{\nu^M_t}\big[\|\delta V^i_t\|^2_E\big]\,ds \leq D_\sigma\sup_{t\in[0,T]}\mathbb{E}_{\nu^M_t}\big[\|\delta V_t\|^2_E\big], \quad (116)$$
where the last inequality is induced by the exchangeability of the system. Let us define the two empirical projections $\hat\nu^{M,N}_t$ and $\hat\nu^M_t$ as
$$\hat\nu^{M,N}_t := \frac{1}{M}\sum_m\delta_{X^{m,N}_t}, \qquad \hat\nu^M_t := \frac{1}{M}\sum_m\delta_{X^m_t}. \quad (117)$$
For the $d$-dimensional Euclidean ball $B^x_R = B(x, R)$ of radius $R$ centered at $x$, we consider the truncated measures
$$X^{j,R}_t \sim \nu^{j,N,R}_t(dx) := \frac{\chi_{B^{X^{j,M}}_R}\,\nu^{j,N}_t(dx)}{\nu^{j,N}_t[B^{X^{j,M}}_R]}, \qquad Y^{i,R}_t \sim \nu^{i,M,R}_t(dy) := \frac{\chi_{B^{X^{j,M}}_R}\,\nu^{i,M}_t(dy)}{\nu^{i,M}_t[B^{X^{j,M}}_R]}, \quad i = j \leq M \leq N, \quad (118)$$
and we define empirical measures for the truncated representations above:
$$\hat\nu^{M,N,R}_t := \frac{1}{M}\sum_j\delta_{X^{j,R}_t}, \qquad \hat\nu^{M,R}_t := \frac{1}{M}\sum_i\delta_{Y^{i,R}_t}. \quad (119)$$
Next, our objective is to demonstrate a probability inequality concerning the Euclidean norm of the deviation $\delta V_t$ for any given $t \in [0, T]$:
$$\mathbb{P}\big[D_\sigma\mathbb{E}_{\nu^M_t}\|\delta V_t\|^2_E \geq \varepsilon\big] \leq \mathbb{P}\big[D_\sigma\mathbb{E}_{\nu^M_t}\big\|[B_\theta \ast_B \nu^{M,N}_t] - [B_\theta \ast_B \nu^M_t]\big\|^2 \geq \varepsilon\big] \leq \mathbb{P}\Big[C^\sigma_B\sup_lW_2^2(\hat\nu^{M,N,R}_t, \hat\nu^{M,R}_t)\big|_l \geq \varepsilon\Big] = \mathbb{P}\big[C^\sigma_BW_2^2(\hat\nu^{M,N,R}_t, \hat\nu^{M,R}_t)\big|_{l=l^*} \geq \varepsilon\big], \quad (120)$$
where we define the index $l^*$ attaining the maximal Wasserstein distance and scale the constant $C^\sigma_B = D_\sigma C_B$. It is worth noting that the term in the last line of Eq. 120 contains randomness, since the two representations $\hat\nu^{M,N,R}_t$ and $\hat\nu^{M,R}_t$ are empirical projections defined in the space $\mathcal{P}(\mathcal{P}_2(\mathbb{R}^d))$. Setting $\varepsilon' = \varepsilon(C^\sigma_B)^{-1}$, there exist constants $\alpha_0, \alpha_1, \alpha_2 > 0$ such that the following can be obtained by the triangle inequality of the 2-Wasserstein distance and the Lipschitzness assumption in (H2):
$$\mathbb{P}\big[C^\sigma_BW_2^2(\hat\nu^{M,N,R}_t, \hat\nu^{M,R}_t)\big|_{l^*} \geq \varepsilon\big] \leq \mathbb{P}\big[W_2^2(\hat\nu^{M,N,R}_t, \nu^{j,N,R}_t) + W_2^2(\nu^{j,N,R}_t, \nu^{j,N}_t) + W_2^2(\nu^{j,N}_t, \nu^{i,M}_t) + W_2^2(\nu^{i,M}_t, \nu^{i,M,R}_t) + W_2^2(\nu^{i,M,R}_t, \hat\nu^{M,R}_t) \geq \varepsilon'\big] \leq \mathbb{P}\big[W_2^2(\hat\nu^{M,N,R}_t, \nu^{j,N,R}_t) \geq \varepsilon'a_0 - 2(|E_x \vee E_y|)R^2\exp(-\alpha_0R^2) - 4\alpha_1\exp(\alpha_2)\big] + \mathbb{P}\big[W_2^2(\nu^{i,M,R}_t, \hat\nu^{M,R}_t) \geq \varepsilon'(1 - a_0) - 2(|E_x \vee E_y|)R^2\exp(-\alpha_0R^2)\big], \quad (121)$$
where we define the bounded exponential second moments of the empirical measures as
$$E_x := \mathbb{E}_{x\sim\nu^{M,N}_t}\exp(a_0\|x\|^2_E) < \infty, \qquad E_y := \mathbb{E}_{y\sim\nu^M_t}\exp(a_0\|y\|^2_E) < \infty. \quad (122)$$
Note that (H6) assures the boundedness of the above terms. Following analogous calculations to prior proofs with the Lipschitz constraints on $s_\theta$, invoking the Burkholder-Davis-Gundy inequality leads to the following result, where the constants $\alpha_1$ and $\alpha_2$ depend on $C_A, C_B$, and $\beta_t$:
$$W_2^2(\nu^{j,N}_t, \nu^{i,M}_t) \leq \sup_t\mathbb{E}\|X^{j,N}_t - X^{i,M}_t\|^2_E \leq \alpha_1(\beta_t, C_A, C_B, D_\sigma)\exp\big(\alpha_2(\beta_t, C_A, C_B, D_\sigma)\,T\big). \quad (123)$$
Consider the compact subset $B_R \subset \mathcal{X}$ lying in a Polish space and its corresponding probability space $A \subset \mathcal{P}(B_R)$. Exercise 6.2.19 (Dembo & Zeitouni, 2009) shows that the following probability inequality holds:
$$\mathbb{P}[\hat\nu^{M,R}_t \in A_1] \leq \mathcal{M}(A_1, \delta')\exp\Big(-M\inf_{\nu_A \in A_1^{\delta'}}H(\nu_A|\nu^{i,M,R}_t)\Big), \quad (124)$$
where $\mathcal{M}(A_1, \delta')$ stands for the metric entropy, referring to the smallest number of $\delta'$-Wasserstein balls (in the $W_2$ metric) necessary to cover the subset $A_1$. Similarly, we have
$$\mathbb{P}[\hat\nu^{M,N,R}_t \in A_2] \leq \mathcal{M}(A_2, \delta')\exp\Big(-M\inf_{\nu_A \in A_2^{\delta'}}H(\nu_A|\nu^{j,N,R}_t)\Big), \quad j \leq M. \quad (125)$$
For the purpose of deriving the upper bound on the Wasserstein distance, we specify the Wasserstein subspaces $A_1$ and $A_2$ as
$$A_1 = \big\{\nu \in \mathcal{P}(B^{x_i}_R);\ W_2^2(\nu, \nu^{i,M,R}_t) \geq \varepsilon'(1 - a_0) - 2(|E_x \vee E_y|)R^2\exp(-\alpha_0R^2)\big\} \subset \mathcal{P}(B^{x_i}_R), \quad (126)$$
$$A_2 = \big\{\nu \in \mathcal{P}(B^{x_j}_R);\ W_2^2(\nu, \nu^{j,N,R}_t) \geq \varepsilon'a_0 - 2(|E_x \vee E_y|)R^2\exp(-\alpha_0R^2) - 4\alpha_1\exp(\alpha_2)\big\} \subset \mathcal{P}(B^{x_j}_R), \quad (127)$$
$$A_1^{\delta'} = \big\{\nu \in \mathcal{P}(B^{x_i}_R);\ W_2^2(A_1, \nu) \leq \delta'\big\}, \qquad A_2^{\delta'} = \big\{\nu \in \mathcal{P}(B^{x_j}_R);\ W_2^2(A_2, \nu) \leq \delta'\big\}, \quad (128)$$
where $\{A^{\delta'}_a\}_{a=1,2}$ stands for the $\delta'$-thickening of $A_a$. We cover the subspace $A$ with Wasserstein balls of radius $\delta'/2$ in the $W_2$ metric. As the probability measure $\nu^{j,N}_t$ also satisfies Talagrand's inequality with the same constant as $\nu^{i,M}_t$, we take the infimum on $A^{\delta'}_a$ to derive
$$H(\nu|\nu^{i,M,R}_t) \geq \frac{\kappa(t, \theta)}{2}W_2^2(\nu, \nu^{i,M,R}_t) - \alpha_3R^2\exp(-\alpha_0R^2) \geq \frac{\kappa(t, \theta)}{2}\Big[\varepsilon'(1 - a_0) - 2(|E_x \vee E_y|)R^2\exp(-\alpha_0R^2) - \delta'\Big]^2 - \alpha_3R^2\exp(-\alpha_0R^2). \quad (129)$$
To obtain the last line, we first show that there exist constants $c_2, c_3$ depending on $c_0, c_1$ such that the following inequality holds for arbitrary $c_0, c_1 \in \mathbb{R}$:
$$(c_0x + c_1y)^2 \geq 0 \implies (x - y)^2 \geq (1 - c_2)x^2 - c_3y^2. \quad (130)$$
Using the above relation and setting $\delta' = \alpha_3\varepsilon'$, we have
$$\kappa(t, \theta)(1 - a_4)(1 - a_0 - \alpha_3)^2(\varepsilon')^2/2 - \kappa(t, \theta)a_5R^4\exp(-2\alpha_0R^2) \leq \frac{\kappa(t, \theta)}{2}\Big[\varepsilon'(1 - a_0) - 2(|E_x \vee E_y|)R^2\exp(-\alpha_0R^2) - \delta'\Big]^2. \quad (131)$$
Assuming $\ln(1/R^2)/R \leq \alpha_0$ and rescaling the numerical terms, we have
$$H(\nu|\nu^{i,M,R}_t) \geq \kappa(t, \theta)a_0(\varepsilon')^2 - \kappa(t, \theta)\alpha_3R^4\exp(-2\alpha_0R^2). \quad (132)$$
Since $\nu^{j,N}_t$ enjoys an identical constant for Talagrand's inequality compared to $\nu^{i,N}_t$, the lower bound of $H(\nu|\nu^{j,R}_t)$ on the subset $A_2$ can be obtained:
$$H(\nu|\nu^{i,M,R}_t) \geq \frac{\kappa(t, \theta)}{2}\Big[\varepsilon'a_0 - 2(|E_x \vee E_y|)R^2\exp(-\alpha_0R^2) - 4\alpha_1\exp(\alpha_2) - \delta'\Big]^2 - \alpha_3R^2\exp(-\alpha_0R^2). \quad (133)$$
Similarly to the above, we apply the inequality in Eq. 130 twice to obtain constants $a_5, a_6$ such that the following relation holds:
$$\kappa(t, \theta)(1 - a_5)(a_0 - \alpha_3)^2(\varepsilon')^2/2 - \kappa(t, \theta)a_6\big[R^4\exp(-2\alpha_0R^2) + \exp(2\alpha_2) + R^2\exp(-\alpha_0R^2 + \alpha_2)\big] \leq \frac{\kappa(t, \theta)}{2}\Big[\varepsilon'a_0 - 2(|E_x \vee E_y|)R^2\exp(-\alpha_0R^2) - 4\alpha_1\exp(\alpha_2) - \delta'\Big]^2. \quad (134)$$
For some $\alpha'_3$ and $\alpha'_2$, we rescale the numerical constants in the inequality to obtain
$$H(\nu|\nu^{i,M,R}_t) \geq \kappa(t, \theta)a_1\varepsilon^2 - \kappa(t, \theta)\alpha'_3R^4\exp(-2\alpha_0R^2) - \alpha'_2. \quad (135)$$
By Theorem A.1 (Bolley, 2010), the metric entropy for the subset $A_1$ can be bounded for some numerical constant $b_0$:
$$\mathcal{M}(A_1, \delta') \leq \mathcal{M}(\mathcal{P}_2(B^{x_i}_R), \delta') \leq (b_0R)^{O(\varepsilon^{-d})} \sim O(\varepsilon^{-d}), \quad (136)$$
where we set the radius of the Wasserstein ball to $\delta' = \alpha_3\varepsilon'$. By collecting Eq. 135 and Eq. 136, we have
$$\mathbb{P}\big[W_2^2(\hat\nu^{M,N,R}_t, \nu^{j,N,R}_t) \geq \varepsilon'a_0 - 2(|E_x \vee E_y|)R^2\exp(-\alpha_0R^2) - 4\alpha_1\exp(\alpha_2)\big] \quad (137)$$
$$\leq \exp\big(-M\kappa(t, \theta)a_1\varepsilon^2 + M\kappa(t, \theta)\alpha'_3R^4\exp(-2\alpha_0R^2) + \alpha'_2\big) \quad (138)$$
$$\leq O(\varepsilon^{-d})\,O\big(\exp(-Mf\varepsilon^2)\big). \quad (139)$$
With a similar calculation as done above, one can obtain
$$\mathbb{P}\big[W_2^2(\nu^{i,M,R}_t, \hat\nu^{M,R}_t) \geq \varepsilon'(1 - a_0) - 2(|E_x \vee E_y|)R^2\exp(-\alpha_0R^2)\big] \leq O(\varepsilon^{-d})\,O\big(\exp(-Mf\varepsilon^2 + R^4\exp(-R^2))\big). \quad (140)$$
Combining Eq. 137 and Eq. 140 with Eq. 121 gives the desired outcome:
$$\mathbb{P}\big[C^\sigma_BW_2^2(\hat\nu^{M,N,R}_t, \hat\nu^{M,R}_t)\big|_{l^*} \geq \varepsilon\big] \leq O(\varepsilon^{-d})\,O\big(\exp(-Mf\varepsilon^2 - Mfh(R))\big). \quad (141)$$
Given that the above relation holds for all $t \in [0, T = 1]$, the proof is complete as we take the limit $N \to \infty$.

Theorem 4.3. (Concentration of MF-SM) Let $X^N_t$ be a solution to the MF-SDE (for dWGFs) for the set of particles. Then, for any $\epsilon \in (0, 1)$, the following is true:
$$\mathbb{P}\Big[\big|\mathbb{E}_tF(X^N_t) - J_{MF}(N = 1, \theta, \mu_{[0,T]})\big| \geq \varepsilon\Big] \leq 2\exp\Big(-\frac{N\big[\varepsilon - C_5\sqrt{1 + N^{-(q+4)/2q}}\big]^2}{C_6}\Big), \quad (142)$$
where $f := f(\kappa) = \sup_{t\in[0,T]}[c(t, \theta)\,\kappa(t, \theta)]$, and the log-Sobolev constant of the time-inhomogeneous dynamics $c : [0, T] \times \Theta \to \mathbb{R}$ is defined as
$$c(t, \theta) = \int_0^t\exp\Big(-2\int_v^t\kappa(u, \theta)\,du\Big)dv, \qquad \kappa(t, \theta) = \begin{cases} 2 + \dfrac{\beta_t}{\sigma^2_\zeta(t)} & \text{for } \theta = \theta^*, \\[2pt] 2 + \gamma_A + \gamma_B & \text{for } \theta = \theta. \end{cases} \quad (143)$$
The neural parameter $\theta^*$ of the score networks ensures vanishing $N$-particle relative entropy, $\mathcal{H}^N_T|_{\theta=\theta^*} = 0$ for all $t \in [0, T]$. In other words, it follows that $s_{\theta^*} = \nabla\log\zeta^{\otimes N}_{T-t}$ almost surely $[\nu^N_{[0,T]}]$.

Remark. Note that in the main manuscript we omitted the curvature effect by replacing $\sup_{t\in[0,T]}c(t, \theta)$ with $K(\kappa)$, to emphasize only the connection to the HWI inequality in the estimation of MF-SM. The full description, however, specifies the explicit effect of the Bakry-Émery curvature condition, showing that the design factors of the VP-SDE (e.g., $\beta_0, \beta_1$) control the convergent behavior of our $N$-particle system towards the mean-field limit $\mu_t$.

Proof. We provide an analysis adapting the VP-SDE (Song et al., 2021c) to an $N$-particle mean-field system. Through the adoption of the VP-SDE, the original drift term $f^N_t$ in our denoising WGF is characterized by substitution with the corresponding drift term in the MF VP-SDE, i.e., $f^N_t = -\nabla\big[\beta_t\|x^N\|^2_E/4\big]$.
Hence, the potentials $V^N$ for the $N$-particle system can be represented as
$$V^N(t, x^N) = \beta_t\frac{\|x^N\|^2_E}{4} - \beta_t\log\zeta^{\otimes N}_{T-t}(x^N), \quad \text{for } \theta = \theta^*, \quad (144)$$
$$V^N(t, x^N, \nu^N_t) = \beta_t\frac{\|x^N\|^2_E}{4} - \mathcal{A}(t, x^N) - [\mathcal{B} \ast_{B_R}\nu^N_t](x^N), \quad \text{for } \theta = \theta. \quad (145)$$
It is noteworthy that $\theta^*$ is the parameter profile obtained from perfect score matching, where the proposed score networks optimally approximate the score function, i.e., $s_{\theta^*} = \nabla\log\zeta_{T-t}$. The coefficient $\beta_t = \beta_{\min} + t(\beta_{\max} - \beta_{\min})$ is defined as a linear function of $t$ for pre-defined fixed hyperparameters $(\beta_{\min}, \beta_{\max})$; note that $\beta_t$ is non-decreasing in $t$ and $\sup_{t\in[0,T]}\beta_t = \beta_T$. Recall that $\zeta_t := \mathcal{N}(m_\zeta(t)Y;\ \sigma^2_\zeta(t)I_d)$, so
$$\nabla\log\zeta^{\otimes N}_t(x^N) = -\frac{1}{\sigma^2_\zeta(t)}\big(x^N - m_\zeta(t)Y^N\big), \quad (146)$$
where $Y^N \sim \zeta^{\otimes N}_0$ stands for $N$ copies of the target data instance, and the scalar mean and variance are given as
$$m_\zeta(t) = e^{-\frac{1}{2}\int_0^t\beta_sds}, \qquad \sigma^2_\zeta(t) = 1 - e^{-\int_0^t\beta_sds}. \quad (147)$$
Taking the Hessian operator of $V^1$, we have
$$\kappa(t, \theta) = J(\nabla V^1) = \nabla^2V^1 = \begin{cases} 2 + \dfrac{\beta_t}{\sigma^2_\zeta(T - t)} & \text{for } \theta = \theta^*, \\[2pt] 2 + \gamma_A + \gamma_B & \text{for } \theta = \theta. \end{cases} \quad (148)$$
Following Eq. 77, we compute the carré du champ operators as
$$\Gamma(t)(f, f) = \beta_t\|\nabla f\|^2_E, \quad (149)$$
$$\Gamma_2(t)(f) = \begin{cases} \|\nabla^2f\|^2_F + (\beta_t/2 + \gamma_A + \gamma_B)\|\nabla f\|^2_E, & \text{for } \theta = \theta, \\ \|\nabla^2f\|^2_F + \big(\beta_t/2 + \beta_t/\sigma^2_\zeta(T - t)\big)\|\nabla f\|^2_E, & \text{for } \theta = \theta^*, \end{cases} \quad (150)$$
$$\partial_t\Gamma(t)(f) = \partial_t\beta_t\,\|\nabla f\|^2_E \leq \beta_{\max}\|\nabla f\|^2_E. \quad (151)$$
Recall the Bakry-Émery criterion in Eq. 80:
$$\Gamma_2(t)(f) + \frac{1}{2}\partial_t\Gamma(t)(f) \geq \kappa(t)\,\Gamma(t)(f). \quad (152)$$
Utilizing the estimations from Eq. 149 to Eq. 151 gives
$$2\beta_{\max}\|\nabla^2f\|^2_F \geq \kappa(t, \theta)\,\|\nabla^2f\|^2_F. \quad (153)$$
This concludes that $\kappa(t, \theta) = \beta_t/2 + \gamma_A + \gamma_B$ if $\theta = \theta$ and $\kappa(t) = \beta_t(1/2 + 1/\sigma^2_\zeta(t))$ if $\theta = \theta^*$. Once we determine the curvature estimation for time $t$ and $\theta$, the next step is to derive a concentration inequality from the $\Phi$-log-Sobolev inequality. Let $P^{N,*}_t$ be the dual semi-group of $P^N_t$ for the $N$-particle denoising MF-SDEs, which can be represented as
$$X^N_t \sim \nu^N_t = P^{N,*}_t\,d\zeta^{\otimes N}_{T-t}. \quad (154)$$
For the action of the dual semigroup on $\zeta^{\otimes N}_{T-t}$, the $\Phi$-log-Sobolev inequality in Eq. 81 can be modified as
$$\mathrm{Ent}^\Phi_{P^{N,*}_td\zeta_{T-t}}(g) \leq c(t)\,P_t\big(\Phi''(g)\,\Gamma(t)(g)\big), \qquad c(t) = \int_0^t\exp\Big(-2\int_v^t\kappa(u)\,du\Big)dv. \quad (155)$$
Setting $\Phi(g) = g^2$ and $g = f^2 = \exp(uF)$, with the function $F_t := \|G_t\|^2_E + \|\nabla G_t\|^2_F$ ($F_t : \mathcal{X}^N \to \mathbb{R}$) having Lipschitz constant $\mathrm{Lip}(F)$, we obtain
$$\mathrm{Ent}^\Phi_{\nu_t}(g) \leq 2c(t)\,\mathbb{E}_{\nu^N_t}[\Gamma(t)(g)] \leq 2\sup_t[\beta_tc(t)]\,\mathbb{E}_{\nu^N_t}\big[\|\nabla g\|^2_E\big]. \quad (156)$$
By the definition of $\Gamma$, we have
$$\Gamma(t)(g) = \sum_i\beta_t\,\partial_ig\,\partial_ig = \beta_t\|\nabla g\|^2_E. \quad (157)$$
Replacing $g = f^2$ gives
$$\mathrm{Ent}^\Phi_{\nu^N_t}(f^2) \leq 2\sup_t[\beta_tc(t)]\,\mathbb{E}_{\nu^N_t}\big[\|\nabla f^2\|^2_E\big]. \quad (158)$$
To estimate the right-hand side, we show that
$$\mathbb{E}_{\nu^N_t}\big[\|\nabla f^2\|^2_E\big] = \mathbb{E}_{\nu^N_t}\Big[\frac{u^2}{4}\,4\|\nabla F\|^2_Ee^{uF}\Big] \leq u^2\,\mathrm{Lip}^2(F)\,\mathbb{E}_{\nu^N_t}[f^2]. \quad (159)$$
On the other hand, the $\Phi$-entropy with respect to the measure $\nu_t$ can be directly calculated as
$$\mathrm{Ent}^\Phi_{\nu^N_t}(f^2) = u\,\partial_u\mathbb{E}_{\nu^N_t}[f^2] - \mathbb{E}_{\nu^N_t}[f^2]\log\mathbb{E}_{\nu^N_t}[f^2] \leq \sup_t[\beta_tc(t)]\,u^2\,\mathrm{Lip}^2(F)\,\mathbb{E}_{\nu^N_t}[f^2], \quad (160)$$
where the right-hand side is induced from Eq. 159. Now, we consider the log-expectation to extract the expectation of $F$:
$$\frac{1}{u}\log\mathbb{E}_{\nu^N_t}[f^2] = \mathbb{E}_{\nu^N_t}[F] + \int_0^u\partial_u\Big(\frac{1}{u}\log\mathbb{E}_{\nu^N_t}[f^2]\Big)du \leq \mathbb{E}_{\nu^N_t}[F] + u\sup_t[\beta_tc(t)]\,\mathrm{Lip}^2(F). \quad (161)$$
The inequality comes from the fact that
$$\partial_u\Big(\frac{1}{u}\log\mathbb{E}_{\nu^N_t}[f^2]\Big) \leq \sup_t[\beta_tc(t)]\,\mathrm{Lip}^2(F) \leq 2\beta_T\sup_t[c(t)]\,\mathrm{Lip}^2(F). \quad (162)$$
We multiply by $u$ and subsequently take exponentials on both sides of Eq. 161, and the exponential inequality follows:
$$\mathbb{E}_{\nu^N_t}[\exp(uF)] \leq \exp\Big(u\,\mathbb{E}_{\nu^N_t}[F] + \beta_T\sup_t[c(t)]\,\mathrm{Lip}^2(F)\,u^2/2\Big). \quad (163)$$
As a direct application of Chebyshev's inequality, we see that
$$\mathbb{P}\big[|F(X^N_t) - \mathbb{E}_{\nu^N_t}F(X^N_t)| \geq \varepsilon\big] \leq 2\exp\Big(-u\varepsilon + \beta_T\sup_t[c(t)]\,\mathrm{Lip}^2(F)\,u^2/2\Big). \quad (164)$$
By selecting the optimal variable $u$, we finally have
$$\mathbb{P}\big[|F(X^N_t) - \mathbb{E}_{\nu^N_t}F(X^N_t)| \geq \varepsilon\big] \leq 2\exp\Big(-\frac{\varepsilon^2}{2\beta_T\sup_tc(t)\,\mathrm{Lip}^2(F)}\Big). \quad (165)$$
Given that the particles are exchangeable by the result of Proposition A.2, one can demonstrate that, with probability at least $1 - \varepsilon$,
$$|F(X^N_t) - \mathbb{E}_{\nu^N_t}F(X^N_t)| \leq \sqrt{2\beta_T\sup_tc(t)\,L^2\log(2/\varepsilon)}, \quad (166)$$
for any $1 \leq j \leq N$ and $F \in \mathrm{Lip}(L, \mathcal{X}^N)$. Let us decompose $F$ into reducible components as $F(X^N_t) = (1/N)\sum^N_i\bar F(X^{i,N}_t)$. Since one can see that $L = (1/\sqrt{N})\mathrm{Lip}(\bar F)$, the exchangeability of particles gives
$$\mathbb{P}\Big[\Big|\frac{1}{N}\sum_i\bar F(X^{i,N}_t) - \mathbb{E}\bar F\Big| \geq \varepsilon\Big] \leq 2\exp\Big(-\frac{\varepsilon^2N}{2\beta_T\sup_tc(t)\,\mathrm{Lip}^2(\bar F)}\Big), \quad j \leq N. \quad (167)$$
Note that the reducibility of the score networks assures that $F(X^N_t) := F(X^N_t, \nu^N_t) = \|G_t(X^N_t, \nu^N_t)\|^2_E + \|JG_t(X^N_t, \nu^N_t)\|^2_F$ and $\bar F(X^{i,N}_t) := \bar F(X^{i,N}_t, \hat\nu^N_t) = \|G_t(X^{i,N}_t, \hat\nu^N_t)\|^2_E + \|JG_t(X^{i,N}_t, \hat\nu^N_t)\|^2_F$, with the relation $F(X^N_t) = (1/N)\sum^N_i\bar F(X^{i,N}_t)$. Given the definition of the canonical projection $\pi^i_N(x^N) = x_i$, we define the empirical measure $\hat\nu^N_t(dx) := \frac{1}{N}\sum^N_i\delta_{\pi^i_NX^N_t}$. Then the triangle inequality naturally gives
$$\big|\mathbb{E}_{\hat\nu^N_t}\bar F(\cdot, \hat\nu^N_t) - \mathbb{E}_{\mu_t}\bar F(\cdot, \mu_t)\big| \leq \big|\mathbb{E}_{\hat\nu^N_t}\bar F(\cdot, \hat\nu^N_t) - \mathbb{E}_{\mu_t}\bar F(\cdot, \hat\nu^N_t)\big| + \big|\mathbb{E}_{\mu_t}\bar F(\cdot, \hat\nu^N_t) - \mathbb{E}_{\mu_t}\bar F(\cdot, \mu_t)\big| \leq \mathrm{Lip}(\bar F)\,W_2(\hat\nu^N_t, \mu_t) + 4d(\gamma'_B)^2W_2^2(\hat\nu^N_t, \mu_t) \leq C'\big(4d(\gamma'_B)^2 + \mathrm{Lip}(\bar F)\big)\sqrt{\frac{1}{N^{1/2}} + \frac{1}{N^{(q-2)/q}}}, \quad (168)$$
where the second inequality is induced from the fact that
$$\big|\langle\bar F, \hat\nu^N_t\rangle - \langle\bar F, \mu_t\rangle\big| \leq \mathrm{Lip}(\bar F)\sup_{\|\bar F\|/\mathrm{Lip} \leq 1}\big|\langle\bar F, \hat\nu^N_t\rangle - \langle\bar F, \mu_t\rangle\big| \leq \mathrm{Lip}(\bar F)\,W_1(\hat\nu^N_t, \mu_t) \leq \mathrm{Lip}(\bar F)\,W_2(\hat\nu^N_t, \mu_t), \quad (169)$$
and one can bound the Jacobian deviation of the score networks as
$$\big\|J_x\big[s_\theta(t, X_t, \mu_t) - s_\theta(t, X_t, \hat\nu^N_t)\big]\big\|^2_F \leq \|2\gamma'_BI_d\|^2_F = 2d(\gamma'_B)^2, \quad X_t \sim \mu_t. \quad (170)$$
The asymptotic upper bound in the last line of Eq. 168 can be derived from the result of Theorem 1 (Fournier & Guillin, 2015), associated with the numerical constant $C'$. By combining the results, we finally have
$$\mathbb{P}\Big[\big|\mathbb{E}_tF(X^N_t) - \mathbb{E}_{t,\mu_t}\bar F(\bar X_t)\big| \geq \varepsilon\Big] \leq 2\exp\Big(-\frac{N\big[\varepsilon - C'\big(4d(\gamma'_B)^2 + L\big)\sqrt{1 + N^{-(q+4)/2q}}\big]^2}{2\beta_T\sup_tc(t)L^2}\Big). \quad (171)$$
Since the expectation of $\bar F$ with respect to the measure $\mu_t$ can be represented as the squared $W^{1,2}$-Sobolev norm, i.e., $\mathbb{E}_{\nu^N_t}F = \|G_t\|^2_W$, rephrasing the result above with the numerical constants $C_5 = C'(4d(\gamma'_B)^2 + L)$, $C_6 = 2\beta_TL^2$, and $f(\kappa) := \sup_t[c(t)\,\kappa(t, \theta)]$ brings the proof to completion, revealing the concentration property of our mean-field score-matching objective.

Table 4. Hyperparameters according to cardinality in data instances.

Shared across all cardinalities:
- (VP SDE) $\sigma^2_t = \beta_t$, $\beta_t = \beta_{\min} + t(\beta_{\max} - \beta_{\min})$, $\beta_{\max} = 20.0$, $\beta_{\min} = 0.1$
- (Diffusion Steps) $K \in \{1, \ldots, 300\}$, $|K| = 300$
- (Branching Ratio) $b = 2$
- Learning Rate: 1.0e-3 to 1.0e-4, depending on cardinality

Per-cardinality, for $N = 10^3$ / $10^4$ / $2\times10^4$ / $10^5$:
- (Branching Steps) $K^*$: $\{100, 200\}$ / $\{50, 100, 150, 200\}$ / $\{50, 100, 150, 200\}$ / $\{50, 100, 150, 200, 250\}$
- (Initial Cardinality) $N_0$: 250 / 625 / 1250 / 3125
- (Interaction Degree) $k$: 10 / 3 / 3 / 3

A.9. Implementation Details: Training and Sampling of MF-CDMs

Hyperparameters. Across all experiments, our MF-CDMs are configured to perform a total of 300 diffusion steps ($|K| = 300$) along the denoising path. This includes particle branching at selected sub-steps within the subset $K^* \subset K$, adhering to a branching ratio of $b = 2$. The radius $R$ of the convolution is determined by the average distance between each particle and its $k$ nearest interacting particles, calculated at every iteration during the training process. At inference time, we utilize the radius calculated at the latest training iteration. Table 4 summarizes the detailed hyperparameter specifications.

Figure 7. Additional qualitative results on the MedShapeNet dataset. We display reconstructed 3D shapes (Spine L3 vertebra and Colon) from the MedShapeNet dataset, each comprising 2.0e+3 points.
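The schedule-related rows of Table 4 can be reproduced in a few lines; a sketch (ours) of the discrete VP schedule and the doubling branching plan for the $N = 2\times10^4$ configuration:

```python
import numpy as np

# Sketch of the discrete VP-SDE schedule in Table 4 (betas, diffusion steps)
# and the doubling branching plan for the N = 2e4 target; names are ours.
beta_min, beta_max, num_steps = 0.1, 20.0, 300
ts = np.linspace(0.0, 1.0, num_steps)
betas = beta_min + ts * (beta_max - beta_min)          # beta_t, linear in t

# Start at N0 = 1250 and double the particle count (b = 2) at K* = {50, ..., 200}.
branching_steps, cardinality = {50, 100, 150, 200}, [1250]
for k in range(1, num_steps):
    cardinality.append(cardinality[-1] * 2 if k in branching_steps else cardinality[-1])
assert cardinality[-1] == 20000                         # final target cardinality
```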
Example: Sampling of MF-CDMs on MedShapeNet. In the experiments targeting a cardinality of 2.0e+4 on MedShapeNet, we initiate by simulating denoising particle paths starting from the lower cardinality $N_0 = 1.25\mathrm{e}{+}3$, proceeding until the first branching step at $\{50\} \subset K^*$. In the branching step, we apply a point branching function to the simulated particles, which doubles the number of particle profiles, $N_{50} = 2.5\mathrm{e}{+}3$. The following diagram provides an overview of how the branching operation increases cardinality during the denoising process:
$$\underbrace{N_0\cdots N_{49}}_{\text{Card: }1.25\mathrm{e}{+}3}\ \xrightarrow{\text{Branching }\Phi}\ \underbrace{N_{50}\cdots N_{99}}_{\text{Card: }2.5\mathrm{e}{+}3}\ \xrightarrow{\text{Branching }\Phi}\ \underbrace{N_{100}\cdots N_{149}}_{\text{Card: }5.0\mathrm{e}{+}3}\ \xrightarrow{\text{Branching }\Phi}\ \underbrace{N_{150}\cdots N_{199}}_{\text{Card: }1.0\mathrm{e}{+}4}\ \xrightarrow{\text{Branching }\Phi}\ \underbrace{N_{200}\cdots N_{299}}_{\text{Card: }2.0\mathrm{e}{+}4}. \quad (172)$$
Sec. A.9.2 provides the detailed algorithmic procedure.

Datasets. This paper utilizes ShapeNet, a widely recognized dataset comprising a vast collection of 3D object models across multiple categories, and MedShapeNet, a curated collection of medical shape data designed for advanced imaging analysis.

1. ShapeNet (Chang et al., 2015). We adhered to the standard protocol suggested by (Yang et al., 2019) for preprocessing (e.g., random shuffling, normalization) point sets from 3D shapes, but adjusted the number of points to 10,000, which is approximately five times larger than the standard setup. All categories were utilized in our experiments.

2. MedShapeNet (Li et al., 2023). This dataset contains nearly 100,000 medical shapes, including bones, organs, vessels, muscles, etc., as well as surgical instruments. Our data preprocessing pipeline involves randomizing the arrangement of nodes and selecting a subset of 20,000 points to form a standardized 3D point cloud. Considering the segmentation of each organ shape into smaller and incomplete parts in the dataset, we focused on utilizing only 1,000 fully aggregated instances within the dataset. We applied uniform normalization and resized each shape to align within a predefined cubic space of $[-1, 1]^3 \subset \mathbb{R}^3$, facilitating comparative and computational analyses.

Neural Network Architectures. In the experiments on a synthetic dataset, we utilized an architecture similar to that suggested in DPM (Luo & Hu, 2021) for both functions $A_\theta$ and $B_\theta$. In modeling the mean-field interaction, we incorporated a local particle-association module akin to the one used in DGCNN (Wang et al., 2019b). This module dynamically pools particles in close geometric proximity during inference. All experiments were conducted using a setup of 4 NVIDIA A100 GPUs.

A.9.1. TRAINING MEAN-FIELD CHAOS DIFFUSION MODELS

This section presents the algorithmic implementation of mean-field score matching and the training procedure with objective (P3). We train our score networks based on the mean-field score objective, incorporating the Sobolev norm and reducible network structures. The training procedure is comprehensively outlined in the following three steps.

Step I: Initialization. Consider an index set $K = \{0, \ldots, K\}$ for the discrete simulation of SDEs, and its subset $K^* \subset K$ for particle branching steps. Branching is selectively applied at steps $k \in K^*$ out of the entire sequence of diffusion steps $K$. For simplicity, let us denote $\zeta_t := \mathcal{N}(m_\zeta(t), \sigma^2_\zeta(t)I_d)$, where the Gaussian parameters are selected from Appendix C (Song et al., 2021a). Then, we sample $B$ i.i.d. particles of the form
$$\nu_{t_k} \ni Y^b_{t_k} = \begin{cases} Y^{b,N_k}_{t_k} \sim \zeta^{\otimes N_k}_{t_k}, & k \in K\setminus K^*, \\ Y^{b,N_{k+1}}_{t_k} \sim (\mathrm{Id}_{b-1}\otimes\Psi_\theta)_\#\big[\zeta^{\otimes N_k}_{t_k}\big], & k \in K^*. \end{cases} \quad (173)$$
Consequently, the cardinality of the particles changes with each diffusion step $k$: if $k$ belongs to the set of particle branching steps, $\mathrm{Card}(\nu_{t_k}) = N_{k+1}$; otherwise, it remains $\mathrm{Card}(\nu_{t_k}) = N_k$.

Step II: Estimation of the Sobolev Norm. We first define the discretization of the progressively measurable process $G^\theta_t$ with respect to $Y^b_{t_k}$, and its Jacobian, as
$$G^\theta_{t_k} = s^{\mathrm{Card}(\nu_{t_k})}_\theta(t_k, Y^b_{t_k}, \nu_{t_k}) - \nabla\log\zeta^{\otimes\mathrm{Card}(\nu_{t_k})}_{T-t_k}(Y^b_{t_k}), \quad (174)$$
$$JG^\theta_{t_k} = Js^{\mathrm{Card}(\nu_{t_k})}_\theta(t_k, Y^b_{t_k}, \nu_{t_k}) - \nabla^2\log\zeta^{\otimes\mathrm{Card}(\nu_{t_k})}_{T-t_k}(Y^b_{t_k}), \quad (175)$$
where the terms $A^{N_k}_\theta$ and $B_\theta$ constituting the score networks $s^{N_k}_\theta$ are estimated as in Step II of Sec. A.9.2. Note that $\mathrm{Card}(\nu_{t_k})$ denotes the cardinality of the sampled particles.

Step III: Update Network Parameters. With the estimates calculated above, we update the networks by MF-SM with respect to the subdivision of chaotic entropy, (P3) in Eq. 28:
$$\theta \leftarrow \theta - \nabla_\theta\frac{1}{B|K|}\sum_{b,k}\mathbb{E}\Big[\|G^\theta_{t_k}\|^2_E + \|JG^\theta_{t_k}\|^2_F\Big]. \quad (176)$$
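Putting Steps I-III together, a compact PyTorch sketch (ours) of one MF-SM update; it simplifies aggressively (one random diffusion step per update, closed-form Gaussian targets from Eqs. 146-147, a single Hutchinson probe for the Jacobian term), and the function names are hypothetical:

```python
import torch

def mfsm_training_step(score_net, opt, y0, num_steps=300, beta_min=0.1, beta_max=20.0):
    """One stochastic MF-SM update (Steps I-III above), sketched under our own
    simplifications. y0: (N, d) clean particles; score_net(t, y) -> (N, d)."""
    k = torch.randint(1, num_steps, (1,)).item()
    t = torch.tensor(k / num_steps)
    int_beta = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2
    m, sigma2 = torch.exp(-0.5 * int_beta), 1.0 - torch.exp(-int_beta)
    # Step I: sample noised particles from the forward kernel zeta_t.
    y = (m * y0 + sigma2.sqrt() * torch.randn_like(y0)).detach().requires_grad_(True)
    # Step II: first- and second-order deviations G, JG (Eqs. 174-175).
    s = score_net(t, y)
    g = s + (y - m * y0) / sigma2                 # s_theta - grad log zeta
    v = torch.randn_like(y)
    jvp = torch.autograd.grad((s * v).sum(), y, create_graph=True)[0]
    jg = jvp + v / sigma2                         # probe of J s + I / sigma^2
    # Step III: update networks with the Sobolev objective (Eq. 176).
    loss = (g ** 2).sum(dim=1).mean() + (jg ** 2).sum(dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```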
A.9.2. SAMPLING SCHEME FOR MEAN-FIELD CHAOS DIFFUSION MODELS

To sample the denoising dynamics, this work proposes a modified Euler scheme adapted to mean-field interacting particle systems (Bossy & Talay, 1997; dos Reis et al., 2022), approximating the stochastic differential equations in the mean-field limit. The proposed scheme involves a four-step sampling procedure.

Step I: Initialization. Consider an index set $K = \{0, \ldots, K\}$ for the discrete simulation of SDEs, and its subset $K^* \subset K$ for particle branching steps. In the initial step $k = 0$, the probability measure $\varrho^{N_0}_0dx^{N_0}$ is set to the $N_0$-product of the standard Gaussian density, i.e., $\mathcal{N}^{\otimes N_0}(I_{N_0d})$. For steps $k > 0$, we sample $B$ i.i.d. particles from the branched probability measure obtained in the previous step: $\{X^{b,N_k}_{t_k}\}_{b \leq B} \sim \varrho^{N_k}_{t_k}dx^{N_k}$.

Step II: Estimation of the Vector Fields. Given the sampled $(N_kd)$-dimensional $B$ vectors from the previous step, we estimate the vector fields. Recall that the vector fields are given as $V^N(t, x, \nu^N_t; \theta) := f^N_t(x) - \sigma^2_ts_\theta(t, x, \nu_t)$. Given the definition of the MF VP-SDE, where $(\beta_{\max}, \beta_{\min}) = (20, 0.1)$, we have
$$f^{N_k}_t(X^{b,N_k}_t) = -\frac{\beta_t}{2}X^{b,N_k}_t, \qquad \beta_t = \beta_{\min} + t(\beta_{\max} - \beta_{\min}), \quad b \leq B. \quad (177)$$
To estimate $A_\theta$, we adhere to the definition of the reducible architecture explored in Sec. A.3, namely the concatenation of equi-weighted, identical networks:
$$A^{N_k}_\theta(t_k, X^{b,N_k}_{t_k}) = \frac{1}{N_k}\big[A_\theta(t_k, X^{b,1,N_k}_{t_k}), \ldots, A_\theta(t_k, X^{b,N_k,N_k}_{t_k})\big]^T \in \mathcal{X}^{N_k}. \quad (178)$$
The mean-field interaction is formally redefined through the projection of the probability measure $\pi^i_\#\nu^{N_k}_t = \nu^{i,N_k}_t \approx \{X^{b,i,N_k}_{t_k}\}_{b \leq B}$:
$$[B_\theta \ast \nu^i\,B_R](X^{b,N_k}_{t_k}) = \frac{1}{N_k}\big[[B_\theta \ast \pi^1_\#\nu^{N_k}_{t_k}], \ldots, [B_\theta \ast \pi^{N_k}_\#\nu^{N_k}_{t_k}]\big]^T(X^{b,N_k}_{t_k}). \quad (179)$$
With the finite cut-off radius $R$, we consider Euclidean balls to define the truncated convolution:
$$B_R := B^{x = X^{b,i,N_k}_{t_k}}_R = \big\{y;\ d^2_E(y, X^{b,i,N_k}_{t_k}) \leq R\big\}. \quad (180)$$
Given the definition above, each component in Eq. 179 is given by
$$[B_\theta \ast \nu^i\,B_R](X^{b,i,N_k}_{t_k}) \approx \frac{1}{N_k}\int_{B_R}B_\theta(X^{b,i,N_k}_{t_k} - X^{b,j,N_k}_{t_k})\,\nu^{j,N_k}_{t_k}(dX^{b,j,N_k}_{t_k}). \quad (181)$$

Step III: Applying the Euler Scheme. Having collected the estimated terms from the previous step, we apply the Euler scheme to obtain a particle simulation of the dWGFs:
$$X^{b,N_k}_{t_{k+1}} = X^{b,N_k}_{t_k} + V^N(t, X^{b,N_k}_{t_k}, \nu^{N_k}_{t_k}; \theta)\,\Delta t + \sqrt{\beta_t}\,\Delta_tB^{N_k}_{t_k}, \quad b \leq B, \quad (182)$$
where $\Delta_tB^{N_k}_{t_k} := B^{N_k}_{t_k} - B^{N_k}_{t_{k-1}} \sim \mathcal{N}(0, \Delta t\,I_{N_kd})$.

Step IV: Particle Branching. In the final step, we apply the particle branching operation to enhance the cardinality. This operation is selectively applied at steps $k \in K^*$ out of the entire sequence of diffusion steps $K$:
$$\big[X^{B,(b-1)N_k}_{t_k}, \Psi_\theta(X^{B,N_k}_{t_k})\big] \sim X^{B,N_{k+1}}_{t_k}, \qquad \big(\mathrm{Id}_{b-1}\otimes\Psi^{N_{k+1}}_\theta\big)_\#\big[\varrho^{N_k}_{t_k}\big] \sim \varrho^{N_{k+1}}_{t_k}dx^{N_{k+1}}. \quad (183)$$
When branching particles, the cardinality grows as $N_{k+1} = b\,N_k$, and the entire sampling scheme is repeated until reaching the final step $k \in K$.
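A condensed PyTorch sketch (ours) of the four-step sampler above; `score_net` and the branching map `branch_fn` ($\Psi_\theta$) are assumed trained, the sign convention for the vector field follows $V^N = f^N_t - \sigma^2_ts_\theta$ as in Step II, and radius/batching details are omitted:

```python
import torch

@torch.no_grad()
def sample_mfcdm(score_net, branch_fn, d=3, n0=1250, num_steps=300,
                 branching_steps=(50, 100, 150, 200), beta_min=0.1, beta_max=20.0):
    """Euler-Maruyama sampling with particle branching (Steps I-IV above)."""
    x = torch.randn(n0, d)                              # Step I: N0-product Gaussian
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        beta = beta_min + t * (beta_max - beta_min)
        drift = -0.5 * beta * x                         # f_t of the MF VP-SDE (Eq. 177)
        v = drift - beta * score_net(t, x)              # Step II: vector field V^N
        x = x + v * dt + (beta * dt) ** 0.5 * torch.randn_like(x)  # Step III (Eq. 182)
        if k in branching_steps:                        # Step IV (Eq. 183), b = 2
            x = torch.cat([x, branch_fn(x)], dim=0)
    return x
```

With the Table 4 settings sketched here, the particle count doubles at each step in $K^*$, so a run started at $N_0 = 1250$ returns $2\times10^4$ points after 300 steps.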