
Equivariant Diffusion for Crystal Structure Prediction

Peijia Lin¹, Pin Chen¹,², Rui Jiao³,⁴, Qing Mo², Jianhuan Cen¹, Wenbing Huang⁵,⁶, Yang Liu³,⁴, Dan Huang¹, Yutong Lu¹,²

In addressing the challenge of Crystal Structure Prediction (CSP), symmetry-aware deep learning models have been studied extensively, particularly diffusion models that treat CSP as a conditional generation task. However, ensuring permutation, rotation, and periodic translation equivariance during the diffusion process has not been fully addressed. In this work, we propose EquiCSP, a novel equivariant diffusion-based generative model. We address the overlooked issue of lattice permutation equivariance in existing models, and we also develop a noising algorithm that rigorously maintains periodic translation equivariance throughout both training and inference. Our experiments indicate that EquiCSP significantly surpasses existing models in generating accurate structures and converges faster during training. Code is available at https://github.com/EmperorJia/EquiCSP.

1. Introduction

Crystal structure prediction (CSP) seeks the atomic arrangement with the lowest energy for given chemical compositions and conditions (Desiraju, 2002), which amounts to finding the global minimum of the potential energy surface. This task, while conceptually straightforward, is a significant challenge in physics, chemistry, and materials science because of the complexity of the potential energy landscape and the exponential increase in possible structures as the number of atoms in a unit cell grows (Oganov et al., 2019).

¹School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; ²National Supercomputer Center in Guangzhou, China; ³Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University, Beijing, China; ⁴Institute for AIR, Tsinghua University, Beijing, China; ⁵Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; ⁶Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China. Correspondence to: Yutong Lu <luyutong@mail.sysu.edu.cn>.

Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Traditional CSP methods primarily employ Density Functional Theory (DFT) (Kohn & Sham, 1965) for iterative energy calculations. They combine it with optimization algorithms such as genetic algorithms (Oganov & Glass, 2006; Oganov et al., 2011) and particle swarm optimization (Wang et al., 2010a; 2012) to search the energy landscape for stable states. However, the high cost of DFT calculations makes these traditional CSP approaches notably inefficient.

Recent advances have seen a shift towards deep generative models, which learn distributions directly from datasets of stable structures (Court et al., 2020; Yang et al., 2021). Among them, diffusion models have gained prominence in crystal generation (Xie et al., 2021; Jiao et al., 2023). They are valued for their strong performance and for the interpretability that comes from their physical motivation. However, developing diffusion models for CSP involves specific challenges. Physically, E(3) transformations of crystal coordinates, including translation, rotation, and reflection, do not change physical laws, so the distribution of generated samples must be E(3) invariant. Typical diffusion models, such as denoising diffusion probabilistic models (DDPMs) (Sohl-Dickstein et al., 2015; Vignac et al., 2023) and score-based generative models with stochastic differential equations (SDEs) (Song et al., 2020), were initially developed for computer vision. Adapting them to molecular graphs requires E(3) equivariance (Luo et al., 2021), and crystals additionally require periodic invariance (Jiao et al., 2023).

In this study, we introduce EquiCSP, an equivariant¹ diffusion method for CSP. We examine how lattice permutations in crystals affect the diffusion models used for CSP and propose corresponding solutions. EquiCSP ensures that, during both generation and training, any permutation of the lattice parameters induces the corresponding equivariant transformation of the atomic fractional coordinates. Furthermore, we propose a specialized noising algorithm for SDE-based diffusion, designed to preserve periodic translation equivariance consistently during both training and inference.

¹This paper mainly focuses on lattice permutation equivariance and periodic translation equivariance, which are unique to crystallographic data. Traditional rotation equivariance is addressed using invariant representations.

To summarize, our contributions in this work are as follows:

1. To our knowledge, we are the first to address lattice permutation equivariance in diffusion models. We ensure that any permutation of the lattice parameters during training and generation corresponds to an equivalent transformation of the atomic fractional coordinates. We achieve this through simple and efficient loss functions rather than by encoding the equivariance directly into the neural network.

2. In addition, we propose a novel diffusion noising method, named Periodic CoM-free Noising. It enables the widely used score-matching method in SDEs to achieve periodic translation equivariance in crystal generation.

3. We validate EquiCSP's effectiveness on CSP tasks and show that it outperforms existing learning methods (e.g., CDVAE (Xie et al., 2021) and DiffCSP (Jiao et al., 2023)). Furthermore, we extend EquiCSP to ab initio generation, where it also outperforms comparable methods.

2. Related Works

Crystal Structure Prediction. Traditional computational methods, such as DFT combined with optimization algorithms, are used to search for local minima on the potential energy surface (Pickard & Needs, 2011; Yamashita et al., 2018; Wang et al., 2010b; Zhang et al., 2017). Despite their accuracy, these methods are computationally demanding. Recently, machine learning has emerged as an alternative, using crystal databases to predict energy more efficiently than DFT (Jacobsen et al., 2018; Podryabinkin et al., 2019; Cheng et al., 2022). Another approach employs deep generative models to directly learn stable structures, representing crystals with 3D voxels (Court et al., 2020; Hoffmann et al., 2019; Noh et al., 2019), distance matrices (Yang et al., 2021; Hu et al., 2020; 2021), or 3D coordinates (Nouira et al., 2018; Kim et al., 2020; Ren et al., 2021). However, these methods often overlook the complete symmetries in crystal structures.

Equivariant Graph Neural Networks. Geometrically equivariant Graph Neural Networks (GNNs) that respect E(3) symmetry are effective for representing physical objects and have excelled in modeling 3D structures (Schütt et al., 2018; Thomas et al., 2018; Fuchs et al., 2020; Satorras et al., 2021; Thölke & De Fabritiis, 2021). Applications such as the Open Catalyst Project (Chanussot et al., 2021; Tran et al., 2022) demonstrate this. To accommodate periodic materials, prior work has proposed multi-graph edge construction (Xie & Grossman, 2018; Yan et al., 2022) and Fourier transforms of fractional coordinates (Jiao et al., 2023) to represent periodicity. In our work, we use Fourier transforms to achieve periodic translation invariance and additionally impose a constraint for lattice permutation invariance.

Diffusion Generative Models. Rooted in non-equilibrium thermodynamics (Sohl-Dickstein et al., 2015), diffusion models link the data and prior distributions through forward and backward Markov chains (Ho et al., 2020). They have achieved significant advances in image generation (Rombach et al., 2021; Ramesh et al., 2022). When integrated with equivariant GNNs, these models efficiently generate samples from invariant distributions. They have proven effective in tasks such as conformation generation (Xu et al., 2021; Shi et al., 2021), ab initio molecule design (Hoogeboom et al., 2022), and protein generation (Luo et al., 2022). DiffCSP distinguishes itself by generating lattice and atom coordinates for crystals simultaneously, using a periodic-E(3)-equivariant denoising model (Jiao et al., 2023). However, its diffusion training process does not yet fully realize the equivariance implied by the symmetries of periodic crystals.

3. Preliminaries

Crystal structures. A 3D crystal structure is an infinitely repeating pattern of atoms in three-dimensional space, whose basic repeating entity is called the unit cell. The unit cell is defined by a triplet $M = (A, X, L)$:

- $A = [\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n] \in \mathbb{R}^{h \times n}$ holds the one-hot encodings of the atom types.
- $X = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n] \in \mathbb{R}^{3 \times n}$ holds the atoms' Cartesian coordinates.
- $L = [\mathbf{l}_1, \mathbf{l}_2, \mathbf{l}_3] \in \mathbb{R}^{3 \times 3}$ is the lattice matrix that defines how the unit cell repeats.

We represent a periodic crystal structure as:

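$$\{(\mathbf{a}'_i, \mathbf{x}'_i) \mid \mathbf{a}'_i = \mathbf{a}_i,\ \mathbf{x}'_i = \mathbf{x}_i + L\mathbf{k},\ \mathbf{k} \in \mathbb{Z}^{3 \times 1}\}, \tag{1}$$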

where the $j$-th element of the integer vector $\mathbf{k}$ gives the integer number of translations along $\mathbf{l}_j$.

Fractional coordinate system. In crystallography, the fractional coordinate system is often used to represent the periodic nature of crystal structures (Nouira et al., 2018; Kim et al., 2020; Ren et al., 2021; Hofmann & Apostolakis, 2003). This system uses the lattice vectors $(\mathbf{l}_1, \mathbf{l}_2, \mathbf{l}_3)$ as coordinate bases, unlike the Cartesian system with its three orthogonal bases. A point in the fractional coordinate system, denoted by the vector $\mathbf{f} = [f_1, f_2, f_3]^\top \in [0, 1)^3$, corresponds to the Cartesian vector $\mathbf{x} = \sum_{i=1}^{3} f_i \mathbf{l}_i$. The coordinates of all atoms in a cell form $F \in [0, 1)^{3 \times n}$. This representation is inherently invariant to rotations and reflections of the crystal structure. As described in (Mardia et al., 2000), periodic data along each lattice base can be visualized as points on a circle, measured by their angle, as depicted in Figure 1 (e).
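As a minimal sketch of the fractional coordinate system, the snippet below converts between fractional and Cartesian coordinates via $\mathbf{x} = \sum_i f_i \mathbf{l}_i$. It then checks that rotating both the lattice and the atom leaves $\mathbf{f}$ unchanged. This is our own plain-Python illustration, not the authors' code. The helper names (`frac_to_cart`, `cart_to_frac`, `rotate_z`) are hypothetical, and lattice vectors are stored as rows for convenience.

```python
import math

def frac_to_cart(lattice, f):
    """x = sum_i f_i * l_i; `lattice` is [l1, l2, l3], one lattice vector per row."""
    return [sum(f[i] * lattice[i][k] for i in range(3)) for k in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cart_to_frac(lattice, x):
    """Solve x = sum_i f_i * l_i with Cramer's rule, then wrap f into [0, 1)."""
    a = [[lattice[j][k] for j in range(3)] for k in range(3)]  # column j is l_j
    d = det3(a)
    f = []
    for i in range(3):
        ai = [row[:] for row in a]
        for k in range(3):
            ai[k][i] = x[k]  # replace column i by the right-hand side
        f.append((det3(ai) / d) % 1.0)
    return f

def rotate_z(v, theta):
    """Rotate a 3-vector about the z axis by angle theta (an element of O(3))."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]]

L = [[4.0, 0.0, 0.0], [1.0, 3.5, 0.0], [0.5, 0.3, 5.0]]
f = [0.25, 0.5, 0.75]
x = frac_to_cart(L, f)
f_rotated = cart_to_frac([rotate_z(l, 0.7) for l in L], rotate_z(x, 0.7))
print(f_rotated)  # the same fractional coordinates, up to floating-point error
```

Rotating $L$ and $\mathbf{x}$ together changes every Cartesian number but not $\mathbf{f}$, which is why fractional coordinates already carry the O(3) invariance discussed below.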

Lattice parameters. In crystallography, the lattice matrix $L$ can be converted to an invariant representation. This representation consists of three lattice lengths $\mathbf{l} = [l_1, l_2, l_3]^\top$, where $l_i = \|\mathbf{l}_i\|_2$, and three lattice angles $\boldsymbol{\phi} = [\phi_{23}, \phi_{13}, \phi_{12}]^\top$, where $\phi_{ij}$ is the angle between $\mathbf{l}_i$ and $\mathbf{l}_j$ (Hofmann & Apostolakis, 2003; Luo et al., 2023). This paper uses the lattice parameters $C = [\mathbf{l}, \boldsymbol{\phi}] \in \mathbb{R}^{3 \times 2}$ instead of the lattice matrix and represents the crystal by $M = (A, F, C)$.
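To make the lattice parameters concrete, the following sketch computes $C(L)$ as rows $[l_i, \phi_{jk}]$. It checks two properties used later: $C(QL) = C(L)$ for a rotation or reflection $Q$, and swapping two lattice vectors swaps the corresponding rows of $C$, which is the row permutation $PC$ of Definition 4.3. This is our own illustration with hypothetical function names. Angles are in radians, and the preprocessing of Appendix B.2 is not included.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def angle(u, v):
    """Angle between two vectors, in radians."""
    cos = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos)))

def lattice_params(lattice):
    """C(L) as a 3x2 list: row i = [l_i, phi_jk], with phi = [phi23, phi13, phi12]."""
    l1, l2, l3 = lattice
    lengths = [norm(l1), norm(l2), norm(l3)]
    angles = [angle(l2, l3), angle(l1, l3), angle(l1, l2)]
    return [[lengths[i], angles[i]] for i in range(3)]

def matvec(Q, v):
    """Matrix-vector product Q v for a 3x3 matrix Q."""
    return [sum(Q[r][c] * v[c] for c in range(3)) for r in range(3)]

def close(A, B, tol=1e-9):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

L = [[4.0, 0.0, 0.0], [1.0, 3.5, 0.0], [0.5, 0.3, 5.0]]
C = lattice_params(L)

# O(3) invariance: a rotation about z combined with the reflection z -> -z.
c, s = math.cos(0.9), math.sin(0.9)
Q = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, -1.0]]
C_Q = lattice_params([matvec(Q, l) for l in L])

# Lattice permutation: swapping l1 and l2 swaps rows 0 and 1 of C.
C_swapped = lattice_params([L[1], L[0], L[2]])
print(close(C, C_Q), close(C_swapped, [C[1], C[0], C[2]]))  # True True
```

Pairing each length $l_i$ with the angle $\phi_{jk}$ opposite it keeps the rows of $C$ aligned with the lattice bases, so a permutation of the bases acts on $C$ as a plain row permutation.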

Task definition. The CSP task entails predicting the lattice parameters $C$ and the fractional coordinate matrix $F$ of each unit cell from its chemical composition $A$. Specifically, this involves learning the conditional distribution $p(C, F \mid A)$.

This section first outlines the symmetries inherent in crystal geometry. It then gives an overview of EquiCSP, introduces the joint equivariant diffusion process applied to $C$ and $F$, and finally describes the architecture of the denoising model.

4.1. Symmetries of Crystal Structure Distribution

The primary challenge of CSP lies in capturing the symmetries of the crystal structure distribution. To tackle this, we require the distribution $p(C, F \mid A)$ to satisfy four key symmetries: composition permutation invariance, O(3) invariance, periodic translation invariance, and lattice permutation invariance. Detailed definitions are provided below.

Definition 4.1 (Composition Permutation Invariance). For any permutation $P \in S_n$, $p(C, F \mid A) = p(C, FP \mid AP)$; that is, changing the order of the atoms does not change the distribution. Here, $S_n$ denotes the set of $n \times n$ permutation matrices.

Definition 4.2 (O(3) Invariance). Let $Q \in \mathbb{R}^{3 \times 3}$ be any element of the O(3) group acting on $L$. Then $p(C(QL), F \mid A) = p(C(L), F \mid A)$ holds; that is, the distribution is invariant under any rotation or reflection applied to $L$. Here, $C(\cdot)$ is the function that maps a lattice matrix to its lattice parameters.

Definition 4.3 (Lattice Permutation Invariance). For any permutation $P \in S_3$, $p(C, F \mid A) = p(PC, PF \mid A)$; that is, changing the order of the lattice bases does not change the distribution.

Definition 4.4 (Periodic Translation Invariance). For any translation $\mathbf{t} \in \mathbb{R}^{3 \times 1}$, $p(C, w(F + \mathbf{t}\mathbf{1}^\top) \mid A) = p(C, F \mid A)$. Here, the function $w(F) = F - \lfloor F \rfloor \in [0, 1)^{3 \times n}$ returns the fractional part of each element of $F$, and $\mathbf{1} \in \mathbb{R}^{n \times 1}$ is the all-ones vector. That is, any periodic translation of $F$ leaves the distribution unchanged.

Figure 1. (a), (b): Lattice permutation of the lattice bases $\mathbf{l}_1, \mathbf{l}_2$. (c), (d): Periodic translation of the fractional coordinates $\mathbf{f}_1, \mathbf{f}_2$. (e), (f): Schematic diagram of the periodic translation, with coordinates represented as points on a circle. Neither operation changes the crystal structure. A 2D crystal is used for clarity.
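Definitions 4.3 and 4.4 can be checked numerically. The sketch below stores $F$ as a 3 × n list of rows, applies $w(F + \mathbf{t}\mathbf{1}^\top)$, and confirms that the wrapped relative displacements between atoms are unchanged. It also confirms that permuting the lattice vectors together with the rows of $F$ leaves every Cartesian position unchanged. It is illustrative only, and all helper names are hypothetical.

```python
import math

def wrap(v):
    """w(.): the fractional part v - floor(v)."""
    return v - math.floor(v)

def translate(F, t):
    """w(F + t 1^T): shift every atom (column of F) by t and wrap each entry."""
    return [[wrap(F[a][j] + t[a]) for j in range(len(F[a]))] for a in range(3)]

def permute_rows(M, perm):
    """P M: row i of the result is row perm[i] of M."""
    return [M[p] for p in perm]

def cart(lattice, F, j):
    """Cartesian position of atom j: sum_i F[i][j] * l_i (lattice rows are l_i)."""
    return [sum(F[i][j] * lattice[i][k] for i in range(3)) for k in range(3)]

def rel(F, i, j):
    """Wrapped displacement from atom i to atom j along each axis, in [-0.5, 0.5)."""
    return [wrap(F[a][j] - F[a][i] + 0.5) - 0.5 for a in range(3)]

L = [[4.0, 0.0, 0.0], [1.0, 3.5, 0.0], [0.5, 0.3, 5.0]]
F = [[0.10, 0.55, 0.85], [0.20, 0.40, 0.95], [0.00, 0.30, 0.70]]  # 3 axes x 3 atoms

F_shift = translate(F, [0.37, -0.81, 2.25])  # a periodic translation (Def. 4.4)
perm = (2, 0, 1)                             # a lattice permutation (Def. 4.3)
L_perm, F_perm = permute_rows(L, perm), permute_rows(F, perm)
print(rel(F, 0, 1), rel(F_shift, 0, 1))
```

The absolute entries of `F_shift` differ from `F`, but the pairwise geometry does not. This is the sense in which a periodic translation describes the same crystal.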

Using GNNs as the foundational architecture effectively achieves composition permutation invariance (Kipf & Welling, 2016). Following previous work (Jiao et al., 2023), the fractional coordinate system handles the O(3) invariance of crystals, because fractional coordinates are unchanged by orthogonal transformations of the lattice matrix. Previous work (Luo et al., 2023) further addresses the O(3) invariance of the lattice matrix by replacing it with the lattice parameters $C$, since $C(QL) = C(L)$ holds for arbitrary $Q \in O(3)$. Consequently, our representation of crystals with fractional coordinates and lattice parameters naturally satisfies O(3) invariance. We therefore focus mainly on lattice permutation and periodic translation invariance, as shown in Figure 1. To illustrate periodic translation invariance, we use the representation method of (Mardia et al., 2000). Appendix B.1 details how lattice bases are represented as circles in Figure 1 (e) and (f).

Comparison with other symmetry-aware generation methods. We note that previous approaches (Xie et al., 2021; Luo et al., 2023; Jiao et al., 2023) ignore lattice permutation invariance, in both the CSP task and the ab initio generation task. The ab initio generation method SymMat (Luo et al., 2023) directly generates the lattice parameters $C$ from random noise $\epsilon$ using a variational autoencoder. However, it does not guarantee that the marginal distribution satisfies $p(C) = p(PC)$ for every $P \in S_3$; that is, it does not guarantee lattice permutation invariance. DiffCSP (Jiao et al., 2023) generates the lattice matrix with a diffusion model. However, as discussed in Section 4.3, its diffusion process lacks lattice permutation equivariance, so the final lattice distribution is not invariant. Our method is the first to realize this symmetry of crystal structures, and we quantify its benefit through ablations in Section 5.2.

Figure 2. Overview of the training process in EquiCSP.

4.2. An Overview of EquiCSP

In our work, we implement EquiCSP by jointly diffusing $C$ and $F$ within the framework of DiffCSP (Jiao et al., 2023). For a given atomic composition $A$, the intermediate state of $C$ and $F$ at time step $t$ ($0 \le t \le T$) is denoted by $M_t$. EquiCSP defines two Markov processes. The forward diffusion process incrementally adds noise to $M_0$. The backward generation process starts from the prior distribution of $M_T$ and progressively reconstructs the data $M_0$. The implementation details are summarized in Algorithms 1 and 2.

In light of the symmetries discussed in Section 4.1, the distribution recovered from $M_T$ must be invariant. This requirement is met if the prior distribution $p(M_T)$ is invariant and the Markov transition $p(M_{t-1} \mid M_t)$ is equivariant, as established in previous literature (Xu et al., 2021). An equivariant transition satisfies $p(g \cdot M_{t-1} \mid g \cdot M_t) = p(M_{t-1} \mid M_t)$ for any transformation $g$ acting on $M$, as defined in Definitions 4.3 and 4.4. We next detail how the diffusion processes are applied to $C$ and $F$.

4.3. Diffusion on Lattice Parameters

$C$ is a continuous variable with lattice lengths $\mathbf{l} > 0$ and lattice angles $\boldsymbol{\phi} \in (0, \pi)^3$. We therefore generate it with the Denoising Diffusion Probabilistic Model (DDPM) (Ho et al., 2020), after a preprocessing step on $C$. As detailed in Appendix B.2, this preprocessing maps the domain of $C$ onto $\mathbb{R}^{3 \times 2}$. Hereinafter, $C$ refers to the projected lattice parameters.

4.3.1. GENERATION

We define the generation process, which progressively transforms the Normal prior $p(C_T)$ into the stable crystal lattice distribution $p(C_0)$, as:

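$$p(C_{t-1} \mid M_t) = \mathcal{N}\big(C_{t-1} \mid \mu(M_t), \sigma^2(M_t) I\big), \tag{2}$$

where $\mu(M_t) = \frac{1}{\sqrt{\alpha_t}}\Big(C_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\,\hat{\epsilon}_L(M_t, t)\Big)$ and $\sigma^2(M_t) = \beta_t \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}$. The denoising term $\hat{\epsilon}_L(M_t, t) \in \mathbb{R}^{3 \times 2}$ is predicted by the neural network $\phi(C_t, F_t, A, t)$ detailed in Section 4.5.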
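The ancestral update of Eq. (2) can be sketched in a few lines. This illustrative plain-Python version treats $C_t$ as a flat list of six numbers and takes the predicted noise $\hat{\epsilon}_L$ as an input rather than calling a real network. The name `reverse_step` and its argument layout are our own assumptions.

```python
import math
import random

def reverse_step(c_t, eps_hat, t, alphas, alpha_bars, rng):
    """Sample C_{t-1} ~ N(mu(M_t), sigma^2(M_t) I) as in Eq. (2).

    t is 1-indexed: alphas[t - 1] = alpha_t and alpha_bars[t - 1] = bar-alpha_t.
    """
    a_t, ab_t = alphas[t - 1], alpha_bars[t - 1]
    ab_prev = alpha_bars[t - 2] if t > 1 else 1.0  # bar-alpha_0 = 1
    beta_t = 1.0 - a_t
    mean = [(c - beta_t / math.sqrt(1.0 - ab_t) * e) / math.sqrt(a_t)
            for c, e in zip(c_t, eps_hat)]
    var = beta_t * (1.0 - ab_prev) / (1.0 - ab_t)  # equals 0 at t = 1
    std = math.sqrt(var)
    return [m + std * rng.gauss(0.0, 1.0) for m in mean]
```

A useful sanity check follows from the formula. If $\hat{\epsilon}_L$ equals the noise actually used to produce $C_1$ from $C_0$, the final step ($t = 1$, zero variance) returns $C_0$ up to rounding.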

As the prior distribution $p(C_T) = \mathcal{N}(0, I)$ is already lattice permutation invariant, we require the generation process in Eq. (2) to be lattice permutation equivariant. This is stated formally below, with a proof in Appendix A.1.

Proposition 4.5. The marginal distribution $p(C_0)$ generated by Eq. (2) is lattice permutation invariant if $\hat{\epsilon}_L(M_t, t)$ is lattice permutation equivariant, namely $\hat{\epsilon}_L(PC_t, PF_t, A, t) = P\hat{\epsilon}_L(C_t, F_t, A, t)$ for all $P \in S_3$.


4.3.2. TRAINING

We define the forward process as one that gradually diffuses $C_0$ towards the Normal prior $p(C_T) = \mathcal{N}(0, I)$. The process is specified by the transition $q(C_t \mid C_{t-1})$, whose marginal conditioned on the initial state is:

$$q(C_t \mid C_0) = \mathcal{N}\big(C_t \mid \sqrt{\bar{\alpha}_t}\, C_0, (1 - \bar{\alpha}_t) I\big), \tag{3}$$

where $\beta_t \in (0, 1)$ controls the variance, and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s = \prod_{s=1}^{t} (1 - \beta_s)$ is set according to the cosine scheduler (Nichol & Dhariwal, 2021).
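For reference, the cosine schedule of Nichol & Dhariwal (2021) can be written as below. The offset $s = 0.008$ and the clipping of $\beta_t$ at 0.999 are the values proposed in that paper. Whether EquiCSP uses exactly these values is an assumption.

```python
import math

def cosine_alpha_bars(T, s=0.008, max_beta=0.999):
    """Return (betas, alpha_bars) for t = 1..T under the cosine schedule.

    bar-alpha_t = f(t) / f(0) with f(t) = cos^2(((t / T + s) / (1 + s)) * pi / 2);
    beta_t = 1 - f(t) / f(t - 1) is clipped at max_beta, and bar-alpha_t = prod_s (1 - beta_s).
    """
    def f(t):
        return math.cos(((t / T + s) / (1 + s)) * math.pi / 2) ** 2

    betas, alpha_bars, prod = [], [], 1.0
    for t in range(1, T + 1):
        beta = min(1.0 - f(t) / f(t - 1), max_beta)
        prod *= 1.0 - beta
        betas.append(beta)
        alpha_bars.append(prod)
    return betas, alpha_bars

betas, alpha_bars = cosine_alpha_bars(1000)
print(alpha_bars[0], alpha_bars[-1])  # close to 1 at the start, close to 0 at the end
```

Computing $\bar{\alpha}_t$ as a running product of $1 - \beta_s$ keeps it consistent with the definition above, even where clipping is active.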

To train the denoising model $\phi$, we first sample $\epsilon_L \sim \mathcal{N}(0, I)$ and reparameterize $C_t = \sqrt{\bar{\alpha}_t}\, C_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_L$ based on Eq. (3). The training objective is then to minimize the $\ell_2$ loss between $\epsilon_L$ and its estimate $\hat{\epsilon}_L$:

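$$\mathcal{L}_C = \mathbb{E}_{\epsilon_L \sim \mathcal{N}(0, I),\, t \sim \mathcal{U}(1, T)}\big[\|\epsilon_L - \hat{\epsilon}_L(M_t, t)\|_2^2\big]. \tag{4}$$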
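A minimal sketch of one training sample for Eq. (4) follows. It assumes the lattice parameters have already been preprocessed into a flat list of six real numbers. `noise_lattice` and `lattice_loss` are hypothetical helpers, and the denoising network itself is not shown.

```python
import math
import random

def noise_lattice(c0, alpha_bar_t, rng):
    """Reparameterized sample from Eq. (3): C_t = sqrt(ab) C_0 + sqrt(1 - ab) eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in c0]
    c_t = [math.sqrt(alpha_bar_t) * c + math.sqrt(1.0 - alpha_bar_t) * e
           for c, e in zip(c0, eps)]
    return c_t, eps

def lattice_loss(eps, eps_hat):
    """Squared l2 error inside the expectation of Eq. (4)."""
    return sum((a - b) ** 2 for a, b in zip(eps, eps_hat))

rng = random.Random(0)
c0 = [1.0, -2.0, 0.5, 0.3, 1.2, -0.7]
c_t, eps = noise_lattice(c0, alpha_bar_t=0.5, rng=rng)
# A real denoiser would predict eps_hat from (C_t, F_t, A, t); a perfect one gives zero loss.
print(lattice_loss(eps, eps))  # 0.0
```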

To satisfy Proposition 4.5, we add a penalty term to the training loss:

$$\mathcal{L}^{p}_{C} = \mathbb{E}\big[\|\hat{\epsilon}_L(PC_t, PF_t, A, t) - P\hat{\epsilon}_L(C_t, F_t, A, t)\|_2^2\big], \tag{5}$$

where the expectation is taken with respect to $t \sim \mathcal{U}(1, T)$ and $P \sim \mathcal{U}(S_3)$. After sufficient training, the generation process approximately satisfies lattice permutation invariance, as stated in Proposition 4.5. Our ablation experiments show that this penalty significantly improves performance. The learning curves in Appendix D show that learning is considerably easier than with DiffCSP (Jiao et al., 2023).
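The penalty of Eq. (5) is easy to estimate once the permutation acts on the three rows (one per lattice base) of $C$ and $F$. The sketch below computes the penalty for a given $P$ and its exact average over all six elements of $S_3$. During training one would instead draw a single $P$ at random per sample. It is illustrative only: the two toy models stand in for $\hat{\epsilon}_L$ with $A$ and $t$ held fixed, and all names are hypothetical.

```python
import itertools

PERMS = list(itertools.permutations(range(3)))  # the 6 elements of S_3

def permute(rows, perm):
    """P X: row i of the result is row perm[i] of X."""
    return [rows[p] for p in perm]

def sq_dist(a, b):
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def perm_penalty(model, C, F, perm):
    """||model(PC, PF) - P model(C, F)||_2^2 for one permutation, as in Eq. (5)."""
    return sq_dist(model(permute(C, perm), permute(F, perm)), permute(model(C, F), perm))

def expected_perm_penalty(model, C, F):
    """Exact expectation over P ~ U(S_3)."""
    return sum(perm_penalty(model, C, F, p) for p in PERMS) / len(PERMS)

# Toy stand-ins for eps_hat_L: the first treats every row alike (equivariant),
# the second depends on the row index (not equivariant).
def equivariant(C, F):
    return [[c[0] + sum(f), c[1] * 2.0] for c, f in zip(C, F)]

def not_equivariant(C, F):
    return [[c[0] * (i + 1), c[1]] for i, c in enumerate(C)]

C = [[1.0, 0.5], [2.0, 0.7], [3.0, 0.9]]
F = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(expected_perm_penalty(equivariant, C, F), expected_perm_penalty(not_equivariant, C, F) > 0)  # 0.0 True
```

The penalty is zero exactly when the model commutes with every row permutation. Minimizing it therefore pushes $\hat{\epsilon}_L$ towards the condition of Proposition 4.5 without constraining the architecture.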

Comparison with encoding the equivariance into the denoising model. We find that the Frame Averaging (FA) method (Puny et al., 2021), a unified, hard-constraint approach for building equivariant neural networks, also satisfies Proposition 4.5. Implementation details are provided in Appendix B.3. However, our experiments in Section 5.2 show that FA requires finite group operations whose computational burden makes it impractical for iterative models such as diffusion models. In contrast, our method improves both computational efficiency and accuracy simply by adding loss terms during training.
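For comparison, a hard-constraint alternative in the spirit of FA symmetrizes a network over $S_3$. The sketch below uses the simplest choice of frame, the whole group, so every call costs six forward passes. Puny et al. (2021) also allow smaller, input-dependent frames, and the construction in Appendix B.3 may differ from this illustration. The helper names are hypothetical.

```python
import itertools

PERMS = list(itertools.permutations(range(3)))

def permute(rows, perm):
    """P X: row i of the result is row perm[i] of X."""
    return [rows[p] for p in perm]

def inverse(perm):
    """The permutation that undoes `perm` under `permute`."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return tuple(inv)

def group_average(model):
    """Wrap `model` as (C, F) -> mean over g in S_3 of g^{-1} model(gC, gF)."""
    def averaged(C, F):
        outs = [permute(model(permute(C, g), permute(F, g)), inverse(g)) for g in PERMS]
        rows, cols = len(outs[0]), len(outs[0][0])
        return [[sum(o[r][c] for o in outs) / len(outs) for c in range(cols)]
                for r in range(rows)]
    return averaged

def not_equivariant(C, F):
    """A toy model whose output depends on the row index."""
    return [[C[i][0] * (i + 1) + F[i][0], C[i][1] - i] for i in range(3)]

fa_model = group_average(not_equivariant)  # exactly equivariant, but 6x the cost per call
```

Over hundreds of diffusion steps this sixfold cost accumulates at inference time. A soft penalty adds cost only during training.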

4.4. Diffusion on Fractional Coordinates

Jiao et al. (2023) proposed combining the score-matching (SM) framework with the wrapped normal (WN) distribution (SMWN). This combination is advantageous for generating fractional coordinates because of their periodicity and $[0, 1)$ constraint. Building on SMWN, we propose a new noising algorithm that satisfies periodic translation equivariance, detailed as follows:

4.4.1. GENERATION

In the generation process, we first initialize $F_T$ from the uniform distribution $\mathcal{U}(0, 1)$, which is periodic translation invariant. The model $\phi(C_t, F_t, A, t)$ predicts the denoising term $\hat{\epsilon}_F(M_t, t)$, which models the data score $\nabla_{F_t} \log p(F_t)$. We combine this with the ancestral predictor and the Langevin corrector used in DiffCSP (Jiao et al., 2023) to sample $F_0$. This method can be viewed simply as progressively sampling from the wrapped normal distribution $p(F_{t-1} \mid M_t)$ at each time step $t$. The mean of this wrapped normal is a function of $\hat{\epsilon}_F$; the detailed formula is given in Eq. (29) of Appendix A.2. For $p(F_{t-1} \mid M_t)$ to satisfy periodic translation equivariance, $\hat{\epsilon}_F$ must be periodic translation invariant, in accordance with (Jiao et al., 2023):

$$\hat{\epsilon}_F(C_t, F_t, A, t) = \hat{\epsilon}_F\big(C_t, w(F_t + \mathbf{t}\mathbf{1}^\top), A, t\big), \tag{6}$$

where $\mathbf{t} \in \mathbb{R}^3$ and the truncation function $w(\cdot)$ is defined in Definition 4.4. In Section 4.5, we ensure that the model output has this property, which guarantees the equivariance of generation.

Similarly, the data score $\nabla_{F_t} \log p(F_t)$ must be periodic translation invariant. This score serves as the training target for $\hat{\epsilon}_F(M_t, t)$, and estimating it accurately is the primary challenge we tackle in Section 4.4.2.

In addition, we require the generation process to be lattice permutation equivariant. This is stated formally below, with a proof in Appendix A.2.

Proposition 4.6. The marginal distribution $p(F_0)$ is lattice permutation invariant if $\hat{\epsilon}_F(M_t, t)$ is lattice permutation equivariant, namely $\hat{\epsilon}_F(PC_t, PF_t, A, t) = P\hat{\epsilon}_F(C_t, F_t, A, t)$ for all $P \in S_3$.

4.4.2. TRAINING

During the forward process, SMWN samples each column of $\epsilon \in \mathbb{R}^{3 \times n}$ from the wrapped normal distribution $\mathcal{N}_w(0, \sigma_t^2 I)$ and then obtains $F_t = w(F_0 + \epsilon)$. Here, $\mathcal{N}_w(0, \sigma_t^2 I)$ denotes the probability density function (PDF) of the WN distribution with mean 0, variance $\sigma_t^2$, and period 1. $\sigma_t$ is the noise level, with $\sigma_1 < \sigma_2 < \dots < \sigma_T$. By the properties of the WN distribution, if $\sigma_T$ is sufficiently large, $p(F_T)$ approaches the uniform distribution $\mathcal{U}(0, 1)$, which is desirable for generation. Our training target is:
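A small sketch of the SMWN forward step follows. It relies on the standard fact that wrapping a normal sample gives a wrapped-normal sample. To show $F_t$ approaching $\mathcal{U}(0, 1)$ as $\sigma_t$ grows, it also measures the mean resultant length $\lvert \frac{1}{N}\sum e^{2\pi i f} \rvert$. This quantity is near 1 for concentrated angles and near 0 for uniform ones. The function names are hypothetical.

```python
import math
import random

def wrap(v):
    """w(.): fractional part, mapping a real number into [0, 1)."""
    return v - math.floor(v)

def noise_frac_coords(F0, sigma, rng):
    """Forward SMWN step: eps ~ N_w(0, sigma^2) entrywise, F_t = w(F_0 + eps)."""
    eps = [[wrap(rng.gauss(0.0, sigma)) for _ in row] for row in F0]
    Ft = [[wrap(f + e) for f, e in zip(rf, re)] for rf, re in zip(F0, eps)]
    return Ft, eps

def mean_resultant_length(values):
    """|mean of exp(2*pi*i*v)|: near 1 for concentrated data, near 0 for uniform data."""
    c = sum(math.cos(2 * math.pi * v) for v in values) / len(values)
    s = sum(math.sin(2 * math.pi * v) for v in values) / len(values)
    return math.hypot(c, s)

rng = random.Random(0)
F0 = [[0.1] * 2000 for _ in range(3)]  # 3 axes x 2000 copies of one atom
for sigma in (0.01, 0.1, 2.0):
    Ft, _ = noise_frac_coords(F0, sigma, rng)
    print(sigma, round(mean_resultant_length(Ft[0]), 3))
```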

$$\hat{\epsilon}_F(M_t, t) \approx \nabla_{F_t} \log q(F_t). \tag{7}$$

The pivotal challenge is how to obtain the score matrix $\nabla_{F_t} \log q(F_t)$ accurately while maintaining periodic translation invariance, which the conventional diffusion framework does not guarantee. For instance, DiffCSP (Jiao et al., 2023) estimates the score with the ordinary denoising score matching (Vincent, 2011) training objective:

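$$\mathcal{L}_F = \mathbb{E}_{F_0 \sim q(F_0),\, F_t \sim q(F_t \mid F_0),\, t \sim \mathcal{U}(1, T)}\Big[\lambda_t \big\|\nabla_{F_t} \log q(F_t \mid F_0) - S\big\|_2^2\Big], \tag{8}$$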

where $S$ is the estimate of $\nabla_{F_t} \log q(F_t)$ and $\lambda_t$ is the loss weight. The core issue is that the dataset distribution $q(F_0)$ typically lacks the periodic translation invariance of the ground-truth distribution $p(F_0)$. This is because the dataset usually does not contain all the samples $w(F_0 + \mathbf{t}\mathbf{1}^\top)$. Consequently, $S$ cannot be guaranteed to be periodic translation invariant. For illustration, consider a dataset containing a single sample $\bar{F}_0$. The estimated score is then:

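$$\begin{aligned} S &= \nabla_{F_t} \log q(F_t \mid F_0 = \bar{F}_0) \\ &= \nabla_{F_t} \log \mathcal{N}_w(F_t \mid \bar{F}_0, \sigma_t^2 I) \\ &= \nabla_{\epsilon} \log \mathcal{N}_w(\epsilon \mid 0, \sigma_t^2 I), \end{aligned} \tag{9}$$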

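and, in general,

$$\nabla_{\epsilon} \log \mathcal{N}_w(\epsilon \mid 0, \sigma_t^2 I) \neq \nabla_{w(\epsilon + \mathbf{t}\mathbf{1}^\top)} \log \mathcal{N}_w\big(w(\epsilon + \mathbf{t}\mathbf{1}^\top) \mid 0, \sigma_t^2 I\big), \tag{10}$$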

which means the predicted score is not invariant. Numerous studies (Luo et al., 2023; 2021; Niu et al., 2020; Jin et al., 2023) have also indicated that the score matrix must be determined carefully to ensure equivariance.
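To see Eq. (10) numerically, the sketch below evaluates the one-dimensional WN score $\frac{d}{dx}\log \mathcal{N}_w(x \mid 0, \sigma^2)$. It truncates the sum over periodic images to $|k| \le 10$, an approximation we choose for illustration. Shifting $\epsilon$ by a periodic translation changes the score, which is exactly the lack of invariance discussed above. The function names are hypothetical.

```python
import math

def wrap(v):
    return v - math.floor(v)

def wn_score(x, sigma, n_terms=10):
    """d/dx log N_w(x | 0, sigma^2) with period 1, summing images x + k for |k| <= n_terms."""
    num = den = 0.0
    for k in range(-n_terms, n_terms + 1):
        d = x + k
        w = math.exp(-d * d / (2.0 * sigma * sigma))
        num += -d / (sigma * sigma) * w
        den += w
    return num / den

eps, shift, sigma = 0.1, 0.3, 0.2
print(wn_score(eps, sigma), wn_score(wrap(eps + shift), sigma))  # about -2.5 vs. about -8.1
```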

A potential solution to this issue is to augment the dataset with periodic translations so that $q(F_0)$ better matches the invariant distribution $p(F_0)$. However, this approach demands a significant amount of training time, because the periodic translation group is a Lie group with infinitely many elements.

We propose Periodic CoM-free Noising, a new noising method that ensures noising the dataset distribution $q(F_0)$ yields a periodic translation invariant score. This score stays as close as possible to the score obtained by noising the original invariant distribution $p(F_0)$. The method is based on the following statement: $\nabla_{F_t} \log q(F_t)$ is periodic translation invariant if $\nabla_{F_t} \log q(F_t \mid F_0)$ is periodic translation invariant, i.e.,

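$$\nabla_{F_t} \log q(F_t \mid F_0) = \nabla_{w(F_t + \mathbf{t}\mathbf{1}^\top)} \log q\big(w(F_t + \mathbf{t}\mathbf{1}^\top) \mid F_0\big), \tag{11}$$

where $\mathbf{t} \in \mathbb{R}^3$.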

Designing the noising method is thus equivalent to designing $\nabla_{F_t} \log q(F_t \mid F_0)$. We therefore first construct a $\nabla_{F_t} \log q(F_t \mid F_0)$ that satisfies periodic translation invariance, and then adjust the score numerically for more accurate training.

Periodic CoM-free Noising. To satisfy Eq. (11), we adopt the following parameterization of $\nabla_{F_t} \log q(F_t \mid F_0)$:

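$$\nabla_{F_t} \log q(F_t \mid F_0) = \nabla_{\tilde{F}} \log \mathcal{N}_w(\tilde{F} \mid F_0, \sigma_t^2 I) = \nabla_{\tilde{\epsilon}} \log \mathcal{N}_w(\tilde{\epsilon} \mid 0, \sigma_t^2 I), \tag{12}$$

$$\tilde{F} = w(F_0 + \tilde{\epsilon}), \tag{13}$$

$$\tilde{\epsilon} = m(\epsilon) = m\big(w(\epsilon + \mathbf{t}\mathbf{1}^\top)\big), \quad \forall\, \mathbf{t} \in \mathbb{R}^3. \tag{14}$$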

Here, we introduce a noise conversion function $m(\cdot)$. It maps all fractional coordinate matrices that are periodic translation equivalent to $F_t$ onto a unique matrix $\tilde{F}$, which satisfies the requirement that the score be periodic translation invariant. Consequently, we can compute $\nabla_{\tilde{F}} \log q(\tilde{F} \mid F_0)$ as a substitute for the required score using the ordinary score calculation, here the score of the anisotropic WN.

The key to Periodic CoM-free Noising is the design of the function $m: w(\epsilon + \mathbf{t}\mathbf{1}^\top) \mapsto \tilde{\epsilon}$. We note that CoM-free systems in molecular conformation generation (Xu et al., 2022) solve a similar problem for translation invariance. However, the center of mass (CoM) of periodic data cannot simply be computed as the mean of the data (Bai & Breen, 2008). Following the idea of CoM-free systems, we use the concept of mean angle from (Mardia et al., 2000) to construct $m(\cdot)$ as a periodic CoM-free function. Specifically, we denote $\epsilon = [\epsilon_1, \epsilon_2, \dots, \epsilon_n]$ and formulate:

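$$m(\epsilon) = w\Big(\epsilon - \frac{1}{2\pi}\operatorname{atan2}\big(y(\epsilon), x(\epsilon)\big)\mathbf{1}^\top\Big), \quad y(\epsilon) = \sum_{i=1}^{n} \sin(2\pi\epsilon_i), \quad x(\epsilon) = \sum_{i=1}^{n} \cos(2\pi\epsilon_i), \tag{15}$$

where $\sin$, $\cos$, and $\operatorname{atan2}$ act elementwise, so $y(\epsilon), x(\epsilon) \in \mathbb{R}^3$.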

Intuitively, as shown in Figure 3, the function maps the periodic data on each lattice axis to angles on a circle and subtracts the periodic CoM from all of them. It thus maps all periodically equivalent data to the same representation, which yields periodic translation invariance. We provide a proof in Appendix A.3.
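The periodic CoM-free map of Eq. (15) can be sketched directly. In this plain-Python illustration (`com_free` is a hypothetical name), each row of $\epsilon$ is treated as angles on a circle, and its circular mean $\operatorname{atan2}(y, x)/(2\pi)$ is subtracted before wrapping. The check at the end confirms that periodically translated inputs map to the same output. When $x = y = 0$ on some axis the circular mean is undefined. This sketch then falls back to `atan2(0, 0) = 0`, which is an arbitrary choice.

```python
import math

def wrap(v):
    return v - math.floor(v)

def com_free(eps):
    """Periodic CoM-free map m(.) of Eq. (15) for a 3 x n matrix stored as 3 rows."""
    out = []
    for row in eps:  # one lattice axis at a time
        y = sum(math.sin(2 * math.pi * e) for e in row)
        x = sum(math.cos(2 * math.pi * e) for e in row)
        center = math.atan2(y, x) / (2 * math.pi)  # periodic centre of mass, in cell units
        out.append([wrap(e - center) for e in row])
    return out

def same_on_circle(A, B, tol=1e-9):
    """Compare two matrices of fractional values modulo 1."""
    return all(abs(wrap(a - b + 0.5) - 0.5) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

eps = [[0.05, 0.20, 0.90], [0.50, 0.60, 0.70], [0.10, 0.30, 0.95]]
t = [0.37, -0.81, 2.25]
shifted = [[wrap(e + ti) for e in row] for row, ti in zip(eps, t)]
print(same_on_circle(com_free(eps), com_free(shifted)))  # True
```

A plain arithmetic mean would fail here. For example, 0.05 and 0.95 average to 0.5, although on the circle they sit on either side of 0. The atan2 of summed sines and cosines gives the correct circular centre.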

After implementing the algorithm, we replaced $\nabla_{F_t} \log q(F_t \mid F_0)$ in the denoising score matching objective of Eq. (8) with Eq. (12). This change led to significant performance improvements in our ablation study and to good training convergence, as shown in Appendix D. However, writing $\tilde{F} = [\tilde{\mathbf{f}}_1, \tilde{\mathbf{f}}_2, \dots, \tilde{\mathbf{f}}_n]$ and $F_t = [\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_n]$, we identified two points that still require improvement:

1. The marginal distribution² $q(\tilde{\mathbf{f}}_i \mid F_0)$ does not simply follow $\mathcal{N}_w(0, \sigma_t^2 I)$, so its distribution must be re-evaluated to recompute $\nabla_{\tilde{F}} \log q(\tilde{F} \mid F_0)$.
2. The generation process is designed for the non-periodic-CoM-free system, whereas our method simply uses the periodic CoM-free score $\nabla_{\tilde{F}} \log q(\tilde{F})$ in place of the corresponding $\nabla_{F_t} \log q(F_t)$. A more rigorous probabilistic model is needed to connect the two scores.

We provide solutions below.

²We focus on evaluating the score of the marginal distribution instead of the joint distribution to better align with the original noise addition techniques employed in SDEs.

Figure 3. Illustration of periodic translation invariance with the periodic CoM-free function.

Von Mises Simulation. We denote $\tilde{\epsilon} = [\tilde{\epsilon}_1, \tilde{\epsilon}_2, \dots, \tilde{\epsilon}_n]$. To evaluate the PDF $q(\tilde{\mathbf{f}}_i \mid F_0)$, we note that, since $\tilde{F} = w(F_0 + \tilde{\epsilon})$, the task can be reframed as estimating the marginal PDF of $\tilde{\epsilon} = m(\epsilon)$, namely $p(\tilde{\epsilon}_i)$, where $\epsilon \sim \mathcal{N}_w(0, \sigma_t^2 I)$. However, a closed-form expression for $p(\tilde{\epsilon}_i)$ is hard to obtain because of the complexity of $m(\cdot)$ and of the WN distribution. For simplicity, we approximate $p(\tilde{\epsilon}_i)$ with the von Mises distribution (Gatto & Jammalamadaka, 2007), which is commonly used for circular data, and obtain its parameters with the Monte Carlo method. More details are given in Appendix C.1. Consequently, $\nabla_{\tilde{F}} \log q(\tilde{F} \mid F_0) = [\tilde{\mathbf{c}}_1, \tilde{\mathbf{c}}_2, \dots, \tilde{\mathbf{c}}_n]$ can be expressed using the score of the von Mises distribution:

$$\tilde{\mathbf{c}}_i = \nabla_{\tilde{\mathbf{f}}_i} \log q(\tilde{\mathbf{f}}_i \mid F_0) = \nabla_{\tilde{\epsilon}_i} \log p(\tilde{\epsilon}_i) = -2\pi\,\kappa(n, \sigma_t)\sin(2\pi\tilde{\epsilon}_i), \tag{16}$$

where $\kappa(n, \sigma_t)$ is the concentration parameter of the von Mises distribution, obtained by the Monte Carlo method. Using the revised $\nabla_{\tilde{F}} \log q(\tilde{F} \mid F_0)$ thus yields a more accurate estimate of the invariant score $\nabla_{\tilde{F}} \log q(\tilde{F}) = [\tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_2, \dots, \tilde{\mathbf{s}}_n]$.

Probabilistic Modeling Process. For the second point, we propose a novel probabilistic modeling process inspired by (Luo et al., 2023). We denote the score of the non-periodic-CoM-free system as $\nabla_{F_t} \log q(F_t) = [\mathbf{s}_1, \mathbf{s}_2, \dots, \mathbf{s}_n]$. To achieve periodic translation invariance, we treat each $\mathbf{s}_i$ as a function of all the corresponding CoM-free data $\{\tilde{\mathbf{f}}_1, \tilde{\mathbf{f}}_2, \dots, \tilde{\mathbf{f}}_n\}$. By the chain rule, we can approximate the score $\nabla_{F_t} \log q(F_t)$ by the score of the CoM-free system $\nabla_{\tilde{F}} \log q(\tilde{F})$:

$$\mathbf{s}_j = \nabla_{\mathbf{f}_j} \log q(F_t) \approx \sum_{i=1}^{n} \nabla_{\tilde{\mathbf{f}}_i} \log q(\tilde{\mathbf{f}}_i)\,\frac{\partial \tilde{\mathbf{f}}_i}{\partial \mathbf{f}_j} = \sum_{i=1}^{n} \tilde{\mathbf{s}}_i\,\frac{\partial \tilde{\mathbf{f}}_i}{\partial \mathbf{f}_j}. \tag{17}$$
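The von Mises score in Eq. (16), and one way to obtain $\kappa(n, \sigma_t)$ by Monte Carlo, can be sketched as follows. Appendix C.1 is not reproduced here, so the fitting procedure is our assumption. We simulate $\epsilon$ on a single axis, apply the periodic CoM shift of Eq. (15), and convert the pooled mean resultant length into $\kappa$. For that last step we use the standard Best and Fisher approximation to the von Mises maximum-likelihood estimate. All function names are hypothetical.

```python
import math
import random

def wrap(v):
    return v - math.floor(v)

def von_mises_score(e, kappa):
    """Eq. (16): d/de log p(e) for p(e) proportional to exp(kappa * cos(2*pi*e)), period 1."""
    return -2.0 * math.pi * kappa * math.sin(2.0 * math.pi * e)

def kappa_from_resultant(r):
    """Best & Fisher approximation to the von Mises MLE of kappa from mean resultant length r."""
    if r < 0.53:
        return 2 * r + r ** 3 + 5 * r ** 5 / 6
    if r < 0.85:
        return -0.4 + 1.39 * r + 0.43 / (1 - r)
    return 1 / (r ** 3 - 4 * r ** 2 + 3 * r)

def estimate_kappa(n, sigma, n_trials=500, seed=0):
    """Monte Carlo fit of kappa(n, sigma) for the CoM-shifted offsets on one lattice axis."""
    rng = random.Random(seed)
    c_sum = s_sum = 0.0
    for _ in range(n_trials):
        eps = [wrap(rng.gauss(0.0, sigma)) for _ in range(n)]
        y = sum(math.sin(2 * math.pi * e) for e in eps)
        x = sum(math.cos(2 * math.pi * e) for e in eps)
        center = math.atan2(y, x) / (2 * math.pi)
        for e in eps:
            e_tilde = wrap(e - center)  # one entry of m(eps) on this axis
            c_sum += math.cos(2 * math.pi * e_tilde)
            s_sum += math.sin(2 * math.pi * e_tilde)
    r = math.hypot(c_sum, s_sum) / (n_trials * n)
    return kappa_from_resultant(r)

kappa = estimate_kappa(n=5, sigma=0.1)
print(kappa, von_mises_score(0.05, kappa))
```

Because the fit depends only on $n$ and $\sigma_t$, a table of $\kappa$ values can be precomputed once for each noise level and atom count rather than refitted during training.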

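We previously established that $\tilde{\mathbf{s}}_i$ is invariant, but $\partial \tilde{\mathbf{f}}_i / \partial \mathbf{f}_j$ might break periodic translation invariance. Fortunately, by their definitions we can first rewrite $\partial \tilde{\mathbf{f}}_i / \partial \mathbf{f}_j$ as the $j$-th column of $\partial \tilde{\epsilon}_i / \partial \epsilon$. We then rigorously prove the following statement in Appendix A.4.

Proposition 4.7. $\partial \tilde{\epsilon}_i / \partial \epsilon = \partial m(\epsilon)[:, i] / \partial \epsilon$ is periodic translation invariant, where $m(\epsilon)[:, i]$ is the $i$-th column of $m(\epsilon)$. In other words, $\frac{\partial m(\epsilon)[:, i]}{\partial \epsilon} = \frac{\partial m(w(\epsilon + \mathbf{t}\mathbf{1}^\top))[:, i]}{\partial w(\epsilon + \mathbf{t}\mathbf{1}^\top)}$.

Thus, $\nabla_{F_t} \log q(F_t)$ given by Eq. (17) is periodic translation invariant.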

Since $m(\tilde{\epsilon}) = m(\epsilon)$ holds, we can use Proposition 4.7 to reformulate Eq. (17) as:

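$$\mathbf{s}_j \approx \sum_{i=1}^{n} \tilde{\mathbf{s}}_i\,\frac{\partial m(\tilde{\epsilon})[:, i]}{\partial \tilde{\epsilon}_j}, \tag{18}$$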

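where $\partial m(\tilde{\epsilon})[:, i] / \partial \tilde{\epsilon}_j$ is the $j$-th column of $\partial m(\tilde{\epsilon})[:, i] / \partial \tilde{\epsilon}$. This shows that computing the adjusted score $\nabla_{F_t} \log q(F_t)$ requires an additional invariant quantity $\tilde{\epsilon}$, which can be predicted by the model $\phi$.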
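Because $m(\cdot)$ acts on each lattice axis independently, the Jacobian in Eq. (18) is diagonal across axes, and on a single axis it has a closed form. We derived it ourselves by differentiating Eq. (15); it is not taken from the appendix. The result is $\partial \tilde{\epsilon}_i / \partial \epsilon_j = \delta_{ij} - \cos(2\pi\tilde{\epsilon}_j)/R$ with $R = \sum_k \cos(2\pi\tilde{\epsilon}_k)$. It depends only on $\tilde{\epsilon}$, consistent with Proposition 4.7. The sketch below implements this Jacobian and the adjusted score for one axis, and the accompanying check compares the Jacobian against finite differences. The function names are hypothetical.

```python
import math

def wrap(v):
    return v - math.floor(v)

def com_free_axis(eps):
    """m(.) of Eq. (15) restricted to one lattice axis (a list of n offsets)."""
    y = sum(math.sin(2 * math.pi * e) for e in eps)
    x = sum(math.cos(2 * math.pi * e) for e in eps)
    center = math.atan2(y, x) / (2 * math.pi)
    return [wrap(e - center) for e in eps]

def jacobian_axis(eps_tilde):
    """d eps_tilde_i / d eps_j = delta_ij - cos(2*pi*eps_tilde_j) / R, from eps_tilde only."""
    r = sum(math.cos(2 * math.pi * e) for e in eps_tilde)
    n = len(eps_tilde)
    return [[(1.0 if i == j else 0.0) - math.cos(2 * math.pi * eps_tilde[j]) / r
             for j in range(n)] for i in range(n)]

def adjusted_score_axis(s_tilde, eps_tilde):
    """Eq. (18) on one axis: s_j = sum_i s_tilde_i * d eps_tilde_i / d eps_j."""
    jac = jacobian_axis(eps_tilde)
    n = len(s_tilde)
    return [sum(s_tilde[i] * jac[i][j] for i in range(n)) for j in range(n)]

eps = [0.05, 0.20, 0.90, 0.15]
eps_tilde = com_free_axis(eps)
s_tilde = [-1.0, 0.5, 2.0, -0.3]  # stand-in for the predicted invariant score s_theta on this axis
print(adjusted_score_axis(s_tilde, eps_tilde))
```

In the sketch's setting, the model would supply `s_tilde` (via $s_\theta$) and `eps_tilde` (via $F_\theta$). Both are invariant, so the adjusted score inherits periodic translation invariance.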

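Putting things together. Finally, we combine these components into training objectives that approximate $\nabla_{\tilde{F}} \log q(\tilde{F})$ and the expected $\tilde{\epsilon}$:

$$\mathcal{L}_s = \mathbb{E}_{F_0 \sim q(F_0),\, \epsilon \sim \mathcal{N}_w(0, \sigma_t^2 I),\, t \sim \mathcal{U}(1, T)}\big[\|\nabla_{\tilde{F}} \log q(\tilde{F} \mid F_0) - s_\theta(M_t, t)\|_2^2\big], \tag{19}$$

$$\mathcal{L}_F = \mathbb{E}_{F_0 \sim q(F_0),\, \epsilon \sim \mathcal{N}_w(0, \sigma_t^2 I),\, t \sim \mathcal{U}(1, T)}\big[\|\tilde{\epsilon} - F_\theta(M_t, t)\|_2^2\big], \tag{20}$$

where $s_\theta(M_t, t)$ and $F_\theta(M_t, t)$ are predicted directly by the model $\phi$, $\nabla_{\tilde{F}} \log q(\tilde{F} \mid F_0)$ is computed by Eq. (16), and $\tilde{\epsilon} = m(\epsilon)$. Finally, we obtain $\hat{\epsilon}_F$ in Section 4.4.1 from Eq. (18) by replacing $\nabla_{\tilde{F}} \log q(\tilde{F})$ with $s_\theta(M_t, t)$ and $\tilde{\epsilon}$ with $F_\theta(M_t, t)$.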

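In addition, to satisfy Proposition 4.6, we design permutation losses similar to the one in Section 4.3:

$$\mathcal{L}^{p}_{s} = \mathbb{E}\big[\|s_\theta(PC_t, PF_t, A, t) - P s_\theta(C_t, F_t, A, t)\|_2^2\big], \tag{21}$$

$$\mathcal{L}^{p}_{F} = \mathbb{E}\big[\|F_\theta(PC_t, PF_t, A, t) - P F_\theta(C_t, F_t, A, t)\|_2^2\big], \tag{22}$$

where the expectation is taken with respect to $t \sim \mathcal{U}(1, T)$ and $P \sim \mathcal{U}(S_3)$.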


Table 1. Results on the stable structure prediction task. The results of the baseline methods are taken from Jiao et al. (2023).

| Method | Perov-5 Match rate (%) | Perov-5 RMSE | MP-20 Match rate (%) | MP-20 RMSE | MPTS-52 Match rate (%) | MPTS-52 RMSE |
|---|---|---|---|---|---|---|
| RS | 36.56 | 0.0886 | 11.49 | 0.2822 | 2.68 | 0.3444 |
| BO | 55.09 | 0.2037 | 12.68 | 0.2816 | 6.69 | 0.3444 |
| PSO | 21.88 | 0.0844 | 4.35 | 0.1670 | 1.09 | 0.2390 |
| P-cG-SchNet | 48.22 | 0.4179 | 15.39 | 0.3762 | 3.67 | 0.4115 |
| CDVAE | 45.31 | 0.1138 | 33.90 | 0.1045 | 5.34 | 0.2106 |
| DiffCSP | 52.02 | 0.0760 | 51.49 | 0.0631 | 12.19 | 0.1786 |
| EquiCSP | 52.02 | 0.0707 | 57.59 | 0.0510 | 14.85 | 0.1169 |

4.5. The Architecture of the Denoising Model

In this subsection, we outline the specific design of the denoising model $\phi(M_t)$, focusing on how it computes the three denoising terms $\hat{\epsilon}_L$, $s_\theta$, and $F_\theta$. For simplicity, the subscript $t$ is omitted in this discussion.

The model first combines the atom embeddings $f_{\text{atom}}(A)$ with sinusoidal time embeddings $f_{\text{time}}(t)$ to produce the initial node features $H = \varphi_{\text{in}}(f_{\text{atom}}(A), f_{\text{time}}(t))$. The message passing from node $j$ to node $i$ in the $l$-th layer of the network is:

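$$\mathbf{m}_{ij}^{(l)} = \varphi_m\big(\mathbf{h}_i^{(l-1)}, \mathbf{h}_j^{(l-1)}, C, \psi_{\text{FT}}(\mathbf{f}_j - \mathbf{f}_i)\big), \tag{23}$$

$$\mathbf{h}_i^{(l)} = \mathbf{h}_i^{(l-1)} + \varphi_h\Big(\mathbf{h}_i^{(l-1)}, \sum_{j=1}^{n} \mathbf{m}_{ij}^{(l)}\Big), \tag{24}$$

where $\varphi_m$ and $\varphi_h$ are MLPs. The function $\psi_{\text{FT}}: (-1, 1)^3 \to [-1, 1]^{3 \times K}$ is the Fourier transformation of the relative fractional coordinates $\mathbf{f}_j - \mathbf{f}_i$, which ensures periodic translation invariance (Jiao et al., 2023).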

After $S$ layers of message passing, we obtain the graph-level denoising term as:

$$ \hat\epsilon_L = \varphi_L\Big(\frac{1}{N}\sum_{i=1}^{N} h_i^{(S)}\Big), \qquad (25) $$

and the node-level denoising terms as:

$$ s_\theta[:, i],\ F_\theta[:, i] = \varphi_s\big(h_i^{(S)}\big),\ \varphi_F\big(h_i^{(S)}\big), \qquad (26) $$

where $\varphi_L$, $\varphi_s$, and $\varphi_F$ are MLPs.
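A minimal sketch of the readout in Eqs. (25)-(26), assuming mean pooling for the graph-level term and using hypothetical single linear layers (`make_linear`) in place of the MLPs $\varphi_L$, $\varphi_s$, $\varphi_F$; the output sizes are also assumptions. It shows that reordering the atoms leaves $\hat\epsilon_L$ unchanged while the node-level terms are reordered together with the atoms.

```python
import random


def readout(H, phi_L, phi_s, phi_F):
    """Turn final node features H (a list of N vectors h_i^(S)) into the denoising terms.

    Graph level (Eq. 25, mean pooling assumed): eps_L = phi_L(mean_i h_i).
    Node level (Eq. 26): column i of s_theta and F_theta is phi_s(h_i) and phi_F(h_i);
    here column i is stored as entry i of a Python list.
    """
    n_nodes, dim = len(H), len(H[0])
    pooled = [sum(h[d] for h in H) / n_nodes for d in range(dim)]
    eps_L = phi_L(pooled)
    s_cols = [phi_s(h) for h in H]
    F_cols = [phi_F(h) for h in H]
    return eps_L, s_cols, F_cols


def make_linear(in_dim, out_dim, rng):
    """Hypothetical single linear layer standing in for an MLP."""
    W = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda v: [sum(w * x for w, x in zip(row, v)) for row in W]


rng = random.Random(0)
N, D = 4, 5
H = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]
phi_L = make_linear(D, 6, rng)  # 6 lattice outputs: an assumption for illustration
phi_s = make_linear(D, 3, rng)  # one 3-vector per atom
phi_F = make_linear(D, 3, rng)

perm = [2, 0, 3, 1]  # reorder the atoms
eps1, s1, F1 = readout(H, phi_L, phi_s, phi_F)
eps2, s2, F2 = readout([H[k] for k in perm], phi_L, phi_s, phi_F)
print(max(abs(a - b) for a, b in zip(eps1, eps2)))  # graph-level term: unchanged up to rounding
print(s2 == [s1[k] for k in perm])                  # True: node-level terms follow the atoms
```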

5. Experiments

In this section, we evaluate EquiCSP on diverse tasks. Section 5.1 shows its capability to generate high-quality structures of different crystals, and the ablations in Section 5.2 show the necessity of each designed component. We further demonstrate EquiCSP on the ab initio generation task in Appendix E.

5.1. Stable Structure Prediction Results

Datasets. Experiments are carried out on three datasets of varying complexity. The Perov-5 dataset (Castelli et al., 2012a;b) comprises 18,928 perovskite materials with analogous structural configurations; every structure in this dataset has 5 atoms per unit cell. MP-20 comprises 45,231 stable inorganic materials curated from the Materials Project (Jain et al., 2013); it predominantly includes experimentally generated materials with at most 20 atoms per unit cell. MPTS-52 is a more challenging extension of MP-20 that contains 40,476 structures with up to 52 atoms per cell, ordered by the earliest year of publication in the literature. For Perov-5 and MP-20, we use a 60-20-20 split for training, validation, and testing, following Jiao et al. (2023). For MPTS-52, we allocate 27,380 entries for training, 5,000 for validation, and 8,096 for testing, in chronological order.

Baselines. We compare against two categories of prior methods. The first category follows a predict-optimize approach: a property predictor is trained first, and optimization algorithms are then used to search for optimal structures. Following Cheng et al. (2022), we use MEGNet (Chen et al., 2019) for formation energy prediction and, for optimization, select Random Search (RS), Bayesian Optimization (BO), and Particle Swarm Optimization (PSO), each run for 5,000 iterations. The second category consists of deep generative models. Following the modifications by Xie et al. (2021), we employ cG-SchNet (Gebauer et al., 2022), which uses SchNet (Schütt et al., 2018) as its core, and incorporate ground-truth lattice initialization to encode periodicity, resulting in the P-cG-SchNet model. Another baseline, CDVAE (Xie et al., 2021), is a VAE-based approach for crystal generation that predicts the lattice and initial composition and then optimizes atom types and coordinates with annealed Langevin dynamics; following Jiao et al. (2023), we adapt CDVAE to the CSP task. Finally, DiffCSP (Jiao et al., 2023) is a diffusion model that learns the distribution of stable structures while accounting for translation, rotation, and periodicity.

Evaluation metrics. Following established protocols (Xie et al., 2021), we compare predicted candidates against the ground-truth structures. For each test structure, we generate one sample with the same composition and count a match if it agrees with the ground truth under pymatgen's StructureMatcher class (Ong et al., 2013) with ltol=0.3, angle_tol=10, and stol=0.5. The Match rate is the fraction of test structures that are matched. RMSE is computed between the ground truth and the matched candidate, normalized by $\sqrt[3]{V/n}$, where $V$ is the lattice volume and $n$ is the number of atoms, and averaged over the matched structures.
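A minimal sketch of the aggregation step, assuming the per-structure comparison has already been done, for example with pymatgen's `StructureMatcher(ltol=0.3, stol=0.5, angle_tol=10)`. To our understanding, its `get_rms_dist` method already returns an RMS displacement normalized by $(V/n)^{1/3}$, or `None` when no match is found; the helper names below are hypothetical.

```python
def normalized_rmse(raw_rmse, volume, n_atoms):
    """Divide an RMS displacement (Angstrom) by (V / n)^(1/3), the cube root of volume per atom."""
    return raw_rmse / (volume / n_atoms) ** (1.0 / 3.0)


def summarize(per_structure):
    """Aggregate per-test-structure results into (Match rate in %, mean RMSE).

    per_structure holds one entry per test structure: the normalized RMSE of the
    matched candidate, or None when the candidate did not match the ground truth.
    RMSE is averaged over matched structures only.
    """
    matched = [r for r in per_structure if r is not None]
    match_rate = 100.0 * len(matched) / len(per_structure)
    mean_rmse = sum(matched) / len(matched) if matched else float("nan")
    return match_rate, mean_rmse


print(normalized_rmse(0.2, 64.0, 8))        # (64 / 8)^(1/3) = 2, so about 0.1
print(summarize([0.05, None, 0.07, None]))  # 50.0 % matched, mean RMSE about 0.06
```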

Results. Table 1 yields the following key insights. 1. Optimization approaches exhibit low Match rates, reflecting the difficulty of pinpointing optimal structures within the expansive search space. 2. Our method matches or outperforms the other generative approaches on every dataset, underscoring the effectiveness of incorporating symmetry awareness during training and inference. 3. From Perov-5 to MPTS-52, all techniques degrade as the number of atoms per cell increases; despite this, our approach retains its advantage over the other generative methods. In particular, our method substantially reduces RMSE (e.g., from 0.1786 to 0.1169 on MPTS-52 compared with DiffCSP), suggesting that our equivariant diffusion approach reduces redundancy in the solution space and allows the model to better learn the distribution of crystal data.

5.2. Ablation Studies

In Table 2, we conduct an ablation study on each component of EquiCSP, examining the following aspects. 1. To verify the necessity of lattice permutation equivariance in the generation procedure, we remove the lattice permutation loss. Both metrics deteriorate: the match rate decreases by a relative 3.77% (from 57.59% to 55.42%) and the RMSE increases by a relative 6.67% (from 0.0510 to 0.0544). We also evaluate the Frame Average method (Appendix B.3), which likewise underperforms our proposed method (55.92%, 0.0578). 2. Without periodic CoM-free noising (w/o m(·)), performance deteriorates the most: the match rate drops from 57.59% to 52.31% and the RMSE increases from 0.0510 to 0.0594. This substantial change suggests that our method effectively captures crystalline periodic translation during training. 3. To further investigate the importance of the Von Mises and Probabilistic Model components for periodic translation, we ablate these two modules separately. Even without both components, performance improves over the variant that does not use the noising method m(·) at all (54.77% vs. 52.31% match rate, 0.0578 vs. 0.0594 RMSE), indicating that modifying the noising method so that the score is periodic translation invariant is effective on its own. Removing either or both of the Von Mises and Probabilistic Model components lowers the match rate and raises the RMSE, indicating that both components contribute positively to performance.

Table 2. Ablation studies of the EquiCSP model on MP-20.

| Method | Match rate (%) | RMSE |
|---|---|---|
| EquiCSP | 57.59 | 0.0510 |
| w/o Lattice Permutation Equivariance: | | |
| w/o permutation loss | 55.42 | 0.0544 |
| w/ Frame Average | 55.92 | 0.0578 |
| w/o Periodic CoM-free Noising: | | |
| w/o m(·) | 52.31 | 0.0594 |
| w/ Partial Periodic CoM-free Noising: | | |
| w/o Probabilistic Model & w/o Von Mises | 54.77 | 0.0578 |
| w/ Von Mises & w/o Probabilistic Model | 57.03 | 0.0537 |
| w/ Probabilistic Model & w/o Von Mises | 56.32 | 0.0525 |

6. Conclusion

In summary, we introduce EquiCSP, a novel equivariant diffusion-based generative model for the Crystal Structure Prediction task. We address a previously unacknowledged challenge in current models: lattice permutation equivariance. During diffusion, when the lattice parameters are permuted, the fractional coordinates of the atoms undergo the corresponding equivariant transformation, ensuring consistency and preserving structural integrity. Furthermore, we devise a noising algorithm that preserves periodic translation equivariance throughout both training and inference. Experimental results demonstrate that EquiCSP outperforms existing CSP methods in generating high-quality structures.

Acknowledgements

This research was jointly supported by the following projects: the National Key R&D Program of China (2021YFB0301300); the Major Program of Guangdong Basic and Applied Research (2019B030302002); Guangdong Province Special Support Program for Cultivating High-Level Talents (2021TQ06X160); Pazhou Lab Research Project (PZL2023KF0001); the Fundamental Research Funds for the Central Universities, Sun Yat-sen University (23xkjc016); the National Science and Technology Major Project under Grant 2020AAA0107300; the National Natural Science Foundation of China (No. 61925601, No. 62376276); Beijing Nova Program (No. 20230484278); Alibaba Damo Research Fund.

Impact Statement

The development of equivariant diffusion models for CSP holds transformative potential across a broad range of scientific disciplines, including materials science, chemistry, and physics. By accurately predicting the arrangement of atoms in crystal structures, this technology can significantly expedite the discovery of novel materials. This, in turn, has far-reaching implications for various applications such as renewable energy, pharmaceuticals, electronics, and more, where innovative materials can lead to advancements in efficiency, efficacy, and sustainability. Moreover, the ability to predict crystal structures from their elemental compositions could reduce the need for expensive and time-consuming physical experiments, making research more accessible and accelerating the pace of innovation.

References

Bai, L. and Breen, D. Calculating center of mass in an unbounded 2d environment. Journal of Graphics Tools, 13(4):53-60, 2008.

Castelli, I. E., Landis, D. D., Thygesen, K. S., Dahl, S., Chorkendorff, I., Jaramillo, T. F., and Jacobsen, K. W. New cubic perovskites for one- and two-photon water splitting using the computational materials repository. Energy & Environmental Science, 5(10):9034-9043, 2012a.

Castelli, I. E., Olsen, T., Datta, S., Landis, D. D., Dahl, S., Thygesen, K. S., and Jacobsen, K. W. Computational screening of perovskite metal oxides for optimal solar light capture. Energy & Environmental Science, 5(2):5814-5819, 2012b.

Chanussot, L., Das, A., Goyal, S., Lavril, T., Shuaibi, M., Riviere, M., Tran, K., Heras-Domingo, J., Ho, C., Hu, W., Palizhati, A., Sriram, A., Wood, B., Yoon, J., Parikh, D., Zitnick, C. L., and Ulissi, Z. Open catalyst 2020 (oc20) dataset and community challenges. ACS Catalysis, 2021. doi: 10.1021/acscatal.0c04525.

Chen, C., Ye, W., Zuo, Y., Zheng, C., and Ong, S. P. Graph networks as a universal machine learning framework for molecules and crystals. Chemistry of Materials, 31(9):3564-3572, 2019.

Cheng, G., Gong, X.-G., and Yin, W.-J. Crystal structure prediction by combining graph network and optimization algorithm. Nature communications, 13(1):1-8, 2022.

Court, C. J., Yildirim, B., Jain, A., and Cole, J. M. 3-d inorganic crystal structure generation and property prediction via representation learning. Journal of chemical information and modeling, 60(10):4518-4535, 2020.

Davies, D. W., Butler, K. T., Jackson, A. J., Skelton, J. M., Morita, K., and Walsh, A. Smact: Semiconducting materials by analogy and chemical theory. Journal of Open Source Software, 4(38):1361, 2019.

Desiraju, G. R. Cryptic crystallography. Nature materials, 1(2):77-79, 2002.

Fuchs, F., Worrall, D. E., Fischer, V., and Welling, M. SE(3)-Transformers: 3d roto-translation equivariant attention networks. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.

Gatto, R. and Jammalamadaka, S. R. The generalized von mises distribution. Statistical Methodology, 4(3):341-353, 2007.

Gebauer, N., Gastegger, M., and Schütt, K. Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 32, pp. 7566-7578. Curran Associates, Inc., 2019.

Gebauer, N. W., Gastegger, M., Hessmann, S. S., Müller, K.-R., and Schütt, K. T. Inverse design of 3d molecular structures with conditional generative neural networks. Nature communications, 13(1):1-11, 2022.

Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840-6851, 2020.

Hoffmann, J., Maestrati, L., Sawada, Y., Tang, J., Sellier, J. M., and Bengio, Y. Data-driven approach to encoding and decoding 3-d crystal structures. arXiv preprint arXiv:1909.00949, 2019.

Hofmann, D. W. and Apostolakis, J. Crystal structure prediction by data mining. Journal of Molecular Structure, 647(1-3):17-39, 2003.


Hoogeboom, E., Satorras, V. G., Vignac, C., and Welling, M. Equivariant diffusion for molecule generation in 3d. In International Conference on Machine Learning, pp. 8867-8887. PMLR, 2022.

Hu, J., Yang, W., and Dilanga Siriwardane, E. M. Distance matrix-based crystal structure prediction using evolutionary algorithms. The Journal of Physical Chemistry A, 124(51):10909-10919, 2020.

Hu, J., Yang, W., Dong, R., Li, Y., Li, X., Li, S., and Siriwardane, E. M. Contact map based crystal structure prediction using global optimization. CrystEngComm, 23(8):1765-1776, 2021.

Jacobsen, T., Jørgensen, M., and Hammer, B. On-the-fly machine learning of atomic potential in density functional theory structure optimization. Physical review letters, 120(2):026102, 2018.

Jain, A., Ong, S. P., Hautier, G., Chen, W., Richards, W. D., Dacek, S., Cholia, S., Gunter, D., Skinner, D., Ceder, G., et al. Commentary: The materials project: A materials genome approach to accelerating materials innovation. APL materials, 1(1):011002, 2013.

Jiao, R., Huang, W., Lin, P., Han, J., Chen, P., Lu, Y., and Liu, Y. Crystal structure prediction by joint equivariant diffusion. arXiv preprint arXiv:2309.04475, 2023.

Jin, W., Chen, X., Vetticaden, A., Sarzikova, S., Raychowdhury, R., Uhler, C., and Hacohen, N. DSMBind: SE(3) denoising score matching for unsupervised binding energy prediction and nanobody design. bioRxiv, pp. 2023-12, 2023.

Kim, S., Noh, J., Gu, G. H., Aspuru-Guzik, A., and Jung, Y. Generative adversarial networks for crystal structure prediction. ACS central science, 6(8):1412-1420, 2020.

Kipf, T. N. and Welling, M. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016.

Kohn, W. and Sham, L. J. Self-consistent equations including exchange and correlation effects. Physical review, 140(4A):A1133, 1965.

Luo, S., Shi, C., Xu, M., and Tang, J. Predicting molecular conformation via dynamic graph score matching. Advances in Neural Information Processing Systems, 34:19784-19795, 2021.

Luo, S., Su, Y., Peng, X., Wang, S., Peng, J., and Ma, J. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022.

Luo, Y., Liu, C., and Ji, S. Towards symmetry-aware generation of periodic materials. arXiv preprint arXiv:2307.02707, 2023.

Mardia, K. V., Jupp, P. E., and Mardia, K. Directional statistics, volume 2. Wiley Online Library, 2000.

Nichol, A. Q. and Dhariwal, P. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pp. 8162-8171. PMLR, 2021.

Niu, C., Song, Y., Song, J., Zhao, S., Grover, A., and Ermon, S. Permutation invariant graph generation via score-based generative modeling. In International Conference on Artificial Intelligence and Statistics, pp. 4474-4484. PMLR, 2020.

Noh, J., Kim, J., Stein, H. S., Sanchez-Lengeling, B., Gregoire, J. M., Aspuru-Guzik, A., and Jung, Y. Inverse design of solid-state materials via a continuous representation. Matter, 1(5):1370-1384, 2019.

Nouira, A., Sokolovska, N., and Crivello, J.-C. Crystalgan: learning to discover crystallographic structures with generative adversarial networks. arXiv preprint arXiv:1810.11203, 2018.

Oganov, A., Lyakhov, A., and Valle, M. How evolutionary crystal structure prediction works and why. Accounts of chemical research, 44:227-37, 03 2011. doi: 10.1021/ar1001318.

Oganov, A. R. and Glass, C. W. Crystal structure prediction using ab initio evolutionary techniques: Principles and applications. Journal of Chemical Physics, 124(24):201-419, 2006.

Oganov, A. R., Pickard, C. J., Zhu, Q., and Needs, R. J. Structure prediction drives materials discovery. Nature Reviews Materials, 4(5):331-348, 2019.

Ong, S. P., Richards, W. D., Jain, A., Hautier, G., Kocher, M., Cholia, S., Gunter, D., Chevrier, V. L., Persson, K. A., and Ceder, G. Python materials genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science, 68:314-319, 2013.

Pickard, C. J. Airss data for carbon at 10gpa and the c+n+h+o system at 1gpa, 2020.

Pickard, C. J. and Needs, R. Ab initio random structure searching. Journal of Physics: Condensed Matter, 23(5): 053201, 2011.

Podryabinkin, E. V., Tikhonov, E. V., Shapeev, A. V., and Oganov, A. R. Accelerating crystal structure prediction by machine-learning interatomic potentials with active learning. Physical Review B, 99(6):064114, 2019.


Puny, O., Atzmon, M., Ben-Hamu, H., Misra, I., Grover, A., Smith, E. J., and Lipman, Y. Frame averaging for invariant and equivariant network design. arXiv preprint arXiv:2110.03336, 2021.

Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., and Chen, M. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022.

Ren, Z., Tian, S. I. P., Noh, J., Oviedo, F., Xing, G., Li, J., Liang, Q., Zhu, R., Aberle, A. G., Sun, S., Wang, X., Liu, Y., Li, Q., Jayavelu, S., Hippalgaonkar, K., Jung, Y., and Buonassisi, T. An invertible crystallographic representation for general inverse design of inorganic crystals with targeted properties. Matter, 2021. ISSN 2590-2385. doi: https://doi.org/10.1016/j.matt.2021.11.032.

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models, 2021.

Satorras, V. G., Hoogeboom, E., and Welling, M. E(n) equivariant graph neural networks. In International Conference on Machine Learning, pp. 9323-9332. PMLR, 2021.

Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A., and Müller, K.-R. SchNet - a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24):241722, 2018.

Shi, C., Luo, S., Xu, M., and Tang, J. Learning gradient fields for molecular conformation generation. In International Conference on Machine Learning, pp. 9558-9568. PMLR, 2021.

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., and Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pp. 2256-2265. PMLR, 2015.

Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.

Thölke, P. and De Fabritiis, G. Equivariant transformers for neural network based molecular potentials. In International Conference on Learning Representations, 2021.

Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, K., and Riley, P. Tensor field networks: Rotation- and translation-equivariant neural networks for 3d point clouds. arXiv preprint arXiv:1802.08219, 2018.

Tran, R., Lan, J., Shuaibi, M., Goyal, S., Wood, B. M., Das, A., Heras-Domingo, J., Kolluru, A., Rizvi, A., Shoghi, N., et al. The open catalyst 2022 (oc22) dataset and challenges for oxide electrocatalysis. arXiv preprint arXiv:2206.08917, 2022.

Vignac, C., Krawczuk, I., Siraudin, A., Wang, B., Cevher, V., and Frossard, P. Digress: Discrete denoising diffusion for graph generation. In The Eleventh International Conference on Learning Representations, 2023.

Vincent, P. A connection between score matching and denoising autoencoders. Neural computation, 23(7):1661-1674, 2011.

Wang, Y., Lv, J., Zhu, L., and Ma, Y. Crystal structure prediction via particle swarm optimization. Physics, 82(9):7174-7182, 2010a.

Wang, Y., Lv, J., Zhu, L., and Ma, Y. Crystal structure prediction via particle-swarm optimization. Physical Review B, 82(9):094116, 2010b.

Wang, Y., Lv, J., Zhu, L., and Ma, Y. Calypso: A method for crystal structure prediction. Computer Physics Communications, 183(10):2063-2070, 2012. ISSN 0010-4655. doi: https://doi.org/10.1016/j.cpc.2012.05.008. URL https://www.sciencedirect.com/science/article/pii/S0010465512001762.

Ward, L., Agrawal, A., Choudhary, A., and Wolverton, C. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials, 2(1):1-7, 2016.

Xie, T. and Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett., 120:145301, Apr 2018. doi: 10.1103/PhysRevLett.120.145301.

Xie, T., Fu, X., Ganea, O.-E., Barzilay, R., and Jaakkola, T. S. Crystal diffusion variational autoencoder for periodic material generation. In International Conference on Learning Representations, 2021.

Xu, M., Yu, L., Song, Y., Shi, C., Ermon, S., and Tang, J. Geodiff: A geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations, 2021.

Xu, M., Yu, L., Song, Y., Shi, C., Ermon, S., and Tang, J. Geodiff: A geometric diffusion model for molecular conformation generation. ar Xiv preprint ar Xiv:2203.02923, 2022.

Yamashita, T., Sato, N., Kino, H., Miyake, T., Tsuda, K., and Oguchi, T. Crystal structure prediction accelerated by bayesian optimization. Physical Review Materials, 2(1):013803, 2018.


Yan, K., Liu, Y., Lin, Y., and Ji, S. Periodic graph transformers for crystal material property prediction. In The 36th Annual Conference on Neural Information Processing Systems, 2022.

Yang, W., Siriwardane, E. M. D., Dong, R., Li, Y., and Hu, J. Crystal structure prediction of materials with high symmetry using differential evolution. Journal of Physics: Condensed Matter, 33(45):455902, 2021.

Zhang, Y., Wang, H., Wang, Y., Zhang, L., and Ma, Y. Computer-assisted inverse design of inorganic electrides. Physical Review X, 7(1):011017, 2017.

Zimmermann, N. E. and Jain, A. Local structure order parameters and site fingerprints for quantification of coordination environment and crystal structure similarity. RSC advances, 10(10):6063-6081, 2020.


A. Theoretical Analysis.

A.1. Proof of Proposition 4.5

We first introduce the following definition to describe the equivariance and invariance from the perspective of distributions.

Definition A.1. We call a distribution $p(x)$ G-invariant if $p(g \cdot x) = p(x)$ for any transformation $g$ in the group $G$, and a conditional distribution $p(x \mid c)$ G-equivariant if $p(g \cdot x \mid g \cdot c) = p(x \mid c)$ for all $g \in G$.

We then provide the following lemma to capture the symmetry of the generation process.

Lemma A.2 (Xu et al. (2021)). Consider the generative Markov process $p(x_0) = \int p(x_T)\, p(x_{0:T-1} \mid x_T)\, \mathrm{d}x_{1:T}$. If the prior distribution $p(x_T)$ is G-invariant and the Markov transitions $p(x_{t-1} \mid x_t)$, $0 < t \le T$, are G-equivariant, then the marginal distribution $p(x_0)$ is also G-invariant.

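Proposition 4.5 is restated and proved as follows.

Proof. Consider the transition probability in Eq. (2):

$$ p(C_{t-1} \mid C_t, F_t, A) = \mathcal{N}\big(C_{t-1} \mid a_t(C_t - b_t\hat\epsilon_L(C_t, F_t, A, t)), \sigma_t^2 I\big), $$

where, for simplicity, $a_t = \frac{1}{\sqrt{\alpha_t}}$, $b_t = \frac{\beta_t}{\sqrt{1 - \bar\alpha_t}}$, and $\sigma_t^2 = \beta_t\frac{1 - \bar\alpha_{t-1}}{1 - \bar\alpha_t}$, and $\hat\epsilon_L(M_t, t)$ is written out in full as $\hat\epsilon_L(C_t, F_t, A, t)$.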

As the denoising term $\hat\epsilon_L(C_t, F_t, A, t)$ is lattice permutation equivariant, we have $\hat\epsilon_L(PC_t, PF_t, A, t) = P\hat\epsilon_L(C_t, F_t, A, t)$ for any permutation matrix $P \in S_3$, where $P^\top P = I$.

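For a variable $C \sim \mathcal{N}(\bar C, \sigma^2 I)$, we have $PC \sim \mathcal{N}(P\bar C, P(\sigma^2 I)P^\top) = \mathcal{N}(P\bar C, \sigma^2 I)$. That is,

$$ \mathcal{N}(C \mid \bar C, \sigma^2 I) = \mathcal{N}(PC \mid P\bar C, \sigma^2 I). \qquad (27) $$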

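For the transition probability $p(C_{t-1} \mid C_t, F_t, A)$, we have

$$
\begin{aligned}
p(PC_{t-1} \mid PC_t, PF_t, A) &= \mathcal{N}\big(PC_{t-1} \mid a_t(PC_t - b_t\hat\epsilon_L(PC_t, PF_t, A, t)), \sigma_t^2 I\big)\\
&= \mathcal{N}\big(PC_{t-1} \mid a_t(PC_t - b_t P\hat\epsilon_L(C_t, F_t, A, t)), \sigma_t^2 I\big) \qquad \text{(lattice permutation equivariant } \hat\epsilon_L)\\
&= \mathcal{N}\big(PC_{t-1} \mid P\,a_t(C_t - b_t\hat\epsilon_L(C_t, F_t, A, t)), \sigma_t^2 I\big)\\
&= \mathcal{N}\big(C_{t-1} \mid a_t(C_t - b_t\hat\epsilon_L(C_t, F_t, A, t)), \sigma_t^2 I\big) \qquad \text{(Eq. (27))}\\
&= p(C_{t-1} \mid C_t, F_t, A).
\end{aligned}
$$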

As the transition is lattice permutation equivariant and the prior distribution $\mathcal{N}(0, I)$ is lattice permutation invariant, Lemma A.2 implies that the marginal distribution $p(C_0)$ is lattice permutation invariant.
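The argument above can be checked numerically. The sketch assumes, for illustration only, that $C$ is a $3 \times d$ matrix permuted row-wise by $P$, uses a hypothetical equivariant denoiser `toy_eps_L`, and picks arbitrary values for $a_t$, $b_t$, $\sigma_t^2$; it evaluates the Gaussian transition log-density before and after applying the same $P$ to $C_{t-1}$, $C_t$, and $F_t$.

```python
import itertools
import math

S3 = list(itertools.permutations(range(3)))  # tuple p encodes P with (P M)[i] = M[p[i]]


def permute_rows(p, M):
    return [list(M[k]) for k in p]


def iso_gauss_logpdf(X, mean, var):
    """log N(X | mean, var * I), treating every matrix entry as an independent coordinate."""
    sq = sum((x - m) ** 2 for rx, rm in zip(X, mean) for x, m in zip(rx, rm))
    d = sum(len(r) for r in X)
    return -0.5 * sq / var - 0.5 * d * math.log(2.0 * math.pi * var)


def toy_eps_L(C, F):
    """Hypothetical lattice-permutation-equivariant denoiser: row i uses rows i of C and F only."""
    total = sum(sum(r) for r in F)
    return [[c * math.tanh(sum(F[i])) + 0.1 * total for c in C[i]] for i in range(3)]


def transition_logpdf(C_prev, C, F, a_t, b_t, var_t):
    """log p(C_{t-1} | C_t, F_t, A) = log N(C_{t-1} | a_t (C_t - b_t eps_L), var_t I)."""
    eps = toy_eps_L(C, F)
    mean = [[a_t * (c - b_t * e) for c, e in zip(rc, re)] for rc, re in zip(C, eps)]
    return iso_gauss_logpdf(C_prev, mean, var_t)


C_prev = [[1.0, 0.2], [0.3, -0.5], [0.8, 0.1]]
C = [[0.9, 0.4], [0.1, -0.2], [1.1, 0.0]]
F = [[0.1, 0.6], [0.4, 0.3], [0.9, 0.7]]
a_t, b_t, var_t = 1.01, 0.05, 0.02  # arbitrary illustrative schedule values

ref = transition_logpdf(C_prev, C, F, a_t, b_t, var_t)
for p in S3:
    val = transition_logpdf(permute_rows(p, C_prev), permute_rows(p, C), permute_rows(p, F), a_t, b_t, var_t)
    print(p, abs(val - ref))  # differences are floating-point rounding only
```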

A.2. Proof of Proposition 4.6

Let $\mathcal{N}_w(\mu, \sigma^2 I)$ denote the wrapped normal distribution with mean $\mu$, variance $\sigma^2$, and period 1. We first provide the following lemma.

Lemma A.3. If the denoising term $\hat\epsilon_F(C_t, F_t, A, t)$ is lattice permutation equivariant and the transition probability can be written as $p(F_{t-1} \mid C_t, F_t, A) = \mathcal{N}_w\big(F_{t-1} \mid F_t + u_t\hat\epsilon_F(C_t, F_t, A, t), v_t^2 I\big)$, where $u_t$ and $v_t$ are functions of $t$, then the transition is lattice permutation equivariant.

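Proof. For a variable $F \sim \mathcal{N}_w(\bar F, v_t^2 I)$ and $P \in S_3$, we have $PF \sim \mathcal{N}_w(P\bar F, P(v_t^2 I)P^\top) = \mathcal{N}_w(P\bar F, v_t^2 I)$. That is,

$$ \mathcal{N}_w(F \mid \bar F, v_t^2 I) = \mathcal{N}_w(PF \mid P\bar F, v_t^2 I). \qquad (28) $$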


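For the transition probability $p(F_{t-1} \mid C_t, F_t, A)$, we have

$$
\begin{aligned}
p(PF_{t-1} \mid PC_t, PF_t, A) &= \mathcal{N}_w\big(PF_{t-1} \mid PF_t + u_t\hat\epsilon_F(PC_t, PF_t, A, t), v_t^2 I\big)\\
&= \mathcal{N}_w\big(PF_{t-1} \mid PF_t + u_t P\hat\epsilon_F(C_t, F_t, A, t), v_t^2 I\big) \qquad \text{(lattice permutation equivariant } \hat\epsilon_F)\\
&= \mathcal{N}_w\big(PF_{t-1} \mid P(F_t + u_t\hat\epsilon_F(C_t, F_t, A, t)), v_t^2 I\big)\\
&= \mathcal{N}_w\big(F_{t-1} \mid F_t + u_t\hat\epsilon_F(C_t, F_t, A, t), v_t^2 I\big) \qquad \text{(Eq. (28))}\\
&= p(F_{t-1} \mid C_t, F_t, A).
\end{aligned}
$$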
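A numerical check of Lemma A.3 under simple assumptions: the wrapped normal density is computed by truncating the sum over periodic images, matrix entries are treated as independent, `toy_eps_F` is a hypothetical equivariant denoiser, and $u_t$, $v_t$ are arbitrary. The sketch also confirms that the truncated density is 1-periodic and integrates to 1 over one period.

```python
import itertools
import math

S3 = list(itertools.permutations(range(3)))  # tuple p encodes P with (P M)[i] = M[p[i]]


def permute_rows(p, M):
    return [list(M[k]) for k in p]


def wrapped_normal_pdf(x, mu, sigma, k_max=10):
    """Density of the period-1 wrapped normal N_w(mu, sigma^2) at x (images truncated at |k| <= k_max)."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(coef * math.exp(-((x - mu + k) ** 2) / (2.0 * sigma ** 2))
               for k in range(-k_max, k_max + 1))


def wrapped_normal_logpdf(F, mean, sigma):
    """log N_w(F | mean, sigma^2 I) for equally shaped matrices with independent entries."""
    return sum(math.log(wrapped_normal_pdf(x, m, sigma))
               for rx, rm in zip(F, mean) for x, m in zip(rx, rm))


def toy_eps_F(C, F):
    """Hypothetical lattice-permutation-equivariant denoiser: row i uses rows i of C and F only."""
    return [[math.sin(2.0 * math.pi * f) * sum(C[i]) for f in F[i]] for i in range(3)]


def transition_logpdf(F_prev, C, F, u_t, v_t):
    """log N_w(F_{t-1} | F_t + u_t * eps_F(C_t, F_t), v_t^2 I), the form assumed in Lemma A.3."""
    eps = toy_eps_F(C, F)
    mean = [[f + u_t * e for f, e in zip(rf, re)] for rf, re in zip(F, eps)]
    return wrapped_normal_logpdf(F_prev, mean, v_t)


C = [[0.9, 0.4], [0.1, -0.2], [1.1, 0.0]]
F = [[0.1, 0.6], [0.4, 0.3], [0.9, 0.7]]
F_prev = [[0.15, 0.55], [0.35, 0.95], [0.05, 0.65]]
u_t, v_t = 0.02, 0.15  # arbitrary illustrative step size and noise scale

ref = transition_logpdf(F_prev, C, F, u_t, v_t)
worst = max(abs(transition_logpdf(permute_rows(p, F_prev), permute_rows(p, C), permute_rows(p, F), u_t, v_t) - ref)
            for p in S3)
print(worst)  # rounding-level: the transition is lattice permutation equivariant
```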

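The transition probability of the fractional coordinates during Predictor-Corrector sampling can be formulated as

$$
\begin{aligned}
p(F_{t-1} \mid C_t, F_t, A) &= \int p_P(F_{t-\frac12} \mid C_t, F_t, A)\, p_C(F_{t-1} \mid C_{t-1}, F_{t-\frac12}, A)\, \mathrm{d}F_{t-\frac12},\\
p_P(F_{t-\frac12} \mid C_t, F_t, A) &= \mathcal{N}_w\Big(F_{t-\frac12} \,\Big|\, F_t + (\sigma_t^2 - \sigma_{t-1}^2)\hat\epsilon_F(C_t, F_t, A, t),\ \frac{\sigma_{t-1}^2(\sigma_t^2 - \sigma_{t-1}^2)}{\sigma_t^2} I\Big),\\
p_C(F_{t-1} \mid C_{t-1}, F_{t-\frac12}, A) &= \mathcal{N}_w\Big(F_{t-1} \,\Big|\, F_{t-\frac12} + \gamma\frac{\sigma_{t-1}}{\sigma_1}\hat\epsilon_F(C_{t-1}, F_{t-\frac12}, A, t-1),\ 2\gamma\frac{\sigma_{t-1}}{\sigma_1} I\Big),
\end{aligned}
$$

where $p_P$ and $p_C$ are the transitions of the predictor and the corrector. By Lemma A.3 (applied to $p_C$ with $F_{t-\frac12}$ in place of $F_t$), both transitions are lattice permutation equivariant. Since the change of variables $F_{t-\frac12} \to PF_{t-\frac12}$ leaves the integral unchanged, the composed transition $p(F_{t-1} \mid C_t, F_t, A)$ is lattice permutation equivariant as well. As the prior distribution $\mathcal{U}(0, 1)$ is lattice permutation invariant, Lemma A.2 implies that the marginal distribution $p(F_0)$ is lattice permutation invariant.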

A.3. Proof of Periodic CoM-free Noising

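In this section, we prove that the periodic CoM-free function $m(\epsilon)$ constructed by Eq. (15) is periodic translation invariant, as described by Eq. (14). Since $m(\cdot)$ operates on each row of $\epsilon \in \mathbb{R}^{3 \times n}$ independently, without loss of generality we use $\epsilon = [\epsilon_1, \epsilon_2, \ldots, \epsilon_n]$ to represent any row of the original $\epsilon$. Then we can rewrite $m$ as:

$$ m(\epsilon) = w\left(\epsilon - \frac{\operatorname{atan2}(\bar y(\epsilon), \bar x(\epsilon))}{2\pi}\right), \qquad \bar y(\epsilon) = \frac{1}{n}\sum_{i=1}^{n}\sin(2\pi\epsilon_i), \qquad \bar x(\epsilon) = \frac{1}{n}\sum_{i=1}^{n}\cos(2\pi\epsilon_i), $$

where the scalar $\operatorname{atan2}(\cdot)/2\pi$ is subtracted from every entry of $\epsilon$ and $w(\cdot)$ wraps each entry into $[0, 1)$. We next prove that $m(\epsilon) = m(\epsilon + r)$ for any $r \in \mathbb{R}$, where $\epsilon + r$ adds $r$ to every entry.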

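Proof. We can rewrite $m(\epsilon) = m(\epsilon + r)$ as:

$$ w\left(\epsilon - \frac{\operatorname{atan2}(\bar y(\epsilon), \bar x(\epsilon))}{2\pi}\right) = w\left(\epsilon + r - \frac{\operatorname{atan2}(\bar y(\epsilon + r), \bar x(\epsilon + r))}{2\pi}\right), $$

which is equivalent to

$$ \epsilon - \frac{\operatorname{atan2}(\bar y(\epsilon), \bar x(\epsilon))}{2\pi} = \epsilon + r - \frac{\operatorname{atan2}(\bar y(\epsilon + r), \bar x(\epsilon + r))}{2\pi} + d, $$

where $d$ denotes some integer (the same for every entry, since $\epsilon$ cancels). We may assume $r = w(r) \in [0, 1)$, because $\bar y(\epsilon + r) = \bar y(\epsilon + w(r))$ and $\bar x(\epsilon + r) = \bar x(\epsilon + w(r))$ by the periodicity of the trigonometric functions, and the integer part of the original $r$ can be merged into $d$, since $d$ can be any integer. Further simplification leads to:

$$ \frac{\operatorname{atan2}(\bar y(\epsilon), \bar x(\epsilon))}{2\pi} + r + d = \frac{\operatorname{atan2}(\bar y(\epsilon + r), \bar x(\epsilon + r))}{2\pi}, $$

$$ \operatorname{atan2}(\bar y(\epsilon), \bar x(\epsilon)) + 2\pi r + 2\pi d = \operatorname{atan2}(\bar y(\epsilon + r), \bar x(\epsilon + r)). $$

Taking $\tan(\cdot)$ of both sides:

$$ \tan\big(\operatorname{atan2}(\bar y(\epsilon), \bar x(\epsilon)) + 2\pi r + 2\pi d\big) = \frac{\bar y(\epsilon + r)}{\bar x(\epsilon + r)}, $$

$$ \tan\big(\operatorname{atan2}(\bar y(\epsilon), \bar x(\epsilon)) + 2\pi r\big) = \frac{\bar y(\epsilon + r)}{\bar x(\epsilon + r)}. $$

Applying the $\tan(\cdot)$ addition formula:

$$ \frac{\frac{\bar y(\epsilon)}{\bar x(\epsilon)} + \tan(2\pi r)}{1 - \frac{\bar y(\epsilon)}{\bar x(\epsilon)}\tan(2\pi r)} = \frac{\bar y(\epsilon + r)}{\bar x(\epsilon + r)}, $$

$$ \frac{\bar y(\epsilon)\cos(2\pi r) + \bar x(\epsilon)\sin(2\pi r)}{\bar x(\epsilon)\cos(2\pi r) - \bar y(\epsilon)\sin(2\pi r)} = \frac{\bar y(\epsilon + r)}{\bar x(\epsilon + r)}. $$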

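Then we can prove that:

$$
\begin{aligned}
\text{the right-hand side} &= \frac{\frac{1}{n}\sum_{i=1}^{n}\sin(2\pi\epsilon_i + 2\pi r)}{\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi\epsilon_i + 2\pi r)}\\
&= \frac{\frac{1}{n}\sum_{i=1}^{n}\big[\sin(2\pi\epsilon_i)\cos(2\pi r) + \cos(2\pi\epsilon_i)\sin(2\pi r)\big]}{\frac{1}{n}\sum_{i=1}^{n}\big[\cos(2\pi\epsilon_i)\cos(2\pi r) - \sin(2\pi\epsilon_i)\sin(2\pi r)\big]} \qquad (\sin(\cdot) \text{ and } \cos(\cdot) \text{ addition formulas})\\
&= \frac{\bar y(\epsilon)\cos(2\pi r) + \bar x(\epsilon)\sin(2\pi r)}{\bar x(\epsilon)\cos(2\pi r) - \bar y(\epsilon)\sin(2\pi r)} \qquad (30)\\
&= \text{the left-hand side}.
\end{aligned}
$$

Moreover, the numerators and denominators in Eq. (30) agree term by term, i.e., $\bar y(\epsilon + r) = \bar y(\epsilon)\cos(2\pi r) + \bar x(\epsilon)\sin(2\pi r)$ and $\bar x(\epsilon + r) = \bar x(\epsilon)\cos(2\pi r) - \bar y(\epsilon)\sin(2\pi r)$. Hence $(\bar x(\epsilon + r), \bar y(\epsilon + r))$ is the point $(\bar x(\epsilon), \bar y(\epsilon))$ rotated by the angle $2\pi r$, so its atan2 angle equals $\operatorname{atan2}(\bar y(\epsilon), \bar x(\epsilon)) + 2\pi r$ up to a multiple of $2\pi$. This removes the period-$\pi$ ambiguity of $\tan(\cdot)$ in the argument above (assuming $(\bar x(\epsilon), \bar y(\epsilon)) \neq (0, 0)$, where atan2 is undefined). This proves that $m(\epsilon)$ is periodic translation invariant.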
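A direct implementation of $m(\epsilon)$ for a single row, written as a sketch (the function names are ours), lets one check the statement numerically. Distances are measured on the unit circle because $w(\cdot)$ may place an entry just above 0 on one side and just below 1 on the other.

```python
import math
import random


def circular_com_angle(eps):
    """atan2(y_bar, x_bar): angle of the circular mean of a row eps with period 1."""
    n = len(eps)
    y_bar = sum(math.sin(2.0 * math.pi * e) for e in eps) / n
    x_bar = sum(math.cos(2.0 * math.pi * e) for e in eps) / n
    return math.atan2(y_bar, x_bar)  # undefined in theory when (x_bar, y_bar) = (0, 0)


def m(eps):
    """Periodic CoM-free map w(eps - atan2(y_bar, x_bar) / (2 pi)), with w(x) = x mod 1."""
    shift = circular_com_angle(eps) / (2.0 * math.pi)
    return [(e - shift) % 1.0 for e in eps]


def circ_dist(a, b):
    """Distance on a circle of circumference 1, so that 0.9999 and 0.0 count as close."""
    d = (a - b) % 1.0
    return min(d, 1.0 - d)


rng = random.Random(0)
eps = [rng.gauss(0.0, 0.2) for _ in range(8)]  # one row of Gaussian noise, n = 8
for r in (0.25, 0.5, 1.75, -3.2):
    moved = m([e + r for e in eps])
    print(r, max(circ_dist(a, b) for a, b in zip(m(eps), moved)))  # rounding-level only
```

After applying $m$, the circular mean of the row sits at angle 0, which is what makes the output independent of any common shift $r$.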

A.4. Proof of Proposition 4.7

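Since $\nabla_\epsilon m(\epsilon)[:, i]$ can be treated as operating on each row of $\epsilon \in \mathbb{R}^{3 \times n}$ independently, without loss of generality we use $\epsilon = [\epsilon_1, \epsilon_2, \ldots, \epsilon_n]$ to represent any row of the original $\epsilon$, and $m_i(\epsilon)$ to denote the original $m(\epsilon)[:, i]$. Then we can rewrite $m(\epsilon)[:, i]$ as:

$$ m_i(\epsilon) = w\left(\epsilon_i - \frac{\operatorname{atan2}(\bar y(\epsilon), \bar x(\epsilon))}{2\pi}\right), \qquad \bar y(\epsilon) = \frac{1}{n}\sum_{j=1}^{n}\sin(2\pi\epsilon_j), \qquad \bar x(\epsilon) = \frac{1}{n}\sum_{j=1}^{n}\cos(2\pi\epsilon_j). \qquad (31) $$

Our goal is to prove that $\nabla_\epsilon m_i(\epsilon)$ is periodic translation invariant.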

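We first derive $\nabla_\epsilon m_i(\epsilon)$. Away from the points where $w(\cdot)$ wraps around, its $j$-th component is

$$
\frac{\partial m_i}{\partial \epsilon_j} =
\begin{cases}
-\dfrac{1}{2\pi}\left(\dfrac{\bar x}{\bar x^2 + \bar y^2}\dfrac{\partial \bar y}{\partial \epsilon_j} - \dfrac{\bar y}{\bar x^2 + \bar y^2}\dfrac{\partial \bar x}{\partial \epsilon_j}\right), & \text{if } i \neq j,\\[2ex]
1 - \dfrac{1}{2\pi}\left(\dfrac{\bar x}{\bar x^2 + \bar y^2}\dfrac{\partial \bar y}{\partial \epsilon_j} - \dfrac{\bar y}{\bar x^2 + \bar y^2}\dfrac{\partial \bar x}{\partial \epsilon_j}\right), & \text{if } i = j,
\end{cases}
$$

where $\bar x = \bar x(\epsilon)$, $\bar y = \bar y(\epsilon)$, and

$$ \frac{\partial \bar y}{\partial \epsilon_j} = \frac{2\pi}{n}\cos(2\pi\epsilon_j), \qquad \frac{\partial \bar x}{\partial \epsilon_j} = -\frac{2\pi}{n}\sin(2\pi\epsilon_j). $$

Substituting these partial derivatives, we obtain:

$$
\frac{\partial m_i}{\partial \epsilon_j} =
\begin{cases}
-\dfrac{\bar x\cos(2\pi\epsilon_j) + \bar y\sin(2\pi\epsilon_j)}{n(\bar x^2 + \bar y^2)}, & \text{if } i \neq j,\\[2ex]
1 - \dfrac{\bar x\cos(2\pi\epsilon_j) + \bar y\sin(2\pi\epsilon_j)}{n(\bar x^2 + \bar y^2)}, & \text{if } i = j,
\end{cases}
\qquad (32)
$$

where $\bar y = \bar y(\epsilon)$ and $\bar x = \bar x(\epsilon)$. Thus, the gradient $\nabla_\epsilon m_i(\epsilon)$ is a vector whose $j$-th component is given by Eq. (32).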

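We next prove $\nabla_\epsilon m_i(\epsilon) = \nabla_{\epsilon + r} m_i(\epsilon + r)$.

Proof. By Eq. (32), the constant terms 0 and 1 do not depend on $\epsilon$, so it suffices to show:

$$ \frac{\bar x(\epsilon)\cos(2\pi\epsilon_j) + \bar y(\epsilon)\sin(2\pi\epsilon_j)}{\bar x^2(\epsilon) + \bar y^2(\epsilon)} = \frac{\bar x(\epsilon + r)\cos(2\pi\epsilon_j + 2\pi r) + \bar y(\epsilon + r)\sin(2\pi\epsilon_j + 2\pi r)}{\bar x^2(\epsilon + r) + \bar y^2(\epsilon + r)}. \qquad (33) $$

Referring to Eq. (30), we have:

$$ \bar y(\epsilon + r) = \bar y(\epsilon)\cos(2\pi r) + \bar x(\epsilon)\sin(2\pi r), \qquad \bar x(\epsilon + r) = \bar x(\epsilon)\cos(2\pi r) - \bar y(\epsilon)\sin(2\pi r). $$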

The numerator of the right-hand side of Eq. (33) can be expressed as:

$$
\begin{aligned}
&\bar x(\epsilon + r)\cos(2\pi\epsilon_j + 2\pi r) + \bar y(\epsilon + r)\sin(2\pi\epsilon_j + 2\pi r)\\
&= [\bar x(\epsilon)\cos(2\pi r) - \bar y(\epsilon)\sin(2\pi r)][\cos(2\pi\epsilon_j)\cos(2\pi r) - \sin(2\pi\epsilon_j)\sin(2\pi r)]\\
&\quad + [\bar y(\epsilon)\cos(2\pi r) + \bar x(\epsilon)\sin(2\pi r)][\sin(2\pi\epsilon_j)\cos(2\pi r) + \cos(2\pi\epsilon_j)\sin(2\pi r)]\\
&= \bar x(\epsilon)\cos(2\pi\epsilon_j) + \bar y(\epsilon)\sin(2\pi\epsilon_j),
\end{aligned}
$$

where the cross terms cancel and the remaining terms combine via $\cos^2(2\pi r) + \sin^2(2\pi r) = 1$.

The denominator is also invariant under the transformation, by the Pythagorean identity:

$$
\begin{aligned}
\bar x^2(\epsilon + r) + \bar y^2(\epsilon + r) &= [\bar x(\epsilon)\cos(2\pi r) - \bar y(\epsilon)\sin(2\pi r)]^2 + [\bar y(\epsilon)\cos(2\pi r) + \bar x(\epsilon)\sin(2\pi r)]^2\\
&= \bar x^2(\epsilon) + \bar y^2(\epsilon).
\end{aligned}
$$

Therefore, the two sides of Eq. (33) are equal. This proves that $\nabla_\epsilon m_i(\epsilon)$ is periodic translation invariant, which is equivalent to the statement that $\nabla_\epsilon m(\epsilon)[:, i]$ is periodic translation invariant.
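The gradient formula (32) and its invariance can also be checked numerically. In the sketch below (helper names are ours), the analytic gradient is compared against central finite differences that undo the mod-1 jump of $w(\cdot)$, and then re-evaluated after shifting every entry by $r = 0.37$.

```python
import math
import random


def _means(eps):
    n = len(eps)
    y_bar = sum(math.sin(2.0 * math.pi * e) for e in eps) / n
    x_bar = sum(math.cos(2.0 * math.pi * e) for e in eps) / n
    return x_bar, y_bar


def m_i(eps, i):
    """Eq. (31): w(eps_i - atan2(y_bar, x_bar) / (2 pi))."""
    x_bar, y_bar = _means(eps)
    return (eps[i] - math.atan2(y_bar, x_bar) / (2.0 * math.pi)) % 1.0


def grad_m_i(eps, i):
    """Analytic gradient from Eq. (32); valid away from the wrap-around points of w."""
    n = len(eps)
    x_bar, y_bar = _means(eps)
    denom = n * (x_bar ** 2 + y_bar ** 2)
    g = [-(x_bar * math.cos(2.0 * math.pi * e) + y_bar * math.sin(2.0 * math.pi * e)) / denom
         for e in eps]
    g[i] += 1.0  # the extra 1 in the i = j case
    return g


def numeric_grad(eps, i, h=1e-6):
    """Central finite differences of m_i, undoing the mod-1 jump if a step crosses it."""
    g = []
    for j in range(len(eps)):
        up, down = list(eps), list(eps)
        up[j] += h
        down[j] -= h
        diff = (m_i(up, i) - m_i(down, i)) % 1.0
        if diff > 0.5:
            diff -= 1.0
        g.append(diff / (2.0 * h))
    return g


rng = random.Random(1)
eps = [rng.gauss(0.0, 0.2) for _ in range(6)]
analytic = grad_m_i(eps, 2)
print(max(abs(a - b) for a, b in zip(analytic, numeric_grad(eps, 2))))                  # finite-difference error only
print(max(abs(a - b) for a, b in zip(analytic, grad_m_i([e + 0.37 for e in eps], 2))))  # rounding-level
print(sum(analytic))  # components sum to 0 up to rounding
```

The components of $\nabla_\epsilon m_i(\epsilon)$ sum to zero because $\sum_j \cos(2\pi\epsilon_j) = n\bar x$ and $\sum_j \sin(2\pi\epsilon_j) = n\bar y$; this is the infinitesimal form of $m_i(\epsilon + r) = m_i(\epsilon)$.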


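We next prove that $\nabla_{F_t}\log q(F_t)$ is periodic translation invariant. By the chain rule, we can write it as:

$$ \nabla_{F_t}\log q(F_t) = \sum_{i=1}^{n}\nabla_{\tilde f_i}\log q(\tilde f_i)\,\nabla_{F_t}\tilde f_i = \sum_{i=1}^{n}\nabla_{\tilde f_i}\log q(\tilde f_i)\,\nabla_\epsilon m(\epsilon)[:, i]. $$

Since $\nabla_{\tilde f_i}\log q(\tilde f_i)$ is periodic translation invariant, and $\nabla_\epsilon m(\epsilon)[:, i]$ is also periodic translation invariant by the above proof, $\nabla_{F_t}\log q(F_t)$ is periodic translation invariant.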

B.1. Symmetries of crystal structure distribution

We represent the fractional coordinates along a lattice basis vector of the crystal as points on the circle in Figure 1(e) and (f). Rotating the points along the circle corresponds to a periodic translation of the fractional coordinates along that basis vector. Specifically, the point at angle $2\pi f_i$ on the circle coincides with the point at $2\pi w(f_i + d)$ for any integer $d$. Periodic translation invariance means that rotating all points on the circle by the same angle does not change the geometry of their angular distribution.

B.2. Diffusion on Lattice Parameters

Lattice lengths lie in $(0, +\infty)$ and lattice angles in $(0, \pi)$, whereas DDPMs (Ho et al., 2020) diffuse over $(-\infty, +\infty)$ and can therefore produce invalid lattice parameters during generation. To ensure that generated lattice parameters are always valid, we apply a logarithmic transformation to the lengths, since $\log$ maps $(0, +\infty)$ onto $(-\infty, +\infty)$. Thus, we generate $\log l$ instead of $l$ and convert it back via $e^{\log l} = l$, which guarantees that every length $l$ is positive. For angles, we use $\tan(\phi - \pi/2)$, which maps $(0, \pi)$ onto $(-\infty, +\infty)$. After generating values of $\tan(\phi - \pi/2)$, we recover angles $\phi \in (0, \pi)$ via $\arctan(\tan(\phi - \pi/2)) + \pi/2 = \phi$.
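The two transforms can be sketched as follows (function names are ours; lengths are assumed to be in Å and angles in radians).

```python
import math


def encode_lattice(lengths, angles):
    """Lengths in (0, inf) -> log(l); angles in (0, pi) -> tan(phi - pi/2). Outputs are unconstrained."""
    return [math.log(l) for l in lengths], [math.tan(phi - math.pi / 2.0) for phi in angles]


def decode_lattice(log_lengths, tan_angles):
    """Inverse map: exp(z) > 0 and arctan(z) + pi/2 lies in (0, pi) for any real z."""
    return [math.exp(z) for z in log_lengths], [math.atan(z) + math.pi / 2.0 for z in tan_angles]


lengths = [3.9, 5.1, 7.2]                           # illustrative values
angles = [math.pi / 2.0, 2.0 * math.pi / 3.0, 1.2]  # radians
back_lengths, back_angles = decode_lattice(*encode_lattice(lengths, angles))
print(back_lengths, back_angles)                    # recovers the inputs up to rounding

# Arbitrary real outputs of the diffusion model still decode to valid lattice parameters.
print(decode_lattice([-5.0, 3.0], [-50.0, 50.0]))
```

In floating-point arithmetic, extremely large $|z|$ can round $\arctan(z) + \pi/2$ to exactly 0 or $\pi$, so a practical implementation may want to clamp decoded angles slightly inside $(0, \pi)$.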

B.3. Frame Average Method

The frame averaging method encodes lattice permutation equivariance into the neural network. In the context of the lattice permutation group, a frame is defined as one specific ordering of the lattice vectors. By applying a permutation matrix to both the lattice and its fractional coordinates, we can transform the structure into an equivalent frame, as described in Definition 4.5 of our paper.

Specifically, we modify the neural network as:

$$\phi_{FA}(X) = \frac{1}{|S_3|}\sum_{P \in S_3} P^{-1}\phi(PX),$$

where $\phi$ is the neural network model in Section 4.5, $X$ is a $3 \times N$ matrix comprising the three lattice vectors and their respective fractional coordinates, and $P$ ranges over the permutation matrices acting on the lattice vectors ($|S_3| = 6$).
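The frame-averaging formula can be sketched in NumPy as follows. Here `toy_phi` is a hypothetical stand-in for the denoiser of Section 4.5, deliberately chosen not to be permutation equivariant, so that the sketch shows averaging over the six elements of $S_3$ making the output equivariant: $\phi_{FA}(QX) = Q\,\phi_{FA}(X)$ for every $Q \in S_3$.

```python
import itertools
import numpy as np

def permutation_matrices():
    # All 3! = 6 permutation matrices of S_3 (rows of the identity reordered)
    eye = np.eye(3)
    return [eye[list(p)] for p in itertools.permutations(range(3))]

def frame_average(phi, X):
    # phi_FA(X) = 1/|S_3| * sum_{P in S_3} P^{-1} phi(P X); P^{-1} = P^T for permutations
    Ps = permutation_matrices()
    return sum(P.T @ phi(P @ X) for P in Ps) / len(Ps)

def toy_phi(X):
    # Hypothetical denoiser stand-in: row-specific weights and a dependence on
    # the first row make it NOT permutation equivariant on its own
    weights = np.array([[1.0], [2.0], [3.0]])
    return np.tanh(weights * X + X[0:1])
```

Equivariance follows from group closure: substituting $R = PQ$ turns the sum for $QX$ into $Q$ times the sum for $X$.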

C. Implementation Details.

C.1. Von Mises Distribution Simulation

The von Mises distribution (Gatto & Jammalamadaka, 2007), often referred to as the circular normal distribution, is a probability distribution for modeling angular or directional data, and it handles periodic and directional data flexibly and efficiently. Let $V(\mu, \kappa)$ denote the von Mises distribution with mean direction $\mu$, concentration parameter $\kappa$, and period 1. The probability density function (PDF) of $V(\mu, \kappa)$ is defined as:

$$V(x; \mu, \kappa) = \frac{e^{\kappa\cos(2\pi x - \mu)}}{I_0(\kappa)},$$


where $\mu$ is the mean direction of the distribution and $\kappa$ is the concentration parameter, which controls how tightly the distribution concentrates around the mean direction. $I_0(\kappa)$ is the modified Bessel function of the first kind of order zero, which normalizes the distribution.

As detailed in Section 4.4.1, we employ the von Mises distribution to approximate $p(\tilde\epsilon_i)$, where $\tilde\epsilon = m(\epsilon)$ by Eq.(15), $\epsilon \in \mathbb{R}^{3 \times n}$, $\epsilon \sim \mathcal{N}_w(0, \sigma_t^2 I)$, and $\tilde\epsilon_i$ is the $i$-th column of $\tilde\epsilon$. Similar to the analysis in Appendix A.4, we can simplify the problem to: using $V(\mu, \kappa)$ to approximate $p(\tilde\epsilon_i)$, where $\tilde\epsilon_i = m_i(\epsilon)$ by Eq.(31), $\epsilon = [\epsilon_1, \epsilon_2, \ldots, \epsilon_n]$, $i \sim U(1, n)$, and $\epsilon \sim \mathcal{N}_w(0, \sigma_t^2 I)$. Since the mean of $\mathcal{N}_w(0, \sigma_t^2 I)$ is 0 and the function $m_i(\epsilon)$ intuitively moves the elements of $\epsilon$ as a whole closer to 0, we empirically set the mean direction of $V(\mu, \kappa)$ to 0. The key to the approximation is therefore estimating the concentration parameter $\kappa$. We denote the target distribution by $V(0, \kappa(n, \sigma_t))$, since $\kappa$ depends on the size $n$ of $\epsilon$ and on the noise scale $\sigma_t$ of $\mathcal{N}_w(0, \sigma_t^2 I)$.

We employ the Monte Carlo method to estimate $\kappa(n, \sigma_t)$. For each specified $n$ and $\sigma_t$, we first draw samples $\epsilon \sim \mathcal{N}_w(0, \sigma_t^2 I)$ and transform them into $\tilde\epsilon_i$ as described above. We then evaluate the probability of these transformed points under $V(0, \kappa(n, \sigma_t))$, with $\kappa(n, \sigma_t)$ initialized to an arbitrary value, and use the negative log-likelihood of these probabilities as the loss function. To find the optimal $\kappa(n, \sigma_t)$, we minimize this loss with the minimize function from the SciPy library, separately for each $(n, \sigma_t)$ configuration.
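The Monte Carlo fit can be sketched with NumPy and SciPy as below. Two parts are assumptions on our side: `circular_mean_shift` is a hypothetical stand-in for $m(\cdot)$ of Eq.(15) (it subtracts the per-row circular mean and wraps to $[0, 1)$), and $\kappa$ is optimized in log space to keep it positive. `scipy.special.i0e` is the exponentially scaled Bessel function, so $\log I_0(\kappa) = \log(\texttt{i0e}(\kappa)) + \kappa$; the sample size and seed are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import i0e

def wrap(x):
    # Assumed truncation function w(.): wrap into [0, 1)
    return np.mod(x, 1.0)

def circular_mean_shift(eps):
    # Hypothetical stand-in for m(.): subtract the circular mean of each row, then wrap
    x = np.cos(2 * np.pi * eps).mean(axis=-1, keepdims=True)
    y = np.sin(2 * np.pi * eps).mean(axis=-1, keepdims=True)
    return wrap(eps - np.arctan2(y, x) / (2 * np.pi))

def vm_logpdf(x, kappa, mu=0.0):
    # log V(x; mu, kappa) = kappa * cos(2*pi*x - mu) - log I0(kappa), period 1
    return kappa * np.cos(2 * np.pi * x - mu) - (np.log(i0e(kappa)) + kappa)

def fit_kappa(n, sigma, num_samples=20000, seed=0):
    rng = np.random.default_rng(seed)
    # Draws of eps ~ N_w(0, sigma^2 I): one row of length n per Monte Carlo draw
    eps = wrap(sigma * rng.standard_normal((num_samples, n)))
    eps_tilde = circular_mean_shift(eps)
    # Keep one element per draw, with i ~ U(1, n)
    idx = rng.integers(0, n, size=num_samples)
    samples = eps_tilde[np.arange(num_samples), idx]

    def nll(log_kappa):
        # Negative log-likelihood of the samples under V(0, kappa)
        return -np.mean(vm_logpdf(samples, np.exp(log_kappa[0])))

    result = minimize(nll, x0=np.array([0.0]))  # initial kappa = 1 (arbitrary)
    return float(np.exp(result.x[0]))
```

Since the fit depends only on $(n, \sigma_t)$, one would tabulate $\kappa(n, \sigma_t)$ once for every atom count and noise level used in training.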

We have thus obtained an approximate probability density function for each element of $\tilde\epsilon$. Since the elements are treated as independent and identically distributed, computing the score for $\tilde\epsilon$ reduces to computing the score of each individual element, i.e., $\nabla_{\tilde\epsilon_i}\log V(\tilde\epsilon_i \mid 0, \kappa(n, \sigma_t))$. To derive it, we first take the logarithm of $V(\tilde\epsilon_i; \mu, \kappa)$:

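$$\log V(\tilde\epsilon_i; \mu, \kappa) = \log\frac{e^{\kappa\cos(2\pi\tilde\epsilon_i - \mu)}}{I_0(\kappa)} = \kappa\cos(2\pi\tilde\epsilon_i - \mu) - \log I_0(\kappa).$$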

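Taking the gradient with respect to $\tilde\epsilon_i$, with $\mu = 0$ and $\kappa = \kappa(n, \sigma_t)$, we get:

$$\nabla_{\tilde\epsilon_i}\log V(\tilde\epsilon_i \mid 0, \kappa(n, \sigma_t)) = \frac{d}{d\tilde\epsilon_i}\Big(\kappa(n, \sigma_t)\cos(2\pi\tilde\epsilon_i) - \log I_0(\kappa(n, \sigma_t))\Big) = -2\pi\kappa(n, \sigma_t)\sin(2\pi\tilde\epsilon_i). \tag{34}$$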

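To sum up, we have:

$$\nabla_{\tilde\epsilon_i}\log p(\tilde\epsilon_i) \approx \nabla_{\tilde\epsilon_i}\log V(\tilde\epsilon_i \mid 0, \kappa(n, \sigma_t)) = -2\pi\kappa(n, \sigma_t)\sin(2\pi\tilde\epsilon_i).$$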
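Eq.(34) can be sanity-checked against a central finite difference of the log-density. This NumPy/SciPy sketch uses an arbitrary $\kappa = 4$; it verifies only the derivative, not any fitted $\kappa(n, \sigma_t)$.

```python
import numpy as np
from scipy.special import i0e

def vm_log_density(x, kappa):
    # log V(x | 0, kappa) = kappa * cos(2*pi*x) - log I0(kappa), period 1
    return kappa * np.cos(2 * np.pi * x) - (np.log(i0e(kappa)) + kappa)

def vm_score(x, kappa):
    # Eq.(34): d/dx log V(x | 0, kappa) = -2*pi*kappa*sin(2*pi*x)
    return -2 * np.pi * kappa * np.sin(2 * np.pi * x)

x = np.linspace(-0.5, 0.5, 11)
kappa = 4.0
h = 1e-6
# Central difference approximation of the same derivative
fd = (vm_log_density(x + h, kappa) - vm_log_density(x - h, kappa)) / (2 * h)
```

The normalizer $\log I_0(\kappa)$ does not depend on $\tilde\epsilon_i$, so it drops out of the score, which is why the fitted $\kappa$ enters only as a multiplicative factor.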

C.2. Probabilistic Modeling Process

We simplify Eq.(18) to make the computation more efficient. Since $\nabla_{F_t}\log q(F_t)$ can be treated as operating on each row of $\tilde\epsilon$ independently, without loss of generality we let $\tilde\epsilon = [\tilde\epsilon_1, \tilde\epsilon_2, \ldots, \tilde\epsilon_n]$ denote any row of the original $\tilde\epsilon$, let $s$ denote the corresponding row of $\nabla_{F_t}\log q(F_t)$, and let $\tilde s = [\tilde s_1, \tilde s_2, \ldots, \tilde s_n]$ denote the corresponding row of $\nabla_{\tilde F}\log q(\tilde F)$. Then we have:

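$$s = \sum_{i=1}^{n}\nabla_{\tilde\epsilon_i}\log p(\tilde\epsilon_i)\,\nabla_{\tilde\epsilon} m_i(\tilde\epsilon) = \sum_{i=1}^{n}\tilde s_i\,\nabla_{\tilde\epsilon} m_i(\tilde\epsilon),$$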


where $\nabla_{\tilde\epsilon_i}\log p(\tilde\epsilon_i)$ is given by Eq.(34) and $\nabla_{\tilde\epsilon} m_i(\tilde\epsilon)$ by Eq.(32). Expanding further:

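$$\begin{aligned}
s &= \tilde s_1[1 + g_1, g_2, \ldots, g_n] + \tilde s_2[g_1, 1 + g_2, \ldots, g_n] + \cdots + \tilde s_n[g_1, g_2, \ldots, 1 + g_n] \\
&= \Big[\tilde s_1 + g_1\sum_{i=1}^{n}\tilde s_i,\ \ldots,\ \tilde s_n + g_n\sum_{i=1}^{n}\tilde s_i\Big] \\
&= \tilde s + \Big(\sum_{i=1}^{n}\tilde s_i\Big)g(\tilde\epsilon),
\end{aligned}$$

where

$$g(\tilde\epsilon) = [g_1, g_2, \ldots, g_n] = -\frac{x\cos(2\pi\tilde\epsilon) + y\sin(2\pi\tilde\epsilon)}{n(x^2 + y^2)},\qquad y = \frac{1}{n}\sum_{j=1}^{n}\sin(2\pi\tilde\epsilon_j),\qquad x = \frac{1}{n}\sum_{j=1}^{n}\cos(2\pi\tilde\epsilon_j).$$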

Consequently, we obtain an expression for $s$ that is more amenable to parallel computation. Extending it back to the original setting where $\tilde\epsilon = [\tilde\epsilon_1, \tilde\epsilon_2, \ldots, \tilde\epsilon_n] \in \mathbb{R}^{3 \times n}$, we arrive at the final expression:

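$$\nabla_{F_t}\log q(F_t) = \nabla_{\tilde F}\log q(\tilde F) + \Big(\sum_{i=1}^{n}\tilde s_i\Big)\mathbf{1}^\top \odot g(\tilde\epsilon), \tag{35}$$

$$g(\tilde\epsilon) = -\frac{x \odot \cos(2\pi\tilde\epsilon) + y \odot \sin(2\pi\tilde\epsilon)}{n(x^2 + y^2)}, \tag{36}$$

$$y = \Big(\frac{1}{n}\sum_{j=1}^{n}\sin(2\pi\tilde\epsilon_j)\Big)\mathbf{1}^\top,\qquad x = \Big(\frac{1}{n}\sum_{j=1}^{n}\cos(2\pi\tilde\epsilon_j)\Big)\mathbf{1}^\top,$$

where $\tilde s_i$ now denotes the $i$-th column of $\nabla_{\tilde F}\log q(\tilde F)$, $\mathbf{1} \in \mathbb{R}^n$ is the all-ones vector, and $\odot$, the squares, and the division are elementwise.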
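A NumPy sketch of Eq.(35) and Eq.(36), checked against a finite-difference chain rule, follows. It rests on our reading of the formulas: `m_shift` is a hypothetical stand-in for $m(\cdot)$ that subtracts the per-row circular mean (the wrap $w(\cdot)$ is omitted so the finite differences are well defined), $x$ and $y$ are per-row means, and each of the 3 rows is processed independently as described above.

```python
import numpy as np

def row_stats(eps):
    # x, y: per-row means of cos(2*pi*eps) and sin(2*pi*eps), shape (rows, 1)
    x = np.cos(2 * np.pi * eps).mean(axis=-1, keepdims=True)
    y = np.sin(2 * np.pi * eps).mean(axis=-1, keepdims=True)
    return x, y

def m_shift(eps):
    # Hypothetical stand-in for m(.): subtract the per-row circular mean (no wrap)
    x, y = row_stats(eps)
    return eps - np.arctan2(y, x) / (2 * np.pi)

def g(eps):
    # Eq.(36): g = -(x cos(2*pi*eps) + y sin(2*pi*eps)) / (n (x^2 + y^2))
    n = eps.shape[-1]
    x, y = row_stats(eps)
    return -(x * np.cos(2 * np.pi * eps) + y * np.sin(2 * np.pi * eps)) / (n * (x ** 2 + y ** 2))

def corrected_score(s_tilde, eps_tilde):
    # Eq.(35): s = s_tilde + (sum_i s_tilde[:, i]) 1^T ⊙ g(eps_tilde)
    return s_tilde + s_tilde.sum(axis=-1, keepdims=True) * g(eps_tilde)
```

Under this reading the entries of each row of $g(\tilde\epsilon)$ sum to $-1$, so each row of the corrected score sums to zero: it has no component along the all-ones direction, i.e., along a uniform periodic translation.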

C.3. Algorithms for Training and Sampling

Algorithm 1 summarizes the forward diffusion process and the training procedure for the denoising model $\phi$, while Algorithm 2 describes the reverse sampling process. Both algorithms preserve the desired symmetries provided that $\phi$ is designed to respect them. Note that we use the predictor-corrector sampler (Song et al., 2020) to sample $F_0$ in Algorithm 2, where Line 8 is the predictor and Lines 11-12 are the corrector. $m(\cdot)$ denotes the function in Eq.(15), $w(\cdot)$ denotes the truncation function, and $g(\cdot)$ denotes the function in Eq.(36).

C.4. Hyper-parameters and Training Details.

For EquiCSP, we use a 4-layer setting with 256 hidden states for Perov-5 and a 6-layer setting with 512 hidden states for the other datasets. The dimension of the Fourier embedding is set to $k = 256$. We use the cosine scheduler with $s = 0.008$ to control the variance of the DDPM process on $C_t$, and an exponential scheduler with $\sigma_1 = 0.005$, $\sigma_T = 0.5$ to control the noise scale of the score matching process on $F_t$. The number of diffusion steps is set to $T = 1000$. Our model is trained for 3500, 4000, 1000, and 1000 epochs on Perov-5, Carbon-24, MP-20, and MPTS-52, respectively, using the same optimizer and learning rate scheduler as CDVAE. For the Langevin dynamics step size $\gamma$, we use $\gamma = 5 \times 10^{-7}$ for Perov-5, $\gamma = 5 \times 10^{-6}$ for MP-20, and $\gamma = 1 \times 10^{-5}$ for MPTS-52; for ab initio generation on Carbon-24 we use $\gamma = 1 \times 10^{-5}$. All models are trained on one NVIDIA A800 GPU.
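The two noise schedules can be sketched in NumPy as below. This is an assumption about their exact form, since the formulas are not spelled out in this section: the cosine schedule follows the common form popularized by Nichol and Dhariwal (with $\beta_t$ clipped at 0.999), and the exponential scheduler is taken to be a geometric interpolation between $\sigma_1$ and $\sigma_T$.

```python
import numpy as np

def cosine_betas(T, s=0.008, max_beta=0.999):
    # f(t) = cos^2(((t / T) + s) / (1 + s) * pi / 2) and alpha_bar_t = f(t) / f(0)
    t = np.arange(T + 1)
    f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    # beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped for stability near t = T
    return np.clip(1 - alpha_bar[1:] / alpha_bar[:-1], 0.0, max_beta)

def exponential_sigmas(T, sigma_1=0.005, sigma_T=0.5):
    # Assumed geometric form: sigma_t = sigma_1 * (sigma_T / sigma_1) ** ((t - 1) / (T - 1))
    t = np.arange(1, T + 1)
    return sigma_1 * (sigma_T / sigma_1) ** ((t - 1) / (T - 1))

betas = cosine_betas(1000)            # DDPM variances for the lattice process C_t
alpha_bar = np.cumprod(1 - betas)     # cumulative products used in Algorithm 1, line 4
sigmas = exponential_sigmas(1000)     # noise scales for the fractional coordinates F_t
```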


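Algorithm 1 Training Procedure of EquiCSP

1: Input: lattice parameters $C_0$, atom types $A$, fractional coordinates $F_0$, denoising model $\phi$, and the number of sampling steps $T$.
2: Sample $\epsilon_L \sim \mathcal{N}(0, I)$, $\epsilon_F \sim \mathcal{N}(0, I)$, $P \sim U(S_3)$ and $t \sim U(1, T)$.
3: $\tilde\epsilon \leftarrow m(w(\sigma_t\epsilon_F))$
4: $C_t \leftarrow \sqrt{\bar\alpha_t}\,C_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon_L$
5: $F_t \leftarrow w(F_0 + \tilde\epsilon)$
6: $\hat\epsilon_L, s_\theta, F_\theta \leftarrow \phi(C_t, F_t, A, t)$
7: $\hat\epsilon'_L, s'_\theta, F'_\theta \leftarrow \phi(PC_t, PF_t, A, t)$
8: $\mathcal{L}_C \leftarrow \|\epsilon_L - \hat\epsilon_L\|_2^2$
9: $\mathcal{L}_s \leftarrow \|(-2\pi\kappa(n, \sigma_t)\sin(2\pi\tilde\epsilon)) - s_\theta\|_2^2$
10: $\mathcal{L}_F \leftarrow \|\tilde\epsilon - F_\theta\|_2^2$
11: $\mathcal{L}^p_C \leftarrow \|P\hat\epsilon_L - \hat\epsilon'_L\|_2^2$
12: $\mathcal{L}^p_s \leftarrow \|Ps_\theta - s'_\theta\|_2^2$
13: $\mathcal{L}^p_F \leftarrow \|PF_\theta - F'_\theta\|_2^2$
14: Minimize $\mathcal{L}_C + \mathcal{L}_s + \mathcal{L}_F + \mathcal{L}^p_C + \mathcal{L}^p_s + \mathcal{L}^p_F$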
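Lines 3-5 and the score target of Line 9 can be sketched in NumPy as follows. This is illustrative only: `m_shift` is a hypothetical stand-in for $m(\cdot)$ of Eq.(15) (per-row circular-mean removal followed by the wrap), $C_0$ is taken as a 6-vector of transformed lattice parameters, $\bar\alpha_t$ and $\sigma_t$ are passed in directly, and the model calls and permutation losses are omitted.

```python
import numpy as np

def wrap(f):
    # Assumed truncation function w(.): wrap into [0, 1)
    return np.mod(f, 1.0)

def m_shift(eps):
    # Hypothetical stand-in for m(.): remove the per-row circular mean, then wrap
    x = np.cos(2 * np.pi * eps).mean(axis=-1, keepdims=True)
    y = np.sin(2 * np.pi * eps).mean(axis=-1, keepdims=True)
    return wrap(eps - np.arctan2(y, x) / (2 * np.pi))

def forward_noise(C0, F0, alpha_bar_t, sigma_t, rng):
    # Algorithm 1, lines 2-5 (sampling of P and t omitted)
    eps_L = rng.standard_normal(C0.shape)
    eps_F = rng.standard_normal(F0.shape)
    eps_tilde = m_shift(wrap(sigma_t * eps_F))                           # line 3
    Ct = np.sqrt(alpha_bar_t) * C0 + np.sqrt(1 - alpha_bar_t) * eps_L    # line 4
    Ft = wrap(F0 + eps_tilde)                                             # line 5
    return Ct, Ft, eps_L, eps_tilde

def score_target(eps_tilde, kappa):
    # Regression target of L_s (line 9): -2*pi*kappa(n, sigma_t) * sin(2*pi*eps_tilde)
    return -2 * np.pi * kappa * np.sin(2 * np.pi * eps_tilde)
```

Because $\tilde\epsilon$ has zero circular mean per row, the noise added to $F_0$ carries no net periodic translation.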

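Algorithm 2 Sampling Procedure of EquiCSP

1: Input: atom types $A$, denoising model $\phi$, number of sampling steps $T$, step size of Langevin dynamics $\gamma$.
2: Sample $C_T \sim \mathcal{N}(0, I)$, $F_T \sim U(0, 1)$.
3: for $t \leftarrow T, \ldots, 1$ do
4: Sample $\epsilon_L, \epsilon_F, \epsilon'_F \sim \mathcal{N}(0, I)$
5: $\hat\epsilon_L, s_\theta, F_\theta \leftarrow \phi(C_t, F_t, A, t)$
6: $C_{t-1} \leftarrow \frac{1}{\sqrt{\alpha_t}}\Big(C_t - \frac{\beta_t}{\sqrt{1 - \bar\alpha_t}}\hat\epsilon_L\Big) + \sqrt{\beta_t \cdot \frac{1 - \bar\alpha_{t-1}}{1 - \bar\alpha_t}}\,\epsilon_L$
7: $\hat\epsilon_F \leftarrow s_\theta + \Big(\sum_{i=1}^{n} s_\theta[:, i]\Big)\mathbf{1}^\top \odot g(F_\theta)$
8: $F_{t-\frac{1}{2}} \leftarrow w\Big(F_t + (\sigma_t^2 - \sigma_{t-1}^2)\hat\epsilon_F + \frac{\sigma_{t-1}\sqrt{\sigma_t^2 - \sigma_{t-1}^2}}{\sigma_t}\epsilon_F\Big)$
9: $-, s_\theta, F_\theta \leftarrow \phi(C_{t-1}, F_{t-\frac{1}{2}}, A, t - 1)$
10: $\hat\epsilon_F \leftarrow s_\theta + \Big(\sum_{i=1}^{n} s_\theta[:, i]\Big)\mathbf{1}^\top \odot g(F_\theta)$
11: $d_t \leftarrow \gamma\sigma_{t-1}/\sigma_1$
12: $F_{t-1} \leftarrow w\Big(F_{t-\frac{1}{2}} + d_t\hat\epsilon_F + \sqrt{2d_t}\,\epsilon'_F\Big)$
13: end for
14: Return $C_0, F_0$.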
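The fractional-coordinate update of Lines 8 and 10-12 can be sketched in NumPy as one reverse step. This is a sketch under stated assumptions: `score_fn(F, sigma)` is a hypothetical callable standing in for the denoiser plus the correction of Lines 7 and 10, $w(\cdot)$ is assumed to be a modulo wrap, and the lattice branch (Line 6) is omitted.

```python
import numpy as np

def wrap(f):
    # Assumed truncation function w(.): wrap into [0, 1)
    return np.mod(f, 1.0)

def predictor_corrector_F(Ft, score_fn, sigma_t, sigma_prev, sigma_1, gamma, rng):
    """One reverse step for fractional coordinates (Algorithm 2, lines 8 and 10-12).

    score_fn(F, sigma) is a hypothetical callable returning the corrected score
    hat_eps_F at noise level sigma; sigma_prev plays the role of sigma_{t-1}.
    """
    # Predictor (line 8): reverse-diffusion move from sigma_t to sigma_{t-1}
    var_diff = sigma_t ** 2 - sigma_prev ** 2
    eps_F = rng.standard_normal(Ft.shape)
    F_half = wrap(Ft + var_diff * score_fn(Ft, sigma_t)
                  + sigma_prev * np.sqrt(var_diff) / sigma_t * eps_F)
    # Corrector (lines 10-12): one Langevin step with d_t = gamma * sigma_{t-1} / sigma_1
    d_t = gamma * sigma_prev / sigma_1
    eps_F_prime = rng.standard_normal(Ft.shape)
    return wrap(F_half + d_t * score_fn(F_half, sigma_prev) + np.sqrt(2 * d_t) * eps_F_prime)
```

With a zero score and $\sigma_{t-1} = 0$, both noise terms vanish and the step returns its input unchanged, which is a quick way to check the wiring.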

D. Learning Curves of Different Variants.

We plot the training loss curves of the proposed variants in Figures 4 and 5.

E. Ab initio Structure Generation

Dataset. We conduct experiments on the Perov-5, Carbon-24, and MP-20 datasets. Notably, Carbon-24 (Pickard, 2020) contains 10,153 carbon materials, each with 6 to 24 atoms per cell. Unlike the other datasets used in Table 3, where a composition typically corresponds to a single stable structure, Carbon-24 contains many different structures for each composition. This dataset therefore allows us to evaluate the ability to generate diverse metastable structures in a one-to-many setting, reflecting the variability inherent in crystal structures.

Extending EquiCSP to the Ab Initio Generation Task. We use the approach described in Appendix G of DiffCSP (Jiao et al., 2023) to extend EquiCSP to the ab initio generation task.

Baseline. We compare our approach against the following generative methods suited to this task. FTCP (Ren et al., 2021), a coordinate-based, non-E(3)-invariant method, represents crystals by combining real-space and Fourier-transformed properties and uses a CNN-VAE architecture for generation. G-SchNet (Gebauer et al., 2019) employs an autoregressive model for structure generation, while P-G-SchNet is a G-SchNet variant that incorporates periodicity. CDVAE (Xie et al., 2021), as previously mentioned, integrates a score matching-based decoder into the VAE framework; its standard version is applied here without modifications. SyMat (Luo et al., 2023) uses a variational autoencoder to generate the lattice and atom types of periodic structures. DiffCSP (Jiao et al., 2023), a diffusion method, learns the distribution of stable structures while accounting for translation, rotation, and periodicity.


Figure 4. Learning curves of lattice loss.

Figure 5. Learning curves of fractional coordinates loss.


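Table 3. Results on the ab initio generation task. Validity and coverage are in %; $d_\rho$, $d_E$, and $d_{\text{elem}}$ are Wasserstein distances (lower is better). Results of the baseline methods are taken from Jiao et al. (2023).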

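| Data | Method | Validity Struc. (%) | Validity Comp. (%) | COV-R (%) | COV-P (%) | $d_\rho$ | $d_E$ | $d_{\text{elem}}$ |
|---|---|---|---|---|---|---|---|---|
| Perov-5 | FTCP | 0.24 | 54.24 | 0.00 | 0.00 | 10.27 | 156.0 | 0.6297 |
| Perov-5 | Cond-DFC-VAE | 73.60 | 82.95 | 73.92 | 10.13 | 2.268 | 4.111 | 0.8373 |
| Perov-5 | G-SchNet | 99.92 | 98.79 | 0.18 | 0.23 | 1.625 | 4.746 | 0.0368 |
| Perov-5 | P-G-SchNet | 79.63 | 99.13 | 0.37 | 0.25 | 0.2755 | 1.388 | 0.4552 |
| Perov-5 | CDVAE | 100.0 | 98.59 | 99.45 | 98.46 | 0.1258 | 0.0264 | 0.0628 |
| Perov-5 | SyMat | 100.0 | 97.40 | 99.68 | 98.64 | 0.1893 | 0.2364 | 0.0177 |
| Perov-5 | DiffCSP | 100.0 | 98.85 | 99.74 | 98.27 | 0.1110 | 0.0263 | 0.0128 |
| Perov-5 | EquiCSP | 100.0 | 98.60 | 99.60 | 98.76 | 0.1110 | 0.0257 | 0.0503 |
| Carbon-24 | FTCP | 0.08 | - | 0.00 | 0.00 | 5.206 | 19.05 | - |
| Carbon-24 | G-SchNet | 99.94 | - | 0.00 | 0.00 | 0.9427 | 1.320 | - |
| Carbon-24 | P-G-SchNet | 48.39 | - | 0.00 | 0.00 | 1.533 | 134.7 | - |
| Carbon-24 | CDVAE | 100.0 | - | 99.80 | 83.08 | 0.1407 | 0.2850 | - |
| Carbon-24 | SyMat | 100.0 | - | 100.0 | 97.59 | 0.1195 | 3.9576 | - |
| Carbon-24 | DiffCSP | 100.0 | - | 99.90 | 97.27 | 0.0805 | 0.0820 | - |
| Carbon-24 | EquiCSP | 100.0 | - | 99.75 | 97.12 | 0.0734 | 0.0508 | - |
| MP-20 | FTCP | 1.55 | 48.37 | 4.72 | 0.09 | 23.71 | 160.9 | 0.7363 |
| MP-20 | G-SchNet | 99.65 | 75.96 | 38.33 | 99.57 | 3.034 | 42.09 | 0.6411 |
| MP-20 | P-G-SchNet | 77.51 | 76.40 | 41.93 | 99.74 | 4.04 | 2.448 | 0.6234 |
| MP-20 | CDVAE | 100.0 | 86.70 | 99.15 | 99.49 | 0.6875 | 0.2778 | 1.432 |
| MP-20 | SyMat | 100.0 | 88.26 | 98.97 | 99.97 | 0.3805 | 0.3506 | 0.5067 |
| MP-20 | DiffCSP | 100.0 | 83.25 | 99.71 | 99.76 | 0.3502 | 0.1247 | 0.3398 |
| MP-20 | EquiCSP | 99.97 | 82.20 | 99.65 | 99.68 | 0.1300 | 0.0848 | 0.3978 |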

Evaluation Metrics. We assess the results using three criteria. Validity: this covers both structural and compositional validity. Structural validity is the percentage of generated structures in which all pairwise atomic distances exceed 0.5 Å, while compositional validity checks charge neutrality using the SMACT criteria (Davies et al., 2019). Coverage: this metric evaluates how well the structural and compositional attributes of the generated samples $S_g$ match those of the test set $S_t$. We use $d_S(M_1, M_2)$ and $d_C(M_1, M_2)$ to denote the L2 distances between CrystalNN structural fingerprints (Zimmermann & Jain, 2020) and between normalized Magpie compositional fingerprints (Ward et al., 2016), respectively. Coverage Recall is computed as $\text{COV-R} = \frac{1}{|S_t|}\big|\{M_i \mid M_i \in S_t, \exists M_j \in S_g, d_S(M_i, M_j) < \delta_S, d_C(M_i, M_j) < \delta_C\}\big|$, with predefined thresholds $\delta_S$ and $\delta_C$. Coverage Precision (COV-P) is defined analogously with the roles of $S_g$ and $S_t$swapped. Property Statistics: we compute the Wasserstein distances between the generated and test structures for three properties, namely density, formation energy, and elemental count, denoted as $d_\rho$, $d_E$, and $d_{\text{elem}}$, respectively. The validity and coverage metrics are computed on 10,000 generated samples, whereas the property statistics are computed on 1,000 samples that pass the validity check.
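Given precomputed fingerprints, COV-R and COV-P reduce to thresholded neighbour checks. The NumPy sketch below is illustrative: it assumes the CrystalNN and Magpie fingerprints are already available as arrays (computing them is out of scope), and it leaves $\delta_S$ and $\delta_C$ as arguments because their values are not given here. It builds full pairwise distance matrices, which suits a sketch but is memory-hungry for 10,000 samples against a large test set.

```python
import numpy as np

def coverage(gen_struct, gen_comp, test_struct, test_comp, delta_s, delta_c):
    """COV-R and COV-P from structural/compositional fingerprints (rows = samples)."""
    # L2 distances between every test sample i and every generated sample j
    d_s = np.linalg.norm(test_struct[:, None, :] - gen_struct[None, :, :], axis=-1)
    d_c = np.linalg.norm(test_comp[:, None, :] - gen_comp[None, :, :], axis=-1)
    # close[i, j]: both distances fall under their thresholds
    close = (d_s < delta_s) & (d_c < delta_c)
    cov_r = close.any(axis=1).mean()  # test samples matched by at least one generated sample
    cov_p = close.any(axis=0).mean()  # generated samples matching at least one test sample
    return float(cov_r), float(cov_p)
```

For example, with test fingerprints at 0 and 10 and generated fingerprints at 0.1, 0.2, and 50 (thresholds of 1), one of two test samples is covered (COV-R = 0.5) and two of three generated samples match the test set (COV-P = 2/3).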

Results. As detailed in Table 3, EquiCSP performs strongly across multiple metrics. It achieves competitive validity and coverage precision, underscoring the high quality of the generated samples, and it delivers robust coverage recall, demonstrating the diversity of the generated structures. On the property metrics, EquiCSP reduces the density distance $d_\rho$, which depends on the volume of the generated lattice, and the formation energy distance $d_E$, which relates to the atomic configuration; the reduction is largest on MP-20, where $d_\rho$ drops from 0.3502 to 0.1300 and $d_E$ from 0.1247 to 0.0848 relative to DiffCSP. These reductions in key distances underscore the effectiveness of our symmetry-aware approach.

F. Visualizations

In this section, we present additional visualizations of the structures predicted by EquiCSP and by the second-best method, DiffCSP, in Figure 6. EquiCSP provides more accurate predictions than DiffCSP.


Figure 6. Additional visualizations of the predicted structures. We translate the same atom to the origin for better visualization and comparison.