# SPACE GROUP CONSTRAINED CRYSTAL GENERATION

Published as a conference paper at ICLR 2024

Rui Jiao$^{1,2}$, Wenbing Huang$^{3,4}$, Yu Liu$^{5}$, Deli Zhao$^{5}$, Yang Liu$^{1,2}$

$^1$Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University
$^2$Institute for AIR, Tsinghua University
$^3$Gaoling School of Artificial Intelligence, Renmin University of China
$^4$Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China
$^5$Alibaba Group

ABSTRACT

Crystals are the foundation of numerous scientific and industrial applications. While various learning-based approaches have been proposed for crystal generation, existing methods seldom consider the space group constraint, which is crucial in describing the geometry of crystals and closely relevant to many desirable properties. However, considering the space group constraint is challenging owing to its diverse and nontrivial forms. In this paper, we reduce the space group constraint into an equivalent formulation that is more tractable to handcraft into the generation process. In particular, we translate the space group constraint into two parts: the basis constraint of the invariant logarithmic space of the lattice matrix and the Wyckoff position constraint of the fractional coordinates. Based on the derived constraints, we then propose DiffCSP++, a novel diffusion model that enhances the previous work DiffCSP (Jiao et al., 2023) by further taking the space group constraint into account. Experiments on several popular datasets verify the benefit of involving the space group constraint, and show that DiffCSP++ achieves promising performance on crystal structure prediction, ab initio crystal generation and controllable generation with customized space groups.

1 INTRODUCTION

Crystal generation represents a critical task in the realm of scientific computation and industrial applications.
The ability to accurately and efficiently generate crystal structures opens up avenues for new material discovery and design, thereby having profound implications for various fields, including physics, chemistry, and material science (Liu et al., 2017; Oganov et al., 2019). Recent advancements in machine learning have paved the way for the application of generative models to this task (Nouira et al., 2018; Hoffmann et al., 2019; Hu et al., 2020; Ren et al., 2021). Among various strategies, diffusion models have proven particularly effective in generating realistic and diverse crystal structures (Xie et al., 2021; Jiao et al., 2023). These methods leverage a stochastic process to gradually transform a random initial state into a stable distribution, effectively capturing the complex landscapes of crystal structures.

Despite the success of existing methods, one significant aspect that has been largely overlooked is the consideration of space group symmetry (Hiller, 1986). Space groups play a pivotal role in crystallography, defining the geometry of crystal structures and being intrinsically tied to many properties such as the topological phases (Tang et al., 2019; Chen et al., 2022). However, integrating space group symmetry into diffusion models is a non-trivial task due to the diverse and complex forms of space groups.

In this paper, we fill this gap by introducing a novel approach that effectively takes space group constraints into account. Our method, termed DiffCSP++, enhances the previous DiffCSP method (Jiao et al., 2023) by translating the space group constraint into a more manageable form, which can be seamlessly integrated into the diffusion process.

Footnote: This work was done while Rui Jiao was an intern at Alibaba Group. Wenbing Huang and Yang Liu are corresponding authors.

Our contributions can be summarized as follows:
- We propose to equivalently interpret the space group constraint as two tractable parts: the basis constraint of the O(3)-invariant logarithmic space of the lattice matrix in § 4.1 and the Wyckoff position constraint of the fractional coordinates in § 4.2, which largely facilitates the incorporation of the space group constraint into the crystal generation process.
- Our method DiffCSP++ separately and simultaneously generates the lattices, fractional coordinates and the atom composition under the reduced form of the space group constraint, through a novel denoising model that is E(3)-invariant.
- Extensive experiments demonstrate that our method not only respects the crucial space group constraints but also achieves promising performance in crystal structure prediction and ab initio crystal generation.

2 RELATED WORKS

Learning-based Crystal Generation. Data-driven approaches have emerged as a promising direction in the field of crystal generation. These techniques, rather than employing graph-based models, depicted crystals through alternative representations such as voxels (Court et al., 2020; Hoffmann et al., 2019; Noh et al., 2019), distance matrices (Yang et al., 2021; Hu et al., 2020; 2021) or 3D coordinates (Nouira et al., 2018; Kim et al., 2020; Ren et al., 2021). Recently, CDVAE (Xie et al., 2021) combines the VAE backbone with a diffusion-based decoder, and generates the atom types and coordinates on a multi-graph (Xie & Grossman, 2018) built upon predicted lattice parameters. DiffCSP (Jiao et al., 2023) jointly optimizes the lattice matrices and atom coordinates via a diffusion framework. Based on the joint diffusion paradigm, MatterGen (Zeni et al., 2023) applies polar decomposition to represent lattices as O(3)-invariant symmetric matrices, and GemsDiff (Klipfel et al., 2024) projects the lattice matrices onto a decomposed linear vector space.
Although these approaches share similar lattice representations with our method, they often overlook the constraints imposed by space groups. Addressing this gap, PGCGM (Zhao et al., 2023) incorporates the affine matrices of the space group as additional input into a Generative Adversarial Network (GAN) model. However, PGCGM is restricted to ternary systems, which limits its universality and renders it inapplicable to the datasets used in this paper. Besides, PCVAE (Liu et al., 2023) integrates space group constraints to predict lattice parameters using a conditional VAE. In contrast, we impose constraints on the lattice within the logarithmic space, ensuring compatibility with the diffusion-based framework. Moreover, we further specify the Wyckoff position constraints of all atoms, achieving the final goal of structure prediction.

Diffusion Generative Models. Diffusion models have been recognized as a powerful generative framework across various domains. Initially gaining traction in the field of computer vision (Ho et al., 2020; Rombach et al., 2021; Ramesh et al., 2022), diffusion models have since been applied to the generation of small molecules (Xu et al., 2021; Hoogeboom et al., 2021), protein structures (Luo et al., 2022) and crystalline materials (Xie et al., 2021; Jiao et al., 2023). Notably, Chroma (Ingraham et al., 2022) incorporates symmetry conditions into the generation process for protein structures. Different from symmetric proteins, space group constraints require reliable designs for the generation of lattices and special Wyckoff positions, which is the main topic of this paper.

3 PRELIMINARIES

Crystal Structures. A crystal structure $M$ describes the periodic arrangement of atoms in 3D space.
The repeating unit is called a unit cell, which can be characterized by a triplet, denoted as $(A, X, L)$, where $A = [a_1, a_2, \dots, a_N] \in \mathbb{R}^{h \times N}$ represents the one-hot representations of atom types, $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{3 \times N}$ comprises the atoms' Cartesian coordinates, and $L = [l_1, l_2, l_3] \in \mathbb{R}^{3 \times 3}$ is the lattice matrix containing three basis vectors that periodically translate the unit cell to the entire 3D space. The crystal can thus be extended as $M := \{(a_i, x'_i) \mid x'_i \overset{L}{=} x_i\}$, where $x'_i \overset{L}{=} x_i$ denotes that $x'_i$ is equivalent to $x_i$ if $x'_i$ can be obtained via an integral translation of $x_i$ along the lattice $L$, i.e.,

$$x'_i \overset{L}{=} x_i \;\Leftrightarrow\; \exists\, k_i \in \mathbb{Z}^{3 \times 1}, \ \text{s.t.}\ x'_i = x_i + L k_i. \tag{1}$$

Apart from the prevalent Cartesian coordinate system, fractional coordinates are also widely applied in crystallography. Given a lattice matrix $L = [l_1, l_2, l_3]$, the fractional coordinate $f = (f_1, f_2, f_3)^\top \in [0, 1)^3$ locates the atom at $x = \sum_{j=1}^{3} f_j l_j$. More generally, given a Cartesian coordinate matrix $X = [x_1, x_2, \dots, x_N]$, the corresponding fractional matrix is derived as $F = L^{-1} X$.

Figure 1: Overview of our proposed DiffCSP++ for the denoising from $M_t$ to $M_{t-1}$. We decompose the space group constraints as the crystal family constraints on the lattice matrix (the red dashed line) and the Wyckoff position constraints on each atom (the blue dashed line).

Space Group. The concept of space group is used to describe the inherent symmetry of a crystal structure. Given a transformation $g \in \mathrm{E}(3)$, we define the transformation of the coordinate matrix $X$ as $g \cdot X$, which is implemented as $g \cdot X := OX + t\mathbf{1}^\top$ for an orthogonal matrix $O \in \mathrm{O}(3)$, a translation vector $t \in \mathbb{R}^3$ and an $N$-dimensional all-ones vector $\mathbf{1}$. If $g$ leaves $M$ invariant, that is, $g \cdot M := \{(a_i, g \cdot x_i)\} = M$ (note that the symbol "=" here refers to the equivalence between sets), $M$ is recognized to be symmetric with respect to $g$.
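The definitions above can be made concrete with a small NumPy sketch. It converts between Cartesian and fractional coordinates, tests the lattice equivalence of Eq. (1), and checks whether a transformation $g = (O, t)$ leaves a crystal invariant. The function names, the use of integer atomic numbers instead of one-hot columns, and the CsCl-like toy cell are illustrative assumptions, not part of the paper's implementation.

```python
import numpy as np

def cart_to_frac(L, X):
    """F = L^{-1} X; columns of L are lattice vectors, columns of X are atoms."""
    return np.linalg.solve(L, X)

def frac_to_cart(L, F):
    """x = sum_j f_j l_j for every atom, i.e. X = L F."""
    return L @ F

def is_integer(M, tol=1e-8):
    return bool(np.all(np.abs(M - np.round(M)) < tol))

def equivalent(L, x_a, x_b, tol=1e-8):
    """Eq. (1): x_a and x_b differ by L k for some integer vector k."""
    return is_integer(np.linalg.solve(L, x_a - x_b), tol)

def is_symmetry(L, types, X, O, t, tol=1e-8):
    """Check g.M = M for g = (O, t), assuming atoms are distinct modulo the lattice.

    The lattice must be mapped onto itself (O L = L K with integer K), and every
    transformed atom must coincide, modulo the lattice, with an atom of the same type.
    """
    if not is_integer(np.linalg.solve(L, O @ L), tol):
        return False
    gX = O @ X + t[:, None]
    n = X.shape[1]
    return all(
        any(types[i] == types[j] and equivalent(L, gX[:, i], X[:, j], tol)
            for j in range(n))
        for i in range(n)
    )

# Toy cubic cell (hypothetical values): one atom at the corner, one at the body center.
L = 4.0 * np.eye(3)
F = np.array([[0.0, 0.5], [0.0, 0.5], [0.0, 0.5]])
types = [55, 17]
X = frac_to_cart(L, F)

inversion_ok = is_symmetry(L, types, X, -np.eye(3), np.zeros(3))
half_shift_ok = is_symmetry(L, types, X, np.eye(3), np.array([2.0, 2.0, 2.0]))
```

In this toy cell the inversion through the origin is a symmetry, while the half-cell translation is not, because it would map one atom type onto the other.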
The space group symmetry $g \cdot M = M$ can also be depicted by checking how the atoms are transformed. Specifically, for each transformation $g \in G(M)$, there exists a permutation matrix $P_g \in \{0, 1\}^{N \times N}$ that maps each atom to its corresponding symmetric point:

$$A = A P_g, \qquad g \cdot X \overset{L}{=} X P_g. \tag{2}$$

The set of all possible symmetric transformations of $M$ constitutes a space group $G(M) = \{g \in \mathrm{E}(3) \mid g \cdot M = M\}$. Owing to the periodic nature of crystals, the size of $G(M)$ is finite, and the total count of different space groups is finite as well. It has been conclusively demonstrated that there are 230 kinds of space groups for all crystals.

Task Definition. We focus on generating space group-constrained crystals by learning a conditional distribution $p(M \mid G)$, where $G$ is the given space group with size $|G| = m$. Most previous works (Xie et al., 2021; Jiao et al., 2023) derive $p(M)$ without $G$, and they usually apply E(3)-equivariant generative models to implement $p(M)$ to eliminate the influence of the choice of the coordinate system. In this paper, the O(3) equivariance is no longer required as both $L$ and $X$ will be embedded to invariant quantities, which will be introduced in § 4.1. The translation invariance and periodicity will be maintained under the Fourier representation of the fractional coordinates, which will be shown in § 4.4.

4 THE PROPOSED METHOD: DIFFCSP++

It is nontrivial to exactly incorporate the constraint of Eq. (2) into existing generative models, due to the various types of space group constraints. In this section, we will reduce the space group constraint from two aspects: the invariant representation of constrained lattice matrices in § 4.1 and the Wyckoff positions of fractional coordinates in § 4.2, which will be tractably and inherently maintained during our proposed diffusion process in § 4.3.

4.1 INVARIANT REPRESENTATION OF LATTICE MATRICES

The lattice matrix $L \in \mathbb{R}^{3 \times 3}$ determines the shape of the unit cell.
If the determinant (namely the volume) of $L$ is meaningful, i.e., $\det(L) > 0$, then the lattice matrix is invertible and we have the following decomposition.

Proposition 1 (Polar Decomposition (Hall, 2013)). An invertible matrix $L \in \mathbb{R}^{3 \times 3}$ can be uniquely decomposed into $L = Q \exp(S)$, where $Q \in \mathbb{R}^{3 \times 3}$ is an orthogonal matrix, $S \in \mathbb{R}^{3 \times 3}$ is a symmetric matrix and $\exp(S) = \sum_{n=0}^{\infty} \frac{S^n}{n!}$ defines the exponential mapping of $S$.

The above proposition indicates that $L$ can be uniquely represented by a symmetric matrix $S$. Moreover, any O(3) transformation of $L$ leaves $S$ unchanged, as the transformation will be reflected by $Q$. We are able to find 6 bases of the space of symmetric matrices, e.g.,

$$B_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad B_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},\quad B_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},$$

$$B_4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad B_5 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{pmatrix},\quad B_6 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Each symmetric matrix can be expanded via the above symmetric bases as stated below.

Proposition 2. $\forall S \in \mathbb{R}^{3 \times 3}$ with $S = S^\top$, $\exists\, k = (k_1, \dots, k_6)$, s.t. $S = \sum_{i=1}^{6} k_i B_i$.

Combining Propositions 1 and 2, $L$ is determined, up to the orthogonal factor $Q$, by the values of $k_i$. Therefore, we are able to choose different combinations of the symmetric bases to reflect the space group constraint acting on $L$. The 230 space groups are classified into 6 crystal families, each determining the shape of $L$. After careful derivations (provided in Appendix A.3), the correspondence between the crystal families and the values of $k_i$ is given in Table 1. With such a table, when we want to generate the lattice restricted by a given space group, we can first retrieve the crystal family and then enforce the corresponding constraint on $k_i$ during generation.

Table 1: Relationship between the lattice shape and the constraint of the symmetric bases, where $a, b, c$ and $\alpha, \beta, \gamma$ denote the lengths and angles of the lattice bases, respectively.

| Crystal Family | Space Group No. | Lattice Shape | Constraint of Symmetric Bases |
|---|---|---|---|
| Triclinic | 1-2 | No constraint | No constraint |
| Monoclinic | 3-15 | $\alpha = \gamma = 90^\circ$ | $k_1 = k_3 = 0$ |
| Orthorhombic | 16-74 | $\alpha = \beta = \gamma = 90^\circ$ | $k_1 = k_2 = k_3 = 0$ |
| Tetragonal | 75-142 | $\alpha = \beta = \gamma = 90^\circ$, $a = b$ | $k_1 = k_2 = k_3 = 0$, $k_4 = 0$ |
| Hexagonal | 143-194 | $\alpha = \beta = 90^\circ$, $\gamma = 120^\circ$, $a = b$ | $k_2 = k_3 = 0$, $k_1 = -\log(3)/4$, $k_4 = 0$ |
| Cubic | 195-230 | $\alpha = \beta = \gamma = 90^\circ$, $a = b = c$ | $k_1 = k_2 = k_3 = 0$, $k_4 = k_5 = 0$ |

4.2 WYCKOFF POSITIONS OF FRACTIONAL COORDINATES

As shown in Eq. (2), each transformation $g \in G$ is associated with a permutation matrix $P_g$. Considering atom $i$ for example, it will be transformed to a symmetric and equivalent point $j$ if $P_g[i, j] = 1$, $i \neq j$, where $P_g[i, j]$ returns the element of the $i$-th row and $j$-th column. Under some particular transformations $g$, atom $i$ will be transformed to itself, implying that $P_g[i, i] = 1$. The transformations that leave $i$ invariant comprise the site symmetry group, defined as $G_i = \{g \in G \mid g \cdot x_i \overset{L}{=} x_i\} \subseteq G$.

Now, we introduce the notion of Wyckoff position that is useful in crystallography. Atom $i$ shares the same Wyckoff position with all atoms whose site symmetry groups are conjugate to $G_i$. There could be multiple types of Wyckoff positions in a unit cell.

Wyckoff positions can also be represented in the fractional coordinate system. Given a crystal structure with $N$ atoms belonging to $N'$ Wyckoff positions, the $s$-th Wyckoff position ($1 \leq s \leq N'$) contains $n_s$ atoms. The symbol $n_s$ is called the multiplicity of the $s$-th Wyckoff position and satisfies $\sum_{s=1}^{N'} n_s = N$. In the following sections, we denote $a'_s, f'_s$ as the atom type and basic coordinate of the $s$-th Wyckoff position, and $a_{s_i}, f_{s_i}$ as the atom type and fractional coordinate of atom $s_i$ in the $s$-th Wyckoff position. A type of Wyckoff position is formulated as a list of transformation pairs $\{(R_{s_i}, t_{s_i})\}_{i=1}^{n_s}$ that project the basic coordinate $f'_s$ to all equivalent positions $\{R_{s_i} f'_s + t_{s_i}\}_{i=1}^{n_s}$. Figure 2 illustrates an example with $N' = 7$.
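As a minimal sketch of how a list of transformation pairs $(R_{s_i}, t_{s_i})$ expands one basic coordinate into all equivalent positions, the snippet below encodes the 4d position of the 2D plane group P4mm, $(x,0), (-x,0), (0,x), (0,-x)$, as listed in Figure 2. The 2D setting, the variable names and the helper `expand_wyckoff` are illustrative assumptions; a real pipeline would take the pairs from crystallographic tables.

```python
import numpy as np

def expand_wyckoff(f_base, ops):
    """Map a basic coordinate to its equivalent positions {R f + t}, wrapped to [0, 1).

    `ops` is a list of (R, t) pairs; the result has one column per equivalent atom.
    """
    return np.stack([(R @ f_base + t) % 1.0 for R, t in ops], axis=1)

zero = np.zeros(2)
# Position 4d of P4mm (2D toy case): (x, 0), (-x, 0), (0, x), (0, -x).
OPS_4D = [
    (np.array([[1.0, 0.0], [0.0, 0.0]]), zero),
    (np.array([[-1.0, 0.0], [0.0, 0.0]]), zero),
    (np.array([[0.0, 0.0], [1.0, 0.0]]), zero),
    (np.array([[0.0, 0.0], [-1.0, 0.0]]), zero),
]

f_base = np.array([0.2, 0.7])           # only the x component matters for 4d
orbit = expand_wyckoff(f_base, OPS_4D)  # shape (2, 4): one column per atom
multiplicity = len(OPS_4D)
dof = np.linalg.matrix_rank(OPS_4D[0][0])  # free parameters of this position
```

Here the multiplicity is the number of pairs, and the rank of the first matrix indicates that the 4d position has a single free parameter even though the basic coordinate is stored with two components.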
Since Wyckoff positions are inherently determined by the space group, we can realize the space group constraint by restricting the coordinates of the atoms that are located in the same set of Wyckoff positions during crystal generation, which will be presented in detail in the next subsection.

Wyckoff Positions of P4mm (No. 11):
1a: (0,0)
1b: (1/2,1/2)
2c: (1/2,0), (0,1/2)
4d: (x,0), (-x,0), (0,x), (0,-x)
4e: (x,1/2), (-x,1/2), (1/2,x), (1/2,-x)
4f: (x,x), (x,-x), (-x,x), (-x,-x)
8g: (x,y), (x,-y), (-x,y), (-x,-y), (y,x), (y,-x), (-y,x), (-y,-x)

Figure 2: Inspired by PyXtal (Fredericks et al., 2021), we utilize the 2D plane group P4mm as a toy example to demonstrate the concept of Wyckoff position. An asymmetric triangle is copied eight times to construct the square unit cell, and the general Wyckoff position (8g)$^1$ has a multiplicity of eight. The other Wyckoff positions are restricted to certain subspaces. For instance, position 4e is constrained by the red dashed lines.

4.3 DIFFUSION UNDER SPACE GROUP CONSTRAINTS

To tackle the crystal generation problem, we utilize diffusion models to jointly generate the lattice matrix $L$, the fractional coordinates $F$ and the atom types $A$ under the specific space group constraint. We detail the forward diffusion process and the backward generation process of the three key components as follows.

Diffusion on $L$. As mentioned in § 4.1, the lattice matrix $L$ can be uniquely represented by an O(3)-invariant coefficient vector $k$. Hence, we directly design the diffusion process on $k$, and the forward probability of time step $t$ is given by

$$q(k_t \mid k_0) = \mathcal{N}\big(k_t \mid \sqrt{\bar\alpha_t}\, k_0,\ (1 - \bar\alpha_t) I\big), \tag{3}$$

where $\bar\alpha_t = \prod_{s=1}^{t} (1 - \beta_s)$, and $\beta_t \in (0, 1)$ determines the variance of each diffusion step, controlled by the cosine scheduler proposed in Nichol & Dhariwal (2021).
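The representation $k$ and the forward process of Eq. (3) can be sketched in a few lines of NumPy. The sketch assumes the six bases $B_1, \dots, B_6$ of § 4.1, computes $S = \tfrac{1}{2}\log(L^\top L)$ through an eigendecomposition, and uses the cosine form $\bar\alpha_t = f(t)/f(0)$ with $f(t) = \cos^2\!\big(\tfrac{t/T + s}{1 + s}\cdot\tfrac{\pi}{2}\big)$ from Nichol & Dhariwal (2021) without their clipping of $\beta_t$. Function names and the hexagonal test cell are illustrative assumptions rather than the authors' code.

```python
import numpy as np

# The six symmetric bases B_1..B_6 of Section 4.1.
BASES = np.array([
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[1, 0, 0], [0, 1, 0], [0, 0, -2]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
], dtype=float)

def lattice_to_k(L):
    """L = Q exp(S) implies L^T L = exp(2S), so S = 0.5 * log(L^T L)."""
    w, V = np.linalg.eigh(L.T @ L)
    S = (V * (0.5 * np.log(w))) @ V.T
    # The bases are mutually orthogonal under the Frobenius inner product.
    return np.array([np.sum(S * B) / np.sum(B * B) for B in BASES])

def k_to_lattice(k):
    """Return exp(S) for S = sum_i k_i B_i, i.e. the lattice with Q = I."""
    w, V = np.linalg.eigh(np.tensordot(k, BASES, axes=1))
    return (V * np.exp(w)) @ V.T

def alpha_bar(t, T, s=0.008):
    """Cumulative signal level of the cosine schedule."""
    f = lambda u: np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0)

def noise_k(k0, t, T, rng):
    """Sample k_t from Eq. (3) and return it together with the noise used."""
    ab = alpha_bar(t, T)
    eps = rng.standard_normal(k0.shape)
    return np.sqrt(ab) * k0 + np.sqrt(1.0 - ab) * eps, eps

# Hexagonal toy cell (hypothetical lengths): a = b = 3, c = 5, gamma = 120 degrees.
a, c = 3.0, 5.0
L_hex = np.array([[a, a * np.cos(2 * np.pi / 3), 0.0],
                  [0.0, a * np.sin(2 * np.pi / 3), 0.0],
                  [0.0, 0.0, c]])
k_hex = lattice_to_k(L_hex)
k_t, eps = noise_k(k_hex, t=500, T=1000, rng=np.random.default_rng(0))
```

For the hexagonal cell, the recovered coefficients follow the corresponding row of Table 1: $k_1 = -\log(3)/4$ and $k_2 = k_3 = k_4 = 0$, while rotating the lattice leaves $k$ unchanged.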
Starting from the normal prior $k_T \sim \mathcal{N}(0, I)$, the corresponding generation process is designed as

$$p(k_{t-1} \mid M_t) = \mathcal{N}\Big(k_{t-1} \,\Big|\, \mu_k(M_t),\ \beta_t \frac{1 - \bar\alpha_{t-1}}{1 - \bar\alpha_t} I\Big), \tag{4}$$

where $\mu_k(M_t) = \frac{1}{\sqrt{\alpha_t}} \Big(k_t - \frac{\beta_t}{\sqrt{1 - \bar\alpha_t}}\, \hat\epsilon_k(M_t, t)\Big)$ with $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} (1 - \beta_s)$, and the term $\hat\epsilon_k(M_t, t)$ is predicted by the denoising model $\phi(M_t, t)$.

To confine the structure under a desired space group constraint, we only diffuse and generate the unconstrained dimensions of $k$ while preserving the constrained values of $k_0$ as outlined in Table 1. To optimize the denoising term $\hat\epsilon_k(M_t, t)$, we first sample $\epsilon_k \sim \mathcal{N}(0, I)$ and reparameterize $k_t$ as $k_t = m \odot (\sqrt{\bar\alpha_t}\, k_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon_k) + (1 - m) \odot k_0$, where the mask is given by $m \in \{0, 1\}^6$, $m_i = 1$ indicates that the $i$-th basis is unconstrained, and $\odot$ is the element-wise multiplication. The objective on $k$ is finally computed by

$$\mathcal{L}_k = \mathbb{E}_{\epsilon_k \sim \mathcal{N}(0, I),\, t \sim \mathcal{U}(1, T)} \big[\| m \odot \epsilon_k - \hat\epsilon_k(M_t, t) \|_2^2\big]. \tag{5}$$

Diffusion on $F$. In a unit cell, the fractional coordinates $F \in \mathbb{R}^{3 \times N}$ of $N$ atoms can be arranged as the Wyckoff positions of $N'$ basic fractional coordinates $F' \in \mathbb{R}^{3 \times N'}$. Hence we only focus on the generation of $F'$, and its forward process is conducted via the Wrapped Normal distribution following Jiao et al. (2023) to maintain periodic translation invariance:

$$q(F'_t \mid F'_0) = \mathcal{N}_w\big(F'_t \mid F'_0,\ \sigma_t^2 I\big). \tag{6}$$

Footnote 1: In practical implementation, Wyckoff positions are identified by a combination of a number and a letter, where the number is the multiplicity, and the letter distinguishes the Wyckoff position type in a dictionary order corresponding to the ascending order of the multiplicity.

The forward sampling can be implemented as $F'_t = w(F'_0 + \sigma_t \epsilon_{F'})$ with $\epsilon_{F'} \sim \mathcal{N}(0, I)$, where $w(\cdot)$ retains the fractional part of the input. For the backward process, we first acquire $F'_T$ from the uniform initialization, and sample $F'_0$ via the predictor-corrector sampler with the denoising term $\hat\epsilon_{F'}(M_t, t)$ output by the model $\phi(M_t, t)$. Note that the basic coordinates do not always have 3 degrees of freedom.
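The two forward samplers described so far, the masked reparameterization of $k_t$ and the wrapped sampling of $F'_t$, can be sketched as follows. The dictionary `FAMILY_MASK` is read off Table 1 (entry 1 marks a free coefficient, entry 0 a fixed one); the function names, the example values of $k_0$, $\bar\alpha_t$ and $\sigma_t$ are illustrative assumptions, and the projection onto Wyckoff subspaces is deliberately left out.

```python
import numpy as np

# Mask m per crystal family, in the order (k_1, ..., k_6), following Table 1.
FAMILY_MASK = {
    "triclinic":    np.array([1, 1, 1, 1, 1, 1], dtype=float),
    "monoclinic":   np.array([0, 1, 0, 1, 1, 1], dtype=float),
    "orthorhombic": np.array([0, 0, 0, 1, 1, 1], dtype=float),
    "tetragonal":   np.array([0, 0, 0, 0, 1, 1], dtype=float),
    "hexagonal":    np.array([0, 0, 0, 0, 1, 1], dtype=float),
    "cubic":        np.array([0, 0, 0, 0, 0, 1], dtype=float),
}

def noise_k_masked(k0, m, alpha_bar_t, rng):
    """k_t = m * (sqrt(ab) k0 + sqrt(1 - ab) eps) + (1 - m) * k0.

    Returns k_t and the masked noise m * eps, the regression target of Eq. (5).
    """
    eps = rng.standard_normal(k0.shape)
    noisy = np.sqrt(alpha_bar_t) * k0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return m * noisy + (1.0 - m) * k0, m * eps

def noise_frac_wrapped(F0, sigma_t, rng):
    """F_t = w(F_0 + sigma_t * eps), where w keeps the fractional part."""
    eps = rng.standard_normal(F0.shape)
    return (F0 + sigma_t * eps) % 1.0, eps

rng = np.random.default_rng(0)
m = FAMILY_MASK["hexagonal"]
k0 = np.array([-np.log(3) / 4, 0.0, 0.0, 0.0, 0.1, 1.2])  # hypothetical hexagonal lattice
k_t, target = noise_k_masked(k0, m, alpha_bar_t=0.5, rng=rng)

F0 = rng.random((3, 4))                                    # four basic coordinates
F_t, _ = noise_frac_wrapped(F0, sigma_t=0.3, rng=rng)
```

Because the constrained entries of $k_t$ are copied from $k_0$, the noised lattice stays inside the chosen crystal family at every time step, and the wrapped sample stays inside the unit cell.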
The transformation matrices $R_{s_i}$ could be singular, resulting in Wyckoff positions located in specific planes, axes, or even reduced to fixed points. Hence we project the noise term $\epsilon_F$ onto the constrained subspaces via the least square method as $\epsilon'_F[:, s] = R^\dagger_{s_0} \epsilon_F[:, s]$, where $R^\dagger_{s_0}$ is the pseudo-inverse of $R_{s_0}$. The training objective on $F'$ is

$$\mathcal{L}_{F'} = \mathbb{E}_{F'_t \sim q'(F'_t \mid F'_0),\, t \sim \mathcal{U}(1, T)} \big[\lambda_t \| \nabla_{F'_t} \log q'(F'_t \mid F'_0) - \hat\epsilon_{F'}(M_t, t) \|_2^2\big], \tag{7}$$

where $\lambda_t = \mathbb{E}^{-1}\big[\| \nabla \log \mathcal{N}_w(0, \sigma_t^2) \|_2^2\big]$ is the pre-computed weight, and $q'$ is the projected distribution of $q$ induced by $\epsilon'_F$. Further details are provided in the Appendix.

Diffusion on $A$. Since the atom types $A$ remain consistent within each Wyckoff position, we can also focus only on the basic atoms $A' \subseteq A$. Considering $A' \in \mathbb{R}^{h \times N'}$ as the continuous one-hot representation, we apply the standard DDPM-based method by specifying the forward process as

$$q(A'_t \mid A'_0) = \mathcal{N}\big(A'_t \mid \sqrt{\bar\alpha_t}\, A'_0,\ (1 - \bar\alpha_t) I\big). \tag{8}$$

The backward process is defined as

$$p(A'_{t-1} \mid M_t) = \mathcal{N}\Big(A'_{t-1} \,\Big|\, \mu_{A'}(M_t),\ \beta_t \frac{1 - \bar\alpha_{t-1}}{1 - \bar\alpha_t} I\Big), \tag{9}$$

where $\mu_{A'}(M_t)$ is similar to $\mu_k(M_t)$ in Eq. (4). The denoising term $\hat\epsilon_{A'}(M_t, t) \in \mathbb{R}^{h \times N'}$ is predicted by the model $\phi(M_t, t)$. The training objective is

$$\mathcal{L}_{A'} = \mathbb{E}_{\epsilon_{A'} \sim \mathcal{N}(0, I),\, t \sim \mathcal{U}(1, T)} \big[\| \epsilon_{A'} - \hat\epsilon_{A'}(M_t, t) \|_2^2\big]. \tag{10}$$

The entire objective for training the joint diffusion model of $M$ is combined as

$$\mathcal{L}_M = \lambda_k \mathcal{L}_k + \lambda_{F'} \mathcal{L}_{F'} + \lambda_{A'} \mathcal{L}_{A'}. \tag{11}$$

4.4 DENOISING MODEL

In this subsection, we introduce the specific design of the denoising model $\phi(M_t, t)$ to obtain the three denoising terms $\hat\epsilon_k, \hat\epsilon_{F'}, \hat\epsilon_{A'}$ under the space group constraint, with the detailed architecture illustrated in Figure 4 in Appendix B.1. We omit the subscript $t$ in this subsection for brevity.

We first fuse the atom embeddings $f_{\text{atom}}(A)$ and the sinusoidal time embedding $f_{\text{time}}(t)$ to acquire the input node features $H = \varphi_{\text{in}}(f_{\text{atom}}(A), f_{\text{time}}(t))$, where $\varphi_{\text{in}}$ is an MLP. The message passing from node $j$ to $i$ in the $l$-th layer is designed as Eq. (12-13),

$$m_{ij}^{(l)} = \varphi_m\big(h_i^{(l-1)}, h_j^{(l-1)}, k, \psi_{\text{FT}}(f_j - f_i)\big), \tag{12}$$

$$h_i^{(l)} = h_i^{(l-1)} + \varphi_h\Big(h_i^{(l-1)}, \sum_{j=1}^{N} m_{ij}^{(l)}\Big), \tag{13}$$

$$\hat\epsilon_{k,\text{unconstrained}} = \varphi_k\Big(\frac{1}{N} \sum_{i=1}^{N} h_i^{(L)}\Big), \tag{14}$$

$$\hat\epsilon_F[:, i],\ \hat\epsilon_A[:, i] = \varphi_F\big(h_i^{(L)}\big),\ \varphi_A\big(h_i^{(L)}\big), \tag{15}$$

$$\hat\epsilon_k = m \odot \hat\epsilon_{k,\text{unconstrained}}, \tag{16}$$

$$\hat\epsilon_{F'} = \mathrm{WyckoffMean}(\hat\epsilon'_F), \tag{17}$$

$$\hat\epsilon_{A'} = \mathrm{WyckoffMean}(\hat\epsilon_A), \tag{18}$$

where $\varphi_m$ and $\varphi_h$ are MLPs, and $\psi_{\text{FT}} : (-1, 1)^3 \to [-1, 1]^{3 \times K}$ is the Fourier transformation with $K$ bases to periodically embed the relative fractional coordinate $f_j - f_i$. Note that here we apply $k$ in Eq. (12) as the unique O(3)-invariant representation of $L$ instead of the inner product $L^\top L$ in DiffCSP (Jiao et al., 2023), and its reliability is validated in § 5.4. After $L$ layers of message passing (here $L$ denotes the number of layers), we get the invariant graph- and node-level denoising terms as Eq. (14-15), where $\varphi_k, \varphi_F, \varphi_A$ are MLPs.

To align with the constrained diffusion framework proposed in § 4.3, the denoising terms are required to maintain the space group constraints, which is not considered in the original DiffCSP. The constrained denoising terms are finally projected as Eq. (16-18), where $\hat\epsilon'_F[:, i] = R^\dagger_i \hat\epsilon_F[:, i]$ is the projected denoising term towards the subspace of the Wyckoff positions, and WyckoffMean computes the average over the atoms belonging to the same Wyckoff position.

Table 2: Results on crystal structure prediction task. MR stands for Match Rate.

| Method | Perov-5 MR (%) | Perov-5 RMSE | MP-20 MR (%) | MP-20 RMSE | MPTS-52 MR (%) | MPTS-52 RMSE |
|---|---|---|---|---|---|---|
| RS | 36.56 | 0.0886 | 11.49 | 0.2822 | 2.68 | 0.3444 |
| BO | 55.09 | 0.2037 | 12.68 | 0.2816 | 6.69 | 0.3444 |
| PSO | 21.88 | 0.0844 | 4.35 | 0.1670 | 1.09 | 0.2390 |
| P-cG-SchNet (Gebauer et al., 2022) | 48.22 | 0.4179 | 15.39 | 0.3762 | 3.67 | 0.4115 |
| CDVAE (Xie et al., 2021) | 45.31 | 0.1138 | 33.90 | 0.1045 | 5.34 | 0.2106 |
| DiffCSP (Jiao et al., 2023) | 52.02 | 0.0760 | 51.49 | 0.0631 | 12.19 | 0.1786 |
| DiffCSP++ (w/ CSPML) | 52.17 | 0.0841 | 70.58 | 0.0272 | 37.17 | 0.0676 |
| DiffCSP++ (w/ GT) | 98.44 | 0.0430 | 80.27 | 0.0295 | 46.29 | 0.0896 |

5 EXPERIMENTS

In this section, we evaluate our method over various tasks.
We demonstrate the capability and explore the potential upper limits on the crystal structure prediction task in § 5.2. Additionally, we present the performance achieved in the ab initio generation task in § 5.3. We further provide analysis in § 5.4.

Datasets. We evaluate our method on four datasets with different data distributions. Perov-5 (Castelli et al., 2012) encompasses 18,928 perovskite crystals with similar structures but distinct compositions. Each structure has precisely 5 atoms in a unit cell. Carbon-24 (Pickard, 2020) comprises 10,153 carbon crystals. All the crystals share only one element, carbon, while exhibiting diverse structures containing 6 to 24 atoms within a unit cell. MP-20 (Jain et al., 2013) contains 45,231 materials sourced from the Materials Project with diverse compositions and structures. These materials represent the majority of experimentally generated crystals, each consisting of no more than 20 atoms in a unit cell. MPTS-52 serves as a more challenging extension of MP-20, consisting of 40,476 structures with unit cells containing up to 52 atoms. For Perov-5, Carbon-24 and MP-20, we follow the 60-20-20 split used in previous works (Xie et al., 2021). For MPTS-52, we perform a chronological split, allocating 27,380/5,000/8,096 crystals for training/validation/testing.

Tasks. We focus on two major tasks attainable through our method. Crystal Structure Prediction (CSP) aims at predicting the structure of a crystal based on its composition. Ab Initio Generation requires generating crystals with valid compositions and stable structures. We conduct the CSP experiments on Perov-5, MP-20, and MPTS-52, as the structures of carbon crystals vary widely, and it is not reasonable to match the generated samples with one specific reference on Carbon-24. The comparison for the generation task is carried out using Perov-5, Carbon-24, and MP-20, aligning with previous works.
5.2 CRYSTAL STRUCTURE PREDICTION

To adapt our method to the CSP task, we keep the atom types unchanged during the training and generation stages. Moreover, the proposed DiffCSP++ requires the provision of the space group and the Wyckoff positions of all atoms during the generation process. To address this requirement, we employ two distinct approaches to obtain these essential conditions. For the first version of our method, we select the space group of the Ground-Truth (GT) data as input. However, it is worth noting that these conditions are typically unavailable in real-world scenarios. Instead, we also implement our method with CSPML (Kusaba et al., 2022), a metric learning technique designed to select templates for the prediction of new structures. Given a composition, we first identify the composition in the training set that exhibits the highest similarity. Subsequently, we employ the corresponding structure as a template and refine it using DiffCSP++ after proper element substitution. We provide more details in Appendix B.2.

For evaluation, we match the predicted sample with the ground truth structure. For each composition within the testing set, we generate one structure and the match is determined by the StructureMatcher class in pymatgen (Ong et al., 2013) with thresholds stol=0.5, angle_tol=10, ltol=0.3, in accordance with previous setups. The match rate represents the ratio of matched structures relative to the total number within the testing set, and the RMSE is averaged over the matched pairs and normalized by $\sqrt[3]{V/N}$, where $V$ is the volume of the lattice.

Table 3: Results on ab initio generation task. The results of baseline methods are from Xie et al. (2021); Jiao et al. (2023).

| Data | Method | Validity (%) Struc. | Validity (%) Comp. | COV-R (%) | COV-P (%) | $d_\rho$ | $d_E$ | $d_{\text{elem}}$ |
|---|---|---|---|---|---|---|---|---|
| Perov-5 | FTCP (Ren et al., 2021) | 0.24 | 54.24 | 0.00 | 0.00 | 10.27 | 156.0 | 0.6297 |
| Perov-5 | Cond-DFC-VAE (Court et al., 2020) | 73.60 | 82.95 | 73.92 | 10.13 | 2.268 | 4.111 | 0.8373 |
| Perov-5 | G-SchNet (Gebauer et al., 2019) | 99.92 | 98.79 | 0.18 | 0.23 | 1.625 | 4.746 | 0.0368 |
| Perov-5 | P-G-SchNet (Gebauer et al., 2019) | 79.63 | 99.13 | 0.37 | 0.25 | 0.2755 | 1.388 | 0.4552 |
| Perov-5 | CDVAE (Xie et al., 2021) | 100.0 | 98.59 | 99.45 | 98.46 | 0.1258 | 0.0264 | 0.0628 |
| Perov-5 | DiffCSP (Jiao et al., 2023) | 100.0 | 98.85 | 99.74 | 98.27 | 0.1110 | 0.0263 | 0.0128 |
| Perov-5 | DiffCSP++ | 100.0 | 98.77 | 99.60 | 98.80 | 0.0661 | 0.0405 | 0.0040 |
| Carbon-24 | FTCP (Ren et al., 2021) | 0.08 | - | 0.00 | 0.00 | 5.206 | 19.05 | - |
| Carbon-24 | G-SchNet (Gebauer et al., 2019) | 99.94 | - | 0.00 | 0.00 | 0.9427 | 1.320 | - |
| Carbon-24 | P-G-SchNet (Gebauer et al., 2019) | 48.39 | - | 0.00 | 0.00 | 1.533 | 134.7 | - |
| Carbon-24 | CDVAE (Xie et al., 2021) | 100.0 | - | 99.80 | 83.08 | 0.1407 | 0.2850 | - |
| Carbon-24 | DiffCSP (Jiao et al., 2023) | 100.0 | - | 99.90 | 97.27 | 0.0805 | 0.0820 | - |
| Carbon-24 | DiffCSP++ | 99.99 | - | 100.0 | 88.28 | 0.0307 | 0.0935 | - |
| MP-20 | FTCP (Ren et al., 2021) | 1.55 | 48.37 | 4.72 | 0.09 | 23.71 | 160.9 | 0.7363 |
| MP-20 | G-SchNet (Gebauer et al., 2019) | 99.65 | 75.96 | 38.33 | 99.57 | 3.034 | 42.09 | 0.6411 |
| MP-20 | P-G-SchNet (Gebauer et al., 2019) | 77.51 | 76.40 | 41.93 | 99.74 | 4.04 | 2.448 | 0.6234 |
| MP-20 | CDVAE (Xie et al., 2021) | 100.0 | 86.70 | 99.15 | 99.49 | 0.6875 | 0.2778 | 1.432 |
| MP-20 | DiffCSP (Jiao et al., 2023) | 100.0 | 83.25 | 99.71 | 99.76 | 0.3502 | 0.1247 | 0.3398 |
| MP-20 | DiffCSP++ | 99.94 | 85.12 | 99.73 | 99.59 | 0.2351 | 0.0574 | 0.3749 |

We compare our methods with two lines of baselines. The first line is the optimization-based methods (Cheng et al., 2022) including Random Search (RS), Bayesian Optimization (BO), and Particle Swarm Optimization (PSO). The second line considers three types of generative methods. P-cG-SchNet (Gebauer et al., 2022) is an autoregressive model taking the composition as the condition.
CDVAE (Xie et al., 2021) proposes a VAE framework that first predicts the invariant lattice parameters and then generates the atom types and coordinates via a score-based decoder. DiffCSP (Jiao et al., 2023) jointly generates the lattices and atom coordinates. None of these generative methods considers the space group constraints.

The results are shown in Table 2, where we report the performance of our method with templates mined by CSPML and with templates taken directly from GT. We have the following observations.

1. DiffCSP++, when equipped with GT conditions, outperforms all other methods by a large margin in match rate. This indicates that incorporating space group symmetries into the generation framework significantly enhances its ability to predict more precise structures.
2. When combined with CSPML templates, our method continues to surpass the baseline methods on MP-20 and MPTS-52 and remains competitive on Perov-5. Given that the ground truth (GT) space groups are not accessible in real-world CSP scenes, our method offers a practical solution for predicting structures with high space group symmetry.
3. Notably, there remains a gap between match rates under space group conditions derived from mined templates and those from GT conditions (70.58% vs. 80.27% on MP-20). This suggests that an improved template-finding algorithm could potentially enhance performance, which we leave for further studies.

5.3 AB INITIO GENERATION

For each dataset, we first sample 10,000 structures from the training set with replacement as templates, and conduct the ab initio generation on the extracted templates. We focus on three lines of metrics for evaluation.

Validity. We require both the structures and the compositions of the generated samples to be valid. The structural valid rate is the ratio of the samples with the minimal pairwise distance larger than 0.5 Å, while the compositional valid rate is the percentage of samples under valence equilibrium solved by SMACT (Davies et al., 2019).

Coverage.
The coverage recall (COV-R) and precision (COV-P) calculate the percentage of the crystals in the testing set and that in generated samples matched with each other within a fingerprint distance threshold.

Property statistics. We calculate three Wasserstein distances between the generated and testing structures, specifically focusing on density, formation energy, and the number of elements (Xie et al., 2021), denoted as $d_\rho$, $d_E$, and $d_{\text{elem}}$ respectively. To execute this evaluation, we apply the validity and coverage metrics to all 10,000 generated samples, and the property metrics are computed on a subset of 1,000 valid samples.

We compare our method with previous generative methods FTCP (Ren et al., 2021), Cond-DFC-VAE (Court et al., 2020), G-SchNet (Gebauer et al., 2019), P-G-SchNet, CDVAE (Xie et al., 2021) and DiffCSP (Jiao et al., 2023). Table 3 depicts the results. We find that our method yields comparable performance on validity and coverage metrics, while improving most property statistics (for example, it attains the lowest $d_\rho$ on all three datasets), indicating that the inclusion of space group constraints contributes to the model's ability to generate more realistic crystals, especially for complex structures like those in MP-20.

5.4 ANALYSIS

In this subsection, we discuss the influence of the key components in our proposed framework.

Table 4: Ablation studies on MP-20.

| Ablation | Method | MR (%) | RMSE |
|---|---|---|---|
| Invariant Lattice Representation | DiffCSP | 51.49 | 0.0631 |
| Invariant Lattice Representation | DiffCSP-k | 50.76 | 0.0608 |
| Pre-Average vs. Post-Average | DiffCSP++ (Pre) | 78.75 | 0.0355 |
| Pre-Average vs. Post-Average | DiffCSP++ (Post) | 80.27 | 0.0295 |

Invariant Lattice Representation. In this work, we substitute the coefficient vector $k$ for the inner product term $L^\top L$ in DiffCSP (Jiao et al., 2023) to serve as the O(3)-invariant representation of the lattice matrix.
To assess the impact of this modification, we change the diffusion process and representation in DiffCSP from $L$ to $k$, without imposing extra space group constraints. This variant is denoted as DiffCSP-k. As shown in Table 4, DiffCSP-k performs on par with the original DiffCSP (match rate 50.76% vs. 51.49%, RMSE 0.0608 vs. 0.0631), validating $k$ as a dependable invariant representation of the lattice matrix.

Pre-Average vs. Post-Average. In Eq. (17), we average the denoising outputs on $F$ to the base nodes for each Wyckoff position, and calculate the losses on the base nodes in Eq. (7). In practice, the loss function can be implemented in two forms, named pre-average and post-average, as detailed in Eq. (19) and Eq. (20), respectively.

$$\mathcal{L}_{F',\text{pre}} = \lambda_t \left\| \nabla_{F'_t} \log q'(F'_t \mid F'_0) - \mathrm{Mean}(\hat{\epsilon}_F) \right\|_2^2, \tag{19}$$

$$\mathcal{L}_{F',\text{post}} = \lambda_t\, \mathrm{Mean}\!\left( \left\| \nabla_{F_t} \log q(F_t \mid F_0) - \hat{\epsilon}_F \right\|_2^2 \right). \tag{20}$$

Figure 3: Generation under different space groups.

Intuitively, the pre-average loss enforces the average output of each Wyckoff position to match the label on the base node, while the post-average loss minimizes the L2-distance for each atom. Table 4 shows that the post-average model performs better (match rate 80.27% vs. 78.75%, RMSE 0.0295 vs. 0.0355), and we adopt it for all other experiments.

Towards structures with customized symmetries. Our method enables structure generation under given space group constraints, hence allowing the creation of diverse structures from the same composition but based on different space groups. To illustrate the versatility of our approach, we visualize some resulting structures in Figure 3, which shows the distinct structures generated under various space group constraints.

6 CONCLUSION

In this work, we propose DiffCSP++, a diffusion-based approach for crystal generation that effectively incorporates space group constraints.
We decompose the complex space group constraints into invariant lattice representations of different crystal families and symmetric atom types and coordinates according to Wyckoff positions, ensuring compatibility with the backbone model and the diffusion process. Extensive experiments verify the reliability of DiffCSP++ on crystal structure prediction and ab initio generation tasks. Notably, our method facilitates the generation of structures from specific space groups, opening up new opportunities for material design, particularly in applications where certain space groups or templates are known to exhibit desirable properties.

ACKNOWLEDGMENTS

This work is supported by the National Science and Technology Major Project under Grant 2020AAA0107300, the National Natural Science Foundation of China (No. 61925601, 62376276), Beijing Nova Program (20230484278), and Alibaba Damo Research Fund.

REFERENCES

Ivano E Castelli, David D Landis, Kristian S Thygesen, Søren Dahl, Ib Chorkendorff, Thomas F Jaramillo, and Karsten W Jacobsen. New cubic perovskites for one- and two-photon water splitting using the computational materials repository. Energy & Environmental Science, 5(10):9034–9043, 2012.

Lei Chen, Chandan Setty, Haoyu Hu, Maia G Vergniory, Sarah E Grefe, Lukas Fischer, Xinlin Yan, Gaku Eguchi, Andrey Prokofiev, Silke Paschen, et al. Topological semimetal driven by strong correlations and crystalline symmetry. Nature Physics, 18(11):1341–1346, 2022.

Guanjian Cheng, Xin-Gao Gong, and Wan-Jian Yin. Crystal structure prediction by combining graph network and optimization algorithm. Nature communications, 13(1):1–8, 2022.

Callum J Court, Batuhan Yildirim, Apoorv Jain, and Jacqueline M Cole. 3-d inorganic crystal structure generation and property prediction via representation learning. Journal of chemical information and modeling, 60(10):4518–4535, 2020.
Daniel W Davies, Keith T Butler, Adam J Jackson, Jonathan M Skelton, Kazuki Morita, and Aron Walsh. Smact: Semiconducting materials by analogy and chemical theory. Journal of Open Source Software, 4(38):1361, 2019.

Scott Fredericks, Kevin Parrish, Dean Sayre, and Qiang Zhu. Pyxtal: A python library for crystal structure generation and symmetry analysis. Computer Physics Communications, 261:107810, 2021. ISSN 0010-4655. doi: https://doi.org/10.1016/j.cpc.2020.107810. URL http://www.sciencedirect.com/science/article/pii/S0010465520304057.

Niklas Gebauer, Michael Gastegger, and Kristof Schütt. Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems 32, pp. 7566–7578. Curran Associates, Inc., 2019.

Niklas WA Gebauer, Michael Gastegger, Stefaan SP Hessmann, Klaus-Robert Müller, and Kristof T Schütt. Inverse design of 3d molecular structures with conditional generative neural networks. Nature communications, 13(1):1–11, 2022.

Brian C Hall. Lie groups, Lie algebras, and representations. Springer, 2013.

Howard Hiller. Crystallography and cohomology of groups. The American Mathematical Monthly, 93(10):765–779, 1986.

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.

Jordan Hoffmann, Louis Maestrati, Yoshihide Sawada, Jian Tang, Jean Michel Sellier, and Yoshua Bengio. Data-driven approach to encoding and decoding 3-d crystal structures. arXiv preprint arXiv:1909.00949, 2019.

Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Learning categorical distributions. Advances in Neural Information Processing Systems, 34:12454–12465, 2021.

Jianjun Hu, Wenhui Yang, and Edirisuriya M Dilanga Siriwardane. Distance matrix-based crystal structure prediction using evolutionary algorithms. The Journal of Physical Chemistry A, 124(51):10909–10919, 2020.

Jianjun Hu, Wenhui Yang, Rongzhi Dong, Yuxin Li, Xiang Li, Shaobo Li, and Edirisuriya MD Siriwardane. Contact map based crystal structure prediction using global optimization. CrystEngComm, 23(8):1765–1776, 2021.

John Ingraham, Max Baranov, Zak Costello, Vincent Frappier, Ahmed Ismail, Shan Tie, Wujie Wang, Vincent Xue, Fritz Obermeyer, Andrew Beam, et al. Illuminating protein space with a programmable generative model. bioRxiv, pp. 2022–12, 2022.

Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, et al. Commentary: The materials project: A materials genome approach to accelerating materials innovation. APL materials, 1(1):011002, 2013.

Rui Jiao, Wenbing Huang, Peijia Lin, Jiaqi Han, Pin Chen, Yutong Lu, and Yang Liu. Crystal structure prediction by joint equivariant diffusion on lattices and fractional coordinates. In Workshop on Machine Learning for Materials ICLR 2023, 2023. URL https://openreview.net/forum?id=VPByphdu24j.

Sungwon Kim, Juhwan Noh, Geun Ho Gu, Alan Aspuru-Guzik, and Yousung Jung. Generative adversarial networks for crystal structure prediction. ACS central science, 6(8):1412–1420, 2020.

Astrid Klipfel, Yaël Fregier, Adlane Sayede, and Zied Bouraoui. Vector field oriented diffusion model for crystal material generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp. 22193–22201, 2024.

Minoru Kusaba, Chang Liu, and Ryo Yoshida. Crystal structure prediction with machine learning-based element substitution. Computational Materials Science, 211:111496, 2022.

Ron Larson. Elementary linear algebra. Cengage Learning, 2016.
Chang Liu, Erina Fujita, Yukari Katsura, Yuki Inada, Asuka Ishikawa, Ryuji Tamura, Kaoru Kimura, and Ryo Yoshida. Machine learning to predict quasicrystals from chemical compositions. Advanced Materials, 33(36):2102507, 2021.

Ke Liu, Shangde Gao, Kaifan Yang, and Yuqiang Han. Pcvae: A physics-informed neural network for determining the symmetry and geometry of crystals. In 2023 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, 2023.

Yue Liu, Tianlu Zhao, Wangwei Ju, and Siqi Shi. Materials discovery and design using machine learning. Journal of Materiomics, 3(3):159–177, 2017.

Shitong Luo, Yufeng Su, Xingang Peng, Sheng Wang, Jian Peng, and Jianzhu Ma. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (eds.), Advances in Neural Information Processing Systems, 2022.

Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pp. 8162–8171. PMLR, 2021.

Juhwan Noh, Jaehoon Kim, Helge S Stein, Benjamin Sanchez-Lengeling, John M Gregoire, Alan Aspuru-Guzik, and Yousung Jung. Inverse design of solid-state materials via a continuous representation. Matter, 1(5):1370–1384, 2019.

Asma Nouira, Nataliya Sokolovska, and Jean-Claude Crivello. Crystalgan: learning to discover crystallographic structures with generative adversarial networks. arXiv preprint arXiv:1810.11203, 2018.

Artem R Oganov, Chris J Pickard, Qiang Zhu, and Richard J Needs. Structure prediction drives materials discovery. Nature Reviews Materials, 4(5):331–348, 2019.

Shyue Ping Ong, William Davidson Richards, Anubhav Jain, Geoffroy Hautier, Michael Kocher, Shreyas Cholia, Dan Gunter, Vincent L Chevrier, Kristin A Persson, and Gerbrand Ceder. Python materials genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science, 68:314–319, 2013.

Chris J. Pickard. Airss data for carbon at 10gpa and the c+n+h+o system at 1gpa, 2020.

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022.

Zekun Ren, Siyu Isaac Parker Tian, Juhwan Noh, Felipe Oviedo, Guangzong Xing, Jiali Li, Qiaohao Liang, Ruiming Zhu, Armin G. Aberle, Shijing Sun, Xiaonan Wang, Yi Liu, Qianxiao Li, Senthilnath Jayavelu, Kedar Hippalgaonkar, Yousung Jung, and Tonio Buonassisi. An invertible crystallographic representation for general inverse design of inorganic crystals with targeted properties. Matter, 2021. ISSN 2590-2385. doi: https://doi.org/10.1016/j.matt.2021.11.032.

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2021.

Feng Tang, Hoi Chun Po, Ashvin Vishwanath, and Xiangang Wan. Comprehensive search for topological materials using symmetry indicators. Nature, 566(7745):486–489, 2019.

Tian Xie and Jeffrey C. Grossman. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett., 120:145301, Apr 2018. doi: 10.1103/PhysRevLett.120.145301.

Tian Xie, Xiang Fu, Octavian-Eugen Ganea, Regina Barzilay, and Tommi S Jaakkola. Crystal diffusion variational autoencoder for periodic material generation. In International Conference on Learning Representations, 2021.

Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, and Jian Tang. Geodiff: A geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations, 2021.

Wenhui Yang, Edirisuriya M Dilanga Siriwardane, Rongzhi Dong, Yuxin Li, and Jianjun Hu. Crystal structure prediction of materials with high symmetry using differential evolution.
Journal of Physics: Condensed Matter, 33(45):455902, 2021.

Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Sasha Shysheya, Jonathan Crabbé, Lixin Sun, Jake Smith, et al. Mattergen: a generative model for inorganic materials design. arXiv preprint arXiv:2312.03687, 2023.

Yong Zhao, Edirisuriya M Dilanga Siriwardane, Zhenyao Wu, Nihang Fu, Mohammed Al-Fahdi, Ming Hu, and Jianjun Hu. Physics guided deep learning for generative design of crystal materials with symmetry constraints. npj Computational Materials, 9(1):38, 2023.

Nils ER Zimmermann and Anubhav Jain. Local structure order parameters and site fingerprints for quantification of coordination environment and crystal structure similarity. RSC advances, 10(10):6063–6081, 2020.

A THEORETICAL ANALYSIS

A.1 PROOF OF PROPOSITION 1

Proposition 1 is restated and proved as follows.

Proposition 1 (Polar Decomposition). An invertible matrix $L \in \mathbb{R}^{3\times 3}$ can be uniquely decomposed into $L = Q\exp(S)$, where $Q \in \mathbb{R}^{3\times 3}$ is an orthogonal matrix, $S \in \mathbb{R}^{3\times 3}$ is a symmetric matrix and $\exp(S) = \sum_{n=0}^{\infty} \frac{S^n}{n!}$ defines the exponential mapping of $S$.

Proof. Given an invertible matrix $L \in \mathbb{R}^{3\times 3}$, we first calculate the inner product term $J = L^\top L$. As $J$ is symmetric, we can formulate its eigendecomposition as $J = U\Lambda U^\top$, where $U \in \mathbb{R}^{3\times 3}$ is the orthogonal matrix whose columns are eigenvectors of $J$ and $\Lambda \in \mathbb{R}^{3\times 3}$ is a diagonal matrix with the eigenvalues of $J$ as diagonal elements. Since $L$ is invertible, $J$ is positive definite, so all eigenvalues are positive and $\log(\Lambda)$ is well defined. The required symmetric matrix is given by $S = \frac{1}{2}U\log(\Lambda)U^\top$. As $S$ is symmetric by construction, it remains to prove that $Q = L\exp(S)^{-1}$ is orthogonal, i.e. $Q^\top Q = I$. To see this, we have

$$
\begin{aligned}
Q^\top Q &= Q^\top L \exp(S)^{-1} \\
&= Q^\top L \exp\!\left(\tfrac{1}{2}U\log(\Lambda)U^\top\right)^{-1} \\
&= Q^\top L U \exp\!\left(-\tfrac{1}{2}\log(\Lambda)\right) U^\top \\
&= Q^\top L U \Lambda^{-\frac{1}{2}} U^\top \\
&= U \Lambda^{-\frac{1}{2}} U^\top L^\top L\, U \Lambda^{-\frac{1}{2}} U^\top \\
&= U \Lambda^{-\frac{1}{2}} U^\top U \Lambda U^\top U \Lambda^{-\frac{1}{2}} U^\top = I.
\end{aligned}
$$

From the above construction, we further have that the decomposition is unique, as $\exp(S)$ is positive definite and is therefore the unique positive definite square root of $J$.

A.2 PROOF OF PROPOSITION 2

We begin with the following definition.

Definition 1 (Frobenius Inner Product in Real Space).
Given $A, B \in \mathbb{R}^{3\times 3}$, the Frobenius inner product is defined as $\langle A, B\rangle_F = \mathrm{tr}(A^\top B)$, where $\mathrm{tr}(\cdot)$ denotes the trace of the matrix.

Proposition 2 is restated as follows.

Proposition 2. $\forall S \in \mathbb{R}^{3\times 3}$ with $S = S^\top$, $\exists k = (k_1, \cdots, k_6)$, s.t. $S = \sum_{i=1}^{6} k_i B_i$.

Proof. Based on the above definition, we can easily find that $\langle B_i, B_j\rangle_F = 0$, $\forall i, j = 1, \cdots, 6$ and $i \neq j$, meaning that the bases defined in § 4.1 are orthogonal, and the coefficients of the linear combination $S = \sum_{i=1}^{6} k_i B_i$ can be formed as

$$k_i = \frac{\langle S, B_i\rangle_F}{\langle B_i, B_i\rangle_F}. \tag{21}$$

As the 3D symmetric matrix $S \in \mathbb{R}^{3\times 3}$ has 6 degrees of freedom (Larson, 2016), it can be uniquely represented by the coefficient vector $k = (k_1, \cdots, k_6)$.

A.3 CONSTRAINTS FROM DIFFERENT CRYSTAL FAMILIES

In crystallography, the lattice matrix $L = [l_1, l_2, l_3]$ can also be represented by the lengths $a, b, c$ and angles $\alpha, \beta, \gamma$ of the parallelepiped. Specifically, we have

$$
\begin{aligned}
a &= \|l_1\|_2, \quad b = \|l_2\|_2, \quad c = \|l_3\|_2, \\
\alpha &= \arccos\frac{\langle l_2, l_3\rangle}{b\,c}, \quad
\beta = \arccos\frac{\langle l_1, l_3\rangle}{a\,c}, \quad
\gamma = \arccos\frac{\langle l_1, l_2\rangle}{a\,b}.
\end{aligned} \tag{22}
$$

Based on such notation, the inner product matrix $J$ can be further formulated as

$$
J = L^\top L =
\begin{bmatrix}
a^2 & ab\cos\gamma & ac\cos\beta \\
ab\cos\gamma & b^2 & bc\cos\alpha \\
ac\cos\beta & bc\cos\alpha & c^2
\end{bmatrix}
= U\Lambda U^\top. \tag{23}
$$

Moreover, according to Appendix A.2, we can formulate the corresponding matrix in the logarithmic space as

$$
S = \sum_{i=1}^{6} k_i B_i = k_6 I +
\begin{bmatrix}
k_4 + k_5 & k_1 & k_2 \\
k_1 & k_5 - k_4 & k_3 \\
k_2 & k_3 & -2k_5
\end{bmatrix}
= \frac{1}{2}U\log(\Lambda)U^\top. \tag{24}
$$

We specify the cases of the 6 crystal families separately as follows. Note that, unlike previous works that directly apply constraints on lattice parameters (Liu et al., 2023), we focus on the constraints in the logarithmic space, which play a significant role in designing the diffusion process.

Triclinic. As discussed in Appendix A.2, a triclinic lattice formed by an arbitrary invertible lattice matrix can be represented as the linear combination of the 6 bases without any constraints.

Monoclinic.
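Monoclinic lattices require $\alpha = \gamma = 90^\circ$, where $J$ can be simplified as

$$
J_M =
\begin{bmatrix}
a^2 & 0 & ac\cos\beta \\
0 & b^2 & 0 \\
ac\cos\beta & 0 & c^2
\end{bmatrix}. \tag{25}
$$

Obviously, $J_M$ has an eigenvector $e_2 = (0, 1, 0)^\top$ as $J_M e_2 = b^2 e_2$. As $J$ and $S$ have the same eigenvectors, we have $S_M e_2 = (k_1, k_5 + k_6 - k_4, k_3)^\top = \lambda_M e_2$ for some $\lambda_M$. Hence we directly have $\lambda_M = k_5 + k_6 - k_4$ and $k_1 = k_3 = 0$.

Orthorhombic. Orthorhombic lattices require $\alpha = \beta = \gamma = 90^\circ$, where we have

$$
J_O =
\begin{bmatrix}
a^2 & 0 & 0 \\
0 & b^2 & 0 \\
0 & 0 & c^2
\end{bmatrix}. \tag{26}
$$

As $S_O = \frac{1}{2}\log(J_O) = \mathrm{diag}(\log(a), \log(b), \log(c))$, we can directly obtain the solution of $k$ as

$$
k_1 = k_2 = k_3 = 0, \quad k_4 = \log(a/b)/2, \quad k_5 = \log(ab/c^2)/6, \quad k_6 = \log(abc)/3. \tag{27}
$$

Tetragonal. Tetragonal lattices have higher symmetry than orthorhombic lattices with $a = b$. We have $k_4 = 0$ by substituting $a = b$ into Eq. (27).

Hexagonal. Hexagonal lattices are constrained by $\alpha = \beta = 90^\circ$, $\gamma = 120^\circ$, $a = b$, which formulate $J$ as

$$
J_H =
\begin{bmatrix}
a^2 & -\frac{1}{2}a^2 & 0 \\
-\frac{1}{2}a^2 & a^2 & 0 \\
0 & 0 & c^2
\end{bmatrix}
= U_H
\begin{bmatrix}
\frac{3}{2}a^2 & 0 & 0 \\
0 & \frac{1}{2}a^2 & 0 \\
0 & 0 & c^2
\end{bmatrix}
U_H^\top,
\qquad
U_H =
\begin{bmatrix}
\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & 0 \\
-\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & 0 \\
0 & 0 & 1
\end{bmatrix}.
$$

And for $S$, we have

$$
S_H = U_H
\begin{bmatrix}
\log(a) + \frac{1}{2}\log(\frac{3}{2}) & 0 & 0 \\
0 & \log(a) + \frac{1}{2}\log(\frac{1}{2}) & 0 \\
0 & 0 & \log(c)
\end{bmatrix}
U_H^\top
=
\begin{bmatrix}
\log(a) + \frac{1}{4}\log(\frac{3}{4}) & -\frac{1}{4}\log(3) & 0 \\
-\frac{1}{4}\log(3) & \log(a) + \frac{1}{4}\log(\frac{3}{4}) & 0 \\
0 & 0 & \log(c)
\end{bmatrix}. \tag{30}
$$

Combining Eq. (24) and Eq. (30), we have the solution as

$$
k_2 = k_3 = k_4 = 0, \quad k_1 = -\log(3)/4, \quad k_5 = \log\!\left(\frac{\sqrt{3}a^2}{2c^2}\right)/6, \quad k_6 = \log\!\left(\frac{\sqrt{3}}{2}a^2 c\right)/3.
$$

Cubic. Cubic lattices extend tetragonal lattices to $a = b = c$, changing the solution in Eq. (27) into $k_1 = k_2 = k_3 = k_4 = k_5 = 0$, $k_6 = \log(a)$.

B IMPLEMENTATION DETAILS

B.1 ARCHITECTURE OF THE DENOISING BACKBONE

We illustrate the architecture of the model described in § 4.4 in Figure 4. The Fourier coordinate embedding $\psi_{\mathrm{FT}}$ is defined as

$$
\psi_{\mathrm{FT}}(f)[c, k] =
\begin{cases}
\sin(2\pi m f_c), & k = 2m, \\
\cos(2\pi m f_c), & k = 2m + 1.
\end{cases} \tag{32}
$$

B.2 COMBINATION WITH SUBSTITUTION-BASED ALGORITHMS

Crystal structure prediction (CSP) requires predicting the crystal structure from the given composition. To apply our method to the CSP task, we must first select an appropriate space group and assign each atom a Wyckoff position.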
We achieve this goal via a substitution-based method, CSPML (Kusaba et al., 2022), which first retrieves a template structure from the training set according to the query composition, and then substitutes elements in the template with those of the query. We depict the prediction pipeline in Figure 5, including the following steps.

Template Retrieval. Given a composition as a query, CSPML initially identifies all structures within the training set that share the same compositional ratio (for instance, 1:1:3 for CaTiO3). The retrieved candidates are then ranked using a model based on metric learning. This model is trained on pairwise data derived from the training set. For structures $M_i, M_j$, we obtain compositional fingerprints $\mathrm{Fp}(c, i), \mathrm{Fp}(c, j)$ via XenonPy (Liu et al., 2021) and structural fingerprints $\mathrm{Fp}(s, i), \mathrm{Fp}(s, j)$ via CrystalNN (Zimmermann & Jain, 2020). The model $\phi$ is trained via the binary classification loss

$$\mathcal{L}_{\mathrm{CSPML}} = \mathrm{BCE}\!\left(\phi\big(|\mathrm{Fp}(c, i) - \mathrm{Fp}(c, j)|\big),\ \mathbb{1}_{\|\mathrm{Fp}(s, i) - \mathrm{Fp}(s, j)\| < \delta}\right).$$

Here, $\delta$ is a threshold used to determine if the structures of $M_i$ and $M_j$ are similar. We adopt $\delta = 0.3$ in line with the setting of Kusaba et al. (2022). The ranking score is defined as $\phi(|\mathrm{Fp}(c, q) - \mathrm{Fp}(c, k)|)$ for a query composition $A_q$ and a candidate composition $A_k$, implying the probability of similarity.

Figure 4: Architecture of the denoising model (Embedding Module, Message-Passing Module, Output Module, Projection Module).

Element Substitution. The second step is to assign the atoms in the query composition to the template sites with the corresponding element ratio (e.g., 1/5 for Ca in CaTiO3). For elements with the same ratio (e.g., Ca and Ti), we solve an optimal transport problem with the L2-distance between the element descriptors as the cost.

Refinement.
Finally, we refine the structure via DiffCSP++ by adding noise up to timestep $t$ and then applying the generation process under the constraints provided by the template. In practice, we select $t = 50$ for MPTS-52 and $t = 100$ for Perov-5 and MP-20.

Figure 5: Pipeline of DiffCSP++ combined with CSPML (Template Retrieval, Element Substitution).

Table 5: CSP results of the CSPML templates. MR stands for Match Rate.

| Method | Perov-5 MR (%) | Perov-5 RMSE | MP-20 MR (%) | MP-20 RMSE | MPTS-52 MR (%) | MPTS-52 RMSE |
|---|---|---|---|---|---|---|
| CSPML (Kusaba et al., 2022) | 51.84 | 0.1066 | 70.51 | 0.0338 | 36.98 | 0.0664 |
| DiffCSP++ (w/ CSPML) | 52.17 | 0.0841 | 70.58 | 0.0272 | 37.17 | 0.0676 |

More Results. We further provide the performance of the CSPML templates in Table 5. Starting from the CSPML templates, DiffCSP++ raises the match rate on all three datasets and lowers the RMSE on Perov-5 and MP-20, with a marginally higher RMSE on MPTS-52 (0.0676 vs. 0.0664). This underscores the model's proficiency in refining structures. Note that the refinement step is independent of the template-finding method, and more powerful ranking models or substitution algorithms may further enhance the CSP performance.

Figure 6: Different structures generated upon the same space group constraints.

B.3 HYPER-PARAMETERS AND TRAINING DETAILS

We follow the same data split as proposed in CDVAE (Xie et al., 2021) and DiffCSP (Jiao et al., 2023). For the implementation of the CSPML ranking models, we construct 100,000 positive and 100,000 negative pairs from the training set for each dataset to train a 3-layer MLP for 100 epochs with a learning rate of $1 \times 10^{-3}$. For DiffCSP++, we train a denoising model with 6 layers, 512 hidden states, and 128 Fourier embeddings for each task, and the training epochs are set to 3500, 4000, 1000, and 1000 for Perov-5, Carbon-24, MP-20, and MPTS-52, respectively. The number of diffusion steps is set to $T = 1000$. We utilize the cosine scheduler with $s = 0.008$ to control the variance of the DDPM process on $k$ and $A$, and an exponential scheduler with $\sigma_1 = 0.005$, $\sigma_T = 0.5$ to
control the noise scale on $F'$. The loss coefficients are set as $\lambda_k = \lambda_{F'} = 1$, $\lambda_{A'} = 20$. We apply $\gamma = 2 \times 10^{-5}$ for Carbon-24, $1 \times 10^{-5}$ for MPTS-52 and $5 \times 10^{-6}$ for the other datasets for the corrector steps during generation.

For sampling from $q$ in Eq. (7), we first sample $\epsilon_F \sim \mathcal{N}(0, \sigma_t^2 I)$, select $R_{s_0}$ for each Wyckoff position to acquire $\epsilon'_F[:, s] = R_{s_0}\epsilon_F[:, s]$, and finally obtain $F'_t$ as $F'_t = w(F'_0 + \epsilon'_F)$, where the operation $w(\cdot)$ preserves the fractional part of the input coordinates.

To expand the atom types and coordinates of $N'$ Wyckoff positions to $N$ atoms, we first ensure that all atoms in one Wyckoff position have the same type, i.e. $a_{s_i} = a'_s$, and then determine the fractional coordinate of each atom via the basic fractional coordinate $f'_s$ and the corresponding transformation pair $(R_{s_i}, t_{s_i})$, meaning $f_{s_i} = R_{s_i} f'_s + t_{s_i}$.

C MORE VISUALIZATIONS

We provide visualizations in § 5.4 from a CSP perspective to demonstrate the proficiency of our method in generating structures of identical composition but within varying space groups. Turning to the ab initio generation task, we pursue the inverse objective, that is, to generate diverse structures originating from the same space group as determined by the template structure. This is further illustrated in Figure 6. Our code is available at https://github.com/jiaor17/DiffCSP-PP.
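The sampling and expansion steps described in Appendix B.3 can be illustrated with the following minimal sketch. It is not taken from the released code: the function names, the row-wise array layout (one Wyckoff position per row, whereas the text indexes columns as $F[:, s]$), and the toy operations in the usage lines are assumptions made for illustration. The expanded coordinates are additionally wrapped into $[0, 1)$, which the text does not state explicitly.

```python
import numpy as np


def wrap(x):
    """w(.): keep only the fractional part, mapping coordinates into [0, 1)."""
    return x - np.floor(x)


def sample_noisy_base(F0_base, R0, sigma_t, rng):
    """Sample F'_t = w(F'_0 + eps'), with eps'[s] = R_{s0} @ eps[s].

    F0_base: (N', 3) fractional coordinates of the base nodes.
    R0:      (N', 3, 3) matrix R_{s0} of each Wyckoff position.
    """
    eps = rng.normal(0.0, sigma_t, size=F0_base.shape)  # eps ~ N(0, sigma_t^2 I)
    eps_proj = np.einsum("sij,sj->si", R0, eps)         # map noise through R_{s0}
    return wrap(F0_base + eps_proj)


def expand_wyckoff(types_base, F_base, ops):
    """Expand N' Wyckoff positions to N atoms via f_si = R_si f'_s + t_si.

    ops[s] is a list of (R, t) pairs, one per atom in Wyckoff position s.
    """
    types, coords = [], []
    for a, f, pairs in zip(types_base, F_base, ops):
        for R, t in pairs:
            types.append(a)                 # a_si = a'_s
            coords.append(wrap(R @ f + t))  # wrapping is an added assumption
    return np.array(types), np.array(coords)


# Toy usage with hypothetical Wyckoff data (not from any dataset).
rng = np.random.default_rng(0)
I3 = np.eye(3)
Px = np.diag([1.0, 0.0, 0.0])              # only x is free for a site (x, 0, 0)
F0 = np.array([[0.10, 0.20, 0.30],         # general position (x, y, z)
               [0.25, 0.00, 0.00]])        # special position (x, 0, 0)
Ft = sample_noisy_base(F0, np.stack([I3, Px]), sigma_t=0.05, rng=rng)

ops = [[(I3, np.zeros(3)), (-I3, np.zeros(3))],  # (x, y, z) and (-x, -y, -z)
       [(Px, np.zeros(3))]]                      # single atom at (x, 0, 0)
types, coords = expand_wyckoff([8, 22], F0, ops)
```

In this toy case the noise on the special position is confined to its free coordinate, so the noisy base node stays on the line $(x, 0, 0)$, and the general position expands to two atoms of the same type related by inversion.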