# Wyckoff Transformer: Generation of Symmetric Crystals

Nikita Kazeev 1, Wei Nong 2, Ignat Romanov 3, Ruiming Zhu 2, Andrey Ustyuzhanin 4 5 1, Shuya Yamazaki 2, Kedar Hippalgaonkar 1 2 6

The symmetry of a crystal plays a fundamental role in determining its physical, chemical, and electronic properties, such as electrical and thermal conductivity, optical and polarization behavior, and mechanical strength. Almost all known crystalline materials have internal symmetry. However, this is often inadequately addressed by existing generative models, making the consistent generation of stable and symmetrically valid crystal structures a significant challenge. We introduce WyFormer, a generative model that directly tackles this by formally conditioning on space group symmetry. It achieves this by using Wyckoff positions as the basis for an elegant, compressed, and discrete structure representation. To model the distribution, we develop a permutation-invariant autoregressive model based on the Transformer encoder without positional encoding. Extensive experimentation demonstrates WyFormer's compelling combination of attributes: it achieves best-in-class symmetry-conditioned generation, incorporates a physics-motivated inductive bias, produces structures with competitive stability, predicts material properties with competitive accuracy even without atomic coordinates, and exhibits unparalleled inference speed.
https://github.com/SymmetryAdvantage/WyckoffTransformer

1 Institute for Functional Intelligent Materials, National University of Singapore, Block S9, Level 9, 4 Science Drive 2, Singapore 117544. 2 School of Materials Science and Engineering, Nanyang Technological University, Singapore 639798. 3 HSE University, Myasnitskaya Ulitsa, 20, Moscow, Russia, 101000. 4 Constructor University, Bremen, Campus Ring 1, 28759, Germany. 5 Constructor Knowledge Labs, Bremen, Campus Ring 1, 28759, Germany. 6 Institute of Materials Research and Engineering, Agency for Science Technology and Research, 2 Fusionopolis Way, Singapore, 138634. Correspondence to: Nikita Kazeev, Kedar Hippalgaonkar.

Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

1. Introduction

Discovery of materials with desirable properties is the cornerstone of civilization: from the stone age to the bronze age and now in the silicon age, the ability to wield materials with different properties and functions has transformed society (Pyzer-Knapp et al., 2022). However, for the most part, the search for new materials and new functionalities has proceeded through the traditional route of trial and error, also called the Edisonian approach (Wang et al., 2024).

The space of all possible combinations of atoms forming periodic structures is intractably large; Cao et al. (2024) gauge it at $10^{160}$. It is not possible to fully screen this space or even to enumerate it. Materials that exist under realistic conditions, however, occupy a small part of this set of possibilities (Curtarolo et al., 2013). This part consists of the energetically favored combinations of atoms that are held together through covalent, ionic, metallic, and other chemical bonding. A generative model that outputs novel, a priori stable materials will speed up automated material design by orders of magnitude.

1.1.
Space groups and Wyckoff positions

A crystal structure can be represented by lattice vectors and an atomic basis. The lattice provides a periodic geometric framework in three-dimensional space, defined by the lattice matrix $L = [l_1, l_2, l_3] \in \mathbb{R}^{3 \times 3}$, with a basis of an atom (or group of atoms) occupying each lattice point. The atomic positions in real space are hence given by $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{3 \times N}$, where $N$ is the number of atoms in the unit cell. These positions can also be expressed in fractional coordinates as $F = [f_1, f_2, \ldots, f_N] \in [0, 1)^{3 \times N}$, related to real-space coordinates by $F = L^{-1} X$, ensuring atomic positions remain consistent within the periodic lattice.

The periodic arrangement can be further constrained by the space group $G$, a finite set of symmetry operations $g \in G$ defined as $g \cdot X = RX + t$, where $R \in O(3)$ is a $3 \times 3$ orthogonal transformation matrix representing rotations, reflections, and combinations thereof, and $t \in \mathbb{R}^3$ is a $3 \times 1$ translation vector. These symmetry operations collectively form the 230 distinct space groups, which comprehensively classify all possible crystal symmetries in three dimensions (Fedorow, 1892; Hahn et al., 1983).

Figure 1. A toy 2D crystal (Goodall et al., 2020). It contains four mirror lines and one rotation center. There are four Wyckoff positions, illustrated by shading. Magenta is the Wyckoff position that is invariant under all the transformations; it contains only a single point. Red and yellow lie on the mirror lines, and teal is invariant only under the identity transformation and occupies the rest of the space. Markers of the corresponding colors show one of the possible locations of an atom belonging to the corresponding Wyckoff position.

Each space group defines the allowable positions for atoms within the unit cell. Every periodic crystal possesses at least the simplest level of symmetry, P1, which consists only of translational symmetry.
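The relations $F = L^{-1} X$ and $g \cdot X = RX + t$ can be sketched numerically. The following is a minimal illustration only: the lattice, the two atomic positions, and the helper names `to_fractional` and `apply_operation` are hypothetical and not taken from the paper, and the symmetry operation is applied in fractional coordinates.

```python
import numpy as np

# Hypothetical tetragonal lattice: columns are the lattice vectors l1, l2, l3.
L = np.array([[4.0, 0.0, 0.0],
              [0.0, 4.0, 0.0],
              [0.0, 0.0, 6.0]])

# Hypothetical Cartesian positions X, one column per atom (3 x N).
X = np.array([[0.0, 2.0],
              [0.0, 2.0],
              [0.0, 3.0]])


def to_fractional(lattice, cartesian):
    """Compute F = L^{-1} X and wrap the result into [0, 1)."""
    return np.linalg.solve(lattice, cartesian) % 1.0


def apply_operation(R, t, fractional):
    """Apply g F = R F + t to every column and wrap back into [0, 1)."""
    return (R @ fractional + t[:, None]) % 1.0


F = to_fractional(L, X)

# Inversion through the origin: R = -I, t = 0.
F_inverted = apply_operation(-np.eye(3), np.zeros(3), F)

# Both atoms sit on inversion centers, so the structure maps onto itself.
print(np.allclose(F_inverted, F))
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a common numerical choice; it does not change the result.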
Most known crystals have additional internal symmetry; see Figure 2. This is not merely a mathematical observation: optical, electrical, magnetic, structural, and other properties are determined by symmetry, as shown by Malgrange et al. (2014); Yang et al. (2005), as well as by our results in Section 3.2.

Within a given space group $G$, a subgroup forms the site symmetry, referring to the set of symmetry operations $G_i = \{g \in G \mid g \cdot f_i = f_i\} \subseteq G$ that leave a specific point in the crystal invariant. These operations describe the local symmetrical environment, such as mirrors, screw axes, or inversions centered on a given region. Atoms located at the representative fractional coordinates $f_i$ generate equivalent positions $\{R_s f_i + t_s\}_{s=1}^{n_s}$, where $n_s$ is the multiplicity of the symmetry-equivalent position. Sites with higher site symmetry lie in regions where multiple symmetry elements intersect, while those with lower site symmetry include only one symmetry operation. Taking space group 225, Fm-3m, as an example, F represents a face-centered lattice, and the site symmetry subgroup m-3m represents a highly symmetric environment at the center of a cubic unit cell, where multiple symmetry elements intersect, including mirror planes and a 3-fold rotoinversion axis (Hahn et al., 1983). In contrast, the lower site symmetry subgroup .3m corresponds to a less symmetric environment with only a 3-fold rotation axis and a mirror plane.

These site symmetry points, classified by their symmetry properties, are grouped into Wyckoff positions (WPs) (Wyckoff, 1922). Mathematically, a WP encompasses all points whose site symmetry groups are conjugate subgroups of the full space group (Kantorovich, 2004). An illustration of WPs is presented in Figure 1. Two different WPs in the same space group can share the same site symmetry. This is called symmetry equivalence and occurs when the Wyckoff positions can be mapped onto each other using higher-order symmetry operations.
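The link between site symmetry and multiplicity can be checked on a toy case similar to Figure 1. The sketch below is a simplification under stated assumptions: it uses the 2D point group 4mm (four mirror lines and one rotation center) without translations, and the function names are hypothetical. For each point it computes the site symmetry group (the operations that fix the point) and the orbit (the set of equivalent positions).

```python
import numpy as np

C4 = np.array([[0, -1], [1, 0]])     # rotation by 90 degrees
MIRROR = np.array([[1, 0], [0, -1]])  # mirror across the x axis


def point_group_4mm():
    """Return the 8 operations of 4mm: four rotations, each with and without the mirror."""
    ops = []
    rot = np.eye(2, dtype=int)
    for _ in range(4):
        ops.append(rot)
        ops.append(rot @ MIRROR)
        rot = C4 @ rot
    return ops


def site_symmetry(ops, point):
    """Operations g with g(point) == point."""
    return [g for g in ops if np.array_equal(g @ point, point)]


def orbit(ops, point):
    """Set of all images of the point under the group."""
    return {tuple((g @ point).tolist()) for g in ops}


ops = point_group_4mm()
for p in [(0, 0), (1, 0), (1, 1), (2, 1)]:
    pt = np.array(p)
    # The product (site symmetry order) x (multiplicity) equals the group order, 8.
    print(p, len(site_symmetry(ops, pt)), len(orbit(ops, pt)))
```

The four test points fall into four classes (rotation center, two kinds of mirror line, general position), matching the four Wyckoff positions described in the Figure 1 caption.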
Continuing with the Fm-3m space group example, Wyckoff positions 4a (0,0,0) and 4b (½,½,½) appear distinct under conventional symmetry, but a lattice center translation reveals their higher-order symmetry equivalence (see Figure 3). Such a transformation is a coset representative of the affine normalizer, which introduces symmetry operations beyond the space group's own symmetry operations, $G$. The Euclidean normalizer is defined as the largest symmetry group preserving $G$ while allowing additional transformations, such as centering translations or scaling, that map Wyckoff positions onto each other in a higher-symmetry framework. It forms the basis for the enumeration and augmentation in the next sections. We further explore this idea in (Yamazaki et al., 2025).

WPs for a given space group are enumerated by Latin letters, typically in order of decreasing site symmetry. Each WP has a defined multiplicity, which represents the number of equivalent atomic positions in the unit cell related by the symmetry operations of that space group. For example, WP 2a has the highest site symmetry and multiplicity 2. The number of distinct WPs in a space group is finite, ranging from a single WP in the simplest symmetry group P1 to as many as 27 in the most complex space groups. Wyckoff positions can represent 1D lines, 2D planes, or open 3D regions within the unit cell.

These fundamental concepts (lattice, atomic basis, space groups, site symmetry, and Wyckoff positions) define a framework to unequivocally describe crystal structures, which is the foundation of our representation. See also Appendix A for an illustration.

1.2. Our contribution

1. Representing a crystal as an unordered set of tokens fused from the chemical elements and Wyckoff positions; Section 2.1.
2. Encoding Wyckoff positions using their universally defined symmetry point groups and symmetry-operation descriptors based on spherical harmonics; Section 2.1.
3.
Wyckoff Transformer architecture and training protocol that combine autoregressive probability factorization with permutation invariance; Section 2.3.
4. Model invariance with respect to the arbitrary choice of the coset representative of the space group Euclidean normalizer; Sections 2.1, 2.3.
5. Empirically, our model outperforms baseline methods in generating novel, symmetric, diverse materials conditioned on space group symmetry; Section 3.1.
6. Despite not using the information about atom coordinates, our model achieves property prediction performance competitive with the machine learning models that use the full structure; Section 3.2.

Figure 2. Distribution of space groups in the MP-20 dataset (Xie et al., 2021) and generated samples (panels: DiffCSP, WyCryst, DiffCSP++, CrystalFormer, WyForDiffCSP++). The 10 space groups most frequent in MP-20 are labeled; 98% of MP-20 structures belong to symmetry groups other than P1. Plot design by Levy et al. (2024). The comparison of the distribution of generated samples' space groups to the ground truth distribution is presented in Table 1, column Space Group χ2.

Figure 3. Two equivalent Wyckoff representations of SrTiO3 mp-4651, depending on the lattice center choice: [Ti, (m-3m, 0)], [Sr, (m-3m, 1)], [O, (4/mm.m, 1)] and [Ti, (m-3m, 1)], [Sr, (m-3m, 0)], [O, (4/mm.m, 0)]. Panel labels, first choice: Sr (1a=[m-3m, 0]) at (0, 0, 0), Ti (1b=[m-3m, 1]) at (1/2, 1/2, 1/2), O (3d=[4/mm.m, 1]) at (1/2, 0, 0). Second choice: Sr (1b=[m-3m, 1]) at (1/2, 1/2, 1/2), Ti (1a=[m-3m, 0]) at (0, 0, 0), O (3c=[4/mm.m, 0]) at (0, 1/2, 1/2).

1.3. Related work

Crystal generation is a burgeoning field, with most state-of-the-art models using a differentiable non-invertible SO(3) invariant representation constructed from atom coordinates, such as a graph neural network.
Then they use diffusion or flow matching to solve the generation problem (Jiao et al., 2024a;b; Cao et al., 2024; Yang et al., 2023; Zeni et al., 2025; Xie et al., 2021; Klipfel et al., 2023; Luo et al., 2024; Sinha et al., 2024). Our approach uses a discrete Wyckoff space and fast autoregressive sampling, in contrast to the gradual refinement in the aforementioned works. WyFormer complements them naturally by providing symmetry constraints and/or an initial structure approximation; we thoroughly evaluate the synergy with the most suitable partner, DiffCSP++.

Wyckoff positions and machine learning. The concept of Wyckoff positions was originally published more than 100 years ago (Wyckoff, 1922). Given the elegance of the representation, WPs have naturally found their way into modern machine learning. The main limiting factor in their adoption was the ability of machine learning algorithms to handle the discrete structured data formed by WPs. WP-based representations were used for property prediction (Goodall et al., 2020; Jain & Bligaard, 2018; Möller et al., 2018; Goodall et al., 2022), and recently in generative models. Our work is inspired by Zhu et al. (2024), the first such model. It uses a VAE over one-hot-encoded information about WPs, as opposed to our Transformer encoder, which is a generally superior architecture for categorical data. AI4Science et al. (2023) use GFlowNet (Bengio et al., 2023) to sample the space group and chemical composition, but not the full Wyckoff representation. A concurrent preprint (Cao et al., 2024) independently explores a Transformer-based approach similar to ours; another concurrent work (Levy et al., 2024) uses diffusion over Wyckoff position site symmetry, fractional coordinates, and lattice parameters.

The main difference between our approach and most other approaches based on Wyckoff positions is that they use Wyckoff letters as the representation.
Wyckoff letter definitions depend on the space group, unlike site symmetry, leading to data fragmentation. Levy et al. (2024) also use WP site symmetry, with one-hot encoding of symmetry operations per axis to represent it; Goodall et al. (2022) use the sum of one-hot encodings of sites to represent a WP; we treat site symmetry as a categorical variable and use learnable embeddings. Zhu et al. (2024); Cao et al. (2024) do not take into account the dependency of the Wyckoff letters on the arbitrary choice of the coset representative of the space group Euclidean normalizer. Finally, Cao et al. (2024) use positional encoding to establish the relationship between the chemical elements and the Wyckoff positions they occupy, while we combine them in one token.

Spherical harmonics are widely used to build a fixed-length descriptor for spatial relationships (Bartók et al., 2013).

2. Wyckoff Transformer (WyFormer)

2.1. Tokenization

Our work is based on the inductive bias that, for stable materials, space group symmetry and Wyckoff sites almost completely define the structure: more than 98% of the materials in the MP-20 (Xie et al., 2021) and MPTS-52 (Baird et al., 2024) datasets, which together contain almost all experimentally stable structures from the Materials Project (Jain et al., 2013), have unique Wyckoff representations. Therefore, it is safe to assume that for almost any Wyckoff representation there is either no stable material conforming to it or just one. The symmetry captured by this discrete part is sufficient to determine properties of a material, such as piezoelectricity via non-centrosymmetry, or a direct/indirect band gap via the positions of the valence/conduction bands in the Brillouin zone, while the fractional coordinates can be linked to the magnitude of that property. We additionally support this assumption by predicting various material properties; see Section 3.2.
Given a Wyckoff representation, coordinates can be determined as discussed in Section 2.4.

We represent each structure as a set of tokens, as shown in Figure 4. The first token contains the space group; the rest are divided into groups of three tokens, each group representing a specific WP. The first token in each group encodes the type of atom that occupies the position, the second encodes the site symmetry, and the last encodes the so-called enumeration. Several WPs can have the same site symmetry. To differentiate those WPs, we enumerate them separately within each space group and site symmetry according to the conventional WP order (Aroyo et al., 2006). For example, in space group 225, shown in Figure 4, WP 4a is encoded as (m-3m, 0), 4b as (m-3m, 1), and 8c as (-43m, 0). A more comprehensive example can be found in Appendix S. The purpose of this encoding is to take advantage of the fact that, unlike Wyckoff letters, the site symmetry definition is universal across different space groups. An ablation study comparing our representation with Wyckoff letters is in Appendix N.

Such an encoding has an additional advantage. For a given crystal, the conventional unit cell can sometimes be chosen in several equivalent ways, which changes the Wyckoff positions corresponding to each atom (see Figure 3), but not their site symmetries. We collect all the arbitrariness in one variable, which leaves the rest of the representation strictly invariant to that choice.

Formally, we define the Wyckoff representation of a structure as $R = (G, E, W)$, where $G$ is the space group, $W = [w_1, \ldots, w_m]$ are the Wyckoff positions with $w_i = (s_i, n_i)$, where $s_i$ is the site symmetry and $n_i$ is the enumeration, and $E = [e_1, \ldots, e_m]$ are the chemical elements occupying them. Spglib (Atsushi Togo & Tanaka, 2024) provides us with a mapping $\rho$ from a crystal $C = (L, E, F)$ to $R$, which is used to preprocess the training dataset.
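The enumeration scheme can be illustrated with a short sketch. This is not the authors' preprocessing code: the table `SG225_WPS` contains only the three positions of space group 225 quoted in the text, the helper names are hypothetical, and the assignment of Tm, Mg, and Hg to specific positions is an assumption made purely for illustration.

```python
from collections import defaultdict

# (letter, multiplicity, site symmetry) in conventional order; only the three
# positions of space group 225 mentioned in the text are listed here.
SG225_WPS = [("a", 4, "m-3m"), ("b", 4, "m-3m"), ("c", 8, "-43m")]


def enumerate_wps(wps):
    """Map each Wyckoff letter to (site symmetry, enumeration).

    The enumeration counts positions that share a site symmetry,
    following the conventional order of the input list.
    """
    counters = defaultdict(int)
    table = {}
    for letter, _multiplicity, site in wps:
        table[letter] = (site, counters[site])
        counters[site] += 1
    return table


def tokenize(space_group, occupied, table):
    """Build R = (G, tokens), each token being (element, site symmetry, enumeration).

    Tokens are sorted to get a canonical form of the unordered collection.
    """
    tokens = tuple(sorted((element, *table[letter]) for element, letter in occupied))
    return space_group, tokens


table = enumerate_wps(SG225_WPS)
# Hypothetical occupation, chosen only to exercise the encoding.
representation = tokenize(225, [("Tm", "a"), ("Mg", "b"), ("Hg", "c")], table)
print(representation)
```

Sorting is one simple way to make the representation independent of the input order; the model itself handles order through permutation invariance rather than through a canonical sort.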
The problem solved by Wyckoff Transformer is sampling the distribution $P(R \mid C : \rho(C) \text{ is stable})$, which is enabled by learning the following probabilities: $p(e_i \mid G, E_{i-1}, S_{i-1}, N_{i-1})$, $p(s_i \mid E_i, S_{i-1}, N_{i-1})$, and $p(n_i \mid E_i, S_i, N_{i-1})$, where $E_{i-1} = [e_1, \ldots, e_{i-1}]$, $S_{i-1} = [s_1, \ldots, s_{i-1}]$, $N_{i-1} = [n_1, \ldots, n_{i-1}]$.

2.1.1. SPHERICAL HARMONICS

Enumerations are defined by an arbitrary convention; in this respect they are no better than Wyckoff letters. We address this with a representation that is defined consistently across space groups. Consider a Wyckoff position consisting of a set of $k$ symmetry operations $\{R_i x + t_i,\ i = 1 \ldots k\}$. We apply these operations to the points $x_1 = [0, 0, 0]$ and $x_2 = [1, 1, 1]$, obtaining two matrices $W^{(1)}$ and $W^{(2)}$:

$$W^{(j)}_i = R_i x_j + t_i.$$

Finally, we convolve the transformed coordinates with spherical harmonics:

$$\phi^{(j)}_i = \arctan\left([W^{(j)}]^2_i, [W^{(j)}]^1_i\right); \quad \theta^{(j)}_i = \arccos\left([W^{(j)}]^3_i\right),$$

$$h^{(j)} = \frac{1}{k} \sum_{i=1}^{k} |W^{(j)}_i| \left[Y^0_n(\theta^{(j)}_i, \phi^{(j)}_i), \ldots, Y^n_n(\theta^{(j)}_i, \phi^{(j)}_i)\right],$$

where $n$ is the degree of the spherical harmonics, a parameter, and the resulting complex vectors $h^{(1)}$ and $h^{(2)}$ each have $n + 1$ dimensions. $n = 2$ is enough to disambiguate all Wyckoff positions with the same site symmetry belonging to the same space group; $n = 1$ is not. Finally, we obtain the final $(2n + 2)$-dimensional descriptor $s$ by concatenation: $s = \Re(h^{(1)} h^{(2)})\ \Im(h^{(1)} h^{(2)})$.

The harmonic representation is not directly invertible; in the main section of the paper, we use it only for property prediction, which results in a slight performance increase, as shown in Appendix O. A way to adapt the harmonics-based representation for structure generation is discussed in Appendix P.

2.2. Model architecture

Elements, site symmetries, and enumerations are each embedded with a simple lookup table with trainable weights, and the embeddings are concatenated. Then we apply a linear layer to provide each head of the multihead attention with information from all three parts of a token.
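The token embedding just described (three lookup tables, concatenation, then a linear mixing layer) can be sketched in NumPy. This is an illustration under assumptions, not the published implementation: the vocabulary sizes, embedding widths, random initialization, and the name `embed_tokens` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary sizes and widths, chosen only for illustration.
N_ELEMENTS, N_SITE_SYMMETRIES, N_ENUMERATIONS = 100, 30, 8
D_PART, D_MODEL = 16, 32

# One lookup table per token part (trainable in a real model).
emb_element = rng.normal(size=(N_ELEMENTS, D_PART))
emb_site = rng.normal(size=(N_SITE_SYMMETRIES, D_PART))
emb_enum = rng.normal(size=(N_ENUMERATIONS, D_PART))

# Linear layer mixing the three concatenated parts, so that every attention
# head downstream sees information from all of them.
W_mix = rng.normal(size=(3 * D_PART, D_MODEL))
b_mix = np.zeros(D_MODEL)


def embed_tokens(elements, sites, enums):
    """Embed a sequence of (element, site symmetry, enumeration) index triplets."""
    parts = np.concatenate(
        [emb_element[elements], emb_site[sites], emb_enum[enums]], axis=-1
    )
    return parts @ W_mix + b_mix


# Three Wyckoff positions given as integer indices into the lookup tables.
elements = np.array([11, 17, 42])
sites = np.array([0, 0, 5])
enums = np.array([0, 1, 0])
print(embed_tokens(elements, sites, enums).shape)  # (3, 32)
```

Because no positional encoding is added, permuting the input triplets simply permutes the rows of the output, which is the property the architecture relies on.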
Since our model is conditioned on space group, preventing data fragmentation is of utmost importance. To this end, the space group is not encoded just as a categorical variable. Building upon (AI4Science et al., 2023) and similarly to Levy et al. (2024), we use PyXtal to get a one-hot-encoded 15 × 10 matrix that represents symmetry elements on each axis for each space group, flatten it, discard the positions that do not vary across the dataset, and use the resulting vector as the space group embedding. Then we apply a linear layer, so the representation becomes learnable but still transferable between space groups.

Figure 4. An example of structure tokenization, TmMgHg2 mp-865981 (token parts: space group, site symmetry, enumeration).

Token sequences are used as input for a Transformer encoder (Vaswani, 2017; Devlin, 2018). The Wyckoff representation is permutation-invariant, and so is the Transformer; we do not use positional encoding, making the model formally permutation-invariant with respect to the input.

De novo generation. We use the enumeration representation. We additionally add a STOP token to each structure. To represent states where some parts of a token are known and others are not, we replace the unknown values with MASK. We also add a fully connected neural network for each part of the token that we want to predict, three in total. To get the prediction, we take the output of the Transformer encoder on the token containing MASK value(s), concatenate it with a one-hot vector encoding the presence in the input sequence of each possible value for this token part, and use it as the input for the corresponding fully connected network.

Property prediction. We take the Transformer encoder output tokens, excluding the token corresponding to the space group, compute a weighted average with weights equal to the multiplicities of the WPs, and use the result as input for a fully connected neural network that outputs a scalar predicted value.

2.3.
Training

Following the approach of Wang et al. (2023); Abramson et al. (2024), we use a simple architecture and do not strictly enforce invariance with respect to the choice among equivalent Wyckoff representations, but rather leave it as a training goal by picking a randomly selected equivalent representation at every training epoch. This is especially viable because of the low number of variants; in the MP-20 dataset, 96% of structures have fewer than 10. The experimental results we present were obtained by training separate models for property prediction and de novo generation. A single model that does both is possible; we leave it for future work.

De novo generation. The training pipeline and architecture are shown in Figure 5. We train the model to predict the next part of a token in a cascade fashion: first the chemical element conditioned on the previous tokens, then the site symmetry conditioned on the previous tokens and the element, and, finally, the enumeration conditioned on the previous tokens, the element, and the site symmetry. On each training iteration we randomly sample the known sequence length and the part of the cascade to predict, place MASK tokens as necessary, input the known parts of the sequences into the model, and compute the cross-entropy loss between the predicted scores and the target.

Unlike the Transformer itself, autoregressive generation is not permutation-invariant. The number of WPs is small (the average in MP-20 is just 3.0); this again allows us to train the model to be invariant through augmentation, by shuffling the order of every Wyckoff representation at every training epoch. Moreover, we use a multi-class loss when training to predict the first cascade part, the chemical element, further reducing learning complexity. On MP-20 the model is trained for $9 \times 10^5$ epochs using the SGD optimizer without batching; due to the efficiency of the representation, gradient backpropagation for the entire dataset fits into GPU memory.
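The sampling of a single training example described above (shuffle, pick a known length, pick a cascade part, mask the rest) might look roughly as follows. This is a sketch under assumptions rather than the authors' pipeline: the function name, the tuple-based token format, and in particular the treatment of STOP as a value of the element part predicted after all Wyckoff positions are known are illustrative choices.

```python
import random

MASK = "MASK"
STOP = "STOP"
PARTS = ("element", "site_symmetry", "enumeration")


def make_training_example(space_group, wyckoff_tokens, rng):
    """Return (model_input, part_name, target) for one cascade prediction step."""
    tokens = list(wyckoff_tokens)
    rng.shuffle(tokens)  # augmentation: random order of Wyckoff positions

    # Number of fully known Wyckoff positions, from 0 to all of them.
    n_known = rng.randint(0, len(tokens))

    if n_known == len(tokens):
        # Everything is known: the model should predict STOP as the next element.
        part, target = 0, STOP
        partial = (MASK, MASK, MASK)
    else:
        part = rng.randrange(3)  # which part of the next token to predict
        next_token = tokens[n_known]
        target = next_token[part]
        # Earlier cascade parts are visible, the target and later parts are masked.
        partial = tuple(next_token[i] if i < part else MASK for i in range(3))

    model_input = [space_group] + tokens[:n_known] + [partial]
    return model_input, PARTS[part], target


crystal = [("Tm", "m-3m", 0), ("Mg", "m-3m", 1), ("Hg", "-43m", 0)]
print(make_training_example(225, crystal, random.Random(0)))
```

In an actual training loop the target would be converted to a class index and compared with the prediction head's scores through a cross-entropy loss.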
We use the loss on the validation dataset for early stopping, learning rate scheduling, and manual hyperparameter tuning.

Property prediction. The model is trained using the MSE loss with batch size 500 and the Adam optimizer. For both MP-20 and AFLOW, training takes around 5k epochs. Hyperparameters are available in Appendix L.

Figure 5. Model training pipeline. (1) The crystal is converted into a token sequence where the first token is the space group number, followed by token triplets in the order atom, site symmetry, and enumeration. Then the triplets are randomly shuffled. (2) Randomly sample the number of fully known Wyckoff positions and the part of the next triplet to be predicted; mask unknown tokens, remove unknown Wyckoff positions. (3) Embed the tokens using simple lookup tables; for each Wyckoff position, concatenate the tokens corresponding to it in the embedding dimension. (4) A linear layer mixes the features to provide homogeneous input to multiple attention heads. (5) The sequence is passed through the Transformer encoder. (6) An MLP is applied to the last token of the output sequence. (7) The loss is the cross-entropy of the prediction and the true value of the token being predicted.

2.4. Structure generation

We generate crystals conditioned on the space group number, which is sampled from the combination of the training and validation datasets, as illustrated in Figure 7. The Wyckoff representation is then autoregressively sampled using WyFormer. We use two ways to generate the final crystal structure conditioned on the representation; the details are described in Appendix C. They both start with randomly sampling a structure conditioned on the Wyckoff representation with PyXtal (Fredericks et al., 2021). Then it is relaxed with CrySPR (Nong et al., 2024) and CHGNet (Deng et al., 2023)
or DiffCSP++ (Jiao et al., 2024b).

3. Experimental Evaluation

3.1. De novo generation

3.1.1. DATASETS

We use MP-20 (Xie et al., 2021), which contains almost all experimentally stable materials in the Materials Project (Jain et al., 2013) with a maximum of 20 atoms per unit cell, within 0.08 eV/atom of the convex hull, and with formation energy smaller than 2 eV/atom: 45 229 structures in total, split 60/20/20 into train, validation, and test parts. Additionally, we train and evaluate WyFormer on MPTS-52 (Baird et al., 2024), a more challenging subset of the Materials Project containing materials with up to 52 atoms per unit cell.

3.1.2. METRICS

Structure property similarity metrics. Coverage and Property EMD (Wasserstein distance) have been proposed as low-cost proxy metrics for de novo structure generation by Xie et al. (2021) and then followed by most of the subsequent work.

Validity. Xie et al. (2021) proposed verifying crystal feasibility according to two criteria. Structural validity means that no two atoms are closer than 0.5 Å. All structures in MP-20 and almost all structures produced by state-of-the-art models fulfill it. Compositional validity means having neutral charge (Davies et al., 2019). Only 90% of MP-20 structures pass this test, meaning that nonconforming structures are physically possible, if somewhat rare.

Novelty and uniqueness. The purpose of de novo generation is to obtain new materials. Generated materials that already exist in the training dataset increase the model performance according to structure stability and similarity metrics, but such structures are useless for material design and just increase the gap between the proxy metrics and the model's fitness for its purpose. Therefore, we exclude generated materials that are not novel and unique from metric computation. On a deeper level, generative models for materials

Table 1. Evaluation.
Symmetry metrics are computed only using novel structurally valid examples. Note that the 1000- and 105-example metrics are computed using MP-20 train and validation as reference datasets for novelty, while the 10 000-example S.U.N. only uses MP-20 train to remain compatible with the reported values. Bold indicates the values within the p = 0.1 statistical significance threshold from the best. Values marked by † were computed by Miller et al. (2024), the rest by us; see note H.1 for an important caveat; in short, the values in (brackets) are less accurate, but are compatible with each other.

| Method/Metric | Novel Unique Templates (#) | P1 (%), ref = 1.7 | Space Group χ2 | S.U.N. %, Ehull < 80 meV | S.S.U.N. %, Ehull < 80 meV | S.U.N. %, Ehull < 0 meV |
|---|---|---|---|---|---|---|
| Sample size | 1000 | 1000 | 1000 | 105 | 105 | 10 000 |
| Relaxation | CHGNet | CHGNet | CHGNet | DFT | DFT | DFT |
| WyFormerCHGNet | 180 | 3.24 | 0.223 | 23.1 | 22.3 | |
| WyFormerDiffCSP++ | 186 | 1.46 | 0.212 | 22.2 | 21.1 | 3.83 (4.14) |
| DiffCSP++ | 10 | 2.57 | 0.255 | 14.4 | 14.4 | |
| CrystalFormer | 74 | 0.91 | 0.276 | 20.1 | 20.1 | |
| SymmCD | 101 | 2.35 | 0.24 | 20.7 | 20.7 | |
| WyCryst | 165 | 4.79 | 0.710 | 5.5 | 5.5 | |
| DiffCSP | 76 | 36.57 | 7.989 | 22.2 | 20.6 | (3.34†) |
| FlowMM | 51 | 44.27 | 12.423 | 17.8 | 16.9 | (2.34†) |
| WyFormer, MPTS-52 | 386 | 0 | 0.225 | | | |

are subject to an exploration/exploitation trade-off: the more physically similar the sampled materials are to the training dataset, the more likely they are to be stable and distributed similarly to the data, but the less useful they are for the purpose of material design. From a purely machine learning point of view, the novelty percentage serves as a proxy metric for overfitting.

Stability determines whether the material, in fact, exists under normal conditions. It is estimated by computing the energy above the convex hull and comparing it to a threshold. The Materials Project is the source of the reference structures for the hull. The details are in Appendix G.

S.U.N. (Zeni et al., 2025) combines the above into the fraction of stable unique novel structures.

Symmetry of the structures has paramount physical importance.
Controlling symmetries also leads to control over physical, electronic, and mechanical behavior, which is desirable in property-directed inverse design of materials. For example, in electronic materials, higher symmetry can improve carrier mobility and uniformity in the electronic band structure, enhancing performance in applications such as semiconductors or optoelectronics. Furthermore, high-symmetry structures often exhibit isotropic properties, meaning their behavior is the same in all directions, making them more versatile for industrial use. We use four metrics for evaluating the ability of the generative models to reproduce the symmetry present in the data and, ultimately, in nature:

P1 is the percentage of the structures that have symmetry group P1. In MP-20 the corresponding number is just 1.7%. We argue that the presence of symmetry is a good proxy for structure feasibility that is difficult to capture in standard DFT computations and would require finite-temperature calculations and/or improved methodologies.

Novel Unique Templates is the number of novel unique element-agnostic Wyckoff representations (Section 2.1) in the generated sample. Element-agnostic means that we remove the chemical element while retaining the symmetry information. For example, for the TmMgHg2 in Figure 4, it will be (X, (m-3m, 0)), (X, (m-3m, 1)), (X, (-43m, 0)) and its equivalent. An important difference between our work and (Levy et al., 2024) is that we take into account the equivalence of Wyckoff representations. The metric provides a lower limit on overfitting and physically meaningful sample novelty: if two materials have different symmetry templates, their physical properties will be different, while the inverse is not always true. It serves as an addition to the strict structure novelty, which provides the upper bound.
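Counting novel unique templates reduces to set operations once each structure is mapped to its element-agnostic form. The sketch below is a simplified illustration with hypothetical function names and toy data; notably, it does not merge equivalent Wyckoff representations (different normalizer coset choices), which the metric in the paper does take into account.

```python
def template(space_group, tokens):
    """Drop the chemical elements, keep the space group and the (site symmetry, enumeration) pairs.

    The pairs are sorted so that the result does not depend on token order.
    """
    return space_group, tuple(sorted((site, enum) for _element, site, enum in tokens))


def novel_unique_templates(generated, reference):
    """Templates present in the generated sample but absent from the reference set."""
    known = {template(sg, tokens) for sg, tokens in reference}
    return {template(sg, tokens) for sg, tokens in generated} - known


# Toy data: (space group, list of (element, site symmetry, enumeration)).
reference = [(225, [("Na", "m-3m", 0), ("Cl", "m-3m", 1)])]
generated = [
    (225, [("K", "m-3m", 1), ("Br", "m-3m", 0)]),                      # known template
    (225, [("Tm", "m-3m", 0), ("Mg", "m-3m", 1), ("Hg", "-43m", 0)]),  # new template
    (225, [("Sc", "m-3m", 1), ("Al", "-43m", 0), ("Ni", "m-3m", 0)]),  # same new template
]
print(len(novel_unique_templates(generated, reference)))  # 1
```

The second and third generated structures differ in composition but share one template, so they contribute a single novel unique template.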
Finally, the ability of a model to generate new templates allows it to generate more structures before starting to repeat itself, as we demonstrate in Appendix J.

Space Group χ2 is the χ2 statistic of the difference in the frequencies of space groups between the generated and test datasets.

S.S.U.N. is the percentage of the structures that are symmetric (space group not P1), stable, unique, and novel.

3.1.3. METHODOLOGY

WyFormer was trained using the MP-20 dataset following the original train/test/validation split. We sampled $10^4$ Wyckoff representations, then obtained structures using the CrySPR+CHGNet (WyFormerCHGNet) and DiffCSP++ (WyFormerDiffCSP++) approaches described in Section 2.4. WyCryst (Zhu et al., 2024) only supports a limited number of unique elements per structure; therefore, we trained it on a subset of MP-20 containing only binary and ternary compounds, 35 575 in total. An evaluation of WyFormer trained on the same dataset is presented in Appendix K. As WyCryst also produces Wyckoff representations, and not structures, the same CrySPR+CHGNet procedure was used to obtain them. We used the CrystalFormer (Cao et al., 2024) code and weights published by the authors to produce the sample, conditioned on the space groups sampled from MP-20. DiffCSP (Jiao et al., 2024a), DiffCSP++ (Jiao et al., 2024b), and SymmCD (Levy et al., 2024) samples were provided by the authors. The DiffCSP++ sampling process is conditioned on Wyckoff templates from the training dataset, which include the space group.

For each model, a data sample containing 1000 structures was relaxed using CHGNet. The generated samples were filtered for uniqueness; more than 99.5% of structures for every method passed the filtering. We computed DFT for 105 novel structures for each method; a detailed description of the settings is available in Appendix H.
Additionally, we computed DFT for 10 000 structures from WyFormer, and compared S.U.N. values to the values reported by Miller et al. (2024).

3.1.4. DE NOVO STRUCTURE GENERATION RESULTS

Evaluation results are presented in Tables 1 and 2; a sample of generated structures is illustrated in Figure 9. WyFormer achieves 24% higher S.U.N. on the 10 000-structure sample compared to the best available baseline, as well as the best template novelty, fraction of asymmetric structures, and space group distribution reproduction. On the 105-structure sample, the difference between WyFormer, CrystalFormer, DiffCSP, FlowMM, and SymmCD in S.U.N. and S.S.U.N. values is not statistically significant. DiffCSP++ has lower stability, despite using a priori valid structure templates from the data. As we show in Appendix J, the lack of template novelty limits the diversity, and the model starts to repeat itself. DiffCSP++ oversamples the structures with a large number of unique elements, whereas WyFormer matches the distribution most closely, as depicted in Figure 10. CrystalFormer has lower novelty, which means that the model has been overfitted, and the structures are more similar to the training dataset. It also produces a sizable fraction of a priori structurally invalid crystals. WyCryst suffers from even lower novelty, stability and distribution similarity metrics. DiffCSP and FlowMM cannot be conditioned on the symmetry group, and produce a large fraction of unrealistic asymmetric structures. SymmCD is a concurrent work based on similar principles, and achieves similar performance, except for a smaller number of Novel Unique Templates.

On MPTS-52, as expected, WyFormer shows higher novelty as well as template novelty. In terms of distribution similarity metrics WyFormer performs largely similarly on MP-20 and MPTS-52. We used CHGNet to predict formation energies and estimate S.S.U.N.: 24.4% on MPTS-52, compared to 35.2% on MP-20.
This reflects the increased difficulty, and shows that WyFormer is still very much capable of generating stable structures in this setting.

3.2. Material property prediction

The MP-20 dataset contains two properties: formation energy and band gap, which we predict using WyFormer. The results are shown in Table 3. WyFormer achieves results competitive with the models that use full structures. We also utilize the AFLOW database (Curtarolo et al., 2012), which contains 4905 compounds spanning a diverse range of chemistries and crystal structures. We predict four properties: thermal conductivity, Debye temperature, bulk modulus, and shear modulus. The data are divided into training, validation, and test sets using a 60/20/20 split. The results are presented in Table 4; WyFormer demonstrated superior performance in predicting thermal conductivity. For the remaining three properties, the model's performance is comparable to the baselines. From this we argue that the symmetries and composition of a crystal alone already carry a considerable amount of information about its properties. This is especially true for band gap, where Brillouin zones are defined by symmetry, and for thermal conductivity, which is a non-equilibrium phonon transport property conditioned on the underlying symmetry of the structure; according to the first-order approximation of kinetic theory, higher-symmetry crystals typically have higher thermal conductivity due to (1) higher group velocities and (2) longer scattering times due to lower anharmonicity (Newnham, 2004; Yang et al., 2021).

Table 2. Evaluation of the methods according to validity and property distribution metrics. Structures were relaxed with CHGNet. Following the reasoning in Section 3.1.2, we apply filtering by novelty and structural validity, and do not discard structures based on compositional validity. An evaluation following the protocol proposed by Xie et al. (2021) is available in Appendix I.

| Method | Novelty (%) | Validity, Struct. (%) | Validity, Comp. (%) | Coverage, COV-R (%) | Coverage, COV-P (%) | Property EMD, ρ | Property EMD, E | Property EMD, Nelem |
|---|---|---|---|---|---|---|---|---|
| WyFormer CHGNet | 90.00 | 99.56 | 80.44 | 98.67 | 96.72 | 0.74 | 0.053 | 0.097 |
| WyFormer DiffCSP++ | 89.50 | 99.66 | 80.34 | 99.22 | 96.79 | 0.67 | 0.050 | 0.098 |
| DiffCSP++ | 89.69 | 100.00 | 85.04 | 99.33 | 95.80 | 0.15 | 0.036 | 0.504 |
| CrystalFormer | 76.92 | 86.84 | 82.37 | 99.87 | 95.13 | 0.52 | 0.100 | 0.163 |
| SymmCD | 88.77 | 95.82 | 84.88 | 99.55 | 94.66 | 0.62 | 0.102 | 0.525 |
| WyCryst | 52.62 | 99.81 | 75.53 | 98.85 | 87.10 | 0.96 | 0.113 | 0.286 |
| DiffCSP | 90.06 | 100.00 | 80.94 | 99.55 | 96.21 | 0.82 | 0.052 | 0.294 |
| FlowMM | 89.44 | 100.00 | 81.93 | 99.67 | 99.64 | 0.49 | 0.036 | 0.131 |
| WyFormer MPTS-52 | 98.7 | 99.3 | 76.7 | - | - | 0.698 | 0.108 | 0.228 |

Table 3. One-shot energy and band gap prediction. We computed CHGNet energy predictions on the MP-20 dataset; the rest of the baseline values are from Lin et al. (2023). The MP-20 test set is a part of the CHGNet training set. Xie & Grossman (2018) and Jha et al. (2019) report the error between DFT-computed and experimental results: 0.08 eV for energy, and 0.6 eV for band gap.

| Method | Energy (meV) | Band gap (meV) | Train / test data |
|---|---|---|---|
| CGCNN | 31 | 292 | Materials Project 2018.6.1 |
| SchNet | 33 | 345 | Materials Project 2018.6.1 |
| MEGNet | 30 | 307 | Materials Project 2018.6.1 |
| GATGNN | 33 | 280 | Materials Project 2018.6.1 |
| ALIGNN | 22 | 218 | Materials Project 2018.6.1 |
| Matformer | 21 | 211 | Materials Project 2018.6.1 |
| PotNet | 19 | 204 | Materials Project 2018.6.1 |
| CHGNet | 34 | - | MPTrj / MP-20 |
| WyFormer | 25 | 234 | MP-20 |

4. Conclusions and Limitations

E_hull determined from formation energy is commonly used as a proxy for stability, but it is imperfect: it doesn't take into account configurational and vibrational entropic contributions, and hull determination relies on already known structures. Moreover, our results, along with those of Miller et al. (2024), show that generated structures with space symmetry group P1 are consistently found stable at a much higher rate than they occur in nature. There are two logical explanations: either DiffCSP and FlowMM have, in passing, discovered a new class of asymmetric materials, or our stability estimation methodology is systematically flawed.
In our biased opinion the latter is much more likely. Novelty and diversity evaluation is a crucial and open question. A model can generate structures that are similar to the ones in the training dataset, and are valid, but not very useful for new material design. Counting complete duplicates is a step in the right direction, but doesn't measure substantial sample diversity (Hicks et al., 2021).

Table 4. MAE values for the AFLOW dataset; baseline values are by Wang et al. (2021).

| Method | Thermal conductivity | Debye temperature | Bulk modulus | Shear modulus |
|---|---|---|---|---|
| Roost | 2.70 | 37.17 | 8.82 | 9.98 |
| CrabNet | 2.32 | 33.46 | 8.69 | 9.08 |
| HotCrab | 2.25 | 35.76 | 9.10 | 9.43 |
| ElemNet | 3.32 | 45.72 | 12.12 | 13.32 |
| RF | 2.66 | 36.48 | 11.91 | 10.09 |
| WyFormer | 2.20 | 36.36 | 9.63 | 10.14 |

An important part of the future work is Crystal Structure Prediction (CSP). Unlike for the models that work with atoms and coordinates, it is hard to ensure that WyFormer output strictly conforms to a given stoichiometry. But we can add the stoichiometry as a generation condition, like the space group. Then, as we show in Table 6 (Appendix E), WyFormer is four orders of magnitude faster than other CSP solutions, which makes simple rejection sampling practical.

In conclusion, we demonstrate that WyFormer represents a novel advancement in generation of realistic symmetric crystals by leveraging Wyckoff positions to encode material symmetries. WyFormer achieves a higher degree of structure diversity compared to baselines by encoding the discrete symmetries of space groups without relying on atomic coordinates. This unique tokenization of symmetry elements enables the model to explore a reduced, yet highly representative space of possible configurations, resulting in more stable and purportedly synthesizable crystals. The model respects the inherent symmetry of crystalline materials and outperforms existing models in generating both novel and physically meaningful structures.
These innovations underscore the method's potential in accelerating material discovery while maintaining accuracy in predicting key properties like formation energy and band gap.

Acknowledgements

We thank Lei Wang for insights on symmetry-conditioned generation; Andrey Okhotin for insights on permutation invariance and the 10k CHGNet computation; Benjamin Miller for a discussion of the evaluation metrics; Rui Jiao and Daniel Levy for providing data samples. This research/project is supported by the Ministry of Education, Singapore, under its Research Centre of Excellence award to the Institute for Functional Intelligent Materials (I-FIM, project No. EDUNC-33-18-279-V12). This research/project is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG3-RP-2022-028) and from the MAT-GDT Program at A*STAR via the AME Programmatic Fund by the Agency for Science, Technology and Research under Grant No. M24N4b0034. The computational work for this article was performed on resources at the National Supercomputing Centre of Singapore (NSCC). Computational work involved in this research work is partially supported by NUS IT's Research Computing group. The research used computational resources provided by Constructor Tech. This research was supported in part through computational resources of HPC facilities at HSE University.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here.

References

Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., Ronneberger, O., Willmore, L., Ballard, A. J., Bambrick, J., et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, pp. 1-3, 2024.
AI4Science, M., Hernandez-Garcia, A., Duval, A., Volokhova, A., Bengio, Y., Sharma, D., Carrier, P. L., Benabed, Y., Koziarski, M., and Schmidt, V. Crystal-GFN: sampling crystals with desirable properties and constraints. arXiv preprint arXiv:2310.04925, 2023.

Aroyo, M. I., Perez-Mato, J. M., Capillas, C., Kroumova, E., Ivantchev, S., Madariaga, G., Kirov, A., and Wondratschek, H. Bilbao crystallographic server: I. databases and crystallographic computing programs. Zeitschrift für Kristallographie-Crystalline Materials, 221(1):15-27, 2006.

Atsushi Togo, K. S. and Tanaka, I. Spglib: a software library for crystal symmetry search. Sci. Technol. Adv. Mater., Meth., 4(1):2384822-2384836, 2024. doi: 10.1080/27660400.2024.2384822. URL https://doi.org/10.1080/27660400.2024.2384822.

Baird, S. G., Sayeed, H. M., Montoya, J., and Sparks, T. D. matbench-genmetrics: A Python library for benchmarking crystal structure generative models using time-based splits of Materials Project structures. Journal of Open Source Software, 9(97):5618, 2024. doi: 10.21105/joss.05618. URL https://doi.org/10.21105/joss.05618.

Bartók, A. P., Kondor, R., and Csányi, G. On representing chemical environments. Physical Review B-Condensed Matter and Materials Physics, 87(18):184115, 2013.

Bengio, Y., Lahlou, S., Deleu, T., Hu, E. J., Tiwari, M., and Bengio, E. GFlowNet foundations. The Journal of Machine Learning Research, 24(1):10006-10060, 2023.

Cao, Z., Luo, X., Lv, J., and Wang, L. Space group informed transformer for crystalline materials generation. arXiv preprint arXiv:2403.15734, 2024.

Curtarolo, S., Setyawan, W., Hart, G. L., Jahnatek, M., Chepulskii, R. V., Taylor, R. H., Wang, S., Xue, J., Yang, K., Levy, O., Mehl, M. J., Stokes, H. T., Demchenko, D. O., and Morgan, D. AFLOW: An automatic framework for high-throughput materials discovery. Computational Materials Science, 58:218-226, 2012. ISSN 09270256. doi: 10.1016/j.commatsci.2012.02.005. URL http://dx.doi.org/10.1016/j.commatsci.2012.02.005.

Curtarolo, S., Hart, G. L., Nardelli, M. B., Mingo, N., Sanvito, S., and Levy, O. The high-throughput highway to computational materials design. Nature materials, 12(3):191-201, 2013.

Davies, D. W., Butler, K. T., Jackson, A. J., Skelton, J. M., Morita, K., and Walsh, A. SMACT: Semiconducting materials by analogy and chemical theory. Journal of Open Source Software, 4(38):1361, 2019.

Deng, B., Zhong, P., Jun, K., Riebesell, J., Han, K., Bartel, C. J., and Ceder, G. CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nature Machine Intelligence, 5(9):1031-1041, 2023.

Devlin, J. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.

Fedorow, E. v. II. Zusammenstellung der krystallographischen Resultate des Herrn Schoenflies und der meinigen. Zeitschrift für Kristallographie-Crystalline Materials, 20(1-6):25-75, 1892.

Fredericks, S., Parrish, K., Sayre, D., and Zhu, Q. PyXtal: A Python library for crystal structure generation and symmetry analysis. Computer Physics Communications, 261:107810, 2021.

Ganose, A. M., Sahasrabuddhe, H., Asta, M., Beck, K., Biswas, T., Bonkowski, A., Bustamante, J., Chen, X., Chiang, Y., Chrzan, D., Clary, J., Cohen, O., Ertural, C., Gallant, M., George, J., Gerits, S., Goodall, R., Guha, R., Hautier, G., Horton, M., Kaplan, A., Kingsbury, R., Kuner, M., Li, B., Linn, X., McDermott, M., Mohanakrishnan, R. S., Naik, A., Neaton, J., Persson, K., Petretto, G., Purcell, T., Ricci, F., Rich, B., Riebesell, J., Rignanese, G.-M., Rosen, A., Scheffler, M., Schmidt, J., Shen, J.-X., Sobolev, A., Sundararaman, R., Tezak, C., Trinquet, V., Varley, J., Vigil-Fowler, D., Wang, D., Waroquiers, D., Wen, M., Yang, H., Zheng, H., Zheng, J., Zhu, Z., and Jain, A. Atomate2: Modular Workflows for Materials Science. ChemRxiv, 2025. URL https://chemrxiv.org/engage/chemrxiv/article-details/678e76a16dde43c9085c75e9.

Goodall, R. E., Parackal, A. S., Faber, F. A., and Armiento, R. Wyckoff set regression for materials discovery. In Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020), Vancouver, Canada, 2020.

Goodall, R. E., Parackal, A. S., Faber, F. A., Armiento, R., and Lee, A. A. Rapid discovery of stable materials by coordinate-free coarse graining. Science advances, 8(30):eabn4117, 2022.

Gruver, N., Sriram, A., Madotto, A., Wilson, A. G., Zitnick, C. L., and Ulissi, Z. Fine-tuned language models generate stable inorganic materials as text. arXiv preprint arXiv:2402.04379, 2024.

Hahn, T., Shmueli, U., and Arthur, J. W. International tables for crystallography, volume 1. Reidel Dordrecht, 1983.

Hicks, D., Toher, C., Ford, D. C., Rose, F., Santo, C. D., Levy, O., Mehl, M. J., and Curtarolo, S. AFLOW-XtalFinder: a reliable choice to identify crystalline prototypes. npj Computational Materials, 7(1):30, 2021.

Jain, A. and Bligaard, T. Atomic-position independent descriptor for machine learning of material properties. Physical Review B, 98(21):214112, 2018.

Jain, A., Ong, S. P., Hautier, G., Chen, W., Richards, W. D., Dacek, S., Cholia, S., Gunter, D., Skinner, D., Ceder, G., et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL materials, 1(1), 2013.

Jha, D., Choudhary, K., Tavazza, F., et al. Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning. Nat Commun, 10, 2019.

Jiao, R., Huang, W., Lin, P., Han, J., Chen, P., Lu, Y., and Liu, Y. Crystal structure prediction by joint equivariant diffusion. Advances in Neural Information Processing Systems, 36, 2024a.

Jiao, R., Huang, W., Liu, Y., Zhao, D., and Liu, Y. Space group constrained crystal generation. arXiv preprint arXiv:2402.03992, 2024b.

Kantorovich, L. Quantum theory of the solid state: an introduction, volume 136. Springer Science & Business Media, 2004.

Klipfel, A., Frégier, Y., Sayede, A., and Bouraoui, Z. Unified model for crystalline material generation. arXiv preprint arXiv:2306.04510, 2023.

Kresse, G. and Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B, 54:11169-11186, Oct 1996. doi: 10.1103/PhysRevB.54.11169. URL https://link.aps.org/doi/10.1103/PhysRevB.54.11169.

Kresse, G. and Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B, 59:1758-1775, Jan 1999. doi: 10.1103/PhysRevB.59.1758. URL https://link.aps.org/doi/10.1103/PhysRevB.59.1758.

Levy, D., Panigrahi, S. S., Kaba, S.-O., Zhu, Q., Galkin, M., Miret, S., and Ravanbakhsh, S. SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models. In AI for Accelerated Materials Design-NeurIPS 2024, 2024.

Lin, Y., Yan, K., Luo, Y., Liu, Y., Qian, X., and Ji, S. Efficient approximations of complete interatomic potentials for crystal property prediction. Proceedings of the 40th International Conference on Machine Learning, 2023.

Luo, X., Wang, Z., Gao, P., Lv, J., Wang, Y., Chen, C., and Ma, Y. Deep learning generative model for crystal structure prediction. npj Computational Materials, 10(1):254, 2024.

Malgrange, C., Ricolleau, C., and Schlenker, M. Symmetry and physical properties of crystals. Springer, 2014.

Miller, B. K., Chen, R. T., Sriram, A., and Wood, B. M. FlowMM: Generating Materials with Riemannian Flow Matching. ICML 2024; arXiv preprint arXiv:2406.04713, 2024.

Möller, J. J., Körner, W., Krugel, G., Urban, D. F., and Elsässer, C. Compositional optimization of hard-magnetic phases with machine-learning models. Acta Materialia, 153:53-61, 2018.

Newnham, R. E. Thermal conductivity. In Properties of Materials: Anisotropy, Symmetry, Structure. Oxford University Press, 11 2004. ISBN 9780198520757. doi: 10.1093/oso/9780198520757.003.0020. URL https://doi.org/10.1093/oso/9780198520757.003.0020.

Nong, W., Zhu, R., and Hippalgaonkar, K. CrySPR: A Python interface for implementation of crystal structure pre-relaxation and prediction using machine-learning interatomic potentials. ChemRxiv, 2024. doi: https://doi.org/10.26434/chemrxiv-2024-r4wnq. URL https://chemrxiv.org/engage/chemrxiv/article-details/66b308a501103d79c5fd9b91.

Ong, S. P., Richards, W. D., Jain, A., Hautier, G., Kocher, M., Cholia, S., Gunter, D., Chevrier, V. L., Persson, K. A., and Ceder, G. Python materials genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science, 68:314-319, 2013.

Perdew, J. P., Burke, K., and Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett., 77:3865-3868, Oct 1996. doi: 10.1103/PhysRevLett.77.3865. URL https://link.aps.org/doi/10.1103/PhysRevLett.77.3865.

Pyzer-Knapp, E. O., Pitera, J. W., Staar, P. W., Takeda, S., Laino, T., Sanders, D. P., Sexton, J., Smith, J. R., and Curioni, A. Accelerating materials discovery using artificial intelligence, high performance computing and robotics. npj Computational Materials, 8(1):84, 2022.

Riebesell, J., Goodall, R. E., Jain, A., Benner, P., Persson, K. A., and Lee, A. A. Matbench discovery - an evaluation framework for machine learning crystal stability prediction. arXiv preprint arXiv:2308.14920, 2023.

Sinha, A., Jia, S., and Fung, V. Representation-space diffusion models for generating periodic materials. arXiv preprint arXiv:2408.07213, 2024.

Sommer, T., Willa, R., Schmalian, J., and Friederich, P. 3DSC - a dataset of superconductors including crystal structures. Scientific Data, 10(1):816, 2023.

Vaswani, A. Attention is all you need. Advances in Neural Information Processing Systems, 2017.

Wang, A. Y.-T., Kauwe, S. K., Murdock, R. J., and Sparks, T. D. Compositionally restricted attention-based network for materials property predictions. npj Computational Materials, 7(1):77, 2021.

Wang, Y., Elhag, A. A., Jaitly, N., Susskind, J. M., and Bautista, M. A. Swallowing the bitter pill: Simplified scalable conformer generation. In Forty-first International Conference on Machine Learning, 2023.

Wang, Z., Chen, A., Tao, K., Han, Y., and Li, J. MatGPT: A vane of materials informatics from past, present, to future. Advanced Materials, 36(6):2306733, 2024.

Wyckoff, R. W. G. The Analytical Expression of the Results of the Theory of Space-groups, volume 318. Carnegie Institution of Washington, 1922.

Xie, T. and Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical review letters, 120(14):145301, 2018.

Xie, T., Fu, X., Ganea, O.-E., Barzilay, R., and Jaakkola, T. Crystal diffusion variational autoencoder for periodic material generation. ICLR 2022, arXiv preprint arXiv:2110.06197, 2021.

Yamazaki, S., Nong, W., Zhu, R., Novoselov, K. S., Ustyuzhanin, A., and Hippalgaonkar, K. Multi-property directed generative design of inorganic materials through Wyckoff-augmented transfer learning. arXiv preprint arXiv:2503.16784, 2025.

Yang, J. et al. An introduction to the theory of piezoelectricity, volume 9. Springer, 2005.

Yang, M., Cho, K., Merchant, A., Abbeel, P., Schuurmans, D., Mordatch, I., and Cubuk, E. D. Scalable diffusion for materials generation, 2023. URL http://arxiv.org/abs/2311.09235.

Yang, R., Yue, S., Quan, Y., and Liao, B. Crystal symmetry based selection rules for anharmonic phonon-phonon scattering from a group theory formalism. Phys. Rev. B, 103:184302, May 2021. doi: 10.1103/PhysRevB.103.184302. URL https://link.aps.org/doi/10.1103/PhysRevB.103.184302.
Zeni, C., Pinsler, R., Zügner, D., Fowler, A., Horton, M., Fu, X., Wang, Z., Shysheya, A., Crabbé, J., Ueda, S., Sordillo, R., Sun, L., Smith, J., Nguyen, B., Schulz, H., Lewis, S., Huang, C.-W., Lu, Z., Zhou, Y., Yang, H., Hao, H., Li, J., Yang, C., Li, W., Tomioka, R., and Xie, T. A generative model for inorganic materials design. Nature, 639(8055):624-632, Mar 2025. ISSN 1476-4687. doi: 10.1038/s41586-025-08628-5. URL https://doi.org/10.1038/s41586-025-08628-5.

Zhu, R., Nong, W., Yamazaki, S., and Hippalgaonkar, K. WyCryst: Wyckoff inorganic crystal generator framework. Matter, 2024. ISSN 25902385. doi: https://doi.org/10.1016/j.matt.2024.05.042. URL https://www.sciencedirect.com/science/article/pii/S2590238524003059.

A. Wyckoff representation with fractional coordinates

A crystal can be represented as a space group, a set of WPs and chemical elements occupying them, the fractional coordinates of the WP degrees of freedom, and free lattice parameters. Such a representation reduces the number of parameters by an order of magnitude without information loss. For example, see Figure 6.

    Group: I4/mmm (139)
    Lattice: a = b = 8.9013, c = 5.1991, α = 90.0, β = 90.0, γ = 90.0
    Wyckoff sites:
    Nd @ [ 0.0000 0.0000 0.0000], WP [2a] Site [4/m2/m2/m]
    Al @ [ 0.2788 0.5000 0.0000], WP [8j] Site [mm2.]
    Al @ [ 0.6511 0.0000 0.0000], WP [8i] Site [mm2.]
    Cu @ [ 0.2500 0.2500 0.2500], WP [8f] Site [..2/m]

Figure 6. Wyckoff representation of Nd(Al2Cu)4 (mp-974729), variable parameters in bold. If represented as a point cloud, the structure has 13 [atoms] × 3 [coordinates] + 6 [lattice] = 45 parameters (42 once the three rigid-translation degrees of freedom are discounted); if represented using WPs, it has just 4 continuous parameters (WPs 8i and 8j each have a free parameter, and the tetragonal lattice has two), and 5 discrete parameters (space group number, and WPs for each atom).

B. WyFormer Description

Structure generation is shown in Figure 7 and described in Algorithm 1; training is described in Algorithm 2, and the model itself in Algorithm 3.

C. Structure generation details

The process of obtaining crystal structures from Wyckoff representations using PyXtal (Fredericks et al., 2021) begins by specifying a space group and defining WPs. PyXtal allows users to input atomic species, stoichiometry, and symmetry preferences. Based on these parameters, PyXtal generates a random crystal structure that respects the symmetry requirements of the space group. Once the initial structure is generated, we perform energy relaxation using CHGNet. CHGNet is a neural network-based model designed to predict atomic forces and energies, significantly speeding up calculations that would traditionally require density functional theory (DFT). We repeat the process for six random initializations and pick the structure with the lowest energy. The energy distribution among the initializations is presented in Figure 8.

Energy relaxation involves optimizing the atomic positions to reach a minimum energy configuration, which represents the most stable form of the material. CHGNet, trained on vast DFT datasets, can efficiently relax crystal structures by adjusting atomic positions to reduce the total energy. This approach ensures that the final structure is not only symmetrical but also physically realistic in terms of energy stability.

The second structure generation method, DiffCSP++, is a diffusion-based crystal structure prediction model that focuses on generating purportedly stable crystal structures by sampling from an energy landscape in a physically consistent manner. DiffCSP++ generation also starts with PyXtal sampling.

D. Training computational requirements

Our tests were done using a single NVIDIA RTX 6000 Ada GPU, 24 CPU cores, and the MP-20 dataset. The results are presented in Table 5.
We have also tried batched training for next token prediction, but it just becomes slower without improved quality. The reason might be that we choose a random known sequence length and part of the token, token permutation, and

Table 5. WyFormer training resource requirements on the MP-20 dataset.

| Prediction target | Time | Batch size | Number of epochs | GPU memory, MiB | GPU load |
|---|---|---|---|---|---|
| Next token | 11h | 27136 | 900k | 2000 | 50% |
| Formation energy | 26m | 1000 | 5.2k | 1700 | 45% |
| Band gap | 10m | 1000 | 3k | 1700 | 25% |

Figure 7. High-level flowchart of structure generation with WyFormer. In step 1, a space group is sampled from the training data distribution and used as the initial token for WyFormer; in step 2, WyFormer autoregressively generates tokens; in step 3, the Wyckoff representation is converted to JSON and stored. Finally, in step 4, the Wyckoff representation is passed to DiffCSP++/CrySPR for structure generation as described in Section 2.4.
Algorithm 1 Generation of Crystal Structure using Wyckoff Transformer Model

    Load Trained Wyckoff Transformer Model
    Select or sample a space group
    Initialize sequence = [space group]
    loopShouldEnd ← false
    current_token_count ← length(sequence)
    repeat
        if current_token_count ≥ max_length then
            loopShouldEnd ← true
        end if
        if not loopShouldEnd then
            Predict element using Model(sequence)
            if element == STOP then
                loopShouldEnd ← true
            else
                append element to sequence
                current_token_count ← current_token_count + 1
                if current_token_count ≥ max_length then
                    loopShouldEnd ← true
                end if
            end if
        end if
        if not loopShouldEnd then
            Predict site symmetry using Model(sequence)
            if site symmetry == STOP [Less likely, but possible] then
                loopShouldEnd ← true
            else
                append site symmetry to sequence
                current_token_count ← current_token_count + 1
                if current_token_count ≥ max_length then
                    loopShouldEnd ← true
                end if
            end if
        end if
        if not loopShouldEnd then
            Predict enumeration using Model(sequence)
            if enumeration == STOP [Less likely, but possible] then
                loopShouldEnd ← true
            else
                append enumeration to sequence
                current_token_count ← current_token_count + 1
                if current_token_count ≥ max_length then
                    loopShouldEnd ← true
                end if
            end if
        end if
    until loopShouldEnd = true
    Convert the generated sequence of (element, site symmetry, enumeration) tokens into a list of {element, Wyckoff position letter} pairs for the chosen space group.
    Use the PyXtal library with the Wyckoff representation to create an initial 3D crystal structure.
    Relax the structure with an MLIP (CHGNet, etc.), DiffCSP++, or DFT
    return the crystal structure.
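The token loop of Algorithm 1 can be written compactly. The following is a minimal sketch rather than the released implementation: `predict` is a hypothetical stand-in for the trained model, here replaced by a scripted toy so that the example runs without any weights, and the conversion to Wyckoff letters and the relaxation steps are omitted.

```python
STOP = "STOP"
PARTS = ("element", "site_symmetry", "enumeration")


def generate_tokens(predict, space_group, max_length):
    """Autoregressively build [space_group, element, site_symmetry, enumeration, ...].

    `predict(sequence, part)` is a hypothetical callable standing in for the
    trained model; it returns the next value for `part`, or STOP.
    """
    sequence = [space_group]
    while True:
        for part in PARTS:
            if len(sequence) >= max_length:   # length cap, checked before every prediction
                return sequence
            value = predict(sequence, part)
            if value == STOP:                 # STOP may arrive at any part
                return sequence
            sequence.append(value)


# Scripted toy "model": two Wyckoff tokens for space group 139, then STOP.
script = iter(["Nd", "4/mmm", 0, "Al", "mm2.", 0, STOP])


def toy_predict(sequence, part):
    return next(script)


result = generate_tokens(toy_predict, 139, max_length=22)
print(result)
```

As in the listing, a STOP emitted in the middle of a token leaves an incomplete trailing token in the sequence; how such a tail is handled during conversion is not specified here and is left out of the sketch.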
Algorithm 2 Wyckoff Transformer Training Algorithm

    Require: Training dataset D_train (crystal structures represented as sequences of tokens: space group + list of [element, site symmetry, enumeration])
    Initialize Wyckoff Transformer Model M with random weights
    Initialize Optimizer O (e.g., SGD)
    for epoch = 1 to MaxEpochs do
        for each crystal structure sequence S in D_train do
            S_aug ← S
            Randomly shuffle the order of [element, site symmetry, enumeration] tokens within S_aug {Augmentation for permutation invariance}
            Randomly choose one of the equivalent Wyckoff representations for S_aug
            pos_target ← Randomly pick a position in S_aug to predict
            part_target ← Randomly pick which part of the token (element, site symmetry, or enumeration) to predict at pos_target
            Replace part_target at pos_target in S_aug with a MASK token
            Also mask any subsequent parts of the token at pos_target, remove the tokens after it
            P_pred ← M(S_aug) {Forward Pass: Model predicts the masked part}
            V_actual ← the true value of part_target at pos_target in the original S_aug
            L ← CrossEntropyLoss(P_pred, V_actual) {Use multi-class variant if part_target is element}
            Calculate gradients ∇L based on the loss L w.r.t. M's parameters
            Update M's weights using O(∇L)
        end for
        Optional: Validate model M's performance on a separate validation dataset D_val periodically.
        Optional: Adjust learning rate or implement early stopping based on validation performance.
    end for
    return Wyckoff Transformer Model M.

Figure 8. Distribution of CHGNet-predicted energy standard deviation across six random PyXtal initializations for 1000 Wyckoff representations.
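The data-augmentation part of Algorithm 2 (shuffling, choosing a target, masking, and truncating) can be sketched in plain Python. This is an illustration under stated assumptions: the function name and the tuple layout are hypothetical, the choice among equivalent Wyckoff representations is omitted, and the model, loss, and optimizer are out of scope.

```python
import random

MASK = "MASK"
PARTS = ("element", "site_symmetry", "enumeration")


def make_training_example(space_group, tokens, rng):
    """Return (model_input, part_name, target_value) for one masked prediction.

    `tokens` is a list of (element, site_symmetry, enumeration) tuples.
    Picking one of the equivalent Wyckoff representations is omitted here.
    """
    shuffled = list(tokens)
    rng.shuffle(shuffled)                    # permutation augmentation
    pos = rng.randrange(len(shuffled))       # position of the token to predict
    part_idx = rng.randrange(len(PARTS))     # which part of that token to predict
    target = shuffled[pos][part_idx]
    # Keep the parts before the target; mask the target and every later part.
    masked = tuple(v if i < part_idx else MASK
                   for i, v in enumerate(shuffled[pos]))
    # Tokens after the target position are removed.
    model_input = [space_group] + shuffled[:pos] + [masked]
    return model_input, PARTS[part_idx], target


tokens = [("Nd", "4/mmm", 0), ("Al", "mm2.", 0), ("Al", "mm2.", 1), ("Cu", "..2/m", 0)]
model_input, part, target = make_training_example(139, tokens, random.Random(0))
print(model_input, part, target)
```

Because a fresh permutation, position, and part are drawn for every example, the same structure yields many different prediction tasks over the course of training.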
Algorithm 3 Model Forward Pass

    Define element embedding layer (lookup table)
    Define site symmetry embedding layer (lookup table)
    Define enumeration embedding layer (lookup table)
    Define space group embedding layer (special encoding + linear layer)
    Define embedding mixer (linear layer) {Mixes concatenated [element, site symmetry, enumeration] embeddings}
    Define transformer encoder block (standard Transformer Encoder layers, NO positional encoding)
    Define element prediction head (fully-connected neural network)
    Define site symmetry prediction head (fully-connected neural network)
    Define enumeration prediction head (fully-connected neural network)

    Function ModelForward(space_group_token, sequence_of_wyckoff_tokens)
        space_group_embedding ← space_group_embedding_layer(space_group_token)
        wyckoff_embeddings_list ← []
        for each token in sequence_of_wyckoff_tokens do
            element_emb ← element_embedding_layer(token.element)
            site_sym_emb ← site_symmetry_embedding_layer(token.site_symmetry)
            enum_emb ← enumeration_embedding_layer(token.enumeration)
            concatenated_emb ← concatenate(element_emb, site_sym_emb, enum_emb)
            mixed_wyckoff_emb ← embedding_mixer(concatenated_emb)
            append mixed_wyckoff_emb to wyckoff_embeddings_list
        end for
        full_sequence_embeddings ← concatenate(space_group_embedding, wyckoff_embeddings_list)
        transformer_output_sequence ← transformer_encoder_block(full_sequence_embeddings)
        target_embedding ← transformer_output_sequence.last {the masked token is the last one}
        Optional: target_embedding ← concatenate(target_embedding, presence_vector)
        if predicting element then
            predicted_probabilities ← element_prediction_head(target_embedding)
        else if predicting site symmetry then
            predicted_probabilities ← site_symmetry_prediction_head(target_embedding)
        else if predicting enumeration then
            predicted_probabilities ← enumeration_prediction_head(target_embedding)
        end if
        return predicted_probabilities

*enumerations* variant on every batch; this can help to avoid a sharp minimum even when gradients are computed over the whole dataset. For comparison, training DiffCSP++ took 19.5 hours and 32000 MiB of GPU memory.

E. Inference speed

We conducted experiments on a machine with an NVIDIA RTX 6000 Ada GPU and 24 physical CPU cores. For baselines, we used the source code, model hyperparameters and weights published by the authors. Assuming that the downstream costs of structure relaxation by DFT or a machine-learning interatomic potential are fixed, the inference cost per S.U.N. structure is presented in Table 6.

Table 6. Inference time per S.U.N. structure. When a GPU is running, it also occupies a CPU core, which is taken into account. S.U.N. rates are measured according to DFT stability estimation. CHGNet is not used anywhere; for WyFormerRaw we sample a structure with PyXtal and use it directly as an input for DFT.

| Method | S.U.N. (%) | GPU ms per structure | GPU ms per S.U.N. | CPU s per structure | CPU s per S.U.N. |
|---|---|---|---|---|---|
| WyFormerRaw | 4.8 | 0.05 | 1.0 | 0.105 | 2.2 |
| WyForDiffCSP++ | 12.8 | 840 | 5957 | 0.940 | 6.7 |
| DiffCSP | 19.7 | 360 | 1731 | 0.360 | 1.73 |
| DiffCSP++ | 7.6 | 1250 | 14705 | 1.35 | 15.9 |

Generating a batch of Wyckoff representations takes 25 seconds, of which 5 seconds are spent generating PyTorch tensors, and 20 seconds on decoding them into Python dictionaries containing Wyckoff representations. The latter part has not been optimized. In total, generation takes 0.05 GPU ms and 4.8 CPU ms per structure. Obtaining unrelaxed structures using PyXtal takes 100 CPU ms per structure. Relaxing the structure is the most expensive step. DiffCSP++ takes 14 minutes to produce 1000 structures, at 840 GPU ms per structure. Note that we modified the code to remove the inference of atom types, so it runs faster compared to the original version.
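The per-S.U.N. figures for the WyFormerRaw row can be reproduced from the numbers quoted in the text. The sketch below assumes that the cost per S.U.N. structure is the cost per generated structure divided by the S.U.N. fraction, and that the CPU cost per structure is the sum of the generation and PyXtal times; both are readings of the text rather than a documented formula, and only the WyFormerRaw row is checked here.

```python
def cost_per_sun(cost_per_structure, sun_rate_percent):
    """Cost of one S.U.N. structure, given the cost of one generated structure."""
    return cost_per_structure / (sun_rate_percent / 100.0)


sun_rate = 4.8                             # S.U.N. rate of WyFormerRaw, in percent
gpu_ms_per_structure = 0.05                # generation, GPU ms per structure
cpu_s_per_structure = 4.8e-3 + 100e-3      # generation + PyXtal, CPU s per structure

gpu_ms_per_sun = cost_per_sun(gpu_ms_per_structure, sun_rate)
cpu_s_per_sun = cost_per_sun(cpu_s_per_structure, sun_rate)
print(round(gpu_ms_per_sun, 1), round(cpu_s_per_sun, 1))
```

After rounding, these agree with the 1.0 GPU ms and 2.2 CPU s listed for WyFormerRaw.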
CHGNet: 112 GPU s / structure for MP-20 on NVIDIA A40.
DiffCSP: the authors don't report speed. On our machine, generating 10000 structures on GPU took 1 hour, at 360 GPU ms per structure.
DiffCSP++: the authors don't report speed. On our machine, generating 27135 structures took 6 hours, at 1.25 CPU+GPU seconds per structure.
CrystalFormer; Cao et al. (2024): "It takes 520 seconds to generate a batch size 13,000 crystal samples on a single A100 GPU, which translates to a generation speed of 40 ms per sample."
FlowMM: the authors also do not publish inference time or model weights. They claim to be 3x faster than DiffCSP in terms of integration steps.
WyCryst; Zhu et al. (2024): latent space sampling 1 CPU second / 2000 structures; PyXtal generation 2 CPU core seconds / structure.

Figure 10 contains the number of unique elements per structure for MP-20 and novel generated structures.

[Figure 9 panels: (a) stable, (b) unstable. Structure labels: K2AsAuS3, C2 (5); EuCoSi3, I4mm (107); Ce5Bi3, P63/mcm (193); CeTlS2, R-3m (166); NdBi3, Pm-3m (221); K2NaAgF6, Fm-3m (225); NdSbS, P4/nmm (129); DySeBr, Pnma (62); Li6Mn3O8, P1 (1); Nd2Cl3, R-3c (167).]

Figure 9. 10 structures generated from WyFormerDiffCSP++ and presented without additional relaxation. The labels contain the chemical formula, followed by the space group symbol in the short Hermann-Mauguin notation and the space group number. To the left, 8 structures were randomly chosen from 15 stable structures as validated by DFT calculations; to the right, 2 from unstable structures. The solid box lines represent the primitive cell.

[Figure 10 plot: x-axis: unique elements per structure (1, 2, 3, 4, 5, 6+); y-axis: normalized frequency; series: WyFormer, CrystalFormer, DiffCSP++, DiffCSP, FlowMM, WyCryst, MP-20.]

Figure 10. Distribution of the number of unique elements per structure for MP-20 and novel generated structures.

[Figure 11 plot: x-axis: root mean squared deviation (Å); y-axis: empirical cumulative density function; series: DiffCSP++, DiffCSP, WyFormer/CHGNet, WyFormer Raw, WyForDiffCSP++, WyCryst/CHGNet.]

Figure 11. The empirical cumulative density function (ECDF) for root mean squared deviation (RMSD) of DFT-unrelaxed structures from their DFT-relaxed counterparts. RMSD is calculated using pymatgen.analysis.structure_matcher.StructureMatcher, in which only the RMSD of matched structure pairs is reported. WyFormer/CHGNet and WyCryst/CHGNet refer to the models that use CHGNet-relaxed structures as inputs for DFT relaxations, while WyFormer Raw refers to WyFormer directly using PyXtal-generated unrelaxed structures (Section C).

G. Energy above hull calculations

For CHGNet, to obtain the $E_{\text{hull}}$, we first constructed the reference convex hull data by querying all 153235 structures from the Materials Project (MP); then, using the pymatgen.analysis.phase_diagram sub-module, the $E_{\text{hull}}$ for each generated structure was computed with reference to the MP convex hull, $E_{\text{hull}} = \max\{E_i\}$, where $E_i$ is the decomposition energy of any possible path for a structure decomposing into the reference convex hull. For the DFT data, the MP convex hull 2023-02-07-ppd-mp.pkl.gz distributed by matbench-discovery (Riebesell et al., 2023) was used as the reference hull.

H. DFT details

We use DFT settings from Materials Project https://docs.materialsproject.org/methodology/materials-methodology/calculation-details/gga+u-calculations/parameters-and-convergence for structure relaxation and energy computation. In particular, we do GGA and GGA+U calculations with atomate2.vasp.flows.mp.MPGGADoubleRelaxStaticMaker (Ganose et al., 2025), which in turn relies on pymatgen.io.vasp.sets.MPRelaxSet and pymatgen.io.vasp.sets.MPStaticSet (Ong et al., 2013). Computations themselves were done with VASP (Kresse & Furthmüller, 1996) version 5.4.4
with the plane-wave basis set (Kresse & Furthmüller, 1996). The electron-ion interaction is described by the projector augmented wave (PAW) pseudo-potentials (Kresse & Joubert, 1999). The exchange-correlation of valence electrons is treated with the Perdew-Burke-Ernzerhof (PBE) functional within the generalized gradient approximation (GGA) (Perdew et al., 1996).

For a small fraction (1–15%) of the generated structures, the DFT failed to converge. We consider such structures to be unstable for the purposes of S.U.N. computation. The effect is especially strong for CrystalFormer, as 13% of the structures it generates are structurally invalid, that is, have atoms closer than 0.5 Å. The 105-sample relaxations used structures as produced by the ML models. For the 10 000-sample run we used CHGNet pre-relaxation to speed up the computations. The raw total energies computed by DFT were corrected with MaterialsProject2020Compatibility before being passed to PhaseDiagram to obtain the DFT $E_{\text{hull}}$.

H.1. DFT setting difference between Materials Project and Miller et al. (2024)

The settings used by Miller et al. (2024) are not entirely the same as the ones used by the Materials Project, us, and Zeni et al. (2025). Both in the paper (section A.7.) and in the code https://github.com/facebookresearch/flowmm/blob/6a96aec3b6eba89f6fa07436f0c8837979abb285/scripts_analysis/dft_create_inputs.py#L43, Miller et al. (2024) refer to using pymatgen.io.vasp.sets.MPRelaxSet, and doing so only once. The Materials Project procedure consists of relaxing the cell and all atomic positions two times in consecutive runs, followed by a more precise static calculation https://docs.materialsproject.org/methodology/materials-methodology/calculation-details. To estimate the effect and make a direct comparison, in Table 1 we report the S.U.N.
obtained from structures relaxed with a single MPRelaxSet in (brackets) and our more accurate MPGGADoubleRelaxStaticMaker result without.

I. Legacy metrics

For completeness, in Table 7 we present the metrics computed following the protocol set up by Xie et al. (2021). We reiterate the issues with it. Firstly, the metrics are negatively correlated with structure novelty, the raison d'être for material generative models. Secondly, filtering by charge neutrality, also known as compositional validity, means discarding viable structures.

Table 7. Method comparison according to the protocol set up by Xie et al. (2021). COV-P depends on the generated sample size, so to compute it we uniformly subsample 1k structures.

(a) Directly using structures produced by the methods, without additional relaxation. Note that CHGNet is an integral part of generating structures with Wyckoff Transformer and WyCryst, so it is used.

| Method | Struct. validity (%) | Comp. validity (%) | COV-R (%) | COV-P (%) | EMD ρ | EMD E | EMD Nelem |
|---|---|---|---|---|---|---|---|
| Wyckoff Transformer | 99.60 | 81.40 | 98.77 | 95.94 | 0.39 | 0.078 | 0.081 |
| WyFormerDiffCSP++ | 99.80 | 81.40 | 99.51 | 95.81 | 0.36 | 0.083 | 0.079 |
| DiffCSP++ | 99.94 | 85.13 | 99.67 | 95.71 | 0.31 | 0.069 | 0.399 |
| CrystalFormer | 93.39 | 84.98 | 99.62 | 94.56 | 0.19 | 0.208 | 0.128 |
| SymmCD | 100.00 | 86.27 | 99.50 | 94.82 | 0.06 | 0.160 | 0.402 |
| WyCryst | 99.90 | 82.09 | 99.63 | 96.16 | 0.44 | 0.330 | 0.322 |
| DiffCSP | 100.00 | 83.20 | 99.82 | 96.84 | 0.35 | 0.095 | 0.347 |
| FlowMM | 96.87 | 83.11 | 99.73 | 95.59 | 0.12 | 0.073 | 0.094 |

(b) All structures have been relaxed with CHGNet. Note that for some models we didn't compute CHGNet relaxation for all the structures, so the sample size is smaller.

| Method | Struct. validity (%) | Comp. validity (%) | COV-R (%) | COV-P (%) | EMD ρ | EMD E | EMD Nelem |
|---|---|---|---|---|---|---|---|
| Wyckoff Transformer | 99.60 | 81.40 | 98.77 | 95.94 | 0.39 | 0.078 | 0.081 |
| WyTransDiffCSP++ | 99.70 | 81.40 | 99.26 | 95.85 | 0.33 | 0.070 | 0.078 |
| DiffCSP++ | 100.00 | 85.80 | 99.42 | 95.48 | 0.13 | 0.036 | 0.453 |
| CrystalFormer | 89.92 | 84.88 | 99.87 | 95.45 | 0.19 | 0.139 | 0.119 |
| SymmCD | 95.49 | 85.86 | 99.19 | 96.05 | 0.32 | 0.095 | 0.392 |
| WyCryst | 99.90 | 82.09 | 99.63 | 96.16 | 0.44 | 0.330 | 0.322 |
| DiffCSP | 100.00 | 82.50 | 99.64 | 95.18 | 0.46 | 0.075 | 0.321 |
| FlowMM | 100.00 | 82.83 | 99.71 | 95.83 | 0.17 | 0.046 | 0.093 |

J. Template Novelty and Diversity

The impact of template novelty on the diversity of the generated data can be assessed by evaluating the number of unique structures as a function of the total sample size. We sampled 118k examples from the model with the lowest template novelty, DiffCSP++, and the highest, WyFormer. We present the number of unique samples as a function of the generated sample size in Figure 12. DiffCSP++ uniqueness is clearly lower; due to its high inference costs (see Table 6), we were unable to prepare a larger sample.

[Figure 12 plots: left panel: fraction of unique structures vs. sample size (logarithmic axis, 10^2 to 10^5); right panel: total number of unique structures vs. sample size (up to 1e5); series: DiffCSP++, Wyckoff Transformer.]

Figure 12. Fraction of unique structures and total number of unique structures as a function of sample size. For Wyckoff Transformer we used only the Wyckoff representations for uniqueness assessment, meaning that the uniqueness is likely to be slightly underestimated.

K. Evaluation on MP-20 binary & ternary

A comparison of WyFormer to WyCryst is presented in Tables 8 and 9. Both models were trained on a subset of the MP-20 training data containing only binary and ternary structures, and a similarly selected subset of the MP-20 testing dataset is used as the reference for property distributions. All generated structures were relaxed with CHGNet. WyFormer outperforms WyCryst across the board. S.U.N.
values are close, but this is achieved by WyCryst sacrificing sample diversity and property similarity metrics, with about half of the generated structures already existing in the training dataset.

Table 8. Evaluation of the methods according to the symmetry metrics. Aside from Template Novelty, metrics are computed only using novel structurally valid structures. Stability estimated with CHGNet.

| Method | Template Novelty (%) | P1 (%), ref = 1.7 | Space Group χ2 | S.S.U.N. (%) |
|---|---|---|---|---|
| WyFormer | 25.63 | 1.43 | 0.224 | 37.9 |
| WyCryst | 18.51 | 4.79 | 0.815 | 35.2 |

Table 9. Evaluation of the methods according to validity and property distribution metrics. Following the reasoning in Section 3.1.2, we apply filtering by novelty and structural validity, and do not discard structures based on compositional validity. Validity is also computed only for novel structures. Stability estimated with CHGNet.

| Method | Novelty (%) | Struct. validity (%) | Comp. validity (%) | COV-R (%) | COV-P (%) | EMD ρ | EMD E | EMD Nelem | S.U.N. (%) |
|---|---|---|---|---|---|---|---|---|---|
| WyFormer | 91.19 | 99.89 | 77.28 | 98.90 | 96.75 | 0.83 | 0.064 | 0.084 | 38.4 |
| WyCryst | 52.62 | 99.81 | 75.53 | 98.85 | 89.27 | 1.35 | 0.128 | 0.003 | 36.6 |

L. Hyperparameters

L.1. Next token prediction MP-20

WP representation: Site symmetry + enumeration
Element embedding size: 16
Site symmetry embedding size: 16
Site enumerations embedding size: 8
Number of fully-connected layers: 3
Number of attention heads: 4
Dimension of feed forward layers inside Encoder: 128
Dropout inside Encoder: 0.2
Number of Encoder layers: 3
Loss function: Cross Entropy, multi-class for element, single-class for other token parts, no averaging
Batch size: 27136 (full MP-20 train)
Optimizer: SGD
Initial learning rate: 0.2
Scheduler: ReduceLROnPlateau
Scheduler patience: 2 × 10^4 epochs
Early stopping patience: 10^5 epochs of no improvement in validation loss
clip_grad_norm: max_norm=2

L.2. Energy prediction MP-20

WP representation: Site symmetry + harmonics
Element embedding size: 32
Site symmetry embedding size: 64
Harmonics vector size: 12
Embedding dropout: 0.03
Number of fully-connected layers: 3
Fully-connected dropout: 0
Number of attention heads: 4
Dimension of feed forward layers inside Encoder: 128
Dropout inside Encoder: 0.1
Number of Encoder layers: 4
Loss function: Mean squared error (MSE), averaged over batch
Batch size: 1000
Optimizer: Adam
Initial learning rate: 0.001
Scheduler: ReduceLROnPlateau
Scheduler patience: 200 epochs
Scheduler factor: 0.5
Early stopping patience: 10^3 epochs of no improvement in validation loss
clip_grad_norm: max_norm=2

L.3. Band gap prediction MP-20

WP representation: Site symmetry + harmonics
Element embedding size: 32
Site symmetry embedding size: 64
Harmonics vector size: 12
Embedding dropout: 0.05
Number of fully-connected layers: 3
Fully-connected dropout: 0.03
Number of attention heads: 4
Dimension of feed forward layers inside Encoder: 128
Dropout inside Encoder: 0.2
Number of Encoder layers: 1
Loss function: Mean squared error (MSE), averaged over batch
Batch size: 1000
Optimizer: Adam
Initial learning rate: 0.001
Scheduler: ReduceLROnPlateau
Scheduler patience: 200 epochs
Scheduler factor: 0.5
Early stopping patience: 10^3 epochs of no improvement in validation loss
clip_grad_norm: max_norm=2

M. Fine-tuning LLM with Wyckoff representation

To challenge Wyckoff Transformer's architecture, we compared it with pre-trained language models used both in vanilla mode and after fine-tuning, essentially combining the approach of Gruver et al. (2024) with the Wyckoff representation.
We explored two different textual representations of crystals corresponding to a given space group:

Naive, which specifies the atoms at particular Wyckoff positions, encoded by Wyckoff letters:
Na at a, Na at a, Na at a, Mn at a, Co at a, Ni at a, O at a, O at a, O at a, O at a, O at a, O at a

Augmented, which specifies each atom type together with its site symmetry and site enumeration:
Na @ m @ 0, Na @ m @ 0, Na @ m @ 0, Mn @ m @ 0, Co @ m @ 0, Ni @ m @ 0, O @ m @ 0, O @ m @ 0, O @ m @ 0, O @ m @ 0, O @ m @ 0, O @ m @ 0,
where the set of valid symmetries is:
[2.22, 4/mmm, 1, -3.., 6mm, m-3m, 2, 3mm, .m, -6mm2m, 4mm, .32, 322, .2/m., -1, .m., ..m, m.2m, .3m, 3m, m2m., 2mm, -32/m., 2.., ..2, .3., 2/m, -43m, 4/mm.m, .2., 2/m2/m., 23., 222, m.., mm., -3., m-3., 3., 4/m.., .-3m, 2m., -32/m, -42m, m.mm, 4.., m.m2, 422, 32., 22., -622m2, 3m., .-3., mmm.., 222., mm2.., -4m2, 2/m.., mm2, -3m2/m, -4m.2, 2mm., 3.., -42.m, ..2/m, 4m.m, -4.., 6/mm2/m, m2m, m2., 2.mm, mmm., mmm, 32, m, -6..]

We fine-tuned the OpenAI gpt-4o-mini-2024-07-18 model using the different representations and compared it with the vanilla OpenAI gpt-4o-2024-08-06 model. In each case, the prompt had the form: Provide example of a material for spacegroup number X. The table below contains details of the model training:

| Model | Base Model | Representation | Hyperparameters | Training Time | Inference Time | Number of Parameters |
|---|---|---|---|---|---|---|
| WyLLM-vanilla | gpt-4o-2024-08-06 | Naive | - | - | 74m | 200B |
| WyLLM-naive | gpt-4o-mini-2024-07-18 | Naive | epochs: 1, batch: 24, learning rate multiplier: 1.8 | 51m | 51m | 8B |
| WyLLM-site-symmetry | gpt-4o-mini-2024-07-18 | Site Symmetry | epochs: 1, batch: 24, learning rate multiplier: 1.8 | 95m | 37m | 8B |

Table 10. Comparison of different models and their characteristics. The number of parameters is not known exactly and is taken from public sources as an approximate estimate. For reference, WyFormer has 150k parameters.
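The two textual representations and the prompt described in this section can be illustrated with a short sketch. The `WyckoffSite` container and the function names are hypothetical and introduced only for illustration; the actual fine-tuning pipeline, including the JSON output format, is not specified here and is not reproduced.

```python
from typing import List, NamedTuple


class WyckoffSite(NamedTuple):
    """Hypothetical container for one occupied Wyckoff position."""
    element: str
    letter: str          # Wyckoff letter, used by the Naive representation
    site_symmetry: str   # site symmetry symbol, used by the Augmented one
    enumeration: int     # index among positions sharing this site symmetry


def naive_text(sites: List[WyckoffSite]) -> str:
    """Naive representation: element and Wyckoff letter."""
    return ", ".join(f"{s.element} at {s.letter}" for s in sites)


def augmented_text(sites: List[WyckoffSite]) -> str:
    """Augmented representation: element, site symmetry, enumeration."""
    return ", ".join(
        f"{s.element} @ {s.site_symmetry} @ {s.enumeration}" for s in sites
    )


def prompt(space_group_number: int) -> str:
    """Prompt template quoted in the text, with X filled in."""
    return f"Provide example of a material for spacegroup number {space_group_number}."


# The example from the text: twelve atoms on the same Wyckoff position.
elements = ["Na"] * 3 + ["Mn", "Co", "Ni"] + ["O"] * 6
sites = [WyckoffSite(e, "a", "m", 0) for e in elements]

print(naive_text(sites))
print(augmented_text(sites))
```

Both strings carry the same elements; they differ only in whether a position is named by its letter or by its site symmetry and enumeration.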
Both training and inference times were measured using batch job execution on OpenAI's cloud. The fine-tuned model returned a JSON string that was easy to parse, while the vanilla model required additional parsing of its output. A comparison of WyFormer to WyLLM is presented in Table 11. When fine-tuned, an LLM using Wyckoff representations shows performance similar to WyFormer at a much greater computational cost. Using site symmetries instead of Wyckoff letters doesn't unequivocally increase the LLM performance; a possible explanation is that, since this representation is our original proposition, the LLM is less able to take advantage of pre-training that contained letter-based Wyckoff representations. Without fine-tuning, the majority of LLM outputs are formally invalid, and the distribution of the valid ones doesn't match MP-20.

N. Ablation study: letters vs site symmetries

To evaluate the effect of using a representation based on site symmetry, as opposed to Wyckoff letters, we trained a WyFormer model with the same hyperparameters, but using Wyckoff letters instead of the site symmetry + enumeration representation. The letter-based variant underperforms, as shown in Table 12.

Table 11. Comparison of WyFormer to different variants of WyLLM. All structures have been relaxed with DiffCSP++. Sample size is 1000 structures per model. The metrics are described in Section 3.1.2. nan is placed where the generated structures contained a rare element that crashed the property computation code. Wyckoff Validity refers to the percentage of the generated outputs that are valid Wyckoff representations. Aside from LLM-specific problems, such as non-existent elements, a Wyckoff representation can be invalid if it places several atoms at a Wyckoff position without degrees of freedom, or refers to Wyckoff positions that do not exist in the space group. Stability computed with DFT.
| Method | Novelty (%) | Struct. validity (%) | Comp. validity (%) | COV-R (%) | COV-P (%) | EMD ρ | EMD E | EMD Nelem |
|---|---|---|---|---|---|---|---|---|
| WyFormer | 89.50 | 99.66 | 80.34 | 99.22 | 96.79 | 0.67 | 0.050 | 0.098 |
| WyLLM-naive | 94.67 | 99.79 | 82.89 | 98.72 | 94.97 | 0.39 | 0.067 | 0.015 |
| WyLLM-vanilla | 95.59 | 99.82 | 88.75 | 94.46 | 59.67 | 2.23 | 0.234 | 0.253 |
| WyLLM-site-symmetry | 89.58 | 99.89 | 83.89 | 99.44 | 96.32 | 0.29 | nan | 0.039 |

| Method | Wyckoff Validity (%) | Novel Unique Templates (#) | P1 (%), ref = 1.7 | Space Group χ2 | S.U.N. (%) | S.S.U.N. (%) |
|---|---|---|---|---|---|---|
| WyFormer | 97.8 | 186 | 1.46 | 0.212 | 22.2 | 21.3 |
| WyLLM-naive | 94.9 | 237 | 1.38 | 0.167 | 11.7 | 11.7 |
| WyLLM-vanilla | 28.7 | 87 | 2.03 | 0.621 | - | - |
| WyLLM-site-symmetry | 89.6 | 191 | 2.24 | 0.158 | - | - |

| Method/Metric | Novel Unique Templates (#) | P1 (%), ref = 1.7 | Space Group χ2 | S.U.N. (%) | S.S.U.N. (%) |
|---|---|---|---|---|---|
| WyFormerDiffCSP++ | 186 | 1.46 | 0.21 | 22.2 | 21.1 |
| WyFormer-letters-DiffCSP++ | 250 | 1.16 | 0.21 | 16.0 | 16.0 |

Table 12. WyFormer using Wyckoff letters (WyFormer-letters-DiffCSP++) vs WyFormer using site symmetry + enumeration (WyFormerDiffCSP++).

O. Performance analysis of encoding WPs with spherical harmonics

To assess the impact of spherical harmonics, we compare the performance of models with the same set of hyperparameters on the property prediction task on MP-20, leaving the generative performance comparison for future work. The results are presented in Table 13, and the hyperparameters in Table 14.

Table 13. Performance of WyFormer with different representations. The values are slightly different from Table 3, as there we tuned the hyperparameters.
| Representation | Energy MAE, meV | Band Gap MAE, meV |
|---|---|---|
| Site symmetry only | 31.7 | 247.8 |
| Wyckoff letter | 30.5 | 234.0 |
| Site symmetry & Enumeration | 30.7 | 244.1 |
| Site symmetry & Harmonics | 29.7 | 238.7 |

| Parameter | Value |
|---|---|
| Element embedding size | 16 |
| Wyckoff letter embedding size | 27 |
| Site symmetry embedding size | 16 |
| Site enumerations embedding size | 7 |
| Harmonic vector length | 12 |
| Batch size | 500 |
| Number of fully-connected layers | 3 |
| Number of attention heads | 4 |
| Dimension of feed-forward layers inside Encoder | 128 |
| Dropout inside Encoder | 0.2 |
| Number of Encoder layers | 3 |

Table 14. Hyperparameters used in the ablation study.

P. Sampling harmonic-encoded WPs

The WP harmonic representation is a real-valued vector. For each space group, however, it can only take up to 8 possible values, so learning the full distribution of such vectors is not necessary. We tried the following procedure:

1. Take the harmonic representations of all the WPs in all space groups.
2. Use K-means clustering to find 8 cluster centers.
3. Separately for each space group, assign harmonic labels to each enumeration:
(a) Compute the Euclidean distances between all cluster centers and all WPs in the SG.
(b) Choose the smallest distance. Assign the WP to the corresponding cluster, and remove the WP and the cluster center from consideration.
(c) Repeat until all WPs are assigned.

This way we obtain a discrete prediction target with a one-to-one mapping to enumerations, but where physically similar values are grouped together. Experimentally, however, this modification reduces performance. When predicting spherical harmonics clusters, S.U.N. based on 1k CHGNet-relaxed structures was 34.0%, as compared to 36.6% for the enumerations-based model; S.U.N. based on 105 DFT structures was 19.1% vs 22.2%.

Q. Superconductor critical temperature prediction

We used WyFormer to predict the critical temperature of superconductors on the 3DSC dataset (Sommer et al., 2023) and obtained a test MSLE of 0.81.

R. Token analysis

R.1. WyFormer tokens

Tokens are formed from three parts: (element, site symmetry, enumeration), for example: (O, .m., 0). Considering all choices of space group Euclidean normalizer, there are 10904 unique tokens in MP-20. The distribution for MP-20 is presented in Figure 13; for MPTS-52 in Figure 14.

[Figure 13 plots: top panel: "Token frequency in MP-20 for top 30 tokens"; bottom panel: "Token frequency distribution in MP-20 for 10901 tokens with frequency < 0.0", x-axis: token frequency in MP-20, y-axis: number of tokens.]

Figure 13. Distribution of tokens in MP-20.

[Figure 14 plots: top panel: "Token frequency in MPTS-52 for top 30 tokens"; bottom panel: "Token frequency distribution in MPTS-52 for 11066 tokens with frequency < 0", x-axis: token frequency in MPTS-52, y-axis: number of tokens.]

Figure 14. Distribution of tokens in MPTS-52.

R.2. Template tokens

In this section, we consider a different token structure, (space group number, site symmetry, enumeration), which we will call a template token. It does not correspond to the token structure inside WyFormer, but the analysis of such tokens is interesting from the template novelty point of view. Considering all choices of space group Euclidean normalizer, there are 1047 unique template tokens in MP-20.
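The two token definitions used in this section can be made concrete with a small counting sketch. The toy structures below are hypothetical and only illustrate the bookkeeping; the sketch does not enumerate the choices of Euclidean normalizer and does not reproduce the MP-20 counts.

```python
from collections import Counter

# Hypothetical toy data: each structure is (space_group_number, sites),
# and each site is (element, site_symmetry, enumeration).
train = [
    (225, [("Na", "m-3m", 0), ("Cl", "m-3m", 1)]),
    (62, [("O", ".m.", 0), ("O", "1", 0), ("Ti", ".m.", 0)]),
]
generated = [
    (225, [("K", "m-3m", 0), ("Cl", "m-3m", 1)]),
    (62, [("O", ".m.", 1), ("Fe", ".m.", 0)]),
]


def wyformer_tokens(structures):
    """Tokens as in R.1: (element, site symmetry, enumeration)."""
    return [site for _, sites in structures for site in sites]


def template_tokens(structures):
    """Template tokens as in R.2: (space group, site symmetry, enumeration)."""
    return [
        (space_group, symmetry, enumeration)
        for space_group, sites in structures
        for _, symmetry, enumeration in sites
    ]


def frequencies(tokens):
    """Relative frequency of every distinct token."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


train_token_freq = frequencies(wyformer_tokens(train))
train_templates = set(template_tokens(train))
new_templates = set(template_tokens(generated)) - train_templates

print(len(train_token_freq), "unique tokens in the toy training set")
print("template tokens not seen in training:", sorted(new_templates))
```

Comparing the generated template tokens against the training set in this way is the kind of bookkeeping behind a count of new template tokens.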
The distribution is plotted in Figure 15. Wyckoff Transformer generates template tokens not present in the training and validation datasets. For a sample size of 9046, it produced 20 new template tokens; for comparison, the similarly sized test dataset contains 21 new template tokens.

[Figure 15 plots: top panel: "Template token frequency in MP-20 for top 30 tokens"; bottom panel: "Template token frequency distribution in MP-20 for 1033 tokens with frequency < 0.01", x-axis: token frequency in MP-20, y-axis: number of tokens.]

Figure 15. Distribution of template tokens in MP-20.

S. Comparison of enumerations for full Space groups

Figure 16. Different WPs can have a common site symmetry. In this case, they differ in coordinates. The corresponding column lists the coordinate triples of all the included atoms, where x, y, and z are the free parameters that vary from 0 to 1. Such collisions could be resolved using letters. However, as seen in the table, letters are not connected to symmetries and differ significantly between space groups. Therefore, we use an approach that numbers positions within a group of WPs with the same site symmetry. The ordering is performed in accordance with Aroyo et al. (2006).
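The numbering scheme described above can be sketched as follows. The sketch assumes that the Wyckoff positions of one space group are supplied already sorted in the reference order; the function name and the example list are hypothetical and are not taken from the crystallographic tables.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_enumerations(
    wyckoff_positions: List[Tuple[str, str]],
) -> Dict[str, Tuple[str, int]]:
    """Map each Wyckoff letter to (site symmetry, enumeration).

    `wyckoff_positions` is a list of (letter, site_symmetry) pairs for a
    single space group, assumed to be in the reference order. Positions that
    share a site symmetry are numbered 0, 1, 2, ... in that order.
    """
    next_index: Dict[str, int] = defaultdict(int)
    mapping: Dict[str, Tuple[str, int]] = {}
    for letter, site_symmetry in wyckoff_positions:
        mapping[letter] = (site_symmetry, next_index[site_symmetry])
        next_index[site_symmetry] += 1
    return mapping


# Illustrative list only; not checked against any real space group.
example_positions = [
    ("a", "mmm"), ("b", "mmm"),
    ("c", "mm2"), ("d", "mm2"),
    ("e", "m.."),
    ("f", "1"),
]

for letter, (symmetry, enumeration) in assign_enumerations(example_positions).items():
    print(f"{letter}: ({symmetry}, {enumeration})")
```

Because the counter is kept per site symmetry, the pair (site symmetry, enumeration) identifies a position uniquely within a space group, just as the letter does, while the site symmetry part keeps the same meaning across space groups.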