SketchGen: Generating Constrained CAD Sketches

Wamiq Reyaz Para¹  Shariq Farooq Bhat¹  Paul Guerrero²  Tom Kelly³  Niloy Mitra²,⁴  Leonidas Guibas⁵  Peter Wonka¹
¹KAUST  ²Adobe Research  ³University of Leeds  ⁴University College London  ⁵Stanford University

Abstract

Computer-aided design (CAD) is the most widely used modeling approach for technical design. The typical starting point in these designs is a 2D sketch, which can later be extruded and combined to obtain complex three-dimensional assemblies. Such sketches are typically composed of parametric primitives, such as points, lines, and circular arcs, augmented with geometric constraints linking the primitives, such as coincidence, parallelism, or orthogonality. Sketches can be represented as graphs, with the primitives as nodes and the constraints as edges. Training a model to automatically generate CAD sketches can enable several novel workflows, but is challenging due to the complexity of the graphs and the heterogeneity of the primitives and constraints. In particular, each type of primitive and constraint may require a record of different size and parameter types. We propose SketchGen, a generative model based on a transformer architecture, to address the heterogeneity problem by carefully designing a sequential language for the primitives and constraints that allows distinguishing between different primitive or constraint types and their parameters, while encouraging our model to re-use information across related parameters, encoding shared structure. A particular highlight of our work is the ability to produce primitives linked via constraints, which enables the final output to be further regularized via a constraint solver. We evaluate our model by demonstrating constraint prediction for given sets of primitives and full sketch generation from scratch, showing that our approach significantly outperforms the state of the art in CAD sketch generation.
35th Conference on Neural Information Processing Systems (NeurIPS 2021).

1 Introduction

Computer-Aided Design (CAD) tools are popular for modeling in industrial design and engineering. The employed representations are popular due to their compactness, precision, simplicity, and the fact that they are directly understood by many fabrication procedures. CAD models are essentially a collection of primitives (e.g., lines, arcs) that are held together by a set of relations (e.g., collinear, parallel, tangent). Complex shapes are then formed by a sequence of CAD operations, forming a base shape in 2D that is subsequently lifted to 3D using extrusion operations. Thus, at their core, most CAD models are coplanar constrained sketches, i.e., sequences of planar curves linked by constraints. For example, CAD models in some of the largest open-sourced datasets, such as Fusion360 [37] and SketchGraphs [27], are structured this way, as are sketches drawn in algebraic systems like Cinderella [1]. Such sequences are not only useful for shape creation, but also facilitate easy editing and design exploration, as relations are preserved when individual elements are edited.

CAD representations are heterogeneous. First, different instructions have their own unique sets of parameters, with different parameter counts, data types, and semantics. Second, constraint specifications involve links (i.e., pointers) to existing primitives or parts of primitives. Further, the same final shape can be equivalently realized with different sets of primitives and constraints. Thus, CAD models neither come in regular structures (e.g., image grids) nor can they be easily encoded using a fixed-length encoding. The heterogeneous nature of CAD models makes it difficult to adopt existing deep learning frameworks to train generative models. One easy-to-code solution is to simply ignore the constraints and rasterize the primitives into an image.
While such a representation is easy to train, the output is either in the image domain, or needs to be converted to primitives via some form of differentiable rendering. In either case, the constraints and the regularization that parametric primitives provide are lost, and the conversion to primitives is likely to be unreliable. A slightly better option is to add padding to the primitive and constraint representations, and then treat them as fixed-length encodings for each primitive/constraint. However, this only hides the difficulty: the content of the representation is still heterogeneous, the generator now additionally has to learn the length of the padding that needs to be added, and the increased length is inefficient and becomes unworkable for complex CAD models.

We introduce SketchGen as a generative framework for learning a distribution of constrained sketches. Our main observation is that the key to working with a heterogeneous representation is a well-structured language. We develop a language for sketches that is endowed with a simple syntax, where each sketch is represented as a sequence of tokens. We explicitly use the syntax tree of a sequence as additional guidance for our generative model. Our architecture makes use of transformers to capture long-range dependencies and outputs primitives along with their constraints. We can aggressively quantize/approximate the continuous primitive parameters in our sequences as long as the inter-primitive constraints can be correctly generated. A sampled graph can then be converted, if desired, to a physical sketch by solving a constrained optimization using traditional solvers, removing errors that were introduced by the quantization.

We evaluate SketchGen on one of the largest publicly available constrained sketch datasets, SketchGraphs [27]. We compare with a set of existing and contemporary works, and report noticeable improvement in the quality of the generated sketches.
We also present ablation studies to evaluate the efficacy of our various design choices.

2 Related Work

Vector graphics generation. Vector graphics, unlike images, are represented in domain-specific languages (e.g., SVG) and are not in a format easily utilized by standard deep learning setups. Recently, various approaches have been proposed that link images and vectors using differentiable renderers [19, 26] or that are supervised with vector sequences (e.g., DeepSVG [4], SVG-VAE [20], and Cloud2Curve [5]). DeepSVG, most closely related to our work, proposes a hierarchical non-autoregressive model for generation, building on command, coordinate, and index embeddings.

Structured model generation. Geometric layouts are often represented as graphs encoding relationships between geometric entities. Motivated by the success in image synthesis, several authors attempted to build on generative adversarial networks [17, 22] or variational autoencoders [15]. Recently, autoregressive models like the transformer architecture [31] emerged as an important tool for generative modeling of layouts, e.g., [34, 23, 36]. Closely related to our work are DeepSVG [4] and PolyGen [21]. These solutions also model variable-length commands, but they simplify the representation so that commands are padded to a fixed length. PolyGen is an autoregressive generative model for 3D meshes that introduces several influential ideas. One of the proposed ideas, which we also build on, is to employ pointer networks [32] to refer to previously generated objects (i.e., linking vertices to form polygonal mesh faces). Unlike PolyGen, we work with heterogeneous primitives and constraints, and with edges that constrain the primitive parameters, motivating our constraint optimization step.

CAD datasets and CAD generation. Recently, several notable datasets have emerged for CAD models: ABC [13], Fusion360 [37], and SketchGraphs [27].
Only the latter two include constraints, and only SketchGraphs introduces a framework for generative modeling. There are several papers that focus on the boundary (surface) representation for CAD model generation without constraints, such as UV-Net [9] and BRepNet [14]. A related topic is to recover (parse) a CAD-like representation from unstructured input data, e.g., [28, 6, 30, 18, 33, 35, 11, 3, 29, 40, 7, 10].

Figure 1: A typical CAD sketch consists of primitives such as lines, circles, and arcs, and constraints between primitives, shown as floating boxes. We represent them as primitive and constraint sequences in a language that is endowed with a simple syntax. We show the syntax trees of the first two primitives and constraints. Tokens in the constraint sequence reference primitives (colored token background).

Concurrent work. Multiple papers, namely Computer-Aided Design as Language [8], Engineering Sketch Generation for Computer-Aided Design [38], and DeepCAD [39], have appeared on arXiv very recently and should be considered concurrent work.

CAD sketches. A CAD sketch can be defined by a graph S = (P, C), where P is a set of parametric primitives, such as points, lines, and circular arcs, and C is a set of constraints between the primitives, such as coincidence, parallelism, or orthogonality. See Figure 1 for an example. Some of the primitives are only used as construction aids and are not part of the final 3D CAD model (shown dashed in Figure 1). They can be used to construct more complex constraints; in Figure 1, for example, the center of the circle is aligned with the midpoint of the yellow line, which serves as a construction aid.
Both primitives and constraints in the graph are heterogeneous: different primitives have different numbers and types of parameters, and different constraints may reference a different number of primitives.

Primitives P ∈ P are defined by a type and a tuple of parameters P = (τ, κ, ρ), where τ is the type of primitive, κ is a binary variable that indicates if the primitive is a construction aid, and ρ is a tuple of parameters with a length dependent on the primitive type. A point, for example, has parameters ρ = (x, y). See Figure 2 for a complete list of primitives and their parameter types.

Constraints C ∈ C are defined by a type and a tuple of target primitives C = (ν, γ), where ν is the constraint type and γ is a tuple of references to primitives with a length dependent on the constraint type. Constraints can target either the entire primitive, or a specific part of the primitive, such as the center of a circle, or the start/end point of a line. In a coincidence constraint, for example, γ = (λ₁, µ₁, λ₂, µ₂) refers to two primitives P_λ₁ and P_λ₂, while µ₁, µ₂ indicate which part of each primitive the constraint targets, or if it targets the entire primitive.

CAD sketch generation with transformers. We show how a generative model based on the transformer architecture [31] can be used to generate CAD sketches defined by these graphs. Transformers have proven very successful as generative models in a wide range of domains, including geometric layouts [21, 36], but require converting data samples into sequences of tokens. The choice of sequence representation is a main factor influencing the performance of the transformer and has to be designed carefully. The main challenge in our setting is then to find a conversion of our heterogeneous graphs into a suitable sequence of tokens.
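To make the graph definition above concrete, the following is a minimal sketch of the data structures S = (P, C), P = (τ, κ, ρ), and C = (ν, γ). All class names, the parameter layouts, and the part-reference ids are illustrative assumptions, not taken from the paper's implementation:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Primitive:
    tau: str                # primitive type: "point", "line", "circle", or "arc"
    kappa: bool             # True if the primitive is only a construction aid
    rho: Tuple[float, ...]  # parameters; length depends on tau (a point has (x, y))

@dataclass
class Constraint:
    nu: str                 # constraint type, e.g. "coincident" or "midpoint"
    gamma: Tuple[int, ...]  # interleaved (lambda, mu) pairs: primitive index + part id

@dataclass
class Sketch:
    primitives: List[Primitive] = field(default_factory=list)
    constraints: List[Constraint] = field(default_factory=list)

# A line and a point, with the point made coincident with the line's start point.
# Here the part id 0 is assumed to mean "entire primitive" and 1 "start point".
line = Primitive("line", False, (0.0, 0.0, 1.0, 0.0, 0.0, 2.0))
pt = Primitive("point", False, (0.0, 0.0))
sketch = Sketch([line, pt], [Constraint("coincident", (0, 1, 1, 0))])
```

The variable-length `rho` and `gamma` tuples reflect the heterogeneity discussed above: no fixed-size record fits all primitive and constraint types.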
[Inset: syntax tree of the sentence "Will, will Will will Will Will's will?", labeling the part of speech of each word and the verb phrases (VP) and noun phrases (NP).]

To this end, we design a language to describe CAD sketches that is endowed with a simple syntax. The syntax imposes constraints on the tokens that can be generated at any given part of the sequence and can help in interpreting a sequence of tokens. An extreme example is the famous natural language sentence "Will, will Will will Will Will's will?", which is a valid sentence that is easier to interpret given its syntax tree, as shown in the inset. In natural language processing, syntax is complex and hard to infer automatically from a given sentence, so generative models usually only infer it implicitly in a data-driven approach. On the other end of the spectrum of syntactic complexity are geometric problems such as mesh generation [21], where the syntax consists of repeating triples of vertex coordinates or triangle indices that can easily be given explicitly, for example as indices from 1 to 3 that denote which element of the triple a given token represents. In our case, the grammar is more complex due to the heterogeneous nature of our sketch graphs, but can still be stated explicitly. We show that giving the syntax of a sequence as additional input to the transformer helps sequence interpretation and increases the performance of our generative model for CAD sketches.

We describe our language for CAD sketches in Section 4. In this language, primitives are described first, followed by constraints. We train two separate generative models: one model for primitives that we describe in Section 5.1, and a second model for constraints that we describe in Section 5.2.

4 A Language for CAD Sketches

We define a formal language for CAD sketches, where each valid sentence is a sequence of tokens Q = (q₁, q₂, . . .) that specifies a CAD sketch. A grammar for our language is defined in Figure 2, with production rules for primitives on the left and for constraints on the right.
Terminal symbols for primitives include {Λ, Ω, τ, κ, x, y, u, v, a, b} and for constraints {Λ, ν, λ, µ, Ω}. Each terminal symbol denotes a variable that holds the numerical value of one token q_i in the sequence Q. The symbols Λ and Ω are special constants; Λ marks the start of a new primitive or constraint, while Ω marks the end of the primitive or constraint sequence. τ, ν, κ, λ, and µ were defined in Section 3 and denote the primitive type, constraint type, construction indicator, primitive reference, and part reference type, respectively. The remaining terminal symbols denote specific parameters of primitives; please refer to the supplementary material for a full description.

Syntax trees. A derivation of a given sequence in our grammar can be represented with a syntax tree, where the leaves are the terminal symbols that appear in the sequence, and their ancestors are non-terminal symbols. An example of a sequence for a CAD sketch and its syntax tree are shown in Figure 1. The syntax tree provides additional information about a token sequence that we can use to 1) interpret a given token sequence in order to convert it into a sketch, 2) enforce syntactic rules during generation to ensure generated sequences are well-formed, and 3) help our generative model interpret previously generated tokens, thereby improving its performance. Given a syntax tree T, we create two additional sequences Q^3 and Q^4. These sequences contain the ancestors of each token from two specific levels of the syntax tree. Q^x contains the ancestors of each token at depth x: Q^x = (a^x_T(q₁), a^x_T(q₂), . . .), where a^x_T(q) is a function that returns the ancestor of q at level x of the syntax tree T, or a filler token if q does not have such an ancestor. Level 3 of the syntax tree contains non-terminal symbols corresponding to primitive or constraint types, such as point, line, or coincident, while level 4 contains parameter types, such as location and direction.
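The ancestor sequences can be computed by a simple walk over the root-to-leaf paths of the syntax tree. The toy tree below, the level numbering (counted from the root), and the "<pad>" filler token are assumptions for illustration:

```python
FILLER = "<pad>"

def ancestor_at_level(path, x):
    """Return the ancestor of a token at level x of the syntax tree, given the
    token's root-to-leaf path, or a filler token if no such ancestor exists."""
    return path[x - 1] if x <= len(path) else FILLER

# Root-to-leaf paths for the first tokens of a line primitive: Λ, τ, κ, x, y.
paths = [
    ["P-seq", "Λ"],
    ["P-seq", "P", "line", "τ"],
    ["P-seq", "P", "line", "κ"],
    ["P-seq", "P", "line", "location", "x"],
    ["P-seq", "P", "line", "location", "y"],
]
Q3 = [ancestor_at_level(p, 3) for p in paths]  # primitive/constraint types
Q4 = [ancestor_at_level(p, 4) for p in paths]  # parameter types
# Q3 == ["<pad>", "line", "line", "line", "line"]
# Q4 == ["<pad>", "τ", "κ", "location", "location"]
```

Note how Q3 tells the model which primitive a token belongs to, while Q4 identifies the parameter group (e.g., location) that the token parameterizes.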
The two sequences Q^3 and Q^4 are used alongside Q as additional input to our generative model.

Parsing sketches. To parse a sketch into a sequence of tokens that follows our grammar, we iterate through primitives first and then constraints. For each, we create a corresponding sequence of tokens using derivations in our grammar, choosing production rules based on the type of primitive/constraint and filling in the terminal symbols with the parameters of the primitive/constraint. Concatenating the resulting per-primitive and per-constraint sequences, separated by tokens Λ, gives us the full sequence Q for the sketch. During the primitive and constraint derivations, we also store the parent and grandparent non-terminal symbols of each token, giving us the sequences Q^3 and Q^4.

The primitives and constraints can be sorted by a variety of criteria. In our experiments, we use the order in which the primitives were drawn by the designer of the sketch [27]. In this order, the most constrained primitives typically occur earlier in the sequence. The constraints are arranged based on prevalence in the dataset: constraints that are used more frequently in the dataset occur earlier.

Following [21], we decompose our graph generation problem into two parts:

    p(S) = p(P) · p(C | P),

where p(P) is the primitive model and p(C | P) is the constraint model. Both of these models are trained by teacher forcing with a cross-entropy loss. We now describe each of the models.

Figure 2: Grammar of the CAD sketch language. Each sentence represents a syntactically valid sketch. The full grammar is given in the supplementary material.

    P-seq = Λ, P, {Λ, P}, Ω
    P     = point | line | circle | arc
    point = τ_point, κ, location
    line  = τ_line, κ, location, direction, range
    ...
    location  = x, y
    direction = u, v
    range     = a, b
    ...
    C-seq = Λ, C, {Λ, C}, Ω
    C     = coincident | parallel | equal | horizontal | vertical | midpoint | perpendicular | tangential
    coincident = ν_coincident, ref, sub, ref, sub
    parallel   = ν_parallel, ref, ref
    ...
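The parsing step for primitives can be sketched as a small serializer following the primitive productions of the grammar in Figure 2. The token spellings and the per-type parameter order below are assumptions for illustration, not the paper's exact vocabulary:

```python
# Parameter terminals per primitive type: a point expands to (location), a line
# to (location, direction, range), mirroring the grammar rules of Figure 2.
PARAMS = {
    "point": ["x", "y"],
    "line": ["x", "y", "u", "v", "a", "b"],
}

def serialize_primitives(primitives):
    """primitives: list of (tau, kappa, rho) tuples -> flat token sequence Q."""
    Q = []
    for tau, kappa, rho in primitives:
        Q.append("Λ")                        # start-of-primitive marker
        Q.append(tau)                        # primitive type token
        Q.append("κ=1" if kappa else "κ=0")  # construction-aid indicator
        Q.extend(f"{name}={val}" for name, val in zip(PARAMS[tau], rho))
    Q.append("Ω")                            # end of the primitive sequence
    return Q

Q = serialize_primitives([("point", False, (3, 4))])
# Q == ["Λ", "point", "κ=0", "x=3", "y=4", "Ω"]
```

The constraint sub-sequence would be serialized analogously, with ref/sub tokens in place of parameter tokens.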
    ref = λ
    sub = µ

Figure 3: Sequence generation approach. The sequence Q from 1 . . . n_P describes the primitives and from n_P + 1 . . . n_C the constraints of a sketch. We use two separate generators for the two sub-sequences (blue for primitives, green for constraints). The sequences Q^4 and Q^3 describe part of the syntax tree of Q and are used as additional input.

5.1 Primitive Generator

Quantization. Most of the primitive parameters are continuous and have different value distributions. For example, locations are more likely to occur near the origin and their distribution tapers off outwards, while there is a bias towards axis alignment in direction parameters. To account for these differences, we quantize each continuous parameter type (location, direction, range, radius, rotation) separately. Due to the non-uniform distribution of parameter values in each parameter type, a uniform quantization would be wasteful. Instead, we find the quantization bins using k-means clustering of all parameter values of a given parameter type in the dataset, with k = 256.

Input encoding. In addition to the three input sequences Q, Q^3, and Q^4 described in Section 4, we use a fourth sequence Q^I of token indices that provides information about the global position in the sequence. Figure 3 gives an overview of the input sequences and the generation process. We use four different learned embeddings for the input tokens, one for each sequence, which we sum up to obtain an input feature:

    f_i = ξ_{q_i} + ξ^3_{q^3_i} + ξ^4_{q^4_i} + ξ^I_{q^I_i},    (1)

where q_i ∈ Q and the ξ are learned dictionaries that are trained together with the generator.

Sequence generation. We use a transformer decoder network [31] as generator.
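The non-uniform quantization described above can be sketched with a pure-Python 1-D Lloyd's iteration (a small k is used here for illustration; the paper uses k = 256 per parameter type):

```python
def kmeans_1d(values, k, iters=50):
    """Find k quantization bin centers for one parameter type (Lloyd's algorithm)."""
    values = sorted(values)
    # Initialize centers at evenly spaced order statistics of the data.
    centers = [values[int(i * (len(values) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        bins = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            bins[nearest].append(v)
        centers = [sum(b) / len(b) if b else centers[j] for j, b in enumerate(bins)]
    return centers

def quantize(v, centers):
    """Map a continuous parameter value to the token index of its nearest bin center."""
    return min(range(len(centers)), key=lambda j: abs(v - centers[j]))

# Values clustered near 0, 10, and 20 each get their own bin, so dense regions of
# the parameter distribution are resolved finely, unlike uniform binning.
centers = kmeans_1d([0.0, 0.1, 0.2, 10.0, 10.1, 20.0, 20.2], k=3)
```

Dequantization at sampling time simply looks up `centers[token]`, which is why the constraint-solver post-processing step is useful for removing the residual quantization error.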
As an autoregressive model, the transformer decoder decomposes the joint probability p(Q) of a sequence Q into a product of conditional probabilities:

    p(Q) = ∏_n p(q_n | q_{<n}).    (2)

5.2 Constraint Generator

The constraint generator models the constraint part of the sequence, q_{>n_p}, conditioned on the primitives, where n_p is the number of primitive tokens in Q.

Primitive encoding. We use the same quantization for parameters described in the previous section, but use a different set of learned embeddings, one for each primitive terminal token. The feature ĥ_j for primitive j is the sum of the embeddings for its tokens:

    ĥ_j = ξ^τ_{τ_j} + ξ^κ_{κ_j} + ξ^x_{x_j} + ξ^y_{y_j} + ξ^u_{u_j} + ξ^v_{v_j} + ξ^r_{r_j} + ξ^c_{c_j} + ξ^a_{a_j} + ξ^b_{b_j},    (3)

where τ_j, κ_j, . . . are the tokens for primitive j. We use a special filler value for tokens that are missing in primitives. We follow the strategy proposed in PolyGen [21] to further encode the context of each primitive into its feature vector using a transformer encoder:

    h_j = e(ĥ_j, Ĥ),    (4)

where Ĥ is the sequence of all primitive features ĥ_j. The transformer encoder e is trained jointly with the constraint generator.

Input encoding. In addition to the sequence Q, we use the two sequences Q^4 and Q^I as inputs, but do not use Q^3, as we did not notice an increase in performance when adding it. The final input feature is then:

    h_i = h_{q_i} + ξ^{4C}_{q^4_i} + ξ^{IC}_{q^I_i},    (5)

where h_{q_i} is the feature for the primitive with index q_i, and ξ^{4C}, ξ^{IC} are learned dictionaries that are trained together with the constraint generator.

Sequence generation. Similar to the primitive generator, the constraint generator outputs a probability distribution over the values of the current token in each step, conditioned on the previously generated constraint tokens and on the encoded primitives.
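Autoregressive generation with the syntax enforced at sampling time can be illustrated as follows. The miniature grammar and the uniform choice standing in for the model's predicted (masked) distribution are assumptions; a real model would score all tokens and zero out the invalid ones before sampling:

```python
import random

def allowed_next(prev):
    """Tiny subset of the sketch syntax: Λ -> type -> x -> y -> (Λ | Ω)."""
    return {
        None: ["Λ"],
        "Λ": ["point"],
        "point": ["x"],
        "x": ["y"],
        "y": ["Λ", "Ω"],
    }[prev]

def sample_sequence(max_len=32, seed=0):
    """Sample tokens one at a time, masking out syntactically invalid options."""
    rng = random.Random(seed)
    seq, prev = [], None
    while len(seq) < max_len:
        valid = allowed_next(prev)  # syntactic mask for this step
        tok = rng.choice(valid)     # stand-in for sampling from masked model logits
        seq.append(tok)
        if tok == "Ω":              # end of the primitive sequence
            return seq
        prev = tok
    return seq

seq = sample_sequence()
# Every adjacent token pair in seq respects the grammar.
```

Because invalid continuations are masked to probability zero, every sampled sequence is well-formed by construction, which is point 2) of the syntax-tree uses listed in Section 4.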