# sketchgen_generating_constrained_cad_sketches__a0fbd796.pdf

Sketch Gen: Generating Constrained CAD Sketches

Wamiq Reyaz Para1 Shariq Farooq Bhat 1 Paul Guerrero2

Tom Kelly3 Niloy Mitra2,4 Leonidas Guibas5 Peter Wonka1

1 KAUST 2 Adobe Research 3 University of Leeds 4 University College London 5 Stanford University

Computer-aided design (CAD) is the most widely used modeling approach for technical design. The typical starting point in these designs is 2D sketches which can later be extruded and combined to obtain complex three-dimensional assemblies. Such sketches are typically composed of parametric primitives, such as points, lines, and circular arcs, augmented with geometric constraints linking the primitives, such as coincidence, parallelism, or orthogonality. Sketches can be represented as graphs, with the primitives as nodes and the constraints as edges. Training a model to automatically generate CAD sketches can enable several novel workﬂows, but is challenging due to the complexity of the graphs and the heterogeneity of the primitives and constraints. In particular, each type of primitive and constraint may require a record of different size and parameter types. We propose Sketch Gen as a generative model based on a transformer architecture to address the heterogeneity problem by carefully designing a sequential language for the primitives and constraints that allows distinguishing between different primitive or constraint types and their parameters, while encouraging our model to re-use information across related parameters, encoding shared structure. A particular highlight of our work is the ability to produce primitives linked via constraints that enables the ﬁnal output to be further regularized via a constraint solver. We evaluate our model by demonstrating constraint prediction for given sets of primitives and full sketch generation from scratch, showing that our approach signiﬁcantly out performs the state-of-the-art in CAD sketch generation.

1 Introduction

Computer Aided Design (CAD) tools are popular for modeling in industrial design and engineering. The employed representations are popular due to their compactness, precision, simplicity, and the fact that they are directly understood by many fabrication procedures.

CAD models are essentially a collection of primitives (e.g., lines, arcs) that are held together by a set of relations (e.g., collinear, parallel, tangent). Complex shapes are then formed by a sequence of CAD operations, forming a base shape in 2D that is subsequently lifted to 3D using extrusion operations. Thus, at the core, most CAD models are coplanar constrained sketches, i.e., a sequence of planar curves linked by constraints. For example, CAD models in some of the largest open-sourced datasets like Fusion360 [37] and Sketch Graphs [27] are thus structured, or sketches drawn in algebraic systems like Cinderella [1]. Such sequences are not only useful for shape creation, but also facilitate easy editing and design exploration, as relations are preserved when individual elements are edited.

CAD representations are heterogeneous. First, different instructions have their unique sets of parameters, with different parameter counts, data types, and semantics. Second, constraint speciﬁcations involve links (i.e., pointers) to existing primitives or parts of primitives. Further, the same ﬁnal shape

35th Conference on Neural Information Processing Systems (Neur IPS 2021).

can be equivalently realized with different sets of primitives and constraints. Thus, CAD models neither come in regular structures (e.g., image grids) nor can they be easily encoded using a ﬁxed length encoding. The heterogeneous nature of CAD models makes it difﬁcult to adopt existing deep learning frameworks to train generative models. One easy-to-code solution is to simply ignore the constraints and rasterize the primitives into an image. While such a representation is easy to train, the output is either in the image domain, or needs to be converted to primitives via some form of differential rendering. In either case, the constraints and the regularization parametric primitives provide is lost, and the conversion to primitives is likely to be unreliable. A slightly better option is to add padding to the primitive and constraint representations, and then treat them as ﬁxed-length encodings for each primitive/constraint. However this only hides the difﬁculty, as the content of the representation is still heterogeneous, the generator now additionally has to learn the length of the padding that needs to be added, and the increased length is inefﬁcient and becomes unworkable for complex CAD models.

We introduce Sketch Gen as a generative framework for learning a distribution of constrained sketches. Our main observation is that the key to working with a heterogeneous representation is a wellstructured language. We develop a language for sketches that is endowed with a simple syntax, where each sketch is represented as a sequence of tokens. We explicitly use the syntax tree of a sequence as additional guidance for our generative model. Our architecture makes use of transformers to capture long range dependencies and outputs both primitives along with their constraints. We can aggressively quantize/approximate the continuous primitive parameters in our sequences as long as the inter-primitive constraints can be correctly generated. A sampled graph can then be converted, if desired, to a physical sketch by solving a constrained optimization using traditional solvers, removing errors that were introduced by the quantization.

We evaluate Sketch Gen on one of the largest publicly-available constrained sketch datasets Sketch Graph [27]. We compare with a set of existing and contemporary works, and report noticeable improvement in the quality of the generated sketches. We also present ablation studies to evaluate the efﬁcacy of our various design choices.

2 Related Work

Vector graphics generation. Vector graphics, unlike images, are presented in domain-speciﬁc languages (e.g., SVG) and are not in a format easily utilized by standard deep learning setups. Recently, various approaches have been proposed by linking images and vectors using differentiable renderer [19, 26] or supervised with vector sequences (e.g., deep SVG [4], SVG-VAE [20], and Cloud2Curve [5]). Deep SVG, mostly closely related to our work, proposes a hierarchical nonautoregressive model for generation, building on command, coordinate, and index embeddings.

Structured model generation. Geometric layouts are often represented as graphs encoding relationships between geometric entities. Motivated by the success in image synthesis, several authors attempted to build on generative adversarial networks [17, 22] or variational autoencoder [15]. Recently, autoregressive models like the transformer architecture [31] emerged as an important tool for generative modeling of layouts, e.g. [34, 23, 36]. Closely related to our work are Deep SVG [4] and Polygen [21]. These solutions are also related to modeling with variable length commands, but they simplify the representation so that commands are padded to a ﬁxed length. Polygen is an auto-regressive generative model for 3D meshes that introduces several inﬂuential ideas. One of the proposed ideas, that we also build on, is to employ pointer networks [32] to refer to previously generated objects (i.e., linking vertices to form polygonal mesh faces). Unlike Poly Gen, we work with heterogeneous primitives and constraints, and with edges that constrain the primitive parameters, motivating our constraint optimization step.

CAD Datasets and CAD generation. Recently, several notable datasets have emerged for CAD models: ABC [13], Fusion360 [37], and Sketch Graphs [27]. Only the latter two include constraints and only Sketch Graphs introduces a framework for generative modeling. There are several papers that focus on the boundary (surface) representation for CAD model generation without constraints such as UV-Net [9] and BRep Net [14]. A related topic is to recover (parse) a CAD-like representation from unstructured input data, e.g. [28, 6, 30, 18, 33, 35, 11, 3, 29, 40, 7, 10].

line 1 x 1 y 1 u 1 v 1

coincident 3 ¹ 2 4 ¹ 3 º

midpoint 1 ¹ 1 2

2 x 2 y 2 u 2 v 2 a 1 b 1 r 1 c 1 ...

primitive sequence

constraint sequence

loc. dir. ran.

syntax trees

midpoint ref sub ref sub

ref sub ref

Figure 1: A typical CAD sketch consists of primitives such as lines, circles and arcs, and constraints between primitives, shown as ﬂoating boxes. We represent them as primitiveand constraint sequences in a language that is endowed with a simple syntax. We show the syntax trees of the ﬁrst two primitives and constraints. Tokens in the constraint sequence reference primitives (colored token background).

Concurrent work. Multiple papers, namely Computer-Aided Design as Language [8], Engineering Sketch Generation for Computer-Aided Design [38], Deep CAD [39], have appeared on ar Xiv very recently, and should be considered to be concurrently works.

CAD sketches. A CAD sketch can be deﬁned by a graph S = (P, C), where P is a set of parametric primitives, such as points, lines, and circular arcs and C is a set of constraints between the primitives, such as coincidence, parallelism, or orthogonality. See Figure 1 for an example. Some of the primitives are only used as construction aids and are not part of the ﬁnal 3D CAD model (shown dashed in Figure 1). They can be used to construct more complex constraints; in Figure 1, for example, the center of the circle is aligned with the midpoint of the yellow line, which serves as a construction aid. Both primitives and constraints in the graph are heterogeneous: different primitives have different numbers and types of parameters and different constraints may reference a different number of primitives. Primitives P P are deﬁned by a type and a tuple of parameters P = (τ, κ, ρ), where τ is the type of primitive, κ is a binary variable which indicates if the primitive is a construction aid, and ρ is a tuple of parameters with a length dependent on the primitive type. A point, for example, has parameters ρ = (x, y). See Figure 2 for a complete list of primitives and their parameter type. Constraints C C are deﬁned by a type and a tuple of target primitives C = (ν, γ), where ν is the constraint type and γ is a tuple of references to primitives with a length dependent on the constraint type. Constraints can target either the entire primitive, or a speciﬁc part of the primitive, such as the center of a circle, or the start/end point of a line. In a coincidence constraint, for example, γ = (λ1, µ1, λ2, µ2) refers to two primitives Pλ1 and Pλ2, while µ1, µ2 indicate which part of each primitive the constraint targets, or if it targets the entire primitive.

CAD sketch generation with Transformers. We show how a generative model based on the transformer architecture [31] can be used to generate CAD sketches deﬁned by these graphs. Transformers have proven very successful as generative models in a wide range of domains, including geometric layouts [21, 36], but require converting data samples into sequences of tokens. The choice of sequence representation is a main factor inﬂuencing the performance of the transformer and has to be designed carefully. The main challenge in our setting is then to ﬁnd a conversion of our heterogeneous graphs into a suitable sequence of tokens.

Will, will Will will Will Will's will?

possesive name proper name noun verb proper name aux. verb proper name

VP VP NP VP

VP: verb phrase NP: noun phrase To this end, we design a language to describe CAD sketches that is endowed with a simple syntax. The syntax imposes constraints on the tokens that can be generated at any given part of the sequence and can help interpreting a sequence of tokens. An extreme example is the famous natural language sentence "Will, will Will will Will Will s will?", which is a valid sentence that is easier to interpret given its syntax tree, as shown in the inset. In natural language processing, syntax is complex and hard to infer automatically from a given sentence, so generative models usually only infer it implicitly in a data-driven approach. On the other end of the spectrum of syntactic complexity are geometric problems such as mesh generation [21], where the syntax consists of repeating triples of vertex coordinates or triangle indices that can easily be given explicitly, for example as indices from 1 to 3 that denote which element of the triple a given token represents. In our case, the grammar

is more complex due to the heterogeneous nature of our sketch graphs, but can still be stated explicitly. We show that giving the syntax of a sequence as additional input to the transformer helps sequence interpretation and increases the performance of our generative model for CAD sketches.

We describe our language for CAD sketches in Section 4. In this language, primitives are described ﬁrst, followed by constraints. We train two separate generative models, one model for primitives that we describe in Section 5.1, and a second model for constraints that we describe in Section 5.2.

4 A Language for CAD Sketches

We deﬁne a formal language for CAD sketches, where each valid sentence is a sequence of tokens Q = (q1, q2, . . . ) that speciﬁes a CAD sketch. A grammar for our language is deﬁned in Figure 2, with production rules for primitives on the left and for constraints on the right. Terminal symbols for primitives include {Λ, Ω, τ, κ, x, y, u, v, a, b} and for constraints {Λ, ν, λ, µ, Ω}. Each terminal symbol denotes a variable that holds the numerical value of one token qi in the sequence Q. The symbols Λ and Ωare special constants; Λ marks the start of a new primitive or constraint, while Ω marks the end of the primitive or constraint sequence. τ, ν, κ, λ, and µ were deﬁned in Section 3 and denote the primitive type, constraint type, construction indicator, primitive reference, and part reference type, respectively. The remaining terminal symbols denote speciﬁc parameters of primitives, please refer to the supplementary material for a full description.

Syntax trees. A derivation of a given sequence in our grammar can be represented with a syntax tree, where the leafs are the terminal symbols that appear in the sequence, and their ancestors are non-terminal symbols. An example of a sequence for a CAD sketch and its syntax tree are shown in Figure 1. The syntax tree provides additional information about a token sequence that we can use to 1) interpret a given token sequence in order to convert it into a sketch, 2) enforce syntactic rules during generation to ensure generated sequences are well-formed, and 3) help our generative model interpret previously generated tokens, thereby improving its performance. Given a syntax tree T, we create two additional sequences Q3 and Q4. These sequences contain the ancestors of each token from two speciﬁc levels of the syntax tree. Qx contains the ancestors of each token at depth x: Qx = (ax T (q1), ax T (q2), . . . ), where ax T (q) is a function that returns the ancestor of q at level x of the syntax tree T, or a ﬁller token if q does not have such an ancestor. Level 3 of the syntax tree contains non-terminal symbols corresponding to primitive or constraint types, such as point, line, or coincident, while level 4 contains parameter types, such as location and direction. The two sequences Q3 and Q4 are used alongside Q as additional input to our generative model.

Parsing sketches. To parse a sketch into a sequence of tokens that follows our grammar, we iterate through primitives ﬁrst and then constraints. For each, we create a corresponding sequence of tokens using derivations in our grammar, choosing production rules based on the type of primitive/constraint and ﬁlling in the terminal symbols with the parameters of the primitive/constraint. Concatenating the resulting per-primitive and per-constraint sequences separated by tokens Λ gives us the full sequence Q for the sketch. During the primitive and constraint derivations, we also store the parent and grandparent non-terminal symbols of each token, giving us sequences Q3 and Q4. The primitives and constraints can be sorted by a variety of criteria. In our experiments, we use the order in which the primitives were drawn by the designer of the sketch [27]. In this order, the most constrained primitives typically occur earlier in the sequence. The constraints are arranged based on prevalence in the dataset, constraints that are used more frequently in the dataset occur earlier.

Following [21], we decompose our graph generation problem into two parts:

p(S) = p(P) |{z} Primitive Model

p(C|P) | {z } Constraint Model

Both of these models are trained by teacher forcing with a Cross-Entropy loss. We now describe each of the models.

Figure 2: Grammar of the CAD sketch language. Each sentence represents a syntactically valid sketch. The full grammar is given in the supplementary material.

P = Λ, P, {Λ, P}, Ω P = point | line | circle | arc point = τ point, κ, location line = τ line, κ, location, direction, range ... location = x, y direction = u, v range = a, b ...

C = Λ, C, {Λ, C}, Ω C = coincident | parallel | equal | horizontal | vertical | midpoint | perpendicular | tangential coincident = νcoincident, ref, sub, ref, sub parallel = νparallel, ref, ref ... ref = λ sub = µ

1 1 x 1 y 1 u 1 v 1

location direction range

1 1 x 1 y 1 u 1 v 1 a1 b 1

syntax tree

º 1 1 ¹ 1 2 ¹ 2

º 1 1 ¹ 1 2 ¹ 2

2 3 4 5 6 7 1 10 8 9 2 3 4 5 6 1 7 n P ... n C ...

ref sub ref sub

primitive generation constraint generation

syntax tree

Figure 3: Sequence generation approach. The sequence Q from 1 . . . n P describes the primitives and from n P + 1 . . . n C the constraints of a sketch. We use two separate generators for the two sub-sequences (blue for primitives, green for constraints). The sequences Q4 and Q3 describe part of the syntax tree of Q and are used as additional input.

5.1 Primitive Generator

Quantization. Most of the primitive parameters are continuous and have different value distributions. For example, locations are more likely to occur near the origin and their distribution tapers off outwards, while there is a bias towards axis alignment in direction parameters. To account for these differences, we quantize each continuous parameter type (location, direction, range, radius, rotation) separately. Due to the non-uniform distribution of parameter values in each parameter type, a uniform quantization would be wasteful. Instead, we ﬁnd the quantization bins using k-means clustering of all parameter values of a given parameter type in the dataset, with k = 256.

Input encoding. In addition the the three inputs sequences Q, Q3, and Q4 described in Section 4, we use a fourth sequence QI of token indices that provides information about the global position in the sequence. Figure 3 gives an overview of the input sequences and the generation process. We use four different learned embeddings for the input tokens, one for each sequence, that we sum up to obtain an input feature: fi = ξqi + ξ3 q3 i + ξ4 q4 i + ξI q I i , (1)

where q i Q and ξ are learned dictionaries that are trained together with the generator.

Sequence generation. We use a transformer decoder network [31] as generator. As an autoregressive model, it decomposes the joint probability p(Q) of a sequence Q into a product of conditional probabilities: p(Q) = Q

n p(qn|q<n). In our case, the probabilities are conditioned on the input features p(qn|q<n) = p(qn|f<n) where f<i denotes the sequence of input features up to (excluding) position i. Each step applies the network g P to compute the probability distribution over all discrete values for the next token qi: p(qi | f<i) = g P (f<i). (2) At training time all input sequences are obtained from the ground truth. At inference time, the sequence Q is sampled from the output probabilities p(qi | f<i) using nucleus sampling, and sequences Q3 and Q4 are constructed on the ﬂy based on the generated primitive type τ, as shown in

Figure 3. In addition to providing guidance to the network, the syntax described in Q3 and Q4 allows us to limit generated token values to a valid range. For example, we correct the generated values for tokens Λ and Ωto the expected special values if a different value has been generated.

5.2 Constraint Generator

The constraint generator is implemented as a Pointer Network [32] where each step returns an index into a list of encoded primitives. These indices form the constraint part of the sequence q>np, where np is the number of primitive tokens in Q.

Primitive encoding. We use the same quantization for parameters described in the previous section, but use a different set of learned embeddings, one for each primitive terminal token. The feature h j for primitive j is the sum of the embeddings for its tokens:

h j = ξτ τj + ξκ κj + ξx xj + ξy yj + ξu uj + ξv vj + ξr rj + ξc cj + ξa aj + ξb bj, (3)

where τj, κj, . . . are the tokens for primitive j. We use a special ﬁller value for tokens that are missing in primitives. We follow the strategy proposed in Poly Gen [21] to further encode the context of each primitive into its feature vector using a transformer encoder:

h j = e(h j , H ), (4)

where H is the sequence of all primitive features h j . The transformer encoder e is trained jointly with the constraint generator.

Input encoding. In addition to the sequence Q, we use the two sequences Q4 and QI as inputs, but do not use Q3 as we did not notice an increase in performance when adding it. The ﬁnal input feature is then:

hi = h qi + ξ4C q4 i + ξIC q I i , (5)

where h qi is the feature for the primitive with index qi, and ξ4C, ξIC are learned dictionaries that are trained together with the constraint generator.

Sequence generation. Similar to the primitive generator, the constraint generator outputs a probability distribution over the values for the current token in each step: p(qi|h>np<i), conditioned on the input features for the previously generated tokens in the constraint sequence, denoted as h>np<i. Unlike the primitive generator, we follow Pointer Networks [32] in computing the probability as dot product between a the output feature of the generator network and the primitive features:

P(qi = j|q>np<i) = h j g C(h>np<i), (6)

where g C is the constraint generator. This effectively gives us a probability distribution over indices into our list of primitive embeddings. Constraints may also reference sub-parts of primitives, such as line endpoints, deﬁned by the µ tokens of the constraint (see Figure 3). For µ tokens, the indices into primitive embeddings are interpreted as IDs for the sub-part; each primitive can have up to 4 sub-parts (see the supplementary for details). At training time all input sequences are obtained from the ground truth. At inference time, the sequence Q is sampled from the output probabilities p(qi|h>np<i) using nucleus sampling, and the sequence Q4 is constructed on the ﬂy based on the generated constraint type ν, as shown in Figure 3. Similar to primitive generation, the syntax in Q4 provides additional guidance to the network and allows us to limit generated token values to a valid range.

We evaluate our approach on two main applications. We experiment with generating sketches from scratch and also demonstrate an application that we call auto-constraining sketches, where we infer plausible constraints for existing primitives. We evaluate conditional models in the supplementary.

0 14 0 14 0 14

0.2 0.35 0.35

probability

circle per sketch line per sketch coincident per sketch horizontal per sketch data ours

primitive count primitive count constraint count constraint count

Figure 4: Sketch statistics. We compare the the distribution of circle and line counts per sketch and the distribution of coincident and horizontal constraint counts in generated and data sketches.

Dataset. We train and evaluate on the recent Sketchgraphs dataset [27], which contains 15 million real-world CAD sketches (licensed without any usage restriction), obtained from On Shape [2], a webbased CAD modeling platform. These sketches are impressively diverse, however, simple sketches with few degrees of freedom tend to have many near-identical copies in the dataset. These make up 84% of the dataset and would bias our results signiﬁcantly. For this reason, we ﬁlter out sketches with less than 6 primitives, leaving us with roughly 2.4 million sketches. Additionally we ﬁlter sketches with constraint sequences of more than 208 tokens, which are typically over-constrained and constitute < 0.1% of the sketches. We focus on the 4 most common primitive types and the 8 most common constraint types (see Figure 2 for a full list), and remove any other primitives and constraints from our sketches. We keep aside a random subset of 50k samples as validation set and 86k samples as test set.

Experimental Setup. We implemented our models in Py Torch [24], using GPT-2 [25] like Transformer blocks. For primitive generation, we use 24 blocks, 12 attention heads, an embedding dimension of 528 and a batch size of 544. For constraint generation, the encoder has 22 layers and the pointer network 16 layers. Both have 12 attention heads, an embedding dimension of 264 and use a batch size of 1536. We use the Adam optimizer [12] with a learning rate of 0.0001. Training was performed for 40 epochs on 8 V100 GPUs for the primitive model and for 80 epochs on 8 A100 GPUs for the constraint model. See the supplementary material for more details.

Baselines. We use the sketch generation approach proposed in Sketch Graphs [27] as the main baseline, which is based on a graph neural network that operates directly on sketch graphs. Due to the recent publication of the Sketch Graphs dataset, to our knowledge this is still the only established baseline for data-driven CAD Sketch generation. This baseline has two variants: SG-sketch generates full sketches and SG-constraint generates constraints only on a given set of primitives. We re-train both variants on our dataset. Additionally we retrain a Deep SVG [4] model on the Sketch Graphs dataset. Details of the retraining for both can be found in the supplementary material. As a lower bound for the generation performance, we also include a random baseline, where token values in each step are picked with a uniform random distribution over all possible values. Additionally, for constraint generation, we compare to an auto-constraining method with hand-crafted rules, and as ablation, we compare to variants of our own model that do not use the syntax tree, or only part of the syntax tree as input.

Table 1: Sketch generation. We compare the quality of our learned distribution over sketches to two baselines on the left, and compare three variants of our method using statistics over generated sequences on the right.

(a) Metrics on the test set.

NLL in bits per

method sketch prim. constr.

random 1020.73 70.97 24.14 SG-sketch 158.90 - 2.42 Deep SVG 100.26 11.49 - ours 88.22 8.60 0.61

(b) Metrics on generated sequences.

method Esyntax EP stat EC stat Estat

ours (p = 1.0) 19.8 0.0058 0.0134 0.0192 ours (p = 0.9) 18.3 0.0185 0.0442 0.0627

Figure 5: Examples of generated sketches before (gray background) and after optimization to satisfy the generated constraints. Note that the optimization corrects quantization errors (red arrows point out a few changed details). The right-most example is a perturbed testset sketch.

Sketch Generation Metrics. We report sketch generation performance using three metrics. The negative log-likelihood (NLL) of test set sketches in our models (in bits) measures how much the learned distribution of sketches differs from the test set distribution (lower numbers mean better agreement). In addition to this metric of the generator s performance on the test set, we use two metrics that evaluate the quality of generated sequences. The syntactic error (Esyntax) measures the percentage of sequences with invalid syntax if we do not constrain tokens to be syntactically valid. Lastly, the statistical error (Estat), measures how much the statistics of generated sketches differ from the ground truth statistics. We compute Estat based on statistics like the number of point primitives per sketch, the distribution of typical line directions, or relative postioning errors, each represented as normalized histogram. Estat is then the earth mover s distance [16] (EMD) between the histograms of generated sketches and test set sketches, see the supplementary material for details. We further split up Estat into EP stat for statistics relating to primitives and EC stat for constraints.

Sketch Generation Results. Quantitative results are shown in Table 1. Metrics computed on the test set are shown on the left. We can see that our method leads to the smallest NLL, indicating that the distribution of generated sketches more closely aligns with the test set distribution than both Deep SVG and SG-sketch. Since primitives are constructed using constraints in SG-sketch, we do not show the NLL per primitive for that baseline. Deep SVG can only generate primitives, but as we see in the results, still has lower performance than our model. All three methods perform far better than the upper bound given by the random baseline.

On the right, metrics are computed on 15k generated sequences. We compare two different variants of our results, using two different nucleus sampling parameters p. Nucleus sampling clips the tails of the generated distribution, which tend to have lower-quality samples. Thus, without nucleus sampling (p = 1.0) we see an increase in the syntactic error, due to the lower quality samples in the tail, but a decrease in the statistical error, since, without the clipped tails, the distribution more accurately resembles the data distribution. We show a few of the statistics we used to compute Estat in Figure 4. We can verify that our generated distributions closely align with the data distribution. Additional statistics are shown in the supplementary material.

The generated constraints can be used to correct errors in the primitive parameters that may arise, for example, due to quantization. In Figure 5, all sketches except the right-most sketch are examples of generated sketches before and after optimizing to satisfy the generated constraints, using the constraint solver provided by On Shape [2]. We can see that our our generated sketches are visually plausible and that the constraint generator ﬁnds a plausible set of constraints, that correctly closes gaps between adjacent line endpoints, among other misalignments.

Auto-constraining Sketches. Another potential application of our model is to predict a plausible set of constraints for a given set of primitives, for example to expose only useful degrees of freedom

Table 2: Auto-constraining sketches. We compare the quality of our learned distribution over constraints to two baselines on the left, and compare three variants of our method using various metrics over the generated constraint sequences on the right.

(a) Metrics on the test set.

NLL in bits per

method seq. constr.

random 259.40 24.14 SG-constraint 19.29 1.22 ours 8.28 0.61

(b) Metrics on generated sequences.

method Esyntax EC stat

ours (p = 1.0) 1.56 0.0415 ours (p = 0.9) 1.29 0.0442

Table 3: Ablation on the primitive model.

method NLL/seq. NLL/prim. NLL/token

ours 79.94 8.600 0.979 ours -Q3 79.96 8.605 0.980 ours -Q3 -Q4 80.35 8.640 0.984

Table 4: Ablation on the constraint model.

method NLL/seq. NLL/constr. NLL/token

ours 8.28 0.610 0.110 ours -Q4 8.47 0.633 0.113 ours -shared 8.45 0.627 0.112

for editing the sketch, or to correct slight misalignments in the sketch. To evaluate this application, we separately measure the constraint generation performance given a set of primitives, using the metrics described previously.

Quantitative results are shown in Table 2. On the left, we see similar results for constraint generation as for full sketch generation: our learned distribution over constraint sequences is signiﬁcantly closer to the dataset distribution (lower NLL) than SG-constraint, and both are far from the upper bound given by the random baseline. On the right, we compute constraints for the primitives of 15k previously generated sketches, using two different nucleus sampling parameters p. Similar to the previous section, deactivating nucleus sampling results in an increase of the syntactic error, but a decrease in the statistical error.

Constraints can also be generated for primitives that do not come from the primitive generator. We perturb the primitives in all test set sketches, generate constraints, and optimize the primitives to satisfy the generated constraints using the On Shape optimizer. Since we have ground truth constraints for the test set, we can measure the accuracy of the generated constraints. The average accuracy on the test set is 98.4%. An example is shown in the right-most shape of Figure 5, where the left version (gray background) is the perturbed test set shape and the right version the same shape after optimization. Additional results are shown in the supplementary material.

Ablation. We ablate the primitive and constraint models separately. We aim to show that our syntax provides prior knowledge about the the otherwise ambiguous structure of a sketch sequence that improves the generative performance of our models. In Table 3, we show the result of removing sequences Q3 and/or Q4, which contain information about the sequence syntax, from the input of the primitive generator. Removing only Q3 results in a slight performance degradation, while removing both results in a more signiﬁcant drop. In Table 4, we see that removing Q4 also causes a signiﬁcant performance drop in the constraint generator. Additionally, we show the importance of grouping the tokens by parameter type in the shared embeddings of the primitive encoder (see Eq. 3). Mixing tokens with different parameter types in the shared embeddings, denoted as ours -shared, causes a signiﬁcant drop in performance.

7 Conclusion

In this work, we improve upon the state of the art in CAD sketch generation through the use of transformers coupled with a carefully designed sketch language and an explicit use of its syntax. The Sketch Gen framework enables the full generation of CAD sketches, including primitives and constraints, or auto-constraining existing sketches by augmenting them with generated constraints.

We left a few limitations for future work. First, we chose only the most common types of primitives and constraints in the dataset to avoid learning from the long tail of the dataset distribution. We can

easily incorporate more types by extending our grammar, and in future work it would be interesting to experiment with using more complex and less common types, such as Bezier curves and distance constraints. Second, the autoregressive nature of our model prevents correcting errors in earlier parts of the sequence and we would like to explore backtracking to correct these errors in future work.

We are excited about future research into generating complex parametric geometry and in examining the explicit use of formal languages and generative language models.

8 Broader Impact

There are no foreseeable societal impacts speciﬁc to our method. There are societal impacts of generative modeling, machine learning, and deep learning in general that are shared by papers in these areas. The discussion of these broader topics is beyond the scope of this paper.

9 Acknowledgements

This work was suppported in part by ARL grant W911NF2120104, and a Vannevar Bush Faculty Fellowship. We would like to acknowledge gifts from Adobe, Autodesk and the UCL AI Centre. We thank the KAUST Supercomputing Lab (KSL) for providing compute infrastructure. Finally, we thank the anonymous reviewers for their constructive comments.

[1] Cinderella. https://www.cinderella.de/. Accessed: 2021-05-28.

[2] Onshape. https://www.onshape.com/. Accessed: 2021-05-28.

[3] Emre Aksan, Thomas Deselaers, Andrea Tagliasacchi, and Otmar Hilliges. Cose: Compositional stroke embeddings. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 10041 10052. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/file/ 723e8f97fde15f7a8d5ff8d558ea3f16-Paper.pdf.

[4] Alexandre Carlier, Martin Danelljan, Alexandre Alahi, and Radu Timofte. Deepsvg: A hierarchical generative network for vector graphics animation. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 16351 16361. Curran Associates, Inc., 2020.

[5] Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang, and Yi-Zhe Song. Cloud2curve: Generation and vectorization of parametric sketches. ar Xiv preprint ar Xiv:2103.15536, 2021.

[6] Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, and Joshua B Tenenbaum. Learning to infer graphics programs from hand-drawn images. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 6062 6071, 2018.

[7] Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, and Armando Solar-Lezama. Write, execute, assess: Program synthesis with a repl. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'AlchéBuc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.

[8] Yaroslav Ganin, Sergey Bartunov, Yujia Li, Ethan Keller, and Stefano Saliceti. Computer-aided design as language. ar Xiv preprint ar Xiv:2105.02769, 2021.

[9] Pradeep Kumar Jayaraman, Aditya Sanghi, Joseph Lambourne, Thomas Davies, Hooman Shayani, and Nigel Morris. Uv-net: Learning from curve-networks and solids. ar Xiv preprint ar Xiv:2006.10211, 2020.

[10] R Kenny Jones, Theresa Barton, Xianghao Xu, Kai Wang, Ellen Jiang, Paul Guerrero, Niloy J Mitra, and Daniel Ritchie. Shapeassembly: Learning to generate programs for 3d shape structure synthesis. ACM Transactions on Graphics (TOG), 39(6):1 20, 2020.

[11] Kacper Kania, Maciej Zi eba, and Tomasz Kajdanowicz. Ucsg-net unsupervised discovering of constructive solid geometry tree. ar Xiv preprint ar Xiv:2006.09102, 2020.

[12] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ICLR, 2015.

[13] Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, and Daniele Panozzo. Abc: A big cad model dataset for geometric deep learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9601 9611, 2019.

[14] Joseph G. Lambourne, Karl D.D. Willis, Pradeep Kumar Jayaraman, Aditya Sanghi, Peter Meltzer, and Hooman Shayani. Brepnet: A topological message passing system for solid models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

[15] Hsin-Ying Lee, Lu Jiang, Irfan Essa, Phuong B. Le, Haifeng Gong, Ming-Hsuan Yang, and Weilong Yang. Neural design network: Graphic layout generation with constraints. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, ECCV 2020, pages 491 506, 2020.

[16] E. Levina and P. Bickel. The earth mover s distance is the mallows distance: some insights from statistics. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, 2001.

[17] Jianan Li, Jimei Yang, Aaron Hertzmann, Jianming Zhang, and Tingfa Xu. Layoutgan: Generating graphic layouts with wireframe discriminators. In International Conference on Learning Representations, 2018.

[18] Lingxiao Li, Minhyuk Sung, Anastasia Dubrovina, Li Yi, and Leonidas J Guibas. Supervised ﬁtting of geometric primitives to 3d point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2652 2660, 2019.

[19] Tzu-Mao Li, Michal Lukáˇc, Michaël Gharbi, and Jonathan Ragan-Kelley. Differentiable vector graphics rasterization for editing and learning. ACM Transactions on Graphics (TOG), 39(6):1 15, 2020.

[20] Raphael Gontijo Lopes, David Ha, Douglas Eck, and Jonathon Shlens. A learned representation for scalable vector graphics. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019.

[21] Charlie Nash, Yaroslav Ganin, SM Eslami, and Peter W Battaglia. Polygen: An autoregressive generative model of 3d meshes. ar Xiv preprint ar Xiv:2002.10880, 2020.

[22] Nelson Nauata, Kai-Hung Chang, Chin-Yi Cheng, Greg Mori, and Yasutaka Furukawa. House-gan: Relational generative adversarial networks for graph-constrained house layout generation. 2020.

[23] Wamiq Para, Paul Guerrero, Tom Kelly, Leonidas Guibas, and Peter Wonka. Generative layout modeling using constraint graphs. ar Xiv preprint ar Xiv:2011.13417, 2020.

[24] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary De Vito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems. 2019.

[25] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.

[26] Pradyumna Reddy, Michael Gharbi, Michal Lukac, and Niloy J Mitra. Im2vec: Synthesizing vector graphics without vector supervision. ar Xiv preprint ar Xiv:2102.02798, 2021.

[27] Ari Seff, Yaniv Ovadia, Wenda Zhou, and Ryan P. Adams. Sketch Graphs: A large-scale dataset for modeling relational geometry in computer-aided design. In ICML 2020 Workshop on Object-Oriented Learning, 2020.

[28] Gopal Sharma, Rishabh Goyal, Difan Liu, Evangelos Kalogerakis, and Subhransu Maji. Csgnet: Neural shape parser for constructive solid geometry. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[29] Gopal Sharma, Difan Liu, Subhransu Maji, Evangelos Kalogerakis, Siddhartha Chaudhuri, and Radomír Mˇech. Parsenet: A parametric surface ﬁtting network for 3d point clouds. In European Conference on Computer Vision, pages 261 276. Springer, 2020.

[30] Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, and Jiajun Wu. Learning to infer and execute 3d shape programs. In International Conference on Learning Representations, 2019.

[31] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.

[32] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In Advances in Neural Information Processing Systems, 2015.

[33] Homer Walke, R Kenny Jones, and Daniel Ritchie. Learning to infer shape programs using latent execution self training. ar Xiv preprint ar Xiv:2011.13045, 2020.

[34] Kai Wang, Manolis Savva, Angel X. Chang, and Daniel Ritchie. Deep convolutional priors for indoor scene synthesis. ACM Transactions on Graphics, 2018.

[35] Xiaogang Wang, Yuelang Xu, Kai Xu, Andrea Tagliasacchi, Bin Zhou, Ali Mahdavi-Amiri, and Hao Zhang. Pie-net: Parametric inference of point cloud edges. In Advances in Neural Information Processing Systems, 2020.

[36] Xinpeng Wang, Chandan Yeshwanth, and Matthias Nießner. Sceneformer: Indoor scene generation with transformers. ar Xiv preprint ar Xiv:2012.09793, 2020.

[37] Karl D. D. Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G. Lambourne, Armando Solar Lezama, and Wojciech Matusik. Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences. ACM Transactions on Graphics (TOG), 40(4), 2021.

[38] Karl DD Willis, Pradeep Kumar Jayaraman, Joseph G Lambourne, Hang Chu, and Yewen Pu. Engineering sketch generation for computer-aided design. ar Xiv preprint ar Xiv:2104.09621, 2021.

[39] Rundi Wu, Chang Xiao, and Changxi Zheng. Deepcad: A deep generative network for computer-aided design models. ar Xiv preprint ar Xiv:2105.09492, 2021.

[40] Xianghao Xu, Wenzhe Peng, Chin-Yi Cheng, Karl D. D. Willis, and Daniel Ritchie. Inferring cad modeling sequences using zone graphs. In CVPR, 2021.