# Discovering Design Concepts for CAD Sketches

Yuezhi Yang[^1], The University of Hong Kong & Microsoft Research Asia (yzyang@cs.hku.hk)
Hao Pan, Microsoft Research Asia (haopan@microsoft.com)

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

## Abstract

Sketch design concepts are recurring patterns found in parametric CAD sketches. Though rarely explicitly formalized by CAD designers, these concepts are implicitly used in design for modularity and regularity. In this paper, we propose a learning-based approach that discovers modular concepts by induction over raw sketches. We propose the dual implicit-explicit representation of concept structures, which allows implicit detection and explicit generation, and the separation of structure generation from parameter instantiation for parameterized concept generation, to learn modular concepts by end-to-end training. We demonstrate design concept learning on a large-scale CAD sketch dataset and show its applications to design intent interpretation and auto-completion.

## 1 Introduction

Parametric CAD modeling is the standard paradigm for mechanical CAD design today. In parametric modeling, CAD sketches are the fundamental 2D shapes used for various 3D construction operations. As shown in Fig. 1, a CAD sketch is made of primitive geometric elements (e.g., lines, arcs, points) which are constrained by different relationships (e.g., coincident, parallel, tangent); the sketch graph of primitive elements and constraints captures design intent, and allows adaptation and reuse of designed parts by changing parameters and updating all related elements automatically [1]. Designers are therefore tasked with the meticulous design of such sketch graphs, so that the inherent high-level design intents are easy to interpret and disentangle. To this end, meta-structures (Fig. 1), which we call sketch concepts in this paper, capture repetitive design patterns and regulate the design process with more efficient intent construction and communication [9, 12]. Concretely, each sketch concept is a structure that encapsulates specific primitive elements and their compositional constraints; interactions of its internal elements with the outside go only through the interface of the concept.

How can these modular concepts be discovered automatically from raw sketch graphs? In this paper, we cast this task as a program library induction problem by formulating a domain-specific language (DSL) for sketch generation, where a sketch graph is formalized as a program and sketch concepts are modular functions that abstract primitive elements and compose the program (Fig. 1). Discovering sketch concepts thus becomes the induction of library functions from sketch programs. While previous works address the general library induction problem via expensive combinatorial search [20, 5, 7], we present a simple end-to-end deep learning solution for sketch concepts. Specifically, we bridge the implicit and explicit representations of sketch concepts, and separate concept structure generation from parameter instantiation, so that a powerful deep network can detect and generate sketch concepts by training with the inductive objective of reconstructing sketches with modular concepts.

We conduct experiments on a large-scale sketch dataset [17]. The learned sketch concepts provide modular interpretation of design sketches. The network can also be trained on incomplete input sketches and learn to auto-complete them.
Comparisons with state-of-the-art approaches that solve sketch graph generation through autoregressive models show that the modular sketch concepts learned by our approach enable more accurate and interpretable completion results.

[^1]: Work done during an internship at Microsoft Research Asia.

Figure 1: Concept learning from sketch graphs. (a) In black are the raw sketch and its constraint graph, with nodes showing primitives and edges depicting constraints. Colored are the restructured sketch and its modular constraint graph, where each module box represents a concept; primitives and constraint edges are colored according to the modular concepts. (b) The restructured sketch graph in our DSL program representation (List 1), where the whole sketch S is compactly constructed with three instances of two learned L1 types (super/sub-scripts simplified for readability):

```
T1_0  λ(αo_0, αo_1).{ t0: Arc;  t1: Tang.;  t2, t3: Coinc.;
                      R = { t1(t0, αo_0), t2(t0, αo_0), t3(t0, αo_1) } }
T1_1  λ(αi_0, αi_1).{ t0, t1, t2, t3: Line;  t4, t5, t6, t7: Coinc.;
                      t8: Perpend.;  t9: Parallel;  t10: Distance;  t11: Horizon;
                      R = { t4(t0, t3), t5(t0, t2), t6(t1, t2), t7(t1, t3),
                            t8(t1, t2), t9(t2, t3), t10(t2, t3), t11(t3),
                            αi_0(t1), αi_1(t0) } }
S     { t0, t1: T1_0;  t2: T1_1;
        R = { t0(t2.αi_1, t2.αi_0), t1(t2.αi_0, t2.αi_1) } }
```

To summarize, we make the following contributions in this paper:

- We formulate the task of discovering modular CAD sketch concepts as program library induction for a declarative DSL modeling sketch graphs.
- We propose a self-supervised deep learning framework that discovers modular libraries for the DSL with simple end-to-end training.
- We show that the framework learns, from large-scale datasets, sketch concepts capturing intuitive and reusable components, and enables structured sketch interpretation and auto-completion.

## 2 Related work

**Concept discovery for CAD sketches.** It is well acknowledged in the CAD design community that design intents are inherent to, and implicitly encoded by, the combinations of geometric primitives and constraints [12, 10]. However, there is generally no easy way to discover these intents and make them explicit, other than through manual design of meta-templates guided by expert knowledge [9, 10]. We propose an automatic approach to discover such intents, by formulating them as modular structures with self-contained references and learning them through self-supervised inductive training with simple objectives on a large raw sketch dataset. We thereby provide an automatic approach for discovering combinatorially complex structures through end-to-end neural network learning.

**Generative models for CAD sketches.** A series of recent works [6, 24, 13, 18, 25] use autoregressive models [22] to generate CAD sketches and constraints modeled through pointer networks [23]. These works focus on learning from large datasets [17] to generate plausible layouts of geometric primitives and their constraints, which can then be fine-tuned with a constraint solver for more regular sketches. Different from these works, our aim is to discover modular structures (i.e., sketch concepts) from concrete sketches. Our framework therefore provides higher-level interpretation of raw sketches and more transparent auto-completion than these works (cf. Sec. 6).
**Program library induction for CAD modeling.** Program library induction has been studied in the shape modeling domain [7]. General program synthesis assisted by deep learning is a research topic of increasing popularity [20, 3, 4, 19, 5]. The library induction task specifically involves combinatorial search, which has been handled by neural-guided search [20, 5] or by pure stochastic sampling [7]. We instead present an end-to-end learning algorithm for sketch concept induction. In particular, based on key observations about sketch concepts, we present implicit-explicit dual representations of concept library functions, and separate concept structure generation from parameter instantiation, to enable self-supervised training with induction objectives.

List 1: A domain-specific language formulating CAD sketch concepts

```
// Basic data types
Length, Angle, Coord, Ref

// L0 primitive types
Line    { cstart_x, cstart_y, cend_x, cend_y : Coord }
Circle  { ccenter_x, ccenter_y : Coord, lradius : Length }

// L0 constraint types
Coincident        λ(r1, r2 : Ref).{}
ParallelDistance  λ(r1, r2 : Ref).{ ldist : Length }

// L1 composite types
T1_i    λ([α_k : Ref]).{ [t0_ij : T0_j ∈ L0], R_T1i([t0_ij] ∪ [α_k]) }

// Sketch decomposition
S       { [t1_i : T1_i ∈ L1], R_S([t1_i]) }
```

## 3 CAD sketch concept formulation

To capture the notion of sketch concepts precisely, we formulate a domain-specific language (DSL); the syntax is given in List 1, and an exhaustive list of data types is given in the supplementary. In the DSL, we first define the basic data types, including length, angle, coordinate, and the reference type, where a reference binds to another reference or a primitive to model constraint relationships. Second, we define the L0 collection of primitive and constraint types as given in raw sketches. In particular, we regard constraints as functions whose arguments are the references to bind with primitives, e.g., a coincident constraint $c = \lambda(r_1, r_2 : \mathrm{Ref}).\{\}$, where a function is represented in lambda-calculus style (see [14] for an introduction to lambda calculus). Some constraints have parameters other than mere references, which are treated as variables inside, e.g., parallel distance in List 1.[^2]

Third, we define sketch concepts as L1 types composed of L0 types. To be specific, a composite type $T^1_i \in L^1$ is a function with arguments $[\alpha_k]$ and members $t^0_{i,j} : T^0_j \in L^0$, which are connected through a composition operator $R_{T^1_i} = \{\, p(q) \mid p, q \in [t^0_{i,j}] \cup [\alpha_k] \,\}$ that specifies how each pair of elements binds together. For example, a coincident constraint $p = \lambda(r_1, r_2).\{\}$ may take a line primitive $q$ as its first argument and bind to an argument $\alpha_k$ of the composite type as its second argument, i.e., $p(q, \alpha_k) \in R_{T^1_i}$; on the other hand, an argument $\alpha_k$ may bind to a primitive $q$, which is specified by $\alpha_k(q) \in R_{T^1_i}$. Finally, an input sketch S is restructured as a collection of composite type instances $t^1_i : T^1_i \in L^1$, together with their connections specified by a corresponding composition operator $R_S$. $R_S$ records how different concepts bind through their arguments, which further transfers to the L0-typed elements inside the concepts and translates into the raw constraint relationships of the sketch graph. Fig. 1(b) shows an example DSL program encoding a sketch and its concepts. Given the explicit formulation of CAD sketches through a DSL, the discovery of sketch concepts becomes the task of learning the program library L1 by induction over many sketch samples.
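To make the DSL in List 1 concrete, the following is a minimal Python sketch of how L0/L1 types and composition operators could be held as plain data structures. All class and field names here are our own illustration, not the paper's released code.

```python
from dataclasses import dataclass, field

# A reference either binds to a primitive inside a concept or to a
# concept argument (interface slot); both are indices here.
@dataclass
class Ref:
    target: int          # index of a primitive or of a concept argument
    is_argument: bool    # True if it points to a concept argument

# L0 primitive: type name plus scalar parameters (e.g. line endpoints).
@dataclass
class Primitive:
    type_name: str                 # "Line", "Circle", "Arc", "Point", ...
    params: list[float] = field(default_factory=list)

# L0 constraint: type name, references to bind, optional parameters.
@dataclass
class Constraint:
    type_name: str                 # "Coincident", "Parallel", "Distance", ...
    refs: list[Ref] = field(default_factory=list)   # at most two in the dataset
    params: list[float] = field(default_factory=list)

# L1 composite type: an encapsulated structure of L0 elements whose
# outside interactions go only through `num_args` interface arguments.
@dataclass
class Concept:
    primitives: list[Primitive]
    constraints: list[Constraint]
    num_args: int

# A sketch is a set of concept instances plus the cross-concept composition
# operator R_S, stored as (src_concept, src_arg, dst_concept, dst_arg) tuples.
@dataclass
class Sketch:
    instances: list[Concept]
    compositions: list[tuple[int, int, int, int]]
```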
Our task therefore resembles shape program synthesis, which aims at building modular programs for generating shapes [5, 7], and differs from works that use autoregressive language models to generate CAD sketch programs one token at a time [6, 13, 18]. In Sec. 6.2, we show that structured learning of CAD sketches enables more robust auto-completion than unstructured language modeling.

The search for structured concepts is clearly a combinatorial problem of exponential complexity, which is intractable unless we can exploit the inherent patterns in large-scale sketch datasets. To enable deep learning based detection and search of structured concepts, however, we need to bridge implicit deep representations with explicit, interpretable structures. We build this bridge on two key observations:

- **A concept has dual representations: implicit and explicit.** The implicit representation as embeddings in latent spaces is compatible with deep learning, while the explicit representation provides structures on which desired properties (e.g., modularity) can be imposed.
- **A concept is a parameterized structure.** A concept is a composite type with a fixed modular structure for interpretability, but the structure is instantiated by assigning parameters to its component primitives whenever the concept occurs in a sketch.

[^2]: While other works [13, 18] have skipped such constraints, we preserve them but omit generating the parameter values that can be reliably deduced from primitives. See more discussion in the supplementary.

Figure 2: Framework illustration. (a) The detection module is a transformer network that detects, from the sketch sequence $[t^0_i]$, implicitly encoded concepts $[q_i]$ and their composition code $q_R$. (b) Each $q$ is quantized against the concept library $L^1$ to obtain a prototype $q'$, which is expanded by the structure network into an explicit structure $T^1$ and further instantiated by the parameter network into $t^1$. (c) The collection $[t^1_i]$ is assembled by the composition operator $R_S$ generated from $q_R$ to obtain the final generated sketch graph, which is compared with the input sketch for loss computation.

### 3.1 Method overview

Following the two observations, we design an end-to-end sketch concept learning framework trained by self-supervised induction on sketch graphs. As shown in Fig. 2, the framework has two main steps before loss computation: a detection step that generates implicit representations of the concepts making up the input sketch, and an explicit generation step that expands the implicit concepts into concrete structures on which self-supervision targets like reconstruction and modularity are applied.

Building on a state-of-the-art detection architecture [2], the detection module D takes a sketch S as input and detects the modular concepts within it, i.e., $\{q_i\} = D(S, \{\bar q_i\})$, where the detected concepts are represented implicitly as latent codes $\{q_i\}$ and $\{\bar q_i\}$ is a learnable set of concept instance queries. Notably, we apply vector quantization to the latent codes and obtain $\{q'_i = \arg\min_{p \in L^1} \|p - q_i\|_2\}$, which ensures that each concept is selected from the common collection of learnable concept prototypes $L^1$ used for restructuring all sketches (a minimal sketch of this lookup follows below). The explicit generation module is separated into two sub-steps, structure generation and parameter instantiation, which ensures that the modular concept structures are explicit and reused across different sketch instances.
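Before detailing the two sub-steps, here is a minimal sketch of the vector-quantization lookup mentioned above: a standard nearest-neighbor codebook selection with a straight-through gradient. Function and variable names are ours, not the released code.

```python
import torch

def quantize(q: torch.Tensor, library: torch.Tensor):
    """Select the nearest concept prototype for each detected concept code.

    q:       (k_qry, d) detected concept codes.
    library: (num_prototypes, d) learnable concept library L1.
    Returns the quantized codes and their prototype indices.
    """
    # Pairwise L2 distances between codes and prototypes.
    dist = torch.cdist(q, library, p=2)      # (k_qry, num_prototypes)
    idx = dist.argmin(dim=1)                 # nearest prototype per code
    q_quant = library[idx]                   # (k_qry, d)
    # Straight-through estimator: gradients flow back to q as if identity.
    q_st = q + (q_quant - q).detach()
    return q_st, idx
```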
Specifically, the structure network takes each quantized concept code $q'_i$ and generates its explicit form $T^1_i$, in terms of primitives and constraints of L0 types along with the composition operator $R_{T^1_i}$. Subsequently, the parameter network instantiates the concept structure by assigning parameter values to each component of $T^1_i$, conditioned on $q_i$ and the input sketch, to obtain $t^1_i$. The composition operator $R_S$ for combining $[t^1_i]$ is generated from a special latent code $q_R$, transformed by D from a learnable token $\bar q_R$ appended to $[\bar q_i]$. The entire model is trained end-to-end with reconstruction and modularity objectives. In particular, we design loss functions that measure differences between the generated and ground-truth sketch graphs, in terms of both per-element attributes and pairwise references. Given our explicit modeling of the encapsulated structures of learned concepts, we further enhance the modularity of the generation by introducing a bias loss that encourages in-concept references.

## 4 End-to-end sketch concept induction

### 4.1 Implicit concept detection

**Sketch encoding.** A raw sketch S can be serialized into a sequence of L0 primitives and constraints. Previous works have adopted slightly different schemes to encode such sequences [6, 13, 18, 24, 25]. We build on these works and take a simple strategy akin to [13, 25] for input sketch encoding. Specifically, we split each L0-typed instance $t^0$ into several tokens: type, parameter, and a list of references. For each token category, we use a specific embedding module. For example, parameters as scalars are quantized into finite bins before being embedded as vectors (see the supplementary for quantization details), and since there are at most five parameters per primitive, we pack all parameter embeddings into a single code. Each constraint reference, being a primitive index, is directly embedded as a code. Each token of an L0-typed instance is thus encoded as

$$e_{t^0.x} = \mathrm{enc_{type}}(t^0) + \mathrm{enc_{pos}}(t^0.x) + \left[\mathrm{enc_{param}}(t^0.x) \,\middle|\, \mathrm{enc_{ref}}(t^0.x)\right], \tag{1}$$

where $t^0.x$ iterates over the split tokens (i.e., type, parameters, references), the type embedding is shared by all tokens of the instance, the position embedding counts the token index in the whole split-tokenized sequence of S, and the parameter or reference embedding is applied where applicable.

**Concept detection.** We build the detection network as an encoder-decoder transformer following [2]. The transformer encoder operates on the encoded sketch sequence $[e_{t^0_i} \in S]$ and produces the contextualized sequence $[e'_{t^0_i} \in S]$ through layers of self-attention and feed-forward. The transformer decoder takes a learnable set of concept queries $[\bar q_i]$ of size $k_{qry}$, plus a special query $\bar q_R$ for composition generation, and applies interleaved self-attention, cross-attention to $[e'_{t^0_i}]$, and feed-forward layers to obtain the implicit concept codes $[q_i]$ and $q_R$. The concept codes are further quantized into $[q'_i]$ by selecting concept prototypes from a library $L^1$ implicitly encoding L1, before being expanded into explicit forms.
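Eq. (1) sums a shared type embedding, a position embedding, and a parameter or reference embedding per token. A minimal sketch of such an embedding module follows; all dimensions and vocabulary sizes are illustrative assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    """Token embedding in the spirit of Eq. (1); names and sizes are ours."""

    def __init__(self, d_model=256, num_types=32, num_bins=64,
                 max_tokens=512, max_prims=64, params_per_prim=5):
        super().__init__()
        self.type_emb = nn.Embedding(num_types, d_model)   # shared per instance
        self.pos_emb = nn.Embedding(max_tokens, d_model)   # token index in sequence
        # One embedding table per parameter slot; scalars are pre-quantized bins.
        self.param_emb = nn.ModuleList(
            nn.Embedding(num_bins, d_model) for _ in range(params_per_prim))
        self.ref_emb = nn.Embedding(max_prims, d_model)    # referenced primitive index

    def embed_param_token(self, type_id, pos, param_bins):
        # Pack all parameter embeddings of one instance into a single code.
        e = self.type_emb(type_id) + self.pos_emb(pos)
        for slot, b in enumerate(param_bins):
            e = e + self.param_emb[slot](b)
        return e

    def embed_ref_token(self, type_id, pos, ref_index):
        return self.type_emb(type_id) + self.pos_emb(pos) + self.ref_emb(ref_index)
```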
### 4.2 Explicit concept structure generation

**Concept structure expansion.** Given a library code $q' \in L^1$ representing a type $T^1 \in L^1$, an MLP expands its explicit structure as a collection of codes $[\hat t^0_i]$ representing the L0 type instances $[t^0_i]$, and a matrix representing the composition $R_{T^1}$ of $[t^0_i]$ and the arguments (cf. List 1).

[Inset: two concept boxes A and B exchanging references through $R_S$; legend shows primitives, constraints, inward arguments, and outward arguments.]

We fix the maximum number of L0 type instances per concept to $k_{L^0}$ (12 by default), and split the arguments into two groups, inward and outward, each of maximum number $k_{arg}$ (2 by default). Each type code $\hat t^0_i$ is decoded into discrete probabilities over L0, with an additional probability for the null type $\phi$ indicating the emptiness of this element (cf. Sec. 5.1), by $\mathrm{dec_{type}}(\cdot)$, the inverse of $\mathrm{enc_{type}}(\cdot)$ in Sec. 4.1. An inward argument only points to a primitive inside the concept structure and originates from a constraint outside; conversely, an outward argument only points to primitives outside and originates from a constraint inside the concept (see inset). This split into two groups eases composition computation, as discussed below.

The composition operator $R_{T^1}$ is implemented as an assignment matrix $R_{T^1}$ of shape $(2k_{L^0}+k_{arg}) \times (k_{L^0}+k_{arg})$, where each row corresponds to a constraint reference or an inward argument, and each column to a primitive or an outward argument. The factor of two on constraint references comes from the fact that every constraint we consider in the dataset [17] has at most two arguments. Each row is a discrete probability distribution with $\sum_j R_{T^1}[i,j] = 1$, the maximum entry signifying that the i-th constraint reference or inward argument refers to the j-th primitive or outward argument. We compute $R_{T^1}$ by first mapping the concept code $q'$ to a matrix of logits of the same shape, and then applying a softmax transform to each row. Notably, we avoid meaningless loops, i.e., an element referring back to itself and inward arguments referring to outward arguments, by masking the diagonal blocks $R_{T^1}[2i{:}2i{+}2,\ i]$, $i \in [k_{L^0}]$, and the argument block $R_{T^1}[2k_{L^0}{:},\ k_{L^0}{:}]$, setting their logits to $-\infty$.

**Cross-concept composition.** Aside from references inside a concept, references across concepts are generated to complete the whole sketch graph. We achieve cross-concept references by argument passing (see inset above). In particular, we implement the cross-concept composition operator $R_S$ as an assignment matrix $R_S$ of shape $(k_{qry} \cdot k_{arg}) \times (k_{qry} \cdot k_{arg})$, mapped directly from $q_R$ through an MLP. As with the in-concept composition matrix, each row of the cross-concept matrix is a discrete distribution with $\sum_j R_S[i,j] = 1$, the maximum entry signifying that the $(i \bmod k_{arg})$-th outward argument of the $\lfloor i/k_{arg} \rfloor$-th concept instance refers to the $(j \bmod k_{arg})$-th inward argument of the $\lfloor j/k_{arg} \rfloor$-th concept instance. The complete cross-concept reference is therefore the product of three transport matrices:

$$R_{cref}[t^1_i, t^1_j] = R_{t^1_i}[\,{:}2k_{L^0},\ k_{L^0}{:}\,]\ \cdot\ R_S[\,i\,k_{arg}{:}(i{+}1)k_{arg},\ j\,k_{arg}{:}(j{+}1)k_{arg}\,]\ \cdot\ R_{t^1_j}[\,2k_{L^0}{:},\ {:}k_{L^0}\,], \tag{2}$$

where $R_{cref}[t^1_i, t^1_j]$ is a block assignment matrix of shape $2k_{L^0} \times k_{L^0}$. Intuitively, $R_{cref}[t^1_i, t^1_j]$ specifies how constraints inside $t^1_i$ refer to primitives of $t^1_j$ through all possible paths crossing the arguments of the two concepts. Collectively, we denote the complete reference matrix over all pairs of generated L0 elements, including both in-concept and cross-concept references, as R, of shape $(2 k_{qry} k_{L^0}) \times (k_{qry} k_{L^0})$.
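The row-wise softmax with masked logits, and the three-matrix product of Eq. (2), can be sketched as follows. Tensor layouts follow the shapes above, but the helper names are our own.

```python
import torch
import torch.nn.functional as F

def in_concept_composition(logits, k_l0, k_arg):
    """Row-wise softmax over composition logits with invalid entries masked.

    logits: (2*k_l0 + k_arg, k_l0 + k_arg) raw scores from the structure MLP.
    Rows are constraint references / inward args; columns are primitives /
    outward args.
    """
    mask = torch.zeros_like(logits, dtype=torch.bool)
    for i in range(k_l0):
        mask[2 * i:2 * i + 2, i] = True      # element referring back to itself
    mask[2 * k_l0:, k_l0:] = True            # inward arg -> outward arg
    return F.softmax(logits.masked_fill(mask, float("-inf")), dim=-1)

def cross_concept_reference(R_i, R_j, R_S, i, j, k_l0, k_arg):
    """Product of three transport matrices, as in Eq. (2).

    R_i, R_j: in-concept composition matrices of concepts t1_i and t1_j.
    R_S: (k_qry*k_arg, k_qry*k_arg) cross-concept assignment matrix.
    Returns a (2*k_l0, k_l0) block: how constraints in t1_i refer to
    primitives in t1_j via outward->inward argument passing.
    """
    refs_to_out = R_i[:2 * k_l0, k_l0:]                            # (2k_l0, k_arg)
    arg_passing = R_S[i * k_arg:(i + 1) * k_arg,
                      j * k_arg:(j + 1) * k_arg]                   # (k_arg, k_arg)
    in_to_prims = R_j[2 * k_l0:, :k_l0]                            # (k_arg, k_l0)
    return refs_to_out @ arg_passing @ in_to_prims
```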
### 4.3 Concept instantiation by parameter generation

Instantiating a concept structure requires assigning parameters to its components where applicable. Therefore, as shown in Fig. 2, the parameter generation network takes a concept structure $T^1$ and its implicit instance encoding $q$ as input, and produces the specific parameters for each L0-typed instance inside the concept. In addition, since the parameters of generated instances relate directly to the input parameters of the raw sketch S, we find that letting the parameter network attend to the input tokens improves convergence and accuracy. We implement the parameter network as a transformer decoder similar to [2]. The instance code $q$ is first expanded into $k_{L^0}$ tokens by a small MLP, which are summed token-wise with $[\hat t^0_i]$ to obtain the query codes. The parameter network then transforms the query codes through interleaved layers of self-attention, cross-attention to the contextualized input sequence $[e'_{t^0}]$, and feed-forward, before finally mapping them to explicit parameters in the form of probabilities over quantized bins, through a decoding layer $\mathrm{dec_{param}}(\cdot)$ that is the inverse of $\mathrm{enc_{param}}(\cdot)$ in Sec. 4.1.

## 5 Induction objectives

Without any given concept labels, we supervise the inductive network training with the following objectives: sketch reconstruction, concept quantization, and modularity enhancement.

### 5.1 Reconstruction loss

As discussed in Sec. 4, an input sketch is restructured into a set of sketch concepts, which are expanded into a graph of primitives and constraints; the generated sketch graph $\tilde S$ is compared with the input sketch S for reconstruction supervision. Comparing the generated and target graphs requires a one-to-one correspondence between their elements, on which graph differences can be measured. Finding such a matching is nontrivial: not only do the two graphs have variable numbers of elements, but both the elements and the references between them must be taken into account. To this end, we build a cost matrix that measures the difference for each pair of generated and target elements, in terms of their attributes and references, and apply linear assignment matching on the cost matrix [11, 8] to establish the optimal correspondence between the two graphs.

**Cost matrix construction.** To compare a pair of generated and target elements, we measure their type differences, and further use type casting to interpret the generated element as the target type so that their parameters can be compared. To account for reference differences between the two elements, we compare the reference arrows by the differences of the primitives they point to.

[Inset: binary cost between a target constraint p and a generated element q; green denotes primitives, orange constraints; p's r-th referenced primitive p.r is compared against q's reference distribution R[2q+r, :].]

Specifically, given the target graph S of $k_{tgt}$ elements and the generated graph $\tilde S$ of $k_{qry} \cdot k_{L^0}$ elements, we build the cost matrix C of shape $k_{tgt} \times (k_{qry} \cdot k_{L^0})$ in two steps. First, for a pair of target element p and generated element q, we compare their type and parameter differences by cross-entropy; we denote the cost matrix at this stage as $C_{ury}$, as it accounts for the element-wise unary distances between the two graphs. Second, to measure the binary distances of references, for each target constraint p and its r-th referenced primitive p.r, the distance from the generated references of element q is computed as (see inset)

$$C_{bry}[p, q] = \sum_{j \in \tilde S} R[2q + r,\ j]\ C_{ury}[p.r,\ j], \tag{3}$$

where $R[2q+r, j]$ is the probability of q taking j as its r-th reference, as predicted by the composition operation (Sec. 4.2). Intuitively, the binary cost is a sum of unary costs weighted by predicted reference probabilities, where each unary cost measures how different a generated pointed primitive is from the target pointed primitive. The complete cost matrix is $C = w_{ury} C_{ury} + w_{bry} C_{bry}$, with $w_{ury} = 50$ and $w_{bry} = 1$; we give a larger weight to the unary costs because meaningful binary costs depend on reliable unary costs in the first place, as evident in Eq. (3).
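A sketch of the binary cost of Eq. (3) and the subsequent linear-assignment matching, using SciPy's Hungarian solver [8, 11], follows. The data layout of target references is our assumption for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def binary_cost(R, C_unary, tgt_refs):
    """Binary reference cost, Eq. (3).

    R: (2*k_gen, k_gen) predicted reference distributions (row 2q+r is the
       r-th reference distribution of generated element q).
    C_unary: (k_tgt, k_gen) unary type/parameter costs.
    tgt_refs: dict mapping (target constraint p, ref slot r) -> index of
              the target primitive p.r (only existing references).
    """
    k_tgt, k_gen = C_unary.shape
    C_bry = np.zeros((k_tgt, k_gen))
    for (p, r), p_r in tgt_refs.items():
        # Expected unary cost of each q's r-th pointed primitive vs. p.r.
        C_bry[p] += R[2 * np.arange(k_gen) + r, :] @ C_unary[p_r]
    return C_bry

def match_graphs(C_unary, C_bry, w_ury=50.0, w_bry=1.0):
    """Optimal one-to-one matching between target and generated elements.

    Returns (tgt_idx, gen_idx); generated elements left unmatched are
    later supervised toward the empty type phi.
    """
    C = w_ury * C_unary + w_bry * C_bry
    return linear_sum_assignment(C)   # Hungarian algorithm
```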
**Matching and reconstruction loss.** Given the cost matrix C, we apply linear assignment to obtain a matching $\sigma: \tilde S \to S \cup \{\phi\}$ between $\tilde S$ and S. Note that the numbers of elements in the two graphs can differ, but we have chosen $k_{qry}, k_{L^0}$ such that the generated elements always cover the target elements. Therefore, $\sigma(q) = p$ assigns each matched generation $q \in M \subseteq \tilde S$ to a target $p \in S$, and the remaining unmatched generations $M' = \tilde S \setminus M$ to the empty target, i.e., $\sigma(M') = \phi$. The loss terms for matched generations are simply the corresponding cost terms $C[\sigma(q), q],\ q \in M$; for unmatched generations, we supervise the type to be the empty type $\phi$ and neglect the parameters and references. We denote the average loss over all generated terms as $L_{recon}$.

Besides the matching cost, we use an additional reference loss to encourage the generated references to be sharp (i.e., R being close to binary). This loss complements the binary costs above by ensuring that even if several generated primitives are similar, a generated constraint refers to exactly one of them sharply. We define the sharp reference loss as

$$L_{sharp} = -\sum_{p \in S_c,\, r} \log R\left[2\sigma^{-1}(p) + r,\ \sigma^{-1}(p.r)\right], \tag{4}$$

where p iterates over the target constraints $S_c$, $\sigma^{-1}(\cdot)$ is the inverse mapping from target element to generation, and we skip a term if p.r does not exist (for constraints with a single reference).

### 5.2 Concept quantization loss

Following [21, 16], we optimize the concept code quantization against the library $L^1$ with

$$L_{vq} = \frac{1}{k_{qry}} \sum_{i \in [k_{qry}]} \left\| \mathrm{sg}(q_i) - q'_i \right\| + \beta \left\| q_i - \mathrm{sg}(q'_i) \right\|, \tag{5}$$

where $\mathrm{sg}(\cdot)$ is the stop-gradient operation. For training stability, we follow [16] and replace the first term with EMA updates of $q' \in L^1$. Furthermore, we improve the usage of spare codes by periodically reviving unused entries of $L^1$ [16] (please refer to the supplementary for details).

### 5.3 Modularity enhancement loss

We look for modular L1 concepts that have rich and meaningful encapsulated structures, rather than arbitrary groups of L0 elements that rely on cross-group references to recover the graph structures. This modularity can be enhanced by limiting the use of arguments in sketch concepts. Instead of allocating very few arguments as a hard constraint, we introduce a soft bias loss that encourages restrictive use of arguments while still covering cases where more arguments are needed for accurate reconstruction. Specifically, we penalize the accumulated probability of elements pointing to outward arguments:

$$L_{bias} = \frac{1}{|S_c|} \sum_{p \in S_c,\, r}\ \sum_{i \in [k_{arg}]} R_{T^1_{\sigma^{-1}(p)}}\left[2\sigma^{-1}(p) + r,\ i + k_{L^0}\right], \tag{6}$$

where again p iterates over the target constraints $S_c$, $\sigma^{-1}(\cdot)$ is the inverse mapping from target element to generation, and $i + k_{L^0}$ slices the reference probabilities at the outward arguments.

### 5.4 Total loss

The training objective sums the losses for reconstruction, concept quantization, and modularity bias:

$$L_{total} = w_{recon} L_{recon} + w_{sharp} L_{sharp} + w_{vq} L_{vq} + w_{bias} L_{bias}, \tag{7}$$

where we empirically use $w_{recon} = 1$, $w_{sharp} = 20$, $w_{vq} = 1$, $w_{bias} = 25$ throughout all experiments, unless otherwise specified in the ablation studies.
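Minimal sketches of the loss terms in Eqs. (4), (5), and (7) follow. The commitment weight beta and the squared-L2 form of the VQ term follow common VQ-VAE practice [21] rather than values stated in the paper; helper names are ours.

```python
import torch

def sharp_reference_loss(R, rows, cols):
    """Eq. (4): negative log-probability of each target constraint reference
    resolving to its matched generated primitive.
    rows[i] = 2*sigma^{-1}(p) + r, cols[i] = sigma^{-1}(p.r)."""
    return -torch.log(R[rows, cols] + 1e-9).sum()

def vq_loss(q, q_proto, beta=0.25):
    """Eq. (5) in squared-L2 VQ-VAE form; beta is an assumed value.
    q: (k_qry, d) concept codes; q_proto: (k_qry, d) nearest prototypes."""
    codebook = (q.detach() - q_proto).pow(2).sum(-1).mean()  # pulls prototypes to codes
    commit = (q - q_proto.detach()).pow(2).sum(-1).mean()    # pulls codes to prototypes
    return codebook + beta * commit

def total_loss(L_recon, L_sharp, L_vq, L_bias):
    """Eq. (7) with the paper's default weights."""
    return 1.0 * L_recon + 20.0 * L_sharp + 1.0 * L_vq + 25.0 * L_bias
```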
## 6 Experiments

**Dataset and implementation.** Following previous works [6, 13, 18], we adopt the SketchGraphs dataset [17], which contains millions of real-world CAD sketches, for training and evaluation. We filter the data by removing trivially simple sketches and duplicates, and limit sketch complexity such that the number of primitives and constraints is within [20, 50]. As a result, we obtain around 1 million sketches, randomly split into 950k for training and 50k for testing. We defer network details to the supplementary, and open-source the code and data to facilitate future research.[^3]

[^3]: URL to code and data: https://github.com/yyuezhi/SketchConcept

**Evaluation metrics.** We evaluate the generated sketches in terms of reconstruction accuracy and sketch concept modularity, the two major objectives of our task. We measure reconstruction accuracy by the F-scores of generated primitives and constraints, where the F-score is simply the harmonic mean of precision and recall (a minimal helper is sketched after Table 1). A generated primitive is considered a correct match with ground truth if its type and parameters are correct, where for scalar parameters we allow a threshold of 10% of the quantization levels. A constraint is correct if and only if its type, parameters, and references match ground truth, i.e., a generated q is correct w.r.t. target p iff q has the same type and parameters as p and the primitives q.r and p.r are correctly matched. Modularity is measured by the percentage of constraints whose references lie entirely within the encapsulating concepts, among all correct constraints.

Figure 3: Design intent parsing. Left: input raw sketches and sketches restructured with concepts. Right: raw constraint graphs and modular constraint graphs. Primitives and constraints in the restructured sketches and graphs are colored according to concepts.

Figure 4: Instances of concepts. Left: two learned rectangle concepts with subtly different structures. Right: four sketches containing instances of these two concepts.

Figure 5: Auto-completion. Each example shows the input partial sketch (black) and ground-truth completion (red), the result of the autoregressive baseline, and our result (colored by concepts).

Figure 6: Auto-completion comparison. Plotted are F-scores of (a) primitives and (b) constraints at different ratios of partial input, for the autoregressive baseline and ours.

| Config | Primitive | Constraint | Modular (%) |
|---|---|---|---|
| w/o $C_{bry}$ | 0.993 | 0.290 | 34.2 |
| w/o $L_{sharp}$ | 0.983 | 0.104 | 100 |
| w/o $L_{bias}$ | 0.992 | 0.743 | 13.3 |
| Ours | 0.994 | 0.766 | 50.8 |

Table 1: Loss ablation. F-scores are reported for primitives and constraints.
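For reference, the F-score used by both reconstruction metrics is the plain harmonic mean of precision and recall; a minimal helper (ours, for illustration):

```python
def f_score(num_correct, num_generated, num_target):
    """Harmonic mean of precision and recall, as used for the
    primitive/constraint reconstruction metrics."""
    if num_generated == 0 or num_target == 0:
        return 0.0
    precision = num_correct / num_generated
    recall = num_correct / num_target
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```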
### 6.1 Design intent interpretation

By training our model on raw sketches with the self-supervised induction losses, we obtain a library of sketch concepts and a model for design intent parsing that interprets a given sketch as modular concepts and their combination. We find that the automatically discovered concepts capture natural design intents and modular structures. For example, from the restructured sketches and constraint graphs in Figs. 1 and 3, we find that our network decomposes sketches into modular structures like rectangles, line-arcs, and symmetrically aligned parallel lines, even though no such prior knowledge is applied during training beyond concept modularity. Fig. 4 shows that a given concept can be used repeatedly across different sketches, and that structures with subtle differences in constraint relations can be detected and distinguished as different concepts in the library. Note that these subtle structural differences are subsumed in the input sketch graph, which makes them harder to detect. We refer to the supplementary for more examples of design intent parsing and instantiations of learned concepts, as well as quantitative analysis of the learned library.

### 6.2 Auto-completion

Auto-completion is a critical feature of CAD modeling software for assisting designers. Given a partial sketch of primitives and constraints, auto-completion aims at completing it with the remaining primitives and constraints to form a regular and well-structured design. Our concept detection and generation approach therefore naturally enhances the auto-completion task with better regularity. For training and evaluation, following previous work [18], we synthesize the partial input by removing a suffix of random length (up to 50%) from the sketch sequence, along with the constraints that refer to the removed primitives, and train the model to generate the full sketches (a minimal sketch of this synthesis follows at the end of this subsection).

State-of-the-art methods [6, 13, 18, 24] formulate auto-completion through a combination of primitive and constraint generation models, both operating in an autoregressive fashion, with the constraint model conditioned on and referring (by pointers [23]) to the generated primitives. Since these works use diverse sketch encodings and had no publicly released code at submission time, for fair comparison we implement the autoregressive baseline with our sketch encoding (Sec. 4.1).

Fig. 6 compares our method with the autoregressive baseline under various primitive mask ratios: our method achieves higher primitive and constraint accuracy than the autoregressive baseline at almost all mask ratios. This difference confirms that because our method completes sketches concept-by-concept instead of primitive-by-primitive, more meaningful structures are likely to be generated. Our model also gains an advantage by taking primitives and constraints together as input and generating both simultaneously, whereas the autoregressive baseline separates generation into two steps (primitives followed by constraints). Indeed, in practice CAD designers rarely finish all primitives before supplementing the constraints; rather, they apply constraints to partial primitives immediately whenever these form a design intent. Fig. 5 shows how our approach interprets partial inputs and completes them with modular concepts (see the supplementary for more examples); in comparison, the autoregressive baseline does not provide such interpretable or regular completions.
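The partial-input synthesis described above can be sketched as follows; the constraint record layout (a dict with a `refs` field of primitive indices) is our assumption for illustration.

```python
import random

def make_partial_sketch(primitives, constraints, max_ratio=0.5):
    """Synthesize a partial sketch for auto-completion training: drop a
    random-length suffix of the primitive sequence and all constraints
    referring to removed primitives."""
    n = len(primitives)
    num_removed = random.randint(0, int(max_ratio * n))
    kept = n - num_removed
    kept_prims = primitives[:kept]
    kept_cons = [c for c in constraints
                 if all(r < kept for r in c["refs"])]  # drop dangling constraints
    return kept_prims, kept_cons
```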
### 6.3 Ablation study

To evaluate the impact of the different loss terms of the induction objective (Sec. 5), we train several models, each with one of these losses removed, on the auto-encoding task. The results are shown in Table 1. Removing the binary costs $C_{bry}$ from the reconstruction loss results in a significant drop in constraint reconstruction, showing their necessity for constraint reference learning. Removing the sharp reference loss $L_{sharp}$ similarly fails constraint reference learning, although the modularity enhancement bias loss still pushes all constraint references inside concepts. Removing the modularity enhancement bias loss $L_{bias}$ results in only a slight drop in reconstruction quality but a significant drop in modularity, since without it cross-concept reference through arguments becomes more likely and modularity suffers. We provide more ablation tests on hyper-parameters, such as the numbers of concept queries $k_{qry}$ and arguments $k_{arg}$, in the supplementary.

## 7 Conclusion

CAD sketch concepts are meta-structures of primitives and constraints that define modular sub-graphs and capture design intents. By formulating sketch concepts as program libraries of a DSL, we present an end-to-end approach for discovering CAD sketch concepts through library induction learning. Key to our approach are the implicit-explicit representation of concepts and the separation of structure generation from parameter instantiation, which together enable end-to-end training under self-supervised induction objectives. Trained on a large-scale sketch dataset, our approach discovers repetitive and modular concepts from raw sketches, and produces more structured and interpretable auto-completions than baseline autoregressive models.

**Limitations and future work.** Design intents can be hierarchical [10], meaning that higher-order meta-structures can be built out of lower-order ones. In this sense, our framework only addresses first-order library induction and should be extended to higher-order library learning; toward this goal, we believe a progressive approach like [5] can be used, with our framework as the one-step induction. In addition, similar end-to-end induction learning strategies can be applied to constraint graphs involving 3D CAD operations, or even to more general programs in other domains, as long as they have declarative and parametric structures similar to sketch graphs.

## References

[1] Jorge D. Camba, Manuel Contero, and Pedro Company. Parametric CAD modeling: An analysis of strategies for design reusability. Computer-Aided Design, 74, 2016.
[2] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In ECCV, 2020.
[3] Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, and Josh Tenenbaum. Learning to infer graphics programs from hand-drawn images. In Advances in Neural Information Processing Systems, volume 31, 2018.
[4] Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, and Armando Solar-Lezama. Write, execute, assess: Program synthesis with a REPL. In Advances in Neural Information Processing Systems, volume 32, 2019.
[5] Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sablé-Meyer, Lucas Morales, Luke Hewitt, Luc Cary, Armando Solar-Lezama, and Joshua B. Tenenbaum. DreamCoder: Bootstrapping inductive program synthesis with wake-sleep library learning. In Proceedings of the ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI), 2021.
[6] Yaroslav Ganin, Sergey Bartunov, Yujia Li, Ethan Keller, and Stefano Saliceti. Computer-aided design as language. In Advances in Neural Information Processing Systems, volume 34, 2021.
[7] R. Kenny Jones, David Charatan, Paul Guerrero, Niloy J. Mitra, and Daniel Ritchie. ShapeMOD: Macro operation discovery for 3D shape programs. ACM Transactions on Graphics (SIGGRAPH), 40(4), 2021.
[8] Harold W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:83-97, 1955.
[9] Sofia Kyratzi and Philip Azariadis. A constraint-based framework to recognize design intent during sketching in parametric environments. Computer-Aided Design and Applications, 18(3), 2021.
[10] Sofia Kyratzi and Philip Azariadis. Integrated design intent of 3D parametric models. Computer-Aided Design, 146, 2022.
[11] James Munkres. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 5(1):32-38, 1957.
[12] Jeffrey Otey, Pedro Company, Manuel Contero, and Jorge D. Camba. Revisiting the design intent concept in the context of mechanical CAD education. Computer-Aided Design and Applications, 15(1), 2018.
[13] Wamiq Para, Shariq Bhat, Paul Guerrero, Tom Kelly, Niloy Mitra, Leonidas J. Guibas, and Peter Wonka. SketchGen: Generating constrained CAD sketches. In Advances in Neural Information Processing Systems, volume 34, 2021.
[14] Benjamin C. Pierce. Types and Programming Languages. MIT Press, 2002.
[15] PTC Inc. Onshape, 2022. URL https://www.onshape.com/en/. Accessed Apr. 20, 2022.
[16] Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with VQ-VAE-2. In Advances in Neural Information Processing Systems, volume 32, 2019.
[17] Ari Seff, Yaniv Ovadia, Wenda Zhou, and Ryan P. Adams. SketchGraphs: A large-scale dataset for modeling relational geometry in computer-aided design. In ICML 2020 Workshop on Object-Oriented Learning, 2020.
[18] Ari Seff, Wenda Zhou, Nick Richardson, and Ryan P. Adams. Vitruvion: A generative model of parametric CAD sketches. In International Conference on Learning Representations, 2022.
[19] Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, and Jiajun Wu. Learning to infer and execute 3D shape programs. In International Conference on Learning Representations, 2019.
[20] Lazar Valkov, Dipak Chaudhari, Akash Srivastava, Charles Sutton, and Swarat Chaudhuri. HOUDINI: Lifelong learning as program synthesis. In International Conference on Neural Information Processing Systems, 2018.
[21] Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. In Advances in Neural Information Processing Systems, volume 30, 2017.
[22] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, 2017.
[23] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In Advances in Neural Information Processing Systems, volume 28, 2015.
[24] Karl D. D. Willis, Pradeep Kumar Jayaraman, Joseph G. Lambourne, Hang Chu, and Yewen Pu. Engineering sketch generation for computer-aided design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2021.
[25] Rundi Wu, Chang Xiao, and Changxi Zheng. DeepCAD: A deep generative network for computer-aided design models. In IEEE International Conference on Computer Vision (ICCV), 2021.

## Checklist

1. For all authors...
   (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]
   (b) Did you describe the limitations of your work? [Yes] Please see Sec. 7.
   (c) Did you discuss any potential negative societal impacts of your work? [Yes] Please see Appendix.
   (d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
   (a) Did you state the full set of assumptions of all theoretical results? [N/A] We present a deep learning algorithm rather than theorems in this paper.
   (b) Did you include complete proofs of all theoretical results? [N/A]
3. If you ran experiments...
   (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We provided the URL to the open-source repository in Sec. 6.
   (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Please refer to Sec. 6 and the Appendix.
   (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [No] We did not report error bars due to a lack of computational resources for running all comparisons and ablations multiple times consistently. However, the contrasts between different methods are very clear, and running the different methods with non-uniform repetitions led to the same conclusions.
   (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Please see Appendix.
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
   (a) If your work uses existing assets, did you cite the creators? [Yes] Please see Sec. 6.
   (b) Did you mention the license of the assets? [Yes] The citation [17] clearly states that the SketchGraphs dataset we use has an MIT license.
   (c) Did you include any new assets either in the supplemental material or as a URL? [Yes] We have provided the URL to the open-source repository in Sec. 6.
   (d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [Yes] The MIT license of [17] permits research-purpose usage as done in this paper.
   (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A] The CAD sketch graph data used in this paper is not personally related.
5. If you used crowdsourcing or conducted research with human subjects...
   (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
   (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
   (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]