# DMesh: A Differentiable Mesh Representation

Sanghyun Son¹ Matheus Gadelha² Yang Zhou² Zexiang Xu² Ming C. Lin¹ Yi Zhou²
¹University of Maryland, College Park, {shh1295,lin}@umd.edu
²Adobe Research, {gadelha,yazhou,zexu,yizho}@adobe.com

Figure 1: Optimization process. We can optimize our mesh starting from either (a) a random state or (b) an initialization based on sample points for faster convergence. Mesh connectivity changes dynamically during the optimization. To make this topology change possible, we compute the existence probability of an arbitrary set of faces in a differentiable manner.

Figure 2: Versatility of DMesh. DMesh can represent diverse geometry in a differentiable manner, including (a) non-convex polyhedra of different Euler characteristics, (b) non-orientable geometries (Möbius strip, Klein bottle), and (c) a complex protein structure (colored for aesthetic purposes).

We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first obtain a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and select triangular faces on the tetrahedra to define the final mesh. Based on the WDT, we formulate the probability that a face exists on the actual surface in a differentiable manner. This enables DMesh to represent meshes of various topologies in a differentiable way, and allows us to reconstruct the mesh under various observations, such as point clouds and multi-view images, using gradient-based optimization. We publicize the source code and supplementary material at our project page¹.

¹https://sonsang.github.io/dmesh-project

38th Conference on Neural Information Processing Systems (NeurIPS 2024).

1 Introduction

Polygonal meshes are widely used in modeling and animation due to their diverse, compact, and explicit configuration. Recent AI progress has spurred efforts to integrate mesh generation into machine learning, but challenges like varying topology hinder the design of suitable differentiable mesh representations. This limitation leads to reliance on differentiable intermediates like implicit functions, and subsequent iso-surface extraction for mesh creation (Liao et al., 2018; Guillard et al., 2021; Munkberg et al., 2022; Shen et al., 2023, 2021; Liu et al., 2023b). However, meshes generated by such approaches can be misaligned at sharp regions and unnecessarily dense (Shen et al., 2023), which makes them unsuitable for downstream applications that require lightweight meshes. This limitation motivates us to develop a truly differentiable mesh representation, rather than intermediate forms.

The fundamental challenge in creating a differentiable mesh representation lies in formulating both the vertices' geometric features and their connectivity, defined as edges and faces, in a differentiable way. Given a vertex set, predicting their connectivity in a free-form way using existing machine learning data structures can cost a significant amount of computation and makes it difficult to avoid irregular and intersecting faces. Consequently, most studies on differentiable meshes simplify the task by using a mesh with a pre-determined topology and modifying it through various operations (Zhou et al., 2020; Hanocka et al., 2019; Palfinger, 2022; Nicolet et al., 2021).
This work, on the contrary, ambitiously aims to establish a general 3D mesh representation, named DMesh, in which both mesh topology and geometric features (e.g., encoded in vertex locations) can be simultaneously optimized through gradient-based techniques. Our core insight is to use a differentiable Weighted Delaunay Triangulation (WDT) to divide a convex domain, akin to amber encapsulating a surface mesh, into tetrahedra. To create a mesh with arbitrary topology, we select only a subset of triangular faces from the tetrahedra, termed the "real part", as our final mesh. The other faces, the "imaginary part", support the real part but are not part of the final mesh (Figure 4). We introduce a method to assess the probability of a face being part of the mesh based on weighted points that carry positional and inclusiveness information. Optimization then focuses on the points' features to generate the triangular mesh. This probability determination allows us to compute geometric losses and rendering losses during gradient-based optimization of both connectivity and positioning.

The key contributions of our work can be summarized as follows. We present a novel differentiable mesh representation, DMesh, which is versatile enough to accommodate various types of meshes (Figure 2). The generated meshes can represent shapes more effectively, with far fewer vertices and faces (Table 2). We propose a computationally efficient approach to differentiable WDT, which produces robust probability estimations: while the exhaustive approach (Rakotosaona et al., 2021) requires quadratic computational cost, our method runs in approximately linear time. We provide efficient algorithms for reconstructing surfaces from both point clouds and multi-view images using DMesh as an intermediate representation. We finally propose an effective regularization term which can be used for mesh simplification and for enhancing triangle quality. Additionally, to further accelerate the algorithm, we implemented our main algorithm and differentiable renderer in CUDA, which is made available for further research.

2 Related Work

2.1 Shape Representations for Optimization

Recently, neural implicit functions for shape representation have gained popularity in graphics and vision applications (Mildenhall et al., 2021; Zhang et al., 2020; Liu et al., 2020; Chen et al., 2022a, 2023; Wang et al., 2021; Yariv et al., 2020). They mainly use volume density, inspired by (Mildenhall et al., 2021), to represent a shape. However, because of its limited accuracy in 3D surface representation, neural signed distance functions (SDFs) (Yariv et al., 2021; Wang et al., 2021, 2023; Oechsle et al., 2021) or unsigned distance functions (UDFs) (Liu et al., 2023a; Long et al., 2023) are often preferred.

Figure 3: Our overall framework for optimizing the mesh according to the given observations. (a): Each point is defined by a 5-dimensional feature vector: position, weight, and real value. Points with larger real values are rendered in red. (b): Given a set of points, we gather possibly existing faces in the mesh and evaluate their probability in a differentiable manner. (c): We can compute a reconstruction loss based on given observations, such as a mesh, a point cloud, or multi-view images. (d): To facilitate the optimization process and enhance the mesh quality, we can use additional regularizations.

After optimization, one can recover meshes using iso-surface extraction techniques (Lorensen and Cline, 1998; Ju et al., 2002).
Differing from neural representations, another class of methods directly produces meshes and optimizes them. However, these methods assume that the overall mesh topology is fixed (Chen et al., 2019; Nicolet et al., 2021; Liu et al., 2019; Laine et al., 2020), allowing only local connectivity changes through remeshing (Palfinger, 2022). Learning-based approaches like BSP-Net (Chen et al., 2020) allow topological variation, but their meshing process is not differentiable. Recently, differentiable isosurface extraction techniques have been developed, resulting in high-quality geometry reconstruction of various topologies (Liao et al., 2018; Shen et al., 2021, 2023; Wei et al., 2023; Munkberg et al., 2022; Liu et al., 2023b; Mehta et al., 2022). Unfortunately, meshes relying on iso-surface extraction algorithms (Lorensen and Cline, 1998; Ju et al., 2002) are often unnecessarily dense and can contain geometric errors. In contrast, our approach addresses these issues: we explicitly define faces and their existence probabilities, and devise regularizations based on them that yield simplified but accurate meshes (Table 2). See Table 3 for more detailed comparisons to these other methods.

2.2 Delaunay Triangulation for Geometry Processing

Delaunay Triangulation (DT) (Aurenhammer et al., 2013) has proven useful for reconstructing shapes from unorganized point sets. It has been shown that the DT of dense samples on a smooth 2D curve includes the curve within its edges (Brandt and Algazi, 1992; Amenta et al., 1998a). This idea of using DT to approximate shape has been successfully extended to 3D, to reconstruct three-dimensional shapes (Amenta et al., 1998b) for point sets that satisfy certain constraints. However, these approaches are deterministic. Our method can be considered a differentiable version of these approaches, which admits gradient-based optimization. More recently, Rakotosaona et al. (2021) exploited this property of DT to connect points and tessellate the domain, and proposed a differentiable WDT algorithm to compute a smooth inclusion score, namely the existence score of 2-simplices (triangles) in 2-dimensional space. However, this approach is not suitable for our 3D case, as there are computational challenges (Section 3.2). In other related work, VoroMesh (Maruani et al., 2023) also used Voronoi diagrams for point cloud reconstruction, but their formulation cannot represent open surfaces and is confined to handling point clouds.

3 Preliminary

3.1 Probabilistic Approach to Mesh Connectivity

To define a traditional, non-differentiable mesh, we specify the vertices and their connectivity. This connectivity is discrete: for any given triplet of vertices, we check if they form a face in the mesh, returning 1 if they do and 0 otherwise. To overcome this discreteness, we propose a probabilistic approach to create a fully differentiable mesh: given a triplet of vertices, we evaluate the probability of a face existing. This formulation enables differentiability not only of vertex positions but also of their connectivity. Note that to realize this probabilistic approach, we need a procedure that tells us the existence probability of any given face.

Figure 4: Illustration of our mesh representation for 2D and 3D cases. (a): Our representation in 2D for the letter "A" (2D font). (b): Our representation in 3D for a dragon model. Blue faces are the "real part" and yellow ones the "imaginary part".
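To make the contrast with discrete connectivity concrete, here is a toy sketch (all tensor names are hypothetical, not part of the paper's code): classic connectivity assigns each candidate vertex triplet a hard 0/1 label, while the probabilistic relaxation assigns it a value in [0, 1] that gradients can flow through.

```python
import torch

# Candidate vertex triplets (hypothetical toy data).
faces = torch.tensor([[0, 1, 2], [1, 2, 3]])

# Classic mesh: each triplet either is a face (1) or is not (0).
hard_exist = torch.tensor([1.0, 0.0])

# Probabilistic relaxation: existence probabilities in [0, 1].
soft_exist = torch.tensor([0.97, 0.12], requires_grad=True)

# A downstream loss can now differentiate through face existence.
loss = ((soft_exist - hard_exist) ** 2).mean()
loss.backward()
print(soft_exist.grad)  # gradients w.r.t. connectivity itself
```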
This procedure must be 1) differentiable, 2) computationally efficient, and 3) maintain desirable mesh properties, such as avoiding non-manifoldness and self-intersections, when determining the face probabilities. Among many possible options, we use Weighted Delaunay Triangulation (WDT) (Figure 6(a)) and a point-wise feature called the "real" value ($\psi$) to define our procedure. Each vertex in our framework is represented as a 5-dimensional² vector comprising position (3), WDT weight (1), and real value (1) (Figure 3(a)). Given the WDT precomputed from the vertices' positions and weights, we check the face existence probability of each possible triplet. Specifically, a face F must 1) exist in the WDT, and then 2) satisfy a condition on the real values of its vertices in order to exist on the actual surface. We describe the probability functions for these conditions as $\Lambda_{wdt}$ and $\Lambda_{real}$:

$$\Lambda_{wdt}(F) = P(F \in \text{WDT}), \qquad \Lambda_{real}(F) = P(F \in \text{Mesh} \mid F \in \text{WDT}). \tag{1}$$

Then we get the final existence probability function, which can be used in downstream applications (Figure 3), as follows:

$$\Lambda(F) = P(F \in \text{Mesh}) = \Lambda_{wdt}(F) \cdot \Lambda_{real}(F). \tag{2}$$

This formulation attains one nice property in determining the final mesh: it prohibits self-intersections between faces. As for the other two criteria, the differentiability and efficiency of $\Lambda_{wdt}$ are crucial, as we design $\Lambda_{real}$ to be a very efficient differentiable function based on real values (Section 4.2). Thus, we first introduce how to evaluate $\Lambda_{wdt}$ in a differentiable and efficient manner, which is one of our main contributions.

3.2 Basic Principles

Figure 5: Renderings of simplices $\Delta_k$ for different pairs of (d, k). Different $\Delta_k$'s are rendered in different colors.

To begin with, we use the pair (d, k) to denote a k-simplex ($\Delta_k$) in d-dimensional space. For a 3D mesh, our face F corresponds to (d = 3, k = 2) in Figure 5(b). To compute the probability $\Lambda_{wdt}$ for the face, we use the power diagram (PD), which is the dual structure of the WDT (Figure 6(a)). While Rakotosaona et al. (2021) previously proposed a differentiable 2D triangulation method for the (d = k = 2) case (Figure 5(c)), it suffers from quadratically increasing computational cost (e.g., it takes 4.3 seconds to process 10K points in 2D; Table 4) and unreliable estimation when k < d. We will discuss later how our formulation overcomes these computational challenges. Our setting for 3D meshes is analogous to the 2D mesh setting, the (d = 2, k = 1) case in Figure 5(d), where a triangular face reduces to a line segment. Therefore, we mainly use this setting to describe and visualize the basic concepts for simplicity. However, note that they generalize to any (d, k) case without much problem.

²In the 2D case, a vertex is a 4-dimensional vector comprising position (2), WDT weight (1), and real value (1).

Figure 6: Basic concepts for computing the existence probability of a given 1-simplex ($\Delta_1$) when d = 2. (a): The WDT and PD of a given set of weighted vertices are rendered in solid and dotted lines; the size of a vertex represents its weight. (b): The power cell of $p_1$ ($C_{p_1}$) is rendered in grey. $\Delta_1$ is rendered as a black line, and its dual line ($D_{\Delta_1}$) in red. (c), (d): For a given $\Delta_1$, the reduced power cell of $p_1$ for $\Delta_1$ ($R_{p_1|\Delta_1}$) is rendered in blue, together with the original power cell (grey). We can evaluate the existence of $\Delta_1$ in the WDT by computing the signed distance from $D_{\Delta_1}$ to $R_{p_1|\Delta_1}$.

We generalize the basic principles suggested by Rakotosaona et al.
(2021) to address our cases. For formal definitions of the concepts in this section, please refer to Appendix B. Let S be a finite set of weighted points in $\mathbb{R}^{d=2}$. For a given point $p \in S$, we denote its weight as $w_p$. We call the weighted points in S "vertices", to distinguish them from general unweighted points in $\mathbb{R}^2$. We adopt the power distance $\pi(p, q) = d(p, q)^2 - w_p - w_q$ as the distance measure between two vertices in S. As depicted in Figure 6, $\Delta_{k=1}$ is a 1-simplex, which is a line segment connecting two vertices $p_i$ and $p_j$ in S. Its dual form $D_{\Delta_1}$ (red line) is the set of unweighted points³ in $\mathbb{R}^2$ located at the same power distance to $p_i$ and $p_j$. In the power diagram, the power cell $C_p$ (grey cell) of a vertex p is the set of unweighted points that are closer to p than to any other vertex in S. We can use $C_p$ and $D_{\Delta_1}$ to measure the existence of $\Delta_1$. From Figure 6, we can see that when $\Delta_1 = \{p_i, p_j\}$ exists in the WDT, its dual line $D_{\Delta_1}$ aligns exactly with the boundary of $C_{p_i}$, while when $\Delta_1 = \{p_i, p_j\}$ does not exist in the WDT, $D_{\Delta_1}$ lies outside $C_{p_i}$. To make this measurement less binary when $\Delta_1$ exists, we use an expanded version of the power cell called the "reduced" power cell ($R_{p|\Delta}$), introduced by Rakotosaona et al. (2021). The reduced power cell of $p_i \in S$ for $\Delta_1 = \{p_i, p_j\}$ is computed by excluding $p_j$ from S when constructing the power cell⁴. For example, when $\Delta_1$ exists, $R_{p_i|\Delta_1}$ expands towards $p_j$'s direction (Figure 6(c)), and $D_{\Delta_1}$ "goes through" $R_{p_i|\Delta_1}$. In contrast, when $\Delta_1$ does not exist, even though we have removed $p_j$, $R_{p_i|\Delta_1}$ does not expand (Figure 6(d)), and thus $D_{\Delta_1}$ stays outside of $R_{p_i|\Delta_1}$.

Now we can define a signed distance field $\tau_{pt}(x, R)$ for a given reduced power cell R, where the signed distance is measured from the point $x \in \mathbb{R}^d$ to the boundary of the reduced power cell (positive when inside). Then we can induce the signed distance between a dual form D and a reduced power cell R:

$$\tau(D, R) = \max_{x \in D} \tau_{pt}(x, R). \tag{3}$$

As illustrated in Figure 6(c) and (d), $\tau(D_{\Delta_1}, R_{p_1|\Delta_1})$ is positive when $\Delta_1$ exists in the WDT, and negative when it does not. This observation can be generalized as follows:

Remark 3.1. $\Delta_k$ exists in the WDT if and only if $\tau(D_{\Delta_k}, R_{p_i|\Delta_k}) > 0$ for all $p_i \in \Delta_k$.

In fact, the sign of $\tau(D_{\Delta_k}, R_{p_i|\Delta_k})$ is the same for every $p_i \in \Delta_k$. Therefore, we can use its average to measure the existence probability of $\Delta_k$, along with a sigmoid function $\sigma$:

$$\Lambda_{wdt}(\Delta_k) = \frac{1}{k+1} \sum_{p \in \Delta_k} \sigma\!\big(\tau(D_{\Delta_k}, R_{p|\Delta_k}) \cdot \alpha_{wdt}\big), \tag{4}$$

where $\alpha_{wdt}$ is a constant used in the sigmoid function. $\Lambda_{wdt}(\Delta_k)$ is greater than 0.5 exactly when $\Delta_k$ exists, aligning with our probabilistic viewpoint while being differentiable.

³We treat an unweighted point's weight as 0 when computing the power distance.
⁴In the 3D case, where k = 2 and $\Delta_2 = \{p_i, p_j, p_k\}$, both $p_j$ and $p_k$ are ignored for $R_{p_i|\Delta_2}$.

Computational Challenges. As mentioned before, Rakotosaona et al. (2021) solved the problem for the (d = k = 2) case, where the dual $D_{\Delta_k}$ is a single point. Naïvely applying their approach for computing Eq. 3 to our cases poses two computational challenges:

Precision: When (d = 3, k = 2) or (d = 2, k = 1), $D_{\Delta_k}$ becomes a line, not a single point. Finding the point on the line that maximizes Eq. 3 is not straightforward.

Efficiency: When naïvely estimating the reduced power cell in an exhaustive manner, the computational cost grows with the number of points at a rate of $O(N^2)$, where N is the number of points.

See Appendix B.2 for a detailed discussion of these limitations.
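As a reference for Eq. 4, here is a minimal PyTorch sketch. It assumes the per-vertex signed distances $\tau(D_{\Delta_k}, R_{p|\Delta_k})$ have already been computed (the hard part that the paper's CUDA kernels handle), and the value of $\alpha_{wdt}$ is an illustrative placeholder.

```python
import torch

def lambda_wdt(tau: torch.Tensor, alpha_wdt: float = 1000.0) -> torch.Tensor:
    """Eq. (4): average sigmoid of the per-vertex signed distances.

    tau: (F, k+1) signed distances tau(D_{Delta_k}, R_{p|Delta_k}) for each
         of the k+1 vertices of every candidate simplex (assumed precomputed).
    """
    return torch.sigmoid(tau * alpha_wdt).mean(dim=-1)

# Two candidate triangles (k = 2, so 3 vertices each): one inside the WDT
# (positive tau for all its vertices), one outside (negative tau).
tau = torch.tensor([[0.02, 0.03, 0.01],
                    [-0.05, -0.04, -0.06]], requires_grad=True)
print(lambda_wdt(tau))  # ~[1.0, 0.0]; > 0.5 iff the simplex exists in the WDT
```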
4 Formulation

4.1 Practical approach to compute $\Lambda_{wdt}$

We introduce a practical approach to resolve the two aforementioned challenges. Specifically, we propose constructing the PD first and using it to compute a lower bound of Eq. 3 efficiently. We also propose handling two cases separately: whether $\Delta_k$ exists in the WDT or not.

First, when the simplex $\Delta_k$ does not exist, we use the negative distance between the dual form and the normal power cell, $-d(D_{\Delta_k}, C_p)$. This is a lower bound of Eq. 3:

$$-d(D_{\Delta_k}, C_p) \le \tau(D_{\Delta_k}, R_{p|\Delta_k}) < 0, \tag{5}$$

since $C_p \subseteq R_{p|\Delta_k}$. See Figure 6(d) for this situation in the (d = 2, k = 1) case. Computing this distance is efficient, because the normal PD only needs to be computed once in advance. Moreover, $C_p$ is a convex polyhedron and $D_{\Delta_k}$ is a line, which allows us to compute the distances between $D_{\Delta_k}$ and the line segments on the boundary of $C_p$⁵ and take the minimum.

Second, we analyze the case when the simplex $\Delta_k$ exists in the WDT. Here, constructing the reduced power cell $R_{p|\Delta_k}$ for the given $\Delta_k$ would require considerable additional cost. Instead, we leverage the precomputed PD to approximate the reduced power cell. We pick a point $v \in D_{\Delta_k} \cap R_{p|\Delta_k}$, where $p \in \Delta_k$, and the following holds:

$$0 \le \tau_{pt}(v, R_{p|\Delta_k}) \le \tau(D_{\Delta_k}, R_{p|\Delta_k}), \tag{6}$$

because $v \in R_{p|\Delta_k}$ and by the definition in Eq. 3. In our case, since $D_{\Delta_k} \cap R_{p|\Delta_k}$ is a line segment, we choose its middle point as v to tighten this lower bound; see Figure 6(c) for the line segment in the (d = 2, k = 1) case. Computing this lower bound is also efficient, because we can simply project v onto the reduced power cell. To sum up, we can rewrite Eq. 3 as follows:

$$\tau(D_{\Delta_k}, R_{p|\Delta_k}) = \begin{cases} \tau_{pt}(v, R_{p|\Delta_k}) & \text{if } \Delta_k \in \text{WDT}, \\ -d(D_{\Delta_k}, C_p) & \text{otherwise.} \end{cases} \tag{7}$$

This relaxation gives a lower bound of Eq. 3 that is reliable because it always has the correct sign. It also reduces the computational cost from $O(N^2)$ to nearly $O(N)$, which is a prerequisite for representing meshes that generally have more than 1K vertices. See Appendix B.2 for the computational speed and accuracy of our method compared to the previous one. Finally, we implemented our algorithm for computing Eq. 7 in CUDA for further acceleration.

4.2 Definition of $\Lambda_{real}$

$\Lambda_{real}$ evaluates the existence probability of a k-simplex $\Delta_k$ in our mesh, given that it exists in the WDT. To define it, we leverage a per-point value $\psi \in [0, 1]$. Specifically, we compute the minimum $\psi$ of the points in $\Delta_k$ in a differentiable way:

$$\Lambda_{real}(\Delta_k) = \sum_{p \in \Delta_k} \kappa(p)\,\Psi(p), \qquad \kappa(p) = \frac{e^{-\beta\,\Psi(p)}}{\sum_{q \in \Delta_k} e^{-\beta\,\Psi(q)}},$$

where $\Psi$ is the function that maps a point p to its $\psi$ value. We set $\beta = 100$. Along with the $\Lambda_{wdt}$ discussed before, we can now evaluate the final existence probability of faces in Eq. 2. We also note that when we extract the final mesh, we select only the faces whose $\Lambda_{wdt}$ and $\Lambda_{real}$ are both larger than 0.5.

⁵This holds when d = 3. When d = 2, we can use the vertices of $C_p$.
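This $\Lambda_{real}$ is a softmin over the real values. A minimal sketch (tensor names illustrative):

```python
import torch

def lambda_real(psi: torch.Tensor, beta: float = 100.0) -> torch.Tensor:
    """Differentiable (soft) minimum of the per-point real values psi in [0, 1].

    psi: (F, k+1) real values of the vertices of each candidate simplex.
    kappa is a softmin weight: it concentrates on the smallest psi as beta grows.
    """
    kappa = torch.softmax(-beta * psi, dim=-1)   # e^{-beta psi} / sum e^{-beta psi}
    return (kappa * psi).sum(dim=-1)

psi = torch.tensor([[0.9, 0.95, 0.1],    # one low real value -> face unlikely
                    [0.9, 0.95, 0.99]])  # all high -> face likely
print(lambda_real(psi))  # ~[0.1, 0.9]
```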
4.3 Loss Functions

DMesh can be reconstructed from various inputs, such as normal meshes, point clouds, and multi-view images. With its per-vertex features and per-face existence probabilities $\Lambda(F)$, we can optimize it with various reconstruction losses and regularization terms. Please see details in Appendix C.

4.3.1 Reconstruction Loss ($L_{recon}$)

First, if we have a normal mesh with vertices P and faces F, and we want to represent it with DMesh, we must compute the two additional per-vertex attributes, WDT weights and real values. We optimize them by maximizing $\Lambda(F)$, since these faces lie on the reference mesh. Conversely, for the remaining set of faces $\bar{F}$ that can be defined on P, we minimize $\Lambda(\bar{F})$. Together, they define the reconstruction loss for mesh input (Appendix C.1). For reconstruction from point clouds or multi-view images, we need to optimize all features, including positions. For point clouds, we define our loss using Chamfer Distance (CD) and compute the expected CD using our face probabilities (Appendix C.2). For multi-view images, we define the loss as the L1 loss between the given images and renderings of DMesh, interpreting face probabilities as face opacities. We implemented efficient differentiable renderers that allow gradients to flow through face opacities (Appendix C.3).

4.3.2 Regularizations

Figure 7: Results with different $\lambda_{weight}$.

Being fully differentiable with respect to both vertices and faces, DMesh allows us to develop various regularizations to improve the optimization process and enhance the final mesh quality. The first is weight regularization ($L_{weight}$), applied to the dual power diagram of the WDT (Appendix C.4). This regularization reduces the structural complexity of the WDT, controlling the final mesh complexity (Figure 7). The next is real regularization ($L_{real}$), which enforces nearby points to have similar real values and increases the real values of points adjacent to points with high real values (Appendix C.5). This helps remove holes or inner structures and makes faces near the current surface more likely to be considered (Appendix D). The final regularization, quality regularization ($L_{qual}$), aims to improve the quality of triangle faces by minimizing the average expected aspect ratio of the faces, thus removing thin triangles (Appendix C.6). To sum up, our final loss function can be written as

$$L = L_{recon} + \lambda_{weight} L_{weight} + \lambda_{real} L_{real} + \lambda_{qual} L_{qual},$$

where the $\lambda$ values are hyperparameters. In Appendix E, we provide the values of these hyperparameters for every experiment. In Appendix E.3, we present ablation studies for these regularizations.

5 Experiments and Applications

Table 1: Mesh reconstruction results.

| | Bunny | Dragon | Buddha |
|---|---|---|---|
| RE | 99.78% | 99.72% | 99.64% |
| FP | 0.00% | 0.55% | 0.84% |

In this section, we provide experimental results to demonstrate the efficacy of our approach. First, we optimize vertex attributes to restore a given ground-truth mesh, directly proving the differentiability of our design. Next, we conduct experiments on 3D reconstruction from point clouds and multi-view images, showcasing how our differentiable formulation can be used in downstream applications. For the mesh reconstruction problem, we used three models from the Stanford 3D Scanning Repository (Curless and Levoy, 1996). For the point cloud and multi-view reconstruction tasks, we used four closed-surface models from the Thingi32 dataset, four open-surface models from the DeepFashion3D dataset, and three additional models with both closed and open surfaces from the Objaverse dataset and Adobe Stock.

Figure 8: Point cloud and multi-view reconstruction results. (a): Ground truth mesh. (b), (f): Our method restores the original shape without losing much detail. (c), (d), (g), (h): PSR (Kazhdan and Hoppe, 2013), VoroMesh (Maruani et al., 2023), FlexiCubes (Shen et al., 2023), and NIE (Mehta et al., 2022) fail for open and mixed surfaces. (e): NDC (Chen et al., 2022b) exhibits artifacts from grids.

Table 2: Quantitative comparison for point cloud and multi-view reconstruction results. Best results are written in bold.
| Methods | CD (×10⁻⁵) | F1 | NC | ECD | EF1 | # Verts | # Faces | Time (sec) |
|---|---|---|---|---|---|---|---|---|
| PSR | 690 | 0.770 | 0.931 | 0.209 | 0.129 | 159K | 319K | 10.6 |
| VoroMesh | >1K | 0.671 | 0.819 | >1K | 0.263 | 121K | 242K | 12.2 |
| NDC | 3.611 | 0.874 | 0.936 | 0.022 | 0.421 | 20.7K | 42.8K | 3.49 |
| Ours (w/o normal) | 3.726 | 0.866 | 0.936 | 0.067 | 0.342 | 3.87K | 10.4K | 775 |
| Ours (w/ normal) | 3.364 | 0.886 | 0.952 | 0.141 | 0.438 | 3.56K | 7.54K | 743 |
| NIE | 585 | 0.439 | 0.848 | 0.064 | 0.023 | 74.5K | 149K | 6696 |
| FlexiCubes | 273 | 0.591 | 0.881 | 0.039 | 0.152 | 10.9K | 21.9K | 56.47 |
| Ours | 34.6 | 0.685 | 0.892 | 0.094 | 0.113 | 4.19K | 8.80K | 1434 |

These models are categorized as "closed", "open", and "mixed" in this section. Additionally, we use non-convex polyhedra of various Euler characteristics and non-orientable geometries to demonstrate our method's versatility. We implemented our main algorithm for computing face existence probabilities and the differentiable renderer used for multi-view image reconstruction in CUDA (Nickolls et al., 2008). Since we need to compute the WDT before running the CUDA algorithm, we used the WDT implementation of CGAL (Jamin et al., 2023). We implemented the rest of the logic with PyTorch (Paszke et al., 2017). All experiments were run on a system with an AMD EPYC 7R32 CPU and an Nvidia A10 GPU.

5.1 Mesh to DMesh

In this experiment, we demonstrate that we can preserve most of the faces of an original triangular mesh after converting it to DMesh using the mesh reconstruction loss introduced in Section 4.3. In Table 1, we show the recovery ratio (RE) and false positive ratio (FP) of faces in our reconstructed mesh. Note that we recover over 99% of the faces of the original mesh, while producing under 1% false faces. Please see Appendix E.1 for more details. This result validates our differentiable formulation, but also reveals a limitation in reconstructing some abnormal triangles in the original mesh, such as long, thin triangles.
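The paper does not spell out the exact computation of RE and FP; a plausible minimal sketch, treating faces as unordered vertex triplets, is:

```python
# Hypothetical sketch of the two Table 1 metrics: recovery ratio (fraction of
# ground-truth faces present in the converted DMesh) and false positive ratio
# (fraction of reconstructed faces absent from the ground truth).
def face_set(faces):
    # Sort each vertex triplet so orientation does not affect the comparison.
    return {tuple(sorted(f)) for f in faces}

def recovery_and_false_positive(gt_faces, rec_faces):
    gt, rec = face_set(gt_faces), face_set(rec_faces)
    re = len(gt & rec) / len(gt)    # RE: recovered ground-truth faces
    fp = len(rec - gt) / len(rec)   # FP: reconstructed faces not in GT
    return re, fp

gt = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
rec = [(2, 1, 0), (1, 2, 3), (3, 4, 5)]
print(recovery_and_false_positive(gt, rec))  # (0.666..., 0.333...)
```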
5.2 Point Cloud & Multi-View Reconstruction

In this experiment, we aim to reconstruct a mesh from partial geometric data, such as (oriented) point clouds or multi-view images. For point cloud reconstruction, we sampled 100K points from the ground-truth mesh; we can additionally use point orientations if they are available. For multi-view reconstruction, we rendered diffuse and depth images of the ground-truth mesh from 64 viewpoints.

Figure 9: Shape interpolation using DMesh exhibiting topology change. After fitting DMesh to a torus (upper left), we optimize it again to reconstruct a double torus (lower right), which has a different genus. We use multi-view images for the optimization.

In Appendix E, we illustrate example inputs for these experiments. Please also see Appendix D for the initialization and densification strategies used in these experiments. To validate our approach, we compare our results with various methods. For point cloud reconstruction, we first compare against the classical Screened Poisson Surface Reconstruction (PSR) method (Kazhdan and Hoppe, 2013)⁶. To compare with an optimization-based approach, we use the recent VoroMesh (Maruani et al., 2023). Note that these two methods are not tailored for open surfaces. To also compare on open surfaces, we use Neural Dual Contouring (NDC) (Chen et al., 2022b), even though it is a learning-based approach. Finally, for the multi-view reconstruction task, we compare our results with FlexiCubes (Shen et al., 2023) and Neural Implicit Evolution (NIE) (Mehta et al., 2022), volumetric approaches that directly produce meshes of varying geometric topology from visual inputs.

In Figure 8, we visualize the reconstruction results along with the ground-truth mesh for qualitative evaluation. For closed meshes, volumetric approaches like PSR, VoroMesh, and FlexiCubes generally capture fine details better than our method, mainly because of the current limit on the mesh resolution that our method can produce. NIE, which is also based on volumetric principles, generates overly smoothed reconstructions. However, for open or mixed mesh models, which are more ubiquitous in real applications, these methods fail, usually producing false internal structures or self-intersecting faces (Appendix E.2). Since NDC leverages unsigned information, it handles these cases without much problem, as ours does. However, its output exhibits step-like visual artifacts coming from its use of grids, which requires post-processing. Additionally, to show the versatility of our representation, we also visualize various shapes reconstructed from oriented point clouds in Figure 2.

Table 2 presents quantitative comparisons with the other methods. We used the following metrics from Chen et al. (2022b) to measure reconstruction accuracy with respect to the ground-truth mesh: Chamfer Distance (CD), F-Score (F1), Normal Consistency (NC), Edge Chamfer Distance (ECD), and Edge F-Score (EF1). We also report the number of vertices and faces of the reconstructed mesh to compare mesh complexity, along with computational time. All values are averaged over the 11 models we used. In general, our method generates meshes of comparable or better accuracy than the other methods. However, on ECD and EF1, which evaluate the edge quality of the reconstructed mesh, our results showed some weaknesses, because our method cannot yet prevent non-manifold edges. On the other hand, our method showed superior results in terms of mesh complexity; this is partially due to the use of weight regularization. Please see Appendix E.3 for ablation studies of how this regularization works. Overall, our method shows promising results in producing compact and accurate meshes, though we note that it requires more computational cost than the other methods in the current implementation.

Before moving on, we present an experimental result on shape interpolation using DMesh in Figure 9. We used multi-view images to reconstruct a torus first, and then optimized the DMesh again to fit a double torus. The results show that DMesh effectively reconstructs the double torus, even when initialized from a converged single torus, highlighting the method's robustness to local minima. However, this also indicates that our representation lacks meaningful shape interpolation, as it does not assume any specific shape topology.

⁶We provide point orientations for PSR; they are optional for our method.

Figure 10: Analysis of the computational cost of computing face existence probabilities ($\Lambda(F)$). The computational cost rises sharply beyond 20K points, with most of the time spent on WDT construction ("WDT"), while the probability computation ("Prob") requires significantly less time.
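For reference, here is a minimal sketch of the headline accuracy metric in Table 2, the Chamfer Distance between point sets sampled from the two surfaces; the exact sampling protocol and unit scaling follow the evaluation of Chen et al. (2022b) and are not reproduced here.

```python
import torch

def chamfer_distance(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer Distance between point sets p: (N, 3) and q: (M, 3).

    Only the core term is shown; normalization and scaling (e.g. the 1e-5
    units used in Table 2) depend on the evaluation protocol.
    """
    d2 = torch.cdist(p, q) ** 2               # (N, M) pairwise squared distances
    return d2.min(dim=1).values.mean() + d2.min(dim=0).values.mean()

p = torch.rand(1000, 3)
q = p + 0.01 * torch.randn_like(p)
print(chamfer_distance(p, q))
```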
6 Conclusion

6.1 Limitations

As shown above, our method achieves a more effective and complete form of differentiable mesh for various topologies than existing methods, but it still has several limitations to overcome.

Computational Cost. Currently, the resolution of DMesh is limited by computational cost. Although our theoretical relaxation and CUDA implementation reduce this burden, processing meshes with over 100K vertices remains challenging due to the computational bottleneck of constructing the WDT at each optimization step. In Figure 10, we analyze computational costs relative to the number of points. As shown, costs rise sharply beyond 20K points, with WDT construction consuming most of the time. This limits our method's ability to handle high-resolution meshes.

Non-Manifoldness. As claimed so far, DMesh generalizes much better than the other methods, as it does not place any constraints on shape topology or mesh connectivity. However, due to this relaxation of constraints, we can observe spurious non-manifold errors in the mesh, even though we adopted measures to minimize them (Appendix D.2.7). Specifically, an edge must have at most two adjacent faces to be a "manifold" edge. Similarly, a "manifold" vertex should be adjacent to a set of faces that form a closed or open fan. We refer to edges or vertices that do not satisfy these definitions as "non-manifold". In our results, we found that 5.50% of edges and 0.38% of vertices were non-manifold for point cloud reconstruction. For multi-view reconstruction, 6.62% of edges and 0.25% of vertices were non-manifold. Therefore, we conclude that non-manifold edges are more prevalent than non-manifold vertices in our approach.

6.2 Future Work

To address the computational cost issue, we can explore methods that reduce reliance on the WDT algorithm, as its cost increases significantly with the number of points. This is crucial since representing complex shapes with fine details often requires over 100K vertices. To tackle the non-manifold issue, we could integrate approaches based on (un)signed distance fields (Shen et al., 2023; Liu et al., 2023b) into our method, ensuring manifold mesh generation. Finally, future research could extend this work to other challenging problems, such as 3D reconstruction from real-world images, or applications like generative models for 3D shapes. This could involve encoding color or texture information within our framework, opening up exciting new directions for exploration.

Acknowledgements

We thank Zhiqin Chen and Matthew Fisher for helpful advice. This research is a joint collaboration between Adobe and the University of Maryland at College Park. This work has been supported in part by Adobe, IARPA, a UMD-ARL Cooperative Agreement, and the Dr. Barry Mersky and Capital One Endowed E-Nnovate Professorships.

References

[Amenta et al. 1998a] Amenta, N.; Bern, M.; Eppstein, D.: The crust and the β-skeleton: Combinatorial curve reconstruction. Graphical Models and Image Processing 60(2), 1998, pp. 125–135

[Amenta et al. 1998b] Amenta, N.; Bern, M.; Kamvysselis, M.: A new Voronoi-based surface reconstruction algorithm. In: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, 1998, pp. 415–421

[Aurenhammer et al. 2013] Aurenhammer, F.; Klein, R.; Lee, D.-T.: Voronoi Diagrams and Delaunay Triangulations. World Scientific Publishing Company, 2013
[Brandt and Algazi 1992] Brandt, J. W.; Algazi, V. R.: Continuous skeleton computation by Voronoi diagram. CVGIP: Image Understanding 55(3), 1992, pp. 329–338

[Chen et al. 2022a] Chen, A.; Xu, Z.; Geiger, A.; Yu, J.; Su, H.: TensoRF: Tensorial Radiance Fields. In: European Conference on Computer Vision (ECCV), 2022

[Chen et al. 2023] Chen, A.; Xu, Z.; Wei, X.; Tang, S.; Su, H.; Geiger, A.: Dictionary Fields: Learning a Neural Basis Decomposition. ACM Transactions on Graphics, 2023

[Chen et al. 2019] Chen, W.; Ling, H.; Gao, J.; Smith, E.; Lehtinen, J.; Jacobson, A.; Fidler, S.: Learning to predict 3D objects with an interpolation-based differentiable renderer. In: Advances in Neural Information Processing Systems 32, 2019

[Chen et al. 2022b] Chen, Z.; Tagliasacchi, A.; Funkhouser, T.; Zhang, H.: Neural dual contouring. ACM Transactions on Graphics (TOG) 41(4), 2022, pp. 1–13

[Chen et al. 2020] Chen, Z.; Tagliasacchi, A.; Zhang, H.: BSP-Net: Generating compact meshes via binary space partitioning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 45–54

[Cheng et al. 2013] Cheng, S.-W.; Dey, T. K.; Shewchuk, J.; Sahni, S.: Delaunay Mesh Generation. CRC Press, Boca Raton, 2013

[Cignoni et al. 2008] Cignoni, P.; Callieri, M.; Corsini, M.; Dellepiane, M.; Ganovelli, F.; Ranzuglia, G. et al.: MeshLab: an open-source mesh processing tool. In: Eurographics Italian Chapter Conference, vol. 2008, Salerno, Italy, 2008, pp. 129–136

[Curless and Levoy 1996] Curless, B.; Levoy, M.: A volumetric method for building complex models from range images. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, 1996, pp. 303–312

[Guillard et al. 2021] Guillard, B.; Remelli, E.; Lukoianov, A.; Richter, S. R.; Bagautdinov, T.; Baque, P.; Fua, P.: DeepMesh: Differentiable iso-surface extraction. arXiv preprint arXiv:2106.11795, 2021

[Hanocka et al. 2019] Hanocka, R.; Hertz, A.; Fish, N.; Giryes, R.; Fleishman, S.; Cohen-Or, D.: MeshCNN: a network with an edge. ACM Transactions on Graphics (TOG) 38(4), 2019, pp. 1–12

[Jamin et al. 2023] Jamin, C.; Pion, S.; Teillaud, M.: 3D Triangulations. In: CGAL User and Reference Manual, 5.6. CGAL Editorial Board, 2023. URL https://doc.cgal.org/5.6/Manual/packages.html#PkgTriangulation3

[Ju et al. 2002] Ju, T.; Losasso, F.; Schaefer, S.; Warren, J.: Dual contouring of hermite data. In: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, 2002, pp. 339–346

[Kazhdan and Hoppe 2013] Kazhdan, M.; Hoppe, H.: Screened Poisson surface reconstruction. ACM Transactions on Graphics (TOG) 32(3), 2013, pp. 1–13

[Kerbl et al. 2023] Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G.: 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics 42(4), 2023

[Kingma and Ba 2014] Kingma, D. P.; Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014
[Laine et al. 2020] Laine, S.; Hellsten, J.; Karras, T.; Seol, Y.; Lehtinen, J.; Aila, T.: Modular primitives for high-performance differentiable rendering. ACM Transactions on Graphics (TOG) 39(6), 2020, pp. 1–14

[Lee 2010] Lee, J.: Introduction to Topological Manifolds. Vol. 202. Springer Science & Business Media, 2010

[Liao et al. 2018] Liao, Y.; Donne, S.; Geiger, A.: Deep marching cubes: Learning explicit surface representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2916–2925

[Liu et al. 2020] Liu, L.; Gu, J.; Lin, K. Z.; Chua, T.-S.; Theobalt, C.: Neural Sparse Voxel Fields. In: NeurIPS, 2020

[Liu et al. 2019] Liu, S.; Li, T.; Chen, W.; Li, H.: Soft rasterizer: A differentiable renderer for image-based 3D reasoning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 7708–7717

[Liu et al. 2023a] Liu, Y.-T.; Wang, L.; Yang, J.; Chen, W.; Meng, X.; Yang, B.; Gao, L.: NeUDF: Leaning neural unsigned distance fields with volume rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 237–247

[Liu et al. 2023b] Liu, Z.; Feng, Y.; Xiu, Y.; Liu, W.; Paull, L.; Black, M. J.; Schölkopf, B.: Ghost on the Shell: An Expressive Representation of General 3D Shapes. arXiv preprint arXiv:2310.15168, 2023

[Long et al. 2023] Long, X.; Lin, C.; Liu, L.; Liu, Y.; Wang, P.; Theobalt, C.; Komura, T.; Wang, W.: NeuralUDF: Learning unsigned distance fields for multi-view reconstruction of surfaces with arbitrary topologies. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 20834–20843

[Lorensen and Cline 1998] Lorensen, W. E.; Cline, H. E.: Marching cubes: A high resolution 3D surface construction algorithm. In: Seminal Graphics: Pioneering Efforts That Shaped the Field, 1998, pp. 347–353

[Maruani et al. 2023] Maruani, N.; Klokov, R.; Ovsjanikov, M.; Alliez, P.; Desbrun, M.: VoroMesh: Learning Watertight Surface Meshes with Voronoi Diagrams. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 14565–14574

[Mehta et al. 2022] Mehta, I.; Chandraker, M.; Ramamoorthi, R.: A level set theory for neural implicit evolution under explicit flows. In: European Conference on Computer Vision, Springer, 2022, pp. 711–729

[Mildenhall et al. 2021] Mildenhall, B.; Srinivasan, P. P.; Tancik, M.; Barron, J. T.; Ramamoorthi, R.; Ng, R.: NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 2021, pp. 99–106

[Munkberg et al. 2022] Munkberg, J.; Hasselgren, J.; Shen, T.; Gao, J.; Chen, W.; Evans, A.; Müller, T.; Fidler, S.: Extracting triangular 3D models, materials, and lighting from images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8280–8290

[Nickolls et al. 2008] Nickolls, J.; Buck, I.; Garland, M.; Skadron, K.: Scalable parallel programming with CUDA: Is CUDA the parallel programming model that application developers have been waiting for? Queue 6(2), 2008, pp. 40–53

[Nicolet et al. 2021] Nicolet, B.; Jacobson, A.; Jakob, W.: Large steps in inverse rendering of geometry. ACM Transactions on Graphics (TOG) 40(6), 2021, pp. 1–13
[Oechsle et al. 2021] Oechsle, M.; Peng, S.; Geiger, A.: UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction. In: International Conference on Computer Vision (ICCV), 2021

[Palfinger 2022] Palfinger, W.: Continuous remeshing for inverse rendering. Computer Animation and Virtual Worlds 33(5), 2022, e2101

[Paszke et al. 2017] Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A.: Automatic differentiation in PyTorch. 2017

[Rakotosaona et al. 2021] Rakotosaona, M.-J.; Aigerman, N.; Mitra, N. J.; Ovsjanikov, M.; Guerrero, P.: Differentiable surface triangulation. ACM Transactions on Graphics (TOG) 40(6), 2021, pp. 1–13

[Shen et al. 2021] Shen, T.; Gao, J.; Yin, K.; Liu, M.-Y.; Fidler, S.: Deep marching tetrahedra: a hybrid representation for high-resolution 3D shape synthesis. In: Advances in Neural Information Processing Systems 34, 2021, pp. 6087–6101

[Shen et al. 2023] Shen, T.; Munkberg, J.; Hasselgren, J.; Yin, K.; Wang, Z.; Chen, W.; Gojcic, Z.; Fidler, S.; Sharp, N.; Gao, J.: Flexible isosurface extraction for gradient-based mesh optimization. ACM Transactions on Graphics (TOG) 42(4), 2023, pp. 1–16

[Wang et al. 2021] Wang, P.; Liu, L.; Liu, Y.; Theobalt, C.; Komura, T.; Wang, W.: NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689, 2021

[Wang et al. 2023] Wang, Y.; Han, Q.; Habermann, M.; Daniilidis, K.; Theobalt, C.; Liu, L.: NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

[Wei et al. 2023] Wei, X.; Xiang, F.; Bi, S.; Chen, A.; Sunkavalli, K.; Xu, Z.; Su, H.: NeuManifold: Neural Watertight Manifold Reconstruction with Efficient and High-Quality Rendering Support. arXiv preprint arXiv:2305.17134, 2023

[Yariv et al. 2021] Yariv, L.; Gu, J.; Kasten, Y.; Lipman, Y.: Volume rendering of neural implicit surfaces. In: Thirty-Fifth Conference on Neural Information Processing Systems, 2021

[Yariv et al. 2020] Yariv, L.; Kasten, Y.; Moran, D.; Galun, M.; Atzmon, M.; Ronen, B.; Lipman, Y.: Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance. In: Advances in Neural Information Processing Systems 33, 2020

[Zhang et al. 2020] Zhang, K.; Riegler, G.; Snavely, N.; Koltun, V.: NeRF++: Analyzing and Improving Neural Radiance Fields. arXiv:2010.07492, 2020

[Zhou et al. 2020] Zhou, Y.; Wu, C.; Li, Z.; Cao, C.; Ye, Y.; Saragih, J.; Li, H.; Sheikh, Y.: Fully convolutional mesh autoencoder using efficient spatially varying kernels. In: Advances in Neural Information Processing Systems 33, 2020, pp. 9251–9262

Table 3: Traits of different optimization-based shape reconstruction methods. We compare methods based on a template mesh (Palfinger, 2022; Nicolet et al., 2021), neural SDF (Wang et al., 2021, 2023), neural UDF (Long et al., 2023; Liu et al., 2023a), and differentiable isosurface extraction techniques (Shen et al., 2021; Munkberg et al., 2022; Shen et al., 2023; Liu et al., 2023b) with ours.
| Methods | Closed | Open | Diff. Mesh | Diff. Render. | Geo. Topo. | Mesh Topo. | Manifold |
|---|---|---|---|---|---|---|---|
| Template Mesh | O | O | O | O | X | X | O |
| Neural SDF | O | X | X | O | O | X | O |
| Neural UDF | O | O | X | O | O | X | △ |
| Diff. Isosurface | O | △ | O | O | O | X | O |
| DMesh (Ours) | O | O | O | O | O | O | X |

A Comparison to Other Shape Reconstruction Methods

Here we provide conceptual comparisons between our approach and other optimization-based 3D reconstruction algorithms, which use different shape representations. Specifically, we compared our method with mesh optimization methods starting from a template mesh (Palfinger, 2022; Nicolet et al., 2021), methods based on neural signed distance fields (SDF) (Wang et al., 2021, 2023), methods based on neural unsigned distance fields (UDF) (Liu et al., 2023a; Long et al., 2023), and methods based on differentiable isosurface extraction (Shen et al., 2021; Munkberg et al., 2022; Shen et al., 2023; Liu et al., 2023b). We used the following criteria to compare these methods.

- Closed surface: Whether the given method can reconstruct, or represent, closed surfaces.
- Open surface: Whether the given method can reconstruct, or represent, open surfaces.
- Differentiable Meshing: Whether the given method can produce gradients from a loss computed on the final mesh.
- Differentiable Rendering: Whether the given method can produce gradients from a loss computed on the rendering results.
- Geometric topology: Whether the given method can change the geometric topology of the shape. Here, geometric topology refers to the properties of Euclidean subspaces preserved under continuous deformation (Lee, 2010); for instance, the genus of a shape is one of the traits that describe geometric topology.
- Mesh topology: Whether the given method can produce gradients from a loss computed on the mesh topology, which denotes the structural configuration, or edge connectivity, of a mesh.
- Manifoldness: Whether the given method guarantees a manifold mesh.

In Table 3, we present a comparative analysis of the different methods. Note that our method meets all criteria except manifoldness. This is partially because our method does not assume a volume, which is also the case for methods based on neural UDFs. However, because our method does not leverage the smoothness prior of a neural network like those methods, it can exhibit high-frequency noise in the final mesh. For this reason, we gave △ to the neural UDF methods, while giving X to our approach. As for methods based on differentiable isosurface extraction, we gave △ to their ability to handle open surfaces because of (Liu et al., 2023b): they can represent open surfaces as subsets of closed ones, but cannot handle non-orientable open surfaces. Finally, note that our method is currently the only one that can handle mesh topology. Likewise, DMesh shows promise in addressing the shortcomings found in previous research. Nonetheless, it has its own set of limitations (Section 6). Identifying and addressing these limitations is crucial for unlocking the full potential of our method.

B Details about Section 3.2

B.1 Mathematical Definitions

Here, we provide formal mathematical definitions of the terms used in Section 3.2. We mainly use notations from Aurenhammer et al. (2013) and Cheng et al. (2013). Generalizing the notations in Section 3.2, let $S[w]$ be a finite set of weighted points in $\mathbb{R}^d$, where w is a weight assignment that maps each point $p \in S$ to its weight $w_p$. We denote a weighted point as $p[w_p]$ and define the power distance to measure the distance between two weighted points.

Definition B.1 (Power distance). The power distance between two weighted points $p[w_p]$ and $q[w_q]$ is measured as:

$$\pi(p[w_p], q[w_q]) = d(p, q)^2 - w_p - w_q, \tag{8}$$

where d(p, q) is the Euclidean distance.
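A one-line implementation of Definition B.1 (function name illustrative):

```python
import numpy as np

def power_distance(p, wp, q, wq):
    """Eq. (8): pi(p[wp], q[wq]) = d(p, q)^2 - wp - wq."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum((p - q) ** 2) - wp - wq

# A heavier (larger-weight) point "reaches farther": its power distance to an
# equally distant query point is smaller than that of a lighter point.
print(power_distance([0, 0, 0], 0.5, [1, 0, 0], 0.0))  # 0.5
print(power_distance([0, 0, 0], 0.0, [1, 0, 0], 0.0))  # 1.0
```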
Note that an unweighted point is regarded as carrying a weight of 0. Based on this power distance, we can define the power cell $C_p$ of a point $p[w_p]$ as the set of unweighted points in $\mathbb{R}^d$ that are closer to $p[w_p]$ than to any other weighted point in $S[w]$.

Definition B.2 (Power cell). The power cell of a point $p[w_p] \in S[w]$ is defined as:

$$C_p = \{x \in \mathbb{R}^d \mid \forall q[w_q] \in S[w]:\ \pi(x, p[w_p]) \le \pi(x, q[w_q])\}. \tag{9}$$

Note that some points may have empty power cells if their weights are relatively smaller than those of neighboring points. We call them submerged points. As we will see later, weight regularization aims at submerging unnecessary points in the mesh definition, which leads to mesh simplification. To construct the power cell, we can use the concept of a half space: $C_p$ is the intersection, over all other vertices $q[w_q] \in S[w]$, of the half spaces $\{x \in \mathbb{R}^d \mid \pi(x, p[w_p]) \le \pi(x, q[w_q])\}$.

C.2.2 CD$_{gt}$

In the distance function D of Eq. (17), $\lambda_{normal}$ is a parameter that determines the importance of point orientation in reconstruction; if $\lambda_{normal} = 0$, we only consider the positional information of the sampled points. After we evaluate the above distance function values for the k-nearest points, we reorder them in ascending order. Then, we compute the following expected minimum distance from p to $P_{ours}$:

$$D(p, P_{ours}) = \sum_{i=1,\dots,k} D(p, p_i)\, P(p_i)\, \bar{P}(p_i), \qquad P(p_i) = \Lambda_{pt}(p_i)\, I_{prev}(F(p_i)), \qquad \bar{P}(p_i) = \prod_{j=1,\dots,i-1} \big(1 - P(p_j)\big),$$

where $I_{prev}$ is an indicator function that returns 1 only when the given face has not appeared before in computing the expected distance. For instance, if the face ids of the reordered points were (1, 2, 3, 2, 3, 4), the $I_{prev}$ function evaluates to (1, 1, 1, 0, 0, 1). This indicator function is needed because selecting $p_i$ as the nearest point to p with probability $\Lambda_{pt}(p_i)$ means we interpret the face corresponding to $p_i$ as already existing; we would then select $p_i$ on that face as the nearest point to p, rather than the other points that were sampled from the same face but have larger distances and thus come after $p_i$ in the ordering.

Note that we dynamically change k at runtime to get a reliable estimate of $D(p, P_{ours})$: for the current k, if most of the $\bar{P}(p_k)$ values for the points in $P_{gt}$ are still large, the estimate could change considerably if we found and considered more neighboring points. Therefore, in our experiments, if any point in $P_{gt}$ has $\bar{P}(p_k)$ larger than $10^{-4}$, we increase k by 1 for the next iteration; if there is no such point, we decrease k by 1 to accelerate the optimization process. Finally, we compute $CD_{gt}$ by summing up the point-wise expected minimum distances:

$$CD_{gt} = \sum_{p \in P_{gt}} D(p, P_{ours}).$$

C.2.3 CD$_{ours}$

In computing $CD_{ours}$, which is the CD from $P_{ours}$ to $P_{gt}$, we also find the k nearest neighbors for each point $p \in P_{ours}$, which we denote as $(p_1, p_2, \dots, p_k)$. Then, for a point p, we use the same distance function D of Eq. 17 to compute the distances between p and $(p_1, p_2, \dots, p_k)$. After that, we select the minimum for each point, multiply by the existence probability of each point, and sum up to compute $CD_{ours}$:

$$D(p, P_{gt}) = \min_{i=1,\dots,k} D(p, p_i), \qquad CD_{ours} = \sum_{p \in P_{ours}} \Lambda_{pt}(p)\, D(p, P_{gt}).$$

Finally, we can compute the final reconstruction loss for point clouds as shown in Eq. 16.
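As a concrete reference for the expected Chamfer term, here is a minimal PyTorch sketch of the simpler direction, CD$_{ours}$. It omits the orientation-aware distance of Eq. (17) (plain Euclidean nearest neighbors are used instead), and all tensor names are illustrative.

```python
import torch

def expected_cd_ours(p_ours, lam_pt, p_gt):
    """Sketch of CD_ours (Appendix C.2.3): each point sampled from a candidate
    face contributes its nearest-neighbor distance to the target cloud,
    weighted by the face's existence probability, so that improbable faces
    contribute little to the loss.

    p_ours: (N, 3) points sampled on candidate faces
    lam_pt: (N,)   existence probability of each sample's face, Lambda_pt
    p_gt:   (M, 3) target point cloud
    """
    nearest = torch.cdist(p_ours, p_gt).min(dim=1).values   # (N,)
    return (lam_pt * nearest).sum()

p_gt = torch.rand(2048, 3)
p_ours = torch.rand(512, 3)
lam_pt = torch.rand(512, requires_grad=True)
loss = expected_cd_ours(p_ours, lam_pt, p_gt)
loss.backward()   # gradients flow into the face probabilities
```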
C.3 Multi-View Reconstruction

When we are given multi-view images, we reconstruct the mesh by minimizing the L1 difference between our rendered images and the given images. In this work, we mainly use both diffuse and depth renderings to reconstruct the mesh. If we denote the $N_{img}$ ground-truth images of $N_{pixel}$ pixels each as $I^{gt}_i$ ($i = 1, \dots, N_{img}$), and our rendered images as $I^{ours}_i$, we can write the reconstruction loss function as

$$L_{recon} = \frac{1}{N_{img}\, N_{pixel}} \sum_{i=1,\dots,N_{img}} \| I^{gt}_i - I^{ours}_i \|_1.$$

Then, we define our rendered image as

$$I^{ours}_i = \mathcal{F}(P, F, \Lambda(F), MV_i, P_i),$$

where $\mathcal{F}$ is a differentiable renderer that renders the scene for the given points P, faces F, face existence probabilities $\Lambda(F)$, the i-th modelview matrix $MV_i \in \mathbb{R}^{4 \times 4}$, and the i-th projection matrix $P_i \in \mathbb{R}^{4 \times 4}$. The differentiable renderer $\mathcal{F}$ has to backpropagate gradients through P, F, and $\Lambda(F)$ to update our point attributes. Specifically, we interpret $\Lambda(F)$ as face opacity in the rendering process, because opacity is the probability that a ray stops when it hits the face, which aligns well with our face existence probability. For this reason, we ignore faces whose existence probability falls under a small threshold to accelerate the reconstruction, as they are almost transparent and contribute little to the rendering.

To implement $\mathcal{F}$, we examined previous works dedicated to differentiable rendering (Laine et al., 2020; Liu et al., 2019). However, we discovered that these methods incur substantial computational costs when rendering a large number of (potentially) semi-transparent triangles, as is the case in our scenario. Consequently, we developed two efficient, partially differentiable renderers that meet our specific requirements. These renderers fulfill distinct roles within our pipeline: as detailed in Appendix D, our optimization process encompasses two phases within a single epoch. The first renderer is employed during the initial phase, while the second renderer is utilized in the subsequent phase.

Figure 11: $\mathcal{F}_B$ uses the tessellation structure to efficiently render overlapped faces in the correct order.

If there are multiple semi-transparent faces in the scene, we have to sort the faces that cover a target pixel by their (view-space) depth values, and iterate through them until the accumulated transmittance is saturated to determine the color of the pixel. Conducting this process for each individual pixel is not only costly, but also requires a lot of memory to store information for the backward pass. Recently, 3D Gaussian Splatting (Kerbl et al., 2023) overcame this issue with a tile-based rasterizer. We adopted this approach and modified their implementation to render triangular faces instead of Gaussian splats. To briefly describe its pipeline, it first assigns a face-wise depth value by computing the view-space depth of each face's center point. Then, after subdividing the screen into 16 × 16 tiles, we assign faces to each tile they overlap. Using the combination of tile ID and face-wise depth as a key, we obtain the face list sorted by depth in each tile. Finally, for each tile, we iterate through the sorted faces and determine the color and depth of each pixel as

$$C = \sum_{i=1,\dots,k} T_i\, \alpha_i\, C_i, \qquad T_i = \prod_{j=1,\dots,i-1} (1 - \alpha_j),$$

where $T_i$ is the accumulated transmittance, $\alpha_i$ is the opacity of the i-th face, and $C_i$ is the color (or depth) of the i-th face. Note that $\alpha_i = \Lambda(F_i)$, as mentioned above.
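The transmittance recursion above is easy to get off by one; here is a minimal sketch of front-to-back compositing for the depth-sorted faces covering a single pixel (names are illustrative, not the paper's CUDA code):

```python
import torch

def composite_front_to_back(alpha: torch.Tensor, color: torch.Tensor):
    """C = sum_i T_i * alpha_i * C_i with T_i = prod_{j<i} (1 - alpha_j).

    alpha: (K,)   per-face opacity (here, face existence probability Lambda)
    color: (K, C) per-face color or depth, sorted front to back for one pixel
    """
    trans = torch.cumprod(1.0 - alpha, dim=0)           # T_2, T_3, ... after each face
    trans = torch.cat([alpha.new_ones(1), trans[:-1]])  # shift so that T_1 = 1
    return (trans * alpha).unsqueeze(-1) * color        # per-face contributions

alpha = torch.tensor([0.9, 0.5, 1.0])        # nearly opaque front face
color = torch.tensor([[1.0], [0.5], [0.2]])  # e.g. depth per face
print(composite_front_to_back(alpha, color).sum(dim=0))  # composited pixel value
```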
Figure 12: Rendered images from the two differentiable renderers, $\mathcal{F}_A$ and $\mathcal{F}_{A'}$. The left and right images correspond to diffuse and depth rendering, respectively. (a): $\mathcal{F}_A$ is our (partially) differentiable renderer based on the tile-based approach. (b): Since $\mathcal{F}_A$ does not produce visibility-related gradients, we additionally use $\mathcal{F}_{A'}$ (Laine et al., 2020) to render images and integrate them with ours.

Even though this renderer admits efficient rendering of a large number of semi-transparent faces, the current implementation still has two major limitations. First, it does not produce visibility-related gradients (near face edges) to update point attributes. Therefore, we consider this renderer partially, rather than fully, differentiable. Second, since it does not compute a precise view-space depth for each pixel, its rendering results can be misleading in some cases, as pointed out in (Kerbl et al., 2023).

To amend the first issue, we opt to use the differentiable renderer of Laine et al. (2020), which produces the visibility-related gradients that ours lacks. Since this renderer cannot render a large number of transparent faces as ours does, we render only the faces with opacity larger than 0.5, and we set those faces to be fully opaque. Calling this renderer $\mathcal{F}_{A'}$, our final rendered image can be written as

$$I^{ours}_i = \frac{1}{2}\big(\mathcal{F}_A(P, F, \Lambda(F), MV_i, P_i) + \mathcal{F}_{A'}(P, F, \Lambda(F), MV_i, P_i)\big).$$

In Figure 12, we illustrate rendered images from $\mathcal{F}_A$ and $\mathcal{F}_{A'}$. Acknowledging that this formulation is not theoretically correct, we believe that implementing a fully differentiable renderer for our setting is an intriguing direction for future work. Empirically, however, we found that we can reconstruct a wide variety of meshes with the current formulation without much difficulty.

As mentioned before, this renderer is used in the first phase of the optimization process, where all point attributes are updated. In the second phase, we fix the point positions and weights, and update only the point-wise real values (Appendix D.2). In this case, we can leverage the tessellation structure to implement an efficient differentiable renderer. Since the second renderer performs precise depth testing, unlike the first one, it can be used to correct the errors incurred by the first renderer's second limitation (Figure 13).

The second renderer performs precise depth ordering in an efficient way, based on the fixed tessellation structure that we have. In Figure 11, we illustrate a 2D diagram explaining our approach. The green ray, which corresponds to a single ray determining the color of a single pixel, passes through the tessellation and traverses a sequence of triangles (tetrahedra in 3D), denoted $T_1$, $T_2$, and $T_3$. When the ray enters a triangle $T_i$ through one of its three edges, it moves on to the adjacent triangle $T_{i+1}$ only through one of the other edges of $T_i$, because the tessellation is compact. Therefore, when the ray hits one edge of $T_i$, it only needs to examine the other two edges of $T_i$ to find the next edge it hits. Note that we do not have to perform explicit depth testing in this approach. Also, unlike the first approach, this renderer does not have to store all faces that a ray collides with for the backward pass, because it can iterate the same process in the opposite direction during the backward pass to find the edge hit before the last edge. If we only store the last edge that each ray hits in the forward pass, we can start from that edge and find the previously hit edges to compute gradients.
(a) Extracted Mesh after phase 1 (b) Extracted Mesh after phase 2

Figure 13: Reconstructed mesh from multi-view images, rendered in MeshLab's (Cignoni u. a., 2008) x-ray mode to show the inner structure. In multi-view reconstruction, we divide each epoch into two phases. (a) After the first phase, where we perform inaccurate depth testing, many false inner faces are created. (b) To remove these inner faces, we require a renderer that performs exact depth testing, which we use in the second phase. Also see Appendix D.2 for details about the post-processing step that removes the inner structure.

However, note that this renderer is also only partially differentiable, because it cannot update point positions and weights. To sum up, we implemented two partially differentiable renderers to solve the multi-view reconstruction problem with DMesh. They serve different objectives in our reconstruction process, and we empirically found that they are powerful enough to reconstruct the target meshes in our experiments. However, we expect that we could simplify the process and improve its stability if we implemented a fully differentiable renderer that satisfies our needs. We leave this as future work.

C.4 Weight Regularization

Weight regularization aims at reducing the complexity of the WDT that supports our mesh. By using this regularization, we can discard unnecessary points that do not contribute to representing our mesh. Moreover, we can remove redundant points on the mesh, which results in a mesh simplification effect (Appendix E.3). We formulate the complexity of the WDT as the sum of edge lengths in its dual power diagram. Formally, we can write the regularization as

$$\mathcal{L}_{weight} = \sum_{i=1,...,N} \mathrm{Length}(E_i),$$

where $E_i$ are the edges in the dual power diagram, and $N$ is the number of edges.

C.5 Real Regularization

Real regularization is used to keep the real values of connected points in the WDT as similar as possible. We also leverage this regularization to push up the real values of points that are connected to points with high real values, so that they are considered in the reconstruction more often than points without such connections. To be specific, recall that we ignore faces with very low existence probability in the reconstruction process; this regularization therefore helps remove holes more effectively. The real regularization can be written as

$$\mathcal{L}_{real} = \frac{1}{\sum_{i=1,...,N} \Lambda(F_i)} \sum_{i=1,...,N} \Lambda(F_i)\left(\sigma_1(F_i) + \sigma_2(F_i)\right),$$

$$\sigma_1(F_i) = \sum_{j=1,2,3} \left|\psi_j - \frac{\psi_1 + \psi_2 + \psi_3}{3}\right|, \quad \sigma_2(F_i) = \sum_{j=1,2,3} |1 - \psi_j| \cdot \mathbb{I}\left(\max_{j=1,2,3}(\psi_j) > \delta_{high}\right).$$

Here $\psi_{1,2,3}$ represent the real values of the points that comprise $F_i$, and $\delta_{high}$ is a threshold for determining a high real value, set to 0.8 in our experiments. Note that faces with higher existence probabilities are prioritized over the others.

C.6 Quality Regularization

After reconstruction, we usually want a mesh comprised of triangles of good quality, rather than ill-formed triangles. We adopt the aspect ratio as a quality measure for the triangular faces, and minimize the sum of aspect ratios over all faces during optimization to obtain a mesh of good quality. Therefore, we can write the regularization as

$$\mathcal{L}_{qual} = \frac{1}{\sum_{i=1,...,N} \Lambda(F_i)} \sum_{i=1,...,N} \mathrm{AR}(F_i)\, E_{max}(F_i)\, \Lambda(F_i), \quad \mathrm{AR}(F_i) = \frac{E_{max}(F_i)}{H_{min}(F_i)},$$

where $E_{max}(F_i)$ is the maximum edge length of $F_i$, and $H_{min}(F_i)$ is the minimum height of $F_i$. Note that we prioritize faces with larger maximum edge length and higher existence probability over the others in this formulation. In Appendix E.3, we provide ablation studies for this regularization.
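Under the definitions above (with the minimum height computed as twice the triangle area divided by the longest edge), the quality term can be sketched in PyTorch as follows. The function name and tensor layout are our own choices, not the released implementation.

```python
import torch

def quality_regularization(verts, faces, probs, eps=1e-12):
    """Sketch of L_qual: mean of AR(F) * E_max(F) * Lambda(F),
    normalized by the total existence probability.

    verts: (V, 3) point positions; faces: (N, 3) vertex indices;
    probs: (N,) face existence probabilities Lambda(F).
    """
    v0, v1, v2 = (verts[faces[:, k]] for k in range(3))
    edges = torch.stack([(v1 - v0).norm(dim=1),
                         (v2 - v1).norm(dim=1),
                         (v0 - v2).norm(dim=1)], dim=1)
    e_max = edges.max(dim=1).values
    # triangle area via cross product; min height = 2 * area / longest edge
    area = 0.5 * torch.cross(v1 - v0, v2 - v0, dim=1).norm(dim=1)
    h_min = 2.0 * area / e_max.clamp_min(eps)
    ar = e_max / h_min.clamp_min(eps)          # aspect ratio AR(F)
    return (ar * e_max * probs).sum() / probs.sum().clamp_min(eps)
```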
D Optimization Process

In this section, we explain the optimization processes, i.e., the exact reconstruction algorithms, in detail. First, we discuss the optimization process for the experiment in Section 5.1, where we represent the ground truth mesh with DMesh. Then, we discuss the overall optimization process for the point cloud and multi-view reconstruction tasks in Section 5.2, from initialization to post-processing.

D.1 Mesh to DMesh

Our overall algorithm to convert the ground truth mesh into DMesh is outlined in Algorithm 1. We explain each step in detail below.

D.1.1 Point Initialization

At the start of optimization, we initialize the point positions ($P$), weights ($W$), and real values ($\psi$) using the given ground truth information ($P_{gt}$, $F_{gt}$). To be specific, we initialize the point attributes as follows:

$$P = P_{gt}, \quad W = [1, ..., 1], \quad \psi = [1, ..., 1].$$

The lengths of the vectors $W$ and $\psi$ equal the number of points. In Figure 14, we illustrate the DMesh initialized with these point attributes, which becomes the convex hull of the ground truth mesh.

Algorithm 1 Mesh to DMesh
  Pgt, Fgt ← Ground truth mesh vertices and faces
  P, W, ψ ← Initialize point attributes for DMesh
  F̄ ← Empty set of faces
  while optimization not ended do
    P, W, ψ ← Do point insertion, with P, F̄
    WDT, PD ← Run WDT algorithm, with P, W
    F̄ ← Update faces to exclude, with WDT
    Λ(Fgt), Λ(F̄) ← Compute existence probability for faces, with P, ψ, WDT, PD
    Lrecon ← Compute reconstruction loss, with Λ(Fgt), Λ(F̄)
    Update P, W, ψ to minimize Lrecon
    Bound P
  end
  M ← Get final mesh from DMesh

(a) Ground Truth (b) Initialization (c) Point Insertion (d) 5000 Steps

Figure 14: Intermediate results of converting the bunny model to DMesh. (a) For the given ground truth mesh, we initialize our point attributes using the mesh vertices. (b) The initial mesh then becomes the convex hull of the original mesh. (c) To remove undesirable faces that were not in the original mesh, we insert additional points on them; some of the undesirable faces then disappear. (d) After 5000 optimization steps, just before another point insertion, DMesh recovers most of the ground truth connectivity.

Note that during optimization, we allow only small perturbations to the positions of the initial points, and fix their weights and real values to 1. This is because we already know that these points correspond to the ground truth mesh vertices, and thus should be included in the final mesh without much positional difference. In our experiments, we set the perturbation bound to 1% of the model size. However, we notice that we cannot restore the mesh connectivity with only small perturbations of the initial point positions if there are no additional points to aid the process. Therefore, we periodically perform point insertion to add such points, as described below.

D.1.2 Point Insertion

Point insertion is a subroutine that adds points to the current point configuration. It is performed periodically, at a fixed step interval. The additional points are placed at random locations on the faces in $\bar{F}$, which correspond to the faces that should not exist in the final mesh; the added points therefore help remove these undesirable faces. However, we found that inserting a point for every face in $\bar{F}$ can be quite expensive. Therefore, we use the k-means clustering algorithm to aggregate them into $0.1 \cdot N_{\bar{F}}$ clusters, where $N_{\bar{F}}$ is the number of faces in $\bar{F}$, and add the centroids of the clusters to our running point set. On top of that, we select 1000 random faces in $\bar{F}$ and put additional points directly on them, because there are cases where the centroids do not land at positions where they can remove the undesirable faces. In Figure 14, we render DMesh after point insertion into the initialized mesh. Note that some of the undesirable faces disappear because of the added points. (A sketch of this step follows below.)
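As an illustration, the insertion step can be sketched with scikit-learn as follows. Here, `bad_face_verts` (the vertex positions of the faces in $\bar{F}$) and the per-face barycentric sampling are our own illustrative choices standing in for the actual implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def insertion_points(bad_face_verts, rng, num_random=1000):
    """Sketch of point insertion. `bad_face_verts` has shape (N_F, 3, 3):
    one row of three vertex positions per undesirable face. Returns the
    centroids of 0.1 * N_F k-means clusters of on-face samples, plus
    direct samples on up to `num_random` random faces."""
    n_f = len(bad_face_verts)
    # one random barycentric sample per undesirable face
    bary = rng.dirichlet(np.ones(3), size=n_f)            # (N_F, 3)
    samples = np.einsum('fi,fij->fj', bary, bad_face_verts)
    k = max(1, int(0.1 * n_f))
    centroids = KMeans(n_clusters=k, n_init=10).fit(samples).cluster_centers_
    # also place points directly on a few random faces
    idx = rng.choice(n_f, size=min(num_random, n_f), replace=False)
    return np.concatenate([centroids, samples[idx]], axis=0)

# usage: new_pts = insertion_points(bad_face_verts, np.random.default_rng(0))
```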
Algorithm 2 Point cloud & Multi-view Reconstruction
  T ← Observation (point cloud, multi-view images)
  P, W, ψ ← Initialize point attributes for DMesh (using T if possible)
  F ← Empty set of faces
  while epoch not ended do
    P, W, ψ ← (If not first epoch) Initialize point attributes with sample points from current DMesh, for mesh refinement
    // Phase 1
    while step not ended do
      WDT, PD ← Run WDT algorithm, with P, W
      F ← Update faces to evaluate existence probability for, with WDT
      Λ(F) ← Compute existence probability for faces in F, with P, ψ, WDT, PD
      Lrecon ← Compute reconstruction loss, with P, F, Λ(F), T
      Lweight ← Compute weight regularization, with PD
      Lreal ← Compute real regularization, with P, ψ, WDT
      Lqual ← Compute quality regularization, with P, F, Λ(F)
      L ← Lrecon + λweight Lweight + λreal Lreal + λqual Lqual
      Update P, W, ψ to minimize L
    end
    // Phase 2
    WDT, PD ← Run WDT algorithm, with P, W
    F ← Faces in WDT
    Λwdt(F) ← 1
    while step not ended do
      Λ(F) ← Compute existence probability for F, with P, ψ, Λwdt(F)
      Lrecon ← Compute reconstruction loss, with P, F, Λ(F), T
      Lreal ← Compute real regularization, with P, ψ, WDT
      L ← Lrecon + λreal Lreal
      Update ψ to minimize L
    end
  end
  M ← Get final mesh from DMesh, after post-processing

D.1.3 Maintaining F̄

In this problem, we minimize the reconstruction loss specified in Eq. 15 to restore the connectivity of the ground truth mesh and remove faces that do not exist in it. In the formulation, we denote the faces that are comprised of mesh vertices $P$ but are not included in the original mesh as $\bar{F}$. Even though we could enumerate all of them, the total number of faces in $\bar{F}$ amounts to $O(N^3)$, where $N$ is the number of mesh vertices. Therefore, rather than evaluating all of those cases, we maintain a set of faces $\bar{F}$ to exclude from our mesh during optimization. To be specific, at each iteration, we find faces in the current WDT that are comprised of points in $P$ but do not exist in $F_{gt}$, and add them to the running set $\bar{F}$. On top of that, every pre-defined number of iterations (10 steps in our case), we compute the k-nearest neighboring points of each point in $P$. Then, we find the faces that can be generated by combining each point with 2 of its k-nearest points, following Rakotosaona u. a. (2021), and add the face combinations that do not belong to $F_{gt}$ to $\bar{F}$. In our experiments, we set k = 8.

D.2 Point cloud & Multi-view Reconstruction

In Algorithm 2, we describe the overall algorithm used for the point cloud and multi-view reconstruction tasks. We explain each step in detail below.

D.2.1 Two Phase Optimization

We divide each optimization epoch into two phases. In the first phase (phase 1), we optimize all of the point attributes: positions, weights, and real values. In the second phase (phase 2), however, we fix the point positions and weights, and only optimize the real values. This design aims at removing ambiguity in our differentiable formulation. That is, even though we want the face existence probabilities to converge to either 0 or 1, they can converge to values in between. To alleviate this ambiguity, after the first phase ends, we fix the tessellation, which makes $\Lambda_{wdt}$ for each face in $F$ either 0 or 1. In the second phase, we therefore only consider the faces that exist in the current WDT, which have a $\Lambda_{wdt}$ value of 1, and only the real values remain to be optimized. Note that the two differentiable renderers that we introduced in Appendix C.3 are designed to serve these two phases, respectively.
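To summarize phase 1, here is a PyTorch-style schematic of one optimization step. Every helper function is a hypothetical stand-in named after the corresponding line of Algorithm 2, not an actual function of the released code.

```python
def phase1_step(points, optimizer, observation,
                l_weight=1e-8, l_real=1e-3, l_qual=1e-3):
    """One phase-1 iteration of Algorithm 2 (schematic).

    `points` bundles positions P, weights W, and real values psi as
    tensors with requires_grad=True; the coefficients default to the
    values listed in Appendix E.2. All helpers below are stand-ins.
    """
    optimizer.zero_grad()
    wdt, pd = run_wdt(points)                        # WDT and power diagram
    faces = update_candidate_faces(wdt)              # maintain F
    probs = existence_probability(points, wdt, pd, faces)   # Lambda(F)
    loss = (reconstruction_loss(points, faces, probs, observation)
            + l_weight * weight_regularization(pd)
            + l_real * real_regularization(points, wdt)
            + l_qual * quality_regularization(points, faces, probs))
    loss.backward()                                  # grads flow to P, W, psi
    optimizer.step()
    return float(loss)
```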
D.2.2 Point Initialization with Sample Points

In this work, we propose two point initialization methods. The first can be used when we have sample points near the target geometry at hand. It is based on the observation that the vertices of the Voronoi diagram of a point set tend to lie on the medial axis of the target geometry (Amenta u. a., 1998a,b). Therefore, for a given sample point set $P_{sample}$, we first build its Voronoi diagram and find the Voronoi vertices $P_{voronoi}$. Then, we merge them to initialize our point set $P$: $P = P_{sample} \cup P_{voronoi}$, with all weights initialized to 1. We set the real values ($\psi$) of points in $P_{sample}$ to 1, while setting those of points in $P_{voronoi}$ to 0. (A minimal sketch of this initialization is given below, after Appendix D.2.4.)

(a) Ground Truth (b) Initialized DMesh (Points, Extracted Mesh)

Figure 15: Initialized DMesh using sample points from the ground truth mesh. (a) From the ground truth mesh, we uniformly sample 10K points to initialize DMesh. (b) In the left figure, sample points from the ground truth mesh ($P_{sample}$) are rendered in red, and the points in $P_{voronoi}$ are rendered in blue. In the right figure, we render the initial mesh obtained from these points, which has a lot of holes.

In Figure 15, we render the mesh obtained from this initialization method when using 10K sample points. Note that the initial mesh has a lot of holes, because there can be Voronoi vertices located near the mesh surface, as pointed out by Amenta u. a. (1998b). However, we converge to the target mesh faster than with the initialization method discussed below, because most of the points that we need are already located near the target geometry.

D.2.3 Point Initialization without Sample Points

If there are no sample points that we can use to initialize our points, we initialize them with $N^3$ points regularly distributed on a grid structure that encompasses the domain, all with weight 1 and $\psi$ value 1. We set N = 20 for every experiment (Figure 16a). Then, we optimize the mesh to retrieve a coarse form of the target geometry (Figure 16b). Note that we need to refine this mesh in the subsequent epochs, as explained below.

(a) Epoch 1, Initial State (b) Epoch 1, Last State (c) Epoch 2, Initial State (d) Epoch 2, Last State (e) Epoch 3, Initial State (f) Epoch 3, Last State (g) Epoch 4, Initial State (h) Epoch 4, Last State

Figure 16: Optimization process of multi-view reconstruction for the Plant model. In each row, we present the initial state (left) and the last state (right) of each epoch. For each figure, the left rendering shows the point attributes color-coded by real values, while the right one shows the extracted mesh. (a), (b) In the first epoch, we initialize DMesh without sample points. At the end of each epoch, we sample points from the current mesh and use them for initialization in the next epoch.

D.2.4 Point Initialization for Different Inputs

Until now, we have introduced two point initialization techniques. When the input is a point cloud, we sample a subset of the point cloud to initialize our mesh (Figure 15). However, when the input is multi-view images, we start from the initialization without sample points (Figure 16), because there is no sample point cloud that we can make use of.
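The sample-based initialization of Appendix D.2.2 takes only a few lines with SciPy. This sketch is illustrative (the names are ours); in practice, one might additionally discard Voronoi vertices that fall outside the bounding domain, since they can lie arbitrarily far away.

```python
import numpy as np
from scipy.spatial import Voronoi

def init_points_from_samples(p_sample):
    """Sketch of sample-based initialization: merge the input samples
    with the vertices of their Voronoi diagram; all weights are 1,
    psi = 1 for samples (on the surface) and psi = 0 for Voronoi
    vertices (near the medial axis)."""
    vor = Voronoi(p_sample)               # p_sample: (n, 3) array
    p_voronoi = vor.vertices              # approximate medial axis points
    positions = np.concatenate([p_sample, p_voronoi], axis=0)
    weights = np.ones(len(positions))
    psi = np.concatenate([np.ones(len(p_sample)),
                          np.zeros(len(p_voronoi))])
    return positions, weights, psi
```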
D.2.5 Maintaining F

We maintain the running set $F$ of faces whose existence probability we evaluate. At each iteration, after we compute the WDT, we insert every face of the WDT into $F$, as such faces have a high probability of persisting in subsequent optimization steps. Also, as in the mesh to DMesh conversion (Appendix D.1), every 10 optimization steps, we find the k-nearest neighbors of each point, form face combinations based on them, and add these combinations to $F$.

D.2.6 Mesh Refinement

At the start of each epoch, if it is not the first epoch, we refine our mesh by increasing the number of points. To elaborate, we sample $N$ points on the current DMesh, and then initialize the point attributes using those sample points as explained above. We increase $N$ as the number of epochs increases. For instance, in our multi-view reconstruction experiments, we set the number of epochs to 4, and set $N$ = (1K, 3K, 10K) for the epochs after the first one. In Figure 16, we render the initial and last state of DMesh for each epoch. Note that the mesh complexity increases and the mesh becomes more accurate as the epochs proceed, because we use more points. Therefore, this approach can be regarded as a coarse-to-fine approach.

D.2.7 Post-Processing

For multi-view reconstruction, we found it helpful to add one more constraint to the definition of face existence. In our formulation, a face $F$ in general has two tetrahedra ($T_1$, $T_2$) that are adjacent to each other across the face. We call the remaining points of $T_1$ and $T_2$ that are not included in $F$ as $P_1$ and $P_2$. Our new constraint requires at least one of $P_1$ and $P_2$ to have a $\psi$ value of 0 for $F$ to exist. This additional constraint is inspired by the fact that $F$ is not visible from outside if $F$ exists in our original formulation and both $P_1$ and $P_2$ have $\psi$ values of 1. That is, if a face is not visible from outside, we do not recognize its existence. This constraint also accommodates our real regularization, which increases the real values of points near the surface. If this regularization raises the real values of points inside a closed surface, those points end up producing internal faces that are invisible from outside. Because of this invisibility, our loss function cannot generate a signal to remove them, so we can expect all faces inside a closed surface to persist. Therefore, we choose to remove those internal faces by applying this new constraint in the post-processing step. Note that this discussion assumes that our renderer performs precise depth testing. If it does not, internal faces can be regarded as visible from outside, and thus receive false gradient signals. In Figure 13a, the final mesh after phase 1 is rendered, and we can see that there are lots of internal faces, as the renderer used in phase 1 does not support precise depth testing. However, we can remove them with the other renderer in phase 2, as shown in Figure 13b, which justifies our implementation of two different renderers. Finally, we note that this constraint is not necessary for point cloud reconstruction, because if we minimize $CD_{ours}$ in Appendix C.2, the internal faces are removed automatically.
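A minimal sketch of this visibility constraint as a post-processing filter is given below. It assumes each face stores the real values of its two opposite tetrahedron apices ($P_1$, $P_2$); boundary faces with a single adjacent tetrahedron would need special handling, and the names are our own.

```python
import numpy as np

def filter_internal_faces(faces, apex_psi, probs, thresh=0.5):
    """Sketch of the post-processing constraint (Appendix D.2.7).

    faces: (N, 3) vertex indices; probs: (N,) existence probabilities;
    apex_psi: (N, 2) real values (psi(P1), psi(P2)) of the two points
    opposite to each face. A face is kept only if it is probable enough
    AND at least one apex has a (rounded) real value of 0, i.e. the face
    is visible from outside.
    """
    psi = np.round(apex_psi)                       # snap psi to {0, 1}
    visible = (psi[:, 0] == 0) | (psi[:, 1] == 0)
    keep = (probs > thresh) & visible
    return faces[keep]
```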
E Experimental Details

In this section, we provide experimental details for the results in Section 5, along with visual renderings of our reconstructed meshes. Additionally, we provide the results of ablation studies on the regularizations suggested in Section 4.3.

E.1 Mesh to DMesh

As shown in Table 1, we reconstruct the ground truth connectivity of the Bunny, Dragon, and Buddha models from the Stanford dataset (Curless und Levoy, 1996). For all these experiments, we optimized for 20K steps, and used an Adam optimizer (Kingma und Ba, 2014) with a learning rate of $10^{-4}$. For the Bunny model, we inserted additional points every 5000 steps; for the other models, every 2000 steps. In Figure 17, we provide the ground truth meshes and our reconstructed meshes. We can observe that most of the connectivity is preserved in our reconstruction, as shown numerically in Table 1. However, note that the appearance of the reconstructed mesh can differ slightly from the ground truth mesh, because we allow 1% positional perturbations of the mesh vertices.

(a) Ground Truth Mesh (b) Reconstructed DMesh

Figure 17: Reconstruction results for the mesh to DMesh experiment. From left: Bunny, Dragon, and Buddha. Most of the edge connectivity is preserved in the reconstruction, even though the appearance is slightly different from the ground truth mesh because of small perturbations of the vertex positions.

E.2 Point Cloud & Multi-view Reconstruction

E.2.1 Hyperparameters for Point Cloud Reconstruction

Optimizer: Adam optimizer. Learning rate = $10^{-4}$ for open surface meshes and two mixed surface meshes (Bigvegas, Raspberry) / $3 \times 10^{-4}$ for closed surface meshes and one mixed surface mesh (Plant).
Regularization: $\lambda_{weight} = 10^{-8}$, $\lambda_{real} = 10^{-3}$, $\lambda_{qual} = 10^{-3}$ for every mesh.
Number of epochs: single epoch for every mesh.
Number of steps per epoch: 1000 steps for phase 1, 500 steps for phase 2 for every mesh.

E.2.2 Hyperparameters for Multi-view Reconstruction

Optimizer: Adam optimizer. Learning rate = $10^{-3}$ in the first epoch, and $3 \times 10^{-4}$ in the other epochs for every mesh.
Weight Regularization: $\lambda_{weight} = 10^{-8}$ for every mesh.
Real Regularization: $\lambda_{real} = 10^{-3}$ for the first 100 steps in every epoch for open surface meshes and one mixed surface mesh (Plant) / $10^{-2}$ for the first 100 steps in every epoch for closed surface meshes and two mixed surface meshes (Bigvegas, Raspberry).
Quality Regularization: $\lambda_{qual} = 10^{-3}$ for every mesh.
Normal Coefficient: $\lambda_{normal} = 0$ for every mesh (Eq. 17).
Number of epochs: 4 epochs for every mesh. In the first epoch, use $20^3$ regularly distributed points for initialization. In the subsequent epochs, sample 1K, 3K, and 10K points from the current mesh for initialization.
Number of steps per epoch: 500 steps for phase 1, 500 steps for phase 2 for every mesh.
Batch size: 64 for open surface meshes, 16 for the other meshes.
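For readability, the multi-view settings above can be collected into a single configuration object. The dictionary below is merely our own summary of the listed values; neither the structure nor the key names come from the released code.

```python
# Hypothetical config summarizing Appendix E.2.2.
MULTIVIEW_CONFIG = {
    "optimizer": "Adam",
    "lr": {"epoch_1": 1e-3, "later_epochs": 3e-4},
    "lambda_weight": 1e-8,
    "lambda_real": 1e-3,           # 1e-2 for closed and two mixed meshes
    "real_reg_steps": 100,         # applied only at the start of each epoch
    "lambda_qual": 1e-3,
    "lambda_normal": 0.0,
    "num_epochs": 4,
    "init_grid_points": 20 ** 3,   # regular grid in the first epoch
    "refine_samples": [1_000, 3_000, 10_000],
    "steps_per_phase": (500, 500),
    "batch_size": {"open": 64, "other": 16},
}
```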
E.2.3 Visual Renderings

In Figures 22, 23, and 24, we provide visual renderings of our point cloud and multi-view reconstruction results alongside the ground truth meshes. We also illustrate the input point cloud and diffuse map. Note that we also used depth renderings for the multi-view reconstruction experiments.

E.2.4 Additional Discussion

Generally, we can observe that the reconstruction results from both point clouds and multi-view images capture the overall topology well. However, we noticed that the multi-view reconstruction results are not as good as the point cloud reconstruction results; in particular, we can observe small holes in the multi-view results. We assume that these artifacts come from the relatively weaker supervision of multi-view images compared to dense point clouds. We also believe that we could improve the multi-view reconstruction results with a more advanced differentiable renderer and a better mesh refinement strategy. In the current implementation, we lose connectivity information at the start of each epoch, which is undesirable. We believe we could improve this approach by inserting points near the regions of interest, rather than resampling over the entire mesh.

Also, regarding the comparison to Flexicube (Shen u. a., 2023) in Table 2, we tried to find out why ours gives better results than Flexicube in terms of CD to the ground truth mesh for closed surfaces in the Thingi32 dataset. We observed that Flexicube's reconstruction results capture fine geometric details on the surface mesh, but they also contain lots of false internal structure (Figure 18). This observation applies not only to closed surfaces, but also to open surfaces, where Flexicube generates lots of false, self-intersecting faces (Figure 19). Our results do not suffer from these problems, as we perform post-processing (Appendix D.2) to remove inner structure, and our method can represent open surfaces better than the volumetric approaches, without self-intersecting faces.

(a) Ground Truth Mesh (b) Flexicube

Figure 18: Reconstruction results for a closed surface model in the Thingi32 dataset. Flexicube (Shen u. a., 2023) can generate internal structures, while our approach removes them through post-processing.

(a) Ground Truth (b) Flexicube (c) Flexicube, self-intersecting faces removed

Figure 19: Reconstruction results for the Plant model. Flexicube (Shen u. a., 2023) can generate redundant, self-intersecting faces for open surfaces, in this case, leaves. To better capture the redundant faces, we rendered the models from the upper side, shown in the bottom-right figures.

E.3 Ablation studies

In this section, we provide ablation studies for the regularizations proposed in Section 4.3. We tested the effect of the regularizations on the point cloud reconstruction task.

E.3.1 Weight Regularization

We tested the influence of weight regularization on the final mesh by choosing $\lambda_{weight}$ in ($10^{-6}$, $10^{-5}$, $10^{-4}$). We kept the other experimental settings as described in Section E.2, except $\lambda_{qual}$, which is set to 0 to exclude it from optimization. In Table 5, we provide the quantitative results. For each $\lambda_{weight}$, we reconstructed meshes from point clouds, and computed the average Chamfer Distance (CD) and the average number of faces across all test data. We can observe a clear tradeoff between CD and mesh complexity. To be specific, when $\lambda_{weight} = 10^{-6}$, the CD is not very different from the results in Table 2, where we use $\lambda_{weight} = 10^{-8}$. However, when it increases to $10^{-5}$ and $10^{-4}$, the mesh complexity (in terms of the number of faces) decreases, but the CD increases quickly.

(a) Bigvegas

Figure 20: Point cloud reconstruction results with different $\lambda_{weight}$. From left: $\lambda_{weight} = 10^{-6}$, $10^{-5}$, and $10^{-4}$.

(a) Bigvegas

Figure 21: Point cloud reconstruction results with different $\lambda_{qual}$. From left: $\lambda_{qual} = 10^{-4}$, $10^{-3}$, and $10^{-2}$.
Table 5: Ablation study for weight regularization, quantitative results.

    λ_weight     10^{-6}   10^{-5}   10^{-4}
    CD           7.48      8.08      10.82
    Num. Face    4753      2809      1786

The renderings in Figure 20 support these quantitative results. When $\lambda_{weight} = 10^{-6}$, we observe good reconstruction quality. When $\lambda_{weight} = 10^{-5}$, there are small artifacts in the reconstruction, but we obtain meshes of generally good quality with fewer faces. However, when it becomes $10^{-4}$, the reconstruction results deteriorate, producing holes and bumpy faces on otherwise smooth surfaces. Therefore, we conclude that weight regularization contributes to reducing mesh complexity, but $\lambda_{weight}$ must be chosen carefully so that it does not harm reconstruction quality. The experimental results suggest that setting $\lambda_{weight}$ to $10^{-6}$ is a good choice for balancing these two contradictory objectives.

E.3.2 Quality Regularization

As in the previous section, we test the influence of quality regularization on the final mesh by selecting $\lambda_{qual}$ among ($10^{-4}$, $10^{-3}$, $10^{-2}$). We keep the other experimental settings the same as before, except $\lambda_{weight} = 0$.

Table 6: Ablation study for quality regularization, quantitative results.

    λ_qual        10^{-4}   10^{-3}   10^{-2}
    CD            7.60      7.42      7.28
    Num. Face     8266      8349      10806
    Aspect Ratio  2.33      2.06      1.55

In Table 6 and Figure 21, we present quantitative and qualitative comparisons of the reconstruction results, reporting the average CD, average number of faces, and average aspect ratio of faces. Interestingly, unlike weight regularization, we did not observe a tradeoff between CD and aspect ratio. Instead, we found that CD decreases as the aspect ratio gets smaller, i.e., as the triangle quality improves. We attribute this phenomenon to the increase in smaller, good-quality triangle faces. Note that there is no significant difference in the number of faces between $\lambda_{qual} = 10^{-4}$ and $10^{-3}$; we also cannot find a big difference between their visual renderings, even though the aspect ratio clearly improved. However, when $\lambda_{qual}$ becomes $10^{-2}$, the number of faces increases quickly, which can also be observed in the renderings. We believe this increase stems from our quality constraint: the optimization has to generate more triangles to represent the same area if there is less freedom to change the triangle shape. Since there are more triangle faces, we assume they contribute to capturing fine details better, leading to the improved CD. At the same time, however, note that the number of holes increases as we increase $\lambda_{qual}$, which leads to visual artifacts. We assume that there are not enough points to remove these holes while generating quality triangles that meet our constraint. Therefore, as discussed before, if we can find a systematic way to prevent holes, or come up with a better optimization scheme to remove them, we expect to obtain accurate meshes comprised of better-quality triangles.

(a) Mesh 164 (b) Mesh 30 (c) Mesh 320 (d) Mesh 448

Figure 22: Point cloud and multi-view reconstruction results for open surface models. From left: ground truth mesh, sample point cloud, point cloud reconstruction results, diffuse rendering, multi-view reconstruction results.

(a) Mesh 64444 (b) Mesh 252119 (c) Mesh 313444 (d) Mesh 527631

Figure 23: Point cloud and multi-view reconstruction results for closed surface models. From left: ground truth mesh, sample point cloud, point cloud reconstruction results, diffuse rendering, multi-view reconstruction results.
(a) Bigvegas (c) Mesh 313444

Figure 24: Point cloud and multi-view reconstruction results for mixed surface models. From left: ground truth mesh, sample point cloud, point cloud reconstruction results, diffuse rendering, multi-view reconstruction results.

NeurIPS Paper Checklist

1. Claims
Question: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
Answer: [Yes]
Justification: In the abstract and introduction, we state that we present a differentiable mesh representation and explore various aspects of it (e.g., computational cost, reconstruction tasks, regularization). These points are accurately discussed across the entire paper, including the Appendix.

2. Limitations
Question: Does the paper discuss the limitations of the work performed by the authors?
Answer: [Yes]
Justification: We discussed two limitations of our current approach in Section 6, concerning computational efficiency and the manifoldness of the generated mesh.
3. Theory Assumptions and Proofs
Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?
Answer: [No]
Justification: Even though we tried our best to describe every theoretical detail in the main paper, especially in Section 3.2 and Appendix B, our method assumes that the reader has some background knowledge of the geometrical concepts. However, we specified sources that readers can refer to for details about the theoretical claims made in the paper.

4. Experimental Result Reproducibility
Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?
Answer: [Yes]
Justification: In Appendices D and E, we provide the pseudocode of our optimization process and detailed hyperparameters to reproduce the experimental results. We also included code in the supplementary material.
5. Open access to data and code
Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?
Answer: [Yes]
Justification: Excluding 2 models that we obtained from Adobe Stock, every model that we used is publicly available. We also submitted our code. However, due to the size limitation of the supplementary material, we could not submit the entire version of the code, because it depends on many external libraries. We believe readers can compare the code and the paper to learn the details of our work.

6. Experimental Setting/Details
Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?
Answer: [Yes]
Justification: We provide experimental details in Appendix E.

7. Experiment Statistical Significance
Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?
Answer: [Yes]
Justification: Given the nature of our optimization-based experiments, random initialization has little effect on the results. In addition, our experiments examine the effects of resolution and topology variety, and do not use large amounts of test data. Thus, error bars for showing statistical significance are not suitable for our paper.

8. Experiments Compute Resources
Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?
Answer: [Yes]
Justification: We specified the machine used for running the experiments in Section 5, and reported execution times in the experimental results.

9. Code Of Ethics
Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines?
Answer: [Yes]
Justification: Our research conforms to every guideline in the Code of Ethics.

10. Broader Impacts
Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?
Answer: [NA]
Justification: Our research is mainly about geometry, especially triangle mesh formulation, so we believe it has no direct societal impact.

11. Safeguards
Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?
Answer: [NA]
Justification: Our research does not deliver any trained model; it suggests a method for formulating a differentiable mesh.

12. Licenses for existing assets
Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?
Answer: [No]
Justification: We cited the sources of the assets used in the experiments in Section 5, but did not include license information about them.
13. New Assets
Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?
Answer: [NA]
Justification: We used only existing assets.

14. Crowdsourcing and Research with Human Subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?
Answer: [NA]
Justification: Our research does not involve crowdsourcing nor research with human subjects.

15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects
Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?
Answer: [NA]
Justification: Our research does not involve crowdsourcing nor research with human subjects.