Shape As Points: A Differentiable Poisson Solver

Songyou Peng1,2  Chiyu Max Jiang  Yiyi Liao2,3  Michael Niemeyer2,3  Marc Pollefeys1,4  Andreas Geiger2,3
1ETH Zurich  2Max Planck Institute for Intelligent Systems, Tübingen  3University of Tübingen  4Microsoft
Work done while at UC Berkeley. Corresponding authors.

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Abstract

In recent years, neural implicit representations gained popularity in 3D reconstruction due to their expressiveness and flexibility. However, the implicit nature of neural implicit representations results in slow inference time and requires careful initialization. In this paper, we revisit the classic yet ubiquitous point cloud representation and introduce a differentiable point-to-mesh layer using a differentiable formulation of Poisson Surface Reconstruction (PSR) that allows for a GPU-accelerated fast solution of the indicator function given an oriented point cloud. The differentiable PSR layer allows us to efficiently and differentiably bridge the explicit 3D point representation with the 3D mesh via the implicit indicator field, enabling end-to-end optimization of surface reconstruction metrics such as Chamfer distance. This duality between points and meshes hence allows us to represent shapes as oriented point clouds, which are explicit, lightweight and expressive. Compared to neural implicit representations, our Shape-As-Points (SAP) model is more interpretable, lightweight, and accelerates inference time by one order of magnitude. Compared to other explicit representations such as points, patches, and meshes, SAP produces topology-agnostic, watertight manifold surfaces. We demonstrate the effectiveness of SAP on the task of surface reconstruction from unoriented point clouds and learning-based reconstruction.

1 Introduction

Shape representations are central to many of the recent advancements in 3D computer vision and computer graphics, ranging from neural rendering [41,45,48,55,58] to shape reconstruction [10,26,40,47,50,52,70]. While conventional representations such as point clouds and meshes are efficient and well-studied, they also suffer from several limitations: point clouds are lightweight and easy to obtain, but do not directly encode surface information; meshes, on the other hand, are usually restricted to fixed topologies. More recently, neural implicit representations [10,40,50] have shown promising results for representing geometry due to their flexibility in encoding varied topologies and their easy integration with differentiable frameworks. However, as such representations encode surface information only implicitly, extracting the underlying surface is typically slow: it requires numerous network evaluations in 3D space for extracting complete surfaces using marching cubes [10,40,50], or along rays for intersection detection in the context of volumetric rendering [45,47,49,70].

In this work, we introduce a novel Poisson solver which performs fast GPU-accelerated Differentiable Poisson Surface Reconstruction (DPSR) and solves for an indicator function from an oriented point cloud in a few milliseconds. Thanks to the differentiability of our Poisson solver, gradients from a loss on the output mesh or a loss on the intermediate indicator grid can be efficiently backpropagated to update the oriented point cloud representation.
This differential bridge between points, indicator functions, and meshes allows us to represent shapes as oriented point clouds. We therefore call this shape representation Shape-As-Points (SAP). Compared to existing shape representations, Shape-As-Points has the following advantages (see also Table 1):

- Efficiency: SAP has a low memory footprint as it only requires storing a collection of oriented point samples at the surface, rather than volumetric quantities (voxels) or a large number of network parameters for neural implicit representations. Using spectral methods, the indicator field can be computed efficiently (12 ms at 128³ resolution³), compared to the typically slow query time of neural implicit networks (330 ms using [40] at the same resolution).
- Accuracy: The resulting mesh can be generated at high resolutions, is guaranteed to be watertight, free from self-intersections, and is also topology-agnostic.
- Initialization: It is easy to initialize SAP with a given geometry such as template shapes or noisy observations. In contrast, neural implicit representations are harder to initialize, except for few simple primitives like spheres [1]. See supplementary for more discussion.

³On average, our method requires 12 ms for computing a 128³ indicator grid from 15K points on a single NVIDIA GTX 1080Ti GPU. Computing a 256³ indicator grid requires 140 ms.

                       Points [17]  Voxels [11]  Meshes [61]  Patches [20]  Implicits [40]  SAP (Ours)
Grid Eval Time (128³)  n/a          n/a          n/a          n/a           0.33s           0.012s
Easy Initialization    ✓            ✓            ✓            ✓             ✗               ✓
Watertight             ✗            ✓            ✓            ✗             ✓               ✓
No Self-intersection   n/a          n/a          ✗            ✗             ✓               ✓
Topology-Agnostic      ✓            ✓            ✗            ✓             ✓               ✓

Table 1: Overview of Different Shape Representations. Shape-As-Points produces higher quality geometry compared to other explicit representations [11,17,20,61] and requires significantly less inference time for extracting geometry compared to neural implicit representations [40].

To investigate the aforementioned properties, we perform a set of controlled experiments. Moreover, we demonstrate state-of-the-art performance in reconstructing surface geometry from unoriented point clouds in two settings: an optimization-based setting that does not require training and is applicable to a wide range of shapes, and a learning-based setting for conditional shape reconstruction that is robust to noisy point clouds and outliers. In summary, the main contributions of this work are:

- We present Shape-As-Points, a novel shape representation that is interpretable, lightweight, and yields high-quality watertight meshes at low inference times.
- The core of the Shape-As-Points representation is a versatile, differentiable and generalizable Poisson solver that can be used for a range of applications.
- We study various properties inherent to the Shape-As-Points representation, including inference time, sensitivity to initialization and topology-agnostic representation capacity.
- We demonstrate state-of-the-art reconstruction results from noisy unoriented point clouds at a significantly reduced computational budget compared to existing methods.

Code is available at https://github.com/autonomousvision/shape_as_points.

2 Related Work

2.1 3D Shape Representations

3D shape representations are central to 3D computer vision and graphics. Shape representations can be generally categorized as being either explicit or implicit. Explicit shape representations and learning algorithms depending on such representations directly parameterize the surface of the geometry, either as a point cloud [17,38,53,54,64,67], a parameterized mesh [22,24,27,61] or surface patches [2,20,37,43,65,68,69].
Explicit representations are usually lightweight and require few parameters to represent the geometry, but they suffer from discretization, from the difficulty of representing watertight surfaces (point clouds, surface patches), or are restricted to a pre-defined topology (meshes). Implicit representations, in contrast, represent the shape as a level set of a continuous function over a discretized voxel grid [14,25,33,66] or, more recently, parameterized as a neural network, typically referred to as a neural implicit function [10,40,50]. Neural implicit representations have been successfully used to represent geometries of objects [10,16,18,35,40,44,47,50,57,59,62,63] and scenes [8,26,34,46,52,57]. Additionally, neural implicit functions are able to represent radiance fields which allow for high-fidelity appearance and novel view synthesis [39,45]. However, extracting surface geometry from implicit representations typically requires dense evaluation of multi-layer perceptrons, either on a volumetric grid or along rays, resulting in slow inference. In contrast, SAP efficiently solves the Poisson equation during inference by representing the shape as an oriented point cloud.

2.2 Optimization-based 3D Reconstruction from Point Clouds

Several works have addressed the problem of inferring continuous surfaces from a point cloud. They tackle this task by utilizing basis functions, set properties of the points, or neural networks. Early works in shape reconstruction from point clouds utilize the convex hull or alpha shapes for reconstruction [15]. The ball pivoting algorithm [5] leverages the continuity property of spherical balls of a given radius. One of the most popular techniques, Poisson Surface Reconstruction (PSR) [28,29], solves the Poisson equation and inherits smoothness properties from the basis functions used in the Poisson equation. However, PSR is sensitive to the normals of the input points, which must be inferred in a separate preprocessing step. In contrast, our method does not require any normal estimation and is thus more robust to noise. More recent works take advantage of the continuous nature of neural networks as function approximators to fit surfaces to point sets [19,23,42,65]. However, these methods tend to be memory and computationally intensive, while our method yields high-quality watertight meshes in a few milliseconds.

2.3 Learning-based 3D Reconstruction from Point Clouds

Learning-based approaches exploit a training set of 3D shapes to infer the parameters of a reconstruction model. Some approaches focus on local data priors [2,26], which typically result in better generalization but suffer when large surfaces must be completed. Other approaches learn object-level [33,40,50] or scene-level priors [12,13,26,52]. Most reconstruction approaches directly reconstruct a meshed surface geometry, though some works [3,4,21,31] first predict point set normals and subsequently reconstruct the geometry via PSR [28,29]. However, such methods fail to handle large levels of noise, since they are unable to move points or selectively ignore outliers.
In contrast, our end-to-end approach is able to address this issue by either moving outlier points to the actual surface, or by selectively muting outliers: forming paired point clusters that cancel each other out, or reducing the magnitude of the predicted normals, which controls their influence on the reconstruction.

3 Method

At the core of the Shape-As-Points representation is a differentiable Poisson solver, which can be used for both optimization-based and learning-based surface estimation. We first introduce the Poisson solver in Section 3.1. Next, we investigate two applications using our solver: optimization-based 3D reconstruction (Section 3.2) and learning-based 3D reconstruction (Section 3.3).

3.1 Differentiable Poisson Solver

The key step in Poisson Surface Reconstruction [28,29] involves solving the Poisson equation. Let $\mathbf{x} \in \mathbb{R}^3$ denote a spatial coordinate and $\mathbf{n} \in \mathbb{R}^3$ denote its corresponding normal. The Poisson equation arises from the insight that a set consisting of point coordinates and normals $\{\mathbf{p} = (\mathbf{c}, \mathbf{n})\}$ can be viewed as samples of the gradient of the underlying implicit indicator function $\chi(\mathbf{x})$ that describes the solid geometry. We define the normal vector field as a superposition of pulse functions $\mathbf{v}(\mathbf{x}) = \sum_{(\mathbf{c}_i, \mathbf{n}_i) \in \{\mathbf{p}\}} \delta(\mathbf{x} - \mathbf{c}_i, \mathbf{n}_i)$, where $\delta(\mathbf{x}, \mathbf{n}) = \mathbf{n}$ if $\mathbf{x} = \mathbf{0}$ and $\mathbf{0}$ otherwise. By applying the divergence operator, the variational problem transforms into the standard Poisson equation:

$$\nabla^2 \chi := \nabla \cdot \nabla \chi = \nabla \cdot \mathbf{v} \tag{1}$$

In order to solve this set of linear Partial Differential Equations (PDEs), we discretize the function values and differential operators. Without loss of generality, we assume that the normal vector field $\mathbf{v}$ and the indicator function $\chi$ are sampled at $r$ uniformly spaced locations along each dimension. Denoting the spatial dimensionality of the problem by $d$, we consider the three-dimensional case where $n := r \cdot r \cdot r$ for $d = 3$. We have the indicator function $\chi \in \mathbb{R}^n$, the point normal field $\mathbf{v} \in \mathbb{R}^{n \times d}$, the gradient operator $\nabla : \mathbb{R}^n \mapsto \mathbb{R}^{n \times d}$, the divergence operator $\nabla \cdot (\,) : \mathbb{R}^{n \times d} \mapsto \mathbb{R}^n$, and the derived Laplacian operator $\nabla^2 := \nabla \cdot \nabla : \mathbb{R}^n \mapsto \mathbb{R}^n$. Under such a discretization scheme, solving for the indicator function amounts to solving the linear system by inverting the divergence operator, subject to the boundary condition that surface points lie on the zero level set. Following [28], we fix the overall scale to $m = 0.5$ at $\mathbf{x} = \mathbf{0}$:

$$\chi = (\nabla^2)^{-1} \nabla \cdot \mathbf{v} \quad \text{s.t.} \quad \chi|_{\mathbf{x} \in \{\mathbf{c}\}} = 0 \ \text{ and } \ \text{abs}(\chi|_{\mathbf{x}=\mathbf{0}}) = m \tag{2}$$

Point Rasterization: We obtain the uniformly discretized point normal field $\mathbf{v}$ by rasterizing the point normals onto a uniformly sampled voxel grid. We can differentiably perform point rasterization via inverse trilinear interpolation, similar to the approach in [28,29]: we scatter the point normal values to the voxel grid vertices, weighted by the trilinear interpolation weights. The point rasterization process has $O(n)$ space complexity, linear in the number of grid cells, and $O(N)$ time complexity, linear in the number of points. See supplementary for details.
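To make the scatter step concrete, below is a minimal PyTorch sketch of differentiable trilinear rasterization. The function name is ours, and we assume points normalized to [0, 1) and periodic wrap-around at the grid boundary (consistent with the FFT-based solver introduced next); the authors' exact implementation is described in their supplementary.

```python
import torch

def rasterize_normals(points, normals, r):
    """Scatter per-point normals onto an (r, r, r) grid with trilinear weights.

    points:  (N, 3) coordinates, assumed normalized to [0, 1).
    normals: (N, 3) normal vectors.
    Returns a (3, r, r, r) normal field v. The scatter is differentiable with
    respect to the normals and, through the trilinear weights, the points.
    """
    grid = torch.zeros(3, r * r * r, dtype=normals.dtype, device=points.device)
    xyz = points * r                      # continuous voxel coordinates
    base = xyz.floor().long()             # lower corner of the enclosing cell
    frac = xyz - base.to(xyz.dtype)       # fractional offset in [0, 1)
    for dx in (0, 1):                     # accumulate into the 8 cell corners
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((frac[:, 0] if dx else 1 - frac[:, 0]) *
                     (frac[:, 1] if dy else 1 - frac[:, 1]) *
                     (frac[:, 2] if dz else 1 - frac[:, 2]))          # (N,)
                corner = (base + torch.tensor([dx, dy, dz], device=points.device)) % r
                flat = (corner[:, 0] * r + corner[:, 1]) * r + corner[:, 2]
                grid.index_add_(1, flat, (w[:, None] * normals).t())  # (3, N) source
    return grid.view(3, r, r, r)
```

Because the trilinear weights depend smoothly on the point positions, gradients flow to both points and normals through this scatter.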
Spectral Methods for Solving PSR: In contrast to the finite-element approach taken in [28,29], we solve the PDEs using spectral methods [7]. While spectral methods are commonly used in scientific computing for solving PDEs and have in some cases been applied to computer vision problems [32], we are the first to apply them in the context of Poisson Surface Reconstruction. Unlike finite-element approaches that depend on irregular data structures such as octrees or tetrahedral meshes for discretizing space, spectral methods can be solved efficiently over a uniform grid, as they leverage highly optimized Fast Fourier Transform (FFT) operations that are well supported on GPUs, TPUs, and mainstream deep learning frameworks. Spectral methods decompose the original signal into a linear sum of sine/cosine basis functions whose derivatives can be computed analytically, which allows us to easily approximate differential operators in spectral space. We denote spectral-domain signals with a tilde, i.e., $\tilde{\mathbf{v}} = \text{FFT}(\mathbf{v})$. We first solve for the unnormalized indicator function $\chi'$, not accounting for the boundary conditions:

$$\chi' = \text{IFFT}(\tilde{\chi}), \qquad \tilde{\chi} = \tilde{g}_{\sigma,r}(\mathbf{u}) \odot \frac{i\mathbf{u} \cdot \tilde{\mathbf{v}}}{-2\pi \|\mathbf{u}\|^2}, \qquad \tilde{g}_{\sigma,r}(\mathbf{u}) = \exp\left(-2\,\frac{\sigma^2 \|\mathbf{u}\|^2}{r^2}\right) \tag{3}$$

where the spectral frequencies are denoted as $\mathbf{u} := (u, v, w) \in \mathbb{R}^{n \times d}$, corresponding to the $x, y, z$ spatial dimensions, and $\text{IFFT}(\tilde{\chi})$ represents the inverse fast Fourier transform of $\tilde{\chi}$. $\tilde{g}_{\sigma,r}(\mathbf{u})$ is a Gaussian smoothing kernel of bandwidth $\sigma$ at grid resolution $r$ in the spectral domain; it is used to mitigate the ringing effects (Gibbs phenomenon) that result from rasterizing the point normals. We denote the element-wise product as $\odot : \mathbb{R}^n \times \mathbb{R}^n \mapsto \mathbb{R}^n$, the $L_2$-norm as $\|\cdot\|_2 : \mathbb{R}^{n \times d} \mapsto \mathbb{R}^n$, and the dot product as $(\cdot) : \mathbb{R}^{n \times d} \times \mathbb{R}^{n \times d} \mapsto \mathbb{R}^n$. Finally, we subtract the mean of the indicator function at the point set and scale the indicator function to obtain the solution to the PSR problem in Eqn. 2:

$$\chi = \underbrace{\frac{m}{\text{abs}(\chi'|_{\mathbf{x}=\mathbf{0}})}}_{\text{scale}} \Big( \chi' - \underbrace{\frac{1}{|\{\mathbf{c}\}|} \sum_{\mathbf{c} \in \{\mathbf{c}\}} \chi'|_{\mathbf{x}=\mathbf{c}}}_{\text{subtract by mean}} \Big) \tag{4}$$

A detailed derivation of our differentiable PSR solver is provided in the supplementary material.
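The spectral solve itself reduces to a handful of FFT calls and element-wise operations. The following PyTorch sketch mirrors Eqns. (3) and (4) under our own simplifying assumptions (integer spectral frequencies on the unit cube, nearest-neighbor lookup of $\chi'$ at the input points instead of trilinear sampling); it is an illustration, not the authors' reference code.

```python
import torch

def dpsr(v, points, sigma=2.0, m=0.5):
    """Spectral Poisson solve, mirroring Eqns. (3) and (4).

    v:      (3, r, r, r) rasterized point-normal field.
    points: (N, 3) input points in [0, 1), used for the zero-level-set shift.
    Returns an (r, r, r) indicator grid chi.
    """
    r = v.shape[-1]
    # Integer spectral frequencies for each axis of the unit cube.
    freq = torch.fft.fftfreq(r, d=1.0 / r, device=v.device)
    u = torch.stack(torch.meshgrid(freq, freq, freq, indexing="ij"))  # (3, r, r, r)
    u_sq = (u ** 2).sum(0)                                            # ||u||^2

    v_tilde = torch.fft.fftn(v, dim=(1, 2, 3))
    div_tilde = (1j * u * v_tilde).sum(0)             # i u . v_tilde (divergence)
    g = torch.exp(-2.0 * sigma ** 2 * u_sq / r ** 2)  # Gaussian smoothing kernel
    denom = -2.0 * torch.pi * u_sq
    denom[0, 0, 0] = 1.0                              # avoid 0/0 at the DC component
    chi_tilde = g * div_tilde / denom
    chi_tilde[0, 0, 0] = 0.0                          # fix the free constant of the PDE
    chi = torch.fft.ifftn(chi_tilde).real             # unnormalized indicator chi'

    # Eq. (4): scale by m / |chi'(0)|, subtract the mean of chi' at the points.
    # Nearest-neighbor lookup is used here for brevity; trilinear sampling would
    # additionally make this step differentiable w.r.t. the point positions.
    scale = m / chi[0, 0, 0].abs()
    idx = (points * r).long().clamp(0, r - 1)
    return scale * (chi - chi[idx[:, 0], idx[:, 1], idx[:, 2]].mean())
```

Since `torch.fft` operations are differentiable, gradients on the resulting indicator grid flow back through the solve to the rasterized normal field.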
3.2 SAP for Optimization-based 3D Reconstruction

We can use the proposed differentiable Poisson solver for various applications. First, we consider the classical task of surface reconstruction from unoriented point clouds. The overall pipeline for this setting is illustrated in Fig. 1 (top). We now provide details about each component.

Figure 1: Model Overview. Top: Pipeline for optimization-based single object reconstruction. The Chamfer loss on the target point cloud is backpropagated to the source point cloud with normals for optimization. Bottom: Pipeline for learning-based surface reconstruction. Unlike the optimization-based setting, here we provide supervision at the indicator grid level, since we assume access to watertight meshes for supervision, as is common practice in learning-based single object reconstruction.

Forward pass: It is natural to initialize the oriented 3D point cloud serving as 3D shape representation using the noisy 3D input points and corresponding (estimated) normals. However, to demonstrate the flexibility and robustness of our model, we purposefully initialize our model using a generic 3D sphere with radius $r$ in our experiments. Given the oriented point cloud, we apply our Poisson solver to obtain an indicator function grid, which can be converted to a mesh using Marching Cubes [36].

Backward pass: For every point $\mathbf{p}_{\text{mesh}}$ sampled from the mesh $\mathcal{M}$, we calculate a bi-directional L2 Chamfer distance $\mathcal{L}_{\text{CD}}$ with respect to the input point cloud. To backpropagate the loss $\mathcal{L}_{\text{CD}}$ through $\mathbf{p}_{\text{mesh}}$ to a point $\mathbf{p}$ in our source oriented point cloud, we decompose the gradient using the chain rule:

$$\frac{\partial \mathcal{L}_{\text{CD}}}{\partial \mathbf{p}} = \frac{\partial \mathcal{L}_{\text{CD}}}{\partial \mathbf{p}_{\text{mesh}}} \cdot \frac{\partial \mathbf{p}_{\text{mesh}}}{\partial \chi} \cdot \frac{\partial \chi}{\partial \mathbf{p}} \tag{5}$$

All terms in (5) are differentiable except for the middle one, $\partial \mathbf{p}_{\text{mesh}} / \partial \chi$, which involves Marching Cubes. However, this gradient can be effectively approximated by the inverse surface normal [56]:

$$\frac{\partial \mathbf{p}_{\text{mesh}}}{\partial \chi} = -\mathbf{n}_{\text{mesh}} \tag{6}$$

where $\mathbf{n}_{\text{mesh}}$ is the normal of the point $\mathbf{p}_{\text{mesh}}$. Different from MeshSDF [56], which uses the gradients to update the latent code of a pretrained implicit shape representation, our method updates the source point cloud using the proposed differentiable Poisson solver.

Resampling: To increase the robustness of the optimization process, we uniformly resample points and normals from the largest mesh component every 200 iterations, and replace all points in the original point cloud with the resampled ones. This resampling strategy eliminates outlier points that drift away during the optimization and enforces a more uniform distribution of points. We provide an ablation study in the supplementary.

Coarse-to-fine: To further decrease run-time, we employ a coarse-to-fine strategy during optimization. More specifically, we start optimizing at an indicator grid resolution of 32³ for 1000 iterations, from which we obtain a coarse shape. Next, we sample from this coarse mesh and continue optimization at a resolution of 64³ for 1000 iterations. We repeat this process until we reach the target resolution (256³), at which we acquire the final output mesh. See also supplementary.

3.3 SAP for Learning-based 3D Reconstruction

We now consider the learning-based 3D reconstruction setting, in which we train a conditional model that takes a noisy, unoriented point cloud as input and outputs a 3D shape. More specifically, we train the model to predict a clean oriented point cloud, from which we obtain a watertight mesh using our Poisson solver and Marching Cubes. We leverage the differentiability of our Poisson solver to learn the parameters of this conditional model. Following common practice, we assume watertight meshes as ground truth and consequently supervise directly with the ground truth indicator grid obtained from these meshes. Fig. 1 (bottom) illustrates the pipeline of our architecture for the learning-based surface reconstruction task.

Architecture: We first encode the unoriented input point cloud coordinates $\{\mathbf{c}\}$ into a feature $\phi$. The resulting feature should encapsulate both local and global information about the input point cloud. We utilize the convolutional point encoder proposed in [52] for this purpose. Note that in the following we will use $\phi_\theta(\mathbf{c})$ to denote the features at point $\mathbf{c}$, dropping the dependency of $\phi$ on the remaining points $\{\mathbf{c}\}$ for clarity. Also, we use $\theta$ to refer to network parameters in general. Given their features, we aim to estimate both offsets and normals for every input point $\mathbf{c}$ in the point cloud $\{\mathbf{c}\}$. We use a shallow Multi-Layer Perceptron (MLP) $f_\theta$ to predict the offset for $\mathbf{c}$:

$$\Delta\mathbf{c} = f_\theta(\mathbf{c}, \phi_\theta(\mathbf{c})) \tag{7}$$

where $\phi_\theta(\mathbf{c})$ is obtained from the feature volume using trilinear interpolation. We predict $k$ offsets per input point, where $k \geq 1$. We add the offsets $\Delta\mathbf{c}$ to the input point position $\mathbf{c}$ and call the updated point position $\hat{\mathbf{c}}$. Additional offsets allow us to densify the point cloud, leading to enhanced reconstruction quality. We choose $k = 7$ for all learning-based reconstruction experiments (see ablation study in Table 4). For each updated point $\hat{\mathbf{c}}$, we use a second MLP $g_\theta$ to predict its normal:

$$\hat{\mathbf{n}} = g_\theta(\hat{\mathbf{c}}, \phi_\theta(\hat{\mathbf{c}})) \tag{8}$$

We use the same decoder architecture as in [52] for both $f_\theta$ and $g_\theta$. The network comprises 5 layers of ResNet blocks with a hidden dimension of 32. The two networks $f_\theta$ and $g_\theta$ do not share weights.
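A possible wiring of the two decoder heads is sketched below. Note that the paper uses 5-layer ResNet-block decoders from [52]; for brevity we substitute plain MLPs here, so the class name, argument names, and layer sizes are our own illustration.

```python
import torch
import torch.nn as nn

class OffsetNormalDecoder(nn.Module):
    """Offset head f and normal head g (Eqns. 7 and 8).

    The convolutional point encoder of [52] is assumed to be available as a
    feature lookup `phi: (M, 3) -> (M, feat_dim)` (trilinear interpolation
    into its feature volume).
    """

    def __init__(self, feat_dim=32, hidden=32, k=7):
        super().__init__()
        self.k = k
        def mlp(out_dim):
            return nn.Sequential(
                nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim))
        self.f = mlp(3 * k)  # k offsets per input point
        self.g = mlp(3)      # one normal per updated point (separate weights)

    def forward(self, points, phi):
        feat = phi(points)                                   # (N, feat_dim)
        offsets = self.f(torch.cat([points, feat], dim=-1))  # (N, 3k)
        c_hat = (points[:, None] + offsets.view(-1, self.k, 3)).reshape(-1, 3)
        n_hat = self.g(torch.cat([c_hat, phi(c_hat)], dim=-1))
        return c_hat, n_hat                                  # (kN, 3) each
```

Predicting k > 1 offsets turns each noisy input point into several candidate surface points, which is how the point cloud is densified before the Poisson solve.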
Training and Inference: During training, we obtain the estimated indicator grid $\hat{\chi}$ from the predicted point clouds $(\hat{\mathbf{c}}, \hat{\mathbf{n}})$ using our differentiable Poisson solver. Since we assume watertight and noise-free meshes for supervision, we acquire the ground truth indicator grid by running PSR on a densely sampled point cloud of the ground truth meshes with the corresponding ground truth normals. This avoids running Marching Cubes at every iteration and accelerates training. We use the Mean Square Error (MSE) loss on the predicted and ground truth indicator grids:

$$\mathcal{L}_{\text{DPSR}} = \|\hat{\chi} - \chi\|^2 \tag{9}$$

We implement all models in PyTorch [51] and use the Adam optimizer [30] with a learning rate of 5e-4. During inference, we use our trained model to predict normals and offsets, use DPSR to solve for the indicator grid, and run Marching Cubes [36] to extract meshes.
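Putting the pieces together, one training step might look as follows, reusing the hypothetical `OffsetNormalDecoder`, `rasterize_normals` and `dpsr` sketches from above; data loading and the encoder feature lookup `phi` are assumed.

```python
import torch

model = OffsetNormalDecoder()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

def train_step(noisy_points, phi, chi_gt, r=128):
    """One step: predict points/normals, solve DPSR, regress the GT grid."""
    optimizer.zero_grad()
    c_hat, n_hat = model(noisy_points, phi)  # offsets + normals
    v = rasterize_normals(c_hat, n_hat, r)   # differentiable rasterization
    chi_hat = dpsr(v, c_hat)                 # differentiable Poisson solve
    loss = ((chi_hat - chi_gt) ** 2).mean()  # L_DPSR, Eq. (9)
    loss.backward()                          # gradients flow to the MLPs
    optimizer.step()
    return loss.item()
```

Because the supervision target is the precomputed ground-truth indicator grid, no mesh extraction is needed inside the training loop.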
4 Experiments

Following the exposition in the previous section, we conduct two types of experiments to evaluate our method. First, we perform single object reconstruction from unoriented point clouds. Next, we apply our method to learning-based surface reconstruction on ShapeNet [9], using noisy point clouds with or without outliers as inputs.

Datasets: We use the following datasets for optimization-based reconstruction: 1) Thingi10K [71], 2) the Surface Reconstruction Benchmark (SRB) [65], and 3) D-FAUST [6]. Similar to prior works [19,23,65], we use 5 objects per dataset. For learning-based object-level reconstruction, we consider all 13 classes of the ShapeNet [9] subset, using the train/val/test split from [11].

Baselines: In the optimization-based reconstruction setting, we compare to the network-based methods IGR [19] and Point2Mesh [23], as well as Screened Poisson Surface Reconstruction⁴ (SPSR) [29] on plane-fitted normals. To ensure that the predicted normals are consistently oriented for SPSR, we propagate the normal orientation using the minimum spanning tree [72]. For learning-based surface reconstruction, we compare against the point-based Point Set Generation Network (PSGN) [17], the patch-based AtlasNet [20], voxel-based 3D-R2N2 [11], and ConvONet [52], which has recently reported state-of-the-art results on this task. We use ConvONet in its best-performing setting (3-plane encoders). SPSR is also used as a baseline. In addition, to evaluate the importance of our differentiable PSR optimization, we design another point-based baseline. This baseline uses the same network architecture to predict points and normals. However, instead of passing them to our Poisson solver and calculating $\mathcal{L}_{\text{DPSR}}$ on the indicator grid, we directly supervise the point positions with a bi-directional Chamfer distance and the normals with an L1 loss, as done in [37]. During inference, we also feed the predicted points and normals to our PSR solver and run Marching Cubes to obtain meshes.

⁴We use the official implementation https://github.com/mkazhdan/PoissonRecon.

Metrics: We consider Chamfer distance, Normal Consistency, and F-Score with the default threshold of 1% for evaluation, and also report optimization & inference time.

4.1 Optimization-based 3D Reconstruction

In this part, we investigate whether our method can be used for the single-object surface reconstruction task from unoriented point clouds or scans. We consider three different types of 3D inputs: point clouds sampled from synthetic meshes [71] with Gaussian noise, real-world scans [65], and high-resolution raw scans of humans with comparably little noise [6].

Figure 2: Optimization-based 3D Reconstruction (columns: Input, IGR [19], Point2Mesh [23], SPSR [29], Ours, GT mesh). Input point clouds are downsampled for visualization. Note that the ground truth of SRB is provided as point clouds.

Dataset    Method           Chamfer-L1 (↓)  F-Score (↑)  Normal C. (↑)  Time (s)
Thingi10K  IGR [19]         0.440           0.505        0.692          1842.3
           Point2Mesh [23]  0.109           0.656        0.806          3714.7
           SPSR [29]        0.223           0.787        0.896          9.3
           Ours             0.054           0.940        0.947          370.1
SRB        IGR [19]         0.178           0.755        -              1847.6
           Point2Mesh [23]  0.116           0.648        -              4707.9
           SPSR [29]        0.232           0.735        -              9.2
           Ours             0.076           0.830        -              326.0
D-FAUST    IGR [19]         0.235           0.805        0.911          1857.2
           Point2Mesh [23]  0.071           0.855        0.905          3678.7
           SPSR [29]        0.044           0.966        0.965          4.3
           Ours             0.043           0.966        0.959          379.9

Table 2: Optimization-based 3D Reconstruction. Quantitative comparison on 3 datasets. Normal Consistency cannot be evaluated on SRB as this dataset provides only unoriented point clouds. Optimization time is evaluated on a single GTX 1080Ti GPU for IGR, Point2Mesh and our method.

Fig. 2 and Table 2 show that our method achieves superior performance compared to both classical methods and network-based approaches. Note that the objects considered in this task are challenging due to their complex geometry, thin structures, and noisy and incomplete observations. While some of the baseline methods fail completely on these challenging objects, our method achieves robust performance across all datasets. In particular, Fig. 2 shows that IGR occasionally creates meshes in free space, as this is not penalized by its optimization objective when point clouds are unoriented. Both Point2Mesh and our method alleviate this problem by optimizing for the Chamfer distance between the estimated mesh and the input point cloud. However, Point2Mesh requires an initial mesh as input, whose topology cannot be changed during optimization. It thus relies on SPSR to provide an initial mesh for objects with genus larger than 0 and suffers from inaccurate initialization [23]. Furthermore, compared to both IGR and Point2Mesh, our method converges faster. While SPSR is even more efficient, it suffers from incorrect normal estimation on noisy input point clouds, which is a non-trivial task in itself. In contrast, our method demonstrates more robust behavior, as we optimize points and normals guided by the Chamfer distance. Note that in this single object reconstruction task, our method is not able to complete large unobserved regions (e.g., the bottom of the person's feet in Fig. 2 is unobserved and hence not completed). This limitation can be addressed using learning-based object-level reconstruction, as discussed next.
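For reference, the Chamfer-L1 and F-Score numbers reported in Tables 2 and 3 follow the standard point-set definitions, which can be sketched as follows (our own illustration; exact evaluation protocols, e.g., the number of sampled points, follow prior work):

```python
import torch

def chamfer_and_fscore(pred, gt, tau=0.01):
    """Chamfer-L1 and F-Score between two point sets sampled on the meshes.

    pred: (N, 3), gt: (M, 3). tau is the F-Score threshold (1% by default).
    """
    d = torch.cdist(pred, gt)            # (N, M) pairwise Euclidean distances
    d_pred = d.min(dim=1).values         # pred -> gt   (accuracy)
    d_gt = d.min(dim=0).values           # gt   -> pred (completeness)
    chamfer_l1 = 0.5 * (d_pred.mean() + d_gt.mean())
    precision = (d_pred < tau).float().mean()
    recall = (d_gt < tau).float().mean()
    fscore = 2 * precision * recall / (precision + recall + 1e-8)
    return chamfer_l1.item(), fscore.item()
```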
4.2 Learning-based Reconstruction on ShapeNet

To analyze whether our proposed differentiable Poisson solver is also beneficial for learning-based reconstruction, we evaluate our method on the single object reconstruction task using noise- and outlier-augmented point clouds from ShapeNet as input. We investigate the performance for three different noise levels: (a) Gaussian noise with zero mean and standard deviation 0.005, (b) Gaussian noise with zero mean and standard deviation 0.025, (c) 50% of the points carry the same noise as in (a) while the other 50% are outliers sampled uniformly inside the unit cube.

Figure 3: 3D Reconstruction from Point Clouds on ShapeNet (columns: Input, SPSR [29], 3D-R2N2 [11], AtlasNet [20], ConvONet [52], Ours, GT mesh). Comparison of SAP to baselines on 3 different setups. More results can be found in the supplementary.

                   (a) Noise=0.005                (b) Noise=0.025                (c) Noise=0.005, Outliers=50%
                   Chamfer-L1 F-Score  Normal C.  Chamfer-L1 F-Score  Normal C.  Chamfer-L1 F-Score  Normal C.  Runtime
SPSR [29]          0.298      0.612    0.772      0.499      0.324    0.604      1.317      0.164    0.636      -
PSGN [17]          0.147      0.259    -          0.151      0.247    -          0.736      0.007    -          0.010 s
3D-R2N2 [11]       0.172      0.400    0.715      0.173      0.418    0.710      0.202      0.387    0.709      0.015 s
AtlasNet [20]      0.093      0.708    0.855      0.117      0.527    0.821      1.822      0.057    0.609      0.025 s
ConvONet [52]      0.044      0.942    0.938      0.066      0.849    0.913      0.052      0.916    0.929      0.327 s
Ours (w/o L_DPSR)  0.044      0.942    0.935      0.067      0.841    0.907      0.085      0.819    0.903      0.064 s
Ours               0.034      0.975    0.944      0.054      0.896    0.917      0.038      0.959    0.936      0.064 s

Table 3: 3D Reconstruction from Point Clouds on ShapeNet. Quantitative comparison between our method and baselines on the ShapeNet dataset (mean over 13 classes).

Fig. 3 and Table 3 show our results. Compared to the baselines, our method achieves similar or better results on all three metrics. The results show that, in comparison to directly using a Chamfer loss on point positions and an L1 loss on point normals, our DPSR loss produces better reconstructions in all settings, as it directly supervises the indicator grid, which implicitly determines the surface through the Poisson equation. SPSR fails when the noise level is high or when there are outliers in the input point cloud. We achieve significantly better performance than other representations such as point clouds, meshes, voxel grids and patches. Moreover, we find that our method is robust to strong outliers. We refer to the supplementary for more detailed visualizations of how SAP handles outliers.

Table 3 also reports the runtime for setting (a) for all GPU-accelerated methods using a single NVIDIA GTX 1080Ti GPU, averaged over all objects of the ShapeNet test set. The baselines [11,17,20] demonstrate fast inference but suffer in terms of reconstruction quality, while the neural implicit model [52] attains high-quality reconstructions but suffers from slow inference. In contrast, our method produces competitive reconstruction results at reasonably fast inference time. In addition, since ConvONet and our method share a similar reconstruction pipeline, we provide a more detailed breakdown of the runtime at resolutions of 128³ and 256³ voxels in Table 4. We use the default setup from ConvONet⁵. As we can see from Table 4, the difference in terms of point encoding and Marching Cubes is marginal, but we gain a more than 20× speed-up over ConvONet in evaluating the indicator grid. In total, we are roughly 5× and 8× faster in total inference time at resolutions of 128³ and 256³ voxels, respectively.

⁵To be consistent, we use the Marching Cubes implementation from [60] for both ConvONet and ours.

          128³ resolution              256³ resolution
          Enc.   Grid   MC     Total   Enc.   Grid   MC     Total
ConvONet  0.010  0.280  0.037  0.327   0.010  3.798  0.299  4.107
Ours      0.013  0.012  0.039  0.064   0.019  0.140  0.374  0.533

          Chamfer  F-Score  Normal C.
Offset 1  0.041    0.952    0.928
Offset 3  0.039    0.958    0.934
Offset 5  0.039    0.957    0.934
Offset 7  0.038    0.959    0.936
2D Enc.   0.043    0.939    0.928
3D Enc.   0.038    0.959    0.936

Table 4: Ablation Study. Left: Runtime breakdown (encoding, grid evaluation, marching cubes) for ConvONet vs. ours in seconds. Right: Ablation over the number of offsets and 2D vs. 3D encoders.

4.3 Ablation Study

In this section, we investigate different architecture choices in the context of learning-based reconstruction. We conduct our ablation experiments on ShapeNet for the third setup (most challenging).
Number of Offsets: From Table 4 we notice that predicting more offsets per input point leads to better performance. This can be explained by the fact that with more points near the object surface, geometric details can be better preserved.

Point Cloud Encoder: Here we compare the two different point encoder architectures proposed in [52]: a 2D encoder using 3 canonical planes at a resolution of 64² pixels, and a 3D encoder using a feature volume at a resolution of 32³ voxels. We find that the 3D encoder works best in this setting and hypothesize that this is due to the representational alignment with the 3D indicator grid.

5 Conclusion

We introduce Shape-As-Points, a novel shape representation which is lightweight, interpretable and produces watertight meshes efficiently. We demonstrate its effectiveness for the task of surface reconstruction from unoriented point clouds in both optimization-based and learning-based settings. Our method is currently limited to small scenes due to the cubic memory requirements with respect to the indicator grid resolution. We believe that processing scenes in a sliding-window manner and space-adaptive data structures (e.g., octrees) will enable extending our method to larger scenes. Point cloud-based methods are broadly used in real-world applications ranging from household robots to self-driving cars, and hence share the same societal opportunities and risks as other learning-based 3D reconstruction techniques.

Acknowledgement: Andreas Geiger was supported by the ERC Starting Grant LEGO-3D (850533) and the DFG EXC number 2064/1 - project number 390727645. The authors thank the Max Planck ETH Center for Learning Systems (CLS) for supporting Songyou Peng and the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Michael Niemeyer. This work was supported by an NVIDIA research gift. We thank Matthias Nießner, Thomas Funkhouser, Hugues Hoppe, and Yue Wang for helpful discussions in early stages of this project. We also thank Xu Chen, Christian Reiser and Rémi Pautrat for proofreading.

References

[1] M. Atzmon and Y. Lipman. SAL: Sign agnostic learning of shapes from raw data. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
[2] A. Badki, O. Gallo, J. Kautz, and P. Sen. Meshlet priors for 3d mesh reconstruction. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
[3] Y. Ben-Shabat and S. Gould. DeepFit: 3d surface fitting via neural network weighted least squares. In Proc. of the European Conf. on Computer Vision (ECCV), 2020.
[4] Y. Ben-Shabat, M. Lindenbaum, and A. Fischer. Nesti-Net: Normal estimation for unstructured 3d point clouds using convolutional neural networks. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.
[5] F. Bernardini, J. Mittleman, H. Rushmeier, C. Silva, and G. Taubin. The ball-pivoting algorithm for surface reconstruction. IEEE Trans. on Visualization and Computer Graphics (VCG), 1999.
[6] F. Bogo, J. Romero, G. Pons-Moll, and M. J. Black. Dynamic FAUST: Registering human bodies in motion. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017.
[7] C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. Spectral Methods: Fundamentals in Single Domains. Springer Science & Business Media, 2007.
[8] R. Chabra, J. E. Lenssen, E. Ilg, T. Schmidt, J. Straub, S. Lovegrove, and R. Newcombe.
Deep local shapes: Learning local SDF priors for detailed 3d reconstruction. In Proc. of the European Conf. on Computer Vision (ECCV), 2020.
[9] A. X. Chang, T. A. Funkhouser, L. J. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An information-rich 3d model repository. arXiv.org, 1512.03012, 2015.
[10] Z. Chen and H. Zhang. Learning implicit fields for generative shape modeling. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.
[11] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese. 3D-R2N2: A unified approach for single and multi-view 3d object reconstruction. In Proc. of the European Conf. on Computer Vision (ECCV), 2016.
[12] A. Dai, C. Diller, and M. Nießner. SG-NN: Sparse generative neural networks for self-supervised scene completion of RGB-D scans. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
[13] A. Dai, C. Diller, and M. Nießner. SG-NN: Sparse generative neural networks for self-supervised scene completion of RGB-D scans. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
[14] A. Dai, D. Ritchie, M. Bokeloh, S. Reed, J. Sturm, and M. Nießner. ScanComplete: Large-scale scene completion and semantic segmentation for 3d scans. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018.
[15] H. Edelsbrunner and E. P. Mücke. Three-dimensional alpha shapes. ACM Trans. on Graphics, 1994.
[16] P. Erler, P. Guerrero, S. Ohrhallinger, M. Wimmer, and N. J. Mitra. Points2Surf: Learning implicit surfaces from point cloud patches. In Proc. of the European Conf. on Computer Vision (ECCV), 2020.
[17] H. Fan, H. Su, and L. J. Guibas. A point set generation network for 3d object reconstruction from a single image. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017.
[18] K. Genova, F. Cole, A. Sud, A. Sarna, and T. A. Funkhouser. Local deep implicit functions for 3d shape. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
[19] A. Gropp, L. Yariv, N. Haim, M. Atzmon, and Y. Lipman. Implicit geometric regularization for learning shapes. In Proc. of the International Conf. on Machine Learning (ICML), 2020.
[20] T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry. AtlasNet: A papier-mâché approach to learning 3d surface generation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018.
[21] P. Guerrero, Y. Kleiman, M. Ovsjanikov, and N. J. Mitra. PCPNet: Learning local shape properties from raw point clouds. In Computer Graphics Forum, 2018.
[22] K. Gupta and M. Chandraker. Neural mesh flow: 3d manifold mesh generation via diffeomorphic flows. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[23] R. Hanocka, G. Metzer, R. Giryes, and D. Cohen-Or. Point2Mesh: A self-prior for deformable meshes. In ACM Trans. on Graphics, 2020.
[24] C. Jiang, J. Huang, A. Tagliasacchi, and L. J. Guibas. ShapeFlow: Learnable deformation flows among 3d shapes. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[25] C. Jiang, P. Marcus, et al. Hierarchical detail enhancing mesh-based shape generation with 3d generative adversarial network. arXiv preprint arXiv:1709.07581, 2017.
[26] C. Jiang, A. Sud, A. Makadia, J. Huang, M. Nießner, and T. Funkhouser. Local implicit grid representations for 3d scenes. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
[27] C. M. Jiang, J. Huang, K. Kashinath, Prabhat, P. Marcus, and M. Nießner. Spherical CNNs on unstructured grids. In Proc. of the International Conf. on Learning Representations (ICLR), 2019.
[28] M. M. Kazhdan, M. Bolitho, and H. Hoppe. Poisson surface reconstruction. In Proceedings of the Fourth Eurographics Symposium on Geometry Processing, 2006.
[29] M. M. Kazhdan and H. Hoppe. Screened Poisson surface reconstruction. ACM Trans. on Graphics, 32(3):29, 2013.
[30] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proc. of the International Conf. on Learning Representations (ICLR), 2015.
[31] J. E. Lenssen, C. Osendorfer, and J. Masci. Deep iterative surface normal estimation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
[32] J. Li and A. O. Hero. A spectral method for solving elliptic equations for surface reconstruction and 3d active contours. In Proc. IEEE International Conf. on Image Processing (ICIP), 2001.
[33] Y. Liao, S. Donne, and A. Geiger. Deep marching cubes: Learning explicit surface representations. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018.
[34] S. Lionar, D. Emtsev, D. Svilarkovic, and S. Peng. Dynamic plane convolutional occupancy networks. In Proc. of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2021.
[35] S. Liu, Y. Zhang, S. Peng, B. Shi, M. Pollefeys, and Z. Cui. DIST: Rendering deep implicit signed distance function with differentiable sphere tracing. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
[36] W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3d surface construction algorithm. In ACM Trans. on Graphics, 1987.
[37] Q. Ma, S. Saito, J. Yang, S. Tang, and M. J. Black. SCALE: Modeling clothed humans with a surface codec of articulated local elements. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.
[38] Q. Ma, J. Yang, S. Tang, and M. J. Black. The power of points for modeling humans in clothing. In Proc. of the IEEE International Conf. on Computer Vision (ICCV), 2021.
[39] R. Martin-Brualla, N. Radwan, M. S. M. Sajjadi, J. T. Barron, A. Dosovitskiy, and D. Duckworth. NeRF in the Wild: Neural radiance fields for unconstrained photo collections. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.
[40] L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger. Occupancy networks: Learning 3d reconstruction in function space. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.
[41] M. Meshry, D. B. Goldman, S. Khamis, H. Hoppe, R. Pandey, N. Snavely, and R. Martin-Brualla. Neural rerendering in the wild. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.
[42] G. Metzer, R. Hanocka, D. Zorin, R. Giryes, D. Panozzo, and D. Cohen-Or. Orienting point clouds with dipole propagation. ACM Trans. on Graphics, 2021.
[43] M. Mihajlovic, S. Weder, M. Pollefeys, and M. R. Oswald. DeepSurfels: Learning online appearance fusion. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.
[44] M. Mihajlovic, Y. Zhang, M. J. Black, and S. Tang. LEAP: Learning articulated occupancy of people. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.
[45] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In Proc. of the European Conf. on Computer Vision (ECCV), 2020.
[46] M. Niemeyer and A. Geiger. GIRAFFE: Representing scenes as compositional generative neural feature fields. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.
[47] M. Niemeyer, L. Mescheder, M. Oechsle, and A. Geiger. Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
[48] M. Oechsle, L. Mescheder, M. Niemeyer, T. Strauss, and A. Geiger. Texture fields: Learning texture representations in function space. In Proc. of the IEEE International Conf. on Computer Vision (ICCV), 2019.
[49] M. Oechsle, S. Peng, and A. Geiger. UNISURF: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In Proc. of the IEEE International Conf. on Computer Vision (ICCV), 2021.
[50] J. J. Park, P. Florence, J. Straub, R. A. Newcombe, and S. Lovegrove. DeepSDF: Learning continuous signed distance functions for shape representation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.
[51] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[52] S. Peng, M. Niemeyer, L. Mescheder, M. Pollefeys, and A. Geiger. Convolutional occupancy networks. In Proc. of the European Conf. on Computer Vision (ECCV), 2020.
[53] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep learning on point sets for 3d classification and segmentation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017.
[54] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
[55] C. Reiser, S. Peng, Y. Liao, and A. Geiger. KiloNeRF: Speeding up neural radiance fields with thousands of tiny MLPs. In Proc. of the IEEE International Conf. on Computer Vision (ICCV), 2021.
[56] E. Remelli, A. Lukoianov, S. R. Richter, B. Guillard, T. Bagautdinov, P. Baque, and P. Fua. MeshSDF: Differentiable iso-surface extraction. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[57] V. Sitzmann, J. N. Martel, A. W. Bergman, D. B. Lindell, and G. Wetzstein. Implicit neural representations with periodic activation functions. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[58] V. Sitzmann, J. Thies, F. Heide, M. Nießner, G. Wetzstein, and M. Zollhöfer. DeepVoxels: Learning persistent 3d feature embeddings. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.
[59] M. Tancik, P. Srinivasan, B. Mildenhall, S. Fridovich-Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. Barron, and R. Ng. Fourier features let networks learn high frequency functions in low dimensional domains. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[60] S. Van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner, N. Yager, E. Gouillart, and T. Yu. scikit-image: Image processing in Python. PeerJ, 2014.
[61] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang. Pixel2Mesh: Generating 3d mesh models from single RGB images. In Proc. of the European Conf. on Computer Vision (ECCV), 2018.
[62] S. Wang, M. Mihajlovic, Q. Ma, A. Geiger, and S. Tang. MetaAvatar: Learning animatable clothed human models from few depth images. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
[63] W. Wang, Q. Xu, D. Ceylan, R. Mech, and U. Neumann. DISN: Deep implicit surface network for high-quality single-view 3d reconstruction. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[64] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon. Dynamic graph CNN for learning on point clouds. ACM Trans. on Graphics, 2019.
[65] F. Williams, T. Schneider, C. Silva, D. Zorin, J. Bruna, and D. Panozzo. Deep geometric prior for surface reconstruction. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.
[66] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
[67] G. Yang, X. Huang, Z. Hao, M. Liu, S. J. Belongie, and B. Hariharan. PointFlow: 3d point cloud generation with continuous normalizing flows. In Proc. of the IEEE International Conf. on Computer Vision (ICCV), 2019.
[68] Y. Yang, C. Feng, Y. Shen, and D. Tian. FoldingNet: Point cloud auto-encoder via deep grid deformation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018.
[69] Z. Yang, Y. Chai, D. Anguelov, Y. Zhou, P. Sun, D. Erhan, S. Rafferty, and H. Kretzschmar. SurfelGAN: Synthesizing realistic sensor data for autonomous driving. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
[70] L. Yariv, Y. Kasten, D. Moran, M. Galun, M. Atzmon, B. Ronen, and Y. Lipman. Multiview neural surface reconstruction by disentangling geometry and appearance. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[71] Q. Zhou and A. Jacobson. Thingi10K: A dataset of 10,000 3d-printing models. arXiv preprint arXiv:1605.04797, 2016.
[72] Q.-Y. Zhou, J. Park, and V. Koltun. Open3D: A modern library for 3D data processing. arXiv:1801.09847, 2018.