# Hypernetwork Approach to Generating Point Clouds

Przemysław Spurek¹, Sebastian Winczowski¹, Jacek Tabor¹, Maciej Zamorski²,³, Maciej Zięba²,³, Tomasz Trzciński⁴,³

¹Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland. ²Wrocław University of Science and Technology, Wrocław, Poland. ³Tooploox, Wrocław, Poland. ⁴Warsaw University of Technology, Warsaw, Poland. Correspondence to: Przemysław Spurek, Tomasz Trzciński.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

In this work, we propose a novel method for generating 3D point clouds that leverages the properties of hypernetworks. Contrary to the existing methods that learn only a representation of a 3D object, our approach simultaneously finds a representation of the object and of its 3D surface. The main idea of our HyperCloud method is to build a hypernetwork that returns the weights of a particular neural network (the target network), trained to map points from a uniform unit-ball distribution onto a 3D shape. As a consequence, a particular 3D shape can be generated by point-by-point sampling from the assumed prior distribution and transforming the sampled points with the target network. Since the hypernetwork is based on an auto-encoder architecture trained to reconstruct realistic 3D shapes, the target network weights can be considered a parametrization of the surface of a 3D shape, rather than the standard point cloud representation returned by competing approaches. The proposed architecture produces mesh-based representations of 3D objects in a generative manner, while providing point clouds on par in quality with the state-of-the-art methods.

Figure 1. Mesh representations generated by our HyperCloud method. Contrary to the existing methods that return point cloud representations sparsely distributed in 3D space, our approach creates a continuous 3D object representation in the form of high-quality meshes.

## 1. Introduction

Today many registration devices, such as LIDARs and depth cameras, capture not only RGB channels but also depth estimates. As a result, 3D objects registered by those devices, together with the geometric data structures representing them, called point clouds, become increasingly important in contemporary computer vision applications, including autonomous driving (Yang et al., 2018a) and robotic manipulation (Kehoe et al., 2015). To enable processing of point clouds, researchers typically transform them into regular 3D voxel grids or collections of images (Su et al., 2015; Wu et al., 2015). This, however, increases the memory footprint of object representations and leads to significant information loss. On the other hand, representing 3D objects with the parameters of their surfaces is not trivial due to the complexity of mesh representations and their combinatorial irregularities. Last but not least, point clouds can contain a variable number of points corresponding to one object, registered at various angles, which requires the methods that process them to be permutation and rotation invariant. One way of addressing the above challenges is to subsample the point clouds and enforce permutation invariance within the model architecture, as was done in Deep Sets (Zaheer et al., 2017) or PointNet (Qi et al., 2017a;b).
Although this works well when point clouds are given as the input of the model, it is not obvious how to apply the approach to variable-size outputs. A recently introduced family of methods solves this problem by relying on generative models that return a probability distribution of points on the object surface, instead of an exact set of points (Yang et al., 2019; Stypułkowski et al., 2019). The most successful methods that follow this path, such as PointFlow (Yang et al., 2019) and Conditional Invertible Flow (Stypułkowski et al., 2019), are based on flow architectures that allow obtaining a representation of 3D object surfaces.

The main limitation of flow-based models is their cumbersome training process. Since flow architectures require the determinant of the Jacobian to be tractable for a given transformation, their optimization needs to be overly constrained. Moreover, flow-based methods cannot be trained on probability distributions with compact support. For instance, it is not possible to train a flow-based model on the uniform distribution on a 3D ball, since computing the log-likelihood cost function returns infinite values and can therefore lead to numerical instability of the entire training procedure. Flow-based models also require the dimensionality of the input and output data to be identical. Last but not least, these architectures need a significant amount of parameter and structure fine-tuning to work.

In this paper, we address the above shortcomings of flow models by introducing a novel architecture that builds on the approach of (Zamorski et al., 2018) and extends it with a hypernetwork (Ha et al., 2016; Klocek et al., 2019) that outputs the weights of a generative model, the so-called target network. The target network can then be used to create an arbitrary number of points (its architecture and weights are returned by the hypernetwork), instead of fixed-size sets. Fig. 2 shows an overview of our method in comparison to the baseline approach. Contrary to flow-based models, our method, dubbed HyperCloud¹, is conceptually much simpler and more general, as it can be used to adapt any PointNet model to generate a continuous output representation. Furthermore, it is much easier to train than the competing algorithms, as it requires fewer hyperparameters and does not put any constraints on the input probability distribution or its Jacobian. Finally, as presented in Fig. 3, our method returns a continuous mesh representation of 3D objects at virtually no cost in the quality of reconstructions. To the best of our knowledge, this is the first time a hypernetwork is used in the context of 3D point cloud generation, and we believe it opens a new research path into understanding and processing this type of data.

¹We make our implementation available at https://github.com/gmum/3d-point-clouds-HyperCloud

Figure 2. Top: the baseline approach for generating 3D point clouds returns a fixed number of points (Zamorski et al., 2018). Bottom: our HyperCloud method leverages a hypernetwork architecture that takes a 3D point cloud as an input and returns the parameters of the target network. Since the parameters of the target network are generated by the hypernetwork, the output set can be variable in size. As a result, we obtain a continuous parametrization of the object's surface and a more powerful representation of its mesh.
Figure 3. Scheme of producing mesh representations with HyperCloud. When using a 3D-ball prior, our method generates 3D point clouds that fill the object with data points; when given a 3D-sphere prior, it transforms samples from the sphere onto the surfaces of 3D objects, a feature highly desirable in the context of 3D mesh rendering.

The contributions of this work can be summarized as follows. Firstly, we introduce a novel yet general method that builds varied-size representations of point clouds that can be output by any model. Secondly, we achieve this by mapping probability distributions to 3D models with generative target networks trained by the hypernetwork introduced in this work. Lastly, our approach offers a continuous mesh representation of 3D objects that can be used to render their surfaces directly, as shown in Fig. 1.

The remainder of this paper is structured as follows. Sec. 2 discusses related work. In Sec. 3 we introduce our HyperCloud approach and describe it in detail. Sec. 4 presents the results of our evaluation, and we conclude this work in Sec. 5.

## 2. Related Work

Introducing deep learning in the context of 3D point cloud representations has improved performance in various discriminative tasks, including classification (Qi et al., 2017a;b; Yang et al., 2018b; Zaheer et al., 2017) and segmentation (Qi et al., 2017a; Shoef et al., 2019). Despite those successes, generating 3D point clouds with deep learning models remains a challenging task.

Due to the irregular format of point cloud representations, most researchers transform such data into regular 3D voxel grids or collections of images. In (Wu et al., 2015), the authors propose a voxelized representation of an input point cloud. Other approaches use multi-view 2D images (Su et al., 2015) or occupancy grid calculation (Ji et al., 2012; Maturana & Scherer, 2015). Modeling volumetric objects in a generative-adversarial manner is considered in (Wu et al., 2016) for the 3D-GAN model.

Another approach to generative models for point clouds converts a point distribution to an N × 3 matrix by sampling a pre-defined number of N points from the distribution, so that existing generative models become applicable. Such a solution can be applied in the VAE framework (Gadelha et al., 2018) as well as in adversarial auto-encoders (AAEs) (Zamorski et al., 2018). In these methods, auto-encoders and GANs are trained with loss functions that directly optimize the distance between two point sets, e.g. the Chamfer distance (CD) or the earth mover's distance (EMD). In (Sun et al., 2018), the authors apply auto-regressive models (Van den Oord et al., 2016) with a discrete point distribution to generate one point at a time, again using a fixed number of points per shape.

All the above methods learn to produce a fixed number of points for each shape, but they do not parametrize the surface of the shapes. Treating a point cloud as a fixed-dimensional matrix has several drawbacks: the model is restricted to generating a fixed number of points, and obtaining more points for a particular shape requires separate up-sampling models such as (Yifan et al., 2019; Yu et al., 2018).

In (Yang et al., 2019), the authors propose a principled probabilistic framework that generates 3D point clouds by modeling them as a distribution of distributions. PointFlow uses two levels of distributions, where the first level is the distribution of shapes and the second level is the distribution of points given a shape.
PointFlow uses continuous normalizing flows (Chen et al., 2018; Grathwohl et al., 2018) for both of these tasks. Instead of directly parametrizing the distribution of points in a shape, PointFlow models this distribution as an invertible parameterized transformation of 3D points from a prior distribution (e.g., a 3D Gaussian). Intuitively, under this model, generating points for a given shape involves sampling points from a generic Gaussian prior and then moving them, according to the parameterized transformation, to their new location in the target shape. Such a solution has many advantages over classical approaches, which only produce a cloud of points; nevertheless, it is limited in multiple ways. The most important limitation is the use of log-likelihood as a cost function: in consequence, such models cannot be trained on probability distributions with compact support. This significantly reduces the utility of flow-based models, as, for instance, using a uniform 3D-ball distribution as a prior returns infinite values and therefore leads to numerical instability of training. In this work, we show that once this constraint is dropped, thanks to using a fully-connected neural network, we can directly model 3D point cloud surfaces and hence create their continuous mesh representations.

## 3. HyperCloud: Hypernetwork for Generating 3D Point Clouds

In this section, we present our HyperCloud model for generating 3D point clouds. HyperCloud encompasses two previously introduced approaches: the auto-encoder based generative model proposed in (Zamorski et al., 2018) and the hypernetwork proposed in (Ha et al., 2016). Before we present our solution, we briefly describe these two approaches.

**Adversarial auto-encoders for 3D point clouds.** Let us start with the auto-encoder architecture for 3D point clouds. Let $X = \{X_i\}_{i=1,\dots,n} = \{(x_i, y_i, z_i)\}_{i=1,\dots,n}$ be a given dataset containing point clouds. The basic aim of an auto-encoder is to transport the data through a typically, but not necessarily, lower-dimensional latent space $\mathcal{Z} \subseteq \mathbb{R}^D$ while minimizing the reconstruction error. Thus, we search for an encoder $\mathcal{E}: X \to \mathcal{Z}$ and a decoder $\mathcal{D}: \mathcal{Z} \to X$ that minimize the reconstruction error between $X_i$ and its reconstruction $\mathcal{D}(\mathcal{E}X_i)$.

For point cloud representations, the crucial step is to define a proper reconstruction loss for the auto-encoding framework. In the literature, two distance measures are commonly applied for reconstruction purposes: the Earth Mover's (Wasserstein) Distance (Rubner et al., 2000) and the Chamfer pseudo-distance (Tran, 2013).

Earth Mover's Distance (EMD) is a metric between two distributions based on the minimal cost that must be paid to transform one distribution into the other. For two equally sized subsets $X_1 \subset \mathbb{R}^3$ and $X_2 \subset \mathbb{R}^3$, their EMD is defined as

$$\mathrm{EMD}(X_1, X_2) = \min_{\phi: X_1 \to X_2} \sum_{x \in X_1} c(x, \phi(x)),$$

where $\phi$ is a bijection and $c(x, \phi(x))$ is a cost function, which can be defined as $c(x, \phi(x)) = \tfrac{1}{2}\|x - \phi(x)\|_2^2$.

Chamfer pseudo-distance (CD) measures the squared distance between each point in one set and its nearest neighbor in the other set:

$$\mathrm{CD}(X_1, X_2) = \sum_{x \in X_1} \min_{y \in X_2} \|x - y\|_2^2 + \sum_{x \in X_2} \min_{y \in X_1} \|x - y\|_2^2.$$
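For reference, the snippet below is a minimal PyTorch sketch of the two losses as defined above; it is our own illustration, not the authors' implementation. The exact EMD is solved here with SciPy's Hungarian algorithm, which scales as O(n³); practical training code typically substitutes a faster approximation.

```python
import torch
from scipy.optimize import linear_sum_assignment

def chamfer_distance(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """CD between point clouds x1 of shape (n, 3) and x2 of shape (m, 3)."""
    d = torch.cdist(x1, x2) ** 2                  # pairwise squared distances (n, m)
    # Sum of squared nearest-neighbor distances in both directions.
    return d.min(dim=1).values.sum() + d.min(dim=0).values.sum()

def earth_mover_distance(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Exact EMD for two equally sized clouds via an optimal bipartite matching."""
    cost = 0.5 * torch.cdist(x1, x2) ** 2         # c(x, phi(x)) = 1/2 ||x - phi(x)||^2
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    return cost[torch.as_tensor(rows), torch.as_tensor(cols)].sum()
```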
An auto-encoder based generative model is a classical auto-encoder with a modified cost function that forces the model to be generative, i.e., ensures that the data transported to the latent space comes from a prior distribution, typically a Gaussian (Kingma & Welling, 2013; Tolstikhin et al., 2017; Tabor et al., 2018). Thus, to construct a generative auto-encoder model, we add to its cost function a measure of the distance of a given sample from the prior distribution.

Variational Auto-encoders (VAEs) are generative models capable of learning an approximate data distribution by applying variational inference (Kingma & Welling, 2013). To ensure that the data transported to the latent space $\mathcal{Z}$ is distributed according to the standard normal density, we add the distance from the standard multivariate normal density:

$$\mathrm{cost}(X; \mathcal{E}, \mathcal{D}) = \mathrm{Err}(X; \mathcal{D}(\mathcal{E}X)) + \lambda\, D_{\mathrm{KL}}(\mathcal{E}X, N(0, I)),$$

where $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence (Kullback & Leibler, 1951). The main limitation of VAE models is that the regularization term requires a particular prior distribution to keep the KL divergence tractable. To deal with this limitation, the authors of (Makhzani et al., 2015) introduce the Adversarial Auto-encoder (AAE), which uses adversarial training to force a particular distribution on the latent space $\mathcal{Z}$. The model assumes an additional neural network, a discriminator, responsible for distinguishing between fake and true samples, where the true samples are drawn from the assumed prior distribution and the fake samples are generated by the encoding network.

In (Zamorski et al., 2018), the authors propose an Adversarial Auto-encoder dedicated to 3D point clouds. Because the input of the model is a set of points, they use the PointNet model (Qi et al., 2017a) as the encoder $\mathcal{E}$, which is invariant to permutations: we receive the same distribution for all possible orderings of the points in $X$. Since the discriminator is not a permutation-invariant mapping (it is a simple MLP), the authors utilize an additional function that provides a one-to-one mapping for the points stored in $X$.

The probability distribution assumed on the latent space can be more complex than $N(0, I)$ and need not be given in an explicit form. Some auto-encoders learn more sophisticated distributions directly from data; such solutions may utilize techniques like the VampPrior (Tomczak & Welling, 2017) or incorporate continuous (Yang et al., 2019) or discrete (Berg et al., 2018) normalizing flows. Given this variety of techniques for enforcing a probability distribution on the latent space, the cost function of the model can be formulated in the more general form

$$\mathrm{cost}(X; \mathcal{E}, \mathcal{D}) = \mathrm{Err}(X; \mathcal{D}(\mathcal{E}X)) + \mathrm{Reg}(\mathcal{E}X, P), \qquad (1)$$

where Err is the Earth Mover's (Wasserstein) Distance or the Chamfer pseudo-distance, and Reg is a function that forces the latent codes to follow some known or trainable distribution $P$. For known distributions such as the Gaussian, Kullback-Leibler divergence or adversarial training can be used for regularization.

In our work, we propose to enrich the presented regularized auto-encoder by replacing the decoder with a hypernetwork. The goal of the hypernetwork is to transform the latent representation of the point cloud into the weights of the so-called target network. The goal of the target network is to transform samples from the assumed prior into points that represent a 3D shape, without fixing an arbitrary number of points. Roughly speaking, in our case the hypernetwork produces a parametrization of the respective generative model.

**Hypernetworks.** Hypernetworks, introduced in (Ha et al., 2016), are defined as neural models that generate the weights of a separate target network solving a specific task. The authors aim to reduce the number of trainable parameters by designing a hypernetwork with fewer parameters than the target network it generates.
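The sketch below illustrates this weight-generating mechanism in PyTorch. It is a minimal illustration under our own assumptions: `TARGET_SIZES` is a placeholder architecture, not the one used in the paper, and the hypernetwork here plays the role of the decoder $\mathcal{D}$, mapping a latent code to one flat weight vector that is then applied functionally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative layer sizes of a target network T: R^3 -> R^3 (an assumption,
# not the architecture from the paper).
TARGET_SIZES = [(3, 64), (64, 64), (64, 3)]
N_TARGET_PARAMS = sum(i * o + o for i, o in TARGET_SIZES)  # weights + biases

class HyperNetwork(nn.Module):
    """Maps a latent code z to a flat vector of target-network weights."""
    def __init__(self, latent_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, N_TARGET_PARAMS),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.mlp(z)

def target_forward(points: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """Apply the target network with generated weights theta to (n, 3) points."""
    h, offset = points, 0
    for k, (i, o) in enumerate(TARGET_SIZES):
        w = theta[offset:offset + i * o].view(i, o); offset += i * o
        b = theta[offset:offset + o]; offset += o
        h = h @ w + b
        if k < len(TARGET_SIZES) - 1:             # no activation on the last layer
            h = F.relu(h)
    return h
```

Note that the target network owns no trainable parameters of its own: gradients of any loss computed on its output flow through `theta` back into the hypernetwork.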
Making an analogy between hypernetworks and generative models, the authors of (Sheikh et al., 2017) use this mechanism to generate a diverse set of target networks approximating the same function. Hypernetworks can also be used for functional representations of images (Klocek et al., 2019). In this concept, by a functional (or deep) representation of an image the authors understand a function (neural network) $I: \mathbb{R}^2 \to \mathbb{R}^3$ which, given a point $(x, y)$ with arbitrary coordinates in the plane, returns a point in $[0, 1]^3$ representing the RGB values of the image color at $(x, y)$.

Figure 4. Interpolations between two 3D point clouds and their mesh representations.

Figure 5. 3D point clouds and their mesh representations produced by HyperCloud.

**HyperCloud.** Inspired by the above methods, we propose our HyperCloud model, which uses a hypernetwork to output the weights of a generative network that creates 3D point clouds, instead of generating them directly with the decoder as done in (Zamorski et al., 2018). More specifically, we parametrize the surface of a 3D object as a function $S: \mathbb{R}^3 \to \mathbb{R}^3$ which, given a point $(x, y, z)$ from the prior distribution, returns a point on the surface of the object. Roughly speaking, instead of producing a 3D point cloud directly, we would like to produce many neural networks (a different one for each object) that model the surfaces of the objects. In practice, we have one neural network architecture that uses different weights for each 3D object. More precisely, we model a function $T_\theta: \mathbb{R}^3 \to \mathbb{R}^3$ (a neural network with weights $\theta$) which takes an element from the prior distribution $P$ and maps it to an element on the surface of the object.

In our work, we use the transformation between the uniform distribution on the 3D ball and the object. This choice of prior allows us to create a continuous mesh representation. The key property is that the distribution has compact support: roughly speaking, unlike the Gaussian, it has a sharp border. As a consequence, we can produce as many points as we need, since we can sample an arbitrary number of points from the uniform distribution on the unit ball and transform them with the target network. Thanks to the target network, we can also train our model on point clouds containing different numbers of points.

Furthermore, we can produce a continuous mesh representation of the object. All elements of the ball are transformed into the 3D object; in consequence, the unit sphere (the boundary of the ball) is transformed into the surface of the object. We can therefore produce meshes without a secondary mesh-rendering procedure, simply by feeding our neural network the vertices of a sphere mesh, see Fig. 3. As a result, we obtain high-quality meshes of 3D objects. The sharpness of the borders is a direct consequence of the compact support of the input prior. Since flow-based models cannot handle this family of priors and require distributions with infinite support, the representations generated with those models are of lower quality.
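The two priors used above can be sampled with a standard textbook construction, sketched below; this is our own illustration and not necessarily the sampler used in the paper's code. The cube-root transform of the radius makes the density inside the ball uniform.

```python
import torch

def sample_unit_ball(n: int) -> torch.Tensor:
    """Draw n points uniformly from the unit ball in R^3."""
    d = torch.randn(n, 3)
    d = d / d.norm(dim=1, keepdim=True)      # uniform direction on the sphere
    r = torch.rand(n, 1) ** (1.0 / 3.0)      # P(R <= r) = r^3, i.e. density ~ r^2
    return d * r

def sample_unit_sphere(n: int) -> torch.Tensor:
    """Draw n points uniformly from the unit sphere, the ball's boundary."""
    d = torch.randn(n, 3)
    return d / d.norm(dim=1, keepdim=True)
```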
The target network is not trained directly. Instead, we use a hypernetwork $H_\phi$ which, for a point cloud $X \subset \mathbb{R}^3$, returns the weights $\theta$ of the corresponding target network $T_\theta$. Thus, a point cloud $X$ is represented by the function $T((x, y, z); \theta) = T((x, y, z); H_\phi(X))$. To use the above model, we need to train the weights $\phi$ of the hypernetwork. For this purpose, we minimize a distance between point clouds, such as the Chamfer distance (CD) or the earth mover's distance (EMD), over the training set of point clouds. More precisely, we take an input point cloud $X \subset \mathbb{R}^3$ and pass it to $H_\phi$. The hypernetwork returns the weights $\theta$ of the target network $T_\theta$. Next, the input point cloud $X$ is compared with the output of the target network $T_\theta$: we sample the appropriate number of points from the prior distribution and transform them with the target network.

As the hypernetwork we use a permutation-invariant encoder based on the PointNet architecture (Qi et al., 2017a) and a modified decoder that produces weights instead of raw points. The architecture of $H_\phi$ consists of an encoder $\mathcal{E}$, a PointNet-like network that transports the data to a lower-dimensional latent space $\mathcal{Z} \subseteq \mathbb{R}^D$, and a decoder $\mathcal{D}$, a fully-connected network that transfers the latent representation to the vector of weights of the target network. In our framework, the hypernetwork $H_\phi(X)$ realizes the auto-encoder structure $\mathcal{D}(\mathcal{E}X)$. Assuming $H_\phi(X) = \mathcal{D}(\mathcal{E}X)$, we train our model by minimizing the cost function given by equation (1).

Observe that we train only a single neural model, the hypernetwork, which allows us to produce a great variety of functions at test time. In consequence, we may expect that the target networks for similar point clouds will be similar (see Sec. 4 for details), and we are able to produce smooth interpolations using the hypernetwork.

Figure 6. Thanks to the hypernetwork architecture, we can work with a single object (the distribution of points on a single 3D point cloud). One possible application is interpolation in the target network: by taking two samples from the uniform ball and interpolating between them, we can construct an interpolation between points on the surface of the object.
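Putting the previous sketches together, one training step could look roughly as follows. This is a simplified illustration: `encoder` is a hypothetical PointNet-style module, batching is omitted, and only the Err term of equation (1) is shown (a Reg term on the latent codes would be added to the loss).

```python
import torch

def training_step(encoder, hyper, clouds, optimizer):
    """One optimization step over a list of point clouds, each of shape (n_i, 3).

    encoder: hypothetical PointNet-like module mapping a cloud to a latent code;
    hyper:   the HyperNetwork from the earlier sketch. Only these two modules
    hold trainable parameters; the target network has none of its own."""
    optimizer.zero_grad()
    loss = 0.0
    for x in clouds:
        z = encoder(x)                            # latent code of the cloud
        theta = hyper(z)                          # weights of the target network
        prior = sample_unit_ball(x.shape[0])      # as many samples as input points
        recon = target_forward(prior, theta)      # reconstruction candidate
        loss = loss + chamfer_distance(recon, x)  # Err term of equation (1)
    loss = loss / len(clouds)
    loss.backward()                               # gradients reach encoder + hyper
    optimizer.step()
    return loss.item()
```

The optimizer would be built over the encoder and hypernetwork parameters only, e.g. `torch.optim.Adam(list(encoder.parameters()) + list(hyper.parameters()))`.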
## 4. Experiments

In this section, we describe the experimental results of the proposed generative model in various tasks, including 3D point cloud and mesh generation and interpolation. In the first subsection, we show that our model inherits the reconstruction and generative capabilities of models based on generating a fixed number of points. Then we show that we are able to produce a continuous mesh representation.

**Metrics.** Following the methodology for evaluating generative fidelity and sample diversity provided in (Achlioptas et al., 2017) and (Yang et al., 2019), we use the following evaluation criteria: Jensen-Shannon Divergence, Coverage, Minimum Matching Distance and 1-Nearest Neighbor Accuracy.

Jensen-Shannon Divergence (JSD): a measure of the distance between two empirical distributions $P$ and $Q$, defined as

$$\mathrm{JSD}(P \| Q) = \frac{\mathrm{KL}(P \| M) + \mathrm{KL}(Q \| M)}{2}, \qquad M = \frac{P + Q}{2}.$$

Coverage (COV): a measure of the generative capabilities of the model in terms of the richness of generated samples. For two point cloud sets $X_1, X_2$, coverage is defined as the fraction of point clouds in $X_2$ that are, in the given metric, the nearest neighbor of some point cloud in $X_1$.

Minimum Matching Distance (MMD): since COV only takes the closest point clouds into account and does not depend on the distance between the matchings, an additional metric was introduced. For point cloud sets $X_1, X_2$, MMD measures the similarity between the point clouds in $X_1$ and those in $X_2$ by matching every cloud in $X_2$ to its closest neighbor in $X_1$ and averaging the distances of these matchings.

1-Nearest Neighbor Accuracy (1-NNA): a testing procedure characteristic of evaluating GANs. We consider two sets: a set $S_g$ of generated point clouds and a set $S_r$ of test (reference) point clouds. For a point cloud $X$, let $N_X$ denote its nearest neighbor in $S_{-X} = S_r \cup S_g \setminus \{X\}$, the set that aggregates both reference and sampled shapes excluding the considered point cloud $X$. The 1-NNA is the leave-one-out accuracy of the 1-NN classifier:

$$\text{1-NNA}(S_g, S_r) = \frac{\sum_{X \in S_g} \mathbb{1}[N_X \in S_g] + \sum_{Y \in S_r} \mathbb{1}[N_Y \in S_r]}{|S_g| + |S_r|}.$$

For each sample, the 1-NN classifier classifies it as coming from $S_r$ or $S_g$ according to the label of its nearest neighbor. The ideal situation occurs when the classifier is unable to distinguish between real and generated point clouds, i.e., when the value of the criterion is close to 50%.
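A direct, unoptimized sketch of COV and MMD under the definitions above (reusing `chamfer_distance` from the earlier snippet; the EMD variants are analogous, and we assume `generated` plays the role of $X_1$ and `reference` of $X_2$):

```python
import torch

def coverage_and_mmd(generated, reference):
    """COV and MMD under the Chamfer distance for two lists of point clouds."""
    d = torch.tensor([[float(chamfer_distance(g, r)) for r in reference]
                      for g in generated])        # distance matrix (|gen|, |ref|)
    # COV: fraction of reference clouds that are the nearest neighbor of
    # at least one generated cloud.
    cov = d.argmin(dim=1).unique().numel() / len(reference)
    # MMD: average distance from each reference cloud to its closest
    # generated cloud.
    mmd = d.min(dim=0).values.mean().item()
    return cov, mmd
```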
We examine the generative capabilities of the proposed HyperCloud model in comparison to the existing reference approaches, following the methodology provided in (Yang et al., 2019). For this particular experiment, we use the hypernetwork architecture trained with the EMD reconstruction loss, together with a continuous flow on the latent representation instead of the simple KLD regularization. We compare the results with the existing solutions: r-GAN (Achlioptas et al., 2017), l-GAN (Achlioptas et al., 2017), PC-GAN (Li et al., 2018) and PointFlow (Yang et al., 2019). We train each model using point clouds from one of three categories of the ShapeNet dataset: airplane, chair, and car, and follow the exact evaluation pipeline provided in (Yang et al., 2019).

The results are presented in Table 1. HyperCloud obtains results comparable to the other models that utilize the EMD reconstruction loss, with the added advantage of sampling an arbitrary number of points. The model is outperformed by PointFlow, which does not utilize EMD as a reconstruction loss and is not directly capable of generating 3D meshes. Moreover, the target network in HyperCloud is implemented as a simple multilayer perceptron (MLP), contrary to PointFlow, which uses a more complex continuous flow; we expect that substituting the MLP with a continuous flow model could yield results comparable to PointFlow. Finally, the inference time of our approach is reduced from 0.27s per ShapeNet sample for PointFlow to 0.08s for HyperCloud (a 3.4x improvement).

Table 1. Generation results. MMD-CD scores are multiplied by 10³; MMD-EMD scores and JSDs are multiplied by 10².

| Category | Method | JSD | MMD (CD) | MMD (EMD) | COV (CD) | COV (EMD) | 1-NNA (CD) | 1-NNA (EMD) |
|----------|--------|-----|----------|-----------|----------|-----------|------------|-------------|
| Airplane | r-GAN | 7.44 | 0.261 | 5.47 | 42.72 | 18.02 | 93.58 | 99.51 |
| Airplane | l-GAN (CD) | 4.62 | 0.239 | 4.27 | 43.21 | 21.23 | 86.30 | 97.28 |
| Airplane | l-GAN (EMD) | 3.61 | 0.269 | 3.29 | 47.90 | 50.62 | 87.65 | 85.68 |
| Airplane | PC-GAN | 4.63 | 0.287 | 3.57 | 36.46 | 40.94 | 94.35 | 92.32 |
| Airplane | PointFlow | 4.92 | 0.217 | 3.24 | 46.91 | 48.40 | 75.68 | 75.06 |
| Airplane | HyperCloud (ours) | 4.84 | 0.266 | 3.28 | 39.75 | 43.70 | 93.80 | 88.95 |
| Airplane | Training set | 6.61 | 0.226 | 3.08 | 42.72 | 49.14 | 70.62 | 67.53 |
| Chair | r-GAN | 11.5 | 2.57 | 12.8 | 33.99 | 9.97 | 71.75 | 99.47 |
| Chair | l-GAN (CD) | 4.59 | 2.46 | 8.91 | 41.39 | 25.68 | 64.43 | 85.27 |
| Chair | l-GAN (EMD) | 2.27 | 2.61 | 7.85 | 40.79 | 41.69 | 64.73 | 65.56 |
| Chair | PC-GAN | 3.90 | 2.75 | 8.20 | 36.50 | 38.98 | 76.03 | 78.37 |
| Chair | PointFlow | 1.74 | 2.42 | 7.87 | 46.83 | 46.98 | 60.88 | 59.89 |
| Chair | HyperCloud (ours) | 2.73 | 2.56 | 7.84 | 41.54 | 46.67 | 68.20 | 68.80 |
| Chair | Training set | 1.50 | 1.92 | 7.38 | 57.25 | 55.44 | 59.67 | 58.46 |
| Car | r-GAN | 12.8 | 1.27 | 8.74 | 15.06 | 9.38 | 97.87 | 99.86 |
| Car | l-GAN (CD) | 4.43 | 1.55 | 6.25 | 38.64 | 18.47 | 63.07 | 88.07 |
| Car | l-GAN (EMD) | 2.21 | 1.48 | 5.43 | 39.20 | 39.77 | 69.74 | 68.32 |
| Car | PC-GAN | 5.85 | 1.12 | 5.83 | 23.56 | 30.29 | 92.19 | 90.87 |
| Car | PointFlow | 0.87 | 0.91 | 5.22 | 44.03 | 46.59 | 60.65 | 62.36 |
| Car | HyperCloud (ours) | 3.09 | 1.07 | 5.38 | 40.05 | 40.05 | 84.65 | 77.27 |
| Car | Training set | 0.86 | 1.03 | 5.33 | 48.30 | 51.42 | 57.39 | 53.27 |

**Generation of 3D meshes.** The main advantage of our method over the reference solutions is the ability to generate both 3D point clouds and meshes without any post-processing stage. In Fig. 5, we present point clouds as well as mesh representations generated by the same model. Thanks to using a uniform distribution on the 3D ball, we can easily construct a mesh: all elements of the ball are transformed into the 3D object and, in consequence, the unit sphere is transformed into the surface of the object. As mentioned above, we can produce meshes without a secondary meshing procedure, simply by propagating the triangulation of the 3D sphere through the target network, see Fig. 3.
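A sketch of this procedure, under the same assumptions as the earlier snippets; the unit-sphere triangulation (`sphere_vertices`, `sphere_faces`) could come, for instance, from an icosphere subdivision:

```python
import torch

def generate_mesh(encoder, hyper, cloud, sphere_vertices, sphere_faces):
    """Turn a point cloud into a triangle mesh by pushing the vertices of a
    triangulated unit sphere through the object's target network.

    sphere_vertices: (v, 3) vertices of a unit-sphere triangulation;
    sphere_faces:    (f, 3) vertex indices of its triangles. The connectivity
    is reused unchanged; only the vertex positions are deformed."""
    with torch.no_grad():
        theta = hyper(encoder(cloud))                      # weights for this object
        vertices = target_forward(sphere_vertices, theta)
    return vertices, sphere_faces                          # mesh of the surface
```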
In the case of a Gaussian prior, a similar procedure can be used, but it is nontrivial to select the optimal sphere radius for mesh generation (contrary to HyperCloud, in PointFlow there is no default radius R). If the chosen radius is too small, the constructed mesh lies inside the point cloud and, consequently, we lose small outlying elements of the object, e.g., chair legs. On the other hand, if the chosen radius is too large, some small elements of the 3D object are merged, e.g., the four legs of a chair are joined into one.

To evaluate the quality of the mesh representation, we propose the following experiment. Instead of sampling points from the assumed prior distribution, we sample them from a given surface (a sphere of an assumed radius). Next, we calculate the standard quality measures of generated point clouds considered in the previous experiment. Since all models except PointFlow listed in Tab. 1 work only on a fixed number of points, we compare our results only with PointFlow. As mentioned above, the PointFlow model can produce a mesh representation in a similar way, by feeding the triangulation of a sphere through its transformation. In our experiment, consistently with the standards used for hypothesis testing, we use the 95%, 98% and 99% confidence spheres of the 3D Gaussian distribution, see Tab. 3. As we can see, the default Gaussian prior is not suitable for producing a continuous representation of the boundary. Moreover, the seemingly natural exchange (in accordance with our approach) of the normal distribution for the uniform distribution on the ball will not work, since flow methods use log-likelihood as a cost function and consequently cannot use a prior density with compact support.

Table 3. Quality measures of 3D representations obtained by sampling from a sphere of a given radius R for the airplane, chair and car shapes. HyperCloud preserves the good quality of the sampled point clouds, while PointFlow has difficulties obtaining good-quality representations from the sphere.

| Category | Method | Sphere R | JSD | MMD (CD) | MMD (EMD) | COV (CD) | COV (EMD) |
|----------|--------|----------|-----|----------|-----------|----------|-----------|
| Airplane | PointFlow | 2.795 | 22.26 | 0.49 | 6.65 | 44.69 | 20.74 |
| Airplane | PointFlow | 3.136 | 26.46 | 0.60 | 6.89 | 39.50 | 19.01 |
| Airplane | PointFlow | 3.368 | 29.65 | 0.68 | 6.84 | 40.49 | 16.79 |
| Airplane | HyperCloud (ours) | 1 | 9.51 | 0.45 | 5.29 | 30.60 | 28.88 |
| Chair | PointFlow | 2.795 | 19.28 | 4.28 | 13.38 | 36.85 | 20.84 |
| Chair | PointFlow | 3.136 | 22.52 | 4.89 | 14.47 | 32.47 | 17.22 |
| Chair | PointFlow | 3.368 | 24.68 | 5.36 | 14.97 | 31.41 | 17.06 |
| Chair | HyperCloud (ours) | 1 | 4.32 | 2.81 | 9.32 | 40.33 | 40.63 |
| Car | PointFlow | 2.795 | 16.59 | 1.6 | 8.00 | 20.17 | 17.04 |
| Car | PointFlow | 3.136 | 20.21 | 1.75 | 7.80 | 21.59 | 17.32 |
| Car | PointFlow | 3.368 | 24.10 | 1.96 | 8.35 | 18.75 | 17.04 |
| Car | HyperCloud (ours) | 1 | 5.20 | 1.11 | 6.54 | 37.21 | 28.40 |

**Unsupervised representation learning.** In this experiment, we evaluate the quality of the latent representation of our model. We follow the experimental settings from previous works (Achlioptas et al., 2017; Yang et al., 2019) and train our model on the full ShapeNet dataset. Next, we evaluate the quality of the latent representation by training a linear SVM classifier on top of it using the ModelNet40 dataset. The results of the empirical evaluation are provided in Table 2. HyperCloud achieves accuracy comparable to the results of the original version of l-GAN, but worse than PointFlow and l-GAN trained with the new settings. Note, however, that in our experiments we did not use the ModelNet dataset preprocessed with the same pipeline as in PointFlow, but in the way recommended in (Achlioptas et al., 2017).

Table 2. Unsupervised feature learning. Models are first trained on ShapeNet to learn shape representations, which are then evaluated on ModelNet40 (MN40) by comparing the accuracy of off-the-shelf SVMs trained on the learned representations. l-GAN-2 was trained and evaluated using the PointFlow experimental settings.

| Method | 3D-GAN | FoldingNet | l-GAN (EMD) | l-GAN (CD) | l-GAN-2 (EMD) | l-GAN-2 (CD) | PointFlow | HyperCloud |
|--------|--------|------------|-------------|------------|---------------|--------------|-----------|------------|
| MN40 accuracy (%) | 83.3 | 88.4 | 84.0 | 84.5 | 87.0 | 86.7 | 86.8 | 84.7 |

**Interpolation.** Our model admits two types of interpolation, since we have two different prior distributions: a Gaussian in the hypernetwork architecture (the latent space of the auto-encoder) and a uniform distribution on the unit ball in the target network, see Fig. 2. First, we can take two 3D objects and obtain a smooth transition between them, see Fig. 4; since we can generate a mesh representation for each point cloud, we can also produce interpolations between meshes. Second, thanks to the hypernetwork architecture, we can work with a single object (the distribution of points on a single 3D point cloud). One possible application is interpolation in the target network instead of the classical approach in the latent space of the auto-encoder, see Fig. 6: by taking two samples from the uniform ball and interpolating between them, we can construct an interpolation between points on the surface of the object.
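Given the earlier sketches, both interpolation types reduce to a few lines (again an illustration with the hypothetical `encoder` and `hyper` modules):

```python
import torch

def latent_interpolation(encoder, hyper, cloud_a, cloud_b, steps=5, n_points=2048):
    """Shape interpolation: blend latent codes, decode each blend to target
    weights, and sample a cloud from every intermediate shape."""
    za, zb = encoder(cloud_a), encoder(cloud_b)
    clouds = []
    for t in torch.linspace(0.0, 1.0, steps):
        theta = hyper((1 - t) * za + t * zb)
        clouds.append(target_forward(sample_unit_ball(n_points), theta))
    return clouds

def surface_interpolation(theta, p, q, steps=5):
    """On-object interpolation: interpolate two prior samples p, q of shape (3,)
    inside the ball and map the segment through a single target network."""
    ts = torch.linspace(0.0, 1.0, steps).unsqueeze(1)
    segment = (1 - ts) * p + ts * q        # stays inside the ball by convexity
    return target_forward(segment, theta)
```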
## 5. Conclusions

In this work, we presented a novel approach that represents point clouds of 3D objects with the parameters of target networks trained by a hypernetwork acting as a generative model. More specifically, we are able to build variable-size representations of point clouds not only when they are input to the model, but also when they are returned as its output. Contrary to the existing methods, our approach is not constrained by the assumptions enforced on the objective functions of flow-based architectures, such as the tractability of Jacobian determinants. Finally, our HyperCloud method offers a general framework that allows adapting any PointNet model to build a continuous representation of the output. In this work, we focused specifically on mesh representations of 3D objects, showing that our approach gives empirically better results on the task of realistic mesh generation. Nevertheless, thanks to the generality of the proposed architecture, which encompasses many existing ones, it can be used in a multitude of real-life applications and can open new areas of research related to generative models.

## 6. Acknowledgements

The work of P. Spurek was supported by the National Centre of Science (Poland) Grant No. 2018/31/B/ST6/00993. The work of S. Winczowski was supported by the National Centre of Science (Poland) Grant No. 2019/33/B/ST6/00894. The work of J. Tabor was supported by the National Centre of Science (Poland) Grant No. 2017/25/B/ST6/01271. The work of T. Trzciński was supported by the National Centre of Science (Poland) Grant No. 2016/21/D/ST6/01946 as well as the Foundation for Polish Science Grant No. POIR.04.04.00-00-14DE/18-00, co-financed by the European Union under the European Regional Development Fund.

## References

Achlioptas, P., Diamanti, O., Mitliagkas, I., and Guibas, L. Learning representations and generative models for 3D point clouds. arXiv preprint arXiv:1707.02392, 2017.

Berg, R. v. d., Hasenclever, L., Tomczak, J. M., and Welling, M. Sylvester normalizing flows for variational inference. arXiv preprint arXiv:1803.05649, 2018.

Chen, T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pp. 6571-6583, 2018.

Gadelha, M., Wang, R., and Maji, S. Multiresolution tree networks for 3D point cloud processing. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 103-118, 2018.

Grathwohl, W., Chen, R. T., Bettencourt, J., Sutskever, I., and Duvenaud, D. FFJORD: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367, 2018.

Ha, D., Dai, A., and Le, Q. V. HyperNetworks. arXiv preprint arXiv:1609.09106, 2016.

Ji, S., Xu, W., Yang, M., and Yu, K. 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):221-231, 2012.

Kehoe, B., Patil, S., Abbeel, P., and Goldberg, K. A survey of research on cloud robotics and automation. IEEE Transactions on Automation Science and Engineering, 12(2):398-409, 2015.

Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

Klocek, S., Maziarka, Ł., Wołczyk, M., Tabor, J., Nowak, J., and Śmieja, M. Hypernetwork functional image representation. In International Conference on Artificial Neural Networks, pp. 496-510. Springer, 2019.
Kullback, S. and Leibler, R. A. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79-86, 1951.

Li, C.-L., Zaheer, M., Zhang, Y., Poczos, B., and Salakhutdinov, R. Point cloud GAN. arXiv preprint arXiv:1810.05795, 2018.

Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.

Maturana, D. and Scherer, S. VoxNet: A 3D convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 922-928. IEEE, 2015.

Qi, C. R., Su, H., Mo, K., and Guibas, L. J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652-660, 2017a.

Qi, C. R., Yi, L., Su, H., and Guibas, L. J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pp. 5099-5108, 2017b.

Rubner, Y., Tomasi, C., and Guibas, L. J. The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99-121, 2000.

Sheikh, A.-S., Rasul, K., Merentitis, A., and Bergmann, U. Stochastic maximum likelihood optimization via hypernetworks. arXiv preprint arXiv:1712.01141, 2017.

Shoef, M., Fogel, S., and Cohen-Or, D. PointWise: An unsupervised point-wise feature learning network. arXiv preprint arXiv:1901.04544, 2019.

Stypułkowski, M., Zamorski, M., Zięba, M., and Chorowski, J. Conditional invertible flow for point cloud generation. arXiv preprint arXiv:1910.07344, 2019.

Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, pp. 945-953, 2015.

Sun, Y., Wang, Y., Liu, Z., Siegel, J. E., and Sarma, S. E. PointGrow: Autoregressively learned point cloud generation with self-attention. arXiv preprint arXiv:1810.05591, 2018.

Tabor, J., Knop, S., Spurek, P., Podolak, I., Mazur, M., and Jastrzębski, S. Cramer-Wold autoencoder. arXiv preprint arXiv:1805.09235, 2018.

Tolstikhin, I., Bousquet, O., Gelly, S., and Schoelkopf, B. Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558, 2017.

Tomczak, J. M. and Welling, M. VAE with a VampPrior. arXiv preprint arXiv:1705.07120, 2017.

Tran, M.-P. 3D contour closing: A local operator based on chamfer distance transformation. 2013.

Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., Graves, A., et al. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems, pp. 4790-4798, 2016.

Wu, J., Zhang, C., Xue, T., Freeman, B., and Tenenbaum, J. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in Neural Information Processing Systems, pp. 82-90, 2016.

Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1912-1920, 2015.

Yang, B., Luo, W., and Urtasun, R. PIXOR: Real-time 3D object detection from point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7652-7660, 2018a.

Yang, G., Huang, X., Hao, Z., Liu, M.-Y., Belongie, S., and Hariharan, B.
PointFlow: 3D point cloud generation with continuous normalizing flows. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4541-4550, 2019.

Yang, Y., Feng, C., Shen, Y., and Tian, D. FoldingNet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 206-215, 2018b.

Yifan, W., Wu, S., Huang, H., Cohen-Or, D., and Sorkine-Hornung, O. Patch-based progressive 3D point set upsampling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5958-5967, 2019.

Yu, L., Li, X., Fu, C.-W., Cohen-Or, D., and Heng, P.-A. PU-Net: Point cloud upsampling network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2790-2799, 2018.

Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R. R., and Smola, A. J. Deep sets. In Advances in Neural Information Processing Systems, pp. 3391-3401, 2017.

Zamorski, M., Zięba, M., Klukowski, P., Nowak, R., Kurach, K., Stokowiec, W., and Trzciński, T. Adversarial autoencoders for compact representations of 3D point clouds. arXiv preprint arXiv:1811.07605, 2018.