# Hypernetwork Approach to Generating Point Clouds

Przemysław Spurek¹, Sebastian Winczowski¹, Jacek Tabor¹, Maciej Zamorski²,³, Maciej Zięba²,³, Tomasz Trzciński⁴,³

¹Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland. ²Wrocław University of Science and Technology, Wrocław, Poland. ³Tooploox, Wrocław, Poland. ⁴Warsaw University of Technology, Warsaw, Poland. Correspondence to: Przemysław Spurek, Tomasz Trzciński.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

In this work, we propose a novel method for generating 3D point clouds that leverages the properties of hypernetworks. Contrary to the existing methods that learn only a representation of a 3D object, our approach simultaneously finds a representation of the object and of its 3D surface. The main idea of our HyperCloud method is to build a hypernetwork that returns the weights of a particular neural network (the target network), trained to map points from a uniform unit-ball distribution onto a 3D shape. As a consequence, a particular 3D shape can be generated by point-by-point sampling from the assumed prior distribution and transforming the sampled points with the target network. Since the hypernetwork is based on an auto-encoder architecture trained to reconstruct realistic 3D shapes, the target network weights can be considered a parametrization of the surface of a 3D shape, rather than the standard point cloud representation returned by competing approaches. The proposed architecture produces mesh-based representations of 3D objects in a generative manner, while providing point clouds on par in quality with the state-of-the-art methods.

Figure 1. Mesh representations generated by our HyperCloud method. Contrary to the existing methods that return point cloud representations sparsely distributed in 3D space, our approach creates a continuous 3D object representation in the form of high-quality meshes.

## 1. Introduction

Today many registration devices, such as LIDARs and depth cameras, capture not only RGB channels but also depth estimates. As a result, 3D objects registered by those devices, together with the geometric data structures representing them, called point clouds, become increasingly important in contemporary computer vision applications, including autonomous driving (Yang et al., 2018a) and robotic manipulation (Kehoe et al., 2015). To enable processing of point clouds, researchers typically transform them into regular 3D voxel grids or collections of images (Su et al., 2015; Wu et al., 2015). This, however, increases the memory footprint of object representations and leads to significant information loss. On the other hand, representing 3D objects with the parameters of their surfaces is not trivial due to the complexity of mesh representations and their combinatorial irregularities. Last but not least, point clouds can contain a variable number of points corresponding to one object, registered at various angles, which requires the methods that process them to be permutation and rotation invariant. One way of addressing the above challenges is to subsample the point clouds and enforce permutation invariance within the model architecture, as was done in Deep Sets (Zaheer et al., 2017) or PointNet (Qi et al., 2017a;b).
Although this works well when point clouds are given as the input of the model, it is not obvious how to apply the approach to variable-size outputs. A recently introduced family of methods solves this problem by relying on generative models that return a probability distribution of points on the object surface, instead of an exact set of points (Yang et al., 2019; Stypułkowski et al., 2019). The most successful methods that follow this path, such as PointFlow (Yang et al., 2019) and Conditional Invertible Flow (Stypułkowski et al., 2019), are based on flow architectures that allow obtaining a representation of 3D object surfaces.

The main limitation of flow-based models is their cumbersome training process. Since flow architectures require the determinant of the Jacobian to be tractable for a given transformation, their optimization needs to be overly constrained. Moreover, flow-based methods cannot be trained on probability distributions with compact support. For instance, it is not possible to train a flow-based model on the uniform distribution on a 3D ball, since computing the log-likelihood cost function returns infinite values and can therefore lead to numerical instability of the entire training procedure. Flow-based models also require the dimensionality of the input and output data to be identical. Last but not least, these architectures need a significant amount of parameter and structure fine-tuning to work.

In this paper, we address the above shortcomings of flow models by introducing a novel architecture that builds on the approach of (Zamorski et al., 2018) and extends it with a hypernetwork (Ha et al., 2016; Klocek et al., 2019) that outputs the weights of a generative model, the so-called target network. The target network can then be used to create an arbitrary number of points (its architecture and weights are returned by the hypernetwork), instead of fixed-size sets. Fig. 2 shows an overview of our method in comparison to the baseline approach. Contrary to flow-based models, our method, dubbed HyperCloud¹, is conceptually much simpler and more general, as it can be used to adapt any PointNet model to generate a continuous output representation. Furthermore, it is much easier to train than the competing algorithms, as it requires fewer hyperparameters and does not put any constraints on the input probability distribution or its Jacobian. Finally, as presented in Fig. 3, our method returns a continuous mesh representation of 3D objects at virtually no cost in the quality of reconstructions. To the best of our knowledge, this is the first time a hypernetwork is used in the context of 3D point cloud generation, and we believe it opens a new research path into understanding and processing this type of data.

¹We make our implementation available at https://github.com/gmum/3d-point-clouds-HyperCloud

Figure 2. Top: the baseline approach for generating 3D point clouds returns a fixed number of points (Zamorski et al., 2018). Bottom: our HyperCloud method leverages a hypernetwork architecture that takes a 3D point cloud as an input and returns the parameters of the target network. Since the parameters of the target network are generated by the hypernetwork, the output set can be variable in size. As a result, we obtain a continuous parametrization of the object's surface and a more powerful representation of its mesh.
Figure 3. Scheme of producing mesh representations with HyperCloud. When using a 3D-ball prior, our method generates 3D point clouds that fill the object with data points; when given a 3D-sphere prior, it transforms samples from the sphere onto the surfaces of 3D objects, a feature highly desirable in the context of 3D mesh rendering.

The contributions of this work can be summarized as follows. Firstly, we introduce a novel yet general method that builds varied-size representations of point clouds that can be output by any model. Secondly, we achieve this by mapping probability distributions to 3D models with generative target networks trained by the hypernetwork introduced in this work. Lastly, our approach offers a continuous mesh representation of 3D objects that can be used to render their surfaces directly, as shown in Fig. 1.

The remainder of this paper is structured as follows. Sec. 2 discusses related work. In Sec. 3 we introduce our HyperCloud approach and describe it in detail. Sec. 4 presents the results of our evaluation, and we conclude this work in Sec. 5.

## 2. Related Work

Introducing deep learning in the context of 3D point cloud representations has improved performance in various discriminative tasks, including classification (Qi et al., 2017a;b; Yang et al., 2018b; Zaheer et al., 2017) and segmentation (Qi et al., 2017a; Shoef et al., 2019). Despite those successes, generating 3D point clouds with deep learning models remains a challenging task.

Due to the irregular format of point cloud representations, most researchers transform such data into regular 3D voxel grids or collections of images. In (Wu et al., 2015), the authors propose a voxelized representation of an input point cloud. Other approaches use multi-view 2D images (Su et al., 2015) or occupancy grid calculation (Ji et al., 2012; Maturana & Scherer, 2015). Modeling volumetric objects in a generative-adversarial manner is considered in (Wu et al., 2016) for the 3D-GAN model.

Another approach to generative models for point clouds converts a point distribution to an N × 3 matrix by sampling a pre-defined number of N points from the distribution, so that existing generative models become applicable. Such a solution can be applied in the VAE framework (Gadelha et al., 2018) as well as in adversarial auto-encoders (AAEs) (Zamorski et al., 2018). In these methods, auto-encoders and GANs are trained with loss functions that directly optimize the distance between two point sets, e.g. the Chamfer distance (CD) or the earth mover's distance (EMD). In (Sun et al., 2018), the authors apply auto-regressive models (Van den Oord et al., 2016) with a discrete point distribution to generate one point at a time, again using a fixed number of points per shape.

All the above methods learn to produce a fixed number of points for each shape, but they do not parametrize the surface of the shapes. Treating a point cloud as a fixed-dimensional matrix has several drawbacks: the model is restricted to generating a fixed number of points, and obtaining more points for a particular shape requires separate up-sampling models such as (Yifan et al., 2019; Yu et al., 2018).

In (Yang et al., 2019), the authors propose a principled probabilistic framework that generates 3D point clouds by modeling them as a distribution of distributions. PointFlow uses two levels of distributions, where the first level is the distribution of shapes and the second level is the distribution of points given a shape.
PointFlow uses continuous normalizing flows (Chen et al., 2018; Grathwohl et al., 2018) for both of these tasks. Instead of directly parametrizing the distribution of points in a shape, PointFlow models this distribution as an invertible parameterized transformation of 3D points from a prior distribution (e.g., a 3D Gaussian). Intuitively, under this model, generating points for a given shape involves sampling points from a generic Gaussian prior and then moving them, according to the parameterized transformation, to their new location in the target shape. Such a solution has many advantages over classical approaches, which only produce a cloud of points; nevertheless, it is limited in multiple ways. The most important limitation is the use of log-likelihood as a cost function: in consequence, such models cannot be trained on probability distributions with compact support. This significantly reduces the utility of flow-based models, as, for instance, using a uniform 3D-ball distribution as a prior returns infinite values and therefore leads to numerical instability of training. In this work, we show that once this constraint is dropped, thanks to using a fully-connected neural network, we can directly model 3D point cloud surfaces and hence create their continuous mesh representations.

## 3. HyperCloud: Hypernetwork for Generating 3D Point Clouds

In this section, we present our HyperCloud model for generating 3D point clouds. HyperCloud encompasses two previously introduced approaches: the auto-encoder based generative model proposed in (Zamorski et al., 2018) and the hypernetwork proposed in (Ha et al., 2016). Before we present our solution, we briefly describe these two approaches.

**Adversarial auto-encoders for 3D point clouds.** Let us start with the auto-encoder architecture for 3D point clouds. Let $X = \{X_i\}_{i=1,\dots,n} = \{(x_i, y_i, z_i)\}_{i=1,\dots,n}$ be a given dataset containing point clouds. The basic aim of an auto-encoder is to transport the data through a typically, but not necessarily, lower-dimensional latent space $\mathcal{Z} \subseteq \mathbb{R}^D$ while minimizing the reconstruction error. Thus, we search for an encoder $\mathcal{E}: X \to \mathcal{Z}$ and a decoder $\mathcal{D}: \mathcal{Z} \to X$ that minimize the reconstruction error between $X_i$ and its reconstruction $\mathcal{D}(\mathcal{E}X_i)$.

For point cloud representations, the crucial step is to define a proper reconstruction loss for the auto-encoding framework. In the literature, two distance measures are commonly applied for reconstruction purposes: the Earth Mover's (Wasserstein) Distance (Rubner et al., 2000) and the Chamfer pseudo-distance (Tran, 2013).

Earth Mover's Distance (EMD) is a metric between two distributions based on the minimal cost that must be paid to transform one distribution into the other. For two equally sized subsets $X_1 \subset \mathbb{R}^3$ and $X_2 \subset \mathbb{R}^3$, their EMD is defined as

$$\mathrm{EMD}(X_1, X_2) = \min_{\phi: X_1 \to X_2} \sum_{x \in X_1} c(x, \phi(x)),$$

where $\phi$ is a bijection and $c(x, \phi(x))$ is a cost function, which can be defined as $c(x, \phi(x)) = \tfrac{1}{2}\|x - \phi(x)\|_2^2$.

Chamfer pseudo-distance (CD) measures the squared distance between each point in one set and its nearest neighbor in the other set:

$$\mathrm{CD}(X_1, X_2) = \sum_{x \in X_1} \min_{y \in X_2} \|x - y\|_2^2 + \sum_{x \in X_2} \min_{y \in X_1} \|x - y\|_2^2.$$
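For reference, the snippet below is a minimal PyTorch sketch of the two losses as defined above; it is our own illustration, not the authors' implementation. The exact EMD is solved here with SciPy's Hungarian algorithm, which scales as O(n³); practical training code typically substitutes a faster approximation.

```python
import torch
from scipy.optimize import linear_sum_assignment

def chamfer_distance(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """CD between point clouds x1 of shape (n, 3) and x2 of shape (m, 3)."""
    d = torch.cdist(x1, x2) ** 2                  # pairwise squared distances (n, m)
    # Sum of squared nearest-neighbor distances in both directions.
    return d.min(dim=1).values.sum() + d.min(dim=0).values.sum()

def earth_mover_distance(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Exact EMD for two equally sized clouds via an optimal bipartite matching."""
    cost = 0.5 * torch.cdist(x1, x2) ** 2         # c(x, phi(x)) = 1/2 ||x - phi(x)||^2
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    return cost[torch.as_tensor(rows), torch.as_tensor(cols)].sum()
```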
An auto-encoder based generative model is a classical auto-encoder with a modified cost function that forces the model to be generative, i.e., ensures that the data transported to the latent space comes from a prior distribution, typically a Gaussian (Kingma & Welling, 2013; Tolstikhin et al., 2017; Tabor et al., 2018). Thus, to construct a generative auto-encoder model, we add to its cost function a measure of the distance of a given sample from the prior distribution.

Variational Auto-encoders (VAEs) are generative models capable of learning an approximate data distribution by applying variational inference (Kingma & Welling, 2013). To ensure that the data transported to the latent space $\mathcal{Z}$ is distributed according to the standard normal density, we add the distance from the standard multivariate normal density:

$$\mathrm{cost}(X; \mathcal{E}, \mathcal{D}) = \mathrm{Err}(X; \mathcal{D}(\mathcal{E}X)) + \lambda\, D_{\mathrm{KL}}(\mathcal{E}X, N(0, I)),$$

where $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence (Kullback & Leibler, 1951). The main limitation of VAE models is that the regularization term requires a particular prior distribution to keep the KL divergence tractable. To deal with this limitation, the authors of (Makhzani et al., 2015) introduce the Adversarial Auto-encoder (AAE), which uses adversarial training to force a particular distribution on the latent space $\mathcal{Z}$. The model assumes an additional neural network, a discriminator, responsible for distinguishing between fake and true samples, where the true samples are drawn from the assumed prior distribution and the fake samples are generated by the encoding network.

In (Zamorski et al., 2018), the authors propose an Adversarial Auto-encoder dedicated to 3D point clouds. Because the input of the model is a set of points, they use the PointNet model (Qi et al., 2017a) as the encoder $\mathcal{E}$, which is invariant to permutations: we receive the same distribution for all possible orderings of the points in $X$. Since the discriminator is not a permutation-invariant mapping (it is a simple MLP), the authors utilize an additional function that provides a one-to-one mapping for the points stored in $X$.

The probability distribution assumed on the latent space can be more complex than $N(0, I)$ and need not be given in an explicit form. Some auto-encoders learn more sophisticated distributions directly from data; such solutions may utilize techniques like the VampPrior (Tomczak & Welling, 2017) or incorporate continuous (Yang et al., 2019) or discrete (Berg et al., 2018) normalizing flows. Given this variety of techniques for enforcing a probability distribution on the latent space, the cost function of the model can be formulated in the more general form

$$\mathrm{cost}(X; \mathcal{E}, \mathcal{D}) = \mathrm{Err}(X; \mathcal{D}(\mathcal{E}X)) + \mathrm{Reg}(\mathcal{E}X, P), \qquad (1)$$

where Err is the Earth Mover's (Wasserstein) Distance or the Chamfer pseudo-distance, and Reg is a function that forces the latent codes to follow some known or trainable distribution $P$. For known distributions such as the Gaussian, Kullback-Leibler divergence or adversarial training can be used for regularization.

In our work, we propose to enrich the presented regularized auto-encoder by replacing the decoder with a hypernetwork. The goal of the hypernetwork is to transform the latent representation of the point cloud into the weights of the so-called target network. The goal of the target network is to transform samples from the assumed prior into points that represent a 3D shape, without fixing an arbitrary number of points. Roughly speaking, in our case the hypernetwork produces a parametrization of the respective generative model.

**Hypernetworks.** Hypernetworks, introduced in (Ha et al., 2016), are defined as neural models that generate the weights of a separate target network solving a specific task. The authors aim to reduce the number of trainable parameters by designing a hypernetwork with fewer parameters than the target network it generates.
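The sketch below illustrates this weight-generating mechanism in PyTorch. It is a minimal illustration under our own assumptions: `TARGET_SIZES` is a placeholder architecture, not the one used in the paper, and the hypernetwork here plays the role of the decoder $\mathcal{D}$, mapping a latent code to one flat weight vector that is then applied functionally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative layer sizes of a target network T: R^3 -> R^3 (an assumption,
# not the architecture from the paper).
TARGET_SIZES = [(3, 64), (64, 64), (64, 3)]
N_TARGET_PARAMS = sum(i * o + o for i, o in TARGET_SIZES)  # weights + biases

class HyperNetwork(nn.Module):
    """Maps a latent code z to a flat vector of target-network weights."""
    def __init__(self, latent_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, N_TARGET_PARAMS),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.mlp(z)

def target_forward(points: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """Apply the target network with generated weights theta to (n, 3) points."""
    h, offset = points, 0
    for k, (i, o) in enumerate(TARGET_SIZES):
        w = theta[offset:offset + i * o].view(i, o); offset += i * o
        b = theta[offset:offset + o]; offset += o
        h = h @ w + b
        if k < len(TARGET_SIZES) - 1:             # no activation on the last layer
            h = F.relu(h)
    return h
```

Note that the target network owns no trainable parameters of its own: gradients of any loss computed on its output flow through `theta` back into the hypernetwork.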
Making an analogy between hypernetworks and generative models, the authors of (Sheikh et al., 2017) use this mechanism to generate a diverse set of target networks approximating the same function. Hypernetworks can also be used for functional representations of images (Klocek et al., 2019). In this concept, by a functional (or deep) representation of an image the authors understand a function (neural network) $I: \mathbb{R}^2 \to \mathbb{R}^3$ which, given a point $(x, y)$ with arbitrary coordinates in the plane, returns a point in $[0, 1]^3$ representing the RGB values of the image color at $(x, y)$.

Figure 4. Interpolations between two 3D point clouds and their mesh representations.

Figure 5. 3D point clouds and their mesh representations produced by HyperCloud.

**HyperCloud.** Inspired by the above methods, we propose our HyperCloud model, which uses a hypernetwork to output the weights of a generative network that creates 3D point clouds, instead of generating them directly with the decoder as done in (Zamorski et al., 2018). More specifically, we parametrize the surface of a 3D object as a function $S: \mathbb{R}^3 \to \mathbb{R}^3$ which, given a point $(x, y, z)$ from the prior distribution, returns a point on the surface of the object. Roughly speaking, instead of producing a 3D point cloud directly, we would like to produce many neural networks (a different one for each object) that model the surfaces of the objects. In practice, we have one neural network architecture that uses different weights for each 3D object. More precisely, we model a function $T_\theta: \mathbb{R}^3 \to \mathbb{R}^3$ (a neural network with weights $\theta$) which takes an element from the prior distribution $P$ and maps it to an element on the surface of the object.

In our work, we use the transformation between the uniform distribution on the 3D ball and the object. This choice of prior allows us to create a continuous mesh representation. The key property is that the distribution has compact support: roughly speaking, unlike the Gaussian, it has a sharp border. As a consequence, we can produce as many points as we need, since we can sample an arbitrary number of points from the uniform distribution on the unit ball and transform them with the target network. Thanks to the target network, we can also train our model on point clouds containing different numbers of points.

Furthermore, we can produce a continuous mesh representation of the object. All elements of the ball are transformed into the 3D object; in consequence, the unit sphere (the boundary of the ball) is transformed into the surface of the object. We can therefore produce meshes without a secondary mesh-rendering procedure, simply by feeding our neural network the vertices of a sphere mesh, see Fig. 3. As a result, we obtain high-quality meshes of 3D objects. The sharpness of the borders is a direct consequence of the compact support of the input prior. Since flow-based models cannot handle this family of priors and require distributions with infinite support, the representations generated with those models are of lower quality.
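The two priors used above can be sampled with a standard textbook construction, sketched below; this is our own illustration and not necessarily the sampler used in the paper's code. The cube-root transform of the radius makes the density inside the ball uniform.

```python
import torch

def sample_unit_ball(n: int) -> torch.Tensor:
    """Draw n points uniformly from the unit ball in R^3."""
    d = torch.randn(n, 3)
    d = d / d.norm(dim=1, keepdim=True)      # uniform direction on the sphere
    r = torch.rand(n, 1) ** (1.0 / 3.0)      # P(R <= r) = r^3, i.e. density ~ r^2
    return d * r

def sample_unit_sphere(n: int) -> torch.Tensor:
    """Draw n points uniformly from the unit sphere, the ball's boundary."""
    d = torch.randn(n, 3)
    return d / d.norm(dim=1, keepdim=True)
```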
The target network is not trained directly. Instead, we use a hypernetwork $H_\phi$ which, for a point cloud $X \subset \mathbb{R}^3$, returns the weights $\theta$ of the corresponding target network $T_\theta$. Thus, a point cloud $X$ is represented by the function $T((x, y, z); \theta) = T((x, y, z); H_\phi(X))$. To use the above model, we need to train the weights $\phi$ of the hypernetwork. For this purpose, we minimize a distance between point clouds, such as the Chamfer distance (CD) or the earth mover's distance (EMD), over the training set of point clouds. More precisely, we take an input point cloud $X \subset \mathbb{R}^3$ and pass it to $H_\phi$. The hypernetwork returns the weights $\theta$ of the target network $T_\theta$. Next, the input point cloud $X$ is compared with the output of the target network $T_\theta$: we sample the appropriate number of points from the prior distribution and transform them with the target network.

As the hypernetwork we use a permutation-invariant encoder based on the PointNet architecture (Qi et al., 2017a) and a modified decoder that produces weights instead of raw points. The architecture of $H_\phi$ consists of an encoder $\mathcal{E}$, a PointNet-like network that transports the data to a lower-dimensional latent space $\mathcal{Z} \subseteq \mathbb{R}^D$, and a decoder $\mathcal{D}$, a fully-connected network that transfers the latent representation to the vector of weights of the target network. In our framework, the hypernetwork $H_\phi(X)$ realizes the auto-encoder structure $\mathcal{D}(\mathcal{E}X)$. Assuming $H_\phi(X) = \mathcal{D}(\mathcal{E}X)$, we train our model by minimizing the cost function given by equation (1).

Observe that we train only a single neural model, the hypernetwork, which allows us to produce a great variety of functions at test time. In consequence, we may expect that the target networks for similar point clouds will be similar (see Sec. 4 for details), and we are able to produce smooth interpolations using the hypernetwork.

Figure 6. Thanks to the hypernetwork architecture, we can work with a single object (the distribution of points on a single 3D point cloud). One possible application is interpolation in the target network: by taking two samples from the uniform ball and interpolating between them, we can construct an interpolation between points on the surface of the object.
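Putting the previous sketches together, one training step could look roughly as follows. This is a simplified illustration: `encoder` is a hypothetical PointNet-style module, batching is omitted, and only the Err term of equation (1) is shown (a Reg term on the latent codes would be added to the loss).

```python
import torch

def training_step(encoder, hyper, clouds, optimizer):
    """One optimization step over a list of point clouds, each of shape (n_i, 3).

    encoder: hypothetical PointNet-like module mapping a cloud to a latent code;
    hyper:   the HyperNetwork from the earlier sketch. Only these two modules
    hold trainable parameters; the target network has none of its own."""
    optimizer.zero_grad()
    loss = 0.0
    for x in clouds:
        z = encoder(x)                            # latent code of the cloud
        theta = hyper(z)                          # weights of the target network
        prior = sample_unit_ball(x.shape[0])      # as many samples as input points
        recon = target_forward(prior, theta)      # reconstruction candidate
        loss = loss + chamfer_distance(recon, x)  # Err term of equation (1)
    loss = loss / len(clouds)
    loss.backward()                               # gradients reach encoder + hyper
    optimizer.step()
    return loss.item()
```

The optimizer would be built over the encoder and hypernetwork parameters only, e.g. `torch.optim.Adam(list(encoder.parameters()) + list(hyper.parameters()))`.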
## 4. Experiments

In this section, we describe the experimental results of the proposed generative model in various tasks, including 3D point cloud and mesh generation and interpolation. In the first subsection, we show that our model inherits the reconstruction and generative capabilities of models based on generating a fixed number of points. Then we show that we are able to produce a continuous mesh representation.

**Metrics.** Following the methodology for evaluating generative fidelity and sample diversity provided in (Achlioptas et al., 2017) and (Yang et al., 2019), we use the following evaluation criteria: Jensen-Shannon Divergence, Coverage, Minimum Matching Distance and 1-Nearest Neighbor Accuracy.

Jensen-Shannon Divergence (JSD): a measure of the distance between two empirical distributions $P$ and $Q$, defined as

$$\mathrm{JSD}(P \| Q) = \frac{\mathrm{KL}(P \| M) + \mathrm{KL}(Q \| M)}{2}, \qquad M = \frac{P + Q}{2}.$$

Coverage (COV): a measure of the generative capabilities of the model in terms of the richness of generated samples. For two point cloud sets $X_1, X_2$, coverage is defined as the fraction of point clouds in $X_2$ that are, in the given metric, the nearest neighbor of some point cloud in $X_1$.

Minimum Matching Distance (MMD): since COV only takes the closest point clouds into account and does not depend on the distance between the matchings, an additional metric was introduced. For point cloud sets $X_1, X_2$, MMD measures the similarity between the point clouds in $X_1$ and those in $X_2$ by matching every cloud in $X_2$ to its closest neighbor in $X_1$ and averaging the distances of these matchings.

1-Nearest Neighbor Accuracy (1-NNA): a testing procedure characteristic of evaluating GANs. We consider two sets: a set $S_g$ of generated point clouds and a set $S_r$ of test (reference) point clouds. For a point cloud $X$, let $N_X$ denote its nearest neighbor in $S_{-X} = S_r \cup S_g \setminus \{X\}$, the set that aggregates both reference and sampled shapes excluding the considered point cloud $X$. The 1-NNA is the leave-one-out accuracy of the 1-NN classifier:

$$\text{1-NNA}(S_g, S_r) = \frac{\sum_{X \in S_g} \mathbb{1}[N_X \in S_g] + \sum_{Y \in S_r} \mathbb{1}[N_Y \in S_r]}{|S_g| + |S_r|}.$$

For each sample, the 1-NN classifier classifies it as coming from $S_r$ or $S_g$ according to the label of its nearest neighbor. The ideal situation occurs when the classifier is unable to distinguish between real and generated point clouds, i.e., when the value of the criterion is close to 50%.
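A direct, unoptimized sketch of COV and MMD under the definitions above (reusing `chamfer_distance` from the earlier snippet; the EMD variants are analogous, and we assume `generated` plays the role of $X_1$ and `reference` of $X_2$):

```python
import torch

def coverage_and_mmd(generated, reference):
    """COV and MMD under the Chamfer distance for two lists of point clouds."""
    d = torch.tensor([[float(chamfer_distance(g, r)) for r in reference]
                      for g in generated])        # distance matrix (|gen|, |ref|)
    # COV: fraction of reference clouds that are the nearest neighbor of
    # at least one generated cloud.
    cov = d.argmin(dim=1).unique().numel() / len(reference)
    # MMD: average distance from each reference cloud to its closest
    # generated cloud.
    mmd = d.min(dim=0).values.mean().item()
    return cov, mmd
```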
We examine the generative capabilities of the proposed HyperCloud model in comparison to the existing reference approaches, following the methodology provided in (Yang et al., 2019). For this particular experiment, we use the hypernetwork architecture trained with the EMD reconstruction loss, together with a continuous flow on the latent representation instead of the simple KLD regularization. We compare the results with the existing solutions: r-GAN (Achlioptas et al., 2017), l-GAN (Achlioptas et al., 2017), PC-GAN (Li et al., 2018) and PointFlow (Yang et al., 2019). We train each model using point clouds from one of three categories of the ShapeNet dataset: airplane, chair, and car, and follow the exact evaluation pipeline provided in (Yang et al., 2019).

The results are presented in Table 1. HyperCloud obtains results comparable to the other models that utilize the EMD reconstruction loss, with the added advantage of sampling an arbitrary number of points. The model is outperformed by PointFlow, which does not utilize EMD as a reconstruction loss and is not directly capable of generating 3D meshes. Moreover, the target network in HyperCloud is implemented as a simple multilayer perceptron (MLP), contrary to PointFlow, which uses a more complex continuous flow; we expect that substituting the MLP with a continuous flow model could yield results comparable to PointFlow. Finally, the inference time of our approach is reduced from 0.27s per ShapeNet sample for PointFlow to 0.08s for HyperCloud (a 3.4x improvement).

Table 1. Generation results. MMD-CD scores are multiplied by 10³; MMD-EMD scores and JSDs are multiplied by 10².

| Category | Method | JSD | MMD (CD) | MMD (EMD) | COV (CD) | COV (EMD) | 1-NNA (CD) | 1-NNA (EMD) |
|----------|--------|-----|----------|-----------|----------|-----------|------------|-------------|
| Airplane | r-GAN | 7.44 | 0.261 | 5.47 | 42.72 | 18.02 | 93.58 | 99.51 |
| Airplane | l-GAN (CD) | 4.62 | 0.239 | 4.27 | 43.21 | 21.23 | 86.30 | 97.28 |
| Airplane | l-GAN (EMD) | 3.61 | 0.269 | 3.29 | 47.90 | 50.62 | 87.65 | 85.68 |
| Airplane | PC-GAN | 4.63 | 0.287 | 3.57 | 36.46 | 40.94 | 94.35 | 92.32 |
| Airplane | PointFlow | 4.92 | 0.217 | 3.24 | 46.91 | 48.40 | 75.68 | 75.06 |
| Airplane | HyperCloud (ours) | 4.84 | 0.266 | 3.28 | 39.75 | 43.70 | 93.80 | 88.95 |
| Airplane | Training set | 6.61 | 0.226 | 3.08 | 42.72 | 49.14 | 70.62 | 67.53 |
| Chair | r-GAN | 11.5 | 2.57 | 12.8 | 33.99 | 9.97 | 71.75 | 99.47 |
| Chair | l-GAN (CD) | 4.59 | 2.46 | 8.91 | 41.39 | 25.68 | 64.43 | 85.27 |
| Chair | l-GAN (EMD) | 2.27 | 2.61 | 7.85 | 40.79 | 41.69 | 64.73 | 65.56 |
| Chair | PC-GAN | 3.90 | 2.75 | 8.20 | 36.50 | 38.98 | 76.03 | 78.37 |
| Chair | PointFlow | 1.74 | 2.42 | 7.87 | 46.83 | 46.98 | 60.88 | 59.89 |
| Chair | HyperCloud (ours) | 2.73 | 2.56 | 7.84 | 41.54 | 46.67 | 68.20 | 68.80 |
| Chair | Training set | 1.50 | 1.92 | 7.38 | 57.25 | 55.44 | 59.67 | 58.46 |
| Car | r-GAN | 12.8 | 1.27 | 8.74 | 15.06 | 9.38 | 97.87 | 99.86 |
| Car | l-GAN (CD) | 4.43 | 1.55 | 6.25 | 38.64 | 18.47 | 63.07 | 88.07 |
| Car | l-GAN (EMD) | 2.21 | 1.48 | 5.43 | 39.20 | 39.77 | 69.74 | 68.32 |
| Car | PC-GAN | 5.85 | 1.12 | 5.83 | 23.56 | 30.29 | 92.19 | 90.87 |
| Car | PointFlow | 0.87 | 0.91 | 5.22 | 44.03 | 46.59 | 60.65 | 62.36 |
| Car | HyperCloud (ours) | 3.09 | 1.07 | 5.38 | 40.05 | 40.05 | 84.65 | 77.27 |
| Car | Training set | 0.86 | 1.03 | 5.33 | 48.30 | 51.42 | 57.39 | 53.27 |

**Generation of 3D meshes.** The main advantage of our method over the reference solutions is the ability to generate both 3D point clouds and meshes without any post-processing stage. In Fig. 5, we present point clouds as well as mesh representations generated by the same model. Thanks to using a uniform distribution on the 3D ball, we can easily construct a mesh: all elements of the ball are transformed into the 3D object and, in consequence, the unit sphere is transformed into the surface of the object. As mentioned above, we can produce meshes without a secondary meshing procedure, simply by propagating the triangulation of the 3D sphere through the target network, see Fig. 3.
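A sketch of this procedure, under the same assumptions as the earlier snippets; the unit-sphere triangulation (`sphere_vertices`, `sphere_faces`) could come, for instance, from an icosphere subdivision:

```python
import torch

def generate_mesh(encoder, hyper, cloud, sphere_vertices, sphere_faces):
    """Turn a point cloud into a triangle mesh by pushing the vertices of a
    triangulated unit sphere through the object's target network.

    sphere_vertices: (v, 3) vertices of a unit-sphere triangulation;
    sphere_faces:    (f, 3) vertex indices of its triangles. The connectivity
    is reused unchanged; only the vertex positions are deformed."""
    with torch.no_grad():
        theta = hyper(encoder(cloud))                      # weights for this object
        vertices = target_forward(sphere_vertices, theta)
    return vertices, sphere_faces                          # mesh of the surface
```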
In the case of a Gaussian prior, a similar procedure can be used, but it is nontrivial to select the optimal sphere radius for mesh generation (contrary to HyperCloud, in PointFlow there is no default radius R). If the chosen radius is too small, the constructed mesh lies inside the point cloud and, consequently, we lose small outlying elements of the object, e.g., chair legs. On the other hand, if the chosen radius is too large, some small elements of the 3D object are merged, e.g., the four legs of a chair are joined into one.

To evaluate the quality of the mesh representation, we propose the following experiment. Instead of sampling points from the assumed prior distribution, we sample them from a given surface (a sphere of an assumed radius). Next, we calculate the standard quality measures of generated point clouds considered in the previous experiment. Since all models except PointFlow listed in Tab. 1 work only on a fixed number of points, we compare our results only with PointFlow. As mentioned above, the PointFlow model can produce a mesh representation in a similar way, by feeding the triangulation of a sphere through its transformation. In our experiment, consistently with the standards used for hypothesis testing, we use the 95%, 98% and 99% confidence spheres of the 3D Gaussian distribution, see Tab. 3. As we can see, the default Gaussian prior is not suitable for producing a continuous representation of the boundary. Moreover, the seemingly natural exchange (in accordance with our approach) of the normal distribution for the uniform distribution on the ball will not work, since flow methods use log-likelihood as a cost function and consequently cannot use a prior density with compact support.

Table 3. Quality measures of 3D representations obtained by sampling from a sphere of a given radius R for the airplane, chair and car shapes. HyperCloud preserves the good quality of the sampled point clouds, while PointFlow has difficulties obtaining good-quality representations from the sphere.

| Category | Method | Sphere R | JSD | MMD (CD) | MMD (EMD) | COV (CD) | COV (EMD) |
|----------|--------|----------|-----|----------|-----------|----------|-----------|
| Airplane | PointFlow | 2.795 | 22.26 | 0.49 | 6.65 | 44.69 | 20.74 |
| Airplane | PointFlow | 3.136 | 26.46 | 0.60 | 6.89 | 39.50 | 19.01 |
| Airplane | PointFlow | 3.368 | 29.65 | 0.68 | 6.84 | 40.49 | 16.79 |
| Airplane | HyperCloud (ours) | 1 | 9.51 | 0.45 | 5.29 | 30.60 | 28.88 |
| Chair | PointFlow | 2.795 | 19.28 | 4.28 | 13.38 | 36.85 | 20.84 |
| Chair | PointFlow | 3.136 | 22.52 | 4.89 | 14.47 | 32.47 | 17.22 |
| Chair | PointFlow | 3.368 | 24.68 | 5.36 | 14.97 | 31.41 | 17.06 |
| Chair | HyperCloud (ours) | 1 | 4.32 | 2.81 | 9.32 | 40.33 | 40.63 |
| Car | PointFlow | 2.795 | 16.59 | 1.6 | 8.00 | 20.17 | 17.04 |
| Car | PointFlow | 3.136 | 20.21 | 1.75 | 7.80 | 21.59 | 17.32 |
| Car | PointFlow | 3.368 | 24.10 | 1.96 | 8.35 | 18.75 | 17.04 |
| Car | HyperCloud (ours) | 1 | 5.20 | 1.11 | 6.54 | 37.21 | 28.40 |

**Unsupervised representation learning.** In this experiment, we evaluate the quality of the latent representation of our model. We follow the experimental settings from previous works (Achlioptas et al., 2017; Yang et al., 2019) and train our model on the full ShapeNet dataset. Next, we evaluate the quality of the latent representation by training a linear SVM classifier on top of it using the ModelNet40 dataset. The results of the empirical evaluation are provided in Table 2. HyperCloud achieves accuracy comparable to the results of the original version of l-GAN, but worse than PointFlow and l-GAN trained with the new settings. Note, however, that in our experiments we did not use the ModelNet dataset preprocessed with the same pipeline as in PointFlow, but in the way recommended in (Achlioptas et al., 2017).

Table 2. Unsupervised feature learning. Models are first trained on ShapeNet to learn shape representations, which are then evaluated on ModelNet40 (MN40) by comparing the accuracy of off-the-shelf SVMs trained on the learned representations. l-GAN-2 was trained and evaluated using the PointFlow experimental settings.

| Method | 3D-GAN | FoldingNet | l-GAN (EMD) | l-GAN (CD) | l-GAN-2 (EMD) | l-GAN-2 (CD) | PointFlow | HyperCloud |
|--------|--------|------------|-------------|------------|---------------|--------------|-----------|------------|
| MN40 accuracy (%) | 83.3 | 88.4 | 84.0 | 84.5 | 87.0 | 86.7 | 86.8 | 84.7 |

**Interpolation.** Our model admits two types of interpolation, since we have two different prior distributions: a Gaussian in the hypernetwork architecture (the latent space of the auto-encoder) and a uniform distribution on the unit ball in the target network, see Fig. 2. First, we can take two 3D objects and obtain a smooth transition between them, see Fig. 4; since we can generate a mesh representation for each point cloud, we can also produce interpolations between meshes. Second, thanks to the hypernetwork architecture, we can work with a single object (the distribution of points on a single 3D point cloud). One possible application is interpolation in the target network instead of the classical approach in the latent space of the auto-encoder, see Fig. 6: by taking two samples from the uniform ball and interpolating between them, we can construct an interpolation between points on the surface of the object.
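Given the earlier sketches, both interpolation types reduce to a few lines (again an illustration with the hypothetical `encoder` and `hyper` modules):

```python
import torch

def latent_interpolation(encoder, hyper, cloud_a, cloud_b, steps=5, n_points=2048):
    """Shape interpolation: blend latent codes, decode each blend to target
    weights, and sample a cloud from every intermediate shape."""
    za, zb = encoder(cloud_a), encoder(cloud_b)
    clouds = []
    for t in torch.linspace(0.0, 1.0, steps):
        theta = hyper((1 - t) * za + t * zb)
        clouds.append(target_forward(sample_unit_ball(n_points), theta))
    return clouds

def surface_interpolation(theta, p, q, steps=5):
    """On-object interpolation: interpolate two prior samples p, q of shape (3,)
    inside the ball and map the segment through a single target network."""
    ts = torch.linspace(0.0, 1.0, steps).unsqueeze(1)
    segment = (1 - ts) * p + ts * q        # stays inside the ball by convexity
    return target_forward(segment, theta)
```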
## 5. Conclusions

In this work, we presented a novel approach that represents point clouds of 3D objects with the parameters of target networks trained by a hypernetwork acting as a generative model. More specifically, we are able to build variable-size representations of point clouds not only when they are input to the model, but also when they are returned as its output. Contrary to the existing methods, our approach is not constrained by the assumptions enforced on the objective functions of flow-based architectures, such as the tractability of Jacobian determinants. Finally, our HyperCloud method offers a general framework that allows adapting any PointNet model to build a continuous representation of the output. In this work, we focused specifically on mesh representations of 3D objects, showing that our approach gives empirically better results on the task of realistic mesh generation. Nevertheless, thanks to the generality of the proposed architecture, which encompasses many existing ones, it can be used in a multitude of real-life applications and can open new areas of research related to generative models.

## 6. Acknowledgements

The work of P. Spurek was supported by the National Centre of Science (Poland) Grant No. 2018/31/B/ST6/00993. The work of S. Winczowski was supported by the National Centre of Science (Poland) Grant No. 2019/33/B/ST6/00894. The work of J. Tabor was supported by the National Centre of Science (Poland) Grant No. 2017/25/B/ST6/01271. The work of T. Trzciński was supported by the National Centre of Science (Poland) Grant No. 2016/21/D/ST6/01946 as well as the Foundation for Polish Science Grant No. POIR.04.04.00-00-14DE/18-00, co-financed by the European Union under the European Regional Development Fund.

## References

Achlioptas, P., Diamanti, O., Mitliagkas, I., and Guibas, L. Learning representations and generative models for 3D point clouds. arXiv preprint arXiv:1707.02392, 2017.

Berg, R. v. d., Hasenclever, L., Tomczak, J. M., and Welling, M. Sylvester normalizing flows for variational inference. arXiv preprint arXiv:1803.05649, 2018.

Chen, T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pp. 6571-6583, 2018.

Gadelha, M., Wang, R., and Maji, S. Multiresolution tree networks for 3D point cloud processing. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 103-118, 2018.

Grathwohl, W., Chen, R. T., Bettencourt, J., Sutskever, I., and Duvenaud, D. FFJORD: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367, 2018.

Ha, D., Dai, A., and Le, Q. V. HyperNetworks. arXiv preprint arXiv:1609.09106, 2016.

Ji, S., Xu, W., Yang, M., and Yu, K. 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):221-231, 2012.

Kehoe, B., Patil, S., Abbeel, P., and Goldberg, K. A survey of research on cloud robotics and automation. IEEE Transactions on Automation Science and Engineering, 12(2):398-409, 2015.

Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

Klocek, S., Maziarka, Ł., Wołczyk, M., Tabor, J., Nowak, J., and Śmieja, M. Hypernetwork functional image representation. In International Conference on Artificial Neural Networks, pp. 496-510. Springer, 2019.
Kullback, S. and Leibler, R. A. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79-86, 1951.

Li, C.-L., Zaheer, M., Zhang, Y., Poczos, B., and Salakhutdinov, R. Point cloud GAN. arXiv preprint arXiv:1810.05795, 2018.

Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.

Maturana, D. and Scherer, S. VoxNet: A 3D convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 922-928. IEEE, 2015.

Qi, C. R., Su, H., Mo, K., and Guibas, L. J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652-660, 2017a.

Qi, C. R., Yi, L., Su, H., and Guibas, L. J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pp. 5099-5108, 2017b.

Rubner, Y., Tomasi, C., and Guibas, L. J. The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99-121, 2000.

Sheikh, A.-S., Rasul, K., Merentitis, A., and Bergmann, U. Stochastic maximum likelihood optimization via hypernetworks. arXiv preprint arXiv:1712.01141, 2017.

Shoef, M., Fogel, S., and Cohen-Or, D. PointWise: An unsupervised point-wise feature learning network. arXiv preprint arXiv:1901.04544, 2019.

Stypułkowski, M., Zamorski, M., Zięba, M., and Chorowski, J. Conditional invertible flow for point cloud generation. arXiv preprint arXiv:1910.07344, 2019.

Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, pp. 945-953, 2015.

Sun, Y., Wang, Y., Liu, Z., Siegel, J. E., and Sarma, S. E. PointGrow: Autoregressively learned point cloud generation with self-attention. arXiv preprint arXiv:1810.05591, 2018.

Tabor, J., Knop, S., Spurek, P., Podolak, I., Mazur, M., and Jastrzębski, S. Cramer-Wold autoencoder. arXiv preprint arXiv:1805.09235, 2018.

Tolstikhin, I., Bousquet, O., Gelly, S., and Schoelkopf, B. Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558, 2017.

Tomczak, J. M. and Welling, M. VAE with a VampPrior. arXiv preprint arXiv:1705.07120, 2017.

Tran, M.-P. 3D contour closing: A local operator based on chamfer distance transformation. 2013.

Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., Graves, A., et al. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems, pp. 4790-4798, 2016.

Wu, J., Zhang, C., Xue, T., Freeman, B., and Tenenbaum, J. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in Neural Information Processing Systems, pp. 82-90, 2016.

Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1912-1920, 2015.

Yang, B., Luo, W., and Urtasun, R. PIXOR: Real-time 3D object detection from point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7652-7660, 2018a.

Yang, G., Huang, X., Hao, Z., Liu, M.-Y., Belongie, S., and Hariharan, B.
PointFlow: 3D point cloud generation with continuous normalizing flows. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4541-4550, 2019.

Yang, Y., Feng, C., Shen, Y., and Tian, D. FoldingNet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 206-215, 2018b.

Yifan, W., Wu, S., Huang, H., Cohen-Or, D., and Sorkine-Hornung, O. Patch-based progressive 3D point set upsampling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5958-5967, 2019.

Yu, L., Li, X., Fu, C.-W., Cohen-Or, D., and Heng, P.-A. PU-Net: Point cloud upsampling network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2790-2799, 2018.

Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R. R., and Smola, A. J. Deep sets. In Advances in Neural Information Processing Systems, pp. 3391-3401, 2017.

Zamorski, M., Zięba, M., Klukowski, P., Nowak, R., Kurach, K., Stokowiec, W., and Trzciński, T. Adversarial autoencoders for compact representations of 3D point clouds. arXiv preprint arXiv:1811.07605, 2018.