
Point Cloud Dataset Distillation

Deyu Bo 1 Xinchao Wang 1

This study introduces dataset distillation (DD) tailored for 3D data, particularly point clouds. DD aims to substitute large-scale real datasets with a small set of synthetic samples while preserving model performance. Existing methods mainly focus on structured data such as images. However, adapting DD for unstructured point clouds poses challenges due to their diverse orientations and resolutions in 3D space. To address these challenges, we theoretically demonstrate the importance of matching rotation-invariant features between real and synthetic data for 3D distillation. We further propose a plug-and-play point cloud rotator to align the point cloud to a canonical orientation, facilitating the learning of rotation-invariant features by all point cloud models. Furthermore, instead of optimizing fixed-size synthetic data directly, we devise a point-wise generator to produce point clouds at various resolutions based on the sampled noise amount. Compared to conventional DD methods, the proposed approach, termed DD3D, enables efficient training on low-resolution point clouds while generating high-resolution data for evaluation, thereby significantly reducing memory requirements and enhancing model scalability. Extensive experiments validate the effectiveness of DD3D in shape classification and part segmentation tasks across diverse scenarios, such as cross-architecture and cross-resolution settings.

1. Introduction

Dataset distillation (DD) (Wang et al., 2018) aims to distill the knowledge of a large-scale dataset into a few synthetic samples, such that models trained on the real and synthetic data have comparable performance. By doing so, DD significantly reduces the computational cost of training neural networks from scratch. Due to its remarkable efficiency and effectiveness, DD has been used in a variety of domains, such as images (Zhao et al., 2021; Zhao & Bilen, 2023; Cazenavette et al., 2022), videos (Wang et al., 2024), and graphs (Jin et al., 2022; Liu et al., 2024).

¹National University of Singapore. Correspondence to: Xinchao Wang <xinchao@nus.edu.sg>.

Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

Despite great progress, existing DD methods have primarily been applied to structured 1D and 2D data, while the distillation of unstructured 3D data, e.g., point clouds, remains largely unexplored. Point cloud data exists in large quantities in machine vision. For example, MVPNet (Yu et al., 2023) scans more than 87K point clouds from real-world videos, and Objaverse-XL (Deitke et al., 2023) provides more than 10M high-quality 3D assets. Training on such datasets from scratch is time- and resource-intensive, highlighting the need for more efficient alternatives.

However, extending DD to 3D point clouds presents unique challenges. First, point clouds with different orientations represent the same semantic information, e.g., shapes. Existing DD methods do not take this symmetry into account, so they cannot handle randomly rotated data, which results in sub-optimal performance. As shown in Figure 1(a), directly applying DD to point clouds with different orientations cannot obtain meaningful synthetic data. Second, point clouds have flexible resolutions, i.e., numbers of points, depending on specific models and applications. Generally, a larger resolution encodes more fine-grained information but also increases the computational cost (Huang et al., 2024; Qiu et al., 2021). Existing DD methods initialize the synthetic data as a fixed-size tensor, which cannot be customized for different point cloud models. Moreover, the memory budget for fixed-size tensors increases rapidly when dealing with dense-resolution scenarios, e.g., segmentation (Chang et al., 2015).

Having identified the weaknesses of existing methods, it is natural to ask: How can we build a distillation framework that overcomes the orientation and resolution issues of 3D point clouds? To answer this question, we first theoretically prove that random rotations weaken the principal components of real data, thereby degrading the distillation performance. Based on this finding, we propose DD3D, the first DD framework for 3D point clouds, illustrated in Figure 1(b). Specifically, DD3D first uses a rotator to convert the point cloud into a canonical orientation by learning a rotation-equivariant projection matrix to offset random rotation. Then, the knowledge of rotation-invariant data is distilled into a point-wise generator that predicts point coordinates from noise, where the resolution is determined by the number of sampled noise values. Finally, the rotator and generator are jointly optimized by minimizing the gradient differences between the real and synthetic data.

Figure 1: Differences between vanilla DD and DD3D when distilling 3D point clouds. (a) DD for point clouds. (b) DD3D for point clouds: rotate, distill, and generate (via the generator) at 256, 512, or 1024 points.

The contributions are summarized as follows. (1) We propose the first 3D distillation framework, DD3D, which can eliminate the influence of random rotations and synthesize point clouds at arbitrary resolutions. (2) We theoretically prove that matching the rotation-invariant features can preserve the principal components of real data and prevent data degeneration. (3) DD3D can be trained with low-resolution point clouds and generates high-resolution data for evaluation, significantly reducing memory usage and enhancing model scalability. (4) Extensive experiments on shape classification and part segmentation tasks validate the effectiveness of DD3D over baselines.

2. Related Work

Dataset Distillation. Research on DD can be roughly divided into two directions. The first explores advanced matching objectives to improve distillation performance, including performance matching (Wang et al., 2018), gradient matching (GM) (Zhao et al., 2021; Zhao & Bilen, 2021), distribution matching (Zhao & Bilen, 2023; Wang et al., 2022), trajectory matching (Cazenavette et al., 2022; Guo et al., 2024; Du et al., 2023), and feature regression (Zhou et al., 2022; Loo et al., 2022; Nguyen et al., 2021). The second designs efficient data parameterizations to avoid directly optimizing the synthetic data, such as neural networks (Liu et al., 2022), spectral representations (Shin et al., 2023), linear transformations (Deng & Russakovsky, 2022), and up-sampling (Kim et al., 2022). Among them, a special parameterization technique is to distill the knowledge into a generative model (Zhao & Bilen, 2022; Wang et al., 2023; Zhang et al., 2023; Cazenavette et al., 2023; Zhang et al., 2024a), which can generate diverse synthetic data with unlimited samples. Although valid, these methods rely on the prior knowledge of generative models pre-trained on large-scale datasets, which is not feasible for point clouds. A recent work¹ also applies GM to point cloud data. However, none of these methods considers the orientation and resolution issues. For a more detailed introduction to DD, please refer to the recent surveys (Yu et al., 2024b; Lei & Tao, 2024; Geng et al., 2023; Sachdeva & McAuley, 2023).

Point Cloud Analysis. Deep learning on point clouds plays a vital role in 3D data analysis (Guo et al., 2021b). Traditional methods can be classified into three categories: point-based methods, e.g., PointNet (Qi et al., 2017a) and PointNet++ (Qi et al., 2017b); convolution-based methods, e.g., PointCNN (Li et al., 2018) and PointConv (Wu et al., 2019); and relation-based methods, e.g., DGCNN (Wang et al., 2019) and Point Transformer (Guo et al., 2021a). However, these methods are rotation-sensitive and cannot handle point clouds with different orientations. Some advanced methods are designed to learn rotation-equivariant or rotation-invariant features, such as vector neurons (Deng et al., 2021), spherical harmonics (Poulenard et al., 2019), tensor fields (Thomas et al., 2018), and graph features (Kim et al., 2020; Zhao et al., 2019). However, these methods introduce additional operators and cannot be applied to rotation-sensitive models. Another way is to project point clouds into the same orientation. For example, principal component analysis (PCA) leverages the eigenvectors of the covariance matrix to transform point clouds into the direction with maximum variance. However, this approach suffers from the sign-ambiguity issue (Xiao et al., 2020; Yu et al., 2020; Li et al., 2021).

3. Preliminary

Task Formulation. Suppose that $\mathcal{T} = \{(C_i, y_i)\}_{i=1}^{|\mathcal{T}|}$ is a large-scale training dataset, where $C_i$ is a point cloud with label $y_i$ for the shape classification task. Each point cloud has $n$ points, represented as $C = \{P, V\}$, where $P \in \mathbb{R}^{n \times 3}$ represents the 3D coordinates of the points, $V \in \mathbb{R}^{n \times v}$ indicates the part to which each point belongs in the segmentation task, and $v$ is the number of parts. The goal of DD3D is to synthesize a much smaller point cloud dataset $\mathcal{S} = \{(C_j, y_j)\}_{j=1}^{|\mathcal{S}|}$, where $|\mathcal{S}| \ll |\mathcal{T}|$, such that a classification or segmentation model $f_\theta$ trained on $\mathcal{S}$ performs comparably to one trained on $\mathcal{T}$. Other tasks, such as detection, are left for future studies.

¹https://github.com/kghandour/dd3d

Dataset Distillation. In order to effectively optimize the synthetic data, existing DD methods adopt a bi-level optimization paradigm, which can be formulated as:

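$$\min_{\mathcal{S}} \; \mathcal{L}_{DD}\big(f_\theta(\mathcal{S}), f_\theta(\mathcal{T})\big) \tag{1}$$

$$\text{s.t.} \quad \theta = \arg\min_{\theta} \; \mathcal{L}_{cls}\big(f_\theta(\mathcal{S}), Y^{\mathcal{S}}\big), \tag{2}$$

where the inner loop updates the model $f_\theta$ on the synthetic data, and the outer loop optimizes the synthetic data. In particular, $\mathcal{L}_{DD}$ is a metric that measures the distance between real and synthetic data. For example, gradient matching (Zhao et al., 2021) minimizes the gradient differences.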
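As a rough illustration of the gradient matching objective mentioned above, the sketch below compares the gradients that a real batch and a synthetic batch induce on a shared linear classifier. The linear model, the mean-squared-error loss, the column-wise cosine distance, and the names `mse_grad` and `gradient_matching_loss` are all assumptions made for this example; they are not taken from the paper's implementation.

```python
import numpy as np

def mse_grad(X, Y, W):
    """Gradient of the mean squared error ||XW - Y||_F^2 / n with respect to W."""
    return 2.0 * X.T @ (X @ W - Y) / X.shape[0]

def gradient_matching_loss(X_syn, Y_syn, X_real, Y_real, W):
    """Sum over output columns of (1 - cosine similarity) between the two gradients."""
    g_syn = mse_grad(X_syn, Y_syn, W)
    g_real = mse_grad(X_real, Y_real, W)
    dot = np.sum(g_syn * g_real, axis=0)
    norms = np.linalg.norm(g_syn, axis=0) * np.linalg.norm(g_real, axis=0)
    return float(np.sum(1.0 - dot / (norms + 1e-12)))

rng = np.random.default_rng(0)
d, k = 8, 3                                    # feature dimension, number of classes
W = rng.normal(size=(d, k))                    # current classifier weights
X_real = rng.normal(size=(100, d))             # stand-in for real features
Y_real = np.eye(k)[rng.integers(0, k, size=100)]
X_syn = rng.normal(size=(6, d))                # synthetic features (the outer-loop variable)
Y_syn = np.eye(k)[np.arange(6) % k]

loss_random = gradient_matching_loss(X_syn, Y_syn, X_real, Y_real, W)
loss_self = gradient_matching_loss(X_real, Y_real, X_real, Y_real, W)
print(loss_random, loss_self)
```

In a full distillation loop, the outer step would update the synthetic data to reduce this loss, while the inner step of Eq. (2) would update the model on the synthetic data.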

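Dataset Distillation with Rotations. Before detailing the proposed method, we first give a general analysis of how rotations affect the performance of DD. Let $X_{\mathcal{S}} \in \mathbb{R}^{|\mathcal{S}| \times d}$ and $X_{\mathcal{T}} \in \mathbb{R}^{|\mathcal{T}| \times d}$ denote the representations learned by $f_\theta$ on the synthetic data and real training data, respectively, where $d$ is the hidden dimension.

Theorem 3.1. Assume the classifier is a linear layer $W$ and $\mathcal{L}_{cls}$ can be simplified to the mean-squared error $\|XW - Y\|_F^2$. The objective of gradient matching is equal to variance preserving:

$$\min_{\mathcal{S}} \mathcal{L}_{GM} = \min_{\mathcal{S}} D\big(\nabla_W \mathcal{L}^{\mathcal{S}}_{cls}, \nabla_W \mathcal{L}^{\mathcal{T}}_{cls}\big), \tag{3}$$

which amounts to

$$\min_{\mathcal{S}} \big\|X_{\mathcal{S}}^\top X_{\mathcal{S}} - X_{\mathcal{T}}^\top X_{\mathcal{T}}\big\|^2, \tag{4}$$

where $D$ is a distance metric and $\nabla_W$ is the gradient with respect to $W$.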
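For intuition, the gradient of the mean-squared error in Theorem 3.1 can be written out directly. The lines below sketch only this one step under the theorem's assumptions (a linear classifier $W$ shared by both datasets); the label matrices $Y_{\mathcal{S}}$ and $Y_{\mathcal{T}}$ are notation introduced here for illustration, and the full argument is the one given in Appendix A.

```latex
\begin{align}
\nabla_W \|XW - Y\|_F^2 &= 2X^\top (XW - Y) = 2X^\top X W - 2X^\top Y, \\
\nabla_W \mathcal{L}^{\mathcal{S}}_{cls} - \nabla_W \mathcal{L}^{\mathcal{T}}_{cls}
  &= 2\big(X_{\mathcal{S}}^\top X_{\mathcal{S}} - X_{\mathcal{T}}^\top X_{\mathcal{T}}\big) W
   - 2\big(X_{\mathcal{S}}^\top Y_{\mathcal{S}} - X_{\mathcal{T}}^\top Y_{\mathcal{T}}\big).
\end{align}
```

The first term depends on the data only through the Gram matrices $X^\top X$, which is why matching gradients is tied to matching the second-order (variance) statistics of the two representations.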

Theorem 3.1 reveals that synthetic data preserves the variance information of real data. We then analyze how random rotations affect the variance of real data. Without loss of generality, we assume that $f_\theta$ is rotation-equivariant, i.e., $f_\theta(PR) = f_\theta(P)R$, where $R \in SO(d)$ is a random rotation matrix.

Theorem 3.2. Assume $X_{\mathcal{T}}$ follows a $d$-dimensional multivariate Gaussian distribution $\mathcal{N}(\mu, \Sigma)$. Let $\tilde{X}_{\mathcal{T}}$ be the rotated representations of $X_{\mathcal{T}}$. Then:

$$\lambda_{\max}\big(\mathbb{E}[\tilde{X}_{\mathcal{T}}^\top \tilde{X}_{\mathcal{T}}]\big) \le \lambda_{\max}\big(\mathbb{E}[X_{\mathcal{T}}^\top X_{\mathcal{T}}]\big) \tag{5}$$

$$\sigma_{\max}\big(\mathbb{E}[\tilde{X}_{\mathcal{T}}]\big) \le \sigma_{\max}\big(\mathbb{E}[X_{\mathcal{T}}]\big), \tag{6}$$

where $\lambda_{\max}$ and $\sigma_{\max}$ denote the maximum eigenvalue and the maximum singular value, respectively.

Theorem 3.2 states that random rotations reduce the maximum singular value of the data representations, implying that the principal component of $X_{\mathcal{T}}$ is weakened. In this case, the synthetic data cannot effectively capture the distribution of the real data, degrading model performance. All proofs are given in Appendix A.
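The effect described by Theorem 3.2 can be observed numerically. The following is a minimal sketch, assuming each sample is rotated by its own random rotation and using an arbitrary anisotropic Gaussian in three dimensions; the helper `random_rotation` and all constants are illustrative choices, not part of the paper.

```python
import numpy as np

def random_rotation(d, rng):
    """Draw a rotation matrix from SO(d) via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    q = q * np.sign(np.diag(r))       # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:          # force det = +1, i.e. a proper rotation
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
d, n = 3, 5000
mu = np.array([2.0, 0.0, 0.0])
cov = np.diag([4.0, 1.0, 0.25])
X = rng.multivariate_normal(mu, cov, size=n)            # stand-in for X_T

# Rotate every sample with its own random rotation: x_i -> x_i R_i
X_rot = np.stack([x @ random_rotation(d, rng) for x in X])

# Largest eigenvalue of the empirical second moment, before and after rotation
lam_max = np.linalg.eigvalsh(X.T @ X / n)[-1]
lam_max_rot = np.linalg.eigvalsh(X_rot.T @ X_rot / n)[-1]

# Norm of the empirical mean, i.e. the only nonzero singular value of the 1 x d mean
sig_max = np.linalg.norm(X.mean(axis=0))
sig_max_rot = np.linalg.norm(X_rot.mean(axis=0))

print(lam_max, lam_max_rot)
print(sig_max, sig_max_rot)
```

Random rotations spread the energy of the dominant direction over all axes, so both quantities shrink for the rotated data.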

4. The Proposed Method

4.1. Plug-and-Play Point Cloud Rotator

Our analysis highlights the importance of learning rotation-invariant representations for effective point cloud distillation. However, many existing point cloud models lack this capability. To address this limitation, we introduce a plug-and-play point cloud rotator that transforms point clouds into a consistent canonical view. This transformation ensures that all models can learn rotation-invariant representations, enhancing their generalization and performance.

Rotation-equivariance. We leverage the orthogonality of the rotation matrix, i.e., $RR^\top = I$, to eliminate its influence. PCA is a typical method of this kind:

$$(PR - \bar{P}R)^\top (PR - \bar{P}R) = R^\top U \Lambda U^\top R, \tag{7}$$

where $\bar{P}$ is the center of $P$ and $U$ represents the eigenvectors of the covariance matrix. Importantly, the projection $R^\top U$ maintains equivariance with respect to coordinate rotations, ensuring that $(PR)(R^\top U) = PU$ remains invariant. However, eigenvectors suffer from sign ambiguity, implying that both $u_i$ and $-u_i$ are valid solutions. As a result, the canonical view $PU$ is not unique and has 8 ambiguities in 3D space (Xiao et al., 2020; Yu et al., 2020), i.e., $PUQ = P[\pm u_1, \pm u_2, \pm u_3]$, where $Q \in \mathbb{R}^{3 \times 3}$ is a diagonal matrix with $Q_{ii} \in \{1, -1\}$ and $Q_{ij} = 0$ for $i \neq j$.
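The PCA projection and its sign ambiguity can be checked with a few lines of NumPy. This is a small sketch under the assumption that the covariance eigenvalues are distinct; `pca_canonicalize` and the test cloud are made up for illustration.

```python
import numpy as np

def pca_canonicalize(P):
    """Project a point cloud (n x 3) onto the eigenvectors of its covariance matrix."""
    centered = P - P.mean(axis=0, keepdims=True)
    cov = centered.T @ centered
    _, U = np.linalg.eigh(cov)        # columns are eigenvectors, eigenvalues ascending
    return centered @ U

rng = np.random.default_rng(0)
P = rng.normal(size=(256, 3)) * np.array([3.0, 2.0, 1.0])   # anisotropic cloud

# A fixed rotation: about the z-axis, then about the x-axis
a, b = 0.7, -1.1
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
R = Rz @ Rx

canon = pca_canonicalize(P)
canon_rot = pca_canonicalize(P @ R)

# The two canonical views agree only up to a per-axis sign flip (one of 8 reflections)
signs = np.sign(np.sum(canon * canon_rot, axis=0))
print(signs)
print(np.allclose(canon, canon_rot * signs))
```

The rotation is removed, but which of the 8 reflections the eigen-solver returns is not controlled, which is the ambiguity the rotator is designed to resolve.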

Figure 2: DD3D for the part segmentation task. Each noise value is first pre-partitioned into different parts according to its value, e.g., the noise within (0, 0.45) is marked as fuselage. Then the generator maps the noise into different parts for gradient matching.

Sign-invariant. Our proposed rotator $r: \mathbb{R}^{n \times 3} \to \mathbb{R}^{n \times 3}$ is designed to enhance PCA by addressing the sign ambiguity issue. To achieve this, the rotator learns a sign-equivariant reflection matrix $\hat{Q}$ for each point cloud. This ensures that the transformed representation satisfies $PUQ\hat{Q} = PU$, making it sign-invariant and improving the robustness of rotation-invariant learning. Specifically, the rotator first lifts the scalar coordinates to vector representations:

$$H = [\sin(\pm PU), \sin(\pm 2PU), \cdots, \sin(\pm tPU)] = [\sin(PU), \sin(2PU), \cdots, \sin(tPU)]\,Q, \tag{8}$$

where $\sin(\cdot)$ is the sine function and $t$ is the period of Fourier features. Average pooling is then applied to $H$ to learn a representation of the whole point cloud. Finally, a learnable vector $w \in \mathbb{R}^t$ is used to decode the reflection matrix. The overall architecture of the rotator is formulated as follows:

$$r(P) = PUQ\hat{Q} = PUQ\,\mathrm{Sign}\big(w^\top \mathrm{Pool}(HQ)\big), \tag{9}$$

where Sign takes the signs of a matrix. The reflection matrix $\hat{Q}$ has the same signs as $Q$ because the sinusoidal features, pooling function, and linear decoder preserve the sign information of $HQ$, which resolves the sign ambiguity and yields sign-invariant representations.
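The sign-invariance of the rotator follows from the fact that the sine function is odd. The sketch below checks it numerically, assuming integer frequencies 1 to t, mean pooling over points, and a random (untrained) decoder vector `w`; the function names and the stand-in input `Z` for the PCA view are hypothetical.

```python
import numpy as np

def rotator_signs(Z, w):
    """Decode one sign per axis from Fourier features of a PCA view Z (n x 3).

    sin(k * z) is odd in z, so flipping the sign of an axis flips the sign of
    the pooled and decoded feature of that axis.
    """
    freqs = np.arange(1, len(w) + 1).reshape(-1, 1, 1)   # t frequencies
    H = np.sin(freqs * Z[None, :, :])                    # (t, n, 3)
    pooled = H.mean(axis=1)                              # average pooling -> (t, 3)
    return np.sign(w @ pooled)                           # (3,)

def rotate(Z, w):
    """Apply the decoded reflection to obtain a sign-invariant view."""
    return Z * rotator_signs(Z, w)

rng = np.random.default_rng(0)
Z = rng.normal(size=(128, 3)) + np.array([0.3, -0.2, 0.1])   # stand-in for PU
w = rng.normal(size=4)                                        # t = 4

views = []
for sx in (1, -1):
    for sy in (1, -1):
        for sz in (1, -1):
            Q = np.array([sx, sy, sz])       # one of the 8 sign ambiguities
            views.append(rotate(Z * Q, w))

print(all(np.allclose(v, views[0]) for v in views))
```

All 8 ambiguous inputs map to the same output as long as the decoded value of each axis is nonzero. In this sketch `w` only decides which of the 8 views is chosen as canonical, not whether the choice is consistent.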

Alternative Approaches. Several methods (Zhang et al., 2024b; Yu et al., 2024a; Melnyk et al., 2024; Li et al., 2022; Xu et al., 2021) have been proposed for learning rotation-invariant representations, such as vector neurons (Deng et al., 2021) and graph-based features (Kim et al., 2020). However, these approaches modify the original point coordinates, making them difficult to integrate with existing models. Another line of work addresses the sign ambiguity issue using pooling (Yu et al., 2020) and attention mechanisms (Xiao et al., 2020; Li et al., 2021). While effective, these methods are computationally expensive, as they require evaluating representations across all possible ambiguous views.

4.2. Point-wise Generator

Beyond rotation alignment, point cloud distillation must also account for variations in resolution. Unlike images, point clouds do not have a fixed structure, making traditional DD methods, which directly optimize fixed-size tensors, unsuitable. A promising solution is to parameterize the data with an implicit neural representation (INR), which has been widely used to generate data at arbitrary resolutions (Sitzmann et al., 2020; Park et al., 2019; Chen et al., 2021; Singh et al., 2023).

Point Denoising. Our solution is to use an INR as a point-wise generator $g: \mathbb{R} \to \mathbb{R}^3$, which takes a random noise value as input and predicts its corresponding 3D coordinates. Therefore, the number of generated points equals the number of sampled noise values, which enables low-resolution training and high-resolution evaluation, thus significantly reducing the computational cost and memory budget. See Section 5.6 for more details. For implementation, we choose SIREN (Sitzmann et al., 2020) as the generator, which is formulated as

$$g = [\Phi_1 \circ \Phi_2 \circ \cdots \circ \Phi_L]\, W_P, \quad \Phi_i = \sin(z_i w_i + b_i), \tag{10}$$

where $L$ is the number of layers, $\circ$ denotes the cascade of neural networks, $\Phi_i$ is the $i$-th layer of a multi-layer perceptron (MLP) with a sine activation function, and $W_P \in \mathbb{R}^{d \times 3}$ is the decoder that generates the 3D point coordinates.
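To make the resolution argument concrete, here is a minimal forward-pass sketch of a sine-activated point-wise generator in the spirit of Eq. (10). The layer sizes and the uniform weight initialization are assumptions for this example (SIREN's specific initialization and frequency scaling are omitted), and the weights are random rather than distilled.

```python
import numpy as np

def init_generator(rng, hidden=32, num_layers=3):
    """Random weights for a small sine-activated MLP (sizes are illustrative)."""
    dims = [1] + [hidden] * num_layers
    layers = [(rng.uniform(-1.0, 1.0, size=(a, b)), rng.uniform(-1.0, 1.0, size=b))
              for a, b in zip(dims[:-1], dims[1:])]
    W_P = rng.uniform(-1.0, 1.0, size=(hidden, 3)) / hidden   # decoder to xyz
    return layers, W_P

def generate(noise, layers, W_P):
    """Map n noise scalars, shape (n,), to n points, shape (n, 3)."""
    z = noise.reshape(-1, 1)
    for w, b in layers:
        z = np.sin(z @ w + b)          # Phi_i = sin(z_i w_i + b_i)
    return z @ W_P

rng = np.random.default_rng(0)
layers, W_P = init_generator(rng)

noise_low = rng.uniform(0.0, 1.0, size=256)     # low resolution, e.g. for training
noise_high = rng.uniform(0.0, 1.0, size=1024)   # high resolution, e.g. for evaluation
cloud_low = generate(noise_low, layers, W_P)
cloud_high = generate(noise_high, layers, W_P)
print(cloud_low.shape, cloud_high.shape)
```

Because each point depends only on its own noise value, the same parameters serve any resolution: the number of points is set purely by how many noise values are sampled.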

Conditional Modulating. While the point-wise generator can synthesize point clouds at arbitrary resolutions, it lacks class-conditional control, limiting its ability to generate category-specific data. To address this, we introduce a modulator $c: \mathbb{R}^d \to \mathbb{R}^d$, implemented as another MLP $\Psi$, to encode the label information and generate conditions for point cloud generation:

$$c = \Psi_1 \circ \Psi_2 \circ \cdots \circ \Psi_L, \quad \Psi_i = \mathrm{ReLU}(m_i w'_i + b'_i), \tag{11}$$

where $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$ and $m_i \in \mathbb{R}^d$ denotes the conditional representations. The first-layer input, $m_1$, is a one-hot matrix encoding class labels. Assuming that there are $K$ classes in total and each class has $N$ synthetic samples, $m_1 \in \mathbb{R}^{KN}$ and $w'_1 \in \mathbb{R}^{KN \times d}$.

The learned conditional representations are then used to modulate each layer of the generator, adjusting the frequency and phase features dynamically. The complete architecture is as follows:

$$g_c = [(\Psi_1 \odot \Phi_1) \circ (\Psi_2 \odot \Phi_2) \circ \cdots \circ (\Psi_L \odot \Phi_L)]\, W_P, \tag{12}$$

where $\odot$ is the element-wise multiplication. For clarity, in the following sections, we use $g(\epsilon, k)$ to denote $g_c$ with the $k$-th condition.
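One way to read the modulated architecture is that the output of each modulator layer scales the output of the matching generator layer. The sketch below implements that reading with random weights; the multiplicative form `m * sin(...)`, the layer sizes, the positive modulator bias, and the names `init_mlp` and `conditional_generate` are assumptions for illustration rather than the paper's exact design.

```python
import numpy as np

def init_mlp(rng, in_dim, hidden, num_layers, bias=0.0):
    """Random weights for an MLP with constant initial biases (illustrative only)."""
    dims = [in_dim] + [hidden] * num_layers
    return [(rng.uniform(-1.0, 1.0, size=(a, b)) / np.sqrt(a), np.full(b, bias))
            for a, b in zip(dims[:-1], dims[1:])]

def conditional_generate(noise, k, gen_layers, mod_layers, W_P):
    """Generate points for condition index k from n noise scalars."""
    num_conditions = mod_layers[0][0].shape[0]
    z = noise.reshape(-1, 1)                         # (n, 1)
    m = np.eye(num_conditions)[k][None, :]           # (1, K*N) one-hot condition
    for (w, b), (w_m, b_m) in zip(gen_layers, mod_layers):
        m = np.maximum(0.0, m @ w_m + b_m)           # Psi_i = ReLU(m_i w'_i + b'_i)
        z = m * np.sin(z @ w + b)                    # element-wise modulation of Phi_i
    return z @ W_P

rng = np.random.default_rng(0)
K, N, hidden, L = 4, 2, 16, 3                        # classes, samples per class, width, depth
gen_layers = init_mlp(rng, 1, hidden, L)
mod_layers = init_mlp(rng, K * N, hidden, L, bias=1.0)
W_P = rng.uniform(-1.0, 1.0, size=(hidden, 3)) / hidden

noise = rng.uniform(0.0, 1.0, size=512)
cloud_a = conditional_generate(noise, 0, gen_layers, mod_layers, W_P)
cloud_b = conditional_generate(noise, 5, gen_layers, mod_layers, W_P)
print(cloud_a.shape, cloud_b.shape)
```

With the same noise but different condition indices, the generator produces different point clouds, so a single set of generator weights can be shared by all K·N synthetic samples.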

Noise Distribution. For noise sampling, we use a uniform distribution instead of a Gaussian distribution, for two reasons. First, INR requires inputs to be normalized within [0, 1], which aligns naturally with the uniform distribution. Second, in the part segmentation task, each point must be assigned a label beforehand. A uniform distribution enables a straightforward division of noise samples according to the category ratio, ensuring a balanced representation across different parts. See Figure 2 for an intuitive example.
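The noise partitioning can be sketched as a lookup against cumulative part ratios. In the example below, only the 0.45 boundary for the fuselage comes from Figure 2; the remaining ratios, the part names beyond those mentioned in the text, and the helper `partition_noise` are hypothetical.

```python
import numpy as np

def partition_noise(noise, part_ratios):
    """Assign each noise value in [0, 1) to a part index by cumulative ratio."""
    edges = np.cumsum(part_ratios)[:-1]              # interior boundaries between parts
    return np.searchsorted(edges, noise, side="right")

part_names = ["fuselage", "wings", "engines", "tail"]
part_ratios = np.array([0.45, 0.30, 0.10, 0.15])     # hypothetical ratios except 0.45

rng = np.random.default_rng(0)
noise = rng.uniform(0.0, 1.0, size=1024)
labels = partition_noise(noise, part_ratios)          # part label of every point
masks = np.eye(len(part_ratios), dtype=bool)[labels]  # (n, parts) one-hot part masks

for name, count in zip(part_names, masks.sum(axis=0)):
    print(name, count)
```

Because the noise is uniform, the expected share of points per part matches the chosen ratios, and each point's label is known before the generator is run.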

4.3. Distillation Tasks

To comprehensively validate the effectiveness of DD3D, we conduct experiments on both the basic shape classification task and the challenging part segmentation task. Shape classification aims to assign each point cloud a label, emphasizing global information, while part segmentation predicts the label of each point, which is more fine-grained.

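Shape Classification. The distillation objective of the shape classification task is defined as:

$$\sum_{k=1}^{K} D\Big(\nabla \mathcal{L}_{cls}\big(f_\theta \circ r \circ g(\epsilon, k), Y^{\mathcal{S}}_k\big),\; \nabla \mathcal{L}_{cls}\big(f_\theta \circ r(B^{\mathcal{T}}_k), Y^{\mathcal{T}}_k\big)\Big), \tag{13}$$

where $K$ is the total number of shape classes, and $B^{\mathcal{T}}_k$ and $Y^{\mathcal{T}}_k$ are a batch of real training data and the corresponding labels.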

Part Segmentation. In the part segmentation task, each shape is divided into multiple parts. For example, an airplane can be divided into its fuselage, wings, engines, and tail. Assigning these fine-grained labels before distillation helps stabilize the training process. Therefore, DD3D first partitions the noise into different segments based on its value. Then, the partitioned noise is fed into the generator and rotator to produce synthetic data. To effectively capture fine-grained details, DD3D aligns the gradients of each segment individually, rather than simply matching the gradients of the entire shape. This improves the preservation of local geometric features while maintaining overall structural coherence. A conceptual illustration is provided in Figure 2. The distillation of part segmentation task is formulated as:

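$$\mathcal{L}_{part} = \sum_{k=1}^{K} \sum_{p \in k} D\Big(\nabla \mathcal{L}_{seg}\big(f_\theta \circ r(g(\epsilon, k) \odot M^{\mathcal{S}}_p), V^{\mathcal{S}}_p\big),\; \nabla \mathcal{L}_{seg}\big(f_\theta \circ r(B^{\mathcal{T}}_k \odot M^{\mathcal{T}}_p), V^{\mathcal{T}}_p\big)\Big), \tag{14}$$

where $p \in k$ indicates the parts belonging to a shape, $V^{\mathcal{T}}_p$ and $V^{\mathcal{S}}_p$ represent the real and synthetic part labels, and $M^{\mathcal{S}}_p$ and $M^{\mathcal{T}}_p$ are the part-specific masks used to extract the gradients corresponding to each part. See Algorithm 1 for detailed descriptions.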
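The per-part matching can be illustrated with a toy per-point linear head. The sketch below is only a schematic of the masking idea: it swaps the segmentation loss for a squared error, uses a squared Euclidean distance between gradients, and operates on made-up per-point features, all of which are assumptions rather than the paper's setup.

```python
import numpy as np

def masked_grad(X, V, W, mask):
    """Gradient of the mean squared error over the points selected by mask."""
    Xp, Vp = X[mask], V[mask]
    return 2.0 * Xp.T @ (Xp @ W - Vp) / max(len(Xp), 1)

def part_matching_loss(X_syn, V_syn, X_real, V_real, W):
    """Sum of squared gradient differences, computed separately for each part."""
    total = 0.0
    for p in range(V_syn.shape[1]):
        g_syn = masked_grad(X_syn, V_syn, W, V_syn[:, p] == 1)
        g_real = masked_grad(X_real, V_real, W, V_real[:, p] == 1)
        total += np.sum((g_syn - g_real) ** 2)
    return float(total)

rng = np.random.default_rng(0)
d, v = 6, 3                                        # per-point feature size, number of parts
W = rng.normal(size=(d, v))                        # linear per-point segmentation head
X_real = rng.normal(size=(200, d))                 # stand-in for real per-point features
V_real = np.eye(v)[rng.integers(0, v, size=200)]   # one-hot part labels
X_syn = rng.normal(size=(30, d))                   # stand-in for synthetic features
V_syn = np.eye(v)[np.arange(30) % v]

loss = part_matching_loss(X_syn, V_syn, X_real, V_real, W)
print(loss)
```

Matching each part separately prevents a large part from dominating the gradient signal of a small one, which is the motivation given for aligning segments individually.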

4.4. Discussion

DD3D has demonstrated strong potential in capturing both global shape structures and local details in object-level point clouds. Visualizations in Section 5.5 further illustrate its effectiveness. However, applying DD3D to scene-level tasks, such as object detection, remains challenging. This limitation can be attributed to two key factors. First, scene-level tasks often involve a significant imbalance between foreground and background points. Second, the detection task requires learning continuous bounding box coordinates, which cannot be predefined like segmentation labels, adding another layer of complexity.

Algorithm 1 DD3D for part segmentation

Input: Training dataset $\mathcal{T}$
Output: Model $f$, Rotator $r$, Generator $g$
repeat
  for $k = 1, \ldots, K$ do
    Sample a batch $(B^{\mathcal{T}}_k, V^{\mathcal{T}}_k) \sim \mathcal{T}$
    Sample noise $\epsilon \sim U(0, 1)$
    Generate $V^{\mathcal{S}}_k$, $M^{\mathcal{S}}_k$ by partitioning noise $\epsilon$
    Generate point clouds $B^{\mathcal{S}}_k = g(\epsilon, k)$
    for $p \in k$ do
      Apply masks $M^{\mathcal{S}}_p$, $M^{\mathcal{T}}_p$ on $B^{\mathcal{S}}_k$, $B^{\mathcal{T}}_k$
      Compute $\mathcal{L}^{\mathcal{S}}_{seg}$ and $\mathcal{L}^{\mathcal{T}}_{seg}$
    end for
  end for
  Update $g$ with $\mathcal{L}_{part}$
  repeat
    Update $f$, $r$ with $\mathcal{L}^{\mathcal{S}}_{seg}$
  until inner-loop end
until outer-loop end

5. Experiments

We benchmark our method on two fundamental tasks of point cloud analysis: shape classification (Section 5.1) and part segmentation (Section 5.2), followed by a series of analyses, including generalization (Section 5.3), ablation (Section 5.4), and visualization (Section 5.5).

Datasets. We employ three datasets of different scales for the shape classification task: (i) ScanObjectNN (OBJ_BG) (Uy et al., 2019) is the smallest dataset but consists of real-world data, which is challenging to distill. (ii) ModelNet40 (Wu et al., 2015) is a larger synthetic dataset generated from CAD models. (iii) MVPNet (Yu et al., 2023) is the largest dataset, containing 87K point clouds scanned from real-world videos. We use its subset MVPNet100, which includes data from the 100 most populous categories, to alleviate the influence of the long-tail distribution, similar to the CIFAR-100 dataset. For the part segmentation task, we follow Qi et al. (2017a) and choose the ShapeNet-part (Yi et al., 2016) dataset for evaluation. All datasets use the standard data splits, and their detailed statistics can be found in Appendix C.

Table 1: Shape classification results of different methods, mean accuracy (%) ± standard deviation. Bold indicates the best performance, and - means out-of-memory during distillation. CPC: Number of Clouds Per Class. Random, Herding, and K-Center are coreset-based; GM, DM, TM, and DD3D are distillation-based.

| Dataset | CPC | Random | Herding | K-Center | GM | DM | TM | DD3D | Full Dataset |
|---|---|---|---|---|---|---|---|---|---|
| ScanObjectNN (Aligned) | 1 | 22.00 ± 2.56 | 16.29 ± 1.37 | 18.18 ± 1.04 | 26.34 ± 2.07 | 25.90 ± 1.34 | 26.42 ± 2.08 | **30.62 ± 1.75** | 66.96 |
| ScanObjectNN (Aligned) | 10 | 32.63 ± 1.51 | 31.94 ± 3.31 | 33.46 ± 1.46 | 39.87 ± 3.00 | 37.61 ± 2.78 | 36.44 ± 2.74 | **43.77 ± 2.63** | 66.96 |
| ScanObjectNN (Aligned) | 50 | 54.15 ± 1.77 | 51.70 ± 1.87 | 54.22 ± 1.30 | 57.52 ± 2.03 | 56.91 ± 1.17 | - | **61.96 ± 1.44** | 66.96 |
| ScanObjectNN (Rotated) | 1 | 14.90 ± 2.10 | 18.10 ± 1.55 | 19.91 ± 2.16 | 14.64 ± 3.04 | 18.74 ± 2.44 | 19.29 ± 3.90 | **23.59 ± 2.17** | 54.84 |
| ScanObjectNN (Rotated) | 10 | 20.50 ± 1.26 | 20.20 ± 2.19 | 22.05 ± 1.76 | 20.55 ± 3.99 | 20.26 ± 4.31 | 19.20 ± 4.52 | **25.84 ± 3.11** | 54.84 |
| ScanObjectNN (Rotated) | 50 | 42.98 ± 1.84 | 43.39 ± 1.34 | 44.29 ± 2.07 | 47.74 ± 1.82 | 48.11 ± 2.30 | - | **50.26 ± 1.42** | 54.84 |
| ModelNet40 (Aligned) | 1 | 40.53 ± 0.36 | 43.41 ± 0.81 | 43.90 ± 1.51 | 53.38 ± 0.86 | 53.21 ± 0.58 | 52.37 ± 0.99 | **53.82 ± 0.28** | 88.05 |
| ModelNet40 (Aligned) | 10 | 71.89 ± 0.29 | 74.63 ± 0.48 | 73.13 ± 0.78 | 75.45 ± 0.82 | 74.45 ± 0.47 | 75.39 ± 1.32 | **76.31 ± 0.49** | 88.05 |
| ModelNet40 (Aligned) | 50 | 82.37 ± 0.45 | 82.75 ± 0.49 | 82.73 ± 0.28 | 81.74 ± 0.55 | 83.02 ± 1.16 | - | **83.91 ± 0.23** | 88.05 |
| ModelNet40 (Rotated) | 1 | 34.65 ± 0.71 | 30.03 ± 1.42 | 30.05 ± 0.50 | 41.32 ± 1.96 | 41.71 ± 1.65 | 37.36 ± 2.98 | **42.36 ± 0.83** | 80.45 |
| ModelNet40 (Rotated) | 10 | **58.87 ± 0.65** | 56.03 ± 0.62 | 57.69 ± 0.97 | 55.69 ± 1.63 | 55.45 ± 1.80 | 56.21 ± 1.14 | 58.14 ± 1.36 | 80.45 |
| ModelNet40 (Rotated) | 50 | 70.13 ± 0.64 | 70.02 ± 0.71 | 69.68 ± 0.59 | 68.92 ± 0.73 | 69.31 ± 0.79 | - | **71.27 ± 0.32** | 80.45 |
| MVPNet100 | 1 | 5.21 ± 0.27 | 8.14 ± 0.22 | 8.41 ± 0.35 | 10.52 ± 0.83 | 11.73 ± 0.49 | 10.74 ± 0.57 | **13.68 ± 0.48** | 55.63 |
| MVPNet100 | 10 | 15.99 ± 0.30 | 22.11 ± 0.21 | 20.54 ± 0.21 | 25.68 ± 0.77 | 25.71 ± 0.69 | - | **31.14 ± 1.31** | 55.63 |
| MVPNet100 | 50 | 30.14 ± 0.27 | 35.87 ± 0.24 | 35.48 ± 0.44 | 37.41 ± 0.57 | 36.83 ± 0.20 | - | **40.61 ± 0.38** | 55.63 |

Note: All methods with rotated data are trained with the point cloud rotator. Ablations can be seen in Table 4.

Table 2: Part segmentation results (%) on the ShapeNet dataset.

| Ratio | Method | OA | Instance IoU | Class IoU |
|---|---|---|---|---|
| CPC=1 | Coreset | 61.24 | 48.21 | 31.61 |
| CPC=1 | GM | 65.56 | 50.98 | 33.96 |
| CPC=1 | DD3D | 73.06 | 60.27 | 37.73 |
| CPC=10 | Coreset | 77.78 | 65.03 | 48.19 |
| CPC=10 | GM | 78.32 | 65.79 | 49.88 |
| CPC=10 | DD3D | 80.37 | 66.70 | 50.59 |
| 100% | Full | 90.04 | 77.38 | 65.63 |

Data Preparation and Metrics. Each cloud contains 1,024 points and is normalized into a unit sphere. We consider two settings: Aligned and Rotated. In the Aligned setting, both training and test point clouds have the same orientation, while in the Rotated setting, both training and test data are rotated randomly. For the rotated data, we project them along the direction of maximum variance during pre-processing. Note that the point clouds in MVPNet only have 180 views, so we do not randomly rotate them. The details of pre-processing can be found in Appendix C. We report the Overall Accuracy (OA, %) of each method in the shape classification task and the average class intersection over union (IoU, %) in the part segmentation task.
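The unit-sphere normalization and the Rotated setting can be sketched as follows. This is one common way to implement them (centering on the centroid, scaling by the farthest point, and drawing a rotation via QR decomposition); the paper's exact pre-processing is described in its Appendix C, so the details here are assumptions.

```python
import numpy as np

def normalize_unit_sphere(P):
    """Center a point cloud (n x 3) and scale it so the farthest point has norm 1."""
    centered = P - P.mean(axis=0, keepdims=True)
    return centered / np.linalg.norm(centered, axis=1).max()

def random_rotation(rng):
    """Draw a random 3D rotation via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))       # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:          # force det = +1, i.e. a proper rotation
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
P = rng.normal(size=(1024, 3)) * 5.0 + 2.0       # raw cloud with 1,024 points
P_aligned = normalize_unit_sphere(P)             # input for the Aligned setting
P_rotated = P_aligned @ random_rotation(rng)     # input for the Rotated setting

print(np.linalg.norm(P_aligned, axis=1).max())
```

A rotation preserves every point's distance to the origin, so the rotated cloud stays inside the same unit sphere and only its orientation changes.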

Baselines. To demonstrate the effectiveness of our method, we choose two types of baselines: (1) Coreset-based methods, including Random, Herding (Welling, 2009), and K-Center (Sener & Savarese, 2018). (2) Distillation-based methods, including Gradient Matching (GM) (Zhao et al., 2021), Distribution Matching (DM) (Zhao & Bilen, 2023), and Trajectory Matching (TM) (Cazenavette et al., 2022). We choose GM as the distillation objective for DD3D as it offers a trade-off between time and memory consumption. See Appendix D for the detailed hyperparameters.

Backbones. We use a lightweight PointNet as the backbone, which omits the transformation network because previous literature (Yu et al., 2024b) pointed out that complex network architectures may lead to degraded distillation performance. In the evaluation stage, we adopt various advanced backbones to evaluate the generalization ability of distilled datasets, including PointNet++ (Qi et al., 2017b), DGCNN (Wang et al., 2019), Point Transformer (Guo et al., 2021a), PointMLP (Ma et al., 2022), and PointNeXt (Qian et al., 2022). Results can be found in Table 3.

Experimental Setup. For each method, we perform the distillation process twice, evaluate each synthetic point cloud dataset five times (10 results in total), and report the mean and standard deviation. Baselines are all initialized with original data, while DD3D is trained from scratch. For the shape classification task, we consider three different distillation ratios with 1, 10, and 50 synthetic point clouds per class (CPC). For the part segmentation task, we choose CPC=1 and CPC=10 due to the limitation of GPU memory.

5.1. Shape Classification

The results of different methods on the shape classification task are shown in Table 1, from which we make the following observations. First, distillation-based methods generally outperform coreset-based methods, demonstrating the effectiveness of DD. However, as the amount of synthetic data increases, the performance of the coreset methods increases rapidly. Second, DD3D achieves state-of-the-art performance on all five datasets, demonstrating its superiority over traditional DD methods. Notably, DD3D obtains more improvements over baselines as the number of CPCs increases, possibly because the generator provides more diverse data. Third, the results on the rotated data are weaker than those on the aligned data. Although we project the rotated data to the canonical orientation, i.e., the direction with maximum variance, these point clouds still have slightly different orientations, whereas the aligned data is manually registered and strictly aligned with the direction of gravity, and therefore yields better performance.

Table 3: Cross-architecture results (%) with CPC=50.

| Dataset | Method | PointNet++ | DGCNN | PCT | PointMLP | PointNeXt |
|---|---|---|---|---|---|---|
| ScanObjectNN | DM | 56.02 | 51.47 | 52.72 | 51.33 | 51.82 |
| ScanObjectNN | GM | 55.38 | 52.98 | 53.28 | 51.33 | 52.81 |
| ScanObjectNN | DD3D | 57.14 | 53.36 | 54.04 | 52.50 | 53.36 |
| ModelNet40 | DM | 74.35 | 74.84 | 76.92 | 72.49 | 71.48 |
| ModelNet40 | GM | 76.54 | 73.38 | 77.31 | 74.11 | 72.00 |
| ModelNet40 | DD3D | 77.71 | 75.36 | 79.21 | 75.36 | 73.99 |
| MVPNet100 | DM | 33.20 | 31.26 | 33.92 | 32.58 | 31.17 |
| MVPNet100 | GM | 31.35 | 29.88 | 31.43 | 31.79 | 30.82 |
| MVPNet100 | DD3D | 34.19 | 32.94 | 35.82 | 33.08 | 32.75 |

Table 4: Ablation studies of the point cloud rotator.

| ModelNet40 (CPC=50) | Random | GM | DM | DD3D |
|---|---|---|---|---|
| PointNet | 14.75 | 9.47 | 10.16 | 17.91 |
| PointNet + PCA | 60.77 | 53.55 | 55.57 | 62.72 |
| PointNet + Rotator | 70.13 | 68.92 | 69.31 | 71.27 |

Full Dataset: 80.45

5.2. Part Segmentation

Table 2 presents the results of the part segmentation task on the ShapeNet dataset. Unlike classification, part segmentation requires learning both global shape structures and fine-grained part details, making it a more challenging task for DD. Since some traditional DD methods struggle with segmentation, we only compare DD3D against random coreset selection and GM. The results show that GM is slightly better than coreset selection as it is initialized by the real data. On the other hand, DD3D consistently outperforms both methods across all metrics by a large margin, demonstrating its effectiveness in learning the fine-grained features of point clouds. As expected, DD3D's performance does not yet reach the full dataset baseline. Nevertheless, it achieves 90% of the performance of the entire dataset using only 1% of the data, demonstrating its potential in 3D distillation.

5.3. Cross-architecture Generalization

We evaluate whether DD3D can benefit different point cloud models. Specifically, we use PointNet as the distillation model and five advanced point cloud models as evaluation models, each trained on the synthetic data from scratch.

Table 5: DD3D under different resolutions (CPC=50).

| Dataset | 256 | 512 | 1024 | Avg. |
|---|---|---|---|---|
| ScanObjectNN | 61.27 | 60.59 | 61.96 | 61.27 |
| ModelNet40 | 83.03 | 83.59 | 83.91 | 83.51 |
| MVPNet100 | 39.88 | 40.13 | 40.61 | 40.21 |

Figure 3: Matching losses of different resolutions. Panels: ScanObjectNN and ModelNet40; matching loss versus iteration for 256, 512, and 1024 points.

Notably, we use synthetic data with CPC=50 to reduce the effect of randomness. The results are shown in Table 3, from which we can see that DD3D consistently outperforms DM and GM across different datasets and evaluation models, indicating that the synthetic data distilled by DD3D generalizes better. This may be attributed to the generator, which provides varied point clouds in each epoch by sampling different noise and thus acts like data augmentation. However, we can also observe that the results of the evaluation models are not as good as those of PointNet, emphasizing that the synthetic data is still biased toward the distillation model.

5.4. Ablation Studies

Point Cloud Rotator. We first verify the effectiveness of the proposed point cloud rotator on the rotated ModelNet40 dataset. Specifically, we consider three different models: (1) PointNet, which is rotation-sensitive; (2) PointNet + PCA, which is rotation-invariant but sign-variant; (3) PointNet + Rotator, which is rotation- and sign-invariant. It can be observed from Table 4 that the performance of all methods drops sharply when the data is randomly rotated. On the other hand, leveraging PCA to transform the point clouds into a canonical orientation can significantly improve the distillation performance. However, the results are still far from those of the model with the point cloud rotator, which shows that sign ambiguity seriously prevents the distillation model from learning meaningful synthetic data. Finally, it can be observed that the proposed rotator helps point cloud models learn rotation-invariant representations, thus benefiting the learning of synthetic data.

Point-wise Generator. Next, we explore the performance of DD3D under different resolutions to verify the effectiveness and efficiency of the proposed generator. Typically, the shape classification task needs 1,024 points for training and evaluation. In this experiment, we randomly sample 256 and 512 points from real data to supervise the distillation of DD3D. Once trained, we leverage DD3D to generate 1,024 points for evaluation. Figure 3 shows that training on high-resolution point clouds can accelerate the convergence of DD3D, but the final matching losses are similar. Moreover, Table 5 shows that different resolutions have similar performance. In some cases, low-resolution data is even competitive with high-resolution point clouds, e.g., on ScanObjectNN. This finding shows that DD3D can not only achieve stable results but also significantly reduce computational costs and GPU memory overhead.

Figure 4: Visualizations of different methods (columns: Raw Images, DD3D, GM). Top: ModelNet (Airplane). Bottom: ShapeNet (Guitar, Laptop, and Pistol).

Figure 5: Geometric details of points generated by DD3D. (a) Airplane. (b) Earphone.

5.5. Visualization

We visualize the real and synthetic point clouds in Figure 4 for a more intuitive comparison. The results of DD3D and GM are placed in the last two columns. The point clouds generated by GM tend to collapse into a few clusters, while some isolated points are left as noise. In contrast, the point clouds generated by DD3D are coherent and encode the global geometric shapes. Moreover, in ShapeNet, the point clouds of GM are squeezed, making their shapes inconsistent with the real dataset, while the results of DD3D are more realistic and encode the spatial relationships between parts. Additionally, Figure 5 illustrates that DD3D not only preserves geometric details but also generates informative samples, further validating its effectiveness in 3D dataset distillation.

Figure 6: Time and space overhead between DD and DD3D. (a) Time (s) per iteration vs. number of clouds per class (1, 5, 10). (b) Budget (GB) vs. number of clouds per class (1, 5, 10). (c) Resolution: time (s) and space (GB) vs. number of points (256, 512, 1024).

5.6. Time and Space Overhead

We compare the overhead of DD and DD3D from several perspectives. First, Figure 6(a) shows that the time overhead of DD3D is slightly higher than that of DD due to the generation of synthetic data. Second, Figure 6(b) shows that the memory budget of DD grows faster than that of DD3D as the number of clouds per class (CPC) increases. DD3D saves synthetic-data budget by sharing the generator across classes, and its memory is nearly 4× smaller than that of DD when CPC=10. Finally, Figure 6(c) illustrates how the time and space overhead of DD3D change with resolution. Training with low-resolution point clouds significantly reduces overhead, which is important for resource-constrained scenarios such as edge computing.
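A back-of-the-envelope count shows why a shared generator scales better with CPC. The sketch below counts only stored synthetic-data parameters (not activations or optimizer state), so it does not reproduce the GPU budgets in Figure 6(b). The function names and the values of `CONDITION_DIM` and `GENERATOR_PARAMS` are hypothetical and chosen purely for illustration.

```python
def dd_num_floats(num_classes, cpc, num_points):
    """Conventional DD: every synthetic cloud stores num_points xyz coordinates."""
    return num_classes * cpc * num_points * 3


def dd3d_num_floats(num_classes, cpc, condition_dim, generator_params):
    """Generator-based storage: one shared generator plus one condition vector per cloud."""
    return generator_params + num_classes * cpc * condition_dim


# Illustrative sizes only (assumptions, not values reported in the paper).
NUM_CLASSES, NUM_POINTS = 40, 1024
CONDITION_DIM, GENERATOR_PARAMS = 128, 50_000

for cpc in (1, 5, 10):
    dd = dd_num_floats(NUM_CLASSES, cpc, NUM_POINTS)
    dd3d = dd3d_num_floats(NUM_CLASSES, cpc, CONDITION_DIM, GENERATOR_PARAMS)
    print(f"CPC={cpc:2d}  DD floats={dd:>9,}  generator-based floats={dd3d:>9,}")
```

In this toy count, each additional cloud costs `NUM_POINTS * 3` floats for direct optimization but only `CONDITION_DIM` floats when a generator is shared, which is the qualitative trend the comparison describes.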

6. Conclusion

This paper introduces DD3D for 3D point cloud distillation, which matches the rotation-invariant data distribution between real and synthetic data by transforming point clouds into a canonical orientation. Once trained, DD3D can synthesize point clouds at arbitrary resolutions, reducing memory budget and improving scalability. Extensive experiments on both classification and segmentation tasks validate the superiority of DD3D over traditional DD methods. A promising direction is to initialize DD3D with real data to improve its performance.

Acknowledgment

This project is supported by the National Research Foundation, Singapore, under its Medium Sized Center for Advanced Robotics Technology Innovation.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. In particular, it aims to accelerate the training of deep neural networks and reduce memory overhead. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

Cazenavette, G., Wang, T., Torralba, A., Efros, A. A., and Zhu, J. Dataset distillation by matching training trajectories. In CVPR, pp. 10708–10717. IEEE, 2022.

Cazenavette, G., Wang, T., Torralba, A., Efros, A. A., and Zhu, J. Generalizing dataset distillation via deep generative prior. In CVPR, pp. 3739–3748. IEEE, 2023.

Chang, A. X., Funkhouser, T. A., Guibas, L. J., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., and Yu, F. Shapenet: An information-rich 3d model repository. CoRR, abs/1512.03012, 2015.

Chen, Y., Liu, S., and Wang, X. Learning continuous image representation with local implicit image function. In CVPR, pp. 8628–8638. Computer Vision Foundation / IEEE, 2021.

Deitke, M., Liu, R., Wallingford, M., Ngo, H., Michel, O., Kusupati, A., Fan, A., Laforte, C., Voleti, V., Gadre, S. Y., VanderBilt, E., Kembhavi, A., Vondrick, C., Gkioxari, G., Ehsani, K., Schmidt, L., and Farhadi, A. Objaverse-xl: A universe of 10m+ 3d objects. In NeurIPS, 2023.

Deng, C., Litany, O., Duan, Y., Poulenard, A., Tagliasacchi, A., and Guibas, L. J. Vector neurons: A general framework for so(3)-equivariant networks. In ICCV, pp. 12180–12189. IEEE, 2021.

Deng, Z. and Russakovsky, O. Remember the past: Distilling datasets into addressable memories for neural networks. In NeurIPS, 2022.

Du, J., Jiang, Y., Tan, V. Y. F., Zhou, J. T., and Li, H. Minimizing the accumulated trajectory error to improve dataset distillation. In CVPR, pp. 3749–3758. IEEE, 2023.

Geng, J., Chen, Z., Wang, Y., Woisetschlaeger, H., Schimmler, S., Mayer, R., Zhao, Z., and Rong, C. A survey on dataset distillation: Approaches, applications and future directions. In IJCAI, pp. 6610–6618. ijcai.org, 2023.

Guo, M., Cai, J., Liu, Z., Mu, T., Martin, R. R., and Hu, S. PCT: point cloud transformer. Comput. Vis. Media, 7(2):187–199, 2021a.

Guo, Y., Wang, H., Hu, Q., Liu, H., Liu, L., and Bennamoun, M. Deep learning for 3d point clouds: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 43(12):4338–4364, 2021b.

Guo, Z., Wang, K., Cazenavette, G., Li, H., Zhang, K., and You, Y. Towards lossless dataset distillation via difficulty-aligned trajectory matching. In ICLR. OpenReview.net, 2024.

Huang, Z., Johnson, J., Debnath, S., Rehg, J. M., and Wu, C. Pointinfinity: Resolution-invariant point diffusion models. In CVPR, 2024.

Jin, W., Zhao, L., Zhang, S., Liu, Y., Tang, J., and Shah, N. Graph condensation for graph neural networks. In ICLR. OpenReview.net, 2022.

Kim, J., Kim, J., Oh, S. J., Yun, S., Song, H., Jeong, J., Ha, J., and Song, H. O. Dataset condensation via efficient synthetic-data parameterization. In ICML, volume 162 of Proceedings of Machine Learning Research, pp. 11102–11118. PMLR, 2022.

Kim, S., Park, J., and Han, B. Rotation-invariant local-to-global representation learning for 3d point cloud. In NeurIPS, 2020.

Lei, S. and Tao, D. A comprehensive survey of dataset distillation. IEEE Trans. Pattern Anal. Mach. Intell., 46(1):17–32, 2024.

Li, F., Fujiwara, K., Okura, F., and Matsushita, Y. A closer look at rotation-invariant deep point cloud analysis. In ICCV, pp. 16198–16207. IEEE, 2021.

Li, X., Li, R., Chen, G., Fu, C., Cohen-Or, D., and Heng, P. A rotation-invariant framework for deep point cloud analysis. IEEE Trans. Vis. Comput. Graph., 28(12):4503–4514, 2022.


Li, Y., Bu, R., Sun, M., Wu, W., Di, X., and Chen, B. Pointcnn: Convolution on x-transformed points. In NeurIPS, pp. 828–838, 2018.

Liu, S., Wang, K., Yang, X., Ye, J., and Wang, X. Dataset distillation via factorization. In NeurIPS, 2022.

Liu, Y., Bo, D., and Shi, C. Graph distillation with eigenbasis matching. In ICML, 2024.

Loo, N., Hasani, R. M., Amini, A., and Rus, D. Efficient dataset distillation using random feature approximation. In NeurIPS, 2022.

Ma, X., Qin, C., You, H., Ran, H., and Fu, Y. Rethinking network design and local geometry in point cloud: A simple residual MLP framework. In ICLR. OpenReview.net, 2022.

Melnyk, P., Robinson, A., Felsberg, M., and Wadenbäck, M. Tetrasphere: A neural descriptor for o(3)-invariant point cloud analysis. In CVPR, pp. 5620–5630. IEEE, 2024.

Nguyen, T., Chen, Z., and Lee, J. Dataset meta-learning from kernel ridge-regression. In ICLR. OpenReview.net, 2021.

Park, J. J., Florence, P. R., Straub, J., Newcombe, R. A., and Lovegrove, S. Deepsdf: Learning continuous signed distance functions for shape representation. In CVPR, pp. 165–174. Computer Vision Foundation / IEEE, 2019.

Poulenard, A., Rakotosaona, M., Ponty, Y., and Ovsjanikov, M. Effective rotation-invariant point CNN with spherical harmonics kernels. In 3DV, pp. 47–56. IEEE, 2019.

Qi, C. R., Su, H., Mo, K., and Guibas, L. J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In CVPR, pp. 77–85. IEEE Computer Society, 2017a.

Qi, C. R., Yi, L., Su, H., and Guibas, L. J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In NIPS, pp. 5099–5108, 2017b.

Qian, G., Li, Y., Peng, H., Mai, J., Hammoud, H., Elhoseiny, M., and Ghanem, B. Pointnext: Revisiting pointnet++ with improved training and scaling strategies. In NeurIPS, 2022.

Qiu, S., Anwar, S., and Barnes, N. Dense-resolution network for point cloud classification and segmentation. In WACV, pp. 3812–3821. IEEE, 2021.

Sachdeva, N. and McAuley, J. J. Data distillation: A survey. CoRR, abs/2301.04272, 2023.

Sener, O. and Savarese, S. Active learning for convolutional neural networks: A core-set approach. In ICLR, 2018.

Shin, D., Shin, S., and Moon, I. Frequency domain-based dataset distillation. In NeurIPS, 2023.

Singh, R., Shukla, A., and Turaga, P. K. Polynomial implicit neural representations for large diverse datasets. In CVPR, pp. 2041–2051. IEEE, 2023.

Sitzmann, V., Martel, J. N. P., Bergman, A. W., Lindell, D. B., and Wetzstein, G. Implicit neural representations with periodic activation functions. In NeurIPS, 2020.

Thomas, N., Smidt, T. E., Kearnes, S., Yang, L., Li, L., Kohlhoff, K., and Riley, P. Tensor field networks: Rotation- and translation-equivariant neural networks for 3d point clouds. CoRR, abs/1802.08219, 2018.

Uy, M. A., Pham, Q., Hua, B., Nguyen, D. T., and Yeung, S. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In ICCV, pp. 1588–1597. IEEE, 2019.

Wang, K., Zhao, B., Peng, X., Zhu, Z., Yang, S., Wang, S., Huang, G., Bilen, H., Wang, X., and You, Y. CAFE: learning to condense dataset by aligning features. In CVPR, pp. 12186–12195. IEEE, 2022.

Wang, K., Gu, J., Zhou, D., Zhu, Z., Jiang, W., and You, Y. Dim: Distilling dataset into generative model. CoRR, abs/2303.04707, 2023.

Wang, T., Zhu, J., Torralba, A., and Efros, A. A. Dataset distillation. CoRR, abs/1811.10959, 2018.

Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M., and Solomon, J. M. Dynamic graph CNN for learning on point clouds. ACM Trans. Graph., 38(5):146:1–146:12, 2019.

Wang, Z., Xu, Y., Lu, C., and Li, Y.-L. Dancing with images: Video distillation via static-dynamic disentanglement. In CVPR, 2024.

Welling, M. Herding dynamical weights to learn. In ICML, volume 382, pp. 1121–1128, 2009.

Wu, W., Qi, Z., and Li, F. Pointconv: Deep convolutional networks on 3d point clouds. In CVPR, pp. 9621–9630. Computer Vision Foundation / IEEE, 2019.

Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. 3d shapenets: A deep representation for volumetric shapes. In CVPR, pp. 1912–1920, 2015.

Xiao, Z., Lin, H., Li, R., Geng, L., Chao, H., and Ding, S. Endowing deep 3d models with rotation invariance based on principal component analysis. In ICME, pp. 1–6. IEEE, 2020.


Xu, J., Tang, X., Zhu, Y., Sun, J., and Pu, S. Sgmnet: Learning rotation-invariant point cloud representations via sorted gram matrix. In ICCV, pp. 10448–10457. IEEE, 2021.

Yi, L., Kim, V. G., Ceylan, D., Shen, I., Yan, M., Su, H., Lu, C., Huang, Q., Sheffer, A., and Guibas, L. J. A scalable active framework for region annotation in 3d shape collections. ACM Trans. Graph., 35(6):210:1–210:12, 2016.

Yu, H., Hou, J., Qin, Z., Saleh, M., Shugurov, I., Wang, K., Busam, B., and Ilic, S. RIGA: rotation-invariant and globally-aware descriptors for point cloud registration. IEEE Trans. Pattern Anal. Mach. Intell., 46(5):3796–3812, 2024a.

Yu, R., Wei, X., Tombari, F., and Sun, J. Deep positional and relational feature learning for rotation-invariant point cloud analysis. In ECCV, volume 12355, pp. 217–233. Springer, 2020.

Yu, R., Liu, S., and Wang, X. Dataset distillation: A comprehensive review. IEEE Trans. Pattern Anal. Mach. Intell., 46(1):150–170, 2024b.

Yu, X., Xu, M., Zhang, Y., Liu, H., Ye, C., Wu, Y., Yan, Z., Zhu, C., Xiong, Z., Liang, T., Chen, G., Cui, S., and Han, X. Mvimgnet: A large-scale dataset of multi-view images. In CVPR, pp. 9150–9161. IEEE, 2023.

Zhang, D. J., Wang, H., Xue, C., Yan, R., Zhang, W., Bai, S., and Shou, M. Z. Dataset condensation via generative model. CoRR, abs/2309.07698, 2023.

Zhang, H., Su, S., Zhu, Y., Sun, J., and Zhang, Y. GSDD: generative space dataset distillation for image super-resolution. In AAAI, pp. 7069–7077. AAAI Press, 2024a.

Zhang, Z., Yang, L., and Xiang, Z. Risurconv: Rotation invariant surface attention-augmented convolutions for 3d point cloud classification and segmentation. In ECCV (28), volume 15086 of Lecture Notes in Computer Science, pp. 93–109. Springer, 2024b.

Zhao, B. and Bilen, H. Dataset condensation with differentiable siamese augmentation. In ICML, volume 139 of Proceedings of Machine Learning Research, pp. 12674–12685. PMLR, 2021.

Zhao, B. and Bilen, H. Synthesizing informative training samples with GAN. CoRR, abs/2204.07513, 2022.

Zhao, B. and Bilen, H. Dataset condensation with distribution matching. In WACV, pp. 6503–6512. IEEE, 2023.

Zhao, B., Mopuri, K. R., and Bilen, H. Dataset condensation with gradient matching. In ICLR. OpenReview.net, 2021.

Zhao, C., Yang, J., Xiong, X., Zhu, A., Cao, Z., and Li, X. Rotation invariant point cloud classification: Where local geometry meets global topology. CoRR, abs/1911.00195, 2019.

Zhou, Y., Nezhadarya, E., and Ba, J. Dataset distillation using neural feature regression. In NeurIPS, 2022.


A. Proof of Theorems

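Theorem A.1. Assume the classifier is a linear layer $W$ and $\mathcal{L}_{cls}$ can be simplified to the mean-squared error $\|XW - Y\|_F^2$. The objective of gradient matching is equal to variance preserving:

$$\min_{S} \mathcal{L}_{GM} = \min_{S} D\left(\nabla_W \mathcal{L}^{S}_{cls},\, \nabla_W \mathcal{L}^{T}_{cls}\right) \;\Rightarrow\; \min_{S} \left\| X_S^\top X_S - X_T^\top X_T \right\|^2,$$

where $D$ is a distance metric and $\nabla_W$ is the gradient with respect to $W$.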

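Proof. The gradient of $\|XW - Y\|_F^2$ is denoted as $\nabla = X^\top (XW - Y)$. We can then match the gradients between the real and synthetic data:

$$\begin{aligned}
\|\nabla_S - \nabla_T\|_F^2 &= \left\| X_S^\top (X_S W - Y_S) - X_T^\top (X_T W - Y_T) \right\|_F^2 && (16)\\
&\le \|W\|_F^2 \underbrace{\left\| X_S^\top X_S - X_T^\top X_T \right\|_F^2}_{\text{Variance}} + \underbrace{\left\| X_S^\top Y_S - X_T^\top Y_T \right\|_F^2}_{\text{Mean}}
\end{aligned}$$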

We can see that the first term is to preserve the variance of real data, and the second term aligns the average representations of samples belonging to the same class. These two terms can be combined if we set XS = XS X S YS and XT = XT X T YT for each class. Then we only need to match the variance between XS and XT .
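The step from Eq. (16) to the two-term bound can be spelled out as follows. This is a sketch that uses only the triangle inequality and the sub-multiplicativity of the Frobenius norm; the shorthand $A$ and $B$ is introduced here for readability and is not the paper's notation.

```latex
\begin{aligned}
A &:= X_S^\top X_S - X_T^\top X_T, \qquad B := X_S^\top Y_S - X_T^\top Y_T, \\
\nabla_S - \nabla_T &= X_S^\top (X_S W - Y_S) - X_T^\top (X_T W - Y_T) = A W - B, \\
\|\nabla_S - \nabla_T\|_F &\le \|A W\|_F + \|B\|_F \le \|A\|_F \, \|W\|_F + \|B\|_F, \\
\|\nabla_S - \nabla_T\|_F^2 &\le 2\,\|W\|_F^2 \, \|A\|_F^2 + 2\,\|B\|_F^2 .
\end{aligned}
```

The last line follows from $(a + b)^2 \le 2a^2 + 2b^2$, so the squared gradient distance is controlled by the same variance and mean terms up to a constant factor, and both vanish when $A = 0$ and $B = 0$.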

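Theorem A.2. Assume $X_T$ follows a $d$-dimensional multivariate Gaussian distribution $\mathcal{N}(\mu, \Sigma)$. Let $\hat{X}_T$ be the rotated representations of $X_T$. Then:

$$\lambda_{\max}\left(\mathbb{E}\left[\hat{X}_T^\top \hat{X}_T\right]\right) \le \lambda_{\max}\left(\mathbb{E}\left[X_T^\top X_T\right]\right), \qquad \sigma_{\max}\left(\mathbb{E}\left[\hat{X}_T\right]\right) \le \sigma_{\max}\left(\mathbb{E}\left[X_T\right]\right), \tag{18}$$

where $\lambda_{\max}$ and $\sigma_{\max}$ are the maximum eigenvalues and singular values, respectively.

Proof. Firstly, the largest eigenvalue of the covariance matrix $X_T^\top X_T$ is equal to the square of the largest singular value of $X_T$. Therefore, we only prove the first inequality.

Secondly, as $X^\top X = \sum_{i=1}^{n} x_i^\top x_i$, for $X_T \sim \mathcal{N}(\mu, \Sigma)$, we have:

$$\mathbb{E}\left[X_T^\top X_T\right] = \mathbb{E}\left[\sum_{i=1}^{n} x_i^\top x_i\right] = \sum_{i=1}^{n} \mathbb{E}\left[x_i^\top x_i\right] = n\left(\mu^\top \mu + \Sigma\right), \tag{19}$$

$$\mathbb{E}\left[\hat{X}_T^\top \hat{X}_T\right] = \mathbb{E}\left[\sum_{i=1}^{n} R_i^\top x_i^\top x_i R_i\right] = \sum_{i=1}^{n} \mathbb{E}\left[R_i^\top x_i^\top x_i R_i\right] = \sum_{i=1}^{n} R_i^\top\, \mathbb{E}\left[x_i^\top x_i\right] R_i. \tag{20}$$

Thirdly, we have:

$$\lambda_{\max}\left(\mathbb{E}\left[X_T^\top X_T\right]\right) = n\,\lambda_{\max}\left(\mu^\top \mu + \Sigma\right), \tag{21}$$

$$\begin{aligned}
\lambda_{\max}\left(\mathbb{E}\left[\hat{X}_T^\top \hat{X}_T\right]\right) &= \lambda_{\max}\left(\sum_{i=1}^{n} R_i^\top\, \mathbb{E}\left[x_i^\top x_i\right] R_i\right) \le \sum_{i=1}^{n} \lambda_{\max}\left(R_i^\top\, \mathbb{E}\left[x_i^\top x_i\right] R_i\right) && (22)\\
&= \sum_{i=1}^{n} \lambda_{\max}\left(R_i^\top \mu^\top \mu R_i + R_i^\top \Sigma R_i\right) = \lambda_{\max}\left(\mathbb{E}\left[X_T^\top X_T\right]\right). && (23)
\end{aligned}$$

The above inequality shows that the largest eigenvalue of $\mathbb{E}\left[X_T^\top X_T\right]$ is an upper bound on that of $\mathbb{E}\left[\hat{X}_T^\top \hat{X}_T\right]$. The equality holds if and only if the random rotation matrices are commutative, which is infeasible in practice.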
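The bound can be sanity-checked numerically from the closed-form expectations in Eqs. (19) and (20). The NumPy sketch below is an illustration under stated assumptions: the mean, covariance, and rotations are randomly drawn, $d = 3$, and `random_rotation` is a hypothetical helper, not part of the paper's code.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:  # force a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
n, d = 64, 3
mu = rng.normal(size=(1, d))       # mean as a row vector
a = rng.normal(size=(d, d))
sigma = a @ a.T                    # symmetric positive semi-definite covariance
second_moment = mu.T @ mu + sigma  # E[x_i^T x_i]

# Eq. (19): all samples share one orientation.
expected_aligned = n * second_moment
# Eq. (20): every sample is rotated by its own R_i.
rotations = [random_rotation(rng) for _ in range(n)]
expected_rotated = sum(r.T @ second_moment @ r for r in rotations)

lam_aligned = np.linalg.eigvalsh(expected_aligned)[-1]
lam_rotated = np.linalg.eigvalsh(expected_rotated)[-1]
print(f"lambda_max aligned: {lam_aligned:.3f}, rotated: {lam_rotated:.3f}")
```

Random rotations preserve the trace of the expected second moment but spread it across directions, so the leading eigenvalue of the rotated data cannot exceed that of the aligned data.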

B. Implementation Details of DD3D

Here, we explain some details of DD3D, which consists of two important components: a point cloud rotator and a point-wise generator. Both components are built on the SIREN (Sitzmann et al., 2020) model, which stacks multiple fully connected layers with sin(·) activations to capture high-frequency information. The PyTorch code is shown in Algorithm 2, where some details are highlighted.


Algorithm 2 PyTorch code of DD3D

import torch
import torch.nn as nn
from SIREN import SIREN  # assumed import path; SIREN is the sine-activated MLP of Sitzmann et al. (2020)


class Rotator(nn.Module):
    def __init__(self, hidden_dim, w0):
        super().__init__()

        # w0 adjusts the frequency of the sine activation
        self.sign_encoder = SIREN(1, hidden_dim, w0=w0)
        self.sign_decoder = SIREN(hidden_dim, 1, w0=1.)

    def forward(self, x):
        x = x.unsqueeze(-1)  # x: [B, N, 3] -> [B, N, 3, 1]

        # Encode every coordinate, then average over the N points
        feat = self.sign_encoder(x).mean(dim=1, keepdim=True)  # [B, N, 3, 1] -> [B, 1, 3, d]
        feat = self.sign_decoder(feat)  # [B, 1, 3, d] -> [B, 1, 3, 1]
        sign = torch.sign(feat)  # sign-equivariant

        x = x * sign  # [B, N, 3, 1] * [B, 1, 3, 1] -> [B, N, 3, 1]
        return x.squeeze(-1)


class ConditionalGenerator(nn.Module):
    def __init__(self, generator, num_classes, cpc, condition_dim, num_layers):
        super().__init__()

        self.generator = generator
        # One learnable condition vector per synthetic cloud (class index as condition)
        self.lookup = nn.Embedding(num_classes * cpc, condition_dim)
        self.num_layers = num_layers

        self.layers = nn.ModuleList([])

        for _ in range(self.num_layers - 1):
            self.layers.append(nn.Sequential(nn.Linear(condition_dim, condition_dim), nn.ReLU()))

    def forward(self, noise, class_indices):
        # noise: [B, N, 1]
        # class_indices: [B, C]

        mod = self.lookup(class_indices)
        mods = [mod]

        # Build one modulation vector per generator layer
        for layer in self.layers:
            mod = layer(mod)
            mods.append(mod)

        return self.generator(noise, tuple(mods))
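The role of the sign-equivariant branch can be illustrated without a learned network. In the NumPy sketch below, a hand-crafted odd statistic (the per-axis third moment) replaces the SIREN encoder and decoder; this substitution, the helper names, and the skewed synthetic cloud are all assumptions made for illustration. Because the statistic changes sign whenever an axis is flipped, multiplying the points by its sign gives the same output for every flipped copy of the input.

```python
import itertools

import numpy as np


def canonical_signs(points):
    """Per-axis sign of an odd statistic: flipping an axis flips the returned sign."""
    signs = np.sign((points ** 3).sum(axis=0, keepdims=True))  # [1, 3]
    signs[signs == 0] = 1.0  # arbitrary tie-break for perfectly symmetric axes
    return signs


def canonicalize(points):
    """Resolve the per-axis sign ambiguity of a PCA-aligned cloud [N, 3]."""
    return points * canonical_signs(points)


rng = np.random.default_rng(0)
# Skewed synthetic cloud so that the third moment is nonzero on every axis.
cloud = rng.exponential(size=(512, 3)) - 1.0
reference = canonicalize(cloud)

# All 8 axis-flipped copies map to the same canonical cloud.
all_match = all(
    np.allclose(canonicalize(cloud * np.array([flip])), reference)
    for flip in itertools.product([-1.0, 1.0], repeat=3)
)
print("invariant to axis flips:", all_match)
```

A learned encoder plays the same role in Algorithm 2, with the advantage that it can be trained together with the rest of the distillation pipeline.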

C. Details of Datasets

The detailed statistical information of the datasets used in this paper is shown in Table 6. We list the sources of the datasets and their licenses in the following.

ScanObjectNN: https://github.com/feiran-l/rotation-invariant-pointcloud-analysis

ModelNet40: http://modelnet.cs.princeton.edu/ModelNet40.zip

MVPNet: https://github.com/GAP-LAB-CUHK-SZ/MVImgNet

ShapeNet: https://github.com/feiran-l/rotation-invariant-pointcloud-analysis

D. Hyperparameters

The hyperparameters of baselines and DD3D are listed in Tables 7 and 8, respectively.


Table 6: Details of datasets

| | ScanObjectNN | ModelNet40 | MVPNet | ShapeNet |
|---|---|---|---|---|
| # Shape Classes | 15 | 40 | 100 | 16 |
| # Part Classes | - | - | - | 50 |
| # Training Samples | 2,322 | 9,843 | 62,494 | 14,007 |
| # Validation Samples | 580 | 2,468 | 15,670 | 2,874 |
| Resolution | 1,024 | 1,024 | 1,024 | 2,048 |

Table 7: Hyperparameters used for Data Synthesis.

| | ScanObjectNN | ModelNet40 | MVPNet100 | ShapeNet |
|---|---|---|---|---|
| Optimizer | Adam | Adam | Adam | Adam |
| Initial LR | 0.001 | 0.001 | 0.001 | 0.001 |
| Batch Size | 32 | 32 | 64 | 32 |
| Iterations | 200 | 400 | 600 | 200 |
| Weight Decay | 0.0005 | 0.0005 | 0.0005 | 0.0005 |
| Augmentation | Scale, Jitter, Rotate | Scale, Jitter, Rotate | Scale, Jitter, Rotate | Scale, Jitter, Rotate |
| Scheduler | StepLR (Decay 0.1 / 100 iter) | StepLR (Decay 0.1 / 100 iter) | StepLR (Decay 0.5 / 200 iter) | - |
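To make the scheduler entries concrete, the sketch below evaluates a step schedule by hand. It assumes that "Decay 0.1 / 100 iter" means the learning rate is multiplied by 0.1 every 100 iterations, which matches the usual step-decay rule; `step_lr` is a hypothetical helper and not a PyTorch API.

```python
def step_lr(initial_lr, gamma, step_size, iteration):
    """Step decay: multiply the learning rate by gamma once every step_size iterations."""
    return initial_lr * gamma ** (iteration // step_size)


# Data-synthesis settings read from Table 7 (ShapeNet lists no scheduler).
schedules = {
    "ScanObjectNN": {"gamma": 0.1, "step_size": 100, "iterations": 200},
    "ModelNet40": {"gamma": 0.1, "step_size": 100, "iterations": 400},
    "MVPNet100": {"gamma": 0.5, "step_size": 200, "iterations": 600},
}

final_lrs = {}
for name, cfg in schedules.items():
    last_iteration = cfg["iterations"] - 1
    final_lrs[name] = step_lr(0.001, cfg["gamma"], cfg["step_size"], last_iteration)
    print(f"{name}: final learning rate = {final_lrs[name]:.2e}")
```

Under this reading, the ModelNet40 run decays three times over its 400 iterations, while the MVPNet100 run halves the learning rate twice over 600 iterations.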

Table 8: Hyperparameters used for Validation.

| | ScanObjectNN | ModelNet40 | MVPNet100 | ShapeNet |
|---|---|---|---|---|
| Optimizer | Adam | Adam | Adam | Adam |
| Initial LR | 0.001 | 0.001 | 0.001 | 0.001 |
| Batch Size | 8 | 8 | 32 | 8 |
| Epochs | 200 | 200 | 200 | 200 |
| Weight Decay | 0.0005 | 0.0005 | 0.0005 | 0.0005 |
| Augmentation | Scale, Jitter, Rotate | Scale, Jitter, Rotate | Scale, Jitter, Rotate | Scale, Jitter, Rotate |
| Scheduler | StepLR (Decay 0.1 / 100 epoch) | StepLR (Decay 0.1 / 100 epoch) | CosineAnnealingLR | - |