# General Neural Gauge Fields

Published as a conference paper at ICLR 2023

Fangneng Zhan, Lingjie Liu, Adam Kortylewski, Christian Theobalt
Max Planck Institute for Informatics, 66123, Germany
{fzhan,lliu,akortyle,theobalt}@mpi-inf.mpg.de

ABSTRACT

The recent advance of neural fields, such as neural radiance fields, has significantly pushed the boundary of scene representation learning. Aiming to boost the computation efficiency and rendering quality of 3D scenes, a popular line of research maps the 3D coordinate system to another measuring system, e.g., 2D manifolds and hash tables, for modeling neural fields. The conversion of coordinate systems can be typically dubbed as gauge transformation, which is usually a pre-defined mapping function, e.g., orthogonal projection or spatial hash function. This begs the question: can we directly learn a desired gauge transformation along with the neural field in an end-to-end manner? In this work, we extend this problem to a general paradigm with a taxonomy of discrete and continuous cases, and develop an end-to-end learning framework to jointly optimize the gauge transformation and neural fields. To counter the problem that the learning of gauge transformations can collapse easily, we derive a general regularization mechanism from the principle of information conservation during the gauge transformation. To circumvent the high computation cost of gauge learning with regularization, we further derive an information-invariant gauge transformation which preserves scene information inherently and yields superior performance.

1 INTRODUCTION

Representing 3D scenes with high efficiency and quality has been a long-standing goal in computer vision and computer graphics research. Recently, the implicit representation of neural radiance fields (Mildenhall et al., 2021) has shown that a 3D scene can be modeled with neural networks, which achieves compelling visual quality and a low memory footprint. However, it suffers from long training times. Explicit voxel-based methods (Yu et al., 2021a; Sun et al., 2022) emerged with faster convergence but higher memory requirements, due to the use of 3D voxel grids as the scene representation. To strike a good balance between computation efficiency and rendering quality, EG3D (Chan et al., 2022) proposed to project the 3D coordinate system to a tri-plane system. Along this line of research, TensoRF (Chen et al., 2022) factorizes 3D space into compact low-rank tensors; Instant-NGP (Müller et al., 2022) models the 3D space with multi-resolution hash grids to enable remarkably fast convergence.

These recent works (e.g., EG3D and Instant-NGP) share the same prevailing paradigm of converting the 3D coordinates of neural fields to another coordinate system. Particularly, a coordinate system of the scene (e.g., 3D coordinates or a hash table) can be regarded as a kind of gauge, and the conversion between coordinate systems can be referred to as a gauge transformation (Moriyasu, 1983). Notably, existing gauge transformations in neural fields are usually pre-defined functions (e.g., orthogonal mappings and spatial hash functions (Teschner et al., 2003)), which are sub-optimal for modeling neural fields as shown in Fig. 1. To this end, a learnable gauge transformation is preferable, as it can be optimized towards the final objective. This raises an essential question: how to learn the gauge transformations along with the neural fields.
Some previous works explore special cases of this problem, e.g., NeuTex (Xiang et al., 2021) and NeP (Ma et al., 2022) aim to transform 3D points into continuous 2D manifolds. However, a general and unified learning paradigm for various gauge transformations has not yet been established. In this work, we introduce general Neural Gauge Fields, which unify various gauge transformations in neural fields with a special focus on how to learn the gauge transformations along with the neural fields.

Figure 1: Conceptual illustration of a gauge transformation from 3D point coordinates to a 2D plane. Instead of naively employing a pre-defined orthogonal mapping, which incurs overlap on the 2D plane, the proposed neural gauge fields aim to learn the mapping along with the neural field, driven by the multi-view synthesis loss.

Basically, a gauge is defined by gauge parameters and a gauge basis, e.g., codebook indices and codebook vectors, which can be continuous or discrete. Thus, the gauge transformations for neural fields can be duly classified into continuous cases (e.g., a tri-plane space) and discrete cases (e.g., a hash codebook space). We then develop general learning paradigms for continuous and discrete cases, which map a 3D point to a continuous coordinate or a discrete index in the target gauge, respectively. As typical cases, we study continuous mapping from 3D space to a 2D plane and discrete mapping from 3D space to 256 discrete vectors. As shown in Figs. 2 and 3, we observed that naively optimizing the gauge transformations with the neural fields severely suffers from gauge collapse, i.e., the gauge transformation collapses to a small region in the continuous case or to a small number of indices in the discrete case (Baevski et al., 2019; Kaiser et al., 2018). To regularize the learning of gauge transformations, a cycle consistency loss has been explored in Xiang et al. (2021); Ma et al. (2022) to avoid many-to-one mappings; a structural regularization is also adopted in Tretschk et al. (2021) to preserve local structure by only predicting a coordinate offset. However, the cycle consistency regularization tends to be heuristic without a grounded derivation, while the structural regularization is constrained to continuous cases. In this work, we introduce an Information Regularization (InfoReg) derived from the principle of information conservation during gauge transformation. By maximizing the mutual information between the original coordinates and the target gauge parameters, we derive general regularization forms for gauge transformations. Notably, a geometric uniform distribution and a discrete uniform distribution are assumed as the prior distributions of continuous and discrete gauge transformations, respectively, where an Earth Mover's distance and a KL divergence are duly applied to measure the distribution discrepancy for regularization.

Learning the gauge transformation with regularization usually incurs a high computation cost, which is infeasible for some practical applications such as fast scene representation. In line with relative information conservation, we directly derive an Information-Invariant (InfoInv) gauge transformation which preserves scene information inherently and obviates the need for regularization.
Particularly, the derived InfoInv coincides with the form of the position encoding adopted in NeRF (Mildenhall et al., 2021), which provides a certain rationale for the effectiveness of position encoding in neural fields.

The contributions of this work are threefold. First, we develop a general framework of neural gauge fields which unifies various gauge transformations in neural fields and gives general learning forms for continuous and discrete gauge transformations. Second, we strictly derive a regularization mechanism for gauge transformation learning from the perspective of information conservation during gauge transformation, which outperforms earlier heuristic approaches. Third, we present an information-invariant gauge transformation to preserve scene information inherently during gauge transformations.

Figure 2: Continuous gauge transformation from a 3D scene to a 2D square. Without regularization, the learned gauge transformation collapses to a small region in the 2D square. Adding cycle regularization (Xiang et al., 2021) or structural regularization (StruReg) (Tretschk et al., 2021) does not fully solve the problem, while our InfoReg achieves clearly better regularization performance.

Figure 3: Discrete gauge transformation from a 3D scene to 256 discrete vectors. Without regularization, the learned gauge transformation collapses to a small number of vector indices (horizontal axis). Learning with cycle regularization (Xiang et al., 2021) is still significantly collapse-prone, while our InfoReg alleviates the collapse substantially.

2 RELATED WORK

Recent work has demonstrated the potential of neural radiance fields (Mildenhall et al., 2021) and its extensions for multifarious vision and graphics applications, including fast view synthesis (Liu et al., 2020; Yu et al., 2021b; Hedman et al., 2021; Lindell et al., 2021; Neff et al., 2021; Yu et al., 2021a; Reiser et al., 2021; Sun et al., 2022), generative models (Schwarz et al., 2020; Niemeyer & Geiger, 2021; Gu et al., 2022; Chan et al., 2021; Or-El et al., 2022), surface reconstruction (Wang et al., 2021; Oechsle et al., 2021; Yariv et al., 2021), etc. In this context, various gauge transformations have been intensely explored in neural fields for different purposes, e.g., reducing computation cost (Chan et al., 2022), fast convergence (Chen et al., 2022; Müller et al., 2022), and editable UV maps (Xiang et al., 2021; Ma et al., 2022). A pre-defined mapping function is usually adopted as the gauge transformation. A simple case is orthogonal mapping, which projects 3D space into a continuous low-dimensional space. For instance, EG3D (Chan et al., 2022; Peng et al., 2020) orthogonally projects 3D space into a single plane or a tri-plane space for scene modeling. TensoRF (Chen et al., 2022) further extends this line of research for fast and efficient optimization by mapping the 3D scene space into a set of low-dimensional tensors. With a pre-defined spatial hash function, the recent Instant-NGP (Müller et al., 2022) maps 3D grid points to hash table indices and achieves a speedup of several orders of magnitude.
Besides, PREF (Huang et al., 2022) explores transforming 3D scene points into a set of Fourier bases via the Fourier transform, which helps to reduce the aliasing caused by interpolation. Some works also explore learning the gauge transformation for downstream tasks in neural fields. For example, NeuTex (Xiang et al., 2021) and NeP (Ma et al., 2022) propose to learn the mapping from 3D points to 2D texture spaces; GRAM (Deng et al., 2022) learns to map the 3D space into a set of implicit 2D surfaces; VQ-AD (Takikawa et al., 2022) enables compressive scene representation by mapping coordinates to a discrete codebook, which is a discrete case of gauge transformation. However, all of the above works suffer from gauge collapse without regularization or carefully designed initialization. Besides, a popular line of research (Tretschk et al., 2021; Tewari et al., 2022; Park et al., 2020; Pumarola et al., 2020; Liu et al., 2021; Peng et al., 2021) learns a deformation from the actual 3D space into another (canonical) 3D space, where a structural regularization is usually applied by predicting a deformation offset instead of the absolute coordinate.

3 NEURAL GAUGE FIELDS

This section starts with the introduction of the neural gauge fields framework, followed by the derivation of a learning regularization based on information conservation. The information-invariant gauge transformation, which aims to preserve scene information inherently, is then introduced and derived. Some discussion and insights for the application of gauge transformations are drawn at the end.

3.1 NEURAL GAUGE FIELDS FRAMEWORK

In normal usage, a gauge defines a measuring system, e.g., a pressure gauge or a temperature gauge. In the context of neural fields, a measuring system (i.e., gauge) is a specification of parameters to index a neural field, e.g., the 3D Cartesian coordinate system, the tri-plane in EG3D (Chan et al., 2022), or the hash table in Instant-NGP (Müller et al., 2022). The transformation between different measuring systems is dubbed a Gauge Transformation. Based on this intuition, we introduce the general framework of Neural Gauge Fields, which consists of a gauge transformation and a neural field. A neural gauge field aims to transform the original space to another gauge system to index a neural field. This additional transformation can bring certain benefits to the neural field, e.g., low memory cost, high rendering quality, or explicit texture, depending on the purpose of the model. The gauge transformation can be a pre-defined function, e.g., the orthogonal mapping in EG3D and the spatial hash function in Instant-NGP, or directly learned by a neural network along with the neural field. Intuitively, compared with a pre-defined function, a learnable gauge transformation is more favorable as it can be optimized towards the final use case of the neural field and possibly yields better performance. For learning neural gauge fields, we distinguish two cases: continuous (e.g., 2D plane and sphere surface) and discrete (e.g., hash table space) mappings.

Continuous Gauge Transformation. As the target gauge system is continuous, the gauge transformation can be handled by directly regressing the coordinate (i.e., gauge parameters) in the target gauge. For a point $x \in \mathbb{R}^3$ in the 3D space and a target gauge, a gauge network $M$ (an MLP or a transformation matrix) can be employed to predict the target coordinate as $x' = M(x)$, $x' \in \mathbb{R}^N$, where $N$ is the coordinate dimension of the target gauge. Then, the predicted coordinate $x'$ can be fed into an MLP or used to look up feature vectors to model the neural field.
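As an illustration, below is a minimal PyTorch sketch of such a continuous gauge network; the MLP depth, hidden width, and the bounded unit-square output are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ContinuousGaugeNetwork(nn.Module):
    """Maps 3D points x in R^3 to coordinates in an N-dimensional target gauge.

    Minimal sketch: hidden width and depth are hypothetical, and the final sigmoid
    simply confines the output to a unit square/cube so it can index a bounded
    continuous gauge such as a 2D plane (target_dim=2).
    """

    def __init__(self, target_dim: int = 2, hidden_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, target_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 3) 3D coordinates -> (B, target_dim) gauge coordinates in [0, 1].
        return torch.sigmoid(self.mlp(x))

# Usage: map a batch of 3D points onto a 2D plane gauge; the output can then be
# fed to an MLP or used to sample a 2D feature grid.
points = torch.rand(1024, 3)
uv = ContinuousGaugeNetwork(target_dim=2)(points)  # (1024, 2)
```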
Discrete Gauge Transformation. Without loss of generality, we assume there is a codebook with $N$ discrete vectors $[v_1, v_2, \dots, v_N]$, and the discrete gauge transformation aims to transform a 3D point to a discrete codebook index (i.e., gauge parameter). In line with the setting of Instant-NGP, we divide the 3D space into a grid of size $M \times M \times M$. For each grid point $x$, we apply a gauge network $M$ to predict its discrete distribution $P_x \in \mathbb{R}^N$ over the $N$ discrete vectors, followed by a Top-k (k=1 by default) operation to select the Top-k most probable indices $i_1, i_2, \dots, i_k$ as below:

$$P_x = M(x), \quad \{i_1, i_2, \dots, i_k\} = \text{Top-}k(P_x), \quad P_x \in \mathbb{R}^N. \tag{1}$$

The produced indices are usually used to look up the corresponding vectors from the codebook,¹ and the features of other 3D points can be acquired by interpolating the nearby grid point vectors. As the Argmax in the Top-k operation is not differentiable, a reparameterization trick (Bengio et al., 2013) is applied to approximate the gradient by replacing Argmax with Softmax in back-propagation. The pseudo code of the forward & backward propagation for discrete cases is given in Algorithm 1.

Algorithm 1: Pseudo code of forward & backward propagation in learning a discrete gauge transformation.
Input: a 3D point x, predicted distribution Px = [p1, p2, ..., pN], codebook V.
Forward propagation:
  1. index = Argmax(Px)
  2. index_hard = OneHot(index)
  3. feature lookup: vx = Matmul(index_hard, V)
Backward propagation:
  1. index_soft = Softmax(Px)
  2. feature lookup: vx = Matmul(index_soft, V)
Output: feature vx.

We have now established the general learning paradigms for continuous and discrete gauge transformations. In our setting, the gauge transformations are optimized along with the neural field, supervised by the per-pixel color loss between the rendering and the ground-truth multi-view images. However, we found that the learning of gauge transformations easily gets stuck in local minima. Specifically, the gauge transformation tends to collapse to a small region in continuous cases as in Fig. 2 or to a small number of indices in discrete cases as in Fig. 3, highlighting the need for an additional regularization to enable effective learning of continuous and discrete gauge transformations, which we introduce in the following.

¹Actually, the predicted indices could also be fed into an MLP for embedding, but this tends to be meaningless in practical applications and is seldom adopted.

3.2 REGULARIZATION FOR LEARNING GAUGE TRANSFORMATIONS

Some previous works (Xiang et al., 2021; Ma et al., 2022) leverage a cycle consistency loss to regularize the mapping, preventing many-to-one mappings (gauge collapse) of 3D points. Alternatively, a structural regularization is employed in other works (Tretschk et al., 2021; Tewari et al., 2022) to preserve local structure by predicting a residual offset instead of the absolute coordinate. However, the cycle loss only works for certain continuous cases and shows poor performance for the regularization of discrete cases, while the structural regularization cannot be applied to discrete cases at all. In this work, we strictly derive a general regularization mechanism for gauge transformations from the perspective of information conservation, which is superior to the heuristic regularizers used in prior work.
Specifically, although the gauge of the scene has been changed, the intrinsic information of the scene is expected to be preserved during the gauge transformation, which can be achieved by maximizing the mutual information between the original 3D points and the target gauge parameters. With the gauge network $M$ for gauge transformation, the sets of original 3D points and target gauge parameters can be denoted as $X$ and $Y = M(X)$, respectively. Then, the mutual information between $X$ and $Y$ can be formulated as:

$$I(X, Y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx\, dy = \iint p(y|x)\,p(x) \log \frac{p(y|x)}{p(y)} \, dx\, dy, \tag{2}$$

where $p(x)$ and $p(y)$ are the distributions of the original 3D point $x$ and the target gauge parameter $y$, and $p(y|x)$ is the conditional distribution of $y$ produced by the gauge transformation of $x$. To avoid that the distribution of $Y$ collapses into a local region or a small number of indices, we assume that $Y$ obeys a prior uniform distribution $q(y)$, and the KL divergence between $p(y)$ and $q(y)$ can be written as $\mathrm{KL}(p(y)\,\|\,q(y)) = \int p(y) \log \frac{p(y)}{q(y)} \, dy$. Then, the final regularization loss translates into the following term (see the detailed derivation in the Appendix):

$$\mathcal{L}_{reg} = \min_{p(y|x)} \Big\{ -(\gamma + \epsilon)\, \mathbb{E}\big[\mathrm{KL}(p(y|x)p(x)\,\|\,p(y)p(x))\big] + \epsilon\, \mathbb{E}\big[\mathrm{KL}(p(y|x)\,\|\,q(y))\big] \Big\}, \tag{3}$$

where $\gamma$ and $\epsilon$ denote the weights of the regularization and the prior distribution discrepancy. Regarding the first term $\mathrm{KL}(p(y|x)p(x)\,\|\,p(y)p(x))$ in Eq. 3, we can replace the unbounded KL divergence with the upper-bounded JS divergence $\mathrm{JS}(p(y|x)p(x)\,\|\,p(y)p(x))$.

Lemma 1 (Jensen-Shannon Mutual Information Estimator (Nowozin et al., 2016)):

$$\mathrm{JS}(P, Q) = \max_{T} \Big( \mathbb{E}_{x \sim p(x)}\big[-\mathrm{sp}(-T(x))\big] - \mathbb{E}_{x \sim q(x)}\big[\mathrm{sp}(T(x))\big] \Big), \tag{4}$$

where $\mathrm{sp}(z) = \log(1 + e^{z})$. For $\mathrm{JS}(p(y|x)p(x)\,\|\,p(y)p(x))$, we thus have:

$$\max_{T} \Big( \mathbb{E}_{(x^{+},\,y) \sim p(x)p(y|x)}\big[-\mathrm{sp}(-T(x^{+}, y))\big] - \mathbb{E}_{(x^{-},\,y) \sim p(x)p(y)}\big[\mathrm{sp}(T(x^{-}, y))\big] \Big), \tag{5}$$

where $T$ denotes a network which minimizes the distance of positive pairs and maximizes the distance of negative pairs. Please refer to Section 4.1 for the detailed implementation of $T$. Actually, CycleReg is a special implementation of InfoReg that discards the prior distribution discrepancy.
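As a sketch of how the estimator in Eq. (5) might be implemented, the snippet below uses a small critic network T that scores matched (positive) pairs against in-batch shuffled (negative) pairs; the critic architecture and the shuffling-based negative sampling are assumptions for illustration, and the radiance-contribution weighting of positive pairs (Appendix A.3) is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    """Critic T(x, y): scores a 3D point x against a gauge parameter y."""

    def __init__(self, x_dim: int = 3, y_dim: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def js_mutual_information(critic: Critic, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """JS estimate of I(X, Y) as in Eqs. (4)-(5): positives are matched (x, y) pairs
    from p(x)p(y|x); negatives pair x with a shuffled y, approximating p(x)p(y)."""
    y_shuffled = y[torch.randperm(y.shape[0])]
    pos = -F.softplus(-critic(x, y)).mean()         # E[-sp(-T(x+, y))]
    neg = F.softplus(critic(x, y_shuffled)).mean()  # E[ sp( T(x-, y))]
    return pos - neg

# The training loss subtracts (gamma + eps) times this estimate, so both the critic
# and the gauge network are updated to increase it.
```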
With the first term of Eq. (3) handled, we turn to the second term $\mathbb{E}[\mathrm{KL}(p(y|x)\,\|\,q(y))]$, i.e., the KL divergence between $p(y|x)$ and $q(y)$. This term can be solved according to the type of prior uniform distribution $q(y)$ assumed in the continuous and discrete cases.

Continuous Gauge Transformation. To prevent the gauge transformation from collapsing to a local region, we expect the prior distribution $q(y)$ to be a uniform distribution over the target gauge space. Without loss of generality, we assume the target gauge space is a 2D plane as shown in Fig. 4 (a). Then, we uniformly sample $h$ points on the 2D plane as a discrete approximation of the prior uniform distribution, denoted by $q(y) = [u_1, u_2, \dots, u_h]$, $u_i \in \mathbb{R}^2$. With a set of 3D points transformed into 2D points, the conditional distribution $p(y|x)$ can also be approximated by sampling $h$ points from the predicted 2D points according to radiance contribution (see A.3), denoted by $p(y|x) = [v_1, v_2, \dots, v_h]$, $v_i \in \mathbb{R}^2$. Considering that $q(y)$ and $p(y|x)$ are distributions over a geometric space, the KL divergence is ill-suited to measure their discrepancy, as it ignores the geometry of the distributions. We thus resort to a geometry-aware metric: the Earth Mover's Distance (EMD) (Rubner et al., 2000).

Figure 4: (a) Earth Mover's distance to regularize the discrepancy between p(y|x) and q(y) in the continuous case (2D plane), where p(y|x) and q(y) are the predicted 2D points within a batch and uniformly sampled points in the 2D plane, respectively. (b) KL divergence to regularize the distribution discrepancy in the discrete case (N basis vectors), where p(y|x) and q(y) are the average of the predicted one-hot distributions and a discrete uniform distribution, respectively.

Specifically, a cost matrix $C$ and a transport matrix $M$ of size $(h, h)$ can be defined, where each entry $C_{ij}$ in $C$ gives the distance between points $v_i$ and $u_j$, and each entry $M_{ij}$ in $M$ represents the amount of probability moved between points $v_i$ and $u_j$. Then, the EMD between $q(y)$ and $p(y|x)$ can be formulated as:

$$\mathcal{L}_{EMD} = \min_{M} \sum_{i=1}^{h} \sum_{j=1}^{h} C_{ij} M_{ij}.$$

The distribution of predicted points is regularized to fit the uniform distribution by minimizing $\mathcal{L}_{EMD}$, as shown in Fig. 4 (a).

Discrete Gauge Transformation. For the discrete gauge transformation, we expect the prior distribution $q(y)$ to be a uniform distribution over the discrete vectors (i.e., the gauge basis), so that the target gauge parameters will not collapse to a small number of vector indices. Thus, the prior distribution can be denoted by $q(y) = [\frac{1}{N}, \frac{1}{N}, \dots, \frac{1}{N}]$ for $N$ discrete vectors. As the one-hot distribution $p_i$ is predicted from 3D points as described in Algorithm 1, the conditional distribution $p(y|x)$ can be approximated by the average of all predicted one-hot distributions over a training batch, i.e., $p(y|x) = \sum_{i=1}^{B} p_i / B$, where $B$ is the number of samples within a training batch. Then the KL divergence between $q(y)$ and $p(y|x)$ can be duly computed as shown in Fig. 4 (b).

3.3 INFORMATION-INVARIANT GAUGE TRANSFORMATION

Learning a gauge transformation and applying regularization to preserve information usually incur additional computation cost, which is not desired in many applications. We thus introduce an Information-Invariant (InfoInv) gauge transformation which preserves scene information inherently and avoids the cumbersome process of learning with regularization. Specifically, for two positions $m$ and $n$, we apply a gauge transformation $g$ to yield their new gauge parameters $g(m)$ and $g(n)$. As we aim to preserve the scene information, the relative position relationship between points should be preserved accordingly, i.e.:

$$\mathrm{Cos}(g(m), g(n)) = \frac{\langle g(m), g(n)\rangle}{\|g(m)\|\,\|g(n)\|} = h(m - n), \tag{6}$$

where $\mathrm{Cos}$ and $\langle\cdot,\cdot\rangle$ denote cosine similarity and the inner product, and $h$ is a function of $m - n$. For simplicity, we assume $g(m)$ and $g(n)$ are two-dimensional vectors. Thus, $\langle g(m), g(n)\rangle$ can be written as $\Re[g(m)g^{*}(n)]$, where $\Re$ denotes the real part of a complex value. With the exponential form of a complex number, we have:

$$g(m) = \|g(m)\|\, e^{i\Theta(m)}, \quad g(n) = \|g(n)\|\, e^{i\Theta(n)}. \tag{7}$$

Then Eq. (6) can be rewritten as $\mathrm{Cos}(g(m), g(n)) = \frac{\Re[g(m)g^{*}(n)]}{\|g(m)\|\,\|g(n)\|} = \Re[e^{i(\Theta(m) - \Theta(n))}] = h(m - n)$. One closed-form solution is $\Theta(m) = m\theta$, which further yields $g(m) = \|g(m)\|\, e^{im\theta}$. Specifically, $e^{im\theta}$ can be written as $[\cos(m\theta), \sin(m\theta)]$. In implementation, $\|g(m)\|$ can be approximated by a grid or absorbed by a network, and $\theta$ is a pre-defined or learned parameter. The two-dimensional case of $g(m)$ can be easily generalized to higher dimensions by stacking multiple two-dimensional cases, whose form coincides with the heuristic position encoding adopted in NeRF (Mildenhall et al., 2021). The yielded parameters $\|g(m)\|$ and $[\cos(m\theta), \sin(m\theta)]$ can be leveraged to index neural fields via MLP networks or feature grids.
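A minimal sketch of this InfoInv parameterization is given below; the geometric frequency schedule for θ is an illustrative assumption (θ can equally be learned), and the resulting features are used to index an MLP or a feature grid.

```python
import torch

def info_inv_encoding(m: torch.Tensor, thetas: torch.Tensor) -> torch.Tensor:
    """Information-invariant gauge parameters for positions m.

    For each frequency theta, a position m is mapped to (cos(m*theta), sin(m*theta)),
    i.e. the real/imaginary parts of e^{i m theta}; several two-dimensional cases are
    stacked along the last dimension.
    m: (..., D) coordinates, thetas: (F,) frequencies -> (..., D * 2F) features.
    """
    angles = m.unsqueeze(-1) * thetas                                # (..., D, F)
    enc = torch.cat([torch.cos(angles), torch.sin(angles)], dim=-1)  # (..., D, 2F)
    return enc.flatten(start_dim=-2)                                 # (..., D * 2F)

# Usage with an assumed geometric frequency schedule, reminiscent of NeRF's
# position encoding; ||g(m)|| would be supplied by a grid or absorbed by a network.
points = torch.rand(1024, 3)
thetas = 2.0 ** torch.arange(6) * torch.pi
features = info_inv_encoding(points, thetas)  # (1024, 36)
```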
| Regularizations | Continuous: S-NeRF | S-NSVF | DTU | T&T | Discrete: S-NeRF | S-NSVF | DTU | T&T |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline (w/o Regularization) | 0.021 | 0.017 | 0.014 | 0.019 | 0.102 | 0.094 | 0.063 | 0.078 |
| +StruReg (Tretschk et al., 2021) | 0.372 | 0.360 | 0.325 | 0.354 | N/A | N/A | N/A | N/A |
| +CycleReg (Xiang et al., 2021) | 0.547 | 0.562 | 0.568 | 0.522 | 0.105 | 0.086 | 0.076 | 0.088 |
| +InfoReg | 0.793 | 0.752 | 0.772 | 0.806 | 0.992 | 1.000 | 0.996 | 1.000 |

Table 1: Space occupancy of the target gauge with different regularizations for the learning of continuous gauge transformation (3D space → 2D plane) and discrete gauge transformation (3D space → 256 vectors).

| Gauge Transformations | S-NeRF PSNR | SSIM | S-NSVF PSNR | SSIM | DTU PSNR | SSIM | T&T PSNR | SSIM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Orthogonal Mapping † | 18.20 | 0.816 | 19.32 | 0.833 | 23.31 | 0.845 | 19.97 | 0.829 |
| Learning Continuous Gauge † | 30.74 | 0.919 | 32.51 | 0.949 | 27.06 | 0.873 | 25.95 | 0.867 |
| Spatial Hash Function ‡ | 27.06 | 0.902 | 29.81 | 0.932 | 27.02 | 0.876 | 26.22 | 0.869 |
| Learning Discrete Gauge ‡ | 26.56 | 0.893 | 29.13 | 0.925 | 26.21 | 0.863 | 24.79 | 0.855 |
| Learning Top-4 Gauge ‡ | 29.39 | 0.917 | 31.39 | 0.942 | 28.77 | 0.887 | 27.14 | 0.892 |

Table 2: Performance of different methods (pre-defined or learned) for gauge transformations. † and ‡ indicate continuous gauge transformation (3D space → 2D plane) and discrete gauge transformation (3D space → 256 vectors), respectively. Orthogonal mapping and the spatial hash function are typical pre-defined functions for continuous and discrete gauge transformations, respectively.

3.4 DISCUSSION

As a general framework, learning gauge transformations with InfoReg or applying the InfoInv gauge transformation can benefit various computer vision and computer graphics tasks. As concrete examples, learning a gauge transformation with InfoReg can be applied to 3D space → 2D plane mapping to learn explicit and editable textures (Xiang et al., 2021), or to 3D space → codebook mapping to yield compressed neural fields (Takikawa et al., 2022). However, due to its relatively high computation cost, learning a gauge transformation with InfoReg cannot be applied naively to fast scene representation or reconstruction. We leave this efficient extension to future work. As a kind of gauge transformation, InfoInv can be applied to index neural fields directly with a multi-scale setting (depending on how many different θ are selected). Notably, InfoInv preserves scene information inherently without cumbersome regularization, which means it can be seamlessly combined with other gauge transformations (e.g., tri-plane projection and spatial hash functions) to achieve fast scene representation and promote rendering performance.

4 EXPERIMENTS

4.1 IMPLEMENTATION

The gauge network $M$ for learning gauge transformations can be an MLP network or a transformation matrix, depending on the specific downstream application. For the mapping from 3D space to a 2D plane to obtain (view-dependent) textures, the neural field is modeled by an MLP-based network which takes the predicted 2D coordinates, i.e., the output of the gauge network, and a view direction to predict color and density. For the mapping from 3D space to discrete codebooks, the neural field is modeled by looking up features from the codebook, followed by a small MLP of two layers to predict color and density. By default, the codebook has two layers, and each layer contains 256 vectors of dimension 128. In line with Instant-NGP (Müller et al., 2022), the 3D space is also divided into two-level 3D grids of size 16×16×16 and 32×32×32 for the discrete gauge transformation.
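As a sketch of this discrete pipeline, the snippet below follows the straight-through lookup of Algorithm 1 (top-1 case) with a single-level codebook and a two-layer head; all sizes are illustrative, and interpolation across grid levels is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteGaugeField(nn.Module):
    """Gauge network predicting a codebook index per grid point, plus a small head.

    Sketch only: a single-level codebook, top-1 lookup, and hypothetical sizes.
    """

    def __init__(self, num_vectors: int = 256, feat_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.gauge_net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, num_vectors))
        self.codebook = nn.Parameter(torch.randn(num_vectors, feat_dim))
        self.head = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 4))  # RGB + density

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.gauge_net(x)                        # (B, N) distribution P_x (unnormalized)
        soft = F.softmax(logits, dim=-1)                  # used for the backward pass
        hard = F.one_hot(soft.argmax(dim=-1), soft.shape[-1]).float()  # Argmax + OneHot
        # Straight-through: the forward value equals the hard one-hot, gradients flow via softmax.
        index = hard + soft - soft.detach()
        feat = index @ self.codebook                      # feature lookup: Matmul(index, V)
        return self.head(feat)                            # (B, 4) color + density

# Usage on a batch of grid points; features of other 3D points would be interpolated.
out = DiscreteGaugeField()(torch.rand(4096, 3))
```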
Figure 5: Qualitative results of continuous gauge transformation from 3D space to a 2D plane on (a) 360° view scenes and (b) limited-view scenes, comparing rendered images with the learned mapping and the orthogonal mapping.

Figure 6: Ablation study of the regularization weight γ and the prior distribution discrepancy weight ϵ, for the continuous case (3D space → 2D plane) and the discrete case (3D space → 256 vectors).

4.2 EVALUATION METRICS

The introduced neural gauge fields focus on learning gauge transformations; the experiments are therefore performed to evaluate the performance of gauge transformations. By default, the experiments are conducted on the continuous gauge transformation from 3D space to a 2D plane and the discrete gauge transformation from 3D space to 256 codebook vectors. Specifically, the effectiveness of regularization can be evaluated by measuring the space occupancy of the target gauge, which refers to the percentage of the mapped area in continuous cases or the utilization ratio of the codebook in discrete cases. Besides, as a gauge transformation can be applied to index a radiance field, the gauge transformation can also be evaluated by the novel view synthesis performance of the radiance field.

4.3 EVALUATION RESULTS

Regularization Performance. We compare different regularization methods including structural regularization (StruReg), cycle regularization (CycleReg), and our InfoReg. As tabulated in Table 1, the proposed InfoReg clearly outperforms the other regularizations in both continuous and discrete cases. Particularly, StruReg and CycleReg bring certain improvements in continuous cases, while StruReg cannot be applied to discrete cases and CycleReg only brings marginal gains in discrete cases.

Pre-defined vs. Learned Mapping. As shown in Table 2, we compare the performance of pre-defined functions and learned mappings for gauge transformation. For continuous cases, the learned mapping with InfoReg clearly outperforms the pre-defined orthogonal mapping. For discrete cases, the standard learned discrete mapping (Top-1 gauge) is slightly inferior to the pre-defined spatial hash function, while the Top-4 variant of the learned discrete mapping surpasses the pre-defined function substantially.

Qualitative Results. To demonstrate the superiority of the learned gauge transformation over the pre-defined function (i.e., orthogonal mapping), we visualize the mapping results from 3D space to a 2D plane in Fig. 5. Fig. 5 (a) illustrates the mapping on challenging 360° view scenes. The learned mapping can still capture the scene texture well, while the orthogonal mapping presents poor mapping performance and even fails in certain cases. Fig. 5 (b) compares the mapping on limited-view scenes.
The orthogonal mapping is still prone to losing much texture information due to its inherent inflexibility.

Figure 7: The left two plots: rendering performance (256 vectors) and codebook utilization (256 and 1024 vectors) with different k values in Top-k gauges. The right two plots: the performance gain with the inclusion of InfoInv on grid-based and MLP-based neural fields.

| Models | Steps | Time (s) | Synthetic-NeRF PSNR | SSIM | Synthetic-NSVF PSNR | SSIM |
| --- | --- | --- | --- | --- | --- | --- |
| TensoRF | 30k | 708 | 32.32 | 0.958 | 35.32 | 0.976 |
| TensoRF+InfoInv | 30k | 721 | 32.96 | 0.962 | 35.81 | 0.979 |
| Instant-NGP | 30k | 458 | 32.43 | 0.960 | 35.43 | 0.977 |
| Instant-NGP+InfoInv | 30k | 496 | 32.91 | 0.962 | 36.01 | 0.981 |

Table 3: The rendering performance gain obtained by combining our information-invariant gauge transformation with SOTA methods on Synthetic-NeRF and Synthetic-NSVF. Notably, only a very slight computation and memory cost is introduced by including neural gauge fields.

Top-k Ablation. As defined in Eq. (1), different k can be selected for the discrete gauge transformation. Figs. 7 (1) and (2) ablate the rendering performance and codebook utilization with different k values. Both the rendering performance and the codebook utilization improve consistently as k increases, which means the gauge collapse is gradually alleviated. The performance gain brought by InfoReg also narrows as k increases. We conjecture that a larger k leads to a higher information capacity, which alleviates the information loss in the gauge transformation.

InfoReg Parameter Study. Fig. 6 examines the effect of the regularization weight γ and the prior discrepancy weight ϵ in Eq. (3) on the Lego scene. With an increasing regularization weight, the regularization performance of InfoReg improves consistently, although it saturates at large weights. The regularization performance correlates positively with a small prior discrepancy weight and negatively with a large prior discrepancy weight.

InfoInv Performance. To demonstrate the power of the information-invariant gauge transformation, we apply InfoInv to both grid-based and MLP-based neural fields for novel view synthesis. As shown in Figs. 7 (3) and (4), both the convergence speed and the final performance of the tri-plane projection and the spatial hash function improve with the inclusion of InfoInv. Besides, we also combine InfoInv with SOTA methods including TensoRF and Instant-NGP. Specifically, $\|g(m)\|$ is approximated by the indexed feature in the original TensoRF or Instant-NGP, while $\cos(m\theta)$ and $\sin(m\theta)$ are used to index an auxiliary grid. As shown in Table 3, InfoInv boosts these SOTA methods consistently with a negligible increase in training time and model size.

5 CONCLUSION

This work introduces a general framework of neural gauge fields for unifying various gauge transformations in neural fields.
Aiming to optimize the gauge transformation along with neural fields, we establish learning paradigms for continuous and discrete gauge transformations and derive a regularization mechanism to address the issue of gauge collapse from the perspective of information preservation. As optimizing the gauge transformation with regularization usually incurs a high computation cost, an information-invariant gauge transformation is introduced to perform the transformation instantly and preserve scene information with negligible computation cost. Extensive experiments and thorough analysis show that the derived gauge transformation methods outperform their vanilla counterparts, which demonstrates the potential of gauge transformations in neural fields.

6 ETHICS STATEMENT

Our neural gauge fields focus on technically advancing neural scene representations and are not biased towards a specific race, gender, or region. As a general framework, neural gauge fields can be applied to various fields, e.g., virtual reality, artistic creation, and game design. On the other hand, the achievements in image synthesis quality carry the risk of being misused, e.g., illegal impersonation of actors or face synthesis for malicious purposes. Therefore, it is also important to explore safeguarding technologies that can reduce the potential for misuse. One potential way is to use our approach, as well as other image synthesis approaches, as a discriminator to identify synthesized images automatically. This has been actively explored by the community, e.g., Mirsky & Lee (2021). Alternatively, one could explore watermarking technologies to identify synthesized images. Besides, access control should also be carefully considered when deploying the relevant techniques to alleviate the risk of misuse. Finally, we believe creating appropriate regulations and laws is essential to solving the issue of technology misuse while promoting its positive effects on technology development.

7 REPRODUCIBILITY STATEMENT

We attach the source code of neural gauge fields in the supplementary material. Additionally, we provide detailed implementations in the manuscript that enable a full reproduction of the experimental results. For example, we specify the choices of rendering resolution, model architecture, and the codebook setting for the discrete gauge transformation. All datasets used in our experiments are publicly accessible. Finally, all pre-trained models will be made publicly available upon the publication of this work. These implementations will enable researchers to reproduce the results as well as extend this work in future research.

REFERENCES

Alexei Baevski, Steffen Schneider, and Michael Auli. vq-wav2vec: Self-supervised learning of discrete speech representations. In International Conference on Learning Representations, 2019.

Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.

Eric R Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein. pi-GAN: Periodic implicit generative adversarial networks for 3d-aware image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5799-5809, 2021.

Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16123-16133, 2022.

Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. TensoRF: Tensorial radiance fields. arXiv preprint arXiv:2203.09517, 2022.

Yu Deng, Jiaolong Yang, Jianfeng Xiang, and Xin Tong. GRAM: Generative radiance manifolds for 3d-aware image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10673-10683, 2022.

Jiatao Gu, Lingjie Liu, Peng Wang, and Christian Theobalt. StyleNeRF: A style-based 3d-aware generator for high-resolution image synthesis. In International Conference on Learning Representations, 2022.

Peter Hedman, Pratul P Srinivasan, Ben Mildenhall, Jonathan T Barron, and Paul Debevec. Baking neural radiance fields for real-time view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5875-5884, 2021.

Binbin Huang, Xinhao Yan, Anpei Chen, Shenghua Gao, and Jingyi Yu. PREF: Phasorial embedding fields for compact neural representations. arXiv preprint arXiv:2205.13524, 2022.

Lukasz Kaiser, Samy Bengio, Aurko Roy, Ashish Vaswani, Niki Parmar, Jakob Uszkoreit, and Noam Shazeer. Fast decoding in sequence models using discrete latent variables. In International Conference on Machine Learning, pp. 2390-2399. PMLR, 2018.

David B Lindell, Julien NP Martel, and Gordon Wetzstein. AutoInt: Automatic integration for fast neural volume rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14556-14565, 2021.

Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. Neural sparse voxel fields. Advances in Neural Information Processing Systems, 33:15651-15663, 2020.

Lingjie Liu, Marc Habermann, Viktor Rudnev, Kripasindhu Sarkar, Jiatao Gu, and Christian Theobalt. Neural actor: Neural free-view synthesis of human actors with pose control. ACM Trans. Graph. (ACM SIGGRAPH Asia), 2021.

Li Ma, Xiaoyu Li, Jing Liao, Xuan Wang, Qi Zhang, Jue Wang, and Pedro Sander. Neural parameterization for dynamic human head editing. arXiv preprint arXiv:2207.00210, 2022.

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99-106, 2021.

Yisroel Mirsky and Wenke Lee. The creation and detection of deepfakes: A survey. ACM Computing Surveys (CSUR), 54(1):1-41, 2021.

Keihachiro Moriyasu. An elementary primer for gauge theory. World Scientific, 1983.

Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. arXiv preprint arXiv:2201.05989, 2022.

Thomas Neff, Pascal Stadlbauer, Mathias Parger, Andreas Kurz, Joerg H Mueller, Chakravarty R Alla Chaitanya, Anton Kaplanyan, and Markus Steinberger. DONeRF: Towards real-time rendering of compact neural radiance fields using depth oracle networks. In Computer Graphics Forum, 2021.

Michael Niemeyer and Andreas Geiger. GIRAFFE: Representing scenes as compositional generative neural feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11453-11464, 2021.

Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. f-GAN: Training generative neural samplers using variational divergence minimization. Advances in Neural Information Processing Systems, 29, 2016.
Michael Oechsle, Songyou Peng, and Andreas Geiger. UNISURF: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5589-5599, 2021.

Roy Or-El, Xuan Luo, Mengyi Shan, Eli Shechtman, Jeong Joon Park, and Ira Kemelmacher-Shlizerman. StyleSDF: High-resolution 3d-consistent image and geometry generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13503-13513, June 2022.

Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Steven M. Seitz, and Ricardo Martin-Brualla. Deformable neural radiance fields. arXiv preprint arXiv:2011.12948, 2020.

Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. Animatable neural radiance fields for modeling dynamic human bodies. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14314-14323, 2021.

Songyou Peng, Michael Niemeyer, Lars Mescheder, Marc Pollefeys, and Andreas Geiger. Convolutional occupancy networks. In European Conference on Computer Vision, pp. 523-540. Springer, 2020.

Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. D-NeRF: Neural radiance fields for dynamic scenes, 2020.

Christian Reiser, Songyou Peng, Yiyi Liao, and Andreas Geiger. KiloNeRF: Speeding up neural radiance fields with thousands of tiny MLPs. In International Conference on Computer Vision (ICCV), 2021.

Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99-121, 2000.

Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. GRAF: Generative radiance fields for 3d-aware image synthesis. Advances in Neural Information Processing Systems, 33:20154-20166, 2020.

Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5459-5469, 2022.

Towaki Takikawa, Alex Evans, Jonathan Tremblay, Thomas Müller, Morgan McGuire, Alec Jacobson, and Sanja Fidler. Variable bitrate neural fields. In ACM SIGGRAPH 2022 Conference Proceedings, pp. 1-9, 2022.

Matthias Teschner, Bruno Heidelberger, Matthias Müller, Danat Pomerantes, and Markus H Gross. Optimized spatial hashing for collision detection of deformable objects. In VMV, volume 3, pp. 47-54, 2003.

Ayush Tewari, Xingang Pan, Ohad Fried, Maneesh Agrawala, Christian Theobalt, et al. Disentangled3D: Learning a 3d generative model with disentangled geometry and appearance from monocular images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1516-1525, 2022.

Edgar Tretschk, Ayush Tewari, Vladislav Golyanik, Michael Zollhöfer, Christoph Lassner, and Christian Theobalt. Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12959-12970, 2021.

Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. Advances in Neural Information Processing Systems, 30, 2017.

Robert M Wald. General relativity. 2010.

Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang.
NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. Advances in Neural Information Processing Systems, 34:27171-27183, 2021.

Fanbo Xiang, Zexiang Xu, Milos Hasan, Yannick Hold-Geoffroy, Kalyan Sunkavalli, and Hao Su. NeuTex: Neural texture mapping for volumetric neural rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7119-7128, 2021.

Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces. In Thirty-Fifth Conference on Neural Information Processing Systems, 2021.

Alex Yu, Sara Fridovich-Keil, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. arXiv preprint arXiv:2112.05131, 2021a.

Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. PlenOctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5752-5761, 2021b.

A.1 GAUGE TRANSFORMATION BACKGROUND

In normal usage, a gauge refers to a measuring instrument or measuring system, e.g., a pressure gauge or a temperature gauge. The transformation between different gauges (e.g., the Fahrenheit and Celsius temperature scales) can be termed a Gauge Transformation. In physics, a gauge is a particular choice or specification of vector and scalar potentials which will generate a physical field. A transformation from one physical field configuration to another is called a gauge transformation. The gauge transformations of classical general relativity (Wald, 2010) are arbitrary coordinate transformations. In the context of neural fields, the measuring system (i.e., gauge) of a 3D scene can be extended to a more general definition, e.g., the 3D Cartesian coordinate system, a 2D coordinate system, a hash table space, or a frequency space. The transformation between different scene measuring systems is dubbed a Gauge Transformation, and a neural field with a certain gauge transformation is termed a Neural Gauge Field, as shown in Fig. 8.

Figure 8: The framework of neural gauge fields, which consists of a neural field and a gauge transformation. The gauge transformation maps an original 3D coordinate p = (x, y, z) to another measuring system (e.g., a 2D plane, a spherical surface (θ, φ), a Fourier space, or a codebook index) for scene representation via a feature grid or an MLP network.

A.2 DERIVATION OF INFOREG

According to Eq. (2), the final optimization loss can be formulated as:

$$\mathcal{L}_{reg} = \min_{p(y|x)} \Big\{ -\gamma \iint p(y|x)p(x) \log \frac{p(y|x)}{p(y)} \, dx\, dy + \epsilon \int p(y) \log \frac{p(y)}{q(y)} \, dy \Big\}, \tag{8}$$

where $\gamma$ and $\epsilon$ denote the weights of the regularization and the prior distribution discrepancy. Then, we have:

$$\begin{aligned}
\mathcal{L}_{reg} &= \min_{p(y|x)} \Big\{ -\gamma \iint p(y|x)p(x) \log \frac{p(y|x)}{p(y)} \, dx\, dy + \epsilon \iint p(x|y)p(y) \log \frac{p(y)\,p(y|x)}{q(y)\,p(y|x)} \, dx\, dy \Big\} \\
&= \min_{p(y|x)} \iint p(y|x)p(x) \Big[ -(\gamma + \epsilon) \log \frac{p(y|x)}{p(y)} + \epsilon \log \frac{p(y|x)}{q(y)} \Big] dx\, dy \\
&= \min_{p(y|x)} \Big\{ -(\gamma + \epsilon)\, I(X, Y) + \epsilon\, \mathbb{E}\big[\mathrm{KL}(p(y|x)\,\|\,q(y))\big] \Big\}.
\end{aligned} \tag{9}$$

For $I(X, Y)$, we have:

$$I(X, Y) = \iint p(y|x)p(x) \log \frac{p(y|x)p(x)}{p(y)p(x)} \, dx\, dy = \mathrm{KL}(p(y|x)p(x)\,\|\,p(y)p(x)). \tag{10}$$

Thus, we get:

$$\mathcal{L}_{reg} = \min_{p(y|x)} \Big\{ -(\gamma + \epsilon)\, \mathbb{E}\big[\mathrm{KL}(p(y|x)p(x)\,\|\,p(y)p(x))\big] + \epsilon\, \mathbb{E}\big[\mathrm{KL}(p(y|x)\,\|\,q(y))\big] \Big\}, \tag{11}$$

which can be further solved as described in the main paper.
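To make Eq. (11) concrete for the discrete case, the sketch below combines a precomputed mutual-information estimate (e.g., the JS bound of Eqs. (4)-(5)) with the KL term against a uniform prior over the codebook; the weights, batch statistics, and function names here are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

def prior_kl_uniform(index_dists: torch.Tensor) -> torch.Tensor:
    """KL(p(y|x) || q(y)) for the discrete case of Eq. (11).

    index_dists: (B, N) predicted (soft or one-hot) index distributions in a batch.
    p(y|x) is approximated by the batch average; q(y) is uniform over the N vectors.
    """
    p = index_dists.mean(dim=0).clamp_min(1e-8)   # (N,) average codebook usage
    q = torch.full_like(p, 1.0 / p.shape[0])      # uniform prior over the codebook
    return torch.sum(p * (p.log() - q.log()))

def info_reg_loss(mi_estimate: torch.Tensor, index_dists: torch.Tensor,
                  gamma: float = 1.0, eps: float = 1.0) -> torch.Tensor:
    """L_reg = -(gamma + eps) * I(X, Y) + eps * KL(p(y|x) || q(y)), as in Eq. (11).

    mi_estimate is a differentiable mutual-information estimate, e.g. the JS bound
    computed by a critic network; gamma and eps are placeholder weights.
    """
    return -(gamma + eps) * mi_estimate + eps * prior_kl_uniform(index_dists)

# Usage with dummy inputs: 1024 samples over a 256-entry codebook.
probs = F.softmax(torch.randn(1024, 256), dim=-1)
loss = info_reg_loss(mi_estimate=torch.tensor(0.3), index_dists=probs)
```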
A.3 RADIANCE CONTRIBUTION

As 3D points near the scene surface are of more significance for the mutual information, we leverage the radiance contribution weights of the shading points to weight the mutual information term, as in Xiang et al. (2021). Specifically, with volume density σ and radiance c of a 3D scene, the RGB color I can be computed by aggregating the radiance values of the shading points on a ray as below:

$$I = \sum_{i} T_i \big(1 - \exp(-\sigma_i \delta_i)\big) c_i, \quad T_i = \exp\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big), \tag{12}$$

where $i$ denotes the index of a shading point on the ray, $\sigma_i$ and $c_i$ denote the density and radiance at shading point $i$, $\delta_i$ denotes the distance between two consecutive points, and $T_i$ is the transmittance. Thus, the radiance contribution weight of a shading point can be formulated as $w_i = T_i (1 - \exp(-\sigma_i \delta_i))$. The weight $w_i$ is applied to weight the positive pairs in Eq. (5).

A.4 DIFFERENTIABLE TOP-K

With a predicted discrete distribution $P_x = [p_1, p_2, \dots, p_N]$, the Top-k operation aims to select the k most probable indices in a differentiable way so that the gradient can be back-propagated. For forward propagation, we use the efficient Top-k operation implemented in PyTorch to select the Top-k values and indices. For back-propagation, we apply the reparameterization trick (Bengio et al., 2013) to approximate the gradient by replacing the Top-k operation with Softmax. The detailed pseudo code of the differentiable Top-k operation is given in Algorithm 2.

Algorithm 2: Pseudo code of the differentiable Top-k operation.
Input: k value in Top-k, a 3D point x, predicted distribution Px = [p1, p2, ..., pN], codebook V.
Forward propagation:
  1. topk_init = [0, 0, ..., 0] ∈ R^N
  2. index, value = Top-k(Px)
  3. topk_hard = topk_init.scatter(index, value)
  4. value_sum = value.sum()
  5. topk_hard = topk_hard / value_sum
  6. feature lookup: vx = Matmul(topk_hard, V)
Backward propagation:
  1. index_soft = Softmax(Px)
  2. feature lookup: vx = Matmul(index_soft, V)
Output: feature vx.
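For reference, a PyTorch sketch of the differentiable Top-k in Algorithm 2 might look as follows; it mirrors the scatter-and-renormalize forward path with a softmax backward path, with hypothetical tensor shapes.

```python
import torch
import torch.nn.functional as F

def topk_feature_lookup(logits: torch.Tensor, codebook: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Differentiable Top-k codebook lookup in the spirit of Algorithm 2.

    logits: (B, N) predicted distribution P_x over N codebook entries (unnormalized).
    codebook: (N, D) codebook V. Returns (B, D) looked-up features.
    Forward uses the renormalized Top-k weights; backward flows through Softmax(P_x).
    """
    probs = F.softmax(logits, dim=-1)                        # treat P_x as a proper distribution
    value, index = torch.topk(probs, k, dim=-1)              # Top-k probabilities and indices
    topk_hard = torch.zeros_like(probs).scatter(-1, index, value)
    topk_hard = topk_hard / value.sum(dim=-1, keepdim=True)  # renormalize so the k weights sum to 1
    # Straight-through: the forward value equals topk_hard, the gradient flows through probs.
    weights = topk_hard.detach() + probs - probs.detach()
    return weights @ codebook

# Usage with hypothetical sizes: a 256-entry codebook of 128-dimensional vectors.
features = topk_feature_lookup(torch.randn(4096, 256), torch.randn(256, 128), k=4)
```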
A.5 IMPLEMENTATION

We implemented our neural gauge fields and re-implemented all compared methods without customized CUDA kernels. All models are optimized for 150k steps with a batch size of 1024 pixel rays.

Efficient Gauge Transformation. The proposed neural gauge fields can be applied to benefit existing models without significantly increasing their computation cost or training time. Here, we provide several strategies to achieve this. First, instead of using MLPs to learn the gauge transformation, we can directly optimize tensors for the transformation, e.g., 2D offset grids to indicate the coordinate transformation in TensoRF, or 1D vectors for grid points to indicate the codebook index distribution in Instant-NGP. Thus, the convergence speed of the gauge transformation can be aligned with grid-based models like TensoRF and Instant-NGP. Second, gauge transformation learning can be performed in an intermediate training stage instead of over the full training process; the learned transformation is then frozen for the ensuing model training. Thus, the computation cost introduced by learning the gauge transformation (and its regularization) is negligible compared with the total training time. Third, we empirically find that the prior distribution regularization term in InfoReg is sufficient to avoid gauge collapse in most cases, which is much more efficient than the complete InfoReg.

A.6 LIMITATIONS

This work focuses on the fundamental problem of learning gauge transformations along with neural fields and is orthogonal to the SOTA methods for scene representation, e.g., TensoRF and Instant-NGP. Although we demonstrate that the learned gauge transformation can outperform its pre-defined counterparts (e.g., orthogonal mapping and the spatial hash function), there are still some challenges in applying this method to the SOTA models. For instance, the original Instant-NGP employs a 16-layer codebook with sizes ranging from $2^{14}$ to $2^{24}$. When replacing the spatial hash function in Instant-NGP with a learned counterpart, the gauge network becomes extremely large, with an output layer dimension ranging from $2^{14}$ to $2^{24}$. One possible solution is to first group the codebook into several smaller sub-blocks, followed by smaller gauge networks that learn the mapping to each block.

A.7 DISCUSSION & FUTURE WORK

As a general framework, the proposed neural gauge fields can be extended in several directions. Firstly, this work focuses on neural fields for novel view synthesis, which can be easily applied to downstream tasks, e.g., surface reconstruction with SDFs and dynamic scene modeling. Secondly, our InfoReg addresses gauge collapse in discrete gauge transformations and can potentially be applied to the field of discrete representation learning (e.g., VQ-VAE (Van Den Oord et al., 2017)), which also suffers from codebook collapse. Thirdly, this work mainly explores purely pre-defined or purely learned mappings; how to combine pre-defined and learned mappings to yield better performance is also of great potential. Please refer to Appendix A.6 for more discussion of limitations.

A.8 MORE QUALITATIVE RESULTS

Fig. 9 compares the performance of InfoReg with Euclidean distance and cosine distance. Euclidean distance tends to preserve a better texture structure compared with cosine distance, although both cover the 2D space and achieve similar rendering performance. Fig. 10 shows additional samples of learning continuous gauge transformation with different regularizations. Beyond novel view synthesis, we also apply our neural gauge fields to the downstream task of texture editing, as shown in Fig. 11. Specifically, we learn the mapping from 3D space (points on the object surface) to a sphere surface, which is visualized by a cubemap for texture editing.

Figure 9: The results (3D space to 2D plane) of different distance metrics (Euclidean distance or cosine distance) used in the network T of the proposed InfoReg. Euclidean distance leads to better preservation of local texture structure compared with cosine distance.

Figure 10: The results (3D space to 2D plane) of different learning regularizations. Our InfoReg leads to the best mapping results by utilizing the full plane.

Figure 11: The application of neural gauge fields to texture editing. A gauge transformation is learned from 3D space to a sphere surface (visualized by a cubemap). By editing the texture of the cubemap, the model is able to render multi-view images with the corresponding texture.