# near_neural_electromagnetic_array_response__bb6c85a0.pdf

NEAR: Neural Electromagnetic Array Response

Yinyan Bu 1 Jiajie Yu 1 Kai Zheng 1 Xinyu Zhang 1 Piya Pal 1

We address the challenge of achieving angular super-resolution in multi-antenna radar systems that are widely used for localization, navigation, and automotive perception. A multi-antenna radar achieves very high resolution by computationally creating a large virtual sensing system using very few physical antennas. However, practical constraints imposed by hardware, noise, and a limited number of antennas can impede its performance. Conventional supervised learning models that rely on extensive pre-training with large datasets often exhibit poor generalization in unseen environments. To overcome these limitations, we propose NEAR, an untrained implicit neural representation (INR) framework that predicts radar responses at unseen locations from sparse measurements by leveraging latent harmonic structures inherent in radar wave propagation. We establish new theoretical results linking the antenna array response to the expressive power of INR architectures, and develop a novel physics-informed and latent geometry-aware regularizer. Our approach integrates classical signal representation with modern implicit neural learning, enabling high-resolution radar sensing that is both interpretable and generalizable. Extensive simulations and real-world experiments using radar platforms demonstrate NEAR's effectiveness and its ability to adapt to unseen environments.

1. Introduction

In addition to LiDAR and RGB cameras, radar has emerged as a crucial sensing modality for applications such as advanced driver assistance systems (ADAS) and autonomous vehicles (Bijelic et al., 2020; Caesar et al., 2020), especially due to its robustness to adverse weather conditions (e.g., fog, snow, rain). Multiple-Input-Multiple-Output (MIMO) radar (Li & Stoica, 2008) employs an array of transmit (Tx) antennas which generate signals that are reflected by targets of interest and received at a receiving (Rx) antenna array. The distance and velocity of these targets are characterized using the radar's Range-Doppler (RD) map, which is computed by applying a fast Fourier transform (FFT) to the digitized receiver signals after Analog-to-Digital Conversion (ADC) (Sun et al., 2020). Direction-of-Arrival (DOA) estimation is then performed exclusively on peaks that pass the constant false alarm rate (CFAR) detector (Scharf & Demeure, 1991) to determine the angular orientation of objects. The angular resolution of MIMO radar, which reveals how well it can distinguish two or more closely spaced sources, is fundamentally constrained by the number and configuration of the antennas in the array. For instance, a device equipped with a uniformly filled array of eight antennas achieves an angular resolution of at best about 15° (Instruments, 2017). Thus, it is important to develop innovative technologies that enhance the angular resolution of radar sensing without incurring substantial hardware costs.

1 Department of Electrical and Computer Engineering, University of California San Diego (UCSD), La Jolla, United States. Correspondence to: Yinyan Bu <y1bu@ucsd.edu>, Jiajie Yu <jiy088@ucsd.edu>.

Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

For MIMO radar, range and Doppler resolution can be improved by adjusting signal bandwidth and frame time, which correspond to the frequency range and duration of signal pulses, respectively (Li et al., 2023). However, angular resolution is strictly dependent on the radar hardware specifications and cannot be improved through parameter adjustments. Achieving higher angular resolution in both azimuth and elevation requires a large aperture in both horizontal and vertical directions, which, for uniformly filled arrays, necessitates a significant number of antennas, resulting in high hardware costs. An efficient alternative, which is becoming increasingly relevant for next-generation sensing (such as automotive radars), is to use sparse arrays (Pal & Vaidyanathan, 2010; Qin et al., 2015; Sarangi et al., 2023). Sparse arrays deploy a reduced number of transmit and receive antennas to achieve the same aperture as a standard uniform array (which would need a quadratically larger number of sensors), at the price of larger inter-element spacings. They are designed so that their virtual (sum or difference) co-arrays are dense uniform arrays filling the available aperture. This property can be utilized in several ways, such as localizing more sources than sensors and achieving very high resolution with sufficient temporal measurements (Cheng et al., 2014; Liu & Vaidyanathan, 2017; Wang & Nehorai, 2017; Qiao & Pal, 2019). Recently, it has been shown that sparse arrays are also near-optimal subspace codes, highlighting their novel connection to channel coding (Mahdavifar et al., 2024). However, naive processing of sparse array outputs using traditional or ad-hoc methods can suffer from high sidelobes and degraded DOA estimation accuracy (Sun & Zhang, 2021). Thus, achieving angular super-resolution with low hardware cost and irregular sampling geometries remains a continuing challenge.

In this work, we introduce a machine learning framework that tackles the challenge of angular super-resolution at low hardware cost using sparse measurements that employ only a few antennas. Our goal is to predict complex-valued responses at any desired location (that can potentially be used for DOA estimation) within the 2D virtual antenna array domain using only a sparse set of responses. One straightforward approach is to train a machine learning model that maps a spatial location to the corresponding antenna response. However, this approach may fail to incorporate the underlying physics of wave propagation, and thus may still require very dense measurements to achieve reasonable performance. As one of the important breakthroughs in computer vision, NeRF (Mildenhall et al., 2021) has achieved remarkable success in 3D reconstruction and view synthesis by learning a scene's radiance field from a set of input images and generating photorealistic renderings from novel viewpoints. At its core, NeRF utilizes implicit neural representations (INRs) (Sitzmann et al., 2020; Tancik et al., 2020) to parameterize the radiance field as a continuous function, modeled by a multilayer perceptron (MLP) that maps 3D spatial coordinates to RGB color and volume density. By using volume-rendering techniques to synthesize images, NeRF incorporates the underlying physics of light propagation.

Inspired by recent advances in NeRF and INRs, we propose Neural Electromagnetic Array Response (NEAR), an INR-based framework that maps 2D spatial coordinates to the complex-valued antenna response at those locations. Several features distinguish our task from traditional NeRF applications, particularly due to fundamental differences between the ways in which visible light and radar signals are processed (Zhao et al., 2023). Firstly, we have access to only a limited number of complex-valued measurements, proportional to the number of deployed antennas. This conveys significantly less information than an image comprising thousands of pixels. As a consequence, a model trained using off-the-shelf INR-based algorithms with a limited number of antennas can fail to reliably predict the unseen array response at arbitrary spatial locations. Secondly, while optical NeRF frameworks rely solely on light intensity (amplitude), radar signals at millimeter wavelengths necessitate consideration of phase information. Unlike visible light, where the phase is often neglected, the phase in radar signals is crucial for capturing fine-grained details of wave propagation, such as target locations. Ignoring phase would result in a significant loss of critical information. Finally, despite the increasing adoption of INRs in various domains, the theoretical understanding of their properties and their implications in specific applications remains limited. Key aspects, such as the behavior of deep layers in these networks and the role of positional encoding (PE) in representing complex signals, are not yet well understood.

To address these challenges, our work makes the following contributions:

We propose NEAR, the first electromagnetic array response prediction framework that implicitly integrates signal propagation characteristics into INRs. Our approach enables prediction of array response at unseen receiver locations, facilitating super-resolution angular estimation with low hardware requirements.

We provide a tight characterization of the class of functions that INRs can represent with certain choices of positional encoding and activation functions. Our results improve upon existing theoretical analyses of INRs.

We evaluate NEAR through both simulation studies and real-world experiments, achieving superior performance in antenna array response prediction and other downstream tasks such as super-resolution angular estimation compared to existing model-based and machine learning methods.

Overall, we believe our findings contribute to advancing research in INRs and their unique applications in radar sensing. Our work also marks the first step towards leveraging INRs for predicting unseen antenna responses in radar sensing, paving the way for new opportunities to enhance the performance of future sensing and localization systems.

2. Preliminaries

In this section, we provide background knowledge on MIMO radar, virtual array, and implicit neural representation.

MIMO Radar. We consider targets in three-dimensional Euclidean space, represented by spherical coordinates as depicted in Figure 1. We consider an L-shaped MIMO radar system (consistent with our hardware) with $N_t$ physical Tx antennas located at $\{(x_{T,1}, 0, 0), \ldots, (x_{T,N_t}, 0, 0)\}$ and $N_r$ physical Rx antennas located at $\{(0, y_{R,1}, 0), \ldots, (0, y_{R,N_r}, 0)\}$. The Tx antennas emit a set of $N_t$ orthogonal waveforms, which are reflected by $K$ targets, and their superposition is received at the Rx antenna array. Each Rx antenna of a MIMO radar is equipped with a bank of $N_t$ matched filters, each matched to one of the $N_t$ orthogonal waveforms. This yields a total of $N_t N_r$ measurements at the output of the $N_t N_r$ matched filters, which can be used to perform different spatial sensing tasks such as localization and beamforming (Li & Stoica, 2008).

Figure 1. Left: Illustration of the position of the k-th target in the spherical coordinate system. Right: One sub-Nyquist sampling pattern, with missing virtual element responses indicated.

Virtual Array. One of the key features of a MIMO radar is that by using only $N_t + N_r$ physical Tx and Rx antennas, it can create the effect of a much larger antenna array with $N_t N_r$ virtual sensing elements at the output of the $N_t N_r$ matched filters. Consider a far-field point target at direction $(\theta_k, \phi_k)$. It can be shown that the noiseless array response at the $m$-th matched filter output of the $n$-th receiving antenna can be expressed as

$$y_{m,n} = x\, e^{j\frac{2\pi}{\lambda}\left(x_{T,m}\sin\phi_k\cos\theta_k + y_{R,n}\sin\phi_k\sin\theta_k\right)}, \tag{1}$$

where $x$ is the unknown amplitude of the signal reflected by the target and $\lambda$ is the wavelength at which the narrowband radar operates. Therefore, the array response in (1) is the same as that of a (fictitious) two-dimensional receiving array with $N_t N_r$ antenna elements located at

$$\{(x_{T,m}, 0, 0) + (0, y_{R,n}, 0),\ 1 \le m \le N_t,\ 1 \le n \le N_r\}.$$

This two-dimensional antenna array with $N_t N_r$ elements is known as the virtual array (Chen & Vaidyanathan, 2008). Figure 1 shows a physical Tx-Rx antenna pair and the associated two-dimensional virtual array. Notice that, depending on the geometry of the Tx-Rx pair, the 2D virtual array need not consist of elements at consecutive locations on a uniform grid, and there can be missing elements (or holes) in the virtual array, as indicated in Figure 1.
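The virtual array construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `virtual_array` and `holes` and the example antenna positions are hypothetical, and positions are assumed to be non-negative multiples of a common grid spacing.

```python
import numpy as np

def virtual_array(tx_x, rx_y):
    """Return the N_t * N_r virtual element positions (x_T,m, y_R,n) in the xy-plane."""
    tx_x = np.asarray(tx_x, dtype=float)
    rx_y = np.asarray(rx_y, dtype=float)
    gx, gy = np.meshgrid(tx_x, rx_y, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel()], axis=1)

def holes(positions, spacing):
    """Grid points inside the aperture (in units of `spacing`) with no virtual element."""
    idx = np.rint(positions / spacing).astype(int)
    occupied = {(int(a), int(b)) for a, b in idx}
    nx, ny = (int(v) + 1 for v in idx.max(axis=0))
    return [(i, j) for i in range(nx) for j in range(ny) if (i, j) not in occupied]

# Hypothetical sparse layout, in units of half a wavelength (illustrative values only).
tx = [0, 1, 3]   # Tx positions along the x-axis
rx = [0, 2, 3]   # Rx positions along the y-axis
pos = virtual_array(tx, rx)
missing = holes(pos, spacing=1.0)
print(pos.shape, len(missing))
```

With three Tx and three Rx antennas, the sketch produces nine virtual elements on a 4 × 4 grid, so seven grid points are holes whose responses would have to be predicted.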

Consider $K$ targets in the far field, with azimuth angle $\theta_k$ and elevation angle $\phi_k$, $1 \le k \le K$. Without loss of generality, let the reference virtual antenna be located at the origin of the coordinate system. The array response at a coordinate $(r_1, r_2)$¹ (which may be a virtual array element location, or the location of a missing sensor), due to signals impinging from the $K$ targets in the absence of noise, can be expressed as:

$$y_{r_1,r_2} = \sum_{k=1}^{K} x_k\, e^{j\frac{2\pi}{\lambda}\left(r_1\sin\phi_k\cos\theta_k + r_2\sin\phi_k\sin\theta_k\right)}, \tag{2}$$

where $x_k$ is the unknown complex-valued reflection coefficient of the $k$-th target. Various algorithms, such as beamforming (Van Trees, 2002) and subspace-based methods (Schmidt, 1986; Roy & Kailath, 1989), can be applied to estimate $\{\theta_k\}_{k=1}^{K}$ and $\{\phi_k\}_{k=1}^{K}$.
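The signal model in (2) can be evaluated directly. The following is a minimal sketch, assuming angles in radians and coordinates in the same unit as the wavelength; the function name `array_response` and the example target parameters are illustrative choices, not taken from the paper.

```python
import numpy as np

def array_response(r, x, theta, phi, lam=1.0):
    """Evaluate Eq. (2) at coordinates r (shape (P, 2)) for K far-field targets."""
    r = np.atleast_2d(np.asarray(r, dtype=float))
    x = np.asarray(x, dtype=complex)
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    # Direction cosines of each target along the x- and y-axes.
    u = np.sin(phi) * np.cos(theta)
    v = np.sin(phi) * np.sin(theta)
    # Phase of target k at point p: (2*pi/lam) * (r1 * u_k + r2 * v_k).
    phase = 2 * np.pi / lam * (np.outer(r[:, 0], u) + np.outer(r[:, 1], v))
    return np.exp(1j * phase) @ x

# Two illustrative targets; amplitudes and angles are made up for the example.
x = np.array([1.0 + 0.5j, 0.7 - 0.2j])
theta = np.deg2rad([20.0, 60.0])
phi = np.deg2rad([50.0, 70.0])
points = np.array([[0.0, 0.0], [0.5, 0.0], [0.25, 1.5]])  # may include hole locations
y = array_response(points, x, theta, phi)
```

Because (2) is defined for any real coordinate, the same function returns responses at physical virtual elements and at missing-sensor locations alike. At the origin, the response reduces to the sum of the reflection coefficients.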

Implicit Neural Representations (INR). INRs are used to model a continuous function $g : \mathbb{R}^{d_{in}} \to \mathbb{R}^{d_{out}}$ using a neural network $f_\Theta : \mathbb{R}^{d_{in}} \to \mathbb{R}^{d_{out}}$, parameterized by weights $\Theta$, which maps input coordinates $r \in \mathbb{R}^{d_{in}}$ to signal values $g(r) \in \mathbb{R}^{d_{out}}$. A significant challenge for INRs is to accurately reconstruct high-frequency details, which is needed for radar super-resolution. Classical neural network architectures are known to exhibit a strong spectral bias (Rahaman et al., 2019) towards lower frequencies. Recently, Tancik et al. (2020) and Sitzmann et al. (2020) proposed architectural solutions that overcome this spectral bias, allowing faster convergence and higher accuracy of INRs.

Following the formulation of Yüce et al. (2022), most INR architectures can be decomposed into a mapping function $\gamma : \mathbb{R}^{D} \to \mathbb{R}^{T}$ followed by an MLP with weights $W^{(\ell)} \in \mathbb{R}^{F_\ell \times F_{\ell-1}}$, biases $b^{(\ell)} \in \mathbb{R}^{F_\ell}$, and activation functions $\rho^{(\ell)} : \mathbb{R} \to \mathbb{R}$ applied element-wise at each layer $\ell = 1, \ldots, L-1$. Suppose $z^{(\ell)}$ represents the post-activation output at layer $\ell$. The INR input-output relationship is given by

$$\begin{aligned}
z^{(0)} &= \gamma(r),\\
z^{(\ell)} &= \rho^{(\ell)}\!\left(W^{(\ell)} z^{(\ell-1)} + b^{(\ell)}\right), \quad \ell = 1, \ldots, L-1,\\
f_\Theta(r) &= W^{(L)} z^{(L-1)} + b^{(L)}.
\end{aligned} \tag{3}$$

Tancik et al. (2020) introduced Fourier feature networks (FFNs), which use the Fourier-based positional encoding $\gamma(r) = \sin(\Omega r + \phi)$, with parameters $\Omega \in \mathbb{R}^{T \times D}$ and $\phi \in \mathbb{R}^{T}$, followed by an MLP with $\rho^{(\ell)} = \mathrm{ReLU}$. They demonstrated that by initializing $\Omega_{i,j} \sim \mathcal{N}(0, \sigma^2)$ with random Fourier features, and choosing large values of $\sigma$, one can drive the network response towards realizing higher frequencies. SIREN (Sitzmann et al., 2020) can also mitigate spectral bias in a similar way by choosing a different (sinusoidal) activation function and rescaling certain parameters at initialization.
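The forward pass in (3) with an FFN-style encoding can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions: the helper names `init_ffn` and `inr_forward`, the uniform phase initialization, and the He-style weight initialization are choices made for this example and are not specified in the text.

```python
import numpy as np

def init_ffn(d_in, T, hidden, d_out, L, sigma, rng):
    """Random parameters for an FFN-style INR with L affine layers, as in Eq. (3)."""
    params = {
        "Omega": rng.normal(0.0, sigma, size=(T, d_in)),  # Omega_ij ~ N(0, sigma^2)
        "phi": rng.uniform(0.0, 2 * np.pi, size=T),       # assumed phase initialization
        "layers": [],
    }
    widths = [T] + [hidden] * (L - 1) + [d_out]
    for f_in, f_out in zip(widths[:-1], widths[1:]):
        # He-style initialization: an assumption for this sketch.
        W = rng.normal(0.0, np.sqrt(2.0 / f_in), size=(f_out, f_in))
        params["layers"].append((W, np.zeros(f_out)))
    return params

def inr_forward(params, r):
    """Compute f_Theta(r) for a single input coordinate r."""
    z = np.sin(params["Omega"] @ r + params["phi"])       # z^(0) = gamma(r)
    for W, b in params["layers"][:-1]:
        z = np.maximum(W @ z + b, 0.0)                    # ReLU layers 1 .. L-1
    W, b = params["layers"][-1]
    return W @ z + b                                      # linear output layer L

rng = np.random.default_rng(0)
params = init_ffn(d_in=2, T=16, hidden=32, d_out=2, L=4, sigma=10.0, rng=rng)
out = inr_forward(params, np.array([0.1, -0.3]))
```

Setting `d_out=2` mirrors the use of two real outputs to represent a complex-valued signal; larger `sigma` spreads the rows of `Omega` over higher frequencies.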

¹ In our setting, the 2D virtual array is located in the xy-plane.


3. Related Work

Hallucinated Antenna Interpolation/Extrapolation. To mitigate the high sidelobes introduced by sparse arrays and enhance the SNR of the antenna array response, Sun & Zhang (2021) propose to recover missing elements or holes in sparse arrays by completing a low-rank (block) Hankel matrix (Chen & Chi, 2013). However, the nuclear norm minimization that they employ often suffers from suboptimal recovery performance (Lu et al., 2015) and is sensitive to the sampling pattern and noise (Bu et al., 2025; Sarangi et al., 2022). Furthermore, their approach is limited in its applicability to cases involving non-integer-multiple sampling. To enhance the azimuth resolution of MIMO radar, Li et al. (2023) proposed an Analog-to-Digital super-resolution model (ADC-SR) that predicts or hallucinates additional radar signals using signals from only a few receivers, essentially implementing a uniformly filled array extrapolation framework. However, their approach is restricted to 1D MIMO configurations and relies on a large training dataset, potentially limiting its generalization capability. In contrast, our method implicitly leverages the underlying physics of signal propagation and requires only single-snapshot sparse measurements, eliminating the dependence on extensive training data.

Expressive power of INRs. INRs have emerged as a versatile set of neural architectures for representing and processing signals on low-dimensional spaces. Understanding the function class that an INR architecture can represent is essential for their application to practical problems. Recognizing that polynomials of sinusoids generate linear combinations of integer harmonics of said sinusoids, Yüce et al. (2022) analyzed the expressivity of FFN, SIREN, and the related architectures of Fathony et al. (2020). Subsequently, Roddenberry et al. (2023) developed a broader theoretical understanding of INR architectures with a wider class of activation functions and provided a superset to which the integer harmonic frequencies characterizing INR functions belong. While their superset results provide valuable theoretical insights, our work refines this analysis and derives the exact set (and not a superset) of integer harmonics that describe the expressive power of INRs, delivering a tight characterization.

Neural Radio-Frequency Field Reconstruction. Building on the fact that light is a kind of electromagnetic (EM) wave, Zhao et al. (2023) and Lu et al. proposed two NeRF-based frameworks, named NeRF2 and NeWRF, respectively, for wireless channel modeling based on implicit wireless radiation field reconstruction. Chen et al. (2024) further developed a hybrid model that integrates NeRF-like object representation with physics-based ray tracing models. These models enable accurate characterization and prediction of channel properties. Building on the principles of planar wave propagation, we propose a novel framework for reconstructing 2D MIMO virtual antenna array response fields using implicit neural representations. In contrast to the aforementioned approaches, which employ ray tracing for EM waves, our method employs a straightforward yet effective regularization strategy specifically designed to leverage the spectral sparsity of antenna array measurements and the characteristics of planar wave propagation.

4. Neural Electromagnetic Array Response

In this section, we present the design of NEAR. Section 4.1 outlines our problem formulation. Section 4.2 presents theoretical results on the expressive power of INRs and their connection to Fourier series, elucidating why and how the array response function in (2) can be effectively learned. Section 4.3 details our novel implicit regularization strategy, which integrates the signal propagation model while harnessing harmonic structure. Finally, we describe the response prediction process of NEAR in Section 4.4.

4.1. Problem Formulation

We consider an environment where all objects are located in the far field relative to the radar antenna array. In this setup, the propagation of wireless signals can be modeled as planar waves that are emitted from the Tx array, reflected by objects, and finally captured by the Rx array. Let $S_x$ and $S_y$ represent the sparse sets of physical Tx and Rx antenna positions, respectively. The coordinate set of available virtual antennas is given by $\{(r_x, r_y)\}_{r_x \in S_x,\, r_y \in S_y}$, as explained in Section 2. We define the domain of the antenna array response field as $D = \{(x, y) \mid 0 \le x \le \max(S_x),\ 0 \le y \le \max(S_y)\}$. We represent the continuous complex-valued response field as a function $y : D \to \mathbb{C}$, where the input is a 2D coordinate $r = [r_1, r_2]$ within the domain $D$, and the output is a complex-valued response $y_{r_1,r_2}$ that adheres to the signal model in (2). To approximate this continuous 2D response field, we employ an INR model that maps the input 2D coordinates to a vector in $\mathbb{R}^2$, whose two components correspond to the real and imaginary parts of the complex-valued response, respectively. Specifically, the model is defined as $f_\Theta : \mathbb{R}^2 \to \mathbb{R}^2$, and the parameters $\Theta$ are optimized to map each input 2D coordinate to its corresponding response. The goal of this paper is to learn the function $f_\Theta$ solely from physical antenna measurements (without using any offline training data) by exploiting the harmonic structure of the array response in (2). Once the response function is learned, it enables the prediction of array responses at any unseen location within $D$, facilitating downstream tasks such as angle estimation and localization.


4.2. Representational Ability of INRs

While substantial empirical evidence demonstrates the effectiveness of INRs in representing scenes and various visual signals, the theoretical underpinnings of their ability to approximate continuous functions remain underexplored. In this subsection, we establish that many contemporary INRs inherently build upon similar underlying structures and shared fundamental principles, enabling them to represent a certain class of signals.

To rigorously analyze the expressive power of INRs, we follow the formulation outlined in (3). Following (Yüce et al., 2022; Roddenberry et al., 2023; Mehmeti-Göpel et al., 2020), we restrict our investigation to polynomial activation functions of the form $\rho(x) = \sum_{q=0}^{Q} \alpha_q x^q$, a widely adopted approach in the study of the expressive capacity of INRs.

Theorem 4.1. Let $f_\Theta : \mathbb{R}^{D} \to \mathbb{R}$ be an INR given by (3), where the activation function for layers $\ell > 1$ is given by $\rho^{(\ell)}(z) = \sum_{q=0}^{Q} \alpha_q z^q$. Let $\Omega_T = [\omega_1, \ldots, \omega_T]^{\top} \in \mathbb{R}^{T \times D}$ represent the frequency matrix and $\phi_T \in \mathbb{R}^{T}$ the phase vector used to map the input coordinate $r \in \mathbb{R}^{D}$ into the feature space via the mapping $\gamma(r) = \sin(\Omega_T r + \phi_T)$. The resulting architecture is capable of representing functions of the form:

$$f_\Theta(r) = \sum_{s \in S_T} c_s \sin\!\left(\left\langle \Omega_T^{\top} s,\, r \right\rangle + \phi_s\right), \tag{4}$$

where

$$S_T = \left\{ s = [s_1, s_2, \ldots, s_T]^{\top} \;\middle|\; s_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s_t| \le Q^{L-1} \right\}.$$
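To make the harmonic set concrete, the sketch below enumerates the integer vectors $s$ and the resulting frequencies $\Omega_T^{\top} s$ for a small case. It assumes the bound in the theorem reads $\sum_t |s_t| \le Q^{L-1}$; the function name `harmonic_set` and the example frequency matrix are illustrative.

```python
import itertools
import numpy as np

def harmonic_set(T, Q, L):
    """All integer vectors s in Z^T with sum_t |s_t| <= Q**(L-1)."""
    bound = Q ** (L - 1)
    values = range(-bound, bound + 1)
    return [s for s in itertools.product(values, repeat=T)
            if sum(abs(v) for v in s) <= bound]

# Illustrative T = 2, D = 2 frequency matrix (one base frequency per coordinate).
Omega = np.array([[np.pi, 0.0],
                  [0.0, np.pi]])
S = harmonic_set(T=2, Q=2, L=2)
freqs = np.array(S) @ Omega   # row i equals (Omega^T s_i)^T
print(len(S))
```

For $T = 2$, $Q = 2$, $L = 2$ the bound is 2 and the set contains 13 integer vectors. The bound grows as $Q^{L-1}$, so deeper networks or higher-degree activations reach many more integer harmonics of the base frequencies.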

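Theorem 4.1 gives an exact characterization of the set $S_T$ of all possible integer harmonics of the feature mapping $\gamma(r)$. In contrast, Yüce et al. (2022) and Roddenberry et al. (2023) only provide a superset to which $S_T$ belongs.

Remark 4.2. Let $y_R$ and $y_I$ denote the real and imaginary parts of the response field function (2), respectively. The function $y_R(r)$ can be equivalently represented as (4) by applying Theorem 4.1 with the following parameterization:

$$\Omega_T = \frac{2\pi}{\lambda}\left[\sin\phi_k\cos\theta_k \quad \sin\phi_k\sin\theta_k\right]_{1 \le k \le K} \in \mathbb{R}^{K \times 2},$$

$$c_{2k-1} = \operatorname{Re}(x_k), \quad c_{2k} = -\operatorname{Im}(x_k), \quad \phi_{2k-1} = \frac{\pi}{2}, \quad \phi_{2k} = 0, \quad s_k = e_k \in \mathbb{R}^{K}, \quad S_T = \{e_k\}_{1 \le k \le K}.$$

Under this parameterization,

$$y_R(r) = \sum_{k=1}^{K} c_{2k-1} \sin\!\left(\left\langle \Omega_T^{\top} s_k,\, r \right\rangle + \phi_{2k-1}\right) + c_{2k} \sin\!\left(\left\langle \Omega_T^{\top} s_k,\, r \right\rangle + \phi_{2k}\right).$$

A similar representation also holds for $y_I$.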
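The parameterization in Remark 4.2 can be checked numerically. The sketch below compares the real part of (2) with the sum of sinusoids, using the identity $\operatorname{Re}(x e^{j\psi}) = \operatorname{Re}(x)\sin(\psi + \pi/2) - \operatorname{Im}(x)\sin(\psi)$; the target values are made up for the example.

```python
import numpy as np

lam = 1.0
theta = np.deg2rad([20.0, 60.0])          # illustrative azimuth angles
phi = np.deg2rad([50.0, 70.0])            # illustrative elevation angles
x = np.array([1.0 + 0.5j, 0.7 - 0.2j])    # illustrative reflection coefficients

# Frequency matrix of Remark 4.2: row k holds the spatial frequency of target k.
Omega = 2 * np.pi / lam * np.stack(
    [np.sin(phi) * np.cos(theta), np.sin(phi) * np.sin(theta)], axis=1)

def real_part_direct(r):
    """Real part of the complex response in Eq. (2)."""
    return np.real(np.sum(x * np.exp(1j * (Omega @ r))))

def real_part_sinusoids(r):
    """Same quantity written as a sum of phase-shifted sinusoids, as in Eq. (4)."""
    psi = Omega @ r                        # <Omega^T e_k, r> for each k
    return np.sum(x.real * np.sin(psi + np.pi / 2) - x.imag * np.sin(psi))

rng = np.random.default_rng(1)
samples = rng.uniform(0.0, 5.0, size=(100, 2))
max_gap = max(abs(real_part_direct(r) - real_part_sinusoids(r)) for r in samples)
```

The two expressions agree to floating-point precision at the sampled coordinates, which is the sense in which the array response lies in the function class of Theorem 4.1.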

This shows that our desired array response indeed belongs to the class of functions representable by INRs. Although the resulting INR architecture appears deceptively simple, note that this positional encoding requires the ground-truth DOA of each target, and the output weights require its amplitude, neither of which is available in practice. Hence, it is important to investigate the class of functions that an INR can approximate using a given mapping $\gamma(r)$, such as the fixed sinusoidal positional encoding employed in NeRF (Mildenhall et al., 2021):

$$\gamma(r) = \begin{bmatrix} \sin(\Omega r) \\ \cos(\Omega r) \end{bmatrix}, \qquad \Omega = \begin{bmatrix} 2^{0}\pi & 0 & 2^{1}\pi & 0 & \cdots & 2^{T-1}\pi & 0 \\ 0 & 2^{0}\pi & 0 & 2^{1}\pi & \cdots & 0 & 2^{T-1}\pi \end{bmatrix}^{\top}. \tag{5}$$
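A minimal sketch of the fixed encoding in (5) is given below. It assumes $\Omega \in \mathbb{R}^{2T \times 2}$ with rows alternating between the two input coordinates; the row ordering and the function names are choices made for this illustration and do not affect the set of features produced.

```python
import numpy as np

def nerf_frequency_matrix(T):
    """Omega in R^{2T x 2}: frequencies 2^t * pi applied to each coordinate separately."""
    Omega = np.zeros((2 * T, 2))
    scales = (2.0 ** np.arange(T)) * np.pi
    Omega[0::2, 0] = scales   # rows of the form (2^t * pi, 0)
    Omega[1::2, 1] = scales   # rows of the form (0, 2^t * pi)
    return Omega

def positional_encoding(r, T):
    """gamma(r) = [sin(Omega r); cos(Omega r)] for a 2D coordinate r."""
    proj = nerf_frequency_matrix(T) @ np.asarray(r, dtype=float)
    return np.concatenate([np.sin(proj), np.cos(proj)])

features = positional_encoding([0.5, -0.25], T=10)
print(features.shape)
```

With $T = 10$ the encoding maps each 2D coordinate to 40 features (20 sines and 20 cosines).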

Consider a non-periodic function $g : \mathbb{R}^{d_{in}} \to \mathbb{R}^{d_{out}}$ defined over a bounded domain $D$ (e.g., the height and width of an image, or the aperture of a 2D MIMO array). We can define its periodic extension $\tilde{g} : \mathbb{R}^{d_{in}} \to \mathbb{R}^{d_{out}}$ with period $p$ as follows (Benbarka et al., 2022):

$$\tilde{g}(x + n \odot p) = g(x), \quad \forall x \in D,\ n \in \mathbb{Z}^{d_{in}}, \tag{6}$$

where $\odot$ denotes the Hadamard product. By normalizing the input domain to its respective bounds, we assume a period of 2 for each variable, i.e., each variable lies within the range $[-1, 1)$. The Fourier series expansion of a periodic extension $\tilde{g} : \mathbb{R}^2 \to \mathbb{R}$ of period 2 is given by (Oppenheim et al., 2010):

$$\tilde{g}(x, y) = \sum_{m,n=-\infty}^{\infty} A_{m,n} \cos(\pi(mx + ny)) + B_{m,n} \sin(\pi(mx + ny)). \tag{7}$$

It can be shown that if the frequency matrix $\Omega \in \mathbb{R}^{2T \times 2}$ of the INR described in Theorem 4.1 is chosen according to (5), then as the number $L$ of layers of the MLP/INR increases, $f_\Theta$ approximates certain period-2 functions $\tilde{g}$ of the form (7). See Appendix A.7 for more details.

4.3. Physics-Informed Implicit Regularization

To model the antenna array response field, we discretize the domain of interest into a finite set of points in the 2D plane. Let $[0, U_1] \times [0, U_2]$ represent the bounded domain of the antenna array response field in the $x$-$y$ plane, and consider the general case of a uniform sampling grid of dimensions $M_1 \times M_2$, with spacings $d_1 = \frac{U_1}{M_1 - 1} \le \frac{\lambda}{2}$ and $d_2 = \frac{U_2}{M_2 - 1} \le \frac{\lambda}{2}$. Supposing an array snapshot containing $K$ targets with azimuth angle $\theta_k$ and elevation angle $\phi_k$ ($k = 1, \ldots, K$), and leveraging planar wave propagation (2), the $(m_1, m_2)$-th element of the response due to the $K$ targets in the absence of noise can be written as

$$y_{m_1,m_2} = \sum_{k=1}^{K} x_k\, e^{j\frac{2\pi}{\lambda}\left((m_1 - 1) d_1 \sin\phi_k\cos\theta_k + (m_2 - 1) d_2 \sin\phi_k\sin\theta_k\right)} \tag{8}$$

for $1 \le m_1 \le M_1$ and $1 \le m_2 \le M_2$. Notably, when $d_1 = d_2 = \frac{\lambda}{2}$, the sampling pattern corresponds to Nyquist sampling. Let $Y = [y_{m_1,m_2}]_{1 \le m_1 \le M_1,\, 1 \le m_2 \le M_2} \in \mathbb{C}^{M_1 \times M_2}$ be the ground truth response matrix whose entries are the antenna array responses defined in (8).

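Definition 4.3. Given $Y = [y_{m_1,m_2}] \in \mathbb{C}^{M_1 \times M_2}$ for $1 \le m_1 \le M_1$, $1 \le m_2 \le M_2$, a block Hankel matrix of $Y$ with $1 \le N_1 \le M_1$, $1 \le N_2 \le M_2$ can be constructed as:

$$H_{N_1,N_2}(Y) = \begin{bmatrix}
H_{N_2}(y_1) & H_{N_2}(y_2) & \cdots & H_{N_2}(y_{M_1-N_1+1}) \\
H_{N_2}(y_2) & H_{N_2}(y_3) & \cdots & H_{N_2}(y_{M_1-N_1+2}) \\
\vdots & \vdots & \ddots & \vdots \\
H_{N_2}(y_{N_1}) & H_{N_2}(y_{N_1+1}) & \cdots & H_{N_2}(y_{M_1})
\end{bmatrix},$$

where $H_{N_2}(y_m)$, $1 \le m \le M_1$, is defined as:

$$H_{N_2}(y_m) = \begin{bmatrix}
y_{m,1} & y_{m,2} & \cdots & y_{m,M_2-N_2+1} \\
y_{m,2} & y_{m,3} & \cdots & y_{m,M_2-N_2+2} \\
\vdots & \vdots & \ddots & \vdots \\
y_{m,N_2} & y_{m,N_2+1} & \cdots & y_{m,M_2}
\end{bmatrix}.$$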

Remark 4.4. Definition 4.3 defines the block Hankel matrix constructed along the row direction. Similarly, a block Hankel matrix can also be constructed along the column direction, denoted as $\tilde{H}_{\tilde{N}_1,\tilde{N}_2}(Y)$ (see the definition in Appendix B.1). Moreover, the rank property remains consistent for block Hankel matrices constructed along both the row and column directions.
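The row-direction construction in Definition 4.3 can be sketched as below, together with a numerical check that a noiseless response following the planar-wave model has a block Hankel matrix whose rank equals the number of targets. The helper names, the grid size, the half-wavelength spacing, and the target parameters are all assumptions made for this example.

```python
import numpy as np

def hankel(y, n_rows):
    """H_{N2}(y_m): n_rows x (len(y) - n_rows + 1) Hankel matrix of one row of Y."""
    n_cols = len(y) - n_rows + 1
    return np.array([y[i:i + n_cols] for i in range(n_rows)])

def block_hankel(Y, n1, n2):
    """Row-direction block Hankel matrix H_{N1,N2}(Y) from Definition 4.3."""
    M1 = Y.shape[0]
    return np.block([[hankel(Y[i + j], n2) for j in range(M1 - n1 + 1)]
                     for i in range(n1)])

def response_grid(M1, M2, x, theta, phi, d1=0.5, d2=0.5, lam=1.0):
    """Noiseless planar-wave response on an M1 x M2 uniform grid."""
    m1 = np.arange(M1)[:, None, None]
    m2 = np.arange(M2)[None, :, None]
    u = np.sin(phi) * np.cos(theta)
    v = np.sin(phi) * np.sin(theta)
    phase = 2 * np.pi / lam * (m1 * d1 * u + m2 * d2 * v)
    return np.sum(x * np.exp(1j * phase), axis=-1)

# Two illustrative targets (K = 2) on an 8 x 8 grid.
theta = np.deg2rad([20.0, 60.0])
phi = np.deg2rad([50.0, 70.0])
x = np.array([1.0 + 0.5j, 0.7 - 0.2j])
Y = response_grid(8, 8, x, theta, phi)

H = block_hankel(Y, 4, 4)                   # shape (N1*N2, (M1-N1+1)*(M2-N2+1))
rank = np.linalg.matrix_rank(H, tol=1e-8)   # numerical rank
print(H.shape, rank)
```

In this example the 16 × 25 block Hankel matrix has numerical rank 2, matching the two targets, because every entry is a sum of two separable complex exponentials.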

We emphasize that the number of targets in the same range-Doppler bin that need angle estimation is small, since the targets are first separated in the range-Doppler domain (Sun et al., 2020). In other words, the targets are sparsely present in the angular domain and, as a result, $H_{N_1,N_2}(Y)$ and $\tilde{H}_{\tilde{N}_1,\tilde{N}_2}(Y)$ exhibit low rank, with rank equal to $K$ for an appropriate choice of $N_1, N_2, \tilde{N}_1, \tilde{N}_2$ (see Lemma B.1 in the Appendix). To characterize this property, numerous convex and non-convex rank surrogate functions have been explored in the literature, including but not limited to the nuclear norm (Candes & Recht, 2012), the Schatten-p norm (Mohan & Fazel, 2012), and the truncated nuclear norm (Hu et al., 2012). However, all of these surrogate functions are explicit and require a singular value decomposition (SVD), which can be not only computationally expensive but also sub-optimal. In this work, we propose a novel implicit regularizer that exploits the structure of the block Hankel matrix and its latent representation. To further justify its effectiveness, we establish the algebraic properties of the block Hankel matrix corresponding to the ground truth response $Y$ using its harmonic structure.

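Theorem 4.5. Consider the ground truth response matrix $Y$ as defined in (8). For $K \le \min\left(\frac{M_1}{2}, \frac{M_2}{2}\right)$, there exist vectors $m_1^o \in \mathbb{C}^{K}$ and $m_2^o \in \mathbb{C}^{K}$ such that the last column of $H_{M_1,M_2-K}(Y)$ can be uniquely represented in terms of the first $K$ columns of $H_{M_1,M_2-K}(Y)$ using the coefficient vector $m_1^o$, i.e.,

$$\left\| H_{M_1,M_2-K}(Y)\, S\, m_1^o - H_{M_1,M_2-K}(Y)\, b \right\|_2 = 0,$$

where $S = [I_{K \times K} \;\; 0_K]^{\top} \in \mathbb{R}^{(K+1) \times K}$ and $b = [0_{K \times 1}^{\top} \;\; 1]^{\top} \in \mathbb{R}^{(K+1) \times 1}$. Similarly, an equivalent property holds for $\tilde{H}_{M_2,M_1-K}(Y)$, given by

$$\left\| \tilde{H}_{M_2,M_1-K}(Y)\, S\, m_2^o - \tilde{H}_{M_2,M_1-K}(Y)\, b \right\|_2 = 0.$$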
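The linear-prediction property in Theorem 4.5 can be checked numerically with a least-squares fit. The sketch below is illustrative only: it assumes the column-direction matrix equals the row-direction construction applied to $Y^{\top}$ (its definition is deferred to Appendix B.1), and the helper names and target parameters are made up.

```python
import numpy as np

def hankel(y, n_rows):
    n_cols = len(y) - n_rows + 1
    return np.array([y[i:i + n_cols] for i in range(n_rows)])

def block_hankel(Y, n1, n2):
    M1 = Y.shape[0]
    return np.block([[hankel(Y[i + j], n2) for j in range(M1 - n1 + 1)]
                     for i in range(n1)])

def response_grid(M1, M2, x, theta, phi, d1=0.5, d2=0.5, lam=1.0):
    m1 = np.arange(M1)[:, None, None]
    m2 = np.arange(M2)[None, :, None]
    u = np.sin(phi) * np.cos(theta)
    v = np.sin(phi) * np.sin(theta)
    phase = 2 * np.pi / lam * (m1 * d1 * u + m2 * d2 * v)
    return np.sum(x * np.exp(1j * phase), axis=-1)

def linear_prediction(H, K):
    """Fit the last column of H (K+1 columns) from its first K columns."""
    A, t = H[:, :K], H[:, K]          # A = H S, t = H b in the theorem's notation
    m, *_ = np.linalg.lstsq(A, t, rcond=None)
    return m, np.linalg.norm(A @ m - t)

K, M1, M2 = 2, 8, 8
theta = np.deg2rad([20.0, 60.0])
phi = np.deg2rad([50.0, 70.0])
x = np.array([1.0 + 0.5j, 0.7 - 0.2j])
Y = response_grid(M1, M2, x, theta, phi)

m1_opt, res_row = linear_prediction(block_hankel(Y, M1, M2 - K), K)
# Assumption: column-direction matrix built by applying the same construction to Y^T.
m2_opt, res_col = linear_prediction(block_hankel(Y.T, M2, M1 - K), K)
print(res_row, res_col)
```

Both residuals vanish up to floating-point error for noiseless data, while the fitted coefficients depend on the target angles, which is why they must be treated as unknowns in practice.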

Building upon the planar wave signal propagation model, Theorem 4.5 establishes a connection between the rank property and least squares by leveraging the harmonic structure of the block Hankel matrix. However, the global optimizers $m_1^o$ and $m_2^o$ are intrinsically dependent on the parameters $\{(\theta_k, \phi_k)\}_{k=1}^{K}$ (see Lemma B.3 in the Appendix), which are part of the radar sensing task and not known in advance. To address this, as detailed in the next subsection, we integrate the least squares term into the loss function and parameterize the unknown coefficients, enabling them to be learned adaptively.

4.4. Optimizing NEAR

In practice, the model predicts the real and imaginary parts of the response signal, $(\Re\{y_{r_1,r_2}\}, \Im\{y_{r_1,r_2}\})$, instead of its amplitude and phase, $(A(y_{r_1,r_2}), \psi(y_{r_1,r_2}))$. This is because the phase is defined modulo $2\pi$, and the wrapping operation is not differentiable. We evaluate $f_\Theta(\cdot)$ uniformly over the bounded domain, using a pre-chosen grid of $M_1 \times M_2$ points. Denote the predicted response at the $(m_1, m_2)$-th element as $\hat{y}_{m_1,m_2} = f_\Theta\!\left((m_1 - 1)\frac{U_1}{M_1 - 1},\ (m_2 - 1)\frac{U_2}{M_2 - 1}\right)$, and let $\hat{Y}$ represent the predicted response matrix. All other notation remains consistent with that introduced in Section 4.3, with an additional hat to distinguish predicted quantities. Consider two sparse sampling patterns $S_x$, $S_y$, where the observed noisy response $y_{r_x,r_y}$ is only available at locations $r = [r_x, r_y]$, $r_x \in S_x$, $r_y \in S_y$. The overall loss function is defined as

$$L(\Theta, m_1, m_2) = L_d + \lambda L_r, \tag{9}$$

where

$$\begin{aligned}
L_d &= \sum_{i \in S_x} \sum_{j \in S_y} \left\| f_\Theta(i, j) - y_{i,j} \right\|_2,\\
L_r &= \left\| H_{M_1,M_2-K}(\hat{Y})\, S\, m_1 - H_{M_1,M_2-K}(\hat{Y})\, b \right\|_2 + \left\| \tilde{H}_{M_2,M_1-K}(\hat{Y})\, S\, m_2 - \tilde{H}_{M_2,M_1-K}(\hat{Y})\, b \right\|_2,\\
\hat{Y} &= [f_\Theta(i, j)]_{1 \le i \le M_1,\, 1 \le j \le M_2}.
\end{aligned} \tag{10}$$

Specifically, $L_d$ is the data fitting term, which quantifies the gap between the predicted and acquired responses at observed locations, and $L_r$ is the regularization term, as elaborated in Section 4.3 and Appendix B. The parameters are optimized by minimizing the total loss function:

$$\Theta^o, m_1^o, m_2^o = \arg\min_{\Theta, m_1, m_2}\ L_d + \lambda L_r. \tag{11}$$

Using the optimal parameters $\Theta^o$, the predicted array response can be computed as $\hat{y}_{i,j} = f_{\Theta^o}(i, j)$, $\forall (i, j) \in D$.
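The structure of the loss in (9) and (10) can be sketched as a plain function of a predicted response matrix and the two coefficient vectors. This is not the authors' implementation: the function `near_loss`, the use of unsquared norms, the grid-index form of the sampling pattern, and the construction of the column-direction matrix from the transposed prediction are assumptions for illustration, and no network or optimizer is included.

```python
import numpy as np

def hankel(y, n_rows):
    n_cols = len(y) - n_rows + 1
    return np.array([y[i:i + n_cols] for i in range(n_rows)])

def block_hankel(Y, n1, n2):
    M1 = Y.shape[0]
    return np.block([[hankel(Y[i + j], n2) for j in range(M1 - n1 + 1)]
                     for i in range(n1)])

def response_grid(M1, M2, x, theta, phi, d1=0.5, d2=0.5, lam=1.0):
    m1 = np.arange(M1)[:, None, None]
    m2 = np.arange(M2)[None, :, None]
    u = np.sin(phi) * np.cos(theta)
    v = np.sin(phi) * np.sin(theta)
    phase = 2 * np.pi / lam * (m1 * d1 * u + m2 * d2 * v)
    return np.sum(x * np.exp(1j * phase), axis=-1)

def near_loss(Y_hat, y_obs, rows, cols, m1, m2, K, reg_weight):
    """L_d + reg_weight * L_r for a predicted M1 x M2 complex response Y_hat."""
    M1, M2 = Y_hat.shape
    # Data term: mismatch at the observed (row, col) grid indices only.
    L_d = np.sum(np.abs(Y_hat[np.ix_(rows, cols)] - y_obs))
    # Regularizer: last Hankel column predicted from the first K columns.
    H_row = block_hankel(Y_hat, M1, M2 - K)
    H_col = block_hankel(Y_hat.T, M2, M1 - K)   # assumed column-direction form
    L_r = (np.linalg.norm(H_row[:, :K] @ m1 - H_row[:, K])
           + np.linalg.norm(H_col[:, :K] @ m2 - H_col[:, K]))
    return L_d + reg_weight * L_r

K = 2
theta, phi = np.deg2rad([20.0, 60.0]), np.deg2rad([50.0, 70.0])
x = np.array([1.0 + 0.5j, 0.7 - 0.2j])
Y = response_grid(8, 8, x, theta, phi)
rows, cols = [0, 1, 3, 6], [0, 2, 5, 7]          # illustrative sparse pattern
y_obs = Y[np.ix_(rows, cols)]

H_row, H_col = block_hankel(Y, 8, 8 - K), block_hankel(Y.T, 8, 8 - K)
m1 = np.linalg.lstsq(H_row[:, :K], H_row[:, K], rcond=None)[0]
m2 = np.linalg.lstsq(H_col[:, :K], H_col[:, K], rcond=None)[0]
loss_true = near_loss(Y, y_obs, rows, cols, m1, m2, K, reg_weight=0.1)
loss_off = near_loss(Y + 0.1, y_obs, rows, cols, m1, m2, K, reg_weight=0.1)
```

For the noiseless ground truth with matching coefficient vectors the loss is numerically zero, whereas a perturbed prediction is penalized by both terms. In an actual training loop, the prediction would come from the INR and the coefficient vectors would be optimized jointly with the network weights using automatic differentiation.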

5. Experiments

We evaluate the performance of NEAR on both simulated (Section 5.1) and real-world (Section 5.2) tasks. All experiments are run on a laptop with an AMD Ryzen 9 5900HS CPU with Radeon Graphics and an NVIDIA GeForce RTX 3050 Ti Laptop GPU. See Appendix C for more experimental results. The code is available at: https://github.com/J1mmyYu1/NEAR.

Baselines and Benchmark. We compare NEAR against four representative baselines: Enhanced Matrix Completion (EMaC) (Chen & Chi, 2013), SIREN (Sitzmann et al., 2020), NeRF2 (Zhao et al., 2023), and NEAR without Regularization (NEAR w/o R). More implementation details and analysis of these baseline methods can be found in Appendix C.2. For a fair comparison, we adopt the hyperparameters recommended by the original authors. Additionally, we include a 20 × 20 full virtual array response (noisy) as a benchmark reference.

NEAR Architecture. In both simulated and real-world settings, we employ the architecture described in Equation (3), with a depth of $L = 4$, ReLU activation function $\rho(\cdot) = \mathrm{ReLU}(\cdot)$, and positional encoding $\gamma(r)$ following NeRF's formulation in Equation (5) with $T = 10$. The hidden layer dimension is set to 256. Additional implementation details and hyperparameter configurations are provided in Appendix C.1.

5.1. Simulation Tasks

Response Recovery. We evaluate the response recovery performance of NEAR against baseline methods and the full virtual array benchmark, as summarized in Tables 1 - 3. The evaluation metric is the Normalized Root Mean Square Error (NRMSE), defined as

$$\mathrm{NRMSE} = \frac{1}{N} \sum_{n=1}^{N} \frac{\| \hat{Y}_n - Y_n \|_F}{\| Y_n \|_F},$$

where $\hat{Y}_n$ and $Y_n$ denote the predicted array response and the (noiseless) ground truth full virtual array response at the $n$-th realization, respectively, with $\|\cdot\|_F$ representing the Frobenius norm. Our method consistently outperforms all baselines across different evaluation settings, demonstrating superior generalization in response recovery tasks. Notably, NEAR achieves even lower error than the 20 × 20 full virtual array benchmark across different SNR levels with a fixed sampling number (Table 1). This can be attributed to the inherent denoising ability of our regularizer, which exploits the low-dimensional structure of the array response and provides a cleaner estimate of the response at a given coordinate than the actual noisy measurement at the same location. The poor performance of SIREN and NEAR w/o R across all settings suggests that these models struggle to learn the appropriate continuous response function in the absence of physics-informed regularization. This highlights the importance of incorporating prior knowledge into implicit neural representations for structured signal recovery. A more detailed analysis is provided in the Ablation Study.
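The NRMSE metric can be computed as in the short sketch below, which assumes each error is normalized by the Frobenius norm of the corresponding ground truth; the function name `nrmse` and the toy inputs are illustrative.

```python
import numpy as np

def nrmse(preds, truths):
    """Average over realizations of ||Y_hat - Y||_F / ||Y||_F."""
    ratios = [np.linalg.norm(p - t) / np.linalg.norm(t)   # Frobenius norm for 2D arrays
              for p, t in zip(preds, truths)]
    return float(np.mean(ratios))

rng = np.random.default_rng(0)
truths = [rng.normal(size=(20, 20)) + 1j * rng.normal(size=(20, 20)) for _ in range(5)]
exact = nrmse(truths, truths)
all_zero = nrmse([np.zeros_like(t) for t in truths], truths)
print(exact, all_zero)
```

Under this definition a perfect prediction scores 0 and an all-zero prediction scores exactly 1, which gives a reference point for reading values near 1 in the tables.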

Table 1. Averaged NRMSE of the response at different SNR levels. 8 × 8 sampling is employed for NEAR, EMaC, NEAR w/o R, and SIREN.

| Method | 10 dB | 20 dB | 30 dB |
| --- | --- | --- | --- |
| Benchmark | 0.2608 | 0.0825 | 0.0261 |
| NEAR | 0.2248 | 0.0495 | 0.0189 |
| EMaC | 0.3537 | 0.1889 | 0.0921 |
| NEAR w/o R | 1.0663 | 1.0504 | 1.0485 |
| SIREN | 1.0512 | 1.0277 | 1.0244 |

Table 2. Averaged NRMSE of the response for different sampling numbers at 20 dB with 2 targets.

| Method | 6 × 6 | 8 × 8 | 10 × 10 |
| --- | --- | --- | --- |
| NEAR | 0.1884 | 0.0495 | 0.0362 |
| EMaC | 0.5306 | 0.1889 | 0.0724 |
| NEAR w/o R | 1.0689 | 1.0504 | 1.0030 |
| SIREN | 1.0656 | 1.0277 | 0.9860 |

Table 3. Averaged NRMSE of the response for different numbers of targets at 20 dB. 8 × 8 sampling is employed for all methods.

| Method | 1 target | 2 targets | 3 targets |
| --- | --- | --- | --- |
| NEAR | 0.0382 | 0.0461 | 0.0860 |
| EMaC | 0.1454 | 0.1941 | 0.2503 |
| NEAR w/o R | 1.0399 | 1.0501 | 1.0308 |
| SIREN | 1.0077 | 1.0262 | 1.0543 |

Angular Resolution. The resolution probability (defined in Appendix C.1.2) of NEAR compared to the baselines and the full virtual array benchmark is illustrated in Figure 2. Our method consistently achieves the highest resolution probability among the baselines and closely follows the full array benchmark. While EMaC achieves comparable resolution probability for larger angle separations, its performance degrades significantly as the angle separation decreases. This is because convex relaxation techniques, such as the nuclear norm used in EMaC, impose separation conditions that inherently limit resolution, even in noise-free scenarios (Dai & Milenkovic, 2009). In contrast, NEAR demonstrates robust resolution across different separations.

[Figure 2 plot: resolution probability versus angle separation (3 to 10 deg) for Full Array, NEAR, EMaC, NEAR w/o R, and SIREN.]

Figure 2. Angular resolution performance vs. angle separations.

[Figure 3 plot: DOA estimation error (deg) versus the number of targets in a single range-Doppler bin (1, 2, or 3 targets) for Full Array, NEAR, and EMaC.]

Figure 3. DOA estimation accuracy vs. different number of targets.

DOA Estimation. The DOA estimation error of NEAR, compared to the baselines and the full virtual array benchmark, is presented in Figure 3. Both NEAR and EMaC achieve similar performance to the benchmark when estimating the DOA of a single target. However, for multiple targets, NEAR significantly outperforms EMaC, demonstrating superior robustness in resolving closely spaced sources. Notably, EMaC's performance deteriorates as the number of targets increases, whereas NEAR maintains a lower estimation error, showing its capacity to generalize effectively to more complex scenarios.

Ablation Study. Tables 1-3 present a comprehensive ablation study assessing the impact of the physics-informed regularizer on NEAR. Without this regularizer, implicit neural representations (INRs) merely perform data fitting on the observed array responses but fail to capture the inherent low-rank structure in the Hankel matrix of the noiseless full virtual array response. This limitation severely affects the model's ability to generalize beyond observed data. The findings confirm that leveraging physics-informed constraints allows NEAR to achieve superior signal reconstruction and DOA estimation accuracy, particularly in challenging multi-target scenarios.

5.2. Real-world Experiments

We further conduct experiments using a commercial MIMO radar platform (IMAGEVK-74), as shown in Figure 5. IMAGEVK-74 employs 20 Tx antennas on a vertical line and 20 Rx antennas on a horizontal line, resulting in a 20 × 20 virtual array. IMAGEVK-74 transmits a Stepped Frequency Continuous Wave (SFCW) waveform, and the bandwidth is set to 67-69 GHz. The antenna spacing is roughly half of the wavelength. After collecting the 20 × 20 full array response matrix, we select a subset of the data and treat it as a sparse set of measurements. Our procedure for active sensing using NEAR is depicted in Figure 4. Additional details on radar data processing (such as analog-to-digital conversion) across range and Doppler cells are included in Appendix C.3.

Angular resolution. To measure the angular resolution, we place two corner reflectors at the boresight of the radar and gradually reduce the spacing between them. We employ the same signal processing pipeline (e.g., beamforming) and record the angular separation at which the two targets merge in the radar angular spectrum. Table 4 shows the measured angular resolution with different setups. As the distance increases, the reflected signal becomes weaker and the SNR decreases. NEAR matches the angular resolution of the full array at every range setting in Table 4, confirming its robustness at lower SNR conditions in real-world environments.

Table 4. Smallest angular separation that can be resolved at various distances (SNR).

| Method | 2 m | 3 m | 4.5 m |
| --- | --- | --- | --- |
| Benchmark | 5.7248 | 6.6769 | 6.9941 |
| NEAR | 5.7248 | 6.6769 | 6.9941 |
| EMaC | 8.5783 | 9.5273 | 10.1592 |
| NeRF2 | 8.5783 | 8.5783 | 8.8948 |

Target localization. We place several corner reflectors (1 to 4) at random positions in the field of view of the radar and perform radar localization. The locations of the reflectors span 1 to 4 m in range, −45° to 45° in azimuth, and −20° to 20° in elevation. A total of 70 location samples are collected and their localization errors are calculated. Table 5 shows that NEAR outperforms the full array baseline, NeRF2, and EMaC in terms of mean absolute error. NEAR exhibits a denoising effect that improves the localization accuracy compared with the full array baseline. This denoising is achieved using only an upper bound on the number of targets (and not its exact value) to design the regularizer. The results further confirm NEAR's capability to work in a complicated real-world environment with multiple reflectors. See more experimental results in Appendix C.4.

Computation Time. Table 6 reports the averaged running times: NEAR (our approach) finishes in roughly 9 minutes, whereas EMaC and NeRF2 require about 20 and 21 minutes, respectively. These results highlight the potential for real-time implementation of our approach with future advances in algorithms and computing hardware.

[Figure 4 diagram. Block labels: Physical Array Response (Low Resolution); Range Processing; Range spectrum; Select range bins, and arrange frames; NEAR (Predict Unseen); Full Virtual Array Response (High Resolution); 2D MUSIC; Angular spectrum; Calculate coordinates.]

Figure 4. Radar active sensing workflow.

Figure 5. Left: 2D MIMO radar platform. Right: Real-world experimental setup.

Table 5. Localization accuracy for different numbers of targets (K) in the environment.

| Method | K = 1 | K = 2 | K = 3 | K = 4 |
| --- | --- | --- | --- | --- |
| Benchmark | 0.0827 | 0.0903 | 0.0965 | 0.0964 |
| NEAR | 0.0744 | 0.0770 | 0.0762 | 0.0718 |
| EMaC | 0.1062 | 0.1158 | 0.1170 | 0.1157 |
| NeRF2 | 0.4902 | 0.5096 | 0.4346 | 0.3898 |

6. Conclusions and Future Work

We proposed NEAR, the first framework that leverages Implicit Neural Representations to model and predict antenna array responses with sparse measurements without training data. By integrating harmonic signal structure and planar wave propagation models, NEAR effectively enables enhanced angular resolution and robustness in radar sensing applications. We believe NEAR represents the first step towards bridging the gap between deep learning-based neural fields and classical electromagnetic sensing and signal processing, unlocking new possibilities for super-resolution radar, wireless and autonomous sensing applications. Future work will focus on addressing the following challenges and improvements:

Computational Efficiency. Optimizing the framework for real-time inference on embedded radar hardware, reducing computational overhead while maintaining accuracy.

Table 6. Computation time comparison.

| Method | NEAR | NeRF2 | EMaC |
| --- | --- | --- | --- |
| Averaged time cost (s) | 550.83 | 1278.31 | 1226.15 |

Multi-Modal Sensor Fusion. Integrating NEAR with LiDAR, camera, and RF-based sensing to enhance robustness in complex environmental conditions.

Acknowledgements

We thank the reviewers for their insightful comments. Additionally, we would like to thank Xingyu Chen for helpful discussions and instructions on implementation and experimentation. We also thank Parthasarathi Khirwadkar and Mohamed Hamdy for their valuable comments and feedback. This work is generously supported by the UC San Diego Center for Wireless Communications, by ONR N00014-191-2227, DOE DE-SC0022165, and by NSF under grants CNS-2128588, CNS-2312715, CNS-2403124, and NSF 2124929.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here.

References

Benbarka, N., Höfer, T., Zell, A., et al. Seeing implicit neural representations as fourier series. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2041-2050, 2022.

Bijelic, M., Gruber, T., Mannan, F., Kraus, F., Ritter, W., Dietmayer, K., and Heide, F. Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11682-11692, 2020.

Bu, Y., Yu, J., and Pal, P. Prediction-driven untrained network for single-snapshot sparse array interpolation. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2025.

Caesar, H., Bankiti, V., Lang, A. H., Vora, S., Liong, V. E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., and Beijbom, O. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11621-11631, 2020.

Candes, E. and Recht, B. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111-119, 2012.

Chen, C.-Y. and Vaidyanathan, P. P. Mimo radar space time adaptive processing using prolate spheroidal wave functions. IEEE Transactions on Signal Processing, 56(2):623-635, 2008.

Chen, X., Feng, Z., Sun, K., Qian, K., and Zhang, X. RFCanvas: Modeling RF channel by fusing visual priors and few-shot RF measurements. In Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 2024.

Chen, Y. and Chi, Y. Spectral compressed sensing via structured matrix completion. In International conference on machine learning, pp. 414-422. PMLR, 2013.

Cheng, Q., Pal, P., Tsuji, M., and Hua, Y. An mdl algorithm for detecting more sources than sensors using outer-products of array output. IEEE Transactions on Signal Processing, 62(24):6438-6453, 2014. doi: 10.1109/TSP.2014.2364019.

CVX Research, I. CVX: Matlab software for disciplined convex programming, version 2.0. https://cvxr.com/cvx, August 2012.

Dai, W. and Milenkovic, O. Subspace pursuit for compressive sensing signal reconstruction. IEEE transactions on Information Theory, 55(5):2230-2249, 2009.

Ding, F., Wen, X., Zhu, Y., Li, Y., and Lu, C. X. Radarocc: Robust 3d occupancy prediction with 4d imaging radar. arXiv preprint arXiv:2405.14014, 2024.

Fathony, R., Sahu, A. K., Willmott, D., and Kolter, J. Z. Multiplicative filter networks. In International Conference on Learning Representations, 2020.

Grant, M. and Boyd, S. Graph implementations for nonsmooth convex programs. In Blondel, V., Boyd, S., and Kimura, H. (eds.), Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pp. 95-110. Springer-Verlag Limited, 2008. http://stanford.edu/~boyd/graph_dcp.html.

Hu, Y., Zhang, D., Ye, J., Li, X., and He, X. Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE transactions on pattern analysis and machine intelligence, 35(9):2117-2130, 2012.

Hua, Y. Estimating two-dimensional frequencies by matrix enhancement and matrix pencil. IEEE Transactions on Signal Processing, 40(9):2267-2280, 1992.

IMAGEVK-74. Mini-circuits vk-74 product page. https://www.minicircuits.com/WebStore/imagevk_74.html. [Online; accessed 29-January-2025].

Instruments, T. Short range radar reference design using awr1642. Technical report, Technical Report, 2017.

Kramer, A., Harlow, K., Williams, C., and Heckman, C. Coloradar: The direct 3d millimeter wave radar dataset. The International Journal of Robotics Research, 41(4):351-360, 2022.

Li, J. and Stoica, P. MIMO radar signal processing. John Wiley & Sons, 2008.

Li, Y.-J., Hunt, S., Park, J., O'Toole, M., and Kitani, K. Azimuth super-resolution for fmcw radar in autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17504-17513, 2023.

Liu, C.-L. and Vaidyanathan, P. Cramér-Rao bounds for coprime and other sparse arrays, which find more sources than sensors. Digital Signal Processing, 61:43-61, 2017. ISSN 1051-2004. Special Issue on Coprime Sampling and Arrays.

Lu, C., Tang, J., Yan, S., and Lin, Z. Nonconvex nonsmooth low rank minimization via iteratively reweighted nuclear norm. IEEE Transactions on Image Processing, 25(2):829-839, 2015.

Lu, H., Vattheuer, C., Mirzasoleiman, B., and Abari, O. Newrf: A deep learning framework for wireless radiation field reconstruction and channel prediction. In Forty-first International Conference on Machine Learning.

Mahdavifar, H., Rajamäki, R., and Pal, P. Subspace coding for spatial sensing. In 2024 IEEE International Symposium on Information Theory (ISIT), pp. 2394-2399, 2024. doi: 10.1109/ISIT57864.2024.10619248.

Mehmeti-Göpel, C. H. A., Hartmann, D., and Wand, M. Ringing relus: Harmonic distortion analysis of nonlinear feedforward networks. In International Conference on Learning Representations, 2020.

Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., and Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99-106, 2021.

Mohan, K. and Fazel, M. Iterative reweighted algorithms for matrix rank minimization. The Journal of Machine Learning Research, 13(1):3441-3473, 2012.

Oppenheim, A., Willsky, A., and Nawab, S. Signals & Systems. PHI Learning Private Limited, 2010. URL https://books.google.com/books?id=y-I9nwEACAAJ.

Pal, P. and Vaidyanathan, P. P. Nested arrays: A novel approach to array processing with enhanced degrees of freedom. IEEE Transactions on Signal Processing, 58(8):4167-4181, 2010.

Qiao, H. and Pal, P. Guaranteed localization of more sources than sensors with finite snapshots in multiple measurement vector models using difference co-arrays. IEEE Transactions on Signal Processing, 67(22):5715-5729, 2019. doi: 10.1109/TSP.2019.2943224.

Qin, S., Zhang, Y. D., and Amin, M. G. Generalized coprime array configurations for direction-of-arrival estimation. IEEE Transactions on Signal Processing, 63(6):1377-1390, 2015.

Rahaman, N., Baratin, A., Arpit, D., Draxler, F., Lin, M., Hamprecht, F., Bengio, Y., and Courville, A. On the spectral bias of neural networks. In International conference on machine learning, pp. 5301-5310. PMLR, 2019.

Roddenberry, T. M., Saragadam, V., de Hoop, M. V., and Baraniuk, R. G. Implicit neural representations and the algebra of complex wavelets. arXiv preprint arXiv:2310.00545, 2023.

Roy, R. and Kailath, T. Esprit-estimation of signal parameters via rotational invariance techniques. IEEE Transactions on acoustics, speech, and signal processing, 37(7):984-995, 1989.

Sarangi, P., Hücümenoğlu, M. C., and Pal, P. Single-snapshot nested virtual array completion: Necessary and sufficient conditions. IEEE Signal Processing Letters, 29:2113-2117, 2022.

Sarangi, P., Hücümenoğlu, M. C., Rajamäki, R., and Pal, P. Super-resolution with sparse arrays: A nonasymptotic analysis of spatiotemporal trade-offs. IEEE Transactions on Signal Processing, 71:4288-4302, 2023.

Scharf, L. L. and Demeure, C. Statistical signal processing: detection, estimation, and time series analysis. (No Title), 1991.

Schmidt, R. Multiple emitter location and signal parameter estimation. IEEE transactions on antennas and propagation, 34(3):276-280, 1986.

Sitzmann, V., Martel, J., Bergman, A., Lindell, D., and Wetzstein, G. Implicit neural representations with periodic activation functions. Advances in neural information processing systems, 33:7462-7473, 2020.

Sun, S. and Zhang, Y. D. 4d automotive radar sensing for autonomous vehicles: A sparsity-oriented approach. IEEE Journal of Selected Topics in Signal Processing, 15(4):879-891, 2021.

Sun, S., Petropulu, A. P., and Poor, H. V. Mimo radar for advanced driver-assistance systems and autonomous driving: Advantages and challenges. IEEE Signal Processing Magazine, 37(4):98-117, 2020.

Tancik, M., Srinivasan, P., Mildenhall, B., Fridovich-Keil, S., Raghavan, N., Singhal, U., Ramamoorthi, R., Barron, J., and Ng, R. Fourier features let networks learn high frequency functions in low dimensional domains. Advances in neural information processing systems, 33:7537-7547, 2020.

Van Trees, H. L. Optimum array processing: Part IV of detection, estimation, and modulation theory. John Wiley & Sons, 2002.

Wang, M. and Nehorai, A. Coarrays, music, and the Cramér-Rao bound. IEEE Transactions on Signal Processing, 65(4):933-946, 2017. doi: 10.1109/TSP.2016.2626255.

Yüce, G., Ortiz-Jiménez, G., Besbinar, B., and Frossard, P. A structured dictionary perspective on implicit neural representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19228-19238, 2022.

Zhao, X., An, Z., Pan, Q., and Yang, L. Nerf2: Neural radiofrequency radiance fields. In Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, pp. 1-15, 2023.

A. Proof of Theorem 4.1

A.1. Notations and Definitions

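Let $A, B \subseteq \mathbb{R}^n$ be two sets in $n$-dimensional Euclidean space. The Minkowski sum and difference of $A$ and $B$ are denoted by $A + B$ and $A - B$, respectively, and defined as:

$$A + B = \{a + b \mid a \in A,\ b \in B\}, \qquad A - B = \{a - b \mid a \in A,\ b \in B\}.$$

Additionally, we define $D(A, B)$ as the union of the Minkowski sum and difference of $A$ and $B$, given by:

$$D(A, B) := (A + B) \cup (A - B).$$

We define $U^{(q)}$ and $B^{(q)}$ as follows:

$$U^{(q)} = \Big\{ s^{(q)} = \big[s^{(q)}_1, \dots, s^{(q)}_T\big]^\top \ \Big|\ s^{(q)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s^{(q)}_t| \le q \Big\},$$

$$B^{(q)} = \Big\{ s^{(q)} = \big[s^{(q)}_1, \dots, s^{(q)}_T\big]^\top \ \Big|\ s^{(q)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s^{(q)}_t| = q \Big\}.$$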
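For small T and q these sets can be enumerated directly, which may help when checking the set identities used in the lemmas below. The following is a minimal sketch; the helper names `U`, `B`, and `D` simply mirror the notation above and are not taken from any released code.

```python
from itertools import product

def U(q, T):
    """Integer vectors of length T with l1-norm at most q."""
    return {s for s in product(range(-q, q + 1), repeat=T)
            if sum(abs(x) for x in s) <= q}

def B(q, T):
    """Integer vectors of length T with l1-norm exactly q."""
    return {s for s in product(range(-q, q + 1), repeat=T)
            if sum(abs(x) for x in s) == q}

def D(A, C):
    """Union of the Minkowski sum and the Minkowski difference of A and C."""
    plus = {tuple(a_i + c_i for a_i, c_i in zip(a, c)) for a in A for c in C}
    minus = {tuple(a_i - c_i for a_i, c_i in zip(a, c)) for a in A for c in C}
    return plus | minus

# Example with T = 2: combining two unit-norm shells gives norms 0 and 2.
shell_pair = D(B(1, 2), B(1, 2))
```

For instance, with T = 2 the set D(B(1), B(1)) contains the zero vector together with every vector of l1-norm 2, and D(U(2), U(1)) coincides with U(3), in line with the identity proved in Lemma A.4.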

Recall that we are interested in analyzing the expressive power of INR architectures, which consist of a mapping function $\gamma : \mathbb{R}^D \to \mathbb{R}^T$ (positional encoding) followed by a multilayer perceptron (MLP). The MLP is parameterized by weights $W^{(\ell)} \in \mathbb{R}^{F_\ell \times F_{\ell-1}}$, biases $b^{(\ell)} \in \mathbb{R}^{F_\ell}$, and activation functions $\rho^{(\ell)} : \mathbb{R} \to \mathbb{R}$ applied elementwise at each layer $\ell = 1, \dots, L-1$. Specifically, denoting the post-activation output of each layer as $z^{(\ell)}$, most INR architectures compute:

$$z^{(0)} = \gamma(r),$$

$$z^{(\ell)} = \rho^{(\ell)}\big(W^{(\ell)} z^{(\ell-1)} + b^{(\ell)}\big), \quad \ell = 1, \dots, L-1,$$

$$f_\Theta(r) = W^{(L)} z^{(L-1)} + b^{(L)},$$

where $r \in \mathbb{R}^D$ is the input coordinate. As it is plausible to normalize the inputs $r$ to their bounds, we assume that each variable's period is 1 (normalized to $[0, 1)$) or 2 (normalized to $[-1, 1)$).

A.2. Lemma A.1 and Proof

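Lemma A.1. Let $\Omega = [\omega_1, \dots, \omega_T]^\top \in \mathbb{R}^{T \times D}$ be a frequency matrix, and let $S_1, S_2 \subset \mathbb{R}^T$ denote two sets of weights corresponding to these frequencies. Additionally, let $\{\phi_{s_1} \in \mathbb{R} \mid s_1 \in S_1\}$ and $\{\phi_{s_2} \in \mathbb{R} \mid s_2 \in S_2\}$ represent two collections of scalar phases, and $\{\beta_{s_1} \in \mathbb{R} \mid s_1 \in S_1\}$ and $\{\beta_{s_2} \in \mathbb{R} \mid s_2 \in S_2\}$ two corresponding sets of scalar coefficients. For any $r \in \mathbb{R}^D$, the following holds:

$$\Big(\sum_{s_1 \in S_1} \beta_{s_1} \cos\big(\langle \Omega^\top s_1, r\rangle + \phi_{s_1}\big)\Big) \Big(\sum_{s_2 \in S_2} \beta_{s_2} \cos\big(\langle \Omega^\top s_2, r\rangle + \phi_{s_2}\big)\Big) = \sum_{s' \in D(S_1, S_2)} \tilde\beta_{s'} \cos\big(\langle \Omega^\top s', r\rangle + \tilde\phi_{s'}\big),$$

where

$$D(S_1, S_2) = (S_1 + S_2) \cup (S_1 - S_2), \tag{13}$$

and $\{\tilde\phi_{s'} \in \mathbb{R} \mid s' \in D(S_1, S_2)\}$ and $\{\tilde\beta_{s'} \in \mathbb{R} \mid s' \in D(S_1, S_2)\}$ denote the resulting scalar phases and coefficients, respectively.

Proof. Expanding the product and applying the product-to-sum identity for cosines gives

$$\begin{aligned}
&\Big(\sum_{s_1 \in S_1} \beta_{s_1} \cos\big(\langle \Omega^\top s_1, r\rangle + \phi_{s_1}\big)\Big) \Big(\sum_{s_2 \in S_2} \beta_{s_2} \cos\big(\langle \Omega^\top s_2, r\rangle + \phi_{s_2}\big)\Big) \\
&\quad= \sum_{s_1 \in S_1} \sum_{s_2 \in S_2} \beta_{s_1}\beta_{s_2} \cos\big(\langle \Omega^\top s_1, r\rangle + \phi_{s_1}\big) \cos\big(\langle \Omega^\top s_2, r\rangle + \phi_{s_2}\big) \\
&\quad= \sum_{s_1 \in S_1} \sum_{s_2 \in S_2} \beta_{s_1}\beta_{s_2}\, \frac{1}{2} \Big[ \cos\big(\langle \Omega^\top (s_1 + s_2), r\rangle + \phi_{s_1} + \phi_{s_2}\big) + \cos\big(\langle \Omega^\top (s_1 - s_2), r\rangle + \phi_{s_1} - \phi_{s_2}\big) \Big] \\
&\quad= \sum_{s' \in D(S_1, S_2)} \tilde\beta_{s'} \cos\big(\langle \Omega^\top s', r\rangle + \tilde\phi_{s'}\big).
\end{aligned}$$

The last equality combines terms with the same frequency, where $\tilde\beta_{s'}$ and $\tilde\phi_{s'}$ represent the resultant magnitude and phase, respectively, obtained through phasor addition after grouping. This technique will be used repeatedly in the following subsections to simplify analogous expressions.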
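The identity in Lemma A.1, including the phasor-addition step, can be checked numerically. The sketch below is only an illustration under assumed data: the frequency matrix, the query point, and the two small coefficient sets are arbitrary, and the helper names `cos_sum` and `product_to_sum` are hypothetical.

```python
import numpy as np

def cos_sum(terms, Omega, r):
    """Evaluate sum_s beta_s * cos(<Omega^T s, r> + phi_s); terms maps s -> (beta, phi)."""
    return sum(beta * np.cos((Omega.T @ np.array(s)) @ r + phi)
               for s, (beta, phi) in terms.items())

def product_to_sum(terms1, terms2):
    """Coefficients and phases of the product, grouped by frequency via phasor addition."""
    phasors = {}
    for s1, (b1, p1) in terms1.items():
        for s2, (b2, p2) in terms2.items():
            for sign in (1, -1):                 # s1 + s2 and s1 - s2
                s = tuple(a + sign * b for a, b in zip(s1, s2))
                phasors[s] = phasors.get(s, 0j) + 0.5 * b1 * b2 * np.exp(1j * (p1 + sign * p2))
    return {s: (abs(P), float(np.angle(P))) for s, P in phasors.items()}

rng = np.random.default_rng(0)
Omega = rng.normal(size=(3, 2))                  # T = 3 frequencies, D = 2 coordinates
r = rng.normal(size=2)
S1 = {(1, 0, 0): (0.8, 0.3), (0, -1, 0): (1.5, -1.0)}
S2 = {(0, 1, 0): (0.6, 2.0), (0, 0, 1): (-0.4, 0.5)}

lhs = cos_sum(S1, Omega, r) * cos_sum(S2, Omega, r)
rhs = cos_sum(product_to_sum(S1, S2), Omega, r)
```

Storing each term as a complex phasor makes the grouping step a plain sum, and the magnitude and angle of the accumulated phasor give the resulting coefficient and phase.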

A.3. Lemma A.2 and Proof

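Lemma A.2. Let $\Omega = [\omega_1, \dots, \omega_T]^\top \in \mathbb{R}^{T \times D}$ be a frequency matrix, and let $E_T = \{e_t \in \mathbb{R}^T \mid t \in \mathbb{Z},\ 1 \le t \le T\}$ denote the set of canonical basis vectors of $\mathbb{R}^T$. Define $S^{(1)}$ as the augmented set of $E_T$, given by $S^{(1)} = \{e_t, -e_t \mid e_t \in E_T\}$. Additionally, let $\{\phi_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\}$ be a collection of scalar phases, and $\{\beta_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\}$ the corresponding set of scalar coefficients. For any $r \in \mathbb{R}^D$ and $q \in \mathbb{N}$, the following equality holds:

$$\Big(\sum_{s^{(1)} \in S^{(1)}} \beta_{s^{(1)}} \cos\big(\langle \Omega^\top s^{(1)}, r\rangle + \phi_{s^{(1)}}\big)\Big)^{q} = \sum_{s^{(q)} \in S^{(q)}} \tilde\beta_{s^{(q)}} \cos\big(\langle \Omega^\top s^{(q)}, r\rangle + \tilde\phi_{s^{(q)}}\big), \tag{14}$$

where

$$S^{(q)} := D\big(S^{(q-1)}, S^{(1)}\big), \qquad B^{(q)} \subseteq S^{(q)} \subseteq U^{(q)}, \tag{15}$$

for some $\{\tilde\phi_{s^{(q)}} \in \mathbb{R} \mid s^{(q)} \in S^{(q)}\}$ and $\{\tilde\beta_{s^{(q)}} \in \mathbb{R} \mid s^{(q)} \in S^{(q)}\}$.

Proof. We use induction. The statement trivially holds for $q = 1$ since $B^{(1)} = S^{(1)} \subseteq U^{(1)}$. Assume it also holds for some $q \ge 1$; we first show that $S^{(q+1)} \subseteq U^{(q+1)}$. Using the induction hypothesis and Lemma A.1, we have:

$$\begin{aligned}
\Big(\sum_{s^{(1)} \in S^{(1)}} \beta_{s^{(1)}} \cos\big(\langle \Omega^\top s^{(1)}, r\rangle + \phi_{s^{(1)}}\big)\Big)^{q+1}
&= \Big(\sum_{s^{(1)} \in S^{(1)}} \beta_{s^{(1)}} \cos\big(\langle \Omega^\top s^{(1)}, r\rangle + \phi_{s^{(1)}}\big)\Big)^{q} \Big(\sum_{s^{(1)} \in S^{(1)}} \beta_{s^{(1)}} \cos\big(\langle \Omega^\top s^{(1)}, r\rangle + \phi_{s^{(1)}}\big)\Big) \\
&= \Big(\sum_{s^{(q)} \in S^{(q)}} \tilde\beta_{s^{(q)}} \cos\big(\langle \Omega^\top s^{(q)}, r\rangle + \tilde\phi_{s^{(q)}}\big)\Big) \Big(\sum_{s^{(1)} \in S^{(1)}} \beta_{s^{(1)}} \cos\big(\langle \Omega^\top s^{(1)}, r\rangle + \phi_{s^{(1)}}\big)\Big) \\
&= \sum_{s^{(q+1)} \in D(S^{(q)}, S^{(1)})} \tilde\beta_{s^{(q+1)}} \cos\big(\langle \Omega^\top s^{(q+1)}, r\rangle + \tilde\phi_{s^{(q+1)}}\big) \\
&= \sum_{s^{(q+1)} \in S^{(q+1)}} \tilde\beta_{s^{(q+1)}} \cos\big(\langle \Omega^\top s^{(q+1)}, r\rangle + \tilde\phi_{s^{(q+1)}}\big),
\end{aligned}$$

where

$$\begin{aligned}
S^{(q+1)} = D\big(S^{(q)}, S^{(1)}\big)
&= \big\{ s^{(q+1)} = s^{(q)} \pm s^{(1)} \ \big|\ s^{(q)} \in S^{(q)},\ s^{(1)} \in S^{(1)} \big\} \\
&\subseteq \big\{ s^{(q+1)} = s^{(q)} \pm s^{(1)} \ \big|\ s^{(q)} \in U^{(q)},\ s^{(1)} \in S^{(1)} \big\} \\
&= \Big\{ s^{(q+1)} = \big[s^{(q)}_1, \dots, s^{(q)}_T\big]^\top \pm e_t \ \Big|\ s^{(q)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s^{(q)}_t| \le q,\ e_t \in S^{(1)} \Big\} \\
&\subseteq \Big\{ s^{(q+1)} = \big[s^{(q+1)}_1, \dots, s^{(q+1)}_T\big]^\top \ \Big|\ s^{(q+1)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s^{(q+1)}_t| \le q + 1 \Big\} = U^{(q+1)},
\end{aligned}$$

where the last line follows from the triangle inequality. Next, we show that, given the assumption, we have $B^{(q+1)} \subseteq S^{(q+1)}$:

$$\begin{aligned}
S^{(q+1)} = D\big(S^{(q)}, S^{(1)}\big)
&= \big\{ s^{(q+1)} = s^{(q)} \pm s^{(1)} \ \big|\ s^{(q)} \in S^{(q)},\ s^{(1)} \in S^{(1)} \big\} \\
&\supseteq \big\{ s^{(q+1)} = s^{(q)} \pm s^{(1)} \ \big|\ s^{(q)} \in B^{(q)},\ s^{(1)} \in S^{(1)} \big\} \\
&= \Big\{ s^{(q+1)} = \big[s^{(q)}_1, \dots, s^{(q)}_T\big]^\top \pm e_t \ \Big|\ s^{(q)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s^{(q)}_t| = q,\ e_t \in S^{(1)} \Big\} \\
&\supseteq \Big\{ s^{(q+1)} = \big[s^{(q+1)}_1, \dots, s^{(q+1)}_T\big]^\top \ \Big|\ s^{(q+1)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s^{(q+1)}_t| = q + 1 \Big\} = B^{(q+1)}.
\end{aligned}$$

Therefore we have $B^{(q+1)} \subseteq S^{(q+1)} \subseteq U^{(q+1)}$. Thus, by induction, (14) holds for all $q \in \mathbb{N}$.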

A.4. Lemma A.3 and Proof

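Lemma A.3. Let $\Omega = [\omega_1, \dots, \omega_T]^\top \in \mathbb{R}^{T \times D}$ be a frequency matrix, and let $E_T = \{e_t \in \mathbb{R}^T \mid t \in \mathbb{Z},\ 1 \le t \le T\}$ denote the set of canonical basis vectors of $\mathbb{R}^T$. Define $S^{(1)}$ as the augmented set of $E_T$, given by $S^{(1)} = \{e_t, -e_t \mid e_t \in E_T\}$. Additionally, let $\{\phi_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\}$ be a collection of scalar phases, and $\{\beta_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\}$ the corresponding set of scalar coefficients. For any $r \in \mathbb{R}^D$, $Q \in \mathbb{N}$ and $\alpha_q \in \mathbb{R}$ ($q = 1, \dots, Q$), the following equality holds:

$$\sum_{q=1}^{Q} \alpha_q \Big(\sum_{s^{(1)} \in S^{(1)}} \beta_{s^{(1)}} \cos\big(\langle \Omega^\top s^{(1)}, r\rangle + \phi_{s^{(1)}}\big)\Big)^{q} = \sum_{\tilde s^{(Q)} \in \tilde S^{(Q)}} \tilde\beta_{\tilde s^{(Q)}} \cos\big(\langle \Omega^\top \tilde s^{(Q)}, r\rangle + \tilde\phi_{\tilde s^{(Q)}}\big), \tag{16}$$

where

$$\tilde S^{(Q)} = \Big\{ \tilde s^{(Q)} = \big[\tilde s^{(Q)}_1, \dots, \tilde s^{(Q)}_T\big]^\top \ \Big|\ \tilde s^{(Q)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |\tilde s^{(Q)}_t| \le Q \Big\} = U^{(Q)} \tag{17}$$

for some $\{\tilde\phi_{\tilde s^{(Q)}} \in \mathbb{R} \mid \tilde s^{(Q)} \in \tilde S^{(Q)}\}$ and $\{\tilde\beta_{\tilde s^{(Q)}} \in \mathbb{R} \mid \tilde s^{(Q)} \in \tilde S^{(Q)}\}$.

Proof. According to Lemma A.2, we have:

$$\begin{aligned}
\sum_{q=1}^{Q} \alpha_q \Big(\sum_{s^{(1)} \in S^{(1)}} \beta_{s^{(1)}} \cos\big(\langle \Omega^\top s^{(1)}, r\rangle + \phi_{s^{(1)}}\big)\Big)^{q}
&= \sum_{q=1}^{Q} \alpha_q \sum_{s^{(q)} \in S^{(q)}} \tilde\beta_{s^{(q)}} \cos\big(\langle \Omega^\top s^{(q)}, r\rangle + \tilde\phi_{s^{(q)}}\big) \\
&= \sum_{\tilde s^{(Q)} \in \tilde S^{(Q)}} \tilde\beta_{\tilde s^{(Q)}} \cos\big(\langle \Omega^\top \tilde s^{(Q)}, r\rangle + \tilde\phi_{\tilde s^{(Q)}}\big),
\end{aligned}$$

where $\tilde S^{(Q)} = \bigcup_{q=1}^{Q} S^{(q)}$. Since $B^{(q)} \subseteq S^{(q)} \subseteq U^{(q)}$ for $q \in \mathbb{N}$, we then have:

$$\bigcup_{q=1}^{Q} B^{(q)} \subseteq \tilde S^{(Q)} \subseteq \bigcup_{q=1}^{Q} U^{(q)}.$$

According to the definitions of $B^{(q)}$ and $U^{(q)}$, we have:

$$\bigcup_{q=1}^{Q} B^{(q)} = \bigcup_{q=1}^{Q} \Big\{ s^{(q)} = \big[s^{(q)}_1, \dots, s^{(q)}_T\big]^\top \ \Big|\ s^{(q)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s^{(q)}_t| = q \Big\} = \Big\{ \tilde s^{(Q)} = \big[\tilde s^{(Q)}_1, \dots, \tilde s^{(Q)}_T\big]^\top \ \Big|\ \tilde s^{(Q)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |\tilde s^{(Q)}_t| \in \{1, 2, \dots, Q\} \Big\},$$

$$\bigcup_{q=1}^{Q} U^{(q)} = \bigcup_{q=1}^{Q} \Big\{ s^{(q)} = \big[s^{(q)}_1, \dots, s^{(q)}_T\big]^\top \ \Big|\ s^{(q)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s^{(q)}_t| \le q \Big\} = \Big\{ s^{(q)} = \big[s^{(q)}_1, \dots, s^{(q)}_T\big]^\top \ \Big|\ s^{(q)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s^{(q)}_t| \le Q \Big\} = U^{(Q)}.$$

Moreover, $\tilde S^{(Q)} \supseteq B^{(0)} = \big\{ s^{(0)} = [s^{(0)}_1, \dots, s^{(0)}_T]^\top \mid s^{(0)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s^{(0)}_t| = 0 \big\}$, since it is easy to verify that $S^{(2)} \supseteq B^{(0)}$ (this step uses $Q \ge 2$). Because $\bigcup_{q=1}^{Q} B^{(q)} \cup B^{(0)} = U^{(Q)}$, we have

$$\bigcup_{q=1}^{Q} B^{(q)} \cup B^{(0)} \subseteq \tilde S^{(Q)} \subseteq U^{(Q)} \implies \tilde S^{(Q)} = U^{(Q)}.$$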
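The set identity behind Lemma A.3 can be verified by brute force for small T and Q. The sketch below builds the chain S(1), S(2), ..., S(Q) recursively and compares its union with the l1-ball U(Q); the helper names are hypothetical and the check is an illustration rather than a proof.

```python
from itertools import product

def l1_ball(Q, T):
    """U^(Q): integer vectors of length T with l1-norm at most Q."""
    return {s for s in product(range(-Q, Q + 1), repeat=T)
            if sum(abs(x) for x in s) <= Q}

def D(A, C):
    """Union of Minkowski sum and difference."""
    return ({tuple(a_i + c_i for a_i, c_i in zip(a, c)) for a in A for c in C}
            | {tuple(a_i - c_i for a_i, c_i in zip(a, c)) for a in A for c in C})

def S_chain(Q, T):
    """Return [S^(1), ..., S^(Q)] with S^(q) = D(S^(q-1), S^(1))."""
    basis = [tuple(1 if i == t else 0 for i in range(T)) for t in range(T)]
    S1 = set(basis) | {tuple(-x for x in e) for e in basis}
    chain = [S1]
    for _ in range(Q - 1):
        chain.append(D(chain[-1], S1))
    return chain

chain = S_chain(Q=3, T=2)
union = set().union(*chain)
```

For T = 2 and Q = 3 the union equals U(3). For Q = 1 the zero vector is absent from S(1), so the equality with U(1) does not hold in that edge case, which is consistent with the zero frequency first appearing in S(2).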

A.5. Lemma A.4 and Proof

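Lemma A.4. Let $\Omega = [\omega_1, \dots, \omega_T]^\top \in \mathbb{R}^{T \times D}$ be a frequency matrix, and let $E_T = \{e_t \in \mathbb{R}^T \mid t \in \mathbb{Z},\ 1 \le t \le T\}$ denote the set of canonical basis vectors of $\mathbb{R}^T$. Define $S^{(1)}$ as the augmented set of $E_T$, given by $S^{(1)} = \{e_t, -e_t \mid e_t \in E_T\}$. Additionally, let $\{\phi_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\}$ be a collection of scalar phases, and $\{\beta_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\}$ the corresponding set of scalar coefficients. For any $r \in \mathbb{R}^D$ and $q, p \in \mathbb{N}$, the following equality holds:

$$\Big(\sum_{s^{(p)} \in U^{(p)}} \beta_{s^{(p)}} \cos\big(\langle \Omega^\top s^{(p)}, r\rangle + \phi_{s^{(p)}}\big)\Big)^{q} = \sum_{s^{(qp)} \in U^{(qp)}} \tilde\beta_{s^{(qp)}} \cos\big(\langle \Omega^\top s^{(qp)}, r\rangle + \tilde\phi_{s^{(qp)}}\big)$$

for some $\{\tilde\beta_{s^{(qp)}} \in \mathbb{R} \mid s^{(qp)} \in U^{(qp)}\}$ and $\{\tilde\phi_{s^{(qp)}} \in \mathbb{R} \mid s^{(qp)} \in U^{(qp)}\}$.

Proof. Again we use induction. The statement trivially holds for $q = 1$. Assume it also holds for some $q \ge 1$; then

$$\begin{aligned}
\Big(\sum_{s^{(p)} \in U^{(p)}} \beta_{s^{(p)}} \cos\big(\langle \Omega^\top s^{(p)}, r\rangle + \phi_{s^{(p)}}\big)\Big)^{q+1}
&= \Big(\sum_{s^{(p)} \in U^{(p)}} \beta_{s^{(p)}} \cos\big(\langle \Omega^\top s^{(p)}, r\rangle + \phi_{s^{(p)}}\big)\Big)^{q} \Big(\sum_{s^{(p)} \in U^{(p)}} \beta_{s^{(p)}} \cos\big(\langle \Omega^\top s^{(p)}, r\rangle + \phi_{s^{(p)}}\big)\Big) \\
&= \Big(\sum_{s^{(qp)} \in U^{(qp)}} \tilde\beta_{s^{(qp)}} \cos\big(\langle \Omega^\top s^{(qp)}, r\rangle + \tilde\phi_{s^{(qp)}}\big)\Big) \Big(\sum_{s^{(p)} \in U^{(p)}} \beta_{s^{(p)}} \cos\big(\langle \Omega^\top s^{(p)}, r\rangle + \phi_{s^{(p)}}\big)\Big) \\
&= \sum_{s' \in D(U^{(qp)}, U^{(p)})} \tilde\beta_{s'} \cos\big(\langle \Omega^\top s', r\rangle + \tilde\phi_{s'}\big),
\end{aligned}$$

where the second equality holds by assumption, and the last equality holds by Lemma A.1. Moreover, we have

$$\begin{aligned}
D\big(U^{(qp)}, U^{(p)}\big)
&= \big\{ s' = s^{(qp)} + s^{(p)} \ \big|\ s^{(qp)} \in U^{(qp)},\ s^{(p)} \in U^{(p)} \big\} \\
&= \Big\{ s' = \big[s^{(qp)}_t\big]_{1 \le t \le T} + \big[s^{(p)}_t\big]_{1 \le t \le T} \ \Big|\ s^{(qp)}_t, s^{(p)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s^{(qp)}_t| \le qp,\ \sum_{t=1}^{T} |s^{(p)}_t| \le p \Big\} \\
&\subseteq \Big\{ s' = [s'_t]_{1 \le t \le T} \ \Big|\ s'_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s'_t| \le (q+1)p \Big\} \\
&= U^{((q+1)p)},
\end{aligned}$$

where the third line follows from the triangle inequality. (The first line uses only sums because $U^{(p)}$ is symmetric under negation, so the Minkowski sum and difference coincide.) Next, we show that $U^{((q+1)p)} \subseteq D(U^{(qp)}, U^{(p)})$. For $u = [u_1, \dots, u_T]^\top \in U^{((q+1)p)}$, we would like to construct two vectors $v = [v_1, \dots, v_T]^\top \in U^{(qp)}$ and $w = [w_1, \dots, w_T]^\top \in U^{(p)}$ such that $u = v + w$.

Let $\bar u = \sum_{t=1}^{T} |u_t|$ and $\bar v = \min(\bar u, qp)$. It is easy to see that $\bar u \le (q+1)p$ by definition. Suppose $t^{*} \le T$ is the largest integer that satisfies $\sum_{t=1}^{t^{*}} |u_t| \le \bar v$. If $t^{*} = T$, it follows that $v = u$, $w = 0$, and hence the statement holds. If $t^{*} < T$, we can argue that $|u_{t^{*}+1}| > \bar v - \sum_{t=1}^{t^{*}} |u_t| =: u^{*}$; otherwise $\sum_{t=1}^{t^{*}+1} |u_t| \le \bar v$ contradicts the assumption that $t^{*} \le T$ is the largest integer that satisfies $\sum_{t=1}^{t^{*}} |u_t| \le \bar v$. Thus, we can construct such $v$ and $w$ as:

$$v_t = \mathbb{1}\{t \le t^{*}\}\, u_t + \mathbb{1}\{t = t^{*}+1\}\, \operatorname{sgn}(u_{t^{*}+1})\, u^{*}, \quad \forall t \in \mathbb{N}_{+},\ t \le T,$$

$$w_t = \mathbb{1}\{t = t^{*}+1\}\, \operatorname{sgn}(u_{t^{*}+1}) \big(|u_{t^{*}+1}| - u^{*}\big) + \mathbb{1}\{t > t^{*}+1\}\, u_t, \quad \forall t \in \mathbb{N}_{+},\ t \le T,$$

where $\mathbb{1}\{\cdot\}$ denotes the indicator function and $\operatorname{sgn}(\cdot)$ represents the sign function. It is easy to see that $\bar v = \sum_{t=1}^{T} |v_t|$ and $\bar u = \bar v + \sum_{t=1}^{T} |w_t|$. We now verify that $v \in U^{(qp)}$ and $w \in U^{(p)}$. By construction, $\sum_{t=1}^{T} |v_t| = \sum_{t=1}^{t^{*}} |u_t| + u^{*} = \bar v = \min(\bar u, qp) \le qp$, and $\sum_{t=1}^{T} |w_t| = \bar u - \bar v = \bar u - \min(\bar u, qp) = \max(0, \bar u - qp) \le p$ since $\bar u \le (q+1)p$, which completes the construction. Therefore, we have $U^{((q+1)p)} \subseteq D(U^{(qp)}, U^{(p)})$, and hence $U^{((q+1)p)} = D(U^{(qp)}, U^{(p)})$.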
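The construction of v and w in the proof can be written as a short greedy routine: coordinates of u are assigned to v until an l1 budget of min(‖u‖₁, qp) is used up, and the remainder goes to w. The sketch below is a restatement of that idea under our reading of the proof; the function name `split_l1` is hypothetical.

```python
from itertools import product

def split_l1(u, q, p):
    """Split an integer vector u with ||u||_1 <= (q+1)p into v + w,
    where ||v||_1 <= q*p and ||w||_1 <= p."""
    u_bar = sum(abs(x) for x in u)
    if u_bar > (q + 1) * p:
        raise ValueError("u must satisfy ||u||_1 <= (q + 1) * p")
    budget = min(u_bar, q * p)          # l1 mass that v is allowed to absorb
    v, w = [], []
    for x in u:
        take = min(abs(x), budget)      # full entry, the remainder u*, or nothing
        sign = (x > 0) - (x < 0)
        v.append(sign * take)
        w.append(sign * (abs(x) - take))
        budget -= take
    return v, w

v, w = split_l1([3, -2, 1], q=2, p=2)
```

For u = [3, -2, 1] with q = 2 and p = 2, the routine returns v = [3, -1, 0] and w = [0, -1, 1], so the second coordinate is the one that gets split between the two vectors.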

A.6. Main Proof of Theorem 4.1

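Proof. To begin the proof of Theorem 4.1, we first prove the following two statements.

Define $\bar z^{(1)} = W^{(1)} z^{(0)} + b^{(1)}$ as the pre-activation output of the first layer, where $z^{(0)} = \gamma(r) = \sin(\Omega_T r + \phi_T)$. Let $\{\phi_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\}$ be a collection of scalar phases, and $\{\beta_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\}$ the corresponding set of scalar coefficients. Let $\bar z^{(1)}_i$ and $b^{(1)}_i$ denote the $i$th entries of $\bar z^{(1)}$ and $b^{(1)}$, respectively, and let $W^{(1)}_i$ represent the $i$th row of $W^{(1)}$. We first would like to show that, given $\{\phi_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\}$ and $\{\beta_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\}$ ($S^{(1)}$ is defined in Lemma A.2):

$$\sum_{s^{(1)} \in S^{(1)}} \beta_{s^{(1)}} \cos\big(\langle \Omega_T^\top s^{(1)}, r\rangle + \phi_{s^{(1)}}\big) + \zeta = \bar z^{(1)}_i = W^{(1)}_i \sin(\Omega_T r + \phi_T) + b^{(1)}_i$$

for some $W^{(1)}_i \in \mathbb{R}^{1 \times T}$ and $b^{(1)}_i \in \mathbb{R}$. Note that adding a constant does not affect the frequencies, and interchanging sines with cosines only affects the phase terms. We can express the summation as follows:

$$\begin{aligned}
\sum_{s^{(1)} \in S^{(1)}} \beta_{s^{(1)}} \cos\big(\langle \Omega_T^\top s^{(1)}, r\rangle + \phi_{s^{(1)}}\big)
&= \sum_{t=1}^{T} \Big[ \beta_t \cos\big(\langle \omega_t, r\rangle + \phi_t\big) + \beta_{-t} \cos\big(\langle -\omega_t, r\rangle + \phi_{-t}\big) \Big] \\
&= \sum_{t=1}^{T} \Big[ \beta_t \sin\big(\langle \omega_t, r\rangle + \phi_t + \tfrac{\pi}{2}\big) + \beta_{-t} \cos\big(\langle \omega_t, r\rangle - \phi_{-t}\big) \Big] \\
&= \sum_{t=1}^{T} R_t \sin\big(\langle \omega_t, r\rangle + \tilde\phi_t\big).
\end{aligned}$$

The final line follows from the Auxiliary Angle Formula, with:

$$R_t = \sqrt{A_t^2 + B_t^2}, \qquad \tilde\phi_t = \arctan\frac{B_t}{A_t},$$

$$A_t = \beta_t \cos\big(\phi_t + \tfrac{\pi}{2}\big) + \beta_{-t} \sin(\phi_{-t}), \qquad B_t = \beta_t \sin\big(\phi_t + \tfrac{\pi}{2}\big) + \beta_{-t} \cos(\phi_{-t}),$$

where we assume $A_t > 0$ for all $t = 1, 2, \dots, T$, $R_t$ represents the magnitude, and $\tilde\phi_t$ is the adjusted phase angle. For the case where $A_t \le 0$, we leave the derivation as an exercise for the reader. Let $W^{(1)}_i = [R_t]_{1 \le t \le T}$, $\phi_T = [\tilde\phi_t]_{1 \le t \le T}$, and $b^{(1)}_i = \zeta$. Then, we can conclude that the statement holds for $i = 1, \dots, F_1$.

Second, we would like to show that, given $W^{(1)}_i \in \mathbb{R}^{1 \times T}$ and $b^{(1)}_i \in \mathbb{R}$,

$$W^{(1)}_i \sin(\Omega_T r + \phi_T) + b^{(1)}_i = \bar z^{(1)}_i = \sum_{s^{(1)} \in S^{(1)}} \beta_{s^{(1)}} \cos\big(\langle \Omega_T^\top s^{(1)}, r\rangle + \phi_{s^{(1)}}\big) + \zeta$$

for some $\{\beta_{s^{(1)}}\}$, $\{\phi_{s^{(1)}}\}$ with cardinality $2T$ and some $\zeta$. We can re-express the summation as follows:

$$\begin{aligned}
W^{(1)}_i \sin(\Omega_T r + \phi_T) + b^{(1)}_i
&= W^{(1)}_i \cos\big(\Omega_T r + \phi_T - \tfrac{\pi}{2}\big) + b^{(1)}_i \\
&= \sum_{t=1}^{T} [W^{(1)}_i]_t \cos\big(\langle \omega_t, r\rangle + \phi_t - \tfrac{\pi}{2}\big) + b^{(1)}_i \\
&= \sum_{t=1}^{T} \big([W^{(1)}_i]_t - \xi\big) \cos\big(\langle \omega_t, r\rangle + \phi_t - \tfrac{\pi}{2}\big) + \sum_{t=1}^{T} \xi \cos\big(\langle -\omega_t, r\rangle - \phi_t + \tfrac{\pi}{2}\big) + b^{(1)}_i
\end{aligned}$$

for $\xi \in \mathbb{R}$. Let $\zeta = b^{(1)}_i$, and let $\{\beta_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\} = \{\xi, \dots, \xi, [W^{(1)}_i]_1 - \xi, \dots, [W^{(1)}_i]_T - \xi\}$ and $\{\phi_{s^{(1)}} \in \mathbb{R} \mid s^{(1)} \in S^{(1)}\} = \{-\phi_1 + \tfrac{\pi}{2}, \dots, -\phi_T + \tfrac{\pi}{2}, \phi_1 - \tfrac{\pi}{2}, \dots, \phi_T - \tfrac{\pi}{2}\}$ be two ordered sets, where the first $T$ entries correspond to $s^{(1)} = -e_t$ and the last $T$ entries to $s^{(1)} = e_t$. Then, we can conclude that the statement holds for $i = 1, \dots, F_1$.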
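The merging of a positive-frequency and a negative-frequency cosine into a single sine can be checked numerically. The sketch below follows the expressions for A_t and B_t given in the first statement, but replaces arctan(B_t / A_t) with the two-argument arctangent, which is one standard way to cover the case A_t ≤ 0 as well; this substitution and the function name `merge_pair` are our own choices, not the paper's.

```python
import numpy as np

def merge_pair(beta_pos, phi_pos, beta_neg, phi_neg):
    """Rewrite beta_pos*cos(x + phi_pos) + beta_neg*cos(-x + phi_neg) as R*sin(x + phi)."""
    A = beta_pos * np.cos(phi_pos + np.pi / 2) + beta_neg * np.sin(phi_neg)
    B = beta_pos * np.sin(phi_pos + np.pi / 2) + beta_neg * np.cos(phi_neg)
    R = np.hypot(A, B)
    phi = np.arctan2(B, A)      # valid for any sign of A, unlike arctan(B / A)
    return R, phi

x = np.linspace(-3.0, 3.0, 50)  # plays the role of <omega_t, r>
lhs = 1.0 * np.cos(x + 0.3) + 0.2 * np.cos(-x + 0.1)
R, phi = merge_pair(1.0, 0.3, 0.2, 0.1)
rhs = R * np.sin(x + phi)
```

In this example A is negative, so the check exercises exactly the case that the proof leaves to the reader.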

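Next, we prove Theorem 4.1 by induction using the previous statements.

Base case. Let us denote the pre-activation vector at layer $\ell$ as $\bar z^{(\ell)}$, i.e., $z^{(\ell)} = \rho^{(\ell)}(\bar z^{(\ell)})$. Consider the pre-activation of a node at the first layer of the neural network for any mapping of the form in (3). Then

$$\bar z^{(1)}_i = W^{(1)}_i \gamma(r) = \sum_{t=1}^{T} w_{it} \cos\big(\langle \omega_t, r\rangle + \phi_t\big),$$

with some $w_{it} \in \mathbb{R}$ depending on the first layer weights connected to that node and $\phi_t \in \mathbb{R}$. Also note that interchanging sines with cosines only affects the phase terms. After applying the activation function, and using the previous statements and the result of Lemma A.3, the output of each node at the first layer is given by

$$z^{(1)}_i = \rho^{(1)}\big(\bar z^{(1)}_i\big) = \sum_{q=0}^{Q} \alpha_q \big(\bar z^{(1)}_i\big)^{q} = \sum_{q=0}^{Q} \alpha_q \Big(\sum_{t=1}^{T} w_{it} \cos\big(\langle \omega_t, r\rangle + \phi_t\big)\Big)^{q} = \sum_{\tilde s^{(Q)} \in \tilde S^{(Q)}} \tilde\beta_{\tilde s^{(Q)}} \cos\big(\langle \Omega_T^\top \tilde s^{(Q)}, r\rangle + \tilde\phi_{\tilde s^{(Q)}}\big),$$

where $\tilde S^{(Q)} = \bigcup_{q=1}^{Q} S^{(q)} = U^{(Q)}$ is defined in Lemma A.3. Therefore, the statement trivially holds, i.e.,

$$\tilde S^{(Q)} \subseteq \Big\{ \big[\tilde s^{(Q)}_1, \dots, \tilde s^{(Q)}_T\big]^\top \ \Big|\ \tilde s^{(Q)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |\tilde s^{(Q)}_t| \le Q \Big\}.$$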

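Induction step. Assume the outputs of the nodes at layer $\ell$ satisfy the following expression:

$$z^{(\ell)}_i = \sum_{\tilde s^{(Q^\ell)} \in \tilde S^{(Q^\ell)}} \tilde\beta_{\tilde s^{(Q^\ell)}, i} \cos\big(\langle \Omega_T^\top \tilde s^{(Q^\ell)}, r\rangle + \tilde\phi_{\tilde s^{(Q^\ell)}}\big),$$

where

$$\tilde S^{(Q^\ell)} \subseteq \Big\{ \big[\tilde s^{(Q^\ell)}_1, \dots, \tilde s^{(Q^\ell)}_T\big]^\top \ \Big|\ \tilde s^{(Q^\ell)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |\tilde s^{(Q^\ell)}_t| \le Q^\ell \Big\}.$$

Then, the pre-activation of any node at the $(\ell+1)$th layer can be expressed as:

$$\bar z^{(\ell+1)}_i = \sum_{\tilde s^{(Q^\ell)} \in \tilde S^{(Q^\ell)}} \tilde\beta_{\tilde s^{(Q^\ell)}, i} \cos\big(\langle \Omega_T^\top \tilde s^{(Q^\ell)}, r\rangle + \tilde\phi_{\tilde s^{(Q^\ell)}}\big),$$

since the sum of sines/cosines with the same frequency only results in a sine/cosine with the same frequency but with a modified phase and amplitude. Hence, after applying the activation function, the output of the $i$th node at the $(\ell+1)$th layer can be written as:

$$z^{(\ell+1)}_i = \rho^{(\ell+1)}\big(\bar z^{(\ell+1)}_i\big) = \sum_{q=0}^{Q} \alpha_q \Big(\sum_{\tilde s^{(Q^\ell)} \in \tilde S^{(Q^\ell)}} \tilde\beta_{\tilde s^{(Q^\ell)}, i} \cos\big(\langle \Omega_T^\top \tilde s^{(Q^\ell)}, r\rangle + \tilde\phi_{\tilde s^{(Q^\ell)}}\big)\Big)^{q}.$$

By using Lemma A.4, we have:

$$\Big(\sum_{\tilde s^{(Q^\ell)} \in \tilde S^{(Q^\ell)}} \tilde\beta_{\tilde s^{(Q^\ell)}, i} \cos\big(\langle \Omega_T^\top \tilde s^{(Q^\ell)}, r\rangle + \tilde\phi_{\tilde s^{(Q^\ell)}}\big)\Big)^{q} = \sum_{\tilde s^{(qQ^\ell)} \in \tilde S^{(qQ^\ell)}} \tilde\beta_{\tilde s^{(qQ^\ell)}, i} \cos\big(\langle \Omega_T^\top \tilde s^{(qQ^\ell)}, r\rangle + \tilde\phi_{\tilde s^{(qQ^\ell)}}\big),$$

where $\tilde S^{(qQ^\ell)} = \Big\{ \big[\tilde s^{(qQ^\ell)}_1, \dots, \tilde s^{(qQ^\ell)}_T\big]^\top \ \Big|\ \tilde s^{(qQ^\ell)}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |\tilde s^{(qQ^\ell)}_t| \le qQ^\ell \Big\}$.

Now, let us use the above result to complete the proof of the inductive step. In particular, we can now express $z^{(\ell+1)}_i$ as:

$$z^{(\ell+1)}_i = \sum_{q=0}^{Q} \alpha_q \sum_{\tilde s^{(qQ^\ell)} \in \tilde S^{(qQ^\ell)}} \tilde\beta_{\tilde s^{(qQ^\ell)}, i} \cos\big(\langle \Omega_T^\top \tilde s^{(qQ^\ell)}, r\rangle + \tilde\phi_{\tilde s^{(qQ^\ell)}}\big) = \sum_{s' \in S'} \tilde\beta_{s', i} \sin\big(\langle \Omega_T^\top s', r\rangle + \tilde\phi_{s', i}\big),$$

where

$$S' = \bigcup_{q=1}^{Q} \tilde S^{(qQ^\ell)} = \tilde S^{(Q Q^\ell)} = \tilde S^{(Q^{\ell+1})} = \Big\{ \big[\tilde s^{(Q^{\ell+1})}_1, \dots, \tilde s^{(Q^{\ell+1})}_T\big]^\top \ \Big|\ \tilde s^{(Q^{\ell+1})}_t \in \mathbb{Z},\ \sum_{t=1}^{T} |\tilde s^{(Q^{\ell+1})}_t| \le Q^{\ell+1} \Big\}.$$

This sequence of inclusions concludes the proof by induction. Thus, considering $\gamma(r) = \sin(\Omega_T r + \phi_T)$, the INR architecture of the form (3) can only represent functions of the form:

$$\sum_{s \in S_T} c_s \sin\big(\langle \Omega_T^\top s, r\rangle + \phi_s\big),$$

where $S_T = \Big\{ [s_1, s_2, \dots, s_T]^\top \ \Big|\ s_t \in \mathbb{Z},\ \sum_{t=1}^{T} |s_t| \le Q^{L-1} \Big\}$.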

A.7. Proof of the connection between the expressive power of INRs and certain period-2 functions.

Proof. As we previously mentioned, interchanging sines with cosines only affects the phase term, we can rewrite the positional encoding in (5) as

γ(r) = sin(Ωr) cos(Ωr)

= sin(Ωr) sin(Ωr + π

= sin( Ωr + ϕ),

NEAR: Neural Electromagnetic Array Response

where Ω= ΩT ΩT T R4T 2, and ϕ = h 02T 1T π

2 12T 1 T i T R4T 1. We use the same architecture of the form (3). Directly using the result of Theorem 4.1, the expressive power of this architecture is of the form:

s S T cs sin Ω s , r + ϕs ,

[s 1, s 2, . . . , s 4T ] s t Z,

t=1 |s t| QL 1 )

Using the Trigonometric Sum and Difference Formulas, we can rewrite the above as:

s S T cs sin Ω s , r + ϕs

s S T cs cos (ϕs ) sin Ω s , r + cs sin (ϕs ) cos Ω s , r

s S T ds sin Ω s , r + fs cos Ω s , r .

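The inner product $\langle \boldsymbol{\Omega}'^\top \mathbf{s}', \mathbf{r} \rangle$, where $\mathbf{r} = [x, y]^\top$, is a linear combination of the coordinates $x$ and $y$, scaled by the respective elements of $\boldsymbol{\Omega}'^\top \mathbf{s}'$. The expression above can therefore be written as

$$\sum_{|i|+|j| \le N,\ i,j \in \mathbb{Z}} D_{i,j} \sin\big(\pi(ix + jy)\big) + F_{i,j} \cos\big(\pi(ix + jy)\big), \tag{19}$$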

where $D_{i,j}$ and $F_{i,j}$ are constants indexed by $i, j$, and $N = O(2^{T-1}Q^{L-1})$. This can be verified using the idea of binary representation, since the frequency matrix $\boldsymbol{\Omega}$ only contains the coordinate-wise frequencies $2^t\pi$, $t = 0, \dots, T-1$. Hence, as the number of layers of the MLP/INR goes to infinity, i.e., $L \to \infty$, $f_\Theta(\mathbf{r})$ approaches (7).
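The bound on $N$ can be illustrated by brute force along a single coordinate. The sketch below enumerates all integer vectors $s$ with $\sum_t |s_t| \le Q^{L-1}$ and collects the integer multiples of $\pi$ they generate from the dyadic frequencies $2^t\pi$. The function name and the values of `T`, `Q`, and `L` are illustrative assumptions, not settings from the paper.

```python
from itertools import product

def reachable_frequencies(T, budget):
    """Integers n = sum_t s_t * 2**t with integer s_t and sum_t |s_t| <= budget."""
    freqs = set()
    for s in product(range(-budget, budget + 1), repeat=T):
        if sum(abs(v) for v in s) <= budget:
            freqs.add(sum(v * 2 ** t for t, v in enumerate(s)))
    return freqs

T, Q, L = 3, 2, 3                  # illustrative values only
budget = Q ** (L - 1)              # the bound Q^(L-1) on sum_t |s_t|
freqs = reachable_frequencies(T, budget)
largest = max(freqs)               # largest multiple of pi that can appear
bound = 2 ** (T - 1) * budget      # 2^(T-1) * Q^(L-1)
```

In this toy setting the largest reachable frequency equals $2^{T-1}Q^{L-1}$, obtained by spending the whole budget on the highest dyadic frequency. Not every integer below the bound is necessarily reachable (with a budget of 1 and $T = 3$, the value 3 is not), so the bound describes the extent of the spectrum rather than its exact support.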

B. Proof of Theorem 4.5

B.1. Notations and Definitions

Consider an array snapshot containing $K$ targets with azimuth angles $\theta_k$ and elevation angles $\phi_k$ ($k = 1, \dots, K$). Let $[0, U_1] \times [0, U_2]$ represent the antenna array response field, a bounded domain positioned in the $x$-$y$ plane, and consider the general case of a uniform sampling grid of dimensions $M_1 \times M_2$, with spacings $d_1 = \frac{U_1}{M_1 - 1} \le \frac{\lambda}{2}$ and $d_2 = \frac{U_2}{M_2 - 1} \le \frac{\lambda}{2}$. According to (2), the $(m_1, m_2)$th element of the response with respect to $K$ targets in the absence of noise can be written as

$$y_{m_1,m_2} = \sum_{k=1}^{K} x_k\, e^{j\frac{2\pi}{\lambda}\left((m_1 - 1)d_1 \sin\phi_k \cos\theta_k + (m_2 - 1)d_2 \sin\phi_k \sin\theta_k\right)}$$

for $1 \le m_1 \le M_1$ and $1 \le m_2 \le M_2$. Notably, when $d_1 = d_2 = \frac{\lambda}{2}$, the sampling pattern aligns with Nyquist sampling. Let $\mathbf{Y} = [y_{m_1,m_2}]_{1 \le m_1 \le M_1,\, 1 \le m_2 \le M_2} \in \mathbb{C}^{M_1 \times M_2}$ be the response matrix whose entries are the antenna array responses defined in (8).
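As an illustration of the noise-free response model above, the following NumPy sketch builds the response matrix for a handful of targets. The function name, argument names, and the example target values are assumptions introduced here; indices are zero-based, so `m1` in the code plays the role of $(m_1 - 1)$ in the text.

```python
import numpy as np

def array_response(x, theta, phi, M1, M2, d1, d2, wavelength):
    """Noise-free M1 x M2 response of K targets on a uniform planar grid.

    x: complex reflection coefficients, shape (K,)
    theta, phi: azimuth and elevation angles in radians, shape (K,)
    """
    m1 = np.arange(M1)[:, None, None]      # zero-based, i.e. (m1 - 1) in the text
    m2 = np.arange(M2)[None, :, None]
    u = np.sin(phi) * np.cos(theta)        # spatial frequency along the first axis
    v = np.sin(phi) * np.sin(theta)        # spatial frequency along the second axis
    phase = 2 * np.pi / wavelength * (m1 * d1 * u + m2 * d2 * v)
    return (x * np.exp(1j * phase)).sum(axis=-1)

# Two hypothetical targets on a 20 x 20 half-wavelength grid.
wavelength = 1.0
x = np.array([1.0 + 0.5j, -0.3 + 0.8j])
theta = np.deg2rad([10.0, 15.0])
phi = np.deg2rad([20.0, 25.0])
Y = array_response(x, theta, phi, 20, 20, wavelength / 2, wavelength / 2, wavelength)
```

Each target contributes a two-dimensional complex sinusoid, so `Y` is a sum of `K` rank-one harmonic components, which is the structure the Hankel constructions below exploit.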

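Given $\mathbf{Y} = [y_{m_1,m_2}] \in \mathbb{C}^{M_1 \times M_2}$ for $1 \le m_1 \le M_1$, $1 \le m_2 \le M_2$, a block Hankel matrix of $\mathbf{Y}$ can be constructed as:

$$\mathcal{H}_{N_1,N_2}(\mathbf{Y}) = \begin{bmatrix} \mathcal{H}_{N_2}(\mathbf{y}_1) & \mathcal{H}_{N_2}(\mathbf{y}_2) & \cdots & \mathcal{H}_{N_2}(\mathbf{y}_{M_1-N_1+1}) \\ \mathcal{H}_{N_2}(\mathbf{y}_2) & \mathcal{H}_{N_2}(\mathbf{y}_3) & \cdots & \mathcal{H}_{N_2}(\mathbf{y}_{M_1-N_1+2}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathcal{H}_{N_2}(\mathbf{y}_{N_1}) & \mathcal{H}_{N_2}(\mathbf{y}_{N_1+1}) & \cdots & \mathcal{H}_{N_2}(\mathbf{y}_{M_1}) \end{bmatrix}$$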


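where $\mathcal{H}_{N_2}(\mathbf{y}_m)$ is defined as:

$$\mathcal{H}_{N_2}(\mathbf{y}_m) = \begin{bmatrix} y_{m,1} & y_{m,2} & \cdots & y_{m,M_2-N_2+1} \\ y_{m,2} & y_{m,3} & \cdots & y_{m,M_2-N_2+2} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m,N_2} & y_{m,N_2+1} & \cdots & y_{m,M_2} \end{bmatrix}$$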

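A block Hankel matrix can also be constructed along the column direction, which is defined as:

$$\tilde{\mathcal{H}}_{\tilde{N}_1,\tilde{N}_2}(\mathbf{Y}) = \begin{bmatrix} \tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(1)}) & \tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(2)}) & \cdots & \tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(M_2-\tilde{N}_1+1)}) \\ \tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(2)}) & \tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(3)}) & \cdots & \tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(M_2-\tilde{N}_1+2)}) \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(\tilde{N}_1)}) & \tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(\tilde{N}_1+1)}) & \cdots & \tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(M_2)}) \end{bmatrix}$$

where $\tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(m)})$ is defined as:

$$\tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(m)}) = \begin{bmatrix} y_{1,m} & y_{2,m} & \cdots & y_{M_1-\tilde{N}_2+1,m} \\ y_{2,m} & y_{3,m} & \cdots & y_{M_1-\tilde{N}_2+2,m} \\ \vdots & \vdots & \ddots & \vdots \\ y_{\tilde{N}_2,m} & y_{\tilde{N}_2+1,m} & \cdots & y_{M_1,m} \end{bmatrix}$$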
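The two constructions can be sketched in a few lines of NumPy. The helper names `hankel_rows` and `block_hankel` are hypothetical, and building the column-direction matrix by passing the transpose is a convenience of this sketch that follows from the definitions above, not a statement about the authors' code.

```python
import numpy as np

def hankel_rows(y, N):
    """N x (len(y) - N + 1) Hankel matrix whose (i, j) entry is y[i + j]."""
    cols = len(y) - N + 1
    idx = np.arange(N)[:, None] + np.arange(cols)[None, :]
    return y[idx]

def block_hankel(Y, N1, N2):
    """Block Hankel matrix: block (i, j) is the Hankel matrix of row i + j of Y."""
    M1 = Y.shape[0]
    blocks = [[hankel_rows(Y[i + j], N2) for j in range(M1 - N1 + 1)]
              for i in range(N1)]
    return np.block(blocks)

# Small integer example so the structure is easy to read off.
Y = np.arange(12).reshape(3, 4)
H_row = block_hankel(Y, N1=2, N2=2)      # blocks built from the rows of Y
H_col = block_hankel(Y.T, N1=2, N2=2)    # column direction: blocks built from the columns of Y
```

With $M_1 = 3$, $M_2 = 4$ and $N_1 = N_2 = 2$, the row-direction matrix has $N_1 N_2 = 4$ rows and $(M_1 - N_1 + 1)(M_2 - N_2 + 1) = 6$ columns.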

Note that HN2,N1(Y ) = HN1,N2(Y ). Moreover, we have rank(HN1,N2(Y)) = rank( H N1, N2(Y)) based on the conditions in Lemma B.1. For the sake of clarity, we define

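$$\mathbf{S} = \begin{bmatrix} \mathbf{I}_{K \times K} \\ \mathbf{0}_K^\top \end{bmatrix} \in \mathbb{R}^{(K+1) \times K}, \qquad \mathbf{b} = \begin{bmatrix} \mathbf{0}_K \\ 1 \end{bmatrix} \in \mathbb{R}^{K+1}.$$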

The matrices S and b are column selection matrices, selecting the first K columns and the last column, respectively.

B.2. Lemma B.1, Lemma B.2, Lemma B.3 and their Proofs

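Lemma B.1. For the block Hankel matrix in Definition 4.3, if $K \le N_1 \le M_1 - K + 1$ and $K \le N_2 \le M_2 - K + 1$, then $\operatorname{rank}(\mathcal{H}_{N_1,N_2}(\mathbf{Y})) = K$, and the first $K$ columns of $\mathcal{H}_{N_1,N_2}(\mathbf{Y})$ serve as a basis of $\mathcal{R}(\mathcal{H}_{N_1,N_2}(\mathbf{Y}))$. Similarly, if $K \le \tilde{N}_1 \le M_2 - K + 1$ and $K \le \tilde{N}_2 \le M_1 - K + 1$, then $\operatorname{rank}(\tilde{\mathcal{H}}_{\tilde{N}_1,\tilde{N}_2}(\mathbf{Y})) = K$, and the first $K$ columns of $\tilde{\mathcal{H}}_{\tilde{N}_1,\tilde{N}_2}(\mathbf{Y})$ serve as a basis of $\mathcal{R}(\tilde{\mathcal{H}}_{\tilde{N}_1,\tilde{N}_2}(\mathbf{Y}))$.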

Proof. The proof follows (Hua, 1992).

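Lemma B.2. For the Hankel matrix $\mathcal{H}_{N_2}(\mathbf{y}_m)$ ($1 \le m \le M_1$), if $K \le N_2 \le M_2 - K + 1$, then $\operatorname{rank}(\mathcal{H}_{N_2}(\mathbf{y}_m)) = K$, and the first $K$ columns of $\mathcal{H}_{N_2}(\mathbf{y}_m)$ serve as a basis of $\mathcal{R}(\mathcal{H}_{N_2}(\mathbf{y}_m))$. Similarly, for the Hankel matrix $\tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(m)})$ ($1 \le m \le M_2$), if $K \le \tilde{N}_2 \le M_1 - K + 1$, then $\operatorname{rank}(\tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(m)})) = K$, and the first $K$ columns of $\tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(m)})$ serve as a basis of $\mathcal{R}(\tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(m)}))$.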

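Proof. We prove the statement for $\mathcal{H}_{N_2}(\mathbf{y}_m)$; the proof for $\tilde{\mathcal{H}}_{\tilde{N}_2}(\mathbf{y}^{(m)})$ follows in the same way and is therefore omitted. According to (8), we have

$$\mathbf{y}_m = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{j\frac{2\pi}{\lambda} d_2 \sin\phi_1 \sin\theta_1} & e^{j\frac{2\pi}{\lambda} d_2 \sin\phi_2 \sin\theta_2} & \cdots & e^{j\frac{2\pi}{\lambda} d_2 \sin\phi_K \sin\theta_K} \\ \vdots & \vdots & \ddots & \vdots \\ e^{j\frac{2\pi}{\lambda}(M_2-1) d_2 \sin\phi_1 \sin\theta_1} & e^{j\frac{2\pi}{\lambda}(M_2-1) d_2 \sin\phi_2 \sin\theta_2} & \cdots & e^{j\frac{2\pi}{\lambda}(M_2-1) d_2 \sin\phi_K \sin\theta_K} \end{bmatrix} \begin{bmatrix} x_1 e^{j\frac{2\pi}{\lambda}(m-1) d_1 \sin\phi_1 \cos\theta_1} \\ x_2 e^{j\frac{2\pi}{\lambda}(m-1) d_1 \sin\phi_2 \cos\theta_2} \\ \vdots \\ x_K e^{j\frac{2\pi}{\lambda}(m-1) d_1 \sin\phi_K \cos\theta_K} \end{bmatrix} = \mathbf{A}_{M_2}(\boldsymbol{\theta}, \boldsymbol{\phi})\,\mathbf{s}_m.$$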


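It can be shown that $\mathcal{H}_{N_2}(\mathbf{y}_m)$ admits a Vandermonde decomposition

$$\mathcal{H}_{N_2}(\mathbf{y}_m) = \mathbf{A}_{N_2}(\boldsymbol{\theta}, \boldsymbol{\phi})\,\operatorname{diag}(\mathbf{s}_m)\,\big(\mathbf{A}_{M_2-N_2+1}(\boldsymbol{\theta}, \boldsymbol{\phi})\big)^\top,$$

where $\mathbf{A}_{N_2}(\boldsymbol{\theta}, \boldsymbol{\phi})$ and $\mathbf{A}_{M_2-N_2+1}(\boldsymbol{\theta}, \boldsymbol{\phi})$ are both Vandermonde matrices. It is easy to verify that if $K \le N_2 \le M_2 - K + 1$, then $\operatorname{rank}(\mathcal{H}_{N_2}(\mathbf{y}_m)) = K$. Moreover, the first $K$ columns of $\mathcal{H}_{N_2}(\mathbf{y}_m)$ have the form $\mathbf{A}_{N_2}(\boldsymbol{\theta}, \boldsymbol{\phi})\,\operatorname{diag}(\mathbf{s}_m)\,(\mathbf{A}_K(\boldsymbol{\theta}, \boldsymbol{\phi}))^\top$, which has rank $K$ due to the Vandermonde structure. Thus, the first $K$ columns form a linearly independent set and therefore serve as a basis for $\mathcal{R}(\mathcal{H}_{N_2}(\mathbf{y}_m))$.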
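The Vandermonde decomposition and the resulting rank can be checked numerically. In the sketch below the poles `z` stand in for the entries of the second row of the Vandermonde matrix and are set through arbitrary phases; these phases, the amplitudes in `s_m`, and the sizes are illustrative assumptions.

```python
import numpy as np

K, M2, N2 = 3, 12, 5                       # satisfies K <= N2 <= M2 - K + 1
omega = np.array([-1.0, 0.3, 1.2])         # arbitrary distinct phases (assumed)
z = np.exp(1j * omega)                     # one pole per target
s_m = np.array([1.0 + 0.5j, -0.7j, 0.4])   # nonzero amplitudes for row m (assumed)

def vandermonde(z, rows):
    """rows x K Vandermonde matrix with entry (i, k) = z_k ** i."""
    return z[None, :] ** np.arange(rows)[:, None]

y_m = vandermonde(z, M2) @ s_m             # one row of the response matrix

# Hankel matrix of y_m with N2 rows and M2 - N2 + 1 columns.
idx = np.arange(N2)[:, None] + np.arange(M2 - N2 + 1)[None, :]
H = y_m[idx]

# Vandermonde decomposition: A_{N2} diag(s_m) A_{M2-N2+1}^T.
decomp = vandermonde(z, N2) @ np.diag(s_m) @ vandermonde(z, M2 - N2 + 1).T

rank_H = np.linalg.matrix_rank(H, tol=1e-8)
rank_first_K = np.linalg.matrix_rank(H[:, :K], tol=1e-8)
```

An explicit tolerance is passed to `matrix_rank` because the trailing singular values are zero only up to floating-point error.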

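Lemma B.3. Consider the Hankel matrices $\mathcal{H}_{M_2-K}(\mathbf{y}_m)$ ($1 \le m \le M_1$) generated using (8). There exists a unique $\mathbf{m}_1 \in \mathbb{C}^K$ such that

$$\sum_{m=1}^{M_1} \big\| \mathcal{H}_{M_2-K}(\mathbf{y}_m)\,\mathbf{S}\,\mathbf{m}_1 - \mathcal{H}_{M_2-K}(\mathbf{y}_m)\,\mathbf{b} \big\|_2 = 0. \tag{20}$$

Similarly, consider the Hankel matrices $\tilde{\mathcal{H}}_{M_1-K}(\mathbf{y}^{(m)})$ ($1 \le m \le M_2$) generated using (8). There exists a unique $\mathbf{m}_2 \in \mathbb{C}^K$ such that

$$\sum_{m=1}^{M_2} \big\| \tilde{\mathcal{H}}_{M_1-K}(\mathbf{y}^{(m)})\,\mathbf{S}\,\mathbf{m}_2 - \tilde{\mathcal{H}}_{M_1-K}(\mathbf{y}^{(m)})\,\mathbf{b} \big\|_2 = 0. \tag{21}$$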

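Proof. We prove the statement for $\mathcal{H}_{M_2-K}(\mathbf{y}_m)$ ($1 \le m \le M_1$); the proof for $\tilde{\mathcal{H}}_{M_1-K}(\mathbf{y}^{(m)})$ follows in the same way and is therefore omitted. According to Lemma B.2, we have $\operatorname{rank}(\mathcal{H}_{M_2-K}(\mathbf{y}_m)) = K$, and its first $K$ columns serve as a basis of $\mathcal{R}(\mathcal{H}_{M_2-K}(\mathbf{y}_m))$. This means there exists $\boldsymbol{\alpha} = [\alpha_1, \dots, \alpha_K]^\top$ such that $\mathcal{H}_{M_2-K}(\mathbf{y}_m)\mathbf{S}\boldsymbol{\alpha} = \mathcal{H}_{M_2-K}(\mathbf{y}_m)\mathbf{b}$. Now let us analyze whether $\boldsymbol{\alpha}$ depends on $m$. Using the Vandermonde decomposition and the explicit form of the least-squares solution, we have

$$\mathcal{H}_{M_2-K}(\mathbf{y}_m)\mathbf{S} = \mathbf{A}_{M_2-K}(\boldsymbol{\theta}, \boldsymbol{\phi})\,\operatorname{diag}(\mathbf{s}_m)\,\big(\mathbf{A}_K(\boldsymbol{\theta}, \boldsymbol{\phi})\big)^\top,$$

$$\mathcal{H}_{M_2-K}(\mathbf{y}_m)\mathbf{b} = \mathbf{A}_{M_2-K}(\boldsymbol{\theta}, \boldsymbol{\phi})\,\operatorname{diag}(\mathbf{s}_m) \begin{bmatrix} e^{j\frac{2\pi}{\lambda} K d_2 \sin\phi_1 \sin\theta_1} \\ \vdots \\ e^{j\frac{2\pi}{\lambda} K d_2 \sin\phi_K \sin\theta_K} \end{bmatrix},$$

$$\boldsymbol{\alpha} = \Big( \big(\mathcal{H}_{M_2-K}(\mathbf{y}_m)\mathbf{S}\big)^{H} \big(\mathcal{H}_{M_2-K}(\mathbf{y}_m)\mathbf{S}\big) \Big)^{-1} \big(\mathcal{H}_{M_2-K}(\mathbf{y}_m)\mathbf{S}\big)^{H} \mathcal{H}_{M_2-K}(\mathbf{y}_m)\mathbf{b} = \big(\mathbf{A}_K^\top(\boldsymbol{\theta}, \boldsymbol{\phi})\big)^{-1} \begin{bmatrix} e^{j\frac{2\pi}{\lambda} K d_2 \sin\phi_1 \sin\theta_1} \\ \vdots \\ e^{j\frac{2\pi}{\lambda} K d_2 \sin\phi_K \sin\theta_K} \end{bmatrix}.$$

We can see that $\boldsymbol{\alpha} \in \mathbb{C}^K$ does not depend on $m$; it depends only on $\boldsymbol{\theta}$ and $\boldsymbol{\phi}$. This means that for the Hankel matrices $\mathcal{H}_{M_2-K}(\mathbf{y}_m)$ ($1 \le m \le M_1$), there exists a unique $\mathbf{m}_1 \in \mathbb{C}^K$ such that

$$\sum_{m=1}^{M_1} \big\| \mathcal{H}_{M_2-K}(\mathbf{y}_m)\,\mathbf{S}\,\mathbf{m}_1 - \mathcal{H}_{M_2-K}(\mathbf{y}_m)\,\mathbf{b} \big\|_2 = 0.$$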
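The claim that the same coefficient vector works for every row can be verified on a toy response matrix. The sketch below solves the least-squares problem separately for each row and compares the solutions with the closed form $(\mathbf{A}_K^\top)^{-1}[z_1^K, \dots, z_K^K]^\top$, where $z_k$ denotes the $k$th pole along the second array axis. The pole phases, amplitudes, and sizes are illustrative assumptions.

```python
import numpy as np

K, M1, M2 = 2, 4, 8
z1 = np.exp(1j * np.array([0.4, -1.1]))    # poles along the first axis (assumed)
z2 = np.exp(1j * np.array([0.9, 2.0]))     # poles along the second axis (assumed)
x = np.array([1.0 + 0.2j, -0.5 + 1.0j])    # reflection coefficients (assumed)

m1 = np.arange(M1)[:, None, None]
m2 = np.arange(M2)[None, :, None]
Y = (x * z1 ** m1 * z2 ** m2).sum(axis=-1)  # M1 x M2 noise-free response

# Hankel matrix with M2 - K rows and K + 1 columns for each row of Y.
idx = np.arange(M2 - K)[:, None] + np.arange(K + 1)[None, :]
alphas = []
for m in range(M1):
    H = Y[m][idx]
    # H S is the first K columns, H b is the last column.
    alpha, *_ = np.linalg.lstsq(H[:, :K], H[:, K], rcond=None)
    alphas.append(alpha)
alphas = np.array(alphas)

# Closed form: solve A_K^T alpha = [z_1^K, ..., z_K^K]^T.
closed_form = np.linalg.solve(np.vander(z2, K, increasing=True), z2 ** K)
```

All rows return the same `alpha`, so stacking the per-row systems leaves the solution unchanged, which is what makes a single shared predictor possible.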

B.3. Main Proof of Theorem 4.5

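Proof. We prove the statement for $\mathcal{H}_{N_1,N_2}(\mathbf{Y})$; the proof for $\tilde{\mathcal{H}}_{\tilde{N}_1,\tilde{N}_2}(\mathbf{Y})$ follows in the same way and is therefore omitted. Let $N_1 = M_1 - K + 1$ and $N_2 = M_2 - K$, and consider the matrix $\mathcal{H}_{M_1-K+1,M_2-K}(\mathbf{Y})$. According to Lemma B.1, we have $\operatorname{rank}(\mathcal{H}_{M_1-K+1,M_2-K}(\mathbf{Y})) = K$, and its first $K$ columns

$$\mathcal{H}_{M_1-K+1,M_2-K}(\mathbf{Y})\mathbf{S} = \begin{bmatrix} \mathcal{H}_{M_2-K}(\mathbf{y}_1)\mathbf{S} \\ \mathcal{H}_{M_2-K}(\mathbf{y}_2)\mathbf{S} \\ \vdots \\ \mathcal{H}_{M_2-K}(\mathbf{y}_{M_1-K+1})\mathbf{S} \end{bmatrix}$$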


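are linearly independent. If we keep appending rows at the bottom of $\mathcal{H}_{M_1-K+1,M_2-K}(\mathbf{Y})\mathbf{S}$, the columns of the resulting matrix

$$\mathcal{H}_{M_1,M_2-K}(\mathbf{Y})\mathbf{S} = \begin{bmatrix} \mathcal{H}_{M_2-K}(\mathbf{y}_1)\mathbf{S} \\ \vdots \\ \mathcal{H}_{M_2-K}(\mathbf{y}_{M_1-K+1})\mathbf{S} \\ \mathcal{H}_{M_2-K}(\mathbf{y}_{M_1-K+2})\mathbf{S} \\ \vdots \\ \mathcal{H}_{M_2-K}(\mathbf{y}_{M_1})\mathbf{S} \end{bmatrix}$$

are still linearly independent. Using Lemma B.2 and Lemma B.3, if we append the $(K+1)$-th column on the right of the above matrix, the resulting matrix $\mathcal{H}_{M_1,M_2-K}(\mathbf{Y})$ still has rank $K$, with its first $K$ columns serving as a basis of $\mathcal{R}(\mathcal{H}_{M_1,M_2-K}(\mathbf{Y}))$. Thus, there exists a unique global optimizer $\mathbf{m}_1^{o}$ such that

$$\big\| \mathcal{H}_{M_1,M_2-K}(\mathbf{Y})\,\mathbf{S}\,\mathbf{m}_1^{o} - \mathcal{H}_{M_1,M_2-K}(\mathbf{Y})\,\mathbf{b} \big\|_2 = 0.$$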

C. Further Experimental Results and Details

C.1. Experimental Setup

C.1.1. SIMULATION DATA GENERATION

For the signal model in (2), we assume the reflection coefficients follow a circularly symmetric complex Gaussian distribution, given by $x_k \sim \mathcal{CN}(0, \sigma_x^2)$ with $\sigma_x = 1$ for $k = 1, \dots, K$. The additive noise is modeled as $n_{i,j} \sim \mathcal{CN}(0, \sigma_n^2)$, where $\sigma_n$ is determined by the specified SNR level. The SNR is defined as:

$$\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10} \frac{P_x}{P_n} = 10 \log_{10} \frac{\sigma_x^2}{\sigma_n^2}.$$
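A minimal sketch of this data generation step follows, assuming the usual convention that a $\mathcal{CN}(0, \sigma^2)$ sample has independent real and imaginary parts of variance $\sigma^2/2$ each. The helper names and the example SNR value are illustrative.

```python
import numpy as np

def complex_gaussian(rng, shape, sigma):
    """Draw CN(0, sigma^2) samples: real and imaginary parts each have variance sigma^2 / 2."""
    return sigma / np.sqrt(2) * (rng.standard_normal(shape)
                                 + 1j * rng.standard_normal(shape))

def noise_std_for_snr(snr_db, sigma_x=1.0):
    """Invert SNR_dB = 10 log10(sigma_x^2 / sigma_n^2) for sigma_n."""
    return sigma_x / 10 ** (snr_db / 20)

rng = np.random.default_rng(0)
K = 3
x = complex_gaussian(rng, K, sigma=1.0)             # reflection coefficients
sigma_n = noise_std_for_snr(snr_db=20.0)            # 0.1 when sigma_x = 1
noise = complex_gaussian(rng, (20, 20), sigma_n)    # added to the noise-free response
```

The noise matrix is added entry-wise to the noise-free response to obtain the noisy measurements.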

For tasks with different sampling configurations, we use the following selected indices for sub-sampling:

6 × 6: $S_x = S_y = \{0, 1, 2, 3, 11, 19\}$,

8 × 8: $S_x = S_y = \{0, 1, 2, 3, 4, 9, 14, 19\}$,

10 × 10: $S_x = S_y = \{0, 1, 2, 3, 4, 7, 10, 13, 16, 19\}$.

Remark. When the index sets are mapped to the world coordinate system, each discrete index $(i, j)$ corresponds to the physical position $\mathbf{r}_{ij} = \big[\, i\tfrac{\lambda}{2},\; j\tfrac{\lambda}{2} \,\big]^\top$, where $\lambda$ is the wavelength.
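The sub-sampling patterns can be turned into antenna positions and an observation mask as sketched below. The index sets are taken from the text; the half-wavelength spacing, the 20 × 20 size of the full grid (inferred from the largest index, 19), and the helper names are assumptions of this sketch.

```python
import numpy as np

# Index sets from the text, keyed by the number of samples per axis.
SUBSAMPLING = {
    6: [0, 1, 2, 3, 11, 19],
    8: [0, 1, 2, 3, 4, 9, 14, 19],
    10: [0, 1, 2, 3, 4, 7, 10, 13, 16, 19],
}

def sample_positions(indices, wavelength):
    """Physical (x, y) positions of the sampled elements, assuming lambda/2 spacing."""
    idx = np.asarray(indices)
    ii, jj = np.meshgrid(idx, idx, indexing="ij")
    return np.stack([ii, jj], axis=-1) * wavelength / 2

def observation_mask(indices, full_size=20):
    """Boolean mask over the assumed full grid marking the measured elements."""
    mask = np.zeros((full_size, full_size), dtype=bool)
    mask[np.ix_(indices, indices)] = True
    return mask

positions = sample_positions(SUBSAMPLING[8], wavelength=1.0)   # shape (8, 8, 2)
mask = observation_mask(SUBSAMPLING[8])
```

For the 8 × 8 pattern, 64 of the 400 grid positions are observed; the remaining responses are the ones to be predicted.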

For the angle resolution experiments, we define the azimuth and elevation angles of the two targets as $[10, 20]$ and $[10 + \Delta, 20 + \Delta]$ degrees, where $\Delta$ represents the angular separation, which varies from 3 to 10 degrees. For the other tasks, the azimuth and elevation angles of each target are randomly sampled from a uniform distribution over $[-60, 60]$ degrees. Each experiment is conducted with $N = 50$ Monte Carlo trials.

C.1.2. EVALUATION METRICS

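The Normalized Root Mean Square Error (NRMSE) is defined as:

$$\mathrm{NRMSE} = \frac{\|\hat{\mathbf{Y}} - \mathbf{Y}\|_F}{\|\mathbf{Y}\|_F},$$

where $\hat{\mathbf{Y}}$ and $\mathbf{Y}$ denote the predicted array response and the ground truth full virtual array, respectively, and $\|\cdot\|_F$ represents the Frobenius norm.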
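A small helper for this metric is sketched below. It assumes the common definition of NRMSE as the Frobenius norm of the error divided by the Frobenius norm of the ground truth; the function name is illustrative.

```python
import numpy as np

def nrmse(Y_hat, Y):
    """Frobenius norm of the prediction error, normalized by that of the ground truth."""
    return np.linalg.norm(Y_hat - Y, ord="fro") / np.linalg.norm(Y, ord="fro")

# Toy complex-valued example.
Y_true = np.array([[1.0 + 1.0j, 2.0], [0.0, -1.0j]])
Y_pred = Y_true + 0.1
error = nrmse(Y_pred, Y_true)
```

Under this definition a perfect prediction gives 0 and an all-zero prediction gives 1, which makes values comparable across SNR levels and array sizes.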


The Resolution Probability is defined as:

n=1 1{Ea Ee}, with

Ea = ˆθn a,1 θa,1

, ˆθn a,2 θa,2

Ee = ˆθn e,1 θe,1

, ˆθn e,2 θe,2

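where $\hat{\boldsymbol{\theta}}_i^n = [\hat{\theta}_{a,i}^n, \hat{\theta}_{e,i}^n]$, $i = 1, 2$, denote the predicted azimuth and elevation angles of each target in the $n$-th Monte Carlo trial, $\boldsymbol{\theta}_i = [\theta_{a,i}, \theta_{e,i}]$, $i = 1, 2$, represent the ground truth azimuth and elevation angles, $\mathbb{1}\{\cdot\}$ is the indicator function, and

$$\Delta_{\mathrm{separation}} = \theta_{a,2} - \theta_{a,1} = \theta_{e,2} - \theta_{e,1}.$$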

C.1.3. OPTIMIZATION AND HYPERPARAMETERS

We optimize the loss function defined in (11) through a two-stage training process. In the initial warm-up stage, we set $\lambda = 0$ and optimize using the Adam optimizer with $\beta = (0.9, 0.999)$ and a weight decay of $10^{-4}$. Letting $\Theta_0 = \arg\min_\Theta \mathcal{L}_d$, we use the obtained parameters as the initialization for the next stage. In the adaptation/training stage, we optimize $\Theta$, $\mathbf{m}_1$, and $\mathbf{m}_2$ using Adam with the same configuration as in the warm-up stage. In both the simulation and real-world experiments, we normalize the input coordinates to the range $(-1, 1]$.

We provide detailed hyperparameter settings for both simulation and real-world experiments. The model architecture remains consistent with that described in the Experiments section. For simulation tasks, we use a learning rate of $10^{-4}$ and train for 5,000 epochs in the warm-up stage. In the adaptation stage, we set $\lambda = 0.5$, $\mathrm{lr}_\Theta = 10^{-3}$, $\mathrm{lr}_{\mathbf{m}_1,\mathbf{m}_2} = 3 \times 10^{-3}$, and train for 25,000 epochs, with $K_{\max}$ set to the exact number of targets for each scenario.

For real-world experiments, we adopt a learning rate of $10^{-4}$ and train for 10,000 epochs in the warm-up stage. In the adaptation stage, we set $\lambda = 1$, $\mathrm{lr}_\Theta = 10^{-3}$, $\mathrm{lr}_{\mathbf{m}_1,\mathbf{m}_2} = 3 \times 10^{-3}$, and train for 50,000 epochs. Here, we set $K_{\max} = 4$ as an upper bound on the number of targets in each range bin, since the number of targets within a single range bin is typically very small (Sun et al., 2020).

C.2. Implementation Details and Analysis of Baseline Methods

EMaC. We adopt equation (9) from the original paper (Chen & Chi, 2013) as the optimization problem, which can be solved using the CVX toolbox (CVX Research, 2012; Grant & Boyd, 2008).

SIREN. We adopt the same architecture and recommended hyperparameters from the original paper (Sitzmann et al., 2020). For a fair comparison, we match NEAR's network size, using a depth of L = 4 and a hidden layer width of 256.

NeRF2. We adopt the same architecture and recommended hyperparameters from the original paper (Zhao et al., 2023). To match our experimental setup, the location of the TX is fixed (co-located with the RX), and the unknown antenna responses are inferred from their spatial coordinates. The loss function is calculated as the mean-squared error (MSE) between the predicted and observed array responses, rather than using RSSI values.

Remark. The inferior performance of NeRF2 is attributed to some important distinctions between radiance-field reconstruction and our method, which are listed below:

Our setting uses far fewer measurements (see below), in the form of an antenna array response, than NeRF2. This renders measurement-heavy methods like NeRF2 somewhat inferior in our setting. Hence we need to rely heavily on the underlying wave propagation model and the harmonic structure of the measurements received at antenna arrays in order to successfully regularize the problem with so few measurements. This is a major contribution of our work, which sets us apart from a direct use of NeRF2.

In fact, NEAR targets a different objective than NeRF2. Our approach places more emphasis on the (super-resolution) localization of the targets, while NeRF2 focuses on the physical properties of all objects in a 3D scene in order to model signal propagation. This is also a key reason why we opt to directly predict the response from the antenna coordinates rather than modeling all the voxels' properties as a continuous volumetric function.


As explained earlier, NeRF2 requires a large set of measurements for training. According to (Zhao et al., 2023), it uses around 6000 × 21 measurements with an 80%/20% training/testing split, while we use only a sparse set of 8 × 8 measurements for training. Under the same training setting, NEAR takes less than half the training time of NeRF2, because it relies on our proposed regularization rather than on ray tracing, which is well known for its heavy computational cost.

C.3. Additional Details on Radar Data Processing

To sense the environment, the system emits a sequence of waveforms, commonly referred to as chirp signals, through the Tx within a short time interval. These signals propagate, interact with objects in the environment, and are subsequently reflected back to be captured by the Rx. The received signals are then processed to generate an intermediate frequency (IF) signal by mixing the transmitted and received signals from each Tx-Rx antenna pair. This mixed signal is then sampled by an ADC to generate discrete samples for each chirp. By aggregating ADC samples across all chirps and Tx-Rx antenna pairs, the sensing system constructs a three-dimensional (3D) complex data cube for each frame. This data cube is organized into three dimensions: fast time, slow time, and channel, which correspond to range, range rate, and angle, respectively (Kramer et al., 2022).

To process the acquired ADC samples, Fourier techniques are applied along the fast time and slow time dimensions to extract detailed information. First, range processing is performed along the fast time axis to isolate objects at different distances into distinct frequency responses within range bins defined by the hardware specifications. Subsequently, Doppler processing along the slow time axis decodes phase variations into Doppler bins to derive relative radial velocities, producing a range-Doppler (RD) map (Ding et al., 2024). A constant false alarm rate (CFAR) target detector is then usually employed to detect peaks that stand out prominently from their surroundings in the range-Doppler heat map by comparing the local signal power to an adaptive threshold. Direction-of-arrival (DOA) processing is then performed only for the peaks detected by the CFAR detector.
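The processing chain described above can be sketched on a synthetic data cube. The example below is a simplified illustration, not the pipeline used in the experiments: the cube contains one ideal target placed exactly on a range bin and a Doppler bin, the detector is a basic one-dimensional cell-averaging CFAR, and all sizes, thresholds, and function names are assumptions.

```python
import numpy as np

def range_doppler_map(cube):
    """cube: complex samples of shape (fast_time, slow_time, channel)."""
    range_fft = np.fft.fft(cube, axis=0)           # fast time -> range bins
    doppler_fft = np.fft.fft(range_fft, axis=1)    # slow time -> Doppler bins
    return np.fft.fftshift(doppler_fft, axes=1)    # put zero Doppler in the centre

def ca_cfar_1d(power, guard=2, train=8, scale=5.0):
    """Cell-averaging CFAR along a 1-D power profile (illustrative parameters)."""
    n = power.shape[0]
    detections = np.zeros(n, dtype=bool)
    for i in range(n):
        left = power[max(0, i - guard - train):max(0, i - guard)]
        right = power[min(n, i + guard + 1):min(n, i + guard + train + 1)]
        reference = np.concatenate([left, right])
        # Adaptive threshold: scaled mean power of the surrounding training cells.
        if reference.size > 0 and power[i] > scale * reference.mean():
            detections[i] = True
    return detections

# Synthetic cube: one target at range bin 10 and Doppler bin 5, plus weak noise.
rng = np.random.default_rng(0)
n_fast, n_slow, n_chan = 64, 32, 4
n = np.arange(n_fast)[:, None, None]
p = np.arange(n_slow)[None, :, None]
signal = np.exp(2j * np.pi * (10 * n / n_fast + 5 * p / n_slow))
noise = 0.01 * (rng.standard_normal((n_fast, n_slow, n_chan))
                + 1j * rng.standard_normal((n_fast, n_slow, n_chan)))
cube = signal + noise

rd_power = (np.abs(range_doppler_map(cube)) ** 2).sum(axis=2)   # sum power over channels
peak = np.unravel_index(np.argmax(rd_power), rd_power.shape)
detections = ca_cfar_1d(rd_power[:, peak[1]])                   # CFAR along range at the peak Doppler bin
```

After the shift, Doppler bin 5 appears at column 21 of the 32-column map. In a full pipeline, the channel dimension at each detected range-Doppler cell would then be passed to DOA processing.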

C.4. More Experimental Results

Some more experimental results of target localization are shown below.

Figure 6. Point cloud visualizations for target localization with K = 1 (scenario 1). Left: Full array, Middle: NEAR, Right: EMaC.

Figure 7. Point cloud visualizations for target localization with K = 1 (scenario 2). Left: Full array, Middle: NEAR, Right: EMaC.

Figure 8. Point cloud visualizations for target localization with K = 2 (scenario 1). Left: Full array, Middle: NEAR, Right: EMaC.

Figure 9. Point cloud visualizations for target localization with K = 2 (scenario 2). Left: Full array, Middle: NEAR, Right: EMaC.

Figure 10. Point cloud visualizations for target localization with K = 3 (scenario 1). Left: Full array, Middle: NEAR, Right: EMaC.

Figure 11. Point cloud visualizations for target localization with K = 3 (scenario 2). Left: Full array, Middle: NEAR, Right: EMaC.

Figure 12. Point cloud visualizations for target localization with K = 4 (scenario 1). Left: Full array, Middle: NEAR, Right: EMaC.

Figure 13. Point cloud visualizations for target localization with K = 4 (scenario 2). Left: Full array, Middle: NEAR, Right: EMaC.