# structureblind_signal_recovery__1bb3ca86.pdf

Structure-Blind Signal Recovery

Dmitry Ostrovsky Zaid Harchaoui Anatoli Juditsky Arkadi Nemirovski firstname.lastname@imag.fr

We consider the problem of recovering a signal observed in Gaussian noise. If the set of signals is convex and compact, and can be speciﬁed beforehand, one can use classical linear estimators that achieve a risk within a constant factor of the minimax risk. However, when the set is unspeciﬁed, designing an estimator that is blind to the hidden structure of the signal remains a challenging problem. We propose a new family of estimators to recover signals observed in Gaussian noise. Instead of specifying the set where the signal lives, we assume the existence of a well-performing linear estimator. Proposed estimators enjoy exact oracle inequalities and can be efﬁciently computed through convex optimization. We present several numerical illustrations that show the potential of the approach.

1 Introduction

We consider the problem of recovering a complex-valued signal (xt)t Z from the noisy observations

yτ = xτ + σζτ, n τ n. (1)

Here n Z+, and ζτ CN(0, 1) are i.i.d. standard complex-valued Gaussian random variables, meaning that ζ0 = ξ1 0 + ıξ2 0 with i.i.d. ξ1 0, ξ2 0 N(0, 1). Our goal is to recover xt, 0 t n, given the sequence of observations yt n, ..., yt up to instant t, a task usually referred to as (pointwise) ﬁltering in machine learning, statistics, and signal processing [5].

The traditional approach to this problem considers linear estimators, or linear ﬁlters, which write as

τ=0 φτyt τ, 0 t n.

Linear estimators have been thoroughly studied in various forms, they are both theoretically attractive [7, 3, 2, 16, 17, 11, 13] and easy to use in practice. If the set X of signals is well-speciﬁed, one can usually compute a (nearly) minimax on X linear estimator in a closed form. In particular, if X is a class of smooth signals, such as a H older or a Sobolev ball, then the corresponding estimator is given by the kernel estimator with the properly set bandwidth parameter [16] and is minimax among all possible estimators. Moreover, as shown by [6, 2], if only X is convex, compact, and centrally symmetric, the risk of the best linear estimator of xt is within a small constant factor of the minimax risk over X. Besides, if the set X can be speciﬁed in a computationally tractable way, which clearly is still a weaker assumption than classical smoothness assumptions, the best linear estimator can be efﬁciently computed by solving a convex optimization problem on X. In other words, given a computationally tractable set X on the input, one can compute a nearly-minimax linear estimator and the corresponding (nearly-minimax) risk over X. The strength of this approach, however, comes at

LJK, University of Grenoble Alpes, 700 Avenue Centrale, 38401 Domaine Universitaire de Saint-Martind H eres, France. University of Washington, Seattle, WA 98195, USA. Georgia Institute of Technology, Atlanta, GA 30332, USA.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

a price: the set X still must be speciﬁed beforehand. Therefore, when one faces a recovery problem without any prior knowledge of X, this approach cannot be implemented.

We adopt here a novel approach to ﬁltering, which we refer to as structure-blind recovery. While we do not require X to be speciﬁed beforehand, we assume that there exists a linear oracle a wellperforming linear estimator of xt. Previous works [8, 10, 4], following a similar philosophy, proved that one can efﬁciently adapt to the linear oracle ﬁlter of length m = O(n) if the corresponding ﬁlter φ is time-invariant, i.e. it recovers the target signal uniformly well in the O(n)-sized neighbourhood of t, and if its ℓ2-norm is small bounded by ρ/ m for a moderate ρ 1. The adaptive estimator is computed by minimizing the ℓ -norm of the ﬁlter discrepancy, in the Fourier domain, under the constraint on the ℓ1-norm of the ﬁlter in the Fourier domain. Put in contrast to the oracle linear ﬁlter, the price for adaptation is proved to be O(ρ3

ln n), with the lower bound of O(ρ

ln n) [8, 4].

We make the following contributions:

we propose a new family of recovery methods, obtained by solving a least-squares problem constrained or penalized by the ℓ1-norm of the ﬁlter in the Fourier domain;

we prove exact oracle inequalities for the ℓ2-risk of these methods;

we show that the price for adaptation improves upon previous works [8, 4] to O(ρ2

ln n) for the point-wise risk and to O(ρ

ln n) for the ℓ2-risk.

we present numerical experiments that show the potential of the approach on synthetic and real-world images and signals.

Before presenting the theoretical results, let us introduce the notation we use throughout the paper.

Filters Let C(Z) be the linear space of all two-sided complex-valued sequences x = {xt C}t Z. For k, k Z we consider ﬁnite-dimensional subspaces

C(Zk k ) = {x C(Z) : xt = 0, t / [k, k ]} .

It is convenient to identify m-dimensional complex vectors, m = k k + 1, with elements of C(Zk k ) by means of the notation:

xk k := [xk; ...; xk ] Ck k+1.

We associate to linear mappings C(Zk k ) C(Zj

j ) (j j +1) (k k+1) matrices with complex entries. The convolution u v of two sequences u, v C(Z) is a sequence with elements

τ Z uτvt τ, t Z.

Given observations (1) and ϕ C(Zm 0 ) consider the (left) linear estimation of x associated with ﬁlter ϕ: bxt = [ϕ y]t (bxt is merely a kernel estimate of xt by a kernel ϕ supported on [0, ..., m]).

Discrete Fourier transform We deﬁne the unitary Discrete Fourier transform (DFT) operator Fn : Cn+1 Cn+1 by

z 7 Fnz, [Fnz]k = (n + 1) 1/2 n X

t=0 zt e 2πıkt

n+1 , 0 k n.

The inverse Discrete Fourier transform (i DFT) operator F 1 n is given by F 1 n := F H n (here AH stands for Hermitian adjoint of A). By the Fourier inversion theorem, F 1 n (Fn z) = z.

We denote p usual ℓp-norms on C(Z): x p = (P

t Z |xt|p)1/p, p [1, ]. Usually, the argument will be ﬁnite-dimensional an element of C(Zk k ); we reserve the special notation

x n,p := xn 0 p.

Furthermore, DFT allows to equip C(Zn 0) with the norms associated with ℓp-norms in the spectral domain: x n,p := xn 0 p := Fnxn 0 p, p [1, ];

note that unitarity of the DFT implies the Parseval identity: x n,2 = x n,2.

Finally, c, C, and C stand for generic absolute constants.

2 Oracle inequality for constrained recovery

Given observations (1) and ϱ > 0, we ﬁrst consider the constrained recovery bxcon given by

[bxcon]t = [bϕ y]t, t = 0, ..., n,

where bϕ is an optimal solution of the constrained optimization problem

min ϕ C(Zn 0 )

y ϕ y n,2 : ϕ n,1 ϱ/

n + 1 . (2)

The constrained recovery estimator minimizes a least-squares ﬁt criterion under a constraint on ϕ n,1 = Fnϕn 0 1, that is an ℓ1 constraint on the discrete Fourier transform of the ﬁlter. While the least-squares objective naturally follows from the Gaussian noise assumption, the constraint can be motivated as follows.

Small-error linear ﬁlters Linear ﬁlter ϕo with a small ℓ1 norm in the spectral domain and small recovery error exists, essentially, whenever there exists a linear ﬁlter with small recovery error [8, 4]. Indeed, let us say that x C(Zn 0) is simple [4] with parameters m Z+ and ρ 1 if there exists φo C(Zm 0 ) such that for all m τ 2m,

E |xτ [φo y]τ|2 1/2 σρ m + 1. (3)

In other words, x is (m, ρ)-simple if there exists a hypothetical ﬁlter φo of the length at most m + 1 which recovers xτ with squared risk uniformly bounded by σ2ρ2

m+1 in the interval m τ 2m. Note that (3) clearly implies that φo 2 ρ/ m + 1, and that |[x φo x]τ| σρ/ m + 1 τ, m τ 2m. Now, let n = 2m, and let

ϕo = φo φo Cn+1.

As proved in [15, Appendix C], we have

ϕo n,1 2ρ2/

and, for a moderate absolute constant c,

x ϕo y n,2 cσρ2p

1 + ln[1/α] (5)

with probability 1 α. To summarize, if x is (m, ρ)-simple, i.e., when there exists a ﬁlter φo of length m + 1 which recovers x with small risk on the interval [ m, 2m], then the ﬁlter ϕo = φo φo of the length at most n + 1, with n = 2m, has small norm ϕo n,1 and recovers the signal x with (essentially the same) small risk on the interval [0, n].

Hidden structure The constrained recovery estimator is completely blind to a possible hidden structure of the signal, yet can seamlessly adapt to it when such a structure exists, in a way that we can rigorously establish. Using the right-shift operator on C(Z), [ x]t = xt 1, we formalize the hidden structure as an unknown shift-invariant linear subspace of C(Z), S = S, of a small dimension s. We do not assume that x belongs to that subspace. Instead, we make a more general assumption that x is close to this subspace, that is, it may be decomposed into a sum of a component that lies in the subspace and a component whose norm we can control.

Assumption A We suppose that x admits the decomposition

x = x S + ε, x S S,

where S is an (unknown) shift-invariant, S = S, subspace of C(Z) of dimension s, 1 s n+1, and ε is small , namely, τε n,2 σκ, 0 τ n.

Shift-invariant subspaces of C(Z) are exactly the sets of solutions of homogeneous linear difference equations with polynomial operators. This is summarized by the following lemma (we believe it is a known fact; for completeness we provide a proof in [15, Appendix C]). Lemma 2.1. Solution set of a homogeneous difference equation with a polynomial operator p( ),

= 0, t Z, (6)

with deg(p( )) = s, p(0) = 1, is a shift-invariant subspace of C(Z) of dimension s. Conversely, any shift-invariant subspace S C(Z), S S, dim(S) = s < , is the set of solutions of some homogeneous difference equation (6) with deg(p( )) = s, p(0) = 1. Moreover, such p( ) is unique.

On the other hand, for any polynomial p( ), solutions of (6) are exponential polynomials [? ] with frequencies determined by the roots of p( ). For instance, discrete-time polynomials xt = Ps 1 k=0 cktk, t Z of degree s 1 (that is, exponential polynomials with all zero frequencies) form a linear space of dimension s of solutions of the equation (6) with a polynomial p( ) = (1 )s with a unique root of multiplicity s, having coefﬁcients pk = ( 1)k s k . Naturally, signals which are close, in the ℓ2 distance, to discrete-time polynomials are Sobolev-smooth functions sampled over the regular grid [10]. Sum of harmonic oscillations xt = Ps k=1 ckeıωkt, ωk [0, 2π) being all different, is another example; here, p( ) = Qs k=1(1 eıωk ).

We can now state an oracle inequality for the constrained recovery estimator; see [15, Appendix B]. Theorem 2.1. Let ϱ 1, and let ϕo C(Zn 0) be such that

Suppose that Assumption A holds for some s Z+ and κ < . Then for any α, 0 < α 1, it holds with probability at least 1 α:

x bxcon n,2 x ϕo y n,2 + Cσ q

ln [1/α] + ln [n/α] . (7)

When considering simple signals, Theorem 2.1 gives the following. Corollary 2.1. Assume that signal x is (m, ρ)-simple, ρ 1 and m Z+. Let n = 2m, ϱ 2ρ2, and let Assumption A hold for some s Z+ and κ < . Then for any α, 0 < α 1, it holds with probability at least 1 α:

x bxcon n,2 Cσρ2p

ln[1/α] + C σ q

ln [1/α] + ln [n/α] .

Adaptation and price The price for adaptation in Theorem 2.1 and Corollary 2.1 is determined by three parameters: the bound on the ﬁlter norm ϱ, the deterministic error κ, and the subspace dimension s. Assuming that the signal to recover is simple, and that ϱ = 2ρ2, let us compare the magnitude of the oracle error to the term of the risk which reﬂects price of adaptation . Typically (in fact, in all known to us cases of recovery of signals from a shift-invariant subspace), the parameter ρ is at least s. Therefore, the bound (5) implies the typical bound O(σ γρ2) = σs γ for the term x ϕo y n,2 (we denote γ = ln(1/α)). As a result, for instance, in the parametric situation , when the signal belongs or is very close to the subspace, that is when κ = O(ln(n)), the price of adaptation O σ[s + ρ2(γ + γ ln n)]1/2 is much smaller than the bound on the oracle error. In the nonparametric situation , when κ = O(ρ2), the price of adaptation has the same order of magnitude as the oracle error.

Finally, note that under the premise of Corollary 2.1 we can also bound the pointwise error. We state the result for ϱ = 2ρ2 for simplicity; the proof can be found in [15, Appendix B].

Theorem 2.2. Assume that signal x is (m, ρ)-simple, ρ 1 and m Z+. Let n = 2m, ϱ = 2ρ2, and let Assumption A hold for some s Z+ and κ < . Then for any α, 0 < α 1, the constrained recovery bxcon satisﬁes

|xn [bxcon]n| C σρ m + 1

ln[n/α] + ρ q

ln [1/α] + s .

3 Oracle inequality for penalized recovery

To use the constrained recovery estimator with a provable guarantee, see e.g. Theorem 2.1, one must know the norm of a small-error linear ﬁlter ϱ, or at least have an upper bound on it. However, if this parameter is unknown, but instead the noise variance is known (or can be estimated from data), we can build a more practical estimator that still enjoys an oracle inequality.

The penalized recovery estimator [bxpen]t = [bϕ y]t is an optimal solution to a regularized leastsquares minimization problem, where the regularization penalizes the ℓ1-norm of the ﬁlter in the Fourier domain: bϕ Argmin ϕ C(Zn 0 )

y ϕ y 2 n,2 + λ

n + 1 ϕ n,1 . (8)

Similarly to Theorem 2.1, we establish an oracle inequality for the penalized recovery estimator. Theorem 3.1. Let Assumption A hold for some s Z+ and κ < , and let ϕo C(Zn 0) satisfy ϕo n,1 ϱ/ n + 1 for some ϱ 1.

1o. Suppose that the regularization parameter of penalized recovery bxpen satisﬁes λ λ,

λ := 60σ2 ln[63n/α].

Then, for 0 < α 1, it holds with probability at least 1 α:

x bxpen n,2 x ϕo y n,2 + C p

s + (bϱ + 1)κ p

where bϱ := n + 1 bϕ n,1.

2o. Moreover, if κ κ,

κ := 10 ln[42n/α] p

ln [16/α] ,

and λ 2λ, one has

x bxpen n,2 x ϕo y n,2 + C p

ϱλ + C σ s.

The proof closely follows that of Theorem 2.1 and can also be found in [15, Appendix B].

4 Discussion

There is some redundancy between simplicity of a signal, as deﬁned by (3), and Assumption A. Usually a simple signal or image x is also close to a low-dimensional subspace of C(Z) (see, e.g., [10, section 4]), so that Assumption A holds automatically . Likewise, x is almost simple when it is close to a low-dimensional time-invariant subspace. Indeed, if x C(Z) belongs to S, i.e. Assumption A holds with κ = 0, one can easily verify that for n s there exists a ﬁlter φo C(Zn n) such that

s/(n + 1), and xτ = [φo x]τ, τ Z . (9)

See [15, Appendix C] for the proof. This implies that x can be recovered efﬁciently from observations (1): E |xτ [φo y]τ|2 1/2 σ r s n + 1.

In other words, if instead of the ﬁltering problem we were interested in the interpolation problem of recovering xt given 2n + 1 observations yt n, ..., yt+n on the left and on the right of t, Assumption

A would imply a kind of simplicity of x. On the other hand, it is clear that Assumption A is not sufﬁcient to imply the simplicity of x with respect to the ﬁltering , in the sense of the deﬁnition we use in this paper, when we are allowed to use only observations on the left of t to compute the estimation of xt. Indeed, one can see, for instance, that already signals from the parametric family Xα = {x C(Z) : xτ = cατ, c C}, with a given |α| > 1, which form a one-dimensional space of solutions of the equation xτ = αxτ 1, cannot be estimated with small risk at t using only observations on the left of t (unless c = 0), and thus are not simple in the sense of (3).

Of course, in the above example, the difﬁculty of the family Xα is due to instability of solutions of the difference equation which explode when τ + . Note that signals x Xα with |α| 1 (linear functions, oscillations, or damped oscillations) are simple. More generally, suppose that x satisﬁes a difference equation of degree s:

where p(z) = Ps i=0 pizi is the corresponding characteristic polynomial and is the right shift operator. When p(z) is unstable has roots inside the unit circle (depending on initial conditions ) the set of solutions to the equation (10) contains difﬁcult to ﬁlter signals. Observe that stability of solutions is related to the direction of the time axis; when the characteristic polynomial p(z) has roots outside the unit circle, the corresponding solutions may be left unstable increase exponentially when τ . In this case right ﬁltering estimating xτ using observations on the right of τ will be difﬁcult. A special situation where interpolation and ﬁltering is always simple arises when the characteristic polynomial of the difference equation has all its roots on the unit circle. In this case, solutions to (10) are generalized harmonic oscillations (harmonic oscillations modulated by polynomials), and such signals are known to be simple. Theorem 4.1 summarizes the properties of the solutions of (10) in this particular case; see [15, Appendix C] for the proof. Theorem 4.1. Let s be a positive integer, and let p = [p0; ...; ps] Cs+1 be such that the polynomial p(z) = Ps i=0 pizi has all its roots on the unit circle. Then for every integer m satisfying

m m(s) := Cs2 ln(s + 1),

one can point out q Cm+1 such that any solution to (10) satisﬁes

xτ = [q x]τ, τ Z,

and q 2 ρ(s, m)/ m where ρ(s, m) = C min n s3/2

ln[ms] o . (11)

5 Numerical experiments

We present preliminary results on simulated data of the proposed adaptive signal recovery methods in several application scenarios. We compare the performance of the penalized ℓ2-recovery of Sec. 3 to that of the Lasso recovery of [1] in signal and image denoising problems. Implementation details for the penalized ℓ2-recovery are given in Sec. 6. Discussion of the discretization approach underlying the competing Lasso method can be found in [1, Sec. 3.6].

We follow the same methodology in both signal and image denoising experiments. For each level of the signal-to-noise ratio SNR {1, 2, 4, 8, 16}, we perform N Monte-Carlo trials. In each trial, we generate a random signal x on a regular grid with n points, corrupted by the i.i.d. Gaussian noise of variance σ2. The signal is normalized: x 2 = 1 so SNR 1 = σ n. We set the regularization penalty in each method as follows. For penalized ℓ2-recovery (8), we use λ = 2σ2 log[63n/α] with α = 0.1. For Lasso [1], we use the common setting λ = σ 2 log n. We report experimental results by plotting the ℓ2-error bx x 2, averaged over N Monte-Carlo trials, versus the inverse of the signal-to-noise ratio SNR 1.

Signal denoising We consider denoising of a one-dimensional signal in two different scenarios, ﬁxing N = 100 and n = 100. In the Random Spikes scenario, the signal is a sum of 4 harmonic oscillations, each characterized by a spike of a random amplitude at a random position in the continuous frequency domain [0, 2π]. In the Coherent Spikes scenario, the same number of spikes is

0.06 0.12 0.25 0.5 1 2 4

Lasso [1] Pen. 2-rec.

0.06 0.12 0.25 0.5 1 2 4 0.025

Lasso [1] Pen. 2-rec.

0.06 0.12 0.25 0.5 1 2 4

Lasso [1] Pen. 2-rec.

0.06 0.12 0.25 0.5 1 2 4

Lasso [1] Pen. 2-rec.

Figure 1: Signal and image denoising in different scenarios, left to right: Random Spikes, Coherent Spikes, Random Spikes-2D, and Coherent Spikes-2D. The steep parts of the curves on high noise levels correspond to observations being thresholded to zero.

sampled by pairs. Spikes in each pair have the same amplitude and are separated by only 0.1 of the DFT bin 2π/n which could make recovery harder due to high signal coherency. However, in practice we found Random Spikes to be slightly harder than Coherent Spikes for both methods, see Fig. 1. As Fig. 1 shows, the proposed penalized ℓ2-recovery outperforms the Lasso method for all noise levels. The performance gain is particularly signiﬁcant for high signal-to-noise ratios.

Image Denoising We now consider recovery of an unknown regression function f on the regular grid on [0, 1]2 given the noisy observations:

yτ = xτ + σζτ, τ {0, 1, ..., m 1}2 , (12)

where xτ = f(τ/m). We ﬁx N = 40, and the grid dimension m = 40; the number of samples is then n = m2. For the penalized ℓ2-recovery, we implement the blockwise denoising strategy (see Appendix for the implementation details) with just one block for the entire image. We present additional numerical illustrations in the supplementary material.

We study three different scenarios for generating the ground-truth signal in this experiment. The ﬁrst two scenarios, Random Spikes-2D and Coherent Spikes-2D, are two-dimensional counterparts of those studied in the signal denoising experiment: the ground-truth signal is a sum of 4 harmonic oscillations in R2 with random frequencies and amplitudes. The separation in the Coherent Spikes2D scenario is 0.2π/m in each dimension of the torus [0, 2π]2. The results for these scenarios are shown in Fig. 1. Again, the proposed penalized ℓ2-recovery outperforms the Lasso method for all noise levels, especially for high signal-to-noise ratios.

In scenario Dimension Reduction-2D we investigate the problem of estimating a function with a hidden low-dimensional structure. We consider the single-index model of the regression function:

f(t) = g(θT t), g( ) S1 β(1). (13)

Here, S1 β(1) = {g : R R, g(β)( ) 2 1} is the Sobolev ball of smooth periodic functions on [0, 1], and the unknown structure is formalized as the direction θ. In our experiments we sample the direction θ uniformly at random and consider different values of the smoothness index β. If it is known a priori that the regression function possesses the structure (13), and only the index is unknown, one can use estimators attaining one-dimensional rates of recovery; see e.g. [12] and references therein. In contrast, our recovery algorithms are not aware of the underlying structure but might still adapt to it.

As shown in Fig. 2, the ℓ2-recovery performs well in this scenario despite the fact that the available theoretical bounds are pessimistic. For example, the signal (13) with a smooth g can be approximated by a small number of harmonic oscillations in R2. As follows from the proof of [9, Proposition 10] combined with Theorem 4.1, for a sum of k harmonic oscillations in Rd one can point out a reproducing linear ﬁlter with ϱ(k) = O(k2d) (neglecting the logarithmic factors), i.e. the theoretical guarantee is quite conservative for small values of β.

6 Details of algorithm implementation

Here we give a brief account of some techniques and implementation tricks exploited in our codes.

Solving the optimization problems Note that the optimization problems (2) and (8) underlying the proposed recovery algorithms are well structured Second-Order Conic Programs (SOCP) and

<pn 0.06 0.12 0.25 0.5 1 2 4

Lasso [1] Pen. 2-rec.

<pn 0.06 0.12 0.25 0.5 1 2 4 0.025

Lasso [1] Pen. 2-rec.

<pn 0.06 0.12 0.25 0.5 1 2 4 0.025

Lasso [1] Pen. 2-rec.

Figure 2: Image denoising in Dimension Reduction scenario; smoothness decreases from left to right.

can be solved using Interior-point methods (IPM). However, the computational complexity of IPM applied to SOCP with dense matrices grows rapidly with problem dimension, so that large problems of this type arising in signal and image processing are well beyond the reach of these techniques. On the other hand, these problems possess nice geometry associated with complex ℓ1-norm. Moreover, their ﬁrst-order information the value of objective and its gradient at a given ϕ can be computed using Fast Fourier Transform in time which is almost linear in problem size. Therefore, we used ﬁrstorder optimization algorithms, such as Mirror-Prox and Nesterov s accelerated gradient algorithms (see [14] and references therein) in our recovery implementation. A complete description of the application of these optimization algorithms to our problem is beyond the scope of the paper; we shall present it elsewhere.

Interpolating recovery In Sec. 2-3 we considered only recoveries which estimated the value xt of the signal via the observations at n + 1 points t n, ..., t on the left (ﬁltering problem). To recover the whole signal, one may consider a more ﬂexible alternative interpolating recovery which estimates xt using observations on the left and on the right of t. In particular, if the objective is to recover a signal on the interval { n, ..., n}, one can apply interpolating recoveries which use the same observations y n, ..., yn to estimate xτ at any τ { n, ..., n}, by altering the relative position of the ﬁlter and the current point.

Blockwise recovery Ideally, when using pointwise recovery, a speciﬁc ﬁlter is constructed for each time instant t. This may pose a tremendous amount of computation, for instance, when recovering a high-resolution image. Alternatively, one may split the signal into blocks, and process the points of each block using the same ﬁlter (cf. e.g. Theorem 2.1). For instance, a one-dimensional signal can be divided into blocks of length, say, 2m + 1, and to recover x C(Zm m) in each block one may ﬁt one ﬁlter of length m + 1 recovering the right half-block xm 0 and another ﬁlter recovering the left half-block x 1 m.

7 Conclusion

We introduced a new family of estimators for structure-blind signal recovery that can be computed using convex optimization. The proposed estimators enjoy oracle inequalities for the ℓ2-risk and for the pointwise risk. Extensive theoretical discussions and numerical experiments will be presented in the follow-up journal paper.

Acknowledgments

We would like to thank Arnak Dalalyan and Gabriel Peyr e for fruitful discussions. DO, AJ, ZH were supported by the Lab Ex PERSYVAL-Lab (ANR-11-LABX-0025) and the project Titan (CNRSMastodons). ZH was also supported by the project Macaron (ANR-14-CE23-0003-01), the MSRInria joint centre, and the program Learning in Machines and Brains (CIFAR). Research of AN was supported by NSF grants CMMI-1262063, CCF-1523768.

[1] B. N. Bhaskar, G. Tang, and B. Recht. Atomic norm denoising with applications to line spectral estimation. IEEE Trans. Signal Processing, 61(23):5987 5999, 2013.

[2] D. L. Donoho. Statistical estimation and optimal recovery. Ann. Statist., 22(1):238 270, 03 1994.

[3] D. L. Donoho and M. G. Low. Renormalization exponents and optimal pointwise rates of convergence. Ann. Statist., 20(2):944 970, 06 1992.

[4] Z. Harchaoui, A. Juditsky, A. Nemirovski, and D. Ostrovsky. Adaptive recovery of signals by convex optimization. In Proceedings of The 28th Conference on Learning Theory, COLT 2015, Paris, France, July 3-6, 2015, pages 929 955, 2015.

[5] S. Haykin. Adaptive ﬁlter theory. Prentice Hall, 1991.

[6] I. Ibragimov and R. Khasminskii. Nonparametric estimation of the value of a linear functional in Gaussian white noise. Theor. Probab. & Appl., 29(1):1 32, 1984.

[7] I. Ibragimov and R. Khasminskii. Estimation of linear functionals in Gaussian noise. Theor. Probab. & Appl., 32(1):30 39, 1988.

[8] A. Juditsky and A. Nemirovski. Nonparametric denoising of signals with unknown local structure, I: Oracle inequalities. Appl. & Comput. Harmon. Anal., 27(2):157 179, 2009.

[9] A. Juditsky and A. Nemirovski. Nonparametric estimation by convex programming. Ann. Statist., 37(5a):2278 2300, 2009.

[10] A. Juditsky and A. Nemirovski. Nonparametric denoising signals of unknown local structure, II: Nonparametric function recovery. Appl. & Comput. Harmon. Anal., 29(3):354 367, 2010.

[11] T. Kailath, A. Sayed, and B. Hassibi. Linear Estimation. Prentice Hall, 2000.

[12] O. Lepski and N. Serdyukova. Adaptive estimation under single-index constraint in a regression model. Ann. Statist., 42(1):1 28, 2014.

[13] S. Mallat. A wavelet tour of signal processing. Academic Press, 1999.

[14] Y. Nesterov and A. Nemirovski. On ﬁrst-order algorithms for ℓ1/nuclear norm minimization. Acta Num., 22:509 575, 2013.

[15] D. Ostrovsky, Z. Harchaoui, A. Juditsky, and A. Nemirovski. Structure-Blind Signal Recovery. ar Xiv:1607.05712v2, Oct. 2016.

[16] A. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2008.

[17] L. Wasserman. All of Nonparametric Statistics. Springer, 2006.