# jackpot_approximating_uncertainty_domains_with_adversarial_manifolds__8783ec3f.pdf

Journal of Machine Learning Research 24 (2025) 1-41 Submitted 10/24; Revised 7/25; Published 10/25

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

Nathana el Munier nathanael.munier@ens-rennes.fr University of Toulouse, CNRS, IMT, CBI, France

Emmanuel Soubies University of Toulouse, CNRS, IRIT, CBI, France

Pierre Weiss University of Toulouse, CNRS, IRIT, CBI, France

Editor: Joseph Salmon

1 Introduction 2 1.1 Application Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.1 Identifying Dynamical Systems Parameters . . . . . . . . . . . . . . 3 1.1.2 Blind Inverse Problems . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.3 Bayesian Posterior Exploration . . . . . . . . . . . . . . . . . . . . . 4 1.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2.1 Structural Identiﬁability . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2.2 Practical Identiﬁability . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3 Main Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 The Jackpot Algorithm and its Guarantees 8 2.1 Linear Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.1.1 Approximation Result . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.1.2 Lowest Singular Vectors Computation . . . . . . . . . . . . . . . . . 10 2.2 Nonlinear Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2.1 The Jackpot Manifold . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2.2 Approximation Guarantees . . . . . . . . . . . . . . . . . . . . . . . 13 2.2.3 Numerical Computation . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3 Numerical Experiments 20 3.1 Measuring Masses in the Solar System . . . . . . . . . . . . . . . . . . . . . 20 3.2 Blind Deblurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.3 Posterior Exploration for an Image Deblurring Problem . . . . . . . . . . . 24

A Proofs 28 A.1 Deﬁnition and Characterization of Manifolds . . . . . . . . . . . . . . . . . 28

c 2025 N. Munier, E. Soubies and P. Weiss.

License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v24/24-1769.html.

N. Munier, E. Soubies and P. Weiss

A.2 Linear Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 A.2.1 Proof of Proposition 4 . . . . . . . . . . . . . . . . . . . . . . . . . . 29 A.3 Nonlinear Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 A.3.1 Proof of Theorem 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 A.3.2 Proof of Theorem 8, Theorem 9, Proposition 11 and Theorem 12 . . 33 A.4 Numerical Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 A.4.1 Proof of Proposition 14 . . . . . . . . . . . . . . . . . . . . . . . . . 36 A.4.2 Proof of Corollary 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 Abstract

Given a forward mapping Φ : RN RM and a point x RN, the region {x RN, Φ(x) Φ(x ) 2 ε}, where ε 0 is a perturbation amplitude, represents the set of all possible inputs x that could have produced the measurement Φ(x ) within an acceptable error margin. This set is related to uncertainty analysis, a key challenge in inverse problems. In this work, we develop a numerical algorithm called Jackpot (Jacobian Kernel Projection Optimization) which approximates this set with a low-dimensional adversarial manifold. The proposed algorithm leverages automatic diﬀerentation, allowing it to handle complex, high dimensional mappings such as those found when dealing with dynamical systems or neural networks. We demonstrate the eﬀectiveness of our algorithm on various challenging large-scale, non-linear problems including parameter identiﬁcation in dynamical systems and blind image deblurring. Keywords: Inverse problem, uncertainty quantiﬁcation, identiﬁability

1. Introduction

Let Φ : RN RM denote an arbitrary C1 mapping and y = Φ(x) + η be some noisy measurement of x RN through this mapping. Given an estimate x of x, our main objective in this paper is to design a numerical algorithm to describe the set Uε deﬁned by

Uε def = x RN, Φ(x) Φ(x ) 2 ε (1)

= Φ 1 (B (Φ(x ), ε)) ,

for some ε 0, where 2 denotes the standard ℓ2-norm and B(y, r) denotes an ℓ2-ball centered at y of radius r. For example, choosing ε = Φ(x ) y deﬁnes Uε as the set of points whose forward images under Φ are as close to the measurement y as that of x , and can thus be interpreted as equally plausible estimates. As will be explained later, many important practical problems can be framed in this manner. If the mapping Φ is linear, then Uε is an ellipsoid that can be characterized using linear algebra tools. In the general nonlinear case, the set Uε is the sublevel set of a near arbitrary function. Indeed, an arbitrary closed set E can be written as the 0-level set of the function Φ(x) = dist(x, E)2. Given the level of generality, it is unreasonable to expect solving the problem properly for all mappings Φ. Describing Uε is more challenging than ﬁnding all the global minimizers of a near arbitrary non convex function. Therefore, we aim for a more modest goal: describe Uε locally around the point x . Additionally, we seek to approximate this set by another one of relatively low intrinsic dimensionality. This serves two purposes: i) to design tractable algorithms and ii) to provide an output that can be easily visualized. Our main objective in this paper can therefore be summarized as follows.

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

Main objective Given a mapping Φ : RN RM and a point x RN, ﬁnd a Ddimensional manifold1 Mε δ that approximates the set

Uε δ def = {x B(x , δ), Φ(x) Φ(x ) 2 ε}, (2)

where δ 0 bounds the diameter of the approximating manifold. The set Uε δ can be encountered under diﬀerent names, depending on the scientiﬁc ﬁeld: equivalence domain (Grayver and Kuvshinov, 2016), low misﬁt region (Fern andez-Mart ınez et al., 2012), solution set (Jauberthie et al., 2013) or uncertainty region (Fern andez-Mart ınez and Fern andez-Mu niz, 2020). Throughout this paper, we will call it uncertainty region. We are particularly interested in complex models Φ (e.g. dynamical systems, neural networks), which can lead to problems involving high-dimensional (latent) spaces. Throughout the paper, we assume that the Jacobian of Φ can be accessed either through analytical derivation or by using automatic diﬀerentation algorithms.

1.1 Application Examples

Many problems involve identifying parameters of physical, biomedical, or chemical systems from indirect observations. The primary applications we have in mind are related to the ﬁeld of inverse problems. In this ﬁeld, the set Uε describes what can be identiﬁed in a system and what cannot. Let us illustrate this point with a few examples studied in the numerical experiments.

1.1.1 Identifying Dynamical Systems Parameters

Assume that some dynamical system u : R+ RP D of P particles in RD is governed by a ﬁrst or second order equation:

u(t) = f(u(t), x) or u(t) = f(u(t), x)

where u(t) = (u1(t), . . . , u P (t)) denotes the particles positions and f : RP D RN RP D

is a force or velocity ﬁeld parameterized by the vector x RN. After measuring the particles positions at multiple time points, we would like to recover the unknown parameter x that describes the dynamical system. An example that we will be studied later is the solar system, where u(t) represents the positions of P = N planets at time t, f are forces given by Newton s second law of attraction and x RN is the mass of the diﬀerent planets. We assume that a measurement system (e.g. telescope) returns estimates of the positions u at some times, leading to a measurement vector (y1, . . . , y K) of the form yk = u(tk)+bk for sampling times t1 < < t K and unknown measurement errors bk. The scientiﬁc question we aim to address is: can we recover the mass of the planets from the partial observation of their positions? To this end, we can set Φ : x 7 (u(t1), . . . , u(t K)) and ﬁnd a ﬁrst guess x of the mass using a least square regression:

x = argmin x RN 1 2 Φ(x) y 2 2.

1. See Appendix A.1 for a deﬁnition.

N. Munier, E. Soubies and P. Weiss

Describing the set Uε δ provides a much ﬁner description of the uncertainties than the single estimate x . As we will demonstrate, our algorithm reveals that predicting the mass of Pluto is highly challenging. This observation aligns with historical estimates: initially, Pluto s mass was thought to be twice that of Earth, but in 2015, the New Horizons probe measured it to be just 1/455th of Earth s mass.

1.1.2 Blind Inverse Problems

Assume that we acquire measurements y of the form

y = P(A(θ)(x)),

where P : RM RM represents a perturbation (e.g. noise, quantization) and θ RN

is a parameter that describes the state of an acquisition system A(θ) : RN RM. This parameter could represent an unknown point spread function in optics or uncertain projection angles in tomography for instance. The goal in blind inverse problems is to recover both x and θ from the measurements y. One approach we will explore in the numerical section involves constructing a learned reconstruction mapping R(A(θ), y), which, given a forward operator A(θ) and a measurement vector y, outputs an estimate ˆx(θ) = R(A(θ), y) of x. Then, a simple approach to estimate θ is to minimize the discrepancy:

argmin θ RN 1 2 A(θ)(ˆx(θ)) y 2 2 ,

ensuring that the pair (A(θ), ˆx(θ)) is coherent with the observation y. By setting Φ(θ) = A(θ)(ˆx(θ)) and exploring Uε δ around a local minimizer θ , we can identify all pairs coherent with the data. This is a way to describe some uncertainty properties of the blind inverse problem.

1.1.3 Bayesian Posterior Exploration

Similar to the previous example, assume that a system yields measurements y of the form

y = P(A(x)),

where A : RN RM is a known operator describing the acquisition device. Whenever A is not invertible, recovering x from y requires regularization. An elegant approach to formalize this principle uses Bayesian reasoning: we assume that x is the realization of some random vector x with probability distribution function px. Under the assumption that the prior px is a proper probability distribution and the likelihood py|x(y|x) is well-deﬁned for the observed data y, it is possible to construct a posterior distribution:

px|y(x|y) = py|x(y|x) px(x)

and Bayesian estimators such as the Maximum A Posteriori (MAP) estimate:

ˆx MAP(y) def = argmax x RN px|y(x|y)

= argmin x RN log py|x(y|x) log (px(x)) .

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

The prior px can be handcrafted or learned with large databases. The latter strategy oftentimes provides surprisingly good results, which are rapidly replacing older strategies (Nah et al., 2021; Muckley, 2021; Sidky and Pan, 2024). In scientiﬁc applications, it however becomes critical to certify the results, i.e. describe what information can be safely regarded as valid and the one which is uncertain. One approach to achieving this involves describing the high probability region

Hα = {x RN, log px|y(x|y) α} (3)

for some probability threshold α. However, the prior px(x) can often only be accessed indirectly through its gradient (Hyv arinen, 2005; Vincent, 2011). In that situation, another approach is to compute a local minimizer x of the posterior distribution and to explore the set Uε δ = x B(x , δ), log px|y(x|y) 2 ε . (4)

This problem ﬁts our formalism with Φ = log px|y( |y) and Φ(x ) = 0, since x is a minimizer. When x = ˆx MAP, we have Uε δ Hα for some parameter ε > 0 that depends on α and suﬃciently small δ > 0. However the reverse inclusion is not true in general.

1.2 Related Works

As illustrated by these examples, describing uncertainty regions can be useful in various ﬁelds such as system identiﬁcation, inverse and blind inverse problems, data assimilation. It can also be regarded as a type of sensibility analysis or uncertainty quantiﬁcation. Let us review some approaches available in the literature. We will use the nomenclature of the identiﬁability theory (Wieland et al., 2021), which classiﬁes the study as structural in the noiseless setting (ε = 0) and practical when noise is considered (ε > 0).

1.2.1 Structural Identifiability

Using our notation, global structural identiﬁability means that U0 = {x } while local structural identiﬁability means that there exists δ > 0 such that U0 δ = {x }. The literature devoted to those two notions of identiﬁability is vast. Under mild regularity assumptions on Φ, a necessary and suﬃcient condition for local structural identiﬁability is given by ker(JΦ(x )) = {0}, where JΦ(x ) RM N is the Jacobian matrix of Φ at x (Rothenberg, 1971). When Φ is linear, this condition reads ker(Φ) = {0} and implies global structural identiﬁability. Beyond the linear case, there exist rich theories regarding the (local) identiﬁability of bilinear inverse problems (Li et al., 2016; Kech and Krahmer, 2017), phase retrieval problems (Jaganathan et al., 2016; Grohs et al., 2020), partial diﬀerential equations (Villaverde et al., 2018) or neural networks (Phuong and Lampert, 2019; Bona-Pellissier et al., 2022), to name a few. When x is non-identiﬁable, the analysis and characterization of the non-trivial set U0 locally around x has also been the object of many works. Existing methods include computing the uncertainty region U0 in explicit form using diﬀerential algebra (Bates et al., 2019), numerical algebraic geometry (Hubert, 1999; Bellu et al., 2007) or Lie group theory (Merkt et al., 2015). However, these algebraic methods are limited to the analysis of rational functions and become computationally prohibitive as the number of parameters increases.

N. Munier, E. Soubies and P. Weiss

(b) Φ = UAU T

(c) Φ(x) = x 2 2

Figure 1: Illustration of the proﬁle likelihood method. The red-black curves are the level sets of the discrepancy Φ( ) Φ(x ) 2. The discrepancy proﬁles along each axis are plotted in black, together with the conﬁdence intervals in green. In (a), (b), we ﬁxed A = diag(.2, 1), an unitary matrix U and x = (0, 0). In (c), we ﬁxed x = (1, 0).

1.2.2 Practical Identifiability

Practical identiﬁability, which guarantees stability to noise on the observations, is the main topic of this paper. Alternative names are sensitivity analysis or uncertainty quantiﬁcation, which are closely related. For the local characterization of the uncertainty region Uε, various approaches exist.

Coordinate-based methods Coordinate-based methods examine the model s sensitivity along each direction of the canonical basis, where the map x 7 exp ( Φ(x) Φ(x ) 2) is often called likelihood. Proﬁle likelihood methods (Venzon and Moolgavkar, 1988; Raue et al., 2009) compute one-dimensional proﬁles. The proﬁles are evaluated by ﬁxing one coordinate and optimizing over the remaining ones: min x RN xi=x i +tj

Φ(x) Φ(x ) 2.

This is done for diﬀerent deviations tj R and all coordinates i. The directions i, where the likelihood is high are uncertain. An illustration with simple likelihood functions is given in Fig. 1. Notice that in practice, only endpoints of the conﬁdence intervals are computed (Fischer and Lewis, 2021). A weakness of these approaches is that they act component-wise. Hence, in Fig. 1 examples (a), (b), we see that the proﬁles change a lot depending on the coordinate system, showing that the proﬁle directions should be chosen with care. In Fig. 1 (c) for instance, the proﬁles are locally ﬂat in both directions, while the uncertainty is only one-dimensional along the unit circle. A possible approach to avoid this drawback, is to ﬁnd meaningful directions using the Fisher information matrix, deﬁned by JΦ(x )T JΦ(x ). The eigenvectors associated to the lowest singular values indicate the directions which are hardest to identify. This idea was used by Eisenberg and Hayashi (2014) to identify combinations of coordinates which are jointly identiﬁable and draw proﬁles likelihood for these subsets only. A similar mechanism is at the core of our approach, though used in a diﬀerent manner.

Monte Carlo sampling methods The problem can be viewed as recovering the preimage Φ 1(B(Φ(x ), ε)), i.e., all points that Φ maps into a ball around Φ(x ). A simple way

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

(b) Φ = UAU T (c) Φ(x) = x 2 2

Figure 2: Illustration of Monte Carlo sampling on the same examples as in Fig. 1. The set Uε is represented in green and the sampling points are in blue.

to do so is to deploy a Monte Carlo sampling method (Mosegaard and Tarantola, 1995). A few samples yk for 1 k K are drawn at random within B(Φ(x ), ε). Then, the antecedents can be computed as

ˆxk def = argmin x RN Φ(x) yk 2 2. (5)

The discrete set (ˆxk) provides a snapshot of Uε. An illustration is provided in Fig. 2 with our basic examples. Statistical analyses of the samples (ˆxk) can then be performed to quantify the uncertainty. These include, for instance, variance or covariance estimates, bootstrapping (Hengl et al., 2007), sensitivity-based methods (Helton et al., 2006; Bardsley, 2012), or randomize-thenoptimize strategies (Bardsley et al., 2014). As opposed to the proﬁle likelihood approach, Monte Carlo sampling accounts for relationships between coordinates. Yet, it suﬀers from multiple weaknesses.

It can be computationally expensive given that each sample ˆxk requires solving an optimization problem and that the number of sampling points required for statistical analyses grows exponentially with the dimension of the problem (Fern andez-Mart ınez and Fern andez-Mu niz, 2020; Fern andez-Mart ınez et al., 2013).

For degenerate problems, where Φ is non injective, the optimization problem may possess multiple minimizers. In those situations, sampling becomes ineﬃcient. In Fig. 2 (c) for instance, we started the optimization algorithm by setting xinit = x as an initialization point. As a result, the sampling approach only yields a 1D subset of Uε, that does not capture the complete annulus.

Uniform sampling in B(Φ(x ), ε) might not translate to a uniform sampling in Uε, potentially over-representing some regions and under-representing others. This issue will be striking in our numerical experiments.

Some of these challenges can be addressed with advanced diﬀusion models, which have recently gained signiﬁcant attention in artiﬁcial intelligence research. These methods were initially developed for generative modeling tasks (Ho et al., 2020) and have since been adapted for various inverse problems (Chung et al., 2022; Laumont et al., 2022; Chemseddine et al., 2024). Ongoing research focuses on certifying the accuracy of posterior sampling (Thong et al., 2024) and improving the eﬃciency of sample generation. However, given the computational cost of these approaches, we have chosen not to use them for comparisons.

N. Munier, E. Soubies and P. Weiss

Geometric analysis Finally, the last class of methods, the one we endorse, aims to compute geometrically the structure of the set Uε. Arutjunjan et al. (2022) proved that the boundary of Uε is a N 1-dimensional manifold and proposed a numerical method to compute its boundary. It relies on the construction of a complete vector ﬁeld tangential to the level sets of Φ(x) Φ(x ) 2. Then, the determination of the boundary of Uε amounts to solve an ODE. This approach is limited to low-dimensional problems such as the two-dimensional non-polynomial and was applied only to three-dimensional polynomial by Arutjunjan et al. (2022). Instead of considering the whole space, Raman et al. (2017) proposed to determine a trajectory referred to as minimally disruptive curve or talweg within the structural or practical non-identiﬁable sets U0 and Uε. This trajectory is a one-dimensional sub-manifold approximation of Uε with the smallest discrepancy values. However, it is unclear how to extend this approach to higher dimensions. Another approach given by Transtrum and Qiu (2014) consists in successively eliminating one parameter at a time to ﬁnally obtain a partial low-dimensional boundary of the uncertainty region. This method is not applicable to high-dimensional problems. For further details and references related to these approaches, we refer the reader to the comprehensive reviews (Miao et al., 2011; Wieland et al., 2021; Lam et al., 2022).

1.3 Main Contribution

We aim at advancing the ﬁeld of local practical identiﬁability by designing the Jackpot algorithm. It is applicable to large dimensional problems and yields a manifold of arbitrary (but preferably small) dimension D. This manifold is constructed so as to provide a good approximation of the ε-uncertainty region in Hausdorﬀdistance, regardless of the non-linear and non-algebraic nature of the model. A simple illustration of its output is given in Fig. 5. The proposed algorithm is based on automatic diﬀerentiation, allowing us to tackle high-dimensional problems. It relies on the Python package deepinv (Tachella et al., 2025). In particular, we illustrate its behavior in problems involving dynamical systems and neural networks. The numerical experiments supporting these results, together with the implementation of the Jackpot algorithm, are available in the corresponding Git Hub repository.

2. The Jackpot Algorithm and its Guarantees

Our main goal is to describe Uε δ = {x B(x , δ), Φ(x) Φ(x ) 2 ε}, the local ε2/2sublevel set of the function

F(x) def = 1

2 Φ(x) Φ(x ) 2 2. (6)

In what follows, to measure how well we approximate the set Uε δ , we will use Kolmogorov and Hausdorﬀdistances which we deﬁne next.

Deﬁnition 1 (D-width or Kolmogorov distance) Let G(D, N) denote the set of Ddimensional vector subspaces in dimension N (the Grassmanian). The Kolmogorov distance δD of a set S RN is deﬁned by:

δD(S) def = inf V G(D,N) sup x S inf v V x v 2. (7)

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

It measures how well a set S can be approximated by a D-dimensional subspace. Let us also recall the notion of Hausdorﬀdistance.

Deﬁnition 2 (Hausdorﬀdistance) For two subsets X, Y of a metric space (M, d), the Hausdorﬀdistance is deﬁned as

d H(X, Y) def = max

sup x X d(x, Y), sup y Y d(X, y)

= inf η 0 {X Y + B(0, η) and Y X + B(0, η)} . (9)

This distance intuitively measures how much a set should be dilated to include the other one and vice versa. In this section, we start by proposing a linear manifold approximation f Mε δ obtained from a quadratic approximation of F at x . This allows us to present the main approximation results in a simpliﬁed format. We then turn to the construction of a more precise nonlinear manifold approximation Mε δ. We describe its approximation guarantees, and a practical algorithm to compute it.

2.1 Linear Approximation

Under a C2 regularity assumption, we can approximate F by a quadratic function. Most terms in the Taylor expansion vanish since x is a minimizer and for any h RN, we obtain:

F(x + h) = 1

2 JΦ(x ) h 2 2 + o( h 2 2). (10)

This motivates us to consider the quadratic approximation of F at x :

e F(x) def = 1

2 JΦ(x ) (x x ) 2 2. (11)

The uncertainty region can be described as the ε2/2-sublevel set of this function, denoted e Uε = n x RN, F(x) ε2/2 o . Similarly to the deﬁnition of Uε δ , we also set

e Uε δ = x B(x , δ), F(x) ε2

2.1.1 Approximation Result

In the linear case, the best approximation of e Uε by a D-dimensional subspace is explicit using the singular vectors of the Jacobian matrix of Φ. We can indeed decompose the Jacobian as JΦ(x) = U(x)Σ(x)V T (x), where U(x) RM M and V (x) RN N are orthogonal matrices and Σ(x) RM N is a diagonal rectangular matrix with nonnegative and decaying

entries. To simplify the notation, we let J def = JΦ(x ) = U Σ V ,T , where U = [u 1, . . . , u M], V = [v 1, . . . , v N], diag(Σ ) = [σ 1, . . . , σ R , 0, . . . , 0], and R is the rank of J . By convention, we let σ R +1, σ R +2, . . . , σ N = 0. The following description is standard.

Proposition 3 (From Johansen, 1977) The sublevel set e Uε is a (degenerate) hyper-ellipsoid with axes aligned with the vectors (v n). The length of the ellipsoid along the semi-axis v n is given by ε σ n for n {1, . . . , R } and it is inﬁnite for the other axes.

N. Munier, E. Soubies and P. Weiss

The following result is rather standard in approximation theory, see e.g. (Pinkus, 2012, Thm. 1.3). First the uncertainty region is approximated by an aﬃne subspace L and then by a troncation of this subset which gives the approximation manifold f Mε δ of e Uε δ . Guarantees on the approximation distances are also provided and the proof is provided in Appendix A.2.1.

Proposition 4 Firstly, the aﬃne D dimensional aﬃne space with D dim ker JΦ(x )

that best approximates e Uε for the Kolmogorov distance is given by Tx def = x + V D with

V D def = span v N D+1, . . . , v N . (13)

The Kolmogorov distance is given by δD( e Uε x ) = ε σ N D . Secondly, for any δ > 0, we deﬁne the set

f Mε δ def = n x Tx ; Σ V ,T (x x ) 2 2 ε2 and x x 2 δ o . (14)

It satisﬁes the following equality:

d H( f Mε δ, e Uε δ ) = min ε σ N D , δ . (15)

This linearized approximate set f Mε δ was already described in Fern andez-Mart ınez and Fern andez-Mu niz (2020). The bounds in Proposition 4 are inversely proportional to the (N D)th singular value of the Jacobian of Φ at x . As such, the linearized uncertainty region e Uε δ can be well approximated by a low-dimensional (i.e., D small) manifold f Mε δ when the singular spectrum of the Jacobian JΦ(x ) admits few small singular values. A general discussion related to this is provided in Section 2.2.4 (see also Fig. 6).

2.1.2 Lowest Singular Vectors Computation

The results above show that the singular values and singular vectors of the Jacobian J = JΦ(x ) play a critical role for the description of the set e Uε. To compute them, two operation are required: i) evaluating the Jacobian matrix or matrix-vector products with it and ii) computing the singular value decomposition.

Jacobian computation Computing matrix-vector products with the Jacobian can be achieved eﬃciently using automatic diﬀerentiation algorithms. In Pytorch for instance, this can be achieved through the torch.jvp and torch.vjp functions. If the dimension N or M is suﬃciently small, then the whole Jacobian matrix J can be evaluated and stored. If the dimensions are too large, then automatic diﬀerentation algorithms make it possible to compute left and right matrix-vector products, i.e. products of the form J v or u T J for arbitrary directions u RM and v RN.

Singular vectors computation The computation of singular pairs of J or the eigenpairs of J ,T J are two equivalent problems. The latter is known as a symmetric eigenvalue problem (Parlett, 1998). Several algorithms, called eigensolvers (Liu et al., 2020), have been developed for this purpose. If the matrix is small enough, traditional decomposition methods based on the QR algorithm can be used.

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

In high dimension, we adopted the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method (Knyazev, 2001; Duersch et al., 2018). It is matrix-free: it only requires matrix-vector products and does not process the entire matrix. The computation is achieved by optimizing the generalized Rayleigh quotient. For X RN D, this quotient is deﬁned as f(X) = Tr XT X 1 (J X)T J X . (16)

As such, ﬁnding the D leftmost singular vectors of J consists in ﬁnding the minimizer of

argmin X St(N,D) f(X),

where St(N, D) def = X RN D ; XT X = Id D is the set of N D orthonormal matrices also called the Stiefel manifold. The LOBPCG algorithm can be interpreted as a manifold gradient descent (Absil et al., 2008) with a carefully designed step-size based on the previous iterates.

2.2 Nonlinear Approximation

We now turn to the approximation of the sublevel sets of F without resorting to a quadratic approximation: we want to ﬁnd a D-dimensional manifold Mε δ that approximates Uε δ well.

2.2.1 The Jackpot Manifold

The linear subspace Tx = x + V D can be a poor approximation of Uε when Φ is nonlinear. However, it is a natural tangent space for the set Mε δ at x . The idea behind our algorithm consists in bending this tangent space, to better ﬁt the uncertainty region Uε δ . A similar idea was proposed by Rheinboldt (1996) for polynomials. However, the algorithm was limited to three dimensions and could not visualize the uncertainty region, as it was constrained to the case where ε = 0. In mathematical terms, we want to ﬁnd a one-to-one map γ that sends points of Tx onto Uε. This can be achieved thanks to the following optimization problem:

γ(z) def = argmin-loc V ,T D (x x )=z

1 2 Φ(x) Φ(x ) 2 2 , (Pz)

where V D def = [v N D+1, , v N 1, v N] RN D. (17)

The constraint V ,T D (x x ) = z means that the projection of γ(z) on the tangent space Tx has coordinate z. It ensures the injectivity of the map γ. The argmin-loc notation refers to a local minimizer near x . The main tools appearing in this description are depicted in Fig. 3.

A simple illustration To illustrate this principle, we consider the mapping Φ(x) = 1

2 x 2 2 and set x = e1, where (e1, . . . , e N) is the canonical basis. It is a basic example from a broader context named algebraic implicit curves parameterization. For this speciﬁc function, the set U0 is a sphere of radius 1 centered at 0. The tangent space is the hyperplane orthogonal to e1 passing through x . Simple computation yields γ(z) = p

1 z 2 2e1 +PN n=2 znen. We see

N. Munier, E. Soubies and P. Weiss

Figure 3: Summary of Jackpot (JACobian Kernel Projection by Op Timization). In this example, we set Φ(x) = x 2 2, ε = 0 and x = e1. In this setting, the uncertainty region Uε is just a unit sphere, and the aﬃne space Tx is the tangent space to Uε at x . The uncertainty region is obtained by projecting the tangent plane onto Uε.

that the proposed parameterization allows us to recover the half sphere, which is illustrated in Fig. 3. This example also shows the importance of considering local minimizers, since the problem (Pz) admits two global minimizers. We provide a few extra toy examples in Fig. 5.

Theorem 5 (Well-deﬁnedness of the parameterization γ) Let Φ : RN RM be a C1

map, x RN, and D dim ker JΦ(x ) denote an integer. Then there exists a neighborhood V RD such that (Pz) admits a unique solution γ(z) for z V. Moreover, the mapping γ : V RN veriﬁes γ(0D) = x and it has the following properties

If Φ is of class C2, then γ is of class C1.

If Φ is of class C1 and JΦ is locally Lipschitz and deﬁnable, then γ is a locally Lipschitz deﬁnable function.

Proof The proof is provided in Appendix A.3.1

Remark 6 The second condition in Theorem 5 is weaker than the ﬁrst one and allows us to handle a larger class of operators. The notion of deﬁnable function (Van den Dries, 1998; Coste, 2000) generalizes the notion of (implicitly deﬁned) semi-algebraic function. It encompasses all functions commonly considered in engineering. We refer the reader to (Bolte et al., 2021, Appendix A.2) for a precise deﬁnition.

Main deﬁnition The mapping γ provides a good candidate to approximate the uncertainty region Uε δ . It writes as follows.

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

Deﬁnition 7 (The Jackpot manifold ) For δ > 0 suﬃciently small, we deﬁne the set

Mε δ def = γ(z) ; z RD, Φ(γ(z)) Φ(x ) 2 ε and γ(z) x 2 δ . (18)

and we refer to γ as the parameterization of Mε δ.

From Theorem 5, this set is well deﬁned whenever Φ is of class C1. Moreover, the Jackpot set recovers the local structural non-identiﬁabilities (i.e., U0 δ ).

Theorem 8 Let Φ be of class C1 and D = dim ker JΦ(x ) > 0. Moreover assume that dim ker JΦ(x) remains constant in a neighborhood of x . Then there exists δ > 0, such that Jackpot recovers the uncertainty region locally, that is M0 δ = U0 δ .

All the preceding results hold under the assumption that Φ is of class C1. In this case, the set Mε δ is well-deﬁned, but establishing whether it admits a manifold structure is nontrivial and remains an open problem. However, when Φ is of class C2, one can additionally prove that Mε δ indeed has a smooth manifold structure.

Theorem 9 Let Φ be of class C2 and D dim ker JΦ(x ). Then Mε δ deﬁnes a manifold.

The proofs of Theorem 8 and 9 are provided in Appendix A.3.2.

2.2.2 Approximation Guarantees

In this section, we focus on the case Φ of class C2. We show in Theorem 12 that Mε δ provides a good approximation of Uε δ .

Assumption 10 (A simple approximation condition) Let

Sz = n x Uε δ , V ,T D (x x ) = z o

denote the slice of Uε δ which has coordinate z on Tx . We assume that there exists η > 0 such that sup z RD γ(z) x 2 δ

diam(Sz) η. (19)

This assumption is illustrated in Fig. 4. Notice that the value η > 0 can be large even if the set is thin, when the set is bent a lot with respect to Tx . For a C2 mapping Φ, the thickness η can be controlled by regularity properties as shown in the following proposition, the proof of which is deferred to Appendix A.3.2.

Proposition 11 (Uniformly bounded curvature) Assume that Φ is of class C1 and that D dim ker(JΦ(x )) = N R . Assume moreover that its Jacobian JΦ is L-Lipschitz over the ball B(x , δ), i.e.

x, x B(x , δ), JΦ(x) JΦ(x ) op L x x 2 (20)

N. Munier, E. Soubies and P. Weiss

Figure 4: An illustration of Assumption 10. The slices Sz of Uε δ have a diameter uniformly bounded by η.

where op is the operator norm. Finally, suppose that the Lipschitz constant L satisﬁes

where σ N D is the (N D)-th singular value of JΦ(x ). Then Assumption 10 holds with

η = 2ε σ N D Lδ. (21)

From Proposition 11, we get that for a twice diﬀerentiable mapping Φ, Assumption 10 is always satisﬁed for suﬃciently small δ (i.e., suﬃciently locally). We are now in a position to show how well Mε δ approximates Uε δ in terms of the Hausdorﬀdistance deﬁned in (2). The proof is provided in Appendix A.3.2.

Theorem 12 ( Approximation guarantees) Let Φ be of class C2 and D dim ker JΦ(x ). Then, under Assumption 10, Mε δ is a good approximation manifold of Uε δ in the sense that:

d H(Mε δ, Uε δ ) η, (22)

where d H is deﬁned in (2). More precisely, we have

Mε δ Uε δ Mε δ + B(0, η). (23)

Remark 13 A few remarks are in order:

In the linear case, since the Hessian is null, any radius δ > 0 suits as long as the dimension D satisﬁes D dim ker JΦ(x ).

An upper-bound on the Hessian norm HΦ 2,δ is

sup x B(x ,δ)

(σmax (HΦm(x)))m M 2

with Φm being the m-th component of Φ.

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

2.2.3 Numerical Computation

From a numerical perspective, we have to discretize the manifold Mε δ. This can be done by evaluating (Pz) on a discrete subset Z = {zk}0 k K of RD. We let Mε δ(Z) denote the discretized manifold:

Mε δ(Z) def = Mε δ {γ(z); z Z} . (24)

The main problem now reduces to evaluating γ for grid points Z satisfying the constraints. The principle is to propagate a front starting from z = 0D with a grid-search algorithm and to solve (Pz) at each z. A pseudo-code is given in Algorithm 1. We detail the two steps below.

Grid search In this work, we simply discretize RD with a Cartesian grid. Other choices such as the honeycomb pattern are possible, but have not been explored. Fixing a grid length s > 0, we deﬁne the following Cartesian grid

Zs def = s ZD B (0D, δ) = s δ

, . . . , δ

Since the initial point γ(0) = x is known, the grid search should start with the point 0D Z. Solving (Pz) can be achieved with ﬁrst order methods, which require a starting point. Since γ is a smooth mapping, a good approximation of γ(z) is γ(z ) for some z where γ has already been computed. This leads to consider a grid search method starting from 0D Z and passing from neighbor to neighbor. The Breadth-First Search (BFS) algorithm is well suited for this task. We add a graph structure G = (Z, E) (where E is the set of edges) on the grid in order to apply this algorithm. Given this structure, the natural choice of initial guess for the optimization at a non already-explored node is thus given by the closest already computed neighbor. BFS method also makes it possible to stop the search in the neighborhood of a vertex whenever one of the two criteria deﬁning (24) are violated.

Optimization algorithm Each step of Algorithm 1 requires solving the optimization problem (Pz). Obtaining the following result is based on inserting the linear constraint in the objective function. Its proof is provided in Appendix A.4.1.

Proposition 14 Let Π def = Id N V DV ,T D be the orthogonal projection on the subspace (Im V D) and consider the following optimization problem for z RD

η (z) def = argmin-loc x RN 1 2 Φ(x + V Dz + Π x) Φ(x ) 2 2 . (P z)

Then there exists a local neighbor V RD of 0D such that the solution γ of (Pz) veriﬁes for z V, γ(z) = x + V Dz + Π η (z). (26)

Moreover, if Φ is of class C2, a gradient descent algorithm applied to (P z) converges linearly provided the step size is properly chosen (Polyak, 1987).

N. Munier, E. Soubies and P. Weiss

Algorithm 1 BFS parameterization of Mε δ(Z)

Inputs of the model:

Φ : RN RM a C1 mapping. x RN an estimation of input parameter. 1 D < N the manifold dimension. V D RN D the D lowest right singular vectors of JΦ(x ). Inputs of the paramaterization:

δ > 0 a radius around x (to deal with x x 2 δ). ε > 0 a discrepancy threshold (to deal with Φ(x) Φ(x ) 2 ε). G = (Z, E) a connected graph with 0D Z RD. Initialization: η (0D) 0N. γ(0D) x . repeat

(z, zprev) the couple (non-evaluated, evaluated) neighboring vertices with smallest distance. η (z) solution of (P z) initialized at η (zprev) with an L-BFGS method. γ(z) x + V Dz + (Id N V DV ,T D )η (z). if Φ(γ(z)) Φ(x ) 2 >ε or γ(z) x 2 >δ then

Remove vertex z from Z and all its edges from E. end if until No couple (z, zprev) exists Return {γ(z)}z Z

To further accelerate the computation of γ(z) we propose to use the L-BFGS method on (P z), which is a quasi-Newton algorithm. It accelerates the convergence by using a preconditionning based on an estimation of the Hessian using the gradients at each step. It produces a sequence of iterates (γk(z)). A natural stopping criteria for this sequence is Φ(γk(z)) Φ(x ) 2 ν ε with ν < 1 a small constant. Moreover, we also deploy a warm-start strategy over the grid Z to further accelerate the convergence of L-BFGS, leveraging the smoothness of the manifold to obtain good initial points. We summarize the whole process and give a pseudo-code in Algorithm 1.

Approximation result One can extend the approximation result of the previous section to this discrete framework. The proof of the following corollary is provided in Appendix A.4.2.

Corollary 15 Under the same assumptions as in Theorem 12, and using a Cartesian grid Z = Zs deﬁned in (25), the discretized manifold provides a good approximation of the uncertainty region. Speciﬁcally, we have

d H(Mε δ(Z), Uε δ ) η + L

where L > 0 is the Lipschitz constant of the local solution γ to (Pz), d H is the Hausdorﬀ distance as deﬁned in Deﬁnition 2, and η > 0 is the threshold deﬁned in Theorem 12.

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

(b) Φ = UAU T

(c) Φ(x) = x 2 2

Figure 5: Illustration of the proposed Jackpot method with D = 1 on the same examples as in Fig. 1. The low dimensional manifold Mε δ is represented with blue lines. In contrast to the proﬁle likelihood method, the proﬁle is computed along the most suitable direction. In addition, the set Mε δ is the best possible one-dimensional approximation of the 2 dimensional ellipses in (a) and (b). In example (c), the approximating set Mε δ precisely coincides with the unit circle (i.e., Uε δ ), except on the extremities.

2.2.4 Discussion

Before turning to the numerical experiments, we describe the strengths and weaknesses of the approach in order to highlight the problems for which it is best suited.

Behavior on the toy examples To begin with, let us illustrate how the algorithm behaves on the three introductory examples of Fig. 1. We see that the algorithm outputs one dimensional sets (in blue), which correctly approximate the uncertainty regions in all three cases. The method suﬀers neither from the limitation of the choice of coordinates as were proﬁle likelihood methods, nor from the instability of Monte Carlo-based methods when the Jacobian of Φ is degenerate.

Adapting to large dimensions Since the LOBPCG algorithm is matrix-free and combined with an automatic diﬀerentiation algorithm, our algorithm is scalable to large model (up to millions of parameters). The approximation being of smallest dimension D, it can be represented quite easily at least for dimensions D = 1, 2, 3. Depending on the geometry of the problem, our method gives a full exploration of the uncertainty region or only a partial one as shown in Fig. 6.

When is it most useful? From our theoretical analysis in Section 2.2.2, one can see that the low-dimensional manifold Mε δ is a good approximation of Uε δ provided that D is suﬃcient large relatively to the number of low singular values of JΦ(x ). Yet, for large-scale problems, it is not possible to set D too large due to computational limitations. This leads us to classify problems in three categories depending on the Jacobian singular spectrum, as illustrated in Fig. 6.

When the Jacobian is well-conditioned ( ﬂat spectrum, Fig. 6.a), we have practical local identiﬁability and the Jackpot method will detect it.

When the Jacobian is badly-conditioned, we distinguish two cases.

N. Munier, E. Soubies and P. Weiss

Figure 6: Relation between Jacobian singular spectrum and manifold approximation of the uncertainty region Uε. (a) No small singular values: the problem is identiﬁable. (b) Small number of small singular values. (c) Large number of small singular values. The set Uε can not be approximated by a low dimensional manifold.

If the spectrum admits few low singular values with a sharp transition (Fig. 6.b), Uε δ can be well approximated by a low-dimensional manifold and the Jackpot method will compute it. Otherwise (e.g., Fig. 6.c), Uε δ cannot be well approximated by a low-dimensional manifold. In this situation, the manifold Mε δ computed by Jackpot (with small D) will only partially represent Uε δ .

Notice that in the last case, we cannot expect to sketch the uncertainty set eﬃciently due to its intrinsically high dimensional nature.

Stability The Jackpot algorithm depends heavily on the subspace spanned by the D lowest right singular vectors of the Jacobian JΦ(x ). This subspace deﬁnes the tangent space of the constructed manifold at x . The stability of this subspace to perturbations in JΦ(x ) depends on the spectral gap between the D-th and (D + 1)-th smallest singular values, that is, on the quantity σN D+1 σN D (Stewart and Sun, 1990). When this gap is large, the D-dimensional subspace is well-separated from the remaining directions, and small errors in JΦ(x ) (due, e.g., to numerical diﬀerentiation or stochastic estimation) lead to only small perturbations of the tangent space. Conversely, when the singular values around the D-th index are clustered or close to zero, the subspace becomes highly sensitive to errors in JΦ(x ). In this situation, which is not favorable, as explained in the previous paragraph, many manifolds are near equivalent in terms of approximation, and Jackpot will just select one of them.

Adversarial versus natural perturbations Our goal in this paper is to design a manifold that approximates Uε = Φ 1 (B (Φ(x ), ε)) in Hausdorﬀdistance. We call the resulting set an adversarial manifold. The directions (v N D+1, . . . , v N) represent the perturbation directions that most signiﬁcantly aﬀect the inverse map Φ 1. These directions also capture the local geometry of Uε most eﬀectively, as seen in Theorem 12. In contrast, Monte Carlo sampling methods serve a diﬀerent purpose: they simulate natural perturbations, which are typically observed in real-world scenarios. Given a random

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

vector Y = Φ(x) + B, where B follows some probability distribution µ, they capture the pushforward distribution Φ 1 (µ ( Φ(x))). Points derived from the Jackpot method, however, may possess very low likelihood and would almost never appear through Monte Carlo sampling. This distinction becomes clearer in our numerical experiments. Adversarial and natural perturbations oﬀer distinct advantages, depending on the nature of uncertainty one aims to describe. Adversarial perturbations (Antun et al., 2020) are critical in scenarios where incorrect decisions may lead to severe consequences, as they identify the worst-case distortions. On the other hand, natural perturbations provide an averaged view of uncertainty, reﬂecting more common or benign variations.

The limitations of locality Our approximation and our analysis is only local. For instance, in Fig. 3, only half of the circle can be parameterized by our method. Hence the method should only be used locally. Solutions such as a periodic reassessment of the tangent plane could be considered, but we did not explore this rather computationally heavy approach in this paper.

Link with sensitivity analysis The uncertainty region is an object closely related but diﬀerent from the sensitivity region, which can also be treated by a slightly modiﬁed Jackpot algorithm. Let S : RM RN be a reconstruction mapping from the measurement space RM

to signal space RN. We deﬁne the sensitivity region as

Sε def = S (B(Φ(x ), ε)) , (28)

which corresponds to the set of reconstructions obtained when perturbing the data around Φ(x ) with a noise of norm smaller than ε. In the idealized case where S is a perfect inverse , satisfying Φ(S(y)) = y for all y RM, it holds that Sε Uε. (29)

That is, the sensitivity region is a subset of the full uncertainty region, reﬂecting additional constraints imposed by the solver, such as regularization or prior information. It captures the residual non-identiﬁability under the reconstruction method. Both Uε and Sε provide complementary views of the estimation problem:

The set Uε quantiﬁes the fundamental ambiguity in the inverse problem, independent of any reconstruction algorithm.

The set Sε reﬂects the actual spread of outputs produced by a speciﬁc solver when facing data perturbations.

The direct extension of the Jackpot algorithm to sensitivity analysis consists in deﬁning the manifold: γ(z) = argmax-loc y C(z) S(y) S(y ) 2 2, (30)

where C(z) is an aﬃne subspace of dimension M D, centered at y + V Dz, and oriented along the orthogonal complement of the subspace spanned by the D dominant right singular vectors of the Jacobian JS(y ). More precisely,

C(z) = y + V Dz + ran(V D) , (31)

N. Munier, E. Soubies and P. Weiss

where V D = [v 1, v 2, . . . , v D] RM D is the matrix formed by the top D right singular vectors of JS(y ), and y = Φ(x ). However, the optimization problem (30) may not admit a solution in general, as it involves maximizing over an unbounded set. To overcome this issue, we introduce a compact alternative:

C(z) = n y RM : y y 2 = z 2, y y + cone(V Dz) + ran(V D) o , (32)

where cone(V Dz) = {t V Dz : t R+} is a half-line in direction V Dz. The resulting set C(z) is compact: it is a spherical cap (half-sphere) of radius z 2, centered at y , and intersected with an (M + 1 D)-dimensional subspace. We choose the cone rather than the full span to capture the possible asymmetry of the sensitivity region around y . For intuition, consider the case D = 1. Then the curve γ(z) corresponds to the direction of maximal sensitivity, i.e., the worst-case adversarial perturbation at ﬁxed radius. More precisely, we have:

{γ(z), γ( z)} argmax-loc y y 2= z 2 S(y) S(y ) 2 2. (33)

Up to this point, we have used the deﬁnition of C(z) in (32). This alternative compact version is equally suitable for both uncertainty and sensitivity analysis. For clarity of exposition, we chose to present the former; however, most of the theoretical analysis and practical conclusions apply to the latter as well.

3. Numerical Experiments

In this section, we provide a few numerical examples on complex problems and compare it with some alternatives in the main three cases of practical identiﬁability described in Section 1.1.

3.1 Measuring Masses in the Solar System

In this ﬁrst example, we address the problem of identifying the masses of ﬁve planets of the solar system (Jupiter, Saturn, Uranus, Neptun, and Pluto), from their approximate positions at a few time points. We assume that the planets are moving according to Newton s law of gravitation. This corresponds to the problem described in Section 1.1.1.

Notation We sort the planets with respect to their distance to the Sun. We let 1 n N denote the index of the n-th outer planet and associate the index 0 to the Sun. We let un(t) R3 denote the positions in AU of the n-th planet at time t, in an heliocentric frame. Therefore u0(t) = 0 for all time. We let wn denote the weight of the n-th planet in the solar mass unit, that is w0 = 1. Moreover, we let u(t) = (u1(t), . . . , u N(t)) RN 3 denote the vector of positions and w = (w1, . . . , w N) denote the vector of masses. The time is expressed in years. The time t = 0 corresponds to January 1, 2000. Initial positions and speeds are taken from IMCCE (2019).

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

-0.61 -0.52 -0.54 0.27 -0.08

-0.42 -0.02 0.73 0.52 0.12

-0.68 0.56 -0.02 -0.47 -0.11

0.09 0.63 -0.4 0.65 -0.01

-0.07 0.02 -0.14 -0.1 0.98

(a) planet trajectories (250y), measurements (5y) (b) Jacobian singular values

(c) Jacobian singular vectors (d) Misﬁt evolution (e) Masses evolution

Figure 7: Solar system experiment with Jackpot. After measuring the approximate planet s positions for 5 years, we wish to recover their masses. The spectrum of the Jacobian (b) indicates that a direction is particularly uncertain. Looking at the Jacobian singular vectors (c), we see that it corresponds to Pluto s mass. In (d) and (e), we see the evolution of the misﬁt and of the masses along the 1D set Mε δ, and observe that all masses are stable (relative to their mass), except Pluto s which can vary by 4 orders of magnitude.

Modeling The solar system can be modeled through the second order dynamical system:

un(t) def =

un(t) un (t) 3 2 (un(t) un (t)), 1 n N. (34)

To discretize this system, we use a Runge-Kutta scheme of order 4 (Butcher, 2016). This yields a forward mapping Φ of the form:

Φ : w 7 (u(tk))0 k K, (35)

where tk = k t denote uniformly spaced sampling points. The measurements are taken on a period of 5 years within intervals of t = 7 days, leading to K = 260 positions for each planet. Notice that 5 Earth-years do not cover a complete revolution for the outer planets. For instance, Pluto s revolution lasts about 250 years. This is illustrated on Fig. 7 (a), where the black lines illustrate the measurements. We add white Gaussian noise to the measurements with a standard deviation ε = 1000 km ( 7 10 6 AU, where AU is the Earth-Sun distance). This yields a measurements vector y = Φ( w) + b, where w denotes the true planets weights and b is the random perturbation.

N. Munier, E. Soubies and P. Weiss

10 10 10 8 10 6 10 4 10 2

Mass of Jupiter

|| (x) y|| in AU

10 10 10 8 10 6 10 4 10 2

Mass of Saturn

10 10 10 8 10 6 10 4 10 2

Mass of Uranus

10 10 10 8 10 6 10 4 10 2

Mass of Neptun

10 10 10 8 10 6 10 4 10 2

Mass of Pluto

Figure 8: Proﬁle likelihood on the solar system problem. The blue stars represent the true values of the parameters. Green segments represent conﬁdent interval. Black dashed lines represent the error threshold ε. Proﬁle likelihood detects the Pluto s mass non-identiﬁability.

Uncertainty with Jackpot To estimate the masses w, we ﬁrst solve the nonlinear problem

w = argmin w RN 1 2 Φ(w) y 2 2 (36)

using a gradient descent to high accuracy. The gradient is evaluated through automatic diﬀerentiation. This gives a fairly good estimate of Pluto s mass, overestimating the true estimate by 4%. Then, we run algorithm 1. The singular values of the Jacobian of Φ are displayed in Fig. 7 b). As can be seen, the last singular value is nearly ten times smaller than the others. Taking a close look at the singular vectors in Fig. 7 c), shows that the last singular vector is entirely concentrated on Pluto s mass, indicating that the estimation of this planet s mass is dubious. We then compute a D=1-dimensional manifold approximation Mε δ of the uncertainty region with the Jackpot method. As can be seen in Fig. 7, we can vary the range of Pluto s mass within the range [10 10, 10 6], while staying consistent with the data. In that range, the other planets masses nearly do not change (at least relative to their weight). This illustrates a huge uncertainty on the mass of Pluto when evaluating it with 5 years of planetary observations only.

Comparisons with other approaches In Fig. 8 we use proﬁle likelihood on this problem to estimate the uncertainty. This problem is particularly well suited to this approach, since only Pluto s mass is uncertain. In this case, we see that all proﬁles are concentrated on the true planets mass, apart for Pluto. There, the proﬁle likelihood method recovers the same uncertainty interval as the one provided by Jackpot. The only diﬀerence is that Jackpot was able to automatically ﬁnd this direction, without having to explore all of them. Also notice that proﬁle-likelihood methods would likely bring partial information on the uncertainty if Pluto was discarded from the set. Indeed, the top-left 4x4 block in Fig. 7 c) indicates strong correlation eﬀects between the diﬀerent weights. We do not show the results of the Monte Carlo sampling method since it fails for this example. Indeed, the sampling process only provides points that are very close to x , and the method fails to recover the uncertainty set. This phenomenon is similar to the one observed in Fig. 2 (c): the regression problem to recover the samples xi is very ﬂat along the direction of Pluto s mass.

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

3.2 Blind Deblurring

In this section, we explore the problem described in Section 1.1.2. Observing a biological object with a microscope yields diﬀraction limited (i.e. blurry) images. When the impulse response, or Point Spread Function (PSF) of the system is unknown, improving the image sharpness requires solving a blind deblurring problem. In this section, we showcase how Jackpot makes it possible to evaluate uncertainty on the recovered blur kernels and images.

Modeling We consider the classical image formation model

y = k(θ) x + b, (37)

where x RM denotes the observed sample, denotes the convolution product, b RM is a noise vector, and k : RN RM represents the point spread function (PSF) of the system. It depends on parameters θ RN. The scalar theory of diﬀraction (Goodman, 2005), informs us that the PSF k(θ) can be characterized by its pupil function; a two-dimensional function supported on a disk whose radius depends on the numerical aperture of the objective and the wavelength of the collected light. A common practice is to expand this pupil function on a truncated Zernike polynomial basis (Goodman, 2005). With this model, only a few Zernike polynomials are suﬃcient to describe a rich class of realistic PSF that includes typical aberrations. Hence, in our model (37), θ RN denotes the vector of Zernike coeﬃcients. We ﬁx N = 8 in this work, corresponding to the following optical aberrations: defocus, vertical and oblique astigmatism, trefoil and coma and primary spherical. The blind inverse problem consists in estimating both θ and x from y. To that end, we follow the approach proposed by Gossard and Weiss (2024). We ﬁrst train a reconstruction mapping R(θ, y) which computes an estimate ˆx(θ) such that the pair (θ, ˆx(θ)) is coherent with the data y. In this experiment, R is a deep plug-and-play image restoration method (DPIR). It corresponds to a half-quadratic splitting algorithm where the proximal step is replaced by a DRUNet denoiser with pre-trained weights (Zhang et al., 2021). Equipped with this (non-blind) reconstruction mapping R, we want to ﬁnd the pair (θ, ˆx(θ)) that best ﬁts the data y. That is, we want to minimize k(θ) R(θ, y) y 2 2. Using a Bayesian formalism, this means that we are trying to ﬁnd the maximum a posteriori pair (ˆθ, ˆx(ˆθ)). This blind deblurring problem can be cast in our framework by using the following forward mapping Φ : θ 7 k(θ) R(θ, y). (38)

In the ﬁrst row of Fig. 9 we display the input image (a) as well as the noiseless measurements (b) we generated for this experiment.

Uncertainty with Jackpot Given y, we ﬁrst estimate the parameters θ by solving

θ = argmin θ RN 1 2 Φ(θ) y 2 2 (39)

using a gradient descent with automatic diﬀerentiation. In Fig. 9 (c) and (d) we show the associated PSF k(θ ) and deconvolved image ˆx(θ ) = R(θ , y). Note that the image seems signiﬁcantly better resolved than the observation, which suggests that reconstruction mapping R is constructed carefully.

N. Munier, E. Soubies and P. Weiss

Then, we run Algorithm 1. The singular values of the Jacobian of Φ and the two last singular vectors are displayed in Fig. 9 (e) and (f), respectively. As opposed to the previous experiment, we see no clear gap in the amplitude of the last singular values. The singular vectors are clearly mixing the diﬀerent Zernike coeﬃcients, suggesting that the proﬁle likelihood methods will struggle indicating the main directions of uncertainty. We then compute a D=2-dimensional manifold approximation Mε δ of the uncertainty region with the Jackpot method. We obtain a parameterization of the Zernike coeﬃcients vector θ(z) for z R2 with θ(0) = θ . In Fig. 9 (g), we report the SNR between the outputs Φ(θ(z)) and Φ(θ ). The level line of 40d Bs is displayed in green. In Fig. 11 we show how the proﬁle likelihood behaves on this problem. The conﬁdence intervals obtained with the proﬁle likelihood method are also displayed as blue segments in Fig. 9 (g). They have been projected on the tangent plane Tθ for comparison. As can be seen from this plot, the proﬁle likelihood method signiﬁcantly underestimates the uncertainty domain, since the blue segments tips are far from the level line boundaries. In Fig. 10, we provide some zoomed regions of the deblurred images in (a) and of the corresponding PSF in (b) for diﬀerent parameters θ(z) Mε δ. The corresponding parameters z are shown as red stars in Fig. 9 (g). We see that it is possible to signiﬁcantly deform the PSFs with a negligible modiﬁcation of the measurements. We can also observe on Fig.s 10 (a) and (b) that some structures of the sample can be hallucinated without aﬀecting the measurements too much. Overall, this shows that the studied blind deblurring process is unstable, though it produces sharp and nice looking images for all parameters. Finally, let us mention that the Monte Carlo sampling method strongly underestimates uncertainty regions as illustrated in Fig. 12. The reason is related to the fact that we wish to recover a relatively low dimensional manifold (less than 8), but add noise on a high dimensional space of color images (200 200 3).

3.3 Posterior Exploration for an Image Deblurring Problem

In contrast to the previous section, we tackle regular inverse problems where the forward operator A is known perfectly.

Formalism We explore the problem described in Section 1.1.3. Given y = Ax + b, with b N(0, σ2Id), we construct the MAP estimator

ˆx MAP def = argmin x RN log px|y(x|y)

= argmin x RN 1 2σ2 Ax y 2 2 log px(x).

By ﬁrst order optimality conditions, ˆx MAP satisﬁes

log px|y (ˆx MAP|y) = 1

σ2 AT (Aˆx MAP y) log px(ˆx MAP) = 0. (40)

The term log px(ˆx MAP) can be estimated using a learned denoiser and Tweedie formula (Ho et al., 2020). In this paper, we use the denoiser proposed by Hurault et al. (2021), which enables to compute both the log likelihood and its gradient.

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

(a) Input image x

(b) Measurements y

(c) PSF k(θ )

(d) Estimate ˆx(θ )

(e) Jacobian singular values

5 10 Zernike index

(f) Singular vectors

0.100 0.075 0.050 0.0250.000 0.025 0.050 0.075 0.100 z1

(g) Output SNR in d B

Figure 9: Blind delurring experiment. The reconstructed image ˆx(θ ) = R(θ , y) in (d) seems signiﬁcantly sharper than the observation y. There is no clear gap for the lowest singular values in (e), suggesting that the uncertainty domain has a rather large intrinsic dimension. In (f), the Zernike coeﬃcients of the two last singular vectors of J are mixed, meaning that the main uncertainty directions are not aligned with pure optical abberations. In (g), we evaluated the function SNR(Φ(θ), Φ(θ )) for θ Mε δ. The level line of 40d B, corresponding to a high ﬁdelity is displayed as a green curve. We also projected the conﬁdence intervals obtained with the proﬁle likelihood method as blue segments.

(a) Zoom on blue and green regions in Fig. 9 (a)

(b) PSF k(θ(z))

Figure 10: Blind deblurring identiﬁability - Grid sampling of the 2-dimensional manifold Mε δ. (a-b) Grid sampling of image diﬀerences |ˆx(θ(z)) x |. (c) PSF grid-sampling k(θ(z)).

Following Section 1.1.3, we set Φ = log px|y ( |y). The uncertainty region Uε δ now can be interpreted as the set of images with smallest gradient amplitude. We compute ˆx MAP using a gradient descent run for many iterations to reach a near 0 gradient.

N. Munier, E. Soubies and P. Weiss

0.1 0.0 0.1 Zernike coefficient z4

Output discrepancy in d B

0.1 0.0 0.1 Zernike coefficient z5

0.1 0.0 0.1 Zernike coefficient z6

0.1 0.0 0.1 Zernike coefficient z7

0.1 0.0 0.1 Zernike coefficient z8

0.1 0.0 0.1 Zernike coefficient z9

0.1 0.0 0.1 Zernike coefficient z10

0.1 0.0 0.1 Zernike coefficient z11

Figure 11: Proﬁle likelihood on blind inverse problem. The blue stars represent the true values of the parameters. The green segments represent conﬁdence intervals. The two ﬁrst coeﬃcients seem less identiﬁable than the other from this analysis.

(a) Monte Carlo sampling

(b) Jackpot

Figure 12: Violinplots of the Zernike coeﬃcients obtained using a Monte Carlo sampling with 1000 points (left) and the Jackpot algorithm (right) for the blind deblurring experiment. Observe that the amplitudes obtained with Monte Carlo sampling are about 103 times lower than those obtained with Jackpot, illustrating the large discrepancy between natural and adversarial perturbations.

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

(b) PSF spectrum

(c) Kernel element

(d) Reference

(e) Blurry image y

Figure 13: The deblurring problem. To illustrate Jackpot, we designed a speciﬁc PSF with a highly oscillatory 2D kernel (null-space). An element is shown in (c).

(a) Recovered x

(b) Jackpot v N

(c) Jackpot v N 1

(d) Singular values

(e) The last 5 leftmost eigenvectors of J for the deblurring inverse problem.

Figure 14: Recovered image x , 2 images recovered by Jackpot, the singular spectrum and 5 leftmost eigenpairs of J .

Jackpot results We treat the 200 200 color image shown on Fig. 13d. The forward operator A is deﬁned as a convolution with the PSF shown on Fig. 13a. Its discrete Fourier transform is shown on Fig. 13b. It possesses only two zeros (green crosses). Considering the Fourier transform symmetries for real signals, it means that the kernel is 2D for each color channel and therefore dim(ker(A)) = 6. This deblurring problem can be considered as mildly ill-posed. The resulting blurry image and recovered images are shown on Fig. 13e and 14a respectively. The ﬁve last singular values of the Jacobian matrix J are displayed in Fig. 14d. Surprisingly, they all possess roughly the same amplitude. We could expect that the only directions of uncertainty are unlocalized patterns such as the one in Fig. 13c. Looking closer at the associated singular vectors in Fig. 14e, we see that they all consist of localized and highly oscillatory patterns. The reason why the uncertainty patterns can be localized is that we do not constrain Ax = y, but just promote it through the data term 1 2σ2 Ax y 2 2, leaving space for more uncertainty. In this examples, proﬁle-likelihood methods are unusable due to the too large number of parameters (3 200 200). We propose a comparison with Monte Carlo sampling in Fig. 15.

N. Munier, E. Soubies and P. Weiss

(a) Monte Carlo sampling

(b) Jackpot with D = 2

(c) Jackpot with D = 10

Figure 15: Comparison between Monte Carlo sampling and Jackpot for image deblurring. The pixel-wise maximum distance with x is displayed for 1000 Monte Carlo points in (a) and for 20 Jackpot points and D = 2 in (b) and 100 Jackpot points and D = 10 in (c). Notice the diﬀerent amplitude ranges.

After drawing 1000 samples, we display maximal pixel-wise distance to x both for the Monte Carlo and the Jackpot methods. As can be seen, the Monte Carlo method signiﬁcantly underestimates the size of the uncertainty region. This again illustrates the diﬀerence between the eﬀect of natural (Monte Carlo) and adversarial (Jackpot) perturbations.

Acknowledgments

This work was supported by the ANR Micro-Blind (grant ANR-21-CE48-0008), the University Research School EUR-MINT (ANR-18-EURE-0023). This work was performed using HPC resources from GENCI-IDRIS (Grant 2021-AD011012210R2). Photo credit: David Villa Science Image, CBI, CNRS.

Appendix A. Proofs

A.1 Deﬁnition and Characterization of Manifolds

Deﬁnition 16 Let x be a point of an open set U of RD. A map f : U RN is said to be an immersion at x if f is of class C1, and its Jacobian Jf(x) is injective. If this is true for all x U, the map f is said to be an immersion.

Deﬁnition 17 (Local deﬁnition by image of immersion, Lee, 2012) Let M be a subset of RN. The set M is said to be a manifold of dimension D if, for all x M, there exist

a neighborhood U of 0D in RD

a neighborhood V of x in RN

f : U RN, an immersion at 0D such that f(0D) = x

such that f : U V M is an homeomorphism.

The proposed Jackpot builds an approximation Mε δ (see Deﬁnition 7) of Uε δ through the parametrization γ : RD RN in (Pz). According to Deﬁnition 17, to prove that Mε δ presents a manifold structure, we require the following two ingredients :

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

γ is of class C1 and Jγ(0D) is injective (i.e., γ is an immersion at 0D), such that γ(0D) = x (true by deﬁnition).

there exist neighborhoods V of x in RN and U of 0D in RD such that γ : U V M is an homeomorphism.

Actually, the second point is implied by the ﬁrst one as stated by Proposition 18 below. Consequently, the proof of the manifold structure of Mε δ in Theorem 12 reduces to proving only the ﬁrst point above.

Proposition 18 (Injective Jacobian implies a local structure of manifold) Let f : RD RN be a function of class C1. Assume that Jf(0D) is injective. Then there exists a neighborhood U RD of 0D such that f(U) is a manifold.

Proof First of all, Jf is locally injective near 0D because of the lower semi-continuity of the rank function. Indeed, the set of points on which Jf is injective can be reformulated as the inverse image of the open set R under the continuous map det(JT f Jf) and thus is open. From rank theorem (Lee, 2012, Thm. 5.13), up to diﬀeomorphisms, f is locally equal to its Jacobian which is locally injective, so is f. We actually get that f is locally an immersion. From (Lee, 2012, Lem. 5.34), since f is locally an immersion, f is also locally an embedding on a neighborhood U. Finally from (Lee, 2012, Thm. 5.31), f(U) is an embedded submanifold as the image of a smooth embedding, which concludes the proof.

A.2 Linear Approximation

A.2.1 Proof of Proposition 4

Deﬁning the ellipsoid E def = {x RN ; Σ V ,T (x x ) 2 2 ε2} we can reformulate

e Uε δ = E B(x , δ) and f Mε δ = E B(x , δ) Tx . (41)

As f Mε δ = e Uε δ Tx e Uε δ , the Hausdorﬀdistance is given by d H( f Mε δ, e Uε δ ) = supx e Uε δ d(x, f Mε δ). Without considering the constraint x B(x , δ), we get the same for E and E Tx

d H(E Tx , E) = sup x E d(x, E Tx ). (42)

Now, we can consider the following equivalent optimization problem

(Id N ΠV D)(x x ) 2

as d(x, E Tx )2 = (Id N ΠV D)(x x ) 2

2 where V D is deﬁned in Equation 13. The associated Lagrangian is given by

L(λ, x) = 1

(Id N ΠV D)(x x ) 2

Σ V ,T (x x ) 2 2 ε2 . (44)

N. Munier, E. Soubies and P. Weiss

And its gradient is

PN n=1 s2 n(σ n)2 ε2 PN D n=1 snv n λ PN n=1(σ n)2snv n

where we denote the coordinates x x = PN n=1 snv n. The optimum is achieved where the gradient vanishes, that is

n=1 s2 n(σ n)2 = ε2 ; sn = 0, n N D + 1 sn(1 + λσ n) = 0, n N D .

From this, only one sn is non zero. Among the N D local optima, only the following is global as long as σ N D < σ N D 1 (otherwise, it is still a global optimum but not unique)

sn = 0, n = N D ; s N D = ε σ N D and λ = 1 σ N D . (46)

Finally let s add the constraint x x 2 δ. If δ > ε σ N D , then the supremum is achieved at x + ε σ N v N D. Otherwise, δ < ε σ N D and the supremum is achieved at x + δv N D. In both cases, the Hausdorﬀdistance veriﬁes Equation 15.

A.3 Nonlinear Approximation

All proof are provided in a more general setting with the parametrization

γ(z) def = argmin-loc A(x x )=z

1 2 Φ(x) Φ(x ) 2 2 , (PA)

where A RD N with 1 D N satisﬁes the following assumption.

Assumption 19 The matrix A veriﬁes AAT = Id D and the matrix Πim AT + Πker A JΦ(x )T JΦ(x ) is invertible. We denote the projections operators on the image and kernel

of A as Πim AT def = AT A and Πker A def = Id N AT A respectively.

The results provided in the main text are speciﬁc instances of the results reported below with A = V ,T D for which Assumption 19 holds.

Lemma 20 Assumption 19 holds true for A = V ,T D .

Proof With A = V ,T D , the matrix

Πim AT + Πker A JΦ(x )T JΦ(x ) = V D Diag σ2 1, , σ2 N D, 1, , 1 V ,T D

is invertible by construction of V ,T D .

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

A.3.1 Proof of Theorem 5

Theorem 5 is a speciﬁc instance of Theorem 21 below with A = V ,T D .

Theorem 21 Let Φ : RN RM be a C1 map, x RN, and A RD N satisfying Assumption 19. Then, there exists an open neighborhood V RD such that (PA) admits a unique solution γ : V RN verifying γ(0D) = x . Moreover, it has the following properties

If Φ is of class C2, then γ is of class C1. Moreover, its Jacobian is given by

Jγ(z) = Πim AT + Πker A 2F(γ(z)) 1 AT . (47)

If Φ is of class C1 and JΦ is locally Lipschitz and deﬁnable, then γ is a locally Lipschitz deﬁnable function. Moreover, we can express a conservative Jacobian for γ as

Jγ : z n [Πim AT + Πker A B] 1 AT ; B J F (γ(z)) o . (48)

where F(x) = 1

2 Φ(x) Φ(x ) 2 2 denotes the objective function of (PA).

Proof Starting from the Karush-Kuhn-Tucker (KKT) conditions of Problem (PA), the idea of the proof is to apply an implicit function theorem on the dual formulation to get existence of solutions for both on primal and dual problems. We moreover derive an explicit expression of the (conservative) Jacobian of γ. From constraint optimization theory, the Lagrangian function associated to (PA) reads

Lz(λ, x) def = F(x) + λT (A(x x ) z) .

where λ RD is the vector of Lagrange multipliers and F(x) = 1

2 Φ(x) Φ(x ) 2 2 is the objective function. As Φ is of class C1, and given that the constraint is linear, the KKT conditions reduce to

Lz(λ, x) = A(x x ) z F(x) + AT λ

for z RD. In particular, multiplying the second line of Lz by the matrix A yields that λ(z) = A F(x) since AAT = Id D. Hence, ﬁnding a solution (λ, x) of the KKT conditions of (PA) is equivalent to solving the implicit system

G(z, x) def = A(x x ) z (Id N AT A) F(x)

Let s notice that G has values in RD+N but its image spans only the N dimensional subspace RD Ker A. Indeed, the matrix (Id N AT A) is the projection onto the subspace ker A = (im AT ) . Thus one can reduce the output dimension of G to make it surjective while keeping the same zeros level sets. This is done by left multiplying with [AT , Id N] which embeds the ﬁrst term A(x x ) z of G onto im AT while keeping the second term in (im AT ) . One obtains

H(z, x) def = AT (A(x x ) z) + (Id N AT A) F(x).

N. Munier, E. Soubies and P. Weiss

Our problem is now equivalent to the following implicit equation

H(z, γ(z)) = 0N, (49)

for z RD and with H : RD RN RN deﬁned before. Given that a solution of Equation 49 veriﬁes the KKT conditions, it thus is also a solution of the primal equation (PA). We now use two forms of implicit function theorem to get existence of a solution of Equation 49 and thus existence of a mapping γ for (PA).

C2 case: From implicit function theorem (Lee, 2012, Thm. 5.15), the only assumption required is the invertibility condition on the second sub-matrix D2H(0D, x ) of the diﬀerential DH = [D1H, D2H] of H at (0D, x ). With some computations, one exactly recovers the second condition in Assumption 19. Then as a consequence, the Jacobian of γ is given by

Jγ(z) = D2H(z, γ(z)) 1D1H(z, γ(z))

from which we recover the Equation 47.

C1 case: We use the Lipschitz deﬁnable version of implicit function theorem from (Bolte et al., 2021, Thm. 1). As JΦ is locally Lipschitz and deﬁnable, so is F and thus H. With γ(0D) = x , one recovers H(0D, γ(0D)) = 0N. A conservative Jacobian of H can be written as

JH : (z, x) n AT , Πim AT + Πker A B ; B J F (γ(z)) o . (50)

Since F(x) = JΦ(x)T (Φ(x) Φ(x )), by path-diﬀerentiation rules, one derives the formula

J F (x) = n B (Φ(x) Φ(x )) + JΦ(x)T JΦ(x) ; B JJT Φ (x) o

and evaluating in x = x provides, since the ﬁrst term vanishes,

J F (x ) = JΦ(x )T JΦ(x ) . (51)

Finally inserting Equation 51 in Equation 50 allows us to express

JH(0D, x ) = AT , Πim AT + Πker A JΦ(x )T JΦ(x ) .

This last equation implies that the assumptions of (Bolte et al., 2021, Thm. 1) are veriﬁed: JH(0D, x ) is a singleton and thus convex and its second sub-matrix is invertible from Assumption 19. This completes the proof.

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

A.3.2 Proof of Theorem 8, Theorem 9, Proposition 11 and Theorem 12

Assumption 10, Proposition 11 and Theorem 12 are speciﬁc instances of Assumption 22, Proposition 23 and Theorem 24 below with A = V ,T D .

Assumption 22 (Controlled thickness) Let

Sz = {x Uε δ , A(x x ) = z}

denote the slice of Uε δ which has coordinate z on Tx . We assume that

sup z: γ(z) x 2 δ diam(Sz) η. (52)

Proposition 23 (Uniformly bounded curvature) Let Φ be of class C1. Assume that its Jacobian JΦ is L-Lipschitz over the ball B(x , δ), i.e.

x, x B(x , δ), JΦ(x) JΦ(x ) op L x x 2 . (53)

Then assume that the Lipschitz constant L satisﬁes

where σmin > 0 is the lowest singular value of the Jacobian JΦ(x ) restricted to the kernel of A. Then Assumption 22 is veriﬁed with

η = 2ε σmin Lδ. (55)

Proof Setting z B(0D, δ), let s upper bound the diameter of the intersection of Uε δ with the aﬃne hyperplane A( x ) = z. When this slice is empty there is nothing to do, and when it is not, it only remains to bound the norm x1 x2 2 for any couple x1, x2 Uε δ verifying A(x1 x ) = A(x2 x ) = z. To that end, one can remark that

Φ(x1) Φ(x2) 2 Φ(x1) Φ(x ) 2 + Φ(x ) Φ(x2) 2 2ε

as x1 and x2 belong to Uε δ . The remaining of the proof consists in lower bounding the left-hand side by a term of the form µ x1 x2 2 with some µ > 0. Applying integral form of Taylor expansion on Φ, one gets

Φ(x1) Φ(x2) = J (x1 x2), (56)

with the operator J RM N deﬁned as

J def = Z 1

0 JΦ (x2 + t(x1 x2)) dt. (57)

Then, from the inverse triangular inequality, we get

Φ(x1) Φ(x2) 2 JΦ(x )(x1 x2) 2 J JΦ(x ) (x1 x2) 2 . (58)

N. Munier, E. Soubies and P. Weiss

The ﬁrst right term of Equation 58 can be handled noticing that the diﬀerence x1 x2 lies in the kernel of A since A(x1 x2) = A(x1 x ) + A(x x2) = z z = 0. Thus by deﬁnition of the minimal singular value of JΦ(x ) restricted to the kernel of A, denoted σmin, it follows that JΦ(x )(x1 x2) 2 σmin x1 x2 2 .

Moreover, the second right term of Equation 58 can be bounded using Lipschitz constant as follows:

J JΦ(x ) (x1 x2) 2 Z 1

0 [JΦ (x2 + t(x1 x2)) JΦ(x )] (x1 x2) 2 dt

0 JΦ (x2 + t(x1 x2)) JΦ(x ) opdt x1 x2 2

0 x2 + t(x1 x2) x 2 dt x1 x2 2

Lδ x1 x2 2 .

Finally Equation 58 becomes

Φ(x1) Φ(x2) 2 (σmin Lδ) x1 x2 2

as intended.

Proof [Proof of Theorem 9] Using a standard result on smooth manifolds from Lee (2012), which is summarized in Proposition 18, the image of γ locally deﬁnes an embedded manifold since Jγ(0) is injective as the product of an invertible matrix with AT which is injective by assumption.

Proof [Proof of Theorem 8] The goal of this proof is to demonstrate that U0 is locally the image of a smooth transformation of a D-dimensional aﬃne subspace (step 1), and that this transformation preserves the local bijectivity of the orthogonal projection from the uncertainty region onto its tangent space (step 2). This result follows from an application of the inverse function theorem. As a consequence, each point in the tangent space near the origin corresponds to a unique point in the uncertainty region. Therefore, the optimization problem deﬁning the Jackpot parametrization admits a unique solution in a suﬃciently small neighborhood, and this solution coincides with U0. A graphical view of the geometric objects used in this proof can be found in Fig. 16. Step 1: By the constant rank theorem, since the rank of JΦ is constant in a neighborhood of x , there exist neighborhoods V of x and W of Φ(x ), and two diﬀeomorphisms v : RN V and w : RM W such that Φ|U = w JΦ(x ) v 1. (59)

Notice that U0 V = v(L), where

L def = JΦ(x ) 1 w 1(Φ(x )) (60)

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

Figure 16: Illustration of the decomposition of the direct model Φ thanks to constant rank theorem.

is an aﬃne subspace of RN. Step 2: Let T be the tangent space of v(L) at x and πT : RN T its orthogonal

projection. Let s show that f def = πT v : L T is locally a diﬀeomorphism around x0 def = v 1(x ) using inverse function theorem. The diﬀerential of f is given by

Df(x0) = DπT (x ) Dv(x0) = πT Dv(x0). (61)

Moreover, by deﬁnition of the tangent space T , one has

Dv(x0)(L) = T . (62)

Hence Df(x0)|L = Dv(x0)|L. The key argument is that since the linear application Dv(x0) : RN RN is invertible (because v is a diﬀeomorphism), the same goes for Dv(x0)|L : L T since T and L have the same dimension. Hence applying inverse function theorem, the map f : L T is a local diﬀeomorphism near v 1(x ). It induces that πT |v(L) = f v 1 : v(L) T is also a local diﬀeomorphism near x . Let take V = B(x , δ) with δ > 0 such that πT |U0 δ is a diﬀeomorphism to its image. Thus normal slices x + T for x T B(x , δ) intersect U0 at a unique point. As a consequence, the problem

argmin πT (x)=z, x 2 δ

1 2 Φ(x) Φ(x ) 2 2 (63)

that deﬁnes the Jackpot parameterization admits a unique global minimizer and thus M0 δ = U0 δ .

N. Munier, E. Soubies and P. Weiss

Theorem 24 Let Φ : RN RM be a function of class C2, x RN be a point, A RD N

satisfying Assumption 19, and let ε > 0 be a threshold. Then Mε δ deﬁnes a manifold with parameterization γ in (PA) for suﬃciently small δ > 0. Moreover, under Assumption 22, the manifold Mε δ is a good approximation of Uε δ in the sense that:

d H(Mε δ, Uε δ ) < η, (64)

where the Hausdorﬀdistance d H is deﬁned in Equation 2. More precisely, we have the following inclusions: Mε δ Uε δ Mε δ + B(0, η). (65)

Proof First of all, from Theorem 21, the set Mε δ is well deﬁned for some δ > 0 and by deﬁnition veriﬁes the ﬁrst inclusion Mε δ Uε δ . To prove the second inclusion, given a point x Uε δ , we need to ﬁnd a point x Mε δ such that x x 2 η. A natural candidate is x def = γ( z) with z def = A( x x ). Indeed, given that both x and x are in Uε δ and (by deﬁnition) verify A( x x ) = A(x x ) = z, Assumption 22 implies that x x 2 η. Moreover, we have

Φ(γ( z)) Φ(x ) 2 Φ( x) Φ(x ) 2 ε, (66)

and thus x = γ( z) Mε δ. Note that the ﬁrst inequality in Equation 66 comes from the fact that γ( z) solves (PA) while x is an admissible point. The second one is due to the fact that x Uε δ . This completes the proof.

A.4 Numerical Computation

A.4.1 Proof of Proposition 14

Proof Proof of Proposition 14. In (Pz), splitting the vector x x RN within the space decomposition RN = Im V D (Im V D) yields the reformulation (P z). Here the constraint V ,T D (x x ) = z is included in the objective function that we denote

2 Φ(r(z, x)) Φ(x ) 2 2 (67)

where r(z, x) def = x + V Dz + Π x. The main strategy to locally prove the linear convergence of gradient descent is to show a geometrical property on Fz. Since it is not locally strongly convex we use some Polyak-Lojasiewicz (PL) condition:

Deﬁnition 25 For a real number µ > 0, a function f is said to verify the µ-Lojasiewicz inequality if for all x, 1 2 f(x) 2 2 µ(f(x) f(x )). (68)

The argumentation will proceed according to the following four steps, where the results are veriﬁed locally around (z, x) = (0D, 0N):

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

1. The restricted map Fz def = Fz|(Im V D) on (Im V D) is strongly convex.

2. The map Fz satisﬁes a PL inequality since any strongly convex map also satisﬁes a PL inequality (Polyak, 1963, lemma).

3. Since Fz only depends on the orthogonal part (Im V D) , i.e. Fz(x) = Fz(x ) whenever Π x = Π x , the objective function Fz also veriﬁes the same PL inequality.

4. Since Fz veriﬁes a PL inequality and is also L-smooth for some L > 0 (since of class C1

on a neighbor set), this assures the linear convergence of the gradient descent (Polyak, 1963, thm. 4).

The only remaining argument is to prove the ﬁrst point : the local strong convexity of the objective function. The Hessian of Fz at x is

2Fz(x) =Π JT Φ(r(z, x))JΦ(r(z, x))Π + Π HT Φ(r(z, x))(Φ(r(z, x)) Φ(x )) RN N.

Evaluating at (z, x) = (0D, 0N) gives

H def = Π J ,T J Π RN N.

Restricted to the subspace (Im V D) , this matrix is a positive-deﬁnite matrix with eigenvalues equal to σ2 1 σ2 N D > 0 since D > R . Moreover, this submatrix corresponds to the Hessian of the map F0 at x = 0N. By smoothness of Φ, the map (z, x) 7 2 Fz(x) is continuous and thus the matrices 2 Fz(x) remain locally positive-deﬁnite matrix for (z, x) on a neigborhood U V of (0D, 0N). This property is a characterization of the strong convexity of the maps Fz retricted to V for all z U.

A.4.2 Proof of Corollary 15

From Theorem 5, one knows that the parameterization function γ is locally Lipschitz. The key remaining argument is that any point z of the ball BD(0, δ) is at most at a distance of

Ds/2 from a point z of the grid Zs. From Lipschitz consideration, γ(z) γ(z ) 2 L

Ds/2. This leads, using Theorem 24, the following inequalities

d H(Uε δ , Mε δ(Z)) d H(Uε δ , Mε δ) + d H(Mε δ, Mε δ(Z))

which completes the proof.

P. A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, 2008.

N. Munier, E. Soubies and P. Weiss

V. Antun, F. Renna, C. Poon, B. Adcock, and A. C. Hansen. On instabilities of deep learning in image reconstruction and the potential costs of ai. Proceedings of the National Academy of Sciences, 117(48):30088 30095, 2020.

R. Arutjunjan, B. M. Schaefer, and C. Kreutz. Constructing exact conﬁdence regions on parameter manifolds of non-linear models. ar Xiv preprint ar Xiv:2211.03421, 2022.

J. M. Bardsley. MCMC-based image reconstruction with uncertainty quantiﬁcation. SIAM Journal on Scientiﬁc Computing, 34(3):A1316 A1332, 2012.

J. M. Bardsley, A. Solonen, H. Haario, and M. Laine. Randomize-then-optimize: A method for sampling from posterior distributions in nonlinear inverse problems. SIAM Journal on Scientiﬁc Computing, 36(4):A1895 A1910, 2014.

D. J. Bates, J. D. Hauenstein, and N. Meshkat. Identiﬁability and numerical algebraic geometry. Plos one, 2019.

G. Bellu, M. P. Saccomani, S. Audoly, and L. D Angi o. Daisy: A new software tool to test global identiﬁability of biological and physiological systems. Computer methods and programs in biomedicine, 2007.

J. Bolte, T. Le, E. Pauwels, and T. Silveti-Falls. Nonsmooth implicit diﬀerentiation for machine-learning and optimization. Advances in neural information processing systems, 34:13537 13549, 2021.

J. Bona-Pellissier, F. Malgouyres, and F. Bachoc. Local identiﬁability of deep Re LU neural networks: the theory. Advances in Neural Information Processing Systems, 35:27549 27562, 2022.

J. C. Butcher. Numerical Methods for Ordinary Diﬀerential Equations. John Wiley & Sons, 3rd edition, 2016. ISBN 978-1119121503. doi: 10.1002/9781119121534.

J. Chemseddine, P. Hagemann, C. Wald, and G. Steidl. Conditional Wasserstein distances with applications in Bayesian OT ﬂow matching. ar Xiv preprint ar Xiv:2403.18705, 2024.

H. Chung, J. Kim, M. T. Mccann, M. L. Klasky, and J. C. Ye. Diﬀusion posterior sampling for general noisy inverse problems. ar Xiv preprint ar Xiv:2209.14687, 2022.

M. Coste. An introduction to o-minimal geometry. Istituti editoriali e poligraﬁci internazionali Pisa, 2000.

J. A. Duersch, M. Shao, C. Yang, and M. Gu. A robust and eﬃcient implementation of LOBPCG. SIAM Journal on Scientiﬁc Computing, 2018.

M. C. Eisenberg and M. A. L. Hayashi. Determining identiﬁable parameter combinations using subset proﬁling. Mathematical biosciences, 2014.

J. L. Fern andez-Mart ınez and Z. Fern andez-Mu niz. The curse of dimensionality in inverse problems. Journal of Computational and Applied Mathematics, 2020.

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

J. L. Fern andez-Mart ınez, M. Z. Fern andez-Mu niz, and M. J Tompkins. On the topography of the cost functional in linear and nonlinear inverse problems. Geophysics, 2012.

J. L. Fern andez-Mart ınez, Z. Fern andez-Mu niz, J. L. G. Pallero, and L. M. Pedruelo-Gonz alez. From Bayes to Tarantola: new insights to understand uncertainty in inverse problems. Journal of Applied Geophysics, 98:62 72, 2013.

S. M. Fischer and M. A. Lewis. A robust and eﬃcient algorithm to ﬁnd proﬁle likelihood conﬁdence intervals. Statistics and Computing, 2021.

J. W. Goodman. Introduction to Fourier optics. Roberts and Company publishers, 2005.

A. Gossard and P. Weiss. Training adaptive reconstruction networks for blind inverse problems. SIAM Journal on Imaging Sciences, 17(2):1314 1346, 2024.

A. V. Grayver and A. V. Kuvshinov. Exploring equivalence domain in nonlinear inverse problems using covariance matrix adaption evolution strategy (CMAES) and random sampling. Geophysical Journal International, 2016.

P. Grohs, S. Koppensteiner, and M. Rathmair. Phase retrieval: uniqueness and stability. SIAM Review, 62(2):301 350, 2020.

J. C. Helton, J. D. Johnson, C. J. Sallaberry, and C. B. Storlie. Survey of sampling-based methods for uncertainty and sensitivity analysis. Reliability Engineering & System Safety, 2006.

S. Hengl, C. Kreutz, J. Timmer, and T. Maiwald. Data-based identiﬁability analysis of non-linear dynamical models. bioinformatics, 2007.

J. Ho, A. Jain, and P. Abbeel. Denoising diﬀusion probabilistic models. Advances in neural information processing systems, 33:6840 6851, 2020.

E. Hubert. Essential components of an algebraic diﬀerential equation. Journal of symbolic computation, 1999.

S. Hurault, A. Leclaire, and N. Papadakis. Gradient step denoiser for convergent Plug-and Play. ar Xiv preprint ar Xiv:2110.03220, 2021.

A. Hyv arinen. Estimation of non-normalized statistical models by score matching. J. Mach. Learn. Res., 6, dec 2005.

IMCCE. position ephemeris, 2019. URL https://ssp.imcce.fr/forms/ephemeris.

K. Jaganathan, Y. C. Eldar, and B. Hassibi. Phase retrieval: An overview of recent developments. Optical Compressive Imaging, pages 279 312, 2016.

C. Jauberthie, N. Verdi ere, and L. Trav e-Massuy es. Fault detection and identiﬁcation relying on set-membership identiﬁability. Annual Reviews in Control, 2013.

H. K. Johansen. A man/computer interpretation system for resistivity soundings over a horizontally straﬁﬁed earth. Geophysical Prospecting, 1977.

N. Munier, E. Soubies and P. Weiss

M. Kech and F. Krahmer. Optimal injectivity conditions for bilinear inverse problems with applications to identiﬁability of deconvolution problems. SIAM Journal on Applied Algebra and Geometry, 2017.

A. V. Knyazev. Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method. SIAM journal on scientiﬁc computing, 2001.

N. N. Lam, P. D. Docherty, and R. Murray. Practical identiﬁability of parametrised models: A review of beneﬁts and limitations of various approaches. Mathematics and Computers in Simulation, 2022.

R. Laumont, V. D. Bortoli, A. Almansa, J. Delon, A. Durmus, and M. Pereyra. Bayesian imaging using Plug & Play priors: when Langevin meets Tweedie. SIAM Journal on Imaging Sciences, 15(2):701 737, 2022.

J. M. Lee. Introduction to Smooth Manifolds. Springer New York, 2012.

Y. Li, K. Lee, and Y. Bresler. Identiﬁability in bilinear inverse problems with applications to subspace or sparsity-constrained blind gain and phase calibration. IEEE Transactions on Information Theory, 63(2):822 842, 2016.

E. Liu, K. Rivera, and J. Fox. Eﬃcient eigensolvers and their applications. 2020.

B. Merkt, J. Timmer, and D. Kaschek. Higher-order Lie symmetries in identiﬁability and predictability analysis of dynamic models. Physical Review E, 92(1):012920, 2015.

H. Miao, X. Xia, A. S. Perelson, and H. Wu. On identiﬁability of nonlinear ODE models and applications in viral dynamics. SIAM review, 2011.

K. Mosegaard and A. Tarantola. Monte Carlo sampling of solutions to inverse problems. Journal of Geophysical Research: Solid Earth, 100(B7):12431 12447, 1995.

M. J. et al. Muckley. Results of the 2020 fast MRI challenge for machine learning MR image reconstruction. IEEE Transactions on Medical Imaging, 2021. doi: 10.1109/TMI.2021. 3075856.

S. Nah, S. Son, S. Lee, R. Timofte, and K. M. Lee. NTIRE 2021 challenge on image deblurring, 2021. URL https://arxiv.org/abs/2104.14854.

Beresford N Parlett. The symmetric eigenvalue problem. SIAM, 1998.

M. Phuong and C. H. Lampert. Functional vs. parametric equivalence of Re LU networks. In International Conference on Learning Representations, 2019.

A. Pinkus. N-widths in Approximation Theory, volume 7. Springer Science & Business Media, 2012.

B. T. Polyak. Gradient methods for the minimisation of functionals. USSR Computational Mathematics and Mathematical Physics, 3(4):864 878, 1963.

B. T. Polyak. Introduction to optimization. 1987.

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

D. V. Raman, J. Anderson, and A. Papachristodoulou. Delineating parameter unidentiﬁabilities in complex models. Physical Review E, 2017.

A. Raue, C. Kreutz, T. Maiwald, J. Bachmann, M. Schilling, U. Klingm uller, and J. Timmer. Structural and practical identiﬁability analysis of partially observed dynamical models by exploiting the proﬁle likelihood. Bioinformatics, 2009.

W. C. Rheinboldt. Manpak: A set of algorithms for computations on implicitly deﬁned manifolds. Computers & Mathematics with Applications, 1996.

T. J. Rothenberg. Identiﬁcation in parametric models. Econometrica: Journal of the Econometric Society, 1971.

E. Y. Sidky and X. Pan. Report on the AAPM deep-learning spectral CT grand challenge. Medical physics, 2024. URL https://arxiv.org/abs/2212.06718.

G. W. Stewart and J. G. Sun. Matrix Perturbation Theory. Academic Press, Boston, 1990. ISBN 978-0126702309.

J. Tachella, M. Terris, S. Hurault, A. Wang, D. Chen, M. H. Nguyen, M. Song, T. Davies, L. Davy, J. Dong, et al. Deepinverse: A python package for solving imaging inverse problems with deep learning. ar Xiv preprint ar Xiv:2505.20160, 2025.

D. Y. W. Thong, C. K. Mbakam, and M. Pereyra. Do Bayesian imaging methods report trustworthy probabilities? ar Xiv preprint ar Xiv:2405.08179, 2024.

M. K. Transtrum and P. Qiu. Model reduction by manifold boundaries. Physical review letters, 2014.

L. Van den Dries. Tame topology and o-minimal structures, volume 248. Cambridge university press, 1998.

D. J. Venzon and S. H. Moolgavkar. A method for computing proﬁle-likelihood-based conﬁdence intervals. Journal of the Royal Statistical Society: Series C (Applied Statistics), 37(1):87 94, 1988.

A. F. Villaverde, N. D. Evans, M. J. Chappell, and J. R. Banga. Input-dependent structural identiﬁability of nonlinear systems. IEEE Control Systems Letters, 3(2):272 277, 2018.

P. Vincent. A connection between score matching and denoising autoencoders. Neural Comput., 23:1661 1674, jul 2011. URL https://doi.org/10.1162/NECO_a_00142.

F.-G. Wieland, A. L. Hauber, M. Rosenblatt, C. T onsing, and J. Timmer. On structural and practical identiﬁability. Current Opinion in Systems Biology, 2021.

K. Zhang, Y. Li, W. Zuo, L. Zhang, L. Van Gool, and R. Timofte. Plug-and-play image restoration with deep denoiser prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6360 6376, 2021.