Journal of Machine Learning Research 24 (2025) 1-41. Submitted 10/24; Revised 7/25; Published 10/25

Jackpot: Approximating Uncertainty Domains with Adversarial Manifolds

Nathanaël Munier (nathanael.munier@ens-rennes.fr), University of Toulouse, CNRS, IMT, CBI, France
Emmanuel Soubies, University of Toulouse, CNRS, IRIT, CBI, France
Pierre Weiss, University of Toulouse, CNRS, IRIT, CBI, France

Editor: Joseph Salmon

© 2025 N. Munier, E. Soubies and P. Weiss. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v24/24-1769.html.

Contents

1 Introduction 2
  1.1 Application Examples 3
    1.1.1 Identifying Dynamical Systems Parameters 3
    1.1.2 Blind Inverse Problems 4
    1.1.3 Bayesian Posterior Exploration 4
  1.2 Related Works 5
    1.2.1 Structural Identifiability 5
    1.2.2 Practical Identifiability 6
  1.3 Main Contribution 8
2 The Jackpot Algorithm and its Guarantees 8
  2.1 Linear Approximation 9
    2.1.1 Approximation Result 9
    2.1.2 Lowest Singular Vectors Computation 10
  2.2 Nonlinear Approximation 11
    2.2.1 The Jackpot Manifold 11
    2.2.2 Approximation Guarantees 13
    2.2.3 Numerical Computation 15
    2.2.4 Discussion 17
3 Numerical Experiments 20
  3.1 Measuring Masses in the Solar System 20
  3.2 Blind Deblurring 23
  3.3 Posterior Exploration for an Image Deblurring Problem 24
A Proofs 28
  A.1 Definition and Characterization of Manifolds 28
  A.2 Linear Approximation 29
    A.2.1 Proof of Proposition 4 29
  A.3 Nonlinear Approximation 30
    A.3.1 Proof of Theorem 5 31
    A.3.2 Proof of Theorem 8, Theorem 9, Proposition 11 and Theorem 12 33
  A.4 Numerical Computation 36
    A.4.1 Proof of Proposition 14 36
    A.4.2 Proof of Corollary 15 37

Abstract

Given a forward mapping $\Phi : \mathbb{R}^N \to \mathbb{R}^M$ and a point $x^\star \in \mathbb{R}^N$, the region $\{x \in \mathbb{R}^N,\ \|\Phi(x) - \Phi(x^\star)\|_2 \le \varepsilon\}$, where $\varepsilon \ge 0$ is a perturbation amplitude, represents the set of all possible inputs $x$ that could have produced the measurement $\Phi(x^\star)$ within an acceptable error margin. This set is central to uncertainty analysis, a key challenge in inverse problems. In this work, we develop a numerical algorithm called Jackpot (Jacobian Kernel Projection Optimization) which approximates this set with a low-dimensional adversarial manifold. The proposed algorithm leverages automatic differentiation, allowing it to handle complex, high-dimensional mappings such as those found when dealing with dynamical systems or neural networks.
We demonstrate the effectiveness of our algorithm on various challenging large-scale, nonlinear problems including parameter identification in dynamical systems and blind image deblurring.

Keywords: inverse problems, uncertainty quantification, identifiability

1. Introduction

Let $\Phi : \mathbb{R}^N \to \mathbb{R}^M$ denote an arbitrary $C^1$ mapping and $y = \Phi(x) + \eta$ be some noisy measurement of $x \in \mathbb{R}^N$ through this mapping. Given an estimate $x^\star$ of $x$, our main objective in this paper is to design a numerical algorithm to describe the set $U^\varepsilon$ defined by

$$U^\varepsilon \stackrel{\mathrm{def}}{=} \{x \in \mathbb{R}^N,\ \|\Phi(x) - \Phi(x^\star)\|_2 \le \varepsilon\} = \Phi^{-1}\big(B(\Phi(x^\star), \varepsilon)\big), \qquad (1)$$

for some $\varepsilon \ge 0$, where $\|\cdot\|_2$ denotes the standard $\ell^2$-norm and $B(y, r)$ denotes an $\ell^2$-ball centered at $y$ of radius $r$. For example, choosing $\varepsilon = \|\Phi(x^\star) - y\|_2$ defines $U^\varepsilon$ as the set of points whose forward images under $\Phi$ are at least as close to the measurement $y$ as that of $x^\star$, and which can thus be interpreted as equally plausible estimates. As will be explained later, many important practical problems can be framed in this manner.

If the mapping $\Phi$ is linear, then $U^\varepsilon$ is an ellipsoid that can be characterized using linear algebra tools. In the general nonlinear case, the set $U^\varepsilon$ is the sublevel set of a near-arbitrary function. Indeed, an arbitrary closed set $E$ can be written as the $0$-level set of the function $\Phi(x) = \mathrm{dist}(x, E)^2$. Given this level of generality, it is unreasonable to expect to solve the problem properly for all mappings $\Phi$: describing $U^\varepsilon$ is more challenging than finding all the global minimizers of a near-arbitrary nonconvex function. We therefore aim for a more modest goal: describing $U^\varepsilon$ locally around the point $x^\star$. Additionally, we seek to approximate this set by another one of relatively low intrinsic dimensionality. This serves two purposes: i) to design tractable algorithms and ii) to provide an output that can be easily visualized. Our main objective in this paper can therefore be summarized as follows.
Main objective. Given a mapping $\Phi : \mathbb{R}^N \to \mathbb{R}^M$ and a point $x^\star \in \mathbb{R}^N$, find a $D$-dimensional manifold¹ $\mathcal{M}_\delta^\varepsilon$ that approximates the set

$$U_\delta^\varepsilon \stackrel{\mathrm{def}}{=} \{x \in B(x^\star, \delta),\ \|\Phi(x) - \Phi(x^\star)\|_2 \le \varepsilon\}, \qquad (2)$$

where $\delta \ge 0$ bounds the diameter of the approximating manifold.

The set $U_\delta^\varepsilon$ can be encountered under different names, depending on the scientific field: equivalence domain (Grayver and Kuvshinov, 2016), low misfit region (Fernández-Martínez et al., 2012), solution set (Jauberthie et al., 2013) or uncertainty region (Fernández-Martínez and Fernández-Muñiz, 2020). Throughout this paper, we will call it the uncertainty region. We are particularly interested in complex models $\Phi$ (e.g. dynamical systems, neural networks), which can lead to problems involving high-dimensional (latent) spaces. Throughout the paper, we assume that the Jacobian of $\Phi$ can be accessed either through analytical derivation or by using automatic differentiation algorithms.

1.1 Application Examples

Many problems involve identifying parameters of physical, biomedical, or chemical systems from indirect observations. The primary applications we have in mind are related to the field of inverse problems. In this field, the set $U^\varepsilon$ describes what can be identified in a system and what cannot. Let us illustrate this point with a few examples studied in the numerical experiments.

1.1.1 Identifying Dynamical Systems Parameters

Assume that some dynamical system $u : \mathbb{R}_+ \to \mathbb{R}^{PD}$ of $P$ particles in $\mathbb{R}^D$ is governed by a first- or second-order equation:

$$\dot u(t) = f(u(t), x) \quad \text{or} \quad \ddot u(t) = f(u(t), x),$$

where $u(t) = (u_1(t), \ldots, u_P(t))$ denotes the particles' positions and $f : \mathbb{R}^{PD} \times \mathbb{R}^N \to \mathbb{R}^{PD}$ is a force or velocity field parameterized by the vector $x \in \mathbb{R}^N$. After measuring the particles' positions at multiple time points, we would like to recover the unknown parameter $x$ that describes the dynamical system.
An example that will be studied later is the solar system, where $u(t)$ represents the positions of $P = N$ planets at time $t$, $f$ collects the forces given by Newton's law of attraction, and $x \in \mathbb{R}^N$ contains the masses of the different planets. We assume that a measurement system (e.g. a telescope) returns estimates of the positions $u$ at some times, leading to a measurement vector $(y_1, \ldots, y_K)$ of the form $y_k = u(t_k) + b_k$ for sampling times $t_1 < \cdots < t_K$ and unknown measurement errors $b_k$. The scientific question we aim to address is: can we recover the masses of the planets from the partial observation of their positions? To this end, we can set $\Phi : x \mapsto (u(t_1), \ldots, u(t_K))$ and find a first guess $x^\star$ of the masses using a least squares regression:

$$x^\star = \operatorname*{argmin}_{x \in \mathbb{R}^N} \frac{1}{2} \|\Phi(x) - y\|_2^2.$$

¹ See Appendix A.1 for a definition.

Describing the set $U_\delta^\varepsilon$ provides a much finer description of the uncertainties than the single estimate $x^\star$. As we will demonstrate, our algorithm reveals that predicting the mass of Pluto is highly challenging. This observation aligns with historical estimates: initially, Pluto's mass was thought to be twice that of Earth, but in 2015, the New Horizons probe measured it to be just 1/455th of Earth's mass.

1.1.2 Blind Inverse Problems

Assume that we acquire measurements $y$ of the form $y = P(A(\theta)(x))$, where $P : \mathbb{R}^M \to \mathbb{R}^M$ represents a perturbation (e.g. noise, quantization) and $\theta \in \mathbb{R}^N$ is a parameter that describes the state of an acquisition system $A(\theta) : \mathbb{R}^N \to \mathbb{R}^M$. This parameter could represent an unknown point spread function in optics or uncertain projection angles in tomography, for instance. The goal in blind inverse problems is to recover both $x$ and $\theta$ from the measurements $y$. One approach we will explore in the numerical section involves constructing a learned reconstruction mapping $R(A(\theta), y)$, which, given a forward operator $A(\theta)$ and a measurement vector $y$, outputs an estimate $\hat x(\theta) = R(A(\theta), y)$ of $x$.
Then, a simple approach to estimate $\theta$ is to minimize the discrepancy:

$$\operatorname*{argmin}_{\theta \in \mathbb{R}^N} \frac{1}{2} \left\| A(\theta)(\hat x(\theta)) - y \right\|_2^2,$$

ensuring that the pair $(A(\theta), \hat x(\theta))$ is coherent with the observation $y$. By setting $\Phi(\theta) = A(\theta)(\hat x(\theta))$ and exploring $U_\delta^\varepsilon$ around a local minimizer $\theta^\star$, we can identify all pairs coherent with the data. This is a way to describe some uncertainty properties of the blind inverse problem.

1.1.3 Bayesian Posterior Exploration

Similar to the previous example, assume that a system yields measurements $y$ of the form $y = P(A(x))$, where $A : \mathbb{R}^N \to \mathbb{R}^M$ is a known operator describing the acquisition device. Whenever $A$ is not invertible, recovering $x$ from $y$ requires regularization. An elegant approach to formalize this principle uses Bayesian reasoning: we assume that $x$ is the realization of some random vector with probability distribution function $p_x$. Under the assumption that the prior $p_x$ is a proper probability distribution and the likelihood $p_{y|x}(y|x)$ is well-defined for the observed data $y$, it is possible to construct a posterior distribution

$$p_{x|y}(x|y) \propto p_{y|x}(y|x)\, p_x(x)$$

and Bayesian estimators such as the Maximum A Posteriori (MAP) estimate:

$$\hat x_{\mathrm{MAP}}(y) \stackrel{\mathrm{def}}{=} \operatorname*{argmax}_{x \in \mathbb{R}^N} p_{x|y}(x|y) = \operatorname*{argmin}_{x \in \mathbb{R}^N} -\log p_{y|x}(y|x) - \log p_x(x).$$

The prior $p_x$ can be handcrafted or learned from large databases. The latter strategy oftentimes provides surprisingly good results and is rapidly replacing older strategies (Nah et al., 2021; Muckley, 2021; Sidky and Pan, 2024). In scientific applications, however, it becomes critical to certify the results, i.e. to describe which information can be safely regarded as valid and which is uncertain. One approach to achieving this involves describing the high-probability region

$$H_\alpha = \{x \in \mathbb{R}^N,\ -\log p_{x|y}(x|y) \le \alpha\} \qquad (3)$$

for some probability threshold $\alpha$. However, the prior $p_x(x)$ can often only be accessed indirectly through its gradient (Hyvärinen, 2005; Vincent, 2011).
In that situation, another approach is to compute a local minimizer $x^\star$ of the negative log-posterior and to explore the set

$$U_\delta^\varepsilon = \left\{x \in B(x^\star, \delta),\ \left| \log p_{x|y}(x^\star|y) - \log p_{x|y}(x|y) \right| \le \varepsilon \right\}. \qquad (4)$$

This problem fits our formalism with $\Phi = -\log p_{x|y}(\cdot|y)$ and $\nabla \Phi(x^\star) = 0$, since $x^\star$ is a minimizer. When $x^\star = \hat x_{\mathrm{MAP}}$, we have $U_\delta^\varepsilon \subseteq H_\alpha$ for some parameter $\varepsilon > 0$ that depends on $\alpha$ and for sufficiently small $\delta > 0$. However, the reverse inclusion is not true in general.

1.2 Related Works

As illustrated by these examples, describing uncertainty regions can be useful in various fields such as system identification, inverse and blind inverse problems, or data assimilation. It can also be regarded as a type of sensitivity analysis or uncertainty quantification. Let us review some approaches available in the literature. We will use the nomenclature of identifiability theory (Wieland et al., 2021), which classifies the study as structural in the noiseless setting ($\varepsilon = 0$) and practical when noise is considered ($\varepsilon > 0$).

1.2.1 Structural Identifiability

Using our notation, global structural identifiability means that $U^0 = \{x^\star\}$, while local structural identifiability means that there exists $\delta > 0$ such that $U_\delta^0 = \{x^\star\}$. The literature devoted to those two notions of identifiability is vast. Under mild regularity assumptions on $\Phi$, a necessary and sufficient condition for local structural identifiability is given by $\ker(J_\Phi(x^\star)) = \{0\}$, where $J_\Phi(x^\star) \in \mathbb{R}^{M \times N}$ is the Jacobian matrix of $\Phi$ at $x^\star$ (Rothenberg, 1971). When $\Phi$ is linear, this condition reads $\ker(\Phi) = \{0\}$ and implies global structural identifiability. Beyond the linear case, there exist rich theories regarding the (local) identifiability of bilinear inverse problems (Li et al., 2016; Kech and Krahmer, 2017), phase retrieval problems (Jaganathan et al., 2016; Grohs et al., 2020), partial differential equations (Villaverde et al., 2018) or neural networks (Phuong and Lampert, 2019; Bona-Pellissier et al., 2022), to name a few.
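The rank condition above can be checked numerically whenever $\Phi$ is available as code. The following is a minimal sketch under our own naming (`numerical_jacobian`, `locally_identifiable` are hypothetical helpers, and a finite-difference Jacobian stands in for the automatic differentiation used in the paper):

```python
import numpy as np

def numerical_jacobian(phi, x, h=1e-6):
    """Forward-difference approximation of the M x N Jacobian of phi at x."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(phi(x))
    J = np.zeros((f0.size, x.size))
    for n in range(x.size):
        e = np.zeros_like(x)
        e[n] = h
        J[:, n] = (np.asarray(phi(x + e)) - f0) / h
    return J

def locally_identifiable(phi, x, tol=1e-4):
    """Rothenberg-type test: ker(J_phi(x)) = {0} iff J has full column rank."""
    J = numerical_jacobian(phi, x)
    return np.linalg.matrix_rank(J, tol=tol) == np.size(x)

# Only the sum x1 + x2 is observed below, so the difference is non-identifiable.
phi_bad = lambda x: np.array([x[0] + x[1], (x[0] + x[1]) ** 2])
phi_good = lambda x: np.array([x[0], x[0] + x[1]])
print(locally_identifiable(phi_bad, [1.0, 2.0]))   # False
print(locally_identifiable(phi_good, [1.0, 2.0]))  # True
```

The tolerance passed to `matrix_rank` absorbs the finite-difference error; with exact automatic differentiation it could be tightened.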
When $x^\star$ is non-identifiable, the analysis and characterization of the non-trivial set $U^0$ locally around $x^\star$ has also been the object of many works. Existing methods include computing the uncertainty region $U^0$ in explicit form using differential algebra (Bates et al., 2019), numerical algebraic geometry (Hubert, 1999; Bellu et al., 2007) or Lie group theory (Merkt et al., 2015). However, these algebraic methods are limited to the analysis of rational functions and become computationally prohibitive as the number of parameters increases.

Figure 1: Illustration of the profile likelihood method on panels (a), (b) $\Phi = UAU^T$ and (c) $\Phi(x) = \|x\|_2^2$. The red-black curves are the level sets of the discrepancy $\|\Phi(\cdot) - \Phi(x^\star)\|_2$. The discrepancy profiles along each axis are plotted in black, together with the confidence intervals in green. In (a) and (b), we fixed $A = \mathrm{diag}(0.2, 1)$, a unitary matrix $U$, and $x^\star = (0, 0)$. In (c), we fixed $x^\star = (1, 0)$.

1.2.2 Practical Identifiability

Practical identifiability, which guarantees stability to noise on the observations, is the main topic of this paper. Closely related alternative names are sensitivity analysis and uncertainty quantification. Various approaches exist for the local characterization of the uncertainty region $U^\varepsilon$.

Coordinate-based methods. Coordinate-based methods examine the model's sensitivity along each direction of the canonical basis, where the map $x \mapsto \exp(-\|\Phi(x) - \Phi(x^\star)\|_2^2)$ is often called the likelihood. Profile likelihood methods (Venzon and Moolgavkar, 1988; Raue et al., 2009) compute one-dimensional profiles. The profiles are evaluated by fixing one coordinate and optimizing over the remaining ones:

$$\min_{x \in \mathbb{R}^N,\ x_i = x_i^\star + t_j} \|\Phi(x) - \Phi(x^\star)\|_2.$$

This is done for different deviations $t_j \in \mathbb{R}$ and all coordinates $i$. The directions $i$ along which the likelihood remains high are uncertain. An illustration with simple likelihood functions is given in Fig. 1.
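A profile of this kind is straightforward to compute with a generic optimizer. Below is a minimal sketch (the helper name `profile` is ours, not the paper's): each solve is warm-started with the previous minimizer, and the example reproduces the diagonal case of Fig. 1 (a), where the first coordinate has a shallow profile and the second a steep one.

```python
import numpy as np
from scipy.optimize import minimize

def profile(phi, x_star, i, ts):
    """1D profile: fix x_i = x*_i + t, minimize the squared discrepancy over
    the remaining coordinates, and report ||Phi(x) - Phi(x*)||_2."""
    x_star = np.asarray(x_star, dtype=float)
    f_star = np.asarray(phi(x_star))
    free = [n for n in range(x_star.size) if n != i]

    def objective(x_free, t):
        x = x_star.copy()
        x[free] = x_free
        x[i] = x_star[i] + t
        return 0.5 * np.sum((np.asarray(phi(x)) - f_star) ** 2)

    vals, x_free = [], x_star[free].copy()
    for t in ts:  # warm-start each solve with the previous minimizer
        res = minimize(objective, x_free, args=(t,))
        x_free = res.x
        vals.append(np.sqrt(2.0 * res.fun))
    return np.array(vals)

# Diagonal example: Phi = A with A = diag(0.2, 1).
A = np.diag([0.2, 1.0])
phi = lambda x: A @ x
ts = np.linspace(-1.0, 1.0, 11)
p0 = profile(phi, [0.0, 0.0], 0, ts)  # shallow profile, ~ 0.2 |t|
p1 = profile(phi, [0.0, 0.0], 1, ts)  # steep profile,   ~ |t|
```

For this diagonal $\Phi$, the inner minimization drives the free coordinate to zero, so the profiles reduce to $0.2\,|t|$ and $|t|$ respectively.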
Notice that in practice, only the endpoints of the confidence intervals are computed (Fischer and Lewis, 2021). A weakness of these approaches is that they act component-wise. Hence, in examples (a) and (b) of Fig. 1, we see that the profiles change a lot depending on the coordinate system, showing that the profile directions should be chosen with care. In Fig. 1 (c) for instance, the profiles are locally flat in both directions, while the uncertainty is only one-dimensional, along the unit circle. A possible approach to avoid this drawback is to find meaningful directions using the Fisher information matrix, defined by $J_\Phi(x^\star)^T J_\Phi(x^\star)$. The eigenvectors associated with the smallest eigenvalues indicate the directions that are hardest to identify. This idea was used by Eisenberg and Hayashi (2014) to identify combinations of coordinates that are jointly identifiable and to draw likelihood profiles for these subsets only. A similar mechanism is at the core of our approach, though used in a different manner.

Monte Carlo sampling methods. The problem can be viewed as recovering the preimage $\Phi^{-1}(B(\Phi(x^\star), \varepsilon))$, i.e., all points that $\Phi$ maps into a ball around $\Phi(x^\star)$. A simple way to do so is to deploy a Monte Carlo sampling method (Mosegaard and Tarantola, 1995). A few samples $y_k$, $1 \le k \le K$, are drawn at random within $B(\Phi(x^\star), \varepsilon)$. Then, their antecedents can be computed as

$$\hat x_k \stackrel{\mathrm{def}}{=} \operatorname*{argmin}_{x \in \mathbb{R}^N} \|\Phi(x) - y_k\|_2^2. \qquad (5)$$

The discrete set $(\hat x_k)$ provides a snapshot of $U^\varepsilon$. An illustration is provided in Fig. 2 with our basic examples. Statistical analyses of the samples $(\hat x_k)$ can then be performed to quantify the uncertainty.

Figure 2: Illustration of Monte Carlo sampling on the same examples as in Fig. 1, with panels (b) $\Phi = UAU^T$ and (c) $\Phi(x) = \|x\|_2^2$. The set $U^\varepsilon$ is represented in green and the sampling points in blue.
These include, for instance, variance or covariance estimates, bootstrapping (Hengl et al., 2007), sensitivity-based methods (Helton et al., 2006; Bardsley, 2012), or randomize-then-optimize strategies (Bardsley et al., 2014). As opposed to the profile likelihood approach, Monte Carlo sampling accounts for relationships between coordinates. Yet, it suffers from multiple weaknesses. It can be computationally expensive, given that each sample $\hat x_k$ requires solving an optimization problem and that the number of sampling points required for statistical analyses grows exponentially with the dimension of the problem (Fernández-Martínez and Fernández-Muñiz, 2020; Fernández-Martínez et al., 2013). For degenerate problems, where $\Phi$ is non-injective, the optimization problem may possess multiple minimizers. In those situations, sampling becomes inefficient. In Fig. 2 (c) for instance, we started the optimization algorithm from the initialization point $x_{\mathrm{init}} = x^\star$. As a result, the sampling approach only yields a 1D subset of $U^\varepsilon$ that does not capture the complete annulus. Moreover, uniform sampling in $B(\Phi(x^\star), \varepsilon)$ might not translate to a uniform sampling in $U^\varepsilon$, potentially over-representing some regions and under-representing others. This issue will be striking in our numerical experiments.

Some of these challenges can be addressed with advanced diffusion models, which have recently gained significant attention in artificial intelligence research. These methods were initially developed for generative modeling tasks (Ho et al., 2020) and have since been adapted to various inverse problems (Chung et al., 2022; Laumont et al., 2022; Chemseddine et al., 2024). Ongoing research focuses on certifying the accuracy of posterior sampling (Thong et al., 2024) and improving the efficiency of sample generation. However, given the computational cost of these approaches, we have chosen not to use them for comparisons.
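For reference, the Monte Carlo baseline of eq. (5) fits in a few lines. The sketch below (helper name `monte_carlo_uncertainty` is ours) reproduces the annulus pathology of Fig. 2 (c): started from $x^\star$, every least squares solve stays on the same ray, so only a one-dimensional slice of the annulus is explored.

```python
import numpy as np
from scipy.optimize import least_squares

def monte_carlo_uncertainty(phi, x_star, eps, n_samples=50, seed=0):
    """Draw targets y_k uniformly in B(phi(x*), eps) and map them back by
    nonlinear least squares (eq. (5)), starting each solve from x*."""
    rng = np.random.default_rng(seed)
    x_star = np.asarray(x_star, dtype=float)
    f_star = np.asarray(phi(x_star))
    samples = []
    for _ in range(n_samples):
        u = rng.standard_normal(f_star.size)
        u *= eps * rng.uniform() ** (1.0 / f_star.size) / np.linalg.norm(u)
        y_k = f_star + u  # uniform draw in the eps-ball around phi(x*)
        res = least_squares(lambda x: phi(x) - y_k, x_star)
        samples.append(res.x)
    return np.array(samples)

# Annulus example of Fig. 2 (c): phi(x) = ||x||_2^2, x* on the unit circle.
phi = lambda x: np.atleast_1d(x @ x)
samples = monte_carlo_uncertainty(phi, [1.0, 0.0], eps=0.1)
radii = np.linalg.norm(samples, axis=1)  # all near 1: a thin 1D slice only
```

Since the residual gradient has no component orthogonal to the initialization, `samples[:, 1]` remains (numerically) zero: the sampler never discovers the rest of the annulus, which is exactly the degeneracy discussed above.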
Geometric analysis. Finally, the last class of methods, the one we endorse, aims to compute the structure of the set $U^\varepsilon$ geometrically. Arutjunjan et al. (2022) proved that the boundary of $U^\varepsilon$ is an $(N-1)$-dimensional manifold and proposed a numerical method to compute it. It relies on the construction of a complete vector field tangential to the level sets of $\|\Phi(x) - \Phi(x^\star)\|_2$. The determination of the boundary of $U^\varepsilon$ then amounts to solving an ODE. This approach is limited to low-dimensional problems: it was applied by Arutjunjan et al. (2022) only to two-dimensional non-polynomial and three-dimensional polynomial models. Instead of considering the whole space, Raman et al. (2017) proposed to determine a trajectory, referred to as a minimally disruptive curve or talweg, within the structural or practical non-identifiable sets $U^0$ and $U^\varepsilon$. This trajectory is a one-dimensional sub-manifold approximation of $U^\varepsilon$ with the smallest discrepancy values. However, it is unclear how to extend this approach to higher dimensions. Another approach, given by Transtrum and Qiu (2014), consists in successively eliminating one parameter at a time to finally obtain a partial low-dimensional boundary of the uncertainty region. This method is not applicable to high-dimensional problems. For further details and references related to these approaches, we refer the reader to the comprehensive reviews (Miao et al., 2011; Wieland et al., 2021; Lam et al., 2022).

1.3 Main Contribution

We aim to advance the field of local practical identifiability by designing the Jackpot algorithm. It is applicable to high-dimensional problems and yields a manifold of arbitrary (but preferably small) dimension $D$. This manifold is constructed so as to provide a good approximation of the $\varepsilon$-uncertainty region in Hausdorff distance, regardless of the nonlinear and non-algebraic nature of the model. A simple illustration of its output is given in Fig. 5.
The proposed algorithm is based on automatic differentiation, allowing us to tackle high-dimensional problems. It relies on the Python package deepinv (Tachella et al., 2025). In particular, we illustrate its behavior on problems involving dynamical systems and neural networks. The numerical experiments supporting these results, together with the implementation of the Jackpot algorithm, are available in the corresponding GitHub repository.

2. The Jackpot Algorithm and its Guarantees

Our main goal is to describe $U_\delta^\varepsilon = \{x \in B(x^\star, \delta),\ \|\Phi(x) - \Phi(x^\star)\|_2 \le \varepsilon\}$, the local $\varepsilon^2/2$-sublevel set of the function

$$F(x) \stackrel{\mathrm{def}}{=} \frac{1}{2} \|\Phi(x) - \Phi(x^\star)\|_2^2. \qquad (6)$$

In what follows, to measure how well we approximate the set $U_\delta^\varepsilon$, we will use the Kolmogorov and Hausdorff distances, which we define next.

Definition 1 (D-width or Kolmogorov distance) Let $G(D, N)$ denote the set of $D$-dimensional vector subspaces in dimension $N$ (the Grassmannian). The Kolmogorov distance $\delta_D$ of a set $S \subseteq \mathbb{R}^N$ is defined by:

$$\delta_D(S) \stackrel{\mathrm{def}}{=} \inf_{V \in G(D,N)} \sup_{x \in S} \inf_{v \in V} \|x - v\|_2. \qquad (7)$$

It measures how well a set $S$ can be approximated by a $D$-dimensional subspace. Let us also recall the notion of Hausdorff distance.

Definition 2 (Hausdorff distance) For two subsets $X, Y$ of a metric space $(M, d)$, the Hausdorff distance is defined as

$$d_H(X, Y) \stackrel{\mathrm{def}}{=} \max\left\{ \sup_{x \in X} d(x, Y),\ \sup_{y \in Y} d(X, y) \right\} \qquad (8)$$
$$= \inf_{\eta \ge 0} \left\{ X \subseteq Y + B(0, \eta) \ \text{and}\ Y \subseteq X + B(0, \eta) \right\}. \qquad (9)$$

This distance intuitively measures how much one set must be dilated to include the other, and vice versa.

In this section, we start by proposing a linear manifold approximation $\widetilde{\mathcal{M}}_\delta^\varepsilon$ obtained from a quadratic approximation of $F$ at $x^\star$. This allows us to present the main approximation results in a simplified format. We then turn to the construction of a more precise nonlinear manifold approximation $\mathcal{M}_\delta^\varepsilon$. We describe its approximation guarantees, and a practical algorithm to compute it.
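For finite point clouds, which is what all discretized approximations in this paper produce, the Hausdorff distance of Definition 2 can be evaluated directly from the pairwise distances. A minimal numpy sketch (the function name `hausdorff` is ours):

```python
import numpy as np

def hausdorff(X, Y):
    """Hausdorff distance (Definition 2) between two finite point sets,
    given as (n, d) and (m, d) arrays."""
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)  # pairwise
    return max(D.min(axis=1).max(), D.min(axis=0).max())

# A discretized segment versus a translated copy: the distance is the offset.
X = np.stack([np.linspace(0.0, 1.0, 101), np.zeros(101)], axis=1)
Y = X + np.array([0.0, 0.3])
print(hausdorff(X, Y))  # 0.3
```

The quadratic memory cost of the pairwise matrix is fine for visualization-sized grids; for very large clouds a KD-tree nearest-neighbor query would replace the dense computation.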
2.1 Linear Approximation

Under a $C^2$ regularity assumption, we can approximate $F$ by a quadratic function. Most terms in the Taylor expansion vanish since $x^\star$ is a minimizer, and for any $h \in \mathbb{R}^N$ we obtain:

$$F(x^\star + h) = \frac{1}{2} \|J_\Phi(x^\star)\, h\|_2^2 + o(\|h\|_2^2). \qquad (10)$$

This motivates us to consider the quadratic approximation of $F$ at $x^\star$:

$$\widetilde F(x) \stackrel{\mathrm{def}}{=} \frac{1}{2} \|J_\Phi(x^\star)(x - x^\star)\|_2^2. \qquad (11)$$

The linearized uncertainty region can be described as the $\varepsilon^2/2$-sublevel set of this function, denoted $\widetilde U^\varepsilon = \{x \in \mathbb{R}^N,\ \widetilde F(x) \le \varepsilon^2/2\}$. Similarly to the definition of $U_\delta^\varepsilon$, we also set $\widetilde U_\delta^\varepsilon = \{x \in B(x^\star, \delta),\ \widetilde F(x) \le \varepsilon^2/2\}$.

2.1.1 Approximation Result

In the linear case, the best approximation of $\widetilde U^\varepsilon$ by a $D$-dimensional subspace is explicit using the singular vectors of the Jacobian matrix of $\Phi$. We can indeed decompose the Jacobian as $J_\Phi(x) = U(x)\Sigma(x)V^T(x)$, where $U(x) \in \mathbb{R}^{M \times M}$ and $V(x) \in \mathbb{R}^{N \times N}$ are orthogonal matrices and $\Sigma(x) \in \mathbb{R}^{M \times N}$ is a rectangular diagonal matrix with nonnegative and decaying entries. To simplify the notation, we let $J^\star \stackrel{\mathrm{def}}{=} J_\Phi(x^\star) = U^\star \Sigma^\star V^{\star,T}$, where $U^\star = [u_1^\star, \ldots, u_M^\star]$, $V^\star = [v_1^\star, \ldots, v_N^\star]$, $\mathrm{diag}(\Sigma^\star) = [\sigma_1^\star, \ldots, \sigma_{R^\star}^\star, 0, \ldots, 0]$, and $R^\star$ is the rank of $J^\star$. By convention, we let $\sigma_{R^\star+1}^\star = \sigma_{R^\star+2}^\star = \cdots = \sigma_N^\star = 0$. The following description is standard.

Proposition 3 (From Johansen, 1977) The sublevel set $\widetilde U^\varepsilon$ is a (degenerate) hyper-ellipsoid with axes aligned with the vectors $(v_n^\star)$. The length of the ellipsoid along the semi-axis $v_n^\star$ is given by $\varepsilon/\sigma_n^\star$ for $n \in \{1, \ldots, R^\star\}$, and it is infinite for the other axes.

The following result is rather standard in approximation theory, see e.g. (Pinkus, 2012, Thm. 1.3). First, the uncertainty region is approximated by an affine subspace, and then by a truncation of this subspace, which gives the approximating manifold $\widetilde{\mathcal{M}}_\delta^\varepsilon$ of $\widetilde U_\delta^\varepsilon$. Guarantees on the approximation distances are also provided; the proof is given in Appendix A.2.1.
Proposition 4 Firstly, the $D$-dimensional affine space with $D \ge \dim \ker J_\Phi(x^\star)$ that best approximates $\widetilde U^\varepsilon$ for the Kolmogorov distance is given by

$$T_{x^\star} \stackrel{\mathrm{def}}{=} x^\star + V_D^\star \quad \text{with} \quad V_D^\star \stackrel{\mathrm{def}}{=} \mathrm{span}\left(v_{N-D+1}^\star, \ldots, v_N^\star\right). \qquad (13)$$

The Kolmogorov distance is given by $\delta_D(\widetilde U^\varepsilon - x^\star) = \varepsilon/\sigma_{N-D}^\star$. Secondly, for any $\delta > 0$, we define the set

$$\widetilde{\mathcal{M}}_\delta^\varepsilon \stackrel{\mathrm{def}}{=} \left\{ x \in T_{x^\star};\ \|\Sigma^\star V^{\star,T}(x - x^\star)\|_2^2 \le \varepsilon^2 \ \text{and}\ \|x - x^\star\|_2 \le \delta \right\}. \qquad (14)$$

It satisfies the following equality:

$$d_H(\widetilde{\mathcal{M}}_\delta^\varepsilon, \widetilde U_\delta^\varepsilon) = \min\left( \varepsilon/\sigma_{N-D}^\star,\ \delta \right). \qquad (15)$$

This linearized approximate set $\widetilde{\mathcal{M}}_\delta^\varepsilon$ was already described in Fernández-Martínez and Fernández-Muñiz (2020). The bounds in Proposition 4 are inversely proportional to the $(N-D)$-th singular value of the Jacobian of $\Phi$ at $x^\star$. As such, the linearized uncertainty region $\widetilde U_\delta^\varepsilon$ can be well approximated by a low-dimensional (i.e., small $D$) manifold $\widetilde{\mathcal{M}}_\delta^\varepsilon$ when the singular spectrum of the Jacobian $J_\Phi(x^\star)$ admits few small singular values. A general discussion related to this is provided in Section 2.2.4 (see also Fig. 6).

2.1.2 Lowest Singular Vectors Computation

The results above show that the singular values and singular vectors of the Jacobian $J^\star = J_\Phi(x^\star)$ play a critical role in the description of the set $\widetilde U^\varepsilon$. To compute them, two operations are required: i) evaluating the Jacobian matrix or matrix-vector products with it, and ii) computing the singular value decomposition.

Jacobian computation. Computing matrix-vector products with the Jacobian can be achieved efficiently using automatic differentiation algorithms. In PyTorch for instance, this can be achieved through the torch.jvp and torch.vjp functions. If the dimension $N$ or $M$ is sufficiently small, then the whole Jacobian matrix $J^\star$ can be evaluated and stored. If the dimensions are too large, then automatic differentiation algorithms make it possible to compute left and right matrix-vector products, i.e. products of the form $J^\star v$ or $u^T J^\star$ for arbitrary directions $u \in \mathbb{R}^M$ and $v \in \mathbb{R}^N$.
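When the Jacobian is small enough to store, the subspace $V_D^\star$ of Proposition 4 and the associated Kolmogorov distance follow from a dense SVD. A minimal numpy sketch, assuming $M \ge N$ so that all $N$ singular values are returned (the function name is ours):

```python
import numpy as np

def linear_uncertainty_subspace(J, D, eps):
    """Proposition 4: the D rightmost singular vectors of J span the best
    D-dimensional approximation of the linearized uncertainty region, and
    the Kolmogorov distance is eps / sigma_{N-D}. Assumes M >= N."""
    U, s, Vt = np.linalg.svd(J)          # s sorted in decreasing order
    N = J.shape[1]
    V_D = Vt[-D:].T                      # (N, D) basis of span(v_{N-D+1..N})
    kolmogorov = eps / s[N - D - 1]      # eps / sigma_{N-D} (0-based index)
    return V_D, kolmogorov

# One nearly flat direction: the ellipsoid is very elongated along e_3.
J = np.diag([3.0, 1.0, 1e-2])
V_D, dist = linear_uncertainty_subspace(J, D=1, eps=0.1)
# V_D selects (up to sign) the flat direction e_3; dist = 0.1 / sigma_2 = 0.1.
```

In the matrix-free regime, the same quantities would be obtained from an iterative eigensolver fed with the $J^\star v$ and $u^T J^\star$ products described above, rather than from a dense SVD.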
Singular vectors computation. Computing the singular pairs of $J^\star$ or the eigenpairs of $J^{\star,T} J^\star$ are two equivalent problems. The latter is known as a symmetric eigenvalue problem (Parlett, 1998). Several algorithms, called eigensolvers (Liu et al., 2020), have been developed for this purpose. If the matrix is small enough, traditional decomposition methods based on the QR algorithm can be used. In high dimension, we adopted the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method (Knyazev, 2001; Duersch et al., 2018). It is matrix-free: it only requires matrix-vector products and does not process the entire matrix. The computation is achieved by optimizing the generalized Rayleigh quotient. For $X \in \mathbb{R}^{N \times D}$, this quotient is defined as

$$f(X) = \mathrm{Tr}\left( \left(X^T X\right)^{-1} (J^\star X)^T J^\star X \right). \qquad (16)$$

As such, finding the $D$ lowest singular vectors of $J^\star$ consists in solving $\operatorname*{argmin}_{X \in \mathrm{St}(N,D)} f(X)$, where $\mathrm{St}(N, D) \stackrel{\mathrm{def}}{=} \{X \in \mathbb{R}^{N \times D};\ X^T X = \mathrm{Id}_D\}$ is the set of $N \times D$ orthonormal matrices, also called the Stiefel manifold. The LOBPCG algorithm can be interpreted as a manifold gradient descent (Absil et al., 2008) with a carefully designed step-size based on the previous iterates.

2.2 Nonlinear Approximation

We now turn to the approximation of the sublevel sets of $F$ without resorting to a quadratic approximation: we want to find a $D$-dimensional manifold $\mathcal{M}_\delta^\varepsilon$ that approximates $U_\delta^\varepsilon$ well.

2.2.1 The Jackpot Manifold

The linear subspace $T_{x^\star} = x^\star + V_D^\star$ can be a poor approximation of $U^\varepsilon$ when $\Phi$ is nonlinear. However, it is a natural tangent space for the set $\mathcal{M}_\delta^\varepsilon$ at $x^\star$. The idea behind our algorithm consists in bending this tangent space to better fit the uncertainty region $U_\delta^\varepsilon$. A similar idea was proposed by Rheinboldt (1996) for polynomials. However, the algorithm was limited to three dimensions and could not visualize the uncertainty region, as it was constrained to the case where $\varepsilon = 0$.
In mathematical terms, we want to find a one-to-one map $\gamma$ that sends points of $T_{x^\star}$ onto $U^\varepsilon$. This can be achieved thanks to the following optimization problem:

$$\gamma(z) \stackrel{\mathrm{def}}{=} \operatorname*{argmin\text{-}loc}_{V_D^{\star,T}(x - x^\star) = z} \ \frac{1}{2} \|\Phi(x) - \Phi(x^\star)\|_2^2, \qquad (P_z)$$

where

$$V_D^\star \stackrel{\mathrm{def}}{=} [v_{N-D+1}^\star, \ldots, v_{N-1}^\star, v_N^\star] \in \mathbb{R}^{N \times D}. \qquad (17)$$

The constraint $V_D^{\star,T}(x - x^\star) = z$ means that the projection of $\gamma(z)$ on the tangent space $T_{x^\star}$ has coordinate $z$. It ensures the injectivity of the map $\gamma$. The argmin-loc notation refers to a local minimizer near $x^\star$. The main tools appearing in this description are depicted in Fig. 3.

A simple illustration. To illustrate this principle, we consider the mapping $\Phi(x) = \frac{1}{2}\|x\|_2^2$ and set $x^\star = e_1$, where $(e_1, \ldots, e_N)$ is the canonical basis. It is a basic example from a broader context named algebraic implicit curve parameterization. For this specific function, the set $U^0$ is a sphere of radius 1 centered at 0. The tangent space is the hyperplane orthogonal to $e_1$ passing through $x^\star$. A simple computation yields

$$\gamma(z) = \sqrt{1 - \|z\|_2^2}\; e_1 + \sum_{n=2}^{N} z_n e_n.$$

We see that the proposed parameterization allows us to recover the half sphere, which is illustrated in Fig. 3.

Figure 3: Summary of Jackpot (Jacobian Kernel Projection Optimization). In this example, we set $\Phi(x) = \|x\|_2^2$, $\varepsilon = 0$ and $x^\star = e_1$. In this setting, the uncertainty region $U^\varepsilon$ is just a unit sphere, and the affine space $T_{x^\star}$ is the tangent space to $U^\varepsilon$ at $x^\star$. The uncertainty region is obtained by projecting the tangent plane onto $U^\varepsilon$.

This example also shows the importance of considering local minimizers, since problem $(P_z)$ admits two global minimizers. We provide a few extra toy examples in Fig. 5.

Theorem 5 (Well-definedness of the parameterization $\gamma$) Let $\Phi : \mathbb{R}^N \to \mathbb{R}^M$ be a $C^1$ map, $x^\star \in \mathbb{R}^N$, and let $D \ge \dim \ker J_\Phi(x^\star)$ denote an integer. Then there exists a neighborhood $\mathcal{V} \subseteq \mathbb{R}^D$ of $0_D$ such that $(P_z)$ admits a unique solution $\gamma(z)$ for $z \in \mathcal{V}$.
Moreover, the mapping $\gamma : \mathcal{V} \to \mathbb{R}^N$ verifies $\gamma(0_D) = x^\star$ and has the following properties:
- If $\Phi$ is of class $C^2$, then $\gamma$ is of class $C^1$.
- If $\Phi$ is of class $C^1$ and $J_\Phi$ is locally Lipschitz and definable, then $\gamma$ is a locally Lipschitz definable function.

Proof The proof is provided in Appendix A.3.1.

Remark 6 The second condition in Theorem 5 is weaker than the first one and allows us to handle a larger class of operators. The notion of definable function (Van den Dries, 1998; Coste, 2000) generalizes the notion of (implicitly defined) semi-algebraic function. It encompasses all functions commonly considered in engineering. We refer the reader to (Bolte et al., 2021, Appendix A.2) for a precise definition.

Main definition. The mapping $\gamma$ provides a good candidate to approximate the uncertainty region $U_\delta^\varepsilon$, as follows.

Definition 7 (The Jackpot manifold) For $\delta > 0$ sufficiently small, we define the set

$$\mathcal{M}_\delta^\varepsilon \stackrel{\mathrm{def}}{=} \left\{ \gamma(z);\ z \in \mathbb{R}^D,\ \|\Phi(\gamma(z)) - \Phi(x^\star)\|_2 \le \varepsilon \ \text{and}\ \|\gamma(z) - x^\star\|_2 \le \delta \right\}, \qquad (18)$$

and we refer to $\gamma$ as the parameterization of $\mathcal{M}_\delta^\varepsilon$.

From Theorem 5, this set is well defined whenever $\Phi$ is of class $C^1$. Moreover, the Jackpot set recovers the local structural non-identifiabilities (i.e., $U_\delta^0$).

Theorem 8 Let $\Phi$ be of class $C^1$ and $D = \dim \ker J_\Phi(x^\star) > 0$. Moreover, assume that $\dim \ker J_\Phi(x)$ remains constant in a neighborhood of $x^\star$. Then there exists $\delta > 0$ such that Jackpot recovers the uncertainty region locally, that is, $\mathcal{M}_\delta^0 = U_\delta^0$.

All the preceding results hold under the assumption that $\Phi$ is of class $C^1$. In this case, the set $\mathcal{M}_\delta^\varepsilon$ is well-defined, but establishing whether it admits a manifold structure is nontrivial and remains an open problem. However, when $\Phi$ is of class $C^2$, one can additionally prove that $\mathcal{M}_\delta^\varepsilon$ indeed has a smooth manifold structure.

Theorem 9 Let $\Phi$ be of class $C^2$ and $D \ge \dim \ker J_\Phi(x^\star)$. Then $\mathcal{M}_\delta^\varepsilon$ defines a manifold.

The proofs of Theorems 8 and 9 are provided in Appendix A.3.2.
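The closed-form illustration of Section 2.2.1 ($\Phi(x) = \frac{1}{2}\|x\|_2^2$, $x^\star = e_1$) can be reproduced by solving $(P_z)$ numerically. In this special case $V_D^\star = [e_2, \ldots, e_N]$, so the constraint simply fixes the last coordinates to $z$ and a single free coordinate remains; the local solve is warm-started at $x^\star$, which selects the correct branch among the two global minimizers. A sketch with scipy (naming is ours):

```python
import numpy as np
from scipy.optimize import minimize

N = 3
x_star = np.zeros(N)
x_star[0] = 1.0                                 # x* = e_1 on the unit sphere
phi = lambda x: 0.5 * (x @ x)                   # Phi(x) = 0.5 ||x||_2^2
F = lambda x: 0.5 * (phi(x) - phi(x_star)) ** 2

def gamma(z):
    """Solve (P_z) when V_D spans e_2..e_N: the constraint pins the last
    coordinates to z, leaving x_1 as the only free variable; the local
    solve starts from x*_1 = 1 (positive branch of the sphere)."""
    obj = lambda t: F(np.concatenate([t, z]))
    res = minimize(obj, [x_star[0]])
    return np.concatenate([res.x, z])

z = np.array([0.3, -0.2])
x = gamma(z)
expected = np.concatenate([[np.sqrt(1.0 - z @ z)], z])  # closed-form gamma(z)
```

Starting the local solve from the antipodal point $-e_1$ would converge to the other global minimizer, which is exactly why $(P_z)$ is posed with argmin-loc.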
2.2.2 Approximation Guarantees
In this section, we focus on the case of Φ of class C². We show in Theorem 12 that Mε_δ provides a good approximation of Uε_δ.
Assumption 10 (A simple approximation condition) Let S_z := { x ∈ Uε_δ , (V*_D)ᵀ(x − x*) = z } denote the slice of Uε_δ which has coordinate z on T_x*. We assume that there exists η > 0 such that

sup_{z ∈ R^D, ‖γ(z) − x*‖₂ ≤ δ} diam(S_z) ≤ η.   (19)

This assumption is illustrated in Fig. 4. Notice that the value η > 0 can be large even if the set is thin, when the set is strongly bent with respect to T_x*. For a C² mapping Φ, the thickness η can be controlled by regularity properties, as shown in the following proposition, the proof of which is deferred to Appendix A.3.2.
Figure 4: An illustration of Assumption 10. The slices S_z of Uε_δ have a diameter uniformly bounded by η.
Proposition 11 (Uniformly bounded curvature) Assume that Φ is of class C¹ and that D ≤ dim ker(JΦ(x*)) = N − R, where R denotes the rank of JΦ(x*). Assume moreover that its Jacobian JΦ is L-Lipschitz over the ball B(x*, δ), i.e.,

∀x, x′ ∈ B(x*, δ), ‖JΦ(x) − JΦ(x′)‖_op ≤ L ‖x − x′‖₂,   (20)

where ‖·‖_op is the operator norm. Finally, suppose that the Lipschitz constant L satisfies Lδ < σ*_{N−D}, where σ*_{N−D} is the (N−D)-th singular value of JΦ(x*). Then Assumption 10 holds with

η = 2ε / (σ*_{N−D} − Lδ).   (21)

From Proposition 11, we get that for a twice differentiable mapping Φ, Assumption 10 is always satisfied for sufficiently small δ (i.e., sufficiently locally). We are now in a position to show how well Mε_δ approximates Uε_δ in terms of the Hausdorff distance defined in (2). The proof is provided in Appendix A.3.2.
Theorem 12 (Approximation guarantees) Let Φ be of class C² and D ≤ dim ker JΦ(x*). Then, under Assumption 10, Mε_δ is a good approximation manifold of Uε_δ in the sense that

d_H(Mε_δ, Uε_δ) ≤ η,   (22)

where d_H is defined in (2). More precisely, we have Mε_δ ⊂ Uε_δ ⊂ Mε_δ + B(0, η).
(23)
Remark 13 A few remarks are in order:
In the linear case, since the Hessian is null, any radius δ > 0 suits as long as the dimension D satisfies D ≤ dim ker JΦ(x*).
An upper bound on the Hessian norm ‖HΦ‖_{2,δ} is sup_{x ∈ B(x*,δ)} ‖(σ_max(HΦ_m(x)))_{m ≤ M}‖₂, with Φ_m being the m-th component of Φ.
2.2.3 Numerical Computation
From a numerical perspective, we have to discretize the manifold Mε_δ. This can be done by evaluating (Pz) on a discrete subset Z = {z_k}_{0 ≤ k ≤ K} of R^D. We let Mε_δ(Z) denote the discretized manifold:

Mε_δ(Z) := Mε_δ ∩ {γ(z); z ∈ Z}.   (24)

The main problem now reduces to evaluating γ at the grid points of Z satisfying the constraints. The principle is to propagate a front starting from z = 0_D with a grid-search algorithm and to solve (Pz) at each z. A pseudo-code is given in Algorithm 1. We detail the two steps below.
Grid search In this work, we simply discretize R^D with a Cartesian grid. Other choices, such as the honeycomb pattern, are possible but have not been explored. Fixing a grid length s > 0, we define the Cartesian grid

Z_s := sZ^D ∩ B(0_D, δ).   (25)

Since the initial point γ(0_D) = x* is known, the grid search starts from the point 0_D ∈ Z. Solving (Pz) can be achieved with first-order methods, which require a starting point. Since γ is a smooth mapping, a good approximation of γ(z) is γ(z′) for some z′ where γ has already been computed. This leads us to consider a grid-search method starting from 0_D ∈ Z and passing from neighbor to neighbor. The Breadth-First Search (BFS) algorithm is well suited for this task. We add a graph structure G = (Z, E) (where E is the set of edges) on the grid in order to apply this algorithm. Given this structure, the natural choice of initial guess for the optimization at a not-yet-explored node is the closest already-computed neighbor.
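The grid Z_s, its neighbor graph, and the resulting BFS visiting order can be built in a few lines. The sketch below is our own illustration (the helper names `build_grid` and `bfs_order` are not from the paper):

```python
import numpy as np
from collections import deque
from itertools import product

def build_grid(s, delta, D):
    """Cartesian grid Z_s = s*Z^D intersected with the ball B(0_D, delta)."""
    k_max = int(np.floor(delta / s))
    pts = [s * np.array(k, dtype=float)
           for k in product(range(-k_max, k_max + 1), repeat=D)]
    return [p for p in pts if np.linalg.norm(p) <= delta]

def bfs_order(grid, s):
    """Visit grid nodes from 0_D outwards; pair each new node with the
    already-visited neighbor that will serve as warm start for (Pz)."""
    key = lambda p: tuple(np.round(p / s).astype(int))
    index = {key(p): i for i, p in enumerate(grid)}
    D = len(grid[0])
    start = index[(0,) * D]
    seen, order, queue = {start}, [], deque([start])
    while queue:
        i = queue.popleft()
        for d in range(D):
            for sign in (-1, 1):
                k = list(key(grid[i]))
                k[d] += sign
                j = index.get(tuple(k))
                if j is not None and j not in seen:
                    seen.add(j)
                    order.append((j, i))  # (new node, warm-start neighbor)
                    queue.append(j)
    return order

grid = build_grid(s=0.25, delta=0.6, D=2)
order = bfs_order(grid, s=0.25)
```

Each pair in `order` links a node to an adjacent, already-computed one, which is exactly the warm-start structure the text describes.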
The BFS method also makes it possible to stop the search in the neighborhood of a vertex whenever one of the two criteria defining (24) is violated.
Optimization algorithm Each step of Algorithm 1 requires solving the optimization problem (Pz). The following result is obtained by inserting the linear constraint into the objective function. Its proof is provided in Appendix A.4.1.
Proposition 14 Let Π := Id_N − V*_D (V*_D)ᵀ be the orthogonal projection on the subspace (Im V*_D)⊥ and consider the following optimization problem for z ∈ R^D:

η*(z) := argmin-loc_{x ∈ R^N} (1/2) ‖Φ(x* + V*_D z + Π x) − Φ(x*)‖₂².   (P′z)

Then there exists a neighborhood V ⊂ R^D of 0_D such that the solution γ of (Pz) verifies, for z ∈ V,

γ(z) = x* + V*_D z + Π η*(z).   (26)

Moreover, if Φ is of class C², a gradient descent algorithm applied to (P′z) converges linearly provided the step size is properly chosen (Polyak, 1987).

Algorithm 1 BFS parameterization of Mε_δ(Z)
Inputs of the model:
  Φ : R^N → R^M a C¹ mapping.
  x* ∈ R^N an estimate of the input parameter.
  1 ≤ D < N the manifold dimension.
  V*_D ∈ R^{N×D} the D lowest right singular vectors of JΦ(x*).
Inputs of the parameterization:
  δ > 0 a radius around x* (to enforce ‖x − x*‖₂ ≤ δ).
  ε > 0 a discrepancy threshold (to enforce ‖Φ(x) − Φ(x*)‖₂ ≤ ε).
  G = (Z, E) a connected graph with 0_D ∈ Z ⊂ R^D.
Initialization: η*(0_D) ← 0_N, γ(0_D) ← x*.
repeat
  (z, z_prev) ← the couple of (non-evaluated, evaluated) neighboring vertices with smallest distance.
  η*(z) ← solution of (P′z) initialized at η*(z_prev) with an L-BFGS method.
  γ(z) ← x* + V*_D z + (Id_N − V*_D (V*_D)ᵀ) η*(z).
  if ‖Φ(γ(z)) − Φ(x*)‖₂ > ε or ‖γ(z) − x*‖₂ > δ then
    Remove vertex z from Z and all its edges from E.
  end if
until no couple (z, z_prev) exists
Return {γ(z)}_{z ∈ Z}

To further accelerate the computation of γ(z), we propose to use the L-BFGS method on (P′z), a quasi-Newton algorithm. It accelerates convergence through a preconditioning based on an estimate of the Hessian built from the gradients at each step.
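Putting Proposition 14 and the BFS front together, a minimal one-dimensional sketch of Algorithm 1 (our own toy implementation, not the authors' code) for Φ(x) = (1/2)‖x‖₂² in R² reads:

```python
import numpy as np
from scipy.optimize import minimize

def phi(x):
    return np.array([0.5 * np.dot(x, x)])  # Phi : R^2 -> R^1

x_star = np.array([1.0, 0.0])
V_D = np.array([[0.0], [1.0]])   # lowest right singular vector of JPhi(x*) = x*^T
Pi = np.eye(2) - V_D @ V_D.T     # orthogonal projection onto (Im V_D)^perp

def solve_Pz(z, eta0):
    """(P'z): minimize 0.5*||Phi(x* + V_D z + Pi x) - Phi(x*)||^2, warm-started."""
    obj = lambda w: 0.5 * np.sum((phi(x_star + V_D @ z + Pi @ w) - phi(x_star)) ** 2)
    res = minimize(obj, eta0, method="L-BFGS-B",
                   options={"ftol": 1e-18, "gtol": 1e-12})
    return res.x

eps, delta, s = 1e-5, 0.95, 0.05
manifold = {0.0: x_star}
eta = {0.0: np.zeros(2)}
for direction in (+1.0, -1.0):   # 1D BFS front: walk both ways from z = 0
    z_prev = 0.0
    while True:
        z = z_prev + direction * s
        eta[z] = solve_Pz(np.array([z]), eta[z_prev])  # warm start at the neighbor
        g = x_star + V_D @ np.array([z]) + Pi @ eta[z]
        if (np.linalg.norm(phi(g) - phi(x_star)) > eps
                or np.linalg.norm(g - x_star) > delta):
            break                # stop the front when a criterion of (18) fails
        manifold[z] = g
        z_prev = z
```

On this toy problem every kept point lies on the unit circle, and the front stops before reaching the equator, consistent with the half-sphere picture of Fig. 3.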
The L-BFGS method produces a sequence of iterates (γ_k(z)). A natural stopping criterion for this sequence is ‖Φ(γ_k(z)) − Φ(x*)‖₂ ≤ νε, with ν < 1 a small constant. Moreover, we also deploy a warm-start strategy over the grid Z to further accelerate the convergence of L-BFGS, leveraging the smoothness of the manifold to obtain good initial points. We summarize the whole process and give a pseudo-code in Algorithm 1.
Approximation result One can extend the approximation result of the previous section to this discrete framework. The proof of the following corollary is provided in Appendix A.4.2.
Corollary 15 Under the same assumptions as in Theorem 12, and using a Cartesian grid Z = Z_s defined in (25), the discretized manifold provides a good approximation of the uncertainty region. Specifically, we have

d_H(Mε_δ(Z), Uε_δ) ≤ η + Ls,

where L > 0 is the Lipschitz constant of the local solution γ to (Pz), d_H is the Hausdorff distance as defined in Definition 2, and η > 0 is the threshold defined in Theorem 12.
(b) Φ = UAUᵀ (c) Φ(x) = ‖x‖₂²
Figure 5: Illustration of the proposed Jackpot method with D = 1 on the same examples as in Fig. 1. The low-dimensional manifold Mε_δ is represented with blue lines. In contrast to the profile likelihood method, the profile is computed along the most suitable direction. In addition, the set Mε_δ is the best possible one-dimensional approximation of the 2-dimensional ellipses in (a) and (b). In example (c), the approximating set Mε_δ precisely coincides with the unit circle (i.e., Uε_δ), except at the extremities.
2.2.4 Discussion
Before turning to the numerical experiments, we describe the strengths and weaknesses of the approach in order to highlight the problems for which it is best suited.
Behavior on the toy examples To begin with, let us illustrate how the algorithm behaves on the three introductory examples of Fig. 1.
We see that the algorithm outputs one-dimensional sets (in blue), which correctly approximate the uncertainty regions in all three cases. The method suffers neither from the limitation of the choice of coordinates, as profile likelihood methods do, nor from the instability of Monte Carlo-based methods when the Jacobian of Φ is degenerate.
Adapting to large dimensions Since the LOBPCG algorithm is matrix-free and combined with automatic differentiation, our algorithm scales to large models (up to millions of parameters). The approximation being of smallest dimension D, it can be represented quite easily, at least for dimensions D = 1, 2, 3. Depending on the geometry of the problem, our method gives a full exploration of the uncertainty region or only a partial one, as shown in Fig. 6.
When is it most useful? From our theoretical analysis in Section 2.2.2, one can see that the low-dimensional manifold Mε_δ is a good approximation of Uε_δ provided that D is sufficiently large relative to the number of low singular values of JΦ(x*). Yet, for large-scale problems, it is not possible to set D too large due to computational limitations. This leads us to classify problems in three categories depending on the Jacobian singular spectrum, as illustrated in Fig. 6. When the Jacobian is well-conditioned (flat spectrum, Fig. 6.a), we have practical local identifiability and the Jackpot method will detect it. When the Jacobian is badly conditioned, we distinguish two cases.
Figure 6: Relation between the Jacobian singular spectrum and the manifold approximation of the uncertainty region Uε. (a) No small singular values: the problem is identifiable. (b) Small number of small singular values. (c) Large number of small singular values: the set Uε cannot be approximated by a low-dimensional manifold.
If the spectrum admits few low singular values with a sharp transition (Fig.
6.b), Uε_δ can be well approximated by a low-dimensional manifold and the Jackpot method will compute it. Otherwise (e.g., Fig. 6.c), Uε_δ cannot be well approximated by a low-dimensional manifold. In this situation, the manifold Mε_δ computed by Jackpot (with small D) will only partially represent Uε_δ. Notice that in the last case, we cannot expect to sketch the uncertainty set efficiently, due to its intrinsically high-dimensional nature.
Stability The Jackpot algorithm depends heavily on the subspace spanned by the D lowest right singular vectors of the Jacobian JΦ(x*). This subspace defines the tangent space of the constructed manifold at x*. The stability of this subspace to perturbations of JΦ(x*) depends on the spectral gap between the D-th and (D+1)-th smallest singular values, that is, on the quantity σ_{N−D} − σ_{N−D+1} (Stewart and Sun, 1990). When this gap is large, the D-dimensional subspace is well separated from the remaining directions, and small errors in JΦ(x*) (due, e.g., to numerical differentiation or stochastic estimation) lead to only small perturbations of the tangent space. Conversely, when the singular values around the D-th index are clustered or close to zero, the subspace becomes highly sensitive to errors in JΦ(x*). In this unfavorable situation, as explained in the previous paragraph, many manifolds are nearly equivalent in terms of approximation, and Jackpot will simply select one of them.
Adversarial versus natural perturbations Our goal in this paper is to design a manifold that approximates Uε = Φ⁻¹(B(Φ(x*), ε)) in Hausdorff distance. We call the resulting set an adversarial manifold. The directions (v*_{N−D+1}, …, v*_N) represent the perturbation directions that most significantly affect the inverse map Φ⁻¹. These directions also capture the local geometry of Uε most effectively, as seen in Theorem 12.
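The gap dependence described in the Stability paragraph is easy to observe numerically. The sketch below (an illustration we added, following the Stewart-and-Sun perturbation picture) applies the same small perturbation to two Jacobians, one with a large gap σ_{N−D} − σ_{N−D+1} and one with a clustered spectrum, and measures how far the lowest right-singular subspace rotates:

```python
import numpy as np

rng = np.random.default_rng(0)

def lowest_right_subspace(J, D):
    """Orthonormal basis of the span of the D lowest right singular vectors."""
    _, _, Vt = np.linalg.svd(J)
    return Vt[-D:].T

def max_principal_angle(U, W):
    """Largest principal angle between two subspaces with orthonormal bases U, W."""
    s = np.linalg.svd(U.T @ W, compute_uv=False)
    return np.arccos(np.clip(s.min(), 0.0, 1.0))

N, D = 8, 1
Q1, _ = np.linalg.qr(rng.standard_normal((N, N)))
Q2, _ = np.linalg.qr(rng.standard_normal((N, N)))
E = 1e-3 * rng.standard_normal((N, N))  # the same small perturbation for both cases

angles = {}
for label, spectrum in [("large gap", [5, 4, 3, 2, 1.5, 1.2, 1.0, 0.01]),
                        ("tiny gap",  [5, 4, 3, 2, 1.5, 1.2, 0.011, 0.010])]:
    J = Q1 @ np.diag(spectrum) @ Q2.T
    V = lowest_right_subspace(J, D)
    V_pert = lowest_right_subspace(J + E, D)
    angles[label] = max_principal_angle(V, V_pert)
```

With the well-separated spectrum the tangent direction barely moves, while with the clustered spectrum the same perturbation rotates it by orders of magnitude more.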
In contrast, Monte Carlo sampling methods serve a different purpose: they simulate natural perturbations, which are typically observed in real-world scenarios. Given a random vector Y = Φ(x) + B, where B follows some probability distribution µ, they capture the pushforward distribution Φ⁻¹(µ(· − Φ(x))). Points derived from the Jackpot method, however, may possess very low likelihood and would almost never appear through Monte Carlo sampling. This distinction becomes clearer in our numerical experiments. Adversarial and natural perturbations offer distinct advantages, depending on the nature of the uncertainty one aims to describe. Adversarial perturbations (Antun et al., 2020) are critical in scenarios where incorrect decisions may lead to severe consequences, as they identify the worst-case distortions. On the other hand, natural perturbations provide an averaged view of uncertainty, reflecting more common or benign variations.
The limitations of locality Our approximation and our analysis are only local. For instance, in Fig. 3, only half of the circle can be parameterized by our method. Hence the method should only be used locally. Solutions such as a periodic reassessment of the tangent plane could be considered, but we did not explore this rather computationally heavy approach in this paper.
Link with sensitivity analysis The uncertainty region is closely related to, but different from, the sensitivity region, which can also be treated by a slightly modified Jackpot algorithm. Let S : R^M → R^N be a reconstruction mapping from the measurement space R^M to the signal space R^N. We define the sensitivity region as

Sε := S(B(Φ(x*), ε)),   (28)

which corresponds to the set of reconstructions obtained when perturbing the data around Φ(x*) with a noise of norm smaller than ε. In the idealized case where S is a perfect inverse, satisfying Φ(S(y)) = y for all y ∈ R^M, it holds that Sε ⊂ Uε.
(29)
That is, the sensitivity region is a subset of the full uncertainty region, reflecting the additional constraints imposed by the solver, such as regularization or prior information. It captures the residual non-identifiability under the reconstruction method. Both Uε and Sε provide complementary views of the estimation problem:
The set Uε quantifies the fundamental ambiguity in the inverse problem, independent of any reconstruction algorithm.
The set Sε reflects the actual spread of outputs produced by a specific solver when facing data perturbations.
The direct extension of the Jackpot algorithm to sensitivity analysis consists in defining the manifold

γ(z) = argmax-loc_{y ∈ C(z)} ‖S(y) − S(y*)‖₂²,   (30)

where C(z) is an affine subspace of dimension M − D, centered at y* + V_D z, and oriented along the orthogonal complement of the subspace spanned by the D dominant right singular vectors of the Jacobian JS(y*). More precisely,

C(z) = y* + V_D z + ran(V_D)⊥,   (31)

where V_D = [v_1, v_2, …, v_D] ∈ R^{M×D} is the matrix formed by the top D right singular vectors of JS(y*), and y* = Φ(x*). However, the optimization problem (30) may not admit a solution in general, as it involves maximizing over an unbounded set. To overcome this issue, we introduce a compact alternative:

C(z) = { y ∈ R^M : ‖y − y*‖₂ = ‖z‖₂, y ∈ y* + cone(V_D z) + ran(V_D)⊥ },   (32)

where cone(V_D z) = {t V_D z : t ∈ R₊} is the half-line in the direction V_D z. The resulting set C(z) is compact: it is a spherical cap (half-sphere) of radius ‖z‖₂, centered at y*, intersected with an (M + 1 − D)-dimensional subspace. We choose the cone rather than the full span to capture the possible asymmetry of the sensitivity region around y*. For intuition, consider the case D = 1. Then the curve γ(z) corresponds to the direction of maximal sensitivity, i.e., the worst-case adversarial perturbation at fixed radius. More precisely, we have

{γ(z), γ(−z)} ⊂ argmax-loc_{‖y−y*‖₂=‖z‖₂} ‖S(y) − S(y*)‖₂².
(33)
Up to this point, we have used the definition of C(z) in (32). This alternative compact version is equally suitable for both uncertainty and sensitivity analysis. For clarity of exposition, we chose to present the former; however, most of the theoretical analysis and practical conclusions apply to the latter as well.
3. Numerical Experiments
In this section, we provide a few numerical examples on complex problems and compare our method with some alternatives in the three main cases of practical identifiability described in Section 1.1.
3.1 Measuring Masses in the Solar System
In this first example, we address the problem of identifying the masses of five planets of the solar system (Jupiter, Saturn, Uranus, Neptune, and Pluto) from their approximate positions at a few time points. We assume that the planets move according to Newton's law of gravitation. This corresponds to the problem described in Section 1.1.1.
Notation We sort the planets with respect to their distance to the Sun. We let 1 ≤ n ≤ N denote the index of the n-th outer planet and associate the index 0 with the Sun. We let u_n(t) ∈ R³ denote the position in AU of the n-th planet at time t, in a heliocentric frame. Therefore u₀(t) = 0 for all times. We let w_n denote the mass of the n-th planet in solar mass units, so that w₀ = 1. Moreover, we let u(t) = (u₁(t), …, u_N(t)) ∈ R^{N×3} denote the vector of positions and w = (w₁, …, w_N) denote the vector of masses. Time is expressed in years; t = 0 corresponds to January 1, 2000. Initial positions and speeds are taken from IMCCE (2019).
Figure 7: Solar system experiment with Jackpot. (a) Planet trajectories (250 y) and measurements (5 y). (b) Jacobian singular values. (c) Jacobian singular vectors. (d) Misfit evolution. (e) Masses evolution.
After measuring the approximate planet positions for 5 years, we wish to recover their masses. The spectrum of the Jacobian (b) indicates that one direction is particularly uncertain. Looking at the Jacobian singular vectors (c), we see that it corresponds to Pluto's mass. In (d) and (e), we see the evolution of the misfit and of the masses along the 1D set Mε_δ, and observe that all masses are stable (relative to their value), except Pluto's, which can vary by 4 orders of magnitude.
Modeling The solar system can be modeled through the second-order dynamical system

ü_n(t) = − Σ_{0 ≤ n′ ≤ N, n′ ≠ n} w_{n′} (u_n(t) − u_{n′}(t)) / ‖u_n(t) − u_{n′}(t)‖₂³,  1 ≤ n ≤ N.   (34)

To discretize this system, we use a Runge-Kutta scheme of order 4 (Butcher, 2016). This yields a forward mapping Φ of the form

Φ : w ↦ (u(t_k))_{0 ≤ k ≤ K},   (35)

where t_k = k∆t denote uniformly spaced sampling points. The measurements are taken over a period of 5 years at intervals of ∆t = 7 days, leading to K = 260 positions for each planet. Notice that 5 Earth years do not cover a complete revolution for the outer planets; for instance, Pluto's revolution lasts about 250 years. This is illustrated in Fig. 7 (a), where the black lines indicate the measurements. We add white Gaussian noise to the measurements with a standard deviation ε = 1000 km (≈ 7·10⁻⁶ AU, where AU is the Earth-Sun distance). This yields a measurement vector y = Φ(w̄) + b, where w̄ denotes the true planet masses and b is the random perturbation.
Figure 8: Profile likelihood on the solar system problem, showing the misfit ‖Φ(x) − y‖ in AU as a function of each planet's mass (Jupiter, Saturn, Uranus, Neptune, Pluto). The blue stars represent the true values of the parameters. Green segments represent confidence intervals. Black dashed lines represent the error threshold ε. Profile likelihood detects the non-identifiability of Pluto's mass.
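The forward map (35) can be sketched for a single planet around the Sun. The block below is a reduced two-body version of (34) that we wrote for illustration; the unit convention GM☉ = 4π² (AU, years, solar masses) is our assumption, and the planet's own mass w enters through the factor (1 + w):

```python
import numpy as np

GM_SUN = 4 * np.pi**2  # assumption: AU / year / solar-mass units, so GM_sun = 4*pi^2

def accel(u, w):
    """Heliocentric two-body acceleration for a planet of mass w (in solar masses)."""
    r = np.linalg.norm(u)
    return -GM_SUN * (1.0 + w) * u / r**3

def forward(w, u0, v0, dt, K):
    """Phi: mass -> sampled positions, integrated with a classical RK4 scheme."""
    u, v = np.array(u0, float), np.array(v0, float)
    samples = [u.copy()]
    for _ in range(K):
        k1u, k1v = v, accel(u, w)
        k2u, k2v = v + 0.5 * dt * k1v, accel(u + 0.5 * dt * k1u, w)
        k3u, k3v = v + 0.5 * dt * k2v, accel(u + 0.5 * dt * k2u, w)
        k4u, k4v = v + dt * k3v, accel(u + dt * k3u, w)
        u = u + dt * (k1u + 2 * k2u + 2 * k3u + k4u) / 6
        v = v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        samples.append(u.copy())
    return np.concatenate(samples)

# Circular 1 AU orbit: period 1 year, speed 2*pi AU/yr (for w = 0).
traj = forward(0.0, [1.0, 0.0], [0.0, 2 * np.pi], dt=1 / 365, K=365)
```

With a one-week step as in the experiment (here one day for accuracy), the orbit closes after one year, which gives a quick sanity check of the integrator.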
Uncertainty with Jackpot To estimate the masses w, we first solve the nonlinear problem

w* = argmin_{w ∈ R^N} (1/2) ‖Φ(w) − y‖₂²   (36)

using a gradient descent run to high accuracy. The gradient is evaluated through automatic differentiation. This gives a fairly good estimate of Pluto's mass, overestimating the true value by 4%. Then, we run Algorithm 1. The singular values of the Jacobian of Φ are displayed in Fig. 7 (b). As can be seen, the last singular value is nearly ten times smaller than the others. A close look at the singular vectors in Fig. 7 (c) shows that the last singular vector is entirely concentrated on Pluto's mass, indicating that the estimate of this planet's mass is dubious. We then compute a D = 1-dimensional manifold approximation Mε_δ of the uncertainty region with the Jackpot method. As can be seen in Fig. 7, we can vary Pluto's mass within the range [10⁻¹⁰, 10⁻⁶] while staying consistent with the data. In that range, the other planets' masses hardly change (at least relative to their value). This illustrates a huge uncertainty on the mass of Pluto when evaluating it with only 5 years of planetary observations.
Comparisons with other approaches In Fig. 8 we use profile likelihood on this problem to estimate the uncertainty. This problem is particularly well suited to this approach, since only Pluto's mass is uncertain. In this case, we see that all profiles are concentrated on the true planet masses, except for Pluto. There, the profile likelihood method recovers the same uncertainty interval as the one provided by Jackpot. The only difference is that Jackpot was able to find this direction automatically, without having to explore all of them. Also notice that profile likelihood methods would likely bring only partial information on the uncertainty if Pluto were discarded from the set. Indeed, the top-left 4×4 block in Fig. 7 (c) indicates strong correlation effects between the different masses.
We do not show the results of the Monte Carlo sampling method, since it fails on this example. Indeed, the sampling process only provides points that are very close to x*, and the method fails to recover the uncertainty set. This phenomenon is similar to the one observed in Fig. 2 (c): the regression problem to recover the samples x_i is very flat along the direction of Pluto's mass.
3.2 Blind Deblurring
In this section, we explore the problem described in Section 1.1.2. Observing a biological object with a microscope yields diffraction-limited (i.e., blurry) images. When the impulse response, or Point Spread Function (PSF), of the system is unknown, improving the image sharpness requires solving a blind deblurring problem. In this section, we showcase how Jackpot makes it possible to evaluate the uncertainty on the recovered blur kernels and images.
Modeling We consider the classical image formation model

y = k(θ) ⋆ x + b,   (37)

where x ∈ R^M denotes the observed sample, ⋆ denotes the convolution product, b ∈ R^M is a noise vector, and k : R^N → R^M represents the point spread function (PSF) of the system, which depends on parameters θ ∈ R^N. The scalar theory of diffraction (Goodman, 2005) informs us that the PSF k(θ) can be characterized by its pupil function: a two-dimensional function supported on a disk whose radius depends on the numerical aperture of the objective and the wavelength of the collected light. A common practice is to expand this pupil function in a truncated Zernike polynomial basis (Goodman, 2005). With this model, only a few Zernike polynomials are sufficient to describe a rich class of realistic PSFs that includes typical aberrations. Hence, in our model (37), θ ∈ R^N denotes the vector of Zernike coefficients. We fix N = 8 in this work, corresponding to the following optical aberrations: defocus, vertical and oblique astigmatism, trefoil, coma, and primary spherical aberration.
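The pupil-to-PSF model can be sketched as follows. This is a simplified monochromatic illustration we added; the single Zernike mode used here (defocus, ∝ 2ρ² − 1) and the grid sizes are our choices, not the paper's 8-coefficient model:

```python
import numpy as np

def psf_from_zernike(c_defocus, n=64):
    """PSF as |FFT(pupil * exp(i*phase))|^2, with the phase given by one Zernike mode."""
    yy, xx = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
    rho = np.hypot(xx, yy)
    pupil = (rho <= 1.0).astype(float)        # disk support (numerical aperture)
    defocus = 2 * rho**2 - 1                  # Zernike defocus mode (up to normalization)
    field = pupil * np.exp(1j * c_defocus * defocus)
    psf = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    return psf / psf.sum()                    # normalize to unit mass

psf = psf_from_zernike(0.5)
```

Increasing the defocus coefficient spreads the energy of the PSF, lowering its peak, which is the kind of kernel deformation explored along Mε_δ in this experiment.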
The blind inverse problem consists in estimating both θ and x from y. To that end, we follow the approach proposed by Gossard and Weiss (2024). We first train a reconstruction mapping R(θ, y) which computes an estimate ˆx(θ) such that the pair (θ, ˆx(θ)) is coherent with the data y. In this experiment, R is a deep plug-and-play image restoration method (DPIR). It corresponds to a half-quadratic splitting algorithm where the proximal step is replaced by a DRUNet denoiser with pre-trained weights (Zhang et al., 2021). Equipped with this (non-blind) reconstruction mapping R, we want to find the pair (θ, ˆx(θ)) that best fits the data y. That is, we want to minimize ‖k(θ) ⋆ R(θ, y) − y‖₂². Using a Bayesian formalism, this means that we are trying to find the maximum a posteriori pair (ˆθ, ˆx(ˆθ)). This blind deblurring problem can be cast in our framework by using the following forward mapping:

Φ : θ ↦ k(θ) ⋆ R(θ, y).   (38)

In the first row of Fig. 9 we display the input image (a) as well as the noiseless measurements (b) generated for this experiment.
Uncertainty with Jackpot Given y, we first estimate the parameters θ by solving

θ* = argmin_{θ ∈ R^N} (1/2) ‖Φ(θ) − y‖₂²   (39)

using a gradient descent with automatic differentiation. In Fig. 9 (c) and (d) we show the associated PSF k(θ*) and deconvolved image ˆx(θ*) = R(θ*, y). Note that the image appears significantly better resolved than the observation, which suggests that the reconstruction mapping R is well designed. Then, we run Algorithm 1. The singular values of the Jacobian of Φ and the two last singular vectors are displayed in Fig. 9 (e) and (f), respectively. In contrast with the previous experiment, we see no clear gap in the amplitude of the last singular values. The singular vectors clearly mix the different Zernike coefficients, suggesting that profile likelihood methods will struggle to indicate the main directions of uncertainty.
We then compute a D = 2-dimensional manifold approximation Mε_δ of the uncertainty region with the Jackpot method. We obtain a parameterization of the Zernike coefficient vector θ(z) for z ∈ R² with θ(0) = θ*. In Fig. 9 (g), we report the SNR between the outputs Φ(θ(z)) and Φ(θ*). The 40 dB level line is displayed in green. In Fig. 11 we show how the profile likelihood behaves on this problem. The confidence intervals obtained with the profile likelihood method are also displayed as blue segments in Fig. 9 (g); they have been projected on the tangent plane T_θ* for comparison. As can be seen from this plot, the profile likelihood method significantly underestimates the uncertainty domain, since the tips of the blue segments are far from the level-line boundary. In Fig. 10, we provide some zoomed regions of the deblurred images in (a) and of the corresponding PSFs in (b) for different parameters θ(z) ∈ Mε_δ. The corresponding parameters z are shown as red stars in Fig. 9 (g). We see that it is possible to significantly deform the PSFs with a negligible modification of the measurements. We can also observe in Figs. 10 (a) and (b) that some structures of the sample can be hallucinated without affecting the measurements too much. Overall, this shows that the studied blind deblurring process is unstable, even though it produces sharp and nice-looking images for all parameters. Finally, let us mention that the Monte Carlo sampling method strongly underestimates the uncertainty region, as illustrated in Fig. 12. The reason is that we wish to recover a relatively low-dimensional manifold (of dimension less than 8), while the noise is added in the high-dimensional space of color images (200 × 200 × 3).
3.3 Posterior Exploration for an Image Deblurring Problem
In contrast to the previous section, we tackle regular inverse problems where the forward operator A is known perfectly.
Formalism We explore the problem described in Section 1.1.3.
Given y = Ax + b, with b ∼ N(0, σ²Id), we construct the MAP estimator

ˆx_MAP := argmin_{x ∈ R^N} −log p_{x|y}(x|y) = argmin_{x ∈ R^N} (1/(2σ²)) ‖Ax − y‖₂² − log p_x(x).

By first-order optimality conditions, ˆx_MAP satisfies

∇ log p_{x|y}(ˆx_MAP|y) = −(1/σ²) Aᵀ(Aˆx_MAP − y) + ∇ log p_x(ˆx_MAP) = 0.   (40)

The term ∇ log p_x(ˆx_MAP) can be estimated using a learned denoiser and Tweedie's formula (Ho et al., 2020). In this paper, we use the denoiser proposed by Hurault et al. (2021), which makes it possible to compute both the log prior and its gradient.
Figure 9: Blind deblurring experiment. (a) Input image x. (b) Measurements y. (c) PSF k(θ*). (d) Estimate ˆx(θ*). (e) Jacobian singular values (Zernike index). (f) Singular vectors. (g) Output SNR in dB over (z₁, z₂). The reconstructed image ˆx(θ*) = R(θ*, y) in (d) appears significantly sharper than the observation y. There is no clear gap in the lowest singular values in (e), suggesting that the uncertainty domain has a rather large intrinsic dimension. In (f), the Zernike coefficients of the two last singular vectors of J are mixed, meaning that the main uncertainty directions are not aligned with pure optical aberrations. In (g), we evaluate the function SNR(Φ(θ), Φ(θ*)) for θ ∈ Mε_δ. The 40 dB level line, corresponding to a high fidelity, is displayed as a green curve. We also project the confidence intervals obtained with the profile likelihood method as blue segments.
(a) Zoom on blue and green regions in Fig. 9 (a). (b) PSF k(θ(z)).
Figure 10: Blind deblurring identifiability: grid sampling of the 2-dimensional manifold Mε_δ. (a-b) Grid sampling of image differences |ˆx(θ(z)) − x*|. (c) PSF grid sampling k(θ(z)).
Following Section 1.1.3, we set Φ = ∇ log p_{x|y}(·|y). The uncertainty region Uε_δ can now be interpreted as the set of images with smallest gradient amplitude.
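The MAP computation can be sketched with a toy Gaussian prior, for which the MMSE denoiser, and hence the Tweedie score, is available in closed form. Everything in this block is our illustrative assumption: a prior N(0, τ²Id) stands in for the learned denoiser of Hurault et al., and the closed-form ridge solution is only there to check the gradient descent:

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma, tau = 16, 0.1, 1.0
A = rng.standard_normal((N, N)) / np.sqrt(N)
x_true = rng.standard_normal(N)
y = A @ x_true + sigma * rng.standard_normal(N)

def denoise(x, s):
    """MMSE denoiser for the toy prior x ~ N(0, tau^2 Id) under noise level s."""
    return (tau**2 / (tau**2 + s**2)) * x

def score(x, s=1e-2):
    """Tweedie's formula: grad log p_x(x) ~ (D_s(x) - x) / s^2."""
    return (denoise(x, s) - x) / s**2

def grad_neg_log_posterior(x):
    return (A.T @ (A @ x - y)) / sigma**2 - score(x)

# Plain gradient descent run to a near-zero gradient, as in the experiment.
x = np.zeros(N)
step = 1e-3
for _ in range(20000):
    x = x - step * grad_neg_log_posterior(x)

# For this Gaussian model the MAP is ridge regression (with the Tweedie
# regularization weight 1/(tau^2 + s^2), s = 1e-2), so we can verify it.
x_ridge = np.linalg.solve(A.T @ A / sigma**2 + np.eye(N) / (tau**2 + 1e-4),
                          A.T @ y / sigma**2)
```

The learned denoiser used in the paper replaces `denoise` above; the fixed point of the gradient descent then satisfies the optimality condition (40).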
We compute ˆx_MAP using a gradient descent run for many iterations to reach a near-zero gradient.
Figure 11: Profile likelihood on the blind inverse problem, showing the output discrepancy in dB as a function of each Zernike coefficient z₄, …, z₁₁. The blue stars represent the true values of the parameters. The green segments represent confidence intervals. From this analysis, the first two coefficients appear less identifiable than the others.
Figure 12: Violin plots of the Zernike coefficients obtained using a Monte Carlo sampling with 1000 points (a, left) and the Jackpot algorithm (b, right) for the blind deblurring experiment. Observe that the amplitudes obtained with Monte Carlo sampling are about 10³ times lower than those obtained with Jackpot, illustrating the large discrepancy between natural and adversarial perturbations.
Figure 13: The deblurring problem. (b) PSF spectrum. (c) Kernel element. (d) Reference. (e) Blurry image y. To illustrate Jackpot, we designed a specific PSF with a highly oscillatory 2D kernel (null-space); an element is shown in (c).
Figure 14: (a) Recovered image x*. (b) Jackpot v_N. (c) Jackpot v_{N−1}. (d) Singular values. (e) The last 5 leftmost eigenvectors of J for the deblurring inverse problem. The figure shows the recovered image x*, 2 images recovered by Jackpot, the singular spectrum, and 5 leftmost eigenpairs of J.
Jackpot results We treat the 200 × 200 color image shown in Fig. 13d. The forward operator A is defined as a convolution with the PSF shown in Fig. 13a. Its discrete Fourier transform is shown in Fig. 13b. It possesses only two zeros (green crosses).
Considering the Fourier transform symmetries for real signals, this means that the kernel is 2D for each color channel and therefore dim(ker(A)) = 6. This deblurring problem can be considered as mildly ill-posed. The resulting blurry image and recovered image are shown in Fig. 13e and Fig. 14a, respectively. The five last singular values of the Jacobian matrix J are displayed in Fig. 14d. Surprisingly, they all have roughly the same amplitude. One could expect the only directions of uncertainty to be unlocalized patterns such as the one in Fig. 13c. Looking closer at the associated singular vectors in Fig. 14e, we see that they all consist of localized and highly oscillatory patterns. The reason why the uncertainty patterns can be localized is that we do not enforce the constraint Ax = y, but only promote it through the data term (1/(2σ²))‖Ax − y‖₂², leaving room for more uncertainty. In this example, profile likelihood methods are unusable due to the large number of parameters (3 × 200 × 200). We propose a comparison with Monte Carlo sampling in Fig. 15.
Figure 15: Comparison between Monte Carlo sampling and Jackpot for image deblurring. The pixel-wise maximum distance to x* is displayed for 1000 Monte Carlo points in (a), for 20 Jackpot points with D = 2 in (b), and for 100 Jackpot points with D = 10 in (c). Notice the different amplitude ranges.
After drawing 1000 samples, we display the maximal pixel-wise distance to x* for both the Monte Carlo and the Jackpot methods. As can be seen, the Monte Carlo method significantly underestimates the size of the uncertainty region. This again illustrates the difference between the effect of natural (Monte Carlo) and adversarial (Jackpot) perturbations.
Acknowledgments
This work was supported by the ANR Micro-Blind (grant ANR-21-CE48-0008) and the University Research School EUR-MINT (ANR-18-EURE-0023).
This work was performed using HPC resources from GENCI-IDRIS (Grant 2021-AD011012210R2). Photo credit: David Villa Science Image, CBI, CNRS.

Appendix A. Proofs

A.1 Definition and Characterization of Manifolds

Definition 16 Let $x$ be a point of an open set $U$ of $\mathbb{R}^D$. A map $f : U \to \mathbb{R}^N$ is said to be an immersion at $x$ if $f$ is of class $C^1$ and its Jacobian $J_f(x)$ is injective. If this holds for all $x \in U$, the map $f$ is said to be an immersion.

Definition 17 (Local definition by image of an immersion, Lee, 2012) Let $\mathcal{M}$ be a subset of $\mathbb{R}^N$. The set $\mathcal{M}$ is said to be a manifold of dimension $D$ if, for all $x \in \mathcal{M}$, there exist
- a neighborhood $U$ of $0_D$ in $\mathbb{R}^D$,
- a neighborhood $V$ of $x$ in $\mathbb{R}^N$,
- a map $f : U \to \mathbb{R}^N$, an immersion at $0_D$ with $f(0_D) = x$,
such that $f : U \to V \cap \mathcal{M}$ is a homeomorphism.

The proposed Jackpot algorithm builds an approximation $\mathcal{M}^\varepsilon_\delta$ (see Definition 7) of $\mathcal{U}^\varepsilon_\delta$ through the parametrization $\gamma : \mathbb{R}^D \to \mathbb{R}^N$ in $(\mathcal{P}_z)$. According to Definition 17, to prove that $\mathcal{M}^\varepsilon_\delta$ has a manifold structure, we require the following two ingredients:
- $\gamma$ is of class $C^1$ and $J_\gamma(0_D)$ is injective (i.e., $\gamma$ is an immersion at $0_D$), with $\gamma(0_D) = x^\star$ (true by definition);
- there exist neighborhoods $V$ of $x^\star$ in $\mathbb{R}^N$ and $U$ of $0_D$ in $\mathbb{R}^D$ such that $\gamma : U \to V \cap \mathcal{M}^\varepsilon_\delta$ is a homeomorphism.

Actually, the second point is implied by the first one, as stated by Proposition 18 below. Consequently, the proof of the manifold structure of $\mathcal{M}^\varepsilon_\delta$ in Theorem 12 reduces to proving only the first point above.

Proposition 18 (An injective Jacobian implies a local manifold structure) Let $f : \mathbb{R}^D \to \mathbb{R}^N$ be a function of class $C^1$. Assume that $J_f(0_D)$ is injective. Then there exists a neighborhood $U \subset \mathbb{R}^D$ of $0_D$ such that $f(U)$ is a manifold.

Proof First of all, $J_f$ is locally injective near $0_D$ because of the lower semi-continuity of the rank function. Indeed, the set of points at which $J_f$ is injective can be reformulated as the inverse image of the open set $(0, +\infty)$ under the continuous map $x \mapsto \det(J_f(x)^T J_f(x))$, and is thus open.
From the rank theorem (Lee, 2012, Thm. 5.13), up to diffeomorphisms, $f$ is locally equal to its Jacobian, which is locally injective; hence so is $f$. We actually get that $f$ is locally an immersion. From (Lee, 2012, Lem. 5.34), since $f$ is locally an immersion, $f$ is also locally an embedding on a neighborhood $U$. Finally, from (Lee, 2012, Thm. 5.31), $f(U)$ is an embedded submanifold as the image of a smooth embedding, which concludes the proof.

A.2 Linear Approximation

A.2.1 Proof of Proposition 4

Defining the ellipsoid $\mathcal{E} \stackrel{\mathrm{def}}{=} \{x \in \mathbb{R}^N \,;\, \|\Sigma^\star V^{\star,T}(x - x^\star)\|_2^2 \le \varepsilon^2\}$, we can reformulate
$$\widetilde{\mathcal{U}}^\varepsilon_\delta = \mathcal{E} \cap B(x^\star, \delta) \quad \text{and} \quad \widetilde{\mathcal{M}}^\varepsilon_\delta = \mathcal{E} \cap B(x^\star, \delta) \cap T_{x^\star}. \tag{41}$$
As $\widetilde{\mathcal{M}}^\varepsilon_\delta = \widetilde{\mathcal{U}}^\varepsilon_\delta \cap T_{x^\star} \subseteq \widetilde{\mathcal{U}}^\varepsilon_\delta$, the Hausdorff distance is given by $d_H(\widetilde{\mathcal{M}}^\varepsilon_\delta, \widetilde{\mathcal{U}}^\varepsilon_\delta) = \sup_{x \in \widetilde{\mathcal{U}}^\varepsilon_\delta} d(x, \widetilde{\mathcal{M}}^\varepsilon_\delta)$. Without the constraint $x \in B(x^\star, \delta)$, the same holds for $\mathcal{E}$ and $\mathcal{E} \cap T_{x^\star}$:
$$d_H(\mathcal{E} \cap T_{x^\star}, \mathcal{E}) = \sup_{x \in \mathcal{E}} d(x, \mathcal{E} \cap T_{x^\star}). \tag{42}$$
Now, since $d(x, \mathcal{E} \cap T_{x^\star})^2 = \|(\mathrm{Id}_N - \Pi_{V^\star_D})(x - x^\star)\|_2^2$ for $x \in \mathcal{E}$ (the orthogonal projection of $x$ onto $T_{x^\star}$ remains in $\mathcal{E}$), where $V^\star_D$ is defined in Equation 13, we can consider the equivalent optimization problem
$$\sup_{x \in \mathcal{E}} \left\|(\mathrm{Id}_N - \Pi_{V^\star_D})(x - x^\star)\right\|_2. \tag{43}$$
The associated Lagrangian is given by
$$\mathcal{L}(\lambda, x) = \frac{1}{2}\left\|(\mathrm{Id}_N - \Pi_{V^\star_D})(x - x^\star)\right\|_2^2 + \frac{\lambda}{2}\left(\left\|\Sigma^\star V^{\star,T}(x - x^\star)\right\|_2^2 - \varepsilon^2\right). \tag{44}$$
Writing the coordinates $x - x^\star = \sum_{n=1}^N s_n v^\star_n$, its gradient is
$$\nabla \mathcal{L}(\lambda, x) = \begin{pmatrix} \frac{1}{2}\left(\sum_{n=1}^N s_n^2 (\sigma^\star_n)^2 - \varepsilon^2\right) \\ \sum_{n=1}^{N-D} s_n v^\star_n + \lambda \sum_{n=1}^N (\sigma^\star_n)^2 s_n v^\star_n \end{pmatrix}.$$
The optimum is achieved where the gradient vanishes, that is
$$\sum_{n=1}^N s_n^2 (\sigma^\star_n)^2 = \varepsilon^2; \qquad s_n = 0, \; n \ge N-D+1; \qquad s_n\left(1 + \lambda (\sigma^\star_n)^2\right) = 0, \; n \le N-D. \tag{45}$$
From this, only one $s_n$ is nonzero. Among the $N - D$ local optima, only the following is global as long as $\sigma^\star_{N-D} < \sigma^\star_{N-D-1}$ (otherwise, it is still a global optimum but not unique):
$$s_n = 0, \; n \ne N-D; \qquad s_{N-D} = \frac{\varepsilon}{\sigma^\star_{N-D}} \quad \text{and} \quad \lambda = -\frac{1}{(\sigma^\star_{N-D})^2}. \tag{46}$$
Finally, let us add the constraint $\|x - x^\star\|_2 \le \delta$. If $\delta > \varepsilon / \sigma^\star_{N-D}$, then the supremum is achieved at $x^\star + \frac{\varepsilon}{\sigma^\star_{N-D}} v^\star_{N-D}$. Otherwise, $\delta < \varepsilon / \sigma^\star_{N-D}$ and the supremum is achieved at $x^\star + \delta v^\star_{N-D}$. In both cases, the Hausdorff distance verifies Equation 15.
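The unconstrained case of this computation can be verified numerically. The sketch below (toy dimensions and a random spectrum, all assumptions) works directly in the coordinates $s = V^{\star,T}(x - x^\star)$, where the ellipsoid reads $\sum_n (\sigma^\star_n s_n)^2 \le \varepsilon^2$ and the linear model keeps the last $D$ coordinates; it checks that the farthest point from the model puts its whole budget on coordinate $N - D$ and attains $\varepsilon / \sigma^\star_{N-D}$.

```python
import numpy as np

# Toy instance (sizes, eps and the spectrum are assumptions).
rng = np.random.default_rng(0)
N, D, eps = 8, 3, 0.5
sigmas = np.sort(rng.uniform(0.5, 3.0, size=N))[::-1]  # sigma_1 >= ... >= sigma_N

predicted = eps / sigmas[N - D - 1]   # claimed sup: eps / sigma_{N-D}

# The maximizer concentrates on coordinate N - D (0-indexed N - D - 1):
s_star = np.zeros(N)
s_star[N - D - 1] = eps / sigmas[N - D - 1]
assert np.isclose(np.linalg.norm(sigmas * s_star), eps)       # on the boundary
assert np.isclose(np.linalg.norm(s_star[:N - D]), predicted)  # attains the sup

# No boundary sample exceeds the predicted supremum:
s = rng.normal(size=(100000, N))
s *= eps / np.linalg.norm(s * sigmas, axis=1, keepdims=True)  # push to boundary
assert np.linalg.norm(s[:, :N - D], axis=1).max() <= predicted + 1e-9
```

The last assertion holds deterministically: for any boundary point, $s_n^2 \le (\sigma^\star_n s_n)^2 / (\sigma^\star_{N-D})^2$ for $n \le N-D$, so the removed-coordinate norm never exceeds $\varepsilon / \sigma^\star_{N-D}$.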
A.3 Nonlinear Approximation

All proofs are provided in the more general setting with the parametrization
$$\gamma(z) \stackrel{\mathrm{def}}{=} \operatorname*{argmin-loc}_{A(x - x^\star) = z} \; \frac{1}{2}\left\|\Phi(x) - \Phi(x^\star)\right\|_2^2, \tag{$\mathcal{P}_A$}$$
where $A \in \mathbb{R}^{D \times N}$ with $1 \le D \le N$ satisfies the following assumption.

Assumption 19 The matrix $A$ verifies $AA^T = \mathrm{Id}_D$ and the matrix $\Pi_{\mathrm{im}\,A^T} + \Pi_{\ker A}\, J_\Phi(x^\star)^T J_\Phi(x^\star)$ is invertible. We denote the projection operators onto the image of $A^T$ and the kernel of $A$ by $\Pi_{\mathrm{im}\,A^T} \stackrel{\mathrm{def}}{=} A^T A$ and $\Pi_{\ker A} \stackrel{\mathrm{def}}{=} \mathrm{Id}_N - A^T A$ respectively.

The results provided in the main text are specific instances of the results reported below with $A = V_D^{\star,T}$, for which Assumption 19 holds.

Lemma 20 Assumption 19 holds true for $A = V_D^{\star,T}$.

Proof With $A = V_D^{\star,T}$, the matrix
$$\Pi_{\mathrm{im}\,A^T} + \Pi_{\ker A}\, J_\Phi(x^\star)^T J_\Phi(x^\star) = V^\star \,\mathrm{Diag}\left(\sigma_1^2, \dots, \sigma_{N-D}^2, 1, \dots, 1\right) V^{\star,T}$$
is invertible by construction of $V_D^{\star,T}$.

A.3.1 Proof of Theorem 5

Theorem 5 is a specific instance of Theorem 21 below with $A = V_D^{\star,T}$.

Theorem 21 Let $\Phi : \mathbb{R}^N \to \mathbb{R}^M$ be a $C^1$ map, $x^\star \in \mathbb{R}^N$, and $A \in \mathbb{R}^{D \times N}$ satisfy Assumption 19. Then, there exists an open neighborhood $V \subset \mathbb{R}^D$ such that $(\mathcal{P}_A)$ admits a unique solution $\gamma : V \to \mathbb{R}^N$ verifying $\gamma(0_D) = x^\star$. Moreover, it has the following properties:
- If $\Phi$ is of class $C^2$, then $\gamma$ is of class $C^1$. Moreover, its Jacobian is given by
$$J_\gamma(z) = \left[\Pi_{\mathrm{im}\,A^T} + \Pi_{\ker A}\, \nabla^2 F(\gamma(z))\right]^{-1} A^T. \tag{47}$$
- If $\Phi$ is of class $C^1$ and $J_\Phi$ is locally Lipschitz and definable, then $\gamma$ is a locally Lipschitz definable function. Moreover, we can express a conservative Jacobian for $\gamma$ as
$$J_\gamma : z \mapsto \left\{ \left[\Pi_{\mathrm{im}\,A^T} + \Pi_{\ker A}\, B\right]^{-1} A^T \;;\; B \in J_{\nabla F}(\gamma(z)) \right\}, \tag{48}$$
where $F(x) = \frac{1}{2}\|\Phi(x) - \Phi(x^\star)\|_2^2$ denotes the objective function of $(\mathcal{P}_A)$.

Proof Starting from the Karush-Kuhn-Tucker (KKT) conditions of Problem $(\mathcal{P}_A)$, the idea of the proof is to apply an implicit function theorem to the dual formulation to obtain existence of solutions for both the primal and dual problems. We moreover derive an explicit expression of the (conservative) Jacobian of $\gamma$.
From constrained optimization theory, the Lagrangian function associated with $(\mathcal{P}_A)$ reads
$$\mathcal{L}_z(\lambda, x) \stackrel{\mathrm{def}}{=} F(x) + \lambda^T \left(A(x - x^\star) - z\right),$$
where $\lambda \in \mathbb{R}^D$ is the vector of Lagrange multipliers and $F(x) = \frac{1}{2}\|\Phi(x) - \Phi(x^\star)\|_2^2$ is the objective function. As $\Phi$ is of class $C^1$ and the constraint is linear, the KKT conditions reduce to
$$\nabla \mathcal{L}_z(\lambda, x) = \begin{pmatrix} A(x - x^\star) - z \\ \nabla F(x) + A^T \lambda \end{pmatrix} = 0$$
for $z \in \mathbb{R}^D$. In particular, multiplying the second line by the matrix $A$ yields $\lambda(z) = -A \nabla F(x)$ since $AA^T = \mathrm{Id}_D$. Hence, finding a solution $(\lambda, x)$ of the KKT conditions of $(\mathcal{P}_A)$ is equivalent to solving the implicit system
$$G(z, x) \stackrel{\mathrm{def}}{=} \begin{pmatrix} A(x - x^\star) - z \\ (\mathrm{Id}_N - A^T A)\, \nabla F(x) \end{pmatrix} = 0.$$
Notice that $G$ has values in $\mathbb{R}^{D+N}$, but its image spans only the $N$-dimensional subspace $\mathbb{R}^D \times \ker A$. Indeed, the matrix $(\mathrm{Id}_N - A^T A)$ is the projection onto the subspace $\ker A = (\mathrm{im}\,A^T)^\perp$. Thus one can reduce the output dimension of $G$ to make it surjective while keeping the same zero level sets. This is done by left-multiplying by $[A^T, \mathrm{Id}_N]$, which embeds the first term $A(x - x^\star) - z$ of $G$ into $\mathrm{im}\,A^T$ while keeping the second term in $(\mathrm{im}\,A^T)^\perp$. One obtains
$$H(z, x) \stackrel{\mathrm{def}}{=} A^T \left(A(x - x^\star) - z\right) + (\mathrm{Id}_N - A^T A)\, \nabla F(x).$$
Our problem is now equivalent to the implicit equation
$$H(z, \gamma(z)) = 0_N, \tag{49}$$
for $z \in \mathbb{R}^D$ and with $H : \mathbb{R}^D \times \mathbb{R}^N \to \mathbb{R}^N$ defined above. Given that a solution of Equation 49 verifies the KKT conditions, it is also a solution of the primal problem $(\mathcal{P}_A)$. We now use two forms of the implicit function theorem to obtain existence of a solution of Equation 49, and thus existence of a mapping $\gamma$ for $(\mathcal{P}_A)$.

$C^2$ case: From the implicit function theorem (Lee, 2012, Thm. 5.15), the only assumption required is the invertibility of the second sub-matrix $D_2H(0_D, x^\star)$ of the differential $DH = [D_1H, D_2H]$ of $H$ at $(0_D, x^\star)$. With some computations, one exactly recovers the second condition in Assumption 19. As a consequence, the Jacobian of $\gamma$ is given by $J_\gamma(z) = -D_2H(z, \gamma(z))^{-1} D_1H(z, \gamma(z))$, from which we recover Equation 47.
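Equation 47 can be sanity-checked by finite differences on a toy problem. In the sketch below, $\Phi$, $A$ and all sizes are assumptions (a random quadratic map, not the paper's model); $\gamma(z)$ is computed by gradient descent along $\ker A$, and its Jacobian at $z = 0$ is compared with the closed form, using that the residual term of $\nabla^2 F$ vanishes at $x^\star$, so $\nabla^2 F(x^\star) = J_\Phi(x^\star)^T J_\Phi(x^\star)$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, D, alpha = 4, 6, 2, 0.05
B = rng.normal(size=(M, N)) / np.sqrt(M)
B2 = rng.normal(size=(M, N)) / np.sqrt(M)
x_star = rng.normal(size=N)

def Phi(x):                         # toy C^2 forward map (assumption)
    return B @ x + alpha * B2 @ x**2

def JPhi(x):                        # Jacobian: B + 2*alpha * B2 @ diag(x)
    return B + 2 * alpha * B2 * x

A = np.linalg.qr(rng.normal(size=(N, N)))[0][:, :D].T   # A A^T = Id_D
Pim, Pker = A.T @ A, np.eye(N) - A.T @ A

def gamma(z, iters=20000):
    # Solve (P_A): minimize F on the affine slice A(x - x*) = z by
    # gradient descent along ker(A), starting from x* + A^T z.
    x = x_star + A.T @ z
    step = 0.5 / np.linalg.norm(JPhi(x_star), 2) ** 2
    for _ in range(iters):
        x = x - step * (Pker @ (JPhi(x).T @ (Phi(x) - Phi(x_star))))
    return x

# Closed form (Eq. 47) at z = 0, where grad^2 F(x*) = JPhi(x*)^T JPhi(x*):
H = JPhi(x_star).T @ JPhi(x_star)
J_formula = np.linalg.inv(Pim + Pker @ H) @ A.T

# Central finite differences of gamma, column by column:
h = 1e-5
J_fd = np.column_stack([(gamma(h * e) - gamma(-h * e)) / (2 * h)
                        for e in np.eye(D)])
assert np.allclose(J_fd, J_formula, atol=1e-3)
```

The agreement between `J_fd` and `J_formula` is only a numerical illustration on this toy instance, not a substitute for the proof.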
$C^1$ case: We use the Lipschitz definable version of the implicit function theorem from (Bolte et al., 2021, Thm. 1). As $J_\Phi$ is locally Lipschitz and definable, so are $\nabla F$ and thus $H$. With $\gamma(0_D) = x^\star$, one recovers $H(0_D, \gamma(0_D)) = 0_N$. A conservative Jacobian of $H$ can be written as
$$J_H : (z, x) \mapsto \left\{ \left(-A^T, \; \Pi_{\mathrm{im}\,A^T} + \Pi_{\ker A}\, B\right) \;;\; B \in J_{\nabla F}(x) \right\}. \tag{50}$$
Since $\nabla F(x) = J_\Phi(x)^T (\Phi(x) - \Phi(x^\star))$, by path-differentiation rules one derives the formula
$$J_{\nabla F}(x) = \left\{ B\left(\Phi(x) - \Phi(x^\star)\right) + J_\Phi(x)^T J_\Phi(x) \;;\; B \in J_{J_\Phi^T}(x) \right\},$$
and evaluating at $x = x^\star$, since the first term vanishes, yields
$$J_{\nabla F}(x^\star) = \left\{ J_\Phi(x^\star)^T J_\Phi(x^\star) \right\}. \tag{51}$$
Finally, inserting Equation 51 in Equation 50 allows us to express
$$J_H(0_D, x^\star) = \left\{ \left(-A^T, \; \Pi_{\mathrm{im}\,A^T} + \Pi_{\ker A}\, J_\Phi(x^\star)^T J_\Phi(x^\star)\right) \right\}.$$
This last equation implies that the assumptions of (Bolte et al., 2021, Thm. 1) are verified: $J_H(0_D, x^\star)$ is a singleton, thus convex, and its second sub-matrix is invertible from Assumption 19. This completes the proof.

A.3.2 Proof of Theorem 8, Theorem 9, Proposition 11 and Theorem 12

Assumption 10, Proposition 11 and Theorem 12 are specific instances of Assumption 22, Proposition 23 and Theorem 24 below with $A = V_D^{\star,T}$.

Assumption 22 (Controlled thickness) Let $S_z = \{x \in \mathcal{U}^\varepsilon_\delta \,,\; A(x - x^\star) = z\}$ denote the slice of $\mathcal{U}^\varepsilon_\delta$ with coordinate $z$ on $T_{x^\star}$. We assume that
$$\sup_{z \,:\, \|\gamma(z) - x^\star\|_2 \le \delta} \mathrm{diam}(S_z) \le \eta. \tag{52}$$

Proposition 23 (Uniformly bounded curvature) Let $\Phi$ be of class $C^1$. Assume that its Jacobian $J_\Phi$ is $L$-Lipschitz over the ball $B(x^\star, \delta)$, i.e.
$$\forall x, x' \in B(x^\star, \delta), \quad \|J_\Phi(x) - J_\Phi(x')\|_{\mathrm{op}} \le L \|x - x'\|_2. \tag{53}$$
Assume furthermore that the Lipschitz constant $L$ satisfies
$$L\delta < \sigma_{\min}, \tag{54}$$
where $\sigma_{\min} > 0$ is the lowest singular value of the Jacobian $J_\Phi(x^\star)$ restricted to the kernel of $A$. Then Assumption 22 is verified with
$$\eta = \frac{2\varepsilon}{\sigma_{\min} - L\delta}. \tag{55}$$

Proof Setting $z \in B(0_D, \delta)$, let us upper bound the diameter of the intersection of $\mathcal{U}^\varepsilon_\delta$ with the affine subspace $A(\cdot - x^\star) = z$.
When this slice is empty there is nothing to prove. When it is not, it remains to bound the norm $\|x_1 - x_2\|_2$ for any pair $x_1, x_2 \in \mathcal{U}^\varepsilon_\delta$ verifying $A(x_1 - x^\star) = A(x_2 - x^\star) = z$. To that end, one can remark that
$$\|\Phi(x_1) - \Phi(x_2)\|_2 \le \|\Phi(x_1) - \Phi(x^\star)\|_2 + \|\Phi(x^\star) - \Phi(x_2)\|_2 \le 2\varepsilon,$$
as $x_1$ and $x_2$ belong to $\mathcal{U}^\varepsilon_\delta$. The remainder of the proof consists in lower bounding the left-hand side by a term of the form $\mu \|x_1 - x_2\|_2$ with some $\mu > 0$. Applying the integral form of the Taylor expansion to $\Phi$, one gets
$$\Phi(x_1) - \Phi(x_2) = \bar{J}(x_1 - x_2), \tag{56}$$
with the operator $\bar{J} \in \mathbb{R}^{M \times N}$ defined as
$$\bar{J} \stackrel{\mathrm{def}}{=} \int_0^1 J_\Phi\left(x_2 + t(x_1 - x_2)\right) dt. \tag{57}$$
Then, from the reverse triangle inequality, we get
$$\|\Phi(x_1) - \Phi(x_2)\|_2 \ge \|J_\Phi(x^\star)(x_1 - x_2)\|_2 - \|(\bar{J} - J_\Phi(x^\star))(x_1 - x_2)\|_2. \tag{58}$$
The first term on the right of Equation 58 can be handled by noticing that the difference $x_1 - x_2$ lies in the kernel of $A$, since $A(x_1 - x_2) = A(x_1 - x^\star) + A(x^\star - x_2) = z - z = 0$. Thus, by definition of the minimal singular value $\sigma_{\min}$ of $J_\Phi(x^\star)$ restricted to the kernel of $A$, it follows that $\|J_\Phi(x^\star)(x_1 - x_2)\|_2 \ge \sigma_{\min}\|x_1 - x_2\|_2$. Moreover, the second term on the right of Equation 58 can be bounded using the Lipschitz constant as follows:
$$\begin{aligned}
\|(\bar{J} - J_\Phi(x^\star))(x_1 - x_2)\|_2 &\le \int_0^1 \left\|\left[J_\Phi(x_2 + t(x_1 - x_2)) - J_\Phi(x^\star)\right](x_1 - x_2)\right\|_2 dt \\
&\le \int_0^1 \left\|J_\Phi(x_2 + t(x_1 - x_2)) - J_\Phi(x^\star)\right\|_{\mathrm{op}} dt \, \|x_1 - x_2\|_2 \\
&\le L \int_0^1 \left\|x_2 + t(x_1 - x_2) - x^\star\right\|_2 dt \, \|x_1 - x_2\|_2 \\
&\le L\delta \|x_1 - x_2\|_2.
\end{aligned}$$
Finally, Equation 58 becomes $\|\Phi(x_1) - \Phi(x_2)\|_2 \ge (\sigma_{\min} - L\delta)\|x_1 - x_2\|_2$, as intended. Combined with the upper bound $2\varepsilon$, this yields $\|x_1 - x_2\|_2 \le 2\varepsilon/(\sigma_{\min} - L\delta) = \eta$.

Proof [Proof of Theorem 9] Using a standard result on smooth manifolds from Lee (2012), summarized in Proposition 18, the image of $\gamma$ locally defines an embedded manifold, since $J_\gamma(0)$ is injective as the product of an invertible matrix with $A^T$, which is injective by assumption.

Proof [Proof of Theorem 8] The goal of this proof is to demonstrate that $\mathcal{U}^0$ is locally the image of a smooth transformation of a $D$-dimensional affine subspace (step 1), and that this transformation preserves the local bijectivity of the orthogonal projection from the uncertainty region onto its tangent space (step 2).
This result follows from an application of the inverse function theorem. As a consequence, each point in the tangent space near the origin corresponds to a unique point in the uncertainty region. Therefore, the optimization problem defining the Jackpot parametrization admits a unique solution in a sufficiently small neighborhood, and this solution coincides with $\mathcal{U}^0$. A graphical view of the geometric objects used in this proof can be found in Fig. 16.

Step 1: By the constant rank theorem, since the rank of $J_\Phi$ is constant in a neighborhood of $x^\star$, there exist neighborhoods $V$ of $x^\star$ and $W$ of $\Phi(x^\star)$, and two diffeomorphisms $v : \mathbb{R}^N \to V$ and $w : \mathbb{R}^M \to W$, such that
$$\Phi|_U = w \circ J_\Phi(x^\star) \circ v^{-1}. \tag{59}$$
Notice that $\mathcal{U}^0 \cap V = v(L)$, where
$$L \stackrel{\mathrm{def}}{=} J_\Phi(x^\star)^{-1}\left(w^{-1}(\Phi(x^\star))\right) \tag{60}$$
is an affine subspace of $\mathbb{R}^N$.

Figure 16: Illustration of the decomposition of the direct model $\Phi$ via the constant rank theorem.

Step 2: Let $T$ be the tangent space of $v(L)$ at $x^\star$ and $\pi_T : \mathbb{R}^N \to T$ its orthogonal projection. Let us show that $f \stackrel{\mathrm{def}}{=} \pi_T \circ v : L \to T$ is locally a diffeomorphism around $x_0 \stackrel{\mathrm{def}}{=} v^{-1}(x^\star)$ using the inverse function theorem. The differential of $f$ is given by
$$Df(x_0) = D\pi_T(x^\star) \circ Dv(x_0) = \pi_T \circ Dv(x_0). \tag{61}$$
Moreover, by definition of the tangent space $T$, one has
$$Dv(x_0)(L) = T. \tag{62}$$
Hence $Df(x_0)|_L = Dv(x_0)|_L$, since $\pi_T$ is the identity on $T$. The key argument is that, since the linear map $Dv(x_0) : \mathbb{R}^N \to \mathbb{R}^N$ is invertible (because $v$ is a diffeomorphism), the same goes for $Dv(x_0)|_L : L \to T$, since $T$ and $L$ have the same dimension. Hence, applying the inverse function theorem, the map $f : L \to T$ is a local diffeomorphism near $v^{-1}(x^\star)$. It follows that $\pi_T|_{v(L)} = f \circ v^{-1} : v(L) \to T$ is also a local diffeomorphism near $x^\star$. Take $V = B(x^\star, \delta)$ with $\delta > 0$ such that $\pi_T|_{\mathcal{U}^0_\delta}$ is a diffeomorphism onto its image. Thus the normal slices $x + T^\perp$ for $x \in T \cap B(x^\star, \delta)$ intersect $\mathcal{U}^0$ at a unique point.
As a consequence, the problem
$$\operatorname*{argmin}_{\pi_T(x) = z, \; \|x\|_2 \le \delta} \; \frac{1}{2}\|\Phi(x) - \Phi(x^\star)\|_2^2 \tag{63}$$
that defines the Jackpot parameterization admits a unique global minimizer, and thus $\mathcal{M}^0_\delta = \mathcal{U}^0_\delta$.

Theorem 24 Let $\Phi : \mathbb{R}^N \to \mathbb{R}^M$ be a function of class $C^2$, $x^\star \in \mathbb{R}^N$ be a point, $A \in \mathbb{R}^{D \times N}$ satisfy Assumption 19, and let $\varepsilon > 0$ be a threshold. Then $\mathcal{M}^\varepsilon_\delta$ defines a manifold with parameterization $\gamma$ in $(\mathcal{P}_A)$ for sufficiently small $\delta > 0$. Moreover, under Assumption 22, the manifold $\mathcal{M}^\varepsilon_\delta$ is a good approximation of $\mathcal{U}^\varepsilon_\delta$ in the sense that
$$d_H(\mathcal{M}^\varepsilon_\delta, \mathcal{U}^\varepsilon_\delta) \le \eta, \tag{64}$$
where the Hausdorff distance $d_H$ is defined in Equation 2. More precisely, we have the following inclusions:
$$\mathcal{M}^\varepsilon_\delta \subseteq \mathcal{U}^\varepsilon_\delta \subseteq \mathcal{M}^\varepsilon_\delta + B(0, \eta). \tag{65}$$

Proof First of all, from Theorem 21, the set $\mathcal{M}^\varepsilon_\delta$ is well defined for some $\delta > 0$ and by definition verifies the first inclusion $\mathcal{M}^\varepsilon_\delta \subseteq \mathcal{U}^\varepsilon_\delta$. To prove the second inclusion, given a point $\tilde{x} \in \mathcal{U}^\varepsilon_\delta$, we need to find a point $x \in \mathcal{M}^\varepsilon_\delta$ such that $\|\tilde{x} - x\|_2 \le \eta$. A natural candidate is $x \stackrel{\mathrm{def}}{=} \gamma(\tilde{z})$ with $\tilde{z} \stackrel{\mathrm{def}}{=} A(\tilde{x} - x^\star)$. Indeed, given that both $\tilde{x}$ and $x$ are in $\mathcal{U}^\varepsilon_\delta$ and (by definition) verify $A(\tilde{x} - x^\star) = A(x - x^\star) = \tilde{z}$, Assumption 22 implies that $\|\tilde{x} - x\|_2 \le \eta$. Moreover, we have
$$\|\Phi(\gamma(\tilde{z})) - \Phi(x^\star)\|_2 \le \|\Phi(\tilde{x}) - \Phi(x^\star)\|_2 \le \varepsilon, \tag{66}$$
and thus $x = \gamma(\tilde{z}) \in \mathcal{M}^\varepsilon_\delta$. Note that the first inequality in Equation 66 comes from the fact that $\gamma(\tilde{z})$ solves $(\mathcal{P}_A)$ while $\tilde{x}$ is an admissible point. The second one is due to the fact that $\tilde{x} \in \mathcal{U}^\varepsilon_\delta$. This completes the proof.

A.4 Numerical Computation

A.4.1 Proof of Proposition 14

Proof In $(\mathcal{P}_z)$, splitting the vector $x - x^\star \in \mathbb{R}^N$ according to the space decomposition $\mathbb{R}^N = \mathrm{Im}\,V^\star_D \oplus (\mathrm{Im}\,V^\star_D)^\perp$ yields the reformulation $(\mathcal{P}'_z)$. Here the constraint $V_D^{\star,T}(x - x^\star) = z$ is included in the objective function, which we denote
$$F_z(x) \stackrel{\mathrm{def}}{=} \frac{1}{2}\left\|\Phi(r(z, x)) - \Phi(x^\star)\right\|_2^2, \tag{67}$$
where $r(z, x) \stackrel{\mathrm{def}}{=} x^\star + V^\star_D z + \Pi^\perp x$, with $\Pi^\perp$ the orthogonal projector onto $(\mathrm{Im}\,V^\star_D)^\perp$. The main strategy to locally prove the linear convergence of gradient descent is to establish a geometric property of $F_z$.
Since $F_z$ is not locally strongly convex, we use a Polyak-Łojasiewicz (PL) condition:

Definition 25 For a real number $\mu > 0$, a function $f$ is said to verify the $\mu$-Łojasiewicz inequality if, for all $x$,
$$\frac{1}{2}\|\nabla f(x)\|_2^2 \ge \mu\left(f(x) - f(x^\star)\right). \tag{68}$$

The argument proceeds according to the following four steps, where the results hold locally around $(z, x) = (0_D, 0_N)$:

1. The restricted map $\bar{F}_z \stackrel{\mathrm{def}}{=} F_z|_{(\mathrm{Im}\,V^\star_D)^\perp}$ on $(\mathrm{Im}\,V^\star_D)^\perp$ is strongly convex.
2. The map $\bar{F}_z$ satisfies a PL inequality, since any strongly convex map also satisfies a PL inequality (Polyak, 1963, lemma).
3. Since $F_z$ only depends on the orthogonal part $(\mathrm{Im}\,V^\star_D)^\perp$, i.e. $F_z(x) = F_z(x')$ whenever $\Pi^\perp x = \Pi^\perp x'$, the objective function $F_z$ also verifies the same PL inequality.
4. Since $F_z$ verifies a PL inequality and is also $L$-smooth for some $L > 0$ (being of class $C^1$ on a neighborhood), this ensures the linear convergence of gradient descent (Polyak, 1963, thm. 4).

The only remaining argument is to prove the first point: the local strong convexity of the objective function. The Hessian of $F_z$ at $x$ is
$$\nabla^2 F_z(x) = \Pi^\perp J_\Phi(r(z,x))^T J_\Phi(r(z,x))\, \Pi^\perp + \Pi^\perp H_\Phi(r(z,x))^T \left(\Phi(r(z,x)) - \Phi(x^\star)\right) \in \mathbb{R}^{N \times N}.$$
Evaluating at $(z, x) = (0_D, 0_N)$ gives $H \stackrel{\mathrm{def}}{=} \Pi^\perp J^{\star,T} J^\star \Pi^\perp \in \mathbb{R}^{N \times N}$. Restricted to the subspace $(\mathrm{Im}\,V^\star_D)^\perp$, this matrix is positive definite with eigenvalues $\sigma_1^2 \ge \dots \ge \sigma_{N-D}^2 > 0$, since $D > R^\star$. Moreover, this submatrix corresponds to the Hessian of the map $\bar{F}_0$ at $x = 0_N$. By smoothness of $\Phi$, the map $(z, x) \mapsto \nabla^2 \bar{F}_z(x)$ is continuous, and thus the matrices $\nabla^2 \bar{F}_z(x)$ remain positive definite for $(z, x)$ in a neighborhood $U \times V$ of $(0_D, 0_N)$. This property characterizes the strong convexity of the maps $\bar{F}_z$ restricted to $V$ for all $z \in U$.

A.4.2 Proof of Corollary 15

From Theorem 5, one knows that the parameterization function $\gamma$ is locally Lipschitz.
The key remaining argument is that any point $z$ of the ball $B_D(0, \delta)$ is at distance at most $\sqrt{D}\,s/2$ from a point $z'$ of the grid $Z_s$. By the Lipschitz property, $\|\gamma(z) - \gamma(z')\|_2 \le L\sqrt{D}\,s/2$. Using Theorem 24, this leads to the inequalities
$$d_H(\mathcal{U}^\varepsilon_\delta, \mathcal{M}^\varepsilon_\delta(Z)) \le d_H(\mathcal{U}^\varepsilon_\delta, \mathcal{M}^\varepsilon_\delta) + d_H(\mathcal{M}^\varepsilon_\delta, \mathcal{M}^\varepsilon_\delta(Z)) \le \eta + L\sqrt{D}\,s/2,$$
which completes the proof.

References

P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.

V. Antun, F. Renna, C. Poon, B. Adcock, and A. C. Hansen. On instabilities of deep learning in image reconstruction and the potential costs of AI. Proceedings of the National Academy of Sciences, 117(48):30088–30095, 2020.

R. Arutjunjan, B. M. Schaefer, and C. Kreutz. Constructing exact confidence regions on parameter manifolds of non-linear models. arXiv preprint arXiv:2211.03421, 2022.

J. M. Bardsley. MCMC-based image reconstruction with uncertainty quantification. SIAM Journal on Scientific Computing, 34(3):A1316–A1332, 2012.

J. M. Bardsley, A. Solonen, H. Haario, and M. Laine. Randomize-then-optimize: A method for sampling from posterior distributions in nonlinear inverse problems. SIAM Journal on Scientific Computing, 36(4):A1895–A1910, 2014.

D. J. Bates, J. D. Hauenstein, and N. Meshkat. Identifiability and numerical algebraic geometry. PLoS ONE, 2019.

G. Bellu, M. P. Saccomani, S. Audoly, and L. D'Angiò. DAISY: A new software tool to test global identifiability of biological and physiological systems. Computer Methods and Programs in Biomedicine, 2007.

J. Bolte, T. Le, E. Pauwels, and T. Silveti-Falls. Nonsmooth implicit differentiation for machine-learning and optimization. Advances in Neural Information Processing Systems, 34:13537–13549, 2021.

J. Bona-Pellissier, F. Malgouyres, and F. Bachoc. Local identifiability of deep ReLU neural networks: the theory. Advances in Neural Information Processing Systems, 35:27549–27562, 2022.

J. C. Butcher.
Numerical Methods for Ordinary Differential Equations. John Wiley & Sons, 3rd edition, 2016. ISBN 978-1119121503. doi: 10.1002/9781119121534.

J. Chemseddine, P. Hagemann, C. Wald, and G. Steidl. Conditional Wasserstein distances with applications in Bayesian OT flow matching. arXiv preprint arXiv:2403.18705, 2024.

H. Chung, J. Kim, M. T. McCann, M. L. Klasky, and J. C. Ye. Diffusion posterior sampling for general noisy inverse problems. arXiv preprint arXiv:2209.14687, 2022.

M. Coste. An Introduction to O-minimal Geometry. Istituti Editoriali e Poligrafici Internazionali, Pisa, 2000.

J. A. Duersch, M. Shao, C. Yang, and M. Gu. A robust and efficient implementation of LOBPCG. SIAM Journal on Scientific Computing, 2018.

M. C. Eisenberg and M. A. L. Hayashi. Determining identifiable parameter combinations using subset profiling. Mathematical Biosciences, 2014.

J. L. Fernández-Martínez and Z. Fernández-Muñiz. The curse of dimensionality in inverse problems. Journal of Computational and Applied Mathematics, 2020.

J. L. Fernández-Martínez, M. Z. Fernández-Muñiz, and M. J. Tompkins. On the topography of the cost functional in linear and nonlinear inverse problems. Geophysics, 2012.

J. L. Fernández-Martínez, Z. Fernández-Muñiz, J. L. G. Pallero, and L. M. Pedruelo-González. From Bayes to Tarantola: new insights to understand uncertainty in inverse problems. Journal of Applied Geophysics, 98:62–72, 2013.

S. M. Fischer and M. A. Lewis. A robust and efficient algorithm to find profile likelihood confidence intervals. Statistics and Computing, 2021.

J. W. Goodman. Introduction to Fourier Optics. Roberts and Company Publishers, 2005.

A. Gossard and P. Weiss. Training adaptive reconstruction networks for blind inverse problems. SIAM Journal on Imaging Sciences, 17(2):1314–1346, 2024.

A. V. Grayver and A. V. Kuvshinov.
Exploring equivalence domain in nonlinear inverse problems using covariance matrix adaptation evolution strategy (CMAES) and random sampling. Geophysical Journal International, 2016.

P. Grohs, S. Koppensteiner, and M. Rathmair. Phase retrieval: uniqueness and stability. SIAM Review, 62(2):301–350, 2020.

J. C. Helton, J. D. Johnson, C. J. Sallaberry, and C. B. Storlie. Survey of sampling-based methods for uncertainty and sensitivity analysis. Reliability Engineering & System Safety, 2006.

S. Hengl, C. Kreutz, J. Timmer, and T. Maiwald. Data-based identifiability analysis of non-linear dynamical models. Bioinformatics, 2007.

J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.

E. Hubert. Essential components of an algebraic differential equation. Journal of Symbolic Computation, 1999.

S. Hurault, A. Leclaire, and N. Papadakis. Gradient step denoiser for convergent Plug-and-Play. arXiv preprint arXiv:2110.03220, 2021.

A. Hyvärinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6, 2005.

IMCCE. Position ephemeris, 2019. URL https://ssp.imcce.fr/forms/ephemeris.

K. Jaganathan, Y. C. Eldar, and B. Hassibi. Phase retrieval: An overview of recent developments. Optical Compressive Imaging, pages 279–312, 2016.

C. Jauberthie, N. Verdière, and L. Travé-Massuyès. Fault detection and identification relying on set-membership identifiability. Annual Reviews in Control, 2013.

H. K. Johansen. A man/computer interpretation system for resistivity soundings over a horizontally stratified earth. Geophysical Prospecting, 1977.

M. Kech and F. Krahmer. Optimal injectivity conditions for bilinear inverse problems with applications to identifiability of deconvolution problems. SIAM Journal on Applied Algebra and Geometry, 2017.

A. V. Knyazev.
Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method. SIAM Journal on Scientific Computing, 2001.

N. N. Lam, P. D. Docherty, and R. Murray. Practical identifiability of parametrised models: A review of benefits and limitations of various approaches. Mathematics and Computers in Simulation, 2022.

R. Laumont, V. De Bortoli, A. Almansa, J. Delon, A. Durmus, and M. Pereyra. Bayesian imaging using Plug & Play priors: when Langevin meets Tweedie. SIAM Journal on Imaging Sciences, 15(2):701–737, 2022.

J. M. Lee. Introduction to Smooth Manifolds. Springer New York, 2012.

Y. Li, K. Lee, and Y. Bresler. Identifiability in bilinear inverse problems with applications to subspace or sparsity-constrained blind gain and phase calibration. IEEE Transactions on Information Theory, 63(2):822–842, 2016.

E. Liu, K. Rivera, and J. Fox. Efficient eigensolvers and their applications. 2020.

B. Merkt, J. Timmer, and D. Kaschek. Higher-order Lie symmetries in identifiability and predictability analysis of dynamic models. Physical Review E, 92(1):012920, 2015.

H. Miao, X. Xia, A. S. Perelson, and H. Wu. On identifiability of nonlinear ODE models and applications in viral dynamics. SIAM Review, 2011.

K. Mosegaard and A. Tarantola. Monte Carlo sampling of solutions to inverse problems. Journal of Geophysical Research: Solid Earth, 100(B7):12431–12447, 1995.

M. J. Muckley et al. Results of the 2020 fastMRI challenge for machine learning MR image reconstruction. IEEE Transactions on Medical Imaging, 2021. doi: 10.1109/TMI.2021.3075856.

S. Nah, S. Son, S. Lee, R. Timofte, and K. M. Lee. NTIRE 2021 challenge on image deblurring, 2021. URL https://arxiv.org/abs/2104.14854.

B. N. Parlett. The Symmetric Eigenvalue Problem. SIAM, 1998.

M. Phuong and C. H. Lampert. Functional vs. parametric equivalence of ReLU networks. In International Conference on Learning Representations, 2019.

A. Pinkus. N-widths in Approximation Theory, volume 7.
Springer Science & Business Media, 2012.

B. T. Polyak. Gradient methods for the minimisation of functionals. USSR Computational Mathematics and Mathematical Physics, 3(4):864–878, 1963.

B. T. Polyak. Introduction to Optimization. 1987.

D. V. Raman, J. Anderson, and A. Papachristodoulou. Delineating parameter unidentifiabilities in complex models. Physical Review E, 2017.

A. Raue, C. Kreutz, T. Maiwald, J. Bachmann, M. Schilling, U. Klingmüller, and J. Timmer. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics, 2009.

W. C. Rheinboldt. MANPAK: A set of algorithms for computations on implicitly defined manifolds. Computers & Mathematics with Applications, 1996.

T. J. Rothenberg. Identification in parametric models. Econometrica: Journal of the Econometric Society, 1971.

E. Y. Sidky and X. Pan. Report on the AAPM deep-learning spectral CT grand challenge. Medical Physics, 2024. URL https://arxiv.org/abs/2212.06718.

G. W. Stewart and J. G. Sun. Matrix Perturbation Theory. Academic Press, Boston, 1990. ISBN 978-0126702309.

J. Tachella, M. Terris, S. Hurault, A. Wang, D. Chen, M. H. Nguyen, M. Song, T. Davies, L. Davy, J. Dong, et al. DeepInverse: A Python package for solving imaging inverse problems with deep learning. arXiv preprint arXiv:2505.20160, 2025.

D. Y. W. Thong, C. K. Mbakam, and M. Pereyra. Do Bayesian imaging methods report trustworthy probabilities? arXiv preprint arXiv:2405.08179, 2024.

M. K. Transtrum and P. Qiu. Model reduction by manifold boundaries. Physical Review Letters, 2014.

L. Van den Dries. Tame Topology and O-minimal Structures, volume 248. Cambridge University Press, 1998.

D. J. Venzon and S. H. Moolgavkar. A method for computing profile-likelihood-based confidence intervals. Journal of the Royal Statistical Society: Series C (Applied Statistics), 37(1):87–94, 1988.

A. F.
Villaverde, N. D. Evans, M. J. Chappell, and J. R. Banga. Input-dependent structural identifiability of nonlinear systems. IEEE Control Systems Letters, 3(2):272–277, 2018.

P. Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23:1661–1674, 2011. URL https://doi.org/10.1162/NECO_a_00142.

F.-G. Wieland, A. L. Hauber, M. Rosenblatt, C. Tönsing, and J. Timmer. On structural and practical identifiability. Current Opinion in Systems Biology, 2021.

K. Zhang, Y. Li, W. Zuo, L. Zhang, L. Van Gool, and R. Timofte. Plug-and-play image restoration with deep denoiser prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6360–6376, 2021.