# Multi-Fidelity Covariance Estimation in the Log-Euclidean Geometry

Aimee Maurais (1), Terrence Alsup (2), Benjamin Peherstorfer (2), Youssef Marzouk (1)

(1) Massachusetts Institute of Technology, Cambridge, MA, USA. (2) Courant Institute of Mathematical Sciences, New York University, New York, NY, USA. Correspondence to: Aimee Maurais, Terrence Alsup.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

Abstract. We introduce a multi-fidelity estimator of covariance matrices that employs the log-Euclidean geometry of the symmetric positive-definite manifold. The estimator fuses samples from a hierarchy of data sources of differing fidelities and costs for variance reduction while guaranteeing definiteness, in contrast with previous approaches. The new estimator makes covariance estimation tractable in applications where simulation or data collection is expensive; to that end, we develop an optimal sample allocation scheme that minimizes the mean-squared error of the estimator given a fixed budget. Guaranteed definiteness is crucial to metric learning, data assimilation, and other downstream tasks. Evaluations of our approach using data from physical applications (heat conduction, fluid dynamics) demonstrate more accurate metric learning and speedups of more than one order of magnitude compared to benchmarks.

1. Introduction

Covariance estimation is a fundamental task for understanding the relationships between multiple features of data points. It arises in a wide range of machine learning applications such as metric learning (Zadeh et al., 2016; Huang et al., 2015), graphical models (Loh & Bühlmann, 2014; Ravikumar et al., 2011), and classification (Hastie et al., 2001), as well as in spatial statistics (Cressie, 2015) and in science and engineering applications such as filtering and data assimilation (Schillings & Stuart, 2017; Bergemann & Reich, 2010; Law et al., 2015). The canonical Monte Carlo approach to estimating the covariance matrix $\Sigma = \operatorname{Cov}[y]$ of a random variable $y$ with $d$ components is to take $n$ samples $y_1, \dots, y_n$ and compute the sample covariance matrix

$$\hat{\Sigma}_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})^\top, \qquad (1)$$

where $\bar{y} = \mathbb{E}[y]$. (If the mean of $y$ is unknown, one can replace $\bar{y}$ with the sample mean $\frac{1}{n}\sum_{i=1}^{n} y_i$ and normalize (1) with $1/(n-1)$.) The mean-squared error (MSE) of the estimator $\hat{\Sigma}_n$ in the Euclidean metric, i.e., $\mathbb{E}[d_F(\hat{\Sigma}_n, \Sigma)^2]$ where $d_F(A, B) = \|A - B\|_F$ and $\|\cdot\|_F$ denotes the Frobenius norm, decays as $O(1/n)$ if the samples are independent and identically distributed (i.i.d.). Furthermore, if $\Sigma$ lies in the space $S^d_{++}$ of symmetric (strictly) positive definite (SPD) $d \times d$ matrices, then the sample covariance matrix $\hat{\Sigma}_n$ is in $S^d_{++}$ almost surely for $n > d$.

We are interested in situations where obtaining a realization of $y$ incurs a high cost $c_0 > 0$ and thus straightforward application of the Monte Carlo estimator (1) becomes computationally intractable. High sampling cost is a pervasive issue in science and engineering, where sampling typically corresponds to measuring the response of processes and systems that must be either numerically simulated or physically observed. We note that there exists a wide range of techniques for covariance estimation in high dimensions (Bickel & Levina, 2008; Ledoit & Wolf, 2012; Loh & Wainwright, 2012; Fan et al., 2011; Tsiligkaridis & Hero, 2013; Loh & Bühlmann, 2014; Donoho et al., 2018). Our focus here instead is on situations where drawing each sample incurs high cost; see also the discussion in the Conclusions.
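As a point of reference for the multi-fidelity construction that follows, the snippet below is a minimal, self-contained sketch (it is not the authors' code) of the single-fidelity estimator (1) with a known mean, illustrating the $O(1/n)$ decay of its Frobenius-norm MSE; the matrix `Sigma` is an arbitrary SPD matrix chosen purely for illustration.

```python
# Minimal illustration of the Monte Carlo covariance estimator (1) and its
# O(1/n) Frobenius-norm MSE decay; Sigma below is an arbitrary SPD matrix.
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)              # a fixed SPD "true" covariance

def sample_cov(Y, mean):
    """Sample covariance (1) with known mean, normalized by n."""
    Z = Y - mean
    return Z.T @ Z / Y.shape[0]

for n in (10, 100, 1000):
    errs = []
    for _ in range(200):
        Y = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
        errs.append(np.linalg.norm(sample_cov(Y, np.zeros(d)) - Sigma, "fro") ** 2)
    print(n, np.mean(errs))                  # empirical MSE shrinks roughly like 1/n
```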
Multi-fidelity estimation. The opportunity that we exploit is that in many applications there is a multi-fidelity hierarchy of data sources from which correlated auxiliary samples can be obtained; see Peherstorfer et al. (2018) for a survey. Multi-fidelity approaches have found increasing utility in machine learning, e.g., for optimization (Poloczek et al., 2017; Wu et al., 2020; Li et al., 2020), classification and sequential learning (Marques et al., 2018; Gundersen et al., 2021; Palizhati et al., 2022), and sampling (Cai & Adams, 2022; Prescott & Baker, 2020; Alsup et al., 2022).

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability triple. In the multi-fidelity setting, in addition to the original high-fidelity random variable $y \equiv y^{(0)} : \Omega \to \mathbb{R}^d$ generating the sample set $\{y^{(0)}(\omega_i)\}_{i=1}^{n_0}$, one has access to a number of low-fidelity surrogate random variables $y^{(1)}, \dots, y^{(L)} : \Omega \to \mathbb{R}^d$ and obtains corresponding sets of samples $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_\ell}$, $\ell = 1, \dots, L$. For a particular $\omega \in \Omega$, the realizations $y^{(0)}(\omega), y^{(1)}(\omega), \dots, y^{(L)}(\omega)$ are correlated, but the realizations within each set $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_\ell}$ are i.i.d., for $\ell = 0, \dots, L$. We assume that the surrogates $y^{(1)}, \dots, y^{(L)}$ are sorted in order of decreasing fidelity to $y^{(0)}$ as measured by correlation coefficients $1 \geq |\rho_1| \geq |\rho_2| \geq \cdots \geq |\rho_L|$. Here $\rho_\ell$ is a notion of multivariate correlation between $y^{(0)}(\omega)$ and $y^{(\ell)}(\omega)$, $\ell = 1, \dots, L$, that will be formally defined in Section 3.3. Corresponding to decreasing fidelity, the surrogate random variables have decreasing sampling costs, $c_0 \geq c_1 \geq \cdots \geq c_L$.

Loss of definiteness in Euclidean multi-fidelity covariance estimation. Direct application of multi-level and multi-fidelity Monte Carlo estimation (Giles, 2008; Cliffe et al., 2011; Teckentrup et al., 2013; Ng & Willcox, 2014) based on control variates to (co)variance estimation has been proposed in, e.g., Bierig & Chernov (2015); Qian et al. (2018); Mycek & De Lozzo (2019). These multi-fidelity estimators, however, rely on differences of single-fidelity Monte Carlo estimators, which can lead to a loss of definiteness of the estimated covariance matrix, as we will detail in Section 2. Ignoring this loss of definiteness can result in errors in downstream tasks (e.g., instabilities in Kalman filtering, distances becoming negative in a learned metric, inadmissible graph structures). Accordingly, post-processing strategies (Hoel et al., 2016; Chernov et al., 2021) have been proposed that eliminate negative eigenvalues, but elimination of eigenvalues introduces potential for large errors, requires hand-tuning of thresholds, and often, as the authors note, incurs high computational cost; see also Qi & Sun (2006); Popov et al. (2021).

Our contribution: guaranteed positive multi-fidelity log-Euclidean covariance estimation. We introduce a multi-fidelity covariance estimator based on control variates in the log-Euclidean geometry for $S^d_{++}$ (Arsigny et al., 2006). The log-Euclidean geometry equips $S^d_{++}$ with vector-space structure by placing it in one-to-one correspondence with $S^d$, the vector space of $d \times d$ symmetric matrices, and defining notions of logarithmic addition and scalar multiplication for elements of $S^d_{++}$. In particular, we make use of this vector space structure to safely subtract sample covariance matrices as part of a control variate construction in the tangent space to $S^d_{++}$, which we identify with $S^d$.
The log-Euclidean vector space can be equipped with the log-Euclidean metric for SPD matrices, which itself induces Riemannian structure on $S^d_{++}$. An additional advantage of the log-Euclidean geometry used here, as compared to many other geometries for $S^d_{++}$ (see Section 3.1), is that computations remain efficient: the most complex operations required are matrix exponentials and logarithms.

Summary of main contributions. The main contributions and key features of this work are summarized as follows: (a) We introduce a multi-fidelity estimator that preserves definiteness of covariance matrices in the finite-sample regime and that has first-order minimal mean-squared error (MSE) in the log-Euclidean metric. (b) We provide analysis that leads to a first-order optimal sample allocation between the high-fidelity and surrogate models for gaining orders-of-magnitude speedups compared to using high-fidelity samples alone. (c) We demonstrate the algorithmic benefits of the log-Euclidean geometry by showing that our estimator can be implemented with few lines of code (Appendix F), relying on standard numerical linear algebra routines. This low barrier to implementation allows the estimator to be readily plugged into existing code with minimal effort and even as a post-processing step.

2. Multi-Fidelity Covariance Estimation in Euclidean Geometry and Loss of Definiteness

Recall from the introduction that we generate a set of high-fidelity samples $\{y^{(0)}(\omega_i)\}_{i=1}^{n_0}$ and sets of surrogate samples $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_\ell}$ over different fidelity levels $\ell = 1, \dots, L$. The classical approach to multi-fidelity estimation, i.e., in the Euclidean geometry, is to use the surrogate samples to define control variates that reduce the variance of the standard Monte Carlo estimator (1) (Bierig & Chernov, 2015; Qian et al., 2018; Mycek & De Lozzo, 2019). This approach yields the Euclidean multi-fidelity (EMF) covariance estimator

$$\hat{\Sigma}^{\mathrm{EMF}}_{\boldsymbol{n}} = \hat{\Sigma}^{(0)}_{n_0} + \sum_{\ell=1}^{L} \alpha_\ell \left( \hat{\Sigma}^{(\ell)}_{n_\ell} - \hat{\Sigma}^{(\ell)}_{n_{\ell-1}} \right), \qquad (2)$$

where $\alpha = [\alpha_1, \dots, \alpha_L]^\top \in \mathbb{R}^L$ are the control variate weights and

$$\hat{\Sigma}^{(\ell)}_{n_\ell} = \frac{1}{n_\ell} \sum_{i=1}^{n_\ell} \left( y^{(\ell)}(\omega_i) - \bar{y}^{(\ell)}_{n_\ell} \right)\left( y^{(\ell)}(\omega_i) - \bar{y}^{(\ell)}_{n_\ell} \right)^\top, \qquad \hat{\Sigma}^{(\ell)}_{n_{\ell-1}} = \frac{1}{n_{\ell-1}} \sum_{i=1}^{n_{\ell-1}} \left( y^{(\ell)}(\omega_i) - \bar{y}^{(\ell)}_{n_{\ell-1}} \right)\left( y^{(\ell)}(\omega_i) - \bar{y}^{(\ell)}_{n_{\ell-1}} \right)^\top \qquad (3)$$

are sample covariance matrices computed from $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_\ell}$ and $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_{\ell-1}}$, $\ell = 1, \dots, L$, where $\bar{y}^{(\ell)}_{m}$ denotes the sample mean of the first $m$ samples at level $\ell$. We require for well-definedness that $n_\ell \geq n_{\ell-1}$, $\ell = 1, \dots, L$. For levels $\ell, t \in \{0, \dots, L\}$ and integers $m_i, m_j \in \mathbb{Z}^+$, notice that the sample covariance matrices

$$\hat{\Sigma}^{(\ell)}_{m_i} = \frac{1}{m_i} \sum_{i'=1}^{m_i} \left( y^{(\ell)}(\omega_{i'}) - \bar{y}^{(\ell)}_{m_i} \right)\left( y^{(\ell)}(\omega_{i'}) - \bar{y}^{(\ell)}_{m_i} \right)^\top, \qquad \hat{\Sigma}^{(t)}_{m_j} = \frac{1}{m_j} \sum_{j'=1}^{m_j} \left( y^{(t)}(\omega_{j'}) - \bar{y}^{(t)}_{m_j} \right)\left( y^{(t)}(\omega_{j'}) - \bar{y}^{(t)}_{m_j} \right)^\top$$

are correlated because they are constructed from evaluations of $y^{(\ell)}$ and $y^{(t)}$ on the common set of events $\{\omega_i\}_{i=1}^{\min\{m_i, m_j\}}$. This correlation is the key ingredient to achieving variance reduction in (2) over equivalent-cost single-fidelity estimators; see Section 3.3.

Loss of definiteness. The use of regular subtraction and scalar multiplication in Equation (2) treats the symmetric positive definite matrices $\hat{\Sigma}^{(\ell)}_{n_\ell}$ and $\hat{\Sigma}^{(\ell)}_{n_{\ell-1}}$, $\ell = 0, \dots, L$, as elements of the Euclidean vector space $S^d$. However, SPD matrices do not constitute a vector space under the Euclidean geometry, and thus the Euclidean multi-fidelity estimator (2) may become indefinite due to the presence of subtraction. That is, positive definiteness of (2) in the finite-sample regime cannot be guaranteed even if its constituent single-fidelity sample covariance matrices are definite. The sketch below illustrates how this indefiniteness can arise in a bifidelity toy example.
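The following is a small, self-contained illustration of this failure mode, not taken from the paper: two hypothetical toy models `G_hi` and `G_lo` share the same underlying realizations $\omega_i$ (which is what makes their sample covariances correlated), and the plain subtraction in (2), here with an arbitrary, non-optimized weight, can push eigenvalues of the estimate below zero in the small-sample regime.

```python
# Toy bifidelity illustration of the EMF estimator (2) and its possible loss of
# definiteness. G_hi and G_lo are hypothetical models chosen only for demonstration.
import numpy as np

rng = np.random.default_rng(1)
d = 6

def G_hi(omega):                      # hypothetical high-fidelity response
    return np.tanh(omega)

def G_lo(omega):                      # hypothetical cheap surrogate, correlated with G_hi
    return omega

def sample_cov(Y):
    Yc = Y - Y.mean(axis=0)
    return Yc.T @ Yc / (Y.shape[0] - 1)

n0, n1 = 8, 200                       # n1 >= n0; n0 deliberately close to d
alpha = 1.0                           # arbitrary (non-optimized) control-variate weight
n_indefinite = 0
for _ in range(500):
    omegas = rng.standard_normal((n1, d))        # shared realizations omega_i
    y_hi = G_hi(omegas[:n0])                     # y^(0)(omega_i), i = 1, ..., n0
    y_lo = G_lo(omegas)                          # y^(1)(omega_i), i = 1, ..., n1
    emf = sample_cov(y_hi) + alpha * (sample_cov(y_lo) - sample_cov(y_lo[:n0]))
    n_indefinite += np.linalg.eigvalsh(emf).min() < 0
print(f"EMF estimate indefinite in {n_indefinite} of 500 trials")
```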
3. Log-Euclidean Multi-Fidelity (LEMF) Covariance Estimator

3.1. Log-Euclidean Geometry and its Benefits

We begin by orienting ourselves within the log-Euclidean geometry for $S^d_{++}$ as defined in Arsigny et al. (2006). For $A, B \in S^d_{++}$ and $\lambda \in \mathbb{R}$ we have notions of logarithmic addition $\oplus$ and scalar multiplication $\odot$,

$$A \oplus B = \operatorname{Exp}(\operatorname{Log} A + \operatorname{Log} B), \qquad \lambda \odot A = \operatorname{Exp}(\lambda \operatorname{Log} A) = A^{\lambda}, \qquad (4)$$

corresponding to regular addition and scalar multiplication of $\operatorname{Log} A$ and $\operatorname{Log} B$ in $S^d$. Here $\operatorname{Exp} : S^d \to S^d_{++}$ is the matrix exponential and $\operatorname{Log} : S^d_{++} \to S^d$ is its inverse (Arsigny et al., 2007, Def. 2.1). The mappings $\operatorname{Log}$ and $\operatorname{Exp}$ place $S^d$ and $S^d_{++}$ in one-to-one correspondence, enabling $S^d_{++}$ to make use of the vector space structure of $S^d$. The definitions of $\oplus$ and $\odot$ in (4) satisfy the axioms of a vector space, and we can equip this vector space $S^d_{++}(\oplus, \odot)$ with the log-Euclidean metric,

$$d_{\mathrm{LE}}(A, B) = \| \operatorname{Log} A - \operatorname{Log} B \|_F. \qquad (5)$$

Computations related to the log-Euclidean geometry can be performed economically because efficient algorithms exist for computing matrix exponentials and logarithms. Efficient computation is a major advantage as compared to the geometries induced by other metrics such as the affine-invariant metric (Bhatia, 2007; Huang et al., 2015).

Review of other geometries for $S^d_{++}$. The set of $d \times d$ symmetric positive definite matrices $S^d_{++}$ forms a Riemannian manifold embedded in the vector space of $d \times d$ symmetric matrices $S^d$. One can equip $S^d_{++}$ with a number of geometries in addition to the Euclidean (which we have shown to be problematic for multi-fidelity estimation) and log-Euclidean geometries discussed here. There is, e.g., the affine-invariant geometry (Bhatia, 2007), often billed as the canonical choice, which gives rise to the affine-invariant metric,

$$d_{\mathrm{Aff}}(A, B) = \| \operatorname{Log}(A^{-1} B) \|_F. \qquad (6)$$

Other choices include the Bures-Wasserstein geometry, arising from the $L^2$-Wasserstein distance between multivariate, nondegenerate, mean-zero Gaussians (Malagò et al., 2018), and the Log-Cholesky geometry, obtained by parametrizing SPD matrices in terms of their Cholesky factors and defining a Riemannian metric on the space of lower-triangular matrices with positive diagonal entries (Lin, 2019). The choice of geometry for $S^d_{++}$ is application-dependent and often depends on factors such as the cost to compute geodesic distance, the availability of the Riemannian exponential and logarithmic maps in closed form, and whether the so-called swelling effect (Arsigny et al., 2006) is an issue. We choose the log-Euclidean geometry for its computational advantages, described at the end of the previous paragraph.

3.2. Log-Euclidean Multi-Fidelity Estimation

We construct the log-Euclidean multi-fidelity (LEMF) covariance estimator via linear control variates in the log-Euclidean geometry for $S^d_{++}$,

$$\hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}} = \hat{\Sigma}^{(0)}_{n_0} \oplus \bigoplus_{\ell=1}^{L} \alpha_\ell \odot \left( \hat{\Sigma}^{(\ell)}_{n_\ell} \ominus \hat{\Sigma}^{(\ell)}_{n_{\ell-1}} \right) = \operatorname{Exp}\!\left( \operatorname{Log} \hat{\Sigma}^{(0)}_{n_0} + \sum_{\ell=1}^{L} \alpha_\ell \left( \operatorname{Log} \hat{\Sigma}^{(\ell)}_{n_\ell} - \operatorname{Log} \hat{\Sigma}^{(\ell)}_{n_{\ell-1}} \right) \right), \qquad (7)$$

where we have used the convention $A \ominus B = A \oplus \left( (-1) \odot B \right)$. The estimator (7) has the same algebraic structure as (2), but is guaranteed to be positive definite at finite $\boldsymbol{n} = (n_0, n_1, \dots, n_L)$, as the following proposition shows:

Proposition 3.1. The log-Euclidean multi-fidelity covariance estimator (7) exists and is positive definite almost surely whenever $n_0, n_1, \dots, n_L > d$.

In addition to interpreting (7) as a linear control variate construction in the log-Euclidean geometry for $S^d_{++}$, we can equivalently view the LEMF estimator as a Euclidean linear control variate estimate of $\operatorname{Log} \Sigma$ on $S^d$,

$$\operatorname{Log} \hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}} = \operatorname{Log} \hat{\Sigma}^{(0)}_{n_0} + \sum_{\ell=1}^{L} \alpha_\ell \left( \operatorname{Log} \hat{\Sigma}^{(\ell)}_{n_\ell} - \operatorname{Log} \hat{\Sigma}^{(\ell)}_{n_{\ell-1}} \right), \qquad (8)$$

which we then map to $S^d_{++}$ via the matrix exponential; a brief code sketch of this construction is given below.
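The following is a compact sketch, under the assumption that SciPy's `logm` and `expm` are acceptable implementations of $\operatorname{Log}$ and $\operatorname{Exp}$, of the log-Euclidean operations (4)-(5) and of the bifidelity case of the LEMF estimator (7); the authors' full implementation is linked in the Code Availability section and sketched in Appendix F, so treat this block as an independent illustration rather than the reference code.

```python
# Sketch of the log-Euclidean operations (4)-(5) and the bifidelity LEMF estimator (7).
import numpy as np
from scipy.linalg import logm, expm

def le_add(A, B):        # logarithmic addition (4): Exp(Log A + Log B)
    return expm(logm(A) + logm(B))

def le_scale(lam, A):    # logarithmic scalar multiplication (4): equals A**lam
    return expm(lam * logm(A))

def d_le(A, B):          # log-Euclidean distance (5)
    return np.linalg.norm(logm(A) - logm(B), "fro")

def lemf_cov(y_hi, y_lo, alpha):
    """Bifidelity LEMF estimator (7); requires n0, n1 > d so the sample
    covariances are SPD (Proposition 3.1)."""
    n0 = y_hi.shape[0]
    scm = lambda Y: np.cov(Y, rowvar=False)   # note: np.cov normalizes by n - 1
    M = logm(scm(y_hi)) + alpha * (logm(scm(y_lo)) - logm(scm(y_lo[:n0])))
    M = 0.5 * (M + M.T)                       # symmetrize against round-off
    return expm(M)                            # symmetric positive definite by construction
```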
Variance reduction in (8) is achieved on $S^d$ via the standard Euclidean control variate framework and then propagated to $S^d_{++}$ via $\operatorname{Exp}(\cdot)$. Furthermore, (7) has the form of a log-Euclidean Fréchet average of $\hat{\Sigma}^{(0)}_{n_0}$ and $L$ small perturbations induced by the surrogate sample covariance matrices; see Appendix A for details. The LEMF estimator requires computing $2L + 1$ matrix logarithms and one matrix exponential. Note that obtaining the sample sets $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_\ell}$, $\ell = 0, \dots, L$, is typically much more computationally expensive than computing matrix logarithms and exponentials.

3.3. The MSE of the LEMF Estimator

For the following analysis we assume, for ease of exposition and without loss of generality, that $y^{(\ell)}$, $\ell = 0, \dots, L$, have zero mean. As introduced in Section 1, the surrogates $y^{(1)}, \dots, y^{(L)}$ are ordered by decreasing fidelity, $1 = \rho_0 > |\rho_1| > \cdots > |\rho_L| \geq \rho_{L+1} = 0$, where $\rho_\ell$ is a multivariate generalization of the Pearson correlation coefficient,

$$\rho_\ell = \frac{\operatorname{Tr}\left( \operatorname{Cov}\left[ y^{(0)} (y^{(0)})^\top, \, y^{(\ell)} (y^{(\ell)})^\top \right] \right)}{\sigma_0 \, \sigma_\ell}, \qquad (9)$$

with generalized variances defined as

$$\sigma_\ell^2 = \operatorname{Tr}\left( \operatorname{Cov}\left[ y^{(\ell)} (y^{(\ell)})^\top \right] \right), \qquad \ell = 0, \dots, L. \qquad (10)$$

The covariance of $y^{(\ell)}(y^{(\ell)})^\top$, $\ell = 0, \dots, L$, is a symmetric positive semidefinite linear operator from $S^d$ to $S^d$,

$$\operatorname{Cov}\left[ y^{(\ell)} (y^{(\ell)})^\top \right] = \mathbb{E}\left[ \left( y^{(\ell)} (y^{(\ell)})^\top - \Sigma^{(\ell)} \right) \otimes \left( y^{(\ell)} (y^{(\ell)})^\top - \Sigma^{(\ell)} \right) \right], \qquad (11)$$

where we define $\Sigma^{(\ell)} = \mathbb{E}[y^{(\ell)} (y^{(\ell)})^\top]$. Similarly, the cross-covariance between $y^{(\ell)}(y^{(\ell)})^\top$ and $y^{(m)}(y^{(m)})^\top$, $\ell, m \in \{0, \dots, L\}$, is

$$\operatorname{Cov}\left[ y^{(\ell)} (y^{(\ell)})^\top, \, y^{(m)} (y^{(m)})^\top \right] = \mathbb{E}\left[ \left( y^{(\ell)} (y^{(\ell)})^\top - \Sigma^{(\ell)} \right) \otimes \left( y^{(m)} (y^{(m)})^\top - \Sigma^{(m)} \right) \right]. \qquad (12)$$

In writing (11) and (12) we do not invoke any particular representation for linear operators on $S^d$. Rather, we only assume that $\otimes$ is an outer product for symmetric matrices compatible with the Frobenius inner product, that is, $\operatorname{Tr}(A \otimes B) = \langle A, B \rangle_F$ for $A, B \in S^d$.

The following proposition derives the MSE of the log-Euclidean estimator in the log-Euclidean metric (5).

Proposition 3.2. Assume that $\|\Sigma - I\|_F < h/4$ with $h < 1$ and $\|\Sigma - \Sigma^{(\ell)}\|_F < h/4$ for $\ell = 1, \dots, L$. Then, for a coefficient vector $\alpha \in \mathbb{R}^L$ and sufficiently large numbers of samples $\boldsymbol{n} \in \mathbb{N}^{L+1}$, the MSE in the log-Euclidean metric of the LEMF estimator defined in (7) is

$$\mathbb{E}\left[ d_{\mathrm{LE}}\left( \hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}}, \Sigma \right)^2 \right] = \mathbb{E}\left[ \left\| \operatorname{Log} \hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}} - \operatorname{Log} \Sigma \right\|_F^2 \right] = \frac{\sigma_0^2}{n_0} + \sum_{\ell=1}^{L} \left( \frac{1}{n_{\ell-1}} - \frac{1}{n_\ell} \right) \left( \alpha_\ell^2 \sigma_\ell^2 - 2 \alpha_\ell \rho_\ell \sigma_\ell \sigma_0 \right) + O(h^2). \qquad (13)$$

Note that the technical condition $\|\Sigma - I\|_F < h/4$ can always be established by re-scaling $\tilde{y}^{(\ell)} = C^{-1/2} y^{(\ell)}$ with a matrix $C \in S^d_{++}$ and using $C$ as the base of the matrix logarithm and exponential in the estimator (7); we consider $I$ as the base here for ease of exposition.

Below we relate the MSE of the Euclidean estimator $\hat{\Sigma}^{\mathrm{EMF}}_{\boldsymbol{n}}$ (2) in the Euclidean metric with the MSE of the log-Euclidean estimator $\hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}}$ (7) in the log-Euclidean metric:

Corollary 3.3. The MSE in the Euclidean metric of the Euclidean multi-fidelity estimator (2) with sample size vector $\boldsymbol{n}$ and coefficients $\alpha$ is

$$\mathbb{E}\left[ d_F\left( \hat{\Sigma}^{\mathrm{EMF}}_{\boldsymbol{n}}, \Sigma \right)^2 \right] = \mathbb{E}\left[ \left\| \hat{\Sigma}^{\mathrm{EMF}}_{\boldsymbol{n}} - \Sigma \right\|_F^2 \right] = \frac{\sigma_0^2}{n_0} + \sum_{\ell=1}^{L} \left( \frac{1}{n_{\ell-1}} - \frac{1}{n_\ell} \right) \left( \alpha_\ell^2 \sigma_\ell^2 - 2 \alpha_\ell \rho_\ell \sigma_\ell \sigma_0 \right). \qquad (14)$$

Corollary 3.3 shows that the MSE of the LEMF estimator in the log-Euclidean metric is equal to the MSE of the EMF estimator in the Euclidean metric up to first order. Note that it does not comment on a cross-comparison (e.g., the MSE of the LEMF estimator in the Euclidean metric and vice versa); we will explore such comparisons numerically in Section 4. A sketch of how the generalized variances (10) and correlations (9) can be estimated from pilot samples is given below.
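As a practical aside (not part of the original exposition), the quantities (9) and (10) can be estimated empirically from paired pilot samples using the identity $\operatorname{Tr}(\operatorname{Cov}[A, B]) = \mathbb{E}\,\langle A - \mathbb{E}A,\, B - \mathbb{E}B \rangle_F$ from Appendix C; the sketch below does this for a single surrogate level and assumes the pilot samples are paired, i.e., evaluated on shared realizations.

```python
# Empirical estimates of the generalized variances (10) and correlation (9)
# from N paired pilot samples of y^(0) and y^(1).
import numpy as np

def pilot_statistics(Y0, Y1):
    """Y0, Y1: (N, d) pilot samples of y^(0), y^(1) evaluated on the same omegas."""
    M0 = np.einsum("ni,nj->nij", Y0, Y0)     # per-sample outer products y y^T
    M1 = np.einsum("ni,nj->nij", Y1, Y1)
    M0c = M0 - M0.mean(axis=0)               # center the matrix-valued samples
    M1c = M1 - M1.mean(axis=0)
    N = Y0.shape[0]
    var0 = np.sum(M0c * M0c) / (N - 1)       # sigma_0^2 = Tr(Cov[y0 y0^T])
    var1 = np.sum(M1c * M1c) / (N - 1)       # sigma_1^2 = Tr(Cov[y1 y1^T])
    cross = np.sum(M0c * M1c) / (N - 1)      # Tr(Cov[y0 y0^T, y1 y1^T])
    rho1 = cross / np.sqrt(var0 * var1)      # generalized correlation (9)
    return np.sqrt(var0), np.sqrt(var1), rho1
```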
3.4. Optimal Sample Allocation for the LEMF

Proposition 3.2 and Corollary 3.3 show the dependence of the MSE in the log-Euclidean and Euclidean metrics of the LEMF and EMF estimators on the sample allocations $\boldsymbol{n}$ and coefficients $\alpha$. We now derive $\boldsymbol{n}^*$ and $\alpha^*$ that minimize the respective MSEs up to first order and so achieve an optimal sample allocation across the hierarchy of data sources. We start with $\alpha^*$: because the first-order approximation of the MSE in (13) and the MSE in (14) are both quadratic in the coefficients $\alpha_1, \dots, \alpha_L$, we can directly minimize over $\alpha_1, \dots, \alpha_L$ and obtain

$$\alpha^* = [\alpha_1^*, \dots, \alpha_L^*]^\top, \qquad \alpha_\ell^* = \rho_\ell \, \frac{\sigma_0}{\sigma_\ell}, \qquad (15)$$

which are independent of the sample sizes. In preparation for finding $\boldsymbol{n}^*$, we define the costs of the multi-fidelity estimators as

$$c(\boldsymbol{n}) = \sum_{\ell=0}^{L} n_\ell \, c_\ell. \qquad (16)$$

Following Peherstorfer et al. (2016), Proposition 3.4 derives the optimal sample allocation $\boldsymbol{n}^*$ to obtain an estimator with first-order minimal MSE and costs $c(\boldsymbol{n}^*) \leq B$ for a fixed computational budget $B > 0$. Note that in writing (16), we assume that the costs of generating samples of the random variables $y^{(0)}, \dots, y^{(L)}$ are much higher than the cost of constructing a multi-fidelity covariance estimate, and in particular eclipse the cost of computing the matrix exponential and the $2L + 1$ matrix logarithms required by the LEMF estimator (7). This situation is often the case, e.g., in science and engineering applications where generating samples may correspond to evaluating computationally intensive physics-based models. One could explicitly account for the cost of constructing the LEMF estimator by inserting a modified budget of $\hat{B} = B - c_{\log/\exp}$ into Proposition 3.4, where $c_{\log/\exp}$ is the cost of the matrix logarithms and exponential, and subsequently obtain a result analogous to Corollary 3.5 with a more complicated condition for when the LEMF estimator outperforms an equivalent-cost high-fidelity estimator.

Proposition 3.4. Let $B > 0$ denote the computational budget and let Proposition 3.2 apply. The first-order optimal sample allocation $\boldsymbol{n}^*$ for the LEMF estimator (7) that solves

$$\min_{\boldsymbol{n} \in \mathbb{R}^{L+1}} \; \mathbb{E}\left[ d_{\mathrm{LE}}\left( \hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}}, \Sigma \right)^2 \right] \quad \text{such that} \quad c(\boldsymbol{n}) \leq B \qquad (17)$$

is, with the conventions $\rho_0 = 1$ and $\rho_{L+1} = 0$,

$$n_\ell^* = \frac{B}{\displaystyle \sum_{i=0}^{L} c_i \sqrt{\frac{c_0 \left( \rho_i^2 - \rho_{i+1}^2 \right)}{c_i \left( 1 - \rho_1^2 \right)}}} \; \sqrt{\frac{c_0 \left( \rho_\ell^2 - \rho_{\ell+1}^2 \right)}{c_\ell \left( 1 - \rho_1^2 \right)}}, \qquad \ell = 0, \dots, L. \qquad (18)$$

Because the MSE of the Euclidean estimator (in $d_F(\cdot, \cdot)$) is equal to the first-order approximation of the MSE of the log-Euclidean estimator (in $d_{\mathrm{LE}}(\cdot, \cdot)$), we directly obtain that (18) is the optimal sample allocation for the Euclidean estimator as well. In practice, we round either up or down and so use either $\lceil \boldsymbol{n}^* \rceil$ or $\lfloor \boldsymbol{n}^* \rfloor$. If the correlation coefficients $\rho_\ell$ and costs $c_\ell$, $\ell = 0, \dots, L$, are known, one can determine the optimal $\alpha^*$ and $\boldsymbol{n}^*$ for a fixed budget $B$; or, alternatively, determine $\alpha^*$ and $\boldsymbol{n}^*$ so that the MSE of the LEMF estimator is below a threshold $\epsilon > 0$. In situations when $\rho_\ell$ and $c_\ell$ are unknown, it is common to estimate them in pilot studies (Cliffe et al., 2011; Peherstorfer et al., 2016) and even to reuse the pilot samples in the actual estimator (Konrad et al., 2022). A sketch of the computation of $\alpha^*$ and $\boldsymbol{n}^*$ is given below.
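The following sketch (mine, following the optimal weights (15) and allocation (18)) computes $\alpha^*$ and a real-valued $\boldsymbol{n}^*$ from given correlations, generalized standard deviations, costs, and budget; the function name and interface are illustrative assumptions, not the authors' API.

```python
# First-order optimal control-variate weights (15) and sample allocation (18).
import numpy as np

def optimal_weights_and_allocation(rho, sigma, c, budget):
    """
    rho   : [1, rho_1, ..., rho_L] generalized correlations (rho_0 = 1)
    sigma : [sigma_0, ..., sigma_L] generalized standard deviations
    c     : [c_0, ..., c_L] per-sample costs
    """
    rho = np.asarray(rho, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    c = np.asarray(c, dtype=float)
    alpha = rho[1:] * sigma[0] / sigma[1:]              # equation (15)
    rho_ext = np.append(rho, 0.0)                       # convention rho_{L+1} = 0
    r = np.sqrt(c[0] * (rho_ext[:-1] ** 2 - rho_ext[1:] ** 2)
                / (c * (1.0 - rho[1] ** 2)))            # per-level ratios from (18)
    n = budget * r / np.sum(c * r)                      # real-valued; round in practice
    return alpha, n

# Hypothetical usage with the correlations and costs of the motivating example
# (Section 4.1); `sigma` would come from the closed forms in Appendix G or a pilot study:
# alpha_opt, n_opt = optimal_weights_and_allocation(
#     rho=[1.0, 0.93, 0.74, 0.58], sigma=sigma, c=[1.0, 1e-2, 1e-3, 1e-4], budget=15)
```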
3.5. Discussion of the LEMF Estimator

Interpretation. Proposition 3.4 shows that when the correlation coefficients (9) are close to 1, the LEMF estimator (7) allocates more of the budget to the cheaper surrogate samples. In the extreme case where $\rho_\ell = 0$ for $\ell = 1, \dots, L$, the entire budget is allocated to obtaining high-fidelity samples and one recovers the single-fidelity estimator (1) with $n_0 = B / c_0$. The following corollary shows that the correlation coefficients $\rho_\ell$ and costs $c_\ell$ dictate whether the proposed LEMF estimator leads to lower MSE than the equivalent-cost single (high)-fidelity estimator.

Corollary 3.5. Let Proposition 3.4 apply. Then, the first-order term of the MSE of the LEMF estimator with $\boldsymbol{n}^*$ is

$$\mathbb{E}\left[ d_{\mathrm{LE}}\left( \hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}^*}, \Sigma \right)^2 \right] = \frac{\sigma_0^2}{B} \left( \sum_{\ell=0}^{L} \sqrt{c_\ell \left( \rho_\ell^2 - \rho_{\ell+1}^2 \right)} \right)^2, \qquad (19)$$

which will be smaller than the first-order log-Euclidean MSE of the cost-equivalent single-fidelity estimator (1) that uses $n_0 = B / c_0$ high-fidelity samples if and only if

$$\sum_{\ell=0}^{L} \sqrt{\frac{c_\ell}{c_0} \left( \rho_\ell^2 - \rho_{\ell+1}^2 \right)} < 1. \qquad (20)$$

Equation (20) demonstrates that, for fixed $c_0$, the benefit of using the LEMF estimator (7) over the equivalent-cost high-fidelity estimator (1) is a function of both the surrogate model correlations with the high-fidelity model, $\rho_1, \dots, \rho_L$, and the surrogate model costs, $c_1, \dots, c_L$. It is particularly instructive to consider the bifidelity case with only one surrogate model ($L = 1$). In this setting, condition (20) simplifies to

$$\sqrt{1 - \rho_1^2} + \sqrt{\frac{c_1}{c_0} \, \rho_1^2} < 1 \quad \Longleftrightarrow \quad \frac{2 \sqrt{1 - \rho_1^2}}{|\rho_1|} < \frac{c_0 - c_1}{\sqrt{c_0 c_1}}. \qquad (21)$$

If the normalized reduction in cost $\frac{c_0 - c_1}{\sqrt{c_0 c_1}}$ achieved by the low-fidelity model is large, then the minimum required generalized correlation $\rho_1$ such that condition (21) is satisfied decreases. Conversely, in the limit $\rho_1 \to 1$ we only require $c_1 < c_0$ in order to see a benefit from using the LEMF estimator. In Figure 1 we visually demonstrate this relationship by plotting the region in $(\rho_1, c_1/c_0)$ space for which (21) holds and the LEMF estimator has a lower first-order MSE than the equivalent-cost high-fidelity estimator.

Figure 1 (region plot over the correlation $\rho_1 \in [0, 1]$ and the cost ratio $c_1/c_0$). The shaded region corresponds to pairs $(\rho_1, c_1/c_0)$ of correlation coefficient and cost ratio for which (21) holds and thus the LEMF estimator has lower first-order error than the equivalent-cost single-fidelity estimator. Note that a lower correlation $\rho_1$ requires a smaller cost ratio $c_1/c_0$ for the LEMF estimator to achieve lower error than the single-fidelity estimator.

Truncated multi-fidelity estimator. In addition to the EMF estimator and equivalent-cost single-fidelity estimators, we will compare the proposed LEMF estimator to the EMF estimator with positive definiteness enforced via eigenvalue truncation. Following Hoel et al. (2016), we define the truncated multi-fidelity covariance estimator as

$$\hat{\Sigma}^{+\delta}_{\boldsymbol{n}} = T_\delta\left( \hat{\Sigma}^{\mathrm{EMF}}_{\boldsymbol{n}} \right), \qquad (22)$$

where $\delta > 0$ is a small threshold and the truncation operator $T_\delta$ acts on symmetric matrices via $T_\delta(A) = Q (\Lambda \vee \delta) Q^\top$, with $A = Q \Lambda Q^\top$ denoting the eigendecomposition of $A$ and $\Lambda \vee \delta$ denoting the diagonal matrix whose diagonal entries are $\max(\lambda_{ii}, \delta)$; see the Introduction for a discussion of the drawbacks of the truncated estimator.

4. Numerical Examples

4.1. Motivating Example with Gaussians

Consider the problem of estimating the covariance of a Gaussian random variable $y \sim N(0, \Sigma)$, with $\Sigma \in S^4_{++}$. For this motivating example, we obtain low-fidelity samples $y^{(\ell)} = y + \epsilon^{(\ell)}$ by corrupting a high-fidelity sample $y$ with independent noise at different noise levels, $\epsilon^{(\ell)} \sim N(0, \Gamma^{(\ell)})$ for $\ell = 1, 2, 3$. The matrices $\Gamma^{(1)}, \dots, \Gamma^{(3)}$ are chosen such that the noise levels lead to generalized correlations of $\rho_1 \approx 0.93$, $\rho_2 \approx 0.74$, and $\rho_3 \approx 0.58$; see Appendix G. We impose unit cost for drawing a high-fidelity sample $y$ and costs $c_1 = 10^{-2}$, $c_2 = 10^{-3}$, and $c_3 = 10^{-4}$ for drawing the surrogate samples. Because the samples are Gaussian, their empirical second moments follow a Wishart distribution and the generalized variances (10) and correlations (9) can be computed exactly; see Appendix G and the sketch below. Table 1 shows the MSE of the multi-fidelity estimators compared with that of single-fidelity covariance estimators using high-fidelity or surrogate samples alone with the same budget $B = 15$.
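The snippet below is a short illustration (mine, not the authors' code) of this data-generating setup: it uses the specific $\Sigma$ and noise covariances $\Gamma^{(\ell)}$ reported in Appendix G together with the closed-form expressions derived there to recover the generalized correlations quoted above.

```python
# Closed-form generalized variances and correlations for the Gaussian motivating
# example, using the Sigma and Gamma^(l) reported in Appendix G.
import numpy as np

Sigma = np.array([[2.52, 0.17, 0.67, 0.98],
                  [0.17, 0.64, 0.29, 0.35],
                  [0.67, 0.29, 0.49, 0.52],
                  [0.98, 0.35, 0.52, 1.31]])             # realization from Appendix G
Gammas = [0.1 * np.eye(4), 0.5 * np.eye(4), np.eye(4)]   # noise covariances, l = 1, 2, 3

def gen_var(S):
    """sigma^2 = Tr(Cov[y y^T]) = Tr(S^2) + Tr(S)^2 for y ~ N(0, S); see Appendix G."""
    return np.trace(S @ S) + np.trace(S) ** 2

sigma0 = np.sqrt(gen_var(Sigma))
for ell, Gamma in enumerate(Gammas, start=1):
    sigma_ell = np.sqrt(gen_var(Sigma + Gamma))
    rho_ell = sigma0 / sigma_ell          # cross term Tr(Cov[y y^T, y_l y_l^T]) = sigma_0^2
    print(f"rho_{ell} = {rho_ell:.2f}")   # approximately 0.93, 0.74, 0.58
```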
The LEMF estimator has about $2\times$ lower MSE than the other estimators in the log-Euclidean and affine-invariant metrics, while being competitive in the Euclidean metric.

4.2. Uncertainty Quantification of Steady-State Heat Flow

Physics model. The steady-state heat equation over the domain $\mathcal{X} = (0, 1)$ with a variable heat conductivity is

$$\operatorname{div}\left( \exp(\kappa(x; \theta)) \, \nabla u(x; \theta) \right) = f(x), \qquad x \in \mathcal{X}, \qquad (23)$$

where we impose Dirichlet boundary conditions $u(0; \theta) = 0$, $u(1; \theta) = 1$ and a constant source $f(x) = 1$. We model the log heat conductivity $\kappa$ as a degree-4 sine-polynomial $\kappa(x; \theta) = \sum_{k=1}^{4} \theta_k \sin(2 \pi k x)$, whose coefficients are given by the components of $\theta = [\theta_1, \dots, \theta_4]^\top \in \mathbb{R}^4$. We obtain a high-fidelity observation model $G^{(0)} : \mathbb{R}^4 \to \mathbb{R}^{10}$ by approximating the solution $u(\cdot\,; \theta)$ via a second-order finite difference scheme with 65,536 grid points and observing the resulting temperature at $d = 10$ equally spaced points $x_i = i/11$, $i \in \{1, \dots, 10\}$, in the interior of the domain $\mathcal{X}$ (see Figure 8 in Appendix H). The surrogate model $G^{(1)}$ is obtained by approximating the solution $u(\cdot\,; \theta)$ with only 1,024 grid points and performing the same measurement process. We identify the costs of the high-fidelity and surrogate models with the number of grid points involved, $c_0 = 2^{16}$ and $c_1 = 2^{10}$.

Covariance estimation. We model $\theta \sim N(0, I_{4 \times 4})$ and define high-fidelity samples as $y^{(0)} = G^{(0)}(\theta)$ and surrogate samples as $y^{(1)} = G^{(1)}(\theta)$. Our aim is to estimate the covariance of $y^{(0)}$, which is a $10 \times 10$ matrix. We first estimate the generalized variances and correlations $\sigma_0, \sigma_1, \rho_1$ in a pilot study to compute the optimal multi-fidelity sample allocations $\boldsymbol{n}^* = [n^{(0)*}, n^{(1)*}]$ and $\alpha^*$ for seven values of the budget $B$ in the interval $[10^6, 2 \times 10^9]$; see Appendix H. We then obtain $n^{(1)*} \geq n^{(0)*}$ realizations $\theta_1, \dots, \theta_{n^{(1)*}}$ of the input random variable $\theta$ and compute the $n^{(0)*}$ high-fidelity samples $\{y^{(0)}(\theta_i)\}_{i=1}^{n^{(0)*}}$ and $n^{(1)*}$ surrogate samples $\{y^{(1)}(\theta_i)\}_{i=1}^{n^{(1)*}}$. Note that the first $n^{(0)*}$ surrogate samples use the same realizations of $\theta$ as the high-fidelity samples, creating statistical coupling, and thus correlation, between the high-fidelity and surrogate samples. (A brief sketch of the two-resolution observation maps is given below, after Table 1.)

Results. Figure 2 compares the MSE in the log-Euclidean (5), affine-invariant (6), and Euclidean metrics of the single-fidelity and multi-fidelity estimators over 100 trials. Using the surrogate samples alone leads to estimates with a large bias with respect to the true (high-fidelity) covariance. For example, the bias of the surrogate-only estimator can be seen in Figure 3c, which displays the convergence of each estimate of $\Sigma_{2,5}$ to the truth; see also Appendix H.

Table 1. Motivating example: The LEMF estimator leverages cheap surrogate samples to reduce the MSE compared to cost-equivalent single-fidelity estimators and other multi-fidelity estimators that rely on post-processing via truncation.

| MSE | high-fidelity only | surrogate only | Euclidean MF | truncated MF ($\delta = 10^{-16}$) | LEMF (ours) |
|---|---|---|---|---|---|
| log-Euclidean | 1.72 | 5.82 | NaN | 66.88 | 0.83 |
| affine-invariant | 4.99 | 5.82 | NaN | 125.11 | 2.69 |
| Euclidean | 3.00 | 4.00 | 0.51 | 0.51 | 0.71 |
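To make the two-fidelity data-generation pipeline concrete, here is a rough, self-contained sketch of observation maps in the spirit of $G^{(0)}$ and $G^{(1)}$; the conservative finite-difference discretization, the sign convention $-(\exp(\kappa)\,u')' = f$, and all other implementation details are my assumptions rather than the authors' solver.

```python
# Sketch of a two-resolution observation map for the heat-flow example (Section 4.2).
# Discretization details and the sign convention are assumptions.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def solve_and_observe(theta, n_grid):
    """Solve -(exp(kappa) u')' = 1 on (0,1) with u(0)=0, u(1)=1; observe at x_i = i/11."""
    x = np.linspace(0.0, 1.0, n_grid + 1)
    kappa = sum(theta[k] * np.sin(2.0 * np.pi * (k + 1) * x) for k in range(4))
    a = np.exp(kappa)                         # heat conductivity at the grid nodes
    a_half = 0.5 * (a[:-1] + a[1:])           # conductivity at the cell interfaces
    h = 1.0 / n_grid
    main = (a_half[:-1] + a_half[1:]) / h**2  # tridiagonal system for interior unknowns
    off = -a_half[1:-1] / h**2
    A = diags([main, off, off], [0, 1, -1], format="csc")
    b = np.ones(n_grid - 1)                   # constant source f = 1
    b[-1] += a_half[-1] / h**2                # lift the Dirichlet condition u(1) = 1
    u = np.concatenate(([0.0], spsolve(A, b), [1.0]))
    return np.interp(np.arange(1, 11) / 11.0, x, u)   # d = 10 interior observations

rng = np.random.default_rng(3)
theta = rng.standard_normal(4)                # theta ~ N(0, I_4)
y_hi = solve_and_observe(theta, 2**16)        # high fidelity, cost c0 = 2^16 grid points
y_lo = solve_and_observe(theta, 2**10)        # surrogate, cost c1 = 2^10 grid points
```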
Figure 2 (three panels of MSE versus cost/budget $B$: (a) log-Euclidean metric, (b) affine-invariant metric, (c) Euclidean metric). Heat flow: In the log-Euclidean metric, our LEMF estimator achieves the MSE tolerance below $10^{-1}$ with a $30\times$ speedup compared to the single-fidelity estimator that uses high-fidelity samples alone. Only using surrogate samples leads to a large bias that prevents reaching an MSE below $10^{-1}$. The Euclidean multi-fidelity estimator is indefinite in more than 10% of the 100 trials used here and therefore does not provide a valid covariance matrix in this example. (A plot with the min/max over 100 trials is shown in Appendix H.)

Figure 3 (three panels: (a) speedup to reach MSE below $10^{-1}$, (b) smallest eigenvalue versus cost/budget $B$, (c) convergence of estimators versus cost/budget $B$). Heat flow: Plot (a): Our LEMF estimator achieves a $32\times$ speedup. Plot (b): Our LEMF estimator remains positive definite even for small sample sizes, whereas the Euclidean estimator becomes indefinite. Plot (c): Entries of our LEMF estimates converge quickly to the entries of the true covariance, whereas surrogate-only estimators incur a large bias and high-fidelity-only estimators incur high costs.

The EMF estimator is unbiased and achieves competitively low MSE in the Euclidean metric but becomes indefinite in more than 10% of trials (see Appendix H). For this reason its MSE in the log-Euclidean and affine-invariant metrics is infinite/nonexistent (see Figure 3b), as the log-Euclidean and affine-invariant distances between an element of $S^d_{++}$ and any non-positive-definite matrix are $\infty$. When we fix the EMF estimator via eigenvalue truncation (using a threshold of $\delta = 10^{-16}$ in (22)), obtaining the positive-definite truncated EMF estimator, we introduce the possibility of large MSE in the log-Euclidean and affine-invariant metrics because we have modified the spectrum; this effect is clearly visible for small budgets in Figure 2. By contrast, the proposed log-Euclidean multi-fidelity estimator uses surrogate samples in conjunction with high-fidelity samples and inherently retains positive definiteness, thus achieving about one order of magnitude speedup across metrics as compared to the estimator that uses the high-fidelity samples alone; see Figure 3a. The MSE of the LEMF estimator decays as $O(1/B)$, in concert with the other Monte Carlo estimators that use high-fidelity samples.

Figure 4. Metric learning: An example of the final-time buoyancy for class $i = 1$. Our measurements consist of buoyancy values at nine spatial locations in the domain, indicated by black dots in the plot. We use the observations to estimate a metric which distinguishes between observations corresponding to $\theta$ sampled from class $i = 0$ and from class $i = 1$.

4.3. Metric Learning: Surface Quasi-Geostrophy

We now apply our multi-fidelity covariance estimators as an intermediate step in the metric learning approach of Zadeh et al. (2016) to enable easier classification of fluid regimes from buoyancy measurements given by a surface quasi-geostrophic model (Held et al., 1985).

Physics model. In the model presented in Held et al. (1985), the evolution of the buoyancy $b(\mathbf{x}, t)$ over a periodic spatial domain $\mathcal{X} = [-\pi, \pi] \times [-\pi, \pi]$ is governed by

$$\partial_t b(\mathbf{x}, t; \theta) + J(\psi, b) = 0, \qquad (24)$$

where $\psi$ is the streamfunction and $\mathbf{x} = (x, y)$. The initial buoyancy is

$$b_0(\mathbf{x}; \theta) = \frac{1}{(2\pi / |\theta_5|)^2} \exp\!\left( -\left( x^2 + e^{2 \theta_1} y^2 \right) \right),$$

the contours of which form ellipses parametrized by $\theta = [\theta_1, \dots, \theta_5]^\top \in \mathbb{R}^5$.
The parameter $\theta_1$ is the log-aspect ratio and controls the origination of vortices in the buoyancy over time, and $\theta_5$ controls the amplitude of $b_0$. The parameters $\theta_2$, $\theta_3$, and $\theta_4$ determine additional aspects of the dynamics (24); see Appendix I for details. We sample the parameters $\theta \in \mathbb{R}^5$ from a two-component Gaussian mixture whose means differ only in the coordinate corresponding to the log-aspect ratio; see Appendix I. We use $i \in \{0, 1\}$ to denote the mixture component from which a given realization of $\theta$ is sampled.

The high-fidelity parameter-to-observable map $G^{(0)}$ maps $\theta$ onto $d = 9$ observations of the numerical solution of equation (24) in the spatial domain; see Figure 4 as well as Figure 12 in Appendix I. For $G^{(0)}$, the numerical solution of (24) is computed up to time $T = 24$ using finite differences with 256 grid points along each coordinate dimension and with a time step of $\Delta t = 0.005$. (We use the Python package pyqg to compute our solutions; see http://pyqg.readthedocs.org/.) High-fidelity samples $y^{(0)}$ are thus given by $y^{(0)} = G^{(0)}(\theta)$ and have covariance $\Sigma \in S^9_{++}$. The surrogate parameter-to-observable map $G^{(1)}$ and random variable $y^{(1)}$ are defined in the same way except that in solving Equation (24) we only use 64 grid points along each coordinate dimension. Evaluating $G^{(1)}$ is 16 times faster than evaluating $G^{(0)}$, with $c_0 = 256^2$ and $c_1 = 64^2$.

Figure 5 (bar chart of the mean relative error of distances for HF only, surrogate only, Euclidean MF, truncated MF, and LEMF (ours); the HF-only bar is the benchmark and the Euclidean MF bar is marked indefinite). Metric learning: When using the learned metric to compute distances between observations of the surface quasi-geostrophic model, our LEMF estimator achieves a more than 20% lower error in the distances than using high-fidelity samples alone. The Euclidean estimator provides an indefinite metric in this example.

Metric learning. We now apply our covariance estimator (7) to learn a metric on the data manifold. We use a slight modification of the geometric mean metric learning approach introduced in Zadeh et al. (2016), which learns an SPD matrix $A$ to define the metric

$$d_A(y_1, y_2) = \sqrt{(y_1 - y_2)^\top A \, (y_1 - y_2)}.$$

The matrix $A$ is an interpolation in the affine-invariant geometry for $S^d_{++}$ between the inverse of a similarity matrix $S = \operatorname{Cov}[y \mid i = 0] + \operatorname{Cov}[y \mid i = 1]$ and a dissimilarity matrix $D = S + \mu \mu^\top$, where $\mu = \mathbb{E}[y \mid i = 0] - \mathbb{E}[y \mid i = 1]$. Following Zadeh et al. (2016), this interpolation is given by

$$A = S^{-1/2} \left( S^{1/2} D S^{1/2} \right)^t S^{-1/2}, \qquad t \in [0, 1].$$

We apply our multi-fidelity techniques to estimating the two covariance matrices that comprise $S$, namely $\operatorname{Cov}[y \mid i = 0]$ and $\operatorname{Cov}[y \mid i = 1]$. In constructing $A$ we set $t = 0.1$, motivated by (Zadeh et al., 2016, Figure 3). (A compact sketch of this construction is given below, after Figure 6.)

Results. Prior to applying our multi-fidelity estimators in this setting, we take a total of 24,000 pilot samples between $(y^{(0)}, y^{(1)}) \mid i = 0$ and $(y^{(0)}, y^{(1)}) \mid i = 1$ in order to estimate the generalized variances $\sigma_0^2$ and $\sigma_1^2$ and the correlation $\rho_1$ in both classes.

Figure 6 (bar charts of the MSE of the learned metric matrices in three panels: (a) Euclidean metric, (b) log-Euclidean metric, (c) affine-invariant metric; the Euclidean MF bars are marked indefinite). Metric learning: Our LEMF estimator achieves the lowest MSE compared to the cost-equivalent benchmark estimator that uses high-fidelity samples alone. The Euclidean multi-fidelity estimator becomes indefinite across the 100 trials and therefore fails to learn a valid metric.
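The following is a compact sketch of the interpolation defining $A$ described above, assuming class-conditional covariance estimates (for example, LEMF estimates) and class means are already available; the helper `spd_power` and the function names are my own illustrative choices, not the authors' implementation.

```python
# Geometric mean metric learning step: build A from class covariances and means.
import numpy as np

def spd_power(S, p):
    """Real p-th power of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**p) @ V.T

def learned_metric(cov0, cov1, mu0, mu1, t=0.1):
    """A = S^{-1/2} (S^{1/2} D S^{1/2})^t S^{-1/2}, interpolating S^{-1} and D."""
    S = cov0 + cov1                      # similarity matrix
    mu = mu0 - mu1
    D = S + np.outer(mu, mu)             # dissimilarity matrix
    S_half, S_inv_half = spd_power(S, 0.5), spd_power(S, -0.5)
    return S_inv_half @ spd_power(S_half @ D @ S_half, t) @ S_inv_half

def d_A(A, y1, y2):
    """Learned distance d_A(y1, y2)."""
    diff = y1 - y2
    return float(np.sqrt(diff @ A @ diff))
```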
We additionally use these pilot samples to obtain a reference estimate of $A$, which we denote $A_0$. We estimate $\operatorname{Cov}[y^{(0)} \mid i = 0]$ and $\operatorname{Cov}[y^{(0)} \mid i = 1]$ as follows: we construct multi-fidelity estimators from a combination of 15 high-fidelity samples and an additional number $n_{\mathrm{lo},i}$ of low-fidelity samples computed according to the optimal ratio in Proposition 3.4. We correspondingly construct equivalent-cost single-fidelity estimators with budgets $B_i = 15\, c_{\mathrm{hi}} + n_{\mathrm{lo},i}\, c_{\mathrm{lo}}$, $i \in \{0, 1\}$, spent entirely on high- or low-fidelity samples. We estimate $\operatorname{Cov}[y \mid i = 0]$ and $\operatorname{Cov}[y \mid i = 1]$ using the EMF, LEMF, and truncated EMF ($\delta = 10^{-16}$) estimators and the high-fidelity- and low-fidelity-only estimators with the sample allocations described above. The two covariance estimates are combined to obtain estimates of the metric matrix $A$.

We first study the performance of the learned metrics relative to the reference metric. Let $\{y^{(0)}_j\}_{j=1}^{5000} \subset \mathbb{R}^9$ be a test set of 5,000 points and consider the mean relative error of distances $\mathrm{MRE}(A)$ with respect to the reference metric $A_0$; see Appendix I for a definition. Figure 5 shows the MRE for each estimator averaged over 50 independent trials. The LEMF estimator outperforms the single-fidelity estimator and the truncated MF estimator (22) by more than 20%. The Euclidean multi-fidelity estimator gives indefinite estimates of $A$, which cannot constitute valid metrics.

We further compare the MSEs of the learned metric matrices relative to the reference $A_0$. Figure 6 shows that the LEMF estimator (7) yields a closer approximation of the reference metric; in particular, the LEMF estimator gives a more accurate estimate of $A$ than the high-fidelity-only estimator, while the other estimators give less accurate estimates. The surrogate-only estimator is limited by its large bias. The EMF estimator (2) is again indefinite in more than 10% of trials.

5. Conclusions

We showed that formulating multi-fidelity estimation in a non-Euclidean geometry can be beneficial for enforcing structure: the proposed LEMF estimator leverages the log-Euclidean geometry for SPD matrices to guarantee that the resulting covariance matrix estimates are positive definite. This property is in contrast to state-of-the-art multi-fidelity estimators that can lose definiteness, especially in the small-sample regime, which is typical in science and engineering applications in which drawing samples is expensive. Our LEMF estimator preserves definiteness and leverages surrogates to achieve speedups of more than one order of magnitude in our experiments. The experiments further show that preserving definiteness is key for enabling tasks such as metric learning and classification, for which indefinite Euclidean multi-fidelity estimators are invalid. Future work includes combining the multi-fidelity approach with, e.g., shrinkage and other regularization schemes for high-dimensional covariance estimation.

Acknowledgements

AM and YM acknowledge support from the Office of Naval Research, SIMDA (Sea Ice Modeling and Data Assimilation) MURI, award number N00014-20-1-2595 (Dr. Reza Malek-Madani and Dr. Scott Harper). AM additionally acknowledges support from the NSF Graduate Research Fellowship under Grant No. 1745302.
TA and BP acknowledge support from the AFOSR under Award Number FA9550-211-0222 (Dr. Fariba Fahroo). Code Availability Python implementations of the LEMF, EMF, and truncated EMF estimators are available at https://github.com/amaurais/LEMF. Multi-Fidelity Covariance Estimation Alsup, T., Venturi, L., and Peherstorfer, B. Multilevel Stein variational gradient descent with applications to bayesian inverse problems. In Bruna, J., Hesthaven, J., and Zdeborova, L. (eds.), Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, volume 145 of Proceedings of Machine Learning Research, pp. 93 117. PMLR, 16 19 Aug 2022. Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. Log Euclidean metrics for fast and simple calculus on diffusion tensors. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, 56(2):411 421, 2006. Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM journal on matrix analysis and applications, 29(1):328 347, 2007. Bergemann, K. and Reich, S. A localization technique for ensemble Kalman filters. Quarterly Journal of the Royal Meteorological Society, 136(648):701 707, 2010. Bhatia, R. Positive Definite Matrices. Princeton University Press, 2007. Bickel, P. J. and Levina, E. Regularized estimation of large covariance matrices. The Annals of Statistics, 36(1):199 227, 2008. doi: 10.1214/009053607000000758. Bierig, C. and Chernov, A. Convergence analysis of multilevel Monte Carlo variance estimators and application for random obstacle problems. Numerische Mathematik, 130 (4):579 613, Aug 2015. Cai, D. and Adams, R. P. Multi-fidelity Monte Carlo: a pseudo-marginal approach. ar Xiv, 2210.01534, 2022. Capet, X., Klein, P., Hua, B. L., Lapeyre, G., and Mc Williams, J. C. Surface kinetic energy transfer in surface quasi-geostrophic flows. Journal of Fluid Mechanics, 604:165 174, 2008. Chernov, A., Hoel, H., Law, K. J. H., Nobile, F., and Tempone, R. Multilevel ensemble Kalman filtering for spatiotemporal processes. Numerische Mathematik, 147(1): 71 125, Jan 2021. Cliffe, K. A., Giles, M. B., Scheichl, R., and Teckentrup, A. L. Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Computing and Visualization in Science, 14(1):3 15, 2011. Cressie, N. Statistics for spatial data. John Wiley & Sons, 2015. Donoho, D., Gavish, M., and Johnstone, I. Optimal shrinkage of eigenvalues in the spiked covariance model. The Annals of Statistics, 46(4):1742 1778, 2018. doi: 10.1214/17-AOS1601. Fan, J., Liao, Y., and Mincheva, M. High-dimensional covariance matrix estimation in approximate factor models. The Annals of Statistics, 39(6):3320 3356, 2011. Giles, M. B. Multilevel Monte Carlo path simulation. Operations Research, 56(3):607 617, 2008. Gundersen, G. W., Cai, D., Zhou, C., Engelhardt, B. E., and Adams, R. P. Active multi-fidelity Bayesian online changepoint detection. In de Campos, C. and Maathuis, M. H. (eds.), Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, volume 161 of Proceedings of Machine Learning Research, pp. 1916 1926. PMLR, 27 30 Jul 2021. Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del R ıo, J. 
F., Wiebe, M., Peterson, P., G erard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant, T. E. Array programming with Num Py. Nature, 585(7825):357 362, September 2020. doi: 10. 1038/s41586-020-2649-2. Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Springer, 2001. Held, I., Pierrehumbert, R., Garner, S., and Swanson, K. Surface quasi-geostrophic dynamics. Journal of Fluid Mechanics, 282:1 20, 1985. Hoel, H., Law, K. J., and Tempone, R. Multilevel ensemble Kalman filtering. SIAM Journal on Numerical Analysis, 54(3):1813 1839, 2016. Huang, Z., Wang, R., Shan, S., Li, X., and Chen, X. Log Euclidean metric learning on symmetric positive definite manifold with application to image set classification. In Bach, F. and Blei, D. (eds.), Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pp. 720 729, Lille, France, 07 09 Jul 2015. PMLR. Konrad, J., Farcas , I.-G., Peherstorfer, B., Di Siena, A., Jenko, F., Neckel, T., and Bungartz, H.-J. Data-driven low-fidelity models for multi-fidelity Monte Carlo sampling in plasma micro-turbulence analysis. Journal of Computational Physics, 451:110898, 2022. Law, K., Stuart, A., and Zygalakis, K. Data Assimilation. Springer, 2015. Multi-Fidelity Covariance Estimation Ledoit, O. and Wolf, M. Nonlinear shrinkage estimation of large-dimensional covariance matrices. The Annals of Statistics, 40(2):1024 1060, 2012. doi: 10.1214/ 12-AOS989. Li, S., Xing, W., Kirby, R., and Zhe, S. Multi-fidelity Bayesian optimization via deep neural networks. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 8521 8531. Curran Associates, Inc., 2020. Lin, Z. Riemannian geometry of symmetric positive definite matrices via Cholesky decomposition. SIAM Journal on Matrix Analysis and Applications, 40(4):1353 1370, 2019. Loh, P.-L. and B uhlmann, P. High-dimensional learning of linear causal networks via inverse covariance estimation. Journal of Machine Learning Research, 15(88): 3065 3105, 2014. Loh, P.-l. and Wainwright, M. J. Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. In Pereira, F., Burges, C., Bottou, L., and Weinberger, K. (eds.), Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012. Malag o, L., Montrucchio, L., and Pistone, G. Wasserstein Riemannian geometry of gaussian densities. Information Geometry, 1(2):137 179, 2018. Marques, A., Lam, R., and Willcox, K. Contour location via entropy reduction leveraging multiple information sources. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. Mycek, P. and De Lozzo, M. Multilevel Monte Carlo covariance estimation for the computation of Sobol indices. SIAM/ASA Journal on Uncertainty Quantification, 7(4): 1323 1348, 2019. Ng, L. and Willcox, K. Multifidelity approaches for optimization under uncertainty. International Journal for Numerical Methods in Engineering, 100(10):746 772, 2014. Palizhati, A., Torrisi, S. B., Aykol, M., Suram, S. K., Hummelshøj, J. S., and Montoya, J. H. Agents for sequential learning using multiple-fidelity data. Scientific Reports, 12(1):4694, Mar 2022. Peherstorfer, B., Willcox, K., and Gunzburger, M. 
Optimal model management for multifidelity Monte Carlo estimation. SIAM Journal on Scientific Computing, 38(5): A3163 A3194, 2016. Peherstorfer, B., Willcox, K., and Gunzburger, M. Survey of multifidelity methods in uncertainty propagation, inference, and optimization. SIAM Review, 60(3):550 591, 2018. Poloczek, M., Wang, J., and Frazier, P. Multi-information source optimization. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. Popov, A. A., Mou, C., Sandu, A., and Iliescu, T. A multifidelity ensemble Kalman filter with reduced order control variates. SIAM Journal on Scientific Computing, 43(2): A1134 A1162, 2021. Prescott, T. P. and Baker, R. E. Multifidelity approximate Bayesian computation. SIAM/ASA Journal on Uncertainty Quantification, 8(1):114 138, 2020. Qi, H. and Sun, D. A quadratically convergent Newton method for computing the nearest correlation matrix. SIAM Journal on Matrix Analysis and Applications, 28 (2):360 385, 2006. Qian, E., Peherstorfer, B., O Malley, D., Vesselinov, V. V., and Willcox, K. Multifidelity Monte Carlo estimation of variance and sensitivity indices. SIAM/ASA Journal on Uncertainty Quantification, 6(2):683 706, 2018. Ravikumar, P., Wainwright, M. J., Raskutti, G., and Yu, B. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5(none):935 980, 2011. Schillings, C. and Stuart, A. M. Analysis of the ensemble Kalman filter for inverse problems. SIAM Journal on Numerical Analysis, 55(3):1264 1290, 2017. Teckentrup, A. L., Scheichl, R., Giles, M. B., and Ullmann, E. Further analysis of multilevel Monte Carlo methods for elliptic pdes with random coefficients. Numerische Mathematik, 125(3):569 600, Nov 2013. Tsiligkaridis, T. and Hero, A. O. Covariance estimation in high dimensions via Kronecker product expansions. IEEE Transactions on Signal Processing, 61(21):5347 5360, 2013. doi: 10.1109/TSP.2013.2279355. Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, I., Feng, Y., Moore, E. W., Vander Plas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., van Mulbregt, P., and Sci Py 1.0 Contributors. Sci Py Multi-Fidelity Covariance Estimation 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261 272, 2020. doi: 10.1038/s41592-019-0686-2. Wu, J., Toscano-Palmerin, S., Frazier, P. I., and Wilson, A. G. Practical multi-fidelity Bayesian optimization for hyperparameter tuning. In Adams, R. P. and Gogate, V. (eds.), Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, volume 115 of Proceedings of Machine Learning Research, pp. 788 798. PMLR, 22 25 Jul 2020. Zadeh, P. H., Hosseini, R., and Sra, S. Geometric mean metric learning. In ICML 16: Proceedings of the 33rd International Conference on International Conference on Machine Learning, volume 48, pp. 2464 2471, June 2016. Multi-Fidelity Covariance Estimation A. Connection to Fr echet Averaging In addition to being a control-variate estimator in the log Euclidean geometry, Equation (7) can also be viewed as a Fr echet average. 
Recall that the LEMF estimator takes the form Log ˆΣ LEMF n = Log ˆΣ (0) n0 ℓ=1 αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 . Defining the SPD difference matrices Dℓ= Exp(Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1) = ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1, ℓ= 1, . . . , L we equivalently have Log ˆΣ LEMF n = Log ˆΣ (0) n0 + ℓ=1 αℓLog Dℓ. (25) Assuming that α1, . . . , αL > 0 3, we see in (25) by definition of the Fr echet mean in the log-Euclidean geometry (Arsigny et al., 2006) that ˆΣ LEMF n satisfies ˆΣ LEMF n ELE, α( ˆΣ (0) n0 , D1, . . . , DL) = arg min Σ Sd ++ d2 LE(Σ, ˆΣ (0) n0 ) + ℓ=1 αℓd2 LE(Σ, Dℓ) where α = 1 α1 αL RL+1. Thus ˆΣ LEMF n can be interpreted as a weighted Fr echet average between high-fidelity ˆΣ (0) n0 and the low-fidelity perturbations Dℓ, ℓ= 1, . . . , L, with weights assigned according to how much ˆΣ (0) n0 is correlated in a generalized sense with each Dℓ. B. Proof of Proposition 3.1 We have defined the log-Euclidean multi-fidelity estimator by ˆΣ LEMF n = ˆΣ (0) n0 ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 = Log ˆΣ (0) n0 + ℓ=1 αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 ! 3If αℓ< 0 for some ℓ, we can define βℓ= αℓand redefine Dℓ= ˆΣ (ℓ) nℓ 1 ˆΣ (ℓ) nℓby reversing the order of subtraction in order to ensure that the Fr echet weights are all positive. Because we have assumed n0, n1, . . . , n L > d, the sample covariance matrices ˆΣ (ℓ) nℓ, and ˆΣ (ℓ) nℓ 1, ℓ= 0, . . . , L are positive definite almost surely. That is, they are members of Sd ++. Because Sd ++( , ) is a vector space, it follows immediately that the top line of (26) ˆΣ LEMF n = ˆΣ (0) n0 ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 Sd ++ is positive definite. It remains to show that the bottom line of (26) is equivalent to the top line. This we accomplish by straightforward algebra, ˆΣ LEMF n = ˆΣ (0) n0 ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 = ˆΣ (0) n0 ℓ=1 αℓ ˆΣ (ℓ) nℓ Exp( Log ˆΣ (ℓ) nℓ 1) = ˆΣ (0) n0 ℓ=1 Exp αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 = ˆΣ (0) n0 Exp ℓ=1 αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 ! Log ˆΣ (0) n0 + ℓ=1 αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 ! C. Proof of Corollary 3.3 We first note that, by properties of inner and outer products on Sd, for two random matrices A, B Sd with E[A] = E[B] = 0 we have E [ A, B F] = E[Tr(A B)] = Tr(E[A B]) = Tr(Cov[A, B]), (28) where we have used linearity of trace. In the special case where A = B we have E[ A 2 F] = E[ A, A F] = Tr (Cov[A]) . (29) We now proceed with the remainder of the proposition. Recall that the EMF estimator is defined ˆΣ EMF n = ˆΣ (0) n0 + XL ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 , where α = [α1, . . . , αL] RL are the control variate weights and ˆΣ (ℓ) nℓ= 1 i=1 y(ℓ) i y(ℓ) i , ˆΣ (ℓ) nℓ 1 = 1 nℓ 1 i=1 y(ℓ) i y(ℓ) i (30) Multi-Fidelity Covariance Estimation are sample covariance matrices computed from {y(ℓ) i }nℓ i=1, ℓ= 0, . . . , L and we have assumed that E[y(ℓ)] is known and without loss of generality equal to 0, ℓ= 0, . . . , L. Because the means of y(0), . . . , y(L) are known, in defining the SCMs in Equation (30) we have normalized by nℓand nℓ 1 to obtain unbiased sample covariance estimates. Under this assumption the EMF estimator (2) with SCMs defined as in (30) is unbiased, E h ˆΣ EMF n i = E ˆΣ (0) n0 + XL ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 = Σ. Because of the unbiasedness, the Frobenius norm MSE of ˆΣ EMF n is thus equal to the trace of the covariance of ˆΣ EMF n , E[|| ˆΣ EMF n Σ||2 F ] = E[|| ˆΣ EMF n E[ ˆΣ EMF n ]||2 F ] = Tr(Cov[ ˆΣ EMF n ]). 
The covariance of ˆΣ EMF n can be decomposed Cov[ ˆΣ EMF n ] = Cov[ ˆΣ (0) n0 ] ℓ=1 α2 ℓ(Cov[ ˆΣ (ℓ) nℓ] + Cov[ ˆΣ (ℓ) nℓ 1]) ℓ=1 αℓ Cov[ ˆΣ (ℓ) nℓ, ˆΣ (0) n0 ] + Cov[ ˆΣ (0) n0 , ˆΣ (ℓ) nℓ] Cov[ ˆΣ (ℓ) nℓ 1, ˆΣ (0) n0 ] Cov[ ˆΣ (0) n0 , ˆΣ (ℓ) nℓ 1] m=ℓ+1 αℓαm Cov[ ˆΣ (ℓ) nℓ, ˆΣ (m) nm ] Cov[ ˆΣ (ℓ) nℓ 1, ˆΣ (m) nm ] Cov[ ˆΣ (ℓ) nℓ, ˆΣ (m) nm 1] + Cov[ ˆΣ (ℓ) nℓ 1, ˆΣ (m) nm 1] m=ℓ+1 αℓαm Cov[ ˆΣ (m) nm , ˆΣ (ℓ) nℓ] Cov[ ˆΣ (m) nm , ˆΣ (ℓ) nℓ 1] Cov[ ˆΣ (m) nm 1, ˆΣ (ℓ) nℓ] + Cov[ ˆΣ (m) nm 1, ˆΣ (ℓ) nℓ 1] ℓ=1 α2 ℓ Cov[ ˆΣ (ℓ) nℓ, ˆΣ (ℓ) nℓ 1)] + Cov[ ˆΣ (ℓ) nℓ 1, ˆΣ (ℓ) nℓ] We will need a result analogous to Lemma 3.2 in Peherstorfer et al. (2016) in order to simplify (31). Lemma C.1. Consider ˆΣ (ℓ) mi = 1 mi Pmi i =1 y(ℓ) i (y(ℓ) i ) and ˆΣ (t) mj = 1 mj Pmj j =1 y(t) j (y(t) j ) , where ℓ, t {0, . . . , L} and mi, mj Z+. It holds that Cov[ ˆΣ (ℓ) mi, ˆΣ (t) mj] = 1 max{mi, mj}Cov[y(ℓ)(y(ℓ)) , y(t)(y(t)) ] Proof. Proceeding algebraically, we see Cov[ ˆΣ (ℓ) mi, ˆΣ (t) mj] = i =1 y(ℓ) i (y(ℓ) i ) , 1 mj j =1 y(t) j (y(t) j ) = 1 mimj Cov i =1 y(ℓ) i (y(ℓ) i ) , j =1 y(t) j (y(t) j ) y(ℓ) i (y(ℓ) i ) and y(t) j (y(t) j ) are independent except in the case that i = j . Thus the terms that remain are Cov[ ˆΣ (ℓ) mi, ˆΣ (t) mj] = min{mi,mj} X i =1 Cov[y(ℓ) i (y(ℓ) i ) , y(t) i (y(t) i ) ] = min{mi, mj} mimj Cov[y(ℓ)(y(ℓ)) , y(t)(y(t)) ] = 1 max{mi, mj}Cov[y(ℓ)(y(ℓ)) , y(t)(y(t)) ]. We return to Equation (31) with this result and simplify, seeing that the doubly-indexed sums are zero, Cov[ ˆΣ EMF n ] = 1 n0 Cov[y(0)(y(0)) ] ℓ=1 α2 ℓ( 1 nℓ+ 1 nℓ 1 )Cov[y(ℓ)(y(ℓ)) ] nℓ 1 nℓ 1 ) Cov[y(ℓ)(y(ℓ)) , y(0)(y(0)) ] + + Cov[y(0)(y(0)) , y(ℓ)(y(ℓ)) ] ℓ=1 α2 ℓ 2 nℓCov[y(ℓ)(y(ℓ)) ]. We rearrange and write the covariance of ˆΣ EMF n compactly Multi-Fidelity Covariance Estimation Cov[ ˆΣ EMF n ] = 1 n0 Cov[y(0)(y(0)) ] ℓ=1 ( 1 nℓ 1 1 nℓ) α2 ℓCov[y(ℓ)(y(ℓ)) ] αℓ Cov[y(ℓ)(y(ℓ)) , y(0)(y(0)) ] + Cov[y(0)(y(0)) , y(ℓ)(y(ℓ)) ] . The MSE of ˆΣ EMF n is equal to the trace of its covariance, E[|| ˆΣ EMF n Σ||] = Tr(Cov[ ˆΣ EMF n ]) = σ2 0 n0 + ℓ=1 ( 1 nℓ 1 1 nℓ )(α2 ℓσ2 ℓ 2αℓρℓσ0σℓ) which is the result we sought to show. D. Proof of Proposition 3.2 Proof. Canceling the matrix logarithm with the matrix exponential in the LEMF estimator (7) yields Log ˆΣ LEMF n Log Σ F = Log ˆΣ (0) n0 + ℓ=1 αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 Log Σ Since Σ I F < h/4 < 1 we may define the matrix logarithm through its Taylor series k (Σ I)k , (32) which converges absolutely. Similarly, by the triangle inequality Σ(ℓ) I F Σ I F + Σ(ℓ) Σ F h 2 < 1 . (33) For the sample covariance matrices Σ(ℓ) nℓand Σ(ℓ) nℓ 1, the law of large numbers implies almost surely that for any δℓ> 0 sufficiently large sample size nℓguarantees max n ˆΣ (ℓ) nℓ Σ(ℓ) F , ˆΣ (ℓ) nℓ 1 Σ(ℓ) F o < δℓ, (34) for ℓ= 0, . . . , L. Setting δℓ= h/4 and applying the triangle inequality again with the result of (33) gives max n ˆΣ (ℓ) nℓ I F , ˆΣ (ℓ) nℓ 1 I F δℓ+ Σ(ℓ) I F < δℓ+ h 4 < 1 . (35) Expanding the Taylor series for each matrix logarithm up to first-order (i.e. Log(Σ) Σ I) and canceling out the resulting identity matrices gives Log ˆΣ LEMF n Log Σ F = ˆΣ (0) n0 + ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 Σ + M where the matrix M is the remainder of the first-order Taylor expansions k ( ˆΣ (0) n0 I)k k ( ˆΣ (ℓ) nℓ I)k ! k ( ˆΣ (ℓ) nℓ 1 I)k ! Applying both the triangle and reverse-triangle inequalities gives Log ˆΣ LEMF n Log Σ F ˆΣ EMF n Σ F M F . 
(38) To bound the right-hand-side of (38), we again use the triangle inequality and that Ak F A k F to obtain ˆΣ (0) n0 I k ˆΣ (ℓ) nℓ I k ˆΣ (ℓ) nℓ 1 I k 1 k Σ I k F . By (35) and the assumption Σ I F < h/4, each series in (39) converges absolutely. Bounding 1/k with 1 and Multi-Fidelity Covariance Estimation factoring out a quadratic power for each term in (39) gives M F ˆΣ (0) n0 I 2 ˆΣ (0) n0 I k ℓ=1 αℓ ˆΣ (ℓ) nℓ I 2 ˆΣ (ℓ) nℓ I k ℓ=1 αℓ ˆΣ (ℓ) nℓ 1 I 2 ˆΣ (ℓ) nℓ 1 I k k=0 Σ I k F . Evaluating each of these geometric series gives the bound ˆΣ (0) n0 I 2 1 ˆΣ (0) n0 I F ˆΣ (ℓ) nℓ I 2 1 ˆΣ (ℓ) nℓ I F ˆΣ (ℓ) nℓ 1 I 2 1 ˆΣ (ℓ) nℓ 1 I F + Σ I 2 F 1 Σ I F . By (35) we have that 1 ˆΣ (ℓ) nℓ I F 1 ˆΣ (ℓ) nℓ 1 I F 1 1 3h/4 < 4 , (42) and similarly 1 1 Σ I F < 1 1 h/4 < 4 Substituting the bounds (42) and (43) into (41) gives M F 4 ˆΣ (0) n0 I 2 ℓ=1 αℓ ˆΣ (ℓ) nℓ I 2 ℓ=1 αℓ ˆΣ (ℓ) nℓ 1 I 2 3 Σ I 2 F . Applying the bound (35) with (44) gives h2 = Kh2 . (45) Factoring using the difference of squares formula (a2 b2 = (a b)(a + b)) gives Log ˆΣ LEMF n Log Σ 2 F ˆΣ EMF n Σ 2 = Log ˆΣ LEMF n Log Σ F + ˆΣ EMF n Σ F Log ˆΣ LEMF n Log Σ F ˆΣ EMF n Σ F Applying the law of large numbers again, we know that almost surely for sufficiently large sample sizes max n ˆΣ EMF n Σ F , Log ˆΣ LEMF n Log Σ F o < 1 . (47) Combining the bound (45) with (38) and substituting into (46) gives Log ˆΣ LEMF n Log Σ 2 F ˆΣ EMF n Σ 2 (48) Taking the expectation and applying Jensen s inequality gives the result. E Log ˆΣ LEMF n Log Σ 2 E ˆΣ EMF n Σ 2 Log ˆΣ LEMF n Log Σ 2 F ˆΣ EMF n Σ 2 2Kh2 . (49) E. Proof of Proposition 3.4 and Corollary 3.5 Since the Euclidean MSE (Corollary 3.3) has the exact same form as the MSE of the scalar multi-fidelity Monte Carlo estimator in Peherstorfer et al. (2016), the optimal allocations and weights there apply directly to the Euclidean multi-fidelity covariance estimator (2) and in first-order to the LEMF estimator (7), with the exception that our definitions of σ2 ℓand ρℓare generalized for matrices. Equation 20 directly follows from (19). F. Algorithmic Description and Code Sketch of LEMF Estimator The code shown in Figure 7 sketches an implementation of the LEMF estimator. It is important to note that it only Multi-Fidelity Covariance Estimation from numpy import cov from scipy.linalg import logm, expm # compute sample-size vector n[] and coefficients a[] as in Proposition 3.4 # collect samples n_l x d of level l = 0, ..., L in array samps[l] S = logm(cov(samps[0].T)) for i in range(len(a)): S = S + a[i]*(logm(cov(samps[i+1].T))-logm(cov(samps[i+1][:n[i]].T))) S = expm(S) Figure 7. This code sketches an implementation of the LEMF estimator. It builds on standard numerical linear algebra routines that are readily available in Num Py/Sci Py (Harris et al., 2020; Virtanen et al., 2020). requires numerical linear algebra functions that are readily available in, e.g., Num Py/Sci Py (Harris et al., 2020; Virtanen et al., 2020). A detailed implementation to reproduce the shown numerical results is provided in the supplemental material. G. Supplemental to Motivating Example with Gaussian Consider high-fidelity samples y(0) N(0, Σ) with Σ Sd ++ and low-fidelity samples which correspond to highfidelity samples corrupted by additional independent noise, y(ℓ) = y + ϵ(ℓ) , ϵ N(0, Γ(ℓ)) . (50) Because the samples are Gaussian we may compute explicitly the generalized variances (10) and generalized correlations (9) needed for the optimal sample allocation in Theorem 3.4. 
Generalized variance. For the generalized variance we have that
$$ \mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top]\big) = \mathbb{E}\big[\|y^{(0)}(y^{(0)})^\top - \Sigma\|_F^2\big] = \mathbb{E}\big[\|y^{(0)}(y^{(0)})^\top\|_F^2\big] - 2\,\mathbb{E}\big[\langle y^{(0)}(y^{(0)})^\top, \Sigma\rangle\big] + \|\Sigma\|_F^2 = \mathbb{E}\big[\mathrm{Tr}\big(y^{(0)}(y^{(0)})^\top y^{(0)}(y^{(0)})^\top\big)\big] - \|\Sigma\|_F^2. $$
Focusing on the first term we see
$$ \mathrm{Tr}\big(y^{(0)}(y^{(0)})^\top y^{(0)}(y^{(0)})^\top\big) = \|y^{(0)}\|^2\,\mathrm{Tr}\big(y^{(0)}(y^{(0)})^\top\big) = \|y^{(0)}\|^2\sum_{i=1}^{d}\big(y^{(0)}_i\big)^2 = \|y^{(0)}\|^4, $$
so that
$$ \mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top]\big) = \mathbb{E}\big[\|y^{(0)}\|^4\big] - \|\Sigma\|_F^2, $$
which is analogous to the formula $\mathrm{Var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2$ for scalar random variables. For a multivariate Gaussian random variable with zero mean we have that
$$ \mathbb{E}\big[\|y^{(0)}\|^4\big] = \mathrm{Tr}(\Sigma)^2 + 2\,\mathrm{Tr}(\Sigma^2), \quad (51) $$
so that
$$ \mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top]\big) = \mathrm{Tr}(\Sigma^2) + \mathrm{Tr}(\Sigma)^2. \quad (52) $$
Similarly, we may apply (52) for the low-fidelity samples $y^{(\ell)} \sim \mathcal{N}(0, \Sigma + \Gamma^{(\ell)})$.

Generalized correlation. In order to compute the generalized correlation we compute the cross-covariance,
$$ \mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top, y^{(\ell)}(y^{(\ell)})^\top]\big) = \mathbb{E}\big[\big\langle y^{(0)}(y^{(0)})^\top - \Sigma,\; y^{(\ell)}(y^{(\ell)})^\top - \Sigma^{(\ell)}\big\rangle\big] = \mathbb{E}\big[\mathrm{Tr}\big(y^{(0)}(y^{(0)})^\top y^{(\ell)}(y^{(\ell)})^\top\big)\big] - \mathrm{Tr}(\Sigma\Sigma^{(\ell)}) = \mathbb{E}\big[\big((y^{(0)})^\top y^{(\ell)}\big)^2\big] - \mathrm{Tr}(\Sigma\Sigma^{(\ell)}), $$
where we have used the notation $\Sigma^{(\ell)} = \Sigma + \Gamma^{(\ell)}$. Writing $y^{(\ell)} = y^{(0)} + \epsilon^{(\ell)}$ and expanding the squared inner product gives
$$ \big((y^{(0)})^\top y^{(\ell)}\big)^2 = \big(\|y^{(0)}\|^2 + (y^{(0)})^\top\epsilon^{(\ell)}\big)^2 = \|y^{(0)}\|^4 + 2\|y^{(0)}\|^2\,(y^{(0)})^\top\epsilon^{(\ell)} + \big((y^{(0)})^\top\epsilon^{(\ell)}\big)^2. $$
For the first term we may use (51), while for the second term we have by independence that
$$ \mathbb{E}\big[\|y^{(0)}\|^2\,(y^{(0)})^\top\epsilon^{(\ell)}\big] = \mathbb{E}\big[\|y^{(0)}\|^2\,y^{(0)}\big]^\top\,\mathbb{E}\big[\epsilon^{(\ell)}\big] = 0. $$
For the last term we rely on properties of the trace operator:
$$ \mathbb{E}\big[\big((y^{(0)})^\top\epsilon^{(\ell)}\big)^2\big] = \mathbb{E}\big[\mathrm{Tr}\big((y^{(0)})^\top\epsilon^{(\ell)}\,(y^{(0)})^\top\epsilon^{(\ell)}\big)\big] = \mathbb{E}\big[\mathrm{Tr}\big((y^{(0)})^\top\epsilon^{(\ell)}(\epsilon^{(\ell)})^\top y^{(0)}\big)\big] = \mathbb{E}\big[\mathrm{Tr}\big(y^{(0)}(y^{(0)})^\top\epsilon^{(\ell)}(\epsilon^{(\ell)})^\top\big)\big] = \mathrm{Tr}\big(\mathbb{E}[y^{(0)}(y^{(0)})^\top]\,\mathbb{E}[\epsilon^{(\ell)}(\epsilon^{(\ell)})^\top]\big) = \mathrm{Tr}(\Sigma\Gamma^{(\ell)}). $$
Combining everything gives the cross-covariance
$$ \mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top, y^{(\ell)}(y^{(\ell)})^\top]\big) = \mathrm{Tr}(\Sigma)^2 + 2\,\mathrm{Tr}(\Sigma^2) + \mathrm{Tr}(\Sigma\Gamma^{(\ell)}) - \mathrm{Tr}(\Sigma\Sigma^{(\ell)}) = \mathrm{Tr}(\Sigma)^2 + \mathrm{Tr}(\Sigma^2) = \sigma_0^2, \quad (53) $$
which is analogous to the scalar setting where $\mathrm{Cov}[y, \epsilon] = 0$ so that $\mathrm{Cov}[y, y + \epsilon] = \mathrm{Var}[y]$. Using the notation $\sigma_0$, $\sigma_\ell$, and $\rho_\ell$ defined in Section 3.3, the generalized correlation is therefore
$$ \rho_\ell = \frac{\mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top, y^{(\ell)}(y^{(\ell)})^\top]\big)}{\sigma_0\sigma_\ell} = \frac{\sigma_0}{\sigma_\ell}. $$
Notice that $\sigma_\ell > \sigma_0$ and so $\rho_\ell < 1$.

Example set up. For the example presented in Section 4.1, we select $\Sigma$ randomly from the Wishart(4) ensemble, that is, $\Sigma = A^\top A$ where $A \in \mathbb{R}^{4\times 4}$ has i.i.d. standard normal entries. The particular realization of $\Sigma$ we obtained was
$$ \Sigma = \begin{bmatrix} 2.52 & 0.17 & 0.67 & 0.98 \\ 0.17 & 0.64 & 0.29 & 0.35 \\ 0.67 & 0.29 & 0.49 & 0.52 \\ 0.98 & 0.35 & 0.52 & 1.31 \end{bmatrix}. $$
The noise covariance matrices defining the low-fidelity samples at each level $\ell \in \{1, 2, 3\}$ are given by
$$ \Gamma^{(1)} = 0.1\,I_{4\times 4}, \quad \Gamma^{(2)} = 0.5\,I_{4\times 4}, \quad \Gamma^{(3)} = I_{4\times 4}, $$
and result in generalized correlations (9)
$$ \rho_1 \approx 0.93, \quad \rho_2 \approx 0.74, \quad \rho_3 \approx 0.58. $$
We impose sampling costs $c_0 = 1$ and $c_\ell = 10^{-\ell-1}$ for $\ell \in \{1, 2, 3\}$. Setting the computational budget to $B = 15$ and taking sample sizes from Proposition 3.4 yields
$$ n_0 = 12, \quad n_1 = 199, \quad n_2 = 505, \quad n_3 = 2073. $$
Using the derived sample allocation we draw the corresponding number of samples from each level to obtain the three multi-fidelity estimators: EMF (2), LEMF (7), and the truncated estimator (22). For the truncated multi-fidelity estimator, we set a threshold of $\delta = 10^{-16}$ on the eigenvalues. Additionally, we also draw $n_{\mathrm{high}} = B/c_0 = 15$ high-fidelity samples and $n_{\mathrm{low}} = B/c_3 = 1.5\times 10^{5}$ low-fidelity samples $y^{(3)}$ to compute single-fidelity estimates with the same computational budget. The results reported in Table 1 are estimated MSEs averaged over 100 independent trials. Across all 100 trials, about 5% of the EMF estimates were indefinite.

H. Supplemental to Uncertainty Quantification of Steady-State Heat Flow

For demonstration purposes, we show in Figure 8 the numerical solution of the steady-state heat-flow problem (23) for three different, randomly sampled parameters $\theta$.
Figure 8. Supplemental to heat flow example: Three sample solutions (temperature) of the steady-state heat equation (23) corresponding to different random parameters $\theta \in \mathbb{R}^4$. The axes show the numerical solution $u(x, \theta)$ over the spatial domain $x$.

Observations $y^{(\ell)}$, $\ell \in \{0, 1\}$, consist of temperature measurements at 10 equidistant locations over the spatial domain, where $\ell$ corresponds to the number of grid points used to numerically solve (23). Our goal in this example is to estimate the covariance of $y^{(0)}$, the observations corresponding to the highest number of solver grid points. We perform a pilot study to estimate the generalized variances and correlations $\sigma_0, \sigma_1, \rho_1$ needed to compute the optimal multi-fidelity sample allocation $n^* = [n_0^*, n_1^*]$ and coefficients $\alpha^*$. We use $10^5$ samples of $\theta$ for the pilot study to avoid mixing errors of the optimal sample allocation with the MSEs of the estimators. Figure 9 is analogous to Figure 2 and shows the MSEs of our LEMF estimator compared to the MSEs of single-fidelity and other multi-fidelity estimators, except that now the minimum and maximum MSEs over all 100 trials are shown with error bars. Figures 10a-b show speedups of our LEMF estimator with respect to the Euclidean and affine-invariant metrics. As the budget is increased, the Euclidean estimator becomes indefinite less frequently, which can be seen in Figure 10c. Figure 11 visualizes the convergence behavior of various estimators. Our LEMF estimator converges more quickly to the true value of $\Sigma$ element-wise than the single-fidelity estimators; this is particularly true for the diagonal elements of $\Sigma$. The y-scale is comparable to that of Figure 3c, and the x-scale shows the costs and is the same as the x-axis of Figure 3c.

I. Supplemental to Metric Learning: Surface Quasi-Geostrophy

We follow the surface quasi-geostrophic (SQG) model for the evolution of surface buoyancy $b : X \times [0, \infty) \to \mathbb{R}$ in the periodic domain $X = [-\pi, \pi]^2 \times (-\infty, 0]$ presented in Held et al. (1985) and Capet et al. (2008),
$$ \partial_t b(x, t; \theta) + J(\psi, b) = 0, \quad z = 0, \qquad \Delta\psi = 0, \quad z < 0. \quad (55) $$
Here $x = [x, y, z] \in X$ and the Jacobian is $J(\psi, b) = \partial_x\psi\,\partial_y b - \partial_y\psi\,\partial_x b$. The SQG equation (55) is solved by applying a Fourier transform and integrating in time with finite differences. The parameter vector $\theta = [\theta_1, \dots, \theta_5]$ defines the initial buoyancy $b_0 : \mathbb{R}^2 \to \mathbb{R}$ at the surface $z = 0$ as well as the flow. The initial buoyancy is given by the Gaussian
$$ b_0(x, y; \theta) = \frac{1}{(2\pi/|\theta_5|)^2}\exp\big(-x^2 - \exp(2\theta_1)\,y^2\big), \quad (57) $$
where $\theta_1$ is the log aspect ratio that determines the shape of the ellipse (see Figure 12) and $\theta_5$ controls the amplitude. Of the remaining parameters, $\theta_2$ is the gradient of the Coriolis parameter, $\theta_3$ is the log buoyancy frequency, and $\theta_4$ is the background zonal flow along the x-axis.

The high-fidelity random variable consists of nine equally-spaced observations of the solution $b(x, T; \theta)$ with $T = 24$, where $b(\cdot, \cdot; \theta)$ is computed numerically using a time step of size $\Delta t = 0.005$ and 256 grid points along each coordinate axis. The surrogate random variable consists of observations of the solution at the same locations and time, except with $b(\cdot, \cdot; \theta)$ computed numerically using only 64 grid points along each coordinate axis. The cost of sampling each random variable is taken to be proportional to the total number of spatial grid points used in the solvers, $c_0 = 256^2 = 65{,}536$ and $c_1 = 64^2 = 4{,}096$. The observation points are shown in plots (c) and (d) of Figure 12.
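To put these costs in perspective, each high-fidelity evaluation is as expensive as $c_0/c_1 = 65{,}536/4{,}096 = 16$ surrogate evaluations; for an illustrative budget of $B = 10^7$ cost units, a single-fidelity estimator could afford at most $\lfloor 10^7/65{,}536\rfloor = 152$ high-fidelity samples, whereas the surrogate alone would afford $\lfloor 10^7/4{,}096\rfloor = 2{,}441$ samples. The multi-fidelity estimators exploit this gap by shifting most of the budget to the cheaper level.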
The input parameters $\theta$ are drawn from a Gaussian mixture model, where $i \in \{0, 1\}$ is a latent variable determining the mixture component sampled from. The full distribution of $\theta$ is
$$ \theta \mid i \sim \mathcal{N}(\mu_i, C), \qquad i \sim \mathrm{Bernoulli}(1/2), $$
with means
$$ \mu_0 = [1, 0, 0, 0, 4]^\top, \qquad \mu_1 = [0.1, 0, 0, 0, 4]^\top. $$
Note that we fix the third parameter $\theta_3 = 0$, but the covariance matrix of the output observations $y^{(0)}$ and $y^{(1)}$ will still be positive definite. The first class $i = 0$ corresponds to an initial buoyancy with a larger log-aspect ratio, which introduces an instability, while the second class $i = 1$ corresponds to a smaller log-aspect ratio. Figure 12 shows the buoyancy at initial and final time for stochastic inputs corresponding to both classes. We sample the class variable $i$ from a Bernoulli distribution with parameter 1/2, i.e., $i = 0$ and $i = 1$ are drawn with equal probability.

We define the mean relative error of distances $\mathrm{MRE}(A)$ for the metric $A$ with respect to the reference metric $A_0$ as
$$ \mathrm{MRE}(A) = \frac{1}{5000}\sum_{i=1}^{5000}\frac{\big|d_A(y_i, 0) - d_{A_0}(y_i, 0)\big|}{d_{A_0}(y_i, 0)}. $$

Figure 9. Supplemental for heat-flow problem: MSE versus costs (budget B) in (a) the log-Euclidean metric, (b) the affine-invariant metric, and (c) the Euclidean metric, for the HF-only, Euclidean MF, truncated MF, and LEMF (ours) estimators. Plots are analogous to Figure 2, except that the minimum and maximum MSE over all 100 trials are shown as error bars. Note that the surrogate-only estimator is not shown to ease exposition. The Euclidean multi-fidelity estimator is shown only in plot (c) because it becomes indefinite and therefore the MSE cannot be computed in the log-Euclidean and affine-invariant metrics.

Figure 10. Supplemental for heat flow problem: Plots (a) and (b) show the speedups of our LEMF estimator compared to the single-fidelity estimator that uses the high-fidelity samples alone, in the Euclidean metric (tolerance $2\times 10^{-4}$) and the affine-invariant metric (tolerance $3\times 10^{1}$), respectively. Note that the Euclidean estimator is indefinite for more than 10% of the trials and thus the speedup with respect to the affine-invariant metric cannot be computed. Plot (c) shows the percentage of trials, as a function of costs (budget B), in which the Euclidean multi-fidelity estimator leads to an indefinite covariance matrix estimate.

Figure 11. Supplemental for heat-flow problem: The plot shows the convergence behavior (estimate versus costs) of the estimators (HF only, surrogate only, LEMF (ours), and the truth). Our LEMF estimator is closer to the true value (dashed black) than the single-fidelity estimator, especially for low costs and for diagonal entries of the covariance matrix.

Figure 12. Supplemental for metric learning: Plots show examples of the buoyancy at initial (top) and final time (bottom) for $\theta$ sampled from class $i = 0$ (left) and $i = 1$ (right): (a) class $i = 0$, initial buoyancy; (b) class $i = 1$, initial buoyancy; (c) class $i = 0$, final-time buoyancy; (d) class $i = 1$, final-time buoyancy. Observations consist of solution values at nine spatial locations in the domain, as depicted in plots (c) and (d).
We use the observations to estimate a metric which will distinguish between observations corresponding to θ sampled from class i = 0 and θ sampled from class i = 1.
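For the evaluation with the mean relative error of distances defined above, a minimal sketch is the following, assuming $d_A$ denotes the Mahalanobis-type distance $d_A(y, 0) = \sqrt{y^\top A y}$ induced by an SPD matrix $A$; the function name and the example inputs are illustrative assumptions and are not taken from our released implementation.

import numpy as np

def mre(A, A0, Y):
    # mean relative error of distances to the origin under metric A, relative to the reference metric A0;
    # Y has shape (num_samples, d) and both A and A0 are assumed SPD
    dA  = np.sqrt(np.einsum('ni,ij,nj->n', Y, A,  Y))
    dA0 = np.sqrt(np.einsum('ni,ij,nj->n', Y, A0, Y))
    return np.mean(np.abs(dA - dA0) / dA0)

rng = np.random.default_rng(0)
Y  = rng.standard_normal((5000, 9))    # illustrative stand-in for 5000 held-out observations in d = 9
A0 = np.eye(9)                         # illustrative reference metric
A  = 1.1 * np.eye(9)                   # illustrative estimated metric
print(mre(A, A0, Y))                   # relative error of distances induced by A versus A0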