# Multi-Fidelity Covariance Estimation in the Log-Euclidean Geometry

Aimee Maurais (1), Terrence Alsup (2), Benjamin Peherstorfer (2), Youssef Marzouk (1)

(1) Massachusetts Institute of Technology, Cambridge, MA, USA. (2) Courant Institute of Mathematical Sciences, New York University, New York, NY, USA. Correspondence to: Aimee Maurais, Terrence Alsup.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

Abstract. We introduce a multi-fidelity estimator of covariance matrices that employs the log-Euclidean geometry of the symmetric positive-definite manifold. The estimator fuses samples from a hierarchy of data sources of differing fidelities and costs for variance reduction while guaranteeing definiteness, in contrast with previous approaches. The new estimator makes covariance estimation tractable in applications where simulation or data collection is expensive; to that end, we develop an optimal sample allocation scheme that minimizes the mean-squared error of the estimator given a fixed budget. Guaranteed definiteness is crucial to metric learning, data assimilation, and other downstream tasks. Evaluations of our approach using data from physical applications (heat conduction, fluid dynamics) demonstrate more accurate metric learning and speedups of more than one order of magnitude compared to benchmarks.

1. Introduction

Covariance estimation is a fundamental task for understanding the relationships between multiple features of data points. It arises in a wide range of machine learning applications such as metric learning (Zadeh et al., 2016; Huang et al., 2015), graphical models (Loh & Bühlmann, 2014; Ravikumar et al., 2011), and classification (Hastie et al., 2001), as well as in spatial statistics (Cressie, 2015) and in science and engineering applications such as filtering and data assimilation (Schillings & Stuart, 2017; Bergemann & Reich, 2010; Law et al., 2015). The canonical Monte Carlo approach to estimating the covariance matrix $\Sigma = \operatorname{Cov}[y]$ of a random variable $y$ with $d$ components is to take $n$ samples $y_1, \dots, y_n$ and compute the sample covariance matrix

$$\hat{\Sigma}_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})^\top, \qquad (1)$$

where $\bar{y} = \mathbb{E}[y]$. (If the mean of $y$ is unknown, one can replace $\bar{y}$ with the sample mean $\frac{1}{n}\sum_{i=1}^{n} y_i$ and normalize (1) with $1/(n-1)$.) The mean-squared error (MSE) of the estimator $\hat{\Sigma}_n$ in the Euclidean metric, i.e., $\mathbb{E}[d_F(\hat{\Sigma}_n, \Sigma)^2]$ where $d_F(A, B) = \|A - B\|_F$ and $\|\cdot\|_F$ denotes the Frobenius norm, decays as $O(1/n)$ if the samples are independent and identically distributed (i.i.d.). Furthermore, if $\Sigma$ lies in the space $S^d_{++}$ of symmetric (strictly) positive definite (SPD) $d \times d$ matrices, then the sample covariance matrix $\hat{\Sigma}_n$ is in $S^d_{++}$ almost surely for $n > d$.

We are interested in situations where obtaining a realization of $y$ incurs a high cost $c_0 > 0$ and thus straightforward application of the Monte Carlo estimator (1) becomes computationally intractable. High sampling cost is a pervasive issue in science and engineering, where sampling typically corresponds to measuring the response of processes and systems that must be either numerically simulated or physically observed. We note that there exists a wide range of techniques for covariance estimation in high dimensions (Bickel & Levina, 2008; Ledoit & Wolf, 2012; Loh & Wainwright, 2012; Fan et al., 2011; Tsiligkaridis & Hero, 2013; Loh & Bühlmann, 2014; Donoho et al., 2018). Our focus here instead is on situations where drawing each sample incurs high cost; see also the discussion in the Conclusions.
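As a point of reference for the multi-fidelity construction that follows, the snippet below is a minimal, self-contained sketch (it is not the authors' code) of the single-fidelity estimator (1) with a known mean, illustrating the $O(1/n)$ decay of its Frobenius-norm MSE; the matrix `Sigma` is an arbitrary SPD matrix chosen purely for illustration.

```python
# Minimal illustration of the Monte Carlo covariance estimator (1) and its
# O(1/n) Frobenius-norm MSE decay; Sigma below is an arbitrary SPD matrix.
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)              # a fixed SPD "true" covariance

def sample_cov(Y, mean):
    """Sample covariance (1) with known mean, normalized by n."""
    Z = Y - mean
    return Z.T @ Z / Y.shape[0]

for n in (10, 100, 1000):
    errs = []
    for _ in range(200):
        Y = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
        errs.append(np.linalg.norm(sample_cov(Y, np.zeros(d)) - Sigma, "fro") ** 2)
    print(n, np.mean(errs))                  # empirical MSE shrinks roughly like 1/n
```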
Multi-fidelity estimation. The opportunity that we exploit is that in many applications there is a multi-fidelity hierarchy of data sources from which correlated auxiliary samples can be obtained; see Peherstorfer et al. (2018) for a survey. Multi-fidelity approaches have found increasing utility in machine learning, e.g., for optimization (Poloczek et al., 2017; Wu et al., 2020; Li et al., 2020), classification and sequential learning (Marques et al., 2018; Gundersen et al., 2021; Palizhati et al., 2022), and sampling (Cai & Adams, 2022; Prescott & Baker, 2020; Alsup et al., 2022).

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability triple. In the multi-fidelity setting, in addition to the original high-fidelity random variable $y \equiv y^{(0)} : \Omega \to \mathbb{R}^d$ generating the sample set $\{y^{(0)}(\omega_i)\}_{i=1}^{n_0}$, one has access to a number of low-fidelity surrogate random variables $y^{(1)}, \dots, y^{(L)} : \Omega \to \mathbb{R}^d$ and obtains corresponding sets of samples $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_\ell}$, $\ell = 1, \dots, L$. For a particular $\omega \in \Omega$, the realizations $y^{(0)}(\omega), y^{(1)}(\omega), \dots, y^{(L)}(\omega)$ are correlated, but the realizations within each set $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_\ell}$ are i.i.d., for $\ell = 0, \dots, L$. We assume that the surrogates $y^{(1)}, \dots, y^{(L)}$ are sorted in order of decreasing fidelity to $y^{(0)}$ as measured by correlation coefficients $1 \geq |\rho_1| \geq |\rho_2| \geq \cdots \geq |\rho_L|$. Here $\rho_\ell$ is a notion of multivariate correlation between $y^{(0)}(\omega)$ and $y^{(\ell)}(\omega)$, $\ell = 1, \dots, L$, that will be formally defined in Section 3.3. Corresponding to decreasing fidelity, the surrogate random variables have decreasing sampling costs, $c_0 \geq c_1 \geq \cdots \geq c_L$.

Loss of definiteness in Euclidean multi-fidelity covariance estimation. Direct application of multi-level and multi-fidelity Monte Carlo estimation (Giles, 2008; Cliffe et al., 2011; Teckentrup et al., 2013; Ng & Willcox, 2014) based on control variates to (co)variance estimation has been proposed in, e.g., Bierig & Chernov (2015); Qian et al. (2018); Mycek & De Lozzo (2019). These multi-fidelity estimators, however, rely on differences of single-fidelity Monte Carlo estimators, which can lead to a loss of definiteness of the estimated covariance matrix, as we will detail in Section 2. Ignoring this loss of definiteness can result in errors in downstream tasks (e.g., instabilities in Kalman filtering, distances becoming negative in a learned metric, inadmissible graph structures). Accordingly, post-processing strategies (Hoel et al., 2016; Chernov et al., 2021) have been proposed that eliminate negative eigenvalues, but elimination of eigenvalues introduces potential for large errors, requires hand-tuning of thresholds, and often, as the authors note, incurs high computational cost; see also Qi & Sun (2006); Popov et al. (2021).

Our contribution: guaranteed positive multi-fidelity log-Euclidean covariance estimation. We introduce a multi-fidelity covariance estimator based on control variates in the log-Euclidean geometry for $S^d_{++}$ (Arsigny et al., 2006). The log-Euclidean geometry equips $S^d_{++}$ with vector-space structure by placing it in one-to-one correspondence with $S^d$, the vector space of $d \times d$ symmetric matrices, and defining notions of logarithmic addition and scalar multiplication for elements of $S^d_{++}$. In particular, we make use of this vector space structure to safely subtract sample covariance matrices as part of a control variate construction in the tangent space to $S^d_{++}$, which we identify with $S^d$.
The log-Euclidean vector space can be equipped with the log-Euclidean metric for SPD matrices, which itself induces Riemannian structure on $S^d_{++}$. An additional advantage of the log-Euclidean geometry used here, as compared to many other geometries for $S^d_{++}$ (see Section 3.1), is that computations remain efficient: the most complex operations required are matrix exponentials and logarithms.

Summary of main contributions. The main contributions and key features of this work are summarized as follows: (a) We introduce a multi-fidelity estimator that preserves definiteness of covariance matrices in the finite-sample regime and that has first-order minimal mean-squared error (MSE) in the log-Euclidean metric. (b) We provide analysis that leads to a first-order optimal sample allocation between the high-fidelity and surrogate models for gaining orders-of-magnitude speedups compared to using high-fidelity samples alone. (c) We demonstrate the algorithmic benefits of the log-Euclidean geometry by showing that our estimator can be implemented with few lines of code (Appendix F), relying on standard numerical linear algebra routines. This low barrier to implementation allows the estimator to be readily plugged into existing code with minimal effort and even as a post-processing step.

2. Multi-Fidelity Covariance Estimation in Euclidean Geometry and Loss of Definiteness

Recall from the introduction that we generate a set of high-fidelity samples $\{y^{(0)}(\omega_i)\}_{i=1}^{n_0}$ and sets of surrogate samples $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_\ell}$ over different fidelity levels $\ell = 1, \dots, L$. The classical approach to multi-fidelity estimation, i.e., in the Euclidean geometry, is to use the surrogate samples to define control variates that reduce the variance of the standard Monte Carlo estimator (1) (Bierig & Chernov, 2015; Qian et al., 2018; Mycek & De Lozzo, 2019). This approach yields the Euclidean multi-fidelity (EMF) covariance estimator

$$\hat{\Sigma}^{\mathrm{EMF}}_{\boldsymbol{n}} = \hat{\Sigma}^{(0)}_{n_0} + \sum_{\ell=1}^{L} \alpha_\ell \left( \hat{\Sigma}^{(\ell)}_{n_\ell} - \hat{\Sigma}^{(\ell)}_{n_{\ell-1}} \right), \qquad (2)$$

where $\alpha = [\alpha_1, \dots, \alpha_L]^\top \in \mathbb{R}^L$ are the control variate weights and

$$\hat{\Sigma}^{(\ell)}_{n_\ell} = \frac{1}{n_\ell} \sum_{i=1}^{n_\ell} \left( y^{(\ell)}(\omega_i) - \bar{y}^{(\ell)}_{n_\ell} \right)\left( y^{(\ell)}(\omega_i) - \bar{y}^{(\ell)}_{n_\ell} \right)^\top, \qquad \hat{\Sigma}^{(\ell)}_{n_{\ell-1}} = \frac{1}{n_{\ell-1}} \sum_{i=1}^{n_{\ell-1}} \left( y^{(\ell)}(\omega_i) - \bar{y}^{(\ell)}_{n_{\ell-1}} \right)\left( y^{(\ell)}(\omega_i) - \bar{y}^{(\ell)}_{n_{\ell-1}} \right)^\top \qquad (3)$$

are sample covariance matrices computed from $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_\ell}$ and $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_{\ell-1}}$, $\ell = 1, \dots, L$, where $\bar{y}^{(\ell)}_{m}$ denotes the sample mean of the first $m$ samples at level $\ell$. We require for well-definedness that $n_\ell \geq n_{\ell-1}$, $\ell = 1, \dots, L$. For levels $\ell, t \in \{0, \dots, L\}$ and integers $m_i, m_j \in \mathbb{Z}^+$, notice that the sample covariance matrices

$$\hat{\Sigma}^{(\ell)}_{m_i} = \frac{1}{m_i} \sum_{i'=1}^{m_i} \left( y^{(\ell)}(\omega_{i'}) - \bar{y}^{(\ell)}_{m_i} \right)\left( y^{(\ell)}(\omega_{i'}) - \bar{y}^{(\ell)}_{m_i} \right)^\top, \qquad \hat{\Sigma}^{(t)}_{m_j} = \frac{1}{m_j} \sum_{j'=1}^{m_j} \left( y^{(t)}(\omega_{j'}) - \bar{y}^{(t)}_{m_j} \right)\left( y^{(t)}(\omega_{j'}) - \bar{y}^{(t)}_{m_j} \right)^\top$$

are correlated because they are constructed from evaluations of $y^{(\ell)}$ and $y^{(t)}$ on the common set of events $\{\omega_i\}_{i=1}^{\min\{m_i, m_j\}}$. This correlation is the key ingredient to achieving variance reduction in (2) over equivalent-cost single-fidelity estimators; see Section 3.3.

Loss of definiteness. The use of regular subtraction and scalar multiplication in Equation (2) treats the symmetric positive definite matrices $\hat{\Sigma}^{(\ell)}_{n_\ell}$ and $\hat{\Sigma}^{(\ell)}_{n_{\ell-1}}$, $\ell = 0, \dots, L$, as elements of the Euclidean vector space $S^d$. However, SPD matrices do not constitute a vector space under the Euclidean geometry, and thus the Euclidean multi-fidelity estimator (2) may become indefinite due to the presence of subtraction. That is, positive definiteness of (2) in the finite-sample regime cannot be guaranteed even if its constituent single-fidelity sample covariance matrices are definite. The sketch below illustrates how this indefiniteness can arise in a bifidelity toy example.
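The following is a small, self-contained illustration of this failure mode, not taken from the paper: two hypothetical toy models `G_hi` and `G_lo` share the same underlying realizations $\omega_i$ (which is what makes their sample covariances correlated), and the plain subtraction in (2), here with an arbitrary, non-optimized weight, can push eigenvalues of the estimate below zero in the small-sample regime.

```python
# Toy bifidelity illustration of the EMF estimator (2) and its possible loss of
# definiteness. G_hi and G_lo are hypothetical models chosen only for demonstration.
import numpy as np

rng = np.random.default_rng(1)
d = 6

def G_hi(omega):                      # hypothetical high-fidelity response
    return np.tanh(omega)

def G_lo(omega):                      # hypothetical cheap surrogate, correlated with G_hi
    return omega

def sample_cov(Y):
    Yc = Y - Y.mean(axis=0)
    return Yc.T @ Yc / (Y.shape[0] - 1)

n0, n1 = 8, 200                       # n1 >= n0; n0 deliberately close to d
alpha = 1.0                           # arbitrary (non-optimized) control-variate weight
n_indefinite = 0
for _ in range(500):
    omegas = rng.standard_normal((n1, d))        # shared realizations omega_i
    y_hi = G_hi(omegas[:n0])                     # y^(0)(omega_i), i = 1, ..., n0
    y_lo = G_lo(omegas)                          # y^(1)(omega_i), i = 1, ..., n1
    emf = sample_cov(y_hi) + alpha * (sample_cov(y_lo) - sample_cov(y_lo[:n0]))
    n_indefinite += np.linalg.eigvalsh(emf).min() < 0
print(f"EMF estimate indefinite in {n_indefinite} of 500 trials")
```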
3. Log-Euclidean Multi-Fidelity (LEMF) Covariance Estimator

3.1. Log-Euclidean Geometry and its Benefits

We begin by orienting ourselves within the log-Euclidean geometry for $S^d_{++}$ as defined in Arsigny et al. (2006). For $A, B \in S^d_{++}$ and $\lambda \in \mathbb{R}$ we have notions of logarithmic addition $\oplus$ and scalar multiplication $\odot$,

$$A \oplus B = \operatorname{Exp}(\operatorname{Log} A + \operatorname{Log} B), \qquad \lambda \odot A = \operatorname{Exp}(\lambda \operatorname{Log} A) = A^{\lambda}, \qquad (4)$$

corresponding to regular addition and scalar multiplication of $\operatorname{Log} A$ and $\operatorname{Log} B$ in $S^d$. Here $\operatorname{Exp} : S^d \to S^d_{++}$ is the matrix exponential and $\operatorname{Log} : S^d_{++} \to S^d$ is its inverse (Arsigny et al., 2007, Def. 2.1). The mappings $\operatorname{Log}$ and $\operatorname{Exp}$ place $S^d$ and $S^d_{++}$ in one-to-one correspondence, enabling $S^d_{++}$ to make use of the vector space structure of $S^d$. The definitions of $\oplus$ and $\odot$ in (4) satisfy the axioms of a vector space, and we can equip this vector space $S^d_{++}(\oplus, \odot)$ with the log-Euclidean metric,

$$d_{\mathrm{LE}}(A, B) = \| \operatorname{Log} A - \operatorname{Log} B \|_F. \qquad (5)$$

Computations related to the log-Euclidean geometry can be performed economically because efficient algorithms exist for computing matrix exponentials and logarithms. Efficient computation is a major advantage as compared to the geometries induced by other metrics such as the affine-invariant metric (Bhatia, 2007; Huang et al., 2015).

Review of other geometries for $S^d_{++}$. The set of $d \times d$ symmetric positive definite matrices $S^d_{++}$ forms a Riemannian manifold embedded in the vector space of $d \times d$ symmetric matrices $S^d$. One can equip $S^d_{++}$ with a number of geometries in addition to the Euclidean (which we have shown to be problematic for multi-fidelity estimation) and log-Euclidean geometries discussed here. There is, e.g., the affine-invariant geometry (Bhatia, 2007), often billed as the canonical choice, which gives rise to the affine-invariant metric,

$$d_{\mathrm{Aff}}(A, B) = \| \operatorname{Log}(A^{-1} B) \|_F. \qquad (6)$$

Other choices include the Bures-Wasserstein geometry, arising from the $L^2$-Wasserstein distance between multivariate, nondegenerate, mean-zero Gaussians (Malagò et al., 2018), and the Log-Cholesky geometry, obtained by parametrizing SPD matrices in terms of their Cholesky factors and defining a Riemannian metric on the space of lower-triangular matrices with positive diagonal entries (Lin, 2019). The choice of geometry for $S^d_{++}$ is application-dependent and often depends on factors such as the cost to compute geodesic distance, the availability of the Riemannian exponential and logarithmic maps in closed form, and whether the so-called swelling effect (Arsigny et al., 2006) is an issue. We choose the log-Euclidean geometry for its computational advantages, described at the end of the previous paragraph.

3.2. Log-Euclidean Multi-Fidelity Estimation

We construct the log-Euclidean multi-fidelity (LEMF) covariance estimator via linear control variates in the log-Euclidean geometry for $S^d_{++}$,

$$\hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}} = \hat{\Sigma}^{(0)}_{n_0} \oplus \bigoplus_{\ell=1}^{L} \alpha_\ell \odot \left( \hat{\Sigma}^{(\ell)}_{n_\ell} \ominus \hat{\Sigma}^{(\ell)}_{n_{\ell-1}} \right) = \operatorname{Exp}\!\left( \operatorname{Log} \hat{\Sigma}^{(0)}_{n_0} + \sum_{\ell=1}^{L} \alpha_\ell \left( \operatorname{Log} \hat{\Sigma}^{(\ell)}_{n_\ell} - \operatorname{Log} \hat{\Sigma}^{(\ell)}_{n_{\ell-1}} \right) \right), \qquad (7)$$

where we have used the convention $A \ominus B = A \oplus \left( (-1) \odot B \right)$. The estimator (7) has the same algebraic structure as (2), but is guaranteed to be positive definite at finite $\boldsymbol{n} = (n_0, n_1, \dots, n_L)$, as the following proposition shows:

Proposition 3.1. The log-Euclidean multi-fidelity covariance estimator (7) exists and is positive definite almost surely whenever $n_0, n_1, \dots, n_L > d$.

In addition to interpreting (7) as a linear control variate construction in the log-Euclidean geometry for $S^d_{++}$, we can equivalently view the LEMF estimator as a Euclidean linear control variate estimate of $\operatorname{Log} \Sigma$ on $S^d$,

$$\operatorname{Log} \hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}} = \operatorname{Log} \hat{\Sigma}^{(0)}_{n_0} + \sum_{\ell=1}^{L} \alpha_\ell \left( \operatorname{Log} \hat{\Sigma}^{(\ell)}_{n_\ell} - \operatorname{Log} \hat{\Sigma}^{(\ell)}_{n_{\ell-1}} \right), \qquad (8)$$

which we then map to $S^d_{++}$ via the matrix exponential; a brief code sketch of this construction is given below.
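The following is a compact sketch, under the assumption that SciPy's `logm` and `expm` are acceptable implementations of $\operatorname{Log}$ and $\operatorname{Exp}$, of the log-Euclidean operations (4)-(5) and of the bifidelity case of the LEMF estimator (7); the authors' full implementation is linked in the Code Availability section and sketched in Appendix F, so treat this block as an independent illustration rather than the reference code.

```python
# Sketch of the log-Euclidean operations (4)-(5) and the bifidelity LEMF estimator (7).
import numpy as np
from scipy.linalg import logm, expm

def le_add(A, B):        # logarithmic addition (4): Exp(Log A + Log B)
    return expm(logm(A) + logm(B))

def le_scale(lam, A):    # logarithmic scalar multiplication (4): equals A**lam
    return expm(lam * logm(A))

def d_le(A, B):          # log-Euclidean distance (5)
    return np.linalg.norm(logm(A) - logm(B), "fro")

def lemf_cov(y_hi, y_lo, alpha):
    """Bifidelity LEMF estimator (7); requires n0, n1 > d so the sample
    covariances are SPD (Proposition 3.1)."""
    n0 = y_hi.shape[0]
    scm = lambda Y: np.cov(Y, rowvar=False)   # note: np.cov normalizes by n - 1
    M = logm(scm(y_hi)) + alpha * (logm(scm(y_lo)) - logm(scm(y_lo[:n0])))
    M = 0.5 * (M + M.T)                       # symmetrize against round-off
    return expm(M)                            # symmetric positive definite by construction
```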
Variance reduction in (8) is achieved on $S^d$ via the standard Euclidean control variate framework and then propagated to $S^d_{++}$ via $\operatorname{Exp}(\cdot)$. Furthermore, (7) has the form of a log-Euclidean Fréchet average of $\hat{\Sigma}^{(0)}_{n_0}$ and $L$ small perturbations induced by the surrogate sample covariance matrices; see Appendix A for details. The LEMF estimator requires computing $2L + 1$ matrix logarithms and one matrix exponential. Note that obtaining the sample sets $\{y^{(\ell)}(\omega_i)\}_{i=1}^{n_\ell}$, $\ell = 0, \dots, L$, is typically much more computationally expensive than computing matrix logarithms and exponentials.

3.3. The MSE of the LEMF Estimator

For the following analysis we assume, for ease of exposition and without loss of generality, that $y^{(\ell)}$, $\ell = 0, \dots, L$, have zero mean. As introduced in Section 1, the surrogates $y^{(1)}, \dots, y^{(L)}$ are ordered by decreasing fidelity, $1 = \rho_0 > |\rho_1| > \cdots > |\rho_L| \geq \rho_{L+1} = 0$, where $\rho_\ell$ is a multivariate generalization of the Pearson correlation coefficient,

$$\rho_\ell = \frac{\operatorname{Tr}\left( \operatorname{Cov}\left[ y^{(0)} (y^{(0)})^\top, \, y^{(\ell)} (y^{(\ell)})^\top \right] \right)}{\sigma_0 \, \sigma_\ell}, \qquad (9)$$

with generalized variances defined as

$$\sigma_\ell^2 = \operatorname{Tr}\left( \operatorname{Cov}\left[ y^{(\ell)} (y^{(\ell)})^\top \right] \right), \qquad \ell = 0, \dots, L. \qquad (10)$$

The covariance of $y^{(\ell)}(y^{(\ell)})^\top$, $\ell = 0, \dots, L$, is a symmetric positive semidefinite linear operator from $S^d$ to $S^d$,

$$\operatorname{Cov}\left[ y^{(\ell)} (y^{(\ell)})^\top \right] = \mathbb{E}\left[ \left( y^{(\ell)} (y^{(\ell)})^\top - \Sigma^{(\ell)} \right) \otimes \left( y^{(\ell)} (y^{(\ell)})^\top - \Sigma^{(\ell)} \right) \right], \qquad (11)$$

where we define $\Sigma^{(\ell)} = \mathbb{E}[y^{(\ell)} (y^{(\ell)})^\top]$. Similarly, the cross-covariance between $y^{(\ell)}(y^{(\ell)})^\top$ and $y^{(m)}(y^{(m)})^\top$, $\ell, m \in \{0, \dots, L\}$, is

$$\operatorname{Cov}\left[ y^{(\ell)} (y^{(\ell)})^\top, \, y^{(m)} (y^{(m)})^\top \right] = \mathbb{E}\left[ \left( y^{(\ell)} (y^{(\ell)})^\top - \Sigma^{(\ell)} \right) \otimes \left( y^{(m)} (y^{(m)})^\top - \Sigma^{(m)} \right) \right]. \qquad (12)$$

In writing (11) and (12) we do not invoke any particular representation for linear operators on $S^d$. Rather, we only assume that $\otimes$ is an outer product for symmetric matrices compatible with the Frobenius inner product, that is, $\operatorname{Tr}(A \otimes B) = \langle A, B \rangle_F$ for $A, B \in S^d$.

The following proposition derives the MSE of the log-Euclidean estimator in the log-Euclidean metric (5).

Proposition 3.2. Assume that $\|\Sigma - I\|_F < h/4$ with $h < 1$ and $\|\Sigma - \Sigma^{(\ell)}\|_F < h/4$ for $\ell = 1, \dots, L$. Then, for a coefficient vector $\alpha \in \mathbb{R}^L$ and sufficiently large numbers of samples $\boldsymbol{n} \in \mathbb{N}^{L+1}$, the MSE in the log-Euclidean metric of the LEMF estimator defined in (7) is

$$\mathbb{E}\left[ d_{\mathrm{LE}}\left( \hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}}, \Sigma \right)^2 \right] = \mathbb{E}\left[ \left\| \operatorname{Log} \hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}} - \operatorname{Log} \Sigma \right\|_F^2 \right] = \frac{\sigma_0^2}{n_0} + \sum_{\ell=1}^{L} \left( \frac{1}{n_{\ell-1}} - \frac{1}{n_\ell} \right) \left( \alpha_\ell^2 \sigma_\ell^2 - 2 \alpha_\ell \rho_\ell \sigma_\ell \sigma_0 \right) + O(h^2). \qquad (13)$$

Note that the technical condition $\|\Sigma - I\|_F < h/4$ can always be established by re-scaling $\tilde{y}^{(\ell)} = C^{-1/2} y^{(\ell)}$ with a matrix $C \in S^d_{++}$ and using $C$ as the base of the matrix logarithm and exponential in the estimator (7); we consider $I$ as the base here for ease of exposition.

Below we relate the MSE of the Euclidean estimator $\hat{\Sigma}^{\mathrm{EMF}}_{\boldsymbol{n}}$ (2) in the Euclidean metric with the MSE of the log-Euclidean estimator $\hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}}$ (7) in the log-Euclidean metric:

Corollary 3.3. The MSE in the Euclidean metric of the Euclidean multi-fidelity estimator (2) with sample size vector $\boldsymbol{n}$ and coefficients $\alpha$ is

$$\mathbb{E}\left[ d_F\left( \hat{\Sigma}^{\mathrm{EMF}}_{\boldsymbol{n}}, \Sigma \right)^2 \right] = \mathbb{E}\left[ \left\| \hat{\Sigma}^{\mathrm{EMF}}_{\boldsymbol{n}} - \Sigma \right\|_F^2 \right] = \frac{\sigma_0^2}{n_0} + \sum_{\ell=1}^{L} \left( \frac{1}{n_{\ell-1}} - \frac{1}{n_\ell} \right) \left( \alpha_\ell^2 \sigma_\ell^2 - 2 \alpha_\ell \rho_\ell \sigma_\ell \sigma_0 \right). \qquad (14)$$

Corollary 3.3 shows that the MSE of the LEMF estimator in the log-Euclidean metric is equal to the MSE of the EMF estimator in the Euclidean metric up to first order. Note that it does not comment on a cross-comparison (e.g., the MSE of the LEMF estimator in the Euclidean metric and vice versa); we will explore such comparisons numerically in Section 4. A sketch of how the generalized variances (10) and correlations (9) can be estimated from pilot samples is given below.
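As a practical aside (not part of the original exposition), the quantities (9) and (10) can be estimated empirically from paired pilot samples using the identity $\operatorname{Tr}(\operatorname{Cov}[A, B]) = \mathbb{E}\,\langle A - \mathbb{E}A,\, B - \mathbb{E}B \rangle_F$ from Appendix C; the sketch below does this for a single surrogate level and assumes the pilot samples are paired, i.e., evaluated on shared realizations.

```python
# Empirical estimates of the generalized variances (10) and correlation (9)
# from N paired pilot samples of y^(0) and y^(1).
import numpy as np

def pilot_statistics(Y0, Y1):
    """Y0, Y1: (N, d) pilot samples of y^(0), y^(1) evaluated on the same omegas."""
    M0 = np.einsum("ni,nj->nij", Y0, Y0)     # per-sample outer products y y^T
    M1 = np.einsum("ni,nj->nij", Y1, Y1)
    M0c = M0 - M0.mean(axis=0)               # center the matrix-valued samples
    M1c = M1 - M1.mean(axis=0)
    N = Y0.shape[0]
    var0 = np.sum(M0c * M0c) / (N - 1)       # sigma_0^2 = Tr(Cov[y0 y0^T])
    var1 = np.sum(M1c * M1c) / (N - 1)       # sigma_1^2 = Tr(Cov[y1 y1^T])
    cross = np.sum(M0c * M1c) / (N - 1)      # Tr(Cov[y0 y0^T, y1 y1^T])
    rho1 = cross / np.sqrt(var0 * var1)      # generalized correlation (9)
    return np.sqrt(var0), np.sqrt(var1), rho1
```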
3.4. Optimal Sample Allocation for the LEMF

Proposition 3.2 and Corollary 3.3 show the dependence of the MSE in the log-Euclidean and Euclidean metrics of the LEMF and EMF estimators on the sample allocations $\boldsymbol{n}$ and coefficients $\alpha$. We now derive $\boldsymbol{n}^*$ and $\alpha^*$ that minimize the respective MSEs up to first order and so achieve an optimal sample allocation across the hierarchy of data sources. We start with $\alpha^*$: because the first-order approximation of the MSE in (13) and the MSE in (14) are both quadratic in the coefficients $\alpha_1, \dots, \alpha_L$, we can directly minimize over $\alpha_1, \dots, \alpha_L$ and obtain

$$\alpha^* = [\alpha_1^*, \dots, \alpha_L^*]^\top, \qquad \alpha_\ell^* = \rho_\ell \, \frac{\sigma_0}{\sigma_\ell}, \qquad (15)$$

which are independent of the sample sizes. In preparation for finding $\boldsymbol{n}^*$, we define the costs of the multi-fidelity estimators as

$$c(\boldsymbol{n}) = \sum_{\ell=0}^{L} n_\ell \, c_\ell. \qquad (16)$$

Following Peherstorfer et al. (2016), Proposition 3.4 derives the optimal sample allocation $\boldsymbol{n}^*$ to obtain an estimator with first-order minimal MSE and costs $c(\boldsymbol{n}^*) \leq B$ for a fixed computational budget $B > 0$. Note that in writing (16), we assume that the costs of generating samples of the random variables $y^{(0)}, \dots, y^{(L)}$ are much higher than the cost of constructing a multi-fidelity covariance estimate, and in particular eclipse the cost of computing the matrix exponential and the $2L + 1$ matrix logarithms required by the LEMF estimator (7). This situation is often the case, e.g., in science and engineering applications where generating samples may correspond to evaluating computationally intensive physics-based models. One could explicitly account for the cost of constructing the LEMF estimator by inserting a modified budget of $\hat{B} = B - c_{\log/\exp}$ into Proposition 3.4, where $c_{\log/\exp}$ is the cost of the matrix logarithms and exponential, and subsequently obtain a result analogous to Corollary 3.5 with a more complicated condition for when the LEMF estimator outperforms an equivalent-cost high-fidelity estimator.

Proposition 3.4. Let $B > 0$ denote the computational budget and let Proposition 3.2 apply. The first-order optimal sample allocation $\boldsymbol{n}^*$ for the LEMF estimator (7) that solves

$$\min_{\boldsymbol{n} \in \mathbb{R}^{L+1}} \; \mathbb{E}\left[ d_{\mathrm{LE}}\left( \hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}}, \Sigma \right)^2 \right] \quad \text{such that} \quad c(\boldsymbol{n}) \leq B \qquad (17)$$

is, with the conventions $\rho_0 = 1$ and $\rho_{L+1} = 0$,

$$n_\ell^* = \frac{B}{\displaystyle \sum_{i=0}^{L} c_i \sqrt{\frac{c_0 \left( \rho_i^2 - \rho_{i+1}^2 \right)}{c_i \left( 1 - \rho_1^2 \right)}}} \; \sqrt{\frac{c_0 \left( \rho_\ell^2 - \rho_{\ell+1}^2 \right)}{c_\ell \left( 1 - \rho_1^2 \right)}}, \qquad \ell = 0, \dots, L. \qquad (18)$$

Because the MSE of the Euclidean estimator (in $d_F(\cdot, \cdot)$) is equal to the first-order approximation of the MSE of the log-Euclidean estimator (in $d_{\mathrm{LE}}(\cdot, \cdot)$), we directly obtain that (18) is the optimal sample allocation for the Euclidean estimator as well. In practice, we round either up or down and so use either $\lceil \boldsymbol{n}^* \rceil$ or $\lfloor \boldsymbol{n}^* \rfloor$. If the correlation coefficients $\rho_\ell$ and costs $c_\ell$, $\ell = 0, \dots, L$, are known, one can determine the optimal $\alpha^*$ and $\boldsymbol{n}^*$ for a fixed budget $B$; or, alternatively, determine $\alpha^*$ and $\boldsymbol{n}^*$ so that the MSE of the LEMF estimator is below a threshold $\epsilon > 0$. In situations when $\rho_\ell$ and $c_\ell$ are unknown, it is common to estimate them in pilot studies (Cliffe et al., 2011; Peherstorfer et al., 2016) and even to reuse the pilot samples in the actual estimator (Konrad et al., 2022). A sketch of the computation of $\alpha^*$ and $\boldsymbol{n}^*$ is given below.
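The following sketch (mine, following the optimal weights (15) and allocation (18)) computes $\alpha^*$ and a real-valued $\boldsymbol{n}^*$ from given correlations, generalized standard deviations, costs, and budget; the function name and interface are illustrative assumptions, not the authors' API.

```python
# First-order optimal control-variate weights (15) and sample allocation (18).
import numpy as np

def optimal_weights_and_allocation(rho, sigma, c, budget):
    """
    rho   : [1, rho_1, ..., rho_L] generalized correlations (rho_0 = 1)
    sigma : [sigma_0, ..., sigma_L] generalized standard deviations
    c     : [c_0, ..., c_L] per-sample costs
    """
    rho = np.asarray(rho, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    c = np.asarray(c, dtype=float)
    alpha = rho[1:] * sigma[0] / sigma[1:]              # equation (15)
    rho_ext = np.append(rho, 0.0)                       # convention rho_{L+1} = 0
    r = np.sqrt(c[0] * (rho_ext[:-1] ** 2 - rho_ext[1:] ** 2)
                / (c * (1.0 - rho[1] ** 2)))            # per-level ratios from (18)
    n = budget * r / np.sum(c * r)                      # real-valued; round in practice
    return alpha, n

# Hypothetical usage with the correlations and costs of the motivating example
# (Section 4.1); `sigma` would come from the closed forms in Appendix G or a pilot study:
# alpha_opt, n_opt = optimal_weights_and_allocation(
#     rho=[1.0, 0.93, 0.74, 0.58], sigma=sigma, c=[1.0, 1e-2, 1e-3, 1e-4], budget=15)
```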
3.5. Discussion of the LEMF Estimator

Interpretation. Proposition 3.4 shows that when the correlation coefficients (9) are close to 1, the LEMF estimator (7) allocates more of the budget to the cheaper surrogate samples. In the extreme case where $\rho_\ell = 0$ for $\ell = 1, \dots, L$, the entire budget is allocated to obtaining high-fidelity samples and one recovers the single-fidelity estimator (1) with $n_0 = B / c_0$. The following corollary shows that the correlation coefficients $\rho_\ell$ and costs $c_\ell$ dictate whether the proposed LEMF estimator leads to lower MSE than the equivalent-cost single (high)-fidelity estimator.

Corollary 3.5. Let Proposition 3.4 apply. Then, the first-order term of the MSE of the LEMF estimator with $\boldsymbol{n}^*$ is

$$\mathbb{E}\left[ d_{\mathrm{LE}}\left( \hat{\Sigma}^{\mathrm{LEMF}}_{\boldsymbol{n}^*}, \Sigma \right)^2 \right] = \frac{\sigma_0^2}{B} \left( \sum_{\ell=0}^{L} \sqrt{c_\ell \left( \rho_\ell^2 - \rho_{\ell+1}^2 \right)} \right)^2, \qquad (19)$$

which will be smaller than the first-order log-Euclidean MSE of the cost-equivalent single-fidelity estimator (1) that uses $n_0 = B / c_0$ high-fidelity samples if and only if

$$\sum_{\ell=0}^{L} \sqrt{\frac{c_\ell}{c_0} \left( \rho_\ell^2 - \rho_{\ell+1}^2 \right)} < 1. \qquad (20)$$

Equation (20) demonstrates that, for fixed $c_0$, the benefit of using the LEMF estimator (7) over the equivalent-cost high-fidelity estimator (1) is a function of both the surrogate model correlations with the high-fidelity model, $\rho_1, \dots, \rho_L$, and the surrogate model costs, $c_1, \dots, c_L$. It is particularly instructive to consider the bifidelity case with only one surrogate model ($L = 1$). In this setting, condition (20) simplifies to

$$\sqrt{1 - \rho_1^2} + \sqrt{\frac{c_1}{c_0} \, \rho_1^2} < 1 \quad \Longleftrightarrow \quad \frac{2 \sqrt{1 - \rho_1^2}}{|\rho_1|} < \frac{c_0 - c_1}{\sqrt{c_0 c_1}}. \qquad (21)$$

If the normalized reduction in cost $\frac{c_0 - c_1}{\sqrt{c_0 c_1}}$ achieved by the low-fidelity model is large, then the minimum required generalized correlation $\rho_1$ such that condition (21) is satisfied decreases. Conversely, in the limit $\rho_1 \to 1$ we only require $c_1 < c_0$ in order to see a benefit from using the LEMF estimator. In Figure 1 we visually demonstrate this relationship by plotting the region in $(\rho_1, c_1/c_0)$ space for which (21) holds and the LEMF estimator has a lower first-order MSE than the equivalent-cost high-fidelity estimator.

Figure 1 (region plot over the correlation $\rho_1 \in [0, 1]$ and the cost ratio $c_1/c_0$). The shaded region corresponds to pairs $(\rho_1, c_1/c_0)$ of correlation coefficient and cost ratio for which (21) holds and thus the LEMF estimator has lower first-order error than the equivalent-cost single-fidelity estimator. Note that a lower correlation $\rho_1$ requires a smaller cost ratio $c_1/c_0$ for the LEMF estimator to achieve lower error than the single-fidelity estimator.

Truncated multi-fidelity estimator. In addition to the EMF estimator and equivalent-cost single-fidelity estimators, we will compare the proposed LEMF estimator to the EMF estimator with positive definiteness enforced via eigenvalue truncation. Following Hoel et al. (2016), we define the truncated multi-fidelity covariance estimator as

$$\hat{\Sigma}^{+\delta}_{\boldsymbol{n}} = T_\delta\left( \hat{\Sigma}^{\mathrm{EMF}}_{\boldsymbol{n}} \right), \qquad (22)$$

where $\delta > 0$ is a small threshold and the truncation operator $T_\delta$ acts on symmetric matrices via $T_\delta(A) = Q (\Lambda \vee \delta) Q^\top$, with $A = Q \Lambda Q^\top$ denoting the eigendecomposition of $A$ and $\Lambda \vee \delta$ denoting the diagonal matrix whose diagonal entries are $\max(\lambda_{ii}, \delta)$; see the Introduction for a discussion of the drawbacks of the truncated estimator.

4. Numerical Examples

4.1. Motivating Example with Gaussians

Consider the problem of estimating the covariance of a Gaussian random variable $y \sim N(0, \Sigma)$, with $\Sigma \in S^4_{++}$. For this motivating example, we obtain low-fidelity samples $y^{(\ell)} = y + \epsilon^{(\ell)}$ by corrupting a high-fidelity sample $y$ with independent noise at different noise levels, $\epsilon^{(\ell)} \sim N(0, \Gamma^{(\ell)})$ for $\ell = 1, 2, 3$. The matrices $\Gamma^{(1)}, \dots, \Gamma^{(3)}$ are chosen such that the noise levels lead to generalized correlations of $\rho_1 \approx 0.93$, $\rho_2 \approx 0.74$, and $\rho_3 \approx 0.58$; see Appendix G. We impose unit cost for drawing a high-fidelity sample $y$ and costs $c_1 = 10^{-2}$, $c_2 = 10^{-3}$, and $c_3 = 10^{-4}$ for drawing the surrogate samples. Because the samples are Gaussian, their empirical second moments follow a Wishart distribution and the generalized variances (10) and correlations (9) can be computed exactly; see Appendix G and the sketch below. Table 1 shows the MSE of the multi-fidelity estimators compared with that of single-fidelity covariance estimators using high-fidelity or surrogate samples alone with the same budget $B = 15$.
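The snippet below is a short illustration (mine, not the authors' code) of this data-generating setup: it uses the specific $\Sigma$ and noise covariances $\Gamma^{(\ell)}$ reported in Appendix G together with the closed-form expressions derived there to recover the generalized correlations quoted above.

```python
# Closed-form generalized variances and correlations for the Gaussian motivating
# example, using the Sigma and Gamma^(l) reported in Appendix G.
import numpy as np

Sigma = np.array([[2.52, 0.17, 0.67, 0.98],
                  [0.17, 0.64, 0.29, 0.35],
                  [0.67, 0.29, 0.49, 0.52],
                  [0.98, 0.35, 0.52, 1.31]])             # realization from Appendix G
Gammas = [0.1 * np.eye(4), 0.5 * np.eye(4), np.eye(4)]   # noise covariances, l = 1, 2, 3

def gen_var(S):
    """sigma^2 = Tr(Cov[y y^T]) = Tr(S^2) + Tr(S)^2 for y ~ N(0, S); see Appendix G."""
    return np.trace(S @ S) + np.trace(S) ** 2

sigma0 = np.sqrt(gen_var(Sigma))
for ell, Gamma in enumerate(Gammas, start=1):
    sigma_ell = np.sqrt(gen_var(Sigma + Gamma))
    rho_ell = sigma0 / sigma_ell          # cross term Tr(Cov[y y^T, y_l y_l^T]) = sigma_0^2
    print(f"rho_{ell} = {rho_ell:.2f}")   # approximately 0.93, 0.74, 0.58
```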
The LEMF estimator has about $2\times$ lower MSE than the other estimators in the log-Euclidean and affine-invariant metrics, while being competitive in the Euclidean metric.

4.2. Uncertainty Quantification of Steady-State Heat Flow

Physics model. The steady-state heat equation over the domain $\mathcal{X} = (0, 1)$ with a variable heat conductivity is

$$\operatorname{div}\left( \exp(\kappa(x; \theta)) \, \nabla u(x; \theta) \right) = f(x), \qquad x \in \mathcal{X}, \qquad (23)$$

where we impose Dirichlet boundary conditions $u(0; \theta) = 0$, $u(1; \theta) = 1$ and a constant source $f(x) = 1$. We model the log heat conductivity $\kappa$ as a degree-4 sine-polynomial $\kappa(x; \theta) = \sum_{k=1}^{4} \theta_k \sin(2 \pi k x)$, whose coefficients are given by the components of $\theta = [\theta_1, \dots, \theta_4]^\top \in \mathbb{R}^4$. We obtain a high-fidelity observation model $G^{(0)} : \mathbb{R}^4 \to \mathbb{R}^{10}$ by approximating the solution $u(\cdot\,; \theta)$ via a second-order finite difference scheme with 65,536 grid points and observing the resulting temperature at $d = 10$ equally spaced points $x_i = i/11$, $i \in \{1, \dots, 10\}$, in the interior of the domain $\mathcal{X}$ (see Figure 8 in Appendix H). The surrogate model $G^{(1)}$ is obtained by approximating the solution $u(\cdot\,; \theta)$ with only 1,024 grid points and performing the same measurement process. We identify the costs of the high-fidelity and surrogate models with the number of grid points involved, $c_0 = 2^{16}$ and $c_1 = 2^{10}$.

Covariance estimation. We model $\theta \sim N(0, I_{4 \times 4})$ and define high-fidelity samples as $y^{(0)} = G^{(0)}(\theta)$ and surrogate samples as $y^{(1)} = G^{(1)}(\theta)$. Our aim is to estimate the covariance of $y^{(0)}$, which is a $10 \times 10$ matrix. We first estimate the generalized variances and correlations $\sigma_0, \sigma_1, \rho_1$ in a pilot study to compute the optimal multi-fidelity sample allocations $\boldsymbol{n}^* = [n^{(0)*}, n^{(1)*}]$ and $\alpha^*$ for seven values of the budget $B$ in the interval $[10^6, 2 \times 10^9]$; see Appendix H. We then obtain $n^{(1)*} \geq n^{(0)*}$ realizations $\theta_1, \dots, \theta_{n^{(1)*}}$ of the input random variable $\theta$ and compute the $n^{(0)*}$ high-fidelity samples $\{y^{(0)}(\theta_i)\}_{i=1}^{n^{(0)*}}$ and $n^{(1)*}$ surrogate samples $\{y^{(1)}(\theta_i)\}_{i=1}^{n^{(1)*}}$. Note that the first $n^{(0)*}$ surrogate samples use the same realizations of $\theta$ as the high-fidelity samples, creating statistical coupling, and thus correlation, between the high-fidelity and surrogate samples. (A brief sketch of the two-resolution observation maps is given below, after Table 1.)

Results. Figure 2 compares the MSE in the log-Euclidean (5), affine-invariant (6), and Euclidean metrics of the single-fidelity and multi-fidelity estimators over 100 trials. Using the surrogate samples alone leads to estimates with a large bias with respect to the true (high-fidelity) covariance. For example, the bias of the surrogate-only estimator can be seen in Figure 3c, which displays the convergence of each estimate of $\Sigma_{2,5}$ to the truth; see also Appendix H.

Table 1. Motivating example: The LEMF estimator leverages cheap surrogate samples to reduce the MSE compared to cost-equivalent single-fidelity estimators and other multi-fidelity estimators that rely on post-processing via truncation.

| MSE | high-fidelity only | surrogate only | Euclidean MF | truncated MF ($\delta = 10^{-16}$) | LEMF (ours) |
|---|---|---|---|---|---|
| log-Euclidean | 1.72 | 5.82 | NaN | 66.88 | 0.83 |
| affine-invariant | 4.99 | 5.82 | NaN | 125.11 | 2.69 |
| Euclidean | 3.00 | 4.00 | 0.51 | 0.51 | 0.71 |
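To make the two-fidelity data-generation pipeline concrete, here is a rough, self-contained sketch of observation maps in the spirit of $G^{(0)}$ and $G^{(1)}$; the conservative finite-difference discretization, the sign convention $-(\exp(\kappa)\,u')' = f$, and all other implementation details are my assumptions rather than the authors' solver.

```python
# Sketch of a two-resolution observation map for the heat-flow example (Section 4.2).
# Discretization details and the sign convention are assumptions.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def solve_and_observe(theta, n_grid):
    """Solve -(exp(kappa) u')' = 1 on (0,1) with u(0)=0, u(1)=1; observe at x_i = i/11."""
    x = np.linspace(0.0, 1.0, n_grid + 1)
    kappa = sum(theta[k] * np.sin(2.0 * np.pi * (k + 1) * x) for k in range(4))
    a = np.exp(kappa)                         # heat conductivity at the grid nodes
    a_half = 0.5 * (a[:-1] + a[1:])           # conductivity at the cell interfaces
    h = 1.0 / n_grid
    main = (a_half[:-1] + a_half[1:]) / h**2  # tridiagonal system for interior unknowns
    off = -a_half[1:-1] / h**2
    A = diags([main, off, off], [0, 1, -1], format="csc")
    b = np.ones(n_grid - 1)                   # constant source f = 1
    b[-1] += a_half[-1] / h**2                # lift the Dirichlet condition u(1) = 1
    u = np.concatenate(([0.0], spsolve(A, b), [1.0]))
    return np.interp(np.arange(1, 11) / 11.0, x, u)   # d = 10 interior observations

rng = np.random.default_rng(3)
theta = rng.standard_normal(4)                # theta ~ N(0, I_4)
y_hi = solve_and_observe(theta, 2**16)        # high fidelity, cost c0 = 2^16 grid points
y_lo = solve_and_observe(theta, 2**10)        # surrogate, cost c1 = 2^10 grid points
```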
Figure 2 (three panels of MSE versus cost/budget $B$: (a) log-Euclidean metric, (b) affine-invariant metric, (c) Euclidean metric). Heat flow: In the log-Euclidean metric, our LEMF estimator achieves the MSE tolerance below $10^{-1}$ with a $30\times$ speedup compared to the single-fidelity estimator that uses high-fidelity samples alone. Only using surrogate samples leads to a large bias that prevents reaching an MSE below $10^{-1}$. The Euclidean multi-fidelity estimator is indefinite in more than 10% of the 100 trials used here and therefore does not provide a valid covariance matrix in this example. (A plot with the min/max over 100 trials is shown in Appendix H.)

Figure 3 (three panels: (a) speedup to reach MSE below $10^{-1}$, (b) smallest eigenvalue versus cost/budget $B$, (c) convergence of estimators versus cost/budget $B$). Heat flow: Plot (a): Our LEMF estimator achieves a $32\times$ speedup. Plot (b): Our LEMF estimator remains positive definite even for small sample sizes, whereas the Euclidean estimator becomes indefinite. Plot (c): Entries of our LEMF estimates converge quickly to the entries of the true covariance, whereas surrogate-only estimators incur a large bias and high-fidelity-only estimators incur high costs.

The EMF estimator is unbiased and achieves competitively low MSE in the Euclidean metric but becomes indefinite in more than 10% of trials (see Appendix H). For this reason its MSE in the log-Euclidean and affine-invariant metrics is infinite/nonexistent (see Figure 3b), as the log-Euclidean and affine-invariant distances between an element of $S^d_{++}$ and any non-positive-definite matrix are $\infty$. When we fix the EMF estimator via eigenvalue truncation (using a threshold of $\delta = 10^{-16}$ in (22)), obtaining the positive-definite truncated EMF estimator, we introduce the possibility of large MSE in the log-Euclidean and affine-invariant metrics because we have modified the spectrum; this effect is clearly visible for small budgets in Figure 2. By contrast, the proposed log-Euclidean multi-fidelity estimator uses surrogate samples in conjunction with high-fidelity samples and inherently retains positive definiteness, thus achieving about one order of magnitude speedup across metrics as compared to the estimator that uses the high-fidelity samples alone; see Figure 3a. The MSE of the LEMF estimator decays as $O(1/B)$, in concert with the other Monte Carlo estimators that use high-fidelity samples.

Figure 4. Metric learning: An example of the final-time buoyancy for class $i = 1$. Our measurements consist of buoyancy values at nine spatial locations in the domain, indicated by black dots in the plot. We use the observations to estimate a metric which distinguishes between observations corresponding to $\theta$ sampled from class $i = 0$ and from class $i = 1$.

4.3. Metric Learning: Surface Quasi-Geostrophy

We now apply our multi-fidelity covariance estimators as an intermediate step in the metric learning approach of Zadeh et al. (2016) to enable easier classification of fluid regimes from buoyancy measurements given by a surface quasi-geostrophic model (Held et al., 1985).

Physics model. In the model presented in Held et al. (1985), the evolution of the buoyancy $b(\mathbf{x}, t)$ over a periodic spatial domain $\mathcal{X} = [-\pi, \pi] \times [-\pi, \pi]$ is governed by

$$\partial_t b(\mathbf{x}, t; \theta) + J(\psi, b) = 0, \qquad (24)$$

where $\psi$ is the streamfunction and $\mathbf{x} = (x, y)$. The initial buoyancy is

$$b_0(\mathbf{x}; \theta) = \frac{1}{(2\pi / |\theta_5|)^2} \exp\!\left( -\left( x^2 + e^{2 \theta_1} y^2 \right) \right),$$

the contours of which form ellipses parametrized by $\theta = [\theta_1, \dots, \theta_5]^\top \in \mathbb{R}^5$.
The parameter $\theta_1$ is the log-aspect ratio and controls the origination of vortices in the buoyancy over time, and $\theta_5$ controls the amplitude of $b_0$. The parameters $\theta_2$, $\theta_3$, and $\theta_4$ determine additional aspects of the dynamics (24); see Appendix I for details. We sample the parameters $\theta \in \mathbb{R}^5$ from a two-component Gaussian mixture whose means differ only in the coordinate corresponding to the log-aspect ratio; see Appendix I. We use $i \in \{0, 1\}$ to denote the mixture component from which a given realization of $\theta$ is sampled.

The high-fidelity parameter-to-observable map $G^{(0)}$ maps $\theta$ onto $d = 9$ observations of the numerical solution of equation (24) in the spatial domain; see Figure 4 as well as Figure 12 in Appendix I. For $G^{(0)}$, the numerical solution of (24) is computed up to time $T = 24$ using finite differences with 256 grid points along each coordinate dimension and with a time step of $\Delta t = 0.005$. (We use the Python package pyqg to compute our solutions; see http://pyqg.readthedocs.org/.) High-fidelity samples $y^{(0)}$ are thus given by $y^{(0)} = G^{(0)}(\theta)$ and have covariance $\Sigma \in S^9_{++}$. The surrogate parameter-to-observable map $G^{(1)}$ and random variable $y^{(1)}$ are defined in the same way except that in solving Equation (24) we only use 64 grid points along each coordinate dimension. Evaluating $G^{(1)}$ is 16 times faster than evaluating $G^{(0)}$, with $c_0 = 256^2$ and $c_1 = 64^2$.

Figure 5 (bar chart of the mean relative error of distances for HF only, surrogate only, Euclidean MF, truncated MF, and LEMF (ours); the HF-only bar is the benchmark and the Euclidean MF bar is marked indefinite). Metric learning: When using the learned metric to compute distances between observations of the surface quasi-geostrophic model, our LEMF estimator achieves a more than 20% lower error in the distances than using high-fidelity samples alone. The Euclidean estimator provides an indefinite metric in this example.

Metric learning. We now apply our covariance estimator (7) to learn a metric on the data manifold. We use a slight modification of the geometric mean metric learning approach introduced in Zadeh et al. (2016), which learns an SPD matrix $A$ to define the metric

$$d_A(y_1, y_2) = \sqrt{(y_1 - y_2)^\top A \, (y_1 - y_2)}.$$

The matrix $A$ is an interpolation in the affine-invariant geometry for $S^d_{++}$ between the inverse of a similarity matrix $S = \operatorname{Cov}[y \mid i = 0] + \operatorname{Cov}[y \mid i = 1]$ and a dissimilarity matrix $D = S + \mu \mu^\top$, where $\mu = \mathbb{E}[y \mid i = 0] - \mathbb{E}[y \mid i = 1]$. Following Zadeh et al. (2016), this interpolation is given by

$$A = S^{-1/2} \left( S^{1/2} D S^{1/2} \right)^t S^{-1/2}, \qquad t \in [0, 1].$$

We apply our multi-fidelity techniques to estimating the two covariance matrices that comprise $S$, namely $\operatorname{Cov}[y \mid i = 0]$ and $\operatorname{Cov}[y \mid i = 1]$. In constructing $A$ we set $t = 0.1$, motivated by (Zadeh et al., 2016, Figure 3). (A compact sketch of this construction is given below, after Figure 6.)

Results. Prior to applying our multi-fidelity estimators in this setting, we take a total of 24,000 pilot samples between $(y^{(0)}, y^{(1)}) \mid i = 0$ and $(y^{(0)}, y^{(1)}) \mid i = 1$ in order to estimate the generalized variances $\sigma_0^2$ and $\sigma_1^2$ and the correlation $\rho_1$ in both classes.

Figure 6 (bar charts of the MSE of the learned metric matrices in three panels: (a) Euclidean metric, (b) log-Euclidean metric, (c) affine-invariant metric; the Euclidean MF bars are marked indefinite). Metric learning: Our LEMF estimator achieves the lowest MSE compared to the cost-equivalent benchmark estimator that uses high-fidelity samples alone. The Euclidean multi-fidelity estimator becomes indefinite across the 100 trials and therefore fails to learn a valid metric.
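The following is a compact sketch of the interpolation defining $A$ described above, assuming class-conditional covariance estimates (for example, LEMF estimates) and class means are already available; the helper `spd_power` and the function names are my own illustrative choices, not the authors' implementation.

```python
# Geometric mean metric learning step: build A from class covariances and means.
import numpy as np

def spd_power(S, p):
    """Real p-th power of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**p) @ V.T

def learned_metric(cov0, cov1, mu0, mu1, t=0.1):
    """A = S^{-1/2} (S^{1/2} D S^{1/2})^t S^{-1/2}, interpolating S^{-1} and D."""
    S = cov0 + cov1                      # similarity matrix
    mu = mu0 - mu1
    D = S + np.outer(mu, mu)             # dissimilarity matrix
    S_half, S_inv_half = spd_power(S, 0.5), spd_power(S, -0.5)
    return S_inv_half @ spd_power(S_half @ D @ S_half, t) @ S_inv_half

def d_A(A, y1, y2):
    """Learned distance d_A(y1, y2)."""
    diff = y1 - y2
    return float(np.sqrt(diff @ A @ diff))
```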
We additionally use these pilot samples to obtain a reference estimate of $A$, which we denote $A_0$. We estimate $\operatorname{Cov}[y^{(0)} \mid i = 0]$ and $\operatorname{Cov}[y^{(0)} \mid i = 1]$ as follows: we construct multi-fidelity estimators from a combination of 15 high-fidelity samples and an additional number $n_{\mathrm{lo},i}$ of low-fidelity samples computed according to the optimal ratio in Proposition 3.4. We correspondingly construct equivalent-cost single-fidelity estimators with budgets $B_i = 15\, c_{\mathrm{hi}} + n_{\mathrm{lo},i}\, c_{\mathrm{lo}}$, $i \in \{0, 1\}$, spent entirely on high- or low-fidelity samples. We estimate $\operatorname{Cov}[y \mid i = 0]$ and $\operatorname{Cov}[y \mid i = 1]$ using the EMF, LEMF, and truncated EMF ($\delta = 10^{-16}$) estimators and the high-fidelity- and low-fidelity-only estimators with the sample allocations described above. The two covariance estimates are combined to obtain estimates of the metric matrix $A$.

We first study the performance of the learned metrics relative to the reference metric. Let $\{y^{(0)}_j\}_{j=1}^{5000} \subset \mathbb{R}^9$ be a test set of 5,000 points and consider the mean relative error of distances $\mathrm{MRE}(A)$ with respect to the reference metric $A_0$; see Appendix I for a definition. Figure 5 shows the MRE for each estimator averaged over 50 independent trials. The LEMF estimator outperforms the single-fidelity estimator and the truncated MF estimator (22) by more than 20%. The Euclidean multi-fidelity estimator gives indefinite estimates of $A$, which cannot constitute valid metrics.

We further compare the MSEs of the learned metric matrices relative to the reference $A_0$. Figure 6 shows that the LEMF estimator (7) yields a closer approximation of the reference metric; in particular, the LEMF estimator gives a more accurate estimate of $A$ than the high-fidelity-only estimator, while the other estimators give less accurate estimates. The surrogate-only estimator is limited by its large bias. The EMF estimator (2) is again indefinite in more than 10% of trials.

5. Conclusions

We showed that formulating multi-fidelity estimation in a non-Euclidean geometry can be beneficial for enforcing structure: the proposed LEMF estimator leverages the log-Euclidean geometry for SPD matrices to guarantee that the resulting covariance matrix estimates are positive definite. This property is in contrast to state-of-the-art multi-fidelity estimators that can lose definiteness, especially in the small-sample regime, which is typical in science and engineering applications in which drawing samples is expensive. Our LEMF estimator preserves definiteness and leverages surrogates to achieve speedups of more than one order of magnitude in our experiments. The experiments further show that preserving definiteness is key for enabling tasks such as metric learning and classification, for which indefinite Euclidean multi-fidelity estimators are invalid. Future work includes combining the multi-fidelity approach with, e.g., shrinkage and other regularization schemes for high-dimensional covariance estimation.

Acknowledgements

AM and YM acknowledge support from the Office of Naval Research, SIMDA (Sea Ice Modeling and Data Assimilation) MURI, award number N00014-20-1-2595 (Dr. Reza Malek-Madani and Dr. Scott Harper). AM additionally acknowledges support from the NSF Graduate Research Fellowship under Grant No. 1745302.
TA and BP acknowledge support from the AFOSR under Award Number FA9550-211-0222 (Dr. Fariba Fahroo). Code Availability Python implementations of the LEMF, EMF, and truncated EMF estimators are available at https://github.com/amaurais/LEMF. Multi-Fidelity Covariance Estimation Alsup, T., Venturi, L., and Peherstorfer, B. Multilevel Stein variational gradient descent with applications to bayesian inverse problems. In Bruna, J., Hesthaven, J., and Zdeborova, L. (eds.), Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, volume 145 of Proceedings of Machine Learning Research, pp. 93 117. PMLR, 16 19 Aug 2022. Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. Log Euclidean metrics for fast and simple calculus on diffusion tensors. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, 56(2):411 421, 2006. Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM journal on matrix analysis and applications, 29(1):328 347, 2007. Bergemann, K. and Reich, S. A localization technique for ensemble Kalman filters. Quarterly Journal of the Royal Meteorological Society, 136(648):701 707, 2010. Bhatia, R. Positive Definite Matrices. Princeton University Press, 2007. Bickel, P. J. and Levina, E. Regularized estimation of large covariance matrices. The Annals of Statistics, 36(1):199 227, 2008. doi: 10.1214/009053607000000758. Bierig, C. and Chernov, A. Convergence analysis of multilevel Monte Carlo variance estimators and application for random obstacle problems. Numerische Mathematik, 130 (4):579 613, Aug 2015. Cai, D. and Adams, R. P. Multi-fidelity Monte Carlo: a pseudo-marginal approach. ar Xiv, 2210.01534, 2022. Capet, X., Klein, P., Hua, B. L., Lapeyre, G., and Mc Williams, J. C. Surface kinetic energy transfer in surface quasi-geostrophic flows. Journal of Fluid Mechanics, 604:165 174, 2008. Chernov, A., Hoel, H., Law, K. J. H., Nobile, F., and Tempone, R. Multilevel ensemble Kalman filtering for spatiotemporal processes. Numerische Mathematik, 147(1): 71 125, Jan 2021. Cliffe, K. A., Giles, M. B., Scheichl, R., and Teckentrup, A. L. Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Computing and Visualization in Science, 14(1):3 15, 2011. Cressie, N. Statistics for spatial data. John Wiley & Sons, 2015. Donoho, D., Gavish, M., and Johnstone, I. Optimal shrinkage of eigenvalues in the spiked covariance model. The Annals of Statistics, 46(4):1742 1778, 2018. doi: 10.1214/17-AOS1601. Fan, J., Liao, Y., and Mincheva, M. High-dimensional covariance matrix estimation in approximate factor models. The Annals of Statistics, 39(6):3320 3356, 2011. Giles, M. B. Multilevel Monte Carlo path simulation. Operations Research, 56(3):607 617, 2008. Gundersen, G. W., Cai, D., Zhou, C., Engelhardt, B. E., and Adams, R. P. Active multi-fidelity Bayesian online changepoint detection. In de Campos, C. and Maathuis, M. H. (eds.), Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, volume 161 of Proceedings of Machine Learning Research, pp. 1916 1926. PMLR, 27 30 Jul 2021. Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del R ıo, J. 
F., Wiebe, M., Peterson, P., G erard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant, T. E. Array programming with Num Py. Nature, 585(7825):357 362, September 2020. doi: 10. 1038/s41586-020-2649-2. Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Springer, 2001. Held, I., Pierrehumbert, R., Garner, S., and Swanson, K. Surface quasi-geostrophic dynamics. Journal of Fluid Mechanics, 282:1 20, 1985. Hoel, H., Law, K. J., and Tempone, R. Multilevel ensemble Kalman filtering. SIAM Journal on Numerical Analysis, 54(3):1813 1839, 2016. Huang, Z., Wang, R., Shan, S., Li, X., and Chen, X. Log Euclidean metric learning on symmetric positive definite manifold with application to image set classification. In Bach, F. and Blei, D. (eds.), Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pp. 720 729, Lille, France, 07 09 Jul 2015. PMLR. Konrad, J., Farcas , I.-G., Peherstorfer, B., Di Siena, A., Jenko, F., Neckel, T., and Bungartz, H.-J. Data-driven low-fidelity models for multi-fidelity Monte Carlo sampling in plasma micro-turbulence analysis. Journal of Computational Physics, 451:110898, 2022. Law, K., Stuart, A., and Zygalakis, K. Data Assimilation. Springer, 2015. Multi-Fidelity Covariance Estimation Ledoit, O. and Wolf, M. Nonlinear shrinkage estimation of large-dimensional covariance matrices. The Annals of Statistics, 40(2):1024 1060, 2012. doi: 10.1214/ 12-AOS989. Li, S., Xing, W., Kirby, R., and Zhe, S. Multi-fidelity Bayesian optimization via deep neural networks. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 8521 8531. Curran Associates, Inc., 2020. Lin, Z. Riemannian geometry of symmetric positive definite matrices via Cholesky decomposition. SIAM Journal on Matrix Analysis and Applications, 40(4):1353 1370, 2019. Loh, P.-L. and B uhlmann, P. High-dimensional learning of linear causal networks via inverse covariance estimation. Journal of Machine Learning Research, 15(88): 3065 3105, 2014. Loh, P.-l. and Wainwright, M. J. Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. In Pereira, F., Burges, C., Bottou, L., and Weinberger, K. (eds.), Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012. Malag o, L., Montrucchio, L., and Pistone, G. Wasserstein Riemannian geometry of gaussian densities. Information Geometry, 1(2):137 179, 2018. Marques, A., Lam, R., and Willcox, K. Contour location via entropy reduction leveraging multiple information sources. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. Mycek, P. and De Lozzo, M. Multilevel Monte Carlo covariance estimation for the computation of Sobol indices. SIAM/ASA Journal on Uncertainty Quantification, 7(4): 1323 1348, 2019. Ng, L. and Willcox, K. Multifidelity approaches for optimization under uncertainty. International Journal for Numerical Methods in Engineering, 100(10):746 772, 2014. Palizhati, A., Torrisi, S. B., Aykol, M., Suram, S. K., Hummelshøj, J. S., and Montoya, J. H. Agents for sequential learning using multiple-fidelity data. Scientific Reports, 12(1):4694, Mar 2022. Peherstorfer, B., Willcox, K., and Gunzburger, M. 
Optimal model management for multifidelity Monte Carlo estimation. SIAM Journal on Scientific Computing, 38(5): A3163 A3194, 2016. Peherstorfer, B., Willcox, K., and Gunzburger, M. Survey of multifidelity methods in uncertainty propagation, inference, and optimization. SIAM Review, 60(3):550 591, 2018. Poloczek, M., Wang, J., and Frazier, P. Multi-information source optimization. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. Popov, A. A., Mou, C., Sandu, A., and Iliescu, T. A multifidelity ensemble Kalman filter with reduced order control variates. SIAM Journal on Scientific Computing, 43(2): A1134 A1162, 2021. Prescott, T. P. and Baker, R. E. Multifidelity approximate Bayesian computation. SIAM/ASA Journal on Uncertainty Quantification, 8(1):114 138, 2020. Qi, H. and Sun, D. A quadratically convergent Newton method for computing the nearest correlation matrix. SIAM Journal on Matrix Analysis and Applications, 28 (2):360 385, 2006. Qian, E., Peherstorfer, B., O Malley, D., Vesselinov, V. V., and Willcox, K. Multifidelity Monte Carlo estimation of variance and sensitivity indices. SIAM/ASA Journal on Uncertainty Quantification, 6(2):683 706, 2018. Ravikumar, P., Wainwright, M. J., Raskutti, G., and Yu, B. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5(none):935 980, 2011. Schillings, C. and Stuart, A. M. Analysis of the ensemble Kalman filter for inverse problems. SIAM Journal on Numerical Analysis, 55(3):1264 1290, 2017. Teckentrup, A. L., Scheichl, R., Giles, M. B., and Ullmann, E. Further analysis of multilevel Monte Carlo methods for elliptic pdes with random coefficients. Numerische Mathematik, 125(3):569 600, Nov 2013. Tsiligkaridis, T. and Hero, A. O. Covariance estimation in high dimensions via Kronecker product expansions. IEEE Transactions on Signal Processing, 61(21):5347 5360, 2013. doi: 10.1109/TSP.2013.2279355. Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, I., Feng, Y., Moore, E. W., Vander Plas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., van Mulbregt, P., and Sci Py 1.0 Contributors. Sci Py Multi-Fidelity Covariance Estimation 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261 272, 2020. doi: 10.1038/s41592-019-0686-2. Wu, J., Toscano-Palmerin, S., Frazier, P. I., and Wilson, A. G. Practical multi-fidelity Bayesian optimization for hyperparameter tuning. In Adams, R. P. and Gogate, V. (eds.), Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, volume 115 of Proceedings of Machine Learning Research, pp. 788 798. PMLR, 22 25 Jul 2020. Zadeh, P. H., Hosseini, R., and Sra, S. Geometric mean metric learning. In ICML 16: Proceedings of the 33rd International Conference on International Conference on Machine Learning, volume 48, pp. 2464 2471, June 2016. Multi-Fidelity Covariance Estimation A. Connection to Fr echet Averaging In addition to being a control-variate estimator in the log Euclidean geometry, Equation (7) can also be viewed as a Fr echet average. 
Recall that the LEMF estimator takes the form Log ˆΣ LEMF n = Log ˆΣ (0) n0 ℓ=1 αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 . Defining the SPD difference matrices Dℓ= Exp(Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1) = ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1, ℓ= 1, . . . , L we equivalently have Log ˆΣ LEMF n = Log ˆΣ (0) n0 + ℓ=1 αℓLog Dℓ. (25) Assuming that α1, . . . , αL > 0 3, we see in (25) by definition of the Fr echet mean in the log-Euclidean geometry (Arsigny et al., 2006) that ˆΣ LEMF n satisfies ˆΣ LEMF n ELE, α( ˆΣ (0) n0 , D1, . . . , DL) = arg min Σ Sd ++ d2 LE(Σ, ˆΣ (0) n0 ) + ℓ=1 αℓd2 LE(Σ, Dℓ) where α = 1 α1 αL RL+1. Thus ˆΣ LEMF n can be interpreted as a weighted Fr echet average between high-fidelity ˆΣ (0) n0 and the low-fidelity perturbations Dℓ, ℓ= 1, . . . , L, with weights assigned according to how much ˆΣ (0) n0 is correlated in a generalized sense with each Dℓ. B. Proof of Proposition 3.1 We have defined the log-Euclidean multi-fidelity estimator by ˆΣ LEMF n = ˆΣ (0) n0 ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 = Log ˆΣ (0) n0 + ℓ=1 αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 ! 3If αℓ< 0 for some ℓ, we can define βℓ= αℓand redefine Dℓ= ˆΣ (ℓ) nℓ 1 ˆΣ (ℓ) nℓby reversing the order of subtraction in order to ensure that the Fr echet weights are all positive. Because we have assumed n0, n1, . . . , n L > d, the sample covariance matrices ˆΣ (ℓ) nℓ, and ˆΣ (ℓ) nℓ 1, ℓ= 0, . . . , L are positive definite almost surely. That is, they are members of Sd ++. Because Sd ++( , ) is a vector space, it follows immediately that the top line of (26) ˆΣ LEMF n = ˆΣ (0) n0 ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 Sd ++ is positive definite. It remains to show that the bottom line of (26) is equivalent to the top line. This we accomplish by straightforward algebra, ˆΣ LEMF n = ˆΣ (0) n0 ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 = ˆΣ (0) n0 ℓ=1 αℓ ˆΣ (ℓ) nℓ Exp( Log ˆΣ (ℓ) nℓ 1) = ˆΣ (0) n0 ℓ=1 Exp αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 = ˆΣ (0) n0 Exp ℓ=1 αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 ! Log ˆΣ (0) n0 + ℓ=1 αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 ! C. Proof of Corollary 3.3 We first note that, by properties of inner and outer products on Sd, for two random matrices A, B Sd with E[A] = E[B] = 0 we have E [ A, B F] = E[Tr(A B)] = Tr(E[A B]) = Tr(Cov[A, B]), (28) where we have used linearity of trace. In the special case where A = B we have E[ A 2 F] = E[ A, A F] = Tr (Cov[A]) . (29) We now proceed with the remainder of the proposition. Recall that the EMF estimator is defined ˆΣ EMF n = ˆΣ (0) n0 + XL ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 , where α = [α1, . . . , αL] RL are the control variate weights and ˆΣ (ℓ) nℓ= 1 i=1 y(ℓ) i y(ℓ) i , ˆΣ (ℓ) nℓ 1 = 1 nℓ 1 i=1 y(ℓ) i y(ℓ) i (30) Multi-Fidelity Covariance Estimation are sample covariance matrices computed from {y(ℓ) i }nℓ i=1, ℓ= 0, . . . , L and we have assumed that E[y(ℓ)] is known and without loss of generality equal to 0, ℓ= 0, . . . , L. Because the means of y(0), . . . , y(L) are known, in defining the SCMs in Equation (30) we have normalized by nℓand nℓ 1 to obtain unbiased sample covariance estimates. Under this assumption the EMF estimator (2) with SCMs defined as in (30) is unbiased, E h ˆΣ EMF n i = E ˆΣ (0) n0 + XL ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 = Σ. Because of the unbiasedness, the Frobenius norm MSE of ˆΣ EMF n is thus equal to the trace of the covariance of ˆΣ EMF n , E[|| ˆΣ EMF n Σ||2 F ] = E[|| ˆΣ EMF n E[ ˆΣ EMF n ]||2 F ] = Tr(Cov[ ˆΣ EMF n ]). 
The covariance of ˆΣ EMF n can be decomposed Cov[ ˆΣ EMF n ] = Cov[ ˆΣ (0) n0 ] ℓ=1 α2 ℓ(Cov[ ˆΣ (ℓ) nℓ] + Cov[ ˆΣ (ℓ) nℓ 1]) ℓ=1 αℓ Cov[ ˆΣ (ℓ) nℓ, ˆΣ (0) n0 ] + Cov[ ˆΣ (0) n0 , ˆΣ (ℓ) nℓ] Cov[ ˆΣ (ℓ) nℓ 1, ˆΣ (0) n0 ] Cov[ ˆΣ (0) n0 , ˆΣ (ℓ) nℓ 1] m=ℓ+1 αℓαm Cov[ ˆΣ (ℓ) nℓ, ˆΣ (m) nm ] Cov[ ˆΣ (ℓ) nℓ 1, ˆΣ (m) nm ] Cov[ ˆΣ (ℓ) nℓ, ˆΣ (m) nm 1] + Cov[ ˆΣ (ℓ) nℓ 1, ˆΣ (m) nm 1] m=ℓ+1 αℓαm Cov[ ˆΣ (m) nm , ˆΣ (ℓ) nℓ] Cov[ ˆΣ (m) nm , ˆΣ (ℓ) nℓ 1] Cov[ ˆΣ (m) nm 1, ˆΣ (ℓ) nℓ] + Cov[ ˆΣ (m) nm 1, ˆΣ (ℓ) nℓ 1] ℓ=1 α2 ℓ Cov[ ˆΣ (ℓ) nℓ, ˆΣ (ℓ) nℓ 1)] + Cov[ ˆΣ (ℓ) nℓ 1, ˆΣ (ℓ) nℓ] We will need a result analogous to Lemma 3.2 in Peherstorfer et al. (2016) in order to simplify (31). Lemma C.1. Consider ˆΣ (ℓ) mi = 1 mi Pmi i =1 y(ℓ) i (y(ℓ) i ) and ˆΣ (t) mj = 1 mj Pmj j =1 y(t) j (y(t) j ) , where ℓ, t {0, . . . , L} and mi, mj Z+. It holds that Cov[ ˆΣ (ℓ) mi, ˆΣ (t) mj] = 1 max{mi, mj}Cov[y(ℓ)(y(ℓ)) , y(t)(y(t)) ] Proof. Proceeding algebraically, we see Cov[ ˆΣ (ℓ) mi, ˆΣ (t) mj] = i =1 y(ℓ) i (y(ℓ) i ) , 1 mj j =1 y(t) j (y(t) j ) = 1 mimj Cov i =1 y(ℓ) i (y(ℓ) i ) , j =1 y(t) j (y(t) j ) y(ℓ) i (y(ℓ) i ) and y(t) j (y(t) j ) are independent except in the case that i = j . Thus the terms that remain are Cov[ ˆΣ (ℓ) mi, ˆΣ (t) mj] = min{mi,mj} X i =1 Cov[y(ℓ) i (y(ℓ) i ) , y(t) i (y(t) i ) ] = min{mi, mj} mimj Cov[y(ℓ)(y(ℓ)) , y(t)(y(t)) ] = 1 max{mi, mj}Cov[y(ℓ)(y(ℓ)) , y(t)(y(t)) ]. We return to Equation (31) with this result and simplify, seeing that the doubly-indexed sums are zero, Cov[ ˆΣ EMF n ] = 1 n0 Cov[y(0)(y(0)) ] ℓ=1 α2 ℓ( 1 nℓ+ 1 nℓ 1 )Cov[y(ℓ)(y(ℓ)) ] nℓ 1 nℓ 1 ) Cov[y(ℓ)(y(ℓ)) , y(0)(y(0)) ] + + Cov[y(0)(y(0)) , y(ℓ)(y(ℓ)) ] ℓ=1 α2 ℓ 2 nℓCov[y(ℓ)(y(ℓ)) ]. We rearrange and write the covariance of ˆΣ EMF n compactly Multi-Fidelity Covariance Estimation Cov[ ˆΣ EMF n ] = 1 n0 Cov[y(0)(y(0)) ] ℓ=1 ( 1 nℓ 1 1 nℓ) α2 ℓCov[y(ℓ)(y(ℓ)) ] αℓ Cov[y(ℓ)(y(ℓ)) , y(0)(y(0)) ] + Cov[y(0)(y(0)) , y(ℓ)(y(ℓ)) ] . The MSE of ˆΣ EMF n is equal to the trace of its covariance, E[|| ˆΣ EMF n Σ||] = Tr(Cov[ ˆΣ EMF n ]) = σ2 0 n0 + ℓ=1 ( 1 nℓ 1 1 nℓ )(α2 ℓσ2 ℓ 2αℓρℓσ0σℓ) which is the result we sought to show. D. Proof of Proposition 3.2 Proof. Canceling the matrix logarithm with the matrix exponential in the LEMF estimator (7) yields Log ˆΣ LEMF n Log Σ F = Log ˆΣ (0) n0 + ℓ=1 αℓ Log ˆΣ (ℓ) nℓ Log ˆΣ (ℓ) nℓ 1 Log Σ Since Σ I F < h/4 < 1 we may define the matrix logarithm through its Taylor series k (Σ I)k , (32) which converges absolutely. Similarly, by the triangle inequality Σ(ℓ) I F Σ I F + Σ(ℓ) Σ F h 2 < 1 . (33) For the sample covariance matrices Σ(ℓ) nℓand Σ(ℓ) nℓ 1, the law of large numbers implies almost surely that for any δℓ> 0 sufficiently large sample size nℓguarantees max n ˆΣ (ℓ) nℓ Σ(ℓ) F , ˆΣ (ℓ) nℓ 1 Σ(ℓ) F o < δℓ, (34) for ℓ= 0, . . . , L. Setting δℓ= h/4 and applying the triangle inequality again with the result of (33) gives max n ˆΣ (ℓ) nℓ I F , ˆΣ (ℓ) nℓ 1 I F δℓ+ Σ(ℓ) I F < δℓ+ h 4 < 1 . (35) Expanding the Taylor series for each matrix logarithm up to first-order (i.e. Log(Σ) Σ I) and canceling out the resulting identity matrices gives Log ˆΣ LEMF n Log Σ F = ˆΣ (0) n0 + ℓ=1 αℓ ˆΣ (ℓ) nℓ ˆΣ (ℓ) nℓ 1 Σ + M where the matrix M is the remainder of the first-order Taylor expansions k ( ˆΣ (0) n0 I)k k ( ˆΣ (ℓ) nℓ I)k ! k ( ˆΣ (ℓ) nℓ 1 I)k ! Applying both the triangle and reverse-triangle inequalities gives Log ˆΣ LEMF n Log Σ F ˆΣ EMF n Σ F M F . 
(38) To bound the right-hand-side of (38), we again use the triangle inequality and that Ak F A k F to obtain ˆΣ (0) n0 I k ˆΣ (ℓ) nℓ I k ˆΣ (ℓ) nℓ 1 I k 1 k Σ I k F . By (35) and the assumption Σ I F < h/4, each series in (39) converges absolutely. Bounding 1/k with 1 and Multi-Fidelity Covariance Estimation factoring out a quadratic power for each term in (39) gives M F ˆΣ (0) n0 I 2 ˆΣ (0) n0 I k ℓ=1 αℓ ˆΣ (ℓ) nℓ I 2 ˆΣ (ℓ) nℓ I k ℓ=1 αℓ ˆΣ (ℓ) nℓ 1 I 2 ˆΣ (ℓ) nℓ 1 I k k=0 Σ I k F . Evaluating each of these geometric series gives the bound ˆΣ (0) n0 I 2 1 ˆΣ (0) n0 I F ˆΣ (ℓ) nℓ I 2 1 ˆΣ (ℓ) nℓ I F ˆΣ (ℓ) nℓ 1 I 2 1 ˆΣ (ℓ) nℓ 1 I F + Σ I 2 F 1 Σ I F . By (35) we have that 1 ˆΣ (ℓ) nℓ I F 1 ˆΣ (ℓ) nℓ 1 I F 1 1 3h/4 < 4 , (42) and similarly 1 1 Σ I F < 1 1 h/4 < 4 Substituting the bounds (42) and (43) into (41) gives M F 4 ˆΣ (0) n0 I 2 ℓ=1 αℓ ˆΣ (ℓ) nℓ I 2 ℓ=1 αℓ ˆΣ (ℓ) nℓ 1 I 2 3 Σ I 2 F . Applying the bound (35) with (44) gives h2 = Kh2 . (45) Factoring using the difference of squares formula (a2 b2 = (a b)(a + b)) gives Log ˆΣ LEMF n Log Σ 2 F ˆΣ EMF n Σ 2 = Log ˆΣ LEMF n Log Σ F + ˆΣ EMF n Σ F Log ˆΣ LEMF n Log Σ F ˆΣ EMF n Σ F Applying the law of large numbers again, we know that almost surely for sufficiently large sample sizes max n ˆΣ EMF n Σ F , Log ˆΣ LEMF n Log Σ F o < 1 . (47) Combining the bound (45) with (38) and substituting into (46) gives Log ˆΣ LEMF n Log Σ 2 F ˆΣ EMF n Σ 2 (48) Taking the expectation and applying Jensen s inequality gives the result. E Log ˆΣ LEMF n Log Σ 2 E ˆΣ EMF n Σ 2 Log ˆΣ LEMF n Log Σ 2 F ˆΣ EMF n Σ 2 2Kh2 . (49) E. Proof of Proposition 3.4 and Corollary 3.5 Since the Euclidean MSE (Corollary 3.3) has the exact same form as the MSE of the scalar multi-fidelity Monte Carlo estimator in Peherstorfer et al. (2016), the optimal allocations and weights there apply directly to the Euclidean multi-fidelity covariance estimator (2) and in first-order to the LEMF estimator (7), with the exception that our definitions of σ2 ℓand ρℓare generalized for matrices. Equation 20 directly follows from (19). F. Algorithmic Description and Code Sketch of LEMF Estimator The code shown in Figure 7 sketches an implementation of the LEMF estimator. It is important to note that it only Multi-Fidelity Covariance Estimation from numpy import cov from scipy.linalg import logm, expm # compute sample-size vector n[] and coefficients a[] as in Proposition 3.4 # collect samples n_l x d of level l = 0, ..., L in array samps[l] S = logm(cov(samps[0].T)) for i in range(len(a)): S = S + a[i]*(logm(cov(samps[i+1].T))-logm(cov(samps[i+1][:n[i]].T))) S = expm(S) Figure 7. This code sketches an implementation of the LEMF estimator. It builds on standard numerical linear algebra routines that are readily available in Num Py/Sci Py (Harris et al., 2020; Virtanen et al., 2020). requires numerical linear algebra functions that are readily available in, e.g., Num Py/Sci Py (Harris et al., 2020; Virtanen et al., 2020). A detailed implementation to reproduce the shown numerical results is provided in the supplemental material. G. Supplemental to Motivating Example with Gaussian Consider high-fidelity samples y(0) N(0, Σ) with Σ Sd ++ and low-fidelity samples which correspond to highfidelity samples corrupted by additional independent noise, y(ℓ) = y + ϵ(ℓ) , ϵ N(0, Γ(ℓ)) . (50) Because the samples are Gaussian we may compute explicitly the generalized variances (10) and generalized correlations (9) needed for the optimal sample allocation in Theorem 3.4. 
Generalized variance. For the generalized variance we have that
$$ \mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top]\big) = \mathbb{E}\big[\|y^{(0)}(y^{(0)})^\top - \Sigma\|_F^2\big] = \mathbb{E}\big[\|y^{(0)}(y^{(0)})^\top\|_F^2\big] - 2\,\mathbb{E}\big[\langle y^{(0)}(y^{(0)})^\top, \Sigma\rangle\big] + \|\Sigma\|_F^2 = \mathbb{E}\big[\mathrm{Tr}\big(y^{(0)}(y^{(0)})^\top y^{(0)}(y^{(0)})^\top\big)\big] - \|\Sigma\|_F^2. $$
Focusing on the first term we see
$$ \mathrm{Tr}\big(y^{(0)}(y^{(0)})^\top y^{(0)}(y^{(0)})^\top\big) = \|y^{(0)}\|^2\,\mathrm{Tr}\big(y^{(0)}(y^{(0)})^\top\big) = \|y^{(0)}\|^2\sum_{i=1}^{d}\big(y^{(0)}_i\big)^2 = \|y^{(0)}\|^4, $$
so that
$$ \mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top]\big) = \mathbb{E}\big[\|y^{(0)}\|^4\big] - \|\Sigma\|_F^2, $$
which is analogous to the formula $\mathrm{Var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2$ for scalar random variables. For a multivariate Gaussian random variable with zero mean we have that
$$ \mathbb{E}\big[\|y^{(0)}\|^4\big] = \mathrm{Tr}(\Sigma)^2 + 2\,\mathrm{Tr}(\Sigma^2), \quad (51) $$
so that
$$ \mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top]\big) = \mathrm{Tr}(\Sigma^2) + \mathrm{Tr}(\Sigma)^2. \quad (52) $$
Similarly, we may apply (52) for the low-fidelity samples $y^{(\ell)} \sim \mathcal{N}(0, \Sigma + \Gamma^{(\ell)})$.

Generalized correlation. In order to compute the generalized correlation we compute the cross-covariance,
$$ \mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top, y^{(\ell)}(y^{(\ell)})^\top]\big) = \mathbb{E}\big[\big\langle y^{(0)}(y^{(0)})^\top - \Sigma,\; y^{(\ell)}(y^{(\ell)})^\top - \Sigma^{(\ell)}\big\rangle\big] = \mathbb{E}\big[\mathrm{Tr}\big(y^{(0)}(y^{(0)})^\top y^{(\ell)}(y^{(\ell)})^\top\big)\big] - \mathrm{Tr}(\Sigma\Sigma^{(\ell)}) = \mathbb{E}\big[\big((y^{(0)})^\top y^{(\ell)}\big)^2\big] - \mathrm{Tr}(\Sigma\Sigma^{(\ell)}), $$
where we have used the notation $\Sigma^{(\ell)} = \Sigma + \Gamma^{(\ell)}$. Writing $y^{(\ell)} = y^{(0)} + \epsilon^{(\ell)}$ and expanding the squared inner product gives
$$ \big((y^{(0)})^\top y^{(\ell)}\big)^2 = \big(\|y^{(0)}\|^2 + (y^{(0)})^\top\epsilon^{(\ell)}\big)^2 = \|y^{(0)}\|^4 + 2\|y^{(0)}\|^2\,(y^{(0)})^\top\epsilon^{(\ell)} + \big((y^{(0)})^\top\epsilon^{(\ell)}\big)^2. $$
For the first term we may use (51), while for the second term we have by independence that
$$ \mathbb{E}\big[\|y^{(0)}\|^2\,(y^{(0)})^\top\epsilon^{(\ell)}\big] = \mathbb{E}\big[\|y^{(0)}\|^2\,y^{(0)}\big]^\top\,\mathbb{E}\big[\epsilon^{(\ell)}\big] = 0. $$
For the last term we rely on properties of the trace operator:
$$ \mathbb{E}\big[\big((y^{(0)})^\top\epsilon^{(\ell)}\big)^2\big] = \mathbb{E}\big[\mathrm{Tr}\big((y^{(0)})^\top\epsilon^{(\ell)}\,(y^{(0)})^\top\epsilon^{(\ell)}\big)\big] = \mathbb{E}\big[\mathrm{Tr}\big((y^{(0)})^\top\epsilon^{(\ell)}(\epsilon^{(\ell)})^\top y^{(0)}\big)\big] = \mathbb{E}\big[\mathrm{Tr}\big(y^{(0)}(y^{(0)})^\top\epsilon^{(\ell)}(\epsilon^{(\ell)})^\top\big)\big] = \mathrm{Tr}\big(\mathbb{E}[y^{(0)}(y^{(0)})^\top]\,\mathbb{E}[\epsilon^{(\ell)}(\epsilon^{(\ell)})^\top]\big) = \mathrm{Tr}(\Sigma\Gamma^{(\ell)}). $$
Combining everything gives the cross-covariance
$$ \mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top, y^{(\ell)}(y^{(\ell)})^\top]\big) = \mathrm{Tr}(\Sigma)^2 + 2\,\mathrm{Tr}(\Sigma^2) + \mathrm{Tr}(\Sigma\Gamma^{(\ell)}) - \mathrm{Tr}(\Sigma\Sigma^{(\ell)}) = \mathrm{Tr}(\Sigma)^2 + \mathrm{Tr}(\Sigma^2) = \sigma_0^2, \quad (53) $$
which is analogous to the scalar setting where $\mathrm{Cov}[y, \epsilon] = 0$ so that $\mathrm{Cov}[y, y + \epsilon] = \mathrm{Var}[y]$. Using the notation $\sigma_0$, $\sigma_\ell$, and $\rho_\ell$ defined in Section 3.3, the generalized correlation is therefore
$$ \rho_\ell = \frac{\mathrm{Tr}\big(\mathrm{Cov}[y^{(0)}(y^{(0)})^\top, y^{(\ell)}(y^{(\ell)})^\top]\big)}{\sigma_0\sigma_\ell} = \frac{\sigma_0}{\sigma_\ell}. $$
Notice that $\sigma_\ell > \sigma_0$ and so $\rho_\ell < 1$.

Example set up. For the example presented in Section 4.1, we select $\Sigma$ randomly from the Wishart(4) ensemble, that is, $\Sigma = A^\top A$ where $A \in \mathbb{R}^{4\times 4}$ has i.i.d. standard normal entries. The particular realization of $\Sigma$ we obtained was
$$ \Sigma = \begin{bmatrix} 2.52 & 0.17 & 0.67 & 0.98 \\ 0.17 & 0.64 & 0.29 & 0.35 \\ 0.67 & 0.29 & 0.49 & 0.52 \\ 0.98 & 0.35 & 0.52 & 1.31 \end{bmatrix}. $$
The noise covariance matrices defining the low-fidelity samples at each level $\ell \in \{1, 2, 3\}$ are given by
$$ \Gamma^{(1)} = 0.1\,I_{4\times 4}, \quad \Gamma^{(2)} = 0.5\,I_{4\times 4}, \quad \Gamma^{(3)} = I_{4\times 4}, $$
and result in generalized correlations (9)
$$ \rho_1 \approx 0.93, \quad \rho_2 \approx 0.74, \quad \rho_3 \approx 0.58. $$
We impose sampling costs $c_0 = 1$ and $c_\ell = 10^{-\ell-1}$ for $\ell \in \{1, 2, 3\}$. Setting the computational budget to $B = 15$ and taking sample sizes from Proposition 3.4 yields
$$ n_0 = 12, \quad n_1 = 199, \quad n_2 = 505, \quad n_3 = 2073. $$
Using the derived sample allocation we draw the corresponding number of samples from each level to obtain the three multi-fidelity estimators: EMF (2), LEMF (7), and the truncated estimator (22). For the truncated multi-fidelity estimator, we set a threshold of $\delta = 10^{-16}$ on the eigenvalues. Additionally, we also draw $n_{\mathrm{high}} = B/c_0 = 15$ high-fidelity samples and $n_{\mathrm{low}} = B/c_3 = 1.5\times 10^{5}$ low-fidelity samples $y^{(3)}$ to compute single-fidelity estimates with the same computational budget. The results reported in Table 1 are estimated MSEs averaged over 100 independent trials. Across all 100 trials, about 5% of the EMF estimates were indefinite.

H. Supplemental to Uncertainty Quantification of Steady-State Heat Flow

For demonstration purposes, we show in Figure 8 the numerical solution of the steady-state heat-flow problem (23) for three different, randomly sampled parameters $\theta$.
Figure 8. Supplemental to heat flow example: Three sample solutions (temperature) of the steady-state heat equation (23) corresponding to different random parameters $\theta \in \mathbb{R}^4$. The axes show the numerical solution $u(x, \theta)$ over the spatial domain $x$.

Observations $y^{(\ell)}$, $\ell \in \{0, 1\}$, consist of temperature measurements at 10 equidistant locations over the spatial domain, where $\ell$ corresponds to the number of grid points used to numerically solve (23). Our goal in this example is to estimate the covariance of $y^{(0)}$, the observations corresponding to the highest number of solver grid points. We perform a pilot study to estimate the generalized variances and correlations $\sigma_0, \sigma_1, \rho_1$ needed to compute the optimal multi-fidelity sample allocation $n^* = [n_0^*, n_1^*]$ and coefficients $\alpha^*$. We use $10^5$ samples of $\theta$ for the pilot study to avoid mixing errors of the optimal sample allocation with the MSEs of the estimators. Figure 9 is analogous to Figure 2 and shows the MSEs of our LEMF estimator compared to the MSEs of single-fidelity and other multi-fidelity estimators, except that now the minimum and maximum MSEs over all 100 trials are shown with error bars. Figures 10a-b show speedups of our LEMF estimator with respect to the Euclidean and affine-invariant metrics. As the budget is increased, the Euclidean estimator becomes indefinite less frequently, which can be seen in Figure 10c. Figure 11 visualizes the convergence behavior of various estimators. Our LEMF estimator converges more quickly to the true value of $\Sigma$ element-wise than the single-fidelity estimators; this is particularly true for the diagonal elements of $\Sigma$. The y-scale is comparable to that of Figure 3c, and the x-scale shows the costs and is the same as the x-axis of Figure 3c.

I. Supplemental to Metric Learning: Surface Quasi-Geostrophy

We follow the surface quasi-geostrophic (SQG) model for the evolution of surface buoyancy $b : X \times [0, \infty) \to \mathbb{R}$ in the periodic domain $X = [-\pi, \pi]^2 \times (-\infty, 0]$ presented in Held et al. (1985) and Capet et al. (2008),
$$ \partial_t b(x, t; \theta) + J(\psi, b) = 0, \quad z = 0, \qquad \Delta\psi = 0, \quad z < 0. \quad (55) $$
Here $x = [x, y, z] \in X$ and the Jacobian is $J(\psi, b) = \partial_x\psi\,\partial_y b - \partial_y\psi\,\partial_x b$. The SQG equation (55) is solved by applying a Fourier transform and integrating in time with finite differences. The parameter vector $\theta = [\theta_1, \dots, \theta_5]$ defines the initial buoyancy $b_0 : \mathbb{R}^2 \to \mathbb{R}$ at the surface $z = 0$ as well as the flow. The initial buoyancy is given by the Gaussian
$$ b_0(x, y; \theta) = \frac{1}{(2\pi/|\theta_5|)^2}\exp\big(-x^2 - \exp(2\theta_1)\,y^2\big), \quad (57) $$
where $\theta_1$ is the log aspect ratio that determines the shape of the ellipse (see Figure 12) and $\theta_5$ controls the amplitude. Of the remaining parameters, $\theta_2$ is the gradient of the Coriolis parameter, $\theta_3$ is the log buoyancy frequency, and $\theta_4$ is the background zonal flow along the x-axis.

The high-fidelity random variable consists of nine equally-spaced observations of the solution $b(x, T; \theta)$ with $T = 24$, where $b(\cdot, \cdot; \theta)$ is computed numerically using a time step of size $\Delta t = 0.005$ and 256 grid points along each coordinate axis. The surrogate random variable consists of observations of the solution at the same locations and time, except with $b(\cdot, \cdot; \theta)$ computed numerically using only 64 grid points along each coordinate axis. The cost of sampling each random variable is taken to be proportional to the total number of spatial grid points used in the solvers, $c_0 = 256^2 = 65{,}536$ and $c_1 = 64^2 = 4{,}096$. The observation points are shown in plots (c) and (d) of Figure 12.
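To put these costs in perspective, each high-fidelity evaluation is as expensive as $c_0/c_1 = 65{,}536/4{,}096 = 16$ surrogate evaluations; for an illustrative budget of $B = 10^7$ cost units, a single-fidelity estimator could afford at most $\lfloor 10^7/65{,}536\rfloor = 152$ high-fidelity samples, whereas the surrogate alone would afford $\lfloor 10^7/4{,}096\rfloor = 2{,}441$ samples. The multi-fidelity estimators exploit this gap by shifting most of the budget to the cheaper level.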
The input parameters $\theta$ are drawn from a Gaussian mixture model, where $i \in \{0, 1\}$ is a latent variable determining the mixture component sampled from. The full distribution of $\theta$ is
$$ \theta \mid i \sim \mathcal{N}(\mu_i, C), \qquad i \sim \mathrm{Bernoulli}(1/2), $$
with means
$$ \mu_0 = [1, 0, 0, 0, 4]^\top, \qquad \mu_1 = [0.1, 0, 0, 0, 4]^\top. $$
Note that we fix the third parameter $\theta_3 = 0$, but the covariance matrix of the output observations $y^{(0)}$ and $y^{(1)}$ will still be positive definite. The first class $i = 0$ corresponds to an initial buoyancy with a larger log-aspect ratio, which introduces an instability, while the second class $i = 1$ corresponds to a smaller log-aspect ratio. Figure 12 shows the buoyancy at initial and final time for stochastic inputs corresponding to both classes. We sample the class variable $i$ from a Bernoulli distribution with parameter 1/2, i.e., $i = 0$ and $i = 1$ are drawn with equal probability.

We define the mean relative error of distances $\mathrm{MRE}(A)$ for the metric $A$ with respect to the reference metric $A_0$ as
$$ \mathrm{MRE}(A) = \frac{1}{5000}\sum_{i=1}^{5000}\frac{\big|d_A(y_i, 0) - d_{A_0}(y_i, 0)\big|}{d_{A_0}(y_i, 0)}. $$

Figure 9. Supplemental for heat-flow problem: MSE versus costs (budget B) in (a) the log-Euclidean metric, (b) the affine-invariant metric, and (c) the Euclidean metric, for the HF-only, Euclidean MF, truncated MF, and LEMF (ours) estimators. Plots are analogous to Figure 2, except that the minimum and maximum MSE over all 100 trials are shown as error bars. Note that the surrogate-only estimator is not shown to ease exposition. The Euclidean multi-fidelity estimator is shown only in plot (c) because it becomes indefinite and therefore the MSE cannot be computed in the log-Euclidean and affine-invariant metrics.

Figure 10. Supplemental for heat flow problem: Plots (a) and (b) show the speedups of our LEMF estimator compared to the single-fidelity estimator that uses the high-fidelity samples alone, in the Euclidean metric (tolerance $2\times 10^{-4}$) and the affine-invariant metric (tolerance $3\times 10^{1}$), respectively. Note that the Euclidean estimator is indefinite for more than 10% of the trials and thus the speedup with respect to the affine-invariant metric cannot be computed. Plot (c) shows the percentage of trials, as a function of costs (budget B), in which the Euclidean multi-fidelity estimator leads to an indefinite covariance matrix estimate.

Figure 11. Supplemental for heat-flow problem: The plot shows the convergence behavior (estimate versus costs) of the estimators (HF only, surrogate only, LEMF (ours), and the truth). Our LEMF estimator is closer to the true value (dashed black) than the single-fidelity estimator, especially for low costs and for diagonal entries of the covariance matrix.

Figure 12. Supplemental for metric learning: Plots show examples of the buoyancy at initial (top) and final time (bottom) for $\theta$ sampled from class $i = 0$ (left) and $i = 1$ (right): (a) class $i = 0$, initial buoyancy; (b) class $i = 1$, initial buoyancy; (c) class $i = 0$, final-time buoyancy; (d) class $i = 1$, final-time buoyancy. Observations consist of solution values at nine spatial locations in the domain, as depicted in plots (c) and (d).
We use the observations to estimate a metric which will distinguish between observations corresponding to θ sampled from class i = 0 and θ sampled from class i = 1.
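For the evaluation with the mean relative error of distances defined above, a minimal sketch is the following, assuming $d_A$ denotes the Mahalanobis-type distance $d_A(y, 0) = \sqrt{y^\top A y}$ induced by an SPD matrix $A$; the function name and the example inputs are illustrative assumptions and are not taken from our released implementation.

import numpy as np

def mre(A, A0, Y):
    # mean relative error of distances to the origin under metric A, relative to the reference metric A0;
    # Y has shape (num_samples, d) and both A and A0 are assumed SPD
    dA  = np.sqrt(np.einsum('ni,ij,nj->n', Y, A,  Y))
    dA0 = np.sqrt(np.einsum('ni,ij,nj->n', Y, A0, Y))
    return np.mean(np.abs(dA - dA0) / dA0)

rng = np.random.default_rng(0)
Y  = rng.standard_normal((5000, 9))    # illustrative stand-in for 5000 held-out observations in d = 9
A0 = np.eye(9)                         # illustrative reference metric
A  = 1.1 * np.eye(9)                   # illustrative estimated metric
print(mre(A, A0, Y))                   # relative error of distances induced by A versus A0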