# robust_portfolio_optimization__6a4e6a33.pdf Robust Portfolio Optimization Huitong Qiu Department of Biostatistics Johns Hopkins University Baltimore, MD 21205 hqiu7@jhu.edu Fang Han Department of Biostatistics Johns Hopkins University Baltimore, MD 21205 fhan@jhu.edu Han Liu Department of Operations Research and Financial Engineering Princeton University Princeton, NJ 08544 hanliu@princeton.edu Brian Caffo Department of Biostatistics Johns Hopkins University Baltimore, MD 21205 bcaffo@jhsph.edu We propose a robust portfolio optimization approach based on quantile statistics. The proposed method is robust to extreme events in asset returns, and accommodates large portfolios under limited historical data. Specifically, we show that the risk of the estimated portfolio converges to the oracle optimal risk with parametric rate under weakly dependent asset returns. The theory does not rely on higher order moment assumptions, thus allowing for heavy-tailed asset returns. Moreover, the rate of convergence quantifies that the size of the portfolio under management is allowed to scale exponentially with the sample size of the historical data. The empirical effectiveness of the proposed method is demonstrated under both synthetic and real stock data. Our work extends existing ones by achieving robustness in high dimensions, and by allowing serial dependence. 1 Introduction Markowitz s mean-variance analysis sets the basis for modern portfolio optimization theory [1]. However, the mean-variance analysis has been criticized for being sensitive to estimation errors in the mean and covariance matrix of the asset returns [2, 3]. Compared to the covariance matrix, the mean of the asset returns is more influential and harder to estimate [4, 5]. Therefore, many studies focus on the global minimum variance (GMV) formulation, which only involves estimating the covariance matrix of the asset returns. Estimating the covariance matrix of asset returns is challenging due to the high dimensionality and heavy-tailedness of asset return data. Specifically, the number of assets under management is usually much larger than the sample size of exploitable historical data. On the other hand, extreme events are typical in financial asset prices, leading to heavy-tailed asset returns. To overcome the curse of dimensionality, structured covariance matrix estimators are proposed for asset return data. [6] considered estimators based on factor models with observable factors. [7, 8, 9] studied covariance matrix estimators based on latent factor models. [10, 11, 12] proposed to shrink the sample covariance matrix towards highly structured covariance matrices, including the identity matrix, order 1 autoregressive covariance matrices, and one-factor-based covariance matrix estimators. These estimators are commonly based on the sample covariance matrix. (sub)Gaussian tail assumptions are required to guarantee consistency. For heavy-tailed data, robust estimators of covariance matrices are desired. Classic robust covariance matrix estimators include M-estimators, minimum volume ellipsoid (MVE) and minimum covari- ance determinant (MCD) estimators, S-estimators, and estimators based on data outlyingness and depth [13]. These estimators are specifically designed for data with very low dimensions and large sample sizes. For generalizing the robust estimators to high dimensions, [14] proposed the Orthogonalized Gnanadesikan-Kettenring (OGK) estimator, which extends [15] s estimator by re-estimating the eigenvalues; [16, 17] studied shrinkage estimators based on Tyler s M-estimator. However, although OGK is computationally tractable in high dimensions, consistency is only guaranteed under fixed dimension. The shrunken Tylor s M-estimator involves iteratively inverting large matrices. Moreover, its consistency is only guaranteed when the dimension is in the same order as the sample size. The aforementioned robust estimators are analyzed under independent data points. Their performance under time series data is questionable. In this paper, we build on a quantile-based scatter matrix1 estimator, and propose a robust portfolio optimization approach. Our contributions are in three aspects. First, we show that the proposed method accommodates high dimensional data by allowing the dimension to scale exponentially with sample size. Secondly, we verify that consistency of the proposed method is achieved without any tail conditions, thus allowing for heavy-tailed asset return data. Thirdly, we consider weakly dependent time series, and demonstrate how the degree of dependence affects the consistency of the proposed method. 2 Background In this section, we introduce the notation system, and provide a review on the gross-exposure constrained portfolio optimization that will be exploited in this paper. 2.1 Notation Let v = (v1, . . . , vd)T be a d-dimensional real vector, and M = [Mjk] Rd1 d2 be a d1 d2 matrix with Mjk as the (j, k) entry. For 0 < q < , we define the ℓq vector norm of v as v q := (Pd j=1 |vj|)1/q and the ℓ vector norm of v as v := maxd j=1 |vj|. Let the matrix ℓmax norm of M be M max := maxjk |Mjk|, and the Frobenius norm be M F := q P Let X = (X1, . . . , Xd)T and Y = (Y1, . . . , Yd)T be two random vectors. We write X d= Y if X and Y are identically distributed. We use 1, 2, . . . to denote vectors with 1, 2, . . . at every entry. 2.2 Gross-exposure Constrained GMV Formulation Under the GMV formulation, [18] found that imposing a no-short-sale constraint improves portfolio efficiency. [19] relaxed the no-short-sale constraint by a gross-exposure constraint, and showed that portfolio efficiency can be further improved. Let X Rd be a random vector of asset returns. A portfolio is characterized by a vector of investment allocations, w = (w1, . . . , wd)T, among the d assets. The gross-exposure constrained GMV portfolio optimization can be formulated as min w w TΣw s.t. 1Tw = 1, w 1 c. (2.1) Here 1Tw = 1 is the budget constraint, and w 1 c is the gross-exposure constraint. c 1 is called the gross exposure constant, which controls the percentage of long and short positions allowed in the portfolio [19]. The optimization problem (2.1) can be converted into a quadratic programming problem, and solved by standard software [19]. In this section, we introduce the quantile-based portfolio optimization approach. Let Z R be a random variable with distribution function F, and {zt}T t=1 be a sequence of observations from Z. For a constant q [0, 1], we define the q-quantiles of Z and {zt}T t=1 to be Q(Z; q) = Q(F; q) := inf{z : P(Z z) q}, b Q({zt}T t=1; q) := z(k) where k = min n t : t 1A scatter matrix is defined to be any matrix proportional to the covariance matrix by a constant. Here z(1) . . . z(T ) are the order statistics of {zt}T t=1. We say Q(Z; q) is unique if there exists a unique z such that P(Z z) = q. We say b Q({zt}T t=1; q) is unique if there exists a unique z {z1, . . . , z T } such that z = z(k). Following the estimator Qn [20], we define the population and sample quantile-based scales to be σQ(Z) := Q(|Z e Z|; 1/4) and bσQ({zt}T t=1) := b Q({|zs zt|}1 s 0. Secondly, the projection (3.3) is more robust compared to the OGK estimate [14]. OGK induces positive definiteness by re-estimating the eigenvalues using the variances of the principal components. Robustness is lost when the data, possibly containing outliers, are projected onto the principal directions for estimating the principal components. Remark 3.3. We adopt the 1/4 quantile in the definitions of σQ and bσQ to achieve 50% breakdown point. However, we note that our methodology and theory carries through if 1/4 is replaced by any absolute constant q (0, 1). 4 Theoretical Properties In this section, we provide theoretical analysis of the proposed portfolio optimization approach. For an optimized portfolio, bwopt, based on an estimate, R, of RQ, the next lemma shows that the error between the risks R(bwopt; RQ) and R(wopt; RQ) is essentially related to the estimation error in R. Lemma 4.1. Let bwopt be the solution to min w R(w; R) s.t. 1Tw = 1, w 1 c (4.1) for an arbitrary matrix R. Then, we have |R(bwopt; RQ) R(wopt; RQ)| 2c2 R RQ max, where wopt is the solution to the oracle portfolio optimization problem (3.2), and c is the grossexposure constant. Next, we derive the rate of convergence for R(ewopt; RQ), which relates to the rate of convergence in e RQ RQ max. To this end, we first introduce a dependence condition on the asset return series. Definition 4.2. Let {Xt}t Z be a stationary process. Denote by F0 := σ(Xt : t 0) and F n := σ(Xt : t n) the σ-fileds generated by {Xt}t 0 and {Xt}t n, respectively. The φ-mixing coefficient is defined by φ(n) := sup B F0 ,A F n ,P(B)>0 |P(A | B) P(A)|. The process {Xt}t Z is φ-mixing if and only if limn φ(n) = 0. Condition 1. {Xt Rd}t Z is a stationary process such that for any j = k {1, . . . , d}, {Xtj}t Z, {Xtj +Xtk}t Z, and {Xtj Xtk}t Z are φ-mixing processes satisfying φ(n) 1/n1+ϵ for any n > 0 and some constant ϵ > 0. The parameter ϵ determines the rate of decay in φ(n), and characterizes the degree of dependence in {Xt}t Z. Next, we introduce an identifiability condition on the distribution function of the asset returns. Condition 2. Let f X = ( e X1, . . . , e Xd)T be an independent copy of X1. For any j = k {1, . . . , d}, let F1;j, F2;j,k, and F3;j,k be the distribution functions of |X1j e Xj|, |X1j + X1k e Xj e Xk|, and |X1j X1k e Xj + e Xk|. We assume there exist constants κ > 0 and η > 0 such that inf |y Q(F ;1/4)| κ d dy F(y) η for any F {F1;j, F2;j,k, F3;j,k : j = k = 1, . . . , d}. Condition 2 guarantees the identifiability of the 1/4 quantiles, and is standard in the literature on quantile statistics [22, 23]. Based on Conditions 1 and 2, we can present the rates of convergence for b RQ and e RQ. Theorem 4.3. Let {Xt}t Z be an absolutely continuous stationary process satisfying Conditions 1 and 2. Suppose log d/T 0 as T . Then, for any α (0, 1) and T large enough , with probability no smaller than 1 8α2, we have b RQ RQ max r T . (4.2) Here the rate of convergence r T is defined by r T = max n 2 4(1 + 2Cϵ)(log d log α) 4(1 + 2Cϵ)(log d log α) where σQ max := max{σQ(Xj), σQ(Xj + Xk), σQ(Xj Xk) : j = k {1, . . . , d}} and Cϵ := P k=1 1/k1+ϵ. Moreover, if RQ Sλ for Sλ defined in (3.3), we further have e RQ RQ max 2r T . (4.4) The implications of Theorem 4.3 are as follows. 1. When the parameters η, ϵ, and σQ max do not scale with T, the rate of convergence reduces to OP ( p log d/T). Thus, the number of assets under management is allowed to scale exponentially with sample size T. Compared to similar rates of convergence obtained for sample-covariance-based estimators [24, 25, 9], we do not require any moment or tail conditions, thus accommodating heavy-tailed asset return data. 2. The effect of serial dependence on the rate of convergence is characterized by Cϵ. Specifically, as ϵ approaches 0, Cϵ = P k=1 1/k1+ϵ increases towards infinity, inflating r T . ϵ is allowed to scale with T such that Cϵ = o(T/ log d). 3. The rate of convergence r T is inversely related to the lower bound, η, on the marginal density functions around the 1/4 quantiles. This is because when η is small, the distribution functions are flat around the 1/4 quantiles, making the population quantiles harder to estimate. Combining Lemma 4.1 and Theorem 4.3, we obtain the rate of convergence for R(ewopt; RQ). Theorem 4.4. Let {Xt}t Z be an absolutely continuous stationary process satisfying Conditions 1 and 2. Suppose that log d/T 0 as T and RQ Sλ. Then, for any α (0, 1) and T large enough, we have |R(ewopt; RQ) R(wopt; RQ)| 2c2r T , (4.5) where r T is defined in (4.3) and c is the gross-exposure constant. Theorem 4.4 shows that the risk of the estimated portfolio converges to the oracle optimal risk with parametric rate r T . The number of assets, d, is allowed to scale exponentially with sample size T. Moreover, the rate of convergence does not rely on any tail conditions on the distribution of the asset returns. For the rest of this section, we build the connection between the proposed robust portfolio optimization and its moment-based counterpart. Specifically, we show that they are consistent under the elliptical model. Definition 4.5. [26] A random vector X Rd follows an elliptical distribution with location µ Rd and scatter S Rd d if and only if there exist a nonnegative random variable ξ R, a matrix A Rd r with rank(A) = r, a random vector U Rr independent from ξ and uniformly distributed on the r-dimensional sphere, Sr 1, such that X d= µ + ξAU. Here S = AAT has rank r. We denote X ECd(µ, S, ξ). ξ is called the generating variate. Commonly used elliptical distributions include Gaussian distribution and t-distribution. Elliptical distributions have been widely used for modeling financial return data, since they naturally capture many stylized properties including heavy tails and tail dependence [27, 28, 29, 30, 31, 32]. The next theorem relates RQ and R(w; RQ) to their moment-based counterparts, Σ and R(w; Σ), under the elliptical model. Theorem 4.6. Let X = (X1, . . . , Xd)T ECd(µ, S, ξ) be an absolutely continuous elliptical random vector and f X = ( e X1, . . . , e Xd)T be an independent copy of X. Then, we have RQ = m QS (4.6) for some constant m Q only depending on the distribution of X. Moreover, if 0 < Eξ2 < , we have RQ = c QΣ and R(w; RQ) = c QR(w; Σ), (4.7) where Σ = Cov(X) is the covariance matrix of X, and c Q is a constant given by c Q =Q n(Xj e Xj)2 Var(Xj) ; 1 o = Q n(Xj + Xk e Xj e Xk)2 Var(Xj + Xk) ; 1 =Q n(Xj Xk e Xj + e Xk)2 Var(Xj Xk) ; 1 Here the last two inequalities hold when Var(Xj + Xk) > 0 and Var(Xj Xk) > 0. By Theorem 4.6, under the elliptical model, minimizing the robust risk metric, R(w; RQ), is equivalent with minimizing the standard moment-based risk metric, R(w; Σ). Thus, the robust portfolio optimization (3.2) is equivalent to its moment-based counterpart (2.1) in the population level. Plugging (4.7) into (4.5) leads to the following theorem. Theorem 4.7. Let {Xt}t Z be an absolutely continuous stationary process satisfying Conditions 1 and 2. Suppose that X1 ECd(µ, S, ξ) follows an elliptical distribution with covariance matrix Σ, and log d/T 0 as T . Then, we have |R(ewopt; Σ) R(wopt; Σ)| 2c2 where c is the gross-exposure constant, c Q is defined in (4.8), and r T is defined in (4.3). Thus, under the elliptical model, the optimal portfolio, ewopt, obtained from the robust portfolio optimization also leads to parametric rate of convergence for the standard moment-based risk. 5 Experiments In this section, we investigate the empirical performance of the proposed portfolio optimization approach. In Section 5.1, we demonstrate the robustness of the proposed approach using synthetic heavy-tailed data. In Section 5.2, we simulate portfolio management using the Standard & Poor s 500 (S&P 500) stock index data. The proposed portfolio optimization approach (QNE) is compared with three competitors. These competitors are constructed by replacing the covariance matrix Σ in (2.1) by commonly used covariance/scatter matrix estimators: 1. OGK: The orthogonalized Gnanadesikan-Kettenring estimator constructs a pilot scatter matrix estimate using a robust τ-estimator of scale, then re-estimates the eigenvalues using the variances of the principal components [14]. 2. Factor: The principal factor estimator iteratively solves for the specific variances and the factor loadings [33]. 3. Shrink: The shrinkage estimator shrinkages the sample covariance matrix towards a onefactor covariance estimator[10]. 5.1 Synthetic Data Following [19], we construct the covariance matrix of the asset returns using a three-factor model: Xj = bj1f1 + bj2f2 + bj3f3 + εj, j = 1, . . . , d, (5.1) where Xj is the return of the j-th stock, bjk is the loadings of the j-th stock on factor fk, and εj is the idiosyncratic noise independent of the three factors. Under this model, the covariance matrix of the stock returns is given by Σ = BΣf BT + diag(σ2 1, . . . , σ2 d), (5.2) where B = [bjk] is a d 3 matrix consisting of the factor loadings, Σf is the covariance matrix of the three factors, and σ2 j is the variance of the noise εi. We adopt the covariance in (5.2) in our simulations. Following [19], we generate the factor loadings B from a trivariate normal distribution, Nd(µb, Σb), where the mean, µb, and covariance, Σb, are specified in Table 1. After the factor loadings are generated, they are fixed as parameters throughout the simulations. The covariance matrix, Σf, of the three factors is also given in Table 1. The standard deviations, σ1, . . . , σd, of the idiosyncratic noises are generated independently from a truncated gamma distribution with shape 3.3586 and scale 0.1876, restricting the support to [0.195, ). Again these standard deviations are fixed as parameters once they are generated. According to [19], these parameters are obtained by fitting the three-factor model, (5.1), using three-year daily return data of 30 Industry Portfolios from May 1, 2002 to Aug. 29, 2005. The covariance matrix, Σ, is fixed throughout the simulations. Since we are only interested in risk optimization, we set the mean of the asset returns to be µ = 0. The dimension of the stocks under consideration is fixed at d = 100. Given the covariance matrix Σ, we generate the asset return data from the following three distributions. D1: multivariate Gaussian distribution, Nd(0, Σ); Table 1: Parameters for generating the covariance matrix in Equation (5.2). Parameters for factor loadings Parameters for factor returns 0.7828 0.02915 0.02387 0.01018 1.2507 -0.035 -0.2042 0.5180 0.02387 0.05395 -0.00697 -0.0350 0.3156 -0.0023 0.4100 0.01018 -0.00697 0.08686 -0.2042 -0.0023 0.1930 1.0 1.2 1.4 1.6 1.8 2.0 0.2 0.4 0.6 0.8 1.0 gross exposure constant (c) Oracle QNE OGK Factor Shrink 1.0 1.2 1.4 1.6 1.8 2.0 0.2 0.4 0.6 0.8 1.0 gross exposure constant (c) Oracle QNE OGK Factor Shrink 1.0 1.2 1.4 1.6 1.8 2.0 0.2 0.4 0.6 0.8 1.0 gross exposure constant (c) Oracle QNE OGK Factor Shrink Gaussian multivariate t elliptical log-normal 1.0 1.2 1.4 1.6 1.8 2.0 0.0 0.2 0.4 0.6 0.8 1.0 gross exposure constant (c) matching rate Factor Shrink 1.0 1.2 1.4 1.6 1.8 2.0 0.0 0.2 0.4 0.6 0.8 1.0 gross exposure constant (c) matching rate Factor Shrink 1.0 1.2 1.4 1.6 1.8 2.0 0.0 0.2 0.4 0.6 0.8 1.0 gross exposure constant (c) matching rate Factor Shrink Gaussian multivariate t elliptical log-normal Figure 1: Portfolio risks, selected number of stocks, and matching rates to the oracle optimal portfolios. D2: multivariate t distribution with degree of freedom 3 and covariance matrix Σ; D2: elliptical distribution with log-normal generating variate, log N(0, 2), and covariance matrix Σ. Under each distribution, we generate asset return series of half a year (T = 126). We estimate the covariance/scatter matrices using QNE and the three competitors, and plug them into (2.1) to optimize the portfolio allocations. We also solve (2.1) with the true covariance matrix, Σ, to obtain the oracle optimal portfolios as benchmarks. We range the gross-exposure constraint, c, from 1 to 2. The results are based on 1,000 simulations. Figure 1 shows the portfolio risks R(bw; Σ) and the matching rates between the optimized portfolios and the oracle optimal portfolios2. Here the matching rate is defined as follows. For two portfolios P1 and P2, let S1 and S2 be the corresponding sets of selected assets, i.e., the assets for which the weights, wi, are non-zero. The matching rate between P1 and P2 is defined as r(P1, P2) = |S1 T S2|/|S1 S S2|, where |S| denotes the cardinality of set S. We note two observations from Figure 1. (i) The four estimators leads to comparable portfolio risks under the Gaussian model D1. However, under heavy-tailed distributions D2 and D3, QNE achieves lower portfolio risk. (ii) The matching rates of QNE are stable across the three models, and are higher than the competing methods under heavy-tailed distributions D2 and D3. Thus, we conclude that QNE is robust to heavy tails in both risk minimization and asset selection. 5.2 Real Data In this section, we simulate portfolio management using the S&P 500 stocks. We collect 1,258 adjusted daily closing prices3 for 435 stocks that stayed in the S&P 500 index from January 1, 2003 2Due to the ℓ1 regularization in the gross-exposure constraint, the solution is generally sparse. 3The adjusted closing prices accounts for all corporate actions including stock splits, dividends, and rights offerings. Table 2: Annualized Sharpe ratios, returns, and risks under 4 competing approaches, using S&P 500 index data. QNE OGK Factor Shrink Sharpe ratio c=1.0 2.04 1.64 1.29 0.92 c=1.2 1.89 1.39 1.22 0.74 c=1.4 1.61 1.24 1.34 0.72 c=1.6 1.56 1.31 1.38 0.75 c=1.8 1.55 1.48 1.41 0.78 c=2.0 1.53 1.51 1.43 0.83 return (in %) c=1.0 20.46 16.59 13.18 9.84 c=1.2 18.41 13.15 10.79 7.20 c=1.4 15.58 11.30 10.88 6.55 c=1.6 15.02 11.48 10.68 6.49 c=1.8 14.77 12.39 10.57 6.58 c=2.0 14.51 12.27 10.60 6.76 risk (in %) c=1.0 10.02 10.09 10.19 10.70 c=1.2 9.74 9.46 8.83 9.76 c=1.4 9.70 9.10 8.12 9.14 c=1.6 9.63 8.75 7.71 8.68 c=1.8 9.54 8.39 7.51 8.38 c=2.0 9.48 8.13 7.43 8.18 to December 31, 2007. Using the closing prices, we obtain 1,257 daily returns as the daily growth rates of the prices. We manage a portfolio consisting of the 435 stocks from January 1, 2003 to December 31, 20074. On days i = 42, 43, . . . , 1, 256, we optimize the portfolio allocations using the past 2 months stock return data (42 sample points). We hold the portfolio for one day, and evaluate the portfolio return on day i + 1. In this way, we obtain 1,215 portfolio returns. We repeat the process for each of the four methods under comparison, and range the gross-exposure constant c from 1 to 25. Since the true covariance matrix of the stock returns is unknown, we adopt the Sharpe ratio for evaluating the performances of the portfolios. Table 2 summarizes the annualized Sharpe ratios, mean returns, and empirical risks (i.e., standard deviations of the portfolio returns). We observe that QNE achieves the largest Sharpe ratios under all values of the gross-exposure constant, indicating the lowest risks under the same returns (or equivalently, the highest returns under the same risk). 6 Discussion In this paper, we propose a robust portfolio optimization framework, building on a quantile-based scatter matrix. We obtain non-asymptotic rates of convergence for the scatter matrix estimators and the risk of the estimated portfolio. The relations of the proposed framework with its moment-based counterpart are well understood. The main contribution of the robust portfolio optimization approach lies in its robustness to heavy tails in high dimensions. Heavy tails present unique challenges in high dimensions compared to low dimensions. For example, asymptotic theory of M-estimators guarantees consistency in the rate OP ( p d/n) even for non-Gaussian data [34, 35]. If d n, statistical error diminishes rapidly with increasing n. However, when d n, statistical error may scale rapidly with dimension. Thus, stringent tail conditions, such as sub Gaussian conditions, are required to guarantee consistency for moment-based estimators in high dimensions [36]. In this paper, based on quantile statistics, we achieve consistency for portfolio risk without assuming any tail conditions, while allowing d to scale nearly exponentially with n. Another contribution of his work lies in the theoretical analysis of how serial dependence may affect consistency of the estimation. We measure the degree of serial dependence using the φ-mixing coefficient, φ(n). We show that the effect of the serial dependence on the rate of convergence is summarized by the parameter Cϵ, which characterizes the size of P n=1 φ(n). 4We drop the data after 2007 to avoid the financial crisis, when the stock prices are likely to violate the stationary assumption. 5c = 2 imposes a 50% upper bound on the percentage of short positions. In practice, the percentage of short positions is usually strictly controlled to be much lower. [1] Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77 91, 1952. [2] Michael J Best and Robert R Grauer. On the sensitivity of mean-variance-efficient portfolios to changes in asset means: some analytical and computational results. Review of Financial Studies, 4(2):315 342, 1991. [3] Vijay Kumar Chopra and William T Ziemba. The effect of errors in means, variances, and covariances on optimal portfolio choice. The Journal of Portfolio Management, 19(2):6 11, 1993. [4] Robert C Merton. On estimating the expected return on the market: An exploratory investigation. Journal of Financial Economics, 8(4):323 361, 1980. [5] Jarl G Kallberg and William T Ziemba. Mis-specifications in portfolio selection problems. In Risk and Capital, pages 74 87. Springer, 1984. [6] Jianqing Fan, Yingying Fan, and Jinchi Lv. High dimensional covariance matrix estimation using a factor model. Journal of Econometrics, 147(1):186 197, 2008. [7] James H Stock and Mark W Watson. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460):1167 1179, 2002. [8] Jushan Bai, Kunpeng Li, et al. Statistical analysis of factor models of high dimension. The Annals of Statistics, 40(1):436 465, 2012. [9] Jianqing Fan, Yuan Liao, and Martina Mincheva. Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(4):603 680, 2013. [10] Olivier Ledoit and Michael Wolf. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance, 10(5):603 621, 2003. [11] Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365 411, 2004. [12] Olivier Ledoit and Michael Wolf. Honey, I shrunk the sample covariance matrix. The Journal of Portfolio Management, 30(4):110 119, 2004. [13] Peter J Huber. Robust Statistics. Wiley, 1981. [14] Ricardo A Maronna and Ruben H Zamar. Robust estimates of location and dispersion for highdimensional datasets. Technometrics, 44(4):307 317, 2002. [15] Ramanathan Gnanadesikan and John R Kettenring. Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics, 28(1):81 124, 1972. [16] Yilun Chen, Ami Wiesel, and Alfred O Hero. Robust shrinkage estimation of high-dimensional covariance matrices. IEEE Transactions on Signal Processing, 59(9):4097 4107, 2011. [17] Romain Couillet and Matthew R Mc Kay. Large dimensional analysis and optimization of robust shrinkage covariance matrix estimators. Journal of Multivariate Analysis, 131:99 120, 2014. [18] Ravi Jagannathan and T Ma. Risk reduction in large portfolios: Why imposing the wrong constraints helps. The Journal of Finance, 58(4):1651 1683, 2003. [19] Jianqing Fan, Jingjin Zhang, and Ke Yu. Vast portfolio selection with gross-exposure constraints. Journal of the American Statistical Association, 107(498):592 606, 2012. [20] Peter J Rousseeuw and Christophe Croux. Alternatives to the median absolute deviation. Journal of the American Statistical Association, 88(424):1273 1283, 1993. [21] M. H. Xu and H. Shao. Solving the matrix nearness problem in the maximum norm by applying a projection and contraction method. Advances in Operations Research, 2012:1 15, 2012. [22] Alexandre Belloni and Victor Chernozhukov. ℓ1-penalized quantile regression in high-dimensional sparse models. The Annals of Statistics, 39(1):82 130, 2011. [23] Lan Wang, Yichao Wu, and Runze Li. Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association, 107(497):214 222, 2012. [24] Peter J Bickel and Elizaveta Levina. Covariance regularization by thresholding. The Annals of Statistics, 36(6):2577 2604, 2008. [25] T Tony Cai, Cun-Hui Zhang, and Harrison H Zhou. Optimal rates of convergence for covariance matrix estimation. The Annals of Statistics, 38(4):2118 2144, 2010. [26] Kai-Tai Fang, Samuel Kotz, and Kai Wang Ng. Symmetric Multivariate and Related Distributions. Chapman and Hall, 1990. [27] Harry Joe. Multivariate Models and Dependence Concepts. Chapman and Hall, 1997. [28] Rafael Schmidt. Tail dependence for elliptically contoured distributions. Mathematical Methods of Operations Research, 55(2):301 327, 2002. [29] Svetlozar Todorov Rachev. Handbook of Heavy Tailed Distributions in Finance. Elsevier, 2003. [30] Svetlozar T Rachev, Christian Menn, and Frank J Fabozzi. Fat-tailed and Skewed Asset Return Distributions: Implications for Risk Management, Portfolio Selection, and Option Pricing. Wiley, 2005. [31] Kevin Dowd. Measuring Market Risk. Wiley, 2007. [32] Torben Gustav Andersen. Handbook of Financial Time Series. Springer, 2009. [33] Jushan Bai and Shuzhong Shi. Estimating high dimensional covariance matrices and its applications. Annals of Economics and Finance, 12(2):199 215, 2011. [34] Sara Van De Geer and SA Van De Geer. Empirical Processes in M-estimation. Cambridge University Press, Cambridge, 2000. [35] Alastair R Hall. Generalized Method of Moments. Oxford University Press, Oxford, 2005. [36] Peter B uhlmann and Sara Van De Geer. Statistics for High-dimensional Data: Methods, Theory and Applications. Springer, 2011.