Robust estimation of high-dimensional covariance and … We also show that generalized thresholding is, in the terminology of Lam and Fan (2007), sparsistent, meaning that in addition to being consistent it estimates true zeros as zeros with probability tending to one. The tests are applicable when the data dimension is much larger than the sample size. Chen, Song Xi and Qin, Yingli (2010), "A two-sample test for high-dimensional data with applications to gene-set testing," The Annals of Statistics. The problems arise from the statistical analysis of large panel data in economics. Bai, Zhidong; Jiang, Dandan; Yao, Jianfeng; and Zheng, Shurong (2009), "Corrections to LRT on large-dimensional covariance matrix by RMT," The Annals of Statistics. Brian Williamson, "Shrinkage estimators for high-dimensional covariance matrices." Saowapha Chaipitak, Doctor of Philosophy (Statistics), 2012: in multivariate statistical analysis, it is necessary to know the properties of the covariance matrix of the data at hand before applying any further analysis. Estimation of large covariance matrices, particularly when the data dimension p is comparable to or larger than the sample size n, has attracted a lot of attention recently. In the central limit theorem for linear spectral statistics of sample covariance matrices, the theoretical mean and covariance are computed numerically. Largest entries of sample correlation matrices from equicorrelated normal populations. Global testing and large-scale multiple testing for high …
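Generalized thresholding replaces each off-diagonal entry of the sample covariance matrix by a shrunken value while leaving the variances alone. The sketch below, written in Python with NumPy (an assumption, since the text itself contains no code), shows two members of that family, hard and soft thresholding. The function name `threshold_covariance` is hypothetical, and so is the constant 2 in the threshold; a level of order sqrt(log p / n) is the scaling commonly used in this literature, but in practice the constant is usually tuned, for example by cross-validation.

```python
import numpy as np


def threshold_covariance(S: np.ndarray, lam: float, rule: str = "soft") -> np.ndarray:
    """Threshold the off-diagonal entries of S (hypothetical helper).

    The diagonal (the variances) is copied back unchanged.
    """
    if rule == "hard":
        # Hard thresholding: keep an entry only if its magnitude exceeds lam.
        T = np.where(np.abs(S) > lam, S, 0.0)
    elif rule == "soft":
        # Soft thresholding: shrink every entry toward zero by lam.
        T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    np.fill_diagonal(T, np.diag(S))  # restore the variances
    return T


rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))          # true covariance is the identity
S = np.cov(X, rowvar=False)
lam = 2.0 * np.sqrt(np.log(p) / n)       # constant 2 is an illustrative choice
S_soft = threshold_covariance(S, lam, "soft")
S_hard = threshold_covariance(S, lam, "hard")
```

On this identity-covariance example nearly every off-diagonal entry ends up exactly zero, which is the sparsistency property in miniature: true zeros are estimated as zeros.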
Two-sample tests for high-dimensional covariance matrices. Department of Statistics, University of California, Berkeley: we place ourselves in the setting of high-dimensional statistical inference, where the … In high-dimensional settings, these methods either perform poorly or are no longer applicable. Massive data analysis and statistical learning in many real applications require a careful understanding of the high-dimensional covariance structure. Large sample approximations for variance-covariance matrices of … As high-dimensional data become ubiquitous, standard estimators of the population covariance matrix become difficult to use. While the former approach is the classical framework for deriving asymptotics, the latter has received increasing attention because of its applications in the emerging field of big data. Sparse estimation of large covariance matrices via a …
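Many two-sample procedures for covariance matrices compare the two sample covariance matrices through a matrix norm. As a simple stand-in, and not the test proposed in any of the works mentioned here, the Python/NumPy sketch below uses the squared Frobenius distance between the two sample covariance matrices and calibrates it by permutation. The names `frobenius_stat` and `permutation_test` are hypothetical. Permutation calibration assumes the pooled rows are exchangeable under the null; centering each sample first removes mean differences, which makes the test approximate rather than exact.

```python
import numpy as np


def frobenius_stat(X: np.ndarray, Y: np.ndarray) -> float:
    """Squared Frobenius distance between the two sample covariance matrices."""
    D = np.cov(X, rowvar=False) - np.cov(Y, rowvar=False)
    return float(np.sum(D * D))


def permutation_test(X: np.ndarray, Y: np.ndarray, n_perm: int = 200, seed: int = 0):
    """Return (observed statistic, permutation p-value) for H0: Cov(X) == Cov(Y)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                 # center each sample separately
    Y = Y - Y.mean(axis=0)
    observed = frobenius_stat(X, Y)
    Z = np.vstack([X, Y])                  # pooled, centered rows
    n1 = X.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])  # random relabelling of the rows
        if frobenius_stat(Z[idx[:n1]], Z[idx[n1:]]) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)  # add-one keeps the p-value above 0
    return observed, p_value


rng = np.random.default_rng(6)
n1, n2, p = 100, 100, 40
X = rng.standard_normal((n1, p))           # Cov(X) = I
Y = 2.0 * rng.standard_normal((n2, p))     # Cov(Y) = 4 I, a clear difference
stat_alt, p_alt = permutation_test(X, Y)
stat_null, p_null = permutation_test(X, rng.standard_normal((n2, p)))
```

The Frobenius norm is only one choice; statistics built on other norms, or on the maximum entrywise difference, can have quite different power and limiting distributions.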
Analysis of high-dimensional data, whose dimension p can be much larger than the sample size … Spectral analysis of high-dimensional sample covariance matrices. Power computation for hypothesis testing with high … The sample covariance matrix is regarded as a poor estimator, since it is not consistent when p grows proportionally with n. Sample covariance matrices and high-dimensional data analysis, revised draft (April 2019): this is a revision of the book published by Cambridge University Press in 2015 (ISBN …).
High-dimensional data appear in many fields, and their analysis has become increasingly important in modern statistics. Cambridge Series in Statistical and Probabilistic Mathematics, 39. Bai, Z., "Exact separation of eigenvalues of large dimensional sample covariance matrices." Estimating covariance matrices is an important part of portfolio selection, risk management, and asset pricing. It is well known that the sample covariance matrix computed from the observed data is singular when the dimension is larger than the sample size. Covariance and correlation matrices play fundamental roles in every aspect of the analysis of multivariate data collected from a variety of fields, including business and economics, health care, engineering, and environmental and physical sciences. More recently, his research group has developed new statistical methods for high-dimensional data analysis. This book places particular emphasis on random vectors, random matrices, and random projections. Many problems in statistical pattern recognition and analysis require the classification … Sparse estimation of high-dimensional covariance matrices. Methods for estimating sparse and large covariance matrices.
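The singularity claim is easy to check numerically. After centering, the n rows of the data matrix span at most n - 1 dimensions, so the p x p sample covariance matrix has rank at most n - 1 and therefore at least p - n + 1 zero eigenvalues. A short Python/NumPy sketch with simulated Gaussian data (the sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 50                              # dimension larger than sample size
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)                # p x p, centered, divisor n - 1

rank = np.linalg.matrix_rank(S, tol=1e-8)  # at most n - 1 after centering
eigvals = np.linalg.eigvalsh(S)            # real eigenvalues in ascending order
n_zero = int(np.sum(eigvals < 1e-10 * eigvals.max()))
print(rank, n_zero)                        # prints 19 31
```

Because S is singular, it cannot be inverted, which is why precision-matrix methods and likelihood-based procedures need regularized covariance estimates in this regime.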
Spectrum estimation for large dimensional covariance … Estimating a covariance matrix based on a sample of multivariate … Covariance estimation for high dimensional data vectors. A large covariance matrix typically plays a role through either its quadratic and spectral functionals or a low-rank plus sparse structure.
Two-sample tests for high-dimensional covariance matrices. … those based on the other norm, which is especially the case when considering the limiting distribution of the test statistics. Arguably, the main successes of multiparametric statistics rest on methods from the spectral theory of large sample covariance matrices and their limiting spectra. These include estimation of and testing for large covariance matrices, volatility matrices, correlation matrices, precision matrices, and Gaussian graphical models. Generalized thresholding of large covariance matrices. Testing high-dimensional covariance matrices: the settings here include all cases where the dimension p = p_n …
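A concrete example of a test built on a limiting spectrum is the likelihood-ratio-type statistic for H0: Σ = I, namely tr(S) - log det(S) - p, which is the linear spectral statistic obtained by summing f(λ) = λ - log λ - 1 over the eigenvalues of S. The Python/NumPy sketch below centers it by p·F(y), where F(y) = 1 - ((y - 1)/y)·log(1 - y) is the integral of f against the Marchenko–Pastur law with ratio y = p/n < 1, and standardizes it with limiting mean -log(1 - y)/2 and variance -2·log(1 - y) - 2y. These constants are our reading of the RMT-corrected LRT (Bai, Jiang, Yao and Zheng, 2009) for real Gaussian data with known zero mean; treat them as assumptions to check against the paper. The helper name `corrected_lrt_identity` is hypothetical.

```python
import numpy as np


def corrected_lrt_identity(X: np.ndarray) -> float:
    """Standardized statistic for H0: Sigma = I (hypothetical helper).

    Assumes rows of X are i.i.d. real Gaussian with known zero mean and p < n.
    """
    n, p = X.shape
    if p >= n:
        raise ValueError("need p < n, otherwise log det S is -inf")
    y = p / n                                         # finite-sample ratio y_n
    S = X.T @ X / n                                   # known mean: no centering, divisor n
    _, logdet = np.linalg.slogdet(S)
    stat = np.trace(S) - logdet - p                   # sum of (l - log l - 1) over eigenvalues l
    centering = p * (1 - (y - 1) / y * np.log(1 - y)) # p times the MP integral of f
    mean = -0.5 * np.log(1 - y)                       # limiting mean of the fluctuation
    var = -2 * np.log(1 - y) - 2 * y                  # limiting variance of the fluctuation
    return float((stat - centering - mean) / np.sqrt(var))


rng = np.random.default_rng(3)
n, p = 400, 200
z_null = corrected_lrt_identity(rng.standard_normal((n, p)))                # H0 true
z_alt = corrected_lrt_identity(np.sqrt(2.0) * rng.standard_normal((n, p)))  # Sigma = 2 I
print(round(z_null, 2), round(z_alt, 2))
```

Under H0 the standardized value should behave like a draw from N(0, 1); under Σ = 2I the raw statistic shifts by roughly p(1 - log 2), far beyond that scale. The classical chi-square calibration, by contrast, breaks down when p is a sizable fraction of n.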
Techniques and asymptotic theory for high-dimensional covariance matrix estimates are quite different from those for low-dimensional ones. The abundance of high-dimensional data is one reason for the interest in the problem. Global testing and testing for high-dimensional covariance … However, it has long been observed that several well-known methods in multivariate analysis become inefficient, or even misleading, when the data dimension p is large compared with the sample size n. Jianfeng Yao (University of Hong Kong) and Zhidong Bai (Northeast Normal University). This book deals with the analysis of covariance matrices under two different assumptions.
Compressed covariance estimation with automated dimension … For inference on high-dimensional covariance matrices, there has been an array of work on the convergence of sample covariance matrices based on the spectral analysis of large-dimensional random matrices (Bai and Yin, 1993). Estimating structured high-dimensional covariance and …
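Results of this type (including Bai and Yin, 1993, for the smallest eigenvalue) say that, for i.i.d. standardized data with identity population covariance and p/n → y in (0, 1), the extreme eigenvalues of S converge to (1 - sqrt(y))² and (1 + sqrt(y))², the edges of the Marchenko–Pastur law, even though every population eigenvalue equals 1. A quick Python/NumPy simulation, with sizes chosen as an assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 2000, 500
y = p / n                                  # aspect ratio, here 0.25
X = rng.standard_normal((n, p))            # population covariance = identity
S = X.T @ X / n                            # known zero mean, divisor n
eig = np.linalg.eigvalsh(S)

lower, upper = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2   # 0.25 and 2.25
print(f"smallest {eig.min():.3f} vs {lower:.3f}; largest {eig.max():.3f} vs {upper:.3f}")
```

The sample eigenvalues spread over roughly [0.25, 2.25] instead of concentrating at 1, which is the sense in which S is a poor estimator of the population spectrum when p is comparable to n.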
University of California, Berkeley: estimating the eigenvalues of a population covariance matrix from a sample covariance matrix is a problem of fundamental importance in multivariate statistics. An overview on the estimation of large covariance and precision matrices, Jianqing Fan. The central limit theorems (CLTs) for linear spectral statistics of high-dimensional non-centralized sample covariance matrices have received considerable attention in random matrix theory and have been applied to many high-dimensional statistical problems. However, covariance estimation for high-dimensional vectors is a classically difficult problem … A less developed theory: nonparametric estimation of sparse means y_i … How to analyze high-dimensional, highly correlated vector time series. The latter renders formulations of test procedures and power analysis, as … In the classical setting of low dimension and large sample size, several methods have been developed for testing specific global patterns of covariance matrices.
High-dimensional covariance estimation provides accessible and comprehensive coverage of … We propose two tests for the equality of covariance matrices between two high-dimensional populations. Large sample covariance matrices and high-dimensional data analysis, by Jianfeng Yao, Shurong Zheng, and Zhidong Bai. Estimating covariance matrices has always been an important part of multivariate analysis, and estimating large covariance matrices, where the data dimension p is comparable to or larger than the sample size n, has gained particular attention recently, since high-dimensional data are now so common.
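One widely used estimator for the regime where p is comparable to or larger than n is linear shrinkage of S toward a scaled identity, with the closed-form intensity of Ledoit and Wolf (2004). This is named here as one standard example, not necessarily the estimator studied in the works listed in this section. The Python/NumPy sketch follows our understanding of that formula, using the normalized Frobenius norm ||A||² = tr(AAᵀ)/p; the function name `ledoit_wolf_sketch` is hypothetical, and scikit-learn's `sklearn.covariance.LedoitWolf` is a maintained implementation worth comparing against.

```python
import numpy as np


def ledoit_wolf_sketch(X: np.ndarray) -> tuple[np.ndarray, float]:
    """Shrink S toward mu * I; returns (estimate, weight on the target)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # center each column
    S = Xc.T @ Xc / n                            # sample covariance, divisor n
    mu = np.trace(S) / p                         # average eigenvalue of S
    target = mu * np.eye(p)
    delta2 = np.sum((S - target) ** 2) / p       # normalized ||S - mu I||^2
    if delta2 == 0.0:
        return S, 0.0                            # S already equals the target
    # (1 / n^2) * sum_k ||x_k x_k^T - S||^2, with the normalized norm.
    beta_bar2 = sum(np.sum((np.outer(x, x) - S) ** 2) for x in Xc) / (p * n**2)
    beta2 = min(beta_bar2, delta2)
    shrink = beta2 / delta2                      # weight on the target, in [0, 1]
    return shrink * target + (1.0 - shrink) * S, float(shrink)


rng = np.random.default_rng(5)
n, p = 40, 100                                   # more variables than observations
X = rng.standard_normal((n, p))                  # true covariance is the identity
Sigma_hat, shrink = ledoit_wolf_sketch(X)
S = np.cov(X, rowvar=False, bias=True)           # same divisor n as inside the sketch
err_sample = np.linalg.norm(S - np.eye(p))       # Frobenius-norm errors
err_shrunk = np.linalg.norm(Sigma_hat - np.eye(p))
```

Even with p > n the shrunk estimate is positive definite, since its smallest eigenvalue is at least shrink · mu, whereas S itself is singular.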
Sample covariance matrices are widely used in multivariate statistical analysis. High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a … A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Large random matrices and applications to high-dimensional … Minimax rates of convergence for estimating several classes of structured covariance and precision matrices, including bandable, Toeplitz, and sparse covariance matrices as well as sparse precision matrices, are given under the spectral norm loss. It teaches basic theoretical skills for the analysis … We study high-dimensional sample covariance matrices based on … Tests for high-dimensional covariance matrices. Optimal hypothesis testing for high-dimensional covariance … Estimating high-dimensional covariance matrices and its applications.
High-dimensional variance-covariance matrices play a crucial role in those areas, since they provide information on the second-order dependence among the coordinates. Mohsen Pourahmadi, High-dimensional covariance estimation. High-dimensional probability is an area of probability theory that studies random objects in R^n where the dimension n can be very large. Najim, Jamal and Yao, Jianfeng (2016), "Gaussian fluctuations for linear spectral statistics of large random covariance matrices," The Annals of Applied Probability. One test concerns the whole variance-covariance matrices, and the other concerns the off-diagonal sub-matrices, which define the covariance between two non-overlapping segments of the high-dimensional random vectors. Peter J. Bickel and Elizaveta Levina (University of California, Berkeley, and University of Michigan): this paper considers estimating a covariance matrix of p variables from n observations by either banding or tapering the sample covariance matrix.