Chakrabarti and Sen
*Rituparna Sen, Indian Statistical Institute, Bangalore, KA, India.
Limiting Spectral Distribution of High-dimensional Hayashi-Yoshida Estimator of Integrated Covariance Matrix
Abstract
In this paper, we consider the estimation of the integrated covariance matrix from high-frequency data for a high-dimensional stock price process. The Hayashi-Yoshida covolatility estimator is an improvement over realized covolatility for asynchronous data and works well in low dimensions. However, it becomes inconsistent and unreliable in the high-dimensional situation. We study the bulk spectrum of this matrix and establish its connection to the spectrum of the true covariance matrix in the limiting case where the dimension goes to infinity. The results are illustrated with simulation studies in finite, but high, dimensional cases. An application to real tick-by-tick data on 50 stocks is presented.
keywords:
Asynchronicity, Integrated Covariance, Realized Variance, Spectral Distribution, High-frequency data.
1 Introduction
Intraday financial data on multiple stocks are almost always nonsynchronous. If not adjusted for appropriately, nonsynchronous trading can heavily affect multivariate stock price data analysis and the resulting inference 1. An analysis of intraday financial data fails to capture the reality of the financial market if the effect of asynchronicity is ignored 2. Despite this problem, intraday data are important for measuring the (co)variance of daily log returns of a given set of securities and can offer additional information compared with estimates obtained from daily financial data. This covariance is called the integrated (co)variance (or integrated (co)volatility). For a low-dimensional stock price process, the integrated covariance can be accurately estimated by the Hayashi-Yoshida estimator 3. For high-dimensional data, however, it suffers from the same problems as any sample covariance matrix in a high-dimensional setup 4. As a consequence, its eigenvalue spectrum deviates considerably from its population counterpart. In this study, we derive the limiting spectral distribution of the Hayashi-Yoshida estimator for high-dimensional data.
In 1979, T. W. Epps reported that stock return correlations decrease as the sampling frequency of the data increases 1. This is one of the earliest manifestations of the problems caused by asynchronicity and is known as the Epps effect. The phenomenon was later reported in several studies of different stock markets 5, 6, 7 and foreign exchange markets 8, 9. It is primarily a result of the asynchronicity of price observations and the existing lead-lag relation between asset prices 10, 11, 12. Empirical results showed that considering only the synchronous or nearly synchronous ticks mitigates the problem significantly 10. Several studies showed that asynchronicity can induce potentially serious bias in the estimates of moments and co-moments of asset returns, such as their means, variances, covariances, betas, and autocorrelation and cross-autocorrelation coefficients 12, 13, 14, 15.
Integrated volatility is defined as the variance of the log return of a given security over a day. Because of its high frequency, intraday financial data have been shown to be more efficient in measuring daily volatility than daily financial data 16. For a single stock, Merton (1980) showed that the variance over a fixed interval can be estimated accurately as the sum of squared realizations, as long as the data are available at a sufficiently high sampling frequency 17. This estimator is known as realized volatility. Univariate modeling is often not sufficient, however; it is also important to model the correlation dynamics between several assets. Hence one of the parameters of interest to estimate and infer about accurately is the integrated covolatility, or integrated covariance matrix. Analogously to the realized volatility, the realized covolatility matrix (realized covariance matrix) is defined for a multivariate stock price process. But the realized covolatility matrix relies upon synchronous observations and cannot be readily extended to asynchronous data. Therefore, in order to evaluate the realized covolatility, we first have to “synchronize” the data. Fixed clock time sampling and refresh time sampling 18 are two synchronizing algorithms widely used in practice. But the realized covariance, evaluated on a synchronous grid, is biased 3. Hayashi and Yoshida 3 proposed an unbiased estimator of the integrated covolatility that is applicable to intraday data without the need for synchronization. We will call this estimator the Hayashi-Yoshida estimator. Although the Hayashi-Yoshida estimator is also biased in the presence of microstructure noise, a bias-corrected version has been developed 19.
The Hayashi-Yoshida estimator has good asymptotic properties as long as the data come from an underlying low-dimensional diffusion process. But as the dimension of the data increases, the estimator becomes inefficient. Developing a good estimator of high-dimensional covolatility is challenging unless we impose some structure. A consistent and positive definite estimator has been proposed based on blocking and regularization techniques 20. The central idea is to obtain one large covariance matrix from a series of smaller covariance matrices, each based on a different sampling frequency. A shrinkage estimator of the covariance matrix with optimal shrinkage intensity, which is also important for portfolio optimization, reduces the estimation error significantly as well 21. Many other modified shrinkage estimators with good asymptotic properties have been proposed and applied in the financial context 22, 23. Mixed frequency factor models, which use high-frequency data to estimate the factor covariance and low-frequency data to estimate the factor loadings, are also used to estimate high-dimensional covariance matrices 24. The composite realized kernel approach, which estimates each entry of the ICV matrix optimally (in terms of bandwidth and data loss), has been proposed and its asymptotic properties established 25. High dimensionality affects the subsequent calculation of many important quantities based on the covariance matrix. It was shown in 26 that high dimensionality affects the solution of the Markowitz problem and results in underestimation of risk.
Instead of imposing a structure, an alternative avenue for investigating a high-dimensional covariance matrix is to study its spectral distribution. The limiting spectral distribution of the realized covariance matrix obtained from synchronized data was established in 27. Recently, an asymptotic relationship has been established between the limiting spectral distributions of the true sample covariance matrix and the noisy sample covariance matrix 28. The estimation of the integrated covariance matrix based on noisy high-frequency data with multiple transactions was studied in 29 using random matrix theory. The limiting spectral distribution of the covariance matrices of time-lagged processes was obtained in 30. The limiting spectral distribution of the sample covariance matrix was also derived under a VARMA(p,q) model assumption 31. In this paper, we establish the limiting spectral distribution of the Hayashi-Yoshida estimator, which has not yet been studied. The rest of the paper is organized as follows. In section 2, we discuss the background of the problem. Section 3 gives a very brief introduction to random matrix theory. In section 4, we determine the limiting spectral distribution of the high-dimensional Hayashi-Yoshida estimator. Simulated data analysis results are presented in section 6.1. A summary of this work and a brief discussion of some further directions are given in section 7.
2 Integrated Covariance Matrix and Asynchronicity
Suppose we have $p$ stocks whose price processes are denoted by $S_t^{j}$ for $j = 1, \ldots, p$, and define the $j$th log price process as $X_t^{j} = \log S_t^{j}$. Let $X_t = (X_t^{1}, \ldots, X_t^{p})^{T}$. Then we can model $X_t$ as a $p$-dimensional diffusion process described as
(1) $dX_t = \mu_t\,dt + \sigma_t\,dW_t$,
where $\mu_t$ is a $p$-dimensional drift process and $\sigma_t$ is a $p \times p$ matrix, called the instantaneous covolatility process. $W_t$ is a $p$-dimensional standard Brownian motion. The integrated covariance (ICV) matrix, our parameter of interest, is defined by
(2) $\Sigma = \int_0^1 \sigma_t \sigma_t^{T}\,dt$.
In the univariate case, the most widely used estimator of integrated variance is called the realized variance. For $p$ stocks, an analogous covariance estimator can be defined in the following way.
2.1 Realized covariance
Note that the transactions in each stock occur at random time points. Let $n_j$ be the number of observations for the $j$th stock. The arrival time of the $i$th observation of the $j$th stock is denoted by $t_i^{j}$. When the observations are assumed to be synchronous, i.e. $t_i^{j} = t_i$ for all $j$ (so that $n_j = n$), the Realized Covariance (RCV) matrix can be defined as follows:
(3) $\mathrm{RCV} = \sum_{i=1}^{n} \Delta X_i\, \Delta X_i^{T}$, where $\Delta X_i = X_{t_i} - X_{t_{i-1}}$.
Figure 1: (a) Nonsynchronous observations. (b) Synchronized observations illustrated by arrows.
2.2 Hayashi-Yoshida estimator
For asynchronous intraday data, the realized covariance cannot be calculated directly unless we synchronize the data by some ad hoc method. This means that we have to throw away some of the observations so that synchronized vectors of observations can be formed. In Fig. 1, we illustrate this for the bivariate case. Fig. 1(a) shows what nonsynchronous data look like. The circles and quadrangles represent the transaction times for the first and second stock respectively. The synchronization of transaction times, as indicated by the arrows, is shown in Fig. 1(b). A synchronized dataset can be formed by pairing the stock prices corresponding to the synchronized time points. For example, two consecutive observations can be and . This is equivalent to “pretending” that is observed at instead of at and similarly, is observed at instead of at . For this reason, synchronization methods can be expressed as a problem of choosing a set of sampling times , from the set . The corresponding price of each stock at is taken as the price observed previous to . We can see from Fig. 1(b) that for the first stock, two observations at time and are not synchronized with any time point in the second stock and can therefore be excluded from the study. Extending this to stocks, let us denote the number of resulting synchronized vectors by . Then it is evident that .
In Fig. 1(b), we have applied a particular synchronization method called refresh time sampling 32, 33.
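Hayashi and Yoshida 3 proposed an alternative estimator of the ICV matrix that does not require the dataset to be synchronized and can therefore be applied directly to asynchronous data. Before defining it in high dimension, we introduce it for the bivariate case. In the following expression, instead of writing $t_i^{1}$ and $t_j^{2}$ we simply write $s_i$ and $t_j$. For two stocks, the Hayashi-Yoshida estimator is defined as follows:
(4) $\mathrm{HY}_{12} = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \left(X^{1}_{s_i} - X^{1}_{s_{i-1}}\right)\left(X^{2}_{t_j} - X^{2}_{t_{j-1}}\right) I\!\left((s_{i-1}, s_i] \cap (t_{j-1}, t_j] \neq \emptyset\right)$,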
where is an indicator function that takes value 1 when the condition is satisfied. Fig. 2 illustrates the computation. will contribute to the sum in Eq. (4) as and are overlapping intervals. But will not contribute to the sum in Eq. (4) as the intervals and are non-overlapping.
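The double sum in Eq. (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hayashi_yoshida` and its arguments are hypothetical, the observation intervals are taken as left-open and right-closed, and two intervals are counted as overlapping only when their intersection has positive length.

```python
import numpy as np

def hayashi_yoshida(t1, x1, t2, x2):
    """Hayashi-Yoshida covariance of two asynchronously observed log-price series.

    t1, t2: strictly increasing observation times of stock 1 and stock 2.
    x1, x2: log prices observed at those times.
    (All names are illustrative assumptions.)
    """
    t1, x1, t2, x2 = (np.asarray(a, dtype=float) for a in (t1, x1, t2, x2))
    r1, r2 = np.diff(x1), np.diff(x2)          # returns over consecutive observations
    # Interval i of stock 1 is (t1[i], t1[i+1]]; interval j of stock 2 is (t2[j], t2[j+1]].
    lo = np.maximum(t1[:-1, None], t2[None, :-1])
    hi = np.minimum(t1[1:, None], t2[None, 1:])
    overlap = (hi > lo).astype(float)          # 1 where the two intervals intersect
    return float(r1 @ overlap @ r2)            # sum of products of overlapping returns

# Stock 1 is observed at 0 and 1, stock 2 at 0, 0.4 and 1: both returns of
# stock 2 overlap the single return of stock 1, so both products contribute.
print(hayashi_yoshida([0, 1], [0, 1], [0, 0.4, 1], [0, 2, 5]))
```

The overlap matrix has one entry per pair of interarrival intervals, so the memory cost of this sketch grows with the product of the two sample sizes; it is meant to make the definition concrete rather than to be efficient.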
2.2.1 Hayashi-Yoshida covariance and Refresh-time sampling
Even though the Hayashi-Yoshida estimator does not require prior synchronization of intraday data, we will show that the estimator still throws away some of the data points. Moreover, these data points are exactly the ones discarded by refresh-time sampling. To see this, let us consider the case shown in Fig. 2. The Hayashi-Yoshida estimate is:
(5) |
But is just the difference between the log-price at and the log-price at , which does not require any information on the stock price at any time in between. Therefore, although the Hayashi-Yoshida estimator does not require presynchronization, it actually throws away exactly the same observations as those discarded by refresh-time sampling. As a consequence, the value of the Hayashi-Yoshida covariance on the full data is equal to the Hayashi-Yoshida estimator on the set of refresh-time pairs. Synchronizing the data using refresh-time sampling before computing the covariance can reduce the computational cost quite significantly.
When we move from the bivariate case to higher dimensions, synchronizing every pair of variables separately would not be very efficient. It would be preferable to synchronize the data for all the stocks simultaneously. This can be achieved by applying the “all refresh method”, which results in a set of synchronous sampling time points defined in the following way:
(6) |
where is the number of observations before time 34, 35. In this paper, we define the Hayashi-Yoshida estimator on the refresh-time sampling points. The theoretical implication is that, given the synchronized data, we can assume the number of observations of each stock to be the same. We will denote this common sample size as .
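The all-refresh construction can be illustrated with a short Python sketch. It assumes the usual definition of refresh times: the first refresh time is the first instant by which every stock has been observed at least once, and each later refresh time is the first instant by which every stock has been observed again. The function names `refresh_times` and `previous_tick` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def refresh_times(times):
    """All-refresh sampling times for a list of sorted observation-time arrays."""
    times = [np.asarray(t, dtype=float) for t in times]
    idx = [0] * len(times)            # index of the next unused observation per stock
    taus = []
    while all(i < len(t) for i, t in zip(idx, times)):
        # Wait until the slowest stock has produced a fresh observation.
        tau = max(t[i] for i, t in zip(idx, times))
        taus.append(tau)
        # Observations at or before tau are used up; move to the first one after tau.
        idx = [int(np.searchsorted(t, tau, side="right")) for t in times]
    return np.array(taus)

def previous_tick(t, x, taus):
    """Last price observed at or before each refresh time."""
    pos = np.searchsorted(np.asarray(t, dtype=float), taus, side="right") - 1
    return np.asarray(x)[pos]

t1, x1 = [0.1, 0.2, 0.5, 0.9], [1, 2, 3, 4]
t2, x2 = [0.3, 0.4, 0.8], [10, 11, 12]
taus = refresh_times([t1, t2])
print(taus, previous_tick(t1, x1, taus), previous_tick(t2, x2, taus))
```

Every stock contributes exactly one price per refresh time, which is why the number of observations can be treated as common to all stocks after this step.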
For stocks the Hayashi-Yoshida estimator is defined as the following :
(7) |
and ‘’ is the Hadamard product and is a matrix with element is the indicator function involving interarrival of stock and interarrival of stock: , where . In other words if two interarrivals intersect then product will contribute to the sum. In Fig. 2, and are shown.
2.3 Scaled Realized Covariance estimator
In this section we show the “closeness” of the Hayashi-Yoshida estimator with a scaled realized estimator which is motivated from the intraday covariance estimator proposed in 36. We determine the scaling coefficients for the bivariate case. The result will be key in our proof of the Limiting Spectral distribution. For the bivariate case, let us denote the log price for two stocks at a particular time as . Following 36, we synchronize the data for two stocks in the following fashion:
Algorithm ():
1. For , assign and .
2. While and :
• If then find . The th pair will be . Modify .
• If then find . The th pair will be . Modify
• Modify . and .
The pairs created by this algorithm are identical to the pairs created by refresh time sampling, but the algorithm accommodates more information by retaining the actual transaction times. To see this, note that in Eq. (6), a common set of synchronized points is defined for all stocks. For each stock, the last observed stock price prior to was taken to be the price at . Therefore, from refresh time sampling it is not possible to retrieve the actual transaction times of the synchronized pairs. Algorithm , on the other hand, keeps this information. Instead of writing we shall henceforth write .
2.3.1 Overlapping and non-overlapping regions for return construction
For two such consecutive synchronized pairs of stock prices, we can now consider the bivariate return as: . Note that, in this bivariate return vector, the first component (for ) is defined on the interval and the return on the is defined on the interval . It can be shown that the overlapping and nonoverlapping parts of these two intervals play a crucial role in the bias of the estimated covariance 36. To define the overlap, we first illustrate the four possible configurations of the intervals. In Fig. 3 we show these four configurations of intervals corresponding to a particular return vector, constructed from synchronized pairs of observations. More formally, suppose for some and . Then one of these four configurations is true:
(8) |
Given this set of possible configurations, we define a random variable , denoting the overlapping time interval of th interarrivals corresponding to and as
(9) |
Fig. 3 illustrates the overlapping regions for all four configurations described in Eq. 8.
Figure 3: Panels (a)-(d) show the overlapping regions for the four configurations in Eq. (8).
The next theorem says that in presence of high-frequency data, the performance of Hayashi-Yoshida covariance would be very similar to a scaled realized covariance. We make the following set of assumptions:
- (): The log return process has independent and stationary increments.
- (): The observation times (arrival processes) of the two stocks are independent renewal processes and as .
- (): Estimation is based on paired data obtained by algorithm .
Define a scaled (pairwise) realized covariance (SRCV) on a synchronized data as follows:
(10) |
where
The following theorem says that, for all practical purposes, this estimator performs as well as the Hayashi-Yoshida estimator.
Theorem 2.1.
Under the assumptions SRCV is a consistent estimator of the pairwise integrated covariance.
The ’s in Eq. (10) are the scaling coefficients, which are functions of the arrival (transaction) times only (they do not depend on the stock prices at those time points). The conventional refresh-time points in Eq. (6) do not allow us to calculate these scaling coefficients. The algorithm , on the other hand, enables us to calculate the scaling coefficients as it preserves the actual arrival times of the synchronized pairs. The importance of this estimator will be evident in
As both the Hayashi-Yoshida (pairwise) covariance and the SRCV are consistent estimators, they will be “close” to each other when a sufficient amount of data is available. For a -dimensional process, the SRCV matrix can be written as follows:
(11) |
where is a symmetric matrix consisting of the pairwise scaling coefficients. This matrix will help us to make an important assumption necessary to determine the LSD of Hayashi-Yoshida matrix.
2.4 Inconsistency in high-dimension
From multivariate statistical theory we know that the sample covariance matrix is consistent for the population covariance matrix. In the high-dimensional scenario, when the dimension grows at the same or a higher rate than the number of observations, this property no longer holds. It can be shown that under a high-dimensional setup the following is true:
where is the operator norm, is the usual sample covariance matrix and is the population covariance matrix.
But what impact does this inconsistency of the sample covariance matrix have on the eigenvalues and the eigenvectors? Weyl’s theorem and the Davis-Kahan theorem show that when the sample covariance matrix is not consistent, neither the sample eigenvalues nor the sample eigenvectors converge to their true counterparts 4, 37. Therefore it is worthwhile to study the limiting distribution of the sample eigenvalues and its relation to the distribution of the true eigenvalues.
For low dimensions, the Hayashi-Yoshida estimator is a consistent estimator of the ICV matrix. But for a high-dimensional stock price process, when the dimension grows at the same or a higher rate than the number of observations, neither the RCV nor the Hayashi-Yoshida estimator is consistent anymore. It is evident from the above discussion that the sample spectrum of the Hayashi-Yoshida estimator deviates significantly from the true spectrum. In this paper, we study the asymptotic behavior of the distribution of the eigenvalues of the Hayashi-Yoshida matrix.
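The loss of consistency described in this section can be seen in a small numerical experiment. The sketch below is only an illustration with an ordinary sample covariance matrix of iid standard normal vectors (not the Hayashi-Yoshida estimator); the function name `operator_norm_error`, the sample sizes and the seed are arbitrary choices made for this example.

```python
import numpy as np

def operator_norm_error(n, p, seed=0):
    """Operator-norm distance between the sample covariance of n iid N(0, I_p)
    vectors and the true covariance I_p."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, p))
    s = x.T @ x / n                                  # sample covariance (mean known to be zero)
    return float(np.linalg.norm(s - np.eye(p), 2))   # largest singular value of the error

low = operator_norm_error(n=200, p=2)      # dimension much smaller than sample size
high = operator_norm_error(n=200, p=200)   # dimension equal to sample size
print(f"p/n = 0.01: {low:.3f}   p/n = 1: {high:.3f}")
```

With the same number of observations, the error is small when the dimension is tiny relative to the sample size and stays large when the two are comparable, which is the regime studied in the rest of the paper.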
3 Spectral Distribution
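The empirical spectral distribution (ESD) of a $p \times p$ symmetric (more generally Hermitian) matrix $A$ is defined as
$F^{A}(x) = \frac{1}{p}\,\#\{j \le p : \lambda_j \le x\}$,
where the $\lambda_j$’s are the eigenvalues of the matrix $A$ and $\#E$ denotes the cardinality of the set $E$. The limit distribution ($F$) of the ESD is called the limiting spectral distribution (LSD). One commonly used method of finding the LSD is through the Stieltjes transform.
Stieltjes transform: Let $A$ be a $p \times p$ Hermitian matrix and $F^{A}$ be its ESD. Then the Stieltjes transform of $F^{A}$ is defined as
$m_{F^{A}}(z) = \int \frac{1}{x - z}\,dF^{A}(x) = \frac{1}{p}\,\mathrm{tr}\,(A - zI)^{-1}$,
where $z \in \mathbb{C}^{+} = \{z \in \mathbb{C} : \Im z > 0\}$, $\Im z$ being the imaginary part of $z$. The importance of the Stieltjes transform in random matrix theory is due to Theorem B.9 and Theorem B.10 of 38.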
These theorems suggest that in order to determine the LSD, it is enough to obtain the limit of the Stieltjes transform.
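For a finite matrix, both objects of this section can be computed directly from the eigenvalues. The following Python sketch assumes the standard definitions (the ESD as the proportion of eigenvalues not exceeding a point, and the Stieltjes transform as the average of the reciprocals of the eigenvalues shifted by a complex argument with positive imaginary part); the function names `esd` and `stieltjes` are illustrative.

```python
import numpy as np

def esd(a, x):
    """Empirical spectral distribution of a symmetric matrix a, evaluated at points x."""
    lam = np.linalg.eigvalsh(a)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Fraction of eigenvalues that are <= each evaluation point.
    return np.mean(lam[None, :] <= x[:, None], axis=1)

def stieltjes(a, z):
    """Stieltjes transform of the ESD of a symmetric matrix a at a complex point z."""
    lam = np.linalg.eigvalsh(a)
    return np.mean(1.0 / (lam - z))

a = np.diag([1.0, 3.0])
z = 1j
print(esd(a, [0.0, 2.0, 4.0]))   # step function with jumps at the eigenvalues
print(stieltjes(a, z))
# The same value, written as a normalized trace of the resolvent.
print(np.trace(np.linalg.inv(a - z * np.eye(2))) / 2)
```

The agreement of the last two printed values is the identity between the integral form and the trace form of the transform, which is what makes the transform convenient for matrix calculations.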
4 Spectral Analysis of High Dimensional Hayashi’s Estimator
Based on our model (Eq. 1), the distribution of can be written like the following:
where is a diagonal matrix and is a -dimensional vector with its components being iid standard normal distribution. As a consequence, the Hayashi-Yoshida estimator has the same distribution as that of :
Hence, to determine the LSD of Hayashi-Yoshida estimator, it is enough to find the limit of the Stieltjes transform of .
The set of assumptions () necessary for determining the limiting spectral distribution of Hayashi-Yoshida estimator are given below:
-
()
as .
-
()
( ) a.s. and has a finite second order moment.
-
()
’s () are iid with mean , variance 1 and finite moments of all orders.
-
()
and let such that
(12) and
(13) where with is a sufficiently large positive number being positive semidefinite.
-
()
There exists such that Also there exists a nonnegative cadlag process such that
(14) -
()
There exists a and such that for all , almost surely.
Before stating the main theorem, we present a brief motivation of the assumption . In Sec. 2.3, we have seen that under a low dimensional setup, SRCV is also a consistent estimator for ICV. Therefore with high frequency data we can expect and to be “close” to each other. The matrix replaces the matrix in Eq. (11) with a constant matrix where each element of the matrix is . Assumption claims that under high-dimensional setup, even when both and are inconsistent, upon choosing the ’s, and have a closeness in the sense expressed by Eq. (12) & Eq. (13).
Now we state our main theorem.
Theorem 4.1.
Under the assumptions (), almost surely, ESD of converges in distribution to a probability distribution with Stieltjes transform
(15) |
where can be solved by the equation:
(16) |
This theorem establishes the Limiting spectral distribution of Hayashi-Yoshida estimator by determining its spectral distribution through the limiting spectral distribution of . In other words the theorem establishes the link between the limiting spectral distributions of and Hayashi-Yoshida estimator through its Stieltjes transform.
See Appendix for the lemmas that would lead to the theorem.
Now we are ready to prove Theorem 4.1.
Proof 4.2.
Define
where and . We denote the Stieltjes transform of by ,
Similarly, Stieltjes transform of and
Stieltjes transform of are denoted by
and respectively.
In order to show the convergence of it is enough to show
for all with sufficiently large and the condition,
in Lemma 8, is satisfied.
We will start by showing that and
converge to the same limit.
According to our assumption, , .
It is enough to show that
We write where
where with .
It is sufficient to show that
The convergence of and follows from Lemma A.10 and Lemma A.12. The convergence of and can be proved similarly.
Now due to Assumption 5, we have
As
we get
(17) |
The fact that and implies and therefore . So from Lemma 5, it is clear that .
Now
Also, equation 17 implies
Due to Lemma 6 and Lemma 7, and . The same will be true for their limits. Now, to show that is unique, it is enough to show that if there are two solutions and (and therefore ) then . If possible, let there be two limiting spectral distributions and such that . To obtain a contradiction, it is enough to show .
Note that,
But,
On simplification, this gives
(18) |
As ,
And , implies
So for , with sufficiently large, Eq. (18) can not be true. So is unique.
The above theorem remains true for a time-varying instantaneous covolatility process under a slightly stronger set of assumptions. Following 27, we can assume that the time-varying covolatility process can be decomposed into two parts: a time-varying cadlag process and a symmetric matrix that does not vary with time. Formally, where does not depend on time and, as mentioned, is a time-varying cadlag process.
If we assume
this, then has the same distribution as the following-
where ; ’s are iid normal with mean and variance for and and when , and
.
Therefore we are interested in the spectral distribution of
as both and have the same LSD.
Suppose now we denote the Limiting spectral distribution of as . Then along with other assumptions of we need the following additional assumptions
-
()
where does not depend on time and is a time-varying cadlag process.
-
()
are independent of .
Then the result holds for time-varying covolatility process. How these conditions impose constraints on the time-varying covolatility or more specifically the cadlag process is a separate but interesting question to study. We write the theorem below
5 Spectral Analysis of Hayashi’s Estimator when
Due to the fact that Hayashi’s estimator is unbiased we will be concerned about the following matrix (let us call ),
As in the previous section, we need a set of assumptions, which we will call .
-
(1)
: as and .
-
(2)
: (, ) are iid Gaussian random variables with , .
-
(3)
: as where is a distribution function.
-
(4)
: Define, . Then , and are positive definite, , , and .
-
(5)
: is bounded above.
-
(6)
: and as
Now we are ready to state the main theorem.
Theorem 5.1.
If the above assumptions () are true then the empirical spectral distribution of almost surely converges weakly to a nonrandom distribution as , whose Stieltjes transform is determined by the following system of equations:
for any .
The proof of this theorem follows a path similar to that in 39. Before proving the theorem, we define some quantities and make some observations. Let be the spectral decomposition of . Now define,
Let be the th row of and be the matrix after deleting the th row. Define,
where is the matrix obtained by deleting th diagonal element of .
Now we will make some remarks. Justifications of the remarks are given in the appendix.
-
Remark 1:
and .
-
Remark 2:
-
Remark 3:
-
Remark 4:
Now we are ready to prove Theorem 5.1.
Proof 5.2 (Proof of Theorem 5.1).
Suppose the Stieltjes transform of is . So .
Observe that,
This implies the following:
(19) |
Notice that,
Now,
The last line of the above derivation is a consequence of Assumption . Moreover as and , we have a.s. This means that we can derive valuable information about by studying the spectral distribution of .
Note that, according to our definitions
(20) |
Now from Eq. (19) we have,
Define , then similar derivation will lead to
(21) |
But again,
By Remark 4, we know that . Define, . Therefore from Eq. (21),
Let us consider the second part of the right hand side. This equals
We try to argue that as ,
It can be shown that the above expression is (see 39).
It can be shown that all three terms is .
We further observe that
and
This implies a.s.
With this we showed that
where as .
If we replace by and call the resulting new by (similarly by ) then it is easy to show that
and
Hence by Lemma 9,
and
By Borel-Cantelli lemma, a.s. and a.s. and thus we also have . Now as and is bounded, is bounded. So by dominated convergence theorem converges to . But as a.s., we have a.s. Similarly for a.s. So can be evaluated by the following two equations,
6 Data Analysis
6.1 Simulated Data Analysis
Although we are working in a high-dimensional setup, the computational complexity of the Hayashi-Yoshida estimator is worth paying attention to. The fact that the Hayashi-Yoshida estimator takes much longer to compute than the realized covariance estimator restricts us to a moderate dimension and sample size in the simulation study. We simulated 30 stocks with 500 observations each, where the spot volatility matrix is taken to be . The empirical cdf of the Hayashi-Yoshida estimator and the cdf of the integrated covariance matrix are shown in Fig. 4. The red line is obtained by generating data from the same underlying process on a sufficiently fine synchronous grid and calculating the realized covariance for these data. It serves as a proxy for the spectrum of the integrated covariance matrix. One limitation of the simulation study is that we have taken the same number of observations for each stock. This is of course not a practical assumption but, as discussed in Sec. 2.2.1, it corresponds to refresh time sampling.
We create the nonsynchronous data using the following algorithm.
Algorithm
1. Initialise (the number of stocks), (the number of observations in each stock) and (the interval represents a day).
2. Draw a sample of size from a uniform distribution on , where represents a day. Denote it by . Assume .
3. Generate random vectors () from a -dimensional distribution. Denote them by . Scale the ’s appropriately to represent the increment in returns for the interval (): .
4. Define
5. For : from , take a random subset of size . Denote it by . The data for the th stock are {(}. Update , i.e. remove the time points chosen for from .
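The steps of the algorithm can be sketched in Python as follows. Several details are assumptions made for this illustration because they are not recoverable from the description above: the pooled grid is taken to have one time point per observation of every stock, the increments are Gaussian with a constant positive definite covariance matrix `sigma` and are scaled by the square root of the elapsed time, and the disjoint random subsets of step 5 are drawn through a single random permutation. The function name `simulate_async` is hypothetical.

```python
import numpy as np

def simulate_async(p, n, sigma, T=1.0, seed=0):
    """Simulate nonsynchronous observations of a p-dimensional log-price process.

    Returns a list of p pairs (times, log_prices), one per stock, each of length n.
    """
    rng = np.random.default_rng(seed)
    N = n * p                                    # assumed size of the pooled time grid
    t = np.sort(rng.uniform(0.0, T, size=N))     # step 2: ordered uniform time points
    dt = np.diff(t, prepend=0.0)                 # lengths of the intervals between grid points
    chol = np.linalg.cholesky(sigma)             # requires sigma to be positive definite
    z = rng.standard_normal((N, p)) @ chol.T     # step 3: vectors with covariance sigma
    x = np.cumsum(z * np.sqrt(dt)[:, None], axis=0)   # step 4: log prices on the pooled grid
    perm = rng.permutation(N)                    # step 5: split the grid into disjoint subsets
    data = []
    for j in range(p):
        idx = np.sort(perm[j * n:(j + 1) * n])   # time points assigned to stock j
        data.append((t[idx], x[idx, j]))         # stock j is seen only at its own times
    return data

data = simulate_async(p=3, n=5, sigma=np.eye(3))
for times, prices in data:
    print(np.round(times, 3), np.round(prices, 3))
```

Because the subsets are disjoint, no two stocks are ever observed at the same instant, so the generated data are fully asynchronous while every stock keeps the same number of observations.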

We want to see the effect of , that is the ratio of on the empirical spectral distribution. So we repeat the simulation with and . The result is given in the left panel of Fig. 5
Next, we create a similar plot when the stocks are dependent. We have taken a 30-dimensional positive semi-definite covariance matrix with stocks. Since a nontrivial high-dimensional covariance matrix is difficult to specify in advance, we take the principal sub-matrix of the estimated covariance matrix from real data (Sec. 6.2). The right panel of Fig. 5 shows the distribution of the eigenvalues. We can see that for a general covariance matrix the Hayashi-Yoshida estimator may not be positive definite.


It is clear from the algorithm that the time points are generated by a Poisson Process with different intensity parameter. For computational convenience we have taken all ’s to be same ().
6.2 Real Data analysis
The limiting spectral distribution is particularly useful to test for deviation from null model, for example, whether the covolatility process is or not. Spectrum of integrated covariance matrix also helps to understand some of the key properties of the interacting units of the intraday-financial-network 40. The extreme (highest) eigenvalue, for example, gives us significant insight about the market mode or the collective overall response of the market to some external information. Spectral analysis, therefore, reveals broadly three types of fluctuations: (i) common to all stocks (i.e., due to market), (ii) related to a particular business sector (e.g. sectoral) and (iii) limited to an individual stock (i.e., idiosyncratic). These can be captured by simply segregating the network spectrum into the following parts: (i) the extreme eigenvalue (ii) eigenvalues deviating from the theoretical spectral distribution and (iii) bulk of the spectrum (41, 42, 43). Limiting spectral distribution of Hayashi-Yoshida estimator would help us to identify the sectoral mode of intraday financial network.
We collect intraday tick-by-tick Bloomberg data on the equities in the Nifty 50 for several days. Here we present the results for three consecutive days starting from 22-12-2020, which are fairly representative. In the left panel of Fig. 6, we plot the scree plots of the eigenvalues for these 50 stocks for the three days. We see that the impact of the market mode separates the largest eigenvalue from the bulk. In the right panel, some of the eigenvectors for the corresponding days are plotted. Specifically, these are eigenvector 3 of day 1, eigenvector 2 of day 2 and eigenvector 3 of day 3. Each of these has high contributions, with the same sign, from stocks 2, 4, 13, 14 and 36. These stocks are all from the IT sector, and there are no other stocks from the IT sector in our dataset. This suggests that the IT sector mean (same sign) is the next big component that drives the market after the overall mean (market mode).






7 Conclusion and further directions
In this work we have determined the limiting spectral distribution of high dimensional Hayashi-Yoshida estimator for nonsynchronous intraday data. Limiting spectral distribution can help to construct a shrinkage estimator of high-dimensional integrated covariance matrix (see 44). It can also be used for testing for a particular structure in spot volatility matrix.
In this paper we have considered only asynchronicity and not the presence of microstructure noise, which is also a feature of intraday stock price data. A natural direction in which to extend this work is therefore to add microstructure noise. Significant insights can be obtained from 28, where the same was derived for the realized (co)volatility matrix. In the presence of noise, the spectral distribution may deviate from the ideal situation in significant ways. We have restricted ourselves to the simple Black-Scholes setup. Geometric Brownian motion models are not always realistic descriptions of financial data. One can try to go beyond that and investigate the spectral analysis of the Hayashi-Yoshida estimator for more complex models. One can also try to extend the results to a general class of time-varying covolatility processes (for more details, see 27, 28). Changes due to leverage effects can also be quite serious and are therefore worth looking into.
References
- 1 Epps TW. Comovements in stock prices in the very short run. Journal of the American Statistical Association 1979; 74(366a): 291–298.
- 2 Baumöhl E, Výrost T. Stock market integration: Granger causality testing with respect to nonsynchronous trading effects. Finance a úvěr 2010; 60(5): 414.
- 3 Hayashi T, Yoshida N. On covariance estimation of non-synchronously observed diffusion processes. Bernoulli 2005; 11(2): 359–379.
- 4 Pourahmadi M. High-dimensional covariance estimation: with high-dimensional data. 882. John Wiley & Sons. 2013.
- 5 Tumminello M, Di Matteo T, Aste T, Mantegna R. Correlation based networks of equity returns sampled at different time horizons. The European Physical Journal B-Condensed Matter and Complex Systems 2007; 55(2): 209–217.
- 6 Zebedee AA, Kasch-Haroutounian M. A closer look at co-movements among stock returns. Journal of Economics and Business 2009; 61(4): 279–294.
- 7 Bonanno G, Lillo F, Mantegna RN. High-frequency cross-correlation in a set of stocks. 2001.
- 8 Muthuswamy J, Sarkar S, Low A, Terry E. Time variation in the correlation structure of exchange rates: High-frequency analyses. Journal of Futures Markets 2001; 21(2): 127–144.
- 9 Lundin MC, Dacorogna MM, Müller UA. Correlation of high frequency financial time series. 1998.
- 10 Reno R. A closer look at the Epps effect. International Journal of Theoretical and Applied Finance 2003; 6(01): 87–102.
- 11 Precup OV, Iori G. A comparison of high-frequency cross-correlation measures. Physica A: Statistical Mechanics and its Applications 2004; 344(1): 252–256.
- 12 Lo AW, MacKinlay AC. An econometric analysis of nonsynchronous trading. Journal of Econometrics 1990; 45(1-2): 181–211.
- 13 Campbell JY, Lo AWC, MacKinlay AC. The econometrics of financial markets. Princeton University Press. 1997.
- 14 Bernhardt D, Davies RJ. The impact of nonsynchronous trading on differences in portfolio cross-autocorrelations. 2008.
- 15 Atchison MD, Butler KC, Simonds RR. Nonsynchronous security trading and market index autocorrelation. The Journal of Finance 1987; 42(1): 111–118.
- 16 Barndorff-Nielsen OE, Shephard N. Econometric analysis of realized covariation: High frequency based covariance, regression, and correlation in financial economics. Econometrica 2004; 72(3): 885–925.
- 17 Merton RC. On estimating the expected return on the market: An exploratory investigation. Journal of Financial Economics 1980; 8(4): 323–361.
- 18 Barndorff-Nielsen OE, Hansen PR, Lunde A, Shephard N. Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Journal of Econometrics 2011; 162(2): 149–169.
- 19 Voev V, Lunde A. Integrated covariance estimation using high-frequency data in the presence of noise. Journal of Financial Econometrics 2006; 5(1): 68–104.
- 20 Hautsch N, Kyj LM, Oomen RC. A blocking and regularization approach to high-dimensional realized covariance estimation. Journal of Applied Econometrics 2012; 27(4): 625–645.
- 21 Ledoit O, Wolf M. Honey, I shrunk the sample covariance matrix. UPF economics and business working paper 2003(691).
- 22 Ledoit O, Wolf M. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 2004; 88(2): 365–411.
- 23 Ledoit O, Wolf M. Analytical nonlinear shrinkage of large-dimensional covariance matrices. Annals of Statistics 2020; 48(5): 3043–3065.
- 24 Bannouh K, Martens M, Oomen RC, van Dijk DJ. Realized mixed-frequency factor models for vast dimensional covariance estimation. ERIM Report Series Reference No. ERS-2012-017-F&A 2012.
- 25 Lunde A, Shephard N, Sheppard K. Econometric analysis of vast covariance matrices using composite realized kernels. Manuscript, University of Aarhus 2011.
- 26 El Karoui N. High-dimensionality effects in the Markowitz problem and other quadratic programs with linear constraints: Risk underestimation. The Annals of Statistics 2010; 38(6): 3487–3566.
- 27 Zheng X, Li Y. On the estimation of integrated covariance matrices of high dimensional diffusion processes. 2011.
- 28 Xia N, Zheng X. On the inference about the spectral distribution of high-dimensional covariance matrix based on high-frequency noisy observations. The Annals of Statistics 2018; 46(2): 500–525.
- 29 Wang M, Xia N. Estimation of high-dimensional integrated covariance matrix based on noisy high-frequency data with multiple observations. Statistics & Probability Letters 2021; 170: 108996.
- 30 Robert CY, Rosenbaum M. On the limiting spectral distribution of the covariance matrices of time-lagged processes. Journal of Multivariate Analysis 2010; 101(10): 2434–2451.
- 31 Wang C, Jin B, Miao B. On limiting spectral distribution of large sample covariance matrices by VARMA (p, q). Journal of Time Series Analysis 2011; 32(5): 539–546.
- 32 Aït-Sahalia Y, Fan J, Xiu D. High-frequency covariance estimates with noisy and asynchronous financial data. Journal of the American Statistical Association 2010; 105(492): 1504–1517.
- 33 Fan J, Li Y, Yu K. Vast volatility matrix estimation using high-frequency data for portfolio selection. Journal of the American Statistical Association 2012; 107(497): 412–428.
- 34 Guo X, Lai TL, Shek H, Wong SPS. Quantitative trading: algorithms, analytics, data, models, optimization. CRC Press. 2017.
- 35 Barndorff-Nielsen OE, Hansen PR, Lunde A, Shephard N. Subsampling realised kernels. Journal of Econometrics 2011; 160(1): 204–219.
- 36 Chakrabarti A, Sen R. Copula estimation for nonsynchronous financial data. arXiv preprint arXiv:1904.10182 2019.
- 37 Yu Y, Wang T, Samworth RJ. A useful variant of the Davis–Kahan theorem for statisticians. Biometrika 2015; 102(2): 315–323.
- 38 Bai Z, Silverstein JW. Spectral analysis of large dimensional random matrices. 20. Springer. 2010.
- 39 Wang L, Paul D. Limiting spectral distribution of renormalized separable sample covariance matrices when p/n → 0. Journal of Multivariate Analysis 2014; 126: 25–52.
- 40 Kumar S, Deo N. Correlation and network analysis of global financial indices. Physical Review E 2012; 86(2): 026101.
- 41 Plerou V, Gopikrishnan P, Rosenow B, Amaral LAN, Stanley HE. Universal and nonuniversal properties of cross correlations in financial time series. Physical Review Letters 1999; 83(7): 1471.
- 42 Sinha S, Chatterjee A, Chakraborti A, Chakrabarti BK. Econophysics: an introduction. John Wiley & Sons. 2010.
- 43 Onatski A. Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics 2010; 92(4): 1004–1016.
- 44 Ledoit O, Wolf M. Nonlinear shrinkage estimation of large-dimensional covariance matrices. The Annals of Statistics 2012; 40(2): 1024–1060.
- 45 Bai Z. Methodologies in spectral analysis of large dimensional random matrices, a review. Statistica Sinica 1999: 611–662.
- 46 Geronimo JS, Hill TP. Necessary and sufficient condition that the limit of Stieltjes transforms is a Stieltjes transform. Journal of Approximation Theory 2003; 121(1): 54–60.
- 47 McDiarmid C. Centering sequences with bounded differences. Combinatorics, Probability and Computing 1997; 6(1): 79–86.
Appendix
Appendix A Lemmas
Lemma A.1.
Let , with and , is a Hermitian nnd matrix, being any matrix, and , then
For proof, see 45.
Lemma A.2.
Let , with , and are matrices, with being Hermitian, and . Then
for all
For proof, see 45.
Lemma A.3.
For any Hermitian matrix and , with -
For proof, see 38.
Lemma A.4.
For where ’s are iid random variables such that , , and for some , there exists , depending only on , and , such that for any nonrandom matrix ,
(22) |
For proof, see 27.
Lemma A.5.
Suppose is a matrix defined as assumption , and , then
where ( is defined as in assumption ) and is the unique solution in to the following equation:
For proof, see 27.
Lemma A.6.
Let with , be any matrix, and be a Hermitian nonnegative definite matrix. Then .
For proof, see 27.
Lemma A.7.
Let with , A be a Hermitian nonnegative definite matrix, . Then
(23) |
For proof, see 27.
Lemma A.8.
Suppose that are real probability measures with Stieltjes transforms . Let be an infinite set with limit points in . If exists for all , then there exists a probability measure with Stieltjes transform if and only if
in which case .
For proof, see 46.
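For readers less familiar with the Stieltjes transform, the following Python sketch evaluates it for an empirical spectral distribution, m(z) = (1/p) Σ_i 1/(λ_i − z) with z in the upper half-plane, and checks numerically the normalization that any probability measure satisfies, namely that iy·m(iy) approaches −1 as y grows. The function name `empirical_stieltjes` and the example eigenvalues are assumptions made for illustration.

```python
import numpy as np

def empirical_stieltjes(eigvals, z):
    """Stieltjes transform of the empirical spectral distribution:
    m(z) = (1/p) * sum_i 1 / (lambda_i - z), for complex z with Im z > 0."""
    eigvals = np.asarray(eigvals, dtype=float)
    return np.mean(1.0 / (eigvals - z))

# Hypothetical eigenvalues of a 4 x 4 covariance matrix.
eigvals = np.array([0.5, 1.0, 2.0, 4.0])

m_near = empirical_stieltjes(eigvals, 1.0 + 0.5j)   # a point close to the spectrum
y = 1e6
m_far = empirical_stieltjes(eigvals, 1j * y)        # far up the imaginary axis

print("m(1 + 0.5i)    =", m_near)
print("iy * m(iy) + 1 =", 1j * y * m_far + 1)       # close to zero for large y
```

The imaginary part of m(z) is positive throughout the upper half-plane, which is the other basic property used when identifying a limit of Stieltjes transforms.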
The next lemma is known as McDiarmid's inequality 47.
Lemma A.9.
Let be independent random vectors taking values in . Suppose that is a function of satisfying ,
Then for all
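For orientation, the standard textbook form of McDiarmid's bounded-differences inequality reads as follows. The symbols $X_i$, $f$, $c_i$ and $\varepsilon$ are notation assumed here and may differ from that used in Lemma A.9.

```latex
% Standard bounded-differences (McDiarmid) inequality; notation is assumed.
% X_1, ..., X_n are independent, and changing only the i-th argument of f
% changes its value by at most c_i.
\[
\sup_{x_1,\dots,x_n,\,x_i'}
\bigl| f(x_1,\dots,x_i,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n) \bigr| \le c_i,
\qquad i = 1,\dots,n,
\]
\[
\Longrightarrow\quad
\mathbb{P}\Bigl( \bigl| f(X_1,\dots,X_n) - \mathbb{E} f(X_1,\dots,X_n) \bigr| \ge \varepsilon \Bigr)
\le 2 \exp\!\Bigl( -\frac{2\varepsilon^{2}}{\sum_{i=1}^{n} c_i^{2}} \Bigr)
\qquad \text{for all } \varepsilon > 0 .
\]
```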
Lemma A.10.
† Lemmas A.10 and A.12 can be found in Theorem 1 of 27 in a slightly different form.
For , , described as Assumption and , define,
and
Then,
Proof A.11 (Proof of Lemma A.10).
The last inequality is due to Lemma 1 (where ,
, ,
), and Assumption 6.
Applying Markov's inequality and Lemma 4,
Choosing , and using the Borel-Cantelli lemma, we get a.s. As a consequence for some .
Define, , and consider
Now we will consider the third part of the above equation,
by Assumption 1 and 6,
Using Lemma 4 and Lemma 3 together with the Borel-Cantelli lemma gives, for and ,
Also,
The first and second inequalities result from applying Lemma 2 and Assumption 6, respectively.
This proves our claim.
Lemma A.12.
For , , described as Assumption and , define,
and
then
Proof A.13 (Proof of Lemma A.12).
Using Lemma 4 and Markov's inequality, it is easy to show that,
After choosing an appropriate value of and using the Borel-Cantelli lemma, we obtain the claim.
Appendix B Proofs of the Remarks
Proof B.1 (Proof of Remark 1).
also it is easy to see that,
Proof B.2 (Proof of Remark 2).
and
So,
Proof B.3 (Proof of Remark 3).
To see this, manipulate the left-hand side in the following way:
Observe that because . So,
As a consequence . And,