
Asymptotic Distribution of the Markowitz Portfolio

Steven E. Pav steven@gilgamath.com
Abstract

The asymptotic distribution of the Markowitz portfolio, $\mathsf{\hat{\Sigma}}^{-1}\boldsymbol{\hat{\mu}}$, is derived, for the general case (assuming fourth moments of returns exist), and for the case of multivariate normal returns. The derivation allows for inference which is robust to heteroskedasticity and autocorrelation of moments up to order four. As a side effect, one can estimate the proportion of error in the Markowitz portfolio due to mis-estimation of the covariance matrix. A likelihood ratio test is given which generalizes Dempster’s Covariance Selection test to allow inference on linear combinations of the precision matrix and the Markowitz portfolio. [15] Extensions of the main method to deal with hedged portfolios, conditional heteroskedasticity, conditional expectation, and constrained estimation are given. It is shown that the Hotelling-Lawley statistic generalizes the (squared) Sharpe ratio under the conditional expectation model. Asymptotic distributions of all four of the common ‘MGLH’ statistics are found, assuming random covariates. [60] Examples are given demonstrating the possible uses of these results.

1 Introduction

Given $p$ assets with expected return $\boldsymbol{\mu}$ and covariance of return $\mathsf{\Sigma}$, the portfolio defined as

\boldsymbol{\nu}_{*} =_{\operatorname{df}} \lambda\,\mathsf{\Sigma}^{-1}\boldsymbol{\mu} \qquad (1)

plays a special role in modern portfolio theory. [38, 5, 9] It is known as the ‘efficient portfolio’, the ‘tangency portfolio’, and, somewhat informally, the ‘Markowitz portfolio’. It appears, for various $\lambda$, in the solution to numerous portfolio optimization problems. Besides the classic mean-variance formulation, it solves the (population) Sharpe ratio maximization problem:

\max_{\boldsymbol{\nu}:\,\boldsymbol{\nu}^{\top}\mathsf{\Sigma}\boldsymbol{\nu}\leq R^{2}} \frac{\boldsymbol{\nu}^{\top}\boldsymbol{\mu}-r_{0}}{\sqrt{\boldsymbol{\nu}^{\top}\mathsf{\Sigma}\boldsymbol{\nu}}}, \qquad (2)

where $r_{0}\geq 0$ is the risk-free, or ‘disastrous’, rate of return, and $R>0$ is some given ‘risk budget’. The solution to this optimization problem is $\lambda\mathsf{\Sigma}^{-1}\boldsymbol{\mu}$, where $\lambda=R/\sqrt{\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}}$.
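As a minimal numerical sketch of Equation 1 and the optimization in Equation 2 (using numpy; the asset count, expected returns, covariance, and risk budget below are invented purely for illustration):

import numpy as np

# hypothetical population parameters for p = 3 assets (invented values)
mu = np.array([0.05, 0.03, 0.04])                 # expected returns
Sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.030, 0.005],
                  [0.004, 0.005, 0.020]])          # covariance of returns
R = 0.10                                           # risk budget

zeta_star = np.sqrt(mu @ np.linalg.solve(Sigma, mu))    # maximal Sharpe ratio
nu_star = (R / zeta_star) * np.linalg.solve(Sigma, mu)  # lambda * Sigma^{-1} mu

# the portfolio spends the whole risk budget and attains the maximal Sharpe ratio
assert np.isclose(np.sqrt(nu_star @ Sigma @ nu_star), R)
assert np.isclose(nu_star @ mu / np.sqrt(nu_star @ Sigma @ nu_star), zeta_star)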

In practice, the Markowitz portfolio has a somewhat checkered history. The population parameters $\boldsymbol{\mu}$ and $\mathsf{\Sigma}$ are not known and must be estimated from samples. Estimation error results in a feasible portfolio, $\boldsymbol{\hat{\nu}}_{*}$, of dubious value. Michaud went so far as to call mean-variance optimization, “error maximization.” [44] It has been suggested that simple portfolio heuristics outperform the Markowitz portfolio in practice. [13]

This paper focuses on the asymptotic distribution of the sample Markowitz portfolio. By formulating the problem as a linear regression, Britten-Jones very cleverly devised hypothesis tests on elements of $\boldsymbol{\nu}_{*}$, assuming multivariate Gaussian returns. [7] In a remarkable series of papers, Okhrin and Schmid, and Bodnar and Okhrin give the (univariate) density of the dot product of $\boldsymbol{\nu}_{*}$ and a deterministic vector, again for the case of Gaussian returns. [50, 3] Okhrin and Schmid also show that all moments of $\boldsymbol{\hat{\nu}}_{*}/\boldsymbol{1}^{\top}\boldsymbol{\hat{\nu}}_{*}$ of order greater than or equal to one do not exist. [50]

Here I derive asymptotic normality of $\boldsymbol{\hat{\nu}}_{*}$, the sample analogue of $\boldsymbol{\nu}_{*}$, assuming only that the first four moments exist. Feasible estimation of the variance of $\boldsymbol{\hat{\nu}}_{*}$ is amenable to heteroskedasticity and autocorrelation robust inference. [68] The asymptotic distribution under Gaussian returns is also derived.

After estimating the covariance of $\boldsymbol{\hat{\nu}}_{*}$, one can compute Wald test statistics for the elements of $\boldsymbol{\hat{\nu}}_{*}$, possibly leading one to drop some assets from consideration (‘sparsification’). Having an estimate of the covariance can also allow portfolio shrinkage. [14, 31]

The derivations in this paper actually solve a more general problem than the distribution of the sample Markowitz portfolio. The joint covariance of $\boldsymbol{\hat{\nu}}_{*}$ and the ‘precision matrix,’ $\mathsf{\hat{\Sigma}}^{-1}$, is derived. This allows one, for example, to estimate the proportion of error in the Markowitz portfolio attributable to mis-estimation of the covariance matrix. According to lore, the error in portfolio weights is mostly attributable to mis-estimation of $\boldsymbol{\mu}$, not of $\mathsf{\Sigma}$. [8, 43]

Finally, assuming Gaussian returns, a likelihood ratio test for performing inference on linear combinations of elements of the Markowitz portfolio and the precision matrix is derived. This test generalizes a procedure by Dempster for inference on the precision matrix alone. [15]

2 The augmented second moment

Let $\boldsymbol{x}$ be an array of returns of $p$ assets, with mean $\boldsymbol{\mu}$, and covariance $\mathsf{\Sigma}$. Let $\boldsymbol{\tilde{x}}$ be $\boldsymbol{x}$ prepended with a 1: $\boldsymbol{\tilde{x}}=\left[1,\boldsymbol{x}^{\top}\right]^{\top}$. Consider the second moment of $\boldsymbol{\tilde{x}}$:

\mathsf{\Theta} =_{\operatorname{df}} \operatorname{E}\left[\boldsymbol{\tilde{x}}\boldsymbol{\tilde{x}}^{\top}\right] = \left[\begin{array}{cc} 1 & \boldsymbol{\mu}^{\top} \\ \boldsymbol{\mu} & \mathsf{\Sigma}+\boldsymbol{\mu}\boldsymbol{\mu}^{\top} \end{array}\right]. \qquad (3)

By inspection one can confirm that the inverse of $\mathsf{\Theta}$ is

\mathsf{\Theta}^{-1} = \left[\begin{array}{cc} 1+\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\boldsymbol{\mu} & -\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1} \\ -\mathsf{\Sigma}^{-1}\boldsymbol{\mu} & \mathsf{\Sigma}^{-1} \end{array}\right] = \left[\begin{array}{cc} 1+\zeta_{*}^{2} & -\boldsymbol{\nu}_{*}^{\top} \\ -\boldsymbol{\nu}_{*} & \mathsf{\Sigma}^{-1} \end{array}\right], \qquad (4)

where $\boldsymbol{\nu}_{*}=\mathsf{\Sigma}^{-1}\boldsymbol{\mu}$ is the Markowitz portfolio, and $\zeta_{*}=\sqrt{\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}}$ is the Sharpe ratio of that portfolio. The matrix $\mathsf{\Theta}$ contains the first and second moments of $\boldsymbol{x}$, but is also the uncentered second moment of $\boldsymbol{\tilde{x}}$, a fact which makes it amenable to analysis via the central limit theorem.

The relationships above are merely facts of linear algebra, and so hold for sample estimates as well:

\left[\begin{array}{cc} 1 & \boldsymbol{\hat{\mu}}^{\top} \\ \boldsymbol{\hat{\mu}} & \mathsf{\hat{\Sigma}}+\boldsymbol{\hat{\mu}}\boldsymbol{\hat{\mu}}^{\top} \end{array}\right]^{-1} = \left[\begin{array}{cc} 1+\hat{\zeta}_{*}^{2} & -\boldsymbol{\hat{\nu}}_{*}^{\top} \\ -\boldsymbol{\hat{\nu}}_{*} & \mathsf{\hat{\Sigma}}^{-1} \end{array}\right],

where $\boldsymbol{\hat{\mu}}$, $\mathsf{\hat{\Sigma}}$ are some sample estimates of $\boldsymbol{\mu}$ and $\mathsf{\Sigma}$, and $\boldsymbol{\hat{\nu}}_{*}=\mathsf{\hat{\Sigma}}^{-1}\boldsymbol{\hat{\mu}}$, $\hat{\zeta}_{*}^{2}=\boldsymbol{\hat{\mu}}^{\top}\mathsf{\hat{\Sigma}}^{-1}\boldsymbol{\hat{\mu}}$.

Given $n$ i.i.d. observations $\boldsymbol{x}_{i}$, let $\mathsf{\tilde{X}}$ be the matrix whose rows are the vectors $\boldsymbol{\tilde{x}}_{i}^{\top}$. The naïve sample estimator

\mathsf{\hat{\Theta}} =_{\operatorname{df}} \frac{1}{n}\mathsf{\tilde{X}}^{\top}\mathsf{\tilde{X}} \qquad (5)

is an unbiased estimator since $\mathsf{\Theta}=\operatorname{E}\left[\boldsymbol{\tilde{x}}\boldsymbol{\tilde{x}}^{\top}\right]$.
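The following sketch (with invented parameters and simulated Gaussian returns, purely for illustration) builds $\mathsf{\hat{\Theta}}$ per Equation 5 and confirms that its inverse packages $\hat{\zeta}_{*}^{2}$, $\boldsymbol{\hat{\nu}}_{*}$, and $\mathsf{\hat{\Sigma}}^{-1}$ exactly as in the display above:

import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 2000
mu = np.array([0.05, 0.03, 0.04])
Sigma = np.diag([0.04, 0.03, 0.02])
X = rng.multivariate_normal(mu, Sigma, size=n)     # n x p matrix of simulated returns

X_tilde = np.hstack([np.ones((n, 1)), X])          # prepend a 1 to each observation
Theta_hat = X_tilde.T @ X_tilde / n                # the naive estimator of Theta
iTheta_hat = np.linalg.inv(Theta_hat)

zeta2_hat = iTheta_hat[0, 0] - 1.0                 # hat{zeta}_*^2
nu_hat = -iTheta_hat[1:, 0]                        # the Markowitz portfolio hat{nu}_*
iSigma_hat = iTheta_hat[1:, 1:]                    # the precision matrix hat{Sigma}^{-1}

# the same quantities computed the long way around
mu_hat = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False, bias=True)     # divisor n, matching Theta_hat
assert np.allclose(nu_hat, np.linalg.solve(Sigma_hat, mu_hat))
assert np.isclose(zeta2_hat, mu_hat @ np.linalg.solve(Sigma_hat, mu_hat))
assert np.allclose(iSigma_hat, np.linalg.inv(Sigma_hat))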

2.1 Matrix Derivatives

Some notation and technical results concerning matrices are required.

Definition 2.1 (Matrix operations).

For matrix $\mathsf{A}$, let $\operatorname{vec}\left(\mathsf{A}\right)$ and $\operatorname{vech}\left(\mathsf{A}\right)$ be the vector and half-space vector operators. The former turns a $p\times p$ matrix into a $p^{2}$-vector of its columns stacked on top of each other; the latter vectorizes a symmetric (or lower triangular) matrix into a vector of the non-redundant elements. Let $\mathsf{L}$ be the ‘Elimination Matrix,’ a matrix of zeros and ones with the property that $\operatorname{vech}\left(\mathsf{A}\right)=\mathsf{L}\operatorname{vec}\left(\mathsf{A}\right)$. The ‘Duplication Matrix,’ $\mathsf{D}$, is the matrix of zeros and ones that reverses this operation: $\mathsf{D}\operatorname{vech}\left(\mathsf{A}\right)=\operatorname{vec}\left(\mathsf{A}\right)$. [36] Note that this implies that

\mathsf{L}\mathsf{D}=\mathsf{I}\quad\left(\neq\mathsf{D}\mathsf{L}\right).

We will let $\mathsf{K}$ be the ‘commutation matrix’, i.e., the matrix whose rows are a permutation of the rows of the identity matrix such that $\mathsf{K}\operatorname{vec}\left(\mathsf{A}\right)=\operatorname{vec}\left(\mathsf{A}^{\top}\right)$ for square matrix $\mathsf{A}$.
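A small sketch constructing $\mathsf{L}$, $\mathsf{D}$, and $\mathsf{K}$ explicitly for a modest $p$ and checking the identities above; the column-by-column construction is one convenient choice, not the only one:

import numpy as np

def vec(A):
    return A.reshape(-1, order="F")                      # stack the columns

def vech(A):
    p = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(p)])  # non-redundant elements, column by column

def elim_dup_comm(p):
    """Elimination, duplication, and commutation matrices for p x p matrices."""
    m = p * (p + 1) // 2
    L, D, K = np.zeros((m, p * p)), np.zeros((p * p, m)), np.zeros((p * p, p * p))
    r = 0
    for j in range(p):
        for i in range(j, p):
            L[r, j * p + i] = 1.0                 # pick A[i, j] out of vec(A)
            D[j * p + i, r] = 1.0
            D[i * p + j, r] = 1.0                 # duplication also fills the mirror entry
            r += 1
    for i in range(p):
        for j in range(p):
            K[j * p + i, i * p + j] = 1.0         # K vec(A) = vec(A')
    return L, D, K

p = 3
L, D, K = elim_dup_comm(p)
A = np.random.default_rng(0).standard_normal((p, p))
S = A + A.T                                       # a symmetric matrix
assert np.allclose(L @ vec(S), vech(S))
assert np.allclose(D @ vech(S), vec(S))
assert np.allclose(L @ D, np.eye(p * (p + 1) // 2))
assert np.allclose(K @ vec(A), vec(A.T))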

Definition 2.2 (Derivatives).

For $m$-vector $\boldsymbol{x}$, and $n$-vector $\boldsymbol{y}$, let the derivative $\frac{\mathrm{d}\boldsymbol{y}}{\mathrm{d}\boldsymbol{x}}$ be the $n\times m$ matrix whose first column is the partial derivative of $\boldsymbol{y}$ with respect to $x_{1}$. This follows the so-called ‘numerator layout’ convention. For matrices $\mathsf{Y}$ and $\mathsf{X}$, define

\frac{\mathrm{d}\mathsf{Y}}{\mathrm{d}\mathsf{X}} =_{\operatorname{df}} \frac{\mathrm{d}\operatorname{vec}\left(\mathsf{Y}\right)}{\mathrm{d}\operatorname{vec}\left(\mathsf{X}\right)}.

Lemma 2.3 (Miscellaneous Derivatives).

For symmetric matrices $\mathsf{Y}$ and $\mathsf{X}$,

\frac{\mathrm{d}\operatorname{vech}\left(\mathsf{Y}\right)}{\mathrm{d}\operatorname{vec}\left(\mathsf{X}\right)}=\mathsf{L}\frac{\mathrm{d}\mathsf{Y}}{\mathrm{d}\mathsf{X}},\quad \frac{\mathrm{d}\operatorname{vec}\left(\mathsf{Y}\right)}{\mathrm{d}\operatorname{vech}\left(\mathsf{X}\right)}=\frac{\mathrm{d}\mathsf{Y}}{\mathrm{d}\mathsf{X}}\mathsf{D},\quad \frac{\mathrm{d}\operatorname{vech}\left(\mathsf{Y}\right)}{\mathrm{d}\operatorname{vech}\left(\mathsf{X}\right)}=\mathsf{L}\frac{\mathrm{d}\mathsf{Y}}{\mathrm{d}\mathsf{X}}\mathsf{D}. \qquad (6)
Proof.

For the first equation, note that $\operatorname{vech}\left(\mathsf{Y}\right)=\mathsf{L}\operatorname{vec}\left(\mathsf{Y}\right)$, thus by the chain rule:

\frac{\mathrm{d}\operatorname{vech}\left(\mathsf{Y}\right)}{\mathrm{d}\operatorname{vec}\left(\mathsf{X}\right)}=\frac{\mathrm{d}\,\mathsf{L}\operatorname{vec}\left(\mathsf{Y}\right)}{\mathrm{d}\operatorname{vec}\left(\mathsf{X}\right)}=\mathsf{L}\frac{\mathrm{d}\mathsf{Y}}{\mathrm{d}\mathsf{X}},

by linearity of the derivative. The other identities follow similarly. ∎

Lemma 2.4 (Derivative of matrix inverse).

For invertible matrix $\mathsf{A}$,

\frac{\mathrm{d}\mathsf{A}^{-1}}{\mathrm{d}\mathsf{A}}=-\left(\mathsf{A}^{-\top}\otimes\mathsf{A}^{-1}\right)=-\left(\mathsf{A}^{\top}\otimes\mathsf{A}\right)^{-1}. \qquad (7)

For symmetric $\mathsf{A}$, the derivative with respect to the non-redundant part is

\frac{\mathrm{d}\operatorname{vech}\left(\mathsf{A}^{-1}\right)}{\mathrm{d}\operatorname{vech}\left(\mathsf{A}\right)}=-\mathsf{L}\left(\mathsf{A}^{-1}\otimes\mathsf{A}^{-1}\right)\mathsf{D}. \qquad (8)

Note how this result generalizes the scalar derivative: $\frac{\mathrm{d}x^{-1}}{\mathrm{d}x}=-\left(x^{-1}x^{-1}\right)$.

Proof.

Equation 7 is a known result. [18, 37] Equation 8 then follows using Lemma 2.3. ∎
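As a sanity check of Equation 8, the sketch below compares the analytic derivative (written entrywise) against forward finite differences, perturbing each non-redundant element of a small positive definite matrix together with its mirror image; the test matrix is invented:

import numpy as np

p = 3
rng = np.random.default_rng(2)
B = rng.standard_normal((p, p))
A = B @ B.T + p * np.eye(p)                        # a symmetric positive definite test matrix
iA = np.linalg.inv(A)

# vech ordering of the non-redundant (lower triangular) entries
idx = [(i, j) for j in range(p) for i in range(j, p)]
vech = lambda M: np.array([M[i, j] for (i, j) in idx])

# analytic derivative -L (A^{-1} kron A^{-1}) D, written entrywise
analytic = np.array([[-(iA[i, k] * iA[j, l] + (iA[i, l] * iA[j, k] if k != l else 0.0))
                      for (k, l) in idx] for (i, j) in idx])

# forward finite differences
h = 1e-6
numeric = np.zeros((len(idx), len(idx)))
for col, (k, l) in enumerate(idx):
    Ap = A.copy()
    Ap[k, l] += h
    Ap[l, k] = Ap[k, l]
    numeric[:, col] = (vech(np.linalg.inv(Ap)) - vech(iA)) / h

assert np.allclose(analytic, numeric, atol=1e-4)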

2.2 Asymptotic distribution of the Markowitz portfolio

Collecting the mean and covariance into the second moment matrix gives the asymptotic distribution of the sample Markowitz portfolio without much work. In some sense, this computation generalizes the ‘standard’ asymptotic analysis of Sharpe ratio of multiple assets. [27, 34, 32, 33]

Theorem 2.5.

Let $\mathsf{\hat{\Theta}}$ be the unbiased sample estimate of $\mathsf{\Theta}$, based on $n$ i.i.d. samples of $\boldsymbol{x}$. Let $\mathsf{\Omega}$ be the variance of $\operatorname{vech}\left(\boldsymbol{\tilde{x}}\boldsymbol{\tilde{x}}^{\top}\right)$. Then, asymptotically in $n$,

\sqrt{n}\left(\operatorname{vech}\left(\mathsf{\hat{\Theta}}^{-1}\right)-\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{H}\mathsf{\Omega}\mathsf{H}^{\top}\right), \qquad (9)

where

\mathsf{H}=-\mathsf{L}\left(\mathsf{\Theta}^{-1}\otimes\mathsf{\Theta}^{-1}\right)\mathsf{D}. \qquad (10)

Furthermore, we may replace $\mathsf{\Omega}$ in this equation with an asymptotically consistent estimator, $\mathsf{\hat{\Omega}}$.

Proof.

Under the multivariate central limit theorem [65]

\sqrt{n}\left(\operatorname{vech}\left(\mathsf{\hat{\Theta}}\right)-\operatorname{vech}\left(\mathsf{\Theta}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{\Omega}\right), \qquad (11)

where $\mathsf{\Omega}$ is the variance of $\operatorname{vech}\left(\boldsymbol{\tilde{x}}\boldsymbol{\tilde{x}}^{\top}\right)$, which, in general, is unknown. By the delta method [65],

\sqrt{n}\left(\operatorname{vech}\left(\mathsf{\hat{\Theta}}^{-1}\right)-\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\left[\frac{\mathrm{d}\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)}{\mathrm{d}\operatorname{vech}\left(\mathsf{\Theta}\right)}\right]\mathsf{\Omega}\left[\frac{\mathrm{d}\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)}{\mathrm{d}\operatorname{vech}\left(\mathsf{\Theta}\right)}\right]^{\top}\right).

The derivative is given by Lemma 2.4, and the result follows. ∎

To estimate the covariance of $\operatorname{vech}\left(\mathsf{\hat{\Theta}}^{-1}\right)$, plug in $\mathsf{\hat{\Theta}}$ for $\mathsf{\Theta}$ in the covariance computation, and use some consistent estimator for $\mathsf{\Omega}$, call it $\mathsf{\hat{\Omega}}$. One way to compute $\mathsf{\hat{\Omega}}$ is via the sample covariance of the vectors $\operatorname{vech}\left(\boldsymbol{\tilde{x}}_{i}\boldsymbol{\tilde{x}}_{i}^{\top}\right)=\left[1,\boldsymbol{x}_{i}^{\top},\operatorname{vech}\left(\boldsymbol{x}_{i}\boldsymbol{x}_{i}^{\top}\right)^{\top}\right]^{\top}$. More elaborate covariance estimators can be used, for example, to deal with violations of the i.i.d. assumptions. [68] Note that because the first element of $\operatorname{vech}\left(\boldsymbol{\tilde{x}}_{i}\boldsymbol{\tilde{x}}_{i}^{\top}\right)$ is a deterministic $1$, the first row and column of $\mathsf{\Omega}$ are all zeros, and we need not estimate them.
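A sketch of this plug-in procedure under the i.i.d. assumption, using the simple sample-covariance estimator of $\mathsf{\Omega}$ (simulated Gaussian data with invented parameters; the elimination and duplication matrices are built as in Definition 2.1):

import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 2500
mu = np.array([0.05, 0.03, 0.04])
Sigma = np.diag([0.04, 0.03, 0.02])
X = rng.multivariate_normal(mu, Sigma, size=n)

q = p + 1
X_tilde = np.hstack([np.ones((n, 1)), X])
Theta_hat = X_tilde.T @ X_tilde / n
iTheta_hat = np.linalg.inv(Theta_hat)

# elimination and duplication matrices for q x q symmetric matrices
idx = [(i, j) for j in range(q) for i in range(j, q)]
L = np.zeros((len(idx), q * q))
D = np.zeros((q * q, len(idx)))
for r, (i, j) in enumerate(idx):
    L[r, j * q + i] = 1.0
    D[j * q + i, r] = D[i * q + j, r] = 1.0

# hat{Omega}: sample covariance of vech(x_tilde x_tilde')
V = np.array([L @ np.outer(xt, xt).reshape(-1, order="F") for xt in X_tilde])
Omega_hat = np.cov(V, rowvar=False)

# delta-method covariance of vech(hat{Theta}^{-1}), Theorem 2.5, with plug-in hat{Theta}
H = -L @ np.kron(iTheta_hat, iTheta_hat) @ D
acov = H @ Omega_hat @ H.T / n

# standard errors of the Markowitz weights (elements 2 through p+1 of vech(hat{Theta}^{-1}))
nu_hat = -iTheta_hat[1:, 0]
se_nu = np.sqrt(np.diag(acov)[1:q])
print(np.column_stack([nu_hat, se_nu]))            # weight estimates and their standard errors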

2.3 The Sharpe ratio optimal portfolio

Lemma 2.6 (Sharpe ratio optimal portfolio).

Assuming $\boldsymbol{\mu}\neq\boldsymbol{0}$, and $\mathsf{\Sigma}$ is invertible, the portfolio optimization problem

\mathop{\mathrm{argmax}}_{\boldsymbol{\nu}:\,\boldsymbol{\nu}^{\top}\mathsf{\Sigma}\boldsymbol{\nu}\leq R^{2}} \frac{\boldsymbol{\nu}^{\top}\boldsymbol{\mu}-r_{0}}{\sqrt{\boldsymbol{\nu}^{\top}\mathsf{\Sigma}\boldsymbol{\nu}}}, \qquad (12)

for $r_{0}\geq 0$, $R>0$ is solved by

\boldsymbol{\nu}_{R,*} =_{\operatorname{df}} \frac{R}{\sqrt{\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}}}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}. \qquad (13)

Moreover, this is the unique solution whenever $r_{0}>0$. The maximal objective achieved by this portfolio is $\sqrt{\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}}-r_{0}/R=\zeta_{*}-r_{0}/R$.

Proof.

By the Lagrange multiplier technique, the optimal portfolio solves the following equations:

\begin{split} 0 &= c_{1}\boldsymbol{\mu}-c_{2}\mathsf{\Sigma}\boldsymbol{\nu}-\gamma\mathsf{\Sigma}\boldsymbol{\nu},\\ \boldsymbol{\nu}^{\top}\mathsf{\Sigma}\boldsymbol{\nu} &\leq R^{2}, \end{split}

where $\gamma$ is the Lagrange multiplier, and $c_{1},c_{2}$ are scalar constants. Solving the first equation gives us

\boldsymbol{\nu}=c\,\mathsf{\Sigma}^{-1}\boldsymbol{\mu}.

This reduces the problem to the univariate optimization

\max_{c:\,c^{2}\leq R^{2}/\zeta_{*}^{2}} \operatorname{sign}\left(c\right)\zeta_{*}-\frac{r_{0}}{\left|c\right|\zeta_{*}}, \qquad (14)

where $\zeta_{*}^{2}=\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}$. The optimum occurs for $c=R/\zeta_{*}$; moreover the optimum is unique when $r_{0}>0$. ∎

Note that the first element of $\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)$ is $1+\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}$, and elements 2 through $p+1$ are $-\boldsymbol{\nu}_{*}$. Thus, $\boldsymbol{\nu}_{R,*}$, the portfolio that maximizes the Sharpe ratio, is some transformation of $\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)$, and another application of the delta method gives its asymptotic distribution, as in the following corollary to Theorem 2.5.

Corollary 2.7.

Let

\boldsymbol{\nu}_{R,*}=\frac{R}{\sqrt{\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}}}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}, \qquad (15)

and similarly, let $\boldsymbol{\hat{\nu}}_{R,*}$ be the sample analogue, where $R$ is some risk budget. Then

\sqrt{n}\left(\boldsymbol{\hat{\nu}}_{R,*}-\boldsymbol{\nu}_{R,*}\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{H}\mathsf{\Omega}\mathsf{H}^{\top}\right), \qquad (16)

where

\begin{split} \mathsf{H} &= \left(-\left[\frac{1}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},\frac{R}{\zeta_{*}}\mathsf{I}_{p},\mathsf{0}\right]\right)\left(-\mathsf{L}\left(\mathsf{\Theta}^{-1}\otimes\mathsf{\Theta}^{-1}\right)\mathsf{D}\right),\\ \zeta_{*}^{2} &=_{\operatorname{df}} \boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}. \end{split} \qquad (17)

Moreover, we may express $\mathsf{H}$ as

\left(\left(\boldsymbol{e}_{1}^{\top}\mathsf{\Theta}^{-1}\right)\otimes\left[\frac{1-\zeta_{*}^{2}}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},\frac{R}{\zeta_{*}}\left(\mathsf{\Sigma}^{-1}-\frac{\mathsf{\Sigma}^{-1}\boldsymbol{\mu}\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}}{2\zeta_{*}^{2}}\right)\right]\right)\mathsf{D}. \qquad (18)
Proof.

By the delta method, and Theorem 2.5, it suffices to show that

\frac{\mathrm{d}\boldsymbol{\nu}_{R,*}}{\mathrm{d}\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)}=-\left[\frac{1}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},\frac{R}{\zeta_{*}}\mathsf{I}_{p},\mathsf{0}\right].

To show this, note that $\boldsymbol{\nu}_{R,*}$ is $-R$ times elements 2 through $p+1$ of $\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)$ divided by $\zeta_{*}=\sqrt{\boldsymbol{e}_{1}^{\top}\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)-1}$, where $\boldsymbol{e}_{i}$ is the $i^{\text{th}}$ column of the identity matrix. The result follows from basic calculus.

To establish Equation 18, note that only the first $p+1$ columns of $\frac{\mathrm{d}\boldsymbol{\nu}_{R,*}}{\mathrm{d}\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)}$ have non-zero entries, thus the elimination matrix, $\mathsf{L}$, can be ignored in the term on the right, and we could write the derivative as

\frac{\mathrm{d}\boldsymbol{\nu}_{R,*}}{\mathrm{d}\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)}=\boldsymbol{e}_{1}^{\top}\otimes-\left[\frac{1}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},\frac{R}{\zeta_{*}}\mathsf{I}_{p}\right].

And we can write the product as

\mathsf{H}=\left(\left(\boldsymbol{e}_{1}^{\top}\mathsf{\Theta}^{-1}\right)\otimes\left(\left[\frac{1}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},\frac{R}{\zeta_{*}}\mathsf{I}_{p}\right]\mathsf{\Theta}^{-1}\right)\right)\mathsf{D}.

Perform the matrix multiplication to find

\begin{split} \left[\frac{1}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},\frac{R}{\zeta_{*}}\mathsf{I}_{p}\right]\mathsf{\Theta}^{-1} &= \left[\frac{1+\zeta_{*}^{2}}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*}-\frac{R}{\zeta_{*}}\mathsf{\Sigma}^{-1}\boldsymbol{\mu},\ -\frac{1}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*}\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}+\frac{R}{\zeta_{*}}\mathsf{\Sigma}^{-1}\right],\\ &= \left[\frac{1-\zeta_{*}^{2}}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},\ -\frac{1}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*}\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}+\frac{R}{\zeta_{*}}\mathsf{\Sigma}^{-1}\right], \end{split}

which then further simplifies to the form given. ∎

The sample statistic $\hat{\zeta}_{*}^{2}$ is, up to scaling involving $n$, just Hotelling's $T^{2}$ statistic. [1] One can perform inference on $\zeta_{*}^{2}$ via this statistic, at least under Gaussian returns, where the distribution of $T^{2}$ is a (noncentral) $F$-distribution. Note, however, that $\zeta_{*}$ is the maximal population Sharpe ratio of any portfolio, so it is an upper bound of the Sharpe ratio of the sample portfolio $\boldsymbol{\hat{\nu}}_{R,*}$. It is of little comfort to have an estimate of $\zeta_{*}$ when the sample portfolio may have a small, or even negative, Sharpe ratio.

Because $\zeta_{*}$ is an upper bound on the Sharpe ratio of a portfolio, it seems odd to claim that the Sharpe ratio of the sample portfolio might be asymptotically normal with mean $\zeta_{*}$. In fact, the delta method will fail because the gradient of $\zeta_{*}$ with respect to the portfolio is zero at $\boldsymbol{\nu}_{R,*}$. One solution to this puzzle is to estimate the ‘signal-noise ratio,’ incorporating a strictly positive $r_{0}$. In this case a portfolio may achieve a higher value than $\zeta_{*}-r_{0}/R$, which is achieved by $\boldsymbol{\nu}_{R,*}$, by violating the risk budget. To push this argument forward, we construct a quadratic approximation to the signal-noise ratio function.

Suppose $r_{0}>0$, $R>0$, and assume the population parameters, $\mathsf{\Theta}$, are fixed. Define the signal-noise ratio as

\operatorname{SNR}\left(\boldsymbol{\nu};\mathsf{\Theta},r_{0}\right) =_{\operatorname{df}} \frac{\boldsymbol{\nu}^{\top}\boldsymbol{\mu}-r_{0}}{\sqrt{\boldsymbol{\nu}^{\top}\mathsf{\Sigma}\boldsymbol{\nu}}}. \qquad (19)

We will usually drop the dependence on $\mathsf{\Theta}$ and $r_{0}$ and simply write $\operatorname{SNR}\left(\boldsymbol{\nu}\right)$. Defining $\boldsymbol{\nu}_{R,*}$ as in Equation 13, note that

\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)=\zeta_{*}-r_{0}/R.
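As a quick numerical sketch of this identity (with invented parameters and a hypothetical $r_{0}$ and $R$):

import numpy as np

mu = np.array([0.05, 0.03, 0.04])
Sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.030, 0.005],
                  [0.004, 0.005, 0.020]])
r0, R = 0.01, 0.10

def snr(nu):
    """The signal-noise ratio of portfolio nu, Equation 19."""
    return (nu @ mu - r0) / np.sqrt(nu @ Sigma @ nu)

zeta_star = np.sqrt(mu @ np.linalg.solve(Sigma, mu))
nu_R = (R / zeta_star) * np.linalg.solve(Sigma, mu)
assert np.isclose(snr(nu_R), zeta_star - r0 / R)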
Lemma 2.8 (Quadratic Taylor expansion of signal-noise ratio).

Let $\boldsymbol{\nu}_{R,*}$ be defined in Equation 13, and let $\operatorname{SNR}\left(\boldsymbol{\nu}\right)$ be defined as in Equation 19. Then

\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}+\boldsymbol{\epsilon}\right)=\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)+\frac{r_{0}}{R^{2}\zeta_{*}}\boldsymbol{\mu}^{\top}\boldsymbol{\epsilon}+\frac{1}{2}\frac{1}{R^{2}}\boldsymbol{\epsilon}^{\top}\left(\left(\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)-2\frac{r_{0}}{R}\right)\frac{\boldsymbol{\mu}\boldsymbol{\mu}^{\top}}{\zeta_{*}^{2}}-\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)\mathsf{\Sigma}\right)\boldsymbol{\epsilon}+\ldots
Proof.

By Taylor’s theorem,

\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}+\boldsymbol{\epsilon}\right)=\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)+\left(\left.\frac{\mathrm{d}\operatorname{SNR}\left(\boldsymbol{x}\right)}{\mathrm{d}\boldsymbol{x}}\right|_{\boldsymbol{x}=\boldsymbol{\nu}_{R,*}}\right)\boldsymbol{\epsilon}+\frac{1}{2}\boldsymbol{\epsilon}^{\top}\left(\left.{H}_{\boldsymbol{x}}\operatorname{SNR}\left(\boldsymbol{x}\right)\right|_{\boldsymbol{x}=\boldsymbol{\nu}_{R,*}}\right)\boldsymbol{\epsilon}+\ldots

By simple calculus,

\frac{\mathrm{d}\operatorname{SNR}\left(\boldsymbol{\nu}\right)}{\mathrm{d}\boldsymbol{\nu}}=\frac{\sqrt{\boldsymbol{\nu}^{\top}\mathsf{\Sigma}\boldsymbol{\nu}}\,\boldsymbol{\mu}-\frac{\boldsymbol{\nu}^{\top}\boldsymbol{\mu}-r_{0}}{\sqrt{\boldsymbol{\nu}^{\top}\mathsf{\Sigma}\boldsymbol{\nu}}}\mathsf{\Sigma}\boldsymbol{\nu}}{\boldsymbol{\nu}^{\top}\mathsf{\Sigma}\boldsymbol{\nu}}=\frac{\sqrt{\boldsymbol{\nu}^{\top}\mathsf{\Sigma}\boldsymbol{\nu}}\,\boldsymbol{\mu}-\operatorname{SNR}\left(\boldsymbol{\nu}\right)\mathsf{\Sigma}\boldsymbol{\nu}}{\boldsymbol{\nu}^{\top}\mathsf{\Sigma}\boldsymbol{\nu}}. \qquad (20)

To compute the Hessian, take the derivative of this gradient:

H\operatorname{SNR}=\frac{-\left[\boldsymbol{\mu}\boldsymbol{\hat{\nu}}^{\top}\mathsf{\Sigma}+\mathsf{\Sigma}\boldsymbol{\hat{\nu}}\boldsymbol{\mu}^{\top}-3\operatorname{SNR}\left(\boldsymbol{\hat{\nu}}\right)\frac{\mathsf{\Sigma}\boldsymbol{\hat{\nu}}\boldsymbol{\hat{\nu}}^{\top}\mathsf{\Sigma}}{\sqrt{\boldsymbol{\hat{\nu}}^{\top}\mathsf{\Sigma}\boldsymbol{\hat{\nu}}}}+\operatorname{SNR}\left(\boldsymbol{\hat{\nu}}\right)\sqrt{\boldsymbol{\hat{\nu}}^{\top}\mathsf{\Sigma}\boldsymbol{\hat{\nu}}}\,\mathsf{\Sigma}\right]}{\left(\boldsymbol{\hat{\nu}}^{\top}\mathsf{\Sigma}\boldsymbol{\hat{\nu}}\right)^{3/2}}. \qquad (21)

At $\boldsymbol{\hat{\nu}}=\boldsymbol{\nu}_{R,*}=\left(R/\sqrt{\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}}\right)\mathsf{\Sigma}^{-1}\boldsymbol{\mu}$, the derivative takes value

\left.\frac{\mathrm{d}\operatorname{SNR}\left(\boldsymbol{x}\right)}{\mathrm{d}\boldsymbol{x}}\right|_{\boldsymbol{x}=\boldsymbol{\nu}_{R,*}}=\frac{R\boldsymbol{\mu}-\left(\zeta_{*}-r_{0}/R\right)\frac{R}{\zeta_{*}}\boldsymbol{\mu}}{R^{2}}=\frac{r_{0}}{R^{2}\zeta_{*}}\boldsymbol{\mu},

and the Hessian takes value

\left.{H}_{\boldsymbol{x}}\operatorname{SNR}\left(\boldsymbol{x}\right)\right|_{\boldsymbol{x}=\boldsymbol{\nu}_{R,*}}=\frac{\frac{1}{\zeta_{*}^{2}}\left(\zeta_{*}-3\frac{r_{0}}{R}\right)\boldsymbol{\mu}\boldsymbol{\mu}^{\top}-\left(\zeta_{*}-\frac{r_{0}}{R}\right)\mathsf{\Sigma}}{R^{2}},

completing the proof. ∎
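A finite-difference sketch of the gradient and Hessian values just derived, with invented parameters:

import numpy as np

mu = np.array([0.05, 0.03, 0.04])
Sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.030, 0.005],
                  [0.004, 0.005, 0.020]])
r0, R = 0.01, 0.10
p = len(mu)

snr = lambda nu: (nu @ mu - r0) / np.sqrt(nu @ Sigma @ nu)
zeta = np.sqrt(mu @ np.linalg.solve(Sigma, mu))
nu_R = (R / zeta) * np.linalg.solve(Sigma, mu)

# gradient and Hessian of SNR at nu_R as stated in Lemma 2.8
grad = (r0 / (R**2 * zeta)) * mu
hess = ((zeta - 3 * r0 / R) * np.outer(mu, mu) / zeta**2 - (zeta - r0 / R) * Sigma) / R**2

# central finite differences
h = 1e-5
E = np.eye(p)
fd_grad = np.array([(snr(nu_R + h * E[i]) - snr(nu_R - h * E[i])) / (2 * h) for i in range(p)])
fd_hess = np.array([[(snr(nu_R + h * (E[i] + E[j])) - snr(nu_R + h * (E[i] - E[j]))
                      - snr(nu_R - h * (E[i] - E[j])) + snr(nu_R - h * (E[i] + E[j]))) / (4 * h**2)
                     for j in range(p)] for i in range(p)])

assert np.allclose(grad, fd_grad, atol=1e-6)
assert np.allclose(hess, fd_hess, atol=1e-3)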

Combining Lemma 2.8 and Corollary 2.7, we get the following:

Corollary 2.9.

Let $\boldsymbol{\nu}_{R,*}$ and $\boldsymbol{\hat{\nu}}_{R,*}$ be defined as in Corollary 2.7. As per Lemma 2.6, $\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)=\zeta_{*}-r_{0}/R$. Let $\mathsf{\Omega}$ be the variance of $\operatorname{vech}\left(\boldsymbol{\tilde{x}}\boldsymbol{\tilde{x}}^{\top}\right)$.

Then, asymptotically in $n$,

\operatorname{SNR}\left(\boldsymbol{\hat{\nu}}_{R,*}\right)\rightsquigarrow\mathcal{N}\left(\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right),\frac{1}{n}\boldsymbol{h}^{\top}\mathsf{\Omega}\boldsymbol{h}\right), \qquad (22)

where

\boldsymbol{h}^{\top}=-\frac{r_{0}}{R\zeta_{*}^{2}}\left[\frac{1}{2},\boldsymbol{\mu}^{\top},\boldsymbol{0}\right]\left(-\mathsf{L}\left(\mathsf{\Theta}^{-1}\otimes\mathsf{\Theta}^{-1}\right)\mathsf{D}\right). \qquad (23)

Moreover, we may express $\boldsymbol{h}^{\top}$ as

\boldsymbol{h}^{\top}=\frac{r_{0}}{2R\zeta_{*}^{2}}\left(\left[1+\zeta_{*}^{2},-\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\right]\otimes\left[1-\zeta_{*}^{2},\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\right]\right)\mathsf{D}. \qquad (24)
Proof.

By the delta method, and the chain rule,

\operatorname{SNR}\left(\boldsymbol{\hat{\nu}}_{R,*}\right)\rightsquigarrow\mathcal{N}\left(\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right),\frac{1}{n}\boldsymbol{h}^{\top}\mathsf{\Omega}\boldsymbol{h}\right),\quad \boldsymbol{h}^{\top}=\frac{\mathrm{d}\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)}{\mathrm{d}\boldsymbol{\nu}_{R,*}}^{\top}\frac{\mathrm{d}\boldsymbol{\nu}_{R,*}}{\mathrm{d}\operatorname{vech}\left(\mathsf{\Theta}\right)}.

From Corollary 2.7, we have

\boldsymbol{h}^{\top}=\frac{\mathrm{d}\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)}{\mathrm{d}\boldsymbol{\nu}_{R,*}}^{\top}\left[-\frac{1}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},-\frac{R}{\zeta_{*}}\mathsf{I}_{p},\mathsf{0}\right]\left(-\mathsf{L}\left(\mathsf{\Theta}^{-1}\otimes\mathsf{\Theta}^{-1}\right)\mathsf{D}\right).

Taking the derivative of $\operatorname{SNR}\left(\cdot\right)$ from Lemma 2.8,

\begin{split} \frac{\mathrm{d}\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)}{\mathrm{d}\boldsymbol{\nu}_{R,*}}^{\top}\left[-\frac{1}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},-\frac{R}{\zeta_{*}}\mathsf{I}_{p},\mathsf{0}\right] &= -\frac{r_{0}}{R^{2}\zeta_{*}}\boldsymbol{\mu}^{\top}\left[\frac{1}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},\frac{R}{\zeta_{*}}\mathsf{I}_{p},\mathsf{0}\right],\\ &= -\frac{r_{0}}{R\zeta_{*}^{2}}\left[\frac{1}{2},\boldsymbol{\mu}^{\top},\boldsymbol{0}\right]. \end{split} \qquad (25)

To establish Equation 24, one proceeds as in the proof of Corollary 2.7. ∎

Caution.

Since $\boldsymbol{\mu}$ and $\mathsf{\Sigma}$ are population parameters, $\operatorname{SNR}\left(\boldsymbol{\hat{\nu}}_{R,*}\right)$ is an unobserved quantity. Nevertheless, we can estimate the variance of $\operatorname{SNR}\left(\boldsymbol{\hat{\nu}}_{R,*}\right)$, and possibly construct confidence intervals on it using sample statistics.

This corollary is useless in the case where $r_{0}=0$, and gives somewhat puzzling results when one considers $r_{0}\searrow 0$, since it suggests that the variance of $\operatorname{SNR}\left(\boldsymbol{\hat{\nu}}_{R,*}\right)$ goes to zero. This is not the case, because the rate of convergence in the corollary is a function of $r_{0}$. To consider the $r_{0}=0$ case, one must take the quadratic Taylor expansion of the signal-noise ratio function.

Corollary 2.10.

Suppose $R>0$, and $r_{0}=0$, thus

\operatorname{SNR}\left(\boldsymbol{\hat{\nu}}\right) =_{\operatorname{df}} \frac{\boldsymbol{\hat{\nu}}^{\top}\boldsymbol{\mu}}{\sqrt{\boldsymbol{\hat{\nu}}^{\top}\mathsf{\Sigma}\boldsymbol{\hat{\nu}}}}. \qquad (26)

Let $\boldsymbol{\nu}_{R,*}$ and $\boldsymbol{\hat{\nu}}_{R,*}$ be defined as in Corollary 2.7. As per Lemma 2.6, $\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)=\zeta_{*}$, where we take $r_{0}=0$. Let $\mathsf{\Omega}$ be the variance of $\operatorname{vech}\left(\boldsymbol{\tilde{x}}\boldsymbol{\tilde{x}}^{\top}\right)$.

Then, asymptotically in $n$,

n\left[\operatorname{SNR}\left(\boldsymbol{\hat{\nu}}_{R,*}\right)-\operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)\right]\rightsquigarrow\frac{1}{2}\operatorname{tr}\left(\left(\mathsf{H}\mathsf{\Omega}^{1/2}\right)^{\top}\mathsf{F}\left(\mathsf{H}\mathsf{\Omega}^{1/2}\right)\boldsymbol{z}\boldsymbol{z}^{\top}\right), \qquad (27)

where $\boldsymbol{z}\sim\mathcal{N}\left(\boldsymbol{0},\mathsf{I}_{p}\right)$, and where

\begin{split} \mathsf{F} &= \frac{1}{R^{2}}\left(\frac{\boldsymbol{\mu}\boldsymbol{\mu}^{\top}}{\zeta_{*}}-\zeta_{*}\mathsf{\Sigma}\right),\\ \mathsf{H} &= -\left[\frac{1}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},\frac{R}{\zeta_{*}}\mathsf{I}_{p},\mathsf{0}\right]\left(-\mathsf{L}\left(\mathsf{\Theta}^{-1}\otimes\mathsf{\Theta}^{-1}\right)\mathsf{D}\right), \end{split}

as in Corollary 2.7.

Proof.

From Lemma 2.8

\begin{split} \operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}+\boldsymbol{\epsilon}\right) &= \operatorname{SNR}\left(\boldsymbol{\nu}_{R,*}\right)+\frac{1}{2}\boldsymbol{\epsilon}^{\top}\mathsf{F}\boldsymbol{\epsilon}+\ldots,\\ \mathsf{F} &= \frac{1}{R^{2}}\left(\frac{\boldsymbol{\mu}\boldsymbol{\mu}^{\top}}{\zeta_{*}}-\zeta_{*}\mathsf{\Sigma}\right). \end{split}

By Corollary 2.7,

\boldsymbol{\epsilon}=\boldsymbol{\hat{\nu}}_{R,*}-\boldsymbol{\nu}_{R,*}\rightsquigarrow\mathcal{N}\left(0,\frac{1}{n}\mathsf{H}\mathsf{\Omega}\mathsf{H}^{\top}\right),

so, asymptotically, $\boldsymbol{\epsilon}\rightsquigarrow\frac{1}{\sqrt{n}}\left(\mathsf{H}\mathsf{\Omega}^{1/2}\right)\boldsymbol{z}$, where $\boldsymbol{z}\sim\mathcal{N}\left(\boldsymbol{0},\mathsf{I}_{p}\right)$, and the result follows. ∎
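A Monte Carlo sketch of the content of this corollary: because the linear term vanishes at $\boldsymbol{\nu}_{R,*}$, the signal-noise-ratio shortfall of the sample portfolio shrinks like $1/n$ rather than $1/\sqrt{n}$. Gaussian returns and all parameter values below are invented for illustration:

import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.05, 0.03, 0.04])
Sigma = np.diag([0.04, 0.03, 0.02])
zeta_star = np.sqrt(mu @ np.linalg.solve(Sigma, mu))

def mean_shortfall(n, reps=1000):
    """Average of SNR(hat{nu}) - zeta_star over Monte Carlo replications."""
    total = 0.0
    for _ in range(reps):
        X = rng.multivariate_normal(mu, Sigma, size=n)
        mu_hat = X.mean(axis=0)
        Sigma_hat = np.cov(X, rowvar=False)
        nu_hat = np.linalg.solve(Sigma_hat, mu_hat)   # with r0 = 0 only the direction matters
        total += nu_hat @ mu / np.sqrt(nu_hat @ Sigma @ nu_hat) - zeta_star
    return total / reps

# doubling n should roughly halve the (negative) shortfall
print(mean_shortfall(250), mean_shortfall(500), mean_shortfall(1000))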

We now seek to link the sample estimate of the optimal achievable Sharpe ratio with the achieved signal-noise ratio of the sample Markowitz portfolio. The magnitude of $\hat{\zeta}_{*}^{2}$ (along with $n$) is the only information on which we might estimate whether the sample Markowitz portfolio is any good. If we view them both as functions of $\mathsf{\Theta}^{-1}$, we can find their expected values and any covariance between them. Somewhat surprisingly, the two quantities are asymptotically almost uncorrelated.

Thus for the following theorem, let us abuse notation to express the signal-noise ratio function, defined in Equation 26, as a function of some vector:

\operatorname{SNR}\left(\boldsymbol{x};\mathsf{\Theta},r_{0}\right)=\frac{\left(\left[\boldsymbol{0},\mathsf{I}_{p},\mathsf{0}\right]\boldsymbol{x}\right)^{\top}\boldsymbol{\mu}-r_{0}}{\sqrt{\left(\left[\boldsymbol{0},\mathsf{I}_{p},\mathsf{0}\right]\boldsymbol{x}\right)^{\top}\mathsf{\Sigma}\left(\left[\boldsymbol{0},\mathsf{I}_{p},\mathsf{0}\right]\boldsymbol{x}\right)}}. \qquad (28)

Then $\operatorname{SNR}\left(\operatorname{vech}\left(\mathsf{\hat{\Theta}}^{-1}\right);\mathsf{\Theta},r_{0}\right)$ corresponds to the definition in Equation 26. Moreover, define the “optimal signal-noise ratio” function, also as a function of a vector, as

{\operatorname{SNR}}_{*}\left(\boldsymbol{x}\right)=\sqrt{\boldsymbol{e}_{1}^{\top}\boldsymbol{x}-1}. \qquad (29)

We have ${\operatorname{SNR}}_{*}\left(\operatorname{vech}\left(\mathsf{\Theta}^{-1}\right)\right)=\zeta_{*}$, as desired. In an abuse of notation we will simply write $\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},r_{0}\right)$ and ${\operatorname{SNR}}_{*}\left(\mathsf{\hat{\Theta}}^{-1}\right)$ instead of writing out the vector function.

Theorem 2.11.

Let $\mathsf{\hat{\Theta}}$ be the unbiased sample estimate of $\mathsf{\Theta}$, based on $n$ i.i.d. samples of $\boldsymbol{x}$. Assume $\zeta_{*}>0$. Let $\mathsf{\Omega}$ be the variance of $\operatorname{vech}\left(\boldsymbol{\tilde{x}}\boldsymbol{\tilde{x}}^{\top}\right)$. Define $\operatorname{SNR}\left(\cdot\right)$ and ${\operatorname{SNR}}_{*}\left(\cdot\right)$ as in Equation 28 and Equation 29. Then, asymptotically in $n$,

\begin{split} \operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right) &\rightsquigarrow \zeta_{*}+\frac{1}{2n}\operatorname{tr}\left(\left(\mathsf{H}\mathsf{\Omega}^{1/2}\right)^{\top}\mathsf{F}\left(\mathsf{H}\mathsf{\Omega}^{1/2}\right)\boldsymbol{z}\boldsymbol{z}^{\top}\right),\\ \hat{\zeta}_{*}={\operatorname{SNR}}_{*}\left(\mathsf{\hat{\Theta}}^{-1}\right) &\rightsquigarrow \zeta_{*}+\frac{\boldsymbol{h}^{\top}\mathsf{\Omega}^{1/2}\boldsymbol{z}}{2\zeta_{*}\sqrt{n}}-\frac{\operatorname{tr}\left(\left(\boldsymbol{h}^{\top}\mathsf{\Omega}^{1/2}\right)^{\top}\left(\boldsymbol{h}^{\top}\mathsf{\Omega}^{1/2}\right)\boldsymbol{z}\boldsymbol{z}^{\top}\right)}{8\zeta_{*}^{3}n},\\ \hat{\zeta}_{*}^{-1}={\operatorname{SNR}}_{*}\left(\mathsf{\hat{\Theta}}^{-1}\right)^{-1} &\rightsquigarrow \zeta_{*}^{-1}\left(1-\frac{\boldsymbol{h}^{\top}\mathsf{\Omega}^{1/2}\boldsymbol{z}}{2\zeta_{*}^{2}\sqrt{n}}+\frac{3\operatorname{tr}\left(\left(\boldsymbol{h}^{\top}\mathsf{\Omega}^{1/2}\right)^{\top}\left(\boldsymbol{h}^{\top}\mathsf{\Omega}^{1/2}\right)\boldsymbol{z}\boldsymbol{z}^{\top}\right)}{8\zeta_{*}^{4}n}\right), \end{split} \qquad (30)

where $\boldsymbol{z}\sim\mathcal{N}\left(\boldsymbol{0},\mathsf{I}_{p}\right)$, where the matrices $\mathsf{H}$ and $\mathsf{F}$ are as given in Corollary 2.10, and where

\boldsymbol{h}^{\top}=-\left(\left[1+\zeta_{*}^{2},-\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\right]\otimes\left[1+\zeta_{*}^{2},-\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}\right]\right)\mathsf{D}.
Proof.

The distribution of $\operatorname{SNR}\left(\boldsymbol{x};\mathsf{\Theta},0\right)$ is a restatement of Corollary 2.10. To prove the distribution of $\hat{\zeta}_{*}$, perform a Taylor series expansion of the function ${\operatorname{SNR}}_{*}\left(\cdot\right)$:

\begin{split} {\operatorname{SNR}}_{*}\left(\boldsymbol{x}+\boldsymbol{\epsilon}\right) &= {\operatorname{SNR}}_{*}\left(\boldsymbol{x}\right)+\frac{\mathrm{d}{\operatorname{SNR}}_{*}\left(\boldsymbol{x}\right)}{\mathrm{d}\boldsymbol{x}}\boldsymbol{\epsilon}+\frac{1}{2}\boldsymbol{\epsilon}^{\top}{H}_{\boldsymbol{x}}{\operatorname{SNR}}_{*}\left(\boldsymbol{x}\right)\boldsymbol{\epsilon}+\ldots\\ &= {\operatorname{SNR}}_{*}\left(\boldsymbol{x}\right)+\frac{\boldsymbol{e}_{1}^{\top}\boldsymbol{\epsilon}}{2{\operatorname{SNR}}_{*}\left(\boldsymbol{x}\right)}-\frac{1}{2}\boldsymbol{\epsilon}^{\top}\frac{\boldsymbol{e}_{1}\boldsymbol{e}_{1}^{\top}}{4{\operatorname{SNR}}_{*}\left(\boldsymbol{x}\right)^{3}}\boldsymbol{\epsilon}+\ldots \end{split}

By similar reasoning,

\frac{1}{{\operatorname{SNR}}_{*}\left(\boldsymbol{x}+\boldsymbol{\epsilon}\right)} = \frac{1}{{\operatorname{SNR}}_{*}\left(\boldsymbol{x}\right)}-\frac{\boldsymbol{e}_{1}^{\top}\boldsymbol{\epsilon}}{2{\operatorname{SNR}}_{*}\left(\boldsymbol{x}\right)^{3}}+\frac{1}{2}\boldsymbol{\epsilon}^{\top}\frac{3\boldsymbol{e}_{1}\boldsymbol{e}_{1}^{\top}}{4{\operatorname{SNR}}_{*}\left(\boldsymbol{x}\right)^{5}}\boldsymbol{\epsilon}+\ldots

Now by Theorem 2.5, we know the asymptotic distribution of $\mathsf{\hat{\Theta}}^{-1}$, and thus we get the claimed distributional form for some $\boldsymbol{h}$, with

\boldsymbol{h}^{\top}=\boldsymbol{e}_{1}^{\top}\left(-\mathsf{L}\left(\mathsf{\Theta}^{-1}\otimes\mathsf{\Theta}^{-1}\right)\mathsf{D}\right).

We can eliminate the $\mathsf{L}$ and write the $\boldsymbol{e}_{1}$ as a $\left(p+1\right)^{2}$ length vector, or rather as $\boldsymbol{e}_{1}^{\top}\otimes\boldsymbol{e}_{1}^{\top}$, where this $\boldsymbol{e}_{1}$ is a $\left(p+1\right)$ length vector. The product of Kronecker products is the Kronecker product of matrix products, so

\boldsymbol{h}^{\top}=-\left(\boldsymbol{e}_{1}^{\top}\mathsf{\Theta}^{-1}\right)\otimes\left(\boldsymbol{e}_{1}^{\top}\mathsf{\Theta}^{-1}\right)\mathsf{D},

establishing the identity of $\boldsymbol{h}$. ∎

This theorem suggests the use of the observable quantity $\hat{\zeta}_{*}$ to perform inference on the unknown quantity, $\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right)$. Asymptotically they have low correlation, and small error between them, which we quantify in the following corollaries.

Corollary 2.12.

The covariance of these two quantities is asymptotically $\mathcal{O}\left(n^{-2}\right)$:

\operatorname{Cov}\left(\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right),{\operatorname{SNR}}_{*}\left(\mathsf{\hat{\Theta}}^{-1}\right)\right)\rightsquigarrow\mathcal{O}\left(n^{-2}\right), \qquad (31)

and their correlation is asymptotically $\mathcal{O}\left(n^{-1/2}\right)$.

Proof.

To find the moments of $\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right)$ and $\hat{\zeta}_{*}$, we take $\boldsymbol{z}$ to be multivariate standard normal, and thus odd powered products have zero expectation. Their covariance comes entirely from the product of the quadratic (in $\boldsymbol{z}$) terms. By the same logic, the asymptotic standard error of $\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right)$ is $\mathcal{O}\left(n^{-1}\right)$, while that of $\hat{\zeta}_{*}$ is $\mathcal{O}\left(n^{-1/2}\right)$. ∎

Corollary 2.13.

The difference $\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right)-\hat{\zeta}_{*}$ has the following asymptotic mean and variance:

\begin{split} \operatorname{E}\left[\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right)-\hat{\zeta}_{*}\right] &\rightsquigarrow \frac{1}{2n}\operatorname{tr}\left(\left(\mathsf{H}^{\top}\mathsf{F}\mathsf{H}+\frac{\boldsymbol{h}\boldsymbol{h}^{\top}}{4\zeta_{*}^{3}}\right)\mathsf{\Omega}\right), \qquad (32)\\ \operatorname{Var}\left(\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right)-\hat{\zeta}_{*}\right) &\rightsquigarrow \frac{1}{4\zeta_{*}^{2}n}\boldsymbol{h}^{\top}\mathsf{\Omega}\boldsymbol{h}, \qquad (33) \end{split}

where $\mathsf{H}$, $\mathsf{F}$, and $\boldsymbol{h}$ are given in the theorem. Their ratio has the following asymptotic mean and variance:

\begin{split} \operatorname{E}\left[\frac{\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right)}{\hat{\zeta}_{*}}\right] &\rightsquigarrow 1+\frac{1}{2n\zeta_{*}}\operatorname{tr}\left(\left(\mathsf{H}^{\top}\mathsf{F}\mathsf{H}+\frac{3\boldsymbol{h}\boldsymbol{h}^{\top}}{4\zeta_{*}^{3}}\right)\mathsf{\Omega}\right), \qquad (34)\\ \operatorname{Var}\left(\frac{\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right)}{\hat{\zeta}_{*}}\right) &\rightsquigarrow \frac{1}{4\zeta_{*}^{4}n}\boldsymbol{h}^{\top}\mathsf{\Omega}\boldsymbol{h}. \qquad (35) \end{split}

This corollary gives a recipe for building confidence intervals on $\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right)$ via the observed $\hat{\zeta}_{*}$, namely by plugging in sample estimates for $\mathsf{\Omega}$ and $\mathsf{H}$, $\mathsf{F}$ and $\boldsymbol{h}$. This result is comparable to the “Sharpe Ratio Information Criterion” estimator for $\operatorname{SNR}\left(\mathsf{\hat{\Theta}}^{-1};\mathsf{\Theta},0\right)$. [51]
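A sketch of that recipe, plugging sample estimates into Corollary 2.13 to produce an approximate confidence interval for the attained signal-noise ratio; the data are simulated Gaussian returns with invented parameters, $\mathsf{H}$ is built from Equation 17, $\boldsymbol{h}$ from Theorem 2.11, and the risk budget is set to $R=1$ since it cancels in $\mathsf{H}^{\top}\mathsf{F}\mathsf{H}$:

import numpy as np

rng = np.random.default_rng(5)
p, n = 3, 1000
mu_true = np.array([0.05, 0.03, 0.04])
Sigma_true = np.diag([0.04, 0.03, 0.02])
X = rng.multivariate_normal(mu_true, Sigma_true, size=n)

q = p + 1
X_tilde = np.hstack([np.ones((n, 1)), X])
Theta = X_tilde.T @ X_tilde / n                    # plug-in estimate of Theta
iTheta = np.linalg.inv(Theta)
zeta2 = iTheta[0, 0] - 1.0
zeta = np.sqrt(zeta2)
mu = Theta[1:, 0]
Sigma = Theta[1:, 1:] - np.outer(mu, mu)
iSigma = iTheta[1:, 1:]

# elimination / duplication matrices for q x q symmetric matrices
idx = [(i, j) for j in range(q) for i in range(j, q)]
m = len(idx)
L, D = np.zeros((m, q * q)), np.zeros((q * q, m))
for r, (i, j) in enumerate(idx):
    L[r, j * q + i] = 1.0
    D[j * q + i, r] = D[i * q + j, r] = 1.0

# plug-in Omega: sample covariance of vech(x_tilde x_tilde')
V = np.array([L @ np.outer(xt, xt).reshape(-1, order="F") for xt in X_tilde])
Omega = np.cov(V, rowvar=False)

# plug-in H (Equation 17), F (Corollary 2.10) and h (Theorem 2.11)
R = 1.0
nu_R = (R / zeta) * (iSigma @ mu)
bracket = np.hstack([nu_R[:, None] / (2 * zeta2), (R / zeta) * np.eye(p), np.zeros((p, m - q))])
H = bracket @ L @ np.kron(iTheta, iTheta) @ D
F = (np.outer(mu, mu) / zeta - zeta * Sigma) / R**2
h = -np.kron(iTheta[0], iTheta[0]) @ D

# Corollary 2.13: mean and standard deviation of SNR(hat{Theta}^{-1}) - hat{zeta}
mean_gap = np.trace((H.T @ F @ H + np.outer(h, h) / (4 * zeta**3)) @ Omega) / (2 * n)
sd_gap = np.sqrt(h @ Omega @ h / (4 * zeta2 * n))

# approximate 95% interval for the signal-noise ratio attained by the sample portfolio
ci = (zeta + mean_gap - 1.96 * sd_gap, zeta + mean_gap + 1.96 * sd_gap)
print(zeta, mean_gap, ci)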

To use this corollary, one may need a compact expression for $\mathsf{H}^{\top}\mathsf{F}\mathsf{H}$. We start with Equation 18, and write

\mathsf{H}^{\top}\mathsf{F}\mathsf{H}=\mathsf{H}^{\top}\left(\left(\boldsymbol{e}_{1}^{\top}\mathsf{\Theta}^{-1}\right)\otimes\mathsf{F}\left[\frac{1-\zeta_{*}^{2}}{2\zeta_{*}^{2}}\boldsymbol{\nu}_{R,*},\frac{R}{\zeta_{*}}\left(\mathsf{\Sigma}^{-1}-\frac{\mathsf{\Sigma}^{-1}\boldsymbol{\mu}\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}}{2\zeta_{*}^{2}}\right)\right]\right)\mathsf{D}.

But note that $\mathsf{F}\mathsf{\Sigma}^{-1}\boldsymbol{\mu}=\boldsymbol{0}$ (and so $\mathsf{F}\boldsymbol{\nu}_{R,*}=\boldsymbol{0}$), so we may write

\begin{split} \mathsf{H}^{\top}\mathsf{F}\mathsf{H} &= \mathsf{H}^{\top}\left(\left(\boldsymbol{e}_{1}^{\top}\mathsf{\Theta}^{-1}\right)\otimes\left[\boldsymbol{0},\frac{R}{\zeta_{*}}\mathsf{F}\mathsf{\Sigma}^{-1}\right]\right)\mathsf{D},\\ &= \frac{1}{R}\mathsf{H}^{\top}\left(\left(\boldsymbol{e}_{1}^{\top}\mathsf{\Theta}^{-1}\right)\otimes\left[\boldsymbol{0},\frac{\boldsymbol{\mu}\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}}{\zeta_{*}^{2}}-\mathsf{I}\right]\right)\mathsf{D},\\ &= \frac{1}{\zeta_{*}}\mathsf{D}^{\top}\left(\left(\mathsf{\Theta}^{-1}\boldsymbol{e}_{1}\boldsymbol{e}_{1}^{\top}\mathsf{\Theta}^{-1}\right)\otimes\left[\begin{array}{cc} 0 & \boldsymbol{0}^{\top} \\ \boldsymbol{0} & \frac{\mathsf{\Sigma}^{-1}\boldsymbol{\mu}\boldsymbol{\mu}^{\top}\mathsf{\Sigma}^{-1}}{\zeta_{*}^{2}}-\mathsf{\Sigma}^{-1} \end{array}\right]\right)\mathsf{D}. \end{split} \qquad (38)

Similarly, a compact form for $\boldsymbol{h}\boldsymbol{h}^{\top}$:

\boldsymbol{h}\boldsymbol{h}^{\top}=\mathsf{D}^{\top}\left(\left(\mathsf{\Theta}^{-1}\boldsymbol{e}_{1}\boldsymbol{e}_{1}^{\top}\mathsf{\Theta}^{-1}\right)\otimes\left(\mathsf{\Theta}^{-1}\boldsymbol{e}_{1}\boldsymbol{e}_{1}^{\top}\mathsf{\Theta}^{-1}\right)\right)\mathsf{D}. \qquad (39)

3 Distribution under Gaussian returns

The goal of this section is to derive a variant of Theorem 2.5 for the case where $\boldsymbol{x}$ follows a multivariate Gaussian distribution. First, assuming $\boldsymbol{x}\sim\mathcal{N}\left(\boldsymbol{\mu},\mathsf{\Sigma}\right)$, we can express the density of $\boldsymbol{x}$, and of $\mathsf{\hat{\Theta}}$, in terms of $p$, $n$, and $\mathsf{\Theta}$.

Lemma 3.1 (Gaussian sample density).

Suppose $\boldsymbol{x}\sim\mathcal{N}\left(\boldsymbol{\mu},\mathsf{\Sigma}\right)$. Letting $\boldsymbol{\tilde{x}}=\left[1,\boldsymbol{x}^{\top}\right]^{\top}$, and $\mathsf{\Theta}=\operatorname{E}\left[\boldsymbol{\tilde{x}}\boldsymbol{\tilde{x}}^{\top}\right]$, then the negative log likelihood of $\boldsymbol{x}$ is

-\log{f}_{\mathcal{N}}\left(\boldsymbol{x};\boldsymbol{\mu},\mathsf{\Sigma}\right)=c_{p}+\frac{1}{2}\log\left|\mathsf{\Theta}\right|+\frac{1}{2}\operatorname{tr}\left(\mathsf{\Theta}^{-1}\boldsymbol{\tilde{x}}\boldsymbol{\tilde{x}}^{\top}\right), \qquad (40)

for the constant $c_{p}=-\frac{1}{2}+\frac{p}{2}\log\left(2\pi\right)$.

Proof.

By the block determinant formula,

\left|\mathsf{\Theta}\right|=\left|1\right|\left|\left(\mathsf{\Sigma}+\boldsymbol{\mu}\boldsymbol{\mu}^{\top}\right)-\boldsymbol{\mu}1^{-1}\boldsymbol{\mu}^{\top}\right|=\left|\mathsf{\Sigma}\right|.

Note also that

\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\top}\mathsf{\Sigma}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)=\boldsymbol{\tilde{x}}^{\top}\mathsf{\Theta}^{-1}\boldsymbol{\tilde{x}}-1.

These relationships hold without assuming a particular distribution for $\boldsymbol{x}$.

The density of $\boldsymbol{x}$ is then

\begin{split} {f}_{\mathcal{N}}\left(\boldsymbol{x};\boldsymbol{\mu},\mathsf{\Sigma}\right) &= \frac{1}{\sqrt{\left(2\pi\right)^{p}\left|\mathsf{\Sigma}\right|}}\operatorname{exp}\left(-\frac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\top}\mathsf{\Sigma}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right),\\ &= \frac{\left|\mathsf{\Sigma}\right|^{-\frac{1}{2}}}{\left(2\pi\right)^{p/2}}\operatorname{exp}\left(-\frac{1}{2}\left(\boldsymbol{\tilde{x}}^{\top}\mathsf{\Theta}^{-1}\boldsymbol{\tilde{x}}-1\right)\right),\\ &= \left(2\pi\right)^{-p/2}\left|\mathsf{\Theta}\right|^{-\frac{1}{2}}\operatorname{exp}\left(-\frac{1}{2}\left(\boldsymbol{\tilde{x}}^{\top}\mathsf{\Theta}^{-1}\boldsymbol{\tilde{x}}-1\right)\right),\\ &= \left(2\pi\right)^{-p/2}\operatorname{exp}\left(\frac{1}{2}-\frac{1}{2}\log\left|\mathsf{\Theta}\right|-\frac{1}{2}\operatorname{tr}\left(\mathsf{\Theta}^{-1}\boldsymbol{\tilde{x}}\boldsymbol{\tilde{x}}^{\top}\right)\right), \end{split}

and the result follows. ∎
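A quick numerical sketch of the two identities used above (the dimension and parameter values are invented):

import numpy as np

rng = np.random.default_rng(6)
p = 4
mu = rng.standard_normal(p) / 10
B = rng.standard_normal((p, p))
Sigma = B @ B.T + np.eye(p)
Theta = np.block([[np.ones((1, 1)), mu[None, :]],
                  [mu[:, None], Sigma + np.outer(mu, mu)]])

# |Theta| = |Sigma|
assert np.isclose(np.linalg.det(Theta), np.linalg.det(Sigma))

# (x - mu)' Sigma^{-1} (x - mu) = x_tilde' Theta^{-1} x_tilde - 1
x = rng.standard_normal(p)
x_tilde = np.concatenate([[1.0], x])
lhs = (x - mu) @ np.linalg.solve(Sigma, x - mu)
rhs = x_tilde @ np.linalg.solve(Theta, x_tilde) - 1.0
assert np.isclose(lhs, rhs)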

Lemma 3.2 (Gaussian second moment matrix density).

Let $\boldsymbol{x}\sim\mathcal{N}\left(\boldsymbol{\mu},\mathsf{\Sigma}\right)$, $\boldsymbol{\tilde{x}}=\left[1,\boldsymbol{x}^{\top}\right]^{\top}$, and $\mathsf{\Theta}=\operatorname{E}\left[\boldsymbol{\tilde{x}}\boldsymbol{\tilde{x}}^{\top}\right]$. Given $n$ i.i.d. samples $\boldsymbol{x}_{i}$, let $\mathsf{\hat{\Theta}}=\frac{1}{n}\sum_{i}\boldsymbol{\tilde{x}}_{i}\boldsymbol{\tilde{x}}_{i}^{\top}$. Then the density of $\mathsf{\hat{\Theta}}$ is

f\left(\mathsf{\hat{\Theta}};\mathsf{\Theta}\right)=\operatorname{exp}\left(c^{\prime}_{n,p}\right)\frac{\left|\mathsf{\hat{\Theta}}\right|^{\frac{n-p-2}{2}}}{\left|\mathsf{\Theta}\right|^{\frac{n}{2}}}\operatorname{exp}\left(-\frac{n}{2}\operatorname{tr}\left(\mathsf{\Theta}^{-1}\mathsf{\hat{\Theta}}\right)\right), \qquad (41)

for some $c^{\prime}_{n,p}$.

Proof.

Let $\mathsf{\tilde{X}}$ be the matrix whose rows are the vectors $\boldsymbol{\tilde{x}}_{i}^{\top}$. From Lemma 3.1, and using linearity of the trace, the negative log density of $\mathsf{\tilde{X}}$ is

\begin{split} -\log{f}_{\mathcal{N}}\left(\mathsf{\tilde{X}};\mathsf{\Theta}\right) &= nc_{p}+\frac{n}{2}\log\left|\mathsf{\Theta}\right|+\frac{1}{2}\operatorname{tr}\left(\mathsf{\Theta}^{-1}\mathsf{\tilde{X}}^{\top}\mathsf{\tilde{X}}\right),\\ \therefore\ \frac{-2\log{f}_{\mathcal{N}}\left(\mathsf{\tilde{X}};\mathsf{\Theta}\right)}{n} &= 2c_{p}+\log\left|\mathsf{\Theta}\right|+\operatorname{tr}\left(\mathsf{\Theta}^{-1}\mathsf{\hat{\Theta}}\right). \end{split}

By Lemma (5.1.1) of Press [55], this can be expressed as a density on $\mathsf{\hat{\Theta}}$:

\begin{split} \frac{-2\log f\left(\mathsf{\hat{\Theta}};\mathsf{\Theta}\right)}{n} &= \frac{-2\log{f}_{\mathcal{N}}\left(\mathsf{\tilde{X}};\mathsf{\Theta}\right)}{n}-\frac{2}{n}\left(\frac{n-p-2}{2}\log\left|\mathsf{\hat{\Theta}}\right|\right)\\ &\phantom{=}\,-\frac{2}{n}\left(\frac{p+1}{2}\left(n-\frac{p}{2}\right)\log\pi-\sum_{j=1}^{p+1}\log\Gamma\left(\frac{n+1-j}{2}\right)\right),\\ &= \left[2c_{p}-\frac{p+1}{n}\left(n-\frac{p}{2}\right)\log\pi-\frac{2}{n}\sum_{j=1}^{p+1}\log\Gamma\left(\frac{n+1-j}{2}\right)\right]\\ &\phantom{=}\,+\log\left|\mathsf{\Theta}\right|-\frac{n-p-2}{n}\log\left|\mathsf{\hat{\Theta}}\right|+\operatorname{tr}\left(\mathsf{\Theta}^{-1}\mathsf{\hat{\Theta}}\right),\\ &= c^{\prime}_{n,p}-\log\frac{\left|\mathsf{\hat{\Theta}}\right|^{\frac{n-p-2}{n}}}{\left|\mathsf{\Theta}\right|}+\operatorname{tr}\left(\mathsf{\Theta}^{-1}\mathsf{\hat{\Theta}}\right), \end{split}

where $c^{\prime}_{n,p}$ is the term in brackets on the third line. Factoring out $-2/n$ and taking an exponent gives the result. ∎

Corollary 3.3.

The random variable $n\mathsf{\hat{\Theta}}$ has the same density, up to a constant in $p$ and $n$, as a $\left(p+1\right)$-dimensional Wishart random variable with $n$ degrees of freedom and scale matrix $\mathsf{\Theta}$. Thus $n\mathsf{\hat{\Theta}}$ is a conditional Wishart, conditional on $\mathsf{\hat{\Theta}}_{1,1}=1$. [55, 1]

Corollary 3.4.

The derivatives of log likelihood are given by

\begin{split} \frac{\mathrm{d}\log f\left(\mathsf{\hat{\Theta}};\mathsf{\Theta}\right)}{\mathrm{d}\operatorname{vec}\left(\mathsf{\Theta}\right)} &= -\frac{n}{2}\left[\operatorname{vec}\left(\mathsf{\Theta}^{-1}-\mathsf{\Theta}^{-1}\mathsf{\hat{\Theta}}\mathsf{\Theta}^{-1}\right)\right]^{\top},\\ \frac{\mathrm{d}\log f\left(\mathsf{\hat{\Theta}};\mathsf{\Theta}\right)}{\mathrm{d}\operatorname{vec}\left(\mathsf{\Theta}^{-1}\right)} &= -\frac{n}{2}\left[\operatorname{vec}\left(\mathsf{\Theta}-\mathsf{\hat{\Theta}}\right)\right]^{\top}. \end{split} \qquad (42)

Proof.

Plugging in the log likelihood gives

\frac{\mathrm{d}\log f\left(\mathsf{\hat{\Theta}};\mathsf{\Theta}\right)}{\mathrm{d}\operatorname{vec}\left(\mathsf{\Theta}\right)}=-\frac{n}{2}\left[\frac{\mathrm{d}\log\left|\mathsf{\Theta}\right|}{\mathrm{d}\operatorname{vec}\left(\mathsf{\Theta}\right)}+\frac{\mathrm{d}\operatorname{tr}\left(\mathsf{\Theta}^{-1}\mathsf{\hat{\Theta}}\right)}{\mathrm{d}\operatorname{vec}\left(\mathsf{\Theta}\right)}\right],

and then standard matrix calculus gives the first result. [37, 54] Proceeding similarly gives the second. ∎

This immediately gives us the Maximum Likelihood Estimator.

Corollary 3.5 (MLE).

$\mathsf{\hat{\Theta}}$ is the maximum likelihood estimator of $\mathsf{\Theta}$.

To compute the covariance of $\operatorname{vech}\left(\mathsf{\hat{\Theta}}\right)$, $\mathsf{\Omega}$, in the Gaussian case, one can compute the Fisher Information, then appeal to the fact that $\mathsf{\hat{\Theta}}$ is the MLE. However, because the first element of $\operatorname{vech}\left(\mathsf{\hat{\Theta}}\right)$ is a deterministic $1$, the first row and column of $\mathsf{\Omega}$ are all zeros. This is an unfortunate wrinkle. One solution would be to compute the Fisher Information with respect to the nonredundant variables; however, a direct brute-force approach is also possible, and gives a slightly more general result, as in the following section.

3.1 Distribution under elliptical returns

We pursue a slightly more general result on the distribution of Θ^\mathsf{\hat{\Theta}}, assuming that 𝒙\boldsymbol{x} are drawn independently from an elliptical distribution, with mean 𝝁\boldsymbol{\mu}, covariance Σ\mathsf{\Sigma}, and ‘kurtosis factor’ κ\kappa, by which we mean the excess kurtosis of each 𝒙i{\boldsymbol{x}}_{i} is 3(κ1)3\left(\kappa-1\right). In the Gaussian case, κ=1\kappa=1.

To be concrete, we suppose 𝒙=𝝁+aΛ1/2𝒛𝒛2\boldsymbol{x}=\boldsymbol{\mu}+\frac{a{{\mathsf{\Lambda}}^{1/2}}\boldsymbol{z}}{\left\|{\boldsymbol{z}}\right\|_{2}}, where 𝒩(𝟎,𝖨n)\boldsymbol{\sim}\mathcal{N}\left(\boldsymbol{0},{\mathsf{I}}_{n}\right), and where aa is a scalar random variable independent of 𝒛\boldsymbol{z}. The covariance Σ\mathsf{\Sigma} is related to the matrix Λ\mathsf{\Lambda} via

\mathsf{\Sigma}=\frac{\operatorname{E}\left[a^{2}\right]}{p}\mathsf{\Lambda}.

The kurtosis parameter is then defined as

\kappa=\frac{p}{p+2}\,\frac{\operatorname{E}\left[a^{4}\right]}{\left(\operatorname{E}\left[a^{2}\right]\right)^{2}}.

An extension of Isserlis’ Theorem to the elliptical distribution gives moments of products of elements of 𝒙\boldsymbol{x}. [64, 29] This result is comparable to the covariance of the centered second moments given as Equation (2.1) of Iwashita and Siotani, but is applicable to the uncentered second moment. [25]
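As a concrete illustration of this construction, the following Python sketch draws i.i.d. returns from the elliptical family above and forms the unbiased estimate Θ̂ = (1/n)∑ᵢ 𝒙̃ᵢ𝒙̃ᵢᵀ. The lognormal scale mixture used for the radial variable a, and the particular 𝝁, Σ, κ, and sample size, are illustrative assumptions of the sketch, not prescribed by the theory.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1234)
p, n = 4, 200000
mu = np.array([0.05, 0.03, 0.02, 0.04])
root = rng.normal(size=(p, p))
Sigma = root @ root.T + np.eye(p)              # an arbitrary positive definite covariance
kappa = 1.3                                    # target kurtosis factor

# scale mixture weights w with E[w] = 1 and E[w^2] = kappa (a lognormal choice, an assumption)
s2 = np.log(kappa)
w = rng.lognormal(mean=-s2 / 2.0, sigma=np.sqrt(s2), size=n)

# x = mu + a Lambda^{1/2} z / ||z||, with z ~ N(0, I_p) and a^2 = w * chi^2_p,
# so that E[a^2] = p and hence Lambda = Sigma
z = rng.normal(size=(n, p))
a = np.sqrt(w * rng.chisquare(p, size=n))
Lhalf = np.linalg.cholesky(Sigma)
x = mu + (a / np.linalg.norm(z, axis=1))[:, None] * (z @ Lhalf.T)

# augmented vector x-tilde = [1, x']' and the unbiased second moment estimate
xtil = np.column_stack([np.ones(n), x])
Theta_hat = xtil.T @ xtil / n

# sample Markowitz portfolio and squared Sharpe read off the inverse of Theta-hat
Theta_inv = np.linalg.inv(Theta_hat)
nu_hat = -Theta_inv[1:, 0]
zeta2_hat = Theta_inv[0, 0] - 1.0

# sanity check: elementwise excess kurtosis should be near 3 (kappa - 1), i.e. 0.9 here
print(stats.kurtosis(x, axis=0))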

Theorem 3.6.

Let Θ^\mathsf{\hat{\Theta}} be the unbiased sample estimate of Θ\mathsf{\Theta}, based on nn i.i.d. samples of 𝐱\boldsymbol{x}, assumed to take an elliptical distribution with kurtosis parameter κ\kappa. In analogue to how 𝐱~\boldsymbol{\tilde{x}} are built from 𝐱\boldsymbol{x}, define

𝝁~=df[1,𝝁],andΣ~=df[0𝟎𝟎Σ].\boldsymbol{\tilde{\mu}}=_{\operatorname{df}}{{\left[1,{{\boldsymbol{\mu}}^{\top}}\right]}^{\top}},\quad\mbox{and}\quad\mathsf{\tilde{\Sigma}}=_{\operatorname{df}}\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\mathsf{\Sigma}}\end{array}\right]. (43)

Note that Θ=Σ~+𝛍~𝛍~.\mathsf{\Theta}=\mathsf{\tilde{\Sigma}}+\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}. Then

nVar(vec(Θ^))=Ω0\displaystyle{n}\operatorname{Var}\left(\operatorname{vec}\left(\mathsf{\hat{\Theta}}\right)\right)={\mathsf{\Omega}}_{0} =(κ1)[vec(Σ~)vec(Σ~)+(𝖨+𝖪)Σ~Σ~]\displaystyle=\left(\kappa-1\right)\left[\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}+\left(\mathsf{I}+\mathsf{K}\right){\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}\right] (44)
+(𝖨+𝖪)[ΘΘ𝝁~𝝁~𝝁~𝝁~].\displaystyle\phantom{=}\,+\left(\mathsf{I}+\mathsf{K}\right)\left[{\mathsf{\Theta}}\otimes{\mathsf{\Theta}}-{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\otimes{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\right].

As it is cumbersome and unenlightening, we relegate the proof to the appendix. The central limit theorem then gives us the following corollary.

Corollary 3.7.

Under the conditions of the previous theorem, asymptotically in nn,

n(vech(Θ^)vech(Θ))𝒩(0,𝖫Ω0𝖫),\sqrt{n}\left(\operatorname{vech}\left(\mathsf{\hat{\Theta}}\right)-\operatorname{vech}\left(\mathsf{\Theta}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{L}{\mathsf{\Omega}}_{0}{{\mathsf{L}}^{\top}}\right), (45)

where Ω0{\mathsf{\Omega}}_{0} is defined in Equation 44.

Using Theorem 2.5, we also immediately get the following corollary.

Corollary 3.8.

Under the conditions of the previous theorem, asymptotically in nn,

n(vech(Θ^1)vech(Θ1))𝒩(0,𝖧𝖫Ω0𝖫𝖧),\sqrt{n}\left(\operatorname{vech}\left({{\mathsf{\hat{\Theta}}}^{-1}}\right)-\operatorname{vech}\left({{\mathsf{\Theta}}^{-1}}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{H}\mathsf{L}{\mathsf{\Omega}}_{0}{{\mathsf{L}}^{\top}}{{\mathsf{H}}^{\top}}\right), (46)

where Ω0{\mathsf{\Omega}}_{0} is defined in Equation 44, and

𝖧=𝖫(Θ1Θ1)𝖣.\mathsf{H}=-\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}. (47)
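The covariance matrices appearing in Corollaries 3.7 and 3.8 can be assembled mechanically. The following Python sketch builds Ω₀ of Equation 44 and the delta-method covariance 𝖧𝖫Ω₀𝖫ᵀ𝖧ᵀ of Equations 46 and 47, constructing the elimination, duplication, and commutation matrices explicitly under the usual column-major vec and vech conventions; the population 𝝁, Σ, and κ used here are placeholders.

import numpy as np

def vec(A):
    return A.reshape(-1, order="F")

def commutation(m):
    # K vec(A) = vec(A') for m x m matrices A
    K = np.zeros((m * m, m * m))
    for i in range(m):
        for j in range(m):
            K[j + i * m, i + j * m] = 1.0
    return K

def vech_index(i, j, m):
    # position of element (i, j), i >= j, within the column-major lower triangle
    return j * m - (j * (j - 1)) // 2 + (i - j)

def elimination(m):
    # L vec(A) = vech(A)
    q = m * (m + 1) // 2
    L = np.zeros((q, m * m))
    for j in range(m):
        for i in range(j, m):
            L[vech_index(i, j, m), i + j * m] = 1.0
    return L

def duplication(m):
    # D vech(A) = vec(A) for symmetric A
    q = m * (m + 1) // 2
    D = np.zeros((m * m, q))
    for j in range(m):
        for i in range(m):
            D[i + j * m, vech_index(max(i, j), min(i, j), m)] = 1.0
    return D

# placeholder population parameters
mu = np.array([0.05, 0.03, 0.02])
Sigma = np.diag([0.04, 0.02, 0.03])
kappa = 1.3
p = mu.size
m = p + 1

mu_t = np.concatenate(([1.0], mu))                        # tilde-mu
Sig_t = np.zeros((m, m)); Sig_t[1:, 1:] = Sigma           # tilde-Sigma
Theta = Sig_t + np.outer(mu_t, mu_t)

K, L, D = commutation(m), elimination(m), duplication(m)
I2 = np.eye(m * m)
mm = np.outer(mu_t, mu_t)

# Equation 44
Omega0 = (kappa - 1.0) * (np.outer(vec(Sig_t), vec(Sig_t)) + (I2 + K) @ np.kron(Sig_t, Sig_t)) \
         + (I2 + K) @ (np.kron(Theta, Theta) - np.kron(mm, mm))

# Equations 46 and 47: asymptotic covariance of vech of the inverse of Theta-hat
Theta_inv = np.linalg.inv(Theta)
H = -L @ np.kron(Theta_inv, Theta_inv) @ D
acov = H @ L @ Omega0 @ L.T @ H.T                         # divide by n for finite-sample use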

An uglier form of the same corollary gives the covariance explicitly. See the appendix for the proof.

Corollary 3.9.

Under the conditions of the previous theorem, asymptotically in nn,

n(vech(Θ^1)vech(Θ1))𝒩(0,𝖡),\sqrt{n}\left(\operatorname{vech}\left({{\mathsf{\hat{\Theta}}}^{-1}}\right)-\operatorname{vech}\left({{\mathsf{\Theta}}^{-1}}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{B}\right), (48)

where

𝖡\displaystyle\mathsf{B} =(κ1)𝖫[vec(Θ1𝒆1𝒆1)vec(Θ1𝒆1𝒆1)]𝖫\displaystyle=\left(\kappa-1\right)\mathsf{L}\left[\operatorname{vec}\left({{\mathsf{\Theta}}^{-1}}-{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}\right){{\operatorname{vec}\left({{\mathsf{\Theta}}^{-1}}-{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}\right)}^{\top}}\right]{{\mathsf{L}}^{\top}}
+2(κ1)𝖫𝖭[(Θ1𝒆1𝒆1)(Θ1𝒆1𝒆1)]𝖭𝖫\displaystyle\phantom{=}\,+2\left(\kappa-1\right)\mathsf{L}\mathsf{N}\left[{\left({{\mathsf{\Theta}}^{-1}}-{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}\right)}\otimes{\left({{\mathsf{\Theta}}^{-1}}-{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}\right)}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}
+2𝖫𝖭[Θ1Θ1𝒆1𝒆1𝒆1𝒆1]𝖭𝖫.\displaystyle\phantom{=}\,+2\mathsf{L}\mathsf{N}\left[{{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}-{{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}}\otimes{{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}.

We are often concerned with the signal-noise ratio and sample Markowitz portfolio, whose joint asymptotic distribution we can pick out from the previous corollary:

Corollary 3.10.

Under the conditions of the previous theorem, asymptotically in nn,

n([1+ζ^2𝝂^][1+ζ2𝝂])𝒩(0,𝖢),\sqrt{n}\left(\left[\begin{array}[]{r}{1+{\hat{\zeta}}^{2}_{*}}\\ {-{\boldsymbol{\hat{\nu}}}_{{}*}}\end{array}\right]-\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{C}\right), (49)

where

𝖢=[2ζ2(2+ζ2)+3(κ1)ζ4(2(1+ζ2)+3(κ1)ζ2)𝝂(2(1+ζ2)+3(κ1)ζ2)𝝂(1+κζ2)Σ1+(2κ1)𝝂𝝂].\mathsf{C}=\left[\begin{array}[]{cc}{2{\zeta}^{2}_{*}\left(2+{\zeta}^{2}_{*}\right)+3\left(\kappa-1\right){\zeta}_{*}^{4}}&{-\left(2\left(1+{\zeta}^{2}_{*}\right)+3\left(\kappa-1\right){\zeta}^{2}_{*}\right){{{\boldsymbol{\nu}}_{{}*}}^{\top}}}\\ {-\left(2\left(1+{\zeta}^{2}_{*}\right)+3\left(\kappa-1\right){\zeta}^{2}_{*}\right){\boldsymbol{\nu}}_{{}*}}&{\left(1+\kappa{\zeta}^{2}_{*}\right){{\mathsf{\Sigma}}^{-1}}+\left(2\kappa-1\right){\boldsymbol{\nu}}_{{}*}{{{\boldsymbol{\nu}}_{{}*}}^{\top}}}\end{array}\right].

Furthermore, if 𝖰\mathsf{Q} is an orthogonal matrix (𝖰𝖰=𝖨\mathsf{Q}{{\mathsf{Q}}^{\top}}=\mathsf{I}) such that

𝖰(Σ1/2)1𝝁=ζ𝒆1,\mathsf{Q}{{\left({{\mathsf{\Sigma}}^{1/2}}\right)}^{-1}}\boldsymbol{\mu}={\zeta}_{*}{\boldsymbol{e}}_{1},

where Σ1/2{{\mathsf{\Sigma}}^{1/2}} is the lower triangular Cholesky factor of Σ\mathsf{\Sigma}, then asymptotically in nn,

n([1+ζ^2𝖰Σ/2𝝂^][1+ζ2ζ𝒆1])𝒩(0,𝖣),\sqrt{n}\left(\left[\begin{array}[]{r}{1+{\hat{\zeta}}^{2}_{*}}\\ {\mathsf{Q}{{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}}\end{array}\right]-\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {{\zeta}_{*}{\boldsymbol{e}}_{1}}\end{array}\right]\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{D}\right), (50)

where

𝖣=[2ζ2(2+ζ2)+3(κ1)ζ4(2(1+ζ2)+3(κ1)ζ2)ζ𝒆1(2(1+ζ2)+3(κ1)ζ2)ζ𝒆1(1+κζ2)𝖨+(2κ1)ζ2𝒆1𝒆1].\mathsf{D}=\left[\begin{array}[]{cc}{2{\zeta}^{2}_{*}\left(2+{\zeta}^{2}_{*}\right)+3\left(\kappa-1\right){\zeta}_{*}^{4}}&{\left(2\left(1+{\zeta}^{2}_{*}\right)+3\left(\kappa-1\right){\zeta}^{2}_{*}\right){\zeta}_{*}{{{\boldsymbol{e}}_{1}}^{\top}}}\\ {\left(2\left(1+{\zeta}^{2}_{*}\right)+3\left(\kappa-1\right){\zeta}^{2}_{*}\right){\zeta}_{*}{\boldsymbol{e}}_{1}}&{\left(1+\kappa{\zeta}^{2}_{*}\right)\mathsf{I}+\left(2\kappa-1\right){\zeta}^{2}_{*}{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}}\end{array}\right].

We note that the asymptotic variance of ζ̂²_* we find here for the case of Gaussian returns (κ=1) is consistent with the exact variance one computes from the non-central F distribution, via the connection to Hotelling's T². That variance (assuming, as we do here, that Σ̂ is estimated with n, rather than n−1, in the denominator) is

2n2(ζ4+2ζ2)+n(p4ζ2)2p(np2)2(np4)=2ζ22+ζ2n+𝒪(n2).2\frac{n^{2}\left({\zeta}^{4}_{*}+2{\zeta}^{2}_{*}\right)+n\left(p-4{\zeta}^{2}_{*}\right)-2p}{\left(n-p-2\right)^{2}\left(n-p-4\right)}=2{\zeta}^{2}_{*}\frac{2+{\zeta}^{2}_{*}}{n}+\mathcal{O}\left(n^{-2}\right).

Our asymptotic variance captures only the leading term, as one would expect.
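In practice, Corollary 3.10 yields plug-in Wald standard errors for the elements of the sample Markowitz portfolio. The following sketch assembles Ĉ from sample estimates of ζ²_*, κ, 𝝂_*, and Σ; all numerical inputs are placeholders standing in for estimates computed from observed returns.

import numpy as np

# plug-in estimates (placeholders): sample size, sample Markowitz portfolio,
# sample squared Sharpe, kurtosis factor, and covariance estimate
n = 1000
nu_hat = np.array([1.2, -0.4, 0.7])
zeta2_hat = 0.15
kappa_hat = 1.2
Sigma_hat = np.diag([0.04, 0.02, 0.03])

z2, k = zeta2_hat, kappa_hat
top_left = 2.0 * z2 * (2.0 + z2) + 3.0 * (k - 1.0) * z2 ** 2
cross = -(2.0 * (1.0 + z2) + 3.0 * (k - 1.0) * z2) * nu_hat
lower = (1.0 + k * z2) * np.linalg.inv(Sigma_hat) + (2.0 * k - 1.0) * np.outer(nu_hat, nu_hat)

C_hat = np.block([[np.array([[top_left]]), cross[None, :]],
                  [cross[:, None], lower]])

# asymptotic standard errors of (1 + zeta^2, -nu); the sign flip does not change the variance,
# so the Wald z-scores for the portfolio weights are nu_hat over their standard errors
se = np.sqrt(np.diag(C_hat) / n)
z_scores = nu_hat / se[1:]
print(se, z_scores)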

The choice to rescale by 𝖰\mathsf{Q} and Σ/2{{\mathsf{\Sigma}}^{\top/2}} in Equation 50 is worthy of explanation. First note that the true expected return of 𝝂^{\boldsymbol{\hat{\nu}}}_{{}*} is equal to

𝝁𝝂^\displaystyle{{\boldsymbol{\mu}}^{\top}}{\boldsymbol{\hat{\nu}}}_{{}*} =𝝁((Σ1/2)1)Σ/2𝝂^\displaystyle={{\boldsymbol{\mu}}^{\top}}{{\left({{\left({{\mathsf{\Sigma}}^{1/2}}\right)}^{-1}}\right)}^{\top}}{{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}
=((Σ1/2)1𝝁)𝖰𝖰Σ/2𝝂^=ζ𝒆1𝖰Σ/2𝝂^.\displaystyle={{\left({{\left({{\mathsf{\Sigma}}^{1/2}}\right)}^{-1}}\boldsymbol{\mu}\right)}^{\top}}{{\mathsf{Q}}^{\top}}\mathsf{Q}{{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}={\zeta}_{*}{{{\boldsymbol{e}}_{1}}^{\top}}\mathsf{Q}{{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}.

That is, the expected return is determined entirely by the first element of 𝖰Σ^{⊤/2}𝝂̂_*. Now note that the volatility of 𝝂̂_* is equal to the Euclidean norm of that vector:

\begin{split}{{{\boldsymbol{\hat{\nu}}}_{{}*}}^{\top}}\mathsf{\Sigma}{\boldsymbol{\hat{\nu}}}_{{}*}&={{\left({{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}\right)}^{\top}}\left({{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}\right)\\&={{\left({{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}\right)}^{\top}}{{\mathsf{Q}}^{\top}}\mathsf{Q}\left({{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}\right)=\left\|{\mathsf{Q}{{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}}\right\|_{2}^{2}.\end{split}

The rotation also gives a diagonal asymptotic covariance. That is, the matrix (1+κζ2)𝖨+(2κ1)ζ2𝒆1𝒆1\left(1+\kappa{\zeta}^{2}_{*}\right)\mathsf{I}+\left(2\kappa-1\right){\zeta}^{2}_{*}{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}} is diagonal, so we treat the errors in the vector 𝖰Σ/2𝝂^\mathsf{Q}{{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*} as asymptotically uncorrelated.

We can use these facts to arrive at an approximate asymptotic distribution of the signal-noise ratio of the Markowitz portfolio, defined via the function

SNR(𝝂)=df𝝂𝝁𝝂Σ𝝂.\operatorname{SNR}\left({\boldsymbol{\nu}}_{{}*}\right)=_{\operatorname{df}}\frac{{{{\boldsymbol{\nu}}_{{}*}}^{\top}}\boldsymbol{\mu}}{\sqrt{{{{\boldsymbol{\nu}}_{{}*}}^{\top}}\mathsf{\Sigma}{\boldsymbol{\nu}}_{{}*}}}.

So, by the above

SNR(𝝂^)=ζ𝒆1𝖰Σ/2𝝂^𝖰Σ/2𝝂^2.\operatorname{SNR}\left({\boldsymbol{\hat{\nu}}}_{{}*}\right)={\zeta}_{*}\frac{{{{\boldsymbol{e}}_{1}}^{\top}}\mathsf{Q}{{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}}{\left\|{\mathsf{Q}{{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}}\right\|_{2}}.

Asymptotically we can think of this as

SNR(𝝂^)=ζζ+λ1z1(ζ+λ1z1)2+λp2(z22++zp2),\operatorname{SNR}\left({\boldsymbol{\hat{\nu}}}_{{}*}\right)={\zeta}_{*}\frac{{\zeta}_{*}+\lambda_{1}z_{1}}{\sqrt{\left({\zeta}_{*}+\lambda_{1}z_{1}\right)^{2}+\lambda_{p}^{2}\left(z_{2}^{2}+\ldots+z_{p}^{2}\right)}},

where

λ1=n1/2(1+κζ2)+(2κ1)ζ2,andλp=n1/2(1+κζ2),\lambda_{1}=n^{-1/2}\sqrt{\left(1+\kappa{\zeta}^{2}_{*}\right)+\left(2\kappa-1\right){\zeta}^{2}_{*}},\quad\mbox{and}\quad\lambda_{p}=n^{-1/2}\sqrt{\left(1+\kappa{\zeta}^{2}_{*}\right)},

and where the ziz_{i} are independent standard normals.

Now consider the Tangent of Arcsine, or “tas,” transform defined as ftas(x)=x/1x2{{f}_{\mbox{tas}}}\left(x\right)=x/\sqrt{1-x^{2}}. [52] Applying this transformation to the rescaled signal-noise ratio, one arrives at

ftas(SNR(𝝂^)ζ)=ζ+λ1z1λpz22++zp2,{{f}_{\mbox{tas}}}\left(\frac{\operatorname{SNR}\left({\boldsymbol{\hat{\nu}}}_{{}*}\right)}{{\zeta}_{*}}\right)=\frac{{\zeta}_{*}+\lambda_{1}z_{1}}{\lambda_{p}\sqrt{z_{2}^{2}+\ldots+z_{p}^{2}}},

which looks a lot like a non-central tt random variable, up to scaling. So write

ftas(SNR(𝝂^)ζ)=λ1λpp1ζλ1+z1z22++zp2/p1=λ1λpp1t,{{f}_{\mbox{tas}}}\left(\frac{\operatorname{SNR}\left({\boldsymbol{\hat{\nu}}}_{{}*}\right)}{{\zeta}_{*}}\right)=\frac{\lambda_{1}}{\lambda_{p}\sqrt{p-1}}\frac{\frac{{\zeta}_{*}}{\lambda_{1}}+z_{1}}{\sqrt{z_{2}^{2}+\ldots+z_{p}^{2}}/\sqrt{p-1}}=\frac{\lambda_{1}}{\lambda_{p}\sqrt{p-1}}t, (51)

where tt is a non-central tt random variable with p1p-1 degrees of freedom and non-centrality parameter ζ/λ1{\zeta}_{*}/\lambda_{1}. See Section 7.1.2, however, for simulations which indicate that an unreasonably large sample size is required for this approximation to be of any use.

We can perform one more transformation and use the delta method once again on the map xx1x\mapsto\sqrt{x-1} to convert Equation 50 into

n([ζ^𝖰Σ/2𝝂^][ζζ𝒆1])𝒩(0,𝖣𝟣),\sqrt{n}\left(\left[\begin{array}[]{r}{{\hat{\zeta}}_{*}}\\ {\mathsf{Q}{{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}}\end{array}\right]-\left[\begin{array}[]{r}{{\zeta}_{*}}\\ {{\zeta}_{*}{\boldsymbol{e}}_{1}}\end{array}\right]\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{D_{1}}\right), (52)

where

\mathsf{D_{1}}=\left[\begin{array}{cc}{1+\frac{2+3\left(\kappa-1\right)}{4}{\zeta}_{*}^{2}}&{\left(\left(1+{\zeta}^{2}_{*}\right)+\frac{3}{2}\left(\kappa-1\right){\zeta}^{2}_{*}\right){{{\boldsymbol{e}}_{1}}^{\top}}}\\ {\left(\left(1+{\zeta}^{2}_{*}\right)+\frac{3}{2}\left(\kappa-1\right){\zeta}^{2}_{*}\right){\boldsymbol{e}}_{1}}&{\left(1+\kappa{\zeta}^{2}_{*}\right)\mathsf{I}+\left(2\kappa-1\right){\zeta}^{2}_{*}{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}}\end{array}\right].

Note that the variance for ζ{\zeta}_{*} given here is just Mertens’ form of the standard error of the Sharpe ratio, given that elliptical distributions have zero skew and excess kurtosis of 3(κ1)3\left(\kappa-1\right). [42]

We can take this a step further by swapping in ζ^{\hat{\zeta}}_{*} for ζ{\zeta}_{*} to arrive at

n(𝖰Σ/2𝝂^ζ^𝒆1)𝒩(0,𝖣𝟤),\sqrt{n}\left(\mathsf{Q}{{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}-{\hat{\zeta}}_{*}{\boldsymbol{e}}_{1}\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{D_{2}}\right), (53)

where

𝖣𝟤=(1+κζ2)𝖨(1+2+(κ1)4ζ2)𝒆1𝒆1.\mathsf{D_{2}}=\left(1+\kappa{\zeta}^{2}_{*}\right)\mathsf{I}-\left(1+\frac{2+\left(\kappa-1\right)}{4}{\zeta}^{2}_{*}\right){\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}.

Now perform the same transformation to a tt random variable to claim that

ftas(SNR(𝝂^)ζ)=λ1λpp1t,{{f}_{\mbox{tas}}}\left(\frac{\operatorname{SNR}\left({\boldsymbol{\hat{\nu}}}_{{}*}\right)}{{\zeta}_{*}}\right)=\frac{\lambda_{1}^{\prime}}{\lambda_{p}\sqrt{p-1}}t, (54)

where tt is a non-central tt random variable with p1p-1 degrees of freedom and non-centrality parameter ζ^/λ1{\hat{\zeta}}_{*}/\lambda_{1}^{\prime}, and

λ1=n1/22+3(κ1)4ζ2,andλp=n1/2(1+κζ2).\lambda_{1}^{\prime}=n^{-1/2}\sqrt{\frac{2+3\left(\kappa-1\right)}{4}{\zeta}^{2}_{*}},\quad\mbox{and}\quad\lambda_{p}=n^{-1/2}\sqrt{\left(1+\kappa{\zeta}^{2}_{*}\right)}.

This suggests another confidence limit: take t to be the α quantile of the non-central t distribution with p−1 degrees of freedom and non-centrality parameter ζ̂_*/λ₁′, where λ₁′ and λ_p plug in ζ̂_* for ζ_* wherever needed, then invert Equation 54 to get a confidence limit on SNR(𝝂̂_*). This confidence limit is also of dubious value, however; see Section 7.1.1.
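A minimal sketch of this confidence limit follows (recall that it is of dubious practical value per the simulations): plug ζ̂_* into λ₁′ and λ_p, take the α quantile of the non-central t, and invert Equation 54 together with the tas transform, whose inverse is y ↦ y/√(1+y²). The numerical inputs below are placeholders.

import numpy as np
from scipy import stats

# placeholder inputs: sample size, number of assets, kurtosis factor, sample Sharpe of nu_hat
n, p, kappa = 1000, 10, 1.0
zeta_hat = 0.25
alpha = 0.05

lam1p = np.sqrt((2.0 + 3.0 * (kappa - 1.0)) / 4.0 * zeta_hat ** 2 / n)
lamp = np.sqrt((1.0 + kappa * zeta_hat ** 2) / n)

# alpha quantile of the non-central t with p-1 dof and non-centrality zeta_hat / lambda_1'
t_alpha = stats.nct.ppf(alpha, df=p - 1, nc=zeta_hat / lam1p)

# invert Equation 54, then invert f_tas(x) = x / sqrt(1 - x^2) via x = y / sqrt(1 + y^2)
y = lam1p / (lamp * np.sqrt(p - 1.0)) * t_alpha
snr_lower = zeta_hat * y / np.sqrt(1.0 + y ** 2)     # approximate lower limit on SNR(nu_hat)
print(snr_lower)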

Note that the relation in Equation 51 requires one to know ζ{\zeta}_{*}. To perform inference on SNR(𝝂^)\operatorname{SNR}\left({\boldsymbol{\hat{\nu}}}_{{}*}\right) given the observed data, we adapt Corollary 2.13 to the case of elliptical returns.

Theorem 3.11.

Let Θ^\mathsf{\hat{\Theta}} be the unbiased sample estimate of Θ\mathsf{\Theta}, based on nn i.i.d. samples of 𝐱\boldsymbol{x}, assumed to take an elliptical distribution with kurtosis parameter κ\kappa. Let 𝛎^{\boldsymbol{\hat{\nu}}}_{{}*} be the sample Markowitz portfolio, and ζ^{\hat{\zeta}}_{*} be the sample Sharpe ratio of that portfolio. Define the signal-noise ratio of 𝛎^{\boldsymbol{\hat{\nu}}}_{{}*} as

SNR(𝝂;Θ,0)=df𝝂𝝁𝝂Σ𝝂.\operatorname{SNR}\left(\boldsymbol{\nu};\mathsf{\Theta},0\right)=_{\operatorname{df}}\frac{{{\boldsymbol{\nu}}^{\top}}\boldsymbol{\mu}}{\sqrt{{{\boldsymbol{\nu}}^{\top}}\mathsf{\Sigma}\boldsymbol{\nu}}}.

Then, asymptotically in nn, the difference SNR(Θ^1;Θ,0)ζ^\operatorname{SNR}\left({{\mathsf{\hat{\Theta}}}^{-1}};\mathsf{\Theta},0\right)-{\hat{\zeta}}_{*} has the following mean and variance:

E[SNR(Θ^1;Θ,0)ζ^]\displaystyle\operatorname{E}\left[\operatorname{SNR}\left({{\mathsf{\hat{\Theta}}}^{-1}};\mathsf{\Theta},0\right)-{\hat{\zeta}}_{*}\right] (κζ2+1)(1p)+(3κ1)4ζ2+12nζ.\displaystyle\rightsquigarrow\frac{\left(\kappa{\zeta}^{2}_{*}+1\right)\left(1-p\right)+\frac{\left(3\kappa-1\right)}{4}{\zeta}^{2}_{*}+1}{2n{\zeta}_{*}}. (55)
Var(SNR(Θ^1;Θ,0)ζ^)\displaystyle\operatorname{Var}\left(\operatorname{SNR}\left({{\mathsf{\hat{\Theta}}}^{-1}};\mathsf{\Theta},0\right)-{\hat{\zeta}}_{*}\right) (3κ1)4ζ2+1n.\displaystyle\rightsquigarrow\frac{\frac{\left(3\kappa-1\right)}{4}{\zeta}^{2}_{*}+1}{n}. (56)

And the ratio has the asymptotic mean and variance

E[SNR(Θ^1;Θ,0)ζ^]\displaystyle\operatorname{E}\left[\frac{\operatorname{SNR}\left({{\mathsf{\hat{\Theta}}}^{-1}};\mathsf{\Theta},0\right)}{{\hat{\zeta}}_{*}}\right] 1+(κζ2+1)(1p)+3((3κ1)4ζ2+1)2nζ2.\displaystyle\rightsquigarrow 1+\frac{\left(\kappa{\zeta}^{2}_{*}+1\right)\left(1-p\right)+3\left(\frac{\left(3\kappa-1\right)}{4}{\zeta}^{2}_{*}+1\right)}{2n{\zeta}_{*}^{2}}. (57)
Var(SNR(Θ^1;Θ,0)ζ^)\displaystyle\operatorname{Var}\left(\frac{\operatorname{SNR}\left({{\mathsf{\hat{\Theta}}}^{-1}};\mathsf{\Theta},0\right)}{{\hat{\zeta}}_{*}}\right) (3κ1)4ζ2+1nζ2.\displaystyle\rightsquigarrow\frac{\frac{\left(3\kappa-1\right)}{4}{\zeta}^{2}_{*}+1}{n{\zeta}_{*}^{2}}. (58)

We relegate the long proof to the Appendix. Note that a similar line of reasoning should produce the asymptotic distribution of Hotelling's T² under elliptical returns, which could be compared to the form given by Iwashita. [24] Also of note is the result of Paulsen and Söhl [51], who show that for Gaussian returns,

E[SNR(Θ^1;Θ,0)(ζ^+1pnζ^)]=0.\operatorname{E}\left[\operatorname{SNR}\left({{\mathsf{\hat{\Theta}}}^{-1}};\mathsf{\Theta},0\right)-\left({\hat{\zeta}}_{*}+\frac{1-p}{n{\hat{\zeta}}_{*}}\right)\right]=0.

The theorem suggests the following approximate confidence intervals, at type I rate α, for SNR(Θ̂⁻¹;Θ,0), obtained by plugging in ζ̂_* for the unknown quantity ζ_*:

ζ^+(κζ^2+1)(1p)+c((3κ1)4ζ^2+1)2nζ^±Z1α/2(3κ1)4ζ^2+1n,{\hat{\zeta}}_{*}+\frac{\left(\kappa{\hat{\zeta}}^{2}_{*}+1\right)\left(1-p\right)+c\left(\frac{\left(3\kappa-1\right)}{4}{\hat{\zeta}}^{2}_{*}+1\right)}{2n{\hat{\zeta}}_{*}}\pm Z_{1-\alpha/2}\sqrt{\frac{\frac{\left(3\kappa-1\right)}{4}{\hat{\zeta}}^{2}_{*}+1}{n}}, (59)

where one can take c=1c=1 for the difference formula, and c=3c=3 for the ratio formula. However, these confidence intervals are not well supported by simulations, in the sense that they require very large nn to give near nominal coverage, cf. Section 7.1.1.
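For completeness, a sketch of the interval in Equation 59 follows, using c = 1 for the difference form; the sample size, dimension, kurtosis factor, and ζ̂_* below are placeholders.

import numpy as np
from scipy import stats

n, p, kappa = 2000, 8, 1.0        # placeholder sample size, dimension, kurtosis factor
zeta_hat = 0.3                    # placeholder sample Sharpe of the Markowitz portfolio
alpha, c = 0.05, 1.0              # take c = 3.0 for the ratio form

vterm = (3.0 * kappa - 1.0) / 4.0 * zeta_hat ** 2 + 1.0
center = zeta_hat + ((kappa * zeta_hat ** 2 + 1.0) * (1.0 - p) + c * vterm) / (2.0 * n * zeta_hat)
half = stats.norm.ppf(1.0 - alpha / 2.0) * np.sqrt(vterm / n)
ci = (center - half, center + half)   # approximate interval for the SNR of the sample portfolio
print(ci)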

3.2 Distribution under matrix normal returns

Now we consider the case where the (augmented) returns follow a matrix normal distribution. That is, we suppose that there is an n×(1+p) matrix 𝖬 and symmetric positive semi-definite matrices Ξ and Ψ, respectively of size (1+p)×(1+p) and n×n, such that

vec(𝖷~)𝒩(vec(𝖬),ΞΨ).\operatorname{vec}\left(\mathsf{\tilde{{X}}}\right)\sim\mathcal{N}\left(\operatorname{vec}\left(\mathsf{M}\right),\mathsf{\Xi}\otimes\mathsf{\Psi}\right).

This form allows us to consider deviations from the i.i.d. assumption by allowing, for example, the 𝝁 to change over time, autocorrelation in returns, and so on. (We do not consider the elliptical distribution in this case, as it can impose long-term dependence among returns even when they are uncorrelated.)

We now seek the moments of

Θ^=1n𝖷~𝖷~.\mathsf{\hat{\Theta}}=\frac{1}{n}{{\mathsf{\tilde{{X}}}}^{\top}}\mathsf{\tilde{{X}}}.
Lemma 3.12.

For matrix normal returns vec(𝖷~)𝒩(vec(𝖬),ΞΨ)\operatorname{vec}\left(\mathsf{\tilde{{X}}}\right)\sim\mathcal{N}\left(\operatorname{vec}\left(\mathsf{M}\right),\mathsf{\Xi}\otimes\mathsf{\Psi}\right), the mean and covariance of the gram are

E[𝖷~𝖷~]=𝖬𝖬+tr(Ψ)Ξ,\operatorname{E}\left[{{\mathsf{\tilde{{X}}}}^{\top}}\mathsf{\tilde{{X}}}\right]={{\mathsf{M}}^{\top}}\mathsf{M}+\operatorname{tr}\left(\mathsf{\Psi}\right)\mathsf{\Xi}, (60)

and

Var(vec(𝖷~𝖷~))=(𝖨+𝖪)[ΞΞtr(Ψ2)+(𝖬Ψ𝖬)Ξ+Ξ(𝖬Ψ𝖬)].\operatorname{Var}\left(\operatorname{vec}\left({{\mathsf{\tilde{{X}}}}^{\top}}\mathsf{\tilde{{X}}}\right)\right)=\left(\mathsf{I}+\mathsf{K}\right)\left[\mathsf{\Xi}\otimes\mathsf{\Xi}\operatorname{tr}\left(\mathsf{\Psi}^{2}\right)+\left({{\mathsf{M}}^{\top}}\mathsf{\Psi}\mathsf{M}\right)\otimes\mathsf{\Xi}+\mathsf{\Xi}\otimes\left({{\mathsf{M}}^{\top}}\mathsf{\Psi}\mathsf{M}\right)\right]. (61)

As a check we note this result is consistent with Theorem 3.6 in the i.i.d. case, which corresponds to 𝖬=𝟏𝝁~\mathsf{M}=\boldsymbol{1}{{\boldsymbol{\tilde{\mu}}}^{\top}}, Ψ=𝖨\mathsf{\Psi}=\mathsf{I}, Ξ=Σ~\mathsf{\Xi}=\mathsf{\tilde{\Sigma}}.

Lemma 3.13.

Let vec(𝖷~)𝒩(vec(𝖬),ΞΨ)\operatorname{vec}\left(\mathsf{\tilde{{X}}}\right)\sim\mathcal{N}\left(\operatorname{vec}\left(\mathsf{M}\right),\mathsf{\Xi}\otimes\mathsf{\Psi}\right), where Ψ\mathsf{\Psi} and Ξ\mathsf{\Xi} are rescaled such that tr(Ψ)=n\operatorname{tr}\left(\mathsf{\Psi}\right)=n. Let

Θ^=1n𝖷~𝖷~\mathsf{\hat{\Theta}}=\frac{1}{n}{{\mathsf{\tilde{{X}}}}^{\top}}\mathsf{\tilde{{X}}}

Define

Θ=limn1n𝖬𝖬+Ξ.\mathsf{\Theta}=\lim_{n\to\infty}\frac{1}{n}{{\mathsf{M}}^{\top}}\mathsf{M}+\mathsf{\Xi}.

Then asymptotically in nn,

n(Θ^1Θ1)\displaystyle\sqrt{n}\left({{\mathsf{\hat{\Theta}}}^{-1}}-{{\mathsf{\Theta}}^{-1}}\right) 𝒩(0,𝖡),\displaystyle\rightsquigarrow\mathcal{N}\left(0,\mathsf{B}\right), (62)

with

𝖡\displaystyle\mathsf{B} =2𝖭[(Θ1ΞΘ1)(Θ1ΞΘ1)tr(Ψ2)]\displaystyle=2\mathsf{N}\left[{\left({{\mathsf{\Theta}}^{-1}}\mathsf{\Xi}{{\mathsf{\Theta}}^{-1}}\right)}\otimes{\left({{\mathsf{\Theta}}^{-1}}\mathsf{\Xi}{{\mathsf{\Theta}}^{-1}}\right)}\operatorname{tr}\left(\mathsf{\Psi}^{2}\right)\right]
+2𝖭[(Θ1𝖬Ψ𝖬Θ1)(Θ1ΞΘ1)+(Θ1ΞΘ1)(Θ1𝖬Ψ𝖬Θ1)].\displaystyle\phantom{=}\,+2\mathsf{N}\left[\left({{\mathsf{\Theta}}^{-1}}{{\mathsf{M}}^{\top}}\mathsf{\Psi}\mathsf{M}{{\mathsf{\Theta}}^{-1}}\right)\otimes\left({{\mathsf{\Theta}}^{-1}}\mathsf{\Xi}{{\mathsf{\Theta}}^{-1}}\right)+\left({{\mathsf{\Theta}}^{-1}}\mathsf{\Xi}{{\mathsf{\Theta}}^{-1}}\right)\otimes\left({{\mathsf{\Theta}}^{-1}}{{\mathsf{M}}^{\top}}\mathsf{\Psi}\mathsf{M}{{\mathsf{\Theta}}^{-1}}\right)\right].

In particular, letting ζ2=𝐞1Θ1𝐞11{\zeta}^{2}_{*}={\boldsymbol{e}}^{\top}_{1}{{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1}-1, and ζ^2=𝐞1Θ^1𝐞11{\hat{\zeta}}^{2}_{*}={\boldsymbol{e}}^{\top}_{1}{{\mathsf{\hat{\Theta}}}^{-1}}{\boldsymbol{e}}_{1}-1, then

n(ζ^2ζ2)\displaystyle\sqrt{n}\left({\hat{\zeta}}^{2}_{*}-{\zeta}^{2}_{*}\right) 𝒩(0,b2),\displaystyle\rightsquigarrow\mathcal{N}\left(0,b^{2}\right), (63)

for b2=(𝐞1𝐞1)𝖡(𝐞1𝐞1).b^{2}=\left({{\boldsymbol{e}}^{\top}_{1}}\otimes{{\boldsymbol{e}}^{\top}_{1}}\right)\mathsf{B}\left({{\boldsymbol{e}}_{1}}\otimes{{\boldsymbol{e}}_{1}}\right).

The proof follows along the lines of Corollary 3.10 and is omitted. This result could be useful in finding the asymptotic distribution of ζ̂²_* (and Hotelling's T²) under certain divergences from i.i.d. normality. For example, one could impose a general autocorrelation by setting Ψ_{i,j}=ρ^{|i−j|} for some ρ∈(−1,1). One could impose a heteroskedasticity structure where 𝖬=𝒍𝝁ᵀ and Ψ=diag(𝒍^λ) for some scalar λ. It has been shown that the Sharpe ratio is relatively robust to deviations from assumptions under similar configurations. [53]
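As an illustration of such a deviation, the following sketch imposes an AR(1) structure on Ψ and approximates the variance of ζ̂²_* by combining the covariance of the gram from Lemma 3.12 with the delta method; the normalization by n² reflects Θ̂ = X̃ᵀX̃/n. The parameters, the AR(1) choice, and the i.i.d. mean 𝖬 = 𝟏𝝁̃ᵀ are placeholder assumptions.

import numpy as np

def commutation(m):
    # K vec(A) = vec(A') for m x m A
    K = np.zeros((m * m, m * m))
    for i in range(m):
        for j in range(m):
            K[j + i * m, i + j * m] = 1.0
    return K

n, rho = 500, 0.2                          # placeholder sample size and AR(1) autocorrelation
mu = np.array([0.05, 0.02, 0.03])
Sigma = np.diag([0.04, 0.03, 0.05])
p = mu.size
m = p + 1

mu_t = np.concatenate(([1.0], mu))
Xi = np.zeros((m, m)); Xi[1:, 1:] = Sigma                      # Xi = tilde-Sigma
M = np.outer(np.ones(n), mu_t)                                 # M = 1 mu-tilde' (i.i.d. mean)
idx = np.arange(n)
Psi = rho ** np.abs(idx[:, None] - idx[None, :])               # AR(1) correlation; tr(Psi) = n

# Lemma 3.12: covariance of vec of the gram X'X; recall Theta-hat = X'X / n
IK = np.eye(m * m) + commutation(m)
MPM = M.T @ Psi @ M
var_gram = IK @ (np.kron(Xi, Xi) * np.trace(Psi @ Psi) + np.kron(MPM, Xi) + np.kron(Xi, MPM))

Theta = M.T @ M / n + Xi
Ti = np.linalg.inv(Theta)

# delta method: d vec(Theta^{-1}) / d vec(Theta) = -(Theta^{-1} kron Theta^{-1});
# zeta^2 = e1' Theta^{-1} e1 - 1 picks out the (1,1) element of the inverse
e1 = np.zeros(m); e1[0] = 1.0
grad = -np.kron(Ti, Ti) @ np.kron(e1, e1)
var_zeta2 = grad @ (var_gram / n ** 2) @ grad     # approximate Var(zeta-hat squared)
print(var_zeta2)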

3.3 Likelihood ratio test on Markowitz portfolio

Let us again consider 𝒙 to follow a Gaussian distribution, rather than a general elliptical distribution. Consider the null hypothesis

H0:tr(𝖠iΘ1)=ai,i=1,,m.{{H}_{0}}:\operatorname{tr}\left({{\mathsf{A}}_{i}}{{\mathsf{\Theta}}^{-1}}\right)={{a}_{i}},\,i=1,\ldots,m. (64)

The constraints have to be sensible. For example, they cannot violate the positive definiteness of Θ1{{\mathsf{\Theta}}^{-1}}, symmetry, etc. Without loss of generality, we can assume that the 𝖠i{{\mathsf{A}}_{i}} are symmetric, since Θ\mathsf{\Theta} is symmetric, and for symmetric 𝖦\mathsf{G} and square 𝖧\mathsf{H}, tr(𝖦𝖧)=tr(𝖦12(𝖧+𝖧))\operatorname{tr}\left(\mathsf{G}\mathsf{H}\right)=\operatorname{tr}\left(\mathsf{G}\frac{1}{2}\left(\mathsf{H}+{{\mathsf{H}}^{\top}}\right)\right), and so we could replace any non-symmetric 𝖠i{{\mathsf{A}}_{i}} with 12(𝖠i+𝖠i)\frac{1}{2}\left({{\mathsf{A}}_{i}}+{{{{\mathsf{A}}_{i}}}^{\top}}\right).

Employing the Lagrange multiplier technique, the maximum likelihood estimator under the null hypothesis, call it Θ0{{\mathsf{\Theta}}_{0}}, solves the following equation

\begin{split}0&=\frac{\mathrm{d}{\log f\left(\mathsf{\hat{\Theta}};\mathsf{\Theta}\right)}}{\mathrm{d}{{\mathsf{\Theta}}^{-1}}}-\sum_{i}\lambda_{i}\frac{\mathrm{d}{\operatorname{tr}\left({{\mathsf{A}}_{i}}{{\mathsf{\Theta}}^{-1}}\right)}}{\mathrm{d}{{\mathsf{\Theta}}^{-1}}},\\&=-{{\mathsf{\Theta}}_{0}}+\mathsf{\hat{\Theta}}-\sum_{i}\lambda_{i}{{\mathsf{A}}_{i}}.\end{split}

Thus the MLE under the null is

Θ0=Θ^iλi𝖠i.{{\mathsf{\Theta}}_{0}}=\mathsf{\hat{\Theta}}-\sum_{i}\lambda_{i}{{\mathsf{A}}_{i}}. (65)

The maximum likelihood estimator under the constraints has to be found numerically by solving for the λi\lambda_{i}, subject to the constraints in Equation 64.

This framework slightly generalizes Dempster’s “Covariance Selection,” [15] which reduces to the case where each ai{{a}_{i}} is zero, and each 𝖠i{{\mathsf{A}}_{i}} is a matrix of all zeros except two (symmetric) ones somewhere in the lower right p×p{p}\times{p} sub-matrix. In all other respects, however, the solution here follows Dempster.

An iterative technique for finding the MLE based on a Newton step would proceed as follows. [48] Let 𝝀^{(0)} be some initial estimate of the vector of λᵢ. (A good initial estimate can likely be had by abusing the asymptotic normality result from Section 2.2.) The residual of the k-th estimate, 𝝀^{(k)}, is

ϵi(k)=dftr(𝖠i[Θ^jλj(k)𝖠j]1)ai.{\boldsymbol{\epsilon}}^{\left(k\right)}_{i}=_{\operatorname{df}}\operatorname{tr}\left({{\mathsf{A}}_{i}}{{\left[\mathsf{\hat{\Theta}}-\sum_{j}{\lambda}^{\left(k\right)}_{j}{{\mathsf{A}}_{j}}\right]}^{-1}}\right)-{{a}_{i}}. (66)

The Jacobian of this residual with respect to the l-th element of 𝝀^{(k)} is

dϵi(k)dλl(k)=tr(𝖠i[Θ^jλj(k)𝖠j]1𝖠l[Θ^jλj(k)𝖠j]1),=vec(𝖠i)([Θ^jλj(k)𝖠j]1[Θ^jλj(k)𝖠j]1)vec(𝖠l).\begin{split}\frac{\mathrm{d}{{\boldsymbol{\epsilon}}^{\left(k\right)}_{i}}}{\mathrm{d}{\lambda}^{\left(k\right)}_{l}}&=\operatorname{tr}\left({{\mathsf{A}}_{i}}{{\left[\mathsf{\hat{\Theta}}-\sum_{j}{\lambda}^{\left(k\right)}_{j}{{\mathsf{A}}_{j}}\right]}^{-1}}{{\mathsf{A}}_{l}}{{\left[\mathsf{\hat{\Theta}}-\sum_{j}{\lambda}^{\left(k\right)}_{j}{{\mathsf{A}}_{j}}\right]}^{-1}}\right),\\ &={{\operatorname{vec}\left({{\mathsf{A}}_{i}}\right)}^{\top}}\left({{{\left[\mathsf{\hat{\Theta}}-\sum_{j}{\lambda}^{\left(k\right)}_{j}{{\mathsf{A}}_{j}}\right]}^{-1}}}\otimes{{{\left[\mathsf{\hat{\Theta}}-\sum_{j}{\lambda}^{\left(k\right)}_{j}{{\mathsf{A}}_{j}}\right]}^{-1}}}\right)\operatorname{vec}\left({{\mathsf{A}}_{l}}\right).\end{split} (67)

Newton’s method is then the iterative scheme

{\boldsymbol{\lambda}}^{\left(k+1\right)}\leftarrow{\boldsymbol{\lambda}}^{\left(k\right)}-{{\left(\frac{\mathrm{d}{{\boldsymbol{\epsilon}}^{\left(k\right)}}}{\mathrm{d}{\boldsymbol{\lambda}}^{\left(k\right)}}\right)}^{-1}}{\boldsymbol{\epsilon}}^{\left(k\right)}. (68)

When (if?) the iterative scheme converges on the optimum, plugging in 𝝀(k){\boldsymbol{\lambda}}^{\left(k\right)} into Equation 65 gives the MLE under the null. The likelihood ratio test statistic is

2logΛ=df2log((Θ0|Θ^)(Θunrestricted MLE|Θ^)),=n(log|Θ0Θ^1|+tr([Θ01Θ^1]Θ^)),=n(log|Θ0Θ^1|+tr(Θ01Θ^)[p+1]),\begin{split}-2\log\Lambda&=_{\operatorname{df}}-2\log\left(\frac{\mathcal{L}\left({{{\mathsf{\Theta}}_{0}}}\left|{\mathsf{\hat{\Theta}}}\right.\right)}{\mathcal{L}\left({{{\mathsf{\Theta}}_{\mbox{unrestricted }\mathrm{MLE}}}}\left|{\mathsf{\hat{\Theta}}}\right.\right)}\right),\\ &=n\left(\log\left|{{\mathsf{\Theta}}_{0}}{{\mathsf{\hat{\Theta}}}^{-1}}\right|+\operatorname{tr}\left(\left[{{{{\mathsf{\Theta}}_{0}}}^{-1}}-{{\mathsf{\hat{\Theta}}}^{-1}}\right]\mathsf{\hat{\Theta}}\right)\right),\\ &=n\left(\log\left|{{\mathsf{\Theta}}_{0}}{{\mathsf{\hat{\Theta}}}^{-1}}\right|+\operatorname{tr}\left({{{{\mathsf{\Theta}}_{0}}}^{-1}}\mathsf{\hat{\Theta}}\right)-\left[p+1\right]\right),\end{split} (69)

using the fact that Θ^\mathsf{\hat{\Theta}} is the unrestricted MLE, per Corollary 3.5. By Wilks’ Theorem, under the null hypothesis, 2logΛ-2\log\Lambda is, asymptotically in nn, distributed as a chi-square with mm degrees of freedom. [66] However, most ‘interesting’ null tests posit Θ\mathsf{\Theta} to be somewhere on the boundary of acceptable values; for such tests, asymptotic convergence is to some other distribution. [2]
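The following sketch carries out the whole procedure for a single illustrative constraint, namely that the first element of the Markowitz portfolio is zero (equivalently, that the (2,1) element of Θ⁻¹ vanishes): Newton iteration on the λᵢ per Equations 66 through 68, the restricted estimate of Equation 65, and the statistic of Equation 69 with its asymptotic chi-square p-value. The inputs used to form Θ̂ are placeholders.

import numpy as np
from scipy import stats

# placeholder unrestricted MLE Theta-hat (first row and column are [1, mu-hat'])
mu_hat = np.array([0.06, 0.02])
Sigma_hat = np.array([[0.05, 0.01], [0.01, 0.03]])
n = 1000
p = mu_hat.size
Theta_hat = np.zeros((p + 1, p + 1))
Theta_hat[0, 0] = 1.0
Theta_hat[0, 1:] = Theta_hat[1:, 0] = mu_hat
Theta_hat[1:, 1:] = Sigma_hat + np.outer(mu_hat, mu_hat)

# one symmetric constraint tr(A Theta^{-1}) = a; here it pins the (2,1) element of the
# precision to zero, i.e. the first Markowitz weight is zero (an illustrative hypothesis)
A1 = np.zeros((p + 1, p + 1)); A1[0, 1] = A1[1, 0] = 0.5
constraints = [(A1, 0.0)]
m_c = len(constraints)

lam = np.zeros(m_c)
for _ in range(50):
    Theta0 = Theta_hat - sum(l * A for l, (A, _) in zip(lam, constraints))
    T0inv = np.linalg.inv(Theta0)
    eps = np.array([np.trace(A @ T0inv) - a for (A, a) in constraints])     # Equation 66
    if np.max(np.abs(eps)) < 1e-12:
        break
    jac = np.array([[np.trace(Ai @ T0inv @ Al @ T0inv) for (Al, _) in constraints]
                    for (Ai, _) in constraints])                            # Equation 67
    lam = lam - np.linalg.solve(jac, eps)                                   # Equation 68

Theta0 = Theta_hat - sum(l * A for l, (A, _) in zip(lam, constraints))      # Equation 65
# Equation 69: likelihood ratio statistic and asymptotic chi-square p-value
stat = n * (np.log(np.linalg.det(Theta0 @ np.linalg.inv(Theta_hat)))
            + np.trace(np.linalg.inv(Theta0) @ Theta_hat) - (p + 1))
pval = stats.chi2.sf(stat, df=m_c)
print(stat, pval)

With several constraints the same loop applies unchanged, provided the iterates keep Θ̂ − Σⱼλⱼ𝖠ⱼ positive definite.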

4 Extensions

For large samples, Wald statistics of the elements of the Markowitz portfolio computed using the procedure outlined above tend to be very similar to the t-statistics produced by the procedure of Britten-Jones. [7] However, the technique proposed here admits a number of interesting extensions.

The script for each of these extensions is the same: define, then solve, some portfolio optimization problem; show that the solution can be defined in terms of some transformation of Θ1{{\mathsf{\Theta}}^{-1}}, giving an implicit recipe for constructing the sample portfolio based on the same transformation of Θ^1{{\mathsf{\hat{\Theta}}}^{-1}}; find the asymptotic distribution of the sample portfolio in terms of Ω\mathsf{\Omega}.

4.1 Subspace Constraint

Consider the constrained portfolio optimization problem

max𝝂:𝖩𝝂=𝟎,𝝂Σ𝝂R2𝝂𝝁r0𝝂Σ𝝂,\max_{\begin{subarray}{c}\boldsymbol{\nu}:{{\mathsf{J}}^{\bot}}\boldsymbol{\nu}=\boldsymbol{0},\\ {{\boldsymbol{\nu}}^{\top}}\mathsf{\Sigma}\boldsymbol{\nu}\leq R^{2}\end{subarray}}\frac{{{\boldsymbol{\nu}}^{\top}}\boldsymbol{\mu}-{{r}_{0}}}{\sqrt{{{\boldsymbol{\nu}}^{\top}}\mathsf{\Sigma}\boldsymbol{\nu}}}, (70)

where 𝖩{{\mathsf{J}}^{\bot}} is a (ppj)×p{\left(p-{{p}_{j}}\right)}\times{p} matrix of rank ppjp-{{p}_{j}}, r0{{r}_{0}} is the disastrous rate, and R>0R>0 is the risk budget. Let the rows of 𝖩\mathsf{J} span the null space of the rows of 𝖩{{\mathsf{J}}^{\bot}}; that is, 𝖩𝖩=𝟢{{\mathsf{J}}^{\bot}}{{\mathsf{J}}^{\top}}=\mathsf{0}, and 𝖩𝖩=𝖨\mathsf{J}{{\mathsf{J}}^{\top}}=\mathsf{I}. We can interpret the orthogonality constraint 𝖩𝝂=𝟎{{\mathsf{J}}^{\bot}}\boldsymbol{\nu}=\boldsymbol{0} as stating that 𝝂\boldsymbol{\nu} must be a linear combination of the columns of 𝖩{{\mathsf{J}}^{\top}}, thus 𝝂=𝖩𝝃\boldsymbol{\nu}={{\mathsf{J}}^{\top}}\boldsymbol{\xi}. The columns of 𝖩{{\mathsf{J}}^{\top}} may be considered ‘baskets’ of assets to which our investments are restricted.

We can rewrite the portfolio optimization problem in terms of solving for 𝝃\boldsymbol{\xi}, but then find the asymptotic distribution of the resultant 𝝂\boldsymbol{\nu}. Note that the expected return and covariance of the portfolio 𝝃\boldsymbol{\xi} are, respectively, 𝝃𝖩𝝁{{\boldsymbol{\xi}}^{\top}}\mathsf{J}\boldsymbol{\mu} and 𝝃𝖩Σ𝖩𝝃{{\boldsymbol{\xi}}^{\top}}\mathsf{J}\mathsf{\Sigma}{{\mathsf{J}}^{\top}}\boldsymbol{\xi}. Thus we can plug in 𝖩𝝁\mathsf{J}\boldsymbol{\mu} and 𝖩Σ𝖩\mathsf{J}\mathsf{\Sigma}{{\mathsf{J}}^{\top}} into Lemma 2.6 to get the following analogue.

Lemma 4.1 (subspace constrained Sharpe ratio optimal portfolio).

Assuming the rows of 𝖩\mathsf{J} span the null space of the rows of 𝖩{{\mathsf{J}}^{\bot}}, 𝖩𝛍𝟎\mathsf{J}\boldsymbol{\mu}\neq\boldsymbol{0}, and Σ\mathsf{\Sigma} is invertible, the portfolio optimization problem

max𝝂:𝖩𝝂=𝟎,𝝂Σ𝝂R2𝝂𝝁r0𝝂Σ𝝂,\max_{\begin{subarray}{c}\boldsymbol{\nu}:{{\mathsf{J}}^{\bot}}\boldsymbol{\nu}=\boldsymbol{0},\\ {{\boldsymbol{\nu}}^{\top}}\mathsf{\Sigma}\boldsymbol{\nu}\leq R^{2}\end{subarray}}\frac{{{\boldsymbol{\nu}}^{\top}}\boldsymbol{\mu}-{{r}_{0}}}{\sqrt{{{\boldsymbol{\nu}}^{\top}}\mathsf{\Sigma}\boldsymbol{\nu}}}, (71)

for r00,R>0{{r}_{0}}\geq 0,R>0 is solved by

𝝂R,𝖩,=dfc𝖩(𝖩Σ𝖩)1𝖩𝝁,c=R𝝁𝖩(𝖩Σ𝖩)1𝖩𝝁.\begin{split}{\boldsymbol{\nu}}_{{R,\mathsf{J},}*}&=_{\operatorname{df}}c{{\mathsf{J}}^{\top}}{{\left(\mathsf{J}\mathsf{\Sigma}{{\mathsf{J}}^{\top}}\right)}^{-1}}\mathsf{J}\boldsymbol{\mu},\\ c&=\frac{R}{\sqrt{{{\boldsymbol{\mu}}^{\top}}{{\mathsf{J}}^{\top}}{{\left(\mathsf{J}\mathsf{\Sigma}{{\mathsf{J}}^{\top}}\right)}^{-1}}\mathsf{J}\boldsymbol{\mu}}}.\end{split}

When r0>0{{r}_{0}}>0 the solution is unique.

We can easily find the asymptotic distribution of 𝝂^R,𝖩,{\boldsymbol{\hat{\nu}}}_{{R,\mathsf{J},}*}, the sample analogue of the optimal portfolio in Lemma 4.1. First define the subspace second moment.

Definition 4.2.

Let 𝖩~\tilde{\mathsf{J}} be the (1+pj)×(p+1){\left(1+{{p}_{j}}\right)}\times{\left(p+1\right)} matrix,

𝖩~=df[100𝖩].\tilde{\mathsf{J}}=_{\operatorname{df}}\left[\begin{array}[]{cc}{1}&{0}\\ {0}&{\mathsf{J}}\end{array}\right].

Simple algebra proves the following lemma.

Lemma 4.3.

The elements of 𝖩~(𝖩~Θ𝖩~)1𝖩~{{\tilde{\mathsf{J}}}^{\top}}{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{J}} are

𝖩~(𝖩~Θ𝖩~)1𝖩~=[1+𝝁𝖩(𝖩Σ𝖩)1𝖩𝝁𝝁𝖩(𝖩Σ𝖩)1𝖩𝖩(𝖩Σ𝖩)1𝖩𝝁𝖩(𝖩Σ𝖩)1𝖩].{{\tilde{\mathsf{J}}}^{\top}}{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{J}}=\left[\begin{array}[]{cc}{1+{{\boldsymbol{\mu}}^{\top}}{{\mathsf{J}}^{\top}}{{\left(\mathsf{J}\mathsf{\Sigma}{{\mathsf{J}}^{\top}}\right)}^{-1}}\mathsf{J}\boldsymbol{\mu}}&{-{{\boldsymbol{\mu}}^{\top}}{{\mathsf{J}}^{\top}}{{\left(\mathsf{J}\mathsf{\Sigma}{{\mathsf{J}}^{\top}}\right)}^{-1}}\mathsf{J}}\\ {-{{\mathsf{J}}^{\top}}{{\left(\mathsf{J}\mathsf{\Sigma}{{\mathsf{J}}^{\top}}\right)}^{-1}}\mathsf{J}\boldsymbol{\mu}}&{{{\mathsf{J}}^{\top}}{{\left(\mathsf{J}\mathsf{\Sigma}{{\mathsf{J}}^{\top}}\right)}^{-1}}\mathsf{J}}\end{array}\right].

In particular, elements 22 through p+1p+1 of vech(𝖩~(𝖩~Θ𝖩~)1𝖩~)-\operatorname{vech}\left({{\tilde{\mathsf{J}}}^{\top}}{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{J}}\right) are the portfolio 𝛎^R,𝖩,{\boldsymbol{\hat{\nu}}}_{{R,\mathsf{J},}*} defined in Lemma 4.1, up to the scaling constant cc which is the ratio of RR to the square root of the first element of vech(𝖩~(𝖩~Θ𝖩~)1𝖩~)\operatorname{vech}\left({{\tilde{\mathsf{J}}}^{\top}}{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{J}}\right) minus one.
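Lemma 4.3 gives a direct plug-in recipe: form J̃ from the basket matrix, invert J̃Θ̂J̃ᵀ, and read off the constrained portfolio and its scaling. A sketch follows; the baskets 𝖩, the risk budget R, and the inputs used to form Θ̂ are placeholders.

import numpy as np

# placeholder inputs
R = 0.1                                            # risk budget
mu_hat = np.array([0.05, 0.02, 0.03, 0.01])
Sigma_hat = np.diag([0.04, 0.03, 0.05, 0.02])
p = mu_hat.size
Theta_hat = np.zeros((p + 1, p + 1))
Theta_hat[0, 0] = 1.0
Theta_hat[0, 1:] = Theta_hat[1:, 0] = mu_hat
Theta_hat[1:, 1:] = Sigma_hat + np.outer(mu_hat, mu_hat)

# two 'baskets' of assets with orthonormal rows (J J' = I), a placeholder choice
J = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]]) / np.sqrt(2.0)
pj = J.shape[0]

# J-tilde of Definition 4.2
Jt = np.zeros((1 + pj, 1 + p)); Jt[0, 0] = 1.0; Jt[1:, 1:] = J

middle = Jt.T @ np.linalg.inv(Jt @ Theta_hat @ Jt.T) @ Jt
zeta2_J = middle[0, 0] - 1.0                       # squared Sharpe achievable in the subspace
nu_J = -middle[1:, 0]                              # constrained Markowitz direction (Lemma 4.3)
nu_RJ = R / np.sqrt(zeta2_J) * nu_J                # rescaled to the risk budget (Lemma 4.1)
print(nu_RJ)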

The asymptotic distribution of vech(𝖩~(𝖩~Θ𝖩~)1𝖩~)\operatorname{vech}\left({{\tilde{\mathsf{J}}}^{\top}}{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{J}}\right) is given by the following theorem, which is the analogue of Theorem 2.5.

Theorem 4.4.

Let Θ^\mathsf{\hat{\Theta}} be the unbiased sample estimate of Θ\mathsf{\Theta}, based on nn i.i.d. samples of 𝐱\boldsymbol{x}. Let 𝖩~\tilde{\mathsf{J}} be defined as in Definition 4.2. Let Ω\mathsf{\Omega} be the variance of vech(𝐱~𝐱~)\operatorname{vech}\left(\boldsymbol{\tilde{x}}{{\boldsymbol{\tilde{x}}}^{\top}}\right). Then, asymptotically in nn,

n(vech(𝖩~(𝖩~Θ^𝖩~)1𝖩~)vech(𝖩~(𝖩~Θ𝖩~)1𝖩~))𝒩(0,𝖧Ω𝖧),\sqrt{n}\left(\operatorname{vech}\left({{\tilde{\mathsf{J}}}^{\top}}{{\left(\tilde{\mathsf{J}}\mathsf{\hat{\Theta}}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{J}}\right)-\operatorname{vech}\left({{\tilde{\mathsf{J}}}^{\top}}{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{J}}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{H}\mathsf{\Omega}{{\mathsf{H}}^{\top}}\right), (72)

where

𝖧=𝖫(𝖩~𝖩~)((𝖩~Θ𝖩~)1(𝖩~Θ𝖩~)1)(𝖩~𝖩~)𝖣.\mathsf{H}=-\mathsf{L}{\left({{{\tilde{\mathsf{J}}}^{\top}}}\otimes{{{\tilde{\mathsf{J}}}^{\top}}}\right)\left({{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}}\otimes{{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}}\right)\left({\tilde{\mathsf{J}}}\otimes{\tilde{\mathsf{J}}}\right)}\mathsf{D}.
Proof.

By the multivariate delta method, it suffices to prove that

𝖧=dvech(𝖩~(𝖩~Θ^𝖩~)1𝖩~)dvech(Θ).\mathsf{H}=\frac{\mathrm{d}{\operatorname{vech}\left({{\tilde{\mathsf{J}}}^{\top}}{{\left(\tilde{\mathsf{J}}\mathsf{\hat{\Theta}}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{J}}\right)}}{\mathrm{d}\operatorname{vech}\left(\mathsf{\Theta}\right)}.

Via Lemma 2.3, it suffices to prove that

d𝖩~(𝖩~Θ𝖩~)1𝖩~dΘ=(𝖩~𝖩~)((𝖩~Θ𝖩~)1(𝖩~Θ𝖩~)1)(𝖩~𝖩~).\frac{\mathrm{d}{{{\tilde{\mathsf{J}}}^{\top}}{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{J}}}}{\mathrm{d}\mathsf{\Theta}}=-\left({{{\tilde{\mathsf{J}}}^{\top}}}\otimes{{{\tilde{\mathsf{J}}}^{\top}}}\right)\left({{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}}\otimes{{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}}\right)\left({\tilde{\mathsf{J}}}\otimes{\tilde{\mathsf{J}}}\right).

A well-known fact regarding matrix manipulation [37] is

vec(𝖠𝖡𝖢)=(𝖢𝖠)vec(𝖡),therefore,d𝖠𝖡𝖢d𝖡=𝖢𝖠.\operatorname{vec}\left(\mathsf{A}\mathsf{B}\mathsf{C}\right)=\left({{\mathsf{C}}^{\top}}\otimes\mathsf{A}\right)\operatorname{vec}\left(\mathsf{B}\right),\quad\mbox{therefore,}\quad\frac{\mathrm{d}{\mathsf{A}\mathsf{B}\mathsf{C}}}{\mathrm{d}\mathsf{B}}={{\mathsf{C}}^{\top}}\otimes\mathsf{A}.

Using this, and the chain rule, we have:

d𝖩~(𝖩~Θ𝖩~)1𝖩~dΘ=d𝖩~(𝖩~Θ𝖩~)1𝖩~d(𝖩~Θ𝖩~)1d(𝖩~Θ𝖩~)1d𝖩~Θ𝖩~d𝖩~Θ𝖩~dΘ=(𝖩~𝖩~)d(𝖩~Θ𝖩~)1d𝖩~Θ𝖩~(𝖩~𝖩~).\begin{split}\frac{\mathrm{d}{{{\tilde{\mathsf{J}}}^{\top}}{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{J}}}}{\mathrm{d}\mathsf{\Theta}}&=\frac{\mathrm{d}{{{\tilde{\mathsf{J}}}^{\top}}{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{J}}}}{\mathrm{d}{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}}\frac{\mathrm{d}{{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}}}{\mathrm{d}\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}}\frac{\mathrm{d}{\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}}}{\mathrm{d}\mathsf{\Theta}}\\ &=\left({{{\tilde{\mathsf{J}}}^{\top}}}\otimes{{{\tilde{\mathsf{J}}}^{\top}}}\right)\frac{\mathrm{d}{{{\left(\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}\right)}^{-1}}}}{\mathrm{d}\tilde{\mathsf{J}}\mathsf{\Theta}{{\tilde{\mathsf{J}}}^{\top}}}\left({\tilde{\mathsf{J}}}\otimes{\tilde{\mathsf{J}}}\right).\end{split}

Lemma 2.4 gives the middle term, completing the proof. ∎

An analogue of Corollary 2.7 gives the asymptotic distribution of 𝝂R,𝖩,{\boldsymbol{\nu}}_{{R,\mathsf{J},}*} defined in Lemma 4.1.

4.2 Hedging Constraint

Consider, now, the constrained portfolio optimization problem,

max𝝂:𝖦Σ𝝂=𝟎,𝝂Σ𝝂R2𝝂𝝁r0𝝂Σ𝝂,\max_{\begin{subarray}{c}\boldsymbol{\nu}:\mathsf{G}\mathsf{\Sigma}\boldsymbol{\nu}=\boldsymbol{0},\\ {{\boldsymbol{\nu}}^{\top}}\mathsf{\Sigma}\boldsymbol{\nu}\leq R^{2}\end{subarray}}\frac{{{\boldsymbol{\nu}}^{\top}}\boldsymbol{\mu}-{{r}_{0}}}{\sqrt{{{\boldsymbol{\nu}}^{\top}}\mathsf{\Sigma}\boldsymbol{\nu}}}, (73)

where 𝖦\mathsf{G} is now a pg×p{{{p}_{g}}}\times{p} matrix of rank pg{{p}_{g}}. We can interpret the 𝖦\mathsf{G} constraint as stating that the covariance of the returns of a feasible portfolio with the returns of a portfolio whose weights are in a given row of 𝖦\mathsf{G} shall equal zero. In the garden variety application of this problem, 𝖦\mathsf{G} consists of pg{{p}_{g}} rows of the identity matrix; in this case, feasible portfolios are ‘hedged’ with respect to the pg{{p}_{g}} assets selected by 𝖦\mathsf{G} (although they may hold some position in the hedged assets).

Lemma 4.5 (constrained Sharpe ratio optimal portfolio).

Assuming 𝛍𝟎\boldsymbol{\mu}\neq\boldsymbol{0}, and Σ\mathsf{\Sigma} is invertible, the portfolio optimization problem

max𝝂:𝖦Σ𝝂=𝟎,𝝂Σ𝝂R2𝝂𝝁r0𝝂Σ𝝂,\max_{\begin{subarray}{c}\boldsymbol{\nu}:\mathsf{G}\mathsf{\Sigma}\boldsymbol{\nu}=\boldsymbol{0},\\ {{\boldsymbol{\nu}}^{\top}}\mathsf{\Sigma}\boldsymbol{\nu}\leq R^{2}\end{subarray}}\frac{{{\boldsymbol{\nu}}^{\top}}\boldsymbol{\mu}-{{r}_{0}}}{\sqrt{{{\boldsymbol{\nu}}^{\top}}\mathsf{\Sigma}\boldsymbol{\nu}}}, (74)

for r00,R>0{{r}_{0}}\geq 0,R>0 is solved by

𝝂R,𝖦,=dfc(Σ1𝝁𝖦(𝖦Σ𝖦)1𝖦𝝁),c=R𝝁Σ1𝝁𝝁𝖦(𝖦Σ𝖦)1𝖦𝝁.\begin{split}{\boldsymbol{\nu}}_{{R,\mathsf{G},}*}&=_{\operatorname{df}}c\left({{\mathsf{\Sigma}}^{-1}}{\boldsymbol{\mu}}-{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}\boldsymbol{\mu}\right),\\ c&=\frac{R}{\sqrt{{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}-{{\boldsymbol{\mu}}^{\top}}{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}\boldsymbol{\mu}}}.\end{split}

When r0>0{{r}_{0}}>0 the solution is unique.

Proof.

By the Lagrange multiplier technique, the optimal portfolio solves the following equations:

0=c1𝝁c2Σ𝝂γ1Σ𝝂Σ𝖦𝜸𝟐,𝝂Σ𝝂R2,𝖦Σ𝝂=𝟎,\begin{split}0&=c_{1}\boldsymbol{\mu}-c_{2}\mathsf{\Sigma}\boldsymbol{\nu}-\gamma_{1}\mathsf{\Sigma}\boldsymbol{\nu}-\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\boldsymbol{\gamma_{2}},\\ {{\boldsymbol{\nu}}^{\top}}\mathsf{\Sigma}\boldsymbol{\nu}&\leq R^{2},\\ \mathsf{G}\mathsf{\Sigma}\boldsymbol{\nu}&=\boldsymbol{0},\end{split}

where γi\gamma_{i} are Lagrange multipliers, and c1,c2c_{1},c_{2} are scalar constants.

Solving the first equation gives

𝝂=c3[Σ1𝝁𝖦𝜸𝟐].\boldsymbol{\nu}=c_{3}\left[{{\mathsf{\Sigma}}^{-1}}{\boldsymbol{\mu}}-{{\mathsf{G}}^{\top}}\boldsymbol{\gamma_{2}}\right].

Reconciling this with the hedging equation we have

𝟎=𝖦Σ𝝂=c3𝖦Σ[Σ1𝝁𝖦𝜸𝟐],\boldsymbol{0}=\mathsf{G}\mathsf{\Sigma}\boldsymbol{\nu}=c_{3}\mathsf{G}\mathsf{\Sigma}\left[{{\mathsf{\Sigma}}^{-1}}{\boldsymbol{\mu}}-{{\mathsf{G}}^{\top}}\boldsymbol{\gamma_{2}}\right],

and therefore 𝜸𝟐=(𝖦Σ𝖦)1𝖦𝝁.\boldsymbol{\gamma_{2}}={{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}{\mathsf{G}}\boldsymbol{\mu}. Thus

𝝂=c3[Σ1𝝁𝖦(𝖦Σ𝖦)1𝖦𝝁].\boldsymbol{\nu}=c_{3}\left[{{\mathsf{\Sigma}}^{-1}}{\boldsymbol{\mu}}-{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}\boldsymbol{\mu}\right].

Plugging this into the objective reduces the problem to the univariate optimization

maxc3:c32R2/ζ,𝖦2sign(c3)ζ,𝖦r0|c3|ζ,𝖦,\max_{c_{3}:\,c_{3}^{2}\leq R^{2}/{\zeta}^{2}_{*,\mathsf{G}}}\operatorname{sign}\left(c_{3}\right){\zeta}_{*,\mathsf{G}}-\frac{{{r}_{0}}}{\left|c_{3}\right|{\zeta}_{*,\mathsf{G}}},

where ζ²_{*,𝖦} = 𝝁ᵀΣ⁻¹𝝁 − 𝝁ᵀ𝖦ᵀ(𝖦Σ𝖦ᵀ)⁻¹𝖦𝝁. The optimum occurs for c₃ = R/ζ_{*,𝖦}; moreover, the optimum is unique when r₀ > 0. ∎

The optimal hedged portfolio in Lemma 4.5 is, up to scaling, the difference of the unconstrained optimal portfolio from Lemma 2.6 and the subspace constrained portfolio in Lemma 4.1. This ‘delta’ analogy continues for the rest of this section.

Definition 4.6 (Delta Inverse Second Moment).

Let 𝖦~\tilde{\mathsf{G}} be the (1+pg)×(p+1){\left(1+{{p}_{g}}\right)}\times{\left(p+1\right)} matrix,

𝖦~=df[100𝖦].\tilde{\mathsf{G}}=_{\operatorname{df}}\left[\begin{array}[]{cc}{1}&{0}\\ {0}&{\mathsf{G}}\end{array}\right].

Define the ‘delta inverse second moment’ as

Δ𝖦Θ1=dfΘ1𝖦~(𝖦~Θ𝖦~)1𝖦~.{\Delta}_{\mathsf{G}}{{\mathsf{\Theta}}^{-1}}=_{\operatorname{df}}{{\mathsf{\Theta}}^{-1}}-{{\tilde{\mathsf{G}}}^{\top}}{{\left(\tilde{\mathsf{G}}\mathsf{\Theta}{{\tilde{\mathsf{G}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{G}}.

Simple algebra proves the following lemma.

Lemma 4.7.

The elements of Δ𝖦Θ1{\Delta}_{\mathsf{G}}{{\mathsf{\Theta}}^{-1}} are

Δ𝖦Θ1=[𝝁Σ1𝝁𝝁𝖦(𝖦Σ𝖦)1𝖦𝝁𝝁Σ1+𝝁𝖦(𝖦Σ𝖦)1𝖦Σ1𝝁+𝖦(𝖦Σ𝖦)1𝖦𝝁Σ1𝖦(𝖦Σ𝖦)1𝖦].{\Delta}_{\mathsf{G}}{{\mathsf{\Theta}}^{-1}}=\left[\begin{array}[]{cc}{{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}-{{\boldsymbol{\mu}}^{\top}}{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}\boldsymbol{\mu}}&{-{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}+{{\boldsymbol{\mu}}^{\top}}{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}}\\ {-{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}+{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}\boldsymbol{\mu}}&{{{\mathsf{\Sigma}}^{-1}}-{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}}\end{array}\right].

In particular, elements 22 through p+1p+1 of vech(Δ𝖦Θ1)-\operatorname{vech}\left({\Delta}_{\mathsf{G}}{{\mathsf{\Theta}}^{-1}}\right) are the portfolio 𝛎R,𝖦,{\boldsymbol{\nu}}_{{R,\mathsf{G},}*} defined in Lemma 4.5, up to the scaling constant cc which is the ratio of RR to the square root of the first element of vech(Δ𝖦Θ1)\operatorname{vech}\left({\Delta}_{\mathsf{G}}{{\mathsf{\Theta}}^{-1}}\right).

The statistic 𝝁^Σ^1𝝁^𝝁^𝖦(𝖦Σ^𝖦)1𝖦𝝁^{{\boldsymbol{\hat{\mu}}}^{\top}}{{\mathsf{\hat{\Sigma}}}^{-1}}\boldsymbol{\hat{\mu}}-{{\boldsymbol{\hat{\mu}}}^{\top}}{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\hat{\Sigma}}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}\boldsymbol{\hat{\mu}}, for the case where 𝖦\mathsf{G} is some rows of the p×p{p}\times{p} identity matrix, was first proposed by Rao, and its distribution under Gaussian returns was later found by Giri. [57, 20] This test statistic may be used for tests of portfolio spanning for the case where a risk-free instrument is traded. [23, 30]
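A plug-in sketch of the hedged computation follows: form G̃, compute Δ_G Θ̂⁻¹, and read off both the hedged portfolio of Lemma 4.5 and the spanning statistic in the (1,1) element. The hedge matrix 𝖦 (here a single row of the identity), the risk budget, and the inputs used to form Θ̂ are placeholders.

import numpy as np

# placeholder inputs
R = 0.1
mu_hat = np.array([0.05, 0.02, 0.03])
Sigma_hat = np.array([[0.04, 0.01, 0.00],
                      [0.01, 0.03, 0.01],
                      [0.00, 0.01, 0.05]])
p = mu_hat.size
Theta_hat = np.zeros((p + 1, p + 1))
Theta_hat[0, 0] = 1.0
Theta_hat[0, 1:] = Theta_hat[1:, 0] = mu_hat
Theta_hat[1:, 1:] = Sigma_hat + np.outer(mu_hat, mu_hat)

G = np.array([[1.0, 0.0, 0.0]])                    # hedge out the first asset (a placeholder choice)
pg = G.shape[0]
Gt = np.zeros((1 + pg, 1 + p)); Gt[0, 0] = 1.0; Gt[1:, 1:] = G     # G-tilde of Definition 4.6

delta = np.linalg.inv(Theta_hat) - Gt.T @ np.linalg.inv(Gt @ Theta_hat @ Gt.T) @ Gt
zeta2_G = delta[0, 0]                              # the spanning statistic (hedged squared Sharpe)
nu_G = -delta[1:, 0]                               # hedged Markowitz direction (Lemma 4.7)
nu_RG = R / np.sqrt(zeta2_G) * nu_G                # rescaled to the risk budget (Lemma 4.5)
print(zeta2_G, nu_RG)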

The asymptotic distribution of Δ𝖦Θ^1{\Delta}_{\mathsf{G}}{{\mathsf{\hat{\Theta}}}^{-1}} is given by the following theorem, which is the analogue of Theorem 2.5.

Theorem 4.8.

Let Θ^\mathsf{\hat{\Theta}} be the unbiased sample estimate of Θ\mathsf{\Theta}, based on nn i.i.d. samples of 𝐱\boldsymbol{x}. Let Δ𝖦Θ1{\Delta}_{\mathsf{G}}{{\mathsf{\Theta}}^{-1}} be defined as in Definition 4.6, and similarly define Δ𝖦Θ^1{\Delta}_{\mathsf{G}}{{\mathsf{\hat{\Theta}}}^{-1}}. Let Ω\mathsf{\Omega} be the variance of vech(𝐱~𝐱~)\operatorname{vech}\left(\boldsymbol{\tilde{x}}{{\boldsymbol{\tilde{x}}}^{\top}}\right). Then, asymptotically in nn,

n(vech(Δ𝖦Θ^1)vech(Δ𝖦Θ1))𝒩(0,𝖧Ω𝖧),\sqrt{n}\left(\operatorname{vech}\left({\Delta}_{\mathsf{G}}{{\mathsf{\hat{\Theta}}}^{-1}}\right)-\operatorname{vech}\left({\Delta}_{\mathsf{G}}{{\mathsf{\Theta}}^{-1}}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{H}\mathsf{\Omega}{{\mathsf{H}}^{\top}}\right), (75)

where

𝖧=𝖫[Θ1Θ1(𝖦~𝖦~)((𝖦~Θ𝖦~)1(𝖦~Θ𝖦~)1)(𝖦~𝖦~)]𝖣.\mathsf{H}=-\mathsf{L}{\left[{{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}-\left({{{\tilde{\mathsf{G}}}^{\top}}}\otimes{{{\tilde{\mathsf{G}}}^{\top}}}\right)\left({{{\left(\tilde{\mathsf{G}}\mathsf{\Theta}{{\tilde{\mathsf{G}}}^{\top}}\right)}^{-1}}}\otimes{{{\left(\tilde{\mathsf{G}}\mathsf{\Theta}{{\tilde{\mathsf{G}}}^{\top}}\right)}^{-1}}}\right)\left({\tilde{\mathsf{G}}}\otimes{\tilde{\mathsf{G}}}\right)\right]}\mathsf{D}.
Proof.

Minor modification of proof of Theorem 4.4. ∎

Caution.

In the hedged portfolio optimization problem considered here, the optimal portfolio will, in general, hold money in the row space of 𝖦. For example, in the garden-variety application, where one is hedging out exposure to ‘the market’ by including a broad market ETF, and taking 𝖦 to be the corresponding row of the identity matrix, the final portfolio may hold some position in that broad market ETF. This is fine for an ETF, but one may wish to hedge out exposure to an untradeable returns stream, the returns of an index, say. Combining the hedging constraint of this section with the subspace constraint of Section 4.1 is simple in the case where the rows of 𝖦 are spanned by the rows of 𝖩. The more general case, however, is rather more complicated.

4.3 Conditional Heteroskedasticity

The methods described above ignore ‘volatility clustering’ and assume homoskedasticity. [11, 47, 4] To deal with this, consider a strictly positive scalar random variable, sᵢ, observable at the time the investment decision is required to capture 𝒙ᵢ₊₁. For reasons that will become clear later, it is more convenient to think of sᵢ as a ‘quietude’ indicator, or a ‘weight’ for a weighted regression.

Two simple competing models for conditional heteroskedasticity are

(constant):E[𝒙i+1|si]\displaystyle\mbox{(constant):}\quad\operatorname{E}\left[{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}}\right.\right] =si1𝝁,\displaystyle={{s}_{i}}^{-1}\boldsymbol{\mu}, Var(𝒙i+1|si)\displaystyle\operatorname{Var}\left({\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}}\right.\right) =si2Σ,\displaystyle={{s}_{i}}^{-2}\mathsf{\Sigma}, (76)
(floating):E[𝒙i+1|si]\displaystyle\mbox{(floating):}\quad\operatorname{E}\left[{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}}\right.\right] =𝝁,\displaystyle=\boldsymbol{\mu}, Var(𝒙i+1|si)\displaystyle\operatorname{Var}\left({\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}}\right.\right) =si2Σ.\displaystyle={{s}_{i}}^{-2}\mathsf{\Sigma}. (77)

Under the model in Equation 76, the maximal Sharpe ratio is √(𝝁ᵀΣ⁻¹𝝁), independent of sᵢ; under Equation 77, it is sᵢ√(𝝁ᵀΣ⁻¹𝝁). The model names reflect whether or not the maximal Sharpe ratio varies conditional on sᵢ.

The optimal portfolio under both models is the same, as stated in the following lemma, the proof of which follows by simply using Lemma 2.6.

Lemma 4.9 (Conditional Sharpe ratio optimal portfolio).

Under either the model in Equation 76 or Equation 77, conditional on observing si{{s}_{i}}, the portfolio optimization problem

argmax𝝂:Var(𝝂𝒙i+1|si)R2E[𝝂𝒙i+1|si]r0Var(𝝂𝒙i+1|si),\mathop{\mathrm{argmax}}_{\boldsymbol{\nu}:\,\operatorname{Var}\left({{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}}\right.\right)\leq R^{2}}\frac{\operatorname{E}\left[{{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}}\right.\right]-{{r}_{0}}}{\sqrt{\operatorname{Var}\left({{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}}\right.\right)}}, (78)

for r00,R>0{{r}_{0}}\geq 0,R>0 is solved by

𝝂=siR𝝁Σ1𝝁Σ1𝝁.{\boldsymbol{\nu}}_{{}*}=\frac{{{s}_{i}}R}{\sqrt{{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}}}{{\mathsf{\Sigma}}^{-1}}{\boldsymbol{\mu}}. (79)

Moreover, this is the unique solution whenever r0>0{{r}_{0}}>0.

To perform inference on the portfolio 𝝂{\boldsymbol{\nu}}_{{}*} from Lemma 4.9, under the ‘constant’ model of Equation 76, apply the unconditional techniques to the sample second moment of si𝒙~i+1{{s}_{i}}{\boldsymbol{\tilde{x}}}_{i+1}.

For the ‘floating’ model of Equation 77, however, some adjustment to the technique is required. Define 𝒙~~i+1=dfsi𝒙~i+1{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}=_{\operatorname{df}}{{s}_{i}}{\boldsymbol{\tilde{x}}}_{i+1}; that is, 𝒙~~i+1=[si,si𝒙i+1]{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}={{\left[{{s}_{i}},{{s}_{i}}{{{\boldsymbol{x}}_{i+1}}^{\top}}\right]}^{\top}}. Consider the second moment of 𝒙~~\boldsymbol{\tilde{\tilde{x}}}:

Θs=dfE[𝒙~~𝒙~~]=[γ2γ2𝝁γ2𝝁Σ+𝝁γ2𝝁],whereγ2=dfE[s2].{{\mathsf{\Theta}}_{s}}=_{\operatorname{df}}\operatorname{E}\left[\boldsymbol{\tilde{\tilde{x}}}{{\boldsymbol{\tilde{\tilde{x}}}}^{\top}}\right]=\left[\begin{array}[]{cc}{{\gamma}^{2}}&{{\gamma}^{2}{{\boldsymbol{\mu}}^{\top}}}\\ {{\gamma}^{2}\boldsymbol{\mu}}&{\mathsf{\Sigma}+\boldsymbol{\mu}{\gamma}^{2}{{\boldsymbol{\mu}}^{\top}}}\end{array}\right],\quad\mbox{where}\quad{\gamma}^{2}=_{\operatorname{df}}\operatorname{E}\left[s^{2}\right]. (80)

The inverse of Θs{{\mathsf{\Theta}}_{s}} is

Θs1=[γ2+𝝁Σ1𝝁𝝁Σ1Σ1𝝁Σ1]{{{{\mathsf{\Theta}}_{s}}}^{-1}}=\left[\begin{array}[]{cc}{{\gamma}^{-2}+{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}}&{-{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}\\ {-{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}}&{{{\mathsf{\Sigma}}^{-1}}}\end{array}\right] (81)

Once again, the optimal portfolio (up to scaling and sign) appears in vech(Θ_s⁻¹). Similarly, define the sample analogue:

Θ^s=df1ni𝒙~~i+1𝒙~~i+1.{{\mathsf{\hat{\Theta}}}_{s}}=_{\operatorname{df}}\frac{1}{n}\sum_{i}{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}{{{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}}^{\top}}. (82)

We can find the asymptotic distribution of vech(Θ^s)\operatorname{vech}\left({{\mathsf{\hat{\Theta}}}_{s}}\right) using the same techniques as in the unconditional case, as in the following analogue of Theorem 2.5:

Theorem 4.10.

Let Θ^s=df1ni𝐱~~i+1𝐱~~i+1{{\mathsf{\hat{\Theta}}}_{s}}=_{\operatorname{df}}\frac{1}{n}\sum_{i}{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}{{{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}}^{\top}}, based on nn i.i.d. samples of [s,𝐱]{{\left[s,{{\boldsymbol{x}}^{\top}}\right]}^{\top}}. Let Ω\mathsf{\Omega} be the variance of vech(𝐱~~𝐱~~)\operatorname{vech}\left(\boldsymbol{\tilde{\tilde{x}}}{{\boldsymbol{\tilde{\tilde{x}}}}^{\top}}\right). Then, asymptotically in nn,

n(vech(Θ^s1)vech(Θs1))𝒩(0,𝖧Ω𝖧),\sqrt{n}\left(\operatorname{vech}\left({{{{\mathsf{\hat{\Theta}}}_{s}}}^{-1}}\right)-\operatorname{vech}\left({{{{\mathsf{\Theta}}_{s}}}^{-1}}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{H}\mathsf{\Omega}{{\mathsf{H}}^{\top}}\right), (83)

where

𝖧=𝖫(Θs1Θs1)𝖣.\mathsf{H}=-\mathsf{L}{\left({{{{{\mathsf{\Theta}}_{s}}}^{-1}}}\otimes{{{{{\mathsf{\Theta}}_{s}}}^{-1}}}\right)}\mathsf{D}. (84)

Furthermore, we may replace Ω\mathsf{\Omega} in this equation with an asymptotically consistent estimator, Ω^\mathsf{\hat{\Omega}}.

The only real difference from the unconditional case is that we cannot automatically assume that the first row and column of Ω are zero (unless s is actually constant, which misses the point). Moreover, the shortcut for estimating Ω under Gaussian returns is not valid without some patching, an exercise left for the reader.
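A sketch of the ‘floating’ computation follows: weight the augmented observations by sᵢ, form Θ̂_s as in Equation 82, read the portfolio direction off its inverse as in Equation 81, and estimate Ω̂ directly from the vech of the weighted outer products, since the Gaussian shortcut is unavailable. The simulated sᵢ and 𝒙ᵢ₊₁ below merely stand in for observed data.

import numpy as np

rng = np.random.default_rng(7)
n, p = 4000, 3
mu = np.array([0.04, 0.02, 0.03])
Sigma = np.diag([0.04, 0.03, 0.05])
L_chol = np.linalg.cholesky(Sigma)

# placeholder data under the 'floating' model: mean mu, volatility scaling like 1/s
s = rng.uniform(0.5, 1.5, size=n)
x = mu + (rng.normal(size=(n, p)) @ L_chol.T) / s[:, None]

# doubly augmented observations [s_i, s_i x_{i+1}'] and their second moment, Equation 82
xtt = s[:, None] * np.column_stack([np.ones(n), x])
Theta_s = xtt.T @ xtt / n
Ts_inv = np.linalg.inv(Theta_s)
nu_dir = -Ts_inv[1:, 0]                                    # proportional to Sigma^{-1} mu, per Equation 81

# estimate Omega from the vech of the outer products; the first entry is s^2, no longer constant
m = p + 1
ii = np.array([i for j in range(m) for i in range(j, m)])  # column-major vech ordering
jj = np.array([j for j in range(m) for i in range(j, m)])
vechs = np.einsum('ni,nj->nij', xtt, xtt)[:, ii, jj]
Omega_hat = np.cov(vechs, rowvar=False)
print(nu_dir, Omega_hat.shape)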

Dependence or independence of the maximal Sharpe ratio on volatility is an assumption which, ideally, one could test with data. A mixed model containing both characteristics can be written as follows:

(mixed):E[𝒙i+1|si]\displaystyle\mbox{(mixed):}\quad\operatorname{E}\left[{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}}\right.\right] =si1𝝁0+𝝁1,\displaystyle={{s}_{i}}^{-1}{\boldsymbol{\mu}}_{0}+{\boldsymbol{\mu}}_{1}, Var(𝒙i+1|si)\displaystyle\operatorname{Var}\left({\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}}\right.\right) =si2Σ.\displaystyle={{s}_{i}}^{-2}\mathsf{\Sigma}. (85)

One could then test whether elements of 𝝁0{\boldsymbol{\mu}}_{0} or of 𝝁1{\boldsymbol{\mu}}_{1} are zero. Analyzing this model is somewhat complicated without moving to a more general framework, as in the sequel.

4.4 Conditional Expectation and Heteroskedasticity

Suppose you observe random variables si>0{{s}_{i}}>0, and ff-vector 𝒇i{\boldsymbol{f}}_{i} at some time prior to when the investment decision is required to capture 𝒙i+1{\boldsymbol{x}}_{i+1}. It need not be the case that ss and 𝒇\boldsymbol{f} are independent. The general model is now

(bi-conditional):E[𝒙i+1|si,𝒇i]\displaystyle\mbox{(bi-conditional):}\quad\operatorname{E}\left[{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right] =𝖡𝒇i,\displaystyle=\mathsf{B}{\boldsymbol{f}}_{i}, Var(𝒙i+1|si,𝒇i)\displaystyle\operatorname{Var}\left({\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right) =si2Σ,\displaystyle={{s}_{i}}^{-2}\mathsf{\Sigma}, (86)

where 𝖡\mathsf{B} is some p×f{p}\times{f} matrix. Without the si{{s}_{i}} term, these are the ‘predictive regression’ equations commonly used in Tactical Asset Allocation. [10, 22, 5]

By letting 𝒇i=[si1,1]{\boldsymbol{f}}_{i}={{\left[{{s}_{i}}^{-1},1\right]}^{\top}} we recover the mixed model in Equation 85; the bi-conditional model is considerably more general, however. The conditionally-optimal portfolio is given by the following lemma. Once again, the proof proceeds simply by plugging in the conditional expected return and volatility into Lemma 2.6.

Lemma 4.11 (Conditional Sharpe ratio optimal portfolio).

Under the model in Equation 86, conditional on observing si{{s}_{i}} and 𝐟i{\boldsymbol{f}}_{i}, the portfolio optimization problem

argmax𝝂:Var(𝝂𝒙i+1|si,𝒇i)R2E[𝝂𝒙i+1|si,𝒇i]r0Var(𝝂𝒙i+1|si,𝒇i),\mathop{\mathrm{argmax}}_{\boldsymbol{\nu}:\,\operatorname{Var}\left({{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right)\leq R^{2}}\frac{\operatorname{E}\left[{{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right]-{{r}_{0}}}{\sqrt{\operatorname{Var}\left({{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right)}}, (87)

for r00,R>0{{r}_{0}}\geq 0,R>0 is solved by

𝝂=siR𝒇i𝖡Σ1𝖡𝒇iΣ1𝖡𝒇i.{\boldsymbol{\nu}}_{{}*}=\frac{{{s}_{i}}R}{\sqrt{{{{\boldsymbol{f}}_{i}}^{\top}}{{\mathsf{B}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\mathsf{B}{\boldsymbol{f}}_{i}}}{{\mathsf{\Sigma}}^{-1}}{\mathsf{B}{\boldsymbol{f}}_{i}}. (88)

Moreover, this is the unique solution whenever r0>0{{r}_{0}}>0.

Caution.

It is emphatically not the case that investing in the portfolio 𝝂{\boldsymbol{\nu}}_{{}*} from Lemma 4.11 at every time step is long-term Sharpe ratio optimal. One may possibly achieve a higher long-term Sharpe ratio by down-levering at times when the conditional Sharpe ratio is low. The optimal long term investment strategy falls under the rubric of ‘multiperiod portfolio choice’, and is an area of active research. [46, 17, 5]

The matrix Σ1𝖡{{\mathsf{\Sigma}}^{-1}}{\mathsf{B}} is the generalization of the Markowitz portfolio: it is the multiplier for a model under which the optimal portfolio is linear in the features 𝒇i{\boldsymbol{f}}_{i} (up to scaling to satisfy the risk budget). We can think of this matrix as the ‘Markowitz coefficient’. If an entire column of Σ1𝖡{{\mathsf{\Sigma}}^{-1}}{\mathsf{B}} is zero, it suggests that the corresponding element of 𝒇\boldsymbol{f} can be ignored in investment decisions; if an entire row of Σ1𝖡{{\mathsf{\Sigma}}^{-1}}{\mathsf{B}} is zero, it suggests the corresponding instrument delivers no return or hedging benefit.
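A minimal sketch of the portfolio in Equation 88, treating 𝖡\mathsf{B}, Σ\mathsf{\Sigma}, the observed si{{s}_{i}} and 𝒇i{\boldsymbol{f}}_{i}, and the risk budget RR as known inputs; all numbers below are made up for illustration.

import numpy as np

def conditional_markowitz(B, Sigma, f_i, s_i, R):
    """The portfolio of Equation 88: s_i R Sigma^{-1} B f_i / sqrt(f_i' B' Sigma^{-1} B f_i)."""
    direction = np.linalg.solve(Sigma, B @ f_i)       # Sigma^{-1} B f_i
    snr_sq = (B @ f_i) @ direction                    # f_i' B' Sigma^{-1} B f_i
    return s_i * R * direction / np.sqrt(snr_sq)

# toy example with p = 3 assets and f = 2 features
rng = np.random.default_rng(0)
B = 0.01 * rng.normal(size=(3, 2))
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + np.eye(3)
nu = conditional_markowitz(B, Sigma, f_i=np.array([1.0, 0.5]), s_i=2.0, R=0.1)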

Conditional on observing 𝒇i{\boldsymbol{f}}_{i} and si{{s}_{i}}, the maximal achievable squared signal-noise ratio is

ζ2|si,𝒇i=df(E[𝝂𝒙i+1|si,𝒇i]Var(𝝂𝒙i+1|si,𝒇i))2=𝒇i𝖡Σ1𝖡𝒇i.{\zeta}^{2}_{*}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.=_{\operatorname{df}}\left(\frac{\operatorname{E}\left[{{{\boldsymbol{\nu}}_{{}*}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right]}{\sqrt{\operatorname{Var}\left({{{\boldsymbol{\nu}}_{{}*}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right)}}\right)^{2}={{{\boldsymbol{f}}_{i}}^{\top}}{{\mathsf{B}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\mathsf{B}{\boldsymbol{f}}_{i}.

This is independent of si{{s}_{i}}, but depends on 𝒇i{\boldsymbol{f}}_{i}. The unconditional expected value of the maximal squared signal-noise ratio is thus

E𝒇[ζ2]=dfE𝒇[tr(𝒇t𝖡Σ1𝖡𝒇t)],=E𝒇[tr((𝖡Σ1𝖡)𝒇t𝒇t)],=tr((𝖡Σ1𝖡)E𝒇[𝒇t𝒇t]),=tr((𝖡Σ1𝖡)Γf).\begin{split}{{\operatorname{E}}_{\boldsymbol{f}}}\left[{\zeta}^{2}_{*}\right]&=_{\operatorname{df}}{{\operatorname{E}}_{\boldsymbol{f}}}\left[\operatorname{tr}\left({{{\boldsymbol{f}}_{t}}^{\top}}{{\mathsf{B}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\mathsf{B}{\boldsymbol{f}}_{t}\right)\right],\\ &={{\operatorname{E}}_{\boldsymbol{f}}}\left[\operatorname{tr}\left(\left({{\mathsf{B}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\mathsf{B}\right){\boldsymbol{f}}_{t}{{{\boldsymbol{f}}_{t}}^{\top}}\right)\right],\\ &=\operatorname{tr}\left(\left({{\mathsf{B}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\mathsf{B}\right){{\operatorname{E}}_{\boldsymbol{f}}}\left[{\boldsymbol{f}}_{t}{{{\boldsymbol{f}}_{t}}^{\top}}\right]\right),\\ &=\operatorname{tr}\left(\left({{\mathsf{B}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\mathsf{B}\right){\mathsf{\Gamma}}_{f}\right).\end{split}

This quantity is the Hotelling-Lawley trace, typically used to test the so-called Multivariate General Linear Hypothesis. [58, 45] See Section 6.
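The cyclic permutation of the trace in the display above is easy to confirm numerically; a small sketch with made-up 𝖡\mathsf{B}, Σ\mathsf{\Sigma}, and feature draws:

import numpy as np

rng = np.random.default_rng(1)
p, f, n = 4, 3, 100000
B = 0.01 * rng.normal(size=(p, f))
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)
F = rng.normal(size=(n, f))                      # draws of the feature vector

M = B.T @ np.linalg.solve(Sigma, B)              # B' Sigma^{-1} B
zeta_sq = np.einsum('ij,jk,ik->i', F, M, F)      # f_t' B' Sigma^{-1} B f_t, per draw
Gamma_f = F.T @ F / n                            # Monte Carlo estimate of E[f f']
print(zeta_sq.mean(), np.trace(M @ Gamma_f))     # the two sides agree up to Monte Carlo error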

To perform inference on the Markowitz coefficient, we can proceed exactly as above. Let

𝒙~~i+1=df[si𝒇i,si𝒙i+1].{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}=_{\operatorname{df}}{{\left[{{s}_{i}}{{{\boldsymbol{f}}_{i}}^{\top}},{{s}_{i}}{{{\boldsymbol{x}}_{i+1}}^{\top}}\right]}^{\top}}. (89)

Consider the second moment of 𝒙~~\boldsymbol{\tilde{\tilde{x}}}:

Θf=dfE[𝒙~~𝒙~~]=[ΓfΓf𝖡𝖡ΓfΣ+𝖡Γf𝖡],whereΓf=dfE[s2𝒇𝒇].{{\mathsf{\Theta}}_{f}}=_{\operatorname{df}}\operatorname{E}\left[\boldsymbol{\tilde{\tilde{x}}}{{\boldsymbol{\tilde{\tilde{x}}}}^{\top}}\right]=\left[\begin{array}[]{cc}{{\mathsf{\Gamma}}_{f}}&{{\mathsf{\Gamma}}_{f}{{\mathsf{B}}^{\top}}}\\ {\mathsf{B}{\mathsf{\Gamma}}_{f}}&{\mathsf{\Sigma}+\mathsf{B}{\mathsf{\Gamma}}_{f}{{\mathsf{B}}^{\top}}}\end{array}\right],\quad\mbox{where}\quad{\mathsf{\Gamma}}_{f}=_{\operatorname{df}}\operatorname{E}\left[s^{2}\boldsymbol{f}{{\boldsymbol{f}}^{\top}}\right]. (90)

The inverse of Θf{{\mathsf{\Theta}}_{f}} is

Θf1=[Γf1+𝖡Σ1𝖡𝖡Σ1Σ1𝖡Σ1]{{{{\mathsf{\Theta}}_{f}}}^{-1}}=\left[\begin{array}[]{cc}{{{{\mathsf{\Gamma}}_{f}}^{-1}}+{{\mathsf{B}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\mathsf{B}}&{-{{\mathsf{B}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}\\ {-{{\mathsf{\Sigma}}^{-1}}\mathsf{B}}&{{{\mathsf{\Sigma}}^{-1}}}\end{array}\right] (91)

Once again, the Markowitz coefficient (up to scaling and sign), appears in vech(Θf1)\operatorname{vech}\left({{{{\mathsf{\Theta}}_{f}}}^{-1}}\right).
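The block forms of Equations 90 and 91 can be confirmed numerically; a sketch with arbitrary made-up Γf{\mathsf{\Gamma}}_{f}, 𝖡\mathsf{B}, and Σ\mathsf{\Sigma}:

import numpy as np

rng = np.random.default_rng(2)
p, f = 3, 2
A1 = rng.normal(size=(f, f)); Gamma_f = A1 @ A1.T + np.eye(f)
A2 = rng.normal(size=(p, p)); Sigma = A2 @ A2.T + np.eye(p)
B = rng.normal(size=(p, f))

Theta_f = np.block([[Gamma_f, Gamma_f @ B.T],
                    [B @ Gamma_f, Sigma + B @ Gamma_f @ B.T]])          # Equation 90
iSigma = np.linalg.inv(Sigma)
iTheta_claim = np.block([[np.linalg.inv(Gamma_f) + B.T @ iSigma @ B, -B.T @ iSigma],
                         [-iSigma @ B, iSigma]])                        # Equation 91
print(np.allclose(np.linalg.inv(Theta_f), iTheta_claim))                # True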

The following theorem is an analogue of, and shares a proof with, Theorem 2.5.

Theorem 4.12.

Let Θ^f=df1ni𝐱~~i+1𝐱~~i+1{{\mathsf{\hat{\Theta}}}_{f}}=_{\operatorname{df}}\frac{1}{n}\sum_{i}{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}{{{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}}^{\top}}, based on nn i.i.d. samples of [s,𝐟,𝐱]{{\left[s,{{\boldsymbol{f}}^{\top}},{{\boldsymbol{x}}^{\top}}\right]}^{\top}}, where

𝒙~~i+1=df[si𝒇i,si𝒙i+1].{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}=_{\operatorname{df}}{{\left[{{s}_{i}}{{{\boldsymbol{f}}_{i}}^{\top}},{{s}_{i}}{{{\boldsymbol{x}}_{i+1}}^{\top}}\right]}^{\top}}.

Let Ω\mathsf{\Omega} be the variance of vech(𝐱~~𝐱~~)\operatorname{vech}\left(\boldsymbol{\tilde{\tilde{x}}}{{\boldsymbol{\tilde{\tilde{x}}}}^{\top}}\right). Then, asymptotically in nn,

n(vech(Θ^f1)vech(Θf1))𝒩(0,𝖧Ω𝖧),\sqrt{n}\left(\operatorname{vech}\left({{{{\mathsf{\hat{\Theta}}}_{f}}}^{-1}}\right)-\operatorname{vech}\left({{{{\mathsf{\Theta}}_{f}}}^{-1}}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{H}\mathsf{\Omega}{{\mathsf{H}}^{\top}}\right), (92)

where

𝖧=𝖫(Θf1Θf1)𝖣.\mathsf{H}=-\mathsf{L}{\left({{{{{\mathsf{\Theta}}_{f}}}^{-1}}}\otimes{{{{{\mathsf{\Theta}}_{f}}}^{-1}}}\right)}\mathsf{D}. (93)

Furthermore, we may replace Ω\mathsf{\Omega} in this equation with an asymptotically consistent estimator, Ω^\mathsf{\hat{\Omega}}.
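In practice Theorem 4.12 is applied by forming the augmented vector as in Equation 89, estimating the covariance of the vech of the inverse second moment by the delta method, and reading off the entries corresponding to the (negated) Markowitz coefficient. The following sketch reuses the vech and theta_inv_acov helpers from the sketch following Theorem 4.10; the data shapes (s a length-n vector, feat an n-by-f matrix, x an n-by-p matrix) and the function name are illustrative assumptions, and the null hypothesis is that the corresponding coefficient is zero.

import numpy as np
# assumes vech and theta_inv_acov from the sketch following Theorem 4.10 are in scope

def markowitz_coef_wald(s, feat, x):
    """Marginal Wald statistics for the Markowitz coefficient Sigma^{-1} B (null of zero)."""
    n, f = feat.shape
    p = x.shape[1]
    ttx = np.hstack([s[:, None] * feat, s[:, None] * x])      # Equation 89, row by row
    est, acov = theta_inv_acov(ttx)
    se = np.sqrt(np.diag(acov))
    k = f + p
    vix = lambda i, j: j * k - j * (j - 1) // 2 + (i - j)     # (i, j), i >= j, to vech index
    wald = np.empty((p, f))
    for jf in range(f):
        for ip in range(p):
            idx = vix(f + ip, jf)                             # lower left block of Theta_f^{-1}
            wald[ip, jf] = -est[idx] / se[idx]                # minus sign: the block is -Sigma^{-1} B
    return wald

Substituting a HAC estimator for the sample covariance of the vech outer products, as in Section 7, is a one-line change in the helper.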

4.5 Conditional Expectation and Heteroskedasticity with Hedging Constraint

A little work allows us to combine the conditional model of Section 4.4 with the hedging constraint of Section 4.2. Suppose returns follow the model of Equation 86. To prove the following lemma, simply plug in 𝖡𝒇i\mathsf{B}{\boldsymbol{f}}_{i} for 𝝁\boldsymbol{\mu}, and si2Σ{{s}_{i}}^{-2}\mathsf{\Sigma} for Σ\mathsf{\Sigma} into Lemma 4.5.

Lemma 4.13 (Hedged Conditional Sharpe ratio optimal portfolio).

Let 𝖦\mathsf{G} be a given pg×p{{{p}_{g}}}\times{p} matrix of rank pg{{p}_{g}}. Under the model in Equation 86, conditional on observing si{{s}_{i}} and 𝐟i{\boldsymbol{f}}_{i}, the portfolio optimization problem

argmax𝝂:𝖦Σ𝝂=𝟎,Var(𝝂𝒙i+1|si,𝒇i)R2E[𝝂𝒙i+1|si,𝒇i]r0Var(𝝂𝒙i+1|si,𝒇i),\mathop{\mathrm{argmax}}_{\begin{subarray}{c}\boldsymbol{\nu}:\,\mathsf{G}\mathsf{\Sigma}\boldsymbol{\nu}=\boldsymbol{0},\\ \operatorname{Var}\left({{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right)\leq R^{2}\end{subarray}}\frac{\operatorname{E}\left[{{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right]-{{r}_{0}}}{\sqrt{\operatorname{Var}\left({{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right)}}, (94)

for r00,R>0{{r}_{0}}\geq 0,R>0 is solved by

𝝂R,𝖦,=dfc(Σ1𝖡𝖦(𝖦Σ𝖦)1𝖦𝖡)𝒇i,c=siR(𝖡𝒇i)Σ1(𝖡𝒇i)(𝖡𝒇i)𝖦(𝖦Σ𝖦)1𝖦(𝖡𝒇i).\begin{split}{\boldsymbol{\nu}}_{{R,\mathsf{G},}*}&=_{\operatorname{df}}c\left({{\mathsf{\Sigma}}^{-1}}{\mathsf{B}}-{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}\mathsf{B}\right){\boldsymbol{f}}_{i},\\ c&=\frac{{{s}_{i}}R}{\sqrt{{{\left(\mathsf{B}{\boldsymbol{f}}_{i}\right)}^{\top}}{{\mathsf{\Sigma}}^{-1}}\left(\mathsf{B}{\boldsymbol{f}}_{i}\right)-{{\left(\mathsf{B}{\boldsymbol{f}}_{i}\right)}^{\top}}{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}\left(\mathsf{B}{\boldsymbol{f}}_{i}\right)}}.\end{split}

Moreover, this is the unique solution whenever r0>0{{r}_{0}}>0.

The same cautions regarding multiperiod portfolio choice apply to the above lemma. Results analogous to those of Section 4.2 follow, but with one minor modification to the analogue of Definition 4.6.
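A minimal sketch of the portfolio in Lemma 4.13, treating 𝖡\mathsf{B}, Σ\mathsf{\Sigma}, 𝖦\mathsf{G}, the observed si{{s}_{i}} and 𝒇i{\boldsymbol{f}}_{i}, and the risk budget RR as known inputs; names and values are placeholders.

import numpy as np

def hedged_conditional_markowitz(B, Sigma, G, f_i, s_i, R):
    """The portfolio of Lemma 4.13: hedged against G, at conditional risk budget R."""
    mu_c = B @ f_i                                            # conditional mean, B f_i
    unhedged = np.linalg.solve(Sigma, mu_c)                   # Sigma^{-1} B f_i
    hedge = G.T @ np.linalg.solve(G @ Sigma @ G.T, G @ mu_c)  # G'(G Sigma G')^{-1} G B f_i
    denom = mu_c @ unhedged - mu_c @ hedge                    # the quadratic form under the root
    return (s_i * R / np.sqrt(denom)) * (unhedged - hedge)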

Lemma 4.14.

Let 𝖦~\tilde{\mathsf{G}} now be the (f+pg)×(f+p){\left(f+{{p}_{g}}\right)}\times{\left(f+p\right)} matrix,

𝖦~=df[𝖨f00𝖦],\tilde{\mathsf{G}}=_{\operatorname{df}}\left[\begin{array}[]{cc}{\mathsf{I}_{f}}&{0}\\ {0}&{\mathsf{G}}\end{array}\right],

where the upper left corner is the f×f{f}\times{f} identity matrix. Define the ‘delta inverse second moment’ as

Δ𝖦Θf1=dfΘf1𝖦~(𝖦~Θf𝖦~)1𝖦~,{\Delta}_{\mathsf{G}}{{{{\mathsf{\Theta}}_{f}}}^{-1}}=_{\operatorname{df}}{{{{\mathsf{\Theta}}_{f}}}^{-1}}-{{\tilde{\mathsf{G}}}^{\top}}{{\left(\tilde{\mathsf{G}}{{\mathsf{\Theta}}_{f}}{{\tilde{\mathsf{G}}}^{\top}}\right)}^{-1}}\tilde{\mathsf{G}}, (95)

where Θf{{\mathsf{\Theta}}_{f}} is defined as in Equation 90. The elements of Δ𝖦Θf1{\Delta}_{\mathsf{G}}{{{{\mathsf{\Theta}}_{f}}}^{-1}} are

Δ𝖦Θ1=[𝖡Σ1𝖡𝖡𝖦(𝖦Σ𝖦)1𝖦𝖡𝖡Σ1+𝖡𝖦(𝖦Σ𝖦)1𝖦Σ1𝖡+𝖦(𝖦Σ𝖦)1𝖦𝖡Σ1𝖦(𝖦Σ𝖦)1𝖦].{\Delta}_{\mathsf{G}}{{\mathsf{\Theta}}^{-1}}=\left[\begin{array}[]{cc}{{{\mathsf{B}}^{\top}}{{\mathsf{\Sigma}}^{-1}}\mathsf{B}-{{\mathsf{B}}^{\top}}{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}\mathsf{B}}&{-{{\mathsf{B}}^{\top}}{{\mathsf{\Sigma}}^{-1}}+{{\mathsf{B}}^{\top}}{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}}\\ {-{{\mathsf{\Sigma}}^{-1}}\mathsf{B}+{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}\mathsf{B}}&{{{\mathsf{\Sigma}}^{-1}}-{{\mathsf{G}}^{\top}}{{\left(\mathsf{G}\mathsf{\Sigma}{{\mathsf{G}}^{\top}}\right)}^{-1}}\mathsf{G}}\end{array}\right].

In particular, the Markowitz coefficient from Lemma 4.13 appears, negated, in the lower left block of Δ𝖦Θf1{\Delta}_{\mathsf{G}}{{{{\mathsf{\Theta}}_{f}}}^{-1}}, and thus in vech(Δ𝖦Θf1)-\operatorname{vech}\left({\Delta}_{\mathsf{G}}{{{{\mathsf{\Theta}}_{f}}}^{-1}}\right); the denominator of the constant cc from Lemma 4.13 is a quadratic form of 𝐟i{\boldsymbol{f}}_{i} with the upper left block of Δ𝖦Θf1{\Delta}_{\mathsf{G}}{{{{\mathsf{\Theta}}_{f}}}^{-1}}.

Theorem 4.15.

Let Θ^f=df1ni𝐱~~i+1𝐱~~i+1{{\mathsf{\hat{\Theta}}}_{f}}=_{\operatorname{df}}\frac{1}{n}\sum_{i}{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}{{{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}}^{\top}}, based on nn i.i.d. samples of [s,𝐟,𝐱]{{\left[s,{{\boldsymbol{f}}^{\top}},{{\boldsymbol{x}}^{\top}}\right]}^{\top}}, where

𝒙~~i+1=df[si𝒇i,si𝒙i+1].{\boldsymbol{\tilde{\tilde{x}}}}_{i+1}=_{\operatorname{df}}{{\left[{{s}_{i}}{{{\boldsymbol{f}}_{i}}^{\top}},{{s}_{i}}{{{\boldsymbol{x}}_{i+1}}^{\top}}\right]}^{\top}}.

Let Ω\mathsf{\Omega} be the variance of vech(𝐱~~𝐱~~)\operatorname{vech}\left(\boldsymbol{\tilde{\tilde{x}}}{{\boldsymbol{\tilde{\tilde{x}}}}^{\top}}\right). Define Δ𝖦Θf1{\Delta}_{\mathsf{G}}{{{{\mathsf{\Theta}}_{f}}}^{-1}} as in Equation 95 for given (f+pg)×(p+f){\left(f+{{p}_{g}}\right)}\times{\left(p+f\right)} matrix 𝖦~\tilde{\mathsf{G}}.

Then, asymptotically in nn,

n(vech(Δ𝖦Θ^f1)vech(Δ𝖦Θf1))𝒩(0,𝖧Ω𝖧),\sqrt{n}\left(\operatorname{vech}\left({\Delta}_{\mathsf{G}}{{{{\mathsf{\hat{\Theta}}}_{f}}}^{-1}}\right)-\operatorname{vech}\left({\Delta}_{\mathsf{G}}{{{{\mathsf{\Theta}}_{f}}}^{-1}}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{H}\mathsf{\Omega}{{\mathsf{H}}^{\top}}\right), (96)

where

𝖧=𝖫[Θf1Θf1(𝖦~𝖦~)((𝖦~Θf𝖦~)1(𝖦~Θf𝖦~)1)(𝖦~𝖦~)]𝖣.\mathsf{H}=-\mathsf{L}{\left[{{{{{\mathsf{\Theta}}_{f}}}^{-1}}}\otimes{{{{{\mathsf{\Theta}}_{f}}}^{-1}}}-\left({{{\tilde{\mathsf{G}}}^{\top}}}\otimes{{{\tilde{\mathsf{G}}}^{\top}}}\right)\left({{{\left(\tilde{\mathsf{G}}{{\mathsf{\Theta}}_{f}}{{\tilde{\mathsf{G}}}^{\top}}\right)}^{-1}}}\otimes{{{\left(\tilde{\mathsf{G}}{{\mathsf{\Theta}}_{f}}{{\tilde{\mathsf{G}}}^{\top}}\right)}^{-1}}}\right)\left({\tilde{\mathsf{G}}}\otimes{\tilde{\mathsf{G}}}\right)\right]}\mathsf{D}.

Furthermore, we may replace Ω\mathsf{\Omega} in this equation with an asymptotically consistent estimator, Ω^\mathsf{\hat{\Omega}}.

4.6 Conditional Expectation and Multivariate Heteroskedasticity

Here we extend the model from Section 4.4 to accept multiple heteroskedasticity ‘features’. First note that if we redefined 𝒇i{\boldsymbol{f}}_{i} to be 𝒇isi{\boldsymbol{f}}_{i}{{s}_{i}}, we could rewrite the model in Equation 86 as

E[𝒙i+1si|si,𝒇i]\displaystyle\operatorname{E}\left[{\boldsymbol{x}}_{i+1}{{s}_{i}}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right] =𝖡𝒇i,\displaystyle=\mathsf{B}{\boldsymbol{f}}_{i}, Var(𝒙i+1si|si,𝒇i)\displaystyle\operatorname{Var}\left({\boldsymbol{x}}_{i+1}{{s}_{i}}\left|\,{{s}_{i}},{\boldsymbol{f}}_{i}\right.\right) =Σ,\displaystyle=\mathsf{\Sigma},

This can be generalized to vector-valued 𝒔\boldsymbol{s} by means of the flattening trick. [6]

Suppose you observe the state variables qq-vector 𝒔i{\boldsymbol{s}}_{i}, and ff-vector 𝒇i{\boldsymbol{f}}_{i} at some time prior to when the investment decision is required to capture 𝒙i+1{\boldsymbol{x}}_{i+1}. It need not be the case that 𝒔\boldsymbol{s} and 𝒇\boldsymbol{f} are independent. For sane interpretation of the model, it makes sense that all elements of 𝒔\boldsymbol{s} are positive. The general model is now

E[vec(𝒙i+1𝒔i)|𝒔i,𝒇i]\displaystyle\operatorname{E}\left[\operatorname{vec}\left({\boldsymbol{x}}_{i+1}{{{\boldsymbol{s}}_{i}}^{\top}}\right)\left|\,{\boldsymbol{s}}_{i},{\boldsymbol{f}}_{i}\right.\right] =𝖡𝒇i,\displaystyle=\mathsf{B}{\boldsymbol{f}}_{i}, Var(vec(𝒙i+1𝒔i)|𝒔i,𝒇i)\displaystyle\operatorname{Var}\left(\operatorname{vec}\left({\boldsymbol{x}}_{i+1}{{{\boldsymbol{s}}_{i}}^{\top}}\right)\left|\,{\boldsymbol{s}}_{i},{\boldsymbol{f}}_{i}\right.\right) =Σ,\displaystyle=\mathsf{\Sigma}, (97)

where 𝖡\mathsf{B} is some (pq)×f{\left(pq\right)}\times{f} matrix, and Σ\mathsf{\Sigma} is now a (pq)×(pq){\left(pq\right)}\times{\left(pq\right)} matrix.

Conditional on observing 𝒔i{\boldsymbol{s}}_{i}, a portfolio on the pp assets, 𝝂^\boldsymbol{\hat{\nu}}, can be expressed as the portfolio vec(𝝂^𝒔i)\operatorname{vec}\left(\boldsymbol{\hat{\nu}}{{{\boldsymbol{s}}_{i}}^{-\top}}\right) on the (pq)\left(pq\right) assets whose returns are the vector vec(𝒙i+1𝒔i)\operatorname{vec}\left({\boldsymbol{x}}_{i+1}{{{\boldsymbol{s}}_{i}}^{\top}}\right); here 𝒔i{{{\boldsymbol{s}}_{i}}^{-\top}} refers to the element-wise, or Hadamard, inverse of 𝒔i{{{\boldsymbol{s}}_{i}}^{\top}}. Thus we may perform conditional portfolio optimization on the enlarged space of (pq)\left(pq\right) assets, and then, conditional on 𝒔i{\boldsymbol{s}}_{i}, impose a subspace constraint requiring the portfolio to be spanned by the column space of (𝖨p𝒔i){{\left({\mathsf{I}}_{p}\divideontimes{\boldsymbol{s}}_{i}\right)}^{\top}}, where \divideontimes is used to mean the Kronecker product with 𝒔i1{{{\boldsymbol{s}}_{i}}^{-1}}, the Hadamard inverse of 𝒔i{\boldsymbol{s}}_{i}.

We can then combine the results of Section 4.1 and Section 4.4 to solve the portfolio optimization problem, and perform inference on that portfolio. The following is the analogue of Lemma 4.11 combined with Lemma 4.1.

Lemma 4.16 (Conditional Sharpe ratio optimal portfolio).

Suppose returns follow the model in Equation 97, and suppose 𝐬i{\boldsymbol{s}}_{i} and 𝐟i{\boldsymbol{f}}_{i} have been observed. Let 𝖩=(𝖨p𝐬i),\mathsf{J}={{\left({\mathsf{I}}_{p}\divideontimes{\boldsymbol{s}}_{i}\right)}^{\top}}, and suppose 𝖩𝖡𝐟i\mathsf{J}\mathsf{B}{\boldsymbol{f}}_{i} is not all zeros. Then the portfolio optimization problem

argmax𝝂:Var(𝝂𝒙i+1|𝒔i,𝒇i)R2E[𝝂𝒙i+1|𝒔i,𝒇i]r0Var(𝝂𝒙i+1|𝒔i,𝒇i),\mathop{\mathrm{argmax}}_{\boldsymbol{\nu}:\,\operatorname{Var}\left({{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{\boldsymbol{s}}_{i},{\boldsymbol{f}}_{i}\right.\right)\leq R^{2}}\frac{\operatorname{E}\left[{{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{\boldsymbol{s}}_{i},{\boldsymbol{f}}_{i}\right.\right]-{{r}_{0}}}{\sqrt{\operatorname{Var}\left({{\boldsymbol{\nu}}^{\top}}{\boldsymbol{x}}_{i+1}\left|\,{\boldsymbol{s}}_{i},{\boldsymbol{f}}_{i}\right.\right)}}, (98)

for r00,R>0{{r}_{0}}\geq 0,R>0 is solved by

𝝂=c𝖩(𝖩Σ𝖩)1𝖩𝖡𝒇i,c=R(𝖡𝒇i)𝖩(𝖩Σ𝖩)1𝖩(𝖡𝒇i).\begin{split}{\boldsymbol{\nu}}_{{}*}&=c{{\mathsf{J}}^{\top}}{{\left(\mathsf{J}\mathsf{\Sigma}{{\mathsf{J}}^{\top}}\right)}^{-1}}\mathsf{J}\mathsf{B}{\boldsymbol{f}}_{i},\\ c&=\frac{R}{\sqrt{{{\left(\mathsf{B}{\boldsymbol{f}}_{i}\right)}^{\top}}{{\mathsf{J}}^{\top}}{{\left(\mathsf{J}\mathsf{\Sigma}{{\mathsf{J}}^{\top}}\right)}^{-1}}\mathsf{J}\left(\mathsf{B}{\boldsymbol{f}}_{i}\right)}}.\end{split}

Moreover, the solution is unique whenever r0>0{{r}_{0}}>0. This portfolio achieves the maximal objective of

(𝖡𝒇i)𝖩(𝖩Σ𝖩)1𝖩(𝖡𝒇i)r0R.\sqrt{{{\left(\mathsf{B}{\boldsymbol{f}}_{i}\right)}^{\top}}{{\mathsf{J}}^{\top}}{{\left(\mathsf{J}\mathsf{\Sigma}{{\mathsf{J}}^{\top}}\right)}^{-1}}\mathsf{J}\left(\mathsf{B}{\boldsymbol{f}}_{i}\right)}-\frac{{{r}_{0}}}{R}.

The distribution of the sample analogue of the portfolio in Lemma 4.16 is given essentially by Theorem 4.4, applied to the case of conditional expected returns.

5 Constrained Estimation

Now consider the case where the population parameter Θf{{\mathsf{\Theta}}_{f}} is known or suspected, a priori, to satisfy some constraints. One then wishes to impose the same constraints on the sample estimate prior to constructing the Markowitz portfolio, imposing a hedge, etc.

To avoid both the possibility that the constrained estimate fails to be positive definite and the need for cone programming to find the estimate, here we assume the constraint can be expressed in terms of the (lower) Cholesky factor of Θf{{\mathsf{\Theta}}_{f}}. Note that this factor takes the form

Θf1/2=[Γf1/2𝟢𝖡Γf1/2Σ1/2],{{{{\mathsf{\Theta}}_{f}}}^{1/2}}=\left[\begin{array}[]{cc}{{{{\mathsf{\Gamma}}_{f}}^{1/2}}}&{\mathsf{0}}\\ {\mathsf{B}{{{\mathsf{\Gamma}}_{f}}^{1/2}}}&{{{\mathsf{\Sigma}}^{1/2}}}\end{array}\right], (99)

as can be confirmed by multiplying the above by its transpose.
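That confirmation is a one-line numerical check; a sketch with made-up Γf{\mathsf{\Gamma}}_{f}, 𝖡\mathsf{B}, and Σ\mathsf{\Sigma}:

import numpy as np

rng = np.random.default_rng(3)
p, f = 3, 2
A1 = rng.normal(size=(f, f)); Gamma_f = A1 @ A1.T + np.eye(f)
A2 = rng.normal(size=(p, p)); Sigma = A2 @ A2.T + np.eye(p)
B = rng.normal(size=(p, f))

Gf_half = np.linalg.cholesky(Gamma_f)                    # lower Cholesky factor of Gamma_f
S_half = np.linalg.cholesky(Sigma)
Theta_half = np.block([[Gf_half, np.zeros((f, p))],
                       [B @ Gf_half, S_half]])           # Equation 99
Theta_f = np.block([[Gamma_f, Gamma_f @ B.T],
                    [B @ Gamma_f, Sigma + B @ Gamma_f @ B.T]])
print(np.allclose(Theta_half @ Theta_half.T, Theta_f))   # True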

5.1 Linear constraints

Now consider equality constraints of the form 𝖠𝖡=𝖳\mathsf{A}\mathsf{B}=\mathsf{T} for conformable matrices 𝖠,𝖳\mathsf{A},\mathsf{T}, a less general form of the Multivariate General Linear Hypothesis, of which more in the sequel. Via equalities of this form, one can constrain the mean of certain assets to be zero (for example, assets to be hedged out), or force certain elements of 𝒇i{\boldsymbol{f}}_{i} to have no marginal predictive ability on certain elements of 𝒙i+1{\boldsymbol{x}}_{i+1}. When this constraint is satisfied, note that

[𝖳𝖠]Θf1/2[𝖨𝟢]=𝟢,{\left[\begin{array}[]{rr}{-\mathsf{T}}&{\mathsf{A}}\end{array}\right]}{{{{\mathsf{\Theta}}_{f}}}^{1/2}}\left[\begin{array}[]{r}{\mathsf{I}}\\ {\mathsf{0}}\end{array}\right]=\mathsf{0},

which can be rewritten as

𝟢=([𝖨𝟢][𝖳𝖠])vec(Θf1/2),=([𝖨𝟢][𝖳𝖠])𝖫vech(Θf1/2).\begin{split}\mathsf{0}&=\left(\left[\begin{array}[]{rr}{\mathsf{I}}&{\mathsf{0}}\end{array}\right]\otimes\left[\begin{array}[]{rr}{-\mathsf{T}}&{\mathsf{A}}\end{array}\right]\right)\operatorname{vec}\left({{{{\mathsf{\Theta}}_{f}}}^{1/2}}\right),\\ &=\left(\left[\begin{array}[]{rr}{\mathsf{I}}&{\mathsf{0}}\end{array}\right]\otimes\left[\begin{array}[]{rr}{-\mathsf{T}}&{\mathsf{A}}\end{array}\right]\right){{\mathsf{L}}^{\top}}\operatorname{vech}\left({{{{\mathsf{\Theta}}_{f}}}^{1/2}}\right).\end{split}

This motivates the imposition of nc{{n}_{c}} equality constraints of the form

𝔇vech(Θf1/2)=𝔟,\mathfrak{D}\operatorname{vech}\left({{{{\mathsf{\Theta}}_{f}}}^{1/2}}\right)=\mathfrak{b}, (100)

where 𝔇\mathfrak{D} is some nc×nv{{{n}_{c}}}\times{{{n}_{v}}} matrix and 𝔟\mathfrak{b} is a nc{{n}_{c}}-vector, where nv=(p+f+1)(p+f)/2{{n}_{v}}=\left(p+f+1\right)\left(p+f\right)/2 is the number of elements in vech(Θf1/2)\operatorname{vech}\left({{{{\mathsf{\Theta}}_{f}}}^{1/2}}\right).

Now consider the optimization problem

minz:𝔇z=𝔟(zvech(Θ^f1/2))𝔚(zvech(Θ^f1/2)),\min_{z:\,\mathfrak{D}z=\mathfrak{b}}{{\left(z-\operatorname{vech}\left({{{{\mathsf{\hat{\Theta}}}_{f}}}^{1/2}}\right)\right)}^{\top}}\mathfrak{W}\left(z-\operatorname{vech}\left({{{{\mathsf{\hat{\Theta}}}_{f}}}^{1/2}}\right)\right), (101)

where 𝔚\mathfrak{W} is some symmetric positive definite nv×nv{{{n}_{v}}}\times{{{n}_{v}}} ‘weighting’ matrix, the identity in the garden variety application. The solution to this problem can easily be identified via the Lagrange multiplier technique to be

\begin{split}z_{*}&=\operatorname{vech}\left({{{{\mathsf{\hat{\Theta}}}_{f}}}^{1/2}}\right)+{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}{{\left(\mathfrak{D}{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}\right)}^{-1}}\left(\mathfrak{b}-\mathfrak{D}\operatorname{vech}\left({{{{\mathsf{\hat{\Theta}}}_{f}}}^{1/2}}\right)\right),\\ &={{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}{{\left(\mathfrak{D}{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}\right)}^{-1}}\mathfrak{b}+\left[\mathsf{I}-{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}{{\left(\mathfrak{D}{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}\right)}^{-1}}\mathfrak{D}\right]\operatorname{vech}\left({{{{\mathsf{\hat{\Theta}}}_{f}}}^{1/2}}\right).\end{split}
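A minimal sketch of this constrained projection, with the weighting matrix defaulting to the identity as in the garden variety application; the argument names are placeholders.

import numpy as np

def constrained_vech(z_hat, D, b, W=None):
    """Solve min_z (z - z_hat)' W (z - z_hat) subject to D @ z = b."""
    nv = z_hat.shape[0]
    W = np.eye(nv) if W is None else W
    WiDt = np.linalg.solve(W, D.T)                             # W^{-1} D'
    z_star = z_hat + WiDt @ np.linalg.solve(D @ WiDt, b - D @ z_hat)
    return z_star                                              # satisfies D @ z_star == b

One can confirm that applying D to the output recovers b to machine precision, and that with the identity weighting the solution is the orthogonal projection of the unconstrained vech onto the constraint set.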

Define Θ~f{{\mathsf{\tilde{\Theta}}}_{f}} to be the (p+f)×(p+f){\left(p+f\right)}\times{\left(p+f\right)} matrix whose Cholesky factor solves minimization problem 101:

{{\mathsf{\tilde{\Theta}}}_{f}}=_{\operatorname{df}}\left({{\operatorname{tril}}}^{-1}\left(z_{*}\right)\right){{\left({{\operatorname{tril}}}^{-1}\left(z_{*}\right)\right)}^{\top}}. Here {{\operatorname{tril}}}^{-1}\left(z_{*}\right) denotes the lower triangular matrix whose \operatorname{vech} is z_{*}, as in the proof of Theorem 5.1.

When the population parameter satisfies the constraints, this sample estimate is asymptotically unbiased.

Theorem 5.1.

Suppose 𝔇vech(Θf1/2)=𝔟\mathfrak{D}\operatorname{vech}\left({{{{\mathsf{\Theta}}_{f}}}^{1/2}}\right)=\mathfrak{b} for given nc×nv{{{n}_{c}}}\times{{{n}_{v}}} matrix 𝔇\mathfrak{D} and nc{{n}_{c}}-vector 𝔟\mathfrak{b}. Let 𝔚\mathfrak{W} be a symmetric, positive definite nv×nv{{{n}_{v}}}\times{{{n}_{v}}} matrix. Let Θ^f=df1ni𝐱~i+1𝐱~i+1{{\mathsf{\hat{\Theta}}}_{f}}=_{\operatorname{df}}\frac{1}{n}\sum_{i}{\boldsymbol{\tilde{x}}}_{i+1}{{{\boldsymbol{\tilde{x}}}_{i+1}}^{\top}}, based on nn i.i.d. samples of [𝐟,𝐱]{{\left[{{\boldsymbol{f}}^{\top}},{{\boldsymbol{x}}^{\top}}\right]}^{\top}}, where

𝒙~i+1=df[𝒇i,𝒙i+1].{\boldsymbol{\tilde{x}}}_{i+1}=_{\operatorname{df}}{{\left[{{{\boldsymbol{f}}_{i}}^{\top}},{{{\boldsymbol{x}}_{i+1}}^{\top}}\right]}^{\top}}.

Let Ω\mathsf{\Omega} be the variance of vech(𝐱~𝐱~)\operatorname{vech}\left(\boldsymbol{\tilde{x}}{{\boldsymbol{\tilde{x}}}^{\top}}\right). Define Θ~f{{\mathsf{\tilde{\Theta}}}_{f}} such that

\operatorname{vech}\left({{{{\mathsf{\tilde{\Theta}}}_{f}}}^{1/2}}\right)={{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}{{\left(\mathfrak{D}{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}\right)}^{-1}}\mathfrak{b}+\left[\mathsf{I}-{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}{{\left(\mathfrak{D}{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}\right)}^{-1}}\mathfrak{D}\right]\operatorname{vech}\left({{{{\mathsf{\hat{\Theta}}}_{f}}}^{1/2}}\right).

Then, asymptotically in nn,

n(vech(Θ~f)vech(Θf))𝒩(0,𝖧Ω𝖧),\sqrt{n}\left(\operatorname{vech}\left({{\mathsf{\tilde{\Theta}}}_{f}}\right)-\operatorname{vech}\left({{\mathsf{\Theta}}_{f}}\right)\right)\rightsquigarrow\mathcal{N}\left(0,\mathsf{H}\mathsf{\Omega}{{\mathsf{H}}^{\top}}\right), (102)

where 𝖧=𝖧1𝖧2𝖧3\mathsf{H}=\mathsf{H}_{1}\mathsf{H}_{2}\mathsf{H}_{3} defined as

𝖧1=𝖫(𝖨+𝖪)(Θf1/2𝖨),𝖧2=𝖫[𝖨𝔚1𝔇(𝔇𝔚1𝔇)1𝔇],𝖧3=(𝖫(𝖨+𝖪)(Θf1/2𝖨)𝖫)1,\begin{split}\mathsf{H}_{1}&=\mathsf{L}\left(\mathsf{I}+\mathsf{K}\right)\left({{{{\mathsf{\Theta}}_{f}}}^{1/2}}\otimes\mathsf{I}\right),\\ \mathsf{H}_{2}&={{\mathsf{L}}^{\top}}\left[\mathsf{I}-{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}{{\left(\mathfrak{D}{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}\right)}^{-1}}\mathfrak{D}\right],\\ \mathsf{H}_{3}&={{\left(\mathsf{L}\left(\mathsf{I}+\mathsf{K}\right)\left({{{{\mathsf{\Theta}}_{f}}}^{1/2}}\otimes\mathsf{I}\right){{\mathsf{L}}^{\top}}\right)}^{-1}},\\ \end{split}

where 𝖪\mathsf{K} is the Commutation matrix.

Furthermore, we may replace Ω\mathsf{\Omega} in this equation with an asymptotically consistent estimator, Ω^\mathsf{\hat{\Omega}}.

Proof.

Define the functions f1(𝖷){f}_{1}\left(\mathsf{X}\right), f2(𝖷){f}_{2}\left(\mathsf{X}\right), f3(𝖷){f}_{3}\left(\mathsf{X}\right) as follows:

\begin{split}{f}_{1}\left(\mathsf{X}\right)&=\operatorname{vech}\left(\mathsf{X}{{\mathsf{X}}^{\top}}\right),\\ {f}_{2}\left(\mathsf{X}\right)&={{\operatorname{tril}}}^{-1}\left({{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}{{\left(\mathfrak{D}{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}\right)}^{-1}}\mathfrak{b}+\left[\mathsf{I}-{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}{{\left(\mathfrak{D}{{\mathfrak{W}}^{-1}}{{\mathfrak{D}}^{\top}}\right)}^{-1}}\mathfrak{D}\right]\mathsf{X}\right),\\ {f}_{3}\left(\mathsf{X}\right)&=\operatorname{vech}\left({{\mathsf{X}}^{1/2}}\right),\\ \end{split}

where tril1(𝖷){{\operatorname{tril}}}^{-1}\left(\mathsf{X}\right) is the function that takes a conformable vector to a lower triangular matrix. We have then defined vech(Θ~f)\operatorname{vech}\left({{\mathsf{\tilde{\Theta}}}_{f}}\right) as f1(f2(f3(Θ^f))){f}_{1}\left({f}_{2}\left({f}_{3}\left({{\mathsf{\hat{\Theta}}}_{f}}\right)\right)\right). By the central limit theorem, and the matrix chain rule, it suffices to show that 𝖧1\mathsf{H}_{1} is the derivative of f1(){f}_{1}\left(\cdot\right) evaluated at f2(f3(Θf)){f}_{2}\left({f}_{3}\left({{\mathsf{\Theta}}_{f}}\right)\right), that 𝖧2\mathsf{H}_{2} is the derivative of f2(){f}_{2}\left(\cdot\right) evaluated at f3(Θf){f}_{3}\left({{\mathsf{\Theta}}_{f}}\right), and 𝖧3\mathsf{H}_{3} is the derivative of f3(){f}_{3}\left(\cdot\right) evaluated at Θf{{\mathsf{\Theta}}_{f}}.

These are established by Equation 123 of Lemma A.1, and Lemma A.2, and by the assumption that 𝔇vech(Θf1/2)=𝔟\mathfrak{D}\operatorname{vech}\left({{{{\mathsf{\Theta}}_{f}}}^{1/2}}\right)=\mathfrak{b}, which implies that f2(f3(Θf))=Θf1/2{f}_{2}\left({f}_{3}\left({{\mathsf{\Theta}}_{f}}\right)\right)={{{{\mathsf{\Theta}}_{f}}}^{1/2}}.

The choice of 𝔚\mathfrak{W} is non-trivial. Armed with Theorem 5.1 and knowledge of Θ\mathsf{\Theta} and Ω\mathsf{\Omega}, one would attempt to minimize the covariance of the estimator Θ~f{{\mathsf{\tilde{\Theta}}}_{f}}. Since these are unknown population parameters, one would have to estimate them somehow. [16, 12]

The linear equality constraint can be generalized to a single half-space inequality constraint. That is,

𝔇vech(Θf1/2)𝔟,\mathfrak{D}\operatorname{vech}\left({{{{\mathsf{\Theta}}_{f}}}^{1/2}}\right)\leq\mathfrak{b}, (103)

for 𝔇\mathfrak{D} a 1×nv{1}\times{{{n}_{v}}} matrix and 𝔟\mathfrak{b} a scalar. [61, 62] However, the general case of multiple inequality constraints is much more difficult, and remains an open question.

5.2 Rank constraints

Another plausible type of constraint is a rank constraint. Here it is suspected a priori that the (p+f)×(p+f){\left(p+f\right)}\times{\left(p+f\right)} matrix Θf{{\mathsf{\Theta}}_{f}} has rank r<p+fr<p+f. One sane response of a portfolio manager with this belief is to project Θ^f{{\mathsf{\hat{\Theta}}}_{f}} to a rank rr matrix, take the pseudoinverse, and use the (negative) corner sub-matrix as the Markowitz coefficient. (cf. Lemma 4.13) Here we consider the asymptotic distribution of this reduced rank Markowitz coefficient.

To find the asymptotic distribution of sample estimates of Θf{{\mathsf{\Theta}}_{f}} with a rank constraint, the derivative of the reduced rank decomposition is needed. [54, 26]

Lemma 5.2.

Let 𝖠\mathsf{A} be a real J×J{J}\times{J} symmetric matrix with rank rJr\leq J. Let vj(𝖠){v}_{j}\left(\mathsf{A}\right) be the function that returns the jthj^{\text{th}} eigenvalue of 𝖠\mathsf{A}, and similarly let Vj(𝖠){V}_{j}\left(\mathsf{A}\right) compute the corresponding eigenvector. Then

\frac{\mathrm{d}{{v}_{j}\left(\mathsf{A}\right)}}{\mathrm{d}\operatorname{vech}\left(\mathsf{A}\right)}={{\operatorname{vec}\left({V}_{j}\left(\mathsf{A}\right){{{V}_{j}\left(\mathsf{A}\right)}^{\top}}\right)}^{\top}}\mathsf{D}, (104)
\frac{\mathrm{d}{{V}_{j}\left(\mathsf{A}\right)}}{\mathrm{d}\operatorname{vech}\left(\mathsf{A}\right)}={{\left({v}_{j}\left(\mathsf{A}\right)\mathsf{I}-\mathsf{A}\right)}^{+}}\left({{{V}_{j}\left(\mathsf{A}\right)}^{\top}}\otimes\mathsf{I}\right)\mathsf{D}. (105)
Proof.

The derivatives are known. [54, equation (67)-(68)] The form here follows from algebra and Lemma 2.3. ∎

From these, the derivative of the r×r{r}\times{r} diagonal matrix with diagonal

[v1(𝖠)p,v2(𝖠)p,,vr(𝖠)p]{{\left[{v}_{1}\left(\mathsf{A}\right)^{p},{v}_{2}\left(\mathsf{A}\right)^{p},\ldots,{v}_{r}\left(\mathsf{A}\right)^{p}\right]}^{\top}}

can be computed with respect to vech(𝖠)\operatorname{vech}\left(\mathsf{A}\right) for arbitrary non-zero pp. Similarly the derivative of the matrix whose columns are v1(𝖠),v2(𝖠),,vr(𝖠){v}_{1}\left(\mathsf{A}\right),{v}_{2}\left(\mathsf{A}\right),\ldots,{v}_{r}\left(\mathsf{A}\right) can be computed with respect to vech(𝖠)\operatorname{vech}\left(\mathsf{A}\right). From these the derivative of the pseudo-inverse of the rank rr approximation to 𝖠\mathsf{A} can be computed with respect to vech(𝖠)\operatorname{vech}\left(\mathsf{A}\right). By the delta method, then, an asymptotic normal distribution of the pseudo-inverse of the rank rr approximation to 𝖠\mathsf{A} can be established. The formula for the variance is best left unwritten, since it would be too complex to be enlightening, and is best constructed by automatic differentiation anyway.
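The eigenvalue derivative of Equation 104 is straightforward to check by finite differences, valid for a simple eigenvalue; a sketch with a made-up symmetric matrix:

import numpy as np

rng = np.random.default_rng(4)
J = 5
A0 = rng.normal(size=(J, J))
A = A0 + A0.T                                   # symmetric test matrix
E0 = rng.normal(size=(J, J))
E = 1e-6 * (E0 + E0.T)                          # small symmetric perturbation

def eigpair(M, j):
    """The j-th largest eigenvalue of symmetric M and its eigenvector."""
    w, V = np.linalg.eigh(M)
    order = np.argsort(w)[::-1]
    return w[order][j], V[:, order[j]]

j = 0
lam, v = eigpair(A, j)
lam_pert, _ = eigpair(A + E, j)
# first order prediction: d v_j(A) = tr(V_j V_j' dA), the content of Equation 104
print(lam_pert - lam, np.trace(np.outer(v, v) @ E))   # nearly equal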

6 The multivariate general linear hypothesis

Dropping the conditional heteroskedasticity term from Equation 86, we have the model

E[𝒙i+1|𝒇i]\displaystyle\operatorname{E}\left[{\boldsymbol{x}}_{i+1}\left|\,{\boldsymbol{f}}_{i}\right.\right] =𝖡𝒇i,\displaystyle=\mathsf{B}{\boldsymbol{f}}_{i}, Var(𝒙i+1|𝒇i)\displaystyle\operatorname{Var}\left({\boldsymbol{x}}_{i+1}\left|\,{\boldsymbol{f}}_{i}\right.\right) =Σ,\displaystyle=\mathsf{\Sigma},

The unknowns 𝖡\mathsf{B} and Σ\mathsf{\Sigma} can be estimated by multivariate multiple linear regression. Testing for significance of the elements of 𝖡\mathsf{B} is via the Multivariate General Linear Hypothesis (MGLH). [45, 59, 60, 49, 63, 67] The MGLH can be posed as

H0:𝖠𝖡𝖢=𝖳,{{H}_{0}}:\mathsf{A}\mathsf{B}\mathsf{C}=\mathsf{T}, (106)

for a×p{a}\times{p} matrix 𝖠\mathsf{A}, f×c{f}\times{c} matrix 𝖢\mathsf{C}, and a×c{a}\times{c} matrix 𝖳\mathsf{T}. We require 𝖠\mathsf{A} and 𝖢\mathsf{C} to have full rank, and apa\leq p and cf.c\leq f. In the garden-variety application one tests whether 𝖡\mathsf{B} is all zero by letting 𝖠\mathsf{A} and 𝖢\mathsf{C} be identity matrices, and 𝖳\mathsf{T} a matrix of all zeros.

Testing the MGLH proceeds by one of four test statistics, each defined in terms of two matrices, the model variance matrix, 𝖧^\mathsf{\hat{H}}, and the error variance matrix, 𝖤^\mathsf{\hat{E}}, defined as

\mathsf{\hat{H}}=_{\operatorname{df}}\left(\mathsf{A}\mathsf{\hat{B}}\mathsf{C}-\mathsf{T}\right){{\left({{\mathsf{C}}^{\top}}{{{\mathsf{\hat{\Gamma}}}_{f}}^{-1}}\mathsf{C}\right)}^{-1}}{{\left(\mathsf{A}\mathsf{\hat{B}}\mathsf{C}-\mathsf{T}\right)}^{\top}},\quad\mathsf{\hat{E}}=_{\operatorname{df}}\mathsf{A}\mathsf{\hat{\Sigma}}{{\mathsf{A}}^{\top}}, (107)

where Γ^f=1ni𝒇i𝒇i{\mathsf{\hat{\Gamma}}}_{f}=\frac{1}{n}\sum_{i}{\boldsymbol{f}}_{i}{{{\boldsymbol{f}}_{i}}^{\top}}. Note that typically in non-finance applications, the regressors are deterministic and controlled by the experimenter (giving rise to the term ‘design matrix’). In this case, it is assumed that Γ^f{\mathsf{\hat{\Gamma}}}_{f} estimates the population analogue, Γf{\mathsf{\Gamma}}_{f}, without error, though some work has been done for the case of ‘random explanatory variables.’ [60]

The four test statistics for the MGLH are:

Hotelling-Lawley trace:T^\displaystyle\mbox{Hotelling-Lawley trace:}\quad\hat{T} =dftr(𝖤^1𝖧^)=tr(𝖨a+𝖤^1𝖧^)a,\displaystyle=_{\operatorname{df}}\operatorname{tr}\left({{\mathsf{\hat{E}}}^{-1}}{\mathsf{\hat{H}}}\right)=\operatorname{tr}\left({\mathsf{I}}_{a}+{{\mathsf{\hat{E}}}^{-1}}{\mathsf{\hat{H}}}\right)-a, (108)
Pillai-Bartlett trace:P^\displaystyle\mbox{Pillai-Bartlett trace:}\quad\hat{P} =dftr((𝖨a+𝖤^1𝖧^)1),\displaystyle=_{\operatorname{df}}\operatorname{tr}\left({{\left({\mathsf{I}}_{a}+{{\mathsf{\hat{E}}}^{-1}}{\mathsf{\hat{H}}}\right)}^{-1}}\right), (109)
Wilk’s LRT:U^\displaystyle\mbox{Wilk's LRT:}\quad\hat{U} =df|(𝖨a+𝖤^1𝖧^)1|,\displaystyle=_{\operatorname{df}}\left|{{\left({\mathsf{I}}_{a}+{{\mathsf{\hat{E}}}^{-1}}{\mathsf{\hat{H}}}\right)}^{-1}}\right|, (110)
Roy’s largest root:R^\displaystyle\mbox{Roy's largest root:}\quad\hat{R} =dfmax(eig(𝖤^1𝖧^)),\displaystyle=_{\operatorname{df}}max\left(eig\left({{\mathsf{\hat{E}}}^{-1}}{\mathsf{\hat{H}}}\right)\right), (111)
=max(eig(𝖨a+𝖤^1𝖧^))1.\displaystyle=max\left(eig\left({\mathsf{I}}_{a}+{{\mathsf{\hat{E}}}^{-1}}{\mathsf{\hat{H}}}\right)\right)-1.

Of these four, Roy’s largest root has historically been the least well understood. [28] Each of these can be described as some function of the eigenvalues of the matrix 𝖨a+𝖤^1𝖧^{\mathsf{I}}_{a}+{{\mathsf{\hat{E}}}^{-1}}{\mathsf{\hat{H}}}. Under the null hypothesis, H0{{H}_{0}}, the matrix 𝖧^\mathsf{\hat{H}} ‘should’ be all zeros, in a sense that will be made precise later, and thus the Hotelling-Lawley trace and Roy’s largest root ‘should’ equal zero, the Pillai-Bartlett trace ‘should’ equal aa, and Wilk’s LRT ‘should’ equal 11.
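A minimal sketch computing the four statistics from estimates of 𝖡\mathsf{B}, Σ\mathsf{\Sigma}, and Γf{\mathsf{\Gamma}}_{f} and hypothesis matrices 𝖠\mathsf{A}, 𝖢\mathsf{C}, 𝖳\mathsf{T}, per Equations 107 through 111; all inputs are placeholders and no attempt is made to guard against a singular error variance matrix.

import numpy as np

def mglh_stats(B_hat, Sigma_hat, Gamma_hat, A, C, T):
    """Hotelling-Lawley, Pillai-Bartlett, Wilks, and Roy statistics, Equations 108-111."""
    a = A.shape[0]
    resid = A @ B_hat @ C - T
    H = resid @ np.linalg.solve(C.T @ np.linalg.solve(Gamma_hat, C), resid.T)
    E = A @ Sigma_hat @ A.T
    M = np.eye(a) + np.linalg.solve(E, H)                  # I_a + E^{-1} H
    evals = np.linalg.eigvals(M).real
    return {'Hotelling-Lawley': np.trace(M) - a,
            'Pillai-Bartlett':  np.trace(np.linalg.inv(M)),
            'Wilks':            1.0 / np.linalg.det(M),
            'Roy':              evals.max() - 1.0}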

One can describe the MGLH test statistics in terms of the asymptotic expansions of the matrix Θf{{\mathsf{\Theta}}_{f}} given in the previous sections. As in Section 4.4, let

𝒙~i+1=df[𝒇i,𝒙i+1].{\boldsymbol{\tilde{x}}}_{i+1}=_{\operatorname{df}}{{\left[{{{\boldsymbol{f}}_{i}}^{\top}},{{{\boldsymbol{x}}_{i+1}}^{\top}}\right]}^{\top}}. (112)

The second moment of 𝒙~\boldsymbol{\tilde{x}} is

Θf=dfE[𝒙~𝒙~]=[ΓfΓf𝖡𝖡ΓfΣ+𝖡Γf𝖡].{{\mathsf{\Theta}}_{f}}=_{\operatorname{df}}\operatorname{E}\left[\boldsymbol{\tilde{x}}{{\boldsymbol{\tilde{x}}}^{\top}}\right]=\left[\begin{array}[]{cc}{{\mathsf{\Gamma}}_{f}}&{{\mathsf{\Gamma}}_{f}{{\mathsf{B}}^{\top}}}\\ {\mathsf{B}{\mathsf{\Gamma}}_{f}}&{\mathsf{\Sigma}+\mathsf{B}{\mathsf{\Gamma}}_{f}{{\mathsf{B}}^{\top}}}\end{array}\right]. (113)

We can express the MGLH statistics in terms of the product of two matrices defined in terms of Θf{{\mathsf{\Theta}}_{f}}. Let 𝖬\mathsf{M} be the (f+p)×(c+a){\left(f+p\right)}\times{\left(c+a\right)} matrix

𝖬=df[𝖨f00𝖠].\mathsf{M}=_{\operatorname{df}}\left[\begin{array}[]{cc}{{\mathsf{I}}_{f}}&{0}\\ {0}&{\mathsf{A}}\end{array}\right].

Linear algebra confirms that, upon evaluating at the sample estimate Θ^f{{\mathsf{\hat{\Theta}}}_{f}} (which is why hatted quantities appear on the right hand sides of Equations 114 through 117; dropping the hats gives the corresponding population identities),

(𝖬Θf𝖬)1=[Γ^f1+(𝖠𝖡^)(𝖠Σ^𝖠)1(𝖠𝖡^)𝖡^𝖠(𝖠Σ^𝖠)1(𝖠Σ^𝖠)1𝖠𝖡^(𝖠Σ^𝖠)1].{{\left({{\mathsf{M}}^{\top}}{{\mathsf{\Theta}}_{f}}\mathsf{M}\right)}^{-1}}=\left[\begin{array}[]{cc}{{{{\mathsf{\hat{\Gamma}}}_{f}}^{-1}}+{{\left(\mathsf{A}\mathsf{\hat{B}}\right)}^{\top}}{{\left(\mathsf{A}\mathsf{\hat{\Sigma}}{{\mathsf{A}}^{\top}}\right)}^{-1}}\left(\mathsf{A}\mathsf{\hat{B}}\right)}&{-{{\mathsf{\hat{B}}}^{\top}}{{\mathsf{A}}^{\top}}{{\left(\mathsf{A}\mathsf{\hat{\Sigma}}{{\mathsf{A}}^{\top}}\right)}^{-1}}}\\ {-{{\left(\mathsf{A}\mathsf{\hat{\Sigma}}{{\mathsf{A}}^{\top}}\right)}^{-1}}{\mathsf{A}\mathsf{\hat{B}}}}&{{{\left(\mathsf{A}\mathsf{\hat{\Sigma}}{{\mathsf{A}}^{\top}}\right)}^{-1}}}\end{array}\right]. (114)

Thus

𝖦2=df[𝖢𝖳](𝖬Θf𝖬)1[𝖢𝖳]=𝖢Γ^f1𝖢+(𝖠𝖡^𝖢𝖳)(𝖠Σ^𝖠)1(𝖠𝖡^𝖢𝖳).\mathsf{G}_{2}=_{\operatorname{df}}\left[\begin{array}[]{rr}{{{\mathsf{C}}^{\top}}}&{{{\mathsf{T}}^{\top}}}\end{array}\right]{{\left({{\mathsf{M}}^{\top}}{{\mathsf{\Theta}}_{f}}\mathsf{M}\right)}^{-1}}\left[\begin{array}[]{r}{\mathsf{C}}\\ {\mathsf{T}}\end{array}\right]=\\ {{\mathsf{C}}^{\top}}{{{\mathsf{\hat{\Gamma}}}_{f}}^{-1}}\mathsf{C}+{{\left(\mathsf{A}\mathsf{\hat{B}}\mathsf{C}-\mathsf{T}\right)}^{\top}}{{\left(\mathsf{A}\mathsf{\hat{\Sigma}}{{\mathsf{A}}^{\top}}\right)}^{-1}}\left(\mathsf{A}\mathsf{\hat{B}}\mathsf{C}-\mathsf{T}\right). (115)

Now define

𝖦1=df(𝖢([𝖨f𝟢p]Θf[𝖨f𝟢p])1𝖢)1=(𝖢Γ^f1𝖢)1.\mathsf{G}_{1}=_{\operatorname{df}}{{\left({{\mathsf{C}}^{\top}}{{\left(\left[\begin{array}[]{rr}{{{{\mathsf{I}}_{f}}^{\top}}}&{{{{\mathsf{0}}_{p}}^{\top}}}\end{array}\right]{{\mathsf{\Theta}}_{f}}\left[\begin{array}[]{r}{{\mathsf{I}}_{f}}\\ {{\mathsf{0}}_{p}}\end{array}\right]\right)}^{-1}}\mathsf{C}\right)}^{-1}}={{\left({{\mathsf{C}}^{\top}}{{{\mathsf{\hat{\Gamma}}}_{f}}^{-1}}\mathsf{C}\right)}^{-1}}. (116)

Thus

𝖦1𝖦2=𝖨c+(𝖢Γ^f1𝖢)1(𝖠𝖡^𝖢𝖳)(𝖠Σ^𝖠)1(𝖠𝖡^𝖢𝖳).\mathsf{G}_{1}\mathsf{G}_{2}={\mathsf{I}}_{c}+{{\left({{\mathsf{C}}^{\top}}{{{\mathsf{\hat{\Gamma}}}_{f}}^{-1}}\mathsf{C}\right)}^{-1}}{{\left(\mathsf{A}\mathsf{\hat{B}}\mathsf{C}-\mathsf{T}\right)}^{\top}}{{\left(\mathsf{A}\mathsf{\hat{\Sigma}}{{\mathsf{A}}^{\top}}\right)}^{-1}}\left(\mathsf{A}\mathsf{\hat{B}}\mathsf{C}-\mathsf{T}\right). (117)

This matrix is ‘morally equivalent’ (to quote my advisor, Noel Walkington) to the matrix 𝖨a+𝖤^1𝖧^{\mathsf{I}}_{a}+{{\mathsf{\hat{E}}}^{-1}}{\mathsf{\hat{H}}}, in that the two share the same eigenvalues, apart from eigenvalues equal to one. This holds because eig(𝖠𝖡)eig\left(\mathsf{A}\mathsf{B}\right) and eig(𝖡𝖠)eig\left(\mathsf{B}\mathsf{A}\right) agree up to zero eigenvalues. [54, equation (280)] Taking into account that the two matrices have different sizes (one is a×a{a}\times{a}, the other c×c{c}\times{c}), the MGLH statistics can be expressed as:

T^\displaystyle\hat{T} =tr(𝖦1𝖦2)c,\displaystyle=\operatorname{tr}\left(\mathsf{G}_{1}\mathsf{G}_{2}\right)-c, P^\displaystyle\quad\hat{P} =tr((𝖦1𝖦2)1)+ac,\displaystyle=\operatorname{tr}\left({{\left(\mathsf{G}_{1}\mathsf{G}_{2}\right)}^{-1}}\right)+a-c,
U^\displaystyle\hat{U} =|(𝖦1𝖦2)1|,\displaystyle=\left|{{\left(\mathsf{G}_{1}\mathsf{G}_{2}\right)}^{-1}}\right|, R^\displaystyle\quad\hat{R} =max(eig(𝖦1𝖦2))1.\displaystyle=max\left(eig\left(\mathsf{G}_{1}\mathsf{G}_{2}\right)\right)-1.
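The eigenvalue equivalence underlying these expressions is easy to confirm numerically. The sketch below uses made-up population quantities with a=ca=c so the two matrices have the same size; when aa and cc differ, the eigenvalue sets differ only by eigenvalues equal to one.

import numpy as np

rng = np.random.default_rng(5)
p, f, a, c = 4, 3, 2, 2
A1 = rng.normal(size=(f, f)); Gamma = A1 @ A1.T + np.eye(f)
A2 = rng.normal(size=(p, p)); Sigma = A2 @ A2.T + np.eye(p)
B = rng.normal(size=(p, f))
A = rng.normal(size=(a, p)); C = rng.normal(size=(f, c)); T = np.zeros((a, c))

resid = A @ B @ C - T
CGC = C.T @ np.linalg.solve(Gamma, C)                     # C' Gamma_f^{-1} C
H = resid @ np.linalg.solve(CGC, resid.T)
E = A @ Sigma @ A.T
lhs = np.eye(a) + np.linalg.solve(E, H)                   # I_a + E^{-1} H
G1 = np.linalg.inv(CGC)                                   # Equation 116
G2 = CGC + resid.T @ np.linalg.solve(E, resid)            # Equation 115
print(np.sort(np.linalg.eigvals(lhs).real), np.sort(np.linalg.eigvals(G1 @ G2).real))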

To find the asymptotic distribution of 𝖦1𝖦2\mathsf{G}_{1}\mathsf{G}_{2}, and of the MGLH test statistics, the derivatives of the matrices above with respect to Θf{{\mathsf{\Theta}}_{f}} need to be found. In practice this would certainly be better achieved through automatic differentiation. [56] For concreteness, however, the derivatives are given here.

It must also be noted that the straightforward application of the delta method results in asymptotic normal approximations for the MGLH statistics. By their very nature, however, these statistics look much more like (non-central) Chi-square or F statistics. [49] Further study is warranted on this matter, perhaps using Hall’s approach. [21]

Lemma 6.1 (Some derivatives).

Define

𝖦1=df(𝖢([𝖨f𝟢p]Θf[𝖨f𝟢p])1𝖢)1,𝖦2=df([𝖢𝖳](𝖬Θf𝖬)1[𝖢𝖳]).\begin{split}\mathsf{G}_{1}&=_{\operatorname{df}}{{\left({{\mathsf{C}}^{\top}}{{\left(\left[\begin{array}[]{rr}{{{{\mathsf{I}}_{f}}^{\top}}}&{{{{\mathsf{0}}_{p}}^{\top}}}\end{array}\right]{{\mathsf{\Theta}}_{f}}\left[\begin{array}[]{r}{{\mathsf{I}}_{f}}\\ {{\mathsf{0}}_{p}}\end{array}\right]\right)}^{-1}}\mathsf{C}\right)}^{-1}},\\ \mathsf{G}_{2}&=_{\operatorname{df}}\left(\left[\begin{array}[]{rr}{{{\mathsf{C}}^{\top}}}&{{{\mathsf{T}}^{\top}}}\end{array}\right]{{\left({{\mathsf{M}}^{\top}}{{\mathsf{\Theta}}_{f}}\mathsf{M}\right)}^{-1}}\left[\begin{array}[]{r}{\mathsf{C}}\\ {\mathsf{T}}\end{array}\right]\right).\end{split} (118)

Then

d𝖦1dΘf=f𝒫(([𝖨f𝟢p]Θf[𝖨f𝟢p])1;𝖢)f𝒫(Θf;[𝖨f𝟢p]),d𝖦2dΘf=([𝖢𝖳][𝖢𝖳])f𝒫(Θf;𝖬),d𝖦11dΘf=(𝖢𝖢)f𝒫(Θf;[𝖨f𝟢p]),d𝖦21dΘf=f𝒫((𝖬Θf𝖬)1;[𝖢𝖳])f𝒫(Θf;𝖬),\begin{split}\frac{\mathrm{d}{\mathsf{G}_{1}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}}&={f}_{\mathcal{P}}\left({{\left(\left[\begin{array}[]{rr}{{{{\mathsf{I}}_{f}}^{\top}}}&{{{{\mathsf{0}}_{p}}^{\top}}}\end{array}\right]{{\mathsf{\Theta}}_{f}}\left[\begin{array}[]{r}{{\mathsf{I}}_{f}}\\ {{\mathsf{0}}_{p}}\end{array}\right]\right)}^{-1}};\mathsf{C}\right){f}_{\mathcal{P}}\left({{\mathsf{\Theta}}_{f}};\left[\begin{array}[]{r}{{\mathsf{I}}_{f}}\\ {{\mathsf{0}}_{p}}\end{array}\right]\right),\\ \frac{\mathrm{d}{\mathsf{G}_{2}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}}&=\left({\left[\begin{array}[]{rr}{{{\mathsf{C}}^{\top}}}&{{{\mathsf{T}}^{\top}}}\end{array}\right]}\otimes{\left[\begin{array}[]{rr}{{{\mathsf{C}}^{\top}}}&{{{\mathsf{T}}^{\top}}}\end{array}\right]}\right){f}_{\mathcal{P}}\left({{\mathsf{\Theta}}_{f}};\mathsf{M}\right),\\ \frac{\mathrm{d}{{{\mathsf{G}_{1}}^{-1}}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}}&=\left({{{\mathsf{C}}^{\top}}}\otimes{{{\mathsf{C}}^{\top}}}\right){f}_{\mathcal{P}}\left({{\mathsf{\Theta}}_{f}};\left[\begin{array}[]{r}{{\mathsf{I}}_{f}}\\ {{\mathsf{0}}_{p}}\end{array}\right]\right),\\ \frac{\mathrm{d}{{{\mathsf{G}_{2}}^{-1}}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}}&={f}_{\mathcal{P}}\left({{\left({{\mathsf{M}}^{\top}}{{\mathsf{\Theta}}_{f}}\mathsf{M}\right)}^{-1}};\left[\begin{array}[]{r}{\mathsf{C}}\\ {\mathsf{T}}\end{array}\right]\right){f}_{\mathcal{P}}\left({{\mathsf{\Theta}}_{f}};\mathsf{M}\right),\end{split}

where we define

f𝒫(𝖷;𝖩)=df((𝖩𝖷𝖩)1(𝖩𝖷𝖩)1)(𝖩𝖩).{f}_{\mathcal{P}}\left(\mathsf{X};\mathsf{J}\right)=_{\operatorname{df}}-\left({{{\left({{\mathsf{J}}^{\top}}\mathsf{X}\mathsf{J}\right)}^{-1}}}\otimes{{{\left({{\mathsf{J}}^{\top}}\mathsf{X}\mathsf{J}\right)}^{-1}}}\right)\left({{{\mathsf{J}}^{\top}}}\otimes{{{\mathsf{J}}^{\top}}}\right).
Proof.

These follow from Lemma A.1 and the chain rule. ∎

Lemma 6.2 (MGLH derivatives).

Define the population analogues of the MGLH statistics as

T\displaystyle T =dftr(𝖨a+𝖤1𝖧)a,\displaystyle=_{\operatorname{df}}\operatorname{tr}\left({\mathsf{I}}_{a}+{{\mathsf{E}}^{-1}}{\mathsf{H}}\right)-a, P\displaystyle P =dftr((𝖨a+𝖤1𝖧)1),\displaystyle=_{\operatorname{df}}\operatorname{tr}\left({{\left({\mathsf{I}}_{a}+{{\mathsf{E}}^{-1}}{\mathsf{H}}\right)}^{-1}}\right), (119)
U\displaystyle U =df|(𝖨a+𝖤1𝖧)1|,\displaystyle=_{\operatorname{df}}\left|{{\left({\mathsf{I}}_{a}+{{\mathsf{E}}^{-1}}{\mathsf{H}}\right)}^{-1}}\right|, R\displaystyle R =dfmax(eig(𝖨a+𝖤1𝖧))1,\displaystyle=_{\operatorname{df}}max\left(eig\left({\mathsf{I}}_{a}+{{\mathsf{E}}^{-1}}{\mathsf{H}}\right)\right)-1, (120)

where

\mathsf{H}=_{\operatorname{df}}\left(\mathsf{A}\mathsf{B}\mathsf{C}-\mathsf{T}\right){{\left({{\mathsf{C}}^{\top}}{{{\mathsf{\Gamma}}_{f}}^{-1}}\mathsf{C}\right)}^{-1}}{{\left(\mathsf{A}\mathsf{B}\mathsf{C}-\mathsf{T}\right)}^{\top}},\quad\mathsf{E}=_{\operatorname{df}}\mathsf{A}\mathsf{\Sigma}{{\mathsf{A}}^{\top}}.

Define Θf{{\mathsf{\Theta}}_{f}} as in Equation 113. Let

𝖰T=dfdTdΘf,{\mathsf{Q}}_{T}=_{\operatorname{df}}\frac{\mathrm{d}{T}}{\mathrm{d}{{{\mathsf{\Theta}}_{f}}}},

and similarly define 𝖰P,𝖰U,𝖰R.{\mathsf{Q}}_{P},{\mathsf{Q}}_{U},{\mathsf{Q}}_{R}. Then

𝖰T\displaystyle{\mathsf{Q}}_{T} =vec(𝖦1)d𝖦2dΘf+vec(𝖦2)d𝖦1dΘf,\displaystyle={{\operatorname{vec}\left({{\mathsf{G}_{1}}^{\top}}\right)}^{\top}}\frac{\mathrm{d}{\mathsf{G}_{2}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}}+{{\operatorname{vec}\left(\mathsf{G}_{2}\right)}^{\top}}\frac{\mathrm{d}{{{\mathsf{G}_{1}}^{\top}}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}},
𝖰P\displaystyle{\mathsf{Q}}_{P} =vec(𝖦1)d𝖦21dΘf+vec(𝖦21)d𝖦1dΘf,\displaystyle={{\operatorname{vec}\left({{\mathsf{G}_{1}}^{-\top}}\right)}^{\top}}\frac{\mathrm{d}{{{\mathsf{G}_{2}}^{-1}}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}}+{{\operatorname{vec}\left({{\mathsf{G}_{2}}^{-1}}\right)}^{\top}}\frac{\mathrm{d}{{{\mathsf{G}_{1}}^{-\top}}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}},
𝖰U\displaystyle{\mathsf{Q}}_{U} =|𝖦1𝖦2|1(vec(𝖦1)d𝖦1dΘf+vec(𝖦2)d𝖦2dΘf),\displaystyle=\left|\mathsf{G}_{1}\mathsf{G}_{2}\right|^{-1}\left({{\operatorname{vec}\left({{\mathsf{G}_{1}}^{-\top}}\right)}^{\top}}\frac{\mathrm{d}{\mathsf{G}_{1}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}}+{{\operatorname{vec}\left({{\mathsf{G}_{2}}^{-\top}}\right)}^{\top}}\frac{\mathrm{d}{\mathsf{G}_{2}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}}\right),
𝖰R\displaystyle{\mathsf{Q}}_{R} =(𝝂1𝝂1)[(𝖨𝖦1)d𝖦2dΘf+(𝖦2𝖨)d𝖦1dΘf],\displaystyle=\left({{{{\boldsymbol{\nu}}_{1}}^{\top}}}\otimes{{{{\boldsymbol{\nu}}_{1}}^{\top}}}\right)\left[\left(\mathsf{I}\otimes\mathsf{G}_{1}\right)\frac{\mathrm{d}{\mathsf{G}_{2}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}}+\left({{\mathsf{G}_{2}}^{\top}}\otimes\mathsf{I}\right)\frac{\mathrm{d}{\mathsf{G}_{1}}}{\mathrm{d}{{\mathsf{\Theta}}_{f}}}\right],

where 𝛎1{\boldsymbol{\nu}}_{1} is the leading eigenvector of 𝖦1𝖦2\mathsf{G}_{1}\mathsf{G}_{2}, normalized by 𝛎1𝛎1=1{{{\boldsymbol{\nu}}_{1}}^{\top}}{\boldsymbol{\nu}}_{1}=1, and where 𝖦2\mathsf{G}_{2} and 𝖦1\mathsf{G}_{1} are defined in Equation 115 and Equation 116, and their derivatives with respect to Θf{{\mathsf{\Theta}}_{f}} are given in Lemma 6.1.

Proof.

These follow from the definition of the MGLH statistics, and Equations 124, 127, and 128 of Lemma A.1. ∎

Theorem 6.3.

Let Θ^f=df1ni𝐱~i+1𝐱~i+1{{\mathsf{\hat{\Theta}}}_{f}}=_{\operatorname{df}}\frac{1}{n}\sum_{i}{\boldsymbol{\tilde{x}}}_{i+1}{{{\boldsymbol{\tilde{x}}}_{i+1}}^{\top}}, based on nn i.i.d. samples of [𝐟,𝐱]{{\left[{{\boldsymbol{f}}^{\top}},{{\boldsymbol{x}}^{\top}}\right]}^{\top}}, where

𝒙~i+1=df[𝒇i,𝒙i+1].{\boldsymbol{\tilde{x}}}_{i+1}=_{\operatorname{df}}{{\left[{{{\boldsymbol{f}}_{i}}^{\top}},{{{\boldsymbol{x}}_{i+1}}^{\top}}\right]}^{\top}}.

Let Ω\mathsf{\Omega} be the variance of vech(𝐱~𝐱~)\operatorname{vech}\left(\boldsymbol{\tilde{x}}{{\boldsymbol{\tilde{x}}}^{\top}}\right). Define 𝖧^\mathsf{\hat{H}} and 𝖤^\mathsf{\hat{E}} as in Equation 107, for given a×p{a}\times{p} matrix 𝖠\mathsf{A}, f×c{f}\times{c} matrix 𝖢\mathsf{C}, and a×c{a}\times{c} matrix 𝖳\mathsf{T}.

Define the MGLH test statistics, T^\hat{T}, P^\hat{P}, U^\hat{U}, and R^\hat{R} as in Equations 108 through 111, and let TT, PP, UU and RR be their population analogues.

Then, asymptotically in nn,

n(T^T)\displaystyle\sqrt{n}\left(\hat{T}-T\right) 𝒩(0,(𝖰T𝖣)Ω(𝖰T𝖣)),\displaystyle\rightsquigarrow\mathcal{N}\left(0,\left({\mathsf{Q}}_{T}\mathsf{D}\right)\mathsf{\Omega}{{\left({\mathsf{Q}}_{T}\mathsf{D}\right)}^{\top}}\right),
n(P^P)\displaystyle\sqrt{n}\left(\hat{P}-P\right) 𝒩(0,(𝖰P𝖣)Ω(𝖰P𝖣)),\displaystyle\rightsquigarrow\mathcal{N}\left(0,\left({\mathsf{Q}}_{P}\mathsf{D}\right)\mathsf{\Omega}{{\left({\mathsf{Q}}_{P}\mathsf{D}\right)}^{\top}}\right),
n(U^U)\displaystyle\sqrt{n}\left(\hat{U}-U\right) 𝒩(0,(𝖰U𝖣)Ω(𝖰U𝖣)),\displaystyle\rightsquigarrow\mathcal{N}\left(0,\left({\mathsf{Q}}_{U}\mathsf{D}\right)\mathsf{\Omega}{{\left({\mathsf{Q}}_{U}\mathsf{D}\right)}^{\top}}\right),
n(R^R)\displaystyle\sqrt{n}\left(\hat{R}-R\right) 𝒩(0,(𝖰R𝖣)Ω(𝖰R𝖣)),\displaystyle\rightsquigarrow\mathcal{N}\left(0,\left({\mathsf{Q}}_{R}\mathsf{D}\right)\mathsf{\Omega}{{\left({\mathsf{Q}}_{R}\mathsf{D}\right)}^{\top}}\right),

where 𝖰T{\mathsf{Q}}_{T}, 𝖰P{\mathsf{Q}}_{P}, 𝖰U{\mathsf{Q}}_{U}, 𝖰R{\mathsf{Q}}_{R} are given in Lemma 6.2. Furthermore, we may replace Ω\mathsf{\Omega} in this equation with an asymptotically consistent estimator, Ω^\mathsf{\hat{\Omega}}.

Proof.

This follows from the delta method and Lemma 6.2. ∎

7 Examples

7.1 Random Data

Empirically, the marginal Wald tests for zero weighting in the Markowitz portfolio based on the approximation of Theorem 2.5 are nearly identical to the tt-statistics produced by the procedure of Britten-Jones. [7] Here 10241024 days of Gaussian returns for 55 assets with mean zero and a randomly generated covariance matrix are simulated. The procedure of Britten-Jones is applied marginally to each asset. The Wald statistic is also computed via Theorem 2.5 by ‘plugging in’ the sample estimate, Θ^\mathsf{\hat{\Theta}}, to estimate the standard error. The two test values for the 55 assets are presented in Table 1, and match very well.

The value of the asymptotic approach is that it admits the generalizations of Section 4, and allows robust estimation of Ω\mathsf{\Omega}. [68]

rets1 rets2 rets3 rets4 rets5
Britten.Jones 0.4950 0.0479 1.2077 -0.4544 -1.4636
Wald 0.4965 0.0479 1.2107 -0.4573 -1.4635
Table 1: The tt statistics of Britten-Jones [7] are presented, along with the Wald statistics from plugging in Θ^\mathsf{\hat{\Theta}} for Θ\mathsf{\Theta} in Theorem 2.5, for 1024 days of Gaussian returns of 5 assets with zero mean. Statistics are presented with 4 significant digits to illustrate the difference in values of the two methods, not because these statistics are worthy of such accuracy.

7.1.1 Normal Returns

We test the confidence intervals of Theorem 3.11 and Equation 54 for random data. We draw returns from the multivariate normal distribution, for which κ=1\kappa=1. We fix the number of days of data, nn, the number of assets, pp, and the optimal signal-noise ratio, ζ{\zeta}_{*}, and perform 2.5×1042.5\times 10^{4} simulations. We then let nn vary from 100 to 1.28×1041.28\times 10^{4} days; we let pp vary from 2 to 16; we let ζ{\zeta}_{*} vary from 0.5 to 2 in ‘annualized’ units (per square root year), where we assume 252 days per year. We compute the lower confidence limits on SNR(𝝂;Θ,0)\operatorname{SNR}\left(\boldsymbol{\nu};\mathsf{\Theta},0\right), the signal-noise ratio of the sample Markowitz portfolio, based on the difference and ratio forms from the theorem. The confidence limits are computed very optimistically, by using the actual ζ{\zeta}_{*} in the expressions for the mean and variance of SNR(Θ^1;Θ,0)ζ^{\operatorname{SNR}\left({{\mathsf{\hat{\Theta}}}^{-1}};\mathsf{\Theta},0\right)-{\hat{\zeta}}_{*}} and SNR(Θ^1;Θ,0)/ζ^{\operatorname{SNR}\left({{\mathsf{\hat{\Theta}}}^{-1}};\mathsf{\Theta},0\right)/{\hat{\zeta}}_{*}}. For the ‘TAS’ form of confidence limit, we use ζ^{\hat{\zeta}}_{*} once in the non-centrality parameter, but otherwise use the actual ζ{\zeta}_{*} when computing parameters. Thus while this does not test the confidence limits in the way they would be practically used (e.g., Equation 59), the results are sufficiently discouraging even with this bit of clairvoyance to recommend against their general use.

We compute the lower 0.05 confidence limit based on the difference and ratio forms and the ‘TAS’ transform. We then compute the empirical type I rate. These are plotted against nn in Figure 1. We show facet columns for ζ{\zeta}_{*}, and facet rows for pp. The confidence intervals fail to achieve nominal coverage except perhaps for the largest values of nn, though these are much larger than would be used in practice.

Figure 1: The empirical type I rate of three different one-sided confidence intervals for SNR(Θ^1;Θ,0)\operatorname{SNR}\left({{\mathsf{\hat{\Theta}}}^{-1}};\mathsf{\Theta},0\right), the signal-noise ratio of the Markowitz portfolio are shown, where the nominal type I rate is 0.050.05. The daily returns are drawn from multivariate normal distribution with varying ζ{\zeta}_{*}, nn, and pp. The confidence intervals generally fail to achieve the nominal rate except for unrealistically large values of nn.

As a check, we also compare the empirical mean of the signal-noise ratio of the Markowitz portfolio from our experiments with the theoretical asymptotic value from Theorem 3.11, namely

(κζ2+1)(1p)2nζ.\frac{\left(\kappa{\zeta}^{2}_{*}+1\right)\left(1-p\right)}{2n{\zeta}_{*}}.

We plot the empirical and theoretical means in Figure 2, again versus nn with facet columns for ζ{\zeta}_{*}, and facet rows for pp. The theoretical asymptotic value gives a good approximation for larger sample sizes, but ‘only’ around 6 years of daily data are required. Note that the theoretical value gets worse as nζ0n{\zeta}_{*}\to 0, as one would expect: the signal-noise ratio of any portfolio must be no greater than ζ{\zeta}_{*} in absolute value, but the theoretical value goes to -\infty.
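Evaluating the displayed expression requires keeping the units straight: nn is in days while ζ{\zeta}_{*} above is quoted in annualized terms. A small arithmetic sketch of the conversion, with parameter values chosen purely for illustration:

import numpy as np

def asymptotic_value(zeta_ann, p, n_days, kappa=1.0, days_per_year=252):
    """The displayed expression, with zeta_* converted from annualized to per-day units."""
    zeta = zeta_ann / np.sqrt(days_per_year)               # per-day signal-noise ratio
    per_day = (kappa * zeta**2 + 1.0) * (1.0 - p) / (2.0 * n_days * zeta)
    return per_day * np.sqrt(days_per_year)                # back to annualized units

# e.g. six years of daily data, five assets, annualized zeta_* of one, Gaussian returns
print(asymptotic_value(1.0, 5, 6 * 252))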

Figure 2: The empirical and theoretical asymptotic mean value of the signal-noise ratio of the Markowitz portfolio are shown versus sample size, nn, for varying ζ{\zeta}_{*} and pp.

7.1.2 Elliptical Returns

We now test Equation 50 via simulation. We fix the number of days of data, nn, the number of assets, pp, and the optimal signal-noise ratio, ζ{\zeta}_{*}, and the kurtosis factor, κ\kappa, and perform 10410^{4} simulations. We let nn vary from 100 to 16001600 days; we let pp vary from 4 to 16; we let κ\kappa vary from 1 to 16; we let ζ{\zeta}_{*} vary from 0.5 to 2 in ‘annualized’ units (per square root year), where we assume 252 days per year. When κ=1\kappa=1 we draw from a multivariate normal distribution; when κ>1\kappa>1, we draw from a multivariate shifted tt distribution.

For each simulation, we collect the first and last elements of the vector

n(𝖰Σ/2𝝂^ζ𝒆1)𝖣1/2.\sqrt{n}\left({\mathsf{Q}{{\mathsf{\Sigma}}^{\top/2}}{\boldsymbol{\hat{\nu}}}_{{}*}}-{{\zeta}_{*}{\boldsymbol{e}}_{1}}\right)\mathsf{D}^{-1/2}.

By Equation 50 these should be asymptotically distributed as a standard normal. In Figure 3 we give Q-Q plots of the first element of this vector versus quantiles of the standard normal for each setting of the simulation parameters. Rather than present the full Q-Q plot (each facet would contain 10,000 points), we take evenly spaced points between 4-4 and 44, then convert them into percentage points of the standard normal. We then find the empirical quantiles at those levels and plot the empirical quantiles against the selected theoretical. This allows us to also plot the pointwise confidence bands, which should be very small for the sample size, except at the periphery. Similarly, in Figure 4 we give the same kind of subsampled Q-Q plots of the last element of this vector.
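A sketch of this subsampled Q-Q construction (using scipy for the normal distribution function; the helper name is made up):

import numpy as np
from scipy import stats

def subsampled_qq(samples, lo=-4.0, hi=4.0, num=41):
    """Empirical quantiles of samples at the normal probability levels of evenly spaced z values."""
    z = np.linspace(lo, hi, num)
    probs = stats.norm.cdf(z)                  # convert z values to percentage points
    return z, np.quantile(samples, probs)      # plot the second against the first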

With some exception, the Q-Q plots show a fair degree of support for normality when n/pn/p is reasonably large. For example, for p=4p=4, a sample of n=400n=400 is apparently sufficient to get near normality for the last element of the vector. The first element of the vector, which one suspects is dependent on ζ{\zeta}_{*}, appears to suffer when ζ{\zeta}_{*} is larger.

Figure 3: Q-Q plots of the first element of the transformed Markowitz portfolio (transformed to an approximate standard normal) versus normal quantiles are shown for varying pp, nn, ζ{\zeta}_{*}, and κ\kappa. Evenly spaced theoretical quantiles are used to compute the empirical quantiles, allowing us to plot 95% confidence bands.
Figure 4: Q-Q plots of the last element of the transformed Markowitz portfolio (transformed to an approximate standard normal) versus normal quantiles are shown for varying pp, nn, ζ{\zeta}_{*}, and κ\kappa. Evenly spaced theoretical quantiles are used to compute the empirical quantiles, allowing us to plot 95% confidence bands.

We also check Equation 51 via a smaller set of simulations. We fix ζ=2{\zeta}_{*}=2 in annualized units, p=6p=6, and κ=4\kappa=4. We then test two different sample sizes: n=2,520n=2,520 and n=25,200n=25,200, corresponding to 10 and 100 years of daily data, at a rate of 252252 days per year. For a given simulation we draw the appropriate number of days of independent returns from a shifted multivariate tt distribution. We compute the sample Markowitz portfolio and compute its signal-noise ratio, SNR(𝝂^)\operatorname{SNR}\left({\boldsymbol{\hat{\nu}}}_{{}*}\right). We perform this simulation 50,000 times for each setting of nn. We construct theoretical approximate quantiles of SNR(𝝂^)\operatorname{SNR}\left({\boldsymbol{\hat{\nu}}}_{{}*}\right) using Equation 51, then plot the empirical quantiles versus the theoretical quantiles in Figure 5. It appears that more than 10 years of daily data (but fewer than 100 years worth) are required for this approximation to be any good. When the approximation breaks down, it tends to underestimate the true signal-noise ratio.

Figure 5: Q-Q plots of SNR(𝝂^)\operatorname{SNR}\left({\boldsymbol{\hat{\nu}}}_{{}*}\right) are given versus the theoretical values from Equation 51, for p=6p=6, κ=4\kappa=4, ζ=2{\zeta}_{*}=2 and for 10 and 100 years of daily data at a rate of 252 days per year. We plot only selected theoretical quantiles and the corresponding empirical quantiles, with 95% confidence bands. Note that the two facets have different xx limits and different aspect ratios. The approximation appears to underestimate the true signal-noise ratio for small sample sizes.

7.2 Fama French Three Factor Portfolios

The monthly returns of the Fama-French three-factor portfolios from Jul 1926 to Jul 2013 were downloaded from Quandl. [19, 41] The returns in excess of the risk-free rate (which is also given in the same data) are computed. The procedure of Britten-Jones is applied to get marginal tt statistics for each of the three assets. The marginal Wald statistics are also computed, first using the vanilla estimator of Ω\mathsf{\Omega}, then using a robust (HAC) estimator computed via the sandwich package. [68] These are presented in Table 2. The Wald statistics are slightly less optimistic than the Britten-Jones tt-statistics for the long MKT and short SMB positions. This is amplified when the HAC estimator is used.
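For reference, the Britten-Jones procedure is an ordinary least squares regression of a constant on the excess returns with no intercept; the coefficients are proportional to the Markowitz portfolio, and the usual regression tt statistics test its elements. [7] A minimal R sketch follows, assuming X is the matrix of monthly excess returns (names are illustrative); the HAC-robust line is a generic alternative, not the Wald statistics of Theorem 2.5.

# Britten-Jones regression: regress a constant on excess returns, no intercept
bj <- lm(rep(1, nrow(X)) ~ 0 + X)
summary(bj)$coefficients           # t statistics on elements of the Markowitz portfolio
# a generic HAC-robust variant of the same regression statistics
library(lmtest); library(sandwich)
coeftest(bj, vcov. = vcovHAC(bj))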

MKT HML SMB
Britten.Jones 4.10 0.30 -1.97
Wald 3.86 0.31 -1.92
Wald.HAC 3.51 0.27 -1.78
Table 2: The marginal tt statistics of the Britten-Jones [7] procedure, along with the Wald statistics from plugging in Θ^\mathsf{\hat{\Theta}} for Θ\mathsf{\Theta} in Theorem 2.5, with ‘vanilla’ and HAC estimators of Ω\mathsf{\Omega}, are shown for 1045 months of excess returns of the three Fama-French portfolios.

7.2.1 Incorporating conditional heteroskedasticity

A rolling estimate of general market volatility is computed by taking the 11-month FIR mean of the median absolute return of the three portfolios, delayed by one month. The model of ‘constant maximal Sharpe ratio’ (i.e., Equation 76) is assumed, where si{{s}_{i}} is the inverse of this estimated volatility. This is equivalent to dividing the returns by the estimated volatility, then applying the unconditional estimator.
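A minimal sketch of this weighting in R, again with illustrative names: X is the matrix of monthly excess returns, the window is 11 months, and the estimate is lagged one month before use.

# rolling volatility: 11-month FIR mean of the cross-sectional median absolute
# return, delayed by one month; returns are then divided by it (sketch)
med_abs <- apply(abs(X), 1, median)
vol_fir <- as.numeric(stats::filter(med_abs, rep(1 / 11, 11), sides = 1))
vol_lag <- c(NA, head(vol_fir, -1))            # delay by one month
ok <- !is.na(vol_lag)
Xw <- X[ok, ] / vol_lag[ok]                    # weighted returns; apply the
                                               # unconditional estimator to Xw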

The marginal Wald statistics are presented in Table 3; they are more ‘confident’ in the long MKT and short SMB positions, with little evidence to support a long or short position in HML.

MKT HML SMB
HAC.wald.wt 4.51 0.62 -2.93
Table 3: The marginal Wald statistics computed from plugging in Θ^\mathsf{\hat{\Theta}} for Θ\mathsf{\Theta} in Theorem 2.5, with a HAC estimator of Ω\mathsf{\Omega}, are shown for 1045 months of excess returns of the three Fama-French portfolios. To adjust for heteroskedasticity, returns are divided by a lagged volatility estimate based on 11 previous months of returns of the three portfolios.

7.2.2 Conditional expectation

The Shiller cyclically adjusted price/earnings data (CAPE) are downloaded from Quandl. [41] The CAPE data are delayed by a month so that they qualify as features which could be used in the investment decision. That is, we are testing a model where the CAPE data are used to predict returns, not to ‘explain’ them contemporaneously. The CAPE data are centered by subtracting the mean. The marginal Wald statistics, computed using a HAC estimator for Ω\mathsf{\Omega}, for the 6 elements of the Markowitz coefficient matrix are presented in Table 4, and indicate a significant unconditional long MKT position; when CAPE is above its long-term average value of 17.54, a reduced position in MKT is warranted.

MKT HML SMB
Intercept 2.22 0.59 -1.33
CAPE -2.46 -1.13 -0.70
Table 4: The marginal Wald statistics of the Markowitz coefficient computed from plugging in Θ^\mathsf{\hat{\Theta}} for Θ\mathsf{\Theta} in Theorem 4.12, with a HAC estimator of Ω\mathsf{\Omega}, are shown for 1045 months of excess returns of the three Fama-French portfolios. The Shiller CAPE data are delayed by a month, and centered. No adjustments for conditional heteroskedasticity are performed.
MKT HML SMB
Intercept 3.49 0.27 -1.77
del.CAPE 1.19 0.03 2.52
Table 5: The marginal Wald statistics of the Markowitz coefficient computed from plugging in Θ^\mathsf{\hat{\Theta}} for Θ\mathsf{\Theta} in Theorem 4.12, with a HAC estimator of Ω\mathsf{\Omega}, are shown for 1045 months of excess returns of the three Fama-French portfolios. The Shiller CAPE data are delayed by a month, and the first difference is computed. No adjustments for conditional heteroskedasticity are performed.

The CAPE data change at a very low frequency. It is possible that the changes in the CAPE data are predictive of future returns. The Wald statistics of the Markowitz coefficient using the first difference in monthly CAPE data, delayed by a month, are presented in Table 5. These suggest a long unconditional position in MKT, with the differences in CAPE providing a ‘timing’ signal for the unconditionally short SMB position.

7.2.3 Attribution of error

Theorem 2.5 gives the asymptotic distribution of Θ^1{{\mathsf{\hat{\Theta}}}^{-1}}, which contains the (negative) Markowitz portfolio and the precision matrix. This allows one to estimate the amount of error in the Markowitz portfolio which is attributable to mis-estimation of the covariance. The remainder one can attribute to mis-estimation of the mean vector, which is typically implicated as the leading effect. [8]

The computation is performed as follows: the estimated covariance of vech(Θ^1)\operatorname{vech}\left({{\mathsf{\hat{\Theta}}}^{-1}}\right) is turned into a correlation matrix in the usual way (that is, by Hadamard division by the outer product of the square root of its diagonal; or, more practically, by the R function cov2cor). Call this correlation matrix 𝖱\mathsf{R}; some of its elements correspond to the negative Markowitz portfolio, and some to the precision matrix. For a single element of the Markowitz portfolio, let 𝒓\boldsymbol{r} be the sub-column of 𝖱\mathsf{R} consisting of the column corresponding to that element of the Markowitz portfolio and the rows corresponding to the precision matrix, and let 𝖱Σ\mathsf{R}_{\mathsf{\Sigma}} be the sub-matrix of 𝖱\mathsf{R} corresponding to the precision matrix. The squared multiple correlation coefficient is then 𝒓𝖱Σ1𝒓{{\boldsymbol{r}}^{\top}}\mathsf{R}_{\mathsf{\Sigma}}^{-1}\boldsymbol{r}. This is an ‘R-squared’ number between zero and one, estimating the proportion of variance in that element of the Markowitz portfolio ‘explained’ by error in the precision matrix.
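A minimal sketch of this computation in R. Here Omega_hat is assumed to hold the estimated covariance of vech(Θ^1)\operatorname{vech}\left({{\mathsf{\hat{\Theta}}}^{-1}}\right), port_idx the position within the vech of one element of the (negative) Markowitz portfolio, and prec_idx the positions of the precision matrix elements; all names are illustrative.

# squared multiple correlation of one Markowitz portfolio element with the
# elements of the precision matrix (illustrative sketch)
R_full  <- cov2cor(Omega_hat)                     # correlation matrix of vech(Theta^{-1})
r_vec   <- R_full[prec_idx, port_idx]             # correlations with the chosen element
R_Sigma <- R_full[prec_idx, prec_idx]             # correlations among precision elements
r2      <- c(t(r_vec) %*% solve(R_Sigma, r_vec))  # the 'R-squared' number in [0, 1]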

vanilla weighted
MKT 41 % 32 %
HML 11 % 6.8 %
SMB 29 % 13 %
Table 6: Estimated squared multiple correlation coefficients for the three elements of the Markowitz portfolio are presented. These are the percentage of error attributable to mis-estimation of the precision matrix. HAC estimators for Ω\mathsf{\Omega} are used for both. In the ‘vanilla’ column, no adjustments are made for conditional heteroskedasticity, while in the ‘weighted’ column, returns are divided by the volatility estimate.

Here, for each of the members of the vanilla Markowitz portfolio on the 3 assets, this squared coefficient of multiple correlation is expressed as a percentage in Table 6. A HAC estimator for Ω\mathsf{\Omega} is used. In the ‘weighted’ column, the returns are divided by the rolling estimate of volatility described above, assuming a model of ‘constant maximal Sharpe ratio’. We can claim, then, that approximately 41 percent of the error in the MKT position is due to mis-estimation of the precision matrix.

References

Appendix A Matrix Derivatives

Lemma A.1 (Derivatives).

Given conformable, symmetric, matrices 𝖷\mathsf{X}, 𝖸\mathsf{Y}, 𝖹\mathsf{Z}, and constant matrix 𝖩\mathsf{J}, define

f𝒫(𝖷;𝖩)=df((𝖩𝖷𝖩)1(𝖩𝖷𝖩)1)(𝖩𝖩).{f}_{\mathcal{P}}\left(\mathsf{X};\mathsf{J}\right)=_{\operatorname{df}}-\left({{{\left({{\mathsf{J}}^{\top}}\mathsf{X}\mathsf{J}\right)}^{-1}}}\otimes{{{\left({{\mathsf{J}}^{\top}}\mathsf{X}\mathsf{J}\right)}^{-1}}}\right)\left({{{\mathsf{J}}^{\top}}}\otimes{{{\mathsf{J}}^{\top}}}\right).

Then

d(𝖩𝖷𝖩)1d𝖹\displaystyle\frac{\mathrm{d}{{{\left({{\mathsf{J}}^{\top}}\mathsf{X}\mathsf{J}\right)}^{-1}}}}{\mathrm{d}\mathsf{Z}} =f𝒫(𝖷;𝖩)d𝖷d𝖹.\displaystyle={f}_{\mathcal{P}}\left(\mathsf{X};\mathsf{J}\right)\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}\mathsf{Z}}. (121)
d𝖷𝖸d𝖹\displaystyle\frac{\mathrm{d}{\mathsf{X}\mathsf{Y}}}{\mathrm{d}\mathsf{Z}} =(𝖨𝖷)d𝖸d𝖹+(𝖸𝖨)d𝖷d𝖹.\displaystyle=\left(\mathsf{I}\otimes\mathsf{X}\right)\frac{\mathrm{d}{\mathsf{Y}}}{\mathrm{d}\mathsf{Z}}+\left({{\mathsf{Y}}^{\top}}\otimes\mathsf{I}\right)\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}\mathsf{Z}}. (122)
d𝖷𝖷d𝖹\displaystyle\frac{\mathrm{d}{\mathsf{X}{{\mathsf{X}}^{\top}}}}{\mathrm{d}\mathsf{Z}} =(𝖨+𝖪)(𝖷𝖨)d𝖷d𝖹.\displaystyle=\left(\mathsf{I}+\mathsf{K}\right)\left(\mathsf{X}\otimes\mathsf{I}\right)\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}\mathsf{Z}}. (123)
dtr(𝖷𝖸)d𝖹\displaystyle\frac{\mathrm{d}{\operatorname{tr}\left(\mathsf{X}\mathsf{Y}\right)}}{\mathrm{d}\mathsf{Z}} =vec(𝖷)d𝖸d𝖹+vec(𝖸)d𝖷d𝖹.\displaystyle={{\operatorname{vec}\left({{\mathsf{X}}^{\top}}\right)}^{\top}}\frac{\mathrm{d}{\mathsf{Y}}}{\mathrm{d}\mathsf{Z}}+{{\operatorname{vec}\left(\mathsf{Y}\right)}^{\top}}\frac{\mathrm{d}{{{\mathsf{X}}^{\top}}}}{\mathrm{d}\mathsf{Z}}. (124)
d|𝖷|d𝖹\displaystyle\frac{\mathrm{d}{\left|\mathsf{X}\right|}}{\mathrm{d}\mathsf{Z}} =|𝖷|vec(𝖷)d𝖷d𝖹.\displaystyle=\left|\mathsf{X}\right|{{\operatorname{vec}\left({{\mathsf{X}}^{-\top}}\right)}^{\top}}\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}\mathsf{Z}}. (125)
d|𝖷𝖸|d𝖹\displaystyle\frac{\mathrm{d}{\left|\mathsf{X}\mathsf{Y}\right|}}{\mathrm{d}\mathsf{Z}} =|𝖷𝖸|(vec(𝖷)d𝖷d𝖹+vec(𝖸)d𝖸d𝖹).\displaystyle=\left|\mathsf{X}\mathsf{Y}\right|\left({{\operatorname{vec}\left({{\mathsf{X}}^{-\top}}\right)}^{\top}}\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}\mathsf{Z}}+{{\operatorname{vec}\left({{\mathsf{Y}}^{-\top}}\right)}^{\top}}\frac{\mathrm{d}{\mathsf{Y}}}{\mathrm{d}\mathsf{Z}}\right). (126)
d|(𝖷𝖸)1|d𝖹\displaystyle\frac{\mathrm{d}{\left|{{\left(\mathsf{X}\mathsf{Y}\right)}^{-1}}\right|}}{\mathrm{d}\mathsf{Z}} =|𝖷𝖸|1(vec(𝖷)d𝖷d𝖹+vec(𝖸)d𝖸d𝖹).\displaystyle=-\left|\mathsf{X}\mathsf{Y}\right|^{-1}\left({{\operatorname{vec}\left({{\mathsf{X}}^{-\top}}\right)}^{\top}}\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}\mathsf{Z}}+{{\operatorname{vec}\left({{\mathsf{Y}}^{-\top}}\right)}^{\top}}\frac{\mathrm{d}{\mathsf{Y}}}{\mathrm{d}\mathsf{Z}}\right). (127)

Here 𝖪\mathsf{K} is the ‘commutation matrix’.

Let λj{\lambda}_{j} be the jthj^{\text{th}} eigenvalue of 𝖷\mathsf{X}, with corresponding eigenvector 𝛎j{\boldsymbol{\nu}}_{j}, normalized so that 𝛎j𝛎j=1{{{\boldsymbol{\nu}}_{j}}^{\top}}{\boldsymbol{\nu}}_{j}=1. Then

dλjd𝖹\displaystyle\frac{\mathrm{d}{{\lambda}_{j}}}{\mathrm{d}\mathsf{Z}} =(𝝂j𝝂j)d𝖷d𝖹.\displaystyle=\left({{{{\boldsymbol{\nu}}_{j}}^{\top}}}\otimes{{{{\boldsymbol{\nu}}_{j}}^{\top}}}\right)\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}\mathsf{Z}}. (128)
Proof.

For Equation 121, write

d(𝖩𝖷𝖩)1d𝖹=d(𝖩𝖷𝖩)1d(𝖩𝖷𝖩)d(𝖩𝖷𝖩)d𝖹.\frac{\mathrm{d}{{{\left({{\mathsf{J}}^{\top}}\mathsf{X}\mathsf{J}\right)}^{-1}}}}{\mathrm{d}\mathsf{Z}}=\frac{\mathrm{d}{{{\left({{\mathsf{J}}^{\top}}\mathsf{X}\mathsf{J}\right)}^{-1}}}}{\mathrm{d}\left({{\mathsf{J}}^{\top}}\mathsf{X}\mathsf{J}\right)}\frac{\mathrm{d}{\left({{\mathsf{J}}^{\top}}\mathsf{X}\mathsf{J}\right)}}{\mathrm{d}\mathsf{Z}}.

Lemma 2.4 gives the derivative on the left; to get the derivative on the right, note that vec(𝖩𝖷𝖩)=(𝖩𝖩)vec(𝖷)\operatorname{vec}\left({{\mathsf{J}}^{\top}}\mathsf{X}\mathsf{J}\right)=\left({{{\mathsf{J}}^{\top}}}\otimes{{{\mathsf{J}}^{\top}}}\right)\operatorname{vec}\left(\mathsf{X}\right), then use linearity of the derivative.

For Equation 122, write vec(𝖷𝖸)=(𝖸𝖷)vec(𝖨)\operatorname{vec}\left(\mathsf{X}\mathsf{Y}\right)=\left({{\mathsf{Y}}^{\top}}\otimes\mathsf{X}\right)\operatorname{vec}\left(\mathsf{I}\right). Then consider the derivative of vec(𝖷𝖸)\operatorname{vec}\left(\mathsf{X}\mathsf{Y}\right) with respect to any scalar zz :

dvec(𝖷𝖸)dz=d(𝖸𝖷)vec(𝖨)dz=vec1(d(𝖸𝖷)dz)vec(𝖨),\frac{\mathrm{d}{\operatorname{vec}\left(\mathsf{X}\mathsf{Y}\right)}}{\mathrm{d}z}=\frac{\mathrm{d}{\left({{\mathsf{Y}}^{\top}}\otimes\mathsf{X}\right)\operatorname{vec}\left(\mathsf{I}\right)}}{\mathrm{d}z}={{\operatorname{vec}}}^{-1}\left(\frac{\mathrm{d}{\left({{\mathsf{Y}}^{\top}}\otimes\mathsf{X}\right)}}{\mathrm{d}z}\right)\operatorname{vec}\left(\mathsf{I}\right),

where vec1(){{\operatorname{vec}}}^{-1}\left(\cdot\right) is the inverse of vec()\operatorname{vec}\left(\cdot\right). That is, vec1(vec()){{\operatorname{vec}}}^{-1}\left(\operatorname{vec}\left(\cdot\right)\right) is the identity over square matrices. (This wrinkle is needed because we have defined derivatives of matrices to be the derivative of their vectorization.)

Using the product rule for Kronecker products [54], then using the vector identity again we have

dvec(𝖷𝖸)dz=(vec1(d𝖸dz)𝖷+𝖸vec1(d𝖷dz))vec(𝖨),=(𝖨𝖷)d𝖸dz+(𝖸𝖨)d𝖷dz.\begin{split}\frac{\mathrm{d}{\operatorname{vec}\left(\mathsf{X}\mathsf{Y}\right)}}{\mathrm{d}z}&=\left({{\operatorname{vec}}}^{-1}\left(\frac{\mathrm{d}{{{\mathsf{Y}}^{\top}}}}{\mathrm{d}z}\right)\otimes\mathsf{X}+{{\mathsf{Y}}^{\top}}\otimes{{\operatorname{vec}}}^{-1}\left(\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}z}\right)\right)\operatorname{vec}\left(\mathsf{I}\right),\\ &=\left(\mathsf{I}\otimes\mathsf{X}\right)\frac{\mathrm{d}{{\mathsf{Y}}}}{\mathrm{d}z}+\left({{\mathsf{Y}}^{\top}}\otimes\mathsf{I}\right)\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}z}.\end{split}

Then apply this result to every element of vec(𝖹)\operatorname{vec}\left(\mathsf{Z}\right) to get the result.

For Equation 123, by Equation 122,

dvec(𝖷𝖷)dvec(𝖹)=(𝖨𝖷)dvec(𝖷)dvec(𝖹)+(𝖷𝖨)dvec(𝖷)dvec(𝖹),=(𝖨𝖷)𝖪dvec(𝖷)dvec(𝖹)+(𝖷𝖨)dvec(𝖷)dvec(𝖹).\begin{split}\frac{\mathrm{d}{\operatorname{vec}\left(\mathsf{X}{{\mathsf{X}}^{\top}}\right)}}{\mathrm{d}\operatorname{vec}\left(\mathsf{Z}\right)}&=\left(\mathsf{I}\otimes\mathsf{X}\right)\frac{\mathrm{d}{\operatorname{vec}\left({{\mathsf{X}}^{\top}}\right)}}{\mathrm{d}\operatorname{vec}\left(\mathsf{Z}\right)}+\left({\mathsf{X}}\otimes\mathsf{I}\right)\frac{\mathrm{d}{\operatorname{vec}\left(\mathsf{X}\right)}}{\mathrm{d}\operatorname{vec}\left(\mathsf{Z}\right)},\\ &=\left(\mathsf{I}\otimes\mathsf{X}\right)\mathsf{K}\frac{\mathrm{d}{\operatorname{vec}\left(\mathsf{X}\right)}}{\mathrm{d}\operatorname{vec}\left(\mathsf{Z}\right)}+\left({\mathsf{X}}\otimes\mathsf{I}\right)\frac{\mathrm{d}{\operatorname{vec}\left(\mathsf{X}\right)}}{\mathrm{d}\operatorname{vec}\left(\mathsf{Z}\right)}.\\ \end{split}

Now let 𝖠\mathsf{A} be any conformable square matrix. We have:

(𝖨𝖷)𝖪vec(𝖠)=(𝖨𝖷)vec(𝖠)=vec(𝖷𝖠)=𝖪vec(𝖠𝖷)=𝖪(𝖷𝖨)vec(𝖠).\left(\mathsf{I}\otimes\mathsf{X}\right)\mathsf{K}\operatorname{vec}\left(\mathsf{A}\right)=\left(\mathsf{I}\otimes\mathsf{X}\right)\operatorname{vec}\left({{\mathsf{A}}^{\top}}\right)=\operatorname{vec}\left(\mathsf{X}{{\mathsf{A}}^{\top}}\right)=\\ \mathsf{K}\operatorname{vec}\left(\mathsf{A}{{\mathsf{X}}^{\top}}\right)=\mathsf{K}\left(\mathsf{X}\otimes\mathsf{I}\right)\operatorname{vec}\left(\mathsf{A}\right).

Because 𝖠\mathsf{A} was arbitrary, we have (𝖨𝖷)𝖪=𝖪(𝖷𝖨),\left(\mathsf{I}\otimes\mathsf{X}\right)\mathsf{K}=\mathsf{K}\left(\mathsf{X}\otimes\mathsf{I}\right), and the result follows.

For Equation 124, write

tr(𝖷𝖸)=vec(𝖷)vec(𝖸),\operatorname{tr}\left(\mathsf{X}\mathsf{Y}\right)={{\operatorname{vec}\left({{\mathsf{X}}^{\top}}\right)}^{\top}}\operatorname{vec}\left(\mathsf{Y}\right),

then use the product rule.

For Equation 125, first consider the derivative of |𝖷|\left|\mathsf{X}\right| with respect to a scalar zz. This is known to take the form: [54]

d|𝖷|dz=|X|tr(𝖷1vec1(d𝖷dz)),\frac{\mathrm{d}{\left|\mathsf{X}\right|}}{\mathrm{d}z}=\left|X\right|\operatorname{tr}\left({{\mathsf{X}}^{-1}}{{\operatorname{vec}}}^{-1}\left(\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}z}\right)\right),

where the vec1(){{\operatorname{vec}}}^{-1}\left(\cdot\right) is here because of how we have defined derivatives of matrices. Rewrite the trace as the dot product of two vectors:

d|𝖷|dz=|X|vec(𝖷)d𝖷dz.\frac{\mathrm{d}{\left|\mathsf{X}\right|}}{\mathrm{d}z}=\left|X\right|{{\operatorname{vec}\left({{\mathsf{X}}^{-\top}}\right)}^{\top}}\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}z}.

Using this to compute the derivative with respect to each element of vec(𝖹)\operatorname{vec}\left(\mathsf{Z}\right) gives the result. Equation 126 follows from the scalar product rule since |𝖷𝖸|=|𝖷||𝖸|\left|\mathsf{X}\mathsf{Y}\right|=\left|\mathsf{X}\right|\left|\mathsf{Y}\right|. Equation 127 then follows, using the scalar chain rule.

For Equation 128, the derivative of the jthj^{\text{th}} eigenvalue of matrix 𝖷\mathsf{X} with respect to a scalar zz is known to be: [54, equation (67)]

dλjdz=𝝂jvec1(d𝖷dz)𝝂j.\frac{\mathrm{d}{{\lambda}_{j}}}{\mathrm{d}z}={{{\boldsymbol{\nu}}_{j}}^{\top}}{{\operatorname{vec}}}^{-1}\left(\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}z}\right){\boldsymbol{\nu}}_{j}.

Take the vectorization of this scalar, and rewrite it in Kronecker form:

dλjdz=(𝝂j𝝂j)d𝖷dz.\frac{\mathrm{d}{{\lambda}_{j}}}{\mathrm{d}z}=\left({{{{\boldsymbol{\nu}}_{j}}^{\top}}}\otimes{{{{\boldsymbol{\nu}}_{j}}^{\top}}}\right)\frac{\mathrm{d}{\mathsf{X}}}{\mathrm{d}z}.

Use this to compute the derivative of λj{\lambda}_{j} with respect to each element of vec(𝖹)\operatorname{vec}\left(\mathsf{Z}\right).
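These identities are straightforward to verify numerically. The following R snippet is a sanity check of Equation 123 (taking 𝖹=𝖷 so that the derivative of 𝖷 is the identity): it builds the commutation matrix and compares the analytic Jacobian with a finite-difference approximation. It is illustrative only, and the test matrix and seed are arbitrary.

# sanity check of Equation 123: d vec(X X') / d vec(X) = (I + K)(X %x% I)
set.seed(101)
n <- 3
X <- matrix(rnorm(n * n), n, n)
# commutation matrix K, defined by K %*% c(A) == c(t(A))
K <- matrix(0, n^2, n^2)
for (i in 1:n) for (j in 1:n) K[(i - 1) * n + j, (j - 1) * n + i] <- 1
analytic <- (diag(n^2) + K) %*% (X %x% diag(n))
f <- function(v) { M <- matrix(v, n, n); c(M %*% t(M)) }  # vec(X X')
eps <- 1e-6
numeric <- sapply(seq_len(n^2), function(k) {
  v <- c(X); v[k] <- v[k] + eps
  (f(v) - f(c(X))) / eps
})
max(abs(analytic - numeric))  # small, of the order of eps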

Lemma A.2 (Cholesky Derivatives).

Let 𝖷\mathsf{X} be a symmetric positive definite matrix. Let 𝖸\mathsf{Y} be its lower triangular Cholesky factor. That is, 𝖸\mathsf{Y} is the lower triangular matrix such that 𝖸𝖸=𝖷\mathsf{Y}{{\mathsf{Y}}^{\top}}=\mathsf{X}. Then

dvech(𝖸)dvech(𝖷)\displaystyle\frac{\mathrm{d}{\operatorname{vech}\left(\mathsf{Y}\right)}}{\mathrm{d}\operatorname{vech}\left(\mathsf{X}\right)} =(𝖫(𝖨+𝖪)(𝖸𝖨)𝖫)1,\displaystyle={{\left(\mathsf{L}\left(\mathsf{I}+\mathsf{K}\right)\left(\mathsf{Y}\otimes\mathsf{I}\right){{\mathsf{L}}^{\top}}\right)}^{-1}}, (129)

where 𝖪\mathsf{K} is the ‘commutation matrix’. [37]

Proof.

By Equation 123 of Lemma A.1,

dvec(𝖸𝖸)dvec(𝖸)=(𝖨+𝖪)(𝖸𝖨).\frac{\mathrm{d}{\operatorname{vec}\left(\mathsf{Y}{{\mathsf{Y}}^{\top}}\right)}}{\mathrm{d}\operatorname{vec}\left(\mathsf{Y}\right)}=\left(\mathsf{I}+\mathsf{K}\right)\left(\mathsf{Y}\otimes\mathsf{I}\right).

By the chain rule, for lower triangular matrix 𝖸\mathsf{Y}, we have

dvech(𝖸𝖸)dvech(𝖸)=dvech(𝖸𝖸)dvec(𝖸𝖸)dvec(𝖸𝖸)dvec(𝖸)dvec(𝖸)dvech(𝖸),=𝖫(𝖨+𝖪)(𝖸𝖨)𝖫.\begin{split}\frac{\mathrm{d}{\operatorname{vech}\left(\mathsf{Y}{{\mathsf{Y}}^{\top}}\right)}}{\mathrm{d}\operatorname{vech}\left(\mathsf{Y}\right)}&=\frac{\mathrm{d}{\operatorname{vech}\left(\mathsf{Y}{{\mathsf{Y}}^{\top}}\right)}}{\mathrm{d}\operatorname{vec}\left(\mathsf{Y}{{\mathsf{Y}}^{\top}}\right)}\frac{\mathrm{d}{\operatorname{vec}\left(\mathsf{Y}{{\mathsf{Y}}^{\top}}\right)}}{\mathrm{d}\operatorname{vec}\left(\mathsf{Y}\right)}\frac{\mathrm{d}{\operatorname{vec}\left(\mathsf{Y}\right)}}{\mathrm{d}\operatorname{vech}\left(\mathsf{Y}\right)},\\ &=\mathsf{L}\left(\mathsf{I}+\mathsf{K}\right)\left(\mathsf{Y}\otimes\mathsf{I}\right){{\mathsf{L}}^{\top}}.\end{split}

The result now follows since

dvech(𝖸)dvech(𝖷)=dvech(𝖸)dvech(𝖸𝖸)=(dvech(𝖸𝖸)dvech(𝖸))1.\frac{\mathrm{d}{\operatorname{vech}\left(\mathsf{Y}\right)}}{\mathrm{d}\operatorname{vech}\left(\mathsf{X}\right)}=\frac{\mathrm{d}{\operatorname{vech}\left(\mathsf{Y}\right)}}{\mathrm{d}\operatorname{vech}\left(\mathsf{Y}{{\mathsf{Y}}^{\top}}\right)}={{\left(\frac{\mathrm{d}{\operatorname{vech}\left(\mathsf{Y}{{\mathsf{Y}}^{\top}}\right)}}{\mathrm{d}\operatorname{vech}\left(\mathsf{Y}\right)}\right)}^{-1}}.
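Equation 129 can be checked the same way. The R sketch below builds the elimination matrix (so that it maps vec to vech) and the commutation matrix, and compares the formula with a finite-difference Jacobian of the lower Cholesky factor, perturbing each element of vech symmetrically. Again this is only a numerical sanity check on arbitrary test values, not part of the lemma.

# sanity check of Equation 129: d vech(Y) / d vech(X) = solve(L (I + K) (Y %x% I) L')
set.seed(101)
n <- 3
A <- matrix(rnorm(n * n), n, n)
X <- crossprod(A) + diag(n)        # symmetric positive definite test matrix
Y <- t(chol(X))                    # lower triangular Cholesky factor, Y Y' = X
idx <- which(lower.tri(diag(n), diag = TRUE))       # vech positions within vec
L <- diag(n^2)[idx, , drop = FALSE]                 # elimination matrix
K <- matrix(0, n^2, n^2)
for (i in 1:n) for (j in 1:n) K[(i - 1) * n + j, (j - 1) * n + i] <- 1
analytic <- solve(L %*% (diag(n^2) + K) %*% (Y %x% diag(n)) %*% t(L))
vech <- function(M) M[lower.tri(M, diag = TRUE)]
eps <- 1e-7
numeric <- sapply(seq_along(idx), function(k) {
  E <- matrix(0, n, n); E[idx[k]] <- eps
  E <- E + t(E) - diag(diag(E))    # perturb one element of vech(X), symmetrically
  (vech(t(chol(X + E))) - vech(Y)) / eps
})
max(abs(analytic - numeric))       # small, of the order of eps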

Appendix B Proofs

Here we give proofs of Theorem 3.6, Corollary 3.9, and Theorem 3.11.

Proof of Theorem 3.6.

We note that in the Gaussian case (κ=1\kappa=1) this result is proved by Magnus and Neudecker, up to a rearrangement in the terms [35, Theorem 4.4 (ii)].

First we suppose that the first element of 𝒙~\boldsymbol{\tilde{x}}, rather than being a deterministic 1, is a random variable with mean 11, no covariance with the elements of xx, and a variance of ϵ\epsilon. We assume the random first element is such that 𝒙~\boldsymbol{\tilde{x}} is elliptically distributed with mean 𝝁~\boldsymbol{\tilde{\mu}} and covariance Σ~\mathsf{\tilde{\Sigma}}. After finding the variance of 𝒙~𝒙~\boldsymbol{\tilde{x}}{{\boldsymbol{\tilde{x}}}^{\top}}, we will take ϵ0.\epsilon\to 0. Below we will use Θ\mathsf{\Theta} to mean Σ~+𝝁~𝝁~\mathsf{\tilde{\Sigma}}+\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}, which converges to our usual definition as ϵ0\epsilon\to 0.

An extension of Isserlis’ theorem gives the moments of centered elements of 𝒙~\boldsymbol{\tilde{x}}. [64, 29] In particular, the first four centered moments are

E[𝒙~i𝝁~i]\displaystyle\operatorname{E}\left[{\boldsymbol{\tilde{x}}}_{i}-{\boldsymbol{\tilde{\mu}}}_{i}\right] =0,\displaystyle=0,
E[(𝒙~i𝝁~i)(𝒙~j𝝁~j)]\displaystyle\operatorname{E}\left[\left({\boldsymbol{\tilde{x}}}_{i}-{\boldsymbol{\tilde{\mu}}}_{i}\right)\left({\boldsymbol{\tilde{x}}}_{j}-{\boldsymbol{\tilde{\mu}}}_{j}\right)\right] =Σ~i,j,\displaystyle={\mathsf{\tilde{\Sigma}}}_{i,j},
E[(𝒙~i𝝁~i)(𝒙~j𝝁~j)(𝒙~k𝝁~k)]\displaystyle\operatorname{E}\left[\left({\boldsymbol{\tilde{x}}}_{i}-{\boldsymbol{\tilde{\mu}}}_{i}\right)\left({\boldsymbol{\tilde{x}}}_{j}-{\boldsymbol{\tilde{\mu}}}_{j}\right)\left({\boldsymbol{\tilde{x}}}_{k}-{\boldsymbol{\tilde{\mu}}}_{k}\right)\right] =0,\displaystyle=0,
E[(𝒙~i𝝁~i)(𝒙~j𝝁~j)(𝒙~k𝝁~k)(𝒙~l𝝁~l)]\displaystyle\operatorname{E}\left[\left({\boldsymbol{\tilde{x}}}_{i}-{\boldsymbol{\tilde{\mu}}}_{i}\right)\left({\boldsymbol{\tilde{x}}}_{j}-{\boldsymbol{\tilde{\mu}}}_{j}\right)\left({\boldsymbol{\tilde{x}}}_{k}-{\boldsymbol{\tilde{\mu}}}_{k}\right)\left({\boldsymbol{\tilde{x}}}_{l}-{\boldsymbol{\tilde{\mu}}}_{l}\right)\right] =κ[Σ~i,jΣ~k,l+Σ~i,kΣ~j,l+Σ~i,lΣ~j,k].\displaystyle=\kappa\left[{\mathsf{\tilde{\Sigma}}}_{i,j}{\mathsf{\tilde{\Sigma}}}_{k,l}+{\mathsf{\tilde{\Sigma}}}_{i,k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\mathsf{\tilde{\Sigma}}}_{i,l}{\mathsf{\tilde{\Sigma}}}_{j,k}\right].

It is a tedious exercise to compute the raw uncentered moments:

E[𝒙~i]\displaystyle\operatorname{E}\left[{\boldsymbol{\tilde{x}}}_{i}\right] =𝝁~i,\displaystyle={\boldsymbol{\tilde{\mu}}}_{i},
E[𝒙~i𝒙~j]\displaystyle\operatorname{E}\left[{\boldsymbol{\tilde{x}}}_{i}{\boldsymbol{\tilde{x}}}_{j}\right] =Θi,j,\displaystyle={{\mathsf{\Theta}}_{i,j}},
E[𝒙~i𝒙~j𝒙~k]\displaystyle\operatorname{E}\left[{\boldsymbol{\tilde{x}}}_{i}{\boldsymbol{\tilde{x}}}_{j}{\boldsymbol{\tilde{x}}}_{k}\right] =𝝁~i𝝁~j𝝁~k+𝝁~iΣ~j,k+𝝁~jΣ~i,k+𝝁~kΣ~i,j,\displaystyle={\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}+{\boldsymbol{\tilde{\mu}}}_{i}{\mathsf{\tilde{\Sigma}}}_{j,k}+{\boldsymbol{\tilde{\mu}}}_{j}{\mathsf{\tilde{\Sigma}}}_{i,k}+{\boldsymbol{\tilde{\mu}}}_{k}{\mathsf{\tilde{\Sigma}}}_{i,j},
E[𝒙~i𝒙~j𝒙~k𝒙~l]\displaystyle\operatorname{E}\left[{\boldsymbol{\tilde{x}}}_{i}{\boldsymbol{\tilde{x}}}_{j}{\boldsymbol{\tilde{x}}}_{k}{\boldsymbol{\tilde{x}}}_{l}\right] =κ[Σ~i,jΣ~k,l+Σ~i,kΣ~j,l+Σ~i,lΣ~j,k]\displaystyle=\kappa\left[{\mathsf{\tilde{\Sigma}}}_{i,j}{\mathsf{\tilde{\Sigma}}}_{k,l}+{\mathsf{\tilde{\Sigma}}}_{i,k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\mathsf{\tilde{\Sigma}}}_{i,l}{\mathsf{\tilde{\Sigma}}}_{j,k}\right]
+𝝁~i𝝁~jΣ~k,l+𝝁~i𝝁~kΣ~j,l+𝝁~i𝝁~lΣ~j,k\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{j}{\mathsf{\tilde{\Sigma}}}_{k,l}+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{l}{\mathsf{\tilde{\Sigma}}}_{j,k}
+𝝁~j𝝁~kΣ~i,l+𝝁~j𝝁~lΣ~i,k+𝝁~k𝝁~lΣ~i,j\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{\mathsf{\tilde{\Sigma}}}_{i,l}+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{l}{\mathsf{\tilde{\Sigma}}}_{i,k}+{\boldsymbol{\tilde{\mu}}}_{k}{\boldsymbol{\tilde{\mu}}}_{l}{\mathsf{\tilde{\Sigma}}}_{i,j}
+𝝁~i𝝁~j𝝁~k𝝁~l.\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{\boldsymbol{\tilde{\mu}}}_{l}.

We now want to compute the covariance of 𝒙~i𝒙~j{\boldsymbol{\tilde{x}}}_{i}{\boldsymbol{\tilde{x}}}_{j} with 𝒙~k𝒙~l{\boldsymbol{\tilde{x}}}_{k}{\boldsymbol{\tilde{x}}}_{l}. We have

Cov(𝒙~i𝒙~j,𝒙~k𝒙~l)\displaystyle\operatorname{Cov}\left({\boldsymbol{\tilde{x}}}_{i}{\boldsymbol{\tilde{x}}}_{j},{\boldsymbol{\tilde{x}}}_{k}{\boldsymbol{\tilde{x}}}_{l}\right) =E[𝒙~i𝒙~j𝒙~k𝒙~l]E[𝒙~i𝒙~j]E[𝒙~k𝒙~l],\displaystyle=\operatorname{E}\left[{\boldsymbol{\tilde{x}}}_{i}{\boldsymbol{\tilde{x}}}_{j}{\boldsymbol{\tilde{x}}}_{k}{\boldsymbol{\tilde{x}}}_{l}\right]-\operatorname{E}\left[{\boldsymbol{\tilde{x}}}_{i}{\boldsymbol{\tilde{x}}}_{j}\right]\operatorname{E}\left[{\boldsymbol{\tilde{x}}}_{k}{\boldsymbol{\tilde{x}}}_{l}\right],
=κ[Σ~i,jΣ~k,l+Σ~i,kΣ~j,l+Σ~i,lΣ~j,k]\displaystyle=\kappa\left[{\mathsf{\tilde{\Sigma}}}_{i,j}{\mathsf{\tilde{\Sigma}}}_{k,l}+{\mathsf{\tilde{\Sigma}}}_{i,k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\mathsf{\tilde{\Sigma}}}_{i,l}{\mathsf{\tilde{\Sigma}}}_{j,k}\right]
+𝝁~i𝝁~jΣ~k,l+𝝁~i𝝁~kΣ~j,l+𝝁~i𝝁~lΣ~j,k\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{j}{\mathsf{\tilde{\Sigma}}}_{k,l}+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{l}{\mathsf{\tilde{\Sigma}}}_{j,k}
+𝝁~j𝝁~kΣ~i,l+𝝁~j𝝁~lΣ~i,k+𝝁~k𝝁~lΣ~i,j\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{\mathsf{\tilde{\Sigma}}}_{i,l}+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{l}{\mathsf{\tilde{\Sigma}}}_{i,k}+{\boldsymbol{\tilde{\mu}}}_{k}{\boldsymbol{\tilde{\mu}}}_{l}{\mathsf{\tilde{\Sigma}}}_{i,j}
+𝝁~i𝝁~j𝝁~k𝝁~l.\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{\boldsymbol{\tilde{\mu}}}_{l}.
(𝝁~i𝝁~j+Σ~i,j)(𝝁~k𝝁~l+Σ~k,l),\displaystyle\phantom{=}\,-\left({\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{j}+{\mathsf{\tilde{\Sigma}}}_{i,j}\right)\left({\boldsymbol{\tilde{\mu}}}_{k}{\boldsymbol{\tilde{\mu}}}_{l}+{\mathsf{\tilde{\Sigma}}}_{k,l}\right),
=κ[Σ~i,jΣ~k,l+Σ~i,kΣ~j,l+Σ~i,lΣ~j,k]\displaystyle=\kappa\left[{\mathsf{\tilde{\Sigma}}}_{i,j}{\mathsf{\tilde{\Sigma}}}_{k,l}+{\mathsf{\tilde{\Sigma}}}_{i,k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\mathsf{\tilde{\Sigma}}}_{i,l}{\mathsf{\tilde{\Sigma}}}_{j,k}\right]
+𝝁~i𝝁~kΣ~j,l+𝝁~i𝝁~lΣ~j,k\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{l}{\mathsf{\tilde{\Sigma}}}_{j,k}
+𝝁~j𝝁~kΣ~i,l+𝝁~j𝝁~lΣ~i,k\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{\mathsf{\tilde{\Sigma}}}_{i,l}+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{l}{\mathsf{\tilde{\Sigma}}}_{i,k}
Σ~i,jΣ~k,l,\displaystyle\phantom{=}\,-{\mathsf{\tilde{\Sigma}}}_{i,j}{\mathsf{\tilde{\Sigma}}}_{k,l},
=(κ1)[Σ~i,jΣ~k,l+Σ~i,kΣ~j,l+Σ~i,lΣ~j,k]\displaystyle=\left(\kappa-1\right)\left[{\mathsf{\tilde{\Sigma}}}_{i,j}{\mathsf{\tilde{\Sigma}}}_{k,l}+{\mathsf{\tilde{\Sigma}}}_{i,k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\mathsf{\tilde{\Sigma}}}_{i,l}{\mathsf{\tilde{\Sigma}}}_{j,k}\right]
+Σ~i,kΣ~j,l+Σ~i,lΣ~j,k\displaystyle\phantom{=}\,+{\mathsf{\tilde{\Sigma}}}_{i,k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\mathsf{\tilde{\Sigma}}}_{i,l}{\mathsf{\tilde{\Sigma}}}_{j,k}
+𝝁~i𝝁~kΣ~j,l+𝝁~i𝝁~lΣ~j,k\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{l}{\mathsf{\tilde{\Sigma}}}_{j,k}
+𝝁~j𝝁~kΣ~i,l+𝝁~j𝝁~lΣ~i,k,\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{\mathsf{\tilde{\Sigma}}}_{i,l}+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{l}{\mathsf{\tilde{\Sigma}}}_{i,k},
=(κ1)[Σ~i,jΣ~k,l+Σ~i,kΣ~j,l+Σ~i,lΣ~j,k]\displaystyle=\left(\kappa-1\right)\left[{\mathsf{\tilde{\Sigma}}}_{i,j}{\mathsf{\tilde{\Sigma}}}_{k,l}+{\mathsf{\tilde{\Sigma}}}_{i,k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\mathsf{\tilde{\Sigma}}}_{i,l}{\mathsf{\tilde{\Sigma}}}_{j,k}\right]
+Θi,kΣ~j,l+Θi,lΣ~j,k\displaystyle\phantom{=}\,+{{\mathsf{\Theta}}_{i,k}}{\mathsf{\tilde{\Sigma}}}_{j,l}+{{\mathsf{\Theta}}_{i,l}}{\mathsf{\tilde{\Sigma}}}_{j,k}
+𝝁~j𝝁~kΣ~i,l+𝝁~j𝝁~lΣ~i,k,\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{\mathsf{\tilde{\Sigma}}}_{i,l}+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{l}{\mathsf{\tilde{\Sigma}}}_{i,k},
=(κ1)[Σ~i,jΣ~k,l+Σ~i,kΣ~j,l+Σ~i,lΣ~j,k]\displaystyle=\left(\kappa-1\right)\left[{\mathsf{\tilde{\Sigma}}}_{i,j}{\mathsf{\tilde{\Sigma}}}_{k,l}+{\mathsf{\tilde{\Sigma}}}_{i,k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\mathsf{\tilde{\Sigma}}}_{i,l}{\mathsf{\tilde{\Sigma}}}_{j,k}\right]
+Θi,kΣ~j,l+Θi,lΣ~j,k\displaystyle\phantom{=}\,+{{\mathsf{\Theta}}_{i,k}}{\mathsf{\tilde{\Sigma}}}_{j,l}+{{\mathsf{\Theta}}_{i,l}}{\mathsf{\tilde{\Sigma}}}_{j,k}
+𝝁~j𝝁~kΘi,l𝝁~i𝝁~j𝝁~k𝝁~l+𝝁~j𝝁~lΘi,k𝝁~i𝝁~j𝝁~k𝝁~l,\displaystyle\phantom{=}\,+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{{\mathsf{\Theta}}_{i,l}}-{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{\boldsymbol{\tilde{\mu}}}_{l}+{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{l}{{\mathsf{\Theta}}_{i,k}}-{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{\boldsymbol{\tilde{\mu}}}_{l},
=(κ1)[Σ~i,jΣ~k,l+Σ~i,kΣ~j,l+Σ~i,lΣ~j,k]\displaystyle=\left(\kappa-1\right)\left[{\mathsf{\tilde{\Sigma}}}_{i,j}{\mathsf{\tilde{\Sigma}}}_{k,l}+{\mathsf{\tilde{\Sigma}}}_{i,k}{\mathsf{\tilde{\Sigma}}}_{j,l}+{\mathsf{\tilde{\Sigma}}}_{i,l}{\mathsf{\tilde{\Sigma}}}_{j,k}\right]
+Θi,kΘj,l+Θi,lΘj,k2𝝁~i𝝁~j𝝁~k𝝁~l.\displaystyle\phantom{=}\,+{{\mathsf{\Theta}}_{i,k}}{{\mathsf{\Theta}}_{j,l}}+{{\mathsf{\Theta}}_{i,l}}{{\mathsf{\Theta}}_{j,k}}-2{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{\boldsymbol{\tilde{\mu}}}_{l}.

Now we need only translate this scalar result into the vector result in the theorem. If 𝒙~\boldsymbol{\tilde{x}} is mm-dimensional, then the variance-covariance matrix of vec(𝒙~𝒙~)\operatorname{vec}\left(\boldsymbol{\tilde{x}}{{\boldsymbol{\tilde{x}}}^{\top}}\right) is m2×m2{m^{2}}\times{m^{2}}, whose ij,klthij,kl^{\text{th}} element is given above. The term Σ~i,j{\mathsf{\tilde{\Sigma}}}_{i,j} Σ~k,l{\mathsf{\tilde{\Sigma}}}_{k,l} is the ij,klthij,kl^{\text{th}} element of vec(Σ~)vec(Σ~)\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}. The term Σ~i,k{\mathsf{\tilde{\Sigma}}}_{i,k} Σ~j,l{\mathsf{\tilde{\Sigma}}}_{j,l} is the ij,klthij,kl^{\text{th}} element of Σ~Σ~{\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}. The term Σ~j,k{\mathsf{\tilde{\Sigma}}}_{j,k} Σ~i,l{\mathsf{\tilde{\Sigma}}}_{i,l} is the ji,klthji,kl^{\text{th}} element of Σ~Σ~{\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}, and thus is the ij,klthij,kl^{\text{th}} element of 𝖪(Σ~Σ~)\mathsf{K}\left({\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}\right). We can similarly identify the terms Θi,kΘj,l{{\mathsf{\Theta}}_{i,k}}{{\mathsf{\Theta}}_{j,l}}, Θi,lΘj,k{{\mathsf{\Theta}}_{i,l}}{{\mathsf{\Theta}}_{j,k}}, and 𝝁~i𝝁~j𝝁~k𝝁~l{\boldsymbol{\tilde{\mu}}}_{i}{\boldsymbol{\tilde{\mu}}}_{j}{\boldsymbol{\tilde{\mu}}}_{k}{\boldsymbol{\tilde{\mu}}}_{l}, and thus

Var(vec(𝒙~𝒙~))ij,kl\displaystyle\operatorname{Var}\left(\operatorname{vec}\left(\boldsymbol{\tilde{x}}{{\boldsymbol{\tilde{x}}}^{\top}}\right)\right)_{ij,kl} =Cov(𝒙~i𝒙~j,𝒙~k𝒙~l),\displaystyle=\operatorname{Cov}\left({\boldsymbol{\tilde{x}}}_{i}{\boldsymbol{\tilde{x}}}_{j},{\boldsymbol{\tilde{x}}}_{k}{\boldsymbol{\tilde{x}}}_{l}\right),
=(κ1)[vec(Σ~)vec(Σ~)+(𝖨+𝖪)Σ~Σ~]ij,kl\displaystyle=\left(\kappa-1\right)\left[\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}+\left(\mathsf{I}+\mathsf{K}\right){\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}\right]_{ij,kl}
+(𝖨+𝖪)(ΘΘ𝝁~𝝁~𝝁~𝝁~)ij,kl.\displaystyle\phantom{=}\,+\left(\mathsf{I}+\mathsf{K}\right)\left({\mathsf{\Theta}}\otimes{\mathsf{\Theta}}-{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\otimes{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\right)_{ij,kl}.

Taking ϵ0\epsilon\to 0 now gives the stated result, with Σ~\mathsf{\tilde{\Sigma}} having zeroes in its first row and column.
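As a numerical aside (not part of the proof), the final expression is easy to check by Monte Carlo in the Gaussian case, where κ=1\kappa=1 and only the last term survives. A rough R sketch with arbitrary illustrative parameters follows.

# Monte Carlo check of Var(vec(x~ x~')) for Gaussian returns (kappa = 1)
set.seed(101)
p <- 2; n <- 2e5
mu <- c(0.1, -0.2)
Sigma <- matrix(c(1.0, 0.3, 0.3, 0.5), p, p)
X  <- MASS::mvrnorm(n, mu, Sigma)
Xt <- cbind(1, X)                               # x~ carries a leading one
V_emp <- cov(t(apply(Xt, 1, function(x) c(x %o% x))))   # empirical Var of vec(x~ x~')
mut  <- c(1, mu)
Sigt <- rbind(0, cbind(0, Sigma))               # Sigma~ has a zero first row and column
Theta <- Sigt + mut %o% mut
m <- p + 1
K <- matrix(0, m^2, m^2)
for (i in 1:m) for (j in 1:m) K[(i - 1) * m + j, (j - 1) * m + i] <- 1
V_thy <- (diag(m^2) + K) %*% (Theta %x% Theta - (mut %o% mut) %x% (mut %o% mut))
max(abs(V_emp - V_thy))                         # shrinks as n grows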

Proof of Corollary 3.9.

From the previous corollary, it suffices to prove the identity of 𝖡\mathsf{B}. Define 𝖭=12(𝖨+𝖪)\mathsf{N}=\frac{1}{2}\left(\mathsf{I}+\mathsf{K}\right). Then note that Equation 44 becomes

Ω0\displaystyle{\mathsf{\Omega}}_{0} =(κ1)[𝖭vec(Σ~)vec(Σ~)+2𝖭Σ~Σ~]\displaystyle=\left(\kappa-1\right)\left[\mathsf{N}\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)\otimes{{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}+2\mathsf{N}{\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}\right]
+2𝖭[ΘΘ𝝁~𝝁~𝝁~𝝁~].\displaystyle\phantom{=}\,+2\mathsf{N}\left[{\mathsf{\Theta}}\otimes{\mathsf{\Theta}}-{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\otimes{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\right].

By Lemma 3.5 of Magnus and Neudecker, note that 𝖣𝖫𝖭=𝖭\mathsf{D}\mathsf{L}\mathsf{N}=\mathsf{N}. [36] So consider

𝖡\displaystyle\mathsf{B} =(𝖫(Θ1Θ1)𝖣)𝖫Ω0𝖫(𝖫(Θ1Θ1)𝖣),\displaystyle=\left(\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}\right){\mathsf{L}{\mathsf{\Omega}}_{0}{{\mathsf{L}}^{\top}}}{{\left(\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}\right)}^{\top}},
=(κ1)𝖫(Θ1Θ1)𝖭[vec(Σ~)vec(Σ~)]𝖫(𝖫(Θ1Θ1)𝖣)\displaystyle=\left(\kappa-1\right)\mathsf{L}\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)\mathsf{N}\left[\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}\right]{{\mathsf{L}}^{\top}}{{\left(\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}\right)}^{\top}}
+2(κ1)𝖫(Θ1Θ1)𝖭[Σ~Σ~]𝖫(𝖫(Θ1Θ1)𝖣)\displaystyle\phantom{=}\,+2\left(\kappa-1\right)\mathsf{L}\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)\mathsf{N}\left[{\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}\right]{{\mathsf{L}}^{\top}}{{\left(\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}\right)}^{\top}}
+2𝖫(Θ1Θ1)𝖭[ΘΘ𝝁~𝝁~𝝁~𝝁~]𝖫(𝖫(Θ1Θ1)𝖣).\displaystyle\phantom{=}\,+2\mathsf{L}\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)\mathsf{N}\left[{\mathsf{\Theta}}\otimes{\mathsf{\Theta}}-{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\otimes{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\right]{{\mathsf{L}}^{\top}}{{\left(\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}\right)}^{\top}}.

Now by Lemma 2.1 of Magnus and Neudecker, 𝖭(𝖠𝖠)=(𝖠𝖠)𝖭=𝖭(𝖠𝖠)𝖭\mathsf{N}\left({\mathsf{A}}\otimes{\mathsf{A}}\right)=\left({\mathsf{A}}\otimes{\mathsf{A}}\right)\mathsf{N}=\mathsf{N}\left({\mathsf{A}}\otimes{\mathsf{A}}\right)\mathsf{N}, so we can slide the 𝖭\mathsf{N} matrix around. Also note that because Σ~\mathsf{\tilde{\Sigma}} is symmetric, 𝖭vec(Σ~)vec(Σ~)=vec(Σ~)vec(Σ~)\mathsf{N}\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}=\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}. Then

𝖡\displaystyle\mathsf{B} =(κ1)𝖫(Θ1Θ1)[vec(Σ~)vec(Σ~)]𝖫(𝖫(Θ1Θ1)𝖣)\displaystyle=\left(\kappa-1\right)\mathsf{L}\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)\left[\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}\right]{{\mathsf{L}}^{\top}}{{\left(\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}\right)}^{\top}}
+2(κ1)𝖫𝖭(Θ1Θ1)[Σ~Σ~]𝖫(𝖫(Θ1Θ1)𝖣)\displaystyle\phantom{=}\,+2\left(\kappa-1\right)\mathsf{L}\mathsf{N}\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)\left[{\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}\right]{{\mathsf{L}}^{\top}}{{\left(\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}\right)}^{\top}}
+2𝖫𝖭(Θ1Θ1)[ΘΘ𝝁~𝝁~𝝁~𝝁~]𝖫(𝖫(Θ1Θ1)𝖣),\displaystyle\phantom{=}\,+2\mathsf{L}\mathsf{N}\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)\left[{\mathsf{\Theta}}\otimes{\mathsf{\Theta}}-{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\otimes{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\right]{{\mathsf{L}}^{\top}}{{\left(\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}\right)}^{\top}},
=(κ1)𝖫[(Θ1Θ1)vec(Σ~)vec(Σ~)]𝖭𝖫(𝖫(Θ1Θ1)𝖣)\displaystyle=\left(\kappa-1\right)\mathsf{L}\left[\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}{{\left(\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}\right)}^{\top}}
+2(κ1)𝖫[Θ1Σ~Θ1Σ~]𝖭𝖫(𝖫(Θ1Θ1)𝖣)\displaystyle\phantom{=}\,+2\left(\kappa-1\right)\mathsf{L}\left[{{{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}}\otimes{{{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}{{\left(\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}\right)}^{\top}}
+2𝖫[𝖨𝒆1𝝁~𝒆1𝝁~]𝖭𝖫(𝖫(Θ1Θ1)𝖣),\displaystyle\phantom{=}\,+2\mathsf{L}\left[\mathsf{I}-{{\boldsymbol{e}}_{1}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\otimes{{\boldsymbol{e}}_{1}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}{{\left(\mathsf{L}{\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right)}\mathsf{D}\right)}^{\top}},

where we have slid the 𝖭\mathsf{N} matrix to the left, and multiplied by Θ1Θ1{{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}. Now work on the right sides, shifting 𝖭\mathsf{N}, collapsing 𝖭𝖫𝖣{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}{{\mathsf{D}}^{\top}} to 𝖭{{\mathsf{N}}^{\top}}, and multiplying by Θ1Θ1{{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}} from the right:

𝖡\displaystyle\mathsf{B} =(κ1)𝖫[vec(Θ1Σ~Θ1)vec(Σ~)]𝖭𝖫𝖣(Θ1Θ1)𝖫\displaystyle=\left(\kappa-1\right)\mathsf{L}\left[\operatorname{vec}\left({{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}{{\mathsf{\Theta}}^{-1}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}{{\mathsf{D}}^{\top}}\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right){{\mathsf{L}}^{\top}}
+2(κ1)𝖫𝖭[Θ1Σ~Θ1Σ~]𝖭𝖫𝖣(Θ1Θ1)𝖫\displaystyle\phantom{=}\,+2\left(\kappa-1\right)\mathsf{L}\mathsf{N}\left[{{{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}}\otimes{{{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}{{\mathsf{D}}^{\top}}\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right){{\mathsf{L}}^{\top}}
+2𝖫𝖭[𝖨𝒆1𝝁~𝒆1𝝁~]𝖭𝖫𝖣(Θ1Θ1)𝖫,\displaystyle\phantom{=}\,+2\mathsf{L}\mathsf{N}\left[\mathsf{I}-{{\boldsymbol{e}}_{1}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\otimes{{\boldsymbol{e}}_{1}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}{{\mathsf{D}}^{\top}}\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right){{\mathsf{L}}^{\top}},
=(κ1)𝖫[vec(Θ1Σ~Θ1)vec(Σ~)](Θ1Θ1)𝖭𝖫\displaystyle=\left(\kappa-1\right)\mathsf{L}\left[\operatorname{vec}\left({{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}{{\mathsf{\Theta}}^{-1}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}\right]\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right){{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}
+2(κ1)𝖫𝖭[Θ1Σ~Θ1Σ~](Θ1Θ1)𝖭𝖫\displaystyle\phantom{=}\,+2\left(\kappa-1\right)\mathsf{L}\mathsf{N}\left[{{{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}}\otimes{{{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}}\right]\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right){{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}
+2𝖫𝖭[𝖨𝒆1𝝁~𝒆1𝝁~](Θ1Θ1)𝖭𝖫,\displaystyle\phantom{=}\,+2\mathsf{L}\mathsf{N}\left[\mathsf{I}-{{\boldsymbol{e}}_{1}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\otimes{{\boldsymbol{e}}_{1}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\right]\left({{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}\right){{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}},
=(κ1)𝖫[vec(Θ1Σ~Θ1)vec(Θ1Σ~Θ1)]𝖭𝖫\displaystyle=\left(\kappa-1\right)\mathsf{L}\left[\operatorname{vec}\left({{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}{{\mathsf{\Theta}}^{-1}}\right){{\operatorname{vec}\left({{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}{{\mathsf{\Theta}}^{-1}}\right)}^{\top}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}
+2(κ1)𝖫𝖭[Θ1Σ~Θ1Θ1Σ~Θ1]𝖭𝖫\displaystyle\phantom{=}\,+2\left(\kappa-1\right)\mathsf{L}\mathsf{N}\left[{{{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}{{\mathsf{\Theta}}^{-1}}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}
+2𝖫𝖭[Θ1Θ1𝒆1𝒆1𝒆1𝒆1]𝖭𝖫.\displaystyle\phantom{=}\,+2\mathsf{L}\mathsf{N}\left[{{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}-{{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}}\otimes{{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}.

Now note that Θ1Σ~Θ1=Θ1𝒆1𝒆1{{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}{{\mathsf{\Theta}}^{-1}}={{\mathsf{\Theta}}^{-1}}-{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}, which follows since Σ~=Θ𝝁~𝝁~\mathsf{\tilde{\Sigma}}=\mathsf{\Theta}-\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}} and Θ1𝝁~=𝒆1{{\mathsf{\Theta}}^{-1}}\boldsymbol{\tilde{\mu}}={\boldsymbol{e}}_{1}. Then

𝖡\displaystyle\mathsf{B} =(κ1)𝖫[vec(Θ1𝒆1𝒆1)vec(Θ1𝒆1𝒆1)]𝖭𝖫\displaystyle=\left(\kappa-1\right)\mathsf{L}\left[\operatorname{vec}\left({{\mathsf{\Theta}}^{-1}}-{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}\right){{\operatorname{vec}\left({{\mathsf{\Theta}}^{-1}}-{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}\right)}^{\top}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}
+2(κ1)𝖫𝖭[(Θ1𝒆1𝒆1)(Θ1𝒆1𝒆1)]𝖭𝖫\displaystyle\phantom{=}\,+2\left(\kappa-1\right)\mathsf{L}\mathsf{N}\left[{\left({{\mathsf{\Theta}}^{-1}}-{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}\right)}\otimes{\left({{\mathsf{\Theta}}^{-1}}-{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}\right)}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}
+2𝖫𝖭[Θ1Θ1𝒆1𝒆1𝒆1𝒆1]𝖭𝖫.\displaystyle\phantom{=}\,+2\mathsf{L}\mathsf{N}\left[{{{\mathsf{\Theta}}^{-1}}}\otimes{{{\mathsf{\Theta}}^{-1}}}-{{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}}\otimes{{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}}\right]{{\mathsf{N}}^{\top}}{{\mathsf{L}}^{\top}}.

Proof of Theorem 3.11.

Starting from Corollary 2.13, take Ω=𝖫Ω0𝖫\mathsf{\Omega}=\mathsf{L}{\mathsf{\Omega}}_{0}{{\mathsf{L}}^{\top}}, with Ω0{\mathsf{\Omega}}_{0} defined in Theorem 3.6. In Equation 38 we give the abbreviated form for 𝖧𝖥𝖧{{\mathsf{H}}^{\top}}\mathsf{F}\mathsf{H}.

Because of the symmetry of Ω0{\mathsf{\Omega}}_{0}, we can drop the 𝖫\mathsf{L} and 𝖣\mathsf{D} terms (but not the 𝖭\mathsf{N} matrix). We decompose Ω\mathsf{\Omega} into four terms based on the terms of Ω0{\mathsf{\Omega}}_{0}.

tr(𝖧𝖥𝖧Ω)\displaystyle\operatorname{tr}\left({{\mathsf{H}}^{\top}}\mathsf{F}\mathsf{H}\mathsf{\Omega}\right) =a1+a2+a3a4.\displaystyle=a_{1}+a_{2}+a_{3}-a_{4}.

Now we need some technical results. The first is that for conformable matrices

tr((𝖠𝖡)vec(𝖷)vec(𝖷))=tr(vec(𝖷)(𝖠𝖡)vec(𝖷))=tr(𝖷𝖡𝖷𝖠).\operatorname{tr}\left(\left(\mathsf{A}\otimes\mathsf{B}\right)\operatorname{vec}\left(\mathsf{X}\right){{\operatorname{vec}\left(\mathsf{X}\right)}^{\top}}\right)=\operatorname{tr}\left({{\operatorname{vec}\left(\mathsf{X}\right)}^{\top}}\left(\mathsf{A}\otimes\mathsf{B}\right)\operatorname{vec}\left(\mathsf{X}\right)\right)=\operatorname{tr}\left({{\mathsf{X}}^{\top}}\mathsf{B}\mathsf{X}{{\mathsf{A}}^{\top}}\right).

The other concerns traces with 𝖭\mathsf{N}, where 2𝖭=𝖨+𝖪2\mathsf{N}=\mathsf{I}+\mathsf{K}, and where 𝖪\mathsf{K} is the commutation matrix. Let 𝖠\mathsf{A} and 𝖡\mathsf{B} be square matrices of the same size, then

tr(2𝖭(𝖠𝖡))=tr(𝖠)tr(𝖡)+tr(𝖠𝖡).\operatorname{tr}\left(2\mathsf{N}\left(\mathsf{A}\otimes\mathsf{B}\right)\right)=\operatorname{tr}\left(\mathsf{A}\right)\operatorname{tr}\left(\mathsf{B}\right)+\operatorname{tr}\left(\mathsf{A}\mathsf{B}\right). (130)

The proof is simple: since tr(𝖠𝖡)=tr(𝖠)tr(𝖡)\operatorname{tr}\left(\mathsf{A}\otimes\mathsf{B}\right)=\operatorname{tr}\left(\mathsf{A}\right)\operatorname{tr}\left(\mathsf{B}\right) is a known result, we need only focus on the 𝖪\mathsf{K} term. Note then that

tr(𝖪(𝖠𝖡))\displaystyle\operatorname{tr}\left(\mathsf{K}\left(\mathsf{A}\otimes\mathsf{B}\right)\right) =i,j(𝖪(𝖠𝖡))ij,ij=i,j((𝖠𝖡))ji,ij,\displaystyle=\sum_{i,j}\left(\mathsf{K}\left(\mathsf{A}\otimes\mathsf{B}\right)\right)_{ij,ij}=\sum_{i,j}\left(\left(\mathsf{A}\otimes\mathsf{B}\right)\right)_{ji,ij},
=i,j𝖡j,i𝖠i,j=i(𝖠𝖡)i,i=tr(𝖠𝖡).\displaystyle=\sum_{i,j}\mathsf{B}_{j,i}\mathsf{A}_{i,j}=\sum_{i}\left(\mathsf{A}\mathsf{B}\right)_{i,i}=\operatorname{tr}\left(\mathsf{A}\mathsf{B}\right).

This is Magnus and Neudecker Theorem 3.1 (xiii). [35] We will also implicitly use the fact that factors inside a trace may be cyclically permuted. We then tackle the trace terms one at a time, after a brief numerical aside.
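Equation 130 itself admits a quick numerical check in R; the snippet below is a sanity check on arbitrary matrices and is not part of the proof.

# sanity check of Equation 130: tr(2 N (A %x% B)) = tr(A) tr(B) + tr(A B)
set.seed(101)
n <- 4
A <- matrix(rnorm(n * n), n, n)
B <- matrix(rnorm(n * n), n, n)
K <- matrix(0, n^2, n^2)
for (i in 1:n) for (j in 1:n) K[(i - 1) * n + j, (j - 1) * n + i] <- 1
lhs <- sum(diag((diag(n^2) + K) %*% (A %x% B)))
rhs <- sum(diag(A)) * sum(diag(B)) + sum(diag(A %*% B))
all.equal(lhs, rhs)   # TRUE up to numerical error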

a1\displaystyle a_{1} =(κ1)tr(𝖧𝖥𝖧vec(Σ~)vec(Σ~)),\displaystyle=\left(\kappa-1\right)\operatorname{tr}\left({{\mathsf{H}}^{\top}}\mathsf{F}\mathsf{H}\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}\right),
=κ1ζtr(((Θ1𝒆1𝒆1Θ1)[0𝟎𝟎Σ1𝝁𝝁Σ1ζ2Σ1])vec(Σ~)vec(Σ~)),\displaystyle=\frac{\kappa-1}{{\zeta}_{*}}\operatorname{tr}\left(\left(\left({{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)\otimes\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-{{\mathsf{\Sigma}}^{-1}}}\end{array}\right]\right)\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}\right),
=κ1ζtr(Σ~[0𝟎𝟎Σ1𝝁𝝁Σ1ζ2Σ1]Σ~(Θ1𝒆1𝒆1Θ1)).\displaystyle=\frac{\kappa-1}{{\zeta}_{*}}\operatorname{tr}\left({{\mathsf{\tilde{\Sigma}}}^{\top}}\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-{{\mathsf{\Sigma}}^{-1}}}\end{array}\right]\mathsf{\tilde{\Sigma}}\left({{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)\right).

But note that

Σ~[0𝟎𝟎Σ1𝝁𝝁Σ1ζ2Σ1]Σ~Θ1𝒆1\displaystyle\mathsf{\tilde{\Sigma}}{\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-{{\mathsf{\Sigma}}^{-1}}}\end{array}\right]}\mathsf{\tilde{\Sigma}}{{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1} =[0𝟎𝟎𝝁𝝁ζ2Σ]Θ1𝒆1,\displaystyle={\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}}{{\zeta}^{2}_{*}}-\mathsf{\Sigma}}\end{array}\right]}{{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1},
=[0𝟎𝟎𝝁𝝁Σ1ζ2𝖨]𝒆1=[0𝟎],\displaystyle={\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-\mathsf{I}}\end{array}\right]}{\boldsymbol{e}}_{1}=\left[\begin{array}[]{r}{0}\\ {\boldsymbol{0}}\end{array}\right],

thus a1=0a_{1}=0.

Now consider

a2\displaystyle a_{2} =(κ1)tr(𝖧𝖥𝖧2𝖭(Σ~Σ~)),\displaystyle=\left(\kappa-1\right)\operatorname{tr}\left({{\mathsf{H}}^{\top}}\mathsf{F}\mathsf{H}2\mathsf{N}\left({\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}\right)\right),
=(κ1)tr(2𝖭(Σ~Σ~)𝖧𝖥𝖧).\displaystyle=\left(\kappa-1\right)\operatorname{tr}\left(2\mathsf{N}\left({\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}\right){{\mathsf{H}}^{\top}}\mathsf{F}\mathsf{H}\right).

Note that

(Σ~Σ~)𝖧𝖥𝖧\displaystyle\left({\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}\right){{\mathsf{H}}^{\top}}\mathsf{F}\mathsf{H} =1ζ(Σ~Σ~)((Θ1𝒆1𝒆1Θ1)[0𝟎𝟎Σ1𝝁𝝁Σ1ζ2Σ1]),\displaystyle=\frac{1}{{\zeta}_{*}}\left({\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}\right)\left(\left({{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)\otimes\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-{{\mathsf{\Sigma}}^{-1}}}\end{array}\right]\right),
=1ζ(Σ~Θ1𝒆1𝒆1Θ1)[0𝟎𝟎𝝁𝝁Σ1ζ2𝖨],\displaystyle=\frac{1}{{\zeta}_{*}}\left(\mathsf{\tilde{\Sigma}}{{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)\otimes\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-\mathsf{I}}\end{array}\right],
=1ζ([0𝝁][1+ζ2,𝝂])[0𝟎𝟎𝝁𝝁Σ1ζ2𝖨].\displaystyle=\frac{1}{{\zeta}_{*}}\left(\left[\begin{array}[]{r}{0}\\ {-\boldsymbol{\mu}}\end{array}\right]\left[1+{\zeta}^{2}_{*},-{\boldsymbol{\nu}}_{{}*}\right]\right)\otimes\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-\mathsf{I}}\end{array}\right].

So

a2\displaystyle a_{2} =κ1ζtr([0𝝁][1+ζ2,𝝂])tr([0𝟎𝟎𝝁𝝁Σ1ζ2𝖨]),\displaystyle=\frac{\kappa-1}{{\zeta}_{*}}\operatorname{tr}\left(\left[\begin{array}[]{r}{0}\\ {-\boldsymbol{\mu}}\end{array}\right]\left[1+{\zeta}^{2}_{*},-{\boldsymbol{\nu}}_{{}*}\right]\right)\operatorname{tr}\left(\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-\mathsf{I}}\end{array}\right]\right),
+κ1ζtr([0𝝁][1+ζ2,𝝂][0𝟎𝟎𝝁𝝁Σ1ζ2𝖨]),\displaystyle\phantom{=}\,+\frac{\kappa-1}{{\zeta}_{*}}\operatorname{tr}\left(\left[\begin{array}[]{r}{0}\\ {-\boldsymbol{\mu}}\end{array}\right]\left[1+{\zeta}^{2}_{*},-{\boldsymbol{\nu}}_{{}*}\right]\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-\mathsf{I}}\end{array}\right]\right),
=κ1ζζ2(1p)+κ1ζtr([0𝝁][0,𝟎]),\displaystyle=\frac{\kappa-1}{{\zeta}_{*}}{\zeta}^{2}_{*}\left(1-p\right)+\frac{\kappa-1}{{\zeta}_{*}}\operatorname{tr}\left(\left[\begin{array}[]{r}{0}\\ {-\boldsymbol{\mu}}\end{array}\right]\left[0,{{\boldsymbol{0}}^{\top}}\right]\right),
=(κ1)ζ(1p).\displaystyle=\left(\kappa-1\right){\zeta}_{*}\left(1-p\right).

Now consider

a3\displaystyle a_{3} =tr(𝖧𝖥𝖧2𝖭(ΘΘ)).\displaystyle=\operatorname{tr}\left({{\mathsf{H}}^{\top}}\mathsf{F}\mathsf{H}2\mathsf{N}\left({\mathsf{\Theta}}\otimes{\mathsf{\Theta}}\right)\right).

Proceeding as before we have

(ΘΘ)𝖧𝖥𝖧\displaystyle\left({\mathsf{\Theta}}\otimes{\mathsf{\Theta}}\right){{\mathsf{H}}^{\top}}\mathsf{F}\mathsf{H} =1ζ(ΘΘ)((Θ1𝒆1𝒆1Θ1)[0𝟎𝟎Σ1𝝁𝝁Σ1ζ2Σ1]),\displaystyle=\frac{1}{{\zeta}_{*}}\left({\mathsf{\Theta}}\otimes{\mathsf{\Theta}}\right)\left(\left({{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)\otimes\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-{{\mathsf{\Sigma}}^{-1}}}\end{array}\right]\right),
=1ζ(𝒆1[1+ζ2,𝝂])[0𝟎𝟎𝝁𝝁Σ1ζ2𝖨].\displaystyle=\frac{1}{{\zeta}_{*}}\left({\boldsymbol{e}}_{1}\left[1+{\zeta}^{2}_{*},-{\boldsymbol{\nu}}_{{}*}\right]\right)\otimes\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-\mathsf{I}}\end{array}\right].

Then

a3\displaystyle a_{3} =1ζtr(𝒆1[1+ζ2,𝝂])tr([0𝟎𝟎𝝁𝝁Σ1ζ2𝖨]),\displaystyle=\frac{1}{{\zeta}_{*}}\operatorname{tr}\left({\boldsymbol{e}}_{1}\left[1+{\zeta}^{2}_{*},-{\boldsymbol{\nu}}_{{}*}\right]\right)\operatorname{tr}\left(\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-\mathsf{I}}\end{array}\right]\right),
+1ζtr(𝒆1[1+ζ2,𝝂][0𝟎𝟎𝝁𝝁Σ1ζ2𝖨]),\displaystyle\phantom{=}\,+\frac{1}{{\zeta}_{*}}\operatorname{tr}\left({\boldsymbol{e}}_{1}\left[1+{\zeta}^{2}_{*},-{\boldsymbol{\nu}}_{{}*}\right]\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-\mathsf{I}}\end{array}\right]\right),
=1ζ(1+ζ2)(1p)+1ζtr(𝒆1[0,𝟎]),\displaystyle=\frac{1}{{\zeta}_{*}}\left(1+{\zeta}^{2}_{*}\right)\left(1-p\right)+\frac{1}{{\zeta}_{*}}\operatorname{tr}\left({\boldsymbol{e}}_{1}\left[0,{{\boldsymbol{0}}^{\top}}\right]\right),
=1ζ(1+ζ2)(1p).\displaystyle=\frac{1}{{\zeta}_{*}}\left(1+{\zeta}^{2}_{*}\right)\left(1-p\right).

Finally

a4\displaystyle a_{4} =tr(𝖧𝖥𝖧2𝖭(𝝁~𝝁~𝝁~𝝁~)).\displaystyle=\operatorname{tr}\left({{\mathsf{H}}^{\top}}\mathsf{F}\mathsf{H}2\mathsf{N}\left({\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\otimes{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\right)\right).

We have

(𝝁~𝝁~𝝁~𝝁~)𝖧𝖥𝖧\displaystyle\left({\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\otimes{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\right){{\mathsf{H}}^{\top}}\mathsf{F}\mathsf{H} =1ζ(𝝁~𝝁~Θ1𝒆1𝒆1Θ1)(𝝁~𝝁~[0𝟎𝟎Σ1𝝁𝝁Σ1ζ2Σ1]),\displaystyle=\frac{1}{{\zeta}_{*}}\left(\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}{{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)\otimes\left(\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}\left[\begin{array}[]{cc}{0}&{{{\boldsymbol{0}}^{\top}}}\\ {\boldsymbol{0}}&{\frac{{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-{{\mathsf{\Sigma}}^{-1}}}\end{array}\right]\right),
=1ζ(𝝁~𝝁~Θ1𝒆1𝒆1Θ1)𝟢=𝟢,\displaystyle=\frac{1}{{\zeta}_{*}}\left(\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}{{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1}{{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)\otimes\mathsf{0}=\mathsf{0},

because 𝝁(Σ1𝝁𝝁Σ1ζ2Σ1)=𝟎{{\boldsymbol{\mu}}^{\top}}\left(\frac{{{\mathsf{\Sigma}}^{-1}}\boldsymbol{\mu}{{\boldsymbol{\mu}}^{\top}}{{\mathsf{\Sigma}}^{-1}}}{{\zeta}^{2}_{*}}-{{\mathsf{\Sigma}}^{-1}}\right)=\boldsymbol{0}. Thus a4=0a_{4}=0.

Putting them together we have

tr(𝖧𝖥𝖧Ω0)\displaystyle\operatorname{tr}\left({{\mathsf{H}}^{\top}}\mathsf{F}\mathsf{H}{\mathsf{\Omega}}_{0}\right) =κζ2+1ζ(1p).\displaystyle=\frac{\kappa{\zeta}^{2}_{*}+1}{{\zeta}_{*}}\left(1-p\right).

Now consider the term with 𝒉𝒉\boldsymbol{h}{{\boldsymbol{h}}^{\top}} in Corollary 2.13. Define the four terms as

tr(𝒉𝒉Ω0)\displaystyle\operatorname{tr}\left(\boldsymbol{h}{{\boldsymbol{h}}^{\top}}{\mathsf{\Omega}}_{0}\right) =b1+b2+b3b4.\displaystyle=b_{1}+b_{2}+b_{3}-b_{4}.

Again, we will casually drop the 𝖫\mathsf{L} and 𝖣\mathsf{D} as needed.

From Theorem 2.11,

𝒉=(𝒆1Θ1)(𝒆1Θ1)𝖣.\boldsymbol{h}=-{\left({{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)}\otimes{\left({{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)}\mathsf{D}.

Continuing in order, we have:

b1\displaystyle b_{1} =(κ1)tr(((𝒆1Θ1)(𝒆1Θ1))((𝒆1Θ1)(𝒆1Θ1))vec(Σ~)vec(Σ~)),\displaystyle=\left(\kappa-1\right)\operatorname{tr}\left(\left({\left({{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)}\otimes{\left({{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)}\right){{\left({\left({{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)}\otimes{\left({{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\right)}\right)}^{\top}}\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right){{\operatorname{vec}\left(\mathsf{\tilde{\Sigma}}\right)}^{\top}}\right),
=(κ1)tr((𝒆1Θ1Σ~Θ1𝒆1)(𝒆1Θ1Σ~Θ1𝒆1)),\displaystyle=\left(\kappa-1\right)\operatorname{tr}\left(\left({{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}{{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1}\right)\left({{{\boldsymbol{e}}_{1}}^{\top}}{{\mathsf{\Theta}}^{-1}}\mathsf{\tilde{\Sigma}}{{\mathsf{\Theta}}^{-1}}{\boldsymbol{e}}_{1}\right)\right),
=(κ1)tr([0𝝁][1+ζ2𝝂][0𝝁][1+ζ2𝝂])=(κ1)ζ4.\displaystyle=\left(\kappa-1\right)\operatorname{tr}\left(\left[0-{{\boldsymbol{\mu}}^{\top}}\right]\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]\left[0-{{\boldsymbol{\mu}}^{\top}}\right]\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]\right)=\left(\kappa-1\right){\zeta}_{*}^{4}.
b2\displaystyle b_{2} =(κ1)tr(2𝖭(Σ~Σ~)(([1+ζ2𝝂][1+ζ2𝝂])([1+ζ2𝝂][1+ζ2𝝂]))),\displaystyle=\left(\kappa-1\right)\operatorname{tr}\left(2\mathsf{N}\left({\mathsf{\tilde{\Sigma}}}\otimes{\mathsf{\tilde{\Sigma}}}\right)\left({\left(\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\otimes{\left(\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\right)\right),
=(κ1)tr(2𝖭(([0𝝁][1+ζ2𝝂])([0𝝁][1+ζ2𝝂]))),\displaystyle=\left(\kappa-1\right)\operatorname{tr}\left(2\mathsf{N}\left({\left({\left[\begin{array}[]{r}{0}\\ {-\boldsymbol{\mu}}\end{array}\right]}{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\otimes{\left({\left[\begin{array}[]{r}{0}\\ {-\boldsymbol{\mu}}\end{array}\right]}{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\right)\right),
=(κ1)ζ4+(κ1)tr([0𝝁][1+ζ2𝝂][0𝝁][1+ζ2𝝂]),\displaystyle=\left(\kappa-1\right){\zeta}_{*}^{4}+\left(\kappa-1\right)\operatorname{tr}\left(\left[\begin{array}[]{r}{0}\\ {-\boldsymbol{\mu}}\end{array}\right]{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\left[\begin{array}[]{r}{0}\\ {-\boldsymbol{\mu}}\end{array}\right]{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right),
=2(κ1)ζ4.\displaystyle=2\left(\kappa-1\right){\zeta}_{*}^{4}.
b3\displaystyle b_{3} =tr(2𝖭(ΘΘ)(([1+ζ2𝝂][1+ζ2𝝂])([1+ζ2𝝂][1+ζ2𝝂]))),\displaystyle=\operatorname{tr}\left(2\mathsf{N}\left({\mathsf{\Theta}}\otimes{\mathsf{\Theta}}\right)\left({\left(\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\otimes{\left(\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\right)\right),
=tr(2𝖭((𝒆1[1+ζ2𝝂])(𝒆1[1+ζ2𝝂]))),\displaystyle=\operatorname{tr}\left(2\mathsf{N}\left({\left({\boldsymbol{e}}_{1}{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\otimes{\left({\boldsymbol{e}}_{1}{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\right)\right),
=(1+ζ2)2+tr(𝒆1[1+ζ2𝝂]𝒆1[1+ζ2𝝂]),\displaystyle=\left(1+{\zeta}^{2}_{*}\right)^{2}+\operatorname{tr}\left({\boldsymbol{e}}_{1}{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}{\boldsymbol{e}}_{1}{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right),
=2(1+ζ2)2.\displaystyle=2\left(1+{\zeta}^{2}_{*}\right)^{2}.
b4\displaystyle b_{4} =tr(2𝖭(𝝁~𝝁~𝝁~𝝁~)(([1+ζ2𝝂][1+ζ2𝝂])([1+ζ2𝝂][1+ζ2𝝂]))),\displaystyle=\operatorname{tr}\left(2\mathsf{N}\left({\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\otimes{\boldsymbol{\tilde{\mu}}{{\boldsymbol{\tilde{\mu}}}^{\top}}}\right)\left({\left(\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\otimes{\left(\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\right)\right),
=tr(2𝖭((𝝁~[1+ζ2𝝂])(𝝁~[1+ζ2𝝂]))),\displaystyle=\operatorname{tr}\left(2\mathsf{N}\left({\left(\boldsymbol{\tilde{\mu}}{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\otimes{\left(\boldsymbol{\tilde{\mu}}{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)}\right)\right),
=1+tr(𝝁~[1+ζ2𝝂]𝝁~[1+ζ2𝝂])=2.\displaystyle=1+\operatorname{tr}\left(\boldsymbol{\tilde{\mu}}{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\boldsymbol{\tilde{\mu}}{{\left[\begin{array}[]{r}{1+{\zeta}^{2}_{*}}\\ {-{\boldsymbol{\nu}}_{{}*}}\end{array}\right]}^{\top}}\right)=2.

Collecting terms,

tr(𝒉𝒉Ω0)\displaystyle\operatorname{tr}\left(\boldsymbol{h}{{\boldsymbol{h}}^{\top}}{\mathsf{\Omega}}_{0}\right) =3(κ1)ζ4+2(1+ζ2)22=3(κ1)ζ4+4ζ2+2ζ4,\displaystyle=3\left(\kappa-1\right){\zeta}_{*}^{4}+2\left(1+{\zeta}^{2}_{*}\right)^{2}-2=3\left(\kappa-1\right){\zeta}_{*}^{4}+4{\zeta}^{2}_{*}+2{\zeta}_{*}^{4},
=(3κ1)ζ4+4ζ2.\displaystyle=\left(3\kappa-1\right){\zeta}_{*}^{4}+4{\zeta}^{2}_{*}.

The proof of Theorem 3.11 is rather tedious; in Appendix C we numerically confirm that it is consistent with Theorem 3.6 and Corollary 2.13.

Proof of Lemma 3.12.

We first note that

𝖷~𝖷~i,j\displaystyle{{\mathsf{\tilde{{X}}}}^{\top}}\mathsf{\tilde{{X}}}_{i,j} =t𝒆t𝖷~𝒆i𝒆t𝖷~𝒆j,\displaystyle=\sum_{t}{\boldsymbol{e}}^{\top}_{t}\mathsf{\tilde{{X}}}{\boldsymbol{e}}_{i}{\boldsymbol{e}}^{\top}_{t}\mathsf{\tilde{{X}}}{\boldsymbol{e}}_{j},
=t𝒆j𝒆t𝒆i𝒆tvec(vec(𝖷~)vec(𝖷~)).\displaystyle=\sum_{t}{\boldsymbol{e}}^{\top}_{j}\otimes{\boldsymbol{e}}^{\top}_{t}\otimes{\boldsymbol{e}}^{\top}_{i}\otimes{\boldsymbol{e}}^{\top}_{t}\operatorname{vec}\left(\operatorname{vec}\left(\mathsf{\tilde{{X}}}\right){{\operatorname{vec}\left(\mathsf{\tilde{{X}}}\right)}^{\top}}\right).

We then note that

E[vec(𝖷~)vec(𝖷~)]=vec(𝖬)vec(𝖬)+ΞΨ.\operatorname{E}\left[\operatorname{vec}\left(\mathsf{\tilde{{X}}}\right){{\operatorname{vec}\left(\mathsf{\tilde{{X}}}\right)}^{\top}}\right]=\operatorname{vec}\left(\mathsf{M}\right){{\operatorname{vec}\left(\mathsf{M}\right)}^{\top}}+\mathsf{\Xi}\otimes\mathsf{\Psi}.

Then

E[𝖷~𝖷~i,j]\displaystyle\operatorname{E}\left[{{\mathsf{\tilde{{X}}}}^{\top}}\mathsf{\tilde{{X}}}_{i,j}\right] =t𝒆j𝒆t𝒆i𝒆t[vec(vec(𝖬)vec(𝖬))+vec(ΞΨ)],\displaystyle=\sum_{t}{\boldsymbol{e}}^{\top}_{j}\otimes{\boldsymbol{e}}^{\top}_{t}\otimes{\boldsymbol{e}}^{\top}_{i}\otimes{\boldsymbol{e}}^{\top}_{t}\left[\operatorname{vec}\left(\operatorname{vec}\left(\mathsf{M}\right){{\operatorname{vec}\left(\mathsf{M}\right)}^{\top}}\right)+\operatorname{vec}\left(\mathsf{\Xi}\otimes\mathsf{\Psi}\right)\right],
=(𝖬𝖬)i,j+tvec(𝒆i𝒆t(ΞΨ)𝒆j𝒆t),\displaystyle=\left({{\mathsf{M}}^{\top}}\mathsf{M}\right)_{i,j}+\sum_{t}\operatorname{vec}\left({\boldsymbol{e}}^{\top}_{i}\otimes{\boldsymbol{e}}^{\top}_{t}\left(\mathsf{\Xi}\otimes\mathsf{\Psi}\right){\boldsymbol{e}}_{j}\otimes{\boldsymbol{e}}_{t}\right),
=(𝖬𝖬)i,j+tΞi,jΨt,t.\displaystyle=\left({{\mathsf{M}}^{\top}}\mathsf{M}\right)_{i,j}+\sum_{t}\mathsf{\Xi}_{i,j}\mathsf{\Psi}_{t,t}.

This establishes the mean.
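
In the spirit of the confirmation in Appendix C, this mean identity can be spot checked by simulation. The sketch below is illustrative only: the dimensions, seed, and number of replications are arbitrary, and $\mathsf{\tilde{{X}}}$ is drawn as a matrix normal with $\operatorname{Var}\left(\operatorname{vec}\left(\mathsf{\tilde{{X}}}\right)\right)=\mathsf{\Xi}\otimes\mathsf{\Psi}$.

\MakeFramed
# illustrative Monte Carlo check of the mean identity
#   E[X' X] = M' M + tr(Psi) * Xi,
# drawing vec(X) ~ N(vec(M), Xi %x% Psi).
# n, p, nsim, and the seed are arbitrary choices.
set.seed(1234)
n <- 4; p <- 3; nsim <- 20000
M   <- matrix(rnorm(n*p),n,p)
Psi <- crossprod(matrix(rnorm(n*n),n,n)) + diag(n)   # n x n row covariance
Xi  <- crossprod(matrix(rnorm(p*p),p,p)) + diag(p)   # p x p column covariance
Rt  <- t(chol(kronecker(Xi,Psi)))                    # Var(vec(X)) = Xi %x% Psi
acc <- matrix(0,p,p)
for (iii in seq_len(nsim)) {
  X   <- M + matrix(Rt %*% rnorm(n*p),n,p)
  acc <- acc + crossprod(X)
}
emp <- acc / nsim
thy <- crossprod(M) + sum(diag(Psi)) * Xi
max(abs(emp - thy))   # small, up to Monte Carlo error
\endMakeFramed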

Now for the variance. By Equation 44 with κ=1\kappa=1 (and with n=1n=1 since we are vectorizing the 𝖷~\mathsf{\tilde{{X}}}), letting 𝒚=vec(vec(𝖷~)vec(𝖷~))\boldsymbol{y}=\operatorname{vec}\left(\operatorname{vec}\left(\mathsf{\tilde{{X}}}\right){{\operatorname{vec}\left(\mathsf{\tilde{{X}}}\right)}^{\top}}\right),

Var(𝒚)\displaystyle\operatorname{Var}\left(\boldsymbol{y}\right) =(𝖨+𝖪)[(vec(𝖬)vec(𝖬)+ΞΨ)(vec(𝖬)vec(𝖬)+ΞΨ)]\displaystyle=\left(\mathsf{I}+\mathsf{K}\right)\left[{\left(\operatorname{vec}\left(\mathsf{M}\right){{\operatorname{vec}\left(\mathsf{M}\right)}^{\top}}+\mathsf{\Xi}\otimes\mathsf{\Psi}\right)}\otimes{\left(\operatorname{vec}\left(\mathsf{M}\right){{\operatorname{vec}\left(\mathsf{M}\right)}^{\top}}+\mathsf{\Xi}\otimes\mathsf{\Psi}\right)}\right]
(𝖨+𝖪)[vec(𝖬)vec(𝖬)vec(𝖬)vec(𝖬)],\displaystyle\phantom{=}\,-\left(\mathsf{I}+\mathsf{K}\right)\left[{\operatorname{vec}\left(\mathsf{M}\right){{\operatorname{vec}\left(\mathsf{M}\right)}^{\top}}}\otimes{\operatorname{vec}\left(\mathsf{M}\right){{\operatorname{vec}\left(\mathsf{M}\right)}^{\top}}}\right],
=2𝖭[ΞΨΞΨ+(vec(𝖬)vec(𝖬))ΞΨ+ΞΨ(vec(𝖬)vec(𝖬))].\displaystyle=2\mathsf{N}\left[{\mathsf{\Xi}\otimes\mathsf{\Psi}}\otimes{\mathsf{\Xi}\otimes\mathsf{\Psi}}+\left(\operatorname{vec}\left(\mathsf{M}\right){{\operatorname{vec}\left(\mathsf{M}\right)}^{\top}}\right)\otimes\mathsf{\Xi}\otimes\mathsf{\Psi}+\mathsf{\Xi}\otimes\mathsf{\Psi}\otimes\left(\operatorname{vec}\left(\mathsf{M}\right){{\operatorname{vec}\left(\mathsf{M}\right)}^{\top}}\right)\right].

Let 𝒛=vec(𝖷~𝖷~)\boldsymbol{z}=\operatorname{vec}\left({{\mathsf{\tilde{{X}}}}^{\top}}\mathsf{\tilde{{X}}}\right). Then

Var(𝒛)ij,kl\displaystyle\operatorname{Var}\left(\boldsymbol{z}\right)_{ij,kl} =t,s𝒆j𝒆t𝒆i𝒆tVar(𝒚)𝒆l𝒆s𝒆k𝒆s,\displaystyle=\sum_{t,s}{\boldsymbol{e}}^{\top}_{j}\otimes{\boldsymbol{e}}^{\top}_{t}\otimes{\boldsymbol{e}}^{\top}_{i}\otimes{\boldsymbol{e}}^{\top}_{t}\operatorname{Var}\left(\boldsymbol{y}\right){\boldsymbol{e}}_{l}\otimes{\boldsymbol{e}}_{s}\otimes{\boldsymbol{e}}_{k}\otimes{\boldsymbol{e}}_{s},
=t,sΞj,lΨt,sΞi,kΨt,s+𝖬t,j𝖬s,lΞi,kΨt,s+𝖬t,i𝖬s,kΞj,lΨt,s\displaystyle=\sum_{t,s}{\mathsf{\Xi}}_{j,l}{\mathsf{\Psi}}_{t,s}{\mathsf{\Xi}}_{i,k}{\mathsf{\Psi}}_{t,s}+{\mathsf{M}}_{t,j}{\mathsf{M}}_{s,l}{\mathsf{\Xi}}_{i,k}{\mathsf{\Psi}}_{t,s}+{\mathsf{M}}_{t,i}{\mathsf{M}}_{s,k}{\mathsf{\Xi}}_{j,l}{\mathsf{\Psi}}_{t,s}
+t,sΞi,lΨt,sΞj,kΨt,s+𝖬t,i𝖬s,lΞj,kΨt,s+𝖬t,j𝖬s,kΞi,lΨt,s\displaystyle\phantom{=}\,+\sum_{t,s}{\mathsf{\Xi}}_{i,l}{\mathsf{\Psi}}_{t,s}{\mathsf{\Xi}}_{j,k}{\mathsf{\Psi}}_{t,s}+{\mathsf{M}}_{t,i}{\mathsf{M}}_{s,l}{\mathsf{\Xi}}_{j,k}{\mathsf{\Psi}}_{t,s}+{\mathsf{M}}_{t,j}{\mathsf{M}}_{s,k}{\mathsf{\Xi}}_{i,l}{\mathsf{\Psi}}_{t,s}
=Ξj,lΞi,ktr(ΨΨ)+(𝖬Ψ𝖬)j,lΞi,k+(𝖬Ψ𝖬)i,kΞj,l\displaystyle={\mathsf{\Xi}}_{j,l}{\mathsf{\Xi}}_{i,k}\operatorname{tr}\left(\mathsf{\Psi}\mathsf{\Psi}\right)+\left({{\mathsf{M}}^{\top}}\mathsf{\Psi}\mathsf{M}\right)_{j,l}{\mathsf{\Xi}}_{i,k}+\left({{\mathsf{M}}^{\top}}\mathsf{\Psi}\mathsf{M}\right)_{i,k}{\mathsf{\Xi}}_{j,l}
+Ξi,lΞj,ktr(ΨΨ)+(𝖬Ψ𝖬)i,lΞj,k+(𝖬Ψ𝖬)j,kΞi,l.\displaystyle\phantom{=}\,+{\mathsf{\Xi}}_{i,l}{\mathsf{\Xi}}_{j,k}\operatorname{tr}\left(\mathsf{\Psi}\mathsf{\Psi}\right)+\left({{\mathsf{M}}^{\top}}\mathsf{\Psi}\mathsf{M}\right)_{i,l}{\mathsf{\Xi}}_{j,k}+\left({{\mathsf{M}}^{\top}}\mathsf{\Psi}\mathsf{M}\right)_{j,k}{\mathsf{\Xi}}_{i,l}.
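
A single entry of the variance expression can be spot checked the same way. In the sketch below (again with arbitrary dimensions, indices, seed, and replication count), the empirical covariance of two entries of ${{\mathsf{\tilde{{X}}}}^{\top}}\mathsf{\tilde{{X}}}$ is compared against the formula above; the two should agree up to Monte Carlo error.

\MakeFramed
# illustrative Monte Carlo check of one entry of the variance identity,
# drawing vec(X) ~ N(vec(M), Xi %x% Psi).
# the indices (i,j,k,l), dimensions, and seed are arbitrary choices.
set.seed(5678)
n <- 4; p <- 3; nsim <- 50000
M   <- matrix(rnorm(n*p),n,p)
Psi <- crossprod(matrix(rnorm(n*n),n,n)) + diag(n)
Xi  <- crossprod(matrix(rnorm(p*p),p,p)) + diag(p)
Rt  <- t(chol(kronecker(Xi,Psi)))
i <- 1; j <- 2; k <- 3; l <- 2
vals <- replicate(nsim, {
  X <- M + matrix(Rt %*% rnorm(n*p),n,p)
  G <- crossprod(X)
  c(G[i,j],G[k,l])
})
emp  <- cov(vals[1,],vals[2,])
MPM  <- t(M) %*% Psi %*% M
trPP <- sum(diag(Psi %*% Psi))
thy  <- Xi[j,l]*Xi[i,k]*trPP + MPM[j,l]*Xi[i,k] + MPM[i,k]*Xi[j,l] +
        Xi[i,l]*Xi[j,k]*trPP + MPM[i,l]*Xi[j,k] + MPM[j,k]*Xi[i,l]
c(empirical=emp, theoretical=thy)   # agree up to Monte Carlo error
\endMakeFramed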

Appendix C Confirming Theorem 3.11

Here we ‘confirm’ Theorem 3.11, by which we mean we generate random population values of 𝝁\boldsymbol{\mu} and Σ\mathsf{\Sigma}, then compute Ω0{\mathsf{\Omega}}_{0} from Theorem 3.6, and use it to compute the values of E[SNR(Θ^1;Θ,0)ζ^]\operatorname{E}\left[\operatorname{SNR}\left({{\mathsf{\hat{\Theta}}}^{-1}};\mathsf{\Theta},0\right)-{\hat{\zeta}}_{*}\right] and Var(SNR(Θ^1;Θ,0)ζ^)\operatorname{Var}\left(\operatorname{SNR}\left({{\mathsf{\hat{\Theta}}}^{-1}};\mathsf{\Theta},0\right)-{\hat{\zeta}}_{*}\right) implied by Corollary 2.13. We then compare these to the forms given in Theorem 3.11, finding that they agree to machine precision. Note that this does not confirm Theorem 3.6 or Corollary 2.13; rather, it gives some assurance that Theorem 3.11 is consistent with those two results.

\MakeFramed
library(matrixcalc)

# linear algebra utilities
# quadratic form x’ A x
qform <- function(A,x) { t(x) %*% (A %*% x) }
# quadratic form x A x’
qoform <- function(A,x) { qform(A,t(x)) }
# outer gram: x x’
ogram <- function(x) { x %*% t(x) }
# A kron A
AkronA <- function(A) { kronecker(A,A,FUN="*") }
# matrix trace
matrace <- function(A) { matrixcalc::matrix.trace(A) }

# duplication, elimination, commutation, symmetrizer
# commutation matrix;
# is p^2 x p^2
Comm <- function(p) {
  Ko <- diag(p^2)
  dummy <- diag(p)
  newidx <- (row(dummy) - 1) * ncol(dummy) + col(dummy)
  Ko[newidx,,drop=FALSE]
}
# Symmetrizing matrix, N
# is p^2 x p^2
Symm <- function(p) { 0.5 * (Comm(p) + diag(p^2)) }
# Duplication & Elimination matrices
Dupp <- function(p) { matrixcalc::duplication.matrix(n=p) }
Elim <- function(p) { matrixcalc::elimination.matrix(n=p) }
# vector function
fvec <- function(x) {
  dim(x) <- c(length(x),1)
  x
}

# compute Theta from mu, Sigma
make_Theta <- function(mu,Sigma) {
  stopifnot(nrow(Sigma) == ncol(Sigma),
            nrow(Sigma) == length(mu))
  mu_twid <- c(1,mu)
  Sg_twid <- cbind(0,rbind(0,Sigma))
  Theta   <- Sg_twid + ogram(mu_twid)
  Theta_i <- solve(Theta)
  zeta_sq <- Theta_i[1,1] - 1
  list(pp1=nrow(Theta),
       p=nrow(Theta)-1,
       mu_twid=mu_twid,
       Sg_twid=Sg_twid,
       Theta=Theta,
       Theta_i=Theta_i,
       zeta_sq=zeta_sq,
       zeta=sqrt(zeta_sq))
}

# construct four parts of Omega_0 from the Theorem;
Omega_bits <- function(mu,Sigma,kurtf=1) {
  tvals <- make_Theta(mu,Sigma)

  Nmat <- Symm(tvals$pp1)
  P1 <- (kurtf-1) * ogram(fvec(tvals$Sg_twid))
  P2 <- (kurtf-1) * 2 * Nmat %*% AkronA(tvals$Sg_twid)
  P3 <- 2 * Nmat  %*% AkronA(tvals$Theta)
  P4 <- 2 * Nmat  %*% AkronA(ogram(tvals$mu_twid))
  list(P1=P1,P2=P2,P3=P3,P4=P4)
}
Omega_0 <- function(mu,Sigma,kurtf=1) {
  obits <- Omega_bits(mu=mu,Sigma=Sigma,kurtf=kurtf)
  obits$P1 + obits$P2 + obits$P3 - obits$P4
}
# construct matrices F and H and vector h 
FHh_values <- function(mu,Sigma,R=1) {
  tvals <- make_Theta(mu,Sigma)
  zeta_sq <- tvals$zeta_sq
  zeta <- sqrt(zeta_sq)
  mp <- t(t(-tvals$Theta_i[2:tvals$pp1,1]))

  Fmat <- (1 / R^2) * ((ogram(mu) / zeta) - (zeta * Sigma))
  H1 <- - cbind(cbind((1 / (2 *zeta_sq)) * mp,
                      (R / zeta) * diag(tvals$p)),
                matrix(0,nrow=tvals$p,ncol=tvals$p * tvals$pp1))
  H2 <- - AkronA(tvals$Theta_i)
  Hmat <- H1 %*% H2 %*% Dupp(tvals$pp1)

  hvec <- - AkronA(tvals$Theta_i[1,,drop=FALSE]) %*% Dupp(tvals$pp1)
  list(Fmat=Fmat, Hmat=Hmat, hvec=hvec)
}

# compute the expected bias and variance
# in two ways, one directly from the identity of
# H, F, h and Omega_0
# the other from the theorem
# 
# then compute relative errors of the theorem's approximation.
testit <- function(mu,Sigma,kurtf=1,R=1) {
  p <- length(mu)
  tvals <- make_Theta(mu,Sigma)

  OmegMat <- Omega_0(mu,Sigma,kurtf=kurtf)
  Omega <- qoform(A=OmegMat,Elim(p+1))
  FHh <- FHh_values(mu,Sigma,R=R)

  # now test corollary ’true’ values
  HtFH <- qform(A=FHh$Fmat,x=FHh$