Indirect multivariate response linear regression
Abstract
We propose a new class of estimators of the multivariate response linear regression coefficient matrix that exploits the assumption that the response and predictors have a joint multivariate Normal distribution. This allows us to indirectly estimate the regression coefficient matrix through shrinkage estimation of the parameters of the inverse regression, or the conditional distribution of the predictors given the responses. We establish a convergence rate bound for estimators in our class and we study two examples. The first example estimator exploits an assumption that the inverse regression’s coefficient matrix is sparse. The second example estimator exploits an assumption that the inverse regression’s coefficient matrix is rank deficient. These estimators do not require the popular assumption that the forward regression coefficient matrix is sparse or has small Frobenius norm. Using simulation studies, we show that our example estimators outperform relevant competitors for some data generating models.
1 Introduction
Some statistical applications require the modeling of a multivariate response. Let $y_i \in \mathbb{R}^q$ be the measurement of the $q$-variate response for the $i$th subject and let $x_i \in \mathbb{R}^p$ be the nonrandom values of the predictors for the $i$th subject ($i = 1, \ldots, n$). The multivariate response linear regression model assumes that $y_i$ is a realization of the random vector
$$
Y_i = \mu_* + \beta_*^{\top} x_i + \epsilon_i, \qquad i = 1, \ldots, n, \tag{1}
$$
where $\mu_* \in \mathbb{R}^q$ is the unknown intercept, $\beta_*$ is the unknown $p$ by $q$ regression coefficient matrix, and $\epsilon_1, \ldots, \epsilon_n$ are independent copies of a mean zero random vector with covariance matrix $\Sigma_{*E}$.
The ordinary least squares estimator of $\beta_*$ is
$$
\hat\beta_{\mathrm{OLS}} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p \times q}} \|\mathbb{Y} - \mathbb{X}\beta\|_F^2, \tag{2}
$$
where $\|\cdot\|_F$ is the Frobenius norm, $\mathbb{R}^{p \times q}$ is the set of real valued $p$ by $q$ matrices, $\mathbb{Y}$ is the $n$ by $q$ matrix with $i$th row $(y_i - \bar y)^{\top}$, and $\mathbb{X}$ is the $n$ by $p$ matrix with $i$th row $(x_i - \bar x)^{\top}$ ($i = 1, \ldots, n$). It is well known that $\hat\beta_{\mathrm{OLS}}$ is the maximum likelihood estimator of $\beta_*$ when $\epsilon_1, \ldots, \epsilon_n$ are independent and identically distributed $N_q(0, \Sigma_{*E})$ and the corresponding maximum likelihood estimator of $\Sigma_{*E}$ exists.
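As a point of reference, here is a minimal numerical sketch of (2) in Python (our own illustration, with arbitrary dimensions and simulated data); the intercept is handled by column-centering $\mathbb{X}$ and $\mathbb{Y}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 6, 3

# Simulated data: X is n x p, Y is n x q.
X = rng.normal(size=(n, p))
beta_star = rng.normal(size=(p, q))
Y = X @ beta_star + 0.5 * rng.normal(size=(n, q))

# Column-center both matrices to absorb the intercept.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# Ordinary least squares estimator of the p x q coefficient matrix, as in (2).
beta_ols, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
print(beta_ols.shape)  # (6, 3)
```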
Many shrinkage estimators of $\beta_*$ have been proposed by penalizing the optimization in (2). Some of these estimators simultaneously estimate $\beta_*$ and remove irrelevant predictors (Turlach et al., 2005; Obozinski et al., 2010; Peng et al., 2010). Others encourage an estimator of reduced rank (Yuan et al., 2007; Chen and Huang, 2012).
Under the restriction that $\epsilon_1, \ldots, \epsilon_n$ are independent and identically distributed $N_q(0, \Sigma_{*E})$, shrinkage estimators of $\beta_*$ that penalize or constrain the minimization of the negative loglikelihood have been proposed. These methods simultaneously estimate $\beta_*$ and $\Sigma_{*E}$. Examples include maximum likelihood reduced rank regression (Izenman, 1975; Reinsel and Velu, 1998), envelope models (Cook et al., 2010; Su and Cook, 2011, 2012, 2013), and multivariate regression with covariance estimation (Rothman et al., 2010; Lee and Liu, 2012; Bhadra and Mallick, 2013).
To fit (1) with these shrinkage estimators, one exploits explicit assumptions about $\beta_*$, but these may be unreasonable in some applications. As an alternative, we propose an indirect method to fit (1) without making explicit assumptions about $\beta_*$. We exploit the assumption that the response and predictors have a joint multivariate Normal distribution, and we employ shrinkage estimators of the parameters of the conditional distribution of the predictors given the responses. Our method provides an alternative indirect estimator of $\beta_*$, which may be suitable when the existing shrinkage estimators are inadequate.
2 A new class of indirect estimators of $\beta_*$
2.1 Class definition
We assume that the measured predictor and response pairs $(x_1, y_1), \ldots, (x_n, y_n)$ are a realization of $n$ independent copies of $(X^{\top}, Y^{\top})^{\top}$, where $(X^{\top}, Y^{\top})^{\top} \sim N_{p+q}(\mu_*, \Sigma_*)$. We also assume that $\Sigma_*$ is positive definite. Define the marginal parameters through the following partitions:
$$
\mu_* = \begin{pmatrix} \mu_{*X} \\ \mu_{*Y} \end{pmatrix}, \qquad
\Sigma_* = \begin{pmatrix} \Sigma_{*XX} & \Sigma_{*XY} \\ \Sigma_{*YX} & \Sigma_{*YY} \end{pmatrix}.
$$
Our goal is to estimate the multivariate regression coefficient matrix $\beta_* = \Sigma_{*XX}^{-1}\Sigma_{*XY}$ in the forward regression model
$$
Y \mid X = x \;\sim\; N_q\!\left\{\mu_{*Y} + \beta_*^{\top}(x - \mu_{*X}),\; \Sigma_{*E}\right\},
$$
without assuming that $\beta_*$ is sparse or that $\|\beta_*\|_F$ is small. To do this we will estimate the inverse regression's coefficient matrix $\eta_* = \Sigma_{*YY}^{-1}\Sigma_{*YX}$ and the inverse regression's error precision matrix $\Delta_*^{-1}$ in the inverse regression model
$$
X \mid Y = y \;\sim\; N_p\!\left\{\mu_{*X} + \eta_*^{\top}(y - \mu_{*Y}),\; \Delta_*\right\}.
$$
We connect the parameters of the inverse regression model to $\beta_*$ with the following proposition.
Proposition 1.
If $\Sigma_*$ is positive definite, then
$$
\beta_* = \Delta_*^{-1}\eta_*^{\top}\left(\Sigma_{*YY}^{-1} + \eta_*\Delta_*^{-1}\eta_*^{\top}\right)^{-1}. \tag{3}
$$
We prove Proposition 1 in Appendix A.1. This result leads us to propose a class of estimators of $\beta_*$ defined by
$$
\hat\beta = \hat\Delta^{-1}\hat\eta^{\top}\left(\hat\Sigma_{YY}^{-1} + \hat\eta\hat\Delta^{-1}\hat\eta^{\top}\right)^{-1}, \tag{4}
$$
where $\hat\eta$, $\hat\Delta^{-1}$, and $\hat\Sigma_{YY}^{-1}$ are user-selected estimators of $\eta_*$, $\Delta_*^{-1}$, and $\Sigma_{*YY}^{-1}$. If the relevant sample covariance matrices are nonsingular and the ordinary sample estimators are used for $\eta_*$, $\Delta_*^{-1}$, and $\Sigma_{*YY}^{-1}$, then $\hat\beta$ is equivalent to $\hat\beta_{\mathrm{OLS}}$.
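The equivalence with ordinary least squares can be checked numerically; the sketch below (ours, with arbitrary simulated data) plugs the ordinary sample estimators of $\eta_*$, $\Delta_*^{-1}$, and $\Sigma_{*YY}^{-1}$ into (4) and compares the result with the least squares fit:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 200, 5, 3

# Jointly Normal draws; any positive definite joint covariance works here.
Z = rng.multivariate_normal(np.zeros(p + q), np.eye(p + q) + 0.3, size=n)
X, Y = Z[:, :p], Z[:, p:]
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)

# Ordinary sample plug-ins: eta_hat (q x p), Delta_inv_hat (p x p), Syy_inv_hat (q x q).
eta_hat = np.linalg.solve(Yc.T @ Yc, Yc.T @ Xc)
resid = Xc - Yc @ eta_hat
Delta_inv_hat = np.linalg.inv(resid.T @ resid / n)
Syy_inv_hat = np.linalg.inv(Yc.T @ Yc / n)

# Indirect estimator (4).
beta_indirect = Delta_inv_hat @ eta_hat.T @ np.linalg.inv(
    Syy_inv_hat + eta_hat @ Delta_inv_hat @ eta_hat.T)

# Forward ordinary least squares estimator (2).
beta_ols, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)

print(np.allclose(beta_indirect, beta_ols))  # True, up to numerical error
```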
We propose to use shrinkage estimators of $\eta_*$, $\Delta_*^{-1}$, and $\Sigma_{*YY}^{-1}$ in (4). This gives us the potential to indirectly fit an unparsimonious forward regression model by fitting a parsimonious inverse regression model. For example, suppose that $\eta_*$ and $\Delta_*^{-1}$ are sparse, but $\beta_*$ is dense. To fit the inverse regression model, we could use any of the forward regression shrinkage estimators discussed in Section 1.
2.2 Related work
Lee and Liu (2012) proposed an estimator of $\beta_*$ that also exploits the assumption that $(X^{\top}, Y^{\top})^{\top}$ is multivariate Normal; however, unlike our approach, which makes no explicit assumptions about $\beta_*$, their approach assumes that both $\beta_*$ and the forward regression's error precision matrix $\Sigma_{*E}^{-1}$ are sparse.
Modeling the inverse regression is a well-known idea in multivariate analysis. For example, when the response is categorical, quadratic discriminant analysis models the conditional distribution of the predictors given the response as $p$-variate Normal. There are also many examples of modeling the inverse regression in the sufficient dimension reduction literature (Adragni and Cook, 2009).
The most closely related work to ours is that by Cook et al. (2013). They proposed indirect estimators of $\beta_*$ based on modeling the inverse regression in the special case when the response is univariate, i.e. $q = 1$. Under the same multivariate Normal assumption that we make, Cook et al. (2013) showed that
$$
\beta_* = \frac{\Delta_*^{-1}\eta_*^{\top}}{\sigma_{*Y}^{-2} + \eta_*\Delta_*^{-1}\eta_*^{\top}}, \tag{5}
$$
where $\sigma_{*Y}^{2}$ is the variance of the response; this is the $q = 1$ special case of (3). They proposed estimators of $\beta_*$ by replacing $\sigma_{*Y}^{2}$ and $\eta_*$ on the right hand side of (5) with their usual sample estimators, and by replacing $\Delta_*^{-1}$ with a shrinkage estimator. This class of estimators was designed to exploit an abundant signal rate in the forward univariate response regression when $p$ is large relative to $n$.
3 Asymptotic Analysis
We present a convergence rate bound for the indirect estimator of $\beta_*$ defined by (4). Our bound allows $p$ and $q$ to grow with the sample size $n$. In the following proposition, $\|\cdot\|_2$ is the spectral norm and $\varphi_{\min}(\cdot)$ is the minimum eigenvalue.
Proposition 2.
Suppose that the following conditions are true: (i) $\Sigma_*$ is positive definite for all $n$; (ii) the estimator $\hat\Sigma_{YY}^{-1}$ is positive definite for all $n$; (iii) the estimator $\hat\Delta^{-1}$ is positive definite for all $n$; (iv) there exists a positive constant $k$ such that $\varphi_{\min}(\Sigma_{*YY}^{-1}) \geq k$ for all $n$; and (v) there exist sequences $a_n$ and $b_n$ such that $\|\hat\eta - \eta_*\|_2 = O_P(a_n)$, $\|\hat\Delta^{-1} - \Delta_*^{-1}\|_2 = O_P(b_n)$, $\|\hat\Sigma_{YY}^{-1} - \Sigma_{*YY}^{-1}\|_2 = O_P(b_n)$, and $a_n + b_n \to 0$ as $n \to \infty$. Then
$$
\|\hat\beta - \beta_*\|_2 = O_P\!\left\{(1 + \|\eta_*\|_2)^3\,(1 + \|\Delta_*^{-1}\|_2)^2\,(a_n + b_n)\right\}.
$$
We prove Proposition 2 in Appendix A.1. We used the spectral norm because it is compatible with the convergence rate bounds established for sparse inverse covariance estimators (Rothman et al., 2008; Lam and Fan, 2009; Ravikumar et al., 2011).
If the inverse regression is parsimonious in the sense that $\|\eta_*\|_2$ and $\|\Delta_*^{-1}\|_2$ are bounded, then the bound in Proposition 2 simplifies to $O_P(a_n + b_n)$. From an asymptotic perspective, it is not surprising that the indirect estimator of $\beta_*$ is only as good as its worst plug-in estimator. We explore finite sample performance in Section 5.
4 Example estimators in our class
4.1 Sparse inverse regression
We now describe an estimator of the forward regression coefficient matrix $\beta_*$ defined by (4) that exploits zeros in the inverse regression's coefficient matrix $\eta_*$, zeros in the inverse regression's error precision matrix $\Delta_*^{-1}$, and zeros in the precision matrix of the responses $\Sigma_{*YY}^{-1}$. We estimate $\eta_*$ with
$$
\hat\eta = \operatorname*{arg\,min}_{\eta \in \mathbb{R}^{q \times p}} \sum_{j=1}^{p}\left( \|\mathbb{X}_{\cdot j} - \mathbb{Y}\eta_{\cdot j}\|_2^2 + \lambda_j \|\eta_{\cdot j}\|_1 \right), \tag{6}
$$
where $\mathbb{X}_{\cdot j}$ and $\eta_{\cdot j}$ denote the $j$th columns of $\mathbb{X}$ and $\eta$, which separates into $p$ $\ell_1$-penalized least-squares regressions (Tibshirani, 1996): the first predictor regressed on the response through the $p$th predictor regressed on the response. We select $\lambda_j$ with 5-fold cross-validation, minimizing squared prediction error totaled over the folds, in the regression of the $j$th predictor on the response ($j = 1, \ldots, p$). This allows us to estimate the columns of $\hat\eta$ in parallel.
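A sketch of this column-by-column fit, using scikit-learn's LassoCV as the $\ell_1$-penalized solver (our choice; any lasso implementation with cross-validation would do, and scikit-learn parameterizes the penalty with an extra factor of $1/(2n)$):

```python
import numpy as np
from sklearn.linear_model import LassoCV

def fit_eta(X, Y, n_folds=5):
    """Column-wise lasso estimate of the q x p inverse regression coefficient
    matrix: the j-th predictor is regressed on the (centered) responses."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    p, q = Xc.shape[1], Yc.shape[1]
    eta_hat = np.zeros((q, p))
    for j in range(p):  # independent problems; can be run in parallel
        fit = LassoCV(cv=n_folds).fit(Yc, Xc[:, j])
        eta_hat[:, j] = fit.coef_
    return eta_hat
```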
We estimate $\Delta_*^{-1}$ and $\Sigma_{*YY}^{-1}$ with $\ell_1$-penalized Normal likelihood precision matrix estimation (Yuan and Lin, 2007; Banerjee et al., 2008). Let $\hat\Omega(\gamma, S)$ be a generic version of this estimator with tuning parameter $\gamma$ and input $r$ by $r$ sample covariance matrix $S$:
$$
\hat\Omega(\gamma, S) = \operatorname*{arg\,min}_{\Omega \in \mathbb{S}^{r}_{+}} \left\{ \operatorname{tr}(S\Omega) - \log\det(\Omega) + \gamma \sum_{j \neq m} |\Omega_{jm}| \right\}, \tag{7}
$$
where $\mathbb{S}^{r}_{+}$ is the set of symmetric and positive definite $r$ by $r$ matrices. There are many algorithms that solve (7). Two good choices are the graphical lasso algorithm (Yuan, 2008; Friedman et al., 2008) and the QUIC algorithm (Hsieh et al., 2011). We select $\gamma$ with 5-fold cross-validation, maximizing a validation likelihood criterion (Huang et al., 2006):
$$
\hat\gamma = \operatorname*{arg\,max}_{\gamma \in \mathcal{G}} \sum_{k=1}^{5} \left[ \log\det\{\hat\Omega(\gamma, S_{-k})\} - \operatorname{tr}\{S_{k}\hat\Omega(\gamma, S_{-k})\} \right], \tag{8}
$$
where $\mathcal{G}$ is a user-selected finite subset of the non-negative real line, $S_{-k}$ is the sample covariance matrix from the observations outside the $k$th fold, and $S_{k}$ is the sample covariance matrix from the observations in the $k$th fold centered by the sample mean of the observations outside the $k$th fold. We estimate $\Sigma_{*YY}^{-1}$ using (7) with its tuning parameter selected by (8) and $S$ set to the sample covariance matrix of the responses. Similarly, we estimate $\Delta_*^{-1}$ using (7) with its tuning parameter selected by (8) and $S$ set to the sample covariance matrix of the inverse regression residuals, $n^{-1}(\mathbb{X} - \mathbb{Y}\hat\eta)^{\top}(\mathbb{X} - \mathbb{Y}\hat\eta)$.
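A sketch of (7) and (8), using scikit-learn's graphical_lasso solver (our choice of solver; the glasso or QUIC implementations cited above would serve equally well). The helper name and fold handling are ours:

```python
import numpy as np
from sklearn.covariance import graphical_lasso
from sklearn.model_selection import KFold

def precision_cv(Z, gammas, n_folds=5):
    """l1-penalized Normal likelihood precision matrix estimate (7), with the
    tuning parameter chosen by the validation likelihood criterion (8).
    gammas: candidate positive tuning parameters."""
    scores = np.zeros(len(gammas))
    for train, test in KFold(n_splits=n_folds, shuffle=True, random_state=0).split(Z):
        mean_train = Z[train].mean(axis=0)
        S_train = np.cov(Z[train], rowvar=False, bias=True)
        Zt = Z[test] - mean_train                      # center fold by the training mean
        S_test = Zt.T @ Zt / Zt.shape[0]
        for i, g in enumerate(gammas):
            _, omega = graphical_lasso(S_train, alpha=g)
            sign, logdet = np.linalg.slogdet(omega)
            scores[i] += logdet - np.trace(S_test @ omega)
    best = gammas[int(np.argmax(scores))]
    _, omega_hat = graphical_lasso(np.cov(Z, rowvar=False, bias=True), alpha=best)
    return omega_hat, best
```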
4.2 Reduced rank inverse regression
We propose indirect estimators of $\beta_*$ that exploit the assumption that the inverse regression's coefficient matrix $\eta_*$ is rank deficient. We have the following simple proposition that links rank deficiency in $\eta_*$ and its estimator to $\beta_*$ and its indirect estimator.
Proposition 3.
If $\Sigma_*$ is positive definite, then $\operatorname{rank}(\beta_*) = \operatorname{rank}(\eta_*)$. In addition, if $\hat\Delta^{-1}$ and $\hat\Sigma_{YY}^{-1}$ are positive definite in the indirect estimator $\hat\beta$ defined by (4), then $\operatorname{rank}(\hat\beta) = \operatorname{rank}(\hat\eta)$.
The proof of this proposition is simple, so we exclude it to save space.
We propose the following two example reduced rank indirect estimators of $\beta_*$, both built from a rank-$r$ estimator of the inverse regression's coefficient matrix,
$$
\hat\eta_r = \operatorname*{arg\,min}_{\eta \in \mathbb{R}^{q \times p}:\ \operatorname{rank}(\eta) \leq r} \operatorname{tr}\!\left\{(\mathbb{X} - \mathbb{Y}\eta)\,\tilde\Delta^{-1}(\mathbb{X} - \mathbb{Y}\eta)^{\top}\right\}, \tag{9}
$$
where $\tilde\Delta$ is the sample covariance matrix of the residuals from the unrestricted least squares fit of the inverse regression, so that (9) is the Normal maximum likelihood reduced rank regression estimator (Izenman, 1975; Reinsel and Velu, 1998).
- 1. A likelihood-based estimator: plug $\hat\eta_r$ into (4) together with likelihood-based (unpenalized sample) estimators of $\Delta_*^{-1}$ and $\Sigma_{*YY}^{-1}$.
- 2. A sparse variant: plug $\hat\eta_r$ into (4) together with the $\ell_1$-penalized estimators of $\Delta_*^{-1}$ and $\Sigma_{*YY}^{-1}$ described in Section 4.1.
Both example indirect reduced rank estimators of $\beta_*$ are formed by plugging the estimators of $\eta_*$, $\Delta_*^{-1}$, and $\Sigma_{*YY}^{-1}$ into (4). The first estimator is likelihood-based and the second estimator exploits sparsity in $\Delta_*^{-1}$ and $\Sigma_{*YY}^{-1}$. Neither estimator is defined when $p \geq n$. In this case, which we do not address, a regularized reduced rank estimator of $\eta_*$ could be used instead of the estimator defined in (9), e.g. the factor estimation and selection estimator (Yuan et al., 2007) or the reduced rank ridge regression estimator (Mukherjee and Zhu, 2011).
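A sketch of the reduced rank fit in (9), using numpy and scipy (the whitening trick is the standard computation: minimize the weighted criterion by rank-truncating the fit in the whitened coordinates; this requires $n > p$):

```python
import numpy as np
from scipy.linalg import sqrtm

def reduced_rank_eta(X, Y, r):
    """Rank-r estimate of the q x p inverse regression coefficient matrix
    as in (9): whiten with the unrestricted residual covariance, truncate
    the SVD of the fitted values, and map back. Requires n > p."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = Xc.shape[0]
    eta_ols, *_ = np.linalg.lstsq(Yc, Xc, rcond=None)     # unrestricted fit, q x p
    resid = Xc - Yc @ eta_ols
    delta_tilde = resid.T @ resid / n                      # p x p residual covariance
    w_half = np.real(sqrtm(np.linalg.inv(delta_tilde)))    # whitening matrix
    eta_w, *_ = np.linalg.lstsq(Yc, Xc @ w_half, rcond=None)
    _, _, Vt = np.linalg.svd(Yc @ eta_w, full_matrices=False)
    proj = Vt[:r].T @ Vt[:r]                               # top-r right singular directions
    return eta_w @ proj @ np.linalg.inv(w_half)            # rank <= r
```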
5 Simulations
5.1 Sparse inverse regression simulation
We compared the following indirect estimators of $\beta_*$ when the inverse regression's coefficient matrix $\eta_*$ is sparse:
- The indirect estimator proposed in Section 4.1.
- Part-oracle and oracle versions of (4), in which one or more of the plug-in estimators $\hat\eta$, $\hat\Delta^{-1}$, and $\hat\Sigma_{YY}^{-1}$ are replaced by the corresponding true values $\eta_*$, $\Delta_*^{-1}$, and $\Sigma_{*YY}^{-1}$, with the remaining parameters estimated as in Section 4.1.
We also included the following forward regression estimators of $\beta_*$:
- OLS/MP. This is the ordinary least squares estimator defined by (2). When $\mathbb{X}^{\top}\mathbb{X}$ is singular, we use the solution $(\mathbb{X}^{\top}\mathbb{X})^{+}\mathbb{X}^{\top}\mathbb{Y}$, where $(\mathbb{X}^{\top}\mathbb{X})^{+}$ is the Moore-Penrose generalized inverse of $\mathbb{X}^{\top}\mathbb{X}$.
- R. This is the ridge penalized least squares estimator defined by
$$
\hat\beta_{\mathrm{R}} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p \times q}} \left( \|\mathbb{Y} - \mathbb{X}\beta\|_F^2 + \lambda \|\beta\|_F^2 \right).
$$
- SR. This is an alternative ridge penalized least squares estimator defined by
$$
\hat\beta_{\mathrm{SR}} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p \times q}} \left( \|\mathbb{Y} - \mathbb{X}\beta\|_F^2 + \sum_{m=1}^{q} \lambda_m \|\beta_{\cdot m}\|_2^2 \right),
$$
where $\beta_{\cdot m}$ is the $m$th column of $\beta$ and a separate tuning parameter $\lambda_m$ is used for each response; a sketch of both ridge estimators follows this list.
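A minimal sketch of the two ridge competitors R and SR, using closed-form ridge solves on the centered data matrices (variable names are ours):

```python
import numpy as np

def ridge(Xc, Yc, lam):
    """Ridge penalized least squares with a single tuning parameter (estimator R)."""
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ Yc)

def ridge_per_response(Xc, Yc, lams):
    """Ridge with a separate tuning parameter for each response column (estimator SR)."""
    return np.column_stack([ridge(Xc, Yc[:, [m]], lam) for m, lam in enumerate(lams)])
```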
We selected the tuning parameters for uses of (6) with 5-fold cross-validation, minimizing validation prediction error on the inverse regression. Tuning parameters for R and SR were selected with 5-fold cross-validation, minimizing validation prediction error on the forward regression. We selected tuning parameters for uses of (7) with (8). The candidate set of tuning parameters was a fixed finite grid of non-negative values.
For 50 independent replications, we generated a realization of $n$ independent copies of $(X^{\top}, Y^{\top})^{\top} \sim N_{p+q}(0, \Sigma_*)$, with $n > p$ in the settings reported in Table 1 and $p > n$ in the settings reported in Table 2. The $(i,j)$th entry of $\Sigma_{*YY}$ was set to $\rho_{Y}^{|i-j|}$ and the $(i,j)$th entry of $\Delta_*$ was set to $\rho_{\Delta}^{|i-j|}$. We set $\eta_* = A \circ B$, where $\circ$ denotes the element-wise product: $A$ had entries independently drawn from a mean zero Normal distribution and $B$ had entries independently drawn from the Bernoulli distribution with nonzero probability $s$. This model is ideal for the proposed indirect estimator because $\eta_*$ and $\Delta_*^{-1}$ are both sparse. Every entry in the corresponding randomly generated $\beta_*$ is nonzero with high probability, but the magnitudes of these entries are small. This motivated us to compare our indirect estimators of $\beta_*$ to the ridge-penalized least squares forward regression estimators R and SR.
We evaluated performance with model error (Breiman and Friedman, 1997; Yuan et al., 2007), which is defined by $\operatorname{ME}(\hat\beta, \beta_*) = \operatorname{tr}\{(\hat\beta - \beta_*)^{\top}\Sigma_{*XX}(\hat\beta - \beta_*)\}$.
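A direct transcription of the model error metric (names are ours):

```python
import numpy as np

def model_error(beta_hat, beta_star, sigma_xx):
    """Model error: tr{(beta_hat - beta_star)' Sigma_XX (beta_hat - beta_star)}."""
    diff = beta_hat - beta_star
    return float(np.trace(diff.T @ sigma_xx @ diff))
```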
OLS | R | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
0.7 | 0.0 | 0.1 | 0.61 | 0.32 | 0.53 | 0.40 | 1.35 | 2.10 | 1.23 | 1.22 |
0.7 | 0.5 | 0.1 | 0.72 | 0.39 | 0.59 | 0.51 | 1.30 | 1.91 | 1.29 | 1.30 |
0.7 | 0.7 | 0.1 | 0.76 | 0.45 | 0.65 | 0.56 | 1.27 | 1.73 | 1.27 | 1.29 |
0.7 | 0.9 | 0.1 | 0.83 | 0.66 | 0.85 | 0.64 | 1.26 | 1.35 | 1.05 | 1.09 |
0.0 | 0.9 | 0.1 | 0.81 | 0.87 | 0.87 | 0.79 | 2.04 | 2.34 | 1.26 | 1.87 |
0.5 | 0.9 | 0.1 | 0.96 | 0.76 | 0.99 | 0.74 | 1.63 | 1.84 | 1.36 | 1.49 |
0.9 | 0.9 | 0.1 | 0.46 | 0.39 | 0.47 | 0.36 | 0.63 | 0.62 | 0.48 | 0.48 |
0.7 | 0.9 | 0.3 | 0.60 | 0.53 | 0.65 | 0.46 | 0.83 | 0.67 | 0.64 | 0.63 |
0.7 | 0.9 | 0.5 | 0.48 | 0.37 | 0.48 | 0.37 | 0.65 | 0.53 | 0.52 | 0.51 |
0.7 | 0.9 | 0.7 | 0.42 | 0.29 | 0.39 | 0.31 | 0.55 | 0.46 | 0.45 | 0.44 |
We report the average model errors, based on these 50 replications, in Table 1. When $n > p$, the indirect estimators defined by (4) performed well for all choices of the data generating parameters. Our proposed estimator was competitive with the other indirect estimators also defined by (4), even those that used some oracle information. As the Bernoulli probability $s$ increased with the other data generating parameters fixed, the forward regression estimators performed nearly as well as the proposed indirect estimator.
MP | R | ||||||||
---|---|---|---|---|---|---|---|---|---|
0.7 | 0.0 | 0.1 | 8.59 | 4.28 | 5.70 | 7.40 | 78.33 | 13.85 | 12.44 |
0.7 | 0.5 | 0.1 | 9.67 | 5.09 | 6.37 | 8.49 | 73.82 | 14.79 | 13.34 |
0.7 | 0.7 | 0.1 | 10.01 | 6.37 | 7.44 | 8.75 | 70.30 | 15.56 | 14.40 |
0.7 | 0.9 | 0.1 | 9.92 | 10.07 | 11.44 | 8.88 | 61.83 | 16.43 | 15.94 |
0.0 | 0.9 | 0.1 | 15.17 | 17.09 | 16.93 | 15.23 | 119.60 | 28.63 | 29.41 |
0.5 | 0.9 | 0.1 | 14.88 | 13.59 | 16.91 | 12.01 | 86.88 | 23.62 | 22.69 |
0.9 | 0.9 | 0.1 | 4.71 | 4.78 | 5.94 | 3.99 | 25.37 | 6.36 | 5.91 |
0.7 | 0.9 | 0.3 | 16.86 | 17.43 | 19.66 | 15.44 | 43.88 | 15.30 | 14.14 |
0.7 | 0.9 | 0.5 | 26.89 | 26.81 | 29.93 | 24.95 | 36.87 | 14.79 | 13.62 |
0.7 | 0.9 | 0.7 | 31.86 | 35.98 | 38.64 | 30.36 | 33.58 | 14.35 | 13.65 |
Similarly, Table 2 shows that when $p > n$ and $\eta_*$ is sparse, the proposed indirect estimator outperforms all three forward regression estimators. However, unlike in the lower dimensional setting illustrated in Table 1, when $\eta_*$ is not sparse the proposed estimator is outperformed by the forward regression approaches. One of the part-oracle methods outperformed the other two part-oracle indirect estimators when $\eta_*$ was sparse. Also, when $\eta_*$ was very sparse, the proposed estimator was competitive with the part-oracle estimators. Taken together, the results in Tables 1 and 2 suggest that when $\eta_*$ is very sparse, our proposed indirect estimator may perform nearly as well as the part-oracle indirect estimators and the forward regression estimators.
5.2 Reduced rank inverse regression simulation
We compared the performance of the following indirect reduced rank estimators of $\beta_*$:
- The likelihood-based indirect example estimator 1 proposed in Section 4.2.
- The sparse indirect example estimator 2 proposed in Section 4.2.
- Part-oracle and oracle versions of (4), in which some of the plug-in estimators are replaced by the corresponding true values of $\eta_*$, $\Delta_*^{-1}$, and $\Sigma_{*YY}^{-1}$.
We compared these indirect estimators to the following forward reduced rank regression estimator:
- RR.
We selected the rank parameter $r$ for uses of (9) with 5-fold cross-validation, minimizing validation prediction error on the inverse regression. The rank parameter for RR was selected with 5-fold cross-validation, minimizing validation prediction error on the forward regression. We selected tuning parameters for uses of (7) with (8). The candidate set of tuning parameters was a fixed finite grid of non-negative values.
For 50 independent replications, we generated a realization of $n$ independent copies of $(X^{\top}, Y^{\top})^{\top} \sim N_{p+q}(0, \Sigma_*)$. The $(i,j)$th entry of $\Sigma_{*YY}$ was set to $\rho_{Y}^{|i-j|}$ and the $(i,j)$th entry of $\Delta_*$ was set to $\rho_{\Delta}^{|i-j|}$. After specifying the rank $r$ of $\eta_*$, we set $\eta_* = AB^{\top}$, where $A \in \mathbb{R}^{q \times r}$ and $B \in \mathbb{R}^{p \times r}$ had entries independently drawn from a mean zero Normal distribution, so that $\operatorname{rank}(\eta_*) = r$ with probability one. As we did in the simulation in Section 5.1, we measured performance with model error.
OLS | RR | ||||||||
---|---|---|---|---|---|---|---|---|---|
0.7 | 0.0 | 10 | 0.33 | 0.04 | 0.86 | 0.75 | 0.64 | 1.38 | 0.64 |
0.7 | 0.5 | 10 | 0.34 | 0.04 | 0.86 | 0.74 | 0.60 | 1.31 | 0.60 |
0.7 | 0.7 | 10 | 0.31 | 0.03 | 0.86 | 0.80 | 0.62 | 1.32 | 0.61 |
0.7 | 0.9 | 10 | 0.31 | 0.02 | 0.85 | 0.88 | 0.60 | 1.30 | 0.61 |
0.0 | 0.9 | 10 | 0.15 | 0.03 | 1.00 | 1.77 | 1.22 | 2.61 | 1.21 |
0.5 | 0.9 | 10 | 0.42 | 0.01 | 1.11 | 1.36 | 0.90 | 1.97 | 0.89 |
0.9 | 0.9 | 10 | 0.12 | 0.01 | 0.32 | 0.30 | 0.22 | 0.46 | 0.22 |
0.7 | 0.9 | 4 | 0.35 | 0.02 | 1.73 | 2.61 | 0.49 | 3.12 | 0.49 |
0.7 | 0.9 | 8 | 0.35 | 0.01 | 1.15 | 1.33 | 0.68 | 1.73 | 0.65 |
0.7 | 0.9 | 12 | 0.31 | 0.04 | 0.64 | 0.59 | 0.55 | 0.96 | 0.53 |
0.7 | 0.9 | 16 | 0.25 | 0.08 | 0.30 | 0.20 | 0.44 | 0.50 | 0.42 |
We report the model errors, averaged over the 50 independent replications, in Table 3. Under every setting, the proposed indirect estimator outperformed all non-oracle competitors. In some settings, the comparisons suggest that shrinkage estimation of $\Delta_*^{-1}$ and $\Sigma_{*YY}^{-1}$ was helpful. In each setting, the proposed indirect estimator performed similarly to RR, even though the two approaches estimate the parameters of different conditional distributions.
5.3 Reduced rank forward regression simulation
Our simulation studies in the previous sections used inverse regression data generating models. In this section, we compare the estimators from Section 5.2 using a forward regression data generating model.
For 50 independent replications, we generated a realization of $n$ independent copies of $(X^{\top}, Y^{\top})^{\top}$ from a forward regression data generating model. The $(i,j)$th entry of $\Sigma_{*XX}$ was set to $\rho_{X}^{|i-j|}$ and the $(i,j)$th entry of $\Sigma_{*E}$ was set to $\rho_{E}^{|i-j|}$. After specifying the rank $r$ of $\beta_*$, we set $\beta_* = AB^{\top}$, where $A \in \mathbb{R}^{p \times r}$ and $B \in \mathbb{R}^{q \times r}$ had entries independently drawn from mean zero Normal distributions. In this data generating model, neither $\eta_*$ nor $\Delta_*^{-1}$ had entries equal to zero.
OLS | RR | ||||||||
---|---|---|---|---|---|---|---|---|---|
0.0 | 0.9 | 10 | 2.79 | 0.54 | 4.27 | 5.05 | 2.48 | 4.99 | 2.82 |
0.5 | 0.9 | 10 | 2.90 | 0.47 | 5.36 | 5.94 | 2.73 | 5.00 | 2.89 |
0.7 | 0.9 | 10 | 2.97 | 0.51 | 4.64 | 5.03 | 2.71 | 4.93 | 2.76 |
0.9 | 0.9 | 10 | 2.84 | 0.73 | 3.78 | 4.16 | 2.67 | 5.19 | 2.73 |
0.7 | 0.0 | 10 | 4.66 | 1.92 | 3.59 | 5.88 | 4.53 | 5.11 | 4.34 |
0.7 | 0.5 | 10 | 4.27 | 1.65 | 3.88 | 5.51 | 3.99 | 5.06 | 3.97 |
0.7 | 0.7 | 10 | 3.55 | 1.26 | 3.99 | 5.29 | 3.43 | 5.00 | 3.44 |
0.7 | 0.9 | 4 | 1.27 | 0.08 | 3.84 | 4.71 | 0.95 | 5.00 | 1.11 |
0.7 | 0.9 | 8 | 2.39 | 0.36 | 4.15 | 5.15 | 2.05 | 4.81 | 2.22 |
0.7 | 0.9 | 12 | 3.58 | 0.79 | 4.44 | 5.21 | 3.20 | 5.15 | 3.27 |
0.7 | 0.9 | 16 | 4.53 | 1.29 | 4.62 | 4.42 | 4.33 | 5.11 | 4.38 |
The model errors, averaged over the 50 replications, are reported in Table 4. Both of the proposed indirect reduced rank estimators were competitive with RR in most settings. Although neither $\eta_*$ nor $\Delta_*^{-1}$ was sparse, we again see that the proposed indirect estimator generally outperforms two of the competitors that use some oracle information. These results indicate that shrinkage estimators of $\Delta_*^{-1}$ and $\Sigma_{*YY}^{-1}$ in (4) are helpful when neither is sparse.
6 Tobacco chemical composition data example
As an example application, we use the chemical composition of tobacco leaves data from Anderson and Bancroft (1952) and Izenman (2009). These data have $n = 25$ cases, $p = 6$ predictors, and $q = 3$ responses. The names of the predictors, taken from page 183 of Izenman (2009), are percent nitrogen, percent chlorine, percent potassium, percent phosphorus, percent calcium, and percent magnesium. The names of the response variables, also taken from page 183 of Izenman (2009), are rate of cigarette burn in inches per 1,000 seconds, percent sugar in the leaf, and percent nicotine in the leaf. In these data, it may be inappropriate to assume that $\Delta_*^{-1}$ is sparse. For this reason, we consider another example indirect estimator of $\beta_*$ that estimates $\eta_*$ with (6), estimates $\Sigma_{*YY}^{-1}$ with (7), and estimates $\Delta_*^{-1}$ with
$$
\check\Omega(\gamma, S) = \operatorname*{arg\,min}_{\Omega \in \mathbb{S}^{p}_{+}} \left\{ \operatorname{tr}(S\Omega) - \log\det(\Omega) + \gamma \|\Omega\|_F^2 \right\}, \tag{10}
$$
where $S$ is the sample covariance matrix of the inverse regression residuals. We compute (10) with the closed form solution derived by Witten and Tibshirani (2009). As before, we select the tuning parameter $\gamma$ from a finite grid of non-negative values using (8). We also consider the forward regression estimators defined in Sections 5.1 and 5.2, including OLS and RR. We introduce another competitor, defined as
$$
\hat\beta_{\mathrm{lasso}} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p \times q}} \left\{ \|\mathbb{Y} - \mathbb{X}\beta\|_F^2 + \sum_{m=1}^{q} \lambda_m \|\beta_{\cdot m}\|_1 \right\},
$$
which is equivalent to performing $q$ separate lasso regressions (Tibshirani, 1996). We randomly split the data into a 40% test set and a 60% training set in each of 500 replications and measured the squared prediction error on the test set. All tuning parameters were chosen from a finite grid of non-negative values by 5-fold cross validation.
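A sketch of the closed form for (10): the minimizer shares the eigenvectors of $S$, and each of its eigenvalues solves a scalar quadratic. The sketch below assumes the penalty is exactly $\gamma\|\Omega\|_F^2$ as written in (10); a different scaling of the penalty would only change the constants in the quadratic:

```python
import numpy as np

def ridge_precision(S, gamma):
    """Closed-form minimizer of tr(S @ Omega) - logdet(Omega) + gamma * ||Omega||_F^2
    over positive definite Omega, for gamma > 0. Each eigenvalue omega of the
    solution solves 2*gamma*omega**2 + d*omega - 1 = 0, where d is the matching
    eigenvalue of S."""
    d, V = np.linalg.eigh(S)
    omega = (-d + np.sqrt(d ** 2 + 8.0 * gamma)) / (4.0 * gamma)
    return (V * omega) @ V.T
```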
OLS | RR | ||||||
---|---|---|---|---|---|---|---|
Rate of burn | 1.19 | 1.33 | 0.45 | 2.96 | 2.17 | 0.57 | 1.55 |
(0.08) | (0.10) | (0.03) | (0.15) | (0.15) | (0.07) | (0.13) | |
Percent sugar | 442.38 | 347.76 | 235.55 | 799.03 | 605.30 | 365.13 | 583.98 |
(17.97) | (21.31) | (6.31) | (29.45) | (25.52) | (20.68) | (24.36) | |
Percent nicotine | 2.55 | 2.54 | 0.79 | 5.65 | 4.59 | 0.81 | 2.82 |
(0.29) | (0.30) | (0.05) | (0.41) | (0.31) | (0.21) | (0.29) |
Table 5 shows squared prediction errors, averaged over the 10 test set predictions and the 500 replications. These results indicate that the proposed indirect estimator outperforms all the competitors we considered. Also, ordinary least squares was outperformed by the indirect estimators, but was competitive with separate lasso regressions. Reduced rank regression was not competitive with the proposed indirect estimators.
Acknowledgment
We thank Liliana Forzani for an important discussion. This research is supported in part by a grant from the U.S. National Science Foundation.
Appendix A Appendix
A.1 Proofs
Proof of Proposition 1.
Since $\Sigma_*$ is positive definite, we apply the partitioned inverse formula to obtain that
$$
\Sigma_*^{-1} = \begin{pmatrix} \Delta_*^{-1} & -\Delta_*^{-1}\eta_*^{\top} \\ -\Sigma_{*E}^{-1}\beta_*^{\top} & \Sigma_{*E}^{-1} \end{pmatrix},
$$
where $\Delta_* = \Sigma_{*XX} - \Sigma_{*XY}\Sigma_{*YY}^{-1}\Sigma_{*YX}$ and $\Sigma_{*E} = \Sigma_{*YY} - \Sigma_{*YX}\Sigma_{*XX}^{-1}\Sigma_{*XY}$. The symmetry of $\Sigma_*^{-1}$ implies that $\Delta_*^{-1}\eta_*^{\top} = \beta_*\Sigma_{*E}^{-1}$, so
$$
\beta_* = \Delta_*^{-1}\eta_*^{\top}\Sigma_{*E}. \tag{11}
$$
Using the Woodbury identity,
$$
\Sigma_{*E}^{-1} = \Sigma_{*YY}^{-1} + \Sigma_{*YY}^{-1}\Sigma_{*YX}\Delta_*^{-1}\Sigma_{*XY}\Sigma_{*YY}^{-1} = \Sigma_{*YY}^{-1} + \eta_*\Delta_*^{-1}\eta_*^{\top}. \tag{12}
$$
Using the inverse of the expression above in (11) establishes the result. ∎
In our proof of Proposition 2, we use the matrix inequality
$$
\|\hat A \hat B - AB\|_2 \leq \|A\|_2\,\|\hat B - B\|_2 + \|B\|_2\,\|\hat A - A\|_2 + \|\hat A - A\|_2\,\|\hat B - B\|_2, \tag{13}
$$
which holds for conformable matrices $A$, $\hat A$, $B$, and $\hat B$. Bickel and Levina (2008) used (13) to prove their Theorem 3.
Proof of Proposition 2.
From (12) in the proof of Proposition 1, $\beta_* = \Delta_*^{-1}\eta_*^{\top}G_*^{-1}$, where $G_* = \Sigma_{*YY}^{-1} + \eta_*\Delta_*^{-1}\eta_*^{\top}$. Define $\hat G = \hat\Sigma_{YY}^{-1} + \hat\eta\hat\Delta^{-1}\hat\eta^{\top}$. Applying (13),
$$
\|\hat\beta - \beta_*\|_2 \leq \|G_*^{-1}\|_2\,\|\hat\Delta^{-1}\hat\eta^{\top} - \Delta_*^{-1}\eta_*^{\top}\|_2 + \|\hat\Delta^{-1}\hat\eta^{\top} - \Delta_*^{-1}\eta_*^{\top}\|_2\,\|\hat G^{-1} - G_*^{-1}\|_2 + \|\Delta_*^{-1}\eta_*^{\top}\|_2\,\|\hat G^{-1} - G_*^{-1}\|_2. \tag{14}
$$
We will show that the third term in (14) dominates the others. We continue by deriving its bound. Employing a matrix identity used by Cai et al. (2010), we write $\hat G^{-1} - G_*^{-1} = \hat G^{-1}(G_* - \hat G)G_*^{-1}$, so
$$
\|\hat G^{-1} - G_*^{-1}\|_2 \leq \|\hat G^{-1}\|_2\,\|\hat G - G_*\|_2\,\|G_*^{-1}\|_2. \tag{15}
$$
Using the triangle inequality and (13),
$$
\|\hat G - G_*\|_2 \leq \|\hat\Sigma_{YY}^{-1} - \Sigma_{*YY}^{-1}\|_2 + \|\hat\eta\hat\Delta^{-1}\hat\eta^{\top} - \eta_*\Delta_*^{-1}\eta_*^{\top}\|_2 = O_P\!\left\{(1 + \|\eta_*\|_2)^2(1 + \|\Delta_*^{-1}\|_2)(a_n + b_n)\right\}. \tag{16}
$$
Since $\|\hat\Sigma_{YY}^{-1} - \Sigma_{*YY}^{-1}\|_2 = O_P(b_n)$ and $\hat\eta\hat\Delta^{-1}\hat\eta^{\top}$ is positive semidefinite, Weyl's eigenvalue inequality implies that $\varphi_{\min}(\hat G) \geq \varphi_{\min}(\Sigma_{*YY}^{-1}) - \|\hat\Sigma_{YY}^{-1} - \Sigma_{*YY}^{-1}\|_2 \geq k - o_P(1)$, so
$$
\|\hat G^{-1}\|_2 = O_P(1). \tag{17}
$$
Also,
$$
\|G_*^{-1}\|_2 \leq k^{-1}, \tag{18}
$$
because $\eta_*\Delta_*^{-1}\eta_*^{\top}$ is positive semidefinite, $\Sigma_{*YY}^{-1}$ is positive definite, and $\varphi_{\min}(\Sigma_{*YY}^{-1}) \geq k$ by condition (iv). Using (16), (17), and (18) in (15),
$$
\|\hat G^{-1} - G_*^{-1}\|_2 = O_P\!\left\{(1 + \|\eta_*\|_2)^2(1 + \|\Delta_*^{-1}\|_2)(a_n + b_n)\right\}.
$$
We then see that the third term in (14) dominates and
$$
\|\hat\beta - \beta_*\|_2 = O_P\!\left\{(1 + \|\eta_*\|_2)^3(1 + \|\Delta_*^{-1}\|_2)^2(a_n + b_n)\right\}.
$$
∎
References
- Adragni, K. P. and Cook, R. D. (2009). Sufficient dimension reduction and prediction in regression. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 367(1906):4385–4405.
- Anderson, R. L. and Bancroft, T. A. (1952). Statistical Theory in Research. McGraw-Hill, New York.
- Banerjee, O., El Ghaoui, L., and d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9:485–516.
- Bhadra, A. and Mallick, B. K. (2013). Joint high-dimensional Bayesian variable and covariance selection with an application to eQTL analysis. Biometrics, 69(2):447–457.
- Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Annals of Statistics, 36(1):199–227.
- Breiman, L. and Friedman, J. H. (1997). Predicting multivariate responses in multiple linear regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59(1):3–54.
- Cai, T. T., Zhang, C.-H., and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. Annals of Statistics, 38:2118–2144.
- Chen, L. and Huang, J. Z. (2012). Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. Journal of the American Statistical Association, 107(500):1533–1545.
- Cook, R. D., Forzani, L., and Rothman, A. J. (2013). Prediction in abundant high-dimensional linear regression. Electronic Journal of Statistics, 7:3059–3088.
- Cook, R. D., Li, B., and Chiaromonte, F. (2010). Envelope models for parsimonious and efficient multivariate linear regression (with discussion). Statistica Sinica, 20:927–1010.
- Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.
- Hsieh, C.-J., Sustik, M. A., Dhillon, I. S., and Ravikumar, P. K. (2011). Sparse inverse covariance matrix estimation using quadratic approximation. In Advances in Neural Information Processing Systems, volume 24, pages 2330–2338. MIT Press, Cambridge, MA.
- Huang, J., Liu, N., Pourahmadi, M., and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93(1):85–98.
- Izenman, A. J. (1975). Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5(2):248–264.
- Izenman, A. J. (2009). Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. Springer.
- Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrices estimation. Annals of Statistics, 37:4254–4278.
- Lee, W. and Liu, Y. (2012). Simultaneous multiple response regression and inverse covariance matrix estimation via penalized Gaussian maximum likelihood. Journal of Multivariate Analysis, 111:241–255.
- Mukherjee, A. and Zhu, J. (2011). Reduced rank ridge regression and its kernel extensions. Statistical Analysis and Data Mining, 4(6):612–622.
- Obozinski, G., Taskar, B., and Jordan, M. I. (2010). Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20(2):231–252.
- Peng, J., Zhu, J., Bergamaschi, A., Han, W., Noh, D.-Y., Pollack, J. R., and Wang, P. (2010). Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. The Annals of Applied Statistics, 4(1):53–77.
- Ravikumar, P., Wainwright, M. J., Raskutti, G., and Yu, B. (2011). High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electronic Journal of Statistics, 5:935–980.
- Reinsel, G. C. and Velu, R. P. (1998). Multivariate Reduced-rank Regression. Springer.
- Rothman, A. J., Bickel, P. J., Levina, E., and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2:494–515.
- Rothman, A. J., Levina, E., and Zhu, J. (2010). Sparse multivariate regression with covariance estimation. Journal of Computational and Graphical Statistics, 19(4):947–962.
- Su, Z. and Cook, R. D. (2011). Partial envelopes for efficient estimation in multivariate linear regression. Biometrika, 98:133–146.
- Su, Z. and Cook, R. D. (2012). Inner envelopes: Efficient estimation in multivariate linear regression. Biometrika, 99:687–702.
- Su, Z. and Cook, R. D. (2013). Scaled envelopes: Scale invariant and efficient estimation in multivariate linear regression. Biometrika, 100:921–938.
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58:267–288.
- Turlach, B. A., Venables, W. N., and Wright, S. J. (2005). Simultaneous variable selection. Technometrics, 47(3):349–363.
- Witten, D. M. and Tibshirani, R. (2009). Covariance-regularized regression and classification for high dimensional problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3):615–636.
- Yuan, M. (2008). Efficient computation of l1 regularized estimates in Gaussian graphical models. Journal of Computational and Graphical Statistics, 17(4):809–826.
- Yuan, M., Ekici, A., Lu, Z., and Monteiro, R. (2007). Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3):329–346.
- Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19–35.