Gaussian variational approximation with composite likelihood for crossed random effect models
Abstract
Composite likelihood usually ignores dependencies among response components, while variational approximation to likelihood ignores dependencies among parameter components. We derive a Gaussian variational approximation to the composite log-likelihood function for Poisson and Gamma regression models with crossed random effects. We show consistency and asymptotic normality of the estimates derived from this approximation and support this theory with some simulation studies. The approach is computationally much faster than a Gaussian variational approximation to the full log-likelihood function.
Keywords: Gamma regression; generalized linear mixed models; likelihood inference; Poisson regression.
1 Introduction
Generalized linear mixed models with crossed random effects are useful for the analysis and inference of cross-classified data, such as arise in educational studies (Chung and Cai, 2021; Menictas et al., 2022), psychometric research studies (Baayen et al., 2008; Jeon et al., 2017), and medical studies (Coull et al., 2001), among others.
Suppose are conditionally independent, given random effects and . A generalized linear model for has density function
| (1) |
and we relate this to the covariates in the usual manner:
| (2) |
where . In (2) , is a -vector of fixed-effects parameters, and and are independent random effects assumed to follow and distributions, respectively.
In this paper, we develop a Gaussian variational approximation to a form of composite likelihood for model (1), in order to make the computations more tractable than in a full likelihood approach. We focus on two examples: Poisson regression , where , with , , , and , and Gamma regression, where distributed as a Gamma distribution with shape parameter and expectation parameter , so that , , and .
Other approaches to simplifying the likelihood for crossed random effects have been proposed. Penalized quasi-likelihood is discussed in Breslow and Clayton (1993); Schall (1991), based on Laplace approximation, although for binary data the resulting estimates are not guaranteed to be consistent (Lin and Breslow, 1996). Coull et al. (2001) developed a Monte Carlo Newton-Raphson algorithm based on expectation-maximization (McCulloch, 1997), and Ghosh et al. (2022a) proposed a backfitting algorithm for a Gaussian linear model, where the integral can be evaluated explicitly. This approach was extended to logistic regression in Ghosh et al. (2022b).
Composite likelihood methods have also been proposed to simplify computation in models with random effects. Renard et al. (2004) and Bartolucci and Lupparelli (2016) proposed pairwise likelihood for nested random effects models, and Bellio and Varin (2005) developed a pairwise likelihood approach to crossed random effects.
Our approach combines composite likelihood with variational inference. The theory is developed by extending results of Ormerod and Wand (2012), Hall et al. (2011a), and Hall et al. (2011a) which treat Poisson models with nested random effects. Related variational approaches are developed in Menictas et al. (2022) for linear models, and in Rijmen and Jeon (2013), Jeon et al. (2017), Hui et al. (2017), Hui et al. (2019) for generalized linear mixed models.
We introduce a row-column composite likelihood that requires the computation of only one-dimensional integrals and apply a Gaussian variational approximation to this composite likelihood. We prove consistency and asymptotic normality of the variational estimates for the Poisson and Gamma regression models. Simulations demonstrate that our method is faster than the Gaussian variational approximation to the full likelihood function.
2 Gaussian variational approximation
The log-likelihood function for the generalized linear model with crossed random effects is constructed from the marginal distribution of the responses, , given the covariates , and depends on the parameters :
| (3) |
with maximum likelihood estimator . To simplify the computation of the -dimensional integral we apply a Gaussian variational approximation to (3) by introducing pairs of variational parameters and . By Jensen’s inequality and concavity of the logarithm function:
where and are Gaussian densities with means , , and variances , , respectively. The function is the variational lower bound on ; the variational parameters are . Ignoring some constants, this lower bound simplifies to
| (4) | |||||
The Gaussian variational approximation estimators maximize the parameters of the evidence lower bound; The advantage of using the variational lower bound over is that the former only contains terms or involving a two-dimensional integral, and in our models these can be evaluated explicitly. For the Poisson regression model,
For the Gamma regression model,
3 Composite likelihood Gaussian variational approximation
In general models, computation of the variational approximation requires two-dimensional integrals; for the Poisson and Gamma regression models this simplifies to function evaluations. To reduce computation further we develop a version of composite likelihood by considering the two random effects and separately.
First, we ignore the dependence among rows, , by ignoring the column random effects ’s; i.e., in (2) we replace with , and define the row-composite likelihood function by
The row-composite log-likelihood function is
| (5) |
By a similar operation ignoring row random effects, we can define the column-composite log-likelihood function by
| (6) |
The notation and is needed, as the misspecification caused by ignoring the column (or row) random effects changes the intercept from , say, to for the row-composite likelihood and for the column-composite likelihood, for both the Poisson and the Gamma regression models. The other components, are unchanged.
Based on (5) and (6), we propose the misspecified row-column composite log-likelihood function by
| (7) |
where . This definition of misspecified composite likelihood reduces the computation of an -dimensional integral in (3) to one-dimensional integrals. Bartolucci et al. (2017) proposed a different composite likelihood function in which column (or row) random effects are marginalized over instead.
We construct a Gaussian variational approximation to the misspecified row-column composite likelihood (7) using the same variational distributions as in the previous section, leading to
| (8) | |||||
For the Poisson regression model,
For the Gamma regression model,
We define the estimators based on this Gaussian variational approximation by
To get the estimator from , we only need to convert and to . This is discussed in §4 Remark 1. The advantage of over is that the former only involves one-dimensional integrals, and especially, for Poisson and Gamma regression models with explicit expressions, even one-dimensional integrals are no longer involved; only calculations are needed. The misspecified row-column composite likelihood-based variational approximation enhances the efficiency of calculations compared to the variational approximation to the log-likelihood function.
4 Theoretical Properties
In this section we present our main results on the consistency and convergence rates of the parameter estimates based on the Gaussian variational approximation introduced above. We denote the true value of the parameter with a superscript .
-
•
Poisson regression model:
(9) -
•
Gamma regression model:
(10) where the shape parameter is assumed known.
Under the assumptions (A1)-(A9) in the §S1 of the Supplementary Material, we have the following two theorems.
Theorem 1
As and diverge, such that , for both Poisson and Gamma regression models,
Remark 1
As noted in §3, the intercept terms for the row-only and column-only composite likelihood functions are not equal to . In the proof we derive the probability limits of the variational estimates of and , and show how to estimate from these and estimates of and .
In addition, we have the following asymptotic normality results.
Theorem 2
For both Poisson and Gamma regression models, as and diverge, such that , we have
where , and are independent normal distributions with mean zero and covariance
| (14) |
and
| (18) |
respectively;
where the random variable follows ;
where the random variable follows .
For Poisson regression, the slope term satisfies
the random variable follows a normal distribution with zero mean and variance
where , , and
| (21) |
For Gamma regression, the slope term satisfies
where follows a normal distribution with zero mean and covariance
| (24) |
Remark 2
We assume above that and diverge at the same rate, whereas Hall et al. (2011b) assume , to reflect the usual nested effects setting, with many groups (random effects) and relatively few observations per group. While Hall et al. (2011b)’s asymptotic manner is sufficient to ensure the validity of the asymptotic theory, it is not necessary. In the generalized linear mixed model with crossed random effects, if tends to , the convergence rate will be dominated by , indicating that row random effects can be disregarded. However, in the case we consider, the row random effects should not be ignored.
5 Simulation Studies
In this section, we perform simulations to evaluate the effectiveness of the proposed approach. We assess the performance of both Poisson and Gamma regression models with crossed random effects as specified in equations (• ‣ 4) and (• ‣ 4). For both models, we independently generate from for and . We set , , , and . For the Gamma regression model, there is an additional shape parameter set as . We consider two different scenarios: and .
We compare the proposed approach with the Gaussian variational inference based on the full log-likelihood function. We report the mean and standard deviation of the parameter estimates based on 1000 repetitions of Monte Carlo studies. We also calculate the average computational times of both methods, using R version 4.1.1 on a laptop equipped with a 2.8 gigahertz Intel Core i7-1165G7 processor and 16 gigabytes of random access memory. The results are presented in Table 1.
| Poisson | ||||
|---|---|---|---|---|
| GVA | GVACL | GVA | GVACL | |
| Mean(SD) | -1.99(0.07) | -2.04(0.14) | -1.99(0.08) | -2.03(0.09) |
| MESE | NA | 0.11 | NA | 0.07 |
| Mean(SD) | -1.99(0.07) | -1.99(0.08) | -1.99(0.03) | -2.00(0.04) |
| MESE | NA | 0.10 | NA | 0.05 |
| Mean(SD) | 0.47(0.10) | 0.53(0.09) | 0.49(0.05) | 0.52(0.05) |
| MESE | NA | 0.05 | NA | 0.04 |
| Mean(SD) | 0.46(0.10) | 0.52(0.09) | 0.49(0.05) | 0.52(0.05) |
| MESE | NA | 0.05 | NA | 0.04 |
| Mean Time(s) | 3.55 | 0.21 | 14.04 | 0.63 |
| Gamma | ||||
| Mean(SD) | -2.00(0.10) | -2.01(0.11) | -2.00(0.07) | -2.00(0.07) |
| MESE | NA | 0.10 | NA | 0.07 |
| Mean(SD) | -2.00(0.02) | -2.00(0.03) | -2.00(0.01) | -2.00(0.01) |
| MESE | NA | 0.03 | NA | 0.01 |
| Mean(SD) | 0.49(0.05) | 0.50(0.05) | 0.50(0.04) | 0.50(0.04) |
| MESE | NA | 0.05 | NA | 0.04 |
| Mean(SD) | 0.49(0.06) | 0.50(0.06) | 0.50(0.04) | 0.50(0.04) |
| MESE | NA | 0.05 | NA | 0.04 |
| Mean Time(s) | 4.77 | 0.28 | 19.00 | 0.88 |
Our results demonstrate that both methods yield similar parameter estimates. However, our proposed approach exhibits a notable advantage in terms of computational efficiency compared to the Gaussian variational approximation to the log-likelihood function. We further investigate additional settings to assess the performance of both methods. The results are included in the supplementary §S6 and Table S1, and the findings are similar.
6 Application
We illustrate our methods with data concerning motor vehicle insurance in Indonesia, in 2014, obtained from the Financial Services Authority (Adam et al., 2021). The dataset consists of 175,381 claim events from 35 distinct areas over a period of 12 months. Each claim event includes information on the claim amount and deductible. However, 17 out of the 35 areas did not have any claim events during the entire 12-month period. Consequently, our analysis focuses on the remaining 18 areas. Table S3 includes the counts of claim events for each area and month.
We randomly selected one claim event from each area and month. The response variable, denoted as , represents the claim amount, while the explanatory variable, denoted as , represents the deductible, where and . A portion of the dataset can be found in Table S4 of the supplementary material. From that table, we can see both and have large magnitudes, therefore we divide them by to avoid singular Hessian matrices in computation (Adam et al., 2021). We then fit a Gamma regression model, introducing random effects for both area and month of occurrence.
To evaluate the proposed method and the Gaussian variational approximation to the log-likelihood function, we regenerated datasets using different random seeds at each step of the random event selection process. We then applied both methods and reported the mean and standard deviation of both estimators across these datasets. In addition, we record the average time it takes to fit the model. Table 2 presents the results of this analysis.
| Method | Mean Time(s) | Mean(SD) | Mean(SD) | Mean(SD) | Mean(SD) |
|---|---|---|---|---|---|
| GVA | 1.19 | -0.71 (0.12) | 3.17 (1.37) | 0.29(0.11) | 0.10(0.10) |
| GVACL | 0.15 | -0.74 (0.12) | 3.60 (1.26) | 0.29(0.11) | 0.14(0.12) |
From Table 2, it is evident that the two methods yield comparable model parameter estimates. However, our proposed approach “GVACL” is much faster in terms of computation time.
7 Discussion
In this article, we cover two examples: the Poisson regression and Gamma regression. An interesting future direction is to extend the results to other generalized linear mixed models. For example, the logistic regression model with crossed random effects has
Unlike the Poisson and Gamma regression models, there is no analytic solution to the two-dimensional integral
| (25) |
appearing in (4). One solution is to evaluate the integrals using adaptive Gauss-Hermite quadrature (Liu and Pierce, 1994) with a product of quadrature points over the two-dimensional integral in (25). The one-dimensional integrals arising in the row-column composite likelihood of (8) can also be computed by adaptive Gauss-Hermite quadrature. Compared to the Poisson and Gamma cases, the computational time of the proposed method is longer, especially when and are large.
We carried out some limited simulations for the logistic model and report the results in the supplementary §S6 and Table S2. We found that the computational time of the proposed method is much shorter than for the Gaussian variational approximation to the log-likelihood function. Based on the simulation results, we conjecture that the intercept and slope estimates and the sum of random effect variance estimates have the following relationship:
The rigorous derivation of the relationship is beyond the scope of the paper and we leave it for future research.
Variational approximations to the composite log-likelihood function of crossed random effects models establish a connection between two distinct topics in statistics. We introduce the row-composite likelihood function (5) to eliminate the row dependence of responses by disregarding the column random effects. Subsequently, we utilize the Gaussian variational approximation to eliminate the column dependence of responses through the variational distributions of row random effects. Similarly, we construct the column-composite likelihood function (6), to eliminate the column dependence of responses by disregarding the random row effects. We then apply the Gaussian variational approximation to eliminate the row dependence of responses through the variational distributions of column random effects. Comparing the row-column composite likelihood-based Gaussian variational approximation and likelihood-based Gaussian variational approximation, both approaches share the common goal of breaking the dependence of responses through manipulating random effects for the generalized linear mixed models with crossed random effects. For any other links between composite likelihood and variational approximations, we will leave them for future researc
Acknowledgment
The authors are grateful to Dr. Fia Fridayanti Adam for sharing the motor vehicle insurance data. This research was partially supported by the Natural Sciences and Engineering Council of Canada.
8 BibTeX
References
- Chung and Cai (2021) Seungwon Chung and Li Cai. Cross-classified random effects modeling for moderated item calibration. Journal of Educational and Behavioral Statistics, 46(6):651–681, 2021.
- Menictas et al. (2022) Marianne Menictas, Gioia Di Credico, and Matt P Wand. Streamlined variational inference for linear mixed models with crossed random effects. Journal of Computational and Graphical Statistics, pages 1–17, 2022.
- Baayen et al. (2008) R Harald Baayen, Douglas J Davidson, and Douglas M Bates. Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4):390–412, 2008.
- Jeon et al. (2017) Minjeong Jeon, Frank Rijmen, and Sophia Rabe-Hesketh. A variational maximization–maximization algorithm for generalized linear mixed models with crossed random effects. Psychometrika, 82(3):693–716, 2017.
- Coull et al. (2001) Brent A Coull, James P Hobert, Louise M Ryan, and Lewis B Holmes. Crossed random effect models for multiple outcomes in a study of teratogenesis. Journal of the American Statistical Association, 96(456):1194–1204, 2001.
- Breslow and Clayton (1993) Norman E Breslow and David G Clayton. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88(421):9–25, 1993.
- Schall (1991) Robert Schall. Estimation in generalized linear models with random effects. Biometrika, 78(4):719–727, 1991.
- Lin and Breslow (1996) Xihong Lin and Norman E Breslow. Bias correction in generalized linear mixed models with multiple components of dispersion. Journal of the American Statistical Association, 91(435):1007–1016, 1996.
- McCulloch (1997) Charles E McCulloch. Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association, 92(437):162–170, 1997.
- Ghosh et al. (2022a) Swarnadip Ghosh, Trevor Hastie, and Art B Owen. Backfitting for large scale crossed random effects regressions. The Annals of Statistics, 50(1):560–583, 2022a.
- Ghosh et al. (2022b) Swarnadip Ghosh, Trevor Hastie, and Art B Owen. Scalable logistic regression with crossed random effects. Electronic Journal of Statistics, 16(2):4604–4635, 2022b.
- Renard et al. (2004) Didier Renard, Geert Molenberghs, and Helena Geys. A pairwise likelihood approach to estimation in multilevel probit models. Computational Statistics & Data Analysis, 44(4):649–667, 2004.
- Bartolucci and Lupparelli (2016) Francesco Bartolucci and Monia Lupparelli. Pairwise likelihood inference for nested hidden Markov chain models for multilevel longitudinal data. Journal of the American Statistical Association, 111(513):216–228, 2016.
- Bellio and Varin (2005) Ruggero Bellio and Cristiano Varin. A pairwise likelihood approach to generalized linear models with crossed random effects. Statistical Modelling, 5(3):217–227, 2005.
- Ormerod and Wand (2012) John T Ormerod and Matt P Wand. Gaussian variational approximate inference for generalized linear mixed models. Journal of Computational and Graphical Statistics, 21(1):2–17, 2012.
- Hall et al. (2011a) Peter Hall, John T Ormerod, and Matt P Wand. Theory of Gaussian variational approximation for a Poisson mixed model. Statistica Sinica, 21:369–389, 2011a.
- Rijmen and Jeon (2013) Frank Rijmen and Minjeong Jeon. Fitting an item response theory model with random item effects across groups by a variational approximation method. Annals of Operations Research, 206(1):647–662, 2013.
- Hui et al. (2017) Francis KC Hui, David I Warton, John T Ormerod, Viivi Haapaniemi, and Sara Taskinen. Variational approximations for generalized linear latent variable models. Journal of Computational and Graphical Statistics, 26(1):35–43, 2017.
- Hui et al. (2019) Francis KC Hui, Chong You, Han Lin Shang, and Samuel Müller. Semiparametric regression using variational approximations. Journal of the American Statistical Association, 114:1765–1777, 2019.
- Bartolucci et al. (2017) Francesco Bartolucci, Francesca Chiaromonte, Prabhani Kuruppumullage Don, and Bruce G Lindsay. Composite likelihood inference in a discrete latent variable model for two-way “clustering-by-segmentation” problems. Journal of Computational and Graphical Statistics, 26(2):388–402, 2017.
- Hall et al. (2011b) Peter Hall, Tung Pham, Matt P Wand, and Shen SJ Wang. Asymptotic normality and valid inference for Gaussian variational approximation. The Annals of Statistics, 39(5):2502–2532, 2011b.
- Adam et al. (2021) FA Adam, A Kurnia, IGP Purnaba, IW Mangku, and AM Soleh. Modeling the amount of insurance claim using Gamma linear mixed model with AR (1) random effect. In Journal of Physics: Conference Series, volume 1863, page 012027. IOP Publishing, 2021.
- Liu and Pierce (1994) Qing Liu and Donald A Pierce. A note on Gauss–Hermite quadrature. Biometrika, 81(3):624–629, 1994.