¹¹affiliationtext: Department of Statistics, University of Missouri²²affiliationtext: Department of Statistics, University of California, Santa Cruz³³affiliationtext: Research and Methodology Directorate, U.S. Census Bureau

Bayesian Unit-level Modeling of Categorical Survey Data with a Longitudinal Design

Daniel Vedensky Paul A. Parker Scott H. Holan

Abstract

Categorical response data are ubiquitous in complex survey applications, yet few methods model the dependence across different outcome categories when the response is ordinal. Likewise, few methods exist for the common combination of a longitudinal design and categorical data. By modeling individual survey responses at the unit-level, it is possible to capture both ordering information in ordinal responses and any longitudinal correlation. However, accounting for a complex survey design becomes more challenging in the unit-level setting. We propose a Bayesian hierarchical, unit-level, model-based approach for categorical data that is able to capture ordering among response categories, can incorporate longitudinal dependence, and accounts for the survey design. To handle computational scalability, we develop efficient Gibbs samplers with appropriate data augmentation as well as variational Bayes algorithms. Using public-use microdata from the Household Pulse Survey, we provide an analysis of an ordinal response that asks about the frequency of anxiety symptoms at the beginning of the COVID-19 pandemic. We compare both design-based and model-based estimators and demonstrate superior performance for the proposed approaches.

Key words: Bayesian hierarchical model; Bayesian ordinal logistic regression; Informative sample; Longitudinal survey data; Pseudo-likelihood; Small area estimation

^†^†footnotetext: ^†To whom correspondence should be addressed. dvedensky@mail.missouri.edu

1 Introduction

In the context of survey data applications, modeling individual records (i.e., “unit-level modeling”) rather than area-level aggregates is often necessary. For ordinal-valued responses, modeling the individual survey responses allows us to capture dependence among the response categories, which is not possible in area-level models that model group averages. Likewise, for longitudinal studies, modeling at the unit-level is necessary in order to link individual responses and capture the serial correlation across a given respondent’s survey responses. In small area estimation applications, unit-level models have the additional benefit of allowing for arbitrary aggregation of data and may increase the precision of estimates (Hidiroglou and You,, 2016). Yet, despite the ubiquity of non-Gaussian survey data collected according to a longitudinal design, there is scarce unit-level methodology for such settings. Notably, there are few existing methods for the combination of ordinal and longitudinal data that are able to incorporate spatio-temporal dependence and account for complex survey designs.

In general, more attention has been paid to binary and nominal responses in the complex survey literature (Skinner,, 2018). In contrast, in the spatial literature for complex surveys, categorical data are typically modeled at the area level, which requires calculating weighted proportions first. When the data follow a natural ordering, it is common to ignore this structure and instead model the ordinal proportions as though they were nominal. For example, Bauder et al., (2021) model multinomial probabilities as multivariate normal after employing a logit transformation. Although it is possible to induce an ordinal structure by modifying the covariance matrix (Agresti,, 2010), this does not capture the dependence a unit-level approach would.

Meanwhile, most existing unit-level methodology for categorical survey data does not fully take into account the survey design or applies only to cross-sectional surveys. For example, Machini et al., (2022) present an ordinal model for spatial data, but disregard the survey design, Beltrán-Sánchez et al., (2024) model ordinal survey data with spatial dependence, but rely on the restrictive assumption that all design information is available in the form of covariates. Methods that do account for a longitudinal survey design generally assume Gaussian, binary, or occasionally nominal data. One exception is the longitudinal model of Kunihama et al., (2019) that is able to handle multivariate responses and incorporates the weights into a Bayesian nonparametric (BNP) sampling algorithm. However, the response types considered are continuous, binary, or nominal, but not ordinal. Sutradhar and Kovacevic, (2000) consider ordinal, longitudinal, complex design, but use GEE which does not easily extend to the spatial setting or allow for straightforward uncertainty quantification.

In this paper, we propose an ordinal model within a unit-level modeling framework that incorporates survey weights, spatial, and temporal dependence. We do so by extending the work of Parker et al., (2022) and Vedensky et al., (2023). The former models cross-sectional binary and nominal complex survey data at the unit level while the latter builds on this approach to cover longitudinal data with Gaussian or binary responses. By employing a particular stick-breaking representation of a multinomial likelihood (Linderman et al.,, 2015)—also known as a continuation ratio factorization (Fienberg,, 2007)—along with Pólya-Gamma data augmentation (Polson et al.,, 2013), we obtain conjugate sampling algorithms, with Bayesian, ordinal logistic regression as a special case. When data are especially high-dimensional, MCMC algorithms may prove insufficient, and we introduce Variational Bayes (VB) algorithms (Blei et al.,, 2017) as an alternative to Gibbs sampling for such applications.

Recent work has made use of this combination of the continuation ratio factorization and Pólya-Gamma data augmentation. Kang and Kottas, 2024a develop a BNP model for cross-sectional data and Kang and Kottas, 2024b extend this to functional, longitudinal data, modeling continuous curves. In the psychological literature, Jimenez et al., (2023) cover the special case of an item-response model with ordinal values. However, modifying these approaches to handle complex survey data with a spatial component is nontrivial, and to our knowledge, the link to Bayesian logistic regression has not been pointed out elsewhere.

To illustrate the efficacy of our proposed methodology, we apply our models to data from the Household Pulse Survey (HPS), a high-dimensional complex survey that follows a longitudinal design. This includes an empirical simulation study and a full data analysis.

This article proceeds as follows. In Section 2 we describe our general approach to modeling ordinal data. Section 3 describes how the ordinal and nominal models may be extended for use with survey data. In Section 4 variational Bayes algorithms for each of the models are introduced. We then present empirical results in Section 5 and conclude with a discussion in Section 6.

2 Bayesian Ordinal Logistic Regression

Let $\mathcal{U}=\{1,\ldots,N\}$ index a population, from which we observe a sample $\mathcal{S}\subseteq\mathcal{U}$ of size $|\mathcal{S}|=n$ . Associated with each element of the sample are a set of covariates $\bm{x}_{i}=(x_{i1},\ldots,x_{iq})$ as well as an ordinal-valued response $y_{i}\in\{1,\ldots,K\}$ , with $K>2$ possible categories. We also define the one-hot encoded vector representation of the response, $\widetilde{\bm{y}}_{i}=(\mathbbm{1}(y_{i}=1),\mathbbm{1}(y_{i}=2),\ldots,\mathbbm{1}(y_{i}=K))$ , where $\mathbbm{1}(\cdot)$ denotes the indicator function. The response vectors for the entire sample may be collated into the $nK$ -dimensional vector $\widetilde{\bm{y}}=(\widetilde{\bm{y}}_{1},\widetilde{\bm{y}}_{2},\ldots,\widetilde{\bm{y}}_{n}).$

It is common to model such data with a cumulative model (McCullagh,, 1980), where latent, ordered cutpoints $0\leq\gamma_{1}\leq\cdots\leq\gamma_{K-1}$ are introduced and it is assumed

P(y_{i}\leq k)=F(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}),

with $F$ being any strictly monotonic distribution function and ${\bm{\beta}}$ a set of regression coefficients. This implies the probability of observing a given category to be

P(y_{i}=k)=F(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta})-F(\gamma_{k-1}-\bm{x}_{i}^{\prime}\bm{\beta}).

In the Bayesian setting, Albert and Chib, (1993) propose a popular data augmentation algorithm for cumulative, ordinal probit regression. They take $F$ to be the standard normal cdf $\Phi$ and add Gaussian latent variables to produce an easily implementable Gibbs sampler. This is the most widely used Bayesian formulation for ordinal models (Agresti,, 2010), and has been adapted to spatial models, as well (Higgs and Hoeting,, 2010; Schliep and Hoeting,, 2015; Carter et al.,, 2024).

Tutz, (1990, 1991) specify an alternative latent variable formulation for a sequential ordinal model, which has received relatively less attention in the literature. However, a number of authors have noted benefits, such as more flexible modeling of response curves and the possibility for category-specific covariates (Tutz et al.,, 2005; Bürkner and Vuorre,, 2019; Boes and Winkelmann,, 2006; Kang and Kottas, 2024a, ). In contrast to the cumulative approach, the sequential approach assumes no ordering for $\bm{\gamma}=(\gamma_{1},\ldots,\gamma_{K-1})$ , and that the conditional probability for observing a given category $k<K$ is

P(y_{i}=k|\bm{\beta},\bm{\gamma})=F(\gamma_{k}-\bm{x}_{i}^{\prime}\beta)\prod_{j<k}(1-F(\gamma_{j}-\bm{x}_{i}^{\prime}\bm{\beta}))

(2.1)

and

P(y_{i}=K|\bm{\beta},\bm{\gamma})=\prod_{j=1}^{K-1}(1-F(\gamma_{j}-\bm{x}_{i}^{\prime}\bm{\beta}))

(2.2)

for the last category. In this way, the probability of observing a particular category may be viewed as a sequential process, where category $k$ can only be attained if all previous categories, $j<k$ are not observed. The final category, $K$ , is observed only if all other categories fail to be attained.

This formulation implies an independent binomial model for each response category, conditional on $\bm{\beta}$ and $\bm{\gamma}$ , via the continuation-ratio parametrization of Fienberg, (1980), which is also referred to as a “stick-breaking” representation (Linderman et al.,, 2015). Specifically, a multinomial pmf parameterized by counts $n$ and a vector of probabilities $\bm{\pi}_{i}=(\pi_{i1},\ldots,\pi_{iK})$ , can be expressed as the product

\text{Multinomial}({\widetilde{\bm{y}}}_{i}|n_{i},\bm{\pi}_{i})=\prod_{k=1}^{K-1}\text{Binomial}(\widetilde{y}_{ik}|n_{ik},\widetilde{\pi}_{ik}),

(2.3)

where $\pi_{ik}=\widetilde{\pi}_{ik}\prod_{j<k}(1-\widetilde{\pi}_{ij})$ and $n_{ik}=n_{i}-\sum_{j<k}y_{j}$ for $k=1,\ldots,K-1$ . That is, if we let $\pi_{ik}=P(y_{i}=k|\bm{\beta},\bm{\gamma})$ and $\widetilde{\pi}_{ik}=F(\gamma_{k}-\bm{x}_{i}\bm{\beta})$ , with $F$ the inverse logit function, we have that (2.1) and (2.2) define a multinomial likelihood over the ordinal categories.

Placing Gaussian priors on $\bm{\beta}$ and $\bm{\gamma}$ , we may then specify the full ordinal logistic regression model hierarchically as a Bayesian GLM

$\displaystyle p(\bm{\widetilde{y}}\|\bm{\beta})$	$\displaystyle=\prod_{i\in\mathcal{S}}\prod_{k=1}^{K-1}\text{Binomial}(\widetilde{y}_{ik}\|n_{ik},\widetilde{\pi}_{ik})$
$\displaystyle\text{logit}(\widetilde{\pi}_{ik})$	$\displaystyle=\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}$
$\displaystyle\bm{\beta}$	$\displaystyle\sim\text{N}_{q}(\bm{0},\sigma^{2}_{\beta}\text{I}_{q})$	(2.4)
$\displaystyle\bm{\gamma}$	$\displaystyle\sim\text{N}_{K-1}(\bm{0},\sigma^{2}_{\gamma}I_{K-1})$
$\displaystyle\sigma_{\beta},\sigma_{\gamma}$	$\displaystyle>0.$

For nominal responses, where there is no ordering of response categories, model (2) simplifies to estimating $\text{logit}(\widetilde{\pi}_{i})=\bm{x}_{i}\bm{\beta}_{k}$ with category-specific coefficients, $\bm{\beta}_{k}$ and with no need for cutpoints $\gamma_{k}$ (Parker et al.,, 2022)

$\displaystyle p(\bm{\widetilde{y}}\|\bm{\beta})$	$\displaystyle=\prod_{i\in\mathcal{S}}\prod_{k=1}^{K-1}\text{Binomial}(\widetilde{y}_{ik}\|n_{ik},\widetilde{\pi}_{ik})$
$\displaystyle\text{logit}(\widetilde{\pi}_{ik})$	$\displaystyle=\bm{x}_{i}^{\prime}\bm{\beta}_{k}$
$\displaystyle\bm{\beta}_{k}$	$\displaystyle\sim\text{N}_{q}(\bm{0},\sigma^{2}_{\beta}\text{I}_{q})$	(2.5)
$\displaystyle\sigma_{\beta}$	$\displaystyle>0.$

Because both (2) and (2) have logistic likelihoods, it is possible to introduce latent variables that follow a Pólya-Gamma (PG) distribution (Polson et al.,, 2013) to obtain computationally efficient Gibbs samplers. In the case of (2) this yields a conjugate sampler for logistic ordinal regression.

A random variable $X$ is said to be distributed PG $(b,c)$ with shape parameter $b>0$ and scale parameter $c\in\mathbb{R}$ if it is equal in distribution to

\frac{1}{2\pi^{2}}\sum_{k=1}^{\infty}\frac{g_{k}}{(k-1/2)^{2}+c^{2}/(4\pi^{2})},

where $g_{k}\overset{ind.}{\sim}\text{Gamma}(b,1)$ . Polson et al., (2013) also show that

\mathbb{E}[X]=\frac{b}{2c}\tanh(c/2)

(2.6)

and derive the integral identity,

\frac{e^{a\lambda}}{(1+e^{\lambda})^{b}}=2^{-b}e^{\kappa\lambda}\int_{0}^{\infty}e^{-\omega\lambda^{2}/2}p(\omega)d\omega,

(2.7)

where $\kappa=a-b/2$ and $\omega\sim PG(b,0)$ . In the right-hand side of (2.7), after conditioning on $\omega$ , we have a Gaussian density in terms of $\lambda$ . If we condition on $\lambda$ , then we have a PG density in $\omega$ . Therefore, sampling the model only requires alternating between Gaussian and PG random draws.

Notation for the full sampling algorithm is greatly simplified if, following Albert and Chib, (2001)’s algorithm for probit regression, we augment the covariate vector for each response such that if $y_{i}=k$ , then $\bm{\widetilde{x}}_{ij}=(0,\ldots,1,\ldots,0,-\bm{x}_{i}^{\prime})$ , for $j=1,\ldots,\min(k,K-1)$ , where the $1$ is in the $j$ th position. Further, let $\bm{\widetilde{X}}_{i}=(\widetilde{\bm{x}}_{i1},\ldots,\widetilde{\bm{x}}_{ij})^{\prime}$ be the new covariate matrix for respondent $i$ and

\bm{\widetilde{X}}=\begin{pmatrix}\widetilde{\bm{X}}_{1}\\ \vdots\\ \widetilde{\bm{X}}_{n}\end{pmatrix}

(2.8)

the augmented design matrix for the entire sample. Letting $\bm{\delta}=(\bm{\gamma}^{\prime},\bm{\beta}^{\prime})^{\prime},$ we then have

\text{logit}^{-1}(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta})=\text{logit}^{-1}(\widetilde{\bm{x}}_{ik}^{\prime}\bm{\delta})=\frac{\exp(\widetilde{\bm{x}}_{i}^{\prime}\bm{\delta})}{1+\exp(\widetilde{\bm{x}}_{i}^{\prime}\bm{\delta})},

so that the integral identity (2.7) allows us to write the multinomial data model in (2) as

\prod_{i\in\mathcal{S}}\prod_{k=1}^{K-1}\frac{1}{2}e^{\kappa_{ik}\widetilde{\bm{x}}_{i}^{\prime}\bm{\delta}}\int_{0}^{\infty}e^{-\omega_{ik}(\widetilde{\bm{x}}_{i}^{\prime}\bm{\delta})^{2}/2}p(\omega_{ik})d\omega_{ik},

with $\kappa_{ik}=y_{ik}-1/2.$

Assuming the prior distribution $\bm{\delta}\sim N(\bm{d},\bm{D})$ , a Gibbs sampler for the model requires iterating over only two steps

1.

$\omega_{ik}|\bm{\delta}\sim\text{PG}(1,\bm{\widetilde{x}_{ik}}^{\prime}\bm{\delta})$ (for $i=1,\ldots,n$ and $k=1,\ldots,k_{i}$ )
2.

$\bm{\delta}|\bm{\Omega}\sim\text{N}(\bm{\mu}_{\delta},\bm{\Sigma}_{\delta})$ ,

where $\bm{\Sigma}_{\delta}=(\bm{D}^{-1}+\bm{\widetilde{X}^{\prime}\Omega\widetilde{X}})^{-1}$ and $\bm{\mu}_{\delta}=\bm{\Sigma}_{\delta}(\bm{\widetilde{X}}^{\prime}\widetilde{\bm{\kappa}}+\bm{D}^{-1}\bm{d})$ with $\bm{\omega}=(\omega_{11},\omega_{12},\ldots,\omega_{nk_{n}})$ and $\bm{\Omega}=\text{diag}(\bm{\omega}).$ The vector $\widetilde{\bm{\kappa}}$ is constructed by the same process as (2.8), where $\widetilde{\bm{\kappa}}_{i}=(\kappa_{i1},\ldots,\kappa_{ij})^{\prime}$ and $\widetilde{\bm{\kappa}}=(\widetilde{\kappa}_{1}^{\prime},\ldots,\widetilde{\kappa}_{n}^{\prime})^{\prime},$

3 Categorical Models for Complex Survey Data

In order to apply the above models to unit-level survey data, it is important to consider a number of additional factors. First, complex surveys often collect data according to a design that may not be known to the data modeler (Gelman,, 2007). Even if the design is known, factors such as non-response, attrition (in the case of longitudinal surveys) (Thompson,, 2015), and informative sampling (Parker et al.,, 2023) may cause bias. These complications must be accounted for through proper use of survey weights. Large-scale surveys are also likely to exhibit spatial dependence and it is common to produce predictions for spatial (or demographic) domains with low sample sizes (i.e., small area estimation). Additionally, many surveys are conducted repeatedly over time, or employ a longitudinal structure, where the same people or households are re-interviewed, leading to temporal correlation. The Bayesian hierarchical modeling framework allows us to address each of these types of dependence by extending the data and process model components of (2).

3.1 Complex survey design

In area-level models, the survey design is incorporated into the direct estimates being modeled. For instance, suppose our population $\mathcal{U}$ can be partitioned into $m$ disjoint areas $\mathcal{U}_{j}$ such that $\mathcal{U}=\bigcup_{j=1}^{m}\mathcal{U}_{j}$ . In a complex survey, a sample is taken according to known selection probabilities $\pi_{ij}=P(i\in\mathcal{S}_{j})$ and weights are defined as $w_{ij}=1/\pi_{ij}$ . To fit an area-level model, a direct estimator such as the Horvitz-Thompson estimator for the mean (Horvitz and Thompson,, 1952)

\widehat{\bar{y}}_{j}=\frac{1}{|\mathcal{U}_{j}|}\sum_{i\in\mathcal{S}_{j}}w_{ij}y_{ij}

(3.1)

is calculated for each area and treated as the response variable. The inclusion of survey weights in (3.1) reflects the survey design, hence there is no bias due to an unrepresentative sample.

Unit-level models, on the other hand, are fit to individual survey responses, so incorporating survey weights becomes less straightforward. Parker et al., (2023) provide an overview of remedies for informative sampling/complex design in this setting. In particular, the sequential ordinal regression model readily fits into the pseudo-likelihood framework (Binder,, 1983; Skinner,, 1989), in which the likelihood contribution of each unit is exponentiated by its survey weight,

\prod_{i\in\mathcal{S}}f(y_{i}|\bm{\theta})^{w_{i}}.

Savitsky and Toth, (2016) show that in the Bayesian case, a pseudo-likelihood combined with a prior density $p(\bm{\theta})$ , leads to a valid pseudo-posterior

p(\bm{\theta}|\bm{y},\bm{\widetilde{w}})\propto\prod_{i\in\mathcal{S}}f(y_{i}|\bm{\theta})^{\widetilde{w}_{i}}p(\bm{\theta})

provided that the original survey weights are rescaled to $\widetilde{w}_{i}=nw_{i}/\sum_{i\in\mathcal{S}}w_{i}$ in order to ensure $\sum\widetilde{w}_{i}=n$ .

In the case of (2), introducing a pseudo-likelihood means that the binomial pmf in the data model, which is parameterized by log odds $F(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta})$ , becomes proportional to

\displaystyle[(F(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}))^{y_{ik}}(1-F(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}))^{1-y_{ik}}]^{\widetilde{w}_{i}}

\displaystyle=\frac{(\exp(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}))^{y_{ik}\widetilde{w}_{i}}}{(1+\exp(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}))^{n_{ik}\widetilde{w}_{i}}},

which is the integral identity (2.7) with $a=y_{i}\widetilde{w}_{i}$ and $b=n_{i}\widetilde{w}_{i}$ . Consequently, sampling again proceeds by drawing iteratively from PG and Gaussian distributions, but with the PG shape parameter now containing a term for the survey weights so that $\omega_{ik}|\bm{\delta}\sim\text{PG}(\widetilde{w}_{i}n_{ik},\widetilde{\bm{x}}^{\prime}_{ik}\bm{\delta})$ in the first step.

3.2 Spatio-temporal dependence

Survey data often contain spatial and temporal information. If the spatial domain of interest is especially fine-grained, then many areas will have small sample sizes and we would like to leverage spatial dependence to “borrow strength.” This can be achieved by including an area-level random effect within a unit-level model. Parker et al., (2022) propose Bayesian hierarchical models for cross-sectional data with binary, count, and categorical responses. They introduce an $m-$ dimensional vector of areal random effects $\bm{\eta}\sim N_{m}(\bm{0},\sigma^{2}_{\eta}I_{m})$ , which retain conjugacy with the Pólya-Gamma data augmentation scheme introduced above. To induce a more explicit spatial dependence structure, they take $\bm{\psi}_{i}$ to be a set of spatial basis functions, constructed as a subset of the eigenvectors of the adjacency matrix for all states. In our model, we retain only the $m$ columns corresponding to positive eigenvalues (and hence, positive spatial dependence), but for higher-resolution geographies further truncation can be performed to achieve greater dimension reduction. Thus, the $n\times m$ matrix of basis functions is defined as

\bm{\Psi}=\begin{pmatrix}\bm{\psi}_{1}^{\prime}\\ \vdots\\ \bm{\psi}_{n}^{\prime}\end{pmatrix},

with $\bm{\psi}_{i}$ the basis function evaluated at the area containing unit $i$ . This construction may be viewed as a special case of the Moran’s I basis functions (Hughes and Haran,, 2013; Bradley et al.,, 2015). Incorporating such a random effect into (2) and (2) leads us to set the linear predictor to be $\text{logit}(\widetilde{\pi}_{ik})=\gamma_{k}-(\bm{x}_{i}\bm{\beta}+\bm{\psi}_{i}\bm{\eta}).$

Many complex surveys follow a longitudinal design wherein the same respondents (or some subset of them) are re-interviewed over time. Two types of dependence arise in this setting: within-unit responses are correlated, as are the aggregate-level trends. For longitudinal designs where the survey is repeated at regular intervals, Vedensky et al., (2023) present models for Gaussian, binary and count data that handle both types of correlation. For discrete responses, the within-unit correlation is captured by constructing a categorical covariate that indexes the value (or absence) of the respondent’s previous response. The aggregate-level dependence is captured by adding an AR(1) structure onto the areal random effect so that the process model follows

	$\displaystyle\bm{\eta}_{t}\|\bm{\eta}_{t-1},\phi,\sigma^{2}_{\eta}$	$\displaystyle\sim\text{N}_{m}(\phi\bm{\eta}_{t-1},\sigma^{2}_{\eta}I_{m}),\qquad\text{ for }t=2,\ldots,T$		(3.2)
	$\displaystyle\bm{\eta}_{1}\|\sigma^{2}_{\eta}$	$\displaystyle\sim\text{N}_{m}(\bm{0},\sigma^{2}_{\eta}I_{m}),$		(3.3)

where $\phi\sim\text{Unif}(-1,1)$ functions as an autoregressive parameter, $\sigma^{2}_{\eta},\sigma^{2}_{\eta_{1}}\overset{ind.}{\sim}IG(a,b)$ are the prior random effect variances, and $a,b,\sigma_{\beta}>0$ are fixed hyperparameters.

We note that a survey may also follow a repeated, cross-sectional design, where the entire sample is refreshed at each time point. In such a survey, there is no within-respondent correlation and the random effects structure (3.2)–(3.3) alone is sufficient for capturing the design.

Combining the above modifications, the full spatio-temporal, ordinal model can be specified as

	$\displaystyle\bm{y}\|\bm{\beta},\bm{\eta},\bm{\gamma}$	$\displaystyle\propto\prod_{t=1}^{T}\prod_{i\in\mathcal{S}_{t}}\prod_{k=1}^{K-1}\text{Binom}(y_{i{t}k}\|n_{i{t}k},\widetilde{\pi}_{i{t}k})^{\widetilde{w}_{i{t}}}$
	$\displaystyle\text{logit}(\widetilde{\pi}_{i{t}k})$	$\displaystyle={\bm{u}_{itk}^{\prime}\bm{\gamma}}-\bm{x}_{i}^{\prime}\bm{\beta}-\bm{\psi}_{i}\bm{\eta}_{t}$
	$\displaystyle\bm{\gamma}$	$\displaystyle\sim\text{N}_{g}(0,\sigma^{2}_{\gamma}I_{g})$
	$\displaystyle{\bm{\eta}_{t}\|\bm{\eta}_{t-1},\phi,\sigma^{2}_{\eta}\phantom{.}}$	$\displaystyle\sim\text{N}_{m}(\phi\bm{\eta}_{t-1},\sigma^{2}_{\eta}I_{m}),\hskip 5.69054ptt=2,\ldots,T$
	$\displaystyle{\bm{\eta}_{1}\|\sigma^{2}_{\eta}\phantom{.}}$	$\displaystyle\sim{\text{N}_{m}(\bm{0},\sigma^{2}_{\eta}I_{m})}$
	$\displaystyle\bm{\beta}$	$\displaystyle\sim\text{N}_{q}(\bm{0},\sigma^{2}_{\beta}I_{q})$
	$\displaystyle{\phi}\phantom{.}$	$\displaystyle\sim\text{Unif}(-1,1)$
	$\displaystyle\sigma^{2}_{\eta}$	$\displaystyle\sim\text{IG}(a,b),$

where $n_{itk}=n_{it}-\sum_{j<k}y_{itj}$ and $\widetilde{\pi}_{itk}=\pi_{itk}/(1-\sum_{j<k}\pi_{itj})$ . One additional feature of the model is that the cutpoints are made to vary by time and according to a subject’s previous response. Therefore, there are $g=(K-1)+(T-1)*(K+1)*(K-1)$ cutpoints because subjects in the first time point have no previous response, whereas for the remaining $T-1$ weeks, there are $K+1$ possible values for the previous response. Each of these scenarios requires $K-1$ cutpoints. The indicator vector $\bm{u}_{itk}$ picks out the appropriate cutpoint for a response.

For unordered data, a similar nominal model can be fit as $K-1$ independent binomial models, but with no cutpoints and with category-specific fixed and random effects

\displaystyle\begin{split}\bm{y}|\bm{\beta},\bm{\eta}&\propto\prod_{t=1}^{T}\prod_{i\in\mathcal{S}_{t}}\prod_{k=1}^{K-1}\text{Binomial}(y_{itk}|n_{itk},\widetilde{\pi}_{itk})^{\widetilde{w}_{it}}\\ \text{logit}(\widetilde{\pi}_{itk})&=\bm{x}_{i}^{\prime}\bm{\beta}_{k}+\bm{\psi}_{i}^{\prime}\bm{\eta}_{tk}\\ \bm{\eta}_{tk}|\bm{\eta}_{(t-1)k},\phi_{k},\sigma^{2}_{\eta k}&\sim\text{N}_{m}(\phi_{k}\bm{\eta}_{(t-1)k},\sigma^{2}_{\eta k}I_{m}),\qquad t=2,\ldots,T;\phantom{--}k=1,\ldots K-1\\ \bm{\eta}_{1k}|\sigma^{2}_{\eta_{1}k}&\sim\text{N}_{m}(\bm{0},\sigma^{2}_{\eta_{1}k}I_{m})\qquad k=1,\ldots,K-1\\ \bm{\beta}_{k}&\sim\text{N}_{q}(\bm{0},\sigma^{2}_{\beta}I_{q})\qquad k=1,\ldots,K-1\\ \phi_{k}&\sim\text{Unif}(-1,1)\\ \sigma^{2}_{\eta_{1}k},\sigma^{2}_{\eta k},&\overset{ind.}{\sim}IG(a,b)\\ \sigma_{\beta},a,b&>0.\end{split}

In all of the above models, the latent processes and priors are either conditionally conjugate with the Pólya-Gamma latent variables or produce known conditional distributions. The resulting Gibbs samplers are thereby efficient and straightforward to implement. Full details of the sampling algorithms and derivations of full conditionals are included in the Appendix.

4 Variational Bayes

Despite the conjugacy in the above models, the Gibbs samplers may be inadequate for extremely high-dimensional problems, which require sampling a large number of latent variables and random effects (e.g., large sample sizes and fine-scaled geographies). In this case, we also propose a set of variational Bayes algorithms (Bishop,, 2006; Blei et al.,, 2017) as an alternative to MCMC. These algorithms are much faster albeit at the cost of approximating the posterior distribution rather than sampling from the exact posterior.

Instead of sampling sequentially, VB methods minimize the Kullback-Leibler divergence (Kullback and Leibler,, 1951)

\text{KL}(q(\bm{\theta})\mid\mid p(\bm{\theta}|\bm{x}))=\int_{\Theta}q(\bm{\theta})\log\frac{q(\bm{\theta})}{p(\bm{\theta}|\bm{x})}d\bm{\theta}.

between the true posterior $p(\theta|x)$ and an approximation $q(\theta)$ selected from a given class of distributions $\mathcal{Q}$ . That is, an optimal approximation is selected as

q^{*}(\bm{\theta})=\mathop{\arg\min}\limits_{q(\bm{\theta})\in\mathcal{Q}}\text{KL}(q(\bm{\theta})\mid\mid p(\bm{\theta}\mid\bm{x})).

A common choice for $\mathcal{Q}$ is the mean-field variational family (Blei et al.,, 2017) which consists of all densities $q$ that can be factored into mutually independent “variational factors” $q_{j}(\theta_{j})$ for each of $J$ latent variables so that

q(\bm{\theta})=\prod_{j=1}^{J}q_{j}(\theta_{j}).

Since the KL divergence is not necessarily computable, the optimization problem may be reformulated as a maximization with respect to the evidence lower bound (ELBO) (Blei et al.,, 2017)

\text{ELBO}(\bm{\theta})=\mathbb{E}[\log p(\bm{\theta},\bm{x})]-\mathbb{E}[\log q(\bm{\theta})].

Durante and Rigon, (2019), building on earlier work of Jaakkola and Jordan, (2000), show that logistic models augmented by Pólya-Gamma latent variables have a particularly simple VB algorithm for performing this maximization, and we adapt this to the ordinal case.

Returning to the notation of the two-stage Gibbs sampler at the end of Section 2, note that the Gaussian and PG densities therein have the exponential family representation

p(\bm{\delta}|\bm{y},\bm{\omega})\propto\exp[\bm{\eta}_{1}(\bm{y})^{\prime}\bm{\delta}+\text{vec}[\bm{\eta}_{2}(\bm{\omega})]^{\prime}\text{vec}(\bm{\delta}\bm{\delta}^{\prime})-\alpha[\bm{\eta}_{1}(\bm{y}),\bm{\eta}_{2}(\bm{\omega})]]

and

p(\bm{\omega}_{ik}|\bm{y},\bm{\omega})\propto\exp\{\eta_{ik}(\bm{\delta})\omega_{ik}-\alpha[\eta_{i}(\bm{\delta})]\}p(\omega_{ik}),\text{ for }i=1,\ldots,n;k=1,\ldots,k_{i}

respectively, with natural parameters $\bm{\eta}_{1}(\bm{y})=\bm{\widetilde{X}}^{\prime}\widetilde{\bm{\kappa}}+\bm{D}^{-1}\bm{d}$ and $\bm{\eta}_{2}(\bm{\omega})=-\frac{1}{2}\bm{D}^{-1}+\widetilde{\bm{X}}^{\prime}\bm{\Omega}\widetilde{\bm{X}}$ and $\eta_{ik}(\bm{\omega})=-\frac{1}{2}(\widetilde{\bm{x}}_{ik}^{\prime}\bm{\beta})^{2}.$ Durante and Rigon, (2019) show that the optimal CAVI parameter value updates at iteration $j$ are

1.

$\bm{\lambda}_{1}^{(j)}=\mathbb{E}_{q^{(j-1)(\bm{\omega})}}[\bm{\eta}_{1}(\bm{y})]$
2.

$\bm{\lambda}_{2}^{(j)}=\mathbb{E}_{q^{(j-1)}(\bm{\omega})}[\bm{\eta}_{2}(\bm{\omega}]$
3.

$\widecheck{\xi}_{ik}^{(j)}=\mathbb{E}_{q^{(j)}(\bm{\delta}_{ik})}[\eta_{ik}(\bm{\omega})]$ for $i=1,\ldots,n$ and $k=1,\ldots,k_{i}$ ,

where the $q^{(r)}(\bm{\delta}_{ik})$ densities are Gaussian and the $q^{(r)}(\omega_{ik})$ densities are PG.

Calculation of the expectation in Step 2 is straightforward due to the closed-form expression provided by (2.6). Convergence is measured by comparing the change in successive iterations of the ELBO and once this value drops below a predetermined threshold, the algorithm terminates. Independent samples are then drawn from the variational posterior N $(\widecheck{\bm{\mu}},\widecheck{\bm{\Sigma}})$ , as an approximation to the true posterior, with $\widecheck{\bm{\mu}}=(-2\bm{\lambda}_{2})^{-1}\bm{\lambda}_{1}$ and $\widecheck{\bm{\Sigma}}=(-2\bm{\lambda}_{2})^{-1}$ .

Parker et al., (2022) adapt this approach to cross-sectional binary and nominal pseudo-likelihood models with random effects. We further develop a cross-sectional, ordinal pseudo-likelihood sampling procedure in Algorithm 1 and a longitudinal variant in Algorithm 2. The matrices $\widetilde{G}$ , $\widetilde{X}$ and $\widetilde{\Psi}$ are constructed in the same manner as (2.8), where if $y_{it}=k$ , then we define $j_{it}=\min(k,K-1)$ and $\widetilde{G}_{it}=(\bm{u}_{it1},\ldots,\bm{u}_{itj_{it}})$ and let

\widetilde{G}=\begin{pmatrix}\widetilde{G}_{11}\\ \widetilde{G}_{12}\\ \vdots\\ \widetilde{G}_{nT}\end{pmatrix}

and similarly for $\widetilde{X}=(\bm{x}_{it1},\ldots,\bm{x}_{itj_{it}})^{\prime}$ and $\widetilde{\Psi}=(\bm{\psi}_{it1},\ldots,\bm{\psi}_{itj_{it}})^{\prime}$ .

In Algorithm 2, calculations for the variational parameters $\widetilde{\mu}_{\phi}$ depend on moments of the truncated normal distribution (Johnson et al.,, 1994). In that algorithm, $\Phi$ denotes the standard normal cdf and $\varphi$ the pdf. Note, too, the distinction between $(\widetilde{\mu}_{\phi})^{2}$ and $\widetilde{\mu}_{\phi^{2}}$ . The former is the squared mean, while the latter is the variational variance parameter for $\phi$ . Subscripts on the matrices, such as $X_{t}$ , refer to the subset of rows corresponding to respondents at time $t$ . A subscript such as, $\widecheck{\bm{\Sigma}}_{\eta_{-T}}$ refers to all entries of the matrix except those corresponding to the last time point $t=T$ .

Data:

\widetilde{\bm{C}}=[\widetilde{\bm{G}},-\widetilde{\bm{X}},-\widetilde{\bm{\Psi}}]

and

\widetilde{\bm{\kappa}}

;

Initialize

\widecheck{\sigma}^{2}_{\eta}

and

\widecheck{\xi}_{ik}

for

i=1,\ldots,n

and

k=1,\ldots,k_{i}

;

for $j=1$ until convergence do

\bm{\widecheck{\Omega}}\leftarrow\text{Diag}\left(\frac{\widetilde{w}_{11}}{2\xi_{11}}\tanh(\widecheck{\xi}_{11}/2),\ldots,\frac{\widetilde{w}_{nk_{n}}}{2\xi_{nk_{n}}}\tanh(\widecheck{\xi}_{nk_{n}}/2)\right);

\widecheck{\bm{\Sigma}}\leftarrow\left(\text{blockdiag}\left(\frac{1}{\sigma^{2}_{\gamma}}I_{K-1},\frac{1}{\sigma^{2}_{\beta}}I_{q},{\widecheck{\sigma}^{2}_{\eta}}I_{m}\right)+\widetilde{\bm{C}}^{\prime}\bm{\widecheck{\Omega}}\widetilde{\bm{C}}\right)^{-1};

\widecheck{\bm{\Sigma}}_{\eta}\leftarrow\widecheck{\bm{\Sigma}}[(g+q+1):(g+q+m),(g+q+1):(g+q+m)];

\widecheck{\bm{\mu}}\leftarrow(\widecheck{\bm{\mu}}_{\gamma}^{\prime},\widecheck{\bm{\mu}}_{\beta}^{\prime},\widecheck{\bm{\mu}}_{\eta}^{\prime})^{\prime}\leftarrow\widecheck{\bm{\Sigma}}\widetilde{\bm{C}}^{\prime}\widetilde{\bm{\kappa}};

\widecheck{\sigma}^{2}_{\eta}\leftarrow(b+\frac{1}{2}(\widecheck{\bm{\mu}}_{\eta}^{\prime}\widecheck{\bm{\mu}}_{\eta}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta})))/(a+m/2);

for $i=1$ to $n$ do

for $k=1,\ldots,k_{i}$ do

\widecheck{\xi}_{ik}\leftarrow(\widetilde{\bm{C}}_{ik}^{\prime}\widecheck{\bm{\Sigma}}\widetilde{\bm{C}}_{ik}+(\widetilde{\bm{C}}_{ik}^{\prime}\widecheck{\bm{\mu}})^{2})^{1/2};

end for

Algorithm 1 Variational Bayes algorithm for the cross-sectional ordinal model

Data:

\widetilde{\bm{C}}=[\widetilde{\bm{G}},-\widetilde{\bm{X}},-\widetilde{\bm{\Psi}}]

and

\widetilde{\bm{\kappa}}

Initialize

\widecheck{\bm{\mu}}_{\gamma}

\widecheck{\bm{\mu}}_{\eta}

\widecheck{\mu}_{\phi}

\widecheck{\mu}_{\phi^{2}}

\widecheck{\sigma}^{2}_{\phi}

\widecheck{\bm{\Sigma}}_{\eta_{t}}

\widecheck{\sigma}^{2}_{\eta}

and

\widecheck{\xi}_{itk}

for

t=1,\ldots,T

;

i=1,\ldots,n_{t}

;

k=1,\ldots,k_{i}

;

for $j=1$ until convergence do

\bm{\widecheck{\omega}}\leftarrow\left(\frac{\widetilde{w}_{111}}{2\widecheck{\xi}_{111}}\tanh(\widecheck{\xi}_{111}/2),\ldots,\frac{\widetilde{w}_{nTK}}{2\widecheck{\xi}_{nTK}}\tanh(\widecheck{\xi}_{nTK}/2)\right);

\bm{\widecheck{\Omega}}\leftarrow\text{Diag}\left(\widecheck{\bm{\omega}}\right);

\bm{\widecheck{\mu}}_{\beta}\leftarrow(\widetilde{\bm{X}}^{\prime}\widecheck{\bm{\Omega}}\widetilde{\bm{X}}+1/\sigma^{2}_{\beta}\bm{I}_{p})^{-1}\widetilde{\bm{X}}^{\prime}\widecheck{\bm{\Omega}}(\widetilde{\bm{G}}\widecheck{\bm{\mu}}_{\gamma}-\widetilde{\bm{\kappa}}/\widecheck{\bm{\omega}}-\widetilde{\bm{\Psi}}\widecheck{\bm{\mu}}_{\eta});

\bm{\widecheck{\mu}}_{\gamma}\leftarrow(\widetilde{\bm{G}}^{\prime}\widecheck{\bm{\Omega}}\widetilde{\bm{G}}+1/\sigma^{2}_{\gamma}\bm{I}_{g})^{-1}\widetilde{\bm{G}}^{\prime}\widecheck{\bm{\Omega}}(\widetilde{\bm{\kappa}}/\widecheck{\bm{\omega}}+\widetilde{\bm{X}}\widecheck{\bm{\mu}}_{\beta}+\widetilde{\bm{\Psi}}\widecheck{\bm{\mu}}_{\eta});

\bm{\widecheck{\Sigma}}_{\eta_{1}}\leftarrow(\widetilde{\bm{\Psi}}_{1}^{\prime}\widecheck{\bm{\Omega}}_{1}\widetilde{\bm{\Psi}}_{1}+\widecheck{\sigma}^{2}_{\eta_{1}}(1+\widecheck{\mu}_{\phi^{2}})\bm{I}_{m})^{-1};

\bm{\widecheck{\mu}}_{\eta_{1}}\leftarrow\widecheck{\bm{\Sigma}}_{\eta_{1}}\widetilde{\bm{\Psi}}_{1}^{\prime}\widecheck{\bm{\Omega}}_{1}(\widetilde{\bm{G}}_{1}\widecheck{\bm{\mu}}_{\gamma}-\widetilde{\bm{\kappa}}_{1}/\widecheck{\bm{\omega}}_{1}-\widetilde{\bm{X}}_{1}\widecheck{\bm{\mu}}_{\beta})+\widecheck{\bm{\mu}}_{\eta_{2}}\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta_{1}};

\bm{\widecheck{\mu}}_{\eta_{1}}\leftarrow\bm{\widecheck{\mu}}_{\eta_{1}}-\mathbb{E}[\bm{\widecheck{\mu}}_{\eta_{1}}];

//Sum-to-zero constraint

for $t=1$ to $T$ do

\bm{\widecheck{\Sigma}}_{\eta_{t}}\leftarrow(\widetilde{\bm{\Psi}}_{t}^{\prime}\widecheck{\bm{\Omega}}_{t}\widetilde{\bm{\Psi}}_{t}+(1+\widecheck{\mu}_{\phi^{2}})\widecheck{\sigma}^{2}_{\eta}\bm{I}_{m})^{-1};

\bm{\widecheck{\mu}}_{\eta_{t}}\leftarrow\bm{\widecheck{\Sigma}}_{\eta_{t}}\widetilde{\bm{\Psi}}_{t}^{\prime}\widecheck{\bm{\Omega}}_{t}(\widetilde{\bm{G}}_{t}\bm{\widecheck{}}{\bm{\mu}}_{\gamma}-\widetilde{\bm{\kappa}}_{t}/\widecheck{\bm{\omega}}_{t}-\widetilde{\bm{X}}_{t}\widecheck{\bm{\mu}}_{\beta})+\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta}(\bm{\widecheck{\mu}}_{\eta_{t}}+\bm{\widecheck{\mu}}_{\eta_{t+1}});

end for

\bm{\widecheck{\Sigma}}_{\eta_{T}}\leftarrow(\widetilde{\bm{\Psi}}_{T}^{\prime}\widecheck{\bm{\Omega}}_{T}\widetilde{\bm{\Psi}}_{T}+\widecheck{\sigma}^{2}_{\eta}\bm{I}_{m})^{-1};

\bm{\widecheck{\mu}}_{\eta_{T}}\leftarrow\bm{\widecheck{\Sigma}}_{\eta_{T}}\widetilde{\bm{\Psi}}_{T}^{\prime}\widecheck{\bm{\Omega}}_{T}(\widetilde{\bm{G}}_{T}\bm{\widecheck{}}{\bm{\mu}}_{\gamma}-\widetilde{\bm{\kappa}}_{T}/\widecheck{\bm{\omega}}_{T}-\widetilde{\bm{X}}_{T}\widecheck{\bm{\mu}}_{\beta})+\bm{\widecheck{\mu}}_{\eta_{T-1}}\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta};

\bm{\widecheck{\mu}}_{\eta}\leftarrow(\bm{\widecheck{\mu}}_{\eta_{1}},\ldots,\bm{\widecheck{\mu}}_{\eta_{T}});

\bm{\widecheck{\Sigma}}_{\eta}\leftarrow\text{blockdiag}(\bm{\widecheck{\Sigma}}_{\eta_{1}},\ldots,\bm{\widecheck{\Sigma}}_{\eta_{T}});

\widecheck{\bm{\Sigma}}\leftarrow\left(\text{blockdiag}\left(\frac{1}{\sigma^{2}_{\gamma}}I_{g},\frac{1}{\sigma^{2}_{\beta}}I_{q},\bm{\widecheck{\Sigma}}_{\eta}\right)+\widetilde{\bm{C}}^{\prime}\bm{\widecheck{\Omega}}\widetilde{\bm{C}}\right)^{-1};

\widecheck{\bm{\mu}}\leftarrow(\widecheck{\bm{\mu}}_{\gamma}^{\prime},\widecheck{\bm{\mu}}_{\beta}^{\prime},\widecheck{\bm{\mu}}_{\eta}^{\prime})^{\prime}\leftarrow\widecheck{\bm{\Sigma}}\widetilde{\bm{C}}^{\prime}\widetilde{\bm{\kappa}};

m\leftarrow\sum_{t=1}^{T-1}\bm{\widecheck{\mu}}_{\eta_{t}}^{\prime}\bm{\widecheck{\mu}}_{\eta_{t+1}}\big{/}\left(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}})\right);

\widecheck{\sigma}^{2}_{\phi}\leftarrow\widecheck{\sigma}^{2}_{\eta}\big{/}\left(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}})\right);

\ell\leftarrow(-1-m)/\widecheck{\sigma}_{\phi};

u\leftarrow(1-m)/\widecheck{\sigma}_{\phi};

\widecheck{\mu}_{\phi}\leftarrow m-\widecheck{\sigma}_{\phi}\frac{\varphi(u)-\varphi(\ell)}{\Phi(u)-\Phi(\ell)};

\widecheck{\mu}_{\phi^{2}}\leftarrow\widecheck{\mu}_{\phi}^{2}+\widecheck{\sigma}^{2}_{\phi}\left(1-\frac{u\varphi(u)-\ell\varphi(\ell)}{\Phi(u)-\Phi(\ell)}-\left(\frac{\varphi(u)-\varphi(\ell)}{\Phi(u)-\Phi(\ell)}\right)^{2}\right);

\widecheck{\sigma}^{2}_{\eta_{1}}\leftarrow(b+\widecheck{\bm{\mu}}_{\eta_{1}}^{\prime}\widecheck{\bm{\mu}}_{\eta_{1}}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{1}}))/(a+r/2);

\widecheck{\sigma}^{2}_{\eta}\leftarrow(b+(1/2)(\sum_{t=2}^{T}\widecheck{\bm{\eta}}_{1}^{\prime}\widecheck{\bm{\eta}}_{1}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-1}})-2\widecheck{\mu}_{\phi}\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}\widecheck{\bm{\eta}}_{t+1}+

\widecheck{\mu}_{\phi^{2}}(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}}))))/(a+r(T-1)/2);

for $i=1$ to $n$ do

for $k=1,\ldots,k_{i}$ do

for $t=t_{i},t_{i}+1,t_{i}+2$ do

\widecheck{\xi}_{ikt}\leftarrow(\widetilde{\bm{C}}_{ikt}^{\prime}\widecheck{\bm{\Sigma}}\widetilde{\bm{C}}_{ikt}+(\widetilde{\bm{C}}_{ikt}^{\prime}\widecheck{\bm{\mu}})^{2})^{1/2}

end for

Algorithm 2 Variational Bayes algorithm for the longitudinal ordinal model

\ULforem

The final case, of a longitudinal, nominal VB model, proceeds similarly to Algorithm 2 but without the cutpoint parameters. We defer this algorithm to the Appendix.

5 Empirical Results

5.1 Data description

To illustrate the proposed methodology, we apply the ordinal models to the Household Pulse Survey (HPS). The U.S. Census Bureau launched the HPS to measure the immediate effects of the COVID-19 pandemic on U.S. households. The survey was first deployed during the week of April 23, 2020 and continues to be conducted as the Household Trends and Outlook Pulse Survey (HTOPS).¹¹1https://www.census.gov/data/experimental-data-products/household-pulse-survey.html Data collection for the survey was split into “phases” of varying lengths, which differed in methodology, data collection, and structure. The intention was to make it straightforward to modify the survey and add new questions as needed amid the uncertainty of the pandemic.

Though many questions ran for the duration of HPS, the first phase, which lasted 12 weeks, was unique in its longitudinal structure. During this time, households were repeatedly interviewed according to a rotating panel design. Respondents were selected in an initial week, then re-interviewed for up to two more weeks if they continued to provide responses but dropped from the sampling frame if they failed to provide a response at any point. Therefore, each respondent provided between one and three consecutive responses. We focus on applying our methods to this first phase because of its longitudinal nature. It is important to note, however, that any modeling of later HPS phases that includes the first 12 weeks will still require using our methodology for these early measurements. Modeling of later phases also still requires capturing the temporal dependence due to the repeated cross-sectional design as our model does. HTOPS is intended to reinstate the longitudinal design, as well.

Collaboration with numerous federal agencies led to a range of survey items in the HPS. These span topics like economic well-being, physical and mental health, and other demographic information. A large number of these questions are categorical in nature. For instance, the Bureau of Labor Statistics included questions about the receipt and use of the Economic Impact Payment stimulus (e.g., was it used to (1) pay for expenses, (2) pay off debt, (3) add to savings, or (4) not applicable).²²2bls.gov/cex/research_papers/pdf/safir-effects-of-covid-on-household-finances.pdf The Health Resources and Services Administration asked questions on childcare and children’s health (e.g., “whether any children in the last 4 weeks showed any of […] 8 mental health-related behaviors”).³³3https://mchb.hrsa.gov/covid-19/data The National Center for Health Statistics (NCHS) added questions regarding delayed medical care and anxiety and depression symptoms (e.g., frequency (1) not at all, (2) several days, (3) more than half the days, or (4) nearly every day).⁴⁴4https://www.cdc.gov/nchs/covid19/health-care-access-and-mental-health.htm As such, producing precise estimates for HPS questions is of use to a large number of stakeholders and requires modeling longitudinal categorical data.

In particular, questions regarding mental health are of interest to researchers and policymakers due to the unprecedented disruptions to daily life, working conditions, and mental health services associated with the pandemic and the concomitant public health measures (World Health Organization,, 2022). Longitudinal studies are important here in order to judge the effect on individuals over time and to determine if changes are “acute” or sustained (Daly and Robinson,, 2022). Also, it is posited that specific demographic groups, such as women and young adults, are disproportionately impacted (Hawes et al.,, 2022; Metin et al.,, 2022). This makes our method suited to the task. Design-based estimates, on the other hand, break down for group comparisons in finer partitions since the standard error estimates will be unstable for subdomains with low sample sizes and many domains will have no sampled units at all. For this reason, we focus our data analysis in Section 5.3 on the aforementioned NCHS question regarding anxiety.

5.2 Simulation study

To assess the performance of our ordinal models, we conduct an empirical simulation study, for the aforementioned response regarding frequency of anxiety over course of the week, which is ranked on an ascending scale of 4 categories. We take as our ground-truth population all $774,882$ unique respondents and $991,412$ total responses in the HPS sample from the continental United States (excluding the District of Columbia).

To mimic a true complex survey, we draw informative subsamples from this population and obtain model-based and direct estimates for each subsample. We compare based on summary measures, including mean square error (MSE), absolute bias, credible interval coverage rate, and the interval score (IS) of Gneiting and Raftery, (2007). The interval score is calculated as

IS_{\alpha}(\ell,u;x)=(u-\ell)+\frac{2}{\alpha}(\ell-x){1}\{x<\ell\}+\frac{2}{\alpha}(x-u){1}\{x>u\}

and penalizes overly wide predictive intervals. A lower score is more desirable.

To make the subsamples informative, we take a probability proportional to size sample following the Poisson method of Brewer et al., (1984) with an expected sample size of $5\%$ of the population. Each member of the population is sampled with a probability $\pi_{i}$ proportional to a size variable

\pi_{i}\propto\exp\{.1w_{i}^{*}+.2\bar{y}_{i}\},

where $w_{i}^{*}$ is the log survey weight of household $i$ , scaled to have zero mean and standard deviation one, and $\bar{y}_{i}$ the scaled mean of all of a household’s responses. The weight assigned to sampled unit $i$ is then $w_{i}=1/\pi_{i}.$

Spatial basis functions are constructed via the process outlined in Section 3.2. Retaining only the eigenvectors corresponding to positive eigenvalues leads to basis functions of dimension $m=20$ . Model covariates include race (white, Black, Asian, or other), sex (male, female), and age categories (one category for 18-25, one for 65+, and the remaining intermediary ages binned into five-year groups, for a total of 10 categories). The design matrix includes no intercept and further drops a reference category from one of the covariates in order for the cutpoints to be identifiable. To capture longitudinal dependence in the nominal model, we include a “synthetic” covariate that indexes a household’s previous response category (or lack thereof). In the ordinal model, this variable is used to vary the cutpoints as described in Section 3.2.

We compute direct estimators, and fit cross-sectional and longitudinal variants of both the Gibbs and VB models for a total of five different estimators. The cross-sectional models are fit independently for each week. For each of the models, $R$ posterior draws (or draws from the variational distribution) of the parameter estimates are taken. For all models, we set $R=1,500$ , and discard $500$ burn-in iterations beforehand in the Gibbs sampler. The estimated probability for a given category $k=1,\ldots,K-1$ is calculated as

\widetilde{\pi}_{itk}^{(r)}=\text{logit}^{-1}(\bm{u}^{\prime}_{itk}\bm{\gamma}^{(r)}-\bm{x}_{i}^{\prime}\bm{\beta}_{k}^{(r)}-\bm{\psi}_{i}\bm{\eta}_{t}^{(r)}).

The full probability vector $\widehat{\bm{\pi}}_{it}=(\widehat{\pi}_{it1},\ldots,\widehat{\pi}_{itK})$ is reconstructed from the stick-breaking representation and estimates for the full population are then generated as

\widehat{y}_{it}^{(r)}\mid\cdot\sim\text{Multinomial}(\bm{n},\bm{\pi}_{i})

for $t=1,\ldots,T$ and $i=1,\ldots,N$ , where $\bm{n}$ is a K-dimensional vector of ones. This population can then be aggregated into cells by taking the average

\widehat{\bar{y}}_{jtk}^{(r)}=\frac{1}{N_{j}}\sum_{i\in\mathcal{S}_{j}}\widehat{y}_{itk}^{(r)}

for each subdomain $j$ and summary metrics are computed over the posterior distributions of the cell estimates.

The results of $100$ simulation runs are summarized in Table 1. Regardless of sampling algorithm, the longitudinal models lead to improved MSE, lower interval scores, and faster runtimes as compared to their cross-sectional counterparts. The Gibbs models also outperform their VB counterparts on MSE and IS, but with the trade-off of a greatly increased runtime. The longitudinal Gibbs model performs best overall in terms of MSE and interval score. Although the direct estimator attains nearly the nominal coverage rate, it does so at the cost of overly wide confidence intervals, hence its high interval score.

Table 1: Overall results for direct- and model-based estimates averaged over 100 simulations in the ordinal case. Runtime is the median of 100 runs. All other values are averages over weeks, areas, and simulations.

Method	MSE	Abs Bias	Cov.	IS	Runtime (s)
Direct	$2.7\times 10^{-3}$	${4.0\times 10^{-3}}$	${95\%}$	$.252$	-
VB-CS	$5.4\times 10^{-4}$	$1.4\times 10^{-2}$	$89\%$	$.115$	15
VB-Lon	$4.1\times 10^{-4}$	$1.4\times 10^{-2}$	$86\%$	$.113$	$43$
Gibbs-CS	$5.1\times 10^{-4}$	$1.4\times 10^{-2}$	$90\%$	$.112$	$99,103$
Gibbs-Lon	$\mathbf{3.7\times 10^{-4}}$	$1.3\times 10^{-2}$	$89\%$	$\mathbf{.099}$	$15,673$

For a more detailed look, Figure 1 shows a time series plot of the MSE for each estimator averaged over areas and simulations. Again, all models outperform the direct estimator, and we see the same pattern. Longitudinal models uniformly reduce the MSE over their cross-sectional counterparts and the Gibbs models uniformly reduce MSE over their VB counterparts, with the longitudinal Gibbs sampler displaying the lowest MSE.

Refer to caption — Figure 1: Time series plot of the MSE for each estimator by week in the ordinal empirical simulation study. MSE is calculated for every combination of week and area then averaged across areas and 100 samples.

As a further diagnostic, Figure 2 plots the ratios of the model MSE for a given time-area combination relative to the direct estimator. The $x=y$ reference line denotes a ratio of one, indicating identical MSE between a model and the direct estimator. Only a few domains have a ratio less than one, with the longitudinal Gibbs model having the fewest.

5.3 Analysis of Household Pulse Anxiety Data

We conduct a data analysis by running the ordinal, longitudinal VB algorithm on the entire HPS dataset for the same anxiety response considered in the simulation study. For population values of the design matrix we use data from the 2020 Census Population Estimates program.⁵⁵5Available at https://www2.census.gov/programs-surveys/popest/datasets/2020-2021/state/asrh/sc-est2021-alldata6.csv The choice of covariates and basis functions is the same as in the simulation studies.

To emphasize our models’ ability to make predictions for small areas, we consider a finer partition into subdomains and present results for a particular combination of race, sex, and age – namely, Asian males aged 36 to 40 – rather than weekly area-level averages as in the simulation. This requires first making estimates for a cross-tabulation of two sex categories, four race categories, 10 age categories and the $K$ response categories across 49 states and 12 weeks. For the given ordinal response with $K=4$ there are $94,080$ domains to estimate. Relying on direct estimators is of little use in this context, where $41\%$ of the domains have zero respondents, only $3\%$ of cells have a sample size greater than 30, and the median sample size for all cells is one. The effect of this sparsity is reflected in Figure 3 which plots the direct estimates for a single response category among Asian males between the ages of 36 and 40. Many states have no estimate and different states are missing estimates in different weeks, meaning uninterrupted longitudinal estimates are generally unobtainable. Where the direct estimator can be calculated, the resulting values tend to be at the boundary, either zero or one.

The standard errors of the direct estimates are calculated using the replicate weights method in the R survey library (Lumley,, 2004). Like the point estimates, these values tend to be extreme and the model estimates improve on them substantially. In fact, for $60\%$ of domains, a valid standard error estimate cannot be produced for the direct estimator, either because the sample size is too low or because all response values in a given cell are identical. For the remaining cells where direct estimate standard errors can be produced, plotting the ratio of model standard errors to those of the direct estimate in Figure 4 demonstrates a reduction in all but nine domains ( $.01\%$ of the total).

Moreover, model estimates in Figure 5 show much smoother trends across both space and time whereas the direct estimates in Figure 3 jump around dramatically. For example, the estimated proportion for Florida is nearly zero in week 8, nearly one in the following week, then back to zero in week 10. The underlying truth is highly unlikely to change so rapidly. A similar pattern can be observed for a number of other states, such as Iowa, which jumps between two extreme values in weeks six and seven, with the additional complication that many weeks also have no estimate so that the trend is unobservable.

Additionally, if we want to compare multiple response categories simultaneously, our unit-level modeling approach allows us to visualize detailed estimates as in Figure 6, which plots estimated trajectories for the proportion of each category in a given demographic group by state. For readability, the states are grouped into the nine geographical Census divisions. The first timepoint of the plots corresponds to April 23, 2020 and the last corresponds to July 21. Over the course of this time period the proportion of respondents reporting no anxiety at all grows and then decreases. In week two, all states exhibit an increase in the proportion of “several days” of anxiety while “nearly every day” stays flat until an uptick occurs, starting in week eight. We can also see that some Census divisions, like the Pacific, are fairly homogeneous. In contrast, the Mountain division shows more variability and Nevada, whose economy is far more dependent on tourism, is estimated to have distinctly more anxiety than its neighbors.

Of further interest are group comparisons, which the direct estimator would be incapable of handling, since it considers each group independently. Figure 7 shows a comparison between the trajectories for men and women in a single Census division. The trends between the two groups are similar across states, but women are more likely to report some frequency of anxiety and less likely to report feeling not at all anxious. This is in line with existing research (Metin et al.,, 2022). Similar patterns hold for the other Census divisions.

6 Discussion

In this paper, we have introduced Bayesian models for both nominal and ordinal survey data and both cross-sectional and longitudinal designs, where a special case of the cross-sectional ordinal model is a conjugate Bayesian, ordinal logistic regression. Alongside the work of Parker et al., (2022) and Vedensky et al., (2023), this covers all of the most common types of survey responses within a unified, unit-level modeling framework. The models are efficient, capture dependencies that area-level models are not able to, and lead to great improvements over direct estimators as shown in the simulation study. The application to HPS phase 1 data regarding mental health symptoms demonstrates the types of fine-grained comparisons we are able to make that would not be possible in an area-level approach.

The longitudinal models in this paper extend readily to a number of large and important surveys that follow rotating panel designs such as the Survey of Income and Program Participation or the National Crime Victimization Survey. Future work remains to mitigate some of the biases inherent to these designs such as rotation group bias (Bailar,, 1975) and to consider other longitudinal designs, possibly with uneven sampling intervals.

Acknowledgements

This article is released to inform interested parties of ongoing research and to encourage discussion. The views expressed on statistical issues are those of the authors and not those of the NSF or U.S. Census Bureau.

Funding

This research was partially supported by the Census Bureau Dissertation Fellowship program and the U.S. National Science Foundation (NSF) under NSF grants NCSE-2215168, and NCSE-2215169.

Appendix

MCMC algorithms for the cross-sectional ordinal and longitudinal ordinal are presented below, as is the VB algorithm for longitudinal, nominal data.. For details on the cross-sectional nominal MCMC and VB algorithms, see Parker et al., (2022).

Gibbs sampling algorithms

Cross-sectional ordinal

1.

Sample $\omega_{ik}|\cdot\sim\text{PG}(\widetilde{w}_{i}n_{ik},\bm{u}_{ik}^{\prime}-\bm{x}_{i}^{\prime}\bm{\beta}-\bm{\psi}_{i}^{\prime}\bm{\eta})$ for $i=1,\ldots,n$ and $k=1,\ldots,k_{i}$
2.

Sample $\sigma^{2}_{\eta}|\cdot\sim\text{IG}(a+m/2,b+\bm{\eta}^{\prime}\bm{\eta}/2)$
3.

Sample $\bm{\beta}|\cdot\sim\text{N}_{p}\left(\bm{\mu}=\left(\bm{X}^{\prime}\bm{\Omega}(G^{\prime}\bm{\gamma}-\bm{\kappa}/\bm{\omega}-\bm{\Psi}\bm{\eta})\right),\bm{\Sigma}=\left(\bm{X}^{\prime}\bm{\Omega}\bm{X}+\frac{1}{\sigma^{2}_{\beta}}I_{p}\right)^{-1}\right)$
4.

Sample $\bm{\gamma}|\cdot\sim\text{N}_{g}\left(\bm{\mu}=\left(\bm{G}^{\prime}\bm{\Omega}(X^{\prime}\bm{\beta}-\bm{\kappa}/\bm{\omega}-\bm{\Psi}\bm{\eta})\right),\bm{\Sigma}=\left(\bm{G}^{\prime}\bm{\Omega}\bm{G}+\frac{1}{\sigma^{2}_{\gamma}}I_{g}\right)^{-1}\right)$
5.

Sample $\bm{\eta}|\cdot\sim\text{N}_{m}\left(\bm{\mu}=\left(\bm{\Psi}^{\prime}\bm{\Omega}(X^{\prime}\bm{\beta}-\bm{\kappa}/\bm{\omega}-\bm{\Psi}\bm{\eta})\right),\bm{\Sigma}=\left(\bm{\Psi}^{\prime}\bm{\Omega}\bm{\Psi}+\frac{1}{\sigma^{2}_{\eta_{1}}}I_{g}\right)^{-1}\right)$

Longitudinal ordinal

Below, $\Psi_{f}$ is the matrix of basis functions for all areas and timepoints, corresponding to $\text{vec}(\bm{\eta})$ .

Sample $\omega_{itk}|\cdot$ for $i=1,\ldots,n_{t}$ , $t=1,\ldots,n_{t}$ , $k=1,\ldots,k_{i}$

\omega_{itk}|\cdot\sim\text{PG}(\widetilde{w}_{it}n_{itk},\bm{u}_{itk}^{\prime}-\bm{x}_{i}^{\prime}\bm{\beta}-\bm{\psi}_{i}^{\prime}\bm{\eta}_{t})

2.

Sample $\phi|\cdot\sim\text{TruncNorm}\left(\frac{\sum_{t=2}^{T}\bm{\eta}_{t}^{\prime}\bm{\eta}_{t-1}}{\sum_{t=2}^{T}\bm{\eta}_{t-1}^{\prime}\bm{\eta}_{t-1}},\frac{\sigma^{2}_{\eta}}{\sum_{t=2}^{T}\bm{\eta}_{t-1}^{\prime}\bm{\eta}_{t-1}},-1,1\right)$
3.

Sample $\sigma^{2}_{\eta_{1}}|\cdot\sim\text{IG}(a+m/2,b+\frac{1}{2}\bm{\eta}_{1}^{\prime}\bm{\eta}_{1})$
4.

Sample $\sigma^{2}_{\eta}|\cdot\sim\text{IG}(a+m(T-1)/2,b+\sum_{t=2}^{T}(\bm{\eta}_{t}-\phi\bm{\eta}_{t-1})^{\prime}(\bm{\eta}_{t}-\phi\bm{\eta}_{t-1})/2)$

Sample $\bm{\beta}:$

\bm{\beta}|\cdot\sim\text{N}_{p}\left(\bm{\mu}=\left(\bm{\widetilde{}}{\bm{X}}^{\prime}\bm{\Omega}(\widetilde{\bm{G}}\bm{\gamma}-\widetilde{\bm{\kappa}}/\bm{\omega}-\widetilde{\bm{\Psi}}_{f}\text{vec}(\bm{\eta}))\right),\bm{\Sigma}=\left(\widetilde{\bm{X}}^{\prime}\bm{\Omega}\widetilde{\bm{X}}+\frac{1}{\sigma^{2}_{\beta}}I_{p}\right)^{-1}\right)

Sample $\bm{\gamma}:$

\bm{\gamma}|\cdot\sim\text{N}_{g}\left(\bm{\mu}=\left(\widetilde{\bm{G}}^{\prime}\bm{\Omega}(\widetilde{\bm{\kappa}}/\bm{\omega}+\widetilde{\bm{X}}^{\prime}\bm{\beta}+\widetilde{\bm{\Psi}}_{f}\text{vec}(\bm{\eta})\right),\bm{\Sigma}=\left(\widetilde{\bm{G}}^{\prime}\bm{\Omega}\widetilde{\bm{G}}+\frac{1}{\sigma^{2}_{\gamma}}I_{g}\right)^{-1}\right)

Sample $\bm{\eta}_{1}:$

\bm{\eta}_{1}|\cdot\sim\text{N}_{m}\left(\bm{\mu}=\left(\widetilde{\bm{\Psi}}_{1}^{\prime}\bm{\Omega}_{1}(\widetilde{\bm{G}}_{1}^{\prime}\bm{\gamma}-\widetilde{\bm{\kappa}}_{1}/\bm{\omega}_{1}-\widetilde{\bm{X}}_{1}\bm{\beta})\right),\bm{\Sigma}=\left(\widetilde{\bm{\Psi}}_{1}^{\prime}\bm{\Omega}_{1}\widetilde{\bm{\Psi}}_{1}+\left(\frac{1}{\sigma^{2}_{\eta_{1}}}+\frac{\phi^{2}}{\sigma^{2}_{\eta}}\right)I_{m}\right)^{-1}\right)

Sample $\bm{\eta}_{t}$ , for $t=2,\ldots,T-1:$

\bm{\eta}_{t}|\cdot\sim\begin{aligned} \text{N}_{p}\Bigg{(}\bm{\mu}&=\bm{\Sigma}\left(\widetilde{\bm{\Psi}}_{t}^{\prime}\bm{\Omega}_{t}(\widetilde{\bm{G}}^{\prime}_{t}\bm{\gamma}-\widetilde{\bm{\kappa}}_{t}/\bm{\omega}_{t}-\widetilde{\bm{X}}_{t}\bm{\beta})+\frac{\phi}{\sigma^{2}_{\eta}}(\bm{\eta}_{t-1}+\bm{\eta}_{t+1})\right),\\ \bm{\Sigma}&=\left(\widetilde{\bm{\Psi}}_{t}^{\prime}\bm{\Omega}_{t}\widetilde{\bm{\Psi}}_{t}+\frac{1+\phi^{2}}{\sigma^{2}_{\eta}}I_{m}\right)^{-1}\Bigg{)}\end{aligned}

Sample $\bm{\eta}_{T}$

\bm{\eta}_{T}|\cdot\sim\text{N}_{p}\left(\bm{\mu}=\bm{\Sigma}\left(\widetilde{\bm{\Psi}}_{T}^{\prime}\bm{\Omega}_{T}(\bm{\widetilde{\bm{G}}}_{T}^{\prime}\bm{\gamma}_{T}-\widetilde{\bm{\kappa}}_{T}/\bm{\omega}_{T}-\widetilde{\bm{X}}_{T}\bm{\beta})+\frac{\phi}{\sigma^{2}_{\eta}}\bm{\eta}_{T-1}\right),\bm{\Sigma}=\left(\widetilde{\bm{\Psi}}_{T}^{\prime}\bm{\Omega}_{T}\widetilde{\bm{\Psi}}_{t}+\frac{1}{\sigma^{2}_{\eta}}I_{m}\right)^{-1}\right)

Longitudinal, nominal VB algorithm

For longitudinal, nominal data, the binary VB algorithm below can be fit $K-1$ times to data that is factored according to the stick-breaking representation (2.3).

Data:

{\bm{C}}=[{\bm{X}},{\bm{\Psi}}]

and

{\bm{\kappa}}

Initialize

\widecheck{\bm{\mu}}_{\eta}

\widecheck{\mu}_{\phi}

\widecheck{\mu}_{\phi^{2}}

\widecheck{\sigma}^{2}_{\phi}

\widecheck{\bm{\Sigma}}_{\eta_{t}}

\widecheck{\sigma}^{2}_{\eta}

and

\widecheck{\xi}_{itk}

for

t=1,\ldots,T

;

i=1,\ldots,n_{t}

;

for $j=1$ until convergence do

\bm{\widecheck{\omega}}\leftarrow\left(\frac{\widetilde{w}_{11}}{2\widecheck{\xi}_{11}}\tanh(\widecheck{\xi}_{11}/2),\ldots,\frac{\widetilde{w}_{n_{T}T}}{2\widecheck{\xi}_{n_{T}T}}\tanh(\widecheck{\xi}_{n_{T}T}/2)\right);

\bm{\widecheck{\Omega}}\leftarrow\text{Diag}\left(\widecheck{\bm{\omega}}\right);

\bm{\widecheck{\mu}}_{\beta}\leftarrow({\bm{X}}^{\prime}\widecheck{\bm{\Omega}}{\bm{X}}+1/\sigma^{2}_{\beta}\bm{I}_{p})^{-1}{\bm{X}}^{\prime}\widecheck{\bm{\Omega}}({\bm{\kappa}}/\widecheck{\bm{\omega}}-{\bm{\Psi}}\widecheck{\bm{\mu}}_{\eta});

\bm{\widecheck{\Sigma}}_{\eta_{1}}\leftarrow({\bm{\Psi}}_{1}^{\prime}\widecheck{\bm{\Omega}}_{1}{\bm{\Psi}}_{1}+\widecheck{\sigma}^{2}_{\eta_{1}}(1+\widecheck{\mu}_{\phi^{2}})\bm{I}_{m})^{-1};

\bm{\widecheck{\mu}}_{\eta_{1}}\leftarrow\widecheck{\bm{\Sigma}}_{\eta_{1}}{\bm{\Psi}}_{1}^{\prime}\widecheck{\bm{\Omega}}_{1}({\bm{\kappa}}_{1}/\widecheck{\bm{\omega}}_{1}-{\bm{X}}_{1}\widecheck{\bm{\mu}}_{\beta})+\widecheck{\bm{\mu}}_{\eta_{2}}\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta_{1}};

\bm{\widecheck{\mu}}_{\eta_{1}}\leftarrow\bm{\widecheck{\mu}}_{\eta_{1}}-\mathbb{E}[\bm{\widecheck{\mu}}_{\eta_{1}}];

for $t=1$ to $T$ do

\bm{\widecheck{\Sigma}}_{\eta_{t}}\leftarrow({\bm{\Psi}}_{t}^{\prime}\widecheck{\bm{\Omega}}_{t}{\bm{\Psi}}_{t}+(1+\widecheck{\mu}_{\phi^{2}})\widecheck{\sigma}^{2}_{\eta}\bm{I}_{m})^{-1};

\bm{\widecheck{\mu}}_{\eta_{t}}\leftarrow\bm{\widecheck{\Sigma}}_{\eta_{t}}{\bm{\Psi}}_{t}^{\prime}\widecheck{\bm{\Omega}}_{t}({\bm{\kappa}}_{t}/\widecheck{\bm{\omega}}_{t}-{\bm{X}}_{t}\widecheck{\bm{\mu}}_{\beta})+\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta}(\bm{\widecheck{\mu}}_{\eta_{t}}+\bm{\widecheck{\mu}}_{\eta_{t+1}});

end for

\bm{\widecheck{\Sigma}}_{\eta_{T}}\leftarrow({\bm{\Psi}}_{T}^{\prime}\widecheck{\bm{\Omega}}_{T}{\bm{\Psi}}_{T}+\widecheck{\sigma}^{2}_{\eta}\bm{I}_{m})^{-1};

\bm{\widecheck{\mu}}_{\eta_{T}}\leftarrow\bm{\widecheck{\Sigma}}_{\eta_{T}}{\bm{\Psi}}_{T}^{\prime}\widecheck{\bm{\Omega}}_{T}({\bm{\kappa}}_{T}/\widecheck{\bm{\omega}}_{T}-{\bm{X}}_{T}\widecheck{\bm{\mu}}_{\beta})+\bm{\widecheck{\mu}}_{\eta_{T-1}}\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta};

\bm{\widecheck{\mu}}_{\eta}\leftarrow(\bm{\widecheck{\mu}}_{\eta_{1}},\ldots,\bm{\widecheck{\mu}}_{\eta_{T}});

\bm{\widecheck{\Sigma}}_{\eta}\leftarrow\text{blockdiag}(\bm{\widecheck{\Sigma}}_{\eta_{1}},\ldots,\bm{\widecheck{\Sigma}}_{\eta_{T}});

\widecheck{\bm{\Sigma}}\leftarrow\left(\text{blockdiag}\left(\frac{1}{\sigma^{2}_{\beta}}I_{q},\bm{\widecheck{\Sigma}}_{\eta}\right)+{\bm{C}}^{\prime}\bm{\widecheck{\Omega}}{\bm{C}}\right)^{-1};

\widecheck{\bm{\mu}}\leftarrow(\widecheck{\bm{\mu}}_{\beta}^{\prime},\widecheck{\bm{\mu}}_{\eta}^{\prime})^{\prime}\leftarrow\widecheck{\bm{\Sigma}}{\bm{C}}^{\prime}{\bm{\kappa}};

m\leftarrow\sum_{t=1}^{T-1}\bm{\widecheck{\mu}}_{\eta_{t}}^{\prime}\bm{\widecheck{\mu}}_{\eta_{t+1}}\big{/}\left(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}})\right);

\widecheck{\sigma}^{2}_{\phi}\leftarrow\widecheck{\sigma}^{2}_{\eta}\big{/}\left(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}})\right);

\ell\leftarrow(-1-m)/\widecheck{\sigma}_{\phi};

u\leftarrow(1-m)/\widecheck{\sigma}_{\phi};

\widecheck{\mu}_{\phi}\leftarrow m-\widecheck{\sigma}_{\phi}\frac{\varphi(u)-\varphi(\ell)}{\Phi(u)-\Phi(\ell)};

\widecheck{\mu}_{\phi^{2}}\leftarrow\widecheck{\mu}_{\phi}^{2}+\widecheck{\sigma}^{2}_{\phi}\left(1-\frac{u\varphi(u)-\ell\varphi(\ell)}{\Phi(u)-\Phi(\ell)}-\left(\frac{\varphi(u)-\varphi(\ell)}{\Phi(u)-\Phi(\ell)}\right)^{2}\right);

\widecheck{\sigma}^{2}_{\eta_{1}}\leftarrow(b+\widecheck{\bm{\mu}}_{\eta_{1}}^{\prime}\widecheck{\bm{\mu}}_{\eta_{1}}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{1}}))/(a+r/2);

\widecheck{\sigma}^{2}_{\eta}\leftarrow(b+(1/2)(\sum_{t=2}^{T}\widecheck{\bm{\eta}}_{1}^{\prime}\widecheck{\bm{\eta}}_{1}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-1}})-2\widecheck{\mu}_{\phi}\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}\widecheck{\bm{\eta}}_{t+1}+

\widecheck{\mu}_{\phi^{2}}(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}}))))/(a+r(T-1)/2);

for $i=1$ to $n$ do

for $t=t_{i},t_{i}+1,t_{i}+2$ do

\widecheck{\xi}_{it}\leftarrow(\widetilde{\bm{C}}_{it}^{\prime}\widecheck{\bm{\Sigma}}{\bm{C}}_{it}+({\bm{C}}_{it}^{\prime}\widecheck{\bm{\mu}})^{2})^{1/2}

end for

Algorithm 3 Variational Bayes algorithm for the longitudinal binary model

References

Agresti, (2010) Agresti, A. (2010). Analysis of Ordinal Categorical Data, volume 656. John Wiley & Sons.
Albert and Chib, (1993) Albert, J. H. and Chib, S. (1993). Bayesian Analysis of Binary and Polychotomous Response Data. Journal of the American Statistical Association, 88(422):669–679.
Albert and Chib, (2001) Albert, J. H. and Chib, S. (2001). Sequential Ordinal Modeling with Applications to Survival Data. Biometrics, 57(3):829–836.
Bailar, (1975) Bailar, B. A. (1975). The effects of rotation group bias on estimates from panel surveys. Journal of the American Statistical Association, 70(349):23–30.
Bauder et al., (2021) Bauder, C., Luery, D., and Szelepka, S. (2021). Small Area Estimation of Health Insurance Coverage in 2010-2019.
Beltrán-Sánchez et al., (2024) Beltrán-Sánchez, M. Á., Martinez-Beneito, M., and Corberán-Vallet, A. (2024). Bayesian modeling of spatial ordinal data from health surveys. Statistics in Medicine.
Binder, (1983) Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. International Statistical Review, 51(3):279.
Bishop, (2006) Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer.
Blei et al., (2017) Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). Variational Inference: A Review for Statisticians. Journal of the American Statistical Association, 112(518):859–877.
Boes and Winkelmann, (2006) Boes, S. and Winkelmann, R. (2006). Ordered Response Models. Allgemeines Statistisches Archiv, 90:167–181.
Bradley et al., (2015) Bradley, J. R., Holan, S. H., and Wikle, C. K. (2015). Multivariate spatio-temporal models for high-dimensional areal data with application to Longitudinal Employer-Household Dynamics. The Annals of Applied Statistics, 9(4):1761 – 1791.
Brewer et al., (1984) Brewer, K., Early, L., and Hanif, M. (1984). Poisson, Modified Poisson and Collocated Sampling. Journal of Statistical Planning and Inference, 10(1):15–30.
Bürkner and Vuorre, (2019) Bürkner, P.-C. and Vuorre, M. (2019). Ordinal Regression Models in Psychology: Tutorial. Advances in Methods and Practices in Psychological Science, 2(1):77–101.
Carter et al., (2024) Carter, J. B., Browning, C. R., Boettner, B., Pinchak, N., and Calder, C. A. (2024). Land-use filtering for nonstationary spatial prediction of collective efficacy in an urban environment. The Annals of Applied Statistics, 18(1).
Daly and Robinson, (2022) Daly, M. and Robinson, E. (2022). Depression and anxiety during COVID-19. The Lancet, 399(10324):518.
Durante and Rigon, (2019) Durante, D. and Rigon, T. (2019). Conditionally Conjugate Mean-Field Variational Bayes for Logistic Models. Statistical Science, 34(3).
Fienberg, (1980) Fienberg, S. E. (1980). The analysis of cross-classified categorical data. Massachusetts Institute of Technology Press, Cambridge and London.
Fienberg, (2007) Fienberg, S. E. (2007). The Analysis of Cross-classified Categorical Data. Springer Science & Business Media.
Gelman, (2007) Gelman, A. (2007). Struggles with Survey Weighting and Regression Modeling. Statistical Science, 22(2).
Gneiting and Raftery, (2007) Gneiting, T. and Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477):359–378.
Hawes et al., (2022) Hawes, M. T., Szenczy, A. K., Klein, D. N., Hajcak, G., and Nelson, B. D. (2022). Increases in depression and anxiety symptoms in adolescents and young adults during the COVID-19 pandemic. Psychol. Med., 52(14):3222–3230.
Hidiroglou and You, (2016) Hidiroglou, M. A. and You, Y. (2016). Comparison of Unit Level and Area Level Small Area Estimators. Survey Methodology, 42:41–61.
Higgs and Hoeting, (2010) Higgs, M. D. and Hoeting, J. A. (2010). A clipped latent variable model for spatially correlated ordered categorical data. Computational Statistics & Data Analysis, 54(8):1999 – 2011.
Horvitz and Thompson, (1952) Horvitz, D. G. and Thompson, D. J. (1952). A Generalization of Sampling Without Replacement from a Finite Universe. Journal of the American Statistical Association, 47(260):663–685.
Hughes and Haran, (2013) Hughes, J. and Haran, M. (2013). Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 75(1):139–159.
Jaakkola and Jordan, (2000) Jaakkola, T. and Jordan, M. I. (2000). Bayesian Parameter Estimation via Variational Methods. Statistics and Computing, 10:25–37.
Jimenez et al., (2023) Jimenez, A., Balamuta, J. J., and Culpepper, S. A. (2023). A Sequential Exploratory Diagnostic Model Using a Pólya-Gamma Data Augmentation Strategy. British Journal of Mathematical and Statistical Psychology, 76(3):513–538.
Johnson et al., (1994) Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994). Continuous Univariate Distributions, volume 1 of Wiley Series in Probability and Statistics. John Wiley & Sons, Nashville, TN, 2 edition.
(29) Kang, J. and Kottas, A. (2024a). Bayesian Nonparametric Risk Assessment in Developmental Toxicity Studies with Ordinal Responses. arXiv preprint arXiv:2408.11803.
(30) Kang, J. and Kottas, A. (2024b). Flexible Bayesian Modeling for Longitudinal Binary and Ordinal Responses. Statistics and Computing, 34(6).
Kullback and Leibler, (1951) Kullback, S. and Leibler, R. A. (1951). On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1):79–86.
Kunihama et al., (2019) Kunihama, T., Halpern, C. T., and Herring, A. H. (2019). Non-parametric Bayes Models for Mixed Scale Longitudinal Surveys. Journal of the Royal Statistical Society Series C: Applied Statistics, 68(4):1091–1109.
Linderman et al., (2015) Linderman, S., Johnson, M. J., and Adams, R. P. (2015). Dependent Multinomial Models Made Easy: Stick-Breaking with the Polya-Gamma Augmentation. In Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc.
Lumley, (2004) Lumley, T. (2004). Analysis of Complex Survey Samples. Journal of Statistical Software, 9(1):1–19. R package verson 2.2.
Machini et al., (2022) Machini, B., Achia, T. N., Chesang, J., Amboko, B., Mwaniki, P., and Kipruto, H. (2022). Cross-sectional Study to Predict Subnational Levels of Health Workers’ Knowledge about Severe Malaria Treatment in Kenya. BMJ Open, 12(1):e058511.
McCullagh, (1980) McCullagh, P. (1980). Regression Models for Ordinal Data. Journal of the Royal Statistical Society: Series B (Methodological), 42(2):109–127.
Metin et al., (2022) Metin, A., Erbiçer, E. S., Şen, S., and Çetinkaya, A. (2022). Gender and COVID-19 related fear and anxiety: A meta-analysis. Journal of Affective Disorders, 310:384–395.
Parker et al., (2022) Parker, P. A., Holan, S. H., and Janicki, R. (2022). Computationally efficient Bayesian Unit-level models for non-Gaussian data Under informative sampling with application to estimation of health insurance coverage. The Annals of Applied Statistics, 16(2):887 – 904.
Parker et al., (2023) Parker, P. A., Holan, S. H., and Janicki, R. (2023). Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure. Journal of Survey Statistics and Methodology.
Polson et al., (2013) Polson, N. G., Scott, J. G., and Windle, J. (2013). Bayesian Inference for Logistic Models Using Pólya-Gamma Latent Variables. Journal of the American Statistical Association, 108(504):1339–1349.
Savitsky and Toth, (2016) Savitsky, T. D. and Toth, D. (2016). Bayesian estimation under informative sampling. Electronic Journal of Statistics, 10(1):1677 – 1708.
Schliep and Hoeting, (2015) Schliep, E. M. and Hoeting, J. A. (2015). Data augmentation and parameter expansion for independent or spatially correlated ordinal data. Computational Statistics & Data Analysis, 90:1–14.
Skinner, (2018) Skinner, C. (2018). Analysis of Categorical Data for Complex Surveys. International Statistical Review, 87(S1).
Skinner, (1989) Skinner, C. J. (1989). Domain means, regression and multivariate analysis. In Skinner, C. J., Holt, D., and Smith, T. M. F., editors, Analysis of Complex Surveys, chapter 2, pages 59–84. Wiley, Chichester.
Sutradhar and Kovacevic, (2000) Sutradhar, B. C. and Kovacevic, M. (2000). Analysing ordinal longitudinal survey data: Generalised estimating equations approach. Biometrika, 87(4):837–848.
Thompson, (2015) Thompson, M. E. (2015). Using Longitudinal Complex Survey Data. Annual Review of Statistics and Its Application, 2(1):305–320.
Tutz, (1990) Tutz, G. (1990). Sequential Item Response Models with an Ordered Response. British Journal of Mathematical and Statistical Psychology, 43(1):39–55.
Tutz, (1991) Tutz, G. (1991). Sequential Models in Categorical Regression. Computational Statistics & Data Analysis, 11(3):275–295.
Tutz et al., (2005) Tutz, G., Simonoff, J., Kateri, M., Lesaffre, E., Loughin, T., Svensson, E., Aguilera, A., Liu, I., and Agresti, A. (2005). The analysis of ordered categorical data: An overview and a survey of recent developments – Discussion. TEST, 14(1):30–73.
Vedensky et al., (2023) Vedensky, D., Parker, P. A., and Holan, S. H. (2023). Bayesian Unit-level Models for Longitudinal Survey Data under Informative Sampling: An Analysis of Expected Job Loss Using the Household Pulse Survey. arXiv preprint arXiv:2304.07897.
World Health Organization, (2022) World Health Organization (2022). Mental Health and COVID-19: Early evidence of the pandemic’s impact. Technical report, World Health Organization.