This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

11affiliationtext: Department of Statistics, University of Missouri22affiliationtext: Department of Statistics, University of California, Santa Cruz33affiliationtext: Research and Methodology Directorate, U.S. Census Bureau

Bayesian Unit-level Modeling of Categorical Survey Data with a Longitudinal Design

Daniel Vedensky Paul A. Parker Scott H. Holan
Abstract

Categorical response data are ubiquitous in complex survey applications, yet few methods model the dependence across different outcome categories when the response is ordinal. Likewise, few methods exist for the common combination of a longitudinal design and categorical data. By modeling individual survey responses at the unit-level, it is possible to capture both ordering information in ordinal responses and any longitudinal correlation. However, accounting for a complex survey design becomes more challenging in the unit-level setting. We propose a Bayesian hierarchical, unit-level, model-based approach for categorical data that is able to capture ordering among response categories, can incorporate longitudinal dependence, and accounts for the survey design. To handle computational scalability, we develop efficient Gibbs samplers with appropriate data augmentation as well as variational Bayes algorithms. Using public-use microdata from the Household Pulse Survey, we provide an analysis of an ordinal response that asks about the frequency of anxiety symptoms at the beginning of the COVID-19 pandemic. We compare both design-based and model-based estimators and demonstrate superior performance for the proposed approaches.

Key words: Bayesian hierarchical model; Bayesian ordinal logistic regression; Informative sample; Longitudinal survey data; Pseudo-likelihood; Small area estimation

footnotetext: To whom correspondence should be addressed. dvedensky@mail.missouri.edu

1 Introduction

In the context of survey data applications, modeling individual records (i.e., “unit-level modeling”) rather than area-level aggregates is often necessary. For ordinal-valued responses, modeling the individual survey responses allows us to capture dependence among the response categories, which is not possible in area-level models that model group averages. Likewise, for longitudinal studies, modeling at the unit-level is necessary in order to link individual responses and capture the serial correlation across a given respondent’s survey responses. In small area estimation applications, unit-level models have the additional benefit of allowing for arbitrary aggregation of data and may increase the precision of estimates (Hidiroglou and You,, 2016). Yet, despite the ubiquity of non-Gaussian survey data collected according to a longitudinal design, there is scarce unit-level methodology for such settings. Notably, there are few existing methods for the combination of ordinal and longitudinal data that are able to incorporate spatio-temporal dependence and account for complex survey designs.

In general, more attention has been paid to binary and nominal responses in the complex survey literature (Skinner,, 2018). In contrast, in the spatial literature for complex surveys, categorical data are typically modeled at the area level, which requires calculating weighted proportions first. When the data follow a natural ordering, it is common to ignore this structure and instead model the ordinal proportions as though they were nominal. For example, Bauder et al., (2021) model multinomial probabilities as multivariate normal after employing a logit transformation. Although it is possible to induce an ordinal structure by modifying the covariance matrix (Agresti,, 2010), this does not capture the dependence a unit-level approach would.

Meanwhile, most existing unit-level methodology for categorical survey data does not fully take into account the survey design or applies only to cross-sectional surveys. For example, Machini et al., (2022) present an ordinal model for spatial data, but disregard the survey design, Beltrán-Sánchez et al., (2024) model ordinal survey data with spatial dependence, but rely on the restrictive assumption that all design information is available in the form of covariates. Methods that do account for a longitudinal survey design generally assume Gaussian, binary, or occasionally nominal data. One exception is the longitudinal model of Kunihama et al., (2019) that is able to handle multivariate responses and incorporates the weights into a Bayesian nonparametric (BNP) sampling algorithm. However, the response types considered are continuous, binary, or nominal, but not ordinal. Sutradhar and Kovacevic, (2000) consider ordinal, longitudinal, complex design, but use GEE which does not easily extend to the spatial setting or allow for straightforward uncertainty quantification.

In this paper, we propose an ordinal model within a unit-level modeling framework that incorporates survey weights, spatial, and temporal dependence. We do so by extending the work of Parker et al., (2022) and Vedensky et al., (2023). The former models cross-sectional binary and nominal complex survey data at the unit level while the latter builds on this approach to cover longitudinal data with Gaussian or binary responses. By employing a particular stick-breaking representation of a multinomial likelihood (Linderman et al.,, 2015)—also known as a continuation ratio factorization (Fienberg,, 2007)—along with Pólya-Gamma data augmentation (Polson et al.,, 2013), we obtain conjugate sampling algorithms, with Bayesian, ordinal logistic regression as a special case. When data are especially high-dimensional, MCMC algorithms may prove insufficient, and we introduce Variational Bayes (VB) algorithms (Blei et al.,, 2017) as an alternative to Gibbs sampling for such applications.

Recent work has made use of this combination of the continuation ratio factorization and Pólya-Gamma data augmentation. Kang and Kottas, 2024a develop a BNP model for cross-sectional data and Kang and Kottas, 2024b extend this to functional, longitudinal data, modeling continuous curves. In the psychological literature, Jimenez et al., (2023) cover the special case of an item-response model with ordinal values. However, modifying these approaches to handle complex survey data with a spatial component is nontrivial, and to our knowledge, the link to Bayesian logistic regression has not been pointed out elsewhere.

To illustrate the efficacy of our proposed methodology, we apply our models to data from the Household Pulse Survey (HPS), a high-dimensional complex survey that follows a longitudinal design. This includes an empirical simulation study and a full data analysis.

This article proceeds as follows. In Section 2 we describe our general approach to modeling ordinal data. Section 3 describes how the ordinal and nominal models may be extended for use with survey data. In Section 4 variational Bayes algorithms for each of the models are introduced. We then present empirical results in Section 5 and conclude with a discussion in Section 6.

2 Bayesian Ordinal Logistic Regression

Let 𝒰={1,,N}\mathcal{U}=\{1,\ldots,N\} index a population, from which we observe a sample 𝒮𝒰\mathcal{S}\subseteq\mathcal{U} of size |𝒮|=n|\mathcal{S}|=n. Associated with each element of the sample are a set of covariates 𝒙i=(xi1,,xiq)\bm{x}_{i}=(x_{i1},\ldots,x_{iq}) as well as an ordinal-valued response yi{1,,K}y_{i}\in\{1,\ldots,K\}, with K>2K>2 possible categories. We also define the one-hot encoded vector representation of the response, 𝒚~i=(𝟙(yi=1),𝟙(yi=2),,𝟙(yi=K))\widetilde{\bm{y}}_{i}=(\mathbbm{1}(y_{i}=1),\mathbbm{1}(y_{i}=2),\ldots,\mathbbm{1}(y_{i}=K)), where 𝟙()\mathbbm{1}(\cdot) denotes the indicator function. The response vectors for the entire sample may be collated into the nKnK-dimensional vector 𝒚~=(𝒚~1,𝒚~2,,𝒚~n).\widetilde{\bm{y}}=(\widetilde{\bm{y}}_{1},\widetilde{\bm{y}}_{2},\ldots,\widetilde{\bm{y}}_{n}).

It is common to model such data with a cumulative model (McCullagh,, 1980), where latent, ordered cutpoints 0γ1γK10\leq\gamma_{1}\leq\cdots\leq\gamma_{K-1} are introduced and it is assumed

P(yik)=F(γk𝒙i𝜷),P(y_{i}\leq k)=F(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}),

with FF being any strictly monotonic distribution function and 𝜷{\bm{\beta}} a set of regression coefficients. This implies the probability of observing a given category to be

P(yi=k)=F(γk𝒙i𝜷)F(γk1𝒙i𝜷).P(y_{i}=k)=F(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta})-F(\gamma_{k-1}-\bm{x}_{i}^{\prime}\bm{\beta}).

In the Bayesian setting, Albert and Chib, (1993) propose a popular data augmentation algorithm for cumulative, ordinal probit regression. They take FF to be the standard normal cdf Φ\Phi and add Gaussian latent variables to produce an easily implementable Gibbs sampler. This is the most widely used Bayesian formulation for ordinal models (Agresti,, 2010), and has been adapted to spatial models, as well (Higgs and Hoeting,, 2010; Schliep and Hoeting,, 2015; Carter et al.,, 2024).

Tutz, (1990, 1991) specify an alternative latent variable formulation for a sequential ordinal model, which has received relatively less attention in the literature. However, a number of authors have noted benefits, such as more flexible modeling of response curves and the possibility for category-specific covariates (Tutz et al.,, 2005; Bürkner and Vuorre,, 2019; Boes and Winkelmann,, 2006; Kang and Kottas, 2024a, ). In contrast to the cumulative approach, the sequential approach assumes no ordering for 𝜸=(γ1,,γK1)\bm{\gamma}=(\gamma_{1},\ldots,\gamma_{K-1}), and that the conditional probability for observing a given category k<Kk<K is

P(yi=k|𝜷,𝜸)=F(γk𝒙iβ)j<k(1F(γj𝒙i𝜷))P(y_{i}=k|\bm{\beta},\bm{\gamma})=F(\gamma_{k}-\bm{x}_{i}^{\prime}\beta)\prod_{j<k}(1-F(\gamma_{j}-\bm{x}_{i}^{\prime}\bm{\beta})) (2.1)

and

P(yi=K|𝜷,𝜸)=j=1K1(1F(γj𝒙i𝜷))P(y_{i}=K|\bm{\beta},\bm{\gamma})=\prod_{j=1}^{K-1}(1-F(\gamma_{j}-\bm{x}_{i}^{\prime}\bm{\beta})) (2.2)

for the last category. In this way, the probability of observing a particular category may be viewed as a sequential process, where category kk can only be attained if all previous categories, j<kj<k are not observed. The final category, KK, is observed only if all other categories fail to be attained.

This formulation implies an independent binomial model for each response category, conditional on 𝜷\bm{\beta} and 𝜸\bm{\gamma}, via the continuation-ratio parametrization of Fienberg, (1980), which is also referred to as a “stick-breaking” representation (Linderman et al.,, 2015). Specifically, a multinomial pmf parameterized by counts nn and a vector of probabilities 𝝅i=(πi1,,πiK)\bm{\pi}_{i}=(\pi_{i1},\ldots,\pi_{iK}), can be expressed as the product

Multinomial(𝒚~i|ni,𝝅i)=k=1K1Binomial(y~ik|nik,π~ik),\text{Multinomial}({\widetilde{\bm{y}}}_{i}|n_{i},\bm{\pi}_{i})=\prod_{k=1}^{K-1}\text{Binomial}(\widetilde{y}_{ik}|n_{ik},\widetilde{\pi}_{ik}), (2.3)

where πik=π~ikj<k(1π~ij)\pi_{ik}=\widetilde{\pi}_{ik}\prod_{j<k}(1-\widetilde{\pi}_{ij}) and nik=nij<kyjn_{ik}=n_{i}-\sum_{j<k}y_{j} for k=1,,K1k=1,\ldots,K-1. That is, if we let πik=P(yi=k|𝜷,𝜸)\pi_{ik}=P(y_{i}=k|\bm{\beta},\bm{\gamma}) and π~ik=F(γk𝒙i𝜷)\widetilde{\pi}_{ik}=F(\gamma_{k}-\bm{x}_{i}\bm{\beta}), with FF the inverse logit function, we have that (2.1) and (2.2) define a multinomial likelihood over the ordinal categories.

Placing Gaussian priors on 𝜷\bm{\beta} and 𝜸\bm{\gamma}, we may then specify the full ordinal logistic regression model hierarchically as a Bayesian GLM

p(𝒚~|𝜷)\displaystyle p(\bm{\widetilde{y}}|\bm{\beta}) =i𝒮k=1K1Binomial(y~ik|nik,π~ik)\displaystyle=\prod_{i\in\mathcal{S}}\prod_{k=1}^{K-1}\text{Binomial}(\widetilde{y}_{ik}|n_{ik},\widetilde{\pi}_{ik})
logit(π~ik)\displaystyle\text{logit}(\widetilde{\pi}_{ik}) =γk𝒙i𝜷\displaystyle=\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}
𝜷\displaystyle\bm{\beta} Nq(𝟎,σβ2Iq)\displaystyle\sim\text{N}_{q}(\bm{0},\sigma^{2}_{\beta}\text{I}_{q}) (2.4)
𝜸\displaystyle\bm{\gamma} NK1(𝟎,σγ2IK1)\displaystyle\sim\text{N}_{K-1}(\bm{0},\sigma^{2}_{\gamma}I_{K-1})
σβ,σγ\displaystyle\sigma_{\beta},\sigma_{\gamma} >0.\displaystyle>0.

For nominal responses, where there is no ordering of response categories, model (2) simplifies to estimating logit(π~i)=𝒙i𝜷k\text{logit}(\widetilde{\pi}_{i})=\bm{x}_{i}\bm{\beta}_{k} with category-specific coefficients, 𝜷k\bm{\beta}_{k} and with no need for cutpoints γk\gamma_{k} (Parker et al.,, 2022)

p(𝒚~|𝜷)\displaystyle p(\bm{\widetilde{y}}|\bm{\beta}) =i𝒮k=1K1Binomial(y~ik|nik,π~ik)\displaystyle=\prod_{i\in\mathcal{S}}\prod_{k=1}^{K-1}\text{Binomial}(\widetilde{y}_{ik}|n_{ik},\widetilde{\pi}_{ik})
logit(π~ik)\displaystyle\text{logit}(\widetilde{\pi}_{ik}) =𝒙i𝜷k\displaystyle=\bm{x}_{i}^{\prime}\bm{\beta}_{k}
𝜷k\displaystyle\bm{\beta}_{k} Nq(𝟎,σβ2Iq)\displaystyle\sim\text{N}_{q}(\bm{0},\sigma^{2}_{\beta}\text{I}_{q}) (2.5)
σβ\displaystyle\sigma_{\beta} >0.\displaystyle>0.

Because both (2) and (2) have logistic likelihoods, it is possible to introduce latent variables that follow a Pólya-Gamma (PG) distribution (Polson et al.,, 2013) to obtain computationally efficient Gibbs samplers. In the case of (2) this yields a conjugate sampler for logistic ordinal regression.

A random variable XX is said to be distributed PG(b,c)(b,c) with shape parameter b>0b>0 and scale parameter cc\in\mathbb{R} if it is equal in distribution to

12π2k=1gk(k1/2)2+c2/(4π2),\frac{1}{2\pi^{2}}\sum_{k=1}^{\infty}\frac{g_{k}}{(k-1/2)^{2}+c^{2}/(4\pi^{2})},

where gkind.Gamma(b,1)g_{k}\overset{ind.}{\sim}\text{Gamma}(b,1). Polson et al., (2013) also show that

𝔼[X]=b2ctanh(c/2)\mathbb{E}[X]=\frac{b}{2c}\tanh(c/2) (2.6)

and derive the integral identity,

eaλ(1+eλ)b=2beκλ0eωλ2/2p(ω)𝑑ω,\frac{e^{a\lambda}}{(1+e^{\lambda})^{b}}=2^{-b}e^{\kappa\lambda}\int_{0}^{\infty}e^{-\omega\lambda^{2}/2}p(\omega)d\omega, (2.7)

where κ=ab/2\kappa=a-b/2 and ωPG(b,0)\omega\sim PG(b,0). In the right-hand side of (2.7), after conditioning on ω\omega, we have a Gaussian density in terms of λ\lambda. If we condition on λ\lambda, then we have a PG density in ω\omega. Therefore, sampling the model only requires alternating between Gaussian and PG random draws.

Notation for the full sampling algorithm is greatly simplified if, following Albert and Chib, (2001)’s algorithm for probit regression, we augment the covariate vector for each response such that if yi=ky_{i}=k, then 𝒙~ij=(0,,1,,0,𝒙i)\bm{\widetilde{x}}_{ij}=(0,\ldots,1,\ldots,0,-\bm{x}_{i}^{\prime}), for j=1,,min(k,K1)j=1,\ldots,\min(k,K-1), where the 11 is in the jjth position. Further, let 𝑿~i=(𝒙~i1,,𝒙~ij)\bm{\widetilde{X}}_{i}=(\widetilde{\bm{x}}_{i1},\ldots,\widetilde{\bm{x}}_{ij})^{\prime} be the new covariate matrix for respondent ii and

𝑿~=(𝑿~1𝑿~n)\bm{\widetilde{X}}=\begin{pmatrix}\widetilde{\bm{X}}_{1}\\ \vdots\\ \widetilde{\bm{X}}_{n}\end{pmatrix} (2.8)

the augmented design matrix for the entire sample. Letting 𝜹=(𝜸,𝜷),\bm{\delta}=(\bm{\gamma}^{\prime},\bm{\beta}^{\prime})^{\prime}, we then have

logit1(γk𝒙i𝜷)=logit1(𝒙~ik𝜹)=exp(𝒙~i𝜹)1+exp(𝒙~i𝜹),\text{logit}^{-1}(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta})=\text{logit}^{-1}(\widetilde{\bm{x}}_{ik}^{\prime}\bm{\delta})=\frac{\exp(\widetilde{\bm{x}}_{i}^{\prime}\bm{\delta})}{1+\exp(\widetilde{\bm{x}}_{i}^{\prime}\bm{\delta})},

so that the integral identity (2.7) allows us to write the multinomial data model in (2) as

i𝒮k=1K112eκik𝒙~i𝜹0eωik(𝒙~i𝜹)2/2p(ωik)𝑑ωik,\prod_{i\in\mathcal{S}}\prod_{k=1}^{K-1}\frac{1}{2}e^{\kappa_{ik}\widetilde{\bm{x}}_{i}^{\prime}\bm{\delta}}\int_{0}^{\infty}e^{-\omega_{ik}(\widetilde{\bm{x}}_{i}^{\prime}\bm{\delta})^{2}/2}p(\omega_{ik})d\omega_{ik},

with κik=yik1/2.\kappa_{ik}=y_{ik}-1/2.

Assuming the prior distribution 𝜹N(𝒅,𝑫)\bm{\delta}\sim N(\bm{d},\bm{D}), a Gibbs sampler for the model requires iterating over only two steps

  1. 1.

    ωik|𝜹PG(1,𝒙~𝒊𝒌𝜹)\omega_{ik}|\bm{\delta}\sim\text{PG}(1,\bm{\widetilde{x}_{ik}}^{\prime}\bm{\delta}) (for i=1,,ni=1,\ldots,n and k=1,,kik=1,\ldots,k_{i})

  2. 2.

    𝜹|𝛀N(𝝁δ,𝚺δ)\bm{\delta}|\bm{\Omega}\sim\text{N}(\bm{\mu}_{\delta},\bm{\Sigma}_{\delta}),

where 𝚺δ=(𝑫1+𝑿~𝛀𝑿~)1\bm{\Sigma}_{\delta}=(\bm{D}^{-1}+\bm{\widetilde{X}^{\prime}\Omega\widetilde{X}})^{-1} and 𝝁δ=𝚺δ(𝑿~𝜿~+𝑫1𝒅)\bm{\mu}_{\delta}=\bm{\Sigma}_{\delta}(\bm{\widetilde{X}}^{\prime}\widetilde{\bm{\kappa}}+\bm{D}^{-1}\bm{d}) with 𝝎=(ω11,ω12,,ωnkn)\bm{\omega}=(\omega_{11},\omega_{12},\ldots,\omega_{nk_{n}}) and 𝛀=diag(𝝎).\bm{\Omega}=\text{diag}(\bm{\omega}). The vector 𝜿~\widetilde{\bm{\kappa}} is constructed by the same process as (2.8), where 𝜿~i=(κi1,,κij)\widetilde{\bm{\kappa}}_{i}=(\kappa_{i1},\ldots,\kappa_{ij})^{\prime} and 𝜿~=(κ~1,,κ~n),\widetilde{\bm{\kappa}}=(\widetilde{\kappa}_{1}^{\prime},\ldots,\widetilde{\kappa}_{n}^{\prime})^{\prime},

3 Categorical Models for Complex Survey Data

In order to apply the above models to unit-level survey data, it is important to consider a number of additional factors. First, complex surveys often collect data according to a design that may not be known to the data modeler (Gelman,, 2007). Even if the design is known, factors such as non-response, attrition (in the case of longitudinal surveys) (Thompson,, 2015), and informative sampling (Parker et al.,, 2023) may cause bias. These complications must be accounted for through proper use of survey weights. Large-scale surveys are also likely to exhibit spatial dependence and it is common to produce predictions for spatial (or demographic) domains with low sample sizes (i.e., small area estimation). Additionally, many surveys are conducted repeatedly over time, or employ a longitudinal structure, where the same people or households are re-interviewed, leading to temporal correlation. The Bayesian hierarchical modeling framework allows us to address each of these types of dependence by extending the data and process model components of (2).

3.1 Complex survey design

In area-level models, the survey design is incorporated into the direct estimates being modeled. For instance, suppose our population 𝒰\mathcal{U} can be partitioned into mm disjoint areas 𝒰j\mathcal{U}_{j} such that 𝒰=j=1m𝒰j\mathcal{U}=\bigcup_{j=1}^{m}\mathcal{U}_{j}. In a complex survey, a sample is taken according to known selection probabilities πij=P(i𝒮j)\pi_{ij}=P(i\in\mathcal{S}_{j}) and weights are defined as wij=1/πijw_{ij}=1/\pi_{ij}. To fit an area-level model, a direct estimator such as the Horvitz-Thompson estimator for the mean (Horvitz and Thompson,, 1952)

y¯^j=1|𝒰j|i𝒮jwijyij\widehat{\bar{y}}_{j}=\frac{1}{|\mathcal{U}_{j}|}\sum_{i\in\mathcal{S}_{j}}w_{ij}y_{ij} (3.1)

is calculated for each area and treated as the response variable. The inclusion of survey weights in (3.1) reflects the survey design, hence there is no bias due to an unrepresentative sample.

Unit-level models, on the other hand, are fit to individual survey responses, so incorporating survey weights becomes less straightforward. Parker et al., (2023) provide an overview of remedies for informative sampling/complex design in this setting. In particular, the sequential ordinal regression model readily fits into the pseudo-likelihood framework (Binder,, 1983; Skinner,, 1989), in which the likelihood contribution of each unit is exponentiated by its survey weight,

i𝒮f(yi|𝜽)wi.\prod_{i\in\mathcal{S}}f(y_{i}|\bm{\theta})^{w_{i}}.

Savitsky and Toth, (2016) show that in the Bayesian case, a pseudo-likelihood combined with a prior density p(𝜽)p(\bm{\theta}), leads to a valid pseudo-posterior

p(𝜽|𝒚,𝒘~)i𝒮f(yi|𝜽)w~ip(𝜽)p(\bm{\theta}|\bm{y},\bm{\widetilde{w}})\propto\prod_{i\in\mathcal{S}}f(y_{i}|\bm{\theta})^{\widetilde{w}_{i}}p(\bm{\theta})

provided that the original survey weights are rescaled to w~i=nwi/i𝒮wi\widetilde{w}_{i}=nw_{i}/\sum_{i\in\mathcal{S}}w_{i} in order to ensure w~i=n\sum\widetilde{w}_{i}=n.

In the case of (2), introducing a pseudo-likelihood means that the binomial pmf in the data model, which is parameterized by log odds F(γk𝒙i𝜷)F(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}), becomes proportional to

[(F(γk𝒙i𝜷))yik(1F(γk𝒙i𝜷))1yik]w~i\displaystyle[(F(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}))^{y_{ik}}(1-F(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}))^{1-y_{ik}}]^{\widetilde{w}_{i}} =(exp(γk𝒙i𝜷))yikw~i(1+exp(γk𝒙i𝜷))nikw~i,\displaystyle=\frac{(\exp(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}))^{y_{ik}\widetilde{w}_{i}}}{(1+\exp(\gamma_{k}-\bm{x}_{i}^{\prime}\bm{\beta}))^{n_{ik}\widetilde{w}_{i}}},

which is the integral identity (2.7) with a=yiw~ia=y_{i}\widetilde{w}_{i} and b=niw~ib=n_{i}\widetilde{w}_{i}. Consequently, sampling again proceeds by drawing iteratively from PG and Gaussian distributions, but with the PG shape parameter now containing a term for the survey weights so that ωik|𝜹PG(w~inik,𝒙~ik𝜹)\omega_{ik}|\bm{\delta}\sim\text{PG}(\widetilde{w}_{i}n_{ik},\widetilde{\bm{x}}^{\prime}_{ik}\bm{\delta}) in the first step.

3.2 Spatio-temporal dependence

Survey data often contain spatial and temporal information. If the spatial domain of interest is especially fine-grained, then many areas will have small sample sizes and we would like to leverage spatial dependence to “borrow strength.” This can be achieved by including an area-level random effect within a unit-level model. Parker et al., (2022) propose Bayesian hierarchical models for cross-sectional data with binary, count, and categorical responses. They introduce an mm-dimensional vector of areal random effects 𝜼Nm(𝟎,ση2Im)\bm{\eta}\sim N_{m}(\bm{0},\sigma^{2}_{\eta}I_{m}), which retain conjugacy with the Pólya-Gamma data augmentation scheme introduced above. To induce a more explicit spatial dependence structure, they take 𝝍i\bm{\psi}_{i} to be a set of spatial basis functions, constructed as a subset of the eigenvectors of the adjacency matrix for all states. In our model, we retain only the mm columns corresponding to positive eigenvalues (and hence, positive spatial dependence), but for higher-resolution geographies further truncation can be performed to achieve greater dimension reduction. Thus, the n×mn\times m matrix of basis functions is defined as

𝚿=(𝝍1𝝍n),\bm{\Psi}=\begin{pmatrix}\bm{\psi}_{1}^{\prime}\\ \vdots\\ \bm{\psi}_{n}^{\prime}\end{pmatrix},

with 𝝍i\bm{\psi}_{i} the basis function evaluated at the area containing unit ii. This construction may be viewed as a special case of the Moran’s I basis functions (Hughes and Haran,, 2013; Bradley et al.,, 2015). Incorporating such a random effect into (2) and (2) leads us to set the linear predictor to be logit(π~ik)=γk(𝒙i𝜷+𝝍i𝜼).\text{logit}(\widetilde{\pi}_{ik})=\gamma_{k}-(\bm{x}_{i}\bm{\beta}+\bm{\psi}_{i}\bm{\eta}).

Many complex surveys follow a longitudinal design wherein the same respondents (or some subset of them) are re-interviewed over time. Two types of dependence arise in this setting: within-unit responses are correlated, as are the aggregate-level trends. For longitudinal designs where the survey is repeated at regular intervals, Vedensky et al., (2023) present models for Gaussian, binary and count data that handle both types of correlation. For discrete responses, the within-unit correlation is captured by constructing a categorical covariate that indexes the value (or absence) of the respondent’s previous response. The aggregate-level dependence is captured by adding an AR(1) structure onto the areal random effect so that the process model follows

𝜼t|𝜼t1,ϕ,ση2\displaystyle\bm{\eta}_{t}|\bm{\eta}_{t-1},\phi,\sigma^{2}_{\eta} Nm(ϕ𝜼t1,ση2Im), for t=2,,T\displaystyle\sim\text{N}_{m}(\phi\bm{\eta}_{t-1},\sigma^{2}_{\eta}I_{m}),\qquad\text{ for }t=2,\ldots,T (3.2)
𝜼1|ση2\displaystyle\bm{\eta}_{1}|\sigma^{2}_{\eta} Nm(𝟎,ση2Im),\displaystyle\sim\text{N}_{m}(\bm{0},\sigma^{2}_{\eta}I_{m}), (3.3)

where ϕUnif(1,1)\phi\sim\text{Unif}(-1,1) functions as an autoregressive parameter, ση2,ση12ind.IG(a,b)\sigma^{2}_{\eta},\sigma^{2}_{\eta_{1}}\overset{ind.}{\sim}IG(a,b) are the prior random effect variances, and a,b,σβ>0a,b,\sigma_{\beta}>0 are fixed hyperparameters.

We note that a survey may also follow a repeated, cross-sectional design, where the entire sample is refreshed at each time point. In such a survey, there is no within-respondent correlation and the random effects structure (3.2)–(3.3) alone is sufficient for capturing the design.

Combining the above modifications, the full spatio-temporal, ordinal model can be specified as

𝒚|𝜷,𝜼,𝜸\displaystyle\bm{y}|\bm{\beta},\bm{\eta},\bm{\gamma} t=1Ti𝒮tk=1K1Binom(yitk|nitk,π~itk)w~it\displaystyle\propto\prod_{t=1}^{T}\prod_{i\in\mathcal{S}_{t}}\prod_{k=1}^{K-1}\text{Binom}(y_{i{t}k}|n_{i{t}k},\widetilde{\pi}_{i{t}k})^{\widetilde{w}_{i{t}}}
logit(π~itk)\displaystyle\text{logit}(\widetilde{\pi}_{i{t}k}) =𝒖itk𝜸𝒙i𝜷𝝍i𝜼t\displaystyle={\bm{u}_{itk}^{\prime}\bm{\gamma}}-\bm{x}_{i}^{\prime}\bm{\beta}-\bm{\psi}_{i}\bm{\eta}_{t}
𝜸\displaystyle\bm{\gamma} Ng(0,σγ2Ig)\displaystyle\sim\text{N}_{g}(0,\sigma^{2}_{\gamma}I_{g})
𝜼t|𝜼t1,ϕ,ση2\displaystyle{\bm{\eta}_{t}|\bm{\eta}_{t-1},\phi,\sigma^{2}_{\eta}\phantom{.}} Nm(ϕ𝜼t1,ση2Im),t=2,,T\displaystyle\sim\text{N}_{m}(\phi\bm{\eta}_{t-1},\sigma^{2}_{\eta}I_{m}),\hskip 5.69054ptt=2,\ldots,T
𝜼1|ση2\displaystyle{\bm{\eta}_{1}|\sigma^{2}_{\eta}\phantom{.}} Nm(𝟎,ση2Im)\displaystyle\sim{\text{N}_{m}(\bm{0},\sigma^{2}_{\eta}I_{m})}
𝜷\displaystyle\bm{\beta} Nq(𝟎,σβ2Iq)\displaystyle\sim\text{N}_{q}(\bm{0},\sigma^{2}_{\beta}I_{q})
ϕ\displaystyle{\phi}\phantom{.} Unif(1,1)\displaystyle\sim\text{Unif}(-1,1)
ση2\displaystyle\sigma^{2}_{\eta} IG(a,b),\displaystyle\sim\text{IG}(a,b),

where nitk=nitj<kyitjn_{itk}=n_{it}-\sum_{j<k}y_{itj} and π~itk=πitk/(1j<kπitj)\widetilde{\pi}_{itk}=\pi_{itk}/(1-\sum_{j<k}\pi_{itj}). One additional feature of the model is that the cutpoints are made to vary by time and according to a subject’s previous response. Therefore, there are g=(K1)+(T1)(K+1)(K1)g=(K-1)+(T-1)*(K+1)*(K-1) cutpoints because subjects in the first time point have no previous response, whereas for the remaining T1T-1 weeks, there are K+1K+1 possible values for the previous response. Each of these scenarios requires K1K-1 cutpoints. The indicator vector 𝒖itk\bm{u}_{itk} picks out the appropriate cutpoint for a response.

For unordered data, a similar nominal model can be fit as K1K-1 independent binomial models, but with no cutpoints and with category-specific fixed and random effects

𝒚|𝜷,𝜼t=1Ti𝒮tk=1K1Binomial(yitk|nitk,π~itk)w~itlogit(π~itk)=𝒙i𝜷k+𝝍i𝜼tk𝜼tk|𝜼(t1)k,ϕk,σηk2Nm(ϕk𝜼(t1)k,σηk2Im),t=2,,T;k=1,K1𝜼1k|ση1k2Nm(𝟎,ση1k2Im)k=1,,K1𝜷kNq(𝟎,σβ2Iq)k=1,,K1ϕkUnif(1,1)ση1k2,σηk2,ind.IG(a,b)σβ,a,b>0.\displaystyle\begin{split}\bm{y}|\bm{\beta},\bm{\eta}&\propto\prod_{t=1}^{T}\prod_{i\in\mathcal{S}_{t}}\prod_{k=1}^{K-1}\text{Binomial}(y_{itk}|n_{itk},\widetilde{\pi}_{itk})^{\widetilde{w}_{it}}\\ \text{logit}(\widetilde{\pi}_{itk})&=\bm{x}_{i}^{\prime}\bm{\beta}_{k}+\bm{\psi}_{i}^{\prime}\bm{\eta}_{tk}\\ \bm{\eta}_{tk}|\bm{\eta}_{(t-1)k},\phi_{k},\sigma^{2}_{\eta k}&\sim\text{N}_{m}(\phi_{k}\bm{\eta}_{(t-1)k},\sigma^{2}_{\eta k}I_{m}),\qquad t=2,\ldots,T;\phantom{--}k=1,\ldots K-1\\ \bm{\eta}_{1k}|\sigma^{2}_{\eta_{1}k}&\sim\text{N}_{m}(\bm{0},\sigma^{2}_{\eta_{1}k}I_{m})\qquad k=1,\ldots,K-1\\ \bm{\beta}_{k}&\sim\text{N}_{q}(\bm{0},\sigma^{2}_{\beta}I_{q})\qquad k=1,\ldots,K-1\\ \phi_{k}&\sim\text{Unif}(-1,1)\\ \sigma^{2}_{\eta_{1}k},\sigma^{2}_{\eta k},&\overset{ind.}{\sim}IG(a,b)\\ \sigma_{\beta},a,b&>0.\end{split}

In all of the above models, the latent processes and priors are either conditionally conjugate with the Pólya-Gamma latent variables or produce known conditional distributions. The resulting Gibbs samplers are thereby efficient and straightforward to implement. Full details of the sampling algorithms and derivations of full conditionals are included in the Appendix.

4 Variational Bayes

Despite the conjugacy in the above models, the Gibbs samplers may be inadequate for extremely high-dimensional problems, which require sampling a large number of latent variables and random effects (e.g., large sample sizes and fine-scaled geographies). In this case, we also propose a set of variational Bayes algorithms (Bishop,, 2006; Blei et al.,, 2017) as an alternative to MCMC. These algorithms are much faster albeit at the cost of approximating the posterior distribution rather than sampling from the exact posterior.

Instead of sampling sequentially, VB methods minimize the Kullback-Leibler divergence (Kullback and Leibler,, 1951)

KL(q(𝜽)p(𝜽|𝒙))=Θq(𝜽)logq(𝜽)p(𝜽|𝒙)d𝜽.\text{KL}(q(\bm{\theta})\mid\mid p(\bm{\theta}|\bm{x}))=\int_{\Theta}q(\bm{\theta})\log\frac{q(\bm{\theta})}{p(\bm{\theta}|\bm{x})}d\bm{\theta}.

between the true posterior p(θ|x)p(\theta|x) and an approximation q(θ)q(\theta) selected from a given class of distributions 𝒬\mathcal{Q}. That is, an optimal approximation is selected as

q(𝜽)=argminq(𝜽)𝒬KL(q(𝜽)p(𝜽𝒙)).q^{*}(\bm{\theta})=\mathop{\arg\min}\limits_{q(\bm{\theta})\in\mathcal{Q}}\text{KL}(q(\bm{\theta})\mid\mid p(\bm{\theta}\mid\bm{x})).

A common choice for 𝒬\mathcal{Q} is the mean-field variational family (Blei et al.,, 2017) which consists of all densities qq that can be factored into mutually independent “variational factors” qj(θj)q_{j}(\theta_{j}) for each of JJ latent variables so that

q(𝜽)=j=1Jqj(θj).q(\bm{\theta})=\prod_{j=1}^{J}q_{j}(\theta_{j}).

Since the KL divergence is not necessarily computable, the optimization problem may be reformulated as a maximization with respect to the evidence lower bound (ELBO) (Blei et al.,, 2017)

ELBO(𝜽)=𝔼[logp(𝜽,𝒙)]𝔼[logq(𝜽)].\text{ELBO}(\bm{\theta})=\mathbb{E}[\log p(\bm{\theta},\bm{x})]-\mathbb{E}[\log q(\bm{\theta})].

Durante and Rigon, (2019), building on earlier work of Jaakkola and Jordan, (2000), show that logistic models augmented by Pólya-Gamma latent variables have a particularly simple VB algorithm for performing this maximization, and we adapt this to the ordinal case.

Returning to the notation of the two-stage Gibbs sampler at the end of Section 2, note that the Gaussian and PG densities therein have the exponential family representation

p(𝜹|𝒚,𝝎)exp[𝜼1(𝒚)𝜹+vec[𝜼2(𝝎)]vec(𝜹𝜹)α[𝜼1(𝒚),𝜼2(𝝎)]]p(\bm{\delta}|\bm{y},\bm{\omega})\propto\exp[\bm{\eta}_{1}(\bm{y})^{\prime}\bm{\delta}+\text{vec}[\bm{\eta}_{2}(\bm{\omega})]^{\prime}\text{vec}(\bm{\delta}\bm{\delta}^{\prime})-\alpha[\bm{\eta}_{1}(\bm{y}),\bm{\eta}_{2}(\bm{\omega})]]

and

p(𝝎ik|𝒚,𝝎)exp{ηik(𝜹)ωikα[ηi(𝜹)]}p(ωik), for i=1,,n;k=1,,kip(\bm{\omega}_{ik}|\bm{y},\bm{\omega})\propto\exp\{\eta_{ik}(\bm{\delta})\omega_{ik}-\alpha[\eta_{i}(\bm{\delta})]\}p(\omega_{ik}),\text{ for }i=1,\ldots,n;k=1,\ldots,k_{i}

respectively, with natural parameters 𝜼1(𝒚)=𝑿~𝜿~+𝑫1𝒅\bm{\eta}_{1}(\bm{y})=\bm{\widetilde{X}}^{\prime}\widetilde{\bm{\kappa}}+\bm{D}^{-1}\bm{d} and 𝜼2(𝝎)=12𝑫1+𝑿~𝛀𝑿~\bm{\eta}_{2}(\bm{\omega})=-\frac{1}{2}\bm{D}^{-1}+\widetilde{\bm{X}}^{\prime}\bm{\Omega}\widetilde{\bm{X}} and ηik(𝝎)=12(𝒙~ik𝜷)2.\eta_{ik}(\bm{\omega})=-\frac{1}{2}(\widetilde{\bm{x}}_{ik}^{\prime}\bm{\beta})^{2}. Durante and Rigon, (2019) show that the optimal CAVI parameter value updates at iteration jj are

  1. 1.

    𝝀1(j)=𝔼q(j1)(𝝎)[𝜼1(𝒚)]\bm{\lambda}_{1}^{(j)}=\mathbb{E}_{q^{(j-1)(\bm{\omega})}}[\bm{\eta}_{1}(\bm{y})]

  2. 2.

    𝝀2(j)=𝔼q(j1)(𝝎)[𝜼2(𝝎]\bm{\lambda}_{2}^{(j)}=\mathbb{E}_{q^{(j-1)}(\bm{\omega})}[\bm{\eta}_{2}(\bm{\omega}]

  3. 3.

    ξˇik(j)=𝔼q(j)(𝜹ik)[ηik(𝝎)]\widecheck{\xi}_{ik}^{(j)}=\mathbb{E}_{q^{(j)}(\bm{\delta}_{ik})}[\eta_{ik}(\bm{\omega})] for i=1,,ni=1,\ldots,n and k=1,,kik=1,\ldots,k_{i},

where the q(r)(𝜹ik)q^{(r)}(\bm{\delta}_{ik}) densities are Gaussian and the q(r)(ωik)q^{(r)}(\omega_{ik}) densities are PG.

Calculation of the expectation in Step 2 is straightforward due to the closed-form expression provided by (2.6). Convergence is measured by comparing the change in successive iterations of the ELBO and once this value drops below a predetermined threshold, the algorithm terminates. Independent samples are then drawn from the variational posterior N(𝝁ˇ,𝚺ˇ)(\widecheck{\bm{\mu}},\widecheck{\bm{\Sigma}}), as an approximation to the true posterior, with 𝝁ˇ=(2𝝀2)1𝝀1\widecheck{\bm{\mu}}=(-2\bm{\lambda}_{2})^{-1}\bm{\lambda}_{1} and 𝚺ˇ=(2𝝀2)1\widecheck{\bm{\Sigma}}=(-2\bm{\lambda}_{2})^{-1}.

Parker et al., (2022) adapt this approach to cross-sectional binary and nominal pseudo-likelihood models with random effects. We further develop a cross-sectional, ordinal pseudo-likelihood sampling procedure in Algorithm 1 and a longitudinal variant in Algorithm 2. The matrices G~\widetilde{G}, X~\widetilde{X} and Ψ~\widetilde{\Psi} are constructed in the same manner as (2.8), where if yit=ky_{it}=k, then we define jit=min(k,K1)j_{it}=\min(k,K-1) and G~it=(𝒖it1,,𝒖itjit)\widetilde{G}_{it}=(\bm{u}_{it1},\ldots,\bm{u}_{itj_{it}}) and let

G~=(G~11G~12G~nT)\widetilde{G}=\begin{pmatrix}\widetilde{G}_{11}\\ \widetilde{G}_{12}\\ \vdots\\ \widetilde{G}_{nT}\end{pmatrix}

and similarly for X~=(𝒙it1,,𝒙itjit)\widetilde{X}=(\bm{x}_{it1},\ldots,\bm{x}_{itj_{it}})^{\prime} and Ψ~=(𝝍it1,,𝝍itjit)\widetilde{\Psi}=(\bm{\psi}_{it1},\ldots,\bm{\psi}_{itj_{it}})^{\prime}.

In Algorithm 2, calculations for the variational parameters μ~ϕ\widetilde{\mu}_{\phi} depend on moments of the truncated normal distribution (Johnson et al.,, 1994). In that algorithm, Φ\Phi denotes the standard normal cdf and φ\varphi the pdf. Note, too, the distinction between (μ~ϕ)2(\widetilde{\mu}_{\phi})^{2} and μ~ϕ2\widetilde{\mu}_{\phi^{2}}. The former is the squared mean, while the latter is the variational variance parameter for ϕ\phi. Subscripts on the matrices, such as XtX_{t}, refer to the subset of rows corresponding to respondents at time tt. A subscript such as, 𝚺ˇηT\widecheck{\bm{\Sigma}}_{\eta_{-T}} refers to all entries of the matrix except those corresponding to the last time point t=Tt=T.

Data: 𝑪~=[𝑮~,𝑿~,𝚿~]\widetilde{\bm{C}}=[\widetilde{\bm{G}},-\widetilde{\bm{X}},-\widetilde{\bm{\Psi}}] and 𝜿~\widetilde{\bm{\kappa}};
Initialize σˇη2\widecheck{\sigma}^{2}_{\eta} and ξˇik\widecheck{\xi}_{ik} for i=1,,ni=1,\ldots,n and k=1,,kik=1,\ldots,k_{i};
for j=1j=1 until convergence do
 𝛀ˇDiag(w~112ξ11tanh(ξˇ11/2),,w~nkn2ξnkntanh(ξˇnkn/2));\bm{\widecheck{\Omega}}\leftarrow\text{Diag}\left(\frac{\widetilde{w}_{11}}{2\xi_{11}}\tanh(\widecheck{\xi}_{11}/2),\ldots,\frac{\widetilde{w}_{nk_{n}}}{2\xi_{nk_{n}}}\tanh(\widecheck{\xi}_{nk_{n}}/2)\right);
 𝚺ˇ(blockdiag(1σγ2IK1,1σβ2Iq,σˇη2Im)+𝑪~𝛀ˇ𝑪~)1;\widecheck{\bm{\Sigma}}\leftarrow\left(\text{blockdiag}\left(\frac{1}{\sigma^{2}_{\gamma}}I_{K-1},\frac{1}{\sigma^{2}_{\beta}}I_{q},{\widecheck{\sigma}^{2}_{\eta}}I_{m}\right)+\widetilde{\bm{C}}^{\prime}\bm{\widecheck{\Omega}}\widetilde{\bm{C}}\right)^{-1};
 𝚺ˇη𝚺ˇ[(g+q+1):(g+q+m),(g+q+1):(g+q+m)];\widecheck{\bm{\Sigma}}_{\eta}\leftarrow\widecheck{\bm{\Sigma}}[(g+q+1):(g+q+m),(g+q+1):(g+q+m)];
 𝝁ˇ(𝝁ˇγ,𝝁ˇβ,𝝁ˇη)𝚺ˇ𝑪~𝜿~;\widecheck{\bm{\mu}}\leftarrow(\widecheck{\bm{\mu}}_{\gamma}^{\prime},\widecheck{\bm{\mu}}_{\beta}^{\prime},\widecheck{\bm{\mu}}_{\eta}^{\prime})^{\prime}\leftarrow\widecheck{\bm{\Sigma}}\widetilde{\bm{C}}^{\prime}\widetilde{\bm{\kappa}};
 σˇη2(b+12(𝝁ˇη𝝁ˇη+tr(𝚺ˇη)))/(a+m/2);\widecheck{\sigma}^{2}_{\eta}\leftarrow(b+\frac{1}{2}(\widecheck{\bm{\mu}}_{\eta}^{\prime}\widecheck{\bm{\mu}}_{\eta}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta})))/(a+m/2);
 for i=1i=1 to nn do
    for k=1,,kik=1,\ldots,k_{i} do
       ξˇik(𝑪~ik𝚺ˇ𝑪~ik+(𝑪~ik𝝁ˇ)2)1/2;\widecheck{\xi}_{ik}\leftarrow(\widetilde{\bm{C}}_{ik}^{\prime}\widecheck{\bm{\Sigma}}\widetilde{\bm{C}}_{ik}+(\widetilde{\bm{C}}_{ik}^{\prime}\widecheck{\bm{\mu}})^{2})^{1/2};
      end for
    
   end for
 
end for
Algorithm 1 Variational Bayes algorithm for the cross-sectional ordinal model
Data: 𝑪~=[𝑮~,𝑿~,𝚿~]\widetilde{\bm{C}}=[\widetilde{\bm{G}},-\widetilde{\bm{X}},-\widetilde{\bm{\Psi}}] and 𝜿~\widetilde{\bm{\kappa}}
Initialize 𝝁ˇγ\widecheck{\bm{\mu}}_{\gamma}, 𝝁ˇη\widecheck{\bm{\mu}}_{\eta}, μˇϕ\widecheck{\mu}_{\phi}, μˇϕ2\widecheck{\mu}_{\phi^{2}}, σˇϕ2\widecheck{\sigma}^{2}_{\phi} , 𝚺ˇηt\widecheck{\bm{\Sigma}}_{\eta_{t}}, σˇη2\widecheck{\sigma}^{2}_{\eta} and ξˇitk\widecheck{\xi}_{itk} for t=1,,Tt=1,\ldots,T; i=1,,nti=1,\ldots,n_{t}; k=1,,kik=1,\ldots,k_{i};
for j=1j=1 until convergence do
  𝝎ˇ(w~1112ξˇ111tanh(ξˇ111/2),,w~nTK2ξˇnTKtanh(ξˇnTK/2));\bm{\widecheck{\omega}}\leftarrow\left(\frac{\widetilde{w}_{111}}{2\widecheck{\xi}_{111}}\tanh(\widecheck{\xi}_{111}/2),\ldots,\frac{\widetilde{w}_{nTK}}{2\widecheck{\xi}_{nTK}}\tanh(\widecheck{\xi}_{nTK}/2)\right);
  𝛀ˇDiag(𝝎ˇ);\bm{\widecheck{\Omega}}\leftarrow\text{Diag}\left(\widecheck{\bm{\omega}}\right);
  𝝁ˇβ(𝑿~𝛀ˇ𝑿~+1/σβ2𝑰p)1𝑿~𝛀ˇ(𝑮~𝝁ˇγ𝜿~/𝝎ˇ𝚿~𝝁ˇη);\bm{\widecheck{\mu}}_{\beta}\leftarrow(\widetilde{\bm{X}}^{\prime}\widecheck{\bm{\Omega}}\widetilde{\bm{X}}+1/\sigma^{2}_{\beta}\bm{I}_{p})^{-1}\widetilde{\bm{X}}^{\prime}\widecheck{\bm{\Omega}}(\widetilde{\bm{G}}\widecheck{\bm{\mu}}_{\gamma}-\widetilde{\bm{\kappa}}/\widecheck{\bm{\omega}}-\widetilde{\bm{\Psi}}\widecheck{\bm{\mu}}_{\eta});
  𝝁ˇγ(𝑮~𝛀ˇ𝑮~+1/σγ2𝑰g)1𝑮~𝛀ˇ(𝜿~/𝝎ˇ+𝑿~𝝁ˇβ+𝚿~𝝁ˇη);\bm{\widecheck{\mu}}_{\gamma}\leftarrow(\widetilde{\bm{G}}^{\prime}\widecheck{\bm{\Omega}}\widetilde{\bm{G}}+1/\sigma^{2}_{\gamma}\bm{I}_{g})^{-1}\widetilde{\bm{G}}^{\prime}\widecheck{\bm{\Omega}}(\widetilde{\bm{\kappa}}/\widecheck{\bm{\omega}}+\widetilde{\bm{X}}\widecheck{\bm{\mu}}_{\beta}+\widetilde{\bm{\Psi}}\widecheck{\bm{\mu}}_{\eta});
  𝚺ˇη1(𝚿~1𝛀ˇ1𝚿~1+σˇη12(1+μˇϕ2)𝑰m)1;\bm{\widecheck{\Sigma}}_{\eta_{1}}\leftarrow(\widetilde{\bm{\Psi}}_{1}^{\prime}\widecheck{\bm{\Omega}}_{1}\widetilde{\bm{\Psi}}_{1}+\widecheck{\sigma}^{2}_{\eta_{1}}(1+\widecheck{\mu}_{\phi^{2}})\bm{I}_{m})^{-1};
  𝝁ˇη1𝚺ˇη1𝚿~1𝛀ˇ1(𝑮~1𝝁ˇγ𝜿~1/𝝎ˇ1𝑿~1𝝁ˇβ)+𝝁ˇη2μˇϕσˇη12;\bm{\widecheck{\mu}}_{\eta_{1}}\leftarrow\widecheck{\bm{\Sigma}}_{\eta_{1}}\widetilde{\bm{\Psi}}_{1}^{\prime}\widecheck{\bm{\Omega}}_{1}(\widetilde{\bm{G}}_{1}\widecheck{\bm{\mu}}_{\gamma}-\widetilde{\bm{\kappa}}_{1}/\widecheck{\bm{\omega}}_{1}-\widetilde{\bm{X}}_{1}\widecheck{\bm{\mu}}_{\beta})+\widecheck{\bm{\mu}}_{\eta_{2}}\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta_{1}};
  𝝁ˇη1𝝁ˇη1𝔼[𝝁ˇη1];\bm{\widecheck{\mu}}_{\eta_{1}}\leftarrow\bm{\widecheck{\mu}}_{\eta_{1}}-\mathbb{E}[\bm{\widecheck{\mu}}_{\eta_{1}}]; //Sum-to-zero constraint
  for t=1t=1 to TT do
     𝚺ˇηt(𝚿~t𝛀ˇt𝚿~t+(1+μˇϕ2)σˇη2𝑰m)1;\bm{\widecheck{\Sigma}}_{\eta_{t}}\leftarrow(\widetilde{\bm{\Psi}}_{t}^{\prime}\widecheck{\bm{\Omega}}_{t}\widetilde{\bm{\Psi}}_{t}+(1+\widecheck{\mu}_{\phi^{2}})\widecheck{\sigma}^{2}_{\eta}\bm{I}_{m})^{-1};
    𝝁ˇηt𝚺ˇηt𝚿~t𝛀ˇt(𝑮~tˇ𝝁γ𝜿~t/𝝎ˇt𝑿~t𝝁ˇβ)+μˇϕσˇη2(𝝁ˇηt+𝝁ˇηt+1);\bm{\widecheck{\mu}}_{\eta_{t}}\leftarrow\bm{\widecheck{\Sigma}}_{\eta_{t}}\widetilde{\bm{\Psi}}_{t}^{\prime}\widecheck{\bm{\Omega}}_{t}(\widetilde{\bm{G}}_{t}\bm{\widecheck{}}{\bm{\mu}}_{\gamma}-\widetilde{\bm{\kappa}}_{t}/\widecheck{\bm{\omega}}_{t}-\widetilde{\bm{X}}_{t}\widecheck{\bm{\mu}}_{\beta})+\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta}(\bm{\widecheck{\mu}}_{\eta_{t}}+\bm{\widecheck{\mu}}_{\eta_{t+1}});
    
  end for
 𝚺ˇηT(𝚿~T𝛀ˇT𝚿~T+σˇη2𝑰m)1;\bm{\widecheck{\Sigma}}_{\eta_{T}}\leftarrow(\widetilde{\bm{\Psi}}_{T}^{\prime}\widecheck{\bm{\Omega}}_{T}\widetilde{\bm{\Psi}}_{T}+\widecheck{\sigma}^{2}_{\eta}\bm{I}_{m})^{-1};
  𝝁ˇηT𝚺ˇηT𝚿~T𝛀ˇT(𝑮~Tˇ𝝁γ𝜿~T/𝝎ˇT𝑿~T𝝁ˇβ)+𝝁ˇηT1μˇϕσˇη2;\bm{\widecheck{\mu}}_{\eta_{T}}\leftarrow\bm{\widecheck{\Sigma}}_{\eta_{T}}\widetilde{\bm{\Psi}}_{T}^{\prime}\widecheck{\bm{\Omega}}_{T}(\widetilde{\bm{G}}_{T}\bm{\widecheck{}}{\bm{\mu}}_{\gamma}-\widetilde{\bm{\kappa}}_{T}/\widecheck{\bm{\omega}}_{T}-\widetilde{\bm{X}}_{T}\widecheck{\bm{\mu}}_{\beta})+\bm{\widecheck{\mu}}_{\eta_{T-1}}\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta};
  𝝁ˇη(𝝁ˇη1,,𝝁ˇηT);\bm{\widecheck{\mu}}_{\eta}\leftarrow(\bm{\widecheck{\mu}}_{\eta_{1}},\ldots,\bm{\widecheck{\mu}}_{\eta_{T}});
  𝚺ˇηblockdiag(𝚺ˇη1,,𝚺ˇηT);\bm{\widecheck{\Sigma}}_{\eta}\leftarrow\text{blockdiag}(\bm{\widecheck{\Sigma}}_{\eta_{1}},\ldots,\bm{\widecheck{\Sigma}}_{\eta_{T}});
  𝚺ˇ(blockdiag(1σγ2Ig,1σβ2Iq,𝚺ˇη)+𝑪~𝛀ˇ𝑪~)1;\widecheck{\bm{\Sigma}}\leftarrow\left(\text{blockdiag}\left(\frac{1}{\sigma^{2}_{\gamma}}I_{g},\frac{1}{\sigma^{2}_{\beta}}I_{q},\bm{\widecheck{\Sigma}}_{\eta}\right)+\widetilde{\bm{C}}^{\prime}\bm{\widecheck{\Omega}}\widetilde{\bm{C}}\right)^{-1};
  𝝁ˇ(𝝁ˇγ,𝝁ˇβ,𝝁ˇη)𝚺ˇ𝑪~𝜿~;\widecheck{\bm{\mu}}\leftarrow(\widecheck{\bm{\mu}}_{\gamma}^{\prime},\widecheck{\bm{\mu}}_{\beta}^{\prime},\widecheck{\bm{\mu}}_{\eta}^{\prime})^{\prime}\leftarrow\widecheck{\bm{\Sigma}}\widetilde{\bm{C}}^{\prime}\widetilde{\bm{\kappa}};
  mt=1T1𝝁ˇηt𝝁ˇηt+1/(t=1T1𝜼ˇt𝜼ˇt+tr(𝚺ˇηT));m\leftarrow\sum_{t=1}^{T-1}\bm{\widecheck{\mu}}_{\eta_{t}}^{\prime}\bm{\widecheck{\mu}}_{\eta_{t+1}}\big{/}\left(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}})\right);
  σˇϕ2σˇη2/(t=1T1𝜼ˇt𝜼ˇt+tr(𝚺ˇηT));\widecheck{\sigma}^{2}_{\phi}\leftarrow\widecheck{\sigma}^{2}_{\eta}\big{/}\left(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}})\right);
 (1m)/σˇϕ;\ell\leftarrow(-1-m)/\widecheck{\sigma}_{\phi};
  u(1m)/σˇϕ;u\leftarrow(1-m)/\widecheck{\sigma}_{\phi};
  μˇϕmσˇϕφ(u)φ()Φ(u)Φ();\widecheck{\mu}_{\phi}\leftarrow m-\widecheck{\sigma}_{\phi}\frac{\varphi(u)-\varphi(\ell)}{\Phi(u)-\Phi(\ell)};
  μˇϕ2μˇϕ2+σˇϕ2(1uφ(u)φ()Φ(u)Φ()(φ(u)φ()Φ(u)Φ())2);\widecheck{\mu}_{\phi^{2}}\leftarrow\widecheck{\mu}_{\phi}^{2}+\widecheck{\sigma}^{2}_{\phi}\left(1-\frac{u\varphi(u)-\ell\varphi(\ell)}{\Phi(u)-\Phi(\ell)}-\left(\frac{\varphi(u)-\varphi(\ell)}{\Phi(u)-\Phi(\ell)}\right)^{2}\right);
  σˇη12(b+𝝁ˇη1𝝁ˇη1+tr(𝚺ˇη1))/(a+r/2);\widecheck{\sigma}^{2}_{\eta_{1}}\leftarrow(b+\widecheck{\bm{\mu}}_{\eta_{1}}^{\prime}\widecheck{\bm{\mu}}_{\eta_{1}}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{1}}))/(a+r/2);
  σˇη2(b+(1/2)(t=2T𝜼ˇ1𝜼ˇ1+tr(𝚺ˇη1)2μˇϕt=1T1𝜼ˇt𝜼ˇt+1+\widecheck{\sigma}^{2}_{\eta}\leftarrow(b+(1/2)(\sum_{t=2}^{T}\widecheck{\bm{\eta}}_{1}^{\prime}\widecheck{\bm{\eta}}_{1}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-1}})-2\widecheck{\mu}_{\phi}\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}\widecheck{\bm{\eta}}_{t+1}+
      μˇϕ2(t=1T1𝜼ˇt𝜼ˇt+tr(𝚺ˇηT))))/(a+r(T1)/2);\widecheck{\mu}_{\phi^{2}}(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}}))))/(a+r(T-1)/2);
  for i=1i=1 to nn do
     for k=1,,kik=1,\ldots,k_{i} do
        for t=ti,ti+1,ti+2t=t_{i},t_{i}+1,t_{i}+2 do
           ξˇikt(𝑪~ikt𝚺ˇ𝑪~ikt+(𝑪~ikt𝝁ˇ)2)1/2\widecheck{\xi}_{ikt}\leftarrow(\widetilde{\bm{C}}_{ikt}^{\prime}\widecheck{\bm{\Sigma}}\widetilde{\bm{C}}_{ikt}+(\widetilde{\bm{C}}_{ikt}^{\prime}\widecheck{\bm{\mu}})^{2})^{1/2}
        end for
       
     end for
    
  end for
 
end for
Algorithm 2 Variational Bayes algorithm for the longitudinal ordinal model
\ULforem

The final case, of a longitudinal, nominal VB model, proceeds similarly to Algorithm 2 but without the cutpoint parameters. We defer this algorithm to the Appendix.

5 Empirical Results

5.1 Data description

To illustrate the proposed methodology, we apply the ordinal models to the Household Pulse Survey (HPS). The U.S. Census Bureau launched the HPS to measure the immediate effects of the COVID-19 pandemic on U.S. households. The survey was first deployed during the week of April 23, 2020 and continues to be conducted as the Household Trends and Outlook Pulse Survey (HTOPS).111https://www.census.gov/data/experimental-data-products/household-pulse-survey.html Data collection for the survey was split into “phases” of varying lengths, which differed in methodology, data collection, and structure. The intention was to make it straightforward to modify the survey and add new questions as needed amid the uncertainty of the pandemic.

Though many questions ran for the duration of HPS, the first phase, which lasted 12 weeks, was unique in its longitudinal structure. During this time, households were repeatedly interviewed according to a rotating panel design. Respondents were selected in an initial week, then re-interviewed for up to two more weeks if they continued to provide responses but dropped from the sampling frame if they failed to provide a response at any point. Therefore, each respondent provided between one and three consecutive responses. We focus on applying our methods to this first phase because of its longitudinal nature. It is important to note, however, that any modeling of later HPS phases that includes the first 12 weeks will still require using our methodology for these early measurements. Modeling of later phases also still requires capturing the temporal dependence due to the repeated cross-sectional design as our model does. HTOPS is intended to reinstate the longitudinal design, as well.

Collaboration with numerous federal agencies led to a range of survey items in the HPS. These span topics like economic well-being, physical and mental health, and other demographic information. A large number of these questions are categorical in nature. For instance, the Bureau of Labor Statistics included questions about the receipt and use of the Economic Impact Payment stimulus (e.g., was it used to (1) pay for expenses, (2) pay off debt, (3) add to savings, or (4) not applicable).222bls.gov/cex/research_papers/pdf/safir-effects-of-covid-on-household-finances.pdf The Health Resources and Services Administration asked questions on childcare and children’s health (e.g., “whether any children in the last 4 weeks showed any of […] 8 mental health-related behaviors”).333https://mchb.hrsa.gov/covid-19/data The National Center for Health Statistics (NCHS) added questions regarding delayed medical care and anxiety and depression symptoms (e.g., frequency (1) not at all, (2) several days, (3) more than half the days, or (4) nearly every day).444https://www.cdc.gov/nchs/covid19/health-care-access-and-mental-health.htm As such, producing precise estimates for HPS questions is of use to a large number of stakeholders and requires modeling longitudinal categorical data.

In particular, questions regarding mental health are of interest to researchers and policymakers due to the unprecedented disruptions to daily life, working conditions, and mental health services associated with the pandemic and the concomitant public health measures (World Health Organization,, 2022). Longitudinal studies are important here in order to judge the effect on individuals over time and to determine if changes are “acute” or sustained (Daly and Robinson,, 2022). Also, it is posited that specific demographic groups, such as women and young adults, are disproportionately impacted (Hawes et al.,, 2022; Metin et al.,, 2022). This makes our method suited to the task. Design-based estimates, on the other hand, break down for group comparisons in finer partitions since the standard error estimates will be unstable for subdomains with low sample sizes and many domains will have no sampled units at all. For this reason, we focus our data analysis in Section 5.3 on the aforementioned NCHS question regarding anxiety.

5.2 Simulation study

To assess the performance of our ordinal models, we conduct an empirical simulation study, for the aforementioned response regarding frequency of anxiety over course of the week, which is ranked on an ascending scale of 4 categories. We take as our ground-truth population all 774,882774,882 unique respondents and 991,412991,412 total responses in the HPS sample from the continental United States (excluding the District of Columbia).

To mimic a true complex survey, we draw informative subsamples from this population and obtain model-based and direct estimates for each subsample. We compare based on summary measures, including mean square error (MSE), absolute bias, credible interval coverage rate, and the interval score (IS) of Gneiting and Raftery, (2007). The interval score is calculated as

ISα(,u;x)=(u)+2α(x)1{x<}+2α(xu)1{x>u}IS_{\alpha}(\ell,u;x)=(u-\ell)+\frac{2}{\alpha}(\ell-x){1}\{x<\ell\}+\frac{2}{\alpha}(x-u){1}\{x>u\}

and penalizes overly wide predictive intervals. A lower score is more desirable.

To make the subsamples informative, we take a probability proportional to size sample following the Poisson method of Brewer et al., (1984) with an expected sample size of 5%5\% of the population. Each member of the population is sampled with a probability πi\pi_{i} proportional to a size variable

πiexp{.1wi+.2y¯i},\pi_{i}\propto\exp\{.1w_{i}^{*}+.2\bar{y}_{i}\},

where wiw_{i}^{*} is the log survey weight of household ii, scaled to have zero mean and standard deviation one, and y¯i\bar{y}_{i} the scaled mean of all of a household’s responses. The weight assigned to sampled unit ii is then wi=1/πi.w_{i}=1/\pi_{i}.

Spatial basis functions are constructed via the process outlined in Section 3.2. Retaining only the eigenvectors corresponding to positive eigenvalues leads to basis functions of dimension m=20m=20. Model covariates include race (white, Black, Asian, or other), sex (male, female), and age categories (one category for 18-25, one for 65+, and the remaining intermediary ages binned into five-year groups, for a total of 10 categories). The design matrix includes no intercept and further drops a reference category from one of the covariates in order for the cutpoints to be identifiable. To capture longitudinal dependence in the nominal model, we include a “synthetic” covariate that indexes a household’s previous response category (or lack thereof). In the ordinal model, this variable is used to vary the cutpoints as described in Section 3.2.

We compute direct estimators, and fit cross-sectional and longitudinal variants of both the Gibbs and VB models for a total of five different estimators. The cross-sectional models are fit independently for each week. For each of the models, RR posterior draws (or draws from the variational distribution) of the parameter estimates are taken. For all models, we set R=1,500R=1,500, and discard 500500 burn-in iterations beforehand in the Gibbs sampler. The estimated probability for a given category k=1,,K1k=1,\ldots,K-1 is calculated as

π~itk(r)=logit1(𝒖itk𝜸(r)𝒙i𝜷k(r)𝝍i𝜼t(r)).\widetilde{\pi}_{itk}^{(r)}=\text{logit}^{-1}(\bm{u}^{\prime}_{itk}\bm{\gamma}^{(r)}-\bm{x}_{i}^{\prime}\bm{\beta}_{k}^{(r)}-\bm{\psi}_{i}\bm{\eta}_{t}^{(r)}).

The full probability vector 𝝅^it=(π^it1,,π^itK)\widehat{\bm{\pi}}_{it}=(\widehat{\pi}_{it1},\ldots,\widehat{\pi}_{itK}) is reconstructed from the stick-breaking representation and estimates for the full population are then generated as

y^it(r)Multinomial(𝒏,𝝅i)\widehat{y}_{it}^{(r)}\mid\cdot\sim\text{Multinomial}(\bm{n},\bm{\pi}_{i})

for t=1,,Tt=1,\ldots,T and i=1,,Ni=1,\ldots,N, where 𝒏\bm{n} is a K-dimensional vector of ones. This population can then be aggregated into cells by taking the average

y¯^jtk(r)=1Nji𝒮jy^itk(r)\widehat{\bar{y}}_{jtk}^{(r)}=\frac{1}{N_{j}}\sum_{i\in\mathcal{S}_{j}}\widehat{y}_{itk}^{(r)}

for each subdomain jj and summary metrics are computed over the posterior distributions of the cell estimates.

The results of 100100 simulation runs are summarized in Table 1. Regardless of sampling algorithm, the longitudinal models lead to improved MSE, lower interval scores, and faster runtimes as compared to their cross-sectional counterparts. The Gibbs models also outperform their VB counterparts on MSE and IS, but with the trade-off of a greatly increased runtime. The longitudinal Gibbs model performs best overall in terms of MSE and interval score. Although the direct estimator attains nearly the nominal coverage rate, it does so at the cost of overly wide confidence intervals, hence its high interval score.

Table 1: Overall results for direct- and model-based estimates averaged over 100 simulations in the ordinal case. Runtime is the median of 100 runs. All other values are averages over weeks, areas, and simulations.
Method MSE Abs Bias Cov. IS Runtime (s)
Direct 2.7×1032.7\times 10^{-3} 4.0×103{4.0\times 10^{-3}} 95%{95\%} .252.252 -
VB-CS 5.4×1045.4\times 10^{-4} 1.4×1021.4\times 10^{-2} 89%89\% .115.115 15
VB-Lon 4.1×1044.1\times 10^{-4} 1.4×1021.4\times 10^{-2} 86%86\% .113.113 4343
Gibbs-CS 5.1×1045.1\times 10^{-4} 1.4×1021.4\times 10^{-2} 90%90\% .112.112 99,10399,103
Gibbs-Lon 3.7×𝟏𝟎𝟒\mathbf{3.7\times 10^{-4}} 1.3×1021.3\times 10^{-2} 89%89\% .099\mathbf{.099} 15,67315,673

For a more detailed look, Figure 1 shows a time series plot of the MSE for each estimator averaged over areas and simulations. Again, all models outperform the direct estimator, and we see the same pattern. Longitudinal models uniformly reduce the MSE over their cross-sectional counterparts and the Gibbs models uniformly reduce MSE over their VB counterparts, with the longitudinal Gibbs sampler displaying the lowest MSE.

Refer to caption
Figure 1: Time series plot of the MSE for each estimator by week in the ordinal empirical simulation study. MSE is calculated for every combination of week and area then averaged across areas and 100 samples.

As a further diagnostic, Figure 2 plots the ratios of the model MSE for a given time-area combination relative to the direct estimator. The x=yx=y reference line denotes a ratio of one, indicating identical MSE between a model and the direct estimator. Only a few domains have a ratio less than one, with the longitudinal Gibbs model having the fewest.

Refer to caption
Figure 2: Ratio of model MSE to direct estimate MSE for each of the ordinal models in each area-week post-stratification cell averaged across 100 simulations. Values less than one indicate a reduction in MSE over the direct estimate.

5.3 Analysis of Household Pulse Anxiety Data

We conduct a data analysis by running the ordinal, longitudinal VB algorithm on the entire HPS dataset for the same anxiety response considered in the simulation study. For population values of the design matrix we use data from the 2020 Census Population Estimates program.555Available at https://www2.census.gov/programs-surveys/popest/datasets/2020-2021/state/asrh/sc-est2021-alldata6.csv The choice of covariates and basis functions is the same as in the simulation studies.

To emphasize our models’ ability to make predictions for small areas, we consider a finer partition into subdomains and present results for a particular combination of race, sex, and age – namely, Asian males aged 36 to 40 – rather than weekly area-level averages as in the simulation. This requires first making estimates for a cross-tabulation of two sex categories, four race categories, 10 age categories and the KK response categories across 49 states and 12 weeks. For the given ordinal response with K=4K=4 there are 94,08094,080 domains to estimate. Relying on direct estimators is of little use in this context, where 41%41\% of the domains have zero respondents, only 3%3\% of cells have a sample size greater than 30, and the median sample size for all cells is one. The effect of this sparsity is reflected in Figure 3 which plots the direct estimates for a single response category among Asian males between the ages of 36 and 40. Many states have no estimate and different states are missing estimates in different weeks, meaning uninterrupted longitudinal estimates are generally unobtainable. Where the direct estimator can be calculated, the resulting values tend to be at the boundary, either zero or one.

Refer to caption
Figure 3: Areal time series plot of the direct estimates for proportion of Asian males aged 36 to 40, who feel “not at all” anxious during the past week. Estimates are calculated from the entire HPS Phase 1 data. The plot is faceted by week, with the first time point at top-left and the last time-point at bottom-right.

The standard errors of the direct estimates are calculated using the replicate weights method in the R survey library (Lumley,, 2004). Like the point estimates, these values tend to be extreme and the model estimates improve on them substantially. In fact, for 60%60\% of domains, a valid standard error estimate cannot be produced for the direct estimator, either because the sample size is too low or because all response values in a given cell are identical. For the remaining cells where direct estimate standard errors can be produced, plotting the ratio of model standard errors to those of the direct estimate in Figure 4 demonstrates a reduction in all but nine domains (.01%.01\% of the total).

Refer to caption
Figure 4: Scatter plot of ratio of model standard errors to direct estimate standard errors. Values less than one indicate an improvement of the model over the direct estimator. The four values less than one are marked with crosses.

Moreover, model estimates in Figure 5 show much smoother trends across both space and time whereas the direct estimates in Figure 3 jump around dramatically. For example, the estimated proportion for Florida is nearly zero in week 8, nearly one in the following week, then back to zero in week 10. The underlying truth is highly unlikely to change so rapidly. A similar pattern can be observed for a number of other states, such as Iowa, which jumps between two extreme values in weeks six and seven, with the additional complication that many weeks also have no estimate so that the trend is unobservable.

Additionally, if we want to compare multiple response categories simultaneously, our unit-level modeling approach allows us to visualize detailed estimates as in Figure 6, which plots estimated trajectories for the proportion of each category in a given demographic group by state. For readability, the states are grouped into the nine geographical Census divisions. The first timepoint of the plots corresponds to April 23, 2020 and the last corresponds to July 21. Over the course of this time period the proportion of respondents reporting no anxiety at all grows and then decreases. In week two, all states exhibit an increase in the proportion of “several days” of anxiety while “nearly every day” stays flat until an uptick occurs, starting in week eight. We can also see that some Census divisions, like the Pacific, are fairly homogeneous. In contrast, the Mountain division shows more variability and Nevada, whose economy is far more dependent on tourism, is estimated to have distinctly more anxiety than its neighbors.

Refer to caption
Figure 5: Areal time series plot of the ordinal VB model estimates for proportion of Asian males aged 36 to 40, who feel “not at all” anxious during the past week. Estimates are calculated from the entire HPS Phase 1 data. The plot is faceted by week, with the first time point at top-left and the last time-point at bottom-right.
Refer to caption
Figure 6: Longitudinal VB model estimates for frequency of anxiety over the past 7 days for Asian males aged 36 to 40 grouped by Census division.

Of further interest are group comparisons, which the direct estimator would be incapable of handling, since it considers each group independently. Figure 7 shows a comparison between the trajectories for men and women in a single Census division. The trends between the two groups are similar across states, but women are more likely to report some frequency of anxiety and less likely to report feeling not at all anxious. This is in line with existing research (Metin et al.,, 2022). Similar patterns hold for the other Census divisions.

Refer to caption
Figure 7: Longitudinal VB model estimates for frequency of anxiety over the past 7 days comparing Asian males and females aged 36 to 40 in the New England Census division.

6 Discussion

In this paper, we have introduced Bayesian models for both nominal and ordinal survey data and both cross-sectional and longitudinal designs, where a special case of the cross-sectional ordinal model is a conjugate Bayesian, ordinal logistic regression. Alongside the work of Parker et al., (2022) and Vedensky et al., (2023), this covers all of the most common types of survey responses within a unified, unit-level modeling framework. The models are efficient, capture dependencies that area-level models are not able to, and lead to great improvements over direct estimators as shown in the simulation study. The application to HPS phase 1 data regarding mental health symptoms demonstrates the types of fine-grained comparisons we are able to make that would not be possible in an area-level approach.

The longitudinal models in this paper extend readily to a number of large and important surveys that follow rotating panel designs such as the Survey of Income and Program Participation or the National Crime Victimization Survey. Future work remains to mitigate some of the biases inherent to these designs such as rotation group bias (Bailar,, 1975) and to consider other longitudinal designs, possibly with uneven sampling intervals.

Acknowledgements

This article is released to inform interested parties of ongoing research and to encourage discussion. The views expressed on statistical issues are those of the authors and not those of the NSF or U.S. Census Bureau.

Funding

This research was partially supported by the Census Bureau Dissertation Fellowship program and the U.S. National Science Foundation (NSF) under NSF grants NCSE-2215168, and NCSE-2215169.

Appendix

MCMC algorithms for the cross-sectional ordinal and longitudinal ordinal are presented below, as is the VB algorithm for longitudinal, nominal data.. For details on the cross-sectional nominal MCMC and VB algorithms, see Parker et al., (2022).

Gibbs sampling algorithms

Cross-sectional ordinal

  1. 1.

    Sample ωik|PG(w~inik,𝒖ik𝒙i𝜷𝝍i𝜼)\omega_{ik}|\cdot\sim\text{PG}(\widetilde{w}_{i}n_{ik},\bm{u}_{ik}^{\prime}-\bm{x}_{i}^{\prime}\bm{\beta}-\bm{\psi}_{i}^{\prime}\bm{\eta}) for i=1,,ni=1,\ldots,n and k=1,,kik=1,\ldots,k_{i}

  2. 2.

    Sample ση2|IG(a+m/2,b+𝜼𝜼/2)\sigma^{2}_{\eta}|\cdot\sim\text{IG}(a+m/2,b+\bm{\eta}^{\prime}\bm{\eta}/2)

  3. 3.

    Sample 𝜷|Np(𝝁=(𝑿𝛀(G𝜸𝜿/𝝎𝚿𝜼)),𝚺=(𝑿𝛀𝑿+1σβ2Ip)1)\bm{\beta}|\cdot\sim\text{N}_{p}\left(\bm{\mu}=\left(\bm{X}^{\prime}\bm{\Omega}(G^{\prime}\bm{\gamma}-\bm{\kappa}/\bm{\omega}-\bm{\Psi}\bm{\eta})\right),\bm{\Sigma}=\left(\bm{X}^{\prime}\bm{\Omega}\bm{X}+\frac{1}{\sigma^{2}_{\beta}}I_{p}\right)^{-1}\right)

  4. 4.

    Sample 𝜸|Ng(𝝁=(𝑮𝛀(X𝜷𝜿/𝝎𝚿𝜼)),𝚺=(𝑮𝛀𝑮+1σγ2Ig)1)\bm{\gamma}|\cdot\sim\text{N}_{g}\left(\bm{\mu}=\left(\bm{G}^{\prime}\bm{\Omega}(X^{\prime}\bm{\beta}-\bm{\kappa}/\bm{\omega}-\bm{\Psi}\bm{\eta})\right),\bm{\Sigma}=\left(\bm{G}^{\prime}\bm{\Omega}\bm{G}+\frac{1}{\sigma^{2}_{\gamma}}I_{g}\right)^{-1}\right)

  5. 5.

    Sample 𝜼|Nm(𝝁=(𝚿𝛀(X𝜷𝜿/𝝎𝚿𝜼)),𝚺=(𝚿𝛀𝚿+1ση12Ig)1)\bm{\eta}|\cdot\sim\text{N}_{m}\left(\bm{\mu}=\left(\bm{\Psi}^{\prime}\bm{\Omega}(X^{\prime}\bm{\beta}-\bm{\kappa}/\bm{\omega}-\bm{\Psi}\bm{\eta})\right),\bm{\Sigma}=\left(\bm{\Psi}^{\prime}\bm{\Omega}\bm{\Psi}+\frac{1}{\sigma^{2}_{\eta_{1}}}I_{g}\right)^{-1}\right)

Longitudinal ordinal

Below, Ψf\Psi_{f} is the matrix of basis functions for all areas and timepoints, corresponding to vec(𝜼)\text{vec}(\bm{\eta}).

  1. 1.

    Sample ωitk|\omega_{itk}|\cdot for i=1,,nti=1,\ldots,n_{t}, t=1,,ntt=1,\ldots,n_{t}, k=1,,kik=1,\ldots,k_{i}

    ωitk|PG(w~itnitk,𝒖itk𝒙i𝜷𝝍i𝜼t)\omega_{itk}|\cdot\sim\text{PG}(\widetilde{w}_{it}n_{itk},\bm{u}_{itk}^{\prime}-\bm{x}_{i}^{\prime}\bm{\beta}-\bm{\psi}_{i}^{\prime}\bm{\eta}_{t})
  2. 2.

    Sample ϕ|TruncNorm(t=2T𝜼t𝜼t1t=2T𝜼t1𝜼t1,ση2t=2T𝜼t1𝜼t1,1,1)\phi|\cdot\sim\text{TruncNorm}\left(\frac{\sum_{t=2}^{T}\bm{\eta}_{t}^{\prime}\bm{\eta}_{t-1}}{\sum_{t=2}^{T}\bm{\eta}_{t-1}^{\prime}\bm{\eta}_{t-1}},\frac{\sigma^{2}_{\eta}}{\sum_{t=2}^{T}\bm{\eta}_{t-1}^{\prime}\bm{\eta}_{t-1}},-1,1\right)

  3. 3.

    Sample ση12|IG(a+m/2,b+12𝜼1𝜼1)\sigma^{2}_{\eta_{1}}|\cdot\sim\text{IG}(a+m/2,b+\frac{1}{2}\bm{\eta}_{1}^{\prime}\bm{\eta}_{1})

  4. 4.

    Sample ση2|IG(a+m(T1)/2,b+t=2T(𝜼tϕ𝜼t1)(𝜼tϕ𝜼t1)/2)\sigma^{2}_{\eta}|\cdot\sim\text{IG}(a+m(T-1)/2,b+\sum_{t=2}^{T}(\bm{\eta}_{t}-\phi\bm{\eta}_{t-1})^{\prime}(\bm{\eta}_{t}-\phi\bm{\eta}_{t-1})/2)

  5. 5.

    Sample 𝜷:\bm{\beta}:

    𝜷|Np(𝝁=(~𝑿𝛀(𝑮~𝜸𝜿~/𝝎𝚿~fvec(𝜼))),𝚺=(𝑿~𝛀𝑿~+1σβ2Ip)1)\bm{\beta}|\cdot\sim\text{N}_{p}\left(\bm{\mu}=\left(\bm{\widetilde{}}{\bm{X}}^{\prime}\bm{\Omega}(\widetilde{\bm{G}}\bm{\gamma}-\widetilde{\bm{\kappa}}/\bm{\omega}-\widetilde{\bm{\Psi}}_{f}\text{vec}(\bm{\eta}))\right),\bm{\Sigma}=\left(\widetilde{\bm{X}}^{\prime}\bm{\Omega}\widetilde{\bm{X}}+\frac{1}{\sigma^{2}_{\beta}}I_{p}\right)^{-1}\right)
  6. 6.

    Sample 𝜸:\bm{\gamma}:

    𝜸|Ng(𝝁=(𝑮~𝛀(𝜿~/𝝎+𝑿~𝜷+𝚿~fvec(𝜼)),𝚺=(𝑮~𝛀𝑮~+1σγ2Ig)1)\bm{\gamma}|\cdot\sim\text{N}_{g}\left(\bm{\mu}=\left(\widetilde{\bm{G}}^{\prime}\bm{\Omega}(\widetilde{\bm{\kappa}}/\bm{\omega}+\widetilde{\bm{X}}^{\prime}\bm{\beta}+\widetilde{\bm{\Psi}}_{f}\text{vec}(\bm{\eta})\right),\bm{\Sigma}=\left(\widetilde{\bm{G}}^{\prime}\bm{\Omega}\widetilde{\bm{G}}+\frac{1}{\sigma^{2}_{\gamma}}I_{g}\right)^{-1}\right)
  7. 7.

    Sample 𝜼1:\bm{\eta}_{1}:

    𝜼1|Nm(𝝁=(𝚿~1𝛀1(𝑮~1𝜸𝜿~1/𝝎1𝑿~1𝜷)),𝚺=(𝚿~1𝛀1𝚿~1+(1ση12+ϕ2ση2)Im)1)\bm{\eta}_{1}|\cdot\sim\text{N}_{m}\left(\bm{\mu}=\left(\widetilde{\bm{\Psi}}_{1}^{\prime}\bm{\Omega}_{1}(\widetilde{\bm{G}}_{1}^{\prime}\bm{\gamma}-\widetilde{\bm{\kappa}}_{1}/\bm{\omega}_{1}-\widetilde{\bm{X}}_{1}\bm{\beta})\right),\bm{\Sigma}=\left(\widetilde{\bm{\Psi}}_{1}^{\prime}\bm{\Omega}_{1}\widetilde{\bm{\Psi}}_{1}+\left(\frac{1}{\sigma^{2}_{\eta_{1}}}+\frac{\phi^{2}}{\sigma^{2}_{\eta}}\right)I_{m}\right)^{-1}\right)
  8. 8.

    Sample 𝜼t\bm{\eta}_{t}, for t=2,,T1:t=2,\ldots,T-1:

    𝜼t|Np(𝝁=𝚺(𝚿~t𝛀t(𝑮~t𝜸𝜿~t/𝝎t𝑿~t𝜷)+ϕση2(𝜼t1+𝜼t+1)),𝚺=(𝚿~t𝛀t𝚿~t+1+ϕ2ση2Im)1)\bm{\eta}_{t}|\cdot\sim\begin{aligned} \text{N}_{p}\Bigg{(}\bm{\mu}&=\bm{\Sigma}\left(\widetilde{\bm{\Psi}}_{t}^{\prime}\bm{\Omega}_{t}(\widetilde{\bm{G}}^{\prime}_{t}\bm{\gamma}-\widetilde{\bm{\kappa}}_{t}/\bm{\omega}_{t}-\widetilde{\bm{X}}_{t}\bm{\beta})+\frac{\phi}{\sigma^{2}_{\eta}}(\bm{\eta}_{t-1}+\bm{\eta}_{t+1})\right),\\ \bm{\Sigma}&=\left(\widetilde{\bm{\Psi}}_{t}^{\prime}\bm{\Omega}_{t}\widetilde{\bm{\Psi}}_{t}+\frac{1+\phi^{2}}{\sigma^{2}_{\eta}}I_{m}\right)^{-1}\Bigg{)}\end{aligned}
  9. 9.

    Sample 𝜼T\bm{\eta}_{T}

    𝜼T|Np(𝝁=𝚺(𝚿~T𝛀T(𝑮~T𝜸T𝜿~T/𝝎T𝑿~T𝜷)+ϕση2𝜼T1),𝚺=(𝚿~T𝛀T𝚿~t+1ση2Im)1)\bm{\eta}_{T}|\cdot\sim\text{N}_{p}\left(\bm{\mu}=\bm{\Sigma}\left(\widetilde{\bm{\Psi}}_{T}^{\prime}\bm{\Omega}_{T}(\bm{\widetilde{\bm{G}}}_{T}^{\prime}\bm{\gamma}_{T}-\widetilde{\bm{\kappa}}_{T}/\bm{\omega}_{T}-\widetilde{\bm{X}}_{T}\bm{\beta})+\frac{\phi}{\sigma^{2}_{\eta}}\bm{\eta}_{T-1}\right),\bm{\Sigma}=\left(\widetilde{\bm{\Psi}}_{T}^{\prime}\bm{\Omega}_{T}\widetilde{\bm{\Psi}}_{t}+\frac{1}{\sigma^{2}_{\eta}}I_{m}\right)^{-1}\right)

Longitudinal, nominal VB algorithm

For longitudinal, nominal data, the binary VB algorithm below can be fit K1K-1 times to data that is factored according to the stick-breaking representation (2.3).

Data: 𝑪=[𝑿,𝚿]{\bm{C}}=[{\bm{X}},{\bm{\Psi}}] and 𝜿{\bm{\kappa}}
Initialize 𝝁ˇη\widecheck{\bm{\mu}}_{\eta}, μˇϕ\widecheck{\mu}_{\phi}, μˇϕ2\widecheck{\mu}_{\phi^{2}}, σˇϕ2\widecheck{\sigma}^{2}_{\phi} , 𝚺ˇηt\widecheck{\bm{\Sigma}}_{\eta_{t}}, σˇη2\widecheck{\sigma}^{2}_{\eta} and ξˇitk\widecheck{\xi}_{itk} for t=1,,Tt=1,\ldots,T; i=1,,nti=1,\ldots,n_{t};
for j=1j=1 until convergence do
  𝝎ˇ(w~112ξˇ11tanh(ξˇ11/2),,w~nTT2ξˇnTTtanh(ξˇnTT/2));\bm{\widecheck{\omega}}\leftarrow\left(\frac{\widetilde{w}_{11}}{2\widecheck{\xi}_{11}}\tanh(\widecheck{\xi}_{11}/2),\ldots,\frac{\widetilde{w}_{n_{T}T}}{2\widecheck{\xi}_{n_{T}T}}\tanh(\widecheck{\xi}_{n_{T}T}/2)\right);
  𝛀ˇDiag(𝝎ˇ);\bm{\widecheck{\Omega}}\leftarrow\text{Diag}\left(\widecheck{\bm{\omega}}\right);
  𝝁ˇβ(𝑿𝛀ˇ𝑿+1/σβ2𝑰p)1𝑿𝛀ˇ(𝜿/𝝎ˇ𝚿𝝁ˇη);\bm{\widecheck{\mu}}_{\beta}\leftarrow({\bm{X}}^{\prime}\widecheck{\bm{\Omega}}{\bm{X}}+1/\sigma^{2}_{\beta}\bm{I}_{p})^{-1}{\bm{X}}^{\prime}\widecheck{\bm{\Omega}}({\bm{\kappa}}/\widecheck{\bm{\omega}}-{\bm{\Psi}}\widecheck{\bm{\mu}}_{\eta});
  𝚺ˇη1(𝚿1𝛀ˇ1𝚿1+σˇη12(1+μˇϕ2)𝑰m)1;\bm{\widecheck{\Sigma}}_{\eta_{1}}\leftarrow({\bm{\Psi}}_{1}^{\prime}\widecheck{\bm{\Omega}}_{1}{\bm{\Psi}}_{1}+\widecheck{\sigma}^{2}_{\eta_{1}}(1+\widecheck{\mu}_{\phi^{2}})\bm{I}_{m})^{-1};
  𝝁ˇη1𝚺ˇη1𝚿1𝛀ˇ1(𝜿1/𝝎ˇ1𝑿1𝝁ˇβ)+𝝁ˇη2μˇϕσˇη12;\bm{\widecheck{\mu}}_{\eta_{1}}\leftarrow\widecheck{\bm{\Sigma}}_{\eta_{1}}{\bm{\Psi}}_{1}^{\prime}\widecheck{\bm{\Omega}}_{1}({\bm{\kappa}}_{1}/\widecheck{\bm{\omega}}_{1}-{\bm{X}}_{1}\widecheck{\bm{\mu}}_{\beta})+\widecheck{\bm{\mu}}_{\eta_{2}}\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta_{1}};
  𝝁ˇη1𝝁ˇη1𝔼[𝝁ˇη1];\bm{\widecheck{\mu}}_{\eta_{1}}\leftarrow\bm{\widecheck{\mu}}_{\eta_{1}}-\mathbb{E}[\bm{\widecheck{\mu}}_{\eta_{1}}];
  for t=1t=1 to TT do
     𝚺ˇηt(𝚿t𝛀ˇt𝚿t+(1+μˇϕ2)σˇη2𝑰m)1;\bm{\widecheck{\Sigma}}_{\eta_{t}}\leftarrow({\bm{\Psi}}_{t}^{\prime}\widecheck{\bm{\Omega}}_{t}{\bm{\Psi}}_{t}+(1+\widecheck{\mu}_{\phi^{2}})\widecheck{\sigma}^{2}_{\eta}\bm{I}_{m})^{-1};
    𝝁ˇηt𝚺ˇηt𝚿t𝛀ˇt(𝜿t/𝝎ˇt𝑿t𝝁ˇβ)+μˇϕσˇη2(𝝁ˇηt+𝝁ˇηt+1);\bm{\widecheck{\mu}}_{\eta_{t}}\leftarrow\bm{\widecheck{\Sigma}}_{\eta_{t}}{\bm{\Psi}}_{t}^{\prime}\widecheck{\bm{\Omega}}_{t}({\bm{\kappa}}_{t}/\widecheck{\bm{\omega}}_{t}-{\bm{X}}_{t}\widecheck{\bm{\mu}}_{\beta})+\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta}(\bm{\widecheck{\mu}}_{\eta_{t}}+\bm{\widecheck{\mu}}_{\eta_{t+1}});
    
  end for
 𝚺ˇηT(𝚿T𝛀ˇT𝚿T+σˇη2𝑰m)1;\bm{\widecheck{\Sigma}}_{\eta_{T}}\leftarrow({\bm{\Psi}}_{T}^{\prime}\widecheck{\bm{\Omega}}_{T}{\bm{\Psi}}_{T}+\widecheck{\sigma}^{2}_{\eta}\bm{I}_{m})^{-1};
  𝝁ˇηT𝚺ˇηT𝚿T𝛀ˇT(𝜿T/𝝎ˇT𝑿T𝝁ˇβ)+𝝁ˇηT1μˇϕσˇη2;\bm{\widecheck{\mu}}_{\eta_{T}}\leftarrow\bm{\widecheck{\Sigma}}_{\eta_{T}}{\bm{\Psi}}_{T}^{\prime}\widecheck{\bm{\Omega}}_{T}({\bm{\kappa}}_{T}/\widecheck{\bm{\omega}}_{T}-{\bm{X}}_{T}\widecheck{\bm{\mu}}_{\beta})+\bm{\widecheck{\mu}}_{\eta_{T-1}}\widecheck{\mu}_{\phi}\widecheck{\sigma}^{2}_{\eta};
  𝝁ˇη(𝝁ˇη1,,𝝁ˇηT);\bm{\widecheck{\mu}}_{\eta}\leftarrow(\bm{\widecheck{\mu}}_{\eta_{1}},\ldots,\bm{\widecheck{\mu}}_{\eta_{T}});
  𝚺ˇηblockdiag(𝚺ˇη1,,𝚺ˇηT);\bm{\widecheck{\Sigma}}_{\eta}\leftarrow\text{blockdiag}(\bm{\widecheck{\Sigma}}_{\eta_{1}},\ldots,\bm{\widecheck{\Sigma}}_{\eta_{T}});
  𝚺ˇ(blockdiag(1σβ2Iq,𝚺ˇη)+𝑪𝛀ˇ𝑪)1;\widecheck{\bm{\Sigma}}\leftarrow\left(\text{blockdiag}\left(\frac{1}{\sigma^{2}_{\beta}}I_{q},\bm{\widecheck{\Sigma}}_{\eta}\right)+{\bm{C}}^{\prime}\bm{\widecheck{\Omega}}{\bm{C}}\right)^{-1};
  𝝁ˇ(𝝁ˇβ,𝝁ˇη)𝚺ˇ𝑪𝜿;\widecheck{\bm{\mu}}\leftarrow(\widecheck{\bm{\mu}}_{\beta}^{\prime},\widecheck{\bm{\mu}}_{\eta}^{\prime})^{\prime}\leftarrow\widecheck{\bm{\Sigma}}{\bm{C}}^{\prime}{\bm{\kappa}};
  mt=1T1𝝁ˇηt𝝁ˇηt+1/(t=1T1𝜼ˇt𝜼ˇt+tr(𝚺ˇηT));m\leftarrow\sum_{t=1}^{T-1}\bm{\widecheck{\mu}}_{\eta_{t}}^{\prime}\bm{\widecheck{\mu}}_{\eta_{t+1}}\big{/}\left(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}})\right);
  σˇϕ2σˇη2/(t=1T1𝜼ˇt𝜼ˇt+tr(𝚺ˇηT));\widecheck{\sigma}^{2}_{\phi}\leftarrow\widecheck{\sigma}^{2}_{\eta}\big{/}\left(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}})\right);
 (1m)/σˇϕ;\ell\leftarrow(-1-m)/\widecheck{\sigma}_{\phi};
  u(1m)/σˇϕ;u\leftarrow(1-m)/\widecheck{\sigma}_{\phi};
  μˇϕmσˇϕφ(u)φ()Φ(u)Φ();\widecheck{\mu}_{\phi}\leftarrow m-\widecheck{\sigma}_{\phi}\frac{\varphi(u)-\varphi(\ell)}{\Phi(u)-\Phi(\ell)};
  μˇϕ2μˇϕ2+σˇϕ2(1uφ(u)φ()Φ(u)Φ()(φ(u)φ()Φ(u)Φ())2);\widecheck{\mu}_{\phi^{2}}\leftarrow\widecheck{\mu}_{\phi}^{2}+\widecheck{\sigma}^{2}_{\phi}\left(1-\frac{u\varphi(u)-\ell\varphi(\ell)}{\Phi(u)-\Phi(\ell)}-\left(\frac{\varphi(u)-\varphi(\ell)}{\Phi(u)-\Phi(\ell)}\right)^{2}\right);
  σˇη12(b+𝝁ˇη1𝝁ˇη1+tr(𝚺ˇη1))/(a+r/2);\widecheck{\sigma}^{2}_{\eta_{1}}\leftarrow(b+\widecheck{\bm{\mu}}_{\eta_{1}}^{\prime}\widecheck{\bm{\mu}}_{\eta_{1}}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{1}}))/(a+r/2);
  σˇη2(b+(1/2)(t=2T𝜼ˇ1𝜼ˇ1+tr(𝚺ˇη1)2μˇϕt=1T1𝜼ˇt𝜼ˇt+1+\widecheck{\sigma}^{2}_{\eta}\leftarrow(b+(1/2)(\sum_{t=2}^{T}\widecheck{\bm{\eta}}_{1}^{\prime}\widecheck{\bm{\eta}}_{1}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-1}})-2\widecheck{\mu}_{\phi}\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}\widecheck{\bm{\eta}}_{t+1}+
      μˇϕ2(t=1T1𝜼ˇt𝜼ˇt+tr(𝚺ˇηT))))/(a+r(T1)/2);\widecheck{\mu}_{\phi^{2}}(\sum_{t=1}^{T-1}\widecheck{\bm{\eta}}_{t}^{\prime}\widecheck{\bm{\eta}}_{t}+\text{tr}(\widecheck{\bm{\Sigma}}_{\eta_{-T}}))))/(a+r(T-1)/2);
  for i=1i=1 to nn do
     for t=ti,ti+1,ti+2t=t_{i},t_{i}+1,t_{i}+2 do
        ξˇit(𝑪~it𝚺ˇ𝑪it+(𝑪it𝝁ˇ)2)1/2\widecheck{\xi}_{it}\leftarrow(\widetilde{\bm{C}}_{it}^{\prime}\widecheck{\bm{\Sigma}}{\bm{C}}_{it}+({\bm{C}}_{it}^{\prime}\widecheck{\bm{\mu}})^{2})^{1/2}
     end for
    
  end for
 
end for
Algorithm 3 Variational Bayes algorithm for the longitudinal binary model

References

  • Agresti, (2010) Agresti, A. (2010). Analysis of Ordinal Categorical Data, volume 656. John Wiley & Sons.
  • Albert and Chib, (1993) Albert, J. H. and Chib, S. (1993). Bayesian Analysis of Binary and Polychotomous Response Data. Journal of the American Statistical Association, 88(422):669–679.
  • Albert and Chib, (2001) Albert, J. H. and Chib, S. (2001). Sequential Ordinal Modeling with Applications to Survival Data. Biometrics, 57(3):829–836.
  • Bailar, (1975) Bailar, B. A. (1975). The effects of rotation group bias on estimates from panel surveys. Journal of the American Statistical Association, 70(349):23–30.
  • Bauder et al., (2021) Bauder, C., Luery, D., and Szelepka, S. (2021). Small Area Estimation of Health Insurance Coverage in 2010-2019.
  • Beltrán-Sánchez et al., (2024) Beltrán-Sánchez, M. Á., Martinez-Beneito, M., and Corberán-Vallet, A. (2024). Bayesian modeling of spatial ordinal data from health surveys. Statistics in Medicine.
  • Binder, (1983) Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. International Statistical Review, 51(3):279.
  • Bishop, (2006) Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer.
  • Blei et al., (2017) Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). Variational Inference: A Review for Statisticians. Journal of the American Statistical Association, 112(518):859–877.
  • Boes and Winkelmann, (2006) Boes, S. and Winkelmann, R. (2006). Ordered Response Models. Allgemeines Statistisches Archiv, 90:167–181.
  • Bradley et al., (2015) Bradley, J. R., Holan, S. H., and Wikle, C. K. (2015). Multivariate spatio-temporal models for high-dimensional areal data with application to Longitudinal Employer-Household Dynamics. The Annals of Applied Statistics, 9(4):1761 – 1791.
  • Brewer et al., (1984) Brewer, K., Early, L., and Hanif, M. (1984). Poisson, Modified Poisson and Collocated Sampling. Journal of Statistical Planning and Inference, 10(1):15–30.
  • Bürkner and Vuorre, (2019) Bürkner, P.-C. and Vuorre, M. (2019). Ordinal Regression Models in Psychology: Tutorial. Advances in Methods and Practices in Psychological Science, 2(1):77–101.
  • Carter et al., (2024) Carter, J. B., Browning, C. R., Boettner, B., Pinchak, N., and Calder, C. A. (2024). Land-use filtering for nonstationary spatial prediction of collective efficacy in an urban environment. The Annals of Applied Statistics, 18(1).
  • Daly and Robinson, (2022) Daly, M. and Robinson, E. (2022). Depression and anxiety during COVID-19. The Lancet, 399(10324):518.
  • Durante and Rigon, (2019) Durante, D. and Rigon, T. (2019). Conditionally Conjugate Mean-Field Variational Bayes for Logistic Models. Statistical Science, 34(3).
  • Fienberg, (1980) Fienberg, S. E. (1980). The analysis of cross-classified categorical data. Massachusetts Institute of Technology Press, Cambridge and London.
  • Fienberg, (2007) Fienberg, S. E. (2007). The Analysis of Cross-classified Categorical Data. Springer Science & Business Media.
  • Gelman, (2007) Gelman, A. (2007). Struggles with Survey Weighting and Regression Modeling. Statistical Science, 22(2).
  • Gneiting and Raftery, (2007) Gneiting, T. and Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477):359–378.
  • Hawes et al., (2022) Hawes, M. T., Szenczy, A. K., Klein, D. N., Hajcak, G., and Nelson, B. D. (2022). Increases in depression and anxiety symptoms in adolescents and young adults during the COVID-19 pandemic. Psychol. Med., 52(14):3222–3230.
  • Hidiroglou and You, (2016) Hidiroglou, M. A. and You, Y. (2016). Comparison of Unit Level and Area Level Small Area Estimators. Survey Methodology, 42:41–61.
  • Higgs and Hoeting, (2010) Higgs, M. D. and Hoeting, J. A. (2010). A clipped latent variable model for spatially correlated ordered categorical data. Computational Statistics & Data Analysis, 54(8):1999 – 2011.
  • Horvitz and Thompson, (1952) Horvitz, D. G. and Thompson, D. J. (1952). A Generalization of Sampling Without Replacement from a Finite Universe. Journal of the American Statistical Association, 47(260):663–685.
  • Hughes and Haran, (2013) Hughes, J. and Haran, M. (2013). Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 75(1):139–159.
  • Jaakkola and Jordan, (2000) Jaakkola, T. and Jordan, M. I. (2000). Bayesian Parameter Estimation via Variational Methods. Statistics and Computing, 10:25–37.
  • Jimenez et al., (2023) Jimenez, A., Balamuta, J. J., and Culpepper, S. A. (2023). A Sequential Exploratory Diagnostic Model Using a Pólya-Gamma Data Augmentation Strategy. British Journal of Mathematical and Statistical Psychology, 76(3):513–538.
  • Johnson et al., (1994) Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994). Continuous Univariate Distributions, volume 1 of Wiley Series in Probability and Statistics. John Wiley & Sons, Nashville, TN, 2 edition.
  • (29) Kang, J. and Kottas, A. (2024a). Bayesian Nonparametric Risk Assessment in Developmental Toxicity Studies with Ordinal Responses. arXiv preprint arXiv:2408.11803.
  • (30) Kang, J. and Kottas, A. (2024b). Flexible Bayesian Modeling for Longitudinal Binary and Ordinal Responses. Statistics and Computing, 34(6).
  • Kullback and Leibler, (1951) Kullback, S. and Leibler, R. A. (1951). On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1):79–86.
  • Kunihama et al., (2019) Kunihama, T., Halpern, C. T., and Herring, A. H. (2019). Non-parametric Bayes Models for Mixed Scale Longitudinal Surveys. Journal of the Royal Statistical Society Series C: Applied Statistics, 68(4):1091–1109.
  • Linderman et al., (2015) Linderman, S., Johnson, M. J., and Adams, R. P. (2015). Dependent Multinomial Models Made Easy: Stick-Breaking with the Polya-Gamma Augmentation. In Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc.
  • Lumley, (2004) Lumley, T. (2004). Analysis of Complex Survey Samples. Journal of Statistical Software, 9(1):1–19. R package verson 2.2.
  • Machini et al., (2022) Machini, B., Achia, T. N., Chesang, J., Amboko, B., Mwaniki, P., and Kipruto, H. (2022). Cross-sectional Study to Predict Subnational Levels of Health Workers’ Knowledge about Severe Malaria Treatment in Kenya. BMJ Open, 12(1):e058511.
  • McCullagh, (1980) McCullagh, P. (1980). Regression Models for Ordinal Data. Journal of the Royal Statistical Society: Series B (Methodological), 42(2):109–127.
  • Metin et al., (2022) Metin, A., Erbiçer, E. S., Şen, S., and Çetinkaya, A. (2022). Gender and COVID-19 related fear and anxiety: A meta-analysis. Journal of Affective Disorders, 310:384–395.
  • Parker et al., (2022) Parker, P. A., Holan, S. H., and Janicki, R. (2022). Computationally efficient Bayesian Unit-level models for non-Gaussian data Under informative sampling with application to estimation of health insurance coverage. The Annals of Applied Statistics, 16(2):887 – 904.
  • Parker et al., (2023) Parker, P. A., Holan, S. H., and Janicki, R. (2023). Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure. Journal of Survey Statistics and Methodology.
  • Polson et al., (2013) Polson, N. G., Scott, J. G., and Windle, J. (2013). Bayesian Inference for Logistic Models Using Pólya-Gamma Latent Variables. Journal of the American Statistical Association, 108(504):1339–1349.
  • Savitsky and Toth, (2016) Savitsky, T. D. and Toth, D. (2016). Bayesian estimation under informative sampling. Electronic Journal of Statistics, 10(1):1677 – 1708.
  • Schliep and Hoeting, (2015) Schliep, E. M. and Hoeting, J. A. (2015). Data augmentation and parameter expansion for independent or spatially correlated ordinal data. Computational Statistics & Data Analysis, 90:1–14.
  • Skinner, (2018) Skinner, C. (2018). Analysis of Categorical Data for Complex Surveys. International Statistical Review, 87(S1).
  • Skinner, (1989) Skinner, C. J. (1989). Domain means, regression and multivariate analysis. In Skinner, C. J., Holt, D., and Smith, T. M. F., editors, Analysis of Complex Surveys, chapter 2, pages 59–84. Wiley, Chichester.
  • Sutradhar and Kovacevic, (2000) Sutradhar, B. C. and Kovacevic, M. (2000). Analysing ordinal longitudinal survey data: Generalised estimating equations approach. Biometrika, 87(4):837–848.
  • Thompson, (2015) Thompson, M. E. (2015). Using Longitudinal Complex Survey Data. Annual Review of Statistics and Its Application, 2(1):305–320.
  • Tutz, (1990) Tutz, G. (1990). Sequential Item Response Models with an Ordered Response. British Journal of Mathematical and Statistical Psychology, 43(1):39–55.
  • Tutz, (1991) Tutz, G. (1991). Sequential Models in Categorical Regression. Computational Statistics & Data Analysis, 11(3):275–295.
  • Tutz et al., (2005) Tutz, G., Simonoff, J., Kateri, M., Lesaffre, E., Loughin, T., Svensson, E., Aguilera, A., Liu, I., and Agresti, A. (2005). The analysis of ordered categorical data: An overview and a survey of recent developments – Discussion. TEST, 14(1):30–73.
  • Vedensky et al., (2023) Vedensky, D., Parker, P. A., and Holan, S. H. (2023). Bayesian Unit-level Models for Longitudinal Survey Data under Informative Sampling: An Analysis of Expected Job Loss Using the Household Pulse Survey. arXiv preprint arXiv:2304.07897.
  • World Health Organization, (2022) World Health Organization (2022). Mental Health and COVID-19: Early evidence of the pandemic’s impact. Technical report, World Health Organization.