
Modeling for Dynamic Ordinal Regression Relationships: An Application to Estimating Maturity of Rockfish in California

Maria DeYoreo and Athanasios Kottas. M. DeYoreo (maria.deyoreo@stat.duke.edu) is a postdoctoral researcher, Department of Statistical Science, Duke University, Durham, NC, 27708, USA, and A. Kottas (thanos@soe.ucsc.edu) is Professor, Department of Applied Mathematics and Statistics, University of California, Santa Cruz, CA, 95064, USA. The authors wish to thank Stephan Munch for providing the rockfish data and for several useful comments on the interpretation of the results, as well as Alec MacCall, Don Pearson, and Marc Mangel for valuable input on data collection and on key aspects of the specific problem from fisheries research. This research is part of the Ph.D. dissertation of M. DeYoreo, completed at University of California, Santa Cruz, and was supported in part by the National Science Foundation under award DMS 1310438. M. DeYoreo is currently supported by a grant from the National Science Foundation under award SES 1131897.
Abstract

We develop a Bayesian nonparametric framework for modeling ordinal regression relationships which evolve in discrete time. The motivating application involves a key problem in fisheries research on estimating dynamically evolving relationships between age, length and maturity, the latter recorded on an ordinal scale. The methodology builds from nonparametric mixture modeling for the joint stochastic mechanism of covariates and latent continuous responses. This approach yields highly flexible inference for ordinal regression functions while at the same time avoiding the computational challenges of parametric models. A novel dependent Dirichlet process prior for time-dependent mixing distributions extends the model to the dynamic setting. The methodology is used for a detailed study of relationships between maturity, age, and length for Chilipepper rockfish, using data collected over 15 years along the coast of California.

KEY WORDS: Chilipepper rockfish; dependent Dirichlet process; dynamic density estimation; growth curves; Markov chain Monte Carlo; ordinal regression

1 Introduction

Consider ordinal responses collected along with covariates over discrete time. Furthermore, assume multiple observations are recorded at each point in time. This article develops Bayesian nonparametric modeling and inference for a discrete time series of ordinal regression relationships. Our aim is to provide flexible inference for the series of regression functions, estimating the unique relationships present at each time, while introducing dependence by assuming each distribution is correlated with its predecessors.

Environmental characteristics consisting of ordered categorical and continuous measurements may be monitored and recorded at different points in time, requiring a model for the temporal relationships between the environmental variables. The relationships present at a particular point in time are of interest, as well as any trends or changes which occur over time. Empirical distributions in environmental settings may exhibit non-standard features including heavy tails, skewness, and multimodality. To capture these features, one must move beyond standard parametric models in order to obtain more flexible inference and prediction.

The motivating application for this work lies in modeling fish maturity as a function of age and length. This is a key problem in fisheries science, one reason being that estimates of age at maturity play an important role in population estimates of sustainable harvest rates (Clark, 1991; Hannah et al., 2009). The specific data set comes from the National Marine Fisheries Service and consists of year of sampling, age recorded in years, length in millimeters, and maturity for female Chilipepper rockfish, with measurements collected over 15 years along the coast of California. Maturity is recorded on an ordinal scale, with values taken to be from 1 through 3, where 1 indicates immature and 2 and 3 represent pre- and post-spawning mature, respectively. More details on the data are provided in Section 3. Exploratory analysis suggests both symmetric, unimodal as well as less standard shapes for the marginal distributions of length and age; histograms for three years are shown in Figure 1. Bivariate data plots of age and length suggest similar shapes across years, with some differences in location and scale, and clear differences in sample size, as can be seen from Figure 2. To make the plot more readable, random noise has been added to age, which is recorded on a discretized scale. Maturity level is also indicated; red color represents immature, green pre-spawning mature, and blue post-spawning mature. Again, there are similarities including the concentration of immature fish near the lower left quadrants, but also differences such as the lack of immature fish in years 1995 through 2000 as compared to the early and later years.

Figure 1: Data histograms of length and age for female Chilipepper rockfish in years 1994, 1998, and 2000.
Figure 2: Bivariate plots of length versus age at each year of data, with data points colored according to maturity level. Red represents level 1, green level 2, and blue level 3. Values of age have been jittered to make the plots more readable.

In addition to studying maturity as a function of age and length, inference for the age and length distributions is also important. This requires a joint model which treats age and length as random in addition to maturity. We are not aware of any existing modeling strategy for this problem which can handle multivariate mixed data collected over time. Compromising this important aspect of the problem, and assuming the regression of maturity on body characteristics is the sole inferential objective, a possible approach would be to use an ordered probit regression model. However, empirical (data-based) estimates for the trend in maturity as a function of length or age indicate shapes which may not be captured well by a parametric model. For instance, the probability a fish is immature (level 1) is generally decreasing with length; however, in some of the years, the probability a fish is post-spawning mature (level 3) increases up to a certain length value and then decreases. This is not a trend that can be captured by parametric models for ordinal regression (Boes and Winkelmann, 2006, discuss some of these properties). One could include higher order and/or interaction terms, though it is not obvious which terms to include, or how to capture the different trends across years.

In practice, virtually all methods for studying maturity as a function of age and/or length use logistic regression or some variant, often collapsing maturity into two levels (immature and mature) and treating each covariate separately in the analysis (e.g., Hannah et al., 2009; Bobko and Berkeley, 2004). Bobko and Berkeley (2004) applied logistic regression with length as a covariate, and to obtain an estimate of age at 50% maturity (the age at which 50% of fish are mature), they used their estimate for length at 50% maturity and solved for the corresponding age given by the von Bertalanffy growth curve, which relates age to length using a particular parametric function. Others assume that maturity is independent of length after conditioning on age, leading to inaccurate estimates of the proportion mature at a particular age or length (Morgan and Hoenig, 1997).

We develop a nonparametric Bayesian model to study time-evolving relationships between maturation, length, and age. These three variables constitute a random vector, and although maturity is recorded on an ordinal scale, it is natural to conceptualize an underlying continuous maturation variable. Distinguishing features of our approach include the joint modeling of the stochastic mechanism for maturation, length, and age, and the ability to obtain flexible time-dependent inference for multiple ordinal maturation categories. While estimating maturity as a function of length and age is of primary interest, the joint modeling framework provides inference for a variety of functionals involving the three body characteristics.

Our goal is to construct a modeling framework for dynamic ordinal regression which avoids strict parametric assumptions and possesses features that make it well-suited to the fish maturity application, as well as to similar evolutionary biology problems on studying natural selection characteristics (such as survival or maturation) in terms of phenotypic traits. To this end, we build on previous work on ordinal regression not involving time (DeYoreo and Kottas, 2014a), where the ordinal responses arise from latent continuous variables, and the joint latent response-covariate distribution is modeled using a Dirichlet process (DP) mixture of multivariate normals (Müller et al., 1996). In the context of the rockfish data, we model maturity, length, and age jointly, using a DP mixture. This modeling approach is further developed here to handle ordinal regressions which are indexed in discrete time, through use of a new dependent Dirichlet process (DDP) prior (MacEachern, 1999, 2000), which estimates the regression relationship at each time point in a flexible way, while incorporating dependence across time.

We review the model for ordinal regression without the time component in Section 2.1. Section 2.2 introduces the DDP, and in Section 2.3, we develop a new method for incorporating dependence in the DP weights to handle distributions indexed in discrete time. The model is then developed further in Section 3 in the context of the motivating application, and applied to analyze the rockfish data discussed above. Section 4 concludes with a discussion. Technical details on properties of the DDP prior model, and on the posterior simulation method are included in the appendixes.

2 Modeling Framework

2.1 Bayesian Nonparametric Ordinal Regression

We first describe our approach to regression in the context of a single distribution, that is, without any aspect of time. Let $\{(y_{i},\boldsymbol{x}_{i}):i=1,\dots,n\}$ denote the data, where each observation consists of an ordinal response $y_{i}$ along with a vector of covariates $\boldsymbol{x}_{i}=(x_{i1},\dots,x_{ip})$. The methodology is developed in DeYoreo and Kottas (2014a) for multivariate ordinal responses; however, we work with a univariate response for notational simplicity and because this is the relevant setting for the application of interest. Our model assumes the ordinal responses arise as discretized versions of latent continuous responses, which is natural for many settings and particularly relevant for the fish maturity application, as maturation is a continuous variable recorded on a discrete scale. With $C$ categories, introduce latent continuous responses $(Z_{1},\dots,Z_{n})$ such that $Y_{i}=j$ if and only if $Z_{i}\in(\gamma_{j-1},\gamma_{j}]$, for $j=1,\dots,C$, and cut-offs $-\infty=\gamma_{0}<\gamma_{1}<\dots<\gamma_{C}=\infty$.

We focus on settings in which the covariates may be treated as random, which is appropriate, indeed necessary, for many environmental applications. In the fish maturity example, the body characteristics are interrelated and arise in the form of a data vector, and we are interested in various relationships, including but not limited to the way in which maturity varies with age and length. This motivates our focus on building a flexible model for the joint density $f(z,\boldsymbol{x})$, for which we apply a DP mixture of multivariate normals: $(z_{i},\boldsymbol{x}_{i})\mid G\stackrel{iid}{\sim}\int\mathrm{N}(\cdot;\boldsymbol{\mu},\boldsymbol{\Sigma})\,dG(\boldsymbol{\mu},\boldsymbol{\Sigma})$, with $G\mid\alpha,G_{0}\sim\mathrm{DP}(\alpha,G_{0})$.

By the constructive definition of the DP (Sethuraman, 1994), a realization $G$ from a $\mathrm{DP}(\alpha,G_{0})$ is almost surely of the form $G=\sum_{l=1}^{\infty}p_{l}\delta_{\boldsymbol{\theta}_{l}}$. The locations $\boldsymbol{\theta}_{l}=(\boldsymbol{\mu}_{l},\boldsymbol{\Sigma}_{l})$ are independent realizations from the centering distribution $G_{0}$, and the weights are determined through stick-breaking from beta distributed random variables. In particular, let $v_{l}\stackrel{iid}{\sim}\mathrm{beta}(1,\alpha)$, $l=1,2,\dots$, independently of $\{\boldsymbol{\theta}_{l}\}$, and define $p_{1}=v_{1}$, and, for $l=2,3,\dots$, $p_{l}=v_{l}\prod_{r=1}^{l-1}(1-v_{r})$. Therefore, the model for $f(z,\boldsymbol{x})$ has an almost sure representation as a countable mixture of multivariate normals, and implies the following induced model for the regression function:

\mathrm{Pr}(Y=j\mid\boldsymbol{x};G)=\sum_{r=1}^{\infty}w_{r}(\boldsymbol{x})\int_{\gamma_{j-1}}^{\gamma_{j}}\mathrm{N}(z;m_{r}(\boldsymbol{x}),s_{r})\,dz \qquad (1)

with covariate-dependent weights $w_{r}(\boldsymbol{x})\propto p_{r}\mathrm{N}(\boldsymbol{x};\boldsymbol{\mu}_{r}^{x},\boldsymbol{\Sigma}_{r}^{xx})$, covariate-dependent means $m_{r}(\boldsymbol{x})=\mu_{r}^{z}+\boldsymbol{\Sigma}_{r}^{zx}(\boldsymbol{\Sigma}_{r}^{xx})^{-1}(\boldsymbol{x}-\boldsymbol{\mu}_{r}^{x})$, and variances $s_{r}=\Sigma_{r}^{zz}-\boldsymbol{\Sigma}_{r}^{zx}(\boldsymbol{\Sigma}_{r}^{xx})^{-1}\boldsymbol{\Sigma}_{r}^{xz}$. Here, $\boldsymbol{\mu}_{r}$ is partitioned into $\mu_{r}^{z}$ and $\boldsymbol{\mu}_{r}^{x}$ according to $Z$ and $\boldsymbol{X}$, and $(\Sigma_{r}^{zz},\boldsymbol{\Sigma}_{r}^{xx},\boldsymbol{\Sigma}_{r}^{zx},\boldsymbol{\Sigma}_{r}^{xz})$ are the components of the corresponding partition of covariance matrix $\boldsymbol{\Sigma}_{r}$.
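
To illustrate how the induced regression function in (1) is evaluated, the following minimal sketch (in Python, not part of the original analysis) draws a truncated stick-breaking approximation to $G$ from an illustrative centering distribution and computes $\mathrm{Pr}(Y=j\mid x;G)$ for a single covariate and $C=3$ categories. The truncation level, the normal/inverse-Wishart choices for $G_{0}$, the cut-offs, and all numerical values are assumptions made only for this example.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal, invwishart

rng = np.random.default_rng(1)

def truncated_dp_mixture(alpha, N, m0, V0, nu, D):
    """Truncated stick-breaking draw of (p_l, mu_l, Sigma_l), l = 1,...,N, from an illustrative G0."""
    v = rng.beta(1.0, alpha, size=N)                             # v_l ~ beta(1, alpha)
    v[-1] = 1.0                                                  # make the truncated weights sum to one
    p = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))    # p_l = v_l * prod_{r<l} (1 - v_r)
    mu = rng.multivariate_normal(m0, V0, size=N)                 # mu_l ~ N(m0, V0)
    Sigma = invwishart.rvs(df=nu, scale=D, size=N)               # Sigma_l ~ IW(nu, D)
    return p, mu, Sigma

def ordinal_probs(x, p, mu, Sigma, cuts):
    """Induced regression function Pr(Y = j | x; G), j = 1,...,C, as in equation (1)."""
    x = np.atleast_1d(x)
    w = np.array([p_r * multivariate_normal.pdf(x, mean=m[1:], cov=S[1:, 1:])
                  for p_r, m, S in zip(p, mu, Sigma)])
    w /= w.sum()                                                 # covariate-dependent weights w_r(x)
    probs = np.zeros(len(cuts) - 1)
    for w_r, m, S in zip(w, mu, Sigma):
        m_r = m[0] + S[0, 1:] @ np.linalg.solve(S[1:, 1:], x - m[1:])    # conditional mean m_r(x)
        s_r = S[0, 0] - S[0, 1:] @ np.linalg.solve(S[1:, 1:], S[1:, 0])  # conditional variance s_r
        probs += w_r * np.diff(norm.cdf(cuts, loc=m_r, scale=np.sqrt(s_r)))
    return probs

# Hypothetical example: one covariate, C = 3 categories, fixed cut-offs (-inf, -1, 1, inf)
p, mu, Sigma = truncated_dp_mixture(alpha=2.0, N=25, m0=np.zeros(2), V0=np.eye(2), nu=4, D=np.eye(2))
print(ordinal_probs(0.5, p, mu, Sigma, cuts=np.array([-np.inf, -1.0, 1.0, np.inf])))
```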

This modeling strategy allows for non-linear, non-standard relationships to be captured, and overcomes many limitations of standard parametric models. In addition, the cut-offs may be fixed to arbitrary increasing values (which we recommend to be equally spaced and centered at zero) without sacrificing the ability of the model to approximate any distribution for mixed ordinal-continuous data. In particular, it can be shown that the induced prior model on the space of mixed ordinal-continuous distributions assigns positive probability to all Kullback-Leibler neighborhoods of any distribution in this space. The ability to fix the cut-offs thus represents a key computational advantage over parametric models. We refer to DeYoreo and Kottas (2014a) for more details on model properties, a review of existing approaches to ordinal regression, and illustration of the benefits afforded by the nonparametric joint model over standard methods. This discussion refers to ordinal responses with three or more categories. For the case of binary regression, i.e., when $C=2$, additional restrictions are needed on the covariance matrix $\boldsymbol{\Sigma}$ to facilitate identifiability (DeYoreo and Kottas, 2014b).

2.2 Dependent Dirichlet Processes

In developing a model for a collection of distributions indexed in discrete time, we seek to build on previous knowledge, retaining the powerful and well-studied DP mixture model marginally at each time $t\in\mathcal{T}$, with $\mathcal{T}=\{1,2,\dots\}$. We thus seek to extend the DP prior to model $G_{\mathcal{T}}=\{G_{t}:t\in\mathcal{T}\}$, a set of dependent distributions such that each $G_{t}$ follows a DP marginally. The dynamic DP extension can be developed by introducing temporal dependence in the weights and/or atoms of the constructive definition, $G=\sum_{l=1}^{\infty}p_{l}\delta_{\boldsymbol{\theta}_{l}}$.

The general formulation of the DDP introduced by MacEachern (1999, 2000) expresses the atoms $\boldsymbol{\theta}_{l\mathcal{S}}=\{\boldsymbol{\theta}_{l,t}:t\in\mathcal{S}\}$, $l=1,2,\dots$, as independently and identically distributed (i.i.d.) sample paths from a stochastic process over $\mathcal{S}$, and the latent beta random variables which drive the weights, $\boldsymbol{v}_{l\mathcal{S}}=\{v_{l,t}:t\in\mathcal{S}\}$, $l=1,2,\dots$, as i.i.d. realizations from a stochastic process with $\mathrm{beta}(1,\alpha_{t})$ marginal distributions. The distributions could be indexed in time, space, or by covariates, and $\mathcal{S}$ represents the corresponding index set, being $\mathbb{Z}^{+}$ in our case. The DDP model for distributions indexed in discrete time expresses $G_{t}$ as $\sum_{l=1}^{\infty}p_{l,t}\delta_{\boldsymbol{\theta}_{l,t}}$, for $t\in\mathcal{T}$. The locations $\boldsymbol{\theta}_{l\mathcal{T}}=\{\boldsymbol{\theta}_{l,t}:t\in\mathcal{T}\}$ are i.i.d., for $l=1,2,\dots$, from a time series model for the kernel parameters. The stick-breaking weights $\boldsymbol{p}_{l\mathcal{T}}=\{p_{l,t}:t\in\mathcal{T}\}$, $l=1,2,\dots$, arise through a latent time series with $\mathrm{beta}(1,\alpha_{t})$ marginal distributions, independently of $\boldsymbol{\theta}_{l\mathcal{T}}$.

The general DDP can be simplified by introducing dependence only in the weights, such that the atoms are not time dependent, or alternatively, the atoms can be time dependent while the weights remain independent of time. We refer to these as common atoms and common weights models, respectively. While the majority of DDP applications fall into the common weights category, we believe the more natural of the two simplifications for time series is to assume that the locations are constant over time, and introduce dependence in the weights. Although we will end up working with the more general version of the DDP with dependent atoms and weights, for the application and related settings it seems plausible that there is a fixed set of factors that determine the region in which the joint density of body characteristics is supported, but dynamics are caused by changes in the relative importance of the factors. The construction of dependent weights requires dependent beta random variables, so that $p_{1,t}=v_{1,t}$, $p_{l,t}=v_{l,t}\prod_{r=1}^{l-1}(1-v_{r,t})$, for $l=2,3,\dots$, with each $\{v_{l,t}:t\in\mathcal{T}\}$ a realization from a time series model with $\mathrm{beta}(1,\alpha)$ marginals. Equivalently, we can write $p_{1,t}=1-\beta_{1,t}$, $p_{l,t}=(1-\beta_{l,t})\prod_{r=1}^{l-1}\beta_{r,t}$, for $l=2,3,\dots$, with each $\{\beta_{l,t}:t\in\mathcal{T}\}$ a realization from a time series model with $\mathrm{beta}(\alpha,1)$ marginals.

There have been many variations of the DDP model proposed in the literature. The common weights version was originally discussed by MacEachern (2000), in which a Gaussian process was used to generate dependent locations, with the autocorrelation function controlling the degree to which distributions which are “close” are similar, and how quickly this similarity decays. De Iorio et al. (2004) also consider a common weights model, in which the index of dependence is a covariate, a key application of DDP models. In the order-based DDP of Griffin and Steel (2006), covariates are used to sort the weights. Covariate dependence is incorporated in the weights in the kernel and probit stick-breaking models of Dunson and Park (2008) and Rodriguez and Dunson (2011), respectively; however, these prior models do not retain the DP marginally. Gelfand et al. (2005) developed a DP mixture model for spatial data, using a spatial Gaussian process to induce dependence in DDP locations indexed by space. For data indexed in discrete time, Rodriguez and ter Horst (2008) apply a common weights model, with atoms arising from a dynamic linear model. Di Lucca et al. (2013) develop a model for a single time series of continuous or binary responses through a DDP in which the atoms are dependent on lagged terms. Xiao et al. (2015) construct a dynamic model for Poisson process intensities built from a DDP mixture with common weights and different types of autoregressive processes for the atoms. Taddy (2010) assumes the alternative simplification of the DDP with common atoms, and models the stick-breaking proportions $\{v_{l,t}:t\in\mathcal{T}\}$ using the positively correlated autoregressive beta process of McKenzie (1985). Nieto-Barajas et al. (2012) also use the common atoms simplification of the DDP, modeling a time series of random distributions by linking the beta random variables through latent binomially distributed random variables.

2.3 A Time-Dependent Nonparametric Prior

To generate a correlated series $\{\beta_{l,t}:t\in\mathcal{T}\}$ such that each $\beta_{l,t}\sim\mathrm{beta}(\alpha,1)$ marginally, we define a stochastic process

\mathcal{B}=\left\{\beta_{t}=\exp\left(-\frac{\zeta^{2}+\eta^{2}_{t}}{2\alpha}\right):\,t\in\mathcal{T}\right\}, \qquad (2)

which is built from a standard normal random variable $\zeta$ and an independent stochastic process $\eta_{\mathcal{T}}=\{\eta_{t}:t\in\mathcal{T}\}$ with standard normal marginal distributions. This transformation leads to marginal distributions $\beta_{t}\sim\mathrm{beta}(\alpha,1)$ for any $t$. To see this, take two independent standard normal random variables $Y_{1}$ and $Y_{2}$, such that $W=(Y_{1}^{2}+Y_{2}^{2})/2$ follows an exponential distribution with mean 1, and thus $B=\exp(-W/\alpha)\sim\mathrm{beta}(\alpha,1)$. To our knowledge, this is a novel construction for a common atoms DDP prior model. The practical utility of the transformation in (2) is that it facilitates building the temporal dependence through Gaussian time-series models, while maintaining the DP structure marginally.

Because we work with distributions indexed in discrete time, we assume $\eta_{\mathcal{T}}$ to be a first-order autoregressive (AR) process, although alternatives such as higher order processes, or Gaussian processes for spatially indexed data, are possible. The requirement of standard normal marginal distributions on $\eta_{\mathcal{T}}$ leads to a restriction on the variance of the AR(1) model, such that $\eta_{t}\mid\eta_{t-1}\sim\mathrm{N}(\phi\eta_{t-1},1-\phi^{2})$, $t=2,\dots,T$. Thus $|\phi|<1$, which implies stationarity for the stochastic process $\eta_{\mathcal{T}}$. Since $\mathcal{B}$ is a transformation of a strongly stationary stochastic process, it is also strongly stationary. Note that the correlation in $(\beta_{l,t},\beta_{l,t+k})$ is driven by the autocorrelation present in $\eta_{\mathcal{T}}$, and this induces dependence in the weights $(p_{l,t},p_{l,t+k})$, which leads to dependent distributions $(G_{t},G_{t+k})$.
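
The construction can be simulated directly. The sketch below (illustrative values of $\alpha$, $\phi$, $T$, and a truncation level $N$, none of which are part of the model specification) generates the standard normal $\zeta_{l}$, the AR(1) series $\eta_{l,t}$, the transformed $\beta_{l,t}$, and the resulting time-dependent stick-breaking weights, and then runs a quick Monte Carlo check of the implied $\mathrm{beta}(\alpha,1)$ marginal; the last weight is set to the remaining stick purely for the truncated illustration.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(7)

def dynamic_stick_weights(alpha, phi, T, N):
    """Time-dependent stick-breaking weights p_{l,t} from the beta(alpha, 1) construction in (2)."""
    zeta = rng.normal(size=N - 1)                              # zeta_l ~ N(0, 1)
    eta = np.empty((N - 1, T))
    eta[:, 0] = rng.normal(size=N - 1)                         # eta_{l,1} ~ N(0, 1)
    for t in range(1, T):                                      # AR(1) with N(0, 1) marginals
        eta[:, t] = rng.normal(phi * eta[:, t - 1], np.sqrt(1 - phi**2))
    beta = np.exp(-(zeta[:, None]**2 + eta**2) / (2 * alpha))  # beta_{l,t} ~ beta(alpha, 1) marginally
    p = np.empty((N, T))
    cum = np.cumprod(beta, axis=0)                             # prod_{r<=l} beta_{r,t}
    p[0] = 1 - beta[0]
    p[1:N - 1] = (1 - beta[1:]) * cum[:-1]
    p[N - 1] = cum[-1]                                         # truncation: last weight takes the remainder
    return p

p = dynamic_stick_weights(alpha=1.0, phi=0.9, T=15, N=30)
print(p.sum(axis=0).round(8))                                  # weights sum to 1 at every time point

# Monte Carlo check of the beta(alpha, 1) marginal implied by the transformation
a = 1.0
zeta, eta = rng.normal(size=50_000), rng.normal(size=50_000)
print(kstest(np.exp(-(zeta**2 + eta**2) / (2 * a)), "beta", args=(a, 1.0)))
```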

We now explore this dependence, discussing some of the correlations implied by this prior model. First, consider the correlation of the beta random variables used to define the dynamic stick-breaking weights. Let $\rho_{k}=\mathrm{corr}(\eta_{t},\eta_{t+k})$, which is equal to $\phi^{k}$ under the assumption of an AR(1) process for $\eta_{\mathcal{T}}$. The autocorrelation function associated with $\mathcal{B}$ is

\mathrm{corr}(\beta_{t},\beta_{t+k}\mid\alpha,\phi)=\frac{\alpha^{1/2}(1-\rho^{2}_{k})^{1/2}(\alpha+1)^{2}(\alpha+2)^{1/2}}{\left\{(1-\rho^{2}_{k}+\alpha)^{2}-\alpha^{2}\rho^{2}_{k}\right\}^{1/2}}-\alpha(\alpha+2) \qquad (3)

as described in Appendix A. Smaller values for $\alpha$ lead to smaller correlations for any fixed $\phi$ at a particular lag, and $\phi$ controls the strength of correlation, with large $\phi$ producing large correlations which decay slowly. The parameters $\phi$ and $\alpha$ in combination can lead to a wide range of correlations; however, $\alpha\geq 1$ implies a lower bound near 0.5 on the correlation for any lag $k$. In the limit, as $\alpha\rightarrow 0^{+}$, $\mathrm{corr}(\beta_{t},\beta_{t+k}\mid\alpha,\phi)\rightarrow 0$, and as $\alpha\rightarrow\infty$, $\mathrm{corr}(\beta_{t},\beta_{t+k}\mid\alpha,\phi)$ tends towards 0.5 as $\rho_{k}\rightarrow 0^{+}$, and 1 as $\rho_{k}\rightarrow 1^{-}$. Assuming $\rho_{k}=\phi^{k}$ gives $\lim_{\phi\to 1^{-}}\mathrm{corr}(\beta_{t},\beta_{t+k}\mid\alpha,\phi)=1$ and $\lim_{\phi\to 0^{+}}\mathrm{corr}(\beta_{t},\beta_{t+k}\mid\alpha,\phi)=\alpha^{1/2}(\alpha+1)(\alpha+2)^{1/2}-\alpha(\alpha+2)$. The latter limit tends upwards to 0.5 quickly as $\alpha\rightarrow\infty$.
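
To make these limiting statements concrete, the short sketch below is a direct transcription of (3) with $\rho_{k}=\phi^{k}$; the values of $\alpha$ and $\phi$ are arbitrary and chosen only to display the range of behavior.

```python
import numpy as np

def corr_beta(alpha, phi, k):
    """corr(beta_t, beta_{t+k} | alpha, phi) from equation (3), with rho_k = phi**k."""
    rho2 = phi**(2 * k)
    num = np.sqrt(alpha * (1 - rho2) * (alpha + 2)) * (alpha + 1)**2
    den = np.sqrt((1 - rho2 + alpha)**2 - alpha**2 * rho2)
    return num / den - alpha * (alpha + 2)

for alpha in (0.25, 1.0, 5.0):
    print(alpha, [round(corr_beta(alpha, 0.9, k), 3) for k in (1, 2, 5, 10)])
# Small alpha gives small correlations; alpha >= 1 keeps the correlation above roughly 0.5 at every lag.
```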

Note that $\mathrm{corr}(\beta_{t},\beta_{t+k}\mid\alpha,\phi)$ is a function of $\rho^{2}_{k}$ but not $\rho_{k}$, which is to be expected since $\eta_{t}$ enters the expression for $\beta_{t}$ only through $\eta_{t}^{2}$, and thus $-\rho_{k}$ and $\rho_{k}$ have the same effect on the correlation. The same is true of the correlation in the DP weights, which is given below. We believe that $\phi\in(0,1)$ is a natural restriction, since we are building a stochastic process for distributions correlated in time through a transformation of an AR process, which intuition suggests should be positively correlated. However, all that is strictly required to preserve the DP marginals is $|\phi|<1$.

Assuming $\boldsymbol{\beta}_{l,\mathcal{T}}=\{\beta_{l,t}:t\in\mathcal{T}\}$ is generated by $\mathcal{B}$, from an underlying AR(1) process for $\eta_{\mathcal{T}}$ with coefficient $\phi$, we study the dependence induced in the resulting DDP weights at consecutive time points, $(p_{l,t},p_{l,t+1})$. The covariance is given by

\mathrm{cov}(p_{l,t},p_{l,t+1}\mid\alpha,\phi)=\left\{\frac{\alpha^{3/2}(1-\phi^{2})^{1/2}}{(2+\alpha)^{1/2}\left\{(1-\phi^{2}+\alpha)^{2}-\alpha^{2}\phi^{2}\right\}^{1/2}}\right\}^{l-1}\left\{1-\frac{2\alpha}{\alpha+1}+\frac{\alpha^{3/2}(1-\phi^{2})^{1/2}}{(2+\alpha)^{1/2}\left\{(1-\phi^{2}+\alpha)^{2}-\alpha^{2}\phi^{2}\right\}^{1/2}}\right\}-\frac{\alpha^{2l-2}}{(1+\alpha)^{2l}}, \qquad (4)

which can be divided by $\mathrm{var}(p_{l,t}\mid\alpha)=\{2\alpha^{l-1}/((1+\alpha)(2+\alpha)^{l})\}-\{\alpha^{2l-2}/(1+\alpha)^{2l}\}$ to yield $\mathrm{corr}(p_{l,t},p_{l,t+1}\mid\alpha,\phi)$; note that $\mathrm{E}(p_{l,t}\mid\alpha)=\mathrm{E}(p_{l,t+1}\mid\alpha)$ and $\mathrm{var}(p_{l,t}\mid\alpha)=\mathrm{var}(p_{l,t+1}\mid\alpha)$. Derivations are given in Appendix A. This expression is decreasing in index $l$, and larger values of $\phi$ lead to larger correlations in the weights at any particular $l$. Moreover, the decay in correlations with weight index is faster for small $\alpha$ and small $\phi$. As $\alpha\rightarrow 0^{+}$, $\mathrm{corr}(p_{1,t},p_{1,t+1}\mid\alpha,\phi)\rightarrow 1$ for any value of $\phi$, and as $\alpha\rightarrow\infty$, $\mathrm{corr}(p_{1,t},p_{1,t+1}\mid\alpha,\phi)$ is contained in $(0.5,1)$, with values closer to 1 for larger $\phi$. Note that $\mathrm{corr}(p_{l,t},p_{l,t+k}\mid\alpha,\phi)$ has the same expression as $\mathrm{corr}(p_{l,t},p_{l,t+1}\mid\alpha,\phi)$, but with $\phi$ replaced by $\phi^{k}$; it is thus decreasing with the lag $k$, with the speed of decay controlled by $\phi$.
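
The decay described here can likewise be inspected numerically. The sketch below transcribes (4) together with the variance expression above, with $\phi$ replaced by $\phi^{k}$ for lag $k$; the parameter values are again arbitrary.

```python
import numpy as np

def corr_weights(alpha, phi, l, k=1):
    """corr(p_{l,t}, p_{l,t+k} | alpha, phi): equation (4) divided by var(p_{l,t} | alpha), with phi -> phi**k."""
    ph = phi**k
    e_bb = (alpha**1.5 * np.sqrt(1 - ph**2)
            / np.sqrt((2 + alpha) * ((1 - ph**2 + alpha)**2 - alpha**2 * ph**2)))
    mean_sq = alpha**(2 * l - 2) / (1 + alpha)**(2 * l)        # {E(p_{l,t} | alpha)}^2
    cov = e_bb**(l - 1) * (1 - 2 * alpha / (alpha + 1) + e_bb) - mean_sq
    var = 2 * alpha**(l - 1) / ((1 + alpha) * (2 + alpha)**l) - mean_sq
    return cov / var

print([round(corr_weights(1.0, 0.9, l), 3) for l in (1, 2, 5, 10)])   # slow decay over component index l
print([round(corr_weights(1.0, 0.9, 1, k), 3) for k in (1, 2, 5)])    # decay over the lag k
```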

Finally, assume $G_{t}$ is a random distribution on $\mathbb{R}^{M}$, such that $G_{t}=\sum_{l=1}^{\infty}p_{l,t}\delta_{\boldsymbol{\theta}_{l}}$, where the $\{p_{l,t}:t\in\mathcal{T}\}$ are defined through the dynamic stick-breaking weights $\{\beta_{l,t}:t\in\mathcal{T}\}$, and the $\boldsymbol{\theta}_{l}$ are i.i.d. from a distribution $G_{0}$ on $\mathbb{R}^{M}$. Consider two consecutive distributions $(G_{t},G_{t+1})$, and a measurable subset $A\subset\mathbb{R}^{M}$. The correlation of consecutive distributions, $\mathrm{corr}(G_{t}(A),G_{t+1}(A)\mid\phi,\alpha,G_{0})$, is discussed in Appendix A.

3 Estimating Maturity of Rockfish

3.1 Chilipepper Rockfish Data

We now combine the approach to ordinal regression of Section 2.1, which builds on DP mixtures of normals for the latent response-covariate distribution, with the method of Section 2.3 for incorporating dependence into the DP weights, in order to develop the DDP mixture modeling framework for dynamic ordinal regression.

In the original rockfish data source, maturity is recorded on an ordinal scale from 1 to 6, representing immature (1), early and late vitellogenesis (2, 3), eyed larvae (4), and post-spawning (5, 6). Because scientists are not necessarily interested in differentiating between every one of these maturity levels, and to make the model output simple and more interpretable, we collapse maturity into three ordinal levels, representing immature (1), pre-spawning mature (2, 3, 4), and post-spawning mature (5, 6).

Many observations have age missing or maturity recorded as unknown. Exploratory analysis suggests no systematic pattern in the missingness. Further discussion with fisheries research scientists having expertise in aging of rockfish and data collection revealed that the reason for missing age in a sample is that otoliths (ear stones used in aging) were not collected or have not yet been aged. Maturity may be recorded as unknown because it can be difficult to distinguish between stages, and samplers are told to record unknown unless they are reasonably sure of the stage. Therefore, there is no systematic reason that age or maturity is not present, and it is reasonable to assume that the data are missing at random, that is, that the probability an observation is missing does not depend on the missing values. This allows us to ignore the missing data mechanism and base inferences only on the complete data (e.g., Rubin, 1976; Gelman et al., 2004).

Age cannot be treated as a continuous covariate, as there are approximately 25 distinct values of age in over 2,200 observations. Age is in fact an ordinal random variable, such that a recorded age $j$ implies the fish was between $j$ and $j+1$ years of age. This relationship between discrete recorded age and continuous age is obtained by the following reasoning. Chilipepper rockfish are winter spawning, and the young are assumed to be born in early January. The annuli (rings) of the otoliths are counted in order to determine age, and these also form sometime around January. Thus, for each ring, there has been one year of growth.

We therefore treat age much in the same way as maturity, using a latent continuous age variable. Let $U$ represent observed ordinal age, let $U^{*}$ represent underlying continuous age, and assume, for $j=1,2,\dots$, that $U=j$ iff $U^{*}\in(j,j+1]$. Equivalently, $U=j$ iff $\log(U^{*})\in(\log(j),\log(j+1)]$, for $j=1,2,\dots$, and $U=0$ iff $\log(U^{*})\in(-\infty,0]$, so that the support of the latent continuous random variable corresponding to age is $\mathbb{R}$. Letting $W$ be the latent continuous random variable which determines $U$ through discretization, we assume $u_{t,i}=j$ iff $w_{t,i}\in(\log(j),\log(j+1)]$, for $j=0,1,\dots$, so that $W$ is interpretable as log-age on a continuous scale.
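
A minimal sketch of this mapping from recorded age to the latent log-age interval (purely illustrative; the $u=0$ case maps to $(-\infty,0]$):

```python
import numpy as np

def log_age_interval(u):
    """Latent log-age interval (log j, log(j+1)] for recorded age U = u; u = 0 maps to (-inf, 0]."""
    lower = -np.inf if u == 0 else np.log(u)
    return (lower, np.log(u + 1))

print({u: log_age_interval(u) for u in (0, 1, 2, 6)})
```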

Considering year of sampling as the index of dependence, observations occur in years 1993 through 2007, indexed by $t=1,\dots,T=15$, with no observations in 2003, 2005, or 2006. Let the missing years be given by $\boldsymbol{s}$, here $\boldsymbol{s}=\{11,13,14\}$, and let $\boldsymbol{s}^{c}=\{1,\dots,T\}\setminus\boldsymbol{s}$ represent all other years in $\{1,\dots,T\}$. This situation, involving time points at which data are completely missing, is not uncommon in these types of problems, and can be handled with our model for equally spaced time points. We retain the ability to provide inference and estimation in years for which no data were recorded; as we will see, these inferences are reasonable and exhibit more uncertainty than those in other years.

3.2 Hierarchical Model and Implementation Details

The common atoms DDP model presented in Section 2.3 was tested extensively on simulated data, and performed very well in capturing trends in the underlying distributions when there were no missing years and forecasting was not the focus. In these settings, the common atoms model is sufficiently powerful from an inferential perspective that the need to turn to a more complex model is diminished. However, as a consequence of forcing the same set of components to be present at each time point, density estimates at missing time points tend to resemble an average across all time points, which is not desirable when a trend or change in support is suggested. A more general model is required for this application, since it is important to make inferences in years for which data was not collected. We therefore consider a simple extension, adding dependence through a vector autoregressive model in the mean component of the DP atoms, such that $\boldsymbol{\theta}_{l}=(\boldsymbol{\mu}_{l},\boldsymbol{\Sigma}_{l})$ becomes $\boldsymbol{\theta}_{l,t}=(\boldsymbol{\mu}_{l,t},\boldsymbol{\Sigma}_{l})$.

Assuming $Z$ represents maturity $Y$ on a continuous scale, $W$ is interpretable as log-age, and $X$ represents length, a dependent nonparametric mixture model is applied to estimate the time-dependent distributions of the trivariate continuous random vectors $\boldsymbol{Y}^{*}_{ti}=(Z_{ti},W_{ti},X_{ti})$, for $t=1,\dots,T$, and $i=1,\dots,n_{t}$. In our notation, $T$ is the number of years or the final year containing data (and it is possible that not all years in $\{1,\dots,T\}$ contain observations), and $n_{t}$ is the sample size in year $t$.

We utilize the computationally efficient approach to inference which involves truncating the countable representation for each $G_{t}$ to a finite level $N$ (Ishwaran and James, 2001), such that the dependent stick-breaking weights are given by $p_{1,t}=1-\beta_{1,t}$, $p_{l,t}=(1-\beta_{l,t})\prod_{r=1}^{l-1}\beta_{r,t}$, for $l=2,\dots,N-1$, and $p_{N,t}=\prod_{l=1}^{N-1}\beta_{l,t}$, ensuring $\sum_{l=1}^{N}p_{l,t}=1$. Since $\alpha$ is not a function of $t$, the same truncation level is applied for all mixing distributions. In choosing the truncation level, we use the expression relating $N$ to the expectation of the sum of the first $N$ weights $w_{1},\dots,w_{N}$ generated from stick-breaking of $\mathrm{beta}(1,\alpha)$ random variables. The expression is $\mathrm{E}(\sum_{j=1}^{N}w_{j}\mid\alpha)=1-(\alpha/(\alpha+1))^{N}$, which can be further averaged over the prior for $\alpha$ to obtain $\mathrm{E}(\sum_{j=1}^{N}w_{j})$, with $N$ chosen such that this expectation is close to 1 up to the desired level of tolerance for the approximation (see the sketch following the model specification below). The hierarchical model can be expressed as follows:

y_{t,i}=j \leftrightarrow \gamma_{j-1}<z_{t,i}\leq\gamma_{j},\quad t\in\boldsymbol{s}^{c},\ i=1,\dots,n_{t}
u_{t,i}=j \leftrightarrow \log(j)<w_{t,i}\leq\log(j+1),\quad t\in\boldsymbol{s}^{c},\ i=1,\dots,n_{t}
\{\boldsymbol{y}^{*}_{t,i}\}\mid\{\boldsymbol{\mu}_{l,t}\},\{\boldsymbol{\Sigma}_{l}\},\{L_{t,i}\}\sim\prod_{t\in\boldsymbol{s}^{c}}\prod_{i=1}^{n_{t}}\mathrm{N}(\boldsymbol{\mu}_{L_{t,i},t},\boldsymbol{\Sigma}_{L_{t,i}})
\{L_{t,i}\}\mid\{\eta_{l,t}\},\{\zeta_{l}\}\sim\prod_{t\in\boldsymbol{s}^{c}}\prod_{i=1}^{n_{t}}\sum_{l=1}^{N}p_{l,t}\delta_{l}(L_{t,i})
\zeta_{l}\stackrel{iid}{\sim}\mathrm{N}(0,1),\quad l=1,\dots,N-1
\eta_{l,1}\stackrel{iid}{\sim}\mathrm{N}(0,1),\quad l=1,\dots,N-1
\eta_{l,t}\mid\eta_{l,t-1},\phi\sim\mathrm{N}(\phi\eta_{l,t-1},1-\phi^{2}),\quad l=1,\dots,N-1,\ t=2,\dots,T
\boldsymbol{\mu}_{l,1}\mid\boldsymbol{m}_{0},\boldsymbol{V}_{0}\sim\mathrm{N}(\boldsymbol{m}_{0},\boldsymbol{V}_{0}),\quad l=1,\dots,N
\boldsymbol{\mu}_{l,t}\mid\boldsymbol{\mu}_{l,t-1},\Theta,\boldsymbol{m},\boldsymbol{V}\sim\mathrm{N}(\boldsymbol{m}+\Theta\boldsymbol{\mu}_{l,t-1},\boldsymbol{V}),\quad l=1,\dots,N,\ t=2,\dots,T
\boldsymbol{\Sigma}_{l}\mid\nu,\boldsymbol{D}\stackrel{iid}{\sim}\mathrm{IW}(\boldsymbol{\Sigma}_{l};\nu,\boldsymbol{D}),\quad l=1,\dots,N \qquad (5)

with priors on $\alpha$, $\boldsymbol{\psi}=(\boldsymbol{m},\boldsymbol{V},\boldsymbol{D})$, $\phi$, and $\Theta$. Recall that $\beta_{l,t}$ is defined through $(\eta_{l,t},\zeta_{l},\alpha)$, and the $\{\beta_{l,t}\}$ determine the $\{p_{l,t}\}$ through stick-breaking.
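
Returning to the choice of the truncation level $N$ referenced above: a minimal sketch that averages $\mathrm{E}(\sum_{j=1}^{N}w_{j}\mid\alpha)=1-(\alpha/(\alpha+1))^{N}$ over prior draws of $\alpha$ and selects the smallest $N$ meeting a tolerance. The inverse gamma hyperparameters used below (prior mean 2) and the tolerance are illustrative assumptions, not the values used in the paper.

```python
import numpy as np
from scipy.stats import invgamma

def choose_truncation(alpha_draws, tol=1e-3, max_N=200):
    """Smallest N such that E(sum of the first N stick-breaking weights) is within tol of 1,
    averaging 1 - (alpha/(alpha+1))**N over prior draws of alpha."""
    for N in range(1, max_N + 1):
        deficit = np.mean((alpha_draws / (alpha_draws + 1.0))**N)
        if deficit < tol:
            return N
    return max_N

# Illustrative IG prior draws for alpha with mean 2 (the paper's IG hyperparameters may differ)
alpha_draws = invgamma.rvs(3.0, scale=4.0, size=20_000, random_state=4)
print(choose_truncation(alpha_draws))
```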

The parameters $\{\zeta_{l}\}$ and $\{\eta_{l,t}\}$ can be updated individually with slice samplers, which involves alternating simulation from uniform random variables and truncated normal random variables, implying draws for $\{\beta_{l,t}\}$, and hence $\{p_{l,t}\}$. The configuration variables $L_{t,i}$ are drawn from discrete distributions on $\{1,\dots,N\}$, with probabilities proportional to $p_{l,t}\mathrm{N}(\boldsymbol{y}^{*}_{t,i};\boldsymbol{\mu}_{l,t},\boldsymbol{\Sigma}_{l})$, for $l=1,\dots,N$. The update for $\boldsymbol{\Sigma}_{l}$ is $\mathrm{IW}(\nu+M_{l},\boldsymbol{D}+\sum_{\{(t,i):L_{t,i}=l\}}(\boldsymbol{y}^{*}_{t,i}-\boldsymbol{\mu}_{l,t})(\boldsymbol{y}^{*}_{t,i}-\boldsymbol{\mu}_{l,t})^{T})$, where $M_{l}=|\{(t,i):L_{t,i}=l\}|$, and each $\boldsymbol{\mu}_{l,t}$ is updated from a normal distribution. The parameters $\alpha$ and $\phi$, given priors $\mathrm{IG}(a_{\alpha},b_{\alpha})$ and uniform on $(0,1)$ or $(-1,1)$, respectively, can be sampled using Metropolis-Hastings steps. We assume that $\Theta$ is diagonal, with elements $(\theta_{1},\theta_{2},\theta_{3})$; however, we advocate for a full covariance matrix $\boldsymbol{V}$. This implies that each element of $\boldsymbol{\mu}_{l,t}$ has a mean which depends only on the corresponding element of $\boldsymbol{\mu}_{l,t-1}$; however, there exists dependence in the elements of $\boldsymbol{\mu}_{l,t}$, which seems reasonable for most applications. We assume uniform priors on $(0,1)$ or $(-1,1)$ for each element of $\Theta$, and update them with a Metropolis-Hastings step. Finally, the parameters $\boldsymbol{\psi}$ have closed-form full conditional distributions, given priors $\boldsymbol{m}\sim\mathrm{N}(\boldsymbol{a_{m}},\boldsymbol{B_{m}})$, $\boldsymbol{V}\sim\mathrm{IW}(a_{\boldsymbol{V}},\boldsymbol{B_{V}})$, and $\boldsymbol{D}\sim\mathrm{W}(a_{\boldsymbol{D}},\boldsymbol{B_{D}})$. The full conditionals and posterior simulation details are further described in Appendix B.
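
As one concrete ingredient of this sampler, the sketch below implements the configuration-variable update just described, drawing each $L_{t,i}$ from a discrete distribution with probabilities proportional to $p_{l,t}\mathrm{N}(\boldsymbol{y}^{*}_{t,i};\boldsymbol{\mu}_{l,t},\boldsymbol{\Sigma}_{l})$; all inputs are placeholders standing in for quantities that would be drawn elsewhere in the MCMC.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(11)

def update_labels(y_star_t, p_t, mu_t, Sigmas):
    """Sample the configuration variables L_{t,i}, i = 1,...,n_t, for a single year t.

    y_star_t: (n_t, d) latent data vectors; p_t: (N,) weights at time t;
    mu_t: (N, d) component means at time t; Sigmas: (N, d, d) component covariances."""
    N = len(p_t)
    log_w = np.column_stack([np.log(p_t[l])
                             + multivariate_normal.logpdf(y_star_t, mean=mu_t[l], cov=Sigmas[l])
                             for l in range(N)])          # log{p_{l,t} N(y*_{t,i}; mu_{l,t}, Sigma_l)}
    log_w -= log_w.max(axis=1, keepdims=True)             # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)
    return np.array([rng.choice(N, p=w_i) for w_i in w])

# Tiny illustrative inputs with d = 3 (latent maturity, latent log-age, length) and N = 4 components
y_star_t = rng.normal(size=(5, 3))
p_t = np.full(4, 0.25)
mu_t = rng.normal(size=(4, 3))
Sigmas = np.stack([np.eye(3)] * 4)
print(update_labels(y_star_t, p_t, mu_t, Sigmas))
```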

To implement the model, we must specify the parameters of the hyperpriors on $\boldsymbol{\psi}$. A default specification strategy is developed by considering the limiting case of the model as $\alpha\rightarrow 0^{+}$ and $\Theta\rightarrow\boldsymbol{0}$, which results in a single normal distribution for $\boldsymbol{Y}^{*}_{t}$. In the limit, with $\boldsymbol{Y}^{*}_{t}\mid\boldsymbol{\mu}_{t},\boldsymbol{\Sigma}\sim\mathrm{N}(\boldsymbol{\mu}_{t},\boldsymbol{\Sigma})$ and $\boldsymbol{\mu}_{t}\mid\boldsymbol{m},\boldsymbol{V}\sim\mathrm{N}(\boldsymbol{m},\boldsymbol{V})$, we find $\mathrm{E}(\boldsymbol{Y}^{*}_{t})=\boldsymbol{a_{m}}$ and $\mathrm{Cov}(\boldsymbol{Y}^{*}_{t})=\boldsymbol{B_{m}}+\boldsymbol{B_{V}}(a_{\boldsymbol{V}}-d-1)^{-1}+a_{\boldsymbol{D}}\boldsymbol{B_{D}}(\nu-d-1)^{-1}$, where $d$ is the response-covariate dimension, here $d=3$. The only covariate information we require is an approximate center (such as the midpoint of the data) and range, denoted by $c^{x}$ and $r^{x}$ for $X$, and analogously for $U$. We use $c^{x}$ and $r^{x}/4$ as proxies for the marginal mean and standard deviation of $X$. We also seek to scale the latent variables appropriately. The centers and ranges $c^{u}$ and $r^{u}$ provide approximate centers $c^{w}$ and ranges $r^{w}$ of latent log-age $W$. Since $Y$ is supported on $\{1,\dots,C\}$, latent continuous $Z$ must be supported on values slightly below $\gamma_{1}$ up to slightly above $\gamma_{C-1}$, so that $r^{z}/4$ is a proxy for the standard deviation of $Z$, where $r^{z}=\gamma_{C-1}-\gamma_{1}$. Using these mean and variance proxies, we fix $a_{\boldsymbol{m}}=(0,c^{w},c^{x})$. Each of the three terms in $\mathrm{Cov}(\boldsymbol{Y}^{*}_{t})$ can be assigned an equal part of the total covariance, for instance being set to $3^{-1}\mathrm{diag}\{(r^{z}/4)^{2},(r^{w}/4)^{2},(r^{x}/4)^{2}\}$. For dispersed but proper priors, $\nu$, $a_{\boldsymbol{V}}$, and $a_{\boldsymbol{D}}$ can be fixed to small values such as $d+2$, and $\boldsymbol{B_{m}}$, $\boldsymbol{B_{V}}$, and $\boldsymbol{B_{D}}$ determined accordingly.
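
A sketch assembling these default hyperprior values from approximate centers and ranges is given below. The numerical inputs (length centered near 400 mm with range 300 mm, log-age centered near $\log(8)$, cut-offs $(-\infty,-1,1,\infty)$) are illustrative placeholders rather than the values used for the rockfish analysis.

```python
import numpy as np

def default_hyperpriors(c_x, r_x, c_w, r_w, cuts, d=3):
    """Default a_m, B_m, B_V, B_D (and degrees of freedom) from approximate centers and ranges."""
    r_z = cuts[-2] - cuts[1]                                   # r_z = gamma_{C-1} - gamma_1
    a_m = np.array([0.0, c_w, c_x])                            # centers for (Z, W, X)
    total_cov = np.diag([(r_z / 4)**2, (r_w / 4)**2, (r_x / 4)**2])
    nu = a_V = a_D = d + 2                                     # small values for dispersed, proper priors
    B_m = total_cov / 3                                        # each of the three terms gets a third
    B_V = (total_cov / 3) * (a_V - d - 1)                      # so B_V / (a_V - d - 1) = total_cov / 3
    B_D = (total_cov / 3) * (nu - d - 1) / a_D                 # so a_D B_D / (nu - d - 1) = total_cov / 3
    return dict(a_m=a_m, B_m=B_m, B_V=B_V, B_D=B_D, nu=nu, a_V=a_V, a_D=a_D)

# Illustrative inputs only; the actual data ranges would be used in practice
spec = default_hyperpriors(c_x=400.0, r_x=300.0, c_w=np.log(8.0), r_w=np.log(25.0),
                           cuts=np.array([-np.inf, -1.0, 1.0, np.inf]))
print(spec["a_m"], np.diag(spec["B_m"]).round(2))
```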

It remains to specify $\boldsymbol{m}_{0}$ and $\boldsymbol{V}_{0}$, the mean and covariance for the initial distributions $\boldsymbol{\mu}_{l,1}$. We propose a fairly conservative specification, noting that in the limit, $\mathrm{E}(\boldsymbol{Y}^{*}_{1})=\boldsymbol{m}_{0}$ and $\mathrm{Cov}(\boldsymbol{Y}^{*}_{1})=a_{\boldsymbol{D}}\boldsymbol{B_{D}}(\nu-d-1)^{-1}+\boldsymbol{V}_{0}$. Therefore, $\boldsymbol{m}_{0}$ can be specified in the same way as $\boldsymbol{a_{m}}$ but using only the subset of data at $t=1$, and $\boldsymbol{V}_{0}$ can be set to $\mathrm{diag}\{(r^{z}_{1}/4)^{2},(r^{w}_{1}/4)^{2},(r^{x}_{1}/4)^{2}\}-a_{\boldsymbol{D}}\boldsymbol{B_{D}}(\nu-d-1)^{-1}$, where the subscript 1 indicates the subset of data at $t=1$.

In simulation studies and the rockfish application, we observed a moderate to large amount of learning for all hyperparameters. For instance, for the rockfish data, the posterior distribution for $\phi$ was concentrated on values close to 1, indicating the DDP weights are strongly correlated across time. There was also moderate learning for $\alpha$, as its posterior distribution was concentrated around 0.5, with small variance, shifted down relative to the prior, which had expectation 2. The posterior distribution for each element of $\boldsymbol{m}$ was reduced in variance and concentrated on values not far from those indicated by the prior mean. The posterior samples for the covariance matrices $\boldsymbol{V}$ and $\boldsymbol{D}$ supported smaller variance components than suggested by the prior.

3.3 Results

Various simulation settings were developed to study both the common atoms version of the model and the more general version (DeYoreo, 2014, chapter 4). While we focus only on the fish maturity data application in this paper, our extensive simulation studies have revealed the inferential power of the model under different scenarios for the true latent response distribution and ordinal regression relationships.

We first discuss inference results for quantities not involving maturity. As illustrated with results for six years in Figure 3, the estimates for the density of length display a range of shapes. The interval estimates reflect the different sample sizes in these years; for instance, in 1994 there are 271 observations, whereas in 2000 and 2004 there are only 64 and 37 observations, respectively. A feature of our modeling approach is that inference for the density of age can be obtained over a continuous scale. The corresponding estimates are shown in Figure 4 for the same years as length.

Figure 3: Posterior mean and 95% interval estimates for the density of length (in millimeters) across six years, with the data shown as a histogram.
Figure 4: Posterior mean and 95% interval estimates for the density of age on a continuous scale across six years, with the data shown as a histogram.
Figure 5: Posterior mean estimates for the bivariate density of age and length across all years.
Figure 6: Posterior mean and 95% interval bands for the expected value of length over (continuous) age, $\mathrm{E}(X\mid U^{*}=u^{*};G_{t})$, across three years. Overlaid are the data (in blue) and the estimated von Bertalanffy growth curves (in red).

The posterior mean surfaces for the bivariate density of age and length are shown in Figure 5 for all years, including the ones (2003, 2005, and 2006) for which data are not available. The model yields smoother shapes for the density estimates in these years. An ellipse with a slight “banana” shape appears at each year, though some nonstandard features and differences across years are present. In particular, the density in year 2002 extends down farther to smaller ages and lengths; this year is unique in that it contains a very large proportion of the young fish which are present in the data. One can envision a curve going through the center of these densities, representing $\mathrm{E}(X\mid U^{*}=u^{*};G_{t})$, for which we show posterior mean and 95% interval bands for three years in Figure 6. The estimates from our model are compared with the von Bertalanffy growth curves for length-at-age, which are based on a particular function of age and three parameters (estimated here using nonlinear least squares). It is noteworthy that the nonparametric mixture model for the joint distribution of length and age yields estimated growth curves which are overall similar to the von Bertalanffy parametric model, with some local differences, especially in year 2002. The uncertainty quantification in the growth curves afforded by the nonparametric model is important, since the attainment of unique growth curves by group (i.e., by location or cohort) is often used to suggest that the groups differ in some way, and this type of analysis should clearly take into account the uncertainty in the estimated curves.

The last year 2007, in addition to containing few observations, is peculiar. There are no fish that are younger than age 6 in this year, and most of the age 6 and 7 fish are recorded as immature, even though in all years combined, less than 10% of age 6 as well as age 7 fish are immature. This year appears to be an anomaly. As there are no observations in 2005 or 2006, and a small number of observations in 2007 which seem to contradict the other years of data, hereinafter, we report inferences only up to 2004.

Inference for the maturation probability curves is shown over length and age in Figures 7 and 8. The probability that a fish is immature is generally decreasing over length, reaching a value near 0 at around 350 mm in most years. There is a large change in this probability over length in 2002 and 2004 as compared to other years, as these years suggest a probability close to 1 for very small fish near 200 to 250 mm. Turning to age, the probability of immaturity is also decreasing with age, also showing differences in 2002 and 2004 in comparison to other years. There is no clear indication of a general trend in the probabilities associated with levels 2 or 3. Years 1995-1997 and 1999 display similar behavior, with a peak in probability of post-spawning mature for moderate length values near 350 mm, and ages 6-7, favoring pre-spawning mature fish at other lengths and ages. The last four years, 2001-2004, suggest the probability of pre-spawning mature to be increasing with length up to a point and then leveling off, while post-spawning is favored most for large fish. Post-spawning appears to have a lower probability than pre-spawning mature for any age at all years, with the exception of 1998, for which the probability associated with post-spawning is very high for older fish.

Figure 7: Posterior mean (black lines) and 95% interval estimates (gray shaded regions) for the marginal ordinal probability curves associated with length. Category 1 (immature) given by solid line, category 2 (pre-spawning mature) given by dashed line, and category 3 (post-spawning mature) shown as a dotted line.
Figure 8: Posterior mean (black lines) and 95% interval estimates (gray shaded regions) for the marginal ordinal probability curves associated with age. Category 1 (immature) given by solid line, category 2 (pre-spawning mature) given by dashed line, and category 3 (post-spawning mature) shown as a dotted line.
Figure 9: Posterior mean and 90% intervals for the smallest value of age above 2 years at which probability of maturity first exceeds 90% (left), and similar inference for length (right).

The Pacific States Marine Fisheries Commission states that all Chilipepper rockfish are mature at around 4-5 years, and at size 304 to 330 mm. A stock assessment produced by the Pacific Fishery Management Council (Field, 2009) fitted a logistic regression to model maturity over length, from which it appears that 90% of fish are mature around 300-350 mm. As our model does not enforce monotonicity on the probability of maturity across age, we obtain posterior distributions for the first age greater than 2 (since biologically all fish under 2 should be immature) at which the probability of maturity exceeds 90%, given that it exceeds 90% at some point. That is, for each posterior sample we evaluate $\mathrm{Pr}(Y>1\mid u^{*};G_{t})$ over a grid in $u^{*}$ beginning at 2, and find the smallest value of $u^{*}$ at which this probability exceeds 0.9. Note that there were very few posterior samples for which this probability did not exceed 0.9 for any age (only 4 samples in 1993 and 8 in 2003). The estimates for age at 90% maturity are shown in the left panel of Figure 9. The model uncovers a (weak) U-shaped trend across years. Also noteworthy are the very narrow interval bands in 2002. Recall that this year contains an abnormally large number of young fish. In this year, over half of fish age 2 (that is, of age 2-3) are immature, and over 90% of age 3 (that is, of age 3-4) fish are mature, so we would expect the age at 90% maturity to be above 3 but less than 4, which our estimate confirms. A similar analysis is performed for length (right panel of Figure 9), suggesting a trend over time which is consistent with the age analysis.
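
A sketch of the grid-based computation for age at 90% maturity, applied to a single posterior sample, is given below. The curve standing in for $\mathrm{Pr}(Y>1\mid u^{*};G_{t})$ is a hypothetical placeholder for model output; posterior summaries follow from repeating the calculation over all posterior samples.

```python
import numpy as np

def age_at_maturity(prob_mature, threshold=0.9, lower=2.0, upper=25.0, n_grid=500):
    """Smallest age above `lower` at which prob_mature(u*) exceeds `threshold`; None if it never does."""
    grid = np.linspace(lower, upper, n_grid)
    hits = np.nonzero(prob_mature(grid) > threshold)[0]
    return grid[hits[0]] if hits.size else None

# Placeholder curve standing in for Pr(Y > 1 | u*; G_t) computed from one posterior sample
toy_curve = lambda u: 1.0 / (1.0 + np.exp(-2.0 * (u - 3.2)))
print(age_at_maturity(toy_curve))
# Posterior mean and interval estimates follow from repeating this over all posterior samples.
```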

Due to the monotonicity in the maturity probability curve in standard approaches, and the fact that age and length are treated as fixed, the point at which maturity exceeds a certain probability is a reasonable quantity to obtain in order to study the age or length at which most fish are mature. However, since we are modeling age and length, we can obtain their entire distribution at a given maturity level. These are inverse inferences, in which we study, for instance, $f(x\mid Y=1;G_{t})$ as opposed to $\mathrm{Pr}(Y=1\mid x;G_{t})$. It is most informative to look at age and length for immature fish, as this makes it clear at which age or length there is essentially no probability assigned to the immature category. The posterior mean estimates for $f(u^{*},x\mid Y=1;G_{t})$ are shown in Figure 10.

Figure 10: Posterior mean estimate for the bivariate density of age and length for immature fish across years. The data on age and length of immature fish is overlaid in each plot.

3.4 Model Checking

Here, we discuss results from posterior predictive model checking. In particular, we generated replicate data sets from the posterior predictive distribution, and compared to the real data using specific test quantities (Gelman et al., 2004). Using the MCMC output, we simulate replicate data sets $\{(y_{ti},w_{ti},x_{ti})^{rep}\}$ of the same size as the original data. We then choose some test quantity, $T(\{y_{ti},w_{ti},x_{ti}\})$, and for each replicate data set at time $t$, determine the value of the test quantity and compare the distribution of test quantities with the value computed from the actual data. To obtain Figure 11 we computed, for each replicate sample, the proportion of age 6 fish that were of maturity levels 1 and 2. Boxplots of these proportions are shown, with the actual proportions from the data indicated as blue points. The width of each box is proportional to the number of age 6 fish in that year. Figure 12 refers to fish of at least age 7 with length larger than 400 mm. Finally, Figure 13 plots the sample correlation of length and age for fish of maturity level 2. The results reported in Figures 11, 12, and 13 suggest that the model is predicting data which is very similar to the observed data in terms of practically important inferences.
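
A sketch of this posterior predictive check for one test quantity, the proportion of age-6 fish at maturity level 1, is shown below. The observed and replicate arrays are random placeholders standing in for the data and for draws from the posterior predictive distribution.

```python
import numpy as np

rng = np.random.default_rng(5)

def test_quantity(y, u, age=6, level=1):
    """Proportion of fish with recorded age `age` that are at maturity level `level`."""
    in_age = (u == age)
    return np.mean(y[in_age] == level) if in_age.any() else np.nan

def ppc_compare(y_obs, u_obs, y_reps, u_reps, **kw):
    """Observed test quantity, its replicate distribution, and a posterior predictive p-value."""
    t_obs = test_quantity(y_obs, u_obs, **kw)
    t_rep = np.array([test_quantity(y_r, u_r, **kw) for y_r, u_r in zip(y_reps, u_reps)])
    return t_obs, t_rep, np.mean(t_rep >= t_obs)

# Placeholder observed data and replicates for a single year (maturity in {1,2,3}, recorded age in {1,...,14})
y_obs, u_obs = rng.integers(1, 4, 200), rng.integers(1, 15, 200)
y_reps = [rng.integers(1, 4, 200) for _ in range(100)]
u_reps = [rng.integers(1, 15, 200) for _ in range(100)]
t_obs, t_rep, pval = ppc_compare(y_obs, u_obs, y_reps, u_reps)
print(round(t_obs, 3), round(float(np.mean(t_rep)), 3), round(float(pval), 3))
```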

Although results are not shown here, we also studied residuals with cross-validation, randomly selecting 20% of the observations in each year and refitting the model, leaving out these observations. We obtained residuals $\tilde{y}_{ti}-\mathrm{E}(Y\mid W=\tilde{w}_{ti},X=\tilde{x}_{ti};G_{t})$ for each observation $(\tilde{y}_{ti},\tilde{w}_{ti},\tilde{x}_{ti})$ which was left out. There was no apparent trend in the residuals across covariate values, that is, no indication that we are systematically under- or overestimating fish maturity at a particular length and/or age.

Figure 11: Distributions of the proportion of age 6 fish that were of maturity level 1 (left panel) and 2 (right panel) in the replicated data sets are shown as boxplots, with width proportional to the number of age 6 fish in each year. The actual proportion from the data is given as a blue circle.
Figure 12: Distributions of the proportion of fish of age 7 and above and length larger than 400 mm that were of maturity level 1 (left panel) and 2 (right panel) in the replicated data sets are shown as boxplots, with width proportional to the number of fish of this age and length in each year. The actual proportion from the data is given as a blue circle.
Figure 13: Distributions of the sample correlation between length and age for fish that were of maturity level 2 in the replicated data sets are shown as boxplots, with width proportional to the number of level 2 fish in each year. The sample correlation based on the data is given as a blue circle.

4 Discussion

The methods developed for dynamic ordinal regression are widely applicable to modeling mixed ordinal-continuous distributions indexed in discrete time. At any particular point in time, the DP mixture representation for the latent response-covariate distribution is retained, enabling flexible inference for a variety of functionals, and allowing standard posterior simulation techniques for DP mixture models to be utilized.

In contrast to standard approaches to ordinal regression, the model does not force specific trends, such as monotonicity, in the regression functions. We view this as an attribute in most settings. Nevertheless, in situations in which it is believed that monotonicity exists, we must realize that the data will determine the model output, and may not produce strictly monotonic relationships. In the fish maturity example, it is generally accepted that monotonicity exists in the relationship between maturity and age or length. Although our model does not enforce this, the inferences generally agree with what is expected to be true biologically. Our model is also extremely relevant to this setting, as the covariates age and length are treated as random, and the ordinal nature of recorded age is accounted for using variables which represent underlying continuous age. The set of inferences that are provided under this framework, including estimates for length as a function of age, make this modeling approach powerful for the particular application considered, as well as related problems.

While year of sampling was taken as the index of dependence in this analysis, an alternative is to index by cohort, where all fish born in the same year (equivalently, of the same age in a given year) form one cohort. Grouping fish by cohort rather than by year of record should lead to more homogeneity within a group; however, there are also potential issues, since fish will generally be younger as the cohort index increases. This is a consequence of data being collected over a fixed set of years, i.e., the cohort of fish born in 2006 cannot be older than 4 if data collection stopped in 2009. Due to complications such as these, combined with exploration of the relationships within each cohort, we decided to treat year of data collection as the index of dependence, but cohort indexing could be more appropriate in other analyses of similar data structures.

The proposed modeling approach could also be useful in applications in finance. One such example arises in the analysis of price changes of stocks. In the past, stocks traded on the New York Stock Exchange were priced in eighths and later in sixteenths, and corporate bonds still trade in eighths. In analyzing price changes of stocks which are traded in fractions, it is inappropriate to treat the measurements as continuous, particularly if the range of values is not very large (e.g., Müller and Czado, 2009). Price changes should instead be treated with a discrete response model, and the possible responses are ordered, ranging from a large negative return to a large positive return. One possible analysis would model monthly returns as a function of covariates such as trade volume, while accounting for the ordinal nature of the responses. In addition, the distribution of returns in a particular month is likely correlated with that of the previous month, so the regression relationships must be allowed to evolve from one month to the next. In finance as well as environmental science, empirical distributions may exhibit non-standard features which require more general methods, such as the nonparametric mixture model developed here.


References

  • Bobko and Berkeley (2004) Bobko, S. and Berkeley, S. (2004), “Maturity, ovarian cycle, fecundity, and age-specific parturition of black rockfish (Sebastes melanops),” Fishery Bulletin, 102, 418–429.
  • Boes and Winkelmann (2006) Boes, S. and Winkelmann, R. (2006), “Ordered Response Models,” Advances in Statistical Analysis, 90, 165–179.
  • Clark (1991) Clark, W. (1991), “Groundfish exploitation rates based on life history parameters,” Canadian Journal of Fisheries and Aquatic Sciences, 48, 734–750.
  • De Iorio et al. (2004) De Iorio, M., Müller, P., Rosner, G., and MacEachern, S. (2004), “An ANOVA Model for Dependent Random Measures,” Journal of the American Statistical Association, 99, 205–215.
  • DeYoreo (2014) DeYoreo, M. (2014), “A Bayesian Framework for Fully Nonparametric Ordinal Regression,” Ph.D. thesis, University of California, Santa Cruz.
  • DeYoreo and Kottas (2014a) DeYoreo, M. and Kottas, A. (2014a), “Bayesian Nonparametric Modeling for Multivariate Ordinal Regression,” arXiv:1408.1027, stat.ME.
  • DeYoreo and Kottas (2014b) — (2014b), “A Fully Nonparametric Modeling Approach to Binary Regression,” arXiv:1404.5097, stat.ME.
  • Di Lucca et al. (2013) Di Lucca, M., Guglielmi, A., Müller, P., and Quintana, F. (2013), “A Simple Class of Bayesian Nonparametric Autoregressive Models,” Bayesian Analysis, 8, 63–88.
  • Dunson and Park (2008) Dunson, D. and Park, J. (2008), “Kernel Stick-breaking Processes,” Biometrika, 95, 307–323.
  • Field (2009) Field, J. (2009), “Status of the Chilipepper rockfish, Sebastes goodei, in 2007,” Tech. rep., Southwest Fisheries Science Center.
  • Gelfand et al. (2005) Gelfand, A., Kottas, A., and MacEachern, S. (2005), “Bayesian Nonparametric Spatial Modeling with Dirichlet Process Mixing,” Journal of the American Statistical Association, 100, 1021–1035.
  • Gelman et al. (2004) Gelman, A., Carlin, J., Stern, H., and Rubin, D. (2004), Bayesian Data Analysis, Chapman and Hall/CRC.
  • Griffin and Steel (2006) Griffin, J. and Steel, M. (2006), “Order-based Dependent Dirichlet Processes,” Journal of the American Statistical Association, 101, 179–194.
  • Hannah et al. (2009) Hannah, R., Blume, M., and Thompson, J. (2009), “Length and age at maturity of female yelloweye rockfish (Sebastes ruberrimus) and cabezon (Scorpaenichthys marmoratus) from Oregon waters based on histological evaluation of maturity,” Tech. rep., Oregon Department of Fish and Wildlife.
  • Ishwaran and James (2001) Ishwaran, H. and James, L. (2001), “Gibbs Sampling Methods for Stick-breaking Priors,” Journal of the American Statistical Association, 96, 161–173.
  • MacEachern (1999) MacEachern, S. (1999), “Dependent Nonparametric Processes,” in ASA Proceedings of the Section on Bayesian Statistical Science, pp. 50–55.
  • MacEachern (2000) — (2000), “Dependent Dirichlet processes,” Tech. rep., The Ohio State University Department of Statistics.
  • McKenzie (1985) McKenzie, E. (1985), “An Autoregressive Process for Beta Random Variables,” Management Science, 31, 988–997.
  • Morgan and Hoenig (1997) Morgan, M. and Hoenig, J. (1997), “Estimating Maturity-at-Age from Length Stratified Sampling,” Journal of Northwest Atlantic Fisheries Science, 21, 51–63.
  • Müller and Czado (2009) Müller, G. and Czado, C. (2009), “Stochastic Volatility Models for Ordinal Valued Time Series with Application to Finance,” Statistical Modelling, 9, 69–95.
  • Müller et al. (1996) Müller, P., Erkanli, A., and West, M. (1996), “Bayesian Curve Fitting Using Multivariate Normal Mixtures,” Biometrika, 83, 67–79.
  • Nieto-Barajas et al. (2012) Nieto-Barajas, L., Müller, P., Ji, Y., Lu, Y., and Mills, G. (2012), “A Time-Series DDP for Functional Proteomics Profiles,” Biometrics, 68, 859–868.
  • Rodriguez and Dunson (2011) Rodriguez, A. and Dunson, D. (2011), “Nonparametric Bayesian models through probit stick-breaking processes,” Bayesian Analysis, 6, 145–177.
  • Rodriguez and ter Horst (2008) Rodriguez, A. and ter Horst, E. (2008), “Bayesian Dynamic Density Estimation,” Bayesian Analysis, 3, 339–366.
  • Rubin (1976) Rubin, D. (1976), “Inference and Missing Data,” Biometrika, 63, 581–592.
  • Sethuraman (1994) Sethuraman, J. (1994), “A Constructive Definition of Dirichlet Priors,” Statistica Sinica, 4, 639–650.
  • Taddy (2010) Taddy, M. (2010), “Autoregressive Mixture Models for Dynamic Spatial Poisson Processes: Application to Tracking the Intensity of Violent Crime,” Journal of the American Statistical Association, 105, 1403–1417.
  • Xiao et al. (2015) Xiao, S., Kottas, A., and Sansó, B. (2015), “Modeling for seasonal marked point processes: An analysis of evolving hurricane occurrences,” The Annals of Applied Statistics, 9, 353–382.

Appendix A Properties of the DDP Prior Model

Here, we provide derivations of the various correlations associated with the DDP prior, as given in Section 2.3.

Autocorrelation of $(\beta_{t},\beta_{t+k})$

Since the process is stationary with $\beta_{t}\sim\text{beta}(\alpha,1)$ at any time $t$, $\text{E}(\beta_{t}\mid\alpha)=\alpha/(\alpha+1)$ and $\text{var}(\beta_{t}\mid\alpha)=\alpha/\{(\alpha+1)^{2}(\alpha+2)\}$. We also have

\text{E}(\beta_{t}\beta_{t+k}\mid\alpha,\phi)=\text{E}\left\{\exp(-\zeta^{2}/\alpha)\right\}\text{E}\left\{\exp(-(\eta_{t}^{2}+\eta_{t+k}^{2})/2\alpha)\right\} \qquad (A.1)

using the definition of the $\mathcal{B}$ process in (2). The first expectation can be obtained through the moment generating function of $\zeta^{2}\sim\chi_{1}^{2}$, which is given by $\text{E}(e^{t\zeta^{2}})=(1-2t)^{-1/2}$, for $t<1/2$. Hence, for $t=-1/\alpha$, we obtain $\text{E}\left\{\exp(-\zeta^{2}/\alpha)\right\}=\alpha^{1/2}/(2+\alpha)^{1/2}$. Regarding the second expectation, note that $(\eta_{t},\eta_{t+k})\sim\mathrm{N}(0,C_{k})$, with $C_{k}$ a covariance matrix with diagonal elements equal to 1 and off-diagonal elements equal to $\rho_{k}$. Integration results in $\text{E}\left\{\exp(-(\eta_{t}^{2}+\eta_{t+k}^{2})/2\alpha)\right\}=\alpha(1-\rho_{k}^{2})^{1/2}/\{(1-\rho_{k}^{2}+\alpha)^{2}-\alpha^{2}\rho_{k}^{2}\}^{1/2}$. The correlation in (3) results by combining the terms above with the expressions for $\text{E}(\beta_{t}\mid\alpha)$ and $\text{var}(\beta_{t}\mid\alpha)$.
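As a sanity check on these closed forms, the following is a small Monte Carlo sketch (Python; the function name and settings are illustrative) that simulates $\zeta\sim\mathrm{N}(0,1)$ and $(\eta_{t},\eta_{t+k})\sim\mathrm{N}(0,C_{k})$ and compares the empirical means of $\exp(-\zeta^{2}/\alpha)$ and $\exp(-(\eta_{t}^{2}+\eta_{t+k}^{2})/2\alpha)$ with the expressions above.

```python
import numpy as np

def check_A1_terms(alpha=2.0, rho_k=0.6, n=2_000_000, seed=0):
    rng = np.random.default_rng(seed)

    # First term: E{exp(-zeta^2/alpha)} with zeta^2 ~ chi^2_1.
    zeta = rng.standard_normal(n)
    mc1 = np.mean(np.exp(-zeta**2 / alpha))
    ex1 = np.sqrt(alpha / (2.0 + alpha))

    # Second term: (eta_t, eta_{t+k}) ~ N(0, C_k), unit variances, correlation rho_k.
    C = np.array([[1.0, rho_k], [rho_k, 1.0]])
    eta = rng.multivariate_normal([0.0, 0.0], C, size=n)
    mc2 = np.mean(np.exp(-(eta[:, 0]**2 + eta[:, 1]**2) / (2.0 * alpha)))
    ex2 = alpha * np.sqrt(1.0 - rho_k**2) / np.sqrt(
        (1.0 - rho_k**2 + alpha)**2 - alpha**2 * rho_k**2)

    print(f"chi-square term:  MC {mc1:.4f}  vs  exact {ex1:.4f}")
    print(f"bivariate term:   MC {mc2:.4f}  vs  exact {ex2:.4f}")

check_A1_terms()
```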

Autocorrelation of DP weights

First, $\text{E}(p_{l,t}\mid\alpha)=\text{E}\{(1-\beta_{l,t})\prod_{r=1}^{l-1}\beta_{r,t}\mid\alpha\}$. Since the $\beta_{l,t}$ are independent across $l$, and $\text{E}(\beta_{l,t}\mid\alpha)=\alpha/(\alpha+1)$, we obtain $\text{E}(p_{l,t}\mid\alpha)=\alpha^{l-1}/(1+\alpha)^{l}$. Similarly, $\text{E}(p_{l,t}^{2}\mid\alpha)=\text{E}\{(1-\beta_{l,t})^{2}\mid\alpha\}\prod_{r=1}^{l-1}\text{E}(\beta_{r,t}^{2}\mid\alpha)=2\alpha^{l-1}/\{(\alpha+1)(\alpha+2)^{l}\}$, from which $\text{var}(p_{l,t}\mid\alpha)$ obtains.

Since $p_{l,t}p_{l,t+1}=(1-\beta_{l,t})(1-\beta_{l,t+1})\prod_{r=1}^{l-1}\beta_{r,t}\beta_{r,t+1}$, and $(\beta_{l,t},\beta_{l,t+1})$ is independent of $(\beta_{m,t},\beta_{m,t+1})$, for any $l\neq m$, we can write

\text{E}(p_{l,t}p_{l,t+1}\mid\alpha,\phi)=\text{E}\{(1-\beta_{l,t})(1-\beta_{l,t+1})\mid\alpha,\phi\}\prod_{r=1}^{l-1}\text{E}(\beta_{r,t}\beta_{r,t+1}\mid\alpha,\phi).

The required expectations in the above equation can be obtained from (A.1) for $k=1$, such that $\rho_{1}=\phi$. Combining the above expressions yields the covariance in (4).
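To illustrate these moment formulas, a brief simulation sketch (Python; names and settings are illustrative) builds the $\beta_{r,t}=\exp\{-(\zeta_{r}^{2}+\eta_{r,t}^{2})/(2\alpha)\}$ implied by the construction above, forms the stick-breaking weight at a single time point, and checks the marginal mean $\text{E}(p_{l,t}\mid\alpha)=\alpha^{l-1}/(1+\alpha)^{l}$.

```python
import numpy as np

def check_weight_mean(alpha=1.5, l=3, n=1_000_000, seed=1):
    """Monte Carlo check of E(p_{l,t} | alpha) = alpha^{l-1} / (1+alpha)^l,
    with beta_{r,t} = exp(-(zeta_r^2 + eta_{r,t}^2) / (2*alpha))."""
    rng = np.random.default_rng(seed)
    zeta = rng.standard_normal((n, l))        # zeta_r, r = 1,...,l
    eta = rng.standard_normal((n, l))         # eta_{r,t} at a single time t
    beta = np.exp(-(zeta**2 + eta**2) / (2.0 * alpha))   # marginally beta(alpha, 1)
    # p_{l,t} = (1 - beta_{l,t}) * prod_{r<l} beta_{r,t}
    p_l = (1.0 - beta[:, -1]) * np.prod(beta[:, :-1], axis=1)
    mc = p_l.mean()
    exact = alpha**(l - 1) / (1.0 + alpha)**l
    print(f"E(p_l):  MC {mc:.5f}  vs  exact {exact:.5f}")

check_weight_mean()
```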

Autocorrelation of consecutive distributions

The DDP prior model implies at any $t$ a $\text{DP}(\alpha,G_{0})$ prior for $G_{t}=\sum_{l=1}^{\infty}p_{l,t}\delta_{\boldsymbol{\theta}_{l}}$, where the $\boldsymbol{\theta}_{l}$ are i.i.d. from a distribution $G_{0}$ on $\mathbb{R}^{M}$. Hence, for any measurable subset $A\subset\mathbb{R}^{M}$, we have $\text{E}(G_{t}(A)\mid\alpha,G_{0})=\text{E}(G_{t+1}(A)\mid\alpha,G_{0})=G_{0}(A)$, and $\text{var}(G_{t}(A)\mid\alpha,G_{0})=\text{var}(G_{t+1}(A)\mid\alpha,G_{0})=G_{0}(A)(1-G_{0}(A))/(\alpha+1)$.

The additional expectation needed in order to obtain $\text{corr}(G_{t}(A),G_{t+1}(A)\mid\phi,\alpha,G_{0})$ is $\text{E}(G_{t}(A)G_{t+1}(A)\mid\phi,\alpha,G_{0})=\text{E}\left\{(\sum_{l=1}^{\infty}p_{l,t}\delta_{\boldsymbol{\theta}_{l}}(A))(\sum_{m=1}^{\infty}p_{m,t+1}\delta_{\boldsymbol{\theta}_{m}}(A))\mid\phi,\alpha,G_{0}\right\}$, which can be written as

\text{E}\left(\sum\nolimits_{l=1}^{\infty}p_{l,t}p_{l,t+1}(\delta_{\boldsymbol{\theta}_{l}}(A))^{2}\right)+\text{E}\left(\sum\nolimits_{l=1}^{\infty}\sum\nolimits_{m\neq l}p_{l,t}p_{m,t+1}\delta_{\boldsymbol{\theta}_{l}}(A)\delta_{\boldsymbol{\theta}_{m}}(A)\right). \qquad (A.2)

For the first expectation in (A.2), note that $p_{l,t}p_{l,t+1}$ is independent of $(\delta_{\boldsymbol{\theta}_{l}}(A))^{2}$, and $\text{E}(p_{l,t}p_{l,t+1}\mid\alpha,\phi)$ has been derived earlier. Moreover, since $(\delta_{\boldsymbol{\theta}_{l}}(A))^{2}$ is equal to $1$ if $\boldsymbol{\theta}_{l}\in A$ and $0$ otherwise, $\text{E}((\delta_{\boldsymbol{\theta}_{l}}(A))^{2}\mid\alpha,G_{0})=G_{0}(A)$. Regarding the second expectation in (A.2), we have again that $p_{l,t}p_{m,t+1}$ and $\delta_{\boldsymbol{\theta}_{l}}(A)\delta_{\boldsymbol{\theta}_{m}}(A)$ are independent. Here, $\text{E}(\delta_{\boldsymbol{\theta}_{l}}(A)\delta_{\boldsymbol{\theta}_{m}}(A)\mid\alpha,G_{0})=(G_{0}(A))^{2}$, since $\boldsymbol{\theta}_{l}$ and $\boldsymbol{\theta}_{m}$ are independent for $m\neq l$. The final step of the derivation involves the expectations $\text{E}(p_{l,t}p_{m,t+1}\mid\alpha,\phi)$, for $m\neq l$. Note that, if $l<m$, $p_{l,t}p_{m,t+1}=(\prod_{r=1}^{l-1}\beta_{r,t}\beta_{r,t+1})\,\beta_{l,t+1}(1-\beta_{l,t})\,(\prod_{r=l+1}^{m-1}\beta_{r,t+1})\,(1-\beta_{m,t+1})$, and an analogous expression can be written when $m<l$. Therefore, for general $m\neq l$, $p_{l,t}p_{m,t+1}$ can be expressed as a product of $\max\{l,m\}$ independent components, since each component comprises one or two of the random variables in $\{\beta_{l,t}\}$ and $\{\beta_{m,t+1}\}$, in the latter case having the same first subscript. Thus, $\text{E}(p_{l,t}p_{m,t+1}\mid\alpha,\phi)$ can be developed through products of expectations of the form $\text{E}(\beta_{l,t}\mid\alpha)$ and $\text{E}(\beta_{l,t}\beta_{l,t+1}\mid\alpha,\phi)$, which have been obtained earlier.

Appendix B Posterior Simulation Details

We derive the posterior full conditionals and provide updating strategies for many of the parameters of the hierarchical model in Section 3.2.

Updating the weights

The full conditional for $(\{\zeta_{l}\},\{\eta_{l,t}\})$ is given by $p(\{\zeta_{l}\},\{\eta_{l,t}\}\mid\dots,\text{data})\propto$

\prod_{l=1}^{N-1}\text{N}(\zeta_{l};0,1)\text{N}(\eta_{l,1};0,1)\prod_{t=2}^{T}\prod_{l=1}^{N-1}\text{N}(\eta_{l,t};\phi\eta_{l,t-1},1-\phi^{2})\prod_{t=1}^{T}\prod_{i=1}^{n_{t}}\sum_{l=1}^{N}p_{l,t}\delta_{l}(L_{t,i}).

Write $\prod_{i=1}^{n_{t}}\sum_{l=1}^{N}p_{l,t}\delta_{l}(L_{t,i})=\prod_{l=1}^{N}p_{l,t}^{M_{l,t}}$, where $M_{l,t}=|\{i:L_{t,i}=l\}|$, i.e., the number of observations at time $t$ assigned to component $l$. Filling in the form for $\{p_{l,t}\}$ gives

\prod_{i=1}^{n_{t}}\sum_{l=1}^{N}p_{l,t}\delta_{l}(L_{t,i})=\left(1-\exp\left(-\frac{\zeta_{1}^{2}+\eta_{1,t}^{2}}{2\alpha}\right)\right)^{M_{1,t}}\exp\left(-\frac{M_{N,t}\sum_{l=1}^{N-1}(\zeta_{l}^{2}+\eta_{l,t}^{2})}{2\alpha}\right)\prod_{l=2}^{N-1}\left\{\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,t}^{2}}{2\alpha}\right)\right)^{M_{l,t}}\exp\left(-\frac{M_{l,t}\sum_{r=1}^{l-1}(\zeta_{r}^{2}+\eta_{r,t}^{2})}{2\alpha}\right)\right\}.
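For reference, the implied time-$t$ weights can be computed directly from the latent variables; the following small Python helper (the function name is ours, and the truncation level $N$ is taken from the length of the inputs) implements the stick-breaking construction $\beta_{l,t}=\exp\{-(\zeta_{l}^{2}+\eta_{l,t}^{2})/(2\alpha)\}$, $p_{1,t}=1-\beta_{1,t}$, $p_{l,t}=(1-\beta_{l,t})\prod_{r<l}\beta_{r,t}$, and $p_{N,t}=\prod_{r=1}^{N-1}\beta_{r,t}$.

```python
import numpy as np

def dp_weights(zeta, eta_t, alpha):
    """Time-t stick-breaking weights p_{1,t},...,p_{N,t} from the latent
    zeta_1,...,zeta_{N-1} and eta_{1,t},...,eta_{N-1,t}."""
    beta = np.exp(-(np.asarray(zeta)**2 + np.asarray(eta_t)**2) / (2.0 * alpha))
    cum = np.concatenate(([1.0], np.cumprod(beta)))   # prod_{r<l} beta_{r,t}
    p = np.empty(len(beta) + 1)
    p[:-1] = (1.0 - beta) * cum[:-1]                  # l = 1,...,N-1
    p[-1] = cum[-1]                                   # l = N
    return p                                          # sums to 1

# example: weights for N = 4 components at one time point
print(dp_weights(zeta=[0.3, -1.1, 0.7], eta_t=[0.5, 0.2, -0.4], alpha=1.0))
```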

The full conditional for each $\zeta_{l}$, $l=1,\dots,N-1$, is therefore

p(\zeta_{l}\mid\dots,\text{data})\propto\exp\left(-\frac{\zeta_{l}^{2}}{2}\right)\exp\left(\frac{-\zeta_{l}^{2}\sum_{t=1}^{T}\sum_{r=l+1}^{N}M_{r,t}}{2\alpha}\right)\prod_{t=1}^{T}\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,t}^{2}}{2\alpha}\right)\right)^{M_{l,t}}

giving

p(\zeta_{l}\mid\dots,\text{data})\propto\text{N}\left(\zeta_{l};0,\left(1+\alpha^{-1}\sum_{t=1}^{T}\sum_{r=l+1}^{N}M_{r,t}\right)^{-1}\right)\prod_{t=1}^{T}\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,t}^{2}}{2\alpha}\right)\right)^{M_{l,t}}.

We use a slice sampler to update $\zeta_{l}$, with the following steps:

  • Draw $u_{t}\sim\text{uniform}\left(0,\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,t}^{2}}{2\alpha}\right)\right)^{M_{l,t}}\right)$, for $t=1,\dots,T$.

  • Draw $\zeta_{l}\sim\text{N}\left(0,\left(1+\alpha^{-1}\sum_{t=1}^{T}\sum_{r=l+1}^{N}M_{r,t}\right)^{-1}\right)$, restricted to lie in the set $\left\{\zeta_{l}:u_{t}<\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,t}^{2}}{2\alpha}\right)\right)^{M_{l,t}},\,t=1,\dots,T\right\}$. Solving for $\zeta_{l}$ in each of these $T$ inequalities gives $\zeta_{l}^{2}>-\eta_{l,t}^{2}-2\alpha\log(1-u_{t}^{1/M_{l,t}})$, for $t=1,\dots,T$. Therefore, if $-\eta_{l,t}^{2}-2\alpha\log(1-u_{t}^{1/M_{l,t}})<0$ for all $t$, then $\zeta_{l}$ is unrestricted and is sampled from a normal distribution. Otherwise, if $-\eta_{l,t}^{2}-2\alpha\log(1-u_{t}^{1/M_{l,t}})>0$ for some $t$, then $|\zeta_{l}|>\max_{t}\{(-\eta_{l,t}^{2}-2\alpha\log(1-u_{t}^{1/M_{l,t}}))^{1/2}\}$. This requires sampling $\zeta_{l}$ from a normal distribution restricted to the intervals $(-\infty,-\max_{t}\{(-\eta_{l,t}^{2}-2\alpha\log(1-u_{t}^{1/M_{l,t}}))^{1/2}\})$ and $(\max_{t}\{(-\eta_{l,t}^{2}-2\alpha\log(1-u_{t}^{1/M_{l,t}}))^{1/2}\},\infty)$.

In the second step above, we may have to sample from a normal distribution restricted to two disjoint intervals. The resulting distribution is therefore a mixture of two truncated normals, with probabilities determined by the (normalized) probability the unrestricted normal assigns to each interval. These truncated normals both have mean 0 and variance $(1+\sum_{t=1}^{T}\sum_{r=l+1}^{N}M_{r,t}/\alpha)^{-1}$, and, by symmetry, each mixture component has equal probability.
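The following is a minimal Python sketch (function and argument names are ours) of this slice-sampling update for a single $\zeta_{l}$, given the current $\{\eta_{l,t}\}$, the counts $M_{l,t}$ and $\sum_{t}\sum_{r>l}M_{r,t}$, and $\alpha$; it draws the auxiliary uniforms and then samples from the (possibly two-sided truncated) normal described above.

```python
import numpy as np
from scipy.stats import truncnorm

def update_zeta_l(zeta_l, eta_l, M_l, M_above, alpha, rng):
    """One slice-sampling update of zeta_l.
    eta_l:   array of eta_{l,t}, t = 1,...,T
    M_l:     array of M_{l,t}, t = 1,...,T
    M_above: sum_t sum_{r>l} M_{r,t}"""
    sd = np.sqrt(1.0 / (1.0 + M_above / alpha))
    # Step 1: auxiliary uniforms (only years with M_{l,t} > 0 constrain zeta_l).
    pos = M_l > 0
    slice_bd = (1.0 - np.exp(-(zeta_l**2 + eta_l[pos]**2) / (2.0 * alpha)))**M_l[pos]
    u = rng.uniform(0.0, slice_bd)
    # Step 2: thresholds zeta_l^2 > -eta_{l,t}^2 - 2*alpha*log(1 - u_t^{1/M_{l,t}}).
    thr = -eta_l[pos]**2 - 2.0 * alpha * np.log(1.0 - u**(1.0 / M_l[pos]))
    c = thr.max() if thr.size > 0 else -np.inf
    if c <= 0.0:
        return rng.normal(0.0, sd)                   # unrestricted draw
    a = np.sqrt(c) / sd                              # standardized truncation point
    draw = truncnorm.rvs(a, np.inf, loc=0.0, scale=sd, random_state=rng)
    sign = 1.0 if rng.uniform() < 0.5 else -1.0      # equal-weight mixture (symmetry)
    return sign * draw
```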

The full conditional for each $\eta_{l,t}$, $l=1,\dots,N-1$, $t=2,\dots,T-1$, is proportional to

\text{N}\left(\eta_{l,t};0,\frac{\alpha}{\sum_{r=l+1}^{N}M_{r,t}}\right)\text{N}(\eta_{l,t};\phi\eta_{l,t-1},1-\phi^{2})\text{N}(\eta_{l,t+1};\phi\eta_{l,t},1-\phi^{2})\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,t}^{2}}{2\alpha}\right)\right)^{M_{l,t}}

\propto\text{N}\left(\eta_{l,t};\frac{\phi\alpha(\eta_{l,t-1}+\eta_{l,t+1})}{\phi^{2}(\alpha-\sum_{r=l+1}^{N}M_{r,t})+\alpha+\sum_{r=l+1}^{N}M_{r,t}},\frac{\alpha(1-\phi^{2})}{\phi^{2}(\alpha-\sum_{r=l+1}^{N}M_{r,t})+\alpha+\sum_{r=l+1}^{N}M_{r,t}}\right)\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,t}^{2}}{2\alpha}\right)\right)^{M_{l,t}}

Each $\eta_{l,t}$, $l=1,\dots,N-1$, and $t=2,\dots,T-1$, can therefore be sampled with a slice sampler:

  • Draw $u\sim\text{uniform}\left(0,\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,t}^{2}}{2\alpha}\right)\right)^{M_{l,t}}\right)$.

  • Draw $\eta_{l,t}\sim\text{N}\left(\frac{\phi\alpha(\eta_{l,t-1}+\eta_{l,t+1})}{\phi^{2}(\alpha-\sum_{r=l+1}^{N}M_{r,t})+\alpha+\sum_{r=l+1}^{N}M_{r,t}},\frac{\alpha(1-\phi^{2})}{\phi^{2}(\alpha-\sum_{r=l+1}^{N}M_{r,t})+\alpha+\sum_{r=l+1}^{N}M_{r,t}}\right)$, restricted to $\left\{\eta_{l,t}:\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,t}^{2}}{2\alpha}\right)\right)^{M_{l,t}}>u\right\}$, which gives $\eta_{l,t}^{2}>-2\alpha\log(1-u^{1/M_{l,t}})-\zeta_{l}^{2}$.

In the second step above, we will again either sample from a single normal or from a mixture of two truncated normals, where each component has the same mean and variance but the truncation intervals differ. Since the mean of this normal is not zero, the weights assigned to the two truncated normals are not equal. The unnormalized weight assigned to the component which places positive probability on $((-2\alpha\log(1-u^{1/M_{l,t}})-\zeta_{l}^{2})^{1/2},\infty)$ is given by $1-F((-2\alpha\log(1-u^{1/M_{l,t}})-\zeta_{l}^{2})^{1/2})$, where $F$ is the CDF of the normal for $\eta_{l,t}$ given in the second step. The unnormalized weight given to the component which places positive probability on $(-\infty,-(-2\alpha\log(1-u^{1/M_{l,t}})-\zeta_{l}^{2})^{1/2})$ is given by $F(-(-2\alpha\log(1-u^{1/M_{l,t}})-\zeta_{l}^{2})^{1/2})$.
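A small Python helper (a sketch under our own naming; `mean_`, `var_`, and `c` denote the conditional mean, the conditional variance, and the threshold $-2\alpha\log(1-u^{1/M_{l,t}})-\zeta_{l}^{2}$ above) illustrates drawing from this two-component mixture of truncated normals with the unequal weights just described.

```python
import numpy as np
from scipy.stats import norm, truncnorm

def draw_eta(mean_, var_, c, rng):
    """Sample eta_{l,t} from N(mean_, var_) restricted to |eta| > sqrt(c);
    if c <= 0 the draw is unrestricted."""
    sd = np.sqrt(var_)
    if c <= 0.0:
        return rng.normal(mean_, sd)
    b = np.sqrt(c)
    w_plus = 1.0 - norm.cdf((b - mean_) / sd)     # weight on (sqrt(c), inf)
    w_minus = norm.cdf((-b - mean_) / sd)         # weight on (-inf, -sqrt(c))
    if rng.uniform() < w_plus / (w_plus + w_minus):
        a_std, b_std = (b - mean_) / sd, np.inf
    else:
        a_std, b_std = -np.inf, (-b - mean_) / sd
    return truncnorm.rvs(a_std, b_std, loc=mean_, scale=sd, random_state=rng)

# example draw
rng = np.random.default_rng(0)
print(draw_eta(mean_=0.4, var_=0.3, c=0.5, rng=rng))
```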

The full conditionals for $\eta_{l,1}$ and $\eta_{l,T}$ are slightly different. The full conditional for $\eta_{l,1}$ is

p(\eta_{l,1}\mid\dots,\text{data})\propto\text{N}\left(\eta_{l,1};0,\frac{\alpha}{\sum_{r=l+1}^{N}M_{r,1}}\right)\text{N}(\eta_{l,1};0,1)\text{N}(\eta_{l,2};\phi\eta_{l,1},1-\phi^{2})\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,1}^{2}}{2\alpha}\right)\right)^{M_{l,1}}

\propto\text{N}\left(\eta_{l,1};\frac{\phi\alpha\eta_{l,2}}{\alpha+\sum_{r=l+1}^{N}M_{r,1}-\phi^{2}\sum_{r=l+1}^{N}M_{r,1}},\frac{\alpha(1-\phi^{2})}{\alpha+\sum_{r=l+1}^{N}M_{r,1}-\phi^{2}\sum_{r=l+1}^{N}M_{r,1}}\right)\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,1}^{2}}{2\alpha}\right)\right)^{M_{l,1}}.

For $\eta_{l,T}$, we have

p(\eta_{l,T}\mid\dots,\text{data})\propto\text{N}\left(\eta_{l,T};0,\frac{\alpha}{\sum_{r=l+1}^{N}M_{r,T}}\right)\text{N}(\eta_{l,T};\phi\eta_{l,T-1},1-\phi^{2})\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,T}^{2}}{2\alpha}\right)\right)^{M_{l,T}},

which is proportional to

\text{N}\left(\eta_{l,T};\frac{\phi\alpha\eta_{l,T-1}}{\alpha+\sum_{r=l+1}^{N}M_{r,T}-\phi^{2}\sum_{r=l+1}^{N}M_{r,T}},\frac{\alpha(1-\phi^{2})}{\alpha+\sum_{r=l+1}^{N}M_{r,T}-\phi^{2}\sum_{r=l+1}^{N}M_{r,T}}\right)\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,T}^{2}}{2\alpha}\right)\right)^{M_{l,T}}.

The slice samplers for $\eta_{l,1}$ and $\eta_{l,T}$ are therefore implemented in the same way as for the other $\eta_{l,t}$, except that the normal distributions being sampled have different means and variances.

Updating $\alpha$

The full conditional for $\alpha$ is

p(\alpha\mid\dots,\text{data})\propto p(\alpha)\exp\left(-\frac{\sum_{t=1}^{T}M_{N,t}\sum_{l=1}^{N-1}(\zeta_{l}^{2}+\eta_{l,t}^{2})}{2\alpha}\right)\exp\left(-\frac{\sum_{t=1}^{T}\sum_{l=2}^{N-1}M_{l,t}\sum_{r=1}^{l-1}(\zeta_{r}^{2}+\eta_{r,t}^{2})}{2\alpha}\right)\prod_{t=1}^{T}\prod_{l=1}^{N-1}\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,t}^{2}}{2\alpha}\right)\right)^{M_{l,t}}

Therefore, with $p(\alpha)=\text{IG}(a_{\alpha},b_{\alpha})$, we have

p(\alpha\mid\dots,\text{data})\propto\text{IG}\left(\alpha;a_{\alpha},b_{\alpha}+\frac{1}{2}\sum_{t=1}^{T}\left(M_{N,t}\sum_{l=1}^{N-1}(\zeta_{l}^{2}+\eta_{l,t}^{2})+\sum_{l=2}^{N-1}M_{l,t}\sum_{r=1}^{l-1}(\zeta_{r}^{2}+\eta_{r,t}^{2})\right)\right)\prod_{t=1}^{T}\prod_{l=1}^{N-1}\left(1-\exp\left(-\frac{\zeta_{l}^{2}+\eta_{l,t}^{2}}{2\alpha}\right)\right)^{M_{l,t}}

The parameter $\alpha$ can be sampled using a Metropolis-Hastings algorithm. In particular, we work with $\log(\alpha)$, and use a normal proposal distribution centered at the log of the current value of $\alpha$.
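The following is a minimal sketch (Python; assuming the sum $S=\sum_{t}\{M_{N,t}\sum_{l<N}(\zeta_{l}^{2}+\eta_{l,t}^{2})+\sum_{l=2}^{N-1}M_{l,t}\sum_{r<l}(\zeta_{r}^{2}+\eta_{r,t}^{2})\}$ and the arrays of $\zeta_{l}^{2}+\eta_{l,t}^{2}$ and $M_{l,t}$ have been precomputed; the proposal standard deviation is a tuning choice of ours) of a random-walk Metropolis step on $\log(\alpha)$, which includes the Jacobian of the log transformation in the acceptance ratio.

```python
import numpy as np

def log_post_alpha(alpha, S, sq, M, a_alpha, b_alpha):
    """Log full conditional of alpha (up to a constant).
    S:  the precomputed sum defined above
    sq: array of zeta_l^2 + eta_{l,t}^2, shape (N-1, T)
    M:  array of M_{l,t}, shape (N-1, T)"""
    lp = -(a_alpha + 1.0) * np.log(alpha) - b_alpha / alpha     # IG(a, b) prior kernel
    lp += -S / (2.0 * alpha)
    lp += np.sum(M * np.log1p(-np.exp(-sq / (2.0 * alpha))))    # product of (1 - exp) terms
    return lp

def update_alpha(alpha, S, sq, M, a_alpha, b_alpha, rng, prop_sd=0.2):
    """One Metropolis-Hastings step on log(alpha)."""
    log_a_new = rng.normal(np.log(alpha), prop_sd)
    alpha_new = np.exp(log_a_new)
    log_ratio = (log_post_alpha(alpha_new, S, sq, M, a_alpha, b_alpha)
                 - log_post_alpha(alpha, S, sq, M, a_alpha, b_alpha)
                 + log_a_new - np.log(alpha))        # Jacobian of alpha = exp(log alpha)
    return alpha_new if np.log(rng.uniform()) < log_ratio else alpha
```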

Updating $\phi$

The full conditional for the AR parameter $\phi$ is

p(\phi\mid\dots,\text{data})\propto p(\phi)\prod_{t=2}^{T}\prod_{l=1}^{N-1}\text{N}(\eta_{l,t};\phi\eta_{l,t-1},1-\phi^{2})

\propto(1-\phi^{2})^{-(N-1)(T-1)/2}\exp\left(-\sum_{t=2}^{T}\sum_{l=1}^{N-1}\frac{1}{2(1-\phi^{2})}(\eta_{l,t}-\phi\eta_{l,t-1})^{2}\right)p(\phi)

We assume $p(\phi)=\mathrm{uniform}(0,1)$ or $p(\phi)=\mathrm{uniform}(-1,1)$, and apply a Metropolis-Hastings algorithm to sample $\log\left(\frac{\phi}{1-\phi}\right)$ or $\log\left(\frac{\phi+1}{1-\phi}\right)$, respectively, using a normal proposal distribution.
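As a sketch of this step under the uniform$(-1,1)$ prior (names and the proposal scale are ours; the Jacobian term $\mathrm{d}\phi/\mathrm{d}\psi=(1-\phi^{2})/2$ for $\psi=\log\{(\phi+1)/(1-\phi)\}$ is the standard change-of-variables adjustment, which we include explicitly), a random-walk Metropolis update might look as follows.

```python
import numpy as np

def log_post_phi(phi, eta):
    """Log full conditional of phi (up to a constant) under a uniform(-1,1) prior.
    eta: array of eta_{l,t}, shape (N-1, T)."""
    if not (-1.0 < phi < 1.0):
        return -np.inf
    resid = eta[:, 1:] - phi * eta[:, :-1]
    Nm1, Tm1 = resid.shape
    return (-0.5 * Nm1 * Tm1 * np.log(1.0 - phi**2)
            - 0.5 * np.sum(resid**2) / (1.0 - phi**2))

def update_phi(phi, eta, rng, prop_sd=0.3):
    """One Metropolis-Hastings step on psi = log((phi+1)/(1-phi))."""
    psi = np.log((phi + 1.0) / (1.0 - phi))
    psi_new = rng.normal(psi, prop_sd)
    phi_new = np.tanh(psi_new / 2.0)                    # inverse transform
    log_ratio = (log_post_phi(phi_new, eta) - log_post_phi(phi, eta)
                 + np.log((1.0 - phi_new**2) / 2.0)     # Jacobian at proposed value
                 - np.log((1.0 - phi**2) / 2.0))        # Jacobian at current value
    return phi_new if np.log(rng.uniform()) < log_ratio else phi
```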

Updating $\{\boldsymbol{\mu}_{l,t}\}$

The updates for $\boldsymbol{\mu}_{l,t}$ are $\text{N}(\boldsymbol{m}^{*},\boldsymbol{V}^{*})$, with $\boldsymbol{m}^{*}$ and $\boldsymbol{V}^{*}$ given below (a small numerical sketch of the $t=T$ case follows the list):

  • For $t=2,\dots,T-1$, if $M_{l,t}=0$, then the update for $\boldsymbol{\mu}_{l,t}$ has $\boldsymbol{V}^{*}=(\boldsymbol{V}^{-1}+(\Theta^{-1}\boldsymbol{V}\Theta^{-T})^{-1})^{-1}$ and $\boldsymbol{m}^{*}=\boldsymbol{V}^{*}(\boldsymbol{V}^{-1}(\boldsymbol{m}+\Theta\boldsymbol{\mu}_{l,t-1})+(\Theta^{-1}\boldsymbol{V}\Theta^{-T})^{-1}\Theta^{-1}(\boldsymbol{\mu}_{l,t+1}-\boldsymbol{m}))$.

  • For $t=2,\dots,T-1$, if $M_{l,t}\neq 0$, then the update for $\boldsymbol{\mu}_{l,t}$ has $\boldsymbol{V}^{*}=(\boldsymbol{V}^{-1}+(\Theta^{-1}\boldsymbol{V}\Theta^{-T})^{-1}+M_{l,t}\boldsymbol{\Sigma}_{l}^{-1})^{-1}$ and $\boldsymbol{m}^{*}=\boldsymbol{V}^{*}(\boldsymbol{V}^{-1}(\boldsymbol{m}+\Theta\boldsymbol{\mu}_{l,t-1})+(\Theta^{-1}\boldsymbol{V}\Theta^{-T})^{-1}\Theta^{-1}(\boldsymbol{\mu}_{l,t+1}-\boldsymbol{m})+\boldsymbol{\Sigma}_{l}^{-1}\sum_{\{i:L_{t,i}=l\}}\boldsymbol{y}_{t,i})$.

  • For $t=1$, if $M_{l,1}=0$, then the update for $\boldsymbol{\mu}_{l,1}$ has $\boldsymbol{V}^{*}=((\Theta^{-1}\boldsymbol{V}\Theta^{-T})^{-1}+\boldsymbol{V}_{0}^{-1})^{-1}$ and $\boldsymbol{m}^{*}=\boldsymbol{V}^{*}((\Theta^{-1}\boldsymbol{V}\Theta^{-T})^{-1}\Theta^{-1}(\boldsymbol{\mu}_{l,2}-\boldsymbol{m})+\boldsymbol{V}_{0}^{-1}\boldsymbol{m}_{0})$.

  • For $t=1$, if $M_{l,1}\neq 0$, then the update for $\boldsymbol{\mu}_{l,1}$ has $\boldsymbol{V}^{*}=(M_{l,1}\boldsymbol{\Sigma}_{l}^{-1}+(\Theta^{-1}\boldsymbol{V}\Theta^{-T})^{-1}+\boldsymbol{V}_{0}^{-1})^{-1}$ and $\boldsymbol{m}^{*}=\boldsymbol{V}^{*}(\boldsymbol{\Sigma}_{l}^{-1}\sum_{\{i:L_{1,i}=l\}}\boldsymbol{y}_{1,i}+(\Theta^{-1}\boldsymbol{V}\Theta^{-T})^{-1}\Theta^{-1}(\boldsymbol{\mu}_{l,2}-\boldsymbol{m})+\boldsymbol{V}_{0}^{-1}\boldsymbol{m}_{0})$.

  • For $t=T$, if $M_{l,T}=0$, then the update for $\boldsymbol{\mu}_{l,T}$ has $\boldsymbol{V}^{*}=\boldsymbol{V}$ and $\boldsymbol{m}^{*}=\boldsymbol{m}+\Theta\boldsymbol{\mu}_{l,T-1}$.

  • For $t=T$, if $M_{l,T}\neq 0$, then the update for $\boldsymbol{\mu}_{l,T}$ has $\boldsymbol{V}^{*}=(M_{l,T}\boldsymbol{\Sigma}_{l}^{-1}+\boldsymbol{V}^{-1})^{-1}$ and $\boldsymbol{m}^{*}=\boldsymbol{V}^{*}(\boldsymbol{\Sigma}_{l}^{-1}\sum_{\{i:L_{T,i}=l\}}\boldsymbol{y}_{T,i}+\boldsymbol{V}^{-1}(\boldsymbol{m}+\Theta\boldsymbol{\mu}_{l,T-1}))$.
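As referenced above, here is a minimal numerical sketch (Python, with names of our own choosing) of the $t=T$, $M_{l,T}\neq 0$ case: form $\boldsymbol{V}^{*}$ and $\boldsymbol{m}^{*}$ as in the last bullet and draw $\boldsymbol{\mu}_{l,T}$ from $\text{N}(\boldsymbol{m}^{*},\boldsymbol{V}^{*})$.

```python
import numpy as np

def update_mu_lT(y_l, Sigma_l, V, m, Theta, mu_prev, rng):
    """Conjugate normal draw for mu_{l,T} when M_{l,T} > 0.
    y_l:     (M_{l,T}, d) array of latent vectors y_{T,i} with L_{T,i} = l
    mu_prev: mu_{l,T-1}"""
    M_lT = y_l.shape[0]
    Sigma_inv = np.linalg.inv(Sigma_l)
    V_inv = np.linalg.inv(V)
    V_star = np.linalg.inv(M_lT * Sigma_inv + V_inv)
    m_star = V_star @ (Sigma_inv @ y_l.sum(axis=0) + V_inv @ (m + Theta @ mu_prev))
    return rng.multivariate_normal(m_star, V_star)

# example with d = 3 (hypothetical values)
rng = np.random.default_rng(2)
d = 3
out = update_mu_lT(y_l=rng.normal(size=(5, d)), Sigma_l=np.eye(d), V=np.eye(d),
                   m=np.zeros(d), Theta=0.5 * np.eye(d), mu_prev=np.ones(d), rng=rng)
print(out)
```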

Updating $\{\boldsymbol{\Sigma}_{l}\}$

The posterior full conditional for $\boldsymbol{\Sigma}_{l}$ is proportional to $\text{IW}(\nu+M_{l},\boldsymbol{D}+\sum_{\{(t,i):L_{t,i}=l\}}(\boldsymbol{y}_{t,i}-\boldsymbol{\mu}_{l,t})(\boldsymbol{y}_{t,i}-\boldsymbol{\mu}_{l,t})^{T})$. When $|\{(t,i):L_{t,i}=l\}|=0$, $\boldsymbol{\Sigma}_{l}$ is drawn from $\text{IW}(\nu,\boldsymbol{D})$.
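A short sketch of this conjugate update (Python with scipy; note that scipy.stats.invwishart is parameterized by degrees of freedom and a scale matrix, which we assume matches the IW$(\nu,\boldsymbol{D})$ convention used here) might look as follows.

```python
import numpy as np
from scipy.stats import invwishart

def update_Sigma_l(y_l, mu_l, nu, D, rng):
    """Draw Sigma_l from its inverse-Wishart full conditional.
    y_l:  (M_l, d) array of latent vectors assigned to component l (possibly empty)
    mu_l: (M_l, d) array of the corresponding component means mu_{l,t}"""
    if y_l.shape[0] == 0:
        return invwishart.rvs(df=nu, scale=D, random_state=rng)   # prior draw
    resid = y_l - mu_l
    S = resid.T @ resid                       # sum of (y - mu)(y - mu)^T
    return invwishart.rvs(df=nu + y_l.shape[0], scale=D + S, random_state=rng)
```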