
Embedding Positive Process Models into Lognormal Bayesian State Space Frameworks using Moment Matching

John W. Smith1,2    Leah R. Johnson2,4    R. Quinn Thomas3,4
(1Department of Mathematical Sciences, Montana State University
2Department of Statistics, Virginia Tech
3Department of Forest Resources and Environmental Conservation, Virginia Tech
4Department of Biological Sciences, Virginia Tech
)
Abstract

In ecology it is common for processes to be bounded based on physical constraints of the system. One common example is the positivity constraint, which applies to phenomena such as duration times, population sizes, and total stock of a system’s commodity. In this paper, we propose a novel method for embedding these dynamical systems into a lognormal state space model using an approach based on moment matching. Our method enforces the positivity constraint, allows for embedding of arbitrary mean evolution and variance structures, and has a closed-form Markov transition density which allows for more flexibility in fitting techniques. We discuss two existing lognormal state space models, and examine how they differ from the method presented here. We use 180 synthetic datasets to compare forecasting performance under model misspecification and to assess estimability of precision parameters between our method and existing methods. We find that our models perform well under misspecification, and that fixing the observation variance both helps to improve estimation of the process variance and improves forecast performance. To test our method on a difficult problem, we compare the predictive performance of two lognormal state space models in predicting Leaf Area Index over a 151 day horizon by embedding a process-based ecosystem model. We find that our moment matching model performs better than its competitor, and is better suited for long predictive horizons. Overall, our study helps to inform practitioners about the importance of embedding sensible dynamics when using models of complex systems to predict out of sample.
Keywords: State space model, Bayesian statistics, MCMC, Particle filter, Forecasting

1 Introduction

Process-based models are mathematical representations of the evolution of biological or physical systems (Buck-Sorlin, 2013). These models are often composed of systems of ordinary or partial differential equations in time and space, or are discretizations of such systems. Because they encapsulate known or hypothesized mechanisms of physical or biological systems, process-based modeling approaches have advantages over empirical or phenomenological modeling approaches, particularly when low data availability limits the ability of empirical models to accurately represent complex processes. As a result, process-based models remain common in ecological forecasting applications (Lewis et al., 2022). However, process-based modeling approaches have their own set of challenges, particularly the quantification of uncertainty in model dynamics and structure and the risk of over-parameterization (Luo et al., 2009).

Quantifying uncertainty in process-based models is one of the most challenging tasks when using them for forecasting applications that involve uncertainty propagation. Uncertainty comes from a wide variety of sources, including but not limited to: process, measurement, initial conditions, driver data, and estimated parameters (Dietze et al., 2018). Process uncertainty (or process stochasticity) is particularly important to address, as it acknowledges that the modeling framework may contain unknown errors that are best represented stochastically, or elements that are known to be non-deterministic and that affect the dynamics of the biological process. For this reason, the state space modeling (SSM) framework (Durbin and Koopman, 2012; Petris et al., 2009; Auger-Méthé et al., 2021) has been used frequently in ecological applications (see Thomas et al., 2017; Dowd and Meyer, 2003; Dennis et al., 2006, for examples). State space models provide a flexible framework that is able to handle missing data and partition multiple sources of uncertainty (Dietze et al., 2018). Many existing fitting methods for ecosystem process-based models already estimate initial condition and parameter uncertainty while accounting for observation uncertainty. By adding stochastic elements to the process-based model, they can be analyzed as state space models.

One of the nuances of including uncertainty/stochasticity in the process model is the trade-off between complexity and ecological realism. For example, in forest carbon modeling, Gaussian error, with its unbounded support, is commonly assumed for carbon stocks (pools) (Thomas et al., 2017; Jiang et al., 2018), despite the biophysical impossibility of an ecosystem having a negative amount of carbon. In general, biological processes may have well defined lower bounds (and potentially upper bounds) that are not accurately captured by error structures that have support over the entire real line. Consequently, models with biophysically unrealistic error structures can produce nonsensical predictions as the states approach these well defined bounds, or when large measurement error allows for predictions with negative values as the modeled states approach the lower bounds. This is especially relevant in forecasting applications, where uncertainty compounds as the forecast horizon increases.

A wealth of probability distributions are available for modeling non-negative processes. In particular, the lognormal distribution has a rich history in ecology (Dennis and Patil, 1988), and is frequently used in state space modeling of populations (e.g., Buckland et al., 2004; Dennis et al., 2006; Knape et al., 2011) and species abundance (e.g., Maunder et al., 2015; Mäntyniemi et al., 2015). Two common formulations for lognormal state space models (LN-SSMs) are the stochastic Gompertz SSM (Gompertz, 1825) and the stochastic Moran-Ricker SSM (Ricker, 1954; Dennis et al., 2006). In their simplest forms, these models include assumptions that are disguised by writing the models as Gaussian in log space, namely the assumptions of density dependent variance and systematic bias in the process evolution and observation functions. When these assumptions are not appropriate for an application, we need a mechanism to insert more suitable assumptions about the process evolution, observation, and variance dynamics. To address the challenge of incorporating ecologically realistic error structures into state space models, we propose a novel lognormal moment matching approach that allows users to specify the mean and variance of their process evolution and observation models.

Here, we fit our lognormal moment matching models from a Bayesian perspective using Markov Chain Monte Carlo (MCMC) and Particle Markov Chain Monte Carlo (pMCMC; Andrieu et al., 2010). The Bayesian paradigm provides a flexible framework for fitting complex models, and the moment matching approach we introduce here offers a closed form Markov transition density. This closed form transition density provides the option to fit these models using MCMC, while still supporting particle filter methods (Cappe et al., 2007; Doucet and Johansen, 2011) such as pMCMC. MCMC and pMCMC also provide a rigorous framework for quantifying parameter uncertainty and assessing forecast performance. MCMC methods generate samples from the posterior distribution, allowing practitioners to generate parameter estimates and build empirical density functions for their forecasts (Krüger et al., 2021). We can then validate our models by combining out of sample forecast observations with MCMC output and evaluating them with proper scoring rules (Gneiting and Raftery, 2007), such as the Continuous Ranked Probability Score (CRPS; Matheson and Winkler, 1976) and the Ignorance score (IGN; Good, 1952; Roulston and Smith, 2002).

We create four different models using the lognormal moment matching technique presented here. The four models are all based on the Gompertz and Moran-Ricker models (Gompertz, 1825; Dennis et al., 2006; Ricker, 1954), and are embedded (put into the latent process model in a particular way) to have unbiased process evolution. We explore two different variance structures: a density dependent variance (Dennis et al., 2006) for the evolution and observations, and a constant variance for the evolution and observations. First, we discuss interpretations of the Gompertz and Moran-Ricker SSMs and contrast them with interpretations of the models we present here. Next, we design and conduct simulation studies to compare forecast performance under model mis-specification and to assess estimability of precision parameters. Finally, we embed a two-dimensional process-based ecosystem model based on the DALEC2 model of Bloom and Williams (2015), and use it to predict Leaf Area Index (LAI) at a focal forest site (University of Notre Dame Environmental Research Center; UNDE) in Wisconsin, USA. We perform the embeddings in two different ways: once using a biased embedding with a density dependent variance structure, and once using our moment matching embedding with a density dependent variance structure. We design and perform an analysis to test the viability of using a state space modeling framework to predict out of sample LAI, and to identify similarities and differences between our moment matching technique and the more standard biased embedding.

2 Motivating Examples

In this section, we motivate the use of biologically realistic error structures in applications by demonstrating problems that may occur when modeling positive processes with error distributions that have mass over the entire real line. In particular, we demonstrate how process models produce nonsensical predictions as they approach or exceed well defined biophysical bounds. First, we consider the following toy dynamical system in time:

X_{t}\lvert X_{t-1}\sim\text{log}\mathcal{N}\left(\mu^{*}=\log\left(\frac{X_{t-1}^{2}}{\sqrt{X_{t-1}^{2}+\sigma^{2}}}\right),\ \sigma^{2*}=\log\left(1+\frac{\sigma^{2}}{X_{t-1}^{2}}\right)\right). (1)

Since the stochastic evolution function is lognormally distributed, the process is positive, i.e. X_{t}\in(0,\infty)\ \forall t=1,\dots,T. Further, the conditional mean and variance are \mathbb{E}[X_{t}\lvert X_{t-1}]=X_{t-1} and \mathbb{V}[X_{t}\lvert X_{t-1}]=\sigma^{2}.

Suppose that the positivity of the dynamical system is ignored and the process is instead modeled as a simple Gaussian random walk:

X_{t}\lvert X_{t-1}\sim\mathcal{N}(\mu^{*}=X_{t-1},\ \sigma^{2*}=\sigma^{2}). (2)

When the system starts sufficiently far from zero the two models are indistinguishable from each other, producing nearly identical forecast distributions in terms of median estimates and credible intervals (Figure 1, top two panels). When the starting value X0X_{0} is chosen to be close to the biological lower bound, the differences in forecasts become more apparent. The lognormal model forecast intervals are now asymmetric and bounded below at zero, and quickly reach zero as the forecast horizon increases. The random walk model forecast intervals retain their symmetry and continue to put considerable forecast mass below zero as the forecast horizon increases (Figure 1, bottom two panels).

Figure 1: Two sample trajectories from the Lognormal dynamical system in Equation 1; the top panels start from X_{0}=50 and the bottom panels start from X_{0}=2. 95% credible intervals for 10 day forecast horizons (dotted lines) are created using the Lognormal equations and the Gaussian equations to generate 10,000 sample prediction trajectories. In the bottom panels, the Lognormal forecasts (bottom left) are bounded below by zero, the biological lower bound. In contrast, the Gaussian random walk forecasts (bottom right) put considerable forecast probability mass below the biological bound.

In this simple example, the negative forecasts do not cause any issues within the process model itself; they merely produce unrealistic forecasts. To illustrate an example where issues do arise in the process model, consider the following dynamical system:

X_{t}\lvert X_{t-1}\sim\text{log}\mathcal{N}\left(\mu^{*}=\log(\lvert X_{t-1}\rvert)-a,\ \sigma^{2*}=\log\left(1+\frac{\sigma^{2}}{\lvert X_{t-1}\rvert}\right)^{2}\right). (3)

We note that, formally, the absolute values are unnecessary for the evolution of the true model (since X_{t} is positive for all t). However, they become necessary when we attempt to find an analogous Gaussian model with which to forecast.

Computing the conditional mean and variance of the lognormal distribution with the given values of \mu^{*} and \sigma^{2*}, and using them as the conditional mean and variance of a Gaussian distribution, results in:

X_{t}\lvert X_{t-1}\sim\mathcal{N}\left(\exp\left(\mu^{*}+\frac{\sigma^{2*}}{2}\right),\ \left(\exp(\sigma^{2*})-1\right)\exp\left(2\mu^{*}+\sigma^{2*}\right)\right). (4)
Figure 2: Two sample trajectories from the Lognormal dynamical system in Equation 3; the top panels start from X_{0}=50 and the bottom panels start from X_{0}=2. 95% credible intervals for 10 day forecast horizons (dotted lines) are created using the Lognormal equations and the Gaussian equations to generate 10,000 sample prediction trajectories. In the bottom panels, the Lognormal forecasts (bottom left) are bounded below by zero, the biological lower bound. In contrast, the Gaussian forecasts (bottom right) put considerable forecast probability mass below the biological bound.

For values sufficiently far from zero, the forecasts from the Lognormal model are nearly identical to the forecasts from the Gaussian model (Figure 2, top two panels). The Lognormal forecasts (bottom left panel, Figure 2) enforce the lower bound and produce forecast distributions with heavy upper tails. In contrast, when the system approaches zero the Gaussian forecasts (bottom right panel, Figure 2) produce forecast distributions with large upper credible bounds that march away from zero, and lower credible bounds that first increase and then move back towards zero. In applications, predictions like these can be challenging to diagnose, because it is easy to attribute the dynamics to an inadequate process model when the problem actually lies in an error structure that places probability mass on regions that are biologically impossible.
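The lognormal-to-Gaussian moment conversion in Equation 4 can be checked numerically. The following is a minimal Python sketch (the parameter values are illustrative, not from the paper's analysis) comparing the closed-form mean and variance against a Monte Carlo sample:

```python
import numpy as np

rng = np.random.default_rng(7)

def lognormal_moments(mu_star, s2_star):
    """Closed-form mean and variance of LogNormal(mu_star, s2_star), as in Eq. 4."""
    mean = np.exp(mu_star + s2_star / 2.0)
    var = (np.exp(s2_star) - 1.0) * np.exp(2.0 * mu_star + s2_star)
    return mean, var

# Check the closed forms against a large Monte Carlo sample.
mu_star, s2_star = 0.3, 0.2
draws = rng.lognormal(mean=mu_star, sigma=np.sqrt(s2_star), size=500_000)
m, v = lognormal_moments(mu_star, s2_star)
print(abs(draws.mean() - m) < 0.01, abs(draws.var() - v) < 0.02)
```

These are the same moments used to build the Gaussian forecast intervals in the bottom right panel of Figure 2.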

3 Methods

3.1 Lognormal SSMs

State Space Models, sometimes referred to as Hidden Markov Models (HMMs), are a broad class of models used to track the states of a system (some unobserved process X_{1:T}) through a set of observations, Y_{1:T} (Durbin and Koopman, 2012; Petris et al., 2009). A state space model has three components: the state component, the observation component, and additional parameters. The state component consists of an (unobserved) Markov process, X_{t}, moved through time by an evolution function (or process function) f(X_{t}\lvert X_{t-1},\Theta). The implication of X_{1:T} following a Markov process is that the past and the future are independent conditional on the state at the current time, X_{t} (Shumway and Stoffer, 2011). The observation component consists of noisy observations of the latent process, Y_{1:T}, that are governed by an observation density function, g(Y_{t}\lvert X_{t},\Theta). These observations are assumed to be independent of one another conditional on the latent states X_{1:T}. The additional parameter component, \Theta, contains the parameters that govern the evolution function and observation function.

The Gompertz (Gompertz, 1825) and Moran-Ricker (Ricker, 1954) SSMs are simple discrete density-dependent state space models with lognormally distributed process and measurement error (Dennis et al., 2006). They are popular choices for lognormal state space models because they can be easily transformed into Normal Dynamic Models (NDMs), and can then take advantage of a suite of well studied fitting methods including Kalman filtering (Kalman, 1960), extended Kalman filtering (Julier and Uhlmann, 1997), and Gibbs sampling (Geman and Geman, 1984; Carter and Kohn, 1994), easing the computational difficulties associated with fitting SSMs. The latent process model for the Gompertz model can be written as:

X_{t}=X_{t-1}\exp(a+b\log(X_{t-1})+\epsilon_{t}),\quad\epsilon_{t}\sim\mathcal{N}(0,\phi) (5)

The latent process model for the Moran-Ricker model can be written as:

X_{t}=X_{t-1}\exp(a+bX_{t-1}+\epsilon_{t}),\quad\epsilon_{t}\sim\mathcal{N}(0,\phi) (6)

Letting A=\exp(a), we can rewrite Equations 5 and 6, the Gompertz and Moran-Ricker process equations (respectively), in the form:

X_{t}=A(X_{t-1})^{b+1}\exp(\epsilon_{t}),\quad\epsilon_{t}\sim\mathcal{N}(0,\phi) (7)
X_{t}=AX_{t-1}\exp(bX_{t-1})\exp(\epsilon_{t}),\quad\epsilon_{t}\sim\mathcal{N}(0,\phi) (8)

More generally, we can think of both the Gompertz and Moran-Ricker process models as belonging to a class of lognormally distributed process models that have the form:

X_{t}=f^{*}(X_{t-1}\lvert\Theta)\exp(\epsilon_{t}),\quad\epsilon_{t}\sim\mathcal{N}(0,\phi),\quad f^{*}(X_{t-1}\lvert\Theta)>0, (9)

where f^{*}(X_{t-1}\lvert\Theta) is the model of choice for the non-negative physical process, and the process error is governed by a multiplicative lognormal distribution with zero mean in log space. Key statistical properties (conditional mean, conditional variance, and conditional median) of this class of process models are:

\mathbb{E}[X_{t}\lvert X_{t-1}]=f^{*}(X_{t-1}\lvert\Theta)\exp((2\phi)^{-1}), (10)
\mathbb{V}[X_{t}\lvert X_{t-1}]=f^{*}(X_{t-1}\lvert\Theta)^{2}\exp(\phi^{-1})\left(\exp(\phi^{-1})-1\right), (11)
\mathscr{M}[X_{t}\lvert X_{t-1}]=f^{*}(X_{t-1}\lvert\Theta). (12)

When looking at the conditional expected value, we see that this class of models is biased in terms of mean process evolution, but unbiased in terms of median process evolution. Further, the variance of the latent state is controlled by the value of the process model f^{*}(\cdot), with larger values of the process assumed to have larger variation than smaller ones. Therefore, in the class of process models that includes the Gompertz and Moran-Ricker models, the latent process variance is assumed to scale quadratically with the predicted value of the process model.
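These conditional moments are easy to verify by simulation. The following Python sketch (the value standing in for f^{*}(X_{t-1}|Θ) and the precision ϕ are arbitrary illustrative choices) draws from the multiplicative model of Equation 9 and compares the sample moments with Equations 10-12:

```python
import numpy as np

rng = np.random.default_rng(0)

# Multiplicative lognormal process error (Eq. 9) with precision phi, so that
# eps ~ N(0, 1/phi); f_star stands in for any positive process model value.
f_star, phi = 5.0, 4.0
eps = rng.normal(0.0, np.sqrt(1.0 / phi), size=1_000_000)
x_t = f_star * np.exp(eps)

# Eqs. 10-12: biased mean, density-dependent variance, unbiased median.
mean_theory = f_star * np.exp(1.0 / (2.0 * phi))
var_theory = f_star**2 * np.exp(1.0 / phi) * (np.exp(1.0 / phi) - 1.0)
print(abs(x_t.mean() - mean_theory) < 0.02)      # mean exceeds f_star
print(abs(x_t.var() - var_theory) < 0.15)        # variance scales with f_star^2
print(abs(np.median(x_t) - f_star) < 0.02)       # median is exactly f_star
```

The bias in the mean (here f_star·(exp(1/(2ϕ)) − 1)) grows with both f_star and the process variance, which is the behavior the moment matching approach of Section 3.2 removes.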

The generalization of the Gompertz and Moran-Ricker process models also holds for the observation model. For example, for continuous responses Y, the assumed relationship between an arbitrary observation Y_{t} and the corresponding latent state X_{t} for the Gompertz model may be given by:

Y_{t}=X_{t}\exp(\epsilon_{obs,t}),\quad\epsilon_{obs,t}\sim\mathcal{N}(0,\tau) (13)

Broadly, the Gompertz and Moran-Ricker observation models belong to a class of lognormally distributed observation models of the form:

Y_{t}=g^{*}(X_{t}\lvert\Theta)\exp(\epsilon_{obs,t}),\quad\epsilon_{obs,t}\sim\mathcal{N}(0,\tau),\quad g^{*}(X_{t}\lvert\Theta)>0, (14)

where g^{*}(X_{t}\lvert\Theta) is now interpreted as the model used to link our observations to their respective latent states, and the measurement error is dictated by a multiplicative lognormal distribution with zero mean in log space. The conditional mean, conditional variance, and conditional median of this class of observation models are given by:

\mathbb{E}[Y_{t}\lvert X_{t}]=g^{*}(X_{t}\lvert\Theta)\exp((2\tau)^{-1}),
\mathbb{V}[Y_{t}\lvert X_{t}]=g^{*}(X_{t}\lvert\Theta)^{2}\exp(\tau^{-1})\left(\exp(\tau^{-1})-1\right),
\mathscr{M}[Y_{t}\lvert X_{t}]=g^{*}(X_{t}\lvert\Theta).

Importantly, while this class of observation models assumes that observations are unbiased in log space, it implies systematic bias in the measurement process, given by B(Y_{t}\lvert X_{t})=g^{*}(X_{t}\lvert\Theta)(1-\exp((2\tau)^{-1})).

Though we have only examined the Gompertz and Moran-Ricker models when discussing the classes of lognormal process and observation models presented here, there are many examples of lognormal modeling frameworks in the animal population modeling literature (see Buckland et al., 2004; Knape et al., 2011, for examples) and the fisheries modeling literature (see Maunder et al., 2015; Mäntyniemi et al., 2015, for examples) that fall into these classes. Including a term to correct the bias is a common method used in applications to rectify the assumptions of biased process evolution and biased observations (Knape et al., 2011; Maunder et al., 2015; Mäntyniemi et al., 2015), but this bias correction term changes the variance structure, as the lognormal variance is a function of both \mu and \sigma^{2}.

3.2 Lognormal Moment Matching Models

Instead of assuming an unbiased median process evolution and a density dependent variance structure, a modeler developing a lognormal SSM for their application may want to specify the mean evolution and the variance structure of the stochastic lognormal process model. That is, we desire a lognormally distributed stochastic evolution function such that \mathbb{E}[X_{t}\lvert X_{t-1}]=f^{*}(X_{t-1}\lvert\Theta) and \mathbb{V}[X_{t}\lvert X_{t-1}]=\phi_{t}^{-1}. We can create a process model with these characteristics by using a moment matching transformation on the mean and precision, specifically:

\mu^{*}_{t}=\log\left(\frac{f^{*}(X_{t-1}\lvert\Theta)^{2}}{\sqrt{f^{*}(X_{t-1}\lvert\Theta)^{2}+\phi_{t}^{-1}}}\right), (15)
\phi^{*}_{t}=\log\left(1+(f^{*}(X_{t-1}\lvert\Theta)^{2}\phi_{t})^{-1}\right)^{-1}. (16)

This approach provides a stochastic lognormal process model with the desired properties (see Appendix 6.1 for derivation):

X_{t}\lvert X_{t-1}\sim\text{Lognormal}(\mu^{*}_{t},\phi^{*}_{t}),
\mathbb{E}[X_{t}\lvert X_{t-1}]=f^{*}(X_{t-1}\lvert\Theta),
\mathbb{V}[X_{t}\lvert X_{t-1}]=\phi_{t}^{-1}.

Thus, there is a framework for embedding the process model f^{*}(X_{t-1}\lvert\Theta) such that the temporal evolution is unbiased, and that allows for a flexible way to model the variance of the process through time. For example, we can choose \phi_{t}=\phi to model constant variance through time, \phi_{t}=\phi/f^{*}(X_{t-1}\lvert\Theta)^{2} to model density dependent variance, or \phi_{t}=\phi\exp(-t^{-1}) to model variance that dissipates over time.
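The transformation in Equations 15-16 is straightforward to implement. Below is a minimal Python sketch (the function name `moment_match` and the numeric values are our own illustration); it computes (μ*_t, ϕ*_t) for a target mean and precision and verifies the resulting lognormal has the requested moments:

```python
import numpy as np

rng = np.random.default_rng(1)

def moment_match(mean, precision):
    """Lognormal parameters (mu_star, phi_star) such that the lognormal has
    the requested mean and variance 1/precision (Eqs. 15-16).
    phi_star is returned as a precision, matching the paper's convention."""
    var = 1.0 / precision
    mu_star = np.log(mean**2 / np.sqrt(mean**2 + var))
    phi_star = 1.0 / np.log(1.0 + var / mean**2)
    return mu_star, phi_star

# Embed an arbitrary positive "process model" value with a chosen variance.
f_star, phi_t = 3.5, 2.0          # target mean 3.5, target variance 0.5
mu_star, phi_star = moment_match(f_star, phi_t)
draws = rng.lognormal(mean=mu_star, sigma=np.sqrt(1.0 / phi_star), size=1_000_000)
print(abs(draws.mean() - f_star) < 0.01, abs(draws.var() - 1.0 / phi_t) < 0.02)
```

Swapping in `phi_t = phi / f_star**2` reproduces the density dependent variance structure of the Gompertz and Moran-Ricker models while keeping the mean evolution unbiased.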

The lognormal moment matching approach can be similarly applied to the observation model to produce a lognormally distributed observation density with the properties \mathbb{E}[Y_{t}\lvert X_{t}]=g^{*}(X_{t}\lvert\Theta) and \mathbb{V}[Y_{t}\lvert X_{t}]=\tau_{t}^{-1}, by using the transformation:

\mu^{*}_{t,obs}=\log\left(\frac{g^{*}(X_{t}\lvert\Theta)^{2}}{\sqrt{g^{*}(X_{t}\lvert\Theta)^{2}+\tau_{t}^{-1}}}\right), (17)
\tau^{*}_{t}=\log\left(1+(g^{*}(X_{t}\lvert\Theta)^{2}\tau_{t})^{-1}\right)^{-1}. (18)

With the forms of both the process model and observation model fully specified in this fashion, it is possible to write a likelihood for a model that includes both components, which we call the Lognormal Moment Matching model (LNM3). Suppose that we have observations \mathbf{Y} at a subset of time points I\subset\{1,\dots,T\}. Then the likelihood for the LNM3 is given by:

\mathscr{L}(X_{1:T},\Theta\lvert\mathbf{Y})=\prod_{t=1}^{T}\text{log}\mathcal{N}\left(X_{t};\,\log\left(\frac{f^{*}(X_{t-1}\lvert\Theta)^{2}}{\sqrt{f^{*}(X_{t-1}\lvert\Theta)^{2}+\phi_{t}^{-1}}}\right),\,\log\left(1+(f^{*}(X_{t-1}\lvert\Theta)^{2}\phi_{t})^{-1}\right)^{-1}\right)
\times\prod_{i\in I}\text{log}\mathcal{N}\left(Y_{i};\,\log\left(\frac{g^{*}(X_{i}\lvert\Theta)^{2}}{\sqrt{g^{*}(X_{i}\lvert\Theta)^{2}+\tau_{i}^{-1}}}\right),\,\log\left(1+(g^{*}(X_{i}\lvert\Theta)^{2}\tau_{i})^{-1}\right)^{-1}\right) (19)

The moment matching method can be used to embed any positive process or observation model, with any variance structure, into a stochastic lognormal model, and thus helps to maintain biophysical realism when modeling positive processes. Fitting these models as state space models further allows them to easily handle missing data and to partition uncertainty between process and measurement error. Thus the LNM3 approach provides a framework that is flexible, biophysically realistic, and statistically coherent.
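For readers who want to compute Equation 19 directly, a Python sketch of the joint log likelihood follows. The helper names, the Ricker-style mean function, and all numeric values are illustrative assumptions, not the authors' code; both precisions are held constant over time for simplicity.

```python
import numpy as np

def lognormal_logpdf(x, mu, prec):
    """Log density of LogNormal(mu, precision=prec) evaluated at x > 0."""
    return (0.5 * np.log(prec / (2.0 * np.pi)) - np.log(x)
            - 0.5 * prec * (np.log(x) - mu) ** 2)

def moment_match(mean, precision):
    """Eqs. 15-18: lognormal (mu_star, precision_star) for a target mean/precision."""
    var = 1.0 / precision
    return (np.log(mean**2 / np.sqrt(mean**2 + var)),
            1.0 / np.log(1.0 + var / mean**2))

def lnm3_loglik(x, y, obs_idx, f_star, g_star, phi, tau):
    """Joint log likelihood of Eq. 19 for latent states x[0..T] and
    observations y at times obs_idx; f_star/g_star are user-supplied models."""
    ll = 0.0
    for t in range(1, len(x)):                      # process term
        mu, prec = moment_match(f_star(x[t - 1]), phi)
        ll += lognormal_logpdf(x[t], mu, prec)
    for i, yi in zip(obs_idx, y):                   # observation term
        mu, prec = moment_match(g_star(x[i]), tau)
        ll += lognormal_logpdf(yi, mu, prec)
    return ll

# Toy evaluation with a Ricker-style mean model and identity observation model.
f = lambda x: x * np.exp(0.1 - 0.02 * x)
g = lambda x: x
x = np.array([2.0, 2.2, 2.1, 2.4])
ll = lnm3_loglik(x, y=np.array([2.3, 2.35]), obs_idx=[1, 3],
                 f_star=f, g_star=g, phi=4.0, tau=10.0)
print(np.isfinite(ll))
```

Because the transition density is available in closed form like this, the likelihood can be evaluated inside a standard MCMC sampler without requiring a particle filter.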

3.3 Model Fitting

Our analysis focuses on comparing and contrasting the LNM3 approach with the Gompertz and Moran-Ricker approaches, two common lognormal SSMs. We estimate the latent states, precisions, and model parameters for the six models considered here within a Bayesian SSM framework (Hamilton, 1994; Durbin and Koopman, 2012; Petris et al., 2009). The six models include the Gompertz SSM, the Moran-Ricker SSM, and four different SSM formulations using the Lognormal Moment Matching (LNM3) method presented in Equations 15-18. Model parameters, latent states, and precisions were estimated using Markov Chain Monte Carlo (MCMC) (Robert and Casella, 2005), an estimation method that generates samples of parameters from their posterior distributions using Markov chains; parameter uncertainty can be quantified using the posterior samples from these chains. All models were fit using JAGS (Plummer, 2003) via the rjags package (Plummer, 2019) in R version 4.1.0 (R Core Team, 2016).

3.3.1 Gompertz SSM Fitting

We fit the Gompertz model in log space by taking the logarithm of the latent states (X_{1:T}) and observations (Y_{t\in I}): D_{t}=\log(X_{t}), F_{i}=\log(Y_{i}). Under this transformation, the process model and observation model can be written as a Normal Dynamic Linear Model (NDLM) (West and Harrison, 1997):

D_{t}=a+(1+b)D_{t-1}+\epsilon_{t},\quad\epsilon_{t}\sim\mathcal{N}(0,\phi) (20)
F_{i}=D_{i}+\epsilon_{obs,i},\quad\epsilon_{obs,i}\sim\mathcal{N}(0,\tau) (21)

If the log observations (\mathbf{F}) are available at a subset of time points, I\subset\{1,\dots,T\}, the likelihood for the Gompertz model can be written as:

\mathscr{L}(D_{1:T},a,b,\tau,\phi\lvert\mathbf{F})\propto\prod_{t=1}^{T}\sqrt{\phi}\exp\left(-\frac{\phi}{2}(D_{t}-a-(1+b)D_{t-1})^{2}\right)
\times\prod_{i\in I}\sqrt{\tau}\exp\left(-\frac{\tau}{2}(F_{i}-D_{i})^{2}\right). (22)

Based on the likelihood for the Gompertz SSM, we need prior distributions for a, b, \phi, \tau, and the initial latent state D_{0}. Given the interpretation of a and b as the multiplicative constant and the growth rate in the Gompertz model (Equation 7), we used Uniform(-10,10) priors for both a and b. There are important considerations when choosing the prior distributions for \phi and \tau. Though the likelihood in Equation 22 can exploit conjugacy and use improper Jeffreys priors (Jeffreys, 1946) for both \phi and \tau, Gelman (2006) shows that the posterior is sensitive to the choice of \epsilon when using a Gamma(\epsilon,\epsilon) prior in JAGS to emulate the improper Jeffreys prior. Following the advice of Gelman (2006) and Polson and Scott (2012), we use a central half Cauchy prior distribution for the precision parameters. We note that though these studies recommend central half Cauchy priors on variance parameters, under this choice of prior the inverse variance (precision) is also implied to have a central half Cauchy prior (see Appendix 6.2 for proof). We chose the initial condition prior, \pi(D_{0}), to be normally distributed, with an initial mean of \mu_{0} and an initial precision of \phi_{0}. Thus the priors are given by:

a\sim\text{Uniform}(-10,10), (23)
b\sim\text{Uniform}(-10,10), (24)
\phi\sim\text{HalfCauchy}(\gamma=100), (25)
\tau\sim\text{HalfCauchy}(\gamma=100), (26)
D_{0}\sim\mathcal{N}(\mu_{0},\phi_{0}). (27)
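The claim that a half Cauchy prior on the variance implies a half Cauchy prior on the precision (the Appendix 6.2 result cited above) can be checked by simulation. A Python sketch, using the fact that the half Cauchy CDF with scale s is F(x) = (2/π) arctan(x/s); the sample size and evaluation points are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)

# If V ~ HalfCauchy(scale=gamma), then 1/V ~ HalfCauchy(scale=1/gamma),
# so a half Cauchy prior on the variance implies one on the precision.
gamma = 100.0
v = gamma * np.abs(rng.standard_cauchy(size=500_000))  # V ~ HalfCauchy(gamma)
prec = 1.0 / v

# Compare the empirical CDF of 1/V with the analytic HalfCauchy(1/gamma) CDF,
# F(x) = (2/pi) * arctan(x * gamma), at a few points.
xs = np.array([0.001, 0.01, 0.05, 0.2])
emp = np.array([(prec <= x).mean() for x in xs])
ana = (2.0 / np.pi) * np.arctan(xs * gamma)
print(np.max(np.abs(emp - ana)) < 0.005)
```

This closure under inversion is convenient in JAGS, where normal distributions are parameterized by precision rather than variance.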

The full conditional distributions for D_{1:T} in the Gompertz model are analytically tractable, allowing for Gibbs sampling (Geman and Geman, 1984). For interior latent states (k=1,2,\dots,T-1) the Gibbs updates are given by:

\pi(D_{k}\lvert\cdot)\sim N\left(\mu^{*}=\frac{\phi\left((1+b)D_{k-1}+a+(1+b)(D_{k+1}-a)\right)+\tau F_{k}\mathbbm{1}_{k\in I}}{\phi(1+(1+b)^{2})+\tau\mathbbm{1}_{k\in I}},\ \phi^{*}=\phi(1+(1+b)^{2})+\tau\mathbbm{1}_{k\in I}\right). (28)

The Gibbs updates for the initial latent state and the final latent state are given by

\pi(D_{T}\lvert\cdot)\sim N\left(\frac{\phi\left(a+(1+b)D_{T-1}\right)+\tau F_{T}\mathbbm{1}_{T\in I}}{\phi+\tau\mathbbm{1}_{T\in I}},\ \phi+\tau\mathbbm{1}_{T\in I}\right), (29)
\pi(D_{0}\lvert\cdot)\sim N\left(\frac{\mu_{0}\phi_{0}+\phi(1+b)(D_{1}-a)}{\phi(1+b)^{2}+\phi_{0}},\ \phi(1+b)^{2}+\phi_{0}\right), (30)

where \mathbbm{1}_{i\in I} is an indicator function denoting whether an observation F_{i} is available at time i.
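The interior-state update in Equation 28 can be sanity-checked numerically: the full conditional mean must maximize the unnormalized log full conditional. A Python sketch with illustrative parameter values (all precisions, states, and observations below are our own choices):

```python
import numpy as np

def interior_full_conditional(d_prev, d_next, f_k, observed, a, b, phi, tau):
    """Gibbs parameters (mean, precision) for an interior state D_k (Eq. 28)."""
    ind = 1.0 if observed else 0.0
    prec = phi * (1.0 + (1.0 + b) ** 2) + tau * ind
    num = phi * ((1.0 + b) * d_prev + a + (1.0 + b) * (d_next - a)) + tau * f_k * ind
    return num / prec, prec

# Illustrative values: precisions phi, tau; neighbors D_{k-1}, D_{k+1}; obs F_k.
a, b, phi, tau = 0.2, -0.3, 4.0, 9.0
d_prev, d_next, f_k = 1.0, 1.1, 1.05

def log_fc(d):
    """Unnormalized log full conditional: two process terms plus one data term."""
    out = -0.5 * phi * (d - a - (1.0 + b) * d_prev) ** 2
    out += -0.5 * phi * (d_next - a - (1.0 + b) * d) ** 2
    out += -0.5 * tau * (f_k - d) ** 2
    return out

mu, prec = interior_full_conditional(d_prev, d_next, f_k, True, a, b, phi, tau)
grid = np.linspace(mu - 1.0, mu + 1.0, 200_001)
print(abs(grid[np.argmax(log_fc(grid))] - mu) < 1e-4)
```

A full Gibbs sweep would apply this update for each k in turn, with the endpoint updates of Equations 29-30 for D_0 and D_T.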

We ran the MCMC for the Gompertz model for a total of 10,000 iterations, with a burn-in of 2,000 iterations, and an adaptation period (n.adapt in JAGS) of 1,000.

3.3.2 Moran-Ricker SSM Fitting

We also fit the Moran-Ricker model in log space by taking the logarithm of the latent states (X_{1:T}) and observations (Y_{t\in I}): D_{t}=\log(X_{t}), F_{i}=\log(Y_{i}). Under this log transformation, the process model and observation model can be written as:

\begin{align}
D_t &= a + D_{t-1} + b\exp(D_{t-1}) + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \phi), \tag{31}\\
F_i &= D_i + \epsilon_{obs,i}, \quad \epsilon_{obs,i} \sim \mathcal{N}(0, \tau). \tag{32}
\end{align}

Assuming that log observations ($\mathbf{F}$) are available at a subset of time points, $I \subset \{1, \dots, T\}$, the likelihood for the Moran-Ricker model fit in log space is given by:

\begin{align}
\mathscr{L}(D_{1:T}, a, b, \tau, \phi \mid \mathbf{F}) \propto &\prod_{t=1}^{T} \sqrt{\phi}\exp\left(-\frac{\phi}{2}\big(D_t - a - D_{t-1} - b\exp(D_{t-1})\big)^2\right)\\
&\prod_{i \in I} \sqrt{\tau}\exp\left(-\frac{\tau}{2}(F_i - D_i)^2\right). \tag{33}
\end{align}
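As a reference point for this likelihood, a minimal evaluator (our own sketch, not the JAGS implementation) computes the log of Equation 33 up to an additive constant, using the log-space Moran-Ricker mean $a + D_{t-1} + b\exp(D_{t-1})$, i.e., the logarithm of $X_{t-1}\exp(a + bX_{t-1})$. The name F_obs is a hypothetical container for the available log observations.

```python
import math

def moran_ricker_loglik(D, F_obs, a, b, phi, tau):
    """Log of the Moran-Ricker log-space likelihood, up to a constant.

    D: latent log states [D_0, D_1, ..., D_T]
    F_obs: dict mapping observed time index i to log observation F_i
    phi, tau: process and observation precisions
    """
    ll = 0.0
    for t in range(1, len(D)):  # process terms, t = 1..T
        r = D[t] - (a + D[t - 1] + b * math.exp(D[t - 1]))
        ll += 0.5 * math.log(phi) - 0.5 * phi * r * r
    for i, F in F_obs.items():  # observation terms, i in I
        r = F - D[i]
        ll += 0.5 * math.log(tau) - 0.5 * tau * r * r
    return ll
```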

Prior distributions for the Moran-Ricker model are identical to those chosen for the Gompertz model (Equations 23-27). $a$ and $b$ were given Uniform$(-10, 10)$ priors, $\phi$ and $\tau$ were given diffuse half-Cauchy priors following recommendations by Gelman (2006) and Polson and Scott (2012), and the initial latent state $D_0$ was given a normal prior centered at an initial mean $\mu_0$ with initial precision $\phi_0$. Altogether, the prior equations are:

\begin{align*}
a &\sim \text{Uniform}(-10, 10),\\
b &\sim \text{Uniform}(-10, 10),\\
\phi &\sim \text{HalfCauchy}(\gamma = 100),\\
\tau &\sim \text{HalfCauchy}(\gamma = 100),\\
D_0 &\sim \mathcal{N}(\mu_0, \phi_0).
\end{align*}

We ran the MCMC for the Moran-Ricker model for a total of 10,000 iterations, with a burn-in of 2,000 iterations and an adaptation period (n.adapt in JAGS) of 1,000.

3.3.3 Lognormal Moment Match SSM Fitting

We explored four different lognormal models using the moment matching approach, each with an unbiased mean process evolution and an unbiased observation model: the Gompertz process model with constant variances (LGC), the Moran-Ricker process model with constant variances (LMRC), the Gompertz process model with density dependent variances (LGD), and the Moran-Ricker process model with density dependent variances (LMRD). For the remainder of this paper, we refer to the first two models (LGC, LMRC) as the constant variance models. The term density dependent models is used to describe the second two models (LGD, LMRD) as well as the classical Gompertz and Moran-Ricker models. We chose these four models to embed common assumptions from different SSM formulations. The constant variance models (LGC, LMRC) were chosen to mimic the assumptions of homoskedastic Gaussian SSMs, which are not frequently adopted in lognormal SSMs. The embeddings for the LGD and LMRD models were chosen to mimic the variance structure of the classical Gompertz and Moran-Ricker models while maintaining an unbiased process evolution function and unbiased observation density function.

Model Name | $f^*(X_{t-1} \mid \Theta)$ | $\phi_t$ | $g^*(X_i \mid \Theta)$ | $\tau_i$
LGC | $\exp(a)(X_{t-1})^{b+1}$ | $\phi$ | $X_i$ | $\tau$
LMRC | $X_{t-1}\exp(a + bX_{t-1})$ | $\phi$ | $X_i$ | $\tau$
LGD | $\exp(a)(X_{t-1})^{b+1}$ | $\phi\big(\exp(a)(X_{t-1})^{b+1}\big)^{-2}$ | $X_i$ | $\tau X_i^{-2}$
LMRD | $X_{t-1}\exp(a + bX_{t-1})$ | $\phi\big(X_{t-1}\exp(a + bX_{t-1})\big)^{-2}$ | $X_i$ | $\tau X_i^{-2}$
Table 1: Process evolution functions, process error structure, observation density function, and observation error structure for the four types of Lognormal Moment Matching SSMs used. LGC and LMRC represent the Gompertz and Moran-Ricker process functions and observation functions with constant process and measurement variance. LGD and LMRD represent the Gompertz and Moran-Ricker process functions and observation functions with density dependent process and measurement variance.
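All four embeddings in Table 1 rest on the same elementary moment matching map: given a target mean $m > 0$ and variance $v > 0$ on the original scale, solve for the lognormal parameters that reproduce them. A minimal sketch (our own illustration):

```python
import math

def lognormal_moment_match(m, v):
    """Parameters (mu, sigma2) of a lognormal with mean m > 0, variance v > 0."""
    sigma2 = math.log(1.0 + v / (m * m))       # log-scale variance
    mu = math.log(m) - 0.5 * sigma2            # log-scale mean
    return mu, sigma2
```

Setting $m = f^*(X_{t-1} \mid \Theta)$ and $v = 1/\phi_t$ recovers the process kernels above; in particular, the density dependent precision $\phi_t = \phi f^*(\cdot)^{-2}$ gives $v/m^2 = \phi^{-1}$, so the log-scale variance $\log(1 + \phi^{-1})$ is free of the state, which is what makes the log-space likelihoods tractable.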

The likelihood for the constant variance models is obtained by substituting the values for $f^*(X_{t-1} \mid \Theta)$, $\phi_t$, $g^*(X_i \mid \Theta)$, and $\tau_i$ from Table 1 into the LNM3 likelihood (Equation 19). For the constant variance models, two of the prior choices were modified from the priors used for the Gompertz and Moran-Ricker models. First, the prior distribution on $a$ was modified to a strictly positive uniform distribution, to reflect the non-negativity of the latent process $X_{1:T}$. Second, the prior on the initial condition, $X_0$, was changed from $\mathcal{N}(\mu_0, \phi_0)$ to $\text{Log}\mathcal{N}(\mu_0, \phi_0)$ since the constant variance models are not fit in log space. Priors for $b$, $\phi$, and $\tau$ (Equations 24-26) were not changed. Altogether, the priors for the constant variance models are

\begin{align}
a &\sim \text{Uniform}(0, 10), \tag{34}\\
b &\sim \text{Uniform}(-10, 10), \tag{35}\\
\phi &\sim \text{HalfCauchy}(\gamma = 100), \tag{36}\\
\tau &\sim \text{HalfCauchy}(\gamma = 100), \tag{37}\\
X_0 &\sim \text{Lognormal}(\mu_0, \phi_0). \tag{38}
\end{align}

The density dependent moment matching models were fit in log space by taking the log of the latent states and observations: $D_t = \log(X_t)$, $F_i = \log(Y_i)$. The model likelihood in log space for the LGD model can be written as:

\begin{align}
\mathscr{L}(D_{1:T}, a, b, \tau, \phi \mid \mathbf{F}) \propto &\prod_{t=1}^{T}\frac{\exp\left(-\frac{\log(1+\phi^{-1})^{-1}}{2}\big(D_t - (a + (1+b)D_{t-1} - \tfrac{1}{2}\log(1+\phi^{-1}))\big)^2\right)}{\sqrt{\log(1+\phi^{-1})}}\\
&\prod_{i \in I}\frac{\exp\left(-\frac{\log(1+\tau^{-1})^{-1}}{2}\big(F_i - (D_i - \tfrac{1}{2}\log(1+\tau^{-1}))\big)^2\right)}{\sqrt{\log(1+\tau^{-1})}}. \tag{39}
\end{align}
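To see where the $\log(1 + \phi^{-1})$ terms in Equation 39 come from, substitute the LGD entries of Table 1 into the moment matching construction: with mean $m_t = \exp(a)X_{t-1}^{b+1}$ (so $\log m_t = a + (1+b)D_{t-1}$) and precision $\phi_t = \phi m_t^{-2}$, i.e., variance $v_t = m_t^2/\phi$,

```latex
\sigma_t^2 = \log\left(1 + \frac{v_t}{m_t^2}\right) = \log(1 + \phi^{-1}),
\qquad
\mu_t = \log m_t - \tfrac{1}{2}\sigma_t^2
      = a + (1+b)D_{t-1} - \tfrac{1}{2}\log(1 + \phi^{-1}),
```

so each process term in log space is Gaussian with a state-independent variance; the observation terms follow identically with $\tau$ in place of $\phi$.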

Similarly, the model likelihood in log space for the LMRD model can be written as:

\begin{align}
\mathscr{L}(D_{1:T}, a, b, \tau, \phi \mid \mathbf{F}) \propto &\prod_{t=1}^{T}\frac{\exp\left(-\frac{\log(1+\phi^{-1})^{-1}}{2}\big(D_t - (a + D_{t-1} + b\exp(D_{t-1}) - \tfrac{1}{2}\log(1+\phi^{-1}))\big)^2\right)}{\sqrt{\log(1+\phi^{-1})}}\\
&\prod_{i \in I}\frac{\exp\left(-\frac{\log(1+\tau^{-1})^{-1}}{2}\big(F_i - (D_i - \tfrac{1}{2}\log(1+\tau^{-1}))\big)^2\right)}{\sqrt{\log(1+\tau^{-1})}}. \tag{40}
\end{align}

Priors for these two density dependent moment matching models were chosen to be identical to those chosen for the Gompertz and Moran-Ricker models (Equations 23-27).

The specification of the density dependent precision as $\phi_t = \phi f^*(X_{t-1} \mid \Theta)^{-2}$ and the log-linear nature of the Gompertz curve conveniently lead to closed form full conditional distributions for the latent states in the LGD model. As in the Gompertz model, this allows Gibbs sampling to be used to update the latent states, which helps to decrease computation time. The derivation of the full conditional distributions is involved, and can be found in the Appendix.

We ran the MCMC for each of the four moment matching models for a total of 10,000 iterations, with a burn-in period of 2,000 iterations, and an adaptation period (n.adapt in JAGS) of 1,000.

3.4 Simulation Study

We designed a simulation study to assess the forecasting performance of the six models presented here, both in the case where each is the true generating model and in cases where the models are mis-specified. Our three primary objectives for the simulation study were to: 1) assess the forecasting performance of each model, when it is the true generating model and when it is misspecified, with the observation precision estimated; 2) assess the forecasting performance of each model, when it is the true generating model and when it is misspecified, with the observation precision fixed; 3) analyze the estimability of the precision parameters for the models considered in this manuscript.

To assess the forecast performance of our models, we used proper scoring rules (Gneiting and Raftery, 2007). Broadly, proper scoring rules use information about the predictive distribution coupled with observations to assign a measure of agreement between the forecast and the observations (Krüger et al., 2021). Specifically, Gneiting and Raftery (2007) define a scoring rule to be proper if the expected score is optimized when the forecast is the true data-generating distribution (minimized, for the negatively oriented scores used here), and show that both the Continuous Ranked Probability Score (CRPS; Matheson and Winkler, 1976) and the Ignorance score (IGN; Good, 1952; Roulston and Smith, 2002) are proper scoring rules. Given a probability density function $f(\cdot)$ and corresponding cumulative distribution function $F(\cdot)$ for our forecast, and an observation $y$, the IGN and CRPS may be written:

\begin{align}
\text{IGN}(y) &= -\log(f(y)); \tag{41}\\
\text{CRPS}(y) &= \int_{\mathbb{R}}\big(F(z) - \mathbbm{1}_{z \geq y}\big)^2\,dz. \tag{42}
\end{align}
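In practice both scores are estimated from posterior predictive samples, which is how the scoringRules functions used later (crps_sample, logs_sample) operate. A self-contained sketch of the sample-based CRPS estimator (our own illustration; IGN additionally requires a density estimate, e.g. a kernel density, which we omit):

```python
def crps_from_samples(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|,
    where X, X' are independent draws from the forecast distribution."""
    n = len(samples)
    e_abs_err = sum(abs(x - y) for x in samples) / n
    e_abs_diff = sum(abs(xi - xj) for xi in samples for xj in samples) / (n * n)
    return e_abs_err - 0.5 * e_abs_diff
```

The estimator is exact in the limit of many samples and is $O(n^2)$ as written; production implementations sort the samples to reach $O(n \log n)$.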

For our simulation studies, we consider both the CRPS and IGN scores, so that we have one scoring rule defined in terms of the probability density function (IGN) and one scoring rule defined in terms of the cumulative distribution function (CRPS). We use the IGN and CRPS scores to quantitatively compare forecasts, with lower scores within a scoring rule indicating better performance. CRPS takes values in $[0, \infty)$, while IGN takes values in $[-\log(f(y^*)), \infty)$, where $y^* = \text{argmax}_{y \in \mathbb{R}} f(y)$.

To analyze the estimability of precision parameters in each of our six models, we used the empirical coverage rate of the 95% highest posterior density (HPD) credible intervals for the precision parameters. Auger-Méthé et al. (2016) show that SSMs can have difficulty recovering the process and observation precisions even when the models are linear and Gaussian. To perform a thorough analysis of the estimability of the precision parameters, we fit each model under two different scenarios. In the first scenario, we fixed the observation precision and estimated the process precision for each model. In the second scenario, we estimated both the observation precision and the process precision for each model. This approach allowed us to assess estimability by looking at the increase in empirical coverage rate for the process precision when the observation precision is fixed compared to when it is estimated. By choosing coverage and proper scoring rules to quantify model performance we follow a common approach in the literature of using frequentist concepts to assess Bayesian models (Box, 1980; Rubin, 1984; Little, 2006, 2012).

Param. | Gomp | MR | LGC | LMRC | LGD | LMRD
$\exp(a)$ | .82 | 1.26 | 1.21 | 1.11 | 1.21 | 1.11
$b$ | -.658 | -.034 | -.099 | -.014 | -.099 | -.014
$\phi$ | 70.2 | 51.9 | 4 | 4 | 70.2 | 70.2
$\tau$ | 188.7 | 188.7 | 4 | 4 | 188.7 | 188.7
Table 2: Parameter values used for the six different model formulations to create thirty synthetic datasets for each generating model.

To address our three objectives, we performed the simulation study as follows: for each of the six models, thirty different synthetic datasets of length 575 were generated by simulating from the underlying process model and observation model, with parameter values taken from Table 2. Parameter values for each model were chosen so that the systems had similar mean dynamics for each generating model. Each synthetic dataset was fit with each of the six models discussed here. Models were initially fit with the first 365 days of data, and then forecasts were computed for days 366–372, a seven day forecast horizon. The average IGN and CRPS for the seven day forecast horizons were computed using the logs_sample and crps_sample functions from the scoringRules R package (Jordan et al., 2017). We also saved the highest posterior density credible intervals for the estimated precisions. We then re-fit the models with the first 372 days of data, and forecasts were computed for days 373–379. This process was repeated until the synthetic data were exhausted. Overall, each individual synthetic dataset was fit a total of 12 times – once by each model with observation precision known and once by each model with observation precision estimated – for a total of 2,160 simulations.

This simulation study design addresses our three objectives. Our first two objectives were to assess the forecasting performance of each model under model mis-specification, both in the case of estimated observation precision and in the case of fixed observation precision. Each model was fit to each dataset twice: once with observation precision fixed and once with observation precision estimated. The forecasts for each precision scenario were evaluated using the CRPS and IGN scores. By using the same synthetic datasets, we were able to isolate the differences in forecast performance between the precision scenarios and quantify the difference using paired t-tests with a Holm-Bonferroni adjustment (Holm, 1979). With thirty synthetic datasets generated by each model, and each synthetic dataset evaluated over thirty forecast horizons by each of the six models, we were able to get a comprehensive picture of how each model performed under model mis-specification for each precision scenario. Our third objective was to analyze the estimability of the precisions for the models considered in this manuscript. We used differences between the forecast scores and the coverage rates of the precision credible intervals to determine the severity of underlying precision estimation problems. In the absence of estimation problems, we expect the empirical coverage rates for the precision parameters to be close to the nominal coverage rate of 95%. CRPS and IGN are proper scoring rules, so their expected scores are optimized by the true forecast distribution (Gneiting and Raftery, 2007). Thus we expect that, across a large number of simulations and in the absence of estimation problems, the true generating model should score best in terms of CRPS and IGN.
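The Holm-Bonferroni step-down adjustment is simple to state: sort the $p$-values, multiply the $k$-th smallest by $(m - k + 1)$, and enforce monotonicity. A sketch (our own; it matches the adjusted $p$-values returned by R's p.adjust with method = "holm"):

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values (monotone, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # k-th smallest p-value is scaled by (m - k + 1); rank is k - 1
        running = max(running, (m - rank) * pvals[i])
        adj[i] = min(1.0, running)
    return adj
```

A hypothesis is rejected at level $\alpha$ whenever its adjusted $p$-value falls below $\alpha$, which controls the family-wise error rate without the full conservativeness of a plain Bonferroni correction.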

3.5 Application: Leaf Area Index Predictions

We examined the differences in predictive performance between two different LN-SSM formulations by modeling Leaf Area Index (LAI) at the University of Notre Dame Environmental Research Center (UNDE), a National Ecological Observatory Network (NEON) site. LAI is the total surface of leaves per area of ground and is a metric of the photosynthetic capacity of an ecosystem. We obtained estimates of LAI at UNDE from the Oak Ridge National Laboratory Distributed Active Archive Center using their fixed subsets feature (DAAC, 2018). The dataset contains estimates of LAI at four day intervals, computed using reflectance estimates from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite as inputs to a phenological model that predicts LAI (Yang et al., 2006). The two statistical models that we considered were a biased LN-SSM, based on Equations 9 and 14, and a moment matching LN-SSM, based on Equations 15, 16, 17, and 18. Our analysis was designed with two primary questions in mind: 1) can we construct informed out-of-sample predictions of LAI over long horizons (e.g., multiple seasons) by embedding a process-based ecosystem model into a lognormal state space framework? 2) do the moment matching and biased formulations show differences in out-of-sample predictive performance over long horizons, as measured by CRPS and IGN?

To fit the LAI data, we used a reduced version of the DALEC2 process-based ecosystem model (Bloom and Williams, 2015). In DALEC2, LAI is modeled as a function of the carbon stored in foliage. Rather than fitting the full model (a six dimensional dynamical system with 23 process parameters), we only considered the interaction between foliage carbon ($C_f$) and labile carbon ($C_{lab}$). This reduces the model to a two dimensional dynamical system with eight process parameters. The form of the reduced process model is given as:

\begin{align}
&\mathbf{C}^{(t)} = M_t\mathbf{C}^{(t-1)} + p_t, \text{ where} \tag{43}\\
&M_t = \begin{bmatrix}1-\Phi_f^{(t)} & \Phi_o^{(t)}\\ 0 & 1-\Phi_o^{(t)}\end{bmatrix}, \quad \mathbf{C}^{(t-1)} = \begin{bmatrix}C_f^{(t-1)}\\ C_{lab}^{(t-1)}\end{bmatrix}, \quad p_t = \begin{bmatrix}G(\mathbf{D}^{(t)}, c_{lma}, c_{eff})f_f\\ G(\mathbf{D}^{(t)}, c_{lma}, c_{eff})f_{lab}\end{bmatrix},\\
&\text{LAI}^{(t)} = \frac{C_f^{(t)}}{c_{lma}}, \tag{44}\\
&\Phi_f(t, d_f, c_{rf}, c_{lf}) = \sqrt{\frac{2}{\pi}}\cdot\frac{-\log(1-c_{lf})}{c_{rf}}\exp\left(-\left(\sin\left(\frac{t-d_f+\psi_f}{s}\right)\frac{\sqrt{2}s}{c_{rf}}\right)^2\right), \tag{45}\\
&\Phi_o(t, d_o, c_{ro}) = \sqrt{\frac{2}{\pi}}\left(\frac{6.9088}{c_{ro}}\right)\exp\left(-\left(\sin\left(\frac{t-d_o+.6245c_{ro}}{s}\right)\frac{\sqrt{2}s}{c_{ro}}\right)^2\right), \tag{46}
\end{align}

where $s = 365.25/\pi$ and $\psi_f = -\sqrt{W_0\left(\left(2\pi\log(1-c_{lf})^2\right)^{-1}\right)}/\sqrt{2}$, where $W_0$ is the principal branch of the Lambert W function (Lambert, 1758), and $G(\mathbf{D}^{(t)}, c_{lma}, c_{eff})$ is the output of the Aggregated Canopy Model (ACM) for gross photosynthetic production (Williams et al., 1997). $\mathbf{D}^{(t)}$ represents meteorological driver variables for day $t$, which include daily minimum and maximum temperatures (C), daily incoming shortwave radiation (gCm-1), and atmospheric carbon (CO2 ppm). Minimum temperatures, maximum temperatures, and daily shortwave radiation were obtained from the National Ecological Observatory Network (National Ecological Observatory Network, 2020, 2022a, 2022b). We imputed any missing NEON observations using piecewise linear interpolation. We took monthly measurements of atmospheric carbon from the Scripps Project (Keeling et al., 2005), and interpolated daily measurements by assigning the monthly values to each day within the month.
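To make the reduced state update concrete, the following is a direct transcription of Equation 43 (our own sketch; the turnover rates $\Phi_f^{(t)}$, $\Phi_o^{(t)}$ and the ACM output $G$ are passed in as precomputed scalars rather than evaluated from Equations 45-46 and the driver data):

```python
def dalec2_reduced_step(c_f, c_lab, phi_f, phi_o, g, f_f, f_lab):
    """One daily update of the reduced DALEC2 pools (Equation 43).

    c_f, c_lab: foliage and labile carbon at day t-1
    phi_f, phi_o: leaf-fall and labile-release rates for day t
    g: gross photosynthetic production from the ACM for day t
    f_f, f_lab: fractions of production allocated to foliage and labile carbon
    """
    c_f_new = (1.0 - phi_f) * c_f + phi_o * c_lab + g * f_f
    c_lab_new = (1.0 - phi_o) * c_lab + g * f_lab
    return c_f_new, c_lab_new
```

LAI for day $t$ is then c_f_new / c_lma (Equation 44).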

To fit the reduced DALEC2 as a LN-SSM, we embedded the process-based model (Equation 43) as the state component and the LAI equation (Equation 44) as the observation component. To model the variance structure of the process evolution function and observation function, we used a density dependent variance. We chose a density dependent variance structure because we expect more process variation and measurement error for foliage carbon when it is large during the early spring and summer months, and less when it is low in the winter months. We considered two different LN-SSM formulations with density dependent variance structures. The first model, the biased model, embeds the process and observation components in a biased manner using Equation 9 and Equation 14. The second model that we consider embeds the process and observation components in an unbiased manner using a moment matching approach. The process evolution function and the observation function for the moment matching model are given by:

\begin{align}
&\mathbf{C}^{(t)} \mid \mathbf{C}^{(t-1)} \sim \text{MV}\log\mathcal{N}\left(\log\left(M_t\mathbf{C}^{(t-1)} + p_t\right) - \tfrac{1}{2}\log(\mathbf{1}\mathbf{1}^T + \Omega_m)\mathbf{1},\; \log(\mathbf{1}\mathbf{1}^T + \Omega_m)\right),\\
&\text{LAI}^{(i)} \mid \mathbf{C}^{(i)} \sim \log\mathcal{N}\left(\log\left(\frac{C_f^{(i)}}{c_{lma}}\right) - \tfrac{1}{2}\log(1 + \tau_m^{-1}),\; \log(1 + \tau_m^{-1})\right),
\end{align}

where $M_t$, $p_t$, and $\mathbf{C}^{(t)}$ take on the same values from Equation 43, $\mathbf{1}$ represents a $2 \times 1$ vector of ones, $\Omega_m = \text{Diag}(\omega^2_{f,m}, \omega^2_{lab,m})$, and the $\log$ in the process evolution variance represents taking the component-wise logarithm of each element of the matrix/vector.

The process evolution function and observation function for the biased model are given by:

\begin{align}
&\mathbf{C}^{(t)} \mid \mathbf{C}^{(t-1)} \sim \text{MV}\log\mathcal{N}\left(\log\left(M_t\mathbf{C}^{(t-1)} + p_t\right),\; \Omega_b\right),\\
&\text{LAI}^{(i)} \mid \mathbf{C}^{(i)} \sim \log\mathcal{N}\left(\log\left(\frac{C_f^{(i)}}{c_{lma}}\right),\; \tau_b^{-1}\right),
\end{align}

where $\Omega_b = \text{Diag}(\omega^2_{f,b}, \omega^2_{lab,b})$, and the $\log$ once again represents a component-wise logarithm. The subscripts $b$ and $m$ are used to distinguish between parameters of the biased and moment matching models, respectively. Additional information on expected values and variances for the two DALEC2 LN-SSM formulations can be found in Table 6 in the Appendix.

The equations above that define our LN-SSM also highlight an interesting difficulty of our analysis: we are fitting a two state dynamical system, but we only have observations for one of the states ($C_f$). Although we are primarily interested in predicting LAI, which is a function of $C_f$, we still need to estimate labile carbon so that we can model its contribution to foliage during the period of leaf regrowth. This also highlights an advantage of our state space modeling approach: the uncertainty from our lack of labile carbon measurements is propagated through time, giving us a more complete picture of the uncertainty in the foliage carbon.

We treated two model parameters as fixed: the leaf mass per area ($c_{lma}$) and the density dependent observation precision parameter ($\tau$). We fixed $c_{lma}$ to avoid identifiability issues with the canopy efficiency parameter ($c_{eff}$), as the two parameters appear exclusively together in the ACM (Williams et al., 1997). For both models, we fixed $c_{lma}$ to a value of 75, based on empirical results from Serbin et al. (2019) that were calibrated specifically at UNDE. We estimated the density dependent observation precision parameter using historical data from MODIS, which reports both the standard deviation and the mean of the LAI estimates. For the moment matching model, we took the median value of the means divided by the standard deviations, which gave a value of $\hat{\tau}_m \approx 4$. For the standard lognormal SSM formulation, we used the form of the variance of the lognormal distribution (Equation 10) to obtain $\hat{\tau}_b \approx 4.18$. For the DALEC2 process parameters, we used uniform prior distributions over the range of acceptable values taken from Bloom and Williams (2015). For the density dependent process variance components, we used a $\text{Uniform}(0, 1)$ prior. We chose this because the $\omega$ parameters for the biased and moment matching models have the interpretation that they roughly control the average proportion of process error at each time point. All prior distributions for parameters were identical for the moment matching model and the biased model. A full table detailing prior distributions and fixed parameters can be found in Appendix 6.3.

We fit both models using particle MCMC (pMCMC; Andrieu et al., 2010). We ran each model over a total of 881 days, from September 5th, 2019 to February 2nd, 2022. We used the four day MODIS LAI measurements from September 5th, 2019 to September 6th, 2021 to fit each model, and we used the remaining MODIS LAI measurements from September 7th, 2021 to February 2nd, 2022 to assess out of sample prediction. To assess out of sample predictions, we used the pMCMC samples of the latent states to generate samples from the posterior predictive distribution for the observations. We then used these samples to validate against the out of sample MODIS LAI measurements with CRPS and IGN, computed using the scoringRules package (Jordan et al., 2017) in the R programming language (R Core Team, 2016). By doing this, we are scoring on $Y \mid X$ rather than directly on the observation $Y$, and we acknowledge that the LAI observations used for validation have measurement error (Ferro, 2017; Bessac and Naveau, 2021). We implemented pMCMC using the R package pomp (King et al., 2016). We ran each model for a total of 100,000 iterations, with a burn-in of 50,000 iterations, 500 particles, and an adaptive multivariate normal proposal distribution (Andrieu and Thoms, 2008; Rosenthal, 2009) that began using a scaled empirical covariance matrix after 1,000 samples were accepted. To obtain initial parameter estimates in a region of high posterior density, we used Gaussian process surrogate model optimization (Gramacy, 2020) with TuRBO (Eriksson et al., 2019), using the particle filter marginal log-likelihood as the objective function.
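As an illustration of the validation step (our own sketch, not the pomp implementation), posterior predictive LAI draws can be generated by pushing each pMCMC draw of foliage carbon through the moment matched observation model; cf_draws is a hypothetical container of posterior draws of $C_f^{(i)}$:

```python
import math
import random

def lai_predictive_samples(cf_draws, c_lma, tau_m, rng=random):
    """Posterior predictive LAI draws under the moment matched observation
    model: lognormal with mean C_f / c_lma and precision parameter tau_m."""
    s2 = math.log(1.0 + 1.0 / tau_m)          # log-scale variance
    sigma = math.sqrt(s2)
    draws = []
    for cf in cf_draws:
        mu = math.log(cf / c_lma) - 0.5 * s2  # moment matched log-scale mean
        draws.append(rng.lognormvariate(mu, sigma))
    return draws
```

These predictive draws are exactly what sample-based scoring functions such as crps_sample consume when scoring against the held-out MODIS LAI measurements.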

LAI is a key quantity for forecasting changes in carbon stored in different components of a terrestrial ecosystem under different climate projections. For our analysis, we chose to predict over multiple seasons, in an attempt to emulate a forecasting scenario of LAI response to predicted changes in climate, such as a particularly hot or cold winter. Our analysis also serves as an efficacy test for predictive modeling of LAI using a state space framework. While much work has been done on LAI prediction using ecosystem process-based models (see Mahowald et al., 2016; Ercanli et al., 2018, for examples), to our knowledge little work has been done on predicting LAI by embedding a mechanistic process-based ecosystem model as the process component of a statistical state space model.

4 Results

4.1 Simulation Study

The first objective of our simulation study was to assess the forecasting performance of each model under model mis-specification when the observation precision is estimated. We expected that the true generating model would perform best on average for its thirty synthetic datasets, using the average IGN and CRPS scores over each seven day forecast horizon. For the case where both the process precision, $\phi$, and the observation precision, $\tau$, were estimated, the Gompertz model (Gomp in Table 3) had the best forecasting performance among the density dependent models (MR, Gomp, LMRD, LGD) for both CRPS and IGN. The unbiased moment matching analog of the Gompertz, LGD, also performed strongly, scoring second best for IGN on all four density dependent generating models, and second best for CRPS on three of the four. For the constant variance models (LMRC, LGC), the constant variance Moran-Ricker model (LMRC) had the best forecasting performance for both CRPS and IGN.

Generating Model
Model | MR | Gomp | LMRC | LGC | LMRD | LGD
Average CRPS
MR | .8587 | .8027 | .6496 | .6362 | .8752 | .8970
Gomp | .8344 | .7773 | .5997 | .5952 | .8279 | .8500
LMRC | .8356 | .7810 | .5913 | .5889 | .8411 | .8672
LGC | .8375 | .7809 | .5927 | .5900 | .8440 | .8672
LMRD | .8380 | .7872 | .6181 | .6100 | .8357 | .8585
LGD | .8357 | .7789 | .6080 | .6016 | .8310 | .8517
Average IGN Score
MR | 1.816 | 1.744 | 1.595 | 1.557 | 1.806 | 1.830
Gomp | 1.785 | 1.710 | 1.514 | 1.494 | 1.751 | 1.774
LMRC | 1.812 | 1.757 | 1.456 | 1.452 | 1.858 | 1.915
LGC | 1.806 | 1.750 | 1.458 | 1.454 | 1.840 | 1.882
LMRD | 1.793 | 1.723 | 1.537 | 1.507 | 1.762 | 1.790
LGD | 1.792 | 1.714 | 1.530 | 1.500 | 1.755 | 1.778
Table 3: Average CRPS and IGN scores for the simulations where both $\tau$ and $\phi$ are estimated. Columns represent the generating model for the synthetic datasets and rows represent the models used to fit the datasets. Scores are averaged over thirty different synthetic datasets and thirty different 7 day forecast horizons for each combination of generating model and fitted model. Bolded entries represent the lowest score for a given generating model, and italicized entries represent the second lowest score.

The second objective of our simulation study was to assess the forecasting performance of each model under model mis-specification when observation precision was fixed. For the simulations where the observation precision, τ\tau, was fixed and the process precision, ϕ\phi, was estimated, we found higher consistency between our scoring rules and the generating model. For the simulations that estimated both precision parameters (Table 3), the Gomp and LGD models consistently outperformed the other density dependent variance models for both scoring rules. In contrast, the simulations where τ\tau was fixed and ϕ\phi was estimated (Table 4) showed a more equal representation among the density dependent models. For the four density dependent generating models, the Moran-Ricker and its unbiased moment matching counterpart scored the best for CRPS, with each model producing the lowest score twice. When measuring performance with IGN, the Moran-Ricker and Gompertz models each scored lowest twice. The density dependent models all scored similarly, agnostic to the choice of true generating model, both for CRPS (max difference .0052\approx.0052) and for IGN (max difference .006\approx.006).

Generating Model
Model | MR | Gomp | LMRC | LGC | LMRD | LGD
Average CRPS
MR | .8332 | .7770 | .5968 | .5945 | .8230 | .8492
Gomp | .8358 | .7774 | .6000 | .5954 | .8276 | .8508
LMRC | .8357 | .7813 | .5911 | .5891 | .8337 | .8619
LGC | .8362 | .7805 | .5929 | .5897 | .8349 | .8603
LMRD | .8332 | .7766 | .5971 | .5937 | .8233 | .8495
LGD | .8359 | .7777 | .5999 | .5959 | .8282 | .8510
Average IGN Score
MR | 1.785 | 1.711 | 1.502 | 1.487 | 1.748 | 1.775
Gomp | 1.788 | 1.710 | 1.515 | 1.494 | 1.752 | 1.774
LMRC | 1.819 | 1.764 | 1.455 | 1.453 | 1.848 | 1.905
LGC | 1.813 | 1.750 | 1.458 | 1.453 | 1.834 | 1.887
LMRD | 1.786 | 1.712 | 1.505 | 1.486 | 1.752 | 1.780
LGD | 1.788 | 1.710 | 1.513 | 1.494 | 1.752 | 1.776
Table 4: Average CRPS and IGN scores for the simulations where $\phi$ is estimated and $\tau$ is known. Columns represent the generating model for the synthetic datasets and rows represent the models used to fit the datasets. Scores are averaged over thirty different synthetic datasets and thirty different 7 day forecast horizons for each combination of generating model and fitted model. Bolded entries represent the lowest score for a given generating model, and italicized entries represent the second lowest score.

Our third objective was to analyze the estimability of the precisions for each of the six models. Our first investigation was to quantify the differences in IGN and CRPS scores between the simulations where the observation precision, $\tau$, was fixed and the simulations where $\tau$ was estimated. This investigation is also intimately related to the performance of the models under model mis-specification, and therefore provides insight into all three of our objectives. To evaluate the impacts statistically, we used paired t-tests: we took the average IGN and CRPS scores over each seven day forecast horizon for the two scenarios and treated them as the "before" and "after" pairs. We justify this by noting that each precision scenario was fit using identical synthetic datasets, with the only difference being whether $\tau$ was estimated or not. Unsurprisingly, we found that fixing the observation precision helped to improve forecasting performance for nearly all models. The LGC and LMRC models performed significantly better in terms of CRPS when $\tau$ was fixed (LGC: $p = 4.7\text{e-}05$; LMRC: $p = .0032$), but did not perform significantly better in terms of IGN (LGC: $p = .68$; LMRC: $p = .36$). The Moran-Ricker, LMRD, and LGD models all had statistically significant decreases in IGN (MR: $p < 2\text{e-}16$; LMRD: $p = 2.1\text{e-}09$; LGD: $p = 2.8\text{e-}05$) as well as CRPS (MR: $p < 2\text{e-}16$; LMRD: $p < 2\text{e-}16$; LGD: $p = 1.4\text{e-}05$) when $\tau$ was fixed. Fixing the observation precision did not significantly impact the performance of the Gompertz model for IGN ($p = .84$) or CRPS ($p = .82$).

Our second investigation into the estimability of the precision parameters used the empirical coverage rates of the highest posterior density (HPD) credible intervals. The empirical coverage of the HPD intervals for ϕ\phi increased for every model in the simulations where τ\tau was fixed, and all six models attained empirical coverage close to the nominal rate. For the simulations where both ϕ\phi and τ\tau were estimated, Table 5 shows that the Gompertz and LGD models produced the best empirical coverage for the precision parameters. This is likely related to their excellent performance in these simulations, where other models struggled to consistently produce precision estimates whose 95% HPD intervals contained the ground truth. The empirical undercoverage of the Moran-Ricker (MR) model for both ϕ\phi and τ\tau may also explain its poor performance in the forecast results in Table 3, where it came in last place for CRPS for all six generating models and last place for IGN for three of the six generating models, including the case where it was itself the true generating model.

Generating Model
Coverage MR Gomp LMRC LGC LMRD LGD
ϕ\phi Cvg 72.3% 89.1% 89.2% 86.9% 85.8% 89.0%
τ\tau Cvg 74.7% 96.4% 92.1% 91.8% 91.7% 95.3%
ϕ\phi Cvg, τ\tau fixed 95.2% 94.8% 94.1% 93.6% 94.2% 94.9%
Table 5: Average empirical HPD credible interval coverage for the precisions of each generating model, under the scenarios where both ϕ\phi and τ\tau are estimated and when τ\tau is fixed. Coverage rates are averaged over all thirty synthetic datasets and all thirty forecast horizons, for a total of 900 samples.
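Empirical HPD coverage of the kind reported in Table 5 can be computed by extracting the shortest interval containing 95% of the posterior draws and counting how often it contains the ground truth. The sketch below uses a toy, well-calibrated Gaussian posterior rather than our actual pMCMC output; `hpd_interval` is a minimal helper, not the implementation used in the paper.

```python
import random

def hpd_interval(samples, prob=0.95):
    """Shortest interval containing `prob` of the posterior draws."""
    s = sorted(samples)
    n = len(s)
    k = int(prob * n)
    # Slide a window of k draws and keep the narrowest one.
    lo_idx = min(range(n - k), key=lambda i: s[i + k] - s[i])
    return s[lo_idx], s[lo_idx + k]

rng = random.Random(42)
truth = 1.0
reps, hits = 300, 0
for _ in range(reps):
    # Calibrated toy posterior: centered at a noisy estimate of the truth,
    # with posterior sd equal to the sampling sd.
    center = rng.gauss(truth, 0.2)
    post = [rng.gauss(center, 0.2) for _ in range(2000)]
    lo, hi = hpd_interval(post)
    hits += lo <= truth <= hi
coverage = hits / reps  # should land near the nominal 0.95
```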

4.2 Leaf Area Index Predictions at UNDE

We found that both the moment matching LN-SSM and the biased LN-SSM produced predictive distributions that captured the dynamics of the in sample and out of sample LAI observations. Both models showed similar fits to the in sample LAI observations in terms of both medians (Figure 3) and 90% highest posterior density intervals (Figure 3). This was not surprising, as both models used an identical process model and prior distributions, differing only slightly in the formulation of the process evolution and observation functions. For the out of sample LAI predictive distributions, the models behaved differently. The biased model (Figure 3, top panel) had lower variance at the start of the predictive horizon, which then began to tail off at the end of the horizon. The moment matching model (Figure 3, bottom panel) had larger predictive variance at first, which then leveled off and accumulated only slowly at the end of the horizon. The median predicted values for the moment matching model were slightly larger than those for the biased model over the entire horizon. This is an interesting result: for identical process parameter values, the median of the moment matching model should be strictly smaller than the median of the biased model. This tells us that there is some mismatch between the parameters estimated for the biased model and those estimated for the moment matching model. We must be cautious about this, especially since we are using parameters that have physical interpretations and modeling the evolution by embedding an ecosystem model that was designed to be deterministic.

Figure 3: Model fits for the biased model (top panel) and moment matching model (bottom panel). Medians (solid line) and 90% highest posterior density intervals (dashed lines) were computed using 50,000 post burn-in samples of the latent states generated by the pMCMC. Blue triangles denote LAI measurements that were used to train the model, and red triangles denote LAI measurements that were not seen by the model and used only for validation purposes. The vertical black line represents the time at which the model began predicting out of sample.

We also found differences in performance between the two models as measured by IGN and CRPS. By mean IGN across the 32 out of sample LAI measurements, the two models performed similarly, with the moment matching model slightly better (mean IGNmm=.3566{\text{IGN}}_{mm}=.3566; mean IGNbiased=.3850{\text{IGN}}_{biased}=.3850). The moment matching model had marginally better IGN scores for the observations where neither model performed well (e.g. out of sample measurements two, three, and four; Figure 4, top panel), and the models had similar IGN scores otherwise. By mean CRPS across the 32 out of sample LAI measurements, the moment matching model performed far better (mean CRPSmm=.2389{\text{CRPS}}_{mm}=.2389; mean CRPSbiased=179.93{\text{CRPS}}_{biased}=179.93). Toward the end of the prediction horizon the CRPS for the biased model increases rapidly, while the CRPS for the moment matching model stays comparatively small. We believe this happens because of an accumulation of bias in the biased model combined with the heavy tails of the lognormal distribution.
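For readers unfamiliar with the two scores, both can be estimated directly from Monte Carlo draws of a predictive distribution. The sketch below uses the standard sample-based CRPS estimator and, for IGN, a Gaussian approximation to the predictive density (a common choice for MCMC output); the draws are synthetic, not our fitted predictive distributions.

```python
import math, random
from statistics import mean, stdev

def crps_sample(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 E|X - X'|."""
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / n ** 2
    return term1 - 0.5 * term2

def ign_gaussian(samples, y):
    """Ignorance score -log f(y) under a Gaussian fit to the draws."""
    m, s = mean(samples), stdev(samples)
    return 0.5 * math.log(2 * math.pi * s ** 2) + (y - m) ** 2 / (2 * s ** 2)

random.seed(1)
draws = [random.lognormvariate(0.0, 0.25) for _ in range(500)]
# Both scores are negatively oriented: observations far from the bulk of
# the predictive distribution are penalized more heavily.
near, far = crps_sample(draws, 1.0), crps_sample(draws, 5.0)
```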

Figure 4: Visualizations of the IGN scores (top panel) and the log CRPS (bottom panel). The dashed black line denotes the moment matching model, and the dashed and dotted red line denotes the biased model. Triangles along these lines represent points where there were out of sample LAI measurements for validation. For visualization purposes, we used a linear interpolation between measurements.

5 Discussion

Although the Gaussian distribution is a common choice in modeling applications, many ecological processes have strict lower bounds that Gaussian models cannot respect. This mismatch becomes especially problematic in forecasting applications, where uncertainty grows as the forecast horizon increases. However, embedding assumptions about the evolution of the latent process and its variance dynamics into non-Gaussian SSM frameworks can be challenging. To remedy this, we proposed a method for embedding non-negative process and observation models with arbitrary variance structures into lognormal Bayesian SSMs using moment matching (LNM3). The primary advantage of our method is flexibility: it allows practitioners to construct lognormal process and measurement components that are unbiased in their mean evolution and observations, have flexible variances that can change through time, and have a closed-form Markov transition density that allows models to be fit with MCMC software such as JAGS (Plummer, 2003).
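As a concrete sketch of the embedding, the step below draws one latent-state transition from a lognormal whose first two moments match an arbitrary positive mean function and a chosen variance. The Gompertz-style mean map and the density-dependent variance choice `m / phi` are illustrative assumptions, not the exact specification of any single model in the paper.

```python
import math, random

def lognormal_mm_params(m, v):
    """Lognormal(mu*, sigma*^2) parameters with mean m > 0, variance v > 0."""
    mu_star = math.log(m ** 2 / math.sqrt(m ** 2 + v))
    sigma2_star = math.log(1.0 + v / m ** 2)
    return mu_star, sigma2_star

def mm_transition(x_prev, phi, mean_fn):
    """One draw of X_t | X_{t-1} = x_prev with E = mean_fn(x_prev) and
    (density-dependent, as one illustrative choice) V = mean_fn(x_prev) / phi."""
    m = mean_fn(x_prev)
    mu_star, sigma2_star = lognormal_mm_params(m, m / phi)
    return random.lognormvariate(mu_star, math.sqrt(sigma2_star))

# Illustrative Gompertz-type mean evolution on the positive reals.
def gompertz_mean(x, a=1.0, b=-0.2):
    return x * math.exp(a + b * math.log(x))

random.seed(3)
x, path = 2.0, []
for _ in range(10):
    x = mm_transition(x, phi=50.0, mean_fn=gompertz_mean)
    path.append(x)  # positivity is preserved at every step
```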

We used a computationally intensive Monte Carlo approach to assess the forecasting performance of the six models discussed here, using a total of 180 synthetic datasets that were fit twelve times each: once by each model with the observation precision estimated, and once by each model with the observation precision fixed. We found that the forecasting performance of our models under mis-specification depended heavily on whether the observation precision was fixed, as well as on the evaluation metric: CRPS or IGN. With the observation precision estimated, the Gompertz model had the best average CRPS and IGN scores across the synthetic datasets for four of the six generating models. With the observation precisions fixed at their true values, every model except the Gompertz model showed a significant improvement in forecast performance as measured by average CRPS or average IGN. In these simulations, no single model dominated the others on either metric. The Gompertz model outperforming the true generating models highlights one of the difficulties of using proper scoring rules to evaluate forecast performance, especially when forecasting performance guides model choice: although the CRPS and IGN scores should favor the true generating model in expectation, we are unlikely to have access to the ground truth parameter values and must instead rely on estimates of the parameter values from our MCMC.

Even simple linear Gaussian SSMs can be prone to estimation problems, especially for parameters that govern the variance structure of the process and observations (Auger-Méthé et al.,, 2016). We found that our models were no exception to this: when both the process precision and observation precision were estimated, no model came within 5% of the nominal 95% coverage rate of the process precision HPDs. We also found that estimation of the precision parameters was closely related to forecasting performance. For the simulation where both observation and process precision parameters were estimated, the two models with the best performance (Gompertz and LGD) were also the models that had empirical coverage rates closest to 95% for the precision parameters. Similarly, the Moran-Ricker model was 20% below the nominal coverage rate for both the observation and process precision parameters, and had the worst average CRPS scores for every generating model, including itself. For the simulation studies where the observation precision was fixed, we found that coverage rates for each of the six models were close to the nominal 95% coverage rate, with the empirical coverage rates ranging from 93.6% to 95.2%. This supports the findings from Auger-Méthé et al., (2016), who show that fixing the measurement error in linear Gaussian SSMs can help to alleviate estimation problems.

We tested the efficacy of our method on a challenging problem by embedding a two-dimensional process-based ecosystem model into a LN-SSM and using it to predict Leaf Area Index (LAI). Overall, we found that both models reconstructed latent states that agreed well with the measurements while capturing the dynamics of the out of sample measurements used for validation. Both models showed similar fits for the in sample LAI measurements, but differed in out of sample predictive performance. The moment matching model, developed using the methodology presented here, performed better for the out of sample LAI when assessed by both IGN scores and CRPS. This lends credence to the idea that a flexible mechanism for adjusting the process evolution, observation function, and variance structure, even when adjusted only slightly as we did here, can help to better capture out of sample dynamics and improve predictive performance. Toward the end of the predictive horizon, the biased model begins to perform very poorly as measured by CRPS, while the moment matching model continues to perform well. This suggests that the moment matching framework may be better suited to applications with long predictive horizons, such as multi-year projections of LAI under different climate scenarios.

Though we saw better performance using our moment matching technique, the analysis here serves mainly as a proof-of-concept for state space modeling of LAI, and there is much room for improvement in future work. The largest improvement would be to integrate additional data streams and to include forecasted weather drivers. For example, including data on litterfall accumulation from NEON would help to further constrain the process parameters and process variance and could improve out of sample predictive performance, while including forecasted weather drivers would propagate an additional source of uncertainty into our model predictions. Similarly, future work could consider additional methods for constraining parameters in the absence of direct observations, such as the ecological data constraints described in Bloom and Williams, (2015).

Though the methods presented here use the lognormal distribution to represent stochasticity, the moment matching approach broadly applies to other distributions as well, and provides opportunities for future directions. For example, the gamma distribution has been considered for state space modeling (Smith and Miller,, 1986) and stochastic differential equation modeling (Dennis and Patil,, 1984) applications, has non-negative support, can be parameterized in terms of its mean and variance using a moment matching approach, and has lighter tail behavior than the lognormal distribution. The beta distribution, which is beginning to be used in SSMs (see Osthus et al.,, 2017; Deo and Grover,, 2021, for examples), has a useful support for modeling proportions and can also be parameterized in terms of its mean and variance to allow for moment matching approaches.
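To illustrate how the same idea carries over to these distributions, the sketch below parameterizes gamma and beta distributions by their mean and variance; these are the standard method-of-moments identities, not code from the paper.

```python
import random
from statistics import mean

def gamma_mm(m, v):
    """Gamma(shape, rate) with mean m > 0 and variance v > 0."""
    return m * m / v, m / v  # shape, rate

def beta_mm(m, v):
    """Beta(alpha, beta) with mean m in (0,1) and variance v < m(1-m)."""
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

random.seed(7)
shape, rate = gamma_mm(3.0, 1.5)  # shape = 6, rate = 2
# random.gammavariate takes a *scale* parameter, hence 1/rate.
draws = [random.gammavariate(shape, 1.0 / rate) for _ in range(50000)]
```

A matched beta, e.g. `beta_mm(0.4, 0.01)`, can be sampled the same way with `random.betavariate`.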

In conclusion, to address biological non-realism in models of physical systems, we proposed a novel lognormal state space modeling framework that preserves positivity of the latent process and observations. The methods presented here allow practitioners to embed complex process models and error dynamics into state space models while ensuring that the forecasts they get out of the model agree with the constraints of the system. The flexibility of the moment matching method for representing complex systems along with the variance partitioning of the state space model provide a coherent statistical framework for forecasting, in terms of biophysical realism, forecast assessment, and uncertainty quantification.

References

  • Andrieu et al., (2010) Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle markov chain monte carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342.
  • Andrieu and Thoms, (2008) Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive mcmc. Statistics and Computing, 18:343–373.
  • Auger-Méthé et al., (2016) Auger-Méthé, M., Field, C., Albertsen, C., Derocher, A., Lewis, M., Jonsen, I., and Flemming, J. (2016). State-space models’ dirty little secrets: Even simple linear gaussian models can have estimation problems. Scientific Reports, 6:266–277.
  • Auger-Méthé et al., (2021) Auger-Méthé, M., Newman, K., Cole, D., Empacher, F., Gryba, R., King, A. A., Leos-Barajas, V., Mills Flemming, J., Nielsen, A., Petris, G., and Thomas, L. (2021). A guide to state–space modeling of ecological time series. Ecological Monographs, 91(4):e01470.
  • Bessac and Naveau, (2021) Bessac, J. and Naveau, P. (2021). Forecast score distributions with imperfect observations. Advances in Statistical Climatology, Meteorology and Oceanography, 7(2):53–71.
  • Bloom and Williams, (2015) Bloom, A. and Williams, M. (2015). Constraining ecosystem carbon dynamics in a data-limited world: integrating ecological ”common sense” in a model–data fusion framework. Biogeosciences, 12(5):1299–1315.
  • Box, (1980) Box, G. E. P. (1980). Sampling and bayes’ inference in scientific modelling and robustness. Journal of the Royal Statistical Society. Series A (General), 143(4):383–430.
  • Buck-Sorlin, (2013) Buck-Sorlin, G. (2013). Process-based model. In Dubitzky, W., Wolkenhauer, O., Cho, K.-H., and Yokota, H., editors, Encyclopedia of Systems Biology, pages 1755–1755. Springer New York, New York, NY.
  • Buckland et al., (2004) Buckland, S., Newman, K., Thomas, L., and Koesters, N. (2004). State-space models for the dynamics of wild animal populations. Ecological Modelling, 171(1):157–175.
  • Cappe et al., (2007) Cappe, O., Godsill, S. J., and Moulines, E. (2007). An overview of existing methods and recent advances in sequential monte carlo. Proceedings of the IEEE, 95(5):899–924.
  • Carter and Kohn, (1994) Carter, C. K. and Kohn, R. (1994). On Gibbs sampling for state space models. Biometrika, 81(3):541–553.
  • DAAC, (2018) DAAC, O. (2018). Fixed sites subsetting and visualization tool. ORNL DAAC, Oak Ridge, Tennessee, USA. Accessed March 23, 2022. Subset obtained for the MCD15A3H product at site id us_wisconsin_neon_unde.
  • Dennis and Patil, (1984) Dennis, B. and Patil, G. (1984). The gamma distribution and weighted multimodal gamma distributions as models of population abundance. Mathematical Biosciences, 68(2):187–212.
  • Dennis and Patil, (1988) Dennis, B. and Patil, G. P. (1988). Applications in ecology. In Crow, E. L. and Shimizu, K., editors, Lognormal Distributions, chapter 12, pages 303–330. Taylor and Francis, Routledge.
  • Dennis et al., (2006) Dennis, B., Ponciano, J. M., Lele, S. R., Taper, M. L., and Staples, D. F. (2006). Estimating density dependence, process noise, and observation error. Ecological Monographs, 76(3):323–341.
  • Deo and Grover, (2021) Deo, V. and Grover, G. (2021). A new extension of state-space sir model to account for underreporting – an application to the covid-19 transmission in california and florida. Results in Physics, 24:104182.
  • Dietze et al., (2018) Dietze, M. C., Fox, A., Beck-Johnson, L. M., Betancourt, J. L., Hooten, M. B., Jarnevich, C. S., Keitt, T. H., Kenney, M. A., Laney, C. M., Larsen, L. G., Loescher, H. W., Lunch, C. K., Pijanowski, B. C., Randerson, J. T., Read, E. K., Tredennick, A. T., Vargas, R., Weathers, K. C., and White, E. P. (2018). Iterative near-term ecological forecasting: Needs, opportunities, and challenges. Proceedings of the National Academy of Sciences, 115(7):1424–1432.
  • Doucet and Johansen, (2011) Doucet, A. and Johansen, A. (2011). A tutorial on particle filtering and smoothing: Fifteen years later. In Crisan, D. and Rozovskii, B., editors, The Oxford handbook of nonlinear filtering, pages 656–705. Oxford University Press, Oxford ; N.Y.
  • Dowd and Meyer, (2003) Dowd, M. and Meyer, R. (2003). A bayesian approach to the ecosystem inverse problem. Ecological Modelling, 168(1):39 – 55.
  • Durbin and Koopman, (2012) Durbin, J. and Koopman, S. (2012). Time Series analysis by state space methods. Number Second Edition. Oxford University press.
  • Ercanli et al., (2018) Ercanli, I., Günlü, A., Şenyurt, M., and Keleş, S. (2018). Artificial neural network models predicting the leaf area index: a case study in pure even-aged crimean pine forests from turkey. Forest Ecosystems, 5.
  • Eriksson et al., (2019) Eriksson, D., Pearce, M., Gardner, J., Turner, R. D., and Poloczek, M. (2019). Scalable global optimization via local bayesian optimization. Advances in neural information processing systems, 32.
  • Ferro, (2017) Ferro, C. A. T. (2017). Measuring forecast performance in the presence of observation error. Quarterly Journal of the Royal Meteorological Society, 143(708):2665–2676.
  • Gelman, (2006) Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1(3):515 – 534.
  • Geman and Geman, (1984) Geman, S. and Geman, D. (1984). Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721–741.
  • Gneiting and Raftery, (2007) Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378.
  • Gompertz, (1825) Gompertz, B. (1825). Xxiv. on the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. in a letter to francis baily, esq. f. r. s. Philosophical Transactions of the Royal Society of London, 115:513–583.
  • Good, (1952) Good, I. J. (1952). Rational decisions. Journal of the Royal Statistical Society. Series B (Methodological), 14(1):107–114.
  • Gramacy, (2020) Gramacy, R. B. (2020). Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Chapman Hall/CRC, Boca Raton, Florida. http://bobby.gramacy.com/surrogates/.
  • Hamilton, (1994) Hamilton, J. (1994). Time series analysis. Princeton Univ. Press, Princeton, NJ.
  • Holm, (1979) Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65–70.
  • Jeffreys, (1946) Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 186(1007):453–461.
  • Jiang et al., (2018) Jiang, J., Huang, Y., Ma, S., Stacy, M., Shi, Z., Ricciuto, D. M., Hanson, P. J., and Luo, Y. (2018). Forecasting responses of a northern peatland carbon cycle to elevated co2 and a gradient of experimental warming. Journal of Geophysical Research: Biogeosciences, 123(3):1057–1071.
  • Jordan et al., (2017) Jordan, A., Krueger, F., and Lerch, S. (2017). Evaluating probabilistic forecasts with the r package scoringrules. Journal of Statistical Software, 90:1–37.
  • Julier and Uhlmann, (1997) Julier, S. and Uhlmann, J. (1997). New extension of the kalman filter to nonlinear systems. In Defense, Security, and Sensing.
  • Kalman, (1960) Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME–Journal of Basic Engineering, 82(Series D):35–45.
  • Keeling et al., (2005) Keeling, C. D., Piper, S. C., Bacastow, R. B., Wahlen, M., Whorf, T. P., Heimann, M., and Meijer, H. A. (2005). Atmospheric CO2 and 13CO2 Exchange with the Terrestrial Biosphere and Oceans from 1978 to 2000: Observations and Carbon Cycle Implications, pages 83–113. Springer New York, New York, NY.
  • King et al., (2016) King, A. A., Nguyen, D., and Ionides, E. L. (2016). Statistical inference for partially observed markov processes via the r package pomp. Journal of Statistical Software, 69(12):1–43.
  • Knape et al., (2011) Knape, J., Jonzén, N., and Sköld, M. (2011). On observation distributions for state space models of population survey data. The Journal of animal ecology, 80:1269–77.
  • Krüger et al., (2021) Krüger, F., Lerch, S., Thorarinsdottir, T., and Gneiting, T. (2021). Predictive inference based on markov chain monte carlo output. International Statistical Review, 89(2):274–301.
  • Lambert, (1758) Lambert, J. H. (1758). Observationes variae in mathesin puram. Acta Helvetica, physico- mathematico-anatomico-botanico-medica, pages 128–168.
  • Lewis et al., (2022) Lewis, A. S. L., Woelmer, W. M., Wander, H. L., Howard, D. W., Smith, J. W., McClure, R. P., Lofton, M. E., Hammond, N. W., Corrigan, R. S., Thomas, R. Q., and Carey, C. C. (2022). Increased adoption of best practices in ecological forecasting enables comparisons of forecastability. Ecological Applications, 32(2):e02500.
  • Little, (2012) Little, R. (2012). Calibrated bayes, an alternative inferential paradigm for official statistics. Journal of Official Statistics, 28:309–334.
  • Little, (2006) Little, R. J. (2006). Calibrated bayes. The American Statistician, 60(3):213–223.
  • Luo et al., (2009) Luo, Y., Weng, E., Wu, X., Gao, C., Zhou, X., and Zhang, L. (2009). Parameter identifiability, constraint, and equifinality in data assimilation with ecosystem models. Ecological Applications, 19(3):571–574.
  • Mahowald et al., (2016) Mahowald, N., Lo, F., Zheng, Y., Harrison, L., Funk, C., Lombardozzi, D., and Goodale, C. (2016). Projections of leaf area index in earth system models. Earth System Dynamics, 7(1):211–229.
  • Matheson and Winkler, (1976) Matheson, J. E. and Winkler, R. L. (1976). Scoring rules for continuous probability distributions. Management Science, 22(10):1087–1096.
  • Maunder et al., (2015) Maunder, M. N., Deriso, R. B., and Hanson, C. H. (2015). Use of state-space population dynamics models in hypothesis testing: advantages over simple log-linear regressions for modeling survival, illustrated with application to longfin smelt (spirinchus thaleichthys). Fisheries Research, 164:102–111.
  • Mäntyniemi et al., (2015) Mäntyniemi, S. H. P., Whitlock, R. E., Perälä, T. A., Blomstedt, P. A., Vanhatalo, J. P., Rincón, M. M., Kuparinen, A. K., Pulkkinen, H. P., and Kuikka, O. S. (2015). General state-space population dynamics model for Bayesian stock assessment. ICES Journal of Marine Science, 72(8):2209–2222.
  • National Ecological Observatory Network, (2020) National Ecological Observatory Network (2020). Woody plant vegetation structure, Data Product DP1.10098.001, Provisional data downloaded from http://data.neonscience.org on April 21, 2020.
  • National Ecological Observatory Network , 2022a (NEON) National Ecological Observatory Network (NEON) (2022a). Shortwave radiation (primary pyranometer) (dp1.00022.001).
  • National Ecological Observatory Network , 2022b (NEON) National Ecological Observatory Network (NEON) (2022b). Triple aspirated air temperature (dp1.00003.001).
  • Osthus et al., (2017) Osthus, D., Hickmann, K. S., Caragea, P. C., Higdon, D., and Del Valle, S. Y. (2017). Forecasting seasonal influenza with a state-space sir model. The Annals of Applied Statistics, 11(1).
  • Petris et al., (2009) Petris, G., Petrone, S., and Campagnoli, P. (2009). Dynamic Linear Models with R, volume 38, pages 31–84.
  • Plummer, (2003) Plummer, M. (2003). Jags: A program for analysis of bayesian graphical models using gibbs sampling.
  • Plummer, (2019) Plummer, M. (2019). rjags: Bayesian Graphical Models using MCMC. R package version 4-10.
  • Polson and Scott, (2012) Polson, N. G. and Scott, J. G. (2012). On the Half-Cauchy Prior for a Global Scale Parameter. Bayesian Analysis, 7(4):887 – 902.
  • R Core Team, (2016) R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Ricker, (1954) Ricker, W. E. (1954). Stock and recruitment. Journal of the Fisheries Research Board of Canada, 11(5):559–623.
  • Robert and Casella, (2005) Robert, C. P. and Casella, G. (2005). Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer-Verlag, Berlin, Heidelberg.
  • Rosenthal, (2009) Rosenthal, S. (2009). Optimal proposal distributions and adaptive mcmc. Handbook of Markov Chain Monte Carlo.
  • Roulston and Smith, (2002) Roulston, M. S. and Smith, L. A. (2002). Evaluating probabilistic forecasts using information theory. Monthly Weather Review, 130(6):1653 – 1660.
  • Rubin, (1984) Rubin, D. B. (1984). Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics, 12(4):1151 – 1172.
  • Serbin et al., (2019) Serbin, S. P., Wu, J., Ely, K. S., Kruger, E. L., Townsend, P. A., Meng, R., Wolfe, B. T., Chlus, A., Wang, Z., and Rogers, A. (2019). From the arctic to the tropics: multibiome prediction of leaf mass per area using leaf reflectance. New Phytologist, 224(4):1557–1568.
  • Shumway and Stoffer, (2011) Shumway, R. and Stoffer, D. (2011). Time Series Analysis and Its Applications With R Examples, volume 9.
  • Smith and Miller, (1986) Smith, R. L. and Miller, J. E. (1986). A non-gaussian state space model and application to prediction of records. Journal of the Royal Statistical Society. Series B (Methodological), 48(1):79–88.
  • Thomas et al., (2017) Thomas, Q., Brooks, E., Jersild, A., Ward, E., Wynne, R., Albaugh, T., Dinon-Aldridge, H., Burkhart, H., Domec, J.-C., Fox, T., Gonzalez-Benecke, C., Martin, T., Noormets, A., Sampson, D., and Teskey, R. (2017). Leveraging 35 years of pinus taeda research in the southeastern us to constrain forest carbon cycle predictions: Regional data assimilation using ecosystem experiments. Biogeosciences, 14:3525–3547.
  • West and Harrison, (1997) West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models (2nd Ed.). Springer-Verlag, Berlin, Heidelberg.
  • Williams et al., (1997) Williams, M., Rastetter, E. B., Fernandes, D. N., Goulden, M. L., Shaver, G. R., and Johnson, L. C. (1997). Predicting gross primary productivity in terrestrial ecosystems. Ecological Applications, 7(3):882–894.
  • Yang et al., (2006) Yang, W., Tan, B., Huang, D., Rautiainen, M., Shabanov, N., Wang, Y., Privette, J., Huemmrich, K., Fensholt, R., Sandholt, I., Weiss, M., Ahl, D., Gower, S., Nemani, R., Knyazikhin, Y., and Myneni, R. (2006). Modis leaf area index products: from validation to algorithm improvement. IEEE Transactions on Geoscience and Remote Sensing, 44(7):1885–1898.

6 Appendix

6.1 Lognormal Moment Matching

Suppose that we want to find parameters μ\mu^{*} and σ2\sigma^{*2} for a random variable XLognormal(μ,σ2)X\sim\text{Lognormal}(\mu^{*},\sigma^{*2}) such that 𝔼[X]=μ,𝕍[X]=σ2\mathbb{E}[X]=\mu,\mathbb{V}[X]=\sigma^{2}. The pdf of XX is given by:

\displaystyle f(x\lvert\mu^{*},\sigma^{*})=\frac{1}{x\sigma^{*}\sqrt{2\pi}}\exp\left(-\frac{(\log(x)-\mu^{*})^{2}}{2\sigma^{*2}}\right),\quad\mu^{*}\in\mathbb{R},\ \sigma^{*}>0,\ x\in(0,\infty)

Then, we have:

𝔼[X]=μ=exp(μ+σ22)\displaystyle\mathbb{E}[X]=\mu=\exp\left(\mu^{*}+\frac{\sigma^{*2}}{2}\right)
𝕍[X]=σ2=(exp(σ2)1)exp(2μ+σ2)\displaystyle\mathbb{V}[X]=\sigma^{2}=\left(\exp(\sigma^{*2})-1\right)\exp(2\mu^{*}+\sigma^{*2})

Looking at the first equality, we see that σ2=2(log(μ)μ)\sigma^{*2}=2(\log(\mu)-\mu^{*}). We can substitute this into the second equality to get μ\mu^{*} written in terms of known constants,

σ2\displaystyle\sigma^{2} =(exp(2log(μ)2μ)1)exp(2μ+2log(μ)2μ)\displaystyle=\Big{(}\exp(2\log(\mu)-2\mu^{*})-1\Big{)}\exp(2\mu^{*}+2\log(\mu)-2\mu^{*})
=(exp(2log(μ)2μ)1)exp(2log(μ))\displaystyle=\Big{(}\exp(2\log(\mu)-2\mu^{*})-1\Big{)}\exp(2\log(\mu))
=(μ2exp(2μ)1)μ2\displaystyle=(\mu^{2}\exp(-2\mu^{*})-1)\mu^{2}

Then, rewriting this equation in terms of exp(2μ)\exp(-2\mu^{*}),

exp(2μ)=σ2+μ2μ4\displaystyle\exp(-2\mu^{*})=\frac{\sigma^{2}+\mu^{2}}{\mu^{4}}
2μ=log(σ2+μ2μ4)\displaystyle-2\mu^{*}=\log\Big{(}\frac{\sigma^{2}+\mu^{2}}{\mu^{4}}\Big{)}
μ=12log(σ2+μ2μ4)\displaystyle\mu^{*}=-\frac{1}{2}\log\Big{(}\frac{\sigma^{2}+\mu^{2}}{\mu^{4}}\Big{)}
μ=log(μ2μ2+σ2)\displaystyle\mu^{*}=\log\Big{(}\frac{\mu^{2}}{\sqrt{\mu^{2}+\sigma^{2}}}\Big{)}

Substituting our value for μ\mu^{*} back into the relation σ2=2(log(μ)μ)\sigma^{*2}=2(\log(\mu)-\mu^{*}), we have:

σ2\displaystyle\sigma^{*2} =2(log(μ)μ)\displaystyle=2(\log(\mu)-\mu^{*})
=2(log(μ)log(μ2μ2+σ2))\displaystyle=2\Big{(}\log(\mu)-\log\Big{(}\frac{\mu^{2}}{\sqrt{\mu^{2}+\sigma^{2}}}\Big{)}\Big{)}
=2log(μ2+σ2μ)\displaystyle=2\log\Big{(}\frac{\sqrt{\mu^{2}+\sigma^{2}}}{\mu}\Big{)}
=log(μ2+σ2μ2)\displaystyle=\log\Big{(}\frac{\mu^{2}+\sigma^{2}}{\mu^{2}}\Big{)}
=log(1+σ2μ2)\displaystyle=\log\Big{(}1+\frac{\sigma^{2}}{\mu^{2}}\Big{)}

Thus our desired transform is μ=log(μ2μ2+σ2)\mu^{*}=\log\Big{(}\frac{\mu^{2}}{\sqrt{\mu^{2}+\sigma^{2}}}\Big{)} and σ2=log(1+σ2μ2)\sigma^{*2}=\log\Big{(}1+\frac{\sigma^{2}}{\mu^{2}}\Big{)}.
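The derivation above can be checked numerically by mapping a target (μ, σ²) to (μ*, σ*²) and recovering the original moments from the standard lognormal mean and variance formulas; this is a quick sanity check, not part of the paper's code.

```python
import math

def mm_transform(mu, sigma2):
    """Map a target mean/variance to lognormal parameters (mu*, sigma*^2)."""
    mu_star = math.log(mu ** 2 / math.sqrt(mu ** 2 + sigma2))
    sigma2_star = math.log(1.0 + sigma2 / mu ** 2)
    return mu_star, sigma2_star

def lognormal_moments(mu_star, sigma2_star):
    """Mean and variance of Lognormal(mu*, sigma*^2)."""
    m = math.exp(mu_star + sigma2_star / 2.0)
    v = (math.exp(sigma2_star) - 1.0) * math.exp(2.0 * mu_star + sigma2_star)
    return m, v

# Round-tripping recovers the target moments to floating-point accuracy.
checks = [(2.0, 0.5), (10.0, 3.0), (0.3, 0.04)]
results = [lognormal_moments(*mm_transform(mu, s2)) for mu, s2 in checks]
```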

6.2 Half Cauchy Precision Prior

Suppose that we are interested in using a half-Cauchy distribution on σ2\sigma^{2}, and want to understand the implied prior on ϕ=σ2\phi=\sigma^{-2}.

σ2HalfCauchy(μ=0,a=γ)\displaystyle\sigma^{2}\sim\text{HalfCauchy}(\mu=0,a=\gamma) (47)
π(σ2)=2πγ(1+(σ2γ)2), σ2>0\displaystyle\pi(\sigma^{2})=\frac{2}{\pi\gamma(1+(\frac{\sigma^{2}}{\gamma})^{2})},\text{ }\sigma^{2}>0 (48)

Let ϕ=σ2\phi=\sigma^{-2}. Then, the pdf of ϕ\phi is:

π(ϕ)\displaystyle\pi(\phi) =2πγ(1+(1ϕγ)2)1ϕ2, ϕ>0\displaystyle=\frac{2}{\pi\gamma(1+(\frac{1}{\phi\gamma})^{2})}\frac{1}{\phi^{2}},\text{ }\phi>0 (49)
=2πγ(ϕ2+(1γ)2) ϕ>0\displaystyle=\frac{2}{\pi\gamma(\phi^{2}+(\frac{1}{\gamma})^{2})}\text{ }\phi>0 (50)
=2π1γ(1+(ϕγ)2) ϕ>0\displaystyle=\frac{2}{\pi\frac{1}{\gamma}(1+(\phi\gamma)^{2})}\text{ }\phi>0 (51)

Thus if the prior for σ2\sigma^{2} is σ2HalfCauchy(μ=0,a=γ)\sigma^{2}\sim\text{HalfCauchy}(\mu=0,a=\gamma), then the implied prior on ϕ\phi is ϕHalfCauchy(μ=0,a=γ1)\phi\sim\text{HalfCauchy}(\mu=0,a=\gamma^{-1}).
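The change of variables above can be verified by simulation: draw σ² from a half-Cauchy via its inverse CDF, invert to get ϕ, and compare the empirical distribution of ϕ against the HalfCauchy(0, γ⁻¹) CDF. A quick Monte Carlo sketch (not from the paper's code):

```python
import math, random

def half_cauchy_sample(scale, n, seed=0):
    """Inverse-CDF draws from HalfCauchy(0, scale)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # skip u = 0 so the draws (and 1/draws) are positive
            out.append(scale * math.tan(math.pi * u / 2.0))
    return out

def half_cauchy_cdf(x, scale):
    return (2.0 / math.pi) * math.atan(x / scale)

gamma_ = 2.0
sigma2 = half_cauchy_sample(gamma_, 100_000)
phi = [1.0 / s for s in sigma2]

# Empirical CDF of phi at a few points vs. the implied HalfCauchy(0, 1/gamma).
points = [0.25, 0.5, 2.0]
emp = [sum(p <= x for p in phi) / len(phi) for x in points]
thy = [half_cauchy_cdf(x, 1.0 / gamma_) for x in points]
```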

6.3 Reduced DALEC2 Details

In this section we provide additional details on the moments and prior distributions for the reduced DALEC2 model that we use (Bloom and Williams,, 2015).

Model MM Biased
𝔼[𝐂(t)|𝐂(t1),Θ]\mathbb{E}[\mathbf{C}^{(t)}\lvert\mathbf{C}^{(t-1)},\Theta] Mt𝐂(t1)+ptM_{t}\mathbf{C}^{(t-1)}+p_{t} (Mt𝐂(t1)+pt)exp((2ϕ)1)(M_{t}\mathbf{C}^{(t-1)}+p_{t})\exp((2\phi)^{-1})
𝕍[𝐂(t)|𝐂(t1),Θ]\mathbb{V}[\mathbf{C}^{(t)}\lvert\mathbf{C}^{(t-1)},\Theta] ϕ1(Mt𝐂(t1)+pt)\phi^{-1}(M_{t}\mathbf{C}^{(t-1)}+p_{t}) (exp(ϕ1)1)exp(ϕ1)(Mt𝐂(t1)+pt)2(\exp(\phi^{-1})-1)\exp(\phi^{-1})(M_{t}\mathbf{C}^{(t-1)}+p_{t})^{2}
𝔼[𝐂obs(i)|𝐂(i),Θ]\mathbb{E}[\mathbf{C}^{(i)}_{obs}\lvert\mathbf{C}^{(i)},\Theta] Cf(i)clma1C_{f}^{(i)}c_{lma}^{-1} Cf(i)clma1exp((2τ)1)C_{f}^{(i)}c_{lma}^{-1}\exp((2\tau)^{-1})
𝕍[𝐂obs(i)|𝐂(i),Θ]\mathbb{V}[\mathbf{C}^{(i)}_{obs}\lvert\mathbf{C}^{(i)},\Theta] τ1(Cf(i)clma1)\tau^{-1}(C_{f}^{(i)}c_{lma}^{-1}) (exp(τ1)1)exp(τ1)(Cf(i)clma1)2(\exp(\tau^{-1})-1)\exp(\tau^{-1})(C_{f}^{(i)}c_{lma}^{-1})^{2}
Table 6: Expected values and variances for the process evolution functions and observation functions for the Moment Matching LN-SSM and Biased LN-SSM used to model LAI data at UNDE.
Param. Description Units Prior
flabf_{lab} Proportion of GPP allocated to labile carbon unitless Unif(.01, .5)
fff_{f} Proportion of GPP allocated to foliage carbon unitless Unif(.01, .5)
dod_{o} Start day of leaf regrowth onset unitless Unif(1, 365)
dfd_{f} Start day of leaf fall unitless Unif(1, 365)
ceffc_{eff} Canopy efficiency unitless Unif(10, 100)
clfc_{lf} Proportion of leaves lost annually unitless Unif(.125, 1)
croc_{ro} Length of labile carbon release period day Unif(10, 100)
crfc_{rf} Length of leaf fall period day Unif(20, 150)
ωf\omega_{f} Density dependent variance parameter g.5C.5m1g^{.5}\text{C}^{.5}\text{m}^{-1} Unif(0, 1)
ωlab\omega_{lab} Density dependent variance parameter g.5C.5m1g^{.5}\text{C}^{.5}\text{m}^{-1} Unif(0, 1)
Table 7: Information on the 10 parameters estimated in our reduced DALEC2 model (Bloom and Williams,, 2015). This information includes notation, interpretations of the parameters, units, and the prior distribution used in our Bayesian Lognormal SSM. Upper and lower bounds for process parameters (flabf_{lab} through crfc_{rf}) are taken from the table of upper and lower bounds in Bloom and Williams, (2015).