This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

\newcites

smAdditional References

Nearest-Neighbor Mixture Models for Non-Gaussian Spatial Processes

Xiaotian Zheng, Athanasios Kottas, and Bruno Sansó
Department of Statistics, University of California, Santa Cruz
Abstract

We develop a class of nearest-neighbor mixture models that provide direct, computationally efficient, probabilistic modeling for non-Gaussian geospatial data. The class is defined over a directed acyclic graph, which implies conditional independence in representing a multivariate distribution through factorization into a product of univariate conditionals, and is extended to a full spatial process. We model each conditional as a mixture of spatially varying transition kernels, with locally adaptive weights, for each one of a given number of nearest neighbors. The modeling framework emphasizes the description of non-Gaussian dependence at the data level, in contrast with approaches that introduce a spatial process for transformed data, or for functionals of the data probability distribution. Thus, it facilitates efficient, full simulation-based inference. We study model construction and properties analytically through specification of bivariate distributions that define the local transition kernels, providing a general strategy for modeling general types of non-Gaussian data. Regarding computation, the framework lays out a new approach to handling spatial data sets, leveraging a mixture model structure to avoid computational issues that arise from large matrix operations. We illustrate the methodology using synthetic data examples and an analysis of Mediterranean Sea surface temperature observations.


Keywords: Bayesian hierarchical models; Copulas; Spatial generalized linear mixed model; Spatial statistics; Vecchia approximations

1 Introduction

Gaussian processes have been widely used as an underlying structure in model-based analysis of irregularly located spatial data in order to capture short range variability. The fruitfulness of these spatial models owes to the simple characterization of the Gaussian process by a mean and a covariance function, and the optimal prediction it provides that justifies kriging. However, the assumption of Gaussianity is restrictive in many fields where the data exhibits non-Gaussian features, for example, vegetation abundance (Eskelson et al., 2011), precipitation data (Sun et al., 2015), temperature data (North et al., 2011), and wind speed data (Bevilacqua et al., 2020). This article aims at developing a flexible class of geostatistical models that is customizable to general non-Gaussian distributions, with particular focus on continuous data.

Several approaches have been developed for non-Gaussian geostatistical modeling. A straightforward approach consists of fitting a Gaussian process after transformation of the original data. Possible transformations include Box-Cox (De Oliveira et al., 1997), power (Allcroft and Glasbey, 2003), square-root (Johns et al., 2003), and Tukey g-and-h (Xu and Genton, 2017) transforms, to name a few. An alternative approach is to represent a non-Gaussian distribution as a location-scale mixture of Gaussian distributions. This yields Gaussian process extensions that are able to capture skewness and long tails (Kim and Mallick, 2004; Palacios and Steel, 2006; Zhang and El-Shaarawi, 2010; Mahmoudian, 2017; Morris et al., 2017; Zareifard et al., 2018; Bevilacqua et al., 2021). Beyond methods based on continuous mixtures of Gaussian distributions, Bayesian nonparametric methods have been explored for geostatistical data modeling, starting with the approach in Gelfand et al. (2005) which extends the Dirichlet process (Ferguson, 1973) to a prior model for random spatial surfaces. We refer to Müller et al. (2018) for a review. From a different perspective, Bolin (2014) formulates stochastic partial differential equations driven by non-Gaussian noise, resulting in a class of non-Gaussian Matérn fields.

An alternative popular approach involves a hierarchical model structure that assumes conditionally independent non-Gaussian marginals, combined with a latent spatial process that is associated with some functional or link function of the first-stage marginals. Hereafter, we refer to these models as hierarchical first-stage non-Gaussian models. If the latent process is linked through a function of some parameter(s) of the first-stage marginal which belongs to the exponential dispersion family, the approach is known as the spatial generalized linear mixed model and its extensions (Diggle et al., 1998). Non-Gaussian spatial models that build from copulas (Joe, 2014) can also be classified into this category. Copula models assume pre-specified families of marginals for observations, with a multivariate distribution underlying the copula for a vector of latent variables that are probability integral transformations of the observations (Danaher and Smith, 2011). Spatial copula models replace the multivariate distribution with one that corresponds to a spatial process, thus introducing spatial dependence (Bárdossy, 2006; Ghosh and Mallick, 2011; Krupskii et al., 2018; Beck et al., 2020).

The non-Gaussian modeling framework proposed in this article is distinctly different from the previously mentioned approaches. Our methodology builds on the class of nearest-neighbor processes obtained by extending a joint density for a reference set of locations to the entire spatial domain. The original joint density is factorized into a product of conditionals with respect to a sparse directed acyclic graph (DAG). Deriving each conditional from a Gaussian process results in the nearest-neighbor Gaussian process (Datta et al., 2016a). Models defined over DAGs have received substantial attention; see, e.g., Datta et al. (2016b); Finley et al. (2019); Peruzzi et al. (2020); Peruzzi and Dunson (2022). The class of DAG-based models originates from the Vecchia approximation framework (Vecchia, 1988). Katzfuss and Guinness (2021) provide further generalization. Considerably less attention, however, has been devoted to defining models over a sparse DAG with non-Gaussian distributions for the conditionals of the joint density. This is in general a difficult problem, as each conditional involves, say, a pp-dimensional conditioning set, which requires a coherent model for a (p+1)(p+1)-dimensional non-Gaussian distribution, with pp potentially large. In this article, we take on the challenging task of developing a computationally efficient, interpretable framework that provides generality for modeling different types of non-Gaussian data and flexibility for complex spatial dependence.

We overcome the aforementioned challenge by modeling each conditional of the joint density as a weighted combination of spatially varying transition kernels, each of which depends on a specific neighbor. This approach produces multivariate non-Gaussian distributions by specification of the bivariate distributions that define the local transition kernels. Thus, it provides generality for modeling different non-Gaussian behaviors, since, relative to the multivariate analogue, constructing bivariate distributions is substantially easier, for instance, using bivariate copulas. Moreover, such a model structure offers the convenience of quantifying multivariate dependence through the collection of bivariate distributions. As an illustration, we study tail dependence properties under appropriate families of bivariate distributions, and provide results that guide modeling choices. The modeling framework achieves flexibility by letting both the weights and transition kernels be spatially dependent, inducing sufficient local dependence to describe a wide range of spatial variability. We refer to the resulting geospatial process as the nearest-neighbor mixture process (NNMP).

An important feature of the model structure is that it facilitates the study of conditions for constructing NNMPs with pre-specified families of marginal distributions. Such conditions are easily implemented without parameter constraints, thus resulting in a general modeling tool to describe spatial data distributions that are skewed, heavy-tailed, positive-valued, or have bounded support, as illustrated through several examples in Section 4 and in the supplementary material. The NNMP framework emphasizes direct modeling by introducing spatial dependence at the data level. It avoids the use of transformations that may distort the Gaussian process properties (Wallin and Bolin, 2015). It is fundamentally different from the class of hierarchical first-stage non-Gaussian models that introduce spatial dependence through functionals of the data probability distribution, such as the transformed mean. Regarding computation, NNMP models do not require estimation of potentially large vectors of spatially correlated latent variables, something unavoidable with hierarchical first-stage non-Gaussian models. In fact, approaches for such models typically resort to approximate inference, either directly or combined with a scalable model (Zilber and Katzfuss, 2021). Estimation of NNMPs is analogous to that of a finite mixture model, thus avoiding the need to perform costly matrix operations for large data sets, and allowing for computationally efficient, full simulation-based inference. Overall, the NNMP framework offers a flexible class of models that is able to describe complex spatial dependence, coupled with an efficient computational approach, leveraged from the mixture structure of the model.

The rest of the article is organized as follows. In Section 2, we formulate the NNMP framework and study model properties. Specific examples of NNMP models illustrate different components of the methodology. Section 3 develops the general approach to Bayesian estimation and prediction under NNMP models. In Section 4, we demonstrate different NNMP models with a synthetic data example and with the analysis of Mediterranean Sea surface temperature data, respectively. Finally, Section 5 concludes with a summary and discussion of future work.

2 Nearest-Neighbor Mixture Processes for Spatial Data

2.1 Modeling Framework

Consider a univariate spatial process {Z(𝒗):𝒗𝒟}\{Z(\bm{v}):\bm{v}\in\mathcal{D}\}, where 𝒟p\mathcal{D}\subset\mathbb{R}^{p}, for p1p\geq 1. Let 𝒮={𝒔1,,𝒔n}\operatorname*{\mathcal{S}}=\{\bm{s}_{1},\dots,\bm{s}_{n}\} be a finite collection of locations in 𝒟\mathcal{D}, referred to as the reference set. We write the joint density of the random vector 𝒛𝒮=(Z(𝒔1),,Z(𝒔n))\bm{z}_{\operatorname*{\mathcal{S}}}=(Z(\bm{s}_{1}),\dots,Z(\bm{s}_{n}))^{\top} as p(𝒛𝒮)=p(z(𝒔1))i=2np(z(𝒔i)z(𝒔i1),,z(𝒔1))p(\bm{z}_{\operatorname*{\mathcal{S}}})=p(z(\bm{s}_{1}))\prod_{i=2}^{n}p(z(\bm{s}_{i})\mid z(\bm{s}_{i-1}),\dots,z(\bm{s}_{1})). If we regard the conditioning set of z(𝒔i)z(\bm{s}_{i}) as the parents of z(𝒔i)z(\bm{s}_{i}), the joint density p(𝒛𝑺)p(\bm{z}_{\bm{S}}) is a factorization according to a DAG whose vertices are z(𝒔i)z(\bm{s}_{i}). We obtain a sparse DAG by reducing the conditioning set of z(𝒔i)z(\bm{s}_{i}) to a smaller subset, denoted as 𝒛Ne(𝒔i)\bm{z}_{\text{Ne}(\bm{s}_{i})}, with Ne(𝒔i)𝒮i={𝒔1,,𝒔i1}\text{Ne}(\bm{s}_{i})\subset\operatorname*{\mathcal{S}}_{i}=\{\bm{s}_{1},\dots,\bm{s}_{i-1}\}. We refer to Ne(𝒔i)\text{Ne}(\bm{s}_{i}) as the neighbor set for 𝒔i\bm{s}_{i}, having at most LL elements with LnL\ll n. The resulting density for the sparse DAG is

p~(𝒛𝒮)=p(z(𝒔1))i=2np(z(𝒔i)𝒛Ne(𝒔i)),\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}})=p(z(\bm{s}_{1}))\prod_{i=2}^{n}p(z(\bm{s}_{i})\mid\bm{z}_{\text{Ne}(\bm{s}_{i})}), (1)

which is a proper density (Lauritzen, 1996). Choosing the neighbor sets Ne(𝒔i)\text{Ne}(\bm{s}_{i}) creates different sparse DAGs. There are different ways to select members from 𝒮i\operatorname*{\mathcal{S}}_{i} for Ne(𝒔i)\text{Ne}(\bm{s}_{i}); see, e.g., Vecchia (1988), Stein et al. (2004), and Gramacy and Apley (2015). Our selection is based on the geostatistical distance between 𝒔i\bm{s}_{i} and 𝒔j𝒮i\bm{s}_{j}\in\operatorname*{\mathcal{S}}_{i}. The selected locations 𝒔j\bm{s}_{j} are placed in ascending order according to the distance, denoted as 𝒔(i1),,𝒔(i,iL)\bm{s}_{(i1)},\dots,\bm{s}_{(i,i_{L})}, where iL:=(i1)Li_{L}:=(i-1)\wedge L. We note that the development of the proposed framework holds true for any choice of the neighbor sets.

Constructing a nearest-neighbor process involves specification of the conditional densities p(z(𝒔i)𝒛Ne(𝒔i))p(z(\bm{s}_{i})\mid\bm{z}_{\text{Ne}(\bm{s}_{i})}) in (1), and extension to an arbitrary finite set in 𝒟\mathcal{D} that is not overlapping with 𝒮\operatorname*{\mathcal{S}}. We approach the problem of constructing a nearest-neighbor non-Gaussian process following this idea. A notable difference, however, from the nearest-neighbor Gaussian process approach in Datta et al. (2016a) is that we do not posit a parent process for p(𝒛𝒮)p(\bm{z}_{\operatorname*{\mathcal{S}}}) when deriving p~(𝒛𝒮)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}}). Datta et al. (2016a) assume a Gaussian process for p(𝒛𝒮)p(\bm{z}_{\operatorname*{\mathcal{S}}}), and use it to model the conditional densities p(z(𝒔i)𝒛Ne(𝒔i))p(z(\bm{s}_{i})\mid\bm{z}_{\text{Ne}(\bm{s}_{i})}). Similar ideas underlie the Vecchia approximation framework which considers (1) as an approximation for the density of a Gaussian process realization. We instead utilize the structure in (1) to develop nearest-neighbor models for non-Gaussian spatial processes. To this end, we define the conditional density in the product in (1) as

p(z(𝒔i)𝒛Ne(𝒔i))=l=1iLwl(𝒔i)f𝒔i,l(z(𝒔i)z(𝒔(il))),p(z(\bm{s}_{i})\mid\bm{z}_{\text{Ne}(\bm{s}_{i})})=\sum_{l=1}^{i_{L}}w_{l}(\bm{s}_{i})\,f_{\bm{s}_{i},l}(z(\bm{s}_{i})\mid z(\bm{s}_{(il)})), (2)

where f𝒔i,lf_{\bm{s}_{i},l} is the llth component of the mixture density pp, and the weights satisfy wl(𝒔i)0w_{l}(\bm{s}_{i})\geq 0, for all ll, and l=1iLwl(𝒔i)=1\sum_{l=1}^{i_{L}}w_{l}(\bm{s}_{i})=1, for every 𝒔i𝒮\bm{s}_{i}\in\operatorname*{\mathcal{S}}. In a sparse DAG, nearest neighbors in set Ne(𝒔i)\text{Ne}(\bm{s}_{i}) are nodes that have directed edges pointing to 𝒔i\bm{s}_{i}. Thus, it is appealing to consider a high-order Markov model in which temporal lags have a similar notion of direction. Our approach to formulate (2) is motivated from a class of mixture transition distribution models (Le et al., 1996), which consists of a mixture of first-order transition densities with a vector of common weights. A key feature of the formulation in (2) is the decomposition of a non-Gaussian conditional density, with a potentially large conditioning set, into a weighted sum of local conditional densities. This provides flexible, parsimonious modeling of p(z(𝒔i)𝒛Ne(𝒔i))p(z(\bm{s}_{i})\mid\bm{z}_{\text{Ne}(\bm{s}_{i})}) through specifying bivariate distributions that define the local conditionals f𝒔i,l(z(𝒔i)z(𝒔(il))f_{\bm{s}_{i},l}(z(\bm{s}_{i})\mid z(\bm{s}_{(il)}). We provide further discussion on this feature for model construction and relevant properties in the following sections.

Spatial dependence characterized by (2) is twofold. First, each component f𝒔i,lf_{\bm{s}_{i},l} is associated with spatially varying parameters indexed at 𝒔i𝒮\bm{s}_{i}\in\operatorname*{\mathcal{S}}, defined by a probability model or a link function. Secondly, the weights wl(𝒔i)w_{l}(\bm{s}_{i}) are spatially varying. As each component density f𝒔i,lf_{\bm{s}_{i},l} depends on a specific neighbor, the weights indicate the contribution of each neighbor of 𝒔i\bm{s}_{i}. Besides, the weights adapt to the change of locations. For two different 𝒔i,𝒔j\bm{s}_{i},\bm{s}_{j} in 𝒮\operatorname*{\mathcal{S}}, the relative locations of the nearest neighbors Ne(𝒔i)\text{Ne}(\bm{s}_{i}) to 𝒔i\bm{s}_{i} are different from that of Ne(𝒔j)\text{Ne}(\bm{s}_{j}) to 𝒔j\bm{s}_{j}. If all elements of Ne(𝒔i)\text{Ne}(\bm{s}_{i}) are very close to 𝒔i\bm{s}_{i}, then values of (w1(𝒔i),,wiL(𝒔i))(w_{1}(\bm{s}_{i}),\dots,w_{i_{L}}(\bm{s}_{i}))^{\top} should be quite even. On the other hand, if, for 𝒔j\bm{s}_{j}, only a subset of its neighbors in Ne(𝒔j)\text{Ne}(\bm{s}_{j}) are close to 𝒔j\bm{s}_{j}, then the weights corresponding to this subset should receive larger values. We remark that in general, probability models or link functions for the spatially varying parameters should be considered case by case, given different specifications on the components f𝒔i,lf_{\bm{s}_{i},l}. Details of the construction for the component densities and the weights are deferred to later sections.

We obtain the NNMP, a legitimate spatial process, by extending (2) to an arbitrary set of non-reference locations 𝒰={𝒖1,,𝒖r}\;\operatorname*{\mathcal{U}}=\{\bm{u}_{1},\dots,\bm{u}_{r}\} where 𝒰𝒟𝒮\operatorname*{\mathcal{U}}\subset\mathcal{D}\setminus\operatorname*{\mathcal{S}}. In particular, we define the conditional density of 𝒛𝒰\bm{z}_{\operatorname*{\mathcal{U}}} given 𝒛𝒮\bm{z}_{\operatorname*{\mathcal{S}}} as

p~(𝒛𝒰𝒛𝒮)=i=1rp(z(𝒖i)𝒛Ne(𝒖i))=i=1rl=1Lwl(𝒖i)f𝒖i,l(z(𝒖i)z(𝒖(il))),\tilde{p}(\bm{z}_{\operatorname*{\mathcal{U}}}\mid\bm{z}_{\operatorname*{\mathcal{S}}})=\prod_{i=1}^{r}p(z(\bm{u}_{i})\mid\bm{z}_{\text{Ne}(\bm{u}_{i})})=\prod_{i=1}^{r}\sum_{l=1}^{L}w_{l}(\bm{u}_{i})\,f_{\bm{u}_{i},l}(z(\bm{u}_{i})\mid z(\bm{u}_{(il)})), (3)

where the specification on wl(𝒖i)w_{l}(\bm{u}_{i}) and f𝒖i,lf_{\bm{u}_{i},l} for all ii and all ll is analogous to that for (2), except that Ne(𝒖i)={𝒖(i1),,𝒖(iL)}\text{Ne}(\bm{u}_{i})=\{\bm{u}_{(i1)},\dots,\bm{u}_{(iL)}\} are the first LL locations in 𝒮\operatorname*{\mathcal{S}} that are closest to 𝒖i\bm{u}_{i} in terms of geostatistical distance. Building the construction of the neighbor sets Ne(𝒖i)\text{Ne}(\bm{u}_{i}) on the reference set ensures that p~(𝒛𝒰|𝒛𝒮)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{U}}}\,|\,\bm{z}_{\operatorname*{\mathcal{S}}}) is a proper density.

Given (2) and (3), we can obtain the joint density p~(𝒛𝒱)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{V}}}) of a realization 𝒛𝒱\bm{z}_{\operatorname*{\mathcal{V}}} over any finite set of locations 𝒱𝒟\operatorname*{\mathcal{V}}\subset\mathcal{D}. When 𝒱𝒮\operatorname*{\mathcal{V}}\subset\operatorname*{\mathcal{S}}, the joint density p~(𝒛𝒱)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{V}}}) is directly available as the appropriate marginal of p~(𝒛𝒮)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}}). Otherwise, we have p~(𝒛𝒱)=p~(𝒛𝒰|𝒛𝒮)p~(𝒛𝒮){𝒔i𝒮𝒱}dz(𝒔i)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{V}}})=\int\tilde{p}(\bm{z}_{\operatorname*{\mathcal{U}}}\,|\,\bm{z}_{\operatorname*{\mathcal{S}}})\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}})\prod_{\{\bm{s}_{i}\in\operatorname*{\mathcal{S}}\setminus\operatorname*{\mathcal{V}}\}}dz(\bm{s}_{i}), where 𝒰=𝒱𝒮\operatorname*{\mathcal{U}}=\operatorname*{\mathcal{V}}\setminus\operatorname*{\mathcal{S}}. If 𝒮𝒱\operatorname*{\mathcal{S}}\setminus\operatorname*{\mathcal{V}} is empty, p~(𝒛𝒱)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{V}}}) is simply p~(𝒛𝒰|𝒛𝒮)p~(𝒛𝒮)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{U}}}\,|\,\bm{z}_{\operatorname*{\mathcal{S}}})\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}}). In general, the joint density p~(𝒛𝒱)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{V}}}) of an NNMP is intractable. However, since both p~(𝒛𝒰|𝒛𝒮)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{U}}}\,|\,\bm{z}_{\operatorname*{\mathcal{S}}}) and p~(𝒛𝒮)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}}) are products of mixtures, we can recognize that p~(𝒛𝒱)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{V}}}) is a finite mixture, which suggests flexibility of the model to capture complex non-Gaussian dependence over the domain 𝒟\mathcal{D}. Moreover, we show in Section 2.3 that for some NNMPs, the joint density p~(𝒛𝒱)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{V}}}) has a closed-form expression. In the subsequent development of model properties, we will use the conditional density

p(z(𝒗)𝒛Ne(𝒗))=l=1Lwl(𝒗)f𝒗,l(z(𝒗)z(𝒗(l))),𝒗𝒟,p(z(\bm{v})\mid\bm{z}_{\text{Ne}(\bm{v})})=\sum_{l=1}^{L}w_{l}(\bm{v})\,f_{\bm{v},l}(z(\bm{v})\mid z(\bm{v}_{(l)})),\;\;\bm{v}\in\mathcal{D}, (4)

to characterize an NNMP, where Ne(𝒗)\text{Ne}(\bm{v}) contains the first LL locations that are closest to 𝒗\bm{v}, selected from locations in 𝒮\operatorname*{\mathcal{S}}. These locations in Ne(𝒗)\text{Ne}(\bm{v}) are placed in ascending order according to distance, denoted as 𝒗(1),,𝒗(L)\bm{v}_{(1)},\dots,\bm{v}_{(L)}.

Before closing this section, we note that spatial locations are not naturally ordered. Given a distance function, a different ordering on the locations results in different neighbor sets. Therefore, a different sparse DAG with density p~(𝒛𝒮)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}}) is created accordingly for model inference. For the NNMP models illustrated in the data examples, we found through simulation experiments that there were no discernible differences between the inferences based on p~(𝒛𝒮)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}}), given two different orderings. This observation is coherent with that from the literature that considers nearest-neighbor likelihood approximations. Since the approximation of p~(𝒛𝒮)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}}) to p(𝒛𝒮)p(\bm{z}_{\operatorname*{\mathcal{S}}}) depends on the information borrowed from the neighbors, as outlined in Datta et al. (2016a), the effectiveness is determined by the size of Ne(𝒔i)\text{Ne}(\bm{s}_{i}) rather than the ordering. A further remark is that, the ordering of the reference set 𝒮\operatorname*{\mathcal{S}} is typically reserved for observed data. Thus, the ordering effect lies only in the model estimation based on (2) with realization 𝒛𝒮\bm{z}_{\operatorname*{\mathcal{S}}}. Spatial prediction typically rests on locations outside 𝒮\operatorname*{\mathcal{S}} using (3), where the ordering effect disappears.

2.2 NNMPs with Stationary Marginal Distributions

We develop a sufficient condition to construct NNMPs with general stationary marginal distributions. The key feature of this result is that the condition relies on the bivariate distributions that define the first order transition kernels in (4) without the need to impose restrictions on the parameter space. The supplementary material includes the proof of Proposition 1, as well as of Propositions 2, 3 and 4, and of Corollary 1, formulated later in Sections 2.3 and 2.4.

Proposition 1.

Consider an NNMP for which the component density f𝐯,lf_{\bm{v},l} is specified by the conditional density of U𝐯,lU_{\bm{v},l} given V𝐯,lV_{\bm{v},l}, where the random vector (U𝐯,l,V𝐯,l)(U_{\bm{v},l},V_{\bm{v},l}) follows a bivariate distribution with marginal densities fU𝐯,lf_{U_{\bm{v},l}} and fV𝐯,lf_{V_{\bm{v},l}}, for l=1,,Ll=1,\dots,L. The NNMP has stationary marginal density fZf_{Z} if it satisfies the invariant condition: Z(𝐬1)fZZ(\bm{s}_{1})\sim f_{Z}, 𝐬1𝒮\bm{s}_{1}\in\operatorname*{\mathcal{S}}, and for every 𝐯𝒟\bm{v}\in\mathcal{D}, fZ(z)=f_{Z}(z)= fU𝐯,l(z)=f_{U_{\bm{v},l}}(z)= fV𝐯,l(z)f_{V_{\bm{v},l}}(z), for all zz and for all ll.

This result builds from the one in Zheng et al. (2022) where mixture transition distribution models with stationary marginal distributions were constructed. It applies regardless of Z(𝒗)Z(\bm{v}) being a continuous, discrete or mixed random variable, thus allowing for a wide range of non-Gaussian marginal distributions and a general functional form, either linear or non-linear, for the expectation with respect to the conditional density pp in (4).

As previously discussed, the mixture model formulation for the conditional density in (4) induces a finite mixture for the NNMP finite-dimensional distributions. On the other hand, due to the mixture form, an explicit expression for the covariance function is difficult to derive. A recursive equation can be obtained for a class of NNMP models for which the conditional expectation with respect to (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) is linear, that is, E(U𝒗,l|V𝒗,l=z)=al(𝒗)+bl(𝒗)zE(U_{\bm{v},l}\,|\,V_{\bm{v},l}=z)=a_{l}(\bm{v})+b_{l}(\bm{v})\,z for some al(𝒗),bl(𝒗),l=1,,La_{l}(\bm{v}),b_{l}(\bm{v})\in\mathbb{R},\;l=1,\dots,L, and for all 𝒗𝒟\bm{v}\in\mathcal{D}. Suppose the NNMP has a stationary marginal distribution with finite first and second moments. Without loss of generality, we assume the first moment is zero. Then the covariance over any two locations 𝒗1,𝒗2𝒟\bm{v}_{1},\bm{v}_{2}\in\mathcal{D} is

Cov(Z(𝒗1),Z(𝒗2))={l=1Lwl(𝒔i)bl(𝒔i)E(Z(𝒔j)Z(𝒔(il))),𝒗1𝒔i𝒮,𝒗2𝒔j𝒮,l=1Lwl(𝒗1)bl(𝒗1)E(Z(𝒔j)Z(𝒗(1l))),𝒗1𝒮,𝒗2𝒔j𝒮,l=1Ll=1Lwll{all+bllE(Z(𝒗(1l))Z(𝒗(2l)))},𝒗1,𝒗2𝒮,\displaystyle\mathrm{Cov}(Z(\bm{v}_{1}),Z(\bm{v}_{2}))=\begin{cases}\sum_{l=1}^{L}w_{l}(\bm{s}_{i})\,b_{l}(\bm{s}_{i})\,E(Z(\bm{s}_{j})Z(\bm{s}_{(il)})),\;\;\bm{v}_{1}\equiv\bm{s}_{i}\in\operatorname*{\mathcal{S}},\bm{v}_{2}\equiv\bm{s}_{j}\in\operatorname*{\mathcal{S}},\\ \sum_{l=1}^{L}w_{l}(\bm{v}_{1})\,b_{l}(\bm{v}_{1})\,E(Z(\bm{s}_{j})Z(\bm{v}_{(1l)})),\;\;\bm{v}_{1}\notin\operatorname*{\mathcal{S}},\bm{v}_{2}\equiv\bm{s}_{j}\in\operatorname*{\mathcal{S}},\\ \sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}w_{ll^{\prime}}\,\{a_{ll^{\prime}}+b_{ll^{\prime}}E(Z(\bm{v}_{(1l)})Z(\bm{v}_{(2l^{\prime})}))\},\;\;\bm{v}_{1},\bm{v}_{2}\notin\operatorname*{\mathcal{S}},\end{cases} (5)

where wllwl(𝒗1)wl(𝒗2)w_{ll^{\prime}}\equiv w_{l}(\bm{v}_{1})w_{l^{\prime}}(\bm{v}_{2}), allal(𝒗1)al(𝒗2)a_{ll^{\prime}}\equiv a_{l}(\bm{v}_{1})a_{l^{\prime}}(\bm{v}_{2}), bllbl(𝒗1)bl(𝒗2)b_{ll^{\prime}}\equiv b_{l}(\bm{v}_{1})b_{l^{\prime}}(\bm{v}_{2}), and without loss of generality, we assume i>ji>j. The covariance in (5) implies that, even though the NNMP has a stationary marginal distribution, it is second-order non-stationary.

2.3 Construction of NNMP Models

The spatially varying conditional densities f𝒗,lf_{\bm{v},l} in (4) correspond to a sequence of bivariate distributions indexed at 𝒗\bm{v}, namely, the distributions of (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}), for l=1,,Ll=1,...,L. To balance model flexibility and scalability, we build spatially varying distributions by considering the distribution of random vector (Ul,Vl)(U_{l},V_{l}), for l=1,,Ll=1,\dots,L, and extending some of its parameters to be spatially varying, that is, indexed in 𝒗\bm{v}. To this end, we use a probability model or a link function. We refer to the random vectors (Ul,Vl)(U_{l},V_{l}) as the set of base random vectors. With a careful choice of the model/function for the spatially varying parameter(s), this construction method reduces significantly the dimension of the parameter space, while preserving the capability of the NNMP model structure to capture spatial dependence.

We illustrate the method with several examples below, starting with a bivariate Gaussian distribution and its continuous mixtures for real-valued data, followed by a general strategy using bivariate copulas that can model data with general support. Before proceeding to the examples, we emphasize that our method allows for general bivariate distributions. One can also consider using a pair of compatible conditionals to specify bivariate distributions (Arnold et al., 1999), for instance, a pair of Lomax conditionals. This is illustrated in Example 4 in Section 2.4.

Example 1.

Gaussian and continuous mixture of Gaussian NNMP models.

For l=1,,Ll=1,\dots,L, take (Ul,Vl)(U_{l},V_{l}) to be a bivariate Gaussian random vector with mean μl𝟏2\mu_{l}\bm{1}_{2} and covariance matrix Σl=σl2(1ρlρl1)\Sigma_{l}=\sigma_{l}^{2}\left(\begin{smallmatrix}1&\rho_{l}\\ \rho_{l}&1\end{smallmatrix}\right), where 𝟏2\bm{1}_{2} is the two-dimensional column vector of ones, resulting in a Gaussian conditional density fUl|Vl(ul|vl)=N(ul|(1ρl)μl+ρlvl,σl2(1ρl2))f_{U_{l}|V_{l}}(u_{l}\,|\,v_{l})=N(u_{l}\,|\,(1-\rho_{l})\mu_{l}+\rho_{l}v_{l},\sigma_{l}^{2}(1-\rho_{l}^{2})). If we extend the correlation parameter to be spatially varying, ρl(𝒗)=\rho_{l}(\bm{v})= kl(𝒗,𝒗(l))k_{l}(\bm{v},\bm{v}_{(l)}), for a correlation function klk_{l}, we obtain the spatially varying conditional density,

p(z(𝒗)𝒛Ne(𝒗))=l=1Lwl(𝒗)N(z(𝒗)(1ρl(𝒗))μl+ρl(𝒗)z(𝒗(l)),σl2(1(ρl(𝒗))2)).p(z(\bm{v})\mid\bm{z}_{\text{Ne}(\bm{v})})=\sum_{l=1}^{L}w_{l}(\bm{v})\,N(z(\bm{v})\mid(1-\rho_{l}(\bm{v}))\mu_{l}+\rho_{l}(\bm{v})z(\bm{v}_{(l)}),\sigma_{l}^{2}(1-(\rho_{l}(\bm{v}))^{2})). (6)

This NNMP is referred to as the Gaussian NNMP. If we take Z(𝒔1)N(z|μ,σ2)Z(\bm{s}_{1})\sim N(z\,|\,\mu,\sigma^{2}), and set μl=μ\mu_{l}=\mu and σl2=σ2\sigma^{2}_{l}=\sigma^{2}, for all ll, the resulting model satisfies the invariant condition of Proposition 1 with stationary marginal given by the N(μ,σ2)N(\mu,\sigma^{2}) distribution. The finite-dimensional distribution of the Gaussian NNMP is characterized by the following proposition.

Proposition 2.

Consider the Gaussian NNMP in (6) with μl=μ\mu_{l}=\mu and σl2=σ2\sigma^{2}_{l}=\sigma^{2}, for all ll. If Z(𝐬1)N(z|μ,σ2)Z(\bm{s}_{1})\sim N(z\,|\,\mu,\sigma^{2}), the Gaussian NNMP has the N(μ,σ2)N(\mu,\sigma^{2}) stationary marginal distribution, and its finite-dimensional distributions are mixtures of multivariate Gaussian distributions.

We refer to the model in Proposition 2 as the stationary Gaussian NNMP. Based on the Gaussian NNMP, various NNMP models with different families for (Ul,Vl)(U_{l},V_{l}) can be constructed by exploiting location-scale mixtures of Gaussian distributions. We illustrate the approach with the skew-Gaussian NNMP model. Denote by TN(μ,σ2;a,b)\mathrm{TN}(\mu,\sigma^{2};a,b) the Gaussian distribution with mean μ\mu and variance σ2\sigma^{2}, truncated at the interval (a,b)(a,b). Building from the Gaussian NNMP, we start with a conditional bivariate Gaussian distribution for (Ul,Vl)(U_{l},V_{l}), given z0TN(0,1;0,)z_{0}\sim\mathrm{TN}(0,1;0,\infty), where μl\mu_{l} is replaced with μl+λlz0\mu_{l}+\lambda_{l}z_{0}. Marginalizing out z0z_{0} yields the bivariate skew-Gaussian distribution for (Ul,Vl)(U_{l},V_{l}) (Azzalini, 2013). Extending again ρl\rho_{l} to ρl(𝒗)\rho_{l}(\bm{v}), for all ll, we can express the conditional density p(z(𝒗)|𝒛Ne(𝒗))p(z(\bm{v})\,|\,\bm{z}_{\text{Ne}(\bm{v})}) for the skew-Gaussian NNMP model as l=1Lwl(𝒗)0N(z(𝒗)|μl(𝒗),σl2(𝒗))TN(z0(𝒗)|μ0l(𝒗(l)),σ0l2;0,)𝑑z0(𝒗),\sum_{l=1}^{L}w_{l}(\bm{v})\int_{0}^{\infty}N(z(\bm{v})\,|\,\mu_{l}(\bm{v}),\sigma_{l}^{2}(\bm{v}))\,\mathrm{TN}(z_{0}(\bm{v})\,|\,\mu_{0l}(\bm{v}_{(l)}),\sigma_{0l}^{2};0,\infty)dz_{0}(\bm{v}), where we have: μl(𝒗)=\mu_{l}(\bm{v})= {1ρl(𝒗)}{μl+λlz0(𝒗)}+ρl(𝒗)z(𝒗(l))\{1-\rho_{l}(\bm{v})\}\{\mu_{l}+\lambda_{l}z_{0}(\bm{v})\}+\rho_{l}(\bm{v})z(\bm{v}_{(l)}); σl2(𝒗)=\sigma_{l}^{2}(\bm{v})= σl2{1(ρl(𝒗))2}\sigma_{l}^{2}\{1-(\rho_{l}(\bm{v}))^{2}\}; μ0l(𝒗(l))=\mu_{0l}(\bm{v}_{(l)})= {z(𝒗(l))μl}λl/(σl2+λl2)\{z(\bm{v}_{(l)})-\mu_{l}\}\lambda_{l}/(\sigma^{2}_{l}+\lambda_{l}^{2}); and σ0l2=\sigma_{0l}^{2}= σl2/(σl2+λl2)\sigma^{2}_{l}/(\sigma^{2}_{l}+\lambda_{l}^{2}). Setting λl=λ\lambda_{l}=\lambda, μl=μ\mu_{l}=\mu, and σl2=σ2\sigma^{2}_{l}=\sigma^{2}, for all ll, we obtain the stationary skew-Gaussian NNMP model, with skew-Gaussian marginal fZ(z)=2N(z|μ,λ2+σ2)Φ((zμ)λ/(σλ2+σ2))f_{Z}(z)=2\,N(z\,|\,\mu,\lambda^{2}+\sigma^{2})\,\Phi((z-\mu)\lambda/(\sigma\sqrt{\lambda^{2}+\sigma^{2}})), denoted as SN(μ,λ2+σ2,λ/σ)\mathrm{SN}(\mu,\lambda^{2}+\sigma^{2},\lambda/\sigma).

The skew-Gaussian NNMP model is an example of a location mixture of Gaussian distributions. Scale mixtures can also be considered to obtain, for example, the Student-t model. In that case, we replace the covariance matrix Σl\Sigma_{l} with cΣlc\Sigma_{l}, taking cc as a random variable with an appropriate inverse-gamma distribution. Important families that admit a location and/or scale mixture of Gaussians representation include the skew-t, Laplace, and asymmetric Laplace distributions. Using a similar approach to the one for the skew-Gaussian NNMP example, we can construct the corresponding NNMP models.

Example 2.

Copula NNMP models.

A copula function C:[0,1]p[0,1]C:[0,1]^{p}\rightarrow[0,1] is a function such that, for any multivariate distribution F(z1,,zp)F(z_{1},\dots,z_{p}), there exists a copula CC for which F(z1,,zp)=C(F1(z1),,Fp(zp))F(z_{1},\dots,z_{p})=C(F_{1}(z_{1}),\dots,F_{p}(z_{p})), where FjF_{j} is the marginal distribution function of ZjZ_{j}, j=1,,pj=1,\dots,p (Sklar, 1959). If FjF_{j} is continuous for all jj, CC is unique. A copula enables us to separate the modeling of the marginal distributions from the dependence. Thus, the invariant condition in Proposition 1 can be attained by specifying the stationary distribution FZF_{Z} as the marginal distribution of (Ul,Vl)(U_{l},V_{l}) for all ll. The copula parameter that determines the dependence of (Ul,Vl)(U_{l},V_{l}) can be modeled as spatially varying to create a sequence of spatially dependent bivariate vectors (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}). Here, we focus on continuous distributions, although this strategy can be applied for any family of distributions for FZF_{Z}. We consider bivariate copulas with a single copula parameter, and illustrate next the construction of a copula NNMP given a stationary marginal density fZf_{Z}.

For the bivariate distribution of each (Ul,Vl)(U_{l},V_{l}) with marginals fUlf_{U_{l}} and fVlf_{V_{l}}, we consider a copula ClC_{l} with parameter ηl\eta_{l}, for l=1,,Ll=1,\dots,L. We obtain a spatially varying copula C𝒗,lC_{\bm{v},l} for (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) by extending ηl\eta_{l} to ηl(𝒗)\eta_{l}(\bm{v}). The joint density of (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) is given by c𝒗,l(z(𝒗),z(𝒗(l)))fU𝒗,l(z(𝒗))fV𝒗,l(z(𝒗(l)))c_{\bm{v},l}(z(\bm{v}),z(\bm{v}_{(l)}))f_{U_{\bm{v},l}}(z(\bm{v}))f_{V_{\bm{v},l}}(z(\bm{v}_{(l)})), where c𝒗,lc_{\bm{v},l} is the copula density of C𝒗,lC_{\bm{v},l}, and fU𝒗,l=fUlf_{U_{\bm{v},l}}=f_{U_{l}} and fV𝒗,l=fVlf_{V_{\bm{v},l}}=f_{V_{l}} are the marginal densities of U𝒗,lU_{\bm{v},l} and V𝒗,lV_{\bm{v},l}, respectively. Given a pre-specified stationary marginal fZf_{Z}, we replace both fU𝒗,lf_{U_{\bm{v},l}} and fV𝒗,lf_{V_{\bm{v},l}} with fZf_{Z}, for every 𝒗\bm{v} and for all ll. We then obtain the conditional density

p(z(𝒗)𝒛Ne(𝒗))=l=1Lwl(𝒗)c𝒗,l(z(𝒗),z(𝒗(l)))fZ(z(𝒗))p(z(\bm{v})\mid\bm{z}_{\text{Ne}(\bm{v})})=\sum_{l=1}^{L}w_{l}(\bm{v})\,c_{\bm{v},l}(z(\bm{v}),z(\bm{v}_{(l)}))\,f_{Z}(z(\bm{v})) (7)

that characterizes the stationary copula NNMP.

Under the copula framework, one strategy to specify the spatially varying parameter is through the Kendall’s τ\tau coefficient. The Kendall’s τ\tau, taking values in [1,1][-1,1], is a bivariate concordance measure with properties useful for non-Gaussian modeling. In particular, its existence does not require finite second moment and it is invariant under strictly increasing transformations. If (Ul,Vl)(U_{l},V_{l}) is continuous with a copula ClC_{l}, its Kendall’s τ\tau is ρτ,l=4[0,1]2Cl𝑑Cl1\rho_{\tau,l}=4\int_{[0,1]^{2}}C_{l}dC_{l}-1. Taking Al[1,1]A_{l}\subset[-1,1] as the range of ρτ,l\rho_{\tau,l}, we can construct a composition function hl:=glklh_{l}:=g_{l}\circ k_{l} for some link function gl:AlHlg_{l}:A_{l}\rightarrow H_{l} and kernel function kl:𝒟×𝒟Alk_{l}:\mathcal{D}\times\mathcal{D}\rightarrow A_{l}, where HlH_{l} is the parameter space associated with ClC_{l}. The kernel klk_{l} should be specified with caution; klk_{l} must satisfy axioms in the definition of a bivariate concordance measure (Joe 2014, Section 2.12). We illustrate the strategy with the following example.

Example 3.

The bivariate Gumbel copula is an asymmetric copula useful for modeling dependence when the marginals are positive and heavy-tailed. The spatial Gumbel copula can be defined as C𝒗,l=exp([{logFU𝒗,l(z(𝒗))}ηl(𝒗)+{logFV𝒗,l(z(𝒗(l)))}ηl(𝒗)]1/ηl(𝒗))C_{\bm{v},l}=\exp\big{(}-[\{-\log F_{U_{\bm{v},l}}(z(\bm{v}))\}^{\eta_{l}(\bm{v})}+\{-\log F_{V_{\bm{v},l}}(z(\bm{v}_{(l)}))\}^{\eta_{l}(\bm{v})}]^{1/\eta_{l}(\bm{v})}\big{)}, where ηl(𝒗)[1,)\eta_{l}(\bm{v})\in[1,\infty) and perfect dependence is obtained if ηl(𝒗)\eta_{l}(\bm{v})\rightarrow\infty. The Kendall’s τ\tau is ρτ,l(𝒗)=1ηl1(𝒗)\rho_{\tau,l}(\bm{v})=1-\eta_{l}^{-1}(\bm{v}), taking values in [0,1][0,1]. We define ρτ,l(𝒗):=kl(𝒗𝒗(l))\rho_{\tau,l}(\bm{v}):=k_{l}(||\bm{v}-\bm{v}_{(l)}||), an isotropic correlation function. Let gl(x)=(1x)1g_{l}(x)=(1-x)^{-1}. Then, the function hl(𝒗𝒗(l))=glkl(𝒗𝒗(l))=(1kl(𝒗𝒗(l)))1h_{l}(||\bm{v}-\bm{v}_{(l)}||)=g_{l}\circ k_{l}(||\bm{v}-\bm{v}_{(l)}||)=(1-k_{l}(||\bm{v}-\bm{v}_{(l)}||))^{-1}. Thus, the parameter ηl(𝒗)η(𝒗𝒗(l))\eta_{l}(\bm{v})\equiv\eta(||\bm{v}-\bm{v}_{(l)}||) is given by hl(𝒗𝒗(l))h_{l}(||\bm{v}-\bm{v}_{(l)}||), and ηl(𝒗)\eta_{l}(\bm{v})\rightarrow\infty as 𝒗𝒗(l)0||\bm{v}-\bm{v}_{(l)}||\rightarrow 0.

After we define a spatially varying copula, we obtain a family of copula NNMPs by choosing a desired family of marginal distributions. Section 4.1 illustrates Gaussian and Gumbel copula NNMP models with gamma marginals, and the supplementary material provides an additional example of a Gaussian copula NNMP with beta marginals.

Copula NNMP models offer avenues to capture complex dependence using general bivariate copulas. Traditional spatial copula models specify the finite dimensional distributions of the underlying spatial process with a multivariate copula. However, multivariate copulas need to be used with careful consideration in a spatial setting. For example, it is common to assume that spatial processes exhibit stronger dependence at smaller distances. Thus, copulas such as the multivariate Archimedean copula that induce an exchangeable dependence structure are inappropriate. Though spatial vine copula models (Gräler, 2014) can resolve this restriction, their model structure and computation are substantially more complicated than copula NNMP models.

2.4 Mixture Component Specification and Tail Dependence

A benefit of building NNMPs from a set of base random vectors is that specification of the multivariate dependence of Z(𝒗)Z(\bm{v}) given its neighbors is determined mainly by that of the base random vectors. In this section, we illustrate this attractive property of the model with the establishment of lower bounds for two measures used to assess strength of tail dependence.

The main assumption is that the base random vector (Ul,Vl)(U_{l},V_{l}) has stochastically increasing positive dependence. UlU_{l} is said to be stochastically increasing in VlV_{l}, if P(Ul>ul|Vl=vl)P\big{(}U_{l}>u_{l}\,|\,V_{l}=v_{l}\big{)} increases as vlv_{l} increases. The definition can be extended to a multivariate random vector (Z1,,Zp)(Z_{1},\dots,Z_{p}). Z1Z_{1} is said to be stochastically increasing in (Z2,,Zp)(Z_{2},\dots,Z_{p}) if P(Z1>z1|Z2=z2,,Zp=zp)P(Z1>z1|Z2=z2,,Zp=zp)P\big{(}Z_{1}>z_{1}\,|\,Z_{2}=z_{2},\dots,Z_{p}=z_{p}\big{)}\leq P\big{(}Z_{1}>z_{1}\,|\,Z_{2}=z_{2}^{\prime},\dots,Z_{p}=z_{p}^{\prime}\big{)}, for all (z2,,zp)(z_{2},\dots,z_{p}) and (z2,,zp)(z_{2}^{\prime},\dots,z_{p}^{\prime}) in the support of (Z2,,Zp)(Z_{2},\dots,Z_{p}), where zjzjz_{j}\leq z_{j}^{\prime}, for j=2,,pj=2,\dots,p. The conditional density in (4) implies that

P(Z(𝒗)>z|𝒁Ne(𝒗)=𝒛Ne(𝒗))=l=1Lwl(𝒗)P(Z(𝒗)>z|Z(𝒗(l))=z(𝒗(l))).P\big{(}Z(\bm{v})>z\,|\,\bm{Z}_{\text{Ne}(\bm{v})}=\bm{z}_{\text{Ne}(\bm{v})}\big{)}=\sum_{l=1}^{L}w_{l}(\bm{v})\,P\big{(}Z(\bm{v})>z\,|\,Z(\bm{v}_{(l)})=z(\bm{v}_{(l)})\big{)}.

Therefore, Z(𝒗)Z(\bm{v}) is stochastically increasing in 𝒁Ne(𝒗)\bm{Z}_{\text{Ne}(\bm{v})} if Z(𝒗)Z(\bm{v}) is stochastically increasing in Z(𝒗(l))Z(\bm{v}_{(l)}) with respect to (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) for all ll. If the sequence (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) is built from the vector (Ul,Vl)(U_{l},V_{l}), then the set of base random vectors determines the stochastically increasing positive dependence of Z(𝒗)Z(\bm{v}) given its neighbors.

For a bivariate random vector (Ul,Vl)(U_{l},V_{l}), the upper and lower tail dependence coefficients, denoted as λ,l\lambda_{\mathcal{H},l} and λ,l\lambda_{\mathcal{L},l}, respectively, are λ,l=limq1P(Ul>FUl1(q)|Vl>FVl1(q))\lambda_{\mathcal{H},l}=\lim_{q\rightarrow 1^{-}}P\big{(}U_{l}>F_{U_{l}}^{-1}(q)\,|\,V_{l}>F_{V_{l}}^{-1}(q)\big{)} and λ,l=limq0+P(UlFUl1(q)|VlFVl1(q))\lambda_{\mathcal{L},l}=\lim_{q\rightarrow 0^{+}}P\big{(}U_{l}\leq F_{U_{l}}^{-1}(q)\,|\,V_{l}\leq F_{V_{l}}^{-1}(q)\big{)}. When λ,l>0\lambda_{\mathcal{H},l}>0, we say UlU_{l} and VlV_{l} have upper tail dependence. When λ,l=0\lambda_{\mathcal{H},l}=0, UlU_{l} and VlV_{l} are said to be asymptotically independent in the upper tail. Lower tail dependence and asymptotically independence in the lower tail are similarly defined using λ,l\lambda_{\mathcal{L},l}. Let FZ(𝒗)F_{Z(\bm{v})} be the marginal distribution function of Z(𝒗)Z(\bm{v}). Analogously, we can define the upper and lower tail dependence coefficients for Z(𝒗)Z(\bm{v}) given its nearest neighbors,

λ(𝒗)\displaystyle\lambda_{\mathcal{H}}(\bm{v}) =limq1P(Z(𝒗)>FZ(𝒗)1(q)Z(𝒗(1))>FZ(𝒗(1))1(q),,Z(𝒗(L))>FZ(𝒗(L))1(q)),\displaystyle=\lim_{q\rightarrow 1^{-}}P\big{(}Z(\bm{v})>F_{Z(\bm{v})}^{-1}(q)\mid Z(\bm{v}_{(1)})>F_{Z(\bm{v}_{(1)})}^{-1}(q),\dots,Z(\bm{v}_{(L)})>F_{Z(\bm{v}_{(L)})}^{-1}(q)\big{)},
λ(𝒗)\displaystyle\lambda_{\mathcal{L}}(\bm{v}) =limq0+P(Z(𝒗)FZ(𝒗)1(q)Z(𝒗(1))FZ(𝒗(1))1(q),,Z(𝒗(L))FZ(𝒗(L))1(q)).\displaystyle=\lim_{q\rightarrow 0^{+}}P\big{(}Z(\bm{v})\leq F_{Z(\bm{v})}^{-1}(q)\mid Z(\bm{v}_{(1)})\leq F_{Z(\bm{v}_{(1)})}^{-1}(q),\dots,Z(\bm{v}_{(L)})\leq F_{Z(\bm{v}_{(L)})}^{-1}(q)\big{)}.

The following proposition provides lower bounds for the tail dependence coefficients.

Proposition 3.

Consider an NNMP for which the component density f𝐯,lf_{\bm{v},l} is specified by the conditional density of U𝐯,lU_{\bm{v},l} given V𝐯,lV_{\bm{v},l}, where the random vector (U𝐯,l,V𝐯,l)(U_{\bm{v},l},V_{\bm{v},l}) follows a bivariate distribution with marginal distribution functions FU𝐯,lF_{U_{\bm{v},l}} and FV𝐯,lF_{V_{\bm{v},l}}, for l=1,,Ll=1,\dots,L. The spatial dependence of random vector (U𝐯,l,V𝐯,l)(U_{\bm{v},l},V_{\bm{v},l}) is built from the base vector (Ul,Vl)(U_{l},V_{l}), which has a bivariate distribution such that UlU_{l} is stochastically increasing in VlV_{l}, for l=1,,Ll=1,\dots,L. Then, for every 𝐯\bm{v}, the lower bound for the upper tail dependence coefficient λ(𝐯)\lambda_{\mathcal{H}}(\bm{v}) is l=1Lwl(𝐯)limq1P(Z(𝐯)>FU𝐯,l1(q)|Z(𝐯(l))=FV𝐯,l1(q))\sum_{l=1}^{L}w_{l}(\bm{v})\lim_{q\rightarrow 1^{-}}P\big{(}Z(\bm{v})>F_{U_{\bm{v},l}}^{-1}(q)\,|\,Z(\bm{v}_{(l)})=F_{V_{\bm{v},l}}^{-1}(q)\big{)}, and the lower bound for the lower tail dependence coefficient λ(𝐯)\lambda_{\mathcal{L}}(\bm{v}) is l=1Lwl(𝐯)limq0+P(Z(𝐯)FU𝐯,l1(q)|Z(𝐯(l))=FV𝐯,l1(q))\sum_{l=1}^{L}w_{l}(\bm{v})\lim_{q\rightarrow 0^{+}}P\big{(}Z(\bm{v})\leq F_{U_{\bm{v},l}}^{-1}(q)\,|\,Z(\bm{v}_{(l)})=F_{V_{\bm{v},l}}^{-1}(q)\big{)}.

Proposition 3 establishes that the lower and upper tail dependence coefficients are bounded below by a convex combination of, respectively, the limits of the conditional distribution functions and the conditional survival functions. These are fully determined by the dependence structure of the bivariate distribution for (Ul,Vl)(U_{l},V_{l}). The result is best illustrated with an example.

Example 4.

Consider a Lomax NNMP for which the bivariate distributions of the base random vectors correspond to a bivariate Lomax distribution (Arnold et al., 1999), resulting in conditional density, p(z(𝒗)|𝒛Ne(𝒗))=p(z(\bm{v})\,|\,\bm{z}_{\text{Ne}(\bm{v})})= l=1Lwl(𝒗)Lo(z(𝒗)|z(𝒗(l))+ϕl,αl(𝒗))\sum_{l=1}^{L}w_{l}(\bm{v})\,\mathrm{Lo}(z(\bm{v})\,|\,z(\bm{v}_{(l)})+\phi_{l},\alpha_{l}(\bm{v})), where Lo(x|ϕ,α)=αϕ1(1+xϕ1)(α+1)\mathrm{Lo}(x\,|\,\phi,\alpha)=\alpha\phi^{-1}(1+x\phi^{-1})^{-(\alpha+1)} denotes the Lomax density, a shifted version of the Pareto Type I density. A small value of α\alpha indicates a heavy tail. The component conditional survival function of the Lomax NNMP, expressed in terms of the quantile qq, is {1+FU𝒗,l1(q)/(FV𝒗,l1(q)+ϕl)}αl(𝒗)\left\{1+F_{U_{\bm{v},l}}^{-1}(q)/\big{(}F_{V_{\bm{v},l}}^{-1}(q)+\phi_{l}\big{)}\right\}^{-\alpha_{l}(\bm{v})} which converges to 2αl(𝒗)2^{-\alpha_{l}(\bm{v})} as q1q\rightarrow 1^{-}. Therefore, the lower bound for λ(𝒗)\lambda_{\mathcal{H}}(\bm{v}) is l=1Lwl(𝒗) 2αl(𝒗)\sum_{l=1}^{L}w_{l}(\bm{v})\,2^{-\alpha_{l}(\bm{v})}. As αl(𝒗)0\alpha_{l}(\bm{v})\rightarrow 0 for all ll, the lower bound for λ(𝒗)\lambda_{\mathcal{H}}(\bm{v}) tends to one, and hence λ(𝒗)\lambda_{\mathcal{H}}(\bm{v}) tends to one, since λ(𝒗)1\lambda_{\mathcal{H}}(\bm{v})\leq 1. As αl(𝒗)\alpha_{l}(\bm{v})\rightarrow\infty for all ll, the lower bound tends to zero.

Proposition 3 holds for the general framework. If the distribution of (Ul,Vl)(U_{l},V_{l}) with FUl=FVlF_{U_{l}}=F_{V_{l}} has first order partial derivatives and exchangeable dependence, namely (Ul,Vl)(U_{l},V_{l}) and (Vl,Ul)(V_{l},U_{l}) have the same joint distribution, the lower bounds of the tail dependence coefficients depend on the component tail dependence coefficients. The result is summarized in the following corollary.

Corollary 1.

Suppose that the base random vector (Ul,Vl)(U_{l},V_{l}) in Proposition 3 is exchangeable, and its bivariate distribution with marginals FUl=FVlF_{U_{l}}=F_{V_{l}} has first order partial derivatives, for all ll. Then the upper and lower tail dependence coefficients λ(𝐯)\lambda_{\mathcal{H}}(\bm{v}) and λ(𝐯)\lambda_{\mathcal{L}}(\bm{v}) are bounded below by l=1Lwl(𝐯)λ,l(𝐯)/2\sum_{l=1}^{L}w_{l}(\bm{v})\lambda_{\mathcal{H},l}(\bm{v})/2 and l=1Lwl(𝐯)λ,l(𝐯)/2\sum_{l=1}^{L}w_{l}(\bm{v})\lambda_{\mathcal{L},l}(\bm{v})/2, where λ,l(𝐯)\lambda_{\mathcal{H},l}(\bm{v}) and λ,l(𝐯)\lambda_{\mathcal{L},l}(\bm{v}) are the tail dependence coefficients with respect to (U𝐯,l,V𝐯,l)(U_{\bm{v},l},V_{\bm{v},l}).

Under Corollary 1, if the bivariate distribution of (Ul,Vl)(U_{l},V_{l}) is symmetric, for instance, an elliptically symmetric distribution, the upper and lower tail dependence coefficients coincide, and can simply be denoted as λ(𝒗)\lambda(\bm{v}). Then, we have that λ(𝒗)l=1Lwl(𝒗)λl(𝒗)/2\lambda(\bm{v})\geq\sum_{l=1}^{L}w_{l}(\bm{v})\lambda_{l}(\bm{v})/2, where λl(𝒗)\lambda_{l}(\bm{v}) is the tail dependence coefficient with respect to (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}).

Tail dependence can also be quantified using the boundary of the conditional distribution function, as proposed in Hua and Joe (2014) for a bivariate random vector. In particular, the upper tail dependence of (Ul,Vl)(U_{l},V_{l}) is said to have some strength if FUl|Vl(FUl1(q)|FVl1(1))F_{U_{l}|V_{l}}\big{(}F_{U_{l}}^{-1}(q)\,|\,F_{V_{l}}^{-1}(1)\big{)} is positive at q=1q=1. Likewise, a non-zero FUl|Vl(FUl1(q)|FVl1(0))F_{U_{l}|V_{l}}\big{(}F_{U_{l}}^{-1}(q)\,|\,F_{V_{l}}^{-1}(0)\big{)} at q=0q=0 indicates some strength of dependence in the lower tails. The functions FUl|Vl(FVl1(0))F_{U_{l}|V_{l}}\big{(}\cdot\mid F_{V_{l}}^{-1}(0)\big{)} and FUl|Vl(FVl1(1))F_{U_{l}|V_{l}}\big{(}\cdot\mid F_{V_{l}}^{-1}(1)\big{)} are referred to as the boundary conditional distribution functions.

We use F1|2(|F𝒁Ne(𝒗)1(q))F_{1|2}\big{(}\cdot\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(q)\big{)} for simpler notation for the conditional distribution function of Z(𝒗)Z(\bm{v}), F(|Z(𝒗(1))=FZ(𝒗(1))1(q),,Z(𝒗(L))=FZ(𝒗(L))1(q))F\big{(}\cdot\,|\,Z(\bm{v}_{(1)})=F_{Z(\bm{v}_{(1)})}^{-1}(q),\dots,Z(\bm{v}_{(L)})=F_{Z(\bm{v}_{(L)})}^{-1}(q)\big{)}. Then F1|2(|F𝒁Ne(𝒗)1(0))F_{1|2}\big{(}\cdot\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(0)\big{)} and F1|2(|F𝒁Ne(𝒗)1(1))F_{1|2}\big{(}\cdot\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(1)\big{)} are the boundary conditional distribution functions for the NNMP model. The upper tail dependence is said to be i) strongest if F1|2(FZ(𝒗)1(q)|F𝒁Ne(𝒗)1(1))F_{1|2}\big{(}F_{Z(\bm{v})}^{-1}(q)\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(1)\big{)} equals 0 for 0q<10\leq q<1 and has a mass of 1 at q=1q=1; ii) intermediate if F1|2(FZ(𝒗)1(q)|F𝒁Ne(𝒗)1(1))F_{1|2}\big{(}F_{Z(\bm{v})}^{-1}(q)\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(1)\big{)} has positive but not unit mass at q=1q=1; iii) weakest if F1|2(FZ(𝒗)1(q)|F𝒁Ne(𝒗)1(1))F_{1|2}\big{(}F_{Z(\bm{v})}^{-1}(q)\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(1)\big{)} has no mass at q=1q=1. The strength of lower tail dependence is defined likewise using F1|2(FZ(𝒗)1(q)|F𝒁Ne(𝒗)1(0))F_{1|2}\big{(}F_{Z(\bm{v})}^{-1}(q)\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(0)\big{)}. The following result gives lower bounds for the boundary conditional distribution functions.

Proposition 4.

Consider an NNMP for which the component density f𝐯,lf_{\bm{v},l} is specified by the conditional density of U𝐯,lU_{\bm{v},l} given V𝐯,lV_{\bm{v},l}. The spatial dependence of random vector (U𝐯,l,V𝐯,l)(U_{\bm{v},l},V_{\bm{v},l}) is built from the base vector (Ul,Vl)(U_{l},V_{l}), which has a bivariate distribution such that UlU_{l} is stochastically increasing in VlV_{l}, for l=1,,Ll=1,\dots,L. Let λ,l(𝐯)\lambda_{\mathcal{L},l}(\bm{v}) and λ,l(𝐯)\lambda_{\mathcal{H},l}(\bm{v}) be the lower and upper tail dependence coefficients corresponding to (U𝐯,l,V𝐯,l)(U_{\bm{v},l},V_{\bm{v},l}). If for a given 𝐯\bm{v}, there exists λ,l(𝐯)>0\lambda_{\mathcal{L},l}(\bm{v})>0 for some ll, then the conditional distribution function F1|2(FZ(𝐯)1(q)|F𝐙Ne(𝐯)1(0))F_{1|2}\big{(}F_{Z(\bm{v})}^{-1}(q)\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(0)\big{)} has strictly positive mass p0(𝐯)p_{0}(\bm{v}) at q=0q=0 with p0(𝐯)l=1Lwl(𝐯)λ,l(𝐯)p_{0}(\bm{v})\geq\sum_{l=1}^{L}w_{l}(\bm{v})\lambda_{\mathcal{L},l}(\bm{v}). Similarly, if for a given 𝐯\bm{v}, there exists λ,l(𝐯)>0\lambda_{\mathcal{H},l}(\bm{v})>0 for some ll, then the conditional distribution function F1|2(FZ(𝐯)1(q)|F𝐙Ne(𝐯)1(1))F_{1|2}\big{(}F_{Z(\bm{v})}^{-1}(q)\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(1)\big{)} has strictly positive mass p1(𝐯)p_{1}(\bm{v}) at q=1q=1 with p1(𝐯)l=1Lwl(𝐯)λ,l(𝐯)p_{1}(\bm{v})\geq\sum_{l=1}^{L}w_{l}(\bm{v})\lambda_{\mathcal{H},l}(\bm{v}).

Proposition 4 complements Proposition 3 to assess strength of tail dependence. It readily applies for bivariate distributions, especially for copulas which yield explicit expressions for the tail dependence coefficients. In particular, the spatially varying Gumbel copula C𝒗,lC_{\bm{v},l} in Example 3 has upper tail dependence coefficient 221/ηl(𝒗)>02-2^{1/\eta_{l}(\bm{v})}>0 for ηl(𝒗)>1\eta_{l}(\bm{v})>1, so the tail dependence of a Gumbel copula NNMP model has some strength if ηl(𝒗)>1\eta_{l}(\bm{v})>1 for some ll. In fact, applying the result in Hua and Joe (2014), with a Gumbel copula, F1|2(FZ(𝒗)1(q)|F𝒁Ne(𝒗)1(1))F_{1|2}\big{(}F_{Z(\bm{v})}^{-1}(q)\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(1)\big{)} degenerates at q=1q=1, implying strongest tail dependence.

3 Bayesian Hierarchical Model and Inference

3.1 Hierarchical Model Formulation

We introduce the general approach for NNMP Bayesian implementation, treating the observed spatial responses as an NNMP realization. The inferential framework can be easily extended to incorporate model components that may be needed in practical settings, such as covariates and additional error terms. We illustrate the extensions with the real data analysis in Section 4.2 and in the supplementary material, and provide further discussion in Section 5.

Our approach for inference is based on a likelihood conditional on the first LL elements of the realization 𝒛𝒮=(z(𝒔1),,z(𝒔n))\bm{z}_{\operatorname*{\mathcal{S}}}=(z(\bm{s}_{1}),\dots,z(\bm{s}_{n}))^{\top} over the reference set 𝒮𝒟\operatorname*{\mathcal{S}}\subset\mathcal{D}. Following a commonly used approach for mixture models fitting, we use data augmentation to facilitate inference. For each z(𝒔i)z(\bm{s}_{i}), i=L+1,,ni=L+1,\dots,n, we introduce a configuration variable i\ell_{i}, taking values in {1,,L}\{1,\dots,L\}, such that P(i|𝒘(𝒔i))=l=1Lwl(𝒔i)δl(i)P\big{(}\ell_{i}\,|\,\bm{w}(\bm{s}_{i})\big{)}=\sum_{l=1}^{L}w_{l}(\bm{s}_{i})\delta_{l}(\ell_{i}), where 𝒘(𝒔i)=(w1(𝒔i),,wL(𝒔i))\bm{w}(\bm{s}_{i})=(w_{1}(\bm{s}_{i}),\dots,w_{L}(\bm{s}_{i}))^{\top}, and δl(i)=1\delta_{l}(\ell_{i})=1 if i=l\ell_{i}=l and 0 otherwise. Conditional on the configuration variables and the vector (z(𝒔1),,z(𝒔L))(z(\bm{s}_{1}),\dots,z(\bm{s}_{L}))^{\top}, the augmented model is

z(𝒔i)z(𝒔(i,i)),i,𝜽ind.f𝒔i,i(z(𝒔i)|z(𝒔(i,i)),𝜽),i𝒘(𝒔i)ind.l=1Lwl(𝒔i)δl(i),\displaystyle z(\bm{s}_{i})\mid z(\bm{s}_{(i,\ell_{i})}),\ell_{i},\bm{\theta}\,\stackrel{{\scriptstyle ind.}}{{\sim}}f_{\bm{s}_{i},\ell_{i}}(z(\bm{s}_{i})\,|\,z(\bm{s}_{(i,\ell_{i})}),\bm{\theta}),\;\;\;\ell_{i}\mid\bm{w}(\bm{s}_{i})\,\stackrel{{\scriptstyle ind.}}{{\sim}}\sum_{l=1}^{L}w_{l}(\bm{s}_{i})\delta_{l}(\ell_{i}), (8)

where 𝜽\bm{\theta} collects the parameters of the densities f𝒔i,lf_{\bm{s}_{i},l}.

A key component of the Bayesian model formulation is the prior model for the weights. Weights are allowed to vary in space, adjusting to the neighbor structure of different reference locations. We describe the construction for weights corresponding to a point in the reference set. For non-reference points, weights are defined analogously. Consider a collection of spatially dependent distribution functions {G𝒔i:𝒔i𝒮}\{G_{\bm{s}_{i}}:\bm{s}_{i}\in\operatorname*{\mathcal{S}}\} supported on (0,1)(0,1). For each 𝒔i\bm{s}_{i}, the weights are defined as the increments of G𝒔iG_{\bm{s}_{i}} with cutoff points r𝒔i,0,,r𝒔i,Lr_{\bm{s}_{i},0}\,,\dots,r_{\bm{s}_{i},L}. More specifically,

wl(𝒔i)=𝟙(r𝒔i,l1,r𝒔i,l)(t)𝑑G𝒔i(t),l=1,,L,w_{l}(\bm{s}_{i})=\int\mathbbm{1}_{(r_{\bm{s}_{i},l-1},\,r_{\bm{s}_{i},l})}(t)\,dG_{\bm{s}_{i}}(t),\;\;\;l=1,\dots,L, (9)

where 𝟙A\mathbbm{1}_{A} denotes the indicator function for set AA. The cutoff points 0=r𝒔i,0<r𝒔i,1<<r𝒔i,L=10=r_{\bm{s}_{i},0}<r_{\bm{s}_{i},1}<\dots<r_{\bm{s}_{i},L}=1 are such that, for l=1,,Ll=1,\dots,L, r𝒔i,lr𝒔i,l1=k(𝒔i,𝒔(il)|𝜻)/l=1Lk(𝒔i,𝒔(il)|𝜻)r_{\bm{s}_{i},l}-r_{\bm{s}_{i},l-1}=k^{\prime}(\bm{s}_{i},\bm{s}_{(il)}\,|\,\bm{\zeta})/\sum_{l=1}^{L}k^{\prime}(\bm{s}_{i},\bm{s}_{(il)}\,|\,\bm{\zeta}), where k:𝒟×𝒟[0,1]k^{\prime}:\mathcal{D}\times\mathcal{D}\rightarrow[0,1] is a bounded kernel function with parameters 𝜻\bm{\zeta}. The kernel and its associated parameters affect the smoothness of the resulting random field. We take G𝒔iG_{\bm{s}_{i}} as a logit Gaussian distribution, denoted as G𝒔i(|μ(𝒔i),κ2)G_{\bm{s}_{i}}(\cdot\,|\,\mu(\bm{s}_{i}),\kappa^{2}), such that the corresponding Gaussian distribution has mean μ(𝒔i)\mu(\bm{s}_{i}) and variance κ2\kappa^{2}. The spatial dependence across the weights is introduced through the mean μ(𝒔i)=γ0+γ1si1+γ2si2\mu(\bm{s}_{i})=\gamma_{0}+\gamma_{1}s_{i1}+\gamma_{2}s_{i2}, where 𝒔i=(si1,si2)\bm{s}_{i}=(s_{i1},s_{i2}). Given the cutoff points and κ2\kappa^{2}, a small value of μ(𝒔i)\mu(\bm{s}_{i}) favors large weights for the near neighbors of 𝒔i\bm{s}_{i}. A simpler version of the model in (9) is obtained by letting G𝒔iG_{\bm{s}_{i}} be the uniform distribution on (0,1)(0,1). Then the weights become k(𝒔i,𝒔(il)|𝜻)/l=1Lk(𝒔i,𝒔(il)|𝜻)k^{\prime}(\bm{s}_{i},\bm{s}_{(il)}\,|\,\bm{\zeta})/\sum_{l=1}^{L}k^{\prime}(\bm{s}_{i},\bm{s}_{(il)}\,|\,\bm{\zeta}). We notice that Cadonna et al. (2019) use a set of fixed, uniform cutoff points on [0,1][0,1], i.e., r𝒔i,lr𝒔i,l1=1/Lr_{\bm{s}_{i},l}-r_{\bm{s}_{i},l-1}=1/L, for spectral density estimation, with a collection of logit Gaussian distributions indexed by frequency.

The full Bayesian model is completed with prior specification for parameters 𝜽,𝜻,𝜸=(γ0,γ1,γ2)\bm{\theta},\bm{\zeta},\bm{\gamma}=(\gamma_{0},\gamma_{1},\gamma_{2})^{\top}, and κ2\kappa^{2}. The priors for 𝜽\bm{\theta} and 𝜻\bm{\zeta} depend on the choices of the densities f𝒔i,lf_{\bm{s}_{i},l} and the cutoff point kernel kk^{\prime}, respectively. For parameters 𝜸\bm{\gamma} and κ2\kappa^{2}, we specify N(𝜸|𝝁γ,𝑽γ)N(\bm{\gamma}\,|\,\bm{\mu}_{\gamma},\bm{V}_{\gamma}) and IG(κ2|uκ2,vκ2)\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}},v_{\kappa^{2}}) priors, respectively, where IG\mathrm{IG} denotes the inverse gamma distribution.

Finally, we note that an NNMP model requires selection of the neighborhood size LL. This can be done using standard model comparison metrics, scoring rules, or information criteria; for example, Datta et al. (2016a) used root mean square predictive error and Guinness (2018) used Kullback–Leibler divergence. In general, a larger LL increases computational cost. Datta et al. (2016a) conclude that a moderate value LL (20)(\leq 20) typically suffices for the nearest-neighbor Gaussian process models. Peruzzi et al. (2020) point out that a smaller LL corresponds to a larger Kullback–Leibler divergence of p~(𝒛𝒮)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}}) from p(𝒛𝒮)p(\bm{z}_{\operatorname*{\mathcal{S}}}), regardless of the distributional assumption of the density. Moreover, it is possible that information from the farthest neighbors is also important (Stein et al., 2004). Therefore, for large non-Gaussian data sets with complex dependence, one may seek a larger LL to obtain a better approximation to the full model. Our model for the weights allows taking a relatively large neighbor set with less computational demand. We assign small probabilities a priori to distant neighbors. The contribution of each neighbor will be induced by the mixing, with important neighbors being assigned large weights a posteriori.

3.2 Estimation and Prediction

We implement a Markov chain Monte Carlo sampler to simulate from the posterior distribution of the model parameters. To allow for efficient simulation of parameters 𝜸\bm{\gamma} and κ2\kappa^{2}, we associate each y(𝒔i)y(\bm{s}_{i}) with a latent Gaussian variable tit_{i} with mean μ(𝒔i)\mu(\bm{s}_{i}) and variance κ2\kappa^{2}. There is a one-to-one correspondence between the configuration variables i\ell_{i} and latent variables tit_{i}: i=l\ell_{i}=l if and only if ti(r𝒔i,l1,r𝒔i,l)t_{i}\in(r_{\bm{s}_{i},l-1}^{*},r_{\bm{s}_{i},l}^{*}) where r𝒔i,l=log(r𝒔i,l/(1r𝒔i,l))r^{*}_{\bm{s}_{i},l}=\log(r_{\bm{s}_{i},l}/(1-r_{\bm{s}_{i},l})), for l=1,,Ll=1,\dots,L. The posterior distribution of the model parameters, based on the new augmented model, is

p(𝜽,\displaystyle p(\bm{\theta}, 𝜻,𝜸,κ2,{ti}i=L+1n|𝒛𝒮)π𝜽(𝜽)×π𝜻(𝜻)×N(𝜸|𝝁γ,𝑽γ)×IG(κ2|uκ2,vκ2)\displaystyle\bm{\zeta},\bm{\gamma},\kappa^{2},\{t_{i}\}_{i=L+1}^{n}\,|\,\bm{z}_{\operatorname*{\mathcal{S}}})\propto\;\pi_{\bm{\theta}}(\bm{\theta})\times\pi_{\bm{\zeta}}(\bm{\zeta})\times N(\bm{\gamma}\,|\,\bm{\mu}_{\gamma},\bm{V}_{\gamma})\times\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}},v_{\kappa^{2}})
×N(𝒕|𝑫𝜸,κ2𝐈nL)×i=L+1nl=1Lf𝒔i,l(z(𝒔i)|z(𝒔(il)),𝜽) 1(r𝒔i,l1,r𝒔i,l)(ti),\displaystyle\times N(\bm{t}\,|\,\bm{D}\bm{\gamma},\,\kappa^{2}\mathbf{I}_{n-L})\times\prod_{i=L+1}^{n}\sum_{l=1}^{L}f_{\bm{s}_{i},l}(z(\bm{s}_{i})\,|\,z(\bm{s}_{(il)}),\bm{\theta})\,\mathbbm{1}_{(r^{*}_{\bm{s}_{i},l-1},r^{*}_{\bm{s}_{i},l})}(t_{i}),

where π𝜽\pi_{\bm{\theta}} and π𝜻\pi_{\bm{\zeta}} are the priors for 𝜽\bm{\theta} and 𝜻\bm{\zeta}, respectively, 𝐈nL\mathbf{I}_{n-L} is an (nL)×(nL)(n-L)\times(n-L) identity matrix, the vector 𝒕=(tL+1,,tn)\bm{t}=(t_{L+1},\dots,t_{n})^{\top}, and the matrix 𝑫\bm{D} is (nL)×3(n-L)\times 3 such that the iith row is (1,sL+i,1,sL+i,2)(1,s_{L+i,1},s_{L+i,2}).

The posterior full conditional distribution of 𝜽\bm{\theta} depends on the form of f𝒔i,lf_{\bm{s}_{i},l}. Details for the models implemented in Section 4 are given in the supplementary material. To update 𝜻\bm{\zeta}, we first marginalize out the latent variables tit_{i} from the joint posterior distribution. We then update 𝜻\bm{\zeta} using a random walk Metropolis step with target density π𝜻(𝜻)i=L+1n{G𝒔i(r𝒔i,i|μ(𝒔i),κ2)G𝒔i(r𝒔i,i1|μ(𝒔i),κ2)}\pi_{\bm{\zeta}}(\bm{\zeta})\prod_{i=L+1}^{n}\{G_{\bm{s}_{i}}(r_{\bm{s}_{i},\ell_{i}}\,|\,\mu(\bm{s}_{i}),\kappa^{2})-G_{\bm{s}_{i}}(r_{\bm{s}_{i},\ell_{i}-1}\,|\,\mu(\bm{s}_{i}),\kappa^{2})\}. The posterior full conditional distribution of the latent variable tit_{i} is l=1iLql(𝒔i)TN(ti|μ(𝒔i),κ2;r𝒔i,l1,r𝒔i,l)\sum_{l=1}^{i_{L}}q_{l}(\bm{s}_{i})\mathrm{TN}(t_{i}\,|\,\mu(\bm{s}_{i}),\kappa^{2};r^{*}_{\bm{s}_{i},l-1},r^{*}_{\bm{s}_{i},l}), where ql(𝒔i)wl(𝒔i)f𝒔i,l(z(𝒔i)|z(𝒔(il)),𝜽)q_{l}(\bm{s}_{i})\propto w_{l}(\bm{s}_{i})f_{\bm{s}_{i},l}(z(\bm{s}_{i})\,|\,z(\bm{s}_{(il)}),\bm{\theta}) and wl(𝒔i)=G𝒔i(r𝒔i,l|μ(𝒔i),κ2)G𝒔i(r𝒔i,l1|μ(𝒔i),κ2)w_{l}(\bm{s}_{i})=G_{\bm{s}_{i}}(r_{\bm{s}_{i},l}\,|\,\mu(\bm{s}_{i}),\kappa^{2})-G_{\bm{s}_{i}}(r_{\bm{s}_{i},l-1}\,|\,\mu(\bm{s}_{i}),\kappa^{2}), for l=1,,Ll=1,...,L. Hence, each tit_{i} can be updated by sampling from the ll-th truncated Gaussian with probability proportional to ql(𝒔i)q_{l}(\bm{s}_{i}). The posterior full conditional distribution of 𝜸\bm{\gamma} is N(𝜸|𝝁γ,𝑽γ)N(\bm{\gamma}\,|\,\bm{\mu}_{\gamma}^{*},\bm{V}_{\gamma}^{*}), where 𝑽γ=(𝑽γ1+κ2𝑫𝑫)1\bm{V}_{\gamma}^{*}=(\bm{V}_{\gamma}^{-1}+\kappa^{-2}\bm{D}^{\top}\bm{D})^{-1} and 𝝁γ=𝑽γ(𝑽γ1𝝁γ+κ2𝑫𝒕)\bm{\mu}_{\gamma}^{*}=\bm{V}_{\gamma}^{*}(\bm{V}_{\gamma}^{-1}\bm{\mu}_{\gamma}+\kappa^{-2}\bm{D}^{\top}\bm{t}). The posterior full conditional of κ2\kappa^{2} is IG(κ2|uκ2+(nL)/2,vκ2+i=L+1n(tiμ(𝒔i))2/2)\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}}+(n-L)/2,\,v_{\kappa^{2}}+\sum_{i=L+1}^{n}(t_{i}-\mu(\bm{s}_{i}))^{2}/2).

Turning to the prediction, let 𝒗0𝒟\bm{v}_{0}\in\mathcal{D}. We obtain posterior predictive samples of z(𝒗0)z(\bm{v}_{0}) as follows. If 𝒗0𝒮\bm{v}_{0}\notin\operatorname*{\mathcal{S}}, for each posterior sample of the parameters, we first compute the cutoff points r𝒗0,lr_{\bm{v}_{0},l}, such that r𝒗0,lr𝒗0,l1=k(𝒗0,𝒗(0l)|𝜻)/l=1Lk(𝒗0,𝒗(0l)|𝜻)r_{\bm{v}_{0},l}-r_{\bm{v}_{0},l-1}=k^{\prime}(\bm{v}_{0},\bm{v}_{(0l)}\,|\,\bm{\zeta})/\sum_{l=1}^{L}k^{\prime}(\bm{v}_{0},\bm{v}_{(0l)}\,|\,\bm{\zeta}), and obtain the weights wl(𝒗0)=G𝒗0(r𝒗0,l|μ(𝒗0),κ2)G𝒗0(r𝒗0,l1|μ(𝒗0),κ2)w_{l}(\bm{v}_{0})=G_{\bm{v}_{0}}(r_{\bm{v}_{0},l}\,|\,\,\mu(\bm{v}_{0}),\kappa^{2})-G_{\bm{v}_{0}}(r_{\bm{v}_{0},l-1}\,|\,\mu(\bm{v}_{0}),\kappa^{2}), for l=1,,Ll=1,\dots,L. We then predict z(𝒗0)z(\bm{v}_{0}) using (3). If 𝒗0𝒔i𝒮\bm{v}_{0}\equiv\bm{s}_{i}\in\operatorname*{\mathcal{S}}, we generate z(𝒗0)z(\bm{v}_{0}) similarly, but using samples for the weights collected from the posterior simulation, and applying (2) instead of (3) to generate z(𝒗0)z(\bm{v}_{0}).

4 Data Illustrations

4.1 Simulation Study

We have conducted several simulation experiments to study the inferential benefits of the NNMP modeling framework. Here, we present one synthetic data example. Three additional simulation examples are detailed in the supplementary material, the first demonstrates that Gaussian NNMP models can effectively approximate Gaussian random fields, the second illustrates the ability of skew-Gaussian NNMPs to handle data with different levels of skewness, and the third explores a copula NNMP model with beta marginals for bounded spatial data.

In this section, we demonstrate the use of copulas to construct NNMPs for tail dependence modeling. Our focus is on illustrating the flexibility of copula NNMPs for modeling complex dependence structures, and not specifically on extreme value modeling.

To simulate data, we created a regular grid of 200×200200\times 200 resolution on a unit square domain 𝒟\mathcal{D}, and generated over this grid a realization from random field y(𝒗)=F1(Tν(ω(𝒗))),y(\bm{v})=F^{-1}\big{(}T_{\nu}(\omega(\bm{v}))\big{)},
𝒗𝒟\bm{v}\in\mathcal{D}; see Fig. 1(a). Here, ω(𝒗)\omega(\bm{v}) is a standard Student-t process with tail parameter ν\nu and scale matrix specified by an exponential correlation function with range parameter ϕw\phi_{w}, and the distribution functions FF and TνT_{\nu} correspond to a gamma distribution, Ga(2,2)\mathrm{Ga}(2,2) with mean 11, and a standard Student-t distribution with tail parameter ν\nu, respectively. For a given pair of locations in 𝒟\mathcal{D} with correlation ρ0=exp(d0/ϕw)\rho_{0}=\exp(-d_{0}/\phi_{w}), the corresponding tail dependence coefficient of the random field is χν=2Tν+1((1+ν)(1ρ0)/(1+ρ0))\chi_{\nu}=2T_{\nu+1}\big{(}-\sqrt{(1+\nu)(1-\rho_{0})/(1+\rho_{0})}\big{)}. We took ϕw=1/12\phi_{w}=1/12, and chose ν=10\nu=10 so that the synthetic data exhibits moderate tail dependence at close distance, and the dependence decreases rapidly as the distance d0d_{0} becomes larger. In particular, for ρ0=\rho_{0}= 0.05,0.5,0.950.05,0.5,0.95, we obtain χ10=\chi_{10}= 0.01,0.08,0.610.01,0.08,0.61, respectively.

Refer to caption
(a) True y(𝒗)y(\bm{v})
Refer to caption
(b) Gaussian copula NNMP
Refer to caption
(c) Gumbel copula NNMP
Refer to caption
(d) Estimated marginals
Refer to caption
(e) Estimated probability (site 1)
Refer to caption
(f) Estimated probability (site 2)
Figure 1: Synthetic data example. Top panels are interpolated surfaces of the true field and posterior median estimates from both models. Bottom panels are estimated marginal densities and conditional survival probabilities from the two models. The green dashed lines correspond to the true model. The red (blue) dashed lines and shaded regions are the posterior mean and 95% credible interval estimates from the Gaussian (Gumbel) copula NNMP models.

We applied two copula NNMP models. The models are of the form in (7) with stationary gamma marginal Ga(a,b)\mathrm{Ga}(a,b) with mean a/ba/b. In the first model, the component copula density c𝒗,lc_{\bm{v},l} corresponds to a bivariate Gaussian copula, which is known to be unsuitable for tail dependence modeling. The spatially varying correlation parameter of the copula was specified by an exponential correlation function with range parameter ϕ1\phi_{1}. In the second model, we consider a spatially varying Gumbel copula as in Example 3. The spatially varying parameter of the copula density is defined with the link function ηl(𝒗)ηl(𝒗𝒗(l))=min{(1exp(𝒗𝒗(l)/ϕ2))1,50}\eta_{l}(\bm{v})\equiv\eta_{l}(||\bm{v}-\bm{v}_{(l)}||)=\min\{(1-\exp(-||\bm{v}-\bm{v}_{(l)}||/\phi_{2}))^{-1},50\}, where the upper bound 5050 ensures numerical stability. When ηl(d0)=50\eta_{l}(d_{0})=50, exp(d0/ϕ2)=0.98\exp(-d_{0}/\phi_{2})=0.98. With this link function, we assume that given ϕ2\phi_{2}, the strength of the tail dependence with respect to the llth component of the Gumbel model stays the same for any distance smaller than d0d_{0} between two locations. For the cutoff point kernels, we specified an exponential correlation function with range parameters ζ1\zeta_{1} and ζ2\zeta_{2}, respectively, for each model. The Bayesian model is completed with a IG(3,1/3)\mathrm{IG}(3,1/3) prior for ϕ1\phi_{1} and ϕ2\phi_{2}, a Ga(1,1)\mathrm{Ga}(1,1) prior for aa and bb, a IG(3,0.2)\mathrm{IG}(3,0.2) prior for ζ1\zeta_{1} and ζ2\zeta_{2}, and N(𝜸|(1.5,0,0), 2𝐈3)N(\bm{\gamma}\,|\,(-1.5,0,0)^{\top},\,2\mathbf{I}_{3}) and IG(κ2| 3,1)\mathrm{IG}(\kappa^{2}\,|\,3,1) priors.

We randomly selected 2000 locations as the reference set with a random ordering for model fitting. For the purposes of illustration, we chose neighbor size L=10L=10. Results are based on posterior samples collected every 10 iterations from a Markov chain of 30000 iterations, with the first 10000 samples being discarded. Implementation details for both models are provided in the supplementary material. The computing time was around 1818 minutes.

Fig. 1 shows the random fields, marginal densities, and conditional survival probabilities estimated by the two models. From Fig. 1(a)-1(c), we see that, comparing with the true field, the posterior median estimate by the Gumbel copula model seems to recover the large values better than the Gaussian copula model. Besides, as shown in Fig. 1(d), the Gumbel copula model provides a more accurate estimate of the marginal distribution, especially in the tails. We computed the conditional survival probabilities at two different unobserved sites marked in Fig. 1(a). In particular, Site 1/2 is surrounded by reference set observations with moderate/large values. The Gumbel copula model provides much more accurate estimates of the conditional survival probabilities, indicating that the model captures better the tail dependence structure in the data. Overall, this example demonstrates that the Gumbel copula NNMP model is a useful option for modeling spatial processes with tail dependence.

4.2 Mediterranean Sea Surface Temperature Data Analysis

The study of Ocean’s dynamics is crucial for understanding climate variability. One of the most valuable sources of information regarding the evolution of the state of the ocean is provided by the centuries-long record of temperature observations from the surface of the oceans. The record of sea surface temperatures consists of data collected over time at irregularly scattered locations. In this section, we examine the sea surface temperature from the Mediterranean Sea area during December 2003.

Refer to caption
(a) Sea surface temperature observations
Refer to caption
(b) Mediterranean Sea partitions
Refer to caption
(c) 50%50\% predicted sea surface temperature
Refer to caption
(d) 95%95\% credible interval width
Refer to caption
(e) Histograms of residuals
Figure 2: Mediterranean Sea surface temperature data analysis. Panel (a) shows observations. Panels (b) and (e) are partitions according to Mediterranean sub-basins and histograms of the residuals obtained from a non-spatial linear model. Panels (c) and (d) are posterior median and 95% credible interval estimates of the sea surface temperature from the extended skew-Gaussian NNMP model.

It is well known that the Mediterranean Sea area produces very heterogeneous temperature fields. A goal of the spatial analysis of sea surface temperature in the area is to generate a spatially continuous field that accounts for the complexity of the surrounding coastlines as well as the non-linear dynamics of the circulation system. An additional source of complexity comes from the data collection process. Historically, the observations are collected from different types of devices: buckets launched from navigating vessels, readings from the water intake of ships’ engine rooms, moored buoys, and drifting buoys (Kirsner and Sansó, 2020). The source of some observations is known, but not all the data are labelled. A thorough case study will be needed to include all this information in order to account for possible heterogeneities due to the different measuring devices. That is beyond the scope of this paper. We will focus on demonstrating the ability of the proposed framework to model non-Gaussian spatial processes that, hopefully, capture the complexities of the physical process and the data collection protocol. We notice that in the original record several sites had multiple observations. In those cases we took the median of the observations, resulting in a total of 1966 observations. The data are shown in Fig. 2(a).

We first examine the Gaussianity assumption for the data. We compare the Gaussian NNMP and the nearest-neighbor Gaussian process models over a subset of the region where the ocean dynamics are known to be complex and the observations are heterogeneous. The model comparison is detailed in the supplementary material and the results support the NNMP model.

In light of the evidence (Pisano et al., 2020) that sea surface temperature spatial patterns are different over Mediterranean sub-basins, shown in Fig. 2(b), which are characterized by different dynamics and high variability of surface currents (Bouzaiene et al., 2020), we further investigate the sea surface temperature over those sub-basins. We fitted a non-spatial linear model to all data, including longitude and latitude as covariates, and obtained residuals from the linear model. Fig. 2(e) shows that the histograms of the residuals are asymmetric over the sub-basins, indicating skewness in the marginal distribution, with levels of skewness that vary across sub-basins.

The exploratory data analysis suggests the need for a spatial model that can capture skewness. We thus analyze the full data set with an extension of the skew-Gaussian NNMP model. The new model has two features that extend the skew-Gaussian NNMP: (i) it incorporates a fixed effect through the location parameter of the mixture component; (ii) it allows the skewness parameter λ\lambda to vary in space. More specifically, the spatially varying conditional density f𝒗,lf_{\bm{v},l} of the model builds from a Gaussian random vector with mean (𝒙(𝒗)𝜷+λ(𝒗)z0(𝒗),𝒙(𝒗(l))𝜷+λ(𝒗(l))z0(𝒗))\left(\bm{x}(\bm{v})^{\top}\bm{\beta}+\lambda(\bm{v})z_{0}(\bm{v}),\,\bm{x}(\bm{v}_{(l)})^{\top}\bm{\beta}+\lambda(\bm{v}_{(l)})z_{0}(\bm{v})\right)^{\top} and covariance matrix σ2(1ρl(𝒗)ρl(𝒗)1)\sigma^{2}\left(\begin{smallmatrix}1&\rho_{l}(\bm{v})\\ \rho_{l}(\bm{v})&1\end{smallmatrix}\right), where 𝒙(𝒗)=(1,v1,v2)\bm{x}(\bm{v})=(1,v_{1},v_{2})^{\top} and z0(𝒗)TN(0,1;0,)z_{0}(\bm{v})\sim\mathrm{TN}(0,1;0,\infty), for all 𝒗\bm{v} and for all ll, and (v1,v2)(v_{1},v_{2}) are longitude and latitude. The conditional density p(y(𝒗)|𝒚Ne(𝒗))p(y(\bm{v})\,|\,\bm{y}_{\text{Ne}(\bm{v})}) of the extended model is

l=1Lwl(𝒗)0N(y(𝒗)|μl(𝒗),σl2(𝒗))TN(z0(𝒗)|μ0l(𝒗(l)),σ0l2(𝒗(l));0,)𝑑z0(𝒗),\sum_{l=1}^{L}w_{l}(\bm{v})\,\int_{0}^{\infty}N(y(\bm{v})\,|\,\mu_{l}(\bm{v}),\sigma_{l}^{2}(\bm{v}))\mathrm{TN}(z_{0}(\bm{v})\,|\,\mu_{0l}(\bm{v}_{(l)}),\sigma_{0l}^{2}(\bm{v}_{(l)});0,\infty)dz_{0}(\bm{v}), (10)

with parameters μl(𝒗)=𝒙(𝒗)𝜷+λ(𝒗)z0(𝒗)+ρl(𝒗){y(𝒗(l))𝒙(𝒗(l))𝜷λ(𝒗(l))z0(𝒗)}\mu_{l}(\bm{v})=\bm{x}(\bm{v})^{\top}\bm{\beta}+\lambda(\bm{v})z_{0}(\bm{v})+\rho_{l}(\bm{v})\{y(\bm{v}_{(l)})-\bm{x}(\bm{v}_{(l)})^{\top}\bm{\beta}-\lambda(\bm{v}_{(l)})z_{0}(\bm{v})\}, σl2(𝒗)=σ2{1(ρl(𝒗))2}\,\sigma_{l}^{2}(\bm{v})=\sigma^{2}\{1-(\rho_{l}(\bm{v}))^{2}\}, μ0l(𝒗(l))={y(𝒗(l))𝒙(𝒗(l))𝜷}λ(𝒗(l))/{σ2+(λ(𝒗(l)))2}\,\mu_{0l}(\bm{v}_{(l)})=\{y(\bm{v}_{(l)})-\bm{x}(\bm{v}_{(l)})^{\top}\bm{\beta}\}\lambda(\bm{v}_{(l)})/\{\sigma^{2}+(\lambda(\bm{v}_{(l)}))^{2}\}, and σ0l2(𝒗(l))=σ2/{σ2+(λ(𝒗(l)))2}\sigma_{0l}^{2}(\bm{v}_{(l)})=\sigma^{2}/\{\sigma^{2}+(\lambda(\bm{v}_{(l)}))^{2}\}. After marginalizing out z0(𝒗)z_{0}(\bm{v}), we obtain that the marginal distribution of Y(𝒗)Y(\bm{v}) is SN(𝒙(𝒗)𝜷,(λ(𝒗))2+σ2,λ(𝒗)/σ)\mathrm{SN}\big{(}\bm{x}(\bm{v})^{\top}\bm{\beta},(\lambda(\bm{v}))^{2}+\sigma^{2},\lambda(\bm{v})/\sigma\big{)}, based on the result of Proposition 1. We model the spatially varying λ(𝒗)\lambda(\bm{v}) via a partitioning approach. In particular, we partition the Mediterranean Sea 𝒟\mathcal{D} according to the sub-basins, that is, 𝒟=k=1KPk\mathcal{D}=\cup_{k=1}^{K}P_{k}, PiPj=P_{i}\cap P_{j}=\emptyset for iji\neq j, where K=5K=5. For all 𝒗Pk\bm{v}\in P_{k}, we take λ(𝒗)=λk\lambda(\bm{v})=\lambda_{k}, for k=1,,Kk=1,\dots,K. The partitions P1,,PKP_{1},\dots,P_{K} correspond to the sub-basins: Westernmost Mediterranean Sea, Tyrrhenian Sea, Adriatic Sea, Ionian Sea, and Levantine-Aegean Sea.

We applied the extended skew-Gaussian NNMP model (10) using the whole data set as the reference set, with LL chosen to be 10,1510,15 or 2020. The regression parameters 𝜷=(β0,β1,β2)\bm{\beta}=(\beta_{0},\beta_{1},\beta_{2})^{\top} were assigned mean-zero, dispersed normal priors. For the skewness parameters 𝝀=(λ1,,λ5)\bm{\lambda}=(\lambda_{1},\dots,\lambda_{5}), each element received a N(0,5)N(0,5) prior. We used the same prior specification for other parameters as in the first simulation experiment. Posterior inference was based on thinned samples retaining every 4th iteration, from a total of 30000 samples with a burn-in of 10000 samples. The computing time was around 14, 16, and 20 minutes, respectively, for each of the LL values.

We focus on the estimation of regression and skewness parameters 𝜷\bm{\beta} and 𝝀\bm{\lambda}. We report the estimates for L=15L=15; they were similar for L=10L=10 or 2020. The posterior mean and 95%95\% credible interval estimates of β0\beta_{0}, β1\beta_{1}, and β2\beta_{2} were 30.51(28.88,32.16)30.51\,(28.88,32.16), 0.12(0.09,0.15)0.12\,(0.09,0.15), and 0.37(0.42,0.33)-0.37\,(-0.42,-0.33), indicating that there was an increasing trend in sea surface temperature as longitude increased and latitude decreased. The corresponding posterior estimates of 𝝀\bm{\lambda} were 0.38(0.94,0.14)-0.38\,(-0.94,0.14), 1.37(2.10,0.71)-1.37\,(-2.10,-0.71), 2.44(4.03,1.14)-2.44\,(-4.03,-1.14), 1.60(2.54,0.86)-1.60\,(-2.54,-0.86), and 2.69(3.95,1.82)-2.69\,(-3.95,-1.82). These estimates suggest different levels of left skewness over the sub-basins except for the Westernmost Mediterranean Sea.

Fig. 2(c) provides the temperature posterior median estimate over a dense grid of locations on the Mediterranean Sea. Compared to Fig. 2(a), the estimate generally resembles the observed pattern. The prediction was quite smooth even for areas with few observations. The 95% credible interval width of the prediction over the gridded locations, as shown in Fig. 2(d), demonstrates that the model describes the uncertainty in accordance with the observed data structure; the uncertainty is higher in areas where there are less observations or the observations are volatile.

5 Discussion

We have introduced a class of geostatistical models for non-Gaussian processes, and demonstrated its flexibility for modeling complex dependence by specification of a collection of bivariate distributions indexed in space. The scope of the methodology has been illustrated through synthetic spatial data examples corresponding to distributions of different nature and support, and with the analysis of sea surface temperature measurements from the Mediterranean Sea.

To incorporate covariates, the NNMP can be embedded in an additive or multiplicative model. The former is illustrated in the supplementary material with a spatially varying regression model. Under an additive model, the posterior simulation algorithm requires extra care as it involves sequential updating of the elements in 𝒛𝒮\bm{z}_{\operatorname*{\mathcal{S}}}. This may induce slow convergence behavior. An alternative strategy for covariate inclusion is to model the weights or some parameter(s) of the spatially varying conditional density as a function of covariates. For example, in Section 4.2, we modeled the location parameter of the skew-Gaussian marginal as a linear function of the covariates. Posterior simulation under this approach is easily developed by modifying the update of the relevant parameters discussed in Section 3.2 to that of the regression coefficients.

The NNMP model structure not only bypasses all the potential issues from large matrix operations, but it also enhances modeling power. Kernel functions, such as wave covariance functions, that are impractical for Gaussian process-based models due to numerical instability from matrix inversion, can be used as link functions for the spatially varying parameter of the NNMP. One limitation of the NNMP’s computation, similar to mixture models, is that the posterior simulation algorithm may experience slow convergence issues. Further development is needed on efficient algorithms for fast computation, especially when dealing with massive, complex data sets.

We have focused in this article on a modeling framework for continuous data. The proposed approach can be naturally extended to modeling discrete spatial data. Modeling options for geostatistical count data in the existing literature involve either spatial generalized linear mixed models or spatial copula models (Madsen, 2009). However, owing to their structures, both models have limitations with respect to the distributional assumption for the spatial random effects, as well as in computational efficiency. The extension to discrete NNMP models has the potential to provide both inferential and computational benefits in modeling large discrete data sets.

It is also interesting to explore the opportunities for the analysis of spatial extremes using the NNMP framework. We developed guidelines in Section 2.4 to choose NNMP mixture components based on strength of tail dependence. The results highlight the ability of the NNMP model structure to capture local tail dependence at different levels, controlled by the mixture component bivariate distributions, for instance, with a class of bivariate extreme-value copulas. Moreover, using NNMPs for spatial extreme value modeling allows for efficient implementation of inference which is typically a challenge with existing approaches (Davison et al., 2012).

Other research directions include extensions to multivariate and spatio-temporal settings. The former extension requires families of high-dimensional multivariate distributions to construct an NNMP. Effective strategies will be needed to define the spatially varying multivariate distributions that balance flexibility and scalability. When it comes to a joint model over time and space, there is large scope for exploring the integration of the time component into the model, including extending the NNMP weights or the NNMP mixture components.

Supplementary Material

The supplementary material includes proofs of the propositions, additional data examples, and implementation details.

References

  • Allcroft and Glasbey (2003) Allcroft, D. J. and Glasbey, C. A. (2003), “A latent Gaussian Markov random-field model for spatiotemporal rainfall disaggregation,” Journal of the Royal Statistical Society: Series C (Applied Statistics), 52, 487–498.
  • Arnold et al. (1999) Arnold, B. C., Castillo, E., and Sarabia, J. M. (1999), Conditional specification of statistical models, New York: Springer.
  • Azzalini (2013) Azzalini, A. (2013), The skew-normal and related families, Cambridge, UK: Cambridge University Press.
  • Bárdossy (2006) Bárdossy, A. (2006), “Copula-based geostatistical models for groundwater quality parameters,” Water Resources Research, 42.
  • Beck et al. (2020) Beck, N., Genest, C., Jalbert, J., and Mailhot, M. (2020), “Predicting extreme surges from sparse data using a copula-based hierarchical Bayesian spatial model,” Environmetrics, 31, e2616.
  • Bevilacqua et al. (2021) Bevilacqua, M., Caamaño-Carrillo, C., Arellano-Valle, R. B., and Morales-Oñate, V. (2021), “Non-Gaussian geostatistical modeling using (skew) t processes,” Scandinavian Journal of Statistics, 48, 212–245.
  • Bevilacqua et al. (2020) Bevilacqua, M., Caamaño-Carrillo, C., and Gaetan, C. (2020), “On modeling positive continuous data with spatiotemporal dependence,” Environmetrics, 31, e2632.
  • Bolin (2014) Bolin, D. (2014), “Spatial Matérn fields driven by non-Gaussian noise,” Scandinavian Journal of Statistics, 41, 557–579.
  • Bouzaiene et al. (2020) Bouzaiene, M., Menna, M., Poulain, P.-M., Bussani, A., and Elhmaidi, D. (2020), “Analysis of the Surface Dispersion in the Mediterranean Sub-Basins,” Frontiers in Marine Science, 7, 486.
  • Cadonna et al. (2019) Cadonna, A., Kottas, A., and Prado, R. (2019), “Bayesian spectral modeling for multiple time series,” Journal of the American Statistical Association, 114, 1838–1853.
  • Danaher and Smith (2011) Danaher, P. J. and Smith, M. S. (2011), “Modeling multivariate distributions using copulas: Applications in marketing,” Marketing Science, 30, 4–21.
  • Datta et al. (2016a) Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016a), “Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets,” Journal of the American Statistical Association, 111, 800–812.
  • Datta et al. (2016b) Datta, A., Banerjee, S., Finley, A. O., Hamm, N. A., and Schaap, M. (2016b), “Nonseparable dynamic nearest neighbor Gaussian process models for large spatio-temporal data with an application to particulate matter analysis,” The Annals of Applied Statistics, 10, 1286.
  • Davison et al. (2012) Davison, A. C., Padoan, S. A., and Ribatet, M. (2012), “Statistical modeling of spatial extremes,” Statistical science, 27, 161–186.
  • De Oliveira et al. (1997) De Oliveira, V., Kedem, B., and Short, D. A. (1997), “Bayesian prediction of transformed Gaussian random fields,” Journal of the American Statistical Association, 92, 1422–1433.
  • Diggle et al. (1998) Diggle, P. J., Tawn, J. A., and Moyeed, R. A. (1998), “Model-based geostatistics,” Journal of the Royal Statistical Society: Series C (Applied Statistics), 47, 299–350.
  • Eskelson et al. (2011) Eskelson, B. N., Madsen, L., Hagar, J. C., and Temesgen, H. (2011), “Estimating riparian understory vegetation cover with beta regression and copula models,” Forest Science, 57, 212–221.
  • Ferguson (1973) Ferguson, T. S. (1973), “A Bayesian analysis of some nonparametric problems,” The Annals of Statistics, 209–230.
  • Finley et al. (2019) Finley, A. O., Datta, A., Cook, B. D., Morton, D. C., Andersen, H. E., and Banerjee, S. (2019), “Efficient algorithms for Bayesian nearest neighbor Gaussian processes,” Journal of Computational and Graphical Statistics, 28, 401–414.
  • Gelfand et al. (2005) Gelfand, A. E., Kottas, A., and MacEachern, S. N. (2005), “Bayesian nonparametric spatial modeling with Dirichlet process mixing,” Journal of the American Statistical Association, 100, 1021–1035.
  • Ghosh and Mallick (2011) Ghosh, S. and Mallick, B. K. (2011), “A hierarchical Bayesian spatio-temporal model for extreme precipitation events,” Environmetrics, 22, 192–204.
  • Gräler (2014) Gräler, B. (2014), “Modelling skewed spatial random fields through the spatial vine copula,” Spatial Statistics, 10, 87–102.
  • Gramacy and Apley (2015) Gramacy, R. B. and Apley, D. W. (2015), “Local Gaussian process approximation for large computer experiments,” Journal of Computational and Graphical Statistics, 24, 561–578.
  • Guinness (2018) Guinness, J. (2018), “Permutation and grouping methods for sharpening Gaussian process approximations,” Technometrics, 60, 415–429.
  • Hua and Joe (2014) Hua, L. and Joe, H. (2014), “Strength of tail dependence based on conditional tail expectation,” Journal of Multivariate Analysis, 123, 143–159.
  • Joe (2014) Joe, H. (2014), Dependence modeling with copulas, Boca Raton, FL: CRC Press.
  • Johns et al. (2003) Johns, C. J., Nychka, D., Kittel, T. G. F., and Daly, C. (2003), “Infilling sparse records of spatial fields,” Journal of the American Statistical Association, 98, 796–806.
  • Katzfuss and Guinness (2021) Katzfuss, M. and Guinness, J. (2021), “A general framework for Vecchia approximations of Gaussian processes,” Statistical Science, 36, 124–141.
  • Kim and Mallick (2004) Kim, H.-M. and Mallick, B. K. (2004), “A Bayesian prediction using the skew Gaussian distribution,” Journal of Statistical Planning and Inference, 120, 85–101.
  • Kirsner and Sansó (2020) Kirsner, D. and Sansó, B. (2020), “Multi-scale shotgun stochastic search for large spatial datasets,” Computational Statistics & Data Analysis, 146, 106931.
  • Krupskii et al. (2018) Krupskii, P., Huser, R., and Genton, M. G. (2018), “Factor copula models for replicated spatial data,” Journal of the American Statistical Association, 113, 467–479.
  • Lauritzen (1996) Lauritzen, S. L. (1996), Graphical models, Oxford, UK: Clarendon Press.
  • Le et al. (1996) Le, N. D., Martin, R. D., and Raftery, A. E. (1996), “Modeling flat stretches, bursts outliers in time series using mixture transition distribution models,” Journal of the American Statistical Association, 91, 1504–1515.
  • Madsen (2009) Madsen, L. (2009), “Maximum likelihood estimation of regression parameters with spatially dependent discrete data,” Journal of Agricultural, Biological, and Environmental Statistics, 14, 375–391.
  • Mahmoudian (2017) Mahmoudian, B. (2017), “A skewed and heavy-tailed latent random field model for spatial extremes,” Journal of Computational and Graphical Statistics, 26, 658–670.
  • Morris et al. (2017) Morris, S. A., Reich, B. J., Thibaud, E., and Cooley, D. (2017), “A space-time skew-t model for threshold exceedances,” Biometrics, 73, 749–758.
  • Müller et al. (2018) Müller, P., Quintana, F. A., and Page, G. (2018), “Nonparametric Bayesian inference in applications,” Statistical Methods & Applications, 27, 175–206.
  • North et al. (2011) North, G. R., Wang, J., and Genton, M. G. (2011), “Correlation models for temperature fields,” Journal of Climate, 24, 5850–5862.
  • Palacios and Steel (2006) Palacios, M. B. and Steel, M. F. J. (2006), “Non-Gaussian Bayesian geostatistical modeling,” Journal of the American Statistical Association, 101, 604–618.
  • Peruzzi et al. (2020) Peruzzi, M., Banerjee, S., and Finley, A. O. (2020), “Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains,” Journal of the American Statistical Association, 0, 1–14.
  • Peruzzi and Dunson (2022) Peruzzi, M. and Dunson, D. B. (2022), “Spatial Multivariate Trees for Big Data Bayesian Regression.” Journal of Machine Learning Research, 23, 17–1.
  • Pisano et al. (2020) Pisano, A., Marullo, S., Artale, V., Falcini, F., Yang, C., Leonelli, F. E., Santoleri, R., and Buongiorno Nardelli, B. (2020), “New evidence of Mediterranean climate change and variability from sea surface temperature observations,” Remote Sensing, 12, 132.
  • Sklar (1959) Sklar, M. (1959), “Fonctions de repartition an dimensions et leurs marges,” Publications de l’Institut de Statistique de L’Université de Paris, 8, 229–231.
  • Stein et al. (2004) Stein, M. L., Chi, Z., and Welty, L. J. (2004), “Approximating likelihoods for large spatial data sets,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 275–296.
  • Sun et al. (2015) Sun, Y., Stein, M. L., et al. (2015), “A stochastic space-time model for intermittent precipitation occurrences,” The Annals of Applied Statistics, 9, 2110–2132.
  • Vecchia (1988) Vecchia, A. V. (1988), “Estimation and model identification for continuous spatial processes,” Journal of the Royal Statistical Society: Series B (Methodological), 50, 297–312.
  • Wallin and Bolin (2015) Wallin, J. and Bolin, D. (2015), “Geostatistical modelling using non-Gaussian Matérn fields,” Scandinavian Journal of Statistics, 42, 872–890.
  • Xu and Genton (2017) Xu, G. and Genton, M. G. (2017), “Tukey g-and-h random fields,” Journal of the American Statistical Association, 112, 1236–1249.
  • Zareifard et al. (2018) Zareifard, H., Khaledi, M. J., Rivaz, F., Vahidi-Asl, M. Q., et al. (2018), “Modeling skewed spatial data using a convolution of Gaussian and log-Gaussian processes,” Bayesian Analysis, 13, 531–557.
  • Zhang and El-Shaarawi (2010) Zhang, H. and El-Shaarawi, A. (2010), “On spatial skew-Gaussian processes and applications,” Environmetrics, 21, 33–47.
  • Zheng et al. (2022) Zheng, X., Kottas, A., and Sansó, B. (2022), “On Construction and Estimation of Stationary Mixture Transition Distribution Models,” Journal of Computational and Graphical Statistics, 31, 283–293.
  • Zilber and Katzfuss (2021) Zilber, D. and Katzfuss, M. (2021), “Vecchia–Laplace approximations of generalized Gaussian processes for big non-Gaussian spatial data,” Computational Statistics & Data Analysis, 153, 107081.

Supplementary Material for “Nearest-Neighbor Mixture Models for Non-Gaussian Spatial Processes”

A Proofs

Proof of Proposition 1.

We consider a univariate spatial process {Z(𝒗),𝒗𝒟}\{Z(\bm{v}),\bm{v}\in\mathcal{D}\}, where Z(𝒗)Z(\bm{v}) takes values in 𝒳\mathcal{X}\subseteq\mathbb{R}, and 𝒟p,p1\mathcal{D}\subset\mathbb{R}^{p},p\geq 1. Let 𝒮𝒟\operatorname*{\mathcal{S}}\subset\mathcal{D} be a reference set. Without loss of generality, we consider the continuous case, i.e., Z(𝒗)Z(\bm{v}) has a continuous distribution for which its density exists, for all 𝒗𝒟\bm{v}\in\mathcal{D}. To verify the proposition, we partition the domain 𝒟\mathcal{D} into the reference set 𝒮\operatorname*{\mathcal{S}} and the nonreference set 𝒰\operatorname*{\mathcal{U}}.

Given any 𝒗𝒟\bm{v}\in\mathcal{D}, consider a bivariate random vector indexed at 𝒗\bm{v}, denoted as (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) taking values in 𝒳×𝒳\mathcal{X}\times\mathcal{X}. We denote f𝒗,lf_{\bm{v},l} as the conditional density of U𝒗,lU_{\bm{v},l} given V𝒗,lV_{\bm{v},l}, and fU𝒗,l,fV𝒗,lf_{U_{\bm{v},l}},f_{V_{\bm{v},l}} as the marginal densities of U𝒗,lU_{\bm{v},l}, V𝒗,lV_{\bm{v},l}, respectively. Using the proposition assumption that fZ=fU𝒗,l=fV𝒗,lf_{Z}=f_{U_{\bm{v},l}}=f_{V_{\bm{v},l}}, we have that

𝒳f𝒗,l(uv)fZ(v)𝑑v=𝒳f𝒗,l(uv)fV𝒗,l(v)𝑑v=fU𝒗,l(u)=fZ(u),\int_{\mathcal{X}}f_{\bm{v},l}(u\mid v)f_{Z}(v)dv=\int_{\mathcal{X}}f_{\bm{v},l}(u\mid v)f_{V_{\bm{v},l}}(v)dv=f_{U_{\bm{v},l}}(u)=f_{Z}(u), (1)

for every 𝒗𝒟\bm{v}\in\mathcal{D} and for all ll.

We first prove the result for the reference set 𝒮\operatorname*{\mathcal{S}}. By the model assumption, locations in 𝒮\operatorname*{\mathcal{S}} are ordered. In this regard, using the proposition assumptions, we can show that Z(𝒔)fZZ(\bm{s})\sim f_{Z} for all 𝒔𝒮\bm{s}\in\operatorname*{\mathcal{S}} by applying Proposition 1 in (Zheng et al., 2022).

Turning to the nonreference set 𝒰\operatorname*{\mathcal{U}}. Let g𝒖(z(𝒖))g_{\bm{u}}(z(\bm{u})) be the marginal density of Z(𝒖)Z(\bm{u}) for every 𝒖𝒰\bm{u}\in\operatorname*{\mathcal{U}}. Denote by p~(𝒛Ne(𝒖))\tilde{p}(\bm{z}_{\text{Ne}(\bm{u})}) the joint density for the random vector 𝒛Ne(𝒖)\bm{z}_{\text{Ne}(\bm{u})} where Ne(𝒖)={𝒖(1),,𝒖(L)}𝒮\text{Ne}(\bm{u})=\{\bm{u}_{(1)},\dots,\bm{u}_{(L)}\}\subset\operatorname*{\mathcal{S}}, so every element of 𝒁Ne\bm{Z}_{\text{Ne}} has marginal density fZf_{Z}. Then, the marginal density for Z(𝒖)Z(\bm{u}) is given by:

g𝒖(z(𝒖))\displaystyle g_{\bm{u}}(z(\bm{u})) =𝒳Lp(z(𝒖)𝒛Ne(𝒖))p~(𝒛Ne(𝒖)){𝒔iNe(𝒖)}d(z(𝒔i))\displaystyle=\int_{\mathcal{X}^{L}}p(z(\bm{u})\mid\bm{z}_{\text{Ne}(\bm{u})})\tilde{p}(\bm{z}_{\text{Ne}(\bm{u})})\prod_{\{\bm{s}_{i}\in\text{Ne}(\bm{u})\}}d(z(\bm{s}_{i}))
=l=1Lwl(𝒗)𝒳Lf𝒗,l(z(𝒖)z(𝒖(l)))p~(𝒛Ne(𝒖)){𝒔iNe(𝒖),𝒔i𝒖(l)}d(z(𝒔i))\displaystyle=\sum_{l=1}^{L}w_{l}(\bm{v})\int_{\mathcal{X}^{L}}f_{\bm{v},l}(z(\bm{u})\mid z(\bm{u}_{(l)}))\tilde{p}(\bm{z}_{\text{Ne}(\bm{u})})\prod_{\{\bm{s}_{i}\in\text{Ne}(\bm{u}),\bm{s}_{i}\neq\bm{u}_{(l)}\}}d(z(\bm{s}_{i}))
=l=1Lwl(𝒗)𝒳f𝒗,l(z(𝒖)z(𝒖(l)))fZ(z(𝒖(l)))d(z(𝒖(l)))\displaystyle=\sum_{l=1}^{L}w_{l}(\bm{v})\int_{\mathcal{X}}f_{\bm{v},l}(z(\bm{u})\mid z(\bm{u}_{(l)}))f_{Z}(z(\bm{u}_{(l)}))d(z(\bm{u}_{(l)}))
=fZ(z(𝒖)),\displaystyle=f_{Z}(z(\bm{u})),

where the second-to-last equality holds by the result that Z(𝒔)fZZ(\bm{s})\sim f_{Z} for all 𝒔𝒮\bm{s}\in\operatorname*{\mathcal{S}} and Ne(𝒖)𝒮\text{Ne}(\bm{u})\subset\operatorname*{\mathcal{S}} for every 𝒖𝒰\bm{u}\in\operatorname*{\mathcal{U}}. The last equality follows from (1).

Proof of Proposition 2.

We verify the proposition by partitioning the domain 𝒟\mathcal{D} into the reference set 𝒮\operatorname*{\mathcal{S}} and the nonreference set 𝒰\operatorname*{\mathcal{U}}. We first prove by induction the result for the joint distribution p~(𝒛𝒮)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}}) over 𝒮\operatorname*{\mathcal{S}}. Then to complete the proof, it suffices to show that for every location 𝒖𝒰\bm{u}\in\operatorname*{\mathcal{U}}, the joint density p~(𝒛𝒰1)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{U}}_{1}}) is a mixture of multivariate Gaussian distributions, where 𝒰1=𝒮{𝒖}\operatorname*{\mathcal{U}}_{1}=\operatorname*{\mathcal{S}}\cup\{\bm{u}\}.

Without loss of generality, we assume μ=0\mu=0 for the stationary Gaussian NNMP, i.e., the Gaussian NNMP has invariant marginal fZ(z)=N(z| 0,σ2)f_{Z}(z)=N(z\,|\,0,\sigma^{2}). The conditional density for the reference set is p(z(𝒔i)|𝒛Ne(𝒔i))=l=1iLwl(𝒔i)N(z(𝒔i)|ρl(𝒔i)z(𝒔(il)),σ2(1(ρl(𝒔i))2))p(z(\bm{s}_{i})|\bm{z}_{\text{Ne}(\bm{s}_{i})})=\sum_{l=1}^{i_{L}}w_{l}(\bm{s}_{i})N(z(\bm{s}_{i})|\rho_{l}(\bm{s}_{i})z(\bm{s}_{(il)}),\sigma^{2}(1-(\rho_{l}(\bm{s}_{i}))^{2})), where for, i=2,,Li=2,\dots,L, iL=i1i_{L}=i-1, and for i>Li>L, iL=Li_{L}=L. For each ii, we denote as {wi,li}li=1iL\{w_{i,l_{i}}\}_{l_{i}=1}^{i_{L}} the vector of mixture weights, as {ρi,li}li=1iL\{\rho_{i,l_{i}}\}_{l_{i}=1}^{i_{L}} the vector of the correlation coefficients, and as {zi,li}li=1iL\{z_{i,l_{i}}\}_{l_{i}=1}^{i_{L}} the vector of the nearest neighbors of ziz(𝒔i)z_{i}\equiv z(\bm{s}_{i}), for i2i\geq 2, where wi,liwli(𝒔i)w_{i,l_{i}}\equiv w_{l_{i}}(\bm{s}_{i}), ρi,liρli(𝒔i)\rho_{i,l_{i}}\equiv\rho_{l_{i}}(\bm{s}_{i}), zi,liz(𝒔(i,li))z_{i,l_{i}}\equiv z(\bm{s}_{(i,l_{i})}). Let z1z(𝒔1)z_{1}\equiv z(\bm{s}_{1}). We denote by 𝒛1:k=(z(𝒔1),,z(𝒔k))\bm{z}_{1:k}=(z(\bm{s}_{1}),\dots,z(\bm{s}_{k})) the realization of Z(𝒔)Z(\bm{s}) over locations (𝒔1,,𝒔k)(\bm{s}_{1},\dots,\bm{s}_{k})^{\top} for k2k\geq 2, and use 𝒛1:kzj\bm{z}_{1:k}^{-z_{j}} to denote the random vector 𝒛1:k\bm{z}_{1:k} with element zjz_{j} removed, 1jk1\leq j\leq k. In the following, for a vector 𝒂=(a1,,am)\bm{a}=(a_{1},\dots,a_{m})^{\top}, we have that 𝒂c=(a1c,,amc)\bm{a}c=(a_{1}c,\dots,a_{m}c)^{\top}, where cc is a scalar.

Take Z1N(z1| 0,σ2)Z_{1}\sim N(z_{1}\,|\,0,\sigma^{2}). When i=2i=2, iL=1i_{L}=1 and w2,1=1w_{2,1}=1. The joint density of 𝒛1:2\bm{z}_{1:2} is p~(𝒛1:2)=N(z2|ρ2,1z1),σ2(1ρ2,12))N(z1|0,σ2)=N(𝒛1:2|𝟎,σ2𝛀2,1)\tilde{p}(\bm{z}_{1:2})=N(z_{2}|\rho_{2,1}z_{1}),\sigma^{2}(1-\rho_{2,1}^{2}))N(z_{1}|0,\sigma^{2})=N(\bm{z}_{1:2}|\bm{0},\sigma^{2}\bm{\Omega}_{2,1}), where 𝛀2,1=(1ρ2,1ρ2,11)\bm{\Omega}_{2,1}=\left(\begin{smallmatrix}1&\rho_{2,1}\\ \rho_{2,1}&1\end{smallmatrix}\right). The joint density of 𝒛1:3\bm{z}_{1:3} is

p~(𝒛1:3)\displaystyle\tilde{p}(\bm{z}_{1:3}) =p3(z3|𝒛1:2)p~(𝒛1:2)\displaystyle=p_{3}(z_{3}|\bm{z}_{1:2})\tilde{p}(\bm{z}_{1:2})
=l3=12w3,l3N(z3|ρ3,l3z3,l3,σ2(1ρ3,l32))N(𝒛1:2|𝟎,σ2𝛀2,1)\displaystyle=\sum_{l_{3}=1}^{2}w_{3,l_{3}}N(z_{3}|\rho_{3,l_{3}}z_{3,l_{3}},\sigma^{2}(1-\rho_{3,l_{3}}^{2}))N(\bm{z}_{1:2}|\bm{0},\sigma^{2}\bm{\Omega}_{2,1})
=l3=12w3,l3N(z3|ρ3,l3z3,l3,σ2(1ρ3,l32))N(𝒛1:2z3,l3|ρ2,l2z3,l3,σ2(1ρ2,12))N(z3,l3|0,σ2)\displaystyle=\sum_{l_{3}=1}^{2}w_{3,l_{3}}N(z_{3}|\rho_{3,l_{3}}z_{3,l_{3}},\sigma^{2}(1-\rho_{3,l_{3}}^{2}))N(\bm{z}_{1:2}^{-z_{3,l_{3}}}|\rho_{2,l_{2}}z_{3,l_{3}},\sigma^{2}(1-\rho_{2,1}^{2}))N(z_{3,l_{3}}|0,\sigma^{2})
=l3=12w3,l3N((z3,𝒛1:2z3,l3)|𝒎3,l3z3,l3,𝑽3,l3)N(z3,l3|0,σ2)\displaystyle=\sum_{l_{3}=1}^{2}w_{3,l_{3}}N((z_{3},\bm{z}_{1:2}^{-z_{3,l_{3}}})^{\top}|\bm{m}_{3,l_{3}}z_{3,l_{3}},\bm{V}_{3,l_{3}})N(z_{3,l_{3}}|0,\sigma^{2})

where 𝒎3,l3=(ρ3,l3,ρ2,1)\bm{m}_{3,l_{3}}=(\rho_{3,l_{3}},\rho_{2,1})^{\top}, and 𝑽3,l3=(σ2(1ρ3,l32)00σ2(1ρ2,12))\bm{V}_{3,l_{3}}=\left(\begin{smallmatrix}\sigma^{2}(1-\rho_{3,l_{3}}^{2})&0\\ 0&\sigma^{2}(1-\rho_{2,1}^{2})\end{smallmatrix}\right). The last equality follows from the fact that a product of conditionally independent Gaussian densities is a Gaussian density. By the properties of the Gaussian distribution and the property of the model that has a stationary marginal N(0,σ2)N(0,\sigma^{2}), for each l3l_{3}, we have that

N(𝒛~1:3,l3|𝟎,σ2𝑹3,l3)=N((z3,𝒛1:2z3,l3)|𝒎3,l3z3,l3,𝑽3,l3)N(z3,l3|0,σ2),N(\tilde{\bm{z}}_{1:3,l_{3}}|\bm{0},\sigma^{2}\bm{R}_{3,l_{3}})=N((z_{3},\bm{z}_{1:2}^{-z_{3,l_{3}}})^{\top}|\bm{m}_{3,l_{3}}z_{3,l_{3}},\bm{V}_{3,l_{3}})N(z_{3,l_{3}}|0,\sigma^{2}),

where 𝒛~1:3,l3=(z3,𝒛1:2z3,l3,z3,l3)\tilde{\bm{z}}_{1:3,l_{3}}=(z_{3},\bm{z}_{1:2}^{-z_{3,l_{3}}},z_{3,l_{3}})^{\top}, with the following partition relevant to the vector 𝒛~1:3,l3\tilde{\bm{z}}_{1:3,l_{3}},

𝒛~1:3,l3=((z3,𝒛1:2z3,l3)z3,l3),E(𝒛~1:3,l3)=(𝟎0),𝑹3,l3=(𝑹3,l3(11)𝑹3,l3(12)𝑹3,l3(21)𝑹3,l3(22)),\tilde{\bm{z}}_{1:3,l_{3}}=\begin{pmatrix}(z_{3},\bm{z}_{1:2}^{-z_{3,l_{3}}})^{\top}\\ z_{3,l_{3}}\end{pmatrix},\;E(\tilde{\bm{z}}_{1:3,l_{3}})=\begin{pmatrix}\bm{0}\\ 0\end{pmatrix},\;\bm{R}_{3,l_{3}}=\begin{pmatrix}\bm{R}_{3,l_{3}}^{(11)}&\bm{R}_{3,l_{3}}^{(12)}\\ \bm{R}_{3,l_{3}}^{(21)}&\bm{R}_{3,l_{3}}^{(22)}\end{pmatrix},

where 𝑹3,l3(22)=1\bm{R}_{3,l_{3}}^{(22)}=1 corresponds to z3,l3z_{3,l_{3}}. It follows that

𝒎3,l3z3,l3\displaystyle\bm{m}_{3,l_{3}}z_{3,l_{3}} =E((Z3,𝒁~1:2Z3,l3)|Z3,l3=z3,l3)=𝑹3,l3(12)z3,l3,\displaystyle=E((Z_{3},\tilde{\bm{Z}}_{1:2}^{-Z_{3,l_{3}}})\,|\,Z_{3,l_{3}}=z_{3,l_{3}})=\bm{R}_{3,l_{3}}^{(12)}z_{3,l_{3}},\;\; (2)
𝑽3,l3\displaystyle\bm{V}_{3,l_{3}} =σ2(𝑹3,l3(11)𝑹3,l3(12)𝑹3,l3(21)).\displaystyle=\sigma^{2}(\bm{R}_{3,l_{3}}^{(11)}-\bm{R}_{3,l_{3}}^{(12)}\bm{R}_{3,l_{3}}^{(21)}).

From (2), we obtain 𝒎3,l3=𝑹3,l3(12)\bm{m}_{3,l_{3}}=\bm{R}_{3,l_{3}}^{(12)} and 𝑹3,l3=(1ρ2,1ρ3,l3ρ3,l3ρ2,1ρ3,l31ρ2,1ρ3,l3ρ2,11)\bm{R}_{3,l_{3}}=\left(\begin{smallmatrix}1&\rho_{2,1}\rho_{3,l_{3}}&\rho_{3,l_{3}}\\ \rho_{2,1}\rho_{3,l_{3}}&1&\rho_{2,1}\\ \rho_{3,l_{3}}&\rho_{2,1}&1\end{smallmatrix}\right) for l3=1,2l_{3}=1,2. Then we reorder 𝒛~1:3,l3\tilde{\bm{z}}_{1:3,l_{3}} with a matrix 𝑩3,l3\bm{B}_{3,l_{3}} such that 𝒛1:3=𝑩3,l3𝒛~1:3,l3\bm{z}_{1:3}=\bm{B}_{3,l_{3}}\tilde{\bm{z}}_{1:3,l_{3}}. It follows that 𝛀3,l31=𝑩3,l3𝑹3,l3𝑩3,l3T\bm{\Omega}_{3,l_{3}1}=\bm{B}_{3,l_{3}}\bm{R}_{3,l_{3}}\bm{B}_{3,l_{3}}^{T}, and the joint density is p(𝒛1:3)=l3=12w3,l3N(𝒛1:3|𝟎,σ2𝛀3,l31)p(\bm{z}_{1:3})=\sum_{l_{3}=1}^{2}w_{3,l_{3}}N(\bm{z}_{1:3}|\bm{0},\sigma^{2}\bm{\Omega}_{3,l_{3}1}).

Similarly, the joint density of 𝒛1:4\bm{z}_{1:4} is given by

p~(𝒛1:4)\displaystyle\tilde{p}(\bm{z}_{1:4}) =p4(z4|𝒛1:3)p~(𝒛1:3)\displaystyle=p_{4}(z_{4}|\bm{z}_{1:3})\tilde{p}(\bm{z}_{1:3})
=l4=13w4,l4N(z4|ρ4,l4z4,l4,σ2(1ρ4,l42))l3=12w3,l3N(𝒛1:3|𝟎,σ2𝛀3,l31)\displaystyle=\sum_{l_{4}=1}^{3}w_{4,l_{4}}N(z_{4}|\rho_{4,l_{4}}z_{4,{l_{4}}},\sigma^{2}(1-\rho_{4,l_{4}}^{2}))\sum_{l_{3}=1}^{2}w_{3,l_{3}}N(\bm{z}_{1:3}|\bm{0},\sigma^{2}\bm{\Omega}_{3,l_{3}1})
=l4=13l3=12w4,l4w3,l3N(z4|ρ4,l4z4,l4,σ2(1ρ4,l42))\displaystyle=\sum_{l_{4}=1}^{3}\sum_{l_{3}=1}^{2}w_{4,l_{4}}w_{3,l_{3}}N(z_{4}|\rho_{4,l_{4}}z_{4,l_{4}},\sigma^{2}(1-\rho_{4,l_{4}}^{2}))
N((𝒛1:3z4,l4)𝛀~3,l31(12)z4,l4,σ2(𝛀~3,l31(11)𝛀~3,l31(12)𝛀~3,l31(21)))N(z4,l4|0,σ2)\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;N((\bm{z}_{1:3}^{-z_{4,{l_{4}}}})^{\top}\mid\tilde{\bm{\Omega}}_{3,l_{3}1}^{(12)}z_{4,l_{4}},\sigma^{2}(\tilde{\bm{\Omega}}_{3,l_{3}1}^{(11)}-\tilde{\bm{\Omega}}_{3,l_{3}1}^{(12)}\tilde{\bm{\Omega}}_{3,l_{3}1}^{(21)}))N(z_{4,l_{4}}|0,\sigma^{2})
=l4=13l3=12w4,l4w3,l3N((z4,𝒛1:3z4,l4)|𝒎4,l4l3z4,l4,𝑽4,l4l3)N(z4,l4|0,σ2),\displaystyle=\sum_{l_{4}=1}^{3}\sum_{l_{3}=1}^{2}w_{4,l_{4}}w_{3,l_{3}}N((z_{4},\bm{z}_{1:3}^{-z_{4,{l_{4}}}})^{\top}|\bm{m}_{4,l_{4}l_{3}}z_{4,l_{4}},\bm{V}_{4,l_{4}l_{3}})N(z_{4,l_{4}}|0,\sigma^{2}),

where 𝛀~3,l31=𝑩~4,l4𝛀3,l31𝑩~4,l4\tilde{{\bm{\Omega}}}_{3,l_{3}1}=\tilde{\bm{B}}_{4,l_{4}}\bm{\Omega}_{3,l_{3}1}\tilde{\bm{B}}_{4,l_{4}}^{\top}, and 𝑩~4,l4\tilde{\bm{B}}_{4,l_{4}} is a rotation matrix such that (𝒛1:3z4,l4,z4,l4)=𝑩~4,l4𝒛1:3(\bm{z}_{1:3}^{-z_{4,l_{4}}},z_{4,l_{4}})^{\top}=\tilde{\bm{B}}_{4,l_{4}}\bm{z}_{1:3}. We partition the matrix 𝛀~3,l31\tilde{{\bm{\Omega}}}_{3,l_{3}1} such that 𝛀~3,l41(11)\tilde{\bm{\Omega}}_{3,l_{4}1}^{(11)} and 𝛀~3,l41(22)\tilde{\bm{\Omega}}_{3,l_{4}1}^{(22)} correspond to 𝒛1:3z4,l4\bm{z}_{1:3}^{-z_{4,l_{4}}} and z4,l4z_{4,l_{4}}, respectively. We have that for l3=1,2,l4=1,2,3l_{3}=1,2,\;l_{4}=1,2,3,

N(𝒛~1:4,l4|𝟎,σ2𝑹4,l4l3)=N((z4,𝒛1:3z4,l4)|𝒎4,l4l3z4,l4,𝑽4,l4l3)N(z4,l4|0,σ2),N(\tilde{\bm{z}}_{1:4,l_{4}}|\bm{0},\sigma^{2}\bm{R}_{4,l_{4}l_{3}})=N((z_{4},\bm{z}_{1:3}^{-z_{4,{l_{4}}}})^{\top}|\bm{m}_{4,l_{4}l_{3}}z_{4,l_{4}},\bm{V}_{4,l_{4}l_{3}})N(z_{4,l_{4}}|0,\sigma^{2}),

and

𝒛~1:4,l4\displaystyle\tilde{\bm{z}}_{1:4,l_{4}} =(z4,𝒛1:3z4,l4,z4,l4),𝒎4,l4l3=(ρ4,l4,(𝛀~3,l31(12))),\displaystyle=(z_{4},\bm{z}_{1:3}^{-z_{4,l_{4}}},z_{4,l_{4}})^{\top},\;\;\bm{m}_{4,l_{4}l_{3}}=(\rho_{4,l_{4}},(\tilde{\bm{\Omega}}_{3,l_{3}1}^{(12)})^{\top})^{\top},
𝑽4,l4l3\displaystyle\bm{V}_{4,l_{4}l_{3}} =(σ2(1ρ4,l42)𝟎T𝟎σ2(𝛀~3,l31(11)𝛀~3,l31(12)𝛀~3,l31(21))),\displaystyle=\begin{pmatrix}\sigma^{2}(1-\rho_{4,l_{4}}^{2})&\bm{0}^{T}\\ \bm{0}&\sigma^{2}(\tilde{\bm{\Omega}}_{3,l_{3}1}^{(11)}-\tilde{\bm{\Omega}}_{3,l_{3}1}^{(12)}\tilde{\bm{\Omega}}_{3,l_{3}1}^{(21)})\end{pmatrix},
𝑹4,l4l3(12)\displaystyle\bm{R}_{4,l_{4}l_{3}}^{(12)} =(𝑹4,l4l3(21))=𝒎4,l4l3,𝑹4,l4l3(11)=𝑽4,l4l3/σ2+𝒎4,l4l3𝒎4,l4l3T.\displaystyle=(\bm{R}_{4,l_{4}l_{3}}^{(21)})^{\top}=\bm{m}_{4,l_{4}l_{3}},\;\;\bm{R}_{4,l_{4}l_{3}}^{(11)}=\bm{V}_{4,l_{4}l_{3}}/\sigma^{2}+\bm{m}_{4,l_{4}l_{3}}\bm{m}_{4,l_{4}l_{3}}^{T}.

We reorder 𝒛~1:4,l4\tilde{\bm{z}}_{1:4,l_{4}} with matrix 𝑩4,l4\bm{B}_{4,l_{4}} such that 𝒛1:4=𝑩4,l4𝒛~1:4,l4\bm{z}_{1:4}=\bm{B}_{4,l_{4}}\tilde{\bm{z}}_{1:4,l_{4}} and let 𝛀4,l4l31=𝑩4,l4𝑹4,l4l3𝑩4,l3T\bm{\Omega}_{4,l_{4}l_{3}1}=\bm{B}_{4,l_{4}}\bm{R}_{4,l_{4}l_{3}}\bm{B}_{4,l_{3}}^{T}. Then we obtain the joint density of 𝒛1:4\bm{z}_{1:4} as p~(𝒛1:4)=l4=13l3=12w4,l4w3,l3N(𝒛1:4|𝟎,σ2𝛀4,l4l31)\tilde{p}(\bm{z}_{1:4})=\sum_{l_{4}=1}^{3}\sum_{l_{3}=1}^{2}w_{4,l_{4}}w_{3,l_{3}}N(\bm{z}_{1:4}|\bm{0},\sigma^{2}\bm{\Omega}_{4,l_{4}l_{3}1}).

Applying the above technique iteratively for p~(𝒛1:j)\tilde{p}(\bm{z}_{1:j}) for 5jk5\leq j\leq k, we obtain the joint density p~(𝒛1:k)p~(𝒛𝒮)\tilde{p}(\bm{z}_{1:k})\equiv\tilde{p}(\bm{z}_{\operatorname*{\mathcal{S}}}), for k2k\geq 2, namely,

p~(𝒛1:k)=lk=1kLl2=12Lwk,lkw3,l3w2,l2N(𝒛1:k|𝟎,σ2𝛀k,lkl3l2)\tilde{p}(\bm{z}_{1:k})=\sum_{l_{k}=1}^{k_{L}}\dots\sum_{l_{2}=1}^{2_{L}}w_{k,l_{k}}\dots w_{3,l_{3}}w_{2,l_{2}}N(\bm{z}_{1:k}|\bm{0},\sigma^{2}\bm{\Omega}_{k,l_{k}\dots l_{3}l_{2}})

where kL:=(k1)Lk_{L}:=(k-1)\wedge L, w2,1=1w_{2,1}=1, and for k3k\geq 3,

𝛀~k1,lk1l31\displaystyle\tilde{\bm{\Omega}}_{k-1,l_{k-1}\dots l_{3}1} =𝑩~k,lk1𝛀k1,lk1l31𝑩~k,lk,𝒎k,lkl1=(ρk,lk,(𝛀~k1,lk1l31(12))),\displaystyle=\tilde{\bm{B}}_{k,l_{k-1}}\bm{\Omega}_{k-1,l_{k-1}\dots l_{3}1}\tilde{\bm{B}}_{k,l_{k}},\;\;\bm{m}_{k,l_{k}\dots l_{1}}=(\rho_{k,l_{k}},(\tilde{\bm{\Omega}}_{k-1,l_{k-1}\dots l_{3}1}^{(12)})^{\top})^{\top},
𝑽k,lkl3\displaystyle\bm{V}_{k,l_{k}\dots l_{3}} =(σ2(1ρk,lk2)𝟎𝟎Tσ2(𝛀~k1,lk1l31(11)𝛀~k1,lk1l31(12)𝛀~k1,lk1l31(21))),\displaystyle=\begin{pmatrix}\sigma^{2}(1-\rho_{k,l_{k}}^{2})&\bm{0}\\ \bm{0}^{T}&\sigma^{2}(\tilde{\bm{\Omega}}_{k-1,l_{k-1}\dots l_{3}1}^{(11)}-\tilde{\bm{\Omega}}_{k-1,l_{k-1}\dots l_{3}1}^{(12)}\tilde{\bm{\Omega}}_{k-1,l_{k-1}\dots l_{3}1}^{(21)})\end{pmatrix},
𝑹k,lkl3(12)\displaystyle\bm{R}_{k,l_{k}\dots l_{3}}^{(12)} =(𝑹k,lkl3(21))=𝒎k,lkl3,𝑹k,lkl3(11)=𝑽k,lkl3/σ2+𝒎k,lkl3𝒎k,lkl3,\displaystyle=(\bm{R}_{k,l_{k}\dots l_{3}}^{(21)})^{\top}=\bm{m}_{k,l_{k}\dots l_{3}},\;\;\bm{R}_{k,l_{k}\dots l_{3}}^{(11)}=\bm{V}_{k,l_{k}\dots l_{3}}/\sigma^{2}+\bm{m}_{k,l_{k}\dots l_{3}}\bm{m}_{k,l_{k}\dots l_{3}}^{\top},
𝛀k,lkl31\displaystyle\bm{\Omega}_{k,l_{k}\dots l_{3}1} =𝑩k,lk𝑹k,lkl3𝑩k,lkT,\displaystyle=\bm{B}_{k,l_{k}}\bm{R}_{k,l_{k}\dots l_{3}}\bm{B}_{k,l_{k}}^{T},

where 𝑩~k,lk\tilde{\bm{B}}_{k,l_{k}} is the rotation matrix such that (𝒛1:(k1)zk,lk,zk,lk)=𝑩~k,lk𝒛1:(k1)(\bm{z}_{1:(k-1)}^{-z_{k,l_{k}}},z_{k,l_{k}})^{\top}=\tilde{\bm{B}}_{k,l_{k}}\bm{z}_{1:(k-1)}, and 𝑩k,lk\bm{B}_{k,l_{k}} is the rotation matrix such that the vector 𝒛1:k=𝑩k,lk𝒛~1:k,lk\bm{z}_{1:k}=\bm{B}_{k,l_{k}}\tilde{\bm{z}}_{1:k,l_{k}}, where 𝒛~1:k,lk=(zk,𝒛1:(k1)zk,lk,zk,lk)\tilde{\bm{z}}_{1:k,l_{k}}=(z_{k},\bm{z}_{1:(k-1)}^{-z_{k,l_{k}}},z_{k,l_{k}})^{\top}.

To complete the proof, what remains to be shown is that the density p~(𝒛𝒰1)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{U}}_{1}}) is a mixture of multivariate Gaussian distributions, where 𝒰1=𝒮{𝒖}\operatorname*{\mathcal{U}}_{1}=\operatorname*{\mathcal{S}}\cup\{\bm{u}\}. We have that p~(𝒛𝒰1)=l=1Lwl(𝒖)N(z(𝒖)ρl(𝒖)z(𝒖(l)),σ2(1(ρl(𝒖))2))p~(𝒛1:k)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{U}}_{1}})=\sum_{l=1}^{L}w_{l}(\bm{u})N(z(\bm{u})\mid\rho_{l}(\bm{u})z(\bm{u}_{(l)}),\sigma^{2}(1-(\rho_{l}(\bm{u}))^{2}))\tilde{p}(\bm{z}_{1:k}), where z(𝒖(l))z(\bm{u}_{(l)}) is an element of 𝒛1:k\bm{z}_{1:k} for l=1,,Ll=1,\dots,L. We can express each component density N(𝒛1:k| 0,σ2Ωk,lkl31)N(\bm{z}_{1:k}\,|\,\bm{0},\sigma^{2}\Omega_{k,l_{k}\dots l_{3}1}) of the joint density p~(𝒛1:k)\tilde{p}(\bm{z}_{1:k}) as the product of a Gaussian density of 𝒁1:kZ(𝒖(l))\bm{Z}_{1:k}^{-Z(\bm{u}_{(l)})} conditional on Z(𝒖(l))=z(𝒖(l))Z(\bm{u}_{(l)})=z(\bm{u}_{(l)}) and a Gaussian density of Z(𝒖(l))Z(\bm{u}_{(l)}). Using the approach in deriving the joint density over 𝒮\operatorname*{\mathcal{S}}, we can show that p~(𝒛𝒰1)\tilde{p}(\bm{z}_{\operatorname*{\mathcal{U}}_{1}}) is a mixture of multivariate Gaussian distributions.

Proof of Proposition 3.

For an NNMP Z(𝒗)Z(\bm{v}), the conditional probability that Z(𝒗)Z(\bm{v}) is greater than zz given its neighbors 𝒁Ne(𝒗)=𝒛Ne(𝒗)\bm{Z}_{\text{Ne}(\bm{v})}=\bm{z}_{\text{Ne}(\bm{v})}, where 𝒛Ne(𝒗)=(z𝒗(1),,z𝒗(L))\bm{z}_{\text{Ne}(\bm{v})}=(z_{\bm{v}_{(1)}},\dots,z_{\bm{v}_{(L)}}), is P(Z(𝒗)>z|𝒁Ne(𝒗)=𝒛Ne(𝒗))=l=1Lwl(𝒗)P(Z(𝒗)>z|Z(𝒗(l))=z(𝒗(l)))P(Z(\bm{v})>z\,|\,\bm{Z}_{\text{Ne}(\bm{v})}=\bm{z}_{\text{Ne}(\bm{v})})=\sum_{l=1}^{L}w_{l}(\bm{v})P(Z(\bm{v})>z\,|\,Z(\bm{v}_{(l)})=z(\bm{v}_{(l)})), where the conditional probability P(Z(𝒗)>z|Z(𝒗(l))=z(𝒗(l)))P(Z(\bm{v})>z\,|\,Z(\bm{v}_{(l)})=z(\bm{v}_{(l)})) corresponds to the bivariate random vector (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}). If UlU_{l} is stochastically increasing in VlV_{l} for all ll, by the assumption that the sequence (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) is built from the base random vectors (Ul,Vl)(U_{l},V_{l}) for all ll, we have that Z(𝒗)Z(\bm{v}) is stochastically increasing in 𝒁Ne(𝒗)\bm{Z}_{\text{Ne}(\bm{v})} for every 𝒗𝒟\bm{v}\in\mathcal{D}, i.e., P(Z(𝒗)>z|𝒁Ne(𝒗)=𝒛Ne(𝒗))P(Z(𝒗)>z|𝒁Ne(𝒗)=𝒛Ne(𝒗))P(Z(\bm{v})>z\,|\,\bm{Z}_{\text{Ne}(\bm{v})}=\bm{z}_{\text{Ne}(\bm{v})})\,\leq\,P(Z(\bm{v})>z\,|\,\bm{Z}_{\text{Ne}(\bm{v})}=\bm{z}_{\text{Ne}(\bm{v})}^{\prime}), for all 𝒛Ne(𝒗)\bm{z}_{\text{Ne}(\bm{v})} and 𝒛Ne(𝒗)\bm{z}_{\text{Ne}(\bm{v})}^{\prime} in the support of 𝒁Ne(𝒗)\bm{Z}_{\text{Ne}(\bm{v})}, such that z𝒗(l)z𝒗(l)z_{\bm{v}_{(l)}}\leq z_{\bm{v}_{(l)}}^{\prime} for all ll.

Let FZ(𝒗)F_{Z(\bm{v})} and FZ(𝒗(1)),,Z(𝒗(L))F_{Z(\bm{v}_{(1)}),\dots,Z(\bm{v}_{(L)})} be the distribution functions of Z(𝒗)Z(\bm{v}) and 𝒁Ne(𝒗)\bm{Z}_{\text{Ne}(\bm{v})}, respectively. Denote by SZ(𝒗(1)),,Z(𝒗(L))(z1,,zL)=P(Z(𝒗(1))>z1,,Z(𝒗(L))>zL)S_{Z(\bm{v}_{(1)}),\dots,Z(\bm{v}_{(L)})}(z_{1},\dots,z_{L})=P(Z(\bm{v}_{(1)})>z_{1},\dots,Z(\bm{v}_{(L)})>z_{L}) the joint survival probability. Then for every 𝒗𝒟\bm{v}\in\mathcal{D} and q(0,1)q\in(0,1),

P(Z(𝒗)>FZ(𝒗)1(q)Z(𝒗(1))>FZ(𝒗(1))1(q),,Z(𝒗(L))>FZ(𝒗(L))1(q))\displaystyle\;\;\;\;P(Z(\bm{v})>F_{Z(\bm{v})}^{-1}(q)\mid Z(\bm{v}_{(1)})>F_{Z(\bm{v}_{(1)})}^{-1}(q),\dots,Z(\bm{v}_{(L)})>F_{Z(\bm{v}_{(L)})}^{-1}(q)) (3)
={FZ(𝒗(1))1(q)FZ(𝒗(L))1(q)P(Z(𝒗)>FZ(𝒗)1(q)Z(𝒗(1))=z1,,Z(𝒗(L))=zL)\displaystyle=\bigg{\{}\int_{F_{Z(\bm{v}_{(1)})}^{-1}(q)}^{\infty}\dots\int_{F_{Z(\bm{v}_{(L)})}^{-1}(q)}^{\infty}P(Z(\bm{v})>F_{Z(\bm{v})}^{-1}(q)\mid Z(\bm{v}_{(1)})=z_{1},\dots,Z(\bm{v}_{(L)})=z_{L})
dFZ(𝒗(1)),,Z(𝒗(L))(z1,,zL)}/SZ(𝒗(1)),,Z(𝒗(L))(FZ(𝒗(1))1(q),,FZ(𝒗(L))1(q))\displaystyle\;\;\;\;dF_{Z(\bm{v}_{(1)}),\dots,Z(\bm{v}_{(L)})}(z_{1},\dots,z_{L})\bigg{\}}\big{/}S_{Z(\bm{v}_{(1)}),\dots,Z(\bm{v}_{(L)})}(F_{Z(\bm{v}_{(1)})}^{-1}(q),\dots,F_{Z(\bm{v}_{(L)})}^{-1}(q))
{FZ(𝒗(1))1(q)FZ(𝒗(L))1(q)P(Z(𝒗)>FZ(𝒗)1(q)Z(𝒗(1))=FZ(𝒗(1))1(q),,Z(𝒗(L))=FZ(𝒗(L))1(q))\displaystyle\geq\bigg{\{}\int_{F_{Z(\bm{v}_{(1)})}^{-1}(q)}^{\infty}\dots\int_{F_{Z(\bm{v}_{(L)})}^{-1}(q)}^{\infty}P(Z(\bm{v})>F_{Z(\bm{v})}^{-1}(q)\mid Z(\bm{v}_{(1)})=F_{Z(\bm{v}_{(1)})}^{-1}(q),\dots,Z(\bm{v}_{(L)})=F_{Z(\bm{v}_{(L)})}^{-1}(q))
dFZ(𝒗(1)),,Z(𝒗(L))(z1,,zL)}/SZ(𝒗(1)),,Z(𝒗(L))(FZ(𝒗(1))1(q),,FZ(𝒗(L))1(q))\displaystyle\;\;\;\;dF_{Z(\bm{v}_{(1)}),\dots,Z(\bm{v}_{(L)})}(z_{1},\dots,z_{L})\bigg{\}}\big{/}S_{Z(\bm{v}_{(1)}),\dots,Z(\bm{v}_{(L)})}(F_{Z(\bm{v}_{(1)})}^{-1}(q),\dots,F_{Z(\bm{v}_{(L)})}^{-1}(q))
=P(Z(𝒗)>FZ(𝒗)1(q)Z(𝒗(1))=FZ(𝒗(1))1(q),,Z(𝒗(L))=FZ(𝒗(L))1(q))\displaystyle=P(Z(\bm{v})>F_{Z(\bm{v})}^{-1}(q)\mid Z(\bm{v}_{(1)})=F_{Z(\bm{v}_{(1)})}^{-1}(q),\dots,Z(\bm{v}_{(L)})=F_{Z(\bm{v}_{(L)})}^{-1}(q))
=l=1Lwl(𝒗)P(Z(𝒗)>FU𝒗,l1(q)Z(𝒗(l))=FV𝒗,l1(q)),\displaystyle=\sum_{l=1}^{L}w_{l}(\bm{v})P(Z(\bm{v})>F_{U_{\bm{v},l}}^{-1}(q)\mid Z(\bm{v}_{(l)})=F_{V_{\bm{v},l}}^{-1}(q)),

where the inequality follows the assumption of stochastically increasing positive dependence of Z(𝒗)Z(\bm{v}) given 𝒁Ne(𝒗)\bm{Z}_{\text{Ne}(\bm{v})}.

Taking q1q\rightarrow 1^{-} on both sides of (3), we obtain λ(𝒗)l=1Lwl(𝒗)limq1P(Z(𝒗)>FU𝒗,l1(q)Z(𝒗(l))=FV𝒗,l1(q))\lambda_{\mathcal{H}}(\bm{v})\geq\sum_{l=1}^{L}w_{l}(\bm{v})\lim_{q\rightarrow 1^{-}}P(Z(\bm{v})>F_{U_{\bm{v},l}}^{-1}(q)\mid Z(\bm{v}_{(l)})=F_{V_{\bm{v},l}}^{-1}(q)), where FU𝒗,lF_{U_{\bm{v},l}} and FV𝒗,lF_{V_{\bm{v},l}} are the marginal distribution functions of (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}). Similarly, we can obtain the lower bound for λ(𝒗)\lambda_{\mathcal{L}}(\bm{v}).

Proof of Corollary 1.

We prove the result for λ(𝒗)\lambda_{\mathcal{L}}(\bm{v}). The result for λ(𝒗)\lambda_{\mathcal{H}}(\bm{v}) is obtained in a similar way. Consider a bivariate cdf FUl,VlF_{U_{l},V_{l}} for random vector (Ul,Vl)(U_{l},V_{l}), with marginal cdfs FUl=FVl=FlF_{U_{l}}=F_{V_{l}}=F_{l} and marginal densities fUl=fVl=flf_{U_{l}}=f_{V_{l}}=f_{l}, for all ll. The lower tail dependence coefficient is expressed as λ,l=limq0+FUl,Vl(Fl1(q),Fl1(q))Fl(Fl1(q))\lambda_{\mathcal{L},l}=\lim_{q\rightarrow 0^{+}}\frac{F_{U_{l},V_{l}}(F_{l}^{-1}(q),F_{l}^{-1}(q))}{F_{l}(F_{l}^{-1}(q))} with q[0,1]q\in[0,1]. If FUl,VlF_{U_{l},V_{l}} has first order partial derivatives, applying the L’Hopital’s rule, we obtain

λ,l\displaystyle\lambda_{\mathcal{L},l} =limq0+FUl,Vl/Vl(Fl1(q),Fl1(q))+FUl,Vl/Ul(Fl1(q),Fl1(q))fl(Fl1(q))\displaystyle=\lim_{q\rightarrow 0^{+}}\frac{\partial{F_{U_{l},V_{l}}}/\partial{V_{l}}(F_{l}^{-1}(q),F_{l}^{-1}(q))+\partial{F_{U_{l},V_{l}}}/\partial{U_{l}}(F_{l}^{-1}(q),F_{l}^{-1}(q))}{f_{l}(F_{l}^{-1}(q))}
=limq0+P(UlFl1(q)Vl=Fl1(q))+limq0+P(VlFl1(q)Ul=Fl1(q)).\displaystyle=\lim_{q\rightarrow 0^{+}}P(U_{l}\leq F_{l}^{-1}(q)\mid V_{l}=F_{l}^{-1}(q))+\lim_{q\rightarrow 0^{+}}P(V_{l}\leq F_{l}^{-1}(q)\mid U_{l}=F_{l}^{-1}(q)).

The above is a reproduced result from Theorem 8.57 of (Joe, 2014). If (Ul,Vl)(U_{l},V_{l}) is exchangeable, we have λ,l=2limq0+P(UlFl1(q)|Vl=Fl1(q))\lambda_{\mathcal{L},l}=2\lim_{q\rightarrow 0^{+}}P(U_{l}\leq F_{l}^{-1}(q)\,|\,V_{l}=F_{l}^{-1}(q)). If the sequences (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) of an NNMP model are built from the base random vectors (Ul,Vl)(U_{l},V_{l}). By our assumption that FUl=FVlF_{U_{l}}=F_{V_{l}} for all ll, the marginal cdfs of (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) extended from (Ul,Vl)(U_{l},V_{l}) are F𝒗,l=FU𝒗,l=FV𝒗,lF_{\bm{v},l}=F_{U_{\bm{v},l}}=F_{V_{\bm{v},l}} for all 𝒗\bm{v} and all ll. Then we have λ,l(𝒗)=2limq0+P(U𝒗,lF𝒗,l1(q)|V𝒗,l=F𝒗,l1(q))\lambda_{\mathcal{L},l}(\bm{v})=2\lim_{q\rightarrow 0^{+}}P(U_{\bm{v},l}\leq F_{\bm{v},l}^{-1}(q)\,|\,V_{\bm{v},l}=F_{\bm{v},l}^{-1}(q)). Using the result of Proposition 3, we obtain λ(𝒗)l=1Lwl(𝒗)λ,l(𝒗)/2.\lambda_{\mathcal{L}}(\bm{v})\geq\sum_{l=1}^{L}w_{l}(\bm{v})\lambda_{\mathcal{L},l}(\bm{v})/2.

Proof of Proposition 4.

By the assumption that UlU_{l} is stochastically increasing in VlV_{l} and that (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) is constructed based on (Ul,Vl)(U_{l},V_{l}), U𝒗,lU_{\bm{v},l} is stochastically increasing in V𝒗,lV_{\bm{v},l} for all 𝒗𝒟\bm{v}\in\mathcal{D} and for all ll. Then for (Z(𝒗),Z(𝒗(l)))(Z(\bm{v}),Z(\bm{v}_{(l)})) with respect to the bivariate distribution of (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) with marginal distributions FU𝒗,lF_{U_{\bm{v},l}} and FV𝒗,lF_{V_{\bm{v},l}}, we have that

P(Z(𝒗)FU𝒗,l1(q)Z(𝒗(l))FV𝒗,l1(q))\displaystyle\;\;\;\;P(Z(\bm{v})\leq F_{U_{\bm{v},l}}^{-1}(q)\mid Z(\bm{v}_{(l)})\leq F_{V_{\bm{v},l}}^{-1}(q))
=FV𝒗,l1(0)FV𝒗,l1(q)P(Z(𝒗)FU𝒗,l1(q)Z(𝒗(l))=zl)𝑑FV𝒗,l(zl)/FV𝒗,l1(0)FV𝒗,l1(q)𝑑FV𝒗,l\displaystyle=\int_{F_{V_{\bm{v},l}}^{-1}(0)}^{F_{V_{\bm{v},l}}^{-1}(q)}P(Z(\bm{v})\leq F_{U_{\bm{v},l}}^{-1}(q)\mid Z(\bm{v}_{(l)})=z_{l})dF_{V_{\bm{v},l}}(z_{l})\big{/}\int_{F_{V_{\bm{v},l}}^{-1}(0)}^{F_{V_{\bm{v},l}}^{-1}(q)}dF_{V_{\bm{v},l}}
FV𝒗,l1(0)FV𝒗,l1(q)P(Z(𝒗)FU𝒗,l1(q)Z(𝒗(l))=FV𝒗,l1(0))𝑑FV𝒗,l(zl)/FV𝒗,l1(0)FV𝒗,l1(q)𝑑FV𝒗,l\displaystyle\leq\int_{F_{V_{\bm{v},l}}^{-1}(0)}^{F_{V_{\bm{v},l}}^{-1}(q)}P(Z(\bm{v})\leq F_{U_{\bm{v},l}}^{-1}(q)\mid Z(\bm{v}_{(l)})=F_{V_{\bm{v},l}}^{-1}(0))dF_{V_{\bm{v},l}}(z_{l})\big{/}\int_{F_{V_{\bm{v},l}}^{-1}(0)}^{F_{V_{\bm{v},l}}^{-1}(q)}dF_{V_{\bm{v},l}}
=P(Z(𝒗)FU𝒗,l1(q)Z(𝒗(l))=FV𝒗,l1(0)).\displaystyle=P(Z(\bm{v})\leq F_{U_{\bm{v},l}}^{-1}(q)\mid Z(\bm{v}_{(l)})=F_{V_{\bm{v},l}}^{-1}(0)).

It follows that the boundary cdf of the NNMP model

F1|2(FZ(𝒗)1(q)F𝒁Ne(𝒗)1(0))\displaystyle\;\;\;\;F_{1|2}(F_{Z(\bm{v})}^{-1}(q)\mid F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(0)) (4)
=P(Z(𝒗)FZ(𝒗)1(q)Z(𝒗(1))=FZ(𝒗(1))1(0),,Z(𝒗(L))=FZ(𝒗(L))1(0))\displaystyle=P(Z(\bm{v})\leq F_{Z(\bm{v})}^{-1}(q)\mid Z(\bm{v}_{(1)})=F_{Z(\bm{v}_{(1)})}^{-1}(0),\dots,Z(\bm{v}_{(L)})=F_{Z(\bm{v}_{(L)})}^{-1}(0))
=l=1Lwl(𝒗)P(Z(𝒗)FU𝒗,l1(q)Z(𝒗(l))=FV𝒗,l1(0))\displaystyle=\sum_{l=1}^{L}w_{l}(\bm{v})P(Z(\bm{v})\leq F_{U_{\bm{v},l}}^{-1}(q)\mid Z(\bm{v}_{(l)})=F_{V_{\bm{v},l}}^{-1}(0))
l=1Lwl(𝒗)P(Z(𝒗)FU𝒗,l1(q)Z(𝒗(l))FV𝒗,l1(q)),\displaystyle\geq\sum_{l=1}^{L}w_{l}(\bm{v})P(Z(\bm{v})\leq F_{U_{\bm{v},l}}^{-1}(q)\mid Z(\bm{v}_{(l)})\leq F_{V_{\bm{v},l}}^{-1}(q)),

Taking q0+q\rightarrow 0^{+} on both sides of (4), we obtain F1|2(FZ(𝒗)1(0)|F𝒁Ne(𝒗)1(0))l=1Lwl(𝒗)λ,l(𝒗)F_{1|2}(F_{Z(\bm{v})}^{-1}(0)\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(0))\geq\sum_{l=1}^{L}w_{l}(\bm{v})\lambda_{\mathcal{L},l}(\bm{v}). If there exists some ll such that λ,l(𝒗)>0\lambda_{\mathcal{L},l}(\bm{v})>0, then F1|2(FZ(𝒗)1(q)|F𝒁Ne(𝒗)1(0))F_{1|2}(F_{Z(\bm{v})}^{-1}(q)\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(0)) has strictly positive mass at q=0q=0. Similarly, we can prove the result for F1|2(FZ(𝒗)1(q)|F𝒁Ne(𝒗)1(1))F_{1|2}(F_{Z(\bm{v})}^{-1}(q)\,|\,F_{\bm{Z}_{\text{Ne}(\bm{v})}}^{-1}(1)).

B Additional Data Illustrations

We provide three additional simulation experiments to demonstrate the benefits of the proposed NNMP framework. The first simulation experiment demonstrates that the Gaussian NNMP (GNNMP) provides a good approximation to Gaussian random fields. The second example illustrates the ability of the framework to handle skewed data using a skew-Gaussian NNMP (skew-GNNMP) model. Next, we demonstrate the effectiveness of the NNMP model for bounded spatial data. The last part of this section examines the non-Gaussian process assumption for the analysis of Mediterranean Sea surface temperature data example presented in the main paper, using a limited region to compare the GNNMP and the nearest-neighbor Gaussian process (NNGP) models.

In each of the experiments, we created a regular grid of 200×200200\times 200 resolution on a unit square domain, and generated data on each grid location. We then randomly selected a subset of the data as the reference set with a random ordering for model fitting. For the purpose of illustration, we chose neighbor size L=10L=10 for both cases. Results are based on posterior samples collected every 10 iterations from a Markov chain of 30000 iterations, with the first 10000 samples as a burn-in.

B.1 Additional Simulation Experiment 1

We generated data from a spatially varying regression, y(𝒗)=𝒙(𝒗)𝜷+z(𝒗)+ϵ(𝒗),𝒗𝒟y(\bm{v})=\bm{x}(\bm{v})^{\top}\bm{\beta}+z(\bm{v})+\epsilon(\bm{v}),\bm{v}\in\mathcal{D}, where z(𝒗)z(\bm{v}) is a standard Gaussian process with an exponential correlation function with range parameter 1/121/12. We included an intercept and a covariate drawn from N(0,1)N(0,1) in the model, and chose 𝜷=(β0,β1)=(1,5)\bm{\beta}=(\beta_{0},\beta_{1})^{\top}=(1,5)^{\top}, and τ2=0.1\tau^{2}=0.1. The setting followed the simulation experiment in Datta et al. (2016a).

We applied two models. The first one assumes that z(𝒗)z(\bm{v}) follows an NNGP model with variance σ02\sigma^{2}_{0} and exponential correlation function with range parameter ϕ0\phi_{0}. The second one assumes that z(𝒗)z(\bm{v}) follows a stationary GNNMP model, i.e., μl=0\mu_{l}=0 and σl2=σ2\sigma_{l}^{2}=\sigma^{2} for all ll, such that z(𝒗)z(\bm{v}) has a stationary marginal N(0,σ2)N(0,\sigma^{2}). For the GNNMP, we used exponential correlation functions with range parameter ϕ\phi and ζ\zeta, respectively, for the correlation with respect to the component density and the cutoff points kernel function. For the NNGP model, we implement the latent NNGP algorithm from the spNNGP package in R \citepsmspnngp. We trained both models using 2000 of the 2500 observations, and used the remaining 500 observations for model comparison.

Refer to caption
(a) True GP
Refer to caption
(b) NNGP (L=10L=10)
Refer to caption
(c) GNNMP (L=10L=10)
Figure 1: Additional simulation experiment 1 data analysis. Interpolated surface of the true Gaussian random field and posterior median estimates from the NNGP and GNNMP models.

For both models, the regression coefficients 𝜷\bm{\beta} were assigned flat priors. The variances σ02\sigma_{0}^{2} and σ2\sigma^{2} received the same inverse gamma prior IG(2,1)\mathrm{IG}(2,1), and τ2\tau^{2} was assigned IG(2,0.1)\mathrm{IG}(2,0.1). The range parameter ϕ0\phi_{0} of the NNGP received a uniform prior Unif(1/30,1/3)\mathrm{Unif}(1/30,1/3), while the range parameters ϕ\phi and ζ\zeta of the GNNMP received inverse gamma priors IG(3,1/3)\mathrm{IG}(3,1/3) and IG(3,0.2)\mathrm{IG}(3,0.2), respectively. Regarding the logit Gaussian distribution parameters, 𝜸\bm{\gamma} and κ2\kappa^{2}, we used N(𝜸|(1.5,0,0), 2𝐈3)N(\bm{\gamma}\,|\,(-1.5,0,0)^{\top},\,2\mathbf{I}_{3}) and IG(3,1)\mathrm{IG}(3,1) priors, respectively.

The posterior estimates from the two models for the common parameters, 𝜷\bm{\beta} and τ2\tau^{2}, were quite close. The posterior mean and 95%95\% credible interval estimates of β0\beta_{0} and β1\beta_{1} were 1.32(1.11,1.54)1.32\,(1.11,1.54) and 5.01(4.99,5.04)5.01\,(4.99,5.04) from the GNNMP model, and 1.25(0.83,1.62)1.25\,(0.83,1.62) and 5.01(4.99,5.04)5.01\,(4.99,5.04) from the NNGP model. The corresponding estimates of τ2\tau^{2} were 0.12(0.09,0.15)0.12\,(0.09,0.15) and 0.10(0.07,0.12)0.10\,(0.07,0.12) from the GNNMP and NNMP models, respectively.

Table 1: Additional simulation experiment 1 data analysis. Performance metrics of different models
RMSPE 95% CI 95% CI width CRPS PPLC DIC
GNNMP 0.57 0.96 2.51 0.32 461.40 2656.67
NNGP 0.54 0.95 2.09 0.30 382.15 2268.79

Table 1 shows the performance metrics of the two models. The performance metrics of the GNNMP model are comparable to those of the NNGP model, the model assumptions of which are more well suited to the particular synthetic data example. The posterior median estimate of the spatial random effects from both models are shown in Figure 1. We can see that the predictive field given by the GNNMP looks close to the true field and that predicted by the NNGP. On the whole, the GNNMP model provides a reasonably good approximation to the Gaussian random field.

B.2 Additional Simulation Experiment 2

Refer to caption
(a) True y(𝒗)y(\bm{v}) (σ1=5\sigma_{1}=-5)
Refer to caption
(b) True y(𝒗)y(\bm{v}) (σ1=1\sigma_{1}=1)
Refer to caption
(c) True y(𝒗)y(\bm{v}) (σ1=10\sigma_{1}=10)
Refer to caption
(d) Skew-GNNMP
Refer to caption
(e) Skew-GNNMP
Refer to caption
(f) Skew-GNNMP
Figure 2: Additional simulation example 2 data analysis. Top panels are interpolated surfaces of y(𝒗)y(\bm{v}) generated by (5). Bottom panels are the posterior median estimates from the skew-GNNMP model.

We generated data from the skew-Gaussian process (Zhang and El-Shaarawi, 2010),

y(𝒗)=σ1|ω1(𝒗)|+σ2ω2(𝒗),𝒗𝒟y(\bm{v})=\sigma_{1}\,|\omega_{1}(\bm{v})|+\sigma_{2}\,\omega_{2}(\bm{v}),\;\;\bm{v}\in\mathcal{D} (5)

where ω1(𝒗)\omega_{1}(\bm{v}) and ω2(𝒗)\omega_{2}(\bm{v}) are both standard Gaussian processes with correlation matrix specified by an exponential correlation function with range parameter 1/121/12. The parameter σ1\sigma_{1}\in\mathbb{R} controls the skewness, whereas σ2>0\sigma_{2}>0 is a scale parameter. The model has a stationary skew-Gaussian marginal density SN(0,σ12+σ22,σ1/σ2)\mathrm{SN}(0,\sigma_{1}^{2}+\sigma_{2}^{2},\sigma_{1}/\sigma_{2}). We took σ2=1\sigma_{2}=1, and generated data with σ1=5\sigma_{1}=-5, 11 and 1010, resulting in three different random fields that are, respectively, moderately negative-skewed, slightly positive-skewed, and strongly positive-skewed, as shown in Figure 2(a)-2(c).

We applied the stationary skew-GNNMP model. The model is obtained as a special case of the skew-GNNMP model discussed in Section 2.3 of the main paper, taking λl=λ\lambda_{l}=\lambda, μl=0\mu_{l}=0, and σl2=σ2\sigma_{l}^{2}=\sigma^{2}, for all ll. Here, λ\lambda\in\mathbb{R} controls the skewness, such that a large positive (negative) value of λ\lambda indicates strong positive (negative) skewness. If λ=0\lambda=0, the skew-GNNMP model reduces to the GNNMP model. After marginalizing out z0z_{0}, we obtain a stationary skew-Gaussian marginal density SN(0,λ2+σ2,λ/σ)\mathrm{SN}(0,\lambda^{2}+\sigma^{2},\lambda/\sigma). We completed the full Bayesian specification for the model, by assigning priors N(λ| 0,5)N(\lambda\,|\,0,5), IG(σ2| 2,1)\mathrm{IG}(\sigma^{2}\,|\,2,1), IG(ϕ| 3,1/3)\mathrm{IG}(\phi\,|\,3,1/3), IG(ζ| 3,0.2)\mathrm{IG}(\zeta\,|\,3,0.2), N(𝜸|(1.5,0,0), 2𝐈3)N(\bm{\gamma}\,|\,(-1.5,0,0)^{\top},\,2\mathbf{I}_{3}), and IG(κ2| 3,1)\mathrm{IG}(\kappa^{2}\,|\,3,1), where ζ\zeta is the range parameter of the exponential correlation function specified for the cutoff point kernel.

Refer to caption
(a) Estimated marginal (σ1=5\sigma_{1}=-5)
Refer to caption
(b) Estimated marginal (σ1=1\sigma_{1}=1)
Refer to caption
(c) Estimated marginal (σ1=10\sigma_{1}=10)
Figure 3: Additional simulation example 2 data analysis. Green lines are true marginal densities. Posterior means (dashed lines) and 95% credible intervals (shaded regions) of the estimated marginal densities.

We focus on the model performance on capturing skewness . The posterior mean and 95% credible interval of λ\lambda for the three scenarios were 3.65(4.10,3.27)-3.65\,(-4.10,-3.27), 1.09(0.91,1.28)1.09\,(0.91,1.28) and 7.69(6.88,8.68)7.69\,(6.88,8.68), respectively, indicating the model’s ability to estimate different levels of skewness. The bottom row of Figure 2 shows that the posterior median estimates of the surfaces capture well features of the true surfaces, even when the level of skewness is small, thus demonstrating that the model is also able to recover near-Gaussian features. Figure 3 plots the posterior mean and pointwise 95% credible interval for the marginal density, overlaid on the histogram of the simulated data for each of the three cases. These estimates demonstrate the adaptability of the skew-GNNMP model in capturing skewed random fields with different levels of skewness.

B.3 Additional Simulation Experiment 3

Many spatial processes are measured over a compact interval. As an example, data on proportions are common in ecological applications. In this experiment, we demonstrate the effectiveness of the NNMP model for directly modeling bounded spatial data. We generated data using the model y(𝒗)=F1(Φ(ω(𝒗)))y(\bm{v})=F^{-1}\big{(}\Phi(\omega(\bm{v}))\big{)}, where the cdf FF corresponds to a beta distribution, denoted as Beta(a0,b0)\mathrm{Beta}(a_{0},b_{0}), and ω(𝒗)\omega(\bm{v}) is a standard Gaussian process with exponential correlation function with range parameter 0.10.1. We set a0=3a_{0}=3, b0=6b_{0}=6.

We applied a Gaussian copula NNMP model with stationary marginal Beta(a,b)\mathrm{Beta}(a,b), with the same spatial Gaussian copula and prior specification used in the second experiment. We used 2000 observations to train the model. Figure 4(b) shows the estimated random field which captures well the main features of the true field in Figure 4(a). The posterior mean and pointwise 95%95\% credible interval of the estimated marginal density in Figure 4(c) overlay on the data histogram. These show that the beta NNMP estimation and prediction provide good approximation to the true field.

It is worth mentioning that implementing the beta NNMP model is simpler than fitting existing models for data corresponding to proportions. For example, a spatial Gaussian copula model, that corresponds to the data generating process of this experiment, involves computations for large matrices. Alternatively, if a multivariate non-Gaussian copula is used, the resulting likelihood can be intractable and require certain approximations. Another model that is commonly used in this setting is defined analogously to a spatial generalized linear mixed model. The spatial element in the model is introduced through the transformed mean of the observations. A sample-based approach to fit such a model requires sampling a large number of highly correlated latent variables. We conducted an experiment to demonstrate the effectiveness of the beta NNMP to approximate random fields simulated by the link function approach. We generated data as follows.

y(𝒗)|μ(𝒗),ψ\displaystyle y(\bm{v})\,|\,\mu(\bm{v}),\psi Beta(μ(𝒗)ψ,(1μ(𝒗))ψ),\displaystyle\sim\mathrm{Beta}(\mu(\bm{v})\psi,(1-\mu(\bm{v}))\psi),
logit(μ(𝒗))\displaystyle\mathrm{logit}(\mu(\bm{v})) =μ0+σ0ω(𝒗).\displaystyle=\mu_{0}+\sigma_{0}\omega(\bm{v}).

The above model is analogous to a spatial generalized linear mixed model where the mean μ(𝒗)\mu(\bm{v}) of the beta distribution is modeled via a logit link function, and ω(𝒗)\omega(\bm{v}) is a standard Gaussian process with exponential correlation function with range parameter 0.10.1. We set ψ=20\psi=20, μ0=0.5\mu_{0}=-0.5 and σ0=0.8\sigma_{0}=0.8.

Refer to caption
(a) True y(𝒗)y(\bm{v})
Refer to caption
(b) Beta NNMP
Refer to caption
(c) Estimated marginals
Figure 4: Additional simulation example 3 data analysis. Panels (a) and (b) are interpolated surfaces of the true field and posterior median estimate from the beta NNMP model, respectively. In Panel (c), the green dotted line corresponds to the true marginal. The red dash line and shaded region are the posterior mean and pointwise 95%95\% credible interval for the estimated marginal.

Since our purpose is primarily demonstrative, we applied a Gaussian copula NNMP model with a stationary beta marginal Beta(a,b)\mathrm{Beta}(a,b), referred to as the beta NNMP model. The correlation parameter of the Gaussian copula was specified by an exponential correlation function with range parameter ϕ\phi. We specified an exponential correlation function for the random cutoff points kernel function with range parameter ζ\zeta. The Bayesian model is fully specified with a IG(3,1/3)\mathrm{IG}(3,1/3) prior for ϕ\phi, a Ga(1,1)\mathrm{Ga}(1,1) prior for aa and bb, a IG(3,0.2)\mathrm{IG}(3,0.2) prior for ζ\zeta, N(𝜸|(1.5,0,0), 2𝐈3)N(\bm{\gamma}\,|\,(-1.5,0,0)^{\top},\,2\mathbf{I}_{3}) and IG(κ2| 3,1)\mathrm{IG}(\kappa^{2}\,|\,3,1).

We trained the model using 2000 observations. Figure 5(a)-(b) shows the interpolated surface of the true field and the predictive field given by the beta NNMP model. Although the beta NNMP’s stationary marginal distribution assumption does not align with the true model, we can see that the predictive filed was able to capture the main feature of the true field. Moreover, it is worth mentioning that the MCMC algorithm for the beta NNMP to fit the data set took around 18 minutes with 30000 iterations. This is substantially faster than the MCMC algorithm for fitting the true model which involves sampling a large number of highly correlated latent variables.

B.4 Mediterranean Sea Surface Temperature Regional Analysis

In this section, we examine the non-Gaussian process assumption for the Mediterranean Sea surface temperature data. We focus on sea surface temperature (SST) over an area near the Gulf of Lion, along the islands near the shores of Spain, France, Monaco and Italy, between 0 - 9 E. longitude and 33.5 - 44.5 N. latitude. The SST observations in the region, as shown in Figure 6(a), are very heterogeneous, implying that the short range variability can be non-Gaussian. We compare the GNNMP with the NNGP in a spatially varying regression model, demonstrating the benefit of using a non-Gaussian process to explain the SST variability. In particular, the GNNMP has the same Gaussian marginals as the NNGP, but its finite-dimensional distribution is a mixture of Gaussian distributions.

Refer to caption
(a) True y(𝒗)y(\bm{v})
Refer to caption
(b) Beta NNMP
Figure 5: Additional simulation experiment 3 data analysis. Interpolated surfaces of the true field and posterior median estimate from the beta NNMP model.

We consider the following spatially varying regression model,

y(𝒗)=𝒙(𝒗)𝜷+z(𝒗)+ϵ(𝒗),𝒗𝒟,y(\bm{v})=\bm{x}(\bm{v})^{\top}\bm{\beta}+z(\bm{v})+\epsilon(\bm{v}),\;\;\bm{v}\in\mathcal{D}, (6)

where y(𝒗)y(\bm{v}) is the SST observation, 𝒙(𝒗)=(1,v1,v2)\bm{x}(\bm{v})=(1,v_{1},v_{2})^{\top} includes longitude v1v_{1} and latitude v2v_{2} to account for the long range variability in SST with regression parameters 𝜷=(β0,β1,β2)\bm{\beta}=(\beta_{0},\beta_{1},\beta_{2})^{\top}, z(𝒗)z(\bm{v}) is a spatial process, and ϵ(𝒗)i.i.d.N(0,τ2)\epsilon(\bm{v})\stackrel{{\scriptstyle i.i.d.}}{{\sim}}N(0,\tau^{2}) represents the micro-scale variability and/or the measurement error.

We model z(𝒗)z(\bm{v}) with the GNNMP defined in (7) with μl=0\mu_{l}=0 and σl2=σ2\sigma_{l}^{2}=\sigma^{2}, for all ll. For comparison, we also applied an NNGP model for z(𝒗)z(\bm{v}) with variance σ02\sigma^{2}_{0} and exponential correlation function with range parameter ϕ0\phi_{0}. For the GNNMP, we used exponential correlation functions with range parameter ϕ\phi and ζ\zeta, respectively, for the correlation with respect to the component density, and the cutoff point kernel. For both models, the regression coefficients 𝜷\bm{\beta} were assigned flat priors. The variances σ02\sigma_{0}^{2} and σ2\sigma^{2} received the same inverse gamma prior IG(2,1)\mathrm{IG}(2,1), and τ2\tau^{2} was assigned IG(2,0.1)\mathrm{IG}(2,0.1). The range parameter ϕ0\phi_{0} of the NNGP received a uniform prior Unif(1/30,1/3)\mathrm{Unif}(1/30,1/3), while the range parameters ϕ\phi and ζ\zeta of the GNNMP received inverse gamma priors IG(3,1/3)\mathrm{IG}(3,1/3) and IG(3,0.2)\mathrm{IG}(3,0.2), respectively. Regarding the logit Gaussian distribution parameters, 𝜸\bm{\gamma} and κ2\kappa^{2}, we used N((1.5,0,0), 2𝐈3)N((-1.5,0,0)^{\top},\,2\mathbf{I}_{3}) and IG(3,1)\mathrm{IG}(3,1) priors, respectively.

We took around 10% of the data in the region, as the held-out data for model comparison, and used the remaining 580 observations to train models. We compare models based on RMSPE, 95% posterior credible interval coverage rate (95% CI coverage), deviance information criterion (DIC; \citealtsmspiegelhalter2002bayesian), PPLC \citepsmgelfand1998model, and continuous ranked probability score (CRPS; \citealtsmgneiting2007strictly). To effectively compare the GNNMP and NNGP models, we first trained the NNGP model with neighbor sizes LL from 5 to 20, and selected the optimal neighbor size, L=11L=11 that corresponds to the smallest RMSPE. We then trained the GNNMP model with the same LL. In all cases, we ran the MCMC with 120000 iterations, discarding the first 20000 samples, and collected samples every 20 iterations. The computing time was around 12 and 9 minutes for the GNNMP and NNGP models, respectively.

Refer to caption
(a) Regional SST
Refer to caption
(b) Predicted SST (GNNMP)
Refer to caption
(c) Predicted SST (NNGP)
Figure 6: SST data analysis. Panel (a) shows the observations at the selected region. Panels (b) - (d) plot posterior median estimates of the SST by different models.
Table 2: SST data analysis. Performance metrics of different models
RMSPE 95% CI coverage 95% CI width CRPS PPLC DIC
GNNMP 0.92 0.97 4.09 0.50 135.58 539.53
NNGP 1.00 0.97 4.30 0.56 526.81 1083.61

We report the results for both models. The posterior mean and 95%95\% credible interval estimates of the regression intercept from the GNNMP was higher than the NNGP. They were 28.55(22.81,35.34)28.55\,(22.81,35.34) and 27.18(23.53,30.94)27.18\,(23.53,30.94), respectively. The corresponding posterior estimates of the coefficients for longitude and latitude given by the GNNMP and the NNGP were 0.09(0.03,0.24);0.09\,(-0.03,0.24); 0.32(0.50,0.17)-0.32\,(-0.50,-0.17) and 0.09(0.01,0.17);0.29(0.39,0.19)0.09\,(0.01,0.17);-0.29\,(-0.39,-0.19), respectively. Both models indicated that there was a trend of SST decreasing in the latitude at the selected region. For the error variance τ2\tau^{2}, the GNNMP provided a smaller estimate 0.12(0.02,0.38)0.12\,(0.02,0.38), compared to 0.45(0.02,0.92)0.45\,(0.02,0.92) from the NNGP.

Regarding the model performance metrics in Table 2, both the PPLC and DIC suggest that the GNNMP had a better goodness-of-fit than the NNGP. For out-of-sample prediction, the GNNMP produced smaller RMSPE and CRPS than the NNGP. Both models gave the same 95% CI coverage, while the one from GNNMP had a narrower width. Figure 6(b)-6(c) show the posterior median estimates of the temperature field from both models. We can see that both models yield estimates that resemble the pattern in the observations. The predictive surface produced by the NNGP depicts some very localized, unrealistic features. These are not present in the results from the GNNMP.

C Implementation Details

We provide model implementation details for the data examples. In particular, Section C.1 corresponds to the GNNMP model applied in Sections B.1 and B.4. Section C.2 discusses the stationary skew-GNNMP model for the data example in Section B.2, and the extended skew-GNNMP model for the Mediterranean Sea surface data analysis of the main paper. Section C.3 introduces the Gaussian and Gumbel copula NNMP models implemented for the data example in Section B.3, and the simulation study of the main paper.

C.1 GNNMP Models

We consider the spatially varying regression model, y(𝒗)=𝒙(𝒗)𝜷+z(𝒗)+ϵ(𝒗),𝒗𝒟y(\bm{v})=\bm{x}(\bm{v})^{\top}\bm{\beta}+z(\bm{v})+\epsilon(\bm{v}),\bm{v}\in\mathcal{D}, where ϵ(𝒗)i.i.d.N(0,τ2)\epsilon(\bm{v})\stackrel{{\scriptstyle i.i.d.}}{{\sim}}N(0,\tau^{2}), and the spatial random effect z(𝒗)z(\bm{v}) follows a stationary GNNMP model. The associated conditional density of the model is

p(z(𝒗)|𝒛Ne(𝒗))=l=1Lwl(𝒗)N(z(𝒗)|ρl(𝒗)z(𝒗(l)),σ2(1(ρl(𝒗))2)),p(z(\bm{v})\,|\,\bm{z}_{\text{Ne}(\bm{v})})=\sum_{l=1}^{L}w_{l}(\bm{v})N(z(\bm{v})\,|\,\rho_{l}(\bm{v})z(\bm{v}_{(l)}),\sigma^{2}(1-(\rho_{l}(\bm{v}))^{2})), (7)

where ρl(𝒗)ρl(𝒗𝒗(l))=exp(𝒗𝒗(l)/ϕ)\rho_{l}(\bm{v})\equiv\rho_{l}(||\bm{v}-\bm{v}_{(l)}||)=\exp(-||\bm{v}-\bm{v}_{(l)}||/\phi). For the weights, we consider an exponential correlation function with range parameter ζ\zeta for the kernel function that defines the random cutoff points. The Bayesian model is completed with priors N(𝜷|𝝁𝜷,𝑽𝜷)N(\bm{\beta}\,|\,\bm{\mu}_{\bm{\beta}},\bm{V}_{\bm{\beta}}), IG(σ2|uσ2,vσ2)\mathrm{IG}(\sigma^{2}\,|\,u_{\sigma^{2}},v_{\sigma^{2}}), IG(τ2|uτ2,vτ2)\mathrm{IG}(\tau^{2}\,|\,u_{\tau^{2}},v_{\tau^{2}}), IG(ϕ|uϕ,vϕ)\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi}), IG(ζ|uζ,vζ)\mathrm{IG}(\zeta\,|\,u_{\zeta},v_{\zeta}), N(𝜸|𝝁𝜸,𝑽𝜸)N(\bm{\gamma}\,|\,\bm{\mu}_{\bm{\gamma}},\bm{V}_{\bm{\gamma}}), IG(κ2|uκ2,vκ2)\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}},v_{\kappa^{2}}).

Let y(𝒔i)y(\bm{s}_{i}), i=1,,ni=1,\dots,n, be the observations over reference set 𝒮=(𝒔1,,𝒔n)\operatorname*{\mathcal{S}}=(\bm{s}_{1},\dots,\bm{s}_{n}). We introduce the MCMC sampler. It involves sampling the latent variables z(𝒔i)z(\bm{s}_{i}), but it is easily developed based on the algorithm described in the main paper. For each z(𝒔i)z(\bm{s}_{i}), i=3,,ni=3,\dots,n, we introduce a configuration variable i\ell_{i}, taking values in {1,,iL}\{1,\dots,i_{L}\} where iL=(i1)Li_{L}=(i-1)\wedge L, such that Pr(i|𝒘(𝒔i))=l=1iLwl(𝒔i)δl(i)\mathrm{Pr}(\ell_{i}\,|\,\bm{w}(\bm{s}_{i}))=\sum_{l=1}^{i_{L}}w_{l}(\bm{s}_{i})\delta_{l}(\ell_{i}), where 𝒘(𝒔i)=(w1(𝒔i),,wiL(𝒔i))\bm{w}(\bm{s}_{i})=(w_{1}(\bm{s}_{i}),\dots,w_{i_{L}}(\bm{s}_{i}))^{\top} and δl(i)=1\delta_{l}(\ell_{i})=1 if i=l\ell_{i}=l and 0 otherwise. To allow for efficient simulation of parameters 𝜸=(γ0,γ1,γ2)\bm{\gamma}=(\gamma_{0},\gamma_{1},\gamma_{2})^{\top} and κ2\kappa^{2} for the weights, we associate each z(𝒔i)z(\bm{s}_{i}) with a latent Gaussian variable with mean μ(𝒔i)=γ0+si1γ1+si2γ2\mu(\bm{s}_{i})=\gamma_{0}+s_{i1}\gamma_{1}+s_{i2}\gamma_{2} and variance κ2\kappa^{2}, where 𝒔i=(si1,si2)\bm{s}_{i}=(s_{i1},s_{i2}), for i=3,,ni=3,\dots,n. There is a one-to-one correspondence between the configuration variables i\ell_{i} and latent variables tit_{i}: i=l\ell_{i}=l if and only if ti(r𝒔i,l1,r𝒔i,l)t_{i}\in(r_{\bm{s}_{i},l-1}^{*},r_{\bm{s}_{i},l}^{*}) where r𝒔i,l=log(r𝒔i,l/(1r𝒔i,l))r^{*}_{\bm{s}_{i},l}=\log(r_{\bm{s}_{i},l}/(1-r_{\bm{s}_{i},l})), for l=1,,iLl=1,\dots,i_{L}. The posterior distribution of the model parameters is given by

N(𝜷|𝝁𝜷,𝑽𝜷)×IG(τ2|uτ2,vτ2)×IG(σ2|uσ2,vσ2)×IG(ϕ|uϕ,vϕ)×IG(ζ|uζ,vζ)\displaystyle N(\bm{\beta}\,|\,\bm{\mu}_{\bm{\beta}},\bm{V}_{\bm{\beta}})\times\mathrm{IG}(\tau^{2}\,|\,u_{\tau^{2}},v_{\tau^{2}})\times\mathrm{IG}(\sigma^{2}\,|\,u_{\sigma^{2}},v_{\sigma^{2}})\times\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi})\times\mathrm{IG}(\zeta\,|\,u_{\zeta},v_{\zeta})
×N(𝜸|𝝁𝜸,𝑽𝜸)×IG(κ2|uκ2,vκ2)×i=1nN(y(𝒔i)|𝒙(𝒔i)𝜷+z(𝒔i),τ2)\displaystyle\times N(\bm{\gamma}\,|\,\bm{\mu}_{\bm{\gamma}},\bm{V}_{\bm{\gamma}})\times\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}},v_{\kappa^{2}})\times\prod_{i=1}^{n}N(y(\bm{s}_{i})\,|\,\bm{x}(\bm{s}_{i})^{\top}\bm{\beta}+z(\bm{s}_{i}),\tau^{2})
×N(𝒕|𝑫𝜸,κ2𝐈n2)×N(z(𝒔1)| 0,σ2)×N(z(𝒔2)ρ1(𝒔2)z(𝒔1),σ2(1(ρ1(𝒔2))2))\displaystyle\times N(\bm{t}\,|\,\bm{D}\bm{\gamma},\,\kappa^{2}\mathbf{I}_{n-2})\times N(z(\bm{s}_{1})\,|\,0,\sigma^{2})\times N(z(\bm{s}_{2})\mid\rho_{1}(\bm{s}_{2})z(\bm{s}_{1}),\sigma^{2}(1-(\rho_{1}(\bm{s}_{2}))^{2}))
×i=3nl=1iLN(z(𝒔)ρl(𝒔i)z(𝒔(il)),σ2(1(ρl(𝒔i))2))𝟙(r𝒔i,l1,r𝒔i,l)(ti),\displaystyle\times\prod_{i=3}^{n}\sum_{l=1}^{i_{L}}N(z(\bm{s})\mid\rho_{l}(\bm{s}_{i})z(\bm{s}_{(il)}),\sigma^{2}(1-(\rho_{l}(\bm{s}_{i}))^{2}))\mathbbm{1}_{(r^{*}_{\bm{s}_{i},l-1},r^{*}_{\bm{s}_{i},l})}(t_{i}),

where the vector 𝒕=(t3,,tn)\bm{t}=(t_{3},\dots,t_{n})^{\top}, and the matrix 𝑫\bm{D} is (n2)×3(n-2)\times 3 such that the iith row is (1,s2+i,1,s2+i,2)(1,s_{2+i,1},s_{2+i,2}).

We describe the MCMC sampler to simulate from the posterior distribution of model parameters (𝜷,𝜸,σ2,ϕ,ζ,τ2,κ2)(\bm{\beta},\bm{\gamma},\sigma^{2},\phi,\zeta,\tau^{2},\kappa^{2}) and latent variables {ti}i=3n,{z(𝒔i)}i=1n\{t_{i}\}_{i=3}^{n},\{z(\bm{s}_{i})\}_{i=1}^{n}. Denote by 𝒚𝒮=(y(𝒔1),,y(𝒔n))\bm{y}_{\operatorname*{\mathcal{S}}}=(y(\bm{s}_{1}),\dots,y(\bm{s}_{n}))^{\top}, 𝒛𝒮=(z(𝒔1),,z(𝒔n))\bm{z}_{\operatorname*{\mathcal{S}}}=(z(\bm{s}_{1}),\dots,z(\bm{s}_{n}))^{\top}, and let 𝑿\bm{X} be the covariate matrix with the iith row being 𝒙(𝒔i)\bm{x}(\bm{s}_{i})^{\top}. The posterior full conditional distribution for 𝜷\bm{\beta} is N(𝜷|𝝁β,𝑽𝜷)N(\bm{\beta}\,|\,\bm{\mu}_{\beta}^{*},\bm{V}_{\bm{\beta}}^{*}) where 𝑽𝜷=(𝑽𝜷1+τ2𝑿𝑿)1\bm{V}_{\bm{\beta}}^{*}=(\bm{V}_{\bm{\beta}}^{-1}+\tau^{-2}\bm{X}^{\top}\bm{X})^{-1} and 𝝁𝜷=𝑽𝜷(𝑽𝜷1𝝁𝜷+τ2𝑿(𝒚𝒮𝒛𝒮))\bm{\mu}_{\bm{\beta}}^{*}=\bm{V}_{\bm{\beta}}^{*}(\bm{V}_{\bm{\beta}}^{-1}\bm{\mu}_{\bm{\beta}}+\tau^{-2}\bm{X}^{\top}(\bm{y}_{\operatorname*{\mathcal{S}}}-\bm{z}_{\operatorname*{\mathcal{S}}})). An inverse gamma prior for τ2\tau^{2} yields an IG(τ2|uτ2+n/2,vτ2+i=1nei2/2)\mathrm{IG}(\tau^{2}\,|\,u_{\tau^{2}}+n/2,v_{\tau^{2}}+\sum_{i=1}^{n}e_{i}^{2}/2) posterior full conditional, where ei=y(𝒔i)𝒙(𝒔i)𝜷z(𝒔i)e_{i}=y(\bm{s}_{i})-\bm{x}(\bm{s}_{i})^{\top}\bm{\beta}-z(\bm{s}_{i}).

To update the parameter ζ\zeta, we first marginalize out the latent variables tit_{i} from the joint posterior distribution. The posterior full conditional distribution of ζ\zeta is proportional to IG(ζ|uζ,vζ)i=3n{G𝒔i(r𝒔i,i|μ(𝒔i),κ2)G𝒔i(r𝒔i,i1|μ(𝒔i),κ2)}\mathrm{IG}(\zeta\,|\,u_{\zeta},v_{\zeta})\prod_{i=3}^{n}\{G_{\bm{s}_{i}}(r_{\bm{s}_{i},\ell_{i}}\,|\,\mu(\bm{s}_{i}),\kappa^{2})-G_{\bm{s}_{i}}(r_{\bm{s}_{i},\ell_{i}-1}\,|\,\mu(\bm{s}_{i}),\kappa^{2})\}. We update ζ\zeta on its log scale using a random walk Metropolis step. The posterior full conditional distribution of tit_{i} is l=1iLql(𝒔i)TN(ti|μ(𝒔i),κ2;r𝒔i,l1,r𝒔i,l)\sum_{l=1}^{i_{L}}q_{l}(\bm{s}_{i})\mathrm{TN}(t_{i}\,|\,\mu(\bm{s}_{i}),\kappa^{2};r^{*}_{\bm{s}_{i},l-1},r^{*}_{\bm{s}_{i},l}), where ql(𝒔i)wl(𝒔i)f𝒔i,l(z(𝒔i)|z(𝒔(il)),𝜽)q_{l}(\bm{s}_{i})\propto w_{l}(\bm{s}_{i})f_{\bm{s}_{i},l}(z(\bm{s}_{i})\,|\,z(\bm{s}_{(il)}),\bm{\theta}) and wl(𝒔i)=G𝒔i(r𝒔i,l|μ(𝒔i),κ2)G𝒔i(r𝒔i,l1|μ(𝒔i),κ2)w_{l}(\bm{s}_{i})=G_{\bm{s}_{i}}(r_{\bm{s}_{i},l}\,|\,\mu(\bm{s}_{i}),\kappa^{2})-G_{\bm{s}_{i}}(r_{\bm{s}_{i},l-1}\,|\,\mu(\bm{s}_{i}),\kappa^{2}), for l=1,,Ll=1,...,L. Hence, each tit_{i} can be readily updated by sampling from the ll-th truncated Gaussian with probability proportional to ql(𝒔i)q_{l}(\bm{s}_{i}). The posterior full conditional distribution of 𝜸\bm{\gamma} is N(𝜸|𝝁𝜸,𝑽𝜸)N(\bm{\gamma}\,|\,\bm{\mu}_{\bm{\gamma}}^{*},\bm{V}_{\bm{\gamma}}^{*}) where 𝑽𝜸=(𝑽𝜸1+κ2𝑫𝑫)1\bm{V}_{\bm{\gamma}}^{*}=(\bm{V}_{\bm{\gamma}}^{-1}+\kappa^{-2}\bm{D}^{\top}\bm{D})^{-1} and 𝝁𝜸=𝑽𝜸(𝑽𝜸1𝝁𝜸+κ2𝑫𝒕)\bm{\mu}_{\bm{\gamma}}^{*}=\bm{V}_{\bm{\gamma}}^{*}(\bm{V}_{\bm{\gamma}}^{-1}\bm{\mu}_{\bm{\gamma}}+\kappa^{-2}\bm{D}^{\top}\bm{t}). The posterior full conditional distribution of κ2\kappa^{2} is IG(κ2|uκ2+(n2)/2,vκ2+i=3n(tiμ(𝒔i))2/2)\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}}+(n-2)/2,v_{\kappa^{2}}+\sum_{i=3}^{n}(t_{i}-\mu(\bm{s}_{i}))^{2}/2).

The posterior full conditional distribution of σ2\sigma^{2} is IG(σ2|uσ2+n/2,vσ2+i=1n(z(𝒔i)ρi(𝒔i)z(𝒔(i,i)))2/{2(1(ρi(𝒔i))2)})\mathrm{IG}(\sigma^{2}\,|\,u_{\sigma^{2}}+n/2,v_{\sigma^{2}}+\sum_{i=1}^{n}(z(\bm{s}_{i})-\rho_{\ell_{i}}(\bm{s}_{i})z(\bm{s}_{(i,\ell_{i})}))^{2}/\{2(1-(\rho_{\ell_{i}}(\bm{s}_{i}))^{2})\}). The posterior full conditional distribution of ϕ\phi is proportional to IG(ϕ|uϕ,vϕ)i=2nN(z(𝒔i)ρi(𝒔i)z(𝒔(i,i)),σl2(1(ρi(𝒔i))2))\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi})\prod_{i=2}^{n}N(z(\bm{s}_{i})\mid\rho_{\ell_{i}}(\bm{s}_{i})z(\bm{s}_{(i,\ell_{i})}),\sigma_{l}^{2}(1-(\rho_{\ell_{i}}(\bm{s}_{i}))^{2})). We update ϕ\phi on its log scale with a random walk Metropolis step. Denote by 𝑨j(i)={j:z(𝒔(j,j))=z(𝒔i)}\bm{A}_{j}^{(i)}=\{j:z(\bm{s}_{(j,\ell_{j})})=z(\bm{s}_{i})\}, and assume ρl(𝒔1)=0,z(𝒔(1l))=0\rho_{l}(\bm{s}_{1})=0,z(\bm{s}_{(1l)})=0 for every ll. The posterior full conditional of the latent spatial random effects z(𝒔i)z(\bm{s}_{i}) is N(z(𝒔i)|σ~i2μ~i,σ~i2)N(z(\bm{s}_{i})\,|\,\tilde{\sigma}_{i}^{2}\tilde{\mu}_{i},\tilde{\sigma}_{i}^{2}) where σ~i2=(τ2+σ2(1(ρi(𝒔i))2)1+j:j𝑨j(i)s~ij2)1\tilde{\sigma}_{i}^{2}=\big{(}\tau^{-2}+\sigma^{-2}(1-(\rho_{\ell_{i}}(\bm{s}_{i}))^{2})^{-1}+\sum_{j:j\in\bm{A}_{j}^{(i)}}\tilde{s}_{ij}^{-2}\big{)}^{-1} and μ~i=τ2(y(𝒔i)x(𝒔i)𝜷)+σ2(1(ρi(𝒔i))2)1ρi(𝒔i)z(𝒔(i,i))+j:j𝑨j(i)z(𝒔j)(ρj(𝒔j))1s~ij2\tilde{\mu}_{i}=\tau^{-2}(y(\bm{s}_{i})-x(\bm{s}_{i})^{\top}\bm{\beta})+\sigma^{-2}(1-(\rho_{\ell_{i}}(\bm{s}_{i}))^{2})^{-1}\rho_{\ell_{i}}(\bm{s}_{i})z(\bm{s}_{(i,\ell_{i})})+\sum_{j:j\in\bm{A}_{j}^{(i)}}z(\bm{s}_{j})(\rho_{\ell_{j}}(\bm{s}_{j}))^{-1}\tilde{s}_{ij}^{-2} with s~ij2=σ2(1(ρj(𝒔j))2)/(ρj(𝒔j))2\tilde{s}_{ij}^{2}=\sigma^{2}(1-(\rho_{\ell_{j}}(\bm{s}_{j}))^{2})/(\rho_{\ell_{j}}(\bm{s}_{j}))^{2}, for i=1,,ni=1,\dots,n.

C.2 Skew-GNNMP Models

C.2.1 Bivariate Skew-Gaussian Distribution

Exploiting the location mixture representation of the skew-Gaussian distribution \citepsmazzalini1996multivariate for bivariate random vector 𝒁=(U,V)\bm{Z}=(U,V), we can write

f(𝒛z0)N((ξu+λuz0ξv+λvz0),σ2(1ρρ1)),z0N(z00,1)I(z00).f(\bm{z}\mid z_{0})\sim N\left(\begin{pmatrix}\xi_{u}+\lambda_{u}z_{0}\\ \xi_{v}+\lambda_{v}z_{0}\end{pmatrix},\sigma^{2}\begin{pmatrix}1&\rho\\ \rho&1\end{pmatrix}\right),\;\;z_{0}\sim N(z_{0}\mid 0,1)I(z_{0}\geq 0). (8)

It follows that, conditional on Z0=z0Z_{0}=z_{0}, the marginal densities of 𝒁\bm{Z} are N(u|ξu+λuz0,σ2)N(u\,|\,\xi_{u}+\lambda_{u}z_{0},\sigma^{2}) and N(v|ξv+λvz0,σ2)N(v\,|\,\xi_{v}+\lambda_{v}z_{0},\sigma^{2}), respectively. Then the conditional density of Z0Z_{0} given V=vV=v is p(z0|v)N(z0|(vξv)/λv,σ2/λv2)N(z0| 0,1)I(z00)p(z_{0}\,|\,v)\propto N(z_{0}\,|\,(v-\xi_{v})/\lambda_{v},\sigma^{2}/\lambda_{v}^{2})N(z_{0}\,|\,0,1)I(z_{0}\geq 0). Therefore, the conditional density p(z0|v)p(z_{0}\,|\,v) is a Gaussian distribution with mean μ0(v)=(vξv)λv/(σ2+λv2)\mu_{0}(v)=(v-\xi_{v})\lambda_{v}/(\sigma^{2}+\lambda_{v}^{2}) and variance τ02(v)=σ2/(σ2+λv2)\tau_{0}^{2}(v)=\sigma^{2}/(\sigma^{2}+\lambda_{v}^{2}), truncated at [0,)[0,\infty), denoted as TN0(z0|μ0(v),τ02(v))\mathrm{TN}_{0}(z_{0}\,|\,\mu_{0}(v),\tau_{0}^{2}(v)). Then the conditional distribution of UU given VV can be written as

fU|V(u|v)=0N(u|μu+ρ(vμv),σ2(1ρ2))TN(z0|μ0(v),τ02(v))𝑑z0,f_{U|V}(u\,|\,v)=\int_{0}^{\infty}N(u\,|\,\mu_{u}+\rho(v-\mu_{v}),\sigma^{2}(1-\rho^{2}))\mathrm{TN}(z_{0}\,|\,\mu_{0}(v),\tau_{0}^{2}(v))dz_{0}, (9)

where μu=ξu+λuz0\mu_{u}=\xi_{u}+\lambda_{u}z_{0}, μv=ξv+λvz0\mu_{v}=\xi_{v}+\lambda_{v}z_{0}.

Let 𝝃=(ξu,ξv)\bm{\xi}=(\xi_{u},\xi_{v})^{\top} and 𝝀=(λu,λv)\bm{\lambda}=(\lambda_{u},\lambda_{v}). After marginalizing out z0z_{0}, the joint density of 𝒁\bm{Z} is given by f(𝒛)=2N(𝒛|𝝃,𝚺)Φ((1𝝀𝚺1𝝀)1/2𝝀𝚺1(𝒚𝝃))f(\bm{z})=2N(\bm{z}\,|\,\bm{\xi},\bm{\Sigma})\,\Phi((1-\bm{\lambda}^{\top}\bm{\Sigma}^{-1}\bm{\lambda})^{-1/2}\bm{\lambda}^{\top}\bm{\Sigma}^{-1}(\bm{y}-\bm{\xi})), where 𝚺=σ2𝑹+𝝀𝝀\bm{\Sigma}=\sigma^{2}\bm{R}+\bm{\lambda}\bm{\lambda}^{\top}, 𝑹=(1ρρ1)\bm{R}=\left(\begin{smallmatrix}1&\rho\\ \rho&1\end{smallmatrix}\right), and Φ\Phi is the cdf of a standard Gaussian distribution. The marginal density of UU is fU(u)=2N(u|ξu,ωu2)Φ(αu(uξu)/ωu)f_{U}(u)=2N(u\,|\,\xi_{u},\omega_{u}^{2})\,\Phi(\alpha_{u}(u-\xi_{u})/\omega_{u}), where ωu2=λu2+σ2\omega_{u}^{2}=\lambda_{u}^{2}+\sigma^{2} and αu=λu/σ\alpha_{u}=\lambda_{u}/\sigma. We denote fU(u)f_{U}(u) as SN(u|ξu,ωu2,αu)\mathrm{SN}(u\,|\,\xi_{u},\omega_{u}^{2},\alpha_{u}). Similarly, the marginal density of VV is fV(v)=SN(ξv,ωv2,αv)f_{V}(v)=\mathrm{SN}(\xi_{v},\omega_{v}^{2},\alpha_{v}). It follows that the conditional density of UU given V=vV=v is

fU|V(u|v)=N(u|ξu+γ(vξv),ω~2)Φ(α1(uξu)+α2(vξv))/Φ(αv(vξv)/ωv),\displaystyle f_{U|V}(u\,|\,v)=N(u\,|\,\xi_{u}+\gamma(v-\xi_{v}),\tilde{\omega}^{2})\,\Phi(\alpha_{1}(u-\xi_{u})+\alpha_{2}(v-\xi_{v}))/\Phi(\alpha_{v}(v-\xi_{v})/\omega_{v}), (10)

where γ=(ρσ2+λuλv)/(σ2+λv2)\gamma=(\rho\sigma^{2}+\lambda_{u}\lambda_{v})/(\sigma^{2}+\lambda_{v}^{2}),   ω~2=σ2+λu2(ρσ2+λuλv)2/(σ2+λv2)\tilde{\omega}^{2}=\sigma^{2}+\lambda_{u}^{2}-(\rho\sigma^{2}+\lambda_{u}\lambda_{v})^{2}/(\sigma^{2}+\lambda_{v}^{2}), α1=(λuρλv)/m\alpha_{1}=(\lambda_{u}-\rho\lambda_{v})/m, α2=(λvρλu)/m\alpha_{2}=(\lambda_{v}-\rho\lambda_{u})/m, and m=(1ρ)s2(1ρ)s2+λu2+λv22ρλuλvm=\sqrt{(1-\rho)s^{2}}\sqrt{(1-\rho)s^{2}+\lambda_{u}^{2}+\lambda_{v}^{2}-2\rho\lambda_{u}\lambda_{v}}.

In the special case where ξu=ξv=0\xi_{u}=\xi_{v}=0 and λu=λv=λ\lambda_{u}=\lambda_{v}=\lambda, let ω2=λ2+σ2\omega^{2}=\lambda^{2}+\sigma^{2} and α=λ/σ\alpha=\lambda/\sigma. The joint density of 𝒁\bm{Z} can be written as f(𝒛)=2N(𝒛𝟎,𝚺)Φ(λ(1λ2𝟏2𝚺1𝟏2)1/2𝟏2𝚺1𝒛)f(\bm{z})=2N(\bm{z}\mid\bm{0},\bm{\Sigma})\,\Phi(\lambda(1-\lambda^{2}\bm{1}_{2}^{\top}\bm{\Sigma}^{-1}\bm{1}_{2})^{-1/2}\bm{1}_{2}^{\top}\bm{\Sigma}^{-1}\bm{z}), where the marginal density of 𝒁\bm{Z} is SN(x| 0,ω2,α)\mathrm{SN}(x\,|\,0,\omega^{2},\alpha). The conditional density of UU given V=vV=v is then given by

fU|V(u|v)=N(u|ρ~v,ω2(1ρ~2))Φ(α(u+v)/ω)/Φ(αv/ω),f_{U|V}(u\,|\,v)=N(u\,|\,\tilde{\rho}v,\omega^{2}(1-\tilde{\rho}^{2}))\,\Phi(\alpha^{\prime}(u+v)/\omega^{\prime})/\Phi(\alpha v/\omega), (11)

where ρ~=(ρσ2+λ2)/(σ2+λ2)\tilde{\rho}=(\rho\sigma^{2}+\lambda^{2})/(\sigma^{2}+\lambda^{2}), α=λ/s\alpha^{\prime}=\lambda/s, ω2=s2+2λ2\omega^{\prime 2}=s^{2}+2\lambda^{2}, and s2=(1+ρ)σ2s^{2}=(1+\rho)\sigma^{2}.

C.2.2 Stationary Skew-GNNMP Models

We take a set of base random vectors (Ul,Vl)(U,V)(U_{l},V_{l})\equiv(U,V) for all ll, where (U,V)(U,V) is a bivariate skew-Gaussian vector with distribution given by (8), and take ξu=ξv=0\xi_{u}=\xi_{v}=0, λu=λv=λ\lambda_{u}=\lambda_{v}=\lambda. We then extend (Ul,Vl)(U_{l},V_{l}) to (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) by extending ρ\rho to ρl(𝒗)\rho_{l}(\bm{v}) using an exponential correlation function such that ρl(𝒗)ρl(𝒗𝒗(l))=exp(𝒗𝒗(l)/ϕ)\rho_{l}(\bm{v})\equiv\rho_{l}(||\bm{v}-\bm{v}_{(l)}||)=\exp(-||\bm{v}-\bm{v}_{(l)}||/\phi), for l=1,,Ll=1,\dots,L. Using the resulting bivariate distribution for (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}), we define the spatially varying density f𝒗,l=fU𝒗,l|V𝒗,lf_{\bm{v},l}=f_{U_{\bm{v},l}|V_{\bm{v},l}} based on the formulation in (9). The resulting associated conditional density of the stationary skew-GNNMP is

p(y(𝒗)|𝒚Ne(𝒗))=l=1Lwl(𝒗)0N(y(𝒗)|μl(𝒗),σl2(𝒗))TN(z0(𝒗)|μ0(𝒗(l)),σ02)𝑑z0(𝒗),p(y(\bm{v})\,|\,\bm{y}_{\text{Ne}(\bm{v})})=\sum_{l=1}^{L}w_{l}(\bm{v})\int_{0}^{\infty}N(y(\bm{v})\,|\,\mu_{l}(\bm{v}),\sigma_{l}^{2}(\bm{v}))\mathrm{TN}(z_{0}(\bm{v})\,|\,\mu_{0}(\bm{v}_{(l)}),\sigma_{0}^{2})dz_{0}(\bm{v}), (12)

where μl(𝒗)=(1ρl(𝒗))λz0(𝒗)+ρl(𝒗)y(𝒗(l))\mu_{l}(\bm{v})=(1-\rho_{l}(\bm{v}))\lambda z_{0}(\bm{v})+\rho_{l}(\bm{v})y(\bm{v}_{(l)}), σ2(𝒗)=σ2(1(ρl(𝒗))2)\sigma^{2}(\bm{v})=\sigma^{2}(1-(\rho_{l}(\bm{v}))^{2}), μ0(𝒗(l))=y(𝒗(l))λ/(σ2+λ2)\mu_{0}(\bm{v}_{(l)})=y(\bm{v}_{(l)})\lambda/(\sigma^{2}+\lambda^{2}), and σ02=σ2/(σ2+λ2)\sigma_{0}^{2}=\sigma^{2}/(\sigma^{2}+\lambda^{2}).

The component conditional density in (12) is sampled via a latent variable z0(𝒗)z_{0}(\bm{v}). We marginalize out z0(𝒗)z_{0}(\bm{v}) to facilitate computation. Based on (11), we obtain the associated conditional density of the skew-GNNMP as

p(y(𝒗)𝒚Ne(𝒗))=l=1Lwl(𝒗)bl(𝒗)N(y(𝒗)ρ~l(𝒗)y(𝒗(l)),ω2(1(ρ~l(𝒗))2)),p(y(\bm{v})\mid\bm{y}_{\text{Ne}(\bm{v})})=\sum_{l=1}^{L}w_{l}(\bm{v})\,b_{l}(\bm{v})N(y(\bm{v})\mid\tilde{\rho}_{l}(\bm{v})y(\bm{v}_{(l)}),\omega^{2}(1-(\tilde{\rho}_{l}(\bm{v}))^{2})), (13)

where bl(𝒗)=Φ(αl(𝒗)(y(𝒗)+y(𝒗(l)))/ωl(𝒗))/Φ(αy(𝒗(l))/ω)b_{l}(\bm{v})=\Phi(\alpha^{\prime}_{l}(\bm{v})(y(\bm{v})+y(\bm{v}_{(l)}))/\omega^{\prime}_{l}(\bm{v}))/\Phi(\alpha y(\bm{v}_{(l)})/\omega), ρ~l(𝒗)=(ρl(𝒗)σ2+λ2)/(σ2+λ2)\tilde{\rho}_{l}(\bm{v})=(\rho_{l}(\bm{v})\sigma^{2}+\lambda^{2})/(\sigma^{2}+\lambda^{2}), αl(𝒗)=λ/(1+ρl(𝒗))σ2\alpha^{\prime}_{l}(\bm{v})=\lambda/\sqrt{(1+\rho_{l}(\bm{v}))\sigma^{2}}, ωl2(𝒗)=(1+ρl(𝒗))σ2+2λ2\omega^{\prime 2}_{l}(\bm{v})=(1+\rho_{l}(\bm{v}))\sigma^{2}+2\lambda^{2}, α=λ/σ\alpha=\lambda/\sigma, and ω2=σ2+λ2\omega^{2}=\sigma^{2}+\lambda^{2}. For the weights wl(𝒗)w_{l}(\bm{v}), we use an exponential correlation function with range parameter ζ\zeta for the kernel functions of the random cutoff points. The Bayesian model is completed with prior specifications for λ,σ2,ϕ,ζ,𝜸,κ2\lambda,\sigma^{2},\phi,\zeta,\bm{\gamma},\kappa^{2}. In particular, we consider priors N(λ|μλ,σλ2)N(\lambda\,|\,\mu_{\lambda},\sigma^{2}_{\lambda}), IG(σ2|uσ2,vσ2)\mathrm{IG}(\sigma^{2}\,|\,u_{\sigma^{2}},v_{\sigma^{2}}), IG(ϕ|uϕ,vϕ)\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi}), IG(ζ|uζ,vζ)\mathrm{IG}(\zeta\,|\,u_{\zeta},v_{\zeta}), N(𝜸|𝝁𝜸,𝑽𝜸)N(\bm{\gamma}\,|\,\bm{\mu}_{\bm{\gamma}},\bm{V}_{\bm{\gamma}}) and IG(κ2|uκ2,vκ2)\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}},v_{\kappa^{2}}).

Given observations y(𝒔i)y(\bm{s}_{i}), i=1,,ni=1,\dots,n, over reference set 𝒮=(𝒔1,,𝒔n)\operatorname*{\mathcal{S}}=(\bm{s}_{1},\dots,\bm{s}_{n}), we perform Bayesian inference based a likelihood conditional on the first LL observations. The posterior distribution of the model parameters, given the conditional likelihood, is given by

N(λ|μλ,σλ2)×IG(σ2|uσ2,vσ2)×IG(ϕ|uϕ,vϕ)×IG(ζ|uζ,vζ)\displaystyle N(\lambda\,|\,\mu_{\lambda},\sigma^{2}_{\lambda})\times\mathrm{IG}(\sigma^{2}\,|\,u_{\sigma^{2}},v_{\sigma^{2}})\times\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi})\times\mathrm{IG}(\zeta\,|\,u_{\zeta},v_{\zeta})
×N(𝜸|𝝁𝜸,𝑽𝜸)×IG(κ2|uκ2,vκ2)×N(𝒕|𝑫𝜸,κ2𝐈nL)\displaystyle\times N(\bm{\gamma}\,|\,\bm{\mu}_{\bm{\gamma}},\bm{V}_{\bm{\gamma}})\times\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}},v_{\kappa^{2}})\times N(\bm{t}\,|\,\bm{D}\bm{\gamma},\,\kappa^{2}\mathbf{I}_{n-L})
×i=L+1nl=1Lbl(𝒔i)N(y(𝒔i)ρ~l(𝒔i)y(𝒔(il)),ω2(1(ρ~l(𝒔i))2))𝟙(r𝒔i,l1,r𝒔i,l)(ti),\displaystyle\times\prod_{i=L+1}^{n}\sum_{l=1}^{L}b_{l}(\bm{s}_{i})N(y(\bm{s}_{i})\mid\tilde{\rho}_{l}(\bm{s}_{i})y(\bm{s}_{(il)}),\omega^{2}(1-(\tilde{\rho}_{l}(\bm{s}_{i}))^{2}))\mathbbm{1}_{(r^{*}_{\bm{s}_{i},l-1},r^{*}_{\bm{s}_{i},l})}(t_{i}),

where the vector 𝒕=(tL+1,,tn)\bm{t}=(t_{L+1},\dots,t_{n})^{\top}, and the matrix 𝑫\bm{D} is (nL)×3(n-L)\times 3 such that the iith row is (1,sL+i,1,sL+i,2)(1,s_{L+i,1},s_{L+i,2}).

The MCMC sampler to obtain samples from the joint posterior distribution is described in the main paper. We present the posterior updates of λ,σ2\lambda,\sigma^{2} and ϕ\phi. Note that the configuration variables i\ell_{i} are such that i=l\ell_{i}=l if ti(r𝒔i,l1,r𝒔i,l)t_{i}\in(r_{\bm{s}_{i},l-1}^{*},r_{\bm{s}_{i},l}^{*}) for iL+1i\geq L+1. Denote by f𝒔i,l=bl(𝒔i)N(y(𝒔i)|ρ~l(𝒔i)y(𝒔(il)),ω2(1(ρ~l(𝒔i))2))f_{\bm{s}_{i},l}=b_{l}(\bm{s}_{i})N(y(\bm{s}_{i})\,|\,\tilde{\rho}_{l}(\bm{s}_{i})y(\bm{s}_{(il)}),\omega^{2}(1-(\tilde{\rho}_{l}(\bm{s}_{i}))^{2})). We use a random walk Metropolis step to update λ\lambda with target density N(λ|μλ,σλ2)i=L+1nf𝒔i,iN(\lambda\,|\,\mu_{\lambda},\sigma^{2}_{\lambda})\prod_{i=L+1}^{n}f_{\bm{s}_{i},\ell_{i}}. The posterior full conditional distributions of σ2\sigma^{2} and ϕ\phi are proportional to IG(σ2|uσ2,vσ2)i=L+1nf𝒔i,i\mathrm{IG}(\sigma^{2}\,|\,u_{\sigma^{2}},v_{\sigma^{2}})\prod_{i=L+1}^{n}f_{\bm{s}_{i},\ell_{i}}, and IG(ϕ|uϕ,vϕ)i=L+1nf𝒔i,i\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi})\prod_{i=L+1}^{n}f_{\bm{s}_{i},\ell_{i}}, respectively. For each parameter, we update it on its log scale with a random walk Metropolis step.

C.2.3 Extended Skew-GNNMP Models

Again, we take a set of base random vectors (Ul,Vl)(U,V)(U_{l},V_{l})\equiv(U,V) for all ll, where (U,V)(U,V) is a bivariate skew-Gaussian vector with distribution given by (8). We extend (Ul,Vl)(U_{l},V_{l}) to (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) by extending ρ\rho to ρl(𝒗)\rho_{l}(\bm{v}) using an exponential correlation function such that ρl(𝒗)ρl(𝒗𝒗(l))=exp(𝒗𝒗(l)/ϕ)\rho_{l}(\bm{v})\equiv\rho_{l}(||\bm{v}-\bm{v}_{(l)}||)=\exp(-||\bm{v}-\bm{v}_{(l)}||/\phi), and extending ξu=𝒙(𝒗)𝜷\xi_{u}=\bm{x}(\bm{v})^{\top}\bm{\beta}, ξv=𝒙(𝒗(l))𝜷\xi_{v}=\bm{x}(\bm{v}_{(l)})^{\top}\bm{\beta}, λu\lambda_{u} to λ(𝒗)\lambda(\bm{v}), and λv\lambda_{v} to λ(𝒗(l))\lambda(\bm{v}_{(l)}), for l=1,,Ll=1,\dots,L. Using the resulting bivariate distribution for (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}), we define the spatially varying density f𝒗,l=fU𝒗,l|V𝒗,lf_{\bm{v},l}=f_{U_{\bm{v},l}|V_{\bm{v},l}} based on the formulation in (9). The resulting associated conditional density of the extended skew-GNNMP is

p(y(𝒗)|𝒚Ne(𝒗))=l=1Lwl(𝒗)0N(y(𝒗)|μl(𝒗),σl2(𝒗))TN(z0(𝒗)|μ0l(𝒗(l)),σ0l2(𝒗(l)))𝑑z0(𝒗),p(y(\bm{v})\,|\,\bm{y}_{\text{Ne}(\bm{v})})=\sum_{l=1}^{L}w_{l}(\bm{v})\int_{0}^{\infty}N(y(\bm{v})\,|\,\mu_{l}(\bm{v}),\sigma_{l}^{2}(\bm{v}))\mathrm{TN}(z_{0}(\bm{v})\,|\,\mu_{0l}(\bm{v}_{(l)}),\sigma_{0l}^{2}(\bm{v}_{(l)}))dz_{0}(\bm{v}), (14)

where μl(𝒗)=𝒙(𝒗)𝜷+λ(𝒗)z0(𝒗)+ρl(𝒗)(y(𝒗(l))𝒙(𝒗(l))𝜷λ(𝒗(l))z0(𝒗))\mu_{l}(\bm{v})=\bm{x}(\bm{v})^{\top}\bm{\beta}+\lambda(\bm{v})z_{0}(\bm{v})+\rho_{l}(\bm{v})(y(\bm{v}_{(l)})-\bm{x}(\bm{v}_{(l)})^{\top}\bm{\beta}-\lambda(\bm{v}_{(l)})z_{0}(\bm{v})), σl2(𝒗)=σ2(1(ρl(𝒗))2)\,\sigma_{l}^{2}(\bm{v})=\sigma^{2}(1-(\rho_{l}(\bm{v}))^{2}), μ0l(𝒗(l))=(y(𝒗(l))𝒙(𝒗(l))𝜷)λ(𝒗(l))/(σ2+(λ(𝒗(l)))2)\,\mu_{0l}(\bm{v}_{(l)})=(y(\bm{v}_{(l)})-\bm{x}(\bm{v}_{(l)})^{\top}\bm{\beta})\lambda(\bm{v}_{(l)})/(\sigma^{2}+(\lambda(\bm{v}_{(l)}))^{2}), and σ0l2(𝒗(l))=σ2/(σ2+(λ(𝒗(l)))2)\sigma_{0l}^{2}(\bm{v}_{(l)})=\sigma^{2}/(\sigma^{2}+(\lambda(\bm{v}_{(l)}))^{2}). After marginalizing out z0(𝒗)z_{0}(\bm{v}), the conditional density (14) based on formulation (10) can be written as

p(y(𝒗)𝒚Ne(𝒗))=l=1Lwl(𝒗)b~l(𝒗)N(y(𝒗)μ~l(𝒗),ω~l2(𝒗)),p(y(\bm{v})\mid\bm{y}_{\text{Ne}(\bm{v})})=\sum_{l=1}^{L}w_{l}(\bm{v})\,\tilde{b}_{l}(\bm{v})N(y(\bm{v})\mid\tilde{\mu}_{l}(\bm{v}),\tilde{\omega}^{2}_{l}(\bm{v})), (15)

where μ~l(𝒗)=𝒙(𝒗)𝜷+γ~l(𝒗)(y(𝒗(l))𝒙(𝒗(l))𝜷)\tilde{\mu}_{l}(\bm{v})=\bm{x}(\bm{v})^{\top}\bm{\beta}+\tilde{\gamma}_{l}(\bm{v})(y(\bm{v}_{(l)})-\bm{x}(\bm{v}_{(l)})^{\top}\bm{\beta}), γ~l(𝒗)=(ρl(𝒗)σ2+λ(𝒗)λ(𝒗(l)))/ω2(𝒗(l))\tilde{\gamma}_{l}(\bm{v})=(\rho_{l}(\bm{v})\sigma^{2}+\lambda(\bm{v})\lambda(\bm{v}_{(l)}))/\omega^{2}(\bm{v}_{(l)}), ω~l2(𝒗)=ω(𝒗)2(ρl(𝒗)σ2+λ(𝒗)λ(𝒗(l)))2/ω2(𝒗(l))\tilde{\omega}^{2}_{l}(\bm{v})=\omega(\bm{v})^{2}-(\rho_{l}(\bm{v})\sigma^{2}+\lambda(\bm{v})\lambda(\bm{v}_{(l)}))^{2}/\omega^{2}(\bm{v}_{(l)}), sl2(𝒗)=(1+ρl(𝒗))σ2s_{l}^{2}(\bm{v})=(1+\rho_{l}(\bm{v}))\sigma^{2}, α(𝒗)=λ(𝒗)/σ\alpha(\bm{v})=\lambda(\bm{v})/\sigma, (ω(𝒗))2=λ(𝒗)2+σ2(\omega(\bm{v}))^{2}=\lambda(\bm{v})^{2}+\sigma^{2}, and

b~l(𝒗)\displaystyle\tilde{b}_{l}(\bm{v}) =Φ(α1(𝒗,𝒗(l))(y(𝒗)𝒙(𝒗)𝜷)+α2(𝒗,𝒗(l))(y(𝒗(l))𝒙(𝒗(l))𝜷))Φ(α(𝒗(l))(y(𝒗(l))𝒙(𝒗(l))𝜷)/ω(𝒗(l))),\displaystyle=\frac{\Phi(\alpha_{1}(\bm{v},\bm{v}_{(l)})(y(\bm{v})-\bm{x}(\bm{v})^{\top}\bm{\beta})+\alpha_{2}(\bm{v},\bm{v}_{(l)})(y(\bm{v}_{(l)})-\bm{x}(\bm{v}_{(l)})^{\top}\bm{\beta}))}{\Phi(\alpha(\bm{v}_{(l)})(y(\bm{v}_{(l)})-\bm{x}(\bm{v}_{(l)})^{\top}\bm{\beta})/\omega(\bm{v}_{(l)}))},
α1(𝒗,𝒗(l))\displaystyle\alpha_{1}(\bm{v},\bm{v}_{(l)}) =(λ(𝒗)ρl(𝒗)λ(𝒗(l)))/m(𝒗),α2(𝒗,𝒗(l))=(λ(𝒗(l))ρl(𝒗)λ(𝒗))/m(𝒗),\displaystyle=(\lambda(\bm{v})-\rho_{l}(\bm{v})\lambda(\bm{v}_{(l)}))/m(\bm{v}),\;\;\alpha_{2}(\bm{v},\bm{v}_{(l)})=(\lambda(\bm{v}_{(l)})-\rho_{l}(\bm{v})\lambda(\bm{v}))/m(\bm{v}),
m(𝒗)\displaystyle m(\bm{v}) =(1ρl(𝒗))sl2(𝒗)(1ρl(𝒗))sl2(𝒗)+(λ(𝒗))2+(λ(𝒗(l)))22ρl(𝒗)λ(𝒗)λ(𝒗(l)).\displaystyle=\sqrt{(1-\rho_{l}(\bm{v}))s_{l}^{2}(\bm{v})}\sqrt{(1-\rho_{l}(\bm{v}))s^{2}_{l}(\bm{v})+(\lambda(\bm{v}))^{2}+(\lambda(\bm{v}_{(l)}))^{2}-2\rho_{l}(\bm{v})\lambda(\bm{v})\lambda(\bm{v}_{(l)})}.

We model the spatially varying λ(𝒗)\lambda(\bm{v}) via a partitioning approach. In particular, we partition the domain 𝒟\mathcal{D} such that 𝒟=k=1KPk\mathcal{D}=\cup_{k=1}^{K}P_{k}, PiPj=P_{i}\cap P_{j}=\emptyset for iji\neq j. For all 𝒗Pk\bm{v}\in P_{k}, we take λ(𝒗)=λk\lambda(\bm{v})=\lambda_{k}, for k=1,,Kk=1,\dots,K. For the weights wl(𝒗)w_{l}(\bm{v}), we use an exponential correlation function with range parameter ζ\zeta for the kernel function of the random cutoff points. The Bayesian model is completed with prior specifications for 𝜷,𝝀=(λ1,,λK),σ2,ϕ,ζ,𝜸,κ2\bm{\beta},\bm{\lambda}=(\lambda_{1},\dots,\lambda_{K}),\sigma^{2},\phi,\zeta,\bm{\gamma},\kappa^{2}. We assign a N(𝜷,|𝝁𝜷,𝑽𝜷)N(\bm{\beta},\,|\,\bm{\mu}_{\bm{\beta}},\bm{V}_{\bm{\beta}}) to the regression parameter 𝜷\bm{\beta} and N(λ|μλk,σλk2)N(\lambda\,|\,\mu_{\lambda k},\sigma^{2}_{\lambda k}) to λk\lambda_{k}, for k=1,,Kk=1,\dots,K. For other parameters, we take IG(σ2|uσ2,vσ2)\mathrm{IG}(\sigma^{2}\,|\,u_{\sigma^{2}},v_{\sigma^{2}}), IG(ϕ|uϕ,vϕ)\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi}), IG(ζ|uζ,vζ)\mathrm{IG}(\zeta\,|\,u_{\zeta},v_{\zeta}), N(𝜸|𝝁𝜸,𝑽γ)N(\bm{\gamma}\,|\,\bm{\mu}_{\bm{\gamma}},\bm{V}_{\gamma}) and IG(κ2|uκ2,vκ2)\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}},v_{\kappa^{2}}).

Given observations y(𝒔i)y(\bm{s}_{i}), i=1,,ni=1,\dots,n, over reference set 𝒮=(𝒔1,,𝒔n)\operatorname*{\mathcal{S}}=(\bm{s}_{1},\dots,\bm{s}_{n}), we perform Bayesian inference based a likelihood conditional on the first LL observations. The posterior distribution of the model parameters is

N(𝜷|𝝁𝜷,𝑽𝜷)k=1KN(λk|μλk,σλk2)×IG(σ2|uσ2,vσ2)×IG(ϕ|uϕ,vϕ)×IG(ζ|uζ,vζ)\displaystyle N(\bm{\beta}\,|\,\bm{\mu}_{\bm{\beta}},\bm{V}_{\bm{\beta}})\prod_{k=1}^{K}N(\lambda_{k}\,|\,\mu_{\lambda k},\sigma^{2}_{\lambda k})\times\mathrm{IG}(\sigma^{2}\,|\,u_{\sigma^{2}},v_{\sigma^{2}})\times\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi})\times\mathrm{IG}(\zeta\,|\,u_{\zeta},v_{\zeta})
×N(𝜸|𝝁𝜸,𝑽𝜸)×IG(κ2|uκ2,vκ2)×N(𝒕|𝑫𝜸,κ2𝐈nL)\displaystyle\times N(\bm{\gamma}\,|\,\bm{\mu}_{\bm{\gamma}},\bm{V}_{\bm{\gamma}})\times\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}},v_{\kappa^{2}})\times N(\bm{t}\,|\,\bm{D}\bm{\gamma},\,\kappa^{2}\mathbf{I}_{n-L})
×i=L+1nl=1Lb~l(𝒔i)N(y(𝒔i)μ~l(𝒔i),ω~l2(𝒔i))𝟙(r𝒔i,l1,r𝒔i,l)(ti),\displaystyle\times\prod_{i=L+1}^{n}\sum_{l=1}^{L}\tilde{b}_{l}(\bm{s}_{i})N(y(\bm{s}_{i})\mid\tilde{\mu}_{l}(\bm{s}_{i}),\tilde{\omega}^{2}_{l}(\bm{s}_{i}))\mathbbm{1}_{(r^{*}_{\bm{s}_{i},l-1},r^{*}_{\bm{s}_{i},l})}(t_{i}),

where the vector 𝒕=(tL+1,,tn)\bm{t}=(t_{L+1},\dots,t_{n})^{\top}, and the matrix 𝑫\bm{D} is (nL)×3(n-L)\times 3 such that the iith row is (1,sL+i,1,sL+i,2)(1,s_{L+i,1},s_{L+i,2}).

The MCMC sampler to obtain samples from the joint posterior distribution is described in the main paper. We present the posterior updates of 𝜷,𝝀,σ2\bm{\beta},\bm{\lambda},\sigma^{2} and ϕ\phi. Note that the configuration variables i\ell_{i} are such that i=l\ell_{i}=l if ti(r𝒔i,l1,r𝒔i,l)t_{i}\in(r_{\bm{s}_{i},l-1}^{*},r_{\bm{s}_{i},l}^{*}) for iL+1i\geq L+1. Denote by f𝒔i,l=b~l(𝒔i)N(y(𝒔i)|μ~l(𝒔i),ω~l2(𝒔i))f_{\bm{s}_{i},l}=\tilde{b}_{l}(\bm{s}_{i})N(y(\bm{s}_{i})\,|\,\tilde{\mu}_{l}(\bm{s}_{i}),\tilde{\omega}^{2}_{l}(\bm{s}_{i})). We use a random walk Metropolis step to update 𝜷\bm{\beta} with target density N(𝜷|𝝁𝜷,𝑽𝜷)i=L+1nf𝒔i,iN(\bm{\beta}\,|\,\bm{\mu}_{\bm{\beta}},\bm{V}_{\bm{\beta}})\prod_{i=L+1}^{n}f_{\bm{s}_{i},\ell_{i}}. Let Bk={i:𝒔iPk}{i:𝒔(ii)Pk}B_{k}=\{i:\bm{s}_{i}\in P_{k}\}\cup\{i:\bm{s}_{(i\ell_{i})}\in P_{k}\}. The posterior full conditional of λk\lambda_{k} is proportional to N(λk|μλk,σλk2)i:iBkf𝒔i,iN(\lambda_{k}\,|\,\mu_{\lambda k},\sigma^{2}_{\lambda k})\prod_{i:i\in B_{k}}f_{\bm{s}_{i},\ell_{i}}, and we use a random walk Metropolis step to sample λk\lambda_{k}, for k=1,,Kk=1,\dots,K. The posterior full conditional distributions of σ2\sigma^{2} and ϕ\phi are proportional to IG(σ2|uσ2,vσ2)i=L+1nf𝒔i,i\mathrm{IG}(\sigma^{2}\,|\,u_{\sigma^{2}},v_{\sigma^{2}})\prod_{i=L+1}^{n}f_{\bm{s}_{i},\ell_{i}}, and IG(ϕ|uϕ,vϕ)i=L+1nf𝒔i,i\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi})\prod_{i=L+1}^{n}f_{\bm{s}_{i},\ell_{i}}, respectively. For each parameter, we update it on its log scale with a random walk Metropolis step.

C.3 Copula NNMP Models

C.3.1 Gaussian and Gumbel Copulas

We consider a continuous bivariate vector (X1,X2)(X_{1},X_{2}) with marginal cdfs F1F_{1} and F2F_{2} such that F1(x1)=t1F_{1}(x_{1})=t_{1} and F2(x2)=t2F_{2}(x_{2})=t_{2}. We introduce basic properties of the Gaussian and Gumbel copulas. For more details we refer to Joe (2014).

Gaussian copula

A Gaussian copula with correlation parameter ρ(1,1)\rho\in(-1,1) for (X1,X2)(X_{1},X_{2}) is C(F1(x1),F2(x2))=C(t1,t2ρ)=Φ2(Φ1(t1)+Φ1(t2))C(F_{1}(x_{1}),F_{2}(x_{2}))=C(t_{1},t_{2}\mid\rho)=\Phi_{2}(\Phi^{-1}(t_{1})+\Phi^{-1}(t_{2})), where Φ2\Phi_{2} is the cdf of a standard bivariate Gaussian distribution with correlation ρ\rho. The copula is asymptotically independent in both the lower and the upper tails. The corresponding copula density is

11ρ2exp(2ρΦ1(t1)Φ1(t2)ρ2{(Φ1(t1))2+(Φ1(t2))2}2(1ρ2)).\frac{1}{\sqrt{1-\rho^{2}}}\exp\left(\frac{2\rho\Phi^{-1}(t_{1})\Phi^{-1}(t_{2})-\rho^{2}\{(\Phi^{-1}(t_{1}))^{2}+(\Phi^{-1}(t_{2}))^{2}\}}{2(1-\rho^{2})}\right). (16)

Denote by C1|2(t1|t2)C_{1|2}(t_{1}\,|\,t_{2}) the conditional cdf of T1T_{1} given T2=t2T_{2}=t_{2}. Then we have C1|2(t1|t2)=C(t1,t2)t2=Φ(Φ1(t1)ρΦ1(t2)1ρ2)C_{1|2}(t_{1}\,|\,t_{2})=\frac{\partial C(t_{1},t_{2})}{\partial t_{2}}=\Phi\left(\frac{\Phi^{-1}(t_{1})-\rho\Phi^{-1}(t_{2})}{\sqrt{1-\rho^{2}}}\right). We sample X1X_{1}, given X2=x2X_{2}=x_{2}, with the following steps. Given a realization x2x_{2} of X2X_{2}, we compute t2=F2(x2)t_{2}=F_{2}(x_{2}). We then generate a random number zz from a uniform distribution on [0,1][0,1], and compute t1=C1|21(z|t2)t_{1}=C_{1|2}^{-1}(z\,|\,t_{2}) where C1|21(z|t2)=Φ((1ρ2)Φ1(z)+ρΦ1(t2))C_{1|2}^{-1}(z\,|\,t_{2})=\Phi\left(\sqrt{(1-\rho^{2})}\Phi^{-1}(z)+\rho\Phi^{-1}(t_{2})\right) is the inverse of C1|2(t1|t2)C_{1|2}(t_{1}\,|\,t_{2}). Finally, we obtain x1x_{1} from the inverse cdf F11(t1)F_{1}^{-1}(t_{1}).

Gumbel copula

A Gumbel copula with parameter η[1,)\eta\in[1,\infty) is C(F1(x1),F2(x2))=C(t1,t2|η)=exp([(log(t1))η+(log(t2))η]1/η)C(F_{1}(x_{1}),F_{2}(x_{2}))=C(t_{1},t_{2}\,|\,\eta)=\exp(-[(-\log(t_{1}))^{\eta}+(-\log(t_{2}))^{\eta}]^{1/\eta}). It is asymptotically independent in the lower tail and asymptotically dependent in the upper tail with tail dependence coefficient 221/η2-2^{1/\eta}. The corresponding copula density is

exp((u1η+u2η)1/η)((u1η+u2η)1/η+η1)(u1η+u2η)1/η2(u1u2)η1(t1t2)1.\exp(-(u_{1}^{\eta}+u_{2}^{\eta})^{1/\eta})((u_{1}^{\eta}+u_{2}^{\eta})^{1/\eta}+\eta-1)(u_{1}^{\eta}+u_{2}^{\eta})^{1/\eta-2}(u_{1}u_{2})^{\eta-1}(t_{1}t_{2})^{-1}. (17)

Let u1=log(t1)u_{1}=-\log(t_{1}) and u2=log(t2)u_{2}=-\log(t_{2}). In particular, the Gumbel copula can be written as C¯(u1,u2|η)=exp((u1η+u2η)1/η)\overline{C}(u_{1},u_{2}\,|\,\eta)=\exp(-(u_{1}^{\eta}+u_{2}^{\eta})^{1/\eta}), which is a bivariate exponential survival function, with marginals corresponding to a unit rate exponential distribution. The conditional cdf of T1T_{1} given T2=t2T_{2}=t_{2} is C1|2(t1|t2)=C¯1|2(u1|u2)=u21exp((u1η+u2η)1/η)(1+(u1/u2)η)1/η1C_{1|2}(t_{1}\,|\,t_{2})=\overline{C}_{1|2}(u_{1}\,|\,u_{2})=u_{2}^{-1}\exp(-(u_{1}^{\eta}+u_{2}^{\eta})^{1/\eta})(1+(u_{1}/u_{2})^{\eta})^{1/\eta-1}. The inverse conditional cdf C1|21(|t2)C_{1|2}^{-1}(\cdot\,|\,t_{2}) does not have a closed form. To generate X1X_{1} given X2=x2X_{2}=x_{2}, following Joe (2014), we first define y=(u1η+u2η)1/ηy=(u_{1}^{\eta}+u_{2}^{\eta})^{1/\eta}. Then we have a realization of X1X_{1}, say x1=(y0ηu2η)1/ηx_{1}=(y_{0}^{\eta}-u_{2}^{\eta})^{1/\eta}, where y0y_{0} is the root of h(y)=y+(η1)log(y)(u2+(η1)log(u2)logz)=0h(y)=y+(\eta-1)\log(y)-(u_{2}+(\eta-1)\log(u_{2})-\log z)=0, where yu2y\geq u_{2}, and zz is a random number generated from a uniform distribution on [0,1][0,1].

C.3.2 Copula NNMP Models and Inference

We take a set of base random vectors (Ul,Vl)(U,V)(U_{l},V_{l})\equiv(U,V) where its bivariate distribution is specified by a Gaussian copula with correlation parameter ρ\rho. We extend (Ul,Vl)(U_{l},V_{l}) to (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) by extending ρ\rho to ρl(𝒗)ρl(𝒗𝒗(l))=exp(𝒗𝒗(l)/ϕ)\rho_{l}(\bm{v})\equiv\rho_{l}(||\bm{v}-\bm{v}_{(l)}||)=\exp(-||\bm{v}-\bm{v}_{(l)}||/\phi), creating a sequence of spatially varying Gaussian copula C𝒗,lC_{\bm{v},l} for (U𝒗,l,V𝒗,l)(U_{\bm{v},l},V_{\bm{v},l}) with marginal cdfs FU𝒗,l=FV𝒗,l=FYF_{U_{\bm{v},l}}=F_{V_{\bm{v},l}}=F_{Y} for all 𝒗\bm{v} and all ll. The cdf FYF_{Y} corresponds to the stationary marginal distribution of the model. The associated conditional density of the Gaussian copula NNMP is

p(y(𝒗)𝒚Ne(𝒗))=l=1Lwl(𝒗)c𝒗,l(y(𝒗),y(𝒗(l)))fY(y(𝒗)),p(y(\bm{v})\mid\bm{y}_{\text{Ne}(\bm{v})})=\sum_{l=1}^{L}w_{l}(\bm{v})\,c_{\bm{v},l}(y(\bm{v}),y(\bm{v}_{(l)}))f_{Y}(y(\bm{v})),

where c𝒗,l(y(𝒗),y(𝒗(l)))c_{\bm{v},l}(y(\bm{v}),y(\bm{v}_{(l)})) is the Gaussian copula density obtained by replacing ρ\rho in (16) with ρl(𝒗)\rho_{l}(\bm{v}), and fYf_{Y} is the density of FYF_{Y}.

Similarly, we can obtain the Gumbel copula NNMP model using a collection of spatially varying Gumbel copulas by extending η\eta in (17) to ηl(𝒗)ηl(𝒗𝒗(l))=min{(1exp(𝒗𝒗(l)/ϕ))1,50}\eta_{l}(\bm{v})\equiv\eta_{l}(||\bm{v}-\bm{v}_{(l)}||)=\min\{(1-\exp(-||\bm{v}-\bm{v}_{(l)}||/\phi))^{-1},50\}, where the upper bound 5050 ensures numerical stability.

We discuss the inferential approach for the Gaussian copula NNMP; the approach for the Gumbel copula NNMP model is similar. Assume the stationary marginal density is a gamma density, denoted as fY=Ga(a,b)f_{Y}=\mathrm{Ga}(a,b), with mean E(Y)=a/bE(Y)=a/b. For the weights wl(𝒗)w_{l}(\bm{v}), we use an exponential correlation function with range parameter ζ\zeta to define the random cutoff points. The Bayesian model is completed with prior specifications for a,b,ϕ,ζ,𝜸,κ2a,b,\phi,\zeta,\bm{\gamma},\kappa^{2}. In particular, we consider priors Ga(ua,va)\mathrm{Ga}(u_{a},v_{a}), Ga(ub,vb)\mathrm{Ga}(u_{b},v_{b}), IG(ϕ|uϕ,vϕ)\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi}), IG(ζ|uζ,vζ)\mathrm{IG}(\zeta\,|\,u_{\zeta},v_{\zeta}), N(𝜸|𝝁𝜸,𝑽𝜸)N(\bm{\gamma}\,|\,\bm{\mu}_{\bm{\gamma}},\bm{V}_{\bm{\gamma}}) and IG(κ2|uκ2,vκ2)\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}},v_{\kappa^{2}}).

Given observations y(𝒔i),i=1,,ny(\bm{s}_{i}),i=1,\dots,n, over reference set 𝒮=(𝒔1,,𝒔n)\operatorname*{\mathcal{S}}=(\bm{s}_{1},\dots,\bm{s}_{n}), we perform Bayesian inference using a likelihood conditional on (y(𝒔1),,y(𝒔L))(y(\bm{s}_{1}),\dots,y(\bm{s}_{L})). The posterior distribution of the model parameters is

Ga(ua,va)×Ga(ub,vb)×IG(ϕ|uϕ,vϕ)×IG(ζ|uζ,vζ)×N(𝜸|𝝁𝜸,𝑽𝜸)×IG(κ2|uκ2,vκ2)\displaystyle\mathrm{Ga}(u_{a},v_{a})\times\mathrm{Ga}(u_{b},v_{b})\times\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi})\times\mathrm{IG}(\zeta\,|\,u_{\zeta},v_{\zeta})\times N(\bm{\gamma}\,|\,\bm{\mu}_{\bm{\gamma}},\bm{V}_{\bm{\gamma}})\times\mathrm{IG}(\kappa^{2}\,|\,u_{\kappa^{2}},v_{\kappa^{2}})
×N(𝒕|𝑫𝜸,κ2𝐈nL)×i=L+1nl=1Lc𝒔i,l(y(𝒔i),y(𝒔(il)))fY(y(𝒔i))𝟙(r𝒔i,l1,r𝒔i,l)(ti),\displaystyle\times N(\bm{t}\,|\,\bm{D}\bm{\gamma},\,\kappa^{2}\mathbf{I}_{n-L})\times\prod_{i=L+1}^{n}\sum_{l=1}^{L}c_{\bm{s}_{i},l}(y(\bm{s}_{i}),y(\bm{s}_{(il)}))f_{Y}(y(\bm{s}_{i}))\mathbbm{1}_{(r^{*}_{\bm{s}_{i},l-1},r^{*}_{\bm{s}_{i},l})}(t_{i}),

where the vector 𝒕=(tL+1,,tn)\bm{t}=(t_{L+1},\dots,t_{n})^{\top}, and the matrix 𝑫\bm{D} is (nL)×3(n-L)\times 3 such that the iith row is (1,sL+i,1,sL+i,2)(1,s_{L+i,1},s_{L+i,2}). We provide the updates for parameters (a,b,ϕ)(a,b,\phi). Note that the configuration variables i\ell_{i} are such that i=l\ell_{i}=l if ti(r𝒔i,l1,r𝒔i,l)t_{i}\in(r_{\bm{s}_{i},l-1}^{*},r_{\bm{s}_{i},l}^{*}) for iL+1i\geq L+1. Denote by f𝒔i,l=c𝒔i,l(y(𝒔i),y(𝒔(il)))fY(y(𝒔i))f_{\bm{s}_{i},l}=c_{\bm{s}_{i},l}(y(\bm{s}_{i}),y(\bm{s}_{(il)}))f_{Y}(y(\bm{s}_{i})). The posterior full conditional distributions for parameters aa, bb and ϕ\phi are proportional to IG(a|ua,va)i=L+1nf𝒔i,i\mathrm{IG}(a\,|\,u_{a},v_{a})\prod_{i=L+1}^{n}f_{\bm{s}_{i},\ell_{i}}, IG(b|ub,vb)i=L+1nf𝒔i,i\mathrm{IG}(b\,|\,u_{b},v_{b})\prod_{i=L+1}^{n}f_{\bm{s}_{i},\ell_{i}}, and IG(ϕ|uϕ,vϕ)i=L+1nc𝒔i,l(y(𝒔i),y(𝒔(il)))\mathrm{IG}(\phi\,|\,u_{\phi},v_{\phi})\prod_{i=L+1}^{n}c_{\bm{s}_{i},l}(y(\bm{s}_{i}),y(\bm{s}_{(il)})), respectively. Each parameter is updated on its log scale with a random walk Metropolis step.

\bibliographystylesm

jasa3 \bibliographysmref