Assessing Copula Models for Mixed Continuous-Ordinal Variables

Shenyi Pan Department of Statistics, University of British Columbia, Vancouver, BC Canada V6T 1Z4. Email: shenyi.pan@stat.ubc.ca. Harry Joe Department of Statistics, University of British Columbia, Vancouver, BC Canada V6T 1Z4. Email: Harry.Joe@ubc.ca.

Abstract

Vine pair-copula constructions exist for a mix of continuous and ordinal variables. In some steps, this can involve estimating a bivariate copula for a pair of mixed continuous-ordinal variables. To assess the adequacy of copula fits for such a pair, diagnostic and visualization methods based on normal score plots and conditional Q-Q plots are proposed. The former utilizes a latent continuous variable for the ordinal variable. Using the Kullback-Leibler divergence, existing probability models for mixed continuous-ordinal variable pairs are assessed for the adequacy of fit with simple parametric copula families. The effectiveness of the proposed visualization and diagnostic methods is illustrated on simulated and real datasets.

Keywords: parametric copula, empirical beta copula, Kullback-Leibler divergence, location-scale mixture models, normal scores, ordinal regression, polyserial correlation.

1 Introduction

Vine pair-copula constructions have been used for a mix of continuous and discrete/ordinal variables in Stöber et al., (2015), Section 3.9.5 of Joe, (2014), and Chang and Joe, (2019). The last of these is concerned with the use of vine constructions for prediction models based on explanatory variables that are a mix of continuous and ordinal variables.

In some steps of the vine construction, the estimation involves a bivariate copula for a pair of mixed continuous-ordinal variables. The main objective of this paper is to assess the adequacy of the fit of parametric copula families for a pair of variables, one of which is continuous and the other is ordinal with a few categories. To decide on possible suitable bivariate parametric copula families, the two variables can be visualized using normal score plots by converting the ordinal variable into an appropriate latent continuous variable. After fitting candidate copula families, quantile-quantile (Q-Q) plots of the continuous variable conditioned on each category of the ordinal variable can be used to assess the adequacy of the fit.

To theoretically assess whether commonly used parametric copula families can fit mixed continuous-ordinal variables well, we consider some bivariate distributions or models proposed in the statistical literature for one ordinal and one continuous variable. The Kullback-Leibler (KL) divergence is used to assess the adequacy of copula-based approximations. For mixture models, we find that simple 1- or 2-parameter parametric copula families can lead to good approximations when there are homoscedastic and roughly equally spaced components and the number of mixture components is small. For conditional probit and logit models, Gaussian or t copulas can provide perfect or near perfect matches when the continuous variable follows a normal distribution. Otherwise, simple parametric bivariate copula families can be inadequate. In particular, in mixture models, as the locations become more dispersed or when there is heteroscedasticity, the best approximating simple parametric copula families based on the KL divergence can have reflection asymmetry, tail asymmetry, or tail dependence, but the fit can still be inadequate based on the conditional Q-Q plots.

When simple parametric copula families do not provide good fits, nonparametric copulas can be fitted with appropriate adaptations to an ordinal variable using the latent continuous variable. As an example, we indicate how to adapt empirical beta copulas (Segers et al., (2017)) for use with a pair of mixed continuous and ordinal variables.

The remainder of this paper is organized as follows. Section 2 gives an overview of copulas when fitted to a pair of mixed continuous-ordinal variables. Section 3 proposes visualization methods via normal score plots and copula model diagnostic methods via conditional Q-Q plots for a pair of mixed continuous-ordinal variables. Section 4 discusses approaches to assessing the adequacy of approximations by computing the KL divergence of a copula-based model from a given density. Section 5 covers procedures for fitting parametric and nonparametric copula models to mixed continuous-ordinal variables. The proposed visualization and diagnostic methods are illustrated on simulated datasets in Section 6 and demonstrated on a real dataset in Section 7. Section 8 concludes with a final discussion.

2 Copulas for Mixed Continuous-Ordinal Variables

A copula is a multivariate distribution function with univariate Uniform $(0,1)$ margins. According to Sklar’s theorem (Sklar, (1959)), a $d$ -variate distribution $F$ is a composition of a copula $C$ and its univariate marginal distributions $F_{1},\ldots,F_{d}$ ; that is, $F(\boldsymbol{x})=C(F_{1}(x_{1}),\ldots,F_{d}(x_{d}))$ , for $\boldsymbol{x}\in\mathbb{R}^{d}$ . If $F$ is a continuous $d$ -variate distribution function with univariate margins $F_{1},\ldots,F_{d}$ , and quantile functions $F_{1}^{-1},\ldots,F_{d}^{-1}$ , then the copula $C(\boldsymbol{u})=F(F_{1}^{-1}(u_{1}),\ldots,F_{d}^{-1}(u_{d}))$ , for $\boldsymbol{u}\in[0,1]^{d}$ , is the unique choice. If $F$ is a $d$ -variate distribution function of mixed continuous-ordinal variables, then the copula is only unique on the set $\text{Range}(F_{1})\times\cdots\times\text{Range}(F_{d})$ . Hence, the copula associated with $F$ is non-unique when some variables are ordinal. Nevertheless, parametric copula families can still be used for data applications. Many non-uniqueness results are shown in Genest and Nešlehová, (2007) for the completely discrete/ordinal case. For the bivariate case of one ordinal and one continuous variable, one candidate for the non-unique copula is based on conditionally uniform given identities that the copula must satisfy, extending an idea on page 122 of Joe, (2014). This corresponds to defining an appropriate latent continuous variable.

Consider an ordinal variable $X$ that can take $k$ distinct ordered values $\{v_{1},\ldots,v_{k}\}$ and a continuous variable $Y$ . Assume the labeling $v_{j}=j$ for $j=1,\ldots,k$ without loss of generality. Let $F_{X,Y}$ be the cumulative distribution function (CDF) of $(X,Y)$ , with copula $C_{X,Y}$ that is unique on $\{0,F_{X}(1),\ldots,F_{X}(k-1),1\}\times[0,1]$ . Let $Z$ be a continuous latent variable associated with $X$ with distribution $F_{Z}$ . If $Z=F^{-1}_{Z}(U_{Z})$ and

U_{Z}\sim U\left(F_{X}(i-1),F_{X}(i)\right),\text{ for }X=i,\quad i\in\{1,\ldots,k\},\tag{1}

then it can be shown that $C_{Z,Y}(F_{X}(x),F_{Y}(y))=C_{X,Y}(F_{X}(x),F_{Y}(y))$ holds for $x\in\{1,\ldots,k\}$ and $F_{Y}(y)\in[0,1]$ .

The proof is as follows. Note that $U_{Z}$ in (1) has $U(0,1)$ distribution. Hence, there is a unique copula $C_{Z,Y}$ for $(Z,Y)$ since $Z$ is continuous. Let $U_{Y}=F_{Y}(Y)$ and $U_{X}=F_{X}(X)$ . By Sklar’s theorem, any copula $C_{X,Y}$ for $(X,Y)$ satisfies

C_{X,Y}\left(F_{X}(i),F_{Y}(y)\right)=\mathbb{P}\left(U_{X}\leq F_{X}(i),U_{Y}\leq F_{Y}(y)\right),\quad i=1,\ldots,k,\quad-\infty<y<\infty.

For a given $i\in\{1,\ldots,k\}$ , $U_{X}\leq F_{X}(i)$ implies that $X$ takes a value in $\{1,\ldots,i\}$ . The corresponding $U_{Z}$ thus follows one of the uniform distributions $U(0,F_{X}(1)),\dots,U(F_{X}(i-1),F_{X}(i))$ , so that $U_{Z}\leq F_{X}(i)$ also holds. Conversely, $U_{Z}\leq F_{X}(i)$ implies that $X\leq i$ , that is, $U_{X}\leq F_{X}(i)$ . Hence, $U_{Z}\leq F_{X}(i)$ and $U_{X}\leq F_{X}(i)$ are equivalent events, and

C_{Z,Y}\left(F_{X}(i),F_{Y}(y)\right)=\mathbb{P}\left(U_{Z}\leq F_{X}(i),U_{Y}\leq F_{Y}(y)\right)=\mathbb{P}\left(U_{X}\leq F_{X}(i),U_{Y}\leq F_{Y}(y)\right)=C_{X,Y}\left(F_{X}(i),F_{Y}(y)\right).

Thus $C_{Z,Y}$ matches $C_{X,Y}$ on $\{0,F_{X}(1),\ldots,F_{X}(k-1),1\}\times[0,1]$ .

Note that the condition in (1) only requires that $U_{Z}$ follows a piecewise uniform distribution given different values of $X$ . However, there are no restrictions on the joint distribution of $U_{Z}$ and $U_{Y}$ given $U_{X}$ . Given different joint distributions of $[U_{Z},U_{Y}|X=i]$ , the resulting copulas $C_{Z,Y}$ are also different. This shows the non-uniqueness of copulas for the case of one ordinal and one continuous variable. Examples of generating different sets of $U_{Z}$ values that satisfy the condition in (1) are illustrated in the next section.
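As a minimal numerical sketch of condition (1), the following Python snippet draws $U_{Z}$ piecewise uniformly given $X$ and checks that $U_{Z}$ is marginally close to $U(0,1)$ and that the events $\{U_{Z}\leq F_{X}(i)\}$ and $\{X\leq i\}$ coincide. The category probabilities $(0.2,0.5,0.3)$ , the sample size, and all variable names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical marginal probabilities for an ordinal X with k = 3 categories.
p = np.array([0.2, 0.5, 0.3])
F = np.concatenate(([0.0], np.cumsum(p)))   # F_X(0), F_X(1), ..., F_X(k)

n = 200_000
x = rng.choice(np.arange(1, 4), size=n, p=p)

# Condition (1): given X = i, draw U_Z uniformly on (F_X(i-1), F_X(i)).
u_z = rng.uniform(F[x - 1], F[x])

# U_Z should be marginally U(0,1): compare its empirical CDF with the identity.
grid = np.linspace(0.05, 0.95, 19)
ecdf = np.array([(u_z <= g).mean() for g in grid])
max_dev = np.abs(ecdf - grid).max()

# The events {U_Z <= F_X(i)} and {X <= i} should coincide for each interior i.
same_events = all(np.array_equal(u_z <= F[i], x <= i) for i in (1, 2))
print(max_dev, same_events)
```

Drawing $U_{Z}$ independently of $U_{Y}$ within each category, as done here, is only one admissible choice; other joint distributions of $[U_{Z},U_{Y}|X=i]$ satisfy (1) equally well, which is the source of the non-uniqueness.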

3 Copula Model Diagnostics for Mixed Continuous-Ordinal Variables

There are diagnostic plots and measures of asymmetry or dependence (Chapter 1 and Sections 2.12–2.15 of Joe, (2014)) to assess whether 1-parameter or 2-parameter bivariate copula families are useful for pairs of continuous variables with moderate to strong dependence. In this section, we discuss plots to help decide on possible parametric copula families for a continuous-ordinal pair of variables, and assess the adequacy of fitted copulas for a random sample with mixed continuous-ordinal variables. Visualization using normal score plots is presented in Section 3.1. A diagnostic procedure for copula estimates using conditional Q-Q plots is given in Section 3.2.

With an ordinal variable $X$ and a continuous variable $Y$ , suppose there is a random sample from $F_{X,Y}$ consisting of $n$ pairs $(x_{i},y_{i})$ for $i=1,\ldots,n$ , with $x_{i}\in\{1,\ldots,k\}$ . For each $j\in\{1,\ldots,k\}$ , let $n_{j}$ be the cardinality of $\{i:x_{i}=j\}$ . Let ${\widehat{F}}_{X}$ be the empirical distribution function of $\{x_{i}\}$ . For the continuous variable $Y$ , a parametric or nonparametric model can be applied to estimate $F_{Y}$ with ${\widehat{F}}_{Y}$ . For the $i$ th observation, let $u_{iX}^{+}={\widehat{F}}_{X}(x_{i})$ , $u_{iX}^{-}=\lim_{t\uparrow x_{i}}{\widehat{F}}_{X}(t)={\widehat{F}}_{X}(x_{i}^{-})$ , and $u_{iY}={\widehat{F}}_{Y}(y_{i})$ .

3.1 Visualizations via Normal Score Plots

For a pair of continuous variables, normal score plots are often used to visualize the copula for the variables and check for deviations from Gaussian dependence. With the ordinal variable $X$ , $\{u_{iX}^{+}\}$ is a set of discrete values. To visualize the relationship between $X$ and $Y$ , we use an appropriate latent continuous variable associated with $X$ .

For the normal score plots, we assume that the ordinal variable $X$ is generated from a latent standard normal variable $Z$ with estimated cutpoints $\zeta_{j}=\Phi^{-1}\left((n_{1}+\cdots+n_{j})/{n}\right)$ for $j=1,\ldots,k-1$ , where $\Phi$ and $\Phi^{-1}$ are the CDF and quantile function of the standard normal distribution. Let $\zeta_{0}=-\infty$ and $\zeta_{k}=\infty$ . We would like the latent variable $Z$ to be generated in such a way that the correlation between $Z$ and $Y$ is close to the correlation between $X$ and $Y$ in order for the visualization to preserve the strength of correlation from the original variables.

One measure of the association between an ordinal and a continuous variable is the polyserial correlation (see Olsson et al., (1982) and Section 2.12.7 of Joe, (2014)). The polyserial correlation between $X$ and $Y$ is defined as

\rho_{N}=\operatorname*{arg\,max}_{\rho}\sum_{i=1}^{n}\log\left\{\phi\left(\Phi^{-1}(u_{iY})\right)\left[\Phi\left(\frac{\zeta_{x_{i}}-\rho\Phi^{-1}(u_{iY})}{\sqrt{1-\rho^{2}}}\right)-\Phi\left(\frac{\zeta_{x_{i}-1}-\rho\Phi^{-1}(u_{iY})}{\sqrt{1-\rho^{2}}}\right)\right]\right\},

where $\phi$ is the probability density function (PDF) of the standard normal distribution.
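The maximization above can be sketched numerically as follows. This is an illustrative implementation under stated assumptions, not the authors' code: ${\widehat{F}}_{Y}$ is taken to be the rescaled ranks $\text{rank}(y_{i})/(n+1)$ , the function name `polyserial` is hypothetical, and the factor $\phi(\Phi^{-1}(u_{iY}))$ is dropped from the objective because it does not depend on $\rho$ .

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, rankdata


def polyserial(x, y):
    """Estimate rho_N for ordinal x in {1,...,k} and continuous y."""
    x = np.asarray(x)
    n, k = len(x), int(x.max())
    counts = np.bincount(x, minlength=k + 1)[1:]            # n_1, ..., n_k
    # Cutpoints zeta_0 = -inf < zeta_1 < ... < zeta_k = inf.
    zeta = np.concatenate(([-np.inf], norm.ppf(np.cumsum(counts)[:-1] / n), [np.inf]))
    # Assumption: F_Y is estimated by rescaled ranks, so u_iY lies in (0, 1).
    z_y = norm.ppf(rankdata(y) / (n + 1))

    def neg_loglik(rho):
        s = np.sqrt(1.0 - rho**2)
        p = norm.cdf((zeta[x] - rho * z_y) / s) - norm.cdf((zeta[x - 1] - rho * z_y) / s)
        # The factor phi(Phi^{-1}(u_iY)) is constant in rho and is omitted.
        return -np.sum(np.log(np.maximum(p, 1e-300)))

    return minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded").x


# Simulated check: latent bivariate normal with correlation 0.6, three categories.
rng = np.random.default_rng(1)
n, rho_true = 20_000, 0.6
z1 = rng.standard_normal(n)
z2 = rho_true * z1 + np.sqrt(1 - rho_true**2) * rng.standard_normal(n)
x = 1 + (z1 > norm.ppf(0.3)).astype(int) + (z1 > norm.ppf(0.7)).astype(int)
y = np.exp(z2)            # any increasing transform of the continuous score
rho_hat = polyserial(x, y)
print(rho_hat)
```

Because the estimate depends on $y$ only through its ranks in this sketch, it is unchanged by increasing transformations of the continuous variable.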

Let $\mathcal{S}_{j}=\{i:x_{i}=j\}$ , $j=1,\ldots,k$ , be the set of indices of $X$ taking value $j$ . We propose to generate the normal scores of the latent variable $Z$ for the ordinal variable $X$ in the following steps.

1. Compute the polyserial correlation $\rho_{N}$ between $X$ and $Y$ . For $j=1,\ldots,k$ , perform steps 2 to 5 on $\mathcal{S}_{j}$ :

2. For each $i\in\mathcal{S}_{j}$ , independently generate $\omega_{i}\sim U(0,1)$ . Let

$u_{iW}=\Phi\left(\sqrt{1-\rho_{N}^{2}}\Phi^{-1}(\omega_{i})+\rho_{N}\Phi^{-1}(u_{iY})\right),$

assuming a bivariate Gaussian copula with correlation $\rho_{N}$ . Note that the generated $u_{iW}$ are in the range of $[0,1]$ . Let $\boldsymbol{u}_{W,j}=\{u_{iW}:i\in\mathcal{S}_{j}\}$ be a vector of length $n_{j}$ .

3. For each $i\in\mathcal{S}_{j}$ , independently generate $\psi_{i}\sim U(n^{-1}\sum_{t=1}^{j-1}n_{t},n^{-1}\sum_{t=1}^{j}n_{t})$ for $j>1$ , or $\psi_{i}\sim U(0,n_{1}/n)$ for $j=1$ . Let $\boldsymbol{\psi}_{j}=\{\psi_{i}:i\in\mathcal{S}_{j}\}$ be a vector of length $n_{j}$ .

4. For each $u_{iW}$ with $i\in\mathcal{S}_{j}$ , find its rank in $\boldsymbol{u}_{W,j}$ . Replace the value of $u_{iW}$ by $\psi_{\ell}$ that has the same rank in $\boldsymbol{\psi}_{j}$ , and denote this value by $u_{iZ}$ . Let $\boldsymbol{u}_{Z,j}=\{u_{iZ}:i\in\mathcal{S}_{j}\}$ .

5. Let $z_{i}=\Phi^{-1}(u_{iZ})$ for each $i\in\mathcal{S}_{j}$ .
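Steps 2 to 5 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `latent_normal_scores` is hypothetical, the simulated data are made up for the demonstration, and the known value $0.6$ stands in for the polyserial correlation $\rho_{N}$ that step 1 would estimate.

```python
import numpy as np
from scipy.stats import norm


def latent_normal_scores(x, u_y, rho, rng):
    """Steps 2-5: latent scores z_i for ordinal x in {1,...,k}, given u_iY and rho_N."""
    x, u_y = np.asarray(x), np.asarray(u_y)
    n, k = len(x), int(x.max())
    counts = np.bincount(x, minlength=k + 1)[1:]
    F_hat = np.concatenate(([0.0], np.cumsum(counts) / n))    # F_hat_X(0), ..., F_hat_X(k)
    u_z = np.empty(n)
    for j in range(1, k + 1):
        idx = np.flatnonzero(x == j)                           # index set S_j
        if idx.size == 0:
            continue
        # Step 2: u_iW from the Gaussian copula conditional on u_iY.
        omega = rng.uniform(size=idx.size)
        u_w = norm.cdf(np.sqrt(1 - rho**2) * norm.ppf(omega) + rho * norm.ppf(u_y[idx]))
        # Step 3: uniforms on the bin (F_hat(j-1), F_hat(j)).
        psi = rng.uniform(F_hat[j - 1], F_hat[j], size=idx.size)
        # Step 4: give each u_iW the psi value with the same within-bin rank.
        ranks = np.argsort(np.argsort(u_w))                    # 0-based ranks
        u_z[idx] = np.sort(psi)[ranks]
    return norm.ppf(u_z), u_z                                  # Step 5


# Illustrative data: three categories cut from a latent normal with correlation 0.6.
rng = np.random.default_rng(2)
n, rho = 5_000, 0.6
z1 = rng.standard_normal(n)
z_y = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
x = 1 + (z1 > norm.ppf(0.3)).astype(int) + (z1 > norm.ppf(0.7)).astype(int)
u_y = (np.argsort(np.argsort(z_y)) + 1) / (n + 1)             # rank-based u_iY
z, u_z = latent_normal_scores(x, u_y, rho, rng)
```

Plotting `z` against `norm.ppf(u_y)` then gives the normal score plot for the pair.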

The values $\{u_{iW}\}$ generated in step 2 do not have a uniform distribution. The additional steps 3 and 4 match the ranks of $u_{iW}$ with a random vector generated from the uniform distribution. This procedure ensures that the generated normal scores $\{z_{i}\}$ fall into the desired bins separated by the cutpoints $\boldsymbol{\zeta}$ and marginally follow the standard normal distribution. Within each bin, $\{z_{i}\}$ and $\{\Phi^{-1}(u_{iY})\}$ also have a correlation similar to $\rho_{N}$ . The two sets of normal scores, $\{z_{i}:i=1,\ldots,n\}$ and $\{\Phi^{-1}(u_{iY}):i=1,\ldots,n\}$ , can then be plotted against each other to decide on suitable parametric copula families. If a bivariate copula model can fit $\{(u_{iZ},u_{iY})\}$ well, then it can also potentially provide adequate fits to $\{(x_{i},y_{i})\}$ .

Let $F_{n,X}(j)={\widehat{F}}_{X}(j)$ for $j=1,\dots,k$ . As $n\to\infty$ , $F_{n,X}(j)=n^{-1}\sum_{t=1}^{j}n_{t}\to F_{X}(j)$ for $j=1,\dots,k$ . Therefore, each element $\psi_{i}$ of $\boldsymbol{\psi}_{j}$ in Step 3 of the proposed algorithm above follows $U\left(F_{n,X}(j-1),F_{n,X}(j)\right)$ and satisfies the condition (1) as $n\to\infty$ . Similarly, each element of $\boldsymbol{u}_{Z,j}$ also satisfies the condition (1) as $n\to\infty$ , since $\boldsymbol{u}_{Z,j}$ is a permutation of $\boldsymbol{\psi}_{j}$ . However, the correlation for $\{(u_{iZ},u_{iY})\}$ is stronger than the correlation for $\{(\psi_{i},u_{iY})\}$ . The bivariate empirical copulas of the two sets $\{(\psi_{i},u_{iY}):i=1,\dots,n\}$ and $\{(u_{iZ},u_{iY}):i=1,\dots,n\}$ evaluated at $(u_{x},u_{y})$ are the same for empirical CDF values $u_{x}=F_{n,X}(j)$ and arbitrary $u_{y}\in(0,1)$ .

3.2 Diagnostics of Estimated Copula

With $F_{XY}(x,y)=C_{XY}(F_{X}(x),F_{Y}(y))$ for a copula, the conditional distribution of the continuous variable $Y$ given $X=x$ , where $x$ is a possible value of the ordinal variable $X$ , is

F_{Y|X}(y|x)=\frac{F_{XY}(x,y)-F_{XY}(x^{-},y)}{F_{X}(x)-F_{X}(x^{-})}=\frac{C_{XY}(F_{X}(x),F_{Y}(y))-C_{XY}(F_{X}(x^{-}),F_{Y}(y))}{F_{X}(x)-F_{X}(x^{-})}.\tag{2}

If ${\widehat{C}}_{XY}=C(\cdot;{\widehat{\boldsymbol{\theta}}})$ is an estimate from a parametric family, let

{\widehat{F}}_{Y|X}(y|x)=\frac{{\widehat{C}}_{XY}({\widehat{F}}_{X}(x),{\widehat{F}}_{Y}(y))-{\widehat{C}}_{XY}({\widehat{F}}_{X}(x^{-}),{\widehat{F}}_{Y}(y))}{{\widehat{F}}_{X}(x)-{\widehat{F}}_{X}(x^{-})}

be the estimated conditional distribution. Given a quantile level $q$ , the conditional quantile ${\widehat{F}}^{-1}_{Y|X}(q|x)$ is obtained as ${\widehat{F}}^{-1}_{Y}(v)$ where $v$ is the root to the equation

\frac{{\widehat{C}}_{XY}\bigl({\widehat{F}}_{X}(x),v\bigr)-{\widehat{C}}_{XY}\bigl({\widehat{F}}_{X}(x^{-}),v\bigr)}{{\widehat{F}}_{X}(x)-{\widehat{F}}_{X}(x^{-})}=q.

To assess the fit of ${\widehat{C}}_{XY}$ , conditional Q-Q plots can be generated based on ${\widehat{F}}^{-1}_{Y|X}(\cdot|j)$ for $j=1,\ldots,k$ . For the $j$ th conditional Q-Q plot, the quantiles ${\widehat{F}}_{Y|X}^{-1}(\cdot|j)$ are plotted against the quantiles of the empirical distribution of $\{y_{i}:x_{i}=j\}$ , that is, the model-based quantiles ${\widehat{F}}_{Y|X}^{-1}\left((m-0.5)/{n_{j}}\big{|}j\right)$ for $m=1,\ldots,n_{j}$ are plotted against the sorted values of $\{y_{i}:x_{i}=j\}$ . If the points in each conditional Q-Q plot lie closely along the $45^{\circ}$ diagonal line, it indicates that the copula estimate fits the sample well.
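The root-finding step for the model-based conditional quantiles can be sketched as follows. For a self-contained illustration, the Frank copula is used here only because its CDF has a closed form; it stands in for whichever family ${\widehat{C}}_{XY}$ has been fitted. The parameter value $\theta=4$ , the category probabilities, the standard normal choice for ${\widehat{F}}_{Y}$ , and all function names are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def frank_cdf(u, v, theta):
    """Frank copula CDF C(u, v; theta), theta != 0."""
    num = np.expm1(-theta * u) * np.expm1(-theta * v)
    return -np.log1p(num / np.expm1(-theta)) / theta


def cond_cdf(v, F_lo, F_hi, theta):
    """Equation (2) on the uniform scale, with F_lo = F_X(x^-) and F_hi = F_X(x)."""
    return (frank_cdf(F_hi, v, theta) - frank_cdf(F_lo, v, theta)) / (F_hi - F_lo)


def cond_quantile(q, F_lo, F_hi, theta, eps=1e-10):
    """Root v of cond_cdf(v) = q, bracketed on (eps, 1 - eps)."""
    return brentq(lambda v: cond_cdf(v, F_lo, F_hi, theta) - q, eps, 1.0 - eps)


# Hypothetical inputs: three categories, theta = 4, and a standard normal F_Y.
F_X = np.array([0.0, 0.3, 0.7, 1.0])
theta, n_j = 4.0, 50
levels = (np.arange(1, n_j + 1) - 0.5) / n_j           # (m - 0.5) / n_j
model_q = {
    j: norm.ppf([cond_quantile(q, F_X[j - 1], F_X[j], theta) for q in levels])
    for j in (1, 2, 3)
}
```

For the $j$ th conditional Q-Q plot, `model_q[j]` would be plotted against the sorted values of $\{y_{i}:x_{i}=j\}$ , with `n_j` set to the observed category count.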

The use of these conditional Q-Q plots for copula diagnostics is illustrated in Section 6 for simulated datasets and Section 7 for a real dataset.

4 Assessing Copula Approximations of Probability Models for Mixed Continuous-Ordinal Variables

For a mix of several continuous variables and one ordinal variable, there are several classes of probability models in the statistical literature. Since our primary goal is to examine whether simple parametric bivariate copula families within the vine pair-copula construction are adequate, we focus on probability models for one continuous and one ordinal variable and assess the adequacy of approximations by parametric bivariate copula families with a few parameters.

This section discusses methods to theoretically assess the adequacy of copula approximations by computing the Kullback-Leibler (KL) divergence of $C(F_{X},F_{Y};\theta)$ with a parametric copula family $C(\cdot;\theta)$ relative to a given bivariate distribution $G_{X,Y}$ with margins $F_{X}$ and $F_{Y}$ . This leads to summaries of conditions for when parametric copula families with a few parameters are adequate, and other conditions for when they are inadequate. In particular, Section 4.1 gives an overview of common probability models for a pair of mixed continuous-ordinal variables. Computational details of the KL divergence are covered in Section 4.2. Some representative concrete examples are considered in Section 4.3 to show a range of KL divergence values.

4.1 Probability Models for Mixed Continuous-Ordinal Variables

There are two classes of models for a mix of continuous and ordinal variables depending on the order of conditioning.

For the first class, a multinomial distribution is assumed for the ordinal variable and the conditional distribution of the continuous variable given the ordinal variable can be either Gaussian with different mean and variance parameters, or a more general location-scale family (Little and Schluchter, (1985) and Krzanowski, (1993)). For the second class, the continuous variable is transformed to have a parametric distribution such as $N(0,1)$ and the ordinal variable conditioned on the continuous variable is an ordinal probit or logit model; references are Cox, (1972) and Krzanowski, (1993). We introduce the notation for these two classes of models with an ordinal variable $X$ and a continuous variable $Y$ .

For the first class of models, a location-scale mixture model is considered. The conditional distribution $F_{Y|X}(\cdot|x)$ is a general location-scale model with

[Y|X=x]\sim\frac{1}{\sigma_{x}}p\left(\frac{y-\mu_{x}}{\sigma_{x}}\right),\quad x\in\{1,\ldots,k\},

where $p$ is a density function on the real line. The ordinal variable $X$ has a probability distribution representing the mixture components. If $Y$ can only take positive values, it can be transformed via a logarithm to get a location-scale model.

For the second class of models, the conditional distribution $F_{X|Y}(\cdot|y)$ is

\mathbb{P}(X\leq x|Y=y)=F(ay+b_{x}),\quad x=1,\ldots,k,\quad b_{1}<\cdots<b_{k-1}<b_{k}=\infty,

where $F$ is a standard normal or logistic CDF, $a$ is a slope parameter, and $b_{x}$ is an offset depending on the ordinal category $x$ . The continuous variable $Y$ is assumed to have a unimodal distribution. If $Y\sim N(0,1)$ , simpler calculation results can be obtained when $F$ is the standard normal CDF.

4.2 Assessing Copula Approximation via the KL Divergence

Let $H_{X,Y},G_{X,Y}$ be two bivariate distributions with densities $h_{X,Y},g_{X,Y}$ on the respective relevant measure spaces. For ordinal $X$ with values in $\{1,\ldots,k\}$ and absolutely continuous $Y$ , the product measure comes from a counting measure and the Lebesgue measure on the real line. The non-negative KL divergence of $h_{X,Y}$ from $g_{X,Y}$ is

KL\left(h_{X,Y},g_{X,Y}\right)=\int_{-\infty}^{\infty}\sum_{i=1}^{k}g_{X,Y}(i,y)\log[g_{X,Y}(i,y)/h_{X,Y}(i,y)]\,{\rm d}y.\tag{3}

Let $C(\cdot;\theta)$ be a bivariate copula family with $C_{1|2}(u|v;\theta)={\partial C(u,v;\theta)/\partial v}$ . If $H_{X,Y}(x,y;\theta)=C(F_{X}(x),F_{Y}(y);\theta)$ with conditional probability mass function $h_{X|Y}(x|y;\theta)=C_{1|2}(F_{X}(x)|F_{Y}(y);\theta)-C_{1|2}(F_{X}(x^{-})|F_{Y}(y);\theta)$ , then the joint density is $h_{X,Y}(x,y;\theta)=f_{Y}(y)\,h_{X|Y}(x|y;\theta)$ with $f_{Y}(y)=\mathrm{d}F_{Y}(y)/\mathrm{d}y$ . In the setting of a mixture model that specifies $f_{X}$ and $g_{Y|X}$ , $f_{Y}(y)=\sum_{i=1}^{k}f_{X}(i)\,g_{Y|X}(y|i)$ .

The copula with parameter estimate

{\hat{\theta}}=\operatorname*{arg\,min}_{\theta}KL\left(h_{X,Y}(\cdot;\theta),g_{X,Y}\right)

is the member in the family that is the closest to $g_{X,Y}$ . The value $KL(h_{X,Y}(\cdot;{\hat{\theta}}),g_{X,Y})$ is considered as the KL divergence of the family $\{h_{X,Y}(\cdot;\theta)\}$ from $g_{X,Y}$ .

In the results below, we consider different parametric copula families denoted by $C^{(m)}(\cdot;\theta^{(m)})$ with corresponding densities $h^{(m)}_{X,Y}(\cdot;\theta^{(m)})$ . Given $g_{X,Y}$ , this leads to $KL(h^{(m)}_{X,Y}(\cdot;{\hat{\theta}}^{(m)}),g_{X,Y})$ for model $m$ . A copula family that has a smaller value of KL divergence is considered to have a better approximation of $g_{X,Y}$ .
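As an illustration of how such a minimized KL divergence might be computed, the sketch below evaluates (3) by numerical integration for a two-component normal mixture $g_{X,Y}$ and the Gaussian copula family, and minimizes over $\rho$ . The mixture parameters are those listed for model (A1) in Table 1; the integration range, the search interval for $\rho$ , and the function names are assumptions of this sketch, not details from the paper.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Model (A1) of Table 1: two-component normal mixture.
pi = np.array([0.5, 0.5])
mu = np.array([1.0, 2.0])
sigma = np.array([1.0, 1.0])
zeta = norm.ppf(np.cumsum(pi)[:-1])        # Phi^{-1}(F_X(j)), j = 1, ..., k-1


def g_joint(y):
    """g_{X,Y}(i, y) for i = 1, ..., k, returned as a vector."""
    return pi * norm.pdf(y, mu, sigma)


def h_joint(y, rho):
    """h_{X,Y}(i, y; rho) = f_Y(y) * h_{X|Y}(i | y; rho) for the Gaussian copula."""
    z_y = norm.ppf(np.sum(pi * norm.cdf(y, mu, sigma)))
    cut = (zeta - rho * z_y) / np.sqrt(1.0 - rho**2)
    lo = np.concatenate(([-np.inf], cut))
    hi = np.concatenate((cut, [np.inf]))
    # P(lo < Z <= hi); survival functions are used in the upper tail for accuracy.
    cond_pmf = np.where(lo > 0, norm.sf(lo) - norm.sf(hi), norm.cdf(hi) - norm.cdf(lo))
    return np.sum(g_joint(y)) * cond_pmf


def kl_divergence(rho, width=6.0):
    """Equation (3), integrated over a finite range holding almost all of the mass."""
    def integrand(y):
        g = g_joint(y)
        return np.sum(g * np.log(g / h_joint(y, rho)))
    a, b = np.min(mu - width * sigma), np.max(mu + width * sigma)
    return quad(integrand, a, b, limit=200)[0]


fit = minimize_scalar(kl_divergence, bounds=(0.01, 0.95), method="bounded")
rho_hat, kl_min = fit.x, fit.fun
print(rho_hat, kl_min)
```

The resulting `kl_min` can be compared with the Gaussian entry for (A1) in Table 1. Other families would be handled by replacing the expression for $C_{1|2}$ inside `h_joint`.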

Next are steps to find a sequence of $C^{(m)}$ that leads to smaller values of the KL divergence of a family from a given $g_{X,Y}$ . Because many simple parametric copula families only have positive dependence, we assume that $X$ and $Y$ have been oriented to have positive dependence. We also assume that $Y$ has a distribution that is close to unimodal (otherwise one might use a non-copula-based model for $(X,Y)$ ). Unless an additional reference is given, properties of the listed copula families can be found in Chapter 4 of Joe, (2014).

1. Compute the KL divergence of the Gaussian copula family as a baseline.

2. For copula family candidates with more dependence in one joint tail, compute the KL divergence of the Gumbel and survival Gumbel copula families with asymmetric dependence in the joint upper and lower tails. If one of these two families leads to smaller KL divergence, then try copula families with even more tail asymmetry in the same direction.

3. For copula family candidates with less dependence in both joint tails (compared to Gaussian), compute the KL divergence of the Frank and Plackett copula families with reflection symmetric tail quadrant independence. If one of these two copula families leads to smaller KL divergence, then compute the KL divergence for the BB8 and BB10 families with reflection tail asymmetry and their survival counterparts. (Note that Kadhem and Nikoloulopoulos, (2021) show that there are parameters that can lead to non-convex contours of copula densities with N(0,1) margins for the BB10 copula.)

4. For copula family candidates with more dependence in both joint tails, compute the KL divergence for the t copula family.

5. If permutation asymmetry is possible in $g_{X,Y}$ (e.g., with heterogeneous components in a mixture model), compute the KL divergence for some copula families with permutation asymmetry; examples are skew normal (Yoshiba, (2018)) and asymmetric Gumbel (Section 4.15 of Joe, (2014)) copula families.

The concept of tail order (Hua and Joe, (2011)), covering copula families with different strengths of dependence in the joint lower and upper tails, is used in the above steps. The main distinctions are intermediate tail dependence (such as Gaussian copula with positive dependence), strong tail dependence (more dependence in the joint tail than Gaussian), and tail quadrant independence (less dependence in the joint tail than Gaussian). The concept of tail order may be less important when one variable is ordinal, because there are no observations in the joint upper or lower tail. However, copula families with tail asymmetries are still important to provide more flexibility when finding approximations to a given $g_{X,Y}$ .

4.3 Examples of KL Divergence for Mixed Continuous-Ordinal Variables

In this section, we consider a variety of concrete examples of the probability models in Section 4.1 which lead to $g_{X,Y}$ in (3). In all of these examples, the marginal density $f_{Y}$ of $g_{X,Y}$ is close to unimodal. Otherwise, $\text{cor}(X,Y)$ could be large and “clusters” might be seen in bivariate scatterplots. After examining a large number of cases, we find that a KL divergence value less than 0.003 usually indicates a good approximation from a copula family, and a KL divergence value greater than 0.01 usually indicates a poor approximation when using the conditional Q-Q diagnostic plots in Section 3.2.

The tables in Section 4.3.1 summarize some representative examples to illustrate what happens for (a) mixture models with different separation of location parameters and common versus varying scale parameters, and (b) conditional probit or logit models. Section 4.3.2 contains conclusions drawn from these examples.

4.3.1 Representative Cases

Tables 1 and 2 have some illustrative examples for 2 and 3 ordinal categories, respectively. The top parts of these tables have mixture components $[Y|X=x]$ that are Gaussian, t, or skew normal. The parametric bivariate copula families that lead to the smallest KL divergence, as well as the minimized KL divergence values, are shown in these tables.

For mixture models of different distributions, the mixing proportion vector $\pi$ of $X$ can vary across different cases. These examples have unequal mixing proportions and some examples have a dominant component. In Table 1 with two ordinal categories, models (A1), (B1), and (C1) have close locations and constant scales; models (A2), (B2), and (C2) have locations that are farther apart and constant scales; models (A3), (B3), and (C3) have close locations and non-constant scales; models (A4), (B4), and (C4) have locations that are farther apart and non-constant scales. In Table 2 with three ordinal categories, models (E1), (F1), and (G1) have equally spaced locations and constant scales; models (E2), (F2), and (G2) have unequally spaced locations and constant scales; models (E3), (F3), and (G3) have equally spaced locations and non-constant scales; models (E4), (F4), and (G4) have unequally spaced locations and non-constant scales. The tables show that the KL divergence values tend to be smaller for cases of close or equally spaced locations and constant scales, and larger for cases of distant or unequally spaced locations and non-constant scales. With enough heterogeneity, the conditional Q-Q plots based on the parametric copula family with the smallest KL divergence typically show some deviation in the tails. When the ordinal variable has four or more categories, it is more difficult for simple parametric copula families to approximate mixture models well.

If there are asymmetries in the proportions in $f_{X}$ or the scale parameters are unequal, the best parametric copulas based on the KL divergence will have reflection or tail asymmetry, but need not be good fits based on the conditional Q-Q plots; see Section 6 for examples of cases (E1) to (E4). With asymmetries and unobserved extremes of the ordinal variable, the best parametric copula family based on the KL divergence could have tail dependence in the joint upper and/or lower tail, or in neither joint tail. However, the concept of tail dependence is less meaningful when one of the variables is ordinal.

The bottom parts of Tables 1 and 2 have ordinal regression models for the ordinal variable $X$ with a continuous covariate $Y$ . For the conditional probit model (D1), $\mathbb{P}(X=0|Y=y)=1-\mathbb{P}(Z\leq ay+b)=\mathbb{P}(Z\leq-ay-b)$ , where $Z\sim N(0,1)$ and $Z$ is independent from $Y\sim N(0,1)$ . As a result, $\mathbb{P}(X=0)=\mathbb{P}(Z+aY\leq-b)$ . Let $W=Z+aY$ . Then $W\sim N(0,1+a^{2})$ and $\text{cor}(W,Y)=a/(\sqrt{1+a^{2}})$ . The binary variable $X$ can thus be considered as being generated from a latent Gaussian variable $W$ with the cutpoint $-b$ . Therefore, no matter what values of $a$ and $b$ are used to specify the Bernoulli distributions for the conditional distribution of $X$ given $Y$ , a bivariate Gaussian copula with $\rho=a/(\sqrt{1+a^{2}})$ always provides a perfect match, leading to a KL divergence of 0.
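The exact match for model (D1) can be checked numerically. The sketch below compares the model's $\mathbb{P}(X=0|Y=y)$ with the conditional probability implied by a Gaussian copula with $\rho=a/\sqrt{1+a^{2}}$ , using $C_{1|2}(u|v;\rho)=\Phi\bigl((\Phi^{-1}(u)-\rho\Phi^{-1}(v))/\sqrt{1-\rho^{2}}\bigr)$ and $\Phi^{-1}(F_{Y}(y))=y$ for $Y\sim N(0,1)$ . The values of $a$ and $b$ are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import norm

a, b = 1.3, -0.4                            # hypothetical slope and offset
rho = a / np.sqrt(1 + a**2)
F_X0 = norm.cdf(-b / np.sqrt(1 + a**2))     # P(X=0) = P(W <= -b), W ~ N(0, 1 + a^2)

y = np.linspace(-4, 4, 81)
model_p0 = norm.cdf(-a * y - b)             # P(X=0 | Y=y) under model (D1)
# Gaussian copula: C_{1|2}(F_X(0) | Phi(y); rho), with Phi^{-1}(F_Y(y)) = y.
copula_p0 = norm.cdf((norm.ppf(F_X0) - rho * y) / np.sqrt(1 - rho**2))

max_abs_diff = np.max(np.abs(model_p0 - copula_p0))
print(max_abs_diff)
```

Since the two conditional probabilities agree for every $y$ and the margin of $Y$ is shared, the integrand in (3) vanishes, which is the KL divergence of 0 stated above.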

For the conditional logit models (D2) and (D3) when $Y$ has a normal distribution, Gaussian copulas can provide a good approximation since the logit function is very close to the probit function, while t copulas (degrees of freedom 28 and 11) approximate these models slightly better with the KL divergence being very close to 0. For the conditional probit or logit models (D4) to (D7) when $Y$ has other distributions such as t or extreme value (EV) distributions, the approximations of copula-based models are slightly worse than in the previous two scenarios, but still adequate.

Mixture of normal distributions
Model	$\pi$	Model parameters	Copula family	KL Divergence
(A1)	$(0.5,0.5)$	$\boldsymbol{\mu}=(1,2),\boldsymbol{\sigma}=(1,1)$	Gaussian	0.0001
(A2)	$(0.7,0.3)$	$\boldsymbol{\mu}=(1,3),\boldsymbol{\sigma}=(1,1)$	BB8	0.0010
(A3)	$(0.6,0.4)$	$\boldsymbol{\mu}=(1,2),\boldsymbol{\sigma}=(1,1.5)$	Survival BB1	0.0040
(A4)	$(0.2,0.8)$	$\boldsymbol{\mu}=(1,3),\boldsymbol{\sigma}=(1,2)$	BB1	0.0087
Mixture of t distributions ( $\nu$ : degrees of freedom)
Model	$\pi$	Model parameters	Copula family	KL Divergence
(B1)	$(0.4,0.6)$	$\boldsymbol{\mu}=(1,2),\boldsymbol{\nu}=(3,3)$	Survival BB10	0.0022
(B2)	$(0.3,0.7)$	$\boldsymbol{\mu}=(1,3),\boldsymbol{\nu}=(3,3)$	Survival BB10	0.0057
(B3)	$(0.6,0.4)$	$\boldsymbol{\mu}=(1,2),\boldsymbol{\nu}=(3,6)$	Survival BB10	0.0040
(B4)	$(0.7,0.3)$	$\boldsymbol{\mu}=(1,3),\boldsymbol{\nu}=(3,6)$	Survival BB10	0.0050
Mixture of skew normal distributions ( $\alpha$ : skew)
Model	$\pi$	Model parameters	Copula family	KL Divergence
(C1)	$(0.3,0.7)$	$\boldsymbol{\mu}=(1,2),\boldsymbol{\sigma}=(1,1),\boldsymbol{\alpha}=(3,3)$	Survival Joe	0.0050
(C2)	$(0.6,0.4)$	$\boldsymbol{\mu}=(1,3),\boldsymbol{\sigma}=(1,1),\boldsymbol{\alpha}=(3,3)$	Survival Joe	0.0073
(C3)	$(0.5,0.5)$	$\boldsymbol{\mu}=(1,2),\boldsymbol{\sigma}=(1,1.5),\boldsymbol{\alpha}=(3,6)$	Clayton	0.0065
(C4)	$(0.4,0.6)$	$\boldsymbol{\mu}=(1,3),\boldsymbol{\sigma}=(1,2),\boldsymbol{\alpha}=(3,6)$	Survival Joe	0.0056
Conditional probit or logit models
Model	Model specifications		Copula family	KL Divergence
(D1)	$Y\sim N(0,1),[X|Y=y]\sim\text{Bernoulli}\left(\Phi(ay+b)\right)$		Gaussian	0
(D2)	$Y\sim N(0,1),[X|Y=y]\sim\text{Bernoulli}\left(1/\left(1+\exp(-y-3)\right)\right)$		t(28)	$1.5\times 10^{-5}$
(D3)	$Y\sim N(0,1),[X|Y=y]\sim\text{Bernoulli}\left(1/\left(1+\exp(-2y+2)\right)\right)$		t(11)	$4.5\times 10^{-5}$
(D4)	$Y\sim t_{3},[X|Y=y]\sim\text{Bernoulli}\left(\Phi(y+2)\right)$		t(8)	0.0021
(D5)	$Y\sim t_{3},[X|Y=y]\sim\text{Bernoulli}\left(1/\left(1+\exp(-y-1)\right)\right)$		Gaussian	0.0021
(D6)	$Y\sim\text{EV},[X|Y=y]\sim\text{Bernoulli}\left(\Phi(y+1)\right)$		Gaussian	0.0019
(D7)	$Y\sim\text{EV},[X|Y=y]\sim\text{Bernoulli}\left(1/\left(1+\exp(-y-1)\right)\right)$		Gaussian	0.0011

Table 1: Bivariate copula families that minimize the KL divergence to probability models for an ordinal variable with two categories and a continuous variable. For mixture models, $\pi$ is the vector of mixing proportions. The minimized KL divergence values are shown in the last column. The skew normal distribution has PDF $f(y)=\frac{2}{\sigma}\phi\left(\frac{y-\mu}{\sigma}\right)\Phi\left(\alpha\left(\frac{y-\mu}{\sigma}\right)\right)$ with a skew parameter $\alpha$ . The extreme value (EV) distribution has CDF $F(y)=\exp(-\exp(-y))$ , $y\in\mathbb{R}$ .

Results are similar with more than two ordinal categories. For the conditional ordinal probit model (H1) with three categories 1, 2, and 3, the categorical variable $X$ can be considered as being generated from a latent Gaussian variable $W$ with cutpoints $b_{1}$ and $b_{2}$ with $b_{1}<b_{2}$ , where $W\sim N(0,1+a^{2})$ and $\text{cor}(W,Y)=-a/(\sqrt{1+a^{2}})$ . No matter what constants $a$ , $b_{1}$ , and $b_{2}$ are used to specify the probabilities of the three categories, the bivariate Gaussian copula with $\rho=-a/(\sqrt{1+a^{2}})$ always provides a perfect match, leading to KL divergences of 0. The same conclusion extends to conditional ordinal probit models with an arbitrary number of categories when $Y$ follows a normal distribution. For the conditional logit model (H2) when $Y$ has a normal distribution, the t copula (degrees of freedom 20) provides a good approximation with KL divergence being very close to 0. For the conditional probit models (H3) and (H4) when $Y$ has t or EV distributions, the approximation of simple parametric copula families becomes slightly worse.
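The latent-variable argument for (H1) can be written out in a few lines. The following is a sketch only: the auxiliary error $\varepsilon$ and the conventions $b_{0}=-\infty$, $b_{3}=\infty$ are notation assumed here for illustration and do not appear in the model statement.

```latex
% Assumed notation: \varepsilon is an auxiliary N(0,1) error independent of Y.
\begin{align*}
W &= -aY+\varepsilon, \qquad \varepsilon\sim N(0,1)\ \text{independent of}\ Y\sim N(0,1),\\
X &= j \iff b_{j-1}<W\leq b_{j}, \qquad j=1,2,3,\quad b_{0}=-\infty,\ b_{3}=\infty,\\
\mathbb{P}(X\leq j\mid Y=y) &= \mathbb{P}(\varepsilon\leq ay+b_{j}) = \Phi(ay+b_{j}), \qquad j=1,2,\\
\operatorname{Var}(W) &= 1+a^{2}, \qquad \operatorname{Cov}(W,Y)=-a, \qquad
\operatorname{cor}(W,Y)=\frac{-a}{\sqrt{1+a^{2}}}.
\end{align*}
```

Because $(W,Y)$ is bivariate normal, its copula is Gaussian with $\rho=-a/\sqrt{1+a^{2}}$, and $X$ is a nondecreasing step function of $W$, so this copula reproduces the joint distribution of $(X,Y)$ for any choice of cutpoints.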

Mixture of normal distributions
Model	$\pi$	Model parameters	Copula family	KL Divergence
(E1)	$(0.3,0.3,0.4)$	$\boldsymbol{\mu}=(1,2,3),\boldsymbol{\sigma}=(1,1,1)$	Gaussian	0.0016
(E2)	$(0.5,0.2,0.3)$	$\boldsymbol{\mu}=(1,3,6),\boldsymbol{\sigma}=(2,2,2)$	Survival BB1	0.0031
(E3)	$(0.4,0.4,0.2)$	$\boldsymbol{\mu}=(1,2,3),\boldsymbol{\sigma}=(3,2,4)$	Asymmetric Gumbel	0.0267
(E4)	$(0.3,0.4,0.3)$	$\boldsymbol{\mu}=(1,3,6),\boldsymbol{\sigma}=(4,6,3)$	Survival BB1	0.0472
Mixture of t distributions ( $\nu$ : degrees of freedom)
Model	$\pi$	Model parameters	Copula family	KL Divergence
(F1)	$(0.2,0.5,0.3)$	$\boldsymbol{\mu}=(1,2,3),\boldsymbol{\nu}=(4,4,4)$	Plackett	0.0028
(F2)	$(0.4,0.2,0.4)$	$\boldsymbol{\mu}=(1,3,7),\boldsymbol{\nu}=(4,4,4)$	Survival BB10	0.0432
(F3)	$(0.4,0.3,0.3)$	$\boldsymbol{\mu}=(1,2,3),\boldsymbol{\nu}=(6,3,9)$	Survival BB10	0.0075
(F4)	$(0.3,0.5,0.2)$	$\boldsymbol{\mu}=(1,3,7),\boldsymbol{\nu}=(6,3,9)$	BB8	0.0039
Mixture of skew normal distributions ( $\alpha$ : skew)
Model	$\pi$	Model parameters	Copula family	KL Divergence
(G1)	$(0.2,0.4,0.4)$	$\boldsymbol{\mu}=(2,3,4),\boldsymbol{\sigma}=(3,3,3),\boldsymbol{\alpha}=(4,4,4)$	Survival Gumbel	0.0102
(G2)	$(0.2,0.3,0.5)$	$\boldsymbol{\mu}=(2,4,8),\boldsymbol{\sigma}=(3,3,3),\boldsymbol{\alpha}=(4,4,4)$	Survival BB10	0.0424
(G3)	$(0.3,0.2,0.5)$	$\boldsymbol{\mu}=(2,3,4),\boldsymbol{\sigma}=(3,1,2),\boldsymbol{\alpha}=(3,2,4)$	t(2)	0.1421
(G4)	$(0.5,0.3,0.2)$	$\boldsymbol{\mu}=(2,4,8),\boldsymbol{\sigma}=(3,1,2),\boldsymbol{\alpha}=(3,2,4)$	BB8	0.2540
Conditional probit or logit models
Model	Model specifications		Copula family	KL Divergence
(H1)	$Y\sim N(0,1),\begin{cases}\mathbb{P}(X\leq 1|Y=y)=\Phi(ay+b_{1})\\ \mathbb{P}(X\leq 2|Y=y)=\Phi(ay+b_{2}),\end{cases}$ $b_{1}<b_{2}$		Gaussian	0
(H2)	$Y\sim N(0,1),\begin{cases}\mathbb{P}(X\leq 1|Y=y)=1/\left(1+\exp(y+1)\right)\\ \mathbb{P}(X\leq 2|Y=y)=1/\left(1+\exp(y-1)\right)\end{cases}$		t(20)	$1.6\times 10^{-5}$
(H3)	$Y\sim t_{3},\begin{cases}\mathbb{P}(X\leq 1|Y=y)=\Phi(-y-1)\\ \mathbb{P}(X\leq 2|Y=y)=\Phi(-y+1)\end{cases}$		Gaussian	0.0038
(H4)	$Y\sim\text{EV},\begin{cases}\mathbb{P}(X\leq 1|Y=y)=\Phi(-y-1)\\ \mathbb{P}(X\leq 2|Y=y)=\Phi(-y+1)\end{cases}$		Gaussian	0.0095

Table 2: Bivariate copula families that minimize the KL divergence to probability models for an ordinal variable with three categories and a continuous variable. The minimized KL divergence values are shown in the last column. Other definitions are the same as in Table 1.

4.3.2 Conclusions on Copula Approximations to Models in Section 4.1

Based on the representative examples in the previous section, the following general conclusions can be drawn. Simple parametric copula-based models provide good approximations mainly for (a) mixture models $[Y|X=x]$ with roughly equally spaced components and similar scale parameters, and (b) ordinal regression models $[X|Y=y]$ that are close to probit models with a unimodal distribution for $Y$ . For mixture models with components that have unequally spaced location parameters or quite different scale parameters, the effective number of parameters is greater than 2, so it is not surprising that the simple parametric copula families do not lead to good approximations. This motivates the use of nonparametric copulas in Section 5.2.

5 Fitting Copula Models to a Mixed Continuous-Ordinal Pair

This section explains the procedures for fitting copula models to a pair of mixed continuous-ordinal variables. For a random sample $\{(x_{i},y_{i}):i=1,\ldots,n\}$ , $u^{+}_{iX},u^{-}_{iX},u_{iY},u_{iZ}$ are as defined in Section 3. Details for fitting parametric and nonparametric copula models are elaborated in Sections 5.1 and 5.2, respectively.

5.1 Parametric Bivariate Copula Families

Suppose there are $M$ parametric bivariate copula models to consider as candidate families. For a parametric bivariate copula model $C^{(m)}$ with parameter vector $\boldsymbol{\theta}^{(m)}$ and $C_{1|2}^{(m)}(u|v;\boldsymbol{\theta}^{(m)})=\partial C^{(m)}(u,v;\boldsymbol{\theta}^{(m)})/\partial v$ , its log-likelihood function is

\mathcal{L}_{m}\bigl{(}\boldsymbol{\theta}^{(m)}\bigr{)}=\sum_{i=1}^{n}\log\left\{C_{1|2}^{(m)}\bigl{(}u_{iX}^{+}|u_{iY};\boldsymbol{\theta}^{(m)}\bigr{)}-C_{1|2}^{(m)}\bigl{(}u_{iX}^{-}|u_{iY};\boldsymbol{\theta}^{(m)}\bigr{)}\right\}.

The maximum likelihood estimator is $\widehat{\boldsymbol{\theta}}^{(m)}_{n}=\operatorname*{arg\,max}_{\boldsymbol{\theta}^{(m)}}\mathcal{L}_{m}(\boldsymbol{\theta}^{(m)})$ for a sample of size $n$ . As $n\to\infty$ , $\widehat{\boldsymbol{\theta}}^{(m)}_{n}$ converges in probability to the value $\widetilde{\boldsymbol{\theta}}^{(m)}$ that minimizes the KL divergence for family $C^{(m)}(\cdot;\boldsymbol{\theta}^{(m)})$ . Parametric copula models are then compared using model selection criteria such as AIC and BIC. Models with smaller values of AIC or BIC are usually considered to fit the data better.
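As a minimal sketch of this fitting step, the code below maximizes the log-likelihood above for a single candidate family, the bivariate Gaussian copula, using only the Python standard library. Several choices are assumptions made for illustration and are not taken from the paper: $u_{iX}^{+}$ and $u_{iX}^{-}$ are computed as the empirical CDF of $X$ at $x_i$ and at the preceding category, $u_{iY}$ is the rank of $y_i$ divided by $n+1$, the data are simulated from a probit model of type (H1) with $a=1$, $b_1=-0.5$, $b_2=0.5$, and the optimizer is a plain golden-section search that presumes a unimodal likelihood in $\rho$.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def gaussian_c12(u, v, rho):
    """Conditional CDF C_{1|2}(u | v; rho) of the bivariate Gaussian copula."""
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    z = (STD_NORMAL.inv_cdf(u) - rho * STD_NORMAL.inv_cdf(v)) / math.sqrt(1.0 - rho * rho)
    return STD_NORMAL.cdf(z)

def log_likelihood(rho, u_plus, u_minus, u_y):
    """Sum over i of log{C_{1|2}(u+_i | u_Yi) - C_{1|2}(u-_i | u_Yi)}."""
    total = 0.0
    for up, um, v in zip(u_plus, u_minus, u_y):
        prob = gaussian_c12(up, v, rho) - gaussian_c12(um, v, rho)
        total += math.log(max(prob, 1e-300))  # guard against log(0)
    return total

def fit_rho(u_plus, u_minus, u_y, lo=-0.99, hi=0.99, tol=1e-4):
    """Golden-section search for the maximizer (assumes a unimodal likelihood)."""
    f = lambda r: log_likelihood(r, u_plus, u_minus, u_y)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:              # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                    # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0

# Hypothetical data from a probit model of type (H1); constants are illustrative.
rng = random.Random(2024)
n, a_coef, b1, b2 = 1000, 1.0, -0.5, 0.5
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
w = [-a_coef * yi + rng.gauss(0.0, 1.0) for yi in y]
x = [1 if wi <= b1 else 2 if wi <= b2 else 3 for wi in w]

# Assumed pseudo-observations: ranks/(n+1) for Y, empirical CDF steps for X.
u_y = [0.0] * n
for rank, i in enumerate(sorted(range(n), key=y.__getitem__), start=1):
    u_y[i] = rank / (n + 1)
cum = [0.0] + [sum(xi <= k for xi in x) / n for k in (1, 2, 3)]
u_plus = [cum[xi] for xi in x]
u_minus = [cum[xi - 1] for xi in x]

rho_hat = fit_rho(u_plus, u_minus, u_y)
rho_true = -a_coef / math.sqrt(1.0 + a_coef ** 2)
aic = 2 * 1 - 2 * log_likelihood(rho_hat, u_plus, u_minus, u_y)  # one parameter
```

Repeating the same fit for each candidate family, with its own $C_{1|2}^{(m)}$ and parameter count, gives the AIC or BIC values to compare. In practice a numerical optimizer from a scientific library would replace the hand-written search.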

5.2 Nonparametric Bivariate Copulas

Since (2) only involves the copula CDF but not the copula density, we consider the empirical beta copula (Segers et al., (2017)) as a nonparametric alternative fitted to pairs $(u_{iZ},u_{iY})$ , $i=1,\ldots,n$ . The empirical beta copula density performs less well than the KDE copula estimator in Nagler, (2018), even after some averaging over distinct subsamples. However, the empirical beta copula CDF has the advantage of being a proper copula, while the KDE approach only leads to a distribution that is approximately a copula.

Let $\boldsymbol{u}_{Z}=\left\{u_{iZ}:i=1,\dots,n\right\}$ and $\boldsymbol{u}_{Y}=\left\{u_{iY}:i=1,\dots,n\right\}$ . For $\{(u_{iZ},u_{iY})\}$ , the bivariate empirical beta copula CDF is given by

C_{n,Z,Y}^{\beta}(u_{Z},u_{Y})=\frac{1}{n}\sum_{i=1}^{n}F_{B}\left(u_{Z};R^{(n)}_{iZ},n+1-R_{iZ}^{(n)}\right)\cdot F_{B}\left(u_{Y};R^{(n)}_{iY},n+1-R_{iY}^{(n)}\right),

(4)

where $F_{B}(\cdot;\alpha,\beta)$ is the CDF of the $\text{Beta}(\alpha,\beta)$ distribution, $R_{iZ}^{(n)}$ is the rank of $u_{iZ}$ in $\boldsymbol{u}_{Z}$ , and $R_{iY}^{(n)}$ is the rank of $u_{iY}$ in $\boldsymbol{u}_{Y}$ . Note that (4) is a continuously differentiable function on $[0,1]^{2}$ . The $r$th order statistic $U_{(r:n)}$ in a random sample of size $n$ generated from $U(0,1)$ follows a $\text{Beta}(r,n+1-r)$ distribution, which leads to the two beta distributions in (4).
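Equation (4) can be evaluated directly, as in the following sketch. It relies on the order-statistic fact stated above: for integer $r$, $F_{B}(u;r,n+1-r)=\mathbb{P}(\text{Binomial}(n,u)\geq r)$, so only binomial tail probabilities are needed. The function names, the tie-breaking rule in `ranks`, and the simulated sample are assumptions made for illustration; the code assumes the sample has no ties.

```python
import math
import random

def binom_upper_tails(n, u):
    """Return a list t with t[r] = P(Binomial(n, u) >= r) = F_B(u; r, n + 1 - r)."""
    if u <= 0.0:
        return [1.0] + [0.0] * (n + 1)
    if u >= 1.0:
        return [1.0] * (n + 1) + [0.0]
    log_u, log_1u = math.log(u), math.log1p(-u)
    pmf = [
        math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * log_u + (n - k) * log_1u)
        for k in range(n + 1)
    ]
    tails = [0.0] * (n + 2)
    for k in range(n, -1, -1):       # accumulate the upper tail from k = n down
        tails[k] = tails[k + 1] + pmf[k]
    return tails

def ranks(values):
    """Ranks 1..n; ties (not expected here) are broken by position."""
    r = [0] * len(values)
    for pos, idx in enumerate(sorted(range(len(values)), key=values.__getitem__), start=1):
        r[idx] = pos
    return r

def empirical_beta_copula(u_z_sample, u_y_sample):
    """Build the CDF in (4) from paired pseudo-observations."""
    n = len(u_z_sample)
    rz, ry = ranks(u_z_sample), ranks(u_y_sample)

    def cdf(u_z, u_y):
        tz, ty = binom_upper_tails(n, u_z), binom_upper_tails(n, u_y)
        return sum(tz[i] * ty[j] for i, j in zip(rz, ry)) / n

    return cdf

# Hypothetical positively dependent sample, for illustration only.
rng = random.Random(1)
u_z = [rng.random() for _ in range(200)]
u_y = [0.7 * z + 0.3 * rng.random() for z in u_z]
C_hat = empirical_beta_copula(u_z, u_y)
```

With no ties, `C_hat(u, 1.0)` returns `u` up to rounding error, because the binomial tail probabilities summed over all ranks equal the binomial mean $nu$. This is the uniform-margin property of a proper copula.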

Assuming a consistent estimator for $F_{Y}$ , the consistency of the empirical beta copula with the latent vector $\boldsymbol{u}_{Z}$ is shown in Section 5.3 below.

5.3 Consistency of the Empirical Beta Copula Estimate with a Latent Variable

Let $F_{n,X}$ be the empirical distribution of the ordinal variable $X$ . Let $F_{n,Y}$ be a consistent estimate of $F_{Y}$ . Let $C_{n,Z,Y}^{\beta}(u_{Z},u_{Y})$ be the empirical beta copula estimate in (4) and let $C_{n,Z,Y}(u_{Z},u_{Y})$ be the empirical copula of the same sample, defined as

C_{n,Z,Y}(u_{Z},u_{Y})=\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\left\{\frac{R_{iZ}^{(n)}}{n}\leq u_{Z}\right\}\mathbb{I}\left\{\frac{R_{iY}^{(n)}}{n}\leq u_{Y}\right\},\quad 0\leq u_{Z}\leq 1,\quad 0\leq u_{Y}\leq 1.

The empirical copula is a step function.

Proposition 2.8 in Segers et al., (2017) states that

\sup_{(u_{Z},u_{Y})\in[0,1]^{2}}\left|C_{n,Z,Y}(u_{Z},u_{Y})-C_{n,Z,Y}^{\beta}(u_{Z},u_{Y})\right|=O\left(n^{-1/2}(\log n)^{1/2}\right).

This indicates that $C_{n,Z,Y}(u_{Z},u_{Y})-C_{n,Z,Y}^{\beta}(u_{Z},u_{Y})\to 0$ for arbitrary $u_{Z}$ and $u_{Y}$ as $n\to\infty$ . It is shown in Section 3.1 that $u_{iZ}$ can be considered as being sampled from a uniform distribution that satisfies the condition (1) as $n\to\infty$ . Based on results on the empirical copula processes (Segers, (2012)), $C_{n,Z,Y}$ converges weakly to $C_{ZY}$ , where $Z$ is the latent Gaussian variable for $X$ in Section 3.1. Since $C_{ZY}$ matches $C_{XY}$ at the CDF values of $X$ when $Z$ satisfies the condition (1), $C^{\beta}_{n,Z,Y}(F_{X}(i),u_{Y})-C_{X,Y}(F_{X}(i),u_{Y})\overset{p}{\to}0$ for $i=1,\dots,k$ and arbitrary $u_{Y}$ as $n\to\infty$ . This shows the consistency of the empirical beta copula estimate in (4).

6 Illustrations on Simulated Datasets

In this section, we illustrate the visualization, estimation, and diagnostic techniques proposed in the previous sections on simulated datasets. Bivariate datasets are generated from four mixture models of normal distributions, denoted by (E1), (E2), (E3), and (E4) in Table 2. In each case, the sample size is 1000.
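The data-generating step can be sketched as follows, using the mixing proportions, means, and standard deviations listed for (E1) to (E4) in Table 2. The category labels 1, 2, 3, the seed, and the function names are assumptions made for illustration.

```python
import random
from statistics import mean, stdev

# Parameters copied from Table 2 (mixtures of normal distributions).
MODELS = {
    "E1": {"pi": (0.3, 0.3, 0.4), "mu": (1, 2, 3), "sigma": (1, 1, 1)},
    "E2": {"pi": (0.5, 0.2, 0.3), "mu": (1, 3, 6), "sigma": (2, 2, 2)},
    "E3": {"pi": (0.4, 0.4, 0.2), "mu": (1, 2, 3), "sigma": (3, 2, 4)},
    "E4": {"pi": (0.3, 0.4, 0.3), "mu": (1, 3, 6), "sigma": (4, 6, 3)},
}

def simulate_mixture(pi, mu, sigma, n=1000, seed=0):
    """Draw X from the mixing proportions, then Y | X = x from N(mu_x, sigma_x^2)."""
    rng = random.Random(seed)
    labels = list(range(1, len(pi) + 1))       # assumed category labels 1..k
    x = rng.choices(labels, weights=pi, k=n)
    y = [rng.gauss(mu[xi - 1], sigma[xi - 1]) for xi in x]
    return x, y

def conditional_summary(x, y):
    """Map each category to (count, mean, SD) of Y within that category."""
    out = {}
    for cat in sorted(set(x)):
        vals = [yi for xi, yi in zip(x, y) if xi == cat]
        out[cat] = (len(vals), mean(vals), stdev(vals))
    return out

datasets = {name: simulate_mixture(**pars) for name, pars in MODELS.items()}
summaries = {name: conditional_summary(*xy) for name, xy in datasets.items()}
```

The per-category summaries make the contrast between the homoscedastic cases (E1), (E2) and the heteroscedastic cases (E3), (E4) visible before any copula is fitted.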

In Table 2, the minimized KL divergence values for cases (E1) and (E2) are much smaller than those for cases (E3) and (E4), indicating that parametric copula families fit (E1) and (E2) better than (E3) and (E4). Normal score plots of the continuous variable $Y$ versus the latent Gaussian variable $Z$ generated from the ordinal variable $X$ based on the steps in Section 3.1 are shown in Figure 1. It can be seen that the normal score plots of (E1) and (E2) have an approximately elliptical shape that can match some commonly used parametric copula families. In contrast, for (E3) and (E4), heteroscedasticity among different components of the mixture model leads to asymmetries and unusual shapes in the normal score plots. Simple parametric copula families are inadequate for the data generated in these two cases.

Figure 1: Normal score plots of the continuous variable $Y$ and the latent Gaussian variable $Z$ generated from the ordinal variable $X$ for cases (E1) to (E4).

Diagnostic techniques in Section 3.2 are applied to assess the fits of the parametric copula families in Table 2 with the smallest KL divergence. The conditional Q-Q plots by category of the ordinal variable $X$ are shown in Figure 2. There are significant departures from the $45^{\circ}$ diagonal line for the bivariate copula families fitted to cases (E3) and (E4), indicating inadequate fits. This aligns with the conclusions drawn from the normal score visualizations in Figure 1.

With the empirical beta copula estimate in Section 5.2, the conditional Q-Q plots by category based on the ordinal variable $X$ are shown in Figure 3. It can be seen that the points in all Q-Q plots are closely aligned with the $45^{\circ}$ diagonal line, indicating that the empirical beta copula estimate provides a much more adequate fit to the data generated in all of these four cases.

7 Data Example

In this section, we illustrate the proposed visualization, estimation, and diagnostic techniques for a pair of mixed continuous-ordinal variables on the Auto MPG dataset (Quinlan, (1993), available at https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

This dataset contains the fuel consumption data of 398 cars from 1970 to 1982. The goal is to predict the mpg (miles per gallon as an indicator of fuel efficiency) of each car based on a set of explanatory variables. The variable cylinders can take five unique ordinal values: 3, 4, 5, 6, and 8. We merge category 3 with 4 and category 5 with 6 since only 4 and 3 observations have 3 and 5 cylinders, respectively. Some summaries of the important variables along with the transformations to achieve positive correlations with the response variable mpg are given in Table 3. The nominal variable origin indicates where the car is from (1: USA, 2: Europe, 3: Japan). Since mpg tends to increase as origin changes from USA to Europe to Japan, we treat origin as an ordinal variable for the vine structure selection and conditional Q-Q plots mentioned below.

Variable	Type	$\rho$	Range	Transformation
cylinders	ordinal	-0.781	$\{4,6,8\}$	change sign, $x=-x$
horsepower	continuous	-0.771	$[46,230]$	change sign, $x=-x$
weight	continuous	-0.832	$[1613,5140]$	change sign, $x=-x$
acceleration	continuous	0.420	$[8.0,24.8]$
model year	continuous	0.579	$[70,82]$
origin	nominal	0.563	$\{1,2,3\}$
mpg	continuous		$[9.0,46.6]$

Table 3: Summaries of the variables in the Auto MPG dataset, including their type, Spearman’s $\rho$ values with the response (mpg), range, and transformations to achieve positive correlation with mpg.

If the maximum spanning tree criterion based on absolute $\rho_{N}$ (correlation of normal scores, or polyserial/polychoric correlation) as in Chang and Joe, (2019) is used to select the vine structure for this dataset, the first tree of the resulting vine regression model would include the two ordinal variables cylinders and origin connected to weight on two edges for explanatory variables. The summary statistics of the distribution of weight (before sign change), the conditional distribution of weight given cylinders (before sign change), and the conditional distribution of weight given origin are shown in Table 4. Note that when fitting the vine copula model, the signs of weight and cylinders are changed so that the correlations between negative weight, negative cylinders, and origin are all positive. The conditional distributions of weight given cylinders have roughly equally spaced means and similar standard deviations; this is close to the setting of a mixture model with equally spaced locations and constant variance. Therefore, parametric copula families can fit this pair well. In contrast, the conditional distributions of weight given origin have unequally spaced means and very different standard deviations; this is similar to a mixture model with unequally spaced locations and non-constant variance. Therefore, it is more difficult for parametric copula families to provide good approximations to this second pair.

Subset	$n$	Min	Q1	Median	Mean	Q3	Max	SD
None	398	1613	2224	2804	2970	3608	5140	847
$\textit{cylinders}=4$	208	1613	2049	2240	2310	2567	3270	345
$\textit{cylinders}=6$	87	2472	2938	3193	3195	3431	3907	332
$\textit{cylinders}=8$	103	3086	3799	4140	4115	4404	5140	449
$\textit{origin}=1$	249	1800	2720	3365	3362	4054	5140	795
$\textit{origin}=2$	70	1825	2067	2240	2423	2770	3820	490
$\textit{origin}=3$	79	1613	1985	2155	2221	2412	2930	320

Table 4: The summary statistics, including sample size ($n$), quartiles, mean, and standard deviation (SD), of the distribution of weight, the conditional distributions of weight given cylinders, and the conditional distributions of weight given origin.

The best-fitting parametric copula family for the pair negative weight and negative cylinders is Gaussian ( $\rho=0.97$ ). Conditional Q-Q plots for weight by category of cylinders are shown in the first row of Figure 4. For the pair of negative weight and origin, the large proportion of the $\textit{origin}=1$ category causes the best-fitting parametric copula families to have tail asymmetry with more dependence in the joint lower tail. The 1-parameter Clayton, 2-parameter BB1, and 2-parameter BB7 copulas have the largest (and approximately equal) log-likelihoods and similar conditional Q-Q plots. The conditional Q-Q plots of weight by category of origin are shown in the second row of Figure 4 with the Clayton copula. The two parametric copula families in this figure generally fit the data well. Some deviations from the $45^{\circ}$ diagonal line can still be observed in the last plot for the weight and origin pair, but inference based on this copula should be acceptable. When the empirical beta copulas are fitted to these two pairs of variables, all conditional Q-Q plots show alignment with the $45^{\circ}$ diagonal line, including the last plot.

8 Conclusion

Through two types of diagnostic plots and theoretical assessments using the Kullback-Leibler divergence, we show that simple parametric bivariate copula families with a few parameters can sometimes be inadequate for a pair of mixed continuous-ordinal variables. For such a pair, visualizations are proposed based on normal score plots using an appropriate latent continuous variable and conditional Q-Q plots of the continuous variable given the ordinal variable. Existing probability models for mixed continuous-ordinal variables are considered to assess the adequacy of fits of simple parametric copula families using the Kullback-Leibler divergence. When a pair of mixed continuous-ordinal variables is generated from mixture models of distributions with roughly equally spaced locations and constant scales or from conditional probit/logit models, simple parametric copula families can provide good fits. Otherwise, nonparametric counterparts can be fitted to provide better approximations. Applications to simulated and real datasets demonstrate the effectiveness of the proposed methods in identifying the lack of fit of simple parametric copula families and in improving the adequacy of fits with nonparametric copulas.

The results in this paper can be used to understand when and how some standard regression methods with ordinal and continuous explanatory variables can be approximated by the vine copula regression methodology as considered in Chang and Joe, (2019). Details will be provided in future research.

Acknowledgments

This research has been supported by the Four-Year Doctoral Fellowship of the University of British Columbia, NSERC Discovery Grant GR010293, and a Mercator Fellowship associated with Deutsche Forschungsgemeinschaft.

References

Chang and Joe, (2019) Chang, B. and Joe, H. (2019). Prediction based on conditional distributions of vine copulas. Computational Statistics & Data Analysis, 139:45–63.
Cox, (1972) Cox, D. R. (1972). The analysis of multivariate binary data. Applied Statistics, pages 113–120.
Genest and Nešlehová, (2007) Genest, C. and Nešlehová, J. (2007). A primer on copulas for count data. ASTIN Bulletin, 37(2):475–515.
Hua and Joe, (2011) Hua, L. and Joe, H. (2011). Tail order and intermediate tail dependence of multivariate copulas. Journal of Multivariate Analysis, 102(10):1454–1471.
Joe, (2014) Joe, H. (2014). Dependence Modeling with Copulas. Chapman & Hall/CRC, Boca Raton, FL.
Kadhem and Nikoloulopoulos, (2021) Kadhem, S. H. and Nikoloulopoulos, A. K. (2021). Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74:365–403.
Krzanowski, (1993) Krzanowski, W. (1993). The location model for mixtures of categorical and continuous variables. Journal of Classification, 10:25–49.
Little and Schluchter, (1985) Little, R. J. and Schluchter, M. D. (1985). Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika, 72(3):497–512.
Nagler, (2018) Nagler, T. (2018). kdecopula: An R package for the kernel estimation of bivariate copula densities. Journal of Statistical Software, 84:1–22.
Olsson et al., (1982) Olsson, U., Drasgow, F., and Dorans, N. J. (1982). The polyserial correlation coefficient. Psychometrika, 47(3):337–347.
Quinlan, (1993) Quinlan, J. R. (1993). Combining instance-based and model-based learning. In Proceedings of the Tenth International Conference on Machine Learning, pages 236–243.
Segers, (2012) Segers, J. (2012). Asymptotics of empirical copula processes under non-restrictive smoothness assumptions. Bernoulli, 18(3):764–782.
Segers et al., (2017) Segers, J., Sibuya, M., and Tsukahara, H. (2017). The empirical beta copula. Journal of Multivariate Analysis, 155:35–51.
Sklar, (1959) Sklar, A. (1959). Fonctions de répartition à $n$ dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris, 8:229–231.
Stöber et al., (2015) Stöber, J., Hong, H. G., Czado, C., and Ghosh, P. (2015). Comorbidity of chronic diseases in the elderly: Patterns identified by a copula design for mixed responses. Computational Statistics & Data Analysis, 88:28–39.
Yoshiba, (2018) Yoshiba, T. (2018). Maximum likelihood estimation of skew-t copulas with its applications to stock returns. Journal of Statistical Computation and Simulation, 88:2489–2506.