This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Assessing Copula Models for Mixed Continuous-Ordinal Variables

Shenyi Pan Department of Statistics, University of British Columbia, Vancouver, BC Canada V6T 1Z4. Email: shenyi.pan@stat.ubc.ca.    Harry Joe Department of Statistics, University of British Columbia, Vancouver, BC Canada V6T 1Z4. Email: Harry.Joe@ubc.ca.
Abstract

Vine pair-copula constructions exist for a mix of continuous and ordinal variables. In some steps, this can involve estimating a bivariate copula for a pair of mixed continuous-ordinal variables. To assess the adequacy of copula fits for such a pair, diagnostic and visualization methods based on normal score plots and conditional Q-Q plots are proposed. The former utilizes a latent continuous variable for the ordinal variable. Using the Kullback-Leibler divergence, existing probability models for mixed continuous-ordinal variable pair are assessed for the adequacy of fit with simple parametric copula families. The effectiveness of the proposed visualization and diagnostic methods is illustrated on simulated and real datasets.

Keywords: parametric copula, empirical beta copula, Kullback-Leibler divergence, location-scale mixture models, normal scores, ordinal regression, polyserial correlation.

1 Introduction

Vine pair-copula constructions have been used for a mix of continuous and discrete/ordinal variables in Stöber et al., (2015), Section 3.9.5 of Joe, (2014), and Chang and Joe, (2019). The latter is concerned with the use of vine constructions for prediction models based on explanatory variables that are a mix of continuous and ordinal variables.

In some steps of the vine construction, the estimation involves a bivariate copula for a pair of mixed continuous-ordinal variables. The main objective of this paper is to assess the adequacy of the fit of parametric copula families for a pair of variables, one of which is continuous and the other is ordinal with a few categories. To decide on possible suitable bivariate parametric copula families, the two variables can be visualized using normal score plots by converting the ordinal variable into an appropriate latent continuous variable. After fitting candidate copula families, quantile-quantile (Q-Q) plots of the continuous variable conditioned on each category of the ordinal variable can be used to assess the adequacy of the fit.

To theoretically assess whether commonly used parametric copula families can fit mixed continuous-ordinal variables well, we consider some bivariate distributions or models proposed in the statistical literature for one ordinal and one continuous variable. The Kullback-Leibler (KL) divergence is used to assess the adequacy of copula-based approximations. For mixture models, we find that simple 1- or 2-parameter parametric copula families can lead to good approximations when there are homoscedastic and roughly equally spaced components and the number of mixture components is small. For conditional probit and logit models, Gaussian or t copulas can provide perfect or near perfect matches when the continuous variable follows a normal distribution. Otherwise, simple parametric bivariate copula families can be inadequate. In particular, in mixture models, as the locations become more dispersed or when there is heteroscedasticity, the best approximating simple parametric copula families based on the KL divergence can have reflection asymmetry, tail asymmetry, or tail dependence, but the fit can still be inadequate based on the conditional Q-Q plots.

When simple parametric copula families do not provide good fits, nonparametric copulas can be fitted with appropriate adaptations to an ordinal variable using the latent continuous variable. We indicate how to adapt the empirical beta copulas (Segers et al., (2017) for use with a pair of mixed continuous and ordinal variables as an example.

The remainder of this paper is organized as follows. Section 2 gives an overview of copulas when fitted to a pair of mixed continuous-ordinal variables. Section 3 proposes visualization methods via normal score plots and copula model diagnostic methods via conditional Q-Q plots for a pair of mixed continuous-ordinal variables. Section 4 discusses approaches to assessing the adequacy of approximations by computing the KL divergence of a copula-based model from a given density. Section 5 covers procedures for fitting parametric and nonparametric copula models to mixed continuous-ordinal variables. The proposed visualization and diagnostic methods are illustrated on simulated datasets in Section 6 and demonstrated on a real dataset in Section 7. Section 8 has final discussions.

2 Copulas for Mixed Continuous-Ordinal Variables

A copula is a multivariate distribution function with univariate Uniform (0,1)(0,1) margins. According to Sklar’s theorem (Sklar, (1959)), a dd-variate distribution FF is a composition of a copula CC and its univariate marginal distributions F1,,FdF_{1},\ldots,F_{d}; that is, F(𝒙)=C(F1(x1),,Fd(xd))F(\boldsymbol{x})=C(F_{1}(x_{1}),\ldots,F_{d}(x_{d})), for 𝒙d\boldsymbol{x}\in\mathbb{R}^{d}. If FF is a continuous dd-variate distribution function with univariate margins F1,,FdF_{1},\ldots,F_{d}, and quantile functions F11,,Fd1F_{1}^{-1},\ldots,F_{d}^{-1}, then the copula C(𝒖)=F(F11(u1),,Fd1(ud))C(\boldsymbol{u})=F(F_{1}^{-1}(u_{1}),\ldots,F_{d}^{-1}(u_{d})), for 𝒖[0,1]d\boldsymbol{u}\in[0,1]^{d}, is the unique choice. If FF is a dd-variate distribution function of mixed continuous-ordinal variables, then the copula is only unique on the set Range(F1)××Range(Fd)\text{Range}(F_{1})\times\cdots\times\text{Range}(F_{d}). Hence, the copula associated with FF is non-unique when some variables are ordinal. Nevertheless, parametric copula families can still be used for data applications. Many non-uniqueness results are shown in Genest and Nešlehová, (2007) for the completely discrete/ordinal case. For the bivariate case of one ordinal and one continuous variable, one candidate for the non-unique copula is based on conditionally uniform given identities that the copula must satisfy, extending an idea on page 122 of Joe, (2014). This corresponds to defining an appropriate latent continuous variable.

Consider an ordinal variable XX that can take kk distinct ordered values {v1,,vk}\{v_{1},\ldots,v_{k}\} and a continuous variable YY. Assume the labeling vj=jv_{j}=j for j=1,,kj=1,\ldots,k without loss of generality. Let FX,YF_{X,Y} be the cumulative distribution function (CDF) of (X,Y)(X,Y), with copula CX,YC_{X,Y} that is unique on {0,FX(1),,FX(k1),1}×[0,1]\{0,F_{X}(1),\ldots,F_{X}(k-1),1\}\times[0,1]. Let ZZ be a continuous latent variable associated with XX with distribution FZF_{Z}. If Z=FZ1(UZ)Z=F^{-1}_{Z}(U_{Z}) and

UZU(FX(i1),FX(i)), for X=i,i{1,,k},U_{Z}\sim U\left(F_{X}(i-1),F_{X}(i)\right),\text{ for }X=i,\quad i\in\{1,\ldots,k\}, (1)

then it can be shown that CZ,Y(FX(x),FY(y))=CX,Y(FX(x),FY(y))C_{Z,Y}(F_{X}(x),F_{Y}(y))=C_{X,Y}(F_{X}(x),F_{Y}(y)) holds for x{1,,k}x\in\{1,\ldots,k\} and FY(y)[0,1]F_{Y}(y)\in[0,1].

The proof is as follows. Note that UZU_{Z} in (1) has U(0,1)U(0,1) distribution. Hence, there is a unique copula CZ,YC_{Z,Y} for (Z,Y)(Z,Y) since ZZ is continuous. Let UY=FY(Y)U_{Y}=F_{Y}(Y) and UX=FX(X)U_{X}=F_{X}(X). By Sklar’s theorem, any copula CX,YC_{X,Y} for (X,Y)(X,Y) satisfies

CX,Y(FX(i),FY(y))=(UXFX(i),UYFY(y)),i=1,,k,<y<.C_{X,Y}\left(F_{X}(i),F_{Y}(y)\right)=\mathbb{P}\left(U_{X}\leq F_{X}(i),U_{Y}\leq F_{Y}(y)\right),\quad i=1,\ldots,k,\quad-\infty<y<\infty.

For a given i{1,,k}i\in\{1,\ldots,k\}, UXFX(i)U_{X}\leq F_{X}(i) implies that XX can take a value in {1,,i}\{1,\ldots,i\}. The corresponding UZU_{Z} thus follows one of the uniform distributions U(0,FX(1)),,U(FX(i1),FX(i))U(0,F_{X}(1)),\dots,U(F_{X}(i-1),F_{X}(i)). This implies that UZFX(i)U_{Z}\leq F_{X}(i) also holds. Similarly, UZFX(i)U_{Z}\leq F_{X}(i) implies that XiX\leq i or UXFX(i)U_{X}\leq F_{X}(i). Hence, UZFX(i)U_{Z}\leq F_{X}(i) and UXFX(i)U_{X}\leq F_{X}(i) are equivalent events, and

CZ,Y(FX(i),FY(y))=(UZFX(i),UYFY(y))\displaystyle C_{Z,Y}\left(F_{X}(i),F_{Y}(y)\right)=\mathbb{P}\left(U_{Z}\leq F_{X}(i),U_{Y}\leq F_{Y}(y)\right)
=(UXFX(i),UYFY(y))=CX,Y(FX(i),FY(y)).\displaystyle=\mathbb{P}\left(U_{X}\leq F_{X}(i),U_{Y}\leq F_{Y}(y)\right)=C_{X,Y}\left(F_{X}(i),F_{Y}(y)\right).

Thus CZ,YC_{Z,Y} matches CX,YC_{X,Y} on {0,FX(1),,FX(k1),1}×[0,1]\{0,F_{X}(1),\ldots,F_{X}(k-1),1\}\times[0,1].

Note that the condition in (1) only requires that UZU_{Z} follows a piecewise uniform distribution given different values of XX. However, there are no restrictions on the joint distribution of UZU_{Z} and UYU_{Y} given UXU_{X}. Given different joint distributions of [UZ,UY|X=i][U_{Z},U_{Y}|X=i], the resulting copulas CZ,YC_{Z,Y} are also different. This shows the non-uniqueness of copulas for the case of one ordinal and one continuous variable. Examples of generating different sets of UZU_{Z} values that satisfy the condition in (1) are illustrated in the next section.

3 Copula Model Diagnostics for Mixed Continuous-Ordinal

There are diagnostic plots and measures of asymmetry or dependence (Chapter 1 and Sections 2.12–2.15 of Joe, (2014)) to assess whether 1-parameter or 2-parameter bivariate copula families are useful for pairs of continuous variables with moderate to strong dependence. In this section, we discuss plots to help decide on possible parametric copula families for a continuous-ordinal pair of variables, and assess the adequacy of fitted copulas for a random sample with mixed continuous-ordinal variables. Visualization using normal score plots is presented in Section 3.1. A diagnostic procedure for copula estimates using conditional Q-Q plots is given in Section 3.2.

With an ordinal variable XX and a continuous variable YY, suppose there is a random sample from FX,YF_{X,Y} consisting of nn pairs (xi,yi)(x_{i},y_{i}) for i=1,,ni=1,\ldots,n, with xi{1,,k}x_{i}\in\{1,\ldots,k\}. For each j{1,,k}j\in\{1,\ldots,k\}, let njn_{j} be the cardinality of {i:xi=j}\{i:x_{i}=j\}. Let F^X{\widehat{F}}_{X} be the empirical distribution function of {xi}\{x_{i}\}. For the continuous variable YY, a parametric or nonparametric model can be applied to estimate FYF_{Y} with F^Y{\widehat{F}}_{Y}. For the iith observation, let uiX+=F^X(xi)u_{iX}^{+}={\widehat{F}}_{X}(x_{i}), uiX=limtxiF^X(t)=F^X(xi)u_{iX}^{-}=\lim_{t\uparrow x_{i}}{\widehat{F}}_{X}(t)={\widehat{F}}_{X}(x_{i}^{-}), and uiY=F^Y(yi)u_{iY}={\widehat{F}}_{Y}(y_{i}).

3.1 Visualizations via Normal Score Plots

For a pair of continuous variables, normal score plots are often used to visualize the copula for the variables and check for deviations from Gaussian dependence. With the ordinal variable XX, {uiX+}\{u_{iX}^{+}\} is a set of discrete values. To visualize the relationship between XX and YY, we use an appropriate latent continuous variable associated with XX.

For the normal score plots, we assume that the ordinal variable XX is generated from a latent standard normal variable ZZ with estimated cutpoints ζj=Φ1((n1++nj)/n)\zeta_{j}=\Phi^{-1}\left((n_{1}+\cdots+n_{j})/{n}\right) for j=1,,k1j=1,\ldots,k-1, where Φ\Phi and Φ1\Phi^{-1} are the CDF and quantile function of the standard normal distribution. Let ζ0=\zeta_{0}=-\infty and ζk=\zeta_{k}=\infty. We would like the latent variable ZZ to be generated in such a way that the correlation between ZZ and YY is close to the correlation between XX and YY in order for the visualization to preserve the strength of correlation from the original variables.

One measure of the association between an ordinal and a continuous variable is the polyserial correlation (see Olsson et al., (1982) and Section 2.12.7 of Joe, (2014)). The polyserial correlation between XX and YY is defined as

ρN=argmaxρi=1nlog{ϕ(Φ1(uiY))[Φ(ζxiρΦ1(uiY)1ρ2)Φ(ζxi1ρΦ1(uiY)1ρ2)]},\rho_{N}=\operatorname*{arg\,max}_{\rho}\sum_{i=1}^{n}\log\left\{\phi\left(\Phi^{-1}(u_{iY})\right)\left[\Phi\left(\frac{\zeta_{x_{i}}-\rho\Phi^{-1}(u_{iY})}{\sqrt{1-\rho^{2}}}\right)-\Phi\left(\frac{\zeta_{x_{i}-1}-\rho\Phi^{-1}(u_{iY})}{\sqrt{1-\rho^{2}}}\right)\right]\right\},

where ϕ\phi is the probability density function (PDF) of the standard normal distribution.

Let 𝒮j={i:xi=j}\mathcal{S}_{j}=\{i:x_{i}=j\}, j=1,,kj=1,\ldots,k, be the set of indices of XX taking value jj. We propose to generate the normal scores of the latent variable ZZ for the ordinal variable XX in the following steps.

  1. 1.

    Compute the polyserial correlation ρN\rho_{N} between XX and YY. For j=1,,kj=1,\ldots,k, perform steps 2 to 5 on 𝒮j\mathcal{S}_{j}:

  2. 2.

    For each i𝒮ji\in\mathcal{S}_{j}, independently generate ωiU(0,1)\omega_{i}\sim U(0,1). Let

    uiW=Φ(1ρN2Φ1(ωi)+ρNΦ1(uiY)),u_{iW}=\Phi\left(\sqrt{1-\rho_{N}^{2}}\Phi^{-1}(\omega_{i})+\rho_{N}\Phi^{-1}(u_{iY})\right),

    assuming a bivariate Gaussian copula with correlation ρN\rho_{N}. Note that the generated uiWu_{iW} are in the range of [0,1][0,1]. Let 𝒖W,j={uiW:i𝒮j}\boldsymbol{u}_{W,j}=\{u_{iW}:i\in\mathcal{S}_{j}\} be a vector of length njn_{j}.

  3. 3.

    For each i𝒮ji\in\mathcal{S}_{j}, independently generate ψiU(n1t=1j1nt,n1t=1jnt)\psi_{i}\sim U(n^{-1}\sum_{t=1}^{j-1}n_{t},n^{-1}\sum_{t=1}^{j}n_{t}) for j>1j>1, or ψiU(0,n1/n)\psi_{i}\sim U(0,n_{1}/n) for j=1j=1. Let 𝝍j={ψi:i𝒮j}\boldsymbol{\psi}_{j}=\{\psi_{i}:i\in\mathcal{S}_{j}\} be a vector of length njn_{j}.

  4. 4.

    For each uiWu_{iW} with i𝒮ji\in\mathcal{S}_{j}, find its rank in 𝒖W,j\boldsymbol{u}_{W,j}. Replace the value of uiWu_{iW} by ψ\psi_{\ell} that has the same rank in 𝝍j\boldsymbol{\psi}_{j}, and denote this value by uiZu_{iZ}. Let 𝒖Z,j={uiZ:i𝒮j}\boldsymbol{u}_{Z,j}=\{u_{iZ}:i\in\mathcal{S}_{j}\}.

  5. 5.

    Let zi=Φ1(uiZ)z_{i}=\Phi^{-1}(u_{iZ}) for each i𝒮ji\in\mathcal{S}_{j}.

The {uiW}\{u_{iW}\} generated in step 2 does not have a uniform distribution. The additional steps 3 and 4 match the ranks of uiWu_{iW} with a random vector generated from the uniform distribution. This procedure ensures that the generated normal scores {zi}\{z_{i}\} fall into the desired bins separated by the cutpoints 𝜻\boldsymbol{\zeta} and marginally follow the standard normal distribution. Within each bin, {zi}\{z_{i}\} and {Φ1(uiY)}\{\Phi^{-1}(u_{iY})\} also have similar correlation to ρN\rho_{N}. The two sets of normal scores, {zi:i=1,,n}\{z_{i}:i=1,\ldots,n\} and {Φ1(uiY):i=1,,n}\{\Phi^{-1}(u_{iY}):i=1,\ldots,n\}, can then be plotted against each other to decide on suitable parametric copula families. If a bivariate copula model can fit {(uiZ,uiY)}\{(u_{iZ},u_{iY})\} well, then it can also potentially provide adequate fits to {(xi,yi)}\{(x_{i},y_{i})\}.

Let Fn,X(j)=F^X(j)F_{n,X}(j)={\widehat{F}}_{X}(j) for j=1,,kj=1,\dots,k. As nn\to\infty, Fn,X(j)=n1t=1jntFX(j)F_{n,X}(j)=n^{-1}\sum_{t=1}^{j}n_{t}\to F_{X}(j) for j=1,,kj=1,\dots,k. Therefore, each element ψi\psi_{i} of 𝝍j\boldsymbol{\psi}_{j} in Step 3 of the proposed algorithm above follows U(Fn,X(j1),Fn,X(j))U\left(F_{n,X}(j-1),F_{n,X}(j)\right) and satisfies the condition (1) as nn\to\infty. Similarly, each element of 𝒖Z,j\boldsymbol{u}_{Z,j} also satisfies the condition (1) as nn\to\infty, since 𝒖Z,j\boldsymbol{u}_{Z,j} is a permutation of 𝝍j\boldsymbol{\psi}_{j}. However, the correlation for {(uiZ,uiY)}\{(u_{iZ},u_{iY})\} is stronger than the correlation for {(ψi,uiY)}\{(\psi_{i},u_{iY})\}. The bivariate empirical copulas of the two sets {(ψi,uiY):i=1,,n}\{(\psi_{i},u_{iY}):i=1,\dots,n\} and {(uiZ,uiY):i=1,,n}\{(u_{iZ},u_{iY}):i=1,\dots,n\} evaluated at (ux,uy)(u_{x},u_{y}) are the same for empirical CDF values ux=Fn,X(j)u_{x}=F_{n,X}(j) and arbitrary uy(0,1)u_{y}\in(0,1).

3.2 Diagnostics of Estimated Copula

With FXY(x,y)=CXY(FX(x),FY(y))F_{XY}(x,y)=C_{XY}(F_{X}(x),F_{Y}(y)) for a copula, the conditional distribution of the continuous variable YY given X=xX=x, where xx is a possible value of the ordinal variable XX, is

FY|X(y|x)=FXY(x,y)FXY(x,y)FX(x)FX(x)=CXY(FX(x),FY(y))CXY(FX(x),FY(y))FX(x)FX(x).F_{Y|X}(y|x)=\frac{F_{XY}(x,y)-F_{XY}(x^{-},y)}{F_{X}(x)-F_{X}(x^{-})}=\frac{C_{XY}(F_{X}(x),F_{Y}(y))-C_{XY}(F_{X}(x^{-}),F_{Y}(y))}{F_{X}(x)-F_{X}(x^{-})}. (2)

If C^XY=C(;𝜽^){\widehat{C}}_{XY}=C(\cdot;{\widehat{\boldsymbol{\theta}}}) is an estimate from a parametric family, let

F^Y|X(y|x)=C^XY(F^X(x),F^Y(y))C^XY(F^X(x),F^Y(y))F^X(x)F^X(x){\widehat{F}}_{Y|X}(y|x)=\frac{{\widehat{C}}_{XY}({\widehat{F}}_{X}(x),{\widehat{F}}_{Y}(y))-{\widehat{C}}_{XY}({\widehat{F}}_{X}(x^{-}),{\widehat{F}}_{Y}(y))}{{\widehat{F}}_{X}(x)-{\widehat{F}}_{X}(x^{-})}

be the estimated conditional distribution. Given a quantile level qq, the conditional quantile F^Y|X1(q|x){\widehat{F}}^{-1}_{Y|X}(q|x) is obtained as F^Y1(v){\widehat{F}}^{-1}_{Y}(v) where vv is the root to the equation

C^XY(F^X(x),v)C^XY(F^X(x),v)F^X(x)F^X(x)=q.\frac{{\widehat{C}}_{XY}\bigl{(}{\widehat{F}}_{X}(x),v)-{\widehat{C}}_{XY}\bigl{(}{\widehat{F}}_{X}(x^{-}),v\bigr{)}}{{\widehat{F}}_{X}(x)-{\widehat{F}}_{X}(x^{-})}=q.

To assess the fit of C^XY{\widehat{C}}_{XY}, conditional Q-Q plots can be generated based on F^Y|X1(|j){\widehat{F}}^{-1}_{Y|X}(\cdot|j) for j=1,,kj=1,\ldots,k. For the jjth conditional Q-Q plot, the quantiles F^Y|X1(|j){\widehat{F}}_{Y|X}^{-1}(\cdot|j) are plotted against the quantiles of the empirical distribution of {yi:xi=j}\{y_{i}:x_{i}=j\}, that is, the model-based quantiles F^Y|X1((m0.5)/nj|j){\widehat{F}}_{Y|X}^{-1}\left((m-0.5)/{n_{j}}\big{|}j\right) for m=1,,njm=1,\ldots,n_{j} are plotted against the sorted values of {yi:xi=j}\{y_{i}:x_{i}=j\}. If the points in each conditional Q-Q plot lie closely along the 4545^{\circ} diagonal line, it indicates that the copula estimate fits the sample well.

The use of these conditional Q-Q plots for copula diagnostics is illustrated in Section 6 for simulated datasets and Section 7 for a real dataset.

4 Assessing Copula Approximations of Probability Models for Mixed Continuous-Ordinal Variables

For a mix of several continuous variables and one ordinal variable, there are several classes of probability models in the statistical literature. Since our primary goal is to examine whether simple parametric bivariate copula families within the vine pair-copula construction are adequate, we focus on probability models for one continuous and one ordinal variable and assess the adequacy of approximations by parametric bivariate copula families with a few parameters.

This section discusses methods to theoretically assess the adequacy of copula approximations by computing the Kullback-Leibler (KL) divergence of C(FX,FY;θ)C(F_{X},F_{Y};\theta) with a parametric copula family C(;θ)C(\cdot;\theta) relative to a given bivariate distribution GX,YG_{X,Y} with margins FXF_{X} and FYF_{Y}. This leads to summaries of conditions for when parametric copula families with a few parameters are adequate, and other conditions for when they are inadequate. In particular, Section 4.1 gives an overview of common probability models for a pair of mixed continuous-ordinal variables. Computational details of the KL divergence are covered in Section 4.2. Some representative concrete examples are considered in Section 4.3 to show a range of KL divergence values.

4.1 Probability Models for Mixed Continuous-Ordinal Variables

There are two classes of models for a mix of continuous and ordinal variables depending on the order of conditioning.

For the first class, a multinomial distribution is assumed for the ordinal variable and the conditional distribution of the continuous variable given the ordinal variable can be either Gaussian with different mean and variance parameters, or a more general location-scale family (Little and Schluchter, (1985) and Krzanowski, (1993)). For the second class, the continuous variable is transformed to have a parametric distribution such as N(0,1)N(0,1) and the ordinal variable conditioned on the continuous variable is an ordinal probit or logit model; references are Cox, (1972) and Krzanowski, (1993). We introduce the notation for these two classes of models with an ordinal variable XX and a continuous variable YY.

For the first class of models, a location-scale mixture model is considered. The conditional distribution FY|X(|x)F_{Y|X}(\cdot|x) is a general location-scale model with

[Y|X=x]1σxp(yμxσx),x{1,,k},[Y|X=x]\sim\frac{1}{\sigma_{x}}p\left(\frac{y-\mu_{x}}{\sigma_{x}}\right),\quad x\in\{1,\ldots,k\},

where pp is a density function on the real line. The ordinal variable XX has a probability distribution representing the mixture components. If YY can only take positive values, it can be transformed via a logarithm to get a location-scale model.

For the second class of models, the conditional distribution FX|Y(|y)F_{X|Y}(\cdot|y) is

(Xx|Y=y)=F(ay+bx),x=1,,k,b1<<bk1<bk=,\mathbb{P}(X\leq x|Y=y)=F(ay+b_{x}),\quad x=1,\ldots,k,\quad b_{1}<\cdots<b_{k-1}<b_{k}=\infty,

where FF is a standard normal or logistic CDF, aa is a slope parameter, and bb is an offset depending on the ordinal category. The continuous variable YY is assumed to have a unimodal distribution. If YN(0,1)Y\sim N(0,1), simpler calculation results can be obtained when FF is the standard normal CDF.

4.2 Assessing Copula Approximation via the KL Divergence

Let HX,Y,GX,YH_{X,Y},G_{X,Y} be two bivariate distributions with densities hX,Y,gX,Yh_{X,Y},g_{X,Y} on the respective relevant measure spaces. For ordinal XX with values in {1,,k}\{1,\ldots,k\} and absolutely continuous YY, the product measure comes from a counting measure and the Lebesgue measure on the real line. The non-negative KL divergence of hX,Yh_{X,Y} from gX,Yg_{X,Y} is

KL(hX,Y,gX,Y)=i=1kgX,Y(i,y)log[gX,Y(i,y)/hX,Y(i,y)]dy.KL\left(h_{X,Y},g_{X,Y}\right)=\int_{-\infty}^{\infty}\sum_{i=1}^{k}g_{X,Y}(i,y)\log[g_{X,Y}(i,y)/h_{X,Y}(i,y)]\,{\rm d}y. (3)

Let C(;θ)C(\cdot;\theta) be a bivariate copula family with C1|2(u|v;θ)=C(u,v;θ)/vC_{1|2}(u|v;\theta)={\partial C(u,v;\theta)/\partial v}. If HX,Y(x,y;θ)=C(FX(x),FY(y);θ)H_{X,Y}(x,y;\theta)=C(F_{X}(x),F_{Y}(y);\theta) with conditional probability mass function hX|Y(x|y;θ)=C1|2(FX(x)|FY(y);θ)C1|2(FX(x)|FY(y);θ)h_{X|Y}(x|y;\theta)=C_{1|2}(F_{X}(x)|F_{Y}(y);\theta)-C_{1|2}(F_{X}(x^{-})|F_{Y}(y);\theta), then the joint density is hX,Y(x,y;θ)=fY(y)hX|Y(x|y;θ)h_{X,Y}(x,y;\theta)=f_{Y}(y)\,h_{X|Y}(x|y;\theta) with fY(y)=dFY(y)/dyf_{Y}(y)=\mathrm{d}F_{Y}(y)/\mathrm{d}y. In the setting of a mixture model that specifies fXf_{X} and gY|Xg_{Y|X}, fY(y)=i=1kfX(i)gY|X(y|i)f_{Y}(y)=\sum_{i=1}^{k}f_{X}(i)\,g_{Y|X}(y|i).

The copula with parameter estimate

θ^=argminθKL(hX,Y(;θ),gX,Y){\hat{\theta}}=\operatorname*{arg\,min}_{\theta}KL\left(h_{X,Y}(\cdot;\theta),g_{X,Y}\right)

is the member in the family that is the closest to gX,Yg_{X,Y}. The value KL(hX,Y(;θ^),gX,Y)KL(h_{X,Y}(\cdot;{\hat{\theta}}),g_{X,Y}) is considered as the KL divergence of the family {hX,Y(;θ)}\{h_{X,Y}(\cdot;\theta)\} from gX,Yg_{X,Y}.

In the results below, we consider different parametric copula families denoted by C(m)(;θ(m))C^{(m)}(\cdot;\theta^{(m)}) with corresponding densities hX,Y(m)(;θ(m))h^{(m)}_{X,Y}(\cdot;\theta^{(m)}). Given gX,Yg_{X,Y}, this leads to KL(hX,Y(m)(;θ^(m)),gX,Y)KL(h^{(m)}_{X,Y}(\cdot;{\hat{\theta}}^{(m)}),g_{X,Y}) for model mm. A copula family that has a smaller value of KL divergence is considered to have a better approximation of gX,Yg_{X,Y}.

Next are steps to find a sequence of C(m)C^{(m)} that leads to smaller values of the KL divergence of a family from a given gX,Yg_{X,Y}. Because many simple parametric copula families only have positive dependence, we assume that XX and YY have been oriented to have positive dependence. We also assume that YY has a distribution that is close to unimodal (otherwise one might use a non-copula-based model for (X,Y)(X,Y)). Unless an additional reference is given, properties of the listed copula families can be found in Chapter 4 of Joe, (2014).

  1. 1.

    Compute the KL divergence of the Gaussian copula family as a baseline.

  2. 2.

    For copula family candidates with more dependence in one joint tail, compute the KL divergence of the Gumbel and survival Gumbel copula families with asymmetric dependence in the joint upper and lower tails. If one of these two families leads to smaller KL divergence, then try copula families with even more tail asymmetry in the same direction.

  3. 3.

    For copula family candidates with less dependence in both joint tails (compared to Gaussian), compute the KL divergence of the Frank and Plackett copula families with reflection symmetric tail quadrant independence. If one of these two copula families leads to smaller KL divergence, then compute the KL divergence for the BB8 and BB10 111Note that Kadhem and Nikoloulopoulos, (2021) show that there are parameters that can lead to non-convex contours of copula densities with N(0,1) margins for the BB10 copula. families with reflection tail asymmetry and their survival counterparts.

  4. 4.

    For copula family candidates with more dependence in both joint tails, compute the KL divergence for the t copula family.

  5. 5.

    If permutation asymmetry is possible in gX,Yg_{X,Y} (e.g., with heterogeneous components in a mixture model), compute the KL divergence for some copula families with permutation asymmetry; examples are skew normal (Yoshiba, (2018)) and asymmetric Gumbel (Section 4.15 of Joe, (2014)) copula families.

The concept of tail order (Hua and Joe, (2011)), covering copula families with different strengths of dependence in the joint lower and upper tails, is used in the above steps. The main distinctions are intermediate tail dependence (such as Gaussian copula with positive dependence), strong tail dependence (more dependence in the joint tail than Gaussian), and tail quadrant independence (less dependence in the joint tail than Gaussian). The concept of tail order may be less important when one variable is ordinal, because there are no observations in the joint upper or lower tail. However, copula families with tail asymmetries are still important to provide more flexibility when finding approximations to a given gX,Yg_{X,Y}.

4.3 Examples of KL Divergence for Mixed Continuous-Ordinal Variables

In this section, we consider a variety of concrete examples of the probability models in Section 4.1 which lead to gX,Yg_{X,Y} in (3). In all of these examples, the marginal density fYf_{Y} of gX,Yg_{X,Y} is close to unimodal. Otherwise, cor(X,Y)\text{cor}(X,Y) could be large and “clusters" might be seen in bivariate scatterplots. After examining a large number of cases, we find that a KL divergence value less than 0.003 usually indicates a good approximation from a copula family, and a KL divergence value greater than 0.01 usually indicates a poor approximation when using the conditional Q-Q diagnostic plots in Section 3.2.

The tables in Section 4.3.1 summarize some representative examples to illustrate what happens for (a) mixture models with different separation of location parameters and common versus varying scale parameters, and (b) conditional probit or logit models. Section 4.3.2 contains conclusions drawn from these examples.

4.3.1 Representative Cases

Tables 1 and 2 have some illustrative examples for 2 and 3 ordinal categories, respectively. The top parts of these tables have mixture components [Y|X=x][Y|X=x] that are Gaussian, t, or skew normal. The parametric bivariate copula families that lead to the smallest KL divergence, as well as the minimized KL divergence values, are shown in these tables.

For mixture models of different distributions, the mixing proportion vector π\pi of XX can vary across different cases. These examples have unequal mixing proportions and some examples have a dominant component. In Table 1 with two ordinal categories, models (A1), (B1), and (C1) have close locations and constant scales; models (A2), (B2), and (C2) have locations that are farther apart and constant scales; models (A3), (B3), and (C3) have close locations and non-constant scales; models (A4), (B4), and (C4) have locations that are farther apart and non-constant scales. In Table 2 with three ordinal categories, models (E1), (F1), and (G1) have equally spaced locations and constant scales; models (E2), (F2), and (G2) have unequally spaced locations and constant scales; models (E3), (F3), and (G3) have equally spaced locations and non-constant scales; models (E4), (F4), and (G4) have unequally spaced locations and non-constant scales. The tables show that the KL divergence values tend to be smaller for cases of close or equally spaced locations and constant scales, and be larger for cases of distant or unequally spaced locations and non-constant scales. With enough heterogeneity, the conditional Q-Q plots based on the parametric copula family with the smallest KL divergence typically show some deviation in the tails. When the ordinal variable has four or more categories, it is more difficult for simple parametric copula families to approximate mixture models well.

If there are asymmetries in the proportions in fXf_{X} or the scale parameters are unequal, the best parametric copulas based on the KL divergence will have reflection or tail asymmetry, but need not be good fits based on the conditional Q-Q plots; see Section 6 for examples of cases (E1) to (E4). With asymmetries and unobserved extremes of the ordinal variable, the best parametric copula family based on the KL divergence could have tail dependence in the joint upper and/or lower tail, or in neither joint tail. However, the concept of tail dependence is less meaningful when one of the variables is ordinal.

The bottom parts of Tables 1 and 2 have ordinal regression models for the ordinal variable XX with a continuous covariate YY. For the conditional probit model (D1), (X=0|Y=y)=1(Zay+b)=(Zayb)\mathbb{P}(X=0|Y=y)=1-\mathbb{P}(Z\leq ay+b)=\mathbb{P}(Z\leq-ay-b), where ZN(0,1)Z\sim N(0,1) and ZZ is independent from YN(0,1)Y\sim N(0,1). As a result, (X=0)=(Z+aYb)\mathbb{P}(X=0)=\mathbb{P}(Z+aY\leq-b). Let W=Z+aYW=Z+aY. Then WN(0,1+a2)W\sim N(0,1+a^{2}) and cor(W,Y)=a/(1+a2)\text{cor}(W,Y)=a/(\sqrt{1+a^{2}}). The binary variable XX can thus be considered as being generated from a latent Gaussian variable WW with the cutpoint b-b. Therefore, no matter what values of aa and bb are used to specify the Bernoulli distributions for the conditional distribution of XX given YY, a bivariate Gaussian copula with ρ=a/(1+a2)\rho=a/(\sqrt{1+a^{2}}) always provides a perfect match, leading to a KL divergence of 0.

For the conditional logit models (D2) and (D3) when YY has a normal distribution, Gaussian copulas can provide a good approximation since the logit function is very close to the probit function, while t copulas (degrees of freedom 28 and 11) approximate these models slightly better with the KL divergence being very close to 0. For the conditional probit or logit models (D4) to (D7) when YY has other distributions such as t or extreme value (EV) distributions, the approximation of copula-based models are slightly worse than the previous two scenarios, but still adequate.

Mixture of normal distributions
Model π\pi Model parameters Copula family KL Divergence
(A1) (0.5,0.5)(0.5,0.5) 𝝁=(1,2),𝝈=(1,1)\boldsymbol{\mu}=(1,2),\boldsymbol{\sigma}=(1,1) Gaussian 0.0001
(A2) (0.7,0.3)(0.7,0.3) 𝝁=(1,3),𝝈=(1,1)\boldsymbol{\mu}=(1,3),\boldsymbol{\sigma}=(1,1) BB8 0.0010
(A3) (0.6,0.4)(0.6,0.4) 𝝁=(1,2),𝝈=(1,1.5)\boldsymbol{\mu}=(1,2),\boldsymbol{\sigma}=(1,1.5) Survival BB1 0.0040
(A4) (0.2,0.8)(0.2,0.8) 𝝁=(1,3),𝝈=(1,2)\boldsymbol{\mu}=(1,3),\boldsymbol{\sigma}=(1,2) BB1 0.0087
Mixture of t distributions (ν\nu: degrees of freedom)
Model π\pi Model parameters Copula family KL Divergence
(B1) (0.4,0.6)(0.4,0.6) 𝝁=(1,2),𝝂=(3,3)\boldsymbol{\mu}=(1,2),\boldsymbol{\nu}=(3,3) Survival BB10 0.0022
(B2) (0.3,0.7)(0.3,0.7) 𝝁=(1,3),𝝂=(3,3)\boldsymbol{\mu}=(1,3),\boldsymbol{\nu}=(3,3) Survival BB10 0.0057
(B3) (0.6,0.4)(0.6,0.4) 𝝁=(1,2),𝝂=(3,6)\boldsymbol{\mu}=(1,2),\boldsymbol{\nu}=(3,6) Survival BB10 0.0040
(B4) (0.7,0.3)(0.7,0.3) 𝝁=(1,3),𝝂=(3,6)\boldsymbol{\mu}=(1,3),\boldsymbol{\nu}=(3,6) Survival BB10 0.0050
Mixture of skew normal distributions (α\alpha: skew)
Model π\pi Model parameters Copula family KL Divergence
(C1) (0.3,0.7)(0.3,0.7) 𝝁=(1,2),𝝈=(1,1),𝜶=(3,3)\boldsymbol{\mu}=(1,2),\boldsymbol{\sigma}=(1,1),\boldsymbol{\alpha}=(3,3) Survival Joe 0.0050
(C2) (0.6,0.4)(0.6,0.4) 𝝁=(1,3),𝝈=(1,1),𝜶=(3,3)\boldsymbol{\mu}=(1,3),\boldsymbol{\sigma}=(1,1),\boldsymbol{\alpha}=(3,3) Survival Joe 0.0073
(C3) (0.5,0.5)(0.5,0.5) 𝝁=(1,2),𝝈=(1,1.5),𝜶=(3,6)\boldsymbol{\mu}=(1,2),\boldsymbol{\sigma}=(1,1.5),\boldsymbol{\alpha}=(3,6) Clayton 0.0065
(C4) (0.4,0.6)(0.4,0.6) 𝝁=(1,3),𝝈=(1,2),𝜶=(3,6)\boldsymbol{\mu}=(1,3),\boldsymbol{\sigma}=(1,2),\boldsymbol{\alpha}=(3,6) Survival Joe 0.0056
Conditional probit or logit models
Model Model specifications Copula family KL Divergence
(D1) YN(0,1),[X|Y=y]Bernoulli(Φ(ay+b))Y\sim N(0,1),[X|Y=y]\sim\text{Bernoulli}\left(\Phi(ay+b)\right) Gaussian 0
(D2) YN(0,1),[X|Y=y]Bernoulli(1/(1+exp(y3)))Y\sim N(0,1),[X|Y=y]\sim\text{Bernoulli}\left(1/\left(1+\exp(-y-3)\right)\right) t(28) 1.5×1051.5\times 10^{-5}
(D3) YN(0,1),[X|Y=y]Bernoulli(1/(1+exp(2y+2)))Y\sim N(0,1),[X|Y=y]\sim\text{Bernoulli}\left(1/\left(1+\exp(-2y+2)\right)\right) t(11) 4.5×1054.5\times 10^{-5}
(D4) Yt3,[X|Y=y]Bernoulli(Φ(y+2))Y\sim t_{3},[X|Y=y]\sim\text{Bernoulli}\left(\Phi(y+2)\right) t(8) 0.0021
(D5) Yt3,[X|Y=y]Bernoulli(1/exp(y1))Y\sim t_{3},[X|Y=y]\sim\text{Bernoulli}\left(1/\exp(-y-1)\right) Gaussian 0.0021
(D6) YEV,[X|Y=y]Bernoulli(Φ(y+1))Y\sim\text{EV},[X|Y=y]\sim\text{Bernoulli}\left(\Phi(y+1)\right) Gaussian 0.0019
(D7) YEV,[X|Y=y]Bernoulli(1+exp(y1))Y\sim\text{EV},[X|Y=y]\sim\text{Bernoulli}\left(1+\exp(-y-1)\right) Gaussian 0.0011
Table 1: Bivariate copula families that minimize the KL divergence to probability models for an ordinal variable with two categories and a continuous variable. For mixture models, π\pi is the vector of mixing proportions. The minimized KL divergence values are shown in the last column. The skew normal distribution has PDF f(y)=2σϕ(yμσ)Φ(α(yμσ))f(y)=\frac{2}{\sigma}\phi\left(\frac{y-\mu}{\sigma}\right)\Phi\left(\alpha\left(\frac{y-\mu}{\sigma}\right)\right) with a skew parameter α\alpha. The extreme value (EV) distribution has CDF F(y)=exp(exp(y)),yF(y)=\exp(-\exp(-y)),y\in\mathbb{R}.

Results are similar with more than two ordinal categories. For the conditional ordinal probit model (H1) with three categories 1, 2, and 3, the categorical variable XX can be considered as being generated from a latent Gaussian variable WW with cutpoints b1b_{1} and b2b_{2} with b1<b2b_{1}<b_{2}, where WN(0,1+a2)W\sim N(0,1+a^{2}) and cor(W,Y)=a/(1+a2)\text{cor}(W,Y)=-a/(\sqrt{1+a^{2}}). No matter what constants aa, b1b_{1}, and b2b_{2} are used to specify the probabilities of the three categories, the bivariate Gaussian copula with ρ=a/(1+a2)\rho=-a/(\sqrt{1+a^{2}}) always provides a perfect match, leading to KL divergences of 0. The same conclusion extends to conditional ordinal probit models with an arbitrary number of categories when YY follows a normal distribution. For the conditional logit model (H2) when YY has a normal distribution, the t copula (degrees of freedom 20) provides a good approximation with KL divergence being very close to 0. For the conditional probit models (H3) and (H4) when YY has t or EV distributions, the approximation of simple parametric copula families becomes slightly worse.

Mixture of normal distributions
Model π\pi Model parameters Copula family KL Divergence
(E1) (0.3,0.3,0.4)(0.3,0.3,0.4) 𝝁=(1,2,3),𝝈=(1,1,1)\boldsymbol{\mu}=(1,2,3),\boldsymbol{\sigma}=(1,1,1) Gaussian 0.0016
(E2) (0.5,0.2,0.3)(0.5,0.2,0.3) 𝝁=(1,3,6),𝝈=(2,2,2)\boldsymbol{\mu}=(1,3,6),\boldsymbol{\sigma}=(2,2,2) Survival BB1 0.0031
(E3) (0.4,0.4,0.2)(0.4,0.4,0.2) 𝝁=(1,2,3),𝝈=(3,2,4)\boldsymbol{\mu}=(1,2,3),\boldsymbol{\sigma}=(3,2,4) Asymmetric Gumbel 0.0267
(E4) (0.3,0.4,0.3)(0.3,0.4,0.3) 𝝁=(1,3,6),𝝈=(4,6,3)\boldsymbol{\mu}=(1,3,6),\boldsymbol{\sigma}=(4,6,3) Survival BB1 0.0472
Mixture of t distributions (ν\nu: degrees of freedom)
Model π\pi Model parameters Copula family KL Divergence
(F1) (0.2,0.5,0.3)(0.2,0.5,0.3) 𝝁=(1,2,3),𝝂=(4,4,4)\boldsymbol{\mu}=(1,2,3),\boldsymbol{\nu}=(4,4,4) Plackett 0.0028
(F2) (0.4,0.2,0.4)(0.4,0.2,0.4) 𝝁=(1,3,7),𝝂=(4,4,4)\boldsymbol{\mu}=(1,3,7),\boldsymbol{\nu}=(4,4,4) Survival BB10 0.0432
(F3) (0.4,0.3,0.3)(0.4,0.3,0.3) 𝝁=(1,2,3),𝝂=(6,3,9)\boldsymbol{\mu}=(1,2,3),\boldsymbol{\nu}=(6,3,9) Survival BB10 0.0075
(F4) (0.3,0.5,0.2)(0.3,0.5,0.2) 𝝁=(1,3,7),𝝂=(6,3,9)\boldsymbol{\mu}=(1,3,7),\boldsymbol{\nu}=(6,3,9) BB8 0.0039
Mixture of skew normal distributions (α\alpha: skew)
Model π\pi Model parameters Copula family KL Divergence
(G1) (0.2,0.4,0.4)(0.2,0.4,0.4) 𝝁=(2,3,4),𝝈=(3,3,3),𝜶=(4,4,4)\boldsymbol{\mu}=(2,3,4),\boldsymbol{\sigma}=(3,3,3),\boldsymbol{\alpha}=(4,4,4) Survival Gumbel 0.0102
(G2) (0.2,0.3,0.5)(0.2,0.3,0.5) 𝝁=(2,4,8),𝝈=(3,3,3),𝜶=(4,4,4)\boldsymbol{\mu}=(2,4,8),\boldsymbol{\sigma}=(3,3,3),\boldsymbol{\alpha}=(4,4,4) Survival BB10 0.0424
(G3) (0.3,0.2,0.5)(0.3,0.2,0.5) 𝝁=(2,3,4),𝝈=(3,1,2),𝜶=(3,2,4)\boldsymbol{\mu}=(2,3,4),\boldsymbol{\sigma}=(3,1,2),\boldsymbol{\alpha}=(3,2,4) t(2) 0.1421
(G4) (0.5,0.3,0.2)(0.5,0.3,0.2) 𝝁=(2,4,8),𝝈=(3,1,2),𝜶=(3,2,4)\boldsymbol{\mu}=(2,4,8),\boldsymbol{\sigma}=(3,1,2),\boldsymbol{\alpha}=(3,2,4) BB8 0.2540
Conditional probit or logit models
Model Model specifications Copula family KL Divergence
(H1) YN(0,1),{(X1|Y=y)=Φ(ay+b1)(X2|Y=y)=Φ(ay+b2),Y\sim N(0,1),\begin{cases}\mathbb{P}(X\leq 1|Y=y)=\Phi(ay+b_{1})\\ \mathbb{P}(X\leq 2|Y=y)=\Phi(ay+b_{2}),\end{cases}    b1<b2b_{1}<b_{2} Gaussian 0
\hdashline(H2) YN(0,1),{(X1|Y=y)=1/exp(y+1)(X2|Y=y)=1/exp(y1)Y\sim N(0,1),\begin{cases}\mathbb{P}(X\leq 1|Y=y)=1/\exp(y+1)\\ \mathbb{P}(X\leq 2|Y=y)=1/\exp(y-1)\end{cases} t(20) 1.6×1051.6\times 10^{-5}
\hdashline(H3) Yt3,{(X1|Y=y)=Φ(y1)(X2|Y=y)=Φ(y+1)Y\sim t_{3},\begin{cases}\mathbb{P}(X\leq 1|Y=y)=\Phi(-y-1)\\ \mathbb{P}(X\leq 2|Y=y)=\Phi(-y+1)\end{cases} Gaussian 0.0038
\hdashline(H4) YEV,{(X1|Y=y)=Φ(y1)(X2|Y=y)=Φ(y+1)Y\sim\text{EV},\begin{cases}\mathbb{P}(X\leq 1|Y=y)=\Phi(-y-1)\\ \mathbb{P}(X\leq 2|Y=y)=\Phi(-y+1)\end{cases} Gaussian 0.0095
Table 2: Bivariate copula families that minimize the KL divergence to probability models for an ordinal variable with three categories and a continuous variable. The minimized KL divergence values are shown in the last column. Other definitions are the same as in Table 1.

4.3.2 Conclusions on Copula Approximations to Models in Section 4.1

Based on the representative examples in the previous section, the following general conclusions can be drawn. Simple parametric copula-based models provide good approximations mainly for (a) mixture models [Y|X=x][Y|X=x] with roughly equally spaced components and similar scale parameters, and (b) ordinal regression models [X|Y=y][X|Y=y] that are closed to probit models with a unimodal distribution for YY. For mixture models with components that have unequally spaced location parameters or quite different scale parameters, the effective number of parameters is greater than 2, so it is not surprising that the simple parametric copula families do not lead to good approximations. This motivates the use of nonparametric copulas in Section 5.2.

5 Fitting Copula Models to a Mixed Continuous-Ordinal Pair

This section explains the procedures for fitting copula models to a pair of mixed continuous-ordinal variables. For a random sample {(xi,yi):i=1,,n}\{(x_{i},y_{i}):i=1,\ldots,n\}, uiX+,uiX,uiY,uiZu^{+}_{iX},u^{-}_{iX},u_{iY},u_{iZ} are as defined in Section 3. Details for fitting parametric and nonparametric copula models are elaborated in Sections 5.1 and 5.2, respectively.

5.1 Parametric Bivariate Copula Families

Suppose there are MM parametric bivariate copula models to consider as candidate families. For a parametric bivariate copula model C(m)C^{(m)} with parameter vector 𝜽(m)\boldsymbol{\theta}^{(m)} and C1|2(m)(u|v;𝜽(m))=C(m)(u,v;𝜽(m))/vC_{1|2}^{(m)}(u|v;\boldsymbol{\theta}^{(m)})=\partial C^{(m)}(u,v;\boldsymbol{\theta}^{(m)})/\partial v, its log-likelihood function is

m(𝜽(m))=i=1nlog{C1|2(m)(uiX+|uiY;𝜽(m))C1|2(m)(uiX|uiY;𝜽(m))}.\mathcal{L}_{m}\bigl{(}\boldsymbol{\theta}^{(m)}\bigr{)}=\sum_{i=1}^{n}\log\left\{C_{1|2}^{(m)}\bigl{(}u_{iX}^{+}|u_{iY};\boldsymbol{\theta}^{(m)}\bigr{)}-C_{1|2}^{(m)}\bigl{(}u_{iX}^{-}|u_{iY};\boldsymbol{\theta}^{(m)}\bigr{)}\right\}.

The maximum likelihood estimator is 𝜽^n(m)=argmax𝜽(m)m(𝜽(m))\widehat{\boldsymbol{\theta}}^{(m)}_{n}=\operatorname*{arg\,max}_{\boldsymbol{\theta}^{(m)}}\mathcal{L}_{m}(\boldsymbol{\theta}^{(m)}) for a sample of size nn. As nn\to\infty, 𝜽^n(m)\widehat{\boldsymbol{\theta}}^{(m)}_{n} converges in probability to the value 𝜽~(m)\widetilde{\boldsymbol{\theta}}^{(m)} that minimizes the KL divergence for family C(m)(;𝜽(m))C^{(m)}(\cdot;\boldsymbol{\theta}^{(m)}). Parametric copula models are then compared using model selection criteria such as AIC and BIC. Models with smaller values of AIC or BIC are usually considered to fit the data better.

5.2 Nonparametric Bivariate Copulas

Since (2) only involves the copula CDF but not the copula density, we consider the empirical beta copula (Segers et al., (2017)) as a nonparametric alternative fitted to pairs (uiZ,uiY)(u_{iZ},u_{iY}), i=1,,ni=1,\ldots,n. The empirical beta copula density performs less well than the KDE copula estimator in Nagler, (2018), even after some averaging over distinct subsamples. However, the empirical beta copula CDF has the advantage of being a proper copula, while the KDE approach only leads to a distribution that is approximately a copula.

Let 𝒖Z={uiZ:i=1,,n}\boldsymbol{u}_{Z}=\left\{u_{iZ}:i=1,\dots,n\right\} and 𝒖Y={uiY:i=1,,n}\boldsymbol{u}_{Y}=\left\{u_{iY}:i=1,\dots,n\right\}. For {(uiZ,uiY)}\{(u_{iZ},u_{iY})\}, the bivariate empirical beta copula CDF is given by

Cn,Z,Yβ(uZ,uY)=1ni=1nFB(uZ;RiZ(n),n+1RiZ(n))FB(uY;RiY(n),n+1RiY(n)),C_{n,Z,Y}^{\beta}(u_{Z},u_{Y})=\frac{1}{n}\sum_{i=1}^{n}F_{B}\left(u_{Z};R^{(n)}_{iZ},n+1-R_{iZ}^{(n)}\right)\cdot F_{B}\left(u_{Y};R^{(n)}_{iY},n+1-R_{iY}^{(n)}\right), (4)

where FB(;α,β)F_{B}(\cdot;\alpha,\beta) is the CDF of the Beta(α,β)\text{Beta}(\alpha,\beta) distribution, RiZ(n)R_{iZ}^{(n)} is the rank of uiZu_{iZ} in 𝒖Z\boldsymbol{u}_{Z}, and RiY(n)R_{iY}^{(n)} is the rank of uiYu_{iY} in 𝒖Y\boldsymbol{u}_{Y}. Note that (4) is a continuous differentiable function on [0,1]2[0,1]^{2}. The rrth order statistic U(r:n)U_{(r:n)} in a random sample of size nn generated from U(0,1)U(0,1) follows a Beta(r,n+1r)\text{Beta}(r,n+1-r) distribution, which leads to the two beta distributions in (4).

Assuming a consistent estimator for FYF_{Y}, the consistency of the empirical beta copula with the latent vector 𝒖Z\boldsymbol{u}_{Z} is shown in Section 5.3 below.

5.3 Consistency of the Empirical Beta Copula Estimate with a Latent Variable

Let Fn,XF_{n,X} be the empirical distribution of the ordinal variable XX. Let Fn,YF_{n,Y} be a consistent estimate of FYF_{Y}. Let Cn,Z,Yβ(uZ,uY)C_{n,Z,Y}^{\beta}(u_{Z},u_{Y}) be the empirical beta copula estimate in (4) and let Cn,Z,Y(uZ,uY)C_{n,Z,Y}(u_{Z},u_{Y}) be the empirical copula of the same sample, defined as

Cn,Z,Y(uZ,uY)=1ni=1n𝕀{RiZ(n)nuZ}𝕀{RiY(n)nuY},0uZ1,0uY1.C_{n,Z,Y}(u_{Z},u_{Y})=\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\left\{\frac{R_{iZ}^{(n)}}{n}\leq u_{Z}\right\}\mathbb{I}\left\{\frac{R_{iY}^{(n)}}{n}\leq u_{Y}\right\},\quad 0\leq u_{Z}\leq 1,\quad 0\leq u_{Y}\leq 1.

The empirical copula is a step function.

Proposition 2.8 in Segers et al., (2017) states that

sup(uZ,uY)[0,1]2|Cn,Z,Y(uZ,uY)Cn,Z,Yβ(uZ,uY)|=O(n1/2(logn)1/2).\sup_{(u_{Z},u_{Y})\in[0,1]^{2}}\left|C_{n,Z,Y}(u_{Z},u_{Y})-C_{n,Z,Y}^{\beta}(u_{Z},u_{Y})\right|=O\left(n^{-1/2}(\log n)^{1/2}\right).

This indicates that Cn,Z,Y(uZ,uY)Cn,Z,Yβ(uZ,uY)0C_{n,Z,Y}(u_{Z},u_{Y})-C_{n,Z,Y}^{\beta}(u_{Z},u_{Y})\to 0 for arbitrary uZu_{Z} and uYu_{Y} as nn\to\infty. It is shown in Section 3.1 that uiZu_{iZ} can be considered as being sampled from a uniform distribution that satisfies the condition (1) as nn\to\infty. Based on results on the empirical copula processes (Segers, (2012)), Cn,Z,YC_{n,Z,Y} converges weakly to CZYC_{ZY}, where ZZ is the latent Gaussian variable for XX in Section 3.1. Since CZYC_{ZY} matches CXYC_{XY} at the CDF values of XX when ZZ satisfies the condition (1), Cn,Z,Yβ(FX(i),uY)CX,Y(FX(i),uY)𝑝0C^{\beta}_{n,Z,Y}(F_{X}(i),u_{Y})-C_{X,Y}(F_{X}(i),u_{Y})\overset{p}{\to}0 for i=1,,ki=1,\dots,k and arbitrary uYu_{Y} as nn\to\infty. This shows the consistency of the empirical beta copula estimate in (4).

6 Illustrations on Simulated Datasets

In this section, we illustrate the visualization, estimation, and diagnostic techniques proposed in the previous sections on simulated datasets. Bivariate datasets are generated from four mixture models of normal distributions, denoted by (E1), (E2), (E3), and (E4) in Table 2. In each case, the sample size is 1000.

In Table 2, the minimized KL divergence values for cases (E1) and (E2) are much smaller than those for cases (E3) and (E4), indicating that parametric copula families fit (E1) and (E2) better than (E3) and (E4). Normal score plots of the continuous variable YY versus the latent Gaussian variable ZZ generated from the ordinal variable XX based on the steps in Section 3.1 are shown in Figure 1. It can be seen that the normal score plots of (E1) and (E2) have an approximate elliptical shape that can match some commonly used parametric copula families. In contrast, for (E3) and (E4), heteroscedasticity among different components of the mixture model leads to asymmetries and unusual shapes in the normal score plots. Simple parametric copula families are inadequate for the data generated in these two cases.

Refer to caption
Figure 1: Normal score plots of the continuous variable YY and the latent Gaussian variable ZZ generated from the ordinal variable XX for cases (E1) to (E4).

Diagnostic techniques in Section 3.2 are applied to assess the fits of the parametric copula families in Table 2 with the smallest KL divergence. The conditional Q-Q plots by category of the ordinal variable XX are shown in Figure 2. There are significant departures from the 4545^{\circ} diagonal line for the bivariate copula families fitted to cases (E3) and (E4), indicating inadequate fits. This aligns with the conclusions drawn from the normal score visualizations in Figure 1.

Refer to caption
Figure 2: The conditional Q-Q plots of FY|X(|x)F_{Y|X}(\cdot|x) for different ordinal categories xx assessing the fits of the parametric copula families for four simulated datasets based on (E1)–(E4) in Table 2 with 3 categories. Each row corresponds to the conditional Q-Q plots for one dataset.

With the empirical beta copula estimate in Section 5.2, the conditional Q-Q plots by category based on the ordinal variable XX are shown in Figure 3. It can be seen that the points in all Q-Q plots are closely aligned with the 4545^{\circ} diagonal line, indicating that the empirical beta copula estimate provides a much more adequate fit to the data generated in all of these four cases.

Refer to caption
Figure 3: The conditional Q-Q plots of FY|X(|x)F_{Y|X}(\cdot|x) for different ordinal categories xx assessing the fits of the empirical beta copulas for the four simulated datasets used in Figure 2.

7 Data Example

In this section, we illustrate the proposed visualization, estimation, and diagnostic techniques for a pair of mixed continuous-ordinal variables on the Auto MPG dataset (Quinlan, (1993), available at https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

This dataset contains the fuel consumption data of 398 cars from 1970 to 1982. The goal is to predict the mpg (miles per gallon as an indicator of fuel efficiency) of each car based on a set of explanatory variables. The variable cylinders can take five unique ordinal values: 3, 4, 5, 6, and 8. We merge category 3 with 4 and category 5 with 6 since only 4 and 3 observations have 3 and 5 cylinders. Some summaries of the important variables along with the transformations to achieve positive correlations with the response variable mpg are given in Table 3. The nominal variable origin indicates where the car is from (1: USA, 2: Europe, 3: Japan). Since mpg tends to increase as origin changes from USA to Europe to Japan, we treat origin as an ordinal variable for the vine structure selection and conditional Q-Q plots mentioned below.

Variable Type ρ\rho Range Transformation
cylinders ordinal -0.781 {4,6,8}\{4,6,8\} change sign, x=xx=-x
horsepower continuous -0.771 [46,230][46,230] change sign, x=xx=-x
weight continuous -0.832 [1613,5140][1613,5140] change sign, x=xx=-x
acceleration continuous 0.420 [8.0,24.8][8.0,24.8]
model year continuous 0.579 [70,82][70,82]
origin nominal 0.563 {1,2,3}\{1,2,3\}
mpg continuous [9.0,46.6][9.0,46.6]
Table 3: Summaries of the variables in the Auto MPG dataset, including their type, Spearman’s ρ\rho values with the response (mpg), range, and transformations to achieve positive correlation with mpg.

If the maximum spanning tree criterion based on absolute ρN\rho_{N} (correlation of normal scores, or polyserial/polychoric correlation) as in Chang and Joe, (2019) is used to select the vine structure for this dataset, the first tree of the resulting vine regression model would include the two ordinal variables cylinders and origin connected to weight on two edges for explanatory variables. The summary statistics of the distribution of weight (before sign change), the conditional distribution of weight given cylinders (before sign change), and the conditional distribution of weight given origin are shown in Table 4. Note that when fitting the vine copula model, the signs of weight and cylinders are changed so that the correlations between negative weight, negative cylinders, and origin are all positive. The conditional distributions of weight given cylinders have roughly equally spaced means and similar standard deviations; this is close to the setting of a mixture model with equally spaced locations and constant variance. Therefore, parametric copula families can fit this pair well. In contrast, the conditional distributions of weight given origin have unequally spaced means and very different standard deviations; this is similar to a mixture model with unequally spaced locations and non-constant variance. Therefore, it is more difficult for parametric copula families to provide good approximations to this second pair.

Subset nn Min Q1 Median Mean Q3 Max SD
None 198 1613 2224 2804 2970 3608 5140 847
cylinders=4\textit{cylinders}=4 208 1613 2049 2240 2310 2567 3270 345
cylinders=6\textit{cylinders}=6 87 2472 2938 3193 3195 3431 3907 332
cylinders=8\textit{cylinders}=8 103 3086 3799 4140 4115 4404 5140 449
origin=1\textit{origin}=1 249 1800 2720 3365 3362 4054 5140 795
origin=2\textit{origin}=2 70 1825 2067 2240 2423 2770 3820 490
origin=3\textit{origin}=3 79 1613 1985 2155 2221 2412 2930 320
Table 4: The summary statistics, including sample size (nn), quartiles, mean, and standard deviation (SD), of the distribution of weight, the conditional distributions of weight given cylinders, and the conditional distributions of weight given origin.

The best-fitting parametric copula family for the pair negative weight and negative cylinders is Gaussian (ρ=0.97\rho=0.97). Conditional Q-Q plots for weight by category of cylinders are show in the first row of Figure 4. For the pair of negative weight and origin, the large proportion of the origin=1\textit{origin}=1 category causes the best-fitting parametric copula families to have tail asymmetry with more dependence in the joint lower tail. The 1-parameter Clayton, 2-parameter BB1 and 2-parameter BB7 copulas have the largest (and approximately equal) log-likelihoods and similar conditional Q-Q plots. The conditional Q-Q plots of weight by category of origin are shown in the second row of Figure 4 with the Clayton copula. The two parametric copula families in this figure generally fit the data well. Some deviations from the 4545^{\circ} diagonal line can still be observed in the last plot for the weight and origin pair, but inference based on this copula should be acceptable. When the empirical beta copulas are fitted to these two pairs of variables, all conditional Q-Q plots show alignment with the 4545^{\circ} diagonal line, including the last plot.

Refer to caption
Figure 4: The conditional Q-Q plots of FY|X(|x)F_{Y|X}(\cdot|x) for Y=Y=weight by category of the ordinal variable X=X=cylinders (first row) and the ordinal variable X=X=origin (second row) assessing fits of the best-fitting 1-parameter and 2-parameter copula families.

8 Conclusion

Through two types of diagnostic plots and theoretical assessments using the Kullback-Leibler divergence, we show that simple parametric bivariate copula families with a few parameters can sometimes be inadequate for a pair of mixed continuous-ordinal variables. For such a pair, visualizations are proposed based on normal score plots using an appropriate latent continuous variable and conditional Q-Q plots of the continuous variable given the ordinal variable. Existing probability models for mixed continuous-ordinal variables are considered to assess the adequacy of fits of simple parametric copula families using the Kullback-Leibler divergence. When a pair of mixed continuous-ordinal variables is generated from mixture models of distributions with roughly equally spaced locations and constant scales or from conditional probit/logit models, simple parametric copula families can provide good fits. Otherwise, nonparametric counterparts can be fitted to provide better approximations. Applications to simulated and real datasets demonstrate the effectiveness of the proposed methods in identifying the lack of fit of simple parametric copula families and in improving the adequacy of fits with nonparametric copulas.

The results in this paper can be used to understand when and how some standard regression methods with ordinal and continuous explanatory variables can be approximated by the vine copula regression methodology as considered in Chang and Joe, (2019). Details will be provided in future research.

Acknowledgments

This research has been supported by the Four-Year Doctoral Fellowship of the University of British Columbia, NSERC Discovery Grant GR010293, and a Mercator Fellowship associated with Deutsche Forschungsgemeinschaft.

References

  • Chang and Joe, (2019) Chang, B. and Joe, H. (2019). Prediction based on conditional distributions of vine copulas. Computational Statistics & Data Analysis, 139:45–63.
  • Cox, (1972) Cox, D. R. (1972). The analysis of multivariate binary data. Applied Statistics, pages 113–120.
  • Genest and Nešlehová, (2007) Genest, C. and Nešlehová, J. (2007). A primer on copulas for count data. ASTIN Bulletin, 37(2):475–515.
  • Hua and Joe, (2011) Hua, L. and Joe, H. (2011). Tail order and intermediate tail dependence of multivariate copulas. Journal of Multivariate Analysis, 102(10):1454–1471.
  • Joe, (2014) Joe, H. (2014). Dependence Modeling with Copulas. Chapman & Hall/CRC, Boca Raton, FL.
  • Kadhem and Nikoloulopoulos, (2021) Kadhem, S. H. and Nikoloulopoulos, A. K. (2021). Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74:365–403.
  • Krzanowski, (1993) Krzanowski, W. (1993). The location model for mixtures of categorical and continuous variables. Journal of Classification, 10:25–49.
  • Little and Schluchter, (1985) Little, R. J. and Schluchter, M. D. (1985). Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika, 72(3):497–512.
  • Nagler, (2018) Nagler, T. (2018). kdecopula: An R package for the kernel estimation of bivariate copula densities. Journal of Statistical Software, 84:1–22.
  • Olsson et al., (1982) Olsson, U., Drasgow, F., and Dorans, N. J. (1982). The polyserial correlation coefficient. Psychometrika, 47(3):337–347.
  • Quinlan, (1993) Quinlan, J. R. (1993). Combining instance-based and model-based learning. In Proceedings of the Tenth International Conference on Machine Learning, pages 236–243.
  • Segers, (2012) Segers, J. (2012). Asymptotics of empirical copula processes under non-restrictive smoothness assumptions. Bernoulli, 18(3):764–782.
  • Segers et al., (2017) Segers, J., Sibuya, M., and Tsukahara, H. (2017). The empirical beta copula. Journal of Multivariate Analysis, 155:35–51.
  • Sklar, (1959) Sklar, A. (1959). Fonctions de répartition á nn dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris, 8:229–231.
  • Stöber et al., (2015) Stöber, J., Hong, H. G., Czado, C., and Ghosh, P. (2015). Comorbidity of chronic diseases in the elderly: Patterns identified by a copula design for mixed responses. Computational Statistics & Data Analysis, 88:28–39.
  • Yoshiba, (2018) Yoshiba, T. (2018). Maximum likelihood estimation of skew-t copulas with its applications to stock returns. Journal of Statistical Computation and Simulation, 88:2489–2506.