This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Robustness against outliers
in ordinal response model via divergence approach

Tomotaka Momozaki Tokyo University of Science Tomoyuki Nakagawa Tokyo University of Science
(Last update: )
Abstract

This study deals with the problem of outliers in ordinal response model, which is a regression on ordered categorical data as the response variable. “Outlier” means that the combination of ordered categorical data and its covariates is heterogeneous compared to other pairs. Although the ordinal response model is important for data analysis in various fields such as medicine and social sciences, it is known that the maximum likelihood method with probit, logit, log-log and complementary log-log link functions, which are often used, is strongly affected by outliers, and statistical analysts are forced to limit their analysis when there may be outliers in the data.

To solve this problem, this paper provides inference methods with two robust divergences (the density-power and γ\gamma-divergences). We also derive influence functions for the proposed methods and show conditions on the link function for them to be bounded and to redescendence. Since the commonly used link functions satisfy these conditions, the analyst can perform robust and flexible analysis with our methods. In addition, and this is a result that further highlights our contributions, we show that the influence function in the maximum likelihood method does not have redescendence for any link function in the ordinal response model. Through numerical experiments using artificial and two real data, we show that the proposed methods perform better than the maximum likelihood method with and without outliers in the data for various link functions.

Keywords: density-power divergence; γ\gamma-divergence; influence function; link function; ordered category; outlier; robust inference

Mathematics Subject Classification: Primary 62F35 ; Secondary 62J12

1 Introduction

Ordered categorical data has become popular in a wide range of fields such as medicine, sociology, psychology, political sciences, economics, marketing, and so on (Breiger,, 1981; Ashby et al.,, 1989; Uebersax,, 1999). Ordered categorical data are, for example, the progression of a disease expressed as stage 1, 2, 3, or 4, or opinions on a policy expressed as opposition, neutrality, or approval. In addition, when continuous data are summarized into categorical data, such as ages 0-20, 21-40, 41-60, 61-80, and 80 or more, the categorical data are ordered categorical data. For this reason, ordered categorical data are often considered to be discretized values of latent continuous variables.

There have been many studies on how to analyze ordered categorical data from half a century ago to the present (McCullagh,, 1980; Albert and Chib,, 1993; Tomizawa et al.,, 2006; Agresti,, 2010; Agresti and Kateri,, 2017; Baetschmann et al.,, 2020). This paper focuses on one of the most commonly used methods, the ordinal response model, that is a regression on ordered categorical data as a response variable. The ordinal response model is one of the frameworks of generalized linear models, and is called the ordinal regression model, or the cumulative link model since it connects the cumulative probability of belonging to a certain category to the covariates with the “link” function (McCullagh,, 1980). The link functions most often used in data analysis are the probit link, which is the distribution function of the standard normal distribution, the logit link, which is the distribution function of the standard logistic distribution, the log-log link, which is the distribution function of the right-skewed Gumbel distribution, and the complementary log-log link, which is the distribution function of the left-skewed log-Weibull distribution. The formulation of the ordinal response model and its details are described in Section 2.

Outliers can be caused by various reasons, such as typos in the values and misrecognition of units. Since it is well known that the maximum likelihood method is strongly affected by outliers, another statistical method that can appropriately deal with outliers is required nowadays. There have been many studies on how to deal with outliers, for example, the well-known inferences based on Huber-type loss and robust divergences (Hampel et al.,, 1986; Basu et al.,, 1998; Jones et al.,, 2001; Fujisawa and Eguchi,, 2008; Huber and Ronchetti,, 2009; Ghosh and Basu,, 2013; Maronna et al.,, 2019; Castilla et al.,, 2021), but most of them have focused on continuous, binary, or counted data, with few focusing on ordered categorical data.

Of course, the ordinal response model is not an exception to the model affected by outliers, although the response variable is discrete type data. The maximum likelihood method in the ordinal response model with the commonly used link functions, such as above mentioned is strongly affected by outliers. Although outliers in discrete data may be harder to imagine than in continuous data, an ”outlier” in the ordered response model is defined as a combination of ordered categorical data and its covariates that is heterogeneous relative to other pairs (Riani et al.,, 2011). This may be taken to mean that the combination of ordered categorical data and its covariates is inconsistent. Unless otherwise noted, we use the same definition of outliers in this paper.

There are various methods to check whether an inference method is robust against outliers, and often an influence function (Hampel,, 1974) may be used in linear regression. Simply put, the influence function is an index to check the “influence” of data on an inference method, and its value must be bounded at least to be robust against outliers. This is because if the value diverges for a given data, the result of the inference depends almost entirely on the data without regard to other data.

As mentioned earlier, there are few studies on outliers in the ordinal response model, but there are studies that derive the conditions for the inference method and link function such that the influence function is bounded in the ordinal response model as well. Croux et al., (2013) and Iannario et al., (2017) proposed the weighted maximum likelihood methods using the Student, 0/1, and Huber weights so that the influence function in the ordinal response model is bounded. However, these papers do not specify the algorithms to perform their inference methods, which makes it difficult for practitioners who wish to implement the robust ordinal response model. We also attempted to implement their inference methods, but found it extremely difficult to converge the numerical optimization algorithm to compute their estimators.

Scalera et al., (2021) derived a class of link functions for bounded influence functions in the maximum likelihood method in the ordinal response model. However, the class does not include probit, logit, log-log and complementary log-log link functions which are commonly used in the analysis. That is, analysts cannot perform robust and flexible modeling for ordered categorical data. We have also confirmed that misspecification of the link function causes a substantial bias in the parameter estimation, although this is not discussed in depth here. This is also the case in the framework of binary regression, which is discussed in Czado and Santner, (1992), and confirms that the flexibility in the choice of the link function is very important.

The boundedness of the influence function is an important property to ensure that the result of inference is not too much sensitive by only certain data, but it does not mean that data that are largely outliers compared to other data do not influence the result of inference at all. In other words, even if the influence function is bounded, there is still some influence from outliers. Therefore, it is desirable for the influence function to satisfy not only boundedness but also redescendence (Maronna et al.,, 2019), which means that the influence of large outliers on inference can be ignored. In this paper, we consider a class of link functions for the influence function to satisfy redescendence in the ordinal response model in the maximum likelihood method, and show that such a class does not exist (Theorem 3 in Section 4).

Since the influence function for the maximum likelihood method in the ordinal response model does not satisfy redescendence, it is necessary to consider another inference method. The weighted maximum likelihood methods of Croux et al., (2013) and Iannario et al., (2017) do not satisfy redescendence because the weights used are the Student, 0/1, and Huber types (Maronna et al.,, 2019), although the influence function is bounded. Thus, we propose inference methods in the ordinal response model with two robust divergences, the density-power (DP) (Basu et al.,, 1998) and γ\gamma-divergences (Jones et al.,, 2001; Fujisawa and Eguchi,, 2008), which are expected to satisfy the redescendence of their influence functions.

Although recently, Pyne et al., (2022) proposed an inference method in the ordinal response model using the DP divergence, they did not discuss the condition for link functions to satisfy boundedness and redescendence of the influence function in their method. Our contribution is not only to derive the condition for link functions satisfying boundedness and redescendence of the influence function in the DP divergence (Theorem 4 in Section 4), but also to propose an inference method in the ordinal response model using the γ\gamma-divergence, which is also commonly used for robust inference, and derive the condition of link functions to achieve robust inference (Theorem 5 in Section 4), and compare the performance between the DP and γ\gamma-divergences. The derivation of this condition provides a guideline for practitioners to decide which of the various link functions to use when conducting robust analysis against outliers using the ordinal response model. It will also help to answer the question of whether the DP or γ\gamma-divergences should be used. Of course, the algorithm of our proposed methods can be used by anyone since the programming code described using the R\mathrm{R} is available on the GitHub repository (https://github.com/t-momozaki/RORM).

The paper is organized as follows Section 2 introduces the ordinal response model and its maximum likelihood method. Section 3 introduces two robust divergences, the DP and γ\gamma-divergences, and proposes estimators based on these divergences. We also briefly show that the proposed estimators are robust against outliers. Section 4 derives the influence functions of the proposed methods and considers the robustness of the proposed methods in terms of the influence functions. Section 5 considers the asymptotic properties of the proposed estimators and the testing procedures. Section 6 conducts some numerical experiments using artificial and two real data for the proposed methods and evaluates the performance of the proposed methods. Section 7 provides the conclusion and some remarks.

2 Ordinal response model and its maximum likelihood method

In this section, we discuss the ordinal response model and the maximum likelihood method, a parameter estimation method commonly used in the model.

Consider the following latent variable model for MM ordered categorical data yiy_{i} (i=1,2,,ni=1,2,\ldots,n) as the response variable.

zi=𝒙i𝜷+εi,z_{i}=\bm{x}_{i}^{\top}\bm{\beta}+\varepsilon_{i}, (1)

for i=1,2,,ni=1,2,\ldots,n, where ziz_{i} is called the (continuous) latent variable,

yi={1(δ0<ziδ1)2(δ1<ziδ2)M(δM1<ziδM),y_{i}=\begin{cases}1&(\delta_{0}<z_{i}\leq\delta_{1})\\ 2&(\delta_{1}<z_{i}\leq\delta_{2})\\ \vdots\\ M&(\delta_{M-1}<z_{i}\leq\delta_{M})\end{cases},

𝒙i=(xi1,xi2,,xip)\bm{x}_{i}=(x_{i1},x_{i2},\ldots,x_{ip})^{\top}, 𝜷=(β1,β2,,βp)\bm{\beta}=(\beta_{1},\beta_{2},\ldots,\beta_{p})^{\top}, εii.i.d.G()\varepsilon_{i}\overset{i.i.d.}{\sim}G(\cdot) (known) and has the density function g()g(\cdot), and =δ0<δ1<<δM=-\infty=\delta_{0}<\delta_{1}<\cdots<\delta_{M}=\infty are the cutpoints. For the identification, in the G()G(\cdot) there is no error scale and the latent model has no the intercept term. Note that the observed data is (𝒚,𝑿)(\bm{y},\bm{X}), where 𝒚=(y1,y2,,yn)\bm{y}=(y_{1},y_{2},\ldots,y_{n})^{\top} and 𝑿=(𝒙1,𝒙2,,𝒙n)\bm{X}=(\bm{x}_{1},\bm{x}_{2},\ldots,\bm{x}_{n})^{\top}, and the latent variable 𝒛=(z1,z2,,zp)\bm{z}=(z_{1},z_{2},\ldots,z_{p})^{\top} is unobserved. The parameters to be estimated in this model are 𝜽=(𝜷,𝜹)\bm{\theta}=(\bm{\beta},\bm{\delta}), where 𝜹=(δ1,δ2,,δM1)\bm{\delta}=(\delta_{1},\delta_{2},\ldots,\delta_{M-1}).

The probability mass function of yiy_{i} given 𝒙i\bm{x}_{i} with 𝜽\bm{\theta} is

f(yi|𝒙i;𝜽)=m=1MPr(yi=m|𝒙i;𝜽)I(yi=m),f(y_{i}|\bm{x}_{i};\bm{\theta})=\prod_{m=1}^{M}\Pr(y_{i}=m|\bm{x}_{i};\bm{\theta})^{I(y_{i}=m)},

where the indicator function I()I(\cdot) and

Pr(yi=m|𝒙i;𝜽)\displaystyle\Pr(y_{i}=m|\bm{x}_{i};\bm{\theta}) =Pr(δm1<ziδm|𝒙i;𝜽)\displaystyle=\Pr(\delta_{m-1}<z_{i}\leq\delta_{m}|\bm{x}_{i};\bm{\theta})
=G(δm𝒙i𝜷)G(δm1𝒙i𝜷).\displaystyle=G(\delta_{m}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{i}^{\top}\bm{\beta}).

Namely, the likelihood function for the ordinal response model is expressed as

f(𝒚|𝑿;𝜽)=i=1n[G(δyi𝒙i𝜷)G(δyi1𝒙i𝜷)].f(\bm{y}|\bm{X};\bm{\theta})=\prod_{i=1}^{n}\left[G(\delta_{y_{i}}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{y_{i}-1}-\bm{x}_{i}^{\top}\bm{\beta})\right]. (2)

The maximum likelihood estimator (MLE) of 𝜽\bm{\theta} are obtained as 𝜽\bm{\theta} that minimizes the negative log-likelihood function. That is, noting the constraint δ1<δ2<<δM1\delta_{1}<\delta_{2}<\cdots<\delta_{M-1} the MLE is obtained by

𝜽^ML\displaystyle\hat{\bm{\theta}}_{ML} =argmin𝜽{logf(𝒚|𝑿;𝜽)}\displaystyle=\mathop{\rm arg~{}min}\limits_{\bm{\theta}}\left\{-\log f(\bm{y}|\bm{X};\bm{\theta})\right\}
=argmin𝜽{i=1nlog[G(δyi𝒙i𝜷)G(δyi1𝒙i𝜷)]}.\displaystyle=\mathop{\rm arg~{}min}\limits_{\bm{\theta}}\left\{-\sum_{i=1}^{n}\log\left[G(\delta_{y_{i}}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{y_{i}-1}-\bm{x}_{i}^{\top}\bm{\beta})\right]\right\}.

It is well known that the MLE 𝜽^ML\hat{\bm{\theta}}_{ML} is strongly affected by the outliers in the observations (𝒚,𝑿\bm{y},\bm{X}). This is because if |𝒙1𝜷||\bm{x}_{1}^{\top}\bm{\beta}| is large enough, i.e., if (y1,𝒙1)(y_{1},\bm{x}_{1}) is outliers compared to the other data, then log[G(δy1𝒙1𝜷)G(δy11𝒙1𝜷)]\log\left[G(\delta_{y_{1}}-\bm{x}_{1}^{\top}\bm{\beta})-G(\delta_{y_{1}-1}-\bm{x}_{1}^{\top}\bm{\beta})\right] takes a large value and the entire objective function is strongly affected by the outlier (y1,𝒙1)(y_{1},\bm{x}_{1}).

Remark (Optimization with the ordering constraint).

There exist various methods for optimization algorithms with the ordering constraint such as the cutpoints in the ordinal response model, including those using matrix notation (Christensen,, 2018), but in this paper we use the Franses and Paap, (2001)’s method, which reparameterize the cutpoints as follows.

δ1=δ~1,,δm=δ~1+j=2mδ~j2for m=2,3,,M1.\delta_{1}=\tilde{\delta}_{1},~{}~{},\delta_{m}=\tilde{\delta}_{1}+\sum_{j=2}^{m}\tilde{\delta}_{j}^{2}~{}~{}\mbox{for $m=2,3,\ldots,M-1$}.

Then the likelihood function (2) for the ordinal response model can be expressed as

f(𝒚|𝑿;𝜽~)=i=1n[G(δ~1+j=2yiδ~j2𝒙i𝜷)G(δ~1+j=2yiδ~j2𝒙i𝜷)],f(\bm{y}|\bm{X};\tilde{\bm{\theta}})=\prod_{i=1}^{n}\left[G\left(\tilde{\delta}_{1}+\sum_{j=2}^{y_{i}}\tilde{\delta}_{j}^{2}-\bm{x}_{i}^{\top}\bm{\beta}\right)-G\left(\tilde{\delta}_{1}+\sum_{j=2}^{y_{i}}\tilde{\delta}_{j}^{2}-\bm{x}_{i}^{\top}\bm{\beta}\right)\right],

where 𝜽~=(𝜷,𝜹~)\tilde{\bm{\theta}}=(\bm{\beta},\tilde{\bm{\delta}}) with 𝜹~=(δ~1,δ~2,,δ~M1)\tilde{\bm{\delta}}=(\tilde{\delta}_{1},\tilde{\delta}_{2},\ldots,\tilde{\delta}_{M-1}), and we can obtain the MLE of 𝜽~\tilde{\bm{\theta}} by

argmin𝜽~{logf(𝒚|𝑿;𝜽~)}.\displaystyle\mathop{\rm arg~{}min}\limits_{\tilde{\bm{\theta}}}\left\{-\log f(\bm{y}|\bm{X};\tilde{\bm{\theta}})\right\}.

This minimization problem can be easily solved by using the optim function in the stats library of the R programming language. All the following minimization problems will be solved using the above reparameterization, but to simplify the notation, we will use 𝜹\bm{\delta} before reparameterization throughout this paper.

3 Estimators via robust divergence for the ordinal response model

This section introduces two robust divergences, the DP and γ\gamma-divergences, proposes robust estimators for the ordinal response model using these divergences, and describes their properties against outliers. Section 3.1 describes the case using the DP divergence, and Section 3.2 describes the case using the γ\gamma-divergence.

3.1 Case: Density-Power divergence

Suppose that h(x,y)h(x,y), h(y|x)h(y|x), and h(x)h(x) are the underlying probability density (or mass) functions of (x,y)(x,y), yy given xx, and xx, respectively. Let f(y|x;θ)f(y|x;\theta) be the probability density (or mass) function of yy given xx with parameter θ\theta. This paper uses the definition of the DP cross entropy

dDP(h(y|x),f(y|x;θ);h(x))\displaystyle d_{DP}(h(y|x),f(y|x;\theta);h(x))
=1αf(y|x;θ)αh(x,y)𝑑x𝑑y+11+α(f(y|x;θ)1+α𝑑y)h(x)𝑑x\displaystyle=-\frac{1}{\alpha}\int\int f(y|x;\theta)^{\alpha}h(x,y)dxdy+\frac{1}{1+\alpha}\int\left(\int f(y|x;\theta)^{1+\alpha}dy\right)h(x)dx (3)

for α>0\alpha>0. Using the DP cross entropy, the DP divergence is defined as

DDP(h(y|x),f(y|x;θ);h(x))=dDP(h(y|x),h(y|x);h(x))+dDP(h(y|x),f(y|x;θ);h(x)).D_{DP}(h(y|x),f(y|x;\theta);h(x))=-d_{DP}(h(y|x),h(y|x);h(x))+d_{DP}(h(y|x),f(y|x;\theta);h(x)).

About the properties of the DP divergence, see Basu et al., (1998).

Using the DP divergence, the target parameter can be considered by

θDP\displaystyle\theta_{DP}^{*} =argminθDDP(h(y|x),f(y|x;θ);h(x))\displaystyle=\mathop{\rm arg~{}min}\limits_{\theta}D_{DP}(h(y|x),f(y|x;\theta);h(x))
=argminθdDP(h(y|x),f(y|x;θ);h(x)),\displaystyle=\mathop{\rm arg~{}min}\limits_{\theta}d_{DP}(h(y|x),f(y|x;\theta);h(x)),

and when h(y|x)=f(y|x;θ)h(y|x)=f(y|x;\theta^{*}), θDP=θ\theta_{DP}^{*}=\theta^{*}.

Suppose that the observed data (x1,y1),(x2,y2),,(xn,yn)(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{n},y_{n}) are randomly drawn from the underlying distribution h(x,y)h(x,y). Then, from the formulation (3), the DP cross entropy is empirically estimated by

d~DP(f(y|x;θ))=1α{1ni=1nf(yi|xi;θ)α}+11+α{1ni=1nf(y|xi;θ)1+α𝑑y}.\tilde{d}_{DP}(f(y|x;\theta))=-\frac{1}{\alpha}\left\{\frac{1}{n}\sum_{i=1}^{n}f(y_{i}|x_{i};\theta)^{\alpha}\right\}+\frac{1}{1+\alpha}\left\{\frac{1}{n}\sum_{i=1}^{n}\int f(y|x_{i};\theta)^{1+\alpha}dy\right\}.

Namely, the DP estimator is defined by

θ^DP=argminθd~DP(f(y|x;θ)).\hat{\theta}_{DP}=\mathop{\rm arg~{}min}\limits_{\theta}\tilde{d}_{DP}(f(y|x;\theta)).

Based on the above, we propose the following DP-estimator for the ordinal response model.

𝜽^DP=argmin𝜽d~DP(f(𝒚|𝑿;𝜽)),\hat{\bm{\theta}}_{DP}=\mathop{\rm arg~{}min}\limits_{\bm{\theta}}\tilde{d}_{DP}(f(\bm{y}|\bm{X};\bm{\theta})), (4)

where the empirically estimated DP cross entropy

d~DP(f(𝒚|𝑿;𝜽))=1α{1ni=1n[G(δyi𝒙i𝜷)G(δyi1𝒙i𝜷)]α}+11+α{1ni=1nm=1M[G(δm𝒙i𝜷)G(δm1𝒙i𝜷)]1+α}.\begin{split}\tilde{d}_{DP}(f(\bm{y}|\bm{X};\bm{\theta}))=&-\frac{1}{\alpha}\left\{\frac{1}{n}\sum_{i=1}^{n}\left[G(\delta_{y_{i}}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{y_{i}-1}-\bm{x}_{i}^{\top}\bm{\beta})\right]^{\alpha}\right\}\\ &+\frac{1}{1+\alpha}\left\{\frac{1}{n}\sum_{i=1}^{n}\sum_{m=1}^{M}\left[G(\delta_{m}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{i}^{\top}\bm{\beta})\right]^{1+\alpha}\right\}.\end{split} (5)

We briefly confirm that the proposed DP-estimator is robust against outliers. Suppose that |𝒙1𝜷||\bm{x}_{1}^{\top}\bm{\beta}| is large enough, i.e., (y1,𝒙1)(y_{1},\bm{x}_{1}) is outliers compared to the other data. Then, the empirically estimated DP cross entropy (5) is expressed as

d~DP(f(𝒚|𝑿;𝜽))\displaystyle\tilde{d}_{DP}(f(\bm{y}|\bm{X};\bm{\theta}))\approx 1α{1n1i=2n[G(δyi𝒙i𝜷)G(δyi1𝒙i𝜷)]α}\displaystyle-\frac{1}{\alpha}\left\{\frac{1}{n-1}\sum_{i=2}^{n}\left[G(\delta_{y_{i}}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{y_{i}-1}-\bm{x}_{i}^{\top}\bm{\beta})\right]^{\alpha}\right\}
+11+α{1n1i=2nm=1M[G(δm𝒙i𝜷)G(δm1𝒙i𝜷)]1+α}\displaystyle+\frac{1}{1+\alpha}\left\{\frac{1}{n-1}\sum_{i=2}^{n}\sum_{m=1}^{M}\left[G(\delta_{m}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{i}^{\top}\bm{\beta})\right]^{1+\alpha}\right\}

since G(δy1𝒙1𝜷)G(δy11𝒙1𝜷)G(\delta_{y_{1}}-\bm{x}_{1}^{\top}\bm{\beta})-G(\delta_{y_{1}-1}-\bm{x}_{1}^{\top}\bm{\beta}) takes a small value. Namely, the objective function d~DP(f(𝒚|𝑿;𝜽))\tilde{d}_{DP}(f(\bm{y}|\bm{X};\bm{\theta})) in (4) is expressed by removing the outliers (y1,𝒙1)(y_{1},\bm{x}_{1}), and as a result, the effect of the outliers can be ignored in the parameter estimation.

3.2 Case: γ\gamma-divergence

Suppose that h(x,y)h(x,y), h(y|x)h(y|x), and h(x)h(x) are the underlying probability density (or mass) functions of (x,y)(x,y), yy given xx, and xx, respectively. Let f(y|x;θ)f(y|x;\theta) be the probability density (or mass) function of yy given xx with parameter θ\theta. This paper uses the definition of the γ\gamma-cross entropy of Kawashima and Fujisawa, (2017) as follows.

dγ(h(y|x),f(y|x;θ);h(x))\displaystyle d_{\gamma}(h(y|x),f(y|x;\theta);h(x))
=1γlog(h(y|x)f(y|x;θ)γ𝑑y)h(x)𝑑x+11+γlog(f(y|x;θ)1+γ𝑑y)h(x)𝑑x\displaystyle=-\frac{1}{\gamma}\log\int\left(\int h(y|x)f(y|x;\theta)^{\gamma}dy\right)h(x)dx+\frac{1}{1+\gamma}\log\int\left(\int f(y|x;\theta)^{1+\gamma}dy\right)h(x)dx
=1γlogf(y|x;θ)γh(x,y)𝑑x𝑑y+11+γlog(f(y|x;θ)1+γ𝑑y)h(x)𝑑x\displaystyle=-\frac{1}{\gamma}\log\int\int f(y|x;\theta)^{\gamma}h(x,y)dxdy+\frac{1}{1+\gamma}\log\int\left(\int f(y|x;\theta)^{1+\gamma}dy\right)h(x)dx (6)

for γ>0\gamma>0. Using the γ\gamma-cross entropy, the γ\gamma-divergence is defined as follows.

Dγ(h(y|x),f(y|x;θ);h(x))=dγ(h(y|x),h(y|x);h(x))+dγ(h(y|x),f(y|x;θ);h(x)).D_{\gamma}(h(y|x),f(y|x;\theta);h(x))=-d_{\gamma}(h(y|x),h(y|x);h(x))+d_{\gamma}(h(y|x),f(y|x;\theta);h(x)).

About the properties of the γ\gamma-divergence, see Fujisawa and Eguchi, (2008) and Kawashima and Fujisawa, (2017).

Using the γ\gamma-divergence, the target parameter can be considered by

θγ\displaystyle\theta_{\gamma}^{*} =argminθDγ(h(y|x),f(y|x;θ);h(x))\displaystyle=\mathop{\rm arg~{}min}\limits_{\theta}D_{\gamma}(h(y|x),f(y|x;\theta);h(x))
=argminθdγ(h(y|x),f(y|x;θ);h(x)),\displaystyle=\mathop{\rm arg~{}min}\limits_{\theta}d_{\gamma}(h(y|x),f(y|x;\theta);h(x)),

and when h(y|x)=f(y|x;θ)h(y|x)=f(y|x;\theta^{*}), θγ=θ\theta_{\gamma}^{*}=\theta^{*}.

Suppose that the observed data (x1,y1),(x2,y2),,(xn,yn)(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{n},y_{n}) are randomly drawn from the underlying distribution h(x,y)h(x,y). Then, from the formulation (6), the γ\gamma-cross entropy is empirically estimated by

d~γ(f(y|x;θ))=1γlog{1ni=1nf(yi|xi;θ)γ}+11+γlog{1ni=1nf(y|xi;θ)1+γ𝑑y}.\tilde{d}_{\gamma}(f(y|x;\theta))=-\frac{1}{\gamma}\log\left\{\frac{1}{n}\sum_{i=1}^{n}f(y_{i}|x_{i};\theta)^{\gamma}\right\}+\frac{1}{1+\gamma}\log\left\{\frac{1}{n}\sum_{i=1}^{n}\int f(y|x_{i};\theta)^{1+\gamma}dy\right\}.

Namely, the γ\gamma-estimator is defined by

θ^γ=argminθd~γ(f(y|x;θ)).\hat{\theta}_{\gamma}=\mathop{\rm arg~{}min}\limits_{\theta}\tilde{d}_{\gamma}(f(y|x;\theta)). (7)

Based on the above, we propose the following γ\gamma-estimator for the ordinal response model.

𝜽^γ=argmin𝜽d~γ(f(𝒚|𝑿;𝜽)),\hat{\bm{\theta}}_{\gamma}=\mathop{\rm arg~{}min}\limits_{\bm{\theta}}\tilde{d}_{\gamma}(f(\bm{y}|\bm{X};\bm{\theta})), (8)

where the empirically estimated γ\gamma-cross entropy

d~γ(f(𝒚|𝑿;𝜽))=1γlog{1ni=1n[G(δyi𝒙i𝜷)G(δyi1𝒙i𝜷)]γ}+11+γlog{1ni=1nm=1M[G(δm𝒙i𝜷)G(δm1𝒙i𝜷)]1+γ}.\begin{split}\tilde{d}_{\gamma}(f(\bm{y}|\bm{X};\bm{\theta}))=&-\frac{1}{\gamma}\log\left\{\frac{1}{n}\sum_{i=1}^{n}\left[G(\delta_{y_{i}}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{y_{i}-1}-\bm{x}_{i}^{\top}\bm{\beta})\right]^{\gamma}\right\}\\ &+\frac{1}{1+\gamma}\log\left\{\frac{1}{n}\sum_{i=1}^{n}\sum_{m=1}^{M}\left[G(\delta_{m}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{i}^{\top}\bm{\beta})\right]^{1+\gamma}\right\}.\end{split} (9)

We briefly confirm that the proposed γ\gamma-estimator is robust against outliers. Suppose that |𝒙1𝜷||\bm{x}_{1}^{\top}\bm{\beta}| is large enough, i.e., (y1,𝒙1)(y_{1},\bm{x}_{1}) is outliers compared to the other data. Then, the empirically estimated γ\gamma-cross entropy (9) is expressed as

d~γ(f(𝒚|𝑿;𝜽))\displaystyle\tilde{d}_{\gamma}(f(\bm{y}|\bm{X};\bm{\theta}))\approx 1γlog{1n1i=2n[G(δyi𝒙i𝜷)G(δyi1𝒙i𝜷)]γ}\displaystyle-\frac{1}{\gamma}\log\left\{\frac{1}{n-1}\sum_{i=2}^{n}\left[G(\delta_{y_{i}}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{y_{i}-1}-\bm{x}_{i}^{\top}\bm{\beta})\right]^{\gamma}\right\}
+11+γlog{1n1i=2nm=1M[G(δm𝒙i𝜷)G(δm1𝒙i𝜷)]1+γ}\displaystyle+\frac{1}{1+\gamma}\log\left\{\frac{1}{n-1}\sum_{i=2}^{n}\sum_{m=1}^{M}\left[G(\delta_{m}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{i}^{\top}\bm{\beta})\right]^{1+\gamma}\right\}

since G(δy1𝒙1𝜷)G(δy11𝒙1𝜷)G(\delta_{y_{1}}-\bm{x}_{1}^{\top}\bm{\beta})-G(\delta_{y_{1}-1}-\bm{x}_{1}^{\top}\bm{\beta}) takes a small value. Namely, the objective function d~γ(f(𝒚|𝑿;𝜽))\tilde{d}_{\gamma}(f(\bm{y}|\bm{X};\bm{\theta})) in (8) is expressed by removing the outliers (y1,𝒙1)(y_{1},\bm{x}_{1}), and as a result, the effect of the outliers can be ignored in the parameter estimation.

4 Influence functions for the ordinal response model

The influence function is classically often used as a measure of robustness of an estimator, and it is required that the influence function is bounded for the observations (see, Hampel,, 1974). This is because the influence function represents the effect on the estimator when there are a few outliers in the observed values. Furthermore, it preferably has the redescendence, that is the property that large outliers have little effect on the parameter estimation (Maronna et al.,, 2019).

This section derives the respective influence functions in the ordinal response model using the DP and γ\gamma-divergences, and discusses the robustness of each proposed estimation method against outliers in terms of the influence functions. The influence functions are plotted and compared. Before that, we describe notations for the discussion and discuss the influence function in the maximum likelihood method.

Suppose that (y1,𝒙1),(y2,𝒙2),,(yn,𝒙n)(y_{1},\bm{x}_{1}),(y_{2},\bm{x}_{2}),\ldots,(y_{n},\bm{x}_{n}) are generated from the true distribution F(y,𝒙)F(y,\bm{x}), and consider the contaminated model Fρ(y,𝒙)=(1ρ)F(y,𝒙)+ρΔ(yo,𝒙o)(y,𝒙)F_{\rho}(y,\bm{x})=(1-\rho)F(y,\bm{x})+\rho\Delta_{(y_{o},\bm{x}_{o})}(y,\bm{x}), where ρ\rho is the contamination ratio and Δ(yo,𝒙o)(y,𝒙)\Delta_{(y_{o},\bm{x}_{o})}(y,\bm{x}) is the contaminating distribution degenerated at (yo,𝒙o)(y_{o},\bm{x}_{o}). The influence function in the ordinal response model using the maximum likelihood method is expressed as follows.

IFML(yo,𝒙o;F,𝜽)=(𝑴ML(𝜽,ψML))1ψML(yo|𝒙o;𝜽),IF_{ML}(y_{o},\bm{x}_{o};F,\bm{\theta})=\left(\bm{M}_{ML}(\bm{\theta},\psi_{ML})\right)^{-1}\psi_{ML}(y_{o}|\bm{x}_{o};\bm{\theta}),

where

ψML(yo|𝒙o;𝜽)=s(yo|𝒙o;𝜽)=logf(yo|𝒙o;𝜽)𝜽,\psi_{ML}(y_{o}|\bm{x}_{o};\bm{\theta})=s(y_{o}|\bm{x}_{o};\bm{\theta})=\frac{\partial\log f(y_{o}|\bm{x}_{o};\bm{\theta})}{\partial\bm{\theta}},
𝑴ML(𝜽,ψML)=n𝔼(y,𝒙)[𝜽ψML(y|𝒙;𝜽)]=n(𝜽),\bm{M}_{ML}(\bm{\theta},\psi_{ML})=-n\mathbb{E}_{(y,\bm{x})}\left[\frac{\partial}{\partial\bm{\theta}}\psi_{ML}(y|\bm{x};\bm{\theta})\right]=n\mathcal{I}(\bm{\theta}),

and (𝜽)\mathcal{I}(\bm{\theta}) is the fisher information matrix. Note that since 𝔼(y,𝒙)\mathbb{E}_{(y,\bm{x})} is the expected value for the distribution F(y,𝒙)F(y,\bm{x}), 𝑴ML(𝜽,ψML)\bm{M}_{ML}(\bm{\theta},\psi_{ML}) is constant and bounded with respect to (y,𝒙)(y,\bm{x}). That is, for the boundedness of the influence function IFML(yo,𝒙o;F,𝜽)IF_{ML}(y_{o},\bm{x}_{o};F,\bm{\theta}), the function ψML(yo|𝒙o;𝜽)\psi_{ML}(y_{o}|\bm{x}_{o};\bm{\theta}) must be bounded.

The influence function of the parameter βk\beta_{k} (k=1,2,,pk=1,2,\ldots,p) in the maximum likelihood method for the ordinal response model is expressed as follows.

IFML(yo,𝒙o;F,βk)ψML(yo|𝒙o;βk)=g(δyo𝒙o𝜷)g(δyo1𝒙o𝜷)G(δyo𝒙o𝜷)G(δyo1𝒙o𝜷)xok.IF_{ML}(y_{o},\bm{x}_{o};F,\beta_{k})\propto\psi_{ML}(y_{o}|\bm{x}_{o};\beta_{k})=-\frac{g(\delta_{y_{o}}-\bm{x}_{o}^{\top}\bm{\beta})-g(\delta_{y_{o}-1}-\bm{x}_{o}^{\top}\bm{\beta})}{G(\delta_{y_{o}}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{y_{o}-1}-\bm{x}_{o}^{\top}\bm{\beta})}x_{ok}. (10)

The influence function of the parameter δl\delta_{l} (l=1,2,,M1l=1,2,\ldots,M-1) in the maximum likelihood method for the ordinal response model is expressed as follows.

IFML(yo,𝒙o;F,δl)ψML(yo|𝒙o;δl)=g(δl𝒙o𝜷)[I(yo=l)I(yo=l+1)]G(δyo𝒙o𝜷)G(δyo1𝒙o𝜷).IF_{ML}(y_{o},\bm{x}_{o};F,\delta_{l})\propto\psi_{ML}(y_{o}|\bm{x}_{o};\delta_{l})=\frac{g(\delta_{l}-\bm{x}_{o}^{\top}\bm{\beta})\left[I(y_{o}=l)-I(y_{o}=l+1)\right]}{G(\delta_{y_{o}}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{y_{o}-1}-\bm{x}_{o}^{\top}\bm{\beta})}. (11)

In order for these influence functions (10) and (11) to be bounded in the maximum likelihood method, the conditions for the distribution GG of εi\varepsilon_{i} in the latent variable model (1) were derived by Scalera et al., (2021).

Theorem 1 (Proposition 3.2 of Scalera et al.,, 2021).

The influence function for 𝛃\bm{\beta} (10) is bounded if and only if

limu±|ulogg(u)u|<+,\lim_{u\to\pm\infty}\left|u\frac{\partial\log g(u)}{\partial u}\right|<+\infty, (12)

where the g()g(\cdot) is the density function of GG.

Theorem 2 (Proposition 3.1 of Scalera et al.,, 2021).

The influence function for 𝛅\bm{\delta} (11) is bounded if and only if

limu±|logg(u)u|<+,\lim_{u\to\pm\infty}\left|\frac{\partial\log g(u)}{\partial u}\right|<+\infty, (13)

where the g()g(\cdot) is the density function of GG.

The distributions GG that satisfy the conditions (12) and (13) include the Student’s tt-distribution where the degree of freedom ν\nu is sufficiently small (Scalera et al.,, 2021). When the distribution GG is the Student’s tt-distribution, since the influence function of βk\beta_{k} is ν1-\nu-1 for large outliers, the influence function of βk\beta_{k} is bounded for ν<+\nu<+\infty, however the parameter estimation is always affected by outliers constantly. Of course, the same is true for δl\delta_{l}.

Although Scalera et al., (2021) mention the boundedness of the influence functions in the maximum likelihood method, they do not mention redescendence. Hence, we further give the following theorem on the redescendence of the influence functions in the maximum likelihood method.

Theorem 3.

The influence functions for 𝛃\bm{\beta} (10) and 𝛅\bm{\delta} (11) in the maximum likelihood method for the ordinal response model are not redescendence.

The proof is given in Appendix A.1.

Remark (Redescendence of the influence function of the maximum likelihood method in the ordinal response model).

There are various studies on heavy-tailed distributions, and recently the log-regularly varying function (Desgagné,, 2015) whose order of tail is same as (ulogu)1(u\log u)^{-1}, such as the log-Pareto truncated normal distribution proposed by Gagnon et al., (2020) and the extremely heavily-tailed distribution proposed by Hamura et al., (2022) is well known. However, even if these distributions are applied to G()G(\cdot), the influence function of βk\beta_{k} for the maximum likelihood method in the ordinal response model do not have redescendence. It is also known that g(δl𝒙i𝜷)/g(δ𝒙i𝜷)g(\delta_{l}-\bm{x}_{i}^{\top}\bm{\beta})/g(\delta^{*}-\bm{x}_{i}^{\top}\bm{\beta}) is of constant order for regularly varying and log-regularly varying functions (O’Hagan and Pericchi,, 2012; Desgagné,, 2015). Therefore, when Student’s tt-distribution or the heavy tailed distributions described above is applied to the distribution G()G(\cdot), when yi=1y_{i}=1 and l=1l=1, or yi=My_{i}=M and l=M1l=M-1, the influence function of δl\delta_{l} is zero for large outliers.

4.1 Case: Density-Power divergence

In the similar manner as in the maximum likelihood method, the influence function using the DP divergence is expressed as follows.

IFDP(yo,𝒙o;F,𝜽)ψDP(yo|𝒙o;𝜽),IF_{DP}(y_{o},\bm{x}_{o};F,\bm{\theta})\propto\psi_{DP}(y_{o}|\bm{x}_{o};\bm{\theta}),

where

ψDP(yo|𝒙o;𝜽)=f(yo|𝒙o;𝜽)αs(yo|𝒙o;𝜽)𝔼(y|𝒙)[f(y|𝒙;𝜽)αs(y|𝒙;𝜽)]\psi_{DP}(y_{o}|\bm{x}_{o};\bm{\theta})=f(y_{o}|\bm{x}_{o};\bm{\theta})^{\alpha}s(y_{o}|\bm{x}_{o};\bm{\theta})-\mathbb{E}_{(y|\bm{x})}[f(y|\bm{x};\bm{\theta})^{\alpha}s(y|\bm{x};\bm{\theta})]

and 𝔼(y|𝒙)\mathbb{E}_{(y|\bm{x})} is the conditional expectation for the conditional distribution F(y|𝒙)F(y|\bm{x}). Thus, the influence functions of the parameters βk\beta_{k} (k=1,2,,pk=1,2,\ldots,p) and δl\delta_{l} (l=1,2,,M1l=1,2,\ldots,M-1) for the ordinal response model with the DP divergence are expressed by

IFDP(yo,𝒙o;F,βk)ψDP(yo|𝒙o;βk)=[g(δyo𝒙o𝜷)g(δyo1𝒙o𝜷)][G(δyo𝒙o𝜷)G(δyo1𝒙o𝜷)]α1xok+(m=1M[g(δm𝒙o𝜷)g(δm1𝒙o𝜷)][G(δm𝒙o𝜷)G(δm1𝒙o𝜷)]α)xok\begin{split}&IF_{DP}(y_{o},\bm{x}_{o};F,\beta_{k})\propto\psi_{DP}(y_{o}|\bm{x}_{o};\beta_{k})\\ =&-\left[g(\delta_{y_{o}}-\bm{x}_{o}^{\top}\bm{\beta})-g(\delta_{y_{o}-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]\left[G(\delta_{y_{o}}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{y_{o}-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\alpha-1}x_{ok}\\ &+\left(\sum_{m=1}^{M}\left[g(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})-g(\delta_{m-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]\left[G(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\alpha}\right)x_{ok}\end{split} (14)

and

IFDP(yo,𝒙o;F,δl)ψDP(yo|𝒙o;δl)=g(δl𝒙o𝜷)[G(δyo𝒙o𝜷)G(δyo1𝒙o𝜷)]α1[I(yo=l)I(yo=l+1)]g(δl𝒙o𝜷)×([G(δl𝒙o𝜷)G(δl1𝒙o𝜷)]α[G(δl+1𝒙o𝜷)G(δl𝒙o𝜷)]α),\begin{split}&IF_{DP}(y_{o},\bm{x}_{o};F,\delta_{l})\propto\psi_{DP}(y_{o}|\bm{x}_{o};\delta_{l})\\ =&g(\delta_{l}-\bm{x}_{o}^{\top}\bm{\beta})\left[G(\delta_{y_{o}}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{y_{o}-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\alpha-1}\left[I(y_{o}=l)-I(y_{o}=l+1)\right]\\ &-g(\delta_{l}-\bm{x}_{o}^{\top}\bm{\beta})\\ &\quad\times\left(\left[G(\delta_{l}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{l-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\alpha}-\left[G(\delta_{l+1}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{l}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\alpha}\right),\end{split} (15)

respectively.

Here, the following theorem holds for the influence functions (14) and (15) using the DP divergence.

Theorem 4.

If there exists a certain 0<α10<\alpha\leq 1 such that

limu±g(u)αu=0,\lim_{u\to\pm\infty}g(u)^{\alpha}u=0, (16)

then the influence functions (14) and (15) are bounded and redescendent.

The proof is given in Appendix A.2.

Theorem 4 ensures that the proposed method using the DP divergence is robust against outliers in terms of the influence function. That is, the estimator of the proposed method is hardly affected by the large outliers.

4.2 Case: γ\gamma-divergence

In the similar manner as in the maximum likelihood method, the influence function using the γ\gamma-divergence is expressed as follows.

IFγ(yo,𝒙o;F,𝜽)ψγ(yo|𝒙o;𝜽),IF_{\gamma}(y_{o},\bm{x}_{o};F,\bm{\theta})\propto\psi_{\gamma}(y_{o}|\bm{x}_{o};\bm{\theta}),

where

ψγ(yo|𝒙o;𝜽)=f(yo|𝒙o;𝜽)γs(yo|𝒙o;𝜽)𝔼(y|𝒙)[f(y|𝒙;𝜽)γ]f(yo|𝒙o;𝜽)γ𝔼(y|𝒙)[f(y|𝒙;𝜽)γs(y|𝒙;𝜽)].\displaystyle\psi_{\gamma}(y_{o}|\bm{x}_{o};\bm{\theta})=f(y_{o}|\bm{x}_{o};\bm{\theta})^{\gamma}s(y_{o}|\bm{x}_{o};\bm{\theta})\mathbb{E}_{(y|\bm{x})}[f(y|\bm{x};\bm{\theta})^{\gamma}]-f(y_{o}|\bm{x}_{o};\bm{\theta})^{\gamma}\mathbb{E}_{(y|\bm{x})}[f(y|\bm{x};\bm{\theta})^{\gamma}s(y|\bm{x};\bm{\theta})].

Thus, the influence functions of the parameters βk\beta_{k} (k=1,2,,pk=1,2,\ldots,p) and δl\delta_{l} (l=1,2,,M1l=1,2,\ldots,M-1) for the ordinal response model with the γ\gamma-divergence are expressed by

IFγ(yo,𝒙o;F,βk)ψγ(yo|𝒙o;βk)=(m=1M[G(δm𝒙o𝜷)G(δm1𝒙o𝜷)]γ+1)×[g(δyo𝒙o𝜷)g(δyo1𝒙o𝜷)][G(δyo𝒙o𝜷)G(δyo1𝒙o𝜷)]γ1xok+[G(δyo𝒙o𝜷)G(δyo1𝒙o𝜷)]γxok×(m=1M[g(δm𝒙o𝜷)g(δm1𝒙o𝜷)][G(δm𝒙o𝜷)G(δm1𝒙o𝜷)]γ)\begin{split}&IF_{\gamma}(y_{o},\bm{x}_{o};F,\beta_{k})\propto\psi_{\gamma}(y_{o}|\bm{x}_{o};\beta_{k})\\ =&-\left(\sum_{m=1}^{M}\left[G(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\gamma+1}\right)\\ &\quad\times\left[g(\delta_{y_{o}}-\bm{x}_{o}^{\top}\bm{\beta})-g(\delta_{y_{o}-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]\left[G(\delta_{y_{o}}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{y_{o}-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\gamma-1}x_{ok}\\ &+\left[G(\delta_{y_{o}}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{y_{o}-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\gamma}x_{ok}\\ &\quad\times\left(\sum_{m=1}^{M}\left[g(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})-g(\delta_{m-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]\left[G(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\gamma}\right)\end{split} (17)

and

IFγ(yo,𝒙o;F,δl)ψγ(yo|𝒙o;δl)=(m=1M[G(δm𝒙o𝜷)G(δm1𝒙o𝜷)]γ+1)×g(δl𝒙o𝜷)[G(δyo𝒙o𝜷)G(δyo1𝒙o𝜷)]γ1[I(yo=l)I(yo=l+1)]g(δl𝒙o𝜷)[G(δyo𝒙o𝜷)G(δyo1𝒙o𝜷)]γ×([G(δl𝒙o𝜷)G(δl1𝒙o𝜷)]γ[G(δl+1𝒙o𝜷)G(δl𝒙o𝜷)]γ),\begin{split}&IF_{\gamma}(y_{o},\bm{x}_{o};F,\delta_{l})\propto\psi_{\gamma}(y_{o}|\bm{x}_{o};\delta_{l})\\ =&\left(\sum_{m=1}^{M}\left[G(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\gamma+1}\right)\\ &\quad\times g(\delta_{l}-\bm{x}_{o}^{\top}\bm{\beta})\left[G(\delta_{y_{o}}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{y_{o}-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\gamma-1}\left[I(y_{o}=l)-I(y_{o}=l+1)\right]\\ &-g(\delta_{l}-\bm{x}_{o}^{\top}\bm{\beta})\left[G(\delta_{y_{o}}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{y_{o}-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\gamma}\\ &\quad\times\left(\left[G(\delta_{l}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{l-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\gamma}-\left[G(\delta_{l+1}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{l}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\gamma}\right),\end{split} (18)

respectively.

Here, the following theorem holds for the influence functions (17) and (18) using the γ\gamma-divergence.

Theorem 5.

If there exists a certain 0<γ10<\gamma\leq 1 such that

limu±g(u)γu=0,\lim_{u\to\pm\infty}g(u)^{\gamma}u=0, (19)

then the influence functions (17) and (18) are bounded and redescendent.

The proof of Theorem 5 is omitted since it can be done in the same way as the proof of Theorem 4.

Theorem 5 ensures that the proposed method using the γ\gamma-divergence is robust against outliers in terms of the influence function. That is, the estimator of the proposed method is hardly affected by the large outliers.

4.3 Numerical experience of influence functions

Sections 4.1 and 4.2 prove that the influence functions for the ordinal response model using the DP and γ\gamma-divergences are bounded and redescendent under the conditions (16) and (19). In this section, we plot and compare these influence functions. Note that when α=0\alpha=0 and γ=0\gamma=0, the methods with the DP and γ\gamma-divergences are the equivalent to the maximum likelihood method, respectively.

The following settings are considered in plotting the influence functions. The ordinal response yy generated by (1), where β=1.0\beta=1.0 (i.e., using one covariate) and 𝜹=(1.5,0.5,1.5)\bm{\delta}=(-1.5,0.5,1.5), has 4 categories. For the distribution GG of the error term in (1), the standard normal distribution and the standard logistic distribution are used.

Figure 1 plots the influence function (vertical axis) for each parameter against the covariate xx (horizontal axis) with the maximum likelihood (black), the DP divergence (blue), and the γ\gamma-divergence (red). The first and second rows of Figure 1 show the standard normal and standard logistic distributions for GG, respectively. The first and second columns are the influence function of β\beta and δ1\delta_{1}, respectively. The line type differs according to the tuning parameter in the divergences, “dash” for α=γ=0.3\alpha=\gamma=0.3, “dot” for α=γ=0.5\alpha=\gamma=0.5, “dot-dash” when α=γ=0.7\alpha=\gamma=0.7, and “long-dash” when α=γ=1.0\alpha=\gamma=1.0. In plotting the influence functions, we set y=1y=1.

Since conditions (16) and (19) are satisfied when distribution GG is the standard normal distribution or the standard logistic distribution, as can be seen from Figure 1, the influence functions with the DP and γ\gamma-divergences are both bounded and redescendence. On the other hand, since the standard normal distribution does not satisfy the conditions (12) and (13), the influence function in the maximum likelihood method is not bounded, as can be seen from Figure 1, and since the standard logistic distribution satisfies only the condition (13), the influence function of β\beta is not bounded and that of δ1\delta_{1} is bounded.

In addition, since the order of g(u)αug(u)^{\alpha}u (or g(u)γug(u)^{\gamma}u) is smaller when the distribution GG is lighter in conditions (16) and (19), from the proof of Theorems 4 and 5 (see Appendix A.2), the influence function converges to zero faster when GG is lighter. Thus, since the standard normal distribution is lighter than the standard logistic distribution, the first row of Figure 1 converges to zero faster than the second row.

For the influence functions with the DP and γ\gamma-divergences, when the tuning parameters α\alpha and γ\gamma are the same value, the influence function with the γ\gamma-divergence takes smaller values when xx is around 0, but when |x||x|\to\infty, the influence functions with the DP and γ\gamma-divergences take the same value.

Refer to caption
Figure 1: Plot of influence function for each parameter with the maximum likelihood (black), the DP divergence (blue) and the γ\gamma-divergence (red): the horizontal is the value of covariates, the vertical is the value of influence function: the first row for the distribution GG with standard normal distribution, the second row for standard logistic distribution: the first column is for the coefficient parameter β\beta, the second column is for the cutoff parameter δ1\delta_{1}

5 Asymptotic properties for proposed robust estimators and their test procedures

This section presents the asymptotic distributions of the proposed DP and γ\gamma-estimators and proposes the test procedures using them.

5.1 Case: Density-Power divergence

Under general regularity conditions (Huber and Ronchetti,, 2009), the proposed DP-estimator 𝜽^DP\hat{\bm{\theta}}_{DP} (4) is asymptotically a multivariate normal distribution, i.e.,

n(𝜽^DP𝜽)𝑑N(𝟎,𝑽DP(𝜽,ψDP)),\sqrt{n}(\hat{\bm{\theta}}_{DP}-\bm{\theta})\overset{d}{\to}N(\bm{0},\bm{V}_{DP}(\bm{\theta},\psi_{DP})), (20)

where the asymptotic covariance matrix is

𝑽DP(𝜽,ψDP)=(𝑴DP(𝜽,ψDP))1𝑸DP(𝜽,ψDP)(𝑴DP(𝜽,ψDP))1\bm{V}_{DP}(\bm{\theta},\psi_{DP})=\left(\bm{M}_{DP}(\bm{\theta},\psi_{DP})\right)^{-1}\bm{Q}_{DP}(\bm{\theta},\psi_{DP})\left(\bm{M}_{DP}(\bm{\theta},\psi_{DP})\right)^{-1}

with

𝑴DP(𝜽,ψDP)=𝔼(y,𝒙)[𝜽ψDP(y|𝒙;𝜽)],\bm{M}_{DP}(\bm{\theta},\psi_{DP})=-\mathbb{E}_{(y,\bm{x})}\left[\frac{\partial}{\partial\bm{\theta}}\psi_{DP}(y|\bm{x};\bm{\theta})\right],
ψDP(y|𝒙;𝜽)=f(y|𝒙;𝜽)αs(y|𝒙;𝜽)𝔼(y|𝒙)[f(y|𝒙;𝜽)αs(y|𝒙;𝜽)],\psi_{DP}(y|\bm{x};\bm{\theta})=f(y|\bm{x};\bm{\theta})^{\alpha}s(y|\bm{x};\bm{\theta})-\mathbb{E}_{(y|\bm{x})}[f(y|\bm{x};\bm{\theta})^{\alpha}s(y|\bm{x};\bm{\theta})],

and

𝑸DP(𝜽,ψDP)=𝔼(y,𝒙)[ψDP(y|𝒙;𝜽)ψDP(y|𝒙;𝜽)].\bm{Q}_{DP}(\bm{\theta},\psi_{DP})=\mathbb{E}_{(y,\bm{x})}\left[\psi_{DP}(y|\bm{x};\bm{\theta})\psi_{DP}(y|\bm{x};\bm{\theta})^{\top}\right].

Since the estimator 𝑽^DP(𝜽,ψDP)\hat{\bm{V}}_{DP}(\bm{\theta},\psi_{DP}) of the asymptotic covariance matrix 𝑽DP(𝜽,ψDP)\bm{V}_{DP}(\bm{\theta},\psi_{DP}) is

𝑽^DP(𝜽,ψDP)=(𝑴^DP(𝜽,ψDP))1𝑸^DP(𝜽,ψDP)(𝑴^DP(𝜽,ψDP))1,\hat{\bm{V}}_{DP}(\bm{\theta},\psi_{DP})=\left(\hat{\bm{M}}_{DP}(\bm{\theta},\psi_{DP})\right)^{-1}\hat{\bm{Q}}_{DP}(\bm{\theta},\psi_{DP})\left(\hat{\bm{M}}_{DP}(\bm{\theta},\psi_{DP})\right)^{-1},

where

𝑴^DP(𝜽,ψDP)=1ni=1n𝜽ψDP(yi|𝒙i;𝜽)|𝜽=𝜽^DP\hat{\bm{M}}_{DP}(\bm{\theta},\psi_{DP})=-\frac{1}{n}\sum_{i=1}^{n}\left.\frac{\partial}{\partial\bm{\theta}}\psi_{DP}(y_{i}|\bm{x}_{i};\bm{\theta})\right|_{\bm{\theta}=\hat{\bm{\theta}}_{DP}}

and

𝑸^DP(𝜽,ψDP)=1ni=1nψDP(yi|𝒙i;𝜽^DP)ψDP(yi|𝒙i;𝜽^DP),\hat{\bm{Q}}_{DP}(\bm{\theta},\psi_{DP})=\frac{1}{n}\sum_{i=1}^{n}\psi_{DP}(y_{i}|\bm{x}_{i};\hat{\bm{\theta}}_{DP})\psi_{DP}(y_{i}|\bm{x}_{i};\hat{\bm{\theta}}_{DP})^{\top},

and 𝑽^DP(𝜽,ψDP)\hat{\bm{V}}_{DP}(\bm{\theta},\psi_{DP}) is consistent for 𝑽DP(𝜽,ψDP)\bm{V}_{DP}(\bm{\theta},\psi_{DP}), then with the equation (20), n(𝜽^DP𝜽)\sqrt{n}(\hat{\bm{\theta}}_{DP}-\bm{\theta}) is asymptotically a multivariate normal distribution N(𝟎,𝑽^DP(𝜽,ψDP))N(\bm{0},\hat{\bm{V}}_{DP}(\bm{\theta},\psi_{DP})).

Hence under the null hypothesis H0k:βk=0H_{0}^{k}:\beta_{k}=0 for k=1,2,,pk=1,2,\ldots,p,

Tβk=β^kDPβkσ^DP2[βk]n𝑑N(0,1),T_{\beta_{k}}=\frac{\hat{\beta}_{k}^{DP}-\beta_{k}}{\sqrt{\frac{\hat{\sigma}_{DP}^{2}[\beta_{k}]}{n}}}\overset{d}{\to}N(0,1),

where σ^DP2[βk]\hat{\sigma}_{DP}^{2}[\beta_{k}] is the kk-th element on the diagonal of 𝑽^DP(𝜷,ψDP)\hat{\bm{V}}_{DP}(\bm{\beta},\psi_{DP}), 𝑽^DP(𝜷,ψDP)\hat{\bm{V}}_{DP}(\bm{\beta},\psi_{DP}) is the submatrix of 𝑽^DP(𝜽,ψDP)\hat{\bm{V}}_{DP}(\bm{\theta},\psi_{DP}) related to the coefficient parameters 𝜷\bm{\beta}, and β^kDP\hat{\beta}_{k}^{DP} is the kk-th element of the proposed DP-estimator 𝜷^DP\hat{\bm{\beta}}_{DP} of 𝜷\bm{\beta}.

5.2 Case: γ\gamma-divergence

Under general regularity conditions (Huber and Ronchetti,, 2009), the proposed γ\gamma-estimator 𝜽^γ\hat{\bm{\theta}}_{\gamma} (4) is asymptotically a multivariate normal distribution, i.e.,

n(𝜽^γ𝜽)𝑑N(𝟎,𝑽γ(𝜽,ψγ)),\sqrt{n}(\hat{\bm{\theta}}_{\gamma}-\bm{\theta})\overset{d}{\to}N(\bm{0},\bm{V}_{\gamma}(\bm{\theta},\psi_{\gamma})), (21)

where the asymptotic covariance matrix is

𝑽γ(𝜽,ψγ)=(𝑴γ(𝜽,ψγ))1𝑸γ(𝜽,ψγ)(𝑴γ(𝜽,ψγ))1\bm{V}_{\gamma}(\bm{\theta},\psi_{\gamma})=\left(\bm{M}_{\gamma}(\bm{\theta},\psi_{\gamma})\right)^{-1}\bm{Q}_{\gamma}(\bm{\theta},\psi_{\gamma})\left(\bm{M}_{\gamma}(\bm{\theta},\psi_{\gamma})\right)^{-1}

with

𝑴γ(𝜽,ψγ)=𝔼(y,𝒙)[𝜽ψγ(y|𝒙;𝜽)],\bm{M}_{\gamma}(\bm{\theta},\psi_{\gamma})=-\mathbb{E}_{(y,\bm{x})}\left[\frac{\partial}{\partial\bm{\theta}}\psi_{\gamma}(y|\bm{x};\bm{\theta})\right],
ψγ(y|𝒙;𝜽)=f(y|𝒙;𝜽)γs(y|𝒙;𝜽)𝔼(y|𝒙)[f(y|𝒙;𝜽)γ]f(y|𝒙;𝜽)γ𝔼(y|𝒙)[f(y|𝒙;𝜽)γs(y|𝒙;𝜽)],\psi_{\gamma}(y|\bm{x};\bm{\theta})=f(y|\bm{x};\bm{\theta})^{\gamma}s(y|\bm{x};\bm{\theta})\mathbb{E}_{(y|\bm{x})}[f(y|\bm{x};\bm{\theta})^{\gamma}]-f(y|\bm{x};\bm{\theta})^{\gamma}\mathbb{E}_{(y|\bm{x})}[f(y|\bm{x};\bm{\theta})^{\gamma}s(y|\bm{x};\bm{\theta})],

and

𝑸γ(𝜽,ψγ)=𝔼(y,𝒙)[ψγ(y|𝒙;𝜽)ψγ(y|𝒙;𝜽)].\bm{Q}_{\gamma}(\bm{\theta},\psi_{\gamma})=\mathbb{E}_{(y,\bm{x})}\left[\psi_{\gamma}(y|\bm{x};\bm{\theta})\psi_{\gamma}(y|\bm{x};\bm{\theta})^{\top}\right].

Since the estimator 𝑽^γ(𝜽,ψγ)\hat{\bm{V}}_{\gamma}(\bm{\theta},\psi_{\gamma}) of the asymptotic covariance matrix 𝑽γ(𝜽,ψγ)\bm{V}_{\gamma}(\bm{\theta},\psi_{\gamma}) is

𝑽^γ(𝜽,ψγ)=(𝑴^γ(𝜽,ψγ))1𝑸^γ(𝜽,ψγ)(𝑴^γ(𝜽,ψγ))1,\hat{\bm{V}}_{\gamma}(\bm{\theta},\psi_{\gamma})=\left(\hat{\bm{M}}_{\gamma}(\bm{\theta},\psi_{\gamma})\right)^{-1}\hat{\bm{Q}}_{\gamma}(\bm{\theta},\psi_{\gamma})\left(\hat{\bm{M}}_{\gamma}(\bm{\theta},\psi_{\gamma})\right)^{-1},

where

𝑴^γ(𝜽,ψγ)=1ni=1n𝜽ψγ(yi|𝒙i;𝜽)|𝜽=𝜽^γ\hat{\bm{M}}_{\gamma}(\bm{\theta},\psi_{\gamma})=-\frac{1}{n}\sum_{i=1}^{n}\left.\frac{\partial}{\partial\bm{\theta}}\psi_{\gamma}(y_{i}|\bm{x}_{i};\bm{\theta})\right|_{\bm{\theta}=\hat{\bm{\theta}}_{\gamma}}

and

𝑸^γ(𝜽,ψγ)=1ni=1nψγ(yi|𝒙i;𝜽^γ)ψγ(yi|𝒙i;𝜽^γ),\hat{\bm{Q}}_{\gamma}(\bm{\theta},\psi_{\gamma})=\frac{1}{n}\sum_{i=1}^{n}\psi_{\gamma}(y_{i}|\bm{x}_{i};\hat{\bm{\theta}}_{\gamma})\psi_{\gamma}(y_{i}|\bm{x}_{i};\hat{\bm{\theta}}_{\gamma})^{\top},

and 𝑽^γ(𝜽,ψγ)\hat{\bm{V}}_{\gamma}(\bm{\theta},\psi_{\gamma}) is consistent for 𝑽γ(𝜽,ψγ)\bm{V}_{\gamma}(\bm{\theta},\psi_{\gamma}), then with the equation (21), n(𝜽^γ𝜽)\sqrt{n}(\hat{\bm{\theta}}_{\gamma}-\bm{\theta}) is asymptotically a multivariate normal distribution N(𝟎,𝑽^γ(𝜽,ψγ))N(\bm{0},\hat{\bm{V}}_{\gamma}(\bm{\theta},\psi_{\gamma})).

Hence under the null hypothesis H0k:βk=0H_{0}^{k}:\beta_{k}=0 for k=1,2,,pk=1,2,\ldots,p,

Tβk=β^kγβkσ^γ2[βk]n𝑑N(0,1),T_{\beta_{k}}=\frac{\hat{\beta}_{k}^{\gamma}-\beta_{k}}{\sqrt{\frac{\hat{\sigma}_{\gamma}^{2}[\beta_{k}]}{n}}}\overset{d}{\to}N(0,1),

where σ^γ2[βk]\hat{\sigma}_{\gamma}^{2}[\beta_{k}] is the kk-th element on the diagonal of 𝑽^γ(𝜷,ψγ)\hat{\bm{V}}_{\gamma}(\bm{\beta},\psi_{\gamma}), 𝑽^γ(𝜷,ψγ)\hat{\bm{V}}_{\gamma}(\bm{\beta},\psi_{\gamma}) is the submatrix of 𝑽^γ(𝜽,ψγ)\hat{\bm{V}}_{\gamma}(\bm{\theta},\psi_{\gamma}) related to the coefficient parameters 𝜷\bm{\beta}, and β^kγ\hat{\beta}_{k}^{\gamma} is the kk-th element of the proposed γ\gamma-estimator 𝜷^γ\hat{\bm{\beta}}_{\gamma} of 𝜷\bm{\beta}.

6 Numerical Experiments

This section shows the performance of the proposed robust ordinal response model with the DP and γ\gamma-divergences by means of numerical experiments with artificially generated data and two real data.

6.1 Simulation study

This section performs some simulation studies of the proposed methods, referring to Section 5 of Scalera et al., (2021). We consider the following latent variable model (1) for ordered categorical data yy with five categories (i.e., M=5M=5) as the response variable.

z=xβ1+dβ2+xdβ3+ε,z=x\beta_{1}+d\beta_{2}+xd\beta_{3}+\varepsilon,

where xN(0,1)x\sim N(0,1), dBernoulli(0.25)d\sim{\rm Bernoulli}(0.25), xx and dd are mutually independent. The random variable ε\varepsilon has standard normal, logistic, and Gumbel distributions, and in this numerical experiment we use the link function suitable for the each error distribution. That is, probit link for normal distribution, logit link for logistic distribution, and log-log link for Gumbel distribution are used. This is because the misspecification of the link function causes a substantial bias in the parameter estimation, apart from the effect of outliers on the inference. The development of an inference method that is robust to both outliers and the misspecification of link functions is the future work. The true values of the regression coefficients 𝜷\bm{\beta} and cutoff 𝜹\bm{\delta} parameters are set as follows according to the distribution ε\varepsilon has.

  • When ε\varepsilon has the normal distribution:

    (β1,β2,β3)=(2.5,1.2,0.7)and(δ1,δ2,δ3,δ4)=(3.0,0.7,1.6,3.9).(\beta_{1},\beta_{2},\beta_{3})=(2.5,1.2,0.7)~{}~{}\mbox{and}~{}~{}(\delta_{1},\delta_{2},\delta_{3},\delta_{4})=(-3.0,-0.7,1.6,3.9).
  • When ε\varepsilon has the logistic distribution:

    (β1,β2,β3)=(2.5,1.2,0.7)and(δ1,δ2,δ3,δ4)=(3.3,0.8,1.7,4.2).(\beta_{1},\beta_{2},\beta_{3})=(2.5,1.2,0.7)~{}~{}\mbox{and}~{}~{}(\delta_{1},\delta_{2},\delta_{3},\delta_{4})=(-3.3,-0.8,1.7,4.2).
  • When ε\varepsilon has the Gumbel distribution:

    (β1,β2,β3)=(2.5,1.2,0.7)and(δ1,δ2,δ3,δ4)=(2.9,1.0,2.9,4.8).(\beta_{1},\beta_{2},\beta_{3})=(2.5,1.2,0.7)~{}~{}\mbox{and}~{}~{}(\delta_{1},\delta_{2},\delta_{3},\delta_{4})=(-2.9,1.0,2.9,4.8).

We generate 200 observations with the above settings, and then generate outliers in the covariate xix_{i} for 5 and 10 percent of the total observations (i.e., 10 and 20 outliers) from N(20,1)N(20,1) and replace them with xix_{i} at random.

Using the link functions corresponding to the error distributions of ε\varepsilon, we compare the maximum likelihood method with the proposed methods using the DP and γ\gamma-divergences, using bias, mean squared error (MSE), and percentage of correctly classified responses (CCR):

Biasj=1Ss=1Sθ^jsθj,MSEj=1Ss=1S(θ^jsθj)2,CCR=1Ss=1SI(y^s=ys),\displaystyle{\rm Bias}_{j}=\frac{1}{S}\sum_{s=1}^{S}\hat{\theta}_{js}-\theta_{j},~{}~{}{\rm MSE}_{j}=\frac{1}{S}\sum_{s=1}^{S}(\hat{\theta}_{js}-\theta_{j})^{2},~{}~{}{\rm CCR}=\frac{1}{S}\sum_{s=1}^{S}I(\hat{y}_{s}=y_{s}^{*}),

where SS is the number of iterations (S=1000S=1000 in this simulation), jj is the index for each parameter, θ^js\hat{\theta}_{js} is the estimated value of the parameter at the ssth iteration, y^s\hat{y}_{s} is the predicted ordinal response using θ^js\hat{\theta}_{js} and the newly generated non-outlier covariates xx and dd at the ssth iteration, ysy_{s}^{*} is the validation data at the ssth iteration, and I(A)I(A) is an indicator function that returns 1 when AA is true and 0 otherwise.

The maximum likelihood method using the cauchit link, which is the distribution function of the Cauchy distribution and the link function whose influence function is bounded in shown by Scalera et al., (2021), is also included in the comparison. This is because as mentioned earlier, the misspecification of the link function certainly causes a substantial bias in the parameter estimation, but as shown in our simulation results to be presented later, it seems to perform better in the prediction than using a link function whose influence function is not bounded.

Tables 1 to 3 summarize the simulation results for the settings described above. Tables 1, 2, and 3 show the results when ε\varepsilon has the normal distribution, logistic distribution, and Gumbel distribution, respectively. When the outlier ratio is 0%\%, the maximum likelihood methods with suitable links minimize bias and MSE and have high CCR values in most cases, no matter which distribution ε\varepsilon follows. Notably, however, the proposed methods with the DP and γ\gamma-divergences also perform as well as the maximum likelihood methods. Since it is difficult to determine whether there are outliers or not in most cases in real data, it is a very desirable property that the estimation accuracy is equivalent to that of the maximum likelihood method even when there are no outliers in reality. When the cauchit link is used in the maximum likelihood method, the bias and MSE are still large due to the misspecification of the link function, but CCR is not that much worse than other methods.

When the data contain outliers, this simulation suggests that the maximum likelihood methods are considerably affected by the outliers. In addition, the maximum likelihood method with the cauchit link misspecifies the link function, resulting in considerably larger values of bias and MSE. On the other hand, as Scalera et al., (2021) also mentioned, since the influence function in the maximum likelihood method with the cauchit link is bounded, the prediction accuracy in terms of CCR values is better than that of the maximum likelihood method with correctly specified links. However, it can be seen from Tables 1 to 3 that our proposed methods are more accurate in terms of bias, MSE, and CCR. It is particularly noteworthy that the maximum likelihood method with the cauchit link gives considerably worse CCR values when the outlier ratio increases, while our proposed methods show almost no decrease in the CCR values. This may be because, as shown in Section 4, the proposed methods whose influence functions satisfy not only boundedness but also redescendence are more robust than the maximum likelihood method with the cauchit link.

Comparing the methods using the DP and γ\gamma-divergences, it is found that there is little difference in all the values of the bias, MSE, and CCR. However, although the difference is really small, the values of the bias and MSE are smaller for the method using the DP divergence, while the value of the CCR is larger for the method using γ\gamma-divergences. This tendency may slightly change depending on the selection of the tuning parameters, α\alpha and γ\gamma, as described in Remark below, but our empirical results are generally good when the values of α\alpha and γ\gamma are set at around 0.30.50.3\sim 0.5.

Next, we conduct a numerical experiment that is almost the same as the data generation process described above, but this time the outlier ratio in the covariate xix_{i} is fixed at 5% and the mean of the normal distribution of outliers is varied. Tables 4, 5, and 6 show the results for the mean parameter values of 5, 10, and 20 with the probit, logit, and log-log link functions. When the value of the mean parameter is set to 20, the settings are the same as those in the middle panels of Tables 1 to 3, and hence these values are reproduced in the tables. The results in Tables 4 to 6 are consistent with the results of the theorems on the influence function in Section 4. Since the maximum likelihood method with the probit, logit, and log-log links does not satisfy boundedness, the accuracy of inference on the bias, MSE, and CCR becomes worse for larger values of outliers. In contrast, the influence function of our method with the DP and γ\gamma-divergences satisfies both boundedness and redescendence, and the inference accuracy is robust to increasing values of outliers. Moreover, since our method can ignore the influence of outliers more as the value of outliers increases, the inference accuracy is better for most of the indices when the value of the mean parameter of the contaminated normal distribution is 20 than for the other cases.

Remark (Selection of tuning parameters, α\alpha and γ\gamma).

The DP and γ\gamma-divergences use tuning parameters α\alpha and γ\gamma, respectively, to control the robustness against outliers. Basu et al., (1998) mentioned that if the value of tuning parameter is smaller than necessary, the effect of outliers on the estimators obtained by using these divergences cannot be adequately removed, on the contrary, if a larger tuning parameter value is used than necessary, the statistical efficiency of the estimators will be lost instead. Namely, there is a trade-off between robustness and efficiency. Therefore, we would like to lightly discuss how to choose tuning parameters.

A simple and well-known method for selecting tuning parameters is to use the asymptotic relative efficiency, which is calculated using the asymptotic variance of the estimator in the robust divergence-based method and the asymptotic variance of the estimator in the maximum likelihood method. Another well-known method is the method using cross-validation, but it is not straightforward when outliers exist in the data, so we have to find appropriate values of tuning parameters through trial and error. Recently, Sugasawa and Yonekura, (2021) improved the method proposed by Warwick and Jones, (2005) and Basak et al., (2021) for selecting tuning parameters based on the asymptotic approximation of the mean square error in a simple normal model and linear regression, and proposed a method for selecting tuning parameters based on the asymptotic approximation of the Hyvärinen score (Shao et al.,, 2019; Dawid and Musio,, 2015) using unnormalized models based on robust divergence. The extension of their method to the ordinal response model will be the subject of future work.

Table 1: Bias, mean square error (MSE), and percentage of correctly classified responses (CCR) using the maximum likelihood, DP and γ\gamma-divergences methods in the ordinal response model with the probit link, and maximum likelihood method with the cauchit link. The outlier ratios are 0, 5, and 10%. Bold, *, and ** denote the first, second, and third best performance, respectively.
Bias MSE CCR
tuning β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4} β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4}
Outlier Ratio 0%
ML 0.0767{\bf 0.0767} 0.0350{\bf 0.0350} 0.0435{\bf 0.0435} 0.0916{\bf 0.0916} 0.0216{\bf 0.0216} 0.0457{\bf 0.0457} 0.1219{\bf 0.1219} 0.0596{\bf 0.0596} 0.0597{\bf 0.0597} 0.0993{\bf 0.0993} 0.0996{\bf 0.0996} 0.0276{\bf 0.0276} 0.0450{\bf 0.0450} 0.1552{\bf 0.1552} 0.68540.6854
ML ++ Cauchit 3.82573.8257 1.83621.8362 1.14071.1407 4.81494.8149 1.07241.0724 2.43432.4343 6.22136.2213 15.608915.6089 3.98513.9851 2.16282.1628 24.737724.7377 1.40261.4026 6.47456.4745 41.211041.2110 0.68270.6827
DP 0.30.3 0.08250.0825* 0.03730.0373* 0.04390.0439* 0.09900.0990* 0.02370.0237* 0.04950.0495* 0.13010.1301* 0.06820.0682* 0.06580.0658* 0.10720.1072* 0.11370.1137* 0.03010.0301* 0.04960.0496* 0.17740.1774* 0.68590.6859**
DP 0.50.5 0.10180.1018 0.04680.0468 0.05180.0518 0.12180.1218 0.02880.0288 0.06120.0612 0.16030.1603 0.08480.0848 0.07590.0759 0.12100.1210 0.13880.1388 0.03360.0336 0.05790.0579 0.21870.2187 0.6862{\bf 0.6862}
γ\gamma 0.30.3 0.08290.0829** 0.03750.0375** 0.04400.0440** 0.09950.0995** 0.02380.0238** 0.04980.0498** 0.13090.1309** 0.06870.0687** 0.06590.0659** 0.10730.1073** 0.11440.1144** 0.03020.0302** 0.04980.0498** 0.17880.1788** 0.68590.6859**
γ\gamma 0.50.5 0.10450.1045 0.04810.0481 0.05260.0526 0.12480.1248 0.02960.0296 0.06280.0628 0.16430.1643 0.08770.0877 0.07670.0767 0.12170.1217 0.14280.1428 0.03400.0340 0.05910.0591 0.22560.2256 0.6862{\bf 0.6862}
Outlier Ratio 5%
ML 2.46942.4694 0.60270.6027 0.92020.9202 1.76941.7694 0.43880.4388 0.89500.8950 2.16822.1682 6.09836.0983 0.41110.4111 0.92920.9292 3.15093.1509 0.20430.2043 0.81520.8152 4.73884.7388 0.41620.4162
ML ++ Cauchit 0.10150.1015 0.97560.9756 2.26202.2620 1.52321.5232 0.19720.1972 0.59180.5918 3.21113.2111 3.77083.7708 6.70946.7094 24.791324.7913 4.22714.2271 0.26950.2695 1.23851.2385 44.835044.8350 0.57790.5779
DP 0.30.3 0.01800.0180* 0.0008{\bf 0.0008} 0.0277{\bf 0.0277} 0.00390.0039* 0.0003{\bf 0.0003} 0.00690.0069* 0.0036{\bf 0.0036} 0.0538{\bf 0.0538} 0.0645{\bf 0.0645} 0.1004{\bf 0.1004} 0.0948{\bf 0.0948} 0.0284{\bf 0.0284} 0.0417{\bf 0.0417} 0.1395{\bf 0.1395} 0.68290.6829
DP 0.50.5 0.0086{\bf 0.0086} 0.00560.0056* 0.03060.0306* 0.0035{\bf 0.0035} 0.00390.0039* 0.0041{\bf 0.0041} 0.01750.0175* 0.06370.0637** 0.07220.0722** 0.11140.1114** 0.10940.1094** 0.03130.0313** 0.04690.0469** 0.16400.1640** 0.6834{\bf 0.6834}
γ\gamma 0.30.3 0.02070.0207** 0.01960.0196** 0.03920.0392** 0.04100.0410** 0.01100.0110** 0.01780.0178** 0.06250.0625** 0.05660.0566* 0.06730.0673* 0.10470.1047* 0.10140.1014* 0.02970.0297* 0.04400.0440* 0.15080.1508* 0.68320.6832*
γ\gamma 0.50.5 0.08020.0802 0.04880.0488 0.05740.0574 0.10590.1059 0.02860.0286 0.05230.0523 0.15240.1524 0.07940.0794 0.08160.0816 0.12390.1239 0.13640.1364 0.03560.0356 0.05580.0558 0.21120.2112 0.68300.6830**
Outlier Ratio 10%
ML 2.48632.4863 0.60910.6091 0.92940.9294 1.77791.7779 0.43620.4362 0.90580.9058 2.18112.1811 6.18176.1817 0.41160.4116 0.93080.9308 3.18103.1810 0.20290.2029 0.83540.8354 4.79764.7976 0.42110.4211
ML ++ Cauchit 2.30532.3053 0.07830.0783 2.52912.5291 0.0143{\bf 0.0143} 0.28490.2849 0.43740.4374 0.62780.6278 5.88315.8831 0.71220.7122 8.14798.1479 0.58870.5887 0.13660.1366 0.35130.3513 3.75153.7515 0.43410.4341
DP 0.30.3 0.0574{\bf 0.0574} 0.0326{\bf 0.0326} 0.0346{\bf 0.0346} 0.07340.0734* 0.0179{\bf 0.0179} 0.0359{\bf 0.0359} 0.0960{\bf 0.0960} 0.0701{\bf 0.0701} 0.0710{\bf 0.0710} 0.1136{\bf 0.1136} 0.1163{\bf 0.1163} 0.0323{\bf 0.0323} 0.0508{\bf 0.0508} 0.1778{\bf 0.1778} 0.6805{\bf 0.6805}
DP 0.50.5 0.07440.0744* 0.04200.0420* 0.04230.0423* 0.09190.0919** 0.02290.0229* 0.04590.0459* 0.12280.1228* 0.08580.0858* 0.08150.0815** 0.12770.1277** 0.13840.1384* 0.03580.0358* 0.05870.0587* 0.21630.2163* 0.67940.6794
γ\gamma 0.30.3 0.13750.1375** 0.07140.0714** 0.05800.0580** 0.16620.1662 0.04010.0401** 0.08690.0869** 0.21770.2177** 0.09200.0920** 0.08030.0803* 0.12370.1237* 0.14950.1495** 0.03640.0364** 0.06170.0617** 0.23290.2329** 0.68030.6803*
γ\gamma 0.50.5 0.26010.2601 0.13220.1322 0.09780.0978 0.30690.3069 0.07440.0744 0.16410.1641 0.40530.4053 0.17040.1704 0.11340.1134 0.15960.1596 0.26120.2612 0.04890.0489 0.09840.0984 0.42440.4244 0.67950.6795**
Table 2: Bias, mean square error (MSE), and percentage of correctly classified responses (CCR) using the maximum likelihood, DP and γ\gamma-divergences methods in the ordinal response model with the logit link, and maximum likelihood method with the cauchit link. The outlier ratios are 0, 5, and 10%. Bold, *, and ** denote the first, second, and third best performance, respectively.
Bias MSE CCR
tuning β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4} β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4}
Outlier Ratio 0%
ML 0.0578{\bf 0.0578} 0.0310{\bf 0.0310} 0.0390{\bf 0.0390} 0.0749{\bf 0.0749} 0.0125{\bf 0.0125} 0.0372{\bf 0.0372} 0.0976{\bf 0.0976} 0.0695{\bf 0.0695} 0.1263{\bf 0.1263} 0.1836{\bf 0.1836} 0.1311{\bf 0.1311} 0.0501{\bf 0.0501} 0.0654{\bf 0.0654} 0.1791{\bf 0.1791} 0.54410.5441
ML ++ Cauchit 0.45400.4540 0.23020.2302 0.17980.1798 0.81280.8128 0.13930.1393 0.29390.2939 1.02951.0295 0.38460.3846 0.28050.2805 0.35000.3500 1.00781.0078 0.10100.1010 0.21580.2158 1.57141.5714 0.53970.5397
DP 0.30.3 0.07130.0713* 0.03640.0364* 0.04140.0414* 0.09060.0906* 0.01640.0164* 0.04450.0445* 0.11910.1191* 0.07940.0794* 0.13660.1366* 0.19630.1963* 0.14620.1462* 0.05260.0526* 0.07050.0705* 0.20430.2043* 0.54500.5450
DP 0.50.5 0.08870.0887 0.04470.0447 0.04860.0486 0.11150.1115 0.02120.0212 0.05460.0546 0.14740.1474 0.09350.0935 0.15080.1508 0.21510.2151 0.16890.1689 0.05650.0565 0.07810.0781 0.24140.2414 0.5455{\bf 0.5455}
γ\gamma 0.30.3 0.07180.0718** 0.03670.0367** 0.04160.0416** 0.09120.0912** 0.01650.0165** 0.04480.0448** 0.11990.1199** 0.08000.0800** 0.13680.1368** 0.19650.1965** 0.14700.1470** 0.05270.0527** 0.07070.0707** 0.20570.2057** 0.54510.5451**
γ\gamma 0.50.5 0.09220.0922 0.04650.0465 0.04980.0498 0.11540.1154 0.02210.0221 0.05680.0568 0.15250.1525 0.09700.0970 0.15230.1523 0.21670.2167 0.17360.1736 0.05690.0569 0.07960.0796 0.24940.2494 0.5455{\bf 0.5455}
Outlier Ratio 5%
ML 2.44122.4412 0.37840.3784 1.48461.4846 1.32351.3235 0.37690.3769 0.63160.6316 1.42101.4210 5.96055.9605 0.21960.2196 2.29962.2996 1.81651.8165 0.17270.1727 0.43680.4368 2.12772.1277 0.38190.3819
ML ++ Cauchit 1.11691.1169 0.06610.0661 0.94730.9473 0.20000.2000 0.18150.1815 0.30240.3024 0.06740.0674 2.50902.5090 0.80440.8044 2.96392.9639 0.53940.5394 0.11820.1182 0.28250.2825 1.92911.9291 0.46960.4696
DP 0.30.3 0.0614{\bf 0.0614} 0.0272{\bf 0.0272} 0.0454{\bf 0.0454} 0.0832{\bf 0.0832} 0.0130{\bf 0.0130} 0.0404{\bf 0.0404} 0.10500.1050* 0.0778{\bf 0.0778} 0.1424{\bf 0.1424} 0.2146{\bf 0.2146} 0.1431{\bf 0.1431} 0.0543{\bf 0.0543} 0.0723{\bf 0.0723} 0.2073{\bf 0.2073} 0.5417{\bf 0.5417}
DP 0.50.5 0.07950.0795* 0.03580.0358* 0.05420.0542* 0.10390.1039* 0.01780.0178* 0.05100.0510* 0.13380.1338** 0.09160.0916** 0.15680.1568** 0.23570.2357** 0.16430.1643** 0.05860.0586** 0.08010.0801** 0.24370.2437** 0.54010.5401
γ\gamma 0.30.3 0.10810.1081** 0.04980.0498** 0.05980.0598** 0.13430.1343** 0.02580.0258** 0.07010.0701** 0.17330.1733 0.08930.0893* 0.15000.1500* 0.22520.2252* 0.16160.1616* 0.05760.0576* 0.07930.0793* 0.23710.2371* 0.54160.5416*
γ\gamma 0.50.5 0.18970.1897 0.08910.0891 0.08920.0892 0.22440.2244 0.04790.0479 0.12070.1207 0.29500.2950 0.13360.1336 0.18030.1803 0.26710.2671 0.22650.2265 0.06800.0680 0.10280.1028 0.34770.3477 0.54120.5412**
Outlier Ratio 10%
ML 2.47362.4736 0.37980.3798 1.50061.5006 1.33331.3333 0.38160.3816 0.64140.6414 1.44021.4402 6.11906.1190 0.21910.2191 2.34452.3445 1.84661.8466 0.17810.1781 0.45050.4505 2.17872.1787 0.37450.3745
ML ++ Cauchit 2.44302.4430 0.28760.2876 1.72581.7258 0.78870.7887 0.42080.4208 0.73670.7367 0.60240.6024 6.03826.0382 0.42590.4259 4.26234.2623 0.93320.9332 0.20840.2084 0.60490.6049 1.10411.1041 0.37350.3735
DP 0.30.3 0.0643{\bf 0.0643} 0.0325{\bf 0.0325} 0.0438{\bf 0.0438} 0.0887{\bf 0.0887} 0.0157{\bf 0.0157} 0.0387{\bf 0.0387} 0.1107{\bf 0.1107} 0.0862{\bf 0.0862} 0.1515{\bf 0.1515} 0.2242{\bf 0.2242} 0.1554{\bf 0.1554} 0.0582{\bf 0.0582} 0.0762{\bf 0.0762} 0.2184{\bf 0.2184} 0.53700.5370**
DP 0.50.5 0.08420.0842* 0.04340.0434* 0.05300.0530* 0.11170.1117* 0.02100.0210* 0.05000.0500* 0.14200.1420* 0.10220.1022* 0.16820.1682* 0.24620.2462* 0.17970.1797* 0.06280.0628* 0.08460.0846* 0.25870.2587* 0.5376{\bf 0.5376}
γ\gamma 0.30.3 0.15690.1569** 0.07740.0774** 0.07210.0721** 0.19030.1903** 0.04130.0413** 0.09740.0974** 0.24630.2463** 0.11430.1143** 0.16880.1688** 0.24620.2462** 0.19920.1992** 0.06550.0655** 0.09190.0919** 0.28870.2887** 0.53680.5368
γ\gamma 0.50.5 0.29980.2998 0.14830.1483 0.12070.1207 0.34930.3493 0.08070.0807 0.18640.1864 0.45890.4589 0.21020.2102 0.22420.2242 0.31220.3122 0.33690.3369 0.08460.0846 0.13910.1391 0.51940.5194 0.53750.5375*
Table 3: Bias, mean square error (MSE), and percentage of correctly classified responses (CCR) using the maximum likelihood, DP and γ\gamma-divergences methods in the ordinal response model with the log-log link, and maximum likelihood method with the cauchit link. The outlier ratios are 0, 5, and 10%. Bold, *, and ** denote the first, second, and third best performance, respectively.
Bias MSE CCR
tuning β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4} β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4}
Outlier Ratio 0%
ML 0.0918{\bf 0.0918} 0.05630.0563** 0.02890.0289** 0.1176{\bf 0.1176} 0.0285{\bf 0.0285} 0.0827{\bf 0.0827} 0.1431{\bf 0.1431} 0.0763{\bf 0.0763} 0.0814{\bf 0.0814} 0.11960.1196** 0.1424{\bf 0.1424} 0.0394{\bf 0.0394} 0.1151{\bf 0.1151} 0.2664{\bf 0.2664} 0.6638{\bf 0.6638}
ML ++ Cauchit 2.55002.5500 1.23451.2345 0.72020.7202 4.09354.0935 0.14320.1432 2.13802.1380 4.28194.2819 7.14307.1430 2.02462.0246 1.15061.1506 17.896217.8962 0.21940.2194 5.36785.3678 20.384420.3844 0.66140.6614
DP 0.30.3 0.09630.0963* 0.0517{\bf 0.0517} 0.0266{\bf 0.0266} 0.12340.1234* 0.03210.0321* 0.09370.0937* 0.16280.1628* 0.08610.0861* 0.08640.0864* 0.1132{\bf 0.1132} 0.14450.1445* 0.04210.0421* 0.12180.1218* 0.28000.2800* 0.66320.6632**
DP 0.50.5 0.11920.1192 0.06240.0624 0.03480.0348 0.15260.1526 0.04020.0402 0.11800.1180 0.20420.2042 0.10820.1082 0.09960.0996 0.12850.1285 0.18110.1811 0.04750.0475 0.14720.1472 0.34540.3454 0.66370.6637*
γ\gamma 0.30.3 0.09740.0974** 0.05220.0522* 0.02680.0268* 0.12470.1247** 0.03260.0326** 0.09500.0950** 0.16470.1647** 0.08720.0872** 0.08680.0868** 0.11340.1134* 0.14600.1460** 0.04240.0424** 0.12320.1232** 0.28330.2833** 0.66320.6632**
γ\gamma 0.50.5 0.12480.1248 0.06510.0651 0.03610.0361 0.15950.1595 0.04270.0427 0.12410.1241 0.21360.2136 0.11410.1141 0.10160.1016 0.12980.1298 0.19030.1903 0.04890.0489 0.15440.1544 0.36240.3624 0.66320.6632**
Outlier Ratio 5%
ML 2.47172.4717 0.23870.2387 1.08201.0820 2.02932.0293 0.31920.3192 1.15541.1554 1.75421.7542 6.10986.1098 0.09030.0903 1.24141.2414 4.13204.1320 0.12090.1209 1.37581.3758 3.17933.1793 0.50440.5044
ML ++ Cauchit 0.88950.8895 0.17340.1734 2.61482.6148 1.84591.8459 0.50270.5027 0.20190.2019 1.57741.5774 4.13094.1309 2.39682.3968 29.965329.9653 5.42735.4273 0.49160.4916 2.37542.3754 28.155628.1556 0.57420.5742
DP 0.30.3 0.0649{\bf 0.0649} 0.0439{\bf 0.0439} 0.0296{\bf 0.0296} 0.0951{\bf 0.0951} 0.0271{\bf 0.0271} 0.0665{\bf 0.0665} 0.1138{\bf 0.1138} 0.0802{\bf 0.0802} 0.0885{\bf 0.0885} 0.1181{\bf 0.1181} 0.1427{\bf 0.1427} 0.0430{\bf 0.0430} 0.1159{\bf 0.1159} 0.2642{\bf 0.2642} 0.66840.6684*
DP 0.50.5 0.08770.0877* 0.05550.0555* 0.03770.0377* 0.12250.1225* 0.03580.0358* 0.09060.0906* 0.15420.1542* 0.10020.1002** 0.10190.1019** 0.13440.1344** 0.17680.1768** 0.04850.0485** 0.14020.1402** 0.32570.3257** 0.66820.6682**
γ\gamma 0.30.3 0.11590.1159* 0.06830.0683* 0.04310.0431* 0.15730.1573* 0.05160.0516* 0.12450.1245* 0.20120.2012* 0.09400.0940* 0.09530.0953* 0.12420.1242* 0.16660.1666* 0.04720.0472* 0.13390.1339* 0.30750.3075* 0.66810.6681
γ\gamma 0.50.5 0.21050.2105 0.11450.1145 0.07040.0704 0.27410.2741 0.09360.0936 0.22860.2286 0.36340.3634 0.15420.1542 0.12490.1249 0.15420.1542 0.26820.2682 0.06300.0630 0.20870.2087 0.49030.4903 0.6693{\bf 0.6693}
Outlier Ratio 10%
ML 2.48492.4849 0.23520.2352 1.09291.0929 2.04222.0422 0.31500.3150 1.15561.1556 1.74981.7498 6.17516.1751 0.0881{\bf 0.0881} 1.25701.2570 4.18414.1841 0.11800.1180 1.37631.3763 3.16563.1656 0.50920.5092
ML ++ Cauchit 2.38972.3897 0.42020.4202 2.62212.6221 1.08401.0840 0.74500.7450 1.16491.1649 0.21940.2194** 5.96135.9613 0.43980.4398 8.11628.1162 2.08542.0854 0.58870.5887 1.56021.5602 1.01031.0103 0.50860.5086
DP 0.30.3 0.0956{\bf 0.0956} 0.0577{\bf 0.0577} 0.0407{\bf 0.0407} 0.1300{\bf 0.1300} 0.0352{\bf 0.0352} 0.0977{\bf 0.0977} 0.1693{\bf 0.1693} 0.0933{\bf 0.0933} 0.09660.0966* 0.1293{\bf 0.1293} 0.1599{\bf 0.1599} 0.0464{\bf 0.0464} 0.1332{\bf 0.1332} 0.3132{\bf 0.3132} 0.6653{\bf 0.6653}
DP 0.50.5 0.12250.1225* 0.07140.0714* 0.05070.0507* 0.16450.1645* 0.04500.0450* 0.12630.1263* 0.21760.2176* 0.11870.1187* 0.11210.1121** 0.15000.1500** 0.20450.2045* 0.05270.0527* 0.16240.1624* 0.39450.3945* 0.66370.6637**
γ\gamma 0.30.3 0.19660.1966** 0.10600.1060** 0.06780.0678** 0.25300.2530** 0.08290.0829** 0.21240.2124** 0.34260.3426 0.13180.1318** 0.11290.1129 0.14350.1435* 0.22320.2232** 0.05640.0564** 0.18250.1825** 0.43460.4346** 0.66510.6651*
γ\gamma 0.50.5 0.36390.3639 0.18720.1872 0.11680.1168 0.46360.4636 0.15600.1560 0.39750.3975 0.63140.6314 0.27160.2716 0.16920.1692 0.19830.1983 0.45690.4569 0.08880.0888 0.35310.3531 0.86690.8669 0.66330.6633
Table 4: Bias, mean square error (MSE), and percentage of correctly classified responses (CCR) using the maximum likelihood, DP and γ\gamma-divergences methods in the ordinal response model with the probit link, and maximum likelihood method with the cauchit link. The values of the means of the contaminated normal distribution are 5, 10, and 20. Bold, *, and ** denote the first, second, and third best performance, respectively.
Bias MSE CCR
tuning β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4} β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4}
Mean == 5
ML 2.04402.0440 0.53130.5313 0.71930.7193 1.52441.5244 1.2441{\bf 1.2441} 2.00802.0080 2.71212.7121 4.18704.1870 0.31840.3184 0.58750.5875 2.34772.3477 1.5589{\bf 1.5589} 4.04634.0463 7.39727.3972 0.51170.5117
ML ++ Cauchit 2.26332.2633 1.23281.2328 1.06511.0651 3.24753.2475 2.36292.3629 0.2718{\bf 0.2718} 3.38243.3824 5.67095.6709 2.14362.1436 2.21412.2141 11.417111.4171 5.72295.7229 0.3973{\bf 0.3973} 14.063814.0638 0.67220.6722
DP 0.30.3 0.0622{\bf 0.0622} 0.0322{\bf 0.0322} 0.0424{\bf 0.0424} 0.1787{\bf 0.1787} 1.71581.7158* 1.26381.2638 0.78890.7889 0.0694{\bf 0.0694} 0.0690{\bf 0.0690} 0.1106{\bf 0.1106} 0.1412{\bf 0.1412} 2.97452.9745* 1.64581.6458 0.79580.7958 0.67780.6778*
DP 0.50.5 0.08710.0871* 0.04340.0434* 0.04850.0485* 0.20570.2057* 1.72251.7225** 1.25031.2503 0.75460.7546** 0.08550.0855* 0.07920.0792** 0.12410.1241** 0.17190.1719** 3.00103.0010** 1.61931.6193 0.77620.7762** 0.67700.6770
γ\gamma 0.30.3 0.10300.1030** 0.05180.0518** 0.05410.0541** 0.22580.2258** 1.72701.7270 1.23811.2381** 0.72720.7272* 0.07940.0794** 0.07320.0732* 0.11560.1156* 0.16600.1660* 3.01433.0143 1.58381.5838** 0.71150.7115* 0.6780{\bf 0.6780}
γ\gamma 0.50.5 0.18210.1821 0.08920.0892 0.07660.0766 0.31540.3154 1.74851.7485 1.19041.1904* 0.6104{\bf 0.6104} 0.12290.1229 0.09310.0931 0.13890.1389 0.24870.2487 3.09533.0953 1.48061.4806* 0.6113{\bf 0.6113} 0.67710.6771**
Mean == 10
ML 2.37462.3746 0.58050.5805 0.88280.8828 1.66311.6631 1.2370{\bf 1.2370} 2.14212.1421 2.97332.9733 5.63985.6398 0.42300.4230 0.9311{\bf 0.9311} 2.78672.7867 1.5419{\bf 1.5419} 4.60344.6034 8.88318.8831 0.43040.4304
ML ++ Cauchit 2.02692.0269 1.27311.2731 1.29821.2982 3.08873.0887 2.30422.3042 0.1542{\bf 0.1542} 3.49873.4987 5.32255.3225 3.91833.9183 9.08879.0887 10.944710.9447 5.47485.4748 0.5146{\bf 0.5146} 25.891625.8916 0.66720.6672
DP 0.30.3 0.0672{\bf 0.0672} 0.0373{\bf 0.0373} 0.0276{\bf 0.0276} 0.1818{\bf 0.1818} 1.71911.7191* 1.26071.2607 0.78500.7850 0.0677{\bf 0.0677} 0.0703{\bf 0.0703} 0.10530.1053* 0.1408{\bf 0.1408} 2.98572.9857* 1.63641.6364 0.77970.7797 0.68280.6828*
DP 0.50.5 0.08630.0863* 0.04800.0480* 0.03540.0354* 0.20310.2031* 1.72461.7246** 1.24981.2498 0.75510.7551* 0.08420.0842** 0.08090.0809** 0.11920.1192 0.16940.1694** 3.00813.0081** 1.61631.6163 0.76860.7686** 0.6839{\bf 0.6839}
γ\gamma 0.30.3 0.10860.1086** 0.05740.0574** 0.03960.0396** 0.22950.2295** 1.73051.7305 1.23451.2345** 0.72210.7221* 0.07830.0783* 0.07490.0749* 0.10990.1099** 0.16610.1661* 3.02643.0264 1.57341.5734** 0.69380.6938* 0.68260.6826
γ\gamma 0.50.5 0.18310.1831 0.09500.0950 0.06380.0638 0.31460.3146 1.75131.7513 1.18871.1887* 0.6080{\bf 0.6080} 0.12220.1222 0.09600.0960 0.13360.1336 0.24690.2469 3.10483.1048 1.47481.4748* 0.6002{\bf 0.6002} 0.68280.6828*
Mean =20=20
ML 2.46942.4694 0.60270.6027 0.92020.9202 1.76941.7694 0.43880.4388 0.89500.8950 2.16822.1682 6.09836.0983 0.41110.4111 0.92920.9292 3.15093.1509 0.20430.2043 0.81520.8152 4.73884.7388 0.41620.4162
ML ++ Cauchit 0.10150.1015 0.97560.9756 2.26202.2620 1.52321.5232 0.19720.1972 0.59180.5918 3.21113.2111 3.77083.7708 6.70946.7094 24.791324.7913 4.22714.2271 0.26950.2695 1.23851.2385 44.835044.8350 0.57790.5779
DP 0.30.3 0.01800.0180* 0.0008{\bf 0.0008} 0.0277{\bf 0.0277} 0.00390.0039* 0.0003{\bf 0.0003} 0.00690.0069* 0.0036{\bf 0.0036} 0.0538{\bf 0.0538} 0.0645{\bf 0.0645} 0.1004{\bf 0.1004} 0.0948{\bf 0.0948} 0.0284{\bf 0.0284} 0.0417{\bf 0.0417} 0.1395{\bf 0.1395} 0.68290.6829
DP 0.50.5 0.0086{\bf 0.0086} 0.00560.0056* 0.03060.0306* 0.0035{\bf 0.0035} 0.00390.0039* 0.0041{\bf 0.0041} 0.01750.0175* 0.06370.0637** 0.07220.0722** 0.11140.1114** 0.10940.1094** 0.03130.0313** 0.04690.0469** 0.16400.1640** 0.6834{\bf 0.6834}
γ\gamma 0.30.3 0.02070.0207** 0.01960.0196** 0.03920.0392** 0.04100.0410** 0.01100.0110** 0.01780.0178** 0.06250.0625** 0.05660.0566* 0.06730.0673* 0.10470.1047* 0.10140.1014* 0.02970.0297* 0.04400.0440* 0.15080.1508* 0.68320.6832*
γ\gamma 0.50.5 0.08020.0802 0.04880.0488 0.05740.0574 0.10590.1059 0.02860.0286 0.05230.0523 0.15240.1524 0.07940.0794 0.08160.0816 0.12390.1239 0.13640.1364 0.03560.0356 0.05580.0558 0.21120.2112 0.68300.6830**
Table 5: Bias, mean square error (MSE), and percentage of correctly classified responses (CCR) using the maximum likelihood, DP and γ\gamma-divergences methods in the ordinal response model with the logit link, and maximum likelihood method with the cauchit link. The values of the means of the contaminated normal distribution are 5, 10, and 20. Bold, *, and ** denote the first, second, and third best performance, respectively.
Bias MSE CCR
tuning β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4} β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4}
Mean =5=5
ML 1.63181.6318 0.27100.2710 0.99970.9997 0.66080.6608 1.4467{\bf 1.4467} 1.57481.5748 1.51891.5189 2.69802.6980 0.16670.1667 1.15211.1521 0.49950.4995 2.1208{\bf 2.1208} 2.51732.5173 2.42722.4272 0.45790.4579
ML ++ Cauchit 0.28910.2891 0.06490.0649 0.43900.4390 0.52110.5211 1.75031.7503* 1.21301.2130** 0.0967{\bf 0.0967} 0.26560.2656 0.20450.2045 0.57730.5773 0.54050.5405 3.12053.1205* 1.55841.5584** 0.70930.7093 0.51890.5189
DP 0.30.3 0.18960.1896 0.0095{\bf 0.0095} 0.16620.1662** 0.2847{\bf 0.2847} 1.75961.7596** 1.23921.2392 0.64530.6453 0.17480.1748 0.1350{\bf 0.1350} 0.24830.2483* 0.2379{\bf 0.2379} 3.14673.1467** 1.60201.6020 0.61750.6175 0.53290.5329
DP 0.50.5 0.0207{\bf 0.0207} 0.04090.0409** 0.0932{\bf 0.0932} 0.45320.4532** 1.80811.8081 1.16901.1690* 0.48230.4823** 0.1095{\bf 0.1095} 0.15950.1595** 0.2440{\bf 0.2440} 0.37100.3710** 3.32603.3260 1.44311.4431* 0.46500.4650* 0.5385{\bf 0.5385}
γ\gamma 0.30.3 0.14390.1439** 0.00990.0099* 0.17290.1729 0.33080.3308* 1.77171.7717 1.21401.2140 0.58690.5869 0.16390.1639** 0.14000.1400* 0.25600.2560** 0.27410.2741* 3.19163.1916 1.54331.5433 0.55590.5559** 0.53370.5337**
γ\gamma 0.50.5 0.13320.1332* 0.09320.0932 0.12240.1224* 0.57320.5732 1.83881.8388 1.1012{\bf 1.1012} 0.32480.3248* 0.13940.1394* 0.18330.1833 0.27410.2741 0.51780.5178 3.44503.4450 1.2996{\bf 1.2996} 0.3729{\bf 0.3729} 0.53690.5369*
Mean =10=10
ML 2.25872.2587 0.35460.3546 1.37861.3786 0.89740.8974 1.4037{\bf 1.4037} 1.75081.7508 1.88271.8827 5.10855.1085 0.20770.2077 2.00742.0074 0.86840.8684 2.0003{\bf 2.0003} 3.10373.1037 3.65623.6562 0.39330.3933
ML ++ Cauchit 0.52270.5227 0.0199{\bf 0.0199} 0.56680.5668 0.4481{\bf 0.4481} 1.71571.7157* 1.29751.2975 0.1614{\bf 0.1614} 0.91360.9136 0.24680.2468 1.04141.0414 0.58690.5869 3.01723.0172* 1.80831.8083 1.74741.7474 0.50610.5061
DP 0.30.3 0.0622{\bf 0.0622} 0.03460.0346* 0.0441{\bf 0.0441} 0.48610.4861* 1.81571.8157** 1.15861.1586 0.48900.4890 0.0859{\bf 0.0859} 0.1441{\bf 0.1441} 0.2095{\bf 0.2095} 0.3833{\bf 0.3833} 3.35213.3521** 1.41481.4148 0.44340.4434 0.5337{\bf 0.5337}
DP 0.50.5 0.09150.0915* 0.04660.0466** 0.04830.0483* 0.51670.5167** 1.82291.8229 1.14371.1437 0.44980.4498 0.09910.0991 0.16010.1601 0.22970.2297 0.43260.4326 3.38263.3826 1.38731.3873 0.43640.4364 0.5337{\bf 0.5337}
γ\gamma 0.30.3 0.11000.1100 0.05750.0575 0.05790.0579 0.53790.5379 1.82881.8288 1.12861.1286* 0.41980.4198** 0.09790.0979* 0.15220.1522* 0.21970.2197* 0.44430.4443** 3.40263.4026 1.34991.3499* 0.39180.3918* 0.53300.5330
γ\gamma 0.50.5 0.20280.2028 0.10070.1007 0.08280.0828 0.63860.6386 1.85361.8536 1.0735{\bf 1.0735} 0.28690.2869* 0.14470.1447 0.18540.1854 0.26000.2600 0.59800.5980 3.50273.5027 1.2425{\bf 1.2425} 0.3528{\bf 0.3528} 0.5337{\bf 0.5337}
Mean =20=20
ML 2.44122.4412 0.37840.3784 1.48461.4846 1.32351.3235 0.37690.3769 0.63160.6316 1.42101.4210 5.96055.9605 0.21960.2196 2.29962.2996 1.81651.8165 0.17270.1727 0.43680.4368 2.12772.1277 0.38190.3819
ML ++ Cauchit 1.11691.1169 0.06610.0661 0.94730.9473 0.20000.2000 0.18150.1815 0.30240.3024 0.06740.0674 2.50902.5090 0.80440.8044 2.96392.9639 0.53940.5394 0.11820.1182 0.28250.2825 1.92911.9291 0.46960.4696
DP 0.30.3 0.0614{\bf 0.0614} 0.0272{\bf 0.0272} 0.0454{\bf 0.0454} 0.0832{\bf 0.0832} 0.0130{\bf 0.0130} 0.0404{\bf 0.0404} 0.10500.1050* 0.0778{\bf 0.0778} 0.1424{\bf 0.1424} 0.2146{\bf 0.2146} 0.1431{\bf 0.1431} 0.0543{\bf 0.0543} 0.0723{\bf 0.0723} 0.2073{\bf 0.2073} 0.5417{\bf 0.5417}
DP 0.50.5 0.07950.0795* 0.03580.0358* 0.05420.0542* 0.10390.1039* 0.01780.0178* 0.05100.0510* 0.13380.1338** 0.09160.0916** 0.15680.1568** 0.23570.2357** 0.16430.1643** 0.05860.0586** 0.08010.0801** 0.24370.2437** 0.54010.5401
γ\gamma 0.30.3 0.10810.1081** 0.04980.0498** 0.05980.0598** 0.13430.1343** 0.02580.0258** 0.07010.0701** 0.17330.1733 0.08930.0893* 0.15000.1500* 0.22520.2252* 0.16160.1616* 0.05760.0576* 0.07930.0793* 0.23710.2371* 0.54160.5416*
γ\gamma 0.50.5 0.18970.1897 0.08910.0891 0.08920.0892 0.22440.2244 0.04790.0479 0.12070.1207 0.29500.2950 0.13360.1336 0.18030.1803 0.26710.2671 0.22650.2265 0.06800.0680 0.10280.1028 0.34770.3477 0.54120.5412**
Table 6: Bias, mean square error (MSE), and percentage of correctly classified responses (CCR) using the maximum likelihood, DP and γ\gamma-divergences methods in the ordinal response model with the log-log link, and maximum likelihood method with the cauchit link. The values of the means of the contaminated normal distribution are 5, 10, and 20.
Bias MSE CCR
tuning β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4} β1\beta_{1} β2\beta_{2} β3\beta_{3} δ1\delta_{1} δ2\delta_{2} δ3\delta_{3} δ4\delta_{4}
Mean =5=5
ML 2.23102.2310 0.29040.2904 0.95320.9532 1.90761.9076 0.27710.2771 1.08001.0800 1.65611.6561 4.98494.9849 0.12530.1253 0.98200.9820 3.66183.6618 0.09560.0956 1.20861.2086 2.85272.8527 0.50150.5015
ML ++ Cauchit 1.26031.2603 0.69410.6941 0.76210.7621 2.69002.6900 0.13190.1319 1.03871.0387 2.52702.5270 1.98071.9807 0.78740.7874 1.06921.0692 7.98897.9889 0.12910.1291 1.52401.5240 7.83967.8396 0.67330.6733
DP 0.30.3 0.0791{\bf 0.0791} 0.0483{\bf 0.0483} 0.0323{\bf 0.0323} 0.1102{\bf 0.1102} 0.0311{\bf 0.0311} 0.0784{\bf 0.0784} 0.1407{\bf 0.1407} 0.0867{\bf 0.0867} 0.0866{\bf 0.0866} 0.1212{\bf 0.1212} 0.1506{\bf 0.1506} 0.0437{\bf 0.0437} 0.1232{\bf 0.1232} 0.2839{\bf 0.2839} 0.6818{\bf 0.6818}
DP 0.50.5 0.10520.1052* 0.06100.0610* 0.04100.0410* 0.14200.1420* 0.04070.0407* 0.10530.1053* 0.18480.1848* 0.10820.1082** 0.10110.1011** 0.13880.1388** 0.18820.1882** 0.04900.0490** 0.14920.1492** 0.35100.3510** 0.68130.6813
γ\gamma 0.30.3 0.13040.1304** 0.07270.0727** 0.04560.0456** 0.17250.1725** 0.05550.0555** 0.13640.1364** 0.22810.2281** 0.10190.1019* 0.09340.0934* 0.12740.1274* 0.17620.1762* 0.04800.0480* 0.14270.1427* 0.33220.3322* 0.68140.6814**
γ\gamma 0.50.5 0.22860.2286 0.11990.1199 0.07390.0739 0.29420.2942 0.09800.0980 0.24360.2436 0.39480.3948 0.16700.1670 0.12480.1248 0.15950.1595 0.28640.2864 0.06380.0638 0.22240.2224 0.53030.5303 0.68150.6815*
Mean =10=10
ML 2.40702.4070 0.25500.2550 1.05121.0512 2.00482.0048 0.30020.3002 1.12891.1289 1.71761.7176 5.79495.7949 0.10040.1004** 1.17681.1768 4.03424.0342 0.10820.1082 1.31561.3156 3.05513.0551 0.51420.5142
ML ++ Cauchit 0.75290.7529 0.62220.6222 1.36421.3642 2.58902.5890 0.21440.2144 0.78580.7858 2.54722.5472 2.96692.9669 2.04772.0477 10.985510.9855 8.88898.8889 0.26270.2627 2.45162.4516 19.979319.9793 0.64540.6454
DP 0.30.3 0.0864{\bf 0.0864} 0.0531{\bf 0.0531} 0.0287{\bf 0.0287} 0.1126{\bf 0.1126} 0.0340{\bf 0.0340} 0.0883{\bf 0.0883} 0.1506{\bf 0.1506} 0.0879{\bf 0.0879} 0.0888{\bf 0.0888} 0.1178{\bf 0.1178} 0.1489{\bf 0.1489} 0.0449{\bf 0.0449} 0.1250{\bf 0.1250} 0.2873{\bf 0.2873} 0.6660{\bf 0.6660}
DP 0.50.5 0.11150.1115* 0.06580.0658* 0.03840.0384* 0.14360.1436* 0.04320.0432* 0.11440.1144* 0.19490.1949* 0.11090.1109** 0.10270.1027 0.13570.1357** 0.18830.1883** 0.05080.0508** 0.15190.1519** 0.35800.3580** 0.66570.6657*
γ\gamma 0.30.3 0.13790.1379** 0.07770.0777** 0.04220.0422** 0.17510.1751** 0.05850.0585** 0.14680.1468** 0.23870.2387** 0.10410.1041* 0.09600.0960* 0.12390.1239* 0.17510.1751* 0.04940.0494* 0.14580.1458* 0.33790.3379* 0.66570.6657*
γ\gamma 0.50.5 0.23570.2357 0.12530.1253 0.07170.0717 0.29670.2967 0.10090.1009 0.25380.2538 0.40680.4068 0.17210.1721 0.12720.1272 0.15590.1559 0.28810.2881 0.06620.0662 0.22840.2284 0.54450.5445 0.66560.6656
Mean =20=20
ML 2.47172.4717 0.23870.2387 1.08201.0820 2.02932.0293 0.31920.3192 1.15541.1554 1.75421.7542 6.10986.1098 0.09030.0903 1.24141.2414 4.13204.1320 0.12090.1209 1.37581.3758 3.17933.1793 0.50440.5044
ML ++ Cauchit 0.88950.8895 0.17340.1734 2.61482.6148 1.84591.8459 0.50270.5027 0.20190.2019 1.57741.5774 4.13094.1309 2.39682.3968 29.965329.9653 5.42735.4273 0.49160.4916 2.37542.3754 28.155628.1556 0.57420.5742
DP 0.30.3 0.0649{\bf 0.0649} 0.0439{\bf 0.0439} 0.0296{\bf 0.0296} 0.0951{\bf 0.0951} 0.0271{\bf 0.0271} 0.0665{\bf 0.0665} 0.1138{\bf 0.1138} 0.0802{\bf 0.0802} 0.0885{\bf 0.0885} 0.1181{\bf 0.1181} 0.1427{\bf 0.1427} 0.0430{\bf 0.0430} 0.1159{\bf 0.1159} 0.2642{\bf 0.2642} 0.66840.6684*
DP 0.50.5 0.08770.0877* 0.05550.0555* 0.03770.0377* 0.12250.1225* 0.03580.0358* 0.09060.0906* 0.15420.1542* 0.10020.1002** 0.10190.1019** 0.13440.1344** 0.17680.1768** 0.04850.0485** 0.14020.1402** 0.32570.3257** 0.66820.6682**
γ\gamma 0.30.3 0.11590.1159* 0.06830.0683* 0.04310.0431* 0.15730.1573* 0.05160.0516* 0.12450.1245* 0.20120.2012* 0.09400.0940* 0.09530.0953* 0.12420.1242* 0.16660.1666* 0.04720.0472* 0.13390.1339* 0.30750.3075* 0.66810.6681
γ\gamma 0.50.5 0.21050.2105 0.11450.1145 0.07040.0704 0.27410.2741 0.09360.0936 0.22860.2286 0.36340.3634 0.15420.1542 0.12490.1249 0.15420.1542 0.26820.2682 0.06300.0630 0.20870.2087 0.49030.4903 0.6693{\bf 0.6693}

6.2 Real data analysis

This section shows the robustness and usefulness of the proposed robust ordinal response with the DP and γ\gamma-divergences through the analysis of two real datasets: Boston housing data (Harrison and Rubinfeld,, 1978) and Affairs data (Fair,, 1978).

The Boston housing data is often used to evaluate the performance of robust inference methods in models with the continuous data as the response variable, such as the linear regression (Hashimoto and Sugasawa,, 2020; Hamura et al.,, 2022; Kawakami and Hashimoto,, 2022), but in this case, the response variable, the corrected median value of owner-occupied homes in USD 1000’s is applied to the ordinal response model assuming that it is obtained as the ordered categorical data with five categories. The Boston housing data has 14 variables including one binary variable. The Affairs data contains 3 continuous, 2 binary, 2 ordered categorical, and 2 multinomial categorical variables. We set one of the ordered categorical data, the self rating of marriage as the response variable. In the two real data, the continuous data are standardized to have mean 0 and variance 1, the ordered categorical data in the covariates is numerically transformed using the Likert sigma method, and the multinomial categorical data are transformed using the dummy variables. In analyzing the two sets of real data, we use the commonly-used symmetric link functions, the probit and logit links.

The generalized residuals (Franses and Paap,, 2001) plots with two link functions for two real datasets are shown in Figures 2 and 3. The solid and dashed lines in these figures indicate 95% and 99% intervals of the generalized residuals, respectively. In Subsections 6.2.1 and 6.2.2, the original data and the modified data in which individuals with the values of the generalized residuals larger than the 95% interval (solid line) in Figures 2 and 3 are removed are used for the data analyses.

Refer to caption
Figure 2: The generalized residuals (Franses and Paap,, 2001) plots for the Boston housing data with the probit and logit links. The solid and dashed lines indicate 95% and 99% intervals of the generalized residuals, respectively.
Refer to caption
Figure 3: The generalized residuals (Franses and Paap,, 2001) plots for the Affairs data with the probit and logit links. Solid and dashed lines indicate 95% and 99% intervals of the generalized residuals, respectively.

6.2.1 Boston housing data

In this section, we consider the Boston data (Harrison and Rubinfeld,, 1978) that contains 506 individuals and 14 variables including one binary variable. As mentioned above, the Boston housing data is often used in the numerical experiments of the robust inference methods for the continuous data as the response variable (Hashimoto and Sugasawa,, 2020; Hamura et al.,, 2022; Kawakami and Hashimoto,, 2022). Hence, we transform the continuous response variable, the corrected median value of owner-occupied homes in USD 1000’s to the the ordered categorical response variable with five categories (10,10<x20,20<x30,30<x40,40<\leq 10,10<x\leq 20,20<x\leq 30,30<x\leq 40,40<). We also standardize 12 continuous valued covariates as mean 0 and variance 1. For the link function in the data analysis, we use the commonly-used symmetric link functions, the probit and logit links.

Table 7 shows the estimates of each parameter when applying the maximum likelihood method and the proposed robust ordinal response model with the DP and γ\gamma-divergences (α,γ=0.3,0.5\alpha,\gamma=0.3,0.5) for the probit and logit links to the original Boston housing data and the modified data excluding individuals whose values of the generalized residuals are larger than the 95% interval (Figure 2). The values in the row of “Distance” in the column of each method are on the left, the sum of squares of the differences between the estimated values of each parameter applying the ML to the modified data and the corresponding method to the original data divided by the number of parameters, and on the right, that between the estimated values of each parameter applying the corresponding method to the original and modified data divided by the number of parameters. Since these two values are the same in the ML, they are listed only in the “original” column. The smaller these two values are, the less sensitive the inference method is to outliers.

It can be seen from Table 7 that the values of each “Distance” are smaller for all of our proposed methods, both with the probit and logit links, compared to ML. Namely, our proposed methods with the robust divergences are able to eliminate the influence of data that appear to be outliers. Furthermore, for the same tuning parameter values, the inferences with the γ\gamma-divergence have smaller values of the ”Distance” than those with the DP-divergence. The reason why the values of “Distance” are not closer to zero may be that it is difficult to identify outliers by using the generalized residuals for high-dimensional data, and therefore, the data whose values of the generalized residuals are larger than the 95% interval do not always coincide with the data whose influence on the inference is removed by the proposed methods.

Table 7: Estimates of each parameter when applying the maximum likelihood method and the proposed robust ordinal response model with the DP and γ\gamma-divergences (α,γ=0.3,0.5\alpha,\gamma=0.3,0.5) for the probit and logit links to the original Boston housing data and the modified data excluding individuals whose values of the generalized residuals are larger than the 95% interval (Figure 2). The values in the row of “Distance” in the column of each method are on the left, the sum of squares of the differences between the estimated values of each parameter applying the ML to the modified data and the corresponding method to the original data divided by the number of parameters, and on the right, that between the estimated values of each parameter applying the corresponding method to the original and modified data divided by the number of parameters.
ML DP-div.: α=0.3\alpha=0.3 DP-div.: α=0.5\alpha=0.5 γ\gamma-div.: γ=0.3\gamma=0.3 γ\gamma-div.: γ=0.5\gamma=0.5
probit original modified original modified original modified original modified original modified
Coefficient
β1\beta_{1}: crim 0.3745-0.3745 0.6241-0.6241 0.4306-0.4306 0.5983-0.5983 0.4545-0.4545 0.5867-0.5867 0.4333-0.4333 0.5986-0.5986 0.4664-0.4664 0.5891-0.5891
β2\beta_{2}: zn 0.21230.2123 0.32800.3280 0.22350.2235 0.33140.3314 0.23400.2340 0.33720.3372 0.22440.2244 0.33150.3315 0.23860.2386 0.33850.3385
β3\beta_{3}: indus 0.21720.2172 0.24160.2416 0.24710.2471 0.24780.2478 0.27040.2704 0.26260.2626 0.24860.2486 0.24800.2480 0.27730.2773 0.26390.2639
β4\beta_{4}: chas 0.55370.5537 0.70090.7009 0.45740.4574 0.60520.6052 0.47370.4737 0.56930.5693 0.45950.4595 0.60540.6054 0.48530.4853 0.57160.5716
β5\beta_{5}: nox 0.5457-0.5457 0.5447-0.5447 0.4104-0.4104 0.5073-0.5073 0.3659-0.3659 0.4757-0.4757 0.4116-0.4116 0.5076-0.5076 0.3699-0.3699 0.4771-0.4771
β6\beta_{6}: rm 0.44540.4454 0.85480.8548 0.83470.8347 0.94610.9461 1.06611.0661 1.14461.1446 0.84110.8411 0.94680.9468 1.09941.0994 1.15261.1526
β7\beta_{7}: age 0.1257-0.1257 0.1802-0.1802 0.2200-0.2200 0.1903-0.1903 0.2838-0.2838 0.2643-0.2643 0.2214-0.2214 0.1904-0.1904 0.2920-0.2920 0.2665-0.2665
β8\beta_{8}: dis 0.6845-0.6845 0.9430-0.9430 0.6540-0.6540 0.8803-0.8803 0.6483-0.6483 0.8655-0.8655 0.6566-0.6566 0.8806-0.8806 0.6592-0.6592 0.8687-0.8687
β9\beta_{9}: rad 0.98690.9869 1.53331.5333 1.01091.0109 1.45071.4507 1.05781.0578 1.41181.4118 1.01611.0161 1.45141.4514 1.08401.0840 1.41721.4172
β10\beta_{10}: tax 0.7900-0.7900 1.2633-1.2633 0.9249-0.9249 1.2366-1.2366 1.0285-1.0285 1.2759-1.2759 0.9302-0.9302 1.2372-1.2372 1.0556-1.0556 1.2817-1.2817
β11\beta_{11}: ptratio 0.5598-0.5598 0.8484-0.8484 0.6101-0.6101 0.8033-0.8033 0.6435-0.6435 0.8087-0.8087 0.6134-0.6134 0.8038-0.8038 0.6584-0.6584 0.8124-0.8124
β12\beta_{12}: black 0.33140.3314 0.62140.6214 0.57210.5721 0.67520.6752 0.68140.6814 0.74900.7490 0.57650.5765 0.67550.6755 0.70310.7031 0.75350.7535
β13\beta_{13}: lstat 1.3334-1.3334 2.2568-2.2568 1.4068-1.4068 2.1002-2.1002 1.4452-1.4452 1.9575-1.9575 1.4147-1.4147 2.1012-2.1012 1.4797-1.4797 1.9641-1.9641
Distance 0.14450.1445 - 0.10620.1062 0.07340.0734 0.09610.0961 0.04490.0449 0.10400.1040 0.07170.0717 0.08890.0889 0.03960.0396
Cutoff
δ1\delta_{1} 4.2923-4.2923 7.2768-7.2768 5.2172-5.2172 7.1307-7.1307 5.8103-5.8103 7.2795-7.2795 5.2502-5.2502 7.1343-7.1343 5.9749-5.9749 7.3150-7.3150
δ2\delta_{2} 0.3806-0.3806 0.5431-0.5431 0.3763-0.3763 0.5100-0.5100 0.3756-0.3756 0.5242-0.5242 0.3777-0.3777 0.5102-0.5102 0.3811-0.3811 0.5265-0.5265
δ3\delta_{3} 2.50332.5033 4.03944.0394 3.02193.0219 3.96793.9679 3.34263.3426 4.08794.0879 3.04103.0410 3.97003.9700 3.43253.4325 4.10834.1083
δ4\delta_{4} 3.79203.7920 6.15546.1554 4.74094.7409 6.09926.0992 5.30435.3043 6.37536.3753 4.76854.7685 6.10236.1023 5.43775.4377 6.40656.4065
Distance 4.21974.2197 - 1.82651.8265 1.60481.6048 0.84720.8472 0.97080.9708 1.76381.7638 1.55231.5523 0.65120.6512 0.80310.8031
ML DP-div.: α=0.3\alpha=0.3 DP-div.: α=0.5\alpha=0.5 γ\gamma-div.: γ=0.3\gamma=0.3 γ\gamma-div.: γ=0.5\gamma=0.5
logit original modified original modified original modified original modified original modified
Coefficient
β1\beta_{1}: crim 0.7092-0.7092 1.1042-1.1042 0.7492-0.7492 1.0500-1.0500 0.7689-0.7689 1.0483-1.0483 0.7529-0.7529 1.0502-1.0502 0.7874-0.7874 1.0516-1.0516
β2\beta_{2}: zn 0.39210.3921 0.58840.5884 0.39360.3936 0.58710.5871 0.40870.4087 0.60220.6022 0.39510.3951 0.58710.5871 0.41680.4168 0.60400.6040
β3\beta_{3}: indus 0.37600.3760 0.45010.4501 0.41570.4157 0.45410.4541 0.44600.4460 0.46580.4658 0.41780.4178 0.45430.4543 0.45630.4563 0.46720.4672
β4\beta_{4}: chas 0.85280.8528 1.18481.1848 0.78130.7813 1.02131.0213 0.80590.8059 0.97170.9717 0.78490.7849 1.02141.0214 0.82400.8240 0.97450.9745
β5\beta_{5}: nox 0.8716-0.8716 0.9267-0.9267 0.6739-0.6739 0.8367-0.8367 0.6023-0.6023 0.7828-0.7828 0.6758-0.6758 0.8368-0.8368 0.6088-0.6088 0.7844-0.7844
β6\beta_{6}: rm 0.97480.9748 1.63161.6316 1.52571.5257 1.83331.8333 1.86501.8650 2.05132.0513 1.53521.5352 1.83391.8339 1.91691.9169 2.05942.0594
β7\beta_{7}: age 0.2244-0.2244 0.3350-0.3350 0.3951-0.3951 0.3906-0.3906 0.5077-0.5077 0.4721-0.4721 0.3974-0.3974 0.3908-0.3908 0.5211-0.5211 0.4739-0.4739
β8\beta_{8}: dis 1.1820-1.1820 1.6192-1.6192 1.1127-1.1127 1.5037-1.5037 1.1010-1.1010 1.4781-1.4781 1.1167-1.1167 1.5039-1.5039 1.1188-1.1188 1.4819-1.4819
β9\beta_{9}: rad 1.77191.7719 2.68902.6890 1.76141.7614 2.52672.5267 1.81281.8128 2.50412.5041 1.76911.7691 2.52742.5274 1.85371.8537 2.51132.5113
β10\beta_{10}: tax 1.4505-1.4505 2.2116-2.2116 1.6220-1.6220 2.1835-2.1835 1.7665-1.7665 2.2500-2.2500 1.6297-1.6297 2.1843-2.1843 1.8079-1.8079 2.2570-2.2570
β11\beta_{11}: ptratio 1.0137-1.0137 1.4820-1.4820 1.0525-1.0525 1.4037-1.4037 1.0947-1.0947 1.4056-1.4056 1.0573-1.0573 1.4041-1.4041 1.1175-1.1175 1.4098-1.4098
β12\beta_{12}: black 0.73360.7336 1.39131.3913 1.03451.0345 1.45561.4556 1.16081.1608 1.53381.5338 1.04071.0407 1.45601.4560 1.19281.1928 1.53931.5393
β13\beta_{13}: lstat 2.4741-2.4741 3.8863-3.8863 2.4362-2.4362 3.5245-3.5245 2.4360-2.4360 3.3565-3.3565 2.4471-2.4471 3.5251-3.5251 2.4874-2.4874 3.3653-3.3653
Distance 0.38570.3857 - 0.32970.3297 0.21900.2190 0.30910.3091 0.16530.1653 0.32380.3238 0.21420.2142 0.28600.2860 0.14790.1479
Cutoff
δ1\delta_{1} 8.2184-8.2184 13.1173-13.1173 9.2307-9.2307 12.8034-12.8034 9.9826-9.9826 13.0503-13.0503 9.2789-9.2789 12.8064-12.8064 10.2354-10.2354 13.0931-13.0931
δ2\delta_{2} 0.6643-0.6643 0.8915-0.8915 0.6450-0.6450 0.8540-0.8540 0.6628-0.6628 0.8716-0.8716 0.6471-0.6471 0.8542-0.8542 0.6733-0.6733 0.8742-0.8742
δ3\delta_{3} 4.72624.7262 7.22107.2210 5.31225.3122 7.05997.0599 5.72645.7264 7.17937.1793 5.33985.3398 7.06167.0616 5.86385.8638 7.20247.2024
δ4\delta_{4} 7.20107.2010 11.003511.0035 8.33038.3303 10.896810.8968 9.09049.0904 11.193311.1933 8.37068.3706 10.899410.8994 9.29589.2958 11.228111.2281
Distance 11.183311.1833 - 6.48896.4889 5.61225.6122 3.94313.9431 3.99693.9969 6.31616.3161 5.46155.4615 3.27773.2777 3.43313.4331

6.2.2 Affairs data

Consider the Affairs data (Fair,, 1978) that contains 601 observations and 9 variables including 3 continuous, 2 binary, 2 ordered categorical, and 2 multinomial categorical variables. We analyze the data with the self rating of marriage as the ordered categorical response variable, and then we standardize 3 continuous valued covariates as mean 0 and variance 1, transform one ordered categorical covariate by the Likert sigma method, and create dummy variables from the multinomial categorical covariates. For the link function in the data analysis, we use the commonly-used symmetric link functions, the probit and logit links.

Table 8 shows the estimates of each parameter when applying the maximum likelihood method and the proposed robust ordinal response model with the DP and γ\gamma-divergences (α,γ=0.3,0.5\alpha,\gamma=0.3,0.5) for the probit and logit links to the original Affairs data and the modified data excluding individuals whose values of the generalized residuals are larger than the 95% interval (Figure 3). The values in the row of “Distance” in the column of each method are on the left, the sum of squares of the differences between the estimated values of each parameter applying the ML to the modified data and the corresponding method to the original data divided by the number of parameters, and on the right, that between the estimated values of each parameter applying the corresponding method to the original and modified data divided by the number of parameters. Since these two values are the same in the ML, they are listed only in the “original” column. The smaller these two values are, the less sensitive the inference method is to outliers.

It can be seen from Table 8 that the values of each “Distance” are smaller for all of our proposed methods, with both the probit and logit links, compared to ML. Namely, our proposed methods with the robust divergences are able to eliminate the influence of data that appear to be outliers. The values of ”Distance” in our proposed inference methods with the DP and γ\gamma-divergences are about the same. The reason why the values of “Distance” are not closer to zero may be that it is difficult to identify outliers by using the generalized residuals for high-dimensional data, and therefore, the data whose values of the generalized residuals are larger than the 95% interval do not always coincide with the data whose influence on the inference is removed by the proposed methods.

Table 8: Estimates of each parameter when applying the maximum likelihood method and the proposed robust ordinal response model with the DP and γ\gamma-divergences (α,γ=0.3,0.5\alpha,\gamma=0.3,0.5) for the probit and logit links to the original Affairs data and the modified data excluding individuals whose values of the generalized residuals are larger than the 95% interval (Figure 3). The values in the row of “Distance” in the column of each method are on the left, the sum of squares of the differences between the estimated values of each parameter applying the ML to the modified data and the corresponding method to the original data divided by the number of parameters, and on the right, that between the estimated values of each parameter applying the corresponding method to the original and modified data divided by the number of parameters.
ML DP-div.: α=0.3\alpha=0.3 DP-div.: α=0.5\alpha=0.5 γ\gamma-div.: γ=0.3\gamma=0.3 γ\gamma-div.: γ=0.5\gamma=0.5
probit original modified original modified original modified original modified original modified
Coefficient
β1\beta_{1}: affairs 0.2273-0.2273 0.4026-0.4026 0.2465-0.2465 0.3898-0.3898 0.2616-0.2616 0.3844-0.3844 0.2466-0.2466 0.3897-0.3897 0.2625-0.2625 0.3845-0.3845
β2\beta_{2}: gender 0.0631-0.0631 0.0432-0.0432 0.0826-0.0826 0.0575-0.0575 0.0886-0.0886 0.0674-0.0674 0.0826-0.0826 0.0575-0.0575 0.0888-0.0888 0.0673-0.0673
β3\beta_{3}: age 0.0660-0.0660 0.1369-0.1369 0.0628-0.0628 0.1280-0.1280 0.0660-0.0660 0.1253-0.1253 0.0629-0.0629 0.1281-0.1281 0.0662-0.0662 0.1253-0.1253
β4\beta_{4}: yearsmarried 0.1281-0.1281 0.0745-0.0745 0.1327-0.1327 0.0979-0.0979 0.1343-0.1343 0.1103-0.1103 0.1327-0.1327 0.0979-0.0979 0.1346-0.1346 0.1103-0.1103
β5\beta_{5}: children 0.2730-0.2730 0.3264-0.3264 0.2863-0.2863 0.3091-0.3091 0.2956-0.2956 0.3049-0.3049 0.2864-0.2864 0.3091-0.3091 0.2965-0.2965 0.3050-0.3050
β6\beta_{6}: religiousness 0.04510.0451 0.03130.0313 0.03480.0348 0.02310.0231 0.03040.0304 0.02010.0201 0.03480.0348 0.02310.0231 0.03040.0304 0.02010.0201
β7β12\beta_{7}\sim\beta_{12}: 0.74290.7429 1.51771.5177 0.87030.8703 1.53931.5393 1.00831.0083 1.64421.6442 0.87110.8711 1.54121.5412 1.00651.0065 1.64371.6437
1.11351.1135 1.98911.9891 1.24161.2416 1.97851.9785 1.38091.3809 2.06362.0636 1.24251.2425 1.98031.9803 1.38001.3800 2.06322.0632
education 1.30701.3070 2.18212.1821 1.42361.4236 2.15662.1566 1.55251.5525 2.23022.2302 1.42461.4246 2.15852.1585 1.55191.5519 2.22972.2297
1.15341.1534 2.16172.1617 1.34181.3418 2.18202.1820 1.51451.5145 2.28002.2800 1.34281.3428 2.18392.1839 1.51441.5144 2.27962.2796
1.09741.0974 1.98411.9841 1.25501.2550 1.98701.9870 1.40661.4066 2.07892.0789 1.25591.2559 1.98891.9889 1.40581.4058 2.07852.0785
1.37571.3757 2.32202.3220 1.51581.5158 2.30902.3090 1.66261.6626 2.39352.3935 1.51681.5168 2.31092.3109 1.66251.6625 2.39302.3930
β13β18\beta_{13}\sim\beta_{18}: 0.35890.3589 0.48290.4829 0.28060.2806 0.32120.3212 0.24050.2405 0.24260.2426 0.28080.2808 0.32150.3215 0.23980.2398 0.24250.2425
0.2767-0.2767 0.3298-0.3298 0.2390-0.2390 0.3097-0.3097 0.2198-0.2198 0.2969-0.2969 0.2390-0.2390 0.3096-0.3096 0.2202-0.2202 0.2970-0.2970
occupation 0.05590.0559 0.04750.0475 0.05910.0591 0.03930.0393 0.05900.0590 0.03940.0394 0.05920.0592 0.03940.0394 0.05890.0589 0.03930.0393
0.06210.0621 0.02870.0287 0.03520.0352 0.00260.0026 0.01700.0170 0.0079-0.0079 0.03520.0352 0.00270.0027 0.01680.0168 0.0079-0.0079
0.0187-0.0187 0.0591-0.0591 0.0472-0.0472 0.0839-0.0839 0.0683-0.0683 0.0967-0.0967 0.0472-0.0472 0.0838-0.0838 0.0687-0.0687 0.0967-0.0967
0.7293-0.7293 0.8639-0.8639 0.7464-0.7464 0.8580-0.8580 0.7664-0.7664 0.8550-0.8550 0.7466-0.7466 0.8578-0.8578 0.7686-0.7686 0.8551-0.8551
Distance 0.27290.2729 - 0.19480.1948 0.19160.1916 0.12920.1292 0.16320.1632 0.19430.1943 0.19200.1920 0.12940.1294 0.16330.1633
Cutoff
δ1\delta_{1} 1.2631-1.2631 1.5275-1.5275 1.2035-1.2035 1.2956-1.2956 1.1385-1.1385 1.1313-1.1313 1.2034-1.2034 1.2937-1.2937 1.1453-1.1453 1.1322-1.1322
δ2\delta_{2} 0.2932-0.2932 0.40430.4043 0.1876-0.1876 0.40330.4033 0.0719-0.0719 0.48740.4874 0.1871-0.1871 0.40530.4053 0.0762-0.0762 0.48670.4867
δ3\delta_{3} 0.32260.3226 1.13511.1351 0.41950.4195 1.10321.1032 0.53270.5327 1.17001.1700 0.42010.4201 1.10521.1052 0.52860.5286 1.16931.1693
δ4\delta_{4} 1.24361.2436 2.16332.1633 1.35081.3508 2.12692.1269 1.47451.4745 2.19482.1948 1.35151.3515 2.12882.1288 1.47181.4718 2.19422.1942
Distance 0.51560.5156 - 0.40690.4069 0.35680.3568 0.30390.3039 0.30950.3095 0.40620.4062 0.35820.3582 0.30570.3057 0.31230.3123
ML DP-div.: α=0.3\alpha=0.3 DP-div.: α=0.5\alpha=0.5 γ\gamma-div.: γ=0.3\gamma=0.3 γ\gamma-div.: γ=0.5\gamma=0.5
logit original modified original modified original modified original modified original modified
Coefficient
β1\beta_{1}: affairs 0.4181-0.4181 0.6741-0.6741 0.4383-0.4383 0.6423-0.6423 0.4521-0.4521 0.6271-0.6271 0.4384-0.4384 0.6421-0.6421 0.4530-0.4530 0.6269-0.6269
β2\beta_{2}: gender 0.1424-0.1424 0.1107-0.1107 0.1623-0.1623 0.1341-0.1341 0.1684-0.1684 0.1478-0.1478 0.1623-0.1623 0.1340-0.1340 0.1688-0.1688 0.1477-0.1477
β3\beta_{3}: age 0.1077-0.1077 0.2182-0.2182 0.1064-0.1064 0.2051-0.2051 0.1125-0.1125 0.2002-0.2002 0.1064-0.1064 0.2051-0.2051 0.1127-0.1127 0.2001-0.2001
β4\beta_{4}: yearsmarried 0.2196-0.2196 0.1568-0.1568 0.2281-0.2281 0.1885-0.1885 0.2318-0.2318 0.2042-0.2042 0.2282-0.2282 0.1884-0.1884 0.2323-0.2323 0.2041-0.2041
β5\beta_{5}: children 0.4385-0.4385 0.4861-0.4861 0.4513-0.4513 0.4668-0.4668 0.4604-0.4604 0.4651-0.4651 0.4514-0.4514 0.4667-0.4667 0.4611-0.4611 0.4651-0.4651
β6\beta_{6}: religiousness 0.05580.0558 0.03390.0339 0.04510.0451 0.02850.0285 0.04130.0413 0.02800.0280 0.04510.0451 0.02850.0285 0.04140.0414 0.02800.0280
β7β12\beta_{7}\sim\beta_{12}: 1.46111.4611 2.92142.9214 1.70351.7035 3.18483.1848 1.94501.9450 3.46663.4666 1.70711.7071 3.18403.1840 1.95201.9520 3.47513.4751
2.07522.0752 3.68353.6835 2.30102.3010 3.86983.8698 2.52942.5294 4.10644.1064 2.30482.3048 3.86893.8689 2.53732.5373 4.11484.1148
education 2.42132.4213 4.01964.0196 2.61702.6170 4.17454.1745 2.82372.8237 4.39094.3909 2.62102.6210 4.17354.1735 2.83202.8320 4.39944.3994
2.23952.2395 4.03034.0303 2.54792.5479 4.25134.2513 2.81192.8119 4.49934.4993 2.55202.5520 4.25034.2503 2.82092.8209 4.50784.5078
2.12842.1284 3.73423.7342 2.38742.3874 3.93523.9352 2.62672.6267 4.17954.1795 2.39142.3914 3.93433.9343 2.63492.6349 4.18814.1881
2.58472.5847 4.27914.2791 2.81772.8177 4.45054.4505 3.04793.0479 4.68044.6804 2.82172.8217 4.44944.4494 3.05683.0568 4.68884.6888
β13β18\beta_{13}\sim\beta_{18}: 0.54750.5475 0.66730.6673 0.41280.4128 0.41970.4197 0.34130.3413 0.31890.3189 0.41320.4132 0.41960.4196 0.34130.3413 0.31950.3195
0.4186-0.4186 0.5234-0.5234 0.3535-0.3535 0.4802-0.4802 0.3242-0.3242 0.4542-0.4542 0.3535-0.3535 0.4802-0.4802 0.3242-0.3242 0.4539-0.4539
occupation 0.10330.1033 0.08050.0805 0.09990.0999 0.07420.0742 0.09990.0999 0.07850.0785 0.10000.1000 0.07410.0741 0.10000.1000 0.07870.0787
0.07590.0759 0.01870.0187 0.03510.0351 0.0086-0.0086 0.01260.0126 0.0179-0.0179 0.03510.0351 0.0087-0.0087 0.01250.0125 0.0178-0.0178
0.0658-0.0658 0.1195-0.1195 0.1104-0.1104 0.1501-0.1501 0.1367-0.1367 0.1666-0.1666 0.1104-0.1104 0.1502-0.1502 0.1371-0.1371 0.1665-0.1665
1.2318-1.2318 1.4133-1.4133 1.2430-1.2430 1.3711-1.3711 1.2524-1.2524 1.3457-1.3457 1.2433-1.2433 1.3709-1.3709 1.2546-1.2546 1.3454-1.3454
Distance 0.89340.8934 - 0.65060.6506 0.84090.8409 0.45470.4547 0.84720.8472 0.64690.6469 0.83570.8357 0.44840.4484 0.84740.8474
Cutoff
δ1\delta_{1} 2.2134-2.2134 2.6098-2.6098 2.0120-2.0120 1.8811-1.8811 1.8218-1.8218 1.4920-1.4920 2.0093-2.0093 1.8808-1.8808 1.8210-1.8210 1.4854-1.4854
δ2\delta_{2} 0.2845-0.2845 1.00771.0077 0.0700-0.0700 1.24091.2409 0.14280.1428 1.49461.4946 0.0667-0.0667 1.24031.2403 0.14720.1472 1.50341.5034
δ3\delta_{3} 0.79320.7932 2.29412.2941 0.97600.9760 2.45282.4528 1.17111.1711 2.66502.6650 0.97950.9795 2.45222.4522 1.17621.1762 2.67392.6739
δ4\delta_{4} 2.31552.3155 3.99633.9963 2.51172.5117 4.14014.1401 2.71942.7194 4.34994.3499 2.51542.5154 4.13934.1393 2.72592.7259 4.35864.3586
Distance 1.72621.7262 - 1.36501.3650 1.64211.6421 1.06521.0652 1.70661.7066 1.35901.3590 1.63271.6327 1.05661.0566 1.71511.7151

7 Conclusion and remarks

This study examined the problem of outliers in ordinal response model, which is a regression on ordered categorical data as the response variable. Scalera et al., (2021) derived the condition for the link functions to make the influence function in the maximum likelihood method bounded in the ordinal response model, but the probit link, logit link, complementary log-log link, and log-log link, which are commonly used, do not satisfy the condition. Thus, when outliers exist in the data, the maximum likelihood method imposes restrictions on the link function, and that induces misspecification of the link function under such situations. In our numerical experiments, we also confirmed that the misspecification of the link function induces a substantial bias in the estimation of the parameters.

Scalera et al., (2021) mentions the boundedness of the influence function in the maximum likelihood method for the ordinal response model, and we further show that the influence function does not satisfy redescendence (Theorem 3 in Section 4). This means that the maximum likelihood method in the ordinal response model cannot completely eliminate the influence of large outliers, and in fact, in our numerical experiments, as the outlier ratio increases, the accuracy of the prediction becomes worse even if the cauchit link which has the bounded influence function is used.

To solve these problems and to realize robust inference in the ordinal response model against outliers, we proposed inference methods in the ordinal response model using the DP and γ\gamma-divergences. We also derived the influence functions for the methods with the DP and γ\gamma-divergences, and derived the conditions for the link functions to satisfy boundedness and redescendence (Theorems 4 and 5 in Section 4). The commonly used link functions satisfy the conditions, and our method allows practitioners using ordinal response models to perform robust against outliers and flexible analysis.

Numerical experiments were conducted to evaluate the performance of our proposed methods with the DP and γ\gamma-divergences using bias, mean square error (MSE), and percentage of correctly classified responses (CCR). In the absence of outliers, both methods perform as well as the maximum likelihood method, and in the presence of outliers, the results are almost unaffected by outliers. As for the difference between the methods using the DP and γ\gamma-divergences, the bias value is slightly smaller when the DP divergence is used, but the MSE value is moderately larger than that using the γ\gamma-divergence. In numerical experiments with two real data sets, the Boston housing and Affairs data, we also confirmed that our proposed methods with two robust divergences give the robust inference results that remove the influence of data that appear to be outliers.

Acknowledgement

We thank Prof. Shounosuke Sugasawa and Prof. Kaoru Irie of the University of Tokyo for their very helpful comments. This work was JSPS Grant-in-Aid for Early-Career Scientists Grant Number JP19K14597 and JSPS Grant-in-Aid for Scientific Research (B) Number 21H00699.

Appendix

A.1 Proof of Theorem 3

First, consider the influence function of βk\beta_{k} (10) for k=1,2,,pk=1,2,\ldots,p. From appendix of Scalera et al., (2021), for the influence function of βk\beta_{k} to have redescendence, the order of ug(u)/g(u)ug^{\prime}(u)/g(u) must be smaller than 1, where g()g^{\prime}(\cdot) is the first derivative of g()g(\cdot). This means that when the order of the tail of the function g(u)g(u) is equal to (logu)1(\log u)^{-1}, the influence function of βk\beta_{k} has redescendence, however, no such distributions exist since distributions with order of tail more slowly than u1u^{-1} are improper.

Next, consider the influence function of δl\delta_{l} (11) for l=1,2,,M1l=1,2,\ldots,M-1. Since from Lagrange’s theorem G(δl𝒙i𝜷)G(δl1𝒙i𝜷)=(δlδl1)g(δ𝒙i𝜷)G(\delta_{l}-\bm{x}_{i}^{\top}\bm{\beta})-G(\delta_{l-1}-\bm{x}_{i}^{\top}\bm{\beta})=(\delta_{l}-\delta_{l-1})g(\delta^{*}-\bm{x}_{i}^{\top}\bm{\beta}) for δ\delta^{*} such that δl1<δ<δl\delta_{l-1}<\delta^{*}<\delta_{l} and δlδl1>0\delta_{l}-\delta_{l-1}>0, for the influence function of δl\delta_{l} to have redescendence, the order of g(δl𝒙i𝜷)/g(δ𝒙i𝜷)g(\delta_{l}-\bm{x}_{i}^{\top}\bm{\beta})/g(\delta^{*}-\bm{x}_{i}^{\top}\bm{\beta}) must be o(1)o(1). However, such a distribution GG is improper, and such a distribution does not exist.

A.2 Proof of Theorem 4

First, consider the influence function (14) of βk\beta_{k}. It is clear that limu±g(u)u=0\lim_{u\to\pm\infty}g(u)u=0 holds when the condition (16) holds. If here u=δ𝒙o𝜷u=\delta-\bm{x}_{o}^{\top}\bm{\beta} and note that uu goes to infinity of the same order as xokx_{ok}, since G(δm𝒙o𝜷)G(δm1𝒙o𝜷)G(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{o}^{\top}\bm{\beta}) is clearly bounded, the second term of equation (14) becomes zero for u±u\to\pm\infty.

Since from Lagrange’s theorem,

G(δm𝒙o𝜷)G(δm1𝒙o𝜷)=(δmδm1)g(δ𝒙o𝜷)G(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{o}^{\top}\bm{\beta})=(\delta_{m}-\delta_{m-1})g(\delta^{*}-\bm{x}_{o}^{\top}\bm{\beta})

for δ\delta^{*} such that δm1<δ<δm\delta_{m-1}<\delta^{*}<\delta_{m}, then

g(δm𝒙o𝜷)[G(δm𝒙o𝜷)G(δm1𝒙o𝜷)]α1xok\displaystyle g(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})\left[G(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})-G(\delta_{m-1}-\bm{x}_{o}^{\top}\bm{\beta})\right]^{\alpha-1}x_{ok}
=g(δm𝒙o𝜷)g(δ𝒙o𝜷)α1xok(δmδm1)1α\displaystyle=g(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})g(\delta^{*}-\bm{x}_{o}^{\top}\bm{\beta})^{\alpha-1}\frac{x_{ok}}{(\delta_{m}-\delta_{m-1})^{1-\alpha}} (A.1)

and g(δm𝒙o𝜷)g(δ𝒙o𝜷)α1g(\delta_{m}-\bm{x}_{o}^{\top}\bm{\beta})g(\delta^{*}-\bm{x}_{o}^{\top}\bm{\beta})^{\alpha-1} is of the same order as g(u)αg(u)^{\alpha} for u±u\to\pm\infty. Thus, if the condition (16) holds, i.e., if there exists a certain 0<α10<\alpha\leq 1 such that limu±g(u)αu=0\lim_{u\to\pm\infty}g(u)^{\alpha}u=0, then for u±u\to\pm\infty the equation (A.1) becomes

limu±g(u)αu(δmδm1)1α=0.\displaystyle\lim_{u\to\pm\infty}g(u)^{\alpha}\frac{u}{(\delta_{m}-\delta_{m-1})^{1-\alpha}}=0.

Therefore, if the condition (16) holds, the first term of equation (14) is zero for u±u\to\pm\infty and then the influence function of βk\beta_{k} using the DP divergence is also zero.

For the influence function (15) of δl\delta_{l}, it becomes zero for u±u\to\pm\infty in the similar manner. Namely, the influence functions (14) and (15) in the ordinal response model with the DP divergence are redescendent.

Moreover, the influence functions (14) and (15) are bounded since they are clearly continuous and their values are zero for u±u\to\pm\infty.

References

  • Agresti, (2010) Agresti, A. (2010). Analysis of Ordinal Categorical Data, volume 656. John Wiley & Sons.
  • Agresti and Kateri, (2017) Agresti, A. and Kateri, M. (2017). Ordinal probability effect measures for group comparisons in multinomial cumulative link models. Biometrics, 73(1), 214–219.
  • Albert and Chib, (1993) Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669–679.
  • Ashby et al., (1989) Ashby, D., West, C. R., and Ames, D. (1989). The ordered logistic regression model in psychiatry: rising prevalence of dementia in old people’s homes. Statistics in Medicine, 8(11), 1317–1326.
  • Baetschmann et al., (2020) Baetschmann, G., Ballantyne, A., Staub, K. E., and Winkelmann, R. (2020). feologit: A new command for fitting fixed-effects ordered logit models. The Stata Journal, 20(2), 253–275.
  • Basak et al., (2021) Basak, S., Basu, A., and Jones, M. (2021). On the ’optimal’ density power divergence tuning parameter. Journal of Applied Statistics, 48(3), 536–556.
  • Basu et al., (1998) Basu, A., Harris, I. R., Hjort, N. L., and Jones, M. (1998). Robust and efficient estimation by minimising a density power divergence. Biometrika, 85(3), 549–559.
  • Breiger, (1981) Breiger, R. L. (1981). The social class structure of occupational mobility. American Journal of Sociology, 87(3), 578–611.
  • Castilla et al., (2021) Castilla, E., Jaenada, M., and Pardo, L. (2021). Estimation and testing on independent not identically distributed observations based on Rényi’s pseudodistances. arXiv preprint arXiv:2102.12282, .
  • Christensen, (2018) Christensen, R. H. B. (2018). Cumulative link models for ordinal regression with the r package ordinal. Submitted in Journal of Statistical Software, 35.
  • Croux et al., (2013) Croux, C., Haesbroeck, G., and Ruwet, C. (2013). Robust estimation for ordinal regression. Journal of Statistical Planning and Inference, 143(9), 1486–1499.
  • Czado and Santner, (1992) Czado, C. and Santner, T. J. (1992). The effect of link misspecification on binary regression inference. Journal of Statistical Planning and Inference, 33(2), 213–231.
  • Dawid and Musio, (2015) Dawid, A. P. and Musio, M. (2015). Bayesian model selection based on proper scoring rules. Bayesian Analysis, 10(2), 479–499.
  • Desgagné, (2015) Desgagné, A. (2015). Robustness to outliers in location–scale parameter model using log-regularly varying distributions. The Annals of Statistics, 43(4), 1568–1595.
  • Fair, (1978) Fair, R. C. (1978). A theory of extramarital affairs. Journal of Political Economy, 86(1), 45–61.
  • Franses and Paap, (2001) Franses, P. H. and Paap, R. (2001). Quantitative Models in Marketing Research. Cambridge University Press.
  • Fujisawa and Eguchi, (2008) Fujisawa, H. and Eguchi, S. (2008). Robust parameter estimation with a small bias against heavy contamination. Journal of Multivariate Analysis, 99(9), 2053–2081.
  • Gagnon et al., (2020) Gagnon, P., Desgagné, A., and Bédard, M. (2020). A new bayesian approach to robustness against outliers in linear regression. Bayesian Analysis, 15(2), 389–414.
  • Ghosh and Basu, (2013) Ghosh, A. and Basu, A. (2013). Robust estimation for independent non-homogeneous observations using density power divergence with applications to linear regression. Electronic Journal of Statistics, 7, 2420–2456.
  • Hampel, (1974) Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69(346), 383–393.
  • Hampel et al., (1986) Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986). Robust Statistics. Wiley Online Library.
  • Hamura et al., (2022) Hamura, Y., Irie, K., and Sugasawa, S. (2022). Log-regularly varying scale mixture of normals for robust regression. Computational Statistics and Data Analysis, 173, 107517.
  • Harrison and Rubinfeld, (1978) Harrison, J. D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1), 81–102.
  • Hashimoto and Sugasawa, (2020) Hashimoto, S. and Sugasawa, S. (2020). Robust bayesian regression with synthetic posterior distributions. Entropy, 22(6), 661.
  • Huber and Ronchetti, (2009) Huber, J. and Ronchetti, E. M. (2009). Robust Statistics, Second Edition. Wiley.
  • Iannario et al., (2017) Iannario, M., Monti, A. C., Piccolo, D., and Ronchetti, E. (2017). Robust inference for ordinal response models. Electronic Journal of Statistics, 11(2), 3407–3445.
  • Jones et al., (2001) Jones, M., Hjort, N. L., Harris, I. R., and Basu, A. (2001). A comparison of related density-based minimum divergence estimators. Biometrika, 88(3), 865–873.
  • Kawakami and Hashimoto, (2022) Kawakami, J. and Hashimoto, S. (2022). Approximate gibbs sampler for bayesian huberized lasso. arXiv preprint arXiv:2204.00237, .
  • Kawashima and Fujisawa, (2017) Kawashima, T. and Fujisawa, H. (2017). Robust and sparse regression via γ\gamma-divergence. Entropy, 19(11), 608.
  • Maronna et al., (2019) Maronna, R. A., Martin, R. D., Yohai, V. J., and Salibián-Barrera, M. (2019). Robust Statistics: Theory and Methods (with R). John Wiley & Sons.
  • McCullagh, (1980) McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society: Series B (Methodological), 42(2), 109–127.
  • O’Hagan and Pericchi, (2012) O’Hagan, A. and Pericchi, L. (2012). Bayesian heavy-tailed models and conflict resolution: A review. Brazilian Journal of Probability and Statistics, 26(4), 372–401.
  • Pyne et al., (2022) Pyne, A., Roy, S., Ghosh, A., and Basu, A. (2022). Robust and efficient estimation in ordinal response models using the density power divergence. arXiv preprint arXiv:2208.14011, .
  • Riani et al., (2011) Riani, M., Torti, F., and Zani, S. (2011). Outliers and robustness for ordinal data. Modern Analysis of Customer Surveys: with applications using R, , 155–169.
  • Scalera et al., (2021) Scalera, V., Iannario, M., and Monti, A. C. (2021). Robust link functions. Statistics, 55(4), 963–977.
  • Shao et al., (2019) Shao, S., Jacob, P. E., Ding, J., and Tarokh, V. (2019). Bayesian model comparison with the hyvärinen score: Computation and consistency. Journal of the American Statistical Association, 114, 1826–1837.
  • Sugasawa and Yonekura, (2021) Sugasawa, S. and Yonekura, S. (2021). On selection criteria for the tuning parameter in robust divergence. Entropy, 23(9), 1147.
  • Tomizawa et al., (2006) Tomizawa, S., Miyamoto, N., and Ouchi, M. (2006). Decompositions of symmetry model into marginal homogeneity and distance subsymmetry in square contingency tables with ordered categories. REVSTAT–Statistical Journal, 4(2), 153–161.
  • Uebersax, (1999) Uebersax, J. S. (1999). Probit latent class analysis with dichotomous or ordered category measures: Conditional independence/dependence models. Applied Psychological Measurement, 23(4), 283–297.
  • Warwick and Jones, (2005) Warwick, J. and Jones, M. (2005). Choosing a robustness tuning parameter. Journal of Statistical Computation and Simulation, 75(7), 581–588.