
Robust estimation in beta regression via maximum Lq-likelihood

Terezinha K. A. Ribeiro
Department of Statistics, University of São Paulo, Brazil
Silvia L. P. Ferrari (corresponding author: silviaferrari@usp.br)
Department of Statistics, University of São Paulo, Brazil
Abstract

Beta regression models are widely used for modeling continuous data limited to the unit interval, such as proportions, fractions, and rates. Inference on the parameters of beta regression models is commonly based on maximum likelihood estimation, which, however, is known to be sensitive to discrepant observations. In some cases, one atypical data point can lead to severe bias and erroneous conclusions about the features of interest. In this work, we develop a robust estimation procedure for beta regression models based on the maximization of a reparameterized Lq-likelihood. The new estimator offers a trade-off between robustness and efficiency through a tuning constant. To select the optimal value of the tuning constant, we propose a data-driven method that ensures full efficiency in the absence of outliers. We also improve on an alternative robust estimator by applying our data-driven method to select its optimum tuning constant. Monte Carlo simulations suggest marked robustness of the two robust estimators with little loss of efficiency. Applications to three datasets are presented and discussed. As a by-product of the proposed methodology, residual diagnostic plots based on robust fits highlight outliers that would be masked under maximum likelihood estimation.

Keywords: Beta regression; Bounded influence function; Lq-likelihood; Outliers; Residuals; Robustness.

1 Introduction

Random phenomena that produce data in the unit interval $(0,1)$ are common in many research areas, including medicine, finance, economics, environment, hydrology, and psychology. Data of this type are usually rates, proportions, percentages, or fractions. A few examples are unemployment rates, the proportion of exports in total sales, the fraction of surface covered by vegetation, Gini's index, the loss given default, health-related quality of life scores, and neonatal mortality rates. Beta regression models (Ferrari and Cribari-Neto, 2004; Smithson and Verkuilen, 2006) are widely employed for modeling continuous data bounded within the unit interval. The response variable is assumed to follow a beta distribution indexed by the mean and a precision parameter, which in turn are modeled through a variety of regression structures.

Let the response variable $y_i$ have a beta distribution with mean $0<\mu_i<1$ and precision parameter $\phi_i>0$. The probability density function of $y_i$ is

$f(y_i;\mu_i,\phi_i)=\frac{1}{B(\mu_i\phi_i,(1-\mu_i)\phi_i)}\,y_i^{\mu_i\phi_i-1}(1-y_i)^{(1-\mu_i)\phi_i-1},\quad 0<y_i<1,$   (1)

where $B(\cdot,\cdot)$ is the beta function. We write $y_i\sim{\rm Beta}(\mu_i,\phi_i)$, $i=1,\ldots,n$, and assume that $y_1,\ldots,y_n$ are independent. Since ${\rm Var}(y_i)=\mu_i(1-\mu_i)/(1+\phi_i)$, the parameter $\phi_i$ is taken as a precision parameter because it is inversely related to the dispersion of $y_i$ for fixed mean $\mu_i$. The beta density (1) assumes different forms depending on the parameters. It has a single mode in $(0,1)$ if $\mu_i\phi_i>1$ and $(1-\mu_i)\phi_i>1$, and its possible shapes include J, inverted-J, tilde, inverted-tilde, bathtub, and uniform. When $\mu_i\phi_i<1$ or $(1-\mu_i)\phi_i<1$, the beta density is unbounded at one or both boundaries. We will assume that the regression structures for $\mu_i$ and $\phi_i$ are, respectively,

$g_\mu(\mu_i)=\bm{X}_i^\top\bm{\beta},\quad{\rm (mean\ submodel)}$   (2)
$g_\phi(\phi_i)=\bm{Z}_i^\top\bm{\gamma},\quad{\rm (precision\ submodel)}$   (3)

where $\bm{\beta}=(\beta_1,\ldots,\beta_{p_1})^\top\in\mathbb{R}^{p_1}$ and $\bm{\gamma}=(\gamma_1,\ldots,\gamma_{p_2})^\top\in\mathbb{R}^{p_2}$ are vectors of unknown regression parameters, and $\bm{X}_i=(x_{i1},\ldots,x_{ip_1})^\top$ and $\bm{Z}_i=(z_{i1},\ldots,z_{ip_2})^\top$ are vectors of covariates of lengths $p_1$ and $p_2$ ($p_1+p_2=p<n$), respectively, assumed fixed and known. In addition, $g_\mu(\cdot):(0,1)\rightarrow\mathbb{R}$ and $g_\phi(\cdot):(0,\infty)\rightarrow\mathbb{R}$ are strictly monotonic and twice differentiable link functions. The model introduced by Ferrari and Cribari-Neto (2004) is a constant precision (or, equivalently, constant dispersion) beta regression model in the sense that $\phi_i=\phi$ for $i=1,\ldots,n$.
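For concreteness, the mean-precision parameterization in (1) maps directly onto the usual shape parameterization of the beta distribution, with shape parameters $\mu_i\phi_i$ and $(1-\mu_i)\phi_i$. A minimal R sketch (the function name dbeta_mu_phi is ours, not from the paper's released code):

```r
# Beta density (1) in the mean-precision parameterization:
# y ~ Beta(mu, phi) corresponds to shape1 = mu*phi, shape2 = (1-mu)*phi.
dbeta_mu_phi <- function(y, mu, phi) {
  dbeta(y, shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

# Example with mu = 0.5 and phi = 90 (the values used in Figure 2 below):
dbeta_mu_phi(0.5, mu = 0.5, phi = 90)
0.5 * (1 - 0.5) / (1 + 90)   # Var(y) = mu*(1 - mu)/(1 + phi)
```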

Inference for the beta regression model (1)-(3) is commonly based on the maximum likelihood estimator (MLE). However, it may be highly influenced by the presence of outlying observations in the data.

As an illustration of the sensitivity of the MLE to outliers, we consider a dataset on the risk management practices of 73 firms introduced and analyzed by Schmit and Roth (1990); see also Gómez-Déniz et al. (2014). The response variable is Firm cost, defined as premiums plus uninsured losses as a percentage of total assets. Smaller values of Firm cost correspond to firms with good risk management performance. We take only two covariates, namely Ind cost, a measure of the firm's industry risk, and Size log, the logarithm of its total assets. Figure 1 displays the scatter plot of Firm cost versus Ind cost along with the fitted lines based on the MLE for a beta regression with varying precision, for the full data and for the data without the most eye-catching outliers, labeled in the plots as observations 15, 16, and 72. (The scatter plots were produced by setting the remaining covariate at its sample median value.) These observations have disproportionate effects on the fitted regression curves.

Figure 1: Scatter plot of Firm cost versus Ind cost along with the fitted lines based on the MLE for the full data and the data without outliers.

To deal with atypical observations, models based on distributions that are more flexible than the beta distribution may be employed. For instance, Bayes et al. (2012), Migliorati et al. (2018), and Di Brisco et al. (2020) proposed different kinds of mixtures of beta distributions. However, more flexibility comes at the cost of model complexity and additional parameters, making model specification, inference, and interpretation more difficult. Our approach keeps the model simple, i.e., the beta distribution remains the postulated model, but the inference procedure is replaced by a robust method.

Ghosh (2019) proposed a robust estimator for beta regression models based on the minimization of an empirical density power divergence introduced by Basu et al. (1998). The resulting estimator, called the minimum density power divergence estimator (MDPDE), is closely related to the approach we propose here. The main difference between the two estimation procedures is that they are based on two different families of divergences, each of which has the Kullback-Leibler divergence as a special case. In other words, both have the MLE as a special case. Ghosh (2019) presented some simulations of the MDPDE, but only for constant precision beta regression models with fixed values of the tuning constant. Here, we propose a data-driven method to select the tuning constant of the MDPDE that ensures full efficiency in the absence of contamination. Additionally, we evaluate the performance of the MDPDE under constant and varying precision using the proposed selection algorithm.

The chief goal of this paper is to propose a new robust estimator under the beta regression model (1)-(3), together with a data-driven scheme for choosing its optimum tuning constant. In Section 2, we introduce a robust estimator based on the maximization of a reparameterized Lq-likelihood and derive some of its properties. In Section 3, we introduce a data-driven method for the selection of the tuning constant. In Section 4, we provide Monte Carlo evidence on the proposed robust inference approach. Section 5 contains applications of the robust estimation to three datasets. Finally, Section 6 closes the paper with some concluding remarks. Some technical details are left for an appendix. The on-line Supplementary Material contains further technical information and additional numerical results. The codes for the simulations and applications and the datasets are available at https://github.com/terezinharibeiro/RobustBetaRegression.

2 Robust inference

Let $y_1,\ldots,y_n$ be $n$ independent observations, where $y_i$ comes from a postulated model indexed by a common unknown parameter $\bm{\theta}\in\Theta\subset\mathbb{R}^p$. For the beta regression model, the postulated density for $y_i$ is given by (1)-(3), and we will denote it, from now on, by $f_{\bm\theta}(\cdot;\mu_i,\phi_i)$. Let $\ell_q(\bm\theta)$ denote the Lq-likelihood (Ferrari and Yang, 2010),

$\ell_q(\bm\theta)=\sum_{i=1}^n L_q(f_{\bm\theta}(y_i;\mu_i,\phi_i)),$   (4)

where $0<q\leq 1$ is a tuning constant, and

$L_q(u)=\begin{cases}(u^{1-q}-1)/(1-q), & \text{if } q\neq 1,\\ \log(u), & \text{if } q=1,\end{cases}$

is the Box-Cox transformation (Box and Cox, 1964), also called the distorted logarithm. The population version of $-\ell_q(\bm\theta)/n$ with respect to the density $g(\cdot)$ is

$H_q(f_{\bm\theta},g)=-\int L_q\left(f_{\bm\theta}(y;\mu,\phi)\right)g(y)\,dy,$

which is the $q$-entropy of $f_{\bm\theta}$ with respect to $g$ (Tsallis, 1988) and reduces to the Shannon entropy when $q=1$. For $q=1$, $\ell_q(\bm\theta)$ is the usual log-likelihood function, and maximizing it yields the maximum likelihood estimator (MLE).
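In code, the transformation and the resulting Lq-likelihood are straightforward; the sketch below is ours and assumes only the definitions above.

```r
# Box-Cox transformation / distorted logarithm L_q(u); L_1(u) = log(u).
Lq <- function(u, q) {
  if (q == 1) log(u) else (u^(1 - q) - 1) / (1 - q)
}

# The Lq-likelihood (4) of a sample is sum(Lq(f(y_i), q));
# for q near 1, L_q is close to the ordinary logarithm:
u <- c(0.5, 1, 2)
Lq(u, q = 1)      # log(u)
Lq(u, q = 0.98)   # approximately log(u)
```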

The estimator obtained by maximizing (4) will be denoted by $\widetilde{\bm\theta}_q$. It solves the estimating equation

$\sum_{i=1}^n U(y_i,\bm\theta)\,f_{\bm\theta}(y_i;\mu_i,\phi_i)^{1-q}=\bm{0},$   (5)

where $U(y_i,\bm\theta)=\nabla_{\bm\theta}\log[f_{\bm\theta}(y_i;\mu_i,\phi_i)]$, with $\nabla_{\bm\theta}$ denoting the gradient with respect to $\bm\theta$. The score vector for $\bm\theta$ corresponding to the $i$-th observation is given by

$U(y_i;\bm\theta)=\left(\phi_i\,\frac{y_i^\star-\mu_i^\star}{g_\mu'(\mu_i)}\,\bm{X}_i^\top,\ \ \frac{\mu_i(y_i^\star-\mu_i^\star)+y_i^\dagger-\mu_i^\dagger}{g_\phi'(\phi_i)}\,\bm{Z}_i^\top\right)^\top,$   (7)

where $y_i^\star=\log(y_i/(1-y_i))$, $y_i^\dagger=\log(1-y_i)$, $\mu_i^\star={\rm E}(y_i^\star)=\psi(\mu_i\phi_i)-\psi((1-\mu_i)\phi_i)$, and $\mu_i^\dagger={\rm E}(y_i^\dagger)=\psi((1-\mu_i)\phi_i)-\psi(\phi_i)$, with $\psi(\cdot)$ denoting the digamma function.

The score vector of the $i$-th observation is weighted by $f_{\bm\theta}(y_i;\mu_i,\phi_i)^{1-q}$, which depends on the postulated model and on the tuning constant $q$. A choice of $q$ not too close to 1 produces a robust estimation procedure, because observations that are inconsistent with the assumed model receive smaller weights. Furthermore, (5) defines an M-estimation procedure, first introduced by Huber (1964). The class of M-estimators is widely used for obtaining robust estimators and has appealing properties, such as asymptotic normality (Huber, 1981).
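To see this downweighting mechanism at work, the following sketch (ours) computes the weights $f_{\bm\theta}(y_i;\mu_i,\phi_i)^{1-q}$ for a few points under an i.i.d. beta model:

```r
# Weights attached to the score contributions in (5): f(y_i; mu, phi)^(1-q).
weight_lq <- function(y, mu, phi, q) {
  dbeta(y, mu * phi, (1 - mu) * phi)^(1 - q)
}

y <- c(0.45, 0.50, 0.55, 0.95)   # the last point is atypical for this model
weight_lq(y, mu = 0.5, phi = 90, q = 0.9)
# the three central points receive similar weights; the outlier, whose
# postulated density is tiny, receives a weight near zero
```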

The estimating function $U(\bm{y};\bm\theta)=\sum_{i=1}^n U(y_i,\bm\theta)f_{\bm\theta}(y_i;\mu_i,\phi_i)^{1-q}$ is biased unless $q=1$. Hence, the estimator $\widetilde{\bm\theta}_q$ is not Fisher-consistent. Ferrari and La Vecchia (2012) showed that a suitable calibration function may be applied to rescale the parameter estimates and achieve Fisher-consistency if the family of postulated distributions is closed under the power transformation. Given a density $h$ and a constant $\alpha\in(0,\infty)$, the power transformation is defined as

$h^{(\alpha)}(y)=\frac{h(y)^\alpha}{\int h(y)^\alpha\,dy}\ \propto\ h(y)^\alpha,\quad \forall y \text{ in the support},$   (8)

provided that $\int h(y)^\alpha\,dy<\infty$ for $0<\alpha<\infty$. For a family of postulated densities $\{h_{\bm\theta}(\cdot),\bm\theta\in\Theta\}$ that is closed under (8) for all $0<\alpha<\infty$, let $\tau_\alpha(\bm\theta):\Theta\mapsto\Theta$ be an invertible continuous function satisfying $h_{\tau_\alpha(\bm\theta)}(y)=h_{\bm\theta}^{(\alpha)}(y)$ for all $y$ in the support, which is assumed not to depend on $\bm\theta$. Note that the function $\tau_\alpha(\cdot)$ maps each density $h_{\bm\theta}(\cdot)$ to a unique power-transformed density $h_{\tau_\alpha(\bm\theta)}(\cdot)$.

The $q$-entropy $H_q(f_{\bm\theta},g)$ is related to the following family of divergences of $f_{\bm\theta}$ with respect to $g$:

$D_q(f_{\bm\theta},g)=-q^{-1}\int L_q\!\left(\frac{f_{\bm\theta}(y;\mu,\phi)}{g(y)}\right)g(y)\,dy.$

Ferrari and La Vecchia (2012, Lemma 1) showed that $D_q(f_{\bm\theta},g^{(1/q)})\ \propto\ H_q(f_{\bm\theta},g)-H_q(g^{(1/q)},g)$. Consequently, the minimizer $\widetilde{\bm\theta}_q$ of the empirical $q$-entropy with respect to $\bm\theta$ equals the minimizer of the empirical version of $D_q(f_{\bm\theta},g^{(1/q)})$. In the minimization of $D_q(f_{\bm\theta},g^{(1/q)})$, the target density is the transformed density $g^{(1/q)}$ instead of $g$. Ferrari and La Vecchia (2012, Proposition 1) showed that $\widetilde{\bm\theta}_q$ is not Fisher-consistent for $\bm\theta$, but it is Fisher-consistent for $\tau_{1/q}(\bm\theta)$. Hence, $\widehat{\bm\theta}_q=\tau_q(\widetilde{\bm\theta}_q)$ is Fisher-consistent for $\bm\theta$. Alternatively, $\widehat{\bm\theta}_q$ is obtained by maximizing the Lq-likelihood in the parameterization $\tau_{1/q}(\bm\theta)$ (La Vecchia et al., 2015). Another proof of the Fisher-consistency of $\widehat{\bm\theta}_q$ is given in the Supplementary Material.

For the beta regression model (1)-(3), $\widehat{\bm\theta}_q$ is the maximizer of

$\ell^*_q(\bm\theta)=\sum_{i=1}^n L_q(f_{\tau_{1/q}(\bm\theta)}(y_i;\mu_i,\phi_i))=\sum_{i=1}^n L_q\!\left(f_{\bm\theta}^{(1/q)}(y_i;\mu_i,\phi_i)\right),$

where $f^{(1/q)}_{\bm\theta}(y_i;\mu_i,\phi_i)=f_{\bm\theta}(y_i;\mu_{i,q^{-1}},\phi_{i,q^{-1}})$, for $0<q<1$, with

$\mu_{i,q}=\phi_{i,q}^{-1}[q(\mu_i\phi_i-1)+1],\qquad \phi_{i,q}=q(\phi_i-2)+2,$   (9)

provided that $0<\mu_{i,q^{-1}}<1$ and $\phi_{i,q^{-1}}>0$, or equivalently, $\mu_i\phi_i>1-q$ and $(1-\mu_i)\phi_i>1-q$. Then, $f_{\bm\theta}(y_i;\mu_i,\phi_i)$ satisfies (8) for all $\alpha=1/q>0$ if $\mu_i\phi_i\geq 1$ and $(1-\mu_i)\phi_i\geq 1$, i.e., if $f_{\bm\theta}(y_i;\mu_i,\phi_i)$ is bounded. From (2) and (3), $f^{(1/q)}_{\bm\theta}(y_i;\mu_i,\phi_i)$ corresponds to a modified beta regression model defined by the beta density (1) with mean and precision submodels given respectively by

$g^*_\mu(\mu_i)=g_\mu(\mu_{i,q}),\qquad g^*_\phi(\phi_i)=g_\phi(\phi_{i,q}),$

and we denote it by $f^*_{\bm\theta}(y_i;\mu_i,\phi_i)$.
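A power-transformed beta density is again a beta density, so the calibration map is available in closed form via (9). A small R sketch (ours) of this map:

```r
# Calibration map (9): parameters of the power-transformed beta density
# h^(alpha). For the SMLE, it is evaluated at alpha = 1/q, giving the
# parameters mu_{i,1/q} and phi_{i,1/q}.
calibrate <- function(mu, phi, alpha) {
  phi_a <- alpha * (phi - 2) + 2
  mu_a  <- (alpha * (mu * phi - 1) + 1) / phi_a
  list(mu = mu_a, phi = phi_a)
}

calibrate(mu = 0.5, phi = 90, alpha = 1 / 0.9)
# $mu = 0.5 (symmetry preserved), $phi about 99.8: a slightly sharper density
```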

Therefore, we propose the estimator that is obtained through the maximization of

$\ell^*_q(\bm\theta)=\sum_{i=1}^n L_q(f^*_{\bm\theta}(y_i;\mu_i,\phi_i)),$   (10)

which leads to the following estimating equation

$\sum_{i=1}^n U^*(y_i,\bm\theta)\,f^*_{\bm\theta}(y_i;\mu_i,\phi_i)^{1-q}=\bm{0},$   (11)

where $U^*(y_i,\bm\theta)=\nabla_{\bm\theta}\log[f^*_{\bm\theta}(y_i;\mu_i,\phi_i)]$ is the modified score vector for $\bm\theta$ corresponding to the $i$-th observation, given by

$U^*(y_i;\bm\theta)=\left(q^{-1}\phi_{i,q}\,\frac{y_i^\star-\mu_i^\star}{g_\mu'(\mu_{i,q})}\,\bm{X}_i^\top,\ \ q^{-1}\,\frac{\mu_{i,q}(y_i^\star-\mu_i^\star)+y_i^\dagger-\mu_i^\dagger}{g_\phi'(\phi_{i,q})}\,\bm{Z}_i^\top\right)^\top.$   (13)

The final estimator, $\widehat{\bm\theta}_q$, will simply be called the surrogate maximum likelihood estimator (SMLE).
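As an illustration, the following sketch (ours, not the authors' released code) maximizes (10) numerically for a constant precision model with logit mean link and log precision link; it assumes the helper functions Lq and calibrate sketched above, a fixed $q$, and bounded fitted densities.

```r
smle_fit <- function(y, X, q, start) {
  p1 <- ncol(X)
  neg_lq <- function(theta) {
    mu  <- plogis(drop(X %*% theta[1:p1]))    # logit mean link
    phi <- exp(theta[p1 + 1])                 # log precision link
    par <- calibrate(mu, phi, alpha = 1 / q)  # reparameterization tau_{1/q}
    f   <- dbeta(y, par$mu * par$phi, (1 - par$mu) * par$phi)
    -sum(Lq(f, q))                            # minus the objective (10)
  }
  optim(start, neg_lq, method = "BFGS")
}

# Hypothetical usage with simulated data:
set.seed(1)
n <- 100; x <- runif(n)
mu <- plogis(-1.8 - 2.0 * x); phi <- 90
y <- rbeta(n, mu * phi, (1 - mu) * phi)
smle_fit(y, cbind(1, x), q = 0.9, start = c(0, 0, log(50)))$par
```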

From the theory of M-estimation, $\widehat{\bm\theta}_q\ \overset{a}{\sim}\ {\rm N}(\bm\theta,V_q(\bm\theta))$, where $\overset{a}{\sim}$ denotes the asymptotic distribution and

$V_q(\bm\theta)=J_q(\bm\theta)^{-1}K_q(\bm\theta)\{J_q(\bm\theta)^{-1}\}^\top,$   (14)

with

$J_q(\bm\theta)=\sum_{i=1}^n{\rm E}\left\{\nabla_{\bm\theta^\top}\left[U^*(y_i,\bm\theta)f^*_{\bm\theta}(y_i;\mu_i,\phi_i)^{1-q}\right]\right\},\qquad K_q(\bm\theta)=\sum_{i=1}^n{\rm E}\left\{U^*(y_i,\bm\theta)U^*(y_i,\bm\theta)^\top f^*_{\bm\theta}(y_i;\mu_i,\phi_i)^{2(1-q)}\right\};$

see Ferrari and La Vecchia (2012, Section 3.2). For the beta regression model (1)-(3), $J_q(\bm\theta)$ and $K_q(\bm\theta)$ are given in the Appendix. The matrix $V_q(\bm\theta)$ is well-defined provided that $0<\mu_{i,2-q}<1$ and $\phi_{i,2-q}>0$, or equivalently, $\mu_i\phi_i>2(1-q)/(2-q)$ and $(1-\mu_i)\phi_i>2(1-q)/(2-q)$. These conditions hold for all $0<q\leq 1$ if $\mu_i\phi_i\geq 1$ and $(1-\mu_i)\phi_i\geq 1$.
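In practice, $J_q$ and $K_q$ may be replaced by empirical counterparts evaluated at $\widehat{\bm\theta}_q$. A sketch using numerical differentiation (the function psi below is a hypothetical stand-in returning the $n\times p$ matrix whose $i$-th row is the $i$-th summand of (11)):

```r
library(numDeriv)  # provides jacobian() for numerical differentiation

sandwich_vcov <- function(psi, theta_hat) {
  K    <- crossprod(psi(theta_hat))   # sum of psi_i psi_i^T (empirical K_q)
  Jinv <- solve(jacobian(function(th) colSums(psi(th)), theta_hat))
  Jinv %*% K %*% t(Jinv)              # empirical version of (14); the sign
}                                     # of the Jacobian cancels in the sandwich
```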

Likelihood-based tests are usually constructed from the three well-known test statistics, namely the likelihood ratio, Wald, and score statistics, and they rely on maximum likelihood estimation. Following Heritier and Ronchetti (1994) (see also Heritier et al., 2009, Section 2.5.3), robust versions of these statistics may be derived by replacing the MLE, the log-likelihood function, the score vector, and the asymptotic covariance matrix by the SMLE, the reparameterized $L_q$-likelihood in (10), the estimating function in (11), and the matrix in (14), respectively. In this paper, we focus on the robust Wald statistic, referred to here as the Wald-type statistic, for the null hypothesis $\beta_k=\beta_k^{(0)}$ against a two-sided alternative, defined as $\{(\widehat\beta_{kq}-\beta_k^{(0)})/{\rm se}(\widehat\beta_{kq})\}^2$, which has an asymptotic $\chi^2_1$ distribution under the null hypothesis. Here, $\widehat\beta_{kq}$ is the SMLE of $\beta_k$ and ${\rm se}(\widehat\beta_{kq})$ is its asymptotic standard error, computed from (14).
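Given an SMLE estimate and its sandwich standard error, the Wald-type test is immediate; a minimal sketch (ours):

```r
# Wald-type test of H0: beta_k = b0 against a two-sided alternative.
wald_type <- function(est, se, b0 = 0) {
  w <- ((est - b0) / se)^2
  c(statistic = w, p.value = pchisq(w, df = 1, lower.tail = FALSE))
}

wald_type(est = -0.037, se = 0.003)   # e.g., the LBM coefficient in Table 3
```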

Since the MLE ($\widehat{\bm\theta}$) and the SMLE are M-estimators, their influence functions for the beta regression model (1)-(3) are ${\rm IF}(y;\widehat{\bm\theta})=-K_1(\bm\theta)^{-1}U(y;\bm\theta)$ and ${\rm IF}(y;\widehat{\bm\theta}_q)=-J_q(\bm\theta)^{-1}U^*(y,\bm\theta)f^*_{\bm\theta}(y;\mu,\phi)^{1-q}$, where $U(y;\bm\theta)$ and $U^*(y,\bm\theta)$ are given in (7) and (13), respectively, $K_1(\bm\theta)$ is the Fisher information matrix (see the Appendix), and $J_q(\bm\theta)$ is given in (14). The influence function measures the asymptotic bias of the estimator caused by an infinitesimal contamination at the data point $y$ (Hampel et al., 2011, Section 2.1.b). The individual score vector $U(y;\bm\theta)$ is unbounded when $y\rightarrow 0$ or $y\rightarrow 1$, i.e., for $y$ tending to one of the boundaries of the support. Hence, the influence function of the MLE is also unbounded. On the other hand, for bounded beta densities the weighted modified score vector $U^*(y,\bm\theta)f^*_{\bm\theta}(y;\mu,\phi)^{1-q}$ and the influence function of the SMLE are bounded, which implies that the SMLE is B-robust (Hampel et al., 2011, Section 2.1). Figure 2 illustrates the behavior of the score vector and of the weighted modified score vector in the i.i.d. case. Note that the components of the score vector grow or decrease unboundedly as $y$ approaches zero or one, while the components of the weighted modified score vector have a redescending shape, approaching zero at the boundaries. This means that observations close to zero or one may be influential in the MLE fit but not in the SMLE fit.

Figure 2: Components of the score vector and of the weighted modified score vector versus $y$; $\mu=0.5$, $\phi=90$, and $q=0.9$.

We also studied the behavior of the change-of-variance function, which measures the bias in the covariance matrix caused by an infinitesimal contamination at the data point $y$. For the MLE, the change-of-variance function is unbounded. However, as shown in Ferrari and La Vecchia (2012, Proposition 2), the change-of-variance function of the SMLE is bounded if the weighted modified score vector and its first derivative are bounded. We prove the boundedness of these quantities for bounded beta densities, which implies that the SMLE is V-robust (Hampel et al., 2011, Section 2.5); see the Supplementary Material for details.

Ghosh (2019) proposed a robust estimator for beta regression models that minimizes an empirical version of the density power divergence introduced by Basu et al. (1998). The minimum density power divergence estimator (MDPDE) for the beta regression model (1)-(3) comes from the estimating equations

$\sum_{i=1}^n\left\{U(y_i,\bm\theta)f_{\bm\theta}(y_i;\mu_i,\phi_i)^{1-q}-E_{i,q}(\bm\theta)\right\}=\bm{0},$   (15)

where $E_{i,q}(\bm\theta)={\rm E}(U(y_i,\bm\theta)f_{\bm\theta}(y_i;\mu_i,\phi_i)^{1-q})$. The components of $E_{i,q}(\bm\theta)$ may be found in Ghosh (2019), and the asymptotic covariance matrix of the MDPDE is given in the Supplementary Material. The expectation $E_{i,q}(\bm\theta)$ requires that $\mu_i\phi_i>(1-q)/(2-q)$ and $(1-\mu_i)\phi_i>(1-q)/(2-q)$. Besides, the asymptotic covariance matrix of the MDPDE requires that $\mu_i\phi_i>2(1-q)/(3-2q)$ and $(1-\mu_i)\phi_i>2(1-q)/(3-2q)$. Hence, these quantities are well-defined, for all $0<q\leq 1$, only for bounded beta densities.

Note that the estimating functions in (15) are unbiased. They agree with those derived from the $L_q$-likelihood in (5) except for the term $E_{i,q}(\bm\theta)$, which plays the role of correcting the score bias. Recall that the score bias correction in the $L_q$-likelihood approach adopted here is achieved by rescaling the parameter estimates or, equivalently, by reparameterizing the $L_q$-likelihood.
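For comparison, the per-observation objective minimized by the MDPDE can be written with $\alpha=1-q$ as $\int f^{1+\alpha}\,dy-(1+1/\alpha)f(y_i)^\alpha$ (Basu et al., 1998). A sketch (ours) for a single bounded beta observation and $0<q<1$, with the integral term evaluated numerically for transparency:

```r
dpd_loss <- function(y, mu, phi, q) {
  a <- 1 - q                        # DPD tuning parameter alpha
  f <- function(u) dbeta(u, mu * phi, (1 - mu) * phi)
  int <- integrate(function(u) f(u)^(1 + a), 0, 1)$value
  int - (1 + 1 / a) * f(y)^a
}
# The MDPDE minimizes sum_i dpd_loss(y_i, mu_i, phi_i, q); setting its
# gradient to zero yields the estimating equations (15).
```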

Ghosh (2019) presented simulations of the MDPDE only for constant precision beta regression models with fixed values of the tuning constant. In practical applications, however, the choice of this constant is a crucial issue. In Section 3, we propose a data-driven method to select the tuning constant of the SMLE and the MDPDE that ensures full efficiency in the absence of contamination.

3 Selection of the tuning constant $q$

An important aspect in real data applications is the choice of the optimal value of the tuning constant $q$. Smaller values of $q$ increase robustness at the expense of reduced efficiency. La Vecchia et al. (2015) suggest selecting the tuning constant closest to 1 (i.e., closest to the MLE) such that the estimates of the parameters are sufficiently stable, ensuring full efficiency in the absence of contamination. This is the approach we follow here, but we standardize the estimates by $\sqrt{n}\times\text{std. error}$ to simultaneously remove the effect of the sample size and of the magnitude of the estimates of different parameters. We then propose a data-driven algorithm that starts with $q_0=1$ (MLE) and follows the steps below.

1. Define an equally spaced grid for $q$: $q_0>q_1>q_2>\cdots>q_m$, with $q_m\geq q_{\rm min}$;

2. compute the estimates $\widehat{\theta}^{1}_{q_k},\ldots,\widehat{\theta}^{p}_{q_k}$ with the corresponding asymptotic standard errors ${\rm se}(\widehat{\theta}^{1}_{q_k}),\ldots,{\rm se}(\widehat{\theta}^{p}_{q_k})$;

3. compute the vectors of standardized estimates,

$\bm{z}_{q_k}=\left(\frac{\widehat{\theta}^{1}_{q_k}}{\sqrt{n}\,{\rm se}(\widehat{\theta}^{1}_{q_k})},\ldots,\frac{\widehat{\theta}^{p}_{q_k}}{\sqrt{n}\,{\rm se}(\widehat{\theta}^{p}_{q_k})}\right)^{\top},$

and the standardized quadratic variations (SQV) defined by

${\rm SQV}_{q_k}=p^{-1}\|\bm{z}_{q_k}-\bm{z}_{q_{k+1}}\|;$

4. for each $q_k$, check whether the stability condition defined by ${\rm SQV}_{q_k}<L$ is satisfied, where $L>0$ is a pre-defined threshold;

5. if the stability condition is satisfied for all $q_k$, set the optimal value of $q$ at $q^*=\max q_k$;

6. if the stability condition is not satisfied for some $q_k$, set a new grid starting from the smallest $q_k$ for which the condition does not hold;

7. repeat steps 1-6 until achieving stability of the estimates for the current grid or reaching the minimum value $q_{\rm min}$; if stability is achieved, set the optimal value of $q$ at $q^*=\max q_k$; otherwise, set $q^*=1$.

This procedure chooses the optimum value of the tuning constant that is closest to 1 (MLE) and still guarantees stability of the estimates over $m$ consecutive equally spaced values of $q$. It may happen that such an optimal $q$ is not reached. In this case, the algorithm sets the optimal $q$ at 1, i.e., the MLE is chosen. This choice may seem unreasonable, but recall that the SMLE is not expected to be robust for unbounded beta densities, which might lead to unstable estimates even for reasonably small $q$. In such a situation, it is not advisable to replace the MLE by the SMLE. Our experience with simulated samples suggests setting the grid spacing at 0.02, the size of the grids at $m=3$, the minimum value of $q$ at $q_{\rm min}=0.5$, and the threshold at $L=0.02$. These values were used in the simulations and applications reported below and worked very well. A sketch of the procedure is given below.
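The sketch below (ours) implements the selection scheme under the recommended settings; fit_model is a hypothetical fitting routine that, for a given q, returns a list with the p estimates (coef) and their standard errors (se). To guarantee progress, this version restarts each new grid just below the smallest unstable q.

```r
select_q <- function(fit_model, n, spacing = 0.02, m = 3,
                     q_min = 0.5, L = 0.02) {
  q_start <- 1                                  # q_0 = 1 (MLE)
  repeat {
    grid <- seq(q_start, by = -spacing, length.out = m + 1)
    if (min(grid) < q_min) return(1)            # stability never reached
    z <- sapply(grid, function(q) {             # standardized estimates
      fit <- fit_model(q)
      fit$coef / (sqrt(n) * fit$se)
    })
    p   <- nrow(z)
    sqv <- sapply(1:m, function(k)              # SQV between consecutive q's
      sqrt(sum((z[, k] - z[, k + 1])^2)) / p)
    if (all(sqv < L)) return(max(grid))         # stable: largest q in grid
    q_start <- grid[max(which(sqv >= L)) + 1]   # restart below unstable q
  }
}
```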

4 Monte Carlo simulation results

To evaluate and compare the performance of the MLE, SMLE, and MDPDE, we carried out Monte Carlo simulations in the presence and absence of contamination in the data. The sample sizes considered are $n=40,80,160,320$. The covariate values were set for the sample size $n=40$ and replicated twice, four times, and eight times to obtain the covariate values corresponding to $n=80$, 160, 320, respectively. This scheme guarantees that the heteroskedasticity intensity, $\max(\phi_i)/\min(\phi_i)$, remains constant across the sample sizes (Espinheira et al., 2017, 2019). For all the scenarios, we employ logit and logarithmic links in the mean and precision submodels (2) and (3), respectively. The models include intercepts, i.e., $x_{1i}=z_{1i}=1$ for all $i=1,2,\ldots,n$, and the other covariates in the mean submodel are taken as random draws from a standard uniform distribution and kept constant throughout the simulated samples. For non-constant precision scenarios, the same covariates are used in the mean and precision submodels. The percentage of contamination in the sample was set at 5% for all the scenarios. The selection of the tuning constant $q$ for the SMLE and MDPDE follows the algorithm proposed in Section 3. All the results are based on 1,000 Monte Carlo replications and were obtained using the R software environment (R Core Team, 2020).

Different configurations of parameter values and patterns of data contamination were considered. We now describe the results for two scenarios; results for other scenarios are collected in the Supplementary Material.

Scenario 1: Constant precision, one covariate in the mean submodel, mean response values close to zero.

The parameter values were set at $\beta_1=-1.8$, $\beta_2=-2.0$, and $\gamma_1=4.5$, which yield $\mu\in(0.02,0.14)$ with ${\rm median}(\mu)=0.06$ and $\phi=\exp(4.5)=90$. The contaminated sample replaces the observations generated with the 5% smallest mean responses by observations generated from a beta regression model with mean $\mu_i'=(1+\mu_i)/2$ (and precision $\phi$). For instance, if $\mu_i\approx 0.1$, then $\mu_i'\approx 0.55$. A sketch of this scheme is given below.
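A sketch (ours) of this data-generating and contamination scheme for $n=40$:

```r
set.seed(123)
n <- 40
x   <- runif(n)                     # covariate values (kept fixed thereafter)
mu  <- plogis(-1.8 - 2.0 * x)       # logit link: beta1 = -1.8, beta2 = -2.0
phi <- exp(4.5)                     # log link: gamma1 = 4.5, so phi = 90
y <- rbeta(n, mu * phi, (1 - mu) * phi)

# contamination: regenerate the observations with the 5% smallest means
# from a beta law with mean (1 + mu)/2 and the same precision
idx    <- order(mu)[1:ceiling(0.05 * n)]
mu_c   <- (1 + mu[idx]) / 2
y[idx] <- rbeta(length(idx), mu_c * phi, (1 - mu_c) * phi)
```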

Scenario 2: Varying precision, two covariates in the mean and precision submodels, mean response values around 0.4.

The parameter values were set at $\beta_1=0.8$, $\beta_2=-1.2$, $\beta_3=-1.2$, $\gamma_1=3.8$, $\gamma_2=0.7$, and $\gamma_3=0.7$, which result in $\mu\in(0.22,0.64)$ with ${\rm median}(\mu)=0.43$ and $\phi\in(51,148)$ with ${\rm median}(\phi)=85$. The contaminated sample replaces 5% of the sample as follows: the observations generated with the 2.5% largest values of $\mu$ and those generated with the 2.5% smallest values of $\mu$ are replaced by independent draws from a beta regression model with mean $\mu^{(1)}_i=a_1c_i/(1+a_1c_i)$ and $\mu^{(2)}_i=a_2c_i/(1+a_2c_i)$, respectively, where $c_i=\mu_i/(1-\mu_i)$, $a_1=0.01$, and $a_2=6.0$. For instance, if $\mu\approx 0.6$, then $\mu^{(1)}\approx 0.01$, and if $\mu\approx 0.2$, then $\mu^{(2)}\approx 0.6$.

Figure 3 and Figures 4-5 display boxplots of the parameter estimates obtained by the MLE, SMLE, and MDPDE under Scenarios 1 and 2, respectively, for the data without contamination and for the contaminated data. Some general tendencies are apparent from these figures. First, the MLE of the parameters in the mean and precision submodels may be heavily affected by contamination in the data. Second, the robust estimators behave similarly to the MLE under non-contaminated data. Third, the SMLE and MDPDE behave similarly to each other. Fourth, the SMLE and MDPDE remain centered at the true parameter values when contamination is introduced in the data; the robustness comes at the cost of some extra variability in small samples. Fifth, the data-driven selection of the optimum tuning constants worked well. In fact, as evidenced by Figure 6, the optimal values of $q$ for the SMLE and MDPDE are equal to 1 for the great majority of the non-contaminated samples and around 0.9 for contaminated samples. Recall that $q=1$ corresponds to the MLE. This means that the proposed mechanism is able to identify whether the sample requires a robust procedure, and correctly selects lower values of $q$ for contaminated samples. As a consequence, the robust estimators are fully efficient in the absence of contamination. This is confirmed by a study (not shown) of the asymptotic efficiency of the robust estimators relative to the MLE, measured by the ratio of the traces of the respective asymptotic covariance matrices. For all the scenarios and sample sizes considered, the relative asymptotic efficiency under non-contaminated data is equal to 1 in almost all cases; the smallest observed value is 0.9999.

Figure 3: Boxplots of parameter estimates under Scenario 1: MLE (left), SMLE (center), and MDPDE (right). The red dashed line represents the true parameter value.
Figure 4: Boxplots of estimates of $\beta_1$, $\beta_2$, and $\beta_3$ under Scenario 2: MLE (left), SMLE (center), and MDPDE (right). The red dashed line represents the true parameter value.
Figure 5: Boxplots of estimates of $\gamma_1$, $\gamma_2$, and $\gamma_3$ under Scenario 2: MLE (left), SMLE (center), and MDPDE (right). The red dashed line represents the true parameter value.
Figure 6: Boxplots of the optimal values of the tuning constant $q$ for the SMLE and MDPDE under Scenarios 1 (first row) and 2 (second row).

We now compare the efficiency of the estimators through their total mean squared errors (TMSE). Table 1 displays the ratios of the TMSE of the MLE, SMLE, and MDPDE under Scenarios 1 and 2. When the data are not contaminated, the efficiency of the three estimators is similar, i.e., the ratios of the TMSE are equal to or close to 1. However, if contamination is present, the TMSE of the robust estimators are much smaller than those of the MLE, the more so the larger the sample. For instance, for $n=160$ under Scenario 1, the TMSE of the MLE is 141 times (134 times) larger than that of the SMLE (MDPDE). Under Scenario 2, the TMSE of the SMLE is smaller than that of the MDPDE for all the sample sizes.

Table 1: Ratio of total mean squared errors of the estimators under Scenarios 1 and 2.

                Scenario 1                                          Scenario 2
                Absence of cont.          Presence of cont.         Absence of cont.          Presence of cont.
  n     MLE/SMLE MLE/MDPDE SMLE/MDPDE   MLE/SMLE MLE/MDPDE SMLE/MDPDE   MLE/SMLE MLE/MDPDE SMLE/MDPDE   MLE/SMLE MLE/MDPDE SMLE/MDPDE
  40      0.98     0.99      1.00         31.1     34.4      1.11         1.00     1.00      1.00         2.37     1.85      0.78
  80      1.00     1.00      1.00         73.5     72.8      0.99         1.00     1.00      1.00         5.90     4.16      0.70
  160     1.00     1.00      1.00        141.1    133.7      0.95         1.00     1.00      1.00        14.0      8.40      0.60
  320     1.00     1.00      1.00        284.2    243.3      0.86         1.00     1.00      1.00        31.7     26.0       0.82

Table 2 shows the empirical levels of the Wald-type tests based on the MLE and the robust estimators. The Wald-type test of the null hypothesis ${\rm H}:\theta^j=\theta^j_0$ against a two-sided alternative rejects H at nominal level $\alpha$ when $\{(\widehat\theta^j-\theta^j_0)/{\rm se}(\widehat\theta^j)\}^2$ exceeds the $1-\alpha$ quantile of the $\chi^2_1$ distribution. For non-contaminated data, the tests behave similarly, the empirical levels being close to 5%, except for the smallest sample size in Scenario 2 (empirical levels between 8% and 11%). Under contaminated data, the tests that employ the robust estimators suffer from some type I error probability inflation. On the other hand, the tests that use the MLE are erratic in most cases, their empirical levels being near 100%. Hence, when the data contain outliers, the usual Wald-type tests should be avoided. The Wald-type tests computed from the robust estimators are reasonably reliable, although any decision based on a $p$-value close to the nominal level should be taken with care.

Table 2: Empirical null levels of Wald-type tests under Scenarios 1 and 2 at the 5% nominal level.

                     Scenario 1                        Scenario 2
                Abs. of cont.    Pres. of cont.   Abs. of cont.                       Pres. of cont.
Estimator       β1   β2   γ1     β1   β2   γ1     β1   β2   β3   γ1   γ2   γ3         β1   β2   β3   γ1   γ2   γ3
n = 40
  MLE           0.06 0.05 0.07   0.00 0.94 1.00   0.08 0.09 0.09 0.10 0.11 0.11       0.61 0.41 0.20 1.00 0.39 0.72
  SMLE          0.06 0.05 0.07   0.06 0.08 0.10   0.08 0.09 0.09 0.10 0.11 0.11       0.15 0.13 0.13 0.22 0.21 0.23
  MDPDE         0.06 0.05 0.07   0.05 0.08 0.09   0.08 0.09 0.09 0.10 0.11 0.11       0.17 0.15 0.14 0.29 0.26 0.29
n = 80
  MLE           0.05 0.05 0.05   0.00 1.00 1.00   0.08 0.07 0.05 0.07 0.06 0.08       0.98 0.94 0.60 1.00 0.54 0.94
  SMLE          0.05 0.05 0.05   0.05 0.07 0.07   0.08 0.07 0.05 0.07 0.06 0.08       0.11 0.08 0.07 0.17 0.13 0.18
  MDPDE         0.05 0.05 0.05   0.05 0.07 0.07   0.08 0.07 0.05 0.07 0.06 0.08       0.12 0.09 0.08 0.22 0.19 0.24
n = 160
  MLE           0.06 0.05 0.05   0.00 1.00 1.00   0.05 0.06 0.07 0.06 0.06 0.06       1.00 1.00 0.97 1.00 0.79 1.00
  SMLE          0.06 0.05 0.05   0.05 0.07 0.08   0.05 0.06 0.07 0.06 0.06 0.06       0.09 0.07 0.08 0.15 0.11 0.14
  MDPDE         0.06 0.05 0.05   0.05 0.07 0.08   0.05 0.06 0.07 0.06 0.06 0.06       0.11 0.08 0.09 0.19 0.16 0.18
n = 320
  MLE           0.06 0.05 0.05   0.00 1.00 1.00   0.06 0.06 0.05 0.06 0.06 0.06       1.00 1.00 1.00 1.00 0.98 1.00
  SMLE          0.06 0.05 0.05   0.05 0.07 0.08   0.06 0.06 0.05 0.06 0.06 0.06       0.07 0.07 0.07 0.17 0.12 0.13
  MDPDE         0.06 0.05 0.05   0.05 0.07 0.13   0.06 0.06 0.05 0.06 0.06 0.06       0.08 0.07 0.07 0.16 0.13 0.13

Overall, both robust estimators performed similarly to the MLE when no contamination is present, and much better than the MLE under contamination. The algorithm proposed here for the data-driven choice of the tuning constant proved very effective for both estimators. It tends to select $q\approx 1$, i.e., the MLE, when the data are not contaminated, leading to full efficiency. When the data are contaminated, the selected $q$ tends to be reasonably smaller than 1, providing robust estimation. Both robust estimators applied with this automated selection of the tuning constant produce usable Wald-type tests, while the Wald-type tests that employ the MLE present unacceptable type I error probabilities and should be avoided whenever there is any evidence of outliers.

5 Real data applications

5.1 The AIS Data

Bayes et al. (2012), Ghosh (2019), Migliorati et al. (2018), and van Niekerk et al. (2019) illustrated the sensitivity of the MLE to outliers in the constant precision beta regression model through the analysis of body fat percentages of 37 rowing athletes. The dataset, collected at the Australian Institute of Sport (AIS), is available in the R package sn (http://azzalini.stat.unipd.it/SN/index.html). The objective is to model the mean of the body fat percentage (BFP) as a function of the lean body mass (LBM). The authors noted substantial changes in the parameter estimates when two atypical observations are excluded from the data. Ghosh (2019) applied the MDPDE with different values of the tuning constant. Although the MDPDE displayed a robust fit, the practitioner would be uncertain about which value of the tuning constant to choose. Here, we show that our proposed estimator, with the optimal tuning constant selected as described in Section 3, provides a robust fit to the data.

Following the same model specification as in the papers mentioned above, we consider a beta regression model with constant precision given by $\log(\mu_i/(1-\mu_i))=\beta_1+\beta_2{\rm LBM}_i$ and $\log(\phi_i)=\gamma_1$, where $\mu_i$ and $\phi_i$ are the mean and precision of the response variable BFP for the $i$-th athlete, $i=1,\ldots,37$. We fitted the model using the MLE and SMLE for the full data and excluding combinations of the most prominent outliers: {16}, {30}, and {16, 30}. The optimal value selected for the tuning constant $q$ is 0.82 for the full data, 0.88 and 0.84 for the data without observations 16 and 30, respectively, and 1 when both outliers are jointly deleted. These values reveal that the proposed selection method correctly captures the fact that observations 16 and 30 are, individually and jointly, highly influential in the model fit. Figure 7 displays the scatter plot of BFP versus LBM along with the MLE and SMLE fitted lines for the full data and the reduced data. Notice that the outlying observations lead to considerable changes in the fitted regression curves under the MLE fit. In contrast, the SMLE fitted regression curves are almost indistinguishable.

Figure 7: Scatter plots of BFP versus LBM along with the fitted lines based on the MLE and SMLE for the full data and the data without outliers; AIS data.

Table 3 shows the estimates, standard errors, $z$-statistics (estimate divided by standard error), and asymptotic $p$-values of the Wald-type test of nullity of the coefficients (i.e., based on the null asymptotic distribution of the square of the $z$-statistic, which is $\chi^2_1$) for the full data and the reduced data. We also report $p$-values evaluated from a parametric bootstrap with 500 bootstrap replicates. In this application, the asymptotic and bootstrap $p$-values coincide up to the fourth decimal place.

We notice that the robust estimates change only slightly in the presence of the outlying observations 16 and 30 (individually or jointly). The maximum likelihood estimates, however, change drastically when the outlying observations are excluded. For instance, the MLE of the coefficient of LBM is $-0.027$ for the full data and $-0.038$ for the data without observations 16 and 30, a relative change of 41%. The corresponding robust estimates are $-0.037$ and $-0.038$. Also, the MLE of the mean and precision intercepts suffer major changes when the two atypical observations are excluded. The estimates obtained through the robust method remain almost unchanged when the outlying observations are deleted. Overall, the robust estimates and the corresponding standard errors for the full data and for the reduced data are close to those obtained by the MLE without outliers.

Table 3: Estimates, standard errors, $z$-statistics, and $p$-values (bootstrap $p$-values in parentheses) for the beta regression with constant precision – AIS data.

Full dataset
                          MLE                                         SMLE
              Estimate  Std. error   z-stat    p-value            Estimate  Std. error   z-stat    p-value
mean submodel
  Intercept     0.098     0.253       0.387    -                    0.782     0.175       4.460    -
  LBM          -0.027     0.004      -7.029    0.000 (0.000)       -0.037     0.003     -13.627    0.000 (0.000)
precision submodel
  Intercept     4.571     0.232      19.686    -                    5.366     0.242      22.163    -

Without observation 16
mean submodel
  Intercept     0.392     0.256       1.529    -                    0.791     0.191       4.136    -
  LBM          -0.032     0.004      -8.076    0.000 (0.000)       -0.037     0.003     -12.694    0.000 (0.000)
precision submodel
  Intercept     4.742     0.235      20.140    -                    5.366     0.240      22.371    -

Without observation 30
mean submodel
  Intercept     0.382     0.219       1.744    -                    0.777     0.182       4.264    -
  LBM          -0.031     0.003      -9.341    0.000 (0.000)       -0.037     0.003     -13.200    0.000 (0.000)
precision submodel
  Intercept     4.953     0.235      21.032    -                    5.370     0.243      22.069    -

Without observations 16 and 30
mean submodel
  Intercept     0.838     0.188       4.467    -                    0.838     0.188       4.467    -
  LBM          -0.038     0.003     -13.285    0.000 (0.000)       -0.038     0.003     -13.285    0.000 (0.000)
precision submodel
  Intercept     5.507     0.239      23.047    -                    5.507     0.239      23.047    -

It is common practice in beta regression modeling to employ normal probability plots of residuals with simulated envelopes to assess goodness of fit (Espinheira et al., 2008; Ospina and Ferrari, 2012; Pereira, 2019). Following Espinheira et al. (2008), we consider the 'standardized weighted residual 2' given by

$r_i=\frac{y^\star_i-\widecheck{\mu}^\star_i}{\sqrt{\widecheck{v}_i(1-\widecheck{h}_{ii})}},$

where $\widecheck{\mu}^\star_i$, $\widecheck{v}_i$, and $\widecheck{h}_{ii}$ are estimates of $\mu^\star_i$, $v_i$, and $h_{ii}$, respectively; $h_{ii}$ is the $i$-th diagonal element of $H=W^{1/2}X(X^\top WX)^{-1}X^\top W^{1/2}$, with $W={\rm diag}\{w_1,\ldots,w_n\}$, $w_i=\phi_i v_i[g'_\mu(\mu_i)]^{-2}$, and $v_i$ given in the Appendix. Figure 8 shows normal probability plots of the residuals with simulated envelopes for the MLE and SMLE fits, and a plot of the estimated weights employed by the SMLE. The residual plot for the MLE reveals some lack of fit, but the outlying observations, 16 and 30, fall only slightly outside the envelope. Hence, it masks the influence of these observations. In contrast, the residual plot for the SMLE fit suggests a better fit for the majority of the data and strongly highlights observations 16 and 30 as outliers. In fact, robust estimators are expected to produce a good fit for most of the data, but not necessarily for the outliers. Figure 8(c) shows that observations with bigger residuals tend to receive smaller weights in the robust estimation. As expected, observations 16 and 30 received the smallest weights.
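A sketch (ours) of this residual for a logit mean link, taking $v_i=\psi'(\mu_i\phi_i)+\psi'((1-\mu_i)\phi_i)$, the variance of $y_i^\star$ under the beta law (the form of $v_i$ in the Appendix is assumed here):

```r
# Standardized weighted residual 2; mu and phi are fitted values and
# X is the design matrix of the mean submodel.
swr2 <- function(y, mu, phi, X) {
  ystar  <- qlogis(y)                                    # log(y/(1-y))
  mustar <- digamma(mu * phi) - digamma((1 - mu) * phi)  # E(y*)
  v  <- trigamma(mu * phi) + trigamma((1 - mu) * phi)    # Var(y*)
  w  <- phi * v * (mu * (1 - mu))^2     # logit link: g'(mu) = 1/(mu(1-mu))
  Xw <- sqrt(w) * X                     # W^{1/2} X
  H  <- Xw %*% solve(crossprod(Xw)) %*% t(Xw)
  (ystar - mustar) / sqrt(v * (1 - diag(H)))
}
```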

Figure 8: Normal probability plots of the residuals for MLE and SMLE and plot of estimated weights for SMLE – AIS data.

5.2 Tuna dataset

We consider a subset of the the data available in the Supplementary Material of Monllor-Hurtado et al. (2017) (https://doi.org/10.1371/journal.pone.0178196). Here, the response variable is the tropical tuna percentage (TTP) in longliner catches and the covariate is the sea surface temperature (SST). The data correspond to 77 observations of longliner catches in different points of the southern Indian ocean in the year 2000. One of the observations on TTP equals 1, which means that 100% of the catch is of tropical tuna, well above the second largest observed value, which is 0.350.35. We have no information of whether this is a correct observation or not, and hence a robust model fit that is not much influenced by this observation is required. An additional difficulty is that beta regressions are not suited for data with observations at the boundary of the unit interval. A practical strategy is to subtract or add a small value to such observations in order to locate them inside the open interval (0,1)(0,1). Here, we replaced 11 by 0.9990.999. Since the influence function of the MLE grows unbounded when an observation approaches zero or one, it may be highly influenced by observations near the boundaries. Recall that the SMLE has bounded influence function and, hence, it may be a suitable alternative to the MLE when observations at the boundary are present and slightly moved inside the open unit interval.

First, consider a constant precision beta regression model with $\log(\mu_i/(1-\mu_i))=\beta_1+\beta_2{\rm SST}_i$ and $\log(\phi_i)=\gamma_1$, where $\mu_i$ and $\phi_i$ are the mean and precision of TTP for the $i$-th coordinate, $i=1,\ldots,77$. Figure 9 displays the scatter plot of TTP versus SST along with the MLE and SMLE fitted lines for the full data and the data without observation 46, which corresponds to the largest TTP value. A downward displacement of the MLE curve when observation 46 is excluded is apparent from the plot. The SMLE fit remains virtually unchanged with the exclusion of this data point.

Figure 9: Scatter plots of TTP versus SST along with the fitted lines based on the MLE and SMLE for the full data and the data without an outlier; constant precision model – Tuna data.

Figure 10 shows normal probability plots of the residuals and the estimated weights for the fitted models employing the MLE and SMLE. The panels in the left column give a strong indication of lack of fit of the maximum likelihood estimation and highlight observation 46 as an outlier. The panels in the central column show a reasonable fit of the $L_q$-likelihood estimation for all the observations except case 46, which receives a weight close to zero. Results and plots not included here show that observation 25, which has the second largest absolute residual and falls slightly outside the envelope in the SMLE fit, does not have a relevant impact on the MLE fit and, hence, its weight is close to the weights of the majority of the observations.

Figure 10: Normal probability plots of the residuals for MLE and SMLE and plot of estimated weights for SMLE; constant precision model – Tuna data. The plots in the second row are zoomed versions of those in the first row.

It is of interest to investigate whether the influence of observation 46 is weakened when the precision is modeled as a function of the covariate. A varying precision model might also produce a better MLE fit. The beta regression model considered now assumes that $\log(\mu_i/(1-\mu_i))=\beta_1+\beta_2{\rm SST}_i$ and $\log(\phi_i)=\gamma_1+\gamma_2{\rm SST}_i$. The optimum value of the tuning constant of the SMLE is 0.96 for the full data and 1 for the reduced data. Hence, when observation 46 is deleted, the SMLE and the MLE coincide. Figure 11(a) displays the scatter plot of TTP versus SST along with the fitted lines based on the SMLE and MLE for the full data and the data without observation 46. As in the constant precision modeling, this observation has a disproportionate influence on the MLE fit and virtually does not affect the robust estimation.

Figure 11: Scatter plots of TTP versus SST along with the fitted lines based on the MLE and SMLE for the full data and the data without an outlier; varying precision model – Tuna data.

Table 4 shows the estimates, standard errors, $z$-statistics (estimate divided by standard error), and $p$-values of the Wald-type test of nullity of the coefficients for the full data and the reduced data. Note the substantial change caused by observation 46 in the estimate of the intercept of the precision submodel when the MLE is employed: it moves from 7.022 to 2.171, a relative change of 69%. For the SMLE, the estimates are, respectively, 2.452 and 2.171; the latter coincides with the MLE. The maximum likelihood estimate of the coefficient of SST in the precision submodel is $-0.213$ (bootstrap $p$-value = 0.000) for the full data and 0.047 (bootstrap $p$-value = 0.338) for the data without observation 46. Note the sign change in the estimate. Also, it is clear that a single observation causes a relevant change in an inferential conclusion: for the full data, the coefficient of SST in the precision submodel is highly significant, while the MLE points to a constant precision model when the outlier is deleted. The corresponding SMLE estimates are 0.036 (bootstrap $p$-value = 0.414) for the full data and 0.047 (bootstrap $p$-value = 0.304) for the data without observation 46. In both cases, there is no evidence against the constant precision model.

We now investigate the sensitivity of the model fits to the choice of the replacement value for the response of observation 46. Recall that we replaced 1 by 0.999. Figures 11(b)-(c) show the MLE and SMLE fitted curves for three choices of value for ${\rm TTP}_{46}$, namely 0.999, 0.99, and 0.95. While the SMLE fitted curves remain unchanged, the MLE fit is markedly influenced by the chosen value. Moreover, the MLE of the SST coefficient in the precision submodel moves from $-0.213$ (bootstrap $p$-value = 0.000) when ${\rm TTP}_{46}=0.999$ to $-0.094$ (bootstrap $p$-value = 0.030) when ${\rm TTP}_{46}=0.95$, weakening the significance of SST in the precision submodel.

Table 4: Estimates, standard errors, $z$-statistics and $p$-values (bootstrap $p$-values in parentheses) for beta regression with varying precision – Tuna data.
               MLE for the full dataset              |  SMLE for the full dataset
           Estimate  Std. error   z-stat  p-value    |  Estimate  Std. error   z-stat  p-value
mean submodel
Intercept    -6.945       0.687  -10.114  -          |    -6.022       0.521  -11.557  -
SST           0.224       0.029    7.815  0.000 (0.000) |  0.173       0.020    8.637  0.000 (0.000)
precision submodel
Intercept     7.022       1.046    6.711  -          |     2.452       1.051    2.333  -
SST          -0.213       0.042   -5.066  0.000 (0.000) |  0.036       0.043    0.837  0.402 (0.414)

               MLE without observation 46            |  SMLE without observation 46
           Estimate  Std. error   z-stat  p-value    |  Estimate  Std. error   z-stat  p-value
mean submodel
Intercept    -5.990       0.530  -11.307  -          |    -5.990       0.530  -11.307  -
SST           0.171       0.020    8.464  0.000 (0.000) |  0.171       0.020    8.464  0.000 (0.000)
precision submodel
Intercept     2.171       1.047    2.073  -          |     2.171       1.047    2.073  -
SST           0.047       0.043    1.099  0.272 (0.338) |  0.047       0.043    1.099  0.272 (0.304)

Figure 12 shows plots of residuals and estimated weights for the MLE and SMLE fits of the varying precision beta regression model. The plots are similar to the corresponding plots of the constant precision model fits (Figure 10). We conclude that allowing the precision to be modeled as a function of SST does not improve the MLE fit and that observation 46 remains highly influential. A constant precision beta regression model coupled with the maximum reparameterized $L_q$-likelihood estimation we propose leads to a robust and suitable fit to the data.

Figure 12: Normal probability plots of the residuals for MLE and SMLE and plot of estimated weights for SMLE; varying precision model – Tuna data. The plots in the second row are zoomed versions of those in the first row.

5.3 The Firm cost data

We now turn to the application introduced in Section 1. (Dataset taken from the personal page of Dr Edward W. Frees: https://instruction.bus.wisc.edu/jfrees/jfreesbooks/Regression%20Modeling/BookWebDec2010/CSVData/RiskSurvey.csv.)
Consider a beta regression model with $\log(\mu_i/(1-\mu_i))=\beta_1+\beta_2\,\mbox{Ind cost}_i+\beta_3\,\mbox{Size log}_i$ and $\log(\phi_i)=\gamma_1+\gamma_2\,\mbox{Ind cost}_i+\gamma_3\,\mbox{Size log}_i$, where $\mu_i$ and $\phi_i$ are the mean and precision of the response variable Firm cost for the $i$-th firm, $i=1,\ldots,73$. We fitted the model using the MLE and the SMLE for the full data and the data without combinations of the most eye-catching outliers, namely observations 15, 16, and 72. The optimal value selected for the tuning constant $q$ is $0.96$ for the full data and $0.94$ for the data without observations 16 and/or 72. When observation 15 is excluded, either individually or with one or both of the other outliers, the selected $q$ turns out to be 1; that is, in these cases, the SMLE and the MLE coincide. Figure 13 displays the scatter plots of Firm cost versus the covariates along with the fitted lines based on the MLE and the SMLE for the full data and the reduced data. As anticipated in Section 1, the MLE fit is highly sensitive to the outliers. In contrast, the outliers have a much smaller impact on the SMLE fitted curves.
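For illustration, the corresponding maximum likelihood fits, with and without the flagged outliers, could be obtained as below; 'firmcost' is an assumed data frame with the response rescaled to the open unit interval and hypothetical column names.

  # Illustrative MLE fits for the Firm cost model (column names are assumptions)
  fit_full <- betareg(Firmcost ~ Indcost + Sizelog | Indcost + Sizelog,
                      data = firmcost)
  fit_red  <- update(fit_full, subset = -c(15, 16, 72))  # drop the three outliers
  round(cbind(full = coef(fit_full), reduced = coef(fit_red)), 3)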

Figure 13: Scatter plots for Firm cost dataset along with the fitted lines based on the MLE (a)-(c) and SMLE (b)-(d) for the full data and the data without outliers.

Table 5 shows the estimates, standard errors, $z$-statistics, and $p$-values of the Wald-type test of nullity of coefficients for the full data and the data without the subsets of observations {15} and {15, 16, 72}. The SMLEs are much less influenced by the presence of outliers than the MLEs. In fact, some MLEs change considerably when observation 15 is excluded. For instance, the MLE for the coefficient of Size log in the precision submodel is $0.244$ (bootstrap $p$-value $=0.200$) for the full data and $0.652$ (bootstrap $p$-value $=0.002$) for the data without observation 15. Note the considerable relative change in the estimated coefficient of Size log (RC = 167%) and the change in the inferential conclusion: Size log is highly significant for the data without observation 15 and non-significant for the full data at the usual nominal levels. A similar behavior occurs when the subset {15, 16, 72} is excluded. On the other hand, the corresponding robust estimate is $0.619$ (bootstrap $p$-value $=0.014$) for the full data and equals the MLE when observation 15 is excluded. Note that the robust inference leads to the conclusion that Size log is strongly significant in the precision submodel regardless of whether the outliers are excluded. Additionally, the maximum likelihood estimated coefficient of Ind cost in the mean submodel drops by half when observation 15 is excluded, while, in the precision submodel, it jumps from $-5.718$ to $-1.596$. Overall, we notice that the robust estimates and the corresponding standard errors for the full data and for the reduced data are close to those obtained by the maximum likelihood method applied to the data without outliers. This suggests that the inference based on the robust estimation is more reliable than that based on maximum likelihood estimation.

Table 5: Estimates, standard errors, $z$-statistics and $p$-values (bootstrap $p$-values in parentheses) for beta regression with varying precision – Firm cost data.
               MLE for the full dataset              |  SMLE for the full dataset
           Estimate  Std. error   z-stat  p-value    |  Estimate  Std. error   z-stat  p-value
mean submodel
Intercept     2.188       0.917    2.386  -          |     3.557       0.846    4.207  -
Ind cost      3.763       0.551    6.830  0.000 (0.000) |  1.978       0.447    4.424  0.000 (0.004)
Size log     -0.714       0.109   -6.580  0.000 (0.000) | -0.828       0.098   -8.427  0.000 (0.000)
precision submodel
Intercept     2.789       1.476    1.890  -          |    -1.145       1.522   -0.753  -
Ind cost     -5.718       0.620   -9.216  0.000 (0.000) | -1.934       0.734   -2.635  0.008 (0.058)
Size log      0.244       0.170    1.435  0.151 (0.200) |  0.619       0.174    3.562  0.000 (0.014)

               MLE and SMLE without observation 15   |  MLE and SMLE without observations 15, 16 and 72
           Estimate  Std. error   z-stat  p-value^a  |  Estimate  Std. error   z-stat  p-value
mean submodel
Intercept     3.654       0.846    4.321  -          |     2.563       0.917    2.795  -
Ind cost      1.814       0.432    4.196  0.000 (0.008) |  1.814       0.476    3.809  0.000 (0.008)
Size log     -0.832       0.098   -8.498  0.000 (0.000) | -0.703       0.108   -6.508  0.000 (0.000)
precision submodel
Intercept    -1.522       1.530   -0.995  -          |     0.462       1.732    0.267  -
Ind cost     -1.596       0.747   -2.136  0.033 (0.118) | -1.704       0.868   -1.963  0.050 (0.140)
Size log      0.652       0.175    3.728  0.000 (0.002) |  0.421       0.200    2.107  0.035 (0.068)

^a The bootstrap $p$-values for the coefficients of Ind cost are those computed under the SMLE. The respective bootstrap $p$-values under the MLE are 0.006 and 0.116. The remaining bootstrap $p$-values under the SMLE and the MLE coincide.

Figure 14 shows normal probability plots of the residuals and estimated weights for the fitted models employing the MLE and the SMLE. Plots (a) and (d) suggest lack of fit of the MLE and mildly point to observations 15 and 72 as outliers. Plots (b) and (e), which correspond to the SMLE fit for the full data, show that the majority of the data, including observations 16 and 72, is accommodated within the envelope. Observation 15 is clearly highlighted as an outlier. Indeed, only observation 15 receives a small weight in the robust fit; see plots (c) and (f). This behavior is expected because observations 16 and 72 do not have a substantial influence on the MLE fit when observation 15 is not present in the data; see plots (a) and (c) in Figure 13.

Figure 14: Normal probability plots of the residuals for MLE and SMLE and plot of estimated weights for SMLE; varying precision model – Firm cost data. The plots in the second row are zoomed versions of those in the first row.

6 Concluding remarks

Maximum likelihood estimation in beta regression is heavily influenced by outliers. We have proposed a robust inference method based on a reparameterized $L_q$-likelihood. The reparameterization is achieved through simple modifications in the mean and precision link functions. We have demonstrated robustness properties of the proposed estimator for bounded beta densities. The robust method depends on a tuning constant that balances robustness and efficiency. The selection of such a constant is a crucial issue for practical applications. We have proposed an algorithm to select the tuning constant based on the data. The algorithm is designed to choose $q=1$, which corresponds to maximum likelihood estimation, when there is no contamination in the data, and to choose a suitable value of $0<q<1$ when a robust fit is required. We have improved on the minimum power divergence estimator (Ghosh, 2019) by implementing our tuning constant selection scheme. Monte Carlo simulations and real data applications have illustrated the advantages of the robust methodology over the usual likelihood-based inference. As a by-product of the proposed approach, diagnostic tools such as normal probability plots of residuals based on robust fits have proved effective in highlighting outliers that would be masked under a non-robust procedure.

We emphasize that the code and datasets used in this paper are available at the GitHub repository https://github.com/terezinharibeiro/RobustBetaRegression. Practitioners and researchers can replicate the simulations and applications and employ the proposed robust beta regression approach in their own data analyses.

The lack of robustness of the maximum likelihood estimation under data contamination renders Wald-type tests extremely oversized, hence unreliable. The proposed estimator leads to usable Wald-type tests. Likelihood ratio, score and gradient tests based on the reparameterized Lq-likelihood and the corresponding weighted modified score function in beta regression will be the subject of our future research.

A limitation of the present work is that robustness is achieved only for bounded beta densities, which fortunately are the most common in applied research. The second author has been working on an alternative robust method that preserves robustness even for beta densities that grow unboundedly at one or both of the boundaries of the support. The work is in progress and the results will be reported elsewhere.

A useful extension of the present work concerns inflated beta regression models (Ospina and Ferrari, 2012), which are employed when the response variable includes a non-negligible number of observations at zero and/or one. This topic is beyond the scope of the present paper and will be dealt with in the near future.

Acknowledgments

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001 and by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brazil (CNPq). The second author gratefully acknowledges funding provided by CNPq (Grant No. 305963-2018-0).

Appendix

We define the following diagonal matrices with their respective diagonal elements: $B_1={\rm diag}\{b_{1,1},\ldots,b_{n,1}\}$ and $B_2={\rm diag}\{b_{1,2},\ldots,b_{n,2}\}$, with

b_{i,1}=\frac{B(\mu_{i}\phi_{i},(1-\mu_{i})\phi_{i})^{q}}{B(\mu_{i,q}\phi_{i,q},(1-\mu_{i,q})\phi_{i,q})},\quad b_{i,2}=\frac{B(\mu_{i,2-q}\phi_{i,2-q},(1-\mu_{i,2-q})\phi_{i,2-q})}{B(\mu_{i}\phi_{i},(1-\mu_{i})\phi_{i})^{2(1-q)}B(\mu_{i,q}\phi_{i,q},(1-\mu_{i,q})\phi_{i,q})};

$T_{\mu}={\rm diag}\{t_{1,\mu},\ldots,t_{n,\mu}\}$ and $T_{\phi}={\rm diag}\{t_{1,\phi},\ldots,t_{n,\phi}\}$, with

t_{i,\mu}=[g_{\mu}^{\prime}(\mu_{i,q})]^{-1},\quad t_{i,\phi}=[g_{\phi}^{\prime}(\phi_{i,q})]^{-1};

$\Phi_{q}={\rm diag}\{\phi_{1,q},\ldots,\phi_{n,q}\}$; $V={\rm diag}\{v_{1},\ldots,v_{n}\}$ and $V_{2-q}={\rm diag}\{v_{1,2-q},\ldots,v_{n,2-q}\}$, with

v_{i}={\rm Var}(y^{\star}_{i})=\psi^{\prime}(\mu_{i}\phi_{i})+\psi^{\prime}((1-\mu_{i})\phi_{i}),\quad v_{i,2-q}=\psi^{\prime}(\mu_{i,2-q}\phi_{i,2-q})+\psi^{\prime}((1-\mu_{i,2-q})\phi_{i,2-q});

$C^{*}={\rm diag}\{c^{*}_{1},\ldots,c^{*}_{n}\}$ and $C^{*}_{2-q}={\rm diag}\{c^{*}_{1,2-q},\ldots,c^{*}_{n,2-q}\}$, with

c^{*}_{i} = \phi_{i,q}[\mu_{i,q}\psi^{\prime}(\mu_{i}\phi_{i})-(1-\mu_{i,q})\psi^{\prime}((1-\mu_{i})\phi_{i})];
c^{*}_{i,2-q} = \phi_{i,q}[\mu_{i,q}\psi^{\prime}(\mu_{i,2-q}\phi_{i,2-q})-(1-\mu_{i,q})\psi^{\prime}((1-\mu_{i,2-q})\phi_{i,2-q})];

$D^{*}={\rm diag}\{d^{*}_{1},\ldots,d^{*}_{n}\}$ and $D^{*}_{2-q}={\rm diag}\{d^{*}_{1,2-q},\ldots,d^{*}_{n,2-q}\}$, with

d^{*}_{i} = \mu_{i,q}^{2}\psi^{\prime}(\mu_{i}\phi_{i})+(1-\mu_{i,q})^{2}\psi^{\prime}((1-\mu_{i})\phi_{i})-\psi^{\prime}(\phi_{i}),
d^{*}_{i,2-q} = \mu_{i,q}^{2}\psi^{\prime}(\mu_{i,2-q}\phi_{i,2-q})+(1-\mu_{i,q})^{2}\psi^{\prime}((1-\mu_{i,2-q})\phi_{i,2-q})-\psi^{\prime}(\phi_{i,2-q});
M_{1} = {\rm diag}\{(\mu^{\star}_{1,2-q}-\mu^{\star}_{1}),\ldots,(\mu^{\star}_{n,2-q}-\mu^{\star}_{n})\},
M_{2} = {\rm diag}\{\mu^{d}_{1,2-q},\ldots,\mu^{d}_{n,2-q}\},
M_{3} = {\rm diag}\{\mu^{d}_{1,2-q}\phi_{1,q}(\mu^{\star}_{1,2-q}-\mu^{\star}_{1}),\ldots,\mu^{d}_{n,2-q}\phi_{n,q}(\mu^{\star}_{n,2-q}-\mu^{\star}_{n})\},

where

\mu^{\star}_{i,2-q} = \psi(\mu_{i,2-q}\phi_{i,2-q})-\psi((1-\mu_{i,2-q})\phi_{i,2-q}),
\mu^{\dagger}_{i,2-q} = \psi((1-\mu_{i,2-q})\phi_{i,2-q})-\psi(\phi_{i,2-q}),
\mu^{d}_{i,2-q} = \mu_{i,q}(\mu^{\star}_{i,2-q}-\mu^{\star}_{i})+\mu^{\dagger}_{i,2-q}-\mu^{\dagger}_{i}.

All the above matrices are well-defined provided that $0<\mu_{i,2-q}<1$ and $\phi_{i,2-q}>0$. For the beta regression model (1)-(3), after some algebra we obtain $J_q(\boldsymbol{\theta})$ and $K_q(\boldsymbol{\theta})$ as

J_{q}(\boldsymbol{\theta})=-q^{-1}\left(\begin{array}{cc}X^{\top}B_{1}T_{\mu}^{2}\Phi_{q}^{2}VX&X^{\top}B_{1}T_{\mu}T_{\phi}C^{*}Z\\ Z^{\top}B_{1}T_{\mu}T_{\phi}C^{*}X&Z^{\top}B_{1}T_{\phi}^{2}D^{*}Z\end{array}\right),
K_{q}(\boldsymbol{\theta})=q^{-2}\left(\begin{array}{cc}X^{\top}B_{2}T_{\mu}^{2}\Phi_{q}^{2}(V_{2-q}+M^{2}_{1})X&X^{\top}B_{2}T_{\mu}T_{\phi}(C^{*}_{2-q}+M_{3})Z\\ Z^{\top}B_{2}T_{\mu}T_{\phi}(C^{*}_{2-q}+M_{3})X&Z^{\top}B_{2}T_{\phi}^{2}(D^{*}_{2-q}+M_{2}^{2})Z\end{array}\right),

where $X=(\boldsymbol{X}_{1},\ldots,\boldsymbol{X}_{p_{1}})$ and $Z=(\boldsymbol{Z}_{1},\ldots,\boldsymbol{Z}_{p_{2}})$; see the Supplementary Material for details.
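In the usual M-estimation fashion, the asymptotic covariance of the estimator can be approximated by the sandwich form $J_q^{-1}K_qJ_q^{-1}$ evaluated at the estimate; a minimal R sketch, assuming the matrices Jq and Kq have already been built as above, is:

  # Sandwich-type standard errors (standard M-estimation form, a sketch)
  se_sandwich <- function(Jq, Kq) {
    Jinv <- solve(Jq)                # Jq is symmetric here
    sqrt(diag(Jinv %*% Kq %*% Jinv))
  }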

For instance, setting $q=1$ gives

K_{1}(\boldsymbol{\theta})=\left(\begin{array}{cc}X^{\top}T_{\mu}^{2}\Phi^{2}VX&X^{\top}T_{\mu}T_{\phi}CZ\\ Z^{\top}T_{\mu}T_{\phi}CX&Z^{\top}T_{\phi}^{2}DZ\end{array}\right),

where $\Phi={\rm diag}\{\phi_{1},\ldots,\phi_{n}\}$; $C={\rm diag}\{c_{1},\ldots,c_{n}\}$ and $D={\rm diag}\{d_{1},\ldots,d_{n}\}$, with

c_{i} = \phi_{i}[\mu_{i}\psi^{\prime}(\mu_{i}\phi_{i})-(1-\mu_{i})\psi^{\prime}((1-\mu_{i})\phi_{i})];
d_{i} = \mu_{i}^{2}\psi^{\prime}(\mu_{i}\phi_{i})+(1-\mu_{i})^{2}\psi^{\prime}((1-\mu_{i})\phi_{i})-\psi^{\prime}(\phi_{i}).

References

  • Basu et al. (1998) Basu, A., Harris, I.R., Hjort, N.L., Jones, M., 1998. Robust and efficient estimation by minimising a density power divergence. Biometrika 85, 549–559.
  • Bayes et al. (2012) Bayes, C.L., Bazán, J.L., García, C., 2012. A new robust regression model for proportions. Bayesian Analysis 7, 841–866.
  • Box and Cox (1964) Box, G.E., Cox, D.R., 1964. An analysis of transformations. Journal of the Royal Statistical Society B 26, 211–252.
  • Di Brisco et al. (2020) Di Brisco, A.M., Migliorati, S., Ongaro, A., 2020. Robustness against outliers: A new variance inflated regression model for proportions. Statistical Modelling 20, 274–309.
  • Espinheira et al. (2008) Espinheira, P.L., Ferrari, S.L.P., Cribari-Neto, F., 2008. On beta regression residuals. Journal of Applied Statistics 35, 407–419.
  • Espinheira et al. (2017) Espinheira, P.L., Santos, E.G., Cribari-Neto, F., 2017. On nonlinear beta regression residuals. Biometrical Journal 59, 445–461.
  • Espinheira et al. (2019) Espinheira, P.L., da Silva, L.C.M., Silva, A.O., Ospina, R., 2019. Model selection criteria on beta regression for machine learning. Machine Learning and Knowledge Extraction 1, 427–449.
  • Ferrari and La Vecchia (2012) Ferrari, D., La Vecchia, D., 2012. On robust estimation via pseudo-additive information. Biometrika 99, 238–244.
  • Ferrari and Yang (2010) Ferrari, D., Yang, Y., 2010. Maximum $L_q$-likelihood estimation. The Annals of Statistics 38, 753–783.
  • Ferrari and Cribari-Neto (2004) Ferrari, S.L.P., Cribari-Neto, F., 2004. Beta regression for modelling rates and proportions. Journal of Applied Statistics 31, 799–815.
  • Ghosh (2019) Ghosh, A., 2019. Robust inference under the beta regression model with application to health care studies. Statistical Methods in Medical Research 28, 871–888.
  • Gómez-Déniz et al. (2014) Gómez-Déniz, E., Sordo, M.A., Calderín-Ojeda, E., 2014. The log–Lindley distribution as an alternative to the beta regression model with applications in insurance. Insurance: Mathematics and Economics 54, 49–57.
  • Hampel et al. (2011) Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A., 2011. Robust Statistics: The Approach Based on Influence Functions. volume 196. New York: John Wiley & Sons.
  • Heritier et al. (2009) Heritier, S., Cantoni, E., Copt, S., Victoria-Feser, M.P., 2009. Robust Methods in Biostatistics. volume 825. New York: John Wiley & Sons.
  • Heritier and Ronchetti (1994) Heritier, S., Ronchetti, E., 1994. Robust bounded-influence tests in general parametric models. Journal of the American Statistical Association 89, 897–904.
  • Huber (1981) Huber, P., 1981. Robust Statistics. New York: John Wiley & Sons.
  • Huber (1964) Huber, P.J., 1964. Robust estimation of a location parameter. The Annals of Mathematical Statistics 35, 73–101.
  • La Vecchia et al. (2015) La Vecchia, D., Camponovo, L., Ferrari, D., 2015. Robust heart rate variability analysis by generalized entropy minimization. Computational Statistics & Data Analysis 82, 137–151.
  • Migliorati et al. (2018) Migliorati, S., Di Brisco, A.M., Ongaro, A., 2018. A new regression model for bounded responses. Bayesian Analysis 13, 845–872.
  • Monllor-Hurtado et al. (2017) Monllor-Hurtado, A., Pennino, M.G., Sanchez-Lizaso, J.L., 2017. Shift in tuna catches due to ocean warming. PloS One 12, e0178196.
  • van Niekerk et al. (2019) van Niekerk, J., Bekker, A., Arashi, M., 2019. Beta regression in the presence of outliers – a wieldy Bayesian solution. Statistical Methods in Medical Research 28, 3729–3740.
  • Ospina and Ferrari (2012) Ospina, R., Ferrari, S.L.P., 2012. A general class of zero-or-one inflated beta regression models. Computational Statistics & Data Analysis 56, 1609–1623.
  • Pereira (2019) Pereira, G.H.A., 2019. On quantile residuals in beta regression. Communications in Statistics - Simulation and Computation 48, 302–316.
  • R Core Team (2020) R Core Team, 2020. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. URL: https://www.R-project.org/.
  • Schmit and Roth (1990) Schmit, J.T., Roth, K., 1990. Cost effectiveness of risk management practices. Journal of Risk and Insurance 57, 455–470.
  • Smithson and Verkuilen (2006) Smithson, M., Verkuilen, J., 2006. A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods 11, 54–71.
  • Tsallis (1988) Tsallis, C., 1988. Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics 52, 479–487.