
Transformed Fay-Herriot Model
with Measurement Error in Covariates

Sepideh Mosaferi, Malay Ghosh, and Rebecca C. Steorts

Sepideh Mosaferi is the corresponding author and a Ph.D. Candidate in the Department of Statistics and Statistical Laboratory at Iowa State University (E-mail: mosaferi@iastate.edu). Malay Ghosh is a Professor in the Department of Statistics at the University of Florida (E-mail: ghoshm@ufl.edu). Rebecca C. Steorts is an Assistant Professor in the Departments of Statistical Science and Computer Science at Duke University (E-mail: beka@stat.duke.edu).
(February 16, 2021)
Abstract

Statistical agencies are often asked to produce small area estimates (SAEs) for positively skewed variables. When domain sample sizes are too small to support direct estimators, the effects of skewness of the response variable can be large. As such, it is important to appropriately account for the distribution of the response variable given the available auxiliary information. Motivated by this issue, and in order to stabilize the skewness and achieve normality in the response variable, we propose an area-level log-measurement error model on the response variable. Then, under our proposed modeling framework, we derive an empirical Bayes (EB) predictor of positive small area quantities subject to the covariates containing measurement error. We propose corresponding estimators of the mean squared prediction error (MSPE) of the EB predictor using both a jackknife and a bootstrap method. We show that the order of the bias is $O(m^{-1})$, where $m$ is the number of small areas. Finally, we investigate the performance of our methodology using both design-based and model-based simulation studies.

KEYWORDS: Small area estimation; official statistics; Bayesian methods; jackknife; parametric bootstrap; applied statistics; simulation studies.

1 Introduction

Typically, in small area measurement error models, both the response variable and the covariate can be any real number (see Ybarra and Lohr (2008), Arima et al. (2017)). However, statistical agencies are often asked to produce small area estimates (SAEs) for skewed variables, which are positive, i.e., in $\mathbb{R}^{+}$. For instance, the Census of Governments (CoG) provides information on roads, tolls, airports, and other similar matters at the local-government level as defined by the United States Census Bureau (USCB). Another example includes the United States National Agricultural Statistics Service (NASS), which provides estimates regarding crop harvests (see Bellow and Lahiri (2011)). The United States Natural Resources Conservation Service (NRCS) provides estimates regarding roads at the county level (e.g., Wang and Fuller (2003)), and the Australian Agricultural and Grazing Industries Survey provides estimates of the total expenditures of Australian farms (e.g., Chandra and Chambers (2011)).

When domain sample sizes are too small to support direct estimators, the effect of skewness can be quite large, and it is critical to account for the distribution of the response variable given the auxiliary information at hand. For a review of the SAE literature, we refer to Rao and Molina (2015) and Pfeffermann (2013). The case of a positively skewed response variable corresponds to a Box-Cox transformation whose governing parameter is zero, i.e., the log transformation. The setting in which the covariate may itself be positively skewed and contains measurement error has received less attention in the literature. Throughout this paper, we explain why this problem goes beyond a simple substitution and address some of its difficulties.

1.1 Census of the Governments

As mentioned in Sec. 1, our proposed framework is motivated by data that are positively skewed. One such data set is the Census of Governments (CoG), a survey collected periodically by the United States Census Bureau (USCB) that provides comprehensive statistics about governments and governmental activities. Data are reported on government organizations, finances, and employment. For example, data on organizations refer to the location, type, and characteristics of local governments and officials. Data on finances/employment refer to revenue, expenditure, debt, assets, employees, payroll, and benefits.

We utilize data from the CoG from 2007 and 2012 (https://www.census.gov/econ/overview/go0100.html). In the CoG, the small areas consist of the 48 states of the contiguous United States. These 48 areas contain 86,152 local governments defined by the USCB, such as airports, toll roads, bridges, and other federal government corporations. The parameter of interest is the average number of full-time employees per government at the state level from the 2012 data set, which can be defined as the total number of full-time employees from all local governments divided by the total number of local governments per state. The covariate of interest is the average number of full-time employees per government at the state level from the 2007 data set. After studying residual plots and histograms, we observe skewed patterns in the average number of full-time employees in both the 2007 and 2012 data sets, which partially motivate our proposed framework.

1.2 Our Contribution

Motivated by issues that statistical agencies face with skewed response variables, we make several contributions to the literature. In order to stabilize the skewness and achieve normality in the response variable, we propose an area-level log-measurement error model on the response variable (Eq. (1)). In addition, we propose a log-measurement error model on the covariates (Eq. (2)). Next, under our proposed modeling framework, we derive an EB predictor of positive small area quantities subject to the covariates containing measurement error. In addition, we propose a corresponding estimate of the MSPE using a jackknife and a parametric bootstrap, where we illustrate that the order of the bias is $O(m^{-1})$ under standard regularity conditions. We illustrate the performance of our methodology in both model-based and design-based simulation studies. We summarize our conclusions and provide directions for future work.

The article is organized as follows. Sec. 1.3 details the prior work related to our proposed methodology. In Sec. 2, we propose a log-measurement error model for the response variable. In addition, we consider a measurement error model of the covariates with a log transformation. Further, we derive the EB predictor under our framework. Sec. 2.2 provides the MSPE for our EB predictor. We provide a decomposition of the MSPE to include the uncertainty of the EB predictor through the unknown parameters. Sec. 3 provides two estimators of the MSPE, namely a jackknife and a parametric bootstrap, where we prove that the order of the bias is $O(m^{-1})$ under standard regularity conditions. Sec. 4 provides both design-based and model-based simulation studies. Sec. 5 provides a discussion and directions for future work.

1.3 Prior Work

In this section, we review the prior literature most relevant to our proposed work. There is a rich literature on the area-level Fay-Herriot model, where various additive measurement error models have been proposed on the covariates. Ybarra and Lohr (2008) proposed the first additive measurement error model on the covariates. More specifically, the authors considered covariate information from another survey that was independent of the response variable. More recently, Berg and Chandra (2014) have proposed an EB predictor and an approximately unbiased MSE estimator under a unit-level log-normal model, where no measurement error is assumed present in the covariates. Turning to the Bayesian literature, Arima et al. (2017), Arima et al. (2015a), and Arima et al. (2015b) have provided fully Bayesian solutions to the measurement error problem for both unit-level and area-level small area estimation problems.

Next, we discuss related literature regarding the proposed jackknife and parametric bootstrap estimators of the MSPE of the Bayes estimators, where the order of the bias is $O(m^{-1})$ under standard regularity conditions. Our proposed jackknife estimator of the MSPE contrasts with that of Jiang et al. (2002), who proposed an MSE estimator using an orthogonal decomposition, where the leading term in the MSE does not depend on the area-specific response and is nearly unbiased. Given that the authors can make an orthogonal decomposition, they can show that the order of the bias of the MSE estimator is $o(m^{-1})$, which contrasts with our proposed approach. Under our approach, the leading term depends on the area-specific response, and thus, the bias is of order $O(m^{-1})$. Turning to the bootstrap, we utilize methods similar to Butar and Lahiri (2003). Using this approach, we propose a parametric bootstrap estimator of the MSPE of our estimator. In a similar manner to the jackknife, the order of the bias for the parametric bootstrap estimator of the MSPE is $O(m^{-1})$.

2 Area-Level Logarithmic Model with Measurement Error

Consider $m$ small areas and let $Y_i$ ($i=1,\ldots,m$) denote the population characteristic of interest in area $i$, where often the information of interest is a population mean or proportion. A primary survey provides a direct estimator $y_i$ of $Y_i$ for some or all of the $m$ small areas. In this section, we propose a measurement error model suitable for inference on the positively skewed response variable $y_i$. To achieve normality in the response variable, we therefore propose an area-level log-measurement error model on $Y_i$. In the rest of this section, we explain our model and the desired predictor.

Consider the following model:

z_{i}=\theta_{i}+e_{i}, \qquad (1)

where $z_i:=\log y_i$, $\theta_i:=\log Y_i$, and $e_i$ is the sampling error, distributed as $e_i\sim N(0,\psi_i)$. Assume

\theta_{i}=\sum_{k=1}^{p}\beta_{k}\log X_{ik}+\nu_{i},

where $X_{ik}$ is the $k$-th covariate of the $i$-th small area, which is unknown but is observed as $x_{ik}$. The regression coefficient $\beta_k$ is unknown and must be estimated, and $\nu_i$ is the random effect distributed as $\nu_i\sim N(0,\sigma^2_{\nu})$, where $\sigma^2_{\nu}$ is unknown.

Our measurement error model for the case of positively skewed $X_{ik}$'s is proposed as

w_{ik}:=\log x_{ik}=\log X_{ik}+\eta_{ik},\qquad k=1,\ldots,p,

or, in vector form,

\bm{w}_{i}=\bm{W}_{i}+\bm{\eta}_{i},\qquad\bm{\eta}_{i}\sim N_{p}(\bm{0},\Sigma_{i}), \qquad (2)

where $\bm{w}_i=(w_{i1},\ldots,w_{ip})^{\top}$ and $\bm{W}_i=(W_{i1},\ldots,W_{ip})^{\top}$ with $W_{ik}=\log X_{ik}$. Note that in Eq. (2), $\bm{W}_i$ is non-stochastic within the class of functional measurement error models (cf. Fuller (2006)). We assume $\Sigma_i$ is known; if it is unknown, it can be estimated using microdata or from another independent survey. We refer to Arima et al. (2017) for further details on estimating $\Sigma_i$.

Now, one can write

\begin{cases} z_{i}=\bm{W}_{i}^{\top}\bm{\beta}+\nu_{i}+\bm{\beta}^{\top}\bm{\eta}_{i}+e_{i}\\ \theta_{i}=\bm{W}_{i}^{\top}\bm{\beta}+\nu_{i}+\bm{\beta}^{\top}\bm{\eta}_{i}\end{cases}

where $\bm{\beta}=(\beta_1,\ldots,\beta_p)^{\top}$. Thus, the pair $(z_i,\theta_i)$ has the joint normal distribution

\begin{pmatrix}z_{i}\\ \theta_{i}\end{pmatrix}\sim N_{2}\left[\begin{pmatrix}\bm{W}_{i}^{\top}\bm{\beta}\\ \bm{W}_{i}^{\top}\bm{\beta}\end{pmatrix},\begin{pmatrix}\bm{\beta}^{\top}\Sigma_{i}\bm{\beta}+\sigma^{2}_{\nu}+\psi_{i}&\bm{\beta}^{\top}\Sigma_{i}\bm{\beta}+\sigma^{2}_{\nu}\\ \bm{\beta}^{\top}\Sigma_{i}\bm{\beta}+\sigma^{2}_{\nu}&\bm{\beta}^{\top}\Sigma_{i}\bm{\beta}+\sigma^{2}_{\nu}\end{pmatrix}\right].

We assume all the sources of error $(e_i,\nu_i,\bm{\eta}_i)$ for $i=1,\ldots,m$ are mutually independent throughout the rest of the paper.

Remark 1.

Eq. (1) is a Fay-Herriot model for $z_i$; however, the parameter of interest is $Y_i:=\exp(\theta_i)$ rather than $\theta_i$. Slud and Maiti (2006) and Ghosh et al. (2015) used a similar model in the absence of measurement errors in the covariates.

Next, we give the conditional distribution $[\theta_i|z_i]$ to later justify our Bayesian interpretation of the unknown parameter of interest $Y_i$:

\theta_{i}|z_{i}\sim N\Big[\bm{W}_{i}^{\top}\bm{\beta}+\frac{\bm{\beta}^{\top}\Sigma_{i}\bm{\beta}+\sigma^{2}_{\nu}}{\bm{\beta}^{\top}\Sigma_{i}\bm{\beta}+\sigma^{2}_{\nu}+\psi_{i}}(z_{i}-\bm{W}_{i}^{\top}\bm{\beta}),\;\bm{\beta}^{\top}\Sigma_{i}\bm{\beta}+\sigma^{2}_{\nu}-\frac{(\bm{\beta}^{\top}\Sigma_{i}\bm{\beta}+\sigma^{2}_{\nu})^{2}}{\bm{\beta}^{\top}\Sigma_{i}\bm{\beta}+\sigma^{2}_{\nu}+\psi_{i}}\Big],

i.e.

\theta_{i}|z_{i}\sim N\big(\gamma_{i}z_{i}+(1-\gamma_{i})\bm{W}_{i}^{\top}\bm{\beta},\;\gamma_{i}\psi_{i}\big),

where $\gamma_i=(\bm{\beta}^{\top}\Sigma_i\bm{\beta}+\sigma^2_{\nu})/(\bm{\beta}^{\top}\Sigma_i\bm{\beta}+\sigma^2_{\nu}+\psi_i)$.
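The simplification of the conditional variance to $\gamma_i\psi_i$ is a one-line computation; writing $s_i^2:=\bm{\beta}^{\top}\Sigma_i\bm{\beta}+\sigma^2_{\nu}$ (our shorthand, not used elsewhere in the paper),

s_{i}^{2}-\frac{s_{i}^{4}}{s_{i}^{2}+\psi_{i}}=\frac{s_{i}^{2}(s_{i}^{2}+\psi_{i})-s_{i}^{4}}{s_{i}^{2}+\psi_{i}}=\frac{s_{i}^{2}\,\psi_{i}}{s_{i}^{2}+\psi_{i}}=\gamma_{i}\,\psi_{i}.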

Recall that the parameter of interest is $Y_i:=\exp(\theta_i)$ after transforming from the logarithmic scale back to the original scale. Therefore, the corresponding Bayes predictor is $\hat{Y}_i:=E(Y_i|z_i)$. By using the moment generating function of the normal distribution of $\theta_i|z_i$, the Bayes predictor has the form $\hat{Y}_i=\exp\{\gamma_i z_i+(1-\gamma_i)\bm{W}_i^{\top}\bm{\beta}+\gamma_i\psi_i/2\}$. In practice, $\bm{W}_i$ is unobserved, and since $E(\bm{w}_i)=\bm{W}_i$, we replace it with the observed $\bm{w}_i$. Also, $\bm{\beta}$ and $\sigma^2_{\nu}$ are unknown and must be replaced with consistent estimators. Therefore, the EB predictor of $Y_i$ is

\hat{Y}_{i}^{\text{EB}}=\exp\Big\{\hat{\gamma}_{i}z_{i}+(1-\hat{\gamma}_{i})\bm{w}_{i}^{\top}\hat{\bm{\beta}}+\frac{\hat{\gamma}_{i}\psi_{i}}{2}\Big\}. \qquad (3)
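As a concrete illustration, the following is a minimal numerical sketch of Eq. (3) in Python; the function name and array layout are our own choices, not part of the paper.

```python
import numpy as np

def eb_predictor(z, w, psi, Sigma, beta_hat, sigma2_nu_hat):
    """EB predictor of Y_i = exp(theta_i) under the log measurement
    error model, following Eq. (3).

    z: (m,) log direct estimates; w: (m, p) observed log covariates;
    psi: (m,) sampling variances; Sigma: (m, p, p) measurement error
    covariances; beta_hat, sigma2_nu_hat: plug-in parameter estimates.
    """
    m = len(z)
    y_eb = np.empty(m)
    for i in range(m):
        s2 = beta_hat @ Sigma[i] @ beta_hat + sigma2_nu_hat
        gamma = s2 / (s2 + psi[i])                     # shrinkage factor
        mean = gamma * z[i] + (1.0 - gamma) * (w[i] @ beta_hat)
        y_eb[i] = np.exp(mean + gamma * psi[i] / 2.0)  # lognormal mean
    return y_eb
```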

2.1 Estimation of Unknown Parameters

In this section, we discuss estimation of the unknown parameters $\bm{\beta}$ and $\sigma^2_{\nu}$. First, an estimator of $\bm{\beta}$ is obtained by solving the equation

\sum_{i=1}^{m}\Big[D_{i}\Big(\bm{w}_{i}\bm{w}_{i}^{\top}-\Sigma_{i}\Big)\Big]\bm{\beta}=\sum_{i=1}^{m}D_{i}\bm{w}_{i}z_{i}. \qquad (4)

The justification for Eq. (4) is as follows. Let $\bm{z}=(z_1,\ldots,z_m)^{\top}$ and $\bm{W}^{\top}=(\bm{W}_1,\ldots,\bm{W}_m)$. Then $\bm{z}\sim N_m(\bm{W}\bm{\beta},D^{-1})$, where $D^{-1}=\text{diag}(D_1^{-1},\ldots,D_m^{-1})$ and $D_i^{-1}=\bm{\beta}^{\top}\Sigma_i\bm{\beta}+\sigma^2_{\nu}+\psi_i$. Hence, an estimator of $\bm{\beta}$ is given by

\bm{\beta}=\big(\bm{W}^{\top}D\bm{W}\big)^{-1}\bm{W}^{\top}D\bm{z}=\Big(\sum_{i=1}^{m}D_{i}\bm{W}_{i}\bm{W}_{i}^{\top}\Big)^{-1}\sum_{i=1}^{m}D_{i}\bm{W}_{i}z_{i}.

Now, notice that $E(\bm{w}_i\bm{w}_i^{\top})=\bm{W}_i\bm{W}_i^{\top}+\Sigma_i$ and $E(\bm{w}_i)=\bm{W}_i$. Hence, we estimate $\bm{\beta}$ from

\sum_{i=1}^{m}\Big[D_{i}\Big(\bm{w}_{i}\bm{w}_{i}^{\top}-\Sigma_{i}\Big)\Big]\bm{\beta}=\sum_{i=1}^{m}D_{i}\bm{w}_{i}z_{i}.

However, $D_i$ is not known, as both $\bm{\beta}$ and $\sigma^2_{\nu}$ are unknown. Note that $E(z_i-\bm{w}_i^{\top}\bm{\beta})^2=\sigma^2_{\nu}+\psi_i$. Then $\sigma^2_{\nu}$ can be estimated by

m^{-1}\sum_{i=1}^{m}\Big(z_{i}-\bm{w}_{i}^{\top}\bm{\beta}\Big)^{2}-m^{-1}\sum_{i=1}^{m}\psi_{i}. \qquad (5)

If the above quantity is less than zero, we estimate $\sigma_{\nu}^2$ as zero. One can estimate $\bm{\beta}$ and $\sigma^2_{\nu}$ by iteratively solving Eqs. (4) and (5).
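A minimal sketch of this iterative moment-based scheme in Python follows; the function name, starting values, and convergence rule are our own illustrative choices.

```python
import numpy as np

def estimate_beta_sigma2(z, w, psi, Sigma, n_iter=100, tol=1e-8):
    """Iteratively solve Eqs. (4) and (5) for (beta, sigma2_nu)."""
    m, p = w.shape
    beta = np.zeros(p)
    sigma2 = max(np.var(z) - psi.mean(), 0.0)  # crude starting value
    for _ in range(n_iter):
        # D_i^{-1} = beta' Sigma_i beta + sigma2_nu + psi_i
        D = 1.0 / (np.einsum('j,ijk,k->i', beta, Sigma, beta) + sigma2 + psi)
        # Eq. (4): sum_i D_i (w_i w_i' - Sigma_i) beta = sum_i D_i w_i z_i
        lhs = np.einsum('i,ij,ik->jk', D, w, w) - np.einsum('i,ijk->jk', D, Sigma)
        rhs = np.einsum('i,ij,i->j', D, w, z)
        beta_new = np.linalg.solve(lhs, rhs)
        # Eq. (5), truncated at zero
        sigma2_new = max(np.mean((z - w @ beta_new) ** 2) - psi.mean(), 0.0)
        converged = (np.max(np.abs(beta_new - beta)) < tol
                     and abs(sigma2_new - sigma2) < tol)
        beta, sigma2 = beta_new, sigma2_new
        if converged:
            break
    return beta, sigma2
```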

2.2 Mean Squared Prediction Error of the EB Predictor

In this section, we first define the MSPE of the EB predictor $\hat{Y}_i^{\text{EB}}$. Second, we show that the cross-product term of the MSPE of the EB predictor $\hat{Y}_i^{\text{EB}}$ is exactly zero. We now introduce notation that will be used throughout the rest of the paper. Let

M_{1i} :=E[(\hat{Y}_{i}-Y_{i})^{2}|z_{i}] =\exp\{\psi_{i}\gamma_{i}\}\big[\exp\{\psi_{i}\gamma_{i}\}-1\big]\exp\big\{2\big[\gamma_{i}z_{i}+(1-\gamma_{i})\bm{W}_{i}^{\top}\bm{\beta}\big]\big\},
M_{2i} :=E[(\hat{Y}_{i}^{\text{EB}}-\hat{Y}_{i})^{2}|z_{i}],\qquad M_{3i}:=E[(\hat{Y}_{i}^{\text{EB}}-\hat{Y}_{i})(\hat{Y}_{i}-Y_{i})|z_{i}].

Note that we estimate $\bm{W}_i$ with $\bm{w}_i$ and that, unlike in Jiang et al. (2002), the term $M_{1i}$ depends on the area-specific response variable $z_i$, so its estimator has bias of order $O(m^{-1})$. Since we wish to include the uncertainty of the EB predictor $\hat{Y}_i^{\text{EB}}$ with respect to the unknown parameters $\bm{\beta}$ and $\sigma^2_{\nu}$, we decompose the MSPE into three terms in Definition 1.

Definition 1.

The MSPE of the EB predictor $\hat{Y}^{\text{EB}}_i$ is

\text{MSPE}(\hat{Y}^{\text{EB}}_{i}) =E[(\hat{Y}_{i}^{\text{EB}}-Y_{i})^{2}|z_{i}] \equiv E[(\hat{Y}_{i}-Y_{i})^{2}|z_{i}]+E[(\hat{Y}_{i}^{\text{EB}}-\hat{Y}_{i})^{2}|z_{i}]+2E[(\hat{Y}_{i}^{\text{EB}}-\hat{Y}_{i})(\hat{Y}_{i}-Y_{i})|z_{i}] =M_{1i}+M_{2i}+2M_{3i},

where we show below that $M_{3i}=0$.

To show that the cross-product term $M_{3i}$ is zero, recall that the Bayes predictor satisfies

\hat{Y}_{i}=E[Y_{i}\mid z_{i}]\implies E[\hat{Y}_{i}-Y_{i}\mid z_{i}]=0.

Consider

M_{3i} =E[(\hat{Y}_{i}^{\text{EB}}-\hat{Y}_{i})(\hat{Y}_{i}-Y_{i})|z_{i}] =E\big\{(\hat{Y}_{i}^{\text{EB}}-\hat{Y}_{i})\,E\big((\hat{Y}_{i}-Y_{i})\mid z_{i}\big)\big|z_{i}\big\}=0.

3 Jackknife and Parametric Bootstrap Estimators of the MSPE

In this section, we propose two estimators for the MSPE of the EB predictor $\hat{Y}_i^{\text{EB}}$. First, we propose a jackknife estimator of the MSPE. Second, we propose a parametric bootstrap estimator of the MSPE. For both methods, the expectation of the proposed measure of uncertainty is correct up to the order $O(m^{-1})$ for the EB predictor.

3.1 Jackknife Estimator of the MSPE

In this section, we propose a jackknife estimator of the MSPE of the EB predictor $\hat{Y}_i^{\text{EB}}$, denoted by $\text{mspe}_J(\hat{Y}_i^{\text{EB}})$. We prove that the bias of $\text{mspe}_J(\hat{Y}_i^{\text{EB}})$ is of order $O(m^{-1})$ under six regularity conditions. We propose the following jackknife estimator:

\text{mspe}_{J}(\hat{Y}^{\text{EB}}_{i})=\hat{M}_{1i,J}+\hat{M}_{2i,J}, \qquad (6)

where

\hat{M}_{1i,J}=\hat{M}_{1i}-\frac{m-1}{m}\sum_{j=1}^{m}(\hat{M}_{1i}-\hat{M}_{1i(-j)})\quad\text{and}\quad\hat{M}_{2i,J}=\frac{m-1}{m}\sum_{j=1}^{m}(\hat{Y}^{\text{EB}}_{i}-\hat{Y}^{\text{EB}}_{i(-j)})^{2},

and $(-j)$ denotes that all areas except the $j$-th area are used. Therefore, let

\hat{M}_{1i} :=M_{1i}(\hat{\sigma}^{2}_{\nu},\hat{\bm{\beta}}) =\exp\{\psi_{i}\hat{\gamma}_{i}\}\big[\exp\{\psi_{i}\hat{\gamma}_{i}\}-1\big]\exp\big\{2\big[\hat{\gamma}_{i}z_{i}+(1-\hat{\gamma}_{i})\bm{w}_{i}^{\top}\hat{\bm{\beta}}\big]\big\}, \qquad (7)
\hat{M}_{1i(-j)} =\exp\{\psi_{i}\hat{\gamma}_{i(-j)}\}\big[\exp\{\psi_{i}\hat{\gamma}_{i(-j)}\}-1\big]\exp\big\{2\big[\hat{\gamma}_{i(-j)}z_{i}+(1-\hat{\gamma}_{i(-j)})\bm{w}_{i}^{\top}\hat{\bm{\beta}}_{(-j)}\big]\big\},\quad\text{and}
\hat{Y}^{\text{EB}}_{i(-j)} =\exp\Big\{\hat{\gamma}_{i(-j)}z_{i}+(1-\hat{\gamma}_{i(-j)})\bm{w}_{i}^{\top}\hat{\bm{\beta}}_{(-j)}+\frac{\psi_{i}\hat{\gamma}_{i(-j)}}{2}\Big\}.
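As an illustration, the following is a minimal sketch in Python of the jackknife computation in Eq. (6); fit, predict, and m1 are hypothetical helper interfaces (standing in for the estimation scheme of Sec. 2.1, Eq. (3), and Eq. (7)), not functions from the paper.

```python
import numpy as np

def mspe_jackknife(z, w, psi, Sigma, fit, predict, m1):
    """Jackknife MSPE estimator of Eq. (6).

    fit(z, w, psi, Sigma) -> (beta, sigma2); predict(i, ...) and
    m1(i, ...) evaluate Y_hat_i^EB and M_hat_{1i} for area i at the
    supplied parameter values (hypothetical interfaces).
    """
    m = len(z)
    beta, sigma2 = fit(z, w, psi, Sigma)
    mspe = np.empty(m)
    for i in range(m):
        m1_full = m1(i, z, w, psi, Sigma, beta, sigma2)
        pred_full = predict(i, z, w, psi, Sigma, beta, sigma2)
        m1_diff, m2 = 0.0, 0.0
        for j in range(m):                      # leave-one-area-out refits
            keep = np.arange(m) != j
            b_j, s_j = fit(z[keep], w[keep], psi[keep], Sigma[keep])
            m1_diff += m1_full - m1(i, z, w, psi, Sigma, b_j, s_j)
            m2 += (pred_full - predict(i, z, w, psi, Sigma, b_j, s_j)) ** 2
        c = (m - 1) / m
        mspe[i] = (m1_full - c * m1_diff) + c * m2  # M1i,J + M2i,J
    return mspe
```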

Note that for all $[\cdot]_{(-j)}$ quantities, the estimators of $\phi=(\bm{\beta},\sigma^2_{\nu})^{\top}$ are computed from the data of all areas other than $j$ and then plugged into the corresponding expressions. We now define some notation and establish six regularity conditions used in Theorem 1. Let $\ell(\cdot|z_i)$ denote the conditional likelihood function, and denote its first, second, and third derivatives by $\ell_i^{\prime}(\phi|z_i)$, $\ell_i^{\prime\prime}(\phi|z_i)$, and $\ell_i^{\prime\prime\prime}(\phi|z_i)$, respectively. Now, assume the following six regularity conditions:

Condition 1. Define $\phi^{\top}=(\bm{\beta},\sigma^2_{\nu})\in\Theta$, where $\Theta$ is a compact set such that $\Theta\subseteq\mathbb{R}^{p}\times\mathbb{R}^{+}$.

Condition 2. Assume $\hat{\phi}$ is a consistent estimator of $\phi$, i.e., $\hat{\phi}\xrightarrow{p}\phi$.

Condition 3. Assume $\ell_i^{\prime}(\phi|z_i)$ and $\ell_i^{\prime\prime}(\phi|z_i)$ both exist for $i=1,\ldots,m$, almost surely in probability.

Condition 4. Assume $E\{\ell_i^{\prime}(\phi|z_i)|\phi\}=0$ for $i=1,\ldots,m$.

Condition 5. Assume $\ell_i^{\prime\prime}(\phi|z_i)$ is a continuous function of $\phi$ for $i=1,\ldots,m$, almost surely in probability, where $E\{\ell_i^{\prime\prime}(\phi|z_i)\}$ is positive definite, uniformly bounded away from 0, and is a measurable function of $z_i$.

Condition 6. Assume $E\{|\ell_i^{\prime}(\phi|z_i)|^{4+\delta}\}$, $E\{|\ell_i^{\prime\prime}(\phi|z_i)|^{4+\delta}\}$, and $E\{\sup_{c\in(-\epsilon,\epsilon)}|\ell_i^{\prime\prime\prime}(\phi+c|z_i)|^{4+\delta}\}$ are uniformly bounded for $i=1,\ldots,m$ for some $\epsilon>0$ and $\delta>0$.

Theorem 1.

Assume Conditions 1–6 hold. Then

E[\text{mspe}_{J}(\hat{Y}_{i}^{\text{EB}})]=\text{MSPE}(\hat{Y}_{i}^{\text{EB}})+O(m^{-1}).
Proof.

Define

E(\text{mspe}_{J}(\hat{Y}_{i}^{\text{EB}})) \equiv E(\hat{M}_{1i,J}+\hat{M}_{2i,J}) =E\Big(\hat{M}_{1i}-\frac{m-1}{m}\sum_{j=1}^{m}[(\hat{M}_{1i}-\hat{M}_{1i(-j)})|z_{i}]\Big)+\frac{m-1}{m}E\Big(\sum_{j=1}^{m}[(\hat{Y}_{i}^{\text{EB}}-\hat{Y}_{i(-j)}^{\text{EB}})^{2}|z_{i}]\Big).

Also, define a remainder term $r_i$ that is bounded in absolute value by $R_i$, where

|r_{i}|\leq\max\{1,|\ell^{\prime}(\phi|z_{i})|^{3},|\ell^{\prime\prime}(\phi|z_{i})|^{3},|\ell^{\prime\prime\prime}(\phi|z_{i})|^{3}\}\equiv R_{i}.

First, we prove that $\hat{M}_{1i,J}$ has a bias of order $O(m^{-1})$. Using a Taylor series expansion, we find that

\hat{M}_{1i}=M_{1i}+M_{1i}^{\prime\top}(\phi)(\hat{\phi}-\phi)+\frac{1}{2}M_{1i}^{\prime\prime\top}(\phi)(\hat{\phi}-\phi)^{2}+\frac{1}{6}M_{1i}^{\prime\prime\prime\top}(\phi^{*})(\hat{\phi}-\phi)^{3},

for $\phi^{*}$ between $\phi$ and $\hat{\phi}$, where $M_{1i}^{\prime\top}(\phi)$, $M_{1i}^{\prime\prime\top}(\phi)$, and $M_{1i}^{\prime\prime\prime\top}(\phi^{*})$ stand for the first, second, and third derivatives of $M_{1i}$ with respect to $\phi$. Let $\hat{\phi}^{\top}=(\hat{\bm{\beta}},\hat{\sigma}^2_{\nu})$; it then follows that

\hat{M}_{1i}-\hat{M}_{1i(-j)}=\hat{M}^{\prime\top}_{1i}(\hat{\phi})(\hat{\phi}-\hat{\phi}_{(-j)})+\frac{1}{2}\hat{M}_{1i}^{\prime\prime\top}(\hat{\phi})(\hat{\phi}-\hat{\phi}_{(-j)})^{2}+\frac{1}{6}\hat{M}_{1i}^{\prime\prime\prime\top}(\hat{\phi}^{*}_{(-j)})(\hat{\phi}-\hat{\phi}_{(-j)})^{3},

for some $\hat{\phi}^{*}_{(-j)}$ between $\hat{\phi}_{(-j)}$ and $\hat{\phi}$. To approximate the solution $\hat{\phi}$ of the equation $f(\tau)=\sum_{i=1}^{m}\ell^{\prime}(\tau|z_i)=0$ at iteration $(\xi+1)$, we use Householder's method (Householder (1970), Theorem 4.4.1); see also Theorem 1 of Lohr and Rao (2009):

\tau_{\xi+1}=\tau_{\xi}-\frac{f(\tau_{\xi})}{f^{\prime}(\tau_{\xi})}\Big[1+\frac{f(\tau_{\xi})f^{\prime\prime}(\tau_{\xi})}{2\{f^{\prime}(\tau_{\xi})\}^{2}}\Big].

By taking the initial value $\tau_{\xi}=\phi$, we have

\hat{\phi}-\phi =-\frac{\sum_{i=1}^{m}\ell_{i}^{\prime}(\phi|z_{i})}{\sum_{i=1}^{m}\ell_{i}^{\prime\prime}(\phi|z_{i})}\Bigg\{1+\frac{\sum_{k=1}^{m}\ell^{\prime}_{k}(\phi|z_{k})\sum_{r=1}^{m}\ell^{\prime\prime\prime}_{r}(\phi|z_{r})}{2\big(\sum_{k=1}^{m}\ell^{\prime\prime}_{k}(\phi|z_{k})\big)^{2}}\Bigg\}+O_{p}(|\hat{\phi}-\phi|^{3}),\quad\text{and}
\hat{\phi}-\hat{\phi}_{(-j)} =\frac{\ell^{\prime}_{j}(\hat{\phi}|z_{j})}{\sum_{k\neq j}\ell^{\prime\prime}_{k}(\hat{\phi}|z_{k})}\Bigg[1-\frac{\ell^{\prime}_{j}(\hat{\phi}|z_{j})\sum_{k\neq j}\ell^{\prime\prime\prime}_{k}(\hat{\phi}|z_{k})}{2\big(\sum_{k\neq j}\ell^{\prime\prime}_{k}(\hat{\phi}|z_{k})\big)^{2}}\Bigg]+O_{p}(|\hat{\phi}-\hat{\phi}_{(-j)}|^{3}).

By taking conditional expectation and using Theorem 2.1 of Jiang et al. (2002), we find that

E(\hat{\phi}-\phi|z_{i})=\frac{-\ell_{i}^{\prime}(\phi|z_{i})+\varphi}{\sum_{j=1}^{m}E\{\ell^{\prime\prime}_{j}(\phi|z_{j})\}}+r_{i}\,o(m^{-1}),

where

\varphi=\frac{\sum_{j=1}^{m}E[\ell^{\prime}_{j}(\phi|z_{j})\ell^{\prime\prime}_{j}(\phi|z_{j})]}{\sum_{j=1}^{m}E\{\ell^{\prime\prime}_{j}(\phi|z_{j})\}}-\frac{\sum_{j=1}^{m}\sum_{k=1}^{m}E\{[\ell^{\prime}_{j}(\phi|z_{j})]^{2}\}\,E(\ell^{\prime\prime\prime}_{k}(\phi|z_{k}))}{2\big(\sum_{j=1}^{m}E\{\ell^{\prime\prime}_{j}(\phi|z_{j})\}\big)^{2}},
\sum_{j\neq i}E(\hat{\phi}-\hat{\phi}_{(-j)}|z_{i}) =\frac{-\ell^{\prime}_{i}(\phi|z_{i})+\varphi}{\sum_{j=1}^{m}E\{\ell^{\prime\prime}_{j}(\phi|z_{j})\}}+r_{i}\,o(m^{-1}),\quad\text{and}
E(\hat{\phi}_{(-i)}-\hat{\phi}|z_{i}) =\frac{\ell^{\prime}_{i}(\phi|z_{i})}{\sum_{j=1}^{m}E\{\ell^{\prime\prime}_{j}(\phi|z_{j})\}}+r_{i}\,o(m^{-1}).

By combining the above results, we find that

E\{\hat{M}_{1i,J}-M_{1i}|z_{i}\} =-M_{1i}^{\prime}(\phi|z_{i})\ell^{\prime}_{i}(\phi|z_{i})/\varphi+r_{i}\,o(m^{-1}).

Hence, $E(\hat{M}_{1i,J})=M_{1i}+O(m^{-1})$.

Second, we prove that $\hat{M}_{2i}$ has a bias of order $o(m^{-1})$. Let

\hat{Y}_{i}^{\text{EB}}-\hat{Y}_{i(-j)}^{\text{EB}}:=h(\hat{\phi}|z_{i})-h(\hat{\phi}_{(-j)}|z_{i}),

where $h(\phi|z_i)=E(Y_i^{\text{EB}}|z_i,\phi)$. Using a Taylor series expansion, we find that

\hat{Y}_{i}^{\text{EB}}-\hat{Y}_{i(-j)}^{\text{EB}}=h^{\prime\top}(\hat{\phi}|z_{i})(\hat{\phi}-\hat{\phi}_{(-j)})+\frac{1}{2}h^{\prime\prime\top}(\hat{\phi}^{*}_{(-j)}|z_{i})(\hat{\phi}-\hat{\phi}_{(-j)})^{2},

where

h^{\prime\top}(\hat{\phi}|z_{i})=\Big(\frac{\partial h(\hat{\phi}|z_{i})}{\partial\bm{\beta}},\frac{\partial h(\hat{\phi}|z_{i})}{\partial\sigma^{2}_{\nu}}\Big),\qquad h^{\prime\prime\top}(\hat{\phi}|z_{i})=\Big(\frac{\partial^{2}h(\hat{\phi}|z_{i})}{\partial\bm{\beta}^{2}},\frac{\partial^{2}h(\hat{\phi}|z_{i})}{\partial(\sigma^{2}_{\nu})^{2}}\Big),

and $\hat{\phi}^{*}_{(-j)}$ is between $\hat{\phi}_{(-j)}$ and $\hat{\phi}$. Using an additional Taylor series expansion, we find that

\sum_{j=1}^{m}E\big\{(\hat{Y}_{i(-j)}^{\text{EB}}-\hat{Y}_{i}^{\text{EB}})^{2}|z_{i}\big\} =\big\{h^{\prime\top}(\phi|z_{i})\big\}^{2}\times\frac{\sum_{j=1}^{m}E\{(\ell^{\prime}_{j}(\phi|z_{j}))^{2}\}}{\varphi^{2}}+r_{i}\,o(m^{-1}).\quad\text{Similarly,}
E\big\{(\hat{Y}_{i}^{\text{EB}}-\hat{Y}_{i})^{2}|z_{i}\big\} =\big\{h^{\prime\top}(\phi|z_{i})\big\}^{2}\times\frac{\sum_{j=1}^{m}E\{(\ell^{\prime}_{j}(\phi|z_{j}))^{2}\}}{\varphi^{2}}+r_{i}\,o(m^{-1}).

By combining the above results, we find that

E(\hat{M}_{2i,J}) =M_{2i}+o(m^{-1}).

Finally,

E(\text{mspe}_{J}(\hat{Y}_{i}^{\text{EB}})) =E(\hat{M}_{1i,J})+E(\hat{M}_{2i,J}) =\{M_{1i}+O(m^{-1})\}+\{M_{2i}+o(m^{-1})\} =M_{1i}+M_{2i}+O(m^{-1}).

Hence, $E[\text{mspe}_J(\hat{Y}_i^{\text{EB}})]=\text{MSPE}(\hat{Y}_i^{\text{EB}})+O(m^{-1})$.

3.2 Parametric Bootstrap Estimator of the MSPE

In this section, we propose a parametric bootstrap estimator of the MSPE of the EB predictor $\hat{Y}_i^{\text{EB}}$, which we denote by $\text{mspe}_B(\hat{Y}_i^{\text{EB}})$. We prove that the bias is of order $O(m^{-1})$. Specifically, we extend Butar and Lahiri (2003) to obtain a parametric bootstrap estimator for our proposed EB predictor. To introduce the parametric bootstrap method, consider the following bootstrap model:

z_{i}^{\star}|\bm{w}_{i}^{\star},\nu_{i}^{\star} \stackrel{ind}{\sim} N(\bm{w}_{i}^{\star\top}\hat{\bm{\beta}}+\nu_{i}^{\star},\psi_{i}),\qquad \bm{w}_{i}^{\star} \stackrel{ind}{\sim} N_{p}(\bm{W}_{i},\Sigma_{i}),\qquad \nu_{i}^{\star} \stackrel{ind}{\sim} N(0,\hat{\sigma}^{2}_{\nu}). \qquad (8)

Recall from Definition 1 that $\text{MSPE}(\hat{Y}_i^{\text{EB}})=M_{1i}+E[(\hat{Y}_i^{\text{EB}}-\hat{Y}_i)^2|z_i]$ since $M_{3i}=0$. We use the parametric bootstrap twice. First, we use it to estimate $M_{1i}$ in order to correct the bias of $\hat{M}_{1i}:=M_{1i}(\hat{\sigma}^2_{\nu},\hat{\bm{\beta}})$ (see Eq. (7)). Second, we use it to estimate $E[(\hat{Y}_i^{\text{EB}}-\hat{Y}_i)^2|z_i]$. More specifically, we propose to estimate $M_{1i}$ by $2M_{1i}(\hat{\sigma}^2_{\nu},\hat{\bm{\beta}})-E_{\star}[M_{1i}(\hat{\sigma}^{\star 2}_{\nu},\hat{\bm{\beta}}^{\star})|z_i^{\star}]$ and $E[(\hat{Y}_i^{\text{EB}}-\hat{Y}_i)^2|z_i]$ by $E_{\star}[(\hat{Y}_i^{\text{EB}\star}-\hat{Y}_i^{\text{EB}})^2|z_i^{\star}]$, where $E_{\star}$ denotes expectation with respect to the model in Eq. (8) and $\hat{Y}_i^{\text{EB}\star}=\exp\{\hat{\gamma}_i^{\star}z_i+(1-\hat{\gamma}_i^{\star})\bm{w}_i^{\top}\hat{\bm{\beta}}^{\star}+\psi_i\hat{\gamma}_i^{\star}/2\}$. In addition, $\hat{\gamma}_i^{\star}=(\hat{\sigma}^{\star 2}_{\nu}+\hat{\bm{\beta}}^{\star\top}\Sigma_i\hat{\bm{\beta}}^{\star})/(\hat{\sigma}^{\star 2}_{\nu}+\hat{\bm{\beta}}^{\star\top}\Sigma_i\hat{\bm{\beta}}^{\star}+\psi_i)$, where $\hat{\bm{\beta}}^{\star}$ and $\hat{\sigma}^{\star 2}_{\nu}$ are the estimators of $\bm{\beta}$ and $\sigma^2_{\nu}$ computed under the parametric bootstrap model in Eq. (8).

Our proposed estimator of $\text{MSPE}(\hat{Y}_i^{\text{EB}})$ is

\text{mspe}_{B}(\hat{Y}_{i}^{\text{EB}})=2M_{1i}(\hat{\sigma}^{2}_{\nu},\hat{\bm{\beta}})-E_{\star}[M_{1i}(\hat{\sigma}^{\star 2}_{\nu},\hat{\bm{\beta}}^{\star})|z_{i}^{\star}]+E_{\star}[(\hat{Y}_{i}^{\text{EB}\star}-\hat{Y}_{i}^{\text{EB}})^{2}|z_{i}^{\star}], \qquad (9)

which has bias of order $O(m^{-1})$, as shown in Theorem 2.
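A minimal sketch of the bootstrap computation in Eq. (9) follows, using the same hypothetical fit, predict, and m1 interfaces as the jackknife sketch above; since $\bm{W}_i$ is unobserved, the sketch substitutes the observed $\bm{w}_i$ for the mean in the bootstrap model (8), an assumption on our part.

```python
import numpy as np

def mspe_bootstrap(z, w, psi, Sigma, fit, predict, m1, B=1000, seed=0):
    """Parametric bootstrap MSPE estimator of Eq. (9)."""
    rng = np.random.default_rng(seed)
    m = len(z)
    beta, sigma2 = fit(z, w, psi, Sigma)
    m1_hat = np.array([m1(i, z, w, psi, Sigma, beta, sigma2) for i in range(m)])
    pred = np.array([predict(i, z, w, psi, Sigma, beta, sigma2) for i in range(m)])
    m1_star = np.zeros(m)
    m2_star = np.zeros(m)
    for _ in range(B):
        # Regenerate (w*, nu*, z*) from the plug-in estimates, as in Eq. (8);
        # the observed w stands in for the unknown W_i (an assumption).
        w_star = np.stack([rng.multivariate_normal(w[i], Sigma[i])
                           for i in range(m)])
        nu_star = rng.normal(0.0, np.sqrt(sigma2), m)
        z_star = w_star @ beta + nu_star + rng.normal(0.0, np.sqrt(psi))
        b_s, s_s = fit(z_star, w_star, psi, Sigma)   # re-estimate phi
        m1_star += np.array([m1(i, z, w, psi, Sigma, b_s, s_s)
                             for i in range(m)])
        m2_star += np.array([(predict(i, z, w, psi, Sigma, b_s, s_s)
                              - pred[i]) ** 2 for i in range(m)])
    # Eq. (9): bias-corrected M1 plus bootstrap estimate of M2
    return 2 * m1_hat - m1_star / B + m2_star / B
```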

Theorem 2.

Assume $E_{\star}(\hat{\sigma}_{\nu}^{\star 2}-\hat{\sigma}^2_{\nu})=O_p(m^{-1})$ and $E_{\star}(\hat{\bm{\beta}}^{\star}-\hat{\bm{\beta}})=O_p(m^{-1})$. Then the bootstrap estimator of the MSPE has bias of order $O(m^{-1})$, i.e.,

E[\text{mspe}_{B}(\hat{Y}_{i}^{\text{EB}})]=\text{MSPE}(\hat{Y}_{i}^{\text{EB}})+O(m^{-1}).
Proof.

First, observe that

E_{\star}[M_{1i}(\hat{\sigma}^{\star 2}_{\nu},\hat{\bm{\beta}}^{\star})|z_{i}^{\star}]=M_{1i}(\hat{\sigma}^{2}_{\nu},\hat{\bm{\beta}})+O_{p}(m^{-1}).

Assume that $R^{\star}_m=O_{p^{\star}}(m^{-1})$, i.e., $mR^{\star}_m$ is bounded in probability under the parametric bootstrap model in Eq. (8). Consider the following Taylor series expansion:

\hat{Y}_{i}^{\text{EB}\star}-\hat{Y}_{i}^{\text{EB}}=(\hat{\phi}^{\star}-\hat{\phi})^{\top}h^{\prime}(\hat{\phi}|z_{i})+R^{\star}_{m},

where $\hat{\phi}^{\star\top}=(\hat{\bm{\beta}}^{\star},\hat{\sigma}^{\star 2}_{\nu})$.

Using an argument similar to the proof of Theorem 1,

E_{\star}[(\hat{Y}_{i}^{\text{EB}\star}-\hat{Y}_{i}^{\text{EB}})^{2}|z_{i}^{\star}] =\hat{M}_{2i}+o_{p}(m^{-1})\quad\text{and}\quad E_{\star}[\hat{M}^{\star}_{1i}|z_{i}^{\star}]=\hat{M}_{1i}+O_{p}(m^{-1}). \qquad (10)

Substituting Eq. (10) into Eq. (9), we find that

\text{mspe}_{B}(\hat{Y}_{i}^{\text{EB}}) =2\hat{M}_{1i}-[\hat{M}_{1i}+O_{p}(m^{-1})]+\hat{M}_{2i}+o_{p}(m^{-1}) =\hat{M}_{1i}+\hat{M}_{2i}+O_{p}(m^{-1}).

This suggests that

E[\text{mspe}_{B}(\hat{Y}_{i}^{\text{EB}})]=\text{MSPE}(\hat{Y}_{i}^{\text{EB}})+O(m^{-1}).

4 Experiments

In this section, we investigate the performance of the EB predictors in comparison to the direct estimators through design-based and model-based simulation studies. In addition, we evaluate the MSPE estimators using both a jackknife and parametric bootstrap.

4.1 Design-Based Simulation Study

In this section, we consider a design-based simulation study using the CoG data set as described in Sec. 1.1.

4.1.1 Design-Based Simulation Setup

We describe the design-based simulation setup. The parameter of interest is the average number of full-time employees per government at the state level from the 2012 data set. The covariate is the average number of full-time employees per government at the state level from the 2007 data set. We observed skewed patterns in the average number of full-time employees in both 2007 and 2012, which motivates our proposed framework.

For the response variable, we select a total sample of 7,000 governmental units proportionally allocated to the states; for the covariates, we select a total sample of 70,000 units. The survey-weighted averages are then calculated. The measurement error variance $\Sigma_i$ was obtained from a Taylor series approximation, where $\text{Var}(x_i)$ was estimated from the variance formula for simple random sampling without replacement within each state. The $\psi_i$'s were estimated by a Generalized Variance Function (GVF) method (see Fay and Herriot (1979)). We assume the sampling variances are known throughout the estimation procedure.

For the design-based simulation, we draw 1,000 samples and estimate the parameters from each sample. We evaluate our proposed predictors by the empirical MSE for each state $i$:

\text{EMSE}(\hat{Y}_{i})=\frac{1}{R}\sum_{r=1}^{R}\Big[\hat{Y}_{i}^{(r)}-Y_{i}^{(r)}\Big]^{2},

where $R=1,000$ is the total number of replications, and $\hat{Y}_i$ is the estimator of $Y_i$. In addition, when the parametric bootstrap is used, we take $B=1,000$ bootstrap samples. We use the same numbers of replications and bootstrap samples in the design-based and model-based simulation studies.

4.1.2 Design-Based Simulation Results

In this section, we provide the results of the design-based simulation study.

Investigating the performance of the proposed estimators

Recall that the covariate of interest is the average number of full-time employees per government at the state level from the 2007 data set, and we wish to predict the average number of full-time employees per government at the state level in 2012. To do so, we give the predictors for each state as well as their corresponding EMSEs in Tables 1 and 2. More specifically, we compare the following three estimators:

1) $y_i$: the direct estimator,

2) $\tilde{Y}_i$: the EB predictor, treating the observed covariate $w_i$ as the true covariate and ignoring $\Sigma_i$ in our model,

3) $\hat{Y}_i^{\text{EB}}$: the EB predictor, accounting for the measurement error in $w_i$, where $\Sigma_i$ is included in our model.

We observe that in most cases $\text{EMSE}(\hat{Y}_i^{\text{EB}})$ is smaller than $\text{EMSE}(\tilde{Y}_i)$. However, we observe that our proposed EB predictor does not always outperform the direct estimator, which we further explore in our model-based simulation studies in Sec. 4.2.

Table 1: Estimators and their empirical MSEs from the CoG. Note that $n_i$ is the sample size per state, and the MSEs are rescaled logarithmically.
$i$ State $n_i$ $y_i$ $\tilde{Y}_i$ $\hat{Y}_i^{\text{EB}}$ $\text{EMSE}(y_i)$ $\text{EMSE}(\tilde{Y}_i)$ $\text{EMSE}(\hat{Y}_i^{\text{EB}})$
1 RI 10 191.641 202.928 204.907 6.523 5.390 5.103
2 AK 14 132.301 120.112 123.684 4.501 6.153 5.793
3 NV 15 420.299 422.992 431.912 8.077 8.170 8.449
4 MD 19 824.939 784.779 794.784 8.050 5.524 6.503
5 DE 27 64.273 72.674 73.130 3.003 5.113 5.182
6 LA 40 363.838 342.056 343.344 6.017 0.845 -2.863
7 VA 40 560.881 526.612 530.160 6.669 8.265 8.148
8 NH 44 80.695 83.645 83.857 -3.994 2.254 2.387
9 UT 47 126.045 117.729 118.512 2.827 2.873 2.461
10 AZ 49 332.610 349.704 350.915 6.270 7.380 7.439
11 CT 49 187.500 191.494 192.054 4.851 5.457 5.528
12 SC 53 231.790 221.881 222.574 5.158 6.279 6.218
13 WV 53 83.568 84.071 84.315 2.754 2.482 2.336
14 WY 55 43.084 39.629 39.795 2.493 3.873 3.824
15 VT 59 27.717 27.529 27.574 0.474 0.751 0.688
16 ME 65 38.892 40.713 40.789 2.966 1.899 1.840
17 NM 67 93.817 93.360 93.913 3.496 3.331 3.529
18 MA 70 237.066 231.213 231.999 4.218 1.741 2.310
19 TN 72 245.317 232.133 233.405 4.257 6.144 6.023
20 NC 76 372.264 357.791 359.375 5.846 6.997 6.700
21 MS 78 137.977 132.143 132.454 3.891 0.299 0.774
22 ID 92 46.450 45.642 45.784 2.451 1.910 2.016
23 AL 93 162.841 160.389 160.818 3.356 2.131 2.406
24 MT 99 23.296 22.457 22.508 0.461 1.481 1.433
25 KY 104 119.415 121.105 121.576 1.935 2.928 3.134
26 NJ 108 235.215 245.163 245.595 3.898 5.663 5.713
27 GA 109 255.141 243.862 244.539 5.686 6.696 6.648
Table 2: Estimators and their empirical MSEs from the CoG. Note that $n_i$ is the sample size per state, and the MSEs are rescaled logarithmically (continued).
$i$ State $n_i$ $y_i$ $\tilde{Y}_i$ $\hat{Y}_i^{\text{EB}}$ $\text{EMSE}(y_i)$ $\text{EMSE}(\tilde{Y}_i)$ $\text{EMSE}(\hat{Y}_i^{\text{EB}})$
28 AR 118 68.911 65.223 65.362 -3.160 2.719 2.646
29 OR 120 72.363 72.483 72.653 1.376 1.493 1.648
30 FL 124 377.822 365.035 366.658 7.658 8.148 8.092
31 OK 144 76.686 74.610 74.803 -2.447 1.155 0.926
32 WA 145 93.681 91.238 91.476 3.077 3.920 3.852
33 SD 153 14.757 14.078 14.142 -1.338 -3.583 -4.546
34 IA 155 56.219 55.686 55.790 -4.924 -0.962 -1.332
35 CO 184 74.373 71.579 71.908 0.789 2.907 2.746
36 NE 197 33.713 31.017 31.215 1.326 -0.561 -1.167
37 IN 217 75.994 78.712 78.830 0.619 0.609 0.775
38 ND 217 8.336 7.421 7.469 -1.704 -1.436 -1.644
39 MI 236 85.314 97.710 97.705 -1.220 4.945 4.944
40 WI 246 61.304 61.320 61.451 2.486 2.495 2.569
41 NY 270 280.982 278.784 283.973 6.457 6.275 6.681
42 MO 278 61.972 62.846 62.946 0.093 1.307 1.409
43 MN 289 42.025 45.747 45.727 -1.572 2.367 2.355
44 OH 302 108.261 111.756 111.905 2.364 3.820 3.864
45 KS 309 30.804 30.253 30.321 1.859 2.253 2.208
46 CA 338 238.284 248.558 248.800 6.991 6.244 6.223
47 TX 374 221.105 204.983 205.689 2.570 5.965 5.892
48 PA 386 81.653 83.551 83.668 2.888 3.628 3.666
49 IL 567 64.650 67.278 67.172 1.490 3.110 3.064
Jackknife versus parametric bootstrap estimators

Next, we consider the performance of the MSPE estimators, i.e., the jackknife and the bootstrap, with respect to the true MSE, i.e., $\text{EMSE}(\hat{Y}_i^{\text{EB}})$, in Figure 1. The results are given on the logarithmic scale, and we observe that the distribution of the jackknife is closer to the distribution of the true MSE than that of the bootstrap. Therefore, we recommend the jackknife, which only slightly overestimates the true MSE. As already mentioned, given that our proposed estimator does not uniformly beat the direct estimator in terms of the EMSE, we conduct a model-based simulation study in Sec. 4.2 to investigate this and provide further insight.

Figure 1: Left: The jackknife and bootstrap estimators versus the true MSE ($\text{EMSE}(\hat{Y}_i^{\text{EB}})$), where the results are rescaled logarithmically. Right: Box plots of the jackknife and bootstrap estimators and the true MSE ($\text{EMSE}(\hat{Y}_i^{\text{EB}})$), where the results are rescaled logarithmically. In general, the log of the jackknife is closer to the log of the true MSE.

4.2 Model-Based Simulation Study

In this section, we describe our model-based simulation study. First, we further investigate the performance of the proposed EB predictor $\hat{Y}_i^{\text{EB}}$. Second, we compare the proposed jackknife and parametric bootstrap estimators, $\text{mspe}_J(\hat{Y}_i^{\text{EB}})$ and $\text{mspe}_B(\hat{Y}_i^{\text{EB}})$. Third, we investigate how often the variance estimates $\hat{\sigma}_{\nu}^2$ are zero. Finally, we investigate how the regression parameter changes when $\Sigma_i$ is misspecified. Our goal in this model-based simulation study is to understand how one could improve the EB predictor through future research, and to further understand its underlying behavior.

4.2.1 Model-Based Simulation Setup

In this section, we provide the setup of our model-based simulation study in Table 3. This setup follows Eqs. (1) and (2). We are interested in comparing the following four estimators:

1) $y_i$: the direct estimator,

2) $\hat{Y}_i$: the EB predictor based on the true covariate $W_i$,

3) $\tilde{Y}_i$: the EB predictor, treating the observed covariate $w_i$ as the true covariate and ignoring $\Sigma_i$ in our model,

4) $\hat{Y}_i^{\text{EB}}$: the EB predictor, accounting for the measurement error in $w_i$, where $\Sigma_i$ is included in our model.

We compare these four estimators (for each area $i$) using the empirical MSE:

\text{EMSE}(\hat{Y}_{i})=\frac{1}{R}\sum_{r=1}^{R}\Big[\hat{Y}_{i}^{(r)}-Y_{i}^{(r)}\Big]^{2},

where $\hat{Y}_i$ is the estimator of $Y_i$.

Table 3: Model-based simulation setup with definitions of parameters and distributions
Simulation Setup:
Generate $W_i$ from a Normal(5, 9) and $\psi_i$ from a Gamma(4.5, 2)
Take $\theta_i=3W_i+\nu_i$, $z_i=\theta_i+e_i$, and $w_i=W_i+\eta_i$
$\nu_i\sim\text{Normal}(0,\sigma^2_{\nu})$, $e_i\sim\text{Normal}(0,\psi_i)$, and $\eta_i\sim\text{Normal}(0,\Sigma_i)$
Take $y_i=\exp(z_i)$ and $Y_i=\exp(\theta_i)$
Parameter Definitions:
Let $m=20,50,100,$ and $500$ (number of small areas)
Let $\sigma^2_{\nu}=2$ (for all cases)
Let $k\in\{0,20,50,80,\text{and}\;100\}$
$\Sigma_i\in\{0,d\}$, where $d=2$ or $4$
Allow $k\%$ of the $\Sigma_i$'s, chosen at random, to equal $d$ and the rest to equal 0.
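For concreteness, a minimal sketch in Python of one replicate of the Table 3 setup follows (with the single covariate $p=1$, so $\Sigma_i$ is a scalar variance); the Normal distributions are parameterized by mean and variance as in the table, and reading Gamma(4.5, 2) as shape-scale is our assumption.

```python
import numpy as np

rng = np.random.default_rng(2021)

def simulate_replicate(m, k_percent, d, sigma2_nu=2.0, beta=3.0):
    """Generate (z, w, psi, Sigma, Y) for one replicate of Table 3."""
    W = rng.normal(5.0, 3.0, m)             # Normal(5, 9): sd = 3
    psi = rng.gamma(4.5, 2.0, m)            # Gamma(4.5, 2), read shape-scale
    Sigma = np.zeros(m)
    n_me = int(round(k_percent / 100 * m))  # k% of areas get variance d
    Sigma[rng.choice(m, size=n_me, replace=False)] = d
    nu = rng.normal(0.0, np.sqrt(sigma2_nu), m)
    e = rng.normal(0.0, np.sqrt(psi))       # sampling errors
    eta = rng.normal(0.0, np.sqrt(Sigma))   # measurement errors
    theta = beta * W + nu
    return theta + e, W + eta, psi, Sigma, np.exp(theta)
```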

In order to evaluate the jackknife and parametric bootstrap estimators of the MSPE of $\hat{Y}_i^{\text{EB}}$, we consider the relative biases, denoted by $\text{RB}_J(\hat{Y}_i^{\text{EB}})$ and $\text{RB}_B(\hat{Y}_i^{\text{EB}})$, respectively. More specifically, the relative biases are defined for each area $i$ as follows:

\text{RB}_{J}(\hat{Y}_{i}^{\text{EB}}) =\Big\{\frac{1}{R}\sum_{r=1}^{R}\text{mspe}_{J}^{(r)}(\hat{Y}_{i}^{\text{EB}(r)})-\text{EMSE}(\hat{Y}_{i}^{\text{EB}})\Big\}\Big/\text{EMSE}(\hat{Y}_{i}^{\text{EB}}),
\text{RB}_{B}(\hat{Y}_{i}^{\text{EB}}) =\Big\{\frac{1}{R}\sum_{r=1}^{R}\text{mspe}_{B}^{(r)}(\hat{Y}_{i}^{\text{EB}(r)})-\text{EMSE}(\hat{Y}_{i}^{\text{EB}})\Big\}\Big/\text{EMSE}(\hat{Y}_{i}^{\text{EB}}).

4.2.2 Model-Based Simulation Results

In this section, we summarize our results of the model-based simulation study.

Investigating the performance of the proposed estimators

In this section, we investigate the performance of the proposed estimators. Table 4 provides the four estimators given in Sec. 4.2.1 along with their empirical MSEs, where we average the results over all the small areas and rescale them logarithmically. When $k=0$, the MSEs for all EB predictors are the same, since the term $\Sigma_i$ vanishes and $w_i$ is the same as $W_i$. Overall, as the value of $k$ increases, the empirical MSE increases for almost all predictors. We observe that there are cases in which the EB predictors cannot outperform the direct estimators in the simulation results.

Table 4 illustrates that there are cases in which $\text{EMSE}(\hat{Y}_i^{\text{EB}})$ is larger than $\text{EMSE}(y_i)$. In fact, the EB predictors cannot outperform the direct estimators due to propagated errors in the term $\bm{\beta}^{\top}\Sigma_i\bm{\beta}$, which enters the EB predictors through the term $\gamma_i$; see Eq. (3). Therefore, as the measurement error variance $\Sigma_i$ increases, the MSE of our proposed EB predictors can also increase. This is the main point one should keep in mind when using a log model with measurement error. In order to prevent such behavior, a further adjustment should be made to the EB predictors, which we discuss in Sec. 5.

Table 4: Estimators and their empirical MSEs from the model-based simulations. The results are averaged over all the small areas and rescaled logarithmically.
$m$ $k$ $y_i$ $\hat{Y}_i$ $\tilde{Y}_i$ $\hat{Y}_i^{\text{EB}}$ $\text{EMSE}(y_i)$ $\text{EMSE}(\hat{Y}_i)$ $\text{EMSE}(\tilde{Y}_i)$ $\text{EMSE}(\hat{Y}_i^{\text{EB}})$
20 0 46.766 50.626 50.626 50.626 102.088 111.09 111.09 111.09
20 53.054 50.139 49.229 49.864 115.974 109.338 104.398 108.425
50 42.232 41.174 42.464 43.110 93.496 91.163 92.948 94.223
80 42.519 44.285 43.469 44.794 93.881 97.557 95.152 97.495
100 44.682 41.81 45.073 46.331 99.13 92.161 99.938 102.48
50 0 49.624 47.732 47.732 47.732 110.061 106.167 106.167 106.167
20 44.292 42.851 44.702 45.098 97.644 94.541 99.321 100.255
50 45.512 44.677 46.071 47.59 99.615 98.56 101.351 105.547
80 42.703 41.773 45.289 46.469 94.339 93.268 101.068 103.615
100 43.83 43.201 44.779 45.68 97.144 95.961 98.347 100.319
100 0 42.635 42.241 42.241 42.241 94.625 92.802 92.802 92.802
20 46.216 45.601 46.179 47.427 103.264 101.412 103.035 106.172
50 50.93 45.982 49.347 48.519 113.343 103.08 109.718 108.214
80 50.132 46.586 48.137 48.966 111.618 103.055 106.712 108.454
100 44.925 44.711 48.009 49.031 100.996 100.764 107.472 109.515
500 0 47.338 45.253 45.253 45.253 107.275 103.465 103.465 103.465
20 46.382 45.369 47.575 47.652 104.607 102.867 107.896 108.169
50 53.045 46.854 50.662 49.126 119.208 106.396 114.378 110.006
80 47.766 44.868 47.706 49.950 108.289 104.921 107.454 112.805
100 48.586 45.313 49.372 50.449 109.795 103.378 111.069 113.218
Jackknife versus parametric bootstrap estimators

We compare the jackknife MSPE estimator of the EB predictor $\hat{Y}^{\text{EB}}_i$ to the bootstrap estimator using the relative bias (see Table 5). In addition, we consider box plots of the jackknife and bootstrap MSPE estimators of the EB predictor $\hat{Y}^{\text{EB}}_i$, which we compare to box plots of the true values (see Figure 3). Both Table 5 and Figure 3 illustrate that the bootstrap produces a large number of negative values, which is due to the construction of $\hat{M}_{1i}$. Here, we find that the bootstrap grossly underestimates the true values, whereas the jackknife slightly overestimates them. This could be due to generating data from the normal distribution and the non-linear transformation in the model. Thus, we recommend the jackknife in practice.

Proportion of zero estimates of $\hat{\sigma}_{\nu}^2$

Here, we investigate the proportion of zero estimates of $\sigma^2_{\nu}$ obtained from iteratively solving Eqs. (4) and (5). Figure 2 illustrates that as the number of small areas increases, the proportion of zero estimates decreases. More specifically, when $m=20$ and as $k$ increases, $\hat{Y}_i^{\text{EB}}$ and $\hat{Y}_i$ tend to have a proportion of zero estimates of $\sigma^2_{\nu}$ between 0.3 and 0.5. When $m=50$ and as $k$ increases, the proportion is between 0.15 and 0.4; when $m=100$, between 0.05 and 0.3; and when $m=500$, between 0 and 0.05. One should be cautious of this in practical applications, and adjusting for it is of interest for future work.

Figure 2: The proportion of zero estimates of $\sigma^2_{\nu}$ from the model-based simulation with 1,000 replications, for $k=0,\ldots,100$, $m=20,50,100,500$, and $d=2$.
The effect of misspecification of $\Sigma_i$ on $\beta$

We investigate the effect of misspecifying the variance $\Sigma_i$ on the estimation of the regression parameter $\beta$. To accomplish this, we conduct an empirical study based on the model-based simulation setup in Table 3 for the EB predictor $\hat{Y}^{\text{EB}}_i$. Assume $\beta=3$, and consider two sets of experiments for each value of $k$, summarized in Table 6. Recall that $\Sigma_i\in\{0,d\}$. Denote the first set of experiments by $A1,B1,C1,D1$, and $E1$, where we assume $d=2$, and denote the misspecified value of $d$ by $d_{\text{mis}}=4$. Denote the second set of experiments by $A2,B2,C2,D2$, and $E2$, where we assume $d=4$ and $d_{\text{mis}}=2$. We conduct both sets of experiments for $m=20$ and $500$. For each experiment, we estimate the unknown parameter $\beta$ under (1) the true value of $d$, yielding $\hat{\beta}$, and (2) the misspecified value $d_{\text{mis}}$, yielding $\hat{\beta}_{\text{mis}}$.

We then compute the average absolute difference between the respective $\beta$'s:

100\times\frac{1}{R}\sum_{r=1}^{R}\Big|\hat{\beta}^{(r)}-\hat{\beta}_{\text{mis}}^{(r)}\Big|.

In addition, we compute the magnitude of the bias of $\hat{\beta}$ and $\hat{\beta}_{\text{mis}}$ with respect to the true value $\beta=3$ as follows:

100\times\frac{1}{R}\sum_{r=1}^{R}\Big(\hat{\beta}^{(r)}-3\Big)\quad\text{and}\quad 100\times\frac{1}{R}\sum_{r=1}^{R}\Big(\hat{\beta}^{(r)}_{\text{mis}}-3\Big).

Table 6 illustrates that misspecification of $\Sigma_i$ leads to bias in $\beta$. When the magnitude of the measurement error is zero (i.e., $k=0$), there is no difference between the $\beta$ estimated using $d$ and that using $d_{\text{mis}}$. On the other hand, as $k$ increases and there is more uncertainty in the error variance $\Sigma_i$, the values of $\hat{\beta}$ and $\hat{\beta}_{\text{mis}}$ diverge more from one another, and the magnitude of the bias increases. Also, we observe that as the number of small areas increases, the bias decreases. One could resolve this bias issue by constructing an adaptive estimator for $\hat{\beta}_{\text{mis}}$ whose bias is corrected through techniques such as the bootstrap, and by developing a test of parameter specification.

Table 5: Comparison of the proposed jackknife and bootstrap estimators from the model-based simulations. The results are averaged over all small areas and rescaled by the logarithm of the absolute value. Note that "*" denotes that the original value is negative.
$m$ $k$ $\text{EMSE}(\hat{Y}_i^{\text{EB}})$ $\text{mspe}_J(\hat{Y}_i^{\text{EB}})$ $\text{mspe}_B(\hat{Y}_i^{\text{EB}})$ $\text{RB}_J(\hat{Y}_i^{\text{EB}})$ $\text{RB}_B(\hat{Y}_i^{\text{EB}})$
20 0 111.09 120.731 118.478* 9.641 7.389*
20 108.425 115.222 115.884* 6.796 7.459*
50 94.223 107.245 111.255* 13.021 17.032*
80 97.495 113.06 129.772* 15.565 32.277*
100 102.48 115.536* 119.84* 13.056* 17.36*
50 0 106.167 112.318* 116.383* 6.154* 10.216*
20 100.255 110.449 106.743* 10.194 6.489*
50 105.547 115.365* 118.838* 9.818* 13.291*
80 103.615 114.127 112.717* 10.512 9.102*
100 100.319 110.454* 112.552* 10.134* 12.233*
100 0 92.802 98.67 102.932* 5.866 10.13*
20 106.172 106.788* 108.623* 1.048* 2.534*
50 108.214 119.666 120.032* 11.452 11.819*
80 108.454 117.223* 119.418* 8.769* 10.964*
100 109.515 119.028 117.071* 9.513 7.557*
500 0 103.465 108.38 110.455* 4.908 6.991*
20 108.169 119.672 115.892 11.502 7.722
50 110.006 121.191 125.754* 11.184 15.747*
80 112.805 125.672* 129.761* 12.867* 16.955*
100 113.217 123.037* 124.597* 9.819* 11.379*
Figure 3: Comparison of the distributions of the jackknife and bootstrap estimators with respect to the true MSE ($\text{EMSE}(\hat{Y}_i^{\text{EB}})$) from the model-based simulations. The results are logarithmically rescaled.
Table 6: Percentage of bias resulting from misspecifying the error variance \Sigma_{i} when estimating \beta in the EB predictor \hat{Y}^{\text{EB}}_{i}, from model-based simulations. For all cases, the true value of \beta is 3 and \sigma^{2}_{\nu}=2; results are shown for m=20 (the smallest number of areas considered) and m=500 (the largest). Each entry is the Monte Carlo average over R replicates, multiplied by 100 (cf. the displayed formulas above).
m    k    Experiment                    |\hat{\beta}-\hat{\beta}_{\text{mis}}|   \hat{\beta}-3   \hat{\beta}_{\text{mis}}-3
20   0    A1 (d=2, d_{\text{mis}}=4)    0         -0.026    -0.026
20   0    A2 (d=4, d_{\text{mis}}=2)    0         -0.237    -0.237
20   20   B1 (d=2, d_{\text{mis}}=4)    6.034     0.314     -0.046
20   20   B2 (d=4, d_{\text{mis}}=2)    5.814     -0.634    -0.591
20   50   C1 (d=2, d_{\text{mis}}=4)    11.287    -0.184    -0.864
20   50   C2 (d=4, d_{\text{mis}}=2)    10.977    -0.157    -0.394
20   80   D1 (d=2, d_{\text{mis}}=4)    19.641    0.842     2.177
20   80   D2 (d=4, d_{\text{mis}}=2)    18.722    0.712     0.327
20   100  E1 (d=2, d_{\text{mis}}=4)    25.774    2.650     4.285
20   100  E2 (d=4, d_{\text{mis}}=2)    25.967    4.020     2.937
500  0    A1 (d=2, d_{\text{mis}}=4)    0         0.185     0.185
500  0    A2 (d=4, d_{\text{mis}}=2)    0         -0.082    -0.082
500  20   B1 (d=2, d_{\text{mis}}=4)    1.136     0.174     0.118
500  20   B2 (d=4, d_{\text{mis}}=2)    1.107     0.010     0.009
500  50   C1 (d=2, d_{\text{mis}}=4)    2.200     -0.138    0.026
500  50   C2 (d=4, d_{\text{mis}}=2)    2.182     -0.044    0.006
500  80   D1 (d=2, d_{\text{mis}}=4)    3.365     0.204     -0.067
500  80   D2 (d=4, d_{\text{mis}}=2)    3.386     -0.139    -0.090
500  100  E1 (d=2, d_{\text{mis}}=4)    5.086     0.174     0.152
500  100  E2 (d=4, d_{\text{mis}}=2)    4.941     -0.022    0.117

5 Discussion

In this paper, in order to stabilize the skewness and achieve normality in the response variable, we have first proposed an area-level log-measurement error model on the response variable, together with a measurement error model on the covariates. Second, under this modeling framework, we derived the EB predictor of positive small area quantities when the covariates contain measurement error. Third, we proposed corresponding estimators of the MSPE using a jackknife and a bootstrap method, and we showed that the order of the bias is O(m^{-1}), where m is the number of small areas. Fourth, we illustrated the performance of our methodology in both design-based and model-based simulation studies, where the EMSE of the proposed EB predictor is not uniformly better than that of the direct estimator. Our model-based simulation studies provide further investigation of, and guidance on, this behavior. One fruitful area of future research would be a correction to the EB predictor that avoids such behavior. One way to address this issue is to estimate \phi=(\bm{\beta},\sigma^{2}_{\nu})^{\top} in such a way that its order of bias is smaller than O(m^{-1}), which could reduce the amount of error propagated into the EB predictor \hat{Y}_{i}^{\text{EB}}. Another way, which is less theoretically burdensome, is to estimate the covariate itself, since we have assumed it follows a functional measurement error model rather than a structural one. We have studied the MSPE of the EB predictor using both the jackknife and the bootstrap in simulation studies, where we have shown that the jackknife estimator performs better than the bootstrap estimator under our log model.
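For reference, jackknife MSPE estimators of this type typically take the general form of Jiang et al. (2002). Writing \hat{\theta}_{i}(\hat{\phi}) for the EB predictor, b_{i}(\hat{\phi}) for an estimator of the leading MSPE term, and \hat{\phi}_{-u} for the parameter estimate computed with area u deleted, one generic version (a sketch of the standard construction, not necessarily the exact correction terms derived earlier in the paper) is

\text{mspe}_{J}(\hat{\theta}_{i}) = b_{i}(\hat{\phi}) - \frac{m-1}{m}\sum_{u=1}^{m}\Big{(}b_{i}(\hat{\phi}_{-u})-b_{i}(\hat{\phi})\Big{)} + \frac{m-1}{m}\sum_{u=1}^{m}\Big{(}\hat{\theta}_{i}(\hat{\phi}_{-u})-\hat{\theta}_{i}(\hat{\phi})\Big{)}^{2},

where the first two terms bias-correct the plug-in leading term and the last term accounts for the variability introduced by estimating \phi. The parametric bootstrap analogue replaces the delete-one replicates with replicates generated from the fitted model.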

References

  • Arima, S., Bell, W. R., Datta, G. S., Franco, C. and Liseo, B. 2017. Multivariate Fay–Herriot Bayesian estimation of small area means under functional measurement error. Journal of the Royal Statistical Society: Series A, 180 (4): 1191–1209.
  • Arima, S., Datta, G. S. and Liseo, B. 2015a. Accounting for measurement error in covariates in SAE: an overview. In Analysis of Poverty Data by Small Area Methods (M. Pratesi, Ed.), Wiley, New York.
  • Arima, S., Datta, G. S. and Liseo, B. 2015b. Bayesian estimators for small area models when auxiliary information is measured with error. Scandinavian Journal of Statistics, 42 (2): 518–529.
  • Bellow, M. E. and Lahiri, P. S. 2011. An empirical best linear unbiased prediction approach to small area estimation of crop parameters. National Agricultural Statistics Service, University of Maryland.
  • Berg, E. and Chandra, H. 2014. Small area prediction for a unit-level lognormal model. Computational Statistics & Data Analysis, 78: 159–175.
  • Butar, F. B. and Lahiri, P. 2003. On measures of uncertainty of empirical Bayes small-area estimators. Journal of Statistical Planning and Inference, 112 (2): 63–76.
  • Chandra, H. and Chambers, R. 2011. Small area estimation for skewed data in the presence of zeros. Calcutta Statistical Association Bulletin, 63 (4): 241–258.
  • Fay, R. E. and Herriot, R. A. 1979. Estimates of income for small places: an application of James–Stein procedures to census data. Journal of the American Statistical Association, 74 (366a): 269–277.
  • Fuller, W. A. 2006. Measurement Error Models. John Wiley & Sons.
  • Ghosh, M., Kubokawa, T. and Kawakubo, Y. 2015. Benchmarked empirical Bayes methods in multiplicative area-level models with risk evaluation. Biometrika, 102 (3): 647–659.
  • Householder, A. S. 1970. The Numerical Treatment of a Single Nonlinear Equation. McGraw-Hill.
  • Jiang, J., Lahiri, P. and Wan, S. M. 2002. A unified jackknife theory for empirical best prediction with M-estimation. The Annals of Statistics, 30 (6): 1782–1810.
  • Lohr, S. L. and Rao, J. 2009. Jackknife estimation of mean squared error of small area predictors in nonlinear mixed models. Biometrika, 96: 457–468.
  • Pfefferman, D. 2013. New important developments in small area estimation. Statistical Science, 28 (1): 40–68.
  • Rao, J. N. and Molina, I. 2015. Small Area Estimation. Wiley Series in Survey Sampling, London.
  • Slud, E. V. and Maiti, T. 2006. Mean-squared error estimation in transformed Fay–Herriot models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68 (2): 239–257.
  • Wang, J. and Fuller, W. A. 2003. The mean squared error of small area predictors constructed with estimated area variances. Journal of the American Statistical Association, 98 (463): 716–723.
  • Ybarra, L. M. and Lohr, S. L. 2008. Small area estimation when auxiliary information is measured with error. Biometrika, 95 (4): 919–931.