
Estimating overidentified linear models with heteroskedasticity and outliers

Lei Bill Wang
Abstract

A large degree of overidentification causes severe bias in TSLS. A conventional heuristic rule used to motivate new estimators in this context is approximate bias. This paper formalizes the definition of approximate bias and expands its applicability to various classes of estimators that bridge OLS, TSLS, and Jackknife IV estimators (JIVEs). By evaluating their approximate biases, I propose new approximately unbiased estimators, including UOJIVE1 and UOJIVE2. UOJIVE1 can be interpreted as a generalization of the existing estimator UIJIVE1. Both UOJIVEs are proven to be consistent and asymptotically normal under a fixed number of instruments and controls. The asymptotic proofs for UOJIVE1 in this paper require the absence of high leverage points, whereas the proofs for UOJIVE2 do not. In addition, UOJIVE2 is consistent under many-instrument asymptotics. The simulation results align with the theorems in this paper: (i) both UOJIVEs perform well under many-instrument scenarios with or without heteroskedasticity; (ii) when a high leverage point coincides with a high variance of the error term, an outlier is generated and the performance of UOJIVE1 is much poorer than that of UOJIVE2.

Keywords: Approximate bias, instrumental variables (IV), overidentification, k-class estimator, Jackknife IV estimator (JIVE), many-instrument asymptotics, robustness to outliers

1 Introduction

Overidentified two-stage least squares (TSLS) estimation is commonplace in economics research. Mogstad et al. (2021) report that from January 2000 to October 2018, 57 papers in the American Economic Review, the Quarterly Journal of Economics, the Journal of Political Economy, Econometrica, and the Review of Economic Studies adopted overidentified TSLS. Unfortunately, overidentification introduces a bias problem for TSLS. The intuition for the relationship between bias and the number of IVs can be illustrated with a pathological example in which a researcher adds so many instruments to the first-stage regression that the number of first-stage regressors (which include both the IVs and the exogenous control variables) equals the number of observations. The first-stage regression then fits perfectly, and its fitted values are exactly the observed values of the endogenous variable. In this pathological case, TSLS coincides with OLS, and so TSLS is just as biased as OLS.

To reduce or even completely resolve the problem, one may want to evaluate the bias of TSLS. However, evaluating the exact bias of estimators under endogeneity requires knowledge of the distribution of the observable and unobservable variables. For example, Anderson and Sawa (1982) and Harding et al. (2016) assume that the error terms of the simultaneous equations are jointly normal, which allows them to derive the finite sample distribution of TSLS and, therefore, its bias. However, such assumptions are often stronger than economists are willing to impose. Therefore, many econometricians resort to a different evaluation criterion called “approximate bias”. Many past works on IV estimators have evaluated the approximate bias of TSLS and used it to motivate new estimators (Nagar, 1959; Fuller, 1977; Buse, 1992; Angrist et al., 1999; Ackerberg and Devereux, 2009). The idea behind approximate bias is to divide the difference between an estimator and the target parameter into two parts. One part is of a higher stochastic order than the other and is dropped from the subsequent calculation. The expectation of the remaining lower-order part is called the approximate bias.

This paper formalizes the definition of approximate bias and shows why this definition is a reasonable heuristic rule for motivating new estimators. The definition of approximate bias proposed by Nagar (1959) applies only to k-class estimators. On the other hand, the definition proposed in the working paper of Angrist et al. (1999) (later referred to as AIK 1995) and used in Ackerberg and Devereux (2009) (later referred to as AD 2009) applies to a large class of estimators which includes OLS, TSLS, all k-class estimators, JIVE1, IJIVE1, and UIJIVE1. (IJIVE1 and UIJIVE1 are originally termed IJIVE and UIJIVE, respectively, by Ackerberg and Devereux (2009); I attach the number 1 to their names for comparison purposes that will become clear in later sections of this paper.) I show that the definition of approximate bias in AIK 1995 and AD 2009 is valid for an even larger class of estimators which additionally includes Fuller (1977), JIVE2, Hausman et al. (2012), and many other classes of estimators that bridge OLS, TSLS, and JIVE2.

By expanding the applicability of approximate bias to additional classes of estimators, I propose new estimators that are approximately unbiased. This paper focuses on two of them, named UOJIVE1 and UOJIVE2. UOJIVE1 can be interpreted as a generalization of UIJIVE1. This paper shows that both UOJIVE1 and UOJIVE2 are consistent and asymptotically normal under a fixed number of instruments as the sample size goes to infinity. However, UOJIVE1's asymptotic proofs rely on an assumption that rules out high leverage points; UOJIVE2 does not require such an assumption. In addition, Theorem 1 of Hausman et al. (2012) applies directly to UOJIVE2, which proves that UOJIVE2 is consistent even under many-instrument asymptotics. Simulation results align with the theoretical results: UOJIVE1 and UOJIVE2 perform similarly well under a large degree of overidentification and heteroskedasticity, and UOJIVE2's performance is much more stable than UOJIVE1's when an outlier (a high leverage point coinciding with a large variance of the error term) is deliberately introduced in the DGP.

The paper is organized as follows. Section 2 describes the problem setup and existing estimators in the approximate bias literature. Section 3 defines approximate bias for a large class of estimators and explains the importance of this formal definition. Sections 4 and 5 propose new estimators that are approximately unbiased. Section 6 shows that (i) under a fixed number of instruments, UOJIVE1 and UOJIVE2 are consistent and asymptotically normal, and (ii) under many-instrument asymptotics, UOJIVE2 is consistent. Sections 7 and 8 demonstrate the superior performance of UOJIVE2 in simulation and empirical studies. Section 9 concludes the paper.

2 Model setup and existing estimators

2.1 Overidentified linear model setup

The model concerns a typical endogeneity problem:

y_i = X_i^*\beta^* + W_i\gamma^* + \epsilon_i \qquad (1)
X_i^* = Z_i^*\pi^* + W_i\delta^* + \eta_i \qquad (2)

Eq.(1) contains the parameter of interest $\beta^*$. $y_i$ is the response/outcome/dependent variable and $X_i^*$ is an $L_1$-dimensional row vector of endogenous covariates/explanatory/independent variables. The exogenous control variables, denoted $W_i$, form an $L_2$-dimensional row vector. Eq.(2) relates the endogenous explanatory variables $X_i^*$ to the instrumental variables $Z_i^*$ and the included exogenous control variables $W_i$ from Eq.(1). $Z_i^*$ is a $K_1$-dimensional row vector, where $K_1 \geq L_1$. Since this paper focuses on overidentified cases, I assume $K_1 > L_1$ throughout the rest of the paper. As in Hausman et al. (2012) and Bekker and Crudu (2015), this paper treats $Z$ as nonrandom (alternatively, $Z$ can be assumed to be random but conditioned on, as in Chao et al. (2012)). $X_i^*$ is endogenous: ${\rm Cov}(\epsilon_i, X_i^*) = {\rm Cov}(\epsilon_i, Z_i^*\pi^* + \eta_i) = {\rm Cov}(\epsilon_i, \eta_i) = \sigma_{\epsilon\eta} \neq 0$. I further assume that the pairs $(\epsilon_i, \eta_i)$ are independently and identically distributed with mean zero and covariance matrix $\begin{pmatrix}\sigma_\epsilon^2 & \sigma_{\epsilon\eta}\\ \sigma_{\eta\epsilon} & \sigma_\eta^2\end{pmatrix}$, where $\sigma_\epsilon^2$ is a scalar, $\sigma_{\epsilon\eta}$ is an $L_1$-dimensional vector, and $\sigma_\eta^2$ is an $L_1\times L_1$ matrix. I also impose the relevance condition $\pi^* \neq 0$. In matrix notation, Eqs.(1) and (2) become:

y = X^*\beta^* + W\gamma^* + \epsilon \qquad (3)
X^* = Z^*\pi^* + W\delta^* + \eta \qquad (4)

where $y$ and $\epsilon$ are $(N\times 1)$ column vectors; $X^*$ and $\eta$ are $(N\times L_1)$ matrices; $W$ is $(N\times L_2)$ and $Z^*$ is an $(N\times K_1)$ matrix, where $N$ is the number of observations. I also define the following notation for convenience:

X = [X^*\quad W], \qquad \beta = \begin{pmatrix}\beta^*\\ \gamma^*\end{pmatrix}
Z = [Z^*\quad W], \qquad \pi = \begin{pmatrix}\pi^* & 0_{K_1\times L_2}\\ \delta^* & I_{L_2}\end{pmatrix}

Then, we have the following equivalent expressions for Eqs. (3) and (4):

y = X\beta + \epsilon \qquad (5)
X = Z\pi + \eta \qquad (6)

It is also useful to define the following partialled-out versions of the variables:

\tilde{y} = y - W(W'W)^{-1}W'y
\tilde{X} = X^* - W(W'W)^{-1}W'X^*
\tilde{Z} = Z^* - W(W'W)^{-1}W'Z^*

2.2 Existing estimators

IV estimators are often used to solve this simultaneous-equations problem. Table 1 tabulates some of the existing IV estimators that this paper repeatedly refers to. They all have the matrix expression $(X'C'X)^{-1}(X'C'y)$. The caption of Table 1 summarizes how these estimators are linked to each other.

Estimators    C
OLS           $I$
TSLS          $P_Z$
k-class       $(1-k)I + kP_Z$
JIVE2         $P_Z - D$
JIVE1         $(I-D)^{-1}(P_Z - D)$
IJIVE1        $(I-\tilde{D})^{-1}(P_{\tilde{Z}} - \tilde{D})$
UIJIVE1       $(I-\tilde{D}+\omega I)^{-1}(P_{\tilde{Z}} - \tilde{D} + \omega I)$
Table 1: All these IV estimators are of the analytical form $(X'C'X)^{-1}(X'C'y)$. $D$ is the diagonal matrix of the projection matrix $P_Z = Z(Z'Z)^{-1}Z'$. $\tilde{Z}$ is $Z^*$ partialled out by $W$, $\tilde{Z} = Z^* - W(W'W)^{-1}W'Z^*$. $P_{\tilde{Z}}$ is the projection matrix of $\tilde{Z}$ and $\tilde{D}$ is the diagonal matrix of $P_{\tilde{Z}}$. JIVE2 modifies TSLS by removing the diagonal entries of the projection matrix $P_Z$. JIVE1 adds a rowwise division operation in front of the $C$ matrix of JIVE2. IJIVE1 is essentially equivalent to JIVE1; the only difference is that IJIVE1 takes $(\tilde{y},\tilde{X},\tilde{Z})$ instead of $(y,X,Z)$. Its closed form is $(\tilde{X}'(I-\tilde{D})^{-1}(P_{\tilde{Z}}-\tilde{D})\tilde{X})^{-1}\tilde{X}'(I-\tilde{D})^{-1}(P_{\tilde{Z}}-\tilde{D})\tilde{y}$. IJIVE1 reduces the approximate bias of JIVE1. UIJIVE1 further reduces the approximate bias of IJIVE1 by adding a constant $\omega$ to the diagonal of the inverse term and to the term post-multiplied to the inverse in the $C$ matrix, with $\omega = \frac{L_1+1}{N}$.
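To make the shared structure concrete, the following sketch (my own illustrative NumPy code, not from the paper; the function names are mine) builds a few of the $C$ matrices in Table 1 and applies the common closed form $(X'C'X)^{-1}(X'C'y)$:

```python
import numpy as np

def iv_estimate(X, y, C):
    """Generic estimator of the form (X'C'X)^{-1} (X'C'y)."""
    A = X.T @ C.T
    return np.linalg.solve(A @ X, A @ y)

def c_matrices(Z, omega):
    """C matrices of TSLS, JIVE2, JIVE1, and a UIJIVE1-style shrinkage from Table 1."""
    N = Z.shape[0]
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # projection matrix P_Z
    D = np.diag(np.diag(P))                    # diagonal of P_Z
    I = np.eye(N)
    return {
        "TSLS": P,
        "JIVE2": P - D,
        "JIVE1": np.linalg.solve(I - D, P - D),                          # rowwise division by 1 - D_i
        "UIJIVE1-type": np.linalg.solve(I - D + omega * I, P - D + omega * I),
    }
```

For UIJIVE1 proper, one would first partial out $W$ from $(y, X^*, Z^*)$ and set $\omega = \frac{L_1+1}{N}$, as described in the caption.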

3 Approximate bias

IV estimators are often employed to estimate $\beta^*$ (or $\beta$) in Section 2. The most commonly used IV estimator is TSLS, which has a bias problem when the degree of overidentification is large. Unfortunately, completely removing the bias of overidentified TSLS is generally infeasible unless economists are willing to assume parametric families for the instrumental variables $Z$. Therefore, econometricians often resort to a concept called approximate bias. For example, JIVE1, JIVE2, IJIVE1, and UIJIVE1 from Table 1 are all motivated by reducing approximate bias. The intuition behind the idea is to divide the difference between an estimator, $\hat{\beta}$, and the true parameter, $\beta$, that the estimator aims to estimate into two parts. One part is of a higher stochastic order than the other and, therefore, is dropped from the subsequent approximate bias calculation. The other part, of lower stochastic order, has an easy-to-evaluate expectation; this expectation is called the approximate bias.

3.1 Definition of approximate bias

The following definition of approximate bias has been used by AIK 1995 and AD 2009 to motivate their development of new estimators whose $C$ matrix satisfies the property $CZ = Z$ (and hence $CX = Z\pi + C\eta$):

Definition 1.

The approximate bias of an IV estimator is $E[R_N]$ where

R_N = J\epsilon - \frac{Q_0}{N}\pi'Z'\eta J\epsilon + \frac{Q_0}{N}\eta'C'\epsilon - \frac{Q_0}{N}\eta'P_{Z\pi}\epsilon

in which $Q_0 = \lim_{N\to\infty}(\pi'\frac{Z'Z}{N}\pi)^{-1}$, $J = (\pi'Z'Z\pi)^{-1}\pi'Z'$, and $P_{Z\pi} = Z\pi(\pi'Z'Z\pi)^{-1}\pi'Z'$.

For readers' convenience, I justify in Appendix A why this definition can be used as a reasonable heuristic rule for motivating new estimators. Note that both AIK 1995 and AD 2009 assume that $CZ = Z$, which restricts the application of Definition 1. The condition $CZ = Z$ is satisfied by OLS, TSLS, all k-class estimators, JIVE1, IJIVE1, and UIJIVE1, but not by many other estimators, such as JIVE2, HLIM and HFUL from Hausman et al. (2012), and the $\lambda_2$- and $\omega_2$-class estimators that I introduce in later sections of the paper. In Appendix B, I show that the condition $CZ = Z$ is not necessary: Definition 1 remains a reasonable heuristic rule for the aforementioned estimators that do not satisfy $CZ = Z$.

3.2 Approximate bias of existing estimators

Definition 1 yields Corollary 1 and Definitions 2 and 3. I will use them to analyze the existing IV estimators in Table 1.

Corollary 1.

The approximate bias of an IV estimator of the form $(X'C'X)^{-1}(X'C'y)$ is

\frac{Q_0}{N}({\rm tr}(C') - {\rm tr}(P_{Z\pi}) - 1)\sigma_{\eta\epsilon}

where $Q_0 = \lim_{N\to\infty}(\pi'\frac{Z'Z}{N}\pi)^{-1}$ and $P_{Z\pi} = Z\pi(\pi'Z'Z\pi)^{-1}\pi'Z'$. An estimator's approximate bias is proportional to ${\rm tr}(C) - \mathcal{L} - 1$, where $\mathcal{L}$ is the number of columns of $Z\pi$ (or $X$). (I use $\mathcal{L}$ instead of $L$ because, while $\mathcal{L} = L$ for most of the estimators in Table 1, $\mathcal{L} = L_1$ for IJIVE1 and UIJIVE1.)

Definition 2.

An estimator is said to be approximately unbiased if ${\rm tr}(C) - {\rm tr}(P_{Z\pi}) - 1 = 0$.

Definition 3.

The approximate bias of an estimator is said to be asymptotically vanishing if

{\rm tr}(C) - {\rm tr}(P_{Z\pi}) - 1 \to 0 \text{ as } N\to\infty.

I compute the approximate bias by reporting the term ${\rm tr}(C) - \mathcal{L} - 1$ for the following estimators. A larger magnitude of ${\rm tr}(C) - \mathcal{L} - 1$ means a larger approximate bias. Ideally, ${\rm tr}(C) - \mathcal{L} - 1 = 0$, so that we can claim the estimator to be approximately unbiased.

The approximate bias computation for OLS, TSLS, and k-class estimators is straightforward: OLS's approximate bias is proportional to $N - L - 1$, TSLS's approximate bias is proportional to $K - L - 1$, and the k-class estimator's approximate bias is proportional to $kK + (1-k)N - L - 1$, where $k$ is a user-defined parameter. Setting $k = \frac{N-L-1}{N-K}$ gives us an approximately unbiased estimator, which I call AUK (approximately unbiased k-class estimator).
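As a quick numerical illustration of these proportionality factors (my own sketch, not from the paper), the k-class factor ${\rm tr}(C) - L - 1$ and the AUK choice of $k$ can be computed directly:

```python
def kclass_bias_factor(k, N, K, L):
    # tr(C) for C = (1-k)I + kP_Z is (1-k)N + kK, so the factor tr(C) - L - 1 is:
    return k * K + (1 - k) * N - L - 1

N, K, L = 500, 50, 10
k_auk = (N - L - 1) / (N - K)                         # the choice that zeroes the factor
print(kclass_bias_factor(1.0, N, K, L))               # TSLS (k = 1): K - L - 1 = 39
print(kclass_bias_factor(0.0, N, K, L))               # OLS  (k = 0): N - L - 1 = 489
print(round(kclass_bias_factor(k_auk, N, K, L), 12))  # AUK: 0.0
```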

As explained earlier, TSLS' approximate bias is proportional to $K - L - 1$, which is large when the degree of overidentification is large. AIK 1995 propose jackknife versions of TSLS to reduce the approximate bias of TSLS when the degree of overidentification is large; the authors call the proposed estimators JIVE1 and JIVE2 (Jackknife IV Estimators). Evaluating the approximate bias of JIVE2 is straightforward:

{\rm tr}(P_Z - D) - \mathcal{L} - 1 = -L - 1

We obtain the approximate bias of JIVE1:

{\rm tr}((I-D)^{-1}(P_Z - D)) - \mathcal{L} - 1 = \sum_{i=1}^N\frac{0}{1-D_i} - L - 1 = -L - 1

AD 2009 reduce the approximate bias of the two JIVEs by partialling out $W$ from $(y, X^*, Z^*)$. We have the approximate bias of IJIVE1 proportional to

{\rm tr}((I-\tilde{D})^{-1}(P_{\tilde{Z}} - \tilde{D})) - \mathcal{L} - 1 = \sum_{i=1}^N\frac{0}{1-\tilde{D}_i} - L_1 - 1 = -L_1 - 1

Note that $L_1 < L$, so IJIVE1 potentially has a smaller approximate bias than the JIVEs. (The comparison requires knowledge of $\lim_{N\to\infty}\pi'\frac{\tilde{Z}'\tilde{Z}}{N}\pi$; however, since approximate bias is used only as a heuristic rule to motivate new estimators, evaluating ${\rm tr}(C) - {\rm tr}(P_{Z\pi}) - 1$ is arguably sufficient for that purpose.) AD 2009 further reduce the approximate bias by adding a constant ($\frac{L_1+1}{N}$) to the diagonal of the $C$ matrix of IJIVE1. The new estimator is called UIJIVE1.

To evaluate the approximate bias of UIJIVE1, I make the following assumption about the leverage values of $Z$, $\{D_i\}_{i=1}^N$:

Assumption (BA).

$\max_i D_i$ is bounded away from 1 for large enough $N$. Equivalently, $\exists\, m > 0$ such that $D_i \leq 1 - m$ for large enough $N$ and for all $i = 1, 2, 3, \dots, N$.

I state the assumption in terms of $Z$ as I will repeatedly use it for $Z$ in Section 6. Here, to compute the approximate bias of UIJIVE1, I make Assumption BA for $\{\tilde{D}_i\}_{i=1}^N$ instead of for $\{D_i\}_{i=1}^N$. Under Assumption BA for $\{\tilde{D}_i\}_{i=1}^N$, UIJIVE1's approximate bias is proportional to

{\rm tr}(C) - \mathcal{L} - 1 = \sum_{i=1}^N\frac{\frac{L_1+1}{N}}{1-\tilde{D}_i+\frac{L_1+1}{N}} - L_1 - 1 \to 0

Therefore, we have that UIJIVE1’s approximate bias is asymptotically vanishing. See proof in Appendix C.
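A small numerical check of this limit (my own sketch; $\tilde{Z}$ is replaced by a random draw, so the leverage values are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def uijive1_bias_factor(N, K1, L1):
    Z_t = rng.standard_normal((N, K1))            # stand-in for the partialled-out instruments
    P = Z_t @ np.linalg.solve(Z_t.T @ Z_t, Z_t.T)
    d = np.diag(P)                                # leverage values \tilde{D}_i (they sum to K1)
    omega = (L1 + 1) / N
    return np.sum(omega / (1 - d + omega)) - L1 - 1

for N in (100, 1000, 10000):
    print(N, uijive1_bias_factor(N, K1=10, L1=1))   # the factor shrinks toward 0 as N grows
```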

4 From UIJIVE1 to UIJIVE2

This section interprets the relationship between existing estimators (in particular, JIVE1, IJIVE1, UIJIVE1, and OLS), which sheds light on how a new approximately unbiased estimator, namely UIJIVE2, is developed.

4.1 $\omega_1$-class estimators and UIJIVE1

I define the $\omega_1$-class of estimators, which contains all of the following four estimators: JIVE1, IJIVE1, UIJIVE1, and OLS. Since all of them can be expressed as $(X'C'X)^{-1}(X'C'y)$, I define the matrix $C$ of the $\omega_1$-class estimators as:

(I - D + \omega_1 I)^{-1}(P_Z - D + \omega_1 I)

When $\omega_1 = 0$, this corresponds to JIVE1; when $\omega_1 = \infty$, it corresponds to OLS. IJIVE1 is a special case of JIVE1 in which $(y,X,Z)$ is replaced with $(\tilde{y},\tilde{X},\tilde{Z})$, and hence it also belongs to the $\omega_1$-class. UIJIVE1 uses $(\tilde{y},\tilde{X},\tilde{Z})$ and sets $\omega_1 = \frac{L_1+1}{N}$. As stated in Section 3.2, this choice of $\omega_1$ yields UIJIVE1, whose approximate bias is asymptotically vanishing. These four estimators are summarized in Table 2a.

The development from TSLS to JIVE1 to IJIVE1 to UIJIVE1 is depicted in the upper half of Figure 1. TSLS to JIVE1: JIVE1 is a jackknife version of TSLS in which the first-stage OLS is replaced with a jackknife (leave-one-out) procedure for out-of-sample prediction. Calling the resulting first-stage fitted matrix $\hat{X}$, the second-stage IV estimation is the same for TSLS and JIVE1: $(\hat{X}'X)^{-1}(\hat{X}'y)$. JIVE1 to IJIVE1: IJIVE1 is a special case of JIVE1; JIVE1 takes $(y,X,Z)$ as input, whereas IJIVE1 takes the partialled-out $(\tilde{y},\tilde{X},\tilde{Z})$. IJIVE1 to UIJIVE1: UIJIVE1 adds a constant $\omega_1 = \frac{L_1+1}{N}$ to the diagonal of the $C$ matrix of IJIVE1. We can interpret this addition of $\omega_1 I$ as bridging IJIVE1 and OLS, since OLS has $C = I$.

Estimator    $(y,X,Z)$ or $(\tilde{y},\tilde{X},\tilde{Z})$    $\omega_1$
JIVE1    $(y,X,Z)$    0
IJIVE1    $(\tilde{y},\tilde{X},\tilde{Z})$    0
UIJIVE1    $(\tilde{y},\tilde{X},\tilde{Z})$    $\frac{L_1+1}{N}$
OLS    Both    $\infty$
(a) OLS can take either $(y,X,Z)$ or $(\tilde{y},\tilde{X},\tilde{Z})$. It estimates the entire $\beta$ when it takes $(y,X,Z)$ and the parameters of only the endogenous variables, $\beta^*$, when it takes $(\tilde{y},\tilde{X},\tilde{Z})$. UIJIVE1 can be interpreted as an estimator that bridges IJIVE1 and OLS.
Estimator    $(y,X,Z)$ or $(\tilde{y},\tilde{X},\tilde{Z})$    $\omega_2$
JIVE2    $(y,X,Z)$    0
IJIVE2    $(\tilde{y},\tilde{X},\tilde{Z})$    0
UIJIVE2    $(\tilde{y},\tilde{X},\tilde{Z})$    $\frac{L_1+1}{N}$
OLS    Both    $\infty$
(b) OLS can take either $(y,X,Z)$ or $(\tilde{y},\tilde{X},\tilde{Z})$. It estimates the entire $\beta$ when it takes $(y,X,Z)$ and the parameters of only the endogenous variables, $\beta^*$, when it takes $(\tilde{y},\tilde{X},\tilde{Z})$. UIJIVE2 can be interpreted as an estimator that bridges IJIVE2 and OLS.
Table 2: Some examples of $\omega_1$-class and $\omega_2$-class estimators. The left panel lists $\omega_1$-class estimators and the right panel lists $\omega_2$-class estimators.

4.2 $\omega_2$-class estimators and UIJIVE2

Figure 1: The development of past estimators and new estimators. The papers that develop the corresponding estimators are indicated in quotation marks. AIK 1995 develop JIVE1 and JIVE2, and AD 2009 develop IJIVE1 and UIJIVE1. This paper develops the new estimators IJIVE2 and UIJIVE2. As shown in the figure, the mathematical relationship between JIVE1, IJIVE1, and UIJIVE1 is similar to that between JIVE2, IJIVE2, and UIJIVE2. The relationship between UIJIVE1 and UIJIVE2 (or between IJIVE1 and IJIVE2, which is not of interest to this paper) is analogous to the relationship between JIVE1 and JIVE2: the latter estimators (UIJIVE2 and JIVE2) remove the row-wise division present in the former (UIJIVE1 and JIVE1).

I define the $\omega_2$-class of estimators, which contains JIVE2 and OLS. The $C$ matrix of the $\omega_2$-class estimator is

P_Z - D + \omega_2 I.

The only difference between the $C$ matrices of the $\omega_1$- and $\omega_2$-class estimators is the exclusion/inclusion of $(I - D + \omega_1 I)^{-1}$, which is a rowwise division operation (i.e., dividing the $i$th row of $P_Z - D + \omega_1 I$ by $1 - D_i + \omega_1$).

$\omega_2 = 0$ corresponds to JIVE2 and $\omega_2 = \infty$ corresponds to OLS. By Corollary 1, the approximate bias of an $\omega_2$-class estimator that takes $(\tilde{y},\tilde{X},\tilde{Z})$ is proportional to $\omega_2 N - L_1 - 1$. If $\omega_2 = 0$, I obtain a new estimator, IJIVE2, corresponding to IJIVE1 among the $\omega_1$-class estimators. Selecting $\omega_2 = \frac{L_1+1}{N}$, I obtain an approximately unbiased $\omega_2$-class estimator and name it UIJIVE2. Its closed-form expression is:

(\tilde{X}'(P_{\tilde{Z}} - \tilde{D} + \tfrac{L_1+1}{N}I)'\tilde{X})^{-1}(\tilde{X}'(P_{\tilde{Z}} - \tilde{D} + \tfrac{L_1+1}{N}I)'\tilde{y})

where the transpose on $P_{\tilde{Z}} - \tilde{D} + \frac{L_1+1}{N}I$ is not necessary, as the matrix is symmetric, but I keep it for notational coherence.

The information on OLS, JIVE2, IJIVE2, and UIJIVE2 is summarized in Table 2b. I also depict the parallel relationship between $\omega_1$-class and $\omega_2$-class estimators in the lower half of Figure 1. The development from OLS to JIVE2 to IJIVE2 to UIJIVE2 is analogous to the development of the $\omega_1$-class estimators. Readers can also interpret Figure 1 from top to bottom: each estimator in the bottom row removes the rowwise division operation of the estimator above it. The removal of the rowwise division is important for the comparison between $\omega_1$- and $\omega_2$-class estimators: the division operation is highly unstable when $D_i$ (or $\tilde{D}_i$) is close to 1, which leads to undesirable asymptotic properties for $\omega_1$-class estimators but not for $\omega_2$-class estimators. This point is formalized in Section 6.

Name    Consistency    Approximately unbiased (homoskedasticity)    Approximately unbiased (heteroskedasticity)    Many-instrument consistency
OLS
TSLS
Nagar
JIVE1
JIVE2
UIJIVE1
UIJIVE2
Table 3: Properties of different estimators in the approximate bias literature with an endogenous regressor. The Nagar estimator's approximately unbiased property holds only under homoskedasticity, not under heteroskedasticity; see proof in AD 2009.

5 From UIJIVE to UOJIVE

Figure 1 shows that the UIJIVEs can be interpreted as approximately unbiased estimators selected from classes of estimators that bridge the IJIVEs and OLS. I apply the same thought process to other classes of estimators that bridge OLS, TSLS, and the JIVEs to obtain new approximately unbiased estimators. Table 4 summarizes these five classes of estimators (k-, $\lambda_1$-, $\lambda_2$-, $\omega_1$-, and $\omega_2$-classes) and the corresponding approximately unbiased estimators. See Appendix D for details on these estimators. The relationships between the five classes of estimators are illustrated in Figure 2.

Class    Estimator    C    parameter
$k$-class    AUK    $kP_Z + (1-k)I$    $\frac{N-L-1}{N-K}$
$\lambda_1$-class    TSJI1    $(I-\lambda_1 D)^{-1}(P_Z - \lambda_1 D)$    $\frac{K-L-1}{K}$
$\lambda_2$-class    TSJI2    $P_Z - \lambda_2 D$    $\frac{K-L-1}{K}$
$\omega_1$-class    UOJIVE1    $(I-D+\omega_1 I)^{-1}(P_Z - D + \omega_1 I)$    $\frac{L+1}{N}$
$\omega_2$-class    UOJIVE2    $P_Z - D + \omega_2 I$    $\frac{L+1}{N}$
Table 4: $D$ is the diagonal matrix of the projection matrix $P_Z = Z(Z'Z)^{-1}Z'$. The “parameter” column states the parameter value for which the estimator is approximately unbiased.
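A minimal sketch of the two UOJIVE formulas in Table 4 (my own code; the function name `uojive` is illustrative, not from the paper), with $\omega = \frac{L+1}{N}$:

```python
import numpy as np

def uojive(y, X, Z, version=2):
    """UOJIVE1 (version=1) or UOJIVE2 (version=2) with omega = (L + 1) / N."""
    N, L = X.shape
    omega = (L + 1) / N
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)        # projection matrix P_Z
    D = np.diag(np.diag(P))
    I = np.eye(N)
    if version == 1:
        C = np.linalg.solve(I - D + omega * I, P - D + omega * I)   # rowwise division
    else:
        C = P - D + omega * I                    # no rowwise division: robust to high leverage
    A = X.T @ C.T
    return np.linalg.solve(A @ X, A @ y)
```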

Note that UIJIVE1 and UIJIVE2 can be considered special cases of UOJIVE1 and UOJIVE2, respectively. UIJIVE can be understood as the following two steps (sketched in code after the displayed model below):

  1. Partial out $W$ from $Z^*$, $X^*$, and $y$: $\tilde{Z} = Z^* - P_W Z^*$, $\tilde{X} = X^* - P_W X^*$, $\tilde{y} = y - P_W y$.

  2. Set $\omega = \frac{L_1+1}{N}$ and compute the estimate as an $\omega_1$- or $\omega_2$-class estimator using $\tilde{Z}$, $\tilde{X}$, and $\tilde{y}$.

The second step is exactly UOJIVE applied to the following model:

\tilde{y} = \tilde{X}\beta^* + \tilde{\epsilon}
\tilde{X} = \tilde{Z}\pi^* + \tilde{\eta}

where $\tilde{\epsilon} = \epsilon - P_W\epsilon$ and $\tilde{\eta} = \eta - P_W\eta$.

For the rest of the paper, I show the asymptotic properties of UOJIVE1 and UOJIVE2 and demonstrate that UOJIVE2 is more robust to outliers than UOJIVE1. The theoretical results can be easily generalized to UIJIVE1, UIJIVE2, TSJI1, and TSJI2.
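The two-step view above can be sketched directly (my own code, reusing the `uojive` function sketched after Table 4; `partial_out` is an illustrative helper name, not the paper's notation):

```python
import numpy as np

def partial_out(W, A):
    """Residualize A on the exogenous controls W: A - W (W'W)^{-1} W'A."""
    return A - W @ np.linalg.solve(W.T @ W, W.T @ A)

def uijive_via_uojive(y, X_star, Z_star, W, version=2):
    """Two-step UIJIVE: partial out W, then apply UOJIVE to the residualized data."""
    y_t = partial_out(W, y)
    X_t = partial_out(W, X_star)
    Z_t = partial_out(W, Z_star)
    # Since X_t has L1 columns, uojive's omega = (L + 1)/N automatically equals (L1 + 1)/N.
    return uojive(y_t, X_t, Z_t, version=version)    # estimates beta* only
```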

(a) $k$-, $\lambda_1$-, and $\omega_1$-class estimators
(b) $k$-, $\lambda_2$-, and $\omega_2$-class estimators
Figure 2: The two panels illustrate the connections between the various classes of estimators. The estimators in bold are those proposed by this paper. The relationship between TSJI1 and TSJI2 is analogous to the relationship between JIVE1 and JIVE2; the same analogy applies to the relationship between UOJIVE1 and UOJIVE2. The left panel contains the three classes of estimators whose $C$ matrices have the $CZ=Z$ property, whereas the $\lambda_2$- and $\omega_2$-class estimators in the right panel do not have this property.

6 Asymptotic properties of UOJIVE1 and UOJIVE2

6.1 Consistency of UOJIVEs under fixed $K$

Under fixed $K$ and $L$, I show that both UOJIVE1 and UOJIVE2 are consistent as $N\to\infty$ under some moment-existence assumptions on the observable and unobservable variables. It is worth mentioning that Assumption BA is important for the consistency of UOJIVE1 but not for that of UOJIVE2, suggesting that UOJIVE2 is robust to the presence of high leverage points while UOJIVE1 is not. Throughout this subsection, I make the following assumptions.

Assumption 1.

Standard regularity assumptions hold for

\frac{1}{N}X'Z \overset{p}{\to} \Sigma_{XZ},
\frac{1}{N}Z'X \overset{p}{\to} \Sigma_{ZX},
\frac{1}{N}Z'Z \to \Sigma_{ZZ},
\frac{1}{\sqrt{N}}Z'\epsilon \overset{d}{\to} N(0, \sigma_\epsilon^2\Sigma_{ZZ}).
Assumption 2.

$E[\lVert Z_i'X_i\rVert^{1+\delta_1}]$ and $E[\lVert\eta_i'X_i\rVert^{1+\delta_1}]$ are finite for some $\delta_1 > 0$. The existence of these two expectations jointly implies that $E[\lVert X_i'X_i\rVert^{1+\delta_1}]$ is finite.

Assumption 3.

$E[\lVert Z_i'\epsilon_i\rVert^{1+\delta_1}]$ and $E[\lVert\eta_i'\epsilon_i\rVert^{1+\delta_1}]$ are finite for some $\delta_1 > 0$. The existence of these two expectations jointly implies that $E[\lVert X_i'\epsilon_i\rVert^{1+\delta_1}]$ is finite.

Denote $\frac{L+1}{N}$ as $\omega$. Recall that UOJIVE1's matrix expression is

\hat{\beta}_{UOJIVE1} = (X'(P_Z-D+\omega I)(I-D+\omega I)^{-1}X)^{-1}(X'(P_Z-D+\omega I)(I-D+\omega I)^{-1}y)
= \beta + (X'(P_Z-D+\omega I)(I-D+\omega I)^{-1}X)^{-1}(X'(P_Z-D+\omega I)(I-D+\omega I)^{-1}\epsilon)
Lemma 6.1.

Under Assumptions BA, 1, and 2, $\frac{1}{N}X'(P_Z-D+\omega I)(I-D+\omega I)^{-1}X \overset{p}{\to} H$, where $H = \Sigma_{XZ}\Sigma_{ZZ}^{-1}\Sigma_{ZX}$.

Lemma 6.2.

Under Assumptions BA, 1, and 3, $\frac{1}{N}X'(P_Z-D+\omega I)(I-D+\omega I)^{-1}\epsilon \overset{p}{\to} 0$.

The proofs of Lemmas 6.1 and 6.2 are collected in Appendix E. The importance of Assumption BA stems from the presence of $\{D_i\}_{i=1}^N$ in the denominators. UOJIVE2 does not have the same problem, since $(I-D+\omega I)^{-1}$ does not appear in its analytical form.
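A tiny numeric illustration of the denominator issue (mine, not from the paper): as a leverage value $D_i$ approaches 1, the corresponding rowwise weight in UOJIVE1's $C$ matrix explodes, while the UOJIVE2 entries stay bounded.

```python
N, L = 1000, 2
omega = (L + 1) / N
for d in (0.10, 0.90, 0.999):              # leverage D_i of a single observation
    weight = 1.0 / (1.0 - d + omega)       # rowwise factor from (I - D + omega I)^{-1}
    print(d, round(weight, 2))             # prints roughly 1.11, 9.71, 250.0
```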

Theorem 6.1.

Under Assumptions BA, 1, 2, and 3, $\hat{\beta}_{UOJIVE1}\overset{p}{\to}\beta$.

Proof.

With Lemmas 6.1 and 6.2, proving the theorem is trivial. ∎

Now, we turn to analyzing the consistency of UOJIVE2. The steps for proving its consistency are similar to what we have done for UOJIVE1.

Lemma 6.3.

Under Assumptions 1 and 2, $\frac{1}{N}X'(P_Z-D+\omega I)X \overset{p}{\to} H$, where $H = \Sigma_{XZ}\Sigma_{ZZ}^{-1}\Sigma_{ZX}$.

Lemma 6.4.

Under Assumptions 1 and 3, $\frac{1}{N}X'(P_Z-D+\omega I)\epsilon \overset{p}{\to} 0$.

The proofs of Lemmas 6.3 and 6.4 are collected in Appendix E. With these two lemmas, we establish the consistency of UOJIVE2.

Theorem 6.2.

Under Assumptions 1, 2, and 3, $\hat{\beta}_{UOJIVE2}\overset{p}{\to}\beta$.

6.2 Asymptotic variance of UOJIVEs under fixed $K$

Assumption 4.

$E[\lVert Z_i'\epsilon_i\rVert^{2+\delta_1}]$ and $E[\lVert\eta_i'\epsilon_i\rVert^{2+\delta_1}]$ are finite for some $\delta_1 > 0$. The existence of these two expectations jointly implies that $E[\lVert X_i'\epsilon_i\rVert^{2+\delta_1}]$ is finite.

Lemma 6.5.

Under Assumptions BA, 1, and 4, $\frac{1}{\sqrt{N}}X'(P_Z-D+\omega I)(I-D+\omega I)^{-1}\epsilon \overset{d}{\to} N(0,\sigma_\epsilon^2 H)$.

The proof of Lemma 6.5 can be found in Appendix E. Combining Lemma 6.5 with Lemma 6.1, the asymptotic variance of UOJIVE1 is $\sigma_\epsilon^2 H^{-1}$.

Theorem 6.3.

Under Assumptions BA, 1, 2, and 4, $\sqrt{N}(\hat{\beta}_{UOJIVE1}-\beta)\overset{d}{\to}N(0,\sigma_\epsilon^2 H^{-1})$.

Lemma 6.6.

Under Assumptions 1 and 4, $\frac{1}{\sqrt{N}}X'(P_Z-D+\omega I)'\epsilon \overset{d}{\to} N(0,\sigma_\epsilon^2 H)$.

The proof of the lemma can be found in Appendix E. Lemmas 6.3 and 6.6 establish the asymptotic normality of UOJIVE2 without Assumption BA.

Theorem 6.4.

Under Assumptions 1, 2, and 4, $\sqrt{N}(\hat{\beta}_{UOJIVE2}-\beta)\overset{d}{\to}N(0,\sigma_\epsilon^2 H^{-1})$.
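A plug-in sketch of the implied inference (my own code; $H$ and $\sigma_\epsilon^2$ are replaced by natural sample analogues, and homoskedasticity is assumed as in the theorem):

```python
import numpy as np

def uojive2_with_se(y, X, Z):
    """UOJIVE2 point estimate with homoskedastic standard errors based on Theorem 6.4."""
    N, L = X.shape
    omega = (L + 1) / N
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    C = P - np.diag(np.diag(P)) + omega * np.eye(N)
    A = X.T @ C
    beta_hat = np.linalg.solve(A @ X, A @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / N               # estimate of sigma_epsilon^2
    H_hat = X.T @ P @ X / N                      # sample analogue of H = Sigma_XZ Sigma_ZZ^{-1} Sigma_ZX
    avar = sigma2_hat * np.linalg.inv(H_hat)     # asymptotic variance sigma^2 H^{-1}
    se = np.sqrt(np.diag(avar) / N)
    return beta_hat, se
```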

6.3 Many-instrument consistency of UOJIVE2

The many-instrument asymptotics framework lets both $K_1$ and $N$ go to infinity, with the ratio $\frac{K_1}{N}$ converging to a constant $\alpha$, where $0 < \alpha < 1$. The motivation behind the many-instrument framework is to better approximate situations in which the number of instruments is large relative to the sample size and first-stage overfitting is a concern. Many papers have studied the many-instrument setup (Bekker, 1994; Kunitomo, 2012; Chao et al., 2012; Hausman et al., 2012). Theorem 1 of Hausman et al. (2012) applies directly to UOJIVE2.

Theorem 6.5.

Under Assumptions 1-4 specified in Hausman et al. (2012), $\hat{\beta}_{UOJIVE2}\overset{p}{\to}\beta$ as $N\to\infty$.

7 Simulation

I run three types of simulations to contrast the performances of OLS, TSLS, the JIVEs, and the UIJIVEs (UOJIVEs). They are designed to test many-instrument performance under homoskedasticity, consistency under many instruments and heteroskedasticity, and robustness to outliers. Each simulation has two or four setups, and each setup consists of 1000 rounds. All simulations have $L_1 = 1$: there is only one endogenous variable $X^*$. $\beta^*$ is set to 0.3. The intercepts in both stages are set to 0 throughout all simulations, though a column of ones is still included in the regressions, either partialled out as part of $W$ for the UIJIVEs or included as part of $Z$ for the rest of the estimators.

7.1 Simulation for many-instrument asymptotics under homoskedasticity

The simulation setup for many-instrument asymptotics is summarized in Table 5. The parameters for all instruments $Z^*$ (and for all controls $W$) are set to a constant, so I report one value for each parameter instead of a vector. $R_0 = \frac{N\pi' E[Z'Z]\pi}{\sigma_\eta^2}$ is the concentration parameter; I set it to be around 150, following Hansen and Kozbur (2014), to maintain a reasonable instrumental variable strength. $Z$ follows a standard multivariate normal distribution. (I follow Bekker and Crudu (2015), who also assume nonrandom $Z$ when analyzing the theoretical properties of their IV estimator but let $Z$ follow a random distribution in their simulation study.) The error terms $\epsilon$ and $\eta$ are bivariate normal with mean $(0,0)'$ and covariance matrix $\begin{pmatrix}0.8 & -0.6\\ -0.6 & 1\end{pmatrix}$.

Setup    $N$    $K$    $L$    $\beta^*$    $\gamma^*$    $\pi^*$    $\delta^*$    $R_0$
1    500    50    10    0.3    1    0.08    0.05    140.5
2    2000    200    40    0.3    1    0.02    0.02    160
Table 5: The simulation setups for testing estimators' performances in many-instrument scenarios under homoskedasticity
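A minimal sketch of one simulation draw for Setup 1 (my own code following the description above; the estimators from the earlier sketches can then be applied to the generated data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Setup 1 of Table 5: N = 500, 50 instruments, 10 controls, beta* = 0.3
N, K1, L2 = 500, 50, 10
beta_star, gamma_star, pi_star, delta_star = 0.3, 1.0, 0.08, 0.05

Z_star = rng.standard_normal((N, K1))                                   # instruments
W = rng.standard_normal((N, L2))                                        # exogenous controls
cov = np.array([[0.8, -0.6], [-0.6, 1.0]])                              # Cov(eps_i, eta_i)
eps, eta = rng.multivariate_normal([0.0, 0.0], cov, size=N).T

X_star = Z_star @ np.full(K1, pi_star) + W @ np.full(L2, delta_star) + eta   # first stage
y = beta_star * X_star + W @ np.full(L2, gamma_star) + eps                   # structural equation
```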

I report the simulation results in Table 6; it is clear that OLS, TSLS, and the JIVEs suffer from larger bias than the approximately unbiased estimators. In addition, as the degree of overidentification and the number of control variables increase, the biases of TSLS and the JIVEs deteriorate. The sharp worsening of the JIVEs' bias is likely exacerbated by their instability, as pointed out by Davidson and MacKinnon (2007). The simulation results align with Corollary 1, which implies that TSLS' approximate bias is proportional to the degree of overidentification while the JIVEs' approximate biases are proportional to the number of control variables. All approximately unbiased estimators perform well under both setups. For example, UOJIVE1 and UOJIVE2 show virtually no bias and have a relatively small variance compared to the JIVEs.

Setup 1 Setup 2
Estimator Bias Variance MSE Bias Variance MSE
OLS 0.475 0.001 0.226 0.564 0.000 0.318
TSLS 0.143 0.004 0.024 0.337 0.002 0.116
Nagar 0.013 0.009 0.009 0.055 0.016 0.019
AUK 0.007 0.011 0.011 0.023 0.027 0.027
JIVE1 0.069 0.016 0.021 0.421 1.009 1.185
JIVE2 0.069 0.016 0.021 0.420 1.093 1.268
TSJI1 0.003 0.010 0.010 0.002 0.022 0.022
TSJI2 0.003 0.010 0.010 0.002 0.022 0.022
UIJIVE1 0.003 0.010 0.010 0.002 0.023 0.023
UIJIVE2 0.003 0.010 0.010 0.002 0.023 0.023
UOJIVE1 0.003 0.010 0.010 0.002 0.023 0.023
UOJIVE2 0.003 0.010 0.010 0.002 0.022 0.022
Table 6: Comparison of estimator performances with many instruments

7.2 Many-instrument asymptotics under heteroskedasticity

Following the setup in Ackerberg and Devereux (2009), I test the performance of the estimators with heteroskedastic errors, setting $Z$ to be a group fixed-effect matrix without any control variables $W$. Since there is no $W$, the UIJIVEs and UOJIVEs are equivalent, and I only report the UOJIVEs' results.

The sample size is set to 500, $N = 500$. The first 115 observations belong to Group 1 and the next 115 observations belong to Group 2; Groups 1 and 2 are the two big groups. The remaining 270 observations form 18 small groups of 15 observations each. The first group is excluded from $Z$ and $\pi^* = 0.3$, meaning that Group 1 on average has its $X$ value 0.3 below the other groups, conditional on everything else being equal. There are two covariance matrices: $\begin{pmatrix}0.25 & -0.1\\ -0.1 & 0.25\end{pmatrix}$ and $\begin{pmatrix}0.25 & 0.2\\ 0.2 & 0.25\end{pmatrix}$, denoted the $-$ and $+$ covariance matrices, respectively. In Setup 1, the big groups have the $+$ covariance matrix and the small groups have the $-$ covariance matrix; in Setup 2, the covariance matrices of the two types of groups are reversed.
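A sketch of one draw of this group fixed-effect design (my own code; group sizes, coefficients, and covariance matrices follow the description above, with Setup 1's assignment of covariances):

```python
import numpy as np

rng = np.random.default_rng(2)

sizes = [115, 115] + [15] * 18                                    # two big groups, 18 small groups (N = 500)
labels = np.repeat(np.arange(len(sizes)), sizes)
N = labels.size
Z = (labels[:, None] == np.arange(1, len(sizes))).astype(float)   # group dummies, Group 1 excluded

cov_plus = np.array([[0.25, 0.2], [0.2, 0.25]])                   # "+" covariance matrix
cov_minus = np.array([[0.25, -0.1], [-0.1, 0.25]])                # "-" covariance matrix

eps, eta = np.empty(N), np.empty(N)
for g, size in enumerate(sizes):                                  # Setup 1: big groups "+", small groups "-"
    cov = cov_plus if g < 2 else cov_minus
    draw = rng.multivariate_normal([0.0, 0.0], cov, size=size)
    eps[labels == g], eta[labels == g] = draw[:, 0], draw[:, 1]

X = Z @ np.full(Z.shape[1], 0.3) + eta                            # pi* = 0.3: Group 1 is 0.3 lower on average
y = 0.3 * X + eps                                                 # beta* = 0.3
```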

I summarize the simulation results in Table 7. The results align with Ackerberg and Devereux (2009): the Nagar estimator is neither approximately unbiased nor consistent under heteroskedasticity with many instruments. The same can be said about the approximately unbiased k-class estimator AUK, which is proposed in Appendix D. In contrast, the performance of UOJIVE2 remains stellar, aligning with Theorem 6.5, which establishes the consistency of UOJIVE2 without assuming homoskedasticity.

Setup 1: ${\rm small}^{+}{\rm big}^{-}$    Setup 2: ${\rm small}^{-}{\rm big}^{+}$
Estimator Bias Variance MSE Bias Variance MSE
OLS 0.232 0.002 0.056 0.141 0.002 0.022
TSLS 0.286 0.028 0.109 0.135 0.025 0.043
Nagar 0.258 6.927 6.987 0.364 0.274 0.406
AUK 0.336 0.246 0.358 0.409 1.087 1.253
JIVE1 0.311 119.913 119.890 0.047 0.116 0.118
JIVE2 0.036 0.392 0.393 0.047 0.113 0.115
TSJI1 0.054 0.127 0.130 0.072 0.073 0.078
TSJI2 0.075 0.234 0.239 0.072 0.069 0.074
UOJIVE1 0.011 0.088 0.088 0.023 0.064 0.065
UOJIVE2 0.019 0.095 0.096 0.024 0.061 0.062
Table 7: Comparison of estimator performances under heteroskedasticity

7.3 Simulation with outlier

In my proofs of the consistency and asymptotic normality results for UOJIVE1, Assumption BA is repeatedly invoked to bound the denominators away from zero. With high leverage points, the asymptotic proofs for UOJIVE1 are no longer valid. In contrast, UOJIVE2's asymptotic results do not require Assumption BA. In the following simulation setups, I deliberately introduce high leverage points in the DGP; when these high leverage points coincide with large variances of $\epsilon$, the performance of UOJIVE1 is much worse than that of UOJIVE2.

There are 5 instruments $Z^*$, one endogenous variable $X^*$, and no controls $W$. Again, the UIJIVEs and UOJIVEs are equivalent due to the absence of $W$, so I only report the UOJIVEs. I set the sample size $N$ to $\{101, 401, 901, 1601\}$; all these numbers are one plus a square number, which makes the following setup easy to construct. The first observation is the high leverage point, with its first entry equal to $(N-1)^{1/3}$. For the remaining $N-1$ observations, every $\sqrt{N-1}$ observations have their first five rows equal to the identity matrix, and the remaining rows are all zeros. This setup is equivalent to the group fixed-effect setup of the heteroskedasticity simulation, with five small groups (indexed 2-6) and one large group (Group 1), except that the first row, which is supposed to belong to Group 2, is contaminated and has its value multiplied by $(N-1)^{1/3}$. Table 11 in Appendix F illustrates this simulation setup. The error terms $\epsilon$ and $\eta$ are bivariate normal with mean $(0,0)'$ and covariance matrix $\begin{pmatrix}0.8 & -0.6\\ -0.6 & 1\end{pmatrix}$, except that $\epsilon_1$, the error term for the high leverage point, is multiplied by $N^{1/3}$. The coincidence of high leverage and a large variance of $\epsilon$ generates an outlier: intuitively, the first observation has a high influence on its fitted value and a large probability of deviating far from the regression line. $\pi^*$ is set to one.
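A sketch of one draw of the contaminated design (my own code; the block structure and the cube-root scalings follow the description above):

```python
import numpy as np

rng = np.random.default_rng(3)

def outlier_design(N):
    """Z for the outlier simulation: 5 group dummies, one contaminated high-leverage first row."""
    m = int(round(np.sqrt(N - 1)))               # block length; N - 1 is a square number
    Z = np.zeros((N, 5))
    for b in range(m):                           # each block of m rows starts with the 5x5 identity
        Z[1 + b * m : 1 + b * m + 5, :] = np.eye(5)
    Z[0, 0] = (N - 1) ** (1 / 3)                 # contaminated Group-2 observation (high leverage)
    return Z

N = 401
Z = outlier_design(N)
cov = np.array([[0.8, -0.6], [-0.6, 1.0]])
eps, eta = rng.multivariate_normal([0.0, 0.0], cov, size=N).T
eps[0] *= N ** (1 / 3)                           # large error variance at the leverage point -> outlier
X = Z @ np.ones(5) + eta                         # pi* = 1
y = 0.3 * X + eps                                # beta* = 0.3
```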

The results of the simulations with outliers are summarized in Table 8. Throughout the four simulations, UOJIVE2 substantially outperforms UOJIVE1, and TSJI2 substantially outperforms TSJI1. The simulation results point to the importance of Assumption BA for the consistency of UOJIVE1.

N = 101
Estimator    Bias    Variance    MSE
TSJI1    0.018    0.388    0.388
TSJI2    0.014    0.130    0.130
UOJIVE1    0.013    0.193    0.193
UOJIVE2    0.001    0.067    0.067
N = 401
Estimator    Bias    Variance    MSE
TSJI1    0.054    0.397    0.400
TSJI2    0.012    0.110    0.110
UOJIVE1    0.036    0.169    0.170
UOJIVE2    0.010    0.036    0.036
N = 901
Estimator    Bias    Variance    MSE
TSJI1    0.029    0.359    0.360
TSJI2    0.010    0.093    0.092
UOJIVE1    0.019    0.144    0.144
UOJIVE2    0.004    0.024    0.024
N = 1601
Estimator    Bias    Variance    MSE
TSJI1    0.034    0.395    0.396
TSJI2    0.004    0.097    0.097
UOJIVE1    0.019    0.152    0.152
UOJIVE2    0.003    0.021    0.020
Table 8: Comparison of estimator performances with outliers

Moreover, the consistency of UOJIVE2 appears to be preserved under this simulation setup even though $\Sigma_{ZZ}^{-1}$ does not exist. As shown in Figure 3, the empirical distribution of the 1000 UOJIVE2 estimates concentrates around $\beta^* = 0.3$ more and more tightly as the sample size increases, whereas that of UOJIVE1 does not change much across the four simulations. It would be an interesting future research direction to show that UOJIVE2 is consistent under such a weak-instrument setup whereas UOJIVE1 is not.

Figure 3: Simulation with outliers. Panels (a)-(d) show the empirical distributions of the estimates for $N=101$, $N=401$, $N=901$, and $N=1601$, respectively.

8 Empirical Studies

There are multitudes of social science studies that use a large number of instruments. One example is the judge leniency IV design, in which researchers use the identity of the judge as an instrument; in other words, the number of instruments is equal to the number of judges in the sample. The method has been applied to many other settings (see Table 1 in Frandsen et al. (2023) for the immense popularity of the judge leniency design). In this section, I apply the approximately unbiased estimators to two classical empirical studies. I compute the standard errors by assuming homoskedasticity and treating these approximately unbiased estimators as just-identified IV estimators.

8.1 Quarter of birth

The quarter-of-birth example has been repeatedly cited in the many-instrument literature. Here I apply the TSJIs and UOJIVEs to the famous example in Angrist and Krueger (1991).

Many states in the US have a compulsory school attendance policy. Students are mandated to stay in school until their 16th, 17th, or 18th birthday, depending on which state they are from. As such, students' quarter of birth may induce different school-leaving behavior. This natural experiment makes quarter of birth a valid IV for estimating the marginal earnings brought by an additional school year for those who are affected by the compulsory attendance policy.

Angrist and Krueger (1991) interact quarter of birth with other dummy variables to generate a large number of IVs:

  1. Quarter of birth $\times$ Year of birth

  2. Quarter of birth $\times$ Year of birth $\times$ State of birth

where case 1 contains 30 instruments, and case 2 contains 180 instruments. The results are reported in Table 9.

Estimator Case 1 Case 2
Estimate (%) SE (%) Estimate (%) SE (%)
TSLS 8.91 1.61 9.28 0.93
Nagar 9.35 1.80 10.88 1.20
AUK 9.35 1.80 10.88 1.20
JIVE1 9.59 1.90 12.11 1.37
JIVE2 9.59 1.90 12.11 1.37
TSJI1 9.34 1.79 10.92 1.20
TSJI2 9.34 1.79 10.93 1.20
UIJIVE1 9.34 1.80 10.20 1.08
UIJIVE2 9.34 1.80 10.95 1.20
UOJIVE1 9.34 1.80 10.20 1.08
UOJIVE2 9.34 1.80 10.95 1.20
Table 9: Estimation results for the quarter-of-birth application of Angrist and Krueger (1991), Cases 1 and 2. Estimates and standard errors are in percent.

8.2 Veteran’s smoking behavior

Bedard and Deschênes (2006) use year of birth and its interaction with gender as instruments to estimate how much enlisting for WWII and the Korean War increases veterans' probability of smoking during the later part of their lives. The result can be interpreted as a LATE.

  1. Birth year $\times$ gender

  2. Birth year

where case 1 uses all data and case 2 uses only data for male veterans. The results are summarized in Table 10.

Estimator Case 1 Case 2
CPS 60 CPS 90 CPS 60 CPS 90
Estimate SE Estimate SE Estimate SE Estimate SE
TSLS 27.63 3.50 34.58 2.38 23.69 13.90 30.14 3.15
Nagar 27.82 3.51 34.68 2.39 32.42 17.61 30.43 3.18
AUK 27.82 3.51 34.68 2.39 32.45 17.62 30.43 3.18
JIVE1 28.50 3.65 35.03 2.43 -136.12 224.30 31.13 3.33
JIVE2 28.50 3.65 35.03 2.43 -136.12 224.35 31.13 3.33
TSJI1 27.84 3.53 34.68 2.39 32.52 21.99 30.43 3.21
TSJI2 27.84 3.53 34.68 2.39 32.52 21.99 30.43 3.21
UIJIVE1 27.85 0.12 34.70 2.39 32.37 17.64 30.43 3.18
UIJIVE2 27.85 0.12 34.70 2.39 32.37 17.65 30.43 3.18
UOJIVE1 27.85 0.12 34.70 2.39 32.37 17.64 30.43 3.18
UOJIVE2 27.85 0.12 34.70 2.39 32.37 17.65 30.43 3.18
Table 10: Estimation Results from Empirical Study for Cases 1 and 2, CPS 60 and 90

The results of all estimators are close except for Case 2 with CPS 60. In that setup, JIVE's result, which is negative and counterintuitively large in magnitude (larger than 1), is driven by its instability. The TSLS estimate is also well below the estimates of the other approximately unbiased estimators; compared to the other setups (Case 1 CPS 60, Case 1 CPS 90, Case 2 CPS 90), the TSLS estimate for this case is the smallest. In contrast, for the other approximately unbiased estimators, the estimates are closer to each other across all four cases. Bedard and Deschênes (2006) claim that the four TSLS estimates “are almost identical”; Table 10 gives stronger evidence for this claim, with even closer estimates from the other approximately unbiased estimators.

9 Conclusion

This paper formalizes the definition of approximate bias and extends its applicability to other estimators that the approximate bias literature has not considered. The extension motivates new approximately unbiased estimators such as UOJIVE2. I show that UOJIVE2 is consistent under a fixed number of instruments and control variables as the sample size goes to infinity. The consistency and asymptotic normality results of UOJIVE2 do not require the absence of high leverage points, a condition that is necessary for the proofs that I construct to establish consistency and asymptotic normality of UOJIVE1. The simulation study aligns with these theoretical results. When a high leverage point coincides with a high variance of the error term, an outlier is generated and the performance of UOJIVE1 is much poorer than that of UOJIVE2.

Appendix A Approximate bias for classes of estimators that have the $CZ=Z$ property

Recall that

R_N = \underbrace{J\epsilon}_{R1} - \underbrace{\frac{Q_0}{N}\pi'Z'\eta J\epsilon}_{R2} + \underbrace{\frac{Q_0}{N}\eta'C'\epsilon}_{R3} - \underbrace{\frac{Q_0}{N}\eta'P_{Z\pi}\epsilon}_{R4} + o_P(\tfrac{1}{N})

In this section, I provide the proof of Corollary 1 and, in the process, the derivation of $R1$, $R2$, $R3$, and $R4$. Consider an IV estimator that takes the form $(X'C'X)^{-1}(X'C'y) = \beta + (X'C'X)^{-1}(X'C'\epsilon)$, where $CZ = Z$ and hence $CX = Z\pi + C\eta$.

(X'C'X)^{-1}(X'C'\epsilon) = ((\pi'Z' + \eta'C')X)^{-1}(X'C'\epsilon)
= (\pi'Z'X + \eta'C'X)^{-1}(X'C'\epsilon)
= (I + Q\eta'C'X)^{-1}Q(X'C'\epsilon) \quad\text{where}\quad Q = (\pi'Z'X)^{-1}
= (I - \underbrace{Q\eta'C'X}_{\sim O_P(1/\sqrt{N})})\underbrace{Q(X'C'\epsilon)}_{\sim O_P(1/\sqrt{N})} + o_P(\tfrac{1}{N})

The last step is a geometric expansion of $(I + Q\eta'C'X)^{-1}$, where $Q\eta'C'X = O_P(\frac{1}{\sqrt{N}})$ since $Q = (\pi'Z'X)^{-1} = O_P(\frac{1}{N})$ and $\eta'C'X = O_P(\sqrt{N})$. The first term's stochastic order is obvious; I evaluate the stochastic order of the $\lambda_1$-class estimator's $\eta'C'X$ as an example to show that $\eta'C'X = O_P(\sqrt{N})$. The proof for the $\omega_1$-class is similar but easier, given that $\omega \sim O(\frac{1}{N})$ and Assumption BA. The proof for the k-class estimator is trivial.

\eta'C'X = \eta'(P_Z - \lambda D)(I-\lambda D)^{-1}X
= \eta'Z(Z'Z)^{-1}Z'(I-\lambda D)^{-1}X - \lambda\sum_{i=1}^N\eta_i'Z_i(Z'Z)^{-1}Z_i'\frac{X_i}{1-\lambda D_i}
= \eta'Z(Z'Z)^{-1}\sum_{i=1}^N\frac{Z_i'X_i}{1-\lambda D_i} - \lambda\sum_{i=1}^N\eta_i'D_i\frac{X_i}{1-\lambda D_i}

I make Assumption BA for the leverage values of the projection matrix $P_Z = Z(Z'Z)^{-1}Z'$; the assumption implies that for large enough $N$, $D_i \leq m$ for some fixed $m < 1$, for all $i = 1, 2, 3, \dots, N$.

Lemma A.1.

Assume that $0\leq\lambda\leq 1$. Then $\eta'Z(Z'Z)^{-1}\sum_{i=1}^N\frac{Z_i'X_i}{1-\lambda D_i} = O_P(\sqrt{N})$.

Proof.
\eta'Z(Z'Z)^{-1}\sum_{i=1}^N\frac{Z_i'X_i}{1-\lambda D_i} = O_P(\sqrt{N})\,O_P(\tfrac{1}{N})\,O_P(N) = O_P(\sqrt{N})

because the CLT applies to $\frac{1}{\sqrt{N}}\eta'Z$ and the law of large numbers applies to the summation term when divided by $N$. ∎

Lemma A.2.

If $E[\lVert\phi_i\rVert^{r+\delta_1}]$ is finite for some $\delta_1 > 0$, then $\frac{1}{N^{1/r}}\max_i\lVert\phi_i\rVert = o_P(1)$.

Proof.

For a fixed δ>0\delta>0,

P(\tfrac{1}{N^{1/r}}\max_i\lVert\phi_i\rVert < \delta) = P(\max_i\lVert\phi_i\rVert < \delta N^{1/r})
= P(\lVert\phi_i\rVert < \delta N^{1/r}\ \text{for}\ i = 1,2,\dots,N)
= P(\lVert\phi_i\rVert < \delta N^{1/r})^N
= P(\lVert\phi_i\rVert^{r+\delta_1} < (\delta N^{1/r})^{r+\delta_1})^N \quad\text{for $\delta_1 > 0$}
= \left(1 - P(\lVert\phi_i\rVert^{r+\delta_1} \geq (\delta N^{1/r})^{r+\delta_1})\right)^N
\geq \left(1 - \frac{E[\lVert\phi_i\rVert^{r+\delta_1}]}{(\delta N^{1/r})^{r+\delta_1}}\right)^N
= \left(1 - \frac{1}{N}\frac{E[\lVert\phi_i\rVert^{r+\delta_1}]}{\delta^{r+\delta_1}N^{\delta_1/r}}\right)^N
\geq 1 - \frac{E[\lVert\phi_i\rVert^{r+\delta_1}]}{\delta^{r+\delta_1}N^{\delta_1/r}} \to 1

The last inequality holds when $N\geq 1$ and $\frac{E[\lVert\phi_i\rVert^{r+\delta_1}]}{\delta^{r+\delta_1}N^{\delta_1/r}} < N$, both of which are true for large $N$. ∎

With Lemma A.2, it is easy to show the following: assuming that $E[\lVert\eta_i'X_i\rVert^{2+\delta_1}]$ is finite for some $\delta_1 > 0$, $\lVert\sum_{i=1}^N\eta_i'D_i\frac{X_i}{1-\lambda D_i}\rVert \leq \frac{K}{1-m}\max_i\lVert\eta_i'X_i\rVert = o_P(\sqrt{N})$. Combining this result with Lemma A.1 shows that $\eta'C'X = O_P(\sqrt{N})$. The proof that $X'C'\epsilon = O_P(\sqrt{N})$ is similar.

Next, I substitute $X = Z\pi + \eta$ into $(I - Q\eta'C'X)Q(X'C'\epsilon) + o_P(\frac{1}{N})$.

(X'C'X)^{-1}(X'C'\epsilon) = (I - Q\eta'C'(Z\pi+\eta))Q((Z\pi+\eta)'C'\epsilon) + o_P(\tfrac{1}{N})
= (I - Q\eta'C'Z\pi - Q\eta'C'\eta)Q(\pi'Z'\epsilon + \eta'C'\epsilon) + o_P(\tfrac{1}{N})
= Q\pi'Z'\epsilon + Q\eta'C'\epsilon - Q\eta'C'Z\pi Q\pi'Z'\epsilon + o_P(\tfrac{1}{N})

The last equality holds because after cross-multiplying, we have six terms to evaluate:

term    stochastic order    keep or not
$Q\pi'Z'\epsilon$    $O_P(\frac{1}{\sqrt{N}})$    Yes
$Q\eta'C'\epsilon$    $O_P(\frac{1}{N})$    Yes
$-Q\eta'C'Z\pi Q\pi'Z'\epsilon$    $O_P(\frac{1}{N})$    Yes
$-Q\eta'C'Z\pi Q\eta'C'\epsilon$    $O_P(\frac{1}{N\sqrt{N}})$    No
$-Q\eta'C'\eta Q\pi'Z'\epsilon$    $O_P(\frac{1}{N\sqrt{N}})$    No
$-Q\eta'C'\eta Q\eta'C'\epsilon$    $O_P(\frac{1}{N^2})$    No

After dropping the last three terms, we obtain the following expression for the difference between the estimator and β\beta:

(X'C'X)^{-1}(X'C'\epsilon) = Q\pi'Z'\epsilon + Q\eta'C'\epsilon - Q\eta'C'Z\pi Q\pi'Z'\epsilon + o_P(\tfrac{1}{N}) \qquad (7)

We evaluate the three terms in Eq.(7) separately.

A.1 $Q\pi'Z'\epsilon$

Q\pi'Z'\epsilon = (\pi'Z'X)^{-1}\pi'Z'\epsilon
= (\pi'Z'Z\pi + \pi'Z'\eta)^{-1}\pi'Z'\epsilon
= \underbrace{(\pi'Z'Z\pi)^{-1}\pi'Z'\epsilon}_{E[(\pi'Z'Z\pi)^{-1}\pi'Z'\epsilon]=0} - (\pi'Z'Z\pi)^{-1}\pi'Z'\eta(\pi'Z'Z\pi + \pi'Z'\eta)^{-1}\pi'Z'\epsilon

The part of the expression with a zero expectation is exactly R1R1.

(\pi'Z'Z\pi)^{-1}\pi'Z'\eta(\pi'Z'Z\pi + \pi'Z'\eta)^{-1}\pi'Z'\epsilon
= (\pi'Z'Z\pi)^{-1}\pi'Z'\eta(\pi'Z'Z\pi)^{-1}\pi'Z'\epsilon - \underbrace{(\pi'Z'Z\pi)^{-1}\pi'Z'\eta(\pi'Z'Z\pi)^{-1}\pi'Z'\eta(\pi'Z'Z\pi + \pi'Z'\eta)^{-1}\pi'Z'\epsilon}_{\sim O_P(\frac{1}{N\sqrt{N}})}
= \underbrace{\frac{Q_0}{N}\pi'Z'\eta(\pi'Z'Z\pi)^{-1}\pi'Z'\epsilon}_{R2} + o_P(\tfrac{1}{N})

The last equality holds because N(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\to Q_{0}, therefore (\pi^{\prime}Z^{\prime}Z\pi)^{-1}-\frac{Q_{0}}{N}=o(\frac{1}{N}). And we also have that \pi^{\prime}Z^{\prime}\eta(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon=O_{P}(1). So,

\frac{Q_{0}}{N}\pi^{\prime}Z^{\prime}\eta(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon-(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\eta(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon
= (\frac{Q_{0}}{N}-(\pi^{\prime}Z^{\prime}Z\pi)^{-1})\pi^{\prime}Z^{\prime}\eta(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon=o(\frac{1}{N})O_{P}(1)=o_{P}(\frac{1}{N}).

Then, I evaluate the expectation of R2:

E[\frac{Q_{0}}{N}\pi^{\prime}Z^{\prime}\eta(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon]= \frac{Q_{0}}{N}E[\pi^{\prime}Z^{\prime}E[\eta(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon|Z]]
= \frac{Q_{0}}{N}E[\pi^{\prime}Z^{\prime}((\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime})^{\prime}]\sigma_{\eta\epsilon}
= \frac{Q_{0}}{N}I_{L}\sigma_{\eta\epsilon}=\frac{Q_{0}}{N}\sigma_{\eta\epsilon}

A.2 Q\eta^{\prime}C^{\prime}\epsilon

Q\eta^{\prime}C^{\prime}\epsilon=\underbrace{\frac{Q_{0}}{N}\eta^{\prime}C^{\prime}\epsilon}_{R3}+o_{P}(\frac{1}{N})
E[\frac{Q_{0}}{N}\eta^{\prime}C^{\prime}\epsilon]=\frac{Q_{0}}{N}tr(C^{\prime})\sigma_{\eta\epsilon}
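A quick simulation (a sketch with an arbitrary fixed C and jointly normal errors; all names below are mine) confirms the identity behind the expectation of R3, namely E[\eta^{\prime}C^{\prime}\epsilon]=tr(C^{\prime})\sigma_{\eta\epsilon}, when the rows (\eta_{i},\epsilon_{i}) are iid across i.

import numpy as np
# Check E[eta' C' eps] = tr(C') * sigma_eta_eps for iid rows with cross-covariance sigma_eta_eps.
rng = np.random.default_rng(1)
N, reps, sigma_eta_eps = 50, 20000, 0.6
C = rng.standard_normal((N, N))                                # an arbitrary fixed N x N matrix
cov = [[1.0, sigma_eta_eps], [sigma_eta_eps, 1.0]]
e = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, N))   # shape (reps, N, 2)
eta, eps = e[..., 0], e[..., 1]
draws = np.einsum("ri,ij,rj->r", eta, C.T, eps)                # eta' C' eps, one value per replication
print("Monte Carlo mean of eta'C'eps:", np.mean(draws))
print("tr(C') * sigma_eta_eps       :", np.trace(C) * sigma_eta_eps)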

A.3 Q\eta^{\prime}C^{\prime}Z\pi Q\pi^{\prime}Z^{\prime}\epsilon

QηCZπQπZϵ=\displaystyle Q\eta^{\prime}C^{\prime}Z\pi Q\pi^{\prime}Z^{\prime}\epsilon= QηCZπ(πZX)1πZϵ\displaystyle Q\eta^{\prime}C^{\prime}Z\pi(\pi^{\prime}Z^{\prime}X)^{-1}\pi^{\prime}Z^{\prime}\epsilon
=\displaystyle= QηCZπ(πZZπ+πZη)1πZϵ\displaystyle Q\eta^{\prime}C^{\prime}Z\pi(\pi^{\prime}Z^{\prime}Z\pi+\pi^{\prime}Z^{\prime}\eta)^{-1}\pi^{\prime}Z^{\prime}\epsilon
=\displaystyle= QηCZπ(πZZπ)1πZϵQηCZπ(πZZπ)1πZη(πZZπ+πZη)1πZϵO(1NN)\displaystyle Q\eta^{\prime}C^{\prime}Z\pi(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon-\underbrace{Q\eta^{\prime}C^{\prime}Z\pi(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\eta(\pi^{\prime}Z^{\prime}Z\pi+\pi^{\prime}Z^{\prime}\eta)^{-1}\pi^{\prime}Z^{\prime}\epsilon}_{\sim O(\frac{1}{N\sqrt{N}})}
=\displaystyle= Q0NηCZπ(πZZπ)1πZϵequivalent to R4 for approximate bias computation purpose+oP(1N)\displaystyle\underbrace{\frac{Q_{0}}{N}\eta^{\prime}C^{\prime}Z\pi(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon}_{\text{equivalent to $R4$ for approximate bias computation purpose}}+o_{P}(\frac{1}{N})

Though the last expression is not the same as R4, this does not affect the definition of approximate bias, since only the expectations of the last expression and of R4 matter: as long as they share the same expectation, definition 1 remains valid. Recall that R4=\frac{Q_{0}}{N}\eta^{\prime}P_{Z\pi}\epsilon and E[\frac{Q_{0}}{N}\eta^{\prime}P_{Z\pi}\epsilon]=\frac{Q_{0}}{N}tr(P_{Z\pi})\sigma_{\eta\epsilon}. The following shows that the last expression has the same expectation.

E[\frac{Q_{0}}{N}\eta^{\prime}C^{\prime}Z\pi(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon]= \frac{Q_{0}}{N}tr(C^{\prime}Z\pi(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime})\sigma_{\eta\epsilon}
= \frac{Q_{0}}{N}tr(Z^{\prime}C^{\prime}Z\pi(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime})\sigma_{\eta\epsilon}
= \frac{Q_{0}}{N}tr(Z^{\prime}Z\pi(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime})\sigma_{\eta\epsilon}
= \frac{Q_{0}}{N}tr(Z\pi(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime})\sigma_{\eta\epsilon}
= \frac{Q_{0}}{N}tr(P_{Z\pi})\sigma_{\eta\epsilon}

Combining the results from sections A.1, A.2 and A.3, we obtain Corollary 1.
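Before moving on, a rough Monte Carlo check of Corollary 1 (my own simulation design, not one from the paper): estimators of the form (X^{\prime}C^{\prime}X)^{-1}(X^{\prime}C^{\prime}y) with a larger tr(C)-L-1 should display a larger bias. Below, X contains a single endogenous regressor and no controls, so L+1=1; TSLS uses C=P_{Z} with tr(C)=K, while the comparison estimator rescales the leverage correction (a TSJI2-type choice, cf. Appendix D.3) so that tr(C)=L+1.

import numpy as np
rng = np.random.default_rng(2)
N, K, L, beta, reps = 200, 15, 0, 1.0, 2000
pi = np.full(K, 0.15)                          # modest first-stage strength

def c_class(X, y, C):
    # (X'C'X)^{-1} (X'C'y) with a single included regressor
    return (X.T @ C.T @ y).item() / (X.T @ C.T @ X).item()

bias = {"TSLS  (tr(C)=K)  ": [], "TSJI2 (tr(C)=L+1)": []}
for _ in range(reps):
    Z = rng.standard_normal((N, K))
    e = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=N)
    eta, eps = e[:, 0], e[:, 1]
    x = Z @ pi + eta
    y = beta * x + eps
    PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    D = np.diag(np.diag(PZ))                   # diagonal matrix of leverages
    lam2 = (K - L - 1) / K                     # makes tr(PZ - lam2*D) = L + 1
    X = x[:, None]
    bias["TSLS  (tr(C)=K)  "].append(c_class(X, y, PZ) - beta)
    bias["TSJI2 (tr(C)=L+1)"].append(c_class(X, y, PZ - lam2 * D) - beta)
for name, b in bias.items():
    print(f"{name}: mean bias = {np.mean(b):+.4f}")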

Appendix B Approximate bias for classes of estimators that do not have the CZ=Z property

This section shows that definition 1 applies to \omega_{2}-class and \lambda_{2}-class estimators. Once the validity of the definition is established, it is trivial to show that corollary 1 is also true for these two classes of estimators.

B.1 \omega_{2}-class estimators

Recall that the closed-form expression of the \omega_{2}-class estimator is

\hat{\beta}_{\omega_{2}}=(X^{\prime}(P_{Z}-D+\omega_{2}I)^{\prime}X)^{-1}(X^{\prime}(P_{Z}-D+\omega_{2}I)^{\prime}y)

and the difference between \hat{\beta}_{\omega_{2}} and \beta is

(X(PZD+ω2I)X)1(X(PZD+ω2I)ϵ)\displaystyle(X^{\prime}(P_{Z}-D+\omega_{2}I)^{\prime}X)^{-1}(X^{\prime}(P_{Z}-D+\omega_{2}I)^{\prime}\epsilon)
=\displaystyle= (QX(PZD+ω2I)X)1(QX(PZD+ω2I)ϵ)\displaystyle(QX^{\prime}(P_{Z}-D+\omega_{2}I)^{\prime}X)^{-1}(QX^{\prime}(P_{Z}-D+\omega_{2}I)^{\prime}\epsilon)
=\displaystyle= (Q(πZ+η)(PZD+ω2I)X)1(QX(PZD+ω2I)ϵ)\displaystyle(Q(\pi^{\prime}Z^{\prime}+\eta^{\prime})(P_{Z}-D+\omega_{2}I)^{\prime}X)^{-1}(QX^{\prime}(P_{Z}-D+\omega_{2}I)^{\prime}\epsilon)
=\displaystyle= (IQπZDXoP(1N)+ω2IO(1N)+QηPZXOP(1N)QηDXoP(1N)+ω2QηXOP(1N))1(QX(PZD+ω2I)ϵ)OP(1N)\displaystyle(I-\underbrace{Q\pi^{\prime}Z^{\prime}D^{\prime}X}_{o_{P}(\frac{1}{\sqrt{N}})}+\underbrace{\omega_{2}I}_{O(\frac{1}{N})}+\underbrace{Q\eta^{\prime}P_{Z}^{\prime}X}_{O_{P}(\frac{1}{\sqrt{N}})}-\underbrace{Q\eta^{\prime}D^{\prime}X}_{o_{P}(\frac{1}{\sqrt{N}})}+\underbrace{\omega_{2}Q\eta^{\prime}X}_{O_{P}(\frac{1}{N})})^{-1}\underbrace{(QX^{\prime}(P_{Z}-D+\omega_{2}I)^{\prime}\epsilon)}_{\sim O_{P}(\frac{1}{\sqrt{N}})}
=\displaystyle= (IQηPZX)(QX(PZD+ω2I)ϵ)+oP(1N)\displaystyle(I-Q\eta^{\prime}P_{Z}^{\prime}X)(QX^{\prime}(P_{Z}-D+\omega_{2}I)^{\prime}\epsilon)+o_{P}(\frac{1}{N})

After cross-multiplying, we obtain the following six terms

term | stochastic order | keep or not
QX^{\prime}P_{Z}^{\prime}\epsilon | O_{P}(\frac{1}{\sqrt{N}}) | Yes
-QX^{\prime}D^{\prime}\epsilon | o_{P}(\frac{1}{\sqrt{N}}) | Yes
\omega_{2}QX^{\prime}\epsilon | O_{P}(\frac{1}{N}) | Yes
-Q\eta^{\prime}P_{Z}^{\prime}XQX^{\prime}P_{Z}^{\prime}\epsilon | O_{P}(\frac{1}{N}) | Yes
Q\eta^{\prime}P_{Z}^{\prime}XQX^{\prime}D^{\prime}\epsilon | o_{P}(\frac{1}{N}) | No
-\omega_{2}Q\eta^{\prime}P_{Z}^{\prime}XQX^{\prime}\epsilon | O_{P}(\frac{1}{N\sqrt{N}}) | No

B.1.1 QX^{\prime}P_{Z}^{\prime}\epsilon

QXPZϵ=\displaystyle QX^{\prime}P_{Z}^{\prime}\epsilon= (πZX)1(πZ+η)PZϵ\displaystyle(\pi^{\prime}Z^{\prime}X)^{-1}(\pi^{\prime}Z^{\prime}+\eta^{\prime})P_{Z}^{\prime}\epsilon
=\displaystyle= (πZX)1(πZϵ+ηPZϵ)\displaystyle(\pi^{\prime}Z^{\prime}X)^{-1}(\pi^{\prime}Z^{\prime}\epsilon+\eta^{\prime}P_{Z}^{\prime}\epsilon)
=\displaystyle= (πZZπ)1O(1N)(πZϵOP(N)+ηPZϵOP(1))(πZZπ)1(πZη)(πZX)1OP(1NN)(πZϵOP(N)+ηPZϵOP(1))\displaystyle\underbrace{(\pi^{\prime}Z^{\prime}Z\pi)^{-1}}_{O(\frac{1}{N})}(\underbrace{\pi^{\prime}Z^{\prime}\epsilon}_{O_{P}(\sqrt{N})}+\underbrace{\eta^{\prime}P_{Z}^{\prime}\epsilon}_{O_{P}(1)})-\underbrace{(\pi^{\prime}Z^{\prime}Z\pi)^{-1}(\pi^{\prime}Z^{\prime}\eta)(\pi^{\prime}Z^{\prime}X)^{-1}}_{O_{P}(\frac{1}{N\sqrt{N}})}(\underbrace{\pi^{\prime}Z^{\prime}\epsilon}_{O_{P}(\sqrt{N})}+\underbrace{\eta^{\prime}P_{Z}^{\prime}\epsilon}_{O_{P}(1)})
=\displaystyle= JϵR1+Q0NηPZϵ(a)Q0N(πZη)(πZZπ)1πZϵR2+oP(1N)\displaystyle\underbrace{J\epsilon}_{R1}+\underbrace{\frac{Q_{0}}{N}\eta^{\prime}P_{Z}^{\prime}\epsilon}_{(a)}-\underbrace{\frac{Q_{0}}{N}(\pi^{\prime}Z^{\prime}\eta)(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon}_{R2}+o_{P}(\frac{1}{N})

B.1.2 -QX^{\prime}D^{\prime}\epsilon

QXDϵ=\displaystyle-QX^{\prime}D^{\prime}\epsilon= (πZZπ)1(πZ+η)Dϵ+(πZZπ)1(πZη)(πZX)1OP(1NN)XDϵoP(N)\displaystyle-(\pi^{\prime}Z^{\prime}Z\pi)^{-1}(\pi^{\prime}Z^{\prime}+\eta^{\prime})D^{\prime}\epsilon+\underbrace{(\pi^{\prime}Z^{\prime}Z\pi)^{-1}(\pi^{\prime}Z^{\prime}\eta)(\pi^{\prime}Z^{\prime}X)^{-1}}_{O_{P}(\frac{1}{N\sqrt{N}})}\underbrace{X^{\prime}D^{\prime}\epsilon}_{o_{P}({\sqrt{N}})}
=\displaystyle= (πZZπ)1(πZDϵ)E[(πZZπ)1(πZDϵ)]=0Q0N(ηDϵ)(b)+oP(1N)\displaystyle-\underbrace{(\pi^{\prime}Z^{\prime}Z\pi)^{-1}(\pi^{\prime}Z^{\prime}D^{\prime}\epsilon)}_{E[(\pi^{\prime}Z^{\prime}Z\pi)^{-1}(\pi^{\prime}Z^{\prime}D^{\prime}\epsilon)]=0}-\underbrace{\frac{Q_{0}}{N}(\eta^{\prime}D^{\prime}\epsilon)}_{(b)}+o_{P}(\frac{1}{N})

B.1.3 \omega_{2}QX^{\prime}\epsilon

ω2QXϵOP(1N)=\displaystyle\underbrace{\omega_{2}QX^{\prime}\epsilon}_{O_{P}(\frac{1}{N})}= ω2(πZZπ)1(πZ+η)ϵ+oP(1N)\displaystyle\omega_{2}(\pi^{\prime}Z^{\prime}Z\pi)^{-1}(\pi^{\prime}Z^{\prime}+\eta^{\prime})\epsilon+o_{P}(\frac{1}{N})
=\displaystyle= ω2(πZZπ)1(πZϵ)E[ω2(πZZπ)1(πZϵ)]=0+Q0Nη(ω2I)ϵ(c)+oP(1N)\displaystyle\underbrace{\omega_{2}(\pi^{\prime}Z^{\prime}Z\pi)^{-1}(\pi^{\prime}Z^{\prime}\epsilon)}_{E[\omega_{2}(\pi^{\prime}Z^{\prime}Z\pi)^{-1}(\pi^{\prime}Z^{\prime}\epsilon)]=0}+\underbrace{\frac{Q_{0}}{N}\eta^{\prime}(\omega_{2}I)^{\prime}\epsilon}_{(c)}+o_{P}(\frac{1}{N})

Note that R3=(a)-(b)+(c).

B.1.4 -Q\eta^{\prime}P_{Z}^{\prime}XQX^{\prime}P_{Z}^{\prime}\epsilon

QηPZXQXPZϵOP(1N)=\displaystyle-\underbrace{Q\eta^{\prime}P_{Z}^{\prime}XQX^{\prime}P_{Z}^{\prime}\epsilon}_{O_{P}(\frac{1}{N})}= (πZZπ)1ηZπ(πZZπ)1πZϵ+oP(1N)\displaystyle-(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\eta^{\prime}Z\pi(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon+o_{P}(\frac{1}{N})
=\displaystyle= Q0NηPZπϵR4+oP(1N)\displaystyle-\underbrace{\frac{Q_{0}}{N}\eta^{\prime}P_{Z\pi}\epsilon}_{R4}+o_{P}(\frac{1}{N})
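To make the \omega_{2}-class algebra above concrete, here is a minimal implementation sketch of the closed form recalled at the start of this appendix; the function and variable names are my own.

import numpy as np

def omega2_class(X, Z, y, omega2):
    """(X'(P_Z - D + omega_2 I)'X)^{-1} X'(P_Z - D + omega_2 I)'y."""
    N = Z.shape[0]
    PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)     # projection onto the instrument space
    D = np.diag(np.diag(PZ))                   # diagonal matrix of leverages P_{Z,ii}
    C = PZ - D + omega2 * np.eye(N)            # the omega_2-class C matrix
    return np.linalg.solve(X.T @ C.T @ X, X.T @ C.T @ y)

# omega2 = (L + 1)/N gives UOJIVE2 (Appendix D.5); omega2 = 0 drops the identity term entirely.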

B.2 \lambda_{2}-class estimators

Recall that the closed-form expression of the \lambda_{2}-class estimator is

\hat{\beta}_{\lambda_{2}}=(X^{\prime}(P_{Z}-\lambda_{2}D)^{\prime}X)^{-1}(X^{\prime}(P_{Z}-\lambda_{2}D)^{\prime}y)

and the difference between \hat{\beta}_{\lambda_{2}} and \beta is

(X(PZλ2D)X)1(X(PZλ2D)ϵ)\displaystyle(X^{\prime}(P_{Z}-\lambda_{2}D)^{\prime}X)^{-1}(X^{\prime}(P_{Z}-\lambda_{2}D)^{\prime}\epsilon)
=\displaystyle= (QX(PZλ2D)X)1(QX(PZλ2D)ϵ)\displaystyle(QX^{\prime}(P_{Z}-\lambda_{2}D)^{\prime}X)^{-1}(QX^{\prime}(P_{Z}-\lambda_{2}D)^{\prime}\epsilon)
=\displaystyle= (I+QηPZXOP(1N)λ2QXDXoP(1N))1(QX(PZλ2D)ϵ)OP(1N)\displaystyle(I+\underbrace{Q\eta^{\prime}P_{Z}^{\prime}X}_{O_{P}(\frac{1}{\sqrt{N}})}-\underbrace{\lambda_{2}QX^{\prime}D^{\prime}X}_{o_{P}(\frac{1}{\sqrt{N}})})^{-1}\underbrace{(QX^{\prime}(P_{Z}-\lambda_{2}D)^{\prime}\epsilon)}_{O_{P}(\frac{1}{\sqrt{N}})}
=\displaystyle= (IQηPZXOP(1N))(QXPZϵOP(1N)λ2QXDϵoP(1N))+oP(1N)\displaystyle(I-\underbrace{Q\eta^{\prime}P_{Z}^{\prime}X}_{O_{P}(\frac{1}{\sqrt{N}})})(\underbrace{QX^{\prime}P_{Z}^{\prime}\epsilon}_{O_{P}(\frac{1}{\sqrt{N}})}-\underbrace{\lambda_{2}QX^{\prime}D^{\prime}\epsilon}_{o_{P}(\frac{1}{\sqrt{N}})})+o_{P}(\frac{1}{N})
=\displaystyle= QXPZϵQηPZXQXPZϵλ2QXDϵ+oP(1N)\displaystyle QX^{\prime}P_{Z}^{\prime}\epsilon-Q\eta^{\prime}P_{Z}^{\prime}XQX^{\prime}P_{Z}^{\prime}\epsilon-\lambda_{2}QX^{\prime}D^{\prime}\epsilon+o_{P}(\frac{1}{N})

B.2.1 QX^{\prime}P_{Z}^{\prime}\epsilon

QX^{\prime}P_{Z}^{\prime}\epsilon= Q\pi^{\prime}Z^{\prime}\epsilon+Q\eta^{\prime}P_{Z}^{\prime}\epsilon
= \underbrace{(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon}_{R1}-\underbrace{(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\eta(\pi^{\prime}Z^{\prime}X)^{-1}\pi^{\prime}Z^{\prime}\epsilon}_{O_{P}(\frac{1}{N})}+\underbrace{Q\eta^{\prime}P_{Z}^{\prime}\epsilon}_{O_{P}(\frac{1}{N})}
= \underbrace{J\epsilon}_{R1}-\underbrace{\frac{Q_{0}}{N}\pi^{\prime}Z^{\prime}\eta(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon}_{R_{2}}+\underbrace{\frac{Q_{0}}{N}\eta^{\prime}P_{Z}^{\prime}\epsilon}_{(d)}+o_{P}(\frac{1}{N})

B.2.2 -Q\eta^{\prime}P_{Z}^{\prime}XQX^{\prime}P_{Z}^{\prime}\epsilon

-\underbrace{Q\eta^{\prime}P_{Z}^{\prime}XQX^{\prime}P_{Z}^{\prime}\epsilon}_{O_{P}(\frac{1}{N})}= -\underbrace{(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\eta^{\prime}P_{Z}^{\prime}(Z\pi+\eta)(\pi^{\prime}Z^{\prime}Z\pi)^{-1}(\pi^{\prime}Z^{\prime}+\eta^{\prime})P_{Z}^{\prime}\epsilon}_{O_{P}(\frac{1}{N})}+o_{P}(\frac{1}{N})
= -\underbrace{\frac{Q_{0}}{N}\eta^{\prime}Z\pi(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}\epsilon}_{R4}+o_{P}(\frac{1}{N})

B.2.3 -\lambda_{2}QX^{\prime}D^{\prime}\epsilon

-\underbrace{\lambda_{2}QX^{\prime}D^{\prime}\epsilon}_{o_{P}(\frac{1}{\sqrt{N}})}= -\lambda_{2}(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\eta^{\prime}D^{\prime}\epsilon-\lambda_{2}(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}D^{\prime}\epsilon+o_{P}(\frac{1}{N})
= -\underbrace{\frac{Q_{0}}{N}\eta^{\prime}\lambda_{2}D^{\prime}\epsilon}_{(e)}-\underbrace{\lambda_{2}(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}D^{\prime}\epsilon}_{E[\lambda_{2}(\pi^{\prime}Z^{\prime}Z\pi)^{-1}\pi^{\prime}Z^{\prime}D^{\prime}\epsilon]=0}+o_{P}(\frac{1}{N})

Note that R_{3}=(d)-(e).

Appendix C Proof that the approximate bias of UIJIVE1 is asymptotically vanishing

tr(C)-\mathcal{L}-1=\sum_{i=1}^{N}\frac{\frac{L_{1}+1}{N}}{1-\tilde{D}_{i}+\frac{L_{1}+1}{N}}-L_{1}-1= \sum_{i=1}^{N}\frac{L_{1}+1}{N-N\tilde{D}_{i}+L_{1}+1}-L_{1}-1
= \sum_{i=1}^{N}\{\frac{L_{1}+1}{N-N\tilde{D}_{i}+L_{1}+1}-\frac{L_{1}+1}{N}\}
= \sum_{i=1}^{N}\frac{N\tilde{D}_{i}(L_{1}+1)-(L_{1}+1)^{2}}{(N-N\tilde{D}_{i}+L_{1}+1)N}
Claim C.1.

tr(C)-\mathcal{L}-1\to 0.

Proof.
\lVert\sum_{i=1}^{N}\frac{\frac{L_{1}+1}{N}}{1-\tilde{D}_{i}+\frac{L_{1}+1}{N}}-L_{1}-1\rVert\leq \lVert\sum_{i=1}^{N}\frac{N\tilde{D}_{i}(L_{1}+1)}{(N-N\tilde{D}_{i}+L_{1}+1)N}\rVert+\lVert\sum_{i=1}^{N}\frac{(L_{1}+1)^{2}}{(N-N\tilde{D}_{i}+L_{1}+1)N}\rVert
\leq \lVert\sum_{i=1}^{N}\frac{\tilde{D}_{i}(L_{1}+1)}{(mN+L_{1}+1)}\rVert+\lVert\sum_{i=1}^{N}\frac{(L_{1}+1)^{2}}{(mN+L_{1}+1)N}\rVert\quad\text{(BA)}
= \frac{K_{1}(L_{1}+1)}{mN+L_{1}+1}+\frac{(L_{1}+1)^{2}}{mN+L_{1}+1}\to 0
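A numerical illustration of Claim C.1 (a sketch: I take \tilde{D}_{i} to be the leverages of a simulated N\times K_{1} matrix and treat L_{1} as a fixed small integer; both are my stand-ins for the paper's objects):

import numpy as np
rng = np.random.default_rng(3)
K1, L1 = 8, 2
for N in (100, 1000, 10000):
    A = rng.standard_normal((N, K1))
    Q, _ = np.linalg.qr(A)
    d = np.sum(Q**2, axis=1)                   # leverages of the rows of A; they sum to K1
    c = (L1 + 1) / N
    gap = np.sum(c / (1 - d + c)) - (L1 + 1)   # tr(C) - L1 - 1 in the notation above
    print(f"N = {N:>6d}: tr(C) - L1 - 1 = {gap:.5f}")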

Appendix D Generalizing approximate bias to other classes of estimators

D.1 k-class estimators

Classical k-class estimators take the form (X^{\prime}C^{\prime}X)^{-1}(X^{\prime}C^{\prime}y), where C is an affine combination of C_{OLS} (=I) and C_{TSLS} (=P_{Z}). This C matrix satisfies the CZ=Z property:

kC_{TSLS}Z+(1-k)C_{OLS}Z=kZ+(1-k)Z=Z\quad\text{where}\quad k\in\mathbb{R}.

Therefore, corollary 1 applies to all k-class estimators. I set k=\frac{N-L-1}{N-K} so that the approximate bias of the k-class estimator is zero, as in Eq. (8). The resulting estimator is termed the Approximately Unbiased k-class estimator (AUK for short); AUK's k converges at a rate of O(\frac{1}{N^{2}}) to that of the Nagar estimator. In contrast, the Nagar estimator's k converges to that of TSLS (k=1) at a rate of O(\frac{1}{N}).

tr(kC_{TSLS}+(1-k)C_{OLS})-L-1 =0 (8)
kK+(1-k)N-L-1 =0
k=1+\frac{K-L-1}{N-K} =\underbrace{1+\frac{K-L-1}{N}}_{\text{Nagar estimator's $k$}}+O(\frac{1}{N^{2}})
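A short numerical comparison of the two k values (a sketch with arbitrary K and L):

# AUK's k approaches the Nagar k at rate O(1/N^2), while the Nagar k approaches
# the TSLS value k = 1 only at rate O(1/N).
K, L = 10, 2
for N in (100, 1000, 10000):
    k_auk = 1 + (K - L - 1) / (N - K)
    k_nagar = 1 + (K - L - 1) / N
    print(f"N = {N:>6d}: |k_AUK - k_Nagar| = {abs(k_auk - k_nagar):.2e}, "
          f"|k_Nagar - 1| = {abs(k_nagar - 1):.2e}")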

D.2 \lambda_{1}-class estimator

The \lambda_{1}-class estimator bridges JIVE1 and TSLS, both of which have the CZ=Z property. To maintain this property, the C matrix of the \lambda_{1}-class estimator is designed to be (I-\lambda_{1}D)^{-1}(P_{Z}-\lambda_{1}D), such that when \lambda_{1}=0, the estimator is TSLS; when \lambda_{1}=1, the estimator is JIVE1. The CZ=Z property can be verified directly:

(I-\lambda_{1}D)^{-1}(P_{Z}-\lambda_{1}D)Z=(I-\lambda_{1}D)^{-1}(Z-\lambda_{1}DZ)=(I-\lambda_{1}D)^{-1}(I-\lambda_{1}D)Z=Z.

By Corollary 1, the approximate bias of the \lambda_{1}-class estimator is proportional to

(1-\lambda_{1})\sum_{i=1}^{N}\frac{D_{i}}{1-\lambda_{1}D_{i}}-L-1 (9)

Under Assumption BA, setting \lambda_{1}=\frac{K-L-1}{K} makes the approximate bias asymptotically vanish.
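Expression (9) can be evaluated directly; the sketch below uses balanced leverages D_{i}=K/N (my own simplification, consistent with Assumption BA) to show the approximate bias shrinking under the choice \lambda_{1}=\frac{K-L-1}{K}.

K, L = 10, 2
lam1 = (K - L - 1) / K
for N in (50, 500, 5000):
    D = [K / N] * N                            # balanced leverages summing to K
    expr9 = (1 - lam1) * sum(d / (1 - lam1 * d) for d in D) - L - 1
    print(f"N = {N:>5d}: expression (9) = {expr9:.5f}")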

D.3 \lambda_{2}-class estimator

The relationship between the \lambda_{1}-class and \lambda_{2}-class estimators is analogous to the relationship between JIVE1 and JIVE2: the \lambda_{2}-class estimator removes the row-wise division of the \lambda_{1}-class estimator. Hence, the C matrix of the \lambda_{2}-class estimator is designed to be P_{Z}-\lambda_{2}D.

By corollary 1, the approximate bias of the \lambda_{2}-class estimator is (1-\lambda_{2})K-L-1, and hence the approximately unbiased \lambda_{2}-class estimator has \lambda_{2}=\frac{K-L-1}{K}; I call this estimator TSJI2.

D.4 \omega_{1}-class estimator: UOJIVE1

By Corollary 1, the approximate bias of the \omega_{1}-class estimator is proportional to

\sum_{i=1}^{N}\frac{\omega_{1}}{1-D_{i}+\omega_{1}}-L-1. (10)

By an argument similar to that of Appendix C, setting \omega_{1}=\frac{L+1}{N} makes the approximate bias asymptotically vanish; a numerical sketch of this choice, together with the analogous choice for UOJIVE2, follows Section D.5.

D.5 \omega_{2}-class estimator: UOJIVE2

Similarly, we set \omega_{2}=\frac{L+1}{N}.
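A sketch of the two UOJIVE C matrices with \omega=\frac{L+1}{N} (names are mine; the row-normalized form used for UOJIVE1 is my reading of the \omega_{1}-class definition, cf. expression (10) and Lemma 6.1, while UOJIVE2 uses the \omega_{2}-class form from Appendix B.1):

import numpy as np

def uojive(X, Z, y, L, row_normalized):
    """UOJIVE1 (row_normalized=True) or UOJIVE2 (row_normalized=False), omega = (L+1)/N."""
    N = Z.shape[0]
    omega = (L + 1) / N
    PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    D = np.diag(np.diag(PZ))                       # leverage matrix, tr(D) = K
    C = PZ - D + omega * np.eye(N)                 # UOJIVE2: tr(C) = K - K + N*omega = L + 1
    if row_normalized:                             # UOJIVE1: divide row i by 1 - D_i + omega
        C = np.linalg.solve(np.eye(N) - D + omega * np.eye(N), C)
    return np.linalg.solve(X.T @ C.T @ X, X.T @ C.T @ y)

With this reading, tr(C) equals \sum_{i}\frac{\omega_{1}}{1-D_{i}+\omega_{1}} for UOJIVE1, matching expression (10), and exactly L+1 for UOJIVE2.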

Appendix E Asymptotic proofs

E.1 Proof for Lemma 6.1

\frac{1}{N}X^{\prime}P_{Z}(I-D+\omega I)^{-1}X=\frac{1}{N}X^{\prime}Z(Z^{\prime}Z)^{-1}\sum_{i=1}^{N}\frac{Z_{i}^{\prime}X_{i}}{1-D_{i}+\omega} (11)
\frac{1}{N}X^{\prime}D(I-D+\omega I)^{-1}X=\frac{1}{N}\sum_{i=1}^{N}\frac{D_{i}X_{i}^{\prime}X_{i}}{1-D_{i}+\omega} (12)
\frac{1}{N}X^{\prime}\omega I(I-D+\omega I)^{-1}X=\frac{1}{N}\sum_{i=1}^{N}\frac{\omega X_{i}^{\prime}X_{i}}{1-D_{i}+\omega} (13)

Consider expression (11),

\lVert\frac{1}{N}\sum_{i=1}^{N}\frac{Z_{i}^{\prime}X_{i}}{1-D_{i}+\omega}-\frac{1}{N}Z^{\prime}X\rVert\leq \frac{1}{N}\sum_{i=1}^{N}\lVert\frac{(D_{i}-\omega)Z_{i}^{\prime}X_{i}}{1-D_{i}+\omega}\rVert
\leq \frac{1}{N}\frac{K}{m}\max_{i}\lVert Z_{i}^{\prime}X_{i}\rVert+\frac{1}{N}\frac{L+1}{m}\max_{i}\lVert Z_{i}^{\prime}X_{i}\rVert

Assumption 2 and Lemma A.2 jointly imply that \lVert\frac{1}{N}\sum_{i=1}^{N}\frac{Z_{i}^{\prime}X_{i}}{1-D_{i}+\omega}-\frac{1}{N}Z^{\prime}X\rVert is bounded above by the sum of two o_{P}(1) terms. Therefore, expression (11) converges in probability to H.

Consider expression (12),

\lVert\frac{1}{N}\sum_{i=1}^{N}\frac{D_{i}X_{i}^{\prime}X_{i}}{1-D_{i}+\omega}\rVert\leq\frac{1}{N}\sum_{i=1}^{N}\lVert\frac{D_{i}X_{i}^{\prime}X_{i}}{1-D_{i}+\omega}\rVert\leq\frac{K}{Nm}\max_{i}\lVert X_{i}^{\prime}X_{i}\rVert.

Assumption 2 and Lemma A.2 jointly imply that expression (12) converges in probability to 0.

Consider expression (13),

\lVert\frac{1}{N}\sum_{i=1}^{N}\frac{\omega X_{i}^{\prime}X_{i}}{1-D_{i}+\omega}\rVert\leq\frac{\lim_{N\to\infty}\omega}{m}E[\lVert X_{i}^{\prime}X_{i}\rVert]\to 0

because \omega=O(\frac{1}{N}).
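A numerical check of Lemma 6.1 (a sketch under an arbitrary design of my own): with \omega=\frac{L+1}{N}, the inner sum of expression (11) stays close to \frac{1}{N}Z^{\prime}X, while expressions (12) and (13) shrink as N grows.

import numpy as np
rng = np.random.default_rng(4)
K, L = 6, 1
for N in (200, 2000, 20000):
    Z = rng.standard_normal((N, K))
    X = Z @ rng.uniform(0.2, 0.5, size=(K, 2)) + rng.standard_normal((N, 2))
    omega = (L + 1) / N
    Q, _ = np.linalg.qr(Z)
    D = np.sum(Q**2, axis=1)                             # leverages D_i
    w = 1.0 / (1.0 - D + omega)
    gap11 = Z.T @ (X * w[:, None]) / N - Z.T @ X / N     # inner sum of (11) minus Z'X/N
    expr12 = X.T @ (X * (D * w)[:, None]) / N            # expression (12)
    expr13 = omega * X.T @ (X * w[:, None]) / N          # expression (13)
    print(N, np.abs(gap11).max(), np.abs(expr12).max(), np.abs(expr13).max())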

E.2 Proof for Lemma 6.2

\frac{1}{N}X^{\prime}P_{Z}(I-D+\omega I)^{-1}\epsilon=\frac{1}{N}X^{\prime}Z(Z^{\prime}Z)^{-1}\sum_{i=1}^{N}\frac{Z_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega} (14)
\frac{1}{N}X^{\prime}D(I-D+\omega I)^{-1}\epsilon=\frac{1}{N}\sum_{i=1}^{N}\frac{D_{i}X_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega} (15)
\frac{1}{N}X^{\prime}\omega I(I-D+\omega I)^{-1}\epsilon=\frac{1}{N}\sum_{i=1}^{N}\frac{\omega X_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega} (16)

Consider \frac{1}{N}\sum_{i=1}^{N}\frac{Z_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega} in expression (14),

\lVert\frac{1}{N}\sum_{i=1}^{N}\frac{Z_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}-\frac{1}{N}Z^{\prime}\epsilon\rVert\leq \frac{1}{N}\sum_{i=1}^{N}\lVert\frac{D_{i}Z_{i}^{\prime}\epsilon_{i}-\omega Z_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}\rVert
\leq \frac{1}{Nm}\sum_{i=1}^{N}\lVert D_{i}Z_{i}^{\prime}\epsilon_{i}-\omega Z_{i}^{\prime}\epsilon_{i}\rVert\quad\text{(BA)}
\leq \frac{K}{Nm}\max_{i}\lVert Z_{i}^{\prime}\epsilon_{i}\rVert+\frac{N\omega}{Nm}\max_{i}\lVert Z_{i}^{\prime}\epsilon_{i}\rVert

Both terms converge in probability to zero under Assumption 3. Note that N\omega=O(1). Therefore, \frac{N\omega}{Nm}\max_{i}\lVert Z_{i}^{\prime}\epsilon_{i}\rVert=O(1)o_{P}(1)=o_{P}(1).

Consider expression (15),

\lVert\frac{1}{N}\sum_{i=1}^{N}\frac{D_{i}X_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}\rVert\leq\frac{1}{Nm}\sum_{i=1}^{N}\lVert D_{i}X_{i}^{\prime}\epsilon_{i}\rVert\leq\frac{K}{Nm}\max_{i}\lVert X_{i}^{\prime}\epsilon_{i}\rVert

The last term converges in probability to zero under Assumption 3.

Consider expression (16),

\lVert\frac{1}{N}\sum_{i=1}^{N}\frac{\omega X_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}\rVert\leq\frac{1}{N}\sum_{i=1}^{N}\lVert\frac{\omega X_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}\rVert\leq\frac{N\omega}{Nm}\max_{i}\lVert X_{i}^{\prime}\epsilon_{i}\rVert

The last term converges in probability to zero under Assumption 3.

E.3 Proof for Lemma 6.3

\frac{1}{N}X^{\prime}P_{Z}X\overset{p}{\to}H
\frac{1}{N}X^{\prime}DX=\frac{1}{N}\sum_{i=1}^{N}D_{i}X_{i}^{\prime}X_{i}\leq\frac{K}{N}\max_{i}\lVert X_{i}^{\prime}X_{i}\rVert\overset{p}{\to}0
\frac{1}{N}\omega X^{\prime}X=\frac{1}{N}\sum_{i=1}^{N}\omega X_{i}^{\prime}X_{i}\overset{p}{\to}0

E.4 Proof for Lemma 6.4

\frac{1}{N}X^{\prime}P_{Z}\epsilon=\frac{1}{N}X^{\prime}Z(Z^{\prime}Z)^{-1}Z^{\prime}\epsilon\overset{p}{\to}0
\frac{1}{N}X^{\prime}D\epsilon=\frac{1}{N}\sum_{i=1}^{N}D_{i}X_{i}^{\prime}\epsilon_{i}\leq\frac{K}{N}\max_{i}\lVert X_{i}^{\prime}\epsilon_{i}\rVert\overset{p}{\to}0
\frac{1}{N}\omega X^{\prime}\epsilon=\frac{1}{N}\sum_{i=1}^{N}\omega X_{i}^{\prime}\epsilon_{i}\overset{p}{\to}0

E.5 Proof for Lemma 6.5

The proof is similar to the proof for Lemma 6.2. I first show that \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{Z_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}-\frac{1}{\sqrt{N}}Z^{\prime}\epsilon\overset{p}{\to}0.

\lVert\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{Z_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}-\frac{1}{\sqrt{N}}\sum_{i=1}^{N}Z_{i}^{\prime}\epsilon_{i}\rVert\leq \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lVert\frac{(D_{i}-\omega)Z_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}\rVert
\leq \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lVert\frac{D_{i}Z_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}\rVert+\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lVert\frac{\omega Z_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}\rVert
\leq \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lVert\frac{D_{i}Z_{i}^{\prime}\epsilon_{i}}{m+\omega}\rVert+\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lVert\frac{\omega Z_{i}^{\prime}\epsilon_{i}}{m+\omega}\rVert
\leq \frac{K}{(m+\omega)\sqrt{N}}\max_{i}\lVert Z_{i}^{\prime}\epsilon_{i}\rVert+\frac{\omega}{(m+\omega)\sqrt{N}}\sum_{i=1}^{N}\lVert Z_{i}^{\prime}\epsilon_{i}\rVert

The first term converges to 0 in probability under Assumption 4. The second term converges to 0 in probability because \omega=O(\frac{1}{N}). Therefore, \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{Z_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}\overset{d}{\to}N(0,\sigma_{\epsilon}^{2}\Sigma_{ZZ}) and \frac{1}{\sqrt{N}}X^{\prime}P_{Z}(I-D+\omega I)^{-1}\epsilon\overset{d}{\to}N(0,\sigma_{\epsilon}^{2}H).

The other two terms converge to 0 in probability under Assumptions BA and 4:

\lVert\frac{1}{\sqrt{N}}X^{\prime}D(I-D+\omega I)^{-1}\epsilon\rVert\leq\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lVert\frac{D_{i}X_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}\rVert \leq\frac{1}{\sqrt{N}}\frac{K}{m+\omega}\max_{i}\lVert X_{i}^{\prime}\epsilon_{i}\rVert\overset{p}{\to}0,
\lVert\frac{1}{\sqrt{N}}X^{\prime}\omega I(I-D+\omega I)^{-1}\epsilon\rVert\leq\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lVert\frac{\omega X_{i}^{\prime}\epsilon_{i}}{1-D_{i}+\omega}\rVert \leq\frac{N\omega}{m+\omega}\frac{1}{\sqrt{N}}\max_{i}\lVert X_{i}^{\prime}\epsilon_{i}\rVert
=\frac{L+1}{m+O(1/N)}o_{P}(1)=o_{P}(1).

E.6 Proof for Lemma 6.6

\frac{1}{\sqrt{N}}X^{\prime}P_{Z}^{\prime}\epsilon= \frac{1}{\sqrt{N}}X^{\prime}Z(Z^{\prime}Z)^{-1}Z^{\prime}\epsilon\overset{d}{\to}N(0,\sigma_{\epsilon}^{2}\Sigma_{X^{\prime}Z}\Sigma_{Z^{\prime}Z}^{-1}\Sigma_{Z^{\prime}X})
\frac{1}{\sqrt{N}}X^{\prime}D^{\prime}\epsilon= \frac{1}{\sqrt{N}}\sum_{i=1}^{N}D_{i}X_{i}^{\prime}\epsilon_{i}\overset{p}{\to}0
\frac{L+1}{\sqrt{N}}\frac{1}{N}X^{\prime}\epsilon= O(\frac{1}{\sqrt{N}})O_{P}(1)=O_{P}(\frac{1}{\sqrt{N}})=o_{P}(1)

Appendix F Example for the simulation setup with outliers

Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Description
(N-1)^{1/3} | 0 | 0 | 0 | 0 | Outlier (Row 1)
1 | 0 | 0 | 0 | 0 | Group 2 (Row 2)
0 | 1 | 0 | 0 | 0 | Group 3 (Row 3)
0 | 0 | 1 | 0 | 0 | Group 4 (Row 4)
0 | 0 | 0 | 1 | 0 | Group 5 (Row 5)
0 | 0 | 0 | 0 | 1 | Group 6 (Row 6)
0 | 0 | 0 | 0 | 0 | Group 1 (Rows 7–11)
1 | 0 | 0 | 0 | 0 | Group 2 (Row 12)
0 | 1 | 0 | 0 | 0 | Group 3 (Row 13)
0 | 0 | 1 | 0 | 0 | Group 4 (Row 14)
0 | 0 | 0 | 1 | 0 | Group 5 (Row 15)
0 | 0 | 0 | 0 | 1 | Group 6 (Row 16)
0 | 0 | 0 | 0 | 0 | Group 1 (Rows 17–21)
… (pattern continues until row 101)
Table 11: Matrix structure for the simulation with outliers, N=101. The outlier is supposed to belong to Group 2 but has its value contaminated (multiplied by (N-1)^{1/3}).
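For completeness, a sketch of how the instrument matrix in Table 11 can be generated (my own reconstruction from the description: the first observation is a contaminated Group 2 member, and the remaining rows cycle through the five dummy groups followed by five Group 1 rows):

import numpy as np

def outlier_design(N=101):
    """Instrument matrix of Table 11: five group dummies, first row contaminated."""
    assert (N - 1) % 10 == 0                  # rows 2..N cycle with period 10
    Z = np.zeros((N, 5))
    for i in range(1, N):
        g = i % 10                            # positions 1..5 of each cycle carry a dummy
        if 1 <= g <= 5:
            Z[i, g - 1] = 1.0
    Z[0, 0] = (N - 1) ** (1 / 3)              # the Group-2 outlier, scaled by (N-1)^{1/3}
    return Z

Z = outlier_design(101)
print(Z[:7])                                  # reproduces the first seven rows of Table 11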

References

  • Ackerberg and Devereux (2009) Daniel A Ackerberg and Paul J Devereux. Improved JIVE estimators for overidentified linear models with and without heteroskedasticity. The Review of Economics and Statistics, 91(2):351–362, 2009.
  • Anderson and Sawa (1982) Theodore Wilbur Anderson and Takamitsu Sawa. Exact and approximate distributions of the maximum likelihood estimator of a slope coefficient. Journal of the Royal Statistical Society Series B: Statistical Methodology, 44(1):52–62, 1982.
  • Angrist and Krueger (1991) Joshua D Angrist and Alan B Krueger. Does compulsory school attendance affect schooling and earnings? The Quarterly Journal of Economics, 106(4):979–1014, 1991.
  • Angrist et al. (1999) Joshua D Angrist, Guido W Imbens, and Alan B Krueger. Jackknife instrumental variables estimation. Journal of Applied Econometrics, 14(1):57–67, 1999.
  • Bedard and Deschênes (2006) Kelly Bedard and Olivier Deschênes. The long-term impact of military service on health: Evidence from World War II and Korean War veterans. American Economic Review, 96(1):176–194, 2006.
  • Bekker (1994) Paul A Bekker. Alternative approximations to the distributions of instrumental variable estimators. Econometrica: Journal of the Econometric Society, pages 657–681, 1994.
  • Bekker and Crudu (2015) Paul A Bekker and Federico Crudu. Jackknife instrumental variable estimation with heteroskedasticity. Journal of Econometrics, 185(2):332–342, 2015.
  • Buse (1992) Adolf Buse. The bias of instrumental variable estimators. Econometrica: Journal of the Econometric Society, pages 173–180, 1992.
  • Chao et al. (2012) John C Chao, Norman R Swanson, Jerry A Hausman, Whitney K Newey, and Tiemen Woutersen. Asymptotic distribution of JIVE in a heteroskedastic IV regression with many instruments. Econometric Theory, 28(1):42–86, 2012.
  • Davidson and MacKinnon (2007) Russell Davidson and James G MacKinnon. Moments of IV and JIVE estimators. The Econometrics Journal, 10(3):541–553, 2007.
  • Frandsen et al. (2023) Brigham Frandsen, Lars Lefgren, and Emily Leslie. Judging Judge Fixed Effects. American Economic Review, 113(1):253–77, 2023.
  • Fuller (1977) Wayne A Fuller. Some properties of a modification of the limited information estimator. Econometrica: Journal of the Econometric Society, pages 939–953, 1977.
  • Hansen and Kozbur (2014) Christian Hansen and Damian Kozbur. Instrumental variables estimation with many weak instruments using regularized JIVE. Journal of Econometrics, 182(2):290–308, 2014.
  • Harding et al. (2016) Matthew Harding, Jerry Hausman, and Christopher J Palmer. Finite sample bias corrected IV estimation for weak and many instruments. In Essays in honor of Aman Ullah, pages 245–273. Emerald Group Publishing Limited, 2016.
  • Hausman et al. (2012) Jerry A Hausman, Whitney K Newey, Tiemen Woutersen, John C Chao, and Norman R Swanson. Instrumental variable estimation with heteroskedasticity and many instruments. Quantitative Economics, 3(2):211–255, 2012.
  • Kunitomo (2012) Naoto Kunitomo. An optimal modification of the LIML estimation for many instruments and persistent heteroscedasticity. Annals of the Institute of Statistical Mathematics, 64:881–910, 2012.
  • Mogstad et al. (2021) Magne Mogstad, Alexander Torgovitsky, and Christopher R Walters. The causal interpretation of two-stage least squares with multiple instrumental variables. American Economic Review, 111(11):3663–98, 2021.
  • Nagar (1959) Anirudh L Nagar. The bias and moment matrix of the general k-class estimators of the parameters in simultaneous equations. Econometrica: Journal of the Econometric Society, pages 575–595, 1959.