Estimating overidentified linear models with heteroskedasticity and outliers
Abstract
A large degree of overidentification causes severe bias in TSLS. A conventional heuristic rule used to motivate new estimators in this context is approximate bias. This paper formalizes the definition of approximate bias and expands its applicability to various classes of estimators that bridge OLS, TSLS, and Jackknife IV estimators (JIVEs). By evaluating their approximate biases, I propose new approximately unbiased estimators, including UOJIVE1 and UOJIVE2. UOJIVE1 can be interpreted as a generalization of the existing estimator UIJIVE1. Both UOJIVEs are proven to be consistent and asymptotically normal under a fixed number of instruments and controls. The asymptotic proofs for UOJIVE1 in this paper require the absence of high leverage points, whereas the proofs for UOJIVE2 do not. In addition, UOJIVE2 is consistent under many-instrument asymptotics. The simulation results align with the theorems in this paper: (i) both UOJIVEs perform well under many-instrument scenarios with or without heteroskedasticity, and (ii) when a high leverage point coincides with a high variance of the error term, an outlier is generated and the performance of UOJIVE1 is much poorer than that of UOJIVE2.
Keywords: Approximate bias, instrumental variables (IV), overidentification, k-class estimator, Jackknife IV estimator (JIVE), many-instrument asymptotics, robustness to outliers
1 Introduction
Overidentified two-stage least squares (TSLS) regressions are commonplace in economics research. Mogstad et al. (2021) document that from January 2000 to October 2018, 57 papers in the American Economic Review, the Quarterly Journal of Economics, the Journal of Political Economy, Econometrica, and the Review of Economic Studies adopted overidentified TSLS. Unfortunately, overidentification introduces a bias problem for TSLS. The intuition for the relationship between bias and the number of IVs can be illustrated with a pathological example in which a researcher adds so many instruments to the first-stage regression that the number of first-stage regressors (which include both the IVs and the exogenous control variables) equals the number of observations. The first-stage regression then fits perfectly, and its fitted values are exactly the observed values of the endogenous variables. In this pathological example, TSLS coincides with OLS, and so TSLS is exactly as biased as OLS.
To reduce or even completely resolve the problem, one may want to evaluate the bias of TSLS. However, evaluating the exact bias of estimators under endogeneity requires knowledge of the distribution of the observable and unobservable variables. For example, Anderson and Sawa (1982) and Harding et al. (2016) assume that the error terms of the simultaneous equations are jointly normal, an assumption that allows them to evaluate the finite sample distribution of TSLS and, therefore, its bias. Such assumptions, however, can be too strong for economists' tastes. Therefore, many econometricians resort to a different evaluation criterion called "approximate bias". Many past works on IV estimators have evaluated the approximate bias of TSLS and used it to motivate new estimators (Nagar, 1959; Fuller, 1977; Buse, 1992; Angrist et al., 1999; Ackerberg and Devereux, 2009). The idea behind approximate bias is to divide the difference between an estimator and the target parameter into two parts. One part is of a higher stochastic order than the other and is dropped from the subsequent calculation. The expectation of the lower stochastic order part is called the approximate bias.
This paper formalizes the definition of approximate bias and shows why this definition is a reasonable heuristic rule for motivating new estimators. The definition of approximate bias proposed by Nagar (1959) applies only to the k-class estimators. On the other hand, the definition proposed in the working paper version of Angrist et al. (1999) (hereafter AIK 1995) and used in Ackerberg and Devereux (2009) (hereafter AD 2009) applies to a large class of estimators that includes OLS, TSLS, all k-class estimators, JIVE1, IJIVE1, and UIJIVE1 (IJIVE1 and UIJIVE1 are originally termed IJIVE and UIJIVE by Ackerberg and Devereux (2009); I append the number 1 to their names for comparison purposes that will become clear in later sections of this paper). I show that the definition of approximate bias in AIK 1995 and AD 2009 is valid for an even larger class of estimators, which additionally includes Fuller (1977), JIVE2, the estimators of Hausman et al. (2012), and many other classes of estimators that bridge OLS, TSLS, and the JIVEs.
By expanding the applicability of approximate bias to additional classes of estimators, I propose new estimators that are approximately unbiased. This paper focuses on two of the new estimators, named UOJIVE1 and UOJIVE2. UOJIVE1 can be interpreted as a generalization of UIJIVE1. This paper shows that both UOJIVE1 and UOJIVE2 are consistent and asymptotically normal under a fixed number of instruments as the sample size goes to infinity. However, UOJIVE1's asymptotic proofs rely on an assumption that rules out high leverage points; UOJIVE2 does not require such an assumption. In addition, Hausman et al. (2012)'s Theorem 1 directly applies to UOJIVE2, which proves that UOJIVE2 is consistent even under many-instrument asymptotics. Simulation results align with the theoretical results. UOJIVE1 and UOJIVE2 perform similarly well under a large degree of overidentification and heteroskedasticity, but UOJIVE2's performance is much more stable than UOJIVE1's when an outlier (a high leverage point coinciding with a large variance of the error term) is deliberately introduced into the DGP.
The paper is organized as follows. Section 2 describes the problem setup and the existing estimators in the approximate bias literature. Section 3 defines approximate bias for a large class of estimators and explains the importance of this formal definition. Sections 4 and 5 propose new estimators that are approximately unbiased. Section 6 shows that (i) under a fixed number of instruments, UOJIVE1 and UOJIVE2 are consistent and asymptotically normal, and (ii) under many-instrument asymptotics, UOJIVE2 is consistent. Sections 7 and 8 demonstrate the superior performance of UOJIVE2 in simulation and empirical studies. Section 9 concludes the paper.
2 Model setup and existing estimators
2.1 Overidentified linear model setup
The model concerns a typical endogeneity problem:
y_i = X_i β + W_i γ + ε_i,   (1)
X_i = Z_i Π + W_i Φ + η_i.   (2)
Eq.(1) contains the parameter of interest β. y_i is the response/outcome/dependent variable, and X_i is an L-dimensional row vector of endogenous covariates/explanatory/independent variables. The exogenous control variables, denoted W_i, form a G-dimensional row vector. Eq.(2) relates all the endogenous explanatory variables to the instrumental variables Z_i and the included exogenous control variables from Eq.(1). Z_i is a K-dimensional row vector, where K ≥ L. Since this paper focuses on overidentified cases, I will assume K > L throughout the rest of the paper. As in Hausman et al. (2012) and Bekker and Crudu (2015), this paper treats Z as nonrandom (alternatively, Z can be assumed to be random but conditioned on, as in Chao et al. (2012)). X_i is endogenous in the sense that E[X_i′ε_i] ≠ 0. I further assume that the pairs (ε_i, η_i) are independently and identically distributed with mean zero and covariance matrix Σ, whose blocks are the scalar σ_ε², the L-dimensional vector σ_εη, and the L × L matrix Σ_η. I also impose the relevance condition that Π has full column rank. In matrix notation, I write Eqs.(1) and (2) as:
y = Xβ + Wγ + ε,   (3)
X = ZΠ + WΦ + η,   (4)
where y and ε are N × 1 column vectors; X and η are N × L matrices; Z is an N × K matrix and W is an N × G matrix, where N is the number of observations. I also define the following notation for convenience: X̄ = [X, W], Z̄ = [Z, W], δ = (β′, γ′)′, and Γ = (Π′, Φ′)′.
Then, we have the following equivalent expressions for Eqs. (3) and (4):
y = X̄δ + ε,   (5)
X = Z̄Γ + η.   (6)
It is also useful to define the following partialled-out versions of the variables: ỹ = M_W y, X̃ = M_W X, and Z̃ = M_W Z, where M_W = I − W(W′W)^{−1}W′.
2.2 Existing estimators
IV estimators are often used to solve this simultaneous-equations problem. I tabulate in Table 1 some of the existing IV estimators this paper repeatedly refers to. They all have the matrix expression β̂ = (X̄′C′X̄)^{−1}X̄′C′y for some N × N matrix C (with the partialled-out variables X̃ and ỹ in place of X̄ and y for IJIVE1 and UIJIVE1). The caption of Table 1 summarizes how these estimators are linked to each other.
Estimators | C
---|---
OLS | I
TSLS | P = Z̄(Z̄′Z̄)^{−1}Z̄′
k-class | (1 − k)I + kP
JIVE2 | P − D_P
JIVE1 | (I − D_P)^{−1}(P − D_P)
IJIVE1 | (I − D_P̃)^{−1}(P̃ − D_P̃)
UIJIVE1 | (I − D_P̃ + ωI)^{−1}(P̃ − D_P̃ + ωI), ω = (L + 1)/N

Table 1: Existing IV estimators and their C matrices. D_M denotes the diagonal matrix formed from the diagonal of M, and P̃ = Z̃(Z̃′Z̃)^{−1}Z̃′. OLS and TSLS are the k = 0 and k = 1 members of the k-class; JIVE1 replaces TSLS's first stage with a jackknife; IJIVE1 applies JIVE1 to the partialled-out variables; UIJIVE1 adds ω to the diagonal of IJIVE1's C matrix.
3 Approximate bias
An IV estimator is often employed to estimate β (or δ) in the model of Section 2. The most commonly used IV estimator is TSLS, which has a bias problem when the degree of overidentification is large. Unfortunately, completely removing the bias of overidentified TSLS is generally infeasible unless economists are willing to impose parametric distributional assumptions. Therefore, econometricians often resort to a concept called approximate bias. For example, JIVE1, JIVE2, IJIVE1, and UIJIVE1 from Table 1 are all motivated by reducing approximate bias. The intuition behind the idea is to divide the difference between an estimator β̂ and the true parameter β into two parts. One part is of a higher stochastic order than the other and is therefore dropped from the subsequent calculation. The other part, of lower stochastic order, has an easy-to-evaluate expectation, which is called the approximate bias.
3.1 Definition of approximate bias
The following definition of approximate bias has been used by AIK 1995 and AD 2009 to motivate their development of new estimators, whose C matrices satisfy the property CZ̄ = Z̄ (and hence CZ̄Γ = Z̄Γ):
Definition 1.
The approximate bias of an IV estimator β̂ is E[Δ], where

β̂ − β = Δ + r,

in which Δ collects the terms of lower stochastic order in the expansion of β̂ − β, and r is the remainder of higher stochastic order, which is dropped from the calculation.
For readers' convenience, I justify in Appendix A why this definition can be used as a reasonable heuristic rule for motivating new estimators. Note that both AIK 1995 and AD 2009 assume that CZ̄ = Z̄, which restricts the application of Definition 1. The condition is satisfied by OLS, TSLS, all k-class estimators, JIVE1, IJIVE1, and UIJIVE1, but not by many other estimators, such as JIVE2, HLIM and HFUL from Hausman et al. (2012), and the Λ- and Ω-class estimators that I introduce in later sections of the paper. In Appendix B, I show that the condition is not necessary: Definition 1 remains a reasonable heuristic rule for the aforementioned estimators that do not satisfy CZ̄ = Z̄.
3.2 Approximate bias of existing estimators
Definition 1 yields Corollary 1 and Definitions 2 and 3. I will use them to analyze the existing IV estimators in Table 1.
Corollary 1.
The approximate bias of an IV estimator β̂ = (X̄′C′X̄)^{−1}X̄′C′y is proportional to

tr(C) − L − G̃ − 1,

where G̃ is the number of columns of the control matrix used in constructing C. (I write G̃ instead of G because for most of the estimators in Table 1, G̃ = G; for IJIVE1 and UIJIVE1, which partial out the controls first, G̃ = 0.)
Definition 2.
An estimator is said to be approximately unbiased if tr(C) − L − G̃ − 1 = 0.
Definition 3.
The approximate bias of an estimator is said to be asymptotically vanishing if tr(C) − L − G̃ − 1 → 0 as N → ∞.
I compute the approximate bias by evaluating the term tr(C) − L − G̃ − 1 for the following estimators. A larger magnitude of this term means a larger approximate bias; ideally it equals 0, so that we can claim the estimator to be approximately unbiased.
The approximate bias computation for OLS, TSLS, and k-class estimators is straightforward: OLS's approximate bias is proportional to N − L − G − 1, TSLS's approximate bias is proportional to K − L − 1, and the k-class estimator's approximate bias is proportional to (1 − k)N + k(K + G) − L − G − 1, where k is a user-defined parameter. Setting this expression to zero gives us an approximately unbiased estimator. I call this estimator AUK (approximately unbiased k-class estimator).
As explained earlier, TSLS's approximate bias is proportional to K − L − 1, which is large when the degree of overidentification is large. AIK 1995 propose jackknife versions of TSLS to reduce the approximate bias of TSLS when the degree of overidentification is large. The authors call the proposed estimators JIVE1 and JIVE2 (Jackknife IV Estimators). Evaluating the approximate bias of JIVE2 is straightforward: tr(C_JIVE2) = 0, so the approximate bias is proportional to −(L + G + 1). Similarly, tr(C_JIVE1) = 0, so we obtain that the approximate bias of JIVE1 is also proportional to −(L + G + 1).
AD 2009 reduce the approximate bias of the two JIVEs by partialling out W from y, X, and Z. The approximate bias of IJIVE1 is then proportional to −(L + 1). Note that |−(L + 1)| < |−(L + G + 1)|, so IJIVE1 potentially has a smaller approximate bias than the JIVEs. (The comparison requires knowledge of the proportionality factor; however, since approximate bias is used only as a heuristic rule to motivate new estimators, evaluating the trace term is arguably sufficient for motivating purposes.) AD 2009 further reduce the approximate bias by adding a constant ω = (L + 1)/N to the diagonal of the C matrix of IJIVE1. The new estimator is called UIJIVE1.
To evaluate the approximate bias of UIJIVE1, I make the following assumption about the leverage values P_ii of the projection matrix P:
Assumption (BA).
max_i P_ii is bounded away from 1 for large enough N. Equivalently, there exists h̄ < 1 such that P_ii ≤ h̄ for large enough N and for all i.
I state the assumption in terms of P because I will repeatedly use it for P in Section 6. Here, to compute the approximate bias of UIJIVE1, I invoke Assumption BA for P̃ instead of P. Under Assumption BA for P̃, UIJIVE1's approximate bias is proportional to tr(C_UIJIVE1) − L − 1.
Therefore, UIJIVE1's approximate bias is asymptotically vanishing; see the proof in Appendix C.
4 From UIJIVE1 to UIJIVE2
This section interprets the relationships among existing estimators (in particular, JIVE1, IJIVE1, UIJIVE1, and OLS), which sheds light on how a new approximately unbiased estimator, UIJIVE2, is developed.
4.1 The ω-class estimators and UIJIVE1
I define the ω-class estimators, a class that contains all of the following four estimators: JIVE1, IJIVE1, UIJIVE1, and OLS. Since all of them can be expressed as β̂ = (X̄′C′X̄)^{−1}X̄′C′y, I define the C matrix for the ω-class estimators:

C_ω = (I − D_P + ωI)^{−1}(P − D_P + ωI).

When ω = 0, it corresponds to JIVE1; as ω → ∞, it corresponds to OLS. On the other hand, IJIVE1 is a special case of JIVE1 in which Z̄ is replaced with Z̃, and hence it belongs to the ω-class. UIJIVE1 uses Z̃ and sets ω = (L + 1)/N. As stated in Section 3.2, this choice of ω outputs UIJIVE1, whose approximate bias is asymptotically vanishing. The information on these four estimators is summarized in Table 2a.
The development from TSLS to JIVE1 to IJIVE1 to UIJIVE1 is depicted in the upper half of Figure 1. TSLS to JIVE1: JIVE1 is a jackknife version of TSLS in which the first-stage OLS of TSLS is replaced with a jackknife procedure for out-of-sample prediction. Calling the matrix that produces the first-stage fitted values C, the second-stage IV estimation is the same for TSLS and JIVE1: β̂ = ((CX̄)′X̄)^{−1}(CX̄)′y. JIVE1 to IJIVE1: IJIVE1 is a special case of JIVE1; JIVE1 takes Z̄ as input, while IJIVE1 takes Z̃ (the partialled-out version of Z̄) as input. IJIVE1 to UIJIVE1: UIJIVE1 adds a constant to the diagonal of the C matrix of IJIVE1. We can interpret this addition of a constant as bridging IJIVE1 and OLS, since OLS has its C matrix equal to I. A numerical check of the jackknife interpretation is sketched below.
Estimator | Z̄ or Z̃ | ω
---|---|---
JIVE1 | Z̄ | 0
IJIVE1 | Z̃ | 0
UIJIVE1 | Z̃ | (L + 1)/N
OLS | Both | ω → ∞

(a) The ω-class estimators
Estimator | Z̄ or Z̃ | Ω
---|---|---
JIVE2 | Z̄ | 0
IJIVE2 | Z̃ | 0
UIJIVE2 | Z̃ | (L + 1)/N
OLS | Both | Ω → ∞

(b) The Ω-class estimators
4.2 The Ω-class estimators and UIJIVE2
[Figure 1: The development from TSLS through JIVE1 and IJIVE1 to UIJIVE1 (upper half), and the parallel development through JIVE2 and IJIVE2 to UIJIVE2 (lower half).]
I define the Ω-class of estimators, which contains JIVE2 and OLS. The C matrix of the Ω-class estimators is

C_Ω = P − D_P + ΩI.

The only difference between the C matrices of the ω- and Ω-class estimators is the exclusion/inclusion of the factor (I − D_P + ωI)^{−1}, which is a rowwise division operation (i.e., dividing the ith row of P − D_P + ωI by 1 − P_ii + ω).

Ω = 0 corresponds to JIVE2, and Ω → ∞ corresponds to OLS. By Corollary 1, the approximate bias of the Ω-class estimator that takes Z̃ as input is proportional to NΩ − L − 1. If Ω = 0 and the input is Z̃, I obtain a new estimator, IJIVE2, corresponding to IJIVE1 among the ω-class estimators. Selecting Ω = (L + 1)/N, I obtain the approximately unbiased Ω-class estimator and name it UIJIVE2. Its closed-form expression is

β̂_UIJIVE2 = (X̃′C′X̃)^{−1}X̃′C′ỹ, with C = P̃ − D_P̃ + ((L + 1)/N) I,

where the transpose on C is not necessary, as the matrix is symmetric, but I keep it for coherent notation.
The information on OLS, JIVE2, IJIVE2, and UIJIVE2 is summarized in Table 2b. I also depict the parallel relationship between the ω-class and Ω-class estimators in the lower half of Figure 1. The development from OLS to JIVE2 to IJIVE2 to UIJIVE2 is analogous to the development of the ω-class estimators. Readers can also read Figure 1 from top to bottom: each estimator in the bottom half removes the rowwise division operation of the estimator above it. The removal of the rowwise division is important to the comparison between the ω- and Ω-class estimators: the division operation is highly unstable when P_ii (or P̃_ii) is close to 1, leading to undesirable asymptotic properties for the ω-class estimators but not for the Ω-class estimators. This point will be explained formally in Section 6.
Name | Consistency | Approximately unbiased | Many-instrument consistency (homoskedasticity) | Many-instrument consistency (heteroskedasticity)
---|---|---|---|---
OLS | ||||
TSLS | ✓ | |||
Nagar | ✓ | ✓∗ | ||
JIVE1 | ✓ | ✓ | ||
JIVE2 | ✓ | ✓ | ||
UIJIVE1 | ✓ | ✓ | ✓ | ✓ |
UIJIVE2 | ✓ | ✓ | ✓ | ✓ |
5 From UIJIVE to UOJIVE
Figure 1 shows that the UIJIVEs can be interpreted as approximately unbiased estimators selected from classes of estimators that bridge the IJIVEs and OLS. I apply the same thought process to other classes of estimators that bridge OLS, TSLS, and the JIVEs to obtain new approximately unbiased estimators. Table 4 summarizes these five classes of estimators (the k-, λ-, Λ-, ω-, and Ω-classes) and the corresponding approximately unbiased estimators. See Appendix D for details on these estimators. The relationships between the five classes of estimators are illustrated in Figure 2.
Class | Estimator | C | Parameter
---|---|---|---
k-class | AUK | (1 − k)I + kP | k = (N − L − G − 1)/(N − K − G)
λ-class | TSJI1 | (I − λD_P)^{−1}(P − λD_P) | λ = 1 − (L + G + 1)/(K + G)
Λ-class | TSJI2 | P − ΛD_P | Λ = 1 − (L + G + 1)/(K + G)
ω-class | UOJIVE1 | (I − D_P + ωI)^{−1}(P − D_P + ωI) | ω = (L + G + 1)/N
Ω-class | UOJIVE2 | P − D_P + ΩI | Ω = (L + G + 1)/N
Note that UIJIVE1 and UIJIVE2 can be considered special cases of UOJIVE1 and UOJIVE2. UIJIVE can be understood as the following two steps:

1. Partial out W from y, X, and Z: ỹ = M_W y, X̃ = M_W X, Z̃ = M_W Z.
2. Set ω (or Ω) equal to (L + 1)/N and compute the estimate as an ω-class (or Ω-class) estimator using ỹ, X̃, and Z̃.

The second step is exactly the UOJIVE for the following model:

ỹ = X̃β + ε̃,  X̃ = Z̃Π + η̃,

which has no control variables, so G = 0 and the UOJIVE parameter (L + G + 1)/N reduces to (L + 1)/N.
For the rest of the paper, I will show the asymptotic properties of UOJIVE1 and UOJIVE2 and demonstrate that UOJIVE2 is more robust to outliers than UOJIVE1. The theoretical results can be easily generalized to UIJIVE1, UIJIVE2, TSJI1, and TSJI2.
[Figure 2: The relationships between the five classes of estimators (k-, λ-, Λ-, ω-, and Ω-class) and the estimators they contain.]
6 Asymptotic properties of UOJIVE1 and UOJIVE2
6.1 Consistency of the UOJIVEs under fixed K
Under fixed K and G, I show that both UOJIVE1 and UOJIVE2 are consistent as N → ∞, with some moment-existence assumptions imposed on the observable and unobservable variables. It is worth mentioning that Assumption BA is important to the consistency of UOJIVE1 but not to that of UOJIVE2, suggesting that UOJIVE2 is robust to the presence of high leverage points while UOJIVE1 is not. Throughout this subsection, I make the following assumptions.
Assumption 1.
Standard regularity conditions hold for Z̄: Z̄′Z̄/N converges to a finite, positive definite matrix.
Assumption 2.
E[|ε_i|^{2+δ}] and E[‖X̄_i‖^{2+δ}] are finite for some δ > 0. The existence of these two expectations jointly implies (by the Cauchy–Schwarz inequality) that E[‖X̄_i′ε_i‖^{1+δ/2}] is finite.
Assumption 3.
E[‖η_i‖^{2+δ}] and E[‖X̄_i‖^{2+δ}] are finite for some δ > 0. The existence of these two expectations jointly implies that E[‖X̄_i′η_i‖^{1+δ/2}] is finite.
Write D_P = diag(P) and ω = (L + G + 1)/N. Recall that UOJIVE1's matrix expression is β̂_UOJIVE1 = (X̄′C′X̄)^{−1}X̄′C′y with C = (I − D_P + ωI)^{−1}(P − D_P + ωI).
The proofs for Lemmas 6.1 and 6.2 are collected in Appendix E. The importance of Assumption BA stems from the presence of the terms 1 − P_ii + ω in the denominators. UOJIVE2 does not have the same problem, since 1 − P_ii does not appear in its analytical form.
Now we turn to analyzing the consistency of UOJIVE2. The steps for proving its consistency are similar to those for UOJIVE1.
6.2 Asymptotic variance of the UOJIVEs under fixed K
Assumption 4.
E[|ε_i|^{4+δ}] and E[‖X̄_i‖^{4+δ}] are finite for some δ > 0. The existence of these two expectations jointly implies that E[‖X̄_i′ε_i‖^{2+δ/2}] is finite.
The proof for Lemma 6.5 can be found in Appendix E. Combining Lemma 6.5 with Lemma 6.1 yields the asymptotic variance of the UOJIVEs.
6.3 Many-instrument consistency of UOJIVE2
The many-instrument asymptotics framework lets both K and N go to infinity with the ratio K/N converging to a constant α, where 0 ≤ α < 1. The motivation behind the many-instrument framework is to provide a better approximation of situations where the number of instruments is large relative to the sample size and first-stage overfitting is a concern. Many papers have studied the many-instrument setup (Bekker, 1994; Kunitomo, 2012; Chao et al., 2012; Hausman et al., 2012). Theorem 1 from Hausman et al. (2012) directly applies to UOJIVE2.
Theorem 6.5.
Under Assumptions 1–4 specified in Hausman et al. (2012), β̂_UOJIVE2 converges in probability to β as N, K → ∞ with K/N → α.
7 Simulation
I run three types of simulations to contrast the performances of OLS, TSLS, the JIVEs, and the UIJIVEs (UOJIVEs). They are designed to test many-instrument asymptotics under homoskedasticity, consistency under many instruments and heteroskedasticity, and robustness to outliers. Each simulation type has two or four setups, and each setup consists of 1000 rounds. All simulations have a single endogenous variable X (L = 1), and β is set to 0.3. The intercepts in both stages are set to 0 throughout all simulations, though a column of ones is still included in the regressions, either partialled out as part of W for the UIJIVEs or included as part of W for the rest of the estimators.
7.1 Simulation for many-instrument asymptotic under homoskedasticity
The simulation setup for many-instrument asymptotics is summarized in Table 5. The parameters for all instruments (π) and for all controls (φ) are set to constants, so I report one value for each parameter instead of a vector. μ² is the concentration parameter; I set it to be around 150, following Hansen and Kozbur (2014), to maintain a reasonable instrumental variable strength. Z follows a standard multivariate normal distribution (I follow Bekker and Crudu (2015), who also assume nonrandom Z when analyzing the theoretical properties of IV estimators but let Z follow a random distribution in their simulation study). The error terms ε and η are bivariate normal with mean zero and covariance matrix Σ.
Setup | N | K | G | β | σ² | π | φ | μ²
---|---|---|---|---|---|---|---|---
1 | 500 | 50 | 10 | 0.3 | 1 | 0.08 | 0.05 | 140.5 |
2 | 2000 | 200 | 40 | 0.3 | 1 | 0.02 | 0.02 | 160 |
I report the simulation results in Table 6. It is clear that OLS, TSLS, and the JIVEs suffer from a larger bias than the approximately unbiased estimators. In addition, as I increase the degree of overidentification and the number of control variables, the biases of TSLS and the JIVEs deteriorate. The sharp worsening of the JIVEs' bias is likely exacerbated by their instability, as pointed out by Davidson and MacKinnon (2007). The simulation results align with Corollary 1, which implies that TSLS's approximate bias is proportional to the degree of overidentification while the JIVEs' approximate biases grow with the number of control variables. All approximately unbiased estimators perform well under both setups. For example, UOJIVE1 and UOJIVE2 show virtually no bias and have a relatively small variance compared to the JIVEs.
Estimator | Setup 1 Bias | Setup 1 Variance | Setup 1 MSE | Setup 2 Bias | Setup 2 Variance | Setup 2 MSE
---|---|---|---|---|---|---
OLS | 0.475 | 0.001 | 0.226 | 0.564 | 0.000 | 0.318 |
TSLS | 0.143 | 0.004 | 0.024 | 0.337 | 0.002 | 0.116 |
Nagar | 0.013 | 0.009 | 0.009 | 0.055 | 0.016 | 0.019 |
AUK | 0.007 | 0.011 | 0.011 | 0.023 | 0.027 | 0.027 |
JIVE1 | 0.069 | 0.016 | 0.021 | 0.421 | 1.009 | 1.185 |
JIVE2 | 0.069 | 0.016 | 0.021 | 0.420 | 1.093 | 1.268 |
TSJI1 | 0.003 | 0.010 | 0.010 | 0.002 | 0.022 | 0.022 |
TSJI2 | 0.003 | 0.010 | 0.010 | 0.002 | 0.022 | 0.022 |
UIJIVE1 | 0.003 | 0.010 | 0.010 | 0.002 | 0.023 | 0.023 |
UIJIVE2 | 0.003 | 0.010 | 0.010 | 0.002 | 0.023 | 0.023 |
UOJIVE1 | 0.003 | 0.010 | 0.010 | 0.002 | 0.023 | 0.023 |
UOJIVE2 | 0.003 | 0.010 | 0.010 | 0.002 | 0.022 | 0.022 |
7.2 Many-instrument asymptotic under heteroskedasticity
Following the setup in Ackerberg and Devereux (2009), I test the performance of the estimators with heteroskedastic errors, setting Z to be a group-fixed-effect matrix without any control variables W. Since there is no W, the UIJIVEs and UOJIVEs are equivalent, and I only report the UOJIVEs' results.
The sample size is set to be 500. The first 115 observations belong to Group 1 and the next 115 observations to Group 2; these are the two big groups. The remaining 270 observations form 18 small groups of 15 observations each. Group 1 is excluded from Z and π, meaning that Group 1 on average has its X value 0.3 below the other groups, conditional on everything else being equal. There are two error covariance matrices, denoted Σ_a and Σ_b. In Setup 1, the big groups have covariance matrix Σ_a and the small groups have Σ_b; in Setup 2, I reverse the covariance matrices for the two types of groups.
I summarize the simulation results in Table 7. The results align with Ackerberg and Devereux (2009): the Nagar estimator is neither approximately unbiased nor consistent under heteroskedasticity with many instruments. The same can be said about the approximately unbiased k-class estimator AUK, which is proposed in Appendix D. In contrast, the performance of UOJIVE2 remains stellar, aligning with Theorem 6.5, which establishes the consistency of UOJIVE2 without assuming homoskedasticity.
Estimator | Setup 1 Bias | Setup 1 Variance | Setup 1 MSE | Setup 2 Bias | Setup 2 Variance | Setup 2 MSE
---|---|---|---|---|---|---
OLS | 0.232 | 0.002 | 0.056 | 0.141 | 0.002 | 0.022 |
TSLS | 0.286 | 0.028 | 0.109 | 0.135 | 0.025 | 0.043 |
Nagar | 0.258 | 6.927 | 6.987 | 0.364 | 0.274 | 0.406 |
AUK | 0.336 | 0.246 | 0.358 | 0.409 | 1.087 | 1.253 |
JIVE1 | 0.311 | 119.913 | 119.890 | 0.047 | 0.116 | 0.118 |
JIVE2 | 0.036 | 0.392 | 0.393 | 0.047 | 0.113 | 0.115 |
TSJI1 | 0.054 | 0.127 | 0.130 | 0.072 | 0.073 | 0.078 |
TSJI2 | 0.075 | 0.234 | 0.239 | 0.072 | 0.069 | 0.074 |
UOJIVE1 | 0.011 | 0.088 | 0.088 | 0.023 | 0.064 | 0.065 |
UOJIVE2 | 0.019 | 0.095 | 0.096 | 0.024 | 0.061 | 0.062 |
7.3 Simulation with outlier
In my proofs of the consistency and asymptotic normality of UOJIVE1, Assumption BA is repeatedly invoked to bound the denominators away from zero. With high leverage points, the asymptotic proofs for UOJIVE1 are no longer valid. In contrast, UOJIVE2's asymptotic results do not require Assumption BA. In the following simulation setups, I deliberately introduce high leverage points into the DGP; when these high leverage points coincide with large variances of ε, the performance of UOJIVE1 is much worse than that of UOJIVE2.
There are 5 instruments Z, one endogenous variable X, and no controls W. Again, the UIJIVEs and UOJIVEs are equivalent due to the absence of W, so I only report the UOJIVEs. I consider four sample sizes, each equal to 1 plus a square number, which makes the following design easy to set up. The first observation is the high leverage point, with its first entry scaled up with the sample size. Among the remaining observations, every ten observations have their first five rows equal to the 5 × 5 identity matrix and their remaining rows all zeros. This setup is equivalent to the group-fixed-effect setup of the heteroskedasticity simulation, with five small groups (indexed 2–6) and one large group (Group 1), except that the first row, which would otherwise belong to Group 2, is contaminated by the scaling of its entry. I include Table 11 in Appendix F to illustrate this simulation setup. The error terms ε and η are bivariate normal with mean zero and covariance matrix Σ, except that the error terms of the first observation (the high leverage point) are scaled up as well. The coincidence of high leverage and a large variance of ε generates an outlier: intuitively, the first observation has a high influence on its own fitted value and a large probability of deviating far from the regression line. π is set to be one.
The results for the simulations with outliers are summarized in Table 8. Throughout the four simulations, UOJIVE2 substantially outperforms UOJIVE1, and TSJI2 substantially outperforms TSJI1. The simulation results point to the importance of Assumption BA for the consistency of UOJIVE1.
[Table 8: Performance of the estimators across the four outlier simulations.]
Moreover, the consistency of UOJIVE2 appears to be preserved under this simulation setup even though the regularity conditions of Section 6 do not hold here. As shown in Figure 3, the empirical distribution of the 1000 UOJIVE2 estimates concentrates around β more and more as the sample size increases, whereas that of UOJIVE1 changes little across the four simulations. It would be an interesting direction for future research to show that UOJIVE2 is consistent under such a weak-instrument setup whereas UOJIVE1 is not.
[Figure 3: Empirical distributions of the UOJIVE1 and UOJIVE2 estimates for each of the four sample sizes in the outlier simulations.]
8 Empirical Studies
A multitude of social science studies use a large number of instruments. One example is the judge leniency IV design, in which researchers use the identity of the judge as an instrument; the number of instruments then equals the number of judges in the sample. The method has been applied in many settings (see Table 1 in Frandsen et al. (2023) for the immense popularity of the judge leniency design). In this section, I apply the approximately unbiased estimators to two classical empirical studies. I compute the standard errors by assuming homoskedasticity and treating the approximately unbiased estimators as just-identified IV estimators.
8.1 Quarter of birth
The quarter-of-birth example has been repeatedly cited in the many-instrument literature. Here I apply the TSJIs and UOJIVEs to the famous example in Angrist and Krueger (1991).
Many states in the US have a compulsory school attendance policy: students are mandated to stay in school until their 16th, 17th, or 18th birthday, depending on the state they are from. As such, students' quarters of birth may induce different school-quitting behaviors. This natural experiment makes quarter of birth a valid IV for estimating the marginal earnings brought by an additional school year for those affected by the compulsory attendance policy.
Angrist and Krueger (1991) interact quarter of birth with other dummy variables to generate a large number of IVs:

1. Quarter of birth × Year of birth
2. Quarter of birth × Year of birth × State of birth

where case 1 contains 30 instruments and case 2 contains 180 instruments. The results are reported in Table 9.
Estimator | Case 1 Estimate (%) | Case 1 SE (%) | Case 2 Estimate (%) | Case 2 SE (%)
---|---|---|---|---
TSLS | 8.91 | 1.61 | 9.28 | 0.93 |
Nagar | 9.35 | 1.80 | 10.88 | 1.20 |
AUK | 9.35 | 1.80 | 10.88 | 1.20 |
JIVE1 | 9.59 | 1.90 | 12.11 | 1.37 |
JIVE2 | 9.59 | 1.90 | 12.11 | 1.37 |
TSJI1 | 9.34 | 1.79 | 10.92 | 1.20 |
TSJI2 | 9.34 | 1.79 | 10.93 | 1.20 |
UIJIVE1 | 9.34 | 1.80 | 10.20 | 1.08 |
UIJIVE2 | 9.34 | 1.80 | 10.95 | 1.20 |
UOJIVE1 | 9.34 | 1.80 | 10.20 | 1.08 |
UOJIVE2 | 9.34 | 1.80 | 10.95 | 1.20 |
8.2 Veteran’s smoking behavior
Bedard and Deschênes (2006) use year of birth and its interaction with gender as instruments to estimate how much enlisting for WWII or the Korean War increases veterans' probability of smoking during the later part of their lives. The result can be interpreted as a LATE. The two cases are:

1. Birth year × gender
2. Birth year

where case 1 uses all data and case 2 uses only data for male veterans. The results are summarized in Table 10.
Estimator | Case 1 CPS60 Estimate | Case 1 CPS60 SE | Case 1 CPS90 Estimate | Case 1 CPS90 SE | Case 2 CPS60 Estimate | Case 2 CPS60 SE | Case 2 CPS90 Estimate | Case 2 CPS90 SE
---|---|---|---|---|---|---|---|---
TSLS | 27.63 | 3.50 | 34.58 | 2.38 | 23.69 | 13.90 | 30.14 | 3.15 |
Nagar | 27.82 | 3.51 | 34.68 | 2.39 | 32.42 | 17.61 | 30.43 | 3.18 |
AUK | 27.82 | 3.51 | 34.68 | 2.39 | 32.45 | 17.62 | 30.43 | 3.18 |
JIVE1 | 28.50 | 3.65 | 35.03 | 2.43 | -136.12 | 224.30 | 31.13 | 3.33 |
JIVE2 | 28.50 | 3.65 | 35.03 | 2.43 | -136.12 | 224.35 | 31.13 | 3.33 |
TSJI1 | 27.84 | 3.53 | 34.68 | 2.39 | 32.52 | 21.99 | 30.43 | 3.21 |
TSJI2 | 27.84 | 3.53 | 34.68 | 2.39 | 32.52 | 21.99 | 30.43 | 3.21 |
UIJIVE1 | 27.85 | 0.12 | 34.70 | 2.39 | 32.37 | 17.64 | 30.43 | 3.18 |
UIJIVE2 | 27.85 | 0.12 | 34.70 | 2.39 | 32.37 | 17.65 | 30.43 | 3.18 |
UOJIVE1 | 27.85 | 0.12 | 34.70 | 2.39 | 32.37 | 17.64 | 30.43 | 3.18 |
UOJIVE2 | 27.85 | 0.12 | 34.70 | 2.39 | 32.37 | 17.65 | 30.43 | 3.18 |
The results of all estimators are close except for Case 2 with CPS60. In that setup, the JIVEs' estimate, which is negative and counterintuitively large in magnitude (exceeding 100 percentage points), is driven by their instability. TSLS's estimate is about 10% below the estimates of the other approximately unbiased estimators, and compared to the other setups (Case 1 CPS60, Case 1 CPS90, Case 2 CPS90), the TSLS estimate for this case is the smallest. In contrast, the other approximately unbiased estimators deliver estimates that are close to one another across all four setups. Bedard and Deschênes (2006) claim that the four TSLS estimates "are almost identical"; Table 10 gives even stronger evidence for this claim, with still closer estimates from the other approximately unbiased estimators.
9 Conclusion
This paper formalizes the definition of approximate bias and extends its applicability to estimators that the approximate bias literature has not considered. The extension motivates new approximately unbiased estimators such as UOJIVE2. I show that UOJIVE2 is consistent under a fixed number of instruments and control variables as the sample size goes to infinity. The consistency and asymptotic normality results for UOJIVE2 do not require the absence of high leverage points, a condition that is necessary for the proofs that I construct to establish the consistency and asymptotic normality of UOJIVE1. The simulation study aligns with these theoretical results: when a high leverage point coincides with a high variance of the error term, an outlier is generated and the performance of UOJIVE1 is much poorer than that of UOJIVE2.
Appendix A Approximate bias for classes of estimators that have the property CZ̄ = Z̄
Recall that the approximate bias is the expectation of the lower-order part Δ of β̂ − β.
In this section, I provide the proof for Corollary 1 and, in the process, the derivation of the decomposition underlying Definition 1. Consider an IV estimator of the form β̂ = (X̄′C′X̄)^{−1}X̄′C′y whose C matrix satisfies CZ̄ = Z̄, and hence CZ̄Γ = Z̄Γ.
The last step is a geometric expansion of the inverted matrix, valid because the correction term is of lower stochastic order than the leading term. The first term's stochastic order is obvious; I evaluate the stochastic order for the ω-class estimator as an example. The proof for the λ-class is similar but easier, given Assumption BA. The proof for the k-class estimators is trivial.
I make Assumption BA for the leverage values of the projection matrix P; the assumption implies that for large enough N, P_ii ≤ h̄ < 1 for some fixed h̄ and for all i.
Lemma A.1.
Assume that , .
Proof.
because the CLT applies to the first term and the law of large numbers applies to the summation term when divided by N. ∎
Lemma A.2.
.
Proof.
For a fixed ,
The last inequality holds for large enough N. ∎
With Lemma A.2, it is easy to show the following: assuming the relevant moment is finite for some δ > 0, the remainder term is o_p(1). Combining this result with Lemma A.1 establishes the claimed stochastic order; the proof for the remaining term is similar.
Next, I substitute the expansion into the expression for β̂ − β.
The last equality holds because after cross-multiplying, we have six terms to evaluate:
term | stochastic order | keep or not |
---|---|---|
Yes | ||
Yes | ||
Yes | ||
No | ||
No | ||
No |
After dropping the last three terms, we obtain the following expression for the difference between the estimator and β:
(7) |
We evaluate the three terms in Eq.(7) separately.
A.1
The part of the expression with a zero expectation is exactly R1.
The last equality holds because CZ̄ = Z̄ and therefore CZ̄Γ = Z̄Γ. So,
Then, I evaluate the expectation of R2:
A.2
A.3
Though the last expression is not the same as Δ, this does not affect the definition of approximate bias, since we are only interested in the expectation of the last expression and that of Δ. As long as the last expression and Δ share the same expectation, Definition 1 remains valid. The following shows that the last expression has the same expectation.
Appendix B Approximate bias for classes of estimators that do not have the property CZ̄ = Z̄
This section shows that Definition 1 applies to the Λ-class and Ω-class estimators. Once the validity of the definition is established, it is trivial to show that Corollary 1 also holds for these two classes of estimators.
B.1 Λ-class estimators
Recall that the closed-form expression of the Λ-class estimator is β̂ = (X̄′C′X̄)^{−1}X̄′C′y with C = P − ΛD_P,
and the difference between β̂ and β is
After cross-multiplying, we obtain the following six terms
term | stochastic order | keep or not |
---|---|---|
Yes | ||
Yes | ||
Yes | ||
Yes | ||
No | ||
No |
B.1.1
B.1.2
B.1.3
Note that .
B.1.4
B.2 Ω-class estimators
Recall that the closed-form expression of the Ω-class estimator is β̂ = (X̄′C′X̄)^{−1}X̄′C′y with C = P − D_P + ΩI,
and the difference between β̂ and β is
B.2.1
B.2.2
B.2.3
Note that .
Appendix C Proof for approximate bias of UIJIVE1 is asymptotically vanishing
Claim C.1.
tr(C_UIJIVE1) − (L + 1) → 0 as N → ∞, so UIJIVE1's approximate bias is asymptotically vanishing.
Proof.
∎
Appendix D Generalizing approximate bias to other classes of estimators
D.1 k-class estimators
Classical k-class estimators take the form β̂ = (X̄′C′X̄)^{−1}X̄′C′y, and their C matrix is an affine combination of I (OLS, k = 0) and P (TSLS, k = 1): C = (1 − k)I + kP. This C matrix satisfies the property CZ̄ = Z̄.
Therefore, Corollary 1 applies to all k-class estimators. I set k such that the approximate bias of the k-class estimator is zero, as in Eq.(8). The resulting estimator is termed the approximately unbiased k-class estimator (or AUK in short). AUK's k converges at a rate of 1/N² to that of the Nagar estimator; in contrast, the Nagar estimator's k converges to that of TSLS (k = 1) at a rate of 1/N.
(1 − k)N + k(K + G) = L + G + 1  ⟹  k_AUK = (N − L − G − 1)/(N − K − G).   (8)
D.2 λ-class estimator
The λ-class estimator bridges TSLS and JIVE1, both of which have the property CZ̄ = Z̄. To maintain this property, the C matrix of the λ-class estimator is designed to be C = (I − λD_P)^{−1}(P − λD_P), so that when λ = 0 the estimator is TSLS and when λ = 1 the estimator is JIVE1. By Corollary 1, the approximate bias of the λ-class estimator is proportional to

Σᵢ (1 − λ)Pᵢᵢ/(1 − λPᵢᵢ) − L − G − 1.   (9)

Under Assumption BA, setting λ = 1 − (L + G + 1)/(K + G) makes the approximate bias asymptotically vanish; I call the resulting estimator TSJI1.
D.3 Λ-class estimator
The relationship between the Λ-class and λ-class estimators is analogous to the relationship between JIVE2 and JIVE1: the Λ-class removes the rowwise division of the λ-class. Hence, the C matrix of the Λ-class estimator is designed to be C = P − ΛD_P.
By Corollary 1, the approximate bias of the Λ-class estimator is proportional to (1 − Λ)(K + G) − L − G − 1, and hence the approximately unbiased Λ-class estimator has Λ = 1 − (L + G + 1)/(K + G); I call this estimator TSJI2.
D.4 ω-class estimator: UOJIVE1
The ω-class C matrix is C = (I − D_P + ωI)^{−1}(P − D_P + ωI). Under Assumption BA, tr(C) ≈ Nω, so setting ω = (L + G + 1)/N makes the approximate bias asymptotically vanish; the resulting estimator is UOJIVE1.
D.5 Ω-class estimator: UOJIVE2
Similarly, setting Ω = (L + G + 1)/N in the Ω-class C matrix, C = P − D_P + ΩI, gives tr(C) = L + G + 1 exactly and hence zero approximate bias; the resulting estimator is UOJIVE2.
Appendix E Asymptotic proofs
E.1 Proof for Lemma 6.1
(11) | |||
(12) | |||
(13) |
Consider expression (11),
Assumption 2 and Lemma A.2 jointly imply that the expression is bounded above by the sum of two terms' norms. Therefore, expression (11) converges in probability to its limit.
E.2 Proof for Lemma 6.2
(14) | |||
(15) | |||
(16) |
E.3 Proof for Lemma 6.3
E.4 Proof for Lemma 6.4
E.5 Proof for Lemma 6.5
The proof is similar to the proof for Lemma 6.2. I first show the convergence of the leading term.
The first term converges to 0 in probability under Assumption 4, and the second term converges to 0 in probability as well.
The other two terms converge to 0 in probability under Assumptions BA and 4.
E.6 Proof for Lemma 6.6
Appendix F Example for the simulation setup with outliers
Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Description |
0 | 0 | 0 | 0 | Outlier | |
1 | 0 | 0 | 0 | 0 | Group 2 (Row 2) |
0 | 1 | 0 | 0 | 0 | Group 3 (Row 3) |
0 | 0 | 1 | 0 | 0 | Group 4 (Row 4) |
0 | 0 | 0 | 1 | 0 | Group 5 (Row 5) |
0 | 0 | 0 | 0 | 1 | Group 6 (Row 6) |
0 | 0 | 0 | 0 | 0 | Group 1 (Rows 7-11) |
0 | 0 | 0 | 0 | 0 | Group 1 (continued) |
0 | 0 | 0 | 0 | 0 | Group 1 (continued) |
0 | 0 | 0 | 0 | 0 | Group 1 (continued) |
0 | 0 | 0 | 0 | 0 | Group 1 (continued) |
1 | 0 | 0 | 0 | 0 | Group 2 (Row 12) |
0 | 1 | 0 | 0 | 0 | Group 3 (Row 13) |
0 | 0 | 1 | 0 | 0 | Group 4 (Row 14) |
0 | 0 | 0 | 1 | 0 | Group 5 (Row 15) |
0 | 0 | 0 | 0 | 1 | Group 6 (Row 16) |
0 | 0 | 0 | 0 | 0 | Group 1 (Rows 17-21) |
0 | 0 | 0 | 0 | 0 | Group 1 (continued) |
…(Pattern continues until 101 rows) |
References
- Ackerberg and Devereux (2009) Daniel A Ackerberg and Paul J Devereux. Improved JIVE estimators for overidentified linear models with and without heteroskedasticity. The Review of Economics and Statistics, 91(2):351–362, 2009.
- Anderson and Sawa (1982) Theodore Wilbur Anderson and Takamitsu Sawa. Exact and approximate distributions of the maximum likelihood estimator of a slope coefficient. Journal of the Royal Statistical Society Series B: Statistical Methodology, 44(1):52–62, 1982.
- Angrist and Krueger (1991) Joshua D Angrist and Alan B Krueger. Does compulsory school attendance affect schooling and earnings? The Quarterly Journal of Economics, 106(4):979–1014, 1991.
- Angrist et al. (1999) Joshua D Angrist, Guido W Imbens, and Alan B Krueger. Jackknife instrumental variables estimation. Journal of Applied Econometrics, 14(1):57–67, 1999.
- Bedard and Deschênes (2006) Kelly Bedard and Olivier Deschênes. The long-term impact of military service on health: Evidence from World War II and Korean War veterans. American Economic Review, 96(1):176–194, 2006.
- Bekker (1994) Paul A Bekker. Alternative approximations to the distributions of instrumental variable estimators. Econometrica: Journal of the Econometric Society, pages 657–681, 1994.
- Bekker and Crudu (2015) Paul A Bekker and Federico Crudu. Jackknife instrumental variable estimation with heteroskedasticity. Journal of Econometrics, 185(2):332–342, 2015.
- Buse (1992) Adolf Buse. The bias of instrumental variable estimators. Econometrica: Journal of the Econometric Society, pages 173–180, 1992.
- Chao et al. (2012) John C Chao, Norman R Swanson, Jerry A Hausman, Whitney K Newey, and Tiemen Woutersen. Asymptotic distribution of JIVE in a heteroskedastic IV regression with many instruments. Econometric Theory, 28(1):42–86, 2012.
- Davidson and MacKinnon (2007) Russell Davidson and James G MacKinnon. Moments of IV and JIVE estimators. The Econometrics Journal, 10(3):541–553, 2007.
- Frandsen et al. (2023) Brigham Frandsen, Lars Lefgren, and Emily Leslie. Judging Judge Fixed Effects. American Economic Review, 113(1):253–77, 2023.
- Fuller (1977) Wayne A Fuller. Some properties of a modification of the limited information estimator. Econometrica: Journal of the Econometric Society, pages 939–953, 1977.
- Hansen and Kozbur (2014) Christian Hansen and Damian Kozbur. Instrumental variables estimation with many weak instruments using regularized JIVE. Journal of Econometrics, 182(2):290–308, 2014.
- Harding et al. (2016) Matthew Harding, Jerry Hausman, and Christopher J Palmer. Finite sample bias corrected IV estimation for weak and many instruments. In Essays in honor of Aman Ullah, pages 245–273. Emerald Group Publishing Limited, 2016.
- Hausman et al. (2012) Jerry A Hausman, Whitney K Newey, Tiemen Woutersen, John C Chao, and Norman R Swanson. Instrumental variable estimation with heteroskedasticity and many instruments. Quantitative Economics, 3(2):211–255, 2012.
- Kunitomo (2012) Naoto Kunitomo. An optimal modification of the LIML estimation for many instruments and persistent heteroscedasticity. Annals of the Institute of Statistical Mathematics, 64:881–910, 2012.
- Mogstad et al. (2021) Magne Mogstad, Alexander Torgovitsky, and Christopher R Walters. The causal interpretation of two-stage least squares with multiple instrumental variables. American Economic Review, 111(11):3663–98, 2021.
- Nagar (1959) Anirudh L Nagar. The bias and moment matrix of the general k-class estimators of the parameters in simultaneous equations. Econometrica: Journal of the Econometric Society, pages 575–595, 1959.