
Balancing weights for region-level analysis: the effect of Medicaid Expansion on the uninsurance rate among states that did not expand Medicaid

Max Rubinstein (mrubinst@andrew.cmu.edu), Amelia Haviland (amelia@andrew.cmu.edu), and David Choi (davidch@andrew.cmu.edu)
Carnegie Mellon University, Heinz College and Department of Statistics & Data Science
Keywords: Medicaid expansion, balancing weights, measurement error, hierarchical data

We predict the average effect of Medicaid expansion on the non-elderly adult uninsurance rate among states that did not expand Medicaid in 2014 as if they had expanded their Medicaid eligibility requirements. Using American Community Survey data aggregated to the region level, we estimate this effect by finding weights that approximately reweight the expansion regions to match the covariate distribution of the non-expansion regions. Existing methods to estimate balancing weights often assume that the covariates are measured without error and do not account for dependencies in the outcome model. Our covariates have random noise that is uncorrelated with the outcome errors, and our assumed outcome model has state-level random effects inducing dependence between regions. To correct for the bias induced by the measurement error, we propose generating our weights on a linear approximation to the true covariates, using an idea from the measurement error literature known as "regression-calibration" (see, e.g., Carroll et al. (2006)). This requires auxiliary data to estimate the variability of the measurement error. We also modify the Stable Balancing Weights objective proposed by Zubizarreta (2015) to reduce the variance of our estimator when the model errors are homoskedastic and equicorrelated within states. We show that these approaches outperform existing methods when attempting to predict observed outcomes during the pre-treatment period. Using this method we estimate that Medicaid expansion would have caused a -2.33 (-3.54, -1.11) percentage point change in the adult uninsurance rate among states that did not expand Medicaid.

1 Introduction

We study the effect of 2014 Medicaid expansion on the non-elderly adult uninsurance rate among states that did not expand Medicaid in 2014 as if they had expanded their Medicaid eligibility requirements. We use public-use survey microdata from the annual American Community Survey (ACS) aggregated to the consistent public use microdata area (CPUMA) level, a geographic region that falls within states. We calculate weights that reweight expansion-state CPUMAs to approximately match the covariate distribution of CPUMAs in states that did not expand Medicaid in 2014. We then estimate our causal effect as the difference in means between the reweighted treated CPUMAs and the observed mean of the non-expansion CPUMAs. A key challenge is that our data consist of estimated covariates. The sampling variability in these estimates is a form of measurement error that may bias effect estimates calculated on the observed data. Additionally, CPUMAs fall within states and share a common policy-making environment. The data-generating process for the outcomes therefore may contain state-level random effects that can increase the variance of standard estimation procedures. Our study contributes to the literature on balancing weights by proposing approaches to address both of these problems. We also contribute to the literature on Medicaid expansion by estimating the foregone coverage gains of Medicaid among states that did not expand Medicaid in 2014, which to our knowledge has not yet been directly estimated.

Approximate balancing weights are an estimation method in causal inference that grew out of the propensity score weighting literature. Rather than iteratively modeling the propensity score until the inverse probability weights achieve a desired level of balance, recent papers propose using optimization methods to generate weights that enforce covariate balance between the treated and control units (see, e.g., Hainmueller (2012), Imai and Ratkovic (2014), Zubizarreta (2015)). [Footnote 1: These methods also use ideas from the survey literature, which had proposed similar approaches to adjust sample weights to enforce equality between sample totals and known population totals (see, e.g., Haberman (1984), Deville and Särndal (1992), Deville, Särndal and Sautory (1993), Särndal and Lundström (2005)).] From an applied perspective, there are at least four benefits of this approach: first, it does not require iterating propensity score models to generate satisfactory weights. Second, these methods (and propensity score methods generally) do not use the outcomes to determine the weights, mitigating the risk of cherry-picking an outcome model specification to obtain a desired result. Third, these methods can constrain the weights to prevent extrapolation from the data, thereby reducing model dependence (Zubizarreta (2015)). Finally, the estimates are more interpretable: by making the comparison group explicit, it is easy to communicate exactly which units contributed to the counterfactual estimate.
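To make the optimization idea concrete, the following is a minimal sketch of approximate balancing weights in the style of Stable Balancing Weights: nonnegative weights on treated units that sum to one, approximately match a target covariate mean within a tolerance, and minimize the sum of squared weights. This is an illustration, not the authors' implementation; the simulated data, the tolerance `delta`, and the solver choice are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_t, n_c, q = 60, 40, 3
X_t = rng.normal(0.0, 1.0, size=(n_t, q))   # treated-unit covariates (simulated)
X_c = rng.normal(0.3, 1.0, size=(n_c, q))   # control-unit covariates (simulated)
target = X_c.mean(axis=0)                   # control covariate means to match
delta = 0.05                                # per-covariate balance tolerance

def dispersion(w):
    # SBW-style objective: minimize the sum of squared weights
    return np.sum(w ** 2)

constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
for j in range(q):
    # two smooth inequalities per covariate: |X_t[:, j] @ w - target[j]| <= delta
    constraints.append({"type": "ineq",
                        "fun": lambda w, j=j: delta - (X_t[:, j] @ w - target[j])})
    constraints.append({"type": "ineq",
                        "fun": lambda w, j=j: delta + (X_t[:, j] @ w - target[j])})

w0 = np.full(n_t, 1.0 / n_t)                # start from uniform weights
res = minimize(dispersion, w0, method="SLSQP",
               bounds=[(0.0, None)] * n_t, constraints=constraints)
w = res.x
imbalance = np.abs(X_t.T @ w - target)      # achieved covariate imbalance
```

Minimizing the sum of squared weights keeps the weights as close to uniform as the balance constraints allow, which reduces the variance of the resulting weighted estimator.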

Most proposed methods in this literature assume that the covariates are measured without error. For our application we assume that our covariates are measured with mean-zero additive error. This error could potentially bias standard estimation procedures. As a first contribution, we therefore propose generating our weights as a function of a linear approximation to the true covariate values, using an idea from the measurement error literature known as "regression-calibration" (see, e.g., Gleser (1992), Carroll et al. (2006)). This method requires access to an estimate of the measurement error covariance matrix, which we estimate using the ACS microdata. The theoretical consistency of these estimates requires several assumptions, including that the covariate measurement errors are uncorrelated with any errors in the outcome model, the outcome model is linear in the covariates, and the covariates are i.i.d. Gaussian. The first assumption is reasonable for our application since the covariates are measured on a different cross-sectional survey than our outcomes. The second is strong but somewhat relaxed because we prevent our weights from extrapolating beyond the support of the data. The third can be fully relaxed to obtain consistent estimates using ordinary least squares (OLS), but unfortunately not without additional modeling beyond our proposed method. Despite appearing costly, we show in Section 3.4.2 that this tradeoff is likely worth it in our application.

As a second contribution, we propose modifying the Stable Balancing Weights (SBW) objective (Zubizarreta (2015)) to account for possible state-level dependencies in our outcome model. We assume that the errors are homoskedastic with constant positive equicorrelation, though our general approach can accommodate other assumed correlation structures. In a setting without measurement error, we show that this modification can reduce the variance of the resulting estimates. We also connect these weights to the implied regression weights from Generalized Least Squares (GLS). Our overall approach provides a general framework that can be used by other applied researchers who wish to use balancing weights to estimate causal effects when their data are measured with error and/or the model errors are dependent. [Footnote 2: Our approach also relates to the "synthetic controls" literature (see, e.g., Abadie, Diamond and Hainmueller (2010)). Synthetic controls are a popular balancing weights approach frequently used in the applied economics literature to estimate treatment effects on the treated (ETT) for region-level policy changes when using time-series cross-sectional data. Our application uses a similar data structure; however, we instead consider the problem of estimating the ETC. In contrast to much of the synthetic controls literature, which assumes that the counterfactual outcomes follow a linear factor model, we instead assume that the counterfactual outcomes are linear in the observed covariates (including pre-treatment outcomes).]

Section 2 begins with a more detailed overview of the policy problem, and then defines the study period, covariates, outcome, and treatment. Section 3 discusses our methods, including outlining our identification, estimation, and inferential procedures. Section 4 presents our results. Section 5 contains a discussion of our findings, and Section 6 contains a brief summary. The Appendices contain proofs, summary statistics, and additional results.

2 Policy Problem and Data

2.1 Policy Problem Statement

Under the Affordable Care Act (ACA), states were required to expand their Medicaid eligibility requirements by 2014 to offer coverage to all adults with incomes at or below 138 percent of the federal poverty level (FPL). The United States Supreme Court ruled this requirement unconstitutional in 2012, allowing states to decide whether to expand Medicaid coverage. In 2014, twenty-six states and the District of Columbia expanded their Medicaid programs. From 2015 through 2021, an additional twelve states elected to expand their Medicaid programs. Medicaid expansion has remained a subject of debate among the remaining twelve states that had not expanded Medicaid as of 2021. [Footnote 3: https://www.nbcnews.com/politics/politics-news/changed-hearts-minds-biden-s-funding-offer-shifts-medicaid-expansion-n1262229] The effects of Medicaid expansion on various outcomes, including uninsurance rates, mortality rates, and emergency department use, have been widely studied, primarily by using the initial expansions in 2014 and 2015 to define expansion states as "treated" states and non-expansion states as "control" states (see, e.g., Courtemanche et al. (2017), Miller, Johnson and Wherry (2021), Ladhania et al. (2021)).

Medicaid enrollment is not automatic, and take-up rates have historically varied across states. This variation is partly a function of state discretion in administering programs: for example, program outreach, citizenship verification policies, and application processes differ across states (Courtemanche et al. (2017)). The effect of expanding Medicaid eligibility on the number of uninsured individuals is therefore not obvious a priori. This question is also important because many of the downstream effects of Medicaid expansion are mediated largely through reductions in the number of uninsured individuals. Existing studies have estimated that Medicaid expansion reduced the uninsurance rate between three and six percentage points on average among states that expanded Medicaid. These estimates differed depending on the data used, specific target population, study design, and level of analysis (see, e.g., Kaestner et al. (2017), Courtemanche et al. (2017), Frean, Gruber and Sommers (2017)). However, none of these studies has directly estimated the average treatment effect on the controls (ETC).

We believe that the ETC may differ from the ETT. Every state had different coverage policies prior to 2014, and non-expansion states tended to have less generous policies than expansion states. "Medicaid expansion" therefore represents a set of treatments of varying intensities that are distributed unevenly across expansion and non-expansion states. Averaged over the non-expansion states, which tended to have less generous policies and higher uninsurance rates prior to Medicaid expansion, we might expect the average effect to be larger in absolute magnitude than among the expansion states, where "Medicaid expansion" on average reflected smaller policy changes. [Footnote 4: As a part of our analysis strategy, we limit our pool of expansion states to those where the policy changes were comparable to the non-expansion states. We also control for pre-treatment uninsurance rates (see Section 2.3).] Even limited to states with equivalent coverage policies prior to 2014, we still might expect the ETT to differ from the ETC. For example, all states that were entirely controlled by the Democratic Party at the executive and legislative levels expanded their Medicaid programs, while only states where the Republican Party controlled at least part of the state government failed to expand their programs. Prior to the 2014 Medicaid expansion, Sommers et al. (2012) found that conservative governance was associated with lower Medicaid take-up rates. This might reflect differences in program implementation, which could serve as effect modifiers for comparable policy changes. [Footnote 5: Interestingly, Sommers et al. (2012) also find that the association between conservative governance and lower take-up rates prior to 2014 existed even after controlling for a variety of factors pertaining to state-level policy administration decisions. They conjecture that this may reflect cultural conservatism: people in conservative states are more likely to view enrollment in social welfare programs negatively, and therefore be less likely to enroll.] These factors may then attenuate the effects of Medicaid expansion averaged over non-expansion states relative to expansion states.

As a more general causal quantity, the ETC is also interesting in its own right: to the extent that the goal of studying Medicaid expansion is to understand the foregone benefits (or potential harms) of Medicaid expansion among non-expansion states, the ETC is the relevant quantity of interest. Authors have previously made claims about the ETC without directly estimating it. For example, Miller, Johnson and Wherry (2021) use their estimates of the ETT to predict that had non-expansion states expanded Medicaid, they would have seen 15,600 fewer deaths from 2014 through 2017. However, as these authors note, this estimate assumes that the ETT is comparable to the ETC. From a policy analysis perspective, we recommend that researchers estimate the ETC directly when it answers a substantive question of interest. We therefore contribute to the literature on Medicaid expansion by directly estimating this quantity.

2.2 Data Source and Study Period

Our primary data source is the annual household and person public use microdata files from the ACS from 2011 through 2014. The ACS is an annual cross-sectional survey of approximately three million individuals across the United States. The public use microdata files include information on individuals in geographic areas greater than 65,000 people. The smallest geographic units contained in these data are public-use microdata areas (PUMAs), arbitrary boundaries that nest within states but not within counties or other more commonly used geographic units. One limitation of these data is a 2012 change in the PUMA boundaries, which do not overlap well with the previous boundaries. As a result, the smallest possible geographic areas that nest both PUMA coding systems are known as consistent PUMAs (CPUMAs). The United States contains 1,075 total CPUMAs, with state counts ranging from one CPUMA (South Dakota, Montana, and Idaho) to 123 CPUMAs (New York). Our primary dataset contains 925 CPUMAs among 45 states (see also Section 2.3). The average total number of sampled individuals per CPUMA across the four years is 1,001; the minimum is 334 and the maximum is 23,990. We aggregate the microdata to the CPUMA level using the survey weights.
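The survey-weighted aggregation just described can be sketched as follows. The toy microdata are invented for illustration, and the column names are assumptions (`pwgtp` mirrors the ACS person-weight field name but this is not the paper's actual pipeline):

```python
import pandas as pd
import numpy as np

# Hypothetical person-level microdata: CPUMA id, survey weight, uninsured flag
micro = pd.DataFrame({
    "cpuma":     [1, 1, 1, 2, 2],
    "pwgtp":     [120.0, 80.0, 100.0, 90.0, 110.0],  # person survey weights
    "uninsured": [1, 0, 0, 1, 1],
})

# Aggregate to the CPUMA level: survey-weighted uninsurance rate per CPUMA
cpuma_rates = micro.groupby("cpuma").apply(
    lambda g: np.average(g["uninsured"], weights=g["pwgtp"])
)
```

Each aggregated rate is itself an estimate from a finite sample of respondents, which is exactly the source of the measurement error discussed below.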

This aggregation naturally raises concerns about measurement error and hierarchy. Any CPUMA-level variable is an estimate, leading to concerns about measurement error. The hierarchical nature of the dataset – CPUMAs within states – raises concerns about geographic dependence.

Our study period begins in 2011, following Courtemanche et al. (2017), who note that several other aspects of the ACA were implemented in 2010 – including the provision allowing for dependent coverage until age 26 and the elimination of co-payments for preventative care – and likely induced differential shocks across states. We also restrict our post-treatment period to 2014. We therefore avoid additional assumptions required for identification given that several states expanded Medicaid in 2015, including Indiana, Pennsylvania, and Alaska.

2.3 Treatment assignment

Reducing the concept of "Medicaid expansion" to a binary treatment simplifies a more complex reality. There are at least three reasons to be cautious about this simplification. First, states differed substantially in their Medicaid coverage policies prior to 2014. Given perfect data we might ideally consider Medicaid expansion as a continuous treatment with values proportional to the number of newly eligible individuals. The challenge is correctly identifying newly eligible individuals in the data (though see Frean, Gruber and Sommers (2017) and Miller, Johnson and Wherry (2021), who attempt to address this). Second, Frean, Gruber and Sommers (2017) note that five states (California, Connecticut, Minnesota, New Jersey, and Washington) and the District of Columbia adopted partial, limited Medicaid expansions prior to 2014. The "2014 expansion" therefore actually occurred in part prior to 2014 for several states. [Footnote 6: Kaestner et al. (2017) and Courtemanche et al. (2017) also consider Arizona, Colorado, Hawaii, Illinois, Iowa, Maryland, and Oregon to have had early expansions.] Finally, timing is an issue: among the states that expanded Medicaid in 2014, Michigan's expansion did not go into effect until April 2014, while New Hampshire's expansion did not occur until September 2014.

Our primary analysis excludes New York, Vermont, Massachusetts, Delaware, and the District of Columbia from our pool of expansion states. These states already had Medicaid coverage policies prior to 2014 that were comparable to the post-expansion requirements, and therefore do not provide valid comparisons (Kaestner et al. (2017)). We also exclude New Hampshire because it did not expand Medicaid until September 2014. While Michigan expanded Medicaid in April 2014, we leave this state in our pool of "treated" states. We consider the remaining expansion states, including those with early expansions, as "treated" and the non-expansion states, including those that later expanded Medicaid, as "control" states. We later consider the sensitivity of our results to these classifications by removing the early expansion states indicated by Frean, Gruber and Sommers (2017). Our final dataset contains data for 925 CPUMAs, with 414 CPUMAs among 24 non-expansion states and 511 CPUMAs among 21 expansion states. [Footnote 7: We additionally include the 4 CPUMAs from New Hampshire in the covariate adjustment procedure described in Section 3.] When we exclude the early expansion states, we are left with 292 CPUMAs across 16 expansion states. We provide a complete list of states by Medicaid expansion classification in Appendix C.

2.4 Outcome

Our outcome is the non-elderly (individuals aged 19-64) adult uninsurance rate in 2014. While take-up among the Medicaid-eligible population is a more natural outcome, we choose the non-elderly adult uninsurance rate for two reasons, one theoretical and one practical. First, Medicaid eligibility post-expansion is likely endogenous: Medicaid expansion may affect an individual's income and poverty levels, which often define Medicaid eligibility. Second, we can better compare our results with the existing literature, including Courtemanche et al. (2017), who also use this outcome. One drawback is that the simultaneous adoption of other ACA provisions by all states in 2014 also affects this outcome. As a result, we only attempt to estimate the effect of Medicaid expansion in 2014 in the context of this changing policy environment. We discuss this further in Sections 3.2 and 3.3.

2.5 Covariates

We choose covariates that approximately align with those considered in Courtemanche et al. (2017) and that are likely to be potential confounders. Specifically, using the ACS public use microdata, we calculate the unemployment and uninsurance rates for each CPUMA from 2011 through 2013. We also estimate a variety of demographic characteristics averaged over this same time period, including percent female, white, married, Hispanic ethnicity, foreign-born, disabled, students, and citizens. We estimate the percent in discrete age categories, educational attainment categories, income-to-poverty ratio categories, and categories of number of children. Finally, we calculate the average population growth and the ratio of households to adults. We provide a more extensive description of our calculation of these variables in Appendix B.

In addition to the ACS microdata we use 2010 Census data to estimate the percentage living in an urban area for each CPUMA. Lastly, we include three state-level covariates reflecting the partisan composition of each state's government in 2013: an indicator for states with a Republican governor, an indicator for Republican control of the lower legislative chamber, and an indicator for Republican control of both legislative chambers and the governorship. [Footnote 8: Nebraska is the only state with a unicameral legislature and the legislature is technically non-partisan. We nevertheless classified them as having Republican control of the legislature for this analysis.]

3 Methods

In this section we present our causal estimand, identifying assumptions, estimation strategy, and inferential procedure. Our primary methodological contributions are contained in the subsection on estimation. We begin by outlining notation.

3.1 Notation

We let $s$ index states and $c$ index CPUMAs within states. Let $m$ denote the number of states, $p_s$ the number of CPUMAs in state $s$, and $n = \sum_{s=1}^{m} p_s$ the total number of CPUMAs. For each state $s$, let $A_s$ denote its treatment assignment according to the discussion given in Section 2.3, with $A_s = 1$ indicating treatment and $A_s = 0$ indicating control. For each CPUMA $c$ in state $s$, let $Y_{sc}$ denote its uninsurance rate in 2014; let $X_{sc}$ denote a $q$-dimensional covariate vector; and let $A_{sc} = A_s$ denote its treatment status. We assume potential outcomes (Rubin (2005)), defining a CPUMA's potential uninsurance under treatment by $Y^1_{sc}$, and under control by $Y^0_{sc}$. Finally, let $n_1$ and $n_0$ denote the number of treated and control CPUMAs, and define $m_1$ and $m_0$ analogously for states.

Given a collection of objects indexed over CPUMAs, we denote the complete set by removing the subscript. For example, $X$ denotes $\{X_{sc} : sc \in \mathcal{C}\}$, where $\mathcal{C}$ is the set of all CPUMAs. Subscripting by the labels $A=1$ or $A=0$ denotes the subset corresponding to the treated or control units; for example, $X_{A=0}$ denotes $\{X_{sc} : A_{sc} = 0\}$, the covariates of the control units. To denote averaging, we will use an overbar, while also abbreviating $A=0$ and $A=1$ respectively by $0$ and $1$. For example, $\bar{X}_0$ (which abbreviates $\bar{X}_{A=0}$) denotes the average covariate vector for the control units, and $\bar{Y}_0^1$ denotes the average potential outcome under treatment for the control units.

3.2 Estimand

Letting $X$, $Y^1$, and $Y^0$ be random, we define the causal estimand $\psi_0$

$\psi_0 = \mathbb{E}[\bar{Y}_0^1 - \bar{Y}_0^0 \mid X_{A=0}]$  (1)
$\psi_0 = \psi_0^1 - \psi_0^0$  (2)

where $\psi_0^a$ denotes the expectation $\mathbb{E}[\bar{Y}_0^a \mid X_{A=0}]$. The estimand $\psi_0$ represents the expected treatment effect on non-expansion states conditioning on $X_{A=0}$, the observed covariate values of the non-expansion states (see, e.g., Imbens (2004)). The challenge is that we do not observe $Y^1_{A=0}$, the counterfactual outcomes for non-expansion CPUMAs had their states expanded their Medicaid programs, nor their average $\bar{Y}^1_0$. We therefore require causal assumptions to identify this counterfactual quantity using our observed data. [Footnote 9: As noted previously, the 2014 Medicaid expansion occurred simultaneously with the implementation of several other major ACA provisions, including (but not limited to) the creation of the ACA-marketplace exchanges, the individual mandate, health insurance subsidies, and community-rating and guaranteed issue of insurance plans (Courtemanche et al. (2017)). Almost all states broadly implemented these reforms beginning January 2014. Conceptually we think of the other ACA components as a state-level treatment ($R$) separate from Medicaid expansion ($A$). Our total estimated effect may include interactions between these policy changes; however, we do not attempt to separately identify these effects. Without further assumptions, we therefore cannot generalize these results beyond 2014.]

3.3 Identification

We appeal to the following causal assumptions to identify $\psi_0$ from our observed data: the stable unit treatment value assumption (SUTVA), no unmeasured confounding given the true covariates, and no anticipatory treatment effects. We also invoke parametric assumptions to model the measurement error and to express our estimand in terms of parameters from a linear model. We conclude by using ideas from the "regression-calibration" literature (see, e.g., Gleser (1992)) to ensure that identifying our target estimand is possible given auxiliary data on the measurement error covariance matrix.

We first assume the SUTVA at the CPUMA level. Assuming the SUTVA has two implications for our analysis: first, that there is only one version of treatment; second, that each unit’s potential outcome only depends on its treatment assignment. We discussed potential violations of the first implication previously when considering how to reduce Medicaid expansion to a binary treatment. The second implication could be violated if one CPUMA’s expansion decision affected uninsurance rates in another CPUMA (see, e.g., Frean, Gruber and Sommers (2017)). On the other hand, our assumption allows for interference among individuals living within CPUMAs and is therefore weaker than assuming no interference among any individuals. Further addressing this is beyond the scope of this paper.

Second, we assume no effects of treatment on the observed covariates. This includes assuming no anticipatory effects on pre-2014 uninsurance rates. This is violated in our study, as some treated states allowed specific counties to expand their Medicaid programs prior to 2014, thereby affecting their pre-2014 uninsurance rates. We later test the sensitivity of our results to the exclusion of these states.

Third, we assume no unmeasured confounding. Specifically, we posit that in 2014 the potential outcomes for each CPUMA are jointly independent of the state-level treatment assignment conditional on CPUMA and state-level covariates $X_{sc}$:

$(Y_{sc}^1, Y_{sc}^0) \perp A_s \mid X_{sc}$  (3)

The covariate vector $X_{sc}$ includes both time-varying pre-treatment covariates, including pre-treatment outcomes, and covariates averaged across 2011-2013, such as demographic characteristics, and the state-level governance indicators discussed in Section 2.5. We believe this assumption is reasonable given our rich covariate set.

Fourth, we assume that the outcomes for each treatment group are linear in the true covariates:

$Y_{sc}^a = \alpha_a + X_{sc}^\top \beta_a + \epsilon_{sc} + \varepsilon_s \qquad a = 0, 1$  (4)

where $\top$ denotes vector transpose, and the errors $\epsilon_{sc}$ and $\varepsilon_s$ are mean-zero; independent from the covariates, treatment assignment, and each other; and have finite variances $\sigma^2_\epsilon$ and $\sigma^2_\varepsilon$, respectively. [Footnote 10: Because our covariates include pre-treatment outcomes, this assumption also implies that $\epsilon_{sc}$ and $\varepsilon_s$ are uncorrelated with pre-treatment outcomes, including any error terms that might appear in their generative models.] This implies that the errors for each CPUMA within a given state have a constant within-state correlation $\sigma^2_\varepsilon / (\sigma^2_\varepsilon + \sigma^2_\epsilon)$, which we denote as $\rho$. To fix ideas, $\epsilon_{sc}$ may capture time-specific idiosyncrasies at the local level, possibly due to local policy or economic conditions. By contrast, $\varepsilon_s$ captures time-specific idiosyncrasies at the state level that are common across CPUMAs within a state due to the shared policy and economic environment.
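A quick simulation illustrates why the shared state-level error induces the stated equicorrelation. This sketch is illustrative only; the numbers of states and CPUMAs and the error standard deviations are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
m_states, p_cpumas = 5000, 3        # hypothetical state / CPUMA-per-state counts
sigma_eps, sigma_state = 1.0, 0.5   # sd of the CPUMA-level and state-level errors

state_err = rng.normal(0.0, sigma_state, size=(m_states, 1))      # shared within state
cpuma_err = rng.normal(0.0, sigma_eps, size=(m_states, p_cpumas)) # idiosyncratic
total_err = state_err + cpuma_err   # total error, as in the outcome model (4)

# Theoretical within-state correlation rho = sigma_state^2 / (sigma_state^2 + sigma_eps^2)
rho_theory = sigma_state**2 / (sigma_state**2 + sigma_eps**2)     # = 0.2 here
rho_hat = np.corrcoef(total_err[:, 0], total_err[:, 1])[0, 1]     # within-state pair
```

With these hypothetical values the empirical correlation between two CPUMAs in the same state concentrates around the theoretical $\rho = 0.2$.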

Fifth, we assume that the covariates $X$ and outcomes $Y$ are not observed directly. Instead, survey-sampled versions $W$ and $J$ are available, with additive Gaussian measurement error arising due to sampling variability.

$J_{sc} = Y_{sc} + \xi_{sc} \qquad \text{and} \qquad W_{sc} = X_{sc} + \nu_{sc}$  (5)

where $(\xi_{sc}, \nu_{sc})$ is independent of $(X, Y)$ and has distribution

$(\xi_{sc}, \nu_{sc}) \stackrel{\text{indep}}{\sim} \mathrm{MVN}\left(0, \begin{bmatrix} \sigma_{\xi,sc}^2 & 0 \\ 0 & \Sigma_{\nu,sc} \end{bmatrix}\right)$  (6)

We believe equations (5) and (6) are reasonable because measurement error in our context is sampling variability. While (6) further implies that the measurement errors in our covariates and outcomes are uncorrelated, this is also reasonable because our outcomes are measured on a different cross-sectional survey than our covariates. [Footnote 11: Our covariates are almost all ratio estimates, which are in general biased. This bias, however, is $O(1/n)$ and therefore decreases quickly with the sample size; given that our sample sizes are all over 300, we treat these estimates as unbiased.]

Sixth, we assume that the covariates for the treated units $X_{sc}$ are drawn i.i.d. multivariate normal conditional on treatment:

$X_{sc} \mid A_{sc} = 1 \stackrel{\text{iid}}{\sim} \mathrm{MVN}(\upsilon_1, \Sigma_{X|1})$  (7)

Under equations (6)-(7), the conditional expectation of $X_{sc}$ given the noisy observations $W$ among the treated units can be seen to equal

$\mathbb{E}[X_{sc} \mid W, A] = \upsilon_1 + \Sigma_{X|1}\left(\Sigma_{X|1} + \Sigma_{\nu,sc}\right)^{-1}(W_{sc} - \upsilon_1), \qquad \forall\, sc : A_{sc} = 1$  (8)

Equation (7) is a convenient simplification to motivate (8). For example, $X_{sc}$ includes state-level covariates, so the covariates cannot be independent. More generally, many of the covariates are bounded, and therefore cannot be Gaussian. In fact, assuming (7) is not strictly necessary for consistent estimation of $\psi_0^1$ (see, e.g., Gleser (1992)); however, it is required by the weighting approaches that we consider here. In our validation experiments described in Section 4.3, we find that the approaches that assume (7) outperform those that do not, as the latter group evidently relies more heavily on the linearity assumption (4). [Footnote 12: In principle it may be possible to generalize equations (5)-(7) to settings where the conditional expectation $\mathbb{E}[X_{sc} \mid W, A]$ follows a different form than (8), but is still accessible given auxiliary data. For example, to make the linearity assumption of equation (4) more credible, $X_{sc}$ might include transformations or a basis expansion of the covariates, so that $X_{sc} = \phi(U_{sc})$ for some function $\phi$ of the untransformed covariates $U_{sc}$. Under assumptions analogous to (5)-(7) for $U_{sc}$, we may still be able to estimate $\mathbb{E}[X_{sc} \mid W, A]$. We give some preliminary findings in Appendix A.1, Remark 5. Developing this idea further would be an interesting area for future work.]

Regardless, to use (8) to estimate $\mathbb{E}[X_{sc} \mid W, A_{sc} = 1]$, we require $\upsilon_1$, $\Sigma_{\nu,sc}$, and $\Sigma_{X|1}$. The sample mean $\bar{W}_1$ provides a consistent estimate of $\upsilon_1$. However, the data do not identify $\Sigma_{\nu,sc}$ and $\Sigma_{X|1}$. Our final assumption is that we can consistently estimate the covariance matrices $\Sigma_{\nu,sc}$ and $\Sigma_{X|1}$ using auxiliary data, so that we can use (8) to estimate the conditional mean. The ACS microdata serve as our auxiliary data; further details are discussed in Section 3.4.2.
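The shrinkage in equation (8) is a simple matrix computation once estimates of the mean and the two covariance matrices are in hand. The sketch below evaluates it for a single hypothetical CPUMA; all numerical inputs are invented for illustration, not estimates from the paper:

```python
import numpy as np

# Hypothetical estimated quantities (in the application these come from the
# ACS microdata, per Section 3.4.2):
upsilon = np.array([0.10, 0.20])            # mean of true X among treated units
Sigma_X = np.array([[0.04, 0.01],
                    [0.01, 0.09]])          # covariance of the true covariates
Sigma_nu = np.diag([0.02, 0.05])            # measurement-error covariance, one CPUMA
W_sc = np.array([0.30, 0.05])               # observed noisy covariate vector

# Regression-calibration conditional mean, equation (8):
# E[X | W] = upsilon + Sigma_X (Sigma_X + Sigma_nu)^{-1} (W - upsilon)
shrink = Sigma_X @ np.linalg.inv(Sigma_X + Sigma_nu)
X_tilde = upsilon + shrink @ (W_sc - upsilon)
```

Because the shrinkage matrix has eigenvalues below one, each coordinate of $\tilde{X}_{sc}$ is pulled from the noisy observation toward the treated-group mean, with noisier covariates pulled harder.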

Under these assumptions we can rewrite our causal estimand in terms of the model parameters. As $\epsilon_{sc}$ and $\varepsilon_{s}$ are zero-mean and independent of the covariates, we can rewrite $\psi_{0}^{a}=\mathbb{E}[\bar{Y}_{0}^{a}\mid X_{A=0}]$ by applying expectations to (4), yielding

\psi_{0}^{a}=\alpha_{a}+\bar{X}_{0}^{\top}\beta_{a} \qquad (9)

If we observed $(A,Y,X)$, we would have $Y_{A=a}$ and $X_{A=a}$ for $a\in\{0,1\}$. Therefore, by (4) the data would identify $(\alpha_{a},\beta_{a})$, which identifies $\psi_{0}^{a}$ since $\bar{X}_{0}$ is observed. However, we only observe the noisy measurements $J$ and $W$. As $\xi_{sc}$ is zero-mean, equation (5) implies that $\bar{J}_{0}$ estimates $\bar{Y}_{0}^{0}$ and therefore $\psi_{0}^{0}$. Estimating $\psi_{0}^{1}$ remains challenging: since $Y_{sc}^{1}\perp A_{sc}\mid X_{sc}$ does not imply $J_{sc}^{1}\perp A_{sc}\mid W_{sc}$, it is well known that noisy measurements bias standard estimation procedures, such as linear regression, that naively use them without adjustment (see also Appendix A.1).

Let $\tilde{X}_{sc}=\mathbb{E}[X_{sc}|W,A]$ denote the conditional expectation of the covariates given the noisy observations, as given by (8). Substituting $X_{sc}=\tilde{X}_{sc}+(X_{sc}-\tilde{X}_{sc})$ into the outcome model (4), and then (4) into (5), yields

J_{sc}=\alpha_{1}+\tilde{X}_{sc}^{\top}\beta_{1}+(X_{sc}-\tilde{X}_{sc})^{\top}\beta_{1}+\xi_{sc}+\epsilon_{sc}+\varepsilon_{s}\qquad\forall\,sc:A_{sc}=1 \qquad (10)

As $X_{sc}-\tilde{X}_{sc}$ equals $X_{sc}-\mathbb{E}[X_{sc}|W,A]$, this quantity is zero-mean conditional on $(W,A)$. The term $(X_{sc}-\tilde{X}_{sc})^{\top}\beta_{1}$ appearing in (10) is therefore also conditionally zero-mean. Finally, the outcome noise $\xi_{sc}$ as well as the error terms $\epsilon_{sc}$ and $\varepsilon_{s}$ appearing in this equation are zero-mean and independent of $(W,A)$. It follows that if we observed $(\tilde{X},J,A)$, we would have $\tilde{X}_{A=1}$ and $J_{A=1}$; therefore, by (10) the data would identify $(\alpha_{1},\beta_{1})$. In turn, (9) implies that $\alpha_{1}$, $\beta_{1}$, and $\bar{X}_{A=0}$ (which can be estimated without bias from $W_{A=0}$) identify $\psi_{0}^{1}$. Since we have assumed that $\tilde{X}$ follows (8), and that we have auxiliary data available to estimate this equation, we have sufficient data to estimate $\psi_{0}$ under our models and assumptions. We now discuss estimation.

3.4 Estimation

We propose to use approximate balancing weights to estimate $\psi_{0}^{1}$. We first review approximate balancing weights and the SBW objective proposed by Zubizarreta (2015). These methods typically assume that the covariates are measured without error. We show in Proposition 1 that under the classical errors-in-variables model, the SBW estimate of $\psi_{0}^{1}$ has the same bias as the OLS estimate.

We first attempt to remove this bias by estimating (8), leveraging the ACS microdata replicate survey weights to estimate this model. We consider two adjustments: (a) a homogeneous adjustment that assumes the noise covariance $\Sigma_{\nu,sc}$ is constant across all CPUMAs; and (b) a heterogeneous adjustment that allows $\Sigma_{\nu,sc}$ to vary according to the sample sizes associated with each CPUMA. We next propose a modification of SBW, which we call H-SBW, that accounts for the state-level random effects $\varepsilon_{s}$. Using SBW and H-SBW we generate weights that balance the adjusted data to the mean covariate values of the non-expansion states. To further reduce imbalances that remain after weighting, we consider bias corrections using ridge-regression augmentation, following Ben-Michael, Feller and Rothstein (2021).

3.4.1 Stable balancing weights

Zubizarreta (2015) proposes the Stable Balancing Weights (SBW) algorithm, which generates a set of weights $\gamma$ that reweight a set of covariates $Z=\{Z_{sc}\}$ to a target covariate vector $\upsilon$ within a tolerance parameter $\delta$ by solving the optimization problem:

\min_{\gamma}\sum_{sc}\gamma_{sc}^{2}\quad\text{such that}\quad\gamma\in\Gamma(Z,\upsilon,\delta) \qquad (11)

where the constraint set $\Gamma(Z,\upsilon,\delta)$ is given by

\Gamma(Z,\upsilon,\delta)=\left\{\gamma:\left|\sum_{sc}\gamma_{sc}Z_{sc}-\upsilon\right|\leq\delta,\;\gamma\geq 0,\;\sum_{sc}\gamma_{sc}=1\right\}

and where $\delta$ may be a $q$-dimensional vector if non-uniform tolerances are desired. To estimate $\psi_{0}$ given the true covariates $X$ and outcomes $Y$, one can use SBW to reweight the treated units to approximately match the mean covariate values of the control units by finding $\hat{\gamma}$ solving (11) with $\upsilon_{0}=\bar{X}_{0}$ and $Z=X_{A=1}$ for some feasible $\delta$. One can then use $\hat{\gamma}$ to estimate $\psi_{0}^{1}$ and $\psi_{0}$:

\hat{\psi}_{0}^{1}=\sum_{A_{sc}=1}\hat{\gamma}_{sc}Y_{sc},\qquad\hat{\psi}_{0}^{0}=\bar{Y}_{0}^{0},\qquad\hat{\psi}=\hat{\psi}_{0}^{1}-\hat{\psi}_{0}^{0} \qquad (12)

In the case where the potential outcomes follow the linear model specified in (4), the bias of $\hat{\psi}_{0}^{1}$ is at most $\lvert\beta_{1}\rvert^{\top}\delta$, and therefore zero if $\delta=0$ (Zubizarreta, 2015). Moreover, $\hat{\psi}_{0}^{1}$ is the minimum variance estimator within the constraint set, conditional on $X$, assuming that the errors in the outcome model are independent and identically distributed.
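To build intuition, the following numpy sketch solves a simplified version of (11) with $\delta=0$ on synthetic data. With exact balance the problem reduces to a minimum-norm solution of the linear constraints; for simplicity we drop the nonnegativity constraint $\gamma\geq 0$, which in practice requires a quadratic-programming solver. All names and values here are illustrative, not taken from the application.

```python
import numpy as np

rng = np.random.default_rng(0)

n, q = 30, 3                    # treated units, covariates
Z = rng.normal(size=(n, q))     # covariates of the treated units
upsilon = Z.mean(axis=0) + 0.1  # stand-in for the control-group covariate means

# Stack the balance constraints Z'gamma = upsilon with the
# sum-to-one constraint 1'gamma = 1 as a single system M gamma = v.
M = np.vstack([Z.T, np.ones(n)])
v = np.concatenate([upsilon, [1.0]])

# Minimum-norm weights satisfying M gamma = v (exact balance, delta = 0).
gamma = M.T @ np.linalg.solve(M @ M.T, v)

assert np.allclose(M @ gamma, v)  # exact balance and sum-to-one
```

The minimum-norm objective is what keeps the weights "stable": among all weight vectors achieving balance, it selects the most dispersed one, which minimizes the conditional variance under i.i.d. errors.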

3.4.2 Measurement error

In the presence of measurement error, the estimation procedure described in Section 3.4.1 is biased. We show in Appendix A.1, Proposition 1, that under the classical errors-in-variables model where $\Sigma_{\nu,sc}=\Sigma_{\nu}$ for all units, if $\hat{\gamma}$ solves the SBW objective (11) with $Z$ equal to the noisy covariates $W_{A=1}$, $\upsilon$ equal to the estimated mean $\bar{W}_{0}$ of the control units, and $\delta=0$, then the estimator $\hat{\psi}_{0}^{1}$ in (12) has bias

\mathbb{E}[\hat{\psi}_{0}^{1}]-\psi_{0}^{1}=(\bar{X}_{0}-\upsilon_{1})^{\top}(\kappa_{1}-I_{q})\beta_{1}

where $\kappa_{1}=(\Sigma_{X|1}+\Sigma_{\nu})^{-1}\Sigma_{X|1}$. This is equivalent to the bias of an OLS estimator of $\psi_{0}^{1}$ in which $(\alpha_{1},\beta_{1})$ are estimated by regressing $Y_{A=1}$ on $W_{A=1}$.
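The attenuation encoded in $\kappa_{1}$ can be checked numerically. In the one-covariate case $\kappa_{1}$ reduces to the scalar $\sigma_{X}^{2}/(\sigma_{X}^{2}+\sigma_{\nu}^{2})$; the simulation below (all quantities synthetic, not from the application) shows OLS on the noisy measurement recovering the attenuated slope rather than the true one.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200_000
sigma_x2, sigma_nu2, beta = 4.0, 1.0, 2.0

X = rng.normal(0.0, np.sqrt(sigma_x2), size=n)       # true covariate
W = X + rng.normal(0.0, np.sqrt(sigma_nu2), size=n)  # noisy measurement
Y = beta * X + rng.normal(size=n)                    # outcome (intercept 0)

# OLS of Y on W recovers kappa * beta, not beta, where
# kappa = sigma_x2 / (sigma_x2 + sigma_nu2) = 0.8 here.
slope = np.cov(W, Y)[0, 1] / np.var(W)
kappa = sigma_x2 / (sigma_x2 + sigma_nu2)

assert abs(slope - kappa * beta) < 0.05  # matches the attenuated slope
assert slope < beta                      # biased toward zero
```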

We mitigate this bias by setting $Z=\hat{X}_{A=1}$, where $\hat{X}_{A=1}$ is an estimate of $\tilde{X}_{A=1}$ given by (8). This requires estimating $\upsilon_{1}$, $\Sigma_{\nu,sc}$, and $\Sigma_{X|1}$. To estimate $\upsilon_{1}$ we simply use $\bar{W}_{1}$. To estimate $\Sigma_{X|1}$ and $\Sigma_{\nu,sc}$ we use the ACS microdata's set of 80 replicate survey weights to construct 80 additional CPUMA-level datasets. For each CPUMA among the treated states, we take the empirical covariance matrix of its covariates over these datasets to derive unpooled estimates $\hat{\Sigma}_{\nu,sc}^{\text{raw}}$, which we average over CPUMAs to create $\hat{\Sigma}_{\nu}$. We then estimate $\Sigma_{X|1}$ by subtracting $\hat{\Sigma}_{\nu}$ from the empirical covariance matrix of $W_{A=1}$:

\hat{\Sigma}_{X|1}=\frac{1}{n_{1}}\sum_{sc:A_{sc}=1}(W_{sc}-\bar{W}_{1})(W_{sc}-\bar{W}_{1})^{\top}-\hat{\Sigma}_{\nu}

We consider two estimates of $\Sigma_{\nu,sc}$: first, the homogeneous adjustment, where we set $\hat{\Sigma}_{\nu,sc}=\hat{\Sigma}_{\nu}$ for all units; second, the heterogeneous adjustment, where each $\hat{\Sigma}_{\nu,sc}$ equals $\hat{\Sigma}_{\nu}$ rescaled according to the sample size underlying the estimate $W_{sc}$. We describe these adjustments fully in Appendix B. Using $\hat{\Sigma}_{X|1}$ and $\hat{\Sigma}_{\nu,sc}$, we estimate $\tilde{X}_{A=1}$ using the empirical version of (8), yielding estimates $\hat{X}_{A=1}$ given by

\hat{X}_{sc}=\bar{W}_{1}+\hat{\Sigma}_{X|1}(\hat{\Sigma}_{X|1}+\hat{\Sigma}_{\nu,sc})^{-1}(W_{sc}-\bar{W}_{1}),\qquad\forall\,sc:A_{sc}=1 \qquad (13)

We then compute debiased balancing weights $\hat{\gamma}$ by solving (11) with $Z=\hat{X}_{A=1}$, $\upsilon=\bar{W}_{0}$, and tolerance $\delta$ chosen as described in Section 3.4.4. Given $\hat{\gamma}$, we again compute $\hat{\psi}_{0}^{1}$ and $\hat{\psi}$ using (12).
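The homogeneous version of this calibration pipeline can be sketched in numpy as follows. Here the replicate datasets are simulated as independent noise draws around each unit's point estimate, which is only a stand-in for the ACS replicate-weight construction; all sizes and covariance values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

n, q, R = 400, 2, 80            # treated regions, covariates, replicates
Sigma_X = np.array([[1.0, 0.3], [0.3, 1.0]])
Sigma_nu = 0.5 * np.eye(q)

X = rng.multivariate_normal(np.zeros(q), Sigma_X, size=n)        # true covariates
W = X + rng.multivariate_normal(np.zeros(q), Sigma_nu, size=n)   # point estimates
# Simulated replicate estimates: noise draws around each W_sc.
reps = W[:, None, :] + rng.multivariate_normal(np.zeros(q), Sigma_nu, size=(n, R))

# Homogeneous adjustment: average the per-unit replicate covariances.
Sigma_nu_hat = np.mean([np.cov(reps[i].T) for i in range(n)], axis=0)
W_bar = W.mean(axis=0)
Sigma_W = np.cov(W.T)
Sigma_X_hat = Sigma_W - Sigma_nu_hat

# Empirical version of (13): shrink each W_sc toward the treated-group mean.
shrink = Sigma_X_hat @ np.linalg.inv(Sigma_X_hat + Sigma_nu_hat)
X_hat = W_bar + (W - W_bar) @ shrink.T

# The calibrated covariates are less variable than the raw measurements
# and, on average, closer to the truth.
assert np.trace(np.cov(X_hat.T)) < np.trace(Sigma_W)
assert np.mean((X_hat - X) ** 2) < np.mean((W - X) ** 2)
```

The shrinkage matrix plays the role of $\hat{\Sigma}_{X|1}(\hat{\Sigma}_{X|1}+\hat{\Sigma}_{\nu,sc})^{-1}$ in (13); balancing on `X_hat` rather than `W` is what removes the attenuation bias described above.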

The homogeneous adjustment approximately aligns with the adjustments suggested by Carroll et al. (2006) and Gleser (1992). In Appendix A.1, Propositions 2-4, we show that this procedure returns consistent estimates of $\psi_{0}^{1}$ and $\psi_{0}$ under the identifying assumptions above, provided $\delta=0$ is feasible. This is the first application we are aware of that applies regression calibration in the context of balancing weights to address measurement error. However, the method requires knowledge of $\Sigma_{\nu}$. We use survey microdata to identify this parameter in our application. Alternatively, region-level datasets often contain region-level variance estimates; a researcher willing to assume $\Sigma_{\nu}$ is diagonal could leverage this information instead. If no auxiliary data are available, she could treat $\Sigma_{\nu}$ as a sensitivity parameter and compute estimates over a range of possible values (see, e.g., Huque, Bondell and Ryan (2014); Illenberger, Small and Shaw (2020)).

3.4.3 H-SBW criterion

Unlike the setting outlined in Zubizarreta (2015), our application likely has state-level dependencies in the error terms, which may increase the variance of the SBW estimator. We therefore add a tuning parameter $\rho\in[0,1)$, representing a constant within-state correlation of the errors, that penalizes the within-state cross products of the weights, as detailed in (14):

\min_{\gamma}\;\sum_{s=1}^{m}\Big(\sum_{c=1}^{p_{s}}\gamma_{sc}^{2}+\sum_{c\neq d}\rho\,\gamma_{sc}\gamma_{sd}\Big)\quad\text{such that}\quad\gamma\in\Gamma(Z,\upsilon,\delta) \qquad (14)

To build intuition about this objective, note that as $\delta\to\infty$ the following solution is attained:

\hat{\gamma}_{sc}\propto\frac{1}{(p_{s}-1)\rho+1} \qquad (15)

Setting $\rho=0$ returns the SBW solution, $\hat{\gamma}_{sc}\propto 1$, while setting $\rho\approx 1$ gives $\hat{\gamma}_{sc}\propto\frac{1}{p_{s}}$. In other words, as we increase $\rho$, the objective downweights CPUMAs in states with many CPUMAs and upweights CPUMAs in states with few CPUMAs (assigning each CPUMA within a state equal weight), thereby dispersing the weights more uniformly across states. We show in Appendix A.3 that solving the H-SBW objective produces the minimum conditional variance estimator of $\psi_{0}^{1}$ within the constraint set under homoskedastic and equicorrelated errors. We also highlight the connection between the H-SBW solution and the implied regression weights from GLS.
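The closed form (15) can be verified numerically: with only the sum-to-one constraint binding, minimizing $\gamma^{\top}\Omega\gamma$ (where $\Omega$ has ones on the diagonal and $\rho$ within states) gives $\gamma=\Omega^{-1}\mathbf{1}/(\mathbf{1}^{\top}\Omega^{-1}\mathbf{1})$. The state sizes below are hypothetical.

```python
import numpy as np

rho = 1.0 / 6.0
state_sizes = [1, 3, 8]   # p_s: number of CPUMAs per (hypothetical) state
n = sum(state_sizes)

# Omega encodes the H-SBW penalty: 1 on the diagonal, rho within states.
Omega = np.zeros((n, n))
start = 0
for p in state_sizes:
    Omega[start:start + p, start:start + p] = (1 - rho) * np.eye(p) + rho
    start += p

# With delta -> infinity only sum-to-one binds; the minimizer of
# gamma' Omega gamma subject to 1'gamma = 1 is Omega^{-1} 1, normalized.
g = np.linalg.solve(Omega, np.ones(n))
gamma = g / g.sum()

# Matches the closed form (15): gamma_sc proportional to 1/((p_s - 1) rho + 1).
closed = np.concatenate([np.full(p, 1.0 / ((p - 1) * rho + 1)) for p in state_sizes])
assert np.allclose(gamma, closed / closed.sum())
```

With these sizes the single-CPUMA state gets the largest per-unit weight and the eight-CPUMA state the smallest, illustrating the dispersion across states described above.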

An important caveat emerges in the context of measurement error. In settings where the covariates are dependent, the conditional expectation $\mathbb{E}[X_{sc}|W,A]$ is no longer given by (8), and therefore $\hat{X}_{sc}$ given by (13) is no longer consistent for $\tilde{X}_{sc}$. As a result, weights solving the H-SBW objective (14) with $Z=\hat{X}_{A=1}$ are not unbiased in these settings. A simulation study in Appendix F shows that this bias increases with $\rho$, while the SBW solution remains approximately unbiased. To regain unbiasedness in general, $\hat{X}_{sc}$ must be modified from (13) to account for the dependencies, requiring new modeling assumptions. We demonstrate this more formally in Appendix A.3 and propose an adjustment to account for dependent Gaussian covariates in Appendix B.

3.4.4 Hyperparameter selection

Practical guidance in the literature is that $\delta$ should reduce the standardized mean differences to less than 0.1 (see, e.g., Zhang et al. (2019)). In our application, all of our covariates are measured on the same scale. Additionally, because some covariates have very small variances (for example, percent female), we instead target percentage point differences. We can then estimate $\psi_{0}$ using (12), substituting $J_{sc}$ for $Y_{sc}$ and using the weights $\hat{\gamma}$.

We choose the vector $\delta$ using domain knowledge about which covariates are most likely to be important predictors of the potential outcomes under treatment, again recalling that the bias of our estimate is bounded by $\sum_{d=1}^{q}\delta_{d}\lvert\beta_{1,d}\rvert$ (where we index the covariates by $d=1,\ldots,q$). Specifically, because pre-treatment outcomes are often strong predictors of post-treatment outcomes, we constrain their imbalances to fall within 0.05 percentage points (out of 100). (Footnote 13: While both our approach and the synthetic controls literature prioritize balancing pre-treatment outcomes, the motivations differ. The synthetic controls literature frequently assumes that the potential outcomes absent treatment follow a linear factor model. Under this model a heuristic is that the effects of covariates must show up via the pre-treatment outcomes, so that balancing pre-treatment outcomes suffices to balance any other relevant covariates (see Botosaru and Ferman (2019), who formalize this idea). However, both the theory and common practice recommend balancing a long vector of pre-treatment outcomes. We instead have very limited pre-treatment data. We therefore do not assume a factor model and instead treat the pre-treatment outcomes simply as covariates. We then justify prioritizing these covariates by assuming they have large coefficients relative to the other covariates in the assumed model for the (post-treatment) potential outcomes under treatment.) Because health insurance is often tied to employment, we also prioritize balancing pre-treatment unemployment rates, seeking to reduce imbalances below 0.15 percentage points. At the opposite end of the spectrum, we constrain the Republican governance indicators to fall within 25 percentage points.
While we believe that these covariates are important to balance, given the data we are unable to tighten the constraints further without generating extreme weights. We detail the remaining constraints in Appendix D. (Footnote 14: Our key estimand, $\psi_{0}^{1}$, differs from the traditional target of the synthetic controls literature, $\psi_{1}^{0}$. If there exist covariates that are not relevant for producing $Y^{0}$ but are relevant for $Y^{1}$, then balancing pre-treatment outcomes alone will not in general balance these covariates (Botosaru and Ferman (2019)). Therefore, even with access to a long vector of pre-treatment outcomes and assuming that $Y^{1}$ follows a factor model, explicitly balancing such covariates in addition to pre-treatment outcomes may be necessary to estimate the ETC well.)

We consider $\rho\in\{0,1/6\}$. The first choice is equivalent to the SBW objective, while the second assumes a constant equicorrelation of $1/6$. We choose $\rho$ to be small to limit the additional bias H-SBW can induce in the context of dependent data and measurement error.

Data-driven approaches to selecting these parameters could also be used. For example, absent measurement error, if pre-treatment outcomes and covariates were available, one could use the residuals from GLS to estimate $\rho$. Data-driven procedures for $\delta$ are also possible: Wang and Zubizarreta (2020) propose an approach that uses only the covariate information, and when data exist for a long pre-treatment period, Abadie, Diamond and Hainmueller (2015) propose tuning their weights with respect to covariate balance using a "training" and a "validation" period, an idea that could be adapted to choose $\delta$. Extending these ideas to our setting would be an interesting area for future work.

3.5 Sensitivity to covariate imbalance

Our initial sets of weights allow some large covariate imbalances to remain. We follow the proposal of Ben-Michael, Feller and Rothstein (2021) and use ridge-regression augmentation to reduce these imbalances. While the augmented weights achieve better covariate balance, this comes at the cost of extrapolating beyond the support of the data. Letting $\hat{\gamma}^{H\text{-}SBW}$ denote our H-SBW weights and $\hat{X}_{1}$ the matrix whose columns are the members of $\hat{X}_{A=1}$, we define these weights as:

\hat{\gamma}^{BC\text{-}HSBW}=\hat{\gamma}^{H\text{-}SBW}+(\bar{W}_{0}-\hat{X}_{1}\hat{\gamma}^{H\text{-}SBW})^{\top}(\hat{X}_{1}\Omega^{-1}\hat{X}_{1}^{\top}+\lambda I_{q})^{-1}\hat{X}_{1}\Omega^{-1} \qquad (16)

where $\Omega$ is a block-diagonal matrix with diagonal entries equal to one and within-state off-diagonal entries equal to $\rho$. We choose $\lambda$ so that all imbalances fall within 0.5 percentage points. We refer to Ben-Michael, Feller and Rothstein (2021) for more details on this procedure. For our results we consider estimators using SBW weights ($\rho=0$), H-SBW weights ($\rho=1/6$), and their ridge-augmented versions, which we call BC-SBW and BC-HSBW, respectively.
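A small numpy sketch of this ridge correction on synthetic data, writing the residual imbalance as $\bar{W}_{0}-\hat{X}_{1}\hat{\gamma}$ so that the correction shrinks it, following the convention of Ben-Michael, Feller and Rothstein (2021). The base weights are taken as uniform purely for illustration, and $\rho=0$ so $\Omega$ is the identity.

```python
import numpy as np

rng = np.random.default_rng(3)

q, n, lam = 3, 40, 0.5
X1 = rng.normal(size=(q, n))    # columns: adjusted treated-unit covariates
target = X1.mean(axis=1) + 0.2  # stand-in for the control means W_bar_0
Omega_inv = np.eye(n)           # rho = 0 here, i.e. the BC-SBW case

gamma = np.full(n, 1.0 / n)     # base balancing weights (uniform here)

# Ridge-regression bias correction of the residual imbalance.
resid = target - X1 @ gamma
gamma_bc = gamma + Omega_inv @ X1.T @ np.linalg.solve(
    X1 @ Omega_inv @ X1.T + lam * np.eye(q), resid)

# Balance improves, but the corrected weights can leave the simplex,
# i.e. some weights may turn negative (extrapolation).
assert np.linalg.norm(X1 @ gamma_bc - target) < np.linalg.norm(resid)
assert gamma_bc.min() < gamma.min()
```

As $\lambda\to 0$ the correction forces exact balance; larger $\lambda$ regularizes the correction, trading residual imbalance against the amount of extrapolation.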

3.6 Model validation

To check model validity, we rerun our procedures on pre-treatment data to compare the performance of our models for a fixed $\delta$. In particular, we train our model on 2009-2011 data to predict 2012 outcomes, and on 2010-2012 data to predict 2013 outcomes. We limit attention to one-year prediction error since our estimand is only one year forward. We examine the performance of SBW against H-SBW and their bias-corrected versions, BC-SBW and BC-HSBW, using the covariate adjustment methods described in Section 3.4.2 to account for measurement error. In Appendix E, we additionally compare to "Oaxaca-Blinder" OLS and GLS weights (see, e.g., Kline (2011)). The OLS weights do not require the Gaussian assumption (7) for consistency but rely more heavily on the linear model (4).

3.7 Inference

We use the leave-one-state-out jackknife to estimate the variance of $\hat{\psi}_{0}^{1}$ (see, e.g., Cameron and Miller (2015)). (Footnote 15: The jackknife approximates the bootstrap, which is sometimes used to estimate the variance of OLS-based regression-calibration estimates in the standard setting with i.i.d. data (Carroll et al. (2006)).) Specifically, we take the pool of treated states and generate a list of datasets, each excluding one state. For each dataset in this list we calculate the weights and the leave-one-state-out estimate of $\psi_{0}^{1}$. Throughout all iterations we hold the targeted mean fixed at $\bar{W}_{0}$. (Footnote 16: That is, we treat $\bar{W}_{0}$ as identical to $\bar{X}_{0}$ and ignore the variability of the estimate. This variability is of smaller order than the variability in $\hat{\psi}_{0}^{1}$: the former does not depend on the number of states but instead decreases with the number of CPUMAs among the control states and the sample sizes used to estimate each CPUMA-level covariate.) Recall that $\bar{X}_{0}$ is fixed because our estimand conditions on the observed covariate values of the non-expansion states. When generating these estimates, if our preferred initial choice of $\delta$ does not yield a solution, we gradually relax the constraints (increase $\delta$) until we obtain one. For each dataset we also re-estimate $\hat{X}_{A=1}$ before estimating the weights, to account for the variability of the covariate adjustment procedure. We then estimate the variance:

\widehat{\mathrm{Var}}(\hat{\psi}_{0}^{1})=\frac{m_{1}-1}{m_{1}}\sum_{s:A_{s}=1}\left(S_{(s)}-S_{(\cdot)}\right)^{2} \qquad (17)

where $S_{(s)}$ is the estimate of $\psi_{0}^{1}$ with treated state $s$ removed, and $S_{(\cdot)}=\frac{1}{m_{1}}\sum_{s:A_{s}=1}S_{(s)}$. In other settings the jackknife has been shown to conservatively approximate the bootstrap, as in Efron and Stein (1981), a result we apply in Appendix A.1, Proposition 5 to give a partial result for our application. In a simulation study mirroring our setting with $m_{1}=25$ units (available in Appendix F), we obtain close to nominal coverage rates using these variance estimates. (Footnote 17: When more substantial undercoverage occurs it is likely due to bias.)
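A schematic numpy implementation of (17) on synthetic data follows. Unlike the actual procedure, the weights are held fixed rather than re-solved for each leave-one-state-out dataset, and all values are illustrative; the t critical value for 24 degrees of freedom is approximately 2.064.

```python
import numpy as np

rng = np.random.default_rng(4)

m1 = 25                                       # treated states
states = np.repeat(np.arange(m1), 4)          # four CPUMAs per state
Y = rng.normal(12.0, 2.0, size=states.size)   # region-level outcomes
w = rng.uniform(0.5, 1.5, size=states.size)   # balancing weights (fixed here)

def estimate(keep):
    """Weighted mean over retained regions, weights renormalized."""
    return w[keep] @ Y[keep] / w[keep].sum()

# Leave-one-state-out estimates S_(s) and the jackknife variance (17).
S = np.array([estimate(states != s) for s in range(m1)])
var_hat = (m1 - 1) / m1 * np.sum((S - S.mean()) ** 2)

# 95% CI via the t distribution with m1 - 1 = 24 degrees of freedom.
psi_hat = estimate(np.ones(states.size, dtype=bool))
ci = (psi_hat - 2.064 * np.sqrt(var_hat), psi_hat + 2.064 * np.sqrt(var_hat))

assert var_hat > 0 and ci[0] < psi_hat < ci[1]
```

Deleting an entire state at a time, rather than a single region, is what makes the variance estimate robust to the within-state dependence induced by $\varepsilon_{s}$.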

To estimate the variance of $\hat{\psi}_{0}^{0}$ we fit an auxiliary regression model on the non-expansion states and estimate the variance of the linear combination $\bar{W}_{0}^{\top}\hat{\beta}_{0}$ using cluster-robust standard errors. We do not need to adjust the non-expansion state data to estimate this quantity: a linear regression line always contains the point $(\bar{W}_{0},\bar{J}_{0})$, which are unbiased estimates of $(\bar{X}_{0},\psi_{0}^{0})$. Therefore, $\mathbb{E}\{\bar{W}_{0}^{\top}\hat{\beta}_{0}\mid X\}=\psi_{0}^{0}$. Our final variance estimate $\widehat{\mathrm{Var}}(\hat{\psi})$ is the sum of $\widehat{\mathrm{Var}}(\hat{\psi}_{0}^{1})$ and $\widehat{\mathrm{Var}}(\hat{\psi}_{0}^{0})$. (Footnote 18: The latter is much smaller than the former; specifically, we estimate $\widehat{\mathrm{Var}}(\hat{\psi}_{0}^{0})=0.017$.) We use the t-distribution with $m_{1}-1$ degrees of freedom to generate 95 percent confidence intervals.

4 Results

We first present summary statistics on the variability of the pre-treatment outcomes in our adjusted and unadjusted datasets. The second subsection contains covariate balance diagnostics, the third a validation study, and the final subsection our ETC estimates.

4.1 Covariate adjustment

Table 1 displays the sample variance of our pre-treatment outcomes among the expansion states under each covariate adjustment strategy (see Section 3.4.2 for definitions of these adjustments). Although we most heavily prioritize balancing these covariates, they are also among the least precisely estimated, as most of our other covariates average over multiple years of data. Both the homogeneous and heterogeneous adjustments reduce the variability in the data by comparable amounts. Intuitively, these adjustments reduce the likelihood that our balancing weights will fit to noise in the covariate measurements. These results are consistent across most of our other covariates. Tables containing distributional information for each covariate are available in Appendix C.

Table 1: Pre-treatment outcome sample variance by adjustment strategy (primary dataset)
Variable Unadjusted Heterogeneous Homogeneous
Uninsured Pct 2011 8.35 8.04 8.05
Uninsured Pct 2012 8.20 7.89 7.90
Uninsured Pct 2013 8.09 7.78 7.79

4.2 Covariate balance

Figure 1 displays the weighted and unweighted imbalances in our adjusted covariate set (using the homogeneous adjustment) using the H-SBW weights. Our unweighted data shows substantial imbalances in the Republican governance indicators as well as pre-treatment uninsurance rates. H-SBW reduces these differences; however, some remain, particularly among the Republican governance indicators. All other imbalances are relatively small, on both the absolute difference and the standardized mean difference scale. In fact, despite not targeting the SMD, all but five of the remaining covariate imbalances fall within 0.1 SMD, compared with 23 covariates exceeding that threshold prior to reweighting. The largest remaining imbalance on the SMD scale is for percent female (-0.3), though, as noted previously, this variable has low variance in our dataset. A complete balance table is in Appendix D.

Figure 1: Balance plot, homogeneous adjustment (primary dataset)
Figure 2: Weight comparison, weights summed by state (primary dataset)

To further reduce these imbalances we augment these weights using ridge-regression. Figure 2 shows the total weights summed across states (and standardized to sum to 100) for three estimators: H-SBW, BC-HSBW, and SBW. For BC-HSBW we also display the sum of the negative weights separately from the sum of the positive weights to highlight the extrapolation. This figure illustrates two key points: first, that H-SBW more evenly disperses the weights across states relative to SBW; second, that BC-HSBW extrapolates somewhat heavily in order to achieve balance, particularly from CPUMAs in California. This is likely in part because California has the most CPUMAs of any state in the dataset.

4.3 Model validation

We compare the performance of our models by repeating the covariate adjustments and applying our procedure to 2009-2011 ACS data to predict 2012 outcomes, and similarly to 2010-2012 data to predict 2013 outcomes for the non-expansion states. We run this procedure using the primary dataset and excluding the early expansion states. Table 2 displays the mean error and RMSE of these results, with rows ordered by RMSE for each dataset. The estimators trained on the covariate-adjusted data perform substantially better than those trained on the unadjusted data, and the estimators trained on the homogeneous adjustment outperform their counterparts on the heterogeneous adjustment. We therefore prioritize the results using the homogeneous adjustment. We also observe that H-SBW has comparable performance to SBW throughout.

Interestingly, the bias-corrected estimators perform relatively poorly on the primary dataset but are the best-performing estimators when excluding the early expansion states, with RMSEs lower than any estimator achieves on the primary dataset. As the early expansion states include California, and Figure 2 shows that the bias-corrected estimators extrapolate heavily from this state, the differences in these results suggest that preventing this extrapolation may improve the model's performance. While these results do not imply that the bias-corrected models will perform poorly when predicting $\psi_{0}^{1}$ in the post-treatment period on the primary dataset (or that they will perform especially well when excluding the early expansion states), they do highlight the dangers of extrapolation: linearity may approximately hold on the support of the data where we have sufficient covariate overlap, but beyond this region it may be a more costly assumption. (Footnote 19: In Table 13 in Appendix E, we also compare performance against the implied regression weights from OLS and GLS. These weights exactly balance the observed covariates but are almost always the worst-performing estimators in the validation study on either dataset. This also illustrates the benefits of the regularization inherent in the bias-corrected weights.)

Table 2: Estimator pre-treatment outcome mean prediction error and RMSE (in % pts)
Primary dataset Early expansion excluded
Sigma estimate Estimator Mean Error RMSE Sigma estimate Estimator Mean Error RMSE
Homogeneous SBW -0.20 0.20 Homogeneous BC-HSBW -0.02 0.07
Homogeneous H-SBW -0.23 0.23 Homogeneous BC-SBW -0.03 0.12
Heterogeneous SBW -0.27 0.27 Heterogeneous BC-HSBW -0.08 0.14
Heterogeneous H-SBW -0.35 0.36 Heterogeneous BC-SBW -0.07 0.15
Homogeneous BC-SBW -0.39 0.39 Homogeneous H-SBW 0.01 0.25
Heterogeneous BC-SBW -0.42 0.42 Homogeneous SBW 0.07 0.26
Unadjusted SBW -0.56 0.56 Heterogeneous SBW 0.04 0.28
Unadjusted H-SBW -0.57 0.57 Heterogeneous H-SBW -0.04 0.29
Homogeneous BC-HSBW -0.58 0.58 Unadjusted SBW -0.37 0.42
Heterogeneous BC-HSBW -0.63 0.63 Unadjusted H-SBW -0.43 0.46
Unadjusted BC-SBW -0.88 0.88 Unadjusted BC-HSBW -0.60 0.60
Unadjusted BC-HSBW -0.96 0.96 Unadjusted BC-SBW -0.70 0.71

Finally, we observe that the mean errors for the estimators fit on the unadjusted dataset are all negative, reflecting that these estimates under-predict the true uninsurance rate (see also Table 13 in Appendix E, which shows similar results for each year individually). These results likely reflect a form of regression to the mean caused by overfitting our weights to noisy covariate measurements. More formally, we can think of the uninsurance rates in time period $t$ in expansion and non-expansion regions as being drawn from separate distributions with means $(\upsilon_{1},\upsilon_{0})$, respectively, where $\upsilon_{1}<\upsilon_{0}$. For simplicity, assume that the only covariate in the outcome model at time $t$ is $Y_{sct-1}$. Under (4), we obtain $Y_{sct}=\alpha_{1}+\beta_{1}Y_{sct-1}+\epsilon_{sct}+\varepsilon_{st}$. The pre-treatment outcomes are likely positively correlated with the post-treatment outcomes, implying that $\beta_{1}>0$. Because $\upsilon_{1}<\upsilon_{0}$, when reweighting the vector of noisy pre-treatment outcomes $J_{A=1}$ to $\upsilon_{0}$, the expected value of the weighted measurement error $\mathbb{E}[\sum_{A_{sc}=1}\hat{\gamma}_{sc}\nu_{sct-1}]$ should be positive. In other words, our weights are likely to favor units whose covariate measurements exceed their true covariate values. The expected value of the weighted pre-treatment outcome will then be less than the target $\upsilon_{0}$, implying that our estimates will be negatively biased, since $\mathbb{E}[\beta_{1}(\sum_{A_{sc}=1}\hat{\gamma}_{sc}Y_{sct-1}-\upsilon_{0})]\leq 0$. (Footnote 20: This phenomenon has also been discussed in the difference-in-differences and synthetic controls literature (see, e.g., Daw and Hatfield (2018)). While our covariate adjustments are meant to eliminate this bias, in practice they appear more likely only to reduce it.)
Assuming these errors reflect a (slight) negative bias that will also hold for our estimates of $\psi_{0}^{1}$, these results suggest that the true treatment effect may be closer to zero than our estimates.

4.4 Primary Results

Table 3(a) presents all of our estimates of the ETC. Using H-SBW we estimate an effect of -2.33 (-3.54, -1.11) percentage points on our primary dataset. The SBW results are nearly identical at -2.35 (-3.76, -0.95) percentage points. On the unadjusted data we find very similar point estimates: -2.34 (-2.88, -1.79) percentage points for H-SBW and -2.39 (-2.99, -1.79) percentage points for SBW. H-SBW reduces the confidence interval length relative to SBW on our primary dataset, though the lengths are nearly identical when excluding the early expansion states, suggesting that H-SBW offered at best modest variance improvements in this setting. Using the adjusted covariate set also increases the length of the confidence intervals relative to the unadjusted data. This increase is expected in part because the adjustment procedure reduces the variability in the data, as we saw in Table 1, requiring the variance of the weights to increase to achieve approximate balance. More generally, the increase also reflects the additional uncertainty due to the measurement error.

Adding the bias correction decreases the absolute magnitude of the estimates: we estimate effects of -2.05 (-3.30, -0.80) percentage points for BC-HSBW and -2.07 (-3.14, -1.00) percentage points for BC-SBW. This contrasts with our validation study, where the bias-corrected estimators tended to predict lower uninsurance rates than the other estimators (implying we might have seen larger absolute effect estimates).

Table 3: ETC estimates by weight type and adjustment strategy

                          Primary dataset                Early expansion excluded
Weights   Adjustment      Estimate (95% CI)      Diff.   Estimate (95% CI)      Diff.
H-SBW     Homogeneous     -2.33 (-3.54, -1.11)   0.01    -2.09 (-3.24, -0.94)   0.19
H-SBW     Unadjusted      -2.34 (-2.88, -1.79)   -       -2.28 (-2.87, -1.70)   -
BC-HSBW   Homogeneous     -2.05 (-3.30, -0.80)   0.17    -1.94 (-3.27, -0.61)   0.28
BC-HSBW   Unadjusted      -2.22 (-2.91, -1.52)   -       -2.22 (-3.14, -1.31)   -
SBW       Homogeneous     -2.35 (-3.76, -0.95)   0.04    -2.05 (-3.19, -0.91)   0.16
SBW       Unadjusted      -2.39 (-2.99, -1.79)   -       -2.21 (-2.75, -1.68)   -
BC-SBW    Homogeneous     -2.07 (-3.14, -1.00)   0.13    -1.99 (-3.33, -0.66)   0.23
BC-SBW    Unadjusted      -2.19 (-2.94, -1.45)   -       -2.23 (-3.12, -1.33)   -

(a) The "Difference" (Diff.) column reflects the difference between the adjusted and unadjusted estimators.

All adjusted estimators were closer to zero than the corresponding unadjusted estimators. This includes estimates using the heterogeneous adjustment (see Appendix E). However, the differences between the adjusted and unadjusted SBW and H-SBW estimates are close to zero. This contrasts with our validation study, where the unadjusted SBW and H-SBW estimators ranged from about 0.3 to 0.4 percentage points below the adjusted estimators. (When excluding the early expansion states, the differences between the estimates on the adjusted and unadjusted data persist but are smaller in magnitude.) We interpret this difference as due to chance: while theory and our validation study suggest that our unadjusted estimators are biased, bias only exists in expectation. Our primary dataset is a random draw on which the unadjusted and adjusted estimators happen to give similar estimates.

We next consider the sensitivity of our analysis to the no anticipatory treatment effects assumption by excluding the early expansion states (California, Connecticut, Minnesota, New Jersey, and Washington) and re-running our analyses. The columns under "Early expansion excluded" in Table 3 reflect these results. The overall patterns are consistent with our primary estimates; however, almost all of our point estimates move somewhat closer to zero. This may indicate either that the primary estimates have a slight negative bias or that the estimates excluding these states have a slight positive bias. Given our analysis of the validation study, we view the first case as more likely, which would imply that our primary estimates reflect a lower bound on the true treatment effect. Regardless, the differences are small relative to our uncertainty estimates. Overall, we view our primary results as relatively robust to the exclusion of these states. Additional results are available in Appendix E.

5 Discussion

We divide our discussion into two sections: methodological considerations and policy considerations.

5.1 Methodological considerations and limitations

We make multiple contributions to the literature on balancing weights. First, our estimation procedure accounts for mean-zero random noise in our covariates that is uncorrelated with the outcome errors. We modify the SBW constraint set to balance on a linear approximation to the true covariate values, applying the idea of regression calibration from the measurement error literature to the context of balancing weights. Our results illustrate the benefits of this procedure: using observed pre-treatment outcomes generated by an unknown data generating mechanism, Table 2 demonstrates that our proposed estimators have better predictive performance when balancing on the adjusted covariates. This finding is consistent with concerns about overfitting to noisy covariate measurements and subsequent regression-to-the-mean.
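The adjustment itself is simple to implement. Below is a minimal Python sketch of the regression-calibration step described above, assuming the classical measurement error model and an external estimate of the measurement error covariance \Sigma_{\nu}; the function name and interface are ours for illustration, not the paper's replication code:

```python
import numpy as np

def regression_calibration(W, Sigma_nu):
    """Linear approximation to the true covariates under the classical
    measurement error model W = X + nu, with nu independent of X.

    W        : (n, p) array of noisy covariate measurements
    Sigma_nu : (p, p) measurement error covariance (from auxiliary data)
    Returns an (n, p) array of adjusted covariates X_hat."""
    W = np.asarray(W, dtype=float)
    v1 = W.mean(axis=0)                # estimated covariate mean
    S_W = np.cov(W, rowvar=False)      # total covariance: signal + noise
    S_X = S_W - Sigma_nu               # implied covariance of the true X
    kappa = np.linalg.solve(S_W, S_X)  # kappa = (S_X + Sigma_nu)^{-1} S_X
    # shrink each observation toward the mean: X_hat = v1 + kappa'(W - v1)
    return v1 + (W - v1) @ kappa
```

Balancing weights are then computed on the adjusted covariates rather than on W. With Sigma_nu = 0 the adjustment leaves W unchanged; larger measurement error covariances shrink observations more strongly toward the mean.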

This approach has several limitations. First, it requires access to auxiliary data with which to estimate the measurement error covariance matrix \Sigma_{\nu}; many applications may not have such information. Even without such data, \Sigma_{\nu} could be treated as a sensitivity parameter to evaluate the robustness of results to measurement error (see, e.g., Huque, Bondell and Ryan (2014), Illenberger, Small and Shaw (2020)). Second, from a theoretical perspective, we require strong distributional assumptions on the covariates to consistently estimate \psi_{0}^{1} using convex balancing weights. This contrasts with Gleser (1992), who shows that the OLS estimates are consistent under only very weak distributional assumptions on the data (see also Propositions 6 and 7 in Appendix A.1). This relates to a third limitation: we require strong outcome modeling assumptions. Yet by preventing extrapolation, SBW and H-SBW estimates may be less sensitive than OLS estimates to these assumptions. Our validation results support this: the standard regression-calibration adjustment using OLS and GLS weights performs the worst of any method we consider (see Table 13 in Appendix E). In contrast, regression-calibration with balancing weights, even when allowing limited extrapolation via ridge-augmentation, performs better. (We also suggest how to adapt our procedure to accommodate a basis expansion of gaussian covariates in Appendix A.1, Remark 5.) Developing methods to relax these assumptions further would be a valuable area for future work.

A final concern is that this procedure may be sub-optimal with respect to the mean-square error of our estimator. In particular, the bias induced by the measurement error decreases with the sample size used to calculate each CPUMA's covariate values, the minimum of which was over three hundred. Yet the variance of our counterfactual estimate decreases with the number of treated states. From a theoretical perspective, the variance is of a larger order than the bias, so perhaps the bias from measurement error should not be a primary concern. Our final results support this: the changes in our results on the adjusted versus unadjusted data are of smaller magnitude than the associated uncertainty estimates. Other studies have proposed tests of whether measurement error corrections are "worth it," though we do not pursue this here (see, e.g., Gleser (1992)). However, as an informal observation, our simulation study in Appendix F shows that the MSE of the SBW estimator that naively balances on the noisy covariates W is comparable to the MSE of the SBW estimator that balances on the adjusted covariates \hat{X} when the ratio of the variance of X to the variance of W is 0.95. Even in this setting, however, we find that confidence interval coverage can fall below nominal rates when balancing on W, and that the measurement error correction can improve our ability to construct valid confidence intervals.

Our second contribution is to introduce the H-SBW objective. This objective can improve upon the SBW objective assuming that the errors in the outcome model are homoskedastic with constant positive equicorrelation \rho within known groups of units. Assuming no measurement error and that \rho is known, we show that H-SBW returns the minimum variance estimator within the constraint set by more evenly dispersing weights across the groups. We also demonstrate the connection between these weights and the implied weights from GLS (see Propositions 9 and 10 in Appendix A.2). While studies have considered balancing weights in settings with hierarchical data (see, e.g., Keele et al. (2020), Ben-Michael, Feller and Hartman (2021)), we are, to our knowledge, the first to propose changing the criterion to account for correlated outcomes.

This estimation procedure has at least three potential drawbacks. First, we make a very specific assumption on the covariance structure of the error terms that is useful for our application. For applications where a different structure \Omega is more appropriate, one can still follow our approach and minimize the more general criterion f(\gamma) = \gamma^{\top}\Omega\gamma. Second, we require specifying the parameters (\rho, \delta) in advance. Choosing \delta is a challenging problem shared with SBW and we do not offer any new suggestions (though see Wang and Zubizarreta (2020), who offer an interesting approach to this problem). Choosing \rho is a new problem for this estimation procedure. Encouragingly, our simulation study shows that the H-SBW estimator for any chosen \rho almost always has lower variance than SBW in the presence of state-level random effects (see Appendix F), though we caution that this finding reflects only the simulation settings we examined. Even so, identifying a principled approach to choosing this parameter would be a useful future contribution. Third, in the presence of both measurement error and dependent data, using H-SBW in combination with the standard regression-calibration adjustment may be biased, because the standard adjustment assumes independent data. Our simulations show that this bias can increase with \rho and that SBW remains approximately unbiased if the covariates are gaussian, though even in this setting H-SBW may still yield modest MSE improvements. We also show a theoretical modification to the adjustment under which H-SBW would return unbiased estimates in this setting (see Propositions 8 and 11).
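To make the modified criterion concrete, the following Python sketch (our own illustration, not the paper's implementation) solves the equality-constrained core of the problem: minimize \gamma^{\top}\Omega\gamma subject to the weights summing to one and exactly balancing the covariate means, with \Omega encoding unit variances and equicorrelation \rho within states. The full H-SBW problem additionally imposes non-negativity and approximate balance (tolerance \delta), which require a quadratic programming solver:

```python
import numpy as np

def hsbw_equality_sketch(X, states, target, rho):
    """Minimize gamma' Omega gamma s.t. sum(gamma) = 1 and X' gamma = target.

    X      : (n, p) covariates of the treated units
    states : length-n array of state labels (the correlation groups)
    target : length-p vector of covariate means to match (control means)
    rho    : assumed within-state equicorrelation of the outcome errors"""
    X = np.asarray(X, dtype=float)
    states = np.asarray(states)
    n = X.shape[0]
    # Omega: 1 on the diagonal, rho for pairs of units in the same state
    same_state = states[:, None] == states[None, :]
    Omega = np.where(same_state, rho, 0.0) + (1.0 - rho) * np.eye(n)
    # KKT solution of the equality-constrained quadratic program:
    # gamma = Omega^{-1} A' (A Omega^{-1} A')^{-1} b
    A = np.vstack([np.ones(n), X.T])
    b = np.concatenate([[1.0], target])
    Oinv_At = np.linalg.solve(Omega, A.T)
    return Oinv_At @ np.linalg.solve(A @ Oinv_At, b)
```

Setting rho = 0 recovers the corresponding SBW solution (minimum sum of squared weights subject to the same constraints); larger rho penalizes concentrating weight within any single state.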

5.2 Policy considerations and limitations

We estimate that had the states that did not expand Medicaid in 2014 instead expanded their programs, they would have seen a -2.33 (-3.54, -1.11) percentage point change in the average adult uninsurance rate. Our validation study and robustness checks indicate that this estimate may be biased downwards (away from zero), in which case we can interpret it as a lower bound on the ETC. Existing estimates place the ETT between -3 and -6 percentage points; these estimates vary depending on the targeted sub-population of interest, the data used, the level of modeling (individuals or regions), and the modeling approach (see, e.g., Courtemanche et al. (2017), Kaestner et al. (2017), Frean, Gruber and Sommers (2017)). Our estimate of the ETC is closer to zero than these estimates. When we attempt to estimate the ETT using our proposed method, the resulting estimates have high uncertainty due to limited covariate overlap. (In particular, there were no states entirely controlled by Democrats that did not expand Medicaid. Even allowing for large imbalances, our standard error estimates were approximately three percentage points and our confidence intervals all contained zero.) When estimating a simple difference-in-differences model on the unadjusted dataset, we estimate that the ETT is -2.05 (-3.23, -0.87), where the standard errors account for clustering at the state level. The differences may reflect different modeling strategies and data, or may suggest that the ETC is smaller in absolute magnitude than the ETT.

We are ultimately unable to draw any statistical conclusions about these differences; nevertheless, we continue to emphasize caution about assuming these estimands are equal. Due in part to the different covariate distributions of expansion versus non-expansion states, we may still suspect that the ETC differs from the ETT with respect to the uninsurance rate. Moreover, because almost every outcome of interest is mediated through increasing the number of insured individuals, such a difference may be important. For example, Miller, Johnson and Wherry (2021) study the effect of Medicaid expansion on mortality. Using their estimate of the ETT with respect to mortality, they project that had all states expanded Medicaid, 15,600 deaths would have been avoided from 2014 through 2017. If we believe that this number increases monotonically with the number of uninsured individuals, their projection may be an overestimate if the ETC with respect to the uninsurance rate is smaller in absolute magnitude than the ETT, or an underestimate if it is larger. Obtaining more precise inferences on the ETC with respect to uninsurance rates, if possible, would therefore be valuable future work.

Our analytic approach is not without limitations. Specifically, we require strong assumptions, including SUTVA, no anticipatory treatment effects, no unmeasured confounding conditional on the true covariates, and several parametric assumptions regarding the outcome and measurement error models. We address some concerns about possible violations of these assumptions: for example, our results were qualitatively similar whether we excluded possible "early expansion states" or used different weighting strategies (including relaxing the positivity restrictions and changing the tuning parameter \rho). However, we do not attempt to address concerns about the impact of spillovers across regions. And while we believe that the no unmeasured confounding assumption is reasonable for this problem, we did not conduct a sensitivity analysis with respect to it (see, e.g., Bonvini and Kennedy (2021)).

Medicaid expansion remains an ongoing policy debate in the United States, where as of 2022 twelve states have not expanded their eligibility requirements. Our study estimates the effect of Medicaid expansion on adult uninsurance rates; however, the primary reason this effect is interesting is that Medicaid enrollment is not automatic for eligible individuals. If the goal of Medicaid expansion is to increase insurance access for low-income adults, state policy-makers may also wish to make it easier, or even automatic, to enroll in Medicaid.

6 Conclusion

We predict the average change in the non-elderly adult uninsurance rate in 2014 among states that did not expand their Medicaid eligibility thresholds as if they had. We use survey data aggregated to the CPUMA level to estimate this quantity. The resulting dataset has both measurement error in the covariates, which may bias standard estimation procedures, and a hierarchical structure, which may increase the variance of these same approaches. We therefore propose a balancing weights estimation procedure that accounts for both problems. We demonstrate that our bias-reduction approach improves on existing methods when predicting observed outcomes from an unknown data generating mechanism. Applying this method to our problem, we estimate that states that did not expand Medicaid in 2014 would have seen a -2.33 (-3.54, -1.11) percentage point change in their adult uninsurance rates had they done so. This is the first study we are aware of that directly estimates the treatment effect on the controls with respect to Medicaid expansion. From a methodological perspective, we demonstrate the value of our proposed method relative to existing approaches. From a policy-analysis perspective, we emphasize the importance of directly estimating the relevant causal quantity of interest. More generally, if the goal of Medicaid expansion is to improve access to insurance, state and federal policy-makers should consider policies that make Medicaid enrollment easier, if not automatic.

Acknowledgements

We gratefully acknowledge invaluable advice and comments from Zachary Branson, Riccardo Fogliato, Edward Kennedy, Brian Kovak, Akshaya Jha, Lowell Taylor, and Jose Zubizarreta.


Analyses were conducted using R version 4.0.2 (R Core Team (2020)) and the optweight (Greifer (2021)) and tidyverse (Wickham et al. (2019)) packages. Programs and supporting materials are available at github.com/mrubinst757/medicaid-expansion. Proofs and additional results are available in the Appendix.

References

  • Abadie, A., Diamond, A. and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association 105, 493–505.
  • Abadie, A., Diamond, A. and Hainmueller, J. (2015). Comparative politics and the synthetic control method. American Journal of Political Science 59, 495–510.
  • Ben-Michael, E., Feller, A. and Hartman, E. (2021). Multilevel calibration weighting for survey data. arXiv preprint arXiv:2102.09052.
  • Ben-Michael, E., Feller, A. and Rothstein, J. (2021). The augmented synthetic control method. Journal of the American Statistical Association, 1–34.
  • Bonvini, M. and Kennedy, E. H. (2021). Sensitivity analysis via the proportion of unmeasured confounding. Journal of the American Statistical Association, 1–11.
  • Botosaru, I. and Ferman, B. (2019). On the role of covariates in the synthetic control method. The Econometrics Journal 22, 117–130.
  • Buonaccorsi, J. P. (2010). Measurement Error: Models, Methods, and Applications. Chapman and Hall/CRC.
  • Cameron, A. C., Gelbach, J. B. and Miller, D. L. (2008). Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics 90, 414–427.
  • Cameron, A. C. and Miller, D. L. (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50, 317–372.
  • Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. CRC Press.
  • Chattopadhyay, A. and Zubizarreta, J. R. (2021). On the implied weights of linear regression for causal inference. arXiv preprint arXiv:2104.06581.
  • Courtemanche, C., Marton, J., Ukert, B., Yelowitz, A. and Zapata, D. (2017). Early impacts of the Affordable Care Act on health insurance coverage in Medicaid expansion and non-expansion states. Journal of Policy Analysis and Management 36, 178–210.
  • Daw, J. R. and Hatfield, L. A. (2018). Matching and regression to the mean in difference-in-differences analysis. Health Services Research 53, 4138–4156.
  • Deville, J.-C. and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association 87, 376–382.
  • Deville, J.-C., Särndal, C.-E. and Sautory, O. (1993). Generalized raking procedures in survey sampling. Journal of the American Statistical Association 88, 1013–1020.
  • D'Amour, A., Ding, P., Feller, A., Lei, L. and Sekhon, J. (2021). Overlap in observational studies with high-dimensional covariates. Journal of Econometrics 221, 644–654.
  • Efron, B. and Stein, C. (1981). The jackknife estimate of variance. The Annals of Statistics, 586–596.
  • Frean, M., Gruber, J. and Sommers, B. D. (2017). Premium subsidies, the mandate, and Medicaid expansion: Coverage effects of the Affordable Care Act. Journal of Health Economics 53, 72–86.
  • Gleser, L. J. (1992). The importance of assessing measurement reliability in multivariate regression. Journal of the American Statistical Association 87, 696–707.
  • Greifer, N. (2021). optweight: Targeted Stable Balancing Weights Using Optimization. R package version 0.2.5.9000.
  • Haberman, S. J. (1984). Adjustment by minimum discriminant information. The Annals of Statistics, 971–988.
  • Hainmueller, J. (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis 20, 25–46.
  • Huque, M. H., Bondell, H. D. and Ryan, L. (2014). On the impact of covariate measurement error on spatial regression modelling. Environmetrics 25, 560–570.
  • Illenberger, N. A., Small, D. S. and Shaw, P. A. (2020). Impact of regression to the mean on the synthetic control method: Bias and sensitivity analysis. Epidemiology 31, 815–822.
  • Imai, K. and Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B, 243–263.
  • Imbens, G. W. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Economics and Statistics 86, 4–29.
  • Kaestner, R., Garrett, B., Chen, J., Gangopadhyaya, A. and Fleming, C. (2017). Effects of ACA Medicaid expansions on health insurance coverage and labor supply. Journal of Policy Analysis and Management 36, 608–642.
  • Keele, L., Ben-Michael, E., Feller, A., Kelz, R. and Miratrix, L. (2020). Hospital quality risk standardization via approximate balancing weights. arXiv preprint arXiv:2007.09056.
  • Kline, P. (2011). Oaxaca-Blinder as a reweighting estimator. American Economic Review 101, 532–537.
  • Ladhania, R., Haviland, A. M., Venkat, A., Telang, R. and Pines, J. M. (2021). The effect of Medicaid expansion on the nature of new enrollees' emergency department use. Medical Care Research and Review 78, 24–35.
  • Miller, S., Johnson, N. and Wherry, L. R. (2021). Medicaid and mortality: New evidence from linked survey and administrative data. The Quarterly Journal of Economics 136, 1783–1829.
  • Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association 100, 322–331.
  • Särndal, C.-E. and Lundström, S. (2005). Estimation in Surveys with Nonresponse. John Wiley & Sons.
  • Sommers, B., Kronick, R., Finegold, K., Po, R., Schwartz, K. and Glied, S. (2012). Understanding participation rates in Medicaid: Implications for the Affordable Care Act. Public Health 93, 67–74.
  • R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Wang, Y. and Zubizarreta, J. R. (2020). Minimal dispersion approximately balancing weights: Asymptotic properties and practical considerations. Biometrika 107, 93–105.
  • Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K. and Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software 4, 1686. doi:10.21105/joss.01686.
  • Zhang, Z., Kim, H. J., Lonjon, G., Zhu, Y. et al. (2019). Balance diagnostics after propensity score matching. Annals of Translational Medicine 7.
  • Zubizarreta, J. R. (2015). Stable weights that balance covariates for estimation with incomplete outcome data. Journal of the American Statistical Association 110, 910–922.

Appendix A Proofs

We divide our proofs into three sections: the first two consist of propositions and the third contains the proofs of those propositions. In the first section our propositions pertain to the performance of SBW under the classical measurement error model. Our key results are that the bias of the SBW estimator is equivalent to the bias of the OLS estimator and that regression-calibration techniques can be used in this setting to obtain consistent estimators. However, these results assume that the data are gaussian. We also show that if the data are not gaussian, the OLS estimator using regression-calibration remains consistent, while the SBW estimator may be biased; this can be corrected if other distributional assumptions are made in place of gaussianity. In our second section we consider the properties of the H-SBW objective when the true covariates X are observed. We show that if our assumed covariance structure for the outcome errors is correct, H-SBW produces the minimum conditional-on-X variance estimator within the constraint set. We also show how a generalized form of H-SBW weights relates to the implied regression weights from Generalized Least Squares (GLS). We conclude by showing that H-SBW may yield biased estimates if we do not correctly model the dependence structure of the data. Section A.3 contains all of the proofs.

A.1 SBW and classical measurement error

We begin by showing several results regarding the bias of the OLS and SBW estimators under the classical errors-in-variables model. First, we show that without adjustment for errors-in-covariates, the bias of the SBW estimator that sets \delta = 0 (i.e., reweights the treated units to exactly balance the control units) is equal to the bias of the OLS estimator. Second, we show that if the observed covariate values for the treated data can be replaced by their conditional expectations \tilde{X} given the noisy observations, then the SBW estimator will be unbiased and consistent. Third, we consider the case where \tilde{X} must be estimated, and show that the SBW estimator is consistent if we replace \tilde{X} by a consistent estimate \hat{X}. Finally, we remove the assumption that X is gaussian, and show that while the OLS estimator remains unbiased under weaker assumptions, the SBW estimator does not, and we give a general expression for the asymptotic bias. We take the perspective throughout that X is random among the treated units but fixed for the control units.

We assume that equations (3) - (7) hold. For simplicity, we additionally assume that

\epsilon_{sc} = 0, \quad \varepsilon_{s} = 0, \quad \xi_{sc} = 0, \quad \Sigma_{\nu,sc} = \Sigma_{\nu} \qquad \forall s, c \qquad (18)

noting that \xi_{sc} = 0 implies J_{sc} = Y_{sc}. These assumptions imply that the data from the treated units are i.i.d., though for consistency of notation we continue to index the data by s and c and assume that the state-membership for each CPUMA is known. The covariate observations of the treated units can then be seen to have covariance matrix

\Sigma_{W|1}=\Sigma_{X|1}+\Sigma_{\nu}

and the conditional expectation of $X_{sc}$ given $W_{sc}$ for the treated units can be seen to equal

\tilde{X}_{sc}=v_{1}+\kappa^{\top}(W_{sc}-v_{1}),\qquad\forall sc:A_{sc}=1

where

\kappa=(\Sigma_{X|1}+\Sigma_{\nu})^{-1}\Sigma_{X|1}

To ease notation, we abbreviate $\Sigma_{X}=\Sigma_{X\mid 1}$ and similarly $\Sigma_{W}=\Sigma_{W\mid 1}$.

In Propositions 6, 7, and part of Proposition 1, we will remove the gaussian covariate assumption given by (7). In its place, we will instead consider the weaker assumption that the empirical covariance of $X$ has a limit $S_{X}$,

\frac{1}{n_{1}}\sum_{A_{sc}=1}(X_{sc}-\bar{X}_{1})(X_{sc}-\bar{X}_{1})^{\top}\rightarrow^{p}S_{X} (19)

which implies a similar limit $S_{W}$ for the noisy observations $W$,

\frac{1}{n_{1}}\sum_{A_{sc}=1}(W_{sc}-\bar{W}_{1})(W_{sc}-\bar{W}_{1})^{\top}\rightarrow^{p}S_{W}=S_{X}+\Sigma_{\nu} (20)

where we have used the independence of the noise terms $\nu_{sc}$, and similarly that

\frac{1}{n_{1}}\sum_{A_{sc}=1}(W_{sc}-\bar{W}_{1})(Y_{sc}-\bar{Y}_{1})^{\top}\rightarrow^{p}S_{X}\beta_{1} (21)

where we have additionally used the linear model for $Y_{sc}$ given by (4).

We first consider estimation without adjustment for errors in covariates. Proposition 1 states that the unadjusted OLS and SBW estimators have equal bias, with the bias of the OLS estimator remaining unchanged if the gaussian covariate assumption of (7) is removed.

Proposition 1.

Let (3) - (7) and (18) hold. Let $(\hat{\alpha},\hat{\beta})$ denote the unadjusted OLS estimator of $(\alpha_{1},\beta_{1})$,

(\hat{\alpha},\hat{\beta})=\arg\min_{\alpha,\beta}\sum_{sc:A_{sc}=1}(Y_{sc}-\alpha-W_{sc}^{\top}\beta)^{2} (22)

which induces the OLS estimator of $\psi_{0}^{1}$ given by

\hat{\psi}^{1,\textup{ols}}_{0}=\bar{Y}_{1}+(\bar{W}_{0}-\bar{W}_{1})^{\top}\hat{\beta}

Let $\gamma$ denote the unadjusted SBW weights under exact balance, found by solving (11) with constraint set $\Gamma(W_{A=1},\bar{W}_{0},0)$, which induces the SBW estimator of $\psi_{0}^{1}$ given by

\hat{\psi}^{1,\textup{sbw}}_{0}=\sum_{sc:A_{sc}=1}\gamma_{sc}Y_{sc}

Then the estimators $\hat{\psi}^{1,\textup{ols}}_{0}$ and $\hat{\psi}^{1,\textup{sbw}}_{0}$ have equal bias, satisfying

\mathbb{E}[\hat{\psi}_{0}^{1,\textup{ols}}]=\mathbb{E}[\hat{\psi}^{1,\textup{sbw}}_{0}]=\psi_{0}^{1}+(\bar{X}_{0}-\upsilon_{1})^{\top}(\kappa-I_{q})\beta_{1}

Additionally, the bias of $\hat{\psi}_{0}^{1,\textup{ols}}$ is asymptotically unchanged if the gaussian covariate assumption given by (7) is replaced by (19).
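The attenuation in Proposition 1 can be illustrated with a small simulation. This is a hedged one-dimensional sketch with $\epsilon_{sc}=0$ and illustrative variance choices (not values from the paper): the unadjusted OLS slope of $Y$ on the noisy $W$ converges to $\kappa\beta$ rather than $\beta$.

```python
import numpy as np

# One-dimensional classical measurement error: W = X + nu.
# All numbers are illustrative assumptions.
rng = np.random.default_rng(0)
n, sigma_x2, sigma_nu2, beta = 200_000, 1.0, 0.5, 2.0

x = rng.normal(0.0, np.sqrt(sigma_x2), n)        # true covariate X
w = x + rng.normal(0.0, np.sqrt(sigma_nu2), n)   # noisy observation W
y = beta * x                                     # outcome, epsilon_sc = 0

# Unadjusted OLS of Y on W estimates kappa * beta, not beta
beta_hat = np.cov(w, y)[0, 1] / np.var(w, ddof=1)
kappa = sigma_x2 / (sigma_x2 + sigma_nu2)        # = 2/3 here
assert abs(beta_hat - kappa * beta) < 0.02       # close to 4/3, not 2
```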

To study the SBW estimator with covariate adjustment, we first consider an idealized version where $\Sigma_{X}$ and $\Sigma_{\nu}$ are known, so that $\tilde{X}_{A=1}$ is also known. Proposition 2 shows that the resulting estimate of $\psi_{0}^{1}$ is unbiased if $\delta=0$.

Proposition 2.

Let (3) - (7) and (18) hold. Let $\tilde{X}_{A=1}$ equal the conditional expectation of $X_{A=1}$ given $W$,

\tilde{X}_{sc}=\upsilon_{1}+\kappa^{\top}(W_{sc}-\upsilon_{1}),\qquad\forall sc:A_{sc}=1

let $\gamma^{*}$ be the solution to the SBW objective defined over the constraint set $\Gamma(\tilde{X}_{A=1},\bar{X}_{0},0)$, and let $\hat{\psi}^{1,\textup{ideal}}_{0}$ be the SBW estimator $\sum_{sc:A_{sc}=1}\gamma^{*}_{sc}Y_{sc}$. This estimator is unbiased for $\psi_{0}^{1}$.

Proposition 3 shows that the variance of this idealized SBW estimator goes to zero, implying consistency.

Proposition 3.

Let (3) - (7) and (18) hold, and let $\gamma^{*}$ and $\hat{\psi}_{0}^{1,\textup{ideal}}$ be defined as in Proposition 2. Then the conditional variance of the estimation error is given by

\operatorname{Var}\left(\hat{\psi}_{0}^{1,\textup{ideal}}-\psi_{0}^{1}|W\right)=\|\gamma^{*}\|^{2}\cdot\beta_{1}^{\top}(\Sigma_{X}-\Sigma_{X}\Sigma_{W}^{-1}\Sigma_{X})\beta_{1}

with $\operatorname{Var}(\hat{\psi}_{0}^{1,\textup{ideal}}-\psi_{0}^{1}|W)$ and $\operatorname{Var}(\hat{\psi}_{0}^{1,\textup{ideal}})$ both behaving as $O_{P}(n_{1}^{-1})$ as $n_{1}\rightarrow\infty$.

In practice, the idealized SBW estimator considered in Propositions 2 and 3 cannot be used, as $\Sigma_{X}$ and $\Sigma_{\nu}$ are not known but must instead be estimated from auxiliary data. Proposition 4 states that if these estimates are consistent, then the resulting adjusted SBW estimator of $\psi_{0}^{1}$ is also consistent if $\delta=0$.

Proposition 4.

Let (3) - (7) and (18) hold. Given estimates $\hat{\Sigma}_{X}$ and $\hat{\Sigma}_{\nu}$ that are consistent for $\Sigma_{X}$ and $\Sigma_{\nu}$, let $\hat{X}_{A=1}$ be given by

\hat{X}_{sc}=\bar{W}_{1}+\hat{\kappa}^{\top}(W_{sc}-\bar{W}_{1}),

where $\hat{\kappa}=(\hat{\Sigma}_{X}+\hat{\Sigma}_{\nu})^{-1}\hat{\Sigma}_{X}$. Let $\hat{\gamma}$ be the weights that solve the SBW objective over the constraint set $\Gamma(\hat{X}_{A=1},\bar{W}_{0},0)$, and let $\hat{\psi}^{1,\textup{adjusted}}_{0}=\sum_{sc:A_{sc}=1}\hat{\gamma}_{sc}Y_{sc}$ be the corresponding SBW estimator. This estimator is consistent for $\psi_{0}^{1}$ as $n_{1}\to\infty$.
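The two steps of Proposition 4 can be sketched as follows. This is a hedged illustration with made-up inputs; for brevity the weights solve the exact-balance problem without the positivity constraint (per Remark 7 these coincide with implied regression weights), whereas the SBW objective would require a quadratic-programming solver to also enforce $\gamma\geq 0$.

```python
import numpy as np

def calibrate(W, Sigma_X_hat, Sigma_nu_hat):
    """Regression calibration: X_hat = W_bar + kappa_hat^T (W - W_bar)."""
    kappa = np.linalg.solve(Sigma_X_hat + Sigma_nu_hat, Sigma_X_hat)
    W_bar = W.mean(axis=0)
    return W_bar + (W - W_bar) @ kappa

def balance_weights(X_treat, target):
    """Minimum-norm gamma with gamma^T X_treat = target and sum(gamma) = 1.

    Positivity is omitted; gamma = A (A^T A)^{-1} b is the minimum-norm
    solution of the equality constraints A^T gamma = b.
    """
    n = X_treat.shape[0]
    A = np.column_stack([X_treat, np.ones(n)])  # constraint gradients
    b = np.append(target, 1.0)
    return A @ np.linalg.solve(A.T @ A, b)

# Illustrative usage on synthetic data
rng = np.random.default_rng(0)
W = rng.normal(size=(40, 3))                    # noisy treated covariates
X_hat = calibrate(W, np.eye(3), 0.5 * np.eye(3))
target = X_hat.mean(axis=0) + 0.01              # stand-in for W_bar_0
gamma = balance_weights(X_hat, target)
assert np.allclose(gamma @ X_hat, target) and np.isclose(gamma.sum(), 1.0)
```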

In (17) we propose a leave-one-state-out jackknife estimate of variance. Following Efron and Stein (1981), this estimate can be decomposed as a conservatively biased estimate of the variance of $\hat{\psi}_{0}^{1,\textup{adjusted}}$ given a sample size of $(m_{1}-1)$ treated states, plus a heuristic adjustment to go from sample size $(m_{1}-1)$ to sample size $m_{1}$, when treating the observations of the control states as fixed.

Proposition 5.

Let (3) - (7) hold, and additionally assume that $p_{s}$, the number of CPUMAs per state, is i.i.d. across the treated states. Let $\hat{\operatorname{Var}}(\hat{\psi}_{0}^{1,\textup{adjusted}})=\frac{m_{1}-1}{m_{1}}\cdot\tilde{\operatorname{Var}}(\hat{\psi}_{0}^{1,\textup{adjusted}})$, where

\tilde{\operatorname{Var}}(\hat{\psi}_{0}^{1,\textup{adjusted}})=\sum_{s:A_{s}=1}(S_{(s)}-S_{(\cdot)})^{2} (23)

with $S_{(s)}$ and $S_{(\cdot)}$ as defined for (17). Then $\tilde{\operatorname{Var}}$ is conservatively biased for the variance of the leave-one-state-out estimate,

\mathbb{E}\left[\tilde{\operatorname{Var}}(\hat{\psi}_{0}^{1,\textup{adjusted}})\right]\geq\operatorname{Var}(S_{(1)}|\bar{W}_{0}),

where $S_{(1)}$ can be seen to equal the estimator $\hat{\psi}_{0}^{1,\textup{adjusted}}$ under a sample size of $(m_{1}-1)$ treated states.
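The leave-one-state-out computation of (23) with the $(m_{1}-1)/m_{1}$ factor can be sketched as below. This is a hedged illustration: the `estimate` callable stands in for the full reweighting estimator, and in the usage example we substitute a simple mean purely to exercise the formula.

```python
import numpy as np

def jackknife_variance(state_ids, estimate):
    """Leave-one-state-out jackknife of eq. (23), scaled by (m1 - 1)/m1.

    `estimate` maps a boolean keep-mask over units to a scalar estimate;
    here it stands in for the full weighting estimator.
    """
    states = np.unique(state_ids)
    m1 = states.size
    S = np.array([estimate(state_ids != s) for s in states])  # S_(s)
    var_tilde = np.sum((S - S.mean()) ** 2)                   # eq. (23)
    return (m1 - 1) / m1 * var_tilde

# Toy usage: three states of two CPUMAs each, estimator = plain mean
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
sid = np.array([0, 0, 1, 1, 2, 2])
v = jackknife_variance(sid, lambda keep: y[keep].mean())
assert np.isclose(v, 4.0 / 3.0)   # S_(s) = 4.5, 3.5, 2.5; (2/3) * 2
```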

As the gaussian covariate assumption given by (7) is strong, it would be desirable if the adjusted OLS or SBW estimators were consistent even for non-gaussian $X$. Proposition 6 shows under mild assumptions that this is in fact true when running OLS on the adjusted covariates.

Proposition 6.

Let (3) - (6) and (18) - (19) hold, with $S_{X}$ invertible. Let $(\check{\alpha},\check{\beta})$ denote the adjusted OLS estimates of $(\alpha_{1},\beta_{1})$, solving

\min_{\alpha,\beta}\sum_{A_{sc}=1}(Y_{sc}-\alpha-\check{X}_{sc}^{\top}\beta)^{2}

where $\check{X}_{sc}=\bar{W}_{1}+\check{\kappa}^{\top}(W_{sc}-\bar{W}_{1})$ with $\check{\kappa}=(S_{X}+\Sigma_{\nu})^{-1}S_{X}$. Then the adjusted OLS estimator of $\psi_{0}^{1}$ given by

\bar{Y}_{1}+(\bar{W}_{0}-\bar{W}_{1})^{\top}\check{\beta}

remains consistent if the gaussian assumption given by (6) is removed.

However, the same does not hold for the adjusted SBW estimator. Proposition 7 gives an expression for its bias when the covariates are non-gaussian.

Proposition 7.

Let the assumptions of Proposition 6 hold. Let $\check{\gamma}$ solve the SBW objective over the constraint set $\Gamma(\check{X}_{A=1},\bar{W}_{0},0)$, where $\check{X}_{A=1}$ and $\check{\kappa}$ are defined as in Proposition 6. Let $Q$ denote the set of indices where $\check{\gamma}$ is non-zero,

Q=\{sc:\check{\gamma}_{sc}>0\}

with cardinality $n_{Q}=|Q|$, and let $\bar{W}_{Q}$ and $S_{W_{Q}}$ denote the empirical mean and covariance of $\{W_{sc}:sc\in Q\}$,

\bar{W}_{Q}=\frac{1}{n_{Q}}\sum_{sc\in Q}W_{sc},\qquad S_{W_{Q}}=\frac{1}{n_{Q}}\sum_{sc\in Q}(W_{sc}-\bar{W}_{Q})(W_{sc}-\bar{W}_{Q})^{\top}

with $\bar{X}_{Q}$ the analogous empirical mean of $\{X_{sc}:sc\in Q\}$ and $S_{XW_{Q}}$ the empirical cross covariance,

S_{XW_{Q}}=\frac{1}{n_{Q}}\sum_{sc\in Q}(X_{sc}-\bar{X}_{Q})(W_{sc}-\bar{W}_{Q})^{\top}

Then if the gaussian assumption given by (6) is removed, the adjusted SBW estimator of $\psi_{0}^{1}$ given by

\sum_{A_{sc}=1}Y_{sc}\check{\gamma}_{sc}

may be biased for $\psi_{0}^{1}$, with estimation error given by

\sum_{A_{sc}=1}Y_{sc}\check{\gamma}_{sc}-\psi_{0}^{1}=\beta_{1}^{\top}\Big[(S_{XW_{Q}}S_{W_{Q}}^{-1}S_{W}S_{X}^{-1}-I)\bar{X}_{0}+(\bar{X}_{Q}-S_{XW_{Q}}S_{W_{Q}}^{-1}S_{W}S_{X}^{-1}\bar{X}_{1})-S_{XW_{Q}}S_{W_{Q}}^{-1}(\bar{X}_{Q}-\bar{X}_{1})\Big](1+o_{P}(1)) (24)

which need not converge to zero unless $\bar{X}_{Q}\to\bar{X}_{1}$, $S_{XW_{Q}}\to S_{X}$, and $S_{W_{Q}}\to S_{W}$.

Proposition 8 shows that if the conditional expectations can be computed for the treated units (which may be computationally difficult or require strong modeling assumptions if the data are non-gaussian, or if dependencies exist between CPUMAs), then SBW yields unbiased estimates.

Proposition 8.

Let equations (3) - (4) hold. Let $\tilde{X}^{*}$ denote the conditional expectation,

\tilde{X}^{*}_{sc}=\mathbb{E}[X_{sc}|W,A_{sc}=1]

let weights $\tilde{\gamma}^{*}$ solve the SBW objective (11) with constraint set $\Gamma(\tilde{X}^{*}_{A=1},\bar{X}_{0},0)$, and consider the estimator of $\psi_{0}^{1}$ given by $\sum_{A_{sc}=1}Y_{sc}\tilde{\gamma}^{*}_{sc}$. This estimator is unbiased for $\psi_{0}^{1}$.

remark 1.

While we have assumed that $\epsilon_{sc}=0$ for simplicity in our propositions, removing this assumption simply leads to the additional term $\sum_{sc:A_{sc}=1}\gamma_{sc}\epsilon_{sc}$ in the error of the SBW estimator of $\psi_{0}^{1}$. This again has expectation zero, because the weights remain independent of the error $\epsilon_{sc}$ in the outcomes. Allowing non-zero $\epsilon_{sc}$ also adds a term to the estimator variance (conditional on $W$) equal to $\sigma^{2}_{\epsilon}\cdot\|\gamma^{*}\|^{2}$, which does not change the variance bound given by Proposition 3.

remark 2.

For the adjusted OLS estimator, in which $\beta_{1}$ is estimated using the adjusted covariates $\tilde{X}_{A=1}$, in practice we must estimate $\tilde{X}$ with some estimator $\hat{X}$ that relies on an estimate $\hat{\kappa}$. As long as $\hat{\kappa}$ is consistent for $\kappa$, the OLS estimator will also be consistent by the continuous mapping theorem.

remark 3.

As Proposition 5 implies that $\hat{\operatorname{Var}}$ is conservatively biased only up to the heuristic $(m_{1}-1)/m_{1}$ scaling term, it may be preferable to remove this scaling term entirely, inflating the variance estimate slightly. While the proposition considers the marginal variance of the estimator $\hat{\psi}_{0}^{1,\textup{adjusted}}$, a confidence interval using the conditional variance $\operatorname{Var}(\hat{\psi}_{0}^{1,\textup{adjusted}}|X)$ (see, e.g., Buonaccorsi (2010), who discusses using a modification of the parametric bootstrap for parameters estimated via OLS in this setting) may be of interest, potentially leading to smaller intervals and more precise inference.

remark 4.

To see how Proposition 7 implies that the adjusted SBW estimate may be biased in non-gaussian settings, observe that because the set $Q$ in Proposition 7 depends on the values of the covariates $X$ and the observation noise $\nu$, the values of $\{X_{sc}:sc\in Q\}$ and $\{\nu_{sc}:sc\in Q\}$ may differ systematically from their population counterparts, so that $\bar{X}_{Q}$, $S_{XW_{Q}}$, and $S_{W_{Q}}$ may not converge to their desired limits. While the expression for the estimation error given by (24) is asymptotic, a very similar exact formula is given in (54); the only asymptotic approximations are the convergence of $\bar{W}_{1}$ to $\bar{X}_{1}$ and of $\bar{W}_{0}$ to $\bar{X}_{0}$.

remark 5.

We describe a possible direction for future work that utilizes Proposition 8. Suppose that in place of equations (5) - (7), we instead assume that $X_{sc}$ is a transformation of the covariate, so that $X_{sc}=\phi(U_{sc})$ for some transformation $\phi$, and that the untransformed $U_{sc}$ is observed with additive noise, so that $W_{sc}=U_{sc}+\nu_{sc}$. For example, to make the linear model (4) more credible, $\phi(U_{sc})$ might denote a basis expansion applied to the survey sampled covariates for each unit. If, analogous to assumptions (6) and (7), the original covariates $U_{sc}$ and measurement error $\nu_{sc}$ can be assumed to be i.i.d. gaussian, so that the treated units satisfy

U_{sc}\sim\mathcal{N}(v_{1},\Sigma_{U|1}),\qquad\nu_{sc}\sim\mathcal{N}(0,\Sigma_{\nu}),\qquad\forall\ sc:A_{sc}=1

then the posterior distribution of $U_{sc}$ given $W$ for the treated units will also be gaussian,

U_{sc}|W_{sc}\sim\mathcal{N}(\tilde{U}_{sc},\Sigma_{\tilde{U}|1}),\qquad\forall\ sc:A_{sc}=1

where $\tilde{U}_{sc}$ and $\Sigma_{\tilde{U}|1}$ are given for the treated units by

\tilde{U}_{sc}=v_{1}+\Sigma_{U|1}(\Sigma_{U|1}+\Sigma_{\nu})^{-1}(W_{sc}-v_{1}),\qquad\Sigma_{\tilde{U}|1}=\Sigma_{U|1}-\Sigma_{U|1}(\Sigma_{U|1}+\Sigma_{\nu})^{-1}\Sigma_{U|1}

with analogous expressions for the control units. This suggests that if auxiliary data can be used to find $\Sigma_{U|1}$, $\Sigma_{U|0}$, and $\Sigma_{\nu}$ as before, then $\tilde{X}^{*}_{sc}=\mathbb{E}[\phi(U_{sc})|W,A]$ could be estimated using Monte Carlo methods. Specifically, for each unit $sc$ we can generate random variates $\{u_{i}\}$ that are i.i.d. normal with mean $\tilde{U}_{sc}$ and covariance $\Sigma_{\tilde{U}|A_{sc}}$, and estimate $\tilde{X}_{sc}^{*}$ by the average of $\{\phi(u_{i})\}$. To estimate the SBW constraint set $\Gamma(\tilde{X}^{*}_{A=1},\bar{X}_{0},0)$, $\bar{X}_{0}$ could be estimated by averaging $\tilde{X}^{*}_{A=0}$. By Proposition 8 the resulting SBW weights would yield unbiased estimates.

A.2 Properties of H-SBW

Here we consider an H-SBW setting where $\nu_{sc}=0$, so that the true covariates are observed. By (4), the outcomes have CPUMA-level noise terms $\epsilon_{sc}$, and also state-level noise terms $\varepsilon_{s}$ that correlate the outcomes of CPUMAs in the same state. Proposition 9 states that if $\rho$ is the within-state correlation of these error terms, the H-SBW estimator produces the minimum conditional-on-$X$ variance estimator of $\psi_{0}^{1}$ within the constraint set.

Proposition 9.

Consider the outcome model in (4). Assume the errors are homoskedastic with finite variances $\sigma^{2}_{\epsilon}$ and $\sigma^{2}_{\varepsilon}$, and let $\rho$ be the within-state correlation of the error terms. Let $\hat{\gamma}^{\textup{hsbw}}$ be the weights that solve (14) for known parameter $\rho$ over the constraint set $\Gamma(X_{A=1},\bar{X}_{0},\delta)$ for any $\delta$. Then the H-SBW estimator of $\psi_{0}^{1}$,

\sum_{s:A_{s}=1}\sum_{c=1}^{p_{s}}\hat{\gamma}_{sc}^{\textup{hsbw}}Y_{sc}

is the minimum conditional-on-$X$ variance estimator of $\psi_{0}^{1}$ within the constraint set $\Gamma(X_{A=1},\bar{X}_{0},\delta)$.

The SBW and H-SBW objective functions take the generic form $\gamma^{\top}\Omega\gamma$: SBW takes $\Omega=I_{n}$, while H-SBW specifies an $\Omega$ that allows for homoskedastic errors with positive within-state equicorrelation. Analogous versions hence exist for any assumed covariance structure $\Omega$. Proposition 10 highlights connections between this generic form and GLS, showing that when exact balance is possible we can express the solution as the implied regression weights from GLS estimated on a subset of the data. Similar results connecting balancing weights to regression weights appear throughout the literature: for example, Kline (2011), Ben-Michael, Feller and Rothstein (2021), and Chattopadhyay and Zubizarreta (2021) connect balancing weights to the implied regression weights of OLS, ridge regression, weighted least squares, and two-stage least squares; our result adds a connection to the implied regression weights of GLS.
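The $\Omega$ assumed by H-SBW can be sketched directly. This is a hedged illustration of the covariance structure described above (unit diagonal, equicorrelation $\rho$ within state, zero across states); setting $\rho=0$ recovers the SBW objective with $\Omega=I_{n}$.

```python
import numpy as np

def hsbw_omega(state_ids, rho):
    """Omega for the generic objective gamma^T Omega gamma: 1 on the
    diagonal, rho for pairs of CPUMAs sharing a state, 0 otherwise."""
    same_state = np.equal.outer(state_ids, state_ids)
    Omega = np.where(same_state, rho, 0.0)
    np.fill_diagonal(Omega, 1.0)
    return Omega

# Toy usage: two states with 2 and 3 CPUMAs
sid = np.array([0, 0, 1, 1, 1])
Omega = hsbw_omega(sid, 0.25)
assert np.allclose(hsbw_omega(sid, 0.0), np.eye(5))  # rho = 0 gives SBW
assert Omega[0, 1] == 0.25 and Omega[0, 2] == 0.0
assert np.all(np.linalg.eigvalsh(Omega) > 0)         # positive definite
```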

Proposition 10.

Let $\gamma^{*}$ solve the optimization problem

\min_{\gamma}\gamma^{\top}\Omega\gamma\quad\text{subject to}\quad\sum_{i}\gamma_{i}Z_{i}=v,\ \sum_{i}\gamma_{i}=1,\ \text{and }\gamma\geq 0 (25)

with $\Omega$ positive definite. Let $Q=\{i:\gamma^{*}_{i}>0\}$ denote the indices of its non-zero entries. Then $\gamma^{*}$ also solves the problem

\min_{\gamma}\gamma^{\top}\Omega\gamma\quad\text{subject to}\quad\sum_{i\in Q}\gamma_{i}Z_{i}=v,\ \sum_{i\in Q}\gamma_{i}=1,\ \text{and }\gamma_{i}=0\ \forall\ i\not\in Q (26)

and hence has non-zero entries $\gamma^{*}_{Q}=\{\gamma_{i}^{*}:i\in Q\}$ satisfying

\gamma^{*}_{Q}=\Omega_{Q}^{-1}(Z_{Q}-\mu)^{\top}\left[(Z_{Q}-\mu)\Omega_{Q}^{-1}(Z_{Q}-\mu)^{\top}\right]^{-1}(v-\mu)+\frac{\Omega^{-1}_{Q}{\bf 1}}{{\bf 1}^{\top}\Omega^{-1}_{Q}{\bf 1}} (27)

where $Z_{Q}$ is the matrix whose columns are $\{Z_{i}:i\in Q\}$, $\Omega_{Q}$ is the submatrix of $\Omega$ whose rows and columns are in $Q$, ${\bf 1}$ is the column vector of ones, and $\mu$ is the vector $\frac{Z_{Q}\Omega_{Q}^{-1}{\bf 1}}{{\bf 1}^{\top}\Omega^{-1}_{Q}{\bf 1}}$. These weights are equivalent to the implied regression weights when running GLS on the subset $Q$.

remark 6.

To lighten notation, we have used $Z_{Q}-\mu$ (a vector subtracted from a matrix) to mean $Z_{Q}-\mu{\bf 1}^{\top}$, so that each column of $Z_{Q}$ is centered by $\mu$.
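The closed form (27) can be checked numerically. This is a hedged sketch under the simplifying assumption that the positivity constraint does not bind (so $Q$ is all units): we verify that the formula's weights satisfy the balance constraints and the KKT stationarity condition of the equality-constrained problem, namely that $\Omega\gamma$ lies in the span of the constraint gradients. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 12, 3
Z = rng.normal(size=(q, n))               # columns are the units' Z_i
v = Z.mean(axis=1) + 0.05                 # a nearby balance target
Omega = 0.3 * np.ones((n, n))             # equicorrelated, positive definite
np.fill_diagonal(Omega, 1.0)

Oinv = np.linalg.inv(Omega)
ones = np.ones(n)
mu = Z @ Oinv @ ones / (ones @ Oinv @ ones)
Zc = Z - mu[:, None]                      # Z - mu 1^T (Remark 6)
gamma = (Oinv @ Zc.T @ np.linalg.solve(Zc @ Oinv @ Zc.T, v - mu)
         + Oinv @ ones / (ones @ Oinv @ ones))      # eq. (27)

# Feasibility: balance and sum-to-one both hold exactly
assert np.allclose(Z @ gamma, v) and np.isclose(gamma.sum(), 1.0)
# Stationarity: Omega gamma lies in span of {rows of Z, 1}
G = np.column_stack([Z.T, ones])
coef, *_ = np.linalg.lstsq(G, Omega @ gamma, rcond=None)
assert np.allclose(Omega @ gamma, G @ coef)
```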

remark 7.

Removing the positivity constraint from the generic form of the SBW objective implies that the resulting weights are equivalent to the implied regression weights from GLS. This follows by noting that removing the positivity constraint removes the solution's dependence on the set $Q$. This result is a natural generalization of the connections between the implied regression weights of OLS and SBW noted by, for example, Chattopadhyay and Zubizarreta (2021).

Proposition 11 simply states that the conclusion of Proposition 8 holds not only for SBW, but for H-SBW as well.

Proposition 11.

Let the assumptions of Proposition 8 hold. Let $\tilde{X}^{*}$ be defined as in Proposition 8, let weights $\tilde{\gamma}^{\textup{hsbw}*}$ solve the H-SBW objective (14) with constraint set $\Gamma(\tilde{X}^{*}_{A=1},\bar{X}_{0},0)$, and consider the estimator of $\psi_{0}^{1}$ given by $\sum_{A_{sc}=1}Y_{sc}\tilde{\gamma}^{\textup{hsbw}*}_{sc}$. This estimator is unbiased for $\psi_{0}^{1}$.

remark 8.

In Proposition 9, we assumed the outcomes followed (4) and the constraints balanced the means of the covariates; however, we can allow for any outcome model, and our balance constraints can include any function of the covariate distribution, and this result still holds conditional on $X$ (though of course the estimator may be badly biased). The key assumption is that the variability in the estimates comes from the outcome model errors, which are assumed to be homoskedastic and equicorrelated within state with known parameter $\rho$.

remark 9.

Assuming that $(X_{sc},W_{sc})\mid A_{sc}=1$ are gaussian but dependent, Proposition 11 implies that if we correctly model the correlations between the CPUMAs within states in our regression-calibration step, we can use GLS or H-SBW without inducing asymptotic bias (assuming all of our models are correct). This is similar to the approach followed in Huque, Bondell and Ryan (2014), who consider parameter estimation using GLS in the context of a one-dimensional spatially correlated covariate measured with error. We also outline in Appendix B a potential adjustment when we assume the units have a covariance structure similar to our assumptions for the outcome model errors. We evaluate this adjustment in simulations in Appendix F.

To be clear, if we do not model this dependence structure, we cannot generally use the simple adjustment provided in (8) in combination with GLS to obtain asymptotically unbiased estimates. Intuitively, this is because the implied weights from GLS depend on the dependence between CPUMAs within states, which (8) does not correctly account for. By contrast, Proposition 6 shows that we can safely ignore such dependence when using regression-calibration with OLS (as long as a probability limit exists for the empirical covariance matrix).

remark 10.

In our simulation study in Appendix F we obtain an approximately unbiased estimate when using SBW with the simple adjustment provided in (8) on dependent gaussian data. We conjecture that the set $Q$ may have some limiting boundary. If true, the characterization of SBW weights as regression weights in Proposition 10 would imply that the SBW weight $\hat{\gamma}_{sc}$ is asymptotically fixed conditional on the input data point $W_{sc}$. The error of the estimator could then decompose as a function of $(X_{sc}-\tilde{X}_{sc})$, which is independent of $\gamma_{sc}^{\textup{sbw}}$ given $W_{sc}$. This implies that it would suffice to balance on $\tilde{X}_{A=1}$.

A.3 Proofs

We begin by establishing the following identity for our target parameter $\psi_{0}^{1}$ defined in (1).

\psi^{1}_{0}=\mu_{y}+(\bar{X}_{0}-\upsilon_{1})^{\top}\beta_{1} (28)

where $\mu_{y}=\mathbb{E}[Y_{sc}\mid A_{sc}=1]$ and $\upsilon_{1}=\mathbb{E}[X_{sc}\mid A_{sc}=1]$.

Proof of (28).

Using our causal and modeling assumptions we have that:

\mathbb{E}[Y_{sc}^{1}\mid X_{sc},A_{sc}=0] =\mathbb{E}[Y_{sc}^{1}\mid X_{sc},A_{sc}=1]
=\mathbb{E}[Y_{sc}\mid X_{sc},A_{sc}=1]
=\alpha_{1}+X_{sc}^{\top}\beta_{1}
=\mu_{y}+(X_{sc}-\upsilon_{1})^{\top}\beta_{1}
\implies\psi_{0}^{1}=\mu_{y}+(\bar{X}_{0}-\upsilon_{1})^{\top}\beta_{1}

where the first equality follows from unconfoundedness, the second from consistency, the third from our parametric modeling assumptions, and the fourth from the definition of $\mu_{y}$. The final equation follows from averaging over the control units. ∎

Proof of Proposition 1.

It can be seen from (8) that for all $sc:A_{sc}=1$,

X_{sc}=v_{1}+\kappa^{\top}(W_{sc}-v_{1})+\nu_{sc}^{\prime}

where $\nu_{sc}^{\prime}=X_{sc}-\mathbb{E}[X_{sc}|W,A=1]$ may be viewed as an independent zero-mean noise term. Plugging into (4) yields

Y_{sc}=\alpha_{1}+v_{1}^{\top}(I-\kappa)\beta_{1}+W_{sc}^{\top}\kappa\beta_{1}+\epsilon_{sc}^{\prime}

for $\epsilon_{sc}^{\prime}=\beta_{1}^{\top}\nu_{sc}^{\prime}+\epsilon_{sc}$. It follows that the OLS estimate $\hat{\beta}$ given by (22) satisfies (Gleser, 1992)

\mathbb{E}[\hat{\beta}|W_{A=1}]=\kappa\beta_{1},\qquad\text{and}\qquad\mathbb{E}[\bar{W}_{1}^{\top}\hat{\beta}]=v_{1}^{\top}\kappa\beta_{1} (29)

To show that $\hat{\psi}_{0}^{1,\textup{ols}}$ and $\hat{\psi}_{0}^{1,\textup{sbw}}$ have identical bias, we compute their expectations:

\mathbb{E}[\hat{\psi}_{0}^{1,\textup{ols}}] =\mathbb{E}[\bar{Y}_{1}+(\bar{W}_{0}-\bar{W}_{1})^{\top}\hat{\beta}]
=\mu_{y}+(\bar{X}_{0}-\upsilon_{1})^{\top}\kappa\beta_{1} (30)
=\psi_{0}^{1}+(\bar{X}_{0}-\upsilon_{1})^{\top}(\kappa-I_{q})\beta_{1} (31)

where (30) holds by (29), and (31) holds by (28). We next derive the expected value of $\hat{\psi}_{0}^{1,\textup{sbw}}$:

\mathbb{E}[\hat{\psi}_{0}^{1,\textup{sbw}}] =\mathbb{E}\left[\sum_{A_{sc}=1}\gamma_{sc}Y_{sc}\right]
=\mathbb{E}\left[\sum_{A_{sc}=1}\gamma_{sc}\left(\alpha_{1}+(W_{sc}-W_{sc}+X_{sc})^{\top}\beta_{1}+\epsilon_{sc}\right)\right] (32)
=\mathbb{E}\left[\alpha_{1}+\sum_{A_{sc}=1}\gamma_{sc}W_{sc}^{\top}\beta_{1}+\sum_{A_{sc}=1}\gamma_{sc}(X_{sc}-W_{sc})^{\top}\beta_{1}+\sum_{A_{sc}=1}\gamma_{sc}\epsilon_{sc}\right]
=\alpha_{1}+\bar{X}_{0}^{\top}\beta_{1}+\mathbb{E}\left[\sum_{A_{sc}=1}\gamma_{sc}(X_{sc}-W_{sc})^{\top}\beta_{1}\right] (33)
=\psi_{0}^{1}+\mathbb{E}\left[\sum_{A_{sc}=1}\gamma_{sc}(X_{sc}-W_{sc})^{\top}\beta_{1}\right] (34)
=\psi_{0}^{1}+\mathbb{E}\left[\sum_{A_{sc}=1}\mathbb{E}\left[\gamma_{sc}(X_{sc}-W_{sc})^{\top}\beta_{1}|W\right]\right] (35)
=\psi_{0}^{1}+\mathbb{E}\left[\sum_{A_{sc}=1}\gamma_{sc}(\mathbb{E}[X_{sc}|W]-W_{sc})^{\top}\beta_{1}\right] (36)
=\psi_{0}^{1}+\mathbb{E}\left[\sum_{A_{sc}=1}\gamma_{sc}(\upsilon_{1}+\kappa^{\top}(W_{sc}-\upsilon_{1})-W_{sc})^{\top}\beta_{1}\right] (37)
=\psi_{0}^{1}+\mathbb{E}\left[\sum_{A_{sc}=1}\gamma_{sc}(W_{sc}-\upsilon_{1})^{\top}(\kappa-I)\beta_{1}\right]
=\psi_{0}^{1}+\left(\mathbb{E}\left[\sum_{A_{sc}=1}\gamma_{sc}W_{sc}\right]-\upsilon_{1}\right)^{\top}(\kappa-I)\beta_{1}
=\psi_{0}^{1}+\left(\bar{X}_{0}-\upsilon_{1}\right)^{\top}(\kappa-I_{q})\beta_{1} (38)

where (32) holds by the assumed linear model for $Y_{sc}$ given by (4); (33) and (38) hold because the SBW algorithm enforces that $\sum\gamma_{sc}W_{sc}=\bar{W}_{0}$, which has expectation $\bar{X}_{0}$, and because $\epsilon_{sc}$ is zero-mean and independent of $W_{sc}$ and hence of $\gamma_{sc}$; (34) holds by the definition of $\psi_{0}^{1}$ and the assumed linear model in (4); (35) is the tower property of expectations; (36) follows because $\gamma_{sc}$ and $W_{sc}$ are deterministic given $W$; and (37) uses the expression for the conditional expectation given by (8). It can be seen that (31) and (38) are equal, and hence $\hat{\psi}_{0}^{1,\textup{ols}}$ and $\hat{\psi}_{0}^{1,\textup{sbw}}$ have equal bias.

It remains to show that the bias of the OLS estimator is unchanged if the gaussian assumption is relaxed so that (8) no longer holds. It follows from (22) that $\hat{\beta}$ is asymptotically given by

\hat{\beta} =\left(\sum_{A_{sc}=1}(W_{sc}-\bar{W}_{1})(W_{sc}-\bar{W}_{1})^{\top}\right)^{-1}\left(\sum_{A_{sc}=1}(W_{sc}-\bar{W}_{1})(Y_{sc}-\bar{Y}_{1})^{\top}\right)
\rightarrow^{p}(S_{X}+\Sigma_{\nu})^{-1}S_{X}\beta_{1}=\check{\kappa}\beta_{1}

where we have used (20) and (21). Plugging into ψ01,ols\psi_{0}^{1,\textup{ols}} yields

\hat{\psi}_{0}^{1,\textup{ols}}=\bar{Y}_{1}+(\bar{W}_{0}-\bar{W}_{1})^{\top}\hat{\beta}
\rightarrow^{p}\mu_{y}+(\bar{X}_{0}-\bar{X}_{1})^{\top}\check{\kappa}\beta_{1}

from which the result follows by the same steps used to show (31).

Proof of Proposition 2.

Assuming $\epsilon_{sc}=0$, by linearity we know that

Y_{sc}=\alpha_{1}+\tilde{X}_{sc}^{\top}\beta_{1}+(X_{sc}-\tilde{X}_{sc})^{\top}\beta_{1}\qquad\forall sc:A_{sc}=1 (39)

We then have that:

\hat{\psi}_{0}^{1,\textup{ideal}}-\psi_{0}^{1} =\sum_{sc:A_{sc}=1}\gamma_{sc}^{*}Y_{sc}-(\alpha_{1}+\bar{X}_{0}^{\top}\beta_{1})
=\sum_{sc:A_{sc}=1}\gamma_{sc}^{*}\alpha_{1}+\sum_{sc:A_{sc}=1}\gamma_{sc}^{*}\tilde{X}_{sc}^{\top}\beta_{1}+\sum_{sc:A_{sc}=1}\gamma_{sc}^{*}(X_{sc}-\tilde{X}_{sc})^{\top}\beta_{1}-(\alpha_{1}+\bar{X}_{0}^{\top}\beta_{1}) (40)
=\sum_{sc:A_{sc}=1}\gamma_{sc}^{*}(X_{sc}-\tilde{X}_{sc})^{\top}\beta_{1} (41)

where (40) follows from (39), and (41) holds since $\sum\gamma_{sc}^{*}=1$ and $\sum\gamma_{sc}^{*}\tilde{X}_{sc}=\bar{X}_{0}$. Conditioned on $W$, $\gamma^{*}$ is fixed and $X_{sc}-\tilde{X}_{sc}$ has expectation zero; therefore, (41) implies that the estimator is unbiased. ∎

Proof of Proposition 3.

To derive $\operatorname{Var}(\hat{\psi}_{0}^{1,\textup{ideal}}|W)$, we use

\operatorname{Var}\left(\hat{\psi}_{0}^{1,\textup{ideal}}-\psi_{0}^{1}|W\right) =\operatorname{Var}\left[\sum_{sc:A_{sc}=1}\gamma_{sc}^{*}(X_{sc}-\tilde{X}_{sc})^{\top}\beta_{1}\mid W\right] (42)
=\sum_{sc:A_{sc}=1}\operatorname{Var}(\gamma_{sc}^{*}(X_{sc}-\tilde{X}_{sc})^{\top}\beta_{1}\mid W) (43)
=\sum_{sc:A_{sc}=1}(\gamma_{sc}^{*})^{2}\beta_{1}^{\top}(\Sigma_{X}-\Sigma_{X}\Sigma_{W}^{-1}\Sigma_{X})\beta_{1} (44)
=\|\gamma^{*}\|^{2}\cdot\beta_{1}^{\top}(\Sigma_{X}-\Sigma_{X}\Sigma_{W}^{-1}\Sigma_{X})\beta_{1} (45)

where (42) follows from (41), (43) holds because the tuples $(X_{sc},W_{sc})$ are i.i.d., and (44) holds because $\gamma_{sc}^{*}$ is fixed given $W$ and $(X_{sc},W_{sc})$ are jointly normal.

To upper bound the conditional variance given by (45), we will construct a feasible solution $\gamma^{\prime}$ to the SBW objective over the constraint set $\Gamma(\tilde{X},\bar{X}_{0},0)$ such that $\|\gamma^{\prime}\|^{2}=O_{P}(n_{1}^{-1})$. As the optimal solution $\gamma^{*}$ satisfies $\|\gamma^{*}\|^{2}\leq\|\gamma^{\prime}\|^{2}$, the result follows.

Our construction is the following. Divide the $n_{1}$ treated units into $L=\lfloor n_{1}/n^{\text{sub}}\rfloor$ subsets of size $n^{\text{sub}}$, and a remainder subset. For each subset $\ell=1,\ldots,L$, let $X^{(\ell)}$ denote its covariates, $\tilde{X}^{(\ell)}$ the conditional expectation $\mathbb{E}[X^{(\ell)}|W,A]$, and $\gamma^{(\ell)}$ the solution to the SBW objective over the constraint set $\Gamma(\tilde{X}^{(\ell)},\bar{X}_{0},0)$, with $\gamma^{(\ell)}=0$ if the constraint set is infeasible. As the units are assumed to be i.i.d., it follows that $\gamma^{(1)},\ldots,\gamma^{(L)}$ are also i.i.d. Let $n^{\text{sub}}$ be large enough so that each $\gamma^{(\ell)}$ has positive probability of being non-zero.

Let $L^{\prime}$ denote the number of subsets whose $\gamma^{(\ell)}$ is non-zero. As each non-zero weight vector $\gamma^{(\ell)}$ is feasible for $\Gamma(\tilde{X}^{(\ell)},\bar{X}_{0},0)$, it can be seen that the concatenated vector $\gamma^{\prime}=(\gamma^{(1)}/L^{\prime},\ldots,\gamma^{(L)}/L^{\prime},0)$ is feasible for $\Gamma(\tilde{X},\bar{X}_{0},0)$. As the weights $\gamma^{(\ell)}$ are i.i.d., it follows that $\|\gamma^{\prime}\|^{2}$, which equals $\frac{1}{(L^{\prime})^{2}}\sum_{\ell}\|\gamma^{(\ell)}\|^{2}$, converges in probability to $\frac{1}{L^{\prime}}\mathbb{E}\|\gamma^{(1)}\|^{2}=O_{P}(n_{1}^{-1})$, proving the bound on $\operatorname{Var}(\hat{\psi}_{0}^{1,\textup{ideal}}-\psi_{0}^{1}\mid W)$.

To show this rate also holds for $\operatorname{Var}(\hat{\psi}_{0}^{1,\textup{ideal}})$, we can apply the law of total variance to $f=\hat{\psi}_{0}^{1,\textup{ideal}}-\psi_{0}^{1}$,

\[
\operatorname{Var}(f)=\underbrace{\mathbb{E}[\operatorname{Var}(f\mid W)]}_{(i)}+\underbrace{\operatorname{Var}(\mathbb{E}[f\mid W])}_{(ii)}
\]

observing that $\mathbb{E}[f\mid W]=0$ by Proposition 2, so that (ii) is zero. To show that term (i) is $O(n_{1}^{-1})$, we observe that $\operatorname{Var}(f\mid W)=O_{P}(n_{1}^{-1})$ and is bounded (since $\gamma^{\star}$ is non-negative and sums to 1), so it follows that $\mathbb{E}[\operatorname{Var}(f\mid W)]$ must be $O(n_{1}^{-1})$ as well. ∎

Proof of Proposition 4.

Following Proposition 2, assuming $\epsilon_{sc}=0$ we can decompose the error of the estimator as follows:

\begin{align}
\hat{\psi}^{1,\textup{adjusted}}_{0}-\psi_{0}^{1}
&=\sum_{A_{sc}=1}\hat{\gamma}_{sc}Y_{sc}-\psi_{0}^{1}\notag\\
&=\sum_{A_{sc}=1}\hat{\gamma}_{sc}(\alpha_{1}+X_{sc}^{\top}\beta_{1})-\psi_{0}^{1} \tag{46}\\
&=\sum_{A_{sc}=1}\hat{\gamma}_{sc}\left(\alpha_{1}+\hat{X}_{sc}^{\top}\beta_{1}+(X_{sc}-\hat{X}_{sc})^{\top}\beta_{1}\right)-\psi_{0}^{1}\notag\\
&=\alpha_{1}+\sum_{A_{sc}=1}\hat{\gamma}_{sc}\hat{X}_{sc}^{\top}\beta_{1}+\sum_{A_{sc}=1}\hat{\gamma}_{sc}(X_{sc}-\hat{X}_{sc})^{\top}\beta_{1}-\psi_{0}^{1}\notag\\
&=\alpha_{1}+\bar{W}_{0}^{\top}\beta_{1}+\sum_{A_{sc}=1}\hat{\gamma}_{sc}(X_{sc}-\hat{X}_{sc})^{\top}\beta_{1}-\psi_{0}^{1} \tag{47}\\
&=\underbrace{(\bar{W}_{0}-\bar{X}_{0})^{\top}\beta_{1}}_{(i)}+\underbrace{\sum_{A_{sc}=1}\hat{\gamma}_{sc}(X_{sc}-\tilde{X}_{sc})^{\top}\beta_{1}}_{(ii)}+\underbrace{\sum_{A_{sc}=1}\hat{\gamma}_{sc}(\tilde{X}_{sc}-\hat{X}_{sc})^{\top}\beta_{1}}_{(iii)} \tag{48}
\end{align}

where (46) holds by (4), (47) uses that $\sum\hat{\gamma}_{sc}\hat{X}_{sc}=\bar{W}_{0}$, and (48) uses that $\psi_{0}^{1}=\alpha_{1}+\bar{X}_{0}^{\top}\beta_{1}$.

We observe that term (i) goes to zero by the law of large numbers. Term (iii) goes to zero because $\|\hat{\gamma}\|\leq 1$ (since $\hat{\gamma}\geq 0$ and sums to 1), and because $\hat{X}$ converges to $\tilde{X}$ uniformly over all units as $\hat{\Sigma}_{X}$ and $\hat{\Sigma}_{\nu}$ converge.

To show that (ii) goes to zero, we will show that, conditioned on $W$, (ii) has mean zero and variance going to zero. Conditioned on $W$, $\hat{\gamma}$ is fixed; this implies that term (ii) is zero-mean given $W$, with conditional variance

\[
\|\hat{\gamma}\|^{2}\,\beta_{1}^{\top}\left(\Sigma_{X}-\Sigma_{X}\Sigma_{W}^{-1}\Sigma_{X}\right)\beta_{1}
\]

By a construction argument identical to the one used in the proof of Proposition 3, it can be shown that $\|\hat{\gamma}\|^{2}\rightarrow 0$, implying that the conditional variance goes to zero as well. ∎

Proof of Proposition 5.

Let $U_{s}=\{(J_{sc},W_{sc}):c=1,\ldots,p_{s}\}$ denote the observed outcomes and covariates corresponding to the CPUMAs in state $s$. Under our assumptions, $U_{s}$ is i.i.d. across the treated states. It can also be seen that $\hat{\psi}_{0}^{1,\textup{adjusted}}$ is a symmetric function of the treated state observations $\{U_{s}:A_{s}=1\}$. Equation (1.6) of Efron and Stein (1981) therefore applies to our setting; as this equation is equal to (23), the result follows. ∎

Proof of Proposition 6.

Let μ\mu denote

\[
\mu=\frac{1}{n_{1}}\sum_{A_{sc}=1}\check{X}_{sc}
\]

so that

\[
\check{X}_{sc}-\mu=\check{\kappa}^{\top}(W_{sc}-\bar{W}_{1})
\]

and hence that

\begin{align}
\check{\beta}
&=\left(\sum_{A_{sc}=1}(\check{X}_{sc}-\mu)(\check{X}_{sc}-\mu)^{\top}\right)^{-1}\sum_{A_{sc}=1}(\check{X}_{sc}-\mu)(Y_{sc}-\bar{Y}_{1})\notag\\
&=\left(\sum_{A_{sc}=1}\check{\kappa}^{\top}(W_{sc}-\bar{W}_{1})(W_{sc}-\bar{W}_{1})^{\top}\check{\kappa}\right)^{-1}\sum_{A_{sc}=1}\check{\kappa}^{\top}(W_{sc}-\bar{W}_{1})(Y_{sc}-\bar{Y}_{1})\notag\\
&\to^{p}\left(\check{\kappa}^{\top}(S_{X}+\Sigma_{\nu})\check{\kappa}\right)^{-1}\check{\kappa}^{\top}S_{X}\beta_{1} \tag{49}\\
&=\check{\kappa}^{-1}(S_{X}+\Sigma_{\nu})^{-1}S_{X}\beta_{1}\notag\\
&=\beta_{1}\notag
\end{align}

where (49) follows by (20) and (21), and the last step follows from the definition of $\check{\kappa}$. It then follows that

\[
\bar{Y}_{1}+(\bar{W}_{0}-\bar{W}_{1})^{\top}\check{\beta}\to^{p}\alpha_{1}+\bar{X}_{0}^{\top}\beta_{1}=\psi_{0}^{1}
\]

proving consistency. ∎

Proof of Proposition 7.

We will use Proposition 10, which is proved later in this section. To apply it, we let $\Omega=I$, $Z=\check{X}_{A=1}$, and $v=\bar{W}_{0}$. Using $\check{X}_{sc}=\bar{W}_{1}+\check{\kappa}^{\top}(W_{sc}-\bar{W}_{1})$, we find that

\begin{align}
\mu &=\frac{1}{n_{Q}}\sum_{sc\in Q}\check{X}_{sc}\notag\\
&=\bar{W}_{1}+\check{\kappa}^{\top}(\bar{W}_{Q}-\bar{W}_{1}) \tag{50}
\end{align}

and hence that $\check{X}_{sc}-\mu=\check{\kappa}^{\top}(W_{sc}-\bar{W}_{Q})$. Plugging into (27) yields

\begin{align}
\check{\gamma}_{sc}
&=\frac{1}{n_{Q}}(W_{sc}-\bar{W}_{Q})^{\top}\check{\kappa}\left(\check{\kappa}^{\top}S_{W_{Q}}\check{\kappa}\right)^{-1}(\bar{W}_{0}-\mu)+\frac{1}{n_{Q}}\notag\\
&=\frac{1}{n_{Q}}(W_{sc}-\bar{W}_{Q})^{\top}S_{W_{Q}}^{-1}\check{\kappa}^{-\top}(\bar{W}_{0}-\mu)+\frac{1}{n_{Q}}\qquad\forall\, sc\in Q \tag{51}
\end{align}

As $Y_{sc}=\alpha_{1}+\beta_{1}^{\top}(\bar{X}_{Q}+X_{sc}-\bar{X}_{Q})$ for the treated units, the SBW estimator of $\psi_{0}^{1}$ can be seen to equal

\begin{align}
\sum_{A_{sc}=1}Y_{sc}\check{\gamma}_{sc}
&=\sum_{A_{sc}=1}(\alpha_{1}+\beta_{1}^{\top}\bar{X}_{Q})\check{\gamma}_{sc}+\sum_{A_{sc}=1}\beta_{1}^{\top}(X_{sc}-\bar{X}_{Q})\check{\gamma}_{sc}\notag\\
&=(\alpha_{1}+\beta_{1}^{\top}\bar{X}_{Q})+\sum_{sc\in Q}\beta_{1}^{\top}(X_{sc}-\bar{X}_{Q})\left[\frac{1}{n_{Q}}(W_{sc}-\bar{W}_{Q})^{\top}S_{W_{Q}}^{-1}\check{\kappa}^{-\top}(\bar{W}_{0}-\mu)+\frac{1}{n_{Q}}\right] \tag{52}\\
&=(\alpha_{1}+\beta_{1}^{\top}\bar{X}_{Q})+\beta_{1}^{\top}S_{XW_{Q}}S_{W_{Q}}^{-1}\check{\kappa}^{-\top}(\bar{W}_{0}-\mu)+\underbrace{\frac{1}{n_{Q}}\sum_{sc\in Q}\beta_{1}^{\top}(X_{sc}-\bar{X}_{Q})}_{=0} \tag{53}\\
&=\alpha_{1}+\beta_{1}^{\top}\bar{X}_{0}-\underbrace{\left(\beta_{1}^{\top}\bar{X}_{0}-\beta_{1}^{\top}S_{XW_{Q}}S_{W_{Q}}^{-1}\check{\kappa}^{-\top}\bar{W}_{0}\right)}_{(i)}+\underbrace{\beta_{1}^{\top}\left(\bar{X}_{Q}-S_{XW_{Q}}S_{W_{Q}}^{-1}\check{\kappa}^{-\top}\bar{W}_{1}\right)}_{(ii)}-\underbrace{\beta_{1}^{\top}S_{XW_{Q}}S_{W_{Q}}^{-1}(\bar{W}_{Q}-\bar{W}_{1})}_{(iii)} \tag{54}\\
&\to^{p}\psi_{0}^{1}-\underbrace{\beta_{1}^{\top}\left(I-S_{XW_{Q}}S_{W_{Q}}^{-1}S_{W}S_{X}^{-1}\right)\bar{X}_{0}}_{(i)}+\underbrace{\beta_{1}^{\top}\left(\bar{X}_{Q}-S_{XW_{Q}}S_{W_{Q}}^{-1}S_{W}S_{X}^{-1}\bar{X}_{1}\right)}_{(ii)}-\underbrace{\beta_{1}^{\top}S_{XW_{Q}}S_{W_{Q}}^{-1}(\bar{X}_{Q}-\bar{X}_{1})}_{(iii)} \tag{55}
\end{align}

where (52) uses the expression for $\check{\gamma}$ given by (51); (53) follows by algebraic manipulations, and notes that $n_{Q}^{-1}\sum_{sc\in Q}(X_{sc}-\bar{X}_{Q})=0$; (54) adds and subtracts $\beta_{1}^{\top}\bar{X}_{0}$, substitutes for $\mu$ using (50), and groups the terms into (i), (ii), and (iii); and (55) substitutes for $\check{\kappa}$ and uses $\bar{W}_{0}\to^{p}\bar{X}_{0}$ and $\bar{W}_{1}\to^{p}\bar{X}_{1}$.

It can be seen that terms (i), (ii), and (iii) each go to zero if $S_{XW_{Q}}\to S_{X}$, $S_{W_{Q}}\to S_{W}$, and $\bar{X}_{Q}\to\bar{X}_{1}$, proving the result. ∎

Proof of Propositions 8 and 11.

We prove the result for H-SBW only, as letting $\rho=0$ includes SBW as a special case. Following the derivation of (41) with the H-SBW weights $\tilde{\gamma}^{\textup{hsbw}\star}$ in place of $\gamma^{\star}$, it can be shown that:

\[
\sum_{A_{sc}=1}Y_{sc}\tilde{\gamma}_{sc}^{\textup{hsbw}\star}-\psi^{1}_{0}=\sum_{sc:A_{sc}=1}\tilde{\gamma}^{\textup{hsbw}\star}_{sc}(X_{sc}-\tilde{X}_{sc}^{\star})^{\top}\beta_{1}
\]

Conditional on $W$ (and assuming that the correspondence between states and CPUMAs is known), $\tilde{\gamma}_{sc}^{\textup{hsbw}\star}$ is fixed, and $X_{sc}-\tilde{X}_{sc}^{\star}$ equals $X_{sc}-\mathbb{E}[X_{sc}\mid W,A=1]$, which has mean zero, proving the result. ∎

Proof of Proposition 9.
\begin{align}
\operatorname{Var}\left(n_{t}^{-1}\sum_{s:A_{s}=1}\sum_{c=1}^{p_{s}}\gamma_{sc}Y_{sc}\mid X,A\right)
&=n_{t}^{-2}\sum_{s:A_{s}=1}\left[\sum_{c=1}^{p_{s}}\gamma_{sc}^{2}(\sigma^{2}_{\epsilon}+\sigma^{2}_{\varepsilon})+\sum_{c\neq d}\gamma_{sc}\gamma_{sd}\sigma^{2}_{\varepsilon}\right]\notag\\
&\propto\sum_{s:A_{s}=1}\left[\sum_{c=1}^{p_{s}}\gamma_{sc}^{2}+\rho\sum_{c\neq d}\gamma_{sc}\gamma_{sd}\right]\notag
\end{align}

where the second line follows by dividing by $\sigma^{2}_{\epsilon}+\sigma^{2}_{\varepsilon}$, so that the cross terms carry the factor $\rho=\sigma^{2}_{\varepsilon}/(\sigma^{2}_{\epsilon}+\sigma^{2}_{\varepsilon})$. By definition of the H-SBW objective, which minimizes this function for known $\rho$, the H-SBW estimator must produce the minimum conditional-on-$X$ variance estimator within the constraint set. ∎
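This variance identity is easy to check numerically. The sketch below (function names hypothetical, and with the $n_{t}^{-2}$ factor absorbed into the weights) computes the conditional variance directly and via the H-SBW criterion for homoskedastic, within-state equicorrelated errors:

```python
import numpy as np

def conditional_variance(weights_by_state, sigma_eps2, sigma_state2):
    """Var(sum_sc gamma_sc * Y_sc | X, A) when, within each state, errors have
    variance sigma_eps2 + sigma_state2 and pairwise covariance sigma_state2."""
    var = 0.0
    for g in weights_by_state:
        g = np.asarray(g)
        var += np.sum(g**2) * (sigma_eps2 + sigma_state2)
        var += (np.sum(g)**2 - np.sum(g**2)) * sigma_state2  # cross terms, c != d
    return var

def hsbw_objective(weights_by_state, rho):
    """The H-SBW criterion: sum_s [ sum_c gamma_sc^2 + rho * sum_{c != d} gamma_sc gamma_sd ]."""
    obj = 0.0
    for g in weights_by_state:
        g = np.asarray(g)
        obj += np.sum(g**2) + rho * (np.sum(g)**2 - np.sum(g**2))
    return obj
```

Multiplying the objective by $\sigma^{2}_{\epsilon}+\sigma^{2}_{\varepsilon}$ recovers the conditional variance when $\rho=\sigma^{2}_{\varepsilon}/(\sigma^{2}_{\epsilon}+\sigma^{2}_{\varepsilon})$.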

Proof of Proposition 10.

To show that $\gamma^{*}$ solves (26), we first observe that it is a feasible solution, by definition of $Q$. The result can then be proven by contradiction: if $\gamma^{*}$ is feasible but does not solve (26), then a feasible $\tilde{\gamma}$ must exist with lower objective value. Then for some convex combination $\gamma_{\lambda}=\lambda\tilde{\gamma}+(1-\lambda)\gamma^{*}$ with $\lambda>0$, we can show that $\gamma_{\lambda}$ is both feasible for (25) and has lower objective value than $\gamma^{*}$:

1. To establish that $\gamma_{\lambda}$ is feasible, we observe that $\tilde{\gamma}$ is feasible for (26). This implies that if $\gamma^{*}_{i}=0$, then $\tilde{\gamma}_{i}=0$ as well; as a result, there exists $\lambda>0$ such that the convex combination $\gamma_{\lambda}$ satisfies $\gamma_{\lambda}\geq 0$ and hence is feasible for (25).

2. To show that this $\gamma_{\lambda}$ has lower objective value than $\gamma^{*}$, we observe that if $\tilde{\gamma}$ has lower objective value than $\gamma^{*}$, then by strict convexity of the objective any convex combination with $\lambda>0$ must have lower objective value than $\gamma^{*}$ as well.

This shows that if $\gamma^{*}$ is not optimal for (26), then it is not optimal for (25) either. But as $\gamma^{*}$ is the optimal solution to (25), this is a contradiction; hence, by taking the contrapositive, it follows that $\gamma^{*}$ must solve (26).

To show (27), we observe that (26) can be written as

\[
\min_{\gamma_{Q}}\ \gamma_{Q}^{\top}\Omega_{Q}\gamma_{Q}\quad\text{subject to}\quad Z_{Q}\gamma_{Q}=v\ \text{ and }\ \mathbf{1}^{\top}\gamma_{Q}=1
\]

which can be rewritten as

\[
\min_{\gamma_{Q}}\ \gamma_{Q}^{\top}\Omega_{Q}\gamma_{Q}\quad\text{subject to}\quad\begin{bmatrix}Z_{Q}-\mu\mathbf{1}^{\top}\\ \mathbf{1}^{\top}\end{bmatrix}\gamma_{Q}=\begin{bmatrix}v-\mu\\ 1\end{bmatrix}
\]

where we have subtracted $\mu\mathbf{1}^{\top}\gamma_{Q}$ (which equals $\mu$) from both sides of the first constraint. This is a least-norm problem, and when feasible has solution

\[
\gamma_{Q}^{*}=\Omega^{-1}_{Q}A^{\top}(A\Omega^{-1}_{Q}A^{\top})^{-1}b \tag{56}
\]

where $A=\begin{bmatrix}Z_{Q}-\mu\mathbf{1}^{\top}\\ \mathbf{1}^{\top}\end{bmatrix}$ and $b=\begin{bmatrix}v-\mu\\ 1\end{bmatrix}$. As $(Z_{Q}-\mu\mathbf{1}^{\top})\Omega_{Q}^{-1}\mathbf{1}=0$, it follows that $A\Omega_{Q}^{-1}A^{\top}$ is block diagonal

\[
A\Omega_{Q}^{-1}A^{\top}=\begin{bmatrix}(Z_{Q}-\mu\mathbf{1}^{\top})\Omega_{Q}^{-1}(Z_{Q}-\mu\mathbf{1}^{\top})^{\top}&0\\ 0&\mathbf{1}^{\top}\Omega^{-1}_{Q}\mathbf{1}\end{bmatrix}
\]

so that plugging into (56) yields

\[
\gamma^{*}_{Q}=\Omega_{Q}^{-1}(Z_{Q}-\mu\mathbf{1}^{\top})^{\top}\left[(Z_{Q}-\mu\mathbf{1}^{\top})\Omega_{Q}^{-1}(Z_{Q}-\mu\mathbf{1}^{\top})^{\top}\right]^{-1}(v-\mu)+\frac{\Omega^{-1}_{Q}\mathbf{1}}{\mathbf{1}^{\top}\Omega^{-1}_{Q}\mathbf{1}}
\]

This is equivalent to the implied regression weights for the estimate of $\beta^{\top}b$, where $\beta$ is estimated by solving the GLS problem on the set $Q$ given an outcome vector $Y$,

\[
\min_{\beta}\ (Y_{Q}-A^{\top}\beta)^{\top}\Omega_{Q}^{-1}(Y_{Q}-A^{\top}\beta)
\]

where $\hat{\beta}=(A\Omega_{Q}^{-1}A^{\top})^{-1}A\Omega_{Q}^{-1}Y_{Q}$, proving the result. ∎
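The closed-form weights above are easy to verify numerically. A minimal sketch with $\Omega_{Q}=I$ (as used in the proof of Proposition 7; the data here are hypothetical), checking that the resulting weights satisfy both constraints:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 10, 3
Z = rng.normal(size=(q, n))         # columns are the covariate vectors of the retained units
Omega_inv = np.eye(n)               # Omega_Q = I
mu = Z.mean(axis=1)                 # with Omega_Q = I, mu is the simple column mean
v = mu + 0.01 * rng.normal(size=q)  # a target vector near the mean

# gamma_Q* = Omega_Q^{-1} (Z - mu 1^T)^T [(Z - mu 1^T) Omega_Q^{-1} (Z - mu 1^T)^T]^{-1} (v - mu)
#            + Omega_Q^{-1} 1 / (1^T Omega_Q^{-1} 1)
Zc = Z - mu[:, None]
gamma = Omega_inv @ Zc.T @ np.linalg.solve(Zc @ Omega_inv @ Zc.T, v - mu)
gamma += Omega_inv @ np.ones(n) / (np.ones(n) @ Omega_inv @ np.ones(n))

assert np.allclose(Z @ gamma, v)     # balance constraint Z_Q gamma = v
assert np.isclose(gamma.sum(), 1.0)  # weights sum to one
```

The sketch checks only feasibility of the closed form; non-negativity is not imposed here, since in the proposition it is inherited from the solution to (25) on the support set $Q$.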

Appendix B Covariate and Adjustment Details

In this section we detail our estimation of the observed covariates $W$ and our adjusted covariates $\hat{X}$.

B.1 Unadjusted covariate estimates

To estimate our CPUMA-level covariates $W$ from the ACS microdata, we must estimate both numerator and denominator counts, since we are ultimately interested in rates. For each CPUMA we estimate the following denominator variables: the total non-elderly adult population for each year 2011-2014; the total labor force population (among non-elderly adults) for each year 2011-2013; and the total number of households averaged over 2011-2013. We also construct an average of the total non-elderly adult population from 2011-2013. For our numerator counts, we estimate the total number of: females; White individuals; people of Hispanic ethnicity; people born outside of the United States; citizens; people with disabilities; married individuals; students; people with less than a high school education, high school degrees, some college, or college graduates or higher; people living under 138 percent of the FPL, between 139 and 299 percent, between 300 and 499 percent, at more than 500 percent, and who did not respond to the income survey question; people aged 19-29, 30-39, 40-49, and 50-64; households with one, two, or three or more children; and households that did not respond about the number of children. We average these estimated counts over 2011-2013. For each year from 2011-2013, we also estimate the total number of people who were unemployed and uninsured at the time of the survey (calculated among non-elderly adults in the labor force and among all non-elderly adults, respectively). We divide each numerator total by the corresponding denominator total to estimate the percentage in each category. For the demographic variables, the denominator is the average number of non-elderly adults from 2011-2013. For the time-varying variables, we use the denominator from the corresponding year (where unemployment rates are calculated as a fraction of the labor force rather than the non-elderly adult population).
We also calculate the average non-elderly adult population growth and the average adult-to-household ratio across 2011-2013. All estimates are calculated using the associated set of survey weights provided in the public use microdata files.

B.2 Covariate adjustment

We provide additional details about estimating $\tilde{X}$. We begin by estimating the unpooled unit-level covariance matrices $\Sigma_{\nu,sc}^{\text{raw}}$, the sampling variability for each CPUMA among the treated units, using the individual replicate survey weights to generate $b=80$ additional CPUMA-level datasets. We then compute:

\[
\hat{\Sigma}_{\nu,sc}^{\text{raw}}=\frac{4}{80}\sum_{b=1}^{80}(W_{b,sc}-\bar{W}_{sc})(W_{b,sc}-\bar{W}_{sc})^{\top} \tag{57}
\]

where the $4$ in the numerator comes from the process used to generate the replicate survey weights and $\bar{W}_{sc}$ is the vector of covariate values estimated using the original ACS weights.
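A sketch of the replicate-weight computation in (57) for one CPUMA (array names hypothetical; the $4/80$ factor is written as $4/B$ so the function also works with a different number of replicates):

```python
import numpy as np

def replicate_covariance(W_reps, W_orig):
    """Estimate the sampling covariance of one CPUMA's covariate vector:
    (4/B) * sum_b (W_b - W_orig)(W_b - W_orig)^T, where W_reps is a (B, q)
    array of replicate-weight estimates (B = 80 in the application) and
    W_orig is the length-q estimate under the original ACS weights."""
    D = W_reps - W_orig
    return (4.0 / W_reps.shape[0]) * D.T @ D
```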

We let $\hat{\Sigma}_{W}=n_{1}^{-1}\sum_{sc:A_{s}=1}(W_{sc}-\bar{W}_{1})(W_{sc}-\bar{W}_{1})^{\top}$, calculated on the original observed dataset. We then estimate $\Sigma_{X}$ using:

\[
\hat{\Sigma}_{X}=\hat{\Sigma}_{W}-n_{1}^{-1}\sum_{sc:A_{sc}=1}\hat{\Sigma}_{\nu,sc}^{\text{raw}} \tag{58}
\]

Define

\[
\hat{\kappa}=\hat{\Sigma}_{W}^{-1}\hat{\Sigma}_{X} \tag{59}
\]

Notice that $\hat{\kappa}$ is the matrix of estimated coefficients from a linear regression of the (unobserved) matrix $X_{sc}$ on the (observed) matrix $W_{sc}$. We can then estimate $\mathbb{E}[X_{sc}\mid W_{sc},A_{sc}=1]$ using:

\[
\hat{X}_{sc}=\hat{\mathbb{E}}[X_{sc}\mid W_{sc},A_{sc}=1]=\bar{W}_{1}+\hat{\kappa}^{\top}(W_{sc}-\bar{W}_{1})\quad\forall\, sc:A_{sc}=1 \tag{60}
\]

We call this the “homogeneous adjustment” and denote the corresponding set of units in (60) by $\hat{X}_{A=1}^{\text{hom}}$. This adjustment approximately aligns with the adjustments suggested by Carroll et al. (2006) and Gleser (1992). This estimator for $\tilde{X}$ is consistent for $\mathbb{E}[X\mid W,A]$ if we assume, for example, that the measurement errors are homoskedastic (i.e., $\Sigma_{\nu,sc}=\Sigma_{\nu}$). We can theoretically weaken this assumption and still obtain a consistent estimate (see, e.g., Buonaccorsi (2010)).
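A minimal sketch of the homogeneous adjustment in (58)-(60), assuming a hypothetical $n_{1}\times q$ array `W` of treated-unit covariates and a pooled measurement-error covariance estimate `Sigma_nu`:

```python
import numpy as np

def homogeneous_adjustment(W, Sigma_nu):
    """Regression calibration with a common kappa: estimate
    Sigma_X = Sigma_W - Sigma_nu (58), kappa = Sigma_W^{-1} Sigma_X (59),
    and shrink each row of W toward the overall mean as in (60)."""
    W_bar = W.mean(axis=0)
    Wc = W - W_bar
    Sigma_W = Wc.T @ Wc / W.shape[0]
    Sigma_X = Sigma_W - Sigma_nu
    kappa = np.linalg.solve(Sigma_W, Sigma_X)  # q x q coefficient matrix
    return W_bar + Wc @ kappa                  # rows are the adjusted X_hat_sc
```

With `Sigma_nu = 0` the adjustment returns `W` unchanged; larger measurement-error variance shrinks each unit further toward the mean.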

To potentially improve upon this procedure, we also consider a second estimate that we call the “heterogeneous adjustment.” This adjustment accounts for the fact that covariates in regions with large populations are estimated quite precisely, while those in regions with small populations are estimated much less precisely (additionally, for a given CPUMA, some covariates are measured using three years of data, and others only one). For this adjustment we model the unit-level $\Sigma_{\nu,sc}$ as a function of the sample sizes used to estimate each covariate.

Specifically, let $s_{sc}$ be the $q$-dimensional vector of the sample sizes used to estimate each covariate value for a given CPUMA. Let $\odot$ denote the Hadamard product and $\oslash$ Hadamard division. We assume that $\sqrt{s_{sc}}\odot\nu_{sc}\mid A_{sc}=1\sim N(0,\Sigma_{\nu})$. Letting $S_{sc}=\sqrt{s_{sc}}\sqrt{s_{sc}}^{\top}$, we then have $\Sigma_{\nu,sc}=\Sigma_{\nu}\oslash S_{sc}$.

To estimate $\Sigma_{\nu,sc}$, we first pool our initial estimates of the CPUMA-level covariance matrices $\hat{\Sigma}_{\nu,sc}^{\text{raw}}$ to generate $\hat{\Sigma}_{\nu}=n_{1}^{-1}\sum_{sc:A_{s}=1}S_{sc}\odot\hat{\Sigma}_{\nu,sc}^{\text{raw}}$, and then estimate $\hat{\Sigma}_{\nu,sc}=\hat{\Sigma}_{\nu}\oslash S_{sc}$. Using the same estimate of $\hat{\Sigma}_{X}$ as before, we calculate $\hat{\kappa}_{sc}=(\hat{\Sigma}_{X}+\hat{\Sigma}_{\nu,sc})^{-1}\hat{\Sigma}_{X}$, which we use to form the heterogeneous adjustment. We call the corresponding set of estimates that use this adjustment $\hat{X}_{A=1}^{\text{het}}$.

This adjustment accounts for CPUMA-level variability in the measurement error, and should most affect outlying values of imprecisely estimated covariates, while leaving precisely estimated covariates closer to their observed values. Moreover, unlike using the original individual CPUMA-level estimates $\hat{\Sigma}_{\nu,sc}^{\text{raw}}$, we gain the efficiency of using all units in the modeling. On the other hand, this model attributes all differences in sampling variability to sample size, assuming away heterogeneity due to heteroskedasticity. In our simulation results in Appendix F, we also find that this estimator leads to bias when the measurement errors are in truth homoskedastic.
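A sketch of the heterogeneous adjustment under the sample-size model above (names hypothetical; `sample_sizes` holds the vectors $s_{sc}$ as rows):

```python
import numpy as np

def heterogeneous_adjustment(W, Sigma_X, Sigma_nu, sample_sizes):
    """Per-unit regression calibration: Sigma_nu_sc = Sigma_nu / (sqrt(s) sqrt(s)^T),
    kappa_sc = (Sigma_X + Sigma_nu_sc)^{-1} Sigma_X, applied row by row."""
    W_bar = W.mean(axis=0)
    X_hat = np.empty_like(W)
    for i in range(W.shape[0]):
        root_s = np.sqrt(sample_sizes[i])
        S_sc = np.outer(root_s, root_s)  # sqrt(s_sc) sqrt(s_sc)^T
        kappa_sc = np.linalg.solve(Sigma_X + Sigma_nu / S_sc, Sigma_X)
        X_hat[i] = W_bar + kappa_sc.T @ (W[i] - W_bar)
    return X_hat
```

Units with larger sample sizes get a smaller implied measurement-error covariance and hence less shrinkage, matching the discussion above.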

As a final adjustment, we can also build on the homogeneous adjustment to account for the correlation structure of the data. To motivate this adjustment we view the entire set of covariates among the treated states, $X=(X_{11},\ldots,X_{1p_{1}},\ldots,X_{sp_{s}})$, as a single draw from $MVN(\boldsymbol{\upsilon}_{1},\boldsymbol{\Sigma}_{X})$, where $\boldsymbol{\upsilon}_{1}$ is a $qn_{1}$-dimensional vector that repeats the $q$-dimensional vector of means $\upsilon_{1}$ (defined previously) $n_{1}$ times. Conditional on $X$, $W=(W_{11},\ldots,W_{1p_{1}},\ldots,W_{sp_{s}})$ is a single draw from $MVN(X,\boldsymbol{\Sigma}_{\nu})$, where $\boldsymbol{\Sigma}_{\nu}$ is a block-diagonal matrix with $\Sigma_{\nu}$ in each diagonal block. That is, we assume that the measurement errors are drawn independently across units with constant covariance matrix $\Sigma_{\nu}$.

We then let ΣB\Sigma_{B} represent the covariance between CPUMAs that share a state, which we assume is constant across all CPUMAs and states.

Let $\boldsymbol{\Sigma}_{W}=\boldsymbol{\Sigma}_{X}+\boldsymbol{\Sigma}_{\nu}$. $(X,W)$ are then jointly normal with common mean $\boldsymbol{\upsilon}_{1}$ and covariance matrix $\boldsymbol{\Sigma}$:

\[
\boldsymbol{\Sigma}=\begin{pmatrix}\boldsymbol{\Sigma}_{X}&\boldsymbol{\Sigma}_{X}\\ \boldsymbol{\Sigma}_{X}&\boldsymbol{\Sigma}_{W}\end{pmatrix} \tag{61}
\]

Notice that $\boldsymbol{\Sigma}_{X}$ and $\boldsymbol{\Sigma}_{W}$ are block-diagonal matrices with $qp_{s}\times qp_{s}$ blocks indexed by $s$, which we denote $\Sigma_{X,s}$ and $\Sigma_{W,s}$. Each block is composed of the $q\times q$ matrices defined previously:

\[
\Sigma_{X,s}=\begin{pmatrix}\Sigma_{X}&\Sigma_{B}&\cdots&\Sigma_{B}\\ \Sigma_{B}&\Sigma_{X}&\cdots&\Sigma_{B}\\ \vdots&\vdots&\ddots&\vdots\\ \Sigma_{B}&\Sigma_{B}&\cdots&\Sigma_{X}\end{pmatrix};\qquad
\Sigma_{W,s}=\begin{pmatrix}\Sigma_{W}&\Sigma_{B}&\cdots&\Sigma_{B}\\ \Sigma_{B}&\Sigma_{W}&\cdots&\Sigma_{B}\\ \vdots&\vdots&\ddots&\vdots\\ \Sigma_{B}&\Sigma_{B}&\cdots&\Sigma_{W}\end{pmatrix} \tag{62}
\]

where $\Sigma_{X}$, $\Sigma_{W}$, and $\Sigma_{B}$ were defined previously. Let $\boldsymbol{\kappa}=\boldsymbol{\Sigma}_{W}^{-1}\boldsymbol{\Sigma}_{X}$. By the conditional normal distribution, we then have:

\[
\mathbb{E}[X\mid W,A=1]=\boldsymbol{\upsilon}_{1}+\boldsymbol{\kappa}^{\top}(W-\boldsymbol{\upsilon}_{1}) \tag{63}
\]

To estimate this quantity we need only an estimate of $\Sigma_{B}$ in addition to the previous estimates. We first generate state-specific estimates $\hat{\Sigma}_{B,s}=\frac{2}{p_{s}(p_{s}-1)}\sum_{c<d}(W_{sc}-\bar{W})(W_{sd}-\bar{W})^{\top}$ and then take $\hat{\Sigma}_{B}$ to be the average of these estimates. We then replace the other quantities with their empirical counterparts to generate the adjustment set $\hat{X}_{A=1}^{\text{cor}}$.

We do not use this adjustment for our application: in practice we find that it adds substantial variability to the observed data, predicting values that frequently fall outside of the support of the original dataset. In simulations in Appendix F we also find that this adjustment adds considerable variability to the final estimates given $m_{1}=25$ treated states.
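For reference, a sketch of the state-specific pairwise estimate $\hat{\Sigma}_{B,s}$ used above (names hypothetical; the sum over $c<d$ is computed in symmetrized form via the identity $\sum_{c\neq d}D_{c}D_{d}^{\top}=(\sum_{c}D_{c})(\sum_{c}D_{c})^{\top}-\sum_{c}D_{c}D_{c}^{\top}$):

```python
import numpy as np

def state_between_covariance(W_s, W_bar):
    """2 / (p (p - 1)) * sum_{c < d} (W_sc - W_bar)(W_sd - W_bar)^T, symmetrized.
    W_s is the (p_s, q) array of CPUMA covariates for one state; W_bar is the
    overall treated-group mean."""
    D = W_s - W_bar
    p = D.shape[0]
    # sum over all ordered pairs c != d, then normalize by p (p - 1)
    outer_sum = np.outer(D.sum(axis=0), D.sum(axis=0))
    return (outer_sum - D.T @ D) / (p * (p - 1))
```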

Appendix C Summary Statistics and Covariates

In this section we display summary statistics about the CPUMA-level datasets. The first two tables pertain to treatment assignment classifications. Table 4 lists the states that are assigned to each group: the first two columns include the treatment states and control states in our primary analysis. The third column lists the treatment states for our sensitivity analysis that excludes “early expansion” states. The final column indicates states that were always excluded from the analysis. Table 5 displays the total number of CPUMAs per state, as well as a column reiterating the state’s treatment assignment and whether it was an early expansion state.

The subsequent tables and figure display summary information about the expansion state data and the homogeneous and heterogeneous covariate adjustments detailed in Appendix B.

Table 6 displays univariate summary statistics for the treated CPUMAs. Specifically, the table displays the mean, interquartile range, and range (defined as the maximum value minus the minimum value) for the unadjusted dataset, the heterogeneous adjustment, and the homogeneous adjustment. We see that the covariate adjustments generally reduce the variability relative to the unadjusted data.

Table 7 displays the frequency that the adjusted covariates fell outside of the support of the unadjusted dataset on our primary dataset. The frequency is comparable for either adjustment and the counts are low, supporting the use of the linear model outlined in (8). We also calculate these adjustments excluding early expansion states, and recalculate these adjustments excluding each state one at a time to calculate our variance estimates, yielding different results with respect to the quality of the resulting adjustments. These results are available on request.

Table 8 displays the trends in the outcome over time by treatment group. We use these estimates to compute the difference-in-differences estimator of the ETT in footnote 24.

Figure 3 displays the Pearson's correlation coefficients for the bivariate relationships between the covariates on the unadjusted dataset (including both treated and untreated units). These point estimates may be biased due to the measurement error in the covariates. Nevertheless, this matrix is useful for at least two reasons: first, assuming the correlations among the treated and untreated units are similar, the more heavily correlated the data, the easier it should be to attain covariate balance (see, e.g., D'Amour et al. (2021)). This matrix gives a general sense of how correlated the data are, even if the estimates are biased. Second, these correlations can suggest potential confounders by revealing which variables are most heavily associated with treatment assignment and the pre-treatment outcomes. For example, the plot shows a strong association between Republican governance and treatment assignment, and a weaker association between these variables and the pre-treatment outcomes. The plot also illustrates strong associations among the pre-treatment uninsurance rates, though they are more weakly associated with treatment assignment.

Table 4: Treatment assignment classification
Treated states Control states Early expansion states Always excluded
AR, AZ, CA, CO, CT, HI, IA, IL, KY, MD, MI$^{\textrm{a}}$, MN, ND, NJ, NM, NV, OH, OR, RI, WA, WV AK, AL, FL, GA, ID, IN, KS, LA, ME, MO, MS, MT, NC, NE, OK, PA, SC, SD, TN, TX, UT, VA, WI, WY CA, CT, MN, NJ, WA DE$^{\textrm{c}}$, MA$^{\textrm{c}}$, NH$^{\textrm{b}}$, NY$^{\textrm{c}}$, VT$^{\textrm{c}}$, DC$^{\textrm{c}}$

$^{\textrm{a}}$ Expanded April 2014

$^{\textrm{b}}$ Expanded September 2014; included for covariate adjustment estimates but not as a possible weight donor for treatment effect estimates

$^{\textrm{c}}$ Comparable coverage policies prior to 2014

Table 5: Number of CPUMAs per state
State Full State Treatment Number CPUMAs Early Expansion
Delaware DE Excluded 4 No
Massachusetts MA Excluded 15 No
New Hampshire NH Excluded$^{\textrm{a}}$ 4 No
New York NY Excluded 123 No
Vermont VT Excluded 4 No
Arizona AZ Expansion 11 No
Arkansas AR Expansion 15 No
Colorado CO Expansion 15 No
Hawaii HI Expansion 8 No
Illinois IL Expansion 47 No
Iowa IA Expansion 7 No
Kentucky KY Expansion 23 No
Maryland MD Expansion 36 No
Michigan MI Expansion 44 No
Nevada NV Expansion 7 No
New Mexico NM Expansion 6 No
North Dakota ND Expansion 2 No
Ohio OH Expansion 44 No
Oregon OR Expansion 17 No
Rhode Island RI Expansion 6 No
West Virginia WV Expansion 4 No
California CA Expansion 110 Yes
Connecticut CT Expansion 22 Yes
Minnesota MN Expansion 27 Yes
New Jersey NJ Expansion 38 Yes
Washington WA Expansion 22 Yes
Alabama AL Non-expansion 18 No
Alaska AK Non-expansion 4 No
Florida FL Non-expansion 59 No
Georgia GA Non-expansion 20 No
Idaho ID Non-expansion 1 No
Indiana IN Non-expansion 24 No
Kansas KS Non-expansion 9 No
Louisiana LA Non-expansion 15 No
Maine ME Non-expansion 5 No
Mississippi MS Non-expansion 7 No
Missouri MO Non-expansion 16 No
Montana MT Non-expansion 1 No
Nebraska NE Non-expansion 11 No
North Carolina NC Non-expansion 27 No
Oklahoma OK Non-expansion 8 No
Pennsylvania PA Non-expansion 55 No
South Carolina SC Non-expansion 10 No
South Dakota SD Non-expansion 1 No
Tennessee TN Non-expansion 28 No
Texas TX Non-expansion 49 No
Utah UT Non-expansion 8 No
Virginia VA Non-expansion 15 No
Wisconsin WI Non-expansion 21 No
Wyoming WY Non-expansion 2 No

$^{\textrm{a}}$ Included for covariate adjustment estimates but not as a possible weight donor for treatment effect estimates

Table 6: Univariate summary statistics on adjusted data, primary dataset
(Mean, IQR, Range)
Variable Unadjusted Heterogeneous Homogeneous
Age: 19-29 Pct (24.5, 6, 30.9) (24.5, 5.9, 29) (24.5, 5.9, 29)
Age: 30-39 Pct (20.9, 3.4, 20.9) (20.9, 3.1, 19.1) (20.9, 3.1, 19.4)
Age: 40-49 Pct (22.2, 2.5, 15.4) (22.2, 2.3, 13.7) (22.2, 2.2, 13.7)
Avg Adult to Household Ratio (151, 27.2, 174.3) (151, 27.2, 173.8) (151, 27.2, 173.3)
Avg Pop Growth (100.3, 1.9, 13.7) (100.3, 1.2, 6.2) (100.3, 1.2, 6.5)
Children: Missing Pct (10.5, 6.6, 41) (10.5, 6.5, 40.8) (10.5, 6.5, 40.5)
Children: One Pct (11.1, 3.1, 14.3) (11.1, 2.8, 12.3) (11.1, 2.8, 12.5)
Children: Three or More Pct (5.2, 2, 14.1) (5.2, 1.7, 13.5) (5.2, 1.7, 13.3)
Children: Two Pct (9.7, 3.5, 15) (9.7, 3.3, 13.5) (9.7, 3.2, 13.6)
Citizenship Pct (90, 11.9, 57.1) (90, 11.8, 55.4) (90, 11.7, 55.7)
Disability Pct (10.5, 5.3, 28.6) (10.4, 5.3, 26.7) (10.5, 5.4, 27.2)
Educ: HS Degree Pct (26.3, 10.7, 43.2) (26.3, 10.6, 41.8) (26.3, 10.6, 42)
Educ: Less than HS Pct (11.4, 7.7, 45.3) (11.4, 7.5, 45) (11.4, 7.4, 44.6)
Educ: Some College Pct (33.5, 7.9, 34.2) (33.5, 7.5, 32.9) (33.5, 7.4, 33)
Female Pct (50.1, 1.6, 15.4) (50.1, 1.4, 14.2) (50.1, 1.5, 14.2)
Foreign Born Pct (18.1, 22.4, 76) (18.1, 22.2, 75.2) (18.1, 22.2, 75.4)
Hispanic Pct (15.9, 17.7, 97.2) (15.9, 17.7, 97) (15.9, 17.7, 97)
Inc Pov: < 138 Pct (20, 11.9, 45.6) (20, 11.8, 44.8) (20, 11.8, 43.9)
Inc Pov: 139-299 Pct (24.9, 8.4, 34.2) (24.9, 7.9, 34.2) (24.9, 7.8, 34.1)
Inc Pov: 300-499 Pct (23.6, 5.5, 23) (23.6, 4.9, 22.2) (23.6, 4.9, 22.2)
Inc Pov: 500 + Pct (29.3, 18.5, 69.1) (29.3, 18.5, 68.1) (29.3, 18.5, 68)
Married Pct (50.7, 9.4, 45.1) (50.7, 9, 44.1) (50.7, 9.1, 44.2)
Race: White Pct (73.8, 25.4, 91.9) (73.8, 25.5, 91.7) (73.8, 25.5, 91.8)
Republican Governor 2013 ^a (31.1, 100, 100) (31.1, 100, 100) (31.1, 100, 100)
Republican Lower Leg Control 2013 ^a (24.1, 0, 100) (24.1, 0, 100) (24.1, 0, 100)
Republican Total Control 2013 ^a (19.8, 0, 100) (19.8, 0, 100) (19.8, 0, 100)
Student Pct (11.7, 3.4, 29.5) (11.7, 3.5, 28.1) (11.7, 3.5, 28)
Unemployed Pct 2011 (10.2, 4.6, 25.5) (10.2, 3.9, 23.8) (10.2, 3.9, 22.5)
Unemployed Pct 2012 (9.4, 4.5, 28.3) (9.4, 4.3, 23.6) (9.4, 4.3, 23.5)
Unemployed Pct 2013 (8.4, 3.6, 23.4) (8.4, 3.5, 20.1) (8.4, 3.5, 20.5)
Uninsured Pct 2011 (19.6, 11.2, 59) (19.7, 10.9, 51.8) (19.6, 10.9, 52.5)
Uninsured Pct 2012 (19.4, 9.9, 50.6) (19.4, 10.1, 49.7) (19.4, 10.3, 50.2)
Uninsured Pct 2013 (19, 11.2, 49.9) (19, 10.3, 48.2) (19, 10.5, 48.7)
Urban Pct ^b (82.9, 31.3, 91.3) (82.9, 31.3, 91.3) (82.9, 31.3, 91.3)

^a Derived from data obtained from the National Conference of State Legislatures

^b Derived from the 2010 Census

Table 7: Frequency of covariate adjustments falling outside the support of the unadjusted data
(primary dataset)
Variables Heterogeneous Homogeneous
Age: 19-29 Pct 0 0
Age: 30-39 Pct 0 0
Age: 40-49 Pct 0 0
Avg Adult to Household Ratio 0 0
Citizenship Pct 1 0
Disability Pct 2 2
Educ: HS Degree Pct 0 0
Educ: Less than HS Pct 1 1
Educ: Some College Pct 0 0
Female Pct 0 0
Foreign Born Pct 1 1
Uninsured Pct 2011 0 0
Uninsured Pct 2012 1 1
Uninsured Pct 2013 0 1
Hispanic Pct 0 0
Inc Pov: < 138 Pct 1 1
Inc Pov: 139-299 Pct 1 1
Inc Pov: 300-499 Pct 0 0
Inc Pov: 500 + Pct 0 0
Married Pct 0 0
Children: Missing Pct 1 1
Children: One Pct 0 0
Avg Pop Growth 0 0
Race: White Pct 1 1
Student Pct 0 0
Children: Three or More Pct 2 1
Children: Two Pct 1 1
Unemployed Pct 2011 0 0
Unemployed Pct 2012 0 0
Unemployed Pct 2013 0 0
Table 8: Mean non-elderly adult uninsurance rates, 2009-2014
Treatment Group 2009 2010 2011 2012 2013 2014
Non-expansion 21.84 22.97 22.72 22.41 22.01 19.07
Expansion (primary dataset) 19.52 20.20 19.63 19.42 19.01 14.02
Expansion (early excluded) 19.40 20.08 19.21 19.01 18.55 13.64
Figure 3: Correlation matrix: full data, unadjusted covariates
(primary dataset)
Refer to caption

Appendix D Weight Diagnostics

This section contains additional information pertaining to the generation of the balancing weights and their properties.

Table 9 displays each covariate with the targeted level of maximal imbalance $\delta$ used when generating approximate balancing weights. All variables and tolerances are measured in percentage points.

Table 10 displays the differences between the weighted mean covariate values of the expansion regions and the mean of the non-expansion regions for our primary dataset and for the dataset excluding the early expansion states (calculated using the homogeneous covariate adjustment). The weights presented here are for the H-SBW estimator. The values under each column are in the following format: (unweighted difference, weighted difference). “Primary” and “Early excluded” refer to the primary dataset and the dataset that excludes the early expansion states. “Percent” indicates that the differences displayed are in percentage points, while “Standardized” indicates standardized mean differences. Additional results are available on request.

Figure 4 shows the weights summed by state for the H-SBW, BC-HSBW, SBW, and BC-SBW estimators on the primary dataset. The positive and negative weights are displayed separately and we standardize the weights to sum to 100. Figure 5 displays the same plot excluding the early expansion states. These plots again show that H-SBW and BC-HSBW disperse the weights across states more evenly than SBW and BC-SBW, and also highlight the extent to which the weights extrapolate from each state for the bias-corrected estimators.

We conclude by examining whether the H-SBW weights generated using the unadjusted data balance the adjusted covariates. While these metrics do not reflect the “true” imbalances, the comparison can indicate whether the unadjusted weights are overfitting to noisy covariate measurements. Table 11 compares the imbalances among our pre-treatment outcomes using H-SBW weights generated on our unadjusted dataset applied to the adjusted (homogeneous) dataset. The “Unweighted Difference” column represents the raw difference in means, while the “Weighted Difference” column reflects the weighted difference calculated on the unadjusted dataset. The “Homogeneous Difference” column displays the weighted imbalance when applying the H-SBW weights to the dataset using the homogeneous adjustment, and similarly for “Heterogeneous Difference.” The weighted pre-treatment outcomes are approximately one percentage point lower than desired in the two years prior to treatment using the heterogeneous adjustment, and 0.2 percentage points lower on average using the homogeneous adjustment. By contrast, the naive difference suggests an imbalance of only -0.05 percentage points. This result suggests that the unadjusted weights are overfitting to noisy covariates and may give an overly optimistic view of the covariate balance.
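As an illustration, the weighted and standardized mean differences reported in these balance tables can be computed as follows (a minimal sketch with hypothetical function names, not the authors' code):

```python
import numpy as np

def balance_table(X_donor, target_mean, weights):
    """Weighted difference between the reweighted donor-group covariate means
    and a target mean vector, in raw units and standardized by the donor SD."""
    w = weights / weights.sum()          # normalize weights to sum to one
    diff = X_donor.T @ w - target_mean   # weighted mean difference per covariate
    std_diff = diff / X_donor.std(axis=0, ddof=1)
    return diff, std_diff
```

With uniform weights and the donor group's own means as the target, both differences are zero by construction, which is a useful sanity check.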

Table 9: Variables and maximal level of targeted imbalances ($\delta$)
Variable $\delta$
Uninsured Pct 2011 0.05
Uninsured Pct 2012 0.05
Uninsured Pct 2013 0.05
Unemployed Pct 2011 0.15
Unemployed Pct 2012 0.15
Unemployed Pct 2013 0.15
Avg Pop Growth 0.50
Avg Adult to Household Ratio 0.50
Female Pct 1.00
Age: 19-29 Pct 1.00
Age: 30-39 Pct 1.00
Age: 40-49 Pct 1.00
Age: 50-64 Pct 1.00
Married Pct 1.00
Disability Pct 1.00
Hispanic Pct 1.00
Race: White Pct 1.00
Children: One Pct 1.00
Children: Two Pct 1.00
Children: Three or More Pct 1.00
Children: Missing Pct 1.00
Urban Pct 2.00
Citizenship Pct 2.00
Educ: Less than HS Pct 2.00
Educ: HS Degree Pct 2.00
Educ: Some College Pct 2.00
Educ: College Pct 2.00
Student Pct 2.00
Inc Pov: < 138 Pct 2.00
Inc Pov: 139-299 Pct 2.00
Inc Pov: 300-499 Pct 2.00
Inc Pov: 500 + Pct 2.00
Inc Pov: NA Pct 2.00
Race: Other Pct 2.00
Foreign Born Pct 2.00
Republican Governor 2013 25.00
Republican Lower Leg Control 2013 25.00
Republican Total Control 2013 25.00
Table 10: Balance table: percent and standardized mean differences (H-SBW weights on primary dataset, homogeneous adjustment)
(Unweighted difference, Weighted difference)
Variables Preferred (Percent) Preferred (Standardized) Early excluded (Percent) Early excluded (Standardized)
Age: 19-29 Pct (-0.34, -0.34) (-0.05, -0.05) (-0.62, -0.21) (-0.09, -0.03)
Age: 30-39 Pct (0.36, 0.17) (0.1, 0.05) (-0.04, 0.32) (-0.01, 0.09)
Age: 40-49 Pct (0.19, -0.3) (0.06, -0.1) (-0.01, -0.44) (0, -0.15)
Avg Adult to Household Ratio (11.29, -0.04) (0.37, 0) (3.37, 0.1) (0.13, 0)
Citizenship Pct (-3.61, -1.59) (-0.33, -0.15) (-0.24, -1.45) (-0.03, -0.16)
Disability Pct (-1.45, 0.52) (-0.27, 0.1) (-0.17, 0.63) (-0.03, 0.11)
Educ: HS Degree Pct (-3.37, 0.54) (-0.32, 0.05) (-1.02, 0.64) (-0.1, 0.06)
Educ: Less than HS Pct (-0.37, 0.83) (-0.04, 0.1) (-1.22, 0.76) (-0.16, 0.1)
Educ: Some College Pct (-0.35, 0.4) (-0.05, 0.06) (0.36, 0.57) (0.05, 0.08)
Female Pct (-0.34, -0.64) (-0.16, -0.3) (-0.25, -1) (-0.12, -0.48)
Foreign Born Pct (7.6, 2) (0.42, 0.11) (1.02, 2) (0.07, 0.13)
Uninsured Pct 2011 (-3.08, 0.05) (-0.28, 0) (-3.51, -0.05) (-0.34, 0)
Uninsured Pct 2012 (-3, -0.05) (-0.27, 0) (-3.4, 0.05) (-0.33, 0)
Uninsured Pct 2013 (-2.99, -0.05) (-0.27, 0) (-3.45, -0.05) (-0.34, 0)
Hispanic Pct (4.46, 1) (0.2, 0.04) (-1.35, 1) (-0.07, 0.05)
Inc Pov: < 138 Pct (-2.05, 0.63) (-0.19, 0.06) (-1.33, 0.12) (-0.12, 0.01)
Inc Pov: 139-299 Pct (-2.45, 0.65) (-0.35, 0.09) (-1.53, 0.5) (-0.23, 0.08)
Inc Pov: 300-499 Pct (-0.59, -0.18) (-0.12, -0.04) (0.28, -0.18) (0.06, -0.04)
Inc Pov: 500 + Pct (5.58, -1.3) (0.35, -0.08) (2.9, -1.23) (0.2, -0.08)
Married Pct (-0.76, -0.43) (-0.07, -0.04) (-0.21, -0.53) (-0.02, -0.05)
Children: Missing Pct (-3.25, -1) (-0.36, -0.11) (-1.99, -0.1) (-0.21, -0.01)
Children: One Pct (0.7, -0.14) (0.25, -0.05) (0.11, -0.31) (0.04, -0.12)
Avg Pop Growth (-0.09, -0.21) (-0.07, -0.18) (-0.26, -0.19) (-0.22, -0.16)
Race: White Pct (-4.02, 1) (-0.16, 0.04) (0.09, 1) (0, 0.04)
Republican Governor 2013 (-64.78, -25) (-1.28, -0.5) (-54.46, -24.87) (-1.02, -0.47)
Republican Lower Leg Control 2013 (-74.72, -25) (-1.69, -0.57) (-56.67, -23.6) (-1.12, -0.47)
Republican Total Control 2013 (-71.3, -25) (-1.45, -0.51) (-56.47, -25) (-1.02, -0.45)
Student Pct (0.25, -0.5) (0.04, -0.08) (0.11, -0.25) (0.02, -0.04)
Children: Three or More Pct (0, -0.21) (0, -0.08) (-0.17, -0.26) (-0.07, -0.11)
Children: Two Pct (0.76, -0.31) (0.23, -0.09) (0.17, -0.37) (0.05, -0.12)
Unemployed Pct 2011 (0.82, 0.15) (0.18, 0.03) (0.68, 0.15) (0.15, 0.03)
Unemployed Pct 2012 (0.63, -0.03) (0.14, -0.01) (0.47, -0.03) (0.11, -0.01)
Unemployed Pct 2013 (0.42, -0.15) (0.11, -0.04) (0.22, -0.15) (0.06, -0.04)
Urban Pct (8.28, -2) (0.26, -0.06) (2.79, -2) (0.08, -0.06)
  • The values displayed in each cell are the (unweighted, weighted) differences. The columns containing “Standardized” reflect the standardized mean differences, while “Percent” indicates the mean differences in percentage points. The columns containing “Preferred” are for our primary analysis, while “Early excluded” is for the analysis that excludes the early expansion states.

Figure 4: Total weights summed by state, primary dataset
Refer to caption
Figure 5: Total weights summed by state, early expansion excluded
Refer to caption
Table 11: Balance comparison: weights estimated on unadjusted data applied to adjusted data
(primary dataset)
Variable Unweighted Difference Weighted Difference Homogeneous Difference Heterogeneous Difference
Uninsured Pct 2011 -3.09 -0.05 -0.11 0.92
Uninsured Pct 2012 -2.99 -0.05 -0.21 -1.06
Uninsured Pct 2013 -3.00 -0.05 -0.38 -0.93

Appendix E Additional Results

This section contains additional results that were not displayed in the main paper.

Tables 12 and 13 present the full model validation results, which also include estimators using the Oaxaca-Blinder OLS and GLS weights (see, e.g., Kline (2011)). The first table presents only the mean error and RMSE, with rows ordered by RMSE for each dataset, while the second contains the errors for each individual year and the RMSE, with rows ordered by RMSE on the primary dataset. We find that the OLS and GLS estimators perform quite poorly, again showing that the cost of extrapolation is quite high in our application.

Table 14 displays the point estimates and confidence intervals from all estimators we considered, including those using the “heterogeneous adjustment.”

Figure 6(a) displays the change in our estimator when removing each state for all four of our estimators on the adjusted dataset (“homogeneous”) and the unadjusted dataset (“unadjusted”) for the primary dataset. Figure 7(a) displays the same information when excluding the early expansion states.

Our final two tables (Tables 15 and 16) display each point estimate associated with each estimator in the leave-one-state-out jackknife.

Table 12: Estimator pre-treatment outcome mean prediction error and RMSE (in % points)
Primary dataset Early expansion excluded
Sigma estimate Estimator Mean Error RMSE Sigma estimate Estimator Mean Error RMSE
Homogeneous SBW -0.20 0.20 Homogeneous BC-HSBW -0.02 0.07
Homogeneous H-SBW -0.23 0.23 Homogeneous BC-SBW -0.03 0.12
Heterogeneous SBW -0.27 0.27 Heterogeneous BC-HSBW -0.08 0.14
Heterogeneous H-SBW -0.35 0.36 Heterogeneous BC-SBW -0.07 0.15
Homogeneous BC-SBW -0.39 0.39 Homogeneous H-SBW 0.01 0.25
Heterogeneous BC-SBW -0.42 0.42 Homogeneous SBW 0.07 0.26
Unadjusted SBW -0.56 0.56 Heterogeneous SBW 0.04 0.28
Unadjusted H-SBW -0.57 0.57 Heterogeneous H-SBW -0.04 0.29
Homogeneous BC-HSBW -0.58 0.58 Heterogeneous GLS -0.02 0.29
Heterogeneous BC-HSBW -0.63 0.63 Heterogeneous OLS -0.14 0.39
Unadjusted BC-SBW -0.88 0.88 Unadjusted SBW -0.37 0.42
Unadjusted OLS -0.88 0.88 Unadjusted H-SBW -0.43 0.46
Unadjusted GLS -0.89 0.89 Unadjusted BC-HSBW -0.60 0.60
Unadjusted BC-HSBW -0.96 0.96 Unadjusted GLS -0.61 0.63
Homogeneous OLS -1.48 1.50 Unadjusted OLS -0.70 0.71
Homogeneous GLS -1.60 1.61 Unadjusted BC-SBW -0.70 0.71
Heterogeneous GLS -9.39 12.47 Homogeneous GLS 0.28 0.98
Heterogeneous OLS -11.88 16.21 Homogeneous OLS 0.02 1.10
Table 13: Estimator pre-treatment outcome prediction error (in % points)
Primary dataset Early expansion excluded
Sigma estimate Estimator 2012 error 2013 error RMSE 2012 error 2013 error RMSE
Homogeneous SBW -0.18 -0.22 0.20 0.32 -0.18 0.26
Homogeneous H-SBW -0.24 -0.21 0.23 0.26 -0.24 0.25
Heterogeneous SBW -0.25 -0.30 0.27 0.31 -0.24 0.28
Heterogeneous H-SBW -0.32 -0.39 0.36 0.25 -0.32 0.29
Homogeneous BC-SBW -0.42 -0.35 0.39 0.09 -0.15 0.12
Heterogeneous BC-SBW -0.45 -0.39 0.42 0.07 -0.21 0.15
Unadjusted SBW -0.50 -0.61 0.56 -0.18 -0.56 0.42
Unadjusted H-SBW -0.52 -0.61 0.57 -0.26 -0.60 0.46
Homogeneous BC-HSBW -0.53 -0.62 0.58 0.05 -0.09 0.07
Heterogeneous BC-HSBW -0.53 -0.72 0.63 0.03 -0.19 0.14
Unadjusted BC-SBW -0.82 -0.93 0.88 -0.55 -0.84 0.71
Unadjusted OLS -0.91 -0.84 0.88 -0.81 -0.59 0.71
Unadjusted GLS -0.87 -0.91 0.89 -0.78 -0.44 0.63
Unadjusted BC-HSBW -0.93 -0.99 0.96 -0.61 -0.58 0.60
Homogeneous OLS -1.75 -1.21 1.50 1.12 -1.08 1.10
Homogeneous GLS -1.76 -1.45 1.61 1.22 -0.66 0.98
Heterogeneous GLS -1.18 -17.60 12.47 0.27 -0.32 0.29
Heterogeneous OLS -0.85 -22.90 16.21 0.22 -0.50 0.39
Table 14: Point estimates and confidence intervals: all estimators
Primary dataset Early expansion excluded
Weight type Adjustment Estimate (95% CI) Difference Estimate (95% CI) Difference
H-SBW Homogeneous -2.33 (-3.54, -1.11) 0.01 -2.09 (-3.24, -0.94) 0.19
H-SBW Heterogeneous -2.24 (-3.47, -1.00) 0.10 -2.06 (-3.36, -0.77) 0.22
H-SBW Unadjusted -2.34 (-2.88, -1.79) - -2.28 (-2.87, -1.70) -
BC-HSBW Homogeneous -2.05 (-3.30, -0.80) 0.17 -1.94 (-3.27, -0.61) 0.28
BC-HSBW Heterogeneous -1.98 (-3.21, -0.75) 0.24 -1.93 (-3.55, -0.32) 0.29
BC-HSBW Unadjusted -2.22 (-2.91, -1.52) - -2.22 (-3.14, -1.31) -
SBW Homogeneous -2.35 (-3.76, -0.95) 0.04 -2.05 (-3.19, -0.91) 0.16
SBW Heterogeneous -2.28 (-3.58, -0.98) 0.11 -2.03 (-3.35, -0.72) 0.18
SBW Unadjusted -2.39 (-2.99, -1.79) - -2.21 (-2.75, -1.68) -
BC-SBW Homogeneous -2.07 (-3.14, -1.00) 0.13 -1.99 (-3.33, -0.66) 0.23
BC-SBW Heterogeneous -2.00 (-3.07, -0.92) 0.20 -2.00 (-3.65, -0.34) 0.23
BC-SBW Unadjusted -2.19 (-2.94, -1.45) - -2.23 (-3.12, -1.33) -
  • Note: “Difference” column reflects difference between adjusted and unadjusted estimators

  • Note: Confidence intervals are estimated using the leave-one-state-out jackknife and quantiles from a $t$-distribution with $m_1 - 1$ degrees of freedom.

Figure 6: Leave-one-state-out point estimates minus original estimate, primary dataset
Refer to caption
(a) Colors reflect the magnitude of the difference in the estimates when subtracting the original estimate from the estimate that excludes the specified state. The values in the right panel are on the unadjusted data.
Figure 7: Leave-one-state-out point estimates minus original estimate, early expansion excluded
Refer to caption
(a) Colors reflect the magnitude of the difference in the estimates when subtracting the original estimate from the estimate that excludes the specified state. The values in the right panel are on the unadjusted data.
Table 15: Leave one state out jackknife point estimates: all estimators, primary dataset
Homogeneous Heterogeneous Unadjusted
Left out state BC-HSBW BC-SBW H-SBW SBW BC-HSBW BC-SBW H-SBW SBW BC-HSBW BC-SBW H-SBW SBW
All -2.05 -2.07 -2.33 -2.35 -1.98 -2.00 -2.24 -2.28 -2.22 -2.19 -2.34 -2.39
AR -2.40 -2.45 -2.59 -2.61 -2.30 -2.36 -2.46 -2.52 -2.38 -2.49 -2.33 -2.45
AZ -2.06 -2.14 -2.28 -2.31 -2.02 -2.06 -2.26 -2.25 -2.25 -2.24 -2.35 -2.39
CA -1.94 -1.93 -2.17 -2.08 -1.96 -1.90 -2.12 -2.04 -2.13 -2.14 -2.24 -2.19
CO -2.02 -2.03 -2.34 -2.34 -2.00 -1.98 -2.34 -2.34 -2.21 -2.22 -2.36 -2.44
CT -2.04 -2.07 -2.27 -2.28 -1.98 -2.00 -2.20 -2.22 -2.23 -2.25 -2.34 -2.39
HI -2.08 -2.09 -2.30 -2.33 -2.01 -2.03 -2.20 -2.26 -2.26 -2.23 -2.32 -2.39
IA -2.01 -2.10 -2.29 -2.34 -1.93 -2.04 -2.18 -2.26 -2.19 -2.23 -2.31 -2.38
IL -2.04 -1.98 -2.36 -2.29 -2.00 -1.94 -2.31 -2.26 -2.20 -2.14 -2.35 -2.40
KY -1.83 -1.85 -2.03 -1.99 -1.79 -1.82 -1.99 -1.97 -2.15 -2.18 -2.25 -2.32
MD -2.06 -2.11 -2.40 -2.40 -2.02 -2.04 -2.32 -2.33 -2.26 -2.27 -2.42 -2.46
MI -2.03 -2.02 -2.31 -2.29 -1.96 -1.88 -2.23 -2.16 -2.29 -2.24 -2.42 -2.39
MN -2.02 -2.03 -2.34 -2.38 -1.99 -1.96 -2.30 -2.33 -2.19 -2.14 -2.34 -2.39
ND -1.93 -2.01 -2.26 -2.29 -1.85 -1.93 -2.17 -2.21 -2.05 -2.15 -2.24 -2.34
NJ -2.04 -2.04 -2.32 -2.33 -2.01 -2.01 -2.28 -2.30 -2.28 -2.25 -2.41 -2.45
NM -1.95 -1.92 -2.19 -2.17 -1.94 -1.91 -2.20 -2.19 -2.19 -2.13 -2.32 -2.32
NV -2.06 -2.06 -2.36 -2.37 -1.99 -1.99 -2.27 -2.31 -2.23 -2.19 -2.36 -2.40
OH -2.43 -2.10 -2.67 -2.72 -2.41 -2.20 -2.67 -2.64 -2.34 -2.19 -2.38 -2.43
OR -2.08 -2.08 -2.35 -2.37 -2.05 -2.05 -2.33 -2.36 -2.22 -2.19 -2.33 -2.39
RI -2.09 -2.08 -2.34 -2.34 -2.01 -2.01 -2.22 -2.26 -2.25 -2.21 -2.34 -2.39
WA -2.02 -2.06 -2.26 -2.32 -1.99 -2.03 -2.24 -2.31 -2.19 -2.16 -2.26 -2.34
WV -2.07 -2.07 -2.34 -2.36 -2.00 -2.01 -2.24 -2.29 -2.23 -2.21 -2.34 -2.40
  • Note: Row labeled “All” reflects the original estimate

Table 16: Leave one state out jackknife point estimates: all estimators, early expansion excluded
Homogeneous Heterogeneous Unadjusted
Left out state BC-HSBW BC-SBW H-SBW SBW BC-HSBW BC-SBW H-SBW SBW BC-HSBW BC-SBW H-SBW SBW
All -1.94 -1.99 -2.09 -2.05 -1.93 -2.00 -2.06 -2.03 -2.22 -2.23 -2.28 -2.21
AR -1.82 -1.85 -1.92 -1.88 -1.71 -1.73 -1.77 -1.75 -2.39 -2.44 -2.27 -2.16
AZ -1.73 -1.74 -1.93 -1.88 -1.64 -1.66 -1.85 -1.81 -2.18 -2.15 -2.21 -2.17
CO -1.87 -1.95 -2.12 -2.08 -1.86 -1.92 -2.09 -2.00 -2.17 -2.23 -2.30 -2.24
HI -2.09 -2.09 -2.05 -2.03 -2.08 -2.09 -2.04 -1.95 -2.36 -2.30 -2.25 -2.20
IA -1.91 -1.99 -2.07 -2.03 -1.85 -1.95 -2.01 -1.98 -2.18 -2.27 -2.28 -2.21
IL -1.75 -1.68 -2.01 -1.86 -1.62 -1.56 -1.94 -1.79 -2.06 -2.03 -2.27 -2.16
KY -1.76 -1.87 -1.95 -1.93 -1.70 -1.85 -1.89 -1.89 -2.08 -2.20 -2.20 -2.16
MD -1.98 -1.96 -2.24 -2.09 -1.85 -1.87 -2.07 -1.96 -2.31 -2.36 -2.41 -2.35
MI -1.86 -1.95 -2.07 -2.09 -1.97 -1.94 -2.14 -1.97 -2.26 -2.33 -2.31 -2.26
ND -1.95 -1.98 -2.10 -2.06 -1.86 -1.94 -1.95 -1.97 -2.09 -2.22 -2.17 -2.19
NM -1.81 -1.88 -1.94 -1.88 -1.88 -1.89 -1.98 -1.86 -2.17 -2.15 -2.22 -2.15
NV -2.01 -2.08 -2.21 -2.20 -1.92 -2.00 -2.10 -2.01 -2.27 -2.36 -2.38 -2.32
OH -2.40 -2.40 -2.48 -2.41 -2.45 -2.48 -2.49 -2.47 -2.43 -2.43 -2.36 -2.25
OR -1.97 -2.07 -2.07 -2.08 -2.00 -2.05 -2.08 -1.97 -2.16 -2.22 -2.27 -2.21
RI -2.00 -2.06 -2.07 -2.03 -2.00 -2.05 -2.08 -2.01 -2.25 -2.29 -2.28 -2.21
WV -1.95 -1.99 -2.11 -2.04 -1.91 -1.98 -2.07 -2.03 -2.24 -2.24 -2.29 -2.22
  • Note: Row labeled “All” reflects the original estimate

Appendix F Simulation Study

This final section presents simulation results evaluating the performance of our proposed estimators in an idealized setting where the outcome model and measurement error models are known. The first subsection outlines our simulation study. The second subsection presents selected results, including the bias, mean-square error, and coverage rates for our proposed estimators and variance estimates. The final subsection provides additional results analyzing the performance of H-SBW and our proposed variance estimator absent measurement error.

F.1 Study design

For our simulations we generate populations of $M_1 = 5000$ states, each with $p_s$ CPUMAs, so that we obtain a population of $N_1 = \sum_{s=1}^{M_1} p_s$ CPUMAs. We draw $p_s \stackrel{iid}{\sim} \lfloor \mathrm{Exp}(0.1) + 10 \rfloor$ so that the average number of regions per state is approximately 20. We then generate a 3-dimensional covariate vector $X_{sc}$:

$$\mu_s \stackrel{iid}{\sim} MVN(0, \Sigma_B) \tag{64}$$

$$X_{sc} \mid \mu_s \stackrel{iid}{\sim} MVN(\mu_s, \Sigma_V) \tag{65}$$

Define $\Sigma_X = \Sigma_V + \Sigma_B$, and let $\sigma^2_{x,j}$ be the $j$-th diagonal element of $\Sigma_X$. Across all simulations we fix this value at 2 (i.e., $\sigma^2_{x,j} = \sigma^2_x = 2$). We also fix the off-diagonal elements of $\Sigma_V$ and $\Sigma_B$ to be equal to each other and such that $Cor(X_j, X_k) = 0.25$. Finally, let $\rho_{x,j}$ denote the within-state correlation of the observations (i.e., $Cor(X_{j,sc}, X_{j,sd})$ for $c \neq d$). We set this value to be equal across covariates, so that $\rho_{x,j} = \rho_x$, but vary this parameter across simulations.
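As a concrete illustration, this covariate-generation step can be sketched as follows. The function name is hypothetical, and the split of $\Sigma_X$ into proportional between- and within-state parts is one convenient choice that yields the stated within-state correlation $\rho_x$, not necessarily the authors' exact parameterization:

```python
import numpy as np

def simulate_covariates(m_states=25, rho_x=0.25, sigma_x2=2.0, seed=0):
    """Simulate state-clustered 3-dim covariates as in (64)-(65):
    state means mu_s ~ MVN(0, Sigma_B), regions X_sc ~ MVN(mu_s, Sigma_V)."""
    rng = np.random.default_rng(seed)
    d = 3
    # Total covariance: variance 2 on the diagonal, Cor(Xj, Xk) = 0.25 off it
    sigma_X = sigma_x2 * (0.25 * np.ones((d, d)) + 0.75 * np.eye(d))
    # Assumption: between-state share is a proportion rho_x of the total,
    # which makes the within-state correlation of each covariate equal rho_x
    sigma_B = rho_x * sigma_X
    sigma_V = sigma_X - sigma_B
    X, state = [], []
    for s in range(m_states):
        p_s = int(np.floor(rng.exponential(scale=10) + 10))  # Exp(0.1) + 10, floored
        mu_s = rng.multivariate_normal(np.zeros(d), sigma_B)
        X.append(rng.multivariate_normal(mu_s, sigma_V, size=p_s))
        state.extend([s] * p_s)
    return np.vstack(X), np.array(state)
```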

We then generate outcomes according to the model:

$$Y_{sc} \mid X_{sc} \sim MVN(X_{1,sc} + X_{2,sc} + X_{3,sc}, \Omega)$$

where $\Omega$ is a block-diagonal matrix representing homoskedastic and equicorrelated errors, outlined below:

$$\Omega = \begin{pmatrix} \Omega_1 & 0 & \cdots & 0 \\ 0 & \Omega_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \Omega_{M_1} \end{pmatrix} \tag{66}$$

$$\Omega_s = \begin{pmatrix} \sigma^2_{\epsilon} + \sigma^2_{\varepsilon} & \sigma^2_{\varepsilon} & \cdots & \sigma^2_{\varepsilon} \\ \sigma^2_{\varepsilon} & \sigma^2_{\epsilon} + \sigma^2_{\varepsilon} & \cdots & \sigma^2_{\varepsilon} \\ \vdots & & \ddots & \vdots \\ \sigma^2_{\varepsilon} & \sigma^2_{\varepsilon} & \cdots & \sigma^2_{\epsilon} + \sigma^2_{\varepsilon} \end{pmatrix} \tag{67}$$

In other words, $\sigma^2_{\varepsilon}$ represents the variance component from a state-level random effect and $\sigma^2_{\epsilon}$ represents the variance component from a CPUMA-level random effect, as described in the main paper.
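Because $\Omega$ corresponds to a state-level random effect plus independent CPUMA-level noise, outcomes with this error structure can be drawn without constructing $\Omega$ explicitly (a minimal sketch; the function name and default variance values are illustrative, not from the paper):

```python
import numpy as np

def simulate_outcomes(X, state, sigma_eps2=1.0, sigma_re2=0.5, rng=None):
    """Draw Y_sc = X1 + X2 + X3 + state random effect + region noise, which
    reproduces the homoskedastic, equicorrelated blocks of (66)-(67)."""
    rng = np.random.default_rng(rng)
    mean = X.sum(axis=1)                                       # X1 + X2 + X3
    n_states = state.max() + 1
    u = rng.normal(0.0, np.sqrt(sigma_re2), size=n_states)     # state-level effect
    e = rng.normal(0.0, np.sqrt(sigma_eps2), size=len(state))  # CPUMA-level noise
    return mean + u[state] + e
```

Two regions in the same state share the draw `u[state]`, so their errors covary by `sigma_re2`, matching the off-diagonal entries of $\Omega_s$.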

We next generate our noisy outcome and covariate estimates $(J, W)$:

$$(J_{sc}, W_{sc}) \mid (Y_{sc}, X_{sc}) \stackrel{iid}{\sim} MVN((Y_{sc}, X_{sc}), \Sigma_{\nu,sc}), \qquad \Sigma_{\nu,sc} = \sigma^2_{\nu,sc} I_4$$

We allow $\sigma^2_{\nu,sc}$ either to be constant or to be a function of the sample size of the underlying survey that generates the estimate. In the latter case we simulate sample sizes $r_{sc} \stackrel{iid}{\sim} \mathrm{Unif}(300, 2300)$ and set $\sigma^2_{\nu,sc} = \sigma^{2\star}_{\nu} / r_{sc}$. This reflects the model assumed in the “heterogeneous adjustment.” We also define $\sigma^2_{\nu}$ as the expected value of $\sigma^2_{\nu,sc}$:

$$\sigma^2_{\nu} = \sigma^{2\star}_{\nu} \, \mathbb{E}[1/r_{sc}]$$

We then define the parameter $\tau$:

$$\tau = \sigma^2_x / (\sigma^2_x + \sigma^2_{\nu})$$

We consider different values of $\tau$ throughout our simulations. This parameter reflects the extent of the measurement error: $\tau = 1$ means the covariates are measured without error, and the noise increases as $\tau$ goes to zero.
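The measurement-error step and the reliability parameter $\tau$ can be sketched as follows (hypothetical function names; $\sigma^{2\star}_{\nu} = 700$ is an arbitrary illustrative value, not a value from the paper):

```python
import numpy as np

def add_measurement_error(X, sigma_nu2_star=700.0, r=None, rng=None):
    """Add mean-zero noise W = X + nu with Var(nu_sc) = sigma_nu2_star / r_sc,
    the heteroskedastic model; passing constant r recovers the homoskedastic case."""
    rng = np.random.default_rng(rng)
    if r is None:
        r = rng.uniform(300, 2300, size=X.shape[0])  # simulated survey sample sizes
    sd = np.sqrt(sigma_nu2_star / r)[:, None]
    return X + rng.normal(size=X.shape) * sd, r

def reliability(sigma_x2, sigma_nu2):
    """tau = sigma_x^2 / (sigma_x^2 + sigma_nu^2): 1 means no measurement error."""
    return sigma_x2 / (sigma_x2 + sigma_nu2)
```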

Define $\rho^{\star} = \sigma^2_{\varepsilon} / (\sigma^2_{\epsilon} + \sigma^2_{\varepsilon} + \sigma^2_{\nu})$: the within-state correlation of the outcome model errors given the true covariates $X$, including the measurement errors in the outcome. We fix this at 0.25 for our primary simulation results. We caveat that $\rho^{\star}$ only represents an optimal value in a setting without measurement error, due to the contribution of additional terms to the variance of the estimator from the noisy covariate measurements.

We then consider all 18 combinations of the following parameters:

  • $r_{sc} \stackrel{iid}{\sim} \mathrm{Unif}(300, 2300)$; $r_{sc} = 1$

  • $\rho_x \in \{0, 0.25, 0.5\}$

  • $\tau \in \{0.85, 0.9, 0.95\}$

For each parameterization we take 1000 random samples of $m_1 = 25$ states (with $n_1$ total CPUMAs averaging around 450). For each sample we estimate a series of H-SBW weights that set $\delta = 0$ and the target $\upsilon_0 = (1, 1, 1)$. Let $\rho$ denote the value of $\rho^{\star}$ assumed in the H-SBW objective. We generate weights for all combinations of input datasets $Z$ and correlation parameters $\rho$:

  • $Z \in \{W, X, \hat{X}^{hom}, \hat{X}^{het}, \hat{X}^{cor}\}$

  • $\rho \in \{0, 0.25, 0.5\}$
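For intuition, the H-SBW criterion that these weights minimize can be evaluated for any candidate weight vector without forming the full block-diagonal matrix (a sketch of the objective only, under the equicorrelated structure described above; the actual weight estimation additionally solves a constrained quadratic program, which we omit):

```python
import numpy as np

def hsbw_objective(w, state_ids, rho):
    """H-SBW criterion w'Qw, with Q block-diagonal: ones on the diagonal and
    rho off the diagonal within each state. rho = 0 recovers the SBW objective."""
    total = 0.0
    for s in np.unique(state_ids):
        ws = w[state_ids == s]
        # For Q_s = (1 - rho) I + rho 11', w'Q_s w expands to the two terms below
        total += (1 - rho) * np.sum(ws ** 2) + rho * np.sum(ws) ** 2
    return total
```

Setting $\rho > 0$ penalizes concentrating weight within a single state, which is why H-SBW disperses weight across states more evenly than SBW.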

We estimate the variance for each estimator using the leave-one-state-out jackknife, described in Section 3.
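The leave-one-state-out jackknife can be sketched generically as follows (the `estimate_fn` interface and array-based layout are illustrative assumptions, not the paper's code):

```python
import numpy as np

def jackknife_loso(estimate_fn, data, state_ids):
    """Leave-one-state-out jackknife: re-estimate with each state deleted and
    combine the pseudo-replicates into a variance estimate."""
    states = np.unique(state_ids)
    m = len(states)
    reps = np.array([estimate_fn(data[state_ids != s]) for s in states])
    var = (m - 1) / m * np.sum((reps - reps.mean()) ** 2)
    return reps, var
```

With one observation per "state" and the sample mean as the estimator, this reduces to the classic delete-one jackknife, whose variance estimate equals $s^2/n$ exactly.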

Note: for $\hat{\kappa}$ we use the empirical covariance matrix of $W$, the estimated means $\bar{W}$, and $\hat{\Sigma}_{\nu,sc}$, where we draw $\hat{\Sigma}_{\nu,sc}$ from $\Sigma_{\nu,sc} + \mathcal{N}(0, 0.001 \cdot 450 \cdot I_d)$. In other words, when averaged together, $\hat{\Sigma}_{\nu}$ is a fairly precise estimate of $\Sigma_{\nu}$.
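A textbook regression-calibration shrinkage of the kind underlying $\hat{\kappa}$ might look as follows (a generic sketch following standard measurement-error references; the paper's exact estimator may differ in its details):

```python
import numpy as np

def regression_calibration(W, Sigma_nu_hat):
    """Shrink each noisy observation toward the grand mean using
    kappa_hat = Sigma_X_hat (Sigma_X_hat + Sigma_nu_hat)^{-1}, estimating
    Sigma_X_hat = Cov(W) - Sigma_nu_hat (a textbook homogeneous adjustment)."""
    W_bar = W.mean(axis=0)
    Sigma_W = np.cov(W, rowvar=False)          # = Sigma_X + Sigma_nu in expectation
    Sigma_X_hat = Sigma_W - Sigma_nu_hat
    kappa_hat = Sigma_X_hat @ np.linalg.inv(Sigma_W)
    return W_bar + (W - W_bar) @ kappa_hat.T
```

When the assumed noise covariance is zero, $\hat{\kappa} = I$ and the data are returned unchanged; larger assumed noise shrinks observations more strongly toward the mean.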

F.2 Selected results

We present the results where the “heterogeneous adjustment” model is correct. The results for homoskedastic measurement errors are quite similar, and we note in the text where they differ. Figure 8 displays the bias associated with each estimator. From left to right the panels reflect different values of $\tau$: the left-most panels have the most measurement error and the right-most the least. From top to bottom the panels reflect different values of $\rho_x$: the top-most panels have no correlation structure among the covariates, while the bottom-most panels are more highly correlated within state. Within each panel we organize the results by the input covariate set: $W$ represents the estimators generated without any covariate adjustment; $X$ the estimators generated on the true covariates; “Xhat-het” the heterogeneous adjustment ($\hat{X}^{het}$); “Xhat-hom” the homogeneous adjustment ($\hat{X}^{hom}$); and “Xhat-cor” the correlated adjustment ($\hat{X}^{cor}$), described in Appendix B. The estimators are colored by the assumed value of $\rho$ in the H-SBW objective, where $\rho = 0$ is equivalent to SBW. Across all of the present simulations we set $\rho^{\star}$ equal to 0.25.

We highlight two general results. First, if we know the true values of $X$, all of our estimators are unbiased. Second, balancing on $W$ results in bias. This bias increases as $\tau$ decreases and, when the covariates are dependent, increases with $\rho$. These results are consistent with our propositions in Appendices A.1 and A.2.

When attempting to mitigate this bias using an estimate of $\mathbb{E}[X \mid W]$, balancing on $\hat{X}^{hom}$, $\hat{X}^{het}$, or $\hat{X}^{cor}$ yields approximately unbiased estimates for all values of $\rho$ when the covariates are uncorrelated. When $X$ are correlated across observations, setting $\rho > 0$ results in biased estimates for $\hat{X}^{het}$ or $\hat{X}^{hom}$, with the bias increasing in $\rho$. The SBW estimators remain approximately unbiased even when the data are correlated; we speculate in Remark 10 why this may be the case. In settings where $\rho_x = 0.25$, the estimators using $\hat{X}^{cor}$ remain approximately unbiased; however, when $\rho_x = 0.5$ even these estimators appear to have some bias. In further investigations (results not displayed but available on request), this bias decreases with the number of states. Regardless of the adjustment set, all biases on the adjusted datasets are smaller in absolute magnitude than the corresponding biases of estimators that naively balance $W$.

Figure 8: Simulation study: estimator bias
Refer to caption
(a) Averaged across 1000 simulations for each specification

All of these simulations had heterogeneous measurement error. When the errors are instead homoskedastic, we find that the estimators that balance on $\hat{X}^{het}$ have a small bias even when $\tau = 0$ or $\rho = 0$: assuming this model is correct when it is not appears to have some cost. This may help explain the slightly worse performance of the heterogeneous adjustment in our validation study in Section 4.

Figure 9 displays the variance associated with these same estimators. Throughout we obtain modest reductions in variance as we increase $\rho$. Choosing either $\rho = 0.25$ or $\rho = 0.5$ gives similar reductions relative to choosing $\rho = 0$ (again equivalent to SBW). These variance reductions might increase if we increased $\rho^{\star}$, which we fix in this simulation study at 0.25. For example, Section F.3, Figure 13, considers a wider range of parameterizations of $(\rho, \rho^{\star})$ in a context without measurement error and shows that larger variance reductions are possible when $\rho^{\star}$ is higher. Admittedly, these settings may be unlikely in practice. (In the setting without measurement error $\rho^{\star}$ represents the optimal value.)

These results also show that balancing on $\hat{X}^{cor}$ can lead to a much more variable estimate, particularly when the covariates have larger within-state correlations. Only in the setting with uncorrelated covariates and little measurement error does the variability induced by this adjustment appear to have less cost.

Figure 9: Simulation study: estimator variance
Refer to caption
(a) Averaged across 1000 simulations for each specification

Figure 10 displays the MSE of these estimators. The estimators generated on $\hat{X}^{cor}$ have larger MSE, largely driven by the increase in variability we observed previously. Despite the increase in bias for H-SBW with $\hat{X}^{hom}$ or $\hat{X}^{het}$, we still often find modest MSE reductions when using H-SBW relative to SBW. This appears more likely as the magnitude of the measurement error decreases and/or the within-state correlation decreases (i.e., moving towards the top-right of the plot). Whether an MSE improvement is possible also depends on $\rho^{\star}$: if we were to set $\rho^{\star} = 0$, we would expect the MSE to increase for all estimators as $\rho$ increases, even when we observe $X$ (see also Section F.3). The results are similar for homoskedastic measurement errors, though the region where we see MSE improvements for H-SBW relative to SBW on the uncorrelated adjustments appears to increase slightly (results available on request).

Figure 10: Simulation study: estimator mean-square-error
Refer to caption
(a) Averaged across 1000 simulations for each specification

We next evaluate the performance of the leave-one-state-out jackknife procedure, examining the coverage and length of 95% confidence intervals; we display these results in Figures 11 and 12. We use standard normal quantiles to generate confidence intervals throughout.

Figure 11 shows that using SBW ($\rho = 0$) we obtain approximately nominal coverage rates across all specifications that use $X$ or some version of $\hat{X}$. However, we never achieve nominal coverage rates when balancing on $W$: even when the amount of measurement error is small ($\tau = 0.95$) we obtain at best slightly below 90 percent coverage. Coverage appears to deteriorate as we increase $\rho_x$, even when balancing on the true covariates $X$, consistent with results in Cameron, Gelbach and Miller (2008). Unsurprisingly, the rates worsen for estimators generated on $\hat{X}^{het}$ and $\hat{X}^{hom}$ in the settings where we found the highest bias. We also obtain slightly lower than nominal rates for estimators using $X$; this may be because we do not use any degrees-of-freedom adjustment for our confidence intervals (we examine this further in Section F.3). On the other hand, coverage rates are accurate or conservative for estimators generated on $\hat{X}^{cor}$, driven by the positive bias of the jackknife variance estimates: we almost always find a positive or negligible bias for the unscaled jackknife variance estimates, consistent with Proposition 5.

Figure 11: Simulation study: jackknife coverage rates
(a) Averaged across 1000 simulations for each specification

Figure 12 compares the mean confidence interval lengths using the leave-one-state-out jackknife. The H-SBW estimator produces slightly more precise estimates, reflecting its reduced variability under our assumed correlation structures. Even when we choose \rho sub-optimally, we obtain more precise inferences than with SBW, suggesting benefits to H-SBW even when our estimate of \rho is inaccurate. As the previous results would suggest, the confidence intervals using \hat{X}^{cor} are quite large. The results are similar for homoskedastic measurement errors.

Figure 12: Confidence interval length
(a) Averaged across 1000 simulations for each specification

We emphasize five takeaways from this simulation study. First, the variance improvements of H-SBW relative to SBW are likely modest. That said, the extent of the improvement depends on \rho^{\star}, which we fix in this study; the gains can be greater as \rho^{\star} increases, as we explore in Section F.3 below. (They also depend on how p_{s} varies across states; we draw p_{s} from only one distribution throughout our simulations.) Second, H-SBW can increase the bias of our proposed estimators in the presence of measurement error and dependent data. However, this bias is generally small relative to the bias from balancing on the noisy covariate measurements W, and MSE improvements over SBW remain possible with the simple covariate adjustments (\hat{X}^{hom} or \hat{X}^{het}). Third, despite its improved theoretical properties, accounting for the correlation in the data when using H-SBW (i.e., balancing on \hat{X}^{cor} when the data are measured with error) may not be worth the cost in variance given the number of states we consider here. Fourth, we find no evidence that the "heterogeneous adjustment" improves our estimates even when it reflects the true model, and it may even induce bias when the model is incorrect. (This finding may partly reflect the distribution of sample sizes in our simulations, which we took to be uniform; results might differ under a different distribution.) Finally, the leave-one-state-out jackknife variance estimates give close to nominal coverage rates for our unbiased estimators, even when using standard normal quantiles in a setting with 25 states.

This simulation study assumes throughout that we know the true data generating model for the outcome and that our data are Gaussian. Because in practice we would not have such knowledge, this study complements the validation study in Section 4, which bears more directly on how these estimators might perform in an applied setting.

F.3 Additional results: H-SBW without measurement error

We consider additional simulations for the setting where X is known and the outcome model has homoskedastic but possibly correlated errors. For these results we vary only \rho^{\star} \in \{0, 0.25, 0.5, 0.75, 0.99\} and fix \rho_{x} = 0.25 throughout.
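As an illustration of this error structure, homoskedastic errors with within-state correlation \rho^{\star} can be generated from a state random effect plus idiosyncratic noise. The sketch below is our own illustration of the equicorrelated structure, not the authors' simulation code; the function name and interface are hypothetical.

```python
import numpy as np

def equicorrelated_errors(state_ids, rho_star, sigma=1.0, rng=None):
    """Draw errors with Var = sigma^2 and within-state correlation rho_star.

    Decomposition: eps_i = sigma * (sqrt(rho) * u_{s(i)} + sqrt(1 - rho) * e_i),
    where u_s is a shared state effect and e_i is region-level noise, so that
    Var(eps_i) = sigma^2 * (rho + (1 - rho)) and Corr within state = rho.
    """
    rng = np.random.default_rng(rng)
    states, inv = np.unique(state_ids, return_inverse=True)
    u = rng.standard_normal(len(states))      # shared state random effect
    e = rng.standard_normal(len(state_ids))   # idiosyncratic region noise
    return sigma * (np.sqrt(rho_star) * u[inv] + np.sqrt(1.0 - rho_star) * e)
```

This decomposition makes explicit why, under the outcome model, regions within a state share a common error component while regions in different states remain independent.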

Figure 13 displays the empirical variance of the H-SBW estimators averaged over 1000 simulations. Each panel reflects a different value of \rho^{\star}, while the x-axis displays the value of \rho assumed in the H-SBW objective. The red bars indicate when \rho is optimally selected (\rho = \rho^{\star}); as expected, this choice yields the lowest-variance estimator. Consistent with our previous results, when \rho^{\star} > 0 any assumed \rho > 0 improves the variance of the resulting estimator relative to SBW, though in general this requires that our assumed correlation structure for the error terms is correct.
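To make the role of the assumed \rho concrete, the H-SBW idea of minimizing w' \Omega(\rho) w, with \Omega(\rho) block-equicorrelated within states, can be sketched under exact balance constraints, which admit a closed form. This is only an illustration: the actual estimator uses approximate balance with tolerances, and the function name, the sum-to-n normalization, and the exact-balance simplification are our assumptions.

```python
import numpy as np

def hsbw_weights_exact(X, target_means, state_ids, rho):
    """Minimize w' Omega(rho) w subject to exact balance X'w = n * target_means
    and sum(w) = n, where Omega(rho) is 1 on the diagonal and rho between
    regions in the same state (an equality-constrained QP with a closed form).
    """
    n, d = X.shape
    # Block-equicorrelated Omega: rho within state, 0 across states, 1 on diagonal
    same_state = state_ids[:, None] == state_ids[None, :]
    Omega = np.where(same_state, rho, 0.0)
    np.fill_diagonal(Omega, 1.0)
    # Stack the balance and normalization constraints: A w = b
    A = np.vstack([X.T, np.ones(n)])
    b = np.concatenate([n * target_means, [n]])
    # Closed form: w = Omega^{-1} A' (A Omega^{-1} A')^{-1} b
    Oinv_At = np.linalg.solve(Omega, A.T)
    return Oinv_At @ np.linalg.solve(A @ Oinv_At, b)
```

Setting rho = 0 recovers minimum-variance (SBW-style) weights under exact balance; increasing rho penalizes concentrating weight within a single state, which is what disperses the H-SBW weights more evenly across states.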

Figure 13: Variance of H-SBW estimator for known covariates
(a) Averaged across 1000 simulations for each specification

We conclude by examining confidence interval coverage for the leave-one-state-out jackknife; Figure 14 displays the results. Consistent with our previous findings, we obtain slightly below nominal coverage, particularly when \rho^{\star} is small. This slight undercoverage is likely due in part to our use of standard normal quantiles to generate confidence intervals. (Consistent with Proposition 5 and the previous results, the unscaled variance estimates are again all either positively or negligibly biased.) Interestingly, coverage appears to improve when setting \rho > 0 even when \rho^{\star} = 0. We speculate that by dispersing the weights more evenly across states, H-SBW may increase the effective degrees of freedom of the estimator (see, e.g., Cameron and Miller (2015)), illustrating another potential benefit of this approach. Analyzing the theoretical properties of these variance estimators in this setting is beyond the scope of the present study but would provide interesting future work.

Figure 14: H-SBW coverage for known covariates
(a) Averaged across 1000 simulations for each specification