Estimation and Inference of Average Treatment Effect in Percentage Points under Heterogeneity

Ying Zeng School of Economics and Wang Yanan Institute for Studies in Economics, Xiamen University, China.
Abstract

In semi-log regression models with heterogeneous treatment effects, the average treatment effect (ATE) in log points and its exponential transformation minus one underestimate the ATE in percentage points. I propose new estimation and inference methods for the ATE in percentage points, with inference utilizing the Fenton-Wilkinson approximation. These methods are particularly relevant for staggered difference-in-differences designs, where treatment effects often vary across groups and periods. I prove the methods’ large-sample properties and demonstrate their finite-sample performance through simulations, revealing substantial discrepancies between conventional and proposed measures. Two empirical applications further underscore the practical importance of these methods.

Keywords: Treatment effect heterogeneity, Semi-log regression, Average treatment effect, Difference-in-differences, Percentage point

JEL codes: C01, C21

1 Introduction

Semi-log regression models, where the dependent variable is in natural logarithm form, are widely used in empirical studies. In these models, the coefficient of a binary treatment is often interpreted as approximating the average treatment effect (ATE) in percentage change when its magnitude is small (Hansen, 2022; Chen and Roth, 2024). This interpretation is based on the following logic: Let $Y_{1}$ and $Y_{0}$ denote the potential outcomes under treatment and non-treatment respectively. The treatment effect in log points is defined as $\tau=\ln(Y_{1})-\ln(Y_{0})$, while the percentage change is given by $\rho=\left(Y_{1}-Y_{0}\right)/Y_{0}=\exp(\tau)-1$. When $|\tau|$ is small, $\exp(\tau)-1\approx\tau$, justifying the interpretation of $\tau$ as an approximate percentage effect. Alternatively, as suggested by Halvorsen and Palmquist (1980), researchers can directly use the exact formula $\exp(\tau)-1$ as the percentage effect of the treatment, which is applicable regardless of the magnitude of $\tau$.

This paper argues, however, that neither $\tau$ nor $\exp(\tau)-1$ can be interpreted as the ATE in percentage points when treatment effects are heterogeneous across groups. [Footnote 1: Following existing discussions, this paper focuses on group-specific treatment effect heterogeneity, leaving within-group heterogeneity for future research.] The intuition is straightforward: with group-specific treatment effect heterogeneity, the ATE in log points is a weighted sum of treatment effects in log points, expressed as $\bar{\tau}=\sum_{g}w_{g}\tau_{g}$, where $w_{g}$ is the share of subgroup $g$ in the population of interest, and $\tau_{g}$ is the log point effect for subgroup $g$. Interpreting $\bar{\tau}$ as an approximation of the average percentage effect $\bar{\rho}=\sum_{g}w_{g}\rho_{g}=\sum_{g}w_{g}\left(\exp(\tau_{g})-1\right)$ requires $\tau_{g}$ to be close to zero for all $g$, a condition that may fail even if $\bar{\tau}=0$. Moreover, $\exp(\bar{\tau})-1$ is also a biased approximation of $\bar{\rho}$, as $\exp(\bar{\tau})-1\leqslant\bar{\rho}$ by Jensen’s inequality, with equality holding if and only if $\rho_{g}$ is constant across $g$. Consequently, $\exp(\bar{\tau})-1\neq\bar{\rho}$ under treatment effect heterogeneity, with the disparity between them increasing as treatment effects vary more widely.

The literature has largely overlooked the bias introduced by using the ATE in log points or its natural exponential minus one as the ATE in percentage points. This oversight may lead to misleading interpretations of results in empirical studies with heterogeneous treatment effects and log-transformed outcomes. A prominent example is the staggered difference-in-differences design, where different groups start receiving treatment at different times. When treatment effects are heterogeneous across groups or across time since treatment, two-way fixed effects models adopting a constant effect specification yield biased estimators for aggregate and dynamic treatment effects (de Chaisemartin and D’Haultfœuille, 2020; Goodman-Bacon, 2021; Sun and Abraham, 2021). Many heterogeneity-robust estimators have been proposed (de Chaisemartin and D’Haultfœuille, 2020; Callaway and Sant’Anna, 2021; Sun and Abraham, 2021; Borusyak et al., 2024), which amount to the ATE in log points when the outcome is log-transformed. Therefore, these estimators and their exponential minus one should not be interpreted as the ATE in percentage points.

Estimating and conducting inference on the ATE in percentage points $\bar{\rho}$ presents challenges. An obvious estimator of $\bar{\rho}$ is $\sum_{g}\hat{w}_{g}\exp(\hat{\tau}_{g})-1$, where $\hat{w}_{g}$ and $\hat{\tau}_{g}$ are consistent estimators for $w_{g}$ and $\tau_{g}$ respectively. While this estimator is consistent, it can exhibit substantial small-sample bias because $E(\exp(\hat{\tau}_{g}))\neq\exp(\tau_{g})$ (Kennedy, 1981). Furthermore, inference on $\bar{\rho}$ is challenging because the distribution of $\sum_{g}\hat{w}_{g}\exp(\hat{\tau}_{g})$ has an unknown form even when both $\hat{w}_{g}$ and $\hat{\tau}_{g}$ are asymptotically normal. Nevertheless, knowledge of $\bar{\rho}$ is crucial for policy-making, as changes are typically discussed in terms of levels or percentages rather than on a logarithmic scale.

This paper highlights the often-overlooked problem of interpreting the ATE in log points as the ATE in percentage points in the context of heterogeneous treatment effects. Building on this insight, the paper develops estimation and inference procedures for the ATE in percentage points under a general econometric framework. This framework not only encompasses semi-log regression and semi-log staggered difference-in-differences models but also has the potential to be extended to Poisson regression models. Central to these procedures is a consistent estimator that does not require the ATE in log points or individual treatment effects in log points to be small, thus broadening its applicability across various empirical settings. To facilitate inference, the paper develops an approximate method using the Fenton-Wilkinson approximation of the sum of log-normal distributions (Fenton, 1960; Abu-Dayya and Beaulieu, 1994). The validity of these proposed methods is rigorously proven for large samples, and their performance in finite samples is demonstrated through comprehensive Monte Carlo simulations. To illustrate these methods’ practical relevance and applicability, the paper presents two empirical applications: one employing a semi-log regression model to study the causal impact of an education reform on wages, and another utilizing a staggered difference-in-differences design to examine the effect of minimum wage raises on teen employment. Through these contributions, this paper provides researchers with robust tools to more accurately estimate and interpret treatment effects in percentage terms, particularly in the presence of heterogeneity.

This paper connects to three strands of literature. First, it contributes to a growing body of work on treatment effect heterogeneity, which has shown that traditional estimates assuming a constant treatment effect are weighted averages of treatment effects, with weights possibly negative and not equal to the share of the subpopulation. Examples include ordinary least squares (OLS) estimators (Angrist, 1998; Gibbons et al., 2019; Słoczyński, 2022; Goldsmith-Pinkham et al., 2024), two-stage least squares (2SLS) estimators (Imbens and Angrist, 1994; Mogstad et al., 2021), and two-way fixed effects estimators for staggered difference-in-differences models (Goodman-Bacon, 2021; Sun and Abraham, 2021; de Chaisemartin and D’Haultfœuille, 2020; Borusyak et al., 2024). Numerous heterogeneity-robust estimators have been developed, typically by estimating subpopulation treatment effects and assigning proper weights to each subpopulation (e.g., Gibbons et al., 2019; Callaway and Sant’Anna, 2021; Sun and Abraham, 2021; Goldsmith-Pinkham et al., 2024). The results presented in this paper imply that when the outcome is log-transformed in these models, neither the traditional estimators nor the heterogeneity-robust estimators provide proper approximations of the ATE in percentage points. The estimation and inference methods proposed here can be combined with heterogeneity-robust estimators to deliver consistent estimation of, and inference on, the ATE in percentage points.

Second, this study extends existing research on interpreting binary variables in semi-log regressions from the case of constant treatment effects to heterogeneous treatment effects. Halvorsen and Palmquist (1980) pointed out that in a regression like $\ln(y_{i})=\alpha+\tau d_{i}+\epsilon_{i}$, where $d_{i}$ is a dummy variable, the percentage effect of $d$ should be $\exp(\tau)-1$ rather than $\tau$. Kennedy (1981) argued that directly replacing $\tau$ in $\exp(\tau)-1$ with the OLS estimate $\hat{\tau}$ leads to a biased estimator, as $E(\exp(\hat{\tau}))\neq\exp(\tau)$ due to the non-linearity of the exponential function. He suggested using $\exp(\hat{\tau}-0.5\hat{\sigma}_{\tau}^{2})-1$ to correct the bias, observing that $E(\exp(\hat{\tau}))=\exp(\tau+0.5\sigma_{\tau}^{2})$ when $\hat{\tau}\sim\mathcal{N}(\tau,\sigma_{\tau}^{2})$. This estimator is still biased as it uses the estimated value $\hat{\sigma}_{\tau}^{2}$ rather than $\sigma_{\tau}^{2}$. Under the assumption that $\epsilon_{i}$ are i.i.d. normal, Giles (1982) and van Garderen and Shah (2002) proposed exact unbiased estimators. However, simulation results show that the difference between the exact unbiased estimator and the Kennedy (1981) estimator is negligible.
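The log-normal moment behind the Kennedy (1981) correction can be checked numerically. The following is a minimal Monte Carlo sketch, not the procedure of any cited paper: the values of `tau`, `sigma`, and `n_draws` are hypothetical, and the correction is applied with the true variance, an idealization that is unavailable in practice, where only $\hat{\sigma}_{\tau}^{2}$ is observed.

```python
import math
import random

random.seed(12345)  # fixed seed so the sketch is reproducible

# Hypothetical values chosen for illustration only (not taken from the paper).
tau = 0.3        # true log point effect
sigma = 0.2      # standard deviation of the estimator tau_hat
n_draws = 200_000

# Draw tau_hat ~ N(tau, sigma^2) and average exp(tau_hat) across draws.
draws = [random.gauss(tau, sigma) for _ in range(n_draws)]
mean_exp_naive = sum(math.exp(t) for t in draws) / n_draws

# Kennedy-style correction, here using the TRUE sigma^2 (an idealization;
# in practice only the estimate sigma_hat^2 is available).
mean_exp_corrected = sum(math.exp(t - 0.5 * sigma**2) for t in draws) / n_draws

lognormal_mean = math.exp(tau + 0.5 * sigma**2)  # E[exp(tau_hat)] under normality
target = math.exp(tau)                           # the quantity of interest

print(f"target exp(tau)              = {target:.4f}")
print(f"theoretical E[exp(tau_hat)]  = {lognormal_mean:.4f}")
print(f"simulated mean, naive        = {mean_exp_naive:.4f}")
print(f"simulated mean, corrected    = {mean_exp_corrected:.4f}")
```

In this sketch the naive average should sit near $\exp(\tau+0.5\sigma_{\tau}^{2})$, above the target, while the corrected average should sit near $\exp(\tau)$.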

Third, this work supplements ongoing research into the potential pitfalls of using logarithmic transformations in empirical analyses. For instance, Mullahy and Norton (2024) and Chen and Roth (2024) demonstrated that when dependent variables have many zero values, coefficients in regressions with log-like transformations such as $\ln(y+1)$ do not bear the interpretation of percentage effects. Roth and Sant’Anna (2023) showed that the parallel trends assumption in difference-in-differences models is sensitive to the log-transformation of outcome variables. Manning (1998) and Silva and Tenreyro (2006) highlighted issues of using log-linear regressions for estimating impact on the outcome’s mean and elasticity under heteroscedasticity. The results in this paper further demonstrate that in the presence of treatment effect heterogeneity, coefficients in semi-log regressions may not have an ATE in percentage points interpretation.

The remainder of the paper is structured as follows: Section 2 introduces the definition of ATE in percentage points and discusses potential issues of using ATEs in log points as approximations. Section 3 outlines the estimation and inference methods. Section 4 presents two empirical applications, while Section 5 reports the results of Monte Carlo simulations. Section 6 concludes the paper. The appendix contains results tables from both the empirical applications and Monte Carlo simulations. An online appendix provides proofs for all results stated in the main text, explores the extension of the proposed methods to Poisson regression, and presents additional Monte Carlo simulation results.

2 ATE in Percentage Points

2.1 Setup and Definitions

Consider a population partitioned into $G$ mutually exclusive and collectively exhaustive subgroups, indexed by $g=1,\ldots,G$. Let $w_{g}$ denote the population share of subgroup $g$, where $w_{g}>0$ and $\sum_{g=1}^{G}w_{g}=1$. Formal assumptions on $w_{g}$ are presented in Section 3. I assume treatment effects are constant within subgroups but may vary across subgroups.

Define $d_{i}^{(g)}=1$ if individual $i$ belongs to subgroup $g$, and 0 otherwise. Let $y_{i,0}$ and $y_{i,1}$ represent the potential outcomes for individual $i$ if not treated and if treated respectively. I assume that $y_{i,0}>0$ and $y_{i,1}>0$ for all $i$, so that their natural logarithms are well defined.

Remark 1.

When outcome variables contain many zero values, alternative frameworks may be more appropriate. Specifically, Poisson regression models, as proposed by Chen and Roth (2024) and Mullahy and Norton (2024), can be employed to estimate the ATE in levels as a percentage of the baseline mean. I provide a detailed exposition on adapting the estimation and inference methods to this scenario in the Poisson regression section of the online appendix.

The individual treatment effect in log points is defined as $\tau_{i}=\ln\left(y_{i,1}\right)-\ln\left(y_{i,0}\right)$. I assume that $\tau_{i}=\tau_{g}$ if $d_{i}^{(g)}=1$, implying homogeneous treatment effects in log points within subgroups. As a result, $\tau_{i}=\sum_{g=1}^{G}\tau_{g}d_{i}^{(g)}=d_{i}^{\prime}\tau$, where $d_{i}=(d_{i}^{(1)},\ldots,d_{i}^{(G)})^{\prime}$ and $\tau=(\tau_{1},\ldots,\tau_{G})^{\prime}$. The average treatment effect (ATE) in log points is $\bar{\tau}=\sum_{g=1}^{G}w_{g}\tau_{g}=w^{\prime}\tau$, where $w=(w_{1},\ldots,w_{G})^{\prime}$.

The individual treatment effect in percentage points is $\rho_{i}=\left(y_{i,1}-y_{i,0}\right)/y_{i,0}=\exp(\tau_{i})-1$. Consequently, $\rho_{i}=\rho_{g}$ when $d_{i}^{(g)}=1$, ensuring homogeneity of percentage point effects within subgroups. The ATE in percentage points is defined as

$$\bar{\rho}=\sum_{g=1}^{G}w_{g}\rho_{g}=\sum_{g=1}^{G}w_{g}\exp(\tau_{g})-1. \tag{1}$$

The objective of this paper is to develop estimation and inference methods for $\bar{\rho}$ using consistent and asymptotically normal estimators of $w_{g}$ and $\tau_{g}$.
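To make the two population quantities concrete, the sketch below evaluates $\bar{\tau}$ and $\bar{\rho}$ from equation (1) for a hypothetical three-subgroup population. The shares and effects are illustrative assumptions, not values from the paper, and the function names are introduced here only for exposition.

```python
import math


def ate_log_points(w, tau):
    """ATE in log points: tau_bar = sum_g w_g * tau_g."""
    return sum(wg * tg for wg, tg in zip(w, tau))


def ate_percentage_points(w, tau):
    """ATE in percentage points, equation (1): sum_g w_g * exp(tau_g) - 1."""
    return sum(wg * math.exp(tg) for wg, tg in zip(w, tau)) - 1.0


# Hypothetical population shares and log point effects (illustration only).
w = [0.5, 0.3, 0.2]
tau = [0.10, 0.40, -0.20]

rho = [math.exp(t) - 1.0 for t in tau]       # subgroup percentage effects rho_g
tau_bar = ate_log_points(w, tau)
rho_bar = ate_percentage_points(w, tau)
exp_tau_bar_minus_1 = math.exp(tau_bar) - 1.0

print(f"tau_bar              = {tau_bar:.4f}")
print(f"exp(tau_bar) - 1     = {exp_tau_bar_minus_1:.4f}")
print(f"rho_bar              = {rho_bar:.4f}")
```

With these inputs the three numbers differ, which is the gap between log point and percentage point averages that the rest of this section examines.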

2.2 Log Points Approximations

Empirical studies often approximate average effects in percentage points using average effects in log points. When treatment effects are homogeneous, this approximation is reasonably accurate for modest log point effects. Formally, when $\tau_{g}=\bar{\tau}$ and hence $\rho_{g}=\bar{\rho}$ for all $g$, the approximation error is given by $\bar{\rho}-\bar{\tau}=\exp(\bar{\tau})-1-\bar{\tau}$. This error function is strictly convex in $\bar{\tau}$ and reaches its minimum value of 0 when $\bar{\tau}=0$. When $|\bar{\tau}|\leqslant 0.05$, the approximation error is $0\leqslant\bar{\rho}-\bar{\tau}\leqslant 0.0013$, i.e., less than 0.13 percentage points. [Footnote 2: The function $f(x)=\exp(x)-1-x$ has $f^{\prime}(x)=\exp(x)-1$ and $f^{\prime\prime}(x)=\exp(x)>0$. With $f^{\prime}(0)=0$, we have $f(x)\geqslant f(0)=0$. Also, note that $f(0.05)=0.00127>f(-0.05)=0.00123$. Hence $f(x)\leqslant 0.00127$ when $|x|\leqslant 0.05$.] This justifies interpreting $\bar{\tau}$ as a percentage effect when $|\bar{\tau}|$ is small and treatment effects are constant. Furthermore, when treatment effects are homogeneous, the expression $\exp(\bar{\tau})-1$ provides the exact value of $\bar{\rho}$, regardless of the magnitude of $\bar{\tau}$ (Halvorsen and Palmquist, 1980).
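The bound on the approximation error under homogeneity can be verified on a grid. This is a small numerical sketch of the function $f(x)=\exp(x)-1-x$; the grid spacing and the name `approx_error` are choices made here for illustration.

```python
import math


def approx_error(x):
    """f(x) = exp(x) - 1 - x: percentage effect minus log point effect."""
    return math.expm1(x) - x  # expm1 computes exp(x) - 1 accurately near zero


# Grid over [-0.05, 0.05] in steps of 0.0001.
grid = [k / 10_000 for k in range(-500, 501)]
errors = [approx_error(x) for x in grid]

max_error = max(errors)
min_error = min(errors)
argmin = grid[errors.index(min_error)]

print(f"f(0.05)   = {approx_error(0.05):.5f}")
print(f"f(-0.05)  = {approx_error(-0.05):.5f}")
print(f"max error on grid = {max_error:.5f}, min error = {min_error} at x = {argmin}")
```

The maximum on the grid is attained at the right endpoint, consistent with the error being convex with its minimum at zero.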

However, when treatment effects are heterogeneous across subgroups, the average effect in log points or its transformation $\exp(\bar{\tau})-1$ can differ substantially from $\bar{\rho}$. Two types of log-based approximations are relevant in this context: (i) The weighted average of treatment effects in log points with arbitrary weights: $\tilde{\tau}=\sum_{g=1}^{G}\tilde{w}_{g}\tau_{g}$, where $\tilde{w}_{g}\neq w_{g}$ generally and $\sum_{g=1}^{G}\tilde{w}_{g}=1$. When $\tilde{w}_{g}\geqslant 0$ for all $g$, $\tilde{\tau}$ is called a convex average of $\tau_{g}$. (ii) The ATE in log points: $\bar{\tau}=\sum_{g=1}^{G}w_{g}\tau_{g}$, which is a special type of $\tilde{\tau}$ with $\tilde{w}=w$ and $\tilde{w}_{g}>0$. Below I discuss the potential pitfalls of each of them.

2.2.1 Weighted Average of Log Point Effects

Interpreting estimates of $\tilde{\tau}$, the weighted average of treatment effects in log points, as approximations of $\bar{\rho}$ is common in empirical research. This practice is particularly prevalent when treatment effects in log-transformed outcomes are heterogeneous, yet estimation assumes a constant effect. For example, consider a semi-log regression model $\ln(y_{i})=\alpha+\tau s_{i}+x_{i}^{\prime}\beta+\epsilon_{i}$, where $s_{i}$ is the treatment dummy and $x_{i}$ is a column vector of covariates. Under the assumption $E\left[\ln(y_{0,i})|x_{i},s_{i}\right]=E\left[\ln(y_{0,i})|x_{i}\right]$, where $y_{0,i}$ is the potential outcome if not treated, the OLS estimator of $\tau$ is a convex average of covariate-specific treatment effects in $\ln(y_{i})$ (Angrist, 1998). Another example is the two-way fixed effects (TWFE) estimator for staggered difference-in-differences designs, where different cohorts start to get treated at different times. Treatment effects can be heterogeneous across cohorts and across event time (time since treatment). Consider the following TWFE model:

$$\ln(y_{it})=\alpha_{i}+\gamma_{t}+\tau s_{it}+x_{it}^{\prime}\beta+\epsilon_{it}, \tag{2}$$

where $y_{it}$ is the outcome of individual $i$ at time $t$, $s_{it}$ is the indicator that $i$ has already been treated in period $t$, and $\alpha_{i}$, $\gamma_{t}$ and $x_{it}$ are individual fixed effects, time fixed effects and control variables respectively. Under parallel trends and no anticipation assumptions, the OLS estimator for $\tau$ is a weighted average of treatment effects in $\ln(y_{i})$ for different cohort-event time combinations. Notably, some of the weights can potentially be negative (de Chaisemartin and D’Haultfœuille, 2020; Goodman-Bacon, 2021). When outcomes are log-transformed, two-stage least squares (2SLS) estimators are also weighted averages of log point effects (see, e.g., Imbens and Angrist 1994; Mogstad et al. 2021).

In these examples, interpreting the coefficient of the treatment dummy or its exponential transformation minus one as the ATE in percentage points implicitly approximates $\bar{\rho}$ with weighted averages of log point effects $\tilde{\tau}$ or $\exp(\tilde{\tau})-1$. Such approximations can lead to misleading interpretations due to several key issues.

First, if $\tilde{\tau}$ is a non-convex average of $\tau_{g}$, as can occur in some staggered difference-in-differences settings, then $\tilde{\tau}$ and $\exp(\tilde{\tau})-1$ may fall outside the convex hull of $\rho_{g}$. For example, if $(\tilde{w}_{1},\tilde{w}_{2})=(-0.1,1.1)$ and $(\tau_{1},\tau_{2})=(0.1,0.05)$, we have $(\rho_{1},\rho_{2})=(0.105,0.051)$, hence $\tilde{\tau}=0.045$ and $\exp(\tilde{\tau})-1=0.046$ are both outside the convex hull of $\rho_{g}$.

Second, if $\tilde{\tau}$ is a convex average of $\tau_{g}$, $\tilde{\tau}$ may still lie outside the convex hull of $\rho_{g}$, although $\exp(\tilde{\tau})-1$ lies inside it. As an example of the former, when $(\tau_{1},\tau_{2})=(0.09,0.1)$ and $\tilde{w}=(0.9,0.1)$, $\tilde{\tau}=0.091$ is not in the convex hull of $(\rho_{1},\rho_{2})=(0.094,0.105)$. To see the latter, note that with $0\leqslant\tilde{w}_{g}\leqslant 1$, $\min_{g}(\tau_{g})\leqslant\tilde{\tau}\leqslant\max_{g}(\tau_{g})$, hence $\min_{g}(\rho_{g})=\min_{g}(\exp(\tau_{g})-1)\leqslant\exp(\tilde{\tau})-1\leqslant\max_{g}(\exp(\tau_{g})-1)=\max_{g}(\rho_{g})$.

Third, even if $\tilde{\tau}$ and $\exp(\tilde{\tau})-1$ are in the convex hull of $\rho_{g}$, they are different from $\bar{\rho}$. Take $\tilde{\tau}$ for example,

$$\tilde{\tau}-\bar{\rho}=\sum_{g=1}^{G}\tilde{w}_{g}\tau_{g}-\sum_{g=1}^{G}w_{g}\rho_{g}=\sum_{g=1}^{G}w_{g}(\tau_{g}-\rho_{g})+\sum_{g=1}^{G}(\tilde{w}_{g}-w_{g})\tau_{g},$$

which depends on the disparities between $w_{g}$ and $\tilde{w}_{g}$, as well as between $\tau_{g}$ and $\rho_{g}=\exp(\tau_{g})-1$. The difference approaches 0 if $\tau_{g}$ is close to 0 for all $g$, but can be large even if $\tilde{\tau}$ is close to 0. For example, when $(\tilde{w}_{1},\tilde{w}_{2})=(0.2,0.8)$, $(w_{1},w_{2})=(0.8,0.2)$ and $(\tau_{1},\tau_{2})=(0.08,-0.02)$, we have $\tilde{\tau}=\exp(\tilde{\tau})-1=0$ but $\bar{\rho}=0.063$, so the difference is $6.3$ percentage points.
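The three numerical examples in this subsection can be reproduced directly. The sketch below is only a check of the arithmetic; the helper names (`pct`, `weighted_avg`, `in_hull`) are introduced here for illustration, and for scalar effects the convex hull of $\rho_{g}$ is simply the interval between the smallest and largest $\rho_{g}$.

```python
import math


def pct(tau):
    """Percentage effect implied by a log point effect: rho = exp(tau) - 1."""
    return math.exp(tau) - 1.0


def weighted_avg(weights, values):
    return sum(a * b for a, b in zip(weights, values))


def in_hull(value, points):
    """For scalars, the convex hull of the points is the interval [min, max]."""
    return min(points) <= value <= max(points)


# Issue 1: non-convex weights (one weight is negative).
tau_1 = [0.10, 0.05]
rho_1 = [pct(t) for t in tau_1]
tau_tilde_1 = weighted_avg([-0.1, 1.1], tau_1)
issue_1 = (in_hull(tau_tilde_1, rho_1), in_hull(pct(tau_tilde_1), rho_1))

# Issue 2: convex weights; tau_tilde is outside the hull, exp(tau_tilde) - 1 inside.
tau_2 = [0.09, 0.10]
rho_2 = [pct(t) for t in tau_2]
tau_tilde_2 = weighted_avg([0.9, 0.1], tau_2)
issue_2 = (in_hull(tau_tilde_2, rho_2), in_hull(pct(tau_tilde_2), rho_2))

# Issue 3: weights differ from population shares; decompose tau_tilde - rho_bar.
tau_3 = [0.08, -0.02]
w_tilde, w = [0.2, 0.8], [0.8, 0.2]
tau_tilde_3 = weighted_avg(w_tilde, tau_3)
rho_bar_3 = weighted_avg(w, [pct(t) for t in tau_3])
term_nonlinearity = weighted_avg(w, [t - pct(t) for t in tau_3])
term_weights = weighted_avg([a - b for a, b in zip(w_tilde, w)], tau_3)

print("issue 1 (tau_tilde in hull, exp(tau_tilde)-1 in hull):", issue_1)
print("issue 2 (tau_tilde in hull, exp(tau_tilde)-1 in hull):", issue_2)
print(f"issue 3: tau_tilde = {tau_tilde_3:.4f}, rho_bar = {rho_bar_3:.4f}")
print(f"         nonlinearity term = {term_nonlinearity:.4f}, weight term = {term_weights:.4f}")
```

In the third example almost all of the gap comes from the weight term $\sum_{g}(\tilde{w}_{g}-w_{g})\tau_{g}$, with the nonlinearity term contributing only a small part.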

2.2.2 ATE in Log Points

The ATE in log points, denoted as $\bar{\tau}$, can be estimated by explicitly accounting for treatment effect heterogeneity. This involves estimating $\tau_{g}$ and $w_{g}$ for each subgroup, then computing their weighted average. For example, in staggered difference-in-differences designs, Callaway and Sant’Anna (2021) develop estimates for aggregate treatment effects that essentially represent a set of ATEs in log points when outcomes are log-transformed. Their method employs doubly robust estimation to estimate each $\tau_{g}$ (i.e., the treatment effect in log points for a specific cohort-event time combination), and then aggregates them using the respective weights. Another example is the interaction-weighted estimator developed by Gibbons et al. (2019) and applied to staggered difference-in-differences settings by Sun and Abraham (2021). With log-transformed outcomes, these methods estimate $\tau_{g}$ using OLS and then compute the weighted average using sample-size-based weights.

Given that $w_{g}>0$, $\bar{\tau}$ is a convex average of $\tau_{g}$. As discussed in the second property of $\tilde{\tau}$, $\exp(\bar{\tau})-1$ lies within the convex hull of $\rho_{g}$, but $\bar{\tau}$ itself may not. Moreover, both $\bar{\tau}$ and $\exp(\bar{\tau})-1$ may be poor approximations of $\bar{\rho}$. The approximation bias of $\bar{\tau}$ is $\bar{\tau}-\bar{\rho}=\sum_{g=1}^{G}w_{g}(\tau_{g}-\rho_{g})=\sum_{g=1}^{G}w_{g}(\tau_{g}-\exp(\tau_{g})+1)$. This bias approaches 0 if $\tau_{g}$ is close to 0 for all $g$. However, it can be substantial if individual $\tau_{g}$ values are large, even when $\bar{\tau}=0$. For instance, with $w_{1}=w_{2}=0.5$ and $(\tau_{1},\tau_{2})=(-0.2,0.2)$, we obtain $\bar{\tau}=0$ but $\bar{\rho}=0.02$. The approximation $\exp(\bar{\tau})-1$ yields a smaller error, as $\bar{\rho}\geqslant\exp(\bar{\tau})-1\geqslant\bar{\tau}$. The first inequality follows from Jensen’s inequality: $\sum_{g=1}^{G}w_{g}\exp(\tau_{g})\geqslant\exp\left(\sum_{g=1}^{G}w_{g}\tau_{g}\right)$, with equality holding only when treatment effects are homogeneous ($\tau_{g}=\tau$ for all $g$). The larger the heterogeneity in $\tau_{g}$, the greater the bias of $\exp(\bar{\tau})-1$. The second inequality stems from $\exp(x)-x-1\geqslant 0$ for any $x$, with equality holding only when $\bar{\tau}=0$; see footnote 2. Thus, $\exp(\bar{\tau})-1$ provides a better approximation for $\bar{\rho}$ as it always lies within the convex hull of $\rho_{g}$ and yields a smaller bias. However, $\exp(\bar{\tau})-1$ lacks a clear economic interpretation. It can be viewed as an alternative to the geometric mean of the percentage effect, as $\exp(\bar{\tau})=\exp\left(\sum_{g=1}^{G}w_{g}\ln(\rho_{g}+1)\right)$ represents the weighted geometric mean of $\rho_{g}+1$, which is the ratio of the potential outcomes $y_{i,1}/y_{i,0}$ for subgroup $g$.
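The ordering $\bar{\rho}\geqslant\exp(\bar{\tau})-1\geqslant\bar{\tau}$ and the geometric-mean reading of $\exp(\bar{\tau})$ can be illustrated with the two-group example from the text. The sketch below also widens the spread of $\tau_{g}$ while holding $\bar{\tau}=0$; the spread values and the helper name `summarize` are assumptions made for illustration.

```python
import math


def summarize(w, tau):
    """Return (tau_bar, exp(tau_bar) - 1, rho_bar) for shares w and effects tau."""
    tau_bar = sum(wg * tg for wg, tg in zip(w, tau))
    rho_bar = sum(wg * math.exp(tg) for wg, tg in zip(w, tau)) - 1.0
    return tau_bar, math.expm1(tau_bar), rho_bar


# Example from the text: equal shares, effects of -0.2 and 0.2 log points.
w = [0.5, 0.5]
tau = [-0.2, 0.2]
tau_bar, exp_tau_bar_m1, rho_bar = summarize(w, tau)

# Weighted geometric mean of the outcome ratios rho_g + 1.
rho = [math.exp(t) - 1.0 for t in tau]
geo_mean = math.prod((rg + 1.0) ** wg for wg, rg in zip(w, rho))

# Holding tau_bar = 0, a wider spread in tau_g enlarges rho_bar - (exp(tau_bar) - 1).
spreads = [0.0, 0.1, 0.2, 0.4]  # hypothetical spreads, for illustration
gaps = []
for s in spreads:
    _, e_m1, r_bar = summarize(w, [-s, s])
    gaps.append(r_bar - e_m1)

print(f"tau_bar = {tau_bar:.4f}, exp(tau_bar)-1 = {exp_tau_bar_m1:.4f}, rho_bar = {rho_bar:.4f}")
print(f"weighted geometric mean of rho_g + 1 = {geo_mean:.4f}")
print("Jensen gaps by spread:", [round(g, 4) for g in gaps])
```

The gap is zero when the two effects coincide and grows with the spread, in line with the statement that more heterogeneity means a larger bias of $\exp(\bar{\tau})-1$.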

While $\bar{\tau}$ or $\exp(\bar{\tau})-1$ improve upon $\tilde{\tau}$ or $\exp(\tilde{\tau})-1$, they still fail to fully capture the ATE in percentage points. These limitations underscore the importance of developing consistent and, under certain circumstances, unbiased estimation and inference methods for $\bar{\rho}$, which will be addressed in the subsequent section.

3 Estimation and Inference

This study proposes a general econometric framework for the estimation and inference of $\bar{\rho}$, building upon consistent and asymptotically normal estimators for $w=(w_{1},\ldots,w_{G})^{\prime}$ and $\tau=(\tau_{1},\ldots,\tau_{G})^{\prime}$. [Footnote 3: The methodology is implemented in an accompanying R package, forthcoming for public use.] The proposed methodology imposes no restrictions on the functional form or application context of these estimators, provided they meet the assumptions in Section 3.1. The framework is motivated by semi-log regression models with multiple treatment indicators, which naturally extend to semi-log difference-in-differences models with staggered treatment adoption. However, the framework’s design allows extension to diverse empirical settings beyond its primary motivation.

3.1 Estimators of $w$ and $\tau$

I impose the following assumptions on the parameters $w$ and $\tau$.

Assumption 1.

There exist constants $c_{w}$ and $C_{w}$ such that $0<c_{w}\leqslant w_{g}\leqslant C_{w}\leqslant 1$ for all $g$, and $\sum_{g=1}^{G}w_{g}=1$.

Without loss of generality, I restrict $w_{g}$ to be strictly positive for all $g$, excluding groups with zero share from the analysis. I allow $w_{g}=1$ to incorporate the case of $G=1$, representing homogeneous treatment effects.

Assumption 2.

For $g=1,\ldots,G$, $-C_{\tau}\leqslant\tau_{g}\leqslant C_{\tau}$ for some constant $0\leqslant C_{\tau}<\infty$.

Assumption 2 bounds the value of $\tau_{g}$. It allows $C_{\tau}=0$, which implies $\tau_{g}=0$ and consequently $\rho_{g}=0$ for all $g$. Given that $\rho_{g}=\exp(\tau_{g})-1$, the assumption leads to $\exp(-C_{\tau})-1\leqslant\rho_{g}\leqslant\exp(C_{\tau})-1$. Combining Assumptions 1 and 2 yields $-C_{\tau}\leqslant\bar{\tau}\leqslant C_{\tau}$ and $\exp(-C_{\tau})-1\leqslant\bar{\rho}\leqslant\exp(C_{\tau})-1$, where $\bar{\tau}$ and $\bar{\rho}$ are ATEs in log points and percentage points respectively.
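The bounds implied by Assumptions 1 and 2 follow because $\bar{\tau}$ and $\bar{\rho}$ are convex combinations of bounded quantities. As a quick sanity check, the sketch below evaluates them for a few hypothetical configurations, including the boundary cases; the value of `c_tau`, the shares, and the small tolerance `eps` for floating-point rounding are all assumptions made here.

```python
import math


def bounds_hold(w, tau, c_tau, eps=1e-12):
    """Check -C <= tau_bar <= C and exp(-C) - 1 <= rho_bar <= exp(C) - 1."""
    tau_bar = sum(wg * tg for wg, tg in zip(w, tau))
    rho_bar = sum(wg * math.exp(tg) for wg, tg in zip(w, tau)) - 1.0
    ok_tau = -c_tau - eps <= tau_bar <= c_tau + eps
    ok_rho = math.exp(-c_tau) - 1.0 - eps <= rho_bar <= math.exp(c_tau) - 1.0 + eps
    return ok_tau and ok_rho


c_tau = 0.5                 # hypothetical bound C_tau
w = [0.25, 0.25, 0.5]       # hypothetical shares: positive and summing to one

cases = {
    "interior": [0.1, -0.3, 0.4],
    "all at upper bound": [0.5, 0.5, 0.5],
    "all at lower bound": [-0.5, -0.5, -0.5],
    "mixed extremes": [0.5, -0.5, 0.5],
}
results = {name: bounds_hold(w, tau, c_tau) for name, tau in cases.items()}

# A vector violating Assumption 2 can push rho_bar outside the bounds.
violating = bounds_hold(w, [1.5, 1.5, 1.5], c_tau)

for name, ok in results.items():
    print(f"{name}: bounds hold = {ok}")
print(f"tau outside [-C, C]: bounds hold = {violating}")
```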

Let $\hat{w}=(\hat{w}_{1},\ldots,\hat{w}_{G})^{\prime}$ and $\hat{\tau}=(\hat{\tau}_{1},\ldots,\hat{\tau}_{G})^{\prime}$ denote the estimators of $w$ and $\tau$ respectively. I assume that $\hat{w}$ and $\hat{\tau}$ are consistent and asymptotically jointly normal.

Assumption 3.

(a) There exist $G\times G$ constant matrices $\bar{\Sigma}_{w}$ and $\bar{\Sigma}_{\tau}$, such that $\bar{\Sigma}_{w}$ is positive semi-definite, $\bar{\Sigma}_{\tau}$ is positive definite, and as $N\rightarrow\infty$,

$$\sqrt{N}\left(\begin{array}{c}\hat{w}-w\\ \hat{\tau}-\tau\end{array}\right)\xrightarrow{d}\mathcal{N}\left[0_{2G\times 1},\left(\begin{array}{cc}\bar{\Sigma}_{w}&0\\ 0&\bar{\Sigma}_{\tau}\end{array}\right)\right].$$

(b) Let $\hat{\bar{\Sigma}}_{w}$ and $\hat{\bar{\Sigma}}_{\tau}$ be the estimators of $\bar{\Sigma}_{w}$ and $\bar{\Sigma}_{\tau}$ respectively. As $N\rightarrow\infty$, $\hat{\bar{\Sigma}}_{w}-\bar{\Sigma}_{w}\xrightarrow{p}0$ and $\hat{\bar{\Sigma}}_{\tau}-\bar{\Sigma}_{\tau}\xrightarrow{p}0$.

Assumption 3 is a standard property for estimators, except for the restriction of asymptotic joint normality. The asymptotic joint normality can hold trivially if $\hat{w}$ and $\hat{\tau}$ are independent, e.g., when $\hat{w}$ is estimated with sampling factors, and $\hat{\tau}$ is estimated with another set of variables independent of sampling. It can also hold when $d_{i}$ is a vector of dummies for sub-treatment groups and $\hat{\tau}$ is an OLS estimator for the treatment effects in log points, as will be illustrated in Example 1.

I define $\Sigma_{w}=\bar{\Sigma}_{w}/N$ and $\Sigma_{\tau}=\bar{\Sigma}_{\tau}/N$, which are the asymptotic variances of $\hat{w}$ and $\hat{\tau}$ respectively. When $N$ is large enough, we have approximately [Footnote 4: The statement is heuristic; see Wooldridge (2010), pp. 40-42, for related discussions.]

$$\left(\begin{array}{c}\hat{w}-w\\ \hat{\tau}-\tau\end{array}\right)\sim\mathcal{N}\left[0_{2G\times 1},\left(\begin{array}{cc}\Sigma_{w}&0\\ 0&\Sigma_{\tau}\end{array}\right)\right].$$

Denote the $g$-th diagonal elements of $\Sigma_{\tau}=\bar{\Sigma}_{\tau}/N$ and $\hat{\Sigma}_{\tau}=\hat{\bar{\Sigma}}_{\tau}/N$ as $\sigma_{\tau,g}^{2}$ and $\hat{\sigma}_{\tau,g}^{2}$ respectively. Then $\sigma_{\tau,g}^{2}$ is the asymptotic variance of $\hat{\tau}_{g}$ and $\hat{\sigma}_{\tau,g}^{2}$ is its estimator, with $N(\hat{\sigma}_{\tau,g}^{2}-\sigma_{\tau,g}^{2})\xrightarrow{p}0$ under Assumption 3. Let $\hat{\Sigma}_{w}=\hat{\bar{\Sigma}}_{w}/N$, and define $\sigma_{w,g}^{2}$ and $\hat{\sigma}_{w,g}^{2}$ in a similar manner; then $N(\hat{\sigma}_{w,g}^{2}-\sigma_{w,g}^{2})\xrightarrow{p}0$.

To construct the exact unbiased estimator, I further assume:

Assumption 4.

(a) For $g=1,\ldots,G$, $E\left(\hat{w}_{g}\right)=w_{g}$.

(b) The estimator $\hat{w}_{g}$ is a function of some set of variables $Z$, and we have $\hat{\tau}_{g}-\tau_{g}\,|\,Z\sim\mathcal{N}(0,\sigma_{\tau,g}^{2}(Z))$, $\left(m\hat{\sigma}_{\tau,g}^{2}/\sigma_{\tau,g}^{2}(Z)\right)|\,Z\sim\chi^{2}(m)$ for some positive integer $m$, and $\hat{\tau}_{g}$ and $\hat{\sigma}_{\tau,g}^{2}$ are independent conditional on $Z$.

Assumption 4 can hold when $\hat{\tau}$ is the OLS estimator in a semi-log regression model with i.i.d. normal errors and independent sampling.

I use two examples to highlight how the general assumptions are met in practice and explain their formulation. The first example is the OLS estimator in a semi-log regression model with multiple treatments under random sampling. The second example, which can be viewed as a special case of the first, is a semi-log difference-in-differences model with staggered adoption.

3.1.1 Example 1: Semi-log Regression Model

This example demonstrates how $\hat{w}$ and $\hat{\tau}$ satisfying Assumption 3 can be obtained as OLS estimators, applicable in research designs such as (conditional) randomized controlled trials.

The objective is to estimate the average treatment effect on the treated (ATT) in percentage points. The population of interest is the entire treatment group, comprising $G$ sub-treatment groups, with homogeneous treatment effects within each sub-treatment group but potential heterogeneity across groups. In this setting, $d_{i}^{(g)}=1$ if $i$ is in sub-treatment group $g$ and 0 otherwise. The indicator of treatment is $s_{i}=\sum_{g=1}^{G}d_{i}^{(g)}=\mathbf{1}_{G}^{\prime}d_{i}$, where $\mathbf{1}_{G}$ is a $G\times 1$ vector of ones.

I obtain τ^\hat{\tau} as the OLS estimator from a semi-log regression model. Note that sidi=dis_{i}d_{i}=d_{i} as di(g)d_{i}^{(g)} are mutually exclusive dummies that sum up to sis_{i}. Also recall that τiln(y1,i)ln(y0,i)=diτ\tau_{i}\equiv\ln(y_{1,i})-\ln(y_{0,i})=d_{i}^{\prime}\tau. Thus the observed log-transformed outcome is

ln(yi)\displaystyle\ln(y_{i}) =ln(y0,i)(1si)+ln(y1,i)si=ln(y0,i)+siτi=ln(y0,i)+diτ.\displaystyle=\ln(y_{0,i})(1-s_{i})+\ln(y_{1,i})s_{i}=\ln(y_{0,i})+s_{i}\tau_{i}=\ln(y_{0,i})+d_{i}^{\prime}\tau.

Let xx denote a kx×1k_{x}\times 1 vector of observed covariates including a constant term. Assuming that E(ln(y0,i)|xi,di)=xiβE\left(\ln(y_{0,i})\left|x_{i},d_{i}\right.\right)=x_{i}^{\prime}\beta, a semi-log linear regression model that allows for treatment effect heterogeneity is specified as

ln(yi)=xiβ+diτ+ϵi,\ln(y_{i})=x_{i}^{\prime}\beta+d_{i}^{\prime}\tau+\epsilon_{i}, (3)

where $\epsilon_{i}=\ln(y_{0,i})-E\left[\ln\left(y_{0,i}\right)|x_{i},d_{i}\right]$ and by construction $E(\epsilon_{i}|x_{i},d_{i})=0$. (Footnote 5: The specification can also be justified by other assumptions, and alternative estimation approaches are available; see Goldsmith-Pinkham et al. (2024) for a comprehensive discussion of the estimation of heterogeneous treatment effects in linear regressions.)

Remark 2.

The specification in equation (3) imposes no restriction on the heterogeneity of ln(y0,i)\ln(y_{0,i}) and permits various group structures. The control group can be homogeneous or partitioned into GG sub-groups corresponding to treatment sub-groups. In the latter case, xix_{i} includes sub-group dummies, and di(g)d_{i}^{(g)} represents sub-group-treatment interactions. For instance, consider a sample comprising GG cities, each containing both control and treatment units. Here, xix_{i} would include city dummies, and did_{i} would be city-treatment interactions, capturing city-specific effects. In this specific scenario, the approach is analogous to the interacted weighted estimator in Gibbons et al. (2019), differing in the log-transformed outcome.

The matrix form of model (3) is lnY=Xβ+Dτ+ϵ,lnY=X\beta+D\tau+\epsilon, where lnY=(ln(y1),,ln(yN))lnY=\left(\ln(y_{1}),\ldots,\ln(y_{N})\right)^{\prime}, X=(x1,,xN)X=\left(x_{1},\ldots,x_{N}\right)^{\prime}, D=(d1,,dN)D=(d_{1},\ldots,d_{N})^{\prime}, and ϵ=(ϵ1,,ϵN)\epsilon=\left(\epsilon_{1},\ldots,\epsilon_{N}\right)^{\prime}. Let x~i=(xi,di)\tilde{x}_{i}=\left(x_{i}^{\prime},d_{i}^{\prime}\right)^{\prime}, and X~=(x~1,x~N)=[X,D]\tilde{X}=(\tilde{x}_{1},\ldots\tilde{x}_{N})^{\prime}=[X,D]. The OLS estimator of (β,τ)(\beta^{\prime},\tau^{\prime})^{\prime} is given by (β^,τ^)=(X~X~)1X~lnY(\hat{\beta}^{\prime},\hat{\tau}^{\prime})^{\prime}=(\tilde{X}^{\prime}\tilde{X})^{-1}\tilde{X}^{\prime}lnY. The heteroscedasticity robust estimator for the asymptotic variance of N(β^,τ^)\sqrt{N}(\hat{\beta}^{\prime},\hat{\tau}^{\prime})^{\prime} is

Σ¯^(β,τ)=N(X~X~)1(i=1Nx~ix~iϵ^i2)(X~X~)1,\hat{\bar{\Sigma}}_{(\beta,\tau)}=N(\tilde{X}^{\prime}\tilde{X})^{-1}\left(\sum_{i=1}^{N}\tilde{x}_{i}\tilde{x}_{i}^{\prime}\hat{\epsilon}_{i}^{2}\right)(\tilde{X}^{\prime}\tilde{X})^{-1}, (4)

where ϵ^i\hat{\epsilon}_{i} is the OLS residual.

I next discuss the estimation of $w$. The population share of sub-treatment group $g$ in the whole treatment group is $w_{g}=E(d_{i}^{(g)}|s_{i}=1)=\Pr(d_{i}^{(g)}=1|s_{i}=1)$. Suppose we have a sample of $N$ individuals, with $N_{S}=\sum_{i=1}^{N}s_{i}$ in the treatment group. Define $N_{g}=\sum_{i=1}^{N}d_{i}^{(g)}$ as the size of sub-treatment group $g$, so that $N_{S}=\sum_{g=1}^{G}N_{g}$.

The estimator of wgw_{g} is w^g=Ng/NS\hat{w}_{g}=N_{g}/N_{S} and of ww is w^=(N1/NS,,NG/NS)\hat{w}=(N_{1}/N_{S},\ldots,N_{G}/N_{S})^{\prime}. Denote the diagonal matrix with elements of vector α\alpha on the main diagonal as diag(α)\operatorname{diag}(\alpha). Let

Σ¯^w=N/NS(diag(w^)w^w^)\hat{\bar{\Sigma}}_{w}=N/N_{S}\left(\operatorname{diag}(\hat{w})-\hat{w}\hat{w}^{\prime}\right) (5)

be the estimator for the asymptotic variance of N(w^w)\sqrt{N}\left(\hat{w}-w\right). As shown in the proof of Lemma 1 in the online appendix, w^g\hat{w}_{g} is the OLS estimator for wgw_{g} in the regression di(g)=wgsi+vid_{i}^{(g)}=w_{g}s_{i}+v_{i} and hence a function of D=(d1,,dN)D=(d_{1},\ldots,d_{N}), and Σ¯^w/N\hat{\bar{\Sigma}}_{w}/N is the heteroscedasticity robust covariance matrix of w^\hat{w}.
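The weight estimator and its variance estimator in (5) depend only on the group counts, so they can be computed directly. The following is a minimal sketch under that reading; the function name `weight_estimates` and the example counts are hypothetical and are not taken from the paper.

```python
def weight_estimates(group_sizes, n_total):
    """Return (w_hat, sigma_bar_w_hat) from sub-treatment group sizes N_g and sample size N.

    group_sizes: list of N_g, one entry per sub-treatment group.
    n_total: total sample size N (treated and control units).
    """
    n_treated = sum(group_sizes)                      # N_S = sum_g N_g
    w_hat = [n_g / n_treated for n_g in group_sizes]  # w_hat_g = N_g / N_S
    scale = n_total / n_treated                       # factor N / N_S in eq. (5)
    G = len(w_hat)
    # Eq. (5): (N / N_S) * (diag(w_hat) - w_hat w_hat')
    sigma_bar_w = [
        [scale * ((w_hat[g] if g == h else 0.0) - w_hat[g] * w_hat[h]) for h in range(G)]
        for g in range(G)
    ]
    return w_hat, sigma_bar_w


# Hypothetical example: two sub-treatment groups of sizes 30 and 70 in a sample of 200.
w_hat, sigma_bar_w = weight_estimates([30, 70], n_total=200)
```

Because the weights sum to one, every row of the resulting matrix sums to zero, which is why $\bar{\Sigma}_{w}$ is only positive semi-definite.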

Lemma 1.

Assume that: (i) The data {(yi,xi,di),i=1,,N}\left\{\left(y_{i},x_{i},d_{i}\right),i=1,\ldots,N\right\} is an i.i.d. sample drawn from the population. (ii) Treatment probability is E(si)=pSE(s_{i})=p_{S}, where 0<pS<10<p_{S}<1. (iii) E(x~ix~i)E\left(\tilde{x}_{i}\tilde{x}_{i}^{\prime}\right) is finite and nonsingular. (iv) E(ϵi|x~i)=0E(\epsilon_{i}|\tilde{x}_{i})=0 and E(ϵi2x~ix~i)=ΣxE(\epsilon_{i}^{2}\tilde{x}_{i}\tilde{x}_{i}^{\prime})=\Sigma_{x}, where Σx\Sigma_{x} is finite and positive definite. Then Assumption 3 holds, specifically: (a)

N(τ^τw^w)𝑑𝒩[02G×1,(Σ¯τ00Σ¯w)],\sqrt{N}\left(\begin{array}[]{c}\hat{\tau}-\tau\\ \hat{w}-w\end{array}\right)\xrightarrow{d}\mathcal{N}\left[0_{2G\times 1},\left(\begin{array}[]{cc}\bar{\Sigma}_{\tau}&0\\ 0&\bar{\Sigma}_{w}\end{array}\right)\right],

where Σ¯w=pS1(diag(w)ww)\bar{\Sigma}_{w}=p_{S}^{-1}\left(\operatorname{diag}(w)-ww^{\prime}\right) is positive semi-definite, Σ¯τ\bar{\Sigma}_{\tau} is the lower-right G×GG\times G sub-matrix of Σ¯(β,τ)=[E(x~ix~i)]1Σx[E(x~ix~i)]1\bar{\Sigma}_{\left(\beta,\tau\right)}=\left[E\left(\tilde{x}_{i}\tilde{x}_{i}^{\prime}\right)\right]^{-1}\Sigma_{x}\left[E\left(\tilde{x}_{i}\tilde{x}_{i}^{\prime}\right)\right]^{-1} and is positive definite.

(b) Let Σ¯^τ\hat{\bar{\Sigma}}_{\tau} be the lower-right G×GG\times G sub-matrix of Σ¯^(β,τ)\hat{\bar{\Sigma}}_{(\beta,\tau)} in (4), and Σ¯^w\hat{\bar{\Sigma}}_{w} be defined in (5), then Σ¯^wΣ¯w𝑝0\hat{\bar{\Sigma}}_{w}-\bar{\Sigma}_{w}\xrightarrow{p}0 and Σ¯^τΣ¯τ𝑝0\hat{\bar{\Sigma}}_{\tau}-\bar{\Sigma}_{\tau}\xrightarrow{p}0 as NN\rightarrow\infty.

Remark 3.

In the special case when G=1G=1, w^=w=1\hat{w}=w=1 and Σ¯^w=Σ¯w=0\hat{\bar{\Sigma}}_{w}=\bar{\Sigma}_{w}=0, which satisfy Assumption 3 trivially.

Under more restrictive conditions, Assumption 4 holds.

Lemma 2.

Suppose all conditions in Lemma 1 hold. Furthermore, suppose the errors $\epsilon_{i}$ in regression (3) are i.i.d. $\mathcal{N}(0,\sigma_{\epsilon}^{2})$ conditional on $\tilde{X}$, and $\hat{\sigma}_{\tau,g}^{2}=z_{g}\hat{\sigma}_{\epsilon}^{2}$, where $z_{g}$ is the $(k_{x}+g)$-th diagonal element of $(\tilde{X}^{\prime}\tilde{X})^{-1}$, $\hat{\sigma}_{\epsilon}^{2}=\hat{\epsilon}^{\prime}\hat{\epsilon}/(N-k_{x}-G)$ and $\hat{\epsilon}$ is the vector of OLS residuals. Then Assumption 4 holds. Specifically: (i) $\hat{w}$ is a function of $\tilde{X}$. (ii) $\left(\hat{\tau}_{g}-\tau_{g}\right)|\tilde{X}\sim\mathcal{N}\left(0,\sigma_{\tau,g}^{2}(\tilde{X})\right)$, where $\sigma_{\tau,g}^{2}(\tilde{X})=z_{g}\sigma_{\epsilon}^{2}$. (iii) Conditional on $\tilde{X}$, $m\hat{\sigma}_{\tau,g}^{2}/\sigma_{\tau,g}^{2}(\tilde{X})=m\hat{\sigma}_{\epsilon}^{2}/\sigma_{\epsilon}^{2}\sim\chi^{2}(m)$, where $m=N-k_{x}-G$. (iv) $\hat{\tau}_{g}$ is independent of $\hat{\sigma}_{\tau,g}^{2}$ conditional on $\tilde{X}$.

Proof.

Note that $\hat{w}$ is a function of $D$ and hence of $\tilde{X}=[X,D]$. The remaining claims follow from the standard theory of the classical normal linear regression model. ∎

3.1.2 Example 2: Staggered Difference-in-differences Design

This example considers a staggered difference-in-differences design where multiple groups start receiving treatment at different times. I focus on the case with a never-treated group and no treatment exit and use panel data for illustration.

Let $c(i)$ denote the period when individual $i$ first receives treatment, with $c(i)=\infty$ for never-treated individuals. Cohort $c$ is defined as the group of individuals with $c(i)=c$, and the event time $r_{it}=t-c(i)$ is time relative to treatment. Each $(c,r)$ combination is treated as a sub-group, with $\tau(c,r)$ denoting the treatment effect on the log-transformed outcome for cohort $c$ at event time $r$. Recent literature has demonstrated that if treatment effects are heterogeneous across $c$ and $r$, i.e., $\tau(c,r)\neq\tau(c^{\prime},r^{\prime})$ if $c\neq c^{\prime}$ or $r\neq r^{\prime}$, traditional two-way fixed effects (TWFE) estimators can produce biased results (Goodman-Bacon, 2021; Sun and Abraham, 2021; de Chaisemartin and D’Haultfœuille, 2020; Borusyak et al., 2024). In response, researchers have proposed heterogeneity-robust estimators (de Chaisemartin and D’Haultfœuille, 2020; Sun and Abraham, 2021; Callaway and Sant’Anna, 2021; Borusyak et al., 2024, etc.). With log-transformed outcomes, these estimators typically involve estimating $\tau(c,r)$ and the corresponding weights $w(c,r)$, then computing the ATE in log points $\bar{\tau}=\sum_{c}\sum_{r}w(c,r)\tau(c,r)$. As discussed in Section 2.2, $\bar{\tau}$ and $\operatorname{exp}(\bar{\tau})-1$ differ from the ATE in percentage points, $\bar{\rho}$. However, the estimators of $w$ and $\tau$ in these studies usually satisfy Assumption 3 and therefore form the basis of the estimation and inference methods for $\bar{\rho}$ in this paper. (Footnote 6: See, e.g., Theorems 2 and 3 in Callaway and Sant’Anna (2021) and Propositions 5 and 6 in Sun and Abraham (2021).)

Below I discuss the estimators of $\tau(c,r)$ and $w(c,r)$ in Sun and Abraham (2021), which adapt the OLS estimators in Example 1 to staggered difference-in-differences settings by viewing each $(c,r)$ combination as a sub-treatment group. For the estimation of $\tau(c,r)$, consider the model

ln(yit)=αi+βt+xitγ+cr1dit(c,r)τ(c,r)+ϵit,\ln(y_{it})=\alpha_{i}+\beta_{t}+x_{it}^{\prime}\gamma+\sum_{c\neq\infty}\sum_{r\neq-1}d_{it}(c,r)\tau(c,r)+\epsilon_{it}, (6)

where yity_{it}, αi\alpha_{i}, βt\beta_{t} and xitx_{it} are defined as in equation (2), and dit(c,r)=1d_{it}(c,r)=1 if individual ii belongs to cohort cc and rit=rr_{it}=r. The never-treated group (c=c=\infty) serves as the control group, and the period immediately preceding treatment (r=1r=-1) serves as the base period. Under assumptions of conditional parallel trends and no anticipation, OLS estimation of equation (6) yields an estimator of τ(c,r)\tau(c,r) that satisfies Assumption 3, see Propositions 5 and 6 in Sun and Abraham (2021).

The estimation of $w(c,r)$ depends on the parameter of interest. Define $\mathcal{A}\subset\{(c,r):c\neq\infty,r\neq-1\}$ as the subset of sub-treatment groups ($(c,r)$ combinations) over which the ATT in percentage change is calculated. Following Callaway and Sant’Anna (2021), I define $\mathcal{A}=\{(c,r):c\neq\infty,r=r^{\ast}\}$ for the ATT at event time $r^{\ast}$, $\mathcal{A}=\{(c,r):c=c^{\ast},r\geqslant 0\}$ for the ATT of cohort $c^{\ast}$, $\mathcal{A}=\{(c,r):c+r=t,r\geqslant 0\}$ for the ATT at calendar time $t$, and $\mathcal{A}=\{(c,r):c\neq\infty,r\geqslant 0\}$ for the ATT of all treated units. Let $w^{\mathcal{A}}(c,r)$ be the share of sub-treatment group $(c,r)$ in $\mathcal{A}$, with $w^{\mathcal{A}}(c,r)=0$ if $(c,r)\notin\mathcal{A}$. Then $w^{\mathcal{A}}(c^{\ast},r^{\ast})=\Pr\left[(c,r)=(c^{\ast},r^{\ast})|(c,r)\in\mathcal{A}\right]$.

Treating each $(c,r)$ combination in $\mathcal{A}$ as a sub-treatment group, I estimate $w^{\mathcal{A}}(c,r)$ as in Example 1:

$$\hat{w}^{\mathcal{A}}(c,r)=\frac{\sum_{i}\sum_{t}d_{it}(c,r)}{\sum_{(c^{\ast},r^{\ast})\in\mathcal{A}}\sum_{i}\sum_{t}d_{it}(c^{\ast},r^{\ast})},\tag{7}$$

where the numerator is the sample size of cohort cc at event time rr, and the denominator is the sample size of all units in set 𝒜\mathcal{A}. Equivalently, w^𝒜(c,r)\hat{w}^{\mathcal{A}}(c,r) is the OLS estimator in the regression dit(c,r)=w𝒜(c,r)sit𝒜+vit,d_{it}(c,r)=w^{\mathcal{A}}(c,r)s_{it}^{\mathcal{A}}+v_{it}, where sit𝒜=(c,r)𝒜dit(c,r)s_{it}^{\mathcal{A}}=\sum_{(c,r)\in\mathcal{\mathcal{A}}}d_{it}(c,r) is the indicator that individual ii in period tt belongs to set 𝒜\mathcal{A}. Let w^𝒜\hat{w}^{\mathcal{A}} be the vector of w^𝒜(c,r)\hat{w}^{\mathcal{A}}(c,r) of all (c,r)(c,r) combinations in 𝒜\mathcal{A}. The asymptotic covariance matrix of N(w^𝒜w𝒜)\sqrt{N}\left(\hat{w}^{\mathcal{A}}-w^{\mathcal{A}}\right) is estimated as Σ¯^w𝒜=N/N𝒜(diag(w^𝒜)w^𝒜w^𝒜)\hat{\bar{\Sigma}}_{w}^{\mathcal{A}}=N/N_{\mathcal{A}}\left(\operatorname{diag}(\hat{w}^{\mathcal{A}})-\hat{w}^{\mathcal{A}}\hat{w}^{\mathcal{A}\prime}\right), where N𝒜N_{\mathcal{A}} is the total sample size of all units in 𝒜\mathcal{A}. The estimators w^𝒜\hat{w}^{\mathcal{A}} and Σ¯^w𝒜\hat{\bar{\Sigma}}_{w}^{\mathcal{A}} satisfy Assumption 3 by Lemma 1 and are identical to those in Sun and Abraham (2021).
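The choice of $\mathcal{A}$ only changes which $(c,r)$ cells enter the denominator of (7). A minimal sketch of this bookkeeping follows, assuming the cell sizes $\sum_{i}\sum_{t}d_{it}(c,r)$ have already been tabulated; the function name `weights_for_set`, the predicate interface, and the counts are hypothetical illustrations.

```python
def weights_for_set(cell_counts, in_set):
    """Estimate w^A(c, r) as in eq. (7).

    cell_counts: dict mapping (c, r) -> number of (i, t) observations in that cell.
    in_set: predicate taking (c, r) and returning True if the cell belongs to A.
    Cells outside A are omitted from the result (their weight is zero).
    """
    selected = {cell: n for cell, n in cell_counts.items() if in_set(*cell)}
    n_a = sum(selected.values())  # N_A, the total number of observations in A
    return {cell: n / n_a for cell, n in selected.items()}


# Hypothetical cell sizes for two cohorts.
counts = {(2004, 0): 100, (2004, 1): 100, (2006, 0): 200, (2006, 1): 200, (2006, -2): 200}

w_event0 = weights_for_set(counts, lambda c, r: r == 0)   # ATT at event time r* = 0
w_all = weights_for_set(counts, lambda c, r: r >= 0)      # ATT of all treated units
```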

3.2 Estimation of ρ¯\bar{\rho}

First, consider the estimators $\hat{\bar{\tau}}=\sum_{g=1}^{G}\hat{w}_{g}\hat{\tau}_{g}$ for $\bar{\tau}$ and $\hat{\rho}_{a}=\operatorname{exp}(\hat{\bar{\tau}})-1$ for $\rho_{a}=\operatorname{exp}(\bar{\tau})-1$. Under Assumption 3, $\hat{\bar{\tau}}\xrightarrow{p}\bar{\tau}$ and $\hat{\rho}_{a}\xrightarrow{p}\rho_{a}$ by Slutsky’s theorem. Under treatment effect heterogeneity, $\bar{\tau}\neq\bar{\rho}$ and $\rho_{a}\neq\bar{\rho}$, so both $\hat{\bar{\tau}}$ and $\hat{\rho}_{a}$ are inconsistent estimators for $\bar{\rho}$.

A straightforward estimator for ρ¯\bar{\rho} is:

ρ^b=g=1Gw^gexp(τ^g)1.\hat{\rho}_{b}=\sum_{g=1}^{G}\hat{w}_{g}\operatorname{exp}(\hat{\tau}_{g})-1. (8)

By Slutsky’s theorem, ρ^b\hat{\rho}_{b} is consistent under Assumption 3. However, it exhibits bias in finite samples due to the convexity of the exponential function: when E(τ^g)=τgE(\hat{\tau}_{g})=\tau_{g}, exp(τg)=exp(E(τ^g))E(exp(τ^g))\operatorname{exp}(\tau_{g})=\operatorname{exp}(E(\hat{\tau}_{g}))\neq E(\operatorname{exp}(\hat{\tau}_{g})) unless var(τ^g)=0var(\hat{\tau}_{g})=0. As sample size increases, var(τ^g)0var(\hat{\tau}_{g})\rightarrow 0, and the difference between exp(E(τ^g))\operatorname{exp}(E(\hat{\tau}_{g})) and E(exp(τ^g))E(\operatorname{exp}(\hat{\tau}_{g})) diminishes. The bias is more pronounced with multiple sub-treatment groups, as with more subgroups the size of each group decreases and var(τ^g)var(\hat{\tau}_{g}) increases.

To address the finite sample bias, I adapt the bias correction approach in Kennedy (1981) for heterogeneous treatment effects:

ρ^c=g=1Gw^gexp(τ^g0.5σ^τ,g2)1.\hat{\rho}_{c}=\sum_{g=1}^{G}\hat{w}_{g}\operatorname{exp}(\hat{\tau}_{g}-0.5\hat{\sigma}_{\tau,g}^{2})-1. (9)

The estimator utilizes the mean of lognormal distribution: for x𝒩(μx,σx2)x\sim\mathcal{N}(\mu_{x},\sigma_{x}^{2}), so that exp(x)Lognormal(μx,σx2)\operatorname{exp}(x)\sim Lognormal(\mu_{x},\sigma_{x}^{2}),

E(exp(x))=exp(μx+0.5σx2).E\left(\operatorname{exp}(x)\right)=\operatorname{exp}(\mu_{x}+0.5\sigma_{x}^{2}). (10)

Under Assumptions 3 and 4, g=1Gw^gexp(τ^g0.5στ,g2)\sum_{g=1}^{G}\hat{w}_{g}\operatorname{exp}(\hat{\tau}_{g}-0.5\sigma_{\tau,g}^{2}) is an unbiased estimator for g=1Gwgexp(τg)\sum_{g=1}^{G}w_{g}\operatorname{exp}(\tau_{g}). The estimator ρ^c\hat{\rho}_{c} is obtained by replacing στ,g2\sigma_{\tau,g}^{2} with σ^τ,g2\hat{\sigma}_{\tau,g}^{2}. While ρ^c\hat{\rho}_{c} remains biased as E(exp(σ^τ,g2))exp(E(σ^τ,g2))E\left(\operatorname{exp}(\hat{\sigma}_{\tau,g}^{2})\right)\neq\operatorname{exp}\left(E(\hat{\sigma}_{\tau,g}^{2})\right), the bias diminishes as the sample size increases. Furthermore, Monte Carlo simulations in van Garderen and Shah (2002) for the case of homogeneous effects and those in this paper demonstrate that ρ^c\hat{\rho}_{c} performs well in modest sample sizes.
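To make the differences among the point estimators concrete, the sketch below computes $\hat{\bar{\tau}}$, $\hat{\rho}_{a}$, $\hat{\rho}_{b}$ from (8) and $\hat{\rho}_{c}$ from (9) for given inputs. The function name `point_estimates` and the numerical inputs are hypothetical; only the formulas come from the text.

```python
import math


def point_estimates(w_hat, tau_hat, var_tau_hat):
    """Compute tau_bar_hat, rho_a_hat, rho_b_hat (eq. 8) and rho_c_hat (eq. 9).

    w_hat, tau_hat, var_tau_hat: lists of length G holding the estimated weights,
    log-point effects, and variances of the log-point effects.
    """
    tau_bar = sum(w * t for w, t in zip(w_hat, tau_hat))
    rho_a = math.exp(tau_bar) - 1.0
    rho_b = sum(w * math.exp(t) for w, t in zip(w_hat, tau_hat)) - 1.0
    rho_c = sum(
        w * math.exp(t - 0.5 * v) for w, t, v in zip(w_hat, tau_hat, var_tau_hat)
    ) - 1.0
    return {"tau_bar": tau_bar, "rho_a": rho_a, "rho_b": rho_b, "rho_c": rho_c}


# Hypothetical inputs: two equally weighted groups with very different effects.
est = point_estimates([0.5, 0.5], [0.1, 0.9], [0.01, 0.01])
```

With heterogeneous effects the ordering $\hat{\bar{\tau}}<\hat{\rho}_{a}<\hat{\rho}_{b}$ holds by convexity of the exponential function, and the correction in (9) pulls $\hat{\rho}_{c}$ slightly below $\hat{\rho}_{b}$.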

Under Assumption 4, I propose an exact unbiased estimator of ρ¯\bar{\rho}:

ρ^d=g=1Gw^gexp(τ^g)F10(m2,m2σ^τ,g22)1.\hat{\rho}_{d}=\sum_{g=1}^{G}\hat{w}_{g}\operatorname{exp}(\hat{\tau}_{g})\operatorname{{}_{0}F_{1}}(\frac{m}{2},-\frac{m}{2}\frac{\hat{\sigma}_{\tau,g}^{2}}{2})-1. (11)

where $m$ is the degrees of freedom of the chi-square distribution of $\hat{\sigma}_{\tau,g}^{2}$ (in Example 1, the residual degrees of freedom in regression (3)), and $\operatorname{{}_{0}F_{1}}(a,b)$ is the confluent hypergeometric limit function, equivalently a generalized hypergeometric function with 0 parameters of type 1 and 1 parameter of type 2, defined as

F10(a,b)=n=0bn(a)nn!,\operatorname{{}_{0}F_{1}}(a,b)=\sum_{n=0}^{\infty}\frac{b^{n}}{(a)_{n}n!}, (12)

with (a)n(a)_{n} denoting the rising factorial:

(a)n={1n=0a(a+1)(a+n1)n>0.(a)_{n}=\begin{cases}1&n=0\\ a(a+1)\ldots(a+n-1)&n>0.\end{cases}

The estimator is adapted from the exact unbiased estimator under treatment effect homogeneity by van Garderen and Shah (2002). (Footnote 7: For a detailed discussion of hypergeometric functions, see Abadir (1999). Alternative forms of estimators can be developed using different approaches to estimate $\operatorname{exp}(\hat{\tau}_{g}-0.5\sigma_{g}^{2})$; see Zhang and Gou (2022) for a comprehensive study.)

The estimator ρ^d\hat{\rho}_{d} involves more complex computation due to the confluent hypergeometric function. Unbiasedness of ρ^d\hat{\rho}_{d} requires substantially stronger assumptions than those for ρ^c\hat{\rho}_{c}, including i.i.d. normal errors as stipulated in Lemma 2. Furthermore, the difference between ρ^c\hat{\rho}_{c} and ρ^d\hat{\rho}_{d} becomes negligible as sample size increases, with convergence observed even for sample sizes as small as 20 under homogeneous treatment effects (van Garderen and Shah, 2002). Given the computational simplicity of ρ^c\hat{\rho}_{c} and the limited gains from using ρ^d\hat{\rho}_{d}, especially as sample size increases, ρ^c\hat{\rho}_{c} is generally preferred for estimation of ρ¯\bar{\rho}.
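The series in (12) converges quickly for the small arguments that arise here, so $\hat{\rho}_{d}$ can be evaluated without a special-function library. The sketch below truncates the series numerically; the function names, the tolerance, and the example inputs are assumptions made for illustration, and a library routine for ${}_{0}F_{1}$ could be substituted.

```python
import math


def hyp0f1_series(a, b, tol=1e-16, max_terms=1000):
    """Evaluate 0F1(a, b) = sum_n b^n / ((a)_n n!) by truncating the series in eq. (12)."""
    term, total = 1.0, 1.0  # the n = 0 term equals 1
    for n in range(max_terms):
        # Ratio of term n+1 to term n is b / ((a + n) * (n + 1)).
        term *= b / ((a + n) * (n + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def rho_d(w_hat, tau_hat, var_tau_hat, m):
    """Eq. (11): exact unbiased estimator under Assumption 4, with m degrees of freedom."""
    return sum(
        w * math.exp(t) * hyp0f1_series(m / 2.0, -(m / 2.0) * (v / 2.0))
        for w, t, v in zip(w_hat, tau_hat, var_tau_hat)
    ) - 1.0


def rho_c(w_hat, tau_hat, var_tau_hat):
    """Eq. (9), included here only for comparison."""
    return sum(
        w * math.exp(t - 0.5 * v) for w, t, v in zip(w_hat, tau_hat, var_tau_hat)
    ) - 1.0


# Hypothetical inputs with a small number of residual degrees of freedom.
rd_small_m = rho_d([0.5, 0.5], [0.1, 0.9], [0.01, 0.01], m=20)
rc = rho_c([0.5, 0.5], [0.1, 0.9], [0.01, 0.01])
```

As $m$ grows, ${}_{0}F_{1}(m/2,-m\hat{\sigma}_{\tau,g}^{2}/4)$ approaches $\operatorname{exp}(-0.5\hat{\sigma}_{\tau,g}^{2})$, which is the sense in which $\hat{\rho}_{d}$ and $\hat{\rho}_{c}$ coincide in large samples.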

The properties of these estimators are formalized in the following theorem:

Theorem 1.

(a) Under Assumptions 1-3, $\hat{\rho}_{b}$, $\hat{\rho}_{c}$ and $\hat{\rho}_{d}$ are all consistent estimators for $\bar{\rho}$.

(b) Under Assumptions 1-4, $\hat{\rho}_{d}$ is an unbiased estimator of $\bar{\rho}$.

The proof is in the online appendix.

3.3 Inference of ρ¯\bar{\rho}

I first discuss inference on $\bar{\rho}$ based on the approximations $\bar{\tau}$ or $\rho_{a}=\operatorname{exp}(\bar{\tau})-1$. Assume that $\hat{\bar{\tau}}\sim N(\bar{\tau},\sigma_{\bar{\tau}}^{2})$ approximately, which holds for various heterogeneity-robust estimators (e.g., Callaway and Sant’Anna, 2021; Sun and Abraham, 2021; Gibbons et al., 2019). We can then test $H_{0}:\bar{\tau}=\rho_{0}$ using a z-test based on:

zτ=(τ¯^ρ0)/στ¯.z_{\tau}=\left(\hat{\bar{\tau}}-\rho_{0}\right)/\sigma_{\bar{\tau}}. (13)

The corresponding (1α)(1-\alpha) confidence interval (CI) for τ¯\bar{\tau} is

CIτ=[τ¯^+στ¯zα/2,τ¯^στ¯zα/2],CI_{\tau}=\left[\hat{\bar{\tau}}+\sigma_{\bar{\tau}}z_{\alpha/2},\hat{\bar{\tau}}-\sigma_{\bar{\tau}}z_{\alpha/2}\right], (14)

where zα/2z_{\alpha/2} is the α/2\alpha/2 quantile of the standard normal distribution. In practice, στ¯\sigma_{\bar{\tau}} is replaced by a consistent estimator σ^τ¯\hat{\sigma}_{\bar{\tau}}. Using ρa=exp(τ¯)1\rho_{a}=\operatorname{exp}(\bar{\tau})-1 to approximate ρ¯\bar{\rho}, the z-score for H0:ρa=ρ0H_{0}:\rho_{a}=\rho_{0} is

za=(τ¯^ln(ρ0+1))/στ¯,z_{a}=\left(\hat{\bar{\tau}}-\ln(\rho_{0}+1)\right)/\sigma_{\bar{\tau}}, (15)

as $\rho_{a}=\rho_{0}$ is equivalent to $\bar{\tau}=\ln(\rho_{0}+1)$. The $(1-\alpha)$ CI of $\rho_{a}$ is

CIa\displaystyle CI_{a} =\displaystyle= [exp(τ¯^+στ¯zα/2)1,exp(τ¯^στ¯zα/2)1].\displaystyle\left[\operatorname{exp}\left(\hat{\bar{\tau}}+\sigma_{\bar{\tau}}z_{\alpha/2}\right)-1,\operatorname{exp}\left(\hat{\bar{\tau}}-\sigma_{\bar{\tau}}z_{\alpha/2}\right)-1\right]. (16)

Observe that when ρ0=0\rho_{0}=0, we have ρ0=ln(ρ0+1)\rho_{0}=\ln(\rho_{0}+1) and consequently zτ=zaz_{\tau}=z_{a}. Therefore, the tests for H0:ρ¯=ρ0H_{0}:\bar{\rho}=\rho_{0} based on the approximations τ¯\bar{\tau} and exp(τ¯)1\operatorname{exp}(\bar{\tau})-1 yield identical results for ρ0=0\rho_{0}=0 and similar results for ρ0\rho_{0} close to 0. Given that ρ¯exp(τ¯)1τ¯\bar{\rho}\geqslant\operatorname{exp}(\bar{\tau})-1\geqslant\bar{\tau}, the inference for ρ¯\bar{\rho} based on exp(τ¯)1\operatorname{exp}(\bar{\tau})-1 is expected to outperform that based on τ¯\bar{\tau}. Both methods yield reliable results when τg\tau_{g} is small for all gg and hence τ¯\bar{\tau}, exp(τ¯)1\operatorname{exp}(\bar{\tau})-1 are both close to ρ¯\bar{\rho}, but may exhibit substantial bias when τ¯\bar{\tau} or exp(τ¯)1\operatorname{exp}(\bar{\tau})-1 differ significantly from ρ¯\bar{\rho}.

This paper proposes a novel approach for more accurate inference on $\mu_{0}=\ln\left(\sum_{g=1}^{G}w_{g}\operatorname{exp}(\tau_{g})\right)$, which is equivalent to inference on $\bar{\rho}$, as $\bar{\rho}=\operatorname{exp}(\mu_{0})-1$ is a strictly increasing function of $\mu_{0}$. Note that $\mu_{0}=0$ if and only if $\bar{\rho}=0$. When $G=1$, $\mu_{0}=\tau$ is the homogeneous treatment effect in log points.

Exact inference on μ0\mu_{0} is infeasible due to the unknown distribution of ρ¯\bar{\rho} estimators. Even with known ww, g=1Gwgexp(τ^g)=g=1Gexp(ln(wg)+τ^g)\sum_{g=1}^{G}w_{g}\operatorname{exp}(\hat{\tau}_{g})=\sum_{g=1}^{G}\operatorname{exp}\left(\ln(w_{g})+\hat{\tau}_{g}\right) is the sum of correlated log-normal distributions, which has no known closed-form distribution. The approximation of the sum of log-normal variables is a well-known statistics problem and is of particular interest to researchers in telecommunications, see, e.g., Mehta et al. (2007) for a review.

I propose an approximate inference method for μ0\mu_{0}, using the Fenton-Wilkinson method to approximate the distribution of g=1Gw^gexp(τ^g)\sum_{g=1}^{G}\hat{w}_{g}\operatorname{exp}(\hat{\tau}_{g}), which is approximately the sum of correlated log-normals. The Fenton-Wilkinson method approximates the sum of log-normal variables with a single log-normal variable, by matching the first and second moments (Fenton, 1960; Abu-Dayya and Beaulieu, 1994). While primarily used in telecommunications, it has also found applications in economics (e.g., Kovak et al. 2021; Marone and Sabety 2022).

Let η0,g=ln(wg)+τg,\eta_{0,g}=\ln(w_{g})+\tau_{g}, and η0=(η0,1,,η0,G)\eta_{0}=(\eta_{0,1},\ldots,\eta_{0,G})^{\prime}, then

$$\operatorname{exp}(\eta_{0})=\left(w_{1}\operatorname{exp}(\tau_{1}),\ldots,w_{G}\operatorname{exp}(\tau_{G})\right)^{\prime},\tag{17}$$

$$\mu_{0}=\ln\left[\sum_{g=1}^{G}\operatorname{exp}\left(\eta_{0,g}\right)\right]=\ln\left[\mathbf{1}_{G}^{\prime}\operatorname{exp}(\eta_{0})\right],\tag{18}$$

where 𝟏G\mathbf{1}_{G} is a G×1G\times 1 vector of ones. Define

ηg=ln(w^g)+τ^g12σw,g2wg212στ,g2\eta_{g}^{\ast}=\ln(\hat{w}_{g})+\hat{\tau}_{g}-\frac{1}{2}\frac{\sigma_{w,g}^{2}}{w_{g}^{2}}-\frac{1}{2}\sigma_{\tau,g}^{2} (19)

as the estimate for η0,g\eta_{0,g}, with η=(η1,,ηG)\eta^{\ast}=(\eta_{1}^{\ast},\ldots,\eta_{G}^{\ast})^{\prime} estimating η0\eta_{0}. The bias correcting terms involving σw,g2\sigma_{w,g}^{2} and στ,g2\sigma_{\tau,g}^{2} are motivated by equation (10), as we shall see in the proof of Theorem 2. The estimator for μ0\mu_{0} is:

μ=ln(g=1Gexp(ηg))=ln[𝟏Gexp(η)].\mu^{\ast}=\ln\left(\sum_{g=1}^{G}\operatorname{exp}(\eta_{g}^{\ast})\right)=\ln\left[\mathbf{1}_{G}^{\prime}\operatorname{exp}(\eta^{\ast})\right]. (20)

Recall that diag(α)\operatorname{diag}(\alpha) is the diagonal matrix with elements of vector α\alpha on the main diagonal.

Theorem 2.

(a) Under Assumptions 1-3,

zμ=μ+12σμ2μ0σμ𝒩(0,1)z_{\mu}=\frac{\mu^{\ast}+\frac{1}{2}\sigma_{\mu}^{\ast 2}-\mu_{0}}{\sigma_{\mu}^{\ast}}\sim\mathcal{N}(0,1) (21)

approximately as NN\rightarrow\infty, where

σμ2=ln(exp(η0)exp(Ση)exp(η0)exp(η0)𝟏G𝟏Gexp(η0))\sigma_{\mu}^{\ast 2}=\ln\left(\frac{\operatorname{exp}(\eta_{0})^{\prime}\operatorname{exp}(\Sigma_{\eta})\operatorname{exp}(\eta_{0})}{\operatorname{exp}(\eta_{0})^{\prime}\mathbf{1}_{G}\mathbf{1}_{G}^{\prime}\operatorname{exp}(\eta_{0})}\right) (22)

is the approximate variance of μ\mu^{\ast}, with Ση=diag(w)1Σwdiag(w)1+Στ\Sigma_{\eta}=\operatorname{diag}(w)^{-1}\Sigma_{w}\operatorname{diag}(w)^{-1}+\Sigma_{\tau} as the approximate variance of η\eta^{\ast}.

(b) The limit limNNσμ2=σ¯μ2\lim_{N\rightarrow\infty}N\sigma_{\mu}^{\ast 2}=\bar{\sigma}_{\mu}^{2} exists, and 0<σ¯μ2<0<\bar{\sigma}_{\mu}^{2}<\infty.

Remark 4.

In the case of homogeneous treatment effects, $w=1$ and $\Sigma_{w}=0$, so that $\mu^{\ast}=\hat{\tau}-\frac{1}{2}\sigma_{\tau}^{2}$, $\sigma_{\mu}^{\ast 2}=\sigma_{\tau}^{2}$, and hence $z_{\mu}=\left(\hat{\tau}-\mu_{0}\right)/\sigma_{\tau}=z_{\tau}$. Observe further that $\bar{\tau}=\tau$ under homogeneity and $\mu_{0}=\ln(\rho_{0}+1)$, hence $z_{\mu}=z_{a}=z_{\tau}$.

Proof of the theorem is in the online appendix. Here is a sketch of the intuition. By Taylor approximation, ln(w^g)ln(wg)+(w^gwg)/wg\ln(\hat{w}_{g})\approx\ln(w_{g})+\left(\hat{w}_{g}-w_{g}\right)/w_{g}, which follows a normal distribution. Consequently ηg\eta_{g}^{\ast} is approximately normal with variance σw,g2/wg2+στ,g2\sigma_{w,g}^{2}/w_{g}^{2}+\sigma_{\tau,g}^{2}. In light of the mean of log-normal variables in (10), E(exp(ηg))=exp(η0,g)E(\operatorname{exp}(\eta_{g}^{\ast}))=\operatorname{exp}(\eta_{0,g}) and hence by (18) and (20), E(exp(μ))=exp(μ0)E\left(\operatorname{exp}(\mu^{\ast})\right)=\operatorname{exp}(\mu_{0}). In addition, exp(μ)=g=1Gexp(ηg)\operatorname{exp}(\mu^{\ast})=\sum_{g=1}^{G}\operatorname{exp}(\eta_{g}^{\ast}) represents the sum of log-normal variables, which can be approximated by a log-normal distribution using the Fenton-Wilkinson method. Thus, μ\mu^{\ast} is approximately normal with mean E(μ)E(\mu^{\ast}) and variance σμ2\sigma_{\mu}^{\ast 2}. Applying (10) again, we obtain exp(μ0)=E(exp(μ))=exp[E(μ)+0.5σμ2]\operatorname{exp}(\mu_{0})=E(\operatorname{exp}(\mu^{\ast}))=\operatorname{exp}\left[E(\mu^{\ast})+0.5\sigma_{\mu}^{\ast 2}\right], which implies μ0=E(μ)+0.5σμ2\mu_{0}=E(\mu^{\ast})+0.5\sigma_{\mu}^{\ast 2}. We thus have μ+0.5σμ2𝒩(μ0,σμ2)\mu^{\ast}+0.5\sigma_{\mu}^{\ast 2}\sim\mathcal{N}(\mu_{0},\sigma_{\mu}^{\ast 2}) approximately.
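The moment-matching step behind (22) can be written out explicitly. This is a sketch of the intuition rather than the appendix proof: it treats $\eta^{\ast}$ as exactly normal with mean $\eta_{0}-\frac{1}{2}\operatorname{diag}(\Sigma_{\eta})$ and covariance $\Sigma_{\eta}$, and it reads $\operatorname{exp}(\Sigma_{\eta})$ in (22) as the element-wise exponential. The symbols $S$, $X$, $\mu_{X}$ and $\sigma_{X}^{2}$ are introduced here for illustration only.

```latex
\begin{align*}
S &= \sum_{g=1}^{G}\exp(\eta_{g}^{\ast}), \qquad
E(S) = \sum_{g=1}^{G}\exp(\eta_{0,g}) = \exp(\mu_{0}),\\
E(S^{2}) &= \sum_{g=1}^{G}\sum_{h=1}^{G} E\left[\exp(\eta_{g}^{\ast}+\eta_{h}^{\ast})\right]
          = \sum_{g=1}^{G}\sum_{h=1}^{G}\exp\left(\eta_{0,g}+\eta_{0,h}+\Sigma_{\eta,gh}\right).
\end{align*}
% Match the first two moments of S to those of exp(X), X ~ N(mu_X, sigma_X^2):
\begin{align*}
E[\exp(X)] &= \exp\left(\mu_{X}+\tfrac{1}{2}\sigma_{X}^{2}\right), \qquad
E[\exp(2X)] = \exp\left(2\mu_{X}+2\sigma_{X}^{2}\right),\\
\sigma_{X}^{2} &= \ln\frac{E(S^{2})}{E(S)^{2}}
 = \ln\frac{\exp(\eta_{0})^{\prime}\exp(\Sigma_{\eta})\exp(\eta_{0})}
           {\left[\mathbf{1}_{G}^{\prime}\exp(\eta_{0})\right]^{2}},\qquad
\mu_{X} = \mu_{0}-\tfrac{1}{2}\sigma_{X}^{2}.
\end{align*}
```

Under these assumptions $\sigma_{X}^{2}$ coincides with $\sigma_{\mu}^{\ast 2}$ in (22), and $\mu_{X}$ plays the role of $E(\mu^{\ast})$, which gives $\mu_{0}=E(\mu^{\ast})+0.5\sigma_{\mu}^{\ast 2}$.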

Theorem 2 provides the basis for hypothesis testing and confidence interval construction for μ0\mu_{0} and, by extension, ρ¯\bar{\rho}. For testing H0:μ0=aH_{0}:\mu_{0}=a, we can use the z-score zμz_{\mu} defined in (21). The corresponding 1α1-\alpha confidence interval for μ0\mu_{0} is:

[μ+0.5σμ2+zα/2σμ,μ+0.5σμ2zα/2σμ],[\mu^{\ast}+0.5\sigma_{\mu}^{\ast 2}+z_{\alpha/2}\sigma_{\mu}^{\ast},\mu^{\ast}+0.5\sigma_{\mu}^{\ast 2}-z_{\alpha/2}\sigma_{\mu}^{\ast}],

where zα/2z_{\alpha/2} is the α/2\alpha/2 quantile of the standard normal distribution. The 1α1-\alpha confidence interval of ρ¯\bar{\rho} is

[exp(μ+0.5σμ2+zα/2σμ)1,exp(μ+0.5σμ2zα/2σμ)1].\left[\operatorname{exp}(\mu^{\ast}+0.5\sigma_{\mu}^{\ast 2}+z_{\alpha/2}\sigma_{\mu}^{\ast})-1,\operatorname{exp}(\mu^{\ast}+0.5\sigma_{\mu}^{\ast 2}-z_{\alpha/2}\sigma_{\mu}^{\ast})-1\right]. (23)

Let μ^\hat{\mu} and σ^μ2\hat{\sigma}_{\mu}^{2} denote feasible estimates of μ\mu^{\ast} and σμ2\sigma_{\mu}^{\ast 2}, obtained by replacing wg,τg,σw,g2w_{g},\tau_{g},\sigma_{w,g}^{2} and στ,g2\sigma_{\tau,g}^{2} with their respective estimators. Lemma 3 below ensures that the asymptotic normality result holds when using these estimates in practice.

Lemma 3.

Suppose Assumptions 1-3 hold. Then

z^μ=μ^+0.5σ^μ2μ0σ^μ𝒩(0,1)\hat{z}_{\mu}=\frac{\hat{\mu}+0.5\hat{\sigma}_{\mu}^{2}-\mu_{0}}{\hat{\sigma}_{\mu}}\sim\mathcal{N}(0,1)

approximately as NN\to\infty.
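The feasible procedure can be sketched as follows: plug estimates into (19), (20) and (22), then transform the interval for $\mu_{0}$ as in (23). This is an illustrative sketch, not the paper's implementation. It assumes $\operatorname{exp}(\Sigma_{\eta})$ in (22) is the element-wise exponential and that $\eta_{0}$ in (22) is replaced by $\ln(\hat{w}_{g})+\hat{\tau}_{g}$; the function name `fw_interval` and the inputs are hypothetical.

```python
import math
from statistics import NormalDist


def fw_interval(w_hat, tau_hat, sigma_w, sigma_tau, alpha=0.05):
    """Approximate (1 - alpha) CI for rho_bar using plug-in versions of (19)-(23).

    sigma_w, sigma_tau: G x G nested lists with the estimated covariance matrices
    of w_hat and tau_hat (already divided by N). All weights must be positive.
    """
    G = len(w_hat)
    # Sigma_eta = diag(w)^{-1} Sigma_w diag(w)^{-1} + Sigma_tau
    sigma_eta = [
        [sigma_w[g][h] / (w_hat[g] * w_hat[h]) + sigma_tau[g][h] for h in range(G)]
        for g in range(G)
    ]
    # Eq. (19): the two bias-correction terms add up to 0.5 * Sigma_eta[g][g].
    eta = [math.log(w_hat[g]) + tau_hat[g] - 0.5 * sigma_eta[g][g] for g in range(G)]
    mu_hat = math.log(sum(math.exp(e) for e in eta))  # eq. (20)
    # Plug-in exp(eta_0) from eq. (17): elements w_g * exp(tau_g).
    e0 = [w_hat[g] * math.exp(tau_hat[g]) for g in range(G)]
    num = sum(
        e0[g] * math.exp(sigma_eta[g][h]) * e0[h] for g in range(G) for h in range(G)
    )
    # Eq. (22); max() guards against tiny negative values from rounding.
    var_mu = max(0.0, math.log(num / sum(e0) ** 2))
    center = mu_hat + 0.5 * var_mu
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(var_mu)
    # Eq. (23): transform the interval for mu_0 into one for rho_bar.
    return math.exp(center - half_width) - 1.0, math.exp(center + half_width) - 1.0


# Hypothetical two-group example.
lower, upper = fw_interval(
    [0.3, 0.7],
    [0.2, 0.6],
    [[0.0021, -0.0021], [-0.0021, 0.0021]],
    [[0.004, 0.0], [0.0, 0.002]],
)
```

With a single group ($G=1$, $\hat{w}=1$, $\hat{\Sigma}_{w}=0$) the function returns the interval in (16), in line with Remark 4.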

4 Empirical Applications

4.1 Education Reform and Earnings

The first application uses a semi-log regression model as discussed in Example 1. I replicate and extend the analysis presented in Table 2, Column 1 of Meghir and Palme (2005), which examines the impact of educational reform in Sweden on earnings. I utilize their original unbalanced panel dataset, comprising 19,230 individuals observed from 1985 to 1996, with a total sample size of 209,683. I apply the procedure for estimating τ\tau and ww in Example 1.

I consider two specifications, both of which are semi-log regression models. The model assuming homogeneous treatment effects is:

ln(earnit)=α+τ×reformi+xitβ+ϵit,\ln(earn_{it})=\alpha+\tau\times reform_{i}+x_{it}^{\prime}\beta+\epsilon_{it}, (24)

and the model allowing for heterogeneous treatment effects across genders is

ln(earnit)=α+τ1reformifemalei+τ2reformimalei+xitβ+ϵit,\ln(earn_{it})=\alpha+\tau_{1}reform_{i}*female_{i}+\tau_{2}reform_{i}*male_{i}+x_{it}^{\prime}\beta+\epsilon_{it}, (25)

where earnitearn_{it} is the earnings of individual ii in year tt, reformireform_{i} is an indicator that individual ii was affected by the reform during childhood, xitx_{it} is a vector of controls including year and municipal fixed effects, a female indicator, as well as the following variables and their interactions with the female indicator: county fixed effects, father’s education level dummies, a cohort dummy, and 44 measures of individual abilities. 888The specification in Meghir and Palme (2005) differs slightly from model (24). While I incorporate municipal fixed effects directly, Meghir and Palme (2005) employ a demeaning approach, subtracting municipal means from all variables except for year dummies. If year dummies were also demeaned, the two approaches are mathematically equivalent for the full sample analysis but still different for subsample analyses, unless demeaning is performed within each specific subsample. Nevertheless, estimates of τ\tau from (24) closely align with those in Meghir and Palme (2005).

I estimate $\tau$ in (24) and (25) using OLS on five samples: the full sample, individuals with low-educated fathers ($N=173{,}435$), those with low-educated fathers and low personal ability ($N=92{,}473$), those with low-educated fathers and high personal ability ($N=80{,}962$), and individuals with high-educated fathers ($N=36{,}248$). Standard errors are clustered at the municipality level. I estimate $w$ and the variance $\Sigma_{w}$ as in Example 1. For regression (24), $\hat{w}=1$ and $\hat{\Sigma}_{w}=0$. For regression (25), $w_{1}$ and $w_{2}$ are estimated by the shares of females and males among all treated units in each sample. Using these estimates, I compute point estimates and construct confidence intervals for $\bar{\rho}$ as detailed in Sections 3.2 and 3.3.

Table 1 presents the results, with the top and bottom panels corresponding to models (24) and (25) respectively. Columns (1)-(4) report point estimates of ρ¯\bar{\rho}: τ¯^=g=1Gw^gτ^g\hat{\bar{\tau}}=\sum_{g=1}^{G}\hat{w}_{g}\hat{\tau}_{g}, ρ^a=exp(τ¯^)1\hat{\rho}_{a}=\operatorname{exp}(\hat{\bar{\tau}})-1, ρ^b\hat{\rho}_{b} from (8) and ρ^c\hat{\rho}_{c} from (9). The estimate ρ^d\hat{\rho}_{d} is suppressed as it is virtually identical to ρ^c\hat{\rho}_{c} when rounded to three decimal places. Columns (5)-(7) provide 95%95\% confidence intervals (CI) for τ¯\bar{\tau}, ρa\rho_{a} and ρ¯\bar{\rho}, defined in (14), (16), and (23) respectively. In the case of homogeneous treatment effects, ρ^a=ρ^b\hat{\rho}_{a}=\hat{\rho}_{b} and za=zμz_{a}=z_{\mu}, hence ρ^a\hat{\rho}_{a} and CI for ρa\rho_{a} are omitted. To facilitate interpretation, all estimates and CIs are scaled by 100.

The results for models (24) and (25) exhibit similar patterns. In what follows, I focus the discussion on the results from model (25). For the first four samples, $100(\hat{\tau}_{1},\hat{\tau}_{2})$ are $(1.86,0.96)$, $(4.42,2.39)$, $(3.65,1.48)$, and $(6.09,2.60)$ respectively. The small magnitudes of $\hat{\tau}_{g}$ result in similar values across $\bar{\rho}$ estimates and confidence intervals. For example, for the whole sample, $\hat{\bar{\tau}}$ is $1.422$ ($95\%$ CI: $[-0.318,3.163]$), which is similar to $\hat{\rho}_{c}=1.427$ ($95\%$ CI: $[-0.321,3.213]$). For the fifth sample (individuals with high-educated fathers), $100(\hat{\tau}_{1},\hat{\tau}_{2})=(-10.30,-2.21)$, indicating larger magnitudes and between-group heterogeneity of $\tau_{g}$. We can thus observe a larger difference in the estimators and CIs: $\hat{\bar{\tau}}=-6.37\%$ ($95\%$ CI: $[-10.14\%,-2.61\%]$) and $\hat{\rho}_{c}=-6.13\%$ ($95\%$ CI: $[-9.57\%,-2.52\%]$).

4.2 Minimum Wage Policies and Teen Employment

The second application illustrates the applicability of my methodology to staggered difference-in-differences designs, as outlined in Example 2. It expands the analysis in panel B of Table 3 in Callaway and Sant’Anna (2021), examining the impact of minimum wage policies on teen employment.

This study utilizes a panel dataset covering 2,296 counties from 2001 to 2007. [Footnote 9: The sample size slightly exceeds that of Callaway and Sant’Anna (2021), which includes 2,284 counties. This discrepancy likely arises from subtle differences in the definition of the dependent variable and my approach to merging the datasets. Nevertheless, my estimates closely align with theirs, even when they employ the doubly robust estimator.] The sample comprises four groups based on minimum wage policy changes: 102 counties in states that increased minimum wages in 2004 (cohort 2004), 225 counties in cohort 2006, 590 counties in cohort 2007, and 1,379 counties with no minimum wage increases during the study period (never-treated group).

I use model (6) to estimate $\tau$. The outcome variable $\ln(y_{it})$ is the natural logarithm of teen employment in county $i$ in year $t$, obtained from Quarterly Workforce Indicators as the employment of individuals aged 14 to 18 in all private sectors at the end of the first quarter of each year. Treatment indicators, $\mathbf{1}(c(i)=c,t-c(i)=r)$, denote that county $i$ belongs to cohort $c$ and year $t$ is the $r$-th year post-treatment. The control variables $x_{it}$ include interaction terms between year dummies and the following county-level characteristics in the year 2000: population, percent of white residents, poverty rate, and log median income. These variables are sourced from the County and City Data Book 2000. For a comprehensive description of the minimum wage changes and the datasets, please refer to Dube et al. (2016) and Callaway and Sant’Anna (2021). The weights $w^{\mathcal{A}}$ are estimated using equation (7) for various $\mathcal{A}$, as discussed in Example 2. Utilizing the estimators of $\tau$ and $w^{\mathcal{A}}$, I implement my estimation and inference methods for $\bar{\rho}$.

Table 2 presents the findings, with columns defined as in Table 1 and $\hat{\rho}_{d}$ omitted as it is indistinguishable from $\hat{\rho}_{c}$ when rounded to three digits. From top to bottom are estimates and CIs for the ATT in percentage points for all treated units, each cohort, each event time, and each calendar year.

My results reveal small but non-negligible differences between $\hat{\rho}_{c}$ and alternative estimators such as $\hat{\bar{\tau}}$, $\hat{\rho}_{a}$ and $\hat{\rho}_{b}$, and between the CIs of $\bar{\rho}$ and those of $\bar{\tau}$ and $\rho_{a}$. For example, the ATT for event time $2$ is estimated at $-0.0949$ log points with a 95% CI of $[-0.1203,-0.0695]$, and $-9.06$ percentage points with a 95% CI of $[-11.33\%,-6.72\%]$. The discrepancy between $\hat{\bar{\tau}}$ and $\hat{\rho}_{c}$ is approximately 0.43%, while the difference between the lower bounds of the confidence intervals for $\bar{\tau}$ and $\bar{\rho}$ is approximately 0.7%. While these differences are modest in absolute terms, they are sufficiently large to warrant consideration in the interpretation of results.
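The event-time-2 comparison can be reproduced from the Table 2 entries with a few lines of arithmetic. The sketch below is illustrative only: it takes the reported numbers as inputs (variable names are hypothetical), exponentiates the log-point estimate and its interval, and computes the two gaps discussed above.

```python
import math

# Table 2, event time 2, converted from the scaled-by-100 entries.
tau_bar = -0.09491
ci_tau = (-0.12029, -0.06953)        # 95% CI for tau-bar
rho_c = -0.09062
ci_rho = (-0.11334, -0.06717)        # 95% CI for rho-bar

# Exponential transformation of the log-point estimate and of its CI bounds.
rho_a = math.exp(tau_bar) - 1.0
ci_rho_a = tuple(math.exp(bound) - 1.0 for bound in ci_tau)

# Gaps between the log-point and percentage-point measures.
gap_point = abs(tau_bar - rho_c)
gap_lower = abs(ci_tau[0] - ci_rho[0])

print(f"100 * (exp(tau_bar) - 1) = {100 * rho_a:.3f}")
print("transformed CI bounds    =", [round(100 * b, 3) for b in ci_rho_a])
print(f"gap in point estimates   = {100 * gap_point:.3f}")
print(f"gap in lower CI bounds   = {100 * gap_lower:.3f}")
```

Event time 2 is observed only for cohort 2004 in this sample window, which is presumably why the $\hat{\rho}_{a}$ and $\hat{\rho}_{b}$ entries, and the intervals for $\rho_{a}$ and $\bar{\rho}$, coincide in that row of Table 2.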

5 Monte Carlo Experiment

This section presents a Monte Carlo experiment for the semi-log regression model in Example 1 to examine the finite sample properties of various estimators and inference methods.

I consider a setting with $G=4$ sub-treatment groups. Treatment probability is $p_{S}=0.8$, and each sub-treatment group has an equal weight $w_{g}=0.25$ in the whole treatment population. Consequently, the control group and each sub-treatment group each comprise $20\%$ of the total population. Observations are randomly assigned to either the control group or a sub-treatment group according to these probabilities. Sample size $N$ is selected from {20, 50, 100, 200, 500, 1000, 2000, 5000, $10^{4}$, $10^{5}$, $10^{6}$}.

I generate samples using a simplified version of model (3): $\ln(y_{i})=1+x_{i}+\sum_{g=1}^{4}d_{i}^{(g)}\tau_{g}+\epsilon_{i}$, where the covariate $x_{i}$ and the error term $\epsilon_{i}$ are each i.i.d. $\mathcal{N}(0,1)$. Results using skew normal errors are analogous and presented in the online appendix. I examine two scenarios: large effects with $100(\rho_{1},\rho_{2},\rho_{3},\rho_{4})=(-16,-8,8,16)$ and small effects with $100(\rho_{1},\rho_{2},\rho_{3},\rho_{4})=(-8,-4,4,8)$. For both cases, the true value of $\bar{\rho}$ is $0\%$.
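A minimal sketch of this design, under stated assumptions, is given below. It uses $\tau_{g}=\ln(1+\rho_{g})$, which follows from $\rho=\exp(\tau)-1$, to compute the true values of $\bar{\tau}$, $\rho_{a}$ and $\bar{\rho}$, and it draws one sample from the data-generating process with five equally likely cells (control plus four sub-treatment groups). The function names, the constants, and the seed are hypothetical; this is not the simulation code used for Tables 3 and 4.

```python
import math
import random

WEIGHTS = [0.25, 0.25, 0.25, 0.25]        # w_g for the four sub-treatment groups
RHO_LARGE = [-0.16, -0.08, 0.08, 0.16]    # large-effect scenario
RHO_SMALL = [-0.08, -0.04, 0.04, 0.08]    # small-effect scenario


def true_values(rho, weights):
    """Return (tau_bar, rho_a, rho_bar) implied by group-level percentage effects."""
    tau = [math.log(1.0 + r) for r in rho]            # tau_g = ln(1 + rho_g)
    tau_bar = sum(w * t for w, t in zip(weights, tau))
    rho_a = math.exp(tau_bar) - 1.0
    rho_bar = sum(w * r for w, r in zip(weights, rho))
    return tau_bar, rho_a, rho_bar


def draw_sample(n, rho, rng):
    """Draw n tuples (group, x, ln_y); group 0 is control, groups 1..4 are treated."""
    tau = [0.0] + [math.log(1.0 + r) for r in rho]
    rows = []
    for _ in range(n):
        g = rng.randrange(5)              # five cells, each with probability 20%
        x = rng.gauss(0.0, 1.0)           # covariate ~ N(0, 1)
        eps = rng.gauss(0.0, 1.0)         # error term ~ N(0, 1)
        rows.append((g, x, 1.0 + x + tau[g] + eps))
    return rows


for label, rho in (("large", RHO_LARGE), ("small", RHO_SMALL)):
    tau_bar, rho_a, rho_bar = true_values(rho, WEIGHTS)
    print(label, round(100 * tau_bar, 3), round(100 * rho_a, 3), round(100 * rho_bar, 3))

sample = draw_sample(1000, RHO_LARGE, random.Random(0))
```

The true values computed this way can be compared with the "True Values" rows of Tables 3 and 4: $\bar{\rho}$ is zero in both scenarios, while $\bar{\tau}$ and $\rho_{a}$ are slightly negative because the logarithm is concave.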

Tables 3 and 4 present results for large and small treatment effects respectively, with values scaled by 100. Each table reports the mean and standard errors (in parentheses) of estimators for $\bar{\rho}$ across 100,000 repetitions. True values of $\bar{\tau}$, $\rho_{a}=\exp(\bar{\tau})-1$ and $\bar{\rho}$ are provided at the top of each table. The final two columns show empirical rejection rates of z-tests using $z_{\tau}$ for $H_{0}:\bar{\tau}=0$ and using $z_{\mu}$ for $H_{0}:\bar{\rho}=0$ at the 5% level. The empirical rejection rates are equivalent to one minus the coverage rate for 0 of the confidence intervals in equation (14) and equation (23) respectively. The z-test using $z_{a}$ is omitted, as $z_{a}=z_{\tau}$ when $\rho_{0}=0$.

The results show that the proposed estimator $\hat{\rho}_{c}$ and the z-test using $z_{\mu}$ perform well even for modest sample sizes. For both large and small treatment effects, when $N\geqslant 50$, $\hat{\rho}_{c}$ has a bias smaller than 0.1%. The empirical rejection rate of the z-test based on $z_{\mu}$ falls between 4.9% and 5.2% when $N\geqslant 200$. For $50\leqslant N<200$, the empirical rejection rate is slightly larger but remains below 5.7%.
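To judge how close these rejection rates are to the nominal level, it helps to have a sense of simulation noise. The sketch below assumes independent replications and a true rejection probability of exactly 5%, and computes the binomial Monte Carlo standard error of an empirical rejection rate over 100,000 replications. The band it produces is a rough guide under those assumptions, not part of the reported results.

```python
import math

reps = 100_000       # number of Monte Carlo replications
nominal = 0.05       # nominal size of the z-test

# Binomial standard error of an empirical rejection rate under the nominal size.
mc_se = math.sqrt(nominal * (1.0 - nominal) / reps)

# Approximate 95% band for the empirical rate if the test has exact size.
band = (nominal - 1.96 * mc_se, nominal + 1.96 * mc_se)

print(f"Monte Carlo s.e. (in %) = {100 * mc_se:.3f}")
print(f"approx. band (in %)     = [{100 * band[0]:.3f}, {100 * band[1]:.3f}]")
```

Empirical rates within roughly 0.14 percentage points of 5% are therefore hard to distinguish from exact size at this number of replications.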

For small treatment effects (Table 4), $\hat{\bar{\tau}}$ and $\hat{\rho}_{a}$ provide a reasonable approximation for $\bar{\rho}$, as the true values of $\bar{\tau}$ and $\rho_{a}$ are around $-0.2\%$, close to $\bar{\rho}=0$. The approximation bias converges to $-0.2\%$ as the sample size increases. Tests of $\bar{\rho}=0$ based on $z_{\tau}$ maintain appropriate rejection rates for $N\leqslant 5000$ but can be significantly oversized for large samples, e.g., 12.5% when $N=10^{6}$. For large treatment effects (Table 3), the bias of $\hat{\bar{\tau}}$ and $\hat{\rho}_{a}$ is more pronounced and converges to $-0.8\%$ in this particular case. Z-tests based on $z_{\tau}$ have empirical rejection rates that rise far above the nominal rate of $5\%$ as $N$ grows, e.g., $17.5\%$ when $N=10^{5}$ and $89.8\%$ when $N=10^{6}$.
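The over-rejection of the $z_{\tau}$-based test in large samples follows from a simple normal approximation: if $\hat{\bar{\tau}}$ is roughly normal around a nonzero $\bar{\tau}$, the z-statistic is centered at $\bar{\tau}$ divided by its standard error rather than at zero. The sketch below evaluates the implied rejection probability, using the simulated means and standard errors of $\hat{\bar{\tau}}$ from Tables 3 and 4 as inputs. Treating the Monte Carlo standard deviation as the standard error used in the test is an assumption, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_rejection_rate(center, se, crit=1.96):
    """P(|Z + center/se| > crit) for Z ~ N(0, 1), a two-sided z-test of zero."""
    shift = center / se
    return normal_cdf(-crit - shift) + 1.0 - normal_cdf(crit - shift)


# (mean, standard error) of tau-bar-hat, both scaled by 100, from Tables 3 and 4.
cases = {
    "large effects, N=1e5": (-0.805, 0.793),
    "large effects, N=1e6": (-0.808, 0.250),
    "small effects, N=1e6": (-0.200, 0.249),
}

for label, (center, se) in cases.items():
    print(label, round(100 * approx_rejection_rate(center, se), 1))
```

Under these assumptions the approximation lands within about one percentage point of the empirical rejection rates in the $z_{\tau}$ column. The standard error shrinks at rate $1/\sqrt{N}$ while $\bar{\tau}$ stays fixed away from zero, so a test of $\bar{\tau}=0$ used as a test of $\bar{\rho}=0$ rejects with probability approaching one.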

The gain of $\hat{\rho}_{d}$ is overall modest. In smaller samples, $\hat{\rho}_{d}$ demonstrates slightly smaller root mean square deviations. However, the estimators $\hat{\rho}_{c}$ and $\hat{\rho}_{d}$ yield similar results when $N\geqslant 200$. While $\hat{\rho}_{b}$ is consistent and approaches $\hat{\rho}_{c}$ as the sample size grows, it yields much larger bias and standard errors when $N\leqslant 10^{4}$. This highlights the importance of correcting small sample bias using $\hat{\rho}_{c}$.

These findings underscore the importance of choosing appropriate estimators and inference methods, particularly when dealing with large treatment effects or very large sample sizes. The proposed estimator ρ^c\hat{\rho}_{c} and z-test using zμz_{\mu} demonstrate robust performance across various scenarios, while traditional approaches may lead to biased estimates or inflated rejection rates under certain conditions.

6 Conclusion

This paper highlights the importance of correctly estimating and interpreting ATEs in percentage points when treatment effects are heterogeneous across groups. The discrepancies between ATEs in log points and percentage points can be substantial, especially when treatment effects are large or vary significantly across subgroups. Failing to account for these differences may lead to misinterpretation of results and potentially misguided policy recommendations.

My proposed methods provide researchers with tools to obtain more accurate estimates of, and conduct valid inference on, ATEs in percentage points. The methods can be applied to a variety of settings, such as semi-log regression models. They are particularly relevant for research designs such as staggered difference-in-differences models, where treatment effect heterogeneity is common. By applying the methods to empirical studies on education reform and minimum wage policies, I demonstrate how accounting for heterogeneity can affect the interpretation of the ATE in percentage points in practice. My methods have so far focused on group-specific treatment effect heterogeneity. Future research could explore the case of within-group heterogeneity. As empirical studies continue to grapple with complex treatment effect patterns, tools like those presented in this paper will become increasingly valuable for accurate estimation and inference.

References

  • Abadir (1999) Abadir, Karim M., “An Introduction to Hypergeometric Functions for Economists,” Econometric Reviews, January 1999, 18 (3), 287–330.
  • Abu-Dayya and Beaulieu (1994) Abu-Dayya, A.A. and N.C. Beaulieu, “Outage Probabilities in the Presence of Correlated Lognormal Interferers,” IEEE Transactions on Vehicular Technology, February 1994, 43 (1), 164–173.
  • Angrist (1998) Angrist, Joshua D., “Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants,” Econometrica, 1998, 66 (2), 249–288.
  • Borusyak et al. (2024) Borusyak, Kirill, Xavier Jaravel, and Jann Spiess, “Revisiting Event-Study Designs: Robust and Efficient Estimation,” The Review of Economic Studies, February 2024, p. rdae007.
  • Callaway and Sant’Anna (2021) Callaway, Brantly and Pedro H. C. Sant’Anna, “Difference-in-Differences with Multiple Time Periods,” Journal of Econometrics, December 2021, 225 (2), 200–230.
  • Chen and Roth (2024) Chen, Jiafeng and Jonathan Roth, “Logs with Zeros? Some Problems and Solutions,” The Quarterly Journal of Economics, May 2024, 139 (2), 891–936.
  • de Chaisemartin and D’Haultfœuille (2020) de Chaisemartin, Clément and Xavier D’Haultfœuille, “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects,” American Economic Review, September 2020, 110 (9), 2964–2996.
  • Dube et al. (2016) Dube, Arindrajit, T. William Lester, and Michael Reich, “Minimum Wage Shocks, Employment Flows, and Labor Market Frictions,” Journal of Labor Economics, July 2016, 34 (3), 663–704.
  • Fenton (1960) Fenton, L., “The Sum of Log-Normal Probability Distributions in Scatter Transmission Systems,” IRE Transactions on Communications Systems, March 1960, 8 (1), 57–67.
  • Gibbons et al. (2019) Gibbons, Charles E., Juan Carlos Suárez Serrato, and Michael B. Urbancic, “Broken or Fixed Effects?,” Journal of Econometric Methods, January 2019, 8 (1).
  • Giles (1982) Giles, David E. A., “The Interpretation of Dummy Variables in Semilogarithmic Equations: Unbiased Estimation,” Economics Letters, January 1982, 10 (1), 77–79.
  • Goldsmith-Pinkham et al. (2024) Goldsmith-Pinkham, Paul, Peter Hull, and Michal Kolesár, “Contamination Bias in Linear Regressions,” February 2024.
  • Goodman-Bacon (2021) Goodman-Bacon, Andrew, “Difference-in-Differences with Variation in Treatment Timing,” Journal of Econometrics, December 2021, 225 (2), 254–277.
  • Halvorsen and Palmquist (1980) Halvorsen, Robert and Raymond Palmquist, “The Interpretation of Dummy Variables in Semilogarithmic Equations,” American Economic Review, 1980, 70 (3), 474–475.
  • Hansen (2022) Hansen, Bruce, Econometrics, Princeton University Press, June 2022.
  • Imbens and Angrist (1994) Imbens, Guido W. and Joshua D. Angrist, “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 1994, 62 (2), 467–475.
  • Kennedy (1981) Kennedy, Peter E., “Estimation with Correctly Interpreted Dummy Variables in Semilogarithmic Equations,” American Economic Review, 1981, 71 (4), 801–801.
  • Kovak et al. (2021) Kovak, Brian K., Lindsay Oldenski, and Nicholas Sly, “The Labor Market Effects of Offshoring by U.S. Multinational Firms,” The Review of Economics and Statistics, May 2021, 103 (2), 381–396.
  • Manning (1998) Manning, Willard G., “The Logged Dependent Variable, Heteroscedasticity, and the Retransformation Problem,” Journal of Health Economics, June 1998, 17 (3), 283–295.
  • Marone and Sabety (2022) Marone, Victoria R. and Adrienne Sabety, “When Should There Be Vertical Choice in Health Insurance Markets?,” American Economic Review, January 2022, 112 (1), 304–342.
  • Meghir and Palme (2005) Meghir, Costas and Mårten Palme, “Educational Reform, Ability, and Family Background,” American Economic Review, March 2005, 95 (1), 414–424.
  • Mehta et al. (2007) Mehta, Neelesh B., Jingxian Wu, Andreas F. Molisch, and Jin Zhang, “Approximating a Sum of Random Variables with a Lognormal,” IEEE Transactions on Wireless Communications, July 2007, 6 (7), 2690–2699.
  • Mogstad et al. (2021) Mogstad, Magne, Alexander Torgovitsky, and Christopher R. Walters, “The Causal Interpretation of Two-Stage Least Squares with Multiple Instrumental Variables,” American Economic Review, November 2021, 111 (11), 3663–3698.
  • Mullahy and Norton (2024) Mullahy, John and Edward C. Norton, “Why Transform Y? The Pitfalls of Transformed Regressions with a Mass at Zero*,” Oxford Bulletin of Economics and Statistics, 2024, 86 (2), 417–447.
  • Roth and Sant’Anna (2023) Roth, Jonathan and Pedro H. C. Sant’Anna, “When Is Parallel Trends Sensitive to Functional Form?,” Econometrica, 2023, 91 (2), 737–747.
  • Silva and Tenreyro (2006) Silva, J. M. C. Santos and Silvana Tenreyro, “The Log of Gravity,” The Review of Economics and Statistics, November 2006, 88 (4), 641–658.
  • Słoczyński (2022) Słoczyński, Tymon, “Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights,” The Review of Economics and Statistics, May 2022, 104 (3), 501–509.
  • Sun and Abraham (2021) Sun, Liyang and Sarah Abraham, “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects,” Journal of Econometrics, December 2021, 225 (2), 175–199.
  • van Garderen (2001) van Garderen, Kees Jan, “Optimal Prediction in Loglinear Models,” Journal of Econometrics, August 2001, 104 (1), 119–140.
  • van Garderen and Shah (2002) van Garderen, Kees Jan and Chandra Shah, “Exact Interpretation of Dummy Variables in Semilogarithmic Equations,” The Econometrics Journal, June 2002, 5 (1), 149–159.
  • Wooldridge (2010) Wooldridge, Jeffrey M., Econometric Analysis of Cross Section and Panel Data, Second Edition, MIT Press, October 2010.
  • Zhang and Gou (2022) Zhang, Fengqing and Jiangtao Gou, “A Unified Framework for Estimation in Lognormal Models,” Journal of Business & Economic Statistics, October 2022, 40 (4), 1583–1595.

Appendix

Table 1: Impact of Education Reform on Earnings
Columns 1-4 are estimates of $\bar{\rho}$; columns 5-7 are $95\%$ confidence intervals.

| Sample | $\hat{\bar{\tau}}$ | $\hat{\rho}_{a}$ | $\hat{\rho}_{b}$ | $\hat{\rho}_{c}$ | CI for $\bar{\tau}$ | CI for $\rho_{a}$ | CI for $\bar{\rho}$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Assuming Homogeneous Treatment Effects | | | | | | | |
| All | 1.422 | | 1.433 | 1.429 | [-0.322, 3.167] | | [-0.321, 3.217] |
| low fedu, all abi | 3.434 | | 3.493 | 3.489 | [1.635, 5.232] | | [1.649, 5.371] |
| low fedu, low abi | 2.569 | | 2.602 | 2.593 | [-0.030, 5.168] | | [-0.030, 5.304] |
| low fedu, high abi | 4.445 | | 4.546 | 4.536 | [1.756, 7.135] | | [1.771, 7.396] |
| high fedu, all abi | -6.426 | | -6.224 | -6.241 | [-10.168, -2.683] | | [-9.668, -2.648] |
| Allowing Treatment Effect Heterogeneity across Gender | | | | | | | |
| All | 1.422 | 1.433 | 1.434 | 1.427 | [-0.318, 3.163] | [-0.318, 3.213] | [-0.321, 3.213] |
| low fedu, all abi | 3.428 | 3.488 | 3.493 | 3.486 | [1.633, 5.223] | [1.646, 5.362] | [1.647, 5.366] |
| low fedu, low abi | 2.561 | 2.594 | 2.600 | 2.586 | [-0.034, 5.157] | [-0.034, 5.292] | [-0.040, 5.298] |
| low fedu, high abi | 4.429 | 4.529 | 4.545 | 4.528 | [1.770, 7.089] | [1.785, 7.346] | [1.794, 7.354] |
| high fedu, all abi | -6.374 | -6.175 | -6.098 | -6.129 | [-10.140, -2.607] | [-9.643, -2.574] | [-9.573, -2.521] |
  • 1. This table replicates Table 2 column 1 of Meghir and Palme (2005), studying ATE in percentage points of education reform on earnings. All values have been scaled by 100.

  • 2. The first panel includes results based on regression (24), which assumes constant treatment effects. The second panel includes results based on regression (25), which allows treatment effects to vary by gender. For both regressions, standard errors are clustered by municipalities of schooling. Each regression is run on 5 samples: (1) the full sample ($N=209,683$); (2) individuals with low father’s education ($N=173,435$); (3) individuals with low father’s education and low personal ability ($N=92,473$); (4) individuals with low father’s education and high personal ability ($N=80,962$); (5) individuals with high father’s education ($N=36,248$).

  • 3. The left panel reports four estimates of $\bar{\rho}$: $\hat{\bar{\tau}}=\sum_{g}\hat{w}_{g}\hat{\tau}_{g}$, $\hat{\rho}_{a}=\exp(\hat{\bar{\tau}})-1$, $\hat{\rho}_{b}$ in equation (8), and $\hat{\rho}_{c}$ in equation (9). The estimator $\hat{\rho}_{d}$ is omitted as it is indistinguishable from $\hat{\rho}_{c}$ when rounded to three digits. The right panel reports 95% confidence intervals of $\bar{\tau}$ as in equation (14), of $\rho_{a}$ as in equation (16), and of $\bar{\rho}$ as in equation (23). In the homogeneous effects case, $\hat{\rho}_{a}=\hat{\rho}_{b}$, and confidence intervals of $\rho_{a}$ are identical to those of $\bar{\rho}$, hence $\hat{\rho}_{a}$ and the CI for $\rho_{a}$ are omitted.

Table 2: Impact of Minimum Wage Raise on Teen Employment
Columns 1-4 are estimates of $\bar{\rho}$; columns 5-7 are $95\%$ confidence intervals.

| | $\hat{\bar{\tau}}$ | $\hat{\rho}_{a}$ | $\hat{\rho}_{b}$ | $\hat{\rho}_{c}$ | CI for $\bar{\tau}$ | CI for $\rho_{a}$ | CI for $\bar{\rho}$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All Treated Units | | | | | | | |
| atet | -4.891 | -4.774 | -4.729 | -4.738 | [-6.603, -3.180] | [-6.389, -3.130] | [-6.536, -3.293] |
| Cohort | | | | | | | |
| 2004 | -8.263 | -7.931 | -7.889 | -7.896 | [-10.293, -6.233] | [-9.781, -6.043] | [-10.032, -6.390] |
| 2006 | -4.479 | -4.380 | -4.349 | -4.359 | [-6.718, -2.239] | [-6.498, -2.214] | [-6.563, -2.308] |
| 2007 | -2.874 | -2.833 | -2.833 | -2.844 | [-5.752, 0.003] | [-5.589, 0.003] | [-5.589, 0.003] |
| Event Time | | | | | | | |
| -6 | -0.250 | -0.250 | -0.250 | -0.340 | [-8.583, 8.083] | [-8.225, 8.419] | [-8.225, 8.419] |
| -5 | -3.214 | -3.163 | -3.147 | -3.231 | [-8.942, 2.514] | [-8.554, 2.546] | [-8.617, 2.437] |
| -4 | -1.638 | -1.625 | -1.571 | -1.640 | [-6.815, 3.539] | [-6.588, 3.602] | [-6.606, 3.539] |
| -3 | -0.192 | -0.192 | -0.170 | -0.226 | [-3.984, 3.600] | [-3.906, 3.666] | [-4.000, 3.513] |
| -2 | 0.887 | 0.891 | 0.902 | 0.888 | [-1.222, 2.997] | [-1.215, 3.042] | [-1.329, 2.945] |
| 0 | -2.864 | -2.823 | -2.820 | -2.830 | [-4.878, -0.850] | [-4.761, -0.846] | [-4.868, -0.953] |
| 1 | -6.730 | -6.508 | -6.508 | -6.516 | [-8.667, -4.792] | [-8.302, -4.679] | [-8.440, -4.833] |
| 2 | -9.491 | -9.055 | -9.055 | -9.062 | [-12.029, -6.953] | [-11.334, -6.717] | [-11.334, -6.717] |
| 3 | -12.623 | -11.859 | -11.859 | -11.873 | [-16.065, -9.182] | [-14.841, -8.773] | [-14.841, -8.773] |
| Calendar Year | | | | | | | |
| 2004 | -4.808 | -4.694 | -4.694 | -4.697 | [-6.361, -3.255] | [-6.163, -3.203] | [-6.163, -3.203] |
| 2005 | -6.130 | -5.946 | -5.946 | -5.951 | [-8.058, -4.203] | [-7.741, -4.116] | [-7.741, -4.116] |
| 2006 | -4.306 | -4.215 | -4.157 | -4.166 | [-6.662, -1.951] | [-6.445, -1.932] | [-6.517, -2.034] |
| 2007 | -4.971 | -4.850 | -4.801 | -4.812 | [-7.324, -2.618] | [-7.063, -2.584] | [-7.118, -2.639] |
  • 1. This table replicates Table 3 Panel B in Callaway and Sant’Anna (2021), exploring the impact of minimum wage on teen employment.

  • 2. The four panels are the results of ATE in percentage points for all treated units, each cohort, each event time, and each calendar year. All values have been scaled by 100.

  • 3. The left panel reports four estimates of $\bar{\rho}$: $\hat{\bar{\tau}}=\sum_{g}\hat{w}_{g}\hat{\tau}_{g}$, $\hat{\rho}_{a}=\exp(\hat{\bar{\tau}})-1$, $\hat{\rho}_{b}$ in equation (8), and $\hat{\rho}_{c}$ in equation (9). The estimator $\hat{\rho}_{d}$ is omitted as it is indistinguishable from $\hat{\rho}_{c}$ when rounded to three digits. The right panel reports 95% confidence intervals of $\bar{\tau}$ as in equation (14), of $\rho_{a}$ as in equation (16), and of $\bar{\rho}$ as in equation (23).

Table 3: Monte Carlo Results for Large $\rho$ and Normal Errors
Columns 2-6 are estimates of $\bar{\rho}$ (means, with standard errors in parentheses); columns 7-8 are empirical rejection rates.

| $N$ | $\hat{\bar{\tau}}$ | $\hat{\rho}_{a}$ | $\hat{\rho}_{b}$ | $\hat{\rho}_{c}$ | $\hat{\rho}_{d}$ | $z_{\tau}$ | $z_{\mu}$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| True Values | -0.809 | -0.806 | 0 | 0 | 0 | 5 | 5 |
| 20 | -0.721 (61.636) | 20.288 (85.942) | 34.375 (97.127) | 0.677 (71.749) | 0.040 (71.331) | 5.961 | 6.797 |
| 50 | -0.743 (37.165) | 6.396 (41.656) | 11.435 (43.832) | 0.039 (39.077) | 0.011 (39.065) | 5.133 | 5.508 |
| 100 | -0.813 (25.599) | 2.493 (26.675) | 5.286 (27.485) | -0.019 (26.086) | -0.022 (26.086) | 5.075 | 5.228 |
| 200 | -0.832 (17.882) | 0.770 (18.165) | 2.543 (18.526) | -0.028 (18.062) | -0.028 (18.062) | 5.058 | 5.121 |
| 500 | -0.831 (11.212) | -0.202 (11.235) | 0.986 (11.390) | -0.025 (11.277) | -0.025 (11.277) | 4.997 | 5.014 |
| 1000 | -0.836 (7.932) | -0.520 (7.905) | 0.474 (7.997) | -0.029 (7.957) | -0.029 (7.957) | 5.144 | 5.027 |
| 2000 | -0.846 (5.584) | -0.688 (5.550) | 0.211 (5.610) | -0.039 (5.595) | -0.039 (5.595) | 5.209 | 4.935 |
| 5000 | -0.817 (3.542) | -0.751 (3.517) | 0.091 (3.553) | -0.009 (3.550) | -0.009 (3.550) | 5.600 | 5.018 |
| 10000 | -0.825 (2.504) | -0.791 (2.485) | 0.033 (2.510) | -0.017 (2.508) | -0.017 (2.508) | 6.196 | 4.978 |
| 100000 | -0.805 (0.793) | -0.799 (0.787) | 0.009 (0.794) | 0.004 (0.794) | 0.004 (0.794) | 17.465 | 5.111 |
| 1000000 | -0.808 (0.250) | -0.805 (0.248) | 0.001 (0.250) | 0.001 (0.250) | 0.001 (0.250) | 89.760 | 5.042 |
  • 1. Monte Carlo simulation results for the case with $100\rho=(-16,-8,8,16)$ and normal errors. All results are scaled by 100.

  • 2. The left panel reports the mean and standard errors (in parentheses) across 100,000 replications of five estimates of $\bar{\rho}$: $\hat{\bar{\tau}}=\sum_{g}\hat{w}_{g}\hat{\tau}_{g}$, $\hat{\rho}_{a}=\exp(\hat{\bar{\tau}})-1$, $\hat{\rho}_{b}$ in equation (8), $\hat{\rho}_{c}$ in equation (9), and $\hat{\rho}_{d}$ in equation (11).

  • 3. The right panel reports empirical rejection rates across 100,000 replications of the z-tests for $H_{0}:\bar{\tau}=0$ using $z_{\tau}$ in (13) and for $H_{0}:\bar{\rho}=0$ using $z_{\mu}$ in (21). Results of the z-test for $H_{0}:\rho_{a}=0$ using $z_{a}$ in (15) are equivalent to testing $H_{0}:\bar{\tau}=0$ using $z_{\tau}$ in (13) when $\rho_{0}=0$ and are therefore omitted.

Table 4: Monte Carlo Results for Small $\rho$ and Normal Errors
Columns 2-6 are estimates of $\bar{\rho}$ (means, with standard errors in parentheses); columns 7-8 are empirical rejection rates.

| $N$ | $\hat{\bar{\tau}}$ | $\hat{\rho}_{a}$ | $\hat{\rho}_{b}$ | $\hat{\rho}_{c}$ | $\hat{\rho}_{d}$ | $z_{\tau}$ | $z_{\mu}$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| True Values | -0.201 | -0.200 | 0 | 0 | 0 | 5 | 5 |
| 20 | -0.454 (61.319) | 20.303 (84.679) | 33.730 (95.212) | 0.219 (70.832) | -0.413 (70.440) | 5.914 | 6.776 |
| 50 | -0.338 (37.291) | 6.873 (41.924) | 11.293 (43.799) | -0.073 (39.113) | -0.100 (39.102) | 5.305 | 5.637 |
| 100 | -0.148 (25.566) | 3.169 (26.851) | 5.348 (27.486) | 0.036 (26.086) | 0.034 (26.085) | 5.066 | 5.242 |
| 200 | -0.259 (17.862) | 1.345 (18.247) | 2.508 (18.477) | -0.065 (18.013) | -0.065 (18.013) | 5.014 | 5.075 |
| 500 | -0.216 (11.239) | 0.417 (11.324) | 0.999 (11.399) | -0.013 (11.284) | -0.013 (11.284) | 5.103 | 5.148 |
| 1000 | -0.213 (7.932) | 0.102 (7.953) | 0.491 (7.989) | -0.012 (7.949) | -0.012 (7.949) | 5.088 | 5.129 |
| 2000 | -0.212 (5.603) | -0.055 (5.605) | 0.240 (5.623) | -0.011 (5.609) | -0.011 (5.609) | 4.941 | 4.966 |
| 5000 | -0.196 (3.530) | -0.134 (3.527) | 0.104 (3.536) | 0.004 (3.532) | 0.004 (3.532) | 4.987 | 5.016 |
| 10000 | -0.202 (2.503) | -0.171 (2.499) | 0.048 (2.506) | -0.002 (2.505) | -0.002 (2.505) | 5.110 | 5.006 |
| 100000 | -0.199 (0.791) | -0.196 (0.789) | 0.006 (0.791) | 0.001 (0.791) | 0.001 (0.791) | 5.719 | 5.012 |
| 1000000 | -0.200 (0.249) | -0.199 (0.249) | 0.001 (0.249) | 0.001 (0.249) | 0.001 (0.249) | 12.523 | 4.954 |
  • 1. Monte Carlo simulation results for the case with $100\rho=(-8,-4,4,8)$ and normal errors. All results are scaled by 100.

  • 2. The left panel reports the mean and standard errors (in parentheses) across 100,000 replications of five estimates of $\bar{\rho}$: $\hat{\bar{\tau}}=\sum_{g}\hat{w}_{g}\hat{\tau}_{g}$, $\hat{\rho}_{a}=\exp(\hat{\bar{\tau}})-1$, $\hat{\rho}_{b}$ in equation (8), $\hat{\rho}_{c}$ in equation (9), and $\hat{\rho}_{d}$ in equation (11).

  • 3. The right panel reports empirical rejection rates across 100,000 replications of the z-tests for $H_{0}:\bar{\tau}=0$ using $z_{\tau}$ in (13) and for $H_{0}:\bar{\rho}=0$ using $z_{\mu}$ in (21). Results of the z-test for $H_{0}:\rho_{a}=0$ using $z_{a}$ in (15) are equivalent to testing $H_{0}:\bar{\tau}=0$ using $z_{\tau}$ in (13) when $\rho_{0}=0$ and are therefore omitted.