Distribution Regression in Duration Analysis:
an Application to Unemployment Spells

Miguel A. Delgado^† Andrés García-Suaza^‡ and Pedro H.C. Sant’Anna^†† ^†Universidad Carlos III de Madrid, Calle Madrid 126, 28903 Getafe, Madrid, Spain delgado@est-econ.uc3m.es ^‡Universidad del Rosario, Calle 14 625, Bogotá, Colombia. andres.garcia@urosario.edu.co ^‡ Microsoft and Vanderbilt University, 415 Calhoun Hall, Nashville, TN 37240, USA. pedro.h.santanna@vanderbilt.edu

(November 2021; …)

Abstract

This article proposes inference procedures for distribution regression models in duration analysis using randomly right-censored data. This generalizes classical duration models by allowing situations where explanatory variables’ marginal effects freely vary with duration time. The article discusses applications to testing uniform restrictions on the varying coefficients, inferences on average marginal effects, and others involving conditional distribution estimates. Finite sample properties of the proposed method are studied by means of Monte Carlo experiments. Finally, we apply our proposal to study the effects of unemployment benefits on unemployment duration.

Keywords: Conditional distribution; Duration models; Random Censoring; Unemployment duration; Varying coefficients model.

JEL Codes: C14, C24, C41, J64.

^†^†volume: 20

1 Introduction

Existing semiparametric duration models can be broadly classified into two groups: those based on the conditional hazard and those based on the quantile regression. The former includes the proportional hazard model (Cox, 1972, 1975), the proportional odds model (Clayton, 1976; Bennett, 1983; Murphy et al., 1997), and the accelerated failure time model (Kalbfleisch and Prentice, 1980); see Guo and Zeng (2014) for an overview. In these models, the conditional hazard identifies the conditional cumulative distribution expressed in terms of the error’s marginal distribution of a transformed failure time regression model (see, e.g., Hothorn et al., 2014). Censored quantile regression proposals are relatively more recent, see, e.g., Ying et al. (1995), Honore et al. (2002), Portnoy (2003), Peng and Huang (2008), and Wang and Wang (2009).

Conditional hazard and quantile regression models are alternative modeling strategies with advantages and drawbacks. Classical conditional hazard specifications impose identification conditions that are difficult to justify in some circumstances. For instance, the proportional hazard specification rules out important forms of heterogeneity, see, e.g., Portnoy (2003) and the discussion in Section 2. On the other hand, models based on quantile regression specifications avoid this problem, but impose that the underlying conditional cumulative distribution of duration time is absolutely continuous, which rules out other sampling schemes such as discrete outcomes.

The distribution regression approach was proposed by Foresi and Peracchi (1995) and formalized by Chernozhukov et al. (2013); see also Rothe and Wied (2013, 2019), Chernozhukov et al. (2019) and Chernozhukov et al. (2020) for later developments. One attractive feature of this modeling strategy is that marginal effects associated with distribution regression models are more flexible than those obtained from classical conditional hazard specifications, as they can accommodate richer forms of heterogeneity. In contrast to quantile regression, distribution regression can accommodate discrete, continuous and mixed duration data in a unified manner under fairly weak regularity conditions. Chernozhukov et al. (2013) show that distribution regression encompasses the Cox (1972) model as a special case and represents a useful alternative to quantile regression.

The distribution regression model specifies the distribution of the duration outcome variable by means of a generalized linear model with known link function and nonparametric varying coefficient depending on duration time. When data is uncensored, the varying coefficients at each duration value are consistently estimated by binary regressions. This method is also valid using censored data with known censoring time, which is an unlikely situation in duration studies, e.g., unemployment duration spells. When duration and censoring times are potentially unobserved, the standard binary regression approach lead to inconsistent estimators.

In this article, we propose a weighted binary regression procedure using Kaplan and Meier (1958) weights. The varying coefficients are identified by means of a set of moment restrictions of the duration time and covariates’ joint distribution. These moments are consistently estimated by their corresponding Kaplan-Meier integrals when the duration time is subjected to censoring; if censoring is not an issue, the Kaplan-Meier integrals reduce to the sample analogue of the moment restrictions. The resulting estimator admits a representation as a non-degenerate U-statistic of order three, whose Hájek projection is asymptotically distributed as a normal with an asymptotic variance that can be consistently estimated from the data. The resulting inference procedures on the coefficients do not rely on choosing tuning parameters such as bandwidths, imposing smoothness conditions on the censoring random variable, or using truncation arguments.

We establish the consistency and finite-dimensional distribution convergence of the varying coefficients under weak regularity conditions. Under more restrictive assumptions we justify the convergence in distribution of the estimated coefficient function as a random element of a suitable metric space. We also justify bootstrap-based inference procedures.

In order to justify the asymptotic inferences based on our proposed method, we impose restrictions on the joint distribution of the duration time, censoring time and covariates. Stute (1993, 1996) provided strong consistency and a central limit theorem for Kaplan-Meier integrals that estimate moments of (partially observed) durations time and covariates under the following identification conditions: (a) the duration and censoring times are independent, and (b) the event of being censored is conditionally independent of the covariates given the duration time. Stute shows that the two conditions suffice to identify the joint distribution of duration time and covariates from the observed censored data, and, hence, to identify corresponding moments of these conditional distributions. Of course, the identification of alternative models may require alternative regularity conditions. For instance, identification of the Cox (1972)’s hazard model, Aalen (1980)’s additive hazard model, or censored quantile regressions models (Portnoy, 2003) only requires that duration and censoring times are independent given covariates. However, as we mention in Section 2, these models usually exclude important types of heterogeneity, or restrict their attention to absolutely continuous duration data. Alternatively, one can potentially estimate the moment equations using inverse probability weighting (IPW) methods, as suggested by Robins and Rotnitzky (1992, 1995) and later extended by Wooldridge (2007) to general missing data problems. Although this path allows one to only assume that duration and censoring times are independent given covariates, it also requires that the conditional distribution of the censoring times given covariates is (uniformly) bounded away from one, which rules out discrete distributions and the most popular distribution functions used in survival analysis such as exponential, Gamma, log-normal and Weibull, among others. If one is not comfortable with these additional restrictions, truncation/trimming arguments need to be carefully introduced, and the choice of tuning parameters becomes a first-order concern. By adopting the Kaplan-Meier integrals approach, we avoid these alternative restrictions, but rely on conditions (a) and (b) above.

We illustrate the relevance of our proposal by assessing how changes in unemployment insurance benefits affect, on average, the distribution of unemployment duration, using data from the Survey of Income and Program Participation for the period 1985-2000. We find that, by allowing the distributional marginal effects to vary over the unemployment spell, our proposed method can reveal interesting insights when compared to traditional hazard models as those used by Chetty (2008). For instance, our results suggest a non-monotone marginal effect of an increase in unemployment insurance on the unemployment duration distribution. Such a finding is in sharp contrast with those obtained using classical proportional hazard models. On the other hand, our results agree with Chetty (2008) in that increases in unemployment insurance have larger effects on liquidity-constrained workers, suggesting that unemployment insurance affects unemployment duration not only through a moral hazard channel but also through a liquidity effect channel.

The rest of the article is organized as follows. Section 2 introduces the basic notation and motivates distribution regression models using duration data as an alternative to classical conditional hazard modeling. Section 3 describes our estimation procedure, whereas Section 4 introduces regularity conditions needed to justify inferences on the distribution regression varying coefficients. All the results are proved in the Supplementary Appendix. Application of the results to different contexts are placed in Section 5. Section 6 briefly summarizes the results of the Monte Carlo simulations – detailed discussion is presented in the Supplementary Appendix¹¹1The Supplementary Appendix is available at https://pedrohcgs.github.io/files/Delgado_GarciaSuaza_SantAnna_2021_EJ-Appendix.pdf. Finally, we apply the proposed techniques to investigate the effect of unemployment benefits on unemployment duration in Section 7. All our replication files are available at https://github.com/pedrohcgs/KMDR-replication.

2 Distribution Regression with Duration Outcomes

Consider the $\mathbb{R}^{1+k}-valued$ random vector $\left(T,X\right)$ defined on $\left(\Omega,\mathcal{F},\mathbb{P}\right),$ where $T$ is the duration outcome of interest and $X$ is a $k-$ dimensional vector of time-invariant covariates, with supports $\mathcal{T}$ and $\mathcal{X},$ respectively. Henceforth, let $\mathbf{X}=(1,X^{\prime})^{\prime}$ and $\mathbf{x}=(1,x^{\prime})^{\prime}$ . We assume that the conditional cumulative distribution function (CDF) of $T$ given $X$ follows the distribution regression (DR) model,

F_{\left.Y\right|X}\left(\left.t\right|X\right)=\Lambda\left(\varphi\left(t\right)+X^{\prime}\beta_{0}\left(t\right)\right)\text{ for some }\left(\varphi\left(t\right),\beta_{0}\left(t\right)\right)\in\Theta\text{ a.s.}

(2.1)

where $\theta_{0}\left(\cdot\right)=\left(\varphi\left(\cdot\right),\beta_{0}\left(\cdot\right)^{\prime}\right)^{\prime}\mapsto\Theta\subseteq\mathbb{R}^{k+1}$ is a vector of nonparametric functions, and $\Lambda$ is a known link function.

Chernozhukov et al. (2013) point out that the DR model is a flexible alternative to classical duration models as it allows coefficients to vary with duration time. In particular, they show that it nests the Cox (1972) proportional hazard (PH) model. Model (2.1) also generalizes other traditional models commonly adopted in duration analysis such as Kalbfleisch and Prentice (1980) accelerated failure time (AFT) model, and Clayton (1976) proportional odds (PO) model.

We note that all the aforementioned classical duration models are special cases of the linear transformation model

\varphi\left(T\right)=-X^{\prime}\beta_{0}+\Lambda^{-1}(U)\text{ }a.s.,

(2.2)

where $\beta_{0}\in\mathbb{R}^{k}$ is a vector of unknown parameters, $\varphi$ is a (potentially unknown) monotonically increasing transformation function, $\Lambda^{-1}$ is the $\Lambda^{\prime}s$ quantile function, and $U$ is uniformly distributed in $\left[0,1\right]$ independently of $X$ . For instance, $(a)$ the PH model corresponds to model (2.2) with $\Lambda$ the complementary log-log (cloglog) link function, $\Lambda\left(u\right)=1-\exp\left(-\exp\left(u\right)\right)$ ; $(b)$ the AFT model corresponds to (2.2) with $\Lambda$ commonly assumed to be the link function of a log-logistic or of a Weibull distribution; and $(c)$ the PO model corresponds to (2.2) with $\Lambda$ the logistic link function, $\Lambda\left(u\right)=\exp\left(u\right)\left/\left(1+\exp\left(u\right)\right)\right.$ ; see, e.g. Doksum and Gasko (1990), Cheng et al. (1995) and Hothorn et al. (2014). Indeed, the CDF associated with (2.2) is given by

F_{T|X}\left(t|X\right)=\Lambda\left(\varphi\left(t\right)+X^{\prime}\beta_{0}\right).

(2.3)

Therefore, (2.1) is a natural generalization of (2.3) that allows all the slope coefficients varying with $t$ . Other duration models with varying slope coefficients are also special cases of (2.1). For instance, Aalen (1980) semi-parametric additive hazard model reduces to (2.1) with $\Lambda\left(u\right)=1-\exp\left(-u\right)$ . The non-proportional odds model considered by McCullagh (1980) and Armstrong and Sloan (1989) can also be expressed in terms of the DR specification (2.1) when $\Lambda$ is the logistic link function.

We conclude this section by emphasizing that allowing for varying slope coefficients is particularly important to capture richer heterogeneous effects of covariates across the distribution of the duration outcome. Indeed, traditional duration specifications, such as the PH, PO or AFT models, implicitly impose that all covariates affect the CDF of $T$ in a proportional, and monotonic form. If $\Lambda$ is differentiable with Lebesgue density $\lambda$ , partial effects for these models have the form

\frac{d}{dx}F_{T|X}\left(t|x\right)=\beta_{0}\lambda\left(\varphi\left(t\right)+x^{\prime}\beta_{0}\right).

Thus, the sign of the marginal effect of any $X$ ’s component is not allowed to vary with $t$ , which may be too restrictive in some applications. For instance, in the context of the empirical application in Section 7, (2.3) does not allow a non-monotonic effect, e.g. U-shaped, ruling out non-stationary search models in the spirit of van den Berg (1990). The DR model (2.1) bypass such limitations since

\frac{d}{dx}F_{T|X}\left(t|x\right)=\beta_{0}(t)\lambda\left(\varphi\left(t\right)+x^{\prime}\beta_{0}\left(t\right)\right),

which sign is allowed to vary with $t$ . We view this added flexibility as an attractive feature of the DR modeling approach.

3 The Kaplan-Meier Distribution Regression

The main challenge when dealing with duration data is that the outcome of interest $T$ is subject to censoring according to a variable $C$ . As so, inferences must be based on observations $\left\{Y_{i},X_{i},\delta_{i}\right\}_{i=1}^{n}$ , with $Y_{i}=\min\left(T_{i},C_{i}\right)$ the realized (potentially censored) outcome, $\delta_{i}=1_{\left\{T_{i}\leq C_{i}\right\}}$ indicates whether or not $T$ is observed, and $\left\{T_{i},C_{i},X_{i}\right\}_{i=1}^{n}$ are independent and identically distributed ( $iid$ ) as $\left(T,C,X\right)$ . Henceforth, $1_{\left\{A\right\}}$ is the indicator function of the event $A$ .

Suppose for the moment that $\delta_{i}=1,$ i.e., $T_{i}=Y_{i},$ for all $i=1,\dots,n$ . In this case, the joint $\left(T,X^{\prime}\right)^{\prime}$ CDF, $F\left(t,x\right)$ , is consistently estimated by its sample version $\tilde{F}_{n}\left(t,x\right)=n^{-1}\sum_{i=1}^{n}1_{\left\{Y_{i}\leq y,X_{i}\leq x\right\}}$ , where inequalities are coordinate-wise. Therefore, following Foresi and Peracchi (1995) and Chernozhukov et al. (2013), $\theta_{0}\left(t\right)$ can be consistently estimated by $\tilde{\theta}_{n}(t)$ , the maximizer of the conditional likelihood function of $\left\{1_{\left\{Y_{i}\leq t\right\}},X_{i}\right\}_{i=1}^{n}$

\tilde{Q}_{n}(\theta,t)=\int_{\mathbb{R}^{1+k}}\ln p_{\theta,t}(1_{\left\{y\leq t\right\}},x)\tilde{F}_{n}(dy,dx)=\frac{1}{n}\sum_{i=1}^{n}\ln p_{\theta,t}(1_{\left\{Y_{i}\leq t\right\}},X_{i}),

(3.1)

with

p_{\theta,t}(d,x)=\Lambda\left(\boldsymbol{x}^{\prime}\theta\right)^{d}\left(1-\Lambda\left(\boldsymbol{x}^{\prime}\theta\right)\right)^{1-d}.

However, when $T$ is subject to right-censoring, i.e., $T\neq Y$ , $\tilde{F}_{n}$ is no longer a consistent estimator of $F$ and, hence, neither is $\tilde{\theta}_{n}\left(t\right)$ .

Since $T$ is not always observed, $F$ and $\theta(t)$ must be identified from the joint distribution of the observed data on $\left(Y,X,\delta\right)$ . However, this is not always possible without additional information. Henceforth, for any random variable $\xi$ , which can be $T,~C$ or $Y$ , define $F_{\xi}(t)\equiv\mathbb{P}\left(\xi\leq t\right)$ and $\tau_{\xi}\equiv\inf\left(t:\mathbb{P}\left(\xi\leq t\right)=1\right)$ , and let $A$ denote the (possibly empty) set of $F_{T}$ jumps. Also, for any generic function $g$ , $g(t-)\equiv\lim_{s\uparrow t}g(s)$ . If no information about $T$ beyond $\tau_{Y}$ is available from the data, identification of

F_{\ast}(t,x)\equiv\left\{\begin{array}[]{l}F(t,x)\text{ for }t<\tau_{Y}\\ F(\tau_{Y}-,x)+1_{\left\{\tau_{Y}\in A\right\}}\left[F(\tau_{Y},x)-F(\tau_{Y}-,x)\right]\text{ for }t\geq\tau_{Y}.\end{array}\right.

is the best that one can hope for; see, e.g., Tsiatis (1975, 1981), and Stute (1993). In order to identify $F_{\ast}$ we impose the following assumptions.

Assumption 3.1

$T$ and $C$ are independent.

Assumption 3.2

$\delta$ and $X$ are conditionally independent given $T$ .

Assumption 3.1 is introduced to identify $F_{\ast}\left(\cdot,\infty,...,\infty\right)$ (see Tsiatis, 1975, 1981 for discussion), whereas Assumption 3.2 is introduced by Stute (1993) and allows us to incorporate covariates into the analysis. Taken together, these two assumptions imply that covariates should have no effect on the probability of being censored once $T$ is known. Of course, Assumptions 3.1 and 3.2 hold if $C$ is independent of $\left(T,X\right)$ , but can also hold under more general circumstances. See Stute (1993, 1996, 1999) for a discussion on Assumptions 3.1 and and 3.2, and the identification of $F_{*}$ and its moments. Identification can be achieved under alternative conditions, in the context of alternative conditional survival models, or alternative estimation procedures. See Introduction for further comments.

Remark 3.1

Under Assumptions 3.1 and 3.2, $\tau_{Y}=\min\left(\tau_{T},\tau_{C}\right)$ . Therefore, $F=F_{\ast}$ if either $\tau_{T}<\tau_{C}$ , or $\tau_{C}=\infty$ irrespective of whether $\tau_{T}$ is finite or not. For most duration distributions considered in the literature, $\tau_{T}=\infty=\tau_{C}$ ; hence, $\tau_{Y}=\infty$ . When $\tau_{C}<\tau_{T}$ , $F\neq F_{\ast}$ in general, and $F$ cannot be consistently estimated as $T$ is not observed beyond $\tau_{C}$ . When $\tau_{T}=\tau_{C}<\infty,$ $F=F_{\ast}$ depending on the local structure of $F_{T}$ and $F_{C}$ around the common endpoint. Notice that the above endpoint conditions for $F=F_{\ast}$ can be satisfied with discontinuous $F$ .

In order to identify $\theta_{0}(t)$ , we ensure that $F=F_{\ast}$ by imposing the following condition; see Stute (1999) for a similar assumption in the case of nonlinear regression with randomly censored outcomes.

Assumption 3.3

$\tau_{T}<\tau_{C}$ or $\tau_{C}=\infty.$

Thus, $\theta_{0}\left(t\right)$ is identified as the parameter value that maximizes $Q\left(\theta,t\right)$ under standard conditions in binary regression, e.g., Amemiya (1985, Section 9.2.2), or more recently, van der Vaart (1998, Example 5.40).

Henceforth, $Y_{1:n}\leq...\leq Y_{n:n}$ are the ordered $Y-values$ , where ties within outcomes $T$ or within censoring times $C$ are ordered arbitrarily and ties among $T$ and $C$ are treated as if the former precedes the latter. For observations $\left\{\xi_{i}\right\}_{i=1}^{n}$ of a random variable $\xi$ , which may be $\delta$ or $X$ , $\xi_{\left[i:n\right]}$ is the $i-th$ $\xi-$ concomitant of the order statistics $\left\{Y_{i:n}\right\}_{i=1}^{n},$ i.e., $\xi_{\left[i:n\right]}=\xi_{j}$ if $Y_{i:n}=Y_{j}$ .

In the absence of covariates, $F_{\ast T}(t)=F_{\ast}\left(t,\infty,...,\infty\right)$ is consistently estimated by the Kaplan and Meier (1958) product-limit estimator,

\hat{F}_{Tn}(t)=1-\prod_{Y_{i:n}\leq t}\left(1-\frac{\delta_{\left[i:n\right]}}{n-i+1}\right),\text{ }t\in\mathbb{R}^{+}.

By noticing that the jumps of $\hat{F}_{Tn}$ at $Y_{i:n}$ are

W_{in}=\hat{F}_{Tn}(Y_{i:n})-\hat{F}_{Tn}(Y_{i-1:n})=\frac{\delta_{\left[i:n\right]}}{n-i+1}\prod_{j=1}^{i-1}\left(1-\frac{\delta_{\left[j:n\right]}}{n-j+1}\right),

(3.2)

we can express $\hat{F}_{Tn}$ in an additive form as

\hat{F}_{Tn}(t)=\sum_{i=1}^{n}W_{in}\cdot 1_{\left\{Y_{i:n}\leq t\right\}}.

When covariates are present, the natural $F_{\ast}\left(t,x\right)$ estimator is

\hat{F}_{n}(t,x)=\sum_{i=1}^{n}W_{in}\cdot 1_{\left\{Y_{i:n}\leq t,X_{\left[i:n\right]}\leq x\right\}},

(3.3)

see, Stute (1993, 1996). Stute (1993) showed that, under Assumptions 3.1-3.3,

\lim_{n\rightarrow\infty}\sup_{\left(t,x\right)\in\mathcal{T}\times\mathbb{R}^{k}}\left|\left(\hat{F}_{n}-F\right)\left(t,x\right)\right|=0~a.s.

Thus, (3.3) is indeed a natural candidate to replace $\tilde{F}_{n}$ in (3.1) under censoring. This suggests estimating $Q_{\ast}\left(t,\theta\right)$ by the Kaplan-Meier integral,

\hat{Q}_{n}\left(\theta,t\right)=\int\ln p_{\theta}(1_{\left\{y\leq t\right\}},x)\hat{F}_{n}(dy,dx)=\sum_{i=1}^{n}W_{in}\cdot\ln p_{\theta}(1_{\left\{Y_{i:n}\leq t\right\}},X_{\left[i:n\right]}),

which is a weighted version of $\tilde{Q}_{n}\left(\theta,t\right)$ .

The Kaplan-Meier distribution regression (KMDR) estimator of $\theta_{0}(t)$ is then given by

\hat{\theta}_{n}\left(t\right)=\underset{\theta\in\Theta}{\arg\max}\text{ }\hat{Q}_{n}\left(\theta,t\right).

Notice that, to compute the KMDR estimators, one simply needs to run a weighted binary regression, where the weights are the (random) Kaplan-Meier weights $W_{in}$ . In the absence of censoring, i.e., $Y_{i}=T_{i},$ $\delta_{i}=1$ , $i=1,..,n$ , we have that the Kaplan-Meier weights $W_{in}=n^{-1}$ , and $\hat{\theta}_{n}\left(t\right)$ reduces to $\tilde{\theta}_{n}\left(t\right).$

4 Asymptotic Theory

In this section, we present the asymptotic properties of our proposed KMDR estimator $\hat{\theta}_{n}(t)$ for $\theta_{0}\left(t\right)$ .

In order to establish consistency, we impose the following fairly weak assumption which is compatible with discrete, continuous and mixed duration outcomes.

Assumption 4.4

The CDF specification $(\ref{dr})$ holds, where $\Lambda:\mathbb{R\mapsto}\left[0,1\right]$ is a continuously differentiable monotone function, the distribution of $X$ is not concentrated on an affine subspace of $\mathbb{R}^{k-1},$ and $\mathbb{E}\left\|X\right\|^{2}<\infty.$

Next theorem, like any other result in the paper, is proved in the Supplemental Appendix. The proof applies the arguments in Example 5.40 in van der Vaart (1998) to show that

	$\displaystyle\sup_{\theta\in\Theta}\left\|\left(\hat{Q}_{n}-Q\right)\left(\theta,t\right)\right\|=o_{p}(1),$
	$\displaystyle\sup_{\theta:\left\\|\theta-\theta_{\ast}(t)\right\\|\geq\varepsilon}Q\left(\theta,t\right)<Q\left(\theta_{0}(t),t\right)\text{ for all }\varepsilon>0\text{ and each }t\in\mathcal{T}\text{.}$

Theorem 4.1

Under Assumptions 3.1, 3.2, 3.3 and 4.4, any sequence of estimators $\left\{\hat{\theta}_{n}(t)\right\}_{n\geq 1}$ that satisfies

\hat{Q}_{n}\left(\hat{\theta}_{n}(t)\right)\geq\hat{Q}_{n}\left(\theta_{0}(t)\right)-o_{p}\left(1\right),

converges in probability to $\theta_{0}(t)$ , as $n\rightarrow\infty$ .

In order to provide the finite dimensional asymptotic distribution of $\hat{\theta}_{n}(t),$ we need the following assumption.

Assumption 4.5

$\theta_{0}\left(t\right)$ is an inner point of $\Theta$ , which is a compact subset of $\mathbb{R}^{1+k}.$ Also, let $\mathcal{V}=\{\boldsymbol{x}^{\prime}\theta:\boldsymbol{x}=(1,x^{\prime})^{\prime},x\in\mathcal{X},\theta\in\Theta\}$ ; for $v\in\mathcal{V}$ , $\Lambda\left(v\right)$ is bounded away from zero and one, and admits a continuously differentiable Lebesgue density $\lambda(v)$ .

The restriction on $\Lambda$ in Assumption 4.5 is a classical assumption, which can be found, for instance, in Amemiya (1985) Assumption 9.2.1. The assumption is satisfied by the normal and logistic distribution (probit and logit), but also for many other distributions. In the general case, Assumption 4.5 essentially rules out extremes $t$ , since, in these cases, $\Lambda$ may be near zero or one. This assumption can be relaxed at the cost of more involved proofs.

The score function is given by

\hat{\Psi}_{n}\left(\theta,t\right)=\frac{\partial}{\partial\theta}\hat{Q}_{n}\left(\theta,t\right)=\sum_{i=1}^{n}W_{ni}\cdot\psi_{\theta t}\left(Y_{i:n},X_{\left[i:n\right]}\right),

where

\psi_{\theta t}\left(y,x\right)=\frac{1_{\left\{y\leq t\right\}}-\Lambda(\mathbf{x}^{\prime}\theta)}{\Lambda(\mathbf{x}^{\prime}\theta)\left[1-\Lambda(\mathbf{x}^{\prime}\theta)\right]}\lambda(\mathbf{x}^{\prime}\theta)\mathbf{x}.

The conditional Fisher information that the binary variable $1_{\left\{T\leq t\right\}}$ contains about $\theta_{0}(t)$ given $X$ is $\mathcal{I}_{0}\left(t\right)=\mathcal{I}\left(\theta_{0}\left(t\right),t\right)$ , where

\mathcal{I}\left(\theta,t\right)=\int_{\mathbb{R}^{1+k}}\psi_{\theta t}\left(y,x\right)\psi_{\theta t}^{\prime}\left(y,x\right)F\left(dy,dx\right).

(4.1)

Notice that, under our assumptions, $\theta\mapsto\mathcal{I}\left(\theta,\cdot\right)$ is continuously differentiable.

We first pay attention to the asymptotic distribution of the score $\hat{\Psi}_{n}^{0}\left(t\right)=\hat{\Psi}_{n}\left(\theta_{0}(t),t\right).$ Applying Stute (1995, 1996) lemmata, $\hat{\Psi}_{n}\left(\theta,t\right)$ can be expressed as an $U-$ statistic of order three with Hájek projection $\hat{U}_{n}^{0}(t)=\hat{U}_{n}(\theta_{0}(t),t)$ , where

\hat{U}_{n}(\theta,t)=\frac{1}{n}\sum_{i=1}^{n}\zeta_{\psi_{\theta t}}(Y_{i},X_{i},\delta_{i}),

and for any function $\left(y,x\right)\mapsto\varphi\left(y,x\right)$ ,

\zeta_{\varphi}(y,x,d)=\varphi(y,x)\gamma^{(0)}(y)d+\gamma_{\varphi}^{\left(1\right)}(y)(1-d)-\gamma_{\varphi}^{(2)}(y),

(4.2)

$\displaystyle\gamma^{(0)}(y)$	$\displaystyle=$	$\displaystyle\exp\left\{\int_{-\infty}^{y-}\frac{F_{Y}^{0}(dw)}{1-F_{Y}(w)}\right\},$	(4.3)
$\displaystyle\gamma_{\varphi}^{\left(1\right)}(y)$	$\displaystyle=$	$\displaystyle\frac{1}{1-F_{Y}(y)}\int 1_{\left\{y<w\right\}}\varphi(w,x){\gamma}^{(0)}(w)F^{1}(dw,dx),$	(4.4)
$\displaystyle\gamma_{\varphi}^{\left(2\right)}(y)$	$\displaystyle=$	$\displaystyle\int\int\frac{1_{\left\{v<y,v<w\right\}}\varphi(w,x){\gamma}^{(0)}(w)}{\left[1-F_{Y}(v)\right]^{2}}F_{Y}^{0}(dv)F^{1}(dw,dx),$	(4.5)

where $F_{Y}^{0}(t)=\mathbb{P}\left(Y\leq t,\delta=0\right)\text{ and }F^{1}(t,x)=\mathbb{P}\left(Y\leq t,X\leq x,\delta=1\right)$ .

To derive the asymptotic normality of the score, we need the following extra moment conditions.

Assumption 4.6

For each $t\in\mathcal{T}$ ,

	$\displaystyle\left(a\right)\text{ }\mathbb{E}\left[\left\\|\psi_{\theta_{0}(t)t}(Y,X)\gamma^{\left(0\right)}(Y)\delta\right\\|^{2}\right]$	$\displaystyle<\infty$
	$\displaystyle\left(b\right)\text{ }\int\left\\|\psi_{\theta_{0}(t)t}(y,x)\right\\|\sqrt{S(y)}F(dy,dx)$	$\displaystyle<\infty,$

with

S\left(y\right)=\int_{0}^{y-}\frac{F_{C}\left(d\bar{y}\right)}{\left[1-F_{Y}\left(\bar{y}\right)\right]\left[1-F_{C}\left(\bar{y}\right)\right]}.

Assumption 4.6 $\left(a\right)$ guarantees that the variance of the leading term in $\zeta_{\psi_{\theta_{0}\left(t\right)t}}$ is bounded, and implies that $\mathbb{E}\left\|\zeta_{\psi_{\theta t}}(Y,X,\delta)\right\|^{2}$ is finite. The bias of the Kaplan-Meier integral is not necessarily $o\left(n^{-1/2}\right)$ for any integrand function, and may decrease to zero at a polynomial rate depending on the degree of censoring, which is characterized by the function $S$ . Assumption 4.6 $\left(b\right)$ on $\psi_{\theta_{0}(t)t}$ guarantees that the $\hat{\Psi}_{n}^{0}\left(t\right)$ bias is of order $o\left(n^{-1/2}\right)$ ; see Stute (1994), Chen and Ying (1996), and Chen and Lo (1997).

Let $\left\{Z\left(t\right)\right\}_{t=t_{1}}^{t_{m}}$ be a $k+1$ Gaussian random vector with zero mean and covariances

Cov\left(Z\left(t_{j}\right)Z\left(t_{m}\right)\right)=\mathbb{E}\left[\zeta_{\psi_{\theta_{0}\left(t_{j}\right)t_{j}}}(Y,X,\delta)\zeta_{\psi_{\theta_{0}\left(t_{m}\right)t_{m}}}^{\prime}(Y,X,\delta)\right]=\Omega_{0}\left(j,m\right),

(4.6)

with $j,m=1,...,k+1$ .

With Assumption 4.6, we show that $\hat{\Psi}_{n}^{0}\left(t\right)$ is asymptotically equivalent to a $U-statistic$ of order 3 with Hájek projection $\hat{U}_{n}^{0}\left(t\right)$ . Thus, $\sqrt{n}\hat{U}_{n}^{0}$ is asymptotically normal with finite dimensional covariance $\Omega_{0}\left(j,m\right)$ . Theorem 4.2 below follows by applying Theorem 5.21 in van der Vaart (1998).

Theorem 4.2

Under Assumptions 3.1, 3.2, 3.3, 4.4, 4.5, and 4.6, for each $t\in\mathcal{T}$ ,

\hat{\theta}_{n}\left(t\right)=\theta_{0}\left(t\right)+\mathcal{I}_{0}^{-1}\left(t\right)\cdot\hat{U}_{n}^{0}\left(t\right)+o_{p}\left(n^{-1/2}\right)

(4.7)

and

\left\{\sqrt{n}\left(\hat{\theta}_{n}-\theta_{0}\right)\left(t\right)\right\}_{t=t_{1}}^{t_{m}}\rightarrow_{d}\left\{\mathcal{I}_{0}^{-1}\left(t\right)Z(t)\right\}_{t=t_{1}}^{t_{m}}.

In order to conduct inferences on $\theta_{0}(\cdot)$ , $\Omega_{0}\left(j,m\right)$ is estimated by

\hat{\Omega}_{n}\left(j,m\right)=\frac{1}{n}\sum_{i=1}^{n}\hat{\zeta}_{i}(t_{j})\hat{\zeta}_{i}^{\prime}(t_{m}),

where

$\displaystyle\hat{\zeta}_{i}(t)$	$\displaystyle=$	$\displaystyle\hat{\psi}_{i}(t)\hat{\gamma}_{i}^{(0)}\delta_{i}+\hat{\gamma}_{i}^{\left(1\right)}(t)(1-\delta)-\hat{\gamma}_{i}^{(2)}(t),$	(4.8)
$\displaystyle\hat{\gamma}_{i}^{(0)}$	$\displaystyle=$	$\displaystyle\exp\left\{\frac{1}{n}\sum_{j=1}^{n}1_{\left\{Y_{j}<Y_{i}\right\}}\frac{1-\delta_{j}}{1-\hat{F}_{nY}(Y_{j})}\right\},$
$\displaystyle\hat{\gamma}_{i}^{\left(1\right)}(t)$	$\displaystyle=$	$\displaystyle\frac{1}{1-\hat{F}_{Y}(Y_{i})}\frac{1}{n}\sum_{j=1}^{n}1_{\left\{Y_{i}<Y_{j}\right\}}\hat{\psi}_{j}(t)\hat{\gamma}_{j}^{(0)}\delta_{j},$
$\displaystyle\hat{\gamma}_{i}^{\left(2\right)}(t)$	$\displaystyle=$	$\displaystyle\frac{1}{n^{2}}\sum_{j=1}^{n}\sum_{\ell=1}^{n}\frac{1_{\left\{Y_{j}<Y_{i},Y_{j}<Y_{\ell}\right\}}\hat{\psi}_{\ell}(t)\hat{\gamma}_{\ell}^{(0)}}{\left[1-\hat{F}_{Y}(Y_{j})\right]^{2}}\left(1-\delta_{j}\right)\delta_{\ell},$
$\displaystyle\hat{\psi}_{i}(t)$	$\displaystyle=$	$\displaystyle\psi_{\hat{\theta}_{n}(t)t}\left(Y_{i},X_{i}\right),\text{ }\hat{F}_{Y}(t)=\frac{1}{n}\sum_{i=1}^{n}1_{\left\{Y_{i}\leq t\right\},}$

and $\mathcal{I}_{0}\left(t\right)$ is estimated by

\widehat{\mathcal{I}}_{n}\left(t\right)=\sum_{i=1}^{n}W_{ni}\cdot\hat{\psi}_{i:n}(t)\hat{\psi}_{i:n}^{\prime}(t),

(4.9)

where $\hat{\psi}_{i:n}(t)=\psi_{\hat{\theta}_{n}(t)t}\left(Y_{i:n},X_{[i:n]}\right)$ ²²2Alternatively, one can estimate $-\mathcal{I}_{0}\left(t\right)$ using the Hessian of $\hat{Q}_{n}(\hat{\theta}_{n},t)$ . We follow this path in our simulations..

Corollary 4.1

Under the assumptions of Theorem 4.2, $\hat{\Omega}_{n}\left(j,m\right)\rightarrow_{p}\Omega_{0}\left(j,m\right)$ and $\widehat{\mathcal{I}}_{n}\left(t_{j}\right)\rightarrow_{p}\mathcal{I}_{0}^{-1}\left(t_{j}\right)$ for all $t_{j},t_{m}\in\mathcal{T}$ .

We can also justify inferences with the assistance of a multiplier bootstrap technique in the spirit of Theorem 2.3 in Stute et al. (2000) and Theorem 4 in Sant’Anna (2020), using an external resample of $\left\{\hat{\zeta}_{i}(t)\right\}_{i=1}^{n}.$ Once we generate $iid$ random numbers $\left\{V_{i}\right\}_{i=1}^{n}$ independently of the data with mean $0$ , variance $1$ , and finite third moment, the “resampled” $\left\{\hat{\zeta}_{i}^{\ast}(t)\right\}_{i=1}^{n}$ , with $\hat{\zeta}_{i}^{\ast}(t)=\hat{\zeta}_{i}(t)V_{i}$ , forms a basis to compute the bootstrap analog of $\hat{\theta}_{n}(t)$ based on the asymptotic linearization (4.7),

\hat{\theta}_{n}^{\ast}\left(t\right)=\hat{\theta}_{n}(t)+\widehat{\mathcal{I}}_{n}^{-1}\left(t\right)\hat{U}_{n}^{\ast}\left(t\right),

(4.10)

where

\hat{U}_{n}^{\ast}\left(t\right)=\frac{1}{n}\sum_{i=1}^{n}\hat{\zeta}_{i}^{\ast}(t)

is the bootstrap version of $\hat{U}_{n}^{0}\left(t\right)$ .

Theorem 4.3

Let the assumptions of Theorem 4.2 hold. Then, for $t_{1},...,t_{m}\in\mathcal{T}$ , we have that $\left\{\sqrt{n}\left(\hat{\theta}_{n}^{\ast}-\hat{\theta}_{n}\right)\left(t\right)\right\}_{t=t_{1}}^{t_{m}}$ converges in distribution under the bootstrap law to $\left\{\mathcal{I}_{0}^{-1}\left(t\right)Z(t)\right\}_{t=t_{1}}^{t_{m}}$ , with probability 1.

The theorem follows by checking the conditional version of the Linderberg-Levy conditions to show that, with probability 1, $\hat{U}_{n}^{\ast}\left(t\right)$ converges in distribution to $Z$ under the bootstrap law, i.e., conditional on the data, with probability 1.

Next, we strengthen the pointwise results in Theorems 4.2 and 4.3 to hold uniformly in $t$ . Towards this end, let the space $\ell^{\infty}\left(\mathcal{G}\right)^{q}$ be the set of all $q\times 1$ vectors of bounded functions on $\mathcal{G}$ , that is, all the functions’ vectors $g:u\mapsto\mathbb{R}$ such that $\sup_{u\in\mathcal{G}}\left\|g\left(u\right)\right\|<\infty.$ $\mathcal{G}$ can be either an Euclidean or a functional space. We interpret the multivariate empirical process indexed by functions in $\mathcal{G}$ as a random element in the metric space $\ell^{\infty}\left(\mathcal{G}\right)^{q}$ endowed with the sup-norm. The empirical process $\hat{\Psi}_{n}$ is indexed by $\left(\theta,t\right)\in\Theta\times\mathcal{T}$ ; in this case $\mathcal{G}=\Theta\times\mathcal{T}_{0},$ where $\mathcal{T}_{0}$ is the compact interval of $\mathcal{T}$ of interest. But $\hat{\Psi}_{n}$ can also be interpreted as an empirical process indexed by the class $\mathcal{F}$ of functions $\psi_{\theta t}:\mathcal{X\times}\mathcal{T\times}\left[0,1\right]\mapsto\mathbb{R}^{1+k};$ in this case $\mathcal{G}=\mathcal{F}$ . The random process $\hat{\Psi}_{n}$ is viewed as a random element of the metric space $\ell^{\infty}\left(\Theta\times\mathcal{T}_{0}\right)^{k+1}$ or $\ell^{\infty}\left(\mathcal{F}\right)^{k+1},$ where $\mathcal{T}_{0}\in\mathcal{T}$ is the compact subset of $\mathcal{T}$ we are interested in.

In order to establish the weak convergence of $\sqrt{n}\left(\hat{\theta}_{n}-\theta_{0}\right)$ as a random element of the metric space $\ell^{\infty}\left(\mathcal{T}_{0}\right)^{k+1}$ , we impose the following additional assumption, which is also assumed in Chernozhukov et al. (2013).

Assumption 4.7

The duration interval of interest $\mathcal{T}_{0}$ is a compact subset of $\mathbb{R}^{+},$ and the conditional distribution function $F_{\left.Y\right|X}\left(\left.y\right|X\right)$ admits a Lebesgue density $f_{\left.Y\right|X}\left(\left.y\right|X\right)$ that is uniformly bounded and uniformly continuous in $\mathcal{T}_{0}$ with probability 1.

Notice that $f_{\left.T\right|X}\left(\left.t\right|X\right)=\dot{\theta}_{0}^{\prime}\left(t\right)\cdot\boldsymbol{X\cdot}\lambda\left(\theta_{0}^{\prime}\left(t\right)\boldsymbol{X}\right)$ a.s., where $\dot{\theta}_{0}\left(t\right)=d\theta_{0}\left(t\right)/dt.$ Therefore, assuming that $f_{\left.T\right|X}$ is a proper probability density may not be innocuous. In practice, $\mathcal{T}_{0}$ should be bounded by the minimum and maximum value of the observed uncensored duration.

Let $\mathbb{Z}$ be a centered tight Gaussian element of $\ell^{\infty}\left(\mathcal{T}_{0}\right)^{k+1}$ with matrix of variance and covariance functions

\mathbb{E}\left[\mathbb{Z}\left(t_{1}\right)\mathbb{Z}^{\prime}\left(t_{2}\right)\right]=\Omega_{0}\left(t_{1},t_{2}\right),\text{ }t_{1},t_{2}\in\mathcal{T}_{0},

with $\Omega_{0}\left(t_{1},t_{2}\right)$ defined in (4.6).

Theorem 4.4

Given a compact subset $\mathcal{T}_{0}$ of $\mathcal{T},$ under Assumptions 3.1, 3.2, 3.3, 4.4, 4.5, 4.6, and 4.7, $\left\{\sqrt{n}\left(\hat{\theta}_{n}-\theta_{0}\right)\left(t\right)\right\}_{t\in\mathcal{T}_{0}}$ converges weakly to $\left\{\mathcal{I}_{0}^{-1}\left(t\right)\mathbb{Z}\left(t\right)\right\}_{t\in\mathcal{T}_{0}}$ in $\ell^{\infty}\left(\mathcal{T}_{0}\right)^{k+1}.$

This functional CLT is proved extending Chernozhukov et al. (2013) (CFM henceforth) strategy to our setup. To this end, we first show, using Stute (1995, 1996) lemmata, that the score function $\hat{\Psi}_{n}$ and $U_{n}$ are asymptotically equivalent in the metric space $\ell^{\infty}(\Theta,\mathcal{T}_{0})^{k+1}$ , i.e.,

\sup_{\theta\in\Theta,t\in\mathcal{T}_{0}}\left\|\hat{\Psi}_{n}\left(\theta,t\right)-U_{n}\left(\theta,t\right)\right\|=o_{p}\left(n^{-1/2}\right),

where $U_{n}$ is a $U-process$ with Hájek projection $\hat{U}_{n}$ . Then, we show that

\sup_{\theta\in\Theta,t\in\mathcal{T}_{0}}\left\|U_{n}\left(\theta,t\right)-\hat{U}_{n}\left(\theta,t\right)\right\|=o_{p}\left(n^{-1/2}\right)

applying asymptotic results for U-process in Arcones and Giné (1993, 1995). Since the class $\mathcal{E}$ of functions $\left(y,x,d\right)\mapsto\zeta_{\psi_{\theta t}}\left(y,x,d\right)$ is Donsker,

\left\{\sqrt{n}\left(\hat{U}_{n}\left(\theta,t\right)-\mathbb{E}\left[\zeta_{\psi_{\theta t}}\left(Y,X,\delta\right)\right]\right):\theta\in\Theta,t\in\mathcal{T}_{0}\right\}

converges weakly in $\ell^{\infty}\left(\Theta\times\mathcal{T}_{0}\right)^{k+1}$ . Then, noticing that Assumptions 4.4 - 4.7 imply Assumption DR in CFM, the functional $\theta\mapsto\Psi_{0}\left(\theta,t\right)$ , with $\Psi_{0}\left(\theta,t\right)=\mathbb{E}\left[\zeta_{\psi_{\theta t}}\left(Y,X,\delta\right)\right]$ , is continuously differentiable for each $t\in\mathcal{T}_{0},$ with $\left.\left.d\Psi_{0}\left(\theta,t\right)\right/d\theta^{\prime}\right\rfloor_{\theta=\theta_{0}\left(t\right)}=-\mathcal{I}_{0}\left(t\right),$ which is invertible. Then, according to CFM’s Lemma E.3,

\sqrt{n}\left(\hat{\theta}_{n}-\theta_{0}\right)\left(t\right)=\mathcal{I}_{0}^{-1}\left(t\right)\cdot\hat{\Psi}_{n}\left(\theta_{0}\left(t\right),t\right)+\hat{r}_{n}\left(t\right),

with $\sup_{t\in\mathcal{T}_{0}}\left\|\hat{r}_{n}\left(t\right)\right\|=o_{p}\left(1\right)$ . This result and $\mathcal{E}$ ’s Donskerness prove the theorem.

A bootstrap version of this theorem follows by applying the multiplier CLT (see sections 2.6 and 3.6 in van der Vaart and Wellner, 1996).

Theorem 4.5

Suppose the assumptions of Theorem (4.4) hold. Then, with probability 1 $\left\{\sqrt{n}\left(\hat{\theta}_{n}^{\ast}-\hat{\theta}_{n}\right)\left(t\right)\right\}_{t\in\mathcal{T}_{0}}$ converges weakly under the bootstrap law to $\left\{\mathcal{I}^{-1}\left(t\right)\mathbb{Z}(t)\right\}_{t\in\mathcal{T}_{0}}$ in $\ell^{\infty}\left(\mathcal{T}_{0}\right)^{k+1}.$

In next section we discuss applications of these results to test restrictions on $\theta_{0}$ , as well as to make inferences on counterfactual distributions and on average distribution marginal effects.

5 Applications

5.1 Testing Linear and Nonlinear Hypothesis

A natural class of hypotheses to be tested is those of the form

H_{0}:\sup_{t\in\mathcal{T}_{0}}\left\|\Phi\left(\theta_{0}\left(t\right)\right)\right\|=0\text{ against }H_{1}:\sup_{t\in\mathcal{T}_{0}}\left\|\Phi\left(\theta_{0}\left(t\right)\right)\right\|>0,

for some compact subset $\mathcal{T}_{0}$ of $\mathcal{T},$ where $\theta\mapsto\Phi\left(\theta\right)\in\mathbb{R}^{q}$ , $q\leq k+1,$ is a continuously differentiable map with derivatives $\dot{\Phi}\left(\theta\right)=\partial\Phi\left(\theta\right)/\partial\theta^{\prime},$ such that $rank\left(\dot{\Phi}\left(\theta\right)\right)=q$ for all $\theta$ in a neighborhood of $\theta_{0}\left(t\right)$ . For instance, when $\Phi\left(\theta\right)=\theta$ , the null and alternative reads

H_{0}:\sup_{t\in\mathcal{T}_{0}}\left\|\theta_{0}\left(t\right)\right\|=0\text{ against }H_{1}:\sup_{t\in\mathcal{T}_{0}}\left\|\theta_{0}\left(t\right)\right\|>0,

which is a significance test for varying coefficients. We can also test that a linear combination of coefficients is satisfied, e.g., $\Phi(\theta)=R\theta-r$ , where $R$ and $r$ are known and $rank(R)=k+1$ .

A natural statistic is,

\hat{\omega}_{n}=n\cdot\sup_{t\in\mathcal{T}_{0}}\left\|\Phi\left(\hat{\theta}_{n}\left(t\right)\right)\right\|^{2}.

By the delta-method, and applying Theorem 4.4, uniformly in $t\in\mathcal{T}_{0},$

\sqrt{n}\left[\Phi\left(\hat{\theta}_{n}\left(t\right)\right)-\Phi\left(\theta_{0}\left(t\right)\right)\right]=\dot{\Phi}\left(\theta_{0}\left(t\right)\right)\sqrt{n}\left(\hat{\theta}_{n}-\theta_{0}\right)\left(t\right)+o_{p}(1)

and under $H_{0},$

\hat{\omega}_{n}\rightarrow_{d}\omega_{\infty}=\sup_{t\in\mathcal{T}_{0}}\left\|\dot{\Phi}\left(\theta_{0}\left(t\right)\right)\mathcal{I}_{0}^{-1}\left(t\right)\mathbb{Z}\left(t\right)\right\|^{2}.

Since the $\omega_{\infty}^{\prime}s$ distribution depends on $\theta_{0}\left(\cdot\right)$ and other unknown features of the underlying data generating process, analytical critical values seems unfeasible, at least with this level of generality. However, critical values can be estimated using the bootstrapped estimator of $\theta_{0}\left(t\right)$

Applying Theorem 4.5, uniformly in $t\in\mathcal{T}_{0},$ for almost every sample, it follows that

\sqrt{n}\left[\Phi\left(\hat{\theta}_{n}^{\ast}\left(t\right)\right)\mathcal{-}\Phi\left(\hat{\theta}_{n}\left(t\right)\right)\right]=\dot{\Phi}\left(\hat{\theta}_{n}\left(t\right)\right)\sqrt{n}\left(\hat{\theta}_{n}^{\ast}-\hat{\theta}_{n}\right)\left(t\right)+o_{p}(1).

Therefore, we can use the bootstrap test statistic

\hat{\omega}_{n}^{\ast}=n\cdot\sup_{t\in\mathcal{T}_{0}}\left\|\Phi\left(\hat{\theta}_{n}^{\ast}\left(t\right)\right)\mathcal{-}\Phi\left(\hat{\theta}_{n}\left(t\right)\right)\right\|^{2}.

(5.1)

By Theorem 4.5, under $H_{0}$ , with probability 1, $\hat{\omega}_{n}^{\ast}$ converges in distribution under the bootstrap law to $\omega_{\infty}$ . Under $H_{1},$ $\hat{\omega}_{n}^{\ast}=O\left(1\right)$ with probability 1.

We now describe a practical bootstrap algorithm to conduct such types of tests.

Algorithm 5.1 (Bootstrapped-based hypothesis testing)

Step 1.

Generate $iid$ $\{V_{i}\}_{i=1}^{n}$ from a distribution with mean 0, variance 1 and finite third moment, e.g., Rademacher distribution.
Step 2.

For a grid $t=t_{1},\dots,t_{p}$ , compute $\left\{\hat{\theta}_{n}^{\ast}\left(t\right)\right\}_{t=t_{1}}^{t_{p}}$ as in (4.10), using the same $\{V_{i}\}_{i=1}^{n}$ for all $t$ ’s.
Step 3.

Compute the bootstrap test statistics $\hat{\omega}_{n}^{\ast}$ in (5.1)
Step 4.

Repeat steps 1-3 $B$ times.
Step 5.

Reject $H_{0}$ at the $\alpha-level$ of significance if either $\hat{\omega}_{n}>\hat{\omega}_{n,\alpha}^{\ast}$ , where $\hat{\omega}_{n,\alpha}^{\ast}$ is the empirical $(1-\alpha)$ -quantile of the $B$ bootstrap draws of $\hat{\omega}_{n}^{\ast}$ , or if p-val ${}^{*}>\alpha$ , where

$p\text{-}val^{*}=\frac{1}{B}\sum_{j=1}^{B}1_{\left\{\hat{\omega}_{n}^{\ast(j)}>\hat{\omega}_{n}\right\}.}$

An interesting application of this type of test is to testing constancy of some functional of $\theta_{0}$ . For instance, one can set

\Phi\left(\theta\right)=\left[\boldsymbol{0}\vdots I_{K}\right]\theta

to test that all the slope coefficients are constant. A test statistic for this hypothesis is

\tilde{\omega}_{n}=n\cdot\sup_{t\in\mathcal{T}_{0}}\left\|\Phi\left(\hat{\theta}_{n}\left(t\right)\right)-\bar{\hat{\Phi}}_{n}\right\|^{2},

with $\bar{\hat{\Phi}}_{n}=\int_{s\in\mathcal{T}_{0}}{\Phi\left(\hat{\theta}_{n}\left(s\right)\right)}\Psi(ds)/\int_{s\in\mathcal{T}_{0}}\Psi(ds)$ , $\Psi$ being a known probability measure.³³3In practice, one can set $\Psi$ to be equal to the empirical distribution of the censored outcome $Y$ , though, in this case, the bootstrap procedure needs to be adjusted to account for this additional source of randomness. See Section 5.3 for related results. Bootstrap critical values can be obtained using the bootstrapped statistic,

\tilde{\omega}_{n}^{\ast}=n\cdot\sup_{t\in\mathcal{T}_{0}}\left\|\left[\Phi\left(\hat{\theta}_{n}^{\ast}\left(t\right)\right)-\bar{\hat{\Phi}}_{n}^{*}\right]\mathcal{-}\left[\Phi\left(\hat{\theta}_{n}\left(t\right)\right)-\bar{\hat{\Phi}}_{n}\right]\right\|^{2},

with $\bar{\hat{\Phi}}_{n}^{*}=\int_{s\in\mathcal{T}_{0}}{\Phi\left(\hat{\theta}_{n}^{*}\left(s\right)\right)}\Psi(ds)/\int_{s\in\mathcal{T}_{0}}\Psi(ds)$ .

5.2 Inferences on $F_{\left.T\right|X}$

Theorems 4.4 and 4.5 can also be applied to make inferences on $F_{\left.T\right|X}\left(\left.t\right|x\right)$ . The DR $F_{\left.T\right|X}\left(\left.t\right|x\right)^{\prime}s$ estimator is

\hat{F}_{n\left.T\right|X}\left(\left.t\right|x\right)=\Lambda\left(\boldsymbol{x}^{\prime}\hat{\theta}_{n}\left(t\right)\right),

with bootstrap analog,

\hat{F}_{n\left.T\right|X}^{\ast}\left(\left.t\right|x\right)=\Lambda\left(\boldsymbol{x}^{\prime}\hat{\theta}_{n}^{\ast}\left(t\right)\right).

The asymptotic distribution can be obtained using the delta-method. Applying Theorem 4.4, we have that

\left\{\sqrt{n}\left(\hat{F}_{n\left.T\right|X}-F_{\left.T\right|X}\right)\left(\left.t\right|x\right)\right\}_{\left(t,x\right)\in\mathcal{A}_{0}}\rightarrow_{d}\left\{\lambda\left(\theta_{0}^{\prime}\left(t\right)x\right)\cdot x\cdot\mathcal{I}_{0}^{-1}\left(t\right)\mathbb{Z}\left(t\right)\right\}_{\left(t,x\right)\in\mathcal{A}_{0}},

where $\mathcal{A}_{0}=\mathcal{T}_{0}\times\mathcal{X}_{0}$ is a compact subset of $\mathcal{T}_{0}\times\mathcal{X}_{0}$ . Likewise, applying Theorem 4.5, with probability 1,

\left\{\sqrt{n}\left(\hat{F}_{n\left.T\right|X}^{\ast}-\hat{F}_{n\left.T\right|X}\right)\left(\left.t\right|x\right)\right\}_{\left(t,x\right)\in\mathcal{A}_{0}}\rightarrow_{d}\left\{\lambda\left(\theta_{0}^{\prime}\left(t\right)x\right)\cdot x\cdot\mathcal{I}_{0}^{-1}\left(t\right)\mathbb{Z}\left(t\right)\right\}_{\left(t,x\right)\in\mathcal{A}_{0}}\text{ in }\ell^{\infty}\left(\mathcal{A}_{0}\right).

Given two samples $\{Y_{i}^{(j)},\delta_{i}^{(j)},X_{i}^{(j)}\}_{i=1}^{n_{j}},~j=1,2$ , under some standard overlap conditions, we can use the conditional CDF estimator computed using sample 1, $\widehat{F}^{(1)}_{T|X,n}$ , to compute the marginal counterfactual CDF of population 1 with respect to population 2,

\widehat{F}^{(1,2)}_{T,n}(t)=\int_{\mathcal{X}}\widehat{F}^{(1)}_{T|X,n}(t|x)\tilde{F}^{(2)}_{X,n}(dx)=\dfrac{1}{n_{2}}\sum_{i=1}^{n_{2}}\left(\widehat{F}^{(1)}_{T|X,n}(t|X_{i}^{(2)}\right).

5.3 Average Distribution Marginal Effects

Although $\hat{\theta}_{n}\left(t\right)$ provides useful information about the direction/sign of the effect of changes in $X$ on $F_{T|X}\left(t|X\right),$ in general, $\hat{\theta}_{n}(t)$ may not have a clear economic interpretation. In this section, we argue that this potential limitation can be easily avoided by focusing on the average distribution marginal effects (ADME) of $X$ ,⁴⁴4 To avoid cumbersome notation, we consider the case where the covariates $X$ are continuous, and enter the distribution regression model linearly. The asymptotic validity of all our results do not rely on this simplification.

\eta_{0}\left(t\right)\equiv\mathbb{E}\left[\beta_{0}\left(t\right)\lambda\left(\boldsymbol{X}^{\prime}\theta_{0}\left(t\right)\right)\right].

(5.2)

The ADME is the distributional analogue of the popular average partial effects. From (5.2), it is clear that he natural estimator for $\eta_{0}\left(t\right)$ is

\hat{\eta}_{n}\left(t\right)=\hat{\beta}_{n}\left(t\right)\frac{1}{n}\sum_{i=1}^{n}\lambda\left(\boldsymbol{X}_{i}^{\prime}\hat{\theta}_{n}\left(t\right)\right).

In what follows, we derive the asymptotic properties of $\hat{\eta}_{n}\left(t\right)$ . As in Theorem 4.4, we first show that, uniformly in $t\in\mathcal{T}_{0}$ ,

\left(\hat{\eta}_{n}\left(t\right)-{\eta}_{0}\left(t\right)\right)=\frac{1}{n}\sum_{i=1}^{n}\zeta_{i,ADME}(t)+o_{p}\left(n^{-1/2}\right),

(5.3)

where

$\displaystyle\zeta_{i,ADME}(t)$	$\displaystyle=$	$\displaystyle\beta_{0}\left(t\right)\cdot\left(\lambda(\mathbf{X}_{i}^{\prime}\theta_{0}(t))-\mathbb{E}\left[\lambda(\mathbf{X}^{\prime}\theta_{0}(t))\right]\right)$
		$\displaystyle+\mathbb{E}\left[\lambda(\mathbf{X}^{\prime}\theta_{0}(t))\right]\cdot H\cdot\mathcal{I}_{0}\left(t\right)^{-1}\cdot\zeta_{i}(t)$
		$\displaystyle+\beta_{0}\left(t\right)\cdot\mathbb{E}\left[\dot{\lambda}\left(\mathbf{X}^{\prime}\theta_{0}(t)\right)\mathbf{X}^{\prime}\right]\cdot\mathcal{I}_{0}\left(t\right)^{-1}\cdot\zeta_{i}(t)$

with $\zeta_{i}(t)$ is as defined in (4.2), with $\varphi=\psi_{\theta t}$ , $\dot{\lambda}\left(u\right)=d\lambda(u)/du$ , $H\equiv\left[0_{k},I_{k}\right]$ , $0_{k}$ the $k\times 1$ vector of zero, and $I_{k}$ the $k$ -dimensional identity matrix.

Let $\mathbb{Z}_{ADME}$ be a centered tight Gaussian element of $\ell^{\infty}\left(\mathcal{T}_{0}\right)^{k}$ with matrix of variance and covariance functions

\mathbb{E}\left[\mathbb{Z}_{ADME}\left(t_{1}\right)\mathbb{Z}_{ADME}^{\prime}\left(t_{2}\right)\right]=\mathbb{E}\left[\zeta_{ADME}\left(t_{1}\right)\zeta_{ADME}^{\prime}\left(t_{2}\right)\right].

Consider the bootstrapped estimator for the ADME

\hat{\eta}_{n}^{\ast}\left(t\right)=\hat{\eta}_{n}(t)+\frac{1}{n}\sum_{i=1}^{n}V_{i}\cdot\hat{\zeta}_{i,ADME}(t)

(5.5)

where $iid$ random numbers $\left\{V_{i}\right\}_{i=1}^{n}$ are $iid$ random variables with mean $0$ and variance $1$ , generated independently of the sample, and

$\displaystyle\hat{\zeta}_{i,ADME}(t)$	$\displaystyle=$	$\displaystyle\hat{\beta}_{n}\left(t\right)\cdot\left(\lambda(\boldsymbol{X}_{i}^{\prime}\hat{\theta}_{n}\left(t\right))-\frac{1}{n}\sum_{j=1}^{n}\lambda(\boldsymbol{X}_{j}^{\prime}\hat{\theta}_{n}\left(t\right))\right)$
		$\displaystyle+\left(\frac{1}{n}\sum_{j=1}^{n}\lambda(\boldsymbol{X}_{j}^{\prime}\hat{\theta}_{n}\left(t\right))\right)\cdot H\cdot\widehat{\mathcal{I}}_{n}\left(t\right)^{-1}\cdot\hat{\zeta}_{i}(t)$
		$\displaystyle+\hat{\beta}_{n}\left(t\right)\cdot\left(\frac{1}{n}\sum_{j=1}^{n}\dot{\lambda}\left(\boldsymbol{X}_{j}^{\prime}\hat{\theta}_{n}\left(t\right)\right)\mathbf{X}_{j}^{\prime}\right)\cdot\widehat{\mathcal{I}}_{n}\left(t\right)^{-1}\cdot\hat{\zeta}_{i}(t)$

where $\hat{\zeta}_{i}(t)$ is as in (4.8) and $\widehat{\mathcal{I}}_{n}\left(t\right)$ is as in (4.9).

Theorem 5.6

Suppose that the conditions of Theorem 4.4 hold. Then,

\left\{\sqrt{n}\left(\hat{\eta}_{n}\left(t\right)-{\eta}_{0}\left(t\right)\right)\right\}_{t\in\mathcal{T}_{0}}\rightarrow_{d}\left\{\mathbb{Z}_{ADME}(t)\right\}_{t\in\mathcal{T}_{0}}\text{ in }\ell^{\infty}\left(\mathcal{T}_{0}\right)^{k}.

In addition, if follows that $\left\{\sqrt{n}\left(\hat{\eta}_{n}^{\ast}-\hat{\eta}_{n}\right)\left(t\right)\right\}_{t\in\mathcal{T}_{0}}$ converges weakly under the bootstrap law to $\left\{\mathbb{Z}_{ADME}(t)\right\}_{t\in\mathcal{T}_{0}}$ in $\ell^{\infty}\left(\mathcal{T}_{0}\right)^{k}$ , with probability one.

We now describe a practical bootstrap algorithm to compute simultaneous confidence intervals for the ADME associated with a given covariate $X_{1}$

Algorithm 5.2 (Bootstrapped Simultaneous Confidence Intervals)

Step 1.

Generate $iid$ $\{V_{i}\}_{i=1}^{n}$ from a distribution with mean 0, variance 1 and finite third moment, e.g., Rademacher distribution.
Step 2.

For a grid $t=t_{1},\dots,t_{p}$ , compute $\left\{\hat{\eta}_{n}^{\ast}\left(t\right)\right\}_{t=t_{1}}^{t_{p}}$ as in (5.5), using the same $\{V_{i}\}_{i=1}^{n}$ for all $t$ ’s.
Step 3.

Let $\widehat{\kappa}$ and $\widehat{\kappa}^{\ast}$ be the vectorized $\left\{\hat{\eta}_{n}\left(t\right)\right\}_{t=t_{1}}^{t_{p}}$ and $\left\{\hat{\eta}_{n}^{\ast}\left(t\right)\right\}_{t=t_{1}}^{t_{p}}$ , respectively, denote their $j$ - $th$ -element by $\widehat{\kappa}\left(j\right)$ and $\widehat{\kappa}^{\ast}\left(j\right)$ , and compute $\hat{R}^{\ast}\left(j\right)=\left(\widehat{\kappa}^{\ast}-\widehat{\kappa}\right)\left(j\right).$
Step 4.

Repeat steps 1-3 $B$ times.
Step 5.

For each bootstrap draw, compute $\bar{R}^{{}^{\ast}}=\max_{j}\left(\left|\widehat{R}^{\ast}\left(j\right)\right|\right).$
Step 6.

Construct $\widehat{c}_{1-\alpha}$ as the empirical $\left(1-a\right)$ -quantile of the $B$ bootstrap draws of $\bar{R}^{{}^{\ast}}$ .
Step 7.

Construct the bootstrapped simultaneous confidence band for $\hat{\eta}_{n}\left(t\right)$ , as $\widehat{CB}\left(t\right)=\left[\hat{\eta}_{n}\left(t\right)\pm\widehat{c}_{1-\alpha}\right]$ .

6 Simulation Studies

In this section, we briefly summarize the simulation results discussed in Section LABEL:MC-appendix of the Supplementary Appendix. In short, we compare the finite sample performance of the proposed KMDR estimators with those based on the proportional hazard (PH) model and on the proportional odds (PO) model. We consider three different data generating processes (DGPs): one DGP that satisfies the PH but not the PO assumption, one DGP that satisfies the PO but not the PH assumption, and one DGP where both PH and PO assumptions are violated, though it admits a DR specification with time-varying slope coefficient. We consider different levels of random censoring, and compare the KMDR, PH and PO models based on the conditional CDF, $F_{\left.T\right|X}$ , and the ADME as defined in (5.2). Both functionals have a clear economic interpretation.

Overall, the simulation results highlight that our proposed KMDR estimators performs nearly as well as the PH and PO model when these models are correctly specified, since KMDR nests both PH and PO specifications. On the other hand, when PH and PO models are misspecified, the proposed KMDR estimators performs better than these other popular specifications. Such gains are especially noticeable when one is interested in the ADME $\left(t\right)$ ; see Table LABEL:tab:dgp3. When comparing DR specifications with different link functions, we notice that estimators that use the cloglog link function tend to be more robust against model misspecifications than those based on the logit link function. This is perhaps because the cloglog link function is asymmetric and adapts better to the non-central parts of the distribution where there are many zeros (left tail) or many ones (right tail). Thus, we recommend favoring the cloglog specification in detriment of the logit one, though a more formal discussion about such choices are beyond the scope of this paper.

7 The effect of unemployment benefits on unemployment duration

One of the main concerns of the design of unemployment insurance policies is their adverse effect on unemployment duration. The prevailing view of the economics literature is that increasing unemployment insurance (UI) benefits leads to higher unemployment duration driven by a moral hazard effect: higher UI increases the agent’s reservation wage and reduces the incentive to job search, see, e.g., Krueger and Meyer (2002) and the references therein. Given that moral hazard leads to a reduction of social welfare, this argument has been used against increases in UI benefits.

In a seminal paper, Chetty (2008) challenges the traditional view that the link between unemployment benefits and duration is only because of moral hazard. He shows, among many other things, that distortions cause by UI on search behavior are mostly due to a “liquidity effect”. In simple terms, UI benefits provide cash-in-hand that allows liquidity constrained agents to equalize the marginal utility of consumption when employed and unemployed. Such a liquidity effect reduces the pressure to find a new job, leading to longer unemployment spells. However, in contrast to the moral hazard effect, the liquidity effect is a socially beneficial response to the correction of market failures. Thus, if one finds support in favor of liquidity effects, increases in UI benefits may lead to improvements in total welfare.

In this section, we provide additional evidence of the existence of this liquidity effect by comparing the effect of UI on household that are liquidity constrained with those that are unconstrained. In contrast to Chetty (2008), we do not rely on the Cox proportional hazard model but rather use our proposed KMDR tools. In this context, the Cox hazard model might be too rigid as does not allow for the effect of UI benefits to vary depending on whether a worker is starting their unemployment claim or they have been unemployed for a while. As we argued before, our proposed tools allow for this richer types of heterogeneity.

As in Chetty (2008), our data comes from the Survey of Income and Program Participation (SIPP) for the period spanning 1985-2000. Each SIPP panel surveys households at four-month intervals for two-four years, collecting information on household and individual characteristics, as well as employment status. The sample consists of prime-age males who have experienced job separation and report to be job seekers, are not on temporary layoff, have at least three months of work history in the survey and took up unemployment insurance benefits within one month after job loss. These restrictions leave 4,529 unemployment spells in the sample, 21.3% of those being right-censored. Unemployment durations are measured in weeks while individuals’ UI benefits are measured using the two-step imputation method described in Chetty (2008). For further details about the data, see Chetty (2008).

To analyze the effect of UI benefits on unemployment duration, we estimate the DR model

F_{\left.T\right|UI,Z}\left(\left.t\right|UI,Z\right)=\Psi\left(\alpha(t)+\beta(t)\ln UI+Z^{\prime}\gamma(t)\right),

(7.1)

where $\Psi\left(u\right)=1-\exp\left(-\exp\left(u\right)\right)$ is the complementary log-log link function, $UI$ is the worker’s weekly unemployment insurance benefits, and $Z$ is a vector of controls including the worker’s age, years of education, marital status dummy, logged pre-unemployment annual wage, and total wealth. To control for local labor market conditions and systematic differences in risk and performance across sector types, $Z$ also includes the state average unemployment rate and dummies for industry. We note that Chetty (2008) considers additional controls, including state, year and occupation fixed effects, resulting in a specification with almost 90 unknown parameters. For this reason, we adopt a more parsimonious specification described above. Let $\theta\left(t\right)=\left(\alpha(t),\beta(t),\gamma(t)^{{}^{\prime}}\right)^{\prime}$ , and $\mathbf{X=}\left(1,\ln UI,Z^{\prime}\right)^{\prime}$ .

Our main goal is to understand the effect of changes in UI on the probability of one finding a job in $t$ weeks, where $t=2,3,\dots 50.$ Although the sign of $\beta\left(t\right)$ indicates the direction of such a change, its magnitude may not have a straightforward economic interpretation. Thus, we focus on the ADME $\left(t\right)$ of $\ln UI$ ,

ADME_{\ln UI}\left(t\right)=\mathbb{E}\left[\frac{\partial F_{\left.T\right|UI,Z}\left(\left.t\right|UI,Z\right)}{\partial\ln UI}\right],

(7.2)

which is easy to interpret. For instance, an estimate of $ADME_{lnUI}(t)$ equal to 0.1 suggests that, on average, raising unemployment benefits by one percent increases the probability of finding a job in $t$ weeks or less by 0.1 percentage point. As discussed in Theorem 3, we can estimate $ADME_{\ln UI}\left(t\right)$ by

\widehat{ADME}_{n,\ln UI}\left(t\right)=\frac{1}{n}\sum_{i=1}^{n}\hat{\beta}_{n}(t)\cdot\psi\left(\mathbf{X}_{i}^{\prime}\hat{\theta}_{n}(t)\right),

(7.3)

where $\psi\left(u\right)=d\Psi\left(u\right)/du$ , and $\hat{\theta}_{n}\left(t\right)=\left(\hat{\alpha}_{n}(t),\hat{\beta}_{n}(t),\hat{\gamma}_{n}(t)^{{}^{\prime}}\right)^{\prime}$ is the KMDR estimator of $\theta\left(t\right)$ . When $\Psi$ is the cloglog link function, $\psi\left(u\right)=\exp\left(u\right)\exp\left(-\exp\left(u\right)\right)$ .

In what follows, we multiply the estimators of ADME ${}_{\ln UI}\left(t\right)$ by 10, which is interpreted as the effect of a 10% increase in UI benefits. Figure LABEL:fig.amde1 reports the estimates for the full sample (solid line), together with the 90 percent bootstrapped pointwise, and simultaneous confidence intervals (dark and light shaded area, respectively) computed using Algorithm 5.2.

The result reveals interesting effects. On average, a 10% increase in UI benefits appears to have no effect on the probability of a worker finding a job in the first seven weeks of the unemployment spell. Nonetheless, Figure LABEL:fig.amde1 shows that a change in UI is associated with a reduction of the probability of finding a job in the first $t=\left\{8,\dots,50\right\}$ weeks. Such an effect seems to be monotone until week 18, where a 10% increase in UI is associated with a two-percentage-point decrease in the average probability of finding a job until that week. After week 18, the effect of an increase in UI benefits on employment probabilities seems to weaken but remains statistically significant at the 10% level, except for $t=\left\{34,35,39\right\}$ . Note that the bootstrap simultaneous confidence interval is slightly wider than the pointwise one. However, it is important to mention that the bootstrap uniform confidence interval is designed to contain the entire true path of the ADME ${}_{\ln UI}\left(t\right)$ 90% of the time, which is in sharp contrast to the bootstrap pointwise confidence interval. This highlights the practical appeal of using simultaneous instead of pointwise inference procedures to better quantify the overall uncertainty in the estimation of all ADMEs. Overall, Figure LABEL:fig.amde1 shows that although an increase in UI benefits is close to zero at the beginning of the unemployment spell, they have an U-shaped effect on the unemployment duration distribution. Interestingly, estimates of ADME ${}_{\ln UI}\left(t\right)$ based on the Cox PH model with the same set of covariates as in (7.1) suggests that the effect of UI on unemployment duration distribution is monotone for $t\in[0,50]$ . However, once the proportional hazard specification is tested using either Grambsch and Therneau (1994) procedure or our bootstrap-based testing procedure for the null hypothesis that all $\hat{\theta}_{n}(t),t=\{2,\dots,50\}$ are constant in $t$ , as described in Section 5.1, the null of proportionality is rejected at the usual significance levels⁵⁵5The p-value associated with our proposed test statistics is 0.004., implying that, indeed, a proportional hazard model may not be appropriate for this application. Our KMDR model does not rely on such an assumption.

Although the results in Figure LABEL:fig.amde1 show that, on average, changes in UI have a U-shaped effect on the unemployment duration distribution, the analysis remains silent about the liquidity effects. To shed light on the importance of the liquidity effect relative to the moral hazard effect, Chetty (2008) argues that one can compare the response to an increase in UI benefits of workers who are not financially constrained with those who are constrained. Given that unconstrained workers have the ability to smooth consumption during unemployment, liquidity effects are absent and UI benefits lengthen unemployment duration only via moral hazard effects for these subgroup of individuals. To pursue this logic, we follow Chetty (2008) and use two proxy measures of liquidity constraint: liquid net wealth at the time of job loss (“net wealth”) and an indicator for having to make a mortgage payment. Chetty (2008) argues that workers with higher net wealth are less sensitive to UI benefit levels because they are less likely to be financially constrained. Similarly, workers that have to make mortgage payments before job loss have less ability to smooth consumption during unemployment because they are unlikely to sell their homes during the unemployment spell, whereas renters can adjust faster. To assess the role of liquidity effects, we divide the entire sample into four subsamples: $(a)$ workers with net wealth below the median, $(b)$ workers with net wealth above the median, $\left(c\right)$ workers with a mortgage, and $\left(d\right)$ workers without a mortgage. For each subsample, we estimate (7.1) using the KMDR approach, its corresponding estimator of ADME ${}_{\ln UI}\left(t\right)$ , $\widehat{ADME}_{n,\ln UI}\left(t\right)$ as in (7.3), and construct 90% bootstrap pointwise and simultaneous confidence intervals.

Refer to caption — Figure 1: Estimated average effect of a 10% increase of unemployment benefits on the distribution of unemployment duration, in different subpopulations. The solid line represents the estimates while the dark and light shaded areas are the 90% bootstrapped based pointwise and simultaneous confidence interval based on 100,000 bootstrap draws, respectively. Sample sizes for Panel (a)-(d) are 2152, 2155, 1952 and 2355, respectively.

Figure 1 reports the average effects of a 10% increase in UI benefits on the unemployment duration distribution in each of the four subsamples. The solid lines are the point estimates and the dark and light shaded areas are the 90% bootstrap pointwise and simultaneous confidence intervals, respectively.

The results show interesting heterogeneity of UI benefits effects with respect to liquidity constraint proxies. For those workers with net wealth above the median, an increase in UI benefits has no statistically significant effect on the unemployment duration distribution, except for $t=\left\{17,18\right\}$ . Thus, for those workers who are not constrained (and, therefore, for whom the liquidity effect is approximately zero), the moral hazard effect seems to be close to zero. On the other hand, for those workers with net wealth below the median, an increase in UI benefits is associated with lower probabilities of finding a job. The conclusion using mortgage as a proxy for liquidity constraint is qualitatively the same. Note that our analysis suggests that the ADME of an increase of UI is non-monotone across the unemployment duration distribution, highlighting the flexibility of the DR approach. Indeed, using our proposed test for all slope coefficients being constant, the proportional hazard specification is rejected at the 10% significance levels in each subsample.

In other to further highlight that the effect of UI benefits on unemployment duration is different depending on whether a worker is likely to be liquidity constrained or not, in Figure 2 we plot the difference of the estimated ADME between liquidity constrained and not-liquidity constrained workers, together with their pointwise and simultaneous confidence bands. Panel (a) suggests that UI benefits have a more negative effect on workers with net wealth below the median than on wealthier workers, and that this difference is statistically significant at the 10% level. Panel (b) reveals a qualitatively similar pattern when one compared workers with mortgage with those without mortgage, though the magnitude of this difference is less pronounced (but we can still reject the null hypothesis that the effects are the same, at the 10% significance level.)

Taken together, our results provide suggestive evidence that UI benefits have a non-monotone effect on the unemployment duration distribution and that such an effect varies whether workers are likely to be liquidity constrained or not. More precisely, our results suggest that the effect of UI on unemployment duration is larger for liquidity constrained workers. Through the lens of the results of Chetty (2008), our findings suggest that an increase in UI benefits affects unemployment duration not only through moral hazard but also because of a “liquidity effect.”

Acknowledgements

We are most grateful to the editor and two referees for their constructive comments which have lead to a improved paper. Research funded by Ministerio Economía y Competitividad (Spain), ECO2-17-86675-P, and by Comunidad de Madrid (Spain), MadEco-CM S2015/HUM-3444. Andrés García-Suaza was supported by the Colombia Científica-Alianza EFI Research Program, with code 60185 and contract number FP44842-220-2018, funded by The World Bank through the call Scientific Ecosystems, managed by the Colombian Ministry of Science, Technology and Innovation. Part of this article was written when Pedro H. C. Sant’Anna was visiting the Cowles Foundation at Yale University, whose hospitality is gratefully acknowledged.

References

Aalen (1980) Aalen, O. O. (1980). Lecture Notes in Statistics – Proceedings. In W. Klonecki, A. Kozek, and J. Rosinski (Eds.), Lecture Notes in Statistics, Volume 2, pp. 1–25. New York: Springer.
Amemiya (1985) Amemiya, T. (1985). Advanced Econometrics. Cambridge: Harvard University Press.
Arcones and Giné (1993) Arcones, M. A. and E. Giné (1993, jul). Limit Theorems for $U$-Processes. The Annals of Probability 21(3), 347–370.
Arcones and Giné (1995) Arcones, M. A. and E. Giné (1995). On the law of the iterated logarithm for canonical U-statistics and processes. Journal of Theoretical Probability 58, 217–245.
Armstrong and Sloan (1989) Armstrong, B. G. and M. Sloan (1989). Ordinal regression models for epidemiologic data. American Journal of Epidemiology 129, 191–204.
Bennett (1983) Bennett, S. (1983). Analysis of survival data by the proportional odds model. Statistics in Medicine 2, 273–277.
Chen and Lo (1997) Chen, K. and S.-H. Lo (1997). On the rate of uniform convergence of the product-limit estimator: strong and weak laws. Annals of Statistics 25, 1050–1087.
Chen and Ying (1996) Chen, K. and Z. Ying (1996). A counterexample to a conjecture concerning the Hall-Wellner band. Annals of Statistics 24, 641–646.
Cheng et al. (1995) Cheng, S. C., L. J. Wei, and Z. Ying (1995). Analysis of transformation models with censored data. Biometrika 82, 835–845.
Chernozhukov et al. (2019) Chernozhukov, V., I. Fernández-Val, and S. Luo (2019). Distribution Regression with Sample Selection, with an Application to Wage Decompositions in the UK. arXiv preprint arXiv: 1811.11603, 1–50.
Chernozhukov et al. (2013) Chernozhukov, V., I. Fernández-Val, and B. Melly (2013). Inference on counterfactual distributions. Econometrica 81, 2205–2268.
Chernozhukov et al. (2020) Chernozhukov, V., I. Fernández-Val, and M. Weidner (2020). Network and Panel Quantile Effects Via Distribution Regression. Journal of Econometrics Forthcoming, 1–28.
Chetty (2008) Chetty, R. (2008). Moral Hazard versus Liquidity and Optimal Unemployment Insurance. Journal of Political Economy 116, 173–234.
Clayton (1976) Clayton, A. D. G. (1976). An Odds Ratio Comparison for Ordered Categorical Data with Censored Observations. Biometrika 63, 405–408.
Cox (1975) Cox, D. (1975). Partial Likelihood. Biometrika 62, 269–276.
Cox (1972) Cox, D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology) 34, 187–220.
Doksum and Gasko (1990) Doksum, K. and M. Gasko (1990). On a correspondence between models in binary regression and survival analysis. International Statistical Review 58, 243–252.
Foresi and Peracchi (1995) Foresi, S. and F. Peracchi (1995). The conditional Distribution of Excess Returns : The Conditional Distribution An Empirical Analysis. Journal of the American Statistical Association 90, 451–466.
Grambsch and Therneau (1994) Grambsch, P. and T. Therneau (1994). Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 81, 515–526.
Guo and Zeng (2014) Guo, S. and D. Zeng (2014). An overview of semiparametric models in survival analysis. Journal of Statistical Planning and Inference 151-152, 1–16.
Honore et al. (2002) Honore, B., S. Khan, and J. Powell (2002). Quantile regression under random censoring. Journal of Econometrics 109, 67–105.
Hothorn et al. (2014) Hothorn, T., T. Kneib, and P. Bühlmann (2014). Conditional transformation models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76, 3–27.
Kalbfleisch and Prentice (1980) Kalbfleisch, J. D. and R. L. Prentice (1980). The statistical analysis of failure time data (2nd ed.). Hoboken, NJ: Wiley.
Kaplan and Meier (1958) Kaplan, E. L. and P. Meier (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53, 457–481.
Krueger and Meyer (2002) Krueger, A. B. and B. D. Meyer (2002). Labor supply effects of social insurance. In A. J. Auerbach and M. Feldstein (Eds.), Handbook of Public Economics, Volume 4, Chapter 33, pp. 2327–2392. Amsterdam: North-Holland: Elsevier.
McCullagh (1980) McCullagh, P. (1980). Regression Models for Ordinal Data. Journal of the Royal Statistical Society. Series B (Methodological) 42, 109–142.
Murphy et al. (1997) Murphy, S. A., A. J. Rossini, and A. W. VanderVaart (1997). Maximum likelihood estimation in the proportional odds model. Journal of the American Statistical Association 92, 968–976.
Peng and Huang (2008) Peng, L. and Y. Huang (2008). Survival Analysis With Quantile Regression Models. Journal of the American Statistical Association 103, 637–649.
Portnoy (2003) Portnoy, S. (2003). Censored Regression Quantiles. Journal of the American Statistical Association 98, 1001–1012.
Robins and Rotnitzky (1992) Robins, J. M. and A. Rotnitzky (1992). Recovery of Information and Adjustment for Dependent Censoring Using Surrogate Markers. In AIDS Epidemiology, pp. 297–331. Boston, MA: Birkhäuser Boston.
Robins and Rotnitzky (1995) Robins, J. M. and A. Rotnitzky (1995). Semiparametric Efficiency in Multivariate Regression Models with Missing Data. Journal of the American Statistical Association 90, 122–129.
Rothe and Wied (2013) Rothe, C. and D. Wied (2013). Misspecification Testing in a Class of Conditional Distributional Models. Journal of the American Statistical Association 108, 314–324.
Rothe and Wied (2019) Rothe, C. and D. Wied (2019). Estimating Derivatives of Function-Valued Parameters in a Class of Moment Condition Models. Journal of Econometrics Forthcoming, 1–19.
Sant’Anna (2020) Sant’Anna, P. H. C. (2020). Nonparametric Tests for Treatment Effect Heterogeneity with Duration Outcomes. Journal of Business and Economic Statistics Forthcoming, 1–17.
Stute (1993) Stute, W. (1993). Consistent estimation under random censorship when covariables are present. Journal of Multivariate Analysis 45, 89 – 103.
Stute (1994) Stute, W. (1994). The bias of Kaplan-Meier integrals. Scandinavian Journal of Statistics 21, 475–484.
Stute (1995) Stute, W. (1995). The central limit theorem under random censorship. Annals of Statistics 23, 422–439.
Stute (1996) Stute, W. (1996). Distributional convergence under random censorship when covariables are present. Scandinavian Journal of Statistics 23, 461–471.
Stute (1999) Stute, W. (1999). Nonlinear censored regression. Statistica Sinica 9, 1089–1102.
Stute et al. (2000) Stute, W., W. González-Manteiga, and C. . Sánchez Sellero (2000). Nonparametric model checks in censored regression. Communications in Statistics - Theory and Methods 29, 1611 – 1629.
Tsiatis (1975) Tsiatis, A. A. (1975). A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences of the United States of America 72, 20–22.
Tsiatis (1981) Tsiatis, A. A. (1981). A large sample study of Cox’s regression model. Annals of Statistics 9, 93–108.
van den Berg (1990) van den Berg, G. (1990). Nonstationarity in Job Search Theory. Review of Economic Studies 57, 255–277.
van der Vaart (1998) van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge: Cambridge University Press.
van der Vaart and Wellner (1996) van der Vaart, A. W. and J. A. Wellner (1996). Weak Convergence and Empirical Processes. New York: Springer.
Wang and Wang (2009) Wang, H. J. and L. Wang (2009). Locally weighted censored quantile regression. Journal of the American Statistical Association 104, 1–32.
Wooldridge (2007) Wooldridge, J. M. (2007). Inverse probability weighted estimation for general missing data problems. Journal of Econometrics 141, 1281–1301.
Ying et al. (1995) Ying, Z., S. Jung, and L. Wei (1995). Survival analysis with median regression models. Journal of the American Statistical Association 90, 178–184.

Distribution Regression in Duration Analysis: an Application to Unemployment Spells

Abstract

1 Introduction

2 Distribution Regression with Duration Outcomes

3 The Kaplan-Meier Distribution Regression

Assumption 3.1

Assumption 3.2

Remark 3.1

Assumption 3.3

4 Asymptotic Theory

Assumption 4.4

Theorem 4.1

Assumption 4.5

Assumption 4.6

Theorem 4.2

Corollary 4.1

Theorem 4.3

Assumption 4.7

Theorem 4.4

Theorem 4.5

5 Applications

5.1 Testing Linear and Nonlinear Hypothesis

Algorithm 5.1 (Bootstrapped-based hypothesis testing)

5.2 Inferences on FT|XF_{\left.T\right|X}

5.3 Average Distribution Marginal Effects

Theorem 5.6

Algorithm 5.2 (Bootstrapped Simultaneous Confidence Intervals)

6 Simulation Studies

7 The effect of unemployment benefits on unemployment duration

Acknowledgements

References

Distribution Regression in Duration Analysis:
an Application to Unemployment Spells

5.2 Inferences on $F_{\left.T\right|X}$