
Bunching and Taxing Multidimensional Skills*

* We thank Hector Chade, Paolo Martellini, Chris Moser, Emmanuel Saez, Florian Scheuer, Andrew Shephard, Chris Taber and especially Jean-Charles Rochet for detailed insightful comments.

Job Boerma
University of Wisconsin-Madison
Aleh Tsyvinski
Yale University
Alexander P. Zimin
MIT
(March 2025)
Abstract

We characterize optimal policy in a multidimensional nonlinear taxation model with bunching. We develop an empirically relevant model with cognitive and manual skills, firm heterogeneity, and labor market sorting. We first derive two conditions for the optimality of taxes that take into account bunching. The first condition, a stochastic dominance optimal tax condition, shows that at the optimum the schedule of benefits dominates the schedule of distortions in terms of second-order stochastic dominance. The second condition, a global optimal tax formula, provides a representation that balances the local costs and benefits of optimal taxation while explicitly accounting for global incentive constraints. Second, we use Legendre transformations to represent our problem as a linear program. This linearization allows us to solve the model quantitatively and to precisely characterize bunching. At the optimum, 10 percent of workers are bunched. We introduce two notions of bunching: blunt bunching and targeted bunching. Blunt bunching constitutes 30 percent of all bunching, occurs at the lowest levels of cognitive and manual skills, and lumps together the allocations of these workers, resulting in significant distortions. Targeted bunching constitutes 70 percent of all bunching and recognizes the workers' comparative advantage. The planner separates workers on their dominant skill and bunches them on their weaker skill, thus mitigating distortions along the dominant skill dimension.

1 Introduction

We make significant progress in analyzing multidimensional optimal nonlinear income taxation problems with bunching. This is one of the important open questions in the theory and practice of optimal taxation. Our paper is the first to solve for optimal multidimensional taxation with bunching in an empirically relevant model of wage determination.

The key difficulty of analyzing multidimensional optimal tax problems lies in characterizing regions of bunching. Bunching occurs when workers of different types receive identical allocations. Kleven, Kreiner, and Saez (2009) establish the importance of bunching in a model of couples taxation in which one partner makes only an extensive margin labor supply choice. Little is known about optimal taxation and the nature of bunching in more general settings. At the same time, a large literature in labor economics emphasizes the importance of multidimensional skills and labor market sorting in determining wage dispersion.

We develop an empirically relevant model that incorporates three important elements of wage dispersion. Workers differ in both manual and cognitive skills, firms differ in productivity, and workers' output depends on the firms they work for and the coworkers they work with. We characterize equilibrium for the positive model in closed form and use this closed-form solution to identify the underlying multidimensional skill distribution.

The characterization of optimal taxes in our model is based on two main theoretical insights. First, we derive two conditions for optimal taxes that take into account bunching. The first condition, a stochastic dominance optimal taxation condition, shows that at the optimum the benefits and the costs are not necessarily equated for each skill level. Instead, the entire schedule of benefits dominates the entire schedule of distortions in terms of second-order stochastic dominance. The second condition, a global optimal tax formula, provides a representation that balances the local costs and benefits of optimal taxation while explicitly accounting for global incentive constraints. These optimal tax conditions generalize the classic unidimensional optimal taxation conditions in Mirrlees (1971), Diamond (1998), and Saez (2001) to a multidimensional optimal taxation problem, accounting for global incentive constraints and bunching. Second, we use Legendre transforms to represent our problem as a linear program. Legendre transforms are a powerful tool from convex analysis that allow one to represent a convex function by a family of its tangent lines. This linearization enables us to fully solve the model quantitatively and, in particular, to precisely characterize the patterns of bunching.

We find that 10 percent of all workers are bunched at the optimal allocation for our empirically estimated economy. We show that workers are bunched with other workers who are better in one dimension but worse in the other dimension. Moreover, a sizable portion of bunching is nonlocal.

We introduce two notions of bunching: blunt bunching and targeted bunching. Blunt bunching occurs at the lowest levels of cognitive and manual skills. The planner does not distinguish workers' cognitive skills from their manual skills and lumps their allocations together by an index of their skills. This is a blunt tool for providing incentives because it creates significant distortions, leading to high marginal taxes; 30 percent of all bunching is blunt. Targeted bunching recognizes the workers' comparative advantage. The planner separates workers on their dominant skill while bunching them on their weaker skill; 70 percent of all bunching is targeted.

We now discuss our model and results in more detail. We consider the optimal policy problem, which we formulate as a mechanism design problem (Mirrlees, 1971). The planner chooses consumption allocations, allocations of cognitive and manual tasks, and the assignment of workers to coworkers and firms subject to incentive constraints that workers truthfully report their type. The primary difficulty in analyzing multidimensional optimal taxation problems is in characterizing regions of bunching. Rochet and Choné (1998) show that bunching is generic in the multidimensional multiproduct monopolist problem, which is closely related to the multidimensional optimal taxation problem. An important paper by Kleven, Kreiner, and Saez (2009) solves a multidimensional optimal taxation model under the restriction that one of the allocations is binary and argues for the importance of bunching in their setup. In our model, both the allocations of the manual and cognitive tasks are instead unrestricted. For this general setup, no characterization of optimal tax policy is known.

Our first theoretical result is the derivation of two optimality conditions that take into account the regions of bunching. First, we show that, at the optimum, the utility and revenue benefits from the entire schedule of taxes second-order stochastically dominate the costs of the distortions they induce. Without bunching, this tradeoff is made locally and leads the planner to equalize the marginal benefits and costs of taxes as in a unidimensional problem. We show that, when there is bunching, the planner no longer equates the benefits and the costs of taxes at each worker skill level. Instead, the planner requires that the entire schedule of the benefits of taxes second-order stochastically dominate the entire schedule of distortions, which makes this tradeoff nonlocal. Second, we derive an optimal tax formula for the multidimensional taxation problem that holds with equality. We show that the local tradeoffs have to be augmented with an additional term that accounts for global incentive constraints. Specifically, the additional term modifies the social welfare weights through a convexity correction.[1] Both formulas are new to the literature on optimal taxation.

[1] When there is no bunching, a classic pointwise optimality condition holds that can be rewritten in terms of a multidimensional ABC taxation formula similar to the unidimensional tax formula in Diamond (1998) and Saez (2001). The absence of bunching is equivalent to the indirect utility function being strongly convex and the first-order approach being valid. Kleven, Kreiner, and Saez (2006, p. 23) derive such a multidimensional ABC formula without bunching.

Our next main theoretical insight is to transform the planner problem to a linear program. This is a key step that enables computation of the bunching regions. Legendre transformations linearize a convex function by replacing it with the upper envelope of all its tangent lines. The Legendre transform allows us to translate the multidimensional optimal taxation problem into a linear program that can be analyzed quantitatively with high precision. Numerical precision is not merely a technical curiosity but is essential to identify the regions and nature of bunching.
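The tangent-line representation can be illustrated with a discrete Legendre-Fenchel transform. The Python sketch below is our own illustration and not the authors' code. The grids and the test function $f(x)=x^2/2$ are hypothetical choices. The sketch computes $f^*(y)=\max_x\,(xy-f(x))$ on a grid. It then computes the biconjugate $f^{**}(x)=\max_y\,(xy-f^*(y))$, which is the upper envelope of the affine functions $x\mapsto xy-f^*(y)$. For a convex $f$, the biconjugate recovers $f$.

```python
import numpy as np


def legendre(f_vals, x_grid, y_grid):
    """Discrete Legendre-Fenchel transform: f*(y) = max over x_grid of (x * y - f(x))."""
    # Row k holds the affine function x -> x * y_k - f(x) evaluated on x_grid.
    return np.max(np.outer(y_grid, x_grid) - f_vals[None, :], axis=1)


# Hypothetical grids and convex test function f(x) = x^2 / 2.
x = np.linspace(-2.0, 2.0, 401)
y = np.linspace(-4.0, 4.0, 801)
f = 0.5 * x ** 2

f_star = legendre(f, x, y)         # value of the supporting line with slope y
f_bistar = legendre(f_star, y, x)  # upper envelope of the lines x * y - f*(y)

max_error = np.max(np.abs(f_bistar - f))  # near zero: the envelope reproduces f
```

Because the $x$-grid is bounded, $f^*$ equals $y^2/2$ only for $|y|\le 2$. That range is enough for the envelope to reproduce $f$ on $[-2,2]$.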

A parallel starting point of our analysis is a characterization of the equilibrium in a positive economy. In our positive economy, workers choose the amount of cognitive and manual tasks to deliver, the coworkers to work with, and the firms to work for. This problem integrates endogenous labor supply decisions with the assignment of multidimensional workers to teams and to heterogeneous firms. Our first result for the positive economy is that workers sort with identical coworkers (self-sorting) and that better teams work on more valuable projects (positive sorting). The resulting assignment is qualitatively identical to the assignment under the optimal policy problem. The exact assignment differs, however, because incentive constraints change labor supply.

We use the dual formulation of the equilibrium assignment problem to characterize equilibrium wages. We show that wages are a convex function of an index of the worker’s task inputs rather than a function of each task individually. We further establish an exact mapping between curvature in the wage schedule and the distribution of firm productivity. By choosing a parametric convex function, we can then infer a distribution of firm projects such that this convex function is the equilibrium wage schedule.

Having developed the theory to characterize both positive and optimal allocations, we bring our theory to the data. To quantify cognitive and manual skill heterogeneity in the U.S. population, we use earnings information for all workers between 2000 and 2019 in the American Community Survey (ACS). We combine the earnings data from the ACS with data from O*NET on the manual and cognitive task intensity for every occupation (Acemoglu and Autor, 2011).

We use our closed-form characterization for wages in the positive economy to pointwise identify the level of manual and cognitive tasks completed by each worker. We then use the worker’s optimality condition for each task together with these inferred task levels to identify levels of cognitive and manual skill that deliver each worker’s wage and relative task intensity in the cross-sectional data as a model outcome. For the unidimensional taxation problem, an important contribution of Saez (2001) was to infer the underlying productivity distribution using earnings data which then becomes a central input to determine optimal taxes. Our identification generalizes these findings and delivers the distribution of manual and cognitive skills in a model accounting for multiple drivers of earnings (multidimensional skills, coworkers, firms). Our identification resembles Boerma and Karabarbounis (2020, 2021) who use explicit solutions for home production models to identify productivity at home and market productivity using data on consumption, home and market hours.

We next turn to the quantitative characterization of the optimum using the inferred skill distribution. In order to understand our quantitative characterization, we first describe a benchmark without incentive constraints. Due to separability of preferences and technology over tasks, the efforts in a given task depend exclusively on the worker’s skills in this task and, hence, there is no cross-dependence between tasks. Trivially, there is no bunching and there are no distortions.

In sharp contrast to the benchmark, optimal task intensity in our model depends positively on both of the worker’s skills. Workers with high manual skills also deliver high levels of cognitive tasks. This codependence is lower at the top end of the skill distribution. In the limit, workers face zero distortion in their manual task allocation at the top of the manual skill distribution, meaning there is no dependence of their manual task intensity on their cognitive skills as in the benchmark. At the lower end of the skill distribution, the distortion from this positive codependence is high.

A central part of our contribution is to characterize patterns of bunching. We first show that 10.4 percent of all workers are bunched at the optimum. Workers bunch with other workers both near and far. Moreover, workers exclusively bunch with workers who are better in one skill dimension but worse in the other. Workers do not bunch with workers over whom they have an absolute advantage, nor with workers who have an absolute advantage over them.

We introduce two types of bunching: blunt bunching and targeted bunching. In the blunt bunching region, the planner requires all workers with the same effective skill index to deliver identical cognitive and manual tasks, and thus bunches workers whose skills differ vastly. This is a blunt way to provide incentives and comes at the cost of significant output distortions. In the targeted bunching region, the planner recognizes the increasing efficiency costs of distorting higher skill workers. When workers have a relatively higher level of, for example, manual skills, they are separated along this dimension but are bunched on their relatively low cognitive skill. The planner thus separates workers according to their comparative advantage and bunches them by their comparative disadvantage. In contrast to the blunt bunching region, targeted bunching pools workers who are more similar in skills: not too far apart, yet still nonlocal. In the region without bunching, the planner distorts allocations as in the unidimensional case.

We summarize the bunching patterns by describing the tax wedges they induce. In particular, we find that the level of tax wedges is high in the bunching regions. The tax wedges are particularly high for low skill workers who are bluntly bunched and are also high along the dimension of comparative disadvantage for the more skilled workers in the targeted bunching region. We further show that the optimum is implementable by a tax function that is only a function of earnings and line of work.


Literature. We now describe related literature. Kleven, Kreiner, and Saez (2009) is the first paper to analyze optimal multidimensional taxation with bunching. They model a binary labor supply choice for the secondary earner along with a continuous labor supply choice for the primary earner. Judd, Ma, Saunders, and Su (2017) numerically consider some cases of optimal taxation with multiple dimensions of heterogeneity (up to five dimensions of heterogeneity with five individual types) and find that some nonlocal constraints bind. The most ambitious attempt to date to solve a multidimensional policy problem with bunching is Moser and Olea de Souza e Silva (2019). They study a model where workers are heterogeneous in two dimensions but only one dimension of heterogeneity enters the planner's objective. Their key ingredient is paternalistic preferences, which deliver bunching due to disagreement between the planner and workers. In their environment, bunching is optimal and, in fact, an essential feature even of the unidimensional problem. Because the planner cares about only one dimension of heterogeneity, the deviation patterns are significantly less complex. They characterize the model theoretically with continuous skill distributions and also compute the model with six impatient types in one dimension and a large number of types in the second dimension. In our paper and, more broadly, in multidimensional optimal nonlinear taxation problems, the planner cares about heterogeneity in several dimensions. Hence, the deviation and bunching patterns are significantly more complicated and nuanced, especially when a large number of types within each skill dimension is analyzed. Heathcote and Tsujiyama (2021b) comprehensively analyze the computational performance of different algorithms for unidimensional optimal taxation. They show that the number of skill types is not just a technical detail but has an important impact on policy prescriptions.
In our setting, fine skill differentiation in both dimensions of heterogeneity is additionally needed to recover the nuanced patterns of bunching and deviating, especially in the regions of targeted bunching. More broadly, there is a vast literature on multidimensional mechanisms (e.g., McAfee and McMillan, 1988; Armstrong, 1996; Rochet and Choné, 1998) that also emphasizes the complexity, as well as the central role, of bunching in optimal solutions.

An important strand of papers, including Scheuer (2014) and Rothschild and Scheuer (2013, 2014, 2016), analyzes nonlinear optimal taxation with multidimensional heterogeneity. These papers achieve tractability by transforming the multidimensional problem into a unidimensional screening problem with an endogenous wage distribution. Moreover, Rothschild and Scheuer (2014, 2016) in the multidimensional case and Scheuer and Werning (2017) also emphasize the importance of labor market sorting. Lehmann, Renes, Spiritus, and Zoutman (2021) and Golosov and Krasikov (2022) use a first-order approach to study multidimensional optimal taxation without bunching, both theoretically and numerically.

A complementary approach is to analyze optimal policy in economies with multidimensional heterogeneity by restricting taxes to parametric families. The most comprehensive recent analyses using this approach are Blundell and Shephard (2012) on optimal taxation of low-income families and Gayle and Shephard (2019) on optimal taxation of couples. Notable papers that use such a parametric approach in a variety of other areas of optimal taxation include Benabou (2002), Conesa, Kitao, and Krueger (2009), and Heathcote, Storesletten, and Violante (2017). Heathcote and Tsujiyama (2021a) synthesize the Mirrleesian approach and the parametric approach to optimal taxation.

Our positive wage determination model relates to a growing literature in labor economics that adopts a task approach to understand the contribution of multidimensional skills to labor market outcomes. Recent prominent examples in this area include Yamaguchi (2012), Sanders and Taber (2012), Lindenlaub (2017), Deming (2017), Guvenen, Kuruscu, Tanaka, and Wiczer (2020), Lise and Postel-Vinay (2020), Roys and Taber (2022) and Lindenlaub and Postel-Vinay (2023). Differently from these papers, we combine multidimensional skill heterogeneity with sorting into worker teams and sorting with heterogeneous firms.

2 Environment

We consider an economy with two-dimensional worker skill heterogeneity and heterogeneous firms. Worker output depends not only on the worker's own cognitive and manual efforts but also on the coworker they work with and the firm they work for, as emphasized in the modern literature on wage determination.


Workers. The economy is populated by a measure two of workers who differ in two unobservable characteristics. Workers are endowed with a bundle of cognitive and manual skills $\alpha=(\alpha_c,\alpha_m)\in A$. The distribution over types $\alpha$ is denoted by $\Phi$.

Workers have preferences over consumption $c$ and experience disutility from effort in cognitive and manual activities $\ell=(\ell_c,\ell_m)$:

$$U(c,\ell)=u(c)-v(\ell_c)-v(\ell_m), \tag{1}$$

where consumption $c$ and efforts $\ell$ are positive. Utility is increasing and concave in consumption, and decreasing and strictly concave in cognitive and manual efforts. We further assume disutility has the form:

$$v(\ell)=\kappa\ell^{\rho}, \tag{2}$$

with $\rho>2$ and $\kappa>0$.


Technology. Cognitive and manual production inputs $(x_c,x_m)\in\mathbf{X}$ for a worker are the product of their skills and their efforts:

$$x_s=\alpha_s\ell_s, \tag{3}$$

for all tasks $s\in S=\{c,m\}$. The worker's skill is given by $\alpha$, while their effort is given by $\ell$.

The economy is populated by a unit mass of heterogeneous firms that produce a single output by organizing two workers into a team to work on a project $z$. Firm production is represented by $y$. We use a bilinear team technology together with a multiplicative firm technology:[2]

$$y(x_1,x_2,z)=z\left(x_{1c}x_{2c}+x_{1m}x_{2m}\right). \tag{4}$$

[2] The bilinear technology is also used in Lindenlaub (2017), Lise and Postel-Vinay (2020), and Lindenlaub and Postel-Vinay (2023), among others.

Assignment. An assignment pairs workers with coworkers and projects. Formally, an assignment is a probability measure $\gamma$ over workers, coworkers, and firms. Given a distribution of worker inputs $F_x$, a distribution of coworker inputs $F_x$, and a distribution of firms $F_z$, the set of feasible assignments is $\Gamma:=\Gamma(F_x,F_x,F_z)$. This is the set of probability measures on the product space $\mathbf{X}\times\mathbf{X}\times Z$ whose marginal distributions onto the set of workers and coworkers $\mathbf{X}$ are $F_x$, and whose marginal distribution onto the set of firms $Z$ is $F_z$. The assignment function $\gamma(x_1,x_2,z)$ captures the measure of workers that work together on a project. Given a feasible assignment, total output is $\int y(x_1,x_2,z)\,\mathrm{d}\gamma$.


Resources. Aggregate output and external resources $R$ are allocated to workers to consume:

$$\int y(x_1,x_2,z)\,\mathrm{d}\gamma+R\geq\int c(\alpha)\,\mathrm{d}\Phi, \tag{5}$$

where $\int c(\alpha)\,\mathrm{d}\Phi$ is aggregate consumption.

3 Planning Problem

In this section, we formulate a planner problem and characterize optimal sorting. The planner problem is to choose an allocation $\{(c(\alpha),x_c(\alpha),x_m(\alpha))\}_{\alpha\in A}$ and an assignment $\gamma\in\Gamma$ to minimize the resource cost of providing welfare $\mathcal{U}$:

$$\int c(\alpha)\,\mathrm{d}\Phi-\int y(x_1,x_2,z)\,\mathrm{d}\gamma, \tag{6}$$

subject to incentive constraints for all workers $\alpha\in A$, so that workers do not gain by misreporting their type as $\hat{\alpha}=(\hat{\alpha}_c,\hat{\alpha}_m)$:

$$U\big(c(\alpha),x_c(\alpha)/\alpha_c,x_m(\alpha)/\alpha_m\big)=\max_{\hat{\alpha}\in A}\;U\big(c(\hat{\alpha}),x_c(\hat{\alpha})/\alpha_c,x_m(\hat{\alpha})/\alpha_m\big), \tag{7}$$

and the promise keeping condition:

$$\int U\big(c(\alpha),x_c(\alpha)/\alpha_c,x_m(\alpha)/\alpha_m\big)\,\mathrm{d}\Phi\geq\mathcal{U}, \tag{8}$$

which requires that aggregate welfare is at least the promised value $\mathcal{U}$.[3]

[3] The planning problem is equivalent to maximizing a utilitarian welfare function subject to the resource constraint (5) and the incentive constraints (7). There are no incentive constraints for firms because we assume the planner observes firm output $y$ and inputs $x_1$ and $x_2$. Hence, firm productivity $z$ is not private information.

3.1 Assignment

The planning problem contains an assignment problem. The planner pairs worker and coworker inputs with firm projects to maximize output given a distribution of worker inputs and firm projects. We show that the planner optimally chooses a self-sorted assignment, meaning that workers are paired with identical coworkers. We also show that the planner pairs better teams with more valuable projects.[4]

[4] This assignment problem falls into a class of multimarginal, multidimensional optimal transportation problems. Multidimensional skill and the dependence of worker output on coworkers are central to recent advances in sorting models that use optimal transport theory to characterize equilibrium (Chiappori, McCann, and Nesheim, 2010; Dupuy and Galichon, 2014; Lindenlaub, 2017; Chiappori, McCann, and Pass, 2017; Galichon and Salanié, 2022).

The assignment problem embedded in the planning problem is to choose an assignment that maximizes production given the distribution of worker tasks $F_x$ and the project distribution $F_z$:

$$\max_{\gamma\in\Gamma}\;\int y(x_1,x_2,z)\,\mathrm{d}\gamma. \tag{9}$$

We construct an assignment $\gamma$ that self-sorts workers and coworkers to obtain a unidimensional distribution for team quality, or effective worker skill, $X=x_c^2+x_m^2$. The assignment $\gamma$ combines self-sorting of workers with positive sorting between the effective worker skill $X$ and projects $z$.[5] This assignment $\gamma$ solves the assignment problem (9).

[5] Self-sorting is defined with respect to the distribution of effective task inputs that the workers supply, not necessarily with respect to the underlying worker skills $\alpha$. In the presence of bunching, multiple workers $\alpha$ supply the same task levels $x$. Hence, self-sorting of effective tasks may imply matching different $\alpha$.

Proposition 1.

Optimal Sorting. The planner assignment $\gamma_*$ is characterized by self-sorting of workers and by positive sorting between team quality and project values.

The proof is in Appendix A.1.

We now develop the intuition for Proposition 1. Fix a firm project. Because the technology for each task in equation (4) is supermodular, the planner optimally wants to positively sort both cognitive and manual inputs within project $z$. In our economy with multidimensional worker skills, positive sorting within each task is attained by self-sorting. An optimal assignment thus features self-sorting of workers with coworkers within projects $z$. Given that workers are optimally self-sorted, it remains for the planner to assign self-sorted teams with effective skill $X$ to firms $z$. Because the effective production technology is supermodular in team quality $X$ and project value $z$, the optimal assignment features positive sorting between teams and project values.[6]

[6] Positive sorting of effective skill $X$ with project values $z$ follows the classical Beckerian analysis (Becker, 1973).
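The sorting logic can be checked by brute force in a tiny discrete economy. The Python sketch below is our illustration under hypothetical inputs. There are two worker types with two copies each, which mirrors the structure of two workers per firm, and there are two project values. The sketch enumerates every way to form teams and staff projects under technology (4). It then compares the best value with the self-sorted, positively sorted benchmark built from $X=x_c^2+x_m^2$.

```python
import itertools

import numpy as np


def team_output(x1, x2, z):
    """Bilinear team technology, equation (4): z * (x1c * x2c + x1m * x2m)."""
    return z * float(np.dot(x1, x2))


def best_assignment(workers, projects):
    """Brute-force the assignment problem (9) for 2 * len(projects) workers.

    A permutation perm staffs project k with workers perm[2k] and perm[2k + 1].
    """
    best_val, best_perm = -np.inf, None
    for perm in itertools.permutations(range(len(workers))):
        val = sum(team_output(workers[perm[2 * k]], workers[perm[2 * k + 1]], z)
                  for k, z in enumerate(projects))
        if val > best_val:
            best_val, best_perm = val, perm
    return best_val, best_perm


# Hypothetical task inputs (x_c, x_m): two copies of each of two worker types.
types = [np.array([1.0, 0.2]), np.array([0.3, 0.9])]
workers = [types[0], types[0], types[1], types[1]]
projects = [1.0, 2.0]  # hypothetical project values z

best_val, best_perm = best_assignment(workers, projects)

# Benchmark: identical coworkers share a team, and the team with the higher
# effective skill X = x_c^2 + x_m^2 is assigned the higher project value.
X = sorted(float(t @ t) for t in types)
self_sorted_val = sum(z * x for z, x in zip(sorted(projects), X))
```

Brute force is only feasible for a handful of workers. It serves to check the proposition on an example, not to solve the planner's problem.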

Given that the assignment features self-sorting, the output per worker produced by a team of two workers supplying task inputs $(x_c,x_m)$ is $\frac{1}{2}z(x_c^2+x_m^2)$. Aggregate output is $\int y(x_1,x_2,z)\,\mathrm{d}\gamma=\frac{1}{2}\int z(x_c^2+x_m^2)\,\mathrm{d}\Phi$, and the resource cost (6) can be written as:

$$\int\Big(c(\alpha)-\frac{1}{2}z(\alpha)\big(x_c^2(\alpha)+x_m^2(\alpha)\big)\Big)\,\mathrm{d}\Phi. \tag{10}$$

3.2 Utility Allocations

We next transform the planner problem from choosing consumption and task allocations to choosing consumption utility and labor disutility allocations.

For each task $s\in S$, we define the skill parameter $p_s=\kappa\alpha_s^{-\rho}$, so the skill parameter is inversely related to the underlying skill $\alpha_s$. The implied distribution function for the skill parameter vector $p$ is denoted $\pi$, and the corresponding assignment is denoted $z(p)$. We use this skill transformation to define a worker's utility from consumption as a function of their skill vector as $c(p):=u(c(\alpha))$. Following this definition, the resource cost of consumption utility is $\mathcal{C}(c(p))=u^{-1}(c(p))$. Because the utility from consumption is strictly increasing and concave in the consumption allocation, the resource cost is strictly increasing and convex in consumption utility. Similarly, we define labor disutility in each task $s\in S$ as a function of the transformed skill parameter $p$ as $x_s(p):=x_s(\alpha)^{\rho}$. With these definitions, effort disutility becomes $v(x_s(\alpha)/\alpha_s)=\kappa\alpha_s^{-\rho}x_s(\alpha)^{\rho}=p_sx_s(p)$, which is linear in the type $p$. The resource cost of providing disutility, $\mathcal{X}(x_s(p)):=-\frac{1}{2}x_s(p)^{2/\rho}$, is decreasing and strictly convex in labor disutility for $\rho>2$.
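The change of variables can be checked numerically. The Python sketch below uses hypothetical values for $\kappa$, $\rho$, $\alpha$, and $x(\alpha)$; these are not the paper's calibration. It checks three things. First, effort disutility $v(x_s(\alpha)/\alpha_s)$ equals $p_sx_s(p)$. Second, $\mathcal{X}(x_s(p))=-\frac{1}{2}x_s(\alpha)^2$, so $z\mathcal{X}(x_s(p))$ reproduces the output term in (10). Third, $\mathcal{X}$ is decreasing and strictly convex on a grid for $\rho=3$.

```python
import numpy as np

# Hypothetical parameters satisfying the model restrictions rho > 2, kappa > 0.
KAPPA, RHO = 1.0, 3.0


def skill_parameter(alpha):
    """p_s = kappa * alpha_s**(-rho): a higher skill alpha_s gives a lower p_s."""
    return KAPPA * alpha ** (-RHO)


def effort_disutility(ell):
    """v(ell) = kappa * ell**rho, equation (2)."""
    return KAPPA * ell ** RHO


def disutility_resource_cost(x_util):
    """X(x_s(p)) = -(1/2) * x_s(p)**(2/rho), with x_s(p) = x_s(alpha)**rho."""
    return -0.5 * x_util ** (2.0 / RHO)


alpha = np.array([1.5, 0.8])   # hypothetical skills (alpha_c, alpha_m)
x = np.array([2.0, 0.5])       # hypothetical task inputs (x_c(alpha), x_m(alpha))
p = skill_parameter(alpha)
x_util = x ** RHO              # transformed labor disutility x_s(p)

# Effort cost v(x_s / alpha_s) equals p_s * x_s(p): linear in the type p.
lhs, rhs = effort_disutility(x / alpha), p * x_util
# The per-task output term -(1/2) x_s(alpha)^2 in (10) is recovered as X(x_s(p)).
output_term = disutility_resource_cost(x_util)

# On a grid, X is decreasing (negative first differences) and strictly
# convex (positive second differences) when rho > 2.
grid = np.linspace(0.1, 5.0, 200)
vals = disutility_resource_cost(grid)
decreasing = bool(np.all(np.diff(vals) < 0))
strictly_convex = bool(np.all(np.diff(vals, n=2) > 0))
```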

Given the introduction of the skill parameter $p$ and the transformation of the choice variables from allocations to utilities, the planner chooses $\{(c(p),x_c(p),x_m(p))\}$ to minimize the resource cost of providing welfare:

$$\int\Big(\mathcal{C}(c(p))+z(p)\big(\mathcal{X}(x_c(p))+\mathcal{X}(x_m(p))\big)\Big)\pi(p)\,\mathrm{d}p, \tag{11}$$

subject to linear incentive constraints:

$$c(p)-p_cx_c(p)-p_mx_m(p)\geq c(q)-p_cx_c(q)-p_mx_m(q), \tag{12}$$

for all workers $(p,q)$, and a linear promise keeping condition:

$$\int\big(c(p)-p_cx_c(p)-p_mx_m(p)\big)\pi(p)\,\mathrm{d}p\geq\mathcal{U}. \tag{13}$$
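Because (12) is linear in the unknowns $(c(p),x_c(p),x_m(p))$, the incentive constraints on a finite grid of types stack into a matrix inequality $Av\geq 0$. The Python sketch below is ours and uses a hypothetical $3\times 3$ type grid and a hypothetical menu of bundles. It builds that matrix and confirms that an allocation in which each type picks its best bundle from the menu satisfies every constraint. It also confirms that giving all types identical tasks but different consumption violates the constraints. In this formulation the nonlinearity sits only in the objective (11), through $\mathcal{C}$ and $\mathcal{X}$.

```python
import numpy as np


def ic_constraint_matrix(P):
    """Stack constraint (12) for every ordered pair of types (p_i, p_j), i != j.

    The decision vector is v = (c_1..c_N, xc_1..xc_N, xm_1..xm_N) on a finite
    grid of types P (shape N x 2). Row (i, j) satisfies row @ v >= 0 exactly
    when c_i - <p_i, x_i> >= c_j - <p_i, x_j>, so each constraint is linear in v.
    """
    N = len(P)
    rows = []
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            row = np.zeros(3 * N)
            row[i], row[j] = 1.0, -1.0                          # c(p) - c(q)
            row[N + i], row[N + j] = -P[i, 0], P[i, 0]          # -p_c (x_c(p) - x_c(q))
            row[2 * N + i], row[2 * N + j] = -P[i, 1], P[i, 1]  # -p_m (x_m(p) - x_m(q))
            rows.append(row)
    return np.array(rows)


# Hypothetical 3 x 3 grid of types p = (p_c, p_m).
levels = np.array([0.5, 1.0, 2.0])
P = np.array([(pc, pm) for pc in levels for pm in levels])
A = ic_constraint_matrix(P)

# Hypothetical menu of (c, x_c, x_m) bundles; each type picks its best bundle.
menu = np.array([[0.0, 0.0, 0.0], [1.0, 0.5, 0.5], [2.0, 1.5, 1.0]])
payoff = menu[:, 0][None, :] - P @ menu[:, 1:].T   # payoff of bundle k for type i
choice = np.argmax(payoff, axis=1)
v_menu = np.concatenate([menu[choice, 0], menu[choice, 1], menu[choice, 2]])

# Identical tasks but different consumption: low-c types want to mimic high-c types.
n = len(P)
v_bad = np.concatenate([np.arange(n, dtype=float), np.ones(n), np.ones(n)])

ic_slack_menu = (A @ v_menu).min()   # nonnegative up to rounding
ic_slack_bad = (A @ v_bad).min()     # negative: some constraint fails
```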

3.3 Incentive Compatibility

We show that the indirect utility for workers is convex and decreasing in type $p=(p_c,p_m)$. The indirect utility is defined as:

$$u(p)=c(p)-p_cx_c(p)-p_mx_m(p), \tag{14}$$

which implies that for any incentive compatible allocation $\nabla u(p)=-x(p)=-(x_c(p),x_m(p))^{T}$ almost everywhere. Using the indirect utility function, the incentive constraints (12) are $u(p)\geq u(q)-(p_c-q_c)x_c(q)-(p_m-q_m)x_m(q)$ or, equivalently, in scalar product notation $\langle\cdot,\cdot\rangle$:

$$u(p)-u(q)\geq\langle p-q,-x(q)\rangle=\langle p-q,\nabla u(q)\rangle, \tag{15}$$

which is the incentive constraint ensuring that worker type $p$ does not want to report being of type $q$.

A differentiable function $u$ on a convex domain is convex if and only if $u(p)\geq u(q)+\langle p-q,\nabla u(q)\rangle$ for all $p$ and $q$. This implies that an incentive compatible indirect utility function is necessarily convex. The gradient of the indirect utility function is the negative of a worker's production disutility, and production disutility is positive. Hence the indirect utility function decreases in $p$, or $\nabla u(p)=-x(p)\leq 0$.[7] In Appendix A.2, we discuss differentiability of the indirect utility function in more detail, and we also establish which incentive compatibility constraints are redundant.

[7] The indirect utility function thus increases in skill $\alpha$.

Lemma 1.

Any indirect utility function (14) that is incentive compatible is convex and decreasing in worker type pp.
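Lemma 1 can be illustrated with a finite menu. If each type picks its preferred bundle, then $u(p)=\max_k\,(c_k-\langle p,x_k\rangle)$ is an upper envelope of affine functions of $p$. The Python sketch below uses a hypothetical three-bundle menu, which is an illustration and not the optimal schedule. It checks midpoint convexity on random pairs of types and checks monotonicity. It also checks $\nabla u(p)=-x(p)$ at a point where one bundle is strictly preferred.

```python
import numpy as np

# Hypothetical menu of bundles (c_k, x_k) with nonnegative task disutilities x_k.
C_MENU = np.array([0.0, 1.0, 2.0])
X_MENU = np.array([[0.0, 0.0], [0.5, 0.5], [1.5, 1.0]])


def indirect_utility(p):
    """u(p) = max_k (c_k - <p, x_k>): type p's payoff from its best bundle."""
    return np.max(C_MENU - X_MENU @ p)


rng = np.random.default_rng(0)
pairs = rng.uniform(0.1, 3.0, size=(1000, 2, 2))  # random pairs of types (p, q)

# Midpoint convexity: u((p + q) / 2) <= (u(p) + u(q)) / 2.
convex_ok = all(
    indirect_utility(0.5 * (p + q)) <= 0.5 * (indirect_utility(p) + indirect_utility(q)) + 1e-12
    for p, q in pairs
)
# Monotonicity: raising p (lowering skill) never raises u because x_k >= 0.
decreasing_ok = all(indirect_utility(p + q) <= indirect_utility(p) + 1e-12 for p, q in pairs)

# At p0 = (0.5, 0.5), bundle 2 is the unique best choice, so grad u(p0) = -x_2.
p0, h = np.array([0.5, 0.5]), 1e-6
grad = np.array([
    (indirect_utility(p0 + h * e) - indirect_utility(p0 - h * e)) / (2 * h)
    for e in np.eye(2)
])
```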

We denote the set of utility allocations that satisfy the incentive constraints by $\mathcal{I}$ and refer to these as feasible allocations.

3.4 Bunching

We refer to bunching as different workers being assigned the same labor allocation $x$, and therefore the same consumption allocation $c$. We label the set of bunching points by $\mathcal{B}$.[8]

[8] Alternatively, one could define a worker $p$ as bunched when there exists another worker $p'$ in its neighborhood such that $x(p)=x(p')$. Our definition of bunching is the closure of this set. While these definitions are economically equivalent, our definition facilitates the presentation of Proposition 4.

Definition.

Worker $p$ is bunched, $p\in\mathcal{B}$, if and only if in any neighborhood around this worker there exist two other workers $p'$ and $p''$ with identical allocations $x(p')=x(p'')$.

We now state the notions of convexity and strong convexity. Assume that the indirect utility is twice continuously differentiable in the neighborhood of a type $p$. An indirect utility function $u$ is convex if and only if the Hessian matrix $H(u)$ is positive semidefinite. The indirect utility function is strongly convex if $H(u)-\alpha_I I$ is positive semidefinite for some strictly positive $\alpha_I$, where $I$ is the identity matrix.

Lemma 2.

If the indirect utility (14) is strongly convex, then there is no bunching. If the indirect utility fails to be strongly convex at every point in a neighborhood of type $p$, then worker $p$ is bunched.

The proof is in Appendix A.3.
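The following is a minimal sketch, not the authors' code, of the distinction behind Lemma 2. It checks convexity and strong convexity of a $2\times 2$ Hessian through its smallest eigenvalue, because $H-\alpha_I I$ is positive semidefinite exactly when the smallest eigenvalue of $H$ is at least $\alpha_I$. The helper names and the two example utilities are hypothetical. The first is $u(p)=(K-p_c)^2+(K-p_m)^2$, and the second is $u(p)=(K-p_c-p_m)^2$ on the region where $u$ is decreasing.

```python
import numpy as np

def min_eigenvalue(hessian):
    """Smallest eigenvalue of a symmetric 2x2 Hessian."""
    h = np.asarray(hessian, dtype=float)
    return float(np.linalg.eigvalsh(h)[0])  # eigvalsh returns eigenvalues in ascending order

def is_convex(hessian, tol=1e-12):
    # H(u) is positive semidefinite iff its smallest eigenvalue is nonnegative
    return min_eigenvalue(hessian) >= -tol

def is_strongly_convex(hessian, alpha_i):
    # H(u) - alpha_I * I is positive semidefinite iff the smallest eigenvalue >= alpha_I
    if alpha_i <= 0:
        raise ValueError("alpha_I must be strictly positive")
    return min_eigenvalue(hessian) >= alpha_i

# Hypothetical u(p) = (K - p_c)^2 + (K - p_m)^2: Hessian 2I, strongly convex
h_round = [[2.0, 0.0], [0.0, 2.0]]
# Hypothetical u(p) = (K - p_c - p_m)^2: Hessian is singular everywhere
h_ridge = [[2.0, 2.0], [2.0, 2.0]]
```

In the ridge example, $x(p)=-\nabla u(p)=2(K-p_c-p_m)(1,1)$. Every worker on a line $p_c+p_m=\text{const}$ therefore receives the same allocation, which is the kind of bunching Lemma 2 associates with a degenerate Hessian.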

3.5 Taxation

To describe optimal distortions, we define a tax wedge for each task. The tax wedge captures the difference between two quantities. The first is a worker's marginal rate of substitution between task $x_s$ and consumption $c$, $v'\big(\frac{x_s(\alpha)}{\alpha_s}\big)\frac{1}{\alpha_s}\big/u'(c(\alpha))$. The second is the marginal rate of transformation, $zx_s(\alpha)$. We define the tax wedge as:

$$1-\tau_s:=\frac{v'\big(\frac{x_s(\alpha)}{\alpha_s}\big)\frac{1}{\alpha_s}}{u'(c(\alpha))}\bigg/\big(zx_s(\alpha)\big)=-\frac{p_s}{z}\frac{\mathcal{C}'(c(p))}{\mathcal{X}'(x_s(p))}, \tag{16}$$

where it follows from the inverse function theorem that $\mathcal{C}'(c(p))=1/u'(c(\alpha))$. A positive wedge acts as an implicit tax on marginal income from task $s$.[Footnote 9: Using the definition of the tax wedge, we write $\frac{\tau_s}{1-\tau_s}=-\frac{z\mathcal{X}'(x_s(p))+p_s\mathcal{C}'(c(p))}{p_s\mathcal{C}'(c(p))}$.]
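The short sketch below evaluates the wedge in (16) from given marginal resource costs and checks it against the ratio in footnote 9. The marginal costs $\mathcal{C}'(c)$ and $\mathcal{X}'(x_s)$ are hypothetical numeric inputs, not values from the paper. We take $\mathcal{X}'<0$, consistent with the decreasing resource cost of work disutility discussed in Section 4.4.

```python
def one_minus_tau(p_s, z, dC, dX):
    """1 - tau_s from equation (16): -(p_s / z) * C'(c) / X'(x_s)."""
    if dX == 0:
        raise ValueError("X'(x_s) must be nonzero")
    return -(p_s / z) * dC / dX

def tax_wedge(p_s, z, dC, dX):
    """Implicit marginal tax tau_s on task s."""
    return 1.0 - one_minus_tau(p_s, z, dC, dX)

def wedge_ratio_footnote9(p_s, z, dC, dX):
    """tau_s / (1 - tau_s) written directly in marginal costs (footnote 9)."""
    return -(z * dX + p_s * dC) / (p_s * dC)

# Hypothetical inputs: p_s = 1, z = 2, C'(c) = 1.5, X'(x_s) = -1
tau = tax_wedge(1.0, 2.0, 1.5, -1.0)                 # 0.25
ratio = wedge_ratio_footnote9(1.0, 2.0, 1.5, -1.0)   # 1/3 = tau / (1 - tau)
```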

4 Characterization

We next derive an optimality condition for the multidimensional tax problem that incorporates bunching.

4.1 Implementability Condition

The planner chooses consumption utility and labor disutility $(c,x)$ to minimize the Lagrangian:

$$\mathcal{L}(c,x)=\int\Big(\mathcal{C}(c)+z\big(\mathcal{X}(x_c)+\mathcal{X}(x_m)\big)-\lambda\big(c-p_cx_c-p_mx_m-\mathcal{U}\big)\Big)\pi\,\mathrm{d}p \tag{17}$$

subject to the incentive constraints (12), where $\lambda$ denotes the multiplier on the promise keeping constraint (13).[Footnote 10: We suppress the dependence on $p$ to streamline notation.]

Proposition 2.

Implementability Condition. Let $(c,x)$ denote a solution to the planner problem. Then the implementability condition

$$\int\big(\mathcal{C}'(c)\hat c+z\big(\mathcal{X}'(x_c)\hat x_c+\mathcal{X}'(x_m)\hat x_m\big)\big)\pi\,\mathrm{d}p\geq\lambda\int\big(\hat c-p_c\hat x_c-p_m\hat x_m\big)\pi\,\mathrm{d}p \tag{18}$$

holds for any feasible allocation $(\hat c,\hat x)\in\mathcal{I}$. At a solution, $(\hat c,\hat x)=(c,x)$, condition (18) holds with equality.

The proof is in Appendix A.4. Proposition 2 states that the implementability condition necessarily holds for any feasible allocation $(\hat c,\hat x)$. In this condition, two sets of marginal resource costs are evaluated at an optimum: the cost of providing consumption utility, $\mathcal{C}'(c)$, and the cost of providing disutility from work, $(\mathcal{X}'(x_c),\mathcal{X}'(x_m))$. The implementability condition therefore restricts the optimal allocation $(c,x)$, which must satisfy (18) for every feasible allocation $(\hat c,\hat x)$.

Proposition 2 combines two variational arguments. First, consider a small proportional change in consumption utility and labor disutility. This variation is feasible. The scaling can either increase or decrease the utility allocations, so (18) must hold with equality at the optimal allocation $(c,x)$. Second, consider a convex combination of an optimal allocation and any other feasible allocation, with a small weight on the latter. This convex combination is equivalent to scaling down the optimal allocation and adding a small positive perturbation. By the previous argument, rescaling does not change the Lagrangian at the optimal allocation. Since the planner minimizes the Lagrangian, the positive perturbation cannot decrease it, and this yields the inequality in (18).

Proposition 2 presents an implementability condition for an incentive constrained economy. Implementability conditions are more common in the Ramsey taxation literature, where they represent the distortions to allocations introduced by pre-specified taxes. In our model, we do not impose direct restrictions on the permissible taxes. Instead, an information friction endogenously restricts the set of allocations. Importantly, our implementability condition holds as an inequality, and, as we show, this is essential for characterizing the bunching regions.

4.2 Optimal Tax Condition as Stochastic Dominance

We use Proposition 2 to derive an optimality condition for our multidimensional taxation problem in terms of a stochastic dominance condition.

We first use the indirect utility (14) for a feasible allocation $(\hat c,\hat x)$ to write the implementability condition (18) as:

$$\int\big(\mathcal{C}'(c)\big(\hat u-\nabla\hat u\cdot p\big)-z\mathcal{X}'(x)\cdot\nabla\hat u\big)\pi\,\mathrm{d}p-\lambda\int\hat u\,\pi\,\mathrm{d}p\geq 0, \tag{19}$$

for any nonnegative, decreasing and convex indirect utility function $\hat u$. By Proposition 2, (19) holds with equality for an optimal indirect utility function. Integrating implementability condition (19) by parts, we obtain:

$$\int\Big(\partial_{p_c}\big(\pi(p_c\mathcal{C}'(c)+z\mathcal{X}'(x_c))\big)+\partial_{p_m}\big(\pi(p_m\mathcal{C}'(c)+z\mathcal{X}'(x_m))\big)\Big)\hat u\,\mathrm{d}p\geq\int\pi(\lambda-\mathcal{C}'(c))\hat u\,\mathrm{d}p+\Xi(\hat u), \tag{20}$$

for any nonnegative, decreasing and convex indirect utility function $\hat u$, where the boundary terms act on $\hat u$ as $\Xi(\hat u)=\int_{\underline p_m}^{\bar p_m}\pi\big(p_c\mathcal{C}'(c)+z\mathcal{X}'(x_c)\big)\hat u\Big|_{\underline p_c}^{\bar p_c}\mathrm{d}p_m+\int_{\underline p_c}^{\bar p_c}\pi\big(p_m\mathcal{C}'(c)+z\mathcal{X}'(x_m)\big)\hat u\Big|_{\underline p_m}^{\bar p_m}\mathrm{d}p_c$.

We now define second-order stochastic dominance (Shaked and Shanthikumar, 2007):

Definition.

The measure $\mu$ second-order stochastically dominates the measure $\nu$, or $\mu\succeq\nu$, if and only if for any nonnegative, decreasing and convex function $\hat u$:

$$\int\hat u(p)\,\mathrm{d}\mu\geq\int\hat u(p)\,\mathrm{d}\nu. \tag{21}$$

Second-order stochastic dominance requires that (21) hold for any nonnegative, decreasing, and convex function $\hat u$. These are exactly the conditions for the indirect utility to be feasible (Lemma 1). Applying the definition of second-order stochastic dominance to equation (20), we obtain the following theorem.
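The paper's condition is two-dimensional. As a minimal illustration of the definition only, the sketch below checks (21) in one dimension for measures on a finite grid, where the measures may be signed. It relies on a standard fact that we state as an assumption of the sketch. On a finite grid, every nonnegative, decreasing, convex function is a nonnegative combination of the constant $1$ and the hinge functions $(k-p)^+$ with kinks $k$ at grid points. Testing those generators is therefore enough. In two dimensions, no comparably simple finite family exists, and the sketch does not address that case. The function name is hypothetical.

```python
import numpy as np

def dominates_dcx(grid, mu, nu, tol=1e-12):
    """True if sum(f * mu) >= sum(f * nu) for every nonnegative, decreasing,
    convex f on a 1D grid (mu and nu may be signed weights on the grid).

    The test functions are generated by f = 1 and the hinges f(p) = max(k - p, 0)
    with k at grid points, so checking those generators suffices.
    """
    grid = np.asarray(grid, dtype=float)
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    if diff.sum() < -tol:                       # constant test function
        return False
    for k in grid:                              # hinge test functions
        if np.dot(np.maximum(k - grid, 0.0), diff) < -tol:
            return False
    return True

grid = [0.0, 1.0, 2.0]
spread = [0.5, 0.0, 0.5]   # mean-preserving spread of a point mass at 1
point = [0.0, 1.0, 0.0]
```

A mean-preserving spread dominates the point mass, since convex test functions reward dispersion. Mass at low $p$ dominates mass at high $p$, since the test functions are decreasing.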

Theorem.

Optimal Tax Condition as Stochastic Dominance. Suppose that the optimal allocation $(c,x)$, the density function, and the assignment are all continuously differentiable. Then,

$$\partial_{p_c}\big(\pi(p_c\mathcal{C}'(c)+z\mathcal{X}'(x_c))\big)+\partial_{p_m}\big(\pi(p_m\mathcal{C}'(c)+z\mathcal{X}'(x_m))\big)\succeq\pi(\lambda-\mathcal{C}'(c))+\Xi. \tag{22}$$

This theorem provides an optimality condition for the multidimensional taxation economy that incorporates bunching. It shows that, at the optimum, the measure over marginal tax revenues, $\pi\left(1/u'(\mathcal{C}(c))-\lambda\right)$, second-order stochastically dominates the measure over marginal tax distortions,

$$\partial_{p_c}\Big(\frac{\pi}{u'(\mathcal{C}(c))}p_c\frac{\tau_c}{1-\tau_c}\Big)+\partial_{p_m}\Big(\frac{\pi}{u'(\mathcal{C}(c))}p_m\frac{\tau_m}{1-\tau_m}\Big)+\Xi, \tag{23}$$

where we use the definition of the labor skill wedge (16) and footnote 9.

Comparing the costs and the benefits of taxes is the key insight of the classic ABC formula and of the analysis of Diamond (1998) and Saez (2001). In the classic unidimensional case, these costs and benefits are exactly equated at each skill level. Our theorem shows that the logic of the ABC formula carries over to the multidimensional case with bunching, in the sense that the costs and the benefits of taxes are compared. However, they are not necessarily equated at each skill level. Instead, our optimal tax condition (22) considers the benefits and the costs of the entire schedule of taxes. It states that the entire schedule of benefits of taxes should second-order stochastically dominate the entire schedule of distortions. This reflects the non-local nature of the problem with multidimensional skills in regions with bunching. Our condition applies both to regions with bunching and to regions without it. In regions without bunching, it reduces to equating the costs and the benefits of distortions at each skill level, so the problem there is local.[Footnote 11: In Appendix A.5, we develop the connection between our general optimal tax conditions and the classic ABC formula. Our optimal taxation condition as stochastic dominance is also related to the sweeping operator in Rochet and Choné (1998). More specifically, the existence of a version of the sweeping operator can be established using a variation of the Strassen Theorem (see Shaked and Shanthikumar (2007), Theorem 4.A.5). The optimal taxation formula goes beyond the existence of such an operator by further relating the entire schedule of costs to the entire schedule of benefits of optimal taxes.]

4.3 Global Optimal Tax Formula

We next provide an optimal tax formula for the multidimensional taxation problem as an equality. This representation also connects to the optimal tax formulas in unidimensional taxation problems, which are derived as equalities balancing the marginal costs and benefits of taxation (Mirrlees, 1971; Diamond, 1998; Saez, 2001). The main difference from these results is that the optimal tax formula in our multidimensional taxation problem explicitly accounts for global incentive constraints.

Theorem.

Global Optimal Tax Formula. Suppose the optimal allocation $(c,x)$, the density function, and the assignment are all continuously differentiable. Then,

$$\partial_{p_c}\Big(\frac{\pi}{u'(\mathcal{C}(c))}p_c\frac{\tau_c}{1-\tau_c}\Big)+\partial_{p_m}\Big(\frac{\pi}{u'(\mathcal{C}(c))}p_m\frac{\tau_m}{1-\tau_m}\Big)=\pi\left(\frac{1}{u'(\mathcal{C}(c))}-\lambda\right)-\Delta M(p), \tag{24}$$

where $M(p)$ is a positive semidefinite matrix that enforces the convexity of the indirect utility function, and $\Delta M(p)=\sum_{i,j}\frac{\partial^2}{\partial p_i\partial p_j}M_{ij}(p)$.
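To make the convexity correction concrete, the sketch below computes $\Delta M(p)=\sum_{i,j}\partial^2 M_{ij}/\partial p_i\partial p_j$ on a rectangular grid using second-order finite differences (`numpy.gradient`). It also evaluates the modified welfare weight $\omega(p)=1+\Delta M(p)/(\lambda\pi)$ described in the proof sketch. It assumes $M$ is given as an array of shape `(2, 2, n_c, n_m)`. The matrix field in the example is hypothetical, not a solution of the planner problem.

```python
import numpy as np

def double_divergence(M, p_c, p_m):
    """Delta M = sum_{i,j} d^2 M_ij / (dp_i dp_j) on a tensor grid.

    M has shape (2, 2, n_c, n_m); grid axis 0 is p_c and grid axis 1 is p_m.
    """
    coords = (p_c, p_m)
    total = np.zeros(M.shape[2:])
    for i in range(2):
        for j in range(2):
            d_i = np.gradient(M[i, j], coords[i], axis=i, edge_order=2)
            total += np.gradient(d_i, coords[j], axis=j, edge_order=2)
    return total

def welfare_weight(delta_M, lam, pi):
    """Modified social welfare weight omega(p) = 1 + Delta M(p) / (lambda * pi)."""
    return 1.0 + delta_M / (lam * pi)

# Hypothetical PSD field M(p) = v v^T with v = (p_c, p_m); then Delta M = 2 + 1 + 1 + 2 = 6
p_c = np.linspace(0.0, 1.0, 11)
p_m = np.linspace(0.0, 2.0, 21)
Pc, Pm = np.meshgrid(p_c, p_m, indexing="ij")
M = np.array([[Pc**2, Pc * Pm], [Pc * Pm, Pm**2]])
dM = double_divergence(M, p_c, p_m)
```

Second-order differences are exact for quadratic fields, so the example recovers $\Delta M=6$ at every grid point.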

The full proof is in Appendix A.6. Here we provide a sketch. First, we use the definition of the indirect utility function (14) to reformulate the planner problem as directly choosing an indirect utility function to minimize the resource cost of providing welfare. For the indirect utility function to be globally incentive compatible, the reformulated planning problem is constrained by the condition that the indirect utility function is convex and decreasing in worker type $p$, following the characterization in Lemma 1.

Second, the indirect utility function $u(p)$ is convex if and only if its Hessian is positive semidefinite, $H(u)\succeq 0$, for all worker types $p$. This in turn is equivalent to:

$$v^TH(u)v\geq 0, \tag{25}$$

for all vectors $v\in\mathbb{R}^2$. For each worker type $p$, these inequalities form an infinite family of constraints parameterized by the vectors $v$. For each of these constraints, we introduce a multiplier $\lambda(v,p)\geq 0$ and add the constraint to the Lagrangian for the planning problem.

Third, we establish (Lemma 8) that the convexity constraint on the indirect utility function can be represented as a matrix condition. To do so, we introduce a positive semidefinite Kuhn-Tucker matrix $M(p)$ for each worker $p\in P$. Instead of the infinite family of constraints for each worker $p$, a single positive semidefinite matrix $M(p)$ induces convexity of the indirect utility function. Upon integration by parts, this restriction appears as a modified social welfare weight, $\omega(p)=1+\frac{\Delta M(p)}{\lambda\pi}$. In summary, the main contribution of the convexity constraint to the planning problem is to modify the social welfare weights through a convexity correction. Finally, we derive in Appendix A.6.2 and Appendix A.6.3 how this modified social welfare weight translates into the optimal tax condition (24).


We now discuss in more detail how the optimal tax condition applies in regions without bunching. Specifically, we consider the domain where the indirect utility function is strongly convex and, therefore, there is no bunching.

The main difficulty in analyzing bunching in the multidimensional case is that the admissible indirect utility perturbations $\hat u$ must be convex. The convexity of perturbations thus acts as an additional constraint on the entire tax schedule. Without bunching, the perturbation argument is straightforward and leads to equating the costs and benefits of taxes at each skill level. Intuitively, if the underlying utility function is strongly convex, a small enough additive perturbation preserves convexity. As a result, the optimal tax condition (24) in the Global Optimal Tax Formula theorem applies with the convexity correction $\Delta M=0$ at the types where there is no bunching.

Corollary 3.

Multidimensional Optimal Tax Formula without Bunching. If the indirect utility function is strongly convex for a worker $p$, then:

$$\pi\left(\frac{1}{u'(\mathcal{C}(c))}-\lambda\right)=\partial_{p_c}\left(\frac{\pi}{u'(\mathcal{C}(c))}p_c\frac{\tau_c}{1-\tau_c}\right)+\partial_{p_m}\left(\frac{\pi}{u'(\mathcal{C}(c))}p_m\frac{\tau_m}{1-\tau_m}\right). \tag{26}$$

The proof is in Appendix A.7. To provide intuition for Corollary 3, and to connect our expression to the existing literature, we also write this condition in the original worker type coordinates $\alpha$:

$$\phi(\alpha)\left(\lambda-\frac{1}{u'(c(\alpha))}\right)=\frac{1}{\rho}\partial_{\alpha_c}\left(\frac{\phi(\alpha)}{u'(c(\alpha))}\alpha_c\frac{\tau_c}{1-\tau_c}\right)+\frac{1}{\rho}\partial_{\alpha_m}\left(\frac{\phi(\alpha)}{u'(c(\alpha))}\alpha_m\frac{\tau_m}{1-\tau_m}\right), \tag{27}$$

which has the same form as the conditions derived in Kleven, Kreiner, and Saez (2006, p. 23), Lehmann, Renes, Spiritus, and Zoutman (2021), and Golosov and Krasikov (2022). Here $\phi$ denotes the density function over the types $\alpha$. The left-hand side captures the marginal benefit of increasing taxes. Taxing worker $\alpha$ lowers the resource cost, at the cost $\lambda$ of tightening the promise keeping condition. At an optimum, this marginal benefit equals the marginal distortionary cost of increasing taxes, given by the right-hand side. The right-hand side captures the change in labor distortions, inversely weighted by the marginal utility of consumption. Distortionary costs of taxation scale with the elasticity of labor supply, which is governed by $\rho$. When the supply of tasks is elastic (low $\rho$), marginal distortionary costs are large. When it is inelastic (high $\rho$), they are small. All else equal, consider high-skill workers, whose marginal utility of consumption is low, $\lambda<1/u'(c(\alpha))$. For these workers, the labor skill distortion decreases with an increase in either cognitive or manual skills. When more workers are affected by a change in the skill distortions, or when the promise keeping constraint is tight, marginal labor distortions change more rapidly.

Finally, we provide a converse to Corollary 3 that allows us to determine the regions of bunching.

Proposition 4.

Identifying Bunching. If equation (26) does not hold for a worker type $p$, then this worker is bunched.

Proposition 4 thus provides a test to identify bunching: whenever equation (26) is violated, the worker is bunched. We prove Proposition 4 in Appendix A.8. By the contrapositive of Corollary 3, when equation (26) does not hold, the indirect utility function is not strongly convex, meaning that the Hessian matrix is degenerate for worker $p$. We show that the Hessian matrix is also degenerate for all workers in a neighborhood of $p$, and that this is equivalent to worker $p$ being bunched.
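Proposition 4 suggests a direct numerical diagnostic. One can evaluate the residual of (26) on a grid and flag the types where it is clearly nonzero. The sketch below does this with finite differences. Inputs on a tensor grid are assumed: the density $\pi$, the values $1/u'(\mathcal{C}(c))$, and the wedges $\tau_c,\tau_m$. The tolerance is a hypothetical choice that must exceed discretization error. This is an illustration, not the procedure used in the paper.

```python
import numpy as np

def tax_formula_residual(pi, inv_mu, tau_c, tau_m, p_c, p_m, lam):
    """Left side minus right side of equation (26) on a tensor grid.

    Grid axis 0 is p_c and axis 1 is p_m; inv_mu holds 1/u'(C(c)) at each point.
    """
    Pc, Pm = np.meshgrid(p_c, p_m, indexing="ij")
    flux_c = pi * inv_mu * Pc * tau_c / (1.0 - tau_c)
    flux_m = pi * inv_mu * Pm * tau_m / (1.0 - tau_m)
    divergence = (np.gradient(flux_c, p_c, axis=0, edge_order=2)
                  + np.gradient(flux_m, p_m, axis=1, edge_order=2))
    return pi * (inv_mu - lam) - divergence

def flag_bunching(residual, tol):
    """Types where (26) fails by more than tol; by Proposition 4 these are bunched."""
    return np.abs(residual) > tol

# Hypothetical inputs: zero wedges and 1/u' = lambda make (26) hold everywhere
p_c = np.linspace(1.0, 2.0, 5)
p_m = np.linspace(1.0, 2.0, 5)
pi = np.ones((5, 5))
tau = np.zeros((5, 5))
inv_mu = np.ones((5, 5))
inv_mu_bad = inv_mu.copy()
inv_mu_bad[2, 2] = 2.0   # break (26) at a single type
```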

4.4 Legendre Linearization

In this section, we discuss the main technique that enables the numerical solution of our problem. Specifically, we transform our problem into a linear problem using the Legendre transform, which represents a convex function as the upper envelope of its tangent lines. To explain the Legendre transform, and to show its importance, we use the resource cost of providing consumption utility $\mathcal{C}$ as an example.

A convex function lies above all of its tangent lines. For any consumption utility $c$ and any point of tangency $a$:

$$\mathcal{C}(c)\geq\mathcal{C}(a)+(c-a)\mathcal{C}'(a)=\varphi c-\mathcal{C}^*(\varphi), \tag{28}$$

where the equality follows by parameterizing the tangent lines by their slope $\varphi:=\mathcal{C}'(a)$ and letting $\mathcal{C}^*(\varphi):=-\mathcal{C}(a)+a\mathcal{C}'(a)$ for $a={\mathcal{C}'}^{-1}(\varphi)$. The function $\mathcal{C}^*$ is the Legendre transform of the resource cost of providing consumption utility $\mathcal{C}$. A convex function lies above all its tangent lines, and the function value equals the value of the tangent line at the point of tangency. Therefore:

$$\mathcal{C}(c)=\max_{\varphi\geq 0}\;\varphi c-\mathcal{C}^*(\varphi). \tag{29}$$

The Legendre transformation converts the convex resource cost of providing consumption utility on the left side of (29) into a family of linear constraints on the right, parameterized by the slopes of the tangent lines of the cost function. Since the resource cost increases with consumption utility, the slopes of the tangent lines are nonnegative, $\varphi\geq 0$.
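As a numerical check of (28) and (29), the sketch below recovers a convex cost from a finite set of its tangent lines. For illustration it assumes log utility of consumption, so that the resource cost of providing utility $c$ is $\mathcal{C}(c)=e^c$. Its transform is then $\mathcal{C}^*(\varphi)=\varphi\log\varphi-\varphi$, following the definition $-\mathcal{C}(a)+a\mathcal{C}'(a)$ with $a=\log\varphi$. This functional form and the slope grid are our assumptions, not the paper's calibration. For each fixed slope, the expression is linear in $c$, which is the property the transformed planning problem exploits.

```python
import numpy as np

def cost(c):
    # Hypothetical resource cost C(c) = exp(c) (log utility of consumption)
    return np.exp(c)

def cost_conjugate(phi):
    # C*(phi) = -C(a) + a * C'(a) with a = C'^{-1}(phi) = log(phi)
    a = np.log(phi)
    return -np.exp(a) + a * phi

def cost_from_tangents(c, phis):
    """Approximate C(c) = max_phi [phi * c - C*(phi)] over finitely many slopes.

    Each slope contributes one line in c; the cost is their upper envelope.
    """
    c = np.atleast_1d(c)
    lines = np.outer(c, phis) - cost_conjugate(phis)[None, :]
    return lines.max(axis=1)

phis = np.linspace(0.01, 5.0, 5000)   # slopes phi >= 0, as in (29)
cs = np.linspace(-1.0, 1.0, 9)
approx = cost_from_tangents(cs, phis)
```

Each tangent line lies below $\mathcal{C}$, so the envelope approaches $\mathcal{C}$ from below as the slope grid is refined. The same construction applies to $\mathcal{X}$ in (30), with slopes restricted to $\psi_s\leq 0$.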

The previous steps apply for any convex function, allowing us to use the same argument to transform the resource cost of providing work disutility into a family of linear constraints:

$$\mathcal{X}(x_s)=\max_{\psi_s\leq 0}\;\psi_s x_s-\mathcal{X}^*(\psi_s), \tag{30}$$

for each skill $s\in\mathcal{S}$. An increase in production disutility increases production and therefore lowers resource costs. Since the resource cost of production disutility is decreasing, the slopes of its tangent lines are nonpositive, $\psi_s\leq 0$.

To summarize, the transformed planning problem is to minimize the resource cost of providing utilitarian welfare 𝒰\mathcal{U}:

$$\int\Big(\max_{\varphi(p)\geq 0}\big(\varphi(p)c(p)-\mathcal{C}^*(\varphi(p))\big)+z(p)\sum_s\max_{\psi_s(p)\leq 0}\big(\psi_s(p)x_s(p)-\mathcal{X}^*(\psi_s(p))\big)\Big)\pi(p)\,\mathrm{d}p \tag{31}$$

subject to the incentive constraints (12) for all pairs of workers $(p,q)\in P\times P$, and the linear promise keeping condition (13).[Footnote 12: In Section A.9, we show this problem is equivalent to maximizing utilitarian welfare subject to the resource constraint and the incentive constraints. In Section A.10, we establish how to derive the stochastic dominance condition and the general optimal tax formula directly from the transformed problem.]


Numerical Approach. The main insight of this analysis is that the Legendre transform allows us to translate the planning problem into a linear problem (see Appendix A.11 for more detail). This is why we are able to solve the model for a total of 40 thousand worker types, with 200 types in both the cognitive and the manual dimension, and a total of 1.6 billion incentive constraints. Importantly, a large number of types and high numerical precision are not merely a technical or computational curiosity. They are essential to characterize the regions and nature of bunching. In addition, we take two further steps to reduce the number of effective incentive constraints.

First, we start from a small set of incentive constraints, including a constraint between two worker types only if the distance between them is small.[Footnote 13: Oberman (2013) shows that the solution to the problem with only local constraints provides a reasonable initial guess.] We then use an iterative procedure to update the set of incentive constraints: at each step, we add all violated incentive constraints to the problem.[Footnote 14: After the final step, the candidate solution satisfies all constraints of the strictly convex optimization problem and hence is the unique solution. In practice, we always obtain the same solution for different initial conditions.] With 40 thousand types, this procedure reduces the number of incentive constraints to about 4 million instead of 1.6 billion. Second, we do not need to consider reducible incentive constraints (see Section A.2), which reduces the number of constraints by a further factor of two. In Appendix A.11, we further prove the accuracy of the approximate planner problem and describe the algorithm that we use to characterize the numerical solution. Finally, we note that without Legendre transforms the objective is nonlinear. Even state-of-the-art nonlinear solvers currently cannot characterize the solution, even for small numbers of types.
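The sketch below illustrates the two bookkeeping steps of the iterative procedure. The first step enumerates incentive constraints only between nearby types on the grid. The second step finds every constraint that a candidate allocation violates, so that those constraints can be added before re-solving. The linear-program solve itself is omitted. Two elements are our own assumptions. We assume the incentive constraint (12) takes the form $c(p)-p\cdot x(p)\geq c(q)-p\cdot x(q)$, which matches the indirect utility $u=c-p\cdot x$ implicit in (19). The `radius` parameter and all function names are hypothetical.

```python
import numpy as np

def local_pairs(n, radius):
    """Ordered pairs (i, j), i != j, of types on an n-by-n grid within `radius` grid steps.

    Types are indexed row-major: index = i_c * n + i_m.
    """
    r = int(np.floor(radius))
    offsets = [(a, b) for a in range(-r, r + 1) for b in range(-r, r + 1)
               if (a, b) != (0, 0) and a * a + b * b <= radius * radius]
    idx = np.arange(n * n).reshape(n, n)
    blocks = []
    for a, b in offsets:
        # Types whose neighbour (i_c + a, i_m + b) stays on the grid, and that neighbour
        src = idx[max(0, -a):n - max(0, a), max(0, -b):n - max(0, b)]
        dst = idx[max(0, a):n - max(0, -a), max(0, b):n - max(0, -b)]
        blocks.append(np.stack([src.ravel(), dst.ravel()], axis=1))
    return np.concatenate(blocks) if blocks else np.empty((0, 2), dtype=int)

def violated_pairs(p, c, x, tol=1e-9):
    """Pairs (i, j) where type i strictly prefers type j's bundle.

    Assumes the constraint c(p) - p.x(p) >= c(q) - p.x(q); p and x are (N, 2), c is (N,).
    """
    own = c - np.einsum("ij,ij->i", p, x)   # u(p_i) = c_i - p_i . x_i
    mimic = c[None, :] - p @ x.T             # mimic[i, j] = c_j - p_i . x_j
    return np.argwhere(mimic > own[:, None] + tol)

n_types = 200 * 200                           # 40 thousand worker types
all_constraints = n_types ** 2                # 1.6 billion ordered (p, q) pairs
```

For example, with `p = [[1, 1], [2, 2]]`, `c = [1, 5]` and `x = [[0, 0], [1, 1]]`, the first type gains by mimicking the second, so `violated_pairs` returns `[[0, 1]]`.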

5 Positive Economy

We describe and characterize an equilibrium in a positive model of workers with multidimensional skills sorting with heterogeneous firms.


Every firm $z$ takes the wage schedule $w$ as given and chooses two workers to solve:

$$\Omega(z)=\max_{x_1,x_2}\;y(x_1,x_2,z)-w(x_1)-w(x_2). \tag{32}$$

We define the surplus $S$ as output minus payments to the workers and the firm:

$$S(x_1,x_2,z)=y(x_1,x_2,z)-w(x_1)-w(x_2)-\Omega(z). \tag{33}$$

Firm output cannot exceed payments to its workers and owner, that is, $S(x_1,x_2,z)\leq 0$ for any triplet $(x_1,x_2,z)$.

Every worker takes the wage schedule $w$ as given and chooses their cognitive and manual task inputs $x$ to solve:

$$\max_{x_c,x_m}\;u(c)-v\Big(\frac{x_c}{\alpha_c}\Big)-v\Big(\frac{x_m}{\alpha_m}\Big) \tag{34}$$

subject to the budget constraint $c=(1-\tau)w(x)$, where $w(x)=w(x_c,x_m)$ is the wage as a function of cognitive and manual inputs, and the disutility from work is given by (2). The government taxes earnings at a rate $\tau$ to finance public expenditures $G$ that are not valued by workers.

The resource constraint is given by:

$$\int y(x_1,x_2,z)\,\mathrm{d}\gamma(x_1,x_2,z)=\int c(\alpha)\,\mathrm{d}\Phi(\alpha)+\int\Omega(z)\,\mathrm{d}F_z(z)+G. \tag{35}$$

Total production, $\int y(x_1,x_2,z)\,\mathrm{d}\gamma(x_1,x_2,z)$, equals output distributed to workers, $\int c(\alpha)\,\mathrm{d}\Phi(\alpha)$, to firms, $\int\Omega(z)\,\mathrm{d}F_z(z)$, and to public expenditures $G$.


Equilibrium. Given fiscal policy $(\tau,G)$, an equilibrium consists of a firm value function $\Omega$, a wage schedule $w$, a worker input distribution $F_x$, a feasible assignment $\gamma$, and an allocation $\{(c(\alpha),x_c(\alpha),x_m(\alpha))\}$. These must satisfy four conditions: firms solve their profit maximization problem (32), workers solve the worker's problem (34), the government budget constraint $G=\tau\int w(x)\,\mathrm{d}\Phi(\alpha)$ holds, and the resource constraint (35) holds.


The equilibrium assignment is the assignment that maximizes aggregate output, that is, it solves the primal problem, while the equilibrium wages $w$ and firm value function $\Omega$ solve the corresponding dual problem. For completeness, Appendix A.12 discusses the characterization of the equilibrium assignment, wage schedule, and firm value function through the primal and dual problems.

5.1 Characterizing Equilibrium

We note that solving for the equilibrium assignment in the positive economy follows the same steps as solving for the planner assignment (9) in Section 3.1. It follows from Proposition 1 that the equilibrium features self-sorting between workers and coworkers, and positive sorting between team quality and firm project values.

To characterize wages and firm values, we solve the dual transport problem. In equilibrium, the surplus is nonpositive for any triplet, $S(x_1,x_2,z)\leq 0$. The aggregate resource constraint (35), the government budget constraint, and the household budget constraints also hold in equilibrium. Together, these imply that the surplus equals zero almost everywhere with respect to the equilibrium assignment, so $w(x_1)+w(x_2)+\Omega(z)=y(x_1,x_2,z)$. That is, output is fully distributed to the owner and to the workers. We use this condition to establish further properties of the firm value function and the wage schedule in Section A.14.

In Section A.14, we first note that wages are only a function of effective worker skills $X=x_c^2+x_m^2$, and we define the firm's total wage bill as $h(X)=2w(x)$. By standard arguments from optimal transport, wages are convex in effective worker skill $X$. In other words, small differences in effective worker skill translate into increasingly large differences in earnings.[Footnote 15: The hedonic pricing condition $z=h'(X)$ delivers superstar effects in our model as well as in a number of other assignment models (see, for example, Rosen (1981), Gabaix and Landier (2008), Tervio (2008), Scheuer and Werning (2017), and Boerma, Tsyvinski, and Zimin (2025)).] Moreover, the firm value function is the Legendre transform of the wage bill, $\Omega=h^*$. As a result, $h(X)+h^*(z)=zX$.


In our quantitative analysis, we infer the distribution of project values $F_z$ using earnings data. The key is to show that there exists a firm project $z$ such that $h(X)+h^*(z)=zX$ for any pairing $(z,X)$. When the wage bill $h$ is continuously differentiable, $h(X)+h^*(z)=zX$ implies $z=h'(X)$. That is, the derivative of the firm's wage bill equals its project value. Given an increasing and convex wage bill $h$ and effective skills $X$, this condition identifies increasing values for firm productivity $z$.

We apply this logic to the parametric, continuously differentiable function $h(X)=X^\eta+2\zeta$, where $\eta\geq 1$ governs the convexity of wages and $\zeta$ captures the lowest wage per worker. Using the derived fact that $z=h'(X)$, we can relate the distribution of firm projects $z$ to the convexity parameter $\eta$ of the wage bill. If $\eta=1$, there is no dispersion in firm productivity. We formalize this in Lemma 3.
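The sketch below works through the parametric wage bill. It computes project values from $z=h'(X)=\eta X^{\eta-1}$ and checks the Legendre identity $h(X)+h^*(z)=zX$. It also confirms that $h(X)=2w(x)$ for the wage function in (36). We take $h^*$ to be the convex conjugate $h^*(z)=\max_X\,[zX-h(X)]$, which has a closed form for $\eta>1$. The effective-skill values and $\zeta=0.2$ are hypothetical. $\eta=1.1$ is the calibrated value from Section 6.2.

```python
import numpy as np

def wage_bill(X, eta, zeta):
    """h(X) = X**eta + 2*zeta, the firm's total wage bill."""
    return X ** eta + 2.0 * zeta

def wage(x_c, x_m, eta, zeta):
    """Wage function (36): w(x) = 0.5 * (x_c^2 + x_m^2)**eta + zeta."""
    return 0.5 * (x_c ** 2 + x_m ** 2) ** eta + zeta

def project_value(X, eta):
    """Hedonic condition z = h'(X) = eta * X**(eta - 1)."""
    return eta * X ** (eta - 1.0)

def firm_value(z, eta, zeta):
    """Omega(z) = h*(z) = max_X [z X - h(X)], attained at X = (z/eta)**(1/(eta-1)); eta > 1."""
    X_star = (z / eta) ** (1.0 / (eta - 1.0))
    return z * X_star - wage_bill(X_star, eta, zeta)

X = np.array([0.5, 1.0, 2.0, 4.0])   # hypothetical effective skills
z = project_value(X, 1.1)            # increasing in X because eta > 1
```

With $\eta=1$, `project_value` returns $1$ for every $X$, which is the case of no dispersion in firm productivity noted above.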

Lemma 3.

For some firm distribution $F_z$, there exists an equilibrium with (i) a self-sorted assignment, and (ii) a wage function:

$$w(x)=\frac{1}{2}\big(x_c^2+x_m^2\big)^\eta+\zeta. \tag{36}$$

The proof is in Appendix A.15. The idea is to show there is a firm distribution FzF_{z} so that given wage schedule (36), workers and firms both optimize in a self-sorting equilibrium. Given the firm technology (4) and the wage equation (36), firm profits decrease in the difference between their workers’ skills. In order to minimize output losses, firms thus hire pairs of identical workers. Given wage equation (36), the worker problem (34) has a unique solution, so that the distribution of worker inputs FxF_{x} is uniquely determined by the worker problem. Finally, we map the firm distribution that induces (36) as an equilibrium wage equation using z=h(X)z=h^{\prime}(X). We use these steps to pointwise identify the worker skill distribution as we show in Section 6.

6 Quantitative Analysis

In this section, we infer the distribution of cognitive and manual talents $\Phi$. This distribution is a central input for the optimal tax formula. Our approach to inferring it generalizes the unidimensional approach of Saez (2001) to a labor market model with multidimensional skills and coworker and firm effects. We also calibrate the parameter $\rho$ that governs the curvature of disutility with respect to effort.

6.1 Data Sources

We use data from the American Community Survey (ACS). We consider individuals between 25 and 60 years of age. The final sample from the ACS includes almost 16 million individuals between 2000 and 2019. For all our results, we use the sample weights provided by the survey. Our measure of labor income is wage and salary income before taxes over the past 12 months.[Footnote 16: This measure includes wages, salaries, commissions, cash bonuses, tips, and other money income received from an employer.] We drop individuals with earnings below a threshold to focus on workers who are attached to the labor market. As in Guvenen, Ozkan, and Song (2014), this minimum is one-half of the federal minimum wage times 13 weeks at 40 hours per week.
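The earnings cutoff is simple arithmetic: half the hourly minimum wage times 520 hours. The sketch below applies it. The year-by-year minimum wage schedule used in the sample is not given here. The example uses \$7.25, the federal minimum wage in effect since 2009, purely as an illustration, and the function names are hypothetical.

```python
def earnings_threshold(federal_min_wage):
    """One-half of the federal minimum wage times 13 weeks at 40 hours per week."""
    return 0.5 * federal_min_wage * 13 * 40

def keep_attached(earnings, federal_min_wage):
    """Keep annual earnings at or above the threshold; drop those below it."""
    cutoff = earnings_threshold(federal_min_wage)
    return [e for e in earnings if e >= cutoff]

cutoff_2009 = earnings_threshold(7.25)   # 1885.0 dollars per year
```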

The ACS contains occupational information for every worker. We match each worker to the task intensity of their occupation using the O*NET task measures from Acemoglu and Autor (2011). Our cognitive measure is the average of the cognitive task measures, and our manual measure is the average of the manual task measures. The resulting scores are approximately normally distributed across occupations.

For identification, we first construct a measure of relative task intensity by occupation. To obtain aggregated task production levels we use a Cobb-Douglas technology to map worker subtasks into final task production similar to Kremer (1993), Acemoglu and Autor (2011) and Deming (2017):

$$q_s=\exp\bigg(\frac{1}{|\mathcal{V}|}\sum_{\nu\in\mathcal{V}}\log q_{s\nu}\bigg). \tag{37}$$

Letting $\log q_{s\nu}$ be the $Z$-score for subtask $\nu$, we obtain cognitive and manual task production levels. Since our aggregated cognitive and manual measures are approximately normally distributed, task production levels are approximately lognormal. We now make the identification assumption that the relative task input equals the relative task production level, $x_m/x_c=q_m/q_c$. Relative task input is therefore also approximately lognormally distributed across occupations.
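Equation (37) is a geometric mean: the log task level is the average subtask $Z$-score. The sketch below computes task levels and the implied relative input under the identification assumption $x_m/x_c=q_m/q_c$. The $Z$-scores in the example are hypothetical and not drawn from O*NET.

```python
import math

def task_level(z_scores):
    """Equation (37): exp of the mean of log q_{s,nu}, where each log q_{s,nu} is a Z-score."""
    return math.exp(sum(z_scores) / len(z_scores))

def relative_manual_input(cognitive_z, manual_z):
    """Identification assumption x_m / x_c = q_m / q_c."""
    return task_level(manual_z) / task_level(cognitive_z)

# Hypothetical occupation: cognitive subtasks one standard deviation above the mean,
# manual subtasks at the mean, so x_m / x_c = exp(0) / exp(1) = exp(-1).
ratio = relative_manual_input([1.0, 1.0], [0.0, 0.0])
```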

Figure 1: Task Intensity Across Occupations

Figure 1 shows the distribution of manual and cognitive task production levels across occupations in logs (left and center panel) together with the relative distribution of manual and cognitive task intensity (right panel). Each distribution is well-approximated by a lognormal distribution.

Figure 1 shows the distribution of manual and cognitive task production levels across occupations in logs together with the relative distribution of manual and cognitive task intensity. The first two panels show that the distribution of manual task production levels and the distribution of cognitive task production levels can be described by a lognormal distribution. The right panel shows that the same holds for the relative manual task intensity.

Figure 2 displays the relation between relative task intensity and average earnings across occupations. Earnings are low for occupations with high manual task intensity, such as gardeners and truck drivers. They are high for occupations with high cognitive task intensity, such as software developers and actuaries. Moving from the 25th to the 75th percentile of relative manual task intensity lowers average earnings from 62 to 35 thousand dollars.

Figure 2: Earnings and Relative Task Intensity

Figure 2 shows the relation between average earnings (y-axis, logarithmic scale) and relative task intensity across occupations. Average earnings are decreasing in relative manual task intensity. The size of each circle corresponds to the occupation’s employment share.

6.2 Calibration

We now calibrate the positive model. We parameterize fiscal policy and preferences, and infer the underlying multidimensional skill distribution.

The government taxes labor earnings to finance expenditures $G$. If pre-tax earnings are $w$, taxes are $T(w)=\tau w$, so after-tax earnings are $(1-\tau)w$. We set $\tau=0.3$.

Firm heterogeneity governs the convexity of the wage schedule (see Lemma 3). We set the curvature parameter of the wage schedule, $\eta$, so that the variation in log wages added by firm heterogeneity matches evidence on the variation in log wages due to firm effects. By the wage equation (36), variation in firm projects multiplies the underlying variation across workers by $\eta^{2}$. Firm effects therefore account for a share $1-1/\eta^{2}$ of wage variation. Setting $\eta=1.1$ attributes about 17 percent of wage variation to firm effects, in line with estimates from the literature.[17]

[17] For example, Abowd, Lengermann, and McKinney (2003) find that firm variation makes up 17 percent of the variance in wages. Song, Price, Guvenen, Bloom, and Von Wachter (2019) instead report that firm variation makes up between 8 and 12 percent.
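The calibration of $\eta$ reduces to simple arithmetic. Since firms scale wage variation by $\eta^2$, their share of total variation is $1-1/\eta^2$, and the target share can be inverted for $\eta$. The helper names are hypothetical. The sketch takes the "multiplies by $\eta^2$" statement above at face value.

```python
def firm_share(eta):
    """Share of wage variation due to firms when firms scale it by eta**2."""
    return 1.0 - 1.0 / eta**2

def eta_for_share(share):
    """Invert firm_share: the eta that attributes `share` of variation to firms."""
    return (1.0 - share) ** -0.5

share_at_calibration = firm_share(1.1)  # 1 - 1/1.21, about 0.174
eta_for_target = eta_for_share(0.17)    # about 1.098, which rounds to 1.1
```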

We next discuss the calibration of worker preferences. We use linear utility from consumption, $u(c)=c$, and set the curvature $\rho$ of the disutility of effort in each task. We choose $\rho$ so that a regression of log market hours on log hourly wages, holding constant the marginal value of wealth, yields a coefficient of 0.55. This target comes from the meta-analysis of estimates of the intensive-margin Frisch elasticity in Chetty, Guren, Manoli, and Weber (2012).

To map estimates of the Frisch elasticity of total hours with respect to hourly productivity into the curvature of the disutility of effort, we derive this elasticity within our model. Given the specification for the disutility from work (2), linear utility from consumption, and the worker technology (3), the worker’s problem (34) is:

$$\max_{x_{c},x_{m}}\;\frac{1}{2}(1-\tau)\big(x_{c}^{2}+x_{m}^{2}\big)^{\eta}-\kappa\Big(\frac{x_{c}}{\alpha_{c}}\Big)^{\rho}-\kappa\Big(\frac{x_{m}}{\alpha_{m}}\Big)^{\rho}. \tag{38}$$

The optimality condition of the worker’s problem for each task $s\in\mathcal{S}$ is:

$$(1-\tau)\,\eta\,\big(2w(x)\big)^{\frac{\eta-1}{\eta}}=\kappa\rho\,\frac{x_{s}^{\rho-2}}{\alpha_{s}^{\rho}}, \tag{39}$$

where $w(x)=\frac{1}{2}(x_{c}^{2}+x_{m}^{2})^{\eta}+\zeta$ by wage equation (36), with $\zeta$ representing minimum earnings in our data. That is, the marginal consumption utility from supplying additional tasks equals the marginal cost of effort. Taking the ratio of these optimality conditions across the two tasks, the skill, effort, and task intensity ratios are related by:

$$\frac{\alpha_{m}}{\alpha_{c}}=\Big(\frac{x_{m}}{x_{c}}\Big)^{\frac{\rho-2}{\rho}}=\Big(\frac{\ell_{m}}{\ell_{c}}\Big)^{\frac{\rho-2}{2}}, \tag{40}$$

where the second equality follows from the worker task technology (3). The marginal rate of substitution between activities, $(\ell_{c}/\ell_{m})^{\rho-1}$, equals the ratio of marginal benefits between activities, $(\alpha_{c}/\alpha_{m})^{2}\,\ell_{c}/\ell_{m}$. Solving gives $\ell_{c}/\ell_{m}=(\alpha_{c}/\alpha_{m})^{2/(\rho-2)}$, so relative efforts are determined by relative skills $\alpha_{c}/\alpha_{m}$. Since $\rho>2$, workers spend more effort on tasks in which they are more talented.

Using the first-order conditions for effort, and observing that by (40) the share of total effort devoted to each task is constant, we can express the Frisch elasticity of total hours $\ell_{c}+\ell_{m}$ as:[18]

$$\varepsilon=\frac{\partial\log(\ell_{c}+\ell_{m})}{\partial\log z(x)}\bigg|_{\lambda}=\frac{\partial\log(\ell_{c}+\ell_{m})}{\partial\log(1-\tau)}\bigg|_{\lambda}=\frac{1}{\rho-1}, \tag{41}$$

where $\lambda$ is the marginal value of wealth and $z(x):=w(x)/(\ell_{c}+\ell_{m})$ is productivity per hour. We set $\rho=2.8$, so that the Frisch elasticity is $\varepsilon=1/1.8\approx0.56$, close to the 0.55 target. Finally, we normalize $\kappa=\frac{1}{2\rho}$.

[18] See Section A.16.
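The mapping between the Frisch elasticity target and $\rho$ follows directly from (41). The sketch below applies (41) in both directions and checks the normalization $\kappa\rho=\frac{1}{2}$ used in the identification examples. The helper names are hypothetical, and (41) itself is taken as given from the derivation above.

```python
def frisch_elasticity(rho):
    """Equation (41): Frisch elasticity of total hours, 1 / (rho - 1)."""
    return 1.0 / (rho - 1.0)

def rho_for_frisch(eps):
    """Invert (41) for the curvature rho that delivers elasticity eps."""
    return 1.0 + 1.0 / eps

rho = 2.8
eps = frisch_elasticity(rho)      # 1 / 1.8, about 0.556
kappa = 1.0 / (2.0 * rho)         # normalization, so that kappa * rho = 1/2
exact_rho = rho_for_frisch(0.55)  # about 2.82, rounded to 2.8 in the calibration
```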


Skill Distribution. We now identify the skill distribution pointwise. We use the solution to the worker’s problem, together with data on both total earnings and occupational relative task intensity for each worker. This lets us separately identify two sources of worker productivity $(\alpha_{c},\alpha_{m})$ that rationalize the data as a model outcome. The identification argument is similar to Boerma and Karabarbounis (2020, 2021). They use explicit solutions to home production models to identify productivity at home, and to identify permanent and transitory market productivity, using data on consumption and on home and market hours.

The O*NET task measures give the relative task intensity $x_{m}/x_{c}$ of each occupation, so equation (40) identifies relative skills $\alpha_{m}/\alpha_{c}$. To determine the level of tasks, we use the wage equation (36):

$$w(x)=\frac{1}{2}\big(x_{c}^{2}+x_{m}^{2}\big)^{\eta}=\frac{x_{c}^{2\eta}}{2}\bigg(1+\Big(\frac{x_{m}}{x_{c}}\Big)^{2}\bigg)^{\eta}. \tag{42}$$

Given the relative task intensity of an individual’s occupation, $x_{m}/x_{c}$, and the individual’s earnings $w(x)$, this equation uniquely determines the level of cognitive tasks $x_{c}$, and hence the level of manual tasks $x_{m}$. The optimality condition (39) then identifies both cognitive skills $\alpha_{c}$ and manual skills $\alpha_{m}$ for each worker.
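The two-step identification (solve (42) for $x_c$, then invert (39) for $\alpha_s^\rho$) can be sketched as follows. The sketch uses (42) as written, without the minimum-earnings shift $\zeta$, together with $\rho=2.8$ and $\kappa=\frac{1}{2\rho}$. The function name `identify` is hypothetical. Applied to the five cases in Table 1, it reproduces the task and skill columns up to rounding in the last digit.

```python
import numpy as np

def identify(w, ratio, tau=0.0, eta=1.0, rho=2.8):
    """Recover (x_c, x_m, alpha_c**rho, alpha_m**rho) from earnings and x_m/x_c.

    Uses wage equation (42) without zeta, first-order condition (39),
    and the normalization kappa = 1 / (2 rho).
    """
    kappa = 1.0 / (2.0 * rho)
    # (42): w = x_c**(2 eta) / 2 * (1 + ratio**2)**eta, solved for x_c
    x_c = (2.0 * w) ** (1.0 / (2.0 * eta)) / np.sqrt(1.0 + ratio**2)
    x_m = ratio * x_c
    # (39): (1 - tau) eta (2w)**((eta - 1)/eta) = kappa rho x_s**(rho - 2) / alpha_s**rho
    lhs = (1.0 - tau) * eta * (2.0 * w) ** ((eta - 1.0) / eta)
    a_c = kappa * rho * x_c ** (rho - 2.0) / lhs
    a_m = kappa * rho * x_m ** (rho - 2.0) / lhs
    return x_c, x_m, a_c, a_m

# Cases of Table 1: (earnings w, ratio x_m/x_c, tau, eta)
examples = {
    "baseline": (1.0, 1.0, 0.0, 1.0),
    "task intensity": (1.0, 3.0, 0.0, 1.0),
    "wages": (4.0, 1.0, 0.0, 1.0),
    "taxes": (1.0, 1.0, 0.3, 1.0),
    "firms": (1.0, 1.0, 0.0, 1.1),
}
results = {name: identify(*args) for name, args in examples.items()}
for name, (x_c, x_m, a_c, a_m) in results.items():
    print(f"{name:15s} x_m={x_m:.2f} x_c={x_c:.2f} "
          f"a_m^rho={a_m:.2f} a_c^rho={a_c:.2f}")
```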

Table 1: Example of Identification
     Example            Relative task   Wages   Task intensity    Task skills
                        x_m/x_c         w(x)    x_m     x_c       α_m^ρ   α_c^ρ
1    Baseline           1               1       1.00    1.00      0.50    0.50
2    Task intensity     3               1       1.35    0.45      0.63    0.26
3    Wages              1               4       2.00    2.00      0.87    0.87
4    Taxes (τ = 0.3)    1               1       1.00    1.00      0.71    0.71
5    Firms (η = 1.1)    1               1       0.97    0.97      0.42    0.42

Table 1 illustrates the identification of workers’ manual and cognitive skills through five examples. We infer higher levels of manual skills with higher manual task intensity (Row 2), higher earnings (Row 3), higher taxes (Row 4), and less dispersion in firms’ project values (Row 5).


Examples. To provide insight into the identification of worker skill heterogeneity, we consider numerical examples. We first consider an economy without taxes, $\tau=0$, and without heterogeneity in firm projects, $\eta=1$.

Suppose a worker’s occupational relative task intensity equals one, $q_{m}/q_{c}=x_{m}/x_{c}=1$, and their earnings equal mean earnings, which we normalize to one. By equation (42), the worker’s cognitive and manual task intensities both equal 1. Using the optimality condition for task inputs (39), $\alpha_{s}^{\rho}=\frac{1}{2}$, so the worker is equally skilled in both tasks. This worker is presented in the first row of Table 1.

Inferred manual skill increases with manual task intensity. Consider a worker with relative manual task intensity equal to three, $x_{m}/x_{c}=3$, and average earnings. By equation (42), the cognitive task intensity is $x_{c}=1/\sqrt{5}<1$, so the worker’s manual task intensity is greater, $x_{m}=3/\sqrt{5}>1$. Since $\alpha_{s}^{\rho}=\frac{1}{2}x_{s}^{\rho-2}$ is increasing in $x_{s}$ for $\rho>2$, the worker’s inferred manual skill increases with relative manual task intensity. The worker’s inferred cognitive skill decreases, as shown in the second row of Table 1.

Inferred skill levels increase with earnings. For a worker with a relative task intensity of one but a high level of earnings, relative skill intensity is still one, but the level of each task is greater. Consider a worker earning four times average earnings. By equation (42), the worker’s cognitive task intensity, and therefore the worker’s manual task intensity, equals 2. Using the optimality condition for task inputs (39), $\alpha_{s}^{\rho}=\frac{1}{2}2^{\rho-2}$. The worker is equally skilled in both tasks, and $\alpha_{s}^{\rho}$ is $2^{\rho-2}\approx1.74$ times its value for a worker in the same occupation earning average earnings. This worker is presented in the third row of Table 1.

The presence of taxes does not affect inferred task intensities $x$, but it does increase inferred skill levels $\alpha$. Since task intensities are identified from pre-tax earnings (42), they do not vary with taxes. For $\eta=1$, we therefore obtain $\alpha_{s}^{\rho}=\frac{1}{2(1-\tau)}x_{s}^{\rho-2}$. Taxes reduce the marginal benefit from completing tasks. To rationalize the same levels of cognitive and manual task intensity, completing tasks must be less costly for the worker, which requires higher skills. This is shown in the fourth row of Table 1.

Finally, greater dispersion in firm project values reduces the part of wage dispersion that is attributed to dispersion in task intensity. Consider dispersion in firm projects, $\eta>1$. Rearranging the wage equation (42) gives $x_{c}=(2w(x))^{\frac{1}{2\eta}}\big/\sqrt{1+(x_{m}/x_{c})^{2}}$, which shows that higher values of $\eta$ compress the dispersion in task intensity. Combining the first-order condition (39) with the wage equation (42) yields $\alpha_{s}^{\rho}\propto w(x)^{\frac{\rho}{2\eta}-1}$, so an increase in $\eta$ decreases the effective dispersion in skills. Dispersion in firm projects magnifies underlying differences in task intensity because of positive sorting between workers and projects. Equivalently, small differences in effective worker skills generate large earnings differences.
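The step that combines (39) with (42) can be written out. Hold the relative task intensity $x_m/x_c$ fixed, so that the proportionality constant depends only on the occupation, $\tau$, $\eta$, and $\kappa$. Equation (42) then gives $x_s\propto(2w(x))^{\frac{1}{2\eta}}$, and substituting into (39):

$$\alpha_{s}^{\rho}=\frac{\kappa\rho\,x_{s}^{\rho-2}}{(1-\tau)\,\eta\,(2w(x))^{\frac{\eta-1}{\eta}}}\;\propto\;(2w(x))^{\frac{\rho-2}{2\eta}}\,(2w(x))^{-\frac{\eta-1}{\eta}}=(2w(x))^{\frac{\rho-2\eta}{2\eta}}=(2w(x))^{\frac{\rho}{2\eta}-1}.$$

With $\rho=2.8$, the exponent is $0.4$ at $\eta=1$ and about $0.27$ at $\eta=1.1$. The same earnings gap therefore maps into a smaller gap in inferred skills when firm projects are dispersed.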

Table 2: Illustration of Identification
                    Relative task    Wages     Manual skill         Cognitive skill      Firm                 SOC code
Occupation          log(q_m/q_c)     E w(x)    (α_m - Eα_m)/σ_m     (α_c - Eα_c)/σ_c     (α_z - Eα_z)/σ_z
Gardeners           -1.7             123       -0.93                -2.35                -1.28                37-3010
Cashiers            -0.7             120       -0.47                -1.16                -1.62                41-2010
Police officers     -0.1             164       -0.82                -0.48                -0.82                33-3050
Physicians          -0.2             184       -1.77                -1.32                -3.11                29-1060
Chief executives    -2.1             149       -2.39                -1.71                -2.63                11-1010
Actuaries           -2.7             136       -3.46                -1.65                -2.43                15-2010
Actuaries -2.7 136 -3.46 -1.65 -2.43 15-2010

Table 2 shows the identification of worker skills for a number of occupations. Holding relative manual task intensity constant, high earnings identify high skill levels, as seen by comparing the manual and cognitive skills of police officers and physicians. Holding earnings constant, high manual task intensity identifies high manual skills, as seen by comparing the skills of gardeners and cashiers.


Having illustrated the identification with examples, we turn to identification using earnings data. Table 2 illustrates the identification of underlying skills for representative workers in occupations listed in the first column. The second column shows the relative manual task intensity for these occupations from O*NET task measures. The third column shows average earnings of the workers by occupation in the ACS. Table 2 shows a negative relation between manual task intensity and average earnings by occupation, in line with Figure 2.

To identify manual and cognitive skills, we use equations (39), (40), and (42). First, we establish that, all else equal, higher earnings identify higher levels of skills. Consider police officers and physicians. Since the relative task intensity of police officers and physicians is comparable, their relative skills are comparable by (40). Average earnings of physicians exceed those of police officers, implying higher levels of both cognitive and manual skills for physicians. Indeed, the fourth and fifth columns of Table 2 show that the cognitive and manual skills of both physicians and police officers exceed the population average, $\alpha_{s}>\mathbb{E}\alpha_{s}$. The skills of physicians exceed those of police officers in both dimensions.

Second, we consider two occupations with similar wages to show that, all else equal, high manual task intensity identifies high manual skill. The earnings of gardeners and cashiers are similar, but gardening is more demanding in manual tasks. By equation (42), the cognitive task requirements of gardeners are lower than those of cashiers. By equation (40), it follows that a gardener has higher manual skills than a cashier, but lower cognitive skills. The fourth and fifth columns of Table 2 display this pattern.

Figure 3: Inferred Skill Distribution

Figure 3 shows the inferred worker skill distribution, with bright colors indicating more mass. The panel shows the smoothed distribution of cognitive and manual skills that exactly rationalizes the data. It is obtained from data on relative task intensity by occupation and worker earnings through equations (39) to (42). The values are normalized such that one corresponds to a uniform distribution.

We apply the identification argument to all workers in the ACS to identify their skills. Identifying skills at the worker level allows for skill heterogeneity within occupations, driven by earnings differences within occupations. As in the examples, workers with high earnings have higher cognitive and manual skills than workers with low earnings in the same occupation. Figure 3 shows the resulting distribution of cognitive and manual skills after 98 percent winsorization and after smoothing the pointwise identified distribution with a kernel density estimator.[19]

[19] We correct our kernel density estimator at the boundaries of our rectangular type space by reflecting along all boundaries; see, e.g., Karunamuni and Alberts (2005).
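Footnote 19 corrects the density estimate at the edges of the type space by reflection. The sketch below illustrates the idea, assuming a Gaussian kernel with a single bandwidth `h` on a rectangle. It is the basic reflection estimator, which is simpler than the Karunamuni and Alberts (2005) method cited in the footnote, so treat it as an illustration rather than the authors’ code. The function name and the uniform test sample are hypothetical.

```python
import numpy as np

def reflected_kde(data, points, lower, upper, h):
    """Gaussian KDE on the rectangle [lower, upper], corrected by reflection.

    data: (n, 2) sample; points: (m, 2) evaluation points; h: bandwidth.
    """
    data = np.asarray(data, dtype=float)
    points = np.asarray(points, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    # In each dimension keep every observation and add its mirror images
    # across the lower and upper bounds; both dimensions give 9 copies.
    def variants(col, d):
        return (col, 2.0 * lower[d] - col, 2.0 * upper[d] - col)

    copies = [np.column_stack([cx, cy])
              for cx in variants(data[:, 0], 0)
              for cy in variants(data[:, 1], 1)]
    augmented = np.vstack(copies)                        # (9n, 2)

    diff = points[:, None, :] - augmented[None, :, :]    # (m, 9n, 2)
    sq = np.sum(diff ** 2, axis=2)
    kernel = np.exp(-0.5 * sq / h**2) / (2.0 * np.pi * h**2)
    # Divide by the original n: mirrored mass falling back inside the
    # rectangle replaces mass a plain estimator loses past the edges.
    return kernel.sum(axis=1) / len(data)

rng = np.random.default_rng(0)
sample = rng.uniform(size=(20_000, 2))                       # uniform on [0, 1]^2
evaluation = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 0.0]])  # corner, center, corner
density = reflected_kde(sample, evaluation, [0.0, 0.0], [1.0, 1.0], h=0.1)
```

For this uniform sample, the reflected estimate is close to one at the corners as well as at the center. A plain estimator would put roughly one quarter of the true density at a corner.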

Figure 4: Firm and Wage Distribution

Figure 4 shows the histogram for the inferred firm distribution (left panel) and the model implied distribution of wages (right panel).

For illustrative purposes, we introduce representative occupations in Figure 3. Specifically, we provide nine representative occupations within the type space. For example, cashiers are workers with both low cognitive and low manual skills, chief executives have low manual skills but high cognitive skills, while physicians have both high cognitive and high manual skills.

Finally, Figure 4 shows the inferred firm productivity distribution in the left panel and the implied wage distribution in the right panel. The left panel shows that the distribution of firm projects is relatively concentrated, with project values ranging from 30 percent below the mean to 40 percent above the mean (1.1). By construction, the right panel replicates the empirical wage distribution.

7 Quantitative Results

In this section, we present the quantitative results to the planning problem using the empirically relevant model of Section 6.

7.1 Unconstrained Benchmark

To build intuition for the solution, we first present a benchmark without incentive constraints and without firm heterogeneity. The planning problem then simplifies to minimizing resource costs (10) subject to the promise-keeping condition (8). Using the functional form for preferences, the promise-keeping condition simplifies to:

$$\int\Big(c(\alpha)-\kappa\big(x_{c}(\alpha)/\alpha_{c}\big)^{\rho}-\kappa\big(x_{m}(\alpha)/\alpha_{m}\big)^{\rho}\Big)\,\mathrm{d}\Phi\geq\mathcal{U}. \tag{43}$$

At the optimum, cognitive tasks are independent of workers’ manual skills, and the elasticity of cognitive tasks with respect to cognitive skills is $\frac{\rho}{\rho-2}$. Furthermore, the solution does not feature bunching. To see this, note that the optimal allocation satisfies:

$$x_{s}\propto\alpha_{s}^{\frac{\rho}{\rho-2}}, \tag{44}$$

for each skill $s\in\{c,m\}$. Because tasks are additively separable in preferences and technology, effort on task $s$ depends only on the worker’s skill in that task, so there is no cross-dependence between tasks. Since (44) describes a one-to-one relation between the worker’s skill and effort in each task, there is no bunching at the optimum. In a neighborhood of worker $\alpha$, every pair of distinct workers $(\alpha',\alpha'')$ is assigned distinct allocations, $x(\alpha')\neq x(\alpha'')$.
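The benchmark allocation (44) can be computed pointwise under a few assumptions. The resource cost is taken to be consumption net of output $w(x)=\frac{1}{2}(x_c^2+x_m^2)$ (no firm heterogeneity, $\eta=1$), and with linear utility the multiplier on (43) is one. The first-order condition $x_s=\kappa\rho\,x_s^{\rho-1}/\alpha_s^{\rho}$ then gives $x_s=(\alpha_s^{\rho}/(\kappa\rho))^{1/(\rho-2)}$. The function name and skill grid are hypothetical.

```python
import numpy as np

def benchmark_tasks(alpha, rho=2.8):
    """Unconstrained benchmark task level for own skill alpha (elementwise).

    Assumes eta = 1 and kappa = 1 / (2 rho); solves the pointwise FOC
    x_s = kappa rho x_s**(rho - 1) / alpha_s**rho, which is (44).
    """
    kappa = 1.0 / (2.0 * rho)
    alpha = np.asarray(alpha, dtype=float)
    return (alpha ** rho / (kappa * rho)) ** (1.0 / (rho - 2.0))

# Hypothetical skill grid: each task depends only on its own skill
alpha = np.linspace(0.5, 1.5, 11)
x = benchmark_tasks(alpha)

# Elasticity of x_s with respect to alpha_s: rho / (rho - 2) = 3.5 at rho = 2.8
elasticity = np.log(benchmark_tasks(1.01) / benchmark_tasks(1.0)) / np.log(1.01)
# Strictly increasing in own skill, so distinct skills get distinct tasks
one_to_one = bool(np.all(np.diff(x) > 0))
```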

Figure 5: Benchmark Allocation

Figure 5 shows the benchmark allocation for task intensity by worker’s cognitive and manual skills. The left panel shows the allocation of manual tasks, the right panel illustrates the allocation of cognitive tasks. The optimal allocation does not feature any cross-dependence between tasks: manual task intensity only varies with manual skill, while cognitive task intensity only varies with cognitive skill.

Given the empirical distribution of cognitive and manual skills in Figure 3, equation (44) gives the optimal allocation of both cognitive and manual tasks. Figure 5 visualizes the benchmark allocation of task intensity by workers’ cognitive and manual skills. The left panel shows the allocation of manual tasks, and the right panel shows the allocation of cognitive tasks. Since (44) rules out any cross-dependence between tasks, the contour lines of the optimal allocation are horizontal in the left panel and vertical in the right panel. Manual task intensity only varies with manual skill, while cognitive task intensity only varies with cognitive skill.

7.2 Optimal Solution

Figure 6: Planner Allocation

Figure 6 visualizes the solution by worker’s cognitive and manual skills. The top row shows the manual and cognitive task allocation, the bottom row shows the consumption allocation and the assignment of workers to firms. The solution features positive dependence between tasks. For example, optimal cognitive task intensity increases with manual skills.

Figure 6 shows the solution to the planner problem. The top row shows the allocation of manual and cognitive tasks. The bottom row shows the allocation of consumption and the assignment of workers to firms. In contrast to the benchmark, optimal task intensity in one skill depends positively on a worker’s other skill. Consider the manual task allocation in the top left panel. As in the benchmark, manual task intensity increases with a worker’s manual skill, holding cognitive skill constant. Unlike in the benchmark, manual task intensity also increases with a worker’s cognitive skill: workers with the same manual ability but higher cognitive ability perform more manual tasks. Moreover, this codependence between cognitive skills and manual tasks intensifies at low levels of cognitive skill. This can be seen from the contour lines, which are close to negative 45-degree lines at low levels of manual ability and almost flat at high levels of manual ability. The same pattern holds for cognitive tasks.

In this economy, the binding incentive constraints prevent high types from mimicking low types, as is also generally the case with unidimensional skill. To prevent high types from pretending to be low types, the planner distorts the allocation of low types. With multidimensional skills, this distortion works in two ways. As in the unidimensional case, the planner reduces the level of task output. The planner also increases the codependence between tasks. The latter is a new type of distortion that emerges in taxation problems with multidimensional skills.

The bottom left panel shows the consumption allocation. Consumption increases with skills. Consumption of workers with top cognitive skills exceeds that of workers with top manual skills because of higher absolute levels of skill. The bottom right panel shows the assignment of workers to firms. Given cognitive and manual tasks, the planner assigns workers with greater effective skills $X=x_{c}^{2}+x_{m}^{2}$ to projects of greater value, following Proposition 1. A physician thus works on a more valuable project than a cashier, as in the positive economy. Since the range of cognitive skills is wider than the range of manual skills, high-value projects go to workers with greater cognitive skills.
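The sorting rule "greater $X$ gets a greater project value" has a simple discrete analogue: rank matching. The sketch below assumes equally many workers and projects. It is an illustration of positive assortative matching on $X=x_c^2+x_m^2$, not a reproduction of Proposition 1. The function name and numbers are hypothetical.

```python
import numpy as np

def assign_projects(x_c, x_m, projects):
    """Rank-match workers to projects by effective skill X = x_c**2 + x_m**2.

    Assumes equally many workers and projects: the k-th lowest X receives
    the k-th lowest project value (positive assortative matching).
    """
    X = np.asarray(x_c, dtype=float) ** 2 + np.asarray(x_m, dtype=float) ** 2
    order = np.argsort(X)  # worker indices, lowest X first
    assigned = np.empty(len(X))
    assigned[order] = np.sort(np.asarray(projects, dtype=float))
    return assigned

# Hypothetical task allocations for three workers and three project values
x_c = np.array([1.0, 0.2, 0.5])
x_m = np.array([0.1, 0.3, 2.0])
projects = np.array([0.9, 1.3, 1.1])
assigned = assign_projects(x_c, x_m, projects)  # -> [1.1, 0.9, 1.3]
```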


Bunching. We now describe the nature of bunching in the optimal solution. Bunching means that different workers are assigned identical labor supply allocations and, therefore, are also assigned identical consumption allocations (see Section 3.4). We use three distinct methods based on the theoretical analysis in Section 3 and Section 4 to comprehensively characterize the bunching patterns that emerge in the quantitative model.

Figure 7: Bunching

Figure 7 illustrates bunching in the optimal allocation. The left panel identifies bunching by analyzing the determinant of the Hessian matrix of the indirect utility function, and shows the value of the determinant on a log base 10 scale. The right panel identifies bunching using Corollary 3 by analyzing deviations from the multidimensional optimal tax formula on a log base 10 scale. The horizontal axis is cognitive skill $\alpha_{c}$ and the vertical axis is manual skill $\alpha_{m}$. Both panels identify that workers in the bottom left region of the type space are bunched under the optimal allocation.

First, we use Lemma 2 to identify bunching by analyzing the determinant of the Hessian matrix of the indirect utility function. By Lemma 2, if the indirect utility function is not strongly convex at all points in the neighborhood of type $p$, then worker $p$ is bunched. If the Hessian matrix is not invertible, the indirect utility function is not strongly convex, and a matrix is invertible if and only if its determinant is nonzero. Hence, wherever the determinant of the Hessian of the indirect utility function equals zero, the indirect utility function is not strongly convex and workers are bunched. We apply this method in the left panel of Figure 7, which shows the value of the determinant on a log base 10 scale. The left panel shows that workers in the bottom left (dark) region are bunched.
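The determinant test can be illustrated numerically with finite differences on a grid. The two indirect utility functions below are hypothetical toys. The sketch assumes, as in standard multidimensional screening, that the allocation is the gradient of the indirect utility. `U_blunt` depends only on $p_c+p_m$, so every type on a line $p_c+p_m=\text{const}$ gets the same gradient. The tolerance `1e-8` for "zero" is an assumption, not the paper’s threshold.

```python
import numpy as np

def hessian_determinant(U, h):
    """Finite-difference determinant of the Hessian of U on a uniform 2-D grid.

    U[i, j] is the indirect utility of type (p_c[i], p_m[j]); h is the step.
    """
    U_c, U_m = np.gradient(U, h, edge_order=2)       # first derivatives
    U_cc, U_cm = np.gradient(U_c, h, edge_order=2)   # second derivatives
    U_mc, U_mm = np.gradient(U_m, h, edge_order=2)
    return U_cc * U_mm - U_cm * U_mc

grid = np.linspace(0.0, 1.0, 51)
h = grid[1] - grid[0]
p_c, p_m = np.meshgrid(grid, grid, indexing="ij")

# Hypothetical indirect utilities: one bunches along p_c + p_m = const,
# the other is strongly convex and separates all types.
U_blunt = 0.5 * (p_c + p_m) ** 2
U_separating = 0.5 * (p_c ** 2 + p_m ** 2)

det_blunt = hessian_determinant(U_blunt, h)            # zero everywhere
det_separating = hessian_determinant(U_separating, h)  # equals one everywhere
bunched = np.abs(det_blunt) < 1e-8                     # zero determinant flags bunching
```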

Second, we use Corollary 3 to identify bunching. If the multidimensional optimal tax formula without bunching does not hold for worker $p$, then the indirect utility function is not strongly convex at $p$. By Lemma 2, worker $p$ is then bunched. Numerically, we measure deviations from the multidimensional optimal tax formula without bunching to establish bunching. We apply this method in the right panel of Figure 7, which shows these deviations on a log base 10 scale. The right panel identifies the same bunching region as the left panel, in the bottom left (light) region.

Figure 8: Illustration of Bunching

Figure 8 illustrates the procedure to classify bunching using isocurves. The long-dashed lines represent isocurves for different cognitive task levels, while the short-dashed lines represent isocurves for different manual task levels. Worker $\alpha$ is bunched with worker $\alpha'$ if both isocurves for $x(\alpha)$ pass through $\alpha'$.

The third approach to identify bunching is based directly on the definition of bunching (see Section 3.4): bunched worker types are assigned identical task levels. Worker $\alpha$ is bunched if other workers are assigned the same task levels $x(\alpha)$. Visually, we draw the isocurves corresponding to both $x_{c}(\alpha)$ and $x_{m}(\alpha)$ in the worker type space, as in Figure 8. We then check whether the two isocurves also intersect at any other worker $\alpha'$. If they intersect at some worker $\alpha'\neq\alpha$, then workers $\alpha$ and $\alpha'$ are bunched.

Figure 8 gives an example of the procedure to classify optimal bunching using isocurves. The long-dash lines represent isocurves for different cognitive task levels, while the short-dash lines represent isocurves for different manual task levels. The dots indicate three hypothetical allocations. First, consider the isocurves corresponding to the allocation of crane operators in the top left corner. The long-dash isocurve represents workers with other combinations of skills $(\alpha_{c},\alpha_{m})$ who produce the same cognitive tasks $x_{cH}$ as the crane operator. The short-dash isocurve represents workers with other combinations of skills who produce the same manual tasks $x_{mH}$ as the crane operator. The two isocurves intersect at only one point: the skills of the crane operator in the top left corner. That is, no other worker receives the task allocation $(x_{cH},x_{mH})$ assigned to the crane operator. Next, consider the isocurves corresponding to the allocation of a gardener in the middle of Figure 8. The long-dash isocurve (workers producing the same cognitive tasks $x_{cM}$ as the gardener) overlaps with the short-dash isocurve (workers producing the same manual tasks $x_{mM}$ as the gardener) at high levels of manual skill $\alpha_{m}$ and low levels of cognitive skill $\alpha_{c}$. These workers produce the same cognitive and manual tasks $(x_{cM},x_{mM})$ as the gardener, so gardeners bunch with workers whose comparative advantage also lies in manual work. We call this type of bunching targeted bunching. Finally, consider the bottom-left allocation corresponding to cashiers. In this case, the long-dash and short-dash isocurves for cognitive and manual tasks overlap along their entire length. All workers with skills $(\alpha_{c},\alpha_{m})$ on that curve produce the same cognitive and manual tasks $(x_{cL},x_{mL})$ as the cashier, despite their skill differences. We call this type of bunching blunt bunching: when allocating tasks, the planner does not distinguish between these worker types.

Figure 9: Optimal Targeted and Blunt Bunching

Figure 9 shows bunching at the solution. The left panel demonstrates bunching in the allocation space by displaying combinations of optimal cognitive and manual tasks $(x_{c},x_{m})$. An allocation is marked in green or in blue if it is assigned to more than one worker, and in orange if it is assigned to one worker. The right panel displays bunching in the worker type space $(\alpha_{c},\alpha_{m})$. In this panel, a worker type is marked in green or in blue if the worker’s task allocation is also assigned to another worker. The green area indicates the blunt bunching region, while the blue area indicates the targeted bunching region.

Figure 9 shows bunching at the optimal solution from two complementary perspectives. The left panel displays bunching through the allocation of tasks: it plots the optimal combinations of cognitive and manual tasks $(x_{c},x_{m})$. An allocation is marked green or blue if it is assigned to more than one worker, and orange if it is assigned to one worker. There are two main regions of bunching. The first is the blunt bunching region, given by the green line segment at the lowest levels of both cognitive and manual tasks. The second is the targeted bunching region, represented by two blue line segments on the borders of the task trapezoid. That is, targeted bunching occurs when the task input is low for only one task. The lower (flat) blue segment represents low manual tasks, and the upper (vertical) segment represents low cognitive tasks. On these borders, once task intensity becomes sufficiently high, allocations are no longer bunched: the blue line turns orange along the edges of the trapezoid.


Blunt and Targeted Bunching. To see which workers are bunched, the right panel shows bunched workers in the type space $(\alpha_{c},\alpha_{m})$. A worker type is marked blue or green if the worker’s task allocation is also assigned to another worker.[20] The panel shows that bunched workers have low cognitive skills, low manual skills, or both. Worker types are marked blue in the targeted bunching region and green in the blunt bunching region. Workers with both low cognitive and low manual skills are in the blunt bunching region. Workers with medium manual or medium cognitive skills are in the targeted bunching region when their skill set is asymmetric: workers with medium cognitive skills bunch when their manual skills are low, and vice versa. To quantify the measure of bunched workers, we combine the right panel with the worker type distribution of Figure 3. At the optimum, 10.4 percent of workers are bunched. Blunt bunching accounts for 30 percent of bunched workers and targeted bunching for 70 percent.

[20] We measure the distance between two allocations $x(p)$ and $x(\hat{p})$ as the Euclidean distance between the allocations relative to the Euclidean distance between the respective types $p$ and $\hat{p}$. We classify two allocations as identical if this ratio is below $10^{-4}$.
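The classification rule of footnote 20 and the mass-weighted share of bunched workers can be sketched on a grid of types. Everything in the toy below is hypothetical and chosen only to exercise the rule. The allocation depends only on $p_c+p_m$ in the lower triangle (blunt-style), every other type gets its own allocation, and uniform weights stand in for the density of Figure 3. The last line also checks the pattern reported for Figure 10: bunched pairs are better in one skill and worse in the other.

```python
import numpy as np

def bunched_pairs(types, alloc, tol=1e-4):
    """Index pairs (i, j), i < j, whose allocations count as identical.

    Footnote-20 rule: ||x(p) - x(p_hat)|| / ||p - p_hat|| < tol.
    """
    dx = np.linalg.norm(alloc[:, None, :] - alloc[None, :, :], axis=2)
    dp = np.linalg.norm(types[:, None, :] - types[None, :, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = dx / dp  # diagonal is 0/0 = nan, which is never < tol
    return np.nonzero(np.triu(ratio < tol, k=1))

def bunched_share(types, alloc, weights, tol=1e-4):
    """Mass share of types whose allocation is shared with another type."""
    i, j = bunched_pairs(types, alloc, tol)
    flag = np.zeros(len(types), dtype=bool)
    flag[i] = True
    flag[j] = True
    return weights[flag].sum() / weights.sum(), i, j

# Hypothetical 21 x 21 type grid: below p_c + p_m = 0.475 the allocation
# depends only on the sum; above it every type gets x(p) = p.
grid = np.linspace(0.0, 1.0, 21)
p_c, p_m = np.meshgrid(grid, grid, indexing="ij")
types = np.column_stack([p_c.ravel(), p_m.ravel()])
s = types.sum(axis=1)
alloc = types.copy()
low = s < 0.475
alloc[low] = np.column_stack([s[low] / 2, s[low] / 2])

weights = np.ones(len(types))  # stand-in for the skill density
share, i, j = bunched_share(types, alloc, weights)

# Every bunched pair is better in one skill and worse in the other
downward = np.all((types[j, 0] - types[i, 0]) * (types[j, 1] - types[i, 1]) < 0)
```

In this toy, 54 of the 441 equally weighted types are bunched, about 12 percent. The number is an artifact of the toy and is unrelated to the 10.4 percent reported above.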

Figure 10: Bunching Patterns

Figure 10 shows with whom workers bunch by connecting workers with a line in the worker type space if their allocations are bunched. Workers exclusively bunch with workers that are better in one skill dimension, but worse in another as represented by downward-sloping lines. Green lines indicate the blunt bunching region, blue lines indicate the targeted bunching region. Under blunt bunching, workers on the vertical boundary bunch with workers on the horizontal boundary, unlike under targeted bunching. The right-hand panel zooms in to distinguish the blunt bunching region from the targeted bunching region.

Workers bunch with other workers both near and far. Figure 9 shows at which allocations workers are bunched and which workers are bunched, but it does not show with whom workers bunch. These bunching relations are shown in Figure 10, which connects two workers with a line in the type space if they are bunched.[21]

[21] To facilitate the presentation, we display the bunching relations for one quarter of all workers. In addition, we display at most two relations for each worker.

Workers do not bunch with workers over whom they have an absolute advantage, or with workers who have an absolute advantage over them. Workers bunch only with workers who are better in one skill dimension but worse in the other. In Figure 10 this is evident since all connections are downward-sloping lines. For example, gardeners with somewhat better cognitive skills but somewhat lower manual skills are bunched with gardeners with lower cognitive skills and higher manual skills. Despite the difference in skills, the planner assigns both identical tasks.

Within our bunching regions we distinguish two patterns. In the lower triangle, indicated by green lines, the planner bunches together cognitive and manual tasks for all workers, ranging from those with relatively high manual skills to those with relatively high cognitive skills. Workers with the same effective skill index produce the same cognitive and manual tasks: work is tailored not to workers' specific skills but to their overall level of skill. In this blunt region, the planner satisfies the incentive constraints by assigning identical allocations to different workers. Deviations are deterred bluntly, at the cost of efficiency.

The planner also bunches workers in the targeted bunching regions, but in a more refined way, because the efficiency cost of bunching increases with worker skills. The planner bunches together the cognitive and manual tasks of workers who are relatively skilled in manual tasks, and separately bunches together the cognitive and manual tasks of workers who are relatively skilled in cognitive tasks. Unlike in the lower triangle, however, workers who are relatively skilled in cognitive labor do not bunch with workers who are relatively skilled in manual labor. In sum, in the targeted bunching region the planner separates workers on their comparative advantage and bunches them on their comparative disadvantage.

In the regions without bunching, the planner distorts allocations, as in the unidimensional case, to deter workers from deviating to allocations they would otherwise find desirable. This is incentive provision through distinct distorted allocations.


Taxation: Wedges and Tax Implementation. We next study the implications for optimal taxes through the labor wedge (16) for cognitive and manual tasks. With linear preferences for consumption, the labor wedge is:

$$1-\tau_{s}=\frac{1}{2}\frac{x_{s}(\alpha)^{\rho-2}}{z(\alpha)\,\alpha_{s}^{\rho}}\qquad\implies\qquad\log(1-\tau_{s})\propto-\log z+(\rho-2)\log x_{s}-\rho\log\alpha_{s}. \tag{45}$$

The labor wedge is determined by the optimal assignment through $\log z$, by the allocation of tasks through $(\rho-2)\log x_{s}$, and by worker skills through $\rho\log\alpha_{s}$. Workers at better firms face a higher labor wedge $\tau_{s}$. For $\rho>2$, the labor wedge increases when the planner reduces task inputs. Keeping allocations constant, the labor wedge increases with worker skills.
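The three comparative statics of equation (45) can be checked directly. This is a minimal sketch: the value $\rho=3$ and all inputs are illustrative assumptions chosen so that $\rho>2$, not calibrated values from the model.

```python
import math

def labor_wedge(x_s, z, alpha_s, rho):
    # Equation (45): 1 - tau_s = x_s**(rho - 2) / (2 * z * alpha_s**rho).
    return 1.0 - 0.5 * x_s ** (rho - 2) / (z * alpha_s ** rho)

rho = 3.0  # illustrative value with rho > 2, not a calibrated parameter
base = labor_wedge(x_s=1.0, z=1.0, alpha_s=1.0, rho=rho)          # 0.5
better_firm = labor_wedge(x_s=1.0, z=2.0, alpha_s=1.0, rho=rho)   # 0.75: higher z raises the wedge
fewer_tasks = labor_wedge(x_s=0.5, z=1.0, alpha_s=1.0, rho=rho)   # 0.75: lower x_s raises the wedge
higher_skill = labor_wedge(x_s=1.0, z=1.0, alpha_s=2.0, rho=rho)  # 0.9375: higher alpha_s raises the wedge

# Log decomposition in (45); the omitted additive constant is -log 2.
x, z, a = 0.7, 1.3, 0.9
lhs = math.log(1.0 - labor_wedge(x, z, a, rho))
rhs = -math.log(2.0) - math.log(z) + (rho - 2) * math.log(x) - rho * math.log(a)
```

The proportionality sign in (45) hides only the constant $-\log 2$, which the last two lines make explicit.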

Figure 11: Tax Wedges

Figure 11 visualizes the tax wedges for the planner solution. The left panel displays the manual tax wedge; the right panel displays the cognitive tax wedge.

The optimal tax wedges are presented in Figure 11. First, each wedge is zero for the workers with the highest skill in the corresponding dimension. Consider the manual tax in the left panel. Workers with the highest manual skill are represented by the top horizontal boundary of the graph, and the manual tax on them is zero. For workers with the top cognitive skill, represented by the right vertical boundary, the manual tax is not zero. The manual tax on the best crane operator (highest manual and low cognitive skill) and the best physician (highest manual and high cognitive skill) is zero, while it is positive on the top executive (low manual and highest cognitive skill).

The cognitive tax is displayed in the right panel of Figure 11. The cognitive tax on workers with the highest cognitive skill is zero, whereas for workers with the top manual skill it is not zero. The cognitive tax on top executives (low manual and highest cognitive skill) and on the best physician (highest manual and high cognitive skill) is zero, while it is positive on the best crane operator (highest manual and low cognitive skill).

Figure 12: Manual and Cognitive Tax Wedge

Figure 12 describes how the tax wedges vary with the level of worker skill. It displays the manual tax wedge in the left panel and the cognitive tax wedge in the right panel for three groups of workers, separated by their cognitive and manual skills, respectively.

Second, we describe how taxes change with the level of skill. Figure 12 plots the manual (left) and cognitive (right) taxes for three groups of workers, separated by their skills, corresponding to the heatmap in Figure 11. Consider top manual workers: physicians (high cognitive), carpenters (medium cognitive), and crane operators (low cognitive). These workers are located in the top horizontal strip of Figure 11 and are represented by the dash-dotted line in the right panel of Figure 12. These workers are not in bunched regions, and the cognitive tax on them is low.

Consider medium manual workers: lawyers (high cognitive), police officers (medium cognitive), and gardeners (low cognitive). These workers are located in the middle horizontal strip of Figure 11 and are represented by the long-dash line in the right panel of Figure 12. The cognitive tax on them is higher. Three forces in equation (45) shape this pattern: the assignment to heterogeneous firms, the task allocation, and the worker skill. The assignment is monotonic: lawyers are assigned to better firms than police officers and gardeners, which raises the labor wedge on lawyers relative to gardeners. Conversely, task allocations increase with skills, which reduces labor wedges. Finally, holding allocations fixed, the labor wedge increases with worker skills, which again raises the wedge on lawyers. Since both the assignment and the skill force raise the wedge on lawyers, the higher cognitive labor wedge on gardeners (low cognitive skill) than on lawyers (high cognitive skill) is driven by the small amount of cognitive tasks that gardeners perform. The high tax distortion on gardeners also reflects that they are in the targeted bunching region, while lawyers and police officers are not bunched.

Consider low manual workers: executives (high cognitive), teachers (medium cognitive), and cashiers (low cognitive). These workers are located in the bottom horizontal strip of Figure 11 and are represented by the dotted line in the right panel of Figure 12. The cognitive tax on them is higher than on medium manual workers. The cognitive tax generally decreases with skill, with executives facing a lower marginal tax rate than cashiers. The high distortions on cashiers and teachers also reflect that they are bunched. Teachers are in the targeted bunching region: they have a comparative advantage in cognitive skills and are separated along the cognitive dimension but bunched in the manual dimension. Cashiers are in the blunt bunching region: they are bunched in both the manual and the cognitive dimensions, which heavily distorts their allocation. In the left panel of Figure 12 we repeat this analysis for the manual tax wedge, using the left panel of Figure 11.

In Appendix A.17, we show that the planner allocation is implementable through a tax system over task inputs $x$. Since the effective skill index $X=x_{c}^{2}+x_{m}^{2}$ determines worker wages $w(X)$, where $w$ is strictly increasing (Section 5), there is a one-to-one mapping between the skill index $X$ and earnings $w$. Therefore, there is a one-to-one map between $(w,x_{m}/x_{c})$ and $(x_{c},x_{m})$: given $X=w^{-1}(w)$ and the ratio $r=x_{m}/x_{c}$, nonnegative task inputs with $x_{c}>0$ satisfy $x_{c}=\sqrt{X/(1+r^{2})}$ and $x_{m}=r\,x_{c}$. Since the optimum is implementable by a tax function $T(x_{m},x_{c})$, it is also implementable by a tax function $\hat{T}(w,x_{m}/x_{c})$ of income and line of work only.
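The change of variables behind $\hat{T}$ can be sketched as follows. This assumes $x_{c}>0$ and $x_{m}\geq 0$, and it uses a hypothetical linear wage schedule and a hypothetical task-based tax purely for illustration; neither is the model's equilibrium wage or optimal tax.

```python
import math

def wage(X):
    # Hypothetical strictly increasing wage schedule w(X); not the model's equilibrium w.
    return 3.0 * X + 1.0

def wage_inverse(w):
    # Inverse of the hypothetical wage schedule above.
    return (w - 1.0) / 3.0

def to_income_and_line(x_c, x_m):
    # (x_c, x_m) -> (w, x_m / x_c), assuming x_c > 0.
    return wage(x_c ** 2 + x_m ** 2), x_m / x_c

def to_task_inputs(w, ratio):
    # (w, r) -> (x_c, x_m): X = w^{-1}(w), x_c = sqrt(X / (1 + r^2)), x_m = r * x_c.
    X = wage_inverse(w)
    x_c = math.sqrt(X / (1.0 + ratio ** 2))
    return x_c, ratio * x_c

def T_hat(w, ratio, T):
    # Income-and-line-of-work schedule built from a task-based schedule T(x_m, x_c).
    x_c, x_m = to_task_inputs(w, ratio)
    return T(x_m, x_c)

T = lambda x_m, x_c: 0.1 * (x_c ** 2 + x_m ** 2)  # hypothetical task-based tax
w, r = to_income_and_line(0.6, 0.8)
x_c, x_m = to_task_inputs(w, r)
```

Any strictly increasing $w$ works the same way; only the inverse `wage_inverse` changes.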

8 Conclusion

We advance the understanding of optimal tax policy in multidimensional environments with bunching theoretically and quantitatively. Our optimal tax conditions generalize classic unidimensional optimal tax conditions of Mirrlees (1971), Diamond (1998) and Saez (2001) to multidimensional taxation problems and account for global incentive constraints and bunching. For an empirically relevant model, we show that bunching is both substantial and nuanced and, hence, importantly impacts optimal policy design.

References

  • Abowd, J. M., P. Lengermann, and K. L. McKinney (2003): “The Measurement of Human Capital in the U.S. Economy,” U.S. Census Bureau Working Paper.
  • Acemoglu, D., and D. Autor (2011): “Skills, Tasks and Technologies: Implications for Employment and Earnings,” in Handbook of Labor Economics, vol. 4, pp. 1043–1171.
  • Ambrosio, L., and N. Gigli (2013): “A User’s Guide to Optimal Transport,” in Modelling and Optimisation of Flows on Networks, pp. 1–155.
  • Armstrong, M. (1996): “Multiproduct Nonlinear Pricing,” Econometrica, 64(1), 51–75.
  • Becker, G. S. (1973): “A Theory of Marriage: Part I,” Journal of Political Economy, 81(4), 813–846.
  • Benabou, R. (2002): “Tax and Education Policy in a Heterogeneous-Agent Economy: What Levels of Redistribution Maximize Growth and Efficiency?,” Econometrica, 70(2), 481–517.
  • Bertsekas, D. P., and H. Yu (2011): “A Unifying Polyhedral Approximation Framework for Convex Optimization,” SIAM Journal on Optimization, 21(1), 333–360.
  • Blundell, R., and A. Shephard (2012): “Employment, Hours of Work and the Optimal Taxation of Low-Income Families,” Review of Economic Studies, 79(2), 481–510.
  • Boerma, J., and L. Karabarbounis (2020): “Labor Market Trends and the Changing Value of Time,” Journal of Economic Dynamics and Control, 115.
  • Boerma, J., and L. Karabarbounis (2021): “Inferring Inequality with Home Production,” Econometrica, 89(5), 2517–2556.
  • Boerma, J., A. Tsyvinski, and A. P. Zimin (2025): “Sorting with Teams,” Journal of Political Economy, 133(2), 421–454.
  • Bogachev, V. I., and A. V. Kolesnikov (2012): “The Monge-Kantorovich Problem: Achievements, Connections, and Perspectives,” Russian Mathematical Surveys, 67(5), 785–890.
  • Chetty, R., A. Guren, D. Manoli, and A. Weber (2012): “Does Indivisible Labor Explain the Difference Between Micro and Macro Elasticities? A Meta-Analysis of Extensive Margin Elasticities,” NBER Macroeconomics Annual, 27, 1–56.
  • Chiappori, P.-A., R. J. McCann, and L. P. Nesheim (2010): “Hedonic Price Equilibria, Stable Matching, and Optimal Transport: Equivalence, Topology, and Uniqueness,” Economic Theory, 42(2), 317–354.
  • Chiappori, P.-A., R. J. McCann, and B. Pass (2017): “Multi- to One-Dimensional Optimal Transport,” Communications on Pure and Applied Mathematics, 70(12), 2405–2444.
  • Conesa, J. C., S. Kitao, and D. Krueger (2009): “Taxing Capital? Not a Bad Idea After All!,” American Economic Review, 99(1), 25–48.
  • Deming, D. J. (2017): “The Growing Importance of Social Skills in the Labor Market,” Quarterly Journal of Economics, 132(4), 1593–1640.
  • Diamond, P. A. (1998): “Optimal Income Taxation: An Example with a U-shaped Pattern of Optimal Marginal Tax Rates,” American Economic Review, 88(1), 83–95.
  • Dupuy, A., and A. Galichon (2014): “Personality Traits and the Marriage Market,” Journal of Political Economy, 122(6), 1271–1319.
  • Duran, M. A., and I. E. Grossmann (1986): “An Outer-Approximation Algorithm for a Class of Mixed-Integer Nonlinear Programs,” Mathematical Programming, 36, 307–339.
  • Ekeland, I., and S. Moreno-Bromberg (2010): “An Algorithm for Computing Solutions of Variational Problems with Global Convexity Constraints,” Numerische Mathematik, 115(1), 45–69.
  • Gabaix, X., and A. Landier (2008): “Why has CEO Pay Increased so Much?,” Quarterly Journal of Economics, 123(1), 49–100.
  • Galichon, A., and B. Salanié (2022): “Cupid’s Invisible Hand: Social Surplus and Identification in Matching Models,” Review of Economic Studies, 89(5), 2600–2629.
  • Gayle, G.-L., and A. Shephard (2019): “Optimal Taxation, Marriage, Home Production, and Family Labor Supply,” Econometrica, 87(1), 291–326.
  • Geoffrion, A. M. (1970): “Elements of Large Scale Mathematical Programming Part II: Synthesis of Algorithms and Bibliography,” Management Science, 16(11), 676–691.
  • Golosov, M., and I. Krasikov (2022): “Multidimensional Screening in Public Finance: The Optimal Taxation of Couples,” University of Chicago Working Paper.
  • Griessler, C. (2018): “C-cyclical Monotonicity as a Sufficient Criterion for Optimality in the Multimarginal Monge-Kantorovich Problem,” Proceedings of the American Mathematical Society, 146(11), 4735–4740.
  • Guvenen, F., B. Kuruscu, S. Tanaka, and D. Wiczer (2020): “Multidimensional Skill Mismatch,” American Economic Journal: Macroeconomics, 12(1), 210–44.
  • Guvenen, F., S. Ozkan, and J. Song (2014): “The Nature of Countercyclical Income Risk,” Journal of Political Economy, 122(3), 621–660.
  • Heathcote, J., K. Storesletten, and G. L. Violante (2017): “Optimal Tax Progressivity: An Analytical Framework,” Quarterly Journal of Economics, 132(4), 1693–1754.
  • Heathcote, J., and H. Tsujiyama (2021a): “Optimal Income Taxation: Mirrlees meets Ramsey,” Journal of Political Economy, 129(11), 3141–3184.
  • Heathcote, J., and H. Tsujiyama (2021b): “Practical Optimal Income Taxation,” FRB of Minneapolis Working Paper.
  • Judd, K., D. Ma, M. A. Saunders, and C.-L. Su (2017): “Optimal Income Taxation with Multidimensional Taxpayer Types,” Stanford University Working Paper.
  • Karunamuni, R. J., and T. Alberts (2005): “A Generalized Reflection Method of Boundary Correction in Kernel Density Estimation,” Canadian Journal of Statistics, 33(4), 497–509.
  • Kleven, H. J., C. T. Kreiner, and E. Saez (2006): “The Optimal Income Taxation of Couples,” NBER Working Paper No. 12685.
  • Kleven, H. J., C. T. Kreiner, and E. Saez (2009): “The Optimal Income Taxation of Couples,” Econometrica, 77(2), 537–560.
  • Kocherlakota, N. R. (2010): The New Dynamic Public Finance. Princeton University Press.
  • Kremer, M. (1993): “The O-Ring Theory of Economic Development,” Quarterly Journal of Economics, 108(3), 551–575.
  • Lehmann, E., S. Renes, K. Spiritus, and F. Zoutman (2021): “Optimal Taxation with Multiple Incomes and Types,” CEPR Discussion Paper No. 16571.
  • Lindenlaub, I. (2017): “Sorting Multidimensional Types: Theory and Application,” Review of Economic Studies, 84(2), 718–789.
  • Lindenlaub, I., and F. Postel-Vinay (2023): “Multidimensional Sorting under Random Search,” Journal of Political Economy, 131(12), 3497–3539.
  • Lions, P.-L. (1998): “Identification of the Dual Cone of Convex Functions and Applications,” Comptes Rendus de l’Academie des Sciences, 12(326), 1385–1390.
  • Lise, J., and F. Postel-Vinay (2020): “Multidimensional Skills, Sorting, and Human Capital Accumulation,” American Economic Review, 110(8), 2328–76.
  • McAfee, R. P., and J. McMillan (1988): “Multidimensional Incentive Compatibility and Mechanism Design,” Journal of Economic Theory, 46(2), 335–354.
  • Mirrlees, J. A. (1971): “An Exploration in the Theory of Optimum Income Taxation,” Review of Economic Studies, 38(2), 175–208.
  • Moser, C., and P. Olea de Souza e Silva (2019): “Optimal Paternalistic Savings Policies,” Columbia Business School Working Paper.
  • Oberman, A. M. (2013): “A Numerical Method for Variational Problems with Convexity Constraints,” SIAM Journal on Scientific Computing, 35(1), 378–396.
  • Rochet, J.-C., and P. Choné (1998): “Ironing, Sweeping, and Multidimensional Screening,” Econometrica, 66(4), 783–826.
  • Rockafellar, R. T. (1999): Network Flows and Monotropic Optimization, vol. 9. Athena Scientific.
  • Rosen, S. (1981): “The Economics of Superstars,” American Economic Review, 71(5), 845–858.
  • Rothschild, C., and F. Scheuer (2013): “Redistributive Taxation in the Roy Model,” Quarterly Journal of Economics, 128(2), 623–668.
  • Rothschild, C., and F. Scheuer (2014): “A Theory of Income Taxation under Multidimensional Skill Heterogeneity,” NBER Working Paper No. 19822.
  • Rothschild, C., and F. Scheuer (2016): “Optimal Taxation with Rent-Seeking,” Review of Economic Studies, 83(3), 1225–1262.
  • Roys, N., and C. Taber (2022): “Skill Prices, Occupations, and Changes in the Wage Structure for Low Skilled Men,” University of Wisconsin Working Paper.
  • Saez, E. (2001): “Using Elasticities to Derive Optimal Income Tax Rates,” Review of Economic Studies, 68(1), 205–229.
  • Sanders, C., and C. Taber (2012): “Life-cycle Wage Growth and Heterogeneous Human Capital,” Annual Review of Economics, 4(1), 399–425.
  • Scheuer, F. (2014): “Entrepreneurial Taxation with Endogenous Entry,” American Economic Journal: Economic Policy, 6(2), 126–63.
  • Scheuer, F., and I. Werning (2017): “The Taxation of Superstars,” Quarterly Journal of Economics, 132(1), 211–270.
  • Shaked, M., and J. G. Shanthikumar (2007): Stochastic Orders. Springer Series in Statistics.
  • Song, J., D. J. Price, F. Guvenen, N. Bloom, and T. Von Wachter (2019): “Firming Up Inequality,” Quarterly Journal of Economics, 134(1), 1–50.
  • Tervio, M. (2008): “The Difference That CEOs Make: An Assignment Model Approach,” American Economic Review, 98(3), 642–68.
  • Yamaguchi, S. (2012): “Tasks and Heterogeneous Human Capital,” Journal of Labor Economics, 30(1), 1–53.

Bunching and Taxing Multidimensional Skills

Online Appendix

Job Boerma, Aleh Tsyvinski and Alexander Zimin

March 2025


Appendix A Proofs

In this appendix, we formally prove the results in the main text.

A.1 Proposition 1

To understand the optimal assignment, we consider a discrete version of the problem with identical discrete worker distributions $\{x_{1s}\}=\{x_{2s}\}$, which we denote by $F_{x}$, and a discrete firm distribution $\{z_{s}\}$, which we denote by $F_{z}$, for $s\in\{1,\dots,n\}$. The discrete problem is to find an assignment $\gamma$ that maximizes output:

$$\max_{\gamma\in\underline{\Gamma}}\;\sum\gamma_{ijk}\,y(x_{1i},x_{2j},z_{k}), \tag{A.1}$$

where $\gamma\in\underline{\Gamma}:=\{\gamma_{ijk}\geq 0\;\big|\;\sum_{jk}\gamma_{ijk}=1,\ \sum_{ik}\gamma_{ijk}=1,\ \sum_{ij}\gamma_{ijk}=1\}$. We solve this problem in steps.

First, we prove that without loss of generality we can focus on assignments $\gamma$ that are symmetric in worker inputs, so that $\gamma_{ijk}=\gamma_{jik}$. Suppose a solution $\gamma$ is not symmetric. Since the worker input samples are identical, we can define another feasible transport plan $\hat{\gamma}$ by $\hat{\gamma}_{ijk}=\gamma_{jik}$ for all workers and projects. When $\gamma$ solves the assignment problem, so does $\hat{\gamma}$, because $\sum\hat{\gamma}_{ijk}y(x_{1i},x_{2j},z_{k})=\sum\gamma_{jik}y(x_{1i},x_{2j},z_{k})=\sum\gamma_{jik}y(x_{2j},x_{1i},z_{k})=\sum\gamma_{ijk}y(x_{1i},x_{2j},z_{k})$, where the second equality follows as the production technology is symmetric in worker inputs, and the third equality follows by relabeling $i$ and $j$ and using that the worker samples are identical. Since the objective is linear and $\underline{\Gamma}$ is convex, the assignment $\frac{1}{2}(\gamma+\hat{\gamma})$ is also a solution, and it is symmetric. In summary, for every optimal assignment $\gamma$ there is a symmetric assignment $\frac{1}{2}(\gamma+\hat{\gamma})$ that also solves the assignment problem. Without loss of generality we can therefore focus on assignments $\gamma$ that are symmetric in worker inputs.
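The relabeling argument for symmetric assignments can be checked numerically on a small example. This is a minimal sketch: it assumes the team technology $y(a,b,z)=z(a_{c}b_{c}+a_{m}b_{m})$ used in (A.3), random toy samples, and a feasible plan built from two hypothetical permutations.

```python
import random

def output(a, b, z):
    # Team technology z * (a_c * b_c + a_m * b_m); symmetric in the two worker inputs.
    return z * (a[0] * b[0] + a[1] * b[1])

def plan_value(gamma, xs, zs):
    # Objective of (A.1) with identical worker samples x_1 = x_2 = xs.
    n = len(xs)
    return sum(gamma[i][j][k] * output(xs[i], xs[j], zs[k])
               for i in range(n) for j in range(n) for k in range(n))

def swap_roles(gamma):
    # gamma_hat[i][j][k] = gamma[j][i][k]: exchange the worker and coworker roles.
    n = len(gamma)
    return [[[gamma[j][i][k] for k in range(n)] for j in range(n)] for i in range(n)]

def monge_plan(sigma1, sigma2):
    # Project k employs worker sigma1[k] and coworker sigma2[k]; every marginal sums to one.
    n = len(sigma1)
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for k in range(n):
        gamma[sigma1[k]][sigma2[k]][k] = 1.0
    return gamma

random.seed(0)
n = 4
xs = [(random.random(), random.random()) for _ in range(n)]
zs = [random.random() for _ in range(n)]
gamma = monge_plan(random.sample(range(n), n), random.sample(range(n), n))
gamma_hat = swap_roles(gamma)
gamma_sym = [[[0.5 * (gamma[i][j][k] + gamma_hat[i][j][k]) for k in range(n)]
              for j in range(n)] for i in range(n)]
```

For any such plan, swapping roles leaves output unchanged, and the average `gamma_sym` is symmetric with the same value.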

Second, we prove that it is optimal to self-sort workers, so that $\gamma_{ijk}\neq 0$ implies that the workers are identical, or $i=j$. Consider an optimal symmetric assignment $\gamma$ and some project $z_{k}$, and denote the joint distribution of workers assigned to this project by $\gamma^{k}_{ij}:=\gamma_{ijk}$. We construct the marginal distributions of workers and coworkers assigned to this project as $\mu^{k}_{1i}:=\sum_{j}\gamma^{k}_{ij}$ and $\mu^{k}_{2j}:=\sum_{i}\gamma^{k}_{ij}$. Due to the symmetry of the assignment $\gamma$, the worker and coworker distributions within the firm are identical, $\mu^{k}=\mu^{k}_{1}=\mu^{k}_{2}$. Further, we let $\hat{\gamma}^{k}$ denote the optimal reassignment of workers and coworkers within the project:

$$\max_{\gamma^{k}\in\tilde{\Gamma}(\mu,\mu)}\;\sum\gamma^{k}_{ij}\,y\big(x_{1i},x_{2j},z_{k}\big), \tag{A.2}$$

that is, $\hat{\gamma}^{k}$ solves the assignment problem within a firm given the worker and coworker distribution $\mu$.

Within firms it is optimal to self-sort workers. Suppose some firm $z$ is assigned some worker and coworker distribution $\mu$, and consider two identical samples from this distribution. Given these identical worker samples $\{x^{j}_{i}\}$ for $i\in\{1,2\}$ and $j\in\mathcal{J}:=\{1,\dots,J\}$, the within-firm assignment problem is to choose an assignment, or equivalently a permutation $\sigma$, to maximize output $z\sum_{j\in\mathcal{J}}\big(x^{j}_{1c}x^{\sigma(j)}_{2c}+x^{j}_{1m}x^{\sigma(j)}_{2m}\big)$. Using the rearrangement inequality, aggregate output is bounded by:

$$\max_{\sigma}\;z\sum_{j\in\mathcal{J}}\big(x^{j}_{1c}x_{2c}^{\sigma(j)}+x_{1m}^{j}x_{2m}^{\sigma(j)}\big)\leq z\sum_{j\in\mathcal{J}}\big(x^{j}_{1c}x^{j}_{2c}+x^{j}_{1m}x^{j}_{2m}\big)=z\sum_{j\in\mathcal{J}}\big((x^{j}_{c})^{2}+(x^{j}_{m})^{2}\big). \tag{A.3}$$

The final equality follows as the worker and coworker distributions are identical. We conclude that self-sorting within every project attains maximum production. The rearrangement inequality implies optimality of positively sorting the skills of workers within each firm, as the production technology for each unidimensional task is supermodular as in Becker (1973). In our environment with multidimensional skills, positive sorting within each task is attained by self-sorting, implying that $\hat{\gamma}^{k}$ is a diagonal matrix.
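The bound (A.3) can be verified by brute force for a small within-firm sample. This is a sketch assuming random toy skills, a hypothetical project value $z=2$, and both roles filled from the same sample; it enumerates every permutation $\sigma$.

```python
import math
import random
from itertools import permutations

def within_firm_output(xs, sigma, z=1.0):
    # Worker j (first role) teams with coworker sigma[j] (second role);
    # both roles are filled from the same sample xs of (x_c, x_m) pairs.
    return z * sum(xs[j][0] * xs[sigma[j]][0] + xs[j][1] * xs[sigma[j]][1]
                   for j in range(len(xs)))

def self_sorting_output(xs, z=1.0):
    # Right-hand side of (A.3): z * sum_j ((x_c^j)^2 + (x_m^j)^2).
    return z * sum(xc ** 2 + xm ** 2 for xc, xm in xs)

random.seed(1)
xs = [(random.random(), random.random()) for _ in range(6)]
best = max(within_firm_output(xs, s, z=2.0) for s in permutations(range(len(xs))))
```

The maximum over all $6!=720$ permutations coincides with the self-sorting value, as (A.3) predicts; per task, this is the Cauchy-Schwarz form of the rearrangement bound.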

To formally establish that the optimal assignment features self-sorting for each firm, also with continuous marginal distributions $\mu$, we observe that (A.3) implies that the self-sorting set $M\subset X\times X$ is $c$-monotone (see, e.g., Bogachev and Kolesnikov (2012) or Ambrosio and Gigli (2013)).

Definition.

The set $M$ is $c$-monotone if for all pairings $(x_{11},x_{21}),(x_{12},x_{22}),\dots,(x_{1J},x_{2J})\in M$:

$$\sum_{j\in\mathcal{J}}y(x_{1j},x_{2j})\geq\sum_{j\in\mathcal{J}}y(x_{1j},x_{2\sigma(j)}) \tag{A.4}$$

for any permutation $\sigma$.

The $c$-monotonicity condition directly implies the weaker condition that the matching set $M$ is $c$-cyclically monotone, or $\sum_{j\in\mathcal{J}}y(x_{1j},x_{2j})\geq\sum_{j\in\mathcal{J}}y(x_{1j},x_{2,j+1})$, where $x_{2,J+1}=x_{21}$. The self-sorting assignment $\gamma_{z}$ supported on the matching set $M$ is therefore optimal, since optimality is equivalent to the support of $\gamma_{z}$ being $c$-cyclically monotone, following Theorem 1.2.7 in Bogachev and Kolesnikov (2012) or Theorem 1.13 in Ambrosio and Gigli (2013).

Given optimal self-sorting of workers within each firm, we construct a diagonal assignment $\hat{\gamma}$ by replacing $\gamma^{k}$ with the optimal self-sorted $\hat{\gamma}^{k}$ for every project $k$. Since $\hat{\gamma}^{k}$ solves the assignment problem within each firm, $\sum\hat{\gamma}_{ijk}y(x_{1i},x_{2j},z_{k})\geq\sum\gamma_{ijk}y(x_{1i},x_{2j},z_{k})$. By construction of $\hat{\gamma}$, $\hat{\gamma}_{ijk}\neq 0$ implies that the workers are identical, $i=j$, for any $k$. Without loss of generality, an optimal assignment therefore features self-sorting of workers with coworkers within projects $z$. We define effective worker skills, or a team's quality, by $X:=x^{2}_{c}+x^{2}_{m}$.

Finally, the optimal assignment sorts the best teams with the most valuable firm projects. Since the optimal assignment of workers within firms is self-sorting, the Kantorovich problem (A.1) simplifies to finding an assignment $\gamma\in\underline{\Gamma}:=\{\gamma_{ik}\geq 0\;\big|\;\sum_{k}\gamma_{ik}=1,\ \sum_{i}\gamma_{ik}=1\}$ that solves:

$$\max_{\gamma\in\underline{\Gamma}}\;\sum\gamma_{ik}\,y\big(X_{i},z_{k}\big), \tag{A.5}$$

that is, to assign teams to firms. Given that the reduced-form production technology $y(X,z)=zX$ implied by (A.3) is supermodular, the solution to this problem is positive sorting between team quality $X_{i}$ and project value $z_{k}$. The solution to the original multimarginal Kantorovich problem (A.1) is then constructed using $\gamma_{ijk}=\gamma_{ik}\delta_{ij}$, where $\delta$ is the Kronecker delta.
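Positive sorting in (A.5) can be illustrated with a small toy example. This is a sketch assuming the reduced-form technology $y(X,z)=zX$ and hypothetical team qualities and project values; the brute-force search over all assignments confirms that sorting attains the maximum.

```python
from itertools import permutations

def team_firm_output(X, zs, sigma):
    # Reduced-form output sum_i X_i * z_{sigma(i)}: team i works on project sigma[i].
    return sum(X[i] * zs[sigma[i]] for i in range(len(X)))

def positive_sorting(X, zs):
    # Match the k-th lowest team quality with the k-th lowest project value.
    teams = sorted(range(len(X)), key=lambda i: X[i])
    projects = sorted(range(len(zs)), key=lambda k: zs[k])
    sigma = [0] * len(X)
    for i, k in zip(teams, projects):
        sigma[i] = k
    return sigma

X = [0.5, 2.0, 1.0, 3.0]   # hypothetical team qualities
zs = [1.0, 4.0, 2.0, 3.0]  # hypothetical project values
sigma = positive_sorting(X, zs)  # [0, 3, 2, 1]
brute_force = max(team_firm_output(X, zs, s) for s in permutations(range(len(X))))
```

Here the sorted assignment yields $0.5\cdot 1+1\cdot 2+2\cdot 3+3\cdot 4=20.5$, which equals the brute-force maximum.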

While we constructed the solution to the Kantorovich formulation of the assignment problem in the main text, we observe that the optimal assignment $\gamma_{ijk}$ of the discrete planning problem is a Monge solution, meaning $\gamma_{ijk}\in\{0,1\}$. Hence the optimal assignment also solves the discrete planning problem of choosing permutations $\sigma_{i}$ to maximize output

$$\max_{\sigma_{1},\sigma_{2}}\;\sum_{s}y\big(x_{1\sigma_{1}(s)},x_{2\sigma_{2}(s)},z_{s}\big), \tag{A.6}$$

given the identical worker samples $\{x_{is}\}$ drawn from the distribution $F_{x}$ and a firm project sample $\{z_{s}\}$ drawn from the distribution $F_{z}$ for $s\in\{1,\dots,n\}$.
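For small $n$, problem (A.6) can be solved by enumerating both permutations, which gives a direct check of Proposition 1. This is a sketch assuming random toy samples, nonnegative project values, and the technology $y(a,b,z)=z(a_{c}b_{c}+a_{m}b_{m})$; `proposition_1_value` is a hypothetical helper that applies self-sorting followed by positive sorting of $X$ with $z$.

```python
import math
import random
from itertools import permutations

def y(a, b, z):
    # Team technology z * (a_c * b_c + a_m * b_m).
    return z * (a[0] * b[0] + a[1] * b[1])

def monge_value(xs, zs, sigma1, sigma2):
    # Objective of (A.6): project s employs worker sigma1[s] and coworker sigma2[s].
    return sum(y(xs[sigma1[s]], xs[sigma2[s]], zs[s]) for s in range(len(zs)))

def proposition_1_value(xs, zs):
    # Self-sorting (each worker teams with an identical coworker) plus positive
    # sorting of team quality X = x_c^2 + x_m^2 with project values z.
    X = sorted(xc ** 2 + xm ** 2 for xc, xm in xs)
    return sum(q * z for q, z in zip(X, sorted(zs)))

random.seed(2)
n = 4
xs = [(random.random(), random.random()) for _ in range(n)]
zs = [random.random() for _ in range(n)]  # nonnegative project values
perms = list(permutations(range(n)))
brute_force = max(monge_value(xs, zs, s1, s2) for s1 in perms for s2 in perms)
```

Enumeration grows as $(n!)^{2}$, so this check is only practical for very small samples.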

Before proceeding, we observe that a transport problem with two identical worker distributions $F_{x}$, each with measure one, is equivalent to a transport problem with a single distribution of workers $F_{x}$ with measure two, owing to the symmetry of workers' skills in production (4). Intuitively, any assignment for a problem with distinct worker distributions can be made symmetric. Consider an assignment $\gamma$ that solves the discrete assignment problem for distinct worker distributions $F_{x_{1}}$ and $F_{x_{2}}$. The transpose of $\gamma$ along the worker input dimensions, which we denote by $\gamma^{\prime}$, solves the discrete assignment problem with worker input distributions $F_{x_{2}}$ and $F_{x_{1}}$. This implies that the symmetric assignment $\hat{\gamma}:=\frac{\gamma+\gamma^{\prime}}{2}$ solves the discrete assignment problem with worker distributions $F_{x}=\frac{1}{2}(F_{x_{1}}+F_{x_{2}})$.


Continuous Distributions. To obtain the solution for continuous distributions of workers and coworkers, we extend our argument for discrete distributions. We construct an assignment $\gamma$ that self-sorts workers and coworkers to obtain a unidimensional distribution of team quality $X$. The assignment $\gamma$ combines self-sorting of workers with positive sorting between the worker skill index $X$ and projects $z$.

This assignment $\gamma$ solves the Kantorovich problem (9). To prove this claim, denote the support of the assignment, the matching set, by $M$. Consider a collection of points in the matching set, $\{(x_{1s},x_{2s},z_{s})\}\subset M$. For each of these points it holds that $x_{1s}=x_{2s}$, and $z_{s}\leq z_{s^{\prime}}$ implies $X_{s}\leq X_{s^{\prime}}$. Since the support is constructed using a Monge solution to the discrete assignment problem, $\sum_{s}y(x_{1\sigma_{1}(s)},x_{2\sigma_{2}(s)},z_{s})\leq\sum_{s}y(x_{1s},x_{2s},z_{s})$ for all permutations $\sigma_{1},\sigma_{2}$. Equivalently, the matching set $M$ is $c$-monotone. By Theorem 1.2 in Griessler (2018), the assignment $\gamma$ solves the Kantorovich problem. This concludes the proof of Proposition 1.

A.2 Incentive Constraints

We first discuss the differentiability of the indirect utility function $u$. Feasibility of an allocation is equivalent to feasibility of the indirect utility function $u$, which means that $u$ is convex, nonnegative, and decreasing, together with the additional constraint $-x(p)\in\partial u(p)$, where $\partial u$ denotes the subdifferential of $u$ at a given point.

Consider the incentive constraint (12):

$$c(p)-p_{c}x_{c}(p)-p_{m}x_{m}(p)\geq c(p^{\prime})-p_{c}x_{c}(p^{\prime})-p_{m}x_{m}(p^{\prime})$$

Using the notation for the indirect utility function (14), that is, $u(p)=c(p)-p_{c}x_{c}(p)-p_{m}x_{m}(p)$, the right-hand side equals $u(p^{\prime})-\langle p-p^{\prime},x(p^{\prime})\rangle$, so the constraint can be equivalently rewritten as

$$u(p)-u(p^{\prime})\geq\langle p-p^{\prime},-x(p^{\prime})\rangle.$$

This inequality for each pair of types $(p,p^{\prime})$ is equivalent to the convexity of $u$ together with the constraint that $-x(p)\in\partial u(p)$. Since $u$ is convex, it is differentiable almost everywhere by Alexandrov's theorem, and hence $\partial u(p)=\{\nabla u(p)\}$ almost everywhere.
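One way to see the equivalence is through a menu of bundles: if each type picks the bundle $(c_{k},x_{k})$ maximizing $c_{k}-\langle p,x_{k}\rangle$, then $u$ is a maximum of affine functions, hence convex, and $u(p)\geq c_{q}-\langle p,x(q)\rangle=u(q)+\langle p-q,-x(q)\rangle$. The sketch below checks this pairwise inequality on a finite type set. The menu, types, and helper names are hypothetical illustrations, not the planner's solution.

```python
from itertools import permutations

def choose_from_menu(p, menu):
    # Type p = (p_c, p_m) picks the bundle (c, (x_c, x_m)) maximizing c - <p, x>.
    values = [c - (p[0] * x[0] + p[1] * x[1]) for c, x in menu]
    k = max(range(len(menu)), key=lambda i: values[i])
    return values[k], menu[k][1]

def satisfies_ic(types, u, x, tol=1e-12):
    # Check u(p) - u(q) >= <p - q, -x(q)> for every ordered pair of distinct types.
    for a, b in permutations(range(len(types)), 2):
        dc = types[a][0] - types[b][0]
        dm = types[a][1] - types[b][1]
        if u[a] - u[b] < -(dc * x[b][0] + dm * x[b][1]) - tol:
            return False
    return True

types = [(1.0, 1.0), (2.0, 2.0)]
menu = [(0.0, (0.0, 0.0)), (3.0, (1.0, 1.0))]  # hypothetical (consumption, tasks) bundles
choices = [choose_from_menu(p, menu) for p in types]
u = [value for value, _ in choices]      # [1.0, 0.0]
x = [bundle for _, bundle in choices]    # [(1.0, 1.0), (0.0, 0.0)]
```

Swapping the two bundles across types gives $u=(0,-1)$ with $x=((0,0),(1,1))$, and `satisfies_ic` then returns `False`: the low-cost type would prefer to report the high-cost type.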


We next show which incentive compatibility constraints are redundant in the planner problem for the numerical analysis. We establish that every reducible incentive constraint is redundant in the presence of the irreducible constraints, which shrinks the set of incentive constraints that the planner needs to take into account. To show this result, we let $L\subseteq\mathbb{R}^{2}$ be a finite subset, and we first define irreducible constraints.

Figure A.1: Reducible and Irreducible Incentive Constraints

Figure A.1 shows reducible and irreducible incentive constraints for worker $A$. When the irreducible incentive constraints between workers $A$ and $B$ are satisfied (as indicated by the black solid line between workers $A$ and $B$), and the irreducible incentive constraints between workers $B$ and $C$ are satisfied (as indicated by the blue solid line between workers $B$ and $C$), then the reducible incentive constraints between $A$ and $C$ are satisfied (as indicated by the orange dashed line). Every reducible incentive constraint is satisfied when the irreducible constraints are. The black solid lines denote all the irreducible incentive constraints for worker $A$.

Definition.

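A pair of points $(p,q)\subseteq L$ is irreducible if there is no point $m\in L$ strictly between $p$ and $q$ on the line segment connecting them. The incentive constraints between $p$ and $q$ are irreducible if the pair $(p,q)$ is irreducible, and reducible otherwise.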
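The definition can be applied mechanically. The sketch below, assuming a hypothetical $3\times 3$ grid of integer types so that the collinearity test is exact, lists the irreducible pairs. Of the 36 unordered pairs, 8 have a grid point strictly between them, so 28 pairs, that is, 56 of the 72 directed incentive constraints, remain.

```python
from itertools import combinations

def on_open_segment(m, p, q):
    """True if point m lies strictly between p and q on the segment [p, q]."""
    if m == p or m == q:
        return False
    # Collinearity: the cross product of (q - p) and (m - p) must vanish.
    cross = (q[0] - p[0]) * (m[1] - p[1]) - (q[1] - p[1]) * (m[0] - p[0])
    if cross != 0:
        return False
    # Betweenness: the projection of m - p onto q - p lies strictly inside (0, |q - p|^2).
    dot = (m[0] - p[0]) * (q[0] - p[0]) + (m[1] - p[1]) * (q[1] - p[1])
    length2 = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
    return 0 < dot < length2

def irreducible_pairs(L):
    """All unordered pairs (p, q) of L with no third point of L strictly between them."""
    return [(p, q) for p, q in combinations(L, 2)
            if not any(on_open_segment(m, p, q) for m in L)]

grid = [(i, j) for i in range(3) for j in range(3)]  # hypothetical type grid
pairs = irreducible_pairs(grid)
print(len(list(combinations(grid, 2))), len(pairs))  # 36 28
```

Integer coordinates are a deliberate choice: with floating-point types, the collinearity test would need a tolerance.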

Lemma 4.

All reducible incentive constraints are implied by irreducible incentive constraints.

The proof is presented below.

Figure A.1 shows the reducible and the irreducible incentive constraints for worker $A$. When the irreducible incentive constraints between workers $A$ and $B$ are satisfied (black solid line between workers $A$ and $B$), and the irreducible incentive constraints between workers $B$ and $C$ are satisfied (blue solid line between workers $B$ and $C$), then the reducible incentive constraints between $A$ and $C$ are satisfied (orange dashed line). Every reducible incentive constraint is satisfied whenever the irreducible constraints are. The black solid lines denote the irreducible incentive constraints for worker $A$. The incentive constraints between workers $A$ and $D$, as well as between workers $A$ and $E$, are also reducible.


We next establish that no other incentive constraints can be eliminated a priori: removing any of the irreducible incentive constraints strictly enlarges the set of feasible allocations.

Lemma 5.

Consider any irreducible incentive constraint where worker type $p_{0}$ does not want to report to be of type $q_{0}$. If we eliminate this incentive constraint, then there exists an allocation that satisfies all other incentive constraints while worker type $p_{0}$ wants to report $q_{0}$. That is, for any irreducible pair $(p_{0},q_{0})\subseteq L$ there exist functions $(u,x)$ such that:

$$u(p)-u(q)\geq\langle p-q,-x(q)\rangle$$

for all $p,q\in L$ with $(p,q)\neq(p_{0},q_{0})$, and

$$u(p_{0})-u(q_{0})<\langle p_{0}-q_{0},-x(q_{0})\rangle.$$

The proof is presented below.

We denote the set of utility allocations that satisfy the set of irreducible linear incentive constraints by $\mathcal{I}$.


Proof of Lemma 4. Let $v$ be a direction vector emanating from a parameter point $p=(p_{c},p_{m})$, and let $\lambda$ and $\beta$ be scalars such that $0<\lambda<\beta$. We consider the points $p+\lambda v$ and $p+\beta v$.

We first show that if the incentive constraints between points $p$ and $p+\lambda v$, as well as between $p+\lambda v$ and $p+\beta v$, are satisfied, then the incentive constraints between $p$ and $p+\beta v$ are implied. By (15), the incentive constraint that $p$ does not want to report $q=p+\lambda v$ implies:

$$u(p)-u(p+\lambda v)\geq-\lambda\langle v,\partial u(p+\lambda v)\rangle=\lambda\langle v,x(p+\lambda v)\rangle. \tag{A.7}$$

Similarly, the incentive constraint that $q=p+\lambda v$ does not want to report $p$ implies:

$$u(p+\lambda v)-u(p)\geq\lambda\langle v,\partial u(p)\rangle=-\lambda\langle v,x(p)\rangle.$$

Adding these two constraints, we obtain $\langle v,x(p)\rangle\geq\langle v,x(p+\lambda v)\rangle$.

Using the incentive constraint that $p+\beta v$ does not want to report $p+\lambda v$:

$$u(p+\beta v)-u(p+\lambda v)\geq-(\beta-\lambda)\langle v,x(p+\lambda v)\rangle, \tag{A.8}$$

we show that, given these incentive constraints, the constraint between points $p+\beta v$ and $p$ is implied. We evaluate

$$\begin{aligned}
u(p+\beta v)-u(p)&=\big(u(p+\beta v)-u(p+\lambda v)\big)+\big(u(p+\lambda v)-u(p)\big)\\
&\geq-(\beta-\lambda)\langle v,x(p+\lambda v)\rangle-\lambda\langle v,x(p)\rangle\\
&\geq-(\beta-\lambda)\langle v,x(p)\rangle-\lambda\langle v,x(p)\rangle=-\beta\langle v,x(p)\rangle,
\end{aligned}$$

where the first inequality follows from (A.8) and the incentive constraint that $p+\lambda v$ does not want to report $p$, while the second inequality follows from $\langle v,x(p)\rangle\geq\langle v,x(p+\lambda v)\rangle$ and $\beta-\lambda>0$. The resulting inequality, $u(p+\beta v)-u(p)\geq\langle\beta v,-x(p)\rangle$, is exactly the incentive constraint that $p+\beta v$ does not want to report $p$.

Similarly, we use the incentive constraint that $p+\lambda v$ does not want to report $p+\beta v$:

$$u(p+\lambda v)-u(p+\beta v)\geq-(\lambda-\beta)\langle v,x(p+\beta v)\rangle \tag{A.9}$$

in order to prove that $p$ does not want to report $p+\beta v$:

$$\begin{aligned}
u(p)-u(p+\beta v)&=\big(u(p)-u(p+\lambda v)\big)+\big(u(p+\lambda v)-u(p+\beta v)\big)\\
&\geq\lambda\langle v,x(p+\lambda v)\rangle-(\lambda-\beta)\langle v,x(p+\beta v)\rangle\\
&\geq\lambda\langle v,x(p+\beta v)\rangle-(\lambda-\beta)\langle v,x(p+\beta v)\rangle=\beta\langle v,x(p+\beta v)\rangle,
\end{aligned}$$

where the first inequality follows from (A.7) and (A.9), and the final inequality follows from adding (A.8) and (A.9), which implies $\langle v,x(p+\lambda v)\rangle\geq\langle v,x(p+\beta v)\rangle$. This shows that we do not need to incorporate the incentive constraints between $p$ and $p+\beta v$ once we incorporate the incentive constraints between $p$ and $p+\lambda v$, and between $p+\lambda v$ and $p+\beta v$.

The final step is to note that this argument holds for any scalars $\lambda$ and $\beta$ such that $0<\lambda<\beta$. Hence, for $\underline{\lambda}$ such that $0<\underline{\lambda}<\lambda<\beta$, the constraints between $p$ and $p+\lambda v$ are in turn implied by the constraints between $p$ and $p+\underline{\lambda}v$ as well as between $p+\underline{\lambda}v$ and $p+\lambda v$. Therefore, for every point $p$ and direction $v$ we only need to consider the constraints for the smallest value of $\lambda$ such that $p+\lambda v\in L$, which exists because $L$ is finite. These constraints are irreducible.
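Along a single ray, the argument can be checked numerically. A minimal sketch, assuming hypothetical one-dimensional types on a line: it draws random allocations that satisfy only the constraints between neighboring types (in one dimension this requires $x$ to be nonincreasing and each utility drop $u(p_i)-u(p_{i+1})$ to lie between $(p_{i+1}-p_i)x(p_{i+1})$ and $(p_{i+1}-p_i)x(p_i)$), and then verifies that every reducible constraint holds as well.

```python
import random

def adjacent_ic_allocation(points, rng):
    """Random (u, x) on sorted 1D types satisfying only the constraints between neighbors."""
    x = sorted((rng.uniform(0.0, 5.0) for _ in points), reverse=True)  # nonincreasing x
    u = [0.0]
    for i in range(len(points) - 1):
        d = points[i + 1] - points[i]
        # Neighbor constraints: d * x[i+1] <= u[i] - u[i+1] <= d * x[i].
        drop = d * (x[i + 1] + rng.random() * (x[i] - x[i + 1]))
        u.append(u[i] - drop)
    return u, x

def ic_gap(u, x, points, i, j):
    """u(p_i) - u(p_j) - <p_i - p_j, -x(p_j)>; nonnegative iff p_i does not mimic p_j."""
    return u[i] - u[j] + (points[i] - points[j]) * x[j]

rng = random.Random(0)
points = sorted(rng.uniform(0.0, 10.0) for _ in range(8))
worst = min(
    ic_gap(u, x, points, i, j)
    for u, x in (adjacent_ic_allocation(points, rng) for _ in range(1000))
    for i in range(len(points))
    for j in range(len(points))
    if i != j
)
print(worst >= -1e-9)  # True: the reducible (non-neighbor) constraints hold as well
```

The small tolerance only absorbs floating-point rounding; in exact arithmetic the smallest gap is nonnegative.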


Proof of Lemma 5. We proceed by induction on the number of points in $L$. The induction base is a set $L$ with $|L|=2$. Let the two points be $p_{0}$ and $q_{0}$. We show there exist functions $(u,x)$ such that:

$$\begin{aligned}
u(q_{0})-u(p_{0})&\geq\langle q_{0}-p_{0},-x(p_{0})\rangle\\
u(p_{0})-u(q_{0})&<\langle p_{0}-q_{0},-x(q_{0})\rangle.
\end{aligned}$$

Set $x(p_{0})=x(q_{0})=0$, $u(p_{0})=0$, and $u(q_{0})=1$. Then the first inequality reads $1\geq 0$ and the second reads $-1<0$.

Induction step for $|L|=n+1$ points. Let $z$ denote a vertex of the convex hull of the set $L$ that is neither $p_{0}$ nor $q_{0}$. Such a point exists: otherwise the convex hull is the segment between $p_{0}$ and $q_{0}$, so any other point of $L$ would lie between $p_{0}$ and $q_{0}$, contradicting that $(p_{0},q_{0})$ is irreducible.

Remove the point $z$ from the set $L$. The pair $(p_{0},q_{0})$ remains irreducible in $L\setminus\{z\}$, so by the induction hypothesis for $n$ points there exist functions $(u,x)$ on the set $L\setminus\{z\}$ such that:

$$u(p)-u(q)\geq\langle p-q,-x(q)\rangle$$

for all $p,q\in L\setminus\{z\}$ with $(p,q)\neq(p_{0},q_{0})$, and

$$u(p_{0})-u(q_{0})<\langle p_{0}-q_{0},-x(q_{0})\rangle.$$

We need to extend the functions $u$ and $x$ to the point $z$, using that $z$ is a vertex of the convex hull. At the point $z$, we require:

$$\begin{aligned}
u(z)-u(p)&\geq\langle z-p,-x(p)\rangle\\
u(p)-u(z)&\geq\langle p-z,-x(z)\rangle
\end{aligned}$$

for all $p\in L\setminus\{z\}$. Reorganizing, the first inequality becomes

$$u(z)\geq\max_{p\neq z}\;\big\{u(p)+\langle z-p,-x(p)\rangle\big\}=:\mathrm{C},$$

where the constant $\mathrm{C}$ is independent of both $u(z)$ and $x(z)$. We set $u(z)=\mathrm{C}$. As a result, the second inequality becomes:

$$u(p)-\mathrm{C}\geq\langle p-z,-x(z)\rangle.$$

To show that this inequality can be satisfied, we use the following variation of Farkas' lemma: for any convex polytope $\mathrm{P}$ and any vertex $v$ of this polytope, there exists a hyperplane that contains $v$ while all other points of $\mathrm{P}$ lie strictly on one side of it. Equivalently, there exists a vector $h$ such that $\langle x-v,h\rangle<0$ for all $x\in\mathrm{P}\setminus\{v\}$.

Since the point $z$ is a vertex of the convex hull of $L$, there exists $\tilde{x}(z)$ (namely $\tilde{x}(z)=-h$) such that $\langle p-z,-\tilde{x}(z)\rangle<0$ for every $p\in L\setminus\{z\}$. Define the constant $\mathrm{C}_{p}=\langle p-z,-\tilde{x}(z)\rangle<0$. Then there exists $t_{p}>0$ such that:

$$\langle p-z,-t_{p}\tilde{x}(z)\rangle=t_{p}\mathrm{C}_{p}<u(p)-\mathrm{C}.$$

Further, let $t=\max_{p\in L\setminus\{z\}}t_{p}$. Since $\mathrm{C}_{p}<0$, this implies $\langle p-z,-t\tilde{x}(z)\rangle=t\mathrm{C}_{p}\leq t_{p}\mathrm{C}_{p}<u(p)-\mathrm{C}$ for all $p\in L\setminus\{z\}$. Hence, setting $x(z)=t\tilde{x}(z)$ satisfies all constraints involving $z$, which concludes the proof.

A.3 Lemma 2

We first prove that if the indirect utility function (14) is strongly convex, then there is no bunching. Since the indirect utility function $u$ is twice continuously differentiable, the inverse function theorem implies that the labor task allocation is locally invertible whenever the Jacobian matrix of the mapping from $p$ to $x$ is invertible. Using $x(p)=-\nabla u(p)$, this Jacobian matrix,

$$\begin{pmatrix}\partial x_{c}/\partial p_{c}&\partial x_{c}/\partial p_{m}\\ \partial x_{m}/\partial p_{c}&\partial x_{m}/\partial p_{m}\end{pmatrix},$$

is the negative of the Hessian matrix of the indirect utility function. Since $u$ is strongly convex at worker $p$, its Hessian matrix is positive definite and hence invertible, and so is the Jacobian matrix. Summarizing, if the utility function is strongly convex, then there is no bunching.

Now we prove that if the indirect utility function (14) is not strongly convex, that is, the Hessian matrix is degenerate for all workers in a neighborhood of $p$, then worker $p$ is bunched, $p\in\mathcal{B}$. To prove this statement, consider the mapping $f$ from worker type $p$ to the labor allocation $x$, and let $\mathcal{P}$ denote a neighborhood of workers around $p$ such that the Hessian matrix $H(u)$ is degenerate for all workers $p\in\mathcal{P}$. Since the Jacobian matrix of $f$ is the negative of the Hessian matrix of the indirect utility function, the Jacobian matrix is degenerate for all workers $p\in\mathcal{P}$. Equivalently, $\mathcal{P}$ consists of critical points of $f$. By Sard's theorem, the image $f(\mathcal{P})$ has Lebesgue measure zero.

To prove that worker $p$ is bunched, suppose by contradiction that it is not, $p\notin\mathcal{B}$. Equivalently, the mapping $f$ is injective on a neighborhood $\hat{\mathcal{P}}\subseteq\mathcal{P}$ of $p$. By the invariance of domain theorem, the image $f(\hat{\mathcal{P}})$ is a non-empty open set and therefore has strictly positive Lebesgue measure, contradicting the implication of Sard's theorem. Thus, workers are bunched when the optimality condition does not hold.
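A minimal illustration of the dichotomy, assuming two hypothetical indirect utility functions: $u(p)=e^{-(p_c+p_m)}$, whose Hessian is singular everywhere, and $u(p)=e^{-p_c}+e^{-p_m}$, whose Hessian is positive definite. With $x(p)=-\nabla u(p)$, the first function maps every type on the line $p_c+p_m=1$ to the same allocation, so those types are bunched, while the second separates them. A central-difference Jacobian determinant confirms the degeneracy.

```python
import math

def alloc_degenerate(p):
    """x = -grad u for the hypothetical u(p) = exp(-(p_c + p_m)); singular Hessian."""
    a = math.exp(-(p[0] + p[1]))
    return (a, a)

def alloc_strongly_convex(p):
    """x = -grad u for the hypothetical u(p) = exp(-p_c) + exp(-p_m); definite Hessian."""
    return (math.exp(-p[0]), math.exp(-p[1]))

def jacobian_det(alloc, p, h=1e-6):
    """Central-difference det of the Jacobian of p -> x(p), which equals det of Hess(u)."""
    cols = []
    for k in range(2):
        step = (h if k == 0 else 0.0, h if k == 1 else 0.0)
        hi = alloc((p[0] + step[0], p[1] + step[1]))
        lo = alloc((p[0] - step[0], p[1] - step[1]))
        cols.append(((hi[0] - lo[0]) / (2 * h), (hi[1] - lo[1]) / (2 * h)))
    return cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]

line = [(t, 1.0 - t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]  # types with p_c + p_m = 1
print(len(set(map(alloc_degenerate, line))))       # 1: all five types are bunched
print(len(set(map(alloc_strongly_convex, line))))  # 5: all five types are separated
print(jacobian_det(alloc_degenerate, (0.5, 0.5)), jacobian_det(alloc_strongly_convex, (0.5, 0.5)))
```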

A.4 Proposition 2

To prove the proposition, we prove Lemma 6 and Lemma 7.

Lemma 6.

Let $(c,x)$ solve the planner problem. The following condition holds with equality at an optimum:

$$\int\big(\mathcal{C}^{\prime}(c)c+z\big(\mathcal{X}^{\prime}(x_{c})x_{c}+\mathcal{X}^{\prime}(x_{m})x_{m}\big)\big)\pi\,\text{d}p=\lambda\int\big(c-p_{c}x_{c}-p_{m}x_{m}\big)\pi\,\text{d}p. \tag{A.10}$$
Proof.

Consider an allocation $(c,x)$ that satisfies the incentive compatibility constraints, and multiply this allocation by a constant factor $\zeta>0$ to obtain the scaled allocation $\zeta(c,x)$. Since the incentive constraints are linear and homogeneous in $(c,x)$, the scaled allocation also satisfies them. If $(c,x)$ is optimal, the Lagrangian of the scaled allocation is at least the Lagrangian of the optimal allocation, $\mathcal{L}(c,x)\leq\mathcal{L}(\zeta(c,x))$. We can therefore consider a variation around the optimal allocation $(c,x)$ in which we scale the allocation by $\zeta=1+\varepsilon$ for a small $\varepsilon$, so that the alternative allocation $(c+\varepsilon c,x+\varepsilon x)$ is feasible. Given such a variation, the implied change in the Lagrangian is:

$$\Delta=\varepsilon\Big(\int\big(\mathcal{C}^{\prime}(c)c+z\big(\mathcal{X}^{\prime}(x_{c})x_{c}+\mathcal{X}^{\prime}(x_{m})x_{m}\big)\big)\pi\,\text{d}p-\lambda\int\big(c-p_{c}x_{c}-p_{m}x_{m}\big)\pi\,\text{d}p\Big)+o(\varepsilon). \tag{A.11}$$

At an optimum, neither a positive ($\varepsilon>0$) nor a negative ($\varepsilon<0$) small variation decreases the objective, so the first-order term in (A.11) must vanish, $\Delta=o(\varepsilon)$, which establishes (A.10).∎

Lemma 7.

Let $(c,x)$ solve the planning problem. Then the implementability condition (18) holds for any feasible allocation $(\hat{c},\hat{x})\in\mathcal{I}$.

Proof.

If two allocations $(c,x)$ and $(\hat{c},\hat{x})$ satisfy the incentive constraints, so does their convex combination $(\tilde{c},\tilde{x})=(1-\varepsilon)(c,x)+\varepsilon(\hat{c},\hat{x})$ for any $\varepsilon\in(0,1)$. If $(c,x)$ is a planner solution, it follows that for any $\varepsilon\in(0,1)$ and any feasible allocation $(\hat{c},\hat{x})$, the convex combination does not decrease the value of the Lagrangian relative to its optimum, $\mathcal{L}(\tilde{c},\tilde{x})-\mathcal{L}(c,x)\geq 0$. By construction of the convex combination, this is equivalent to $\mathcal{L}\big((c,x)+\varepsilon\big((\hat{c},\hat{x})-(c,x)\big)\big)-\mathcal{L}(c,x)\geq 0$.

To develop this further, note that

$$\begin{aligned}
&\mathcal{L}\big((c,x)+\varepsilon((\hat{c},\hat{x})-(c,x))\big)-\mathcal{L}(c,x)\\
&\quad=\varepsilon\Big(\int\Big(\mathcal{C}^{\prime}(c)(\hat{c}-c)+z\sum_{s}\mathcal{X}^{\prime}(x_{s})(\hat{x}_{s}-x_{s})\Big)\pi\,\text{d}p-\lambda\int\Big((\hat{c}-c)-\sum_{s}p_{s}(\hat{x}_{s}-x_{s})\Big)\pi\,\text{d}p\Big)+o(\varepsilon)\\
&\quad=\varepsilon\Big(\int\Big(\mathcal{C}^{\prime}(c)\hat{c}+z\sum_{s}\mathcal{X}^{\prime}(x_{s})\hat{x}_{s}\Big)\pi\,\text{d}p-\lambda\int\Big(\hat{c}-\sum_{s}p_{s}\hat{x}_{s}\Big)\pi\,\text{d}p\Big)+o(\varepsilon),
\end{aligned}$$

where $s\in\{c,m\}$ and the final equality follows from the optimality condition (A.10) in Lemma 6. Equation (18) follows because this expression is nonnegative for every $\varepsilon\in(0,1)$; dividing by $\varepsilon$ and letting $\varepsilon\to 0$ gives:

$$\int\big(\mathcal{C}^{\prime}(c)\hat{c}+z\big(\mathcal{X}^{\prime}(x_{c})\hat{x}_{c}+\mathcal{X}^{\prime}(x_{m})\hat{x}_{m}\big)\big)\pi\,\text{d}p\geq\lambda\int\big(\hat{c}-p_{c}\hat{x}_{c}-p_{m}\hat{x}_{m}\big)\pi\,\text{d}p, \tag{18}$$

for any allocation $(\hat{c},\hat{x})\in\mathcal{I}$.∎

A.5 Global Optimal Tax Formula in One Dimension

In this appendix, we develop the connection of our general optimal tax condition with stochastic dominance to the classic ABC formula. First, we consider equation (20) under unidimensional skill heterogeneity. With a slight abuse of notation, we denote the unidimensional skill by $p$. In this case, equation (20) simplifies to:

$$\int\partial_{p}\big(\pi(p\mathcal{C}^{\prime}(c)+z\mathcal{X}^{\prime}(x))\big)\hat{u}\,\text{d}p\geq\int\pi(\lambda-\mathcal{C}^{\prime}(c))\hat{u}\,\text{d}p, \tag{A.12}$$

for any decreasing, nonnegative, and convex indirect utility function $\hat{u}$ with $\hat{u}(\bar{p})=0$. [Footnote 22: Assuming there is no bunching at the top of the unidimensional worker skill distribution, both boundary conditions are zero under the additional condition that $\int\pi\mathcal{C}^{\prime}(c)\,\text{d}p\geq\lambda$.] Moreover, with one dimension of worker heterogeneity, the measure $f$ second-order stochastically dominates the measure $g$ if and only if $\int_{\underline{p}}^{\hat{p}}F(p)\,\text{d}p\geq\int_{\underline{p}}^{\hat{p}}G(p)\,\text{d}p$ for all $\hat{p}$, where $F$ and $G$ denote the cumulative distribution functions (see the next paragraph). When the unidimensional measure $\partial_{p}\big(\pi(p\mathcal{C}^{\prime}(c)+z\mathcal{X}^{\prime}(x))\big)$ second-order stochastically dominates the unidimensional measure $\pi(\lambda-\mathcal{C}^{\prime}(c))$, this implies:

$$\int^{p}_{\underline{p}}\frac{\pi(s)}{u^{\prime}(\mathcal{C}(c(s)))}s\frac{\tau}{1-\tau}\,\text{d}s\leq\int^{p}_{\underline{p}}\int^{t}_{\underline{p}}\frac{\pi(s)}{u^{\prime}(\mathcal{C}(c(s)))}\left(1-u^{\prime}(\mathcal{C}(c(s)))\lambda\right)\text{d}s\,\text{d}t \tag{A.13}$$

for every worker $p$, where we use the definition of the labor skill wedge (16), which changes the inequality sign, and also use that $\mathcal{C}^{\prime}(c)=1/u^{\prime}(\mathcal{C}(c))$. At an optimum, the utility-weighted average benefit of increasing marginal tax rates for all workers below $p$, on the right, exceeds the corresponding costs, on the left. The benefit of an increase in a marginal tax rate is an increase in revenues collected from workers below $p$ (high $\alpha$) net of the cost of tightening the promise-keeping constraint, $\int^{t}_{\underline{p}}\frac{\pi(s)}{u^{\prime}(\mathcal{C}(c(s)))}\left(1-u^{\prime}(\mathcal{C}(c(s)))\lambda\right)\text{d}s$. The cost of increasing the marginal tax rate for all workers below $p$ is captured by the marginal utility-weighted labor wedge. Our optimal tax formula as stochastic dominance (20) extends this logic to multidimensional skills.


Second-Order Stochastic Dominance in One Dimension. Let $\Upsilon_{a}(p)$ denote a decreasing, nonnegative, and convex function, parameterized by $a$, that is strictly positive for all $p<a$ and equal to zero for all $p\geq a$. Specifically, we let $\Upsilon_{a}(p):=\max(a-p,0)$. Since $\Upsilon_{a}(p)$ is decreasing, nonnegative, and convex, measure $f$ second-order stochastically dominating measure $g$ implies that $\int\Upsilon_{a}f\,\text{d}p\geq\int\Upsilon_{a}g\,\text{d}p$ for all $a$, following (21). Given the specification of $\Upsilon_{a}(p)$, this is equivalent to $\int_{\underline{p}}^{a}(a-p)f\,\text{d}p\geq\int_{\underline{p}}^{a}(a-p)g\,\text{d}p$ for all $a$ or, integrating by parts, to $\int_{\underline{p}}^{a}F\,\text{d}p\geq\int_{\underline{p}}^{a}G\,\text{d}p$ for all $a$. Conversely, since any unidimensional decreasing, nonnegative, and convex indirect utility function $\hat{u}$ with $\hat{u}(\bar{p})=0$ can be written as a nonnegative combination of the functions $\Upsilon_{a}(p)$, the claim holds.
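A small numerical sketch of this equivalence, assuming hypothetical discrete measures on $[0,1]$ (lower end of the support set to $0$) and a finite grid of thresholds $a$. The measure $g$ is a mean-preserving spread of the point mass $f$. The function `dominates` checks $\int\Upsilon_{a}f\geq\int\Upsilon_{a}g$ on the grid. The function `integrated_cdf` approximates $\int_{0}^{a}F$ with the midpoint rule, and this approximation should match the $\Upsilon_a$ integral up to quadrature error.

```python
def upsilon_integral(measure, a):
    """Integral of Upsilon_a(p) = max(a - p, 0) against a discrete measure {p: weight}."""
    return sum(w * max(a - p, 0.0) for p, w in measure.items())

def integrated_cdf(measure, a, lower=0.0, n=4000):
    """Midpoint-rule approximation of the integral of the CDF F(s) over [lower, a]."""
    if a <= lower:
        return 0.0
    step = (a - lower) / n
    total = 0.0
    for k in range(n):
        s = lower + (k + 0.5) * step
        total += sum(w for p, w in measure.items() if p <= s) * step
    return total

def dominates(f, g, thresholds, tol=1e-12):
    """True if int Upsilon_a f >= int Upsilon_a g for every threshold a on the grid."""
    return all(upsilon_integral(f, a) >= upsilon_integral(g, a) - tol for a in thresholds)

thresholds = [k / 20 for k in range(21)]  # hypothetical grid of a in [0, 1]
f = {0.5: 1.0}                            # point mass
g = {0.25: 0.5, 0.75: 0.5}                # mean-preserving spread of f
print(dominates(g, f, thresholds), dominates(f, g, thresholds))  # True False
```

Because every $\Upsilon_a$ is convex, the spread $g$ puts at least as much weight on each $\Upsilon_a$ as $f$ does, but not the other way around.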


Relation to Mirrlees (1971), Diamond (1998), and Saez (2001). Second, we discuss in more detail how the optimal tax condition (24) directly relates to the ABC formulas in Diamond (1998) and Saez (2001), and to the optimal tax condition in Mirrlees (1971). To see the relationship, we first take the ABC formula in Saez (2001), which is equation (25) in his paper:

$$\frac{\tau_{l}(\theta)}{1-\tau_{l}(\theta)}=\left(1+\frac{1}{\varepsilon}\right)\frac{1-F(\theta)}{\theta f(\theta)}\int_{\theta}^{\infty}\left[1-\frac{u_{c}(x)}{p}\right]\frac{u_{c}(\theta)}{u_{c}(x)}\frac{f(x)}{1-F(\theta)}\,\text{d}x \tag{A.14}$$

Reorganizing this expression and letting $\rho=1+\frac{1}{\varepsilon}$, we obtain:

$$\frac{1}{\rho}\frac{f(\theta)}{u_{c}(\theta)}\theta\frac{\tau_{l}(\theta)}{1-\tau_{l}(\theta)}=\int_{\theta}^{\infty}\left[\frac{1}{u_{c}(x)}-\frac{1}{p}\right]f(x)\,\text{d}x, \tag{A.15}$$

which is, in fact, the representation of the optimal tax formula in equation (33) in Mirrlees (1971). Differentiating this expression with respect to type $\theta$, we obtain:

$$\frac{1}{\rho}\partial_{\theta}\left(\frac{f(\theta)}{u_{c}(\theta)}\theta\frac{\tau_{l}(\theta)}{1-\tau_{l}(\theta)}\right)=\left[\lambda-\frac{1}{u_{c}(\theta)}\right]f(\theta). \tag{A.16}$$

This is the unidimensional analog of our characterization in equation (24), where the multiplier $p$ on the resource constraint in Saez (2001) equals $\frac{1}{\lambda}$, the inverse of the multiplier $\lambda$ on the promise-keeping constraint, which follows directly from the Lagrangian (17).
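For completeness, the differentiation step from (A.15) to (A.16) is spelled out below. It assumes, as in (A.14), that the elasticity $\varepsilon$, and hence $\rho$, does not vary with $\theta$, and it uses $p=1/\lambda$.

```latex
\begin{aligned}
\frac{1}{\rho}\,\partial_{\theta}\!\left(\frac{f(\theta)}{u_{c}(\theta)}\,\theta\,\frac{\tau_{l}(\theta)}{1-\tau_{l}(\theta)}\right)
&= \partial_{\theta}\int_{\theta}^{\infty}\left[\frac{1}{u_{c}(x)}-\frac{1}{p}\right]f(x)\,\mathrm{d}x
&& \text{(differentiate (A.15))}\\
&= -\left[\frac{1}{u_{c}(\theta)}-\frac{1}{p}\right]f(\theta)
&& \text{(Leibniz rule, variable lower limit)}\\
&= \left[\lambda-\frac{1}{u_{c}(\theta)}\right]f(\theta)
&& \text{(using } p = 1/\lambda\text{)}.
\end{aligned}
```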

A.6 Global Optimal Tax Formula

In this appendix, we derive the global optimal taxation formula (24). For ease of presentation, we assume that the convex indirect utility functions are smooth, meaning the second derivative is well-defined and continuous.

First, we reformulate the planner problem using the definition of the indirect utility function (14). Given the indirect utility function uu, both consumption and effort allocations can be expressed in terms of uu and its gradient. The planner chooses an indirect utility function uu to minimize the resource cost of providing welfare:

$$\min_{u\in C}\int\left(\mathcal{C}\big(u(p)-\nabla u(p)\cdot p\big)+z(p)\left(\mathcal{X}\left(-\frac{\partial u(p)}{\partial p_{c}}\right)+\mathcal{X}\left(-\frac{\partial u(p)}{\partial p_{m}}\right)\right)\right)\pi(p)\,\text{d}p, \tag{A.17}$$

subject to the incentive constraint that requires the indirect utility to be convex and decreasing in worker type $p$ (Lemma 1), or $u\in C$, and the promise-keeping condition:

$$\int u(p)\pi(p)\,\text{d}p\geq\mathcal{U}. \tag{A.18}$$

We introduce the multiplier $\lambda\geq 0$ on the promise-keeping condition (A.18) in order to formulate the Lagrangian:

$$\min_{u\in C}\int\left(\mathcal{C}(u(p)-\nabla u(p)\cdot p)+z(p)\left(\mathcal{X}\left(-\frac{\partial u(p)}{\partial p_{c}}\right)+\mathcal{X}\left(-\frac{\partial u(p)}{\partial p_{m}}\right)\right)\right)\pi\,\text{d}p-\lambda\left(\int u(p)\pi\,\text{d}p-\mathcal{U}\right)$$

In the remainder of this appendix, we refer to the problem of minimizing the Lagrangian in shorthand as:

$$\min_{u\in C}J(u)=\min_{u\in C}\int_{P}L(p,u(p),\nabla u(p))\,\text{d}p, \tag{A.19}$$

where $L(p,u(p),\nabla u(p))$ is the contribution of worker type $p$ to the Lagrangian and $P$ denotes the type space, a bounded convex subset of $\mathbb{R}^{2}$.

A.6.1 Convex Indirect Utility

The restriction that the indirect utility function is convex is equivalent to the Hessian matrix of the indirect utility function being positive semidefinite, $H(u)\succeq 0$ for all $p\in P$, which in turn is equivalent to:

$$v^{T}H(u)v\geq 0, \tag{A.20}$$

for all $v\in\mathbb{R}^{2}$. We treat these inequalities as an infinite family of constraints parameterized by both $v$ and $p$. For each of these constraints, we introduce a corresponding multiplier $\lambda(v,p)\geq 0$. The objective function is augmented to include these multipliers:

$$\min_{u\in C}\;J(u)-\int_{P}\int_{\mathbb{R}^{2}}\lambda(v,p)\big(v^{T}H(u)v\big)\,\text{d}v\,\text{d}p. \tag{A.21}$$

Next, fix a worker type $p$ and consider the multipliers associated with this worker type. Using $\langle\cdot,\cdot\rangle$ to denote the Frobenius inner product of matrices, $\langle A,B\rangle=\sum_{i,j}A_{ij}B_{ij}$, we write:

$$\int_{\mathbb{R}^{2}}\lambda(v,p)\big(v^{T}H(u)v\big)\,\text{d}v=\Big\langle H(u),\int_{\mathbb{R}^{2}}\lambda(v,p)vv^{T}\,\text{d}v\Big\rangle. \tag{A.22}$$

This equation expresses the quadratic form as an inner product between the Hessian matrix $H(u)$ and the matrix $\int_{\mathbb{R}^{2}}\lambda(v,p)vv^{T}\,\text{d}v$, which we denote by $A$.

We next characterize all matrices that can be represented as a nonnegative combination of outer products $vv^{T}$ over the directions $v\in\mathbb{R}^{2}$:

$$A=\int_{\mathbb{R}^{2}}\lambda(v,p)vv^{T}\,\text{d}v \tag{A.23}$$

where $\lambda(v,p)\geq 0$ is a nonnegative function.

Lemma 8.

Matrix $A$ is symmetric and positive semidefinite if and only if $A=\int_{\mathbb{R}^{2}}\lambda(v,p)vv^{T}\,\text{d}v$ for some function $\lambda(v,p)\geq 0$.

Proof.

We first establish that $A=\int_{\mathbb{R}^{2}}\lambda(v,p)vv^{T}\,\text{d}v$ for some function $\lambda(v,p)\geq 0$ implies that $A$ is symmetric and positive semidefinite. First, matrix $A$ is symmetric because each outer product $vv^{T}$ is a symmetric matrix, and any linear combination or integral of symmetric matrices is symmetric. Second, $A$ is positive semidefinite because for any vector $w\in\mathbb{R}^{2}$, $w^{T}Aw=\int_{\mathbb{R}^{2}}\lambda(v,p)(w^{T}v)^{2}\,\text{d}v\geq 0$.

Next, we show that the converse also holds. We start with a symmetric positive semidefinite matrix and show that it can be written in the form of equation (A.23). Any symmetric matrix $A\in\mathbb{R}^{2\times 2}$ can be factorized using its eigenvalue decomposition:

$$A=Q\Lambda Q^{T},$$

where $Q\in\mathbb{R}^{2\times 2}$ is an orthogonal matrix whose columns are the eigenvectors of $A$, and $\Lambda$ is the diagonal matrix of eigenvalues. Since $A$ is positive semidefinite, the eigenvalues are nonnegative, $\lambda_{i}\geq 0$.

Since each eigenvalue $\lambda_{i}$ corresponds to an eigenvector $q_{i}$, we rewrite $A$ as a finite sum of outer products of the eigenvectors:

$$A=\sum_{i=1}^{2}\lambda_{i}q_{i}q_{i}^{T}.$$

This sum is an integral of the form (A.23) in which $\lambda(\cdot,p)$ places point masses $\lambda_{i}$ on the eigenvectors $q_{i}$ and is zero elsewhere. More generally, the eigenvectors $q_{i}$ can be replaced by vectors $v\in\mathbb{R}^{2}$, and the eigenvalues $\lambda_{i}$ by a nonnegative weighting function $\lambda(v,p)$. In sum, the positive semidefinite matrix $A$ can be expressed in the form:

$$A=\int_{\mathbb{R}^{2}}\lambda(v,p)vv^{T}\,\text{d}v,$$

where $\lambda(v,p)\geq 0$.∎
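Both directions of Lemma 8 can be checked numerically. A minimal NumPy sketch, assuming hypothetical random directions and weights for the first direction and a hypothetical rank-one matrix $B$ for the second. Because $B$ is singular, its representation uses weights concentrated on its eigenvectors rather than spread over all directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Direction 1: a nonnegative combination of outer products v v^T is symmetric PSD.
vs = rng.normal(size=(50, 2))              # hypothetical directions v_k
weights = rng.uniform(0.0, 1.0, size=50)   # hypothetical weights lambda(v_k) >= 0
A = np.einsum("k,ki,kj->ij", weights, vs, vs)
print(np.allclose(A, A.T), np.linalg.eigvalsh(A).min() >= -1e-12)  # True True

# Direction 2: a symmetric PSD matrix equals sum_i lambda_i q_i q_i^T over its eigenpairs.
B = np.array([[2.0, 1.0], [1.0, 0.5]])     # PSD and singular (eigenvalues 0 and 2.5)
eigvals, Q = np.linalg.eigh(B)             # columns of Q are eigenvectors
rebuilt = sum(lam * np.outer(Q[:, i], Q[:, i]) for i, lam in enumerate(eigvals))
print(np.allclose(rebuilt, B), eigvals.min() >= -1e-12)  # True True
```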

Instead of considering the continuous family of multipliers $\lambda(v,p)$, we represent the constraint that the indirect utility function has to be convex as a matrix condition by introducing the Kuhn-Tucker matrix $M(p)$ for each $p\in P$. By Lemma 8, the matrix $M(p)$ is required to be positive semidefinite, $M(p)\succeq 0$, for all $p\in P$. The Kuhn-Tucker matrix replaces the term $\int_{\mathbb{R}^{2}}\lambda(v,p)vv^{T}\,\text{d}v$, with $\lambda(v,p)\geq 0$, in the objective function (A.21):

$$\min_{u}\;J(u)-\int_{P}\big\langle H(u),M(p)\big\rangle\,\text{d}p, \tag{A.24}$$

where $M(p)$ is the positive semidefinite matrix that enforces the convexity of the indirect utility function.

We proceed by integrating by parts the term:

$$\int_{P}\big\langle H(u),M(p)\big\rangle\,\text{d}p=\int_{P}\sum_{i,j}\frac{\partial^{2}u(p)}{\partial p_{i}\partial p_{j}}M_{ij}(p)\,\text{d}p. \tag{A.25}$$

Through integration by parts, we shift the derivatives from the indirect utility function to the Kuhn-Tucker matrix to obtain: [Footnote 23: Since the boundary terms do not affect the derivation of the optimality condition and the general optimal taxation formula, we suppress them for ease of exposition.]

$$\int_{P}\sum_{i,j}\frac{\partial^{2}u(p)}{\partial p_{i}\partial p_{j}}M_{ij}(p)\,\text{d}p=\int_{P}\sum_{i,j}\frac{\partial^{2}M_{ij}(p)}{\partial p_{i}\partial p_{j}}u(p)\,\text{d}p=\int_{P}u(p)\Delta M(p)\,\text{d}p, \tag{A.26}$$

where $\Delta M(p)=\sum_{i,j}\frac{\partial^{2}M_{ij}(p)}{\partial p_{i}\partial p_{j}}$. The resulting objective function for the problem is:

$$\min_{u}\int_{P}L(p,u(p),\nabla u(p))\,\text{d}p-\int_{P}u(p)\Delta M(p)\,\text{d}p. \tag{A.27}$$

We can rewrite this by combining the promise-keeping constraint with the convexity correction $u(p)\Delta M(p)$ as:

$$\begin{aligned}
\min_{u\in C}\int&\left(\mathcal{C}(u(p)-\nabla u(p)\cdot p)+z(p)\left(\mathcal{X}\left(-\frac{\partial u(p)}{\partial p_{c}}\right)+\mathcal{X}\left(-\frac{\partial u(p)}{\partial p_{m}}\right)\right)\right)\pi\,\text{d}p\\
&-\lambda\left(\int\pi u(p)\left(1+\frac{\Delta M(p)}{\lambda\pi}\right)\text{d}p-\mathcal{U}\right)
\end{aligned}\tag{A.28}$$

This representation shows that the requirement that the indirect utility function is convex leads to the modified social welfare weight $1+\frac{\Delta M(p)}{\lambda\pi}=1+\frac{\sum_{i,j}\partial^{2}M_{ij}(p)/\partial p_{i}\partial p_{j}}{\lambda\pi}$. In other words, convexity enters the planning problem by modifying the welfare function through the convexity correction. We next show how this result carries over to the optimality conditions.
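The integration by parts in (A.26) can be checked numerically when the boundary terms vanish. A minimal sketch, assuming a hypothetical smooth $u(p)=e^{-p_1}+e^{-p_2}+p_1p_2/2$ on $P=[0,1]^2$ and a hypothetical Kuhn-Tucker matrix $M(p)=g(p_1)g(p_2)B$ with $g(s)=s^2(1-s)^2$ and a constant positive semidefinite $B$. With this choice, $M$ and its first derivatives vanish on the boundary of $P$. Both sides of (A.26) are computed with the midpoint rule, so they agree up to quadrature error.

```python
import numpy as np

def u(p1, p2):
    """Hypothetical smooth indirect utility on P = [0, 1]^2."""
    return np.exp(-p1) + np.exp(-p2) + 0.5 * p1 * p2

def g(s):
    """Vanishes together with its first derivative at s = 0 and s = 1."""
    return s**2 * (1 - s) ** 2

def dg(s):
    return 2 * s - 6 * s**2 + 4 * s**3

def d2g(s):
    return 2 - 12 * s + 12 * s**2

B = np.array([[1.0, 0.3], [0.3, 0.5]])  # hypothetical constant PSD factor of M(p)

N = 600
s = (np.arange(N) + 0.5) / N            # midpoints of a uniform grid on [0, 1]
P1, P2 = np.meshgrid(s, s, indexing="ij")
b = g(P1) * g(P2)

# Left side of (A.26): int <H(u), M> with u_11 = e^{-p1}, u_12 = 1/2, u_22 = e^{-p2}.
lhs = np.mean((np.exp(-P1) * B[0, 0] + 2 * 0.5 * B[0, 1] + np.exp(-P2) * B[1, 1]) * b)

# Right side of (A.26): int u * Delta M, where Delta M = sum_ij d^2 M_ij / dp_i dp_j.
delta_M = (B[0, 0] * d2g(P1) * g(P2)
           + 2 * B[0, 1] * dg(P1) * dg(P2)
           + B[1, 1] * g(P1) * d2g(P2))
rhs = np.mean(u(P1, P2) * delta_M)
print(lhs, rhs)
```

If $M$ did not vanish on the boundary, the two sides would differ by the boundary terms that footnote 23 suppresses.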

A.6.2 Optimality Conditions

We derive the optimality conditions by using the objective function (A.27) and considering a small variation in the indirect utility function. The first variation of the objective gives:

$$\frac{\partial L}{\partial u}-\sum_{k=1}^{2}\frac{\partial}{\partial p_{k}}\left(\frac{\partial L}{\partial u_{k}}\right)=\Delta M(p), \tag{A.29}$$

where $u_{k}=\frac{\partial u}{\partial p_{k}}$. The left-hand side is the standard Euler-Lagrange expression, $\frac{\partial L}{\partial u}-\sum_{k}\frac{\partial}{\partial p_{k}}\big(\frac{\partial L}{\partial u_{k}}\big)$. When the convexity constraint does not bind, so that $M$ vanishes in a neighborhood of $p$, this yields the optimality condition without bunching, $\frac{\partial L}{\partial u}-\sum_{k}\frac{\partial}{\partial p_{k}}\big(\frac{\partial L}{\partial u_{k}}\big)=0$.

The right-hand side gives the additional term involving the second derivatives of the Kuhn-Tucker matrix. This term arises from the integration by parts of the Kuhn-Tucker matrix and represents the effect of the convexity constraint, which is enforced through the positive semidefinite matrix $M$.

Our derivation shows that the minimizer of our variational problem over convex indirect utility functions satisfies an optimality condition with an additional term arising from the Kuhn-Tucker multipliers associated with the convexity constraint. Our results build on Lions (1998), which analyzes variational problems over convex functions through a duality approach. Lions (1998) shows that the optimality conditions can be understood in terms of the polar cone of convex functions, where elements of the dual space are represented by measures linked to second derivatives. Our result explicitly incorporates Kuhn-Tucker multipliers into the variational problem to enforce the convexity constraint and includes them in the optimality condition. This construction makes the effect of convexity explicit by showing how additional measure terms arise in the optimality conditions.

A.6.3 Optimal Tax Formula

We next apply the optimality condition for the general Lagrangian (A.29) to the Lagrangian of the multidimensional optimal taxation problem (A.17). As a result, we write the optimal taxation formula as:

$$\partial_{p_{c}}\big(\pi(p_{c}\mathcal{C}^{\prime}(c)+z\mathcal{X}^{\prime}(x_{c}))\big)+\partial_{p_{m}}\big(\pi(p_{m}\mathcal{C}^{\prime}(c)+z\mathcal{X}^{\prime}(x_{m}))\big)=\pi(\lambda-\mathcal{C}^{\prime}(c))+\Delta M(p). \tag{A.30}$$

Similar to our reformulation of the general optimal tax formula as stochastic dominance (22), we use the definition of the labor skill wedge (16) to rewrite the optimal taxation formula as:

$$\partial_{p_{c}}\Big(\frac{\pi}{u^{\prime}(\mathcal{C}(c))}p_{c}\frac{\tau_{c}}{1-\tau_{c}}\Big)+\partial_{p_{m}}\Big(\frac{\pi}{u^{\prime}(\mathcal{C}(c))}p_{m}\frac{\tau_{m}}{1-\tau_{m}}\Big)=\pi\left(\frac{1}{u^{\prime}(\mathcal{C}(c))}-\lambda\right)-\Delta M(p), \tag{A.31}$$

which is the optimal taxation formula (24).

A.7 Corollary 3

We start with the region of strong convexity of the indirect utility function $u$ and, hence, a region without bunching. To analyze properties of optimal tax distortions, we use a perturbation function. Specifically, we construct a variation of the indirect utility function for a specific worker $p$. Consider a worker $p$ in the interior of the type space such that both the assignment function $z$ and the distribution of worker types $\pi$ are differentiable in a neighborhood around this worker. Moreover, suppose that the strongly convex utility function $u$ is twice continuously differentiable within a neighborhood of the worker $p$.

Consider an arbitrary perturbation of the indirect utility $u$, denoted $\hat{u}=u+\varepsilon V$, where $V$ is a bump function concentrated in a small ball around $p$ that lies within this neighborhood, and $\varepsilon$ is small. The perturbation function $u+\varepsilon V$ is convex within the support of the bump function for small enough values of $\varepsilon$, $|\varepsilon|<\bar{\varepsilon}$. Intuitively, if the underlying utility function is strongly convex, a small enough additive perturbation preserves convexity. (Footnote 24: The proof of this statement is presented below; see Convex Perturbation Function.)

The perturbation function is convex, positive and non-increasing, and therefore implementable (19). Since the implementability condition (19) is linearly separable and holds with equality for an optimal utility function by Proposition 2, the implementability condition also has to be satisfied for $\varepsilon V$ for all $|\varepsilon|<\bar{\varepsilon}$. Since $\varepsilon$ can take either positive or negative values, the implementability condition holds with equality with respect to the bump function $V$:

$$\int\Big(\mathcal{C}'(c)\big(V-\nabla V\cdot p\big)-z\mathcal{X}'(x)\cdot\nabla V\Big)\pi\,\mathrm{d}p=\lambda\int V\pi\,\mathrm{d}p.\tag{A.32}$$

Integrating the left-hand side of this equation by parts and letting the bump function $V$ tend to the Dirac delta function, we obtain the optimality condition in Corollary 3.
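For concreteness, one standard choice of bump function is the rescaled mollifier below. This is an illustrative assumption, since the text does not specify $V$. It is smooth, supported in the ball of radius $h$ around $p$, integrates to one, and converges to the Dirac delta at $p$ as $h\to0$, which is the limit used above.

```latex
\eta(y)=\begin{cases}C_\eta\exp\!\Big(-\dfrac{1}{1-|y|^{2}}\Big), & |y|<1,\\[4pt] 0, & |y|\geq 1,\end{cases}
\qquad
V_h(q)=\frac{1}{h^{2}}\,\eta\Big(\frac{q-p}{h}\Big),
\qquad
\int_{\mathbb{R}^2}V_h(q)\,\mathrm{d}q=1,
```

Here $C_\eta>0$ normalizes $\int\eta=1$, and the factor $h^{-2}$ reflects the two-dimensional (cognitive, manual) type space.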


Convex Perturbation Function. We establish that the perturbation function is convex. We suppose that the indirect utility function $u$ is strongly convex for interior worker type $p$ and twice continuously differentiable within its neighborhood. Specifically, we suppose that $H(u)-\alpha_I I$ is positive semidefinite for worker $p$ for some $\alpha_I>0$, where $H$ denotes the Hessian matrix and $I$ denotes the identity matrix.

Since worker $p$ is in the interior of the type space, the indirect utility function is strictly positive and strictly decreasing for worker $p$. By contradiction, suppose the indirect utility function equals zero for worker $p$, $u(p)=0$. Since the indirect utility function is non-negative and non-increasing, $u(p+\varepsilon)=0$ for small enough $\varepsilon\geq 0$, implying that the gradient of the indirect utility function for worker $p$ is equal to zero, $\nabla u(p)=0$. In particular, the partial derivative of the indirect utility function with respect to the cognitive type $p_c$ equals zero, $\frac{\partial}{\partial p_c}u(p)=0$. Since $u$ is convex, this partial derivative is non-decreasing in $p_c$, and since $u$ is non-increasing, it is also non-positive, so that $\frac{\partial}{\partial p_c}u(p_c+\varepsilon_c,p_m)=0$ for all $\varepsilon_c\geq 0$, or $\frac{\partial^2}{\partial p_c^2}u(p)=0$. It hence follows that $H_{cc}(u)=0$, and hence that $H_{cc}(u)-\alpha_I<0$ for $\alpha_I>0$, which contradicts that $H(u)-\alpha_I I$ is positive semidefinite by the Sylvester criterion. We conclude that the utility function is strictly positive and strictly decreasing for interior worker $p$.

Since the indirect utility function $u$ is strongly convex for worker type $p$ and twice continuously differentiable within its neighborhood, the utility function is strongly convex in this neighborhood. Specifically, since $H(u)-\alpha_I I$ is positive semidefinite at worker type $p$ and the Hessian is continuous, $H(u)-\frac{\alpha_I}{2}I$ is positive semidefinite in a neighborhood around $p$. Hence, the utility function $u$ is indeed strongly convex in this neighborhood.

We consider a perturbation of the indirect utility $u$ denoted by $u+\varepsilon V$, where $V$ is a bump function concentrated in a small ball around $p$ that lies within this neighborhood, and $\varepsilon$ is small. The perturbation function $u+\varepsilon V$ is convex within the support of the bump function for small enough values of $\varepsilon$, $|\varepsilon|<\bar{\varepsilon}$.

While intuitive, we prove in two steps that $u+\varepsilon V$ is convex within the support of the bump function for small enough values of $\varepsilon$. First, we observe that for some $\beta>0$, $H(V)-\beta I$ is negative semidefinite and $H(V)+\beta I$ is positive semidefinite. In the former case, negative semidefiniteness is equivalent to $x_c^2V_{cc}+2x_cx_mV_{cm}+x_m^2V_{mm}\leq\beta(x_c^2+x_m^2)$ for any $(x_c,x_m)$. To see that such a $\beta$ exists, we first observe $x_c^2V_{cc}+2x_cx_mV_{cm}+x_m^2V_{mm}\leq|x_c|^2|V_{cc}|+2|x_c||x_m||V_{cm}|+|x_m|^2|V_{mm}|$. Furthermore, we use that $2|x_c||x_m|\leq|x_c|^2+|x_m|^2$ to write $x_c^2V_{cc}+2x_cx_mV_{cm}+x_m^2V_{mm}\leq x_c^2(|V_{cc}|+|V_{cm}|)+x_m^2(|V_{cm}|+|V_{mm}|)$. Since $V$ is smooth with compact support, its second derivatives are bounded, so we can take $\beta$ to be any positive number at least as large as the supremum of $\max(|V_{cc}|+|V_{cm}|,|V_{cm}|+|V_{mm}|)$ over the support. Then $H(V)-\beta I$ is indeed negative semidefinite, and through a similar argument $H(V)+\beta I$ is positive semidefinite. Given $\beta>0$, it holds that $\varepsilon(H(V)+\beta I)=\varepsilon H(V)+|\varepsilon|\beta I$ is positive semidefinite for positive $\varepsilon$, and that $\varepsilon(H(V)-\beta I)=\varepsilon H(V)+|\varepsilon|\beta I$ is positive semidefinite for negative $\varepsilon$.

Second, we note that the Hessian matrix of the perturbation function is additively separable, $H(u+\varepsilon V)=H(u)+\varepsilon H(V)$. Since the matrix $H(u)-\frac{\alpha_I}{2}I$ is positive semidefinite in the neighborhood, so is the matrix $H(u+\varepsilon V)-\frac{\alpha_I}{2}I-\varepsilon H(V)$. Finally, since the sum of positive semidefinite matrices is itself positive semidefinite, adding $\varepsilon H(V)+|\varepsilon|\beta I$ shows that $H(u+\varepsilon V)-\big(\frac{\alpha_I}{2}-|\varepsilon|\beta\big)I$ is positive semidefinite. For $|\varepsilon|<\bar{\varepsilon}:=\frac{\alpha_I}{2\beta}$ the coefficient $\frac{\alpha_I}{2}-|\varepsilon|\beta$ is positive, which confirms that the perturbation function is indeed convex. Following analogous reasoning, the indirect utility function $u$ is also decreasing and positive in a neighborhood around worker $p$, so that $u+\varepsilon V$ remains positive and non-increasing for small enough $\varepsilon$.
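The two matrix bounds used above, that the spectrum of $H(V)$ lies in $[-\beta,\beta]$ under the row-sum choice of $\beta$, and that $H(u)+\varepsilon H(V)$ keeps its eigenvalues at least $\frac{\alpha_I}{2}-|\varepsilon|\beta$, can be checked numerically on random symmetric $2\times2$ matrices. This is a minimal sketch that assumes a hypothetical Hessian for $u$ with $H(u)-\frac{\alpha_I}{2}I$ positive semidefinite; it checks pointwise matrices and does not construct a particular bump function.

```python
import math
import random

def eig_sym2(a, b, d):
    """Eigenvalues (ascending) of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mean - rad, mean + rad

def row_sum_bound(v_cc, v_cm, v_mm):
    """beta = max(|V_cc| + |V_cm|, |V_cm| + |V_mm|), the bound used in the text."""
    return max(abs(v_cc) + abs(v_cm), abs(v_cm) + abs(v_mm))

def min_eig_perturbed(h_u, h_v, eps):
    """Smallest eigenvalue of H(u) + eps * H(V); Hessians are passed as (cc, cm, mm)."""
    return eig_sym2(*(hu + eps * hv for hu, hv in zip(h_u, h_v)))[0]

def check_bound(h_u, alpha_I, trials=1000, seed=0):
    """On random symmetric H(V), check that spec H(V) lies in [-beta, beta] and that
    H(u) + eps H(V) has eigenvalues >= alpha_I/2 - |eps| beta for |eps| < alpha_I/(2 beta).
    Assumes (hypothetically) that H(u) - (alpha_I/2) I is positive semidefinite."""
    rng = random.Random(seed)
    for _ in range(trials):
        h_v = tuple(rng.uniform(-5.0, 5.0) for _ in range(3))
        beta = row_sum_bound(*h_v)
        lo, hi = eig_sym2(*h_v)
        if lo < -beta - 1e-12 or hi > beta + 1e-12:
            return False
        eps_bar = alpha_I / (2.0 * beta)
        for eps in (-0.99 * eps_bar, 0.99 * eps_bar):
            if min_eig_perturbed(h_u, h_v, eps) < alpha_I / 2 - abs(eps) * beta - 1e-12:
                return False
    return True
```

For example, `check_bound((1.0, 0.2, 0.8), alpha_I=0.5)` uses a hypothetical $H(u)$ whose smallest eigenvalue exceeds $\alpha_I/2$; the threshold $\bar{\varepsilon}=\alpha_I/(2\beta)$ inside the check is the one implied by the argument above.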


Changing Coordinates. To connect our expression to the existing literature, we transform the optimal tax formula into the original type coordinates $\alpha$. We illustrate this transformation by focusing on the partial derivative with respect to cognitive skill in (26),

$$\partial_{p_c}\Big(\frac{\pi}{u'(\mathcal{C}(c))}p_c\frac{\tau_c}{1-\tau_c}\Big),\tag{A.33}$$

where $\pi$ is the probability density in the transformed worker space $p$. To convert this term into the original worker space, we first recall the change of coordinates $p_s=\kappa\alpha_s^{-\rho}$, or equivalently $\alpha_s=(\kappa/p_s)^{1/\rho}$, implying that $\mathrm{d}\alpha_s=-\frac{1}{\rho}\frac{\alpha_s}{p_s}\mathrm{d}p_s=-\frac{1}{\kappa\rho}\alpha_s^{\rho+1}\mathrm{d}p_s$.

Next, we make explicit the relationship between the density $\phi$ of worker types in the original type space $\alpha$ and the density $\pi$ of worker types in the transformed coordinates $p$:

$$\phi(\alpha)\,\mathrm{d}\alpha_c\,\mathrm{d}\alpha_m=\phi(\alpha)\frac{\alpha_c^{\rho+1}\alpha_m^{\rho+1}}{(\kappa\rho)^2}\,\mathrm{d}p_c\,\mathrm{d}p_m=\pi(p)\,\mathrm{d}p_c\,\mathrm{d}p_m,$$

where the density $\pi(p):=\phi(\alpha)\alpha_c^{\rho+1}\alpha_m^{\rho+1}/(\kappa\rho)^2$ uses the absolute value of the Jacobian of the change of coordinates. As a result, we express (A.33) as:
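The Jacobian factor and the mass-preserving property of this change of coordinates can be checked numerically. This sketch works one dimension at a time, as the separable map $p_s=\kappa\alpha_s^{-\rho}$ allows, and assumes hypothetical values $\kappa=2$, $\rho=1.5$ and a hypothetical density $\phi(\alpha)=e^{-\alpha}$; none of these is the paper's calibration.

```python
import math

def alpha_of_p(p, kappa, rho):
    """Invert p = kappa * alpha**(-rho), giving alpha = (kappa / p)**(1 / rho)."""
    return (kappa / p) ** (1.0 / rho)

def jacobian(alpha, kappa, rho):
    """|d alpha / d p| = alpha**(rho + 1) / (kappa * rho), as derived in the text."""
    return alpha ** (rho + 1) / (kappa * rho)

def jacobian_fd(p, kappa, rho, h=1e-6):
    """Central finite-difference estimate of |d alpha / d p| for comparison."""
    return abs(alpha_of_p(p + h, kappa, rho) - alpha_of_p(p - h, kappa, rho)) / (2 * h)

def pi_from_phi(phi_value, alpha_c, alpha_m, kappa, rho):
    """2-D density in p-coordinates: phi * alpha_c^(rho+1) * alpha_m^(rho+1) / (kappa*rho)^2."""
    return phi_value * (alpha_c * alpha_m) ** (rho + 1) / (kappa * rho) ** 2

def mass_check(alpha_lo, alpha_hi, kappa, rho, n=20000):
    """Mass of [alpha_lo, alpha_hi] under the hypothetical phi(a) = exp(-a), computed
    (i) by a midpoint rule over the matching p-interval and (ii) in closed form."""
    p_lo, p_hi = kappa * alpha_hi ** (-rho), kappa * alpha_lo ** (-rho)  # map reverses order
    h = (p_hi - p_lo) / n
    mass_p = 0.0
    for k in range(n):
        a = alpha_of_p(p_lo + (k + 0.5) * h, kappa, rho)
        mass_p += math.exp(-a) * jacobian(a, kappa, rho) * h
    return mass_p, math.exp(-alpha_lo) - math.exp(-alpha_hi)
```

The two-dimensional factor in `pi_from_phi` is the product of the one-dimensional Jacobians, which is why checking each skill dimension separately suffices for this separable map.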

$$\partial_{p_c}\Big(\frac{\pi}{u'(c(\alpha))}\frac{\tau_c}{1-\tau_c}\kappa\alpha_c^{-\rho}\Big)=\partial_{p_c}\Big(\frac{\phi}{u'(c(\alpha))}\frac{\tau_c}{1-\tau_c}\frac{\alpha_c\alpha_m^{\rho+1}}{\kappa\rho^2}\Big).\tag{A.34}$$

Next, by the chain rule, $\frac{\partial f}{\partial p_c}=\frac{\partial f}{\partial\alpha_c}\frac{\partial\alpha_c}{\partial p_c}$ for a generic differentiable function $f$, which gives:

$$\partial_{p_c}\Big(\frac{\pi}{u'(c(\alpha))}\frac{\tau_c}{1-\tau_c}\kappa\alpha_c^{-\rho}\Big)=\partial_{\alpha_c}\Big(\frac{\phi}{u'(c(\alpha))}\frac{\tau_c}{1-\tau_c}\frac{\alpha_c\alpha_m^{\rho+1}}{\kappa\rho^2}\Big)\frac{\partial\alpha_c}{\partial p_c}=-\frac{\alpha_c^{\rho+1}\alpha_m^{\rho+1}}{\kappa^2\rho^3}\,\partial_{\alpha_c}\Big(\frac{\phi}{u'(c(\alpha))}\frac{\tau_c}{1-\tau_c}\alpha_c\Big).$$

The derivation for the manual skill term is symmetric, which allows us to summarize the previous two expressions for both tasks as:

$$\partial_{p_s}\Big(\frac{\pi}{u'(c(\alpha))}\frac{\tau_s}{1-\tau_s}\kappa\alpha_s^{-\rho}\Big)=-\frac{\alpha_c^{\rho+1}\alpha_m^{\rho+1}}{\kappa^2\rho^3}\,\partial_{\alpha_s}\Big(\frac{\phi}{u'(c(\alpha))}\frac{\tau_s}{1-\tau_s}\alpha_s\Big).\tag{A.35}$$

Finally, using the relation between the densities, we rewrite the term $\pi\big(\frac{1}{u'(\mathcal{C}(c))}-\lambda\big)$ in equation (26) as

$$\pi\Big(\frac{1}{u'(\mathcal{C}(c))}-\lambda\Big)=-\phi(\alpha)\frac{\alpha_c^{\rho+1}\alpha_m^{\rho+1}}{\kappa^2\rho^2}\Big(\lambda-\frac{1}{u'(c(\alpha))}\Big)\tag{A.36}$$

Combining equation (26) in the worker type space $p$ with (A.35) and (A.36), we obtain (27) in the worker type space $\alpha$.
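A sketch of the remaining algebra follows. It assumes that (26) coincides with (A.31) without the $\Delta M(p)$ term, and the exact presentation of (27) in the main text may differ. Writing $A(\alpha):=\alpha_c^{\rho+1}\alpha_m^{\rho+1}$ (a shorthand introduced only here), substituting (A.35) for $s\in\{c,m\}$ and (A.36) gives the first line; multiplying both sides by $-\kappa^2\rho^2/A(\alpha)$ gives the second.

```latex
-\frac{A(\alpha)}{\kappa^2\rho^3}\sum_{s\in\{c,m\}}\partial_{\alpha_s}\Big(\frac{\phi}{u'(c(\alpha))}\frac{\tau_s}{1-\tau_s}\alpha_s\Big)
  =-\phi(\alpha)\frac{A(\alpha)}{\kappa^2\rho^2}\Big(\lambda-\frac{1}{u'(c(\alpha))}\Big),
\\[6pt]
\frac{1}{\rho}\sum_{s\in\{c,m\}}\partial_{\alpha_s}\Big(\frac{\phi}{u'(c(\alpha))}\frac{\tau_s}{1-\tau_s}\alpha_s\Big)
  =\phi(\alpha)\Big(\lambda-\frac{1}{u'(c(\alpha))}\Big).
```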

A.8 Proposition 4

By Corollary 3, if the optimality condition does not hold for worker $p$, the Hessian matrix is degenerate for worker $p$. We next show that the Hessian matrix $H(u)$ is also degenerate for all workers within a neighborhood of $p$. By contradiction, suppose that in every neighborhood of $p$ we can find a worker $\hat{p}$ whose Hessian is non-degenerate, or equivalently, has full rank. By Corollary 3, the optimality condition holds for worker $\hat{p}$. We can thus construct a sequence of such workers $\{\hat{p}_n\}$ that converges to $p$. Since the optimality equation is continuous in $p$, taking the limit along this sequence shows that the optimality equation also holds for worker $p$, which is a contradiction.

A.9 Planner Duality

We prove duality between our cost minimization problem and a welfare maximization problem. The welfare maximization problem is to choose an allocation $(c,x)$ to maximize utilitarian welfare:

$$\int\big(c-p_cx_c-p_mx_m\big)\pi\,\mathrm{d}p,\tag{A.37}$$

subject to the incentive constraints (12) and the linear resource constraint:

$$\int\Big(\mathcal{C}(c)+z(p)\big(\mathcal{X}(x_c)+\mathcal{X}(x_m)\big)\Big)\pi\,\mathrm{d}p\leq R,\tag{A.38}$$

for some exogenous level of government resources $R$.

Proposition 5.

Let $(c,x)$ solve the cost minimization problem associated with the maximum welfare level $\overline{\mathcal{U}}$ for which the minimum resource cost is less than government resources $R$. Then allocation $(c,x)$ solves the welfare maximization problem given government resources $R$.

Conversely, if allocation $(c,x)$ solves the welfare maximization problem for resources $R$ and induces welfare $\overline{\mathcal{U}}$, then $(c,x)$ solves the cost minimization problem for $\mathcal{U}=\overline{\mathcal{U}}$.

Proof.

First, we establish that the welfare attained by the cost minimization problem and the welfare maximization problem are identical. Consider the solution to the cost minimization problem with maximum welfare level $\overline{\mathcal{U}}$ such that the resource cost is below resource level $R$. Allocation $(c,x)$ satisfies both the incentive constraints and the resource constraint of the welfare maximization problem and is thus a feasible solution to the welfare maximization problem. Maximum welfare in the welfare maximization problem is therefore at least $\overline{\mathcal{U}}$.

Conversely, let $\overline{\overline{\mathcal{U}}}$ denote maximum welfare in the welfare maximization problem, and consider the allocation $(c,x)$ that solves it. The allocation $(c,x)$ satisfies both the incentive constraints and the promise keeping constraint of the cost minimization problem for $\mathcal{U}=\overline{\overline{\mathcal{U}}}$. Further, the associated resource cost is below resource level $R$. Hence, $\overline{\mathcal{U}}\geq\overline{\overline{\mathcal{U}}}$, implying that welfare is identical for the two problems, $\overline{\mathcal{U}}=\overline{\overline{\mathcal{U}}}$.

Second, we show duality of allocations. Suppose allocation $(c,x)$ solves the cost minimization problem with maximum welfare level $\mathcal{U}$ such that the cost is below resources $R$, but the allocation does not solve the welfare maximization problem. Then there is an alternative allocation $(\hat{c},\hat{x})$ that solves the welfare maximization problem, is feasible, and attains strictly greater welfare. This implies that there exists a welfare level $\hat{\mathcal{U}}>\mathcal{U}$ such that $(\hat{c},\hat{x})$ has a cost below resources $R$, contradicting that $\mathcal{U}$ is the maximum welfare level for which the minimum resource cost is below $R$.

Conversely, suppose allocation $(\hat{c},\hat{x})$ solves the welfare maximization problem given resources $R$, inducing welfare $\hat{\mathcal{U}}$, but does not solve the cost minimization problem. Then there exists an alternative allocation $(c,x)$ that solves the cost minimization problem for a welfare level $\mathcal{U}>\hat{\mathcal{U}}$ such that the minimum cost is below resources $R$. Allocation $(c,x)$ is feasible and attains strictly greater welfare, contradicting that $(\hat{c},\hat{x})$ solves the welfare maximization problem.∎

A.10 Transformed Planner Problem

In this appendix, we analyze the planner problem of choosing an allocation $(c,x)$ to minimize the resource cost of providing welfare, as in Section 3.2. Using the Legendre transforms (29) and (30) to linearize the resource costs, the planning problem is equivalent to:

$$\min_{c,x}\;\max_{\varphi,\psi}\;\int\Big(\big(\varphi(p)c(p)-\mathcal{C}^*(\varphi(p))\big)+z(p)\sum_s\big(\psi_s(p)x_s(p)-\mathcal{X}^*(\psi_s(p))\big)\Big)\pi(p)\,\mathrm{d}p\tag{A.39}$$

subject to the set of linear irreducible incentive constraints (12) and the promise keeping constraint (13).

To develop properties of the solution, we formulate a Lagrangian, where $\lambda$ is the multiplier on the promise keeping constraint:

$$\mathcal{L}(c,x,\varphi,\psi)=\int\Big(\big(\varphi c-\mathcal{C}^*(\varphi)\big)+z\sum_s\big(\psi_sx_s-\mathcal{X}^*(\psi_s)\big)-\lambda\Big(c-\sum_s p_sx_s-\mathcal{U}\Big)\Big)\pi\,\mathrm{d}p.\tag{A.40}$$

The Lagrangian is a continuous convex-concave function. Since the Legendre transform of a convex function is itself convex, the Lagrangian is concave in the distortions $(\varphi,\psi)$ holding constant the allocations $(c,x)$, and convex (indeed linear) in the allocations holding constant the distortions. Further, since the set $\mathcal{I}$ of allocations that satisfy the incentive constraints (12) is convex, we can apply the minimax theorem. We use the minimax relationship $\min_{c,x\in\mathcal{I}}\max_{\varphi\geq0,\psi\leq0}\mathcal{L}(c,x,\varphi,\psi)=\max_{\varphi\geq0,\psi\leq0}\min_{c,x\in\mathcal{I}}\mathcal{L}(c,x,\varphi,\psi)$ to establish Lemma 9.


Lemma 9.

For every incentive compatible allocation $(c,x)\in\mathcal{I}$, stochastic dominance has to be satisfied:

$$\int\Big(\varphi c+\sum_s\psi_sx_s\Big)\pi\,\mathrm{d}p\geq\lambda\int\Big(c-\sum_s p_sx_s\Big)\pi\,\mathrm{d}p.\tag{A.41}$$

The result follows by analyzing $\max_{\varphi\geq0,\psi\leq0}\min_{c,x\in\mathcal{I}}\mathcal{L}(c,x,\varphi,\psi)$. By contradiction, suppose instead that $\int(\varphi c+\sum_s\psi_sx_s)\pi\,\mathrm{d}p<\lambda\int(c-\sum_s p_sx_s)\pi\,\mathrm{d}p$ for some incentive compatible allocation $(c,x)$. Consider scaling the allocation $(c,x)$ by a constant factor $\zeta>1$. Since the incentive constraints are linear, the scaled allocation $\zeta(c,x)$ is also incentive compatible. Letting $\zeta\rightarrow\infty$, the inner minimization diverges to negative infinity, which cannot occur at the optimal distortions. At the solution to the planning problem, the stochastic dominance condition (A.41) holds with equality.
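The scaling step can be written out explicitly. Here $D$ is a shorthand introduced only for this illustration: for fixed $(\varphi,\psi)$ it collects the allocation-dependent terms of the Lagrangian, namely the first four terms in (A.47), and it is homogeneous of degree one in the allocation.

```latex
D(c,x):=\int\Big(\varphi c+\sum_s\psi_s x_s\Big)\pi\,\mathrm{d}p-\lambda\int\Big(c-\sum_s p_s x_s\Big)\pi\,\mathrm{d}p,
\qquad
D(\zeta c,\zeta x)=\zeta\,D(c,x)\quad(\zeta>0),
\\[6pt]
D(c,x)<0\ \text{ and }\ \zeta(c,x)\in\mathcal{I}\ \Longrightarrow\ \lim_{\zeta\to\infty}D(\zeta c,\zeta x)=-\infty .
```

The remaining terms, $-\int\big(\mathcal{C}^*(\varphi)+\sum_s\mathcal{X}^*(\psi_s)\big)\pi\,\mathrm{d}p$ and the term in $\mathcal{U}$, do not depend on $\zeta$, so the inner minimum is unbounded below whenever (A.41) fails.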

Proposition 6.

Let $(c,x,\varphi,\psi)$ solve the planning problem. Then the stochastic dominance condition holds with equality at the optimum:

$$\int\Big(\varphi c+\sum_s\psi_sx_s\Big)\pi\,\mathrm{d}p=\lambda\int\Big(c-\sum_s p_sx_s\Big)\pi\,\mathrm{d}p.\tag{A.42}$$
Proof.

To establish the result, we consider two auxiliary problems. First, define the maximization problem:

$$\max_{\varphi,\psi}\;\underline{\mathcal{L}}(\varphi,\psi,\lambda),\tag{A.43}$$

where $\underline{\mathcal{L}}(\varphi,\psi,\lambda):=\min_{c,x}\mathcal{L}(c,x,\varphi,\psi,\lambda)$. Let $(\varphi^*,\psi^*)$ be a solution to this problem. Similarly, we define a minimization problem:

$$\min_{c,x}\;\bar{\mathcal{L}}(c,x),\tag{A.44}$$

where $\bar{\mathcal{L}}(c,x):=\max_{\varphi,\psi}\mathcal{L}(c,x,\varphi,\psi,\lambda)$, and let $(c^*,x^*)$ be a minimizer of this problem.


Claim 1.

We show that for the Lagrangian (A.40) evaluated at the optimum it holds that:

$$\mathcal{L}(c^*,x^*,\varphi^*,\psi^*,\lambda)=\min_{c,x}\;\max_{\varphi,\psi}\;\mathcal{L}(c,x,\varphi,\psi,\lambda)=\max_{\varphi,\psi}\;\min_{c,x}\;\mathcal{L}(c,x,\varphi,\psi,\lambda)\tag{A.45}$$
Proof.

Necessarily, $\mathcal{L}(c^*,x^*,\varphi^*,\psi^*,\lambda)\geq\min_{c,x}\mathcal{L}(c,x,\varphi^*,\psi^*,\lambda)=\underline{\mathcal{L}}(\varphi^*,\psi^*,\lambda)$. Since $(\varphi^*,\psi^*)$ solves the maximization problem (A.43), $\underline{\mathcal{L}}(\varphi^*,\psi^*,\lambda)=\max_{\varphi,\psi}\underline{\mathcal{L}}(\varphi,\psi,\lambda)=\max_{\varphi,\psi}\min_{c,x}\mathcal{L}(c,x,\varphi,\psi,\lambda)$, and thus $\mathcal{L}(c^*,x^*,\varphi^*,\psi^*,\lambda)\geq\max_{\varphi,\psi}\min_{c,x}\mathcal{L}(c,x,\varphi,\psi,\lambda)$.

Similarly, it necessarily holds that $\mathcal{L}(c^*,x^*,\varphi^*,\psi^*,\lambda)\leq\max_{\varphi,\psi}\mathcal{L}(c^*,x^*,\varphi,\psi,\lambda)=\bar{\mathcal{L}}(c^*,x^*)$. Since the allocation $(c^*,x^*)$ solves the minimization problem (A.44), $\bar{\mathcal{L}}(c^*,x^*)=\min_{c,x}\bar{\mathcal{L}}(c,x)=\min_{c,x}\max_{\varphi,\psi}\mathcal{L}(c,x,\varphi,\psi,\lambda)$. Combining the previous two statements, we conclude $\mathcal{L}(c^*,x^*,\varphi^*,\psi^*,\lambda)\leq\min_{c,x}\max_{\varphi,\psi}\mathcal{L}(c,x,\varphi,\psi,\lambda)$, and hence:

$$\min_{c,x}\max_{\varphi,\psi}\;\mathcal{L}(c,x,\varphi,\psi,\lambda)\geq\mathcal{L}(c^*,x^*,\varphi^*,\psi^*,\lambda)\geq\max_{\varphi,\psi}\min_{c,x}\;\mathcal{L}(c,x,\varphi,\psi,\lambda).\tag{A.46}$$

By the minimax theorem, the two outer terms in (A.46) are equal, so (A.45) holds.∎


Optimality Conditions. We obtain optimality conditions by analyzing the planner problem using $\mathcal{L}(c^*,x^*,\varphi^*,\psi^*,\lambda)=\max\,\min\,\mathcal{L}(c,x,\varphi,\psi,\lambda)$ from Claim 1. Reorganizing terms gives:

$$\max_{\varphi,\psi}\;\min_{c,x\in\mathcal{I}}\;\int\Big(\varphi c+\sum_s\psi_sx_s-\lambda\Big(c-\sum_s p_sx_s\Big)-\mathcal{C}^*(\varphi)-\sum_s\mathcal{X}^*(\psi_s)+\lambda\mathcal{U}\Big)\pi\,\mathrm{d}p.\tag{A.47}$$

Only the first four terms depend on the allocation, and by Lemma 9 these terms are jointly nonnegative for every incentive compatible allocation. Since the allocation $(c,x)=0$ is incentive compatible and sets these terms to zero, it attains the minimum of the inner problem, so the optimal allocation is chosen such that these terms jointly equal zero, implying:

$$\int\Big(\varphi^*c^*+\sum_s\psi^*_sx^*_s\Big)\pi\,\mathrm{d}p=\lambda\int\Big(c^*-\sum_s p_sx^*_s\Big)\pi\,\mathrm{d}p,\tag{A.42}$$

and thus concluding the proof.∎

To obtain further optimality conditions, we analyze the planner problem using $\mathcal{L}(c^*,x^*,\varphi^*,\psi^*,\lambda)=\min\,\max\,\mathcal{L}(c,x,\varphi,\psi,\lambda)$ from Claim 1 to write:

$$\min_{c,x}\;\max_{\varphi,\psi}\int\Big(\varphi c-\mathcal{C}^*(\varphi)+\sum_s\psi_sx_s-\sum_s\mathcal{X}^*(\psi_s)-\lambda\Big(c-\sum_s p_sx_s-\mathcal{U}\Big)\Big)\pi\,\mathrm{d}p.\tag{A.48}$$

Regarding the inner maximization problem, only the first four terms depend on the convex conjugates, and only the final term depends on the multiplier. Since the promise keeping condition requires $\int(c-\sum_s p_sx_s)\pi\,\mathrm{d}p\geq\mathcal{U}$ and $\lambda\geq 0$, it has to hold that $\lambda\big(\int(c^*-\sum_s p_sx^*_s)\pi\,\mathrm{d}p-\mathcal{U}\big)=0$. Similarly, maximizing $\varphi c^*-\mathcal{C}^*(\varphi)$ and $\psi_sx^*_s-\mathcal{X}^*(\psi_s)$ requires $\varphi^*=\mathcal{C}'(c^*)$ and $\psi^*_s=\mathcal{X}'(x^*_s)$.
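The condition $\varphi^*=\mathcal{C}'(c^*)$ is Fenchel's equality: maximizing $\varphi c-\mathcal{C}^*(\varphi)$ over slopes recovers $\mathcal{C}(c)$, with the maximum attained at $\varphi=\mathcal{C}'(c)$. A one-dimensional check, assuming the hypothetical cost $\mathcal{C}(c)=e^{c}$ with conjugate $\mathcal{C}^*(\varphi)=\varphi\log\varphi-\varphi$ for $\varphi>0$ (not the paper's specification), uses a plain grid search over slopes:

```python
import math

def C(c):
    """Hypothetical convex resource cost of consumption utility (illustration only)."""
    return math.exp(c)

def C_star(phi):
    """Legendre transform of exp: C*(phi) = phi * log(phi) - phi, for phi > 0."""
    return phi * math.log(phi) - phi

def biconjugate(c, lo=1e-3, hi=50.0, n=200_000):
    """Grid approximation of max_phi [phi * c - C*(phi)]; returns (value, maximizing slope)."""
    best_val, best_phi = -math.inf, lo
    for k in range(n + 1):
        phi = lo + (hi - lo) * k / n
        val = phi * c - C_star(phi)
        if val > best_val:
            best_val, best_phi = val, phi
    return best_val, best_phi
```

For $c=1.2$, the grid search returns a value close to $\mathcal{C}(1.2)=e^{1.2}$ at a slope close to $\mathcal{C}'(1.2)=e^{1.2}$, as the first-order condition predicts.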

A.11 Numerical Approach

Linearization of the Problem. We now discuss the linearization of the problem that is central to numerical tractability. The only nonlinear part of the optimization problem that remains to be linearized is the objective:

$$\min\int\Big(\mathcal{C}(c(p))+z(p)\big(\mathcal{X}(x_c(p))+\mathcal{X}(x_m(p))\big)\Big)\pi(p)\,\mathrm{d}p.\tag{A.49}$$

To illustrate our approach, we focus on the linearization of the convex resource cost function for consumption utility $\mathcal{C}$, and we suppose that bounds for the optimal solution are known a priori, that is, $\underline{c}(p)\leq c(p)\leq\bar{c}(p)$ and $\underline{x}_s(p)\leq x_s(p)\leq\bar{x}_s(p)$.

The idea is to approximate the convex cost for consumption utility $\mathcal{C}$ from below with tangent lines on the bounded interval. For each worker type $p$, it follows from the definition of the Legendre transform (29) that $\mathcal{C}(c(p))=\max_{\varphi}\,\varphi c(p)-\mathcal{C}^*(\varphi)$. We replace this continuous set of tangent slopes $\varphi$ in (29) with a finite set of tangent lines. Specifically, we consider a list of slopes $\{\varphi_i(p)\}_{i=1}^n$ with corresponding tangent lines $l^c_{ip}(t):=\varphi_i(p)t-\mathcal{C}^*(\varphi_i(p))$ such that the inequality:

$$0\leq\mathcal{C}(t)-\max_{1\leq i\leq n}\,l^c_{ip}(t)\leq\varepsilon_c\tag{A.50}$$

holds for all $t$ in the bounded interval $[\underline{c}(p),\bar{c}(p)]$. Analogously, to linearize the resource cost of labor disutility $\mathcal{X}$, we consider a list of slopes $\{\psi^s_i(p)\}_{i=1}^n$ with corresponding tangent lines $l^s_{ip}(t):=\psi^s_i(p)t-\mathcal{X}^*(\psi^s_i(p))$ such that the inequality:

$$0\leq\mathcal{X}(t)-\max_{1\leq i\leq n}\,l^s_{ip}(t)\leq\varepsilon_s\tag{A.51}$$

holds for each skill $s\in\mathcal{S}$ and for all $t$ in the interval $[\underline{x}_s(p),\bar{x}_s(p)]$.
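A minimal sketch of the tangent-line construction behind (A.50) and (A.51), assuming tangents placed at equally spaced points (the text does not say how the slopes $\varphi_i(p)$ are chosen) and a hypothetical convex cost. The approximation error is estimated on a sampling grid, so it is a numerical estimate of the maximal gap rather than an exact bound.

```python
import math

def tangent_lines(C, dC, lo, hi, n):
    """(slope, intercept) of tangents to C at n >= 2 equally spaced points of [lo, hi].
    The tangent at t is phi*x - C*(phi) with phi = C'(t); by Fenchel's equality its
    intercept -C*(phi) equals C(t) - C'(t)*t, which avoids computing C* explicitly."""
    points = [lo + (hi - lo) * j / (n - 1) for j in range(n)]
    return [(dC(t), C(t) - dC(t) * t) for t in points]

def envelope(lines, x):
    """Piecewise-linear lower approximation max_i l_i(x)."""
    return max(a * x + b for a, b in lines)

def max_gap(C, lines, lo, hi, samples=4001):
    """Sampled estimate of max over [lo, hi] of C(x) - max_i l_i(x)."""
    xs = (lo + (hi - lo) * k / (samples - 1) for k in range(samples))
    return max(C(x) - envelope(lines, x) for x in xs)

def lines_for_tolerance(C, dC, lo, hi, eps, n=2, n_max=4096):
    """Double the number of tangent points until the sampled gap is at most eps."""
    while n <= n_max:
        lines = tangent_lines(C, dC, lo, hi, n)
        if max_gap(C, lines, lo, hi) <= eps:
            return lines
        n *= 2
    raise ValueError("tolerance not reached; increase n_max")
```

Because each tangent lies below the convex function, the smallest $r$ satisfying all constraints of the form (A.52) at a given $c$ is `envelope(lines, c)`, which lies below $\mathcal{C}(c)$ by at most the tolerance, up to the sampling resolution.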

As a key step, we next introduce independent auxiliary variables $r(p)$ for each worker $p$ satisfying the following set of linear inequalities for all $i$:

$$r(p)\geq\varphi_i(p)c(p)-\mathcal{C}^*(\varphi_i(p)).\tag{A.52}$$

It follows from the discussion above that $r(p)\geq\max_i l^c_{ip}(c(p))\geq\mathcal{C}(c(p))-\varepsilon_c$ for each worker $p$, so that $r(p)$ bounds $\mathcal{C}(c(p))$ from above up to the approximation error. For the resource cost of disutility from working, we similarly define independent auxiliary variables $r_s(p)$ satisfying the linear inequalities for all $i$:

$$r_s(p)\geq\psi^s_i(p)x_s(p)-\mathcal{X}^*(\psi^s_i(p)).\tag{A.53}$$

We substitute the auxiliary variables $r(p)$ and $r_s(p)$ for $\mathcal{C}(c(p))$ and $\mathcal{X}(x_s(p))$ in our nonlinear objective to define the approximate planner problem. The approximate planner problem chooses $(c,x_s,r,r_s)$ to solve:

$$\min\int\Big(r(p)+z(p)\big(r_c(p)+r_m(p)\big)\Big)\mathrm{d}\pi,\tag{A.54}$$

subject to the incentive constraints (12), the promise keeping constraint (13), the constraints on the auxiliary variables (A.52) and (A.53), and the approximation bounds for consumption utility $\underline{c}(p)\leq c(p)\leq\bar{c}(p)$ and task outputs $\underline{x}_s(p)\leq x_s(p)\leq\bar{x}_s(p)$.


Accuracy. We next describe the accuracy of the approximate planner problem and provide the algorithm that we use to characterize its solution. The precision of the solution to the approximate planner’s problem naturally depends on the accuracy of the prior location of the solution. The criterion we use to ensure that the location is accurate is the absence of binding boundary constraints at the optimal solution. In line with this criterion, we call a solution proper when no boundary constraint binds.

Definition.

The solution to the approximate problem is proper if the solution is strictly interior, that is, $\underline{c}(p)<c(p)<\bar{c}(p)$; $\underline{x}_s(p)<x_s(p)<\bar{x}_s(p)$ if $\underline{x}_s(p)\neq 0$; and $\underline{x}_s(p)\leq x_s(p)<\bar{x}_s(p)$ when $\underline{x}_s(p)=0$.

Proposition 7 shows that a proper solution, a property that is readily verified, closely approximates the optimal solution to the original planner problem.

Proposition 7.

For the approximate problem, introduce the maximal approximation errors:

$$\varepsilon:=\max_p\max_{\underline{c}\leq t\leq\overline{c}}\Big[\mathcal{C}(t)-\max_i l^c_{ip}(t)\Big]\qquad\text{and}\qquad\varepsilon_s:=\max_p\max_{\underline{x}_s\leq t\leq\overline{x}_s}\Big[z(p)\mathcal{X}(t)-z(p)\max_i l^s_{ip}(t)\Big].$$

If the solution to the approximate planner problem is proper, then the overall approximation error is bounded from above by the sum of maximal approximation errors:

$$0\leq\int\Big(\mathcal{C}(c(p))+z(p)\big(\mathcal{X}(x_c(p))+\mathcal{X}(x_m(p))\big)\Big)\mathrm{d}\pi-\Omega\leq\varepsilon+\varepsilon_c+\varepsilon_m,$$

where $\Omega$ is the minimum value of the original problem.

Proof.

We show that if the solution $(c,x_s,r,r_s)$ to the approximate problem is proper, then the overall approximation error is bounded from above by the sum of maximal approximation errors, by proving that both inequalities are satisfied.

The first inequality holds because the allocation $(c,x)$ from the approximate problem is feasible for the original planner problem, so that

$$\Omega\leq\int\Big(\mathcal{C}(c(p))+z(p)\big(\mathcal{X}(x_c(p))+\mathcal{X}(x_m(p))\big)\Big)\mathrm{d}\pi.\tag{A.55}$$

To prove the second inequality, we use that the definition of the approximation error $\varepsilon$ and the approximation constraints (A.52) imply $\mathcal{C}(c(p))\leq r(p)+\varepsilon$. Denote by $(\hat{c},\hat{x})$ the allocation that attains the minimum resource cost $\Omega:=\int\big(\mathcal{C}(\hat{c}(p))+z(p)\big(\mathcal{X}(\hat{x}_c(p))+\mathcal{X}(\hat{x}_m(p))\big)\big)\mathrm{d}\pi$. Since the solution to the approximate problem is proper, there exists a weight $\lambda\in(0,1)$, close enough to one, such that the convex combination $\tilde{c}(p)=\lambda c(p)+(1-\lambda)\hat{c}(p)$ and $\tilde{x}_s(p)=\lambda x_s(p)+(1-\lambda)\hat{x}_s(p)$ satisfies the location bounds; since the incentive constraints and the promise keeping constraint are linear and hold for both allocations, they also hold for this combination. To construct an alternative allocation that is feasible under the approximate problem, we can set:

$$\tilde{r}(p)=\max_i\,l^c_{ip}(\tilde{c}(p))\tag{A.56}$$
$$\tilde{r}_s(p)=\max_i\,l^s_{ip}(\tilde{x}_s(p))\tag{A.57}$$

Since $(c,x_s,r,r_s)$ solves the approximate problem and $(\tilde{c},\tilde{x}_s,\tilde{r},\tilde{r}_s)$ is feasible for it, the cost under the alternative allocation is at least the cost under the approximate solution. Since the pointwise maximum of affine functions is convex, we have that

$$\tilde{r}(p)\leq\lambda\max_i l^c_{ip}(c(p))+(1-\lambda)\max_i l^c_{ip}(\hat{c}(p))\leq\lambda r(p)+(1-\lambda)\mathcal{C}(\hat{c}(p))\tag{A.58}$$
$$\tilde{r}_s(p)\leq\lambda\max_i l^s_{ip}(x_s(p))+(1-\lambda)\max_i l^s_{ip}(\hat{x}_s(p))\leq\lambda r_s(p)+(1-\lambda)\mathcal{X}(\hat{x}_s(p))\tag{A.59}$$

where the final inequalities follow from the approximation constraints (A.52) and (A.53), and from the observation that the approximations are from below. Combining the two previous claims, we write that

$$\begin{aligned}\int\Big(r(p)+z(p)\big(r_c(p)+r_m(p)\big)\Big)\mathrm{d}\pi&\leq\int\Big(\tilde{r}(p)+z(p)\big(\tilde{r}_c(p)+\tilde{r}_m(p)\big)\Big)\mathrm{d}\pi\\&\leq\lambda\int\Big(r(p)+z(p)\big(r_c(p)+r_m(p)\big)\Big)\mathrm{d}\pi+(1-\lambda)\int\Big(\mathcal{C}(\hat{c}(p))+z(p)\big(\mathcal{X}(\hat{x}_c(p))+\mathcal{X}(\hat{x}_m(p))\big)\Big)\mathrm{d}\pi,\end{aligned}$$

which, after subtracting $\lambda$ times the left-hand side from both sides and dividing by $1-\lambda>0$, implies $\Omega\geq\int\big(r(p)+z(p)\sum_s r_s(p)\big)\,\text{d}\pi$. Finally, we use the definition of the approximation errors to write:

$$\Omega\geq\int\big(r(p)+z(p)\big(r_c(p)+r_m(p)\big)\big)\,\text{d}\pi\geq\int\big(\mathcal{C}(c(p))+z(p)\big(\mathcal{X}(x_c(p))+\mathcal{X}(x_m(p))\big)\big)\,\text{d}\pi-\sum\varepsilon,$$

which concludes the proof.∎
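The proof relies on two elementary facts: piecewise linear approximations built from supporting lines lie below the convex cost they approximate, and the pointwise maximum of such lines is convex, which delivers the first inequality in (A.58). The sketch below checks both facts numerically. The cost $\mathcal{C}(c)=c^2$ and the tangent points are illustrative assumptions, not the model's specification.

```python
import random

# Illustrative convex cost C(c) = c**2 (an assumption, not the model's C).
def cost(c: float) -> float:
    return c * c

# Hypothetical tangent points t_i; cut i is l_i(c) = C(t_i) + C'(t_i) * (c - t_i).
TANGENT_POINTS = [0.0, 0.5, 1.0, 1.5, 2.0]

def max_cut(c: float) -> float:
    """Pointwise maximum of the tangent cuts, max_i l_i(c)."""
    return max(cost(t) + 2.0 * t * (c - t) for t in TANGENT_POINTS)

def facts_hold(num_draws: int = 1000, seed: int = 0, tol: float = 1e-12) -> bool:
    """Check 'approximation from below' and convexity of max_i l_i on random draws."""
    rng = random.Random(seed)
    for _ in range(num_draws):
        a, b, lam = rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0), rng.random()
        mix = lam * a + (1.0 - lam) * b
        if max_cut(a) > cost(a) + tol:  # cuts must lie below C
            return False
        if max_cut(mix) > lam * max_cut(a) + (1.0 - lam) * max_cut(b) + tol:
            return False  # would violate convexity of the pointwise maximum
    return True

print(facts_hold())  # True
```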

Algorithm 1. Iterative Algorithm for Planner's Problem with Fixed Assignment.
Set initial location boundaries $\{\underline{c},\overline{c}\}$ and $\{\underline{x}_s,\overline{x}_s\}$, and define initial accuracy levels $\varepsilon_c$ and $\varepsilon_s$
while $\varepsilon_c+\sum_s\varepsilon_s>\zeta$ do
    for each $p$, construct piecewise linear approximations of $\mathcal{C}$ and $\mathcal{X}$ on the bounded intervals $[\underline{c},\overline{c}]$ and $[\underline{x}_s,\overline{x}_s]$ with precisions $\varepsilon_c$ and $\varepsilon_s$
    solve the approximate planner's problem
    if the approximate solution is proper then
        update precision levels $\varepsilon_c\to\alpha\varepsilon_c$ and $\varepsilon_s\to\alpha\varepsilon_s$ for some $\alpha<1$
        update location boundaries $[\underline{c},\overline{c}]$ and $[\underline{x}_s,\overline{x}_s]$
    else
        relax location boundaries
    end if
end while
return solution $(c,x_s,r,r_s)$ to the final approximate planner's problem
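The approximation step of Algorithm 1 can be sketched for a single convex cost function. The code below is a minimal illustration under stated assumptions: it builds tangent lines on a uniform grid of points, doubling the grid until the worst-case gap is at most the precision, and applies the precision update $\varepsilon\to\alpha\varepsilon$ while $\varepsilon>\zeta$. It omits the linear program, the properness check and the boundary updates. The function names are hypothetical, and the cost $c^2$ on $[0,1]$ is illustrative.

```python
from typing import Callable, List, Tuple

Fn = Callable[[float], float]
Cut = Tuple[float, float]  # (slope, intercept) of an affine minorant

def tangent(f: Fn, df: Fn, t: float) -> Cut:
    """Tangent line of f at t, which lies below f when f is convex."""
    s = df(t)
    return s, f(t) - s * t

def max_gap(f: Fn, df: Fn, points: List[float]) -> float:
    """Worst error f(x) - max_i l_i(x) between the first and last point.

    For a differentiable convex f, the error between two adjacent tangent
    points peaks where their tangent lines intersect.
    """
    gap = 0.0
    for a, b in zip(points, points[1:]):
        sa, ia = tangent(f, df, a)
        sb, ib = tangent(f, df, b)
        if sb == sa:  # f is affine on [a, b], so the cuts are exact there
            continue
        x = (ia - ib) / (sb - sa)
        gap = max(gap, f(x) - (sa * x + ia))
    return gap

def cuts_with_precision(f: Fn, df: Fn, lo: float, hi: float, eps: float) -> List[Cut]:
    """Double a uniform grid of tangent points until the gap is at most eps."""
    n = 2
    while True:
        points = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
        if max_gap(f, df, points) <= eps:
            return [tangent(f, df, t) for t in points]
        n *= 2

def precision_schedule(f: Fn, df: Fn, lo: float, hi: float,
                       eps: float, alpha: float, zeta: float) -> List[int]:
    """Number of cuts per pass of 'while eps > zeta: build cuts; eps <- alpha * eps'."""
    counts = []
    while eps > zeta:
        counts.append(len(cuts_with_precision(f, df, lo, hi, eps)))
        eps *= alpha  # stands in for the update after a proper solution
    return counts

def square(c: float) -> float:
    return c * c

def d_square(c: float) -> float:
    return 2.0 * c

print(precision_schedule(square, d_square, 0.0, 1.0, eps=0.1, alpha=0.5, zeta=0.01))
# [4, 4, 8, 8]
```

For $c^2$ the gap between adjacent tangent points with spacing $h$ is $h^2/4$, which is why the number of cuts only grows once $\varepsilon$ falls below $1/36$.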

Algorithm. We use an iterative algorithm to solve the approximate planner's problem for a given precision level.²⁵ We solve the planner problem for a worker type space with 200 types in both the cognitive and the manual dimension, that is, for a total of 40 thousand types. We display the structure of our numerical approach in Algorithm 1.

²⁵ See Ekeland and Moreno-Bromberg (2010) for a discussion on numerically solving optimization problems subject to a convexity constraint on the function $u$, and Oberman (2013) for a practical approach to dealing with global incentive constraints.

Having described how to characterize the planner problem given an arbitrary assignment, we next describe how to update the assignment to obtain a jointly optimal assignment and allocation. Given the optimality of positive sorting between workers and firms in Proposition 1, we update the assignment after each step by positively sorting the distribution of project values with the effective worker skill index $\mathcal{X}(x_c(p))+\mathcal{X}(x_m(p))$. This reassigns projects across workers and yields a new assignment. We then solve the planner's problem for the new assignment function using Algorithm 1, and we proceed until the assignment converges. To the best of our knowledge, there is no proof that this iterative procedure converges to a unique assignment. In practice, however, we find that the algorithm always converges to the same assignment function from distinct initial assignments.
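The assignment update can be illustrated with a rank-based (positive assortative) matching on a finite set of equal-mass worker types and projects. The sketch below rests on assumptions: `solve_given_assignment` is a hypothetical stand-in for Algorithm 1 that returns each worker type's effective skill index given its project value, and convergence is declared when the assignment stops changing.

```python
from typing import Callable, List, Sequence

def positive_sort(skill_index: Sequence[float], projects: Sequence[float]) -> List[float]:
    """Give the k-th lowest project value to the worker with the k-th lowest skill index.

    Assumes equal masses for all worker types and projects.
    """
    if len(skill_index) != len(projects):
        raise ValueError("equal numbers of worker types and projects assumed")
    order = sorted(range(len(skill_index)), key=lambda i: skill_index[i])
    ranked_projects = sorted(projects)
    assignment = [0.0] * len(skill_index)
    for rank, worker in enumerate(order):
        assignment[worker] = ranked_projects[rank]
    return assignment

def iterate_assignment(solve_given_assignment: Callable[[List[float]], List[float]],
                       initial: List[float], max_iter: int = 100) -> List[float]:
    """Alternate between solving the planner problem and re-sorting projects."""
    assignment = list(initial)
    for _ in range(max_iter):
        skill_index = solve_given_assignment(assignment)
        updated = positive_sort(skill_index, assignment)
        if updated == assignment:  # fixed point: the assignment has converged
            return updated
        assignment = updated
    raise RuntimeError("assignment did not converge within max_iter")

# Toy stand-in for Algorithm 1 (hypothetical): skill rises with a type's
# baseline and, slightly, with the project value it is matched to.
BASELINE = [1.0, 3.0, 2.0]

def toy_solver(assignment: List[float]) -> List[float]:
    return [b + 0.1 * z for b, z in zip(BASELINE, assignment)]

print(iterate_assignment(toy_solver, [5.0, 1.0, 3.0]))  # [1.0, 5.0, 3.0]
```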

Literature. Our numerical approach relates to outer linearization of a separable convex objective function. This approach is well established; see, for example, Bertsekas and Yu (2011).

Outer linearization of a separable convex objective is part of the outer linearization approach for general problems. For example, Geoffrion (1970) presents the idea of approximating a convex function with supporting hyperplanes, which is outer linearization. Duran and Grossmann (1986) apply outer linearization to general convex mixed-integer optimization problems. In our case, the objective is separable in variables, which leads to faster and more efficient algorithms for constructing the supporting hyperplanes. Bertsekas and Yu (2011) also have an objective function that is separable in variables and discuss that this problem has been explored under the framework of extended monotropic programming, which builds on monotropic programming (Rockafellar, 1999). Our approach extends beyond outer linearization of a separable convex objective. Whereas the literature routinely focuses on problems with linear equality constraints, our approach also addresses inequality constraints, which makes it applicable to a broader class of problems.


A.12 Characterizing Equilibrium using Transport Problems

To characterize an equilibrium, we relate our positive economy to optimal transport problems.


Primal Problem. The primal problem is to choose an assignment to maximize production:

$$\max_{\gamma\in\Gamma}\;\int y(x_1,x_2,z)\,\text{d}\gamma. \tag{A.60}$$

The choice of an assignment is restricted by the feasibility constraint $\gamma\in\Gamma(F_x,F_x,F_z)$, where $\Gamma$ denotes the set of probability measures on the product space $\mathbf{X}\times\mathbf{X}\times Z$ whose marginals onto each copy of $\mathbf{X}$ are $F_x$ and whose marginal onto $Z$ is $F_z$.


Dual Problem. The dual transport problem is to choose functions $w$ and $\Omega$ that solve:

$$\min_{w,\Omega}\;\int w(x_1)\,\text{d}F_x+\int w(x_2)\,\text{d}F_x+\int\Omega(z)\,\text{d}F_z, \tag{A.61}$$

subject to the constraint that the surplus is weakly negative for any triplet $(x_1,x_2,z)$, that is, $S(x_1,x_2,z):=y(x_1,x_2,z)-w(x_1)-w(x_2)-\Omega(z)\leq 0$.


We connect the primal problem and the dual problem to equilibrium in Lemma 10.

Lemma 10.

The equilibrium assignment $\gamma$ solves the primal problem (A.60), and equilibrium wages $w$ and firm values $\Omega$ solve the dual problem (A.61).

We use Lemma 10 to characterize the equilibrium.²⁶ We solve the primal problem (A.60) to characterize the equilibrium assignment and the dual problem (A.61) to characterize wages $w$ and firm values $\Omega$. To prove Lemma 10, we use Lemma 11.

²⁶ We note that a transport problem with two identical worker distributions $F_x$ with unit mass for each role is equivalent to a transport problem with a single worker distribution $\Phi_x$ with mass two (Appendix A.13).

Lemma 11.

Suppose that $\gamma\in\Gamma(F_x,F_x,F_z)$ is a feasible assignment, that the functions $w$ and $\Omega$ satisfy the dual constraint, and that the objectives of the primal problem (A.60) and the dual problem (A.61) coincide, $\int y(x_1,x_2,z)\,\text{d}\gamma=\int w(x_1)\,\text{d}F_x+\int w(x_2)\,\text{d}F_x+\int\Omega(z)\,\text{d}F_z$. Then $\gamma$ solves the primal problem, and the functions $w$ and $\Omega$ solve the dual transport problem.

The proof of Lemma 11 uses only weak duality.

Weak Duality. Let $\gamma\in\Gamma(F_{x_1},F_{x_2},F_z)$ be a joint probability measure, and let $(f,g,h)$ be functions such that $y(x_1,x_2,z)\leq f(x_1)+g(x_2)+h(z)$ for all $(x_1,x_2,z)$. Then

$$\min_{f,g,h}\;\int f(x)\,\text{d}F_{x_1}+\int g(x)\,\text{d}F_{x_2}+\int h(z)\,\text{d}F_z\;\geq\;\max_{\gamma\in\Gamma}\;\int y(x_1,x_2,z)\,\text{d}\gamma. \tag{A.62}$$

Proof. For any functions $(f,g,h)$ such that $y(x_1,x_2,z)\leq f(x_1)+g(x_2)+h(z)$, we have:

$$\max_{\gamma\in\Gamma}\int y(x_1,x_2,z)\,\text{d}\gamma\leq\int\big(f(x_1)+g(x_2)+h(z)\big)\,\text{d}\gamma=\int f(x)\,\text{d}F_{x_1}+\int g(x)\,\text{d}F_{x_2}+\int h(z)\,\text{d}F_z,$$

where the inequality holds for every $\gamma\in\Gamma$, and the equality follows as $\gamma\in\Gamma(F_{x_1},F_{x_2},F_z)$, so the right-hand side does not depend on $\gamma$. Since the above inequality holds for any such $(f,g,h)$, it holds for the $(f,g,h)$ that minimize the right-hand side.


We use weak duality to establish Lemma 11 by contradiction.


Proof of Lemma 11. Suppose by contradiction that $\gamma$ does not solve the primal problem. Then

$$\begin{aligned}\max_{\pi\in\Gamma}\int y(x_1,x_2,z)\,\text{d}\pi&>\int y(x_1,x_2,z)\,\text{d}\gamma=\int w(x)\,\text{d}F_x+\int w(x)\,\text{d}F_x+\int\Omega(z)\,\text{d}F_z\\&\geq\min_{f,g,h}\int f(x)\,\text{d}F_x+\int g(x)\,\text{d}F_x+\int h(z)\,\text{d}F_z,\end{aligned} \tag{A.63}$$

where the equality follows by assumption and the final inequality holds because $(w,w,\Omega)$ satisfies the dual constraint. This contradicts weak duality (A.62).

Suppose by contradiction that the functions $w$ and $\Omega$ do not solve the dual problem. Then there exist functions $f$, $g$, and $h$ satisfying the dual constraint such that

$$\begin{aligned}\min_{f,g,h}\int f(x)\,\text{d}F_x+\int g(x)\,\text{d}F_x+\int h(z)\,\text{d}F_z&<\int w(x)\,\text{d}F_x+\int w(x)\,\text{d}F_x+\int\Omega(z)\,\text{d}F_z\\&=\int y(x_1,x_2,z)\,\text{d}\gamma\leq\max_{\pi\in\Gamma}\int y(x_1,x_2,z)\,\text{d}\pi,\end{aligned} \tag{A.64}$$

where the equality follows by assumption. This inequality contradicts weak duality (A.62).
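Lemma 11 can be illustrated with a small discrete example. The sketch below assumes a one-dimensional analogue of technology (4), $y=z\,x_1x_2$, two equal-mass worker types, two equal-mass projects, a hypothetical wage $w(x)=x^4/4$ and a hypothetical firm value $\Omega(z)=z^2/2$. It checks that $(w,\Omega)$ satisfies the dual constraint, that every permutation-based assignment produces at most the dual objective (weak duality), and that the self-sorted assignment attains the dual objective, so that by Lemma 11 both are optimal in this example.

```python
from itertools import permutations

WORKERS = [1.0, 2.0]  # equal-mass worker types, used in both roles
FIRMS = [1.0, 4.0]    # equal-mass project values

def production(x1: float, x2: float, z: float) -> float:
    return z * x1 * x2  # one-dimensional analogue of technology (4) (assumption)

def wage(x: float) -> float:
    return x ** 4 / 4  # hypothetical wage schedule

def firm_value(z: float) -> float:
    return z ** 2 / 2  # hypothetical firm value

def dual_feasible(tol: float = 1e-12) -> bool:
    """Surplus y - w(x1) - w(x2) - Omega(z) <= 0 for every triplet."""
    return all(production(a, b, z) <= wage(a) + wage(b) + firm_value(z) + tol
               for a in WORKERS for b in WORKERS for z in FIRMS)

def dual_value() -> float:
    n = len(WORKERS)
    return 2 * sum(map(wage, WORKERS)) / n + sum(map(firm_value, FIRMS)) / n

def primal_value(coworker, firm) -> float:
    """Output when worker i is paired with WORKERS[coworker[i]] at FIRMS[firm[i]]."""
    n = len(WORKERS)
    return sum(production(WORKERS[i], WORKERS[coworker[i]], FIRMS[firm[i]])
               for i in range(n)) / n

def best_permutation_value() -> float:
    idx = range(len(WORKERS))
    return max(primal_value(s, t) for s in permutations(idx) for t in permutations(idx))

print(dual_feasible(), primal_value((0, 1), (0, 1)), dual_value())  # True 8.5 8.5
```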


We now use Lemma 11 to show that the equilibrium assignment $\gamma$ solves the primal transport problem, and that the wage and firm value functions solve the dual transport problem.

In equilibrium, the surplus is weakly negative for any triplet $(x_1,x_2,z)$, which implies that $y(x_1,x_2,z)\leq w(x_1)+w(x_2)+\Omega(z)$. By substituting the household budget constraints $c=(1-\tau)w(x)$ and the government budget constraint $G=\tau\int w(x)\,\text{d}\Phi(\alpha)$ into the aggregate resource constraint (35), we write:

$$\int y(x_1,x_2,z)\,\text{d}\gamma=\int w(x_1)\,\text{d}F_x(x_1)+\int w(x_2)\,\text{d}F_x(x_2)+\int\Omega(z)\,\text{d}F_z(z). \tag{A.65}$$

By Lemma 11, it follows that $\gamma$ solves the primal problem and that $w$ and $\Omega$ solve the dual problem.

A.13 Symmetric Equilibrium

We prove that we can restrict our attention to symmetric equilibria without loss of generality.

Lemma 12.

For any equilibrium with wages $w$ and assignment function $\gamma\in\Gamma(F_{x_1},F_{x_2},F_z)$, there exists an equilibrium with wages $w$ and a symmetric assignment $\hat{\gamma}=\frac{\gamma+\gamma'}{2}$.

Proof.

Lemma 12 states that for any competitive equilibrium with wages $w$, firm values $\Omega$, and assignment $\gamma\in\Gamma(F_{x_1},F_{x_2},F_z)$, there is an equilibrium with identical wages $w$ and firm value function $\Omega$, and with a symmetric assignment function $\hat{\gamma}:=\frac{\gamma+\gamma'}{2}\in\Gamma(F_x,F_x,F_z)$, where $F_x:=\frac{1}{2}(F_{x_1}+F_{x_2})$.


To prove this result, we first define $\gamma'$ as a pushforward measure of the assignment function $\gamma$. We then show that the symmetric assignment function $\hat{\gamma}$ indeed solves the primal transport problem. Using Lemma 10, this establishes the result.

Definition.

Given spaces $M_1$ and $M_2$, a measure $\gamma$ concentrated on $M_1$, and a map $T:M_1\rightarrow M_2$, the pushforward measure of $\gamma$ through $T$, which we denote by $T_{\#}\gamma$, is defined so that, for every bounded measurable function $f$ on $M_2$,

$$\int f(y)\,\text{d}T_{\#}\gamma=\int f(T(x))\,\text{d}\gamma. \tag{A.66}$$

We define $\gamma'$ as the pushforward measure of $\gamma$ through a mapping $T$, or $\gamma':=T_{\#}\gamma$. Our mapping $T:M\rightarrow M$ maps the matching set onto itself by interchanging the positions of the worker and the coworker, $(x_1,x_2,z)\mapsto(x_2,x_1,z)$. If $\gamma$ is a feasible assignment, so is $\gamma'$: for $\gamma\in\Gamma(F_{x_1},F_{x_2},F_z)$ we have $\gamma'\in\Gamma(F_{x_2},F_{x_1},F_z)$.

Using the definition of $\gamma'$, we construct the symmetric assignment function $\hat{\gamma}:=\frac{\gamma+\gamma'}{2}$ and observe that it is feasible given $F_x$, that is, $\hat{\gamma}\in\Gamma(F_x,F_x,F_z)$. Moreover, we observe that:

$$\int y(x_1,x_2,z)\,\text{d}\hat{\gamma}(x_1,x_2,z)=\int w(x_1)\,\text{d}F_x(x_1)+\int w(x_2)\,\text{d}F_x(x_2)+\int\Omega(z)\,\text{d}F_z(z). \tag{A.67}$$

The left-hand side is unchanged because production is symmetric in the two workers, so the production of equilibrium pairings does not change, while the right-hand side is unchanged because $\int w\,\text{d}F_{x_1}+\int w\,\text{d}F_{x_2}=2\int w\,\text{d}F_x$. By Lemma 11, $\hat{\gamma}$ solves the primal transport problem, and the functions $w$ and $\Omega$ solve the dual transport problem. By Lemma 10, this shows that the symmetric assignment $\hat{\gamma}$ is an equilibrium assignment, $w$ are equilibrium wages, and $\Omega$ are equilibrium firm values.


Finally, we remark that the equilibrium assignment need not be $\hat{\gamma}$: we can replace $\hat{\gamma}$ with any other optimal primal solution. Suppose that $\tilde{\gamma}$ is another solution to the primal problem. Then

$$\int y(x_1,x_2,z)\,\text{d}\tilde{\gamma}(x_1,x_2,z)=\int w(x_1)\,\text{d}F_x(x_1)+\int w(x_2)\,\text{d}F_x(x_2)+\int\Omega(z)\,\text{d}F_z(z), \tag{A.68}$$

so that $\int S\,\text{d}\tilde{\gamma}=0$. Since $S\leq 0$ everywhere, $S(x_1,x_2,z)=0$ for $\tilde{\gamma}$-almost every $(x_1,x_2,z)$. ∎
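The symmetrization step is easy to verify on a discrete assignment. The sketch below represents an assignment as a dictionary from triplets $(x_1,x_2,z)$ to masses (exact fractions), forms $\gamma'=T_{\#}\gamma$ by swapping the two worker positions, and averages. The example assignment and the production function, a one-dimensional analogue of (4), are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Dict, Tuple

Triplet = Tuple[int, int, int]
Measure = Dict[Triplet, Fraction]

def swap(gamma: Measure) -> Measure:
    """Pushforward of gamma under T(x1, x2, z) = (x2, x1, z)."""
    out: Dict[Triplet, Fraction] = defaultdict(Fraction)
    for (x1, x2, z), mass in gamma.items():
        out[(x2, x1, z)] += mass
    return dict(out)

def symmetrize(gamma: Measure) -> Measure:
    """gamma_hat = (gamma + gamma') / 2."""
    out: Dict[Triplet, Fraction] = defaultdict(Fraction)
    for measure in (gamma, swap(gamma)):
        for key, mass in measure.items():
            out[key] += mass / 2
    return dict(out)

def marginal(gamma: Measure, axis: int) -> Dict[int, Fraction]:
    """Marginal of gamma on coordinate `axis` (0: worker, 1: coworker, 2: firm)."""
    out: Dict[int, Fraction] = defaultdict(Fraction)
    for key, mass in gamma.items():
        out[key[axis]] += mass
    return dict(out)

def integrate(f: Callable[[int, int, int], int], gamma: Measure) -> Fraction:
    return sum((mass * f(*key) for key, mass in gamma.items()), Fraction(0))

def production(x1: int, x2: int, z: int) -> int:
    return z * x1 * x2  # symmetric in the two workers (assumption)

GAMMA = {(1, 2, 3): Fraction(1, 2), (3, 1, 5): Fraction(1, 2)}
GAMMA_HAT = symmetrize(GAMMA)
print(marginal(GAMMA_HAT, 0) == marginal(GAMMA_HAT, 1))  # True
print(integrate(production, GAMMA) == integrate(production, GAMMA_HAT))  # True
```

Both worker marginals of $\hat{\gamma}$ equal the average of the two worker marginals of $\gamma$, as required for $\hat{\gamma}\in\Gamma(F_x,F_x,F_z)$.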

A.14 Wages and Firm Values

To see why only effective skill matters, consider two workers $(x_c,x_m)$ and $(\hat{x}_c,\hat{x}_m)$ with identical effective skill $X=\hat{X}$. Since the surplus is zero almost everywhere under the equilibrium assignment $\gamma$, and using production technology (4), $2w(x)+\Omega(z)=zX$ and $2w(\hat{x})+\Omega(\hat{z})=\hat{z}\hat{X}$. By the constraints of the dual problem (A.61), $2w(x)+\Omega(\hat{z})\geq\hat{z}X=\hat{z}\hat{X}$ and $2w(\hat{x})+\Omega(z)\geq z\hat{X}=zX$, where the equalities follow since the workers' effective skills are identical. Combining these expressions gives $w(x)\geq w(\hat{x})$ and $w(\hat{x})\geq w(x)$, so that $w(\hat{x})=w(x)$. It is useful to define the firm's total wage bill $h(X):=2w(x)$.

Wages are convex in effective skill $X$, so marginal earnings rise with effective skill: equal differences in effective worker skill $X$ translate into increasingly large differences in worker earnings. The dual constraints imply $h(X)\geq zX-\Omega(z)$ for any $z$. Since the surplus is zero almost everywhere with respect to the equilibrium assignment, $h(X)=\sup_z\big(zX-\Omega(z)\big)$, implying $h=\Omega^*$: the firm's wage bill is the Legendre transform of the firm value function. Since $h(X)$ is the supremum of linear functions in $X$, the wage bill is convex.

The firm value function is the Legendre transform of the wage bill. The dual constraints also imply that for any $X$ it holds that $\Omega(z)\geq zX-h(X)$, and therefore $\Omega(z)=\sup_X\big(zX-h(X)\big)$. This implies that the firm value function is convex and is indeed the Legendre transform of the wage bill, $\Omega=h^*$. As a result, $h(X)+h^*(z)\geq zX$ for all $(X,z)$, with equality $h(X)+h^*(z)=zX$ for equilibrium pairs of effective skill and project value.
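The Legendre-transform relations can be checked numerically on a grid. The sketch assumes the illustrative wage bill $h(X)=X^2$, so that $h'(X)=2X$ and $h^*(z)=z^2/4$; it approximates $h^*$ by a maximum over a grid of effective skills and checks the Fenchel-Young inequality $h(X)+h^*(z)\geq zX$, together with equality (zero surplus) at matched pairs $z=h'(X)$.

```python
from typing import Callable, Sequence

def wage_bill(X: float) -> float:
    return X ** 2  # illustrative h(X) = X**eta with eta = 2 (assumption)

def legendre(f: Callable[[float], float], z: float, grid: Sequence[float]) -> float:
    """Grid approximation of f*(z) = sup_X (z X - f(X))."""
    return max(z * X - f(X) for X in grid)

SKILL_GRID = [k / 1000 for k in range(5001)]  # X in [0, 5]

def firm_value(z: float) -> float:
    return legendre(wage_bill, z, SKILL_GRID)

for X in (0.5, 1.0, 1.5):
    z = 2 * X  # matched project value z = h'(X)
    print(X, z, wage_bill(X) + firm_value(z) - z * X)  # zero surplus at matched pairs
```

Because the grid contains the maximizers $X=z/2$ for these values of $z$, the grid transform coincides with $z^2/4$ here; off the grid it would be a lower approximation.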

A.15 Lemma 3

We show there exists a firm distribution $F_z$ such that, given wage schedule $w$, workers and firms both optimize in a self-sorting equilibrium, where the distribution of worker skills $F_x$ is determined by the worker problems given a talent distribution $\Phi$. We verify this claim by studying the firm and worker problems given the postulated wage schedule (36).


Firm. Taking the wage schedule $w$ as given, the firm problem of choosing two workers to employ can be written as:

$$\max_{x_1,x_2}\;y(x_1,x_2,z)-w(x_1)-w(x_2)=\max_{x_1,x_2}\;z\left(x_{1c}x_{2c}+x_{1m}x_{2m}\right)-\frac{1}{2}\left(x_{1c}^2+x_{1m}^2\right)^{\eta}-\frac{1}{2}\left(x_{2c}^2+x_{2m}^2\right)^{\eta},$$

where the equality follows from substituting in the production technology (4) and wage schedule (36). The solution to this problem is that firm $z$ hires two identical workers.

To establish that each firm wants to hire two identical workers, we show that a firm that hires different workers $(x_1,x_2)$ with $x_1\neq x_2$ can increase its profits by hiring two identical workers $\big(\frac{1}{2}(x_1+x_2),\frac{1}{2}(x_1+x_2)\big)$. By hiring two identical workers, the firm increases its production and decreases its wage bill. Production increases because

$$\Big(\frac{x_{1c}+x_{2c}}{2}\Big)^2+\Big(\frac{x_{1m}+x_{2m}}{2}\Big)^2-\big(x_{1c}x_{2c}+x_{1m}x_{2m}\big)=\frac{1}{4}\Big((x_{1c}-x_{2c})^2+(x_{1m}-x_{2m})^2\Big)\geq 0.$$

To see that the firm decreases its wage bill by hiring two identical workers, observe that $(x_{1c}-x_{2c})^2+(x_{1m}-x_{2m})^2\geq 0$ implies $\big(\frac{x_{1c}+x_{2c}}{2}\big)^2+\big(\frac{x_{1m}+x_{2m}}{2}\big)^2\leq\frac{1}{2}(x_{1c}^2+x_{1m}^2)+\frac{1}{2}(x_{2c}^2+x_{2m}^2)$, and hence, since $\varsigma^{\eta}$ is increasing,

$$\Big(\Big(\frac{x_{1c}+x_{2c}}{2}\Big)^2+\Big(\frac{x_{1m}+x_{2m}}{2}\Big)^2\Big)^{\eta}\leq\Big(\frac{1}{2}(x_{1c}^2+x_{1m}^2)+\frac{1}{2}(x_{2c}^2+x_{2m}^2)\Big)^{\eta}.$$

For $\eta\geq 1$, the function $\varsigma^{\eta}$ is convex, so that $\big(\frac{1}{2}\varsigma+\frac{1}{2}\hat{\varsigma}\big)^{\eta}\leq\frac{1}{2}\varsigma^{\eta}+\frac{1}{2}\hat{\varsigma}^{\eta}$. Applying this inequality to the right-hand side, we obtain

$$\Big(\Big(\frac{x_{1c}+x_{2c}}{2}\Big)^2+\Big(\frac{x_{1m}+x_{2m}}{2}\Big)^2\Big)^{\eta}\leq\frac{1}{2}\left(x_{1c}^2+x_{1m}^2\right)^{\eta}+\frac{1}{2}\left(x_{2c}^2+x_{2m}^2\right)^{\eta}.$$

Equivalently, by hiring two identical workers $\big(\frac{1}{2}(x_1+x_2),\frac{1}{2}(x_1+x_2)\big)$, a firm lowers its wage bill relative to hiring two different workers $(x_1,x_2)$: $w\big(\frac{1}{2}(x_1+x_2)\big)+w\big(\frac{1}{2}(x_1+x_2)\big)\leq w(x_1)+w(x_2)$. Finally, since firms hire identical workers, the firm's optimality condition for hiring effective worker skill $X$ gives $z=h'(X)$. Since the wage schedule is convex, equilibrium sorting is positive.
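The averaging argument can be checked by simulation. The sketch below draws random worker pairs and project values and verifies that replacing $(x_1,x_2)$ by two copies of their average weakly raises output and, for $\eta\geq 1$, weakly lowers the wage bill under $w(x)=\frac{1}{2}(x_c^2+x_m^2)^{\eta}$; it also exhibits a pair for which the wage-bill step fails when $\eta<1$. Sampling ranges and seeds are arbitrary assumptions.

```python
import random
from typing import Tuple

Worker = Tuple[float, float]  # (cognitive, manual) task inputs

def production(x1: Worker, x2: Worker, z: float) -> float:
    return z * (x1[0] * x2[0] + x1[1] * x2[1])  # technology (4)

def wage(x: Worker, eta: float) -> float:
    return 0.5 * (x[0] ** 2 + x[1] ** 2) ** eta

def average(x1: Worker, x2: Worker) -> Worker:
    return ((x1[0] + x2[0]) / 2, (x1[1] + x2[1]) / 2)

def averaging_helps(eta: float, draws: int = 10_000, seed: int = 1, tol: float = 1e-9) -> bool:
    """True if averaging never lowers output nor raises the wage bill on the draws."""
    rng = random.Random(seed)
    for _ in range(draws):
        x1 = (rng.uniform(0, 3), rng.uniform(0, 3))
        x2 = (rng.uniform(0, 3), rng.uniform(0, 3))
        z = rng.uniform(0, 5)
        xbar = average(x1, x2)
        if production(xbar, xbar, z) < production(x1, x2, z) - tol:
            return False
        if 2 * wage(xbar, eta) > wage(x1, eta) + wage(x2, eta) + tol:
            return False
    return True

print(averaging_helps(1.0), averaging_helps(1.5))  # True True
# With eta < 1 the wage-bill step can fail, e.g. x1 = (0, 0), x2 = (2, 0):
print(2 * wage((1.0, 0.0), 0.25) > wage((0.0, 0.0), 0.25) + wage((2.0, 0.0), 0.25))  # True
```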


Worker. The distribution of worker skills is uniquely induced by the worker problems. Given the wage schedule, a worker's problem is:

$$\max_{x_c,x_m}\;u\big((1-\tau)w(x)\big)-\kappa\Big(\frac{x_c}{\alpha_c}\Big)^{\rho}-\kappa\Big(\frac{x_m}{\alpha_m}\Big)^{\rho}. \tag{A.69}$$

Using the transformation $\tilde{x}_s:=x_s^{\rho}$, the problem is:

$$\max_{\tilde{x}_c,\tilde{x}_m}\;u\big((1-\tau)\tilde{w}(\tilde{x})\big)-\frac{\tilde{x}_c}{p_c}-\frac{\tilde{x}_m}{p_m}, \tag{A.70}$$

where $\tilde{w}(\tilde{x}):=\Big(\frac{1}{2}\sum_s\tilde{x}_s^{2/\rho}\Big)^{\eta}$ and $p_s:=\alpha_s^{\rho}/\kappa$.

We prove strict concavity of the objective by examining each of its terms. The second and third terms are linear and thus concave. It remains to verify that the first term, $u((1-\tau)\tilde{w}(\tilde{x}))$, is strictly concave. First, since the consumption utility $u$ is strictly concave, for any $\lambda\in(0,1)$ and any two bundles $x$ and $\tilde{x}$ with $\tilde{w}(x)\neq\tilde{w}(\tilde{x})$:

$$\lambda u\big((1-\tau)\tilde{w}(x)\big)+(1-\lambda)u\big((1-\tau)\tilde{w}(\tilde{x})\big)<u\Big((1-\tau)\big(\lambda\tilde{w}(x)+(1-\lambda)\tilde{w}(\tilde{x})\big)\Big).$$

Since $u$ is an increasing and concave function, the first term is strictly concave if $\tilde{w}(\tilde{x})$ is strictly concave. To establish this, we note that the transformed wage equation is the composition of a concave CES aggregate with an increasing concave function as long as $\eta\leq\frac{\rho}{2}$. The worker problem is therefore strictly concave and has a unique solution, which implies that the distribution of worker skills is uniquely induced by the worker problems.
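Two steps of the worker argument can be checked numerically: the change of variables that turns effort costs into linear terms with $p_s=\alpha_s^{\rho}/\kappa$, and midpoint concavity of $\tilde{w}$ when $\eta\leq\rho/2$. The parameter values below ($\rho=3$, $\kappa=0.8$, $\eta\in\{1.5,2.5\}$) and sampling ranges are illustrative assumptions; with $\eta=2.5>\rho/2$ the concavity check fails at a specific pair of points.

```python
import random
from typing import Tuple

Pair = Tuple[float, float]

def w_tilde(xt: Pair, eta: float, rho: float) -> float:
    """Transformed wage (0.5 * sum_s xt_s**(2/rho))**eta as defined after (A.70)."""
    return (0.5 * sum(v ** (2 / rho) for v in xt)) ** eta

def midpoint_gap(a: Pair, b: Pair, eta: float, rho: float) -> float:
    """w(mid) - (w(a) + w(b)) / 2, which is nonnegative for a concave w."""
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return w_tilde(mid, eta, rho) - (w_tilde(a, eta, rho) + w_tilde(b, eta, rho)) / 2

def concave_on_draws(eta: float, rho: float, draws: int = 5000, seed: int = 2) -> bool:
    rng = random.Random(seed)
    for _ in range(draws):
        a = (rng.uniform(0.01, 10), rng.uniform(0.01, 10))
        b = (rng.uniform(0.01, 10), rng.uniform(0.01, 10))
        if midpoint_gap(a, b, eta, rho) < -1e-9:
            return False
    return True

# Change of variables: kappa * (x / alpha)**rho == x**rho / p with p = alpha**rho / kappa.
kappa, rho, alpha, x = 0.8, 3.0, 1.7, 1.3
print(abs(kappa * (x / alpha) ** rho - x ** rho / (alpha ** rho / kappa)) < 1e-12)  # True
print(concave_on_draws(eta=1.5, rho=3.0))                            # True
print(midpoint_gap((1.0, 1.0), (4.0, 4.0), eta=2.5, rho=3.0) < 0)   # True: not concave
```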

To complete the proof, we show there exists a distribution of firm projects such that the wage equation (36) is an equilibrium wage function. Since the wage bill is continuously differentiable, $z=h'(X)$. We can use this expression to uniquely pin down a distribution of firm project values that rationalizes the wage equation. Since the wage bill is convex, the inferred distribution indeed implies positive sorting between effective worker skills and firm project values.²⁷

²⁷ For the parametric specification, $\Omega(z):=\sup_X\big(zX-h(X)\big)$ can be characterized as $\Omega(z)=\mathcal{C}_z z^{\frac{\eta}{\eta-1}}-2\zeta$, where $\mathcal{C}_z$ is a multiplicative constant independent of $z$. Together with the dual constraint, this closed-form expression for the firm value allows us to characterize project value $z$ without relying on the derivative of the firm's wage bill in the quantitative section.
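Inferring project values from $z=h'(X)$, and the closed form in footnote 27, can be sketched for an illustrative wage bill $h(X)=X^{\eta}+2\zeta$ with $\eta>1$; this functional form, including the intercept $2\zeta$, is an assumption chosen to match the footnote's expression. Under it, solving $\sup_X(zX-h(X))$ gives $\mathcal{C}_z=(\eta-1)\eta^{-\eta/(\eta-1)}$. The code maps effective skills to project values, checks that the map preserves ranks (positive sorting), and checks zero surplus $h(X)+\Omega(z)=zX$ at matched pairs.

```python
from typing import List

def wage_bill(X: float, eta: float, zeta: float) -> float:
    return X ** eta + 2 * zeta  # illustrative h(X) (assumption)

def project_value(X: float, eta: float) -> float:
    return eta * X ** (eta - 1)  # z = h'(X)

def firm_value(z: float, eta: float, zeta: float) -> float:
    """Closed form Omega(z) = C_z * z**(eta / (eta - 1)) - 2 * zeta."""
    c_z = (eta - 1) * eta ** (-eta / (eta - 1))
    return c_z * z ** (eta / (eta - 1)) - 2 * zeta

def infer_projects(skills: List[float], eta: float) -> List[float]:
    return [project_value(X, eta) for X in skills]

ETA, ZETA = 1.5, 0.1
skills = [0.5, 2.0, 1.0, 3.0]
projects = infer_projects(skills, ETA)
# Ranks of project values match ranks of skills, i.e. sorting is positive.
print(sorted(range(4), key=lambda i: skills[i]) == sorted(range(4), key=lambda i: projects[i]))  # True
print(max(abs(wage_bill(X, ETA, ZETA) + firm_value(z, ETA, ZETA) - z * X)
          for X, z in zip(skills, projects)) < 1e-9)  # True
```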

A.16 Frisch Elasticity

We next show how to derive the expression for the Frisch elasticity of labor supply within our model. Adding the optimality conditions across tasks gives $\ell_c^{\rho}+\ell_m^{\rho}=\mathcal{C}w(x)\lambda$, with constant $\mathcal{C}:=(1-\tau)\frac{\eta}{\kappa\rho}$. Using the constant effort shares implied by (40), and multiplying and dividing by $\ell^{\rho}:=(\ell_c+\ell_m)^{\rho}$, we write $\ell^{\rho}=\mathbb{C}w(x)\lambda$, where $\mathbb{C}:=\mathcal{C}\big/\big(\big(\frac{\ell_c}{\ell}\big)^{\rho}+\big(\frac{\ell_m}{\ell}\big)^{\rho}\big)$ is constant across workers. To obtain the Frisch elasticity implied by our model, we relate a worker's total effort to earnings per hour $z(x)=w(x)/(\ell_c+\ell_m)$ as $\ell^{\rho-1}=\mathbb{C}z(x)\lambda$ to obtain (41):

$$\varepsilon=\frac{\partial\log(\ell_c+\ell_m)}{\partial\log z(x)}\bigg|_{\lambda}=\frac{\partial\log(\ell_c+\ell_m)}{\partial\log(1-\tau)}\bigg|_{\lambda}=\frac{1}{\rho-1}. \tag{41}$$
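Equation (41) can be verified with a finite difference. Holding $\lambda$ fixed, $\ell^{\rho-1}=\mathbb{C}z(x)\lambda$ gives $\ell=(\mathbb{C}z\lambda)^{1/(\rho-1)}$; the sketch computes $\partial\log\ell/\partial\log z$ numerically and compares it with $1/(\rho-1)$. The values of $\mathbb{C}$, $\lambda$, $z$ and $\rho$ are illustrative assumptions.

```python
import math

def total_effort(z: float, lam: float, big_c: float, rho: float) -> float:
    """Solve l**(rho - 1) = big_c * z * lam for total effort l."""
    return (big_c * z * lam) ** (1.0 / (rho - 1.0))

def frisch_elasticity(z: float, lam: float, big_c: float, rho: float, step: float = 1e-5) -> float:
    """Central finite difference of log effort with respect to log z, lam held fixed."""
    up = math.log(total_effort(z * math.exp(step), lam, big_c, rho))
    down = math.log(total_effort(z * math.exp(-step), lam, big_c, rho))
    return (up - down) / (2.0 * step)

for rho in (2.0, 3.0, 5.0):
    print(rho, frisch_elasticity(z=2.0, lam=1.3, big_c=0.7, rho=rho), 1.0 / (rho - 1.0))
```

Since $\log\ell$ is linear in $\log z$, the finite difference recovers $1/(\rho-1)$ up to rounding error.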

A.17 Implementation

In this appendix, we describe an implementation of the optimum through an income tax system. This argument follows the analysis in Kocherlakota (2010) for an environment with multidimensional private information. To illustrate this argument, let $\{c(\alpha),x(\alpha)\}$ denote the solution to the planning problem for a finite type space $A$.

We first note that the consumption allocation depends on type $\alpha$ only through the allocation of tasks $x(\alpha)$. If $x(\alpha)=x(\hat{\alpha})$ while $c(\alpha)>c(\hat{\alpha})$, worker $\hat{\alpha}$ would pretend to be type $\alpha$ to attain more consumption for identical task inputs, and the planner allocation would not be incentive compatible. The consumption allocation can thus be written as a function of the task inputs, $c(\alpha)=\tilde{c}(x(\alpha))$, where the domain of the consumption function is finite and given by $\mathbf{X}:=\cup_{\alpha}\{x(\alpha)\}$.

The consumption function is increasing in both cognitive and manual task inputs, as the tax system has to reward workers for providing higher levels of task inputs. Suppose $x_c(\alpha)=x_c(\hat{\alpha})$ and $x_m(\alpha)>x_m(\hat{\alpha})$ while $c(\alpha)\leq c(\hat{\alpha})$. In this case, worker $\alpha$ would report $\hat{\alpha}$, attaining weakly more consumption with less manual effort, so the planner allocation would not be incentive compatible.

While the consumption allocation function $\tilde{c}$ has a finite domain, we next extend it to all $x$ such that $x_c\geq\min_{\alpha\in A}x_c(\alpha)$ and $x_m\geq\min_{\alpha\in A}x_m(\alpha)$. For all such task inputs, define

$$\hat{c}(x):=\max_{x'\leq x}\;\tilde{c}(x') \tag{A.71}$$

subject to $x'\in\mathbf{X}$, where $x'\leq x$ holds componentwise, if the maximizer exists, and $\hat{c}(x)=0$ otherwise. We also set $\hat{c}(x)=-\infty$ for all $x$ outside this domain. We define the tax system $T(x)$ over the same values of $x$ as:

$$T(x)=w(x)-\hat{c}(x). \tag{A.72}$$
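The construction of $\hat{c}$ and $T$ can be sketched for a finite set of planner task bundles. In the code below, the planner allocation `C_TILDE` and the wage schedule $w(x)=\frac{1}{2}(x_c^2+x_m^2)$ (the case $\eta=1$) are illustrative assumptions; $x'\leq x$ is taken componentwise, and $\hat{c}(x)=-\infty$ below the minimal inputs produces an infinite tax there.

```python
import math
from typing import Callable, Dict, Tuple

Task = Tuple[float, float]  # (x_c, x_m)

def extend_consumption(c_tilde: Dict[Task, float]) -> Callable[[Task], float]:
    """Return c_hat from (A.71): the best planner consumption at bundles x' <= x."""
    xc_min = min(x[0] for x in c_tilde)
    xm_min = min(x[1] for x in c_tilde)

    def c_hat(x: Task) -> float:
        if x[0] < xc_min or x[1] < xm_min:
            return -math.inf  # outside the domain: infinite tax
        feasible = [c for xp, c in c_tilde.items() if xp[0] <= x[0] and xp[1] <= x[1]]
        return max(feasible) if feasible else 0.0

    return c_hat

def tax(x: Task, wage: Callable[[Task], float], c_hat: Callable[[Task], float]) -> float:
    return wage(x) - c_hat(x)  # (A.72)

def wage(x: Task) -> float:
    return 0.5 * (x[0] ** 2 + x[1] ** 2)  # illustrative wage schedule, eta = 1

C_TILDE = {(1.0, 1.0): 2.0, (2.0, 1.0): 2.9, (2.0, 2.0): 3.8}
c_hat = extend_consumption(C_TILDE)
print(c_hat((2.5, 1.2)), tax((2.0, 2.0), wage, c_hat), tax((0.5, 3.0), wage, c_hat))
```

A bundle between planner bundles, such as $(2.5,1.2)$, inherits the consumption of the best planner bundle it weakly dominates, here $(2,1)$.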
Proposition 8.

Tax system $T$ implements the planner allocation.

Proof.

To establish the result, we show that workers of skill $\alpha$ indeed choose the allocation assigned to type $\alpha$ under the planner allocation.

The problem of worker $\alpha$ given the tax system $T$ is to choose the levels of cognitive and manual task inputs $x$ to solve:

$$\max_x\;u(c)-v\Big(\frac{x_c}{\alpha_c}\Big)-v\Big(\frac{x_m}{\alpha_m}\Big) \tag{A.73}$$

subject to the constraint $c\leq w(x)-T(x)=\hat{c}(x)$. The problem can thus be written as:

$$\max_x\;u(\hat{c}(x))-v\Big(\frac{x_c}{\alpha_c}\Big)-v\Big(\frac{x_m}{\alpha_m}\Big). \tag{A.74}$$

Choosing cognitive or manual task inputs below their minimum would be suboptimal, as the worker would pay infinite taxes. We thus restrict attention to task inputs $x\geq\big(\min_{\alpha}x_c(\alpha),\min_{\alpha}x_m(\alpha)\big)$.

We next show that workers choose a task allocation $x\in\mathbf{X}$. Suppose, by contradiction, that the worker instead chooses an allocation $\hat{x}\notin\mathbf{X}$. The worker then attains consumption $\hat{c}(\hat{x})$, which by the definition of the consumption allocation function is given by:

$$\hat{c}(\hat{x})=\max_{x'\leq\hat{x}}\;\tilde{c}(x') \tag{A.75}$$

subject to $x'\in\mathbf{X}$. Since $\hat{x}\notin\mathbf{X}$, the maximizer $x'\leq\hat{x}$ differs from $\hat{x}$, so the worker can do better by reducing work effort and choosing $x'\in\mathbf{X}$, which delivers the same consumption level: the worker consumes the same while exerting less effort. As a result, we restrict the choice to $x\in\mathbf{X}$ without loss of generality.

Worker $\alpha$ chooses the optimal bundle of task inputs $x\in\mathbf{X}$, which amounts to choosing $x$ such that for all $x'\in\mathbf{X}$:

$$u\big(\tilde{c}(x)\big)-v\Big(\frac{x_c}{\alpha_c}\Big)-v\Big(\frac{x_m}{\alpha_m}\Big)\geq u\big(\tilde{c}(x')\big)-v\Big(\frac{x'_c}{\alpha_c}\Big)-v\Big(\frac{x'_m}{\alpha_m}\Big) \tag{A.76}$$

By definition of the consumption allocation function over the domain $\mathbf{X}$, this is equivalent to:

$$u\big(c(\alpha)\big)-v\Big(\frac{x_c(\alpha)}{\alpha_c}\Big)-v\Big(\frac{x_m(\alpha)}{\alpha_m}\Big)\geq u\big(c(\alpha')\big)-v\Big(\frac{x_c(\alpha')}{\alpha_c}\Big)-v\Big(\frac{x_m(\alpha')}{\alpha_m}\Big) \tag{A.77}$$

for all $\alpha'\in A$. That worker $\alpha$ optimally chooses the allocation $(c(\alpha),x(\alpha))$, implementing the planner allocation, then follows from the incentive constraints. ∎
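The proof can be illustrated numerically. The sketch assumes linear utility $u(c)=c$, disutility $v(\ell)=\ell^2$, three hypothetical types and a hypothetical planner allocation that satisfies the incentive constraints for these preferences. Each type chooses task inputs from a grid under the consumption schedule $\hat{c}$ implied by the tax system, and the check confirms that each type picks its planner bundle.

```python
import math
from itertools import product
from typing import Dict, Tuple

Pair = Tuple[float, float]

# Hypothetical planner allocation: type alpha -> (task bundle x(alpha), consumption c(alpha)).
PLAN: Dict[Pair, Tuple[Pair, float]] = {
    (1.0, 1.0): ((1.0, 1.0), 2.0),
    (2.0, 1.0): ((2.0, 1.0), 2.9),
    (2.0, 2.0): ((2.0, 2.0), 3.8),
}
C_TILDE = {x: c for x, c in PLAN.values()}
XC_MIN = min(x[0] for x in C_TILDE)
XM_MIN = min(x[1] for x in C_TILDE)

def c_hat(x: Pair) -> float:
    """Consumption implied by the tax system, c_hat = w - T, from (A.71)-(A.72)."""
    if x[0] < XC_MIN or x[1] < XM_MIN:
        return -math.inf
    feasible = [c for xp, c in C_TILDE.items() if xp[0] <= x[0] and xp[1] <= x[1]]
    return max(feasible) if feasible else 0.0

def utility(x: Pair, alpha: Pair) -> float:
    # Objective (A.74) with u(c) = c and v(l) = l**2 (assumptions).
    return c_hat(x) - (x[0] / alpha[0]) ** 2 - (x[1] / alpha[1]) ** 2

GRID = [0.5 * k for k in range(1, 7)]  # 0.5, 1.0, ..., 3.0

def best_response(alpha: Pair) -> Pair:
    return max(product(GRID, GRID), key=lambda x: utility(x, alpha))

for alpha, (x_plan, _) in PLAN.items():
    print(alpha, best_response(alpha), x_plan)  # chosen bundle equals planner bundle
```

Grid points outside $\mathbf{X}$ deliver the consumption of a dominated planner bundle at higher effort, so they are never chosen, mirroring the argument after (A.75).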