A Certainty Equivalent Merton Problem
Abstract
The Merton problem is the well-known stochastic control problem of choosing consumption over time, as well as an investment mix, to maximize expected constant relative risk aversion (CRRA) utility of consumption. Merton formulated the problem and provided an analytical solution in 1970; since then a number of extensions of the original formulation have been solved. In this note we identify a certainty equivalent problem, i.e., a deterministic optimal control problem with the same optimal value function and optimal policy, for the base Merton problem, as well as a number of extensions. When time is discretized, the certainty equivalent problem becomes a second-order cone program (SOCP), readily formulated and solved using domain specific languages for convex optimization. This makes it a good starting point for model predictive control, a policy that can handle extensions that are either too cumbersome or impossible to handle exactly using standard dynamic programming methods.
1 Introduction
We revisit Merton’s seminal 1970 formulation (and solution) of the consumption and investment decisions of an individual investor. We present a formulation of Merton’s problem as a deterministic convex optimal control problem, and in particular, a second-order cone program (SOCP) when time is discretized. Even though the Merton problem was first solved more than 50 years ago, its reformulation as a deterministic convex optimization problem provides fresh insight into the solution of the stochastic problem that may be useful for formulating other multiperiod investment problems as convex optimization problems.
We also see two practical advantages to the certainty equivalent formulation. First, for extensions of the Merton problem for which a solution is known, working out the optimal policy can be complex and error prone. To handle these extensions with the certainty equivalent form, we simply add the appropriate terms to the objective or constraints, to obtain the optimal policy. The problem specification is straightforward and transparent, especially when expressed in a domain specific language (DSL) for convex optimization, such as cvxpy [DB16].
The second and perhaps more significant advantage is that the certainty equivalent problem can be used as a starting point for further extensions of the Merton problem, for which no closed-form solutions are known. In this case, the certainty equivalence property is lost, and solving the deterministic problem no longer solves the corresponding stochastic problem exactly. We can, however, still use model predictive control (MPC), a method that involves online convex optimization, to develop a policy that handles the extension. MPC policies are simple, easy to implement, fully interpretable, and have excellent (if not always optimal) practical performance.
1.1 Previous work
Merton’s problem.
Merton’s consumption–investment problem dates back to his original 1970 paper [Mer70]. Many extensions to the basic Merton problem exist, some of which were covered in Merton’s original paper. (These include deterministic income and general HARA utility.) Most proposed extensions do not have a closed-form solution, but some that do include uncertain mortality, life insurance, and annuities, first addressed by [Ric75]. Some extensions for the specific case of quadratic utility are handled in [BC10]. We note that many of these extensions individually lead to complicated solutions, and deriving the optimal policy when several extensions are combined may be very inconvenient for a practical implementation.
Certainty equivalence.
Some stochastic control problems have a certainty equivalent formulation, i.e., a deterministic optimal control problem with the same optimal policy. The most famous example is the linear quadratic regulator (LQR) problem, in which the dynamics are affine, driven by additive noise, and the stage costs are convex quadratic [BB18], [Ber17, §3.1], [KS72, §3]. In this case, the certainty equivalent problem is obtained by simply ignoring the stochastic noise term. Many extensions to linear quadratic control also have a certainty equivalent reformulation. Examples include the linear quadratic Gaussian problem, in which the state is imperfectly observed [KS72, §5], and the linear exponential quadratic regulator (LEQR) problem, which uses a risk-sensitive cost function [Whi90]. Our certainty equivalent formulation is similar to LEQR in that the uncertain quantity is adversarial [Whi90, §10.2]. (For the Merton problem, the uncertain quantity is the investment returns.)
Model predictive control.
In model predictive control, unknown values of future parameters are replaced with estimates or forecasts over a planning horizon extending from the current time to some time in the future, resulting in a deterministic optimal control problem. This problem is solved, with the result interpretable as a plan of action over the planning horizon. The MPC policy simply uses the current or first value in the plan of action. This planning is repeated when updated forecasts are available, using the updated forecasts and current state. When applied in the context of stochastic control, MPC policies are not optimal in general, but often exhibit excellent practical performance, and are widely used in several application areas. MPC is discussed in detail in [BBM17, KH06]. In [Boy+14], the authors use a computational bound to show that MPC is nearly optimal for some stochastic control problems in finance.
As discussed above, ignoring uncertainty is in fact optimal for linear quadratic control, and MPC leads to an optimal policy when applied to LQR. In this sense, MPC can be interpreted as applying certainty-equivalence beyond where it is theoretically justified in order to obtain a good heuristic control policy [Ber17, §4.3]. For the Merton problem, we also propose to use a certainty equivalent problem as the basis of an MPC control policy, even when certain extensions to the base problem ruin exact certainty equivalence.
While MPC has been used in practical applications for decades, recent advances make it very attractive, and easy, to develop and deploy. First, DSLs for convex optimization allow the control policy to be expressed in a few lines of very simple and clear code that expresses the dynamics, objective, and constraints, which makes it easier to develop, debug, and maintain (for example by adding or updating a constraint). Code generation systems such as cvxgen [MB12] can be used to generate low-level code that solves the specified problem, suitable for use in high speed embedded applications [WB09]. In the context of the present paper, this means that the MPC policy we propose in §6 can be very conveniently implemented.
Multi-period portfolio optimization.
It is instructive to compare our certainty equivalent problem to popular formulations of multi-period portfolio allocation (see [Boy+17] and references therein). There are two features present in our certainty equivalent problem that we do not see in practical multiperiod portfolio construction problems in the literature:

1. The risk term (which is quadratic in the dollar-valued asset allocation vector $H_t$) is normalized by the total wealth $W_t$, which is also a decision variable. This risk term is jointly convex in $H_t$ and $W_t$ (and is in fact SOCP representable). With this normalization, risk preferences are consistent even as the wealth changes over the investment horizon.

2. The risk term is included as a penalty in the dynamics, i.e., by taking more risk now, one should expect to have lower wealth in the future. This contrasts with the tradition of penalizing risk in the objective function.

We believe these to be valuable improvements to standard multi-period portfolio construction formulations, especially in cases when the control or optimization is over a very long time period.
1.2 Outline
In §2, we give the base Merton problem and review its solution, for future reference. In §3, we give a certainty equivalent problem and prove equivalence. In §4, we discuss several extensions to the Merton problem, and show how each one changes the certainty equivalent formulation. In §5, we describe extensions for which exact equivalence is lost. In §6, we discuss how to use the certainty equivalent problem for model predictive control, and in §7, we give the discretized problem.
2 Merton problem
In this section we discuss the Merton problem and its solution. To keep the proofs concise, we consider the most basic form of this problem; extensions are considered in §4. Our formulation is in continuous time and relies on stochastic calculus. However, to maintain both brevity and accessibility, we are cavalier about the technical details, with the assumption that a sophisticated reader can fill in the gaps, or consult other references.
Dynamics.
An investor must choose how to invest and consume over a lifetime of $T$ years. The investor has wealth $W_t$ at time $t$, and consumes wealth at rate $c_t$, for $0 \le t \le T$, with the remaining wealth invested in a portfolio with mean rate of return $\mu_t$ and volatility $\sigma_t$. The wealth dynamics are a geometric random walk,

$dW_t = (\mu_t W_t - c_t)\,dt + \sigma_t W_t\,dB_t,$

where $B_t$ is a Brownian motion. The initial condition is $W_0 = w^{\mathrm{init}} > 0$.
Investment portfolio.
The portfolio consists of $n$ assets, with an investment mix given by the fractional allocation $h_t \in \mathbf{R}^n$, with $\mathbf{1}^T h_t = 1$ (where $\mathbf{1}$ is the vector with all entries one). Thus we invest $(h_t)_i W_t$ dollars in asset $i$, with a negative value denoting a short position. The portfolio return rate and volatility are given by

$\mu_t = \mu^T h_t, \qquad \sigma_t^2 = h_t^T \Sigma h_t,$

where $\mu \in \mathbf{R}^n$ is the mean of the return process, and $\Sigma \in \mathbf{R}^{n \times n}$ is the symmetric positive definite covariance. (Note that we use the time-varying scalar $\mu_t$ to denote the portfolio return as a function of time, and the vector $\mu$ to denote the constant expected return rates of the assets.)
The investment allocation decision satisfies $\mathbf{1}^T h_t = 1$, as well as other investment constraints, which we summarize as $h_t \in \mathcal{H}$, where $\mathcal{H} \subseteq \mathbf{R}^n$ is a convex set. These could include risk limits, sector exposure limits, or concentration limits. (See [Boy+17, §4.4] for an overview of convex investment constraints.) For notational convenience, we assume every $h \in \mathcal{H}$ satisfies $\mathbf{1}^T h = 1$.
With the portfolio return and volatility we obtain the wealth dynamics

$dW_t = (\mu_t W_t - c_t)\,dt + \sigma_t W_t\,dB_t.$     (1)
Utility.
The investor has lifetime consumption utility $\int_0^T U(c_t)\,dt$ and bequest utility $\theta U(W_T)$, where $U(x) = x^{1-\gamma}/(1-\gamma)$ is the CRRA utility. The risk aversion parameter satisfies $\gamma > 0$ and $\gamma \neq 1$. The investor’s total expected utility is

$\mathbf{E}\left[\int_0^T U(c_t)\,dt + \theta U(W_T)\right].$     (2)

The parameter $\theta > 0$ trades off consumption and bequest utility.
Stochastic control problem.
At each time $t$, the investor chooses the consumption $c_t$ and the investment allocation $h_t$. A policy maps the time and the current wealth to the consumption and the allocation, which we write as

$(c_t, h_t) = \phi(t, W_t),$     (3)

where $\phi : [0, T] \times \mathbf{R}_{++} \to \mathbf{R}_{++} \times \mathcal{H}$. (Here $\mathbf{R}_{++}$ denotes the set of positive real numbers.) The Merton problem is to choose a policy $\phi$, i.e., $c_t$ and $h_t$, to maximize (2).
2.1 Solution via dynamic programming
We review here the solution of the Merton problem via dynamic programming, for completeness and also for future reference.
Value function.
The value function $V(t, w)$, for $0 \le t \le T$, is defined as

$V(t, w) = \mathbf{E}\left[\int_t^T U(c_\tau)\,d\tau + \theta U(W_T)\right],$

with $c_\tau$ and $h_\tau$ following an optimal policy for $t \le \tau \le T$, and initial condition $W_t = w$. We define $V(T, w) = \theta U(w)$ for $w > 0$.
If the value function is sufficiently smooth, it satisfies the Hamilton-Jacobi-Bellman PDE

$V_t(t, w) + \sup_{c > 0,\; h \in \mathcal{H}} \left( U(c) + (\mu^T h\, w - c)\, V_w(t, w) + \frac{1}{2} w^2\, h^T \Sigma h\, V_{ww}(t, w) \right) = 0$     (4)

for $0 \le t < T$. Conversely, any function satisfying (4) and the terminal condition $V(T, w) = \theta U(w)$ is the value function. Here $V_t$ denotes the partial derivative of $V$ with respect to time, and $V_w$ and $V_{ww}$ denote the first and second partial derivatives with respect to the wealth.
It is well known that the value function for the Merton problem is

$V(t, w) = f(t)^\gamma \frac{w^{1-\gamma}}{1-\gamma},$     (5)

where $f$ is a positive function of time. To obtain $f$, we first solve a Markowitz portfolio allocation problem,

maximize   $\mu^T h - \frac{\gamma}{2} h^T \Sigma h$     (6)
subject to   $h \in \mathcal{H},$

with variable $h \in \mathbf{R}^n$. (Since $\gamma > 0$, the second term is a concave risk adjustment.) We let $\mu^{\mathrm{ce}}$ denote the optimal value, and we denote the solution as $h^\star$. We then have, for $0 \le t \le T$,

$f(t) = \theta^{1/\gamma} e^{a(T-t)} + \frac{e^{a(T-t)} - 1}{a},$     (7)

where $a = (1-\gamma)\mu^{\mathrm{ce}}/\gamma$.
Optimal policy.
The optimal policy can be expressed in terms of the value function as

$\phi(t, w) = \operatorname*{argmax}_{c > 0,\; h \in \mathcal{H}} \left( U(c) + (\mu^T h\, w - c)\, V_w(t, w) + \frac{1}{2} w^2\, h^T \Sigma h\, V_{ww}(t, w) \right).$

With the value function (5), we obtain the following optimal policy. The consumption has the simple form

$c_t = W_t / f(t),$

and the optimal investment mix is constant over time,

$h_t = h^\star.$

(In extensions of the Merton problem, described below, the optimal investment mix is not constant over time.)
Proof of optimality.
Here we show that the function (5) satisfies the Hamilton-Jacobi-Bellman PDE. To do this, first we substitute $V_t = \gamma f^{\gamma-1} f' \frac{w^{1-\gamma}}{1-\gamma}$, $V_w = f^\gamma w^{-\gamma}$, and $V_{ww} = -\gamma f^\gamma w^{-\gamma-1}$ into (4) to obtain

$\gamma f^{\gamma-1} f' \frac{w^{1-\gamma}}{1-\gamma} + \sup_{c > 0,\; h \in \mathcal{H}} \left( U(c) + (\mu^T h\, w - c) f^\gamma w^{-\gamma} - \frac{\gamma}{2} w^2\, h^T \Sigma h\, f^\gamma w^{-\gamma-1} \right) = 0.$

By pulling out $w^{1-\gamma} f^\gamma$ from the last two terms and simplifying, we obtain

$\gamma f^{\gamma-1} f' \frac{w^{1-\gamma}}{1-\gamma} + \sup_{c > 0} \left( U(c) - c f^\gamma w^{-\gamma} \right) + w^{1-\gamma} f^\gamma \sup_{h \in \mathcal{H}} \left( \mu^T h - \frac{\gamma}{2} h^T \Sigma h \right) = 0.$     (8)

The maximizing $h$ is the solution to problem (6). The quantity in the inner parentheses of (8) is the optimal value $\mu^{\mathrm{ce}}$ of this problem, which can be interpreted as the certainty equivalent return. We now have

$\gamma f^{\gamma-1} f' \frac{w^{1-\gamma}}{1-\gamma} + \sup_{c > 0} \left( U(c) - c f^\gamma w^{-\gamma} \right) + w^{1-\gamma} f^\gamma \mu^{\mathrm{ce}} = 0.$

The supremum over $c$ is obtained for $c = w/f(t)$. Substituting in this value and simplifying, we obtain

$f'(t) = -1 - a f(t),$

where $a = (1-\gamma)\mu^{\mathrm{ce}}/\gamma$. It can be verified that the definition of $f$ in (7) is indeed a solution to this differential equation with terminal condition $f(T) = \theta^{1/\gamma}$.
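As a numerical check of this last step, the function $f$ in (7) can be evaluated directly and tested against the ODE $f'(t) = -1 - a f(t)$ and the terminal condition. The following is a small pure-Python sketch with illustrative parameter values (not taken from the text):

```python
import math

def f(t, T=30.0, gamma=2.0, theta=1.0, mu_ce=0.05):
    # f(t) from (7): f(t) = theta^(1/gamma) e^{a(T-t)} + (e^{a(T-t)} - 1)/a,
    # with a = (1 - gamma) * mu_ce / gamma.
    a = (1 - gamma) * mu_ce / gamma
    e = math.exp(a * (T - t))
    return theta ** (1 / gamma) * e + (e - 1) / a

# Terminal condition: f(T) = theta^(1/gamma) = 1 for theta = 1.
assert abs(f(30.0) - 1.0) < 1e-9

# Central finite-difference check of the ODE f'(t) = -1 - a f(t).
a = (1 - 2.0) * 0.05 / 2.0
eps = 1e-6
fd = (f(10.0 + eps) - f(10.0 - eps)) / (2 * eps)
assert abs(fd - (-1 - a * f(10.0))) < 1e-6

# Optimal consumption rate at t = 10 for wealth w = 100: c = w / f(t).
c = 100.0 / f(10.0)
```

The same check can be repeated for any $\gamma > 0$, $\gamma \neq 1$, and any value of the certainty equivalent return.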
3 Certainty equivalent problem
In this section we present a deterministic convex optimal control problem that is equivalent to the Merton problem in the sense that it has the same value function and same optimal policy.
This certainty equivalent problem is

maximize   $\int_0^T U(c_t)\, dt + \theta U(W_T)$     (9)
subject to   $\dot W_t \le \mu^T H_t - c_t - \frac{\gamma}{2} \frac{H_t^T \Sigma H_t}{W_t}$
             $H_t \in W_t \mathcal{H}, \qquad W_0 = w^{\mathrm{init}}.$

The variables are the consumption $c_t$, wealth $W_t$, and $H_t \in \mathbf{R}^n$, which is the dollar-valued allocation of wealth to each asset. (In the notation of §2, we have $H_t = W_t h_t$, $\mu^T H_t = \mu_t W_t$, and $H_t^T \Sigma H_t = \sigma_t^2 W_t^2$.) Note that the constraint $H_t \in W_t \mathcal{H}$ implies $\mathbf{1}^T H_t = W_t$, i.e., the total wealth is the sum of the dollar-valued asset allocations.
The objective is the lifetime utility, but without expectation since this problem is deterministic. The first constraint resembles the dynamics of the stochastic process (1), and we call this the dynamics constraint. We will see that for any solution to (9), this inequality constraint holds with equality, in which case the dynamics constraint becomes a (deterministic) ODE.
Interpretation.
The problem can be interpreted in the following way. We plan for a single outcome of the stochastic process (1). In particular, the dynamics constraint restricts the growth rate $\dot W_t$ of the wealth to be no greater than $\mu^T H_t - c_t$ (the mean growth rate in the stochastic process (1)), but reduced by the additional term $-\frac{\gamma}{2} H_t^T \Sigma H_t / W_t$. Because $\gamma > 0$, this term is negative. With the change of variables $H_t = W_t h_t$, we have

$-\frac{\gamma}{2} \frac{H_t^T \Sigma H_t}{W_t} = -\frac{\gamma}{2} W_t\, h_t^T \Sigma h_t = -\frac{\gamma}{2} W_t \sigma_t^2,$

i.e., this adjustment term is proportional to the variance $\sigma_t^2$ of the portfolio growth rate with investment allocation $h_t$. In other words, we are pessimistically planning for bad investment returns, with the degree of pessimism depending on the risk aversion parameter $\gamma$ and the risk of our portfolio.

In fact, in problem (9), we plan for the returns

$\mu^{\mathrm{plan}}_t = \mu - \frac{\gamma}{2} \frac{\Sigma H_t}{W_t}.$

The coefficient in front of $\Sigma H_t / W_t$ is negative, and the entries of $\Sigma$ and $H_t$ are typically positive. The vector $\Sigma H_t$ can be interpreted as the risk allocation to the individual assets in the portfolio, since

$\sum_{i=1}^n (H_t)_i (\Sigma H_t)_i = H_t^T \Sigma H_t.$

In other words, the planned asset returns are the mean returns, reduced in proportion to the marginal contribution of each asset to the portfolio variance. This is related to the concept of risk parity [BST16].
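The per-asset decomposition above can be sanity-checked numerically: the risk-adjusted drift $\mu^T H - \frac{\gamma}{2} H^T \Sigma H / W$ equals the planned returns applied to the dollar holdings. A small pure-Python sketch with toy data (not from the text):

```python
gamma = 2.0
W = 100.0
mu = [0.06, 0.04, 0.02]
Sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.02, 0.00],
         [0.00, 0.00, 0.01]]
H = [50.0, 30.0, 20.0]  # dollar allocations, sum to W

# Sigma H, the per-asset risk allocation vector.
Sigma_H = [sum(Sigma[i][j] * H[j] for j in range(3)) for i in range(3)]
assert abs(sum(H[i] * Sigma_H[i] for i in range(3))
           - sum(H[i] * Sigma[i][j] * H[j] for i in range(3) for j in range(3))) < 1e-12

# Risk-adjusted drift from the dynamics constraint of (9) (without -c).
drift = (sum(mu[i] * H[i] for i in range(3))
         - (gamma / 2) * sum(H[i] * Sigma_H[i] for i in range(3)) / W)

# Planned per-asset returns: mu - (gamma/2) Sigma H / W.
mu_plan = [mu[i] - (gamma / 2) * Sigma_H[i] / W for i in range(3)]
assert abs(drift - sum(mu_plan[i] * H[i] for i in range(3))) < 1e-9
```

With these toy numbers the mean drift $\mu^T H = 4.6$ is reduced by the risk penalty $1.52$ to a planned drift of $3.08$.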
Convexity.
Convexity of (9) follows from the fact that the risk penalty term $H_t^T \Sigma H_t / W_t$ is a quadratic-over-linear function, which is jointly convex in $H_t$ and $W_t$ [BV04, §3.1.5]. Also, the set

$\{ (H, W) \mid W > 0,\; H/W \in \mathcal{H} \}$

is the perspective of $\mathcal{H}$, which is convex when $\mathcal{H}$ is [BV04, §2.3.3]. In fact, in most practical portfolio construction problems, $\mathcal{H}$ can be described by a collection of linear and quadratic constraints [Boy+17, §4.4]. In this case, when problem (9) is discretized, it becomes an SOCP, which we describe in §7.
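The joint convexity of the quadratic-over-linear penalty can be illustrated with a midpoint check on the scalar case $f(x, y) = x^2/y$, $y > 0$ (an illustration of the cited fact, not a proof):

```python
def qol(x, y):
    # Scalar quadratic-over-linear function x^2 / y, defined for y > 0.
    return x * x / y

# Midpoint convexity: f((p + q)/2) <= (f(p) + f(q))/2 for all sample pairs.
pts = [(3.0, 1.0), (-2.0, 5.0), (10.0, 2.0), (0.5, 0.1)]
for (x1, y1) in pts:
    for (x2, y2) in pts:
        mid = qol((x1 + x2) / 2, (y1 + y2) / 2)
        assert mid <= (qol(x1, y1) + qol(x2, y2)) / 2 + 1e-12
```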
Equivalence to Merton problem.
The Merton problem and problem (9) are equivalent in the sense that they have the same value function and optimal policy.
To see this, we first consider a modified version of (9) in which we convert the dynamics to an equality constraint using a slack variable $s_t \ge 0$:

$\dot W_t = \mu^T H_t - c_t - \frac{\gamma}{2} \frac{H_t^T \Sigma H_t}{W_t} - s_t.$

The new control input $s_t$ can be interpreted as the rate at which we discard wealth. (We will see that at optimality $s_t = 0$.) For this modified problem, the Hamilton-Jacobi-Bellman equation is

$V_t(t, w) + \sup_{c > 0,\; s \ge 0,\; H \in w\mathcal{H}} \left( U(c) + \left( \mu^T H - c - \frac{\gamma}{2} \frac{H^T \Sigma H}{w} - s \right) V_w(t, w) \right) = 0.$

First note that with our value function candidate (5), we have $V_w > 0$, and therefore $s = 0$, as expected. Now, by using the change of variables $H = w h$ and plugging in our value function candidate, this equation becomes (8). From this point on, the proof that this candidate value function satisfies the Hamilton-Jacobi-Bellman equation proceeds exactly as for the (stochastic) Merton problem.
4 Exact extensions
Here we consider several extensions to the Merton problem, all of which are known in the literature and have closed-form solutions. For each one, we describe how to modify problem (9) to maintain the certainty-equivalence property.
Time-varying parameters.
The Merton problem can be solved when $\mu$, $\Sigma$, and $\mathcal{H}$ change over time. To handle this in the certainty equivalent problem, we simply replace these parameters by $\mu_t$, $\Sigma_t$, and $\mathcal{H}_t$. (Here $\mu_t$ denotes the time-varying vector of asset expected returns, a notation clash with our previous use of $\mu_t$ as the scalar portfolio expected return.) Similarly, if we discount the consumption utility of the Merton problem:

$\mathbf{E}\left[ \int_0^T \beta_t\, U(c_t)\, dt + \theta U(W_T) \right],$

where $\beta_t > 0$ is the discount of the consumption utility at time $t$, then the objective of the certainty equivalent problem will match (but without the expectation).
Uncertain mortality and bequest.
Here the terminal time $T$ is random with probability density $p$ and survival function

$S(t) = \mathbf{Prob}(T \ge t) = \int_t^\infty p(\tau)\, d\tau.$

In this case, the investor’s utility is

$\mathbf{E}\left[ \int_0^T U(c_t)\, dt + \theta U(W_T) \right].$

Here the expectation is taken over $T$ as well as the paths of the stochastic process (1).

With this modification, the objective of the certainty equivalent problem changes to

$\int_0^\infty \left( S(t)\, U(c_t) + \theta\, p(t)\, U(W_t) \right) dt.$

We weight the consumption utility by the probability the investor is still alive, i.e., we treat the survival function as a discount factor. We also get utility for the bequest continuously over the interval $[0, \infty)$, weighted by the density function $p$.
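For a concrete special case (not one treated in the text), take a constant force of mortality $\lambda$, so that $S(t) = e^{-\lambda t}$ and $p(t) = \lambda e^{-\lambda t}$; the consumption and bequest weights can then be tabulated directly:

```python
import math

lam = 0.02  # constant force of mortality (hypothetical, illustrative value)

def S(t):
    # Survival function: probability of being alive at time t.
    return math.exp(-lam * t)

def p(t):
    # Mortality density: p(t) = lam * S(t) for constant hazard.
    return lam * S(t)

# p must integrate to 1 over [0, infinity); check by Riemann sum on [0, 1000].
h, K = 0.01, 100000
total = sum(p(k * h) * h for k in range(K))
assert abs(total - 1.0) < 1e-2

# Consistency: p(t) = -dS/dt, checked by central finite difference.
eps = 1e-6
assert abs(p(10.0) - (S(10.0 - eps) - S(10.0 + eps)) / (2 * eps)) < 1e-8
```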
Annuities and life insurance.
This extension is due to [Ric75]. Continuing with the previous extension, we allow the investor to purchase life insurance. The premium is $\pi_t$, which the investor can choose, and the payout of the plan is $\pi_t P_t$, where $P_t > 0$ is the payout-to-premium ratio at time $t$. When $\pi_t < 0$, we interpret this as an annuity. In particular, at time $t$, the investor has $-\pi_t P_t$ in the annuity account, which is lost on death, in return for an additional return of $-\pi_t$. The actuarially fair value of $P_t$ is $1/\lambda(t)$, where $\lambda(t) = p(t)/S(t)$ is called the force of mortality. (If $P_t > 1/\lambda(t)$, then life insurance is favorable and annuities are unfavorable; if $P_t < 1/\lambda(t)$, the reverse is true.)

With this modification, the objective of the certainty-equivalent problem changes to

$\int_0^\infty \left( S(t)\, U(c_t) + \theta\, p(t)\, U(W_t + \pi_t P_t) \right) dt,$

i.e., we add the insurance payout to the wealth in the bequest utility. The dynamics change to

$\dot W_t \le \mu^T H_t - c_t - \pi_t - \frac{\gamma}{2} \frac{H_t^T \Sigma H_t}{W_t}.$

Here we subtract the insurance premium from the growth rate of the wealth.
Income.
We can add a deterministic income stream, with income rate $y_t$ at time $t$. The stochastic dynamics are modified by the addition of $y_t$ to the drift term of the wealth process, i.e.,

$dW_t = (\mu_t W_t - c_t + y_t)\, dt + \sigma_t W_t\, dB_t.$

In this case, we also assume one of the assets, say the first, is risk free with return $r$ and volatility zero, and that

$\mathcal{H} = \{ h \in \mathbf{R}^n \mid \mathbf{1}^T h = 1,\; h_2, \ldots, h_n \ge 0 \}.$     (10)

These assumptions allow the investor to counteract the income stream by shorting the risk-free asset and investing the proceeds in a preferred portfolio of other assets. The fair value of the income stream is its net present value over $[t, T]$ at the risk-free rate:

$b_t = \int_t^T y_\tau\, e^{-r(\tau - t)}\, d\tau,$

which can be interpreted as the remaining human capital of the investor.

For this extension, the dynamics in (9) are replaced by

$\dot W_t \le \mu^T H_t - c_t + y_t - \frac{\gamma}{2} \frac{H_t^T \Sigma H_t}{W_t + b_t}.$

Note the addition of the income term and the normalization of risk by the total wealth plus the remaining human capital. In this case, the wealth need not be positive but instead satisfies $W_t + b_t > 0$. Because of this, we also replace the constraint $H_t \in W_t \mathcal{H}$ (which is not defined for $W_t \le 0$) with $H_t \in (W_t + b_t)\mathcal{H}$.
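For a constant income rate $y$ and risk-free rate $r$, the human capital integral has the closed form $b_t = y\,(1 - e^{-r(T-t)})/r$. The sketch below (illustrative values, not from the text) verifies the closed form against a Riemann sum:

```python
import math

y, r, T, t = 50.0, 0.03, 40.0, 10.0

# Closed form of b_t = integral over [t, T] of y e^{-r (tau - t)} d tau.
b_closed = y * (1 - math.exp(-r * (T - t))) / r

# Left Riemann-sum approximation of the same integral.
h = 1e-3
K = int((T - t) / h)
b_sum = sum(y * math.exp(-r * (k * h)) * h for k in range(K))

# Discretization error is O(h); a crude bound suffices here.
assert abs(b_closed - b_sum) < 0.1
```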
Epstein–Zin preferences.
One interesting feature of the certainty equivalent problem (9) is that the risk aversion parameter $\gamma$ appears separately in the objective and dynamics constraint. It is reasonable to ask whether, by modifying the consumption utility to be

$U(c) = \frac{c^{1-\delta}}{1-\delta}$

for some $\delta$ with $\delta > 0$ and $\delta \neq 1$, but keeping $\gamma$ in the dynamics constraint, problem (9) is equivalent to some variant of the Merton problem. This is indeed the case, but with the expected utility replaced by Epstein–Zin preferences, where $1/\delta$ is the elasticity of intertemporal substitution and $\gamma$ is the risk aversion. For details, see [DE92].
5 Inexact extensions
Here we discuss several extensions of problem (9) that (to our knowledge) do not exactly solve a version of the Merton problem. Some of these build on the exact extensions of §4.
Modified utility.
We can change the objective of (9) to use any increasing, concave utility function for either consumption or bequest. These utility functions need not be additive over time: For example, we can maximize the minimum consumption over the interval $[0, T]$,

$\inf_{0 \le t \le T} c_t.$

As a special case, we can add a minimum consumption constraint

$c_t \ge c^{\mathrm{min}}_t, \quad 0 \le t \le T,$

where $c^{\mathrm{min}}_t$ is the minimum allowable consumption amount as a function of age. Similarly, we can enforce a minimum bequest over some time window (say, to care for underage dependents until they come of age).
Spending limit.
We can limit consumption as a fraction of income with the constraint

$c_t \le \eta\, y_t,$

for some parameter $\eta > 0$. For example, when $\eta = 0.7$, this constraint means that we can’t consume more than 70% of our income, i.e., we must have a savings rate of 30%.

This constraint can be adjusted to account for investment income. To see this, take $d \in \mathbf{R}^n$ to be the vector of dividend yields for each asset, which is constant and known in advance. The modified constraint becomes

$c_t \le \eta \left( y_t + d^T H_t \right).$

When this constraint is tight, i.e., when we desire to consume more than $\eta$ times our income, there is added incentive to invest in assets with high dividend yield.
Minimum cash balance.
We can include a constraint that the amount invested in cash be above a certain level, i.e.,

$(H_t)_k \ge B_t,$

where $k$ is the index of the cash asset and $B_t$ is the required cash level. This is similar to an emergency fund constraint that we must keep six months’ worth of consumption in cash, which is expressed as

$(H_t)_k \ge \frac{1}{2}\, c_t.$
6 Application to model predictive control
Model predictive control is a technique for stochastic control problems that leverages a deterministic approximation of the stochastic problem. To evaluate an MPC policy, we first solve this deterministic problem to obtain a planned trajectory for the state and control input over the planning horizon. We then implement only the first control input in this plan, and the rest of the planned trajectory is discarded. To obtain future control inputs, the policy is evaluated again, which requires solving a new deterministic problem.
In the context of the Merton problem, the certainty equivalent problem is used as a basis for a simple model predictive control policy, which we denote $\phi^{\mathrm{mpc}}$. We first define this policy when $t = 0$, with initial wealth $w^{\mathrm{init}}$. We start by solving the deterministic control problem (9) to obtain the optimal trajectories $c^\star_\tau$ and $H^\star_\tau$ for $0 \le \tau \le T$. The MPC policy then takes $\phi^{\mathrm{mpc}}(0, w^{\mathrm{init}}) = (c^\star_0, H^\star_0 / w^{\mathrm{init}})$. To define the MPC policy for $t > 0$, we first form a new instance of problem (9), which is defined over the interval $[t, T]$ and has initial wealth $W_t$. Once again we solve the deterministic optimal control problem (9), to obtain optimal $c^\star_\tau$ and $H^\star_\tau$ over the interval $[t, T]$. We then take $\phi^{\mathrm{mpc}}(t, W_t) = (c^\star_t, H^\star_t / W_t)$. Evaluating the MPC policy therefore always requires solving a deterministic optimal control problem of the form (9).
MPC is a convenient way to implement the optimal policy for the basic problem or any of the extensions of §4. In those cases, the MPC policy is optimal. When MPC is applied with constraints and an objective that do not correspond to any version of Merton problem, the MPC policy is a sophisticated heuristic, and very useful in practice.
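For the base problem, the replanning loop can be sketched with the closed-form planner: at each step we “re-solve” (here via the known solution $c = W/f(t)$, rather than a numerical solver for (9)) and apply only the first action. The simulation below is a minimal pure-Python sketch on the deterministic certainty equivalent dynamics, with illustrative parameters:

```python
import math

T, K = 30.0, 300
h = T / K
gamma, theta, mu_ce = 2.0, 1.0, 0.05
a = (1 - gamma) * mu_ce / gamma

def f(t):
    # f(t) from the closed-form Merton solution (7).
    e = math.exp(a * (T - t))
    return theta ** (1 / gamma) * e + (e - 1) / a

def mpc_policy(t, W):
    # "Plan" from t to T and keep only the first action. With the
    # closed form, the plan's first consumption is simply W / f(t).
    return W / f(t)

# Roll the policy forward on the certainty equivalent dynamics,
# where the risk-adjusted drift is mu_ce * W (optimal mix h*).
W = 100.0
for k in range(K):
    t = k * h
    c = mpc_policy(t, W)
    W += h * (mu_ce * W - c)
    assert W > 0  # wealth stays positive along the rollout
```

In practice the call to `mpc_policy` would solve a discretized instance of (9) over $[t, T]$, which is what makes the same loop work for the inexact extensions of §5.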
To use MPC in practice requires discretizing problem (9), which we discuss in the next section.
7 Discretized problem
Here we show how to discretize problem (9). We do this for the basic problem only, but note that the extensions can be handled similarly.
We let $W_k$ denote the value of $W_t$ in (9) at time $t = kh$, $k = 0, \ldots, K$, where $h = T/K$ is the discretization interval. (We use the same notation, but index with the subscript $k$ to denote the discretized variable, and index with $t$ to denote the continuous variable.) We similarly define the discretized variables $c_k$ and $H_k$. Replacing the time derivative $\dot W_t$ with the forward Euler approximation $(W_{k+1} - W_k)/h$, and replacing the integral in the objective with a Riemann sum approximation, we obtain the discretized problem

maximize   $h \sum_{k=0}^{K-1} U(c_k) + \theta U(W_K)$     (11)
subject to   $W_{k+1} \le W_k + h \left( \mu^T H_k - c_k - \frac{\gamma}{2} \frac{H_k^T \Sigma H_k}{W_k} \right), \quad k = 0, \ldots, K-1$
             $H_k \in W_k \mathcal{H}, \quad k = 0, \ldots, K-1, \qquad W_0 = w^{\mathrm{init}}.$

The variables are $W_k$ for $k = 0, \ldots, K$, and $c_k$ and $H_k$ for $k = 0, \ldots, K-1$. All of the extensions (exact and inexact) discussed above can be discretized as well, but we do not give the details here.
The discretized certainty equivalent problem (11) is a (finite-dimensional) convex optimization problem, and can therefore be easily expressed in a domain-specific language for convex optimization, such as cvxpy. As an example, we give a cvxpy implementation of (11) in listing 7, for the case when $\mathcal{H}$ is given by (10).
An implementation of the discretized certainty equivalent problem (11) using cvxpy.
For most practical portfolio construction problems, $\mathcal{H}$ is SOCP representable, which means that problem (11) is an SOCP [Lob+98]. To see this, note that the power utility and the quadratic-over-linear functions are SOCP representable; see [AG03, §2.2] and [Lob+98, §2.4], respectively. The perspective of $\mathcal{H}$ can be represented using the same cones used to represent $\mathcal{H}$ [MB15, §2].
To give some idea of the speed at which current solvers can solve the discretized problem (11) (and its extensions), consider a problem with many assets and periods, and covariance matrix given as a typical factor model with 25 factors, so that the problem has more than 100000 optimization variables. With just a small modification of the code given in listing 7 to exploit the low rank plus diagonal structure of the covariance matrix, the open-source solver ECOS [DCB13] solves the problem in around two seconds, on a single thread.
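The low-rank-plus-diagonal structure mentioned above is exploited by splitting the quadratic-over-linear risk term: with $\Sigma = F F^T + D$, one has $H^T \Sigma H / W = \|F^T H\|^2 / W + H^T D H / W$, replacing one dense $n \times n$ quadratic form by a small factor term and a diagonal term. A small numerical identity check (toy data):

```python
import random

random.seed(0)
n, nf = 6, 2

# Factor model: Sigma = F F^T + D, with F (n x nf exposures) and D diagonal.
F = [[random.gauss(0, 1) for _ in range(nf)] for _ in range(n)]
D = [random.uniform(0.01, 0.1) for _ in range(n)]
H = [random.uniform(0, 10) for _ in range(n)]
W = 100.0

# Dense evaluation of H^T Sigma H / W.
Sigma = [[sum(F[i][k] * F[j][k] for k in range(nf)) + (D[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
dense = sum(H[i] * Sigma[i][j] * H[j] for i in range(n) for j in range(n)) / W

# Structured evaluation: ||F^T H||^2 / W + sum_i D_i H_i^2 / W.
FtH = [sum(F[i][k] * H[i] for i in range(n)) for k in range(nf)]
structured = (sum(v * v for v in FtH) + sum(D[i] * H[i] ** 2 for i in range(n))) / W

assert abs(dense - structured) < 1e-9
```

In an SOCP formulation, each piece becomes its own quadratic-over-linear (second-order cone) constraint of dimension 25 or $n$, rather than a single dense cone of dimension $n$.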
References
- [AG03] Farid Alizadeh and Donald Goldfarb “Second-order cone programming” In Mathematical programming 95.1, 2003, pp. 3–51
- [BB18] Shane Barratt and Stephen Boyd “Stochastic control with affine dynamics and extended quadratic costs” ArXiv preprint, 2018
- [BBM17] Francesco Borrelli, Alberto Bemporad and Manfred Morari “Predictive control for linear and hybrid systems” Cambridge University Press, 2017
- [BC10] Suleyman Basak and Georgy Chabakauri “Dynamic mean-variance asset allocation” In The Review of Financial Studies 23.8 Oxford University Press, 2010, pp. 2970–3016
- [Ber17] Dimitri P Bertsekas “Dynamic programming and optimal control” Athena Scientific, 2017
- [Boy+14] Stephen Boyd, Mark Mueller, Brendon O’Donoghue and Yang Wang “Performance bounds and suboptimal policies for multi-period investment” In Foundations and Trends in Optimization 1.1 Now Publishers, Inc., 2014, pp. 1–69
- [Boy+17] Stephen Boyd, Enzo Busseti, Steve Diamond, Ronald N Kahn, Kwangmoo Koh, Peter Nystrup and Jan Speth “Multi-period trading via convex optimization” In Foundations and Trends in Optimization 3.1 Now Publishers, Inc., 2017, pp. 1–76
- [BST16] Xi Bai, Katya Scheinberg and Reha Tutuncu “Least-squares approach to risk parity in portfolio selection” In Quantitative Finance 16.3 Taylor & Francis, 2016, pp. 357–376
- [BV04] S. Boyd and L. Vandenberghe “Convex optimization” Cambridge University Press, 2004
- [DB16] S. Diamond and S. Boyd “CVXPY: A Python-embedded modeling language for convex optimization” In Journal of Machine Learning Research 17.83, 2016, pp. 1–5
- [DCB13] Alexander Domahidi, Eric Chu and Stephen Boyd “ECOS: An SOCP solver for embedded systems” In European Control Conference, 2013, pp. 3071–3076
- [DE92] Darrell Duffie and Larry G Epstein “Stochastic differential utility” In Econometrica: Journal of the Econometric Society JSTOR, 1992, pp. 353–394
- [KH06] Wook Hyun Kwon and Soo Hee Han “Receding horizon control: Model predictive control for state models” Springer, 2006
- [KS72] Huibert Kwakernaak and Raphael Sivan “Linear optimal control systems” John Wiley & Sons, 1972
- [Lob+98] Miguel Sousa Lobo, Lieven Vandenberghe, Stephen Boyd and Hervé Lebret “Applications of second-order cone programming” In Linear algebra and its applications 284.1-3 Elsevier, 1998, pp. 193–228
- [MB12] Jacob Mattingley and Stephen Boyd “CVXGEN: A code generator for embedded convex optimization” In Optimization and Engineering 13.1 Springer, 2012, pp. 1–27
- [MB15] Nicholas Moehle and Stephen Boyd “A perspective-based convex relaxation for switched-affine optimal control” In Systems and Control Letters 86 Elsevier, 2015, pp. 34–40
- [Mer70] Robert C Merton “Optimum consumption and portfolio rules in a continuous-time model” In Stochastic Optimization Models in Finance Elsevier, 1970, pp. 621–661
- [Ric75] Scott F Richard “Optimal consumption, portfolio and life insurance rules for an uncertain lived individual in a continuous time model” In Journal of Financial Economics 2.2 Elsevier, 1975, pp. 187–203
- [WB09] Yang Wang and Stephen Boyd “Fast model predictive control using online optimization” In IEEE Transactions on control systems technology 18.2 IEEE, 2009, pp. 267–278
- [Whi90] Peter Whittle “Risk-sensitive Optimal Control” John Wiley & Sons, 1990