A Certainty Equivalent Merton Problem

Nicholas Moehle    Stephen Boyd
Abstract

The Merton problem is the well-known stochastic control problem of choosing consumption over time, as well as an investment mix, to maximize expected constant relative risk aversion (CRRA) utility of consumption. Merton formulated the problem and provided an analytical solution in 1970; since then a number of extensions of the original formulation have been solved. In this note we identify a certainty equivalent problem, i.e., a deterministic optimal control problem with the same optimal value function and optimal policy, for the base Merton problem, as well as a number of extensions. When time is discretized, the certainty equivalent problem becomes a second-order cone program (SOCP), readily formulated and solved using domain specific languages for convex optimization. This makes it a good starting point for model predictive control, a policy that can handle extensions that are either too cumbersome or impossible to handle exactly using standard dynamic programming methods.

1 Introduction

We revisit Merton’s seminal 1970 formulation (and solution) of the consumption and investment decisions of an individual investor. We present a formulation of Merton’s problem as a deterministic convex optimal control problem, and in particular, a second-order cone program (SOCP) when time is discretized. Even though the Merton problem was first solved more than 50 years ago, its reformulation as a deterministic convex optimization problem provides fresh insight into the solution of the stochastic problem that may be useful for formulating other multiperiod investment problems as convex optimization problems.

We also see two practical advantages to the certainty equivalent formulation. First, for extensions of the Merton problem for which a solution is known, working out the optimal policy can be complex and error prone. To handle these extensions with the certainty equivalent form, we simply add the appropriate terms to the objective or constraints, to obtain the optimal policy. The problem specification is straightforward and transparent, especially when expressed in a domain specific language (DSL) for convex optimization, such as cvxpy [DB16].

The second and perhaps more significant advantage is that the certainty equivalent problem can be used as a starting point for further extensions of the Merton problem, for which no closed-form solutions are known. In this case, the certainty equivalence property is lost, and solving the deterministic problem no longer solves the corresponding stochastic problem exactly. We can, however, still use model predictive control (MPC), a method that involves online convex optimization, to develop a policy that handles the extension. MPC policies are simple, easy to implement, fully interpretable, and have excellent (if not always optimal) practical performance.

1.1 Previous work

Merton’s problem.

Merton’s consumption–investment problem dates back to his original 1970 paper [Mer70]. Many extensions to the basic Merton problem exist, some of which were covered in Merton’s original paper. (These include deterministic income and general HARA utility.) Most proposed extensions do not have a closed-form solution, but some that do include uncertain mortality, life insurance, and annuities, first addressed by [Ric75]. Some extensions for the specific case of quadratic utility are handled in [BC10]. We note that many of these extensions individually lead to complicated solutions, and deriving the optimal policy when several extensions are combined may be very inconvenient for a practical implementation.

Certainty equivalence.

Rarely, stochastic control problems have a certainty equivalent formulation, i.e., a deterministic optimal control problem with the same optimal policy. The most famous example is the linear quadratic regulator (LQR) problem, in which the dynamics are affine, driven by additive noise, and the stage costs are convex quadratic [BB18], [Ber17, §3.1], [KS72, §3]. In this case, the certainty equivalent problem is obtained by simply ignoring the stochastic noise term. Many extensions to linear quadratic control also have a certainty equivalent reformulation. Examples include the linear quadratic Gaussian problem, in which the state is imperfectly observed [KS72, §5], and the linear exponential quadratic regulator (LEQR) problem, which uses a risk-sensitive cost function [Whi90]. Our certainty equivalent formulation is similar to LEQR in that the uncertain quantity is adversarial [Whi90, §10.2]. (For the Merton problem, the uncertain quantity is the investment returns.)

Model predictive control.

In model predictive control, unknown values of future parameters are replaced with estimates or forecasts over a planning horizon extending from the current time to some time in the future, resulting in a deterministic optimal control problem. This problem is solved, with the result interpretable as a plan of action over the planning horizon. The MPC policy simply uses the current or first value in the plan of action. This planning is repeated when updated forecasts are available, using the updated forecasts and current state. When applied in the context of stochastic control, MPC policies are not optimal in general, but often exhibit excellent practical performance, and are widely used in several application areas. MPC is discussed in detail in [BBM17, KH06]. In [Boy+14], the authors use a computational bound to show that MPC is nearly optimal for some stochastic control problems in finance.

As discussed above, ignoring uncertainty is in fact optimal for linear quadratic control, and MPC leads to an optimal policy when applied to LQR. In this sense, MPC can be interpreted as applying certainty-equivalence beyond where it is theoretically justified in order to obtain a good heuristic control policy [Ber17, §4.3]. For the Merton problem, we also propose to use a certainty equivalent problem as the basis of an MPC control policy, even when certain extensions to the base problem ruin exact certainty equivalence.

While MPC has been used in practical applications for decades, recent advances make it very attractive, and easy, to develop and deploy. First, DSLs for convex optimization allow the control policy to be expressed in a few lines of very simple and clear code that express the dynamics, objective, and constraints, which makes it easier to develop, debug, and maintain (for example by adding or updating a constraint). Second, code generation systems such as cvxgen [MB12] can be used to generate low-level code that solves the specified problem, which is suitable for use in high speed embedded applications [WB09]. In the context of the present paper, this means that the MPC policy we propose in §6 can be very conveniently implemented.

Multi-period portfolio optimization.

It is instructive to compare our certainty equivalent problem to popular formulations of multi-period portfolio allocation (see [Boy+17] and references therein). There are two features present in our certainty equivalent problem that we do not see in practical multiperiod portfolio construction problems in the literature:

  1. The risk term (which is quadratic in the dollar-valued asset allocation vector $x_t$) is normalized by the total wealth $w_t$, which is also a decision variable. This risk term is jointly convex in $x_t$ and $w_t$ (and is in fact SOCP representable). With this normalization, risk preferences are consistent even as the wealth $w_t$ changes over the investment horizon.

  2. The risk term is included as a penalty in the dynamics, i.e., by taking more risk now, one should expect to have lower wealth in the future. This contrasts with the tradition of penalizing risk in the objective function.

We believe these to be valuable improvements to standard multi-period portfolio construction formulations, especially in cases when the control or optimization is over a very long time period.

1.2 Outline

In §2, we give the base Merton problem and review its solution, for future reference. In §3, we give a certainty equivalent problem and prove equivalence. In §4, we discuss several extensions to the Merton problem, and show how each one changes the certainty equivalent formulation. In §5, we discuss extensions for which exact certainty equivalence no longer holds. In §6, we discuss how to use the certainty equivalent problem for model predictive control, and in §7 we give the discretized problem.

2 Merton problem

In this section we discuss the Merton problem and its solution. To keep the proofs concise, we consider the most basic form of this problem; extensions are considered in §4. Our formulation is in continuous time and relies on stochastic calculus. However, to maintain both brevity and accessibility, we are cavalier about the technical details, with the assumption that a sophisticated reader can fill in the gaps, or consult other references.

Dynamics.

An investor must choose how to invest and consume over a lifetime of $T$ years. The investor has wealth $w_t > 0$ at time $t$, and consumes wealth at rate $c_t > 0$, for $t \in [0,T]$, with the remaining wealth invested in a portfolio with mean rate of return $\mu_t$ and volatility $\sigma_t$. The wealth dynamics are a geometric random walk,

$$dw_t = (\mu_t w_t - c_t)\,dt + \sigma_t w_t\,dz_t,$$

where $z_t$ is a Brownian motion. The initial condition is $w_0 = w_{\rm init} > 0$.

Investment portfolio.

The portfolio consists of $n$ assets, with an investment mix given by the fractional allocation $\theta_t$, with $\mathbf{1}^T\theta_t = 1$ (where $\mathbf{1}$ is the vector with all entries one). Thus we invest $(w_t\theta_t)_i$ dollars in asset $i$, with a negative value denoting a short position. The portfolio return rate and volatility are given by

$$\mu_t = \mu^T\theta_t, \qquad \sigma_t = (\theta_t^T\Sigma\theta_t)^{1/2},$$

where $\mu \in \mathbf{R}^n$ is the mean of the return process, and $\Sigma$ is the symmetric positive definite covariance. (Note that we use the time-varying scalar $\mu_t$ to denote the portfolio return as a function of time, and the vector $\mu$ to denote the constant expected return rates of the $n$ assets.)

The investment allocation decision $\theta_t$ satisfies $\mathbf{1}^T\theta_t = 1$, as well as other investment constraints, which we summarize as $\theta_t \in \Theta$, where $\Theta$ is a convex set. These could include risk limits, sector exposure limits, or concentration limits. (See [Boy+17, §4.4] for an overview of convex investment constraints.) For notational convenience, we assume every $\theta_t \in \Theta$ satisfies $\mathbf{1}^T\theta_t = 1$.

With the portfolio return and volatility we obtain the wealth dynamics

$$dw_t = (\mu^T\theta_t w_t - c_t)\,dt + \left(\theta_t^T\Sigma\theta_t\right)^{1/2} w_t\,dz_t. \tag{1}$$
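To make the dynamics (1) concrete, the following is a minimal simulation sketch using the Euler-Maruyama scheme. It assumes a fixed allocation, summarized by a constant portfolio return `mu_p` and volatility `sigma_p`, and a proportional consumption rule `c = kappa * w`. The function name, the parameter values, and the consumption rule are illustrative assumptions, not part of the problem statement.

```python
import math
import random

def simulate_wealth(w_init, mu_p, sigma_p, kappa, T, steps, seed=0):
    """Euler-Maruyama simulation of dw = (mu_p*w - c) dt + sigma_p*w dz.

    mu_p, sigma_p: portfolio return and volatility for a fixed allocation.
    kappa: consumption is c = kappa*w (an illustrative rule, not the optimal one).
    """
    rng = random.Random(seed)
    h = T / steps
    w = w_init
    path = [w]
    for _ in range(steps):
        dz = rng.gauss(0.0, math.sqrt(h))  # Brownian increment over a step of length h
        c = kappa * w
        w = w + (mu_p * w - c) * h + sigma_p * w * dz
        path.append(w)
    return path

# With zero volatility the path is deterministic and grows at rate mu_p - kappa.
path = simulate_wealth(w_init=1.0, mu_p=0.06, sigma_p=0.0, kappa=0.04, T=10.0, steps=10000)
```

With `sigma_p=0.0` the simulated terminal wealth is close to $w_{\rm init}\exp((\mu_p - \kappa)T)$, which is a quick sanity check on the drift term.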

Utility.

The investor has lifetime consumption utility $\int_0^T c_t^\gamma/\gamma\,dt$ and bequest utility $w_T^\gamma/\gamma$. The risk aversion parameter $\gamma$ satisfies $\gamma < 1$ and $\gamma \neq 0$. The investor’s total expected utility is

$$U = \mathop{\bf E}\bigg(\frac{\beta}{\gamma}w_T^\gamma + \int_0^T \frac{1}{\gamma}c_t^\gamma\,dt\bigg). \tag{2}$$

The parameter $\beta > 0$ trades off consumption and bequest utility.

Stochastic control problem.

At each time $t$, the investor chooses the consumption $c_t$ and the investment allocation $\theta_t$. A policy maps the time $t$ and the current wealth $w_t$ to the consumption $c_t$ and the allocation $\theta_t$, which we write as

$$(c_t, \theta_t) = \pi_t(w_t), \tag{3}$$

where for each $t \in [0,T]$, $\pi_t : \mathbf{R}_{++} \to \mathbf{R}_{++} \times \Theta$. (Here $\mathbf{R}_{++}$ denotes the set of positive real numbers.) The Merton problem is to choose a policy $\pi_t$, $t \in [0,T]$, to maximize $U$.

2.1 Solution via dynamic programming

We review here the solution of the Merton problem via dynamic programming, for completeness and also for future reference.

Value function.

The value function $V_t : \mathbf{R}_{++} \to \mathbf{R}$, for $t \in [0,T]$, is defined as

$$V_t(w) = \mathop{\bf E}\bigg(\frac{\beta}{\gamma}w_T^\gamma + \int_t^T \frac{1}{\gamma}c_\tau^\gamma\,d\tau\bigg),$$

with $c_\tau$ and $\theta_\tau$ following an optimal policy for $\tau \in [t,T]$, and initial condition $w_t = w$. We define $V_T(w) = (\beta/\gamma)w^\gamma$ for $w > 0$.

If the value function is sufficiently smooth, it satisfies the Hamilton-Jacobi-Bellman PDE

$$-\dot{V}_t(w) = \sup_{c,\,\theta\in\Theta}\bigg(\frac{1}{\gamma}c^\gamma + V_t'(w)(\mu^T\theta w - c) + \frac{1}{2}V_t''(w)(\theta^T\Sigma\theta)w^2\bigg) \tag{4}$$

for $w > 0$. Conversely, any function satisfying (4) and the terminal condition $V_T(w) = (\beta/\gamma)w^\gamma$ is the value function. Here $\dot{V}_t$ denotes the partial derivative of $V$ with respect to time, and $V_t'$ and $V_t''$ denote the first and second partial derivatives with respect to the wealth.

It is well known that the value function for the Merton problem is

$$V_t(w) = a_t\frac{w^\gamma}{\gamma}, \tag{5}$$

where $a_t$ is a function of time. To obtain $a_t$, we first solve a Markowitz portfolio allocation problem,

$$
\begin{array}{ll}
\mbox{maximize} & \displaystyle \mu^T\theta + \frac{\gamma-1}{2}\theta^T\Sigma\theta \\
\mbox{subject to} & \theta \in \Theta,
\end{array}
\tag{6}
$$

with variable $\theta$. (Since $\gamma - 1 < 0$, the second term is a concave risk adjustment.) We let $r_{\rm ce}$ denote the optimal value, and we denote the solution as $\theta_{\rm ce}$. We then have, for $t \in [0,T]$,

$$a_t = \bigg(\frac{1-\gamma}{\gamma r_{\rm ce}}\bigg(C\exp\Big(\frac{\gamma r_{\rm ce}}{1-\gamma}(T-t)\Big) - 1\bigg)\bigg)^{1-\gamma}, \tag{7}$$

where $C = 1 + \gamma r_{\rm ce}\beta^{1/(1-\gamma)}/(1-\gamma)$.
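As an illustration of problem (6), the sketch below works out the simplest case in closed form: one risky asset plus a risk-free asset, with only the budget constraint $\mathbf{1}^T\theta = 1$. This two-asset setup is an assumption made for the example (its covariance is singular, as in the income extension of §4), and the function names and numbers are hypothetical. Setting the derivative of the objective to zero gives the risky fraction $(\mu_{\rm risky} - r_{\rm free})/((1-\gamma)\sigma^2)$.

```python
def merton_two_asset(mu_risky, sigma_risky, r_free, gamma):
    """Solve problem (6) for one risky and one risk-free asset (budget constraint only)."""
    assert gamma < 1 and gamma != 0
    excess = mu_risky - r_free
    theta_risky = excess / ((1.0 - gamma) * sigma_risky ** 2)
    theta = (theta_risky, 1.0 - theta_risky)  # (risky, risk-free) fractions
    r_ce = r_free + excess ** 2 / (2.0 * (1.0 - gamma) * sigma_risky ** 2)
    return theta, r_ce

def objective(theta_risky, mu_risky, sigma_risky, r_free, gamma):
    """Objective of (6) as a function of the risky fraction alone."""
    mean = mu_risky * theta_risky + r_free * (1.0 - theta_risky)
    return mean + 0.5 * (gamma - 1.0) * sigma_risky ** 2 * theta_risky ** 2

# Illustrative numbers: 7% mean return, 20% volatility, 2% risk-free rate, gamma = -1.
theta, r_ce = merton_two_asset(0.07, 0.2, 0.02, -1.0)
```

For these numbers the risky fraction is 0.625 and the certainty equivalent return is 0.035625, and perturbing the risky fraction in either direction lowers the objective.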

Optimal policy.

The optimal policy can be expressed in terms of the value function as

$$\pi_t^\star(w) = (c_t, \theta_t) = \mathop{\rm argmax}_{c,\,\theta\in\Theta}\bigg(\frac{1}{\gamma}c^\gamma + V_t'(w)(\mu^T\theta w - c) + \frac{1}{2}V_t''(w)(\theta^T\Sigma\theta)w^2\bigg).$$

With the value function (5), we obtain the following optimal policy. The consumption has the simple form

$$c_t = a_t^{1/(\gamma-1)}w_t,$$

and the optimal investment mix is constant over time,

$$\theta_t = \theta_{\rm ce}.$$

(In extensions of the Merton problem, described below, the optimal investment mix is not constant over time.)

Proof of optimality.

Here we show that the function (5) satisfies the Hamilton-Jacobi-Bellman PDE. To do this, first we substitute $\dot{V}_t$, $V_t'$, and $V_t''$ into (4) to obtain

$$-\overbrace{\dot{a}_t\frac{w^\gamma}{\gamma}}^{\dot{V}_t(w)} = \sup_{c,\,\theta\in\Theta}\bigg(\frac{1}{\gamma}c^\gamma + \overbrace{a_t w^{\gamma-1}}^{V_t'(w)}(\mu^T\theta w - c) + \frac{1}{2}\overbrace{a_t(\gamma-1)w^{\gamma-2}}^{V_t''(w)}(\theta^T\Sigma\theta)w^2\bigg).$$

By pulling out $a_t w^{\gamma-1}$ from the last two terms and simplifying, we obtain

$$-\dot{a}_t\frac{w^\gamma}{\gamma} = \sup_{c,\,\theta\in\Theta}\left(\frac{1}{\gamma}c^\gamma + a_t w^{\gamma-1}\left(\Big(\mu^T\theta + \frac{\gamma-1}{2}\theta^T\Sigma\theta\Big)w - c\right)\right). \tag{8}$$

The maximizing $\theta$ is the solution $\theta_{\rm ce}$ to problem (6). At this maximizer, the quantity in the inner parentheses of (8) is the optimal value $r_{\rm ce}$ of this problem, which can be interpreted as the certainty equivalent return. We now have

$$-\dot{a}_t\frac{w^\gamma}{\gamma} = \sup_{c}\left(\frac{1}{\gamma}c^\gamma + a_t w^{\gamma-1}\left(r_{\rm ce}w - c\right)\right).$$

The supremum over $c$ is obtained for $c = a_t^{1/(\gamma-1)}w$. Substituting in this value and simplifying, we obtain

$$-\dot{a}_t = (1-\gamma)a_t^{\gamma/(\gamma-1)} + \gamma a_t r_{\rm ce}.$$

It can be verified that the definition of $a_t$ in (7) is indeed a solution to this differential equation with terminal condition $a_T = \beta$.
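The verification can also be done numerically. The sketch below integrates the differential equation for $a_t$ backward from $a_T = \beta$ with a fourth-order Runge-Kutta scheme, and compares the result with a closed form. The closed form comes from the substitution $b_t = a_t^{1/(1-\gamma)}$, which turns the equation into the linear ODE $-\dot{b}_t = 1 + kb_t$ with $k = \gamma r_{\rm ce}/(1-\gamma)$ and $b_T = \beta^{1/(1-\gamma)}$. The function names and parameter values are assumptions made for illustration, and the code assumes $k \neq 0$.

```python
import math

def b_closed_form(t, T, beta, gamma, r_ce):
    """b_t = a_t^{1/(1-gamma)}: solution of -db/dt = 1 + k*b with b_T = beta^{1/(1-gamma)}."""
    k = gamma * r_ce / (1.0 - gamma)  # assumed nonzero
    b_T = beta ** (1.0 / (1.0 - gamma))
    return ((1.0 + k * b_T) * math.exp(k * (T - t)) - 1.0) / k

def a_numeric(t, T, beta, gamma, r_ce, steps=2000):
    """Integrate -da/dt = (1-gamma)*a^(gamma/(gamma-1)) + gamma*r_ce*a back from a_T = beta."""
    def f(a):  # da/ds, where s = T - t is the time to go
        return (1.0 - gamma) * a ** (gamma / (gamma - 1.0)) + gamma * r_ce * a
    h = (T - t) / steps
    a = beta
    for _ in range(steps):  # classical RK4 steps in the time-to-go variable
        k1 = f(a)
        k2 = f(a + 0.5 * h * k1)
        k3 = f(a + 0.5 * h * k2)
        k4 = f(a + h * k3)
        a += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a

# Illustrative parameters: T = 30 years, beta = 1, gamma = -1, r_ce = 0.035625.
a0 = a_numeric(0.0, 30.0, 1.0, -1.0, 0.035625)
b0 = b_closed_form(0.0, 30.0, 1.0, -1.0, 0.035625)
```

Here `a0` and `b0 ** (1 - gamma)` agree to high accuracy, and `1 / b0` is the fraction of wealth consumed per year at time zero under the optimal policy.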

3 Certainty equivalent problem

In this section we present a deterministic convex optimal control problem that is equivalent to the Merton problem in the sense that it has the same value function and same optimal policy.

This certainty equivalent problem is

$$
\begin{array}{ll}
\mbox{maximize} & \displaystyle \frac{\beta}{\gamma}w_T^\gamma + \int_0^T \frac{1}{\gamma}c_t^\gamma\,dt \\
\mbox{subject to} & \displaystyle \dot{w}_t \leq \mu^T x_t - c_t + \frac{\gamma-1}{2}\frac{x_t^T\Sigma x_t}{w_t}, \quad t \in [0,T] \\
& x_t/w_t \in \Theta, \quad t \in [0,T] \\
& w_0 = w_{\rm init}.
\end{array}
\tag{9}
$$

The variables are the consumption $c_t : [0,T] \to \mathbf{R}_{++}$, wealth $w_t : [0,T] \to \mathbf{R}_{++}$, and $x_t : [0,T] \to \mathbf{R}^n$, which is the dollar-valued allocation of wealth to each asset. (In the notation of §2, we have $x_t = w_t\theta_t$, and $\theta_t = x_t/w_t$.) Note that the constraint $x_t/w_t \in \Theta$ implies $\mathbf{1}^T x_t = w_t$, i.e., the total wealth is the sum of the dollar-valued asset allocations.

The objective is the lifetime utility, but without expectation since this problem is deterministic. The first constraint resembles the dynamics of the stochastic process (1), and we call this the dynamics constraint. We will see that for any solution to (9), this inequality constraint holds with equality, in which case the dynamics constraint becomes a (deterministic) ODE.

Interpretation.

The problem can be interpreted in the following way. We plan for a single outcome of the stochastic process (1). In particular, the dynamics constraint restricts the growth rate of the wealth to be no greater than $\mu^T x_t - c_t$ (the mean growth rate in the stochastic process (1)), adjusted by the additional term $(1/2)(\gamma-1)x_t^T\Sigma x_t/w_t$. Because $\gamma < 1$, this term is negative, so it reduces the planned growth rate. With the change of variables $\theta_t = x_t/w_t$, we have

$$\frac{x_t^T\Sigma x_t}{w_t} = w_t\theta_t^T\Sigma\theta_t,$$

i.e., this adjustment term is proportional to the variance of the portfolio growth rate with investment allocation $\theta_t = x_t/w_t$. In other words, we are pessimistically planning for bad investment returns, with the degree of pessimism depending on the risk aversion parameter $\gamma$ and the risk of our portfolio.

In fact, in problem (9), we plan for the returns

$$r_t = \mu + \frac{\gamma-1}{2w_t}\Sigma x_t = \mu + \frac{\gamma-1}{2}\Sigma\theta_t.$$

The coefficients in front of $\Sigma x_t$ and $\Sigma\theta_t$ are negative, and the entries of $\Sigma x_t$ and $\Sigma\theta_t$ are typically positive. The vector $\Sigma\theta_t$ can be interpreted as the risk allocation to the individual assets in the portfolio, since

$$\theta_t^T\Sigma\theta_t = \sum_{i=1}^n (\theta_t)_i\left(\Sigma\theta_t\right)_i.$$

In other words, the planned asset returns are the mean returns, reduced in proportion to the marginal contribution of each asset to the portfolio variance. This is related to the concept of risk parity [BST16].

Convexity.

Convexity of (9) follows from the fact that the risk penalty term $x_t^T\Sigma x_t/w_t$ is a quadratic-over-linear function, which is jointly convex in $x_t$ and $w_t$ [BV04, §3.1.5]. Also, the set

$$\{(x_t, w_t) \in \mathbf{R}^n \times \mathbf{R}_{++} \mid x_t/w_t \in \Theta\}$$

is the perspective of $\Theta$, which is convex when $\Theta$ is [BV04, §2.3.3]. In fact, in most practical portfolio construction problems, $\Theta$ can be described by a collection of linear and quadratic constraints [Boy+17, §4.4]. In this case, when problem (9) is discretized, it becomes an SOCP, which we describe in §7.

Equivalence to Merton problem.

The Merton problem and problem (9) are equivalent in the sense that they have the same value function and optimal policy.

To see this, we first consider a modified version of (9) in which we convert the dynamics to an equality constraint using a slack variable $u_t \geq 0$:

$$\dot{w}_t = \mu^T x_t - c_t + \frac{\gamma-1}{2}\frac{x_t^T\Sigma x_t}{w_t} - u_t.$$

The new control input $u_t$ can be interpreted as the rate at which we discard wealth. (We will see that at optimality $u_t = 0$.) For this modified problem, the Hamilton-Jacobi-Bellman equation is

$$-\dot{V}_t(w) = \sup_{c,\;x \in w\Theta,\;u \geq 0}\bigg(\frac{1}{\gamma}c^\gamma + V_t'(w)\Big(\mu^T x + \frac{\gamma-1}{2w}x^T\Sigma x - c - u\Big)\bigg).$$

First note that with our value function candidate (5), we have $V_t'(w) > 0$, and therefore $u = 0$, as expected. Now, by using the change of variables $x = \theta w$ and plugging in our value function candidate, this equation becomes (8). From this point on, the proof that this candidate value function satisfies the Hamilton-Jacobi-Bellman equation proceeds exactly as for the (stochastic) Merton problem.

4 Exact extensions

Here we consider several extensions to the Merton problem, all of which are known in the literature and have closed-form solutions. For each one, we describe how to modify problem (9) to maintain the certainty-equivalence property.

Time-varying parameters.

The Merton problem can be solved when $\mu$, $\Sigma$, and $\Theta$ change over time. To handle this in the certainty equivalent problem, we simply replace these parameters by $\mu_t$, $\Sigma_t$, and $\Theta_t$. (Here $\mu_t$ denotes the time-varying vector of asset expected returns, a notation clash with our previous use of $\mu_t$ as the scalar portfolio expected return.) Similarly, if we discount the consumption utility of the Merton problem,

$$U = \mathop{\bf E}\bigg(\frac{\beta}{\gamma}w_T^\gamma + \int_0^T \frac{\alpha_t}{\gamma}c_t^\gamma\,dt\bigg),$$

where $\alpha_t > 0$ is the discount of the consumption utility at time $t$, then the objective of the certainty equivalent problem will match $U$ (but without the expectation).

Uncertain mortality and bequest.

Here the terminal time $t_f \in [0,T]$ is random with probability density $p_t$ and survival function

$$s_t = \mathop{\bf Prob}(t_f > t) = \int_t^T p_\tau\,d\tau.$$

In this case, the investor’s utility is

$$U = \mathop{\bf E}\bigg(\frac{\beta}{\gamma}w_{t_f}^\gamma + \int_0^{t_f}\frac{1}{\gamma}c_t^\gamma\,dt\bigg).$$

Here the expectation is taken over $t_f$ as well as the paths of the stochastic process (1).

With this modification, the objective of the certainty equivalent problem changes to

$$\int_0^T\left(\frac{p_t\beta}{\gamma}w_t^\gamma + \frac{s_t}{\gamma}c_t^\gamma\right)dt.$$

We weight the consumption utility by the probability the investor is still alive, i.e., we treat the survival function as a discount factor. We also get utility for the bequest continuously over the interval $[0,T]$, weighted by the density function $p_t$.
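The weights in this objective are straightforward to compute on a time grid. The sketch below builds the survival function from a sampled density with the trapezoid rule, and evaluates a Riemann-sum approximation of the objective for given wealth and consumption trajectories. The uniform density, the constant trajectories, and the function names are illustrative assumptions only.

```python
def survival_from_density(p, h):
    """s_k = integral of the density from t_k to T (trapezoid rule, grid spacing h)."""
    s = [0.0] * len(p)
    for k in range(len(p) - 2, -1, -1):
        s[k] = s[k + 1] + 0.5 * h * (p[k] + p[k + 1])
    return s

def mortality_objective(w, c, p, s, h, beta, gamma):
    """Riemann sum of (p_t*beta/gamma)*w_t^gamma + (s_t/gamma)*c_t^gamma over [0, T]."""
    return sum(
        h * (p[k] * beta / gamma * w[k] ** gamma + s[k] / gamma * c[k] ** gamma)
        for k in range(len(p) - 1)
    )

# Illustrative data: terminal time uniform on [0, 40], so s_t = (T - t)/T.
T, K = 40.0, 400
h = T / K
p = [1.0 / T] * (K + 1)
s = survival_from_density(p, h)
# Constant wealth and consumption trajectories, purely to exercise the formula.
value = mortality_objective([1.0] * (K + 1), [0.05] * (K + 1), p, s, h, beta=1.0, gamma=-1.0)
```

For the uniform density the computed survival function starts at one, reaches one half at mid-horizon, and ends at zero, as expected.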

Annuities and life insurance.

This extension is due to [Ric75]. Continuing with the previous extension, we allow the investor to purchase life insurance. The premium is $l_t$, which the investor can choose, and the payout of the plan is $\lambda_t l_t$, where $\lambda_t \geq 0$ is the payout-to-premium ratio at time $t$. When $l_t < 0$, we interpret this as an annuity. In particular, at time $t$, the investor has $-l_t$ in the annuity account, which is lost on death, in return for an additional return of $-\lambda_t l_t$. The actuarially fair value of $\lambda_t$ is $p_t/s_t$, which is called the force of mortality. (If $\lambda_t > p_t/s_t$, then life insurance is favorable and annuities are unfavorable; if $\lambda_t < p_t/s_t$, the reverse is true.)

With this modification, the objective of the certainty equivalent problem changes to

$$U = \int_0^T\left(\frac{p_t\beta}{\gamma}(w_t + \lambda_t l_t)^\gamma + \frac{s_t}{\gamma}c_t^\gamma\right)dt,$$

i.e., we add the insurance payout to the wealth in the bequest utility. The dynamics change to

$$\dot{w}_t \leq \mu^T x_t - c_t - l_t + \frac{\gamma-1}{2}\frac{x_t^T\Sigma x_t}{w_t}.$$

Here we subtract the insurance premium from the growth rate of the wealth.

Income.

We can add a deterministic income stream, with income rate $y_t$ at time $t$. The stochastic dynamics are modified by the addition of $y_t$ to the drift term of the wealth process, i.e., the drift becomes

$$\mu^T\theta_t w_t + y_t - c_t.$$

In this case, we also assume one of the assets is risk free with return $\mu_{\rm rf}$ and volatility $0$, and that

$$\Theta = \{\theta \mid \mathbf{1}^T\theta = 1\}. \tag{10}$$

These assumptions allow the investor to counteract the income stream by shorting the risk-free asset and investing the proceeds in a preferred portfolio of other assets. The fair value of the income stream is its net present value over $[t,T]$ at the risk-free rate:

$$v_t = \int_t^T e^{-\mu_{\rm rf}(\tau - t)}\,y_\tau\,d\tau,$$

which can be interpreted as the remaining human capital of the investor.
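As a small worked example, the net present value of the income stream can be computed by numerical quadrature. The sketch below uses the midpoint rule to discount an income rate `y(tau)` back to time `t` at the risk-free rate. The function name, the constant income stream, and the 2% rate are assumptions chosen so that the result can be compared with the closed form $y(1 - e^{-\mu_{\rm rf}(T-t)})/\mu_{\rm rf}$ for constant income.

```python
import math

def human_capital(y, t, T, r_free, steps=10000):
    """Net present value at time t of the income rate y(tau) over [t, T] (midpoint rule)."""
    h = (T - t) / steps
    total = 0.0
    for j in range(steps):
        tau = t + (j + 0.5) * h  # midpoint of the j-th subinterval
        total += math.exp(-r_free * (tau - t)) * y(tau) * h
    return total

# Constant income of 1 per year for 30 years, discounted at 2% per year.
v0 = human_capital(lambda tau: 1.0, t=0.0, T=30.0, r_free=0.02)
v0_exact = (1.0 - math.exp(-0.02 * 30.0)) / 0.02
```

The numerical value `v0` matches `v0_exact` closely, and the remaining human capital is zero at `t = T`.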

For this extension, the dynamics in (9) are replaced by

$$\dot{w}_t \leq \mu^T x_t + y_t - c_t + \frac{\gamma-1}{2}\frac{x_t^T\Sigma x_t}{w_t + v_t}.$$

Note the addition of the income term $y_t$ and the normalization of risk by the total wealth plus the remaining human capital. In this case, the wealth $w_t$ need not be positive but instead satisfies $w_t + v_t > 0$. Because of this, we also replace the constraint $x_t/w_t \in \Theta$ (which is not defined for $w_t = 0$) with $\mathbf{1}^T x_t = w_t$.

Epstein–Zin preferences.

One interesting feature of the certainty equivalent problem (9) is that the risk aversion parameter $\gamma$ appears separately in the objective and dynamics constraint. It is reasonable to ask whether, by modifying the consumption utility to be

$$\frac{\beta}{\rho}w_T^\rho + \int_0^T\frac{1}{\rho}c_t^\rho\,dt$$

for some $\rho \neq \gamma$ with $\rho < 1$ and $\rho \neq 0$, but keeping $\gamma$ in the dynamics constraint, problem (9) is equivalent to some variant of the Merton problem. This is indeed the case, but with the expected utility $U$ replaced by Epstein–Zin preferences, where $\rho$ sets the elasticity of intertemporal substitution (which is $1/(1-\rho)$ for this power form) and $\gamma$ sets the risk aversion. For details, see [DE92].

5 Inexact extensions

Here we discuss several extensions of problem (9) that (to our knowledge) do not exactly solve a version of the Merton problem. Some of these build on the exact extensions of §4.

Modified utility.

We can change the objective of (9) to use any increasing, concave utility function for either consumption or bequest. These utility functions need not be additive over time: for example, we can maximize the minimum consumption over the interval $[0,T]$, i.e., $\min_{t \in [0,T]} c_t$.

As a special case, we can add a minimum consumption constraint

$$c_t \geq c_t^{\rm min},$$

where $c_t^{\rm min}$ is the minimum allowable consumption amount as a function of age. Similarly, we can enforce a minimum bequest over some time window (say, to care for underage dependents until they come of age).

Spending limit.

We can limit consumption as a fraction of income with the constraint

$$c_t \leq \eta y_t$$

for some parameter $\eta > 0$. For example, when $\eta = 0.7$, this constraint means that we can’t consume more than 70% of our income, i.e., we must have a savings rate of 30%.

This constraint can be adjusted to account for investment income. To see this, take $d \in \mathbf{R}^n$ to be the vector of dividend yields for each asset, which is constant and known in advance. The modified constraint becomes

$$c_t \leq \eta y_t + d^T x_t.$$

When this constraint is tight, i.e., when we desire to consume more than $\eta$ times our income, there is added incentive to invest in assets with high dividend yield.

Minimum cash balance.

We can include a constraint that the amount invested in cash be above a certain level, i.e.,

$$(x_t)_i \geq (x_t^{\rm min})_i,$$

where $i$ is the index of the cash asset. This is similar to an emergency fund constraint that we must keep six months’ worth of consumption in cash, which (with time measured in years) is expressed as

$$(x_t)_i \geq 0.5\,c_t.$$

6 Application to model predictive control

Model predictive control is a technique for stochastic control problems that leverages a deterministic approximation of the stochastic problem. To evaluate an MPC policy, we first solve this deterministic problem to obtain a planned trajectory for the state and control input over the planning horizon. We then implement only the first control input in this plan, and the rest of the planned trajectory is discarded. To obtain future control inputs, the policy is evaluated again, which requires solving a new deterministic problem.

In the context of the Merton problem, the certainty equivalent problem is used as a basis for a simple model predictive control policy, which we denote $\pi_t^{\rm mpc}$. We first define this policy when $t = 0$, with initial wealth $w_0$. We start by solving the deterministic control problem (9) to obtain the optimal trajectories $c_t$ and $\theta_t$. The MPC policy then takes $\pi_0^{\rm mpc}(w_0) = (c_0, \theta_0)$. To define the MPC policy for $t \in (0,T)$, we first form a new instance of problem (9), which is defined over the interval $[t,T]$ and has initial wealth $w_t$. Once again we solve the deterministic optimal control problem, to obtain optimal $c_\tau$ and $\theta_\tau$ over the interval $\tau \in [t,T]$. We then take $\pi_t^{\rm mpc}(w_t) = (c_t, \theta_t)$. Evaluating the MPC policy therefore always requires solving a deterministic optimal control problem of the form (9).
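The receding-horizon loop described above can be sketched as follows. To keep the example self-contained, the planner is a stand-in: for the base problem with one risky and one risk-free asset, the first action of the plan is available in closed form as $c = w/b_t$, where $b_t = a_t^{1/(1-\gamma)}$ solves $-\dot{b}_t = 1 + kb_t$ with $k = \gamma r_{\rm ce}/(1-\gamma)$. A practical MPC policy would instead solve a discretized instance of problem (9) with a convex solver at each step. All function names and parameter values are hypothetical.

```python
import math
import random

def first_action(w, t, T, beta, gamma, r_ce, theta_ce):
    """Stand-in planner: first action of the plan over [t, T] for the base problem."""
    k = gamma * r_ce / (1.0 - gamma)  # assumed nonzero
    b_T = beta ** (1.0 / (1.0 - gamma))
    b_t = ((1.0 + k * b_T) * math.exp(k * (T - t)) - 1.0) / k
    return w / b_t, theta_ce  # (consumption rate, risky fraction)

def run_mpc(plan, w_init, mu, sigma, r_free, T, steps, seed=0):
    """Receding-horizon loop: re-plan at every step and apply only the first action."""
    rng = random.Random(seed)
    h = T / steps
    w = w_init
    history = []
    for i in range(steps):
        t = i * h
        c, theta = plan(w, t)                   # solve the planning problem from (t, w)
        mu_p = theta * mu + (1.0 - theta) * r_free
        sigma_p = abs(theta) * sigma
        dz = rng.gauss(0.0, math.sqrt(h))       # realized shock, unknown to the planner
        history.append((t, w, c, theta))
        w = w + (mu_p * w - c) * h + sigma_p * w * dz  # Euler-Maruyama step of (1)
    return history, w

T, beta, gamma = 30.0, 1.0, -1.0
plan = lambda w, t: first_action(w, t, T, beta, gamma, r_ce=0.035625, theta_ce=0.625)
history, w_T = run_mpc(plan, w_init=1.0, mu=0.07, sigma=0.2, r_free=0.02, T=T, steps=3000)
```

The loop itself does not depend on how `plan` is computed, so the stand-in can be swapped for a solver-based planner when constraints or objectives outside the base problem are added.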

MPC is a convenient way to implement the optimal policy for the basic problem or any of the extensions of §4. In those cases, the MPC policy is optimal. When MPC is applied with constraints and an objective that do not correspond to any version of the Merton problem, the MPC policy is a sophisticated heuristic, and very useful in practice.

To use MPC in practice requires discretizing problem (9), which we discuss in the next section.

7 Discretized problem

Here we show how to discretize problem (9). We do this for the basic problem only, but note that the extensions can be handled similarly.

We let xkx_{k} denote the value of xtx_{t} in (9) at time t=hkt=hk, k=0,,Kk=0,\ldots,K, where h=T/Kh=T/K is the discretization interval. (We use the same notation, but index xx with the subscript kk to denote the discretized variable, and index with tt to denote the continuous variable.) We similarly define the discretized variables ckc_{k} and wkw_{k}. Replacing the time derivative w˙t\dot{w}_{t} with the forward Euler approximation (wk+1wk)/h(w_{k+1}-w_{k})/h, and replacing the integral in the objective with a Riemann sum approximation, we obtain the discretized problem

\[
\begin{array}{ll}
\mbox{maximize} & \displaystyle\frac{\beta}{\gamma}w_K^{\gamma}+\sum_{k=0}^{K-1}\frac{h}{\gamma}c_k^{\gamma}\\[2ex]
\mbox{subject to} & \displaystyle\frac{w_{k+1}-w_k}{h}\leq\mu^Tx_k-c_k+\frac{(\gamma-1)}{2}\frac{x_k^T\Sigma x_k}{w_k},\quad k=0,\dots,K-1\\[2ex]
& x_k/w_k\in\Theta,\quad k=0,\dots,K\\[1ex]
& w_0=w_{\rm init}.
\end{array}
\tag{11}
\]

The variables are $x_k\in{\bf R}^n$ and $w_k\in{\bf R}_{++}$ for $k=0,\dots,K$, and $c_k\in{\bf R}_{++}$ for $k=0,\dots,K-1$. All of the extensions (exact and inexact) discussed above can be discretized as well, but we do not give the details here.
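To make the forward Euler recursion concrete, the following sketch rolls the wealth constraint of (11) forward with equality and evaluates the objective for a given trajectory. It does not solve the problem. The function name `simulate_ce` and the restriction to constant fractions ($x_k=\theta w_k$, $c_k=c_{\rm frac}w_k$) are assumptions made purely for illustration.

```python
import numpy as np


def simulate_ce(theta, c_frac, mu, Sigma, gamma, beta, w_init, h, K):
    """Evaluate (11) along a constant-fraction trajectory.

    Holdings are x_k = theta * w_k and consumption is c_k = c_frac * w_k
    (an illustrative choice, not the optimal policy). The wealth constraint
    is taken to hold with equality.
    """
    w = np.empty(K + 1)
    c = np.empty(K)
    w[0] = w_init
    for k in range(K):
        x = theta * w[k]
        c[k] = c_frac * w[k]
        # Right-hand side of the wealth constraint in (11).
        drift = mu @ x - c[k] + 0.5 * (gamma - 1) * (x @ Sigma @ x) / w[k]
        w[k + 1] = w[k] + h * drift  # forward Euler step
    # Riemann-sum objective plus the terminal (bequest) term.
    objective = beta / gamma * w[K] ** gamma + h / gamma * np.sum(c ** gamma)
    return w, c, objective
```

With constant fractions the recursion reduces to $w_{k+1}=w_k(1+hg)$, where $g=\mu^T\theta-c_{\rm frac}+\frac{\gamma-1}{2}\theta^T\Sigma\theta$, which gives a quick sanity check on any implementation of the dynamics.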

The discretized certainty equivalent problem (11) is a (finite-dimensional) convex optimization problem, and can therefore be easily expressed in a domain-specific language for convex optimization, such as cvxpy. As an example, we give a cvxpy implementation of (11) in listing 7 when $\Theta$ is given by (10).

{listing}
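import numpy as np
import cvxpy as cp

# Problem data (n, K, h, mu, Sigma, beta, gamma, w_init) are assumed given.
w = cp.Variable(K + 1)
x = cp.Variable((n, K + 1))
c = cp.Variable(K)
# Cholesky factor L with Sigma = L L^T, so x^T Sigma x = ||L^T x||^2.
Sigma_half = np.linalg.cholesky(Sigma)
U = beta / gamma * cp.power(w[K], gamma) + h / gamma * cp.sum(cp.power(c, gamma))
constr = [w == cp.sum(x, axis=0), w[0] == w_init]
for k in range(K):
    risk = cp.quad_over_lin(Sigma_half.T @ x[:, k], w[k])
    constr += [(w[k + 1] - w[k]) / h
               <= mu @ x[:, k] - c[k] + (gamma - 1) / 2 * risk]
problem = cp.Problem(cp.Maximize(U), constr)
problem.solve()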

An implementation of the discretized certainty equivalent problem (11) using cvxpy.

For most practical portfolio construction problems, $\Theta$ is SOCP representable, which means that problem (11) is an SOCP [Lob+98]. To see this, note that the power utility $c_k^{\gamma}$ and the quadratic-over-linear function are SOCP representable; see [AG03, §2.2.f] and [Lob+98, §2.4], respectively. The perspective of $\Theta$ can be represented using the same cones used to represent $\Theta$ [MB15, §2].
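As a worked illustration of the quadratic-over-linear case, the standard hyperbolic-constraint identity gives an explicit second-order cone form. The epigraph variable $s_k$ is introduced here only for illustration and does not appear in the original formulation; $\Sigma^{1/2}$ denotes any matrix satisfying $(\Sigma^{1/2})^T\Sigma^{1/2}=\Sigma$, and $\gamma<1$ is assumed so that the coefficient of the risk term is negative.

```latex
% For w_k > 0 and s_k >= 0:
\frac{x_k^T \Sigma x_k}{w_k} \le s_k
\quad\Longleftrightarrow\quad
\left\| \begin{bmatrix} 2\,\Sigma^{1/2} x_k \\ s_k - w_k \end{bmatrix} \right\|_2
\le s_k + w_k .

% Squaring both sides recovers the original inequality:
% 4 x_k^T \Sigma x_k + (s_k - w_k)^2 \le (s_k + w_k)^2
%   \iff  x_k^T \Sigma x_k \le s_k w_k .

% The wealth constraint of (11) then becomes affine in (w, x, c, s):
\frac{w_{k+1} - w_k}{h} \le \mu^T x_k - c_k - \frac{1-\gamma}{2}\, s_k .
```

Because $s_k$ enters the wealth constraint with a negative coefficient, replacing the risk term by this upper bound does not enlarge the set of feasible $(w, x, c)$ trajectories.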

To give some idea of the speed at which current solvers can solve the discretized problem (11) (and its extensions), consider a problem with $n=500$ assets, $K=50$ periods, and covariance matrix $\Sigma$ given as a typical factor model, with 25 factors. This problem has more than 100000 optimization variables. With just a small modification of the code given in listing 7 to exploit the low rank plus diagonal structure of the covariance matrix, the open-source solver ECOS [DCB13] solves the problem in around two seconds, on a single thread.
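The modification itself is not shown in the text, so the following is only one plausible sketch of how low rank plus diagonal structure can be exploited. It assumes a factor model of the form $\Sigma=FF^T+\mathop{\bf diag}(d)$, with the factor covariance absorbed into the hypothetical loading matrix `F`, and checks numerically that the quadratic form splits into two sums of squares that never require forming the $n\times n$ matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500, 25                      # assets and factors, as in the example above
F = rng.standard_normal((n, r))     # hypothetical factor loadings
d = rng.uniform(0.1, 1.0, size=n)   # hypothetical idiosyncratic variances
x = rng.standard_normal(n)          # an arbitrary holdings vector


def quad_form_factor(x, F, d):
    """x^T (F F^T + diag(d)) x computed in O(n r) without forming Sigma."""
    return np.sum((F.T @ x) ** 2) + np.sum(d * x ** 2)


# Dense reference computation, for comparison only.
Sigma = F @ F.T + np.diag(d)
dense = x @ Sigma @ x
structured = quad_form_factor(x, F, d)
```

In a cvxpy model the same idea would amount to passing the stacked vector of `F.T @ x[:, k]` and `sqrt(d)` times `x[:, k]` (elementwise) to `quad_over_lin`, so that each risk term involves $r+n$ entries with a sparse structure rather than a dense $n\times n$ factor.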

References

  • [AG03] Farid Alizadeh and Donald Goldfarb “Second-order cone programming” In Mathematical programming 95.1, 2003, pp. 3–51
  • [BB18] Shane Barratt and Stephen Boyd “Stochastic control with affine dynamics and extended quadratic costs” ArXiv preprint, 2018
  • [BBM17] Francesco Borrelli, Alberto Bemporad and Manfred Morari “Predictive control for linear and hybrid systems” Cambridge University Press, 2017
  • [BC10] Suleyman Basak and Georgy Chabakauri “Dynamic mean-variance asset allocation” In The Review of Financial Studies 23.8 Oxford University Press, 2010, pp. 2970–3016
  • [Ber17] Dimitri P Bertsekas “Dynamic programming and optimal control” Athena scientific, 2017
  • [Boy+14] Stephen Boyd, Mark Mueller, Brendon O’Donoghue and Yang Wang “Performance bounds and suboptimal policies for multi-period investment” In Foundations and Trends in Optimization 1.1 Now Publishers, Inc., 2014, pp. 1–69
  • [Boy+17] Stephen Boyd, Enzo Busseti, Steve Diamond, Ronald N Kahn, Kwangmoo Koh, Peter Nystrup and Jan Speth “Multi-period trading via convex optimization” In Foundations and Trends in Optimization 3.1 Now Publishers, Inc., 2017, pp. 1–76
  • [BST16] Xi Bai, Katya Scheinberg and Reha Tutuncu “Least-squares approach to risk parity in portfolio selection” In Quantitative Finance 16.3 Taylor & Francis, 2016, pp. 357–376
  • [BV04] S. Boyd and L. Vandenberghe “Convex optimization” Cambridge University Press, 2004
  • [DB16] S. Diamond and S. Boyd “CVXPY: A Python-embedded modeling language for convex optimization” In Journal of Machine Learning Research 17.83, 2016, pp. 1–5
  • [DCB13] Alexander Domahidi, Eric Chu and Stephen Boyd “ECOS: An SOCP solver for embedded systems” In European Control Conference, 2013, pp. 3071–3076
  • [DE92] Darrell Duffie and Larry G Epstein “Stochastic differential utility” In Econometrica: Journal of the Econometric Society JSTOR, 1992, pp. 353–394
  • [KH06] Wook Hyun Kwon and Soo Hee Han “Receding horizon control: Model predictive control for state models” Springer, 2006
  • [KS72] Huibert Kwakernaak and Raphael Sivan “Linear optimal control systems” John Wiley & Sons, 1972
  • [Lob+98] Miguel Sousa Lobo, Lieven Vandenberghe, Stephen Boyd and Hervé Lebret “Applications of second-order cone programming” In Linear algebra and its applications 284.1-3 Elsevier, 1998, pp. 193–228
  • [MB12] Jacob Mattingley and Stephen Boyd “CVXGEN: A code generator for embedded convex optimization” In Optimization and Engineering 13.1 Springer, 2012, pp. 1–27
  • [MB15] Nicholas Moehle and Stephen Boyd “A perspective-based convex relaxation for switched-affine optimal control” In Systems and Control Letters 86 Elsevier, 2015, pp. 34–40
  • [Mer70] Robert C Merton “Optimum consumption and portfolio rules in a continuous-time model” In Stochastic Optimization Models in Finance Elsevier, 1970, pp. 621–661
  • [Ric75] Scott F Richard “Optimal consumption, portfolio and life insurance rules for an uncertain lived individual in a continuous time model” In Journal of Financial Economics 2.2 Elsevier, 1975, pp. 187–203
  • [WB09] Yang Wang and Stephen Boyd “Fast model predictive control using online optimization” In IEEE Transactions on control systems technology 18.2 IEEE, 2009, pp. 267–278
  • [Whi90] Peter Whittle “Risk-sensitive Optimal Control” John Wiley & Sons, 1990