
On Distributionally Robust Multistage Convex Optimization: Data-driven Models and Performance

Shixuan Zhang (Wm Michael Barnes ’64 Department of Industrial & Systems Engineering, Texas A&M University, shixuan.zhang@tamu.edu) and Xu Andy Sun (Operations Research and Statistics, Sloan School of Management, Massachusetts Institute of Technology, sunx@mit.edu)
Abstract

This paper presents a novel algorithmic study, with extensive numerical experiments, of distributionally robust multistage convex optimization (DR-MCO). Following our previous work on the dual dynamic programming (DDP) algorithmic framework for DR-MCO [48], we focus on data-driven DR-MCO models with Wasserstein ambiguity sets that allow probability measures with infinite supports. These data-driven Wasserstein DR-MCO models enjoy out-of-sample performance guarantees and adjustable in-sample conservatism. By exploiting additional concavity or convexity in the uncertain cost functions, we design exact single stage subproblem oracle (SSSO) implementations that ensure the convergence of DDP algorithms. We test the data-driven Wasserstein DR-MCO models against multistage robust convex optimization (MRCO), risk-neutral, and risk-averse multistage stochastic convex optimization (MSCO) models on multi-commodity inventory problems and hydro-thermal power planning problems. The results show that our DR-MCO models can outperform the MRCO and MSCO models when the data size is small.
Keywords: distributionally robust optimization, multistage convex optimization, dual dynamic programming algorithm, multi-commodity inventory, hydro-thermal power planning problem

1 Introduction

Multistage convex optimization is a decision-making problem where the objective functions and constraints are convex and decisions need to be made each time part of the uncertainty is revealed. As it is often challenging to gain precise knowledge of the probability distributions of the uncertainty, distributionally robust multistage convex optimization (DR-MCO) allows ambiguity in the probability distributions and aims to find optimal decisions that minimize the overall expected objective cost with respect to the worst-case distribution. The DR-MCO framework encompasses multistage stochastic convex optimization (MSCO) and multistage robust convex optimization (MRCO) as special cases, and has thus been widely applied in many areas, including energy systems, supply chain and inventory planning, portfolio optimization, and finance [43, 6].

Distributionally robust optimization (DRO) has received significant research attention in recent years. Common choices of the ambiguity sets include moment-based ambiguity sets [40, 10, 42, 47], discrepancy or distance-based ambiguity sets [5, 4, 36, 7], and many others constructed from shape requirements or special properties of the distributions (we refer the interested reader to the review papers [37, 29] for more details). Among all these choices, data-driven Wasserstein ambiguity sets, i.e., Wasserstein distance balls centered at an empirical probability distribution obtained from given uncertainty data, have gained increasing popularity for the following reasons: (1) measure concentration in the Wasserstein distance guarantees with high probability that the decisions or policies obtained from solving the in-sample model provide an upper bound on the out-of-sample mean cost [16, 15]; and (2) even if we consider general probability distributions in the ambiguity sets, tractable finite-dimensional reformulations or approximations can be derived using strong duality for the 1-Wasserstein distance of probability measures on Euclidean spaces [15, 50, 23], or more generally for any $p$-Wasserstein distance of probability measures on Polish spaces [17]. We remark that these reformulations and approximations are derived for single-stage or two-stage settings. To the best of our knowledge, it remains unclear how to solve DR-MCO problems with Wasserstein ambiguity sets and infinite uncertainty sets.

There are also many works on DRO in multistage settings. In particular, since taking the expectation with respect to a worst-case probability distribution in an ambiguity set gives a coherent risk measure, multistage DRO falls into the category of risk-averse multistage stochastic optimization [43, 35], which dates back at least to [14]. When the underlying uncertainty is stagewise independent, a random nested cutting plane algorithm, called stochastic dual dynamic programming (DDP) and very often used for risk-neutral multistage stochastic linear optimization (MSLO) problems [31], has been extended to risk-averse MSLO with polyhedral risk measures in [21, 45]. DDP algorithms iteratively build under-approximations of the worst-case expected cost-to-go functions in the dynamic programming recursion, which lead to policies that minimize the approximate total cost starting from each stage. To estimate the quality of the obtained policy, deterministic over-approximations are proposed in addition to the under-approximations for risk-averse MSLO problems in [34]. Alternatively, for time-consistent conditional value-at-risk (CVaR) risk measures, a new sampling scheme based on importance sampling is proposed to achieve tighter estimation of the policy quality [25]. Further exploiting the deterministic over-approximation, a deterministic version of the DDP algorithm is proposed for risk-averse MSLO in [1]. As a variant, robust DDP is proposed for multistage robust linear optimization (MRLO) using a similar deterministic over-approximation of the cost-to-go functions in [18]. In [32], the stochastic DDP algorithm is used to solve distributionally robust multistage linear optimization (DR-MLO) with ambiguity sets defined by a modified $\chi^2$-distance for probability distributions supported on the historical data. In [13], the stochastic DDP algorithm is used to solve DR-MLO with Wasserstein ambiguity sets on finitely supported probability distributions, with asymptotic convergence and promising out-of-sample performance. We comment that the above solution approaches for risk-averse MSLO and DR-MLO rely either on sample average approximations of the risk measures or on the assumption that all uncertainties have finite support.

As DDP-type algorithms are extensively applied to risk-neutral and risk-averse MSLO and DR-MLO, their convergence analysis and complexity study become central questions. The finite-time convergence of stochastic DDP algorithms is first proved for MSLO problems using the polyhedrality of cost-to-go functions in [33, 41]. Such convergence is similarly proved for deterministic DDP algorithms for MSLO in [2], for MRLO in [18], and for DR-MLO in [1]. Asymptotic convergence of stochastic DDP algorithms is proved using a monotone convergence argument in the space of convex cost-to-go function approximations for general risk-neutral multistage stochastic convex optimization (MSCO) in [19] and for risk-averse MSCO in [20]. Complexity bounds for DDP algorithms are established using Lipschitz continuity of the under-approximations of the cost-to-go functions for risk-neutral MSCO in [26, 27], and for risk-neutral multistage mixed-integer nonlinear optimization in [49]. Our recent work [48] adopts an abstract definition of single stage subproblem oracles (SSSO) and proves SSSO-based complexity bounds, hence also convergence, for two DDP algorithms applied to DR-MCO problems with general uncertainty supports and ambiguity sets.

Following [48], we aim to further study data-driven DR-MCO models and compare their numerical performance with other baseline models. In particular, our paper makes the following contributions to the literature.

  1. We prove an out-of-sample performance guarantee using measure concentration results, and an adjustable in-sample conservatism bound assuming Lipschitz continuity of the value functions in the uncertainty variables, for data-driven DR-MCO models with Wasserstein ambiguity sets.

  2. We discuss implementations of SSSO in the context of Wasserstein ambiguity sets containing infinitely supported probability distributions. Such SSSO allow DDP algorithms to converge, with provable complexity bounds, to an $\varepsilon$-globally optimal first-stage solution, which helps minimize the influence of suboptimality caused by early termination of the algorithms when we compare model performances.

  3. We present extensive numerical experiments, using multi-commodity inventory problems with either uncertain demands or uncertain prices and hydro-thermal power planning problems with real-world data, that compare the out-of-sample performance of our DR-MCO models against risk-neutral and CVaR risk-averse MSCO models, as well as the MRCO model. To the best of our knowledge, these are the first numerical experiments in the literature that compare DR-MCO models with MSCO and MRCO models when the uncertainty has infinitely many outcomes.

The rest of the paper is organized as follows. In Section 2, we define the DR-MCO model considered in this paper and show some of its favorable properties. In Section 3, we review DDP algorithms for DR-MCO and study the implementation of SSSO for Wasserstein ambiguity sets. In Section 4, we present numerical experiments comparing DR-MCO models against other baseline models on two application problems. We provide concluding remarks in Section 5.

2 Data-driven Model and Its Properties

2.1 Data-driven Model Formulation

In this section, we present a data-driven model for DR-MCO and some of its properties. Let $\mathcal{T} := \{1,\dots,T\}$ denote the set of stage indices. In each stage $t\in\mathcal{T}$, we use $\mathcal{X}_t\subset\mathbb{R}^{d_t}$ to denote the convex state space and $x_t$ its elements, known as the state vectors. We denote the set of uncertainties before stage $t$ as $\Xi_t\subseteq\mathbb{R}^{\delta_t}$ and its elements as $\xi_t$. For simplicity, we use the notation $\mathcal{X}_0 = \{x_0\}$ and $\Xi_1 = \{\xi_1\}$ to denote parameter sets of the given initial state. After each stage $t\in\mathcal{T}\setminus\{T\}$, the uncertainty $\xi_t$ is assumed to be distributed according to an unknown probability measure $p_t$ taken from a subset $\mathcal{P}_t\subset\mathcal{M}^{\mathrm{Prob}}(\Xi_t)$ of the probability measures supported on $\Xi_t$. The cost in each stage $t\in\mathcal{T}$ is described by a nonnegative, lower semicontinuous local cost function $f_t(x_{t-1},x_t;\xi_t)$, which is assumed to be convex in $x_{t-1}$ and $x_t$ for every $\xi_t\in\Xi_t$. We allow $f_t$ to take the value $+\infty$ to model constraints relating the states $x_{t-1},x_t$ and the uncertainty $\xi_t$ (so that $\mathcal{X}_t$ can be independent of $x_{t-1}$ and $\xi_t$). The DR-MCO can be written in a nested formulation as follows:

$$\begin{aligned} \inf_{x_1\in\mathcal{X}_1}\ & f_1(x_0,x_1;\xi_1) + \sup_{p_2\in\mathcal{P}_2}\mathbb{E}_{\xi_2\sim p_2}\Bigg[\inf_{x_2\in\mathcal{X}_2} f_2(x_1,x_2;\xi_2) \\ & +\sup_{p_3\in\mathcal{P}_3}\mathbb{E}_{\xi_3\sim p_3}\bigg[\inf_{x_3\in\mathcal{X}_3} f_3(x_2,x_3;\xi_3)+\cdots \\ & +\sup_{p_T\in\mathcal{P}_T}\mathbb{E}_{\xi_T\sim p_T}\Big[\inf_{x_T\in\mathcal{X}_T} f_T(x_{T-1},x_T;\xi_T)\Big]\cdots\bigg]\Bigg]. \end{aligned} \tag{1}$$

Here, $\mathbb{E}_{\xi_t\sim p_t}$ denotes the expectation with respect to the variable $\xi_t$ distributed according to the probability measure $p_t$. We remark that both the uncertainty sets $\Xi_t$ and the ambiguity sets $\mathcal{P}_t$ are independent between stages, which is usually referred to as stagewise independence.

Based on stagewise independence in the nested formulation, we can write the following recursion that is equivalent to (1) using the (worst-case expected) cost-to-go functions,

$$\mathcal{Q}_{t-1}(x_{t-1}) := \sup_{p_t\in\mathcal{P}_t}\mathbb{E}_{\xi_t\sim p_t}\bigg[\inf_{x_t\in\mathcal{X}_t} f_t(x_{t-1},x_t;\xi_t)+\mathcal{Q}_t(x_t)\bigg], \tag{2}$$

for each $t\in\mathcal{T}$, where we set by convention $\mathcal{Q}_T(x_T) := 0$ for any $x_T\in\mathcal{X}_T$. To simplify the notation, we also define the following value function for each stage $t\in\mathcal{T}$:

$$Q_t(x_{t-1};\xi_t) := \inf_{x_t\in\mathcal{X}_t} f_t(x_{t-1},x_t;\xi_t)+\mathcal{Q}_t(x_t). \tag{3}$$

Using these value functions, we may write the optimal value of the DR-MCO (1) as $Q_1(x_0;\xi_1)$ and further simplify the recursion (2) as

$$\mathcal{Q}_{t-1}(x_{t-1}) = \sup_{p_t\in\mathcal{P}_t}\int_{\Xi_t} Q_t(x_{t-1};\xi_t)\,\mathrm{d}p_t(\xi_t). \tag{4}$$

While there are many different choices of the ambiguity set $\mathcal{P}_t$ for each stage $t\in\mathcal{T}$ (see, e.g., [47]), we focus on data-driven ambiguity sets constructed as follows. Suppose we have the knowledge of $n_t$ samples $\hat{\xi}_{t,1},\dots,\hat{\xi}_{t,n_t}$ of the uncertainty $\xi_t$. The empirical probability measure is given by $\hat{\nu}_t := \frac{1}{n_t}\sum_{k=1}^{n_t}\delta_{\hat{\xi}_{t,k}}$, where for each $k=1,\dots,n_t$, $\delta_{\hat{\xi}_{t,k}}$ is the Dirac probability measure supported at the point $\hat{\xi}_{t,k}\in\Xi_t$, i.e., $\int_{\Xi_t} f\,\mathrm{d}\delta_{\hat{\xi}_{t,k}} = f(\hat{\xi}_{t,k})$ for any compactly supported function $f$ on $\Xi_t$. Such an empirical probability measure $\hat{\nu}_t$ captures the information from the sample data and is often used to build the sample average approximation for multistage stochastic optimization [43].

Given any distance function $d_t(\cdot,\cdot)$ on $\Xi_t$, the Wasserstein distance of order 1 (a.k.a. the Kantorovich-Rubinstein distance) is defined as

$$W_t(\mu,\nu) := \inf_{\pi\in\mathcal{M}^{\mathrm{Prob}}(\Xi_t\times\Xi_t)}\left\{\int_{\Xi_t\times\Xi_t} d_t(\xi^1,\xi^2)\,\mathrm{d}\pi(\xi^1,\xi^2) : P^1_*(\pi)=\mu,\ P^2_*(\pi)=\nu\right\}, \tag{5}$$

for any two probability measures $\mu,\nu\in\mathcal{M}^{\mathrm{Prob}}(\Xi_t)$, where $P^i_*(\pi)$ is the pushforward measure induced by the projection map $P^i:\Xi_t\times\Xi_t\to\Xi_t$ sending $P^i(\xi^1,\xi^2)=\xi^i$, for $i=1$ or $2$. That is, the joint probability measure $\pi$ in (5) has marginal probability measures equal to $\mu$ and $\nu$.

It can be shown that $W_t$ is indeed a distance on the space of probability measures $\mathcal{M}^{\mathrm{Prob}}(\Xi_t)$ [46, Definition 6.1], except that it may take the value $+\infty$. Thus it is natural to restrict our attention to the convex subset of probability measures with finite distance to a Dirac measure on $\Xi_t$:

$$\mathcal{W}_t := \left\{\mu\in\mathcal{M}^{\mathrm{Prob}}(\Xi_t) : \int_{\Xi_t} d_t(\bar{\xi},\xi)\,\mathrm{d}\mu(\xi) < +\infty \text{ for some } \bar{\xi}\in\Xi_t\right\}. \tag{6}$$

Note that any continuous function $g(\xi)$ satisfying $\lvert g(\xi)\rvert\leq C(1+d_t(\bar{\xi},\xi))$ for some $C>0$ and $\bar{\xi}\in\Xi_t$ is integrable with respect to any probability measure in $\mathcal{W}_t$. Now, given any such continuous functions $g_{t,1},\dots,g_{t,m_t}$ on $\Xi_t$ and a real vector $\rho_t := (\rho_{t,j})_{j=0}^{m_t}\in\mathbb{R}^{m_t+1}$, we define the Wasserstein ambiguity set $\mathcal{P}_t$ as

$$\mathcal{P}_t := \left\{p\in\mathcal{W}_t : W_t(p,\hat{\nu}_t)\leq\rho_{t,0},\ \langle g_{t,j},p\rangle\leq\rho_{t,j},\ j=1,\dots,m_t\right\}. \tag{7}$$

The first inequality constraint in the definition (7) bounds the Wasserstein distance of the probability measure $p\in\mathcal{W}_t$ from the empirical measure $\hat{\nu}_t$, while the second set of constraints on $p$, defined via $\langle g_{t,j},p\rangle := \int_{\Xi_t} g_{t,j}(\xi)\,\mathrm{d}p(\xi)$, could represent bounds on the moments.
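To make the definition (5) concrete in the discrete case relevant to (7), the following sketch computes the order-1 Wasserstein distance between two empirical measures by solving the transportation linear program; the uniform weights, the Euclidean ground distance, and the sample data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_1(xs, ys):
    """Order-1 Wasserstein distance between two uniform empirical measures
    supported on the rows of xs and ys (illustrative assumptions)."""
    n, m = len(xs), len(ys)
    # Ground distances d_t(xi^1, xi^2); here the Euclidean norm.
    D = np.linalg.norm(xs[:, None, :] - ys[None, :, :], axis=2)
    # Marginal constraints: row sums equal 1/n, column sums equal 1/m.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])
    res = linprog(D.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

rng = np.random.default_rng(0)
mu_samples = rng.normal(0.0, 1.0, size=(20, 2))
nu_samples = rng.normal(0.5, 1.0, size=(30, 2))
print(wasserstein_1(mu_samples, nu_samples))
```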

It is well known that the recursion (2) for data-driven Wasserstein DR-MCO problems can be reformulated as a finite-dimensional minimum-supremum optimization problem under the following assumption of strict feasibility.

Assumption 1.

The empirical probability measure $\hat{\nu}_t$ is a strictly feasible solution of (7), i.e., it satisfies $\langle g_{t,j},\hat{\nu}_t\rangle = \frac{1}{n_t}\sum_{k=1}^{n_t} g_{t,j}(\hat{\xi}_{t,k}) < \rho_{t,j}$ for all $j=1,\dots,m_t$.

Theorem 1.

Under Assumption 1, in any stage $t\geq 2$, the expected cost-to-go function (2) can be equivalently rewritten as

$$\mathcal{Q}_{t-1}(x_{t-1}) = \min_{\lambda\geq 0}\left\{\sum_{j=0}^{m_t}\rho_{t,j}\lambda_j + \frac{1}{n_t}\sum_{k=1}^{n_t}\sup_{\xi_k\in\Xi_t}\left\{Q_t(x_{t-1};\xi_k)-\lambda_0 d_{t,k}(\xi_k)-\sum_{j=1}^{m_t}\lambda_j g_{t,j}(\xi_k)\right\}\right\}, \tag{8}$$

where $d_{t,k}(\xi) := d_t(\xi,\hat{\xi}_{t,k})$ denotes the distance to the $k$-th sample point.

We provide the derivation and proof details in Section A. As a corollary, we can prove a special version of the Kantorovich-Rubinstein duality formula [46, Remark 6.5].
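As a sanity check on the min-sup structure of (8), the following sketch evaluates the right-hand side for a toy one-dimensional value function, under illustrative assumptions: $Q_t(x;\xi)=\lvert x-\xi\rvert$ on $\Xi_t=[0,1]$, no moment constraints ($m_t=0$), and $d_{t,k}(\xi)=\lvert\xi-\hat{\xi}_{t,k}\rvert$; the inner suprema are taken on a grid and the outer minimization uses a bounded scalar search.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative assumptions: Q_t(x; xi) = |x - xi| on Xi_t = [0, 1],
# no moment constraints (m_t = 0), and d_{t,k}(xi) = |xi - xi_hat_k|.
def worst_case_ctg(x, xi_hat, rho0, grid=np.linspace(0.0, 1.0, 1001)):
    def outer(lam0):
        # Inner suprema over xi for each sample point, evaluated on a grid.
        sups = [np.max(np.abs(x - grid) - lam0 * np.abs(grid - s))
                for s in xi_hat]
        return rho0 * lam0 + np.mean(sups)
    # Q_t is 1-Lipschitz in xi, so lambda_0 in [0, 1] suffices (cf. Corollary 2).
    res = minimize_scalar(outer, bounds=(0.0, 1.0), method="bounded")
    return res.fun

xi_hat = np.array([0.2, 0.5, 0.9])  # hypothetical samples
print(worst_case_ctg(x=0.3, xi_hat=xi_hat, rho0=0.1))
```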

Corollary 2.

Under Assumption 1, if the value function $Q_t(x_{t-1};\xi_t)$ is $l_t$-Lipschitz continuous in the uncertainty $\xi_t\in\Xi_t$ for any $x_{t-1}\in\mathcal{X}_{t-1}$, then we have

$$\mathcal{Q}_{t-1}(x_{t-1}) \leq \rho_{t,0} l_t + \frac{1}{n_t}\sum_{k=1}^{n_t} Q_t(x_{t-1};\hat{\xi}_{t,k}).$$
Proof.

Take the feasible solution $\lambda_0 = l_t$ and $\lambda_j = 0$ for $j=1,\dots,m_t$ in (8) of Theorem 1. By the Lipschitz continuity assumption, $Q_t(x_{t-1};\xi) - l_t d_{t,k}(\xi) \leq Q_t(x_{t-1};\hat{\xi}_{t,k})$ for all $\xi\in\Xi_t$, so each supremum is attained at $\hat{\xi}_{t,k}$, $k=1,\dots,n_t$. ∎

2.2 Out-of-Sample Performance Guarantee

A major motivation for using Wasserstein DR-MCO models is the out-of-sample performance guarantee, which ensures that, with high probability, the decisions evaluated on the true probability distribution perform no worse than in the in-sample training. To begin with, we say that a probability measure $\mu\in\mathcal{W}_t$ is sub-Gaussian if $\int_{\Xi_t}\exp(C\lVert\xi\rVert^2)\,\mathrm{d}\mu(\xi)<+\infty$ for some constant $C>0$, and that it has finite third moments if $\int_{\Xi_t}\lVert\xi\rVert^3\,\mathrm{d}\mu(\xi)<+\infty$. Our discussion is based on the following specialized version of the measure concentration inequality [16, Theorem 2].

Theorem 3.

Fix any probability measure $\nu_t\in\mathcal{M}^{\mathrm{Prob}}(\Xi_t)$ in stage $t$ and let $\hat{\nu}_t$ denote the empirical measure constructed from $n_t$ independent and identically distributed (iid) samples of $\nu_t$. Then for any $\rho_{t,0}>1$, we have

$$\mathbb{P}\left(W_t(\nu_t,\hat{\nu}_t)>\rho_{t,0}\right) \leq C_t\exp\left(-C'_t n_t\rho_{t,0}^{\max\{\delta_t,2\}}\right), \quad\text{if $\nu_t$ is sub-Gaussian and $\delta_t\neq 2$,} \tag{9}$$
$$\mathbb{P}\left(W_t(\nu_t,\hat{\nu}_t)>\rho_{t,0}\right) \leq C''_t\left(n_t\rho_{t,0}^2\right)^{-1}, \quad\text{if $\nu_t$ has finite third moments,} \tag{10}$$

for some positive constants $C_t, C'_t, C''_t > 0$ that depend only on $\nu_t$.

The measure concentration bound in (9) becomes slightly more intricate when the dimension of the uncertainty $\delta_t = 2$ (see the details in [16]), so we focus our discussion below on the other cases.

The out-of-sample performance refers to the evaluation of the solutions and policies obtained from solving the DR-MCO (1) on the true probability measures $\nu_t$, $t\in\mathcal{T}$. To be precise, fix any optimal policy given by the DR-MCO (1), i.e., an optimal initial stage state $x_1^*\in\arg\min_{x_1\in\mathcal{X}_1}\{f_1(x_0,x_1;\xi_1)+\mathcal{Q}_1(x_1)\}$, and a collection of mappings $x_t^*:\mathcal{X}_{t-1}\times\Xi_t\to\mathcal{X}_t$ for $t=2,\dots,T$, such that

  1. $x_t^*(x_{t-1},\cdot)$ is Borel measurable for any $x_{t-1}\in\mathcal{X}_{t-1}$;

  2. $x_t^*(x_{t-1},\xi_t)\in\arg\min_{x_t\in\mathcal{X}_t}\{f_t(x_{t-1},x_t;\xi_t)+\mathcal{Q}_t(x_t)\}$ for any $x_{t-1}\in\mathcal{X}_{t-1}$ and $\xi_t\in\Xi_t$.

We define recursively the out-of-sample evaluation cost-to-go functions as

$$\mathcal{Q}_{t-1}^{\mathrm{Eval}}(x_{t-1}) := \int_{\Xi_t}\bigl[f_t(x_{t-1},x_t^*(x_{t-1},\xi_t);\xi_t)+\mathcal{Q}_t^{\mathrm{Eval}}(x_t^*(x_{t-1},\xi_t))\bigr]\,\mathrm{d}\nu_t(\xi_t), \tag{11}$$

with $\mathcal{Q}_T^{\mathrm{Eval}}(x_T)=0$ for any $x_T\in\mathcal{X}_T$. Then the out-of-sample evaluation mean cost associated with this policy is defined as $v^{\mathrm{Eval}} := f_1(x_0,x_1^*;\xi_1)+\mathcal{Q}_1^{\mathrm{Eval}}(x_1^*)$. The next theorem provides a lower bound on the probability that the event $v^{\mathrm{Eval}}\leq Q_1(x_0;\xi_1)$ happens, which is often known as the out-of-sample performance guarantee.
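In experiments, the quantity $v^{\mathrm{Eval}}$ defined via (11) is typically estimated by Monte Carlo rollouts of the policy on fresh samples from the true measures; a minimal sketch, assuming hypothetical callables `policy`, `cost`, and per-stage samplers, is given below.

```python
import numpy as np

def estimate_v_eval(policy, cost, samplers, x1_star, f1_val, T, n_paths=1000):
    """Monte Carlo estimate of the out-of-sample mean cost v_Eval.
    Hypothetical interfaces: policy(t, x_prev, xi) -> x_t implements x_t^*,
    cost(t, x_prev, x, xi) -> f_t, and samplers[t]() draws one xi_t ~ nu_t."""
    totals = np.zeros(n_paths)
    for i in range(n_paths):
        x_prev, total = x1_star, f1_val
        for t in range(2, T + 1):
            xi = samplers[t]()           # xi_t ~ nu_t (stagewise independent)
            x = policy(t, x_prev, xi)    # x_t^*(x_{t-1}, xi_t)
            total += cost(t, x_prev, x, xi)
            x_prev = x
        totals[i] = total
    return totals.mean(), totals.std() / np.sqrt(n_paths)
```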

Theorem 4.

Fix any probability measure $\nu_t\in\mathcal{W}_t$ and let $\hat{\nu}_t$ denote the empirical measure from $n_t$ iid samples of $\nu_t$, for all stages $t\in\mathcal{T}$. Assume that $\langle g_{t,j},\nu_t\rangle\leq\rho_{t,j}$ for $j=1,\dots,m_t$ and $\rho_{t,0}>1$ for all $t\in\mathcal{T}$. Then for any $\alpha\in(0,1)$, we have $v^{\mathrm{Eval}}\leq Q_1(x_0;\xi_1)$ with probability at least $\alpha$ if either of the following conditions holds for each $t\in\mathcal{T}$:

  1. the probability measure $\nu_t$ is sub-Gaussian, $\delta_t\neq 2$, and
$$n_t\cdot\rho_{t,0}^{\max\{\delta_t,2\}} \geq \frac{1}{C'_t}\left[\ln C_t - \ln\left(1-\alpha^{1/(T-1)}\right)\right];$$

  2. the probability measure $\nu_t$ has finite third moments and
$$n_t\cdot\rho_{t,0}^2 \geq \frac{C''_t}{1-\alpha^{1/(T-1)}};$$

where $C_t, C'_t$, and $C''_t$ are the positive constants in Theorem 3 that depend only on $\nu_t$.

Proof.

If either of the conditions is satisfied, then it is straightforward to check from Theorem 3 that $\mathbb{P}(W_t(\nu_t,\hat{\nu}_t)>\rho_{t,0})\leq 1-\alpha^{1/(T-1)}$. By the assumption of iid sampling for $\hat{\nu}_t$ and the assumption that $\langle g_{t,j},\nu_t\rangle\leq\rho_{t,j}$ for $j=1,\dots,m_t$, the event $E := \{\nu_t\in\mathcal{P}_t \text{ for } t=2,\dots,T\}$ has probability

$$\mathbb{P}(E) = \mathbb{P}\{W_t(\nu_t,\hat{\nu}_t)\leq\rho_{t,0} \text{ for all } t=2,\dots,T\} = \prod_{t=2}^T\mathbb{P}\left(W_t(\nu_t,\hat{\nu}_t)\leq\rho_{t,0}\right) \geq \alpha.$$

To complete the proof, we claim that on the event $E$ we have $\mathcal{Q}_t^{\mathrm{Eval}}(x_t)\leq\mathcal{Q}_t(x_t)$ for all $x_t\in\mathcal{X}_t$ and $t\in\mathcal{T}$. Note that $\mathcal{Q}_T^{\mathrm{Eval}}(x_T)=\mathcal{Q}_T(x_T)=0$ for all $x_T\in\mathcal{X}_T$. Now if $\mathcal{Q}_t^{\mathrm{Eval}}(x_t)\leq\mathcal{Q}_t(x_t)$ for all $x_t\in\mathcal{X}_t$, then

$$\begin{aligned} f_t(x_{t-1},x_t^*(x_{t-1},\xi_t);\xi_t)+\mathcal{Q}_t^{\mathrm{Eval}}(x_t^*(x_{t-1},\xi_t)) &\leq f_t(x_{t-1},x_t^*(x_{t-1},\xi_t);\xi_t)+\mathcal{Q}_t(x_t^*(x_{t-1},\xi_t)) \\ &= \min_{x_t\in\mathcal{X}_t}\{f_t(x_{t-1},x_t;\xi_t)+\mathcal{Q}_t(x_t)\} \\ &= Q_t(x_{t-1};\xi_t). \end{aligned}$$

Therefore, on the event $E$, we have

$$\begin{aligned} \mathcal{Q}_{t-1}^{\mathrm{Eval}}(x_{t-1}) &= \int_{\Xi_t}\bigl[f_t(x_{t-1},x_t^*(x_{t-1},\xi_t);\xi_t)+\mathcal{Q}_t^{\mathrm{Eval}}(x_t^*(x_{t-1},\xi_t))\bigr]\,\mathrm{d}\nu_t(\xi_t) \\ &\leq \int_{\Xi_t}Q_t(x_{t-1};\xi_t)\,\mathrm{d}\nu_t(\xi_t) \leq \sup_{p_t\in\mathcal{P}_t}\int_{\Xi_t}Q_t(x_{t-1};\xi_t)\,\mathrm{d}p_t(\xi_t) = \mathcal{Q}_{t-1}(x_{t-1}). \end{aligned}$$

This recursion shows that $\mathcal{Q}_1^{\mathrm{Eval}}(x_1)\leq\mathcal{Q}_1(x_1)$ for all $x_1\in\mathcal{X}_1$, and thus

$$v^{\mathrm{Eval}} = f_1(x_0,x_1^*;\xi_1)+\mathcal{Q}_1^{\mathrm{Eval}}(x_1^*) \leq \min_{x_1\in\mathcal{X}_1}\{f_1(x_0,x_1;\xi_1)+\mathcal{Q}_1(x_1)\} = Q_1(x_0;\xi_1). \qquad ∎$$

While Theorem 4 is a direct consequence of Theorem 3, it reveals some interesting aspects of DR-MCO with Wasserstein ambiguity sets. First, to attain a given probabilistic level for the out-of-sample performance guarantee, we may need to increase the number of samples $n_t$ or the Wasserstein distance bound $\rho_{t,0}$ as the number of stages $T$ grows. Second, for probability measures that are not sub-Gaussian (or more generally those with heavy tails, see [15]), to which we would apply the second condition, such an increase requires the product $n_t\cdot\rho_{t,0}^2$ to grow approximately on the order of $\mathcal{O}(T)$ when $\alpha$ is close to 1. Therefore, it is sometimes useful to take larger values of $\rho_{t,0}$ for out-of-sample performance guarantees, especially when the number of samples $n_t$ is limited or when the true probability measures $\nu_t$ are not sub-Gaussian.
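As a quick illustration of the second condition in Theorem 4, the sketch below computes the smallest admissible Wasserstein radius $\rho_{t,0}$ for given $n_t$, $T$, and $\alpha$; since the constant $C''_t$ is unknown in practice, the value used here is a placeholder assumption.

```python
import math

def required_radius(n_t, T, alpha, C2_t):
    """Smallest rho_{t,0} with n_t * rho^2 >= C2_t / (1 - alpha^(1/(T-1))),
    where C2_t is a placeholder for the unknown constant C''_t of Theorem 3.
    Theorem 4 also assumes rho_{t,0} > 1, hence the max with 1."""
    rho = math.sqrt(C2_t / (n_t * (1.0 - alpha ** (1.0 / (T - 1)))))
    return max(rho, 1.0)

# The required radius grows roughly like sqrt(T / n_t) as alpha -> 1.
for T in (4, 13, 25):
    print(T, required_radius(n_t=100, T=T, alpha=0.95, C2_t=1.0))
```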

2.3 Adjustable In-Sample Conservatism

Another well-studied approach to guaranteeing out-of-sample performance is the multistage robust convex optimization (MRCO) model [18, 48]. However, MRCO considers only the worst-case outcomes of the uncertainties and thus can be overly conservative. To be precise, we define the nominal MSCO from the empirical measures $\hat{\nu}_t$, without risk aversion, by the recursion

$$\mathcal{Q}_{t-1}^{\mathrm{Nomin}}(x_{t-1}) := \int_{\Xi_t}Q_t^{\mathrm{Nomin}}(x_{t-1};\xi_t)\,\mathrm{d}\hat{\nu}_t(\xi_t) = \frac{1}{n_t}\sum_{k=1}^{n_t}Q_t^{\mathrm{Nomin}}(x_{t-1};\hat{\xi}_{t,k}), \tag{12}$$

where

$$Q_t^{\mathrm{Nomin}}(x_{t-1};\xi_t) := \inf_{x_t\in\mathcal{X}_t}f_t(x_{t-1},x_t;\xi_t)+\mathcal{Q}_t^{\mathrm{Nomin}}(x_t), \tag{13}$$

and $\mathcal{Q}_T^{\mathrm{Nomin}}(x_T)=0$ for any $x_T\in\mathcal{X}_T$. Any MRCO built directly from the data could have a much larger optimal cost than the nominal MSCO, with a probability growing in the number of samples $n_t$ and the number of stages $T$, as illustrated by the following example.

Example 1.

Consider an MSCO with local cost functions $f_t(x_{t-1},x_t;\xi_t) := x_t+\xi_t$ and state spaces $\mathcal{X}_t := [0,1]\subseteq\mathbb{R}$ for all $t\in\mathcal{T}$. For each $t\geq 2$, the uncertainties are described by a probability measure $\nu_t$ on the set $\Xi_t := \mathbb{R}_{\geq 0}$ such that $\nu_t(C,+\infty)>0$ for any $C>0$. Now, to approximate this MSCO, suppose we are given iid samples $\hat{\xi}_{t,1},\dots,\hat{\xi}_{t,n_t}$ from the probability measure $\nu_t$ in each stage $t\geq 2$. Then any MRCO defined by the following recursion of expected cost-to-go functions

$$\mathcal{Q}_{t-1}^{\mathrm{Robust}}(x_{t-1}) := \sup_{\xi_t\in\hat{\Xi}_t}\min_{x_t\in\mathcal{X}_t}f_t(x_{t-1},x_t;\xi_t)+\mathcal{Q}_t^{\mathrm{Robust}}(x_t), \quad\forall\, t\geq 2,$$

where $\hat{\Xi}_t\supseteq\{\hat{\xi}_{t,1},\dots,\hat{\xi}_{t,n_t}\}$ is an uncertainty subset constructed from the data, would have its optimal value $Q_1^{\mathrm{Robust}}(x_0;\xi_1)\geq x_0+\xi_1+\sum_{t=2}^T\hat{\xi}_t^{\max}$, where $\hat{\xi}_t^{\max} := \max\{\hat{\xi}_{t,1},\dots,\hat{\xi}_{t,n_t}\}$. The optimal value of the corresponding nominal MSCO is $Q_1^{\mathrm{Nomin}}(x_0;\xi_1) = x_0+\xi_1+\sum_{t=2}^T\hat{\xi}_t^{\mathrm{mean}}$, where $\hat{\xi}_t^{\mathrm{mean}} := \frac{1}{n_t}\sum_{k=1}^{n_t}\hat{\xi}_{t,k}$. Therefore, for any constant $C > \max_{t=2,\dots,T}\mathbb{E}[\xi_t]$, we can show that

$$\mathbb{P}\left\{Q_1^{\mathrm{Robust}}(x_0;\xi_1)-Q_1^{\mathrm{Nomin}}(x_0;\xi_1)>C\right\} \geq 1-\prod_{t=2}^T\left((\nu_t[0,2C])^{n_t}+\mathbb{P}\left\{\hat{\xi}_t^{\mathrm{mean}}>C\right\}\right),$$

which tends to 1 as $n_t\to\infty$.
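A small simulation can make Example 1 concrete; the exponential choice of $\nu_t$ and all parameter values are illustrative assumptions.

```python
import numpy as np

# Illustrative simulation of Example 1: nu_t = Exp(1), so E[xi_t] = 1.
rng = np.random.default_rng(1)
T, n_t, trials = 10, 50, 2000
gaps = np.empty(trials)
for i in range(trials):
    xi = rng.exponential(1.0, size=(T - 1, n_t))   # samples for stages 2..T
    # Q1_Robust - Q1_Nomin >= sum_t (max_k xi_{t,k} - mean_k xi_{t,k})
    gaps[i] = (xi.max(axis=1) - xi.mean(axis=1)).sum()
print("P(gap > C) for C = T - 1:", (gaps > T - 1).mean())
```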

In contrast, the difference between the optimal values of the DR-MCO and the nominal MSCO, $Q_1(x_0;\xi_1)-Q_1^{\mathrm{Nomin}}(x_0;\xi_1)$, can be bounded by the Wasserstein distance bounds $\rho_{t,0}$, as shown below.

Theorem 5.

Suppose the value function $Q_t(x_{t-1};\xi_t)$ is $l_t$-Lipschitz continuous in $\xi_t\in\Xi_t$ for any $x_{t-1}\in\mathcal{X}_{t-1}$, for each stage $t\in\mathcal{T}$. Under Assumption 1, the difference of optimal values between the DR-MCO and the nominal MSCO satisfies

$$Q_1(x_0;\xi_1)-Q_1^{\mathrm{Nomin}}(x_0;\xi_1) \leq \sum_{t=2}^T l_t\rho_{t,0}.$$
Proof.

We first observe that if there exists $\varepsilon\geq 0$ such that $\sup_{x_t\in\mathcal{X}_t}\lvert\mathcal{Q}_t(x_t)-\mathcal{Q}_t^{\mathrm{Nomin}}(x_t)\rvert\leq\varepsilon$, then the value functions satisfy $Q_t(x_{t-1};\xi_t)-Q_t^{\mathrm{Nomin}}(x_{t-1};\xi_t)\leq\varepsilon$ by the definitions (3) and (13). Now we prove by backward recursion that $\mathcal{Q}_t(x_t)-\mathcal{Q}_t^{\mathrm{Nomin}}(x_t)\leq\sum_{s>t}l_s\rho_{s,0}$ for any $x_t\in\mathcal{X}_t$, which holds trivially for $t=T$. For any $t\in\mathcal{T}$, we have

$$\begin{aligned} \mathcal{Q}_{t-1}(x_{t-1})-\mathcal{Q}_{t-1}^{\mathrm{Nomin}}(x_{t-1}) ={} & \left(\sup_{p_t\in\mathcal{P}_t}\int_{\Xi_t}Q_t(x_{t-1};\xi_t)\,\mathrm{d}p_t(\xi_t)-\int_{\Xi_t}Q_t(x_{t-1};\xi_t)\,\mathrm{d}\hat{\nu}_t(\xi_t)\right) \\ & +\int_{\Xi_t}\left(Q_t(x_{t-1};\xi_t)-Q_t^{\mathrm{Nomin}}(x_{t-1};\xi_t)\right)\mathrm{d}\hat{\nu}_t(\xi_t) \\ \leq{} & \ l_t\rho_{t,0}+\sum_{s>t}l_s\rho_{s,0} = \sum_{s>t-1}l_s\rho_{s,0}, \end{aligned}$$

where the first part before the inequality is bounded by $l_t\rho_{t,0}$ using Corollary 2, and the second part is bounded by the observation above. This recursion shows that $\mathcal{Q}_1(x_1)-\mathcal{Q}_1^{\mathrm{Nomin}}(x_1)\leq\sum_{t=2}^T l_t\rho_{t,0}$, which completes the proof by the same observation. ∎

Theorem 5 shows that the conservatism of the DR-MCO can be adjusted linearly via the Wasserstein distance bounds $\rho_{t,0}$, assuming Lipschitz continuity of the value functions in the uncertainties. However, as the out-of-sample performance guarantee in Theorem 4 depends on the unknown constants of Theorem 3, it is not easy to determine the Wasserstein distance bounds $\rho_{t,0}$ numerically. We discuss some practical choices of the bounds $\rho_{t,0}$ in Section 4.

3 Dual Dynamic Programming Algorithm

In this section, we first review the recursive cutting plane approximations and the dual dynamic programming (DDP) algorithm. Then we focus on different realizations of the single stage subproblem oracles (SSSO) for DR-MCO with Wasserstein ambiguity sets that would guarantee the convergence of the DDP algorithm.

3.1 Recursive Approximations and Regularization

Recall that for any convex function $\mathcal{Q}:\mathcal{X}\to\mathbb{R}\cup\{+\infty\}$, an affine function $\mathcal{V}:\mathcal{X}\to\mathbb{R}$ is called a (valid) linear cut if $\mathcal{Q}(x)\geq\mathcal{V}(x)$ for all $x\in\mathcal{X}$. A collection of such valid linear cuts $\{\mathcal{V}^j\}_{1\leq j\leq i}$ defines a valid under-approximation $\underline{\mathcal{Q}}^i(x) := \max_{1\leq j\leq i}\mathcal{V}^j(x)$ of $\mathcal{Q}(x)$. Similarly, by convexity, given a collection of overestimate values $v^j\geq\mathcal{Q}(x^j)$ for $j=1,\dots,i$, we can define a valid over-approximation by the convex envelope $\overline{\mathcal{Q}}^i(x) := \mathrm{conv}_{1\leq j\leq i}(v^j+\iota_{x^j}(x))$, where $\iota_{x^j}(x)$, the convex indicator function centered at $x^j$, equals $0$ when $x=x^j$ and $+\infty$ otherwise. The validity of these approximations, $\underline{\mathcal{Q}}(x)\leq\mathcal{Q}(x)\leq\overline{\mathcal{Q}}(x)$ for all $x\in\mathcal{X}$, suggests that we may use them in place of $\mathcal{Q}$ for recursive updates during a stagewise decomposition algorithm.
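As a small illustration (with made-up cuts, points, and values), the under-approximation is a pointwise maximum of affine cuts, while evaluating the over-approximation amounts to solving a small linear program for the convex envelope:

```python
import numpy as np
from scipy.optimize import linprog

def under_approx(x, cuts):
    """Pointwise max of linear cuts; each cut is (gradient g, intercept b)
    representing V(x) = b + g @ x."""
    return max(b + g @ x for g, b in cuts)

def over_approx(x, points, values):
    """Convex envelope conv_j(v_j + indicator at x_j): minimize sum mu_j v_j
    over weights mu >= 0 with sum mu = 1 and sum mu_j x_j = x; it is +inf
    outside the convex hull of the recorded points."""
    n, d = points.shape
    A_eq = np.vstack([points.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(values, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun if res.success else np.inf

cuts = [(np.array([1.0, 0.0]), 0.0), (np.array([-1.0, 1.0]), 0.5)]
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
vals = np.array([2.0, 3.0, 2.5])
x = np.array([0.25, 0.25])
print(under_approx(x, cuts), over_approx(x, pts, vals))
```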

To see how the recursive updates may work, let us assume temporarily in this subsection that the worst-case probability measure $p_t^*$ in (4) exists and can be found. Given any under-approximation $\underline{\mathcal{Q}}_t$ of $\mathcal{Q}_t$, for any $\hat{\xi}_t\in\Xi_t$ we can generate a linear cut $V_t(x_{t-1};\hat{\xi}_t) := \underline{v}(\hat{\xi}_t)+u(\hat{\xi}_t)^{\mathsf{T}}(x_{t-1}-\tilde{x}_{t-1})$ for the value function $Q_t(x_{t-1};\hat{\xi}_t)$ at some $\tilde{x}_{t-1}\in\mathcal{X}_{t-1}$, where $\underline{v}(\hat{\xi}_t)$ and $u(\hat{\xi}_t)$ are the optimal value and an optimal dual solution of the following Lagrangian dual of the under-approximation problem parametrized by $\hat{\xi}_t$:

$$\underline{v}(\hat{\xi}_t) := \sup_{u\in\mathbb{R}^{d_{t-1}}}\min\left\{f_t(z_t,x_t;\hat{\xi}_t)+\underline{\mathcal{Q}}_t(x_t)+u^{\mathsf{T}}(\tilde{x}_{t-1}-z_t) : x_t\in\mathcal{X}_t,\ z_t\in\mathbb{R}^{d_{t-1}}\right\}, \tag{14}$$

assuming that $u(\hat{\xi}_t)$ exists. Then we can aggregate these linear cuts $V_t(x_{t-1};\xi_t)$ into a linear cut $\mathcal{V}_{t-1}(\cdot) := \mathbb{E}_{\xi_t\sim p_t^*}V_t(\cdot;\xi_t)$ for $\mathcal{Q}_{t-1}$, where the expectation is taken componentwise with respect to the probability measure $p_t^*$. In this way, we can use the under-approximation $\underline{\mathcal{Q}}_t$ of $\mathcal{Q}_t$ to update the under-approximation $\underline{\mathcal{Q}}_{t-1}$ of $\mathcal{Q}_{t-1}$ with the aggregated linear cut $\mathcal{V}_{t-1}$.

Likewise, given any over-approximation $\overline{\mathcal{Q}}_t$ of $\mathcal{Q}_t$, for any $\hat{\xi}_t\in\Xi_t$ and $\tilde{x}_{t-1}\in\mathcal{X}_{t-1}$, we can solve the following over-estimation problem

$$\bar{v}(\hat{\xi}_t) := \min\left\{f_t(\tilde{x}_{t-1},x_t;\hat{\xi}_t)+\overline{\mathcal{Q}}_t(x_t) : x_t\in\mathcal{X}_t\right\}, \tag{15}$$

which gives an overestimate value $\bar{v}(\hat{\xi}_t)\geq Q_t(\tilde{x}_{t-1};\hat{\xi}_t)$. Again, we can aggregate the overestimate values by setting $v_{t-1} := \mathbb{E}_{\xi_t\sim p_t^*}\bar{v}(\xi_t)$, which by definition satisfies $v_{t-1}\geq\mathcal{Q}_{t-1}(\tilde{x}_{t-1})$ and thus can be used to update the over-approximation $\overline{\mathcal{Q}}_{t-1}$ of $\mathcal{Q}_{t-1}$.

There are, however, some potential issues with this recursive approximation method. First, the supremum in the Lagrangian dual problem (14) may not be attained, which could happen if $\underline{v}(\hat{\xi}_t)=+\infty$ (i.e., $\tilde{x}_{t-1}$ is an infeasible state for $\hat{\xi}_t$) or if any neighborhood of $\tilde{x}_{t-1}$ contains such an infeasible state. In this case, we may fail to generate a linear cut $V_t(\cdot;\hat{\xi}_t)$. Second, the Lipschitz constants of the linear cuts $\mathcal{V}_{t-1}$ may be affected by the under-approximation $\underline{\mathcal{Q}}_t$, leading to worse approximation quality. This may happen when $\tilde{x}_{t-1}$ is an extreme point of $\mathcal{X}_{t-1}$. In fact, it is shown in [48] that the Lipschitz constants of $\underline{\mathcal{Q}}_t$ could exceed those of $\mathcal{Q}_t$ and grow with the total number of stages $T$. Third, the over-approximation function evaluates to $+\infty$ at any point outside the convex hull of previously visited points, which makes the gap $\overline{\mathcal{Q}}_t(x)-\underline{\mathcal{Q}}_t(x)$ less useful as an estimate of the quality of the current solutions or policies.

To remedy these issues, we consider a technique called Lipschitzian regularization. Given regularization factors $M_t>0$, we define the regularized local cost function as

$$f_t^{\mathrm{R}}(x_{t-1},x_t;\xi_t) := \inf_{z_t\in\mathbb{R}^{d_{t-1}}}f_t(z_t,x_t;\xi_t)+M_t\lVert x_{t-1}-z_t\rVert, \tag{16}$$

and the regularized value function

$$Q_t^{\mathrm{R}}(x_{t-1};\xi_t) := \min_{x_t\in\mathcal{X}_t}f_t^{\mathrm{R}}(x_{t-1},x_t;\xi_t)+\mathcal{Q}_t^{\mathrm{R}}(x_t), \tag{17}$$

recursively for $t=T,T-1,\dots,2$, where $\mathcal{Q}_t^{\mathrm{R}}$ is the regularized expected cost-to-go function defined as

$$\mathcal{Q}_t^{\mathrm{R}}(x_t) := \sup_{p_{t+1}\in\mathcal{P}_{t+1}}\mathbb{E}_{\xi_{t+1}\sim p_{t+1}}Q_{t+1}^{\mathrm{R}}(x_t;\xi_{t+1}), \tag{18}$$

for $t\leq T-1$, and $\mathcal{Q}_T^{\mathrm{R}}(x_T)\equiv 0$ for any $x_T\in\mathcal{X}_T$. It is then straightforward to check that $Q_t^{\mathrm{R}}(x_{t-1};\xi_t)$ is uniformly $M_t$-Lipschitz continuous in $x_{t-1}$ for each $\xi_t\in\Xi_t$, and consequently $\mathcal{Q}_{t-1}^{\mathrm{R}}$ is also $M_t$-Lipschitz continuous. In the definitions (14) and (15), we can accordingly replace the original cost function $f_t$ with the regularized cost function $f_t^{\mathrm{R}}$, which guarantees the $M_t$-Lipschitz continuity of the generated cuts and allows us to enhance the over-approximation to be $M_t$-Lipschitz continuous.
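The regularization (16) is an infimal convolution with $M_t\lVert\cdot\rVert$, which caps the slopes of the cost at $M_t$; a one-dimensional grid sketch (with an illustrative piecewise-linear cost) is given below.

```python
import numpy as np

def lipschitz_regularize(f_vals, grid, M):
    """Infimal convolution f_R(x) = min_z f(z) + M * |x - z| on a grid,
    as in (16); the output is M-Lipschitz and lies below f everywhere."""
    X = grid[:, None]   # evaluation points x
    Z = grid[None, :]   # minimization variable z
    return np.min(f_vals[None, :] + M * np.abs(X - Z), axis=1)

grid = np.linspace(0.0, 1.0, 201)
f = 10.0 * np.abs(grid - 0.5)        # a steep convex cost (slope 10)
f_R = lipschitz_regularize(f, grid, M=2.0)
slopes = np.abs(np.diff(f_R)) / np.diff(grid)
print(slopes.max() <= 2.0 + 1e-9, np.all(f_R <= f + 1e-12))
```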

Lipschitzian regularization in general only gives under-approximations of the true value and expected cost-to-go functions. We need the following assumption to preserve the optimality and feasibility of the solutions.

Assumption 2.

For the given regularization factors $M_t>0$, $t\in\mathcal{T}$, the optimal value of the regularized DR-MCO satisfies

$$\min_{x_1\in\mathcal{X}_1}f_1(x_0,x_1;\xi_1)+\mathcal{Q}_1(x_1) = \min_{x_1\in\mathcal{X}_1}f_1(x_0,x_1;\xi_1)+\mathcal{Q}_1^{\mathrm{R}}(x_1),$$

and the sets of optimal first-stage solutions are the same, i.e., $\arg\min\{f_1(x_0,x_1;\xi_1)+\mathcal{Q}_1^{\mathrm{R}}(x_1) : x_1\in\mathcal{X}_1\} = \arg\min\{f_1(x_0,x_1;\xi_1)+\mathcal{Q}_1(x_1) : x_1\in\mathcal{X}_1\}$.

We remark, by the following proposition, that Assumption 2 is satisfied by any problem that already has uniformly Lipschitz continuous value functions $Q_t(\cdot;\xi_t)$ for all $\xi_t\in\Xi_t$.

Proposition 6 (Proposition 4, [48]).

Suppose each state space $\mathcal{X}_t\subseteq\mathbb{R}^{d_t}$ is full dimensional, i.e., $\mathrm{int}\,\mathcal{X}_t\neq\varnothing$. Then Assumption 2 holds if, for each stage $t\geq 2$, the value function $Q_t(\cdot;\xi_t)$ is $M_t$-Lipschitz continuous for any $\xi_t\in\Xi_t$.

In general, we can execute the DDP algorithm even with $M_t=+\infty$, as described below.

3.2 Dual Dynamic Programming Algorithm

DDP algorithms generally refer to recursive cutting plane algorithms that exploit the stagewise independence structure. We first review the definitions of single stage subproblem oracles (SSSO), which abstract the subroutines solving the subproblem in each stage with under- and over-approximations of the expected cost-to-go functions [48].

Definition 1 (Initial stage subproblem oracle).

Let $\underline{\mathcal{Q}}_1,\overline{\mathcal{Q}}_1:\mathcal{X}_1\to\bar{\mathbb{R}}$ denote two lsc convex functions, representing an under-approximation and an over-approximation of the cost-to-go function $\mathcal{Q}_1^{\mathrm{R}}$ in (18), respectively. Consider the following subproblem for the first stage $t=1$:

$$\min_{x_1\in\mathcal{X}_1}f_1(x_0,x_1;\xi_1)+\underline{\mathcal{Q}}_1(x_1). \tag{I}$$

The initial stage subproblem oracle provides an optimal solution $x_1$ to (I) and calculates the approximation gap $\gamma_1 := \overline{\mathcal{Q}}_1(x_1)-\underline{\mathcal{Q}}_1(x_1)$ at the solution. We thus define the subproblem oracle formally as a map $\mathscr{O}_1:(\underline{\mathcal{Q}}_1,\overline{\mathcal{Q}}_1)\mapsto(x_1;\gamma_1)$.

Definition 2 (Noninitial stage subproblem oracle).

Fix any regularization parameter $M_t>0$. Let $\underline{\mathcal{Q}}_t,\overline{\mathcal{Q}}_t:\mathcal{X}_t\to\bar{\mathbb{R}}$ denote two lsc convex functions, representing an under-approximation and an over-approximation of the cost-to-go function $\mathcal{Q}_t^{\mathrm{R}}$ in (18), respectively, for some stage $t>1$. Then, given a feasible state $x_{t-1}\in\mathcal{X}_{t-1}$, the noninitial stage subproblem oracle provides a feasible state $x_t\in\mathcal{X}_t$, an $M_t$-Lipschitz continuous linear cut $\mathcal{V}_{t-1}(\cdot)$, and an overestimate value $v_{t-1}$ such that:

  • they are valid, i.e., $\mathcal{V}_{t-1}(x)\leq\mathcal{Q}_{t-1}^{\mathrm{R}}(x)$ for any $x\in\mathcal{X}_{t-1}$ and $v_{t-1}\geq\mathcal{Q}_{t-1}^{\mathrm{R}}(x_{t-1})$;

  • the gap is controlled, i.e., $v_{t-1}-\mathcal{V}_{t-1}(x_{t-1})\leq\gamma_t := \overline{\mathcal{Q}}_t(x_t)-\underline{\mathcal{Q}}_t(x_t)$.

We thus define the subproblem oracle formally as a map $\mathscr{O}_t:(x_{t-1},\underline{\mathcal{Q}}_t,\overline{\mathcal{Q}}_t)\mapsto(\mathcal{V}_{t-1},v_{t-1},x_t;\gamma_t)$.

Now we can present the (consecutive) DDP algorithm based on the SSSO. In Algorithm 1, each iteration $i\in\mathbb{N}$ consists of two steps: the noninitial stage step and the initial stage step. In the noninitial stage step, we evaluate the SSSO from stage $t=2$ to $t=T$ to collect feasible states $x_t^i$, overestimate values $v_{t-1}^i$, and valid linear cuts $\mathcal{V}_{t-1}^i$ for updating the approximations $\underline{\mathcal{Q}}_{t-1}^i$ and $\overline{\mathcal{Q}}_{t-1}^i$. Then we evaluate the initial stage SSSO to get an optimal solution $x_1^{i+1}$ and its optimality gap $\gamma_1^{i+1}$. We terminate the algorithm when the gap is sufficiently small. A Python-style skeleton of this loop is sketched after the listing below.

Algorithm 1 Dual Dynamic Programming Algorithm
1: Input: subproblem oracles $\mathscr{O}_t$ for $t\in\mathcal{T}$, optimality gap $\varepsilon>0$
2: Output: an $\varepsilon$-optimal first stage solution $x_1^*$ to the regularization (18)
3: initialize: $\underline{\mathcal{Q}}_t^0\leftarrow 0$, $\overline{\mathcal{Q}}_t^0\leftarrow+\infty$ for $t\in\mathcal{T}\setminus\{T\}$; $\underline{\mathcal{Q}}_T^j,\overline{\mathcal{Q}}_T^j\leftarrow 0$ for $j\in\mathbb{N}$; $i\leftarrow 1$
4: evaluate $(x_1^1;\gamma_1^1)\leftarrow\mathscr{O}_1(\underline{\mathcal{Q}}_1^0,\overline{\mathcal{Q}}_1^0)$
5: set LowerBound $\leftarrow f_1(x_0,x_1^1;\xi_1)$, UpperBound $\leftarrow+\infty$
6: while UpperBound $-$ LowerBound $>\varepsilon$ do
7:     for $t=2,\dots,T$ do
8:         evaluate $(\mathcal{V}_{t-1}^i,v_{t-1}^i,x_t^i;\gamma_t^i)=\mathscr{O}_t(x_{t-1}^i,\underline{\mathcal{Q}}_t^{i-1},\overline{\mathcal{Q}}_t^{i-1})$ ▷ noninitial stage step
9:         update $\underline{\mathcal{Q}}_{t-1}^i(x)\leftarrow\max\{\underline{\mathcal{Q}}_{t-1}^{i-1}(x),\mathcal{V}_{t-1}^i(x)\}$
10:        update $\overline{\mathcal{Q}}_{t-1}^i(x)\leftarrow\mathrm{conv}\{\overline{\mathcal{Q}}_{t-1}^{i-1}(x),\,v_{t-1}^i+M_{t-1}\lVert x-x_{t-1}^i\rVert\}$
11:    evaluate $(x_1^{i+1};\gamma_1^{i+1})\leftarrow\mathscr{O}_1(\underline{\mathcal{Q}}_1^i,\overline{\mathcal{Q}}_1^i)$ ▷ initial stage step
12:    update LowerBound $\leftarrow f_1(x_0,x_1^{i+1};\xi_1)+\underline{\mathcal{Q}}_1^i(x_1^{i+1})$
13:    update UpperBound$'\leftarrow f_1(x_0,x_1^{i+1};\xi_1)+\overline{\mathcal{Q}}_1^i(x_1^{i+1})$
14:    if UpperBound$'$ < UpperBound then
15:        set $x_1^*\leftarrow x_1^{i+1}$, UpperBound $\leftarrow$ UpperBound$'$
16:    update $i\leftarrow i+1$
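For concreteness, below is a minimal Python skeleton of Algorithm 1, assuming hypothetical oracle callables with the interfaces of Definitions 1 and 2. Cuts are stored as (gradient, intercept) pairs, and for simplicity the over-approximation update uses the pointwise minimum of the Lipschitz majorants $v^j+M\lVert x-x^j\rVert$, a valid though looser substitute for the convex hull operation on line 10.

```python
import numpy as np

def ddp(oracle_1, oracles, f1, x0, M, T, eps, max_iter=100):
    """Minimal skeleton of Algorithm 1 (hypothetical oracle interfaces).
    oracle_1(under, over) -> (x1, gamma1);
    oracles[t](x_prev, under, over) -> ((g, b), v, x_t, gamma_t),
    where the cut (g, b) represents V(x) = b + g @ x."""
    cuts = {t: [] for t in range(1, T)}      # linear cuts for Q_t, t = 1..T-1
    centers = {t: [] for t in range(1, T)}   # (state, overestimate) pairs
    def under(t):
        if t == T:
            return lambda x: 0.0
        return lambda x: max([b + g @ x for g, b in cuts[t]] + [0.0])
    def over(t):
        if t == T:
            return lambda x: 0.0
        return lambda x: min([v + M * np.linalg.norm(x - c)
                              for c, v in centers[t]] + [np.inf])
    x1, _ = oracle_1(under(1), over(1))
    ub, x1_star = np.inf, x1
    for _ in range(max_iter):
        x_prev = x1
        for t in range(2, T + 1):            # noninitial stage step
            cut, v, x_t, _ = oracles[t](x_prev, under(t), over(t))
            cuts[t - 1].append(cut)              # update under-approximation
            centers[t - 1].append((x_prev, v))   # update over-approximation
            x_prev = x_t
        x1, _ = oracle_1(under(1), over(1))  # initial stage step
        lb = f1(x0, x1) + under(1)(x1)
        ub_new = f1(x0, x1) + over(1)(x1)
        if ub_new < ub:
            ub, x1_star = ub_new, x1
        if ub - lb <= eps:
            break
    return x1_star, lb, ub
```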

Algorithm 1 is proved to terminate with an $\varepsilon$-optimal solution in finitely many iterations [48]. We include the theorem here for completeness.

Theorem 7.

Suppose that all the state spaces $\mathcal{X}_t$ have dimensions bounded by $d$ and diameters bounded by $D$, and let $M := \max\{M_t : t=1,\dots,T-1\}$. If for each stage $t\in\mathcal{T}$ the local cost functions are strictly positive, i.e., $f_t(x_{t-1},x_t;\xi_t)\geq C$ for all feasible solutions $x_t\in\mathcal{X}_t$, uncertainties $\xi_t\in\Xi_t$, and some constant $C>0$, then the total number of noninitial stage subproblem oracle evaluations needed by Algorithm 1 to achieve an $\alpha$-relative optimal solution $x_1^*$ is bounded by

$$\#\mathrm{Eval} \leq 1 + T(T-1)\left(1+\frac{2MD}{\alpha C}\right)^d.$$
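To get a feel for the bound, a quick calculation with entirely illustrative parameter values shows its polynomial growth in $T$ and exponential growth in $d$:

```python
def eval_bound(T, M, D, alpha, C, d):
    """Complexity bound of Theorem 7 on noninitial stage oracle evaluations."""
    return 1 + T * (T - 1) * (1 + 2 * M * D / (alpha * C)) ** d

# Illustrative values only: polynomial in T, exponential in d.
print(eval_bound(T=10, M=10.0, D=1.0, alpha=0.1, C=1.0, d=2))
print(eval_bound(T=10, M=10.0, D=1.0, alpha=0.1, C=1.0, d=10))
```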

We remark that the DDP algorithm can also be executed in a nonconsecutive way with a similar complexity bound. However, since we need to run the algorithm for a fixed number of iterations to compare different models, we restrict our attention to the consecutive version here. The interested reader is referred to [48] for a more detailed discussion.

In Section 3.1, we have seen how SSSO could be implemented if the worst-case probability measure in (4) exists and can be found. However, it is well known that the worst-case probability measure may not exist (see, e.g., [15, Example 2]). Moreover, integration with respect to a worst-case probability measure could also be numerically challenging. Therefore, we next provide two possible SSSO implementation methods that work directly with the recursion (8).

3.3 Subproblem Oracles: Concave Uncertain Cost Functions

If the following assumption holds, we are able to reformulate the recursion (8) into a convex optimization problem.

Assumption 3.

The local cost function $f_t(x_{t-1},x_t;\xi_t)$ is concave and upper semicontinuous in the uncertainty $\xi_t$ for any $x_{t-1}\in\mathcal{X}_{t-1}$ and $x_t\in\mathcal{X}_t$.

A direct consequence of Assumption 3 is that the effective domain of the state $x_t$ does not depend on the uncertainty $\xi_t$, as shown in the following lemma.

Lemma 8.

Under Assumption 3, we have $\mathrm{dom}\,f_t(x_{t-1},\cdot;\xi_t) = \mathrm{dom}\,f_t(x_{t-1},\cdot;\xi'_t)$ for any $x_{t-1}\in\mathcal{X}_{t-1}$ and $\xi_t,\xi'_t\in\Xi_t$.

Proof.

Assume for contradiction that there exist some $x_{t-1}\in\mathcal{X}_{t-1}$, $x_t\in\mathcal{X}_t$, and $\xi_t,\xi'_t\in\Xi_t$ such that $f_t(x_{t-1},x_t;\xi_t)<+\infty$ but $f_t(x_{t-1},x_t;\xi'_t)=+\infty$. Then for any $c\in(0,1)$, we have $f_t(x_{t-1},x_t;(1-c)\xi_t+c\xi'_t)=+\infty$ by the concavity and nonnegativity of $f_t$. It follows from upper semicontinuity that $f_t(x_{t-1},x_t;\xi_t)\geq\limsup_{c\to 0^+}f_t(x_{t-1},x_t;(1-c)\xi_t+c\xi'_t)=+\infty$, which is a contradiction. ∎

We are now ready to prove the alternative formulation of the recursion (8).

Theorem 9.

Under Assumption 3, if we further assume that the continuous functions $g_{t,j}$ are convex for $j=1,\dots,m_t$ and that $d_{t,k}(\xi_{t,k}) = \lVert\xi_{t,k}-\hat{\xi}_{t,k}\rVert$ for $k=1,\dots,n_t$, then we have

$$\begin{aligned} \mathcal{Q}_{t-1}(x_{t-1}) = \min\quad & \sum_{j=0}^{m_t}\rho_{t,j}\lambda_{t,j}+\frac{1}{n_t}\sum_{k=1}^{n_t}\left[h_{t,k}(z_t,x_{t,k},\zeta_{t,k},\lambda_t)+\mathcal{Q}_t(x_{t,k})\right] \\ \mathrm{s.t.}\quad & \lVert\zeta_{t,k}\rVert_*\leq\lambda_{t,0}, \\ & z_t = x_{t-1}, \\ & \lambda_t\in\mathbb{R}^{m_t+1}_{\geq 0},\ x_{t,k}\in\mathcal{X}_t, \end{aligned} \tag{19}$$

where for each $k=1,\dots,n_t$, $h_{t,k}$ is defined as

$$h_{t,k}(x_{t-1},x_{t,k},\zeta_{t,k},\lambda_t) := \sup_{\xi_{t,k}\in\Xi_t}f_t(x_{t-1},x_{t,k};\xi_{t,k})-\sum_{j=1}^{m_t}\lambda_{t,j}g_{t,j}(\xi_{t,k})+\zeta_{t,k}^{\mathsf{T}}(\xi_{t,k}-\hat{\xi}_{t,k}).$$
Proof.

By Lemma 8, for any $x_{t-1}\in\mathcal{X}_{t-1}$, we can define a set $\mathcal{X}_t(x_{t-1}) := \mathrm{dom}\,f_t(x_{t-1},\cdot;\xi_t)\subseteq\mathcal{X}_t$, which is independent of $\xi_t\in\Xi_t$ and closed by the lower semicontinuity of $f_t$. Note that the norm function has the dual representation $d_{t,k}(\xi_{t,k}) = \lVert\xi_{t,k}-\hat{\xi}_{t,k}\rVert = \max_{\lVert\zeta\rVert_*\leq 1}\zeta^{\mathsf{T}}(\xi_{t,k}-\hat{\xi}_{t,k})$. Thus by the recursion (8), we can write

$$\mathcal{Q}_{t-1}(x_{t-1}) = \min_{\lambda_t\in\mathbb{R}^{m_t+1}_{\geq 0}}\ \sum_{j=0}^{m_t}\rho_{t,j}\lambda_{t,j} + \frac{1}{n_t}\sum_{k=1}^{n_t}\sup_{\xi_{t,k}\in\Xi_t}\Bigg[-\sum_{j=1}^{m_t}\lambda_{t,j}g_{t,j}(\xi_{t,k}) + \min_{\substack{\lVert\zeta_{t,k}\rVert_*\leq\lambda_{t,0} \\ x_{t,k}\in\mathcal{X}_t(x_{t-1})}}\ f_t(x_{t-1},x_{t,k};\xi_{t,k})+\mathcal{Q}_t(x_{t,k})+\zeta_{t,k}^{\mathsf{T}}(\xi_{t,k}-\hat{\xi}_{t,k})\Bigg].$$

Now, for any fixed $x_{t-1}$ and $\lambda_t$, the sets $\{\zeta_{t,k} : \lVert\zeta_{t,k}\rVert_*\leq\lambda_{t,0}\}$ and $\mathcal{X}_t(x_{t-1})$ are compact. Moreover, the function inside the supremum is concave and upper semicontinuous in $\xi_{t,k}$, while convex and lower semicontinuous in $x_{t,k}$ and $\zeta_{t,k}$. Thus the result follows by applying Sion's minimax theorem [24]. ∎

Remark.

The proof remains valid if we simultaneously replace $\mathcal{Q}_t$, $\mathcal{Q}_{t-1}$, and $f_t$ with $\mathcal{Q}_t^{\mathrm{R}}$, $\mathcal{Q}_{t-1}^{\mathrm{R}}$, and $f_t^{\mathrm{R}}$ in the theorem. In this case, we use $h_{t,k}^{\mathrm{R}}$ to denote the convex conjugate functions.

We provide a possible implementation of the noninitial stage SSSO in Algorithm 2, based on Theorem 9. Its correctness is verified by the following corollary.

Algorithm 2 Single Stage Subproblem Oracle Implementation Under Assumption 3
1: Input: functions $h_{t,k}$, over- and under-approximations $\overline{\mathcal{Q}}_t^i$ and $\underline{\mathcal{Q}}_t^i$, and a state $x_{t-1}\in\mathcal{X}_{t-1}$
2: Output: a linear cut $\mathcal{V}_{t-1}$, an overestimate $v_{t-1}$, a state $x_t$, and a gap value $\gamma_t$
3: solve the minimization (19) with $\mathcal{Q}_t$ replaced by $\underline{\mathcal{Q}}_t^i$; store the optimal value $v_{t-1}^*$, the optimal solutions $\lambda_t^*$ and $(x_{t,k}^*,\zeta_{t,k}^*)_{k=1}^{n_t}$, and the dual solution $u_t$ associated with the constraint $z_t=x_{t-1}$
4: for $k=1,\dots,n_t$ do
5:     compute the gap value $\gamma_{t,k} := \overline{\mathcal{Q}}_t^i(x_{t,k}^*)-\underline{\mathcal{Q}}_t^i(x_{t,k}^*)$
6: set $\mathcal{V}_{t-1}(\cdot)\leftarrow v_{t-1}^*+u_t^{\mathsf{T}}(\cdot-x_{t-1})$
7: set $v_{t-1}\leftarrow v_{t-1}^*+\frac{1}{n_t}\sum_{k=1}^{n_t}\gamma_{t,k}$
8: take any $k^*\in\arg\max\{\gamma_{t,k} : k=1,\dots,n_t\}$ and set $x_t\leftarrow x_{t,k^*}^*$, $\gamma_t\leftarrow\gamma_{t,k^*}$
Corollary 10.

Under the same assumptions of Theorem 9, the outputs (𝒱t1,vt1,xt;γt)(\mathcal{V}_{t-1},v_{t-1},x_{t};\gamma_{t}) of Algorithm 2 satisfy the conditions in Definition 2.

Proof.

To check the validity of v_{t-1}, let (z_t^*, x_{t,k}^*, ζ_{t,k}^*, λ_t^*) denote an optimal solution of the minimization (19) with 𝒬_t replaced by 𝒬̲_t^i. Then we have

vt1\displaystyle v_{t-1} =vt1+1ntk=1ntγt,k\displaystyle=v_{t-1}^{*}+\frac{1}{n_{t}}\sum_{k=1}^{n_{t}}\gamma_{t,k}
=j=0mtρt,jλt,j+1ntk=1nt[ht,k(zt,xt,k,ζt,k,λt)+𝒬¯ti(xt,k)+γt,k]\displaystyle=\sum_{j=0}^{m_{t}}\rho_{t,j}\lambda_{t,j}^{*}+\frac{1}{n_{t}}\sum_{k=1}^{n_{t}}[h_{t,k}(z_{t}^{*},x_{t,k}^{*},\zeta_{t,k}^{*},\lambda_{t}^{*})+\underline{\mathcal{Q}}_{t}^{i}(x_{t,k}^{*})+\gamma_{t,k}]
j=0mtρt,jλt,j+1ntk=1nt[ht,k(zt,xt,k,ζt,k,λt)+𝒬t(xt,k)]𝒬t1(xt1),\displaystyle\geq\sum_{j=0}^{m_{t}}\rho_{t,j}\lambda_{t,j}^{*}+\frac{1}{n_{t}}\sum_{k=1}^{n_{t}}[h_{t,k}(z_{t}^{*},x_{t,k}^{*},\zeta_{t,k}^{*},\lambda_{t}^{*})+\mathcal{Q}_{t}(x_{t,k}^{*})]\geq\mathcal{Q}_{t-1}(x_{t-1}),

where the last inequality is due to the feasibility of (z_t^*, x_{t,k}^*, ζ_{t,k}^*, λ_t^*) in the minimization (19). For the validity of 𝒱_{t-1}(⋅), note that the value v_{t-1}^* and the dual solution u_t define a valid linear under-approximation of the function 𝒬′_{t-1}(⋅) obtained by replacing 𝒬_t with 𝒬̲_t^i in the minimization (19). Since clearly 𝒬′_{t-1}(x_{t-1}) ≤ 𝒬_{t-1}(x_{t-1}) for all x_{t-1} ∈ 𝒳_{t-1}, we see that 𝒱_{t-1}(⋅) is a valid under-approximation of 𝒬_{t-1}(⋅). Finally, the gap v_{t-1} − 𝒱_{t-1}(x_{t-1}) = (1/n_t) Σ_{k=1}^{n_t} γ_{t,k} ≤ γ_t is controlled. ∎

Theorem 9 and Algorithm 2 are most useful when the functions h_{t,k} can be written explicitly as minimization problems. We thus devote the rest of this section to deriving the form of h_{t,k} in a special yet practically important case, where the local cost function f_t can be written as

ft(xt1,xt;ξt)=min\displaystyle f_{t}(x_{t-1},x_{t};\xi_{t})=\min (Atξt+at)𝖳yt\displaystyle(A_{t}\xi_{t}+a_{t})^{\mathsf{T}}y_{t} (20)
s.t.\displaystyle\mathrm{s.t.} (xt1,yt,xt)t,\displaystyle(x_{t-1},y_{t},x_{t})\in\mathcal{F}_{t},

for some compact convex set ℱ_t ⊆ 𝒳_{t-1} × ℝ^{d′_t} × 𝒳_t in each stage t ∈ 𝒯. It is straightforward to check that f_t in (20) is lower semicontinuous and convex in (x_{t-1}, x_t) for any ξ_t ∈ Ξ_t. To simplify our discussion, we assume that f_t(⋅, x_t; ξ_t) is M_t-Lipschitz continuous, so by Proposition 6 we have 𝒬_t = 𝒬_t^R for all t ∈ 𝒯. The problem (20) is a common formulation in the MSCO literature, such as [41], where ℱ_t is assumed to be a polytope.

Proposition 11.

Suppose the local cost function ft(xt1,xt;ξt)f_{t}(x_{t-1},x_{t};\xi_{t}) is given in the form (20) and uniformly MtM_{t}-Lipschitz continuous in the variable xt1x_{t-1}. Fix any point ξ¯tintΞt\bar{\xi}_{t}\in\mathrm{int}\Xi_{t} and let σt(ζ):=supξΞtζ𝖳(ξξ¯t)\sigma_{t}(\zeta)\mathrel{\mathop{\ordinarycolon}}=\sup_{\xi\in\Xi_{t}}\zeta^{\mathsf{T}}(\xi-\bar{\xi}_{t}) denote the support function of the set Ξtξ¯t\Xi_{t}-\bar{\xi}_{t}. If the functions gt,j(ξt)=ξt𝖳Bt,jξt+bt,j𝖳ξtg_{t,j}(\xi_{t})=\xi_{t}^{\mathsf{T}}B_{t,j}\xi_{t}+b_{t,j}^{\mathsf{T}}\xi_{t} are quadratic with coefficients Bt,j𝒮0δtB_{t,j}\in\mathcal{S}^{\delta_{t}}_{\succeq 0} and bt,jδtb_{t,j}\in\mathbb{R}^{\delta_{t}} for j=1,,mtj=1,\dots,m_{t}, then we can write

h_{t,k}(x_{t-1},x_{t,k},\zeta_{t,k},\lambda_{t})=\min_{y_{t,k},w_{t,j},w^{\prime}_{t,j},\kappa_{t,j}}\quad(A_{t}\bar{\xi}_{t}+a_{t})^{\mathsf{T}}y_{t,k}-\sum_{j=1}^{m_{t}}\lambda_{t,j}\bigl[\bar{\xi}_{t}^{\mathsf{T}}B_{t,j}\bar{\xi}_{t}+b_{t,j}^{\mathsf{T}}\bar{\xi}_{t}\bigr] (21)
+\zeta_{t,k}^{\mathsf{T}}(\bar{\xi}_{t}-\hat{\xi}_{t,k})+\sum_{j=1}^{m_{t}}\kappa_{t,j}+\sigma_{t}(w_{t,0})
\mathrm{s.t.}\quad\sum_{j=0}^{m_{t}}w_{t,j}=\zeta_{t,k}+A_{t}^{\mathsf{T}}y_{t,k}-\sum_{j=1}^{m_{t}}\lambda_{t,j}\bigl[2B_{t,j}\bar{\xi}_{t}+b_{t,j}\bigr],
\kappa_{t,j}\geq 0,\quad j=1,\dots,m_{t},
w_{t,j}\in\mathbb{R}^{\delta_{t}},\quad j=0,\dots,m_{t},
\kappa_{t,j}+\lambda_{t,j}\geq\lVert(\kappa_{t,j}-\lambda_{t,j},U_{t,j}w_{t,j})\rVert_{2},\quad j=1,\dots,m_{t},
w_{t,j}=B_{t,j}w^{\prime}_{t,j},\ w^{\prime}_{t,j}\in\mathbb{R}^{\delta_{t}},\quad j=1,\dots,m_{t},
(x_{t-1},y_{t,k},x_{t,k})\in\mathcal{F}_{t}.

Here, Ut,jU_{t,j} is a δt×δt\delta_{t}\times\delta_{t} real matrix such that Ut,j𝖳Ut,jU_{t,j}^{\mathsf{T}}U_{t,j} is the pseudoinverse of Bt,jB_{t,j}.

Proof.

Under the assumptions, we can write the function ht,kh_{t,k} as

ht,k(xt1,xt,k,ζt,k,λt)=supξtΞtminyt,k\displaystyle h_{t,k}(x_{t-1},x_{t,k},\zeta_{t,k},\lambda_{t})=\sup_{\xi_{t}\in\Xi_{t}}\min_{y_{t,k}} (Atξt+at)𝖳yt,k+ζt,k𝖳(ξtξ^t,k)j=1mtλt,j(ξt𝖳Bt,jξt+bt,j𝖳ξt)\displaystyle(A_{t}\xi_{t}+a_{t})^{\mathsf{T}}y_{t,k}+\zeta_{t,k}^{\mathsf{T}}(\xi_{t}-\hat{\xi}_{t,k})-\sum_{j=1}^{m_{t}}\lambda_{t,j}(\xi_{t}^{\mathsf{T}}B_{t,j}\xi_{t}+b_{t,j}^{\mathsf{T}}\xi_{t})
s.t.\displaystyle\mathrm{s.t.} (xt1,yt,k,xt,k)t.\displaystyle(x_{t-1},y_{t,k},x_{t,k})\in\mathcal{F}_{t}.

Note that the objective function above is continuous in both y_{t,k} and ξ_t, and the projection of ℱ_t onto the variables y_{t,k} is compact. Thus by the minimax theorem [24], we can exchange the supremum and minimum operations:

ht,k(xt1,xt,k,ζt,k,λt)=minyt,k\displaystyle h_{t,k}(x_{t-1},x_{t,k},\zeta_{t,k},\lambda_{t})=\min_{y_{t,k}} (Atξ¯t+at)𝖳yt,kj=1mtλt,j[ξ¯t𝖳Bt,jξ¯t+bt,j𝖳ξ¯t]+ζt,k𝖳(ξ¯tξ^t,k)\displaystyle(A_{t}\bar{\xi}_{t}+a_{t})^{\mathsf{T}}y_{t,k}-\sum_{j=1}^{m_{t}}\lambda_{t,j}[\bar{\xi}_{t}^{\mathsf{T}}B_{t,j}\bar{\xi}_{t}+b_{t,j}^{\mathsf{T}}\bar{\xi}_{t}]+\zeta_{t,k}^{\mathsf{T}}(\bar{\xi}_{t}-\hat{\xi}_{t,k}) (22)
+\sup_{\xi_{t}\in\mathbb{R}^{\delta_{t}}}\left\{\zeta_{t,k}^{\mathsf{T}}\xi_{t}-\iota_{t}(\xi_{t})+y_{t,k}^{\mathsf{T}}A_{t}\xi_{t}-\sum_{j=1}^{m_{t}}\lambda_{t,j}(\xi_{t}^{\mathsf{T}}B_{t,j}\xi_{t}+2\bar{\xi}_{t}^{\mathsf{T}}B_{t,j}\xi_{t}+b_{t,j}^{\mathsf{T}}\xi_{t})\right\}
s.t.\displaystyle\mathrm{s.t.} (xt1,yt,k,xt,k)t,\displaystyle(x_{t-1},y_{t,k},x_{t,k})\in\mathcal{F}_{t},

where ι_t is the convex indicator function of the set Ξ_t − ξ̄_t, whose convex conjugate is by definition the support function σ_t. If we further denote φ_{t,j}(ξ_t; λ_{t,j}) := λ_{t,j} ξ_t^𝖳 B_{t,j} ξ_t, the supremum can be written using convex conjugacy as

(ιt+j=1mtφt,j(;λt,j))(ζt,k+At𝖳yt,kj=1mtλt,j[2Bt,jξ¯t+bt,j]).\left(\iota_{t}+\sum_{j=1}^{m_{t}}\varphi_{t,j}(\cdot;\lambda_{t,j})\right)^{*}\bigg{(}\zeta_{t,k}+A_{t}^{\mathsf{T}}y_{t,k}-\sum_{j=1}^{m_{t}}\lambda_{t,j}\big{[}2B_{t,j}\bar{\xi}_{t}+b_{t,j}\big{]}\bigg{)}.

Note that for each j=1,,mtj=1,\dots,m_{t}, the parametrized conjugate function φt,j(;λt,j)\varphi_{t,j}^{*}(\cdot;\lambda_{t,j}) can be written as [39, Example 11.10]

φt,j(w;λt,j)\displaystyle\varphi_{t,j}^{*}(w;\lambda_{t,j}) ={w𝖳Bt,jw4λt,j, if wrangeBt,j,+,otherwise,\displaystyle=\begin{cases}&\displaystyle\frac{w^{\mathsf{T}}B_{t,j}^{\dagger}w}{4\lambda_{t,j}},\text{ if }w\in\mathrm{range}B_{t,j},\\ &+\infty,\quad\text{otherwise},\end{cases}
=min{κt,j0:4κt,jλt,j(Ut,jw)𝖳(Ut,jw),w=Bt,jw}\displaystyle=\min\left\{\kappa_{t,j}\geq 0\mathrel{\mathop{\ordinarycolon}}4\kappa_{t,j}\lambda_{t,j}\geq(U_{t,j}w)^{\mathsf{T}}(U_{t,j}w),w=B_{t,j}w^{\prime}\right\}
=min{κt,j0:κt,j+λt,j(κt,jλt,j,Ut,jw)2,w=Bt,jw},\displaystyle=\min\left\{\kappa_{t,j}\geq 0\mathrel{\mathop{\ordinarycolon}}\kappa_{t,j}+\lambda_{t,j}\geq\lVert(\kappa_{t,j}-\lambda_{t,j},U_{t,j}w)\rVert_{2},w=B_{t,j}w^{\prime}\right\},

which is nonnegative and second-order conic representable. Here the convention for λ_{t,j} = 0 is consistent: we have φ*_{t,j}(0; 0) = 0 and φ*_{t,j}(w; 0) = +∞ for any w ≠ 0, because (U_{t,j}B_{t,j})^𝖳(U_{t,j}B_{t,j}) = B_{t,j}B_{t,j}^†B_{t,j} = B_{t,j} implies U_{t,j}w = U_{t,j}B_{t,j}w′ ≠ 0. Now using the formula for the convex conjugate of a sum of convex functions, we have

\left(\iota_{t}+\sum_{j=1}^{m_{t}}\varphi_{t,j}(\cdot;\lambda_{t,j})\right)^{*}=\mathrm{cl}\left(\sigma_{t}\,\square\,\varphi_{t,1}^{*}(\cdot;\lambda_{t,1})\,\square\cdots\square\,\varphi_{t,m_{t}}^{*}(\cdot;\lambda_{t,m_{t}})\right), (23)

where □ denotes the infimal convolution (a.k.a. epi-addition) of two convex functions and cl denotes the lower semicontinuous hull of a proper function. Since ξ̄_t ∈ int Ξ_t, the support function is coercive, i.e., lim_{‖w‖→∞} σ_t(w) = +∞. Moreover, each φ*_{t,j} is bounded below as it is nonnegative. Therefore, the closure operation is superfluous and the convex conjugate of the sum is indeed lower semicontinuous [3, Proposition 12.14]. The rest of the proof follows by substituting the convex conjugate expression (23) into the supremum in (22). ∎
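As a numerical sanity check of the conjugate formula above, the following Julia snippet (our sketch, not part of the paper's implementation; all names and data are ours) verifies that sup_ξ {w^𝖳ξ − λ ξ^𝖳Bξ} = w^𝖳B^†w/(4λ) for w ∈ range(B), by comparing the closed form with the value at the stationary point ξ* = B^†w/(2λ) and with random trial points.

using LinearAlgebra

R = [1.0 0.0; 0.5 1.0; 0.0 0.5]
B = R' * R                          # a 2×2 positive semidefinite matrix
lam = 0.7                           # plays the role of λ_{t,j} > 0
w = B * [1.0, -2.0]                 # ensures w ∈ range(B)

closed_form = dot(w, pinv(B) * w) / (4 * lam)

# The concave quadratic ξ ↦ w'ξ − λ ξ'Bξ is maximized where Bξ = w/(2λ).
xi_star = pinv(B) * w / (2 * lam)
attained = dot(w, xi_star) - lam * dot(xi_star, B * xi_star)

# No random point should exceed the closed-form value.
worst = maximum(dot(w, xi) - lam * dot(xi, B * xi)
                for xi in (5 .* randn(2) for _ in 1:10_000))

println((closed_form, attained, worst))   # attained ≈ closed_form ≥ worst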

3.4 Subproblem Oracles: Convex Uncertain Cost Functions

We provide another useful reformulation of the recursion (8) in this section, based on the following assumption.

Assumption 4.

The local cost function ft(xt1,xt;ξt)f_{t}(x_{t-1},x_{t};\xi_{t}) is jointly convex in the state variable xtx_{t} and the uncertainty ξt\xi_{t}, for any xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1}. Moreover, the uncertainty set Ξt\Xi_{t} is a pointed polyhedron and the distance function dt,k()d_{t,k}(\cdot) is polyhedrally representable.

We first mention some direct consequences of Assumption 4. First, the value function Q_t(x_{t-1}; ξ_t) is convex in the uncertainty ξ_t for each state x_{t-1} ∈ 𝒳_{t-1}, although it may not be jointly convex. Second, recall that a polyhedron is pointed if it does not contain any lines; any point in a pointed polyhedron can be written as a convex combination of its extreme points plus a conic combination of its extreme rays [9, Theorem 3.37]. Now under Assumption 4, we may define a lifted uncertainty set Ξ̃_{t,k} := {(ζ, ξ) : ξ ∈ Ξ_t, ζ ≥ d_{t,k}(ξ)}. It is easy to see that Ξ̃_{t,k} is also a pointed polyhedron. We denote the finite set of its extreme points by ext Ξ̃_{t,k} = {(ζ̃_l, ξ̃_l)}_{l∈E_{t,k}}, where E_{t,k} is the index set, and want to show that the maximization in (8) can be taken over the finite set {(ζ̃_l, ξ̃_l)}_{l∈E_{t,k}} in two important cases. First, we consider problems with bounded uncertainty sets Ξ_t.

Proposition 12.

Under Assumption 4, if we further assume that Ξt\Xi_{t} is bounded and all functions gt,jg_{t,j} are concave for j=1,,mtj=1,\dots,m_{t}, then the problem (8) can be equivalently reformulated as

𝒬t1(xt1)=minλt,τt\displaystyle\mathcal{Q}_{t-1}(x_{t-1})=\min_{\lambda_{t},\tau_{t}}\quad j=0mtρt,jλt,j+1ntk=1ntτt,k\displaystyle\sum_{j=0}^{m_{t}}\rho_{t,j}\lambda_{t,j}+\frac{1}{n_{t}}\sum_{k=1}^{n_{t}}\tau_{t,k} (24)
s.t.\displaystyle\mathrm{s.t.}\quad λt0,\displaystyle\lambda_{t}\geq 0,
τt,kQt(xt1;ξ~l)λt,0ζ~lj=1mtλt,jgt,j(ξ~l),for all lEt,k and k=1,,nt.\displaystyle\tau_{t,k}\geq Q_{t}(x_{t-1};\tilde{\xi}_{l})-\lambda_{t,0}\tilde{\zeta}_{l}-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\tilde{\xi}_{l}),\quad\textnormal{for all }l\in E_{t,k}\textnormal{ and }k=1,\dots,n_{t}.
Proof.

From the definition of the lifted uncertainty set Ξ̃_{t,k}, we have

supξkΞt{Qt(xt1;ξk)λt,0dt,k(ξk)j=1mtλt,jgt,j(ξk)}\displaystyle\sup_{\xi_{k}\in\Xi_{t}}\left\{Q_{t}(x_{t-1};\xi_{k})-\lambda_{t,0}d_{t,k}(\xi_{k})-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\xi_{k})\right\}
=maxξkΞt,ζk{Qt(xt1;ξk)λt,0ζkj=1mtλt,jgt,j(ξk):ζkdt,k(ξk)}\displaystyle=\max_{\xi_{k}\in\Xi_{t},\zeta_{k}\in\mathbb{R}}\left\{Q_{t}(x_{t-1};\xi_{k})-\lambda_{t,0}\zeta_{k}-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\xi_{k})\mathrel{\mathop{\ordinarycolon}}\zeta_{k}\geq d_{t,k}(\xi_{k})\right\}
=max(ζk,ξk)Ξ~t,k{Qt(xt1;ξk)λt,0ζkj=1mtλt,jgt,j(ξk)}\displaystyle=\max_{(\zeta_{k},\xi_{k})\in\tilde{\Xi}_{t,k}}\left\{Q_{t}(x_{t-1};\xi_{k})-\lambda_{t,0}\zeta_{k}-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\xi_{k})\right\}
=maxlEt,k{Qt(xt1;ξ~l)λt,0ζ~lj=1mtλt,jgt,j(ξ~l)}.\displaystyle=\max_{l\in E_{t,k}}\left\{Q_{t}(x_{t-1};\tilde{\xi}_{l})-\lambda_{t,0}\tilde{\zeta}_{l}-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\tilde{\xi}_{l})\right\}.

To see the last equality, note that since Ξ_t is bounded, the only recession direction of the lifted uncertainty set Ξ̃_{t,k} is (1, 0). Since λ_{t,0} ≥ 0, some maximizer (ζ_k^*, ξ_k^*) lies in the convex hull of ext Ξ̃_{t,k}. The last equality then follows from the convexity of the function Q_t(x_{t-1}; ξ_k) − λ_{t,0}ζ_k − Σ_{j=1}^{m_t} λ_{t,j}g_{t,j}(ξ_k) in (ζ_k, ξ_k). Finally, the reformulation is completed by replacing the maximum of finitely many values with its epigraphical representation τ_{t,k} ≥ Q_t(x_{t-1}; ξ̃_l) − λ_{t,0}ζ̃_l − Σ_{j=1}^{m_t} λ_{t,j}g_{t,j}(ξ̃_l) for all l ∈ E_{t,k} and k = 1, …, n_t. ∎
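To make the reformulation (24) concrete, here is a minimal Julia sketch (ours; the function names are hypothetical) for the simplest case of a one-dimensional uncertainty set Ξ_t = [lo, hi] with distance d_{t,k}(ξ) = |ξ − ξ̂_{t,k}| and only the radius constraint (m_t = 0): the lifted polyhedron Ξ̃_{t,k} has at most three extreme points, over which the inner maximum can be evaluated directly.

# Extreme points (ζ̃, ξ̃) of the lifted set {(ζ, ξ) : ξ ∈ [lo, hi], ζ ≥ |ξ − ξ̂|}:
# the two endpoints plus the kink at ξ̂ (when ξ̂ is interior).
lifted_extreme_points(lo, hi, xihat) =
    [(abs(lo - xihat), lo), (0.0, xihat), (abs(hi - xihat), hi)]

# Inner maximum of (24) for a multiplier λ_{t,0} and a value function Q_t(x_{t-1}; ·).
inner_max(Q, lo, hi, xihat, lam0) =
    maximum(Q(xi) - lam0 * zeta for (zeta, xi) in lifted_extreme_points(lo, hi, xihat))

Q(xi) = (xi - 0.3)^2 + 1.0   # a toy convex value function in ξ
println(inner_max(Q, 0.0, 1.0, 0.4, 0.5))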

If the uncertainty sets Ξ_t are unbounded, then in general the supremum in (8) can be +∞ along some unbounded directions of Ξ_t, even when the value function Q_t(x_{t-1}; ⋅) is finite everywhere. To avoid this situation, we consider the growth rate of the value function Q_t(x_{t-1}; ⋅), defined as

rt(xt1):=lim supdt,k(ξt),ξtΞtQt(xt1;ξt)Qt(xt1;ξ^t,k)dt,k(ξt)0,r_{t}(x_{t-1})\mathrel{\mathop{\ordinarycolon}}=\limsup_{\begin{subarray}{c}d_{t,k}(\xi_{t})\to\infty,\\ \xi_{t}\in\Xi_{t}\end{subarray}}\frac{Q_{t}(x_{t-1};\xi_{t})-Q_{t}(x_{t-1};\hat{\xi}_{t,k})}{d_{t,k}(\xi_{t})}\geq 0, (25)

for any real-valued Q_t(x_{t-1}; ⋅), where the limit superior is in fact independent of the choice of k = 1, …, n_t, and the inequality holds because Q_t(x_{t-1}; ⋅) is assumed to be bounded below by 0. Our convention is to set r_t(x_{t-1}) ≡ 0 when Ξ_t is bounded. We now consider problems with unbounded uncertainty sets Ξ_t.

Proposition 13.

Under Assumption 4, if Qt(xt1;)Q_{t}(x_{t-1};\cdot) has finite growth rate rt(xt1)r_{t}(x_{t-1}) and all functions gt,jg_{t,j} are bounded and concave for j=1,,mtj=1,\dots,m_{t}, then the problem (8) with any xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1} such that 𝒬t1(xt1)<+\mathcal{Q}_{t-1}(x_{t-1})<+\infty can be equivalently reformulated as

𝒬t1(xt1)=minλt,τt\displaystyle\mathcal{Q}_{t-1}(x_{t-1})=\min_{\lambda_{t},\tau_{t}}\quad j=0mtρt,jλt,j+1ntk=1ntτt,k\displaystyle\sum_{j=0}^{m_{t}}\rho_{t,j}\lambda_{t,j}+\frac{1}{n_{t}}\sum_{k=1}^{n_{t}}\tau_{t,k} (26)
s.t.\displaystyle\mathrm{s.t.}\quad λt0,\displaystyle\lambda_{t}\geq 0,
λt,0rt(xt1),\displaystyle\lambda_{t,0}\geq r_{t}(x_{t-1}),
τt,kQt(xt1;ξ~l)λt,0ζ~lj=1mtλt,jgt,j(ξ~l),for all lEt,k and k=1,,nt.\displaystyle\tau_{t,k}\geq Q_{t}(x_{t-1};\tilde{\xi}_{l})-\lambda_{t,0}\tilde{\zeta}_{l}-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\tilde{\xi}_{l}),\quad\textnormal{for all }l\in E_{t,k}\textnormal{ and }k=1,\dots,n_{t}.
Proof.

We claim that the supremum

supξkΞt{Qt(xt1;ξk)λt,0dt,k(ξk)j=1mtλt,jgt,j(ξk)}<+\sup_{\xi_{k}\in\Xi_{t}}\biggl{\{}Q_{t}(x_{t-1};\xi_{k})-\lambda_{t,0}d_{t,k}(\xi_{k})-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\xi_{k})\biggr{\}}<+\infty

if and only if λt,0rt(xt1)\lambda_{t,0}\geq r_{t}(x_{t-1}), for each k=1,,ntk=1,\dots,n_{t}. Suppose λt,0<rt(xt1)\lambda_{t,0}<r_{t}(x_{t-1}). By definition (25), there exists a sequence {ξk(i)}iΞt\{\xi_{k}^{(i)}\}_{i\in\mathbb{N}}\subseteq\Xi_{t} and a constant ε>0\varepsilon>0 such that dt,k(ξk(i))d_{t,k}(\xi_{k}^{(i)})\to\infty as ii\to\infty and Qt(xt1;ξk(i))Qt(xt1;ξ^t,k)+(λt,0+ε)dt,k(ξk(i))Q_{t}(x_{t-1};\xi_{k}^{(i)})\geq Q_{t}(x_{t-1};\hat{\xi}_{t,k})+(\lambda_{t,0}+\varepsilon)d_{t,k}(\xi_{k}^{(i)}). Thus supi{Qt(xt1;ξk(i))λt,0dt,k(ξk(i))j=1mtλt,jgt,j(ξk(i))}supi{εdt,k(ξk(i))j=1mtλt,jgt,j(ξk(i))}=+\sup_{i\in\mathbb{N}}\{Q_{t}(x_{t-1};\xi_{k}^{(i)})-\lambda_{t,0}d_{t,k}(\xi_{k}^{(i)})-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\xi_{k}^{(i)})\}\geq\sup_{i\in\mathbb{N}}\{\varepsilon d_{t,k}(\xi_{k}^{(i)})-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\xi_{k}^{(i)})\}=+\infty as gt,j(ξk(i))g_{t,j}(\xi_{k}^{(i)}) for j=1,,mtj=1,\dots,m_{t} are bounded.

Conversely, suppose λ_{t,0} ≥ r_t(x_{t-1}). By definition (25), there exists a constant d̄ > d_{t,k}(ξ̂_{t,k}) such that Q_t(x_{t-1}; ξ_k) ≤ Q_t(x_{t-1}; ξ̂_{t,k}) + λ_{t,0}d_{t,k}(ξ_k) for all ξ_k ∈ Ξ_t with d_{t,k}(ξ_k) ≥ d̄. Thus we have

supξkΞt{Qt(xt1;ξk)λt,0dt,k(ξk)j=1mtλt,jgt,j(ξk)}\displaystyle\sup_{\xi_{k}\in\Xi_{t}}\left\{Q_{t}(x_{t-1};\xi_{k})-\lambda_{t,0}d_{t,k}(\xi_{k})-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\xi_{k})\right\}
supdt,k(ξk)d¯{Qt(xt1;ξk)λt,0dt,k(ξk)}+supξkΞtj=1mt(λt,jgt,j(ξk))\displaystyle\leq\sup_{d_{t,k}(\xi_{k})\leq\bar{d}}\left\{Q_{t}(x_{t-1};\xi_{k})-\lambda_{t,0}d_{t,k}(\xi_{k})\right\}+\sup_{\xi_{k}\in\Xi_{t}}\sum_{j=1}^{m_{t}}\left(-\lambda_{t,j}g_{t,j}(\xi_{k})\right)
=max(ζk,ξk)Ξ~t(d¯){Qt(xt1;ξk)λt,0ζk}+supξkΞtj=1mt(λt,jgt,j(ξk))<+,\displaystyle=\max_{(\zeta_{k},\xi_{k})\in\tilde{\Xi}_{t}(\bar{d})}\left\{Q_{t}(x_{t-1};\xi_{k})-\lambda_{t,0}\zeta_{k}\right\}+\sup_{\xi_{k}\in\Xi_{t}}\sum_{j=1}^{m_{t}}\left(-\lambda_{t,j}g_{t,j}(\xi_{k})\right)<+\infty,

where Ξ~t(d¯):={(ζ,ξ):ξΞt,dt,k(ξ)d¯,ζdt,k(ξ)}\tilde{\Xi}_{t}(\bar{d})\mathrel{\mathop{\ordinarycolon}}=\{(\zeta,\xi)\mathrel{\mathop{\ordinarycolon}}\xi\in\Xi_{t},d_{t,k}(\xi)\leq\bar{d},\zeta\geq d_{t,k}(\xi)\}, and the maximum is finite because it is attained on some extreme point (ζ¯k,ξ¯k)Ξ~t(d¯)(\bar{\zeta}_{k},\bar{\xi}_{k})\in\tilde{\Xi}_{t}(\bar{d}) by convexity, so Qt(xt1;ξ¯k)λt,0ζ¯k<+Q_{t}(x_{t-1};\bar{\xi}_{k})-\lambda_{t,0}\bar{\zeta}_{k}<+\infty.

Now from this claim, we see that for any xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1} such that 𝒬t1(xt1)<+\mathcal{Q}_{t-1}(x_{t-1})<+\infty, the problem (8) can be formulated equivalently as

\mathcal{Q}_{t-1}(x_{t-1})=\min_{\lambda_{t}\geq 0}\quad\frac{1}{n_{t}}\sum_{k=1}^{n_{t}}\sup_{(\zeta_{k},\xi_{k})\in\tilde{\Xi}_{t,k}}\left\{Q_{t}(x_{t-1};\xi_{k})-\lambda_{t,0}\zeta_{k}-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\xi_{k})\right\}+\sum_{j=0}^{m_{t}}\rho_{t,j}\lambda_{t,j}
s.t.\displaystyle\mathrm{s.t.} λt,0rt(xt1).\displaystyle\lambda_{t,0}\geq r_{t}(x_{t-1}).

The supremum can be attained in Ξ̃′_{t,k} := conv(ext Ξ̃_{t,k}): otherwise, there exists a point (ζ̌_k, ξ̌_k) ∈ Ξ̃_{t,k} ∖ Ξ̃′_{t,k} with a strictly larger objective value; writing it as (ζ̌_k, ξ̌_k) = (ζ̄_k, ξ̄_k) + d for some (ζ̄_k, ξ̄_k) ∈ Ξ̃′_{t,k} and a recession direction d of Ξ̃_{t,k}, we would have

Qt(xt1;ξˇk)λt,0ζˇkj=1mtλt,jgt,j(ξˇk)>Qt(xt1;ξ¯k)λt,0ζ¯kj=1mtλt,jgt,j(ξ¯k).Q_{t}(x_{t-1};\check{\xi}_{k})-\lambda_{t,0}\check{\zeta}_{k}-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\check{\xi}_{k})>Q_{t}(x_{t-1};\bar{\xi}_{k})-\lambda_{t,0}\bar{\zeta}_{k}-\sum_{j=1}^{m_{t}}\lambda_{t,j}g_{t,j}(\bar{\xi}_{k}).

In other words, the convex objective strictly increases along the recession direction d, which implies that the supremum is +∞, a contradiction. Using convexity again as in the proof of Proposition 12, we conclude that the supremum is indeed attained in ext Ξ̃_{t,k}, which completes the proof. ∎

When Ξ_t is bounded, Proposition 13 reduces to Proposition 12, since the growth rate r_t(x_{t-1}) = 0 and any continuous function g_{t,j} over a bounded polyhedron is bounded. The finite growth rate condition is often satisfied, especially when the value function Q_t(x_{t-1}; ⋅) is Lipschitz continuous. However, it is in general difficult to estimate the growth rate (25). Fortunately, in some application problems we can derive the growth rate explicitly, as discussed in Section 4.3.

Note that the problems (24) and (26) are standard linear optimization problems in the variables λt\lambda_{t} and τt\tau_{t}. Thus by strong duality, we can write the dual problem as

𝒬t1(xt1)=maxθt,κt,k,l0\displaystyle\mathcal{Q}_{t-1}(x_{t-1})=\max_{\theta_{t},\kappa_{t,k,l}\geq 0} θtrt(xt1)+k=1ntlEt,kκt,k,lQt(xt1;ξ~l)\displaystyle\theta_{t}r_{t}(x_{t-1})+\sum_{k=1}^{n_{t}}\sum_{l\in E_{t,k}}\kappa_{t,k,l}Q_{t}(x_{t-1};\tilde{\xi}_{l}) (27)
s.t.\displaystyle\mathrm{s.t.} lEt,kκt,k,l=1nt,k=1,,nt,\displaystyle\sum_{l\in E_{t,k}}\kappa_{t,k,l}=\frac{1}{n_{t}},\quad k=1,\dots,n_{t},
θt+k=1ntlEt,kζ~lκt,k,lρt,0,\displaystyle\theta_{t}+\sum_{k=1}^{n_{t}}\sum_{l\in E_{t,k}}\tilde{\zeta}_{l}\kappa_{t,k,l}\leq\rho_{t,0},
k=1ntlEt,kgt,j(ξ~l)κt,k,lρt,j,j=1,,mt.\displaystyle\sum_{k=1}^{n_{t}}\sum_{l\in E_{t,k}}g_{t,j}(\tilde{\xi}_{l})\kappa_{t,k,l}\leq\rho_{t,j},\quad j=1,\dots,m_{t}.

Consequently, any feasible dual solution (θ_t, κ_{t,k,l}) to the dual problem (27) defines a valid under-approximation

𝒬t1(xt1)θtrt(xt1)+k=1ntlEt,kκt,k,lQt(xt1;ξ~l),xt1𝒳t1.\mathcal{Q}_{t-1}(x_{t-1})\geq\theta_{t}r_{t}(x_{t-1})+\sum_{k=1}^{n_{t}}\sum_{l\in E_{t,k}}\kappa_{t,k,l}Q_{t}(x_{t-1};\tilde{\xi}_{l}),\quad\forall\,x_{t-1}\in\mathcal{X}_{t-1}. (28)
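For concreteness, the following Julia/JuMP sketch (ours; it uses the open-source HiGHS solver rather than the Gurobi setup of Section 4.1, and all data are placeholders) solves the linear program (24) in the case m_t = 0 and reads off the dual weights κ_{t,k,l} of (27) that enter the cut (28).

using JuMP, HiGHS

nt, E = 3, 3                      # outcomes and extreme points per outcome
Qval = 1.0 .+ rand(nt, E)         # placeholder values Q_t(x_{t-1}; ξ̃_l)
zeta = rand(nt, E)                # lifted coordinates ζ̃_l
rho0 = 0.1                        # Wasserstein radius ρ_{t,0}

model = Model(HiGHS.Optimizer)
set_silent(model)
@variable(model, lam0 >= 0)       # λ_{t,0}
@variable(model, tau[1:nt])       # epigraph variables τ_{t,k}
@constraint(model, epi[k = 1:nt, l = 1:E],
            tau[k] + zeta[k, l] * lam0 >= Qval[k, l])
@objective(model, Min, rho0 * lam0 + sum(tau) / nt)
optimize!(model)

kappa = dual.(epi)                # κ_{t,k,l} ≥ 0 for a Min problem
println(objective_value(model))
println(sum(kappa; dims = 2))     # each row sums to 1/nt, as in (27)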
Algorithm 3 Single Stage Subproblem Oracle Implementation Under Assumption 4
1: Input: over- and under-approximations 𝒬̄_t^i and 𝒬̲_t^i, a state x_{t-1} ∈ 𝒳_{t-1}, growth rate r_t(x_{t-1}), and extreme point sets ext Ξ̃_{t,k} for k = 1, …, n_t
2: Output: a linear cut 𝒱_{t-1}, an overestimate v_{t-1}, a state x_t, and a gap value γ_t
3:for k=1,,ntk=1,\dots,n_{t} do
4:     for lEt,kl\in E_{t,k} do
5:         Evaluate the approximate value function
Q¯t(xt1;ξ~l):=minxt𝒳tft(xt1,xt;ξ~l)+𝒬¯ti(xt)\underline{Q}_{t}(x_{t-1};\tilde{\xi}_{l})\mathrel{\mathop{\ordinarycolon}}=\min_{x_{t}\in\mathcal{X}_{t}}f_{t}(x_{t-1},x_{t};\tilde{\xi}_{l})+\underline{\mathcal{Q}}_{t}^{i}(x_{t})
with a minimizer stored as xt,k,lx_{t,k,l} and a subgradient vector as ut,k,lQ¯t(;ξ~l)u_{t,k,l}\in\partial\underline{Q}_{t}(\cdot;\tilde{\xi}_{l}) at xt1x_{t-1}
6:         Calculate γt,k,l:=𝒬¯ti(xt,k,l)𝒬¯ti(xt,k,l)\gamma_{t,k,l}\mathrel{\mathop{\ordinarycolon}}=\overline{\mathcal{Q}}_{t}^{i}(x_{t,k,l})-\underline{\mathcal{Q}}_{t}^{i}(x_{t,k,l})      
7:Solve the problem (26) (or (24) if Ξt\Xi_{t} is bounded) with Qt(xt1;ξ~l)Q_{t}(x_{t-1};\tilde{\xi}_{l}) replaced by Q¯t(xt1;ξ~l)\underline{Q}_{t}(x_{t-1};\tilde{\xi}_{l}) and store the optimal value vt1v_{t-1}^{*} and dual solutions θt\theta_{t}^{*}, κt,k,l\kappa_{t,k,l}^{*} to (27)
8:for k=1,,ntk=1,\dots,n_{t} do
9:     Take any largmax{𝒬¯ti(xt,k,l)𝒬¯ti(xt,k,l):lEt,k}l^{*}\in\operatorname*{arg\,max}\big{\{}\overline{\mathcal{Q}}_{t}^{i}(x_{t,k,l})-\underline{\mathcal{Q}}_{t}^{i}(x_{t,k,l})\mathrel{\mathop{\ordinarycolon}}l\in E_{t,k}\big{\}}
10:     Set γt,kγt,k,l\gamma_{t,k}\leftarrow\gamma_{t,k,l^{*}} and xt,kxt,k,lx_{t,k}\leftarrow x_{t,k,l^{*}}
11:Take a subgradient wtrt()w_{t}\in\partial r_{t}(\cdot) at xt1x_{t-1}
12: Set 𝒱_{t-1}(⋅) ← v_{t-1}^* + θ_t^* w_t^𝖳(⋅ − x_{t-1}) + Σ_{k=1}^{n_t} Σ_{l∈E_{t,k}} κ_{t,k,l}^* u_{t,k,l}^𝖳(⋅ − x_{t-1})
13:Set vt1vt1+1ntk=1ntγt,kv_{t-1}\leftarrow v_{t-1}^{*}+\frac{1}{n_{t}}\sum_{k=1}^{n_{t}}\gamma_{t,k}
14:Take any kargmax{γt,k:k=1,,nt}k^{*}\in\operatorname*{arg\,max}\{\gamma_{t,k}\mathrel{\mathop{\ordinarycolon}}k=1,\dots,n_{t}\} and set xtxt,kx_{t}\leftarrow x_{t,k^{*}}, γtγt,k\gamma_{t}\leftarrow\gamma_{t,k^{*}}

We now describe an SSSO implementation in Algorithm 3. Its correctness is verified in the following corollary.

Corollary 14.

Suppose that the growth rate function r_t(⋅) is convex. Under the assumptions of Proposition 13, the outputs (𝒱_{t-1}, v_{t-1}, x_t; γ_t) of Algorithm 3 satisfy the conditions in Definition 2.

Proof.

The validity of 𝒱_{t-1}(⋅) follows directly from the inequality (28) and the fact that Q̲_t(x_{t-1}; ξ_k) ≤ Q_t(x_{t-1}; ξ_k) for any x_{t-1} ∈ 𝒳_{t-1} and ξ_k ∈ Ξ_t by definition. To see the validity of v_{t-1}, note that for any k = 1, …, n_t and l ∈ E_{t,k}, we have

Qt(xt1;ξ~l)\displaystyle Q_{t}(x_{t-1};\tilde{\xi}_{l}) minxt𝒳t[ft(xt1,xt;ξ~l)+𝒬¯ti(xt)]\displaystyle\leq\min_{x_{t}\in\mathcal{X}_{t}}\left[f_{t}(x_{t-1},x_{t};\tilde{\xi}_{l})+\overline{\mathcal{Q}}_{t}^{i}(x_{t})\right]
ft(xt1,xt,k,l;ξ~l)+𝒬¯ti(xt,k,l)\displaystyle\leq f_{t}(x_{t-1},x_{t,k,l};\tilde{\xi}_{l})+\overline{\mathcal{Q}}_{t}^{i}(x_{t,k,l})
ft(xt1,xt,k,l;ξ~l)+𝒬¯ti(xt,k,l)+γt,k,l\displaystyle\leq f_{t}(x_{t-1},x_{t,k,l};\tilde{\xi}_{l})+\underline{\mathcal{Q}}_{t}^{i}(x_{t,k,l})+\gamma_{t,k,l}
Q¯t(xt1;ξ~l)+γt,k\displaystyle\leq\underline{Q}_{t}(x_{t-1};\tilde{\xi}_{l})+\gamma_{t,k}

by the definition of γt,k\gamma_{t,k} in Algorithm 3. Thus for any optimal solution λt\lambda_{t}^{*} to the problem (26) with Qt(xt1;ξ~l)Q_{t}(x_{t-1};\tilde{\xi}_{l}) replaced by Q¯t(xt1;ξ~l)\underline{Q}_{t}(x_{t-1};\tilde{\xi}_{l}), we have

maxlEt,kQt(xt1;ξ~l)λt,0ζ~lj=1mtλt,jgt,j(ξ~l)maxlEt,kQ¯t(xt1;ξ~l)λt,0ζ~lj=1mtλt,jgt,j(ξ~l)+γt,k,\displaystyle\max_{l\in E_{t,k}}Q_{t}(x_{t-1};\tilde{\xi}_{l})-\lambda_{t,0}^{*}\tilde{\zeta}_{l}-\sum_{j=1}^{m_{t}}\lambda_{t,j}^{*}g_{t,j}(\tilde{\xi}_{l})\leq\max_{l\in E_{t,k}}\underline{Q}_{t}(x_{t-1};\tilde{\xi}_{l})-\lambda_{t,0}^{*}\tilde{\zeta}_{l}-\sum_{j=1}^{m_{t}}\lambda_{t,j}^{*}g_{t,j}(\tilde{\xi}_{l})+\gamma_{t,k},

and consequently vt1𝒬t1(xt1)v_{t-1}\geq\mathcal{Q}_{t-1}(x_{t-1}) since λt\lambda_{t}^{*} is also a feasible solution to the minimization in (26). Finally, the gap is controlled since vt1𝒱t1(xt1)=1ntk=1ntγt,kγt,k=𝒬¯ti(xt)𝒬¯ti(xt)v_{t-1}-\mathcal{V}_{t-1}(x_{t-1})=\frac{1}{n_{t}}\sum_{k=1}^{n_{t}}\gamma_{t,k}\leq\gamma_{t,k^{*}}=\overline{\mathcal{Q}}_{t}^{i}(x_{t})-\underline{\mathcal{Q}}_{t}^{i}(x_{t}). ∎

4 Numerical Experiments

In this section, we first introduce baseline models used for comparison against the DR-MCO model (1). Then we present comprehensive numerical studies of two application problems: the multi-commodity inventory problem with either uncertain demands or uncertain prices, and the hydro-thermal power system planning problem with uncertain water inflows.

4.1 Baseline Models and Experiment Settings

For performance comparison, we introduce two types of baseline models in addition to the DR-MCO model with Wasserstein ambiguity sets (7). The first baseline is the multistage robust convex optimization (MRCO) model, which simply considers the worst-case outcome in the uncertainty set Ξ_t in each stage t. Namely, the cost-to-go functions of the MRCO can be defined recursively as

𝒬t1Robust(xt1):=supξtΞtminxt𝒳tft(xt1,xt;ξt)+𝒬tRobust(xt),t=T,T1,,2.\mathcal{Q}_{t-1}^{\mathrm{Robust}}(x_{t-1})\mathrel{\mathop{\ordinarycolon}}=\sup_{\xi_{t}\in\Xi_{t}}\min_{x_{t}\in\mathcal{X}_{t}}f_{t}(x_{t-1},x_{t};\xi_{t})+\mathcal{Q}_{t}^{\mathrm{Robust}}(x_{t}),\quad t=T,T-1,\dots,2. (29)

If the sum f_t(x_{t-1}, x_t; ξ_t) + 𝒬_t^Robust(x_t) is jointly convex in the state x_t and the uncertainty ξ_t for any given x_{t-1}, then the supremum, whenever finite, is attained at some extreme point of the convex hull of Ξ_t. In particular, if we have relatively complete recourse (i.e., the sum is always finite for any given x_{t-1}) and Ξ_t is a polytope (i.e., the convex hull of finitely many points), then we can enumerate the extreme points of Ξ_t to find the supremum, which allows us to solve the MRCO by Algorithm 1. In general, if the uncertainty set Ξ_t is unbounded, the cost-to-go functions of the MRCO model can be +∞ everywhere, so we use the baseline MRCO model only when the uncertainty sets Ξ_t are polytopes.

The second type of baseline models consists of risk-neutral and risk-averse multistage stochastic convex optimization (MSCO) models. The nominal probability measures in the MSCO models can be either the empirical measure ν̂_t, or a probability measure associated with the sample average approximation (SAA) of a fitted probability measure, which we denote as ν̃_t = (1/n′_t) Σ_{k=1}^{n′_t} δ_{ξ̃_{t,k}}. The main difference is that the outcomes ξ̃_{t,1}, …, ξ̃_{t,n′_t} in an SAA probability measure can differ from the outcomes ξ̂_{t,1}, …, ξ̂_{t,n_t} given by the empirical measure. Moreover, we can take n′_t > n_t for a potentially better training effect. To ease notation, we also allow ν̃_t = ν̂_t when describing the risk measures in the rest of this section.

For the risk-averse MSCO models, we use the risk measure known as conditional value-at-risk (CVaR, a.k.a. average value-at-risk or expected shortfall). Its coherence leads to a dual representation [34], which allows the risk-averse MSCO models to be solved by Algorithm 1 with a straightforward SSSO implementation. For simplicity, we only introduce the CVaR risk-averse MSCO based on this dual representation, and refer any interested reader to [43] for the primal definition and the proof of duality.

Given parameters α(0,1)\alpha\in(0,1) and β[0,1]\beta\in[0,1], we define the cost-to-go functions associated with the (α,β)(\alpha,\beta)-CVaR risk measures recursively for t=T,T1,,2t=T,T-1,\dots,2 as

𝒬t1CVaR(xt1):=maxpt𝒫tCVaRk=1ntpt,k{minxt𝒳tft(xt1,xt;ξ~t,k)+𝒬tCVaR(xt)},\mathcal{Q}_{t-1}^{\mathrm{CVaR}}(x_{t-1})\mathrel{\mathop{\ordinarycolon}}=\max_{p_{t}\in\mathcal{P}_{t}^{\mathrm{CVaR}}}\sum_{k=1}^{n^{\prime}_{t}}p_{t,k}\biggl{\{}\min_{x_{t}\in\mathcal{X}_{t}}f_{t}(x_{t-1},x_{t};\tilde{\xi}_{t,k})+\mathcal{Q}_{t}^{\mathrm{CVaR}}(x_{t})\biggr{\}}, (30)

where the ambiguity set is defined as

\mathcal{P}_{t}^{\mathrm{CVaR}}:=\left\{p_{t}=(p_{t,1},\dots,p_{t,n^{\prime}_{t}})\in\mathbb{R}^{n^{\prime}_{t}}:0\leq p_{t,k}\leq\frac{\beta}{n^{\prime}_{t}}+\frac{1-\beta}{\alpha n^{\prime}_{t}},\ k=1,\dots,n^{\prime}_{t};\ \sum_{k=1}^{n^{\prime}_{t}}p_{t,k}=1\right\}. (31)

Note that when β = 1, the ambiguity set 𝒫_t^CVaR = {(1/n′_t, …, 1/n′_t)} has only one element, corresponding to the SAA probability measure ν̃_t, so the CVaR risk-averse MSCO model (30) reduces to the risk-neutral nominal MSCO. Alternatively, if β = 0 and α ≤ 1/n′_t, then the CVaR risk-averse MSCO model (30) considers only the worst outcome of the SAA probability measure in each stage. We remark that both the MRCO model (29) and the CVaR risk-averse MSCO model (30) can be solved by Algorithm 1, since only finitely many outcomes need to be considered in each stage t. More details on the SSSO for these two baseline models can be found in [48].
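For the inner maximization in (30), the worst-case weights over the set (31) can be computed greedily by capping the largest costs first. A minimal Julia sketch (ours; the function name is hypothetical):

# Maximize Σ_k p_k c_k over the set (31): sort the costs in decreasing
# order and assign each the largest admissible weight until the mass is spent.
function cvar_worst_case_weights(costs::Vector{Float64}, alpha::Float64, beta::Float64)
    n = length(costs)
    cap = beta / n + (1 - beta) / (alpha * n)   # upper bound on each p_k
    p = zeros(n)
    budget = 1.0
    for k in sortperm(costs; rev = true)
        p[k] = min(cap, budget)
        budget -= p[k]
        budget <= 0 && break
    end
    return p
end

c = [3.0, 7.0, 1.0, 5.0, 2.0]
p = cvar_worst_case_weights(c, 0.1, 0.5)
println((p, sum(p .* c)))   # worst-case expectation under (31)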

Our numerical experiments aim to demonstrate two attractive aspects of the DR-MCO models on some application problems: better out-of-sample performance compared to the baseline models, and the ability to achieve an out-of-sample performance guarantee with reasonable conservatism. For ease of evaluation, we assume knowledge of the true underlying probability measure ν ∈ ℳ(Ξ_2 × ⋯ × Ξ_T), and thus of the marginal probability measures ν_t := P^t_*(ν), where P^t_* is the pushforward of the canonical projection map P^t : Ξ_2 × ⋯ × Ξ_T → Ξ_t, for t = 2, …, T. Here, we do not restrict our attention to the case where ν is a product of ν_2, …, ν_T, i.e., where the true probability measure is stagewise independent, so our model (2) can be used as an approximation for problems under general stochastic processes. The experiments are then carried out in the following procedure, with a uniform number of data points n_t = n in each stage.

  1. Draw n iid samples from ν to form the empirical probability measures ν̂_t;

  2. Construct the baseline models and DR-MCO models using ν̂_t;

  3. Solve these models using our DDP algorithm (Algorithm 1) to a desired accuracy or within the maximum number of iterations or computation time;

  4. Draw N iid sample paths from ν and evaluate the performance profiles (mean, variance, and quantiles) of the models on these sample paths.

In particular, we focus on limited or moderate training sample sizes n ∈ {5, 10, 20, 40}, while keeping the number of evaluation sample paths relatively large (N = 100,000). In each independent test run of our numerical experiment, the training samples used in a smaller-sized test are kept in the larger-sized tests, and the evaluation sample paths are held unchanged for all models and sample sizes.

Our algorithms and numerical examples are implemented using Julia 1.6 [8], with Gurobi 9.0 [22] interfaced through the JuMP package (version 0.23) [12]. We use 25 single-core 2.1 GHz Intel Xeon processors (24 for the worker processes and 1 for the manager process) with 50 GB of RAM to allow parallelization of the SSSO (Algorithm 3).

4.2 Multi-commodity Inventory Problems

We consider a multi-commodity inventory problem adapted from the ones studied in [18, 48]. Let 𝒥 ≔ {1, 2, …, J} denote the set of product indices. We first describe the variables in each stage t ∈ 𝒯. We use x^l_{t,j} to denote the inventory level, y^a_{t,j} (resp. x^b_{t,j}) the amount of express (resp. standard) order fulfilled in the current (resp. subsequent) stage, and y^r_{t,j} the amount of rejected order of each product j ∈ 𝒥. Let x_t ≔ (x^l_{t,1}, …, x^l_{t,J}, x^b_{t,1}, …, x^b_{t,J}) be the state variable and y_t ≔ (y^a_{t,1}, …, y^a_{t,J}, y^r_{t,1}, …, y^r_{t,J}) be the internal variable for each stage t ∈ 𝒯. The stage-t subproblem can be defined through the local cost functions f_t as

ft(xt1,xt;ξt)minyt\displaystyle f_{t}(x_{t-1},x_{t};\xi_{t})\coloneqq\min_{y_{t}}\quad CF+j𝒥(Ct,jayt,ja+Ct,jbxt,jb+Cjryt,jr+CjH[xt,jl]++CjB[xt,jl])\displaystyle C^{F}+\sum_{j\in\mathcal{J}}\left(C^{a}_{t,j}y^{a}_{t,j}+C^{b}_{t,j}x^{b}_{t,j}+C^{r}_{j}y^{r}_{t,j}+C^{H}_{j}[x^{l}_{t,j}]_{+}+C^{B}_{j}[x^{l}_{t,j}]_{-}\right) (32)
s.t.\displaystyle\mathrm{s.t.}\quad j𝒥yt,jaBc,\displaystyle\sum_{j\in\mathcal{J}}y^{a}_{t,j}\leq B^{c},
xt,jlxt1,jl+yt,ja+xt1,jb+yt,jrDt,j,j𝒥,\displaystyle x^{l}_{t,j}\leq x^{l}_{t-1,j}+y^{a}_{t,j}+x^{b}_{t-1,j}+y^{r}_{t,j}-D_{t,j},\forall\,j\in\mathcal{J},
yt,ja[0,Bja],yt,jr[0,Dt,j],j𝒥,\displaystyle y^{a}_{t,j}\in[0,B^{a}_{j}],\ y^{r}_{t,j}\in[0,D_{t,j}],\ \forall\,j\in\mathcal{J},
xt,jb[0,Bjb],xt,jl[Bjl,,Bjl,+],j𝒥.\displaystyle x^{b}_{t,j}\in[0,B^{b}_{j}],\ x^{l}_{t,j}\in[B^{l,-}_{j},B^{l,+}_{j}],\ \forall\,j\in\mathcal{J}.

In the definition (32), C^a_{t,j} = C^a_{t,j}(ξ_t) (resp. C^b_{t,j} = C^b_{t,j}(ξ_t)) denotes the uncertain express (resp. standard) order unit cost, C^H_j (resp. C^B_j) the inventory holding (resp. backlogging) unit cost, C^r_j the penalty on order rejections, C^F ≡ 1 a positive fixed cost, B^a_j (resp. B^b_j) the bound for the express (resp. standard) order, B^{l,−}_j, B^{l,+}_j the bounds on the backlogging and inventory levels, and D_{t,j} = D_{t,j}(ξ_t) the uncertain demand for product j. The first constraint in (32) imposes a cumulative bound B^c on the express orders, the second constraint characterizes the change in the inventory level, and the rest are bounds on the decision variables for each product. The notations [x]_+ ≔ max{x, 0} and [x]_− ≔ −min{0, x} denote the positive and negative parts of a real number x. The initial state is given by x^b_{0,j} = x^l_{0,j} = 0 for all j ∈ 𝒥. Before discussing the details of the uncertain parameters C^a_{t,j}, C^b_{t,j}, or D_{t,j}, we make the following remarks on the definition (32).

First, it is easy to check that if C^a_{t,j}, C^b_{t,j} (resp. D_{t,j}) are deterministic, then Assumption 4 (resp. Assumption 3) is satisfied, so we can apply the SSSO implementation discussed in Section 3.4 (resp. Section 3.3). Second, as the bounds B^{l,−}_j, B^{l,+}_j do not change with t and all orders can be rejected (i.e., y^r_{t,j} = D_{t,j} is feasible for all j ∈ 𝒥), the problem (32) has relatively complete recourse. Third, the Lipschitz constant of the value functions Q_t(⋅; ξ_t) is uniformly bounded by Σ_{j∈𝒥} C^r_j, so Proposition 6 applies if we set the regularization factors sufficiently large, M_t ≥ Σ_{j∈𝒥} C^r_j; besides, the Lipschitz continuity guarantees the adjustable in-sample conservatism by Theorem 5. Last, since all state variables are bounded, we know by Theorem 7 that Algorithm 1 always converges on our inventory problem (32).

4.2.1 Inventory Problems with Uncertain Demands

First, we consider the inventory problems with uncertain demands, where the goal is to seek a policy with minimum mean inventory cost plus the penalty on order rejections. The uncertain demands are modeled by the following expression:

Dt,j(ξt):=D0[1+cos(2π(t+j)τ)]+D¯ξt,j,j𝒥,t𝒯.D_{t,j}(\xi_{t})\mathrel{\mathop{\ordinarycolon}}=D_{0}\left[1+\cos\biggl{(}\frac{2\pi(t+j)}{\tau}\biggr{)}\right]+\bar{D}\cdot\xi_{t,j},\quad j\in\mathcal{J},t\in\mathcal{T}. (33)

Here, D0D_{0} is a factor and τ\tau is the period for the base demands, and D¯\bar{D} is the bound on the uncertain demands. The uncertain vector ξt[0,1]J\xi_{t}\in[0,1]^{J} has its components described as follows: ξt,1Uniform(0,1)\xi_{t,1}\sim\mathrm{Uniform}(0,1), and for j=2,,Jj=2,\dots,J, we have

ξt,jξt,j1{Uniform(0,(1+ξt,j1)/2),if ξt,j112,Uniform(ξt,j1/2,1),otherwise.\xi_{t,j}\mid\xi_{t,j-1}\sim\left\{\begin{aligned} &\mathrm{Uniform}(0,(1+\xi_{t,j-1})/2),\quad&&\text{if }\xi_{t,j-1}\leq\frac{1}{2},\\ &\mathrm{Uniform}(\xi_{t,j-1}/2,1),\quad&&\text{otherwise}.\end{aligned}\right. (34)
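A minimal Julia sketch (ours; the function name and default parameters are hypothetical) of sampling one stage-t uncertainty vector from (34) and forming the demands (33):

function sample_demands(t; J = 3, tau = 5, D0 = 5.0, Dbar = 50.0)
    xi = zeros(J)
    xi[1] = rand()                                    # ξ_{t,1} ~ Uniform(0,1)
    for j in 2:J                                      # conditional chain (34)
        xi[j] = xi[j-1] <= 0.5 ?
            rand() * (1 + xi[j-1]) / 2 :              # Uniform(0, (1+ξ_{t,j-1})/2)
            xi[j-1] / 2 + rand() * (1 - xi[j-1] / 2)  # Uniform(ξ_{t,j-1}/2, 1)
    end
    D = [D0 * (1 + cos(2pi * (t + j) / tau)) + Dbar * xi[j] for j in 1:J]
    return D, xi
end

println(sample_demands(2))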

For the experiments, we consider J = 3 products and T = τ = 5 stages. The order unit costs of each product are deterministically set to C^a_{t,j} = 5 and C^b_{t,j} = 1 for all t ∈ 𝒯; the inventory holding and backlogging costs are C^H_j = 2 and C^B_j = 10, and the rejection costs are C^r_j = 100, for each j ∈ 𝒥. The bounds are set to B^c = 15, B^a_j = 10, B^b_j = 20, B^{l,−}_j = 10, and B^{l,+}_j = 100 for each j ∈ 𝒥. We pick the uncertainty parameters D_0 = 5 and D̄ = 50. We terminate the DDP algorithm when it reaches a 1% relative optimality gap or 2000 iterations. For the Wasserstein ambiguity sets, we only consider the radius constraint (i.e., m_t = 0), with the radius set relative to the following estimate of the distance among data points:

d^t:=maxk=1,,ntWt(ν^t,δξ^t,k)=maxk=1,,nt1ntkkξ^t,kξ^t,k.\hat{d}_{t}\mathrel{\mathop{\ordinarycolon}}=\max_{k=1,\dots,n_{t}}W_{t}(\hat{\nu}_{t},\delta_{\hat{\xi}_{t,k}})=\max_{k=1,\dots,n_{t}}\frac{1}{n_{t}}\sum_{k^{\prime}\neq k}\lVert\hat{\xi}_{t,k}-\hat{\xi}_{t,k^{\prime}}\rVert. (35)
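The estimate (35) is straightforward to compute from the stage-t data; a small Julia sketch (ours, with the function name hypothetical):

using LinearAlgebra

# d̂_t = max_k (1/n_t) Σ_{k′≠k} ‖ξ̂_{t,k} − ξ̂_{t,k′}‖ from (35).
function dhat(points::Vector{<:AbstractVector})
    n = length(points)
    maximum(sum(norm(points[k] - points[kp]) for kp in 1:n if kp != k) / n
            for k in 1:n)
end

data = [rand(3) for _ in 1:5]   # n_t = 5 empirical outcomes in [0,1]^3
println(dhat(data))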

For the MSCO models, we directly use the empirical probability measures ν^t\hat{\nu}_{t} for each t𝒯t\in\mathcal{T}. Further, we consider parameters α{0.01,0.05,0.10}\alpha\in\{0.01,0.05,0.10\} and β{0.0,0.25,0.50,0.75}\beta\in\{0.0,0.25,0.50,0.75\} for the CVaR risk-averse MSCO models.

Using the experiment procedure described in Section 4.1, we present the results of our data-driven DR-MCO model with Wasserstein ambiguity sets and the baseline models.

Figure 1: Out-of-sample Cost Quantiles for Different Radii on Multi-commodity Inventory with Uncertain Demands

Figure 1 (and Figures 5 and 6 in Section B) displays the out-of-sample cost quantiles of the nominal stochastic model and the DR-MCO models with different Wasserstein radii, constructed from the empirical probability measures ν̂_t. Here we use the log-radius −∞ to denote the nominal stochastic model, i.e., ρ_{t,0} = 0. From the plot, we see that in the small-sample case (n_t = 5), the Wasserstein DR-MCO model significantly reduces the top 10% out-of-sample evaluation costs when the radius is set to 10^{−1.4}–10^{−1.0} times the estimate d̂_t. Moreover, the difference between the top 10% and bottom 10% of the out-of-sample evaluation costs becomes smaller around the radius 10^{−0.8}⋅d̂_t, even for larger sample sizes. However, the median out-of-sample cost increases with the Wasserstein radius, suggesting that larger Wasserstein radii may lead to overly conservative policies.

Figure 2: Comparison against Baseline Models on Multi-commodity Inventory with Uncertain Demands

To better quantify the trade-off between the mean and variance of the out-of-sample evaluation costs, we present Figure 2 (and Figures 7 and 8 in Section B). Here, the lines connect the points representing Wasserstein DR-MCO models from the smallest radius to the largest one. We say one policy dominates another if the former has both a smaller mean and a smaller standard deviation, and we make the following observations. First, in all cases, the policy from the robust model is dominated by some policy from the Wasserstein DR-MCO model. Second, the policies from the nominal and CVaR MSCO models are all dominated by some policy from the DR-MCO model when the sample size is small (n_t = 5). Last, policies from the CVaR MSCO models could cross the frontier formed by policies from the DR-MCO models, especially when the sample size increases to n_t = 20 or 40.

4.2.2 Inventory Problems with Uncertain Prices

Now we discuss the inventory problems with uncertain prices and fixed demands. These problems can be viewed as a simplified model for supply contract problems [28], where the goal is to estimate the total cost of such a supply contract, and underestimation would be very undesirable. The uncertain prices are modeled by the following expression:

Ct,jb(ξt):=ξt,j,Ct,ja(ξt):=C1ξt,j,j𝒥,t𝒯.C^{b}_{t,j}(\xi_{t})\mathrel{\mathop{\ordinarycolon}}=\xi_{t,j},\quad C^{a}_{t,j}(\xi_{t})\mathrel{\mathop{\ordinarycolon}}=C_{1}\cdot\xi_{t,j},\quad j\in\mathcal{J},t\in\mathcal{T}. (36)

Here, C1C_{1} is a factor for express orders. The uncertain vector ξt0J\xi_{t}\in\mathbb{R}^{J}_{\geq 0} follows a truncated multivariate normal distribution:

\xi_{t}:=\max\left\{\mathrm{Normal}(\mu_{t},\bar{C}\cdot\Sigma_{t}),\underline{C}\right\},\quad\mu_{t,j}:=C_{0}\left[1+\sin\biggl{(}\frac{2\pi(t+j)}{\tau}\biggr{)}\right],\ j\in\mathcal{J}, (37)

where the maximum is taken componentwise, C_0 is a factor for the base prices, τ is the period, C̄ is the magnitude of the price variation, C̲ is the lower bound on the prices, and the covariance matrix Σ_t is randomly generated (by multiplying a uniformly distributed random matrix with its transpose) and normalized to have maximum eigenvalue 1. The demands are deterministically given by

Dt,j:=D0[1+cos(2π(t+j)τ)]+D¯,j𝒥,t𝒯.D_{t,j}\mathrel{\mathop{\ordinarycolon}}=D_{0}\left[1+\cos\biggl{(}\frac{2\pi(t+j)}{\tau}\biggr{)}\right]+\bar{D},\quad j\in\mathcal{J},t\in\mathcal{T}. (38)
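A minimal Julia sketch (ours; names and defaults are hypothetical, with the identity matrix standing in for the randomly generated Σ_t, which must be J×J) of sampling the stage-t prices per (36)–(37):

using LinearAlgebra

function sample_prices(t; J = 5, tau = 5, C0 = 1.0, C1 = 5.0,
                       Cbar = 0.1, Clow = 0.001, Sigma = Matrix(1.0I, 5, 5))
    mu = [C0 * (1 + sin(2pi * (t + j) / tau)) for j in 1:J]
    L = cholesky(Symmetric(Cbar * Sigma)).L     # draw from Normal(μ_t, C̄·Σ_t)
    xi = max.(mu .+ L * randn(J), Clow)         # componentwise truncation at C̲
    return xi, C1 .* xi                         # standard and express unit costs
end

println(sample_prices(3))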

For the experiments, we consider J = 5 products, T = 10 stages, and the period τ = 5. The price uncertainty has parameters C_0 = 1, C_1 = 5, C̄ = 0.1, and C̲ = 0.001. We choose the demand parameters D_0 = 5 and D̄ = 10. The inventory holding and backlogging costs are C^H_j = 1 and C^B_j = 10, and the rejection costs are C^r_j = 100, for each j ∈ 𝒥. The bounds are set to B^c = 15, B^a_j = 10, B^b_j = 20, B^{l,−}_j = 20, and B^{l,+}_j = 20 for each j ∈ 𝒥. The Wasserstein radii in the DR-MCO models are set relative to d̂_t defined in (35). The baseline MSCO models are constructed in the same way as described in Section 4.2.1. We do not consider the baseline MRCO model here, as the uncertainty set Ξ_t is unbounded.

Figure 3: In-sample and Out-of-sample Mean Costs on Multi-commodity Inventory with Uncertain Prices

We plot the in-sample objective costs and out-of-sample mean evaluation costs in Figure 3 (and Figures 9 and 10 in Section B). The label Nominal refers to the nominal risk-neutral MSCO model using the empirical probability measures ν̂_t; Wass(γ) refers to the Wasserstein DR-MCO model with radius ρ_{t,0} = γ⋅d̂_t in each stage t ≥ 2; and CVaR(α, β) refers to the CVaR risk-averse MSCO model with parameters α and β. As the uncertainty vectors now have unbounded support, the robust model is no longer applicable. We see that in all cases the in-sample objective cost grows linearly with respect to the Wasserstein radius, as predicted by Theorem 5. As the nominal stochastic model inevitably underestimates the mean evaluation costs, using Wasserstein DR-MCO models with a relative radius γ ∈ [1.6, 2.4], depending on the sample size n_t, could achieve the out-of-sample performance guarantee in almost all test cases. Moreover, none of the CVaR risk-averse models in the experiments achieves a similar effect. Thus we believe that the Wasserstein DR-MCO models are particularly favorable in the context of supply contracts. It is, however, worth mentioning that we do not observe any improvement in the mean or the variance of the evaluation costs from the Wasserstein DR-MCO models over the baseline models.

4.3 Hydro-thermal Power Planning Problem

We next consider the Brazilian interconnected power system planning problem described in [11]. Let 𝒥 = {1, …, J} denote the indices of the four regions in the system, and ℒ = ∪_{j∈𝒥} ℒ_j the indices of thermal power plants, where each of the disjoint subsets ℒ_j is associated with the region j ∈ 𝒥. We first describe the decision variables in each stage t ∈ 𝒯. We use x^l_{t,j} to denote the stored energy level, y^h_{t,j} the hydro power generation, and y^s_{t,j} the energy spillage of region j ∈ 𝒥; and y^g_{t,l} the thermal power generation of plant l ∈ ℒ. For two different regions j ≠ j′ ∈ 𝒥, we use y^e_{t,j,j′} to denote the energy exchange from region j to region j′, and y^a_{t,j,j′} the deficit account for region j in region j′. Let x_t ≔ (x^l_{t,1}, …, x^l_{t,J}) be the state vector of energy levels; y_t the internal decision vector consisting of y^s_{t,j} and y^h_{t,j} for j ∈ 𝒥, y^g_{t,l} for l ∈ ℒ, and y^e_{t,j,j′} and y^a_{t,j,j′} for j ≠ j′ ∈ 𝒥; and (ξ_{t,1}, …, ξ_{t,J}) ∈ ℝ^J_{≥0} the uncertain vector of energy inflows in stage t. We define the DR-MCO by specifying the cost functions f_t as

ft(xt1,xt;ξt)minyt\displaystyle f_{t}(x_{t-1},x_{t};\xi_{t})\coloneqq\min_{y_{t}}\quad j𝒥(Csyt,js+ljClgyt,lg+jj(Cj,jeyt,j,je+Cj,jayt,j,ja))\displaystyle\sum_{j\in\mathcal{J}}\bigg{(}C^{s}y^{s}_{t,j}+\sum_{l\in\mathcal{L}_{j}}C^{g}_{l}y^{g}_{t,l}+\sum_{j^{\prime}\neq j}\big{(}C^{e}_{j,j^{\prime}}y^{e}_{t,j,j^{\prime}}+C^{a}_{j,j^{\prime}}y^{a}_{t,j,j^{\prime}}\big{)}\bigg{)} (39)
s.t.\displaystyle\mathrm{s.t.}\quad xt,jl+yt,jh+yt,js=xt1,jl+ξt,j,j𝒥,\displaystyle x^{l}_{t,j}+y^{h}_{t,j}+y^{s}_{t,j}=x^{l}_{t-1,j}+\xi_{t,j},\ \forall\,j\in\mathcal{J},
yt,jh+ljyt,lg+jj(yt,j,jayt,j,je+yt,j,je)=Dt,j,j𝒥,\displaystyle y^{h}_{t,j}+\sum_{l\in\mathcal{L}_{j}}y^{g}_{t,l}+\sum_{j^{\prime}\neq j}(y^{a}_{t,j,j^{\prime}}-y^{e}_{t,j,j^{\prime}}+y^{e}_{t,j^{\prime},j})=D_{t,j},\ \forall\,j\in\mathcal{J},
yt,lg[Blg,,Blg,+],l,\displaystyle y^{g}_{t,l}\in[B^{g,-}_{l},B^{g,+}_{l}],\ \forall\,l\in\mathcal{L},
xt,jl[0,Bjl],yt,jh[0,Bjh],j𝒥,\displaystyle x^{l}_{t,j}\in[0,B^{l}_{j}],\ y^{h}_{t,j}\in[0,B^{h}_{j}],\ \forall\,j\in\mathcal{J},
yt,j,ja[0,Bj,ja],yt,j,je[0,Bj,je],j,j𝒥.\displaystyle y^{a}_{t,j,j^{\prime}}\in[0,B^{a}_{j,j^{\prime}}],\ y^{e}_{t,j,j^{\prime}}\in[0,B^{e}_{j,j^{\prime}}],\ \forall\,j,j^{\prime}\in\mathcal{J}.

Here in the formulation, C^s denotes the unit penalty on energy spillage, C^g_l the unit cost of thermal power generation of plant l, C^e_{j,j′} the unit cost of power exchange from region j to region j′, C^a_{j,j′} the unit cost on the energy deficit account for region j in region j′, D_{t,j} the deterministic power demand in stage t and region j, B^l_j the bound on the storage level in region j, B^h_j the bound on hydro power generation in region j, B^{g,−}_l, B^{g,+}_l the lower and upper bounds of thermal power generation in plant l, B^a_{j,j′} the bound on the deficit account for region j in region j′ such that Σ_{j′≠j} B^a_{j,j′} = D_{t,j}, and B^e_{j,j′} the bound on the energy exchange from region j to region j′. The first constraint in (39) characterizes the change of energy storage levels in each region, the second constraint imposes the power generation-demand balance for each region, and the rest are bounds on the decision variables. The initial state x_0 and uncertainty vector ξ_1 are given by data. In our experiment, we consider T = 13, and all other parameters used in this problem can be found in [11].

For the problem (39), we have the following remarks. First, we always have relatively complete recourse, as we allow spillage of extra energy inflows and deficit for energy demands in each region; moreover, it is straightforward to check that the Lipschitz constant of f_t in either x_{t-1} or ξ_t is bounded by the maximum of the deficit cost C^a_{j,j′} and the spillage penalty C^s. Second, as the uncertainty now has the unbounded support ℝ^J_{≥0}, we need to estimate the growth rate r_t(x_{t-1}) of the value function. Note that if the inflow in any region is sufficiently large that all demands and energy exchanges have met their upper bounds, then the only cost incurred by further increasing the inflow is the spillage penalty. Thus we conclude that r_t(x_{t-1}) ≡ C^s⋅J, a constant function for all x_{t-1} ∈ 𝒳_{t-1}. Last, it is easy to see that Assumption 4 holds for the problem (39) and that the state spaces 𝒳_t are compact, so our SSSO implementation (Algorithm 3) guarantees the convergence of the DDP algorithm (Algorithm 1).

Now we assume that the true uncertainty can be described by the following logarithmic autoregressive time series:

lnξtμt=φt(lnξt1μt1)+εt,εtNormal(0,Σt),\ln{\xi_{t}}-\mu_{t}=\varphi_{t}(\ln{\xi_{t-1}}-\mu_{t-1})+\varepsilon_{t},\quad\varepsilon_{t}\sim\mathrm{Normal}(0,\Sigma_{t}), (40)

where the logarithm and the product are taken componentwise, and the parameters μ_t, φ_t ∈ ℝ^J and Σ_t ∈ 𝒮^J_{⪰0} are fit from historical data (see modeling details in [44] and coding details in [11]). Note that (40) is not linear with respect to the uncertainty vectors ξ_t, and consequently it cannot be directly reformulated into a stagewise independent MSCO (or DR-MCO) [30]. While there are approaches based on Markov chain DDP or a linearized version of the model (40), they would require altering the DDP algorithm or increasing the state space dimension. Alternatively, we would like to study the effects of the stagewise independence assumption [13] and the Wasserstein ambiguity sets in our DR-MCO model (2).
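A minimal Julia sketch (ours; all parameter values below are placeholders, not the fitted ones from [44, 11]) of simulating the inflow process (40):

using LinearAlgebra

# ln ξ_t = μ_t + φ_t .* (ln ξ_{t-1} − μ_{t-1}) + ε_t, with ε_t ~ Normal(0, Σ_t)
# drawn via a Cholesky factor of Σ_t.
function simulate_inflows(xi1, mus, phis, Sigmas)
    xis = [copy(xi1)]
    for t in 2:length(mus)
        eps = cholesky(Symmetric(Sigmas[t])).L * randn(length(xi1))
        lnxi = mus[t] .+ phis[t] .* (log.(xis[t-1]) .- mus[t-1]) .+ eps
        push!(xis, exp.(lnxi))
    end
    return xis
end

J, T = 4, 13
mus = [fill(2.0, J) for _ in 1:T]
phis = [fill(0.5, J) for _ in 1:T]
Sigmas = [0.01 * Matrix(1.0I, J, J) for _ in 1:T]
println(simulate_inflows(exp.(mus[1]), mus, phis, Sigmas)[end])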

Under the assumption (40) on the true uncertainty, each uncertainty vector ξ_t follows a multivariate lognormal distribution. Thus, instead of directly using the empirical probability measures ν̂_t in the MSCO models, we can fit lognormal distributions to the empirical outcomes ξ̂_{t,1}, …, ξ̂_{t,n_t}, from which we further construct the SAA probability measures ν̃_t for the MSCO models. Moreover, we can also use this SAA probability measure ν̃_t to estimate the Wasserstein distance bound ρ_{t,0} using

d~t:=Wt(ν^t,ν~t)=minπk,k0\displaystyle\tilde{d}_{t}\mathrel{\mathop{\ordinarycolon}}=W_{t}(\hat{\nu}_{t},\tilde{\nu}_{t})=\min_{\pi_{k,k^{\prime}}\geq 0} k=1ntk=1ntπk,kdt(ξ^t,k,ξ~t,k)\displaystyle\sum_{k=1}^{n_{t}}\sum_{k^{\prime}=1}^{n^{\prime}_{t}}\pi_{k,k^{\prime}}\cdot d_{t}(\hat{\xi}_{t,k},\tilde{\xi}_{t,k^{\prime}}) (41)
s.t.\displaystyle\mathrm{s.t.} k=1ntπk,k=1nt,k=1,,nt,\displaystyle\sum_{k^{\prime}=1}^{n^{\prime}_{t}}\pi_{k,k^{\prime}}=\frac{1}{n_{t}},\quad k=1,\dots,n_{t},
\sum_{k=1}^{n_{t}}\pi_{k,k^{\prime}}=\frac{1}{n^{\prime}_{t}},\quad k^{\prime}=1,\dots,n^{\prime}_{t}.
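The discrete optimal transport problem (41) is a small linear program; a Julia/JuMP sketch (ours, again using the open-source HiGHS solver and random placeholder data in place of the empirical and SAA outcomes):

using JuMP, HiGHS, LinearAlgebra

function wasserstein_distance(xhat, xtil)
    n, np = length(xhat), length(xtil)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, plan[1:n, 1:np] >= 0)            # transport plan π_{k,k′}
    @constraint(model, [k = 1:n], sum(plan[k, :]) == 1 / n)
    @constraint(model, [kp = 1:np], sum(plan[:, kp]) == 1 / np)
    @objective(model, Min,
        sum(plan[k, kp] * norm(xhat[k] - xtil[kp]) for k in 1:n, kp in 1:np))
    optimize!(model)
    return objective_value(model)
end

# n_t = 5 empirical outcomes vs. n′_t = 8 SAA outcomes in ℝ^4
println(wasserstein_distance([rand(4) for _ in 1:5], [rand(4) for _ in 1:8]))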

We then set the Wasserstein radius to be ρt,0=γd~t\rho_{t,0}=\gamma\cdot\tilde{d}_{t} for the relative factors γ=102.0,101.8,,100.2,1.0\gamma=10^{-2.0},10^{-1.8},\dots,10^{-0.2},1.0. For the baseline risk-averse MSCO models, we use CVaR parameters α=0.1\alpha=0.1 and β{0.0,0.1,,0.9,1.0}\beta\in\{0.0,0.1,\dots,0.9,1.0\}.

Note that since the SAA resampling step is random, the performance of our DR-MCO and MSCO models is also random. In addition, as it is often very challenging to solve the problem (39) to a given optimality gap, we terminate the algorithm after a maximum of 1000 iterations and allow random sampling in Algorithm 1, in which the noninitial stage step does not strictly follow Definition 2 and guarantees only validity. The benefit of such random sampling is that, empirically, the under-approximations (hence the policies) often improve faster, especially in the early iterations of the algorithm. We refer any interested readers to the stochastic DDP literature (e.g., [2, 49, 26]) for more details.

Figure 4: Comparison against Baseline Models on Hydro-thermal Power Planning

Due to the randomness of the models and the algorithm, we repeat each test 3 times with the same empirical probability measures \hat{\nu}_t in the experiment procedure. We then plot the median values, with error bars indicating the maximum and minimum values, in Figure 4 (and Figures 11 and 12 in Section B), for increasing values of \gamma in the Wasserstein DR-MCO models and decreasing values of \beta in the CVaR risk-averse MSCO models. First, most of the Wasserstein DR-MCO models and the CVaR risk-averse MSCO models achieve better performance in either the mean or the standard deviation of the evaluation costs, compared with the risk-neutral MSCO model. This observation justifies the use of ambiguity sets or risk aversion when approximating a stagewise dependent problem with stagewise independent models. Second, when the sample sizes are small (n_t = 5 or 10), the Wasserstein DR-MCO models can outperform the CVaR risk-averse MSCO models in the out-of-sample mean cost (e.g., for \gamma \in [10^{-1.0}, 10^{-0.6}]). Third, as the sample size grows to n_t = 20 or 40, the policies obtained from the Wasserstein DR-MCO models become dominated by those from the CVaR risk-averse MSCO models, as the latter can achieve both lower mean cost and lower standard deviation. Our conjecture is that for larger sample sizes, the cost-to-go functions of the Wasserstein DR-MCO models have more complicated shapes, which makes them harder to approximate within the limit of 1000 iterations. This also suggests that the CVaR risk-averse MSCO models can deliver good out-of-sample performance when fitting a probability distribution is possible and the computation budget is limited.

5 Concluding Remarks

In this work, we study data-driven DR-MCO models with Wasserstein ambiguity sets. We show that with a sufficiently large Wasserstein radius, such DR-MCO models satisfy an out-of-sample performance guarantee with high probability, even for limited data sizes. Using a convex dual reformulation, we show that the in-sample conservatism is linearly bounded by the radius when the value functions are Lipschitz continuous in the uncertainties. To numerically solve the data-driven DR-MCO models, we design exact SSSO implementations for the DDP algorithms by exploiting either the concavity or the convexity of the cost functions in the uncertainties. We conduct extensive numerical experiments to compare the performance of our DR-MCO models against MRCO models and risk-neutral and risk-averse MSCO models. On the multi-commodity inventory problems with uncertain demands, we observe that the DR-MCO models provide policies that dominate the baseline model policies in the out-of-sample evaluations when the in-sample data size is small. Moreover, on the inventory problems with uncertain prices, the DR-MCO models achieve the out-of-sample performance guarantee with little compromise in the objective value, which is not achieved by the baseline models. On the hydro-thermal power planning problems with uncertain energy inflows, we see that with a limited number of iterations, the policies from the DR-MCO models achieve better out-of-sample performance than the risk-averse MSCO models for small data sizes, but are dominated by them for larger data sizes. We hope these numerical experiments can serve as benchmarks for future studies on DR-MCO problems.

References

  • [1] Regan Baucke, Anthony Downward, and Golbon Zakeri. A deterministic algorithm for solving stochastic minimax dynamic programmes. Optimization Online, 2018.
  • [2] Regan Baucke, Anthony Downward, and Golbon Zakeri. A deterministic algorithm for solving multistage stochastic programming problems. Optimization Online, pages 1–25, 2017.
  • [3] Heinz H Bauschke and Patrick L Combettes. Convex analysis and monotone operator theory in Hilbert spaces, volume 408. Springer, 2011.
  • [4] Güzin Bayraksan and David K Love. Data-driven stochastic programming using phi-divergences. In The operations research revolution, pages 1–19. INFORMS, 2015.
  • [5] Aharon Ben-Tal, Dick Den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341–357, 2013.
  • [6] Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski. Robust optimization, volume 28. Princeton University Press, 2009.
  • [7] Dimitris Bertsimas, Vishal Gupta, and Nathan Kallus. Data-driven robust optimization. Mathematical Programming, 167(2):235–292, 2018.
  • [8] Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.
  • [9] Michele Conforti, Gérard Cornuéjols, and Giacomo Zambelli. Integer programming, volume 271. Springer, 2014.
  • [10] Erick Delage and Yinyu Ye. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research, 58(3):595–612, 2010.
  • [11] Lingquan Ding, Shabbir Ahmed, and Alexander Shapiro. A Python package for multi-stage stochastic programming. Optimization Online, pages 1–42, 2019.
  • [12] Iain Dunning, Joey Huchette, and Miles Lubin. JuMP: A modeling language for mathematical optimization. SIAM Review, 59(2):295–320, 2017.
  • [13] Daniel Duque and David P Morton. Distributionally robust stochastic dual dynamic programming. SIAM Journal on Optimization, 30(4):2841–2865, 2020.
  • [14] Andreas Eichhorn and Werner Römisch. Polyhedral risk measures in stochastic programming. SIAM Journal on Optimization, 16(1):69–95, 2005.
  • [15] Peyman Mohajerin Esfahani and Daniel Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1):115–166, 2018.
  • [16] Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3):707–738, 2015.
  • [17] Rui Gao and Anton Kleywegt. Distributionally robust stochastic optimization with Wasserstein distance. Mathematics of Operations Research, 2022.
  • [18] Angelos Georghiou, Angelos Tsoukalas, and Wolfram Wiesemann. Robust dual dynamic programming. Operations Research, 67(3):813–830, 2019.
  • [19] Pierre Girardeau, Vincent Leclere, and Andrew B Philpott. On the convergence of decomposition methods for multistage stochastic convex programs. Mathematics of Operations Research, 40(1):130–145, 2015.
  • [20] Vincent Guigues. Convergence analysis of sampling-based decomposition methods for risk-averse multistage stochastic convex programs. SIAM Journal on Optimization, 26(4):2468–2494, 2016.
  • [21] Vincent Guigues and Werner Römisch. Sampling-based decomposition methods for multistage stochastic programs based on extended polyhedral risk measures. SIAM Journal on Optimization, 22(2):286–312, 2012.
  • [22] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2022.
  • [23] Grani A Hanasusanto and Daniel Kuhn. Conic programming reformulations of two-stage distributionally robust linear programs over Wasserstein balls. Operations Research, 66(3):849–869, 2018.
  • [24] Hidetoshi Komiya. Elementary proof for Sion’s minimax theorem. Kodai Mathematical Journal, 11(1):5–7, 1988.
  • [25] Václav Kozmík and David P Morton. Evaluating policies in risk-averse multi-stage stochastic programming. Mathematical Programming, 152(1):275–300, 2015.
  • [26] Guanghui Lan. Complexity of stochastic dual dynamic programming. Mathematical Programming, pages 1–38, 2020.
  • [27] Guanghui Lan. Correction to: Complexity of stochastic dual dynamic programming. Mathematical Programming, pages 1–3, 2022.
  • [28] Chung-Lun Li and Panos Kouvelis. Flexible and risk-sharing supply contracts under price uncertainty. Management Science, 45(10):1378–1398, 1999.
  • [29] Fengming Lin, Xiaolei Fang, and Zheming Gao. Distributionally robust optimization: A review on theory and applications. Numerical Algebra, Control & Optimization, 12(1):159, 2022.
  • [30] Nils Löhndorf and Alexander Shapiro. Modeling time-dependent randomness in stochastic dual dynamic programming. European Journal of Operational Research, 273(2):650–661, 2019.
  • [31] Mario VF Pereira and Leontina MVG Pinto. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52(1):359–375, 1991.
  • [32] Andrew B Philpott, Vitor L de Matos, and Lea Kapelevich. Distributionally robust SDDP. Computational Management Science, 15(3):431–454, 2018.
  • [33] Andrew B Philpott and Ziming Guan. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters, 36(4):450–455, 2008.
  • [34] Andy Philpott, Vitor de Matos, and Erlon Finardi. On solving multistage stochastic programs with coherent risk measures. Operations Research, 61(4):957–970, 2013.
  • [35] Alois Pichler and Alexander Shapiro. Mathematical foundations of distributionally robust multistage optimization. SIAM Journal on Optimization, 31(4):3044–3067, 2021.
  • [36] Hamed Rahimian, Güzin Bayraksan, and Tito Homem-de Mello. Identifying effective scenarios in distributionally robust stochastic programs with total variation distance. Mathematical Programming, 173(1):393–430, 2019.
  • [37] Hamed Rahimian and Sanjay Mehrotra. Distributionally robust optimization: A review. arXiv preprint arXiv:1908.05659, 2019.
  • [38] R Tyrrell Rockafellar. Convex analysis. Number 28. Princeton University Press, 1970.
  • [39] R Tyrrell Rockafellar and Roger J-B Wets. Variational analysis, volume 317. Springer Science & Business Media, 2009.
  • [40] Herbert E Scarf. A min-max solution of an inventory problem. Rand Corporation, Santa Monica, 1957.
  • [41] Alexander Shapiro. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research, 209(1):63–72, 2011.
  • [42] Alexander Shapiro and Shabbir Ahmed. On a class of minimax stochastic programs. SIAM Journal on Optimization, 14(4):1237–1249, 2004.
  • [43] Alexander Shapiro, Darinka Dentcheva, and Andrzej Ruszczynski. Lectures on stochastic programming: modeling and theory. SIAM, 2021.
  • [44] Alexander Shapiro, Wajdi Tekaya, Joari P da Costa, and Murilo P Soares. Final report for technical cooperation between Georgia Institute of Technology and ONS – Operador Nacional do Sistema Elétrico. Georgia Tech ISyE Report, 2012.
  • [45] Alexander Shapiro, Wajdi Tekaya, Joari Paulo da Costa, and Murilo Pereira Soares. Risk neutral and risk averse stochastic dual dynamic programming method. European Journal of Operational Research, 224(2):375–391, 2013.
  • [46] Cédric Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
  • [47] Wolfram Wiesemann, Daniel Kuhn, and Melvyn Sim. Distributionally robust convex optimization. Operations Research, 62(6):1358–1376, 2014.
  • [48] Shixuan Zhang and Xu Andy Sun. On distributionally robust multistage convex optimization: New algorithms and complexity analysis. arXiv preprint arXiv:2010.06759, 2020.
  • [49] Shixuan Zhang and Xu Andy Sun. Stochastic dual dynamic programming for multistage stochastic mixed-integer nonlinear optimization. Mathematical Programming, pages 1–51, 2022.
  • [50] Chaoyue Zhao and Yongpei Guan. Data-driven risk-averse stochastic optimization with Wasserstein metric. Operations Research Letters, 46(2):262–267, 2018.

Appendix A Finite-dimensional Dual Recursion for DR-MCO

In this section, we briefly review the relevant strong Lagrangian duality theory and then apply it to derive the finite-dimensional dual recursion for our DR-MCO problem (2).

A.1 Generalized Slater Condition and Lagrangian Duality

Given an \mathbb{R}-vector space \mathcal{M}, we consider the following optimization problem.

v^{\mathrm{P}} := \inf_{\mu\in C} \varphi_0(\mu) \qquad (42)
\mathrm{s.t.} \quad \varphi_j(\mu) \leq 0, \quad j = 1,\dots,l,
\varphi_j(\mu) = 0, \quad j = l+1,\dots,m.

Here, C \subset \mathcal{M} is a convex subset, the functions \varphi_j : \mathcal{M} \to \mathbb{R}\cup\{+\infty\} are convex for each j = 0,1,\dots,l, and \varphi_j : \mathcal{M} \to \mathbb{R} are affine for each j = l+1,\dots,m. Using a vector of multipliers \lambda \in \mathbb{R}^m, the Lagrangian dual problem of (42) can be written as

v^{\mathrm{D}} := \sup_{\lambda\in\Lambda}\, \inf_{\mu\in C} \left\{ \varphi_0(\mu) + \sum_{j=1}^{m} \lambda_j \varphi_j(\mu) \right\}, \qquad (43)

where the admissible set of multipliers is defined as \Lambda := \{\lambda \in \mathbb{R}^m : \lambda_j \geq 0, \ \forall\, j = 1,\dots,l\}. We now show strong duality between (42) and (43) under the following condition.

Definition 3.

We say that the problem (42) satisfies the (generalized) Slater condition if the point \eta = 0 lies in the relative interior of the effective domain of the convex value function associated with the primal problem (42),

v(\eta) := \inf_{\mu\in C} \left\{ \varphi_0(\mu) : \varphi_j(\mu) \leq \eta_j, \ j = 1,\dots,l, \text{ and } \varphi_j(\mu) = \eta_j, \ j = l+1,\dots,m \right\}, \quad \eta \in \mathbb{R}^m.

Recall that the effective domain of a convex function v : \mathbb{R}^m \to \mathbb{R}\cup\{\pm\infty\} is defined as \mathrm{dom}\, v := \{\eta \in \mathbb{R}^m : v(\eta) < +\infty\}, which is clearly a convex set. The affine hull of a convex set K \subset \mathbb{R}^m is the smallest affine space containing K, and the relative interior of K is the interior of K viewed as a subset of its affine hull (equipped with the subspace topology). By convention, we set v(\eta) = +\infty if no \mu \in C satisfies the constraints above.

Proposition 15.

Under the Slater condition, strong duality holds, i.e., v^{\mathrm{P}} = v^{\mathrm{D}}, with an optimal dual solution \lambda^* \in \Lambda (i.e., the supremum in the dual problem (43) is attained).

Proof.

The weak duality v^{\mathrm{P}} \geq v^{\mathrm{D}} holds by the standard argument of exchanging the \inf and \sup operators, so it suffices to show that v^{\mathrm{P}} \leq v^{\mathrm{D}}. If v^{\mathrm{P}} = -\infty, the inequality holds trivially, so we assume v^{\mathrm{P}} > -\infty. Given the Slater condition, the value function v(\eta) of the primal problem (42) must be proper, i.e., v(\eta) > -\infty for all \eta \in \mathbb{R}^m (cf. Theorem 7.2 in [38]), because \eta = 0 lies in the relative interior of the effective domain of v and v(0) > -\infty. Thus v is also subdifferentiable at the point \eta = 0 (cf. Theorem 23.4 in [38]), i.e., there exists a subgradient vector \lambda^* \in \mathbb{R}^m such that v(\eta) \geq v(0) - (\lambda^*)^{\mathsf{T}}\eta for all \eta \in \mathbb{R}^m. For each j = 1,\dots,l, the multiplier \lambda^*_j must be nonnegative since the function v(\eta) is nonincreasing in the j-th component, so \lambda^* \in \Lambda. Since the inequality v(\eta) + (\lambda^*)^{\mathsf{T}}\eta \geq v(0) = v^{\mathrm{P}} holds for all \eta \in \mathbb{R}^m, we have

v^{\mathrm{P}} \leq \inf_{\eta\in\mathbb{R}^m} \left\{ v(\eta) + (\lambda^*)^{\mathsf{T}}\eta \right\}
= \inf_{\mu\in C} \inf_{\eta\in\mathbb{R}^m} \left\{ \varphi_0(\mu) + \sum_{j=1}^m \lambda^*_j \eta_j : \varphi_j(\mu) \leq \eta_j, \ j = 1,\dots,l, \ \varphi_j(\mu) = \eta_j, \ j = l+1,\dots,m \right\}
= \inf_{\mu\in C} \left\{ \varphi_0(\mu) + \sum_{j=1}^m \lambda^*_j \varphi_j(\mu) \right\} \leq v^{\mathrm{D}}.

The first equality results from exchanging the two infimum operators, while the second follows by taking \eta_j = \varphi_j(\mu) for each j = 1,\dots,l (using the nonnegativity of \lambda^*_j) and substituting \eta_j = \varphi_j(\mu) for each j = l+1,\dots,m. ∎

The strong Lagrangian duality guaranteed by the Slater condition is useful in many applications because we do not need to specify a topology on the vector space \mathcal{M}. The corollary below summarizes a special case in which there are no equality constraints and all inequality constraints can be strictly satisfied.
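To see why some constraint qualification is needed for the dual attainment in Proposition 15, consider the following standard one-dimensional example (our own illustration, not part of the original analysis): minimize x over x \in \mathbb{R} subject to x^2 \leq 0. Here v^{\mathrm{P}} = 0 is attained at x = 0, but the Slater condition fails, since v(\eta) = \inf\{x : x^2 \leq \eta\} = -\sqrt{\eta} for \eta \geq 0 gives \mathrm{dom}\, v = [0,+\infty), whose relative interior does not contain \eta = 0. The dual function is \inf_x \{x + \lambda x^2\} = -1/(4\lambda) for \lambda > 0 (and -\infty at \lambda = 0), so v^{\mathrm{D}} = \sup_{\lambda\geq 0} -1/(4\lambda) = 0 = v^{\mathrm{P}}, yet no finite \lambda^* attains the supremum.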

Corollary 16.

For problems (42) and (43) with l = m (no equality constraints), strong duality holds if there exists a point \bar{\mu} \in C such that \varphi_j(\bar{\mu}) < 0 for each j = 1,\dots,m.

Proof.

Let \varepsilon_j := -\varphi_j(\bar{\mu}) > 0 for j = 1,\dots,m, and let U := \prod_{j=1}^m (-\frac{\varepsilon_j}{2}, \frac{\varepsilon_j}{2}) \subset \mathbb{R}^m be an open hyperrectangle. Then for any \eta \in U, we have v(\eta) \leq \varphi_0(\bar{\mu}) < +\infty. Therefore the Slater condition holds because 0 \in U \subset \mathrm{dom}\, v, and the result follows from Proposition 15. ∎

Assuming that \mathcal{M} is normed and complete, one could state another generalized Slater condition for problems with equality constraints. However, it is in general much harder to check, so we do not base our discussion in Section 2 on it.

A.2 Finite-dimensional Dual Recursion

Now we derive the finite-dimensional dual recursion for the DR-MCO (2). Recall that by the definition of the Wasserstein distance (5), the constraint W_t(p, \hat{\nu}_t) \leq \rho_{t,0} is ensured if there exists a probability measure on the product space \pi_t \in \mathcal{M}^{\mathrm{Prob}}(\Xi_t \times \Xi_t) with marginal probability measures P^1_*(\pi_t) = p and P^2_*(\pi_t) = \hat{\nu}_t, such that

\rho_{t,0} \geq \int_{\Xi_t\times\Xi_t} d_t \,\mathrm{d}\pi_t = \int_{\Xi_t}\int_{\Xi_t} d_t(\xi,\xi')\,\mathrm{d}\pi_t(\xi|\xi')\,\mathrm{d}\hat{\nu}_t(\xi') = \frac{1}{n_t}\sum_{k=1}^{n_t}\int_{\Xi_t} d_{t,k}(\xi)\,\mathrm{d}p_{t,k}(\xi), \qquad (44)

where we define p_{t,k} := \pi_t(\cdot\,|\,\hat{\xi}_{t,k}) \in \mathcal{W}_t to be the probability measure conditioned on \{\xi' = \hat{\xi}_{t,k}\}, and d_{t,k}(\xi) := d_t(\xi, \hat{\xi}_{t,k}) for any \xi \in \Xi_t. Then we have p = \frac{1}{n_t}\sum_{k=1}^{n_t} p_{t,k} by the law of total probability. Based on this condition, we define the following parametrized optimization problem for any \rho > 0 and any state x_{t-1} \in \mathcal{X}_{t-1}:

q_t(x_{t-1};\rho) := \sup_{p_{t,k}\in\mathcal{W}_t} \frac{1}{n_t}\sum_{k=1}^{n_t}\int_{\Xi_t} Q_t(x_{t-1};\xi_t)\,\mathrm{d}p_{t,k}(\xi_t) \qquad (45)
\mathrm{s.t.} \quad \frac{1}{n_t}\sum_{k=1}^{n_t}\int_{\Xi_t} d_{t,k}\,\mathrm{d}p_{t,k} \leq \rho,
\frac{1}{n_t}\sum_{k=1}^{n_t}\int_{\Xi_t} g_{t,j}\,\mathrm{d}p_{t,k} \leq \rho_{t,j}, \quad j = 1,\dots,m_t.

From the discussion above, we see that \mathcal{Q}_{t-1}(x_{t-1}) \geq q_t(x_{t-1};\rho_{t,0}). We next show that equality holds, assuming the strict feasibility of the empirical measure \hat{\nu}_t.

Let \lambda \in \mathbb{R}^{m_t+1}_{\geq 0} denote a multiplier vector. By Corollary 16, Assumption 1 guarantees strong duality for the following Lagrangian dual problem:

q_t(x_{t-1};\rho) = \min_{\lambda\geq 0} \left\{ \rho\lambda_0 + \sum_{j=1}^{m_t}\rho_{t,j}\lambda_j + \frac{1}{n_t}\sum_{k=1}^{n_t} \sup_{p_{t,k}\in\mathcal{W}_t} \int_{\Xi_t} \bigg[ Q_t(x_{t-1};\cdot) - \lambda_0 d_{t,k} - \sum_{j=1}^{m_t}\lambda_j g_{t,j} \bigg] \mathrm{d}p_{t,k} \right\}
= \min_{\lambda\geq 0} \left\{ \rho\lambda_0 + \sum_{j=1}^{m_t}\rho_{t,j}\lambda_j + \frac{1}{n_t}\sum_{k=1}^{n_t} \sup_{\xi_k\in\Xi_t} \bigg\{ Q_t(x_{t-1};\xi_k) - \lambda_0 d_{t,k}(\xi_k) - \sum_{j=1}^{m_t}\lambda_j g_{t,j}(\xi_k) \bigg\} \right\}, \qquad (46)

where the second equality holds because each Dirac measure centered at \xi \in \Xi_t satisfies \delta_\xi \in \mathcal{W}_t and each p_{t,k} \in \mathcal{W}_t is a probability measure. We are now ready to show that \mathcal{Q}_{t-1}(x_{t-1}) = q_t(x_{t-1};\rho_{t,0}), which implies Theorem 1.

Theorem 17.

Under Assumption 1, in any stage t \geq 2, the function q_t(x_{t-1};\rho) is concave in \rho for any fixed x_{t-1}. Consequently, the expected cost-to-go function (2) satisfies \mathcal{Q}_{t-1}(x_{t-1}) = q_t(x_{t-1};\rho_{t,0}) and thus can be equivalently rewritten as

\mathcal{Q}_{t-1}(x_{t-1}) = \min_{\lambda\geq 0} \left\{ \sum_{j=0}^{m_t}\rho_{t,j}\lambda_j + \frac{1}{n_t}\sum_{k=1}^{n_t} \sup_{\xi_k\in\Xi_t} \left\{ Q_t(x_{t-1};\xi_k) - \lambda_0 d_{t,k}(\xi_k) - \sum_{j=1}^{m_t}\lambda_j g_{t,j}(\xi_k) \right\} \right\}.
Proof.

Let us fix any x_{t-1} \in \mathcal{X}_{t-1}. The first assertion on the concavity of q_t(x_{t-1};\rho) follows directly from (46), since q_t(x_{t-1};\cdot) is a minimum of affine functions of \rho. Moreover, from the definition (45) we see that q_t(x_{t-1};\rho) \geq 0 for any \rho > 0, because the measures p_{t,k} = \delta_{\hat{\xi}_{t,k}} satisfy the constraints and Q_t(x_{t-1};\xi_t) \geq 0 by the nonnegativity of the cost functions f_t. If q_t(x_{t-1};\rho_{t,0}) = +\infty, the equality holds trivially, as we have already shown that \mathcal{Q}_{t-1}(x_{t-1}) \geq q_t(x_{t-1};\rho_{t,0}). Otherwise, we must have q_t(x_{t-1};\rho) < +\infty for any \rho > 0 due to the concavity, and thus q_t(x_{t-1};\cdot) is a continuous function on (0,+\infty).

To prove the inequality \mathcal{Q}_{t-1}(x_{t-1}) \leq q_t(x_{t-1};\rho_{t,0}), take any \varepsilon > 0. From the definition (5), the constraint W_t(p,\hat{\nu}_t) \leq \rho_{t,0} implies that there exists a probability measure \pi_t \in \mathcal{M}^{\mathrm{Prob}}(\Xi_t\times\Xi_t) with marginal probability measures P^1_*(\pi_t) = p and P^2_*(\pi_t) = \hat{\nu}_t such that \int d_t\,\mathrm{d}\pi_t \leq \rho_{t,0} + \varepsilon. In other words, \mathcal{Q}_{t-1}(x_{t-1}) \leq q_t(x_{t-1};\rho_{t,0}+\varepsilon). By the continuity of q_t(x_{t-1};\cdot), we conclude that \mathcal{Q}_{t-1}(x_{t-1}) \leq \lim_{\varepsilon\to 0^+} q_t(x_{t-1};\rho_{t,0}+\varepsilon) = q_t(x_{t-1};\rho_{t,0}). ∎
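As a sanity check of the formula in Theorem 17, the following toy sketch (our own illustration, not the authors' implementation) evaluates the dual minimization numerically for a hypothetical one-dimensional stage with Q_t(x;\xi) = |\xi - x|, the Euclidean metric d_t, the bounded support \Xi_t = [0,10], and no additional moment constraints (m_t = 0); the inner suprema are approximated on a grid.

import numpy as np
from scipy.optimize import minimize_scalar

xi_hat = np.array([2.0, 4.0, 7.0])    # empirical outcomes of nu_hat_t
rho0, x = 0.5, 3.0                    # Wasserstein radius rho_{t,0} and state x_{t-1}
grid = np.linspace(0.0, 10.0, 2001)   # discretization of Xi_t for the inner sup

def dual_obj(lam0):
    # rho_{t,0} * lambda_0 + (1/n_t) sum_k sup_xi { Q_t(x;xi) - lambda_0 * d_{t,k}(xi) }
    inner = [np.max(np.abs(grid - x) - lam0 * np.abs(grid - xk)) for xk in xi_hat]
    return rho0 * lam0 + np.mean(inner)

# the dual objective is convex in lambda_0, so a bounded scalar search suffices
res = minimize_scalar(dual_obj, bounds=(0.0, 10.0), method="bounded")
print(f"Q_{{t-1}}(x) ~ {res.fun:.4f} at lambda_0 ~ {res.x:.4f}")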

Appendix B Supplemental Numerical Results

In this section, we display supplemental results from our numerical experiments in Section 4.

Figure 5: Out-of-sample Cost Quantiles for Different Radii on Multi-commodity Inventory with Uncertain Demands, Additional Run 1
Figure 6: Out-of-sample Cost Quantiles for Different Radii on Multi-commodity Inventory with Uncertain Demands, Additional Run 2
Figure 7: Comparison against Baseline Models on Multi-commodity Inventory with Uncertain Demands, Additional Run 1
Figure 8: Comparison against Baseline Models on Multi-commodity Inventory with Uncertain Demands, Additional Run 2
Figure 9: In-sample and Out-of-sample Mean Costs on Multi-commodity Inventory with Uncertain Prices, Additional Run 1
Figure 10: In-sample and Out-of-sample Mean Costs on Multi-commodity Inventory with Uncertain Prices, Additional Run 2
Figure 11: Comparison against Baseline Models on Hydro-thermal Power Planning, Additional Run 1
Figure 12: Comparison against Baseline Models on Hydro-thermal Power Planning, Additional Run 2