This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

On Distributionally Robust Multistage Convex Optimization: New Algorithms and Complexity Analysis

Shixuan Zhang Wm Michael Barnes ‘64 Department of Industrial & Systems Engineering, Texas A&M University, College Station, TX 77843, USA  shixuan.zhang@tamu.edu Xu Andy Sun Operations Research and Statistics, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139,  sunx@mit.edu
Abstract

This paper presents an algorithmic study and complexity analysis for solving distributionally robust multistage convex optimization (DR-MCO). We generalize the usual consecutive dual dynamic programming (DDP) algorithm to DR-MCO and propose a new nonconsecutive DDP algorithm that explores the stages in an adaptive fashion. We introduce dual bounds in the DDP recursions to prevent the growth of Lipschitz constants of the dual approximations caused by recursive cutting plane methods. We then provide a thorough subproblem-oracle-based complexity analysis of the proposed algorithms, proving both upper complexity bounds and a matching lower bound.

To the best of our knowledge, this is the first nonasymptotic complexity result for DDP-type algorithms on DR-MCO, which reveals that in some practical settings the new nonconsecutive DDP algorithm scales linearly with respect to the number of stages. Numerical examples are given to show the effectiveness of the proposed nonconsecutive DDP algorithm and the dual-bounding technique, including the reduction of the computation time or the number of subproblem oracle evaluations, and the capability to solve problems without relatively complete recourse.
Keywords: distributionally robust convex optimization, multistage optimization, dual dynamic programming algorithm, complexity analysis, cutting plane method

1 Introduction

Distributionally robust multistage convex optimization (DR-MCO) is a sequential decision making problem with convex objective functions and constraints, where the exact probability distribution of the uncertain parameters is unknown and decisions need to be made considering a family of distributions. DR-MCO provides a unified framework for studying decision-making under uncertainty. It includes as special cases both multistage stochastic convex optimization (MSCO), where some distribution of the uncertain parameters is known and the expected cost is to be minimized, and multistage robust convex optimization (MRCO), where the uncertainty is described by a set and the worst-case cost is to be minimized. DR-MCO as a general decision framework finds ubiquitous applications in energy system planning, supply chain and inventory control, portfolio optimization, finance, and many other areas (see  e.g. [39, 6]).

DR-MCO is in general challenging to solve, due to the fast growth of the number of decisions with respect to the number of decision stages [36, 15, 17]. Meanwhile, real-world problems are often endowed with special structures in the uncertainty. In particular, the uncertainty may exhibit stagewise independence (SI), i.e., the uncertainty in different stages are independent from each other. Many uncertainty structures, such as autoregressive stochastic models, can be reformulated to satisfy SI [41]. This versatile modeling capability of SI has great implications on computation. It allows recursive formulations of cost-to-go functions in each stage of a DR-MCO to be independent of the outcomes in its previous stages, thus making efficient approximations of the cost-to-go functions possible. Indeed, SI has been successfully exploited by various algorithms in solving MSCO and MRCO [37, 32, 17, 30, 3, 4, 1, 43, 38].

Among algorithms for DR-MCO, dual dynamic programming (DDP) is a prevalent class of recursive cutting plane algorithms that originate from nested Benders decomposition for multistage stochastic linear optimization [8, 28]. The earliest form of DDP for MSCO using stochastic sampling method was proposed in [29], known as stochastic dual dynamic programming (SDDP), where in each iteration the scenarios are sampled randomly and solved sequentially before updating the cost-to-go functions recursively. The algorithm has since been widely adopted in areas such as energy systems scheduling [16, 11, 40]. The deterministic sampling version of DDP was later proposed, which uses both over- and under-approximations for sampling and termination [3, 4]. DDP has also been extended to multistage stochastic nonconvex problems [43, 44, 1, 42]. Robust dual dynamic programming (RDDP) is proposed for multistage robust linear optimization [17]. Due to its intrinsic difficulty, the uncertainty sets are assumed to be polytopes such that the subproblem in each stage can be solved via a vertex enumeration technique over the uncertainty set. Similar to the deterministic DDP, RDDP constructs both over- and under-approximations to select the worst-case outcome. Moreover, it has the advantage of being able to terminate the algorithm with a guaranteed optimal first stage solution, in contrast to the commonly used decision rules [24, 7, 21]. Recently, DDP has been further extended to DR-MCO with promising out-of-sample performance [2, 22, 31, 14]. In particular, [31] uses an ambiguity set defined by a χ2\chi^{2}-distance neighborhood. In [14], the ambiguity sets are taken to be Wasserstein metric balls with finite support and centered at the empirical distributions, and the algorithm is shown to converge asymptotically with stochastic sampling methods. We comment that all of the above variants of DDP algorithms rely on the assumption of relatively complete recourse (RCR), while it is indeed possible to have MSCO that do not satisfy such an assumption [27].

The convergence analysis of DDP begins with multistage linear optimization [32, 37, 26, 10], where an almost sure finite convergence is established based on polyhedral structures. In [18], an asymptotic convergence is proved for MSCO problems. Due to the multistage structure, a main complexity question concerning DDP is the dependence of its iteration complexity on the number of decision stages, which is recently answered for MSCO, independently in [25, 42]. In particular, [18, 25] assume RCR and, moreover, the value functions are all Lipschitz continuous. A weaker regularity assumption concerning exact penalization is made instead in [42] that allows the DDP algorithm to work without RCR. However, it is not yet known whether the DDP algorithm, together with the complexity analysis, works for DR-MCO. The present paper introduces new algorithmic techniques to DDP and answers this question in the positive for the first time.

While the DDP algorithms have achieved great computational success for MSCO problems, one major issue of their generalization to the DR-MCO problems is the lack of proper termination criterion. Due to the distributional uncertainty in the model, the commonly used statistical upper bound for the policy evaluation in the MSCO literature (e.g., [37, 43]) is no longer valid for the DR-MCO problems. As a result, computational experiments in [31] and [14] terminate at a fixed number of iterations or cuts, without a good guarantee of the solution quality. To overcome the lack of statistical upper bound, we follow the idea of deterministic upper bounds [33, 3] and focus on dynamically constructing both under- and over-approximations to certify the optimality of the solutions, similar to the RDDP algorithm [17]. Our paper makes the following contributions.

  1. 1.

    We provide a unified framework for studying DR-MCO under SI assumption. We find that the traditional cutting plane methods can cause the Lipschitz constants of the under-approximations to grow with respect to the number of stages. Motivated by this phenomenon, we introduce bounds on the dual variables in the DDP recursion, which could effectively control the growth of Lipschitz constants and can dispense with the RCR assumption.

  2. 2.

    We extend the usual consecutive DDP (CDDP) to DR-MCO with consecutive exploration of the single stage subproblems, and also propose a novel nonconsecutive variant (NDDP), which adaptively explores the stages in a nonconsecutive fashion. We provide subproblem-oracle-based complexity upper bounds for both algorithms. In particular, the proposed NDDP has a linear complexity growth with respect to the number of stages TT, assuming that the target optimality gap is allowed to scale with TT.

  3. 3.

    We construct a class of multistage robust convex problems to obtain a complexity lower bound for the new algorithms for the first time, which shows the complexity upper bounds are essentially tight. The complexity bounds can be applied to more general DR-MCO problems with continuous distributions.

  4. 4.

    Numerical results on a multi-commodity inventory problem and a hydro-thermal power planning problem are given to illustrate the effectiveness of our algorithms, including the capability to solve problems without RCR and the reduction in the computation time and number of subproblem oracle evaluations.

The rest of the paper is organized as follows. Section 2 contains the formulations of the problems and the discussion on the dual bounds in the DDP recursions. In Section 3, we introduce the single stage subproblem oracles and present the CDDP and NDDP algorithms with complexity bounds. In Section 4, we present two classes of numerical examples that demonstrate the effectiveness of the algorithms. Section 5 concludes the paper.

2 Formulations and Recursive Approximation

In this section, we introduce formulations of distributionally robust multistage convex optimization (DR-MCO). In the case where the DR-MCO has finite support, we build approximations of the value functions using recursions. We then discuss the dual-bounding technique and its exactness that is used for complexity analysis in Section 3.

2.1 Problem Formulations

We present a general definition of DR-MCO under the assumption of stagewise independence (SI). The SI assumption is necessary for efficient algorithmic development and we refer any interested reader to [39] for more general settings. Then we show that most MSCO and MRCO in the existing literature (such as [18] and  [17]) can be encompassed within this DR-MCO framework even with only finitely many uncertainty outcomes in each stage.

Let 𝒯{1,2,,T}\mathcal{T}\coloneqq\{1,2,\dots,T\} denote the set of stage indices and 𝒯𝒯{1}\mathcal{T}^{\prime}\coloneqq\mathcal{T}\setminus\{1\}. For each t𝒯t\in\mathcal{T}, the decision variable in stage tt is denoted as xtx_{t} and constrained in a compact convex set 𝒳tdt\mathcal{X}_{t}\subset\mathbb{R}^{d_{t}} with dimension dt0d_{t}\in\mathbb{Z}_{\geq 0}. The uncertainty in stage tt is modeled as a random vector ξt\xi_{t} with some possibly unknown probability distribution and its convex support set denoted as Ξt\Xi_{t}. For simplicity, we use x0x_{0} and ξ1\xi_{1} to denote deterministic parameters, known as the initial conditions. The cost incurred by the decisions and the uncertainty outcomes in stage tt is modeled by a nonnegative lower semicontinuous function ft(xt1,xt;ξt)f_{t}(x_{t-1},x_{t};\xi_{t}), that is convex in (xt1,xt)(x_{t-1},x_{t}) for each ξtΞt\xi_{t}\in\Xi_{t} and allowed to take ++\infty for infeasibility. We show by the following example that the usual constraints can be modeled as part of the cost functions ftf_{t}.

Example 1.

Let 𝒴t\mathcal{Y}_{t} denote a compact convex decision set of auxiliary variables yty_{t} in stage t𝒯t\in\mathcal{T}. Given a nonnegative, convex, continuous real-valued cost function ft<(xt1,yt,xt;ξt)f_{t}^{<\infty}(x_{t-1},y_{t},x_{t};\xi_{t}) and any convex continuous functional feasibility constraints gt(xt1,yt,xt;ξt)0g_{t}(x_{t-1},y_{t},x_{t};\xi_{t})\leq 0, we can define t:={(xt1,xt,ξt)𝒳t1×𝒳t×Ξt:yt𝒴t s.t. gt(xt1,yt,xt,ξt)0}\mathcal{F}_{t}\mathrel{\mathop{\ordinarycolon}}=\{(x_{t-1},x_{t},\xi_{t})\in\mathcal{X}_{t-1}\times\mathcal{X}_{t}\times\Xi_{t}\mathrel{\mathop{\ordinarycolon}}\exists\,y_{t}\in\mathcal{Y}_{t}\text{ s.t. }g_{t}(x_{t-1},y_{t},x_{t},\xi_{t})\leq 0\} and set the extended real-valued cost function as

ft(xt1,xt;ξt)={minyt𝒴tft<(xt1,yt,xt;ξt), if (xt1,xt,ξt)t,+, otherwise.f_{t}(x_{t-1},x_{t};\xi_{t})=\begin{cases}\min_{y_{t}\in\mathcal{Y}_{t}}f_{t}^{<\infty}(x_{t-1},y_{t},x_{t};\xi_{t}),&\text{ if }(x_{t-1},x_{t},\xi_{t})\in\mathcal{F}_{t},\\ +\infty,&\text{ otherwise}.\end{cases}

It can be checked that the cost function ftf_{t} in this example is indeed nonnegative, lower semicontinuous, convex in (xt1,xt)(x_{t-1},x_{t}), and proper if t\mathcal{F}_{t}\neq\varnothing.

Let 𝒫t\mathcal{P}_{t} denote a set of Borel probability measures supported on the uncertainty set Ξt\Xi_{t}, which is known as the ambiguity set, for each stage t𝒯t\in\mathcal{T}^{\prime}. Now we can define our DR-MCO problem as follows.

minx1𝒳1f1(x0,x1;ξ1)+\displaystyle\min_{x_{1}\in\mathcal{X}_{1}}\quad f_{1}(x_{0},x_{1};\xi_{1})\ + supp2𝒫2𝔼ξ2p2minx2𝒳2[f2(x1,x2;ξ2)+\displaystyle\sup_{p_{2}\in\mathcal{P}_{2}}\mathbb{E}_{\xi_{2}\sim p_{2}}\min_{x_{2}\in\mathcal{X}_{2}}\Bigg{[}f_{2}(x_{1},x_{2};\xi_{2})+ (1)
+supp3𝒫3𝔼ξ3p3minx3𝒳3[f3(x2,x3;ξ3)+\displaystyle+\sup_{p_{3}\in\mathcal{P}_{3}}\mathbb{E}_{\xi_{3}\sim p_{3}}\min_{x_{3}\in\mathcal{X}_{3}}\bigg{[}f_{3}(x_{2},x_{3};\xi_{3})+\cdots
+suppT𝒫T𝔼ξTpTminxT𝒳TfT(xT1,xT;ξT)]].\displaystyle+\sup_{p_{T}\in\mathcal{P}_{T}}\mathbb{E}_{\xi_{T}\sim p_{T}}\min_{x_{T}\in\mathcal{X}_{T}}f_{T}(x_{T-1},x_{T};\xi_{T})\bigg{]}\Bigg{]}.

Here, each expectation 𝔼ξtpt\mathbb{E}_{\xi_{t}\sim p_{t}} is taken with respect to some given probability measure pt𝒫tp_{t}\in\mathcal{P}_{t}. Note that our problem (1) relies on the SI assumption in the sense that the ambiguity sets and the expectations are independent across stages. Consequently, we can define the (worst-case expected) cost-to-go functions recursively from t=Tt=T to t=2t=2 as:

𝒬t1(xt1)suppt𝒫t𝔼ξtptminxt𝒳tft(xt1,xt;ξt)+𝒬t(xt),\mathcal{Q}_{t-1}(x_{t-1})\coloneqq\sup_{p_{t}\in\mathcal{P}_{t}}\mathbb{E}_{\xi_{t}\sim p_{t}}\min_{x_{t}\in\mathcal{X}_{t}}f_{t}(x_{t-1},x_{t};\xi_{t})+\mathcal{Q}_{t}(x_{t}), (2)

with 𝒬T(xT)0\mathcal{Q}_{T}(x_{T})\equiv 0 for any xT𝒳Tx_{T}\in\mathcal{X}_{T}. To simplify the notation, we also define the value function as

Qt(xt1;ξt)minxt𝒳tft(xt1,xt;ξt)+𝒬t(xt),Q_{t}(x_{t-1};\xi_{t})\coloneqq\min_{x_{t}\in\mathcal{X}_{t}}f_{t}(x_{t-1},x_{t};\xi_{t})+\mathcal{Q}_{t}(x_{t}), (3)

such that 𝒬t1(xt1)=suppt𝒫t𝔼ξtptQt(xt1;ξt)\mathcal{Q}_{t-1}(x_{t-1})=\sup_{p_{t}\in\mathcal{P}_{t}}\mathbb{E}_{\xi_{t}\sim p_{t}}Q_{t}(x_{t-1};\xi_{t}) for all t𝒯t\in\mathcal{T}^{\prime}. Our algorithmic goal in this paper is to find a near-optimal first stage solution

x1argminx1𝒳1f1(x0,x1;ξ1)+𝒬1(x1),x_{1}^{*}\in\operatorname*{arg\,min}_{x_{1}\in\mathcal{X}_{1}}f_{1}(x_{0},x_{1};\xi_{1})+\mathcal{Q}_{1}(x_{1}), (4)

as a multistage solution may already take exponentially many operations in TT to be written down, e.g., 2T12^{T-1} if we have an MSCO with |Ξt|=2\mathinner{\!\left\lvert\Xi_{t}\right\rvert}=2 for t=2,,Tt=2,\dots,T. We say that x1x_{1}^{*} is an ε\varepsilon-optimal first stage solution if

f1(x0,x1;ξ1)+Q1(x1)minx1𝒳1f1(x0,x1;ξ1)+𝒬1(x1).f_{1}(x_{0},x_{1}^{*};\xi_{1})+Q_{1}(x_{1}^{*})\leq\min_{x_{1}\in\mathcal{X}_{1}}f_{1}(x_{0},x_{1};\xi_{1})+\mathcal{Q}_{1}(x_{1}). (5)

It is well-known that if the ambiguity set 𝒫t\mathcal{P}_{t} is a singleton set, then the supremum is superficial so the DR-MCO (1) reduces to an MSCO. If the ambiguity set 𝒫t\mathcal{P}_{t} contains all Dirac atomic measures (i.e., measures δξ^\delta_{\hat{\xi}} for all ξ^Ξt\hat{\xi}\in\Xi_{t} such that 𝔼ξtδξ^g(ξ)=g(ξ^)\mathbb{E}_{\xi_{t}\sim\delta_{\hat{\xi}}}g(\xi)=g(\hat{\xi}) for any Borel measurable function gg), then the supremum

suppt𝒫t𝔼ξtptQt(xt1;ξt)=supξtΞtQt(xt1;ξt)\sup_{p_{t}\in\mathcal{P}_{t}}\mathbb{E}_{\xi_{t}\sim p_{t}}Q_{t}(x_{t-1};\xi_{t})=\sup_{\xi_{t}\in\Xi_{t}}Q_{t}(x_{t-1};\xi_{t})

and thus the DR-MCO (1) reduces to an MRCO. The following proposition checks that minimizations in DR-MCO (1) (and thus also in MSCO and MRCO) are convex and well-defined.

Proposition 1.

The cost-to-go functions 𝒬t(xt)\mathcal{Q}_{t}(x_{t}) are lower semicontinuous (lsc) and convex for all t𝒯t\in\mathcal{T}.

Proof.

We prove by recursion from t=Tt=T to t=1t=1. By definition, 𝒬T(xT)0\mathcal{Q}_{T}(x_{T})\equiv 0 is lsc and convex. Now assume 𝒬t\mathcal{Q}_{t} is lsc and convex for some t𝒯t\in\mathcal{T}. Then the sum ft(xt1,xt;ξt)+𝒬t(xt)f_{t}(x_{t-1},x_{t};\xi_{t})+\mathcal{Q}_{t}(x_{t}) is also lsc and convex in (xt1,xt)(x_{t-1},x_{t}) for any ξtΞt\xi_{t}\in\Xi_{t}, due to the assumption of ftf_{t}. Since 𝒳t\mathcal{X}_{t} is compact, we have Qt(xt1;ξt):=minxt𝒳tft(xt1,xt;ξt)+𝒬t(xt)Q_{t}(x_{t-1};\xi_{t})\mathrel{\mathop{\ordinarycolon}}=\min_{x_{t}\in\mathcal{X}_{t}}f_{t}(x_{t-1},x_{t};\xi_{t})+\mathcal{Q}_{t}(x_{t}) is lsc (see e.g., Lemma 1.30 in [5]) and convex. Now fix any Borel probability measure pt𝒫tp_{t}\in\mathcal{P}_{t} and take any sequence {xi}𝒳t1\{x^{i}\}\subset\mathcal{X}_{t-1} with limixi=xt1\lim_{i\to\infty}x^{i}=x_{t-1}. Note that QtQ_{t} is nonnegative by definition, so by Fatou’s lemma (see e.g., Lemma 1.28 in [35]) we have

lim infi𝔼ξtptQt(xi;ξt)Ξtlim infiQt(xi;ξt)dpt(ξt)ΞtQt(xt1;ξt)dpt(ξt).\liminf_{i\to\infty}\mathbb{E}_{\xi_{t}\sim p_{t}}Q_{t}(x^{i};\xi_{t})\geq\int_{\Xi_{t}}\liminf_{i\to\infty}Q_{t}(x^{i};\xi_{t})\operatorname{d\!}p_{t}(\xi_{t})\geq\int_{\Xi_{t}}Q_{t}(x_{t-1};\xi_{t})\operatorname{d\!}p_{t}(\xi_{t}).

The expectation and the integrals are well-defined since QtQ_{t} is lsc, hence Borel measurable. This inequality shows that the function 𝔼ξtptQt(xt1;ξt)\mathbb{E}_{\xi_{t}\sim p_{t}}Q_{t}(x_{t-1};\xi_{t}) is lsc in xt1x_{t-1}. It is also convex in xt1x_{t-1} by the linearity and monotonicity of expectations (see e.g., Theorem 7.46 in [39]). Finally, the epigraph of 𝒬t1()\mathcal{Q}_{t-1}(\cdot) is the intersection of epigraphs of 𝔼ξtptQt(;ξt)\mathbb{E}_{\xi_{t}\sim p_{t}}Q_{t}(\cdot;\xi_{t}) for all pt𝒫tp_{t}\in\mathcal{P}_{t}, which shows that 𝒬t1()\mathcal{Q}_{t-1}(\cdot) is lsc and convex. ∎

2.1.1 Finitely Supported Uncertainty Sets

If the uncertainty set Ξt={ξ^t,1,,ξ^t,nt}\Xi_{t}=\{\hat{\xi}_{t,1},\dots,\hat{\xi}_{t,n_{t}}\} is finite, then the ambiguity set is a subset of a ntn_{t}-dimensional simplex 𝒫tΔnt:={pt0nt:k=1ntpt,k=1}\mathcal{P}_{t}\subseteq\Delta^{n_{t}}\mathrel{\mathop{\ordinarycolon}}=\{p_{t}\in\mathbb{R}^{n_{t}}_{\geq 0}\mathrel{\mathop{\ordinarycolon}}\sum_{k=1}^{n_{t}}p_{t,k}=1\}. In this case, the expectations in the definitions (1) and (2) become finite weighted summations. For MSCO, finite supports often arise from sample average approximations [37]. It turns out that for MRCO, we can also restrict our attention to a finite support set when the problem is polyhedral (cf. [17]).

Proposition 2.

Let 𝒬t\mathcal{Q}_{t} denote the cost-to-go functions of an MRCO, i.e., a DR-MCO (1) with all Dirac atomic measures included in 𝒫t\mathcal{P}_{t} for all t𝒯t\in\mathcal{T}^{\prime}. If the cost functions ftf_{t} are jointly convex in (xt,ξt)(x_{t},\xi_{t}) for each xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1} and the uncertainty sets Ξt\Xi_{t} are polytopes for all t𝒯t\in\mathcal{T}^{\prime}, then

𝒬t1(xt1)=maxptΔntk=1ntpt,kQt(xt1;ξ^t,k).\mathcal{Q}_{t-1}(x_{t-1})=\max_{p_{t}\in\Delta^{n_{t}}}\sum_{k=1}^{n_{t}}p_{t,k}\cdot Q_{t}(x_{t-1};\hat{\xi}_{t,k}).

where {ξ^t,1,,ξ^t,nt}\{\hat{\xi}_{t,1},\dots,\hat{\xi}_{t,n_{t}}\} is the finite set of extreme points of Ξt\Xi_{t}.

Proof.

By Proposition 1, each worst-case cost-to-go function 𝒬t\mathcal{Q}_{t} is convex for any t𝒯t\in\mathcal{T}. From the assumption, the value function Qt(xt1;ξt)Q_{t}(x_{t-1};\xi_{t}) is thus also convex in ξt\xi_{t} for each xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1}. Consequently, we have

𝒬t1(xt1)=supξtΞtQt(xt1;ξt)=maxptΔntk=1ntpt,kQt(xt1;ξ^t,k),\mathcal{Q}_{t-1}(x_{t-1})=\sup_{\xi_{t}\in\Xi_{t}}Q_{t}(x_{t-1};\xi_{t})=\max_{p_{t}\in\Delta^{n_{t}}}\sum_{k=1}^{n_{t}}p_{t,k}Q_{t}(x_{t-1};\hat{\xi}_{t,k}),

due to the linearity of the maximization. We point out that the above equation holds even when supξtΞtQt(xt1;ξt)=+\sup_{\xi_{t}\in\Xi_{t}}Q_{t}(x_{t-1};\xi_{t})=+\infty because in this case we must have Qt(xt1;ξ^t,k)=+Q_{t}(x_{t-1};\hat{\xi}_{t,k})=+\infty for some k=1,,ntk=1,\dots,n_{t} by its convexity. ∎

In view of Proposition 2, we decide to focus on finitely supported uncertainty sets when we discuss numerical implementations in this paper (such as in Sections 3.1.1 and 4), although the algorithms and the complexity questions are intended for a more general setting of DR-MCO problems.

2.2 Approximation of Recursions

We now discuss the approximation of functions 𝒬t1\mathcal{Q}_{t-1} and QtQ_{t} in the recursions (2) and (3). The following lemma relates the Lipschitz continuity of the value functions Qt(;ξt)Q_{t}(\cdot;\xi_{t}) for ξtΞt\xi_{t}\in\Xi_{t} and that of the cost-to-go function 𝒬t\mathcal{Q}_{t}.

Lemma 1.

If Qt(;ξt)Q_{t}(\cdot;\xi_{t}) is LtL_{t}-Lipschitz continuous on 𝒳t\mathcal{X}_{t} for any ξtΞt\xi_{t}\in\Xi_{t} and for some Lt>0L_{t}>0, then so is 𝒬t()\mathcal{Q}_{t}(\cdot).

Proof.

Take any two points x1,x2𝒳tx^{1},x^{2}\in\mathcal{X}_{t}. For any ε>0\varepsilon>0, take pt1𝒫tp_{t}^{1}\in\mathcal{P}_{t} such that

𝒬t1(x1)ε𝔼ξtpt1Qt(x1;ξt)𝒬t1(x1).\mathcal{Q}_{t-1}(x^{1})-\varepsilon\leq\mathbb{E}_{\xi_{t}\sim p_{t}^{1}}Q_{t}(x^{1};\xi_{t})\leq\mathcal{Q}_{t-1}(x^{1}).

Then we have

𝒬t1(x1)𝒬t1(x2)\displaystyle\mathcal{Q}_{t-1}(x^{1})-\mathcal{Q}_{t-1}(x^{2}) ε+𝔼ξtpt1Qt(x1;ξt)suppt𝒫t𝔼ξtptQt(x2;ξt)\displaystyle\leq\varepsilon+\mathbb{E}_{\xi_{t}\sim p_{t}^{1}}Q_{t}(x^{1};\xi_{t})-\sup_{p_{t}\in\mathcal{P}_{t}}\mathbb{E}_{\xi_{t}\sim p_{t}}Q_{t}(x^{2};\xi_{t})
ε+𝔼ξtpt1[Qt(x1;ξt)Qt(x2;ξt)]\displaystyle\leq\varepsilon+\mathbb{E}_{\xi_{t}\sim p_{t}^{1}}\Bigl{[}Q_{t}(x^{1};\xi_{t})-Q_{t}(x^{2};\xi_{t})\Bigr{]}
ε+𝔼ξtpt1Ltx1x2=ε+Ltx1x2.\displaystyle\leq\varepsilon+\mathbb{E}_{\xi_{t}\sim p_{t}^{1}}L_{t}\lVert x^{1}-x^{2}\rVert=\varepsilon+L_{t}\lVert x^{1}-x^{2}\rVert.

Since ε>0\varepsilon>0 is arbitrary, we have 𝒬t1(x1)𝒬t1(x2)Ltx1x2\mathcal{Q}_{t-1}(x^{1})-\mathcal{Q}_{t-1}(x^{2})\leq L_{t}\lVert x^{1}-x^{2}\rVert. The proof is then completed by exchanging x1x^{1}, x2x^{2} and repeating the argument. ∎

Combining Lemma 1 and Proposition 1, we know that if the value functions are convex and Lipschitz continuous, then so are the cost-to-go functions. In such a case, we can use cutting plane methods to build an under-approximation of the cost-to-go functions. To be precise, for every ξtΞt\xi_{t}\in\Xi_{t}, let Vt(;ξt)V_{t}(\cdot;\xi_{t}) denote an affine function such that Qt(xt1;ξt)Vt(xt1;ξt)Q_{t}(x_{t-1};\xi_{t})\geq V_{t}(x_{t-1};\xi_{t}) for all xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1}. Such an affine function is referred to as a linear valid inequality or a linear cut for the value function, which is generated in the following way. Let 𝒬¯t\underline{\mathcal{Q}}_{t} denote an under-approximation of the cost-to-go function 𝒬t\mathcal{Q}_{t} and x^t1𝒳t1\hat{x}_{t-1}\in\mathcal{X}_{t-1} a feasible state. For any fixed ξtΞt\xi_{t}\in\Xi_{t}, we introduce a redundant variable ztdt1z_{t}\in\mathbb{R}^{d_{t-1}} with the duplicating constraint zt=x^t1z_{t}=\hat{x}_{t-1}. Then the Lagrangian dual problem

supλtdt1infxt𝒳t,ztdt1ft(zt,xt;ξt)+𝒬¯t(xt)+λt,x^t1zt\sup_{\lambda_{t}\in\mathbb{R}^{d_{t-1}}}\inf_{\begin{subarray}{c}x_{t}\in\mathcal{X}_{t},\\ z_{t}\in\mathbb{R}^{d_{t-1}}\end{subarray}}\quad f_{t}(z_{t},x_{t};\xi_{t})+\underline{\mathcal{Q}}_{t}(x_{t})+\langle\lambda_{t},\hat{x}_{t-1}-z_{t}\rangle (6)

gives an affine function Vt(xt1;ξt)v¯t(ξt)+λ^t(ξt),xt1x^t1V_{t}(x_{t-1};\xi_{t})\coloneqq\underline{\hbox{\set@color$v$}}_{t}(\xi_{t})+\langle\hat{\lambda}_{t}(\xi_{t}),x_{t-1}-\hat{x}_{t-1}\rangle, where we assume that an optimal dual solution λ^t(ξt)\hat{\lambda}_{t}(\xi_{t}) of (6) and the associated infimum value v¯t(ξt)>\underline{\hbox{\set@color$v$}}_{t}(\xi_{t})>-\infty are found. Then, by definition (3) and weak duality, we have for every xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1} that

Qt(xt1;ξt)supλtdt1infxt𝒳t,ztdt1ft(zt,xt;ξt)+𝒬t(xt)+λt,xt1zt\displaystyle Q_{t}(x_{t-1};\xi_{t})\geq\sup_{\lambda_{t}\in\mathbb{R}^{d_{t-1}}}\inf_{\begin{subarray}{c}x_{t}\in\mathcal{X}_{t},\\ z_{t}\in\mathbb{R}^{d_{t-1}}\end{subarray}}f_{t}(z_{t},x_{t};\xi_{t})+\mathcal{Q}_{t}(x_{t})+\langle\lambda_{t},x_{t-1}-z_{t}\rangle (7)
infxt𝒳t,ztdt1ft(zt,xt;ξt)+𝒬¯t(xt)+λ^t(ξt),x^t1zt+λ^t(ξt),xt1x^t1\displaystyle\geq\inf_{\begin{subarray}{c}x_{t}\in\mathcal{X}_{t},\\ z_{t}\in\mathbb{R}^{d_{t-1}}\end{subarray}}f_{t}(z_{t},x_{t};\xi_{t})+\underline{\mathcal{Q}}_{t}(x_{t})+\langle\hat{\lambda}_{t}(\xi_{t}),\hat{x}_{t-1}-z_{t}\rangle+\langle\hat{\lambda}_{t}(\xi_{t}),x_{t-1}-\hat{x}_{t-1}\rangle
=v¯t(ξt)+λ^t(ξt),xt1x^t1.\displaystyle=\ \underline{\hbox{\set@color$v$}}_{t}(\xi_{t})+\langle\hat{\lambda}_{t}(\xi_{t}),x_{t-1}-\hat{x}_{t-1}\rangle.

Therefore, Vt(;ξt)V_{t}(\cdot;\xi_{t}) is a valid linear cut for the value function Qt(;ξt)Q_{t}(\cdot;\xi_{t}). Finally, we can aggregate the valid linear cuts to get an affine function

𝒱t1(xt1)𝔼ξtp^tVt(xt1;ξt),\mathcal{V}_{t-1}(x_{t-1})\coloneqq\mathbb{E}_{\xi_{t}\sim\hat{p}_{t}}V_{t}(x_{t-1};\xi_{t}), (8)

where p^t𝒫t\hat{p}_{t}\in\mathcal{P}_{t} can be arbitrarily chosen. The affine function 𝒱t1()\mathcal{V}_{t-1}(\cdot) is a valid linear cut for the cost-to-go function 𝒬t1()\mathcal{Q}_{t-1}(\cdot) because for any xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1},

𝒬t1(xt1)𝒱t1(xt1)\displaystyle\mathcal{Q}_{t-1}(x_{t-1})-\mathcal{V}_{t-1}(x_{t-1}) 𝔼ξtp^t[Qt(xt1;ξt)Vt(xt1;ξt)]0.\displaystyle\geq\mathbb{E}_{\xi_{t}\sim\hat{p}_{t}}\Bigl{[}Q_{t}(x_{t-1};\xi_{t})-V_{t}(x_{t-1};\xi_{t})\Bigr{]}\geq 0. (9)

The above procedure of generating valid linear cuts requires finding the supremum in the Lagrangian dual problem (6). If the problem (1) does not have relatively complete recourse (RCR), the supremum in (6) may not be attained. Moreover, even with RCR, as we use some under-approximation 𝒬¯t\underline{\mathcal{Q}}_{t} in the dual problem (6), the maximum may only be attained outside a ball with its radius growing with TT, as illustrated by the following example.

Example 2.

Consider a deterministic problem (i.e., |Ξt|=1\mathinner{\!\left\lvert\Xi_{t}\right\rvert}=1 for all t𝒯t\in\mathcal{T}^{\prime}) defined by the recursions for t=T,,2t=T,\dots,2

𝒬t1(xt1)minyt,xt\displaystyle\mathcal{Q}_{t-1}(x_{t-1})\coloneqq\min_{y_{t},x_{t}}\quad 1+yt+𝒬t(xt)\displaystyle 1+y_{t}+\mathcal{Q}_{t}(x_{t})
s.t.\displaystyle\mathrm{s.t.}\quad yt0,yt12xt1,xtxt1+12, 0xt1.\displaystyle y_{t}\geq 0,\ y_{t}\geq 1-2x_{t-1},\ x_{t}\leq x_{t-1}+\frac{1}{2},\ 0\leq x_{t}\leq 1.

Note that for each stage t𝒯t\in\mathcal{T}^{\prime}, since xt1[0,1]x_{t-1}\in[0,1], we have a feasible solution xt=1/2x_{t}=1/2 and yt=0y_{t}=0, which implies that 𝒬t(xt)=Tt+max{0,12xt}\mathcal{Q}_{t}(x_{t})=T-t+\max\{0,1-2x_{t}\} for tT1t\leq T-1. However, if we start our approximation with points xt0=0x_{t}^{0}=0 for all stages t𝒯t\in\mathcal{T}, then the linear cut 𝒱t1(xt1)=Vt(xt1)v¯t+λ^t,xtxt0\mathcal{V}_{t-1}(x_{t-1})=V_{t}(x_{t-1})\coloneqq\underline{\hbox{\set@color$v$}}_{t}+\langle\hat{\lambda}_{t},x_{t}-x_{t}^{0}\rangle can be generated from the following dual problem at stage t𝒯t\in\mathcal{T}^{\prime}:

maxλtinfzt,yt,xt\displaystyle\max_{\lambda_{t}\in\mathbb{R}}\inf_{z_{t},y_{t},x_{t}}\quad 1+yt+λt(0zt)+𝒬¯t0(xt)\displaystyle 1+y_{t}+\lambda_{t}(0-z_{t})+\underline{\mathcal{Q}}_{t}^{0}(x_{t})
s.t.\displaystyle\mathrm{s.t.}\quad yt0,yt12zt,xtzt+12, 0xt1.\displaystyle y_{t}\geq 0,\ y_{t}\geq 1-2z_{t},\ x_{t}\leq z_{t}+\frac{1}{2},\ 0\leq x_{t}\leq 1.

The optimal dual solution is λ^T=2\hat{\lambda}_{T}=-2 in stage TT. Thus the under-approximation of 𝒬T1(xT1)\mathcal{Q}_{T-1}(x_{T-1}) after adding the linear cut is 𝒬¯T10(xT1)=𝒱T(xT1)=22xT1\underline{\mathcal{Q}}_{T-1}^{0}(x_{T-1})=\mathcal{V}_{T}(x_{T-1})=2-2x_{T-1}. The dual problem in stage t=T1t=T-1 now becomes

maxλtinfzt,yt,xt\displaystyle\max_{\lambda_{t}\in\mathbb{R}}\inf_{z_{t},y_{t},x_{t}}\quad 1+ytλtzt+(Tt+1)ctxt\displaystyle 1+y_{t}-\lambda_{t}z_{t}+(T-t+1)-c_{t}x_{t}
s.t.\displaystyle\mathrm{s.t.}\quad yt0,yt12zt,xtzt+12, 0xt1.\displaystyle y_{t}\geq 0,\ y_{t}\geq 1-2z_{t},\ x_{t}\leq z_{t}+\frac{1}{2},\ 0\leq x_{t}\leq 1.

with cT1=2c_{T-1}=2. Note that if λt>ct2\lambda_{t}>-c_{t}-2, the inner minimum value would be strictly smaller than Tt+3ct/2T-t+3-c_{t}/2 (e.g., taking zt=1/2z_{t}=1/2, yt=0y_{t}=0, and xt=1x_{t}=1). Thus any optimal dual solution must satisfy λtct2\lambda_{t}\leq-c_{t}-2 as zt=0z_{t}=0, yt=1y_{t}=1, and xt=1/2x_{t}=1/2 gives an inner minimum value Tt+3ct/2T-t+3-c_{t}/2. By repeating the argument recursively, it can be checked that the under-approximation 𝒬¯t0(xt)=𝒱t(xt)=Tt+1ctxt\underline{\mathcal{Q}}_{t}^{0}(x_{t})=\mathcal{V}_{t}(x_{t})=T-t+1-c_{t}x_{t} with ct=2(Tt)c_{t}=2(T-t). This implies that any optimal dual solution in stage tt would lie outside the open ball of radius 2(Tt+1)2(T-t+1), which is greater than 2, the Lipschitz constant of 𝒬t\mathcal{Q}_{t} on 𝒳t\mathcal{X}_{t}. We illustrate the example with Figure 1 using T=3T=3.

Refer to caption
(a) t=2t=2
Refer to caption
(b) t=1t=1
Figure 1: Cost-to-go Functions and First Iteration Linear Cuts for T=3T=3

2.3 Exactness of Dual Bounds

Example 2 suggests that the norm of optimal dual solutions for linear cut generation cannot be bounded by the Lipschitz constants of the value functions. Intuitively speaking, larger norms of dual variables lead to steeper linear cuts (6), which makes the under-approximation less useful at points away from the previously visited ones. For this reason, we propose to enforce bounds on the dual variables for recursive approximations and investigate the exactness of these artificial dual bounds. To begin with, let us fix algorithmic parameters Mt>0M_{t}>0 for stages t𝒯t\in\mathcal{T}^{\prime} and define alternative value function recursively

QtR(xt1;ξt)maxλtMtinfxt𝒳t,ztdt1ft(xt1,xt;ξt)+𝒬tR(xt)+λt,xt1zt,Q_{t}^{\mathrm{R}}(x_{t-1};\xi_{t})\coloneqq\max_{\lVert\lambda_{t}\rVert_{*}\leq M_{t}}\inf_{\begin{subarray}{c}x_{t}\in\mathcal{X}_{t},\\ z_{t}\in\mathbb{R}^{d_{t-1}}\end{subarray}}f_{t}(x_{t-1},x_{t};\xi_{t})+\mathcal{Q}_{t}^{\mathrm{R}}(x_{t})+\langle\lambda_{t},x_{t-1}-z_{t}\rangle, (10)

with 𝒬TR(xT)0\mathcal{Q}_{T}^{\mathrm{R}}(x_{T})\equiv 0 on 𝒳T\mathcal{X}_{T} and

𝒬t1R(xt1)suppt𝒫t𝔼ξtptQtR(xt1;ξt)\mathcal{Q}_{t-1}^{\mathrm{R}}(x_{t-1})\coloneqq\sup_{p_{t}\in\mathcal{P}_{t}}\mathbb{E}_{\xi_{t}\sim p_{t}}Q_{t}^{\mathrm{R}}(x_{t-1};\xi_{t}) (11)

for t=T,T1,,2t=T,T-1,\dots,2. We call them (Lipschitz) regularized value functions and (Lipschitz) regularized cost-to-go functions, due to the following observation.

Proposition 3.

For any t𝒯t\in\mathcal{T}^{\prime} and ξtΞt\xi_{t}\in\Xi_{t}, the function QtR(;ξt)Q_{t}^{\mathrm{R}}(\cdot;\xi_{t}) is MtM_{t}-Lipschitz continuous, and QtR(xt1;ξt)Qt(xt1;ξt)Q_{t}^{\mathrm{R}}(x_{t-1};\xi_{t})\leq Q_{t}(x_{t-1};\xi_{t}) for all xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1}. Moreover, we can write QtR(xt1;ξt)Q_{t}^{\mathrm{R}}(x_{t-1};\xi_{t}) as a value function

QtR(xt1;ξt)=minxt𝒳t,ztdt1ft(zt,xt;ξt)+𝒬tR(xt)+Mtxt1zt.Q_{t}^{\mathrm{R}}(x_{t-1};\xi_{t})=\min_{\begin{subarray}{c}x_{t}\in\mathcal{X}_{t},\\ z_{t}\in\mathbb{R}^{d_{t-1}}\end{subarray}}f_{t}(z_{t},x_{t};\xi_{t})+\mathcal{Q}_{t}^{\mathrm{R}}(x_{t})+M_{t}\lVert x_{t-1}-z_{t}\rVert.
Proof.

We begin our proof for the equation with the following observation

maxλtMtinfxt𝒳t,ztdt1ft(xt1,xt;ξt)+𝒬tR(xt)+λt,xt1zt\displaystyle\max_{\lVert\lambda_{t}\rVert_{*}\leq M_{t}}\inf_{\begin{subarray}{c}x_{t}\in\mathcal{X}_{t},\\ z_{t}\in\mathbb{R}^{d_{t-1}}\end{subarray}}f_{t}(x_{t-1},x_{t};\xi_{t})+\mathcal{Q}_{t}^{\mathrm{R}}(x_{t})+\langle\lambda_{t},x_{t-1}-z_{t}\rangle
=infxt𝒳t,ztdt1maxλtMtft(xt1,xt;ξt)+𝒬tR(xt)+λt,xt1zt\displaystyle=\inf_{\begin{subarray}{c}x_{t}\in\mathcal{X}_{t},\\ z_{t}\in\mathbb{R}^{d_{t-1}}\end{subarray}}\max_{\lVert\lambda_{t}\rVert_{*}\leq M_{t}}f_{t}(x_{t-1},x_{t};\xi_{t})+\mathcal{Q}_{t}^{\mathrm{R}}(x_{t})+\langle\lambda_{t},x_{t-1}-z_{t}\rangle
=minxt𝒳t,ztdt1ft(xt1,xt;ξt)+𝒬tR(xt)+Mtxt1zt.\displaystyle=\min_{\begin{subarray}{c}x_{t}\in\mathcal{X}_{t},\\ z_{t}\in\mathbb{R}^{d_{t-1}}\end{subarray}}f_{t}(x_{t-1},x_{t};\xi_{t})+\mathcal{Q}_{t}^{\mathrm{R}}(x_{t})+M_{t}\lVert x_{t-1}-z_{t}\rVert.

Here, the first equality is due to Sion’s minimax theorem [23], where the maximization is taken over a compact ball of radius MtM_{t}. The second equality follows from the dual representation of the norm functions [9, A.1.6]. The infimum is attained since the sum ft(xt1,xt;ξt)+𝒬tR(xt)+Mtxtztf_{t}(x_{t-1},x_{t};\xi_{t})+\mathcal{Q}_{t}^{\mathrm{R}}(x_{t})+M_{t}\lVert x_{t}-z_{t}\rVert is bounded from below by our nonnegativity assumption of ftf_{t}. Now it can be checked recursively for t=T,,2t=T,\dots,2 using the definition of 𝒬tR\mathcal{Q}_{t}^{\mathrm{R}} that QtR(xt1;ξt)Qt(xt1;ξt)Q_{t}^{\mathrm{R}}(x_{t-1};\xi_{t})\leq Q_{t}(x_{t-1};\xi_{t}) for any xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1} because zt=xt1z_{t}=x_{t-1} is a feasible solution to the minimization.

It remains to show the MtM_{t}-Lipschitz continuity of QtR(;ξt)Q_{t}^{\mathrm{R}}(\cdot;\xi_{t}). Pick any x1,x2𝒳t1x^{1},x^{2}\in\mathcal{X}_{t-1} and let xti,ztix_{t}^{i},z_{t}^{i} denote the solutions in the minimization associated with xix^{i} for i=1i=1 and 22, respectively. Then, as xt2,zt2x_{t}^{2},z_{t}^{2} are also feasible solutions for the minimization associated with x1x^{1}, we have

QtR(x1;ξt)QtR(x2;ξt)\displaystyle Q_{t}^{\mathrm{R}}(x^{1};\xi_{t})-Q_{t}^{\mathrm{R}}(x^{2};\xi_{t})
ft(zt2,xt2;ξt)+𝒬tR(xt2)+Mtx1zt2\displaystyle\leq f_{t}(z_{t}^{2},x_{t}^{2};\xi_{t})+\mathcal{Q}_{t}^{\mathrm{R}}(x_{t}^{2})+M_{t}\lVert x^{1}-z_{t}^{2}\rVert
ft(zt2,xt2;ξt)𝒬tR(xt2)Mtx2zt2\displaystyle\quad-f_{t}(z_{t}^{2},x_{t}^{2};\xi_{t})-\mathcal{Q}_{t}^{\mathrm{R}}(x_{t}^{2})-M_{t}\lVert x^{2}-z_{t}^{2}\rVert
=Mt(x1zt2x2zt2)Mtx1x2.\displaystyle=M_{t}\bigl{(}\lVert x^{1}-z_{t}^{2}\rVert-\lVert x^{2}-z_{t}^{2}\rVert\bigr{)}\leq M_{t}\lVert x^{1}-x^{2}\rVert.

Now we can repeat the argument with exchanged indices 1 and 2 and show that QtR(x2;ξt)QtR(x1;ξt)Mtx1x2Q_{t}^{\mathrm{R}}(x^{2};\xi_{t})-Q_{t}^{\mathrm{R}}(x^{1};\xi_{t})\leq M_{t}\lVert x^{1}-x^{2}\rVert, which completes the proof. ∎

We can now generate a linear cut Vt(;ξt)V_{t}(\cdot;\xi_{t}) using a dual solution λ^t\hat{\lambda}_{t} and its associated infimum value v¯t>\underline{\hbox{\set@color$v$}}_{t}>-\infty of the following dual problem (cf. (6))

maxλtMtinfxt𝒳t,ztdt1ft(zt,xt;ξt)+𝒬¯t(xt)+λt,x^t1zt.\max_{\lVert\lambda_{t}\rVert_{*}\leq M_{t}}\inf_{\begin{subarray}{c}x_{t}\in\mathcal{X}_{t},\\ z_{t}\in\mathbb{R}^{d_{t-1}}\end{subarray}}\quad f_{t}(z_{t},x_{t};\xi_{t})+\underline{\mathcal{Q}}_{t}(x_{t})+\langle\lambda_{t},\hat{x}_{t-1}-z_{t}\rangle. (12)

Whenever 𝒬¯t\underline{\mathcal{Q}}_{t} is an under-approximation of the regularized cost-to-go function 𝒬tR\mathcal{Q}_{t}^{\mathrm{R}}, we can use the same argument as in (7) to see the validness of Vt(;ξt)V_{t}(\cdot;\xi_{t}) as a linear cut for QtR(;ξt)Q_{t}^{\mathrm{R}}(\cdot;\xi_{t}), for all ξtΞt\xi_{t}\in\Xi_{t}, and thus also the validness of 𝒱t1()\mathcal{V}_{t-1}(\cdot) as an aggregate linear cut for 𝒬t1R()\mathcal{Q}_{t-1}^{\mathrm{R}}(\cdot). The Lipschitz constants of these linear cuts Vt(;ξt)V_{t}(\cdot;\xi_{t}) and 𝒱t1()\mathcal{V}_{t-1}(\cdot) are bounded by MtM_{t}.

By solving the bounded dual problems (12) for linear cut generation, by Proposition 3, we are essentially replacing the value functions QtQ_{t} and the cost-to-go functions 𝒬t\mathcal{Q}_{t} with the regularized ones, QtRQ_{t}^{\mathrm{R}} and 𝒬tR\mathcal{Q}_{t}^{\mathrm{R}}, respectively. We want to discuss that such replacement does not compromise any feasibility or optimality, which requires a technical lemma.

Lemma 2.

Let 𝒳d\mathcal{X}\subseteq\mathbb{R}^{d} be a full-dimensional convex set and f:df\mathrel{\mathop{\ordinarycolon}}\mathbb{R}^{d}\to\mathbb{R} be a convex function. If the restriction f|𝒳f|_{\mathcal{X}} is MM-Lipschitz continuous, then we have f(x)=infzd{f(z)+Mxz}f(x)=\inf_{z\in\mathbb{R}^{d}}\{f(z)+M\lVert x-z\rVert\} for any x𝒳x\in\mathcal{X}.

Proof.

Assume for contradiction that for some x𝒳x\in\mathcal{X}, there exists zdz\in\mathbb{R}^{d} and ε>0\varepsilon>0 such that f(x)>f(z)+(1+ε)Mzxf(x)>f(z)+(1+\varepsilon)M\lVert z-x\rVert. If xx lies in the interior int𝒳\mathrm{int}\mathcal{X}, then we can find z=x+δ(xz)𝒳z^{\prime}=x+\delta(x-z)\in\mathcal{X} for some δ>0\delta>0. Thus by convexity, (f(z)f(x))/zx(f(x)f(z))/xz>(1+ε)M(f(z^{\prime})-f(x))/\lVert z^{\prime}-x\rVert\geq(f(x)-f(z))/\lVert x-z\rVert>(1+\varepsilon)M, which contradicts the MM-Lipschitz continuity of f|𝒳f|_{\mathcal{X}}.

Otherwise if xint𝒳x\notin\mathrm{int}\mathcal{X}, since 𝒳\mathcal{X} is full-dimensional, we can find xint𝒳x^{\prime}\in\mathrm{int}\mathcal{X} with xxzxε/2\lVert x^{\prime}-x\rVert\leq\lVert z-x\rVert\cdot\varepsilon/2, so xz(1+ε/2)zx\lVert x^{\prime}-z\rVert\leq(1+\varepsilon/2)\lVert z-x\rVert. Besides, by the MM-Lipschitz continuity of f|𝒳f|_{\mathcal{X}}, we have f(x)f(x)Mxx>f(z)+(1+ε/2)Mzxf(x^{\prime})\geq f(x)-M\lVert x-x^{\prime}\rVert>f(z)+(1+\varepsilon/2)M\lVert z-x\rVert. Therefore,

f(x)f(z)xz>(1+ε/2)Mzx(1+ε/2)zx=M,\frac{f(x^{\prime})-f(z)}{\lVert x^{\prime}-z\rVert}>\frac{(1+\varepsilon/2)M\lVert z-x\rVert}{(1+\varepsilon/2)\lVert z-x\rVert}=M,

which shows contradiction by the argument above since xint𝒳x^{\prime}\in\mathrm{int}\mathcal{X}. ∎

Proposition 4.

Suppose that the state spaces 𝒳t\mathcal{X}_{t} are full-dimensional. If the value function Qt(;ξt)Q_{t}(\cdot;\xi_{t}) is MtM_{t}-Lipschitz continuous for any ξtΞt\xi_{t}\in\Xi_{t} and t𝒯t\in\mathcal{T}, then we have QtR(xt1;ξt)=Qt(xt1;ξt)Q_{t}^{\mathrm{R}}(x_{t-1};\xi_{t})=Q_{t}(x_{t-1};\xi_{t}) for all xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1} and t𝒯t\in\mathcal{T}.

Proof.

We prove the assertion by applying Lemma 2 recursively for t=T,T1,,1t=T,T-1,\dots,1. Suppose that 𝒬t=𝒬tR\mathcal{Q}_{t}=\mathcal{Q}_{t}^{\mathrm{R}} on 𝒳t\mathcal{X}_{t} for some t𝒯t\in\mathcal{T}, which holds trivially for t=Tt=T. We see that Qt(;ξt)=QtR(;ξt)Q_{t}(\cdot;\xi_{t})=Q_{t}^{\mathrm{R}}(\cdot;\xi_{t}) everywhere on 𝒳t1\mathcal{X}_{t-1} for any ξtΞt\xi_{t}\in\Xi_{t} by Proposition 3. Thus by definition (11), 𝒬t1=𝒬t1R\mathcal{Q}_{t-1}=\mathcal{Q}_{t-1}^{\mathrm{R}} on 𝒳t1\mathcal{X}_{t-1}. ∎

Proposition 4 ensures the exactness of the bounded dual recursion (10) and (11), provided that the value functions Qt(;ξt)Q_{t}(\cdot;\xi_{t}) are MtM_{t}-Lipschitz continuous for some known values Mt>0M_{t}>0. We remark that the idea of bounding the dual variables, or equivalently adding a regularization term in the dual problems, could work more generally for multistage stochastic problems without RCR, and refer any interested readers to the paper [42].

3 Algorithms and Complexity Analysis

In this section, we first define single stage subproblem oracles (SSSO) for DR-MCO (1), based on which we define the notion of complexity of the algorithms. A simple implementation of the SSSO is then discussed for the finitely supported problems. We generalize the consecutive DDP (CDDP) algorithm to DR-MCO and more importantly, we introduce a new nonconsecutive DDP algorithm (NDDP). The complexity upper bound for each algorithm is then presented, and finally we provide a complexity lower bound for both algorithms, which shows the upper complexity bounds are nearly tight.

3.1 Single Stage Subproblem Oracles

A subproblem oracle is an oracle that gives a solution to the subproblem given its own information as well as the data generated by the algorithm. The single stage subproblem oracles (SSSO) used in this paper solve an approximation of the problem defined by (10) and (11) for some stage t𝒯t\in\mathcal{T}.

Definition 1 (Initial stage subproblem oracle).

Let 𝒬¯1,𝒬¯1:𝒳1{+}\underline{\mathcal{Q}}_{1},\overline{\mathcal{Q}}_{1}\mathrel{\mathop{\ordinarycolon}}\mathcal{X}_{1}\to\mathbb{R}\cup\{+\infty\} denote two lsc, convex functions, with 𝒬¯1(x1)𝒬1R(x1)𝒬¯1(x1)\underline{\mathcal{Q}}_{1}(x_{1})\leq\mathcal{Q}_{1}^{\mathrm{R}}(x_{1})\leq\overline{\mathcal{Q}}_{1}(x_{1}) for any x1𝒳1x_{1}\in\mathcal{X}_{1}. Consider the following subproblem for the first stage t=1t=1,

minx1𝒳1f1(x0,x1;ξ1)+𝒬¯1(x1).\min_{x_{1}\in\mathcal{X}_{1}}f_{1}(x_{0},x_{1};\xi_{1})+\underline{\mathcal{Q}}_{1}(x_{1}). (I)

The initial stage subproblem oracle provides an optimal solution x1x_{1} to (I) and calculates the approximation gap γ1𝒬¯1(x1)𝒬¯1(x1)\gamma_{1}\coloneqq\overline{\mathcal{Q}}_{1}(x_{1})-\underline{\mathcal{Q}}_{1}(x_{1}). We thus define the subproblem oracle formally as a map 𝒪1:(𝒬¯1,𝒬¯1)(x1;γ1)\mathscr{O}_{1}\mathrel{\mathop{\ordinarycolon}}(\underline{\mathcal{Q}}_{1},\overline{\mathcal{Q}}_{1})\mapsto(x_{1};\gamma_{1}).

Definition 2 (Noninitial stage subproblem oracle).

For any t𝒯t\in\mathcal{T}^{\prime}, let 𝒬¯t,𝒬¯t:𝒳t{+}\underline{\mathcal{Q}}_{t},\overline{\mathcal{Q}}_{t}\mathrel{\mathop{\ordinarycolon}}\mathcal{X}_{t}\to\mathbb{R}\cup\{+\infty\} denote two lsc, convex functions, with 𝒬¯t(xt)𝒬tR(xt)𝒬¯t(xt)\underline{\mathcal{Q}}_{t}(x_{t})\leq\mathcal{Q}_{t}^{\mathrm{R}}(x_{t})\leq\overline{\mathcal{Q}}_{t}(x_{t}) for all xt𝒳tx_{t}\in\mathcal{X}_{t}. Then given a feasible state xt1𝒳t1x_{t-1}\in\mathcal{X}_{t-1}, the stage-tt subproblem oracle provides a feasible state xt𝒳tx_{t}\in\mathcal{X}_{t}, an MtM_{t}-Lipschitz continuous linear cut 𝒱t1()\mathcal{V}_{t-1}(\,\cdot\,), and an over-estimate value vt1v_{t-1} such that

  • they are valid, i.e., 𝒱t1()𝒬t1R()\mathcal{V}_{t-1}(\cdot)\leq\mathcal{Q}_{t-1}^{\mathrm{R}}(\cdot) on 𝒳t1\mathcal{X}_{t-1} and vt1𝒬t1R(xt1)v_{t-1}\geq\mathcal{Q}_{t-1}^{\mathrm{R}}(x_{t-1});

  • the gap is controlled, i.e., vt1𝒱t1(xt1)γt𝒬¯t(xt)𝒬¯t(xt)v_{t-1}-\mathcal{V}_{t-1}(x_{t-1})\leq\gamma_{t}\coloneqq\overline{\mathcal{Q}}_{t}(x_{t})-\underline{\mathcal{Q}}_{t}(x_{t}).

We thus define the subproblem oracle formally as a map 𝒪t:(xt1,𝒬¯t,𝒬¯t)(𝒱t1,vt1,xt;γt)\mathscr{O}_{t}\mathrel{\mathop{\ordinarycolon}}(x_{t-1},\underline{\mathcal{Q}}_{t},\overline{\mathcal{Q}}_{t})\mapsto(\mathcal{V}_{t-1},v_{t-1},x_{t};\gamma_{t}).

The noninitial stage subproblem oracles are different from the initial stage subproblem oracle, in the sense that it does not necessarily provide any optimal solution to some optimization problem. Instead, it provides some feasible state, which could be used for exploration of the following stages, a linear cut and an estimate value for updating the approximation in the previous stages. As implementations for Definition 2 may be less intuitive than those for Definition 1, we provide some illustration using finitely supported DR-MCO below.

3.1.1 SSSO Implementation for Finitely Supported DR-MCO

We now propose a possible realization of the noninitial stage subproblem oracle based on the linear cut generation procedure discussed in Section 2.2. Suppose Ξt={ξ^t,1,,ξ^t,nt}\Xi_{t}=\{\hat{\xi}_{t,1},\dots,\hat{\xi}_{t,n_{t}}\} for each t𝒯t\in\mathcal{T}^{\prime}. By solving the bounded Lagrangian dual problem (12) for each ξ^t,k\hat{\xi}_{t,k}, we can get primal solutions x^t,k\hat{x}_{t,k}, z^t,k\hat{z}_{t,k}, and linear cuts Vt(xt1;ξ^t,k)V_{t}(x_{t-1};\hat{\xi}_{t,k}) for k=1,,ntk=1,\dots,n_{t} that are all MtM_{t}-Lipschitz continuous. We then aggregate them using 𝒱t1()k=1ntp^t,kVt(;ξ^t,k)\mathcal{V}_{t-1}(\cdot)\coloneqq\sum_{k=1}^{n_{t}}\hat{p}_{t,k}V_{t}(\cdot;\hat{\xi}_{t,k}), where p^targmaxpt𝒫tk=1ntpt,kVt(xt1;ξ^t,k)\hat{p}_{t}\in\operatorname*{arg\,max}_{p_{t}\in\mathcal{P}_{t}}\sum_{k=1}^{n_{t}}p_{t,k}V_{t}(x_{t-1};\hat{\xi}_{t,k}) is a maximizer at the given state xt1x_{t-1}. It is clear that 𝒱t1()\mathcal{V}_{t-1}(\cdot) is also MtM_{t}-Lipschitz continuous. Next we solve the maximization problem

vt1maxpt𝒫tk=1ntpt,k(ft(xt1,x^t,k;ξ^t,k)+𝒬¯t(x^t,k)+Mtxt1z^t,k).v_{t-1}\coloneqq\max_{p_{t}\in\mathcal{P}_{t}}\sum_{k=1}^{n_{t}}p_{t,k}\cdot\bigl{(}f_{t}(x_{t-1},\hat{x}_{t,k};\hat{\xi}_{t,k})+\overline{\mathcal{Q}}_{t}(\hat{x}_{t,k})+M_{t}\lVert x_{t-1}-\hat{z}_{t,k}\rVert\bigr{)}. (13)

Note that by assumption 𝒬¯t(xt)𝒬tR(xt)\overline{\mathcal{Q}}_{t}(x_{t})\geq\mathcal{Q}_{t}^{\mathrm{R}}(x_{t}) for all xt𝒳tx_{t}\in\mathcal{X}_{t}, and (x^t,k,z^t,k)(\hat{x}_{t,k},\hat{z}_{t,k}) is a feasible solution to the problem

minxt𝒳t,ztdt1ft(zt,xt;ξ^t,k)+𝒬¯t(xt)+Mtxt1ztQtR(xt1;ξ^t,k).\min_{\begin{subarray}{c}x_{t}\in\mathcal{X}_{t},\\ z_{t}\in\mathbb{R}^{d_{t-1}}\end{subarray}}f_{t}(z_{t},x_{t};\hat{\xi}_{t,k})+\overline{\mathcal{Q}}_{t}(x_{t})+M_{t}\lVert x_{t-1}-z_{t}\rVert\geq Q_{t}^{\mathrm{R}}(x_{t-1};\hat{\xi}_{t,k}).

Thus the value vt1k=1ntpt,kQtR(xt1;ξ^t,k)=𝒬t1R(xt1)v_{t-1}\geq\sum_{k=1}^{n_{t}}p_{t,k}Q^{\mathrm{R}}_{t}(x_{t-1};\hat{\xi}_{t,k})=\mathcal{Q}_{t-1}^{\mathrm{R}}(x_{t-1}) is a valid overestimation.

It remains to determine the feasible state xtx_{t} and calculate the gap γt\gamma_{t} that satisfy the second requirement in Definition 2. Let γt,k𝒬¯t(x^t,k)𝒬¯t(x^t,k)\gamma_{t,k}\coloneqq\overline{\mathcal{Q}}_{t}(\hat{x}_{t,k})-\underline{\mathcal{Q}}_{t}(\hat{x}_{t,k}) for each k=1,,ntk=1,\dots,n_{t}. We pick the index kk^{*} corresponding to the largest gap γt,k\gamma_{t,k^{*}}, and set xt=x^t,kx_{t}=\hat{x}_{t,k^{*}}, γt=γt,k\gamma_{t}=\gamma_{t,k^{*}}. Consequently, we have

vt1\displaystyle v_{t-1} =maxpt𝒫tk=1ntpt,k(ft(z^t,k,x^t,k;ξ^t,k)+𝒬¯t(x^t,k)+Mtxt1z^t,k)\displaystyle=\max_{p_{t}\in\mathcal{P}_{t}}\sum_{k=1}^{n_{t}}p_{t,k}\bigl{(}f_{t}(\hat{z}_{t,k},\hat{x}_{t,k};\hat{\xi}_{t,k})+\overline{\mathcal{Q}}_{t}(\hat{x}_{t,k})+M_{t}\lVert x_{t-1}-\hat{z}_{t,k}\rVert\bigr{)}
=maxpt𝒫tk=1ntpt,k(ft(z^t,k,x^t,k;ξ^t,k)+γt,k+𝒬¯t(x^t,k)+Mtxt1z^t,k)\displaystyle=\max_{p_{t}\in\mathcal{P}_{t}}\sum_{k=1}^{n_{t}}p_{t,k}\bigl{(}f_{t}(\hat{z}_{t,k},\hat{x}_{t,k};\hat{\xi}_{t,k})+\gamma_{t,k}+\underline{\mathcal{Q}}_{t}(\hat{x}_{t,k})+M_{t}\lVert x_{t-1}-\hat{z}_{t,k}\rVert\bigr{)}
γt+maxpt𝒫tk=1ntpt,k(ft(z^t,k,x^t,k;ξ^t,k)+𝒬¯t(x^t,k)+Mtxt1z^t,k)\displaystyle\leq\gamma_{t}+\max_{p_{t}\in\mathcal{P}_{t}}\sum_{k=1}^{n_{t}}p_{t,k}\bigl{(}f_{t}(\hat{z}_{t,k},\hat{x}_{t,k};\hat{\xi}_{t,k})+\underline{\mathcal{Q}}_{t}(\hat{x}_{t,k})+M_{t}\lVert x_{t-1}-\hat{z}_{t,k}\rVert\bigr{)}
=γt+𝒱t1(xt1).\displaystyle=\gamma_{t}+\mathcal{V}_{t-1}(x_{t-1}).

We summarize the above realization of the noninitial stage subproblem oracles in Algorithm 1.

Algorithm 1 A Realization of Noninitial Stage Subproblem Oracle
1:Ξt={ξ^t,1,,ξ^t,nt}\Xi_{t}=\{\hat{\xi}_{t,1},\dots,\hat{\xi}_{t,n_{t}}\} and (xt1,𝒬¯t,𝒬¯t)(x_{t-1},\underline{\mathcal{Q}}_{t},\overline{\mathcal{Q}}_{t})
2:(𝒱t1,vt1,xt;γt)(\mathcal{V}_{t-1},v_{t-1},x_{t};\gamma_{t}) satisfying Definition 2
3:for k=1,,ntk=1,\dots,n_{t} do
4:     solve the bounded Lagrangian dual problem (12) associated with ξ^t,k\hat{\xi}_{t,k}
5:     collect the primal-dual solution pair (x^t,k,z^t,k;λ^t,k)(\hat{x}_{t,k},\hat{z}_{t,k};\hat{\lambda}_{t,k}) and the optimal value v¯t,k\underline{\hbox{\set@color$v$}}_{t,k}
6:     define Vt(x;ξ^t,k)v¯t,k+λ^t,k,xxt1V_{t}(x;\hat{\xi}_{t,k})\coloneqq\underline{\hbox{\set@color$v$}}_{t,k}+\langle\hat{\lambda}_{t,k},x-x_{t-1}\rangle
7:     calculate γt,k𝒬¯t(x^t,k)𝒬¯t(x^t,k)\gamma_{t,k}\coloneqq\overline{\mathcal{Q}}_{t}(\hat{x}_{t,k})-\underline{\mathcal{Q}}_{t}(\hat{x}_{t,k})
8:     let v¯t,k:=ft(xt1,x^t,k;ξ^t,k)+𝒬¯t(x^t,k)+Mtxt1z^t,k\bar{v}_{t,k}\mathrel{\mathop{\ordinarycolon}}=f_{t}(x_{t-1},\hat{x}_{t,k};\hat{\xi}_{t,k})+\overline{\mathcal{Q}}_{t}(\hat{x}_{t,k})+M_{t}\lVert x_{t-1}-\hat{z}_{t,k}\rVert
9:end for
10:construct 𝒱t1(x)k=1ntp^t,kVt(x;ξ^t,k)\mathcal{V}_{t-1}(x)\coloneqq\sum_{k=1}^{n_{t}}\hat{p}_{t,k}V_{t}(x;\hat{\xi}_{t,k}) with p^targmaxpt𝒫tk=1ntpt,kv¯t,k\hat{p}_{t}\in\operatorname*{arg\,max}_{p_{t}\in\mathcal{P}_{t}}\sum_{k=1}^{n_{t}}p_{t,k}\underline{\hbox{\set@color$v$}}_{t,k}
11:calculate vt1maxpt𝒫tk=1ntpt,kv¯t,kv_{t-1}\coloneqq\max_{p_{t}\in\mathcal{P}_{t}}\sum_{k=1}^{n_{t}}p_{t,k}\bar{v}_{t,k}
12:find kk^{*} such that γt,kγt,k\gamma_{t,k^{*}}\geq\gamma_{t,k} for all k=1,,ntk=1,\dots,n_{t} and set xtx^t,k,γtγt,kx_{t}\coloneqq\hat{x}_{t,k^{*}},\gamma_{t}\coloneqq\gamma_{t,k^{*}}

We remark that Algorithm 1 is not the only way to realize the SSSO in Definition 2. For example, it is discussed in [17] that a polyhedral single stage subproblem of MRCO can be reformulated as mixed-integer linear optimization, which may then be solved by branch-and-bound type algorithms. Therefore, the introduction of SSSO may benefit our discussion by avoiding restrictions of solution methods in each stage, even when the uncertainty sets are finite. Besides, with SSSO, the complexity analysis may better reflect the computation time as the for-loop in Algorithm 1 can be easily parallelized. We also show in the next section that SSSO enables us to introduce a nonconsecutive dual dynamic programming algorithm.

3.2 Dual Dynamic Programming Algorithms

With the subproblem oracles, we generalize the consecutive dual dynamic programming (CDDP) algorithm to DR-MCO and introduce a nonconsecutive dual dynamic programming (NDDP) algorithm. Both algorithms produce deterministic upper bounds for exploration and termination.

3.2.1 Consecutive Dual Dynamic Programming

To ease the notation, we use conv{h1,h2}\mathrm{conv}\{h_{1},h_{2}\} to denote the function corresponding to the closed convex hull of the epigraphs of functions h1h_{1} and h2h_{2}. More precisely, we define

conv{h1,h2}(x)supλinfz{min{h1(z),h2(z)}+λ,xz}.\mathrm{conv}\{h_{1},h_{2}\}(x)\coloneqq\sup_{\lambda}\inf_{z}\left\{\min\{h_{1}(z),h_{2}(z)\}+\langle\lambda,x-z\rangle\right\}.

Note that if h1,h2h_{1},h_{2} are further polyhedral, then by linear optimization strong duality, the function conv{h1,h2}\mathrm{conv}\{h_{1},h_{2}\} (assuming it is proper) can be written as

conv{h1,h2}(x)=min{v:(z,v)=μ1(z1,v1)+μ2(z2,v2),vihi(zi),μi0,i=1,2,μ1+μ2=1}.\mathrm{conv}\{h_{1},h_{2}\}(x)=\min\left\{v\in\mathbb{R}\mathrel{\mathop{\ordinarycolon}}\begin{aligned} &(z,v)=\mu_{1}(z_{1},v_{1})+\mu_{2}(z_{2},v_{2}),\\ &v_{i}\geq h_{i}(z_{i}),\mu_{i}\in\mathbb{R}_{\geq 0},i=1,2,\ \mu_{1}+\mu_{2}=1\end{aligned}\right\}.
Algorithm 2 Consecutive Dual Dynamic Programming Algorithm
1:subproblem oracles 𝒪t\mathscr{O}_{t} for t𝒯t\in\mathcal{T}, target optimality gap ε>0\varepsilon>0
2:an ε\varepsilon-optimal first stage solution x1x_{1}^{*}
3:initialize: 𝒬¯t00,𝒬¯t0+,t𝒯\{T}\underline{\mathcal{Q}}_{t}^{0}\leftarrow 0,\overline{\mathcal{Q}}_{t}^{0}\leftarrow+\infty,t\in\mathcal{T}\backslash\{T\}; 𝒬¯Tj,𝒬¯Tj0,j\underline{\mathcal{Q}}_{T}^{j},\overline{\mathcal{Q}}_{T}^{j}\leftarrow 0,j\in\mathbb{N}; i1i\leftarrow 1
4:evaluate (x11;γ11)𝒪1(𝒬¯10,𝒬¯10)(x_{1}^{1};\gamma_{1}^{1})\leftarrow\mathscr{O}_{1}(\underline{\mathcal{Q}}_{1}^{0},\overline{\mathcal{Q}}_{1}^{0})
5:set LowerBoundf1(x0,x11;ξ1),UpperBound+\text{{LowerBound}}\leftarrow f_{1}(x_{0},x_{1}^{1};\xi_{1}),\ \text{{UpperBound}}\leftarrow+\infty
6:while UpperBoundLowerBound>ε\text{{UpperBound}}-\text{{LowerBound}}>\varepsilon do
7:     for t=2,,Tt=2,\dots,T do
8:         evaluate (𝒱t1i,vt1i,xti;γti)=𝒪t(xt1i,𝒬¯ti1,𝒬¯ti1)(\mathcal{V}_{t-1}^{i},v_{t-1}^{i},x_{t}^{i};\gamma_{t}^{i})=\mathscr{O}_{t}(x_{t-1}^{i},\underline{\mathcal{Q}}_{t}^{i-1},\overline{\mathcal{Q}}_{t}^{i-1}) \triangleright forward step
9:     end for
10:     for t=T,,2t=T,\dots,2 do
11:         update 𝒬¯t1i(x)max{𝒬¯t1i1(x),𝒱t1i(x)}\underline{\mathcal{Q}}_{t-1}^{i}(x)\leftarrow\max\{\underline{\mathcal{Q}}_{t-1}^{i-1}(x),\mathcal{V}_{t-1}^{i}(x)\} \triangleright backward step
12:         update 𝒬¯t1i(x)conv{𝒬¯t1i1(x),vt1i+Mtxxt1i}\overline{\mathcal{Q}}_{t-1}^{i}(x)\leftarrow\mathrm{conv}\{\overline{\mathcal{Q}}_{t-1}^{i-1}(x),v_{t-1}^{i}+M_{t}\lVert x-x_{t-1}^{i}\rVert\}
13:     end for
14:     evaluate (x1i+1;γ1i+1)𝒪1(𝒬¯1i,𝒬¯1i)(x_{1}^{i+1};\gamma_{1}^{i+1})\leftarrow\mathscr{O}_{1}(\underline{\mathcal{Q}}_{1}^{i},\overline{\mathcal{Q}}_{1}^{i}) \triangleright initial stage step
15:     update LowerBoundf1(x0,x1i+1;ξ1)+𝒬¯1i(x1i+1)\text{{LowerBound}}\leftarrow f_{1}(x_{0},x_{1}^{i+1};\xi_{1})+\underline{\mathcal{Q}}_{1}^{i}(x_{1}^{i+1})
16:     update UpperBoundf1(x0,x1i+1;ξ1)+𝒬¯1i(x1i+1)\text{{UpperBound}}^{\prime}\leftarrow f_{1}(x_{0},x_{1}^{i+1};\xi_{1})+\overline{\mathcal{Q}}_{1}^{i}(x_{1}^{i+1})
17:     if UpperBound<UpperBound\text{{UpperBound}}^{\prime}<\text{{UpperBound}} then
18:         set x1x1i+1x_{1}^{*}\leftarrow x_{1}^{i+1}, UpperBoundUpperBound\text{{UpperBound}}\leftarrow\text{{UpperBound}}^{\prime}
19:     end if
20:     update ii+1i\leftarrow i+1
21:end while

For each iteration ii\in\mathbb{N}, the main loop of CDDP (Algorithm 2) consists of three steps. The forward step uses the state xt1ix_{t-1}^{i} in the previous stage and the approximations 𝒬¯ti1\underline{\mathcal{Q}}_{t}^{i-1} and 𝒬¯ti1\overline{\mathcal{Q}}_{t}^{i-1} to produce a new state xtix_{t}^{i}. Then the backward step at stage tt uses the cut 𝒱t1i(x)\mathcal{V}_{t-1}^{i}(x) and the value vt1iv_{t-1}^{i} to update the approximations 𝒬¯t1i,𝒬¯t1i\underline{\mathcal{Q}}_{t-1}^{i},\overline{\mathcal{Q}}_{t-1}^{i} in its precedent stage t1t-1. The initial stage step produces a first stage solution x1i+1x_{1}^{i+1} and updates the lower and upper bounds.

We next discuss the correctness of Algorithm 2, i.e., the returned solution x1x_{1}^{*} is ε\varepsilon-optimal as defined in (5), while leaving the finiteness proof to Section 3.3. From the termination of the while-loop, it suffices to show that the approximations are valid 𝒬¯ti(x)𝒬tR(x)𝒬¯ti(x)\underline{\mathcal{Q}}_{t}^{i}(x)\leq\mathcal{Q}_{t}^{\mathrm{R}}(x)\leq\overline{\mathcal{Q}}_{t}^{i}(x) for each t𝒯t\in\mathcal{T} and ii\in\mathbb{N}. The first inequality follows from the validness of linear cuts 𝒱ti\mathcal{V}_{t}^{i} (Definition 2). The second inequality is due to the MtM_{t}-Lipschitz continuity of 𝒬t1R\mathcal{Q}_{t-1}^{\mathrm{R}} (Proposition 3). In particular, the inequality vt1i𝒬t1R(xt1i)v_{t-1}^{i}\geq\mathcal{Q}_{t-1}^{\mathrm{R}}(x_{t-1}^{i}) implies vt1i+Mtxxt1i𝒬tR(x)v_{t-1}^{i}+M_{t}\lVert x-x_{t-1}^{i}\rVert\geq\mathcal{Q}_{t}^{\mathrm{R}}(x) for all x𝒳t1x\in\mathcal{X}_{t-1}. Given that 𝒬¯t1i1(x)𝒬t1R(x)\overline{\mathcal{Q}}_{t-1}^{i-1}(x)\geq\mathcal{Q}_{t-1}^{\mathrm{R}}(x) for x𝒳t1x\in\mathcal{X}_{t-1}, which is obviously true for i=1i=1, we conclude that

min{𝒬¯t1i1(x),vt1i+Mtxxt1i}𝒬t1R(x),x𝒳t1.\min\{\overline{\mathcal{Q}}_{t-1}^{i-1}(x),v_{t-1}^{i}+M_{t}\|{x-x_{t-1}^{i}}\|\}\geq\mathcal{Q}_{t-1}^{\mathrm{R}}(x),\quad\forall\,x\in\mathcal{X}_{t-1}.

By taking the closed convex hull of the epigraphs on both sides, we have shown that 𝒬¯t1i(x)𝒬t1R(x)\overline{\mathcal{Q}}_{t-1}^{i}(x)\geq\mathcal{Q}_{t-1}^{\mathrm{R}}(x) for all x𝒳t1x\in\mathcal{X}_{t-1}. The above argument shows inductively that for all ii\in\mathbb{N}, the approximations are valid, which then implies the correctness of the algorithm.

We comment that the linear cut 𝒱t1i\mathcal{V}_{t-1}^{i} and the over-estimate value vt1iv_{t-1}^{i} are generated using only the information in the previous iteration i1i-1. In fact, the subproblem oracles can be re-evaluated in the backward steps to produce tighter approximations. We simply keep the CDDP algorithm in its current form because it is already sufficient for us to prove its complexity bounds, and hence its convergence. Alternatively, we propose a nonconsecutive version, NDDP, algorithm that could conduct more efficient approximation updates.

Algorithm 3 Nonconsecutive Dual Dynamic Programming (NDDP) Algorithm
1:subproblem oracles 𝒪t\mathscr{O}_{t} for t𝒯t\in\mathcal{T}, opt. and approx. gaps ε=δ1>>δT=0\varepsilon=\delta_{1}>\cdots>\delta_{T}=0
2:an ε\varepsilon-optimal first stage solution x1x_{1}^{*}
3:initialize: 𝒬¯t00,𝒬¯t0+,t𝒯\{T}\underline{\mathcal{Q}}_{t}^{0}\leftarrow 0,\overline{\mathcal{Q}}_{t}^{0}\leftarrow+\infty,t\in\mathcal{T}\backslash\{T\}; 𝒬¯Tj,𝒬¯Tj0,j\underline{\mathcal{Q}}_{T}^{j},\overline{\mathcal{Q}}_{T}^{j}\leftarrow 0,j\in\mathbb{N}; it0,t𝒯i_{t}\leftarrow 0,t\in\mathcal{T}
4:set LowerBound0,UpperBound+\text{{LowerBound}}\leftarrow 0,\ \text{{UpperBound}}\leftarrow+\infty, t1t\leftarrow 1
5:loop
6:     update itit+1i_{t}\leftarrow i_{t}+1
7:     if t=1t=1 then
8:         evaluate (x1i1;γ1i1)𝒪1(𝒬¯1i1,𝒬¯1i1)(x_{1}^{i_{1}};\gamma_{1}^{i_{1}})\leftarrow\mathscr{O}_{1}(\underline{\mathcal{Q}}_{1}^{i_{1}},\overline{\mathcal{Q}}_{1}^{i_{1}}) \triangleright initial stage step
9:         update LowerBoundf1(x0,x1i1;ξ1)+𝒬¯1i1(x1i1)\text{{LowerBound}}\leftarrow f_{1}(x_{0},x_{1}^{i_{1}};\xi_{1})+\underline{\mathcal{Q}}_{1}^{i_{1}}(x_{1}^{i_{1}})
10:         update UpperBoundf1(x0,x1i1;ξ1)+𝒬¯1i1(x1i1)\text{{UpperBound}}^{\prime}\leftarrow f_{1}(x_{0},x_{1}^{i_{1}};\xi_{1})+\overline{\mathcal{Q}}_{1}^{i_{1}}(x_{1}^{i_{1}})
11:         if UpperBound<UpperBound\text{{UpperBound}}^{\prime}<\text{{UpperBound}} then
12:              set x1x1i1x_{1}^{*}\leftarrow x_{1}^{i_{1}}, UpperBoundUpperBound\text{{UpperBound}}\leftarrow\text{{UpperBound}}^{\prime}
13:         end if
14:         if UpperBoundLowerBoundε\text{{UpperBound}}-\text{{LowerBound}}\leq\varepsilon then
15:              break the loop
16:         end if
17:         maintain 𝒬¯2i2+1(x)𝒬¯2i2(x),𝒬¯2i2+1(x)𝒬¯2i2(x)\underline{\mathcal{Q}}_{2}^{i_{2}+1}(x)\leftarrow\underline{\mathcal{Q}}_{2}^{i_{2}}(x),\;\overline{\mathcal{Q}}_{2}^{i_{2}+1}(x)\leftarrow\overline{\mathcal{Q}}_{2}^{i_{2}}(x)
18:         set tt+1t\leftarrow t+1
19:     else
20:         evaluate (𝒱t1it,vt1it,xtit;γtit)𝒪t(xt1it1,𝒬¯tit,𝒬¯tit)(\mathcal{V}_{t-1}^{i_{t}},v_{t-1}^{i_{t}},x_{t}^{i_{t}};\gamma_{t}^{i_{t}})\leftarrow\mathscr{O}_{t}(x_{t-1}^{i_{t-1}},\underline{\mathcal{Q}}_{t}^{i_{t}},\overline{\mathcal{Q}}_{t}^{i_{t}}) \triangleright noninitial stage step
21:         if t<Tt<T and γtit>δt\gamma_{t}^{i_{t}}>\delta_{t} then
22:              maintain 𝒬¯t+1it+1+1(x)𝒬¯t+1it+1(x),𝒬¯t+1it+1+1(x)𝒬¯t+1it+1(x)\underline{\mathcal{Q}}_{t+1}^{i_{t+1}+1}(x)\leftarrow\underline{\mathcal{Q}}_{t+1}^{i_{t+1}}(x),\;\overline{\mathcal{Q}}_{t+1}^{i_{t+1}+1}(x)\leftarrow\overline{\mathcal{Q}}_{t+1}^{i_{t+1}}(x)
23:              set tt+1t\leftarrow t+1
24:         else
25:              update 𝒬¯t1it1+1(x)max{𝒬¯t1it1(x),𝒱t1it(x)}\underline{\mathcal{Q}}_{t-1}^{i_{t-1}+1}(x)\leftarrow\max\{\underline{\mathcal{Q}}_{t-1}^{i_{t-1}}(x),\mathcal{V}_{t-1}^{i_{t}}(x)\}
26:              update 𝒬¯t1it1+1(x)conv{𝒬¯t1it1(x),vt1it+Mt1xxt1it1}\overline{\mathcal{Q}}_{t-1}^{i_{t-1}+1}(x)\leftarrow\mathrm{conv}\{\overline{\mathcal{Q}}_{t-1}^{i_{t-1}}(x),v_{t-1}^{i_{t}}+M_{t-1}\|{x-x_{t-1}^{i_{t-1}}}\|\}
27:              set tt1t\leftarrow t-1
28:         end if
29:     end if
30:end loop

3.2.2 Nonconsecutive Dual Dynamic Programming

We describe the NDDP algorithm in Algorithm 3. To start the algorithm, it requires an additionally chosen vector of approximation gaps δ(δt)t=1T\delta\coloneqq(\delta_{t})_{t=1}^{T} such that ε=δ1>δ2>>δT=0\varepsilon=\delta_{1}>\delta_{2}>\cdots>\delta_{T}=0, compared with the CDDP algorithm. These predetermined approximation gaps serve as criteria at stage tt for deciding the next stage to be solved: the precedent stage t1t-1 or the subsequent one t+1t+1. If the algorithm decides to proceed to the subsequent stage t+1t+1, then the current state xtx_{t} is used; otherwise the generated linear cut 𝒱t1\mathcal{V}_{t-1} and over-estimate value vt1v_{t-1} are used for updating the approximations. The above argument of validness of approximations imply that 𝒬¯tit(x)𝒬tR(x)𝒬¯tit(x)\underline{\mathcal{Q}}_{t}^{i_{t}}(x)\leq\mathcal{Q}_{t}^{\mathrm{R}}(x)\leq\overline{\mathcal{Q}}_{t}^{i_{t}}(x) for all x𝒳tx\in\mathcal{X}_{t} holds for any stage t𝒯t\in\mathcal{T} and any index iti_{t}\in\mathbb{N}. Therefore, when NDDP terminates, the returned solution x1x_{1}^{*} is indeed ε\varepsilon-optimal.

3.3 Complexity Upper Bounds

In this section, we provide a complexity analysis for the proposed CDDP and NDDP algorithms, which implies that both algorithms terminate in finite time for any finite M2,,MT>0M_{2},\dots,M_{T}>0 and ε>0\varepsilon>0. Our goal is to derive an upper bound on the total number of subproblem oracle evaluations before the termination of the algorithm. To begin with, let 𝒥t,t>1\mathcal{J}_{t},t>1, denote the set of pairs of indices (it1,it)(i_{t-1},i_{t}) such that the noninitial stage subproblem oracle is evaluated at the iti_{t}-th time at the state xt1it1x_{t-1}^{i_{t-1}}, i.e., (𝒱t1it,vt1it,xtit;γtit)=𝒪t(xt1it1,𝒬¯tit,𝒬¯tit)(\mathcal{V}_{t-1}^{i_{t}},v_{t-1}^{i_{t}},x_{t}^{i_{t}};\gamma_{t}^{i_{t}})=\mathscr{O}_{t}(x_{t-1}^{i_{t-1}},\underline{\mathcal{Q}}_{t}^{i_{t}},\overline{\mathcal{Q}}_{t}^{i_{t}}). For the CDDP algorithm, all stages share the same iteration index it=ii_{t}=i, so 𝒥t={(i,i):i}\mathcal{J}_{t}=\{(i,i)\mathrel{\mathop{\ordinarycolon}}i\in\mathbb{N}\} for all t>1t>1. We define the following sets of indices for each t𝒯{T}t\in\mathcal{T}\setminus\{T\}:

t(δ){it:γtit>δt and γt+1it+1δt+1,(it,it+1)𝒥t+1}.\mathcal{I}_{t}(\delta)\coloneqq\left\{i_{t}\in\mathbb{N}\mathrel{\mathop{\ordinarycolon}}\gamma_{t}^{i_{t}}>\delta_{t}\text{ and }\gamma_{t+1}^{i_{t+1}}\leq\delta_{t+1},\ (i_{t},i_{t+1})\in\mathcal{J}_{t+1}\right\}. (14)

Here, for NDDP algorithm, δ\delta is the given approximation gap vector, while for CDDP algorithm, δ=(δt)t=1T\delta=(\delta_{t})_{t=1}^{T} can be any vector satisfying ε=δ1>δ2>>δT=0\varepsilon=\delta_{1}>\delta_{2}>\cdots>\delta_{T}=0 for the purpose of analysis, since it is not required for the CDDP algorithm. We adopt the convention that the gap for the last stage γTiT0\gamma_{T}^{i_{T}}\equiv 0 such that iT1T1(δ)i_{T-1}\in\mathcal{I}_{T-1}(\delta) if and only if γT1iT1>δT1\gamma_{T-1}^{i_{T-1}}>\delta_{T-1} and (iT1,iT)𝒥T(i_{T-1},i_{T})\in\mathcal{J}_{T}. In plain words, t(δ)\mathcal{I}_{t}(\delta) contains the pairs of indices where the state xtitx_{t}^{i_{t}} has an approximation gap greater than δt\delta_{t} at iti_{t}-th evaluation, but at it+1i_{t+1}-th iteration the stage-(t+1)(t+1) oracle finds a solution with approximation gap smaller than δt+1\delta_{t+1}. An important observation is that all these index sets are finite (before algorithm termination) |t|<\mathinner{\!\left\lvert\mathcal{I}_{t}\right\rvert}<\infty, which is more precisely stated in the following lemma.

Lemma 3.

For stage any t<Tt<T, suppose the state space 𝒳tdt\mathcal{X}_{t}\subset\mathbb{R}^{d_{t}} is contained in a ball with diameter Dt>0D_{t}>0. Then,

|t(δ)|(1+2Mt+1Dtδtδt+1)dt.\mathinner{\!\left\lvert\mathcal{I}_{t}(\delta)\right\rvert}\leq\left(1+\frac{2M_{t+1}D_{t}}{\delta_{t}-\delta_{t+1}}\right)^{d_{t}}. (15)
Proof.

We claim that for any j,ktj,k\in\mathcal{I}_{t}, jkj\neq k, it holds that xtjxtk>(δtδt+1)/(2Mt+1)\lVert x_{t}^{j}-x_{t}^{k}\rVert>(\delta_{t}-\delta_{t+1})/(2M_{t+1}). Assume for contradiction that xtjxtk(δtδt+1)/(2Mt+1)\lVert x_{t}^{j}-x_{t}^{k}\rVert\leq(\delta_{t}-\delta_{t+1})/(2M_{t+1}) for some j<kj<k, j,kt(δ)j,k\in\mathcal{I}_{t}(\delta). By definition of t(δ)\mathcal{I}_{t}(\delta), the (t+1)(t+1)-th subproblem oracle is evaluated at the state xtjx_{t}^{j}, and in both the CDDP and the NDDP algorithms, the approximations 𝒬¯tj\underline{\mathcal{Q}}_{t}^{j} and 𝒬¯tj\overline{\mathcal{Q}}_{t}^{j} are updated since γt+1it+1δt+1\gamma_{t+1}^{i_{t+1}}\leq\delta_{t+1} for some it+1i_{t+1}\in\mathbb{N} with (j,it+1)𝒥t+1(j,i_{t+1})\in\mathcal{J}_{t+1}. Then by Definition 2 of the noninitial stage subproblem oracle, we have 𝒬¯tj+1(xtj)𝒬¯tj+1(xtj)δt+1\overline{\mathcal{Q}}_{t}^{j+1}(x_{t}^{j})-\underline{\mathcal{Q}}_{t}^{j+1}(x_{t}^{j})\leq\delta_{t+1}. Note that for any point x𝒳tx\in\mathcal{X}_{t} with xxtj(δtδt+1)/(2Mt+1)\lVert x-x_{t}^{j}\rVert\leq(\delta_{t}-\delta_{t+1})/(2M_{t+1}), we have 𝒬¯tj+1(x)𝒬¯tj+1(x)δt\overline{\mathcal{Q}}_{t}^{j+1}(x)-\underline{\mathcal{Q}}_{t}^{j+1}(x)\leq\delta_{t} because of the Mt+1M_{t+1}-Lipschitz continuity of the approximations. Since 𝒬¯tk(x)𝒬¯tj+1(x)\overline{\mathcal{Q}}_{t}^{k}(x)\leq\overline{\mathcal{Q}}_{t}^{j+1}(x) and 𝒬¯tk(x)𝒬¯tj+1(x)\underline{\mathcal{Q}}_{t}^{k}(x)\geq\underline{\mathcal{Q}}_{t}^{j+1}(x), by setting x=xtkx=x_{t}^{k}, we see a contradiction with the assumption that kt(δ)k\in\mathcal{I}_{t}(\delta), which proves the claim.

To ease the notation, let rt(δtδt+1)/(2Mt+1)r_{t}\coloneqq(\delta_{t}-\delta_{t+1})/(2M_{t+1}) denote the radius of the dtd_{t}-dimensional balls dt(xtj;rt)\mathcal{B}^{d_{t}}(x_{t}^{j};r_{t}) centered at xtjx_{t}^{j} for jt(δ)j\in\mathcal{I}_{t}(\delta), and let t𝒳t\mathcal{B}_{t}\supseteq\mathcal{X}_{t} denote a ball with diameter DtD_{t}. From the above claim, we know that xtkdt(xtj;rt)x_{t}^{k}\notin\mathcal{B}^{d_{t}}(x_{t}^{j};r_{t}) for any j,kt(δ)j,k\in\mathcal{I}_{t}(\delta) with j<kj<k. In other words, the smaller balls dt(xtj;rt/2)\mathcal{B}^{d_{t}}(x_{t}^{j};r_{t}/2) are disjoint. Meanwhile, note that each of these smaller balls satisfies dt(xtj;rt/2)t+dt(0;rt/2)\mathcal{B}^{d_{t}}(x_{t}^{j};r_{t}/2)\subset\mathcal{B}_{t}+\mathcal{B}^{d_{t}}(0;r_{t}/2) (the Minkowski sum in the Euclidean space dt\mathbb{R}^{d_{t}}). Therefore, the volumes satisfy the relation

Vol(jt(δ)dt(xtj;rt/2))=|t(δ)|Voldt(0;rt/2)Vol(t+dt(0;rt/2)),\mathrm{Vol}\left(\textstyle\bigcup_{j\in\mathcal{I}_{t}(\delta)}\mathcal{B}^{d_{t}}(x_{t}^{j};r_{t}/2)\right)=\mathinner{\!\left\lvert\mathcal{I}_{t}(\delta)\right\rvert}\cdot\mathrm{Vol}\mathcal{B}^{d_{t}}(0;r_{t}/2)\leq\mathrm{Vol}\left(\mathcal{B}_{t}+\mathcal{B}^{d_{t}}(0;r_{t}/2)\right),

which implies that

|t(δ)|Vol(t+dt(0;rt/2))Voldt(0;rt/2)=(Dt/2+rt/2rt/2)dt=(1+2Mt+1Dtδtδt+1)dt.\mathinner{\!\left\lvert\mathcal{I}_{t}(\delta)\right\rvert}\leq\frac{\mathrm{Vol}(\mathcal{B}_{t}+\mathcal{B}^{d_{t}}(0;r_{t}/2))}{\mathrm{Vol}\mathcal{B}^{d_{t}}(0;r_{t}/2)}=\left(\frac{D_{t}/2+r_{t}/2}{r_{t}/2}\right)^{d_{t}}=\left(1+\frac{2M_{t+1}D_{t}}{\delta_{t}-\delta_{t+1}}\right)^{d_{t}}.

We prove the following complexity upper bounds for the CDDP algorithm (Theorem 1) and the NDDP algorithm (Theorem 2).

Theorem 1.

Suppose the state spaces 𝒳tdt\mathcal{X}_{t}\subset\mathbb{R}^{d_{t}} are contained in balls, each with diameter Dt>0D_{t}>0. Then for the CDDP algorithm (Algorithm 2), the total number of subproblem oracle evaluations #EvalCDDP\textup{{\#Eval}}_{\textup{CDDP}} before termination is bounded by

#EvalCDDP1+Tinfδ{t=1T1(1+2Mt+1Dtδtδt+1)dt:ε=δ1>δ2>>δT=0}.\textup{{\#Eval}}_{\textup{CDDP}}\leq 1+T\cdot\inf_{\delta}\left\{\sum_{t=1}^{T-1}\left(1+\frac{2M_{t+1}D_{t}}{\delta_{t}-\delta_{t+1}}\right)^{d_{t}}\mathrel{\mathop{\ordinarycolon}}\varepsilon=\delta_{1}>\delta_{2}>\cdots>\delta_{T}=0\right\}.
Proof.

We prove by showing that for any approximation gap vector δ\delta satisfying ε=δ1>δ2>>δT=0\varepsilon=\delta_{1}>\delta_{2}>\cdots>\delta_{T}=0, the largest iteration index ii is bounded by

it=1T1(1+2Mt+1Dtδtδt+1)dt.i\leq\sum_{t=1}^{T-1}\left(1+\frac{2M_{t+1}D_{t}}{\delta_{t}-\delta_{t+1}}\right)^{d_{t}}. (16)

We claim that each iteration ii\in\mathbb{N} must lie in either of the following two cases:

  1. 1.

    the initial stage step has γ1iε\gamma_{1}^{i}\leq\varepsilon; or

  2. 2.

    the ii-th forward step is in the index set it(δ)i\in\mathcal{I}_{t}(\delta) for some stage t<Tt<T.

To see the claim, suppose that the iteration ii\in\mathbb{N} is not in the first case. Then we have γ1i>ε\gamma_{1}^{i}>\varepsilon and by convention γTi=0δT\gamma_{T}^{i}=0\leq\delta_{T}. Therefore, there exists a stage t<Tt<T such that γti>δt\gamma_{t}^{i}>\delta_{t} while γt+1iδt+1\gamma_{t+1}^{i}\leq\delta_{t+1}, which is the second case. Note that when the first case happens, we have UpperBoundLowerBoundγ1iε\text{{UpperBound}}-\text{{LowerBound}}\leq\gamma_{1}^{i}\leq\varepsilon and thus the CDDP algorithm terminates. By Lemma 3, the second case can only happen at most t=1T1|t(δ)|\sum_{t=1}^{T-1}\mathinner{\!\left\lvert\mathcal{I}_{t}(\delta)\right\rvert} times, proving the bound (16). The theorem then follows from the fact that in each CDDP iteration, the subproblem oracle is evaluated T+1T+1 times with the additional evaluation of the initial stage subproblem oracle for checking the termination criterion. ∎

Remark 1.

When dt=dd_{t}=d, Mt=MM_{t}=M, and Dt=DD_{t}=D for all t=1,,T1t=1,\dots,T-1, the bound in Theorem 1 can be simplified nicely. Note that the function

t=1T1(1+2MDεσt)d\sum_{t=1}^{T-1}\left(1+\frac{2MD}{\varepsilon\cdot\sigma_{t}}\right)^{d}

is strictly convex and symmetric under permutation on the simplex ΔT1:={σ=(σ1,,σT1)T1:t=1T1σt=1}\Delta^{T-1}\mathrel{\mathop{\ordinarycolon}}=\{\sigma=(\sigma_{1},\dots,\sigma_{T-1})\in\mathbb{R}^{T-1}\mathrel{\mathop{\ordinarycolon}}\sum_{t=1}^{T-1}\sigma_{t}=1\}. Therefore, it has a unique optimal solution δ=(1T1,,1T1)\delta=(\frac{1}{T-1},\dots,\frac{1}{T-1}), which implies the infimum in (16) is attained by δt=TtT1ε\delta_{t}=\frac{T-t}{T-1}\varepsilon, t𝒯t\in\mathcal{T}. In this case, the CDDP complexity upper bound in Theorem 1 becomes

i1+T(T1)(1+2MD(T1)ε)d.i\leq 1+T(T-1)\left(1+\frac{2MD(T-1)}{\varepsilon}\right)^{d}.
Theorem 2.

Suppose the state spaces 𝒳tdt\mathcal{X}_{t}\subset\mathbb{R}^{d_{t}} are contained in balls, each with diameter Dt>0D_{t}>0. Then, for the NDDP algorithm (Algorithm 3) with the predetermined approximation gap vector (δt)t=1T(\delta_{t})_{t=1}^{T} satisfying ε=δ1>δ2>>δT=0\varepsilon=\delta_{1}>\delta_{2}>\cdots>\delta_{T}=0, the total number of subproblem oracle evaluations #EvalNDDP\textup{{\#Eval}}_{\textup{NDDP}} before termination is bounded by

#EvalNDDP1+2t=1T1(1+2Mt+1Dtδtδt+1)dt.\textup{{\#Eval}}_{\textup{NDDP}}\leq 1+2\cdot\sum_{t=1}^{T-1}\left(1+\frac{2M_{t+1}D_{t}}{\delta_{t}-\delta_{t+1}}\right)^{d_{t}}.
Proof.

For the NDDP algorithm, each time when it decides to go back to the precedent stage tt1t\leftarrow t-1, we must have γtitδt\gamma_{t}^{i_{t}}\leq\delta_{t} while γt1it1>δt1\gamma_{t-1}^{i_{t-1}}>\delta_{t-1} for some (it1,it)𝒥t(i_{t-1},i_{t})\in\mathcal{J}_{t}. In this case, we have by definition that it1t1(δ)i_{t-1}\in\mathcal{I}_{t-1}(\delta). By Lemma 3, such “going back” steps can only happen at most t=1T1|t(δ)|\sum_{t=1}^{T-1}\mathinner{\!\left\lvert\mathcal{I}_{t}(\delta)\right\rvert} times. The total number of evaluations is then bounded by two times the number of “going back” steps due to the corresponding “going forward” steps, plus an additional evaluation of the initial stage oracle for checking the termination criterion. ∎

Let us compare the complexity bounds of the two algorithms. If we fix the approximation gap vector δ\delta in Theorem 1 to be the same in the NDDP algorithm, then the complexity bound of CDDP grows faster than that of NDDP as T2T\geq 2. However, since an optimal choice of the gap vector δ\delta is usually not known, CDDP has the advantage of not requiring estimates of these factors for the algorithm to be executed.

In many practical problems, it is reasonable to allow the target optimality gap to grow with TT. We say x1𝒳1x_{1}^{*}\in\mathcal{X}_{1} is an α\alpha-relative optimal solution if

f1(x0,x1;ξ1)+Q1(x1)(1+α)(minx1𝒳1f1(x0,x1;ξ1)+𝒬1(x1)).f_{1}(x_{0},x_{1}^{*};\xi_{1})+Q_{1}(x_{1}^{*})\leq(1+\alpha)\left(\min_{x_{1}\in\mathcal{X}_{1}}f_{1}(x_{0},x_{1};\xi_{1})+\mathcal{Q}_{1}(x_{1})\right).

An ε\varepsilon-optimal first stage solution is α\alpha-relative optimal if

εα(minx1𝒳1f1(x0,x1;ξ1)+𝒬1(x1)).\varepsilon\leq\alpha\left(\min_{x_{1}\in\mathcal{X}_{1}}f_{1}(x_{0},x_{1};\xi_{1})+\mathcal{Q}_{1}(x_{1})\right).

We provide below an important simplification of the above complexity bounds when considering relative optimality gaps.

Corollary 1.

Suppose that all the state spaces have the same dimension dt=dd_{t}=d and are bounded by a common diameter DtDD_{t}\leq D, and let Mmax{Mt:t𝒯}M\coloneqq\max\{M_{t}\mathrel{\mathop{\ordinarycolon}}t\in\mathcal{T}^{\prime}\}. If for each stage t𝒯t\in\mathcal{T}, the local cost functions are uniformly bounded from below by C>0C>0 for all feasible solutions xt1,xtx_{t-1},x_{t} and uncertainty outcomes ξt\xi_{t}, i.e., ft(xt1,xt;ξt)Cf_{t}(x_{t-1},x_{t};\xi_{t})\geq C, then the total number of subproblem oracle evaluations before achieving an α\alpha-relative optimal solution x1x_{1}^{*} for CDDP and NDDP are upper bounded respectively by

#EvalCDDP1+T(T1)(1+2MDαC)d,\displaystyle\textup{{\#Eval}}_{\textup{CDDP}}\leq 1+T(T-1)\left(1+\frac{2MD}{\alpha C}\right)^{d},
#EvalNDDP1+2(T1)(1+2MDαC)d.\displaystyle\textup{{\#Eval}}_{\textup{NDDP}}\leq 1+2(T-1)\left(1+\frac{2MD}{\alpha C}\right)^{d}.
Proof.

By assumption, minx1𝒳1f1(x0,x1;ξ1)+𝒬1(x1)CT\min_{x_{1}\in\mathcal{X}_{1}}f_{1}(x_{0},x_{1};\xi_{1})+\mathcal{Q}_{1}(x_{1})\geq CT. Thus the assertion is proved if x1x_{1}^{*} is ε\varepsilon-optimal with ε=αC(T1)<αCT\varepsilon=\alpha C(T-1)<\alpha CT. We can apply Theorems 1 and 2 by setting δt=(Tt)αC\delta_{t}=(T-t)\alpha C. ∎

Corollary 1 shows that for problems that have strictly positive cost in each stage, the proposed complexity bounds for an α\alpha-relative optimal solution grow at most quadratically for CDDP and linearly for NDDP with respect to the number of stages TT. At the same time, they both grow exponentially with the dimension dd. In what follows, we show that this exponential dependence on dd is nearly tight by providing a matching lower bound.

3.4 Complexity Lower Bound

Note that if we take δt=ε(Tt)/(T1)\delta_{t}=\varepsilon(T-t)/(T-1) for t𝒯t\in\mathcal{T}, then the complexity upper bounds in Theorems 1 and 2 depend on terms (T1)dt+1(T-1)^{d_{t}+1} or (T1)dt+2(T-1)^{d_{t}+2}, where dtd_{t} is the state space dimension of stage t<Tt<T. It is natural to ask whether it is possible for either algorithm to achieve an ε\varepsilon-optimal solution without the exponential dependence on the state space dimensions as in Corollary 1. We construct a class of convex problems to show that this is indeed impossible.

Given a dd-sphere 𝒮d(r)={xd+1:x2=r}\mathcal{S}^{d}(r)=\{x\in\mathbb{R}^{d+1}\mathrel{\mathop{\ordinarycolon}}\lVert x\rVert_{2}=r\} with radius r>0r>0, a spherical cap with depth θ>0\theta>0 centered at a point x𝒮d(r)x\in\mathcal{S}^{d}(r) is the set 𝒮θd(r,x){y𝒮d(r):yx,xθr}\mathcal{S}^{d}_{\theta}(r,x)\coloneqq\{y\in\mathcal{S}^{d}(r)\mathrel{\mathop{\ordinarycolon}}\langle y-x,x\rangle\geq-\theta r\}. Our construction is based on the following technical lemma, where we use Γ()\Gamma(\cdot) to denote the gamma function.

Lemma 4 ([42, Lemmas 8 and 9]).

Given a dd-sphere 𝒮d(r),d2\mathcal{S}^{d}(r),d\geq 2 and depth θ<(122)r\theta<(1-\frac{\sqrt{2}}{2})r, there exists a finite set of points 𝒲\mathcal{W} with

|𝒲|(d21)πdΓ(d/2+1)Γ(d/2+3/2)(r2θ)(d1)/2,\mathinner{\!\left\lvert\mathcal{W}\right\rvert}\geq\frac{(d^{2}-1)\sqrt{\pi}}{d}\frac{\Gamma(d/2+1)}{\Gamma(d/2+3/2)}\left(\frac{r}{2\theta}\right)^{(d-1)/2},

such that for any w𝒲w\in\mathcal{W}, 𝒮θd(r,w)𝒲={w}\mathcal{S}^{d}_{\theta}(r,w)\cap\mathcal{W}=\{w\}. Moreover, given a positive constant l>0l>0, any function F𝒲:𝒲(12lθ,lθ)F^{\mathcal{W}}\mathrel{\mathop{\ordinarycolon}}\mathcal{W}\to(\frac{1}{2}l\theta,l\theta) can be extended to an ll-Lipschitz continuous convex function F:d+1(r)F\mathrel{\mathop{\ordinarycolon}}\mathcal{B}^{d+1}(r)\to\mathbb{R}, such that FF is differentiable at any point of 𝒲\mathcal{W}, and the approximate functions

F¯w(x)\displaystyle\underline{F}_{w}(x) max{F(v)+F(v),xv:v𝒲,vw},\displaystyle\coloneqq\max\{F(v)+\left\langle\nabla F(v),x-v\right\rangle\mathrel{\mathop{\ordinarycolon}}v\in\mathcal{W},v\neq w\},
F¯w(x)\displaystyle\overline{F}_{w}(x) conv{F(v)+lxv:v𝒲,vw},\displaystyle\coloneqq\mathrm{conv}\{F(v)+l\lVert x-v\rVert\mathrel{\mathop{\ordinarycolon}}v\in\mathcal{W},v\neq w\},

satisfy F¯w(w)<0\underline{F}_{w}(w)<0 and F¯w(w)>32lθ\overline{F}_{w}(w)>\frac{3}{2}l\theta.

We next construct a class of deterministic DR-MCO problems using such convex functions, with the following parameters: T2T\geq 2 as the number of stages, L>0L>0 as an upper bound on Lipschitz constants, d3d\geq 3 as the state space dimension, D=2r>0D=2r>0 as the state space diameter, and ε>0\varepsilon>0 as the target optimality gap. Set εt=ε/(T1)\varepsilon_{t}=\varepsilon/(T-1). Construct sets of points 𝒲t{wt,k}k=1Kt\mathcal{W}_{t}\coloneqq\{w_{t,k}\}_{k=1}^{K_{t}}, where θt=εt/L\theta_{t}=\varepsilon_{t}/L for t𝒯t\in\mathcal{T}^{\prime}. Let Ft(x)=F𝒲t(x)F_{t}(x)=F^{\mathcal{W}_{t}}(x) be an LL-Lipschitz continuous convex function on d(r)\mathcal{B}^{d}(r) constructed in Lemma 4 with Ft(wt,k)(εt/2,εt)F_{t}(w_{t,k})\in(\varepsilon_{t}/2,\varepsilon_{t}) for each wt,k𝒲tw_{t,k}\in\mathcal{W}_{t}. And we denote F1(x)0F_{1}(x)\equiv 0. Our DR-MCO problems for the complexity lower bound are defined by the following recursions for t=T,,2t=T,\dots,2

𝒬t1(xt1)=minxtd(r){Ft(xt1)+𝒬t(xt)},\mathcal{Q}_{t-1}(x_{t-1})=\min_{x_{t}\in\mathcal{B}^{d}(r)}\left\{F_{t}(x_{t-1})+\mathcal{Q}_{t}(x_{t})\right\}, (17)

where 𝒬T(x)0\mathcal{Q}_{T}(x)\equiv 0, and the first stage problem is defined as minx1d(r)𝒬1(x1)\min_{x_{1}\in\mathcal{B}^{d}(r)}\mathcal{Q}_{1}(x_{1}). We assume that the dual bounds satisfy MtLM_{t}\geq L, t𝒯t\in\mathcal{T}^{\prime} for the exactness guaranteed by Proposition 4. To describe the asymptotic behavior of a lower bound, we use the Ω\Omega-notation, i.e., for functions a,b:a,b\mathrel{\mathop{\ordinarycolon}}\mathbb{Z}\to\mathbb{R}, a(t)=Ω(b(t))a(t)=\Omega(b(t)) if lim inft|a(n)b(n)|>0\liminf_{t\to\infty}\mathinner{\!\left\lvert\frac{a(n)}{b(n)}\right\rvert}>0.

Theorem 3.

For the problem (17), there exist single stage subproblem oracles such that the number of evaluations #Eval for either of Algorithms 2 and 3 before termination has the following lower bound

#Evaldd1π2(d24)(DL(T1)2ε)d/21=Ω(Td/21) as T.\textup{{\#Eval}}\geq\frac{d}{d-1}\sqrt{\frac{\pi}{2}(d^{2}-4)}\left(\frac{DL(T-1)}{2\varepsilon}\right)^{d/2-1}=\varOmega(T^{d/2-1})\text{ as }T\to\infty.
Proof.

We assume that our SSSO at stage t𝒯t\in\mathcal{T} returns points in 𝒲t\mathcal{W}_{t} in the first KtK_{t} evaluations without repetition. More specifically, for t𝒯t\in\mathcal{T}^{\prime} and itKti_{t}\leq K_{t}, given 𝒬¯tit,𝒬¯tit\underline{\mathcal{Q}}_{t}^{i_{t}},\overline{\mathcal{Q}}_{t}^{i_{t}} and xt1it1x_{t-1}^{i_{t-1}} at some (it1,it)𝒥t(i_{t-1},i_{t})\in\mathcal{J}_{t}, the SSSO returns

  • a state xtit𝒲tx_{t}^{i_{t}}\in\mathcal{W}_{t} with xtitxtitx_{t}^{i_{t}}\neq x_{t}^{i^{\prime}_{t}} for all it<iti^{\prime}_{t}<i_{t};

  • a linear cut 𝒱tit(x):=Vtit(x)+𝒬¯tit(xtit)\mathcal{V}_{t}^{i_{t}}(x)\mathrel{\mathop{\ordinarycolon}}=V_{t}^{i_{t}}(x)+\underline{\mathcal{Q}}_{t}^{i_{t}}(x_{t}^{i_{t}});

  • an overestimate vtit:=Ft(xt1it1)+𝒬¯tit(xtit)v_{t}^{i_{t}}\mathrel{\mathop{\ordinarycolon}}=F_{t}(x_{t-1}^{i_{t-1}})+\overline{\mathcal{Q}}_{t}^{i_{t}}(x_{t}^{i_{t}}).

where Vtit(x):=Ft(xt1it1)+Ft(xt1it1),xxt1it1V_{t}^{i_{t}}(x)\mathrel{\mathop{\ordinarycolon}}=F_{t}(x_{t-1}^{i_{t-1}})+\langle\nabla F_{t}(x_{t-1}^{i_{t-1}}),x-x_{t-1}^{i_{t-1}}\rangle for any (it1,it)𝒥t(i_{t-1},i_{t})\in\mathcal{J}_{t}. Thus our under- and over-approximations of 𝒬t\mathcal{Q}_{t} can be written as

𝒬¯t1it1(x)=max{0,max{𝒬¯tit(xtit)+Vtit(x):(it1,it)𝒥t,it1it1}},\displaystyle\underline{\mathcal{Q}}_{t-1}^{i_{t-1}}(x)=\max\Bigl{\{}0,\max\bigl{\{}\underline{\mathcal{Q}}_{t}^{i^{\prime}_{t}}(x_{t}^{i^{\prime}_{t}})+V_{t}^{i^{\prime}_{t}}(x)\mathrel{\mathop{\ordinarycolon}}(i^{\prime}_{t-1},i^{\prime}_{t})\in\mathcal{J}_{t},i^{\prime}_{t-1}\leq i_{t-1}\bigr{\}}\Bigr{\}},
𝒬¯t1it1(x)=conv{𝒬¯tit(xtit)+Ft(xt1it1)+Mtxxt1it1:(it1,it)𝒥t,it1it1}.\displaystyle\overline{\mathcal{Q}}_{t-1}^{i_{t-1}}(x)=\mathrm{conv}\Bigl{\{}\overline{\mathcal{Q}}_{t}^{i^{\prime}_{t}}(x_{t}^{i^{\prime}_{t}})+F_{t}(x_{t-1}^{i^{\prime}_{t-1}})+M_{t}\lVert x-x_{t-1}^{i^{\prime}_{t-1}}\rVert\mathrel{\mathop{\ordinarycolon}}(i^{\prime}_{t-1},i^{\prime}_{t})\in\mathcal{J}_{t},i^{\prime}_{t-1}\leq i_{t-1}\Bigr{\}}.

To see that the above assumption on the SSSO does not conflict with Definition 2, it is straightforward to check that the state xtitx_{t}^{i_{t}} is feasible, the overestimate vtit=Ft(xt1it1)+𝒬¯tit(xtit)𝒬t1(xt1it1)v_{t}^{i_{t}}=F_{t}(x_{t-1}^{i_{t-1}})+\overline{\mathcal{Q}}_{t}^{i_{t}}(x_{t}^{i_{t}})\geq\mathcal{Q}_{t-1}(x_{t-1}^{i_{t-1}}), and the gap is controlled vtit𝒱tit(xt1it1)=𝒬¯tit(xtit)𝒬¯tit(xtit)v_{t}^{i_{t}}-\mathcal{V}_{t}^{i_{t}}(x_{t-1}^{i_{t-1}})=\overline{\mathcal{Q}}_{t}^{i_{t}}(x_{t}^{i_{t}})-\underline{\mathcal{Q}}_{t}^{i_{t}}(x_{t}^{i_{t}}). For the validness of the linear cut 𝒱tit\mathcal{V}_{t}^{i_{t}}, note that by Lemma 4, Vtit(x)<0V_{t}^{i_{t}}(x)<0 for all x𝒲t1{xt1it1}x\in\mathcal{W}_{t-1}\setminus\{x_{t-1}^{i_{t-1}}\}. Thus by a recursive argument from t=Tt=T to 22, we know that 𝒬¯t1it1(x)0\underline{\mathcal{Q}}_{t-1}^{i_{t-1}}(x)\equiv 0 when itKti_{t}\leq K_{t} for all t𝒯t\in\mathcal{T}^{\prime}. Similarly, one can verify that 𝒬¯t1(xt1it1)32ttεt\overline{\mathcal{Q}}_{t-1}(x_{t-1}^{i_{t-1}})\geq\frac{3}{2}\sum_{t^{\prime}\geq t}\varepsilon_{t^{\prime}}. Therefore, UpperBoundLowerBound=𝒬¯1i1(x1i1)𝒬¯1i1(x1i1)>32t=2Tεt>ε\text{{UpperBound}}-\text{{LowerBound}}=\overline{\mathcal{Q}}_{1}^{i_{1}}(x_{1}^{i_{1}})-\underline{\mathcal{Q}}_{1}^{i_{1}}(x_{1}^{i_{1}})>\frac{3}{2}\sum_{t=2}^{T}\varepsilon_{t}>\varepsilon. In other words, when the algorithms terminate, we must have itKti_{t}\geq K_{t} for some t𝒯t\in\mathcal{T}^{\prime}, which implies

#EvalKt\displaystyle\textup{{\#Eval}}\geq K_{t} ((d1)21)πd1Γ((d1)/2+1)Γ((d1)/2+3/2)(rLεt)(d2)/2\displaystyle\geq\frac{((d-1)^{2}-1)\sqrt{\pi}}{d-1}\frac{\Gamma((d-1)/2+1)}{\Gamma((d-1)/2+3/2)}\left(\frac{rL}{\varepsilon_{t}}\right)^{(d-2)/2}
d(d2)πd1Γ(d/2+1/2)Γ(d/2+1)(DL(T1)2ε)(d2)/2\displaystyle\geq\frac{d(d-2)\sqrt{\pi}}{d-1}\frac{\Gamma(d/2+1/2)}{\Gamma(d/2+1)}\left(\frac{DL(T-1)}{2\varepsilon}\right)^{(d-2)/2}
>dd1π2(d24)(DL(T1)2ε)d/21.\displaystyle>\frac{d}{d-1}\sqrt{\frac{\pi}{2}(d^{2}-4)}\left(\frac{DL(T-1)}{2\varepsilon}\right)^{d/2-1}.

Here, the last inequality is due to Wendel’s bound on the ratio of two gamma functions [34]. ∎

Remark 2.

The CDDP and NDDP algorithms (Algorithms 2 and 3) and their complexity analyses depend only on the SSSO (Definitions 1 and 2). Therefore, while we only discuss SSSO implementations for finitely supported DR-MCO in Section 3.1.1, the CDDP and NDDP algorithms work for more general DR-MCO problems. Moreover, the complexity upper bounds (Theorems 1 and 2, Corollary 1) and the lower bound (Theorem 3) remain valid for them as well.

4 Numerical Experiments

In this section, we numerically test the proposed CDDP and NDDP algorithms. The first test problem is a robust multi-commodity inventory problem with customer demand uncertainty. The second test problem is a distributionally robust hydro-thermal power planning problem with stochastic energy inflows. The computation budget consists of 10 3.0-GHz CPU cores and a total of 70 GBytes of RAM. The algorithms are implemented using JuMP package ([13], v1.14) in Julia language (v1.9) with Gurobi 9.5 [20] as its underlying LP solver.

4.1 Multi-Commodity Inventory Problem

We consider an MRCO of multi-commodity inventory problem with uncertain customer demands and deterministic holding and backlogging costs, following the description in [17]. This problem can be described through the local cost functions ftf_{t} and the uncertainty sets Ξt\Xi_{t}. Let 𝒥{1,2,,J}\mathcal{J}\coloneqq\{1,2,\dots,J\} denote the set of product indices. We first describe the variables in each stage t𝒯t\in\mathcal{T}. We use xt,jlx^{l}_{t,j} to denote the inventory level, yt,jay^{a}_{t,j} (resp. xt,jbx^{b}_{t,j}) to denote the amount of express (resp. standard) order fulfilled in the current (resp. subsequent) stage, of some product j𝒥j\in\mathcal{J}. Let ξtΞt\xi_{t}\in\Xi_{t} denote the uncertainty vector controlling the customer demands in stage tt. The state variables consist of the collection xt(xt,1l,,xt,Jl,xt,1b,,xt,Jb)x_{t}\coloneqq(x^{l}_{t,1},\dots,x^{l}_{t,J},x^{b}_{t,1},\dots,x^{b}_{t,J}), while the internal variables are simply the express order amounts yt=(yt,1a,,yt,Ja)y_{t}=(y^{a}_{t,1},\dots,y^{a}_{t,J}). In stage t𝒯t\in\mathcal{T}, the local cost function ftf_{t} and the state space 𝒳t\mathcal{X}_{t} are defined by

ft(xt1,xt;ξt)minyt\displaystyle f_{t}(x_{t-1},x_{t};\xi_{t})\coloneqq\min_{y_{t}}\; CF+j𝒥(Cjayt,ja+Cjbxt,jb+CjH[xt,jl]++\displaystyle C^{F}+\sum_{j\in\mathcal{J}}\bigl{(}C^{a}_{j}y^{a}_{t,j}+C^{b}_{j}x^{b}_{t,j}+C^{H}_{j}[x^{l}_{t,j}]_{+}+ cjB[xt,jl])\displaystyle c^{B}_{j}[x^{l}_{t,j}]_{-}\bigr{)} (18)
s.t.\displaystyle\mathrm{s.t.}\; j𝒥yt,jaBc,\displaystyle\sum_{j\in\mathcal{J}}y^{a}_{t,j}\leq B^{c},
xt,jlyt,jaxt1,jb=xt,jlDt,j(ξt),\displaystyle x^{l}_{t,j}-y^{a}_{t,j}-x^{b}_{t-1,j}=x^{l}_{t,j}-D_{t,j}(\xi_{t}), j𝒥,\displaystyle\forall\,j\in\mathcal{J},
yt,ja[0,Bja],\displaystyle y^{a}_{t,j}\in[0,B^{a}_{j}], j𝒥.\displaystyle\forall\,j\in\mathcal{J}.

and

𝒳t{xt2J:0xt,jbBjb,Bjlxt,jlBjl,j𝒥}.\mathcal{X}_{t}\coloneqq\bigl{\{}x_{t}\in\mathbb{R}^{2J}\mathrel{\mathop{\ordinarycolon}}0\leq x^{b}_{t,j}\leq B^{b}_{j},\ -B^{l}_{j}\leq x^{l}_{t,j}\leq B^{l}_{j},\ \forall\,j\in\mathcal{J}\bigr{\}}. (19)

Here in the formulation, CjaC^{a}_{j} (resp. CbC^{b}) denotes the express (resp. standard) order unit cost, CjHC^{H}_{j} (resp. CjBC^{B}_{j}) the inventory holding (resp. backlogging) unit cost, BjaB^{a}_{j} (resp. BjbB^{b}_{j}) the bound on the express (resp. standard) order, and BjlB^{l}_{j} the inventory level bound, for the product j𝒥j\in\mathcal{J}, respectively. The first constraint in (18) is a cumulative bound BcB^{c} on the express orders, the second constraint characterizes the change in the inventory level, and the last of (18) and the state space (19) represent bounds on the decision variables with respect to each product. We also put CF>0C^{F}>0 as a fixed cost to ensure the cost function is strictly positive (cf. Corollary 1). We use [x]+max{x,0}[x]_{+}\coloneqq\max\{x,0\} and [x]min{0,x}[x]_{-}\coloneqq-\min\{0,x\} to denote the positive and negative part of a real number xx. The initial state x0x_{0} is given by x0,jl=x0,jb=0x^{l}_{0,j}=x^{b}_{0,j}=0 for all j𝒥j\in\mathcal{J}. The uncertainty set Ξt\Xi_{t} is an EE-dimensional box [1,1]E[-1,1]^{E}, and the customer demand is predicted by the following factor model:

Dt,j(ξt)={2+sin((t1)π5)+Φt,j𝖳ξt,jJ/2,2+cos((t1)π5)+Φt,k𝖳ξt,j>J/2,D_{t,j}(\xi_{t})=\begin{cases}&2+\sin\left(\dfrac{(t-1)\pi}{5}\right)+\Phi_{t,j}^{\mathsf{T}}\xi_{t},\quad j\leq J/2,\\ &2+\cos\left(\dfrac{(t-1)\pi}{5}\right)+\Phi_{t,k}^{\mathsf{T}}\xi_{t},\quad j>J/2,\end{cases} (20)

where Φt,j\Phi_{t,j} is an EE-dimensional vector where each entry is chosen uniformly at random from [1/E,1/E][-1/E,1/E]. Thus the value Φt,j𝖳ξt[1,1]\Phi_{t,j}^{\mathsf{T}}\xi_{t}\in[-1,1] and Dt,j(ξt)0D_{t,j}(\xi_{t})\geq 0 for all t𝒯t\in\mathcal{T} and j𝒥j\in\mathcal{J}.

For the following numerical test, we set the number of products J=5J=5, the number of uncertainty factors E=4E=4, Bja=Bjb=Bjl=10B_{j}^{a}=B_{j}^{b}=B_{j}^{l}=10, Cjb=1C_{j}^{b}=1 for all j𝒥j\in\mathcal{J}, CF=1C^{F}=1, and Bc=0.3J=1.5B^{c}=0.3J=1.5. The costs are generated uniformly at random within Cja[1,3]C_{j}^{a}\in[1,3], CjH,cjB[0,2]C_{j}^{H},c_{j}^{B}\in[0,2] for all j𝒥j\in\mathcal{J}. Due to lack of RCR of the problem (18), we first test both CDDP (Algorithm 2) and NDDP (Algorithm 3) with dual bounds Mt=1.0×102M_{t}=1.0\times 10^{2} for all t𝒯t\in\mathcal{T}. The optimality gap set to be relative α=1%\alpha=1\% and approximation gaps set dynamically by δtit=LowerBoundα(Tt)/(T1)\delta_{t}^{i_{t}}=\text{{LowerBound}}\cdot\alpha(T-t)/(T-1) for t𝒯t\in\mathcal{T}. Over 3 independently generated test cases, we plot the growth of computational time and number of oracle evaluations with respect to the number of stages in Fig. 2, which shows that NDDP performs increasingly better than CDDP as the number of stages increases. In particular, NDDP can save up to 47.8% computation time and up to 44.2% subproblem oracle evaluations comparing to CDDP for these 40-stage problems.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 2: Growth of computation expenses for CDDP and NDDP

As a comparison, we implement the NDDP algorithm without dual bounds. Instead, it generates linear feasibility cuts for approximation of the feasible sets (see definition of feasibility cuts in, e.g., [19]). We limit the computational budget to 2000T2000\cdot T subproblem oracle calls. For 5 independently generated test cases, we have obtained the following results (Table 1).

Table 1: Comparison of NDDP Performances with or without dual bounds
With Dual Bounds Without Dual Bounds
Stage LB UB Time (s) #Eval LB UB Time (s) #Eval
10 154.70 156.02 30.61 1554 154.70 155.30 33.34 1497
155.98 157.06 30.80 1523 155.98 157.37 14.16 1371
128.33 128.89 37.33 1771 128.31 129.48 19.46 1803
137.06 137.16 26.69 1355 137.06 137.06 14.75 1397
120.12 120.82 70.06 2345 118.11 inf 318.05 20003
15 232.46 233.85 63.84 2776 232.13 inf 440.07 30037
233.38 235.56 74.51 3104 233.58 235.23 33.90 2891
202.31 203.40 75.25 3154 202.48 203.70 40.89 3522
208.56 210.14 61.55 2707 208.50 209.28 33.85 2900
195.16 196.78 129.38 4589 191.30 inf 666.52 30024
20 291.78 293.94 196.57 6637 286.88 inf 615.68 40054
292.87 295.77 148.18 5561 292.87 293.14 67.45 4955
256.15 258.03 157.78 5810 256.19 258.63 74.12 5762
261.81 263.80 113.54 4569 261.81 261.82 61.22 4821
249.85 252.26 277.24 8296 235.01 inf 665.76 40002
25 369.62 372.54 257.33 8984 369.66 371.10 102.86 7334
370.08 373.74 278.66 9225 370.47 371.39 101.37 7153
330.18 333.38 213.01 7602 330.37 331.63 112.88 8133
333.20 336.05 166.87 6346 333.34 333.74 96.10 6946
324.83 327.99 479.58 12785 323.43 inf 1167.04 50006
30 428.81 432.41 545.98 15212 429.11 429.69 165.55 10217
429.34 432.08 414.91 12639 429.77 430.27 141.67 9549
383.74 387.01 372.00 11820 384.22 387.43 173.06 11368
386.54 390.41 277.57 9430 386.56 386.58 151.97 9824
379.46 382.84 649.69 16730 379.27 inf 1230.98 60016

In Table 1, the inf indicates values of infinity or numerically infinity values (i.e., values greater than 10810^{8}) upon termination. As we see from the table, the NDDP algorithm together with feasibility cuts fails to solve 2 out of 5 cases even when there are only 15 stages, showing the instability of the performance of feasibility cuts. In contrast, the algorithm with the dual-bounding technique solves all of the cases within a reasonable computation time and number of subproblem oracle evaluations, without any optimality gap on those cases that both formulations are able to solve. This demonstrates the ability of the our proposed DDP algorithms handling problems without RCR. It is worth mentioning that for cases where the NDDP algorithm converges without dual bounds, the computation time used is usually smaller than the regularized problem, which can be explained by better numerical conditions of feasibility cuts and their effect on reducing the effective volumes of the state space.

4.2 Hydro-Thermal Power Planning Problem

We next consider the Brazilian interconnected power system described in [12]. By assuming the stagewise independence and taking sample average approximation in the underlying stochastic energy inflow, we formulate the problem below as a finitely supported DR-MCO (1). Let 𝒥={1,,J}\mathcal{J}=\{1,\dots,J\} denote the indices of regions in the system (J=4J=4 in our data), and =j𝒥j\mathcal{L}=\cup_{j\in\mathcal{J}}\mathcal{L}_{j} the indices of thermal power plants, where each of the disjoint subsets j\mathcal{L}_{j} is associated with the region j𝒥j\in\mathcal{J}. We first describe the decision variables in each stage t𝒯t\in\mathcal{T}. Let Ξt={ξ^t,1,,ξ^t,nt}\Xi_{t}=\{\hat{\xi}_{t,1},\dots,\hat{\xi}_{t,n_{t}}\} denote the finite uncertainty set of energy inflow in stage t𝒯t\in\mathcal{T}^{\prime}. We use xt,jlx^{l}_{t,j} to denote the stored energy level, yt,jhy^{h}_{t,j} to denote the hydro power generation, and yt,jsy^{s}_{t,j} to denote the energy spillage, of some region j𝒥j\in\mathcal{J}; and yt,lgy^{g}_{t,l} to denote the thermal power generation for some thermal power plant ll\in\mathcal{L}. For two different regions jj𝒥j\neq j^{\prime}\in\mathcal{J}, we use yt,j,jey^{e}_{t,j,j^{\prime}} to denote the energy exchange from region jj to region jj^{\prime}, and yt,j,jay^{a}_{t,j,j^{\prime}} to denote the deficit account for region jj in region jj^{\prime}. Let xt(xt,1l,,xt,Jl)x_{t}\coloneqq(x^{l}_{t,1},\dots,x^{l}_{t,J}) be the state variables and yty_{t} denote the internal variables consisting of yt,jh,yt,js,yt,lg,yt,j,jey^{h}_{t,j},y^{s}_{t,j},y^{g}_{t,l},y^{e}_{t,j,j^{\prime}}, and yt,j,jay^{a}_{t,j,j^{\prime}} for any j,j𝒥j,j^{\prime}\in\mathcal{J} and ll\in\mathcal{L}. Then in stage t𝒯t\in\mathcal{T}, the state space is simply 𝒳tj=1J[0,Bjl]\mathcal{X}_{t}\coloneqq\prod_{j=1}^{J}[0,B_{j}^{l}], and the local cost function ftf_{t} can be defined as

ft(xt1\displaystyle f_{t}(x_{t-1} ,xt;ξt)\displaystyle,x_{t};\xi_{t})\coloneqq (21)
minyt\displaystyle\min_{y_{t}}\quad j𝒥(Csyt,js+ljClgyt,lg+j𝒥(Cj,jeyt,j,je+Cj,jayt,j,ja))\displaystyle\sum_{j\in\mathcal{J}}\bigg{(}C^{s}y^{s}_{t,j}+\sum_{l\in\mathcal{L}_{j}}C^{g}_{l}y^{g}_{t,l}+\sum_{j^{\prime}\in\mathcal{J}}\big{(}C^{e}_{j,j^{\prime}}y^{e}_{t,j,j^{\prime}}+C^{a}_{j,j^{\prime}}y^{a}_{t,j,j^{\prime}}\big{)}\bigg{)}
s.t.\displaystyle\mathrm{s.t.}\quad xt,jl+yt,jh+yt,js=xt1,jl+ξt,j,\displaystyle x^{l}_{t,j}+y^{h}_{t,j}+y^{s}_{t,j}=x^{l}_{t-1,j}+\xi_{t,j}, j𝒥,\displaystyle\forall\,j\in\mathcal{J},
yt,jh+ljyt,lg+j𝒥(yt,j,jayt,j,je+yt,j,je)=Dt,j,\displaystyle y^{h}_{t,j}+\sum_{l\in\mathcal{L}_{j}}y^{g}_{t,l}+\sum_{j^{\prime}\in\mathcal{J}}(y^{a}_{t,j,j^{\prime}}-y^{e}_{t,j,j^{\prime}}+y^{e}_{t,j^{\prime},j})=D_{t,j}, j𝒥,\displaystyle\forall\,j\in\mathcal{J},
yt,jh[0,Bjh],\displaystyle y^{h}_{t,j}\in[0,B^{h}_{j}], j𝒥,\displaystyle\forall\,j\in\mathcal{J},
yt,lg[Blg,,Blg,+],\displaystyle y^{g}_{t,l}\in[B^{g,-}_{l},B^{g,+}_{l}], l,\displaystyle\forall\,l\in\mathcal{L},
yt,j,ja[0,Bj,ja],yt,j,je[0,Bj,je],\displaystyle y^{a}_{t,j,j^{\prime}}\in[0,B^{a}_{j,j^{\prime}}],\ y^{e}_{t,j,j^{\prime}}\in[0,B^{e}_{j,j^{\prime}}], j,j𝒥.\displaystyle\forall\,j,j^{\prime}\in\mathcal{J}.

Here in the formulation, CsC^{s} denotes the unit penalty on energy spillage, ClgC^{g}_{l} the unit cost of thermal power generation of plant ll, Cj,jeC^{e}_{j,j^{\prime}} the unit cost of power exchange from region jj to region jj^{\prime}, Cj,jaC^{a}_{j,j^{\prime}} the unit cost on the energy deficit account for region jj in region jj^{\prime}, Dt,jD_{t,j} the deterministic power demand in stage tt and region jj, BjlB^{l}_{j} the bound on the storage level in region jj, BjhB^{h}_{j} the bound on hydro power generation in region jj, Blg,,Blg,+B^{g,-}_{l},B^{g,+}_{l} the lower and upper bounds of thermal power generation in plant ll, Bj,jaB^{a}_{j,j^{\prime}} the bound on the deficit account for region jj in region jj^{\prime}, and Bj,jeB^{e}_{j,j^{\prime}} the bound on the energy exchange from region jj to region jj^{\prime}. The first constraint in (21) characterizes the change of energy storage levels in each region jj, the second constraint imposes the power generation-demand balance for each region jj, and the rest are bounds on the decision variables. The initial state x0x_{0} and ξ1\xi_{1} are given by data.

The energy inflow outcomes are sampled from multivariate lognormal distributions that are interstage independent. Then the distributional ambiguity set is constructed using Wasserstein metric to reduce the effect of overtraining with the sampled outcome, according to [14]. To be precise, suppose p^t=(1/nt,,1/nt)Δnt\hat{p}_{t}=(1/n_{t},\dots,1/n_{t})\in\Delta^{n_{t}} is an empirical distribution of outcomes Ξt={ξ^t,1,,ξ^t,nt}\Xi_{t}=\{\hat{\xi}_{t,1},\dots,\hat{\xi}_{t,n_{t}}\}. Then, the distributional ambiguity set 𝒫t\mathcal{P}_{t} is described by

𝒫t{ptΔnt:Wt(pt,p^t)ρt},\mathcal{P}_{t}\coloneqq\left\{p_{t}\in\Delta^{n_{t}}\mathrel{\mathop{\ordinarycolon}}W_{t}(p_{t},\hat{p}_{t})\leq\rho_{t}\right\}, (22)

for some radius ρt0\rho_{t}\geq 0, where the Wasserstein metric WtW_{t} for finitely supported distributions is defined by

Wt(pt,p^t)minπk,k0\displaystyle W_{t}(p_{t},\hat{p}_{t})\coloneqq\min_{\pi_{k,k^{\prime}}\geq 0}\quad k,k=1ntξ^t,kξ^t,kπk,k\displaystyle\sum_{k,k^{\prime}=1}^{n_{t}}\lVert\hat{\xi}_{t,k}-\hat{\xi}_{t,k^{\prime}}\rVert\pi_{k,k^{\prime}} (23)
s.t.\displaystyle\mathrm{s.t.}\quad k=1ntπk,k=pt,k,\displaystyle\sum_{k=1}^{n_{t}}\pi_{k,k^{\prime}}=p_{t,k}, k=1,,nt,\displaystyle\forall\,k^{\prime}=1,\dots,n_{t},
k=1ntπk,k=p^t,k=1nt,\displaystyle\sum_{k^{\prime}=1}^{n_{t}}\pi_{k,k^{\prime}}=\hat{p}_{t,k}=\frac{1}{n_{t}},\quad k=1,,nt.\displaystyle\forall\,k=1,\dots,n_{t}.

Note when the radius ρt=0\rho_{t}=0, the ambiguity set 𝒫t\mathcal{P}_{t} becomes a singleton. In our numerical tests, we choose the radius to be relative to the total distances, i.e., ρt=βk,k=1ntξ^t,kξ^t,k\rho_{t}=\beta\cdot\sum_{k,k^{\prime}=1}^{n_{t}}\lVert\hat{\xi}_{t,k}-\hat{\xi}_{t,k^{\prime}}\rVert for some β0\beta\geq 0. At the same time, we use uniform dual bounds for the tests, i.e., Mt=M>0M_{t}=M>0 for all t𝒯t\in\mathcal{T}^{\prime}. When the relative optimality gap α\alpha is smaller than the threshold 5%5\%, we check whether all the active cuts in the recent iterations are strictly smaller than the regularization factor. If they are, then the algorithm is terminated, and otherwise the dual bound MM is increased by a factor of 103.1623\sqrt{10}\approx 3.1623 with all the over-approximations reset to 𝒬¯ti(x)+\underline{\mathcal{Q}}_{t}^{i}(x)\leftarrow+\infty, t𝒯t\in\mathcal{T}. Five scenarios are sampled independently in each stage for the nominal problem nt=5n_{t}=5 before the distributional robust counterpart is constructed by (22). For the 24-stage problem that we consider, the samples already give a total 5245.9×10165^{24}\approx 5.9\times 10^{16} scenario paths, which is already impossible to solve via an extensive formulation. We have then obtained the following results (Table 2).

Table 2: CDDP on hydro-thermal power planning with different radii
β\beta log10(M)\log_{10}(M) LB (107\cdot 10^{7}) UB (107\cdot 10^{7}) Med. Time (s) Med. #Eval
0.00 2 4.60 4.84 784.45 33075
3 4.59 4.83 301.06 15345
4 4.58 4.82 374.88 17505
5 4.58 4.82 378.80 17595
6 4.58 4.82 386.86 17910
0.02 2 4.84 5.09 439.28 22095
3 4.84 5.09 263.26 13725
4 4.82 5.07 365.72 17100
5 4.82 5.07 358.08 16830
6 4.82 5.07 405.42 18090
0.04 2 5.10 5.36 921.99 36720
3 5.06 5.32 237.34 13095
4 5.06 5.33 327.38 15975
5 5.06 5.33 325.62 15930
6 5.06 5.33 350.65 16650
0.06 2 5.34 5.62 606.61 27810
3 5.32 5.60 284.40 14445
4 5.31 5.59 277.21 14310
5 5.30 5.58 271.26 14130
6 5.31 5.59 285.30 14445
0.08 2 5.57 5.86 519.87 25380
3 5.55 5.84 204.36 11790
4 5.54 5.83 290.96 14715
5 5.54 5.83 223.92 12510
6 5.54 5.83 257.26 13500
0.10 2 5.81 6.12 446.94 22140
3 5.78 6.08 194.85 10980
4 5.78 6.09 262.42 12915
5 5.78 6.09 215.10 11790
6 5.79 6.09 237.22 12420

In Table 2, the lower bound (LB), the upper bound (UB) at termination, the computation time (Med. Time) and the number of subproblem oracles (Med. #Eval) shown are the median of the five test cases. The logarithmic regularization factors log10(M)\log_{10}(M) listed in the table correspond to the initial regularization factors. We see that for different choices of the relative radii β\beta, the median computation time and number of subproblem oracle evaluations are usually smaller when log10(M)=3\log_{10}(M)=3, without compromising the quality of upper and lower bounds. This can be explained by the better numerical conditions for the smaller regularization factors (the cuts have smaller Lipschitz constants), leading to shorter subproblem oracle evaluations times (cf. Algorithm 1). We thus conclude that the regularization technique could lead to smaller number of subproblem oracle evaluations, as well as shorter computation time for a given DR-MCO problem.

5 Concluding Remarks

In this work, we extend CDDP algorithms to a broad class of DR-MCO problems, and propose a novel NDDP algorithm that decides its stage exploration strategy dynamically based on the cost-to-go function approximation qualities. We enforce bounds on the DDP recursion dual variables to effectively control the growth of Lipschitz constants in the approximation. We provide a comprehensive complexity analysis of both CDDP and NDDP algorithms, proving both upper complexity bounds and a matching lower bound, which reveal, in a precise way, the dependence of the complexity of the DDP-type algorithms on the number of stages, the dimension of the decision space, and various regularity characteristics of DR-MCO. In particular, our NDDP algorithm scales linearly with TT, assuming that the target optimality gap scales with TT. This is the first complexity analysis of DDP-type algorithms in such a general setting, and we believe it provides key insights for further developing efficient computational tools for the very many applications of sequential decision making under uncertainty. We also provide numerical examples to show the capability of the DDP-type algorithms method to solve problems without RCR, and reduction in computation time and number of subproblem oracle evaluations, due to the dual-bounding technique.

References

  • Ahmed et al. [2019] S Ahmed, F G Cabral, and B F Paulo da Costa. Stochastic lipschitz dynamic programming, 2019.
  • Anderson and Philpott [2019] E J Anderson and A B Philpott. Improving sample average approximation using distributional robustness. Optimization Online, 2019.
  • Baucke et al. [2017] R Baucke, A Downward, and G Zakeri. A deterministic algorithm for solving multistage stochastic programming problems. Optimization Online, 2017.
  • Baucke et al. [2018] R Baucke, A Downward, and G Zakeri. A deterministic algorithm for solving stochastic minimax dynamic programmes. Optimization Online, 2018.
  • Bauschke et al. [2011] Heinz H Bauschke, Patrick L Combettes, et al. Convex analysis and monotone operator theory in Hilbert spaces, volume 408. Springer, 2011.
  • Ben-Tal et al. [2009] A Ben-Tal, L El Ghaoui, and A Nemirovski. Robust optimization, volume 28. Princeton University Press, 2009.
  • Bertsimas and Georghiou [2015] D Bertsimas and A Georghiou. Design of near optimal decision rules in multistage adaptive mixed-integer optimization. Operations Research, 63(3):610–627, 2015.
  • Birge [1985] John R. Birge. Decomposition and Partitioning Methods for Multistage Stochastic Linear Programs. Operations Research, 33(5):989–1007, October 1985. ISSN 0030-364X, 1526-5463. doi: 10.1287/opre.33.5.989.
  • Boyd et al. [2004] Stephen Boyd, Stephen P Boyd, and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
  • Chen and Powell [1999] Z L Chen and W B Powell. Convergent Cutting-Plane and Partial-Sampling Algorithm for Multistage Stochastic Linear Programs with Recourse. Journal of Optimization Theory and Applications, 102(3):497–524, September 1999. ISSN 0022-3239, 1573-2878. doi: 10.1023/A:1022641805263.
  • de Matos et al. [2015] Vitor L. de Matos, Andy B. Philpott, and Erlon C. Finardi. Improving the performance of Stochastic Dual Dynamic Programming. Journal of Computational and Applied Mathematics, 290:196–208, December 2015. ISSN 03770427. doi: 10.1016/j.cam.2015.04.048.
  • Ding et al. [2019] L Ding, S Ahmed, and A Shapiro. A python package for multi-stage stochastic programming. Optimization Online, pages 1–41, 2019.
  • Dunning et al. [2017] I Dunning, J Huchette, and M Lubin. Jump: A modeling language for mathematical optimization. SIAM Review, 59(2):295–320, 2017. doi: 10.1137/15M1020575.
  • Duque and Morton [2019] D Duque and D P Morton. Distributionally robust stochastic dual dynamic programming. Optimization Online, 2019.
  • Dyer and Stougie [2006] Martin Dyer and Leen Stougie. Computational complexity of stochastic programming problems. Mathematical Programming, 106(3):423–432, May 2006. ISSN 0025-5610, 1436-4646. doi: 10.1007/s10107-005-0597-0.
  • Flach et al. [2010] B.C. Flach, L.A. Barroso, and M.V.F. Pereira. Long-term optimal allocation of hydro generation for a price-maker company in a competitive market: Latest developments and a stochastic dual dynamic programming approach. IET Generation, Transmission & Distribution, 4(2):299, 2010. ISSN 17518687. doi: 10.1049/iet-gtd.2009.0107.
  • Georghiou et al. [2019] A Georghiou, A Tsoukalas, and W Wiesemann. Robust dual dynamic programming. Operations Research, 67(3):813–830, 2019.
  • Girardeau et al. [2015] P. Girardeau, V. Leclere, and A. B. Philpott. On the Convergence of Decomposition Methods for Multistage Stochastic Convex Programs. Mathematics of Operations Research, 40(1):130–145, February 2015. ISSN 0364-765X, 1526-5471. doi: 10.1287/moor.2014.0664.
  • Grothey et al. [1999] A Grothey, S Leyffer, and KIM McKinnon. A note on feasibility in benders decomposition. Numerical Analysis Report NA/188, Dundee University, 1999.
  • Gurobi Optimization, LLC [2023] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023. URL https://www.gurobi.com.
  • Hadjiyiannis et al. [2011] M J Hadjiyiannis, P J Goulart, and D Kuhn. A scenario approach for estimating the suboptimality of linear decision rules in two-stage robust optimization. In 2011 50th IEEE Conference on Decision and Control and European Control Conference, pages 7386–7391. IEEE, 2011.
  • Huang et al. [2017] J Huang, K Zhou, and Y Guan. A Study of Distributionally Robust Multistage Stochastic Optimization. arXiv:1708.07930 [math], August 2017.
  • Komiya [1988] Hidetoshi Komiya. Elementary proof for sion’s minimax theorem. Kodai mathematical journal, 11(1):5–7, 1988.
  • Kuhn et al. [2011] D Kuhn, W Wiesemann, and A Georghiou. Primal and dual linear decision rules in stochastic and robust optimization. Mathematical Programming, 130(1):177–209, 2011.
  • Lan [2019] Guanghui Lan. Complexity of Stochastic Dual Dynamic Programming. arXiv:1912.07702, December 2019.
  • Linowsky and Philpott [2005] K. Linowsky and A. B. Philpott. On the Convergence of Sampling-Based Decomposition Algorithms for Multistage Stochastic Programs. Journal of Optimization Theory and Applications, 125(2):349–366, May 2005. ISSN 0022-3239, 1573-2878. doi: 10.1007/s10957-004-1842-z.
  • Liu [2019] R P Liu. On feasibility of sample average approximation solutions to stochastic programming. arXiv preprint arXiv:1904.00137, 2019.
  • Pereira and Pinto [1985] M. V. F. Pereira and L. M. V. G. Pinto. Stochastic Optimization of a Multireservoir Hydroelectric System: A Decomposition Approach. Water Resources Research, 21(6):779–792, June 1985. ISSN 00431397. doi: 10.1029/WR021i006p00779.
  • Pereira and Pinto [1991] M. V. F. Pereira and L. M. V. G. Pinto. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52(1-3):359–375, May 1991. ISSN 0025-5610, 1436-4646. doi: 10.1007/BF01582895.
  • Philpott et al. [2019] A Philpott, F Wahid, and F Bonnans. MIDAS: a mixed integer dynamic approximation scheme. Mathematical Programming, pages 1–32, 2019.
  • Philpott et al. [2018] A. B. Philpott, V. L. de Matos, and L. Kapelevich. Distributionally robust SDDP. Computational Management Science, 15(3-4):431–454, October 2018. ISSN 1619-697X, 1619-6988. doi: 10.1007/s10287-018-0314-0.
  • Philpott and Guan [2008] A.B. Philpott and Z. Guan. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters, 36(4):450–455, July 2008. ISSN 01676377. doi: 10.1016/j.orl.2008.01.013.
  • Philpott et al. [2013] Andy Philpott, Vitor de Matos, and Erlon Finardi. On Solving Multistage Stochastic Programs with Coherent Risk Measures. Operations Research, 61(4):957–970, August 2013. ISSN 0030-364X, 1526-5463. doi: 10.1287/opre.2013.1175.
  • Qi [2010] Feng Qi. Bounds for the ratio of two gamma functions. Journal of Inequalities and Applications, 2010:1–84, 2010.
  • Rudin [1987] Walter Rudin. Real and complex analysis. Tata McGraw-hill education, third edition, 1987.
  • Shapiro [2006] A Shapiro. On complexity of multistage stochastic programs. OR Letters, 34(1):1–8, January 2006. ISSN 01676377. doi: 10.1016/j.orl.2005.02.003.
  • Shapiro [2011] A Shapiro. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research, 209(1):63–72, February 2011. ISSN 03772217. doi: 10.1016/j.ejor.2010.08.007.
  • Shapiro and Ding [2019] A Shapiro and L Ding. Stationary multistage programs. Optimization Online, 2019.
  • Shapiro et al. [2009] A Shapiro, D Dentcheva, and A Ruszczyński. Lectures on stochastic programming. SIAM, 2009.
  • Shapiro et al. [2013] A Shapiro, W Tekaya, J P da Costa, and M P Soares. Risk neutral and risk averse Stochastic Dual Dynamic Programming method. European Journal of Operational Research, 224(2):375–391, January 2013. ISSN 03772217. doi: 10.1016/j.ejor.2012.08.022.
  • Shapiro et al. [2012] Alexander Shapiro, Wajdi Tekaya, Joari P da Costa, and Murilo P Soares. Final report for technical cooperation between georgia institute of technology and ons–operador nacional do sistema elétrico. Georgia Tech ISyE Report, 2012.
  • Zhang and Sun [2022] Shixuan Zhang and Xu Andy Sun. Stochastic dual dynamic programming for multistage stochastic mixed-integer nonlinear optimization. Mathematical Programming, 196(1):935–985, 2022.
  • Zou et al. [2019a] J Zou, S Ahmed, and X A Sun. Stochastic dual dynamic integer programming. Mathematical Programming, 175(1-2):461–502, May 2019a. ISSN 0025-5610, 1436-4646. doi: 10.1007/s10107-018-1249-5.
  • Zou et al. [2019b] Jikai Zou, Shabbir Ahmed, and Xu Andy Sun. Multistage Stochastic Unit Commitment Using Stochastic Dual Dynamic Integer Programming. IEEE Transactions on Power Systems, 34(3):1814–1823, May 2019b. ISSN 0885-8950, 1558-0679. doi: 10.1109/TPWRS.2018.2880996.