
Optimizing distortion riskmetrics with distributional uncertainty

Silvana M. Pesenti Department of Statistical Sciences, University of Toronto, Canada. ✉ silvana.pesenti@utoronto.ca    Qiuqi Wang Department of Statistics and Actuarial Science, University of Waterloo, Canada. ✉ q428wang@uwaterloo.ca    Ruodu Wang Department of Statistics and Actuarial Science, University of Waterloo, Canada. ✉ wang@uwaterloo.ca
Abstract

Optimization of distortion riskmetrics with distributional uncertainty has wide applications in finance and operations research. Distortion riskmetrics include many commonly applied risk measures and deviation measures, which are not necessarily monotone or convex. One of our central findings is a unifying result that allows us to convert an optimization of a non-convex distortion riskmetric with distributional uncertainty to a convex one, leading to great tractability. A sufficient condition for the unifying equivalence result is the novel notion of closedness under concentration, a variation of which is also shown to be necessary for the equivalence. Our results include many special cases that are well studied in the optimization literature, including but not limited to optimizing probabilities, Value-at-Risk, Expected Shortfall, Yaari’s dual utility, and differences between distortion risk measures, under various forms of distributional uncertainty. We illustrate our theoretical results via applications to portfolio optimization, optimization under moment constraints, and preference robust optimization.

Keywords: risk measures; deviation measures; distributionally robust optimization; convexification; conditional expectation

1 Introduction

Riskmetrics, such as measures of risk and variability, are common tools to represent preferences, model decisions under risk, and quantify different types of risks. To fix terminology, we refer to a riskmetric as any mapping from a set of random variables to the real line, and to a risk measure as a riskmetric that is monotone in the sense of Artzner et al. (1999).

In this paper, we focus on distortion riskmetrics, a large class of commonly used measures of risk and variability; see Wang et al. (2020a) for the terminology “distortion riskmetrics”. Distortion riskmetrics include L-functionals (Huber and Ronchetti, 2009) in statistics, Yaari’s dual utilities (Yaari, 1987) in decision theory, distorted premium principles (Wang et al., 1997) in insurance, and spectral risk measures (Acerbi, 2002) in finance; see Wang et al. (2020a) for further examples. After a normalization, increasing distortion riskmetrics are distortion risk measures, which include, in particular, the two most important risk measures used in current banking and insurance regulation, the Value-at-Risk (VaR) and the Expected Shortfall (ES). Moreover, convex distortion riskmetrics are the building blocks (via taking a supremum) for all convex risk functionals (Liu et al., 2020), including classic risk measures (Artzner et al., 1999; Föllmer and Schied, 2002) and deviation measures (Rockafellar et al., 2006).

When riskmetrics are evaluated on distributions that are subject to uncertainty, decisions should be taken with respect to the worst (or best) possible values a riskmetric attains over a set of alternative distributions, giving rise to the active subfield of distributionally robust optimization. The set of alternative distributions, the uncertainty set, may be characterized by moment constraints (e.g., Popescu (2007)), parameter uncertainty (e.g., Delage and Ye (2010)), probability constraints (e.g., Wiesemann et al. (2014)), and distributional distances (e.g., Blanchet and Murthy (2019)), amongst others. Popular distortion risk measures such as VaR and ES are studied extensively in this context; see e.g., Natarajan et al. (2008) and Zhu and Fukushima (2009).

Optimization of convex distortion risk measures, i.e., distortion riskmetrics with an increasing and concave distortion function, is relatively well understood under distributional uncertainty; see Cornilly et al. (2018), Li (2018), and Liu et al. (2020) for some recent work. Nevertheless, many distortion riskmetrics are not convex or monotone. For example, in the Cumulative Prospect Theory of Tversky and Kahneman (1992), the distortion function is typically assumed to be inverse-S-shaped; in financial risk management, the popular risk measure $\mathrm{VaR}$ has a non-concave distortion function, and the inter-quantile difference (Wang et al., 2020b) has a distortion function that is neither concave nor monotone. Another example is the difference between two distortion risk measures, which is clearly not increasing or convex in general. Optimizing non-convex distortion riskmetrics under distributional uncertainty is difficult and results are available only for special cases; see Li et al. (2018), Cai et al. (2018), Zhu and Shao (2018), Wang et al. (2019), and Bernard et al. (2020), all with an increasing distortion function.

There is, however, a notable common feature in the above mentioned literature when a non-convex distortion riskmetric is involved. For numerous special cases, one often obtains an equivalence between the optimization problem with a non-convex distortion riskmetric and that with a convex one. Inspired by this observation, the aim of this paper is to address:

What conditions provide equivalence between a non-convex riskmetric and a convex one in the setting of distributional uncertainty?

An answer to this question is still missing in the literature. In this sense, we offer a novel perspective on distributionally robust optimization problems by converting non-convex optimization problems to their convex counterparts. Transforming a non-convex optimization problem into a convex one, through approximation or via a direct equivalence, has been studied by Zymler et al. (2013) and Cai et al. (2020). Both contributions, however, consider uncertainty sets described by some special forms of constraints. A unifying framework applicable to numerous uncertainty sets and the entire class of distortion riskmetrics is, however, missing, and is at the core of this paper.

The main novelty of our results is three-fold. First, we obtain a unifying result (Theorem 1) that allows us, under distributional uncertainty, to convert an optimization problem with a non-convex distortion riskmetric to an optimization problem with a convex one. The result covers, to the authors’ best knowledge, all known equivalences between optimization problems of non-convex and convex riskmetrics with distributional uncertainty. The proof requires techniques beyond the ones used in the existing literature, as we do not make assumptions such as monotonicity, positivity, and continuity. Our framework can also be easily applied to settings with an atomic probability space or with uncertainty sets of multi-dimensional distributions. Second, we introduce the concept of closedness under concentration as a sufficient condition to establish the equivalence, and it is also a necessary condition on the set of optimizers given that the equivalence holds (Theorem 2). We show how the properties of closedness under concentration within a collection of intervals $\mathcal{I}$ and closedness under concentration for all intervals can easily be verified, and we provide numerous examples. Third, the classes of distortion riskmetrics and uncertainty formulations considered in this paper include all special cases studied in the literature; examples are presented in Sections 3-4. In particular, our class of riskmetrics includes all practically used risk measures and variability measures (some via taking a supremum), dual utilities with inverse-S-shaped distortion functions of Tversky and Kahneman (1992), and differences between two dual utilities or distortion risk measures. Our uncertainty formulations include both supremum and infimum problems (thus providing a universal treatment of worst-case and best-case risk values; calculating best-case risk values allows us to solve economic decision making problems where optimal distributions are chosen to minimize the risk), moment constraints, convex order/risk measure constraints, marginal constraints in risk aggregation with dependence uncertainty (e.g., Embrechts et al. (2015)), preference robust optimization (e.g., Armbruster and Delage (2015) and Guo and Xu (2021)), and some one-dimensional and multi-dimensional uncertainty sets induced by Wasserstein metrics.

This great generality distinguishes our work from the large literature on distributionally robust optimization cited above. Our work is of an analytical and probabilistic nature, and we focus on theoretical equivalence results, which are also illustrated via numerical implementations. The target problems are formulated in Section 2. Section 3 is devoted to our main contribution of the equivalence of non-convex and convex optimization problems with distributional uncertainty. We illustrate by many examples the concepts of closedness under conditional expectation and closedness under concentration, and distinguish them in several practical settings. Section 4 demonstrates the equivalence results in multi-dimensional settings. In addition to a general multi-dimensional model with a concave loss function, we solve a robust risk aggregation problem with ambiguity on both the marginal distributions and the dependence structure. In Section 5, our results are used to solve optimization problems with uncertainty sets defined via moment constraints. In particular, we generalize a few well-known results in the literature on optimization and worst-case values of risk measures. Sections 6 and 7 contain numerical illustrations of optimizing differences between two distortion riskmetrics, portfolio optimization, and preference robust optimization. Some concluding remarks are given in Section 8. Proofs of all results are relegated to Appendix B.

2 Distortion riskmetrics with distributional uncertainty

2.1 Problem formulation

Throughout, we work with an atomless probability space $(\Omega,\mathscr{F},\mathbb{P})$. For $n\in\mathbb{N}$, $A$ represents a set of actions, $\rho$ is an objective functional, $f:A\times\mathbb{R}^{n}\to\mathbb{R}$ is a loss function, and $\mathbf{X}$ is an $n$-dimensional random vector with distributional uncertainty. Many problems in distributionally robust optimization have the form

$$\min_{\mathbf{a}\in A}\,\sup_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\;\rho(f(\mathbf{a},\mathbf{X})), \tag{1}$$

where $F_{\mathbf{X}}$ denotes the distribution of $\mathbf{X}$ and $\widetilde{\mathcal{M}}$ is a set of plausible distributions for $\mathbf{X}$. We will first focus on the inner problem

$$\sup_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\;\rho(f(\mathbf{a},\mathbf{X})), \tag{2}$$

which we may rewrite as

$$\sup_{F_{Y}\in\mathcal{M}}\rho(Y), \tag{3}$$

where $F_{Y}$ denotes the distribution of $Y$ and $\mathcal{M}$ is a set of distributions on $\mathbb{R}$. We suppress the dependence on $\mathbf{a}$ as it remains constant in the inner problem (2). The supremum in (3) is typically referred to as the worst-case risk measure in the literature if $\rho$ is monotone (a risk measure $\rho:\mathcal{L}^{p}\to\mathbb{R}$ is monotone if $\rho(X)\leqslant\rho(Y)$ for all $X,Y\in\mathcal{L}^{p}$ with $X\leqslant Y$). The problem (3) can also represent an optimal decision problem, where $\rho$ is an objective to maximize, and a decision maker chooses an optimal distribution from the set $\mathcal{M}$, which is then interpreted as an action set instead of an uncertainty set (i.e., there is no uncertainty in this problem). Since the two problems share the same mathematical formulation (3), we will navigate through our results mainly with the first interpretation of worst-case risk under uncertainty.

We denote by $\mathcal{L}^{p}$, $p\in[1,\infty)$, the space of random variables with finite $p$-th moment. Let $\mathcal{L}^{\infty}$ represent the set of bounded random variables and let $\mathcal{L}^{0}$ represent the space of all random variables. Denote by $\mathcal{H}$ the set of functions $h:[0,1]\to\mathbb{R}$ with bounded variation satisfying $h(0)=0$. For $p\in[1,\infty]$ and $h\in\mathcal{H}$, a distortion riskmetric $\rho_{h}:\mathcal{L}^{p}\to\mathbb{R}$ is defined as

$$\rho_{h}(Y)=\int_{0}^{\infty}h(\mathbb{P}(Y>x))\,\mathrm{d}x+\int_{-\infty}^{0}(h(\mathbb{P}(Y>x))-h(1))\,\mathrm{d}x,\quad Y\in\mathcal{L}^{p}, \tag{4}$$

whenever the above integrals are finite; see Proposition 5 below for a sufficient condition. The function $h\in\mathcal{H}$ is called a distortion function. Note that we allow $h$ to be non-monotone; if $h$ is increasing and $h(1)=1$, then $\rho_{h}$ is a distortion risk measure. The distortion riskmetric $\rho_{h}$ is convex if and only if $h$ is concave; see Wang et al. (2020b) for this and other properties of $\rho_{h}$.
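For intuition, $\rho_{h}$ is easy to evaluate numerically. The following minimal Python sketch (our own illustration, not part of the paper's formal development) evaluates $\rho_{h}$ on an empirical distribution with $n$ equiprobable atoms, via a discrete form of the quantile representation given in (12) below; the grid convention only matters for discontinuous $h$.

```python
import numpy as np

def rho_h(sample, h):
    # Discrete quantile representation of rho_h for n equiprobable atoms:
    # rho_h(Y) = sum_j y_(j) * (h((n-j+1)/n) - h((n-j)/n)), y_(1) <= ... <= y_(n).
    y = np.sort(np.asarray(sample, dtype=float))
    n = len(y)
    t = np.arange(n, -1, -1) / n                  # 1, (n-1)/n, ..., 1/n, 0
    weights = h(t[:-1]) - h(t[1:])
    return float(np.dot(y, weights))

sample = np.random.default_rng(0).lognormal(size=100_000)
print(rho_h(sample, lambda t: t))                 # h(t) = t recovers E[Y]
print(rho_h(sample, lambda t: np.sqrt(t)))        # concave h: a convex riskmetric
```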

In this paper, we consider the objective functional $\rho$ in (1) to be a distortion riskmetric $\rho_{h}$ for some $h\in\mathcal{H}$, as distortion riskmetrics include a large class of objective functionals of interest. Note that a general analysis of (3) also covers the infimum problem $\inf_{F_{Y}\in\mathcal{M}}\rho_{h}(Y)$, since $-\rho_{h}=\rho_{-h}$ is again a distortion riskmetric. This illustrates an advantage of studying general distortion riskmetrics rather than only monotone ones, as our analysis unifies best- and worst-case risk evaluations. Best-case risk measures are also of practical importance. In particular, they may represent risk minimization problems through the second interpretation of (3), where $\mathcal{M}$ represents a set of possible actions (see Section 3.4 for some examples).

If $\rho_{h}$ is not convex, or equivalently, $h$ is not concave, optimization problems of the type (3) are often highly nontrivial. However, the optimization problem of maximizing $\rho_{h^{*}}(Y)$ over $F_{Y}\in\mathcal{M}$, where $h^{*}$ is the smallest concave distortion function dominating $h$, is convex and can often be solved relatively easily, either analytically or through numerical methods. Clearly, since $\rho_{h^{*}}\geqslant\rho_{h}$, we have

$$\sup_{F_{Y}\in\mathcal{M}}\rho_{h}(Y)\leqslant\sup_{F_{Y}\in\mathcal{M}}\rho_{h^{*}}(Y),$$

and one naturally wonders when the above inequality holds with equality, that is, under what conditions

$$\sup_{F_{Y}\in\mathcal{M}}\rho_{h}(Y)=\sup_{F_{Y}\in\mathcal{M}}\rho_{h^{*}}(Y). \tag{5}$$

The main contribution of this paper is a sufficient condition on the uncertainty set $\mathcal{M}$ that guarantees the equivalence of these optimization problems, that is, that (5) holds. We will also obtain a necessary condition. If (5) holds, then the non-convex problem (the left-hand side of (5)) is converted into the convex problem (the right-hand side of (5)), a substantial simplification, which in turn helps to solve the minimax problem (1).

2.2 Notation and preliminaries

For $p\geqslant 1$ and $n\in\mathbb{N}$, we denote by $\mathcal{M}^{n}_{p}$ the set of all distributions on $\mathbb{R}^{n}$ with finite $p$-th moment. Let $\mathcal{M}^{n}_{\infty}$ be the set of $n$-dimensional distributions of bounded random variables. For $p\in[1,\infty]$, write $\mathcal{M}^{1}_{p}=\mathcal{M}_{p}$ for simplicity. The set inclusion $\subset$ and terms like “increasing” and “decreasing” are understood in the non-strict sense. For $X,Y\in\mathcal{L}^{p}$, we write $X\buildrel\mathrm{d}\over{=}Y$ to represent that $X$ and $Y$ have the same distribution. For a distribution $F\in\mathcal{M}_{1}$, let its left- and right-quantile functions be given respectively by

$$F^{-1}(\alpha)=\inf\,\{x\in\mathbb{R}:F(x)\geqslant\alpha\}\quad\text{and}\quad F^{-1+}(\alpha)=\inf\,\{x\in\mathbb{R}:F(x)>\alpha\},\quad\alpha\in[0,1],$$

with the convention $\inf(\varnothing)=\infty$. For $x,y\in\mathbb{R}$, we write $x\vee y=\max\{x,y\}$ and $x\wedge y=\min\{x,y\}$. Since $h\in\mathcal{H}$ is of bounded variation, its discontinuity points are at most countable and the left- and right-limits exist at each of these points. We write

$$h(t^{+})=\begin{cases}\lim_{x\downarrow t}h(x), & t\in[0,1),\\ h(1), & t=1,\end{cases}\qquad\text{and}\qquad h(t^{-})=\begin{cases}\lim_{x\uparrow t}h(x), & t\in(0,1],\\ h(0), & t=0,\end{cases}$$

and the upper semicontinuous modification of $h$ is denoted by

$$\hat{h}(t)=h(t^{+})\vee h(t^{-})\vee h(t),\quad t\in(0,1),\quad\text{with }\hat{h}(0)=0~\text{and}~\hat{h}(1)=h(1).$$

Note that $\hat{h}(t)=h(t)$ at all continuity points of $h$, and we do not make any modification at the points $0$ and $1$ even if $h$ has a jump there. For $h\in\mathcal{H}$ and $t\in[0,1]$, define its concave and convex envelopes $h^{*}$ and $h_{*}$ respectively by

$$h^{*}(t)=\inf\left\{g(t):~g\in\mathcal{H},~g\geqslant h,~g\text{ is concave on }[0,1]\right\},$$
$$h_{*}(t)=\sup\left\{g(t):~g\in\mathcal{H},~g\leqslant h,~g\text{ is convex on }[0,1]\right\}.$$

Both $h^{*}$ and $h_{*}$ are continuous functions on $(0,1)$ for all $h\in\mathcal{H}$, and if $h$ is continuous at $0$ and $1$, then so are $h^{*}$ and $h_{*}$ (see Figure 4 below for an illustration of $h$ and $h^{*}$). Denote by $\mathcal{H}^{*}$ (resp. $\mathcal{H}_{*}$) the set of concave (resp. convex) functions in $\mathcal{H}$. Note that for all $h\in\mathcal{H}$, we have $h^{*}\in\mathcal{H}^{*}$ and $h_{*}\in\mathcal{H}_{*}$. As a well-known property of the convex and concave envelopes of a continuous $h$ (e.g., Brighi and Chipot (1994)), $h^{*}$ (resp. $h_{*}$) differs from $h$ on a union of disjoint open intervals, and $h^{*}$ (resp. $h_{*}$) is linear on these intervals. The functions $h$, $\hat{h}$, $h^{*}$ and $(\hat{h})^{*}$ are illustrated in Figure 1.
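On a grid, the concave envelope is the upper convex hull of the graph of $h$ and can be computed in linear time by the monotone-chain method. A minimal sketch of ours (a grid approximation under our own discretization choices, not an exact construction), checked against the distortion function of $\mathrm{VaR}_{\alpha}$ from Example 1 below, whose envelope is $h^{*}(t)=\frac{t}{1-\alpha}\wedge 1$:

```python
import numpy as np

def concave_envelope(t, ht):
    # Upper convex hull of the points (t_i, h(t_i)), evaluated back on the grid.
    hull = []                                    # indices of hull vertices
    for i in range(len(t)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            turn = (t[i1]-t[i0])*(ht[i]-ht[i0]) - (ht[i1]-ht[i0])*(t[i]-t[i0])
            if turn >= 0:                        # middle point lies below chord
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(t, t[hull], ht[hull])

alpha = 0.9
t = np.linspace(0, 1, 10_001)
h = (t > 1 - alpha).astype(float)                # distortion of VaR_alpha
h_star = concave_envelope(t, h)
print(np.max(np.abs(h_star - np.minimum(t / (1 - alpha), 1.0))))  # small (grid error)
```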

Figure 1: An example of $h$ (left) and $\hat{h}$ (right) with the set of discontinuity points $\{t_{1},t_{2},t_{3},t_{4},t_{5}\}$ excluding $0$ and $1$; the dashed lines represent $h^{*}$ and $(\hat{h})^{*}$, which are identical by Proposition 1.

While in general $\rho_{h}$ and $\rho_{\hat{h}}$ are different functionals, one has $\rho_{h}(Y)=\rho_{\hat{h}}(Y)$ for any random variable $Y$ with continuous quantile function; see Lemma 1 of Wang et al. (2020a). Moreover, $h^{*}=(\hat{h})^{*}\geqslant\hat{h}\geqslant h$, and the four functions are all equal if $h$ is concave. Below, we provide a new result on convex envelopes of distortion functions $h$ that are not necessarily monotone or continuous, which may be of independent interest.

Proposition 1.

For any $h\in\mathcal{H}$, we have $h^{*}=(\hat{h})^{*}$ and the set $\{t\in[0,1]:\hat{h}(t)\neq h^{*}(t)\}$ is the union of some disjoint open intervals. Moreover, $h^{*}$ is linear on each of these intervals.

In the sequel, we mainly focus on $h^{*}$, which will be useful when optimizing $\rho_{h}$ in (3). A similar result to Proposition 1 holds for $h_{*}$, useful in the corresponding infimum problem, where the upper semicontinuous modification of $h$ is replaced by the lower semicontinuous one. This follows directly from Proposition 1 by setting $g=-h$, which gives $\rho_{g}=-\rho_{h}$ and $h_{*}=-g^{*}$.

For all distortion functions $h\in\mathcal{H}$, by Proposition 1, there exist (countably many) disjoint open intervals on which $\hat{h}\neq h^{*}$. Using a similar notation to Wang et al. (2019), we define the set

$$\mathcal{I}_{h}=\{(1-b,1-a):\hat{h}\neq h^{*}\text{ on }(a,b),~\hat{h}(a)=h^{*}(a),~\hat{h}(b)=h^{*}(b)\}.$$

The set $\mathcal{I}_{h}$ is easy to identify in practice; see Section 3.2 for examples of commonly used distortion riskmetrics and their corresponding sets $\mathcal{I}_{h}$.

3 Equivalence between non-convex and convex riskmetrics

3.1 Concentration and the main equivalence result

In this section, we introduce the concept of concentration and use it to state our main equivalence results, Theorems 1 and 2. For a distribution $F\in\mathcal{M}_{1}$ and an interval $C\subset[0,1]$ (when speaking of an interval in $[0,1]$, we exclude singletons and empty sets), we define the $C$-concentration of $F$, denoted by $F^{C}$, as the distribution of the random variable

$$F^{-1}(U)\mathds{1}_{\{U\not\in C\}}+\mathbb{E}[F^{-1}(U)\,|\,U\in C]\mathds{1}_{\{U\in C\}}, \tag{6}$$

where $U\sim\mathrm{U}[0,1]$ is a standard uniform random variable. In other words, $F^{C}$ is obtained by concentrating the probability mass of $F^{-1}(U)$ on $\{U\in C\}$ at its conditional expectation, whereas the rest of the distribution remains unchanged. For $F\in\mathcal{M}_{1}$ and $0\leqslant a<b\leqslant 1$, it is clear that the left-quantile function of $F^{(a,b)}$ is given by

$$F^{-1}(t)\mathds{1}_{\{t\not\in(a,b]\}}+\frac{\int_{a}^{b}F^{-1}(u)\,\mathrm{d}u}{b-a}\mathds{1}_{\{t\in(a,b]\}},\quad t\in[0,1]. \tag{7}$$

For a collection $\mathcal{I}$ of (possibly infinitely many) non-overlapping intervals in $[0,1]$, let $F^{\mathcal{I}}$ be the distribution corresponding to the left-quantile function given by the left-continuous version of

$$F^{-1}(t)\mathds{1}_{\{t\not\in\bigcup_{C\in\mathcal{I}}C\}}+\sum_{C\in\mathcal{I}}\frac{\int_{C}F^{-1}(u)\,\mathrm{d}u}{\lambda(C)}\mathds{1}_{\{t\in C\}},\quad t\in[0,1], \tag{8}$$

where $\lambda$ is the Lebesgue measure; see Figure 2 for an illustration.

Figure 2: Left panel: quantile function of $F$; right panel: quantile function of $F^{\mathcal{I}}$, where $\mathcal{I}=\{(0,1/3),(1/2,2/3)\}$.
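Numerically, a concentration simply averages the empirical quantile function over the chosen interval, as in (7). A minimal sketch, assuming equiprobable atoms and a midpoint grid of quantile levels (both our own discretization choices); the mean is preserved because (6) replaces mass by its conditional expectation:

```python
import numpy as np

def concentrate(sample, a, b):
    # (a,b)-concentration per (7): replace the empirical quantile function
    # on (a, b] by its average value there; atoms stay equiprobable.
    y = np.sort(np.asarray(sample, dtype=float))
    u = (np.arange(len(y)) + 0.5) / len(y)       # quantile levels (midpoints)
    out = y.copy()
    mask = (u > a) & (u <= b)
    out[mask] = y[mask].mean()                   # conditional expectation
    return out

x = np.random.default_rng(1).normal(size=100_000)
xc = concentrate(x, 0.95, 1.0)
print(x.mean(), xc.mean())                       # means agree
print(x.std(), xc.std())                         # variability can only decrease
```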
Definition 1.

Let $\mathcal{M}$ be a set of distributions in $\mathcal{M}_{1}$ and $\mathcal{I}$ be a collection of intervals in $[0,1]$. We say that (a) $\mathcal{M}$ is closed under concentration within $\mathcal{I}$ if $F^{\mathcal{I}}\in\mathcal{M}$ for all $F\in\mathcal{M}$; (b) $\mathcal{M}$ is closed under concentration for all intervals if for all $F\in\mathcal{M}$, we have $F^{C}\in\mathcal{M}$ for all intervals $C\subset[0,1]$; (c) $\mathcal{M}$ is closed under conditional expectation if for all $F_{X}\in\mathcal{M}$, the distribution of any conditional expectation of $X$ is in $\mathcal{M}$.

The relationship between the three closedness properties in Definition 1 is discussed in Propositions 2 and 3 below. Generally, (c) $\Rightarrow$ (b) $\Rightarrow$ (a) if $\mathcal{I}$ is finite. Our main equivalence result is summarized in the following theorem.

Theorem 1.

For $\mathcal{M}\subset\mathcal{M}_{1}$ and $h\in\mathcal{H}$, the following hold.

(i) If $h=\hat{h}$, i.e., $h$ is upper semicontinuous on $(0,1)$, and $\mathcal{M}$ is closed under concentration within $\mathcal{I}_{h}$, then

$$\sup_{F_{Y}\in\mathcal{M}}\rho_{h}(Y)=\sup_{F_{Y}\in\mathcal{M}}\rho_{h^{*}}(Y). \tag{9}$$

(ii) If $\mathcal{M}$ is closed under concentration for all intervals, then (9) holds.

(iii) If $h=\hat{h}$, $\mathcal{M}$ is closed under concentration within $\mathcal{I}_{h}$, and the second supremum in (9) is attained by some $F\in\mathcal{M}$, then $F^{\mathcal{I}_{h}}$ attains both suprema.

Both suprema in (9) may be infinite; this is discussed in Remark 5 in Appendix A.2. The proof of Theorem 1 is more technical than those of similar results in the literature because of the challenges arising from non-monotonicity, non-positivity, and discontinuity of $h$; see Figure 1 for a sample of possible complications. In (ii), $h$ does not need to be upper semicontinuous on $(0,1)$ for (9) to hold, because closedness under concentration for all intervals in (ii) is stronger than the condition in (i).

Remark 1.

For $\mathcal{M}\subset\mathcal{M}_{1}$ and $h\in\mathcal{H}$, if $h=\hat{h}$ and $F^{C}\in\mathcal{M}$ for all $F\in\mathcal{M}$ and $C\in\mathcal{I}_{h}$, then the equivalence relation (9) also holds. If $\mathcal{I}_{h}$ is finite, then this condition is generally stronger than closedness under concentration within $\mathcal{I}_{h}$ in (i).

A natural question arising from Theorem 1 is whether our key condition of closedness under concentration is necessary in some sense for the equivalence (9) to hold (we thank an anonymous referee for raising this question). It is immediate to notice that adding any distribution $F_{Z}$ satisfying $\rho_{h^{*}}(Z)<\sup_{F_{Y}\in\mathcal{M}}\rho_{h^{*}}(Y)$ to the set $\mathcal{M}$ does not affect the equivalence, and therefore we turn our attention to the set of maximizers instead of the whole set $\mathcal{M}$. In the next result, we show that closedness under concentration within $\mathcal{I}_{h}$ of the set of maximizers of (3) is necessary for the equivalence (9) to hold.

Theorem 2.

For $\mathcal{M}\subset\mathcal{M}_{1}$ and $h\in\mathcal{H}$ such that $h\neq h^{*}$, suppose that the set $\mathcal{M}_{\mathrm{opt}}$ of all maximizers of $\max_{F_{Y}\in\mathcal{M}}\rho_{h}(Y)$ is non-empty. If the equivalence (9) holds, i.e., $\sup_{F_{Y}\in\mathcal{M}}\rho_{h}(Y)=\sup_{F_{Y}\in\mathcal{M}}\rho_{h^{*}}(Y)$, then $\mathcal{M}_{\mathrm{opt}}$ is closed under concentration within $\mathcal{I}_{h}$.

If the equivalence (9) holds, then each $F\in\mathcal{M}_{\mathrm{opt}}$ also maximizes the problem $\sup_{F_{Y}\in\mathcal{M}}\rho_{h^{*}}(Y)$. Conversely, if $h=\hat{h}$, then this condition and closedness of $\mathcal{M}_{\mathrm{opt}}$ under concentration within $\mathcal{I}_{h}$ together are necessary (by Theorem 2) and sufficient (by Theorem 1) for the equivalence (9) to hold. If the maximizer $F$ of the original problem (3) is unique, then by Theorem 2, $F$ must be equal to $F^{\mathcal{I}_{h}}$. The equivalence (9) does not imply closedness under concentration within $\mathcal{I}_{h}$ of the uncertainty set $\mathcal{M}$ itself; an example showing this is discussed in Remark 2.

3.2 Some examples of distortion riskmetrics

We provide a few examples of distortion riskmetrics $\rho_{h}$ commonly used in decision theory and finance, and obtain their corresponding sets $\mathcal{I}_{h}$. The Value-at-Risk (VaR) and the Expected Shortfall (ES) are the most popular risk measures in practice. We introduce them first, followed by an inverse-S-shaped distortion function of Tversky and Kahneman (1992).

Example 1 (VaR and ES).

For $Y\in\mathcal{L}^{0}$, using the sign convention of McNeil et al. (2015), VaR is defined as the left-quantile, and upper VaR ($\mathrm{VaR}^{+}$) is defined as the right-quantile; that is,

$$\mathrm{VaR}_{\alpha}(Y)=F^{-1}_{Y}(\alpha),~\alpha\in(0,1]\quad\text{and}\quad\mathrm{VaR}^{+}_{\alpha}(Y)=F^{-1+}_{Y}(\alpha),~\alpha\in[0,1).$$

ES at level $\alpha$ is defined as

$$\mathrm{ES}_{\alpha}(Y)=\frac{1}{1-\alpha}\int_{\alpha}^{1}\mathrm{VaR}_{t}(Y)\,\mathrm{d}t,\quad\alpha\in(0,1),~Y\in\mathcal{L}^{1}.$$

Both $\mathrm{VaR}_{\alpha}$ and $\mathrm{ES}_{\alpha}$ belong to the class of distortion riskmetrics. Take $\alpha\in(0,1)$. Let $h(t)=\mathds{1}_{(1-\alpha,1]}(t)$, $t\in[0,1]$. It follows that $h\in\mathcal{H}$ and $\hat{h}(t)=\mathds{1}_{[1-\alpha,1]}(t)$, $t\in[0,1]$. In this case, $\rho_{h}=\mathrm{VaR}_{\alpha}$. Moreover, $h^{*}(t)=\frac{t}{1-\alpha}\wedge 1$, $t\in[0,1]$, and $\rho_{h^{*}}=\mathrm{ES}_{\alpha}$. Since $h^{*}$ and $\hat{h}$ differ on $(0,1-\alpha)$, we have $\mathcal{I}_{h}=\{(\alpha,1)\}$.

Example 2 (TK distortion riskmetrics).

The following function $h$ is an inverse-S-shaped distortion function (see also Figure 4):

$$h(t)=\frac{t^{\gamma}}{\left(t^{\gamma}+(1-t)^{\gamma}\right)^{1/\gamma}},\quad t\in[0,1],~\gamma\in(0,1). \tag{10}$$

Distortion riskmetrics with distortion function (10) are commonly used in behavioural economics and finance; see e.g., Tversky and Kahneman (1992). For simplicity, we call such distortion riskmetrics TK distortion riskmetrics. Typical values of $\gamma$ are in $[0.5,0.9]$; see Wu and Gonzalez (1996). For $h$ in (10), it is clear that $h=\hat{h}$ on $[0,1]$ by continuity of $h$. We have $h^{*}\neq h$ on $(t_{0},1)$ for some $t_{0}\in(0,1)$, and $h^{*}$ is linear on $[t_{0},1]$. Thus, $\mathcal{I}_{h}=\{(0,1-t_{0})\}$. An example of $h$ in (10) and its concave envelope $h^{*}$ are plotted in Figure 3 (left).
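The threshold $t_{0}$ is easy to locate numerically: on $[t_{0},1]$ the envelope $h^{*}$ is the supporting line of the graph of $h$ through $(1,h(1))=(1,1)$, so $t_{0}$ minimizes the chord slope $(1-h(t))/(1-t)$. A grid-based sketch of ours:

```python
import numpy as np

def tk(t, gamma):
    # Inverse-S-shaped distortion function (10) of Tversky and Kahneman.
    return t**gamma / (t**gamma + (1 - t)**gamma)**(1 / gamma)

# On [t0, 1], h* is the supporting line through (1, 1); hence t0 minimizes
# the chord slope (1 - h(t)) / (1 - t) over t in (0, 1).
gamma = 0.7
t = np.linspace(1e-6, 1 - 1e-6, 1_000_000)
t0 = t[np.argmin((1 - tk(t, gamma)) / (1 - t))]
print(t0, 1 - t0)        # I_h = {(0, 1 - t0)}
```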

For $h_{1},h_{2}\in\mathcal{H}$, we write $h=h_{1}-h_{2}\in\mathcal{H}$ and consider the difference between two distortion riskmetrics, that is,

$$\rho_{h}=\rho_{h_{1}}-\rho_{h_{2}}. \tag{11}$$

Such distortion riskmetrics measure the difference or disagreement between two utilities, risk attitudes, or capital requirements. Determining the upper and lower bounds, or the largest absolute values, of such measures of disagreement is of interest in practice but rarely studied in the literature. Note that $h_{1}-h_{2}$ is in general not monotone or concave even when $h_{1}$ and $h_{2}$ themselves have these properties. Below we show some examples of distortion riskmetrics of the form (11).

Example 3 (Inter-quantile range and inter-ES range).

For $\alpha\in[1/2,1)$, we take $h_{1}(t)=\mathds{1}_{[1-\alpha,1]}(t)$ and $h_{2}(t)=\mathds{1}_{(\alpha,1]}(t)$, $t\in[0,1]$. It follows that $h(t)=h_{1}(t)-h_{2}(t)=\mathds{1}_{\{1-\alpha\leqslant t\leqslant\alpha\}}$, $t\in[0,1]$, $\hat{h}=h$, and

$$\rho_{h}(X)=F^{-1+}_{X}(\alpha)-F^{-1}_{X}(1-\alpha),\quad X\in\mathcal{L}^{0}.$$

Correspondingly, we have $h^{*}(t)=t/(1-\alpha)\wedge 1+(\alpha-t)/(1-\alpha)\wedge 0$, $t\in[0,1]$, and

$$\rho_{h^{*}}(X)=\mathrm{ES}_{\alpha}(X)+\mathrm{ES}_{\alpha}(-X),\quad X\in\mathcal{L}^{1}.$$

The distortion riskmetric $\rho_{h}$ is called an inter-quantile range and $\rho_{h^{*}}$ is called an inter-ES range. As the distortion functions $h^{*}$ and $\hat{h}$ differ on the open intervals $(0,1-\alpha)$ and $(\alpha,1)$, we have $\mathcal{I}_{h}=\{(\alpha,1),(0,1-\alpha)\}$. The distortion functions $h$ and $h^{*}$ are displayed in Figure 3 (right).

Figure 3: Left panel: $h$ and $h^{*}$ for the TK distortion riskmetric with $\gamma=0.7$ in Example 2; right panel: $h$ and $h^{*}$ for the inter-quantile range in Example 3.
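Both quantities in Example 3 are simple tail statistics of the sorted sample, which makes the pointwise domination $\rho_{h}\leqslant\rho_{h^{*}}$ easy to observe numerically. A sketch of ours (index conventions on the quantile grid are our own choice):

```python
import numpy as np

# Inter-quantile range rho_h versus inter-ES range rho_{h*} of Example 3 on a
# heavy-tailed sample; rho_h <= rho_{h*} always, since h <= h*.
alpha = 0.9
x = np.sort(np.random.default_rng(2).standard_t(df=4, size=100_000))
n = len(x)
k = int(np.ceil(alpha * n))

iq_range = x[k] - x[int(np.ceil((1 - alpha) * n)) - 1]   # F^{-1+}(a) - F^{-1}(1-a)
ies_range = x[k:].mean() - x[:n - k].mean()              # ES_a(X) + ES_a(-X)
print(iq_range, ies_range)
```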
Example 4 (Difference of two inverse-S-shaped distortion functions).

We take $h_{1}$ and $h_{2}$ to be the inverse-S-shaped distortion functions in (10), with parameters $\gamma_{1}=0.8$ and $\gamma_{2}=0.7$, respectively. By calculation, the function $h=h_{1}-h_{2}$ is convex on $[0,0.3770]$, concave on $[0.3770,1]$, and, as seen in Figure 4, not monotone. The concave envelope $h^{*}$ is linear on $[0,0.7578]$ and $h^{*}=h$ on $[0.7578,1]$. Thus, we have $\mathcal{I}_{h}=\{(0.2422,1)\}$. The graphs of the distortion functions $h_{1}$, $h_{2}$, $h$, and $h^{*}$ are displayed in Figure 4.

Figure 4: Left panel: inverse-S-shaped distortion functions $h_{1}$ and $h_{2}$ in Example 4; right panel: $h=h_{1}-h_{2}$ and $h^{*}$ of the same example.

The functions in $\mathcal{H}$ are a.e. differentiable, and for an absolutely continuous function $h\in\mathcal{H}$, let $h^{\prime}$ be a (representative) function on $[0,1]$ that is a.e. equal to the derivative of $h$. If $h\in\mathcal{H}$ is left-continuous or $\mathrm{VaR}_{t}(Y)$ is continuous with respect to $t\in(0,1)$, the distortion riskmetric $\rho_{h}$ in (4) has the representation

$$\rho_{h}(Y)=\int_{0}^{1}\mathrm{VaR}_{1-t}(Y)\,\mathrm{d}h(t),\quad Y\in\mathcal{L}^{p}; \tag{12}$$

see Lemma 1 of Wang et al. (2020a). If $h\in\mathcal{H}$ is absolutely continuous, then

$$\rho_{h}(Y)=\int_{0}^{1}\mathrm{VaR}_{1-t}(Y)h^{\prime}(t)\,\mathrm{d}t,\quad Y\in\mathcal{L}^{p}. \tag{13}$$

We end this section with a recently introduced distortion riskmetric with a concave distortion function, which may be of independent interest in risk management.

Example 5 (Second-order superquantile).

As introduced by Rockafellar and Royset (2018), a second-order superquantile is defined as

$$\mathrm{SSQ}_{\alpha}(Y)=\frac{1}{1-\alpha}\int^{1}_{\alpha}\mathrm{ES}_{t}(Y)\,\mathrm{d}t,\quad\alpha\in(0,1),~Y\in\mathcal{L}^{2}.$$

By Theorem 2.4 of Rockafellar and Royset (2018), $\mathrm{SSQ}_{\alpha}$ is a distortion riskmetric with a concave distortion function $h$ given by

$$h(t)=\begin{cases}\frac{t}{1-\alpha}\left(1+\log\frac{1-\alpha}{t}\right), & 0\leqslant t<1-\alpha,\\ 1, & 1-\alpha\leqslant t\leqslant 1.\end{cases}$$

Clearly, $\mathrm{SSQ}_{\alpha}\geqslant\mathrm{ES}_{\alpha}$. The difference $\mathrm{SSQ}_{\alpha}-\mathrm{ES}_{\alpha}$ between the second-order superquantile and ES, which has a similar interpretation as $\mathrm{ES}_{\alpha}-\mathrm{VaR}_{\alpha}$, is a distortion riskmetric with a non-concave and non-monotone distortion function $g$, and the set $\mathcal{I}_{g}$ contains a single interval of the form $(0,\beta)$ for some $\beta\in[\alpha,1)$.
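As a sanity check, $\mathrm{SSQ}_{\alpha}$ can be evaluated both through its distortion function and directly through its definition as an average of $\mathrm{ES}_{t}$. A numerical sketch of ours (the two evaluations agree up to discretization error):

```python
import numpy as np

def ssq_h(t, alpha):
    # Concave distortion function of the second-order superquantile.
    t = np.asarray(t, dtype=float)
    out = np.ones_like(t)
    low = (t > 0) & (t < 1 - alpha)
    out[low] = t[low] / (1 - alpha) * (1 + np.log((1 - alpha) / t[low]))
    out[t <= 0] = 0.0
    return out

alpha = 0.95
y = np.sort(np.random.default_rng(3).exponential(size=200_000))
n = len(y)

# (a) SSQ via the distortion function and the discrete form of (12)
grid = np.arange(n, -1, -1) / n
ssq_a = float(np.dot(y, ssq_h(grid[:-1], alpha) - ssq_h(grid[1:], alpha)))

# (b) SSQ via its definition: average of ES_t over t in (alpha, 1)
ts = np.linspace(alpha, 1, 500, endpoint=False)
ssq_b = np.mean([y[int(np.ceil(t * n)):].mean() for t in ts])
print(ssq_a, ssq_b)       # agree up to discretization error
```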

3.3 Closedness under concentration for all intervals

In this section, we present some technical results and specific examples about closedness under concentration for all intervals and under conditional expectation. The proposition below clarifies the relationship between closedness under concentration for all intervals and closedness under conditional expectation.

Proposition 2.

Closedness under conditional expectation implies closedness under concentration for all intervals, but the converse is not true.

Example 6.

We present six classes of sets $\mathcal{M}$ that are closed under conditional expectation, and hence also under concentration for all intervals.

1. (Moment conditions) For $p>1$, $m\in\mathbb{R}$, and $v>0$, the set

$$\mathcal{M}(p,m,v)=\{F_{Y}\in\mathcal{M}_{p}:\mathbb{E}[Y]=m,~\mathbb{E}[|Y-m|^{p}]\leqslant v^{p}\}$$

is closed under conditional expectation by Jensen's inequality. The set $\mathcal{M}(p,m,v)$ corresponds to distributional uncertainty with moment information, and the setting $p=2$ (mean and variance constraints) is the most commonly studied; see the numerical sketch after this list.

2. (Mean-covariance conditions) For $n\in\mathbb{N}$, $\mathbf{a}\in\mathbb{R}^{n}$, $\bm{\mu}\in\mathbb{R}^{n}$, and positive semidefinite $\Sigma\in\mathbb{R}^{n\times n}$, let

$$\mathcal{M}^{\mathrm{mv}}(\mathbf{a},\bm{\mu},\Sigma)=\{F_{\mathbf{a}^{\top}\mathbf{X}}\in\mathcal{M}_{2}:F_{\mathbf{X}}\in\mathcal{M}^{n}_{2},~\mathbb{E}[\mathbf{X}]=\bm{\mu},~\mathrm{var}(\mathbf{X})\preceq\Sigma\},$$

where $\mathbf{X}=(X_{1},\dots,X_{n})$, $\mathbb{E}[\mathbf{X}]=(\mathbb{E}[X_{1}],\dots,\mathbb{E}[X_{n}])$, $\mathrm{var}(\mathbf{X})$ is the covariance matrix of $\mathbf{X}$, and $B^{\prime}\preceq B$ means that the matrix $B-B^{\prime}$ is positive semidefinite for two positive semidefinite symmetric matrices $B$ and $B^{\prime}$. With a simple verification in Appendix A.1, $\mathcal{M}^{\mathrm{mv}}(\mathbf{a},\bm{\mu},\Sigma)=\mathcal{M}(2,\mathbf{a}^{\top}\bm{\mu},(\mathbf{a}^{\top}\Sigma\mathbf{a})^{1/2})$.

3. (Convex function conditions) For $n\in\mathbb{N}$, $\mathbf{a}\in\mathbb{R}^{n}$, $K\subset\mathbb{N}$, a collection $\mathbf{f}=(f_{k})_{k\in K}$ of convex functions on $\mathbb{R}^{n}$, and a vector $\mathbf{x}=(x_{k})_{k\in K}\in\mathbb{R}^{|K|}$, let

$$\mathcal{M}^{\mathbf{f}}(\mathbf{a},\mathbf{x})=\{F_{\mathbf{a}^{\top}\mathbf{X}}\in\mathcal{M}_{1}:\mathbb{E}[f_{k}(\mathbf{X})]\leqslant x_{k}~\mbox{for all}~k\in K\}.$$

The set $\mathcal{M}^{\mathbf{f}}$ corresponds to distributional uncertainty with constraints on expected losses or test functions, and it includes $\mathcal{M}(p,m,v)$ as a special case.

4. (Distortion conditions) For $K\subset\mathbb{N}$, a collection $\mathbf{h}=(h_{k})_{k\in K}\in(\mathcal{H}^{*})^{|K|}$ and a vector $\mathbf{x}=(x_{k})_{k\in K}\in\mathbb{R}^{|K|}$, let

$$\mathcal{M}^{\mathbf{h}}(\mathbf{x})=\{F_{Y}\in\mathcal{M}_{1}:\rho_{h_{k}}(Y)\leqslant x_{k}~\mbox{for all}~k\in K\}.$$

The set $\mathcal{M}^{\mathbf{h}}$ corresponds to distributional uncertainty with constraints on preferences modeled by convex dual utilities.

5. (Convex order conditions) For $K\subset\mathbb{N}$ and a collection of random variables $\mathbf{Z}=(Z_{k})_{k\in K}\in(\mathcal{L}^{1})^{|K|}$, let

$$\mathcal{M}^{\rm cx}(\mathbf{Z})=\{F_{Y}\in\mathcal{M}_{1}:Y\leqslant_{\rm cx}Z_{k}~\mbox{for all}~k\in K\},$$

where $\leqslant_{\rm cx}$ is the inequality in convex order (precisely, we write $G\leqslant_{\rm cx}(\leqslant_{\rm icx})\,F$ if $\int\phi\,\mathrm{d}G\leqslant\int\phi\,\mathrm{d}F$ for all (increasing) convex functions $\phi$ such that the two integrals are well defined). Similar to the above two examples, $\mathcal{M}^{\rm cx}(\mathbf{Z})$ is closed under conditional expectation (cf. Remark 6 in Appendix A.2).

6. (Marginal conditions) For given univariate distributions $F_{1},\dots,F_{n}\in\mathcal{M}_{1}$, let

$$\mathcal{M}^{S}(F_{1},\dots,F_{n})=\{F_{X_{1}+\dots+X_{n}}\in\mathcal{M}_{1}:X_{i}\sim F_{i},~i=1,\dots,n\}.$$

In other words, $\mathcal{M}^{S}$ is the set of all possible distributions of aggregate risks $X_{1}+\dots+X_{n}$ with given marginal distributions of $X_{1},\dots,X_{n}$; see Embrechts et al. (2015) for some results on $\mathcal{M}^{S}$. Generally, $\mathcal{M}^{S}$ is not closed under concentration for all intervals or under conditional expectation, since closedness under concentration for all intervals is stronger than joint mixability (Wang and Wang, 2016). In the special case where $F_{1}=\dots=F_{n}=\mathrm{U}[0,1]$, Proposition 1 and Theorem 5 of Mao et al. (2019) imply that $\mathcal{M}^{S}$ is closed under conditional expectation if and only if $n\geqslant 3$.
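For the moment-condition set $\mathcal{M}(2,m,v)$ in item 1 above, the worst-case $\mathrm{ES}_{\alpha}$ is classically $m+v\sqrt{\alpha/(1-\alpha)}$ (a Cantelli-type moment bound; results of this kind under moment constraints are treated in Section 5). A short numerical sketch of ours, verifying that a two-point distribution in $\mathcal{M}(2,m,v)$ attains the bound, with illustrative parameter values:

```python
import numpy as np

# Worst-case ES_alpha over M(2, m, v) (mean m, standard deviation <= v):
# the classical bound m + v * sqrt(alpha / (1 - alpha)), attained by a
# two-point distribution that remains in M(2, m, v).
m, v, alpha = 0.0, 1.0, 0.95
bound = m + v * np.sqrt(alpha / (1 - alpha))

x_lo = m - v * np.sqrt((1 - alpha) / alpha)      # mass alpha here
x_hi = m + v * np.sqrt(alpha / (1 - alpha))      # mass 1 - alpha here
mean = alpha * x_lo + (1 - alpha) * x_hi
var = alpha * (x_lo - m) ** 2 + (1 - alpha) * (x_hi - m) ** 2
es_alpha = x_hi                                  # upper (1-alpha)-tail mean
print(bound, es_alpha, mean, var)                # bound attained; (m, v^2) kept
```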

Remark 2.

The uncertainty set $\mathcal{M}(p,m,v)$ of the moment condition in Example 6 can be restricted to the set

$$\overline{\mathcal{M}}(p,m,v)=\{F_{Y}\in\mathcal{M}_{p}:\mathbb{E}[Y]=m,~\mathbb{E}[|Y-m|^{p}]=v^{p}\},$$

which is the “boundary” of $\mathcal{M}(p,m,v)$. For $\mathcal{M}=\mathcal{M}(p,m,v)$, the suprema on both sides of (9) are attained by some distributions in $\overline{\mathcal{M}}(p,m,v)$; see Theorem 5. As a direct consequence, we get

$$\sup_{F_{Y}\in\overline{\mathcal{M}}(p,m,v)}\rho_{h^{*}}(Y)=\sup_{F_{Y}\in{\mathcal{M}}(p,m,v)}\rho_{h^{*}}(Y)=\sup_{F_{Y}\in{\mathcal{M}}(p,m,v)}\rho_{h}(Y)=\sup_{F_{Y}\in\overline{\mathcal{M}}(p,m,v)}\rho_{h}(Y).$$

Hence, the equivalence holds even though $\overline{\mathcal{M}}(p,m,v)$ is not closed under concentration for any interval. By Theorem 2, the set of optimizers is closed under concentration within $\mathcal{I}_{h}$ for each $h\in\mathcal{H}$.

For a distribution $F\in\mathcal{M}_{1}$ and a collection $\mathcal{I}$ of disjoint intervals in $[0,1]$, we have the following result regarding the distribution $F^{\mathcal{I}}$.

Proposition 3.

Let $\mathcal{I}$ be a collection of disjoint intervals in $[0,1]$ and $\mathcal{M}$ be a set of distributions. If $\mathcal{M}$ is closed under concentration for all intervals and $\mathcal{I}$ is finite, or if $\mathcal{M}$ is closed under conditional expectation, then $\mathcal{M}$ is closed under concentration within $\mathcal{I}$.

If $\mathcal{I}$ is infinite, closedness under concentration for all intervals may not be sufficient for closedness under concentration within $\mathcal{I}$; see Remark 7 in Appendix A.2 for a technical explanation. An infinite $\mathcal{I}_{h}$ does not appear for any distortion riskmetric used in practice.

3.4 Examples of closedness under concentration within $\mathcal{I}$ but not for all intervals

In practice, it is much easier to check closedness under concentration within a specific collection of intervals $\mathcal{I}$ than closedness under concentration for all intervals or under conditional expectation. In this section, we present several examples of closedness under concentration within some $\mathcal{I}$.

For distortion functions $h$ such that $\mathcal{I}_{h}=\{(p,1)\}$ (resp. $\mathcal{I}_{h}=\{(0,p)\}$) for some $p\in(0,1)$, the result in Theorem 1 (i) only requires $\mathcal{M}$ to be closed under concentration within $\{(p,1)\}$ (resp. $\{(0,p)\}$). Such distortion functions include the inverse-S-shaped distortion functions in (10), those of $\mathrm{VaR}_{p}$ and $\mathrm{VaR}^{+}_{p}$, and that of the difference between the second-order superquantile and ES in Example 5. Below we present some more concrete examples.

Example 7 ($\mathcal{M}$ has two elements).

Let $p\in(0,1)$ and $\mathcal{M}=\{\mathrm{U}[0,1],\,p\delta_{p/2}+(1-p)\mathrm{U}[p,1]\}$, where $\delta_{p/2}$ is the point-mass at $p/2$. One can check that $\mathcal{M}$ is closed under concentration within $\{(0,p)\}$ but not under concentration for all intervals. Indeed, any set closed under concentration for all intervals and containing $\mathrm{U}[0,1]$ has infinitely many elements. In general, a finite set containing any non-degenerate distribution is not closed under conditional expectation in an atomless probability space, since there are infinitely many possible distributions for the conditional expectation of a given non-constant random variable. Another similar example that is closed under concentration within $\{(0,p)\}$ is the set of all possible distributions of the sum of several Pareto risks; see Example 5.1 of Wang et al. (2019).

Example 8 (VaR and ES).

As we see from Example 1, if $\rho_{h}=\mathrm{VaR}^{+}_{\alpha}$ for some $\alpha\in(0,1)$, then $\rho_{h^{*}}$ is $\mathrm{ES}_{\alpha}$ and $\mathcal{I}_{h}=\{(\alpha,1)\}$. Theorem 1 (i) implies that if $\mathcal{M}$ is closed under concentration within $\{(\alpha,1)\}$, then

$$\sup_{F_{Y}\in\mathcal{M}}\mathrm{VaR}^{+}_{\alpha}(Y)=\sup_{F_{Y}\in\mathcal{M}}\mathrm{ES}_{\alpha}(Y).$$

This observation leads to (with some modifications) the main results in Wang et al. (2015) and Li et al. (2018) on the equivalence between VaR and ES.
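A small numerical illustration of ours: take $\mathcal{M}$ to contain $\mathrm{U}[0,1]$ and its $(\alpha,1)$-concentration, a two-element set closed under concentration within $\{(\alpha,1)\}$, and compare the two suprema on a quantile grid.

```python
import numpy as np

# A two-element set closed under concentration within {(alpha, 1)}: U[0,1]
# and its (alpha, 1)-concentration; then sup VaR^+_alpha = sup ES_alpha.
alpha, n = 0.9, 100_000
u = (np.arange(n) + 0.5) / n            # quantile grid; U[0,1] quantiles = u

u_conc = u.copy()
u_conc[u > alpha] = u[u > alpha].mean() # tail mass moved to its mean

var_plus = lambda q: q[u > alpha][0]    # right quantile at level alpha
es = lambda q: q[u > alpha].mean()      # ES_alpha as the upper tail mean

M = [u, u_conc]
print(max(var_plus(q) for q in M))      # ~ (1 + alpha) / 2
print(max(es(q) for q in M))            # equal, as predicted by (9)
```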

Example 9 (TK distortion riskmetric).

If we take $h$ to be an inverse-S-shaped distortion function in (10), then $\mathcal{I}_{h}=\{(0,1-t_{0})\}$ for some $t_{0}\in(0,1)$, and $\rho_{h}$ is the TK distortion riskmetric. As a direct consequence of Theorem 1 (i), if $\mathcal{M}$ is closed under concentration within $\{(0,1-t_{0})\}$, then

$$\sup_{F_{Y}\in\mathcal{M}}\rho_{h}(Y)=\sup_{F_{Y}\in\mathcal{M}}\rho_{h^{*}}(Y).$$

This result implies Theorem 4.11 of Wang et al. (2019) on the robust risk aggregation problem based on dual utilities with inverse-S-shaped distortion functions.

Example 10 (Wasserstein ball, one-dimensional).

Optimization problems over the uncertainty set of a Wasserstein ball are common in the literature when quantifying the discrepancy between a benchmark distribution and alternative scenarios; see e.g., Blanchet and Murthy (2019). We discuss the application of the concept of concentration to optimization with Wasserstein distances. For $p\geqslant 1$ and $F,G\in\mathcal{M}_{p}$, the $p$-Wasserstein distance between $F$ and $G$ is defined as

$$W_{p}(F,G)=\left(\int^{1}_{0}\left|F^{-1}(u)-G^{-1}(u)\right|^{p}\,\mathrm{d}u\right)^{1/p}.$$

For $\varepsilon\geqslant 0$, the uncertainty set of an $\varepsilon$-Wasserstein ball around a benchmark distribution $\widetilde{G}\in\mathcal{M}_{p}$ is given by

$$\mathcal{M}(\widetilde{G},\varepsilon)=\{F\in\mathcal{M}_{p}:W_{p}(F,\widetilde{G})\leqslant\varepsilon\}.$$

Suppose that the benchmark distribution $\widetilde{G}$ has a quantile function that is constant on each element of some collection $\widetilde{\mathcal{I}}$ of disjoint intervals in $[0,1]$. As shown in Appendix A.1, $\mathcal{M}(\widetilde{G},\varepsilon)$ is closed under concentration within $\mathcal{I}$ for all $\mathcal{I}\subset\widetilde{\mathcal{I}}$. Using this closedness property and Theorem 1 (i), the equivalence

$$\sup_{F_{Y}\in\mathcal{M}(\widetilde{G},\varepsilon)}\rho_{h}(Y)=\sup_{F_{Y}\in\mathcal{M}(\widetilde{G},\varepsilon)}\rho_{h^{*}}(Y) \tag{14}$$

holds for all $h\in\mathcal{H}$ such that $\mathcal{I}_{h}\subset\widetilde{\mathcal{I}}$.
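The closedness property rests on Jensen's inequality: if $\widetilde{G}^{-1}$ is constant on an interval, concentrating $F$ there cannot increase the $W_{p}$ distance to $\widetilde{G}$. A grid check of ours with made-up quantile functions:

```python
import numpy as np

# If the benchmark quantile G^{-1} is constant on (a, b], concentrating F
# there cannot increase W_p(F, G), so the ball M(G, eps) is closed under
# concentration within {(a, b)}.
p, a, b, n = 2, 0.3, 0.6, 100_000
u = (np.arange(n) + 0.5) / n
mask = (u > a) & (u <= b)

gq = np.where(mask, (a + b) / 2, u)     # G^{-1}: constant on (a, b]
fq = u ** 2                             # F^{-1}: some member of the ball
fq_c = fq.copy()
fq_c[mask] = fq[mask].mean()            # quantile function of F^{(a,b)}

wp = lambda q: np.mean(np.abs(q - gq) ** p) ** (1 / p)
print(wp(fq), wp(fq_c))                 # the distance does not increase
```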

Remark 3.

In general, if the quantile function of $\widetilde{G}$ in Example 10 is not constant on some interval in $\widetilde{\mathcal{I}}$, then $\mathcal{M}(\widetilde{G},\varepsilon)$ is not necessarily closed under concentration within $\widetilde{\mathcal{I}}$, and the equivalence (14) may not hold. For instance, the worst-case $\mathrm{VaR}_{\alpha}$ over $\mathcal{M}(\widetilde{G},\varepsilon)$ is generally different from the worst-case $\mathrm{ES}_{\alpha}$ over $\mathcal{M}(\widetilde{G},\varepsilon)$, as obtained in Proposition 4 of Liu et al. (2022). We also refer to Bernard et al. (2020), who consider a Wasserstein ball together with moment constraints.

Example 11 (Wasserstein ball, $n$-dimensional).

For $n\in\mathbb{N}$, $p\geqslant 1$, $a\geqslant 1$ and $F,G\in\mathcal{M}^{n}_{p}$, the $p$-Wasserstein distance on $\mathbb{R}^{n}$ between $F$ and $G$ is defined as

$$W^{n}_{a,p}(F,G)=\inf_{\mathbf{X}\sim F,~\mathbf{Y}\sim G}(\mathbb{E}[\|\mathbf{X}-\mathbf{Y}\|^{p}_{a}])^{1/p},$$

where $\|\cdot\|_{a}$ is the $\mathcal{L}^{a}$-norm on $\mathbb{R}^{n}$. Similarly to the one-dimensional case, for $\varepsilon\geqslant 0$, an $\varepsilon$-Wasserstein ball on $\mathbb{R}^{n}$ around a benchmark distribution $\widetilde{G}\in\mathcal{M}^{n}_{p}$ is defined as

$$\mathcal{M}^{n}(\widetilde{G},\varepsilon)=\{F\in\mathcal{M}^{n}_{p}:W^{n}_{a,p}(F,\widetilde{G})\leqslant\varepsilon\}.$$

In a portfolio selection problem, we consider the worst-case riskmetric of a linear combination of random losses. For $\varepsilon\geqslant 0$, $\mathbf{w}\in[0,\infty)^{n}$, $p>1$, $a>1$ and $\mathbf{Z}\in(\mathcal{L}^{p})^{n}$, as shown in Appendix A.1, the uncertainty set

$$\{F_{\mathbf{w}^{\top}\mathbf{X}}\in\mathcal{M}_{p}:F_{\mathbf{X}}\in\mathcal{M}^{n}(F_{\mathbf{Z}},\varepsilon)\}$$

is closed under concentration within $\{(0,t)\}$ for all $t\leqslant p_{0}$. For a practical example, assume that an investor holds a portfolio of bonds (for simplicity, assume that they have the same maturity). The loss vector $\mathbf{X}\geqslant\mathbf{0}$ from this portfolio at maturity has an estimated benchmark loss distribution $\widetilde{G}$, and the probability of no default from these bonds (i.e., $\mathbf{X}=\mathbf{0}$) is estimated as $p_{0}>0$ (usually quite large). Suppose that the investor uses a distortion riskmetric with an inverse-S-shaped distortion function $h$ given in (10) of Example 2, and considers a Wasserstein ball around $\widetilde{G}$ with radius $\varepsilon$. Note that $\mathcal{I}_{h}=\{(0,t)\}$ for some $t\in(0,1)$ from Example 9. By Theorem 1 (i), we obtain an equivalence result on the worst-case riskmetrics for the portfolio with weight vector $\mathbf{w}$,

$$\sup_{F_{\mathbf{X}}\in\mathcal{M}^{n}(\widetilde{G},\varepsilon)}\rho_{h}(\mathbf{w}^{\top}\mathbf{X})=\sup_{F_{\mathbf{X}}\in\mathcal{M}^{n}(\widetilde{G},\varepsilon)}\rho_{h^{*}}(\mathbf{w}^{\top}\mathbf{X}),$$

whenever $t\in(0,p_{0}]$.

Example 12 (Optimal hedging strategy).

Suppose that an investor is willing to hedge her random loss $X$ only when it exceeds some certain level $l\in\mathbb{R}$. Mathematically, for a fixed $X\in\mathcal{L}^{1}$ continuously distributed on $(F^{-1}_{X}(p_{0}),F^{-1}_{X}(1))$ such that $\mathbb{P}(X\leqslant l)=p_{0}$ for some $p_{0}\in(0,1)$ and $l\in\mathbb{R}$, define the set of measurable functions

$$\mathcal{V}=\{V:\mathbb{R}\to\mathbb{R}\mid x\mapsto x-V(x)\text{ is increasing},~V(x)=0\text{ for all }x\leqslant l\}$$

representing possible hedging strategies. Let $g:\mathbb{R}\to\mathbb{R}$ be an increasing and convex function. The final payoff obtained by a hedging strategy $V\in\mathcal{V}$ is given by $X-V(X)+g(\mathbb{E}[V(X)])$, where $g(\mathbb{E}[V(X)])$ is a fixed cost of the hedging strategy that depends on the expected value of $V(X)$, calculated by a risk-neutral seller in the market using the same probability measure $\mathbb{P}$. As shown in Appendix A.1, the action set in this optimization problem,

$$\mathcal{M}=\{F_{X-V(X)+g(\mathbb{E}[V(X)])}\in\mathcal{M}_{1}:V\in\mathcal{V}\},$$

is closed under concentration within $\{(p,1)\}$ for all $p\in[p_{0},1)$. On the other hand, it is obvious that $\mathcal{M}$ is not closed under concentration for all intervals or under conditional expectation, since the quantiles of the distributions in $\mathcal{M}$ are fixed outside the interval $(p_{0},1)$. The above closedness under concentration property allows us to use Theorem 1 to convert the optimal hedging problem for $\rho_{h}$ with an inverse-S-shaped distortion function $h$ as in (10) to a convex version with $\rho_{h^{*}}$.

Example 13 (Risk choice).

Suppose that an investor is faced with a random loss $X\in\mathcal{L}^{1}$. The distortion function $h$ of her riskmetric is inverse-S-shaped with $\mathcal{I}_{-h}=\{(p,1)\}$ for some $p\in(0,1)$. Suppose that $p$ is known to the seller. Since the investor is averse to risk for large losses, the seller may provide her with the option to stick to the initial investment or to convert the upper part of the random loss into a fixed payment to avoid large losses. Specifically, we consider the set $\mathcal{M}=\{F_{X},F_{X}^{(p,1)}\}$ containing two elements, where $\mathbb{P}(X\leqslant u)=p$ for some $u\in\mathbb{R}$. It is clear that $\mathcal{M}$ is closed under concentration within $\{(p,1)\}$ but not closed under conditional expectation. We assume that the costs of the two investment strategies are calculated by expectation and thus are the same. By (i) of Theorem 1, it follows that the risk minimization problem satisfies

$$\min_{F_{Y}\in\mathcal{M}}\rho_{h}(Y)=\min_{F_{Y}\in\mathcal{M}}\rho_{h_{*}}(Y)=\rho_{h_{*}}(X),$$

where the last equality follows from Theorem 3 of Wang et al. (2020a). By (iii) of Theorem 1, we further obtain that the minimum of the original problem $\min_{F_{Y}\in\mathcal{M}}\rho_{h}(Y)$ is attained by $F_{X}^{(p,1)}$; intuitively, the investor will choose to convert the upper part of her loss into a fixed payment.

3.5 Atomic probability space

The definition of closedness under concentration in Definition 1 requires an atomless probability space, since a uniform random variable is used in the setup. It may be of practical interest in some economic and optimization settings to assume a finite probability space. In this section, we let the sample space be $\Omega_{n}=\{\omega_{1},\dots,\omega_{n}\}$ for $n\in\mathbb{N}$ and the probability measure $\mathbb{P}_{n}$ be such that $\mathbb{P}_{n}(\omega_{i})=1/n$ for all $i=1,\dots,n$ (such a space is called adequate in economics). The possible distributions in such a probability space are supported on at most $n$ points, each with probability a multiple of $1/n$, and we denote by $\mathcal{M}_{[n]}$ the set of these distributions.

Define the collection of intervals $\mathcal{I}_{n}=\{(j/n,k/n]:j,k\in\mathbb{N}\cup\{0\},~j<k\leqslant n\}$. We say that a set of distributions $\mathcal{M}\subset\mathcal{M}_{[n]}$ is closed under grid concentration within $\mathcal{I}\subset\mathcal{I}_{n}$ if for all $F\in\mathcal{M}$, the distribution of the random variable

$$F^{-1}(U_{n})\mathds{1}_{\{U_{n}\notin\bigcup_{C\in\mathcal{I}}C\}}+\sum_{C\in\mathcal{I}}\mathbb{E}[F^{-1}(U_{n})\,|\,U_{n}\in C]\mathds{1}_{\{U_{n}\in C\}}$$

is also in $\mathcal{M}$, where $U_{n}$ is a random variable such that $U_{n}(\omega_{i})=i/n$ for all $i=1,\dots,n$. For a distribution $F$ with finite mean and $(a,b]\in\mathcal{I}_{n}$, it is straightforward that the left-quantile function of $F^{(a,b]}$ is given by (7). The following equivalence result holds under the additional assumption $\mathcal{I}_{h}\subset\mathcal{I}_{n}$. The proof can be obtained directly from that of Theorem 1.

Proposition 4.

Let $\mathcal{M}\subset\mathcal{M}_{[n]}$ and $h\in\mathcal{H}$. If $h=\hat{h}$, $\mathcal{I}_{h}\subset\mathcal{I}_{n}$ and $\mathcal{M}$ is closed under grid concentration within $\mathcal{I}_{h}$, then

$$\sup_{F_{Y}\in\mathcal{M}}\rho_{h}(Y)=\sup_{F_{Y}\in\mathcal{M}}\rho_{h^{*}}(Y).$$

We note that the condition $\mathcal{I}_{h}\subset\mathcal{I}_{n}$ in Proposition 4 is satisfied by all distortion functions $h$ which are linear (or constant) on each of $((j-1)/n,j/n]$, $j=1,\dots,n$. It is common to assume such a distortion function $h$ in an adequate probability space of $n$ states, since any distribution function can only take values in $\{j/n:j=0,\dots,n\}$.
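In this finite setting, a grid concentration is just a block average of the sorted outcomes. A minimal sketch (our own illustration):

```python
import numpy as np

# Grid concentration on an adequate n-state space: the i-th smallest outcome
# carries quantile level i/n, so concentrating within (j/n, k/n] averages the
# sorted outcomes j+1, ..., k (0-indexed positions j, ..., k-1).
def grid_concentrate(values, j, k):
    y = np.sort(np.asarray(values, dtype=float))
    y[j:k] = y[j:k].mean()
    return y

y = [1.0, 2.0, 6.0, 7.0, 9.0]            # n = 5 equiprobable states
print(grid_concentrate(y, 2, 5))          # concentrate within (2/5, 1]
```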

4 Multi-dimensional setting

Our main equivalence results in Theorems 1 and 2 are stated in the context of one-dimensional random variables. In this section, we discuss their generalization to a multi-dimensional framework, which requires a few additional steps.

In the multi-dimensional setting, closedness under concentration is not easy to define, as quantile functions are not naturally defined for multivariate distributions. Nevertheless, closedness under conditional expectation can be analogously formulated. For $n\in\mathbb{N}$, we say that $\mathcal{M}\subset\mathcal{M}^{n}$ is closed under conditional expectation if for all $F_{\mathbf{X}}\in\mathcal{M}$, the distribution of any conditional expectation of $\mathbf{X}$ is in $\mathcal{M}$. The following theorem states the multi-dimensional version of our main equivalence result using closedness under conditional expectation.

Theorem 3.

For $\widetilde{\mathcal{M}}\subset\mathcal{M}^{n}_{1}$, an increasing function $h\in\mathcal{H}$ and $f:A\times\mathbb{R}^{n}\to\mathbb{R}$ concave in the second argument, if $\widetilde{\mathcal{M}}$ is closed under conditional expectation, then for all $\mathbf{a}\in A$,

$$\sup_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\rho_{h}(f(\mathbf{a},\mathbf{X}))=\sup_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\rho_{h^{*}}(f(\mathbf{a},\mathbf{X})). \tag{15}$$

If $h=\hat{h}$ and the second supremum in (15) is attained by some $F_{\mathbf{X}}\in\widetilde{\mathcal{M}}$, then $F^{\mathcal{I}_{h}}_{f(\mathbf{a},\mathbf{X})}$ attains both suprema. Moreover, if $f$ is linear in the second argument, then (15) holds for all $h\in\mathcal{H}$ (not necessarily monotone).

Remark 4.

If we assume that $f$ is convex (instead of concave) in the second argument in Theorem 3 and keep the other assumptions, then for an increasing $h$,

$$\inf_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\rho_{h}(f(\mathbf{a},\mathbf{X}))=\inf_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\rho_{h_{*}}(f(\mathbf{a},\mathbf{X})).$$

This statement follows by noting that $\rho_{-h}=-\rho_{h}$. The case of a decreasing $h$ is similar.

Theorem 3 is similar to Theorem 3.4 of Cai et al. (2020), which states the equivalence (15) for increasing $h$ and a specific set $\widetilde{\mathcal{M}}$ that is a special case of Example 14 below. In contrast, our result applies to non-monotone $h$ (with an extra condition on $f$), a more general set $\widetilde{\mathcal{M}}$, and also the infimum problem. The setting of a function $f$ linear in the second argument often appears in portfolio selection problems, where $f(\mathbf{a},\mathbf{X})=\mathbf{a}^{\top}\mathbf{X}$; see Example 11 and Section 6.

Example 14.

Similarly to Example 6, we give examples of sets of multi-dimensional distributions closed under conditional expectation.

  1. 1.

    (Convex function conditions) For n\in\mathbb{N}, a convex set B\subset\mathbb{R}^{n}, a set \Psi of convex functions on \mathbb{R}^{n}, and a mapping \pi:\Psi\to\mathbb{R}, let

    \widetilde{\mathcal{M}}(B,\Psi,\pi)=\{F_{\mathbf{X}}\in\mathcal{M}^{n}_{1}:\mathbb{P}(\mathbf{X}\in B)=1,~\mathbb{E}[\psi(\mathbf{X})]\leqslant\pi(\psi)\text{ for all }\psi\in\Psi\}.

    It is clear that \widetilde{\mathcal{M}}(B,\Psi,\pi) is closed under conditional expectation due to Jensen's inequality; see the numerical illustration after this list. The uncertainty set proposed by Delage et al. (2014) and used in Theorem 3.4 of Cai et al. (2020) is a special case of this setting, obtained by taking \Psi=\{f_{1},\dots,f_{n}\}\cup\{g_{1},\dots,g_{n}\}\cup\Phi, where f_{i}:(x_{1},\dots,x_{n})\mapsto x_{i} and g_{i}:(x_{1},\dots,x_{n})\mapsto-x_{i} for all i=1,\dots,n, and \Phi is a set of convex functions, with \pi specified by \pi(f_{i})=m_{i}\in\mathbb{R}, \pi(g_{i})=-m_{i} for all i=1,\dots,n, and \pi(\phi)=0 for all \phi\in\Phi.

  2. 2.

    (Distortion conditions) For nn\in\mathbb{N}, KK\subset\mathbb{N}, 𝐚=(𝐚k)kKn×|K|\mathbf{a}=(\mathbf{a}_{k})_{k\in K}\in\mathbb{R}^{n\times|K|}, 𝐡=(hk)kK()|K|\mathbf{h}=(h_{k})_{k\in K}\in(\mathcal{H}^{*})^{|K|} and 𝐱=(xk)kK|K|\mathbf{x}=(x_{k})_{k\in K}\in\mathbb{R}^{|K|}, the set

    ~𝐡(𝐚,𝐱)={F𝐗1n:ρhk(𝐚k𝐗)xkfor allkK}\widetilde{\mathcal{M}}^{\mathbf{h}}(\mathbf{a},\mathbf{x})=\{F_{\mathbf{X}}\in\mathcal{M}^{n}_{1}:\rho_{h_{k}}(\mathbf{a}_{k}^{\top}\mathbf{X})\leqslant x_{k}~{}\mbox{for all}~{}k\in K\}

    is closed under conditional expectation. In portfolio optimization problems, this setting incorporates distributional uncertainty with constraints on convex distortion risk measures of the total loss. In particular, optimization with the riskmetrics chosen as ES is common in the literature; see e.g., Rockafellar and Uryasev (2002), where ES is called CVaR.

  3. 3.

    (Convex order conditions) For nn\in\mathbb{N} and random vectors 𝐙k(1)n\mathbf{Z}_{k}\in(\mathcal{L}^{1})^{n}, kKk\in K\subset\mathbb{N}, we naturally extend from part 5 of Example 6 and obtain that the set

    ~cx(𝐙)={F𝐗1n:𝐗cx𝐙k for all kK}\widetilde{\mathcal{M}}^{\rm cx}(\mathbf{Z})=\{F_{\mathbf{X}}\in\mathcal{M}^{n}_{1}:\mathbf{X}\leqslant_{\rm cx}\mathbf{Z}_{k}\mbox{ for all }k\in K\}

    is closed under conditional expectation.
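As a quick illustration of the role of Jensen's inequality in item 1, the following Monte Carlo sketch (ours; the convex test function psi is a hypothetical choice) checks that replacing \mathbf{X} by a conditional expectation can only decrease the expectation of a convex function.

```python
import numpy as np

# Illustrative sketch: if E[psi(X)] <= pi(psi) for a convex psi, the same
# bound holds after replacing X by a conditional expectation E[X | G],
# since E[psi(E[X|G])] <= E[psi(X)] by Jensen's inequality.
rng = np.random.default_rng(1)
X = rng.normal(size=(1_000_000, 2))
psi = lambda x: np.sum(x ** 2, axis=1)    # a convex test function (ours)
cells = X[:, 0] > 0                       # condition on a two-cell sigma-algebra
condX = np.where(cells[:, None], X[cells].mean(axis=0), X[~cells].mean(axis=0))
print(psi(X).mean(), psi(condX).mean())   # the second value is smaller
```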

Next, we discuss a multi-dimensional problem setting involving concentrations of marginal distributions. For nn\in\mathbb{N}, we assume that marginal distributions of an nn-dimensional distribution in 1n\mathcal{M}^{n}_{1} are uncertain and are in some sets 1,,n1\mathcal{F}_{1},\dots,\mathcal{F}_{n}\subset\mathcal{M}_{1}. For F1,,Fn1F_{1},\dots,F_{n}\in\mathcal{M}_{1}, define the set

𝒟(F1,,Fn)={cdf of (X1,,Xn):XiFi,i=1,,n},\mathcal{D}(F_{1},\dots,F_{n})=\{\text{cdf of }(X_{1},\dots,X_{n}):X_{i}\sim F_{i},~{}i=1,\dots,n\},

which is the set of all possible joint distributions with specified marginals; see Embrechts et al. (2015). For 𝐚A\mathbf{a}\in A, hh\in\mathcal{H} and 1,,n1\mathcal{F}_{1},\dots,\mathcal{F}_{n}\subset\mathcal{M}_{1}, the worst-case distortion riskmetric can be represented as

supF𝐗𝒟(F1,,Fn)supF11,,Fnnρh(f(𝐚,𝐗)).\sup_{F_{\mathbf{X}}\in\mathcal{D}(F_{1},\dots,F_{n})}\sup_{F_{1}\in\mathcal{F}_{1},\dots,F_{n}\in\mathcal{F}_{n}}\rho_{h}(f(\mathbf{a},\mathbf{X})). (16)

The outer problem of (16) is a robust risk aggregation problem (see Embrechts et al. (2013, 2015) and item 6 of Example 6), which is nontrivial in general when h is not concave. With additional uncertainty in the marginal distributions, (16) can be converted to a convex problem, provided that \mathcal{F}_{1},\dots,\mathcal{F}_{n} are closed under concentration.

Theorem 4.

For \mathcal{F}_{1},\dots,\mathcal{F}_{n}\subset\mathcal{M}_{1}, increasing h\in\mathcal{H} with h=\hat{h}, and f:A\times\mathbb{R}^{n}\to\mathbb{R} increasing, supermodular and positively homogeneous in the second argument, if \mathcal{F}_{1},\dots,\mathcal{F}_{n} are closed under concentration within \mathcal{I}_{h}, then the following hold. (For a function f:\mathbb{R}^{n}\to\mathbb{R}, we say f is supermodular if f(\mathbf{x})+f(\mathbf{y})\leqslant f(\mathbf{x}\wedge\mathbf{y})+f(\mathbf{x}\vee\mathbf{y}) for all \mathbf{x},\mathbf{y}\in\mathbb{R}^{n}, and positively homogeneous if f(\lambda\mathbf{x})=\lambda f(\mathbf{x}) for all \lambda\geqslant 0 and \mathbf{x}\in\mathbb{R}^{n}.)

  1. (i)

    For all 𝐚A\mathbf{a}\in A,

    supF𝐗𝒟(F1,,Fn)supF11,,Fnnρh(f(𝐚,𝐗))=supF𝐗𝒟(F1,,Fn)supF11,,Fnnρh(f(𝐚,𝐗)).\sup_{F_{\mathbf{X}}\in\mathcal{D}(F_{1},\dots,F_{n})}\sup_{F_{1}\in\mathcal{F}_{1},\dots,F_{n}\in\mathcal{F}_{n}}\rho_{h}(f(\mathbf{a},\mathbf{X}))=\sup_{F_{\mathbf{X}}\in\mathcal{D}(F_{1},\dots,F_{n})}\sup_{F_{1}\in\mathcal{F}_{1},\dots,F_{n}\in\mathcal{F}_{n}}\rho_{h^{*}}(f(\mathbf{a},\mathbf{X})). (17)
  2. (ii)

    If the supremum on the right-hand side of (17) is attained by some F_{1}\in\mathcal{F}_{1},\dots,F_{n}\in\mathcal{F}_{n} and F\in\mathcal{D}(F_{1},\dots,F_{n}), then for all \mathbf{a}\in A, F_{1}^{\mathcal{I}_{h}},\dots,F_{n}^{\mathcal{I}_{h}} and a comonotonic random vector (X_{1}^{\mathcal{I}_{h}},\dots,X_{n}^{\mathcal{I}_{h}}) with X_{i}^{\mathcal{I}_{h}}\sim F_{i}^{\mathcal{I}_{h}}, i=1,\dots,n, attain the suprema on both sides of (17). (A random vector (X_{1},\dots,X_{n})\in(\mathcal{L}^{1})^{n} is called comonotonic if there exist a random variable Z\in\mathcal{X} and increasing functions f_{1},\dots,f_{n} on \mathbb{R} such that X_{i}=f_{i}(Z) almost surely for all i=1,\dots,n.)

Some examples of functions on n\mathbb{R}^{n} that are supermodular and positively homogeneous are given below. These functions are concave due to Theorem 3 of Marinacci and Montrucchio (2008).

Example 15 (Supermodular and positively homogeneous functions).

For nn\in\mathbb{N}, the following functions f:nf:\mathbb{R}^{n}\to\mathbb{R} are supermodular and positively homogeneous. Write 𝐱=(x1,,xn)n\mathbf{x}=(x_{1},\dots,x_{n})\in\mathbb{R}^{n}.

  1. (i)

    (Linear function) f:\mathbf{x}\mapsto\mathbf{a}^{\top}\mathbf{x} for \mathbf{a}\in\mathbb{R}^{n}. The function is increasing for \mathbf{a}\in\mathbb{R}_{+}^{n}.

  2. (ii)

    (Geometric mean) f:𝐱(i=1n|xi|)1/nf:\mathbf{x}\mapsto-(\prod^{n}_{i=1}|x_{i}|)^{1/n} on n\mathbb{R}_{-}^{n} for odd nn. The function is also increasing on n\mathbb{R}_{-}^{n}.

  3. (iii)

    (Negated p-norm) f:\mathbf{x}\mapsto-\|\mathbf{x}\|_{p} for p\geqslant 1. The function is increasing on \mathbb{R}_{-}^{n}; a numerical spot-check of its supermodularity is given after this list.

  4. (iv)

    (Sum of functions) f:𝐱i=1nfi(xi)f:\mathbf{x}\mapsto\sum^{n}_{i=1}f_{i}(x_{i}) for positively homogeneous functions f1,,fn:f_{1},\dots,f_{n}:\mathbb{R}\to\mathbb{R}. The function is increasing if f1,,fnf_{1},\dots,f_{n} are increasing.
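The supermodularity of item (iii), for example, can be spot-checked numerically; the sketch below (ours; a random search, not a proof) tests the defining inequality stated in Theorem 4 for the negated p-norm.

```python
import numpy as np

# Random spot-check (not a proof) of supermodularity for f(x) = -||x||_p:
# f(x) + f(y) <= f(x ^ y) + f(x v y), with componentwise min and max.
rng = np.random.default_rng(0)
f = lambda x, p=3: -np.linalg.norm(x, ord=p)
for _ in range(10_000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert f(x) + f(y) <= f(np.minimum(x, y)) + f(np.maximum(x, y)) + 1e-12
print("no counterexample found; f(2x) = 2 f(x) gives positive homogeneity")
```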

5 One-dimensional uncertainty set with moment constraints

A popular example of an uncertainty set that is closed under concentration for all intervals is the set of distributions with specified moment constraints in Example 6. We investigate this uncertainty set in detail and offer some general results in this section, which generalize several existing results; none of the existing results in the literature cover non-monotone and non-convex distortion functions. Non-monotone distortion functions create difficulties because of possible complications at their discontinuity points.

For p>1p>1, mm\in\mathbb{R} and v>0v>0, we recall the set of interest in Example 6:

(p,m,v)={FYp:𝔼[Y]=m,𝔼[|Ym|p]vp}.\mathcal{M}(p,m,v)=\{F_{Y}\in\mathcal{M}_{p}:\mathbb{E}[Y]=m,~{}\mathbb{E}[|Y-m|^{p}]\leqslant v^{p}\}.

Let q\in[1,\infty] be the Hölder conjugate of p, namely q=(1-1/p)^{-1}, or equivalently, 1/p+1/q=1. For h\in\mathcal{H}^{*} or h\in\mathcal{H}_{*}, we define

hxq=(01|h(t)x|qdt)1/q,q< and hx=maxt[0,1]|h(t)x|,x.\|h^{\prime}-x\|_{q}=\left(\int_{0}^{1}|h^{\prime}(t)-x|^{q}\,\mathrm{d}t\right)^{1/q},~{}q<\infty\mbox{~{}~{}and~{}~{}}\|h^{\prime}-x\|_{\infty}=\max_{t\in[0,1]}|h^{\prime}(t)-x|,~{}~{}x\in\mathbb{R}. (18)

We introduce the following quantities:

ch,q=argminxhxq and [h]q=minxhxq=hch,qq.c_{h,q}=\operatorname*{arg\,min}_{x\in\mathbb{R}}\|h^{\prime}-x\|_{q}\mbox{~{}~{}~{}and~{}~{}~{}}[h]_{q}=\min_{x\in\mathbb{R}}\|h^{\prime}-x\|_{q}=\|h^{\prime}-c_{h,q}\|_{q}.

We set [h]q=[h]_{q}=\infty if hh is not continuous. It is easy to verify that ch,qc_{h,q} is unique for q>1q>1. The quantity [h]q[h]_{q} may be interpreted as a qq-central norm of the function hh and ch,qc_{h,q} as its qq-center. Note that for q=2q=2 and hh continuous, [h]2=hh(1)2[h]_{2}=\|h^{\prime}-h(1)\|_{2} and ch,2=h(1)c_{h,2}=h(1). We also note that the optimization problem is trivial if [h]q=0[h]_{q}=0, which corresponds to the case that h=h(1)𝟙[0,1]h^{\prime}=h(1)\mathds{1}_{[0,1]} and ρh\rho_{h} is a linear functional, thus a multiple of the expectation. In this case, the supremum and infimum are attained by all random variables whose distributions are in (p,m,v)\mathcal{M}(p,m,v), and they are equal to mh(1)mh(1). Furthermore, for hh\in\mathcal{H}^{*} or hh\in\mathcal{H}_{*}, and q>1q>1, we define a function on [0,1][0,1] by

ϕhq(t)=|h(1t)ch,q|qh(1t)ch,q[h]q1qif h(1t)ch,q0, and ϕhq(t)=0 otherwise.\phi^{q}_{h}(t)=\frac{|h^{\prime}(1-t)-c_{h,q}|^{q}}{h^{\prime}(1-t)-c_{h,q}}[h]_{q}^{1-q}~{}~{}~{}\mbox{if~{}}h^{\prime}(1-t)-c_{h,q}\neq 0,\mbox{~{}~{} and $\phi^{q}_{h}(t)=0$ otherwise}.

In the case q=2, for t\in[0,1], \phi^{2}_{h}(t)=(h^{\prime}(1-t)-h(1))\|h^{\prime}-h(1)\|_{2}^{-1} if \|h^{\prime}-h(1)\|_{2}>0 and \phi^{2}_{h}(t)=0 otherwise. A small numerical sketch of these quantities is given below; we then summarize our findings in Theorem 5.
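As an illustration (ours; a grid-based approximation), the q-center and q-central norm can be computed numerically from the values of h'. For the distortion function of ES_alpha, h(t)=\min\{t/(1-\alpha),1\} in the usual convention, this should recover c_{h,2}=h(1)=1 and [h]_{2}=\sqrt{\alpha/(1-\alpha)}, consistent with Corollary 1 below.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Approximate q-center c_{h,q} and q-central norm [h]_q from grid values
# of h'; our own helper, using a plain Riemann sum on a uniform grid.
def q_center_norm(hprime, q=2.0):
    w = 1.0 / len(hprime)
    norm = lambda x: (np.sum(np.abs(hprime - x) ** q) * w) ** (1.0 / q)
    res = minimize_scalar(norm)    # the objective is convex in x
    return res.x, res.fun

# ES_alpha corresponds to h(t) = min{t/(1-alpha), 1} (usual convention), so
# h'(t) = 1/(1-alpha) on (0, 1-alpha) and 0 afterwards; expect c = h(1) = 1
# and [h]_2 = sqrt(alpha/(1-alpha)).
alpha = 0.95
t = (np.arange(200_000) + 0.5) / 200_000
hp = np.where(t < 1 - alpha, 1.0 / (1.0 - alpha), 0.0)
print(q_center_norm(hp), np.sqrt(alpha / (1 - alpha)))
```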

Theorem 5.

For any hh\in\mathcal{H}, mm\in\mathbb{R}, v>0v>0 and p>1p>1, we have

supFY(p,m,v)ρh(Y)=mh(1)+v[h]q and infFY(p,m,v)ρh(Y)=mh(1)v[h]q.\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\rho_{h}(Y)=mh(1)+v[h^{*}]_{q}\mbox{~{}~{}and~{}~{}}\inf_{F_{Y}\in\mathcal{M}(p,m,v)}\rho_{h}(Y)=mh(1)-v[h_{*}]_{q}. (19)

Moreover, if h=h^h=\hat{h}, 0<[h]q<0<[h_{*}]_{q}<\infty and 0<[h]q<0<[h^{*}]_{q}<\infty, then the supremum and infimum in (19) are attained by a random variable XX such that FX(p,m,v)F_{X}\in\mathcal{M}(p,m,v) with its quantile function uniquely specified as a.e. equal to m+vϕhqm+v\phi_{h^{*}}^{q} and mvϕhqm-v\phi_{h_{*}}^{q}, respectively.

The proof of Theorem 5 follows from a combination of Lemmas A.1 and A.2 in Appendix B.4 and Theorem 1. Note that for hh\in\mathcal{H}^{*} (resp. hh\in\mathcal{H}_{*}) and q>1q>1, ϕhq\phi^{q}_{h} is increasing (resp. decreasing) on [0,1][0,1]. Hence, ϕhq\phi^{q}_{h} (resp. ϕhq-\phi^{q}_{h}) in Theorem 5 indeed determines a quantile function.

The following proposition concerns the finiteness of ρh\rho_{h} on p\mathcal{L}^{p}.

Proposition 5.

For any hh\in\mathcal{H} and p[1,]p\in[1,\infty], ρh\rho_{h} is finite on p\mathcal{L}^{p} if [h]q<[h^{*}]_{q}<\infty and [h]q<[h_{*}]_{q}<\infty.

As a special case of Proposition 5, ρh\rho_{h} is always finite on 1\mathcal{L}^{1} if hh is convex or concave with bounded hh^{\prime} because [h]<[h^{*}]_{\infty}<\infty and [h]<[h_{*}]_{\infty}<\infty.

As a common example of the general result in Theorem 5, below we collect our findings for the case of VaR.

Corollary 1.

For α(0,1)\alpha\in(0,1), p>1p>1, mm\in\mathbb{R} and v>0v>0, we have

supFY(p,m,v)VaRα(Y)=maxFY(p,m,v)ESα(Y)=m+vα(αp(1α)+(1α)pα)1/p,\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\mathrm{VaR}_{\alpha}(Y)=\max_{F_{Y}\in\mathcal{M}(p,m,v)}\mathrm{ES}_{\alpha}(Y)=m+v\alpha\left(\alpha^{p}(1-\alpha)+(1-\alpha)^{p}\alpha\right)^{-1/p},

and

infFY(p,m,v)VaRα(Y)=minFY(p,m,v)ESαL(Y)=mv(1α)(αp(1α)+(1α)pα)1/p,\inf_{F_{Y}\in\mathcal{M}(p,m,v)}\mathrm{VaR}_{\alpha}(Y)=\min_{F_{Y}\in\mathcal{M}(p,m,v)}\mathrm{ES}^{L}_{\alpha}(Y)=m-v(1-\alpha)\left(\alpha^{p}(1-\alpha)+(1-\alpha)^{p}\alpha\right)^{-1/p},

where

ESαL(Y)=1α0αVaRt(Y)dt,Y1.\mathrm{ES}^{L}_{\alpha}(Y)=\frac{1}{\alpha}\int_{0}^{\alpha}\mathrm{VaR}_{t}(Y)\,\mathrm{d}t,~{}~{}Y\in\mathcal{L}^{1}.

We see from Theorem 5 that if h=h^h=\hat{h}, then the supremum and the infimum of ρh(Y)\rho_{h}(Y) over FY(p,m,v)F_{Y}\in\mathcal{M}(p,m,v) are always attainable. However, in case hh^h\neq\hat{h}, the supremum or infimum may no longer be attainable as a maximum or minimum. We illustrate this in Example 16 below.

Example 16 (VaR and ES, p=2p=2).

Take \alpha\in(0,1), p=2 and \rho_{h}=\mathrm{VaR}_{\alpha}, which implies \rho_{h^{*}}=\mathrm{ES}_{\alpha}. Corollary 1 gives \sup_{F_{Y}\in\mathcal{M}(2,m,v)}\mathrm{VaR}_{\alpha}(Y)=\sup_{F_{Y}\in\mathcal{M}(2,m,v)}\mathrm{ES}_{\alpha}(Y)=m+v\sqrt{\alpha/(1-\alpha)}. This is the well-known Cantelli-type formula for ES. By Lemma A.1, the unique left-quantile function of the random variable Z that attains the supremum of \mathrm{ES}_{\alpha} is given by F^{-1}_{Z}(t)=m+v(\mathds{1}_{(\alpha,1]}(t)/(1-\alpha)-1)\sqrt{(1-\alpha)/\alpha}, t\in[0,1] a.e. We thus have \mathrm{VaR}_{\alpha}(Z)=m-v\sqrt{(1-\alpha)/\alpha}, and hence Z does not attain \sup_{F_{Y}\in\mathcal{M}(2,m,v)}\mathrm{VaR}_{\alpha}(Y). It follows by the uniqueness of F_{Z} that the supremum of \mathrm{VaR}_{\alpha}(Y) over F_{Y}\in\mathcal{M}(2,m,v) cannot be attained. However, the supremum of \mathrm{VaR}^{+}_{\alpha} is attained by Z since \mathrm{VaR}^{+}_{\alpha}(Z)=m+v\sqrt{\alpha/(1-\alpha)}.
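The two-point distribution behind Example 16 can be verified directly; the following sketch (ours; with the illustrative choice m=0 and v=1) checks the moment constraints and the attained values.

```python
import numpy as np

# The quantile function in Example 16 describes a two-point distribution:
# Z = lo with probability alpha and Z = hi with probability 1 - alpha.
m, v, alpha = 0.0, 1.0, 0.95
lo = m - v * np.sqrt((1 - alpha) / alpha)
hi = m + v * np.sqrt(alpha / (1 - alpha))
print(alpha * lo + (1 - alpha) * hi)                        # mean = m
print(alpha * (lo - m) ** 2 + (1 - alpha) * (hi - m) ** 2)  # variance = v^2
# ES_alpha(Z) averages exactly the top (1 - alpha) mass, i.e. equals hi,
# while the left alpha-quantile VaR_alpha(Z) equals lo, as in the example.
print(hi, m + v * np.sqrt(alpha / (1 - alpha)), lo)
```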

Example 17 (Difference of two TK distortion riskmetrics).

Take p=2 and h=h_{1}-h_{2} to be the difference between two inverse-S-shaped functions in (10), with the same parameters as in Example 4 (\gamma_{1}=0.8, \gamma_{2}=0.7). By Theorem 5, the worst-case distortion riskmetrics under the uncertainty set \mathcal{M}(2,m,v) are given by \sup_{F_{Y}\in\mathcal{M}(2,m,v)}\rho_{h}(Y)=\sup_{F_{Y}\in\mathcal{M}(2,m,v)}\rho_{h^{*}}(Y)=0.3345v, and the unique left-quantile function of the random variable Z attaining both suprema is given by F^{-1}_{Z}(t)=m+2.9892\cdot h^{*\prime}(1-t)v, t\in[0,1] a.e. The worst-case values obtained above are independent of the mean m since h(1)=h_{1}(1)-h_{2}(1)=0, which is sensible because \rho_{h} and \rho_{h^{*}} only measure the disagreement between the two distortion riskmetrics. Similarly, we can calculate the infimum of \rho_{h}(Y) over F_{Y}\in\mathcal{M}(2,m,v), and thus obtain numerically the largest absolute difference between the two preferences represented by \rho_{h_{1}} and \rho_{h_{2}}.
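The constant 0.3345 can be reproduced numerically: on a grid, the concave envelope h^{*} is the upper convex hull of the graph of h, and since h^{*}(0)=h(0)=0 and h^{*}(1)=h(1)=0, we have c_{h^{*},2}=0 and [h^{*}]_{2}=\|h^{*\prime}\|_{2}. A minimal sketch (ours; assuming (10) is the standard Tversky and Kahneman (1992) weighting function) follows.

```python
import numpy as np

def tk(t, g):
    # Tversky-Kahneman probability weighting function, as in (10)
    return t ** g / (t ** g + (1 - t) ** g) ** (1 / g)

def concave_envelope(t, h):
    """Discretized concave envelope h^*: upper convex hull of the graph of h."""
    hull = [0]
    for i in range(1, len(t)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # drop b whenever it lies on or below the chord from a to i
            if (t[b] - t[a]) * (h[i] - h[a]) >= (h[b] - h[a]) * (t[i] - t[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(t, t[hull], h[hull])

t = np.linspace(0.0, 1.0, 20_001)
h = tk(t, 0.8) - tk(t, 0.7)                  # h = h_1 - h_2, with h(1) = 0
slopes = np.diff(concave_envelope(t, h)) / np.diff(t)
print(np.sqrt(np.mean(slopes ** 2)))         # [h^*]_2, expected close to 0.3345
```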

6 Related optimization problems

In this section, we discuss the applications of our main results to some related optimization problems commonly investigated in the literature by including the outer problem of (1).

6.1 Portfolio optimization

Our equivalence results can be applied to robust portfolio optimization problems. For an uncertainty set \widetilde{\mathcal{M}}\subset\mathcal{M}^{n}_{p} with p\in[1,\infty], let the random vector \mathbf{X}=(X_{1},\dots,X_{n})\sim F_{\mathbf{X}}\in\widetilde{\mathcal{M}} represent the random losses of n risky assets. For A\subset\mathbb{R}^{n}, let a vector \mathbf{a}\in A denote the amounts invested in each of the n risky assets. For a distortion function h\in\mathcal{H} and a distortion riskmetric \rho_{h}:\mathcal{L}^{p}\to\mathbb{R}, we aim to solve the robust portfolio optimization problem

min𝐚A(supF𝐗~ρh(𝐚𝐗)+β(𝐚)),\min_{\mathbf{a}\in A}\left(\sup_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\rho_{h}(\mathbf{a}^{\top}\mathbf{X})+\beta(\mathbf{a})\right), (20)

where \beta:\mathbb{R}^{n}\to\mathbb{R} is a penalty function of risk concentration. Note that \beta is irrelevant for the inner problem of (20). For a general non-concave h, there is no known algorithm to solve the inner problem of (20), and the outer optimization problem is also nontrivial in general. Therefore, we usually cannot obtain closed-form solutions of (20) using classical results on optimization problems for non-convex risk measures. However, the following proposition, whose proof follows directly from Theorems 1 and 3, converts (20) into an equivalent convex optimization problem that is much easier to solve.

Proposition 6.

Let hh\in\mathcal{H}, nn\in\mathbb{N}, AnA\subset\mathbb{R}^{n}, and ~1n\widetilde{\mathcal{M}}\subset\mathcal{M}^{n}_{1}.

  1. (i)

    If h=\hat{h} and the set \{F_{\mathbf{a}^{\top}\mathbf{X}}\in\mathcal{M}_{1}:F_{\mathbf{X}}\in\widetilde{\mathcal{M}}\} is closed under concentration within \mathcal{I}_{h} for all \mathbf{a}\in A, then

    min𝐚A(supF𝐗~ρh(𝐚𝐗)+β(𝐚))=min𝐚A(supF𝐗~ρh(𝐚𝐗)+β(𝐚)).\displaystyle\min_{\mathbf{a}\in A}\left(\sup_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\rho_{h}(\mathbf{a}^{\top}\mathbf{X})+\beta(\mathbf{a})\right)=\min_{\mathbf{a}\in A}\left(\sup_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\rho_{h^{*}}(\mathbf{a}^{\top}\mathbf{X})+\beta(\mathbf{a})\right). (21)
  2. (ii)

    If the set \{F_{\mathbf{a}^{\top}\mathbf{X}}\in\mathcal{M}_{1}:F_{\mathbf{X}}\in\widetilde{\mathcal{M}}\} is closed under concentration for all intervals for all \mathbf{a}\in A, then (21) holds.

  3. (iii)

    If ~\widetilde{\mathcal{M}} is closed under conditional expectation, then (21) holds.

6.2 Preference robust optimization

We are also able to solve the preference robust optimization problem with distributional uncertainty. For nn\in\mathbb{N}, an nn-dimensional action set AA, a set of plausible distributions ~1n\widetilde{\mathcal{M}}\subset\mathcal{M}^{n}_{1}, and a set of possible probability perceptions 𝒢\mathcal{G}\subset\mathcal{H}, the problem is formulated as follows:

min𝐚AsupF𝐗~suph𝒢ρh(f(𝐚,𝐗)).\min_{\mathbf{a}\in A}~{}\sup_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}~{}\sup_{h\in\mathcal{G}}\rho_{h}(f(\mathbf{a},\mathbf{X})). (22)

Preference robust optimization refers to the situation when the objective is not completely known, e.g., hh is in the set 𝒢\mathcal{G} but not identified. Therefore, optimization is performed under the worst-case preference in the set 𝒢\mathcal{G}. Also note that the form suph𝒢ρh\sup_{h\in\mathcal{G}}\rho_{h} includes (but is not limited to) all coherent risk measures via the representation of Kusuoka (2001). For the problem of (22) without distributional uncertainty (thus, only the minimum and the second supremum), see Delage and Li (2018). We have the following result whose proof follows from Theorems 1 and 3.

Proposition 7.

Let ~1n\widetilde{\mathcal{M}}\subset\mathcal{M}^{n}_{1} and AnA\subset\mathbb{R}^{n} with nn\in\mathbb{N}.

  1. (i)

    If h=h^h=\hat{h} and the set {Ff(𝐚,𝐗)1:F𝐗~}\{F_{f(\mathbf{a},\mathbf{X})}\in\mathcal{M}_{1}:F_{\mathbf{X}}\in\widetilde{\mathcal{M}}\} is closed under concentration within h\mathcal{I}_{h} for all 𝐚A\mathbf{a}\in A, then for all 𝒢\mathcal{G}\subset\mathcal{H},

    min𝐚AsupF𝐗~suph𝒢ρh(f(𝐚,𝐗))=min𝐚AsupF𝐗~suph𝒢ρh(f(𝐚,𝐗)).\min_{\mathbf{a}\in A}\sup_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\sup_{h\in\mathcal{G}}\rho_{h}(f(\mathbf{a},\mathbf{X}))=\min_{\mathbf{a}\in A}\sup_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\sup_{h\in\mathcal{G}}\rho_{h^{*}}(f(\mathbf{a},\mathbf{X})). (23)
  2. (ii)

    If the set {Ff(𝐚,𝐗)1:F𝐗~}\{F_{f(\mathbf{a},\mathbf{X})}\in\mathcal{M}_{1}:F_{\mathbf{X}}\in\widetilde{\mathcal{M}}\} is closed under concentration for all intervals for all 𝐚A\mathbf{a}\in A, then (23) holds for all 𝒢\mathcal{G}\subset\mathcal{H}.

  3. (iii)

    If 𝒢\mathcal{G} is a set of increasing functions in \mathcal{H}, f:A×nf:A\times\mathbb{R}^{n}\to\mathbb{R} is concave in the second component, and ~\widetilde{\mathcal{M}} is closed under conditional expectation, then (23) holds.

The preference robust optimization problem without distributional uncertainty (i.e., problem (22) with only the minimum and the second supremum) is generally difficult to solve when the distortion function h is not concave. However, when the distribution of the random variable is not completely known, we can transform the original non-convex problem into its convex counterpart using (23), provided that the set of plausible distributions is well structured.

7 Applications and numerical illustrations

Following the discussion in Section 6, we provide several applications of our theoretical results to portfolio management for specific sets of plausible distributions. None of the optimization problems considered in this section are convex, and we provide numerical calculations or approximations of their solutions. (The processors we use are Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz 2.59GHz (2 processors); the numerical results are calculated by MATLAB.)

7.1 Difference of risk measures under moment constraints

We demonstrate a price competition problem as an application of optimizing the difference between two risk measures, as in Example 17. Similarly to the portfolio management problem discussed in Section 6.1, we consider n risky assets with random losses X_{1},\dots,X_{n}\in\mathcal{L}^{2} that are only known to have a fixed mean and a constrained covariance. That is, we choose the set

~={F𝐗2n:𝔼[𝐗]=𝝁,var(𝐗)Σ},\widetilde{\mathcal{M}}=\{F_{\mathbf{X}}\in\mathcal{M}^{n}_{2}:\mathbb{E}[\mathbf{X}]=\bm{\mu},~{}\mathrm{var}(\mathbf{X})\preceq\Sigma\},

for 𝝁n\bm{\mu}\in\mathbb{R}^{n} and Σn×n\Sigma\in\mathbb{R}^{n\times n} positive semidefinite. For an nn-dimensional 𝐚A\mathbf{a}\in A, the set of all possible distributions of aggregate portfolio losses

{F𝐚𝐗2:F𝐗~}=mv(𝐚,𝝁,Σ)=(2,𝐚𝝁,(𝐚Σ𝐚)1/2)\{F_{\mathbf{a}^{\top}\mathbf{X}}\in\mathcal{M}_{2}:F_{\mathbf{X}}\in\widetilde{\mathcal{M}}\}=\mathcal{M}^{\mathrm{mv}}(\mathbf{a},\bm{\mu},\Sigma)=\mathcal{M}\left(2,\mathbf{a}^{\top}\bm{\mu},\left(\mathbf{a}^{\top}\Sigma\mathbf{a}\right)^{1/2}\right) (24)

is closed under concentration for all intervals as is shown in Example 6. Let ρh1:2\rho_{h_{1}}:\mathcal{L}^{2}\to\mathbb{R} be an investor’s own price of the portfolio, while ρh2:2\rho_{h_{2}}:\mathcal{L}^{2}\to\mathbb{R} is her opponent’s price of the same portfolio. We choose h1h_{1} and h2h_{2} to be the inverse-S-shaped distortion functions in (10), with parameters the same as those in Example 17 (γ1=0.8\gamma_{1}=0.8 and γ2=0.7\gamma_{2}=0.7). Write h=h1h2h=h_{1}-h_{2}. For an action set A={(a1,,an)[0,1]n:i=1nai=1}A=\{(a_{1},\dots,a_{n})\in[0,1]^{n}:\sum^{n}_{i=1}a_{i}=1\}, the investor chooses the optimal 𝐚A\mathbf{a}^{*}\in A, such that the worst-case overpricing from her opponent is minimized.

From the calculation in Example 17, we get

D(Σ)\displaystyle D(\Sigma) :=min𝐚AsupF𝐗~(ρh1(𝐚𝐗)ρh2(𝐚𝐗))\displaystyle:=\min_{\mathbf{a}\in A}\sup_{F_{\mathbf{X}}\in\widetilde{\mathcal{M}}}\left(\rho_{h_{1}}(\mathbf{a}^{\top}\mathbf{X})-\rho_{h_{2}}(\mathbf{a}^{\top}\mathbf{X})\right) (25)
=min𝐚AsupFYmv(𝐚,𝝁,Σ)ρh(Y)=0.3345×min𝐚A(𝐚Σ𝐚)1/2.\displaystyle=\min_{\mathbf{a}\in A}\sup_{F_{Y}\in\mathcal{M}^{\mathrm{mv}}(\mathbf{a},\bm{\mu},\Sigma)}\rho_{h^{*}}(Y)=0.3345\times\min_{\mathbf{a}\in A}\left(\mathbf{a}^{\top}\Sigma\mathbf{a}\right)^{1/2}.

We note that optimizing \rho_{h_{1}}-\rho_{h_{2}} is generally nontrivial, since the difference h_{1}-h_{2} between two distortion functions is not necessarily monotone, concave, or continuous, even though h_{1} and h_{2} themselves may have these properties. The generality of our equivalence result allows us to convert the original problem to the much simpler form (25), which can be solved efficiently. (The convex problem (25) is solved numerically by the constrained nonlinear multivariable function “fmincon” with the interior-point method.) Table 1 reports the optimal values of \mathbf{a}^{*} and D for different choices of \Sigma.

Table 1: Optimal results in (25) for difference between two TK distortion riskmetrics
nn Σ\Sigma 𝐚\mathbf{a}^{*} DD
33 (100010001)\left(\begin{matrix}1&0&0\\ 0&1&0\\ 0&0&1\end{matrix}\right) (0.333,0.333,0.333)(0.333,~{}0.333,~{}0.333) 0.1930.193
33 (210121012)\left(\begin{matrix}2&-1&0\\ -1&2&-1\\ 0&-1&2\end{matrix}\right) (0.300,0.400,0.300)(0.300,~{}0.400,~{}0.300) 0.1500.150
33 (111121113)\left(\begin{matrix}1&1&1\\ 1&2&1\\ 1&1&3\end{matrix}\right) (0.997,0.002,0.001)(0.997,~{}0.002,~{}0.001) 0.3350.335
55 (1000002000003000004000005)\left(\begin{matrix}1&0&0&0&0\\ 0&2&0&0&0\\ 0&0&3&0&0\\ 0&0&0&4&0\\ 0&0&0&0&5\end{matrix}\right) (0.438,0.219,0.146,0.110,0.088)(0.438,~{}0.219,~{}0.146,~{}0.110,~{}0.088) 0.2210.221
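As a sanity check, the rows of Table 1 can be reproduced with any off-the-shelf solver for the convex problem on the right of (25); the sketch below (ours; scipy's SLSQP standing in for MATLAB's fmincon) treats the second row.

```python
import numpy as np
from scipy.optimize import minimize

# Second row of Table 1: D = 0.3345 * min_{a in A} sqrt(a' Sigma a) over the
# probability simplex A, cf. (25).
Sigma = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
n = len(Sigma)
res = minimize(lambda a: np.sqrt(a @ Sigma @ a),
               np.full(n, 1.0 / n),
               method="SLSQP",
               bounds=[(0.0, 1.0)] * n,
               constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0})
print(res.x, 0.3345 * res.fun)   # expect roughly (0.300, 0.400, 0.300) and 0.150
```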

7.2 Preference robust portfolio optimization with moment constraints

Next, we discuss an example of preference robust optimization with distributional uncertainty using the results in Section 5. Similarly to Section 7.1, we consider the set of plausible aggregate portfolio loss distributions

mv(𝐚,𝝁,Σ)={F𝐚𝐗2:F𝐗2n,𝔼[𝐗]=𝝁,var(𝐗)Σ}\mathcal{M}^{\mathrm{mv}}(\mathbf{a},\bm{\mu},\Sigma)=\{F_{\mathbf{a}^{\top}\mathbf{X}}\in\mathcal{M}_{2}:F_{\mathbf{X}}\in\mathcal{M}^{n}_{2},~{}\mathbb{E}[\mathbf{X}]=\bm{\mu},~{}\mathrm{var}(\mathbf{X})\preceq\Sigma\}

and the action set A=\{(a_{1},\dots,a_{n})\in[0,1]^{n}:\sum^{n}_{i=1}a_{i}=1\} representing the weights the investor assigns to each random loss. The investor uses TK distortion riskmetrics; however, she is not certain about the parameter \gamma of the distortion function h. Thus, the investor considers the set of TK distortion riskmetrics with distortion functions in

𝒢={h:h=hγ,γ[0.5,0.9]},\mathcal{G}=\left\{h\in\mathcal{H}:h=h^{\gamma},~{}\gamma\in[0.5,0.9]\right\},

which is approximately the two-sigma confidence interval of \gamma in Wu and Gonzalez (1996). (The aggregate least-square estimate of \gamma in Section 5 of Wu and Gonzalez (1996) is 0.71 with standard deviation 0.1.) Therefore, the investor aims to find an optimal portfolio given the uncertainty in the riskmetrics. To penalize deviations from the benchmark parameter \gamma=0.71 (Wu and Gonzalez, 1996), the investor uses the penalty term \mathrm{e}^{c(\gamma-0.71)^{2}} for some c\geqslant 0. Since the set \mathcal{M}^{\mathrm{mv}}(\mathbf{a},\bm{\mu},\Sigma) is closed under concentration for all intervals for all \mathbf{a}\in A, Proposition 7, (24), and Theorem 5 lead to

V(𝝁,Σ)\displaystyle V(\bm{\mu},\Sigma) :=min𝐚AsupFYmv(𝐚,𝝁,Σ)supγ[0.5,0.9](ρhγ(Y)ec(γ0.71)2)\displaystyle:=\min_{\mathbf{a}\in A}\sup_{F_{Y}\in\mathcal{M}^{\mathrm{mv}}(\mathbf{a},\bm{\mu},\Sigma)}\sup_{\gamma\in[0.5,0.9]}\left(\rho_{h^{\gamma}}(Y)-\mathrm{e}^{c(\gamma-0.71)^{2}}\right) (26)
=min𝐚AsupFY(2,𝐚𝝁,(𝐚Σ𝐚)1/2)supγ[0.5,0.9](ρ(hγ)(Y)ec(γ0.71)2)\displaystyle=\min_{\mathbf{a}\in A}\sup_{F_{Y}\in\mathcal{M}\left(2,\mathbf{a}^{\top}\bm{\mu},\left(\mathbf{a}^{\top}\Sigma\mathbf{a}\right)^{1/2}\right)}\sup_{\gamma\in[0.5,0.9]}\left(\rho_{(h^{\gamma})^{*}}(Y)-\mathrm{e}^{c(\gamma-0.71)^{2}}\right)
=min𝐚Asupγ[0.5,0.9](𝐚𝝁+(𝐚Σ𝐚)1/2[(hγ)]2ec(γ0.71)2).\displaystyle=\min_{\mathbf{a}\in A}\sup_{\gamma\in[0.5,0.9]}\left(\mathbf{a}^{\top}\bm{\mu}+\left(\mathbf{a}^{\top}\Sigma\mathbf{a}\right)^{1/2}\,[(h^{\gamma})^{*}]_{2}-\mathrm{e}^{c(\gamma-0.71)^{2}}\right).

We calculate the optimal values V for different choices of the parameters (n, c, \bm{\mu} and \Sigma) and report them in Table 2, where \mathbf{a}^{*} and \hat{\gamma} denote the optimal weights and the optimal parameter of the inverse-S-shaped distortion function, respectively. Note that the last optimization problem in (26) can be solved numerically. (The problem (26) is solved numerically by the constrained nonlinear multivariable function “fmincon” with the interior-point method.)

Table 2: Optimal values in (26) for TK distortion riskmetrics
nn cc 𝝁\bm{\mu} Σ\Sigma 𝐚\mathbf{a}^{*} γ^\hat{\gamma} VV
33 0 (1,1,1)(1,1,1) (100010001)\left(\begin{matrix}1&0&0\\ 0&1&0\\ 0&0&1\end{matrix}\right) (0.333,0.333,0.333)(0.333,~{}0.333,~{}0.333) 0.6100.610 1.411.41
33 3030 (2,1,1)(2,1,1) (100010001)\left(\begin{matrix}1&0&0\\ 0&1&0\\ 0&0&1\end{matrix}\right) (0.000,0.500,0.500)(0.000,~{}0.500,~{}0.500) 0.6760.676 1.291.29
33 3030 (1,1,1)(1,1,1) (210121012)\left(\begin{matrix}2&-1&0\\ -1&2&-1\\ 0&-1&2\end{matrix}\right) (0.300,0.400,0.300)(0.300,~{}0.400,~{}0.300) 0.6900.690 1.171.17
33 3030 (1.2,1,1)(1.2,1,1) (111121113)\left(\begin{matrix}1&1&1\\ 1&2&1\\ 1&1&3\end{matrix}\right) (0.500,0.331,0.168)(0.500,~{}0.331,~{}0.168) 0.6300.630 1.571.57
55 3030 (1,1,1,1,1)(1,1,1,1,1) (1000002000003000004000005)\left(\begin{matrix}1&0&0&0&0\\ 0&2&0&0&0\\ 0&0&3&0&0\\ 0&0&0&4&0\\ 0&0&0&0&5\end{matrix}\right) (0.438,0.219,0.146,0.110,0.088)(0.438,~{}0.219,~{}0.146,~{}0.110,~{}0.088) 0.6780.678 1.261.26
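A compact way to evaluate (26) is to precompute [(h^{\gamma})^{*}]_{2} on a grid of \gamma and then solve the outer problem; the following sketch (ours; reusing the tk and concave_envelope helpers from the sketch after Example 17, and coarse enough that only rough agreement with Table 2 should be expected) treats the third row of Table 2.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of (26) for the third row of Table 2: c = 30, mu = (1, 1, 1) and the
# tridiagonal Sigma; tk() and concave_envelope() are the helpers defined in
# the sketch after Example 17.
t = np.linspace(0.0, 1.0, 20_001)

def q_norm2(g):
    # [(h^gamma)^*]_2 on the grid; h^gamma(1) = 1 gives c_{h,2} = 1, hence
    # [h^*]_2^2 = int (h^{*'})^2 dt - 1
    s = np.diff(concave_envelope(t, tk(t, g))) / np.diff(t)
    return np.sqrt(np.mean(s ** 2) - 1.0)

gammas = np.linspace(0.5, 0.9, 41)
norms = np.array([q_norm2(g) for g in gammas])
pen = np.exp(30.0 * (gammas - 0.71) ** 2)
mu = np.ones(3)
Sigma = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
outer = lambda a: np.max(a @ mu + np.sqrt(a @ Sigma @ a) * norms - pen)
res = minimize(outer, np.full(3, 1.0 / 3), method="SLSQP",
               bounds=[(0.0, 1.0)] * 3,
               constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0})
print(res.x, res.fun)    # Table 2 reports (0.300, 0.400, 0.300) and V = 1.17
```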

7.3 Portfolio optimization with marginal constraints

A special case of the portfolio optimization problem introduced in Section 6.1, which is of interest in robust risk aggregation (see e.g., Blanchet et al. (2020)), is to take ~\widetilde{\mathcal{M}} to be the Fréchet class defined as

~(F1,,Fn)={F𝐗1n:XiFi,i=1,,n},\widetilde{\mathcal{M}}(F_{1},\dots,F_{n})=\{F_{\mathbf{X}}\in\mathcal{M}^{n}_{1}:X_{i}\sim F_{i},~{}i=1,\dots,n\}, (27)

for some known marginal distributions F1,,Fn1F_{1},\dots,F_{n}\in\mathcal{M}_{1}. In this case, although the left-hand side of (21) is generally difficult to solve, for A+nA\subset\mathbb{R}^{n}_{+}, the right-hand side of (21) can be rewritten using convexity and comonotonicity as

min𝐚A(𝐚(ρh(X1),,ρh(Xn))+β(𝐚)),\min_{\mathbf{a}\in A}\left(\mathbf{a}^{\top}({\rho}_{h^{*}}(X_{1}),\dots,{\rho}_{h^{*}}(X_{n}))+\beta(\mathbf{a})\right), (28)

where XiFiX_{i}\sim F_{i}, i=1,,ni=1,\dots,n. We see that (28) is a linear optimization problem with a penalty β\beta, which often admits closed-form solutions when β\beta is properly chosen. For any given 𝐚A\mathbf{a}\in A, we define

(𝐚,F1,,Fn)={F𝐚𝐗1:XiFi,i=1,,n}.\mathcal{M}(\mathbf{a},F_{1},\dots,F_{n})=\{F_{\mathbf{a}^{\top}\mathbf{X}}\in\mathcal{M}_{1}:X_{i}\sim F_{i},~{}i=1,\dots,n\}. (29)

The set \mathcal{M}(\mathbf{a},F_{1},\dots,F_{n}) is the weighted version of \mathcal{M}^{S}(F_{1},\dots,F_{n}) in Example 6. Note that \mathcal{M}(\mathbf{a},F_{1},\dots,F_{n}) is generally neither closed under concentration for all intervals nor closed under conditional expectation. However, \mathcal{M}(\mathbf{a},F_{1},\dots,F_{n}) is asymptotically (for large n) similar to a set of distributions closed under concentration for all intervals; see Theorem 3.5 of Mao and Wang (2015) for a precise statement in the case of equal weights and identical marginal distributions. Therefore, even though \mathcal{M}(\mathbf{a},F_{1},\dots,F_{n}) is not closed under concentration for all intervals for some \mathbf{a}\in A, the solution of (28) provides a good approximation of the original problem for large n. Such asymptotic equivalence between worst-case riskmetrics of aggregate risks with equal weights has been well studied in the literature; see e.g., Theorem 3.3 of Embrechts et al. (2015) for the \mathrm{VaR}/\mathrm{ES} pair and Theorem 3.5 of Cai et al. (2018) for distortion risk measures.

We conduct numerical calculations to illustrate the equivalence between both sides of (21). We choose the action set A_{a,b}=\{(x_{1},\dots,x_{n})\in[a,b]^{n}:\sum^{n}_{i=1}x_{i}=1\} for 0\leqslant a<1/n<b\leqslant 1, and the penalty function \beta to be the \mathcal{L}^{2}-norm multiplied by a scalar c\geqslant 0, namely c\|\cdot\|_{2}, where the scalar c is a tuning parameter of the \mathcal{L}^{2} penalty. We first solve the optimization problems separately for the well-known VaR/ES pair at the level 0.95. Specifically, the two problems are given by

VVaR(a,b,F1,,Fn)\displaystyle V_{\mathrm{VaR}}(a,b,F_{1},\dots,F_{n}) =min𝐚Aa,b(supF𝐗(F1,,Fn)VaR0.95(𝐚𝐗)+c𝐚2),\displaystyle=\min_{\mathbf{a}\in A_{a,b}}\left(\sup_{F_{\mathbf{X}}\in\mathcal{M}(F_{1},\dots,F_{n})}\mathrm{VaR}_{0.95}(\mathbf{a}^{\top}\mathbf{X})+c\|\mathbf{a}\|_{2}\right), (30)
VES(a,b,F1,,Fn)\displaystyle V_{\mathrm{ES}}(a,b,F_{1},\dots,F_{n}) =min𝐚Aa,b(supF𝐗(F1,,Fn)ES0.95(𝐚𝐗)+c𝐚2)\displaystyle=\min_{\mathbf{a}\in A_{a,b}}\left(\sup_{F_{\mathbf{X}}\in\mathcal{M}(F_{1},\dots,F_{n})}\mathrm{ES}_{0.95}(\mathbf{a}^{\top}\mathbf{X})+c\|\mathbf{a}\|_{2}\right)
=min𝐚Aa,b(𝐚(ES0.95(F1),,ES0.95(Fn))+c𝐚2),\displaystyle=\min_{\mathbf{a}\in A_{a,b}}\left(\mathbf{a}^{\top}(\mathrm{ES}_{0.95}(F_{1}),\dots,\mathrm{ES}_{0.95}(F_{n}))+c\|\mathbf{a}\|_{2}\right), (31)

where the true value of the original inner VaR problem is approximated by the rearrangement algorithm (RA) of Puccetti and Rüschendorf (2012) and Embrechts et al. (2013), whereas the optimal value of the inner ES problem is obtained by minimizing the sum of a linear combination of ES values and the 2-norm of the vector \mathbf{a}, which can be done efficiently. (The outer problems of (30) and (31) are solved numerically by the constrained nonlinear multivariable function “fmincon” with the sequential quadratic programming (SQP) algorithm; the same method is also applied when solving the outer problems of (32) and (33).) In particular, if the marginals of the random losses are identical (i.e., F_{1}=\cdots=F_{n}=F), the optimal solution is \mathbf{a}^{*}=(1/n,\dots,1/n) and V_{\mathrm{ES}}(a,b,F_{1},\dots,F_{n})=\mathrm{ES}_{0.95}(F)+c/\sqrt{n}. We consider the following marginal distributions:

  1. (i)

    FiF_{i} follows a Pareto distribution with scale parameter 11 and shape parameter 3+(i1)/(n1)3+(i-1)/(n-1) for i=1,,ni=1,\dots,n;

  2. (ii)

    FiF_{i} is normally distributed with parameters N(1,1+(i1)/(n1))\mathrm{N}(1,1+(i-1)/(n-1)), for i=1,,ni=1,\dots,n;

  3. (iii)

    FiF_{i} follows an exponential distribution with parameter 1+(i1)/(n1)1+(i-1)/(n-1), for i=1,,ni=1,\dots,n.

We choose nn to be 33, 1010, and 2020. For comparison, we calculate the value nΔ𝐚2n\|\Delta\mathbf{a}^{*}\|_{2}, where Δ𝐚\Delta\mathbf{a}^{*} is the difference between the optimal weights of the non-convex problem and the convex problem. In addition, we calculate the absolute differences between the optimal values obtained by the two problems, ΔV=VESVVaR0\Delta V=V_{\mathrm{ES}}-V_{\mathrm{VaR}}\geqslant 0, and the percentage differences ΔV/VVaR\Delta V/V_{\mathrm{VaR}}. Tables 3 and 4 show the numerical results that compare both optimization problems with two choices of the action sets Aa,bA_{a,b}. The computation time is reported (in seconds). We observe that the optimal values obtained in the two problems get closer and become approximately the same as nn gets larger. As explained before, this is because the set of plausible distributions (F1,,Fn)\mathcal{M}(F_{1},\dots,F_{n}) is asymptotically equal to a set closed under concentration for all intervals.
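On the convex side, the inner ES problem in (31) is explicit once the marginal ES values are known; the sketch below (ours) treats the exponential marginals of case (iii), for which the ES of an exponential distribution with rate \lambda equals (1-\ln(1-\alpha))/\lambda in closed form, with n=10 and c=4 as in Table 3.

```python
import numpy as np
from scipy.optimize import minimize

# Convex problem (31) for the exponential marginals of case (iii), n = 10,
# c = 4 and A_{0,1}: ES_{0.95}(F_i) = (1 - ln(0.05)) / lambda_i in closed form.
n, c, alpha = 10, 4.0, 0.95
rates = 1.0 + np.arange(n) / (n - 1)
es = (1.0 - np.log(1.0 - alpha)) / rates
obj = lambda a: a @ es + c * np.linalg.norm(a)
res = minimize(obj, np.full(n, 1.0 / n), method="SLSQP",
               bounds=[(0.0, 1.0)] * n,
               constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0})
print(res.x, res.fun)   # compare with the V_ES entry of row (iii), n = 10, Table 3
```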

Next, we consider a TK distortion riskmetric with parameter γ=0.7\gamma=0.7. Due to the non-concavity of hh, there are no known ways of directly solving the non-convex optimization problem

min𝐚Aa,b(supF𝐗(F1,,Fn)ρh(𝐚𝐗)+c𝐚2).\min_{\mathbf{a}\in A_{a,b}}\left(\sup_{F_{\mathbf{X}}\in\mathcal{M}(F_{1},\dots,F_{n})}\rho_{h}(\mathbf{a}^{\top}\mathbf{X})+c\|\mathbf{a}\|_{2}\right). (32)

We may approximate (32) from below by evaluating \rho_{h} under the dependence structure produced by the rearrangement algorithm (RA); for simplicity, we denote this lower bound by V_{h}. (Such a dependence structure obviously provides a lower bound for the worst-case value in (32); in theory, the result from the RA is thus not an optimal dependence structure for (32). In our numerical results, this lower bound is very close to an upper bound only in the case of VaR and ES, but not in the case of TK distortion riskmetrics.) On the other hand, by (21), the convex counterpart of (32) can be written (using Theorem 1) as

Vh(a,b,F1,,Fn)\displaystyle V_{h^{*}}(a,b,F_{1},\dots,F_{n}) =min𝐚Aa,b(supF𝐗(F1,,Fn)ρh(𝐚𝐗)+c𝐚2)\displaystyle=\min_{\mathbf{a}\in A_{a,b}}\left(\sup_{F_{\mathbf{X}}\in\mathcal{M}(F_{1},\dots,F_{n})}\rho_{h^{*}}(\mathbf{a}^{\top}\mathbf{X})+c\|\mathbf{a}\|_{2}\right) (33)
=min𝐚Aa,b(𝐚(ρh(X1),,ρh(Xn))+c𝐚2),\displaystyle=\min_{\mathbf{a}\in A_{a,b}}\left(\mathbf{a}^{\top}({\rho}_{h^{*}}(X_{1}),\dots,{\rho}_{h^{*}}(X_{n}))+c\|\mathbf{a}\|_{2}\right),

where X_{i}\sim F_{i} for i=1,\dots,n. We calculate the absolute differences between the optimal values of the convex and non-convex problems, \Delta V=V_{h^{*}}-V_{h}\geqslant 0, and the percentage differences \Delta V/V_{h}. Tables 5 and 6 compare the numerical results of the two optimization problems for different choices of A_{a,b}. We observe that the percentage differences between the RA lower bound V_{h} for the non-convex problem (32) and the minimum value V_{h^{*}} of the convex problem (33) are roughly between 10\% and 20\%. Note that the RA lower bound is not expected to be very close to the true minimum of (32), and hence the differences between the solution of (32) and the optimal value of (33) are smaller than the observed numbers.
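For reference, the following is a compact sketch (ours; simplified discretization and stopping rule) of the rearrangement algorithm in the worst-case VaR form of Embrechts et al. (2013): each marginal quantile function is discretized on the tail (\alpha,1), and every column is iteratively rearranged antimonotonically to the sum of the other columns.

```python
import numpy as np

# Minimal RA sketch: approximate the worst-case VaR_alpha of X_1 + ... + X_n
# with given marginal quantile functions, by flattening the row sums of the
# discretized tail and reading off their minimum.
def worst_var_ra(quantile_funcs, alpha=0.95, N=10_000, max_iter=50):
    u = alpha + (1.0 - alpha) * (np.arange(N) + 0.5) / N  # tail grid on (alpha, 1)
    X = np.column_stack([q(u) for q in quantile_funcs])   # comonotonic start
    for _ in range(max_iter):
        changed = False
        for j in range(X.shape[1]):
            rest = X.sum(axis=1) - X[:, j]
            # sort column j in the opposite order of the sum of the others
            new = np.sort(X[:, j])[np.argsort(np.argsort(-rest))]
            if not np.array_equal(new, X[:, j]):
                X[:, j], changed = new, True
        if not changed:
            break
    return X.sum(axis=1).min()   # approximate worst-case VaR_alpha of the sum

# example: the three exponential marginals of case (iii) with n = 3
qfs = [lambda u, r=r: -np.log(1.0 - u) / r for r in (1.0, 1.5, 2.0)]
print(worst_var_ra(qfs))
```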

Note that, by transforming a non-convex optimization problem to a convex one, we significantly reduce the computational time of calculating bounds with negligible errors, as shown in Tables 3-6.

Table 3: Comparison of the numerical results of the two optimization problems (30) and (31) for VaR0.95\mathrm{VaR}_{0.95} and ES0.95\mathrm{ES}_{0.95} with a=0a=0 and b=1b=1
cc VVaRV_{\mathrm{VaR}} time VESV_{\mathrm{ES}} time nΔ𝐚2n\|\Delta\mathbf{a}^{*}\|_{2} ΔV\Delta V ΔV/VVaR{\Delta V}/{V_{\mathrm{VaR}}} (%)
(i) Pareto n=3n=\phantom{0}3 2.52.5 3.5473.547 31.53\phantom{0}31.53 3.7413.741 0.720.72 8.88×1058.88\times 10^{-5} 0.1940.194\phantom{0} 5.485.48\phantom{0}
n=10n=10 3.03.0 3.1973.197 153.83153.83 3.2153.215 1.391.39 9.18×1049.18\times 10^{-4} 0.01780.0178 0.5580.558
n=20n=20 4.04.0 3.1563.156 424.17424.17 3.1593.159 9.379.37 3.53×1053.53\times 10^{-5} 2.68×1032.68\times 10^{-3} 0.0850\phantom{0}0.0850
(ii) Normal n=3n=\phantom{0}3 4.04.0 5.7665.766 31.19\phantom{0}31.19 5.7855.785 0.180.18 1.39×1031.39\times 10^{-3} 0.01860.0186 0.3230.323
n=10n=10 2.02.0 4.0824.082 97.30\phantom{0}97.30 4.0834.083 0.770.77 1.18×1031.18\times 10^{-3} 3.24×1053.24\times 10^{-5} 7.93×1047.93\times 10^{-4}
n=20n=20 3.03.0 4.1324.132 431.79431.79 4.1324.132 4.664.66 2.69×1032.69\times 10^{-3} 1.88×1051.88\times 10^{-5} 4.55×1044.55\times 10^{-4}
(iii) Exp n=3n=\phantom{0}3 3.03.0 4.2514.251 26.78\phantom{0}26.78 4.4054.405 0.070.07 0.3310.331 0.1550.155\phantom{0} 3.643.64\phantom{0}
n=10n=10 4.04.0 3.8923.892 118.23118.23 3.8933.893 0.500.50 9.74×1049.74\times 10^{-4} 2.92×1042.92\times 10^{-4} 7.52×1037.52\times 10^{-3}
n=20n=20 7.07.0 4.2304.230 543.03543.03 4.2304.230 3.473.47 3.08×1043.08\times 10^{-4} 4.47×1054.47\times 10^{-5} 1.06×1031.06\times 10^{-3}
Table 4: Comparison of the numerical results of the two optimization problems (30) and (31) for VaR0.95\mathrm{VaR}_{0.95} and ES0.95\mathrm{ES}_{0.95} with a=1/(2n)a=1/(2n) and b=2/nb=2/n
cc VVaRV_{\mathrm{VaR}} time VESV_{\mathrm{ES}} time nΔ𝐚2n\|\Delta\mathbf{a}^{*}\|_{2} ΔV\Delta V ΔV/VVaR{\Delta V}/{V_{\mathrm{VaR}}} (%)
(i) Pareto n=3n=\phantom{0}3 2.52.5 3.5463.546 54.59\phantom{0}54.59 3.7413.741 0.190.19 6.58×1046.58\times 10^{-4} 0.1940.194\phantom{0} 5.485.48\phantom{0}
n=10n=10 3.03.0 3.2043.204 146.63146.63 3.2203.220 1.601.60 1.99×1041.99\times 10^{-4} 0.01600.0160 0.4980.498
n=20n=20 4.04.0 3.1623.162 847.13847.13 3.1633.163 10.0810.08\phantom{0} 1.69×1031.69\times 10^{-3} 2.23×1032.23\times 10^{-3} 0.0706\phantom{0}0.0706
(ii) Normal n=3n=\phantom{0}3 4.04.0 5.7665.766 57.31\phantom{0}57.31 5.7855.785 0.190.19 1.32×1031.32\times 10^{-3} 0.01870.0187 0.3240.324
n=10n=10 2.02.0 4.0844.084 166.25166.25 4.0844.084 0.790.79 0 2.94×1052.94\times 10^{-5} 7.20×1047.20\times 10^{-4}
n=20n=20 3.03.0 4.1334.133 691.91691.91 4.1334.133 5.915.91 0 1.99×1051.99\times 10^{-5} 4.82×1044.82\times 10^{-4}
(iii) Exp n=3n=\phantom{0}3 3.03.0 4.3694.369 48.58\phantom{0}48.58 4.4224.422 0.090.09 1.04×1031.04\times 10^{-3} 0.05330.0533 1.221.22\phantom{0}
n=10n=10 4.04.0 3.9163.916 115.18115.18 3.9163.916 0.500.50 2.54×1052.54\times 10^{-5} 1.38×1041.38\times 10^{-4} 3.52×1033.52\times 10^{-3}
n=20n=20 7.07.0 4.2364.236 665.05665.05 4.2364.236 3.483.48 2.73×1042.73\times 10^{-4} 4.04×1054.04\times 10^{-5} 9.54×1049.54\times 10^{-4}
Table 5: Comparison of the numerical results of the two optimization problems (32) and (33) for TK distortion riskmetrics with a=0a=0 and b=1b=1
cc VhV_{h} time VhV_{h^{*}} time nΔ𝐚2n\|\Delta\mathbf{a}^{*}\|_{2} ΔV\Delta V ΔV/Vh{\Delta V}/{V_{h}} (%)
(i) Pareto n=3n=\phantom{0}3 1.01.0 1.0761.076 144.75144.75 1.1851.185 0.230.23 0.4880.488 0.1090.109 10.210.2
n=10n=10 2.02.0 1.0471.047 220.03220.03 1.2371.237 1.421.42 0 0.1900.190 18.118.1
n=20n=20 4.04.0 1.3011.301 826.64826.64 1.5011.501 8.248.24 0 0.2000.200 15.415.4
(ii) Normal n=3n=\phantom{0}3 0.50.5 1.2401.240 60.76\phantom{0}60.76 1.4931.493 0.160.16 0.0784\phantom{0}0.0784 0.2530.253 20.420.4
n=10n=10 0.50.5 1.1411.141 246.31246.31 1.3631.363 0.720.72 1.281.28\phantom{0} 0.2220.222 19.419.4
n=20n=20 0.50.5 1.1031.103 1503.351503.35\phantom{0} 1.3161.316 2.802.80 1.781.78\phantom{0} 0.2130.213 19.319.3
(iii) Exp n=3n=\phantom{0}3 1.01.0 1.3051.305 49.79\phantom{0}49.79 1.4271.427 0.230.23 0.3600.360 0.1220.122 9.329.32
n=10n=10 2.02.0 1.3131.313 198.43198.43 1.4841.484 1.621.62 0.1840.184 0.1710.171 13.013.0
n=20n=20 2.02.0 1.1201.120 850.12850.12 1.2861.286 10.9110.91\phantom{0} 0.1580.158 0.1660.166 14.814.8
Table 6: Comparison of the numerical results of the two optimization problems (32) and (33) for TK distortion riskmetrics with a=1/(2n)a=1/(2n) and b=2/nb=2/n
cc VhV_{h} time VhV_{h^{*}} time nΔ𝐚2n\|\Delta\mathbf{a}^{*}\|_{2} ΔV\Delta V ΔV/Vh{\Delta V}/{V_{h}} (%)
(i) Pareto n=3n=\phantom{0}3 1.01.0 1.0771.077 73.21\phantom{0}73.21 1.1851.185 0.250.25 0.4690.469 0.1090.109 10.11\phantom{0}10.11
n=10n=10 2.02.0 1.0471.047 248.38248.38 1.2371.237 2.292.29 0.3780.378 0.1910.191 18.218.2
n=20n=20 4.04.0 1.3011.301 638.24638.24 1.5011.501 12.2112.21\phantom{0} 0 0.2000.200 15.415.4
(ii) Normal n=3n=\phantom{0}3 0.50.5 1.2401.240 179.68179.68 1.4931.493 0.190.19 0.0784\phantom{0}0.0784 0.2530.253 20.420.4
n=10n=10 0.50.5 1.1461.146 389.97389.97 1.3631.363 0.760.76 0.6600.660 0.2170.217 19.019.0
n=20n=20 0.50.5 1.1031.103 1563.841563.84\phantom{0} 1.3161.316 3.393.39 1.631.63\phantom{0} 0.2130.213 19.319.3
(iii) Exp n=3n=\phantom{0}3 1.01.0 1.3041.304 52.66\phantom{0}52.66 1.4301.430 0.250.25 0.1070.107 0.1260.126 9.659.65
n=10n=10 2.02.0 1.3121.312 236.15236.15 1.4851.485 2.272.27 0.2140.214 0.1720.172 13.113.1
n=20n=20 2.02.0 1.1191.119 879.73879.73 1.2891.289 10.1010.10\phantom{0} 0.1410.141 0.1700.170 15.215.2

8 Concluding remarks

We introduced the new concept of closedness under concentration, which, in the context of distributional uncertainty, is a sufficient condition for transforming an optimization problem with a non-convex distortion riskmetric into its convex counterpart. This concept is genuinely weaker than closedness under conditional expectation, and our main result unifies and improves many existing results in the literature. Many sets of plausible distributions commonly used in finance, optimization, and risk management are closed under concentration within some \mathcal{I}. Moreover, by focusing on distortion riskmetrics whose distortion functions are not necessarily monotone, concave, or continuous, we are able to solve optimization problems for a class of functionals larger than that of classical risk measures or deviation measures. In particular, we are able to obtain bounds on differences between two distortion riskmetrics, which represent measures of disagreement between two utilities or risk attitudes. Our results can also be applied to the popular problem of optimizing risk measures under moment constraints; in particular, we obtain the worst- and best-case distortion riskmetrics when the underlying random variable has a fixed mean and bounded p-th moment.

We demonstrate the applicability of our results by numerically solving three problems: optimizing the difference between risk measures, preference robust optimization, and portfolio optimization under marginal constraints. In all numerical examples, the original non-convex problem is converted to, or well approximated by, a convex one that can be solved efficiently.

Our condition of closedness under concentration within \mathcal{I} in Theorem 1 is sufficient but not necessary for the equivalence of a non-convex and a convex optimization problem under distributional uncertainty. A necessary condition of the equivalence is closedness under concentration of the set of maximizers in Theorem 2. An open question is to find a necessary and sufficient condition on the uncertainty set \mathcal{M} itself such that the desired equivalence holds. Pinning down such a condition may facilitate many more applications in decision theory, finance, game theory, and operations research.

Acknowledgments

The authors would like to thank anonymous referees for their constructive comments enhancing the paper. SMP would like to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada with funding reference numbers DGECR-2020-00333 and RGPIN-2020-04289. RW acknowledges financial support from the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-03823, RGPAS-2018-522590).

References

  • Acerbi, C. (2002). Spectral measures of risk: A coherent representation of subjective risk aversion. Journal of Banking and Finance, 26(7), 1505–1518.
  • Armbruster, B. and Delage, E. (2015). Decision making under uncertainty when preference information is incomplete. Management Science, 61(1), 111–128.
  • Artzner, P., Delbaen, F., Eber, J.-M. and Heath, D. (1999). Coherent measures of risk. Mathematical Finance, 9(3), 203–228.
  • Bernard, C., Pesenti, S. M. and Vanduffel, S. (2020). Robust distortion risk measures. SSRN: 3677078.
  • Blanchet, J., Lam, H., Liu, Y. and Wang, R. (2020). Convolution bounds on quantile aggregation. arXiv: 2007.09320.
  • Blanchet, J. and Murthy, K. (2019). Quantifying distributional model risk via optimal transport. Mathematics of Operations Research, 44(2), 565–600.
  • Brighi, B. and Chipot, M. (1994). Approximated convex envelope of a function. SIAM Journal on Numerical Analysis, 31, 128–148.
  • Cai, J., Li, J. and Mao, T. (2020). Distributionally robust optimization under distorted expectations. SSRN: 3566708.
  • Cai, J., Liu, H. and Wang, R. (2018). Asymptotic equivalence of risk measures under dependence uncertainty. Mathematical Finance, 28(1), 29–49.
  • Cornilly, D., Rüschendorf, L. and Vanduffel, S. (2018). Upper bounds for strictly concave distortion risk measures on moment spaces. Insurance: Mathematics and Economics, 82, 141–151.
  • Delage, E., Arroyo, S. and Ye, Y. (2014). The value of stochastic modeling in two-stage stochastic programs with cost uncertainty. Operations Research, 62(6), 1377–1393.
  • Delage, E. and Li, Y. (2018). Minimizing risk exposure when the choice of a risk measure is ambiguous. Management Science, 64(1), 327–344.
  • Delage, E. and Ye, Y. (2010). Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research, 58(3), 595–612.
  • Denneberg, D. (1994). Non-additive Measure and Integral. Springer Science & Business Media.
  • Embrechts, P., Puccetti, G. and Rüschendorf, L. (2013). Model uncertainty and VaR aggregation. Journal of Banking and Finance, 37(8), 2750–2764.
  • Embrechts, P., Wang, B. and Wang, R. (2015). Aggregation-robustness and model uncertainty of regulatory risk measures. Finance and Stochastics, 19(4), 763–790.
  • Föllmer, H. and Schied, A. (2002). Convex measures of risk and trading constraints. Finance and Stochastics, 6(4), 429–447.
  • Guo, S. and Xu, H. (2021). Statistical robustness in utility preference robust optimization models. Mathematical Programming Series A, 190, 679–720.
  • Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics. Second Edition. Wiley Series in Probability and Statistics. Wiley, New Jersey.
  • Kusuoka, S. (2001). On law invariant coherent risk measures. Advances in Mathematical Economics, 3, 83–95.
  • Li, L., Shao, H., Wang, R. and Yang, J. (2018). Worst-case Range Value-at-Risk with partial information. SIAM Journal on Financial Mathematics, 9(1), 190–218.
  • Li, Y. (2018). Closed-form solutions for worst-case law invariant risk measures with application to robust portfolio optimization. Operations Research, 66(6), 1457–1759.
  • Liu, F., Cai, J., Lemieux, C. and Wang, R. (2020). Convex risk functionals: Representation and applications. Insurance: Mathematics and Economics, 90, 66–79.
  • Liu, F., Mao, T., Wang, R. and Wei, L. (2022). Inf-convolution, optimal allocations, and model uncertainty for tail risk measures. Mathematics of Operations Research, published online.
  • Mao, T., Wang, B. and Wang, R. (2019). Sums of uniform random variables. Journal of Applied Probability, 56(3), 918–936.
  • Mao, T. and Wang, R. (2015). On aggregation sets and lower-convex sets. Journal of Multivariate Analysis, 138, 170–181.
  • Mao, T., Wang, R. and Wu, Q. (2022). Model aggregation for risk evaluation and robust optimization. arXiv: 2201.06370.
  • Marinacci, M. and Montrucchio, L. (2008). On concavity and supermodularity. Journal of Mathematical Analysis and Applications, 344(2), 642–654.
  • McNeil, A. J., Frey, R. and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques and Tools. Revised Edition. Princeton, NJ: Princeton University Press.
  • Natarajan, K., Pachamanova, D. and Sim, M. (2008). Incorporating asymmetric distributional information in robust value-at-risk optimization. Management Science, 54(3), 573–585.
  • Popescu, I. (2007). Robust mean-covariance solutions for stochastic optimization. Operations Research, 55(1), 98–112.
  • Puccetti, G. and Rüschendorf, L. (2012). Computation of sharp bounds on the distribution of a function of dependent risks. Journal of Computational and Applied Mathematics, 236(7), 1833–1840.
  • Rockafellar, R. T. and Royset, J. O. (2018). Superquantile/CVaR risk measures: Second-order theory. Annals of Operations Research, 262(1), 3–28.
  • Rockafellar, R. T. and Uryasev, S. (2002). Conditional value-at-risk for general loss distributions. Journal of Banking and Finance, 26(7), 1443–1471.
  • Rockafellar, R. T., Uryasev, S. and Zabarankin, M. (2006). Generalized deviation in risk analysis. Finance and Stochastics, 10, 51–74.
  • Shaked, M. and Shanthikumar, J. G. (2007). Stochastic Orders. Springer Series in Statistics.
  • Simchi-Levi, D., Chen, X. and Bramel, J. (2005). The Logic of Logistics: Theory, Algorithms, and Applications for Logistics and Supply Chain Management. Third Edition. New York, NY: Springer.
  • Tchen, A. H. (1980). Inequalities for distributions with given marginals. Annals of Probability, 8(4), 814–827.
  • Tversky, A. and Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297–323.
  • Wang, B. and Wang, R. (2016). Joint mixability. Mathematics of Operations Research, 41(3), 808–826.
  • Wang, Q., Wang, R. and Wei, Y. (2020a). Distortion riskmetrics on general spaces. ASTIN Bulletin, 50(4), 827–851.
  • Wang, R., Bignozzi, V. and Tsanakas, A. (2015). How superadditive can a risk measure be? SIAM Journal on Financial Mathematics, 6, 776–803.
  • Wang, R., Xu, Z. Q. and Zhou, X. Y. (2019). Dual utilities on risk aggregation under dependence uncertainty. Finance and Stochastics, 23(4), 1025–1048.
  • Wang, R., Wei, Y. and Willmot, G. E. (2020b). Characterization, robustness and aggregation of signed Choquet integrals. Mathematics of Operations Research, 45(3), 993–1015.
  • Wang, S., Young, V. R. and Panjer, H. H. (1997). Axiomatic characterization of insurance prices. Insurance: Mathematics and Economics, 21(2), 173–183.
  • Wiesemann, W., Kuhn, D. and Sim, M. (2014). Distributionally robust convex optimization. Operations Research, 62(6), 1203–1466.
  • Wu, G. and Gonzalez, R. (1996). Curvature of the probability weighting function. Management Science, 42(12), 1676–1690.
  • Yaari, M. E. (1987). The dual theory of choice under risk. Econometrica, 55(1), 95–115.
  • Zhu, W. and Shao, H. (2018). Closed-form solutions for extreme-case distortion risk measures and applications to robust portfolio management. SSRN: 3103458.
  • Zhu, S. and Fukushima, M. (2009). Worst-case conditional value-at-risk with application to robust portfolio management. Operations Research, 57(5), 1155–1168.
  • Zymler, S., Kuhn, D. and Rustem, B. (2013). Distributionally robust joint chance constraints with second-order moment information. Mathematical Programming, 137(1–2), 167–198.

Technical appendices

Appendix A Omitted technical details from the paper

In this appendix, we present technical details for some examples, as well as some technical remarks omitted from the paper.

A.1 Proofs of claims in some Examples

Proof of the claim in Example 6.

We show that mv(𝐚,𝝁,Σ)\mathcal{M}^{\mathrm{mv}}(\mathbf{a},\bm{\mu},\Sigma) is equivalent to

{FS2:𝔼[S]=𝐚𝝁,var(S)𝐚Σ𝐚}=(2,𝐚𝝁,(𝐚Σ𝐚)1/2).\{F_{S}\in\mathcal{M}_{2}:\mathbb{E}[S]=\mathbf{a}^{\top}\bm{\mu},~{}\mathrm{var}(S)\leqslant\mathbf{a}^{\top}\Sigma\mathbf{a}\}=\mathcal{M}\left(2,\mathbf{a}^{\top}\bm{\mu},\left(\mathbf{a}^{\top}\Sigma\mathbf{a}\right)^{1/2}\right).

For a proof of the equivalence between the sets with fixed mean and covariance matrix, see Popescu (2007). Indeed, it is clear that mv(𝐚,𝝁,Σ)(2,𝐚𝝁,(𝐚Σ𝐚)1/2)\mathcal{M}^{\mathrm{mv}}(\mathbf{a},\bm{\mu},\Sigma)\subset\mathcal{M}(2,\mathbf{a}^{\top}\bm{\mu},(\mathbf{a}^{\top}\Sigma\mathbf{a})^{1/2}). On the other hand, for all FS(2,𝐚𝝁,(𝐚Σ𝐚)1/2)F_{S}\in\mathcal{M}(2,\mathbf{a}^{\top}\bm{\mu},(\mathbf{a}^{\top}\Sigma\mathbf{a})^{1/2}), we write 𝐚=(a1,,an)\mathbf{a}=(a_{1},\dots,a_{n}), 𝝁=(μ1,,μn)\bm{\mu}=(\mu_{1},\dots,\mu_{n}), and take 𝐗=(X1,,Xn)\mathbf{X}=(X_{1},\dots,X_{n}) such that Xi=(S𝐚𝝁)/(nai)+μiX_{i}=(S-\mathbf{a}^{\top}\bm{\mu})/(na_{i})+\mu_{i}, for i=1,,ni=1,\dots,n. It follows that FS=F𝐚𝐗mv(𝐚,𝝁,Σ)F_{S}=F_{\mathbf{a}^{\top}\mathbf{X}}\in\mathcal{M}^{\mathrm{mv}}(\mathbf{a},\bm{\mu},\Sigma). Therefore, we have mv(𝐚,𝝁,Σ)=(2,𝐚𝝁,(𝐚Σ𝐚)1/2)\mathcal{M}^{\mathrm{mv}}(\mathbf{a},\bm{\mu},\Sigma)=\mathcal{M}(2,\mathbf{a}^{\top}\bm{\mu},(\mathbf{a}^{\top}\Sigma\mathbf{a})^{1/2}). ∎

Proof of the claim in Example 10.

We will show that (G~,ε)\mathcal{M}(\widetilde{G},\varepsilon) is closed under concentration within \mathcal{I} for all ~\mathcal{I}\subset\widetilde{\mathcal{I}}. Write ={Ci:iK}\mathcal{I}=\{C_{i}:i\in K\} for some KK\subset\mathbb{N}. For all iKi\in K and F(G~,ε)F\in\mathcal{M}(\widetilde{G},\varepsilon), we have G~1(u)=ci\widetilde{G}^{-1}(u)=c_{i} for uCiu\in C_{i} for some cic_{i}\in\mathbb{R}. For all iKi\in K, by Jensen’s inequality,

1λ(Ci)Ci|F1(u)G~1(u)|pdu|CiF1(u)duλ(Ci)ci|p=1λ(Ci)Ci|(FCi)1(u)G~1(u)|pdu.\frac{1}{\lambda(C_{i})}\int_{C_{i}}\left|F^{-1}(u)-\widetilde{G}^{-1}(u)\right|^{p}\,\mathrm{d}u\geqslant\left|\frac{\int_{C_{i}}F^{-1}(u)\,\mathrm{d}u}{\lambda(C_{i})}-c_{i}\right|^{p}=\frac{1}{\lambda(C_{i})}\int_{C_{i}}\left|(F^{C_{i}})^{-1}(u)-\widetilde{G}^{-1}(u)\right|^{p}\,\mathrm{d}u.

It follows that

(Wp(F,G~))p(Wp(FCi,G~))p\displaystyle(W_{p}(F,\widetilde{G}))^{p}-(W_{p}(F^{C_{i}},\widetilde{G}))^{p} =01|F1(u)G~1(u)|pdu01|(FCi)1(u)G~1(u)|pdu\displaystyle=\int^{1}_{0}\left|F^{-1}(u)-\widetilde{G}^{-1}(u)\right|^{p}\,\mathrm{d}u-\int^{1}_{0}\left|(F^{C_{i}})^{-1}(u)-\widetilde{G}^{-1}(u)\right|^{p}\,\mathrm{d}u
=Ci|F1(u)G~1(u)|pduCi|(FCi)1(u)G~1(u)|pdu0,\displaystyle=\int_{C_{i}}\left|F^{-1}(u)-\widetilde{G}^{-1}(u)\right|^{p}\,\mathrm{d}u-\int_{C_{i}}\left|(F^{C_{i}})^{-1}(u)-\widetilde{G}^{-1}(u)\right|^{p}\,\mathrm{d}u\geqslant 0,

and thus Wp(FCi,G~)Wp(F,G~)εW_{p}(F^{C_{i}},\widetilde{G})\leqslant W_{p}(F,\widetilde{G})\leqslant\varepsilon. Moreover, (8) and the above argument lead to

(Wp(F,G~))p(Wp(F,G~))p=iK(Wp(F,G~))p(Wp(FCi,G~))p0.(W_{p}(F,\widetilde{G}))^{p}-(W_{p}(F^{\mathcal{I}},\widetilde{G}))^{p}=\sum_{i\in K}(W_{p}(F,\widetilde{G}))^{p}-(W_{p}(F^{C_{i}},\widetilde{G}))^{p}\geqslant 0.

Hence, Wp(F,G~)Wp(F,G~)εW_{p}(F^{\mathcal{I}},\widetilde{G})\leqslant W_{p}(F,\widetilde{G})\leqslant\varepsilon. ∎
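A numerical sketch of the Jensen step above: on a quantile grid, averaging F^{-1} over an interval C on which \widetilde{G}^{-1} is constant can only decrease the p-Wasserstein distance. The quantile functions and the interval C below are illustrative choices of ours:

```python
import numpy as np

p = 2.0
u = (np.arange(200_000) + 0.5) / 200_000           # grid on (0, 1)
a, b = 0.3, 0.6                                    # the interval C = (a, b)
mask = (u > a) & (u < b)

Finv = np.exp(u) + (u > 0.5)                       # a nondecreasing quantile function
# Ginv is nondecreasing and constant (= 1 + a) on C, as required
Ginv = np.where(u <= a, u, np.where(u < b, a, u - (b - a))) + 1.0

FCinv = Finv.copy()
FCinv[mask] = Finv[mask].mean()                    # the C-concentration F^C

Wp = lambda Q: np.mean(np.abs(Q - Ginv) ** p) ** (1 / p)
print(f"W_p(F^C, G) = {Wp(FCinv):.6f} <= W_p(F, G) = {Wp(Finv):.6f}")
assert Wp(FCinv) <= Wp(Finv) + 1e-12
```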

Proof of the claim in Example 11.

For ε0\varepsilon\geqslant 0, 𝐰[0,)n\mathbf{w}\in[0,\infty)^{n}, p>1p>1, a>1a>1 and 𝐙(p)n\mathbf{Z}\in(\mathcal{L}^{p})^{n}, by Theorem 7 of Mao et al. (2022), the uncertainty set

{F𝐰𝐗p:F𝐗n(F𝐙,ε)}=(F𝐰𝐙,ε𝐰b),\{F_{\mathbf{w}^{\top}\mathbf{X}}\in\mathcal{M}_{p}:F_{\mathbf{X}}\in\mathcal{M}^{n}(F_{\mathbf{Z}},\varepsilon)\}=\mathcal{M}(F_{\mathbf{w}^{\top}\mathbf{Z}},\varepsilon\|\mathbf{w}\|_{b}),

where bb is the conjugate of aa (i.e., 1/a+1/b=11/a+1/b=1). Suppose that for a benchmark distribution G~pn\widetilde{G}\in\mathcal{M}^{n}_{p}, there exists a random vector 𝐙G~\mathbf{Z}\sim\widetilde{G} such that 𝐙𝟎\mathbf{Z}\geqslant\mathbf{0} and (𝐙=𝟎)=p0\mathbb{P}(\mathbf{Z}=\mathbf{0})=p_{0} for some p0(0,1]p_{0}\in(0,1]. Note that (𝐰𝐙=0)p0\mathbb{P}(\mathbf{w}^{\top}\mathbf{Z}=0)\geqslant p_{0} and the quantile function of 𝐰𝐙\mathbf{w}^{\top}\mathbf{Z} is equal to 0 on (0,p0](0,p_{0}]. It follows from Example 10 that the set (F𝐰𝐙,ε𝐰b)\mathcal{M}(F_{\mathbf{w}^{\top}\mathbf{Z}},\varepsilon\|\mathbf{w}\|_{b}) is closed under concentration within {(0,t)}\{(0,t)\} for all tp0t\leqslant p_{0}. ∎

Proof of the claim in Example 12.

We will show that the set of distributions,

={FXV(X)+g(𝔼[V(X)])1:V𝒱},\mathcal{M}=\{F_{X-V(X)+g(\mathbb{E}[V(X)])}\in\mathcal{M}_{1}:V\in\mathcal{V}\},

is closed under concentration within {(p,1)}\{(p,1)\} for all p[p0,1)p\in[p_{0},1). For each V𝒱V\in\mathcal{V} and a standard uniform random variable UU, we write a=𝔼[FXV(X)1(U)|U(p,1)]a=\mathbb{E}[F^{-1}_{X-V(X)}(U)|U\in(p,1)]. Since FX1(p)lF^{-1}_{X}(p)\geqslant l, we can take

W(x)=V(x)𝟙{xFX1(p)}+(xa)𝟙{x>FX1(p)},x.W(x)=V(x)\mathds{1}_{\{x\leqslant F^{-1}_{X}(p)\}}+(x-a)\mathds{1}_{\{x>F^{-1}_{X}(p)\}},~{}~{}x\in\mathbb{R}.

It follows that W𝒱W\in\mathcal{V}. Noting that a=𝔼[XV(X)|X>FX1(p)]a=\mathbb{E}[X-V(X)|X>F^{-1}_{X}(p)], we have

XW(X)+g(𝔼[W(X)])\displaystyle X-W(X)+g(\mathbb{E}[W(X)])
=(XV(X))𝟙{XFX1(p)}+a𝟙{X>FX1(p)}+g(𝔼[V(X)𝟙{XFX1(p)}+(Xa)𝟙{X>FX1(p)}])\displaystyle=(X-V(X))\mathds{1}_{\{X\leqslant F^{-1}_{X}(p)\}}+a\mathds{1}_{\{X>F^{-1}_{X}(p)\}}+g\left(\mathbb{E}[V(X)\mathds{1}_{\{X\leqslant F^{-1}_{X}(p)\}}+(X-a)\mathds{1}_{\{X>F^{-1}_{X}(p)\}}]\right)
=(XV(X))𝟙{XFX1(p)}+a𝟙{X>FX1(p)}+g(𝔼[V(X)]),\displaystyle=(X-V(X))\mathds{1}_{\{X\leqslant F^{-1}_{X}(p)\}}+a\mathds{1}_{\{X>F^{-1}_{X}(p)\}}+g(\mathbb{E}[V(X)]),

whose distribution is F_{X-V(X)+g(\mathbb{E}[V(X)])}^{(p,1)}. It follows that \mathcal{M} is closed under concentration within \{(p,1)\} for all p\in[p_{0},1). ∎

A.2 A few additional technical remarks mentioned in the paper

Remark 5 (on Theorem 1).

Using Theorem 1, if for some \mathbf{a}\in A, the set \mathcal{M}:=\{F_{f(\mathbf{a},\mathbf{X})}:F_{\mathbf{X}}\in\widetilde{\mathcal{M}}\} is closed under concentration for all intervals and \sup\{\rho_{h^{*}}(f(\mathbf{a},\mathbf{X})):F_{\mathbf{X}}\in\widetilde{\mathcal{M}}\}=\infty, then \sup\{\rho_{h}(f(\mathbf{a},\mathbf{X})):F_{\mathbf{X}}\in\widetilde{\mathcal{M}}\}=\infty. Thus, both objectives in the inner optimization of (1) are infinite for this \mathbf{a}, which can then be excluded from the outer optimization over A. Verifying \sup\{\rho_{h^{*}}(f(\mathbf{a},\mathbf{X})):F_{\mathbf{X}}\in\widetilde{\mathcal{M}}\}=\infty is easier than verifying \sup\{\rho_{h}(f(\mathbf{a},\mathbf{X})):F_{\mathbf{X}}\in\widetilde{\mathcal{M}}\}=\infty, since \rho_{h}\leqslant\rho_{h^{*}} in general.

Remark 6 (on Example 6).

Using Strassen’s Theorem (e.g., Theorem 3.A.4 of Shaked and Shanthikumar (2007)), closedness under conditional expectation can equivalently be expressed via convex order: a set \mathcal{M}\subset\mathcal{M}_{1} is closed under conditional expectation if and only if, for all F\in\mathcal{M} and G\leqslant_{\rm cx}F, we have G\in\mathcal{M}.
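This characterization can be illustrated numerically: a concentration G=F^{C} of F has the same mean as F and is dominated in stop-loss order, and equal means together with stop-loss dominance characterize \leqslant_{\rm cx}. A minimal sketch with illustrative choices of F (here Exp(1)) and C:

```python
import numpy as np

u = (np.arange(1_000_000) + 0.5) / 1_000_000
F = -np.log1p(-u)                        # Exp(1) quantile function
G = F.copy()
mask = (u > 0.4) & (u < 0.8)
G[mask] = F[mask].mean()                 # C-concentration with C = (0.4, 0.8)

stop_loss = lambda Q, k: np.mean(np.maximum(Q - k, 0.0))
ks = np.linspace(0.0, 5.0, 51)
dominated = all(stop_loss(G, k) <= stop_loss(F, k) + 1e-12 for k in ks)
print("equal means:", np.isclose(G.mean(), F.mean()),
      "| stop-loss dominated:", dominated)
```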

Remark 7 (on Proposition 3).

In Proposition 3, if \mathcal{M} is closed under conditional expectation, then \mathcal{I} can be taken to be an infinite set. However, \mathcal{M} may fail to be closed under concentration within an infinite \mathcal{I} if we only assume that \mathcal{M} is closed under concentration for all intervals. Indeed, let \mathcal{M} be the set of distributions obtained from a fixed F\in\mathcal{M}_{1} by applying finitely many concentrations; then clearly \mathcal{M} is closed under concentration for all intervals, but F^{\mathcal{I}}\notin\mathcal{M} when \mathcal{I} is an infinite collection of disjoint intervals. This also serves as a counterexample to the converse of Proposition 2, since \mathcal{M} is closed under concentration for all intervals but not closed under conditional expectation.

Appendix B Proofs of all technical results

We present the proofs of all technical results in this appendix. Throughout, we denote the set of discontinuity points of h (excluding 0 and 1) by

Jh={t(0,1):h(t)h(t+) or h(t)h(t)}.J_{h}=\{t\in(0,1):h(t)\neq h(t^{+})\mbox{ or }h(t)\neq h(t^{-})\}. (A.1)

Note that h^(t)\hat{h}(t) can be written as

h^(t)={h(t+)h(t)h(t),tJh,h(t),otherwise.\hat{h}(t)=\left\{\begin{array}[]{l l}h(t^{+})\vee h(t^{-})\vee h(t),&t\in J_{h},\\[2.5pt] h(t),&\text{otherwise}.\end{array}\right. (A.2)

B.1 Proofs of results in Section 2

Proof of Proposition 1.

Note that (\hat{h})^{*}=h^{*}=\hat{h}=h at 0 and 1. For all t\in(0,1), since (\hat{h})^{*}(t)\geqslant\hat{h}(t)\geqslant h(t), we have (\hat{h})^{*}(t)\geqslant h^{*}(t). On the other hand, we have h^{*}(t)\geqslant h(t^{+}) for t\in(0,1). Indeed, if h^{*}(t_{0})<h(t^{+}_{0}) for some t_{0}\in(0,1), then we have h^{*}(t_{0}+\varepsilon)<h(t_{0}+\varepsilon) for some \varepsilon>0, which leads to a contradiction. Similarly, we have h^{*}(t)\geqslant h(t^{-}) for t\in(0,1). Together with h^{*}\geqslant h on (0,1), we have h^{*}\geqslant\hat{h} on (0,1), which implies that h^{*}\geqslant(\hat{h})^{*} on (0,1). Therefore, (\hat{h})^{*}=h^{*} on [0,1].

Next, we assert that the set \{t\in[0,1]:\hat{h}(t)\neq h^{*}(t)\} is a union of disjoint sets that are not singletons. To show this assertion, suppose for contradiction that it fails. Then there exists x\in(0,1) such that \hat{h}(x)<h^{*}(x) and \hat{h}(t)=h^{*}(t) for t\in(x-\varepsilon,x)\cup(x,x+\varepsilon), for some 0<\varepsilon\leqslant x\wedge(1-x). It is clear that x\in J_{h}. Since h^{*} is continuous on (x-\varepsilon,x+\varepsilon), we have

h^(x)<h(x)=h(x+)=h^(x+).\hat{h}(x)<h^{*}(x)=h^{*}(x^{+})=\hat{h}(x^{+}).

This contradicts (A.2). Therefore, the set {t[0,1]:h^(t)h(t)}\{t\in[0,1]:\hat{h}(t)\neq h^{*}(t)\} is the union of some disjoint intervals, denoted by lLAl\cup_{l\in L}A_{l} for some LL\subset\mathbb{N}. For all lLl\in L, we denote the left and right endpoints of AlA_{l} by ala_{l} and blb_{l}, respectively, with al<bla_{l}<b_{l}. Define a function via linear interpolation

hc(t)={h^(al)+h^(bl)h^(al)blal(tal),tAl,lL,h^(t),otherwise.h^{c}(t)=\left\{\begin{array}[]{l l}\hat{h}(a_{l})+\frac{\hat{h}(b_{l})-\hat{h}(a_{l})}{b_{l}-a_{l}}(t-a_{l}),&t\in A_{l},~{}l\in L,\\ \hat{h}(t),&\text{otherwise}.\end{array}\right.

It is clear that hchh^{c}\leqslant h^{*} and hch^{c} is continuous on (0,1)(0,1). We will prove that hc=hh^{c}=h^{*} on lLAl\cup_{l\in L}A_{l}. Suppose for the purpose of contradiction that hchh^{c}\neq h^{*} on lLAl\cup_{l\in L}A_{l}. Since hc<hh^{c}<h^{*} for some point in lLAl\cup_{l\in L}A_{l}, there exists x0Alx_{0}\in A_{l} for some lLl\in L such that hc(x0)<h^(x0)h^{c}(x_{0})<\hat{h}(x_{0}). Thus we can take a point (x1,h^(x1))(0,1)×(x_{1},\hat{h}(x_{1}))\in(0,1)\times\mathbb{R} with h^(x1)>hc(x1)\hat{h}(x_{1})>h^{c}(x_{1}), which has the largest perpendicular distance to the straight line hc(t)=h^(al)+h^(bl)h^(al)blal(tal)h^{c}(t)=\hat{h}(a_{l})+\frac{\hat{h}(b_{l})-\hat{h}(a_{l})}{b_{l}-a_{l}}(t-a_{l}), namely,

x1=argmaxxAlh^(x)>hc(x)(blal)h^(x)(h^(bl)h^(al))x(blal)h^(al)+(h^(bl)h^(al))al((h^(bl)h^(al))2+(blal)2)1/2.x_{1}=\operatorname*{arg\,max}_{\begin{subarray}{c}x\in A_{l}\\ \hat{h}(x)>h^{c}(x)\end{subarray}}\frac{(b_{l}-a_{l})\hat{h}(x)-(\hat{h}(b_{l})-\hat{h}(a_{l}))x-(b_{l}-a_{l})\hat{h}(a_{l})+(\hat{h}(b_{l})-\hat{h}(a_{l}))a_{l}}{\left((\hat{h}(b_{l})-\hat{h}(a_{l}))^{2}+(b_{l}-a_{l})^{2}\right)^{1/2}}.

The existence of the maximizer x1x_{1} is due to the upper semicontinuity of h^\hat{h}. There exists a function gg with g=hg=h^{*} on [0,1]Al[0,1]\setminus A_{l} and g(x1)=h^(x1)g(x_{1})=\hat{h}(x_{1}), such that gg is concave and h^gh\hat{h}\leqslant g\leqslant h^{*} on [0,1][0,1]. Since h>h^h^{*}>\hat{h} on AlA_{l}, we have h(x1)>h^(x1)=g(x1)h^{*}(x_{1})>\hat{h}(x_{1})=g(x_{1}). Thus hh^{*} cannot be the concave envelope of h^\hat{h}, which leads to a contradiction. Thus, h=hch^{*}=h^{c} on lLAl\cup_{l\in L}A_{l}. Since h=h^=hch^{*}=\hat{h}=h^{c} on (0,1)(lLAl)(0,1)\setminus(\cup_{l\in L}A_{l}), we have h=hch^{*}=h^{c}. Therefore, {t[0,1]:h^(t)h(t)}\{t\in[0,1]:\hat{h}(t)\neq h^{*}(t)\} is a union of disjoint open intervals, and hh^{*} is linear on each of the intervals. ∎
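Proposition 1 can also be visualized by computing the concave envelope as an upper convex hull on a grid. The sketch below is a minimal numerical illustration, assuming a continuous (hence \hat{h}=h) non-concave distortion h(t)=3t^{2}-2t^{3} of our choosing; it checks that h^{*} is linear on the set where it differs from h:

```python
import numpy as np

t = np.linspace(0, 1, 2001)
h = 3 * t**2 - 2 * t**3           # an illustrative non-concave distortion

def concave_envelope(x, y):
    """Upper hull (monotone chain) of the points (x_i, y_i), interpolated onto x."""
    hull = [(x[0], y[0])]
    for xi, yi in zip(x[1:], y[1:]):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (xi - x1) <= (yi - y1) * (x2 - x1):
                hull.pop()        # hull[-1] lies on/below the chord: drop it
            else:
                break
        hull.append((xi, yi))
    hx, hy = map(np.array, zip(*hull))
    return np.interp(x, hx, hy)

hstar = concave_envelope(t, h)
assert np.all(hstar >= h - 1e-12)       # h* is a concave majorant of h
gap = hstar > h + 1e-9                  # the set {h* > h}
curv = np.abs(np.diff(hstar, 2))        # second differences of h*
print("h* linear on the gap:", bool(np.all(curv[gap[1:-1]] < 1e-8)))
```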

B.2 Proofs of results in Section 3

Proof of Theorem 1.

We will first show that, assuming that \mathcal{M} is closed under concentration within h\mathcal{I}_{h}, we have

supFXρh^(X)=supFXρh(X).\sup_{F_{X}\in\mathcal{M}}\rho_{\hat{h}}(X)=\sup_{F_{X}\in\mathcal{M}}\rho_{h^{*}}(X). (A.3)

After proving (A.3), we show the three statements in Theorem 1 in the order (i), (ii), and (iii).

For h\in\mathcal{H}, suppose that \mathcal{M} is closed under concentration within \mathcal{I}_{h}. Take an arbitrary random variable Y with F_{Y}\in\mathcal{M} and let G=F^{\mathcal{I}_{h}}_{Y}. Write g(t)=1-\hat{h}(1-t) and g_{*}(t)=1-h^{*}(1-t) for t\in[0,1]. By definition of \mathcal{I}_{h}, g\neq g_{*} on each set in \mathcal{I}_{h}, and g=g_{*} elsewhere. For any (a,b)\in\mathcal{I}_{h}, we have G^{-1}(t)=\frac{\int_{a}^{b}F^{-1}_{Y}(u)\,\mathrm{d}u}{b-a} for all t\in(a,b] and G^{-1+}(t)=\frac{\int_{a}^{b}F^{-1}_{Y}(u)\,\mathrm{d}u}{b-a} for all t\in[a,b). Using the fact that g_{*} is linear on (a,b) and g(t)=g_{*}(t) for t=a,b, we have

(a,b)FY1(t)dg(t)\displaystyle\int_{(a,b)}F^{-1}_{Y}(t)\,\mathrm{d}{g_{*}}(t) =(g(b)g(a))abFY1(t)dtba\displaystyle=(g_{*}(b)-g_{*}(a))\frac{\int_{a}^{b}F^{-1}_{Y}(t)\,\mathrm{d}t}{b-a} (A.4)
=(g(b)g(a))abFY1(t)dtba\displaystyle=(g(b)-g(a))\frac{\int_{a}^{b}F^{-1}_{Y}(t)\,\mathrm{d}t}{b-a}
=(a,b]G1(t)dg(t)+G1+(a)(g(a+)g(a)).\displaystyle=\int_{(a,b]}G^{-1}(t)\,\mathrm{d}g(t)+G^{-1+}(a)(g(a^{+})-g(a)).

Define the sets

J+={tJh:h^(t+)=h^(t)h^(t)},J={tJh:h^(t+)h^(t)=h^(t)},\displaystyle J_{+}=\{t\in J_{h}:\hat{h}(t^{+})=\hat{h}(t)\neq\hat{h}(t^{-})\},~{}~{}J_{-}=\{t\in J_{h}:\hat{h}(t^{+})\neq\hat{h}(t)=\hat{h}(t^{-})\},
andJ0={tJh:h^(t+)h^(t)h^(t)}.\displaystyle\text{and}~{}~{}J_{0}=\{t\in J_{h}:\hat{h}(t^{+})\neq\hat{h}(t)\neq\hat{h}(t^{-})\}.

To better understand these sets, we reproduce Figure 1 (without the concave envelopes) as Figure A.1, which illustrates an example of a distortion function h, the corresponding \hat{h}, the sets J_{h}, J_{+}, J_{-}, and J_{0}, and the sets \hat{J}, \hat{J}_{+}, \hat{J}_{-}, \hat{J}^{0}_{+}, and \hat{J}^{0}_{-} (defined in the proof of (i) below).

Figure A.1: An example of h (left) and \hat{h} (right); in this figure, J_{h}=\{t_{1},t_{2},t_{3},t_{4},t_{5}\}, J_{+}=\{t_{1}\}, J_{-}=\{t_{2},t_{3}\}, and J_{0}=\{t_{5}\}. Moreover, the sets we use in the proof of (i) are \hat{J}=\{t_{1},t_{2},t_{3},t_{4}\}, \hat{J}_{+}=\{t_{1},t_{4}\}, \hat{J}_{-}=\{t_{2},t_{3}\}, \hat{J}^{0}_{+}=\{t_{4}\}, and \hat{J}^{0}_{-}=\{t_{3}\}.

Note that for a random variable ZhFYhZ_{\mathcal{I}_{h}}\sim F^{\mathcal{I}_{h}}_{Y}, we have

ρh^(Zh)\displaystyle\rho_{\hat{h}}(Z_{\mathcal{I}_{h}}) =(0,1](J+J0)G1(t)dg(t)+tJ+J0{0}G1+(t)(g(t+)g(t)).\displaystyle=\int_{(0,1]\setminus(J_{+}\cup J_{0})}G^{-1}(t)\,\mathrm{d}g(t)+\sum_{t\in J_{+}\cup J_{0}\cup\{0\}}G^{-1+}(t)(g(t^{+})-g(t)).

Hence using (A.4) and (8), we get

ρh(Y)ρh^(Zh)\displaystyle\rho_{h^{*}}(Y)-\rho_{\hat{h}}(Z_{\mathcal{I}_{h}}) (A.5)
=01FY1(t)dg(t)+FY1+(0)(g(0+)g(0))\displaystyle=\int_{0}^{1}F^{-1}_{Y}(t)\,\mathrm{d}{g_{*}}(t)+F^{-1+}_{Y}(0)(g_{*}(0^{+})-g_{*}(0))
(0,1](J+J0)G1(t)dg(t)tJ+J0{0}G1+(t)(g(t+)g(t))\displaystyle\quad\quad-\int_{(0,1]\setminus(J_{+}\cup J_{0})}G^{-1}(t)\,\mathrm{d}g(t)-\sum_{t\in J_{+}\cup J_{0}\cup\{0\}}G^{-1+}(t)(g(t^{+})-g(t))
=(a,b)h((a,b)FY1(t)dg(t)(a,b]G1(t)dg(t)G1+(a)(g(a+)g(a)))=0.\displaystyle=\sum_{(a,b)\in\mathcal{I}_{h}}\left(\int_{(a,b)}F^{-1}_{Y}(t)\,\mathrm{d}{g_{*}}(t)-\int_{(a,b]}G^{-1}(t)\,\mathrm{d}g(t)-G^{-1+}(a)(g(a^{+})-g(a))\right)=0.

Since \mathcal{M} is closed under concentration within h\mathcal{I}_{h}, we have FYhF^{\mathcal{I}_{h}}_{Y}\in\mathcal{M} by definition. Thus we have

ρh(Y)=ρh^(Zh)supFXρh^(X),\rho_{h^{*}}(Y)=\rho_{\hat{h}}(Z_{\mathcal{I}_{h}})\leqslant\sup_{F_{X}\in\mathcal{M}}\rho_{\hat{h}}(X),

which gives our desired equality (A.3) since ρh=ρ(h^)ρh^\rho_{h^{*}}=\rho_{(\hat{h})^{*}}\geqslant\rho_{\hat{h}}.
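Before turning to (i)–(iii), the identity \rho_{h^{*}}(Y)=\rho_{\hat{h}}(Z_{\mathcal{I}_{h}}) just established can be checked numerically for a concrete continuous distortion. The sketch below uses our illustrative h(t)=3t^{2}-2t^{3} (so \hat{h}=h), whose concave envelope has tangent slope 9/8 and is linear precisely on (0,3/4), so that \mathcal{I}_{h}=\{(1/4,1)\} in quantile coordinates; Y\sim\mathrm{Exp}(1) is an illustrative choice and all integrals are approximated on a grid:

```python
import numpy as np

u = (np.arange(1_000_000) + 0.5) / 1_000_000
Finv = -np.log1p(-u)                          # Exp(1) quantile function of Y

dh = lambda t: 6 * t - 6 * t**2               # h'(t) for h(t) = 3t^2 - 2t^3
dhstar = np.where(u < 0.75, 9 / 8, dh(u))     # (h*)'(t): tangent slope 9/8 on (0, 3/4)

# rho_g(X) = int_0^1 Q_X(u) g'(1-u) du, for a quantile array Q and density array dg
rho = lambda Q, dg: np.mean(Q * dg[::-1])

Z = Finv.copy()
Z[u > 0.25] = Finv[u > 0.25].mean()           # concentration on I_h = {(1/4, 1)}

print("rho_{h*}(Y)    =", rho(Finv, dhstar))
print("rho_h(Z_{I_h}) =", rho(Z, dh(u)))
print("rho_h(Y)       =", rho(Finv, dh(u)))   # smaller, as rho_h <= rho_{h*}
assert np.isclose(rho(Finv, dhstar), rho(Z, dh(u)), atol=1e-3)
```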

Proof of (i): Using h=h^h=\hat{h} and (A.3), we have supFXρh(X)=supFXρh(X)\sup_{F_{X}\in\mathcal{M}}\rho_{h}(X)=\sup_{F_{X}\in\mathcal{M}}\rho_{h^{*}}(X).

Proof of (ii): We will prove (ii) in two main steps. First, we show that (ii) holds if h\mathcal{I}_{h} is finite and hh has finitely many discontinuity points. Next, we discuss general hh.

Finite case: Here we prove (9) in the case where \mathcal{I}_{h} is finite and h has finitely many discontinuity points (i.e., J_{h} in (A.1) is a finite set). Suppose that \mathcal{M} is closed under concentration for all intervals; by Proposition 3, \mathcal{M} is then closed under concentration within \mathcal{I}_{h}. Therefore, (A.3) holds for all h\in\mathcal{H}. Next, we need to show that \sup_{F_{X}\in\mathcal{M}}\rho_{h}(X)=\sup_{F_{X}\in\mathcal{M}}\rho_{\hat{h}}(X). Define

J^={tJh:h^(t)h(t)},J^+={tJ^:h^(t)=h^(t+)},andJ^=J^J^+.\displaystyle\hat{J}=\{t\in J_{h}:\hat{h}(t)\neq h(t)\},~{}~{}\hat{J}_{+}=\{t\in\hat{J}:\hat{h}(t)=\hat{h}(t^{+})\},~{}~{}\text{and}~{}~{}\hat{J}_{-}=\hat{J}\setminus\hat{J}_{+}.

For n>0n>0, write intervals

Asn={(1s1/n,1s+1/n),sJ^,(1s1/n,1s+1/n),sJ^+.A^{n}_{s}=\left\{\begin{array}[]{l l}(1-s-1/\sqrt{n},1-s+1/n),&s\in\hat{J}_{-},\\ (1-s-1/n,1-s+1/\sqrt{n}),&s\in\hat{J}_{+}.\end{array}\right.

Let n={Asn:sJ^}\mathcal{I}^{n}=\{A^{n}_{s}:s\in\hat{J}\}. Note that hh\in\mathcal{H} has finitely many discontinuity points. Thus the intervals in n\mathcal{I}^{n} are disjoint when nn is large enough. For all FYF_{Y}\in\mathcal{M} and YFYY\sim F_{Y}, we define

Zn=FY1(U)𝟙{UsJ^Asn}+sJ^𝔼[FY1(U)|UAsn]𝟙{UAsn}.Z_{\mathcal{I}^{n}}=F^{-1}_{Y}(U)\mathds{1}_{\{U\notin\bigcup_{s\in\hat{J}}A^{n}_{s}\}}+\sum_{s\in\hat{J}}\mathbb{E}[F^{-1}_{Y}(U)|U\in A^{n}_{s}]\mathds{1}_{\{U\in A^{n}_{s}\}}.

It follows that Z_{\mathcal{I}^{n}}\sim F^{\mathcal{I}^{n}}_{Y}, and the right-quantile function of Z_{\mathcal{I}^{n}}, denoted by G^{-1+}_{n}, is given by the right-continuous version of

FY1+(t)𝟙{tsJ^Asn}+sJ^AsnFY1(u)duλ(Asn)𝟙{tAsn},t(0,1).F^{-1+}_{Y}(t)\mathds{1}_{\{t\notin\bigcup_{s\in\hat{J}}A^{n}_{s}\}}+\sum_{s\in\hat{J}}\frac{\int_{A^{n}_{s}}F^{-1}_{Y}(u)\,\mathrm{d}u}{\lambda(A^{n}_{s})}\mathds{1}_{\{t\in A^{n}_{s}\}},~{}~{}t\in(0,1).

Thus we get

limnGn1+(1t)={FY1(1t),tJ^,FY1+(1t),otherwise.\lim_{n\to\infty}G^{-1+}_{n}(1-t)=\left\{\begin{array}[]{l l}F^{-1}_{Y}(1-t),&t\in\hat{J}_{-},\\ F^{-1+}_{Y}(1-t),&\text{otherwise}.\end{array}\right.

Similarly, if we denote the left-quantile function of ZnZ_{\mathcal{I}^{n}} by Gn1G^{-1}_{n}, then Gn1G^{-1}_{n} is given by the left-continuous version of

FY1(t)𝟙{tsJ^Asn}+sJ^AsnFY1(u)duλ(Asn)𝟙{tAsn}.F^{-1}_{Y}(t)\mathds{1}_{\{t\notin\bigcup_{s\in\hat{J}}A^{n}_{s}\}}+\sum_{s\in\hat{J}}\frac{\int_{A^{n}_{s}}F^{-1}_{Y}(u)\,\mathrm{d}u}{\lambda(A^{n}_{s})}\mathds{1}_{\{t\in A^{n}_{s}\}}.

It follows that

limnGn1(1t)={FY1+(1t),tJ^+,FY1(1t),otherwise.\lim_{n\to\infty}G^{-1}_{n}(1-t)=\left\{\begin{array}[]{l l}F^{-1+}_{Y}(1-t),&t\in\hat{J}_{+},\\ F^{-1}_{Y}(1-t),&\text{otherwise}.\end{array}\right.

Define, further, the sets

J^+0={tJ^+:h(t)h(t)}andJ^0={tJ^:h(t)h(t+)}.\hat{J}^{0}_{+}=\{t\in\hat{J}_{+}:h(t)\neq h(t^{-})\}~{}~{}\text{and}~{}~{}\hat{J}^{0}_{-}=\{t\in\hat{J}_{-}:h(t)\neq h(t^{+})\}.

For u[0,1]u\in[0,1], write

h(u)=tJ^(h(t)h(t))𝟙{ut},h0(u)=tJ^0(h(t+)h(t))𝟙{u>t},\displaystyle h_{-}(u)=\sum_{t\in\hat{J}_{-}}(h(t)-h(t^{-}))\mathds{1}_{\{u\geqslant t\}},~{}~{}h^{0}_{-}(u)=\sum_{t\in\hat{J}^{0}_{-}}(h(t^{+})-h(t))\mathds{1}_{\{u>t\}},
h+(u)=tJ^+(h(t+)h(t))𝟙{u>t},h+0(u)=tJ^+0(h(t)h(t))𝟙{ut},\displaystyle h_{+}(u)=\sum_{t\in\hat{J}_{+}}(h(t^{+})-h(t))\mathds{1}_{\{u>t\}},~{}~{}h^{0}_{+}(u)=\sum_{t\in\hat{J}^{0}_{+}}(h(t)-h(t^{-}))\mathds{1}_{\{u\geqslant t\}},
h^(u)=tJ^(h(t+)h(t))𝟙{u>t},h^+(u)=tJ^+(h(t+)h(t))𝟙{ut},\displaystyle\hat{h}_{-}(u)=\sum_{t\in\hat{J}_{-}}(h(t^{+})-h(t^{-}))\mathds{1}_{\{u>t\}},~{}~{}\hat{h}_{+}(u)=\sum_{t\in\hat{J}_{+}}(h(t^{+})-h(t^{-}))\mathds{1}_{\{u\geqslant t\}},
andh0(u)=h(u)h+(u)h(u)h+0(u)h0(u)=h^(u)h^+(u)h^(u).\displaystyle\text{and}~{}~{}h_{0}(u)=h(u)-h_{+}(u)-h_{-}(u)-h^{0}_{+}(u)-h^{0}_{-}(u)=\hat{h}(u)-\hat{h}_{+}(u)-\hat{h}_{-}(u).

Note that |Z_{\mathcal{I}^{n}}-F^{-1}_{Y}(U)|=0 when U\notin\bigcup_{s\in\hat{J}}A^{n}_{s}, and that 0,1\in[0,1]\setminus\bigcup_{s\in\hat{J}}A^{n}_{s}; hence |Z_{\mathcal{I}^{n}}-F^{-1}_{Y}(U)|<\infty. Therefore, by the dominated convergence theorem,

limn(ρh(Zn)+ρh0(Zn))\displaystyle\lim_{n\to\infty}(\rho_{h_{-}}(Z_{\mathcal{I}^{n}})+\rho_{h^{0}_{-}}(Z_{\mathcal{I}^{n}}))
=limn01Gn1+(1u)dh(u)+limn01Gn1(1u)dh0(u)\displaystyle=\lim_{n\to\infty}\int^{1}_{0}G^{-1+}_{n}(1-u)\,\mathrm{d}{h_{-}}(u)+\lim_{n\to\infty}\int^{1}_{0}G^{-1}_{n}(1-u)\,\mathrm{d}{h^{0}_{-}}(u)
=tJ^FY1(1t)(h(t)h(t))+tJ^0FY1(1t)(h(t+)h(t))\displaystyle=\sum_{t\in\hat{J}_{-}}F^{-1}_{Y}(1-t)(h(t)-h(t^{-}))+\sum_{t\in\hat{J}^{0}_{-}}F^{-1}_{Y}(1-t)(h(t^{+})-h(t))
=tJ^J^0FY1(1t)(h(t)h(t))+tJ^0FY1(1t)(h(t)h(t)+h(t+)h(t))\displaystyle=\sum_{t\in\hat{J}_{-}\setminus\hat{J}^{0}_{-}}F^{-1}_{Y}(1-t)(h(t)-h(t^{-}))+\sum_{t\in\hat{J}^{0}_{-}}F^{-1}_{Y}(1-t)(h(t)-h(t^{-})+h(t^{+})-h(t))
=tJ^J^0FY1(1t)(h(t+)h(t))+tJ^0FY1(1t)(h(t+)h(t))=ρh^(Y).\displaystyle=\sum_{t\in\hat{J}_{-}\setminus\hat{J}^{0}_{-}}F^{-1}_{Y}(1-t)(h(t^{+})-h(t^{-}))+\sum_{t\in\hat{J}^{0}_{-}}F^{-1}_{Y}(1-t)(h(t^{+})-h(t^{-}))=\rho_{\hat{h}_{-}}(Y).

Similarly, we get limn(ρh+(Zn)+ρh+0(Zn))=ρh^+(Y)\lim_{n\to\infty}(\rho_{h_{+}}(Z_{\mathcal{I}^{n}})+\rho_{h^{0}_{+}}(Z_{\mathcal{I}^{n}}))=\rho_{\hat{h}_{+}}(Y). On the other hand, it is clear that limnρh0(Zn)=ρh0(Y)\lim_{n\to\infty}\rho_{h_{0}}(Z_{\mathcal{I}^{n}})=\rho_{h_{0}}(Y). Therefore, we have

limnρh(Zn)\displaystyle\lim_{n\to\infty}\rho_{h}(Z_{\mathcal{I}^{n}}) =limn(ρh(Zn)+ρh0(Zn)+ρh+(Zn)+ρh+0(Zn)+ρh0(Zn))\displaystyle=\lim_{n\to\infty}(\rho_{h_{-}}(Z_{\mathcal{I}^{n}})+\rho_{h^{0}_{-}}(Z_{\mathcal{I}^{n}})+\rho_{h_{+}}(Z_{\mathcal{I}^{n}})+\rho_{h^{0}_{+}}(Z_{\mathcal{I}^{n}})+\rho_{h_{0}}(Z_{\mathcal{I}^{n}}))
=ρh^(Y)+ρh^+(Y)+ρh0(Y)=ρh^(Y).\displaystyle=\rho_{\hat{h}_{-}}(Y)+\rho_{\hat{h}_{+}}(Y)+\rho_{h_{0}}(Y)=\rho_{\hat{h}}(Y).

Thus we have

ρh^(Y)=limnρh(Zn)supFXρh(X).\rho_{\hat{h}}(Y)=\lim_{n\to\infty}\rho_{h}(Z_{\mathcal{I}^{n}})\leqslant\sup_{F_{X}\in\mathcal{M}}\rho_{h}(X). (A.6)

Using (A.3) and (A.6), we get

supFXρh(X)=supFXρh^(X)supFXρh(X).\sup_{F_{X}\in\mathcal{M}}\rho_{h^{*}}(X)=\sup_{F_{X}\in\mathcal{M}}\rho_{\hat{h}}(X)\leqslant\sup_{F_{X}\in\mathcal{M}}\rho_{h}(X).

General case: We now prove Theorem 1 for general h\in\mathcal{H}, where \mathcal{I}_{h} or the set of discontinuity points of h may be countably infinite.

1. If h\mathcal{I}_{h} is countable, it suffices to prove (A.3). We write h\mathcal{I}_{h} as the collection of (ai,bi)(a_{i},b_{i}) for ii\in\mathbb{N} and define 1n={(ai,bi):i=1,,n}\mathcal{I}^{n}_{1}=\{(a_{i},b_{i}):i=1,\dots,n\} for all nn\in\mathbb{N}. Define the function

hn(t)={h(t),t(1bi,1ai),i=1,,n,h^(t),otherwise.h_{n}(t)=\left\{\begin{array}[]{l l}h^{*}(t),&t\in(1-b_{i},1-a_{i}),~{}i=1,\dots,n,\\ \hat{h}(t),&\text{otherwise}.\end{array}\right.

It is clear that for all nn\in\mathbb{N}, the set {t[0,1]:hn(t)h^(t)}\{t\in[0,1]:h_{n}(t)\neq\hat{h}(t)\} is a finite union of disjoint open intervals and hnh_{n} is linear on each of the intervals. For all random variables YY with FYF_{Y}\in\mathcal{M}, let random variable Z1nFY1nZ_{\mathcal{I}^{n}_{1}}\sim F^{\mathcal{I}^{n}_{1}}_{Y}. Similar to (A.3), we have

ρhn(Y)=ρh^(Z1n)supFXρh^(X),for all n.\rho_{h_{n}}(Y)=\rho_{\hat{h}}(Z_{\mathcal{I}^{n}_{1}})\leqslant\sup_{F_{X}\in\mathcal{M}}\rho_{\hat{h}}(X),~{}~{}\text{for all }n\in\mathbb{N}.

Note that hn(t)h(t)h_{n}(t)\uparrow h^{*}(t) as nn\to\infty for all t(0,1)t\in(0,1). By the monotone convergence theorem, we get ρhn(Y)ρh(Y)\rho_{h_{n}}(Y)\to\rho_{h^{*}}(Y) as nn\to\infty. It follows that

supFXρh^(X)ρhn(Y)nρh(Y).\sup_{F_{X}\in\mathcal{M}}\rho_{\hat{h}}(X)\geqslant\rho_{h_{n}}(Y)\xrightarrow{n\to\infty}\rho_{h^{*}}(Y).

2. If h\in\mathcal{H} has countably many discontinuity points, it suffices to prove (A.6). There exists an increasing sequence of finite sets \{\hat{J}^{m}\}_{m\in\mathbb{N}} with \hat{J}^{m}\subset\hat{J}, such that \hat{J}^{m}\to\hat{J} as m\to\infty. For all m\in\mathbb{N}, write

h^m(t)={h^(t),tJ^m,h(t),otherwise,\hat{h}_{m}(t)=\left\{\begin{array}[]{l l}\hat{h}(t),&t\in\hat{J}^{m},\\ h(t),&\text{otherwise},\end{array}\right.

and define

J^+m={tJ^m:h^m(t)=h^m(t+)},andJ^m=J^mJ^+m.\displaystyle\hat{J}^{m}_{+}=\{t\in\hat{J}^{m}:\hat{h}_{m}(t)=\hat{h}_{m}(t^{+})\},~{}~{}\text{and}~{}~{}\hat{J}^{m}_{-}=\hat{J}^{m}\setminus\hat{J}^{m}_{+}.

For n>0, let \mathcal{I}^{n,m}_{2}=\{B^{n,m}_{s}:s\in\hat{J}^{m}\} with

Bsn,m={(1s1/n,1s+1/n),sJ^m,(1s1/n,1s+1/n),sJ^+m.B^{n,m}_{s}=\left\{\begin{array}[]{l l}(1-s-1/\sqrt{n},1-s+1/n),&s\in\hat{J}^{m}_{-},\\ (1-s-1/n,1-s+1/\sqrt{n}),&s\in\hat{J}^{m}_{+}.\end{array}\right.

Following the same argument as for (A.6), for all random variables Y with F_{Y}\in\mathcal{M}, we have

supFXρh(X)ρh(Z2n,m)nρh^m(Y),for all m,\sup_{F_{X}\in\mathcal{M}}\rho_{h}(X)\geqslant\rho_{h}(Z_{\mathcal{I}^{n,m}_{2}})\xrightarrow{n\to\infty}\rho_{\hat{h}_{m}}(Y),~{}~{}\text{for all }m\in\mathbb{N},

where Z2n,mFY2n,mZ_{\mathcal{I}^{n,m}_{2}}\sim F^{\mathcal{I}^{n,m}_{2}}_{Y}. Moreover, we have h^m(t)h^(t)\hat{h}_{m}(t)\uparrow\hat{h}(t) for all t[0,1]t\in[0,1] as mm\to\infty. By the monotone convergence theorem, we have ρh^m(Y)ρh^(Y)\rho_{\hat{h}_{m}}(Y)\to\rho_{\hat{h}}(Y) as mm\to\infty. Therefore, we have

supFXρh^(X)supFXρh(X).\sup_{F_{X}\in\mathcal{M}}\rho_{\hat{h}}(X)\leqslant\sup_{F_{X}\in\mathcal{M}}\rho_{h}(X).

Proof of (iii): For all hh\in\mathcal{H}, if \mathcal{M} is closed under concentration within h\mathcal{I}_{h} and h=h^h=\hat{h}, we have FYhF_{Y}^{\mathcal{I}_{h}}\in\mathcal{M} by definition. Since ZhFYhZ_{\mathcal{I}_{h}}\sim F_{Y}^{\mathcal{I}_{h}}, (A.5) gives

ρh(Y)=ρh^(Zh)=ρh(Zh).\rho_{h^{*}}(Y)=\rho_{\hat{h}}(Z_{\mathcal{I}_{h}})=\rho_{h}(Z_{\mathcal{I}_{h}}).

Note that \rho_{h}\leqslant\rho_{h^{*}} in general. Therefore, if \max_{F_{Y}\in\mathcal{M}}\rho_{h^{*}}(Y) is attained by F_{Y}, then \max_{F_{Y}\in\mathcal{M}}\rho_{h}(Y) is attained by F_{Y}^{\mathcal{I}_{h}}. Indeed, the two problems share the common maximizer F_{Y}^{\mathcal{I}_{h}} because

ρh(Zh)maxFYρh(Y)=maxFYρh(Y)=ρh(Zh)ρh(Zh).\rho_{h^{*}}(Z_{\mathcal{I}_{h}})\leqslant\max_{F_{Y}\in\mathcal{M}}\rho_{h^{*}}(Y)=\max_{F_{Y}\in\mathcal{M}}\rho_{h}(Y)=\rho_{h}(Z_{\mathcal{I}_{h}})\leqslant\rho_{h^{*}}(Z_{\mathcal{I}_{h}}).

The proof is complete. ∎

Proof of Theorem 2.

Suppose for contradiction that \mathcal{M}_{\mathrm{opt}} is not closed under concentration within \mathcal{I}_{h}. Then there exists F_{Y}\in\mathcal{M}_{\mathrm{opt}} such that F^{\mathcal{I}_{h}}_{Y}\notin\mathcal{M}_{\mathrm{opt}}. Define the set \mathcal{Y}_{h}=\{(F^{-1}_{Y}(a),F^{-1}_{Y}(b)):(a,b)\in\mathcal{I}_{h}\}. Since F^{\mathcal{I}_{h}}_{Y}\notin\mathcal{M}_{\mathrm{opt}}, we have F^{\mathcal{I}_{h}}_{Y}\neq F_{Y}, and hence there exists an interval (a,b)\in\mathcal{I}_{h} such that F^{-1}_{Y} is not constant on (a,b). Thus the Lebesgue measure \lambda((F^{-1}_{Y}(a),F^{-1}_{Y}(b)))>0. Since h^{*}>h on (a,b),

ρh(Y)ρh(Y)\displaystyle\rho_{h^{*}}(Y)-\rho_{h}(Y) =(h((Y>x))h((Y>x)))dx\displaystyle=\int_{\mathbb{R}}(h^{*}(\mathbb{P}(Y>x))-h(\mathbb{P}(Y>x)))\,\mathrm{d}x (A.7)
=A𝒴hA(h((Y>x))h((Y>x)))dx>0.\displaystyle=\sum_{A\in\mathcal{Y}_{h}}\int_{A}(h^{*}(\mathbb{P}(Y>x))-h(\mathbb{P}(Y>x)))\,\mathrm{d}x>0.

On the other hand, we have

ρh(Y)supFXρh(X)=supFXρh(X)=ρh(Y)ρh(Y),\rho_{h^{*}}(Y)\leqslant\sup_{F_{X}\in\mathcal{M}}\rho_{h^{*}}(X)=\sup_{F_{X}\in\mathcal{M}}\rho_{h}(X)=\rho_{h}(Y)\leqslant\rho_{h^{*}}(Y),

which leads to a contradiction to (A.7). Therefore, opt\mathcal{M}_{\mathrm{opt}} is closed under concentration within h\mathcal{I}_{h}. ∎

Proof of Proposition 2.

We prove that closedness under conditional expectation implies closedness under concentration for all intervals. For all random variables Y\in\mathcal{L}^{1} and intervals C\subset[0,1], define

X=FY1(U)𝟙{UC}+𝔼[FY1(U)|UC]𝟙{UC},X=F^{-1}_{Y}(U)\mathds{1}_{\{U\not\in C\}}+\mathbb{E}[F^{-1}_{Y}(U)|U\in C]\mathds{1}_{\{U\in C\}},

where U\sim\mathrm{U}[0,1]. The distribution of X is the concentration F^{C}_{Y}. Every \sigma(X)-measurable random variable Z is almost surely constant on the event \{U\in C\}, since X itself is constant there. Hence,

𝔼[XZ]\displaystyle\mathbb{E}[XZ] =𝔼[ZFY1(U)𝟙{UC}+Z𝔼[FY1(U)|UC]𝟙{UC}]\displaystyle=\mathbb{E}[ZF^{-1}_{Y}(U)\mathds{1}_{\{U\not\in C\}}+Z\mathbb{E}[F^{-1}_{Y}(U)|U\in C]\mathds{1}_{\{U\in C\}}]
=𝔼[ZFY1(U)𝟙{UC}]+𝔼[𝔼[ZFY1(U)|UC]𝟙{UC}]\displaystyle=\mathbb{E}[ZF^{-1}_{Y}(U)\mathds{1}_{\{U\not\in C\}}]+\mathbb{E}[\mathbb{E}[ZF^{-1}_{Y}(U)|U\in C]\mathds{1}_{\{U\in C\}}]
=𝔼[ZFY1(U)𝟙{UC}]+𝔼[ZFY1(U)|UC](UC)\displaystyle=\mathbb{E}[ZF^{-1}_{Y}(U)\mathds{1}_{\{U\not\in C\}}]+\mathbb{E}[ZF^{-1}_{Y}(U)|U\in C]\mathbb{P}(U\in C)
=𝔼[ZFY1(U)𝟙{UC}]+𝔼[ZFY1(U)𝟙{UC}]=𝔼[ZFY1(U)].\displaystyle=\mathbb{E}[ZF^{-1}_{Y}(U)\mathds{1}_{\{U\not\in C\}}]+\mathbb{E}[ZF^{-1}_{Y}(U)\mathds{1}_{\{U\in C\}}]=\mathbb{E}[ZF^{-1}_{Y}(U)].

It follows that 𝔼[Y|X]=𝔼[FY1(U)|X]=X\mathbb{E}[Y|X]=\mathbb{E}[F^{-1}_{Y}(U)|X]=X, \mathbb{P}-almost surely. If a set of distributions, \mathcal{M}, is closed under conditional expectation and FYF_{Y}\in\mathcal{M}, then F𝔼[Y|X]F_{\mathbb{E}[Y|X]}\in\mathcal{M}, which implies that FYC=FXF^{C}_{Y}=F_{X}\in\mathcal{M}. Thus \mathcal{M} is also closed under concentration for all intervals. ∎
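The identity \mathbb{E}[Y|X]=X can be corroborated by Monte Carlo simulation. The sketch below mirrors the construction above with illustrative choices of ours (Y\sim\mathrm{Exp}(1) and C=(0.2,0.5)), checking that the conditional mean of Y on the event \{X=c\} recovers the constant c:

```python
import numpy as np

rng = np.random.default_rng(1)
Finv = lambda t: -np.log1p(-t)                 # Exp(1) quantile function
a, b = 0.2, 0.5                                # the interval C

# the constant E[F_Y^{-1}(U) | U in C], by midpoint-rule integration
grid = a + (b - a) * (np.arange(100_000) + 0.5) / 100_000
c = Finv(grid).mean()

U = rng.uniform(size=1_000_000)
Y = Finv(U)
mask = (U > a) & (U < b)
X = np.where(mask, c, Y)                       # X is distributed as F_Y^C

# E[Y | X] = X: off {U in C} we have Y = X; on it, E[Y | X = c] should be c
print("E[Y | X = c] =", Y[mask].mean(), "vs c =", c)
print("E[X] = E[Y]: ", np.isclose(X.mean(), Y.mean(), atol=1e-3))
```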

Proof of Proposition 3.

(i) Suppose that \mathcal{M} is closed under concentration for all intervals and \mathcal{I} is a finite. Using (7), we can see that FF^{\mathcal{I}} is the resulting distribution obtained by sequentially applying finitely many CC-concentrations to FF over all CC\in\mathcal{I}. We thus have FF^{\mathcal{I}}\in\mathcal{M} for all FF\in\mathcal{M}.

(ii) Suppose that \mathcal{M} is closed under conditional expectation and FF\in\mathcal{M}. We define

X=F1(U)𝟙{UCC}+C𝔼[F1(U)|UC]𝟙{UC},X=F^{-1}(U)\mathds{1}_{\{U\not\in\bigcup_{C\in\mathcal{I}}C\}}+\sum_{C\in\mathcal{I}}\mathbb{E}[F^{-1}(U)|U\in C]\mathds{1}_{\{U\in C\}},

whose left-quantile function is given by (8) according to (7). Following similar argument to the proof of Proposition 2, for all σ(X)\sigma(X)-measurable random variables ZZ, we have

𝔼[XZ]\displaystyle\mathbb{E}[XZ] =𝔼[ZF1(U)𝟙{UCC}+CZ𝔼[F1(U)|UC]𝟙{UC}]\displaystyle=\mathbb{E}[ZF^{-1}(U)\mathds{1}_{\{U\not\in\bigcup_{C\in\mathcal{I}}C\}}+\sum_{C\in\mathcal{I}}Z\mathbb{E}[F^{-1}(U)|U\in C]\mathds{1}_{\{U\in C\}}]
=𝔼[ZF1(U)𝟙{UCC}]+C𝔼[𝔼[ZF1(U)|UC]𝟙{UC}]\displaystyle=\mathbb{E}[ZF^{-1}(U)\mathds{1}_{\{U\not\in\bigcup_{C\in\mathcal{I}}C\}}]+\sum_{C\in\mathcal{I}}\mathbb{E}[\mathbb{E}[ZF^{-1}(U)|U\in C]\mathds{1}_{\{U\in C\}}]
=𝔼[ZF1(U)𝟙{UCC}]+C𝔼[ZF1(U)𝟙{UC}]=𝔼[ZF1(U)].\displaystyle=\mathbb{E}[ZF^{-1}(U)\mathds{1}_{\{U\not\in\bigcup_{C\in\mathcal{I}}C\}}]+\sum_{C\in\mathcal{I}}\mathbb{E}[ZF^{-1}(U)\mathds{1}_{\{U\in C\}}]=\mathbb{E}[ZF^{-1}(U)].

Thus 𝔼[F1(U)|X]=X\mathbb{E}[F^{-1}(U)|X]=X, \mathbb{P}-almost surely, which implies that F=FXF^{\mathcal{I}}=F_{X}\in\mathcal{M}. ∎

B.3 Proofs of results in Section 4

Proof of Theorem 3.

To prove the first statement, according to the proof of Theorem 1, it suffices to show that for all increasing hh\in\mathcal{H}, 𝐗(1)n\mathbf{X}\in(\mathcal{L}^{1})^{n} and 𝒢\mathscr{G}\subset\mathscr{F}, ρh(𝔼[f(𝐚,𝐗)|𝒢])ρh(f(𝐚,𝔼[𝐗|𝒢]))\rho_{h}(\mathbb{E}[f(\mathbf{a},\mathbf{X})|\mathscr{G}])\leqslant\rho_{h}(f(\mathbf{a},\mathbb{E}[\mathbf{X}|\mathscr{G}])), which holds directly by Jensen’s inequality and monotonicity of ρh\rho_{h}. The second statement holds by Theorem 1. The last statement follows from ρh(𝔼[f(𝐚,𝐗)|𝒢])=ρh(f(𝐚,𝔼[𝐗|𝒢]))\rho_{h}(\mathbb{E}[f(\mathbf{a},\mathbf{X})|\mathscr{G}])=\rho_{h}(f(\mathbf{a},\mathbb{E}[\mathbf{X}|\mathscr{G}])) and using Theorem 1. ∎

Proof of Theorem 4.

(i) For all 𝐗=(X1,,Xn)(1)n\mathbf{X}=(X_{1},\dots,X_{n})\in(\mathcal{L}^{1})^{n}, take a comonotonic 𝐗~=(X~1,,X~n)(1)n\widetilde{\mathbf{X}}=(\widetilde{X}_{1},\dots,\widetilde{X}_{n})\in(\mathcal{L}^{1})^{n} such that X~i=dXi\widetilde{X}_{i}\buildrel\mathrm{d}\over{=}X_{i} for all i=1,,ni=1,\dots,n. It follows that 𝔼[g(𝐗)]𝔼[g(𝐗~)]\mathbb{E}[g(\mathbf{X})]\leqslant\mathbb{E}[g(\widetilde{\mathbf{X}})] for all supermodular functions g:ng:\mathbb{R}^{n}\to\mathbb{R} due to Theorem 5 of Tchen (1980). By Proposition 2.2.5 of Simchi-Levi et al. (2005), we have f(𝐚,𝐗)icxf(𝐚,𝐗~)f(\mathbf{a},\mathbf{X})\leqslant_{\rm icx}f(\mathbf{a},\widetilde{\mathbf{X}}). Moreover, there exists a standard uniform random variable UU such that X~i=FX~i1(U)\widetilde{X}_{i}=F^{-1}_{\widetilde{X}_{i}}(U) for all i=1,,ni=1,\dots,n and f(𝐚,𝐗~)=Ff(𝐚,𝐗~)1(U)f(\mathbf{a},\widetilde{\mathbf{X}})=F^{-1}_{f(\mathbf{a},\widetilde{\mathbf{X}})}(U) almost surely (Denneberg, 1994). Take

f(𝐚,𝐗~)h=Ff(𝐚,𝐗~)1(U)𝟙{UChC}+Ch𝔼[Ff(𝐚,𝐗~)1(U)|UC]𝟙{UC}Ff(𝐚,𝐗~)h.f(\mathbf{a},\widetilde{\mathbf{X}})^{\mathcal{I}_{h}}=F^{-1}_{f(\mathbf{a},\widetilde{\mathbf{X}})}(U)\mathds{1}_{\{U\notin\bigcup_{C\in\mathcal{I}_{h}}C\}}+\sum_{C\in\mathcal{I}_{h}}\mathbb{E}[F^{-1}_{f(\mathbf{a},\widetilde{\mathbf{X}})}(U)|U\in C]\mathds{1}_{\{U\in C\}}\sim F_{f(\mathbf{a},\widetilde{\mathbf{X}})}^{\mathcal{I}_{h}}.

It follows that f(𝐚,𝐗~)h=𝔼[f(𝐚,𝐗~)|𝒢]f(\mathbf{a},\widetilde{\mathbf{X}})^{\mathcal{I}_{h}}=\mathbb{E}[f(\mathbf{a},\widetilde{\mathbf{X}})|\mathscr{G}], where 𝒢=σ(U𝟙{UChC})\mathscr{G}=\sigma(U\mathds{1}_{\{U\notin\bigcup_{C\in\mathcal{I}_{h}}C\}}). Similarly, X~ih=𝔼[X~i|𝒢]\widetilde{X}_{i}^{\mathcal{I}_{h}}=\mathbb{E}[\widetilde{X}_{i}|\mathscr{G}] for all i=1,,ni=1,\dots,n, where

X~ih=FX~i1(U)𝟙{UChC}+Ch𝔼[FX~i1(U)|UC]𝟙{UC}FX~ih.\widetilde{X}_{i}^{\mathcal{I}_{h}}=F^{-1}_{\widetilde{X}_{i}}(U)\mathds{1}_{\{U\notin\bigcup_{C\in\mathcal{I}_{h}}C\}}+\sum_{C\in\mathcal{I}_{h}}\mathbb{E}[F^{-1}_{\widetilde{X}_{i}}(U)|U\in C]\mathds{1}_{\{U\in C\}}\sim F_{\widetilde{X}_{i}}^{\mathcal{I}_{h}}.

Since ff is supermodular and positively homogeneous, we have by Theorem 3 of Marinacci and Montrucchio (2008) that f(𝐚,𝐗)f(\mathbf{a},\mathbf{X}) is concave in 𝐗\mathbf{X}. By Jensen’s inequality, we have

f(𝐚,𝐗~)h=𝔼[f(𝐚,𝐗~)|𝒢]f(𝐚,𝔼[𝐗~|𝒢])=f(𝐚,X~1h,,X~nh).f(\mathbf{a},\widetilde{\mathbf{X}})^{\mathcal{I}_{h}}=\mathbb{E}[f(\mathbf{a},\widetilde{\mathbf{X}})|\mathscr{G}]\leqslant f(\mathbf{a},\mathbb{E}[\widetilde{\mathbf{X}}|\mathscr{G}])=f(\mathbf{a},\widetilde{X}_{1}^{\mathcal{I}_{h}},\dots,\widetilde{X}_{n}^{\mathcal{I}_{h}}).

Thus we have

ρh(f(𝐚,𝐗))ρh(f(𝐚,𝐗~))=ρh(f(𝐚,𝐗~)h)\displaystyle\rho_{h^{*}}(f(\mathbf{a},\mathbf{X}))\leqslant\rho_{h^{*}}(f(\mathbf{a},\widetilde{\mathbf{X}}))=\rho_{h}(f(\mathbf{a},\widetilde{\mathbf{X}})^{\mathcal{I}_{h}}) ρh(f(𝐚,X~1h,,X~nh))\displaystyle\leqslant\rho_{h}(f(\mathbf{a},\widetilde{X}_{1}^{\mathcal{I}_{h}},\dots,\widetilde{X}_{n}^{\mathcal{I}_{h}}))
supF𝐘𝒟(F1,,Fn)supF11,,Fnnρh(f(𝐚,𝐘)),\displaystyle\leqslant\sup_{F_{\mathbf{Y}}\in\mathcal{D}(F_{1},\dots,F_{n})}\sup_{F_{1}\in\mathcal{F}_{1},\dots,F_{n}\in\mathcal{F}_{n}}\rho_{h}(f(\mathbf{a},\mathbf{Y})),

where the first inequality follows from Theorem 4.A.3 of Shaked and Shanthikumar (2007) and Theorem 5 of Wang et al. (2020a), and the equality follows from the proof of Theorem 1. Combined with the fact that

supF𝐗𝒟(F1,,Fn)supF11,,Fnnρh(f(𝐚,𝐗))supF𝐗𝒟(F1,,Fn)supF11,,Fnnρh(f(𝐚,𝐗)),\sup_{F_{\mathbf{X}}\in\mathcal{D}(F_{1},\dots,F_{n})}\sup_{F_{1}\in\mathcal{F}_{1},\dots,F_{n}\in\mathcal{F}_{n}}\rho_{h}(f(\mathbf{a},\mathbf{X}))\leqslant\sup_{F_{\mathbf{X}}\in\mathcal{D}(F_{1},\dots,F_{n})}\sup_{F_{1}\in\mathcal{F}_{1},\dots,F_{n}\in\mathcal{F}_{n}}\rho_{h^{*}}(f(\mathbf{a},\mathbf{X})),

we conclude that (17) holds.

(ii) Suppose that the supremum of the right-hand side of (17) is attained by some F11,,FnnF_{1}\in\mathcal{F}_{1},\dots,F_{n}\in\mathcal{F}_{n} and F𝐗𝒟(F1,,Fn)F_{\mathbf{X}}\in\mathcal{D}(F_{1},\dots,F_{n}). For comonotonic (X~1,,X~n)(\widetilde{X}_{1},\dots,\widetilde{X}_{n}) such that X~iFi\widetilde{X}_{i}\sim F_{i} for all i=1,,ni=1,\dots,n, using the argument in (i),

ρh(f(𝐚,𝐗))ρh(f(𝐚,X~1h,,X~nh)),\rho_{h^{*}}(f(\mathbf{a},\mathbf{X}))\leqslant\rho_{h}(f(\mathbf{a},\widetilde{X}_{1}^{\mathcal{I}_{h}},\dots,\widetilde{X}_{n}^{\mathcal{I}_{h}})),

where (\widetilde{X}_{1}^{\mathcal{I}_{h}},\dots,\widetilde{X}_{n}^{\mathcal{I}_{h}}) is comonotonic and \widetilde{X}_{i}^{\mathcal{I}_{h}}\sim F_{i}^{\mathcal{I}_{h}} for all i=1,\dots,n. Similarly to the proof of Theorem 1 (iii), since \rho_{h}\leqslant\rho_{h^{*}}, the supremum on the left-hand side of (17) is attained by F_{1}^{\mathcal{I}_{h}},\dots,F_{n}^{\mathcal{I}_{h}} and (\widetilde{X}_{1}^{\mathcal{I}_{h}},\dots,\widetilde{X}_{n}^{\mathcal{I}_{h}}), which also attain the supremum on the right-hand side of (17) since

ρh(f(𝐚,X~1h,,X~nh))\displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}\rho_{h^{*}}(f(\mathbf{a},\widetilde{X}_{1}^{\mathcal{I}_{h}},\dots,\widetilde{X}_{n}^{\mathcal{I}_{h}})) maxF𝐗𝒟(F1,,Fn)maxF11,,Fnnρh(f(𝐚,𝐗))\displaystyle\leqslant\max_{F_{\mathbf{X}}\in\mathcal{D}(F_{1},\dots,F_{n})}\max_{F_{1}\in\mathcal{F}_{1},\dots,F_{n}\in\mathcal{F}_{n}}\rho_{h^{*}}(f(\mathbf{a},\mathbf{X}))
=maxF𝐗𝒟(F1,,Fn)maxF11,,Fnnρh(f(𝐚,𝐗))\displaystyle=\max_{F_{\mathbf{X}}\in\mathcal{D}(F_{1},\dots,F_{n})}\max_{F_{1}\in\mathcal{F}_{1},\dots,F_{n}\in\mathcal{F}_{n}}\rho_{h}(f(\mathbf{a},\mathbf{X}))
=ρh(f(𝐚,X~1h,,X~nh))ρh(f(𝐚,X~1h,,X~nh)).\displaystyle=\rho_{h}(f(\mathbf{a},\widetilde{X}_{1}^{\mathcal{I}_{h}},\dots,\widetilde{X}_{n}^{\mathcal{I}_{h}}))\leqslant\rho_{h^{*}}(f(\mathbf{a},\widetilde{X}_{1}^{\mathcal{I}_{h}},\dots,\widetilde{X}_{n}^{\mathcal{I}_{h}})).~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}\qed

B.4 Proofs of results in Section 5 and related lemmas

In the following, we write q for the Hölder conjugate of p. The following lemma closely resembles Theorem 3.4 of Liu et al. (2020), with only an additional statement on the uniqueness of the quantile function of the maximizer.

Lemma A.1.

For hh\in\mathcal{H}^{*}, mm\in\mathbb{R}, v>0v>0 and p>1p>1, we have

\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\rho_{h}(Y)=mh(1)+v[h]_{q}.

If 0<[h]q<0<[h]_{q}<\infty, the above supremum is attained by a random variable XX such that FX(p,m,v)F_{X}\in\mathcal{M}(p,m,v) with its quantile function uniquely determined by

VaRt(X)=m+vϕhq(t),t(0,1)a.e.\mathrm{VaR}_{t}(X)=m+v\phi_{h}^{q}(t),~{}~{}t\in(0,1)~{}~{}\text{a.e.} (A.8)

If [h]q=0[h]_{q}=0, the above maximum value is attained by any random variable XX such that FX(p,m,v)F_{X}\in\mathcal{M}(p,m,v).

Proof.

The only statement going beyond Theorem 3.4 of Liu et al. (2020) is the uniqueness of the quantile function in (A.8). Without loss of generality, assume m=0 and v=1. Using Hölder's inequality, we obtain

supFY(p,0,1)01h(t)VaR1t(Y)dt\displaystyle\sup_{F_{Y}\in\mathcal{M}(p,0,1)}\int_{0}^{1}h^{\prime}(t)\mathrm{VaR}_{1-t}(Y)\,\mathrm{d}t =supFY(p,0,1)01(h(t)ch,q)VaR1t(Y)dt\displaystyle=\sup_{F_{Y}\in\mathcal{M}(p,0,1)}\int_{0}^{1}(h^{\prime}(t)-c_{h,q})\mathrm{VaR}_{1-t}(Y)\,\mathrm{d}t
supFY(p,0,1)hch,qq(01|VaR1t(Y)|pdt)1/p=[h]q.\displaystyle\leqslant\sup_{F_{Y}\in\mathcal{M}(p,0,1)}{\|h^{\prime}-c_{h,q}\|_{q}\left(\int_{0}^{1}|\mathrm{VaR}_{1-t}(Y)|^{p}\,\mathrm{d}t\right)^{1/p}}=[h]_{q}.

The maximum is attained by F_{X} only if the above inequality is an equality, which is equivalent to the function t\mapsto|\mathrm{VaR}_{1-t}(X)|^{p} being a multiple of |h^{\prime}-c_{h,q}|^{q}. Therefore,

VaRt(X)=|h(1t)ch,q|qh(1t)ch,q[h]q1q=ϕhq(t),t(0,1)a.e.\mathrm{VaR}_{t}(X)=\frac{|h^{\prime}(1-t)-c_{h,q}|^{q}}{h^{\prime}(1-t)-c_{h,q}}[h]_{q}^{1-q}=\phi_{h}^{q}(t),~{}~{}t\in(0,1)~{}~{}\mbox{a.e.}

Hence, the quantile function of XX is uniquely determined by (A.8). ∎
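A numerical sanity check of Lemma A.1 is possible for concrete h. Below we take the \mathrm{ES}_{\alpha} distortion h(t)=\min\{t/(1-\alpha),1\} with p=q=2, m=0 and v=1, reading the Section 5 notation (our assumption) as [h]_{q}=\|h^{\prime}-c_{h,q}\|_{q} with c_{h,q} the minimizing constant; for q=2 the minimizer is c=\int_{0}^{1}h^{\prime}=1. The quantile function (A.8) then satisfies both moment constraints and attains the bound, which here equals the classical value (\alpha/(1-\alpha))^{1/2}:

```python
import numpy as np

alpha, p, q = 0.9, 2.0, 2.0
t = (np.arange(1_000_000) + 0.5) / 1_000_000
hprime = np.where(t < 1 - alpha, 1 / (1 - alpha), 0.0)    # h'(t) for ES_alpha

c = np.mean(hprime)                                       # c_{h,2} = 1 (assumed notation)
h_q = np.mean(np.abs(hprime - c) ** q) ** (1 / q)         # [h]_2

d = hprime[::-1] - c                                      # h'(1-t) - c on the grid t
VaR = np.sign(d) * np.abs(d) ** (q - 1) * h_q ** (1 - q)  # phi_h^q(t), as in (A.8)

print("E[X]     =", np.mean(VaR))                         # ~ 0  (= m)
print("||X||_p  =", np.mean(np.abs(VaR) ** p) ** (1 / p)) # ~ 1  (= v)
print("rho_h(X) =", np.mean(hprime * VaR[::-1]))          # ~ [h]_2
print("[h]_2    =", h_q, "; sqrt(a/(1-a)) =", np.sqrt(alpha / (1 - alpha)))
```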

Lemma A.2.

For all hh\in\mathcal{H} with h=h^h=\hat{h}, mm\in\mathbb{R}, v>0v>0 and p>1p>1, if [h]q<[h^{*}]_{q}<\infty, we have

supFY(p,m,v)ρh(Y)=supFY(p,m,v)ρh(Y)=mh(1)+v[h]q,\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\rho_{h}(Y)=\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\rho_{h^{*}}(Y)=mh(1)+v[h^{*}]_{q},

and the above suprema are simultaneously attained by a random variable XX such that FX(p,m,v)F_{X}\in\mathcal{M}(p,m,v) with

VaRt(X)=m+vϕhq(t),t(0,1)a.e.\mathrm{VaR}_{t}(X)=m+v\phi_{h^{*}}^{q}(t),~{}~{}t\in(0,1)~{}a.e. (A.9)
Proof.

The statement directly follows from Theorem 1 and Lemma A.1. ∎

Proof of Theorem 5.

Together with Theorem 1, Lemmas A.1 and A.2 give the statement in Theorem 5 on the supremum. Arguments for the infimum are symmetric. For instance, noting that (h)=h(-h)^{*}=-h_{*}, Theorem 1 yields

infFY(p,m,v)ρh(Y)\displaystyle\inf_{F_{Y}\in\mathcal{M}(p,m,v)}\rho_{h}(Y) =supFY(p,m,v)ρh(Y)\displaystyle=-\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\rho_{-h}(Y)
=supFY(p,m,v)ρ(h)(Y)\displaystyle=-\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\rho_{(-h)^{*}}(Y)
=supFY(p,m,v)ρh(Y)=infFY(p,m,v)ρh(Y).\displaystyle=-\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\rho_{-h_{*}}(Y)=\inf_{F_{Y}\in\mathcal{M}(p,m,v)}\rho_{h_{*}}(Y).

We omit the detailed arguments for the infimum in Theorem 5. ∎

Proof of Proposition 5.

Note that ρhρh\rho_{h}\leqslant\rho_{h^{*}}, which is implied by hhh\leqslant h^{*} and (4). By Hölder’s inequality, for any YpY\in\mathcal{L}^{p}, using (13), we have

\displaystyle=\int_{0}^{1}({h^{*}}^{\prime}(t)-c_{h^{*},q})\mathrm{VaR}_{1-t}(Y)\,\mathrm{d}t+c_{h^{*},q}\mathbb{E}[Y]
[h]qYp+ch,q𝔼[Y]<.\displaystyle\leqslant[h^{*}]_{q}\|Y\|_{p}+c_{h^{*},q}\mathbb{E}[Y]<\infty.

The other half of the statement is analogous. ∎

Proof of Corollary 1.

We prove the first half (the suprema). The second half is symmetric to the first half. Theorem 5 and Lemma A.2 give

supFY(p,m,v)VaRα(Y)=supFY(p,m,v)ESα(Y)=m+v[h]q.\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\mathrm{VaR}_{\alpha}(Y)=\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\mathrm{ES}_{\alpha}(Y)=m+v[h^{*}]_{q}.

By Lemma A.1, the corresponding random variable ZZ which attains ESα(Z)=m+v[h]q\mathrm{ES}_{\alpha}(Z)=m+v[h^{*}]_{q} has left-quantile function

FZ1(t)=m+vϕhq(t)=m+v|11α𝟙(α,1](t)1|q11α𝟙(α,1](t)1[h]q1q,t[0,1]a.e.F_{Z}^{-1}(t)=m+v\phi_{h^{*}}^{q}(t)=m+v\frac{\left|\frac{1}{1-\alpha}\mathds{1}_{(\alpha,1]}(t)-1\right|^{q}}{\frac{1}{1-\alpha}\mathds{1}_{(\alpha,1]}(t)-1}[h^{*}]^{1-q}_{q},~{}~{}t\in[0,1]~{}~{}\text{a.e.}

Note that \phi_{h^{*}}^{q}(t) takes only two values, one on (\alpha,1] and one on [0,\alpha]. Thus Z is a bi-atomic random variable, and using \mathbb{E}[Z]=m, we have, for some k_{p}>0,

(Z=m+αkp)=1α and (Z=m(1α)kp)=α.\mathbb{P}\left(Z=m+\alpha k_{p}\right)=1-\alpha\mbox{~{}~{}and~{}~{}}\mathbb{P}\left(Z=m-(1-\alpha)k_{p}\right)=\alpha.

We note that the number kpk_{p} can be determined from 𝔼[|Zm|p]=vp\mathbb{E}[|Z-m|^{p}]=v^{p}, that is,

kp=v(αp(1α)+(1α)pα)1/p,k_{p}=v\left(\alpha^{p}(1-\alpha)+(1-\alpha)^{p}\alpha\right)^{-1/p},

leading to

supFY(p,m,v)VaRα(Y)=supFY(p,m,v)ESα(Y)=m+vα(αp(1α)+(1α)pα)1/p,\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\mathrm{VaR}_{\alpha}(Y)=\sup_{F_{Y}\in\mathcal{M}(p,m,v)}\mathrm{ES}_{\alpha}(Y)=m+v\alpha\left(\alpha^{p}(1-\alpha)+(1-\alpha)^{p}\alpha\right)^{-1/p},

and thus the desired equalities in the statement on suprema hold. ∎