Quantitative Statistical Robustness for Tail-Dependent Law Invariant Risk Measures

Wei Wang (School of Business, University of Southampton, Southampton, SO17 1BJ, UK; ww1e17@soton.ac.uk), Huifu Xu (Department of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong; hfxu@se.cuhk.edu.hk), Tiejun Ma (School of Business, University of Southampton, Southampton, SO17 1BJ, UK; tiejun.ma@soton.ac.uk)

Abstract

When estimating the risk of a financial position with empirical data or Monte Carlo simulations via a tail-dependent law invariant risk measure such as the Conditional Value-at-Risk (CVaR), it is important to ensure robustness of the statistical estimator, particularly when the data contain noise. Krätschmer et al. [1] propose a new framework to examine the qualitative robustness of estimators for tail-dependent law invariant risk measures on Orlicz spaces, which is a step further from earlier work on the robustness of risk measurement procedures by Cont et al. [2]. In this paper, we follow this stream of research and propose a quantitative approach for verifying the statistical robustness of tail-dependent law invariant risk measures. A distinct feature of our approach is that we use the Fortet-Mourier metric to quantify the variation of the true underlying probability measure in the analysis of the discrepancy between the laws of the plug-in estimators of a law invariant risk measure based on the true data and on perturbed data. This enables us to derive an explicit error bound for the discrepancy when the risk functional is Lipschitz continuous with respect to a class of admissible laws. Moreover, the newly introduced notion of Lipschitz continuity allows us to examine the degree of robustness for tail-dependent risk measures. Finally, we apply our quantitative approach to some well-known risk measures to illustrate our theory.

Keywords. Quantitative robustness, tail-dependent law invariant risk measures, Fortet-Mourier metric, admissible laws, index of quantitative robustness.

1 Introduction

One of the main purposes of quantitative modeling in finance is to quantify the loss of a financial portfolio. Over the past two decades, various risk measures have been proposed for measuring the risk of financial portfolios. A risk measure is represented as a map assigning an extended real number (a measure of risk) to each random loss under an implicit assumption that the true loss probability distribution is known. However, in practice, the true probability distribution is often unknown or it is prohibitively expensive to calculate the risk using the true distribution. Thus, in applications, evaluating the risk of a random variable representing the loss of a financial position often involves two steps: estimating the probability distribution from available observations or from the sampling data of the random financial loss via, e.g., a Monte Carlo method, and then plugging the estimated distribution into a risk measure to quantify the financial loss. This is possible because risk measures are mostly law invariant, that is, they are determined only by the probability distributions of random variables. For the loss of a financial portfolio, a measure of risk computed based on the estimated distribution is known as a plug-in estimate for the risk measure [3].

Let $X$ denote the random loss of a financial portfolio on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and $\rho$ be a law invariant risk measure. The plug-in estimate for $\rho(X)$ is given by $\varrho(\widehat{P})$ , where $\widehat{P}$ is the empirical distribution based on available observations and $\varrho$ is a risk functional defined by

\displaystyle\varrho(P)=\rho(X),\;\;\mbox{\rm{if}}\;X\;\mbox{\rm{has law}}\;P;

(1.1)

see e.g. [4, 5]. In the literature, Cont et al. [2] first study the quality of statistical estimators of the law invariant risk measures using Hampel’s classical concept of qualitative robustness [6], that is, a risk functional estimator is said to be qualitatively robust if it is insensitive to the variation of the sampling data. The research is important because perceived data (particularly empirical data) may contain some noise. Without such insensitivity, financial activities based on the risk measures may cause damage. For instance, when $\rho(X)$ is applied to allocate the risk capital for an insurance company, altering the capital allocation may be costly. According to Hampel’s theorem, Cont et al. [2] demonstrate that the qualitative robustness of a statistical estimator is equivalent to the weak continuity of the risk functional, and that value at risk (VaR) is qualitatively robust whereas conditional value at risk (CVaR) is not.

Krätschmer et al. [7] argue that the use of Hampel’s classical concept of qualitative robustness may be problematic because it requires the risk measure essentially to be insensitive with respect to the tail behaviour of the random variable, and the recent financial crisis shows that a faulty estimate of tail behaviour can lead to a drastic underestimation of the risk. Consequently, they propose a refined notion of qualitative robustness that applies also to tail-dependent statistical functionals and that allows us to compare statistical functionals with regard to their degree of robustness. The new concept captures the trade-off between robustness and sensitivity and can be quantified by an index of qualitative robustness. Furthermore, under the new concept, Krätschmer et al. [1] analyze the qualitative robustness of law-invariant convex risk measures on Orlicz spaces and show that CVaR and spectral risk measures are all qualitatively robust when the perturbation of the probability distribution is restricted to a finer topological space. Alternative generalizations of Hampel’s theorem can be found for strong mixing data (Zähle [8, 9]) and for stochastic processes in various ways (Boente et al. [10] and Strohriegl and Hable [11]). For a comprehensive study of statistical robustness, we refer readers to [12, 13, 14, 15] and references therein.

In this paper, we take a step further by deriving an error bound for the plug-in estimators of law invariant risk measures in terms of the variation of data. We call the analysis quantitative because no such error bound is established in the existing qualitative robustness analysis. This is achieved by adopting different metrics to measure the discrepancy of the estimators and the variation of data. Specifically, we use the Fortet-Mourier metrics as opposed to the Lévy distance in Cont et al. [2] or the weighted Kolmogorov metric in Krätschmer et al. [7] to quantify the data variation (the perturbation of the true probability distributions). Moreover, we introduce a new notion of the so-called admissible laws, which effectively restrict the scope of data variation. The new metrics enable us to establish an explicit relationship between the discrepancy of the laws of the plug-in estimators (of a law invariant risk measure based on the true data and perturbed data) and the discrepancy of the associated probability distributions of the data. The research is inspired by the recent work of Guo and Xu [16], where the authors derive quantitative statistical robustness for preference robust optimization models under the Kantorovich metric. The main contributions of the paper can be summarized as follows.

First, we introduce the notion of admissible laws induced by a probability metric, which is a class of probability distributions whose discrepancy with the law of the Dirac measure at $0$ is finite. The admissibility effectively restricts the scope of data perturbation. Using the notion, we compare the admissibility under $\phi$ -topology and the Fortet-Mourier metric.

Second, we propose to use the Fortet-Mourier metric to quantify the variation of the probability measure. The metric enables us to establish an explicit relationship between the discrepancy of the laws of the plug-in estimators of law invariant risk measure based on the true data and perturbed data by noise and the change of the true underlying probability measures when the risk functional is Lipschitz continuous on a class of admissible laws. We find that the risk functionals associated with the general moment-type convex risk measures are Lipschitz continuous.

Third, we introduce the concept of Lipschitz continuity for a general statistical functional on a class of admissible laws induced by the Fortet-Mourier metric and find that, for a Lipschitz continuous risk measure, the parameter of the Fortet-Mourier metric allows us to compare tail-dependent risk measures with regard to their degree of robustness, i.e., the index of statistical robustness.

Fourth, we apply the new approach to examine the quantitative statistical robustness of a range of well-known risk measures, including CVaR, the optimized certainty equivalent and the shortfall risk measure. We conclude that, under mild conditions, they are all quantitatively robust, and we calculate their indices of quantitative robustness.

The rest of the paper is organized as follows. In Section 2, we set up the background of the research problem. In Section 3, we introduce the concept of Fortet-Mourier metric and admissible laws. In Section 4, we establish the quantitative statistical robustness theory and compare it with the qualitative statistical robustness theory. In Section 5, we apply our theory to risk measures and give some examples. Some technical details are given in the appendix.

2 Problem statement

In this section, we discuss the background of statistical robustness in the context of law invariant risk measures. We begin with a brief review of law invariant risk measures and their estimation, and then explain the issues that arise when the data may contain noise.

Let $(\Omega,\mathcal{F},\mathbb{P})$ be an atomless probability space, where $\Omega$ is a sample space with sigma algebra $\mathcal{F}$ and $\mathbb{P}$ is a probability measure. Let $X:(\Omega,\mathcal{F},\mathbb{P})\rightarrow{\rm I\!R}$ be a financial loss and $F_{X}(x):=\mathbb{P}(X\leq x)$ be the law or the probability distribution of $X$. For $p\geq 1$, let $\mathscr{L}^{p}(\Omega,{\cal F},\mathbb{P})$ ($\mathscr{L}^{p}$ for short) denote the space of random variables mapping from $(\Omega,{\cal F},\mathbb{P})$ to ${\rm I\!R}$ with finite $p$-th order moments. We say that a map $\rho:\mathscr{L}^{1}\to\overline{{\rm I\!R}}:={\rm I\!R}\cup\{+\infty\}$ is a convex risk measure [18] if it satisfies the following properties (we note that the canonical model space for law invariant convex risk measures is $\mathscr{L}^{1}$ [17]):

(i) Monotonicity: $\rho(X)\leq\rho(Y)$ for $X,Y\in\mathscr{L}^{1}$ with $X\leq Y$ $\mathbb{P}$-almost surely;

(ii) Translation invariance: $\rho(X+c)=\rho(X)+c$ for $X\in\mathscr{L}^{1}$ and $c\in{\rm I\!R}$;

(iii) Convexity: $\rho(\lambda X+(1-\lambda)Y)\leq\lambda\rho(X)+(1-\lambda)\rho(Y)$ for $X,Y\in\mathscr{L}^{1}$ and $\lambda\in[0,1]$.

Moreover, if $\rho$ satisfies positive homogeneity, i.e., for any $\alpha\geq 0$ , $\rho(\alpha X)=\alpha\rho(X)$ , then $\rho$ is a coherent risk measure, see [19, 18] for the original definitions of these concepts. A risk measure $\rho$ is said to be law invariant if $\rho(X)=\rho(Y)$ for $X$ and $Y$ having the same law. We refer readers to Föllmer and Weber [20] for a recent overview of risk measures.

As discussed in [2, 7], it is a widely-accepted procedure to estimate the risk of a financial loss by means of a Monte Carlo method or from a set of available observations. Such a procedure is particularly sensible when $\rho$ is law invariant. The following proposition states that the law invariance of a risk measure $\rho$ is equivalent to the existence of a risk functional $\varrho$ in (1.1).

Proposition 2.1

Let $\mathscr{P}({\rm I\!R})$ denote the set of all probability measures on ${\rm I\!R}$ . If $\rho:\mathscr{L}^{1}\to\overline{{\rm I\!R}}$ is a law invariant risk measure, then there exists a unique risk functional $\varrho:\mathscr{P}({\rm I\!R})\to\overline{{\rm I\!R}}$ associated with $\rho$ such that for any $X\in\mathscr{L}^{1}$ ,

\displaystyle\rho(X)=\varrho(\mathbb{P}\circ X^{-1}).

(2.1)

The result is well-known, see for instance Delage et al. [21] for random variables defined in $\mathscr{L}^{\infty}$ . The usefulness of the representation is that it naturally captures the law invariance and allows one to define any law invariant risk measure directly over the space of probability measures $\mathscr{P}({\rm I\!R})$ induced by random variables in $\mathscr{L}^{\infty}$ (also known as probability distributions), see Fritelli et al. [22]. Dentcheva and Ruszczyński [23] take it further to define a class of law invariant risk measures in the space of quantile functions directly. In a more recent development, Haskell et al. [24] extend the research to a broad class of multi-attribute choice functions defined over the space of survival functions. Let $P:=\mathbb{P}\circ X^{-1}$ be the push-forward probability measure on ${\rm I\!R}$ induced by $X$ . Since $\mathbb{P}(X\leq x)$ coincides with $P((-\infty,x])$ ( $P(x)$ for short), we also call $P$ the distribution or the law of $X$ interchangeably throughout the paper. Consequently, we can write (2.1) as (1.1).

In this paper, we are not concerned with the definition of risk measures over the space of probability distributions or the space of quantile functions, rather we concentrate on the stability of statistical estimators of law invariant risk measures. The risk functional $\varrho(P)$ with the law $P=\mathbb{P}\circ X^{-1}$ can be used in a natural way to construct an estimator for the risk $\rho(X)$ of $X\in\mathscr{L}^{1}$ . All one needs to do is to take an estimate $P_{N}$ of $P$ based on the available observations of $X$ and then to plug this estimator into the risk functional $\varrho$ to obtain the desired estimator of $\rho(X)$ , i.e.,

\displaystyle\widehat{\varrho}_{N}(\xi^{1},\xi^{2},\ldots,\xi^{N}):=\varrho(P_{N}),

(2.2)

where in this paper, $P_{N}$ can be seen as the empirical distribution of an independent and identically distributed (i.i.d., for short) sequence $\xi^{1},\xi^{2},\ldots,\xi^{N}$ of historical observations or Monte Carlo simulations, i.e.,

\displaystyle P_{N}(x):=\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{\xi^{i}\leq x},\quad x\in{\rm I\!R}.

(2.3)

Here and later on, $\mathbf{1}_{A}$ denotes the indicator function of the event $A$. Indeed, $P_{N}$ can be a fairly general estimate: for instance, $P_{N}$ can be a smoothed empirical distribution based on uncensored data or an empirical distribution based on censored data, see, e.g., [3], or an empirical distribution based on identically distributed dependent data, see, e.g., [9].
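The plug-in construction (2.2)-(2.3) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `empirical_cdf` and `plug_in_cvar` are hypothetical, and the sketch assumes that CVaR at level $\alpha\in(0,1)$ is given by the Rockafellar-Uryasev formula $\min_{t}\{t+\frac{1}{1-\alpha}\mathbb{E}[(X-t)^{+}]\}$ for losses, a definition that does not appear in this section.

```python
import bisect


def empirical_cdf(samples):
    """Return P_N as a function x -> (1/N) * #{i : xi_i <= x}, cf. (2.3)."""
    data = sorted(samples)
    n = len(data)

    def cdf(x):
        # bisect_right counts how many sorted samples are <= x.
        return bisect.bisect_right(data, x) / n

    return cdf


def plug_in_cvar(samples, alpha):
    """Plug-in estimate varrho(P_N) of CVaR at level alpha (assumed loss convention).

    Under P_N the objective t + E[(X - t)^+] / (1 - alpha) is convex and
    piecewise linear in t with kinks at the samples, so its minimum is
    attained at one of the sample points.
    """
    data = sorted(samples)
    n = len(data)

    def objective(t):
        return t + sum(max(x - t, 0.0) for x in data) / (n * (1.0 - alpha))

    return min(objective(t) for t in data)


losses = [1.0, 2.0, 3.0, 4.0]
print(empirical_cdf(losses)(2.5))   # value of P_N at 2.5
print(plug_in_cvar(losses, 0.5))    # average of the two largest losses
```

The quadratic cost of scanning every sample point keeps the sketch short; a sorted tail average would be the natural choice for large $N$.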

We can see that $\widehat{\varrho}_{N}$ is a mapping from ${\rm I\!R}^{N}$ to ${\rm I\!R}$ . Figure 1 illustrates the relationship between the risk functionals, their estimators and the spaces associated.

Figure 1: The diagram for risk functionals, their estimators and associated spaces

In practice, the samples obtained from empirical data may contain noise. In that case, we might regard the samples as generated by a perturbed random variable $Y$ with law $Q$, that is, $Q=\mathbb{P}\circ Y^{-1}$. Let $\tilde{\xi}^{1},\cdots,\tilde{\xi}^{N}$ be i.i.d. samples from $Y$. Then the practical empirical distribution function for estimating the law of $X$ is

\displaystyle Q_{N}(x):=\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{\tilde{\xi}^{i}\leq x},\quad x\in{\rm I\!R},

(2.4)

and the practical estimator is $\tilde{\varrho}_{N}=\widehat{\varrho}_{N}(\tilde{\xi}^{1},\cdots,\tilde{\xi}^{N}):=\varrho(Q_{N})$ with perceived empirical data whereas $\widehat{\varrho}_{N}$ is a statistical estimator with noise being detached. Since we are unable to obtain the latter, we tend to use the former as a statistical estimator of $\rho(X)$ and this works only if the two estimators are sufficiently close.

To quantify the closeness, we may look into the discrepancy between the laws of the two estimators under some metric $\mathsf{d\kern-0.70007ptl}$ , i.e.,

\displaystyle\mathsf{d\kern-0.70007ptl}(\mathrm{law}\{\varrho(P_{N})\},\mathrm{law}\{\varrho(Q_{N})\})=\mathsf{d\kern-0.70007ptl}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right),

(2.5)

where $P^{\otimes N}$ and $Q^{\otimes N}$ denote the probability measures on the measurable space $\left({\rm I\!R}^{N},{\cal B}({\rm I\!R})^{\otimes N}\right)$ with marginals $P$ and $Q$ on each $({\rm I\!R},{\cal B}({\rm I\!R}))$ respectively, and $\mathcal{B}({\rm I\!R})$ denotes the corresponding Borel sigma algebra of ${\rm I\!R}$. Since neither $P$ nor $Q$ is known, we want the discrepancy to be uniformly small for all $P$ and $Q$ over a subset of admissible laws on $\mathscr{P}({\rm I\!R})$ so long as $Q$ is sufficiently close to $P$ under some metric $\mathsf{d\kern-0.70007ptl}^{\prime}$. The uniformity may be interpreted as robustness. Qualitative robustness refers to the case that the relationship between $\mathsf{d\kern-0.70007ptl}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right)$ and $\mathsf{d\kern-0.70007ptl}^{\prime}(P,Q)$ is implicit, whereas quantitative robustness refers to the case that the relationship is explicit, i.e., a function of the latter can be used to bound the former. The explicit relationship is what we aim to achieve in this paper because qualitative robustness has been well investigated, for instance, in [2, 1, 7].
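The quantity in (2.5) can be approximated by simulation. The following Python sketch is illustrative only and rests on several assumptions that are not in the text: the true law $P$ is taken to be standard normal, the perturbed law $Q$ adds small independent Gaussian noise, the risk functional is a simple upper-tail average standing in for CVaR, and $\mathsf{d\kern-0.70007ptl}$ is taken to be the Kantorovich metric. All function names are hypothetical.

```python
import random


def kantorovich_equal_size(xs, ys):
    """Kantorovich distance between two empirical laws with equally many atoms.

    For equal sample sizes, the integral of |F - G| reduces to the mean
    absolute difference of the order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def tail_mean(samples, alpha=0.9):
    """Average of the largest (1 - alpha) fraction of the samples (stand-in for CVaR)."""
    data = sorted(samples)
    k = max(1, round(len(data) * (1.0 - alpha)))
    return sum(data[-k:]) / k


def estimator_law(sampler, functional, n, replications, rng):
    """Simulate `replications` data sets of size n and return the resulting estimates."""
    return [functional([sampler(rng) for _ in range(n)]) for _ in range(replications)]


def clean(rng):
    # Assumed true loss law P: standard normal.
    return rng.gauss(0.0, 1.0)


def noisy(rng):
    # Assumed perturbed law Q: the same loss plus small measurement noise.
    return rng.gauss(0.0, 1.0) + 0.1 * rng.gauss(0.0, 1.0)


rng = random.Random(0)
law_p = estimator_law(clean, tail_mean, n=200, replications=300, rng=rng)
law_q = estimator_law(noisy, tail_mean, n=200, replications=300, rng=rng)
# Monte Carlo proxy for the discrepancy between law{varrho(P_N)} and law{varrho(Q_N)}.
print(kantorovich_equal_size(law_p, law_q))
```

Repeating the experiment with a smaller noise scale gives a rough empirical picture of how the discrepancy between the two estimator laws shrinks as $Q$ approaches $P$.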

3 $\zeta$ -metrics and admissible laws

There are two essential elements in investigating both the qualitative and quantitative statistical robustness of a risk functional: One is the specific choice of probability metrics but not just the topologies generated by them, see, e.g., [12, 2, 7], to quantify the change of the law $P$ and to estimate the discrepancy between the laws of two estimators, i.e., (2.5); the other is the determination of the subset $\mathscr{M}$ of admissible laws in $\mathscr{P}({\rm I\!R})$ (see, e.g., [7, 9]), containing all empirical distributions: $\mathscr{M}_{1,\mathrm{emp}}\subset\mathscr{M}$ , to restrict the perturbation of the law $P$ . For instance, the subset $\mathscr{M}$ may be specified via some generalized moment conditions, which are interesting in econometric or financial applications.

To introduce these two essential elements thoroughly, some preliminary notions and results in probability theory and statistics such as the $\phi$-weak topology are required. We first give a sketch of them to prepare our discussions in the follow-up sections. Let $\phi:{\rm I\!R}\to[0,\infty)$ be a continuous function and ${\cal M}_{1}^{\phi}:=\left\{P^{\prime}\in\mathscr{P}({\rm I\!R}):\int_{{\rm I\!R}}\phi(t)P^{\prime}(dt)<\infty\right\}$. In the particular case when $\phi(\cdot):=|\cdot|^{p}$ and $p$ is a positive number, write ${\cal M}_{1}^{p}$ for ${\cal M}_{1}^{|\cdot|^{p}}$. Note that ${\cal M}_{1}^{\phi}$ defines a subset of probability measures in $\mathscr{P}({\rm I\!R})$ which satisfies the generalized moment condition of $\phi$. From the definition, we can see that ${\cal M}_{1}^{p_{2}}\subset{\cal M}_{1}^{p_{1}}$ for any positive numbers $p_{1},p_{2}$ with $p_{1}<p_{2}$ by Hölder's inequality.

Definition 3.1 ( $\phi$ -weak topology)

Let $\phi:{\rm I\!R}\to[0,\infty)$ be a gauge function, that is, $\phi$ is continuous and $\phi\geq 1$ holds outside a compact set. Define ${\cal C}_{1}^{\phi}$ as the linear space of all continuous functions $h:{\rm I\!R}\to{\rm I\!R}$ for which there exists a positive constant $c$ such that

|h(t)|\leq c(\phi(t)+1),\forall t\in{\rm I\!R}.

The $\phi$ -weak topology, denoted by $\tau_{\phi}$ , is the coarsest topology on ${\cal M}_{1}^{\phi}$ for which the mapping $g_{h}:{\cal M}_{1}^{\phi}\to{\rm I\!R}$ defined by $g_{h}(P^{\prime}):=\int_{{\rm I\!R}}h(t)P^{\prime}(dt),\;\forall h\in{\cal C}_{1}^{\phi}$ , is continuous. A sequence $\{P_{l}\}\subset{\cal M}_{1}^{\phi}$ is said to converge $\phi$ -weakly to $P\in{\cal M}_{1}^{\phi}$ written ${P_{l}}\xrightarrow[]{\phi}P$ if it converges w.r.t. $\tau_{\phi}$ .

Clearly, $\phi$ -weak topology is finer than the weak topology, and the two topologies coincide if and only if $\phi$ is bounded. It is well known (see [7, Lemma 3.4]) that $\phi$ -weak convergence is equivalent to weak convergence, denoted by ${P_{l}}\xrightarrow[]{w}P$ , together with $\int_{{\rm I\!R}}\phi(t)P_{l}(dt)\to\int_{{\rm I\!R}}\phi(t)P(dt)$ . Moreover, it follows by [7, 1] that the $\phi$ -weak topology on ${\cal M}_{1}^{\phi}$ is generated by the metric $\mathsf{d\kern-0.70007ptl}_{\phi}:{\cal M}_{1}^{\phi}\times{\cal M}_{1}^{\phi}\to{\rm I\!R}$ defined by

\displaystyle\mathsf{d\kern-0.70007ptl}_{\phi}(P,Q):=\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}(P,Q)+\left|\int_{{\rm I\!R}}\phi(t)P(dt)-\int_{{\rm I\!R}}\phi(t)Q(dt)\right|,

(3.1)

for $P,Q\in{\cal M}_{1}^{\phi}$ , where $\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}:\mathscr{P}({\rm I\!R})\times\mathscr{P}({\rm I\!R})\to{\rm I\!R}_{+}$ is the Prokhorov metric defined by

\displaystyle\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}(P,Q):=\inf\{\epsilon>0:P(A)\leq Q(A^{\epsilon})+\epsilon,\forall A\in\mathcal{B}({\rm I\!R})\},

(3.2)

where $A^{\epsilon}:=A+B_{\epsilon}(0)$ denotes the Minkowski sum of $A$ and the open ball centred at $0$ on ${\rm I\!R}$ and $\mathcal{B}({\rm I\!R})$ is the corresponding Borel sigma algebra on ${\rm I\!R}$. We note that the Prokhorov metric metrizes the weak topology on $\mathscr{P}({\rm I\!R})$, see, e.g., [25].
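To see why the $\phi$-weak topology is strictly finer than the weak topology for unbounded $\phi$, consider the standard sequence $P_{l}=(1-1/l)\delta_{0}+(1/l)\delta_{l}$, which converges weakly to $\delta_{0}$ while its first absolute moment stays equal to $1$. The example and the helper names below are illustrative and not taken from the paper; the sketch evaluates a single bounded continuous test function, which only illustrates (and does not prove) the weak convergence.

```python
def mixture(l):
    """P_l = (1 - 1/l) * delta_0 + (1/l) * delta_l, stored as (atom, mass) pairs."""
    return [(0.0, 1.0 - 1.0 / l), (float(l), 1.0 / l)]


def integrate(h, law):
    """Integral of h with respect to a finitely supported law."""
    return sum(mass * h(atom) for atom, mass in law)


def bounded_test(t):
    # A bounded continuous test function; its integral under delta_0 is 0.
    return min(abs(t), 1.0)


for l in (10, 100, 1000):
    law = mixture(l)
    # The first integral equals 1/l, the second is the moment for phi(t) = |t|.
    print(l, integrate(bounded_test, law), integrate(abs, law))
```

The first column of integrals tends to $0=\int h\,d\delta_{0}$, whereas $\int|t|P_{l}(dt)=1$ does not tend to $\int|t|\delta_{0}(dt)=0$, so $P_{l}$ does not converge $\phi$-weakly for $\phi=|\cdot|$.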

3.1 $\zeta$ -metrics

Instead of exploiting the widely-used probability metrics such as the Prokhorov metric and the weighted Kolmogorov metric in the literature of qualitative robustness [2, 7], we will switch to the so-called metrics with $\zeta$ -structure to establish the quantitative statistical robustness framework for a risk functional. In particular, we will use the well-known Kantorovich metric and Fortet-Mourier metrics. The new metrics enable us to establish an explicit relationship between the discrepancy of the laws of the plug-in estimators of law invariant risk measures based on the true data and perturbed data with noise and the discrepancy of the associated true probability measure. We begin with a formal definition of $\zeta$ -metrics and then clarify the relationships between metrics of $\zeta$ -structure and those used in [26, 2, 7].

Definition 3.2

Let $P,Q\in\mathscr{P}({\rm I\!R})$ and ${\cal F}$ be a class of measurable functions from ${\rm I\!R}$ to ${\rm I\!R}$ . The metric with $\zeta$ -structure is defined by

\displaystyle\mathsf{d\kern-0.70007ptl}_{\mathcal{F}}(P,Q):=\sup_{\psi\in\mathcal{F}}\left|\int_{{\rm I\!R}}\psi(\xi)P(d\xi)-\int_{{\rm I\!R}}\psi(\xi)Q(d\xi)\right|.

(3.3)

From the definition, we can see that $\mathsf{d\kern-0.70007ptl}_{\mathcal{F}}(P,Q)$ is the maximum difference of the expected values of the class of measurable functions $\mathcal{F}$ with respect to $P$ and $Q$ . $\zeta$ -metrics are widely used in the stability analysis of stochastic programming, see Römisch [27] for an excellent overview. The specific metrics with $\zeta$ -structure that we consider in this paper are the Kantorovich metric and the Fortet-Mourier metric. The next definition gives a precise description of the two notions.

Definition 3.3 (Fortet-Mourier metric)

Let

\displaystyle\mathcal{F}_{p}({\rm I\!R}):=\left\{\psi:{\rm I\!R}\rightarrow{\rm I\!R}:|\psi(\xi)-\psi(\tilde{\xi})|\leq c_{p}(\xi,\tilde{\xi})|\xi-\tilde{\xi}|,\forall\xi,\tilde{\xi}\in{\rm I\!R}\right\},

(3.4)

where $c_{p}(\xi,\tilde{\xi}):=\max\{1,|\xi|,|\tilde{\xi}|\}^{p-1}$ for all $\xi,\tilde{\xi}\in{\rm I\!R}$ and $p\geq 1$ describes the growth of the local Lipschitz constants. The $p$ -th order Fortet-Mourier metric for $P,Q\in\mathscr{P}({\rm I\!R})$ is defined by

\displaystyle\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q):=\sup_{\psi\in\mathcal{F}_{p}({\rm I\!R})}\left|\int_{{\rm I\!R}}\psi(\xi)P(d\xi)-\int_{{\rm I\!R}}\psi(\xi)Q(d\xi)\right|.

(3.5)

In the case when $p=1$ , it is known as the Kantorovich metric for $P,Q\in\mathscr{P}({\rm I\!R})$

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}(P,Q):=\sup_{\psi\in\mathcal{F}_{1}({\rm I\!R})}\left|\int_{{\rm I\!R}}\psi(\xi)P(d\xi)-\int_{{\rm I\!R}}\psi(\xi)Q(d\xi)\right|.

(3.6)

From the definition, we can see that for any positive numbers $p\geq p^{\prime}\geq 1$ ,

\displaystyle\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)\geq\mathsf{d\kern-0.70007ptl}_{FM,p^{\prime}}(P,Q)\geq\mathsf{d\kern-0.70007ptl}_{K}(P,Q),

(3.7)

which means that $\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)$ becomes tighter as $p$ increases and they are all tighter than $\mathsf{d\kern-0.70007ptl}_{K}(P,Q)$ . Moreover, the Fortet–Mourier metric metricizes weak convergence on sets of probability measures possessing uniformly a $p$ -th moment [28, p. 350]. Notice that the function $t\rightarrow\frac{1}{p}|t|^{p}$ for $t\in{\rm I\!R}$ belongs to $\mathcal{F}_{p}({\rm I\!R})$ . On ${\rm I\!R}$ , the Fortet–Mourier metric may be equivalently written as

\displaystyle\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)=\int_{{\rm I\!R}}\max\{1,|x|^{p-1}\}|P(x)-Q(x)|dx,\;\mbox{\rm{for}}\;P,Q\in\mathscr{P}({\rm I\!R}),

(3.8)

see, e.g., [29, p. 93].
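For two empirical distributions, the representation (3.8) can be evaluated exactly, because both distribution functions are piecewise constant and the weight $\max\{1,|x|^{p-1}\}$ has a closed-form antiderivative. The Python sketch below is an illustration under these assumptions; the function names are hypothetical and the code relies on (3.8) as stated above.

```python
import bisect


def weight_antiderivative(x, p):
    """Antiderivative W of w(x) = max{1, |x|^(p-1)} with W(0) = 0 (W is odd)."""
    if abs(x) <= 1.0:
        return x
    value = 1.0 + (abs(x) ** p - 1.0) / p
    return value if x > 0 else -value


def fortet_mourier(xs, ys, p=1.0):
    """d_{FM,p} between the empirical laws of xs and ys via the CDF formula (3.8)."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_x = bisect.bisect_right(xs, left) / len(xs)
        cdf_y = bisect.bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (
            weight_antiderivative(right, p) - weight_antiderivative(left, p)
        )
    return total


# Dirac measures at 0 and 2: p = 1 gives the Kantorovich distance.
print(fortet_mourier([0.0], [2.0], p=1.0))  # 2.0
print(fortet_mourier([0.0], [2.0], p=2.0))  # 2.5
```

With $p=1$ the weight is identically $1$ and the function returns the Kantorovich distance; increasing $p$ puts more weight on the tails, which is consistent with the ordering (3.7).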

In the next example, we illustrate the relationship between the existing probability metrics used in statistical robustness and the metrics with $\zeta$ -structure.

Example 3.1

A number of well known probability metrics are used in the literature of statistical robustness.

(i) The Kantorovich (or Wasserstein) metric. Let $\mathcal{F}_{1}$ be the set of all Lipschitz continuous functions with modulus being bounded by $1$ . Then

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}(P,Q):=\int_{-\infty}^{+\infty}|P(x)-Q(x)|dx=\mathsf{d\kern-0.70007ptl}_{\mathcal{F}_{1}}(P,Q).

(3.9)

Moreover, $\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}(P,Q)^{2}\leq\mathsf{d\kern-0.70007ptl}_{K}(P,Q)$ , see [25, Theorem 2].

(ii) The Lévy distance [29]. Let $\mathcal{F}$ be the set of functions bounded by 1. Then

\mathsf{d\kern-0.70007ptl}_{\mathrm{L\acute{e}vy}}(P,Q):=\inf\{\epsilon>0:Q(x-\epsilon)-\epsilon\leq P(x)\leq Q(x+\epsilon)+\epsilon,\;\forall\;x\in{\rm I\!R}\}\leq\mathsf{d\kern-0.70007ptl}_{\mathcal{F}}(P,Q).

Moreover, $\mathsf{d\kern-0.70007ptl}_{\mathrm{L\acute{e}vy}}(P,Q)\leq\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}(P,Q)$ and $\mathsf{d\kern-0.70007ptl}_{\mathrm{L\acute{e}vy}}(P,Q)\leq\mathsf{d\kern-0.70007ptl}_{(\phi)}(P,Q)$ for any $\phi\geq 1$ , see, e.g., [25].

(iii) The weighted Kolmogorov metric [7]. Let $\phi$ be a $u$ -shaped function, i.e., a continuous function $\phi:{\rm I\!R}\rightarrow[1,+\infty)$ that is non-increasing on $(-\infty,0)$ and non-decreasing on $(0,+\infty)$ . Then the weighted Kolmogorov metric is defined as

\mathsf{d\kern-0.70007ptl}_{(\phi)}(P,Q):=\sup_{x\in{\rm I\!R}}|P(x)-Q(x)|\phi(x)\leq\mathsf{d\kern-0.70007ptl}_{\mathcal{F}}(P,Q),

where $\mathcal{F}$ is the set of all functions bounded by $\phi$ . Precisely, if $\mathcal{F}$ is the set of all indicator functions $\mathbf{1}_{B}$ , where $B:=\{(-\infty,\xi],\xi\in{\rm I\!R}\}$ , then $\mathsf{d\kern-0.70007ptl}_{\mathcal{F}}(P,Q)=\mathsf{d\kern-0.70007ptl}_{(1)}(P,Q)$ , which is known as the Kolmogorov metric. Similarly, by letting $\mathcal{F}$ be the set of all weighted indicator functions with weighting $\phi$ , one can obtain $\mathsf{d\kern-0.70007ptl}_{\mathcal{F}}(P,Q)=\mathsf{d\kern-0.70007ptl}_{(\phi)}(P,Q)$ .

(iv) The Prokhorov metric [7]. Let $\mathcal{F}$ be the set of all functions bounded by 1. Then by [25],

\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}(P,Q):=\inf\{\epsilon>0:P(A)\leq Q(A^{\epsilon})+\epsilon,\forall A\in\mathcal{B}({\rm I\!R})\}\leq\frac{1}{2}\mathsf{d\kern-0.70007ptl}_{\mathcal{F}}(P,Q),

where $A^{\epsilon}:=\{x\in{\rm I\!R}:\inf_{y\in A}|x-y|\leq\epsilon\}$ . Moreover, $\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}(P,Q)^{2}\leq\mathsf{d\kern-0.70007ptl}_{K}(P,Q)$ .

(v) Dudley’s (or bounded) Lipschitz metric [26]. Let $\mathcal{F}_{\mathrm{BL}}$ consist of all Lipschitz continuous $f$ such that $\|f\|_{\infty}+\mathrm{Lip}(f)\leq 1$, where $\|f\|_{\infty}$ denotes the usual sup-norm and $\mathrm{Lip}(f)$ is the Lipschitz constant of the Lipschitz function $f$. Then, with $\mathcal{F}=\mathcal{F}_{\mathrm{BL}}$,

\displaystyle\mathsf{d\kern-0.70007ptl}_{\mathcal{F}}(P,Q)=\sup_{f\in\mathcal{F}_{\mathrm{BL}}}\left|\int f(x)P(dx)-\int f(x)Q(dx)\right|:=\mathsf{d\kern-0.70007ptl}_{\mathrm{Lip}}(P,Q).

Moreover, $\frac{2}{3}\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}(P,Q)^{2}\leq\mathsf{d\kern-0.70007ptl}_{\mathrm{Lip}}(P,Q)\leq 2\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}(P,Q)$ , see, e.g., [26, Section 3].

3.2 Admissible laws

We now turn to another important component in statistical robustness analysis, that is, the subset $\mathscr{M}$ of admissible laws in $\mathscr{P}({\rm I\!R})$, which describes the scope of the perturbation of the law $P$ by a metric. This can be motivated by ensuring the finiteness of $\mathsf{d\kern-0.70007ptl}(P,Q)$. To this effect, we formally introduce the concept of admissible laws induced by probability metrics.

Definition 3.4 (Admissible laws induced by probability metrics)

Let $\mathsf{d\kern-0.70007ptl}$ be a probability metric on $\mathscr{P}({\rm I\!R})$ . The admissible laws induced by $\mathsf{d\kern-0.70007ptl}$ are defined as

\displaystyle\mathscr{P}_{\mathsf{d\kern-0.49005ptl}}({\rm I\!R}):=\{P\in\mathscr{P}({\rm I\!R}):\mathsf{d\kern-0.70007ptl}(P,\delta_{0})<+\infty\},

(3.10)

where $\delta_{0}$ denotes the Dirac measure at $0$ .

Let $\mathscr{P}_{p}({\rm I\!R})$ denote the admissible laws induced by the Fortet-Mourier metrics with parameter $p$ on $\mathscr{P}({\rm I\!R})$ . By Definition 3.4, we have

\displaystyle\begin{aligned}
\mathscr{P}_{p}({\rm I\!R}) &:= \{P\in\mathscr{P}({\rm I\!R}):\mathsf{d\kern-0.70007ptl}_{FM,p}(P,\delta_{0})<+\infty\}\\
&= \left\{P\in\mathscr{P}({\rm I\!R}):\sup_{\psi\in\mathcal{F}_{p}({\rm I\!R})}\left|\int_{{\rm I\!R}}\psi(\xi)P(d\xi)-\int_{{\rm I\!R}}\psi(\xi)\delta_{0}(d\xi)\right|<+\infty\right\}\\
&= \left\{P\in\mathscr{P}({\rm I\!R}):\sup_{\psi\in\mathcal{F}_{p}({\rm I\!R})}\left|\int_{{\rm I\!R}}\psi(\xi)P(d\xi)-\psi(0)\right|<+\infty\right\}.
\end{aligned}

(3.11)

By the triangle inequality, this ensures $\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)<+\infty$ for any $P,Q\in\mathscr{P}_{p}({\rm I\!R})$.
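As a quick check of Definition 3.4 (a worked example that is not part of the original text), take the Pareto law with tail index $a>0$, i.e. $P(x)=1-x^{-a}$ for $x\geq 1$ and $P(x)=0$ for $x<1$; the index $a$ and this choice of law are assumptions made purely for illustration. Using the representation (3.8) with $Q=\delta_{0}$, whose distribution function is $\mathbf{1}_{x\geq 0}$, the integrand vanishes on $(-\infty,0)$, equals $1$ on $[0,1)$ and equals $x^{p-1}x^{-a}$ on $[1,+\infty)$, so that

```latex
\mathsf{d\kern-0.70007ptl}_{FM,p}(P,\delta_{0})
  = \int_{0}^{1} 1\,dx + \int_{1}^{+\infty} x^{p-1}\,x^{-a}\,dx
  = \begin{cases}
      1+\dfrac{1}{a-p}, & a>p,\\[2mm]
      +\infty, & a\leq p.
    \end{cases}
```

Hence this law belongs to $\mathscr{P}_{p}({\rm I\!R})$ exactly when $a>p$, that is, when it has a finite $p$-th moment, which illustrates how a larger $p$ shrinks the class of admissible laws. For $p=1$ the value $a/(a-1)$ is the mean of the law.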

In the following example, we compare the admissible laws induced by different probability metrics.

Example 3.2 (Admissible laws induced by probability metrics)

We reconsider the admissible laws induced by probability metrics defined in Example 3.1.

(i) The admissible laws induced by the Kantorovich (or Wasserstein) metric are defined as

\displaystyle\begin{aligned}
\mathscr{P}_{K}({\rm I\!R}) &:= \{P\in\mathscr{P}({\rm I\!R}):\mathsf{d\kern-0.70007ptl}_{K}(P,\delta_{0})<+\infty\}\\
&= \left\{P\in\mathscr{P}({\rm I\!R}):\int_{-\infty}^{0}P(x)dx+\int_{0}^{+\infty}(1-P(x))dx<+\infty\right\}(=\mathscr{P}_{1}({\rm I\!R}))\\
&= \left\{P\in\mathscr{P}({\rm I\!R}):\int_{{\rm I\!R}}|x|dP(x)<+\infty\right\}\\
&= \mathcal{M}_{1}^{1},
\end{aligned}

where the second equality follows from the definition of the Kantorovich metric (see, (3.9)). To see how the third equality holds, we note that for any $t<0$ , we have

\displaystyle\begin{aligned}
+\infty>\int_{-\infty}^{0}P(x)dx &= \int_{t}^{0}P(x)dx+\int_{2t}^{t}P(x)dx+\int_{-\infty}^{2t}P(x)dx\\
&\geq \int_{t}^{0}P(x)dx+\frac{1}{2}P(2t)|2t|.
\end{aligned}

Since $-2tP(2t)\geq 0$ and $\int_{t}^{0}P(x)dx\to\int_{-\infty}^{0}P(x)dx$ as $t\rightarrow-\infty$, letting $t\rightarrow-\infty$ in the inequality above gives $\lim_{t\rightarrow-\infty}tP(t)=0$. Similarly, we have $\lim_{t\rightarrow+\infty}t(1-P(t))=0$. By the integration-by-parts formula (more precisely [30, Theorem 1.15]), we obtain the right-hand side of the third equality. The last equality follows from the definition of $\mathcal{M}_{1}^{\phi}$ with $\phi=|\cdot|$.

(ii) The admissible laws induced by the Lévy distance are defined as

\displaystyle\mathscr{P}_{\mathrm{L\acute{e}vy}}({\rm I\!R}):=\{P\in\mathscr{P}({\rm I\!R}):\mathsf{d\kern-0.70007ptl}_{\mathrm{L\acute{e}vy}}(P,\delta_{0})<+\infty\}=\mathscr{P}({\rm I\!R}).

Since $\mathsf{d\kern-0.70007ptl}_{\mathrm{L\acute{e}vy}}\leq 1$ , then the admissible laws coincide with $\mathscr{P}({\rm I\!R})$ .

(iii) The admissible laws induced by the weighted Kolmogorov metric are defined as

	$\displaystyle\mathscr{P}_{(\phi)}({\rm I\!R})$	$\displaystyle:=$	$\displaystyle\{P\in\mathscr{P}({\rm I\!R}):\mathsf{d\kern-0.70007ptl}_{(\phi)}(P,\delta_{0})<+\infty\}$
		$\displaystyle=$	$\displaystyle\left\{P\in\mathscr{P}({\rm I\!R}):\sup_{x\leq 0}|P(x)\phi(x)|+\sup_{x>0}|(1-P(x))\phi(x)|<+\infty\right\},$

which coincides with the set $\mathscr{M}_{1}^{(\phi)}$ defined in Krätschmer et al. [7, subsection 3.2].

If $\phi$ is bounded on ${\rm I\!R}$, then it is straightforward to see that $\mathscr{P}_{(\phi)}({\rm I\!R})=\mathscr{P}({\rm I\!R})$. In the case when $\phi$ is unbounded on ${\rm I\!R}$, we have

\displaystyle\mathcal{M}_{1}^{\phi}\subset\mathscr{P}_{(\phi)}({\rm I\!R})\subset\bigcap_{\epsilon>0}\mathcal{M}_{1}^{\phi^{1-\epsilon}}.

(3.12)

In what follows, we give a proof of (3.12). Let $P\in\mathcal{M}_{1}^{\phi}$. Since $\phi$ is a $u$-shaped function, for any $M>0$ and $N\leq 0$ we have

$\displaystyle+\infty>\int_{{\rm I\!R}}\phi(x)dP(x)$	$\displaystyle=$	$\displaystyle\int_{0}^{+\infty}\phi(x)dP(x)+\int_{-\infty}^{0}\phi(x)dP(x)$
	$\displaystyle\geq$	$\displaystyle\int_{0}^{M}\phi(x)dP(x)+\phi(M)(1-P(M))+\int_{N}^{0}\phi(x)dP(x)+\phi(N)P(N)$
	$\displaystyle\geq$	$\displaystyle\phi(M)(1-P(M))+\phi(N)P(N),$

and consequently $\int\phi dP\geq\sup_{x\leq 0}|P(x)\phi(x)|+\sup_{x>0}|(1-P(x))\phi(x)|$. Thus, $\mathcal{M}_{1}^{\phi}\subset\mathscr{P}_{(\phi)}({\rm I\!R})$.

On the other hand, for any $\epsilon>0$, if we let $\phi_{\epsilon}(x):=\phi(x)^{1-\epsilon}$ for $x\in{\rm I\!R}$, then $\phi_{\epsilon}$ is a gauge function. Moreover, for any $P\in\mathscr{P}_{(\phi)}({\rm I\!R})$, the constant $k:=\sup_{x\leq 0}|P(x)\phi(x)|+\sup_{x>0}|(1-P(x))\phi(x)|$ is finite. To ease the exposition, we assume that $P(x)>0$ for any $x\in{\rm I\!R}$. Then

\displaystyle\phi(x)\leq\frac{k}{P(x)}\;\mbox{\rm{for}}\;x\leq 0\;\;\mbox{\rm{and}}\;\phi(x)\leq\frac{k}{1-P(x)}\;\mbox{\rm{for}}\;x>0.

Thus

$\displaystyle\int_{{\rm I\!R}}\phi_{\epsilon}(x)dP(x)$	$\displaystyle=$	$\displaystyle\int_{-\infty}^{0}\phi_{\epsilon}(x)dP(x)+\int_{0}^{+\infty}\phi_{\epsilon}(x)dP(x)$
	$\displaystyle\leq$	$\displaystyle k^{1-\epsilon}\int_{-\infty}^{0}\frac{1}{P(x)^{1-\epsilon}}dP(x)+k^{1-\epsilon}\int_{0}^{+\infty}\frac{1}{(1-P(x))^{1-\epsilon}}dP(x)$
	$\displaystyle=$	$\displaystyle k^{1-\epsilon}\left[\frac{1}{\epsilon}P(x)^{\epsilon}\right]_{-\infty}^{0}-k^{1-\epsilon}\left[\frac{1}{\epsilon}(1-P(x))^{\epsilon}\right]_{0}^{+\infty}$
	$\displaystyle=$	$\displaystyle\frac{1}{\epsilon}k^{1-\epsilon}[P(0)^{\epsilon}+(1-P(0))^{\epsilon}]<+\infty,$

which implies $P\in\mathcal{M}_{1}^{\phi_{\epsilon}}$ . Summarizing the discussions above, we obtain (3.12).

We note that if $\phi$ is unbounded, then the inclusions in (3.12) can be strict: a counterexample in which equality fails is given in Example B.1 in the appendix.

(iv) The admissible laws induced by the Prokhorov metric are defined as

\displaystyle\mathscr{P}_{\mathrm{Prok}}({\rm I\!R}):=\{P\in\mathscr{P}({\rm I\!R}):\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}(P,\delta_{0})<+\infty\}=\mathscr{P}({\rm I\!R}).

Since $\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}\leq 1$, the admissible laws coincide with $\mathscr{P}({\rm I\!R})$.

(v) The admissible laws induced by Dudley’s (or bounded Lipschitz) metric are defined as

	$\displaystyle\mathscr{P}_{\mathrm{Lip}}({\rm I\!R})$	$\displaystyle:=$	$\displaystyle\{P\in\mathscr{P}({\rm I\!R}):\mathsf{d\kern-0.70007ptl}_{\mathrm{Lip}}(P,\delta_{0})<+\infty\}$
		$\displaystyle=$	$\displaystyle\left\{P\in\mathscr{P}({\rm I\!R}):\sup_{f\in\mathcal{F}_{\mathrm{BL}}}\left|\int_{{\rm I\!R}}f(x)P(dx)-f(0)\right|<+\infty\right\}=\mathscr{P}({\rm I\!R}).$

Since $\mathsf{d\kern-0.70007ptl}_{\mathrm{Lip}}\leq 2$ , then the admissible laws coincide with $\mathscr{P}({\rm I\!R})$ .

3.3 Relationship with $\phi$ -weak topology

Since the $\phi$-weak topology has been widely used for qualitative robust analysis in the literature, whereas we use the topology induced by the Fortet-Mourier metric for quantitative robust analysis, it is helpful to look into potential connections between the two apparently different metrics. In the next proposition, we examine this connection from the perspective of the admissible set (the space of probability measures within which $P$ is perturbed in both qualitative and quantitative robust analysis). We find that $\mathscr{P}_{p}({\rm I\!R})$ coincides with ${\cal M}_{1}^{\phi}$ for a specific choice of $\phi$, and subsequently show that the Fortet-Mourier metric is tighter than $\mathsf{d\kern-0.70007ptl}_{\phi}$.

Proposition 3.1

Let $p\geq 1$ be fixed and

\displaystyle\phi_{p}(t):=\left\{\begin{array}[]{ll}|t|,&\mbox{\rm{for}}\;|t|\leq 1,\\ |t|^{p},&\mbox{\rm{otherwise}}.\end{array}\right.

The following assertions hold.

(i)

$\mathscr{P}_{p}({\rm I\!R})=\mathcal{M}_{1}^{\phi_{p}}(=\mathcal{M}_{1}^{p})$ .
(ii)

$\mathsf{d\kern-0.70007ptl}_{\phi_{p}}(P,Q)\leq\sqrt{\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)}+p\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q),\;\forall P,Q\in\mathscr{P}_{p}({\rm I\!R})$ .
(iii)

$\mathsf{d\kern-0.70007ptl}_{FM,p}$ metrizes the $\phi_{p}$ -weak topology on $\mathscr{P}_{p}({\rm I\!R})$ .

Part (i) of the proposition says that the admissible set $\mathscr{P}_{p}({\rm I\!R})$ coincides with the set of laws on ${\rm I\!R}$ satisfying the generalized moment condition of $\phi_{p}$ . Part (ii) indicates that $\mathsf{d\kern-0.70007ptl}_{FM,p}$ is tighter than $\mathsf{d\kern-0.70007ptl}_{\phi_{p}}$ . Part (iii) means that the $\phi_{p}$ -weak topology on $\mathscr{P}_{p}({\rm I\!R})$ is generated by the metric $\mathsf{d\kern-0.70007ptl}_{FM,p}$ .
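Part (i) can be illustrated with a heavy-tailed law. The following minimal sketch assumes a Pareto law with density $\alpha x^{-\alpha-1}$ on $[1,+\infty)$ (an illustrative choice that does not appear in the paper) and evaluates the truncated moment $\int_{1}^{T}\phi_{p}(x)P(dx)$ in closed form, using $\phi_{p}(x)=x^{p}$ for $x\geq 1$. The helper name `truncated_phi_p_moment` and the value of `alpha` are hypothetical. The truncated moment stays bounded in $T$ only when $p<\alpha$, which is the moment condition defining $\mathcal{M}_{1}^{\phi_{p}}$.

```python
import math

def truncated_phi_p_moment(alpha, p, T):
    """Closed form of int_1^T x**p * alpha * x**(-alpha - 1) dx for a Pareto(alpha) law on [1, inf)."""
    if p == alpha:
        return alpha * math.log(T)
    return alpha * (T ** (p - alpha) - 1.0) / (p - alpha)

alpha = 2.5  # tail index of the illustrative Pareto law (an assumption)
for p in (1, 2, 3):
    # Truncated moments for growing truncation levels T.
    values = [truncated_phi_p_moment(alpha, p, T) for T in (1e2, 1e4, 1e6)]
    print(p, values)
```

For $p=1,2$ the values level off as $T$ grows, while for $p=3$ they keep increasing, so under these assumptions the law belongs to $\mathcal{M}_{1}^{\phi_{1}}$ and $\mathcal{M}_{1}^{\phi_{2}}$ but not to $\mathcal{M}_{1}^{\phi_{3}}$.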

Proof. Part (i). Since for any $p\geq 1$ , $\frac{1}{p}\phi_{p}\in\mathcal{F}_{p}({\rm I\!R})$ , then by the definition of $\mathscr{P}_{p}({\rm I\!R})$ , we have that $P\in\mathscr{P}_{p}({\rm I\!R})$ implies $P\in\mathcal{M}_{1}^{\phi_{p}}$ and subsequently, $\mathscr{P}_{p}({\rm I\!R})\subset\mathcal{M}_{1}^{\phi_{p}}$ .

On the other hand, let $P\in\mathcal{M}_{1}^{\phi_{p}}$ , then $\int_{{\rm I\!R}}\phi_{p}(\xi)P(d\xi)<\infty$ . For any $\psi\in\mathcal{F}_{p}({\rm I\!R})$ , we have

\displaystyle|\psi(\xi)-\psi(0)|\leq\max\{1,|\xi|^{p-1}\}|\xi|\leq\max\{|\xi|,|\xi|^{p}\},\;\mbox{\rm{for all}}\;\xi\in{\rm I\!R},

and consequently,

	$\displaystyle\left|\int_{{\rm I\!R}}\psi(\xi)P(d\xi)-\psi(0)\right|$	$\displaystyle=$	$\displaystyle\left|\int_{{\rm I\!R}}(\psi(\xi)-\psi(0))P(d\xi)\right|\leq\int_{{\rm I\!R}}|\psi(\xi)-\psi(0)|P(d\xi)$
		$\displaystyle\leq$	$\displaystyle\int_{{\rm I\!R}}\max\{|\xi|,|\xi|^{p}\}P(d\xi)=\int_{{\rm I\!R}}\phi_{p}(\xi)P(d\xi).$

Therefore, we have

\displaystyle\sup_{\psi\in\mathcal{F}_{p}({\rm I\!R})}\left|\int_{{\rm I\!R}}\psi(\xi)P(d\xi)-\psi(0)\right|\leq\int_{{\rm I\!R}}\phi_{p}(\xi)P(d\xi)<\infty,

and consequently, $\mathcal{M}_{1}^{\phi_{p}}\subset\mathscr{P}_{p}({\rm I\!R})$ .

Part (ii). Since $\frac{1}{p}\phi_{p}\in\mathcal{F}_{p}({\rm I\!R})$ , then for any $P,Q\in\mathscr{P}_{p}({\rm I\!R})$ ,

\displaystyle\left|\int_{{\rm I\!R}}\phi_{p}(\xi)P(d\xi)-\int_{{\rm I\!R}}\phi_{p}(\xi)Q(d\xi)\right|\leq p\left|\int_{{\rm I\!R}}\frac{1}{p}\phi_{p}(\xi)P(d\xi)-\int_{{\rm I\!R}}\frac{1}{p}\phi_{p}(\xi)Q(d\xi)\right|\leq p\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q).

From Example 3.1(i) and (3.7), we have $\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}(P,Q)\leq\sqrt{\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)}$ . Finally, by the definition of $\mathsf{d\kern-0.70007ptl}_{\phi_{p}}$ , i.e., (3.1), we obtain the conclusion.

Part (iii) follows straightforwardly from Part (ii).

Proposition 3.1 indicates that although the Fortet-Mourier metric $\mathsf{d\kern-0.70007ptl}_{FM,p}$ and $\mathsf{d\kern-0.70007ptl}_{\phi_{p}}$ are different metrics, they generate the same topology. This confirms the statement at the beginning of this section: for qualitative and quantitative robustness, it is the specific choice of probability metrics that matters, not the topologies generated by them. To conclude this section, we remark that the subset $\mathscr{M}$ to be used in the definition of qualitative robust analysis will be confined to the set of admissible laws when we adopt the Fortet-Mourier metric for quantitative robust analysis in the next section.

4 Statistical robustness

We are now ready to return our discussions to the robustness of statistical estimators of law invariant risk measures that are outlined in Section 2.

4.1 Qualitative statistical robustness

To position our research properly, we begin with a brief overview of the existing results on qualitative statistical robustness.

Definition 4.1 (Qualitative $\mathscr{P}_{0}$ -Robustness [2, 1])

Let $\mathscr{P}_{0}$ be a subset of $\mathscr{P}({\rm I\!R})$ and $P\in\mathscr{P}_{0}$ . The sequence $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ of estimators is said to be qualitatively $\mathscr{P}_{0}$ -robust at $P$ w.r.t. $(\mathsf{d\kern-0.70007ptl},\mathsf{d\kern-0.70007ptl}^{\prime})$ if for every $\epsilon>0$ there exist $\delta>0$ and $N_{0}\in\mathbb{N}$ such that for all $Q\in\mathscr{P}_{0}$ and $N\geq N_{0}$

\displaystyle\mathsf{d\kern-0.70007ptl}(P,Q)\leq\delta\implies\mathsf{d\kern-0.70007ptl}^{\prime}(P^{\mathbb{N}}\circ\widehat{\varrho}_{N}^{-1},Q^{\mathbb{N}}\circ\widehat{\varrho}_{N}^{-1})\leq\epsilon.

(4.1)

If, in addition, $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ arises as in (2.2) from a risk functional $\varrho$ , then $\varrho$ is called qualitatively $\mathscr{P}_{0}$ -robust at $P$ w.r.t. $(\mathsf{d\kern-0.70007ptl},\mathsf{d\kern-0.70007ptl}^{\prime})$ .

The definition above captures two versions of qualitative statistical robustness: the one proposed by Cont et al. [2] for i.i.d. observations on ${\rm I\!R}$ with $\mathsf{d\kern-0.70007ptl}$ and $\mathsf{d\kern-0.70007ptl}^{\prime}$ both being the Lévy distance, and the one proposed by Krätschmer et al. [7] for i.i.d. observations on ${\rm I\!R}$ with $\mathsf{d\kern-0.70007ptl}=\mathsf{d\kern-0.70007ptl}_{(\phi)}$ and $\mathsf{d\kern-0.70007ptl}^{\prime}=\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}$. Since $\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}$ is tighter than $\mathsf{d\kern-0.70007ptl}_{\mathrm{L\acute{e}vy}}$, Krätschmer et al. [7] examine the discrepancy of the laws with a tighter metric. On the other hand, from the definition of $\mathsf{d\kern-0.70007ptl}_{(\phi)}$, we can see that it is also tighter than $\mathsf{d\kern-0.70007ptl}_{\mathrm{L\acute{e}vy}}$ and allows one to capture the difference of distributions at the tail. This means that the robust analysis in Krätschmer et al. [7] is restricted to a smaller class of probability distributions when $Q$ is perturbed from $P$, which explains why CVaR is robust under the criterion of the latter but not the former.

A key result that Krätschmer et al. [7] establish is Hampel’s theorem, which states the equivalence between qualitative statistical robustness and stability/continuity of a risk functional (with respect to perturbation of the probability distribution) under the uniform Glivenko-Cantelli (UGC) property of empirical distributions over a specified set.

Definition 4.2 ( $\mathscr{C}$ -Continuity [7])

Let $P\in\mathscr{P}({\rm I\!R})$ and $\mathscr{C}$ be a subset of $\mathscr{P}({\rm I\!R})$ . Then $\varrho$ is called $\mathscr{C}$ -continuous at $P$ w.r.t. $(\mathsf{d\kern-0.70007ptl},|\cdot|)$ if for every $\epsilon>0$ , there exists $\delta>0$ such that for all $Q\in\mathscr{C}$

\displaystyle\mathsf{d\kern-0.70007ptl}(P,Q)\leq\delta\implies|\varrho(P)-\varrho(Q)|\leq\epsilon.

Definition 4.3 (UGC Property [7])

Let $\mathscr{C}$ be a subset of $\mathscr{P}({\rm I\!R})$ . Then we say that the metric space $(\mathscr{C},\mathsf{d\kern-0.70007ptl})$ has the UGC property if for every $\epsilon>0$ and $\delta>0$ , there exists $N_{0}\in\mathbb{N}$ such that for all $P\in\mathscr{C}$ and $N\geq N_{0}$

\displaystyle P^{\otimes N}\left[\left\{(\xi^{1},\ldots,\xi^{N})\in{\rm I\!R}^{N}:\mathsf{d\kern-0.70007ptl}(P,P_{N})\geq\delta\right\}\right]\leq\epsilon.

The UGC property means that the empirical probability measure converges in probability to the true marginal distribution uniformly over $P\in\mathscr{C}$. Examples of metric spaces $(\mathscr{C},\mathsf{d\kern-0.70007ptl})$ having the UGC property can be found in [7, Section 3]. In particular, it is shown that there exists a subset of the admissible laws induced by the weighted Kolmogorov metric that enjoys the UGC property, see [7, Theorem 3.1].

Theorem 4.1 (Hampel’s Theorem [7])

Let $\mathscr{P}_{0}$ be a subset of $\mathscr{P}({\rm I\!R})$ and $P\in\mathscr{P}_{0}$ . Assume that $(\mathscr{P}_{0},\mathsf{d\kern-0.70007ptl})$ has the UGC property and $\mathscr{M}_{1,\mathrm{emp}}\subset\mathscr{P}_{0}$ . Then if the mapping $\varrho$ is $\mathscr{M}_{1,\mathrm{emp}}$ -continuous at $P$ w.r.t. $(\mathsf{d\kern-0.70007ptl},|\cdot|)$ , the sequence $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ is qualitatively $\mathscr{P}_{0}$ -robust at $P$ w.r.t. $(\mathsf{d\kern-0.70007ptl},\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}})$ .

4.2 Quantitative statistical robustness

We now move on to discuss our central topic, quantitative statistical robustness for the plug-in estimators of law invariant risk measures. Intuitively speaking, quantitative statistical robustness of a risk functional $\varrho$ means that for any two admissible laws $P$ and $Q$ on $\mathscr{P}({\rm I\!R})$ , the distance between the laws of their plug-in estimators $\varrho(P_{N})$ and $\varrho(Q_{N})$ is bounded by the distance between $P$ and $Q$ when the sample size is sufficiently large.

Definition 4.4 (Quantitative statistical robustness)

Let $\mathsf{d\kern-0.70007ptl},\mathsf{d\kern-0.70007ptl}^{\prime}$ be probability metrics on $\mathscr{P}({\rm I\!R})$ and $\mathscr{M}\subset\mathscr{P}_{\mathsf{d\kern-0.49005ptl}^{\prime}}({\rm I\!R})$ denote a subset of admissible laws on ${\rm I\!R}$ . A sequence of statistical estimators $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ is said to be quantitatively statistically robust on $\mathscr{M}$ w.r.t. $(\mathsf{d\kern-0.70007ptl},\mathsf{d\kern-0.70007ptl}^{\prime})$ if there exists a non-decreasing real-valued continuous function $h:{\rm I\!R}_{+}\to{\rm I\!R}_{+}$ with $h(0)=0$ such that for all $P,Q\in\mathscr{M}$ and $N\in\mathbb{N}$

\displaystyle\mathsf{d\kern-0.70007ptl}(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1})\leq h(\mathsf{d\kern-0.70007ptl}^{\prime}(P,Q))<+\infty.

(4.2)

If, in addition, $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ arises as in (2.2) from a risk functional $\varrho$ , then $\varrho$ is called quantitatively statistically robust on $\mathscr{M}$ at $P$ w.r.t. $(\mathsf{d\kern-0.70007ptl},\mathsf{d\kern-0.70007ptl}^{\prime})$ . In the particular case when $\mathsf{d\kern-0.70007ptl}=\mathsf{d\kern-0.70007ptl}_{K}$ , $h(t)=Lt$ and $\mathsf{d\kern-0.70007ptl}^{\prime}=\mathsf{d\kern-0.70007ptl}_{FM,p}$ , inequality (4.2) reduces to

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right)\leq L\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)<+\infty.

(4.3)

In comparison with the qualitative statistical robustness introduced by Krätschmer et al. [1] or Cont et al. [2], the definition (4.3) has several advantages. First, we use the Kantorovich metric instead of the Prokhorov metric to quantify the discrepancy between $P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}$ and $Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}$ . This enables us to capture the tail behaviour of the two laws and facilitates the derivation of an explicit bound for the difference. Second, we use the Fortet-Mourier metric to quantify the perturbation of $P$ , which is more sensitive to the variation of the tails than the Lévy metric used in [2] and the weighted Kolmogorov metric used in [1, 8, 9, 13]. Third, inequality (4.3) gives an error bound for the discrepancy of the two laws, and the bound is valid for all $Q$ in $\mathscr{M}$ instead of only those in a neighborhood of $P$ .

Next, we introduce a definition on the Lipschitz continuity of a general statistical mapping from $\mathscr{P}({\rm I\!R})$ to ${\rm I\!R}$ , which strengthens the earlier definition of $\mathscr{C}$ -continuity for a general statistical functional.

Definition 4.5 (Lipschitz continuity)

Let $\varrho:\mathscr{P}({\rm I\!R})\to{\rm I\!R}$ be a general statistical functional and $\mathscr{M}$ be a subset of $\mathscr{P}({\rm I\!R})$ . $\varrho$ is said to be Lipschitz continuous on $\mathscr{M}$ w.r.t. $\mathsf{d\kern-0.70007ptl}$ if there exists a positive constant $L$ such that

\displaystyle|\varrho(P)-\varrho(Q)|\leq L\mathsf{d\kern-0.70007ptl}(P,Q)<+\infty,\quad\forall\;P,Q\in\mathscr{M}.

(4.4)

A few points about the above definition of Lipschitz continuity are worth noting:

1.

The Lipschitz continuity is global rather than local over $\mathscr{M}$ . The condition is strong, but we will find that many risk functionals are indeed globally Lipschitz continuous on some $\mathscr{M}$ .

2.

The magnitude of the continuity depends on the metric $\mathsf{d\kern-0.70007ptl}$ which measures the distance between $P$ and $Q$ . In a specific case when $\mathsf{d\kern-0.70007ptl}=\mathsf{d\kern-0.70007ptl}_{FM,p}$ , (4.4) reduces to

\displaystyle|\varrho(P)-\varrho(Q)|\leq L\int_{{\rm I\!R}}|P(x)-Q(x)|c_{p}(x)dx<+\infty,\;\forall\;P,Q\in\mathscr{M},

(4.5)

where $c_{p}(x)=\max\{1,|x|^{p-1}\}$ . The exponent $p$ plays an important role in (4.5) because it interacts with the tails of $P(\cdot)$ and $Q(\cdot)$ . Moreover, if $\mathscr{M}\subset\mathscr{P}_{p}({\rm I\!R})$ , then the right-hand side of (4.5) is finite. We will come back to this later.

3.

Let $P_{N}$ and $Q_{N}$ be empirical distributions on ${\rm I\!R}$ . By plugging $P_{N}$ and $Q_{N}$ into (4.5), we obtain

$\displaystyle|\varrho(P_{N})-\varrho(Q_{N})|$	$\displaystyle\leq$	$\displaystyle L\int_{{\rm I\!R}}|P_{N}(x)-Q_{N}(x)|c_{p}(x)dx$	(4.6)
	$\displaystyle=$	$\displaystyle L\sum_{k=1}^{2N}\left|\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{\xi^{i}\leq x_{k}}-\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{\widehat{\xi}^{i}\leq x_{k}}\right|\int_{x_{k}}^{x_{k+1}}c_{p}(x)dx$
	$\displaystyle=$	$\displaystyle L\sum_{k=1}^{N}\frac{1}{N}\left|\int_{\xi^{i_{k}}}^{\widehat{\xi}^{j_{k}}}c_{p}(x)dx\right|$
	$\displaystyle\leq$	$\displaystyle L\sum_{k=1}^{N}\frac{1}{N}|\xi^{i_{k}}-\widehat{\xi}^{j_{k}}|\max\{c_{p}(\xi^{i_{k}}),c_{p}(\widehat{\xi}^{j_{k}})\}$
	$\displaystyle\leq$	$\displaystyle\frac{L}{N}\sum_{k=1}^{N}c_{p}(\xi^{k},\widehat{\xi}^{k})|\xi^{k}-\widehat{\xi}^{k}|,\quad\forall\xi^{k},\widehat{\xi}^{k}\in{\rm I\!R},$

where $x_{k}$ is the $k$ -th smallest number among $\{\xi^{1},\ldots,\xi^{N};\widehat{\xi}^{1},\ldots,\widehat{\xi}^{N}\}$ for $k=1,\ldots,2N$ and $x_{2N+1}=x_{2N}$ and $c_{p}(\xi,\widehat{\xi})=\max\{1,|\xi|,|\widehat{\xi}|\}^{p-1}$ for all $\xi,\widehat{\xi}\in{\rm I\!R}$ . The equality is due to Fubini’s theorem in the discrete case, and the last inequality follows from Lemma A.1 for the non-decreasing sequences $\{\xi^{i_{k}}\max\{c_{p}(\xi^{i_{k}}),c_{p}(\widehat{\xi}^{j_{k}})\}\}_{k=1}^{N}$ and $\{\widehat{\xi}^{j_{k}}\max\{c_{p}(\xi^{i_{k}}),c_{p}(\widehat{\xi}^{j_{k}})\}\}_{k=1}^{N}$ .

4.

In the case when $\varrho$ is continuous on $\mathscr{M}$ , the Lipschitz continuity (4.5) is equivalent to the Lipschitz continuity (4.6) on the set of all empirical distributions $\mathscr{M}_{1,\mathrm{emp}}$ (see the first inequality of equation (4.6)) because $\mathscr{M}_{1,\mathrm{emp}}$ is dense in $\mathscr{P}({\rm I\!R})$ .
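The chain of inequalities in (4.6) can be checked numerically for small samples. The sketch below is a minimal illustration, assuming the representation $\int_{{\rm I\!R}}|P_{N}(x)-Q_{N}(x)|c_{p}(x)dx$ of the Fortet-Mourier distance used in (4.5) and taking $L=1$; the function names (`c_p_integral`, `fm_distance`, `pairwise_bound`) and the sample values are hypothetical. It integrates the piecewise constant difference of the two empirical distribution functions exactly and compares the result with the sample-wise bound on the last line of (4.6).

```python
import bisect

def c_p_integral(a, b, p):
    """Exact integral of c_p(x) = max(1, |x|**(p-1)) over [a, b], with a <= b."""
    def antiderivative(x):
        m = abs(x)
        val = m if m <= 1.0 else 1.0 + (m ** p - 1.0) / p
        return val if x >= 0 else -val
    return antiderivative(b) - antiderivative(a)

def fm_distance(xs, ys, p):
    """Integral of |P_N(x) - Q_N(x)| * c_p(x) dx for two empirical laws."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical distribution functions are constant on [left, right).
        F = bisect.bisect_right(xs, left) / len(xs)
        G = bisect.bisect_right(ys, left) / len(ys)
        total += abs(F - G) * c_p_integral(left, right, p)
    return total

def pairwise_bound(xs, ys, p):
    """Last line of (4.6) with L = 1 for the pairing (xs[k], ys[k])."""
    return sum(max(1.0, abs(x), abs(y)) ** (p - 1) * abs(x - y)
               for x, y in zip(xs, ys)) / len(xs)

xs = [-2.0, 0.5, 3.0]  # illustrative sample behind P_N
ys = [-1.5, 1.0, 2.0]  # illustrative sample behind Q_N
for p in (1, 2, 3):
    print(p, fm_distance(xs, ys, p), pairwise_bound(xs, ys, p))
```

In this toy example the two quantities agree for $p=1$, and the gap widens as $p$ grows because the weight $c_{p}$ is replaced by its maximum over each pair of sample points.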

Example 4.1 ( $p$ -th moment functional)

For $p\geq 1$ , we consider the $p$ -th moment functional $T^{(p)}$ on $\mathcal{M}_{1}^{p}=\mathscr{P}_{p}({\rm I\!R})$ as defined by:

T^{(p)}(P):=\int_{-\infty}^{+\infty}x^{p}dP(x)<+\infty,\quad\forall P\in\mathcal{M}_{1}^{p}.

Analogous to Example B.1, we have

\displaystyle T^{(p)}(P)=-\int_{-\infty}^{0}P(x)px^{p-1}dx+\int_{0}^{+\infty}(1-P(x))px^{p-1}dx.

(4.7)

Thus, for any $P,Q\in\mathcal{M}_{1}^{p}$ ,

	$\displaystyle|T^{(p)}(P)-T^{(p)}(Q)|$	$\displaystyle=$	$\displaystyle\left|\int_{-\infty}^{+\infty}(P(x)-Q(x))px^{p-1}dx\right|\leq p\int_{-\infty}^{+\infty}|P(x)-Q(x)||x|^{p-1}dx$
		$\displaystyle\leq$	$\displaystyle p\int_{-\infty}^{+\infty}|P(x)-Q(x)|c_{p}(x)dx<+\infty,$		(4.8)

where $c_{p}(x)=\max\{1,|x|^{p-1}\}$ . From (4.5), we can see that the $p$ -th moment functional $T^{(p)}$ is Lipschitz continuous w.r.t. $\mathsf{d\kern-0.70007ptl}_{FM,p}$ on $\mathcal{M}_{1}^{p}$ .
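As a sanity check of the representation (4.7), the following minimal sketch evaluates both sides for an empirical distribution and integer $p$. The function names and the sample are hypothetical. Since the empirical distribution function is piecewise constant, the integrals in (4.7) are computed exactly interval by interval, with $0$ added as a breakpoint so that each interval lies on one side of the origin.

```python
def moment_direct(xs, p):
    """p-th moment of the empirical law with atoms xs."""
    return sum(x ** p for x in xs) / len(xs)

def moment_via_cdf(xs, p):
    """Right-hand side of (4.7) for the empirical law with atoms xs."""
    n = len(xs)
    pts = sorted(set(xs) | {0.0})
    total = 0.0
    for left, right in zip(pts, pts[1:]):
        F = sum(1 for x in xs if x <= left) / n  # CDF value on [left, right)
        weight = right ** p - left ** p          # integral of p*x**(p-1) over [left, right]
        if right <= 0:
            total -= F * weight                  # contribution of the negative half-line
        else:
            total += (1.0 - F) * weight          # contribution of the positive half-line
    return total

xs = [-2.0, 0.5, 3.0]  # illustrative sample
for p in (1, 2, 3):
    print(p, moment_direct(xs, p), moment_via_cdf(xs, p))
```

The two columns coincide up to rounding, including for odd $p$ where the negative atom contributes with a negative sign.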

Lemma 4.1

Let $\boldsymbol{\xi}:=(\xi^{1},\cdots,\xi^{N})\in{\rm I\!R}^{N}$ and $\Psi$ be a set of functions from ${\rm I\!R}^{N}$ to ${\rm I\!R}$ , i.e.,

\displaystyle\Psi:=\left\{\psi:{\rm I\!R}^{N}\to{\rm I\!R}:|\psi(\tilde{\boldsymbol{\xi}})-\psi(\widehat{\boldsymbol{\xi}})|\leq\frac{1}{N}\sum_{k=1}^{N}c_{p}(\tilde{\xi}^{k},\widehat{\xi}^{k})|\tilde{\xi}^{k}-\widehat{\xi}^{k}|,\;\forall\tilde{\boldsymbol{\xi}},\widehat{\boldsymbol{\xi}}\in{\rm I\!R}^{N}\right\},

(4.9)

where $c_{p}(\xi,\tilde{\xi}):=\max\{1,|\xi|,|\tilde{\xi}|\}^{p-1}$ for all $\xi,\tilde{\xi}\in{\rm I\!R}$ and $p\geq 1$ . Then

\displaystyle\mathsf{d\kern-0.70007ptl}_{\Psi}(P^{\otimes N},Q^{\otimes N})\leq\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)<+\infty,\quad\forall P,Q\in\mathscr{P}_{p}({\rm I\!R}),

(4.10)

where $\mathsf{d\kern-0.70007ptl}_{\Psi}$ is defined by (3.3).

Before presenting a proof, it might be helpful to explain why we consider this specific set of functions $\Psi$ . For fixed $N\in\mathbb{N}$ , let $\mathscr{M}_{1,\mathrm{emp}}^{N}$ denote the set of all empirical laws $P_{N}$ over ${\rm I\!R}$ , so that $\mathscr{M}_{1,\mathrm{emp}}=\bigcup_{N\in\mathbb{N}}\mathscr{M}_{1,\mathrm{emp}}^{N}$ . Then $\Psi$ may be regarded as a set of functions derived from a class of Lipschitz continuous functionals on $\mathscr{M}_{1,\mathrm{emp}}^{N}$ with $L=1$ and $\mathsf{d\kern-0.70007ptl}=\mathsf{d\kern-0.70007ptl}_{FM,p}$ (by writing $T(P_{N})$ as a function of samples). Lemma 4.1 says that for any $N\in\mathbb{N}$ , the discrepancy between $P^{\otimes N}$ and $Q^{\otimes N}$ under the metric $\mathsf{d\kern-0.70007ptl}_{\Psi}$ can be bounded by $\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)$ .

Proof. Let $\xi^{-j}:=\{\xi^{1},\cdots,\xi^{j-1},\xi^{j+1},\cdots,\xi^{N}\}$ , $\vec{\xi}^{j}:=\{\xi^{1},\cdots,\xi^{j}\}$ and $\vec{\xi}^{-j}:=\{\xi^{j+1},\cdots,\xi^{N}\}$ . For any $P_{1},\cdots,P_{N}\in\mathscr{P}({\rm I\!R})$ and any $j\in\{1,\cdots,N\}$ , denote

P_{-j}(d\xi^{-j}):=P_{1}(d\xi^{1})\cdots P_{j-1}(d\xi^{j-1})P_{j+1}(d\xi^{j+1})\cdots P_{N}(d\xi^{N})

and

h_{{\xi}^{-j}}({\xi}^{j}):=\int_{{\rm I\!R}^{(N-1)}}\psi({\xi}^{-j},\xi^{j})P_{-j}(d\xi^{-j}).

Then

$\displaystyle|h_{{\xi}^{-j}}(\tilde{\xi}^{j})-h_{{\xi}^{-j}}(\widehat{\xi}^{j})|$	$\displaystyle\leq$	$\displaystyle\int_{{\rm I\!R}^{(N-1)}}\left|\psi({\xi}^{-j},\tilde{\xi}^{j})-\psi({\xi}^{-j},\widehat{\xi}^{j})\right|P_{-j}(d\xi^{-j})$
	$\displaystyle\leq$	$\displaystyle\int_{{\rm I\!R}^{(N-1)}}\frac{1}{N}c_{p}(\tilde{\xi}^{j},\widehat{\xi}^{j})|\tilde{\xi}^{j}-\widehat{\xi}^{j}|P_{-j}(d\xi^{-j})$
	$\displaystyle\leq$	$\displaystyle\frac{1}{N}c_{p}(\tilde{\xi}^{j},\widehat{\xi}^{j})|\tilde{\xi}^{j}-\widehat{\xi}^{j}|.$

Let $\mathcal{H}$ denote the set of functions $h_{{\xi}^{-j}}({\xi}^{j})$ generated by $\psi\in\Psi$ . By the definition of $\mathsf{d\kern-0.70007ptl}_{\Psi}$ and the $p$ -th order Fortet-Mourier metric,

$\displaystyle\mathsf{d\kern-0.70007ptl}_{\Psi}(P_{-j}\times\tilde{P}_{j},P_{-j}\times\widehat{P}_{j})$	$\displaystyle=$	$\displaystyle\sup_{\psi\in\Psi}\left|\int_{{\rm I\!R}}\int_{{\rm I\!R}^{(N-1)}}\psi({\xi}^{-j},\xi^{j})P_{-j}(d\xi^{-j})\tilde{P}_{j}(d\xi^{j})\right.$	(4.11)
		$\displaystyle-\left.\int_{{\rm I\!R}}\int_{{\rm I\!R}^{(N-1)}}\psi({\xi}^{-j},\xi^{j})P_{-j}(d\xi^{-j})\widehat{P}_{j}(d\xi^{j})\right|$
	$\displaystyle=$	$\displaystyle\sup_{h_{{\xi}^{-j}}\in{\cal H}}\left|\int_{{\rm I\!R}}h_{{\xi}^{-j}}({\xi}^{j})\tilde{P}_{j}(d\xi^{j})-\int_{{\rm I\!R}}h_{{\xi}^{-j}}({\xi}^{j})\widehat{P}_{j}(d\xi^{j})\right|$
	$\displaystyle\leq$	$\displaystyle\frac{1}{N}\mathsf{d\kern-0.70007ptl}_{FM,p}(\tilde{P}_{j},\widehat{P}_{j}),$

where the inequality is due to $Nh_{\xi^{-j}}(\xi^{j})\in\mathcal{F}_{p}({\rm I\!R})$ and the definition of $\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)$ . Finally, by the triangle inequality of the pseudo-metric, we have

$\displaystyle\mathsf{d\kern-0.70007ptl}_{\Psi}\left(P^{\otimes N},Q^{\otimes N}\right)$	$\displaystyle\leq$	$\displaystyle\mathsf{d\kern-0.70007ptl}_{\Psi}\left(P^{\otimes N},P^{\otimes(N-1)}\times Q\right)+\mathsf{d\kern-0.70007ptl}_{\Psi}\left(P^{\otimes(N-1)}\times Q,P^{\otimes(N-2)}\times Q^{\otimes 2}\right)$
		$\displaystyle+\cdots+\mathsf{d\kern-0.70007ptl}_{\Psi}\left(P\times Q^{\otimes(N-1)},Q^{\otimes N}\right)$
	$\displaystyle\leq$	$\displaystyle\frac{1}{N}\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)\times N$
	$\displaystyle=$	$\displaystyle\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q).$

The proof is complete.

With the intermediate technical result, we are now ready to present our main result of quantitative statistical robustness for the plug-in estimator of a general risk functional.

Theorem 4.2

Let $\varrho:\mathscr{P}({\rm I\!R})\to{\rm I\!R}$ be a general statistical functional and $\mathscr{M}$ be a subset of $\mathscr{P}_{p}({\rm I\!R})$ with $p\geq 1$ . Assume, for fixed $N\in\mathbb{N}$ , there exists a positive constant $L$ such that

\displaystyle|\varrho(P_{N})-\varrho(Q_{N})|\leq\frac{L}{N}\sum_{k=1}^{N}c_{p}(\xi^{k},\widehat{\xi}^{k})|\xi^{k}-\widehat{\xi}^{k}|,\;\forall\xi^{k},\widehat{\xi}^{k}\in{\rm I\!R},

(4.12)

where $P_{N}$ and $Q_{N}$ are given by (2.3) and (2.4) respectively. Then $\widehat{\varrho}_{N}$ is quantitatively robust on $\mathscr{M}$ w.r.t. $(\mathsf{d\kern-0.70007ptl}_{K},\mathsf{d\kern-0.70007ptl}_{FM,p})$ , i.e.,

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right)\leq L\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)<+\infty,\;\forall\;P,Q\in\mathscr{M}.

(4.13)

If (4.12) holds for all $N\in\mathbb{N}$ , then the whole sequence of the plug-in estimators $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ is quantitatively robust on $\mathscr{M}$ , i.e., (4.13) holds for all $N\in\mathbb{N}$ . Moreover, in the case when $p=1$ , (4.13) reduces to

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right)\leq L\mathsf{d\kern-0.70007ptl}_{K}(P,Q)<+\infty,\;\forall\;P,Q\in\mathscr{M}.

(4.14)

Proof. Since the underlying probability space is atomless, then for any $N\in\mathbb{N}$ , by definition

	$\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right)$	(4.15)
$\displaystyle=$	$\displaystyle\sup_{\psi\in\mathcal{F}_{1}({\rm I\!R})}\left|\int_{\rm I\!R}\psi(t)P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}(dt)-\int_{\rm I\!R}\psi(t)Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}(dt)\right|$
$\displaystyle=$	$\displaystyle\sup_{\psi\in\mathcal{F}_{1}({\rm I\!R})}\left|\int_{{\rm I\!R}^{N}}\psi(\varrho(\vec{\xi}^{N}))P^{\otimes N}(d\vec{\xi}^{N})-\int_{{\rm I\!R}^{N}}\psi(\varrho(\vec{\xi}^{N}))Q^{\otimes N}(d\vec{\xi}^{N})\right|,$

where we write $\vec{\xi}^{N}$ for $(\xi^{1},\cdots,\xi^{N})$ and $\varrho(\vec{\xi}^{N})$ for $\widehat{\varrho}_{N}$ to indicate its dependence on $\xi^{1},\cdots,\xi^{N}$ .

For any $\psi\in\mathcal{F}_{1}({\rm I\!R})$ , (4.12) ensures that

\displaystyle|\psi(\varrho(\tilde{\vec{\xi}}))-\psi(\varrho(\widehat{\vec{\xi}}))|\leq|\varrho(\tilde{\vec{\xi}})-\varrho(\widehat{\vec{\xi}})|\leq\frac{L}{N}\sum_{k=1}^{N}c_{p}(\tilde{\xi}^{k},\widehat{\xi}^{k})|\tilde{\xi}^{k}-\widehat{\xi}^{k}|,\;\forall\tilde{\vec{\xi}},\widehat{\vec{\xi}}\in{\rm I\!R}^{N},

which means that $\psi(\varrho(\cdot))$ is locally Lipschitz continuous in $\vec{\xi}^{N}$ , i.e., $\psi(\varrho(\cdot))\in\mathcal{F}_{p}({\rm I\!R}^{N})$ from (3.4). Since $P,Q\in\mathscr{M}\subset\mathscr{P}_{p}({\rm I\!R})\subset\mathscr{P}_{K}({\rm I\!R})$ (see Example 3.2(i) and Proposition 3.1(i)), (4.15) is finite. The rest follows from Lemma 4.1 applied to the function $\frac{1}{L}\psi(\varrho(\cdot))$ , which belongs to $\Psi$ by the inequality above.

Since, by Example 3.1, $\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}(P,Q)\leq\sqrt{\mathsf{d\kern-0.70007ptl}_{K}(P,Q)}$ for all $P,Q\in\mathscr{P}({\rm I\!R})$ , we obtain the following corollary.

Corollary 4.1

Let $\varrho:\mathscr{P}({\rm I\!R})\to{\rm I\!R}$ be a general statistical functional. Assume that $\varrho$ is Lipschitz continuous w.r.t. $\mathsf{d\kern-0.70007ptl}_{FM,p}$ ( $p\geq 1$ ) on $\mathscr{M}\subset\mathscr{P}_{p}({\rm I\!R})$ for the constant $L$ . Then the plug-in estimator sequence $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ is quantitatively robust on $\mathscr{M}$ w.r.t. $(\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}},\mathsf{d\kern-0.70007ptl}_{FM,p})$ , i.e.,

\displaystyle\mathsf{d\kern-0.70007ptl}_{\mathrm{Prok}}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right)\leq\sqrt{L\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)}<+\infty,\;\forall\;P,Q\in\mathscr{M}

for all $N\in\mathbb{N}$ .

Next, we take a step further to consider the index of quantitative robustness for a general statistical functional.

Definition 4.6 (Index of quantitative robustness)

Let $\varrho:\mathscr{P}({\rm I\!R})\rightarrow{\rm I\!R}$ be a general statistical functional. If $\varrho$ is Lipschitz continuous w.r.t. $\mathsf{d\kern-0.70007ptl}_{FM,p}$ on $\mathscr{P}_{p}({\rm I\!R})$ for the constant $L$ for some $p\geq 1$ , then we can define an index of quantitative robustness of a statistical functional $\varrho$ as

\displaystyle\mathrm{iqr}(\varrho):=\left(\inf\{p\in[1,+\infty):\mbox{\rm{$\varrho$ is Lipschitz continuous w.r.t. $\mathsf{d\kern-0.70007ptl}_{FM,p}$}}\;\mbox{\rm{on}}\;\mathscr{P}_{p}({\rm I\!R})\}\right)^{-1}.

(4.16)

This index is a quantitative measurement of the degree of robustness of a statistical functional: a larger index reflects a higher degree of robustness. For a general statistical functional $\varrho$ , (4.5) may hold for uncountably many $p$ ; for example, the second moment functional $T^{(2)}$ satisfies (4.5) for any $p\geq 2$ on $\mathscr{P}_{p}({\rm I\!R})=\mathcal{M}_{1}^{p}$ . From Definition 4.6, we conclude that the $p$ -th moment functional $T^{(p)}$ has the index $\mathrm{iqr}(T^{(p)})=\frac{1}{p}$ . Definition 4.6 coincides with the index of qualitative robustness proposed by Krätschmer et al. [7] when $\varrho$ is Lipschitz continuous w.r.t. $\mathsf{d\kern-0.70007ptl}_{FM,p}$ on $\mathscr{P}_{p}({\rm I\!R})$ . The main advantage of Definition 4.6 is that the index is easy to calculate, as we will illustrate in the next section.
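To see why the infimum in (4.16) is attained at $p=2$ for the second moment functional, one can look at pairs of Dirac measures. The sketch below is a minimal illustration with hypothetical function names: it takes $P=\delta_{a}$ and $Q=\delta_{a+1}$ with $a\geq 1$, for which the representation in (4.5) gives $\mathsf{d\kern-0.70007ptl}_{FM,p}(\delta_{a},\delta_{a+1})=\int_{a}^{a+1}x^{p-1}dx$, and prints the ratio $|T^{(2)}(P)-T^{(2)}(Q)|/\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)$ for $p=1$ and $p=2$.

```python
def second_moment_gap(a):
    """|T^(2)(delta_a) - T^(2)(delta_{a+1})| for a >= 0."""
    return abs((a + 1.0) ** 2 - a ** 2)

def fm_dirac(a, b, p):
    """Integral of x**(p-1) over [a, b], valid for 1 <= a <= b where c_p(x) = x**(p-1)."""
    return (b ** p - a ** p) / p

for a in (1.0, 10.0, 100.0, 1000.0):
    ratio_p1 = second_moment_gap(a) / fm_dirac(a, a + 1.0, 1)
    ratio_p2 = second_moment_gap(a) / fm_dirac(a, a + 1.0, 2)
    print(a, ratio_p1, ratio_p2)
```

The ratio for $p=1$ equals $2a+1$ and grows without bound, so no finite Lipschitz constant works on this family, whereas the ratio for $p=2$ stays equal to $2$, in line with $\mathrm{iqr}(T^{(2)})=\frac{1}{2}$.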

5 Application to risk measures

As we discussed in Proposition 2.1, a law invariant risk measure of a random variable can be represented as a composition of a risk functional and the law of the random variable. In practice, the risk of a random variable is often calculated with empirical data, either because the true probability distribution is unknown or because it might be prohibitively expensive to calculate the risk with the true probability distribution. This raises the question of whether the estimated risk measure based on empirical data is reliable. In this section, we apply the quantitative robustness results established in Theorem 4.2 to some well-known risk measures. The next proposition synthesizes Proposition 2.1 and Theorem 4.2.

Proposition 5.1

Let $\rho(X)$ be a tail-dependent law invariant convex risk measure with representation (2.1), and let $P_{N}$ and $Q_{N}$ be empirical probability measures defined as in (2.4). Assume that there exist a number $p\geq 1$ and a constant $L>0$ such that

\displaystyle|\varrho(P_{N})-\varrho(Q_{N})|\leq\frac{L}{N}\sum_{k=1}^{N}c_{p}(\xi^{k},\widehat{\xi}^{k})|\xi^{k}-\widehat{\xi}^{k}|,\quad\forall\,\vec{\xi},\widehat{\vec{\xi}}\in{\rm I\!R}^{N}.

(5.1)

Then for any $N\in\mathbb{N}$ and any $P,Q\in{\cal M}_{1}^{p}$

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ\varrho(P_{N})^{-1},Q^{\otimes N}\circ\varrho(Q_{N})^{-1}\right)\leq L\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)<+\infty.

(5.2)

In what follows, we verify condition (5.1) for some well-known risk measures and hence show that they satisfy the proposed quantitative statistical robustness (5.2). To simplify the notation, we define the law invariant risk measures directly on the space of probability distributions.
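To make condition (5.1) concrete, the following Python sketch evaluates its right-hand side for two samples and compares it with the left-hand side for the second moment functional $T^{(2)}$. It is an illustration under stated assumptions rather than part of the analysis: the weight $c_{p}$ is taken in the form written in Example 5.4, the names `c_p`, `rhs_5_1` and `second_moment` are ours, the data are synthetic, and the constants $p=2$, $L=2$ come from the elementary bound $|x^{2}-y^{2}|\leq 2\max\{1,|x|,|y|\}|x-y|$.

```python
import numpy as np

def c_p(x, y, p):
    """Weight c_p(x, y) = max{1, |x|, |y|}^(p-1), in the form used in Example 5.4."""
    return np.maximum(1.0, np.maximum(np.abs(x), np.abs(y))) ** (p - 1)

def rhs_5_1(xi, xi_hat, p, L):
    """Right-hand side of (5.1): (L/N) * sum_k c_p(xi_k, xi_hat_k) * |xi_k - xi_hat_k|."""
    xi = np.asarray(xi, dtype=float)
    xi_hat = np.asarray(xi_hat, dtype=float)
    return L * np.mean(c_p(xi, xi_hat, p) * np.abs(xi - xi_hat))

def second_moment(sample):
    """Plug-in estimate of the second moment functional T^(2)."""
    return np.mean(np.asarray(sample, dtype=float) ** 2)

# Synthetic data: a clean sample and a perturbed copy (illustrative assumption).
rng = np.random.default_rng(0)
xi = rng.normal(size=200)
xi_hat = xi + rng.normal(scale=0.1, size=200)

lhs = abs(second_moment(xi) - second_moment(xi_hat))
rhs = rhs_5_1(xi, xi_hat, p=2, L=2.0)
print(f"|T2(P_N) - T2(Q_N)| = {lhs:.4f} <= {rhs:.4f}")
```

With $p=1$ the weight is identically one and `rhs_5_1` reduces to $L$ times the average absolute perturbation, which is the form verified in Examples 5.1 to 5.3.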

Example 5.1

The expectation of $G\in\mathscr{P}({\rm I\!R})$ given by $\mathbb{E}(G):=\int_{{\rm I\!R}}\xi dG(\xi)$ satisfies

\displaystyle|\mathbb{E}(P_{N})-\mathbb{E}(Q_{N})|=\left|\int_{{\rm I\!R}}\xi d(P_{N}-Q_{N})(\xi)\right|\leq\frac{1}{N}\sum_{i=1}^{N}|\xi^{i}-\widehat{\xi}^{i}|.

Let $T_{N}:=\mathbb{E}(\widehat{G}_{N})$ , where $\widehat{G}_{N}$ is the empirical distribution of $G$ . Then for any $N\in\mathbb{N}$ and any $P,Q\in{\cal M}_{1}^{1}$ ,

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\mathsf{d\kern-0.70007ptl}_{K}(P,Q)<+\infty,

(5.3)

and the index of quantitative robustness $\mathrm{iqr}(\mathbb{E})=1$ .
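Inequality (5.3) can be checked numerically in a simple case. The sketch below assumes $P=N(0,1)$ and $Q=N(0.3,1)$, for which $\mathsf{d\kern-0.70007ptl}_{K}(P,Q)=0.3$, and estimates the Kantorovich distance between the laws of the two sample means by Monte Carlo. For two empirical measures with the same number of atoms, the Kantorovich distance equals the average absolute gap between order statistics, which is what the hypothetical helper `kantorovich_equal_size` computes. The distributions, sample size, number of replications and seed are illustrative choices of ours, and the left-hand side is only a noisy estimate.

```python
import numpy as np

def kantorovich_equal_size(a, b):
    """Kantorovich distance between two empirical measures with equally many atoms:
    the mean absolute difference between their order statistics."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    return np.mean(np.abs(a - b))

rng = np.random.default_rng(1)
N, M, shift = 50, 2000, 0.3   # sample size, replications, mean shift (assumed values)

# M independent realisations of the sample mean T_N under P and under Q.
means_P = rng.normal(0.0, 1.0, size=(M, N)).mean(axis=1)
means_Q = rng.normal(shift, 1.0, size=(M, N)).mean(axis=1)

est_lhs = kantorovich_equal_size(means_P, means_Q)  # estimate of the left side of (5.3)
rhs = shift                                         # d_K(N(0,1), N(shift,1)) = |shift|
print(f"estimated d_K of estimator laws: {est_lhs:.3f}; d_K(P, Q) = {rhs:.3f}")
```

In this location-shift example the two sides of (5.3) coincide, so the estimate should be close to $0.3$ up to Monte Carlo error.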

Example 5.2

Consider the conditional value-at-risk of a probability distribution $G\in\mathscr{P}({\rm I\!R})$ at level $\tau\in(0,1)$ , which is defined by

\displaystyle\mbox{\rm{CVaR}}_{\tau}(G):=\inf_{r\in{\rm I\!R}}\left\{r+\frac{1}{1-\tau}\int_{{\rm I\!R}}\max\{0,\xi-r\}dG(\xi)\right\}.

Then

\displaystyle\begin{aligned}
|\mbox{\rm{CVaR}}_{\tau}(P_{N})-\mbox{\rm{CVaR}}_{\tau}(Q_{N})|
&\leq\frac{1}{1-\tau}\sup_{r\in{\rm I\!R}}\left|\int_{{\rm I\!R}}\max\{0,\xi-r\}d(P_{N}-Q_{N})(\xi)\right|\\
&=\frac{1}{1-\tau}\sup_{r\in{\rm I\!R}}\frac{1}{N}\left|\sum_{i=1}^{N}\left(\max\{0,\xi^{i}-r\}-\max\{0,\widehat{\xi}^{i}-r\}\right)\right|\\
&\leq\frac{1}{1-\tau}\times\frac{1}{N}\sum_{i=1}^{N}|\xi^{i}-\widehat{\xi}^{i}|,
\end{aligned}

where the last inequality is due to the fact that $|\max\{0,x\}-\max\{0,y\}|\leq|x-y|$ holds for all $x,y\in{\rm I\!R}$.

Let $T_{N}:=\mbox{\rm{CVaR}}_{\tau}(\widehat{G}_{N})$ , where $\widehat{G}_{N}$ is the empirical distribution of $G$ . Then for any $N\in\mathbb{N}$ and any $P,Q\in{\cal M}_{1}^{1}$ ,

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\frac{1}{1-\tau}\mathsf{d\kern-0.70007ptl}_{K}(P,Q)<+\infty,

(5.4)

and the index of quantitative robustness $\mathrm{iqr}(\mbox{\rm{CVaR}}_{\tau})=1$ for $\tau\in(0,1)$ .
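The sample-level bound derived in this example can be verified directly. The sketch below computes the CVaR of an empirical measure from the minimization formula above and compares the change under a perturbation of the data with $\frac{1}{1-\tau}\cdot\frac{1}{N}\sum_{i}|\xi^{i}-\widehat{\xi}^{i}|$. It relies on the observation that the objective is convex and piecewise linear in $r$ with kinks at the sample points, so the infimum is attained at one of them. The function name `empirical_cvar`, the heavy-tailed synthetic data and the level $\tau=0.95$ are illustrative assumptions.

```python
import numpy as np

def empirical_cvar(sample, tau):
    """CVaR_tau of the empirical measure via inf_r { r + E[max(0, xi - r)] / (1 - tau) }.
    The objective is convex and piecewise linear in r, so it is enough to
    evaluate it at the sample points."""
    xi = np.asarray(sample, dtype=float)
    r = xi[:, None]                                      # candidate values of r
    excess = np.maximum(0.0, xi[None, :] - r).mean(axis=1)
    return np.min(xi + excess / (1.0 - tau))

rng = np.random.default_rng(2)
tau = 0.95
xi = rng.standard_t(df=4, size=500)                      # clean data (assumed)
xi_hat = xi + rng.normal(scale=0.05, size=500)           # perturbed data (assumed)

lhs = abs(empirical_cvar(xi, tau) - empirical_cvar(xi_hat, tau))
rhs = np.mean(np.abs(xi - xi_hat)) / (1.0 - tau)
print(f"|CVaR(P_N) - CVaR(Q_N)| = {lhs:.4f} <= {rhs:.4f}")
```

The factor $\frac{1}{1-\tau}$ grows as $\tau\to 1$, so the bound becomes loose for extreme levels even though the index of quantitative robustness stays equal to one.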

Example 5.3

The upper semi-deviation $sd_{+}(G)$ of a measure $G\in\mathscr{P}({\rm I\!R})$ , which is defined by

\displaystyle sd_{+}(G):=\int_{{\rm I\!R}}\max\left\{0,\xi-\int_{{\rm I\!R}}udG(u)\right\}dG(\xi),

satisfies

\displaystyle\begin{aligned}
|sd_{+}(P_{N})-sd_{+}(Q_{N})|
&=\left|\frac{1}{N}\sum_{j=1}^{N}\max\left\{0,\xi^{j}-\frac{1}{N}\sum_{i=1}^{N}\xi^{i}\right\}-\frac{1}{N}\sum_{j=1}^{N}\max\left\{0,\widehat{\xi}^{j}-\frac{1}{N}\sum_{i=1}^{N}\widehat{\xi}^{i}\right\}\right|\\
&\leq\frac{1}{N}\sum_{j=1}^{N}\left|\max\left\{0,\xi^{j}-\frac{1}{N}\sum_{i=1}^{N}\xi^{i}\right\}-\max\left\{0,\widehat{\xi}^{j}-\frac{1}{N}\sum_{i=1}^{N}\widehat{\xi}^{i}\right\}\right|\\
&\leq\frac{1}{N}\sum_{j=1}^{N}\left|\left(\xi^{j}-\frac{1}{N}\sum_{i=1}^{N}\xi^{i}\right)-\left(\widehat{\xi}^{j}-\frac{1}{N}\sum_{i=1}^{N}\widehat{\xi}^{i}\right)\right|\\
&\leq\frac{1}{N}\sum_{j=1}^{N}\left(\left|\xi^{j}-\widehat{\xi}^{j}\right|+\frac{1}{N}\sum_{i=1}^{N}|\xi^{i}-\widehat{\xi}^{i}|\right)\\
&=\frac{2}{N}\sum_{i=1}^{N}|\xi^{i}-\widehat{\xi}^{i}|.
\end{aligned}

Let $T_{N}:=sd_{+}(\widehat{G}_{N})$ , where $\widehat{G}_{N}$ is the empirical distribution of $G$ . Then for any $N\in\mathbb{N}$ and any $P,Q\in{\cal M}_{1}^{1}$ ,

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq 2\mathsf{d\kern-0.70007ptl}_{K}(P,Q)<+\infty,

(5.5)

and the index of quantitative robustness $\mathrm{iqr}(\mbox{\rm{sd}}_{+})=1$ .
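As a quick numerical companion to this example, the sketch below evaluates the upper semi-deviation of two empirical measures and checks the bound $\frac{2}{N}\sum_{i}|\xi^{i}-\widehat{\xi}^{i}|$ obtained above. The function name `upper_semideviation` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def upper_semideviation(sample):
    """sd_+ of the empirical measure: mean of max(0, xi - mean(xi))."""
    xi = np.asarray(sample, dtype=float)
    return np.mean(np.maximum(0.0, xi - xi.mean()))

rng = np.random.default_rng(3)
xi = rng.exponential(scale=2.0, size=300)          # clean data (assumed)
xi_hat = xi + rng.normal(scale=0.1, size=300)      # perturbed data (assumed)

lhs = abs(upper_semideviation(xi) - upper_semideviation(xi_hat))
rhs = 2.0 * np.mean(np.abs(xi - xi_hat))
print(f"|sd_+(P_N) - sd_+(Q_N)| = {lhs:.4f} <= {rhs:.4f}")
```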

Example 5.4

The Optimized Certainty Equivalent (OCE) [31] of $G\in\mathscr{P}({\rm I\!R})$ is given by

\displaystyle S_{u}(G):=\sup_{\eta\in{\rm I\!R}}\left\{\eta+\int_{{\rm I\!R}}u(\xi-\eta)dG(\xi)\right\},

where $u:{\rm I\!R}\rightarrow[-\infty,\infty)$ is a proper concave and non-decreasing utility function satisfying the normalization property $u(0)=0$ and $1\in\partial u(0)$, where $\partial u(\cdot)$ denotes the subdifferential map of $u$. By essentially the same argument as in [31, Proposition 2.1], we have

\displaystyle S_{u}(P_{N})=\sup_{\eta\in{\rm I\!R}}\left\{\eta+\frac{1}{N}\sum_{i=1}^{N}u(\xi^{i}-\eta)\right\}=\sup_{\eta\in[\xi_{\min},\xi_{\max}]}\left\{\eta+\frac{1}{N}\sum_{i=1}^{N}u(\xi^{i}-\eta)\right\},

where $\xi_{\min}=\min\{\xi^{1},\ldots,\xi^{N};\widehat{\xi}^{1},\ldots,\widehat{\xi}^{N}\}$ and $\xi_{\max}=\max\{\xi^{1},\ldots,\xi^{N};\widehat{\xi}^{1},\ldots,\widehat{\xi}^{N}\}$ . Let $\rho(G):=-S_{u}(G)$ . Then $\rho(\cdot)$ is a convex risk measure [31] and

\displaystyle\begin{aligned}
|\rho(P_{N})-\rho(Q_{N})|
&\leq\sup_{\eta\in[\xi_{\min},\xi_{\max}]}\left|\left(\eta+\int_{{\rm I\!R}}u(\xi-\eta)dP_{N}(\xi)\right)-\left(\eta+\int_{{\rm I\!R}}u(\xi-\eta)dQ_{N}(\xi)\right)\right|\\
&=\sup_{\eta\in[\xi_{\min},\xi_{\max}]}\left|\frac{1}{N}\sum_{i=1}^{N}u(\xi^{i}-\eta)-\frac{1}{N}\sum_{i=1}^{N}u(\widehat{\xi}^{i}-\eta)\right|\\
&\leq\sup_{\eta\in[\xi_{\min},\xi_{\max}]}\frac{1}{N}\sum_{i=1}^{N}\left|u(\xi^{i}-\eta)-u(\widehat{\xi}^{i}-\eta)\right|\\
&\leq\frac{1}{N}\sum_{i=1}^{N}u^{\prime}_{-}(\xi_{\min})|\xi^{i}-\widehat{\xi}^{i}|,
\end{aligned}

where $u^{\prime}_{-}(t)$ denotes the left derivative of $u$ at $t$, and the last inequality is due to the fact that $u$ is non-decreasing and concave, so that $u^{\prime}_{-}(t)$ is non-increasing.

Let $T_{N}:=-S_{u}(\widehat{G}_{N})$ , where $\widehat{G}_{N}$ is the empirical distribution of $G$ . We consider two interesting cases.

One is that $\sup_{\eta\in{\rm I\!R}}u^{\prime}_{-}(\eta)<+\infty$ , in which case

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\sup_{\eta\in{\rm I\!R}}u^{\prime}_{-}(\eta)\mathsf{d\kern-0.70007ptl}_{K}(P,Q)<+\infty,

(5.6)

for any $N\in\mathbb{N}$ and any $P,Q\in{\cal M}_{1}^{1}$ and the index of quantitative robustness for this case is $1$ .

The other is that there exist a number $p>1$ and a positive constant $L$ such that $u^{\prime}_{-}(\xi_{\min})\leq Lc_{p}(\xi^{i},\widehat{\xi}^{i})$, where $c_{p}(\xi^{i},\widehat{\xi}^{i})=\max\{1,|\xi^{i}|,|\widehat{\xi}^{i}|\}^{p-1}$. In that case, we have

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq L\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)<+\infty,

(5.7)

and the index of quantitative robustness for this case is $\frac{1}{p}$ .

To see how (5.6) and (5.7) could possibly be satisfied, we consider two specific utility functions: piecewise linear utility function and quadratic utility function, both of which are extracted from [31].

(a) Piecewise linear utility function with $u(t):=\gamma_{1}[t]_{+}-\gamma_{2}[-t]_{+}$, where $0\leq\gamma_{1}<1<\gamma_{2}$ and $[z]_{+}=\max\{0,z\}$. A simple calculation yields

\displaystyle|\rho(P_{N})-\rho(Q_{N})|\leq\frac{\gamma_{2}}{N}\sum_{i=1}^{N}|\xi^{i}-\widehat{\xi}^{i}|.

Thus for any $N\in\mathbb{N}$ and any $P,Q\in{\cal M}_{1}^{1}$ ,

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\gamma_{2}\mathsf{d\kern-0.70007ptl}_{K}(P,Q)<+\infty,

(5.8)

and the index of quantitative robustness $\mathrm{iqr}(-S_{u})=1$.

(b) Quadratic utility with $u(t):=(t-\frac{1}{2}t^{2})\mathbf{1}_{(-\infty,1)}(t)+\frac{1}{2}\mathbf{1}_{[1,+\infty)}(t)$ . It is easy to observe that the function is locally Lipschitz continuous over $[\xi_{\min},\xi_{\max}]$ with modulus being bounded by $|1-\xi_{\min}|$ . Thus

\displaystyle\begin{aligned}
|\rho(P_{N})-\rho(Q_{N})|
&\leq\sup_{\eta\in[\xi_{\min},\xi_{\max}]}\frac{1}{N}\sum_{i=1}^{N}\left|u(\xi^{i}-\eta)-u(\widehat{\xi}^{i}-\eta)\right|\\
&\leq\frac{1}{N}\sum_{i=1}^{N}|1-\xi_{\min}||\xi^{i}-\widehat{\xi}^{i}|.
\end{aligned}

Moreover, if $\xi_{\min}\leq-1$ , then $|1-\xi_{\min}|\leq 2|\xi_{\min}|$ . Subsequently,

|\rho(P_{N})-\rho(Q_{N})|\leq\frac{2}{N}\sum_{i=1}^{N}c_{2}(\xi^{i},\widehat{\xi}^{i})|\xi^{i}-\widehat{\xi}^{i}|,

where $c_{2}(\xi^{i},\widehat{\xi}^{i})=\max\{1,|\xi^{i}|,|\widehat{\xi}^{i}|\}$ . Thus for any $N\in\mathbb{N}$ and any $P,Q\in{\cal M}_{1}^{2}$ ,

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq 2\mathsf{d\kern-0.70007ptl}_{FM,2}(P,Q)

(5.9)

provided that $\xi_{\min}\leq-1$, and the index of quantitative robustness is $\mathrm{iqr}(-S_{u})=\frac{1}{2}$.
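Case (a) of this example lends itself to a short numerical check. The sketch below computes $S_{u}(P_{N})$ for the piecewise linear utility and verifies the sample-level bound $\frac{\gamma_{2}}{N}\sum_{i}|\xi^{i}-\widehat{\xi}^{i}|$. It assumes the sign convention $u(t)=\gamma_{1}[t]_{+}-\gamma_{2}[-t]_{+}$ of [31], under which $u$ is concave and non-decreasing, and it uses the fact that $\eta\mapsto\eta+\frac{1}{N}\sum_{i}u(\xi^{i}-\eta)$ is then concave and piecewise linear with kinks at the sample points, so the supremum is attained at one of them. The function names, the values of $\gamma_{1},\gamma_{2}$ and the data are illustrative assumptions.

```python
import numpy as np

def piecewise_linear_utility(t, gamma1, gamma2):
    """u(t) = gamma1*[t]_+ - gamma2*[-t]_+ with 0 <= gamma1 < 1 < gamma2 (assumed convention)."""
    t = np.asarray(t, dtype=float)
    return gamma1 * np.maximum(t, 0.0) - gamma2 * np.maximum(-t, 0.0)

def empirical_oce(sample, gamma1, gamma2):
    """S_u of the empirical measure: sup over eta of eta + mean(u(xi - eta)).
    The objective is concave and piecewise linear in eta, so scanning the
    sample points is enough."""
    xi = np.asarray(sample, dtype=float)
    eta = xi[:, None]                                   # candidate values of eta
    values = xi + piecewise_linear_utility(xi[None, :] - eta, gamma1, gamma2).mean(axis=1)
    return np.max(values)

rng = np.random.default_rng(4)
gamma1, gamma2 = 0.5, 2.0                               # assumed parameters
xi = rng.normal(size=400)
xi_hat = xi + rng.normal(scale=0.1, size=400)

# rho = -S_u, so the absolute difference of rho equals that of S_u.
lhs = abs(empirical_oce(xi, gamma1, gamma2) - empirical_oce(xi_hat, gamma1, gamma2))
rhs = gamma2 * np.mean(np.abs(xi - xi_hat))
print(f"|rho(P_N) - rho(Q_N)| = {lhs:.4f} <= {rhs:.4f}")
```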

Example 5.5

Suppose that $l:{\rm I\!R}\rightarrow{\rm I\!R}$ is an increasing convex loss function which is not identically constant. Let $x_{0}$ be an interior point in the range of $l$ . The Shortfall Risk Measure [18] of $G\in\mathscr{P}({\rm I\!R})$ is defined by

\displaystyle\rho_{l}(G):=\inf\left\{m\in{\rm I\!R}:\int_{{\rm I\!R}}l(\xi-m)dG(\xi)\leq x_{0}\right\}.

(5.10)

Following a similar analysis to Guo and Xu [32], we can recast the formulation above as

\displaystyle\rho_{l}(G)=\inf_{m\in{\rm I\!R}}\sup_{\lambda\geq 0}\left\{m+\lambda\left(\int_{{\rm I\!R}}l(\xi-m)dG(\xi)-x_{0}\right)\right\}.

(5.11)

Swapping the inf and sup operations, we obtain the Lagrange dual of the problem. Moreover, if the inequality constraint in (5.10) satisfies the well-known Slater condition, i.e., there exists $m_{0}$ such that $\int_{\rm I\!R}l(\xi-m_{0})dG(\xi)-x_{0}<0$, then the set of Lagrange multipliers of (5.10) is bounded and strong duality holds. Consequently, we can rewrite (5.11) as

\displaystyle\rho_{l}(G)=\inf_{m\in{\rm I\!R}}\sup_{\lambda\in[a,b]}\left\{m+\lambda\left(\int_{{\rm I\!R}}l(\xi-m)dG(\xi)-x_{0}\right)\right\},

(5.12)

where $a,b$ are some positive numbers. By essentially the same argument as in [31, Proposition 2.1], we have

\displaystyle\begin{aligned}
\rho_{l}(P_{N})
&=\sup_{\lambda\in[a,b]}\inf_{m\in{\rm I\!R}}\left\{m+\lambda\left(\frac{1}{N}\sum_{i=1}^{N}l(\xi^{i}-m)-x_{0}\right)\right\}\\
&=\sup_{\lambda\in[a,b]}\inf_{m\in[\xi_{\min},\xi_{\max}]}\left\{m+\lambda\left(\frac{1}{N}\sum_{i=1}^{N}l(\xi^{i}-m)-x_{0}\right)\right\},
\end{aligned}

where $\xi_{\min}=\min\{\xi^{1},\ldots,\xi^{N};\widehat{\xi}^{1},\ldots,\widehat{\xi}^{N}\}$ and $\xi_{\max}=\max\{\xi^{1},\ldots,\xi^{N};\widehat{\xi}^{1},\ldots,\widehat{\xi}^{N}\}$ . Subsequently,

\displaystyle\begin{aligned}
|\rho_{l}(P_{N})-\rho_{l}(Q_{N})|
&\leq b\sup_{m\in[\xi_{\min},\xi_{\max}]}\left|\frac{1}{N}\sum_{i=1}^{N}l(\xi^{i}-m)-\frac{1}{N}\sum_{i=1}^{N}l(\widehat{\xi}^{i}-m)\right|\\
&\leq b\sup_{m\in[\xi_{\min},\xi_{\max}]}\frac{1}{N}\sum_{i=1}^{N}\left|l(\xi^{i}-m)-l(\widehat{\xi}^{i}-m)\right|\\
&\leq b\sup_{m\in[\xi_{\min},\xi_{\max}]}\frac{1}{N}\sum_{i=1}^{N}[l^{\prime}_{+}(\xi^{i}-m)\vee l^{\prime}_{+}(\widehat{\xi}^{i}-m)]|\xi^{i}-\widehat{\xi}^{i}|\\
&\leq\frac{b}{N}\sum_{i=1}^{N}[l^{\prime}_{+}(\xi^{i}-\xi_{\min})\vee l^{\prime}_{+}(\widehat{\xi}^{i}-\xi_{\min})]|\xi^{i}-\widehat{\xi}^{i}|\\
&\leq\frac{b}{N}\sum_{i=1}^{N}\sup_{m\in{\rm I\!R}}l^{\prime}_{+}(m)|\xi^{i}-\widehat{\xi}^{i}|,
\end{aligned}

where $l^{\prime}_{+}(t)$ denotes the right derivative of $l$ at $t$, and the last three inequalities are due to the fact that $l$ is non-decreasing and convex, so that $l^{\prime}_{+}(t)$ is non-decreasing.

Let $T_{N}:=\rho_{l}(\widehat{G}_{N})$ , where $\widehat{G}_{N}$ is the empirical distribution of $G$ . If $\sup_{m\in{\rm I\!R}}l^{\prime}_{+}(m)<+\infty$ , then for any $N\in\mathbb{N}$ and any $P,Q\in{\cal M}_{1}^{1}$ ,

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\sup_{m\in{\rm I\!R}}l^{\prime}_{+}(m)\mathsf{d\kern-0.70007ptl}_{K}(P,Q)<+\infty.

(5.13)

If there exist a number $p>1$ and a positive constant $L$ such that $l^{\prime}_{+}(\xi^{i}-\xi_{\min})\vee l^{\prime}_{+}(\widehat{\xi}^{i}-\xi_{\min})\leq Lc_{p}(\xi^{i},\widehat{\xi}^{i})$, where $c_{p}(\xi^{i},\widehat{\xi}^{i})=\max\{1,|\xi^{i}|,|\widehat{\xi}^{i}|\}^{p-1}$, then

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq L\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)<+\infty.

(5.14)

In what follows, we illustrate the above two inequalities with two specific loss functions: deposit insurance loss function [33] and $p$ -th power loss function [18].

(a) Deposit insurance loss function, $l(x)=[x]_{+}$, where $[x]_{+}=\max\{x,0\}$. Then $\sup_{m\in{\rm I\!R}}l^{\prime}_{+}(m)<+\infty$. Thus, for any $N\in\mathbb{N}$ and any $P,Q\in{\cal M}_{1}^{1}$,

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\sup_{m\in{\rm I\!R}}l^{\prime}_{+}(m)\mathsf{d\kern-0.70007ptl}_{K}(P,Q)<+\infty,

(5.15)

and the index of quantitative robustness is 1.

(b) For $x_{0}>0$ , we consider the $p$ -th power loss function,

l(x)=\begin{cases}\frac{1}{p}x^{p},&\mbox{\rm{if}}\;x\geq 0\\ 0,&\mbox{\rm{otherwise}}\end{cases},

where $p>1$. We have $l^{\prime}_{+}(x)=x^{p-1}$ for $x\geq 0$ and $l^{\prime}_{+}(x)=0$ for $x<0$. If $\xi_{\min}\geq 0$, then $0\leq\xi^{i}-\xi_{\min}\leq|\xi^{i}|$ and subsequently $l^{\prime}_{+}(\xi^{i}-\xi_{\min})\vee l^{\prime}_{+}(\widehat{\xi}^{i}-\xi_{\min})\leq c_{p}(\xi^{i},\widehat{\xi}^{i})$. Thus for any $N\in\mathbb{N}$ and any $P,Q\in{\cal M}_{1}^{p}$,

\displaystyle\mathsf{d\kern-0.70007ptl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)<+\infty

(5.16)

provided that $\xi_{\min}\geq 0$ and the index of quantitative robustness is $\frac{1}{p}$ .
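For readers who wish to experiment with this example, the sketch below evaluates the shortfall risk measure (5.10) for an empirical measure by bisection on $m$, using the two loss functions of cases (a) and (b). It assumes that $l$ is continuous and non-decreasing with $l(t)=0$ for $t\leq 0$ and that $x_{0}>0$, so that $m=\max_{i}\xi^{i}$ is feasible and the constraint function is non-increasing in $m$. The function names, the threshold $x_{0}=0.1$, the exponent $p=2$ and the data are illustrative assumptions, and the bisection is a generic root-finding device rather than the dual formulation (5.12).

```python
import numpy as np

def deposit_insurance_loss(x):
    """l(x) = [x]_+."""
    return np.maximum(x, 0.0)

def power_loss(x, p=2.0):
    """l(x) = x^p / p for x >= 0 and 0 otherwise, with p > 1."""
    return np.maximum(x, 0.0) ** p / p

def shortfall_risk(sample, loss, x0, n_iter=100):
    """Smallest m with mean(loss(xi - m)) <= x0, found by bisection.
    Assumes loss is continuous, non-decreasing, zero on (-inf, 0], and x0 > 0."""
    xi = np.asarray(sample, dtype=float)
    excess = lambda m: np.mean(loss(xi - m)) - x0
    hi = xi.max()                   # feasible, since loss(xi - hi) = 0 <= x0
    step = 1.0
    lo = hi - step
    while excess(lo) <= 0.0:        # move left until the constraint is violated
        step *= 2.0
        lo = hi - step
    for _ in range(n_iter):         # invariant: lo infeasible, hi feasible
        mid = 0.5 * (lo + hi)
        if excess(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return hi

rng = np.random.default_rng(5)
xi = rng.normal(size=500)           # synthetic data (assumed)
x0 = 0.1
rho_deposit = shortfall_risk(xi, deposit_insurance_loss, x0)
rho_power = shortfall_risk(xi, power_loss, x0)
print(f"deposit insurance loss: {rho_deposit:.4f}; power loss (p=2): {rho_power:.4f}")
```

Re-running the computation on a perturbed copy of `xi` gives a direct way to compare the change in $\rho_{l}$ with the average perturbation size appearing in the bounds above.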

In all of the above examples, the risk measures can either be represented explicitly in the form of $\int_{\rm I\!R}f(x)dP(x)$ (such as the expectation) or be obtained by solving an optimization problem in which the underlying functions are represented in the expected utility form (CVaR, the optimized certainty equivalent and the shortfall risk measure). The analysis works because the utility (disutility) functions are assumed to be concave (convex) and hence locally Lipschitz continuous. When the growth of the Lipschitz modulus is controlled by $c_{p}(\xi,\xi^{\prime})$, these risk measures satisfy inequality (4.5), as we have shown. This approach may not work for spectral risk measures [34] with unbounded risk spectrum, because the latter distort the probability distribution $P(x)$. However, when the risk spectrum is bounded (as for CVaR, which is a special case of a spectral risk measure), we can still establish inequality (4.5). This explains why we have not included spectral risk measures in the examples.

References

[1] V. Krätschmer, A. Schied, and H. Zähle, “Comparative and qualitative robustness for law-invariant risk measures,” Finance and Stochastics, vol. 18, no. 2, pp. 271–295, 2014.
[2] R. Cont, R. Deguest, and G. Scandolo, “Robustness and sensitivity analysis of risk measurement procedures,” Quantitative finance, vol. 10, no. 6, pp. 593–606, 2010.
[3] H. Zähle, “Rates of almost sure convergence of plug-in estimates for distortion risk measures,” Metrika, vol. 74, no. 2, pp. 267–285, 2011.
[4] D. Belomestny and V. Krätschmer, “Central limit theorems for law-invariant coherent risk measures,” Journal of Applied Probability, vol. 49, no. 1, pp. 1–21, 2012.
[5] E. Beutner and H. Zähle, “A modified functional delta method and its application to the estimation of risk functionals,” Journal of Multivariate Analysis, vol. 101, no. 10, pp. 2452–2463, 2010.
[6] F. R. Hampel, “A general qualitative definition of robustness,” The Annals of Mathematical Statistics, pp. 1887–1896, 1971.
[7] V. Krätschmer, A. Schied, and H. Zähle, “Qualitative and infinitesimal robustness of tail-dependent statistical functionals,” Journal of Multivariate Analysis, vol. 103, no. 1, pp. 35–47, 2012.
[8] H. Zähle, “Qualitative robustness of von Mises statistics based on strongly mixing data,” Statistical Papers, vol. 55, no. 1, pp. 157–167, 2014.
[9] H. Zähle et al., “Qualitative robustness of statistical functionals under strong mixing,” Bernoulli, vol. 21, no. 3, pp. 1412–1434, 2015.
[10] G. Boente, R. Fraiman, V. J. Yohai, et al., “Qualitative robustness for stochastic processes,” The Annals of Statistics, vol. 15, no. 3, pp. 1293–1312, 1987.
[11] K. Strohriegl and R. Hable, “Qualitative robustness of estimators on stochastic processes,” Metrika, vol. 79, no. 8, pp. 895–917, 2016.
[12] P. J. Huber and E. M. Ronchetti, Robust statistics. Springer, 2011.
[13] H. Zähle, “A definition of qualitative robustness for general point estimators, and examples,” Journal of Multivariate Analysis, vol. 143, pp. 12–31, 2016.
[14] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust statistics: the approach based on influence functions, vol. 196. John Wiley & Sons, 2011.
[15] R. A. Maronna, R. D. Martin, V. J. Yohai, and M. Salibián-Barrera, Robust statistics: theory and methods (with R). John Wiley & Sons, 2019.
[16] S. Guo and H. Xu, “Statistical robustness in utility preference robust optimization models,” Submitted to Mathematical Programming, 2020.
[17] D. Filipović and G. Svindland, “The canonical model space for law-invariant convex risk measures is $L^{1}$,” Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics, vol. 22, no. 3, pp. 585–589, 2012.
[18] H. Föllmer and A. Schied, “Convex measures of risk and trading constraints,” Finance and stochastics, vol. 6, no. 4, pp. 429–447, 2002.
[19] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath, “Coherent measures of risk,” Mathematical finance, vol. 9, no. 3, pp. 203–228, 1999.
[20] H. Föllmer and S. Weber, “The axiomatic approach to risk measures for capital determination,” Annual Review of Financial Economics, vol. 7, pp. 301–337, 2015.
[21] E. Delage, D. Kuhn, and W. Wiesemann, ““dice”-sion–making under uncertainty: When can a random decision reduce risk?,” Management Science, vol. 65, no. 7, pp. 3282–3301, 2019.
[22] M. Frittelli, M. Maggis, and I. Peri, “Risk measures on and value at risk with probability/loss function,” Mathematical Finance, vol. 24, no. 3, pp. 442–463, 2014.
[23] D. Dentcheva and A. Ruszczyński, “Risk preferences on the space of quantile functions,” Mathematical Programming, vol. 148, no. 1-2, pp. 181–200, 2014.
[24] W. B. Haskell, W. Huang, and H. Xu, “Preference elicitation and robust optimization with multi-attribute quasi-concave choice functions,” arXiv preprint arXiv:1805.06632, 2018.
[25] A. L. Gibbs and F. E. Su, “On choosing and bounding probability metrics,” International statistical review, vol. 70, no. 3, pp. 419–435, 2002.
[26] I. Mizera et al., “Qualitative robustness and weak continuity: the extreme unction,” Nonparametrics and robustness in modern statistical inference and time series analysis: a Festschrift in honor of Professor Jana Jurecková, vol. 1, p. 169, 2010.
[27] W. Römisch, “Stability of stochastic programming problems,” Handbooks in operations research and management science, vol. 10, pp. 483–554, 2003.
[28] G. C. Pflug and A. Pichler, “Approximations for probability distributions and stochastic optimization problems,” in Stochastic optimization methods in finance and energy, pp. 343–387, Springer, 2011.
[29] S. T. Rachev, Probability metrics and the stability of stochastic models, vol. 269. John Wiley & Son Ltd, 1991.
[30] P. Mattila, Geometry of sets and measures in Euclidean spaces: fractals and rectifiability. No. 44, Cambridge university press, 1999.
[31] A. Ben-Tal and M. Teboulle, “An old-new concept of convex risk measures: The optimized certainty equivalent,” Mathematical Finance, vol. 17, no. 3, pp. 449–476, 2007.
[32] S. Guo, H. Xu, and L. Zhang, “Convergence analysis for mathematical programs with distributionally robust chance constraint,” SIAM Journal on optimization, vol. 27, no. 2, pp. 784–816, 2017.
[33] C. Chen, G. Iyengar, and C. C. Moallemi, “An axiomatic approach to systemic risk,” Management Science, vol. 59, no. 6, pp. 1373–1388, 2013.
[34] C. Acerbi, “Spectral measures of risk: A coherent representation of subjective risk aversion,” Journal of Banking & Finance, vol. 26, no. 7, pp. 1505–1518, 2002.
[35] S. Kusuoka, “On law invariant coherent risk measures,” in Advances in mathematical economics, pp. 83–95, Springer, 2001.

Appendix A

Lemma A.1

Let $\{a_{i}\}_{i=1}^{N}$ and $\{b_{i}\}_{i=1}^{N}$ be two non-decreasing sequences. Then for any permutation $\{k_{1},k_{2},\ldots,k_{N}\}$ of $\{1,2,\ldots,N\}$ , we have

\displaystyle\sum_{i=1}^{N}|a_{i}-b_{i}|\leq\sum_{i=1}^{N}|a_{i}-b_{k_{i}}|.

Proof. The result is perhaps well known. We include a proof as we cannot find a reference. We do so by induction.

For $N=1$, the statement is trivial, and for $N=2$ we have $|a_{1}-b_{1}|+|a_{2}-b_{2}|\leq|a_{1}-b_{2}|+|a_{2}-b_{1}|$ for any $a_{1}\leq a_{2}$ and $b_{1}\leq b_{2}$. Assume that the conclusion holds for $N\leq n$. For $N=n+1$, consider any non-decreasing sequences $\{a_{i}\}_{i=1}^{n+1}$ and $\{b_{i}\}_{i=1}^{n+1}$ and any permutation $\{k_{1},\ldots,k_{n+1}\}$ of $\{1,\ldots,n+1\}$. Then there exists a $j\in\{1,\ldots,n+1\}$ such that $k_{j}=n+1$, so that $b_{k_{j}}=b_{n+1}$. If $j=n+1$, then by the induction hypothesis for $N=n$, we have

\displaystyle\sum_{i=1}^{n+1}|a_{i}-b_{i}|=\sum_{i=1}^{n}|a_{i}-b_{i}|+|a_{n+1}-b_{k_{n+1}}|\leq\sum_{i=1}^{n+1}|a_{i}-b_{k_{i}}|.

If $j<n+1$ , then we have

\displaystyle\begin{aligned}
\sum_{i=1}^{n+1}|a_{i}-b_{k_{i}}|
&=\sum_{i=1}^{j-1}|a_{i}-b_{k_{i}}|+\sum_{i=j+1}^{n}|a_{i}-b_{k_{i}}|+|a_{j}-b_{k_{j}}|+|a_{n+1}-b_{k_{n+1}}|\\
&\geq\sum_{i=1}^{j-1}|a_{i}-b_{k_{i}}|+\sum_{i=j+1}^{n}|a_{i}-b_{k_{i}}|+|a_{j}-b_{k_{n+1}}|+|a_{n+1}-b_{k_{j}}|\\
&=\sum_{i=1}^{j-1}|a_{i}-b_{k_{i}}|+\sum_{i=j+1}^{n}|a_{i}-b_{k_{i}}|+|a_{j}-b_{k_{n+1}}|+|a_{n+1}-b_{n+1}|\\
&\geq\sum_{i=1}^{n}|a_{i}-b_{i}|+|a_{n+1}-b_{n+1}|=\sum_{i=1}^{n+1}|a_{i}-b_{i}|,
\end{aligned}

where the first inequality follows from the case $N=2$ applied to the non-decreasing sequences $\{a_{j},a_{n+1}\}$ and $\{b_{k_{n+1}},b_{k_{j}}\}$, and the second inequality follows from the induction hypothesis for $N=n$ applied to the non-decreasing sequences $\{a_{1},\ldots,a_{n}\}$ and $\{b_{1},\ldots,b_{n}\}$.
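Lemma A.1 can also be checked by brute force for small $N$. The sketch below enumerates all permutations of one sequence and confirms that pairing the two sorted sequences attains the minimal total absolute deviation. The function names and the random data are illustrative assumptions, and the enumeration is only feasible for small $N$ since it visits $N!$ permutations.

```python
import random
from itertools import permutations

def sorted_pairing_cost(a, b):
    """Sum of |a_i - b_i| after sorting both sequences in non-decreasing order."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b)))

def min_cost_over_permutations(a, b):
    """Minimum of sum |a_i - b_{k_i}| over all permutations k, with a sorted."""
    a_sorted = sorted(a)
    return min(
        sum(abs(x - y) for x, y in zip(a_sorted, perm))
        for perm in permutations(b)
    )

random.seed(0)
a = [random.uniform(-5.0, 5.0) for _ in range(6)]
b = [random.uniform(-5.0, 5.0) for _ in range(6)]

best = min_cost_over_permutations(a, b)
monotone = sorted_pairing_cost(a, b)
print(f"sorted pairing: {monotone:.6f}; best over all permutations: {best:.6f}")
```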

Proposition A.1

Let $\{a_{i}\}_{i=1}^{N}$ be a sequence of numbers and $\{b_{i}\}_{i=1}^{N}$ be a sequence of non-negative numbers. If $a_{i_{1}}\leq a_{i_{2}}\leq\cdots\leq a_{i_{N}}$ , $b_{i_{1}}\leq b_{i_{2}}\leq\cdots\leq b_{i_{N}}$ , then

\displaystyle\sum_{k=1}^{N}a_{k}b_{k}\leq\sum_{k=1}^{N}a_{i_{k}}b_{i_{k}}.

See e.g. [35, Proposition 12].

Appendix B

Example B.1

In this example, we show that both inclusions in (3.12) are strict. We first show that $\mathcal{M}_{1}^{\phi}\neq\mathscr{P}_{(\phi)}({\rm I\!R})$, i.e., there exists a $P\in\mathscr{P}_{(\phi)}({\rm I\!R})$ such that $P\notin\mathcal{M}_{1}^{\phi}$. Let $\phi$ be an unbounded $u$-shaped function. Then by the continuity of $\phi$, there exist $a<0$ and $b>0$ with $\phi(a)=2=\phi(b)$. Let

\displaystyle P(x)=\begin{cases}\frac{1}{\phi(x)},&\mbox{\rm{for}}\;x\leq a,\\ \frac{1}{2},&\mbox{\rm{for}}\;a\leq x\leq b,\\ 1-\frac{1}{\phi(x)},&\mbox{\rm{for}}\;x\geq b.\end{cases}

Since $\phi(x)\geq 1$ for all $x$ outside $[a,b]$, $P(x)$ is well-defined on ${\rm I\!R}$. By the monotonicity and unboundedness of $\phi$, we have $P\in\mathscr{P}({\rm I\!R})$. Moreover, since

\sup_{x\leq 0}|P(x)\phi(x)|+\sup_{x>0}|(1-P(x))\phi(x)|=2,

we have $P\in\mathscr{P}_{(\phi)}({\rm I\!R})$. However, by change of variables in integration, we have

\displaystyle\begin{aligned}
\int_{{\rm I\!R}}\phi(x)dP(x)
&=\int_{-\infty}^{a}\phi(x)d\left(\frac{1}{\phi(x)}\right)+\int_{b}^{+\infty}\phi(x)d\left(1-\frac{1}{\phi(x)}\right)\\
&=\int_{0}^{\frac{1}{2}}\frac{1}{t}dt+\int_{\frac{1}{2}}^{1}\frac{1}{1-t}dt=2\int_{0}^{\frac{1}{2}}\frac{1}{t}dt=+\infty,
\end{aligned}

which means $P\notin\mathcal{M}_{1}^{\phi}$ .

Now we show that $\mathscr{P}_{(\phi)}({\rm I\!R})\neq\bigcap_{\epsilon>0}\mathcal{M}_{1}^{\phi^{1-\epsilon}}$ , i.e., there exists a $P\in\bigcap_{\epsilon>0}\mathcal{M}_{1}^{\phi^{1-\epsilon}}$ such that $P\notin\mathscr{P}_{(\phi)}({\rm I\!R})$ . Let $\phi$ be an unbounded $u$ -shaped function. Then there exists an unbounded $u$ -shaped function $\psi$ such that $\lim_{|x|\rightarrow+\infty}\psi(x)/\phi(x)=0$ . More precisely, for any $\epsilon\in(0,1)$ , there exists an unbounded $u$ -shaped function $\psi$ such that

\displaystyle\lim_{|x|\rightarrow+\infty}\psi(x)/\phi(x)^{1-\epsilon}=0.

(B.1)

We construct such a $\psi$ as follows. Since $\phi$ is an unbounded $u$-shaped function, there exist $a<0$ and $b>0$ with $\phi(a)=e^{2}=\phi(b)$. Let

\displaystyle\psi(x)=\begin{cases}\ln{(\phi(x))},&\mbox{\rm{for}}\;x\leq a,\\ 2,&\mbox{\rm{for}}\;a\leq x\leq b,\\ \ln{(\phi(x))},&\mbox{\rm{for}}\;x\geq b.\end{cases}

Then $\psi$ is an unbounded $u$ -shaped function and satisfies (B.1). Let

\displaystyle P(x)=\begin{cases}\frac{1}{\psi(x)},&\mbox{\rm{for}}\;x\leq a,\\ \frac{1}{2},&\mbox{\rm{for}}\;a\leq x\leq b,\\ 1-\frac{1}{\psi(x)},&\mbox{\rm{for}}\;x\geq b.\end{cases}

Since $\psi(x)\geq 1$ for all $x$, $P(x)$ is well-defined on ${\rm I\!R}$. By the monotonicity and unboundedness of $\psi$, we have $P\in\mathscr{P}({\rm I\!R})$.

For fixed $\epsilon\in(0,1)$ , by change of variables in integration, we have

\displaystyle\begin{aligned}
\int_{{\rm I\!R}}\phi(x)^{1-\epsilon}dP(x)
&=\int_{-\infty}^{a}\phi(x)^{1-\epsilon}d\left(\frac{1}{\psi(x)}\right)+\int_{b}^{+\infty}\phi(x)^{1-\epsilon}d\left(1-\frac{1}{\psi(x)}\right)\\
&=\int_{-\infty}^{a}\phi(x)^{1-\epsilon}d\left(\frac{1}{\ln{\phi(x)}}\right)+\int_{b}^{+\infty}\phi(x)^{1-\epsilon}d\left(1-\frac{1}{\ln{\phi(x)}}\right)\\
&=\int_{0}^{\frac{1}{2}}e^{\frac{1-\epsilon}{t}}dt+\int_{\frac{1}{2}}^{1}e^{\frac{1-\epsilon}{1-t}}dt\\
&<+\infty.
\end{aligned}

Since $\phi^{1-\epsilon}$ is bounded on ${\rm I\!R}$ for $\epsilon\geq 1$, we have $\mathcal{M}_{1}^{\phi^{1-\epsilon}}=\mathscr{P}({\rm I\!R})$ for such $\epsilon$. Thus, $P\in\bigcap_{\epsilon>0}\mathcal{M}_{1}^{\phi^{1-\epsilon}}$. However,

\displaystyle\sup_{x\leq 0}|P(x)\phi(x)|+\sup_{x>0}|(1-P(x))\phi(x)|\geq\sup_{x\leq a}\left|\frac{\phi(x)}{\ln{(\phi(x))}}\right|+\sup_{x\geq b}\left|\frac{\phi(x)}{\ln{(\phi(x))}}\right|=+\infty,

which means $P\notin\mathscr{P}_{(\phi)}({\rm I\!R})$ .

