Quantitative Statistical Robustness for Tail-Dependent Law Invariant Risk Measures

Wei Wang (School of Business, University of Southampton, Southampton, SO17 1BJ, UK; ww1e17@soton.ac.uk), Huifu Xu (Department of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong; hfxu@se.cuhk.edu.hk), Tiejun Ma (School of Business, University of Southampton, Southampton, SO17 1BJ, UK; tiejun.ma@soton.ac.uk)

Abstract

When estimating the risk of a financial position with empirical data or Monte Carlo simulations via a tail-dependent law invariant risk measure such as the Conditional Value-at-Risk (CVaR), it is important to ensure the robustness of the statistical estimator, particularly when the data contain noise. Krätschmer et al. [1] propose a new framework to examine the qualitative robustness of estimators for tail-dependent law invariant risk measures on Orlicz spaces, which is a step further from the earlier work of Cont et al. [2] on the robustness of risk measurement procedures. In this paper, we follow this stream of research and propose a quantitative approach for verifying the statistical robustness of tail-dependent law invariant risk measures. A distinct feature of our approach is that we use the Fortet-Mourier metric to quantify the variation of the true underlying probability measure in the analysis of the discrepancy between the laws of the plug-in estimators of a law invariant risk measure based on the true data and on perturbed data. This enables us to derive an explicit error bound for the discrepancy when the risk functional is Lipschitz continuous with respect to a class of admissible laws. Moreover, the newly introduced notion of Lipschitz continuity allows us to examine the degree of robustness for tail-dependent risk measures. Finally, we apply our quantitative approach to some well-known risk measures to illustrate our theory.

Keywords. Quantitative robustness, tail-dependent law invariant risk measures, Fortet-Mourier metric, admissible laws, index of quantitative robustness.

1 Introduction

One of the main purposes of quantitative modeling in finance is to quantify the loss of a financial portfolio. Over the past two decades, various risk measures have been proposed for measuring the risk of financial portfolios. A risk measure is represented as a map assigning an extended real number (a measure of risk) to each random loss under an implicit assumption that the true loss probability distribution is known. However, in practice, the true probability distribution is often unknown or it is prohibitively expensive to calculate the risk using the true distribution. Thus, in applications, evaluating the risk of a random variable representing the loss of a financial position often involves two steps: estimating the probability distribution from available observations or from the sampling data of the random financial loss via, e.g., Monte Carlo method and then plugging the estimated distribution into a risk measure to quantify the financial loss. This is because the risk measures are mostly law invariant, that is, they are determined only by the probability distributions of random variables. For the loss of a financial portfolio, a measure of risk computed based on the estimated distribution is known as a plug-in estimate for the risk measure [3].

Let $X$ denote the random loss of a financial portfolio on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and $\rho$ be a law invariant risk measure. The plug-in estimate for $\rho(X)$ is given by $\varrho(\widehat{P})$, where $\widehat{P}$ is the empirical distribution based on available observations and $\varrho$ is a risk functional defined by

$$
\varrho(P)=\rho(X),\;\;\text{if}\;X\;\text{has law}\;P; \tag{1.1}
$$

see e.g. [4, 5]. In the literature, Cont et al. [2] first study the quality of statistical estimators of law invariant risk measures using Hampel’s classical concept of qualitative robustness [6]: a risk functional estimator is said to be qualitatively robust if it is insensitive to variation of the sampling data. The research is important because perceived data (particularly empirical data) may contain some noise. Without such insensitivity, financial activities based on the risk measures may cause damage. For instance, when $\rho(X)$ is applied to allocate the risk capital for an insurance company, altering the capital allocation may be costly. Using Hampel’s theorem, Cont et al. [2] demonstrate that the qualitative robustness of a statistical estimator is equivalent to the weak continuity of the risk functional, and that value at risk (VaR) is qualitatively robust whereas conditional value at risk (CVaR) is not.
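The contrast between VaR and CVaR can be illustrated numerically. The following is a minimal sketch, not a proof of the result in [2]: it assumes the usual lower-quantile convention for VaR and the Rockafellar-Uryasev representation of CVaR (neither is spelled out at this point of the text), and the helper names `empirical_var` and `empirical_cvar` as well as the toy data are hypothetical. Replacing a single observation by an extreme value leaves the empirical VaR unchanged but moves the empirical CVaR by several orders of magnitude.

```python
import math


def empirical_var(sample, alpha):
    """Lower alpha-quantile of the empirical distribution (assumed VaR convention)."""
    data = sorted(sample)
    n = len(data)
    # Smallest k with k/n >= alpha; the tolerance guards against floating-point error.
    k = max(math.ceil(alpha * n - 1e-9), 1)
    return data[k - 1]


def empirical_cvar(sample, alpha):
    """CVaR via the Rockafellar-Uryasev formula evaluated at t = VaR_alpha."""
    t = empirical_var(sample, alpha)
    n = len(sample)
    excess = sum(max(x - t, 0.0) for x in sample)
    return t + excess / (n * (1.0 - alpha))


clean = [float(i) for i in range(1, 101)]   # losses 1, 2, ..., 100
noisy = clean[:-1] + [1.0e6]                # one observation replaced by an outlier

var_clean, var_noisy = empirical_var(clean, 0.9), empirical_var(noisy, 0.9)
cvar_clean, cvar_noisy = empirical_cvar(clean, 0.9), empirical_cvar(noisy, 0.9)
print(var_clean, var_noisy)     # the quantile ignores the size of the outlier
print(cvar_clean, cvar_noisy)   # the tail average reacts strongly to it
```

The outlier sits above the 90% quantile, so only the tail average sees its magnitude; this is the tail dependence that the refined robustness notions discussed next are designed to accommodate.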

Krätschmer et al. [7] argue that the use of Hampel’s classical concept of qualitative robustness may be problematic because it essentially requires the risk measure to be insensitive to the tail behaviour of the random variable, whereas the recent financial crisis shows that a faulty estimate of tail behaviour can lead to a drastic underestimation of the risk. Consequently, they propose a refined notion of qualitative robustness that applies also to tail-dependent statistical functionals and that allows us to compare statistical functionals with regard to their degree of robustness. The new concept captures the trade-off between robustness and sensitivity and can be quantified by an index of qualitative robustness. Furthermore, under the new concept, Krätschmer et al. [1] analyze the qualitative robustness of law-invariant convex risk measures on Orlicz spaces and show that CVaR and spectral risk measures are all qualitatively robust when the perturbation of the probability distribution is restricted to a finer topological space. Alternative generalizations of Hampel’s theorem can be found for strong mixing data (Zähle [8, 9]) and for stochastic processes in various ways (Boente et al. [10] and Strohriegl and Hable [11]). For a comprehensive study of statistical robustness, we refer readers to [12, 13, 14, 15] and references therein.

In this paper, we take a step further by deriving an error bound for the plug-in estimators of law invariant risk measures in terms of the variation of data and we call the analysis quantitative because no such error bound is established in the existing qualitative robust analysis. This is achieved by adopting different metrics to measure the discrepancy of the estimators and the variation of data. Specifically, we use the Fortet-Mourier metrics as opposed to the Lévy distance in Cont et al. [2] or the weighted Kolmogorov metric in Krätschmer et al. [7] to quantify the data variation (the perturbation of the true probability distributions). Moreover, we introduce a new notion of the so-called admissible laws, which effectively restrict the scope of data variation. The new metrics enable us to establish an explicit relationship between the discrepancy of the laws of the plug-in estimators (of law invariant risk measure based on the true data and perturbed data) and the discrepancy of the associated probability distributions of the data. The research is inspired by the recent work of Guo and Xu [16] where the authors derive quantitative statistical robustness for preference robust optimization models under Kantorovich metric. The main contributions of the paper can be summarized as follows.

First, we introduce the notion of admissible laws induced by a probability metric, which is a class of probability distributions whose discrepancy with the law of the Dirac measure at 0 is finite. The admissibility effectively restricts the scope of data perturbation. Using the notion, we compare the admissibility under $\phi$-topology and the Fortet-Mourier metric.

Second, we propose to use the Fortet-Mourier metric to quantify the variation of the probability measure. The metric enables us to establish an explicit relationship between the discrepancy of the laws of the plug-in estimators of law invariant risk measure based on the true data and perturbed data by noise and the change of the true underlying probability measures when the risk functional is Lipschitz continuous on a class of admissible laws. We find that the risk functionals associated with the general moment-type convex risk measures are Lipschitz continuous.

Third, we introduce the concept of Lipschitz continuity for a general statistical functional on a class of admissible laws induced by the Fortet-Mourier metric and find that, for a Lipschitz continuous risk measure, the parameter of the Fortet-Mourier metric allows us to compare tail-dependent risk measures with regard to their degree of robustness, i.e., the index of statistical robustness.

Fourth, we apply the new approach to examine the quantitative statistical robustness of a range of well-known risk measures, including CVaR, the optimized certainty equivalent and the shortfall risk measure, and conclude that under mild conditions they are all quantitatively robust. We also calculate their indexes of quantitative robustness.

The rest of the paper is organized as follows. In Section 2, we set up the background of the research problem. In Section 3, we introduce the concept of the Fortet-Mourier metric and admissible laws. In Section 4, we establish the quantitative statistical robustness theory and compare it with the qualitative statistical robustness theory. In Section 5, we apply our theory to risk measures and give some examples. Some technical details are given in the appendix.

2 Problem statement

In this section, we discuss the background of statistical robustness in the context of law invariant risk measures. We begin with a brief review of law invariant risk measures and their estimation, and then explain the issues that arise when the data may contain noise.

Let $(\Omega,\mathcal{F},\mathbb{P})$ be an atomless probability space, where $\Omega$ is a sample space with sigma algebra $\mathcal{F}$ and $\mathbb{P}$ is a probability measure. Let $X:(\Omega,\mathcal{F},\mathbb{P})\rightarrow\mathbb{R}$ be a financial loss and $F_{X}(x):=\mathbb{P}(X\leq x)$ be the law or the probability distribution of $X$. For $p\geq 1$, let $\mathscr{L}^{p}(\Omega,\mathcal{F},\mathbb{P})$ ($\mathscr{L}^{p}$ for short) denote the space of random variables mapping from $(\Omega,\mathcal{F},\mathbb{P})$ to $\mathbb{R}$ with finite $p$-th order moments. We say that a map $\rho:\mathscr{L}^{1}\to\overline{\mathbb{R}}:=\mathbb{R}\cup\{+\infty\}$ is a convex risk measure [18] if it satisfies the following properties (we note that the canonical model space for law invariant convex risk measures is $\mathscr{L}^{1}$ [17]):

  (i) Monotonicity: $\rho(X)\leq\rho(Y)$ for $X,Y\in\mathscr{L}^{1}$ with $X\leq Y$ $\mathbb{P}$-almost surely;

  (ii) Translation invariance: $\rho(X+c)=\rho(X)+c$ for $X\in\mathscr{L}^{1}$ and $c\in\mathbb{R}$;

  (iii) Convexity: $\rho(\lambda X+(1-\lambda)Y)\leq\lambda\rho(X)+(1-\lambda)\rho(Y)$ for $X,Y\in\mathscr{L}^{1}$ and $\lambda\in[0,1]$.

Moreover, if $\rho$ satisfies positive homogeneity, i.e., $\rho(\alpha X)=\alpha\rho(X)$ for any $\alpha\geq 0$, then $\rho$ is a coherent risk measure; see [19, 18] for the original definitions of these concepts. A risk measure $\rho$ is said to be law invariant if $\rho(X)=\rho(Y)$ for $X$ and $Y$ having the same law. We refer readers to Föllmer and Weber [20] for a recent overview of risk measures.

As discussed in [2, 7], it is a widely-accepted procedure to estimate the risk of a financial loss by means of a Monte Carlo method or from a set of available observations. Such a procedure is particularly sensible when $\rho$ is law invariant. The following proposition states that the law invariance of a risk measure $\rho$ is equivalent to the existence of a risk functional $\varrho$ in (1.1).

Proposition 2.1

Let $\mathscr{P}(\mathbb{R})$ denote the set of all probability measures on $\mathbb{R}$. If $\rho:\mathscr{L}^{1}\to\overline{\mathbb{R}}$ is a law invariant risk measure, then there exists a unique risk functional $\varrho:\mathscr{P}(\mathbb{R})\to\overline{\mathbb{R}}$ associated with $\rho$ such that for any $X\in\mathscr{L}^{1}$,

$$
\rho(X)=\varrho(\mathbb{P}\circ X^{-1}). \tag{2.1}
$$

The result is well-known, see for instance Delage et al. [21] for random variables defined in $\mathscr{L}^{\infty}$. The usefulness of the representation is that it naturally captures the law invariance and allows one to define any law invariant risk measure directly over the space of probability measures $\mathscr{P}(\mathbb{R})$ induced by random variables in $\mathscr{L}^{\infty}$ (also known as probability distributions), see Frittelli et al. [22]. Dentcheva and Ruszczyński [23] take it further to define a class of law invariant risk measures in the space of quantile functions directly. In a more recent development, Haskell et al. [24] extend the research to a broad class of multi-attribute choice functions defined over the space of survival functions. Let $P:=\mathbb{P}\circ X^{-1}$ be the push-forward probability measure on $\mathbb{R}$ induced by $X$. Since $\mathbb{P}(X\leq x)$ coincides with $P((-\infty,x])$ ($P(x)$ for short), we also call $P$ the distribution or the law of $X$ interchangeably throughout the paper. Consequently, we can write (2.1) as (1.1).

In this paper, we are not concerned with the definition of risk measures over the space of probability distributions or the space of quantile functions; rather, we concentrate on the stability of statistical estimators of law invariant risk measures. The risk functional $\varrho(P)$ with the law $P=\mathbb{P}\circ X^{-1}$ can be used in a natural way to construct an estimator for the risk $\rho(X)$ of $X\in\mathscr{L}^{1}$. All one needs to do is to take an estimate $P_{N}$ of $P$ based on the available observations of $X$ and then to plug this estimate into the risk functional $\varrho$ to obtain the desired estimator of $\rho(X)$, i.e.,

$$
\widehat{\varrho}_{N}(\xi^{1},\xi^{2},\ldots,\xi^{N}):=\varrho(P_{N}), \tag{2.2}
$$

where in this paper, $P_{N}$ can be seen as the empirical distribution of an independent and identically distributed (i.i.d., for short) sequence $\xi^{1},\xi^{2},\ldots,\xi^{N}$ of historical observations or Monte Carlo simulations, i.e.,

$$
P_{N}(x):=\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{\xi^{i}\leq x},\quad x\in\mathbb{R}. \tag{2.3}
$$

Here and later on, $\mathbf{1}_{A}$ denotes the indicator function of the event $A$. Indeed, $P_{N}$ can be a fairly general estimate: for instance, $P_{N}$ can be a smoothed empirical distribution based on uncensored data or an empirical distribution based on censored data, see, e.g., [3], or an empirical distribution based on identically distributed dependent data, see, e.g., [9].
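The two-step construction in (2.2) and (2.3) can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the empirical distribution is represented by its sorted atoms with equal weights, the names `empirical_cdf`, `plug_in_estimate`, `mean_functional` and `cvar_functional` are hypothetical, and CVaR is taken in its Rockafellar-Uryasev form, which is not defined in the text above.

```python
import bisect


def empirical_cdf(sample):
    """Return P_N as a function x -> (1/N) * #{i : xi_i <= x}, cf. (2.3)."""
    data = sorted(sample)
    n = len(data)

    def cdf(x):
        # bisect_right counts the observations that are <= x.
        return bisect.bisect_right(data, x) / n

    return cdf


def plug_in_estimate(functional, sample):
    """Plug the empirical distribution (its sorted atoms) into a risk functional, cf. (2.2)."""
    return functional(sorted(sample))


def mean_functional(atoms):
    """Expected loss under the empirical distribution."""
    return sum(atoms) / len(atoms)


def cvar_functional(alpha):
    """Assumed form: CVaR_alpha(P) = min_t { t + E_P[(X - t)^+] / (1 - alpha) }, 0 <= alpha < 1."""
    def functional(atoms):
        n = len(atoms)
        # The objective is convex and piecewise linear in t with kinks at the atoms,
        # so it suffices to search over the atoms themselves.
        return min(
            t + sum(max(x - t, 0.0) for x in atoms) / (n * (1.0 - alpha))
            for t in atoms
        )
    return functional


observations = [3.0, 1.0, 4.0, 2.0]
P_N = empirical_cdf(observations)
mean_estimate = plug_in_estimate(mean_functional, observations)
cvar_estimate = plug_in_estimate(cvar_functional(0.5), observations)
print(P_N(2.0), mean_estimate, cvar_estimate)
```

Passing the functional as an argument mirrors the separation between $\varrho$ and $P_{N}$: the same empirical distribution can be reused for any law invariant risk measure.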

We can see that $\widehat{\varrho}_{N}$ is a mapping from $\mathbb{R}^{N}$ to $\mathbb{R}$. Figure 1 illustrates the relationship between the risk functionals, their estimators and the spaces associated.

[Figure 1 diagram: nodes $\mathbb{R}^{N}$, $\mathscr{L}^{1}(\Omega)$, $\mathscr{P}(\mathbb{R})$ and $\overline{\mathbb{R}}$; arrows labelled Sampling, Estimating, True, $\varrho$, $\rho$ and $\widehat{\varrho}_{N}$.]
Figure 1: The diagram for risk functionals, their estimators and associated spaces

In practice, the samples obtained from empirical data may contain noise. In that case, we might regard the samples as generated by a perturbed random variable $Y$ with law $Q$, that is, $Q=\mathbb{P}\circ Y^{-1}$. Let $\tilde{\xi}^{1},\cdots,\tilde{\xi}^{N}$ be i.i.d. samples from $Y$. Then the practical empirical distribution function for estimating the law of $X$ is

$$
Q_{N}(x):=\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{\tilde{\xi}^{i}\leq x},\quad x\in\mathbb{R}, \tag{2.4}
$$

and the practical estimator is $\tilde{\varrho}_{N}=\widehat{\varrho}_{N}(\tilde{\xi}^{1},\cdots,\tilde{\xi}^{N}):=\varrho(Q_{N})$, based on the perceived empirical data, whereas $\widehat{\varrho}_{N}$ is the statistical estimator based on noise-free data. Since we are unable to obtain the latter, we tend to use the former as a statistical estimator of $\rho(X)$, and this works only if the two estimators are sufficiently close.

To quantify the closeness, we may look into the discrepancy between the laws of the two estimators under some metric $\mathsf{dl}$, i.e.,

$$
\mathsf{dl}(\mathrm{law}\{\varrho(P_{N})\},\mathrm{law}\{\varrho(Q_{N})\})=\mathsf{dl}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right), \tag{2.5}
$$

where $P^{\otimes N}$ and $Q^{\otimes N}$ denote the probability measures on the measurable space $\left(\mathbb{R}^{N},\mathcal{B}(\mathbb{R})^{\otimes N}\right)$ with marginals $P$ and $Q$ on each $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ respectively, and $\mathcal{B}(\mathbb{R})$ denotes the Borel sigma algebra of $\mathbb{R}$. Since neither $P$ nor $Q$ is known, we want the discrepancy to be uniformly small for all $P$ and $Q$ over a subset of admissible laws on $\mathscr{P}(\mathbb{R})$ so long as $Q$ is sufficiently close to $P$ under some metric $\mathsf{dl}^{\prime}$. The uniformity may be interpreted as robustness. Qualitative robustness refers to the case that the relationship between $\mathsf{dl}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right)$ and $\mathsf{dl}^{\prime}(P,Q)$ is implicit, whereas quantitative robustness refers to the case that the relationship is explicit, i.e., a function of the latter can be used to bound the former. Establishing such an explicit bound is what we aim to achieve in this paper, because qualitative robustness has been well investigated, for instance, in [2, 1, 7].
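The discrepancy in (2.5) can be approximated by simulation. The sketch below is only an illustration under assumptions that are not in the text: the risk functional is taken to be the mean, $P$ and $Q$ are normal laws differing by a shift of 0.1, the outer metric is the Kantorovich metric, and all function names, sample sizes and the seed are hypothetical. It draws many replications of the estimator under each law and compares the two resulting empirical laws.

```python
import random


def kantorovich_equal_size(a, b):
    """Kantorovich distance between two empirical laws with the same number of atoms.

    On the real line with equal sample sizes it is the mean absolute difference
    of the order statistics.
    """
    if len(a) != len(b):
        raise ValueError("both samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def estimator_law_sample(draw, functional, n_obs, n_rep):
    """Draw n_rep replications of the plug-in estimator based on n_obs observations."""
    return [functional([draw() for _ in range(n_obs)]) for _ in range(n_rep)]


def mean_functional(sample):
    return sum(sample) / len(sample)


rng = random.Random(0)                    # fixed seed for reproducibility
draw_p = lambda: rng.gauss(0.0, 1.0)      # assumed true law P
draw_q = lambda: rng.gauss(0.1, 1.0)      # assumed perturbed law Q (shift of 0.1)

law_p = estimator_law_sample(draw_p, mean_functional, n_obs=50, n_rep=2000)
law_q = estimator_law_sample(draw_q, mean_functional, n_obs=50, n_rep=2000)
discrepancy = kantorovich_equal_size(law_p, law_q)
print(discrepancy)
```

For this particular choice the two estimator laws differ by the same shift as $P$ and $Q$, so the simulated discrepancy should be of the order of the shift; for tail-dependent functionals the relationship is less direct, which is what an explicit bound is meant to control.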

3 $\zeta$-metrics and admissible laws

There are two essential elements in investigating both the qualitative and quantitative statistical robustness of a risk functional. One is the specific choice of probability metrics, and not just of the topologies generated by them (see, e.g., [12, 2, 7]), to quantify the change of the law $P$ and to estimate the discrepancy between the laws of two estimators, i.e., (2.5). The other is the determination of the subset $\mathscr{M}$ of admissible laws in $\mathscr{P}(\mathbb{R})$ (see, e.g., [7, 9]), containing all empirical distributions, $\mathscr{M}_{1,\mathrm{emp}}\subset\mathscr{M}$, to restrict the perturbation of the law $P$. For instance, the subset $\mathscr{M}$ may be specified via some generalized moment conditions, which are interesting in econometric or financial applications.

To introduce these two essential elements thoroughly, some preliminary notions and results in probability theory and statistics such as the $\phi$-weak topology are required. We first give a sketch of them to prepare our discussions in the follow-up sections. Let $\phi:\mathbb{R}\to[0,\infty)$ be a continuous function and $\mathcal{M}_{1}^{\phi}:=\left\{P^{\prime}\in\mathscr{P}(\mathbb{R}):\int_{\mathbb{R}}\phi(t)P^{\prime}(dt)<\infty\right\}$. In the particular case when $\phi(\cdot):=|\cdot|^{p}$ and $p$ is a positive number, write $\mathcal{M}_{1}^{p}$ for $\mathcal{M}_{1}^{|\cdot|^{p}}$. Note that $\mathcal{M}_{1}^{\phi}$ defines a subset of probability measures in $\mathscr{P}(\mathbb{R})$ which satisfies the generalized moment condition of $\phi$. From the definition, we can see that $\mathcal{M}_{1}^{p_{2}}\subset\mathcal{M}_{1}^{p_{1}}$ for any positive numbers $p_{1},p_{2}$ with $p_{1}<p_{2}$ by Hölder's inequality.

Definition 3.1 ($\phi$-weak topology)

Let $\phi:\mathbb{R}\to[0,\infty)$ be a gauge function, that is, $\phi$ is continuous and $\phi\geq 1$ holds outside a compact set. Define $\mathcal{C}_{1}^{\phi}$ as the linear space of all continuous functions $h:\mathbb{R}\to\mathbb{R}$ for which there exists a positive constant $c$ such that

$$
|h(t)|\leq c(\phi(t)+1),\quad\forall t\in\mathbb{R}.
$$

The $\phi$-weak topology, denoted by $\tau_{\phi}$, is the coarsest topology on $\mathcal{M}_{1}^{\phi}$ for which the mapping $g_{h}:\mathcal{M}_{1}^{\phi}\to\mathbb{R}$ defined by $g_{h}(P^{\prime}):=\int_{\mathbb{R}}h(t)P^{\prime}(dt)$ is continuous for every $h\in\mathcal{C}_{1}^{\phi}$. A sequence $\{P_{l}\}\subset\mathcal{M}_{1}^{\phi}$ is said to converge $\phi$-weakly to $P\in\mathcal{M}_{1}^{\phi}$, written $P_{l}\xrightarrow[]{\phi}P$, if it converges w.r.t. $\tau_{\phi}$.

Clearly, the $\phi$-weak topology is finer than the weak topology, and the two topologies coincide if and only if $\phi$ is bounded. It is well known (see [7, Lemma 3.4]) that $\phi$-weak convergence is equivalent to weak convergence, denoted by $P_{l}\xrightarrow[]{w}P$, together with $\int_{\mathbb{R}}\phi(t)P_{l}(dt)\to\int_{\mathbb{R}}\phi(t)P(dt)$. Moreover, it follows by [7, 1] that the $\phi$-weak topology on $\mathcal{M}_{1}^{\phi}$ is generated by the metric $\mathsf{dl}_{\phi}:\mathcal{M}_{1}^{\phi}\times\mathcal{M}_{1}^{\phi}\to\mathbb{R}$ defined by

$$
\mathsf{dl}_{\phi}(P,Q):=\mathsf{dl}_{\mathrm{Prok}}(P,Q)+\left|\int_{\mathbb{R}}\phi(t)P(dt)-\int_{\mathbb{R}}\phi(t)Q(dt)\right|, \tag{3.1}
$$

for $P,Q\in\mathcal{M}_{1}^{\phi}$, where $\mathsf{dl}_{\mathrm{Prok}}:\mathscr{P}(\mathbb{R})\times\mathscr{P}(\mathbb{R})\to\mathbb{R}_{+}$ is the Prokhorov metric defined by

$$
\mathsf{dl}_{\mathrm{Prok}}(P,Q):=\inf\{\epsilon>0:P(A)\leq Q(A^{\epsilon})+\epsilon,\;\forall A\in\mathcal{B}(\mathbb{R})\}, \tag{3.2}
$$

where $A^{\epsilon}:=A+B_{\epsilon}(0)$ denotes the Minkowski sum of $A$ and the open ball centred at $0$ on $\mathbb{R}$ and $\mathcal{B}(\mathbb{R})$ is the corresponding Borel sigma algebra on $\mathbb{R}$. We note that the Prokhorov metric metrizes the weak topology on $\mathbb{R}$, see, e.g., [25].

3.1 $\zeta$-metrics

Instead of exploiting the widely-used probability metrics such as the Prokhorov metric and the weighted Kolmogorov metric in the literature of qualitative robustness [2, 7], we will switch to the so-called metrics with $\zeta$-structure to establish the quantitative statistical robustness framework for a risk functional. In particular, we will use the well-known Kantorovich metric and Fortet-Mourier metrics. The new metrics enable us to establish an explicit relationship between the discrepancy of the laws of the plug-in estimators of law invariant risk measures based on the true data and perturbed data with noise and the discrepancy of the associated true probability measure. We begin with a formal definition of $\zeta$-metrics and then clarify the relationships between metrics of $\zeta$-structure and those used in [26, 2, 7].

Definition 3.2

Let $P,Q\in\mathscr{P}(\mathbb{R})$ and $\mathcal{F}$ be a class of measurable functions from $\mathbb{R}$ to $\mathbb{R}$. The metric with $\zeta$-structure is defined by

$$
\mathsf{dl}_{\mathcal{F}}(P,Q):=\sup_{\psi\in\mathcal{F}}\left|\int_{\mathbb{R}}\psi(\xi)P(d\xi)-\int_{\mathbb{R}}\psi(\xi)Q(d\xi)\right|. \tag{3.3}
$$

From the definition, we can see that $\mathsf{dl}_{\mathcal{F}}(P,Q)$ is the largest difference (more precisely, the supremum of the differences) between the expected values under $P$ and under $Q$ of the functions in the class $\mathcal{F}$. $\zeta$-metrics are widely used in the stability analysis of stochastic programming, see Römisch [27] for an excellent overview. The specific metrics with $\zeta$-structure that we consider in this paper are the Kantorovich metric and the Fortet-Mourier metric. The next definition gives a precise description of the two notions.

Definition 3.3 (Fortet-Mourier metric)

Let

$$
\mathcal{F}_{p}(\mathbb{R}):=\left\{\psi:\mathbb{R}\rightarrow\mathbb{R}:|\psi(\xi)-\psi(\tilde{\xi})|\leq c_{p}(\xi,\tilde{\xi})|\xi-\tilde{\xi}|,\;\forall\xi,\tilde{\xi}\in\mathbb{R}\right\}, \tag{3.4}
$$

where $c_{p}(\xi,\tilde{\xi}):=\max\{1,|\xi|,|\tilde{\xi}|\}^{p-1}$ for all $\xi,\tilde{\xi}\in\mathbb{R}$ and $p\geq 1$ describes the growth of the local Lipschitz constants. The $p$-th order Fortet-Mourier metric for $P,Q\in\mathscr{P}(\mathbb{R})$ is defined by

$$
\mathsf{dl}_{FM,p}(P,Q):=\sup_{\psi\in\mathcal{F}_{p}(\mathbb{R})}\left|\int_{\mathbb{R}}\psi(\xi)P(d\xi)-\int_{\mathbb{R}}\psi(\xi)Q(d\xi)\right|. \tag{3.5}
$$

In the case when $p=1$, it is known as the Kantorovich metric for $P,Q\in\mathscr{P}(\mathbb{R})$:

$$
\mathsf{dl}_{K}(P,Q):=\sup_{\psi\in\mathcal{F}_{1}(\mathbb{R})}\left|\int_{\mathbb{R}}\psi(\xi)P(d\xi)-\int_{\mathbb{R}}\psi(\xi)Q(d\xi)\right|. \tag{3.6}
$$

From the definition, we can see that for any positive numbers $p\geq p^{\prime}\geq 1$,

$$
\mathsf{dl}_{FM,p}(P,Q)\geq\mathsf{dl}_{FM,p^{\prime}}(P,Q)\geq\mathsf{dl}_{K}(P,Q), \tag{3.7}
$$

which means that $\mathsf{dl}_{FM,p}(P,Q)$ becomes tighter as $p$ increases and they are all tighter than $\mathsf{dl}_{K}(P,Q)$. Moreover, the Fortet-Mourier metric metricizes weak convergence on sets of probability measures possessing uniformly a $p$-th moment [28, p. 350]. Notice that the function $t\rightarrow\frac{1}{p}|t|^{p}$ for $t\in\mathbb{R}$ belongs to $\mathcal{F}_{p}(\mathbb{R})$. On $\mathbb{R}$, the Fortet-Mourier metric may be equivalently written as

$$
\mathsf{dl}_{FM,p}(P,Q)=\int_{\mathbb{R}}\max\{1,|x|^{p-1}\}|P(x)-Q(x)|dx,\;\text{for}\;P,Q\in\mathscr{P}(\mathbb{R}), \tag{3.8}
$$

see, e.g., [29, p. 93].
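For two empirical distributions, the representation (3.8) can be evaluated exactly, because both distribution functions are piecewise constant. The following is a minimal sketch that takes (3.8) as given; the function names `weight_antiderivative` and `fortet_mourier` and the toy atoms are hypothetical. It integrates the weight $\max\{1,|x|^{p-1}\}$ in closed form between consecutive atoms.

```python
import bisect


def weight_antiderivative(x, p):
    """Antiderivative W of w(x) = max{1, |x|^(p-1)}, normalised so that W(0) = 0."""
    a = abs(x)
    value = a if a <= 1.0 else 1.0 + (a ** p - 1.0) / p
    return value if x >= 0 else -value


def fortet_mourier(sample_p, sample_q, p=1.0):
    """p-th order Fortet-Mourier distance between two empirical distributions via (3.8)."""
    xs, ys = sorted(sample_p), sorted(sample_q)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical distribution functions are constant on [left, right).
        gap = abs(bisect.bisect_right(xs, left) / len(xs)
                  - bisect.bisect_right(ys, left) / len(ys))
        total += gap * (weight_antiderivative(right, p) - weight_antiderivative(left, p))
    return total


P_atoms = [0.0, 1.0]
Q_atoms = [1.0, 2.0]
d1 = fortet_mourier(P_atoms, Q_atoms, p=1.0)   # p = 1 gives the Kantorovich metric
d2 = fortet_mourier(P_atoms, Q_atoms, p=2.0)
d3 = fortet_mourier(P_atoms, Q_atoms, p=3.0)
print(d1, d2, d3)
```

On this toy input the values increase with $p$, in line with the ordering (3.7): a larger $p$ puts more weight on differences between the distribution functions far from the origin.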

In the next example, we illustrate the relationship between the existing probability metrics used in statistical robustness and the metrics with $\zeta$-structure.

Example 3.1

A number of well known probability metrics are used in the literature of statistical robustness.

(i) The Kantorovich (or Wasserstein) metric. Let $\mathcal{F}_{1}$ be the set of all Lipschitz continuous functions with Lipschitz modulus bounded by $1$. Then

$$
\mathsf{dl}_{K}(P,Q):=\int_{-\infty}^{+\infty}|P(x)-Q(x)|dx=\mathsf{dl}_{\mathcal{F}_{1}}(P,Q). \tag{3.9}
$$

Moreover, $\mathsf{dl}_{\mathrm{Prok}}(P,Q)^{2}\leq\mathsf{dl}_{K}(P,Q)$, see [25, Theorem 2].

(ii) The Lévy distance [29]. Let $\mathcal{F}$ be the set of functions bounded by $1$. Then

$$
\mathsf{dl}_{\mathrm{L\acute{e}vy}}(P,Q):=\inf\{\epsilon>0:Q(x-\epsilon)-\epsilon\leq P(x)\leq Q(x+\epsilon)+\epsilon,\;\forall\;x\in\mathbb{R}\}\leq\mathsf{dl}_{\mathcal{F}}(P,Q).
$$

Moreover, $\mathsf{dl}_{\mathrm{L\acute{e}vy}}(P,Q)\leq\mathsf{dl}_{\mathrm{Prok}}(P,Q)$ and $\mathsf{dl}_{\mathrm{L\acute{e}vy}}(P,Q)\leq\mathsf{dl}_{(\phi)}(P,Q)$ for any $\phi\geq 1$, see, e.g., [25].

(iii) The weighted Kolmogorov metric [7]. Let $\phi$ be a $u$-shaped function, i.e., a continuous function $\phi:\mathbb{R}\rightarrow[1,+\infty)$ that is non-increasing on $(-\infty,0)$ and non-decreasing on $(0,+\infty)$. Then the weighted Kolmogorov metric is defined as

$$
\mathsf{dl}_{(\phi)}(P,Q):=\sup_{x\in\mathbb{R}}|P(x)-Q(x)|\phi(x)\leq\mathsf{dl}_{\mathcal{F}}(P,Q),
$$

where $\mathcal{F}$ is the set of all functions bounded by $\phi$. Precisely, if $\mathcal{F}$ is the set of all indicator functions $\mathbf{1}_{B}$, where $B:=\{(-\infty,\xi],\xi\in\mathbb{R}\}$, then $\mathsf{dl}_{\mathcal{F}}(P,Q)=\mathsf{dl}_{(1)}(P,Q)$, which is known as the Kolmogorov metric. Similarly, by letting $\mathcal{F}$ be the set of all weighted indicator functions with weighting $\phi$, one can obtain $\mathsf{dl}_{\mathcal{F}}(P,Q)=\mathsf{dl}_{(\phi)}(P,Q)$.

(iv) The Prokhorov metric [7]. Let $\mathcal{F}$ be the set of all functions bounded by $1$. Then by [25],

$$
\mathsf{dl}_{\mathrm{Prok}}(P,Q):=\inf\{\epsilon>0:P(A)\leq Q(A^{\epsilon})+\epsilon,\;\forall A\in\mathcal{B}(\mathbb{R})\}\leq\frac{1}{2}\mathsf{dl}_{\mathcal{F}}(P,Q),
$$

where $A^{\epsilon}:=\{x\in\mathbb{R}:\inf_{y\in A}|x-y|\leq\epsilon\}$. Moreover, $\mathsf{dl}_{\mathrm{Prok}}(P,Q)^{2}\leq\mathsf{dl}_{K}(P,Q)$.

(v) Dudley’s (or bounded) Lipschitz metric [26]. Let $\mathcal{F}_{\mathrm{BL}}$ consist of all Lipschitz continuous $f$ such that $\|f\|_{\infty}+\mathrm{Lip}(f)\leq 1$, where $\|f\|_{\infty}$ denotes the usual sup-norm and $\mathrm{Lip}(f)$ is the Lipschitz constant of the Lipschitz function $f$. Then

$$
\mathsf{dl}_{\mathcal{F}}(P,Q)=\sup_{f\in\mathcal{F}_{\mathrm{BL}}}\left|\int f(x)P(dx)-\int f(x)Q(dx)\right|=:\mathsf{dl}_{\mathrm{Lip}}(P,Q).
$$

Moreover, $\frac{2}{3}\mathsf{dl}_{\mathrm{Prok}}(P,Q)^{2}\leq\mathsf{dl}_{\mathrm{Lip}}(P,Q)\leq 2\mathsf{dl}_{\mathrm{Prok}}(P,Q)$, see, e.g., [26, Section 3].

3.2 Admissible laws

We now turn to another important component of statistical robust analysis, namely the subset $\mathscr{M}$ of admissible laws in $\mathscr{P}(\mathbb{R})$, which describes the scope of the perturbation of the law $P$ measured by a metric. The restriction is motivated by the need to ensure the finiteness of $\mathsf{dl}(P,Q)$. To this effect, we formally introduce the concept of admissible laws induced by probability metrics.

Definition 3.4 (Admissible laws induced by probability metrics)

Let $\mathsf{dl}$ be a probability metric on $\mathscr{P}(\mathbb{R})$. The admissible laws induced by $\mathsf{dl}$ are defined as

$$
\mathscr{P}_{\mathsf{dl}}(\mathbb{R}):=\{P\in\mathscr{P}(\mathbb{R}):\mathsf{dl}(P,\delta_{0})<+\infty\}, \tag{3.10}
$$

where $\delta_{0}$ denotes the Dirac measure at $0$.

Let $\mathscr{P}_{p}(\mathbb{R})$ denote the admissible laws induced by the Fortet-Mourier metrics with parameter $p$ on $\mathscr{P}(\mathbb{R})$. By Definition 3.4, we have

$$
\begin{aligned}
\mathscr{P}_{p}(\mathbb{R}) &:= \{P\in\mathscr{P}(\mathbb{R}):\mathsf{dl}_{FM,p}(P,\delta_{0})<+\infty\}\\
&= \left\{P\in\mathscr{P}(\mathbb{R}):\sup_{\psi\in\mathcal{F}_{p}(\mathbb{R})}\left|\int_{\mathbb{R}}\psi(\xi)P(d\xi)-\int_{\mathbb{R}}\psi(\xi)\delta_{0}(d\xi)\right|<+\infty\right\}\\
&= \left\{P\in\mathscr{P}(\mathbb{R}):\sup_{\psi\in\mathcal{F}_{p}(\mathbb{R})}\left|\int_{\mathbb{R}}\psi(\xi)P(d\xi)-\psi(0)\right|<+\infty\right\}.
\end{aligned} \tag{3.11}
$$

By triangle inequality, this ensures 𝖽𝗅FM,p(P,Q)<+\mathsf{d\kern-0.70007ptl}_{FM,p}(P,Q)<+\infty for any P,Q𝒫p(IR)P,Q\in\mathscr{P}_{p}({\rm I\!R}).
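The triangle-inequality step can be written out explicitly. The short derivation below is a sketch that uses only Definition 3.4 and the symmetry of the metric:

```latex
\begin{aligned}
\mathsf{dl}_{FM,p}(P,Q)
  &\leq \mathsf{dl}_{FM,p}(P,\delta_{0}) + \mathsf{dl}_{FM,p}(\delta_{0},Q) \\
  &= \mathsf{dl}_{FM,p}(P,\delta_{0}) + \mathsf{dl}_{FM,p}(Q,\delta_{0}) < +\infty,
  \qquad \forall\, P,Q\in\mathscr{P}_{p}(\mathbb{R}).
\end{aligned}
```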

In the following example, we compare the admissible laws induced by different probability metrics.

Example 3.2 (Admissible laws induced by probability metrics)

We revisit the probability metrics of Example 3.1 and identify the admissible laws that they induce.

(i) The admissible laws induced by the Kantorovich (or Wasserstein) metric are defined as

$$\begin{aligned}
\mathscr{P}_{K}(\mathbb{R}) &:= \{P\in\mathscr{P}(\mathbb{R}):\mathsf{dl}_{K}(P,\delta_{0})<+\infty\} \\
&= \left\{P\in\mathscr{P}(\mathbb{R}):\int_{-\infty}^{0}P(x)dx+\int_{0}^{+\infty}(1-P(x))dx<+\infty\right\}\;(=\mathscr{P}_{1}(\mathbb{R})) \\
&= \left\{P\in\mathscr{P}(\mathbb{R}):\int_{\mathbb{R}}|x|dP(x)<+\infty\right\} \\
&= \mathcal{M}_{1}^{1},
\end{aligned}$$

where the second equality follows from the definition of the Kantorovich metric (see (3.9)). To see why the third equality holds, note that for any $t<0$,

$$\begin{aligned}
+\infty>\int_{-\infty}^{0}P(x)dx &= \int_{t}^{0}P(x)dx+\int_{2t}^{t}P(x)dx+\int_{-\infty}^{2t}P(x)dx \\
&\geq \int_{t}^{0}P(x)dx+\frac{1}{2}P(2t)|2t|.
\end{aligned}$$

Since $-2tP(2t)\geq 0$ and $\int_{t}^{0}P(x)dx\to\int_{-\infty}^{0}P(x)dx$ as $t\to-\infty$, letting $t\rightarrow-\infty$ yields $\lim_{t\rightarrow-\infty}tP(t)=0$. Similarly, we have $\lim_{t\rightarrow+\infty}t(1-P(t))=0$. By the integration-by-parts formula (more precisely, [30, Theorem 1.15]), we obtain the right-hand side of the third equality. The last equality follows from the definition of the $\phi$-topology in the case $\phi=|\cdot|$.

(ii) The admissible laws induced by the Lévy distance are defined as

$$\mathscr{P}_{\mathrm{L\acute{e}vy}}(\mathbb{R}):=\{P\in\mathscr{P}(\mathbb{R}):\mathsf{dl}_{\mathrm{L\acute{e}vy}}(P,\delta_{0})<+\infty\}=\mathscr{P}(\mathbb{R}).$$

Since $\mathsf{dl}_{\mathrm{L\acute{e}vy}}\leq 1$, the admissible laws coincide with $\mathscr{P}(\mathbb{R})$.

(iii) The admissible laws induced by the weighted Kolmogorov metric are defined as

$$\begin{aligned}
\mathscr{P}_{(\phi)}(\mathbb{R}) &:= \{P\in\mathscr{P}(\mathbb{R}):\mathsf{dl}_{(\phi)}(P,\delta_{0})<+\infty\} \\
&= \left\{P\in\mathscr{P}(\mathbb{R}):\sup_{x\leq 0}|P(x)\phi(x)|+\sup_{x>0}|(1-P(x))\phi(x)|<+\infty\right\},
\end{aligned}$$

which coincides with the set $\mathscr{M}_{1}^{(\phi)}$ defined in Krätschmer et al. [7, Subsection 3.2].

If $\phi$ is bounded on $\mathbb{R}$, then it is straightforward that $\mathscr{P}_{(\phi)}(\mathbb{R})=\mathscr{P}(\mathbb{R})$. When $\phi$ is unbounded on $\mathbb{R}$, we have

$$\mathcal{M}_{1}^{\phi}\subset\mathscr{P}_{(\phi)}(\mathbb{R})\subset\bigcap_{\epsilon>0}\mathcal{M}_{1}^{\phi^{1-\epsilon}}. \tag{3.12}$$

In what follows, we give a proof of (3.12). Let $P\in\mathcal{M}_{1}^{\phi}$. Since $\phi$ is a $u$-shaped function, for any $M>0$ and $N\leq 0$ we have

$$\begin{aligned}
+\infty>\int_{\mathbb{R}}\phi(x)dP(x) &= \int_{0}^{+\infty}\phi(x)dP(x)+\int_{-\infty}^{0}\phi(x)dP(x) \\
&\geq \int_{0}^{M}\phi(x)dP(x)+\phi(M)(1-P(M))+\int_{N}^{0}\phi(x)dP(x)+\phi(N)P(N) \\
&\geq \phi(M)(1-P(M))+\phi(N)P(N).
\end{aligned}$$

Taking the supremum over $M>0$ and $N\leq 0$ gives $\int_{\mathbb{R}}\phi\,dP\geq\sup_{x\leq 0}|P(x)\phi(x)|+\sup_{x>0}|(1-P(x))\phi(x)|$. Thus, $\mathcal{M}_{1}^{\phi}\subset\mathscr{P}_{(\phi)}(\mathbb{R})$.

On the other hand, for any $\epsilon>0$, if we let $\phi_{\epsilon}(x):=\phi(x)^{1-\epsilon}$ for $x\in\mathbb{R}$, then $\phi_{\epsilon}$ is a gauge function. Moreover, for any $P\in\mathscr{P}_{(\phi)}(\mathbb{R})$, the constant $k:=\sup_{x\leq 0}|P(x)\phi(x)|+\sup_{x>0}|(1-P(x))\phi(x)|$ is finite. To ease the exposition, we assume that $0<P(x)<1$ for any $x\in\mathbb{R}$. Then

$$\phi(x)\leq\frac{k}{P(x)}\;\text{for}\;x\leq 0\;\;\text{and}\;\;\phi(x)\leq\frac{k}{1-P(x)}\;\text{for}\;x>0.$$

Thus

$$\begin{aligned}
\int_{\mathbb{R}}\phi_{\epsilon}(x)dP(x) &= \int_{-\infty}^{0}\phi_{\epsilon}(x)dP(x)+\int_{0}^{+\infty}\phi_{\epsilon}(x)dP(x) \\
&\leq k^{1-\epsilon}\int_{-\infty}^{0}\frac{1}{P(x)^{1-\epsilon}}dP(x)+k^{1-\epsilon}\int_{0}^{+\infty}\frac{1}{(1-P(x))^{1-\epsilon}}dP(x) \\
&= k^{1-\epsilon}\left[\frac{1}{\epsilon}P(x)^{\epsilon}\right]_{-\infty}^{0}-k^{1-\epsilon}\left[\frac{1}{\epsilon}(1-P(x))^{\epsilon}\right]_{0}^{+\infty} \\
&= \frac{1}{\epsilon}k^{1-\epsilon}\left[P(0)^{\epsilon}+(1-P(0))^{\epsilon}\right]<+\infty,
\end{aligned}$$

which implies $P\in\mathcal{M}_{1}^{\phi_{\epsilon}}$. Summarizing the discussion above, we obtain (3.12).

We note that if $\phi$ is unbounded, then the inclusions in (3.12) may be strict: a counterexample showing that equality may fail is given in Example B.1 in the appendix.

(iv) The admissible laws induced by the Prokhorov metric are defined as

$$\mathscr{P}_{\mathrm{Prok}}(\mathbb{R}):=\{P\in\mathscr{P}(\mathbb{R}):\mathsf{dl}_{\mathrm{Prok}}(P,\delta_{0})<+\infty\}=\mathscr{P}(\mathbb{R}).$$

Since $\mathsf{dl}_{\mathrm{Prok}}\leq 1$, the admissible laws coincide with $\mathscr{P}(\mathbb{R})$.

(v) The admissible laws induced by Dudley's (or bounded Lipschitz) metric are defined as

$$\begin{aligned}
\mathscr{P}_{\mathrm{Lip}}(\mathbb{R}) &:= \{P\in\mathscr{P}(\mathbb{R}):\mathsf{dl}_{\mathrm{Lip}}(P,\delta_{0})<+\infty\} \\
&= \left\{P\in\mathscr{P}(\mathbb{R}):\sup_{f\in\mathcal{F}_{\mathrm{BL}}}\left|\int_{\mathbb{R}}f(x)P(dx)-f(0)\right|<+\infty\right\}=\mathscr{P}(\mathbb{R}).
\end{aligned}$$

Since $\mathsf{dl}_{\mathrm{Lip}}\leq 2$, the admissible laws coincide with $\mathscr{P}(\mathbb{R})$.
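Part (i) of this example can be checked numerically for an empirical law. The sketch below is illustrative only: the function names `kantorovich_to_dirac0` and `first_absolute_moment` and the sample values are assumptions, not part of the paper. It evaluates $\int_{-\infty}^{0}P(x)dx+\int_{0}^{+\infty}(1-P(x))dx$ for the empirical distribution function of a sample and compares it with the first absolute moment, which is the content of the third equality in part (i).

```python
import bisect


def kantorovich_to_dirac0(sample):
    """Integral of P_N over (-inf, 0] plus integral of 1 - P_N over [0, +inf),
    where P_N is the empirical distribution function of `sample`."""
    n = len(sample)
    sorted_sample = sorted(sample)
    # Adding 0 to the grid guarantees that no cell straddles the origin.
    grid = sorted(set(sorted_sample + [0.0]))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # P_N is constant on [left, right).
        cdf = bisect.bisect_right(sorted_sample, left) / n
        weight = cdf if right <= 0.0 else 1.0 - cdf
        total += weight * (right - left)
    return total


def first_absolute_moment(sample):
    """Integral of |x| with respect to the empirical law of `sample`."""
    return sum(abs(x) for x in sample) / len(sample)


if __name__ == "__main__":
    sample = [-2.0, 1.0, 3.0]  # hypothetical data
    print(kantorovich_to_dirac0(sample), first_absolute_moment(sample))
```

Both quantities agree up to floating-point error for any finite sample, since an empirical law always has a finite first moment.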

3.3 Relationship with the $\phi$-weak topology

The $\phi$-weak topology has been widely used for qualitative robustness analysis in the literature, whereas we use the topology induced by the Fortet-Mourier metrics for quantitative robustness analysis. It is therefore helpful to look into potential connections between these two apparently different notions. In the next proposition, we examine the connection from the perspective of the admissible set (which defines the space of probability measures within which $P$ is perturbed in both qualitative and quantitative robustness analysis). We find that $\mathscr{P}_{p}(\mathbb{R})$ coincides with $\mathcal{M}_{1}^{\phi}$ for a specific choice of $\phi$, and subsequently show that the Fortet-Mourier metric is tighter than $\mathsf{dl}_{\phi}$.

Proposition 3.1

Let $p\geq 1$ be fixed and

$$\phi_{p}(t):=\left\{\begin{array}{ll}|t|,&\text{for}\;|t|\leq 1,\\ |t|^{p},&\text{otherwise}.\end{array}\right.$$

The following assertions hold.

  • (i)

    $\mathscr{P}_{p}(\mathbb{R})=\mathcal{M}_{1}^{\phi_{p}}\;(=\mathcal{M}_{1}^{p})$.

  • (ii)

    $\mathsf{dl}_{\phi_{p}}(P,Q)\leq\sqrt{\mathsf{dl}_{FM,p}(P,Q)}+p\,\mathsf{dl}_{FM,p}(P,Q),\;\forall P,Q\in\mathscr{P}_{p}(\mathbb{R})$.

  • (iii)

    $\mathsf{dl}_{FM,p}$ metrizes the $\phi_{p}$-weak topology on $\mathscr{P}_{p}(\mathbb{R})$.

Part (i) of the proposition says that the admissible set $\mathscr{P}_{p}(\mathbb{R})$ coincides with the set of laws on $\mathbb{R}$ satisfying the generalized moment condition of $\phi_{p}$. Part (ii) indicates that $\mathsf{dl}_{FM,p}$ is tighter than $\mathsf{dl}_{\phi_{p}}$. Part (iii) means that the $\phi_{p}$-weak topology on $\mathscr{P}_{p}(\mathbb{R})$ is generated by the metric $\mathsf{dl}_{FM,p}$.

Proof. Part (i). Since $\frac{1}{p}\phi_{p}\in\mathcal{F}_{p}(\mathbb{R})$ for any $p\geq 1$, the definition of $\mathscr{P}_{p}(\mathbb{R})$ shows that $P\in\mathscr{P}_{p}(\mathbb{R})$ implies $P\in\mathcal{M}_{1}^{\phi_{p}}$, and hence $\mathscr{P}_{p}(\mathbb{R})\subset\mathcal{M}_{1}^{\phi_{p}}$.

On the other hand, let $P\in\mathcal{M}_{1}^{\phi_{p}}$, so that $\int_{\mathbb{R}}\phi_{p}(\xi)P(d\xi)<\infty$. For any $\psi\in\mathcal{F}_{p}(\mathbb{R})$, we have

$$|\psi(\xi)-\psi(0)|\leq\max\{1,|\xi|^{p-1}\}|\xi|=\max\{|\xi|,|\xi|^{p}\},\;\text{for all}\;\xi\in\mathbb{R},$$

and consequently,

$$\begin{aligned}
\left|\int_{\mathbb{R}}\psi(\xi)P(d\xi)-\psi(0)\right| &= \left|\int_{\mathbb{R}}(\psi(\xi)-\psi(0))P(d\xi)\right|\leq\int_{\mathbb{R}}|\psi(\xi)-\psi(0)|P(d\xi) \\
&\leq \int_{\mathbb{R}}\max\{|\xi|,|\xi|^{p}\}P(d\xi)=\int_{\mathbb{R}}\phi_{p}(\xi)P(d\xi).
\end{aligned}$$

Therefore, we have

$$\sup_{\psi\in\mathcal{F}_{p}(\mathbb{R})}\left|\int_{\mathbb{R}}\psi(\xi)P(d\xi)-\psi(0)\right|\leq\int_{\mathbb{R}}\phi_{p}(\xi)P(d\xi)<\infty,$$

and consequently, $\mathcal{M}_{1}^{\phi_{p}}\subset\mathscr{P}_{p}(\mathbb{R})$.

Part (ii). Since $\frac{1}{p}\phi_{p}\in\mathcal{F}_{p}(\mathbb{R})$, for any $P,Q\in\mathscr{P}_{p}(\mathbb{R})$ we have

$$\left|\int_{\mathbb{R}}\phi_{p}(\xi)P(d\xi)-\int_{\mathbb{R}}\phi_{p}(\xi)Q(d\xi)\right|=p\left|\int_{\mathbb{R}}\frac{1}{p}\phi_{p}(\xi)P(d\xi)-\int_{\mathbb{R}}\frac{1}{p}\phi_{p}(\xi)Q(d\xi)\right|\leq p\,\mathsf{dl}_{FM,p}(P,Q).$$

From Example 3.1(i) and (3.7), we have $\mathsf{dl}_{\mathrm{Prok}}(P,Q)\leq\sqrt{\mathsf{dl}_{FM,p}(P,Q)}$. Finally, by the definition of $\mathsf{dl}_{\phi_{p}}$, i.e., (3.1), we obtain the conclusion.

Part (iii) follows straightforwardly from Part (ii).  
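The membership $\frac{1}{p}\phi_{p}\in\mathcal{F}_{p}(\mathbb{R})$ used in Parts (i) and (ii) can be verified directly. The sketch below assumes that $\mathcal{F}_{p}(\mathbb{R})$ in (3.4) consists of the functions $\psi$ with $|\psi(x)-\psi(y)|\leq c_{p}(x,y)|x-y|$ and $c_{p}(x,y)=\max\{1,|x|,|y|\}^{p-1}$, in line with the bound used in Part (i) and with Lemma 4.1. It relies on $\phi_{p}$ being absolutely continuous, with derivative defined for $|t|\notin\{0,1\}$:

```latex
\left|\tfrac{1}{p}\phi_{p}'(t)\right|
  = \begin{cases}
      \tfrac{1}{p}, & 0<|t|<1,\\[2pt]
      |t|^{p-1}, & |t|>1,
    \end{cases}
  \;\leq\; \max\{1,|t|^{p-1}\},
\qquad\text{hence}\qquad
\left|\tfrac{1}{p}\phi_{p}(x)-\tfrac{1}{p}\phi_{p}(y)\right|
  \leq \left|\int_{y}^{x}\max\{1,|t|^{p-1}\}\,dt\right|
  \leq \max\{1,|x|,|y|\}^{p-1}\,|x-y|.
```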

Proposition 3.1 indicates that although the Fortet-Mourier metric $\mathsf{dl}_{FM,p}$ and $\mathsf{dl}_{\phi_{p}}$ are different metrics, they generate the same topology. This confirms the statement at the beginning of this section: for qualitative and quantitative robustness, what matters is the specific choice of probability metric rather than the topology it generates. To conclude this section, we remark that the subset $\mathscr{M}$ used in the definition of qualitative robustness will be confined to the set of admissible laws when we adopt the Fortet-Mourier metric for quantitative robustness analysis in the next section.

4 Statistical robustness

We now return to the robustness of the statistical estimators of law invariant risk measures outlined in Section 2.

4.1 Qualitative statistical robustness

To position our research properly, we begin with a brief overview of the existing results on qualitative statistical robustness.

Definition 4.1 (Qualitative $\mathscr{P}_{0}$-Robustness [2, 1])

Let $\mathscr{P}_{0}$ be a subset of $\mathscr{P}(\mathbb{R})$ and $P\in\mathscr{P}_{0}$. The sequence $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ of estimators is said to be qualitatively $\mathscr{P}_{0}$-robust at $P$ w.r.t. $(\mathsf{dl},\mathsf{dl}')$ if for every $\epsilon>0$ there exist $\delta>0$ and $N_{0}\in\mathbb{N}$ such that for all $Q\in\mathscr{P}_{0}$ and $N\geq N_{0}$

$$\mathsf{dl}(P,Q)\leq\delta\implies\mathsf{dl}'(P^{\mathbb{N}}\circ\widehat{\varrho}_{N}^{-1},Q^{\mathbb{N}}\circ\widehat{\varrho}_{N}^{-1})\leq\epsilon. \tag{4.1}$$

If, in addition, $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ arises as in (2.2) from a risk functional $\varrho$, then $\varrho$ is called qualitatively $\mathscr{P}_{0}$-robust at $P$ w.r.t. $(\mathsf{dl},\mathsf{dl}')$.

The definition above captures two versions of qualitative statistical robustness: the one proposed by Cont et al. [2] for i.i.d. observations on $\mathbb{R}$, with $\mathsf{dl}$ and $\mathsf{dl}'$ both being the Lévy distance, and the one proposed by Krätschmer et al. [7] for i.i.d. observations on $\mathbb{R}$, with $\mathsf{dl}=\mathsf{dl}_{(\phi)}$ and $\mathsf{dl}'=\mathsf{dl}_{\mathrm{Prok}}$. Since $\mathsf{dl}_{\mathrm{Prok}}$ is tighter than $\mathsf{dl}_{\mathrm{L\acute{e}vy}}$, Krätschmer et al. [7] examine the discrepancy of the laws under a tighter metric. On the other hand, from the definition of $\mathsf{dl}_{(\phi)}$, we can see that it is also tighter than $\mathsf{dl}_{\mathrm{L\acute{e}vy}}$ and captures the difference between distributions in the tails. Consequently, the robustness analysis in Krätschmer et al. [7] is restricted to a smaller class of probability distributions $Q$ perturbed from $P$. This explains why CVaR is robust under the latter criterion but not the former.

A key result established by Krätschmer et al. [7] is Hampel's theorem, which states the equivalence between qualitative statistical robustness and the stability/continuity of a risk functional (with respect to perturbation of the probability distribution) under the uniform Glivenko-Cantelli (UGC) property of empirical distributions over a specified set.

Definition 4.2 ($\mathscr{C}$-Continuity [7])

Let $P\in\mathscr{P}(\mathbb{R})$ and $\mathscr{C}$ be a subset of $\mathscr{P}(\mathbb{R})$. Then $\varrho$ is called $\mathscr{C}$-continuous at $P$ w.r.t. $(\mathsf{dl},|\cdot|)$ if for every $\epsilon>0$, there exists $\delta>0$ such that for all $Q\in\mathscr{C}$

$$\mathsf{dl}(P,Q)\leq\delta\implies|\varrho(P)-\varrho(Q)|\leq\epsilon.$$

Definition 4.3 (UGC Property [7])

Let $\mathscr{C}$ be a subset of $\mathscr{P}(\mathbb{R})$. Then we say that the metric space $(\mathscr{C},\mathsf{dl})$ has the UGC property if for every $\epsilon>0$ and $\delta>0$, there exists $N_{0}\in\mathbb{N}$ such that for all $P\in\mathscr{C}$ and $N\geq N_{0}$

$$P^{\otimes N}\left[\left\{(\xi^{1},\ldots,\xi^{N})\in\mathbb{R}^{N}:\mathsf{dl}(P,P_{N})\geq\delta\right\}\right]\leq\epsilon.$$

The UGC property means that the empirical probability measure converges in probability to the true marginal distribution uniformly over $\mathscr{C}$. Examples of metric spaces $(\mathscr{C},\mathsf{dl})$ having the UGC property can be found in [7, Section 3]. In particular, it is shown that there exists a subset of the admissible laws induced by the weighted Kolmogorov metric that enjoys the UGC property; see [7, Theorem 3.1].

Theorem 4.1 (Hampel’s Theorem [7])

Let $\mathscr{P}_{0}$ be a subset of $\mathscr{P}(\mathbb{R})$ and $P\in\mathscr{P}_{0}$. Assume that $(\mathscr{P}_{0},\mathsf{dl})$ has the UGC property and $\mathscr{M}_{1,\mathrm{emp}}\subset\mathscr{P}_{0}$. Then if the mapping $\varrho$ is $\mathscr{M}_{1,\mathrm{emp}}$-continuous at $P$ w.r.t. $(\mathsf{dl},|\cdot|)$, the sequence $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ is qualitatively $\mathscr{P}_{0}$-robust at $P$ w.r.t. $(\mathsf{dl},\mathsf{dl}_{\mathrm{Prok}})$.

4.2 Quantitative statistical robustness

We now move on to our central topic, quantitative statistical robustness for the plug-in estimators of law invariant risk measures. Intuitively speaking, quantitative statistical robustness of a risk functional $\varrho$ means that for any two admissible laws $P$ and $Q$ in $\mathscr{P}(\mathbb{R})$, the distance between the laws of their plug-in estimators $\varrho(P_{N})$ and $\varrho(Q_{N})$ is bounded by the distance between $P$ and $Q$ when the sample size is sufficiently large.

Definition 4.4 (Quantitative statistical robustness)

Let $\mathsf{dl},\mathsf{dl}'$ be probability metrics on $\mathscr{P}(\mathbb{R})$ and let $\mathscr{M}\subset\mathscr{P}_{\mathsf{dl}'}(\mathbb{R})$ denote a subset of admissible laws on $\mathbb{R}$. A sequence of statistical estimators $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ is said to be quantitatively statistically robust on $\mathscr{M}$ w.r.t. $(\mathsf{dl},\mathsf{dl}')$ if there exists a non-decreasing real-valued continuous function $h:\mathbb{R}_{+}\to\mathbb{R}_{+}$ with $h(0)=0$ such that for all $P,Q\in\mathscr{M}$ and $N\in\mathbb{N}$

$$\mathsf{dl}(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1})\leq h(\mathsf{dl}'(P,Q))<+\infty. \tag{4.2}$$

If, in addition, $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ arises as in (2.2) from a risk functional $\varrho$, then $\varrho$ is called quantitatively statistically robust on $\mathscr{M}$ at $P$ w.r.t. $(\mathsf{dl},\mathsf{dl}')$. In the particular case when $\mathsf{dl}=\mathsf{dl}_{K}$, $h(t)=Lt$ and $\mathsf{dl}'=\mathsf{dl}_{FM,p}$, inequality (4.2) reduces to

$$\mathsf{dl}_{K}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right)\leq L\,\mathsf{dl}_{FM,p}(P,Q)<+\infty. \tag{4.3}$$

In comparison with the qualitative statistical robustness introduced by Krätschmer et al. [1] or Cont et al. [2], the definition (4.3) has several advantages. First, we use the Kantorovich metric instead of the Prokhorov metric to quantify the discrepancy between $P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}$ and $Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}$. This enables us to capture the tail behaviour of the two laws and to derive an explicit bound for the difference. Second, we use the Fortet-Mourier metric to quantify the perturbation of $P$, which is more sensitive to the variation of the tails than the Lévy metric used in [2] and the weighted Kolmogorov metric used in [1, 8, 9, 13]. Third, inequality (4.3) gives an error bound for the discrepancy of the two laws, and the bound is valid for all $Q$ in $\mathscr{M}$ rather than only for those in a neighborhood of $P$.

Next, we introduce a definition of Lipschitz continuity for a general statistical mapping from $\mathscr{P}(\mathbb{R})$ to $\mathbb{R}$, which strengthens the earlier definition of $\mathscr{C}$-continuity for a general statistical functional.

Definition 4.5 (Lipschitz continuity)

Let $\varrho:\mathscr{P}(\mathbb{R})\to\mathbb{R}$ be a general statistical functional and $\mathscr{M}$ be a subset of $\mathscr{P}(\mathbb{R})$. $\varrho$ is said to be Lipschitz continuous on $\mathscr{M}$ w.r.t. $\mathsf{dl}$ if there exists a positive constant $L$ such that

$$|\varrho(P)-\varrho(Q)|\leq L\,\mathsf{dl}(P,Q)<+\infty,\quad\forall\;P,Q\in\mathscr{M}. \tag{4.4}$$

A few points about the above definition of Lipschitz continuity are worth noting:

  • 1.

    The Lipschitz continuity is global rather than local over $\mathscr{M}$. The condition is strong, but we will find that many risk functionals are indeed globally Lipschitz continuous on some $\mathscr{M}$.

  • 2.

    The magnitude of the continuity depends on the metric $\mathsf{dl}$ which measures the distance between $P$ and $Q$. In the specific case when $\mathsf{dl}=\mathsf{dl}_{FM,p}$, (4.4) reduces to

    $$|\varrho(P)-\varrho(Q)|\leq L\int_{\mathbb{R}}|P(x)-Q(x)|c_{p}(x)dx<+\infty,\;\forall\;P,Q\in\mathscr{M}, \tag{4.5}$$

    where $c_{p}(x)=\max\{1,|x|^{p-1}\}$. The exponent $p$ plays an important role in (4.5) because it interacts with the tails of $P(\cdot)$ and $Q(\cdot)$. Moreover, if $\mathscr{M}\subset\mathscr{P}_{p}(\mathbb{R})$, then the right-hand side of (4.5) is finite. We will come back to this later.

  • 3.

    Let $P_{N}$ and $Q_{N}$ be empirical distributions on $\mathbb{R}$. By plugging $P_{N}$ and $Q_{N}$ into (4.5), we obtain

    $$\begin{aligned}
    |\varrho(P_{N})-\varrho(Q_{N})| &\leq L\int_{\mathbb{R}}|P_{N}(x)-Q_{N}(x)|c_{p}(x)dx \\
    &= L\sum_{k=1}^{2N}\left|\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{\xi^{i}\leq x_{k}}-\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{\widehat{\xi}^{i}\leq x_{k}}\right|\int_{x_{k}}^{x_{k+1}}c_{p}(x)dx \\
    &= L\sum_{k=1}^{N}\frac{1}{N}\left|\int_{\xi^{i_{k}}}^{\widehat{\xi}^{j_{k}}}c_{p}(x)dx\right| \\
    &\leq L\sum_{k=1}^{N}\frac{1}{N}|\xi^{i_{k}}-\widehat{\xi}^{j_{k}}|\max\{c_{p}(\xi^{i_{k}}),c_{p}(\widehat{\xi}^{j_{k}})\} \\
    &\leq \frac{L}{N}\sum_{k=1}^{N}c_{p}(\xi^{k},\widehat{\xi}^{k})|\xi^{k}-\widehat{\xi}^{k}|,\quad\forall\xi^{k},\widehat{\xi}^{k}\in\mathbb{R},
    \end{aligned} \tag{4.6}$$

    where $x_{k}$ is the $k$-th smallest number among $\{\xi^{1},\ldots,\xi^{N};\widehat{\xi}^{1},\ldots,\widehat{\xi}^{N}\}$ for $k=1,\ldots,2N$, $x_{2N+1}=x_{2N}$, $\xi^{i_{k}}$ and $\widehat{\xi}^{j_{k}}$ denote the $k$-th smallest elements of $\{\xi^{1},\ldots,\xi^{N}\}$ and $\{\widehat{\xi}^{1},\ldots,\widehat{\xi}^{N}\}$ respectively, and $c_{p}(\xi,\widehat{\xi})=\max\{1,|\xi|,|\widehat{\xi}|\}^{p-1}$ for all $\xi,\widehat{\xi}\in\mathbb{R}$. The second equality is due to Fubini's theorem in the discrete case, and the last inequality follows from Lemma A.1 applied to the non-decreasing sequences $\{\xi^{i_{k}}\max\{c_{p}(\xi^{i_{k}}),c_{p}(\widehat{\xi}^{j_{k}})\}\}_{k=1}^{N}$ and $\{\widehat{\xi}^{j_{k}}\max\{c_{p}(\xi^{i_{k}}),c_{p}(\widehat{\xi}^{j_{k}})\}\}_{k=1}^{N}$.

  • 4.

    In the case when $\varrho$ is continuous on $\mathscr{M}$, the Lipschitz continuity (4.5) is equivalent to the Lipschitz continuity (4.6) on the set of all empirical distributions $\mathscr{M}_{1,\mathrm{emp}}$ (see the first inequality of (4.6)) because $\mathscr{M}_{1,\mathrm{emp}}$ is dense in $\mathscr{P}(\mathbb{R})$.
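The chain of (in)equalities in (4.6) can be checked numerically. The following sketch is illustrative only; the function names `antiderivative_c_p`, `fm_empirical` and `pairwise_bound` are hypothetical, and the samples are assumed to have equal size $N$. It evaluates $\int_{\mathbb{R}}|P_{N}(x)-Q_{N}(x)|c_{p}(x)dx$ exactly, using a closed-form antiderivative of $c_{p}$, and compares it with the pairwise bound $\frac{1}{N}\sum_{k}c_{p}(\xi^{k},\widehat{\xi}^{k})|\xi^{k}-\widehat{\xi}^{k}|$ on the last line of (4.6).

```python
import bisect
import random


def antiderivative_c_p(x, p):
    """Odd antiderivative of c_p(x) = max{1, |x|^(p-1)}, vanishing at 0."""
    a = abs(x)
    value = a if a <= 1.0 else 1.0 + (a ** p - 1.0) / p
    return value if x >= 0 else -value


def fm_empirical(xs, ys, p):
    """Integral of |P_N(x) - Q_N(x)| c_p(x) dx for two samples of equal size."""
    n = len(xs)
    sx, sy = sorted(xs), sorted(ys)
    grid = sorted(sx + sy)  # the points x_1 <= ... <= x_2N of (4.6)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical distribution functions are constant on [left, right).
        gap = abs(bisect.bisect_right(sx, left) - bisect.bisect_right(sy, left)) / n
        total += gap * (antiderivative_c_p(right, p) - antiderivative_c_p(left, p))
    return total


def pairwise_bound(xs, ys, p):
    """(1/N) sum_k c_p(xi^k, xihat^k) |xi^k - xihat^k|, c_p(a, b) = max{1,|a|,|b|}^(p-1)."""
    n = len(xs)
    return sum(max(1.0, abs(a), abs(b)) ** (p - 1) * abs(a - b)
               for a, b in zip(xs, ys)) / n


if __name__ == "__main__":
    random.seed(0)
    xs = [random.gauss(0.0, 2.0) for _ in range(50)]   # hypothetical clean data
    ys = [x + random.gauss(0.0, 0.5) for x in xs]      # hypothetical perturbed data
    for p in (1, 2, 3):
        print(p, fm_empirical(xs, ys, p), pairwise_bound(xs, ys, p))
```

For every $p\geq 1$ the first printed quantity should not exceed the second, which is the content of (4.6) with $L=1$.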

Example 4.1 ($p$-th moment functional)

For $p\geq 1$, we consider the $p$-th moment functional $T^{(p)}$ on $\mathcal{M}_{1}^{p}=\mathscr{P}_{p}(\mathbb{R})$ defined by

$$T^{(p)}(P):=\int_{-\infty}^{+\infty}x^{p}dP(x)<+\infty,\quad\forall P\in\mathcal{M}_{1}^{p}.$$

Analogous to Example B.1, we have

$$T^{(p)}(P)=-\int_{-\infty}^{0}P(x)px^{p-1}dx+\int_{0}^{+\infty}(1-P(x))px^{p-1}dx. \tag{4.7}$$

Thus, for any $P,Q\in\mathcal{M}_{1}^{p}$,

$$\begin{aligned}
|T^{(p)}(P)-T^{(p)}(Q)| &= \left|\int_{-\infty}^{+\infty}(P(x)-Q(x))px^{p-1}dx\right|\leq p\int_{-\infty}^{+\infty}|P(x)-Q(x)||x|^{p-1}dx \\
&\leq p\int_{-\infty}^{+\infty}|P(x)-Q(x)|c_{p}(x)dx<+\infty,
\end{aligned} \tag{4.8}$$

where $c_{p}(x)=\max\{1,|x|^{p-1}\}$. From (4.5), we can see that the $p$-th moment functional $T^{(p)}$ is Lipschitz continuous w.r.t. $\mathsf{dl}_{FM,p}$ on $\mathcal{M}_{1}^{p}$.
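Inequality (4.8) can be illustrated on empirical laws. The sketch below is an assumption-laden illustration rather than part of the paper: the helper names are hypothetical, the samples are assumed to have equal size, and $p$ is taken to be a positive integer so that $x^{p}$ is well defined for negative $x$. It compares $|T^{(p)}(P_{N})-T^{(p)}(Q_{N})|$ with $p\int_{\mathbb{R}}|P_{N}(x)-Q_{N}(x)|c_{p}(x)dx$.

```python
import bisect


def antiderivative_c_p(x, p):
    """Odd antiderivative of c_p(x) = max{1, |x|^(p-1)}, vanishing at 0."""
    a = abs(x)
    value = a if a <= 1.0 else 1.0 + (a ** p - 1.0) / p
    return value if x >= 0 else -value


def fm_empirical(xs, ys, p):
    """Integral of |P_N(x) - Q_N(x)| c_p(x) dx for two samples of equal size."""
    n = len(xs)
    sx, sy = sorted(xs), sorted(ys)
    grid = sorted(sx + sy)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical distribution functions are constant on [left, right).
        gap = abs(bisect.bisect_right(sx, left) - bisect.bisect_right(sy, left)) / n
        total += gap * (antiderivative_c_p(right, p) - antiderivative_c_p(left, p))
    return total


def moment(sample, p):
    """p-th moment T^(p) of the empirical law of `sample` (integer p)."""
    return sum(x ** p for x in sample) / len(sample)


def moment_gap_and_bound(xs, ys, p):
    """Left-hand side and right-hand side of (4.8) for empirical laws."""
    gap = abs(moment(xs, p) - moment(ys, p))
    return gap, p * fm_empirical(xs, ys, p)


if __name__ == "__main__":
    xs = [-3.0, 0.5, 1.0]   # hypothetical data
    ys = [-1.0, 2.0, 2.5]   # hypothetical perturbed data
    print(moment_gap_and_bound(xs, ys, 3))
```

The first returned value should never exceed the second, so the Lipschitz constant of $T^{(p)}$ w.r.t. $\mathsf{dl}_{FM,p}$ can be taken as $L=p$.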

Lemma 4.1

Let $\boldsymbol{\xi}:=(\xi^{1},\cdots,\xi^{N})\in\mathbb{R}^{N}$ and let $\Psi$ be the set of functions from $\mathbb{R}^{N}$ to $\mathbb{R}$ given by

$$\Psi:=\left\{\psi:\mathbb{R}^{N}\to\mathbb{R}:|\psi(\tilde{\boldsymbol{\xi}})-\psi(\widehat{\boldsymbol{\xi}})|\leq\frac{1}{N}\sum_{k=1}^{N}c_{p}(\tilde{\xi}^{k},\widehat{\xi}^{k})|\tilde{\xi}^{k}-\widehat{\xi}^{k}|,\;\forall\tilde{\boldsymbol{\xi}},\widehat{\boldsymbol{\xi}}\in\mathbb{R}^{N}\right\}, \tag{4.9}$$

where $c_{p}(\xi,\tilde{\xi}):=\max\{1,|\xi|,|\tilde{\xi}|\}^{p-1}$ for all $\xi,\tilde{\xi}\in\mathbb{R}$ and $p\geq 1$. Then

$$\mathsf{dl}_{\Psi}(P^{\otimes N},Q^{\otimes N})\leq\mathsf{dl}_{FM,p}(P,Q)<+\infty,\quad\forall P,Q\in\mathscr{P}_{p}(\mathbb{R}), \tag{4.10}$$

where $\mathsf{dl}_{\Psi}$ is defined by (3.3).

Before presenting a proof, it may be helpful to explain why we consider this specific set of functions $\Psi$. For fixed $N\in\mathbb{N}$, let $\mathscr{M}_{1,\mathrm{emp}}^{N}$ denote the set of all empirical laws $P_{N}$ over $\mathbb{R}$, so that $\mathscr{M}_{1,\mathrm{emp}}=\bigcup_{N\in\mathbb{N}}\mathscr{M}_{1,\mathrm{emp}}^{N}$. Then $\Psi$ may be regarded as a set of functions derived from a class of Lipschitz continuous functionals on $\mathscr{M}_{1,\mathrm{emp}}^{N}$ with $L=1$ and $\mathsf{dl}=\mathsf{dl}_{FM,p}$ (by writing $T(P_{N})$ as a function of the samples). Lemma 4.1 says that for any $N\in\mathbb{N}$, the discrepancy between $P^{\otimes N}$ and $Q^{\otimes N}$ under the metric $\mathsf{dl}_{\Psi}$ can be bounded by $\mathsf{dl}_{FM,p}(P,Q)$.

Proof. Let $\xi^{-j}:=\{\xi^{1},\cdots,\xi^{j-1},\xi^{j+1},\cdots,\xi^{N}\}$, $\vec{\xi}^{j}:=\{\xi^{1},\cdots,\xi^{j}\}$ and $\vec{\xi}^{-j}:=\{\xi^{j+1},\cdots,\xi^{N}\}$. For any $P_{1},\cdots,P_{N}\in\mathscr{P}(\mathbb{R})$ and any $j\in\{1,\cdots,N\}$, denote

$$P_{-j}(d\xi^{-j}):=P_{1}(d\xi^{1})\cdots P_{j-1}(d\xi^{j-1})P_{j+1}(d\xi^{j+1})\cdots P_{N}(d\xi^{N})$$

and

$$h_{\xi^{-j}}(\xi^{j}):=\int_{\mathbb{R}^{N-1}}\psi(\xi^{-j},\xi^{j})P_{-j}(d\xi^{-j}).$$

Then

$$\begin{aligned}
|h_{\xi^{-j}}(\tilde{\xi}^{j})-h_{\xi^{-j}}(\widehat{\xi}^{j})| &\leq \int_{\mathbb{R}^{N-1}}\left|\psi(\xi^{-j},\tilde{\xi}^{j})-\psi(\xi^{-j},\widehat{\xi}^{j})\right|P_{-j}(d\xi^{-j}) \\
&\leq \int_{\mathbb{R}^{N-1}}\frac{1}{N}c_{p}(\tilde{\xi}^{j},\widehat{\xi}^{j})|\tilde{\xi}^{j}-\widehat{\xi}^{j}|P_{-j}(d\xi^{-j}) \\
&= \frac{1}{N}c_{p}(\tilde{\xi}^{j},\widehat{\xi}^{j})|\tilde{\xi}^{j}-\widehat{\xi}^{j}|.
\end{aligned}$$

Let $\mathcal{H}$ denote the set of functions $h_{\xi^{-j}}(\xi^{j})$ generated by $\psi\in\Psi$. By the definition of $\mathsf{dl}_{\Psi}$ and of the $p$-th order Fortet-Mourier metric,

$$\begin{aligned}
\mathsf{dl}_{\Psi}(P_{-j}\times\tilde{P}_{j},P_{-j}\times\widehat{P}_{j}) &= \sup_{\psi\in\Psi}\left|\int_{\mathbb{R}}\int_{\mathbb{R}^{N-1}}\psi(\xi^{-j},\xi^{j})P_{-j}(d\xi^{-j})\tilde{P}_{j}(d\xi^{j})\right. \\
&\qquad\qquad-\left.\int_{\mathbb{R}}\int_{\mathbb{R}^{N-1}}\psi(\xi^{-j},\xi^{j})P_{-j}(d\xi^{-j})\widehat{P}_{j}(d\xi^{j})\right| \\
&= \sup_{h_{\xi^{-j}}\in\mathcal{H}}\left|\int_{\mathbb{R}}h_{\xi^{-j}}(\xi^{j})\tilde{P}_{j}(d\xi^{j})-\int_{\mathbb{R}}h_{\xi^{-j}}(\xi^{j})\widehat{P}_{j}(d\xi^{j})\right| \\
&\leq \frac{1}{N}\mathsf{dl}_{FM,p}(\tilde{P}_{j},\widehat{P}_{j}),
\end{aligned} \tag{4.11}$$

where the inequality is due to $Nh_{\xi^{-j}}(\xi^{j})\in\mathcal{F}_{p}(\mathbb{R})$ and the definition of $\mathsf{dl}_{FM,p}(P,Q)$. Finally, by the triangle inequality for the pseudo-metric, we have

$$\begin{aligned}
\mathsf{dl}_{\Psi}\left(P^{\otimes N},Q^{\otimes N}\right) &\leq \mathsf{dl}_{\Psi}\left(P^{\otimes N},P^{\otimes(N-1)}\times Q\right)+\mathsf{dl}_{\Psi}\left(P^{\otimes(N-1)}\times Q,P^{\otimes(N-2)}\times Q^{\otimes 2}\right) \\
&\quad+\cdots+\mathsf{dl}_{\Psi}\left(P\times Q^{\otimes(N-1)},Q^{\otimes N}\right) \\
&\leq \frac{1}{N}\mathsf{dl}_{FM,p}(P,Q)\times N \\
&= \mathsf{dl}_{FM,p}(P,Q).
\end{aligned}$$

The proof is complete.  

With this intermediate technical result, we are now ready to present our main result on quantitative statistical robustness for the plug-in estimator of a general risk functional.

Theorem 4.2

Let $\varrho:\mathscr{P}(\mathbb{R})\to\mathbb{R}$ be a general statistical functional and $\mathscr{M}$ be a subset of $\mathscr{P}_{p}(\mathbb{R})$ with $p\geq 1$. Assume that, for fixed $N\in\mathbb{N}$, there exists a positive constant $L$ such that

$$|\varrho(P_{N})-\varrho(Q_{N})|\leq\frac{L}{N}\sum_{k=1}^{N}c_{p}(\xi^{k},\widehat{\xi}^{k})|\xi^{k}-\widehat{\xi}^{k}|,\;\forall\xi^{k},\widehat{\xi}^{k}\in\mathbb{R}, \tag{4.12}$$

where $P_{N}$ and $Q_{N}$ are given by (2.3) and (2.4) respectively. Then $\widehat{\varrho}_{N}$ is quantitatively robust on $\mathscr{M}$ w.r.t. $(\mathsf{dl}_{K},\mathsf{dl}_{FM,p})$, i.e.,

$$\mathsf{dl}_{K}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right)\leq L\,\mathsf{dl}_{FM,p}(P,Q)<+\infty,\;\forall\;P,Q\in\mathscr{M}. \tag{4.13}$$

If (4.12) holds for all $N\in\mathbb{N}$, then the whole sequence of plug-in estimators $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ is quantitatively robust on $\mathscr{M}$, i.e., (4.13) holds for all $N\in\mathbb{N}$. Moreover, in the case when $p=1$, (4.13) reduces to

$$\mathsf{dl}_{K}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right)\leq L\,\mathsf{dl}_{K}(P,Q)<+\infty,\;\forall\;P,Q\in\mathscr{M}. \tag{4.14}$$

Proof. Since the underlying probability space is atomless, for any $N\in\mathbb{N}$ we have, by definition,

$$\begin{aligned}
&\mathsf{dl}_{K}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right) \\
&= \sup_{\psi\in\mathcal{F}_{1}(\mathbb{R})}\left|\int_{\mathbb{R}}\psi(t)P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}(dt)-\int_{\mathbb{R}}\psi(t)Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}(dt)\right| \\
&= \sup_{\psi\in\mathcal{F}_{1}(\mathbb{R})}\left|\int_{\mathbb{R}^{N}}\psi(\varrho(\vec{\xi}^{N}))P^{\otimes N}(d\vec{\xi}^{N})-\int_{\mathbb{R}^{N}}\psi(\varrho(\vec{\xi}^{N}))Q^{\otimes N}(d\vec{\xi}^{N})\right|,
\end{aligned} \tag{4.15}$$

where we write $\vec{\xi}^{N}$ for $(\xi^{1},\cdots,\xi^{N})$ and $\varrho(\vec{\xi}^{N})$ for $\widehat{\varrho}_{N}$ to indicate its dependence on $\xi^{1},\cdots,\xi^{N}$.

For any $\psi\in\mathcal{F}_{1}(\mathbb{R})$, (4.12) ensures that

$$|\psi(\varrho(\tilde{\vec{\xi}}))-\psi(\varrho(\widehat{\vec{\xi}}))|\leq|\varrho(\tilde{\vec{\xi}})-\varrho(\widehat{\vec{\xi}})|\leq\frac{L}{N}\sum_{k=1}^{N}c_{p}(\tilde{\xi}^{k},\widehat{\xi}^{k})|\tilde{\xi}^{k}-\widehat{\xi}^{k}|,\;\forall\tilde{\vec{\xi}},\widehat{\vec{\xi}}\in\mathbb{R}^{N},$$

which means that $\psi(\varrho(\cdot))$ is locally Lipschitz continuous in $\vec{\xi}^{N}$, i.e., $\psi(\varrho(\cdot))\in\mathcal{F}_{p}(\mathbb{R}^{N})$ from (3.4). Since $P,Q\in\mathscr{M}\subset\mathscr{P}_{p}(\mathbb{R})\subset\mathscr{P}_{K}(\mathbb{R})$ (see Example 3.2(i) and Proposition 3.1(i)), (4.15) is finite. The rest follows from Lemma 4.1 applied to the functions $\frac{1}{L}\psi(\varrho(\cdot))$, which belong to $\Psi$ by the inequality above.

 

From Example 3.1, we have $\mathsf{dl}_{\mathrm{Prok}}(P,Q)\leq\sqrt{\mathsf{dl}_{K}(P,Q)}$ for all $P,Q\in\mathscr{P}({\rm I\!R})$, which yields the following corollary.

Corollary 4.1

Let $\varrho:\mathscr{P}({\rm I\!R})\to{\rm I\!R}$ be a general statistical functional. Assume that $\varrho$ is Lipschitz continuous w.r.t. $\mathsf{dl}_{FM,p}$ ($p\geq 1$) on $\mathscr{M}\subset\mathscr{P}_{p}({\rm I\!R})$ with constant $L$. Then the plug-in estimator sequence $\{\widehat{\varrho}_{N}\}_{N\in\mathbb{N}}$ is quantitatively robust on $\mathscr{M}$ w.r.t. $(\mathsf{dl}_{\mathrm{Prok}},\mathsf{dl}_{FM,p})$, i.e.,

\[
\mathsf{dl}_{\mathrm{Prok}}\left(P^{\otimes N}\circ\widehat{\varrho}_{N}^{-1},Q^{\otimes N}\circ\widehat{\varrho}_{N}^{-1}\right)\leq\sqrt{L\,\mathsf{dl}_{FM,p}(P,Q)}<+\infty,\quad\forall\,P,Q\in\mathscr{M}
\]

for all $N\in\mathbb{N}$.

Next, we take a step further to consider the index of quantitative robustness for a general statistical functional.

Definition 4.6 (Index of quantitative robustness)

Let $\varrho:\mathscr{P}({\rm I\!R})\rightarrow{\rm I\!R}$ be a general statistical functional. If, for some $p\geq 1$, $\varrho$ is Lipschitz continuous w.r.t. $\mathsf{dl}_{FM,p}$ on $\mathscr{P}_{p}({\rm I\!R})$ with some constant $L$, then we define the index of quantitative robustness of $\varrho$ as

\[
\mathrm{iqr}(\varrho):=\left(\inf\{p\in[1,+\infty):\varrho\ \mbox{is Lipschitz continuous w.r.t.}\ \mathsf{dl}_{FM,p}\ \mbox{on}\ \mathscr{P}_{p}({\rm I\!R})\}\right)^{-1}. \tag{4.16}
\]

This index is a quantitative measure of the degree of robustness of a statistical functional: a larger index reflects a higher degree of robustness. For a general statistical functional $\varrho$, (4.5) may hold for uncountably many $p$; for example, the second moment functional $T^{(2)}$ satisfies (4.5) for any $p\geq 2$ on $\mathscr{P}_{p}({\rm I\!R})=\mathcal{M}_{1}^{p}$. From Definition 4.6, we conclude that the $p$-th moment functional $T^{(p)}$ has the index $\mathrm{iqr}(T^{(p)})=\frac{1}{p}$. Definition 4.6 coincides with the index of qualitative robustness proposed by Krätschmer et al. [7] when $\varrho$ is Lipschitz continuous w.r.t. $\mathsf{dl}_{FM,p}$ on $\mathscr{P}_{p}({\rm I\!R})$. The main advantage of Definition 4.6 is that the index is easy to calculate, as we illustrate in the next section.
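The role of the weight $c_{p}$ for the second moment functional can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `c_p`, `second_moment` and `weighted_bound`, the Gaussian sample and the constant $L=2$ (which follows from $|x^{2}-y^{2}|\leq 2\max\{|x|,|y|\}|x-y|$) are not taken from the paper. It checks a bound of the form (4.12) with $p=2$ and shows that the same bound with $p=1$ fails for any fixed constant once the data move far from the origin.

```python
import random

def c_p(x, y, p):
    """Growth weight c_p(x, y) = max{1, |x|, |y|}^(p-1)."""
    return max(1.0, abs(x), abs(y)) ** (p - 1)

def second_moment(sample):
    """Plug-in estimate of the second moment on an empirical measure."""
    return sum(x * x for x in sample) / len(sample)

def weighted_bound(xs, ys, p, L):
    """Right-hand side (L/N) * sum_k c_p(x_k, y_k) * |x_k - y_k|."""
    return L / len(xs) * sum(c_p(x, y, p) * abs(x - y) for x, y in zip(xs, ys))

rng = random.Random(0)
xs = [rng.gauss(0.0, 3.0) for _ in range(200)]      # "true" data (illustrative)
ys = [x + rng.uniform(-0.5, 0.5) for x in xs]       # perturbed data

gap = abs(second_moment(xs) - second_moment(ys))
print("p = 2 bound holds:", gap <= weighted_bound(xs, ys, p=2, L=2.0))

# With p = 1 the ratio gap / bound grows with the location of the data,
# so no fixed constant L can work.
for scale in (1e1, 1e2, 1e3):
    a, b = [scale], [scale + 1.0]
    ratio = abs(second_moment(a) - second_moment(b)) / weighted_bound(a, b, p=1, L=1.0)
    print(scale, ratio)
```

The growing ratio in the second loop is the numerical counterpart of the statement that the second moment functional needs $p\geq 2$.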

5 Application to risk measures

As discussed in Proposition 2.1, a law invariant risk measure of a random variable can be represented as the composition of a risk functional and the law of the random variable. In practice, the risk of a random variable is often calculated with empirical data, either because the true probability distribution is unknown or because it is prohibitively expensive to calculate the risk with the true probability distribution. This raises the question of whether the estimated risk measure based on empirical data is reliable. In this section, we apply the quantitative robustness results established in Theorem 4.2 to some well-known risk measures. The next proposition synthesizes Proposition 2.1 and Theorem 4.2.

Proposition 5.1

Let $\rho(X)$ be a tail-dependent law invariant convex risk measure with representation (2.1), and let $P_{N}$ and $Q_{N}$ be empirical probability measures defined as in (2.4). Assume that there exist a number $p\geq 1$ and a positive constant $L$ such that

\[
|\varrho(P_{N})-\varrho(Q_{N})|\leq\frac{L}{N}\sum_{k=1}^{N}c_{p}(\xi^{k},\widehat{\xi}^{k})|\xi^{k}-\widehat{\xi}^{k}|,\quad\forall\,\vec{\xi},\widehat{\vec{\xi}}\in{\rm I\!R}^{N}. \tag{5.1}
\]

Then for any $N\in\mathbb{N}$ and any $P,Q\in\mathcal{M}_{1}^{p}$,

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ\varrho(P_{N})^{-1},Q^{\otimes N}\circ\varrho(Q_{N})^{-1}\right)\leq L\,\mathsf{dl}_{FM,p}(P,Q)<+\infty. \tag{5.2}
\]

In what follows, we verify condition (5.1) for some well-known risk measures and hence show that they satisfy the proposed quantitative statistical robustness (5.2). To simplify the notation, we define the law invariant risk measures directly on the space of probability distributions.

Example 5.1

The expectation of $G\in\mathscr{P}({\rm I\!R})$, given by $\mathbb{E}(G):=\int_{{\rm I\!R}}\xi\,dG(\xi)$, satisfies

\[
|\mathbb{E}(P_{N})-\mathbb{E}(Q_{N})|=\left|\int_{{\rm I\!R}}\xi\,d(P_{N}-Q_{N})(\xi)\right|\leq\frac{1}{N}\sum_{i=1}^{N}|\xi^{i}-\widehat{\xi}^{i}|.
\]

Let $T_{N}:=\mathbb{E}(\widehat{G}_{N})$, where $\widehat{G}_{N}$ is the empirical distribution of $G$. Then for any $N\in\mathbb{N}$ and any $P,Q\in\mathcal{M}_{1}^{1}$,

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\mathsf{dl}_{K}(P,Q)<+\infty, \tag{5.3}
\]

and the index of quantitative robustness is $\mathrm{iqr}(\mathbb{E})=1$.

Example 5.2

Consider the conditional value-at-risk of a probability distribution $G\in\mathscr{P}({\rm I\!R})$ at level $\tau\in(0,1)$, which is defined by

\[
\mathrm{CVaR}_{\tau}(G):=\inf_{r\in{\rm I\!R}}\left\{r+\frac{1}{1-\tau}\int_{{\rm I\!R}}\max\{0,\xi-r\}\,dG(\xi)\right\}.
\]

Then

\[
\begin{aligned}
|\mathrm{CVaR}_{\tau}(P_{N})-\mathrm{CVaR}_{\tau}(Q_{N})|
&\leq\frac{1}{1-\tau}\sup_{r\in{\rm I\!R}}\left|\int_{{\rm I\!R}}\max\{0,\xi-r\}\,d(P_{N}-Q_{N})(\xi)\right|\\
&=\frac{1}{1-\tau}\sup_{r\in{\rm I\!R}}\frac{1}{N}\left|\sum_{i=1}^{N}\left(\max\{0,\xi^{i}-r\}-\max\{0,\widehat{\xi}^{i}-r\}\right)\right|\\
&\leq\frac{1}{1-\tau}\times\frac{1}{N}\sum_{i=1}^{N}|\xi^{i}-\widehat{\xi}^{i}|,
\end{aligned}
\]

where the last inequality is due to the fact that $|\max\{0,x\}-\max\{0,y\}|\leq|x-y|$ holds for all $x,y\in{\rm I\!R}$.

Let $T_{N}:=\mathrm{CVaR}_{\tau}(\widehat{G}_{N})$, where $\widehat{G}_{N}$ is the empirical distribution of $G$. Then for any $N\in\mathbb{N}$ and any $P,Q\in\mathcal{M}_{1}^{1}$,

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\frac{1}{1-\tau}\mathsf{dl}_{K}(P,Q)<+\infty, \tag{5.4}
\]

and the index of quantitative robustness is $\mathrm{iqr}(\mathrm{CVaR}_{\tau})=1$ for $\tau\in(0,1)$.
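The sample-level Lipschitz bound behind (5.4) can be illustrated with a short script. This is a minimal sketch under assumptions: the function name `cvar`, the Gaussian data, the perturbation size and the level $\tau=0.9$ are illustrative choices, not part of the paper. The empirical CVaR is computed from the minimization formula above; since the objective is convex and piecewise linear in $r$ with kinks at the sample points, it suffices to search over the sample.

```python
import random

def cvar(sample, tau):
    """Empirical CVaR_tau via inf_r { r + mean(max(0, xi - r)) / (1 - tau) }.
    The objective is convex piecewise linear in r with kinks at the sample
    points, so the infimum is attained at one of them."""
    n = len(sample)

    def objective(r):
        return r + sum(max(0.0, x - r) for x in sample) / ((1.0 - tau) * n)

    return min(objective(r) for r in sample)

rng = random.Random(1)
tau = 0.9
xs = [rng.gauss(0.0, 1.0) for _ in range(300)]     # "true" data (illustrative)
ys = [x + rng.uniform(-0.2, 0.2) for x in xs]      # perturbed data

lhs = abs(cvar(xs, tau) - cvar(ys, tau))
rhs = sum(abs(x - y) for x, y in zip(xs, ys)) / ((1.0 - tau) * len(xs))
print(lhs, rhs, lhs <= rhs)
```

The search over sample points costs $O(N^{2})$ operations; a sorted-sample implementation would be faster but is not needed for the illustration.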

Example 5.3

The upper semi-deviation $sd_{+}(G)$ of a measure $G\in\mathscr{P}({\rm I\!R})$, which is defined by

\[
sd_{+}(G):=\int_{{\rm I\!R}}\max\left\{0,\xi-\int_{{\rm I\!R}}u\,dG(u)\right\}dG(\xi),
\]

satisfies

\[
\begin{aligned}
|sd_{+}(P_{N})-sd_{+}(Q_{N})|
&=\left|\frac{1}{N}\sum_{j=1}^{N}\max\left\{0,\xi^{j}-\frac{1}{N}\sum_{i=1}^{N}\xi^{i}\right\}-\frac{1}{N}\sum_{j=1}^{N}\max\left\{0,\widehat{\xi}^{j}-\frac{1}{N}\sum_{i=1}^{N}\widehat{\xi}^{i}\right\}\right|\\
&\leq\frac{1}{N}\sum_{j=1}^{N}\left|\max\left\{0,\xi^{j}-\frac{1}{N}\sum_{i=1}^{N}\xi^{i}\right\}-\max\left\{0,\widehat{\xi}^{j}-\frac{1}{N}\sum_{i=1}^{N}\widehat{\xi}^{i}\right\}\right|\\
&\leq\frac{1}{N}\sum_{j=1}^{N}\left|\left(\xi^{j}-\frac{1}{N}\sum_{i=1}^{N}\xi^{i}\right)-\left(\widehat{\xi}^{j}-\frac{1}{N}\sum_{i=1}^{N}\widehat{\xi}^{i}\right)\right|\\
&\leq\frac{1}{N}\sum_{j=1}^{N}\left(\left|\xi^{j}-\widehat{\xi}^{j}\right|+\frac{1}{N}\sum_{i=1}^{N}|\xi^{i}-\widehat{\xi}^{i}|\right)\\
&=\frac{2}{N}\sum_{i=1}^{N}|\xi^{i}-\widehat{\xi}^{i}|.
\end{aligned}
\]

Let $T_{N}:=sd_{+}(\widehat{G}_{N})$, where $\widehat{G}_{N}$ is the empirical distribution of $G$. Then for any $N\in\mathbb{N}$ and any $P,Q\in\mathcal{M}_{1}^{1}$,

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq 2\,\mathsf{dl}_{K}(P,Q)<+\infty, \tag{5.5}
\]

and the index of quantitative robustness is $\mathrm{iqr}(sd_{+})=1$.
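The constant $2$ in the sample-level bound for the upper semi-deviation can be checked in the same spirit. The following is a sketch only; the function name `upper_semideviation`, the exponential data and the Gaussian noise are illustrative assumptions.

```python
import random

def upper_semideviation(sample):
    """Plug-in estimate of sd_+: mean of max(0, xi - sample mean)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum(max(0.0, x - mean) for x in sample) / n

rng = random.Random(2)
xs = [rng.expovariate(1.0) for _ in range(500)]    # "true" data (illustrative)
ys = [x + rng.gauss(0.0, 0.1) for x in xs]         # noisy data

lhs = abs(upper_semideviation(xs) - upper_semideviation(ys))
rhs = 2.0 * sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)
print(lhs, rhs, lhs <= rhs)
```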

Example 5.4

The Optimized Certainty Equivalent (OCE) [31] of $G\in\mathscr{P}({\rm I\!R})$ is given by

\[
S_{u}(G):=\sup_{\eta\in{\rm I\!R}}\left\{\eta+\int_{{\rm I\!R}}u(\xi-\eta)\,dG(\xi)\right\},
\]

where $u:{\rm I\!R}\rightarrow[-\infty,\infty)$ is a proper concave and non-decreasing utility function satisfying the normalization property $u(0)=0$ and $1\in\partial u(0)$, where $\partial u(\cdot)$ denotes the subdifferential map of $u$. Following the essence of [31, Proposition 2.1], we have

\[
S_{u}(P_{N})=\sup_{\eta\in{\rm I\!R}}\left\{\eta+\frac{1}{N}\sum_{i=1}^{N}u(\xi^{i}-\eta)\right\}=\sup_{\eta\in[\xi_{\min},\xi_{\max}]}\left\{\eta+\frac{1}{N}\sum_{i=1}^{N}u(\xi^{i}-\eta)\right\},
\]

where $\xi_{\min}=\min\{\xi^{1},\ldots,\xi^{N};\widehat{\xi}^{1},\ldots,\widehat{\xi}^{N}\}$ and $\xi_{\max}=\max\{\xi^{1},\ldots,\xi^{N};\widehat{\xi}^{1},\ldots,\widehat{\xi}^{N}\}$. Let $\rho(G):=-S_{u}(G)$. Then $\rho(\cdot)$ is a convex risk measure [31] and

\[
\begin{aligned}
|\rho(P_{N})-\rho(Q_{N})|
&\leq\sup_{\eta\in[\xi_{\min},\xi_{\max}]}\left|\left(\eta+\int_{{\rm I\!R}}u(\xi-\eta)\,dP_{N}(\xi)\right)-\left(\eta+\int_{{\rm I\!R}}u(\xi-\eta)\,dQ_{N}(\xi)\right)\right|\\
&=\sup_{\eta\in[\xi_{\min},\xi_{\max}]}\left|\frac{1}{N}\sum_{i=1}^{N}u(\xi^{i}-\eta)-\frac{1}{N}\sum_{i=1}^{N}u(\widehat{\xi}^{i}-\eta)\right|\\
&\leq\sup_{\eta\in[\xi_{\min},\xi_{\max}]}\frac{1}{N}\sum_{i=1}^{N}\left|u(\xi^{i}-\eta)-u(\widehat{\xi}^{i}-\eta)\right|\\
&\leq\frac{1}{N}\sum_{i=1}^{N}u^{\prime}_{-}(\xi_{\min})|\xi^{i}-\widehat{\xi}^{i}|,
\end{aligned}
\]

where $u^{\prime}_{-}(t)$ denotes the left derivative of $u$ at $t$. The last inequality holds because $u$ is non-decreasing and concave, so that $u^{\prime}_{-}(t)$ is non-increasing.

Let $T_{N}:=-S_{u}(\widehat{G}_{N})$, where $\widehat{G}_{N}$ is the empirical distribution of $G$. We consider two interesting cases.

The first is that $\sup_{\eta\in{\rm I\!R}}u^{\prime}_{-}(\eta)<+\infty$, in which case

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\sup_{\eta\in{\rm I\!R}}u^{\prime}_{-}(\eta)\,\mathsf{dl}_{K}(P,Q)<+\infty, \tag{5.6}
\]

for any $N\in\mathbb{N}$ and any $P,Q\in\mathcal{M}_{1}^{1}$, and the index of quantitative robustness in this case is $1$.

The second is that there exist a number $p>1$ and a positive constant $L$ such that $u^{\prime}_{-}(\xi_{\min})\leq Lc_{p}(\xi^{i},\widehat{\xi}^{i})$, where $c_{p}(\xi^{i},\widehat{\xi}^{i})=\max\{1,|\xi^{i}|,|\widehat{\xi}^{i}|\}^{p-1}$. In that case, we have

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq L\,\mathsf{dl}_{FM,p}(P,Q)<+\infty, \tag{5.7}
\]

and the index of quantitative robustness in this case is $\frac{1}{p}$.

To see how (5.6) and (5.7) can be satisfied, we consider two specific utility functions, a piecewise linear utility function and a quadratic utility function, both of which are taken from [31].

(a) Piecewise linear utility function with $u(t):=\gamma_{1}[t]_{+}-\gamma_{2}[-t]_{+}$, where $0\leq\gamma_{1}<1<\gamma_{2}$ and $[z]_{+}=\max\{0,z\}$. A simple calculation yields

\[
|\rho(P_{N})-\rho(Q_{N})|\leq\frac{\gamma_{2}}{N}\sum_{i=1}^{N}|\xi^{i}-\widehat{\xi}^{i}|.
\]

Thus for any $N\in\mathbb{N}$ and any $P,Q\in\mathcal{M}_{1}^{1}$,

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\gamma_{2}\,\mathsf{dl}_{K}(P,Q)<+\infty, \tag{5.8}
\]

and the index of quantitative robustness is $\mathrm{iqr}(-S_{u})=1$.
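For the piecewise linear case, the OCE of an empirical measure can be evaluated exactly, which gives a simple numerical check of the bound with constant $\gamma_{2}$. The sketch below assumes the sign convention $u(t)=\gamma_{1}[t]_{+}-\gamma_{2}[-t]_{+}$, under which $u$ is concave and non-decreasing; the function name `oce_piecewise_linear`, the parameter values and the data are illustrative assumptions. Because $\eta\mapsto\eta+\frac{1}{N}\sum_{i}u(\xi^{i}-\eta)$ is concave and piecewise linear with kinks at the sample points, the supremum is attained at a sample point.

```python
import random

def oce_piecewise_linear(sample, gamma1, gamma2):
    """S_u(P_N) for u(t) = gamma1*[t]_+ - gamma2*[-t]_+ with
    0 <= gamma1 < 1 < gamma2 (assumed sign convention)."""
    n = len(sample)

    def u(t):
        return gamma1 * max(0.0, t) - gamma2 * max(0.0, -t)

    def objective(eta):
        return eta + sum(u(x - eta) for x in sample) / n

    # Concave piecewise linear objective: maximize over the kinks.
    return max(objective(eta) for eta in sample)

rng = random.Random(5)
gamma1, gamma2 = 0.5, 2.0
xs = [rng.gauss(0.0, 1.0) for _ in range(300)]     # "true" data (illustrative)
ys = [x + rng.uniform(-0.3, 0.3) for x in xs]      # perturbed data

# rho = -S_u, so |rho(P_N) - rho(Q_N)| = |S_u(P_N) - S_u(Q_N)|.
lhs = abs(oce_piecewise_linear(xs, gamma1, gamma2) - oce_piecewise_linear(ys, gamma1, gamma2))
rhs = gamma2 * sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)
print(lhs, rhs, lhs <= rhs)
```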

(b) Quadratic utility with $u(t):=(t-\frac{1}{2}t^{2})\mathbf{1}_{(-\infty,1)}(t)+\frac{1}{2}\mathbf{1}_{[1,+\infty)}(t)$. It is easy to observe that the function is locally Lipschitz continuous over $[\xi_{\min},\xi_{\max}]$ with modulus bounded by $|1-\xi_{\min}|$. Thus

\[
\begin{aligned}
|\rho(P_{N})-\rho(Q_{N})|
&\leq\sup_{\eta\in[\xi_{\min},\xi_{\max}]}\frac{1}{N}\sum_{i=1}^{N}\left|u(\xi^{i}-\eta)-u(\widehat{\xi}^{i}-\eta)\right|\\
&\leq\frac{1}{N}\sum_{i=1}^{N}|1-\xi_{\min}||\xi^{i}-\widehat{\xi}^{i}|.
\end{aligned}
\]

Moreover, if $\xi_{\min}\leq-1$, then $|1-\xi_{\min}|\leq 2|\xi_{\min}|$. Consequently,

\[
|\rho(P_{N})-\rho(Q_{N})|\leq\frac{2}{N}\sum_{i=1}^{N}c_{2}(\xi^{i},\widehat{\xi}^{i})|\xi^{i}-\widehat{\xi}^{i}|,
\]

where $c_{2}(\xi^{i},\widehat{\xi}^{i})=\max\{1,|\xi^{i}|,|\widehat{\xi}^{i}|\}$. Thus for any $N\in\mathbb{N}$ and any $P,Q\in\mathcal{M}_{1}^{2}$,

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq 2\,\mathsf{dl}_{FM,2}(P,Q) \tag{5.9}
\]

provided that $\xi_{\min}\leq-1$, and the index of quantitative robustness is $\mathrm{iqr}(-S_{u})=\frac{1}{2}$.

Example 5.5

Suppose that $l:{\rm I\!R}\rightarrow{\rm I\!R}$ is an increasing convex loss function which is not identically constant. Let $x_{0}$ be an interior point in the range of $l$. The Shortfall Risk Measure [18] of $G\in\mathscr{P}({\rm I\!R})$ is defined by

\[
\rho_{l}(G):=\inf\left\{m\in{\rm I\!R}:\int_{{\rm I\!R}}l(\xi-m)\,dG(\xi)\leq x_{0}\right\}. \tag{5.10}
\]

Following a similar analysis to Guo and Xu [32], we can recast the formulation above as

\[
\rho_{l}(G)=\inf_{m\in{\rm I\!R}}\sup_{\lambda\geq 0}\left\{m+\lambda\left(\int_{{\rm I\!R}}l(\xi-m)\,dG(\xi)-x_{0}\right)\right\}. \tag{5.11}
\]

Swapping the inf and sup operations, we obtain the Lagrange dual of the problem. Moreover, if we assume that the inequality constraint in (5.10) satisfies the well-known Slater condition, i.e., there exists $m_{0}$ such that $\int_{{\rm I\!R}}l(\xi-m_{0})\,dG(\xi)-x_{0}<0$, then the Lagrange multipliers of (5.10) are bounded and strong duality holds. Consequently, we can rewrite (5.11) as

\[
\rho_{l}(G)=\inf_{m\in{\rm I\!R}}\sup_{\lambda\in[a,b]}\left\{m+\lambda\left(\int_{{\rm I\!R}}l(\xi-m)\,dG(\xi)-x_{0}\right)\right\}, \tag{5.12}
\]

where $a,b$ are some positive numbers. Following the essence of [31, Proposition 2.1], we have

\[
\begin{aligned}
\rho_{l}(P_{N})
&=\sup_{\lambda\in[a,b]}\inf_{m\in{\rm I\!R}}\left\{m+\lambda\left(\frac{1}{N}\sum_{i=1}^{N}l(\xi^{i}-m)-x_{0}\right)\right\}\\
&=\sup_{\lambda\in[a,b]}\inf_{m\in[\xi_{\min},\xi_{\max}]}\left\{m+\lambda\left(\frac{1}{N}\sum_{i=1}^{N}l(\xi^{i}-m)-x_{0}\right)\right\},
\end{aligned}
\]

where $\xi_{\min}=\min\{\xi^{1},\ldots,\xi^{N};\widehat{\xi}^{1},\ldots,\widehat{\xi}^{N}\}$ and $\xi_{\max}=\max\{\xi^{1},\ldots,\xi^{N};\widehat{\xi}^{1},\ldots,\widehat{\xi}^{N}\}$. Consequently,

\[
\begin{aligned}
|\rho_{l}(P_{N})-\rho_{l}(Q_{N})|
&\leq b\sup_{m\in[\xi_{\min},\xi_{\max}]}\left|\frac{1}{N}\sum_{i=1}^{N}l(\xi^{i}-m)-\frac{1}{N}\sum_{i=1}^{N}l(\widehat{\xi}^{i}-m)\right|\\
&\leq b\sup_{m\in[\xi_{\min},\xi_{\max}]}\frac{1}{N}\sum_{i=1}^{N}\left|l(\xi^{i}-m)-l(\widehat{\xi}^{i}-m)\right|\\
&\leq b\sup_{m\in[\xi_{\min},\xi_{\max}]}\frac{1}{N}\sum_{i=1}^{N}\left[l^{\prime}_{+}(\xi^{i}-m)\vee l^{\prime}_{+}(\widehat{\xi}^{i}-m)\right]|\xi^{i}-\widehat{\xi}^{i}|\\
&\leq\frac{b}{N}\sum_{i=1}^{N}\left[l^{\prime}_{+}(\xi^{i}-\xi_{\min})\vee l^{\prime}_{+}(\widehat{\xi}^{i}-\xi_{\min})\right]|\xi^{i}-\widehat{\xi}^{i}|\\
&\leq\frac{b}{N}\sum_{i=1}^{N}\sup_{m\in{\rm I\!R}}l^{\prime}_{+}(m)|\xi^{i}-\widehat{\xi}^{i}|,
\end{aligned}
\]

where $l^{\prime}_{+}(t)$ denotes the right derivative of $l$ at $t$. The last three inequalities hold because $l$ is non-decreasing and convex, so that $l^{\prime}_{+}(t)$ is non-decreasing.

Let $T_{N}:=\rho_{l}(\widehat{G}_{N})$, where $\widehat{G}_{N}$ is the empirical distribution of $G$. If $\sup_{m\in{\rm I\!R}}l^{\prime}_{+}(m)<+\infty$, then for any $N\in\mathbb{N}$ and any $P,Q\in\mathcal{M}_{1}^{1}$,

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\sup_{m\in{\rm I\!R}}l^{\prime}_{+}(m)\,\mathsf{dl}_{K}(P,Q)<+\infty. \tag{5.13}
\]

If there exist a number $p>1$ and a positive constant $L$ such that $l^{\prime}_{+}(\xi^{i}-\xi_{\min})\vee l^{\prime}_{+}(\widehat{\xi}^{i}-\xi_{\min})\leq Lc_{p}(\xi^{i},\widehat{\xi}^{i})$, where $c_{p}(\xi^{i},\widehat{\xi}^{i})=\max\{1,|\xi^{i}|,|\widehat{\xi}^{i}|\}^{p-1}$, then

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq L\,\mathsf{dl}_{FM,p}(P,Q)<+\infty. \tag{5.14}
\]

In what follows, we illustrate the above two inequalities with two specific loss functions: the deposit insurance loss function [33] and the $p$-th power loss function [18].

(a) Deposit insurance loss function, $l(x)=[x]_{+}$, where $[x]_{+}=\max\{x,0\}$. Then $\sup_{m\in{\rm I\!R}}l^{\prime}_{+}(m)<+\infty$. Thus, for any $N\in\mathbb{N}$ and any $P,Q\in\mathcal{M}_{1}^{1}$,

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\sup_{m\in{\rm I\!R}}l^{\prime}_{+}(m)\,\mathsf{dl}_{K}(P,Q)<+\infty, \tag{5.15}
\]

and the index of quantitative robustness is $1$.

(b) For $x_{0}>0$, we consider the $p$-th power loss function,

\[
l(x)=\begin{cases}\frac{1}{p}x^{p},&\mbox{if}\;x\geq 0,\\ 0,&\mbox{otherwise},\end{cases}
\]

where $p>1$. We have $l^{\prime}_{+}(x)=x^{p-1}$ for $x\geq 0$ and $l^{\prime}_{+}(x)=0$ for $x<0$. If $\xi_{\min}\geq 0$, then $0\leq\xi^{i}-\xi_{\min}\leq|\xi^{i}|$ and consequently $l^{\prime}_{+}(\xi^{i}-\xi_{\min})\vee l^{\prime}_{+}(\widehat{\xi}^{i}-\xi_{\min})\leq c_{p}(\xi^{i},\widehat{\xi}^{i})$. Thus for any $N\in\mathbb{N}$ and any $P,Q\in\mathcal{M}_{1}^{p}$,

\[
\mathsf{dl}_{K}\left(P^{\otimes N}\circ T_{N}^{-1},Q^{\otimes N}\circ T_{N}^{-1}\right)\leq\mathsf{dl}_{FM,p}(P,Q)<+\infty \tag{5.16}
\]

provided that $\xi_{\min}\geq 0$, and the index of quantitative robustness is $\frac{1}{p}$.
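The shortfall risk measure (5.10) of an empirical measure has no closed form in general, but it can be approximated by bisection because $m\mapsto\frac{1}{N}\sum_{i}l(\xi^{i}-m)$ is non-increasing. The sketch below is an illustration under assumptions: the names `shortfall_risk`, `deposit_loss` and `power_loss`, the threshold $x_{0}=0.1$, the choice $p=2$ and the data are hypothetical. Since the multiplier bound $b$ in the derivation above is not computed here, the script only reports the change of the estimator next to the size of the perturbation; it does not verify the constants in (5.13) to (5.16).

```python
import random

def shortfall_risk(sample, loss, x0, tol=1e-10):
    """Approximate inf{ m : mean(loss(xi - m)) <= x0 } by bisection.
    Assumes loss is continuous and non-decreasing and x0 lies in the
    interior of its range, so a finite root exists."""
    n = len(sample)

    def expected_loss(m):
        return sum(loss(x - m) for x in sample) / n

    lo, hi = min(sample), max(sample)
    step = 1.0
    while expected_loss(hi) > x0:       # enlarge until hi is feasible
        hi += step
        step *= 2.0
    step = 1.0
    while expected_loss(lo) <= x0:      # enlarge until lo is infeasible
        lo -= step
        step *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_loss(mid) <= x0:
            hi = mid
        else:
            lo = mid
    return hi

def deposit_loss(x):
    """Deposit insurance loss l(x) = [x]_+."""
    return max(0.0, x)

def power_loss(x, p=2.0):
    """p-th power loss l(x) = x^p / p for x >= 0 and 0 otherwise."""
    return x ** p / p if x >= 0 else 0.0

rng = random.Random(4)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]     # "true" data (illustrative)
ys = [x + rng.uniform(-0.1, 0.1) for x in xs]      # perturbed data
max_shift = max(abs(x - y) for x, y in zip(xs, ys))

for loss in (deposit_loss, power_loss):
    gap = abs(shortfall_risk(xs, loss, 0.1) - shortfall_risk(ys, loss, 0.1))
    print(loss.__name__, gap, max_shift)
```

Bisection is used here instead of the min-max reformulation (5.12) purely for simplicity; it relies only on the monotonicity of the constraint function in $m$.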

In all of the above examples, the risk measures can either be represented explicitly in the form of $\int_{{\rm I\!R}}f(x)\,dP(x)$ (such as the expectation) or be obtained by solving an optimization problem in which the underlying functions are of expected utility form (CVaR, the optimized certainty equivalent and the shortfall risk measure). The analysis works because the utility (disutility) functions are assumed to be concave (convex) and hence locally Lipschitz continuous. When the growth of the Lipschitz modulus is controlled by $c_{p}(\xi,\xi^{\prime})$, these risk measures satisfy inequality (4.5), as we have shown. This may not work for spectral risk measures [34] with unbounded risk spectrum, because the latter distort the probability distribution $P(x)$. However, when the risk spectrum is bounded (as for CVaR, which is a special case of a spectral risk measure), inequality (4.5) can still be established. This explains why we have not included spectral risk measures in the examples.

References

  • [1] V. Krätschmer, A. Schied, and H. Zähle, “Comparative and qualitative robustness for law-invariant risk measures,” Finance and Stochastics, vol. 18, no. 2, pp. 271–295, 2014.
  • [2] R. Cont, R. Deguest, and G. Scandolo, “Robustness and sensitivity analysis of risk measurement procedures,” Quantitative finance, vol. 10, no. 6, pp. 593–606, 2010.
  • [3] H. Zähle, “Rates of almost sure convergence of plug-in estimates for distortion risk measures,” Metrika, vol. 74, no. 2, pp. 267–285, 2011.
  • [4] D. Belomestny and V. Krätschmer, “Central limit theorems for law-invariant coherent risk measures,” Journal of Applied Probability, vol. 49, no. 1, pp. 1–21, 2012.
  • [5] E. Beutner and H. Zähle, “A modified functional delta method and its application to the estimation of risk functionals,” Journal of Multivariate Analysis, vol. 101, no. 10, pp. 2452–2463, 2010.
  • [6] F. R. Hampel, “A general qualitative definition of robustness,” The Annals of Mathematical Statistics, pp. 1887–1896, 1971.
  • [7] V. Krätschmer, A. Schied, and H. Zähle, “Qualitative and infinitesimal robustness of tail-dependent statistical functionals,” Journal of Multivariate Analysis, vol. 103, no. 1, pp. 35–47, 2012.
  • [8] H. Zähle, “Qualitative robustness of von Mises statistics based on strongly mixing data,” Statistical Papers, vol. 55, no. 1, pp. 157–167, 2014.
  • [9] H. Zähle et al., “Qualitative robustness of statistical functionals under strong mixing,” Bernoulli, vol. 21, no. 3, pp. 1412–1434, 2015.
  • [10] G. Boente, R. Fraiman, V. J. Yohai, et al., “Qualitative robustness for stochastic processes,” The Annals of Statistics, vol. 15, no. 3, pp. 1293–1312, 1987.
  • [11] K. Strohriegl and R. Hable, “Qualitative robustness of estimators on stochastic processes,” Metrika, vol. 79, no. 8, pp. 895–917, 2016.
  • [12] P. J. Huber and E. M. Ronchetti, Robust statistics. Springer, 2011.
  • [13] H. Zähle, “A definition of qualitative robustness for general point estimators, and examples,” Journal of Multivariate Analysis, vol. 143, pp. 12–31, 2016.
  • [14] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust statistics: the approach based on influence functions, vol. 196. John Wiley & Sons, 2011.
  • [15] R. A. Maronna, R. D. Martin, V. J. Yohai, and M. Salibián-Barrera, Robust statistics: theory and methods (with R). John Wiley & Sons, 2019.
  • [16] S. Guo and H. Xu, “Statistical robustness in utility preference robust optimization models,” Submitted to Mathematical Programming, 2020.
  • [17] D. Filipović and G. Svindland, “The canonical model space for law-invariant convex risk measures is $L^{1}$,” Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics, vol. 22, no. 3, pp. 585–589, 2012.
  • [18] H. Föllmer and A. Schied, “Convex measures of risk and trading constraints,” Finance and stochastics, vol. 6, no. 4, pp. 429–447, 2002.
  • [19] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath, “Coherent measures of risk,” Mathematical finance, vol. 9, no. 3, pp. 203–228, 1999.
  • [20] H. Föllmer and S. Weber, “The axiomatic approach to risk measures for capital determination,” Annual Review of Financial Economics, vol. 7, pp. 301–337, 2015.
  • [21] E. Delage, D. Kuhn, and W. Wiesemann, ““dice”-sion–making under uncertainty: When can a random decision reduce risk?,” Management Science, vol. 65, no. 7, pp. 3282–3301, 2019.
  • [22] M. Frittelli, M. Maggis, and I. Peri, “Risk measures on $\mathcal{P}(\mathbb{R})$ and value at risk with probability/loss function,” Mathematical Finance, vol. 24, no. 3, pp. 442–463, 2014.
  • [23] D. Dentcheva and A. Ruszczyński, “Risk preferences on the space of quantile functions,” Mathematical Programming, vol. 148, no. 1-2, pp. 181–200, 2014.
  • [24] W. B. Haskell, W. Huang, and H. Xu, “Preference elicitation and robust optimization with multi-attribute quasi-concave choice functions,” arXiv preprint arXiv:1805.06632, 2018.
  • [25] A. L. Gibbs and F. E. Su, “On choosing and bounding probability metrics,” International statistical review, vol. 70, no. 3, pp. 419–435, 2002.
  • [26] I. Mizera et al., “Qualitative robustness and weak continuity: the extreme unction,” Nonparametrics and robustness in modern statistical inference and time series analysis: a Festschrift in honor of Professor Jana Jurecková, vol. 1, p. 169, 2010.
  • [27] W. Römisch, “Stability of stochastic programming problems,” Handbooks in operations research and management science, vol. 10, pp. 483–554, 2003.
  • [28] G. C. Pflug and A. Pichler, “Approximations for probability distributions and stochastic optimization problems,” in Stochastic optimization methods in finance and energy, pp. 343–387, Springer, 2011.
  • [29] S. T. Rachev, Probability metrics and the stability of stochastic models, vol. 269. John Wiley & Son Ltd, 1991.
  • [30] P. Mattila, Geometry of sets and measures in Euclidean spaces: fractals and rectifiability. No. 44, Cambridge university press, 1999.
  • [31] A. Ben-Tal and M. Teboulle, “An old-new concept of convex risk measures: The optimized certainty equivalent,” Mathematical Finance, vol. 17, no. 3, pp. 449–476, 2007.
  • [32] S. Guo, H. Xu, and L. Zhang, “Convergence analysis for mathematical programs with distributionally robust chance constraint,” SIAM Journal on optimization, vol. 27, no. 2, pp. 784–816, 2017.
  • [33] C. Chen, G. Iyengar, and C. C. Moallemi, “An axiomatic approach to systemic risk,” Management Science, vol. 59, no. 6, pp. 1373–1388, 2013.
  • [34] C. Acerbi, “Spectral measures of risk: A coherent representation of subjective risk aversion,” Journal of Banking & Finance, vol. 26, no. 7, pp. 1505–1518, 2002.
  • [35] S. Kusuoka, “On law invariant coherent risk measures,” in Advances in mathematical economics, pp. 83–95, Springer, 2001.

Appendix A

Lemma A.1

Let $\{a_{i}\}_{i=1}^{N}$ and $\{b_{i}\}_{i=1}^{N}$ be two non-decreasing sequences. Then for any permutation $\{k_{1},k_{2},\ldots,k_{N}\}$ of $\{1,2,\ldots,N\}$, we have

\[
\sum_{i=1}^{N}|a_{i}-b_{i}|\leq\sum_{i=1}^{N}|a_{i}-b_{k_{i}}|.
\]

Proof. The result is perhaps well known. We include a proof by induction as we cannot find a reference.

For $N=1$, the statement is trivial, and for $N=2$, $|a_{1}-b_{1}|+|a_{2}-b_{2}|\leq|a_{1}-b_{2}|+|a_{2}-b_{1}|$ for any $a_{1}\leq a_{2}$ and $b_{1}\leq b_{2}$. Assume that the conclusion holds for $N\leq n$. For $N=n+1$, consider any non-decreasing sequences $\{a_{i}\}_{i=1}^{n+1}$ and $\{b_{i}\}_{i=1}^{n+1}$ and any permutation $\{k_{1},\ldots,k_{n+1}\}$ of $\{1,\ldots,n+1\}$. There exists a $j\in\{1,\ldots,n+1\}$ such that $b_{k_{j}}=b_{n+1}$. If $j=n+1$, then from the induction hypothesis for $N=n$, we have

\[
\sum_{i=1}^{n+1}|a_{i}-b_{i}|=\sum_{i=1}^{n}|a_{i}-b_{i}|+|a_{n+1}-b_{k_{n+1}}|\leq\sum_{i=1}^{n+1}|a_{i}-b_{k_{i}}|.
\]

If $j<n+1$, then we have

\[
\begin{aligned}
\sum_{i=1}^{n+1}|a_{i}-b_{k_{i}}|
&=\sum_{i=1}^{j-1}|a_{i}-b_{k_{i}}|+\sum_{i=j+1}^{n}|a_{i}-b_{k_{i}}|+|a_{j}-b_{k_{j}}|+|a_{n+1}-b_{k_{n+1}}|\\
&\geq\sum_{i=1}^{j-1}|a_{i}-b_{k_{i}}|+\sum_{i=j+1}^{n}|a_{i}-b_{k_{i}}|+|a_{j}-b_{k_{n+1}}|+|a_{n+1}-b_{k_{j}}|\\
&=\sum_{i=1}^{j-1}|a_{i}-b_{k_{i}}|+\sum_{i=j+1}^{n}|a_{i}-b_{k_{i}}|+|a_{j}-b_{k_{n+1}}|+|a_{n+1}-b_{n+1}|\\
&\geq\sum_{i=1}^{n}|a_{i}-b_{i}|+|a_{n+1}-b_{n+1}|=\sum_{i=1}^{n+1}|a_{i}-b_{i}|,
\end{aligned}
\]

where the first inequality follows from the case $N=2$ applied to the non-decreasing sequences $\{a_{j},a_{n+1}\}$ and $\{b_{k_{n+1}},b_{k_{j}}\}$, and the second inequality follows from the induction hypothesis for $N=n$ applied to the non-decreasing sequences $\{a_{1},\ldots,a_{n}\}$ and $\{b_{1},\ldots,b_{n}\}$.
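Lemma A.1 can be sanity-checked by brute force on a small instance. The sketch below is illustrative only: the helper name `pairing_cost`, the sequence length and the uniform data are assumptions. It enumerates all permutations of the second sequence and compares the best pairing cost with the cost of the sorted (identity) pairing.

```python
import itertools
import random

def pairing_cost(a, b):
    """Sum of |a_i - b_i| for two sequences paired index by index."""
    return sum(abs(x - y) for x, y in zip(a, b))

rng = random.Random(3)
a = sorted(rng.uniform(-5.0, 5.0) for _ in range(6))   # non-decreasing
b = sorted(rng.uniform(-5.0, 5.0) for _ in range(6))   # non-decreasing

sorted_cost = pairing_cost(a, b)
# Minimum over all 6! = 720 re-orderings of b.
best = min(pairing_cost(a, perm) for perm in itertools.permutations(b))
print(sorted_cost, best)
```

Up to floating-point rounding, the two printed values agree, which is what the lemma predicts: no permutation of $\{b_{i}\}$ gives a smaller cost than the sorted pairing.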

Proposition A.1

Let $\{a_{i}\}_{i=1}^{N}$ be a sequence of numbers and $\{b_{i}\}_{i=1}^{N}$ be a sequence of non-negative numbers. If $a_{i_{1}}\leq a_{i_{2}}\leq\cdots\leq a_{i_{N}}$ and $b_{i_{1}}\leq b_{i_{2}}\leq\cdots\leq b_{i_{N}}$, then

\[
\sum_{k=1}^{N}a_{k}b_{k}\leq\sum_{k=1}^{N}a_{i_{k}}b_{i_{k}}.
\]

See e.g. [35, Proposition 12].

Appendix B

Example B.1

In this example, we show that both inclusions in (3.12) are strict. We first show that 1ϕ𝒫(ϕ)(IR)\mathcal{M}_{1}^{\phi}\neq\mathscr{P}_{(\phi)}({\rm I\!R}), i.e., there exists a P𝒫(ϕ)(IR)P\in\mathscr{P}_{(\phi)}({\rm I\!R}) such that P1ϕP\notin\mathcal{M}_{1}^{\phi}. Let ϕ\phi be a unbounded uu-shaped function. Then by the continuity of ϕ\phi, there exist a<0a<0 and b>0b>0 with ϕ(a)=2=ϕ(b)\phi(a)=2=\phi(b). Let

P(x)={1ϕ(x),forxa,12,foraxb11ϕ(x),forxb.,\displaystyle P(x)=\begin{cases}\frac{1}{\phi(x)},&\mbox{\rm{for}}\;x\leq a,\\ \frac{1}{2},&\mbox{\rm{for}}\;a\leq x\leq b\\ 1-\frac{1}{\phi(x)},&\mbox{\rm{for}}\;x\geq b.\end{cases},

Since $\phi(x)\geq 1$ for all $x$ outside $[a,b]$, $P(x)$ is well defined on ${\rm I\!R}$. By the monotonicity and unboundedness of $\phi$, we have $P\in\mathscr{P}({\rm I\!R})$. Moreover, since

\[
\sup_{x\leq 0}|P(x)\phi(x)|+\sup_{x>0}|(1-P(x))\phi(x)|=2,
\]

we have $P\in\mathscr{P}_{(\phi)}({\rm I\!R})$. However, by a change of variables in the integral, we have

\begin{align*}
\int_{{\rm I\!R}}\phi(x)dP(x) &= \int_{-\infty}^{a}\phi(x)d\left(\frac{1}{\phi(x)}\right)+\int_{b}^{+\infty}\phi(x)d\left(1-\frac{1}{\phi(x)}\right) \\
&= \int_{0}^{\frac{1}{2}}\frac{1}{t}dt+\int_{\frac{1}{2}}^{1}\frac{1}{1-t}dt=2\int_{0}^{\frac{1}{2}}\frac{1}{t}dt=+\infty,
\end{align*}

which means $P\notin\mathcal{M}_{1}^{\phi}$.
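To make the first counterexample concrete, the sketch below instantiates it with the gauge $\phi(x)=1+|x|$, which is an assumption made here (so that $a=-1$, $b=1$); the paper works with a general unbounded $u$-shaped $\phi$. Under this choice the weighted tail quantity stays at $2$, while the truncated moment $\int_{-R}^{R}\phi\,dP$ equals $2\ln(\phi(R)/2)$ and grows without bound. The names `phi`, `cdf`, `weighted_tail_sup` and `truncated_moment` are hypothetical helpers.

```python
import math


def phi(x):
    """Assumed gauge phi(x) = 1 + |x| (illustrative choice, not from the paper)."""
    return 1.0 + abs(x)


A, B = -1.0, 1.0  # phi(A) = phi(B) = 2 for the assumed gauge


def cdf(x):
    """Distribution function P of Example B.1 for the assumed gauge."""
    if x <= A:
        return 1.0 / phi(x)
    if x <= B:
        return 0.5
    return 1.0 - 1.0 / phi(x)


def weighted_tail_sup(radius, steps=20001):
    """Grid approximation of sup_{x<=0} P(x)phi(x) + sup_{x>0} (1-P(x))phi(x)."""
    grid = [radius * i / (steps - 1) for i in range(steps)]  # points in [0, radius]
    left = max(cdf(-x) * phi(-x) for x in grid)
    right = max((1.0 - cdf(x)) * phi(x) for x in grid if x > 0)
    return left + right


def truncated_moment(radius, steps=20000):
    """Riemann-Stieltjes sum for the integral of phi dP over [-radius, radius]."""
    total = 0.0
    ratio = (radius / B) ** (1.0 / steps)  # geometric grid on [B, radius]
    x = B
    for _ in range(steps):
        x_next = x * ratio
        mid = 0.5 * (x + x_next)
        total += phi(mid) * (cdf(x_next) - cdf(x))      # right tail
        total += phi(-mid) * (cdf(-x) - cdf(-x_next))   # left tail
        x = x_next
    return total  # P puts no mass on (A, B), so the middle part contributes 0


for radius in (1e2, 1e4, 1e6):
    closed_form = 2.0 * math.log(phi(radius) / 2.0)
    print(radius, weighted_tail_sup(radius), truncated_moment(radius), closed_form)
```

The geometric grid is a design choice: it keeps the relative step constant, so the Stieltjes sum remains accurate far out in the tails where the increments of $P$ are tiny.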

Now we show that $\mathscr{P}_{(\phi)}({\rm I\!R})\neq\bigcap_{\epsilon>0}\mathcal{M}_{1}^{\phi^{1-\epsilon}}$, i.e., there exists a $P\in\bigcap_{\epsilon>0}\mathcal{M}_{1}^{\phi^{1-\epsilon}}$ such that $P\notin\mathscr{P}_{(\phi)}({\rm I\!R})$. Let $\phi$ be an unbounded $u$-shaped function. Then there exists an unbounded $u$-shaped function $\psi$ such that $\lim_{|x|\rightarrow+\infty}\psi(x)/\phi(x)=0$. More precisely, for any $\epsilon\in(0,1)$, there exists an unbounded $u$-shaped function $\psi$ such that

\begin{equation}
\lim_{|x|\rightarrow+\infty}\psi(x)/\phi(x)^{1-\epsilon}=0. \tag{B.1}
\end{equation}

We construct such a $\psi$ as follows. Since $\phi$ is an unbounded $u$-shaped function, there exist $a<0$ and $b>0$ with $\phi(a)=e^{2}=\phi(b)$. Let

\[
\psi(x)=\begin{cases}\ln{(\phi(x))}, & \mbox{for}\;x\leq a,\\ 2, & \mbox{for}\;a\leq x\leq b,\\ \ln{(\phi(x))}, & \mbox{for}\;x\geq b.\end{cases}
\]
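The limit in (B.1) can be checked for this $\psi$ by a short computation. The sketch below assumes that $\phi(x)\rightarrow+\infty$ in both tails, substitutes $s=\phi(x)$, and applies L'Hôpital's rule in the second equality; the variable $s$ is introduced here only for illustration.

```latex
% For x <= a or x >= b we have psi(x) = ln(phi(x)).
% Put s = phi(x), so that s -> +infinity as |x| -> +infinity (assumed for both tails).
\lim_{|x|\rightarrow+\infty}\frac{\psi(x)}{\phi(x)^{1-\epsilon}}
  = \lim_{s\rightarrow+\infty}\frac{\ln s}{s^{1-\epsilon}}
  = \lim_{s\rightarrow+\infty}\frac{1/s}{(1-\epsilon)\,s^{-\epsilon}}
  = \lim_{s\rightarrow+\infty}\frac{1}{(1-\epsilon)\,s^{1-\epsilon}}
  = 0, \qquad \epsilon\in(0,1).
```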

Then $\psi$ is an unbounded $u$-shaped function and satisfies (B.1). Let

\[
P(x)=\begin{cases}\frac{1}{\psi(x)}, & \mbox{for}\;x\leq a,\\ \frac{1}{2}, & \mbox{for}\;a\leq x\leq b,\\ 1-\frac{1}{\psi(x)}, & \mbox{for}\;x\geq b.\end{cases}
\]

Since $\psi(x)\geq 1$ for all $x$, $P(x)$ is well defined on ${\rm I\!R}$. By the monotonicity and unboundedness of $\psi$, we have $P\in\mathscr{P}({\rm I\!R})$.

For fixed $\epsilon\in(0,1)$, by a change of variables in the integral, we have

\begin{align*}
\int_{{\rm I\!R}}\phi(x)^{1-\epsilon}dP(x) &= \int_{-\infty}^{a}\phi(x)^{1-\epsilon}d\left(\frac{1}{\psi(x)}\right)+\int_{b}^{+\infty}\phi(x)^{1-\epsilon}d\left(1-\frac{1}{\psi(x)}\right) \\
&= \int_{-\infty}^{a}\phi(x)^{1-\epsilon}d\left(\frac{1}{\ln{\phi(x)}}\right)+\int_{b}^{+\infty}\phi(x)^{1-\epsilon}d\left(1-\frac{1}{\ln{\phi(x)}}\right) \\
&= \int_{0}^{\frac{1}{2}}e^{\frac{1-\epsilon}{t}}dt+\int_{\frac{1}{2}}^{1}e^{\frac{1-\epsilon}{1-t}}dt \\
&< +\infty.
\end{align*}

For $\epsilon\geq 1$, $\phi^{1-\epsilon}$ is bounded on ${\rm I\!R}$, so $\mathcal{M}_{1}^{\phi^{1-\epsilon}}=\mathscr{P}({\rm I\!R})$. Thus, $P\in\bigcap_{\epsilon>0}\mathcal{M}_{1}^{\phi^{1-\epsilon}}$. However,

\[
\sup_{x\leq 0}|P(x)\phi(x)|+\sup_{x>0}|(1-P(x))\phi(x)|\geq\sup_{x\leq a}\left|\frac{\phi(x)}{\ln{(\phi(x))}}\right|+\sup_{x\geq b}\left|\frac{\phi(x)}{\ln{(\phi(x))}}\right|=+\infty,
\]

which means $P\notin\mathscr{P}_{(\phi)}({\rm I\!R})$.
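The divergence in the last display can be seen numerically. The sketch below again assumes the gauge $\phi(x)=1+|x|$ (an illustrative choice, giving $a=-(e^{2}-1)$ and $b=e^{2}-1$) and evaluates $P(x)\phi(x)=\phi(x)/\ln(\phi(x))$ along the left tail. The helper names `psi`, `cdf` and `left_weighted_tail` are hypothetical.

```python
import math


def phi(x):
    """Assumed gauge phi(x) = 1 + |x| (illustrative choice, not from the paper)."""
    return 1.0 + abs(x)


A = -(math.e ** 2 - 1.0)  # phi(A) = e^2 for the assumed gauge
B = -A                    # phi(B) = e^2


def psi(x):
    """psi from the construction: ln(phi) in the tails, constant 2 on [A, B]."""
    return 2.0 if A <= x <= B else math.log(phi(x))


def cdf(x):
    """Distribution function P built from psi in the second part of Example B.1."""
    if x <= A:
        return 1.0 / psi(x)
    if x <= B:
        return 0.5
    return 1.0 - 1.0 / psi(x)


def left_weighted_tail(x):
    """P(x) * phi(x), which equals phi(x) / ln(phi(x)) for x <= A."""
    return cdf(x) * phi(x)


for x in (-1e2, -1e4, -1e8):
    print(x, left_weighted_tail(x))
```

The printed values keep increasing as $x\rightarrow-\infty$, in line with the supremum over $x\leq a$ being infinite.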