This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Quasi-conjugate Bayes estimates for GPD parameters and application to heavy tails modelling

Jean Diebolt          Mhamed-Ali El-Aroui
CNRS, Université Marne-la-Vallée             ISG de Tunis
5 Bd Descartes. 77454 Marne-La-Vallée. France.    41 Av. de la Liberté, Bardo 2000. Tunisia.
diebolt@math.univ-mlv.fr         Mhamed.Elaroui@isg.rnu.tn

Myriam Garrido          Stéphane Girard
     Dept. MI, ENAC             SMS-LMC, Université Grenoble I
Av. E. Belin. 31055 Toulouse. France.        BP 53, 38041 Grenoble. France.
garrido@recherche.enac.fr          Stephane.Girard@imag.fr

Abstract – We present a quasi-conjugate Bayes approach for estimating Generalized Pareto Distribution (GPD) parameters, distribution tails and extreme quantiles within the Peaks-Over-Threshold framework. Damsleth conjugate Bayes structure on Gamma distributions is transfered to GPD. Posterior estimates are then computed by Gibbs samplers with Hastings-Metropolis steps. Accurate Bayes credibility intervals are also defined, they provide assessment of the quality of the extreme events estimates. An empirical Bayesian method is used in this work, but the suggested approach could incorporate prior information. It is shown that the obtained quasi-conjugate Bayes estimators compare well with the GPD standard estimators when simulated and real data sets are studied.

Key words – Extreme quantiles, Gamcon II distribution, Generalized Pareto Distribution, Gibbs Sampler, Peaks over thresholds (POT).

1 Introduction

Motivated by univariate tail and extreme quantile estimation, our goal is to develop new Bayesian procedures for making statistical inference on the shape and scale parameters of Generalized Pareto Distributions (GPD) when used to model heavy tails and estimate extreme quantiles.

Let us assume that observations of a studied phenomenon x1,x2,,xnx_{1},x_{2},\dots,x_{n} are issued from independent and identically distributed (i.i.d.) random variables X1,X2,,XnX_{1},X_{2},\dots,X_{n} with unknown common distribution function (d.f.) FF. Suppose that one needs to estimate either extreme quantiles q1pq_{1-p} of FF (i.e., 1F(q1p)=p1-F(q_{1-p})=p with pp \in (0, 1/n](0,\,1/n] typically), or the extreme tail of FF (i.e., 1F(x)1-F(x) for xxn,nx\geq x_{n,\,n}, where x1,nxn,nx_{1,\,n}\leq\dots\leq x_{n,\,n} denote the ordered observations). It is usually recommended to use the Peaks Over Threshold (POT) method (described for example in Davison and Smith (1990) or in the monographs Embrechts et al. (1997) and Reiss and Thomas (2001)), where only observations xix_{i} exceeding a sufficiently high threshold unu_{n} are considered. In view of the theorem of Balkema and de Haan (1974) and Pickands (1975) the probability distribution of the kk == knk_{n} positive excesses yjy_{j} == xnj+1,nunx_{n-j+1,\,n}-u_{n} for j=1,,kj=1,\dots,k, where xnk,nx_{n-k,\,n} << unu_{n} \leq xnk+1,nx_{n-k+1,\,n}, can be approximated for large unu_{n} by a GPD(γ,σ)(\gamma,\,\sigma) distribution with scale parameter σ>0\sigma>0 and shape parameter γ\gamma. The d.f. of GPD(γ,σ)(\gamma,\,\sigma) is

Fγ,σ(y)={1(1+γyσ)+1/γifγ01exp(yσ)ifγ=0,\displaystyle F_{\gamma,\,\sigma}(y)\,=\,\left\{\begin{array}[]{ll}\displaystyle 1\;-\;\left(1\;+\;\frac{\gamma y}{\sigma}\right)_{+}^{-1/\gamma}&\mbox{if}\;\gamma\neq 0\\ \\ \displaystyle 1\;-\;\exp{\left(-\frac{y}{\sigma}\right)}&\mbox{if}\;\gamma=0\,,\end{array}\right. (4)

with y+=max(y, 0)y_{+}=\max(y,\,0), where yy \in +\mathbb{R}^{+} when γ0\gamma\geq 0, and yy \in [0,σ/γ][0,\,-\sigma/\gamma] when γ<0\gamma<0.

The shape and scale parameters of the approximating GPD are estimated on the basis of the excesses above unu_{n}. The estimates are then usually plugged into the GPD d.f. and extreme quantile estimates are deduced. In this perspective, good estimation procedures for the shape and scale parameters of a GPD on the basis of approximately i.i.d. observations are necessary for accurate tail estimation. Estimating the shape and scale parameters, γ\gamma and σ\sigma, is not easy. Smith (1987) has shown that estimating GPD parameters by maximum likelihood (ML) is a non regular problem for γ<1/2\gamma<-1/2. Besides, the ML estimators may be numerically hardly tractable, see Davison and Smith (1990) and Grimshaw (1993). Moreover, the properties of MLE’s meet their asymptotic theory only when the sample size (in our case the number kk of exceedances) is larger than about 500500. Many alternative estimators have been proposed: Hosking and Wallis (1987) linear estimators are based on probability weighted moments (PWM); they are easily computed and reasonably efficient when 0.4<γ<0.4-0.4<\gamma<0.4 approximately (see also the comparative study of Singh and Guo, 1997). Castillo and Hadi (1997) have proposed other estimators based on the elemental percentile method (EPM), involving intensive computations. Their numerical simulations show that EPM estimators are more efficient than PWM ones only when γ<0\gamma<0.

Semiparametric estimators of γ\gamma along with related estimators of extreme quantiles have been intensively studied. For example, the Hill estimator presented by Hill (1975) and studied in Haeusler and Teugels (1985) and Beirlant and Teugels (1989), among many others. Two classic extensions of the Hill estimator are: The moment tail index estimator (denoted hereafter MTI(DEdH)) of Dekkers, Einmahl and de Haan (1989); The Zipf estimator, see Schultze and Steinebach (1996), and its generalization by Beirlant, Dierckx and Guillou (2001), denoted hereafter ZipfG. Most of these semiparametric estimators do not perform much better than parametric ones when applied to sets of excesses. Only the ZipfG estimator seems to outperform the other ones.

In this paper, a new Bayesian inference approach for GPD’s with γ>0\gamma>0 is introduced. In a number of application areas such as structural reliability (see for example Grimshaw, 1993) and excess-of-loss reinsurance (see Reiss and Thomas, 2001), tail estimation based on small or moderate data sets is needed. In such situations Bayes procedures can be used to capture and take into account all available information including expert information even when it is loose. Moreover, in the realm of tail inference, evaluating the imprecision of estimates is of vital importance. Bootstrap methods have been suggested to assess this imprecision. But standard bootstrap based on larger values of ordered samples is known to be inconsistent, whereas standard bootstrap based on excess samples has not second-order coverage accuracy and is imprecise when sample sizes are not extremely large (e.g., Bacro and Brito (1993), Caers, Beirlant and Vynckier, 1998). On the contrary, in the Bayesian context, credibility regions and marginal credibility intervals for GPD parameters and related high quantiles provide a non-asymptotic geometry of uncertainty directly based on outputs of the procedure, thus shortcutting bootstrap. For all these reasons (expert information, credibility intervals) easily implementable Bayesian inference procedures for GPD’s are highly desirable to study excess samples in the scope of POT methodology.

The restriction γ>0\gamma>0 is not too damaging, since several major application areas are connected to heavy-tailed distributions. Our approach can also be tried for data issued from distributions suspected to lie in Gumbel’s maximum domain of attraction (DA(Gumbel)) where γ=0\gamma=0. In the latter case, direct Bayesian analysis of the exponential distribution can be made in parallel (see Appendix A).

For other papers on Bayesian approaches to high quantile estimation, see, e.g., Coles and Tawn (1996), Coles and Powell (1996), Reiss and Thomas (1999), Tancredi et al. (2002), Bottolo et al. (2003), and the monographs Reiss and Thomas (2001) and Coles (2001) along with references therein.

Our starting point is a representation of heavy-tailed GPD’s as mixtures of exponential distributions with a gamma mixing distribution. Since the Bayesian conjugate class for gamma distributions is documented (Damsleth, 1975) we only have to transfer it to GPD’s. As described in Section 2, this provides a natural Bayesian quasi-conjugate class for heavy-tailed GPD’s. Even though Bayes estimators have no analytical expressions, the quasi-conjugate structure makes the Bayes computations very simple, the convergence of the MCMC algorithms very quick and gave a high parsimony to the global approach with an intuitive interpretation of the hyperparameters.

Section 3 compares our Bayes estimates to ML, PWM, moment tail index MTI(DEdH) and generalized Zipf (ZipfG) estimates on excess samples through Monte Carlo simulations. Section 4 is devoted to benchmark real data sets. Finally, Section 5 lists some conclusions and presents forthcoming research projects.

2 Bayesian inference for GPD parameters

The standard parameterization of heavy-tailed GPD distributions described by (1) when γ>0\gamma>0 is now replaced by a more convenient one depending on the two positive parameters α\alpha == 1/γ1/\gamma and β\beta == σ/γ\sigma/\gamma. The re-parameterized version GPD(α,β)(\alpha,\,\beta) has probability density function (p.d.f.)

fα,β(y)=f(y|α,β)=αβ(1+yβ)α 1,y0.\displaystyle f_{\alpha,\,\beta}(y)\,=\,f\left(y\left|\,\alpha,\,\beta\right.\right)\,=\,\frac{\alpha}{\beta}\,\left(1\;+\;\frac{y}{\beta}\right)^{-\alpha\,-\,1},\quad y\geq 0\,. (5)

We assume in the following that we have observations 𝐲{\mathbf{y}} == (y1,,yk)(y_{1},\dots,y_{k}) which are realizations of i.i.d. random variables Y1,,YkY_{1},\dots,Y_{k} approximately issued from (5). Typically, they represent excesses above some sufficiently high threshold uu. The latter means that for each jkj\leq k, there exists an integer ini\leq n such that YjY_{j} == XiuX_{i}-u, Xi>uX_{i}>u, where the XisX_{i}^{\prime}s are assumed i.i.d. and issued from a distribution in Fréchet’s maximum domain of attraction: DA(Fréchet). Remark that the case where the common distribution of the XisX_{i}^{\prime}s is in DA(Gumbel) can also be covered by considering the limiting situation α+\alpha\to+\infty and β+\beta\to+\infty with β/ασ>0\beta/\alpha\to\sigma>0 (see Appendix A).

Our starting point is the following mixture representation for (5), see Reiss and Thomas (2001) page 157:

f(y|α,β)=0zeyzg(z|α,β)𝑑z,\displaystyle f\left(y\left|\,\alpha,\,\beta\right.\right)\,=\,\int_{0}^{\infty}z\,e^{-yz}\,g\left(z\left|\,\alpha,\,\beta\right.\right)\,dz\,, (6)

where for z0z\geq 0, g(z|α,β)g(z|\,\alpha,\,\beta) == [βα/Γ(α)]zα1eβz[\beta^{\alpha}/\Gamma(\alpha)]\,z^{\alpha-1}e^{-\beta z} is the density of the Gamma(α,β)(\alpha,\,\beta) distribution with shape and scale parameters α\alpha and β\beta. The previous representation stands only in DA(Fréchet) as α\alpha and β\beta have to be non-negative (as parameters of a gamma distribution) which implies γ>0\gamma>0.

There is no Bayesian conjugate class for GPD’s. Nevertheless, as shown below, the mixture form (6) allows us to make use of the conjugate class for gamma distributions to construct a suitable quasi-conjugate class for GPD’s.

2.1 Transferring conjugate structure from Gamma to GPD

According to Damsleth (1975), the description of the conjugate class for gamma distributions relies on the so-called type II Gamcon distributions: for x>0x>0, the density of the Gamcon II(c,dc,\,d) distribution with parameters c>1c>1 and d>0d>0 is

ξc,d(x)=Ic,d1Γ(dx+ 1)(Γ(x))d(cd)dx,\displaystyle\xi_{c,\,d}\left(x\right)\,=\,I_{c,\,d}^{-1}\;\Gamma(dx\,+\,1)\left(\Gamma(x)\right)^{-d}(c\,d)^{-dx}\,, (7)

where Ic,d=0Γ(dx+ 1)(Γ(x))d(cd)dx𝑑xI_{c,\,d}=\int_{0}^{\infty}\Gamma(dx\,+\,1)(\Gamma(x))^{-d}(c\,d)^{-dx}\,dx. Let in the following 𝐳{\mathbf{z}} == (z1,,zk)(z_{1},\dots,z_{k}) denote a kk-sample of realizations of i.i.d. random variables Z1,,ZkZ_{1},\dots,Z_{k} issued from Gamma(α,β)(\alpha,\,\beta). According to Damsleth (1975), Theorem 2, the conjugate prior density on (α,β)(\alpha,\,\beta) with hyperparameters δ>0\delta>0 and η>μ>0\eta>\mu>0 is given by    πδ,η,μ(α,β)=π(α)π(β|α)\pi_{\delta,\,\eta,\mu}(\alpha,\,\beta)\,=\,\pi(\alpha)\,\pi\left(\beta\left|\,\alpha\right.\right)\,    where π(α)\pi(\alpha) is the density of Gamcon II(c=η/μ,d=δc=\eta/\mu,\,d=\delta) and π(β|α)\pi(\beta|\,\alpha) is the density of Gamma(δα+ 1,δη\delta\alpha\,+\,1,\,\delta\eta). Then, the conditional density of α\alpha given 𝐳{\mathbf{z}}, denoted π(α|𝐳)\,\pi(\alpha|\,{\mathbf{z}}), is Gamcon II(η/μ,δ\eta^{\prime}/\mu^{\prime},\,\delta^{\prime}) with

δ=δ+k,η=δη+i=1kziδ+kandμ=μδ/(δ+k)(i=1kzi)1/(δ+k),\displaystyle\delta^{\prime}=\delta+k\,,\quad\eta^{\prime}=\frac{\delta\eta\,+\,\sum_{i=1}^{k}z_{i}}{\delta\,+\,k}\quad\mbox{and}\quad\mu^{\prime}=\mu^{\delta/(\delta\,+\,k)}\left(\prod_{i=1}^{k}z_{i}\right)^{1/(\delta\,+\,k)}\,, (8)

and the conditional density of β\beta given α\alpha and 𝐳{\mathbf{z}}, denoted π(β|α,𝐳)\pi(\beta|\,\alpha,\,{\mathbf{z}}), is Gamma(δα+ 1,δη)(\delta^{\prime}\alpha\,+\,1,\,\delta^{\prime}\eta^{\prime}). Remark 1 . – The hyperparameters η\eta and μ\mu act on the sufficient statistics s1=i=1kzis_{1}=\sum_{i=1}^{k}z_{i} and s2=i=1klnzis_{2}=\sum_{i=1}^{k}\ln{z_{i}}, whereas δ\delta tunes the importance of these modifications. When δ\delta is a positive integer, the introduction of these prior distributions can loosely be interpreted as artificially adding δ\delta observations with arithmetic mean η\eta in s1s_{1} and another δ\delta observations with geometric mean μ\mu in s2s_{2} with η/μ=c>1\eta/\mu=c>1. \Box

Now the question is: How can we deduce the posterior density of the GPD parameters (α,β)(\alpha,\,\beta) given 𝐲{\mathbf{y}} (assumed in the following to be a kk-sample from GPD) from the gamma conjugate structure, starting with Damsleth prior density πδ,η,μ(α,β)\pi_{\delta,\,\eta,\mu}(\alpha,\,\beta) ?

In the remaining of this section the following notations are used:

– Let θ=(α,β)\theta=(\alpha,\,\beta). Consequently fθf_{\theta} and gθg_{\theta} will denote respectively p.d.f.’s of GPD(α,β\alpha,\,\beta) and Gamma(α,β\alpha,\,\beta) probability distributions.

– The likelihood function of the observations 𝐲{\mathbf{y}} for fθf_{\theta} writes f(𝐲|θ)f({\mathbf{y}}|\,\theta) == i=1k\prod_{i=1}^{k} f(yi|θ)f(y_{i}|\,\theta). Similarly, the likelihood function of 𝐳{\mathbf{z}} for gθg_{\theta} writes g(𝐳|θ)g({\mathbf{z}}|\,\theta) == i=1k\prod_{i=1}^{k} g(zi|θ)g(z_{i}|\,\theta).

– We let π(θ|𝐲)\pi(\theta|\,{\mathbf{y}}) denote the posterior density of θ\theta given the observations 𝐲{\mathbf{y}} == (y1,,yk)(y_{1},\dots,y_{k}) corresponding to fθf_{\theta} and the prior density π(θ)=πδ,η,μ\pi(\theta)=\pi_{\delta,\,\eta,\mu} (Damsleth prior):

π(θ|𝐲)=f(𝐲|θ)π(θ)fπ(𝐲),wherefπ(𝐲)=Θf(𝐲|θ)π(θ)𝑑θ.\displaystyle\pi\left(\theta\left|\,{\mathbf{y}}\right.\right)\,=\,\frac{f\left({\mathbf{y}}\left|\,\theta\right.\right)\,\pi\left(\theta\right)}{f_{\pi}\left({\mathbf{y}}\right)},\quad\mbox{where}\;\;\;f_{\pi}({\mathbf{y}})\,=\,\int_{\Theta}f\left({\mathbf{y}}|\,\theta^{\prime}\right)\pi(\theta^{\prime})\,d\theta^{\prime}\,. (9)

– Similarly, π(θ|𝐳)\pi(\theta|\,{\mathbf{z}}) denotes the posterior density of θ\theta given 𝐳{\mathbf{z}} == (z1,,zk)(z_{1},\dots,z_{k}) corresponding to gθg_{\theta} and Damsleth prior density π\pi:

π(θ|𝐳)=g(𝐳|θ)π(θ)gπ(𝐳),wheregπ(𝐳)=Θg(𝐳|θ)π(θ)𝑑θ.\displaystyle\pi\left(\theta\left|\,{\mathbf{z}}\right.\right)\,=\,\frac{g\left({\mathbf{z}}\left|\,\theta\right.\right)\,\pi\left(\theta\right)}{g_{\pi}\left({\mathbf{z}}\right)},\quad\mbox{where}\;\;\;g_{\pi}({\mathbf{z}})\,=\,\int_{\Theta}g\left({\mathbf{z}}|\,\theta^{\prime}\right)\pi\left(\theta^{\prime}\right)\,d\theta^{\prime}\,. (10)

– We further denote

qπ(𝐳|𝐲)=p(𝐲|𝐳)gπ(𝐳)p(𝐲|𝐳)gπ(𝐳)𝑑𝐳,wherep(𝐲|𝐳)=(i=1kzi)exp(i=1kyizi).\displaystyle q_{\pi}\left({\mathbf{z}}\left|\,{\mathbf{y}}\right.\right)\,=\,\frac{p\left({\mathbf{y}}\left|\,{\mathbf{z}}\right.\right)g_{\pi}\left({\bf z}\right)}{\int\,p\left({\mathbf{y}}\left|\,{\mathbf{z}}^{\prime}\right.\right)g_{\pi}\left({\mathbf{z}}^{\prime}\right)\,d{\mathbf{z}}^{\prime}},\quad\mbox{where}\quad p\left({\mathbf{y}}\left|\,{\mathbf{z}}\right.\right)=\left(\prod_{i=1}^{k}z_{i}\right)\exp{\left(-\sum_{i=1}^{k}y_{i}z_{i}\right)}\,. (11)

Thus, the posterior distribution of θ\theta given 𝐲{\mathbf{y}} is a mixture of the posterior distributions of θ\theta given 𝐳{\mathbf{z}} with mixing density qπ(𝐳|𝐲)q_{\pi}({\mathbf{z}}|\,{\mathbf{y}}):

π(θ|𝐲)=qπ(𝐳|𝐲)π(θ|𝐳)𝑑𝐳.\displaystyle\pi\left(\theta\left|\,{\bf y}\right.\right)\,=\,\int q_{\pi}\left({\mathbf{z}}\left|\,{\mathbf{y}}\right.\right)\,\pi\left(\theta\left|\,{\mathbf{z}}\right.\right)\,d{\mathbf{z}}\,. (12)

It follows that each posterior moment given 𝐲{\mathbf{y}} is the integral of the corresponding posterior moment given 𝐳{\mathbf{z}} with respect to qπ(𝐳|𝐲)q_{\pi}({\mathbf{z}}|\,{\mathbf{y}}). Unfortunately, the functions gπg_{\pi} (see (10)–(11)) and 𝐳\,{\mathbf{z}} \mapsto qπ(𝐳|𝐲)q_{\pi}({\mathbf{z}}|\,{\mathbf{y}}) (see (11)–(12)) are not expressible in analytical close form. Therefore, a numerical algorithm is needed. The mixture representation (12) allows us to design a simple and efficient Gibbs sampler. It is presented in the next subsection.

2.2 Gibbs sampling

A Gibbs sampler is used to get approximate simulations from the posterior density of θ\theta given 𝐲{\mathbf{y}}, π(θ|𝐲)\pi(\theta|{\mathbf{y}}). Damsleth’s priors are used for θ\theta == (α,β)(\alpha,\,\beta). The proposed sampler generates a Markov chain whose equilibrium density (denoted π(𝐳,θ|𝐲)\,\pi({\mathbf{z}},\theta|\,{\mathbf{y}})) is the joint density of (𝐙,θ)({\mathbf{Z}},\,\theta) conditionally on 𝐘=𝐲{\mathbf{Y}}={\mathbf{y}} where 𝐘{\mathbf{Y}} and 𝐙{\mathbf{Z}} denote respectively random vectors of kk independent Gamma(α\alpha, β\beta) and GPD(α\alpha, β\beta) random variables. To implement the Gibbs sampler we first note that within the general setting of subsection 2.1, the conditional density of θ\theta given (𝐲,𝐳)({\mathbf{y}},\,{\mathbf{z}}) is independent of 𝐲{\mathbf{y}}: π(θ|𝐲,𝐳)\pi(\theta|{\mathbf{y}},\,{\mathbf{z}}) == π(θ|𝐳)\pi(\theta|{\mathbf{z}}), and the conditional density of 𝐙{\mathbf{Z}} given (𝐲,θ)({\mathbf{y}},\,\theta) is

f(𝐳|𝐲,θ)=p(𝐲|𝐳)g(𝐳|θ)f(𝐲|θ)=i=1kf(zi|yi,θ)i=1kziαe(β+yi)zi 1lzi>0.\displaystyle f\left({\mathbf{z}}\left|\,{\mathbf{y}},\,\theta\right.\right)\,=\,\frac{p\left({\mathbf{y}}\left|\,{\mathbf{z}}\right.\right)g\left({\mathbf{z}}\left|\,\theta\right.\right)}{f\left({\mathbf{y}}\left|\,\theta\right.\right)}\,=\,\prod_{i=1}^{k}f\left(z_{i}\left|\,y_{i},\,\theta\right.\right)\,\propto\,\prod_{i=1}^{k}\,z_{i}^{\alpha}\,e^{-\left(\beta\,+\,y_{i}\right)z_{i}}\,{\mathrm{1\!\!\hskip 0.85355ptl}}_{z_{i}>0}\,. (13)

It follows that for iki\leq k, conditionally on θ\theta and Yi=yiY_{i}=y_{i}, ZiZ_{i} \sim Gamma(α+1,β+yi)(\alpha+1,\,\beta+y_{i})
independently. This yields the following intertwining sampler, where θ(m)\theta^{(m)} denotes the current parameter value at iteration mm. The next iteration:

  1. 1.

    independently simulates each zi(m+1)z_{i}^{(m+1)} from Gamma(α(m)+1,β(m)+yi)(\alpha^{(m)}+1,\,\beta^{(m)}+y_{i}) ;

  2. 2.

    simulates θ(m+1)\theta^{(m+1)} from π(θ|𝐳(m+1))\pi(\theta|\,{\mathbf{z}}^{(m+1)}).

In such a setting, both (𝐳(m))m0({\mathbf{z}}^{(m)})_{m\geq 0} and (θ(m))m0(\theta^{(m)})_{m\geq 0} are Markov chains. The simulation step of θ(m+1)\theta^{(m+1)} is split into the marginal simulation of α(m+1)\alpha^{(m+1)} and the conditional simulation of β(m+1)\beta^{(m+1)} given α(m+1)\alpha^{(m+1)}. Finally, the iteration m+1m+1 of our Gibbs sampler:

  1. 1.

    independently simulates each zi(m+1)z_{i}^{(m+1)} from Gamma(α(m)+1,β(m)+yi)(\alpha^{(m)}+1,\,\beta^{(m)}+y_{i}) ;

  2. 2.

    simulates α(m+1)\alpha^{(m+1)} from π(α|𝐳(m+1))\pi(\alpha|\,{\mathbf{z}}^{(m+1)}), i.e. from Gamcon II(η/μ,δ\eta^{\prime}\big{/}\mu^{\prime},\,\delta^{\prime}) with δ\delta^{\prime}, η\eta^{\prime} and μ\mu^{\prime} computed from 𝐳(m+1){\mathbf{z}}^{(m+1)} using equation (8) ;

  3. 3.

    simulates β(m+1)\beta^{(m+1)} from Gamma(δα(m+1)+ 1,δη)(\delta^{\prime}\alpha^{(m+1)}\,+\,1,\,\delta^{\prime}\eta^{\prime}).

When the equilibrium regime is nearly reached, simulated values of θ\theta are approximately issued from the posterior distribution of θ\theta given 𝐲{\mathbf{y}}. Implementing the previous algorithm requires simulating Gamcon II distributions and choosing adequate values of the hyperparameters δ\delta, η\eta and μ\mu of the priors π(α)\pi(\alpha) and π(β|α)\pi(\beta|\,\alpha).

2.3 Simulating Gamcon II distributions

Our sampling scheme involves simulations from Gamcon II(c,d)(c,\,d) distributions with cc == cc^{\prime} == η/μ\eta^{\prime}\big{/}\mu^{\prime}, where η\eta^{\prime} and μ\mu^{\prime} are given by (8), and moderate to large values of dd == dd^{\prime} == δ\delta^{\prime} == δ+k\delta+k. Up to our knowledge, there is no standard algorithm for simulating such distributions. The simulation method that we present is based on a normal approximation to Gamcon II distributions using Laplace’s method.

Gamcon II(c,d)(c,\,d) can be approximated by a normal distribution having the same mode. It is proved in Garrido (2002) that this mode, Mc,dM_{c,\,d}, is the unique root of the equation

ψ(dMc,d+ 1)ψ(Mc,d)lndlnc= 0,\displaystyle\psi\left(d\,M_{c,\,d}\,+\,1\right)\,-\,\psi\left(M_{c,\,d}\right)\,-\,\ln{d}\,-\,\ln{c}\,=\,0\,, (14)

where ψ\psi denotes the digamma function: the derivative of the logarithm of Γ(t)\Gamma(t). The standard deviation Sc,dS_{c,\,d} of the normal approximant distribution is computed through a Taylor expansion of the Gamcon II(c,d)(c,\,d) density in a neighborhood of its mode:

Sc,d=1dψ(Mc,d)d2ψ(dMc,d+ 1).\displaystyle S_{c,\,d}\,=\,\frac{1}{\sqrt{d\psi^{\prime}\left(M_{c,\,d}\right)\,-\,d^{2}\psi^{\prime}\left(dM_{c,\,d}\,+\,1\right)}}\,. (15)

Garrido (2002) has established that (11/d)/(lnc+lnd/2)(1-1/d)/(\ln{c}+\ln{d/2}) \leq Mc,dM_{c,\,d} \leq 2/lnc2/\ln{c}. In practice, Mc,dM_{c,\,d} is numerically approximated through the bisection method.

At each iteration, we simulate Gamcon II(c,d)(c^{\prime},\,d^{\prime}) with the help of the independent Hastings-Metropolis algorithm, which requires a suitable proposal density. Actually, it is enough to make only one step of Hastings-Metropolis at each iteration of the Gibbs sampler: see Robert (1998). The proposal density must be as close as possible to the simulated density, Gamcon II(c,d)(c^{\prime},\,d^{\prime}), and have heavier tails to ensure good mixing. Since Gamcon II densities have gamma-like tails, we cannot directly take the normal approximant density as a proposal. Rather, we have chosen a Cauchy proposal density as close as possible to the normal approximant density to Gamcon II(c,d)(c^{\prime},\,d^{\prime}), i.e. with the same mode and modal value. Therefore at each iteration our Hastings-Metropolis step:

  1. 1.

    independently simulates a new YY from the Cauchy distribution with mode
    Mc,dM_{c^{\prime},\,d^{\prime}} and modal value 1/(Sc,d2π)1\big{/}(S_{c^{\prime},\,d^{\prime}}\sqrt{2\pi}) ;

  2. 2.

    computes the ratio ρ=min[1,fcauchy(α(m))ξc,d(Y)fcauchy(Y)ξc,d(α(m))],\displaystyle{\rho\,=\,\min{\left[1,\,\frac{f^{\star}_{\mbox{\rm\scriptsize cauchy}}\left(\alpha^{(m)}\right)\,\xi_{c^{\prime},\,d^{\prime}}(Y)}{f^{\star}_{\mbox{\rm\scriptsize cauchy}}(Y)\;\xi_{c^{\prime},\,d^{\prime}}\left(\alpha^{(m)}\right)}\right]}\,,} where fcauchyf^{\star}_{\mbox{\rm\scriptsize cauchy}} denotes the density of YY ;

  3. 3.

    takes α(m+1)\alpha^{(m+1)} == YY with probability ρ\rho and α(m+1)\alpha^{(m+1)} == α(m)\alpha^{(m)} otherwise.

Since all the transition densities involved are positive, the resulting Markov chain (θ(m))m0(\theta^{(m)})_{m\geq 0} is ergodic with unique invariant probability measure equal to the posterior distribution of θ\theta given 𝐲{\mathbf{y}}. Intensive numerical experiments reported in Garrido (2002) show that for δ>0.5\delta>0.5, this Gibbs sampler with one Hastings-Metropolis step at each iteration converges quickly to its invariant distribution. Actually, it seems that discarding the first 500500 iterations is sufficient. For δ0.5\delta\leq 0.5, we observed numerical instabilities.

Bayesian inference on the GPD parameters α\alpha, β\beta is based on the outputs of this algorithm when it has approximately reached its stationary regime. We then record a sufficient number of realizations α(m)\alpha^{(m)} and β(m)\beta^{(m)}.

2.4 Choice of hyperparameters

In this paper we suppose that no expert information is available for the choice of priors on the GPD parameters (α,β)(\alpha,\,\beta) and we thus take an empirical Bayes approach. Introduction of expert information is discussed in Appendix B.

We take δ=1\delta=1 both for convenience (for δ=1\delta=1, the prior π(α)\pi(\alpha) reduces to a gamma distribution, see below) and because a small value of δ>0\delta>0 indicates little confidence in prior information (see Remark 1). Recall that we observed numerical instabilities of our Gibbs sampler for δ<0.5\delta<0.5. A natural approach to compute hyperparameter values of priors π(α)\pi(\alpha) and π(β|α)\pi(\beta|\,\alpha) is to equate some of their location parameters (e.g. the mean) to frequentist estimates of α\alpha and β\beta, denoted here α^\widehat{\alpha} and β^\widehat{\beta}. Recall that the mean of the prior distribution π(β|α)\pi(\beta|\,\alpha) is (α+1)/η(\alpha+1)\big{/}\eta. Taking this mean equal to β^=xnk,n\widehat{\beta}=x_{n-k,\,n}, the estimate induced by the Hill procedure, and replacing α\alpha by its Hill estimate yields η=(α^+ 1)/β^\eta=(\widehat{\alpha}\,+\,1)\big{/}\widehat{\beta}. When δ=1\delta=1, the prior π(α)\pi(\alpha) reduces to Gamma(2,ln(η/μ))(2,\,\ln{(\eta\big{/}\mu)}). Its mean is 2/ln(η/μ)2\big{/}\ln{(\eta/\mu)}. Solving the equation where this prior mean is set equal to α^\widehat{\alpha} yields μ=(α^+1)(exp( 2/α^))/β^.\displaystyle{\mu\,=(\widehat{\alpha}+1)\;\left(\exp{\left(-\,2/\widehat{\alpha}\right)}\right)/\widehat{\beta}}.

If prior modes are used instead of prior means, a similar approach leads to slightly different formulas for η\eta and μ\mu. Actually, with prior modes explicit expressions can be obtained for all δ>0\delta>0. However, preliminary numerical experiments yielded better estimates with prior means.

2.5 Computation of Bayesian estimates

When the Gibbs sampler approximately reaches its stationary regime, KK values denoted (α(m),β(m))(\alpha^{(m)},\beta^{(m)}), m=1,,Km=1,\dots,K, are saved. This sample is used to compute posterior means, medians or modes to estimate (α,β)(\alpha,\,\beta), leading to Bayesian estimates for (γ,σ)(\gamma,\,\sigma) and q1pq_{1-p}. Remark 2 . – The posterior modes are more difficult to compute since one has first to construct smooth estimates of the joint posterior density of α\alpha and β\beta. Numerical experiments reported in Garrido (2002) for both GPD generated data and excess samples led us to keep only posterior medians. \Box
Concerning the estimation of an extreme quantile q1pq_{1-p} (𝐲=(y1,,yk){\mathbf{y}}=(y_{1},\dots,y_{k}) is a sample of excesses over a threshold uu), a sample of values q^1p(m)\widehat{q}^{(m)}_{1-p} is computed using POT from the simulated (α(m),β(m))(\alpha^{(m)},\beta^{(m)})’s:

q^1p(m)=u+β(m)[(npk)1/α(m) 1].\widehat{q}^{(m)}_{1-p}\,=\,u\ +\ \beta^{(m)}\left[\left(\frac{np}{k}\right)^{-1/\alpha^{(m)}}\,-\,1\right]\,. (16)

Means, histograms and credibility intervals can then be computed and represented from (16). Bayesian credibility intervals for α\alpha, β\beta and q1pq_{1-p} are obtained by sorting the corresponding simulated values obtained from the Gibbs sampler. See sections 3 and 4 below. The probability distribution of the observed sample 𝐲{\mathbf{y}} can be estimated either by GPD(α^,β^)(\widehat{\alpha},\,\widehat{\beta}), or by the posterior predictive distribution. Similarly, predictive quantile functions can be approximated through

F1^pred(y)1Km=1KFα(m),β(m)1(y).\displaystyle\widehat{F^{-1}}_{\mbox{\scriptsize pred}}(y)\approx\frac{1}{K}\,\sum_{m=1}^{K}F^{-1}_{\alpha^{(m)},\,\beta^{(m)}}(y)\,. (17)

3 Comparative Monte Carlo simulations

Intensive Monte Carlo simulations were used to compare our Bayes quasi-conjugate estimators (denoted hereafter Bayes-QC) of γ\gamma and q1pq_{1-p} with their counterparts when usual estimators of GPD parameters are used: Maximum Likelihood (ML), Moment Tail Index estimator (MTI(DEdH)), Generalized Zipf estimator (ZipfG) and Probability Weighted Moments (PWM).

3.1 The simulation design

Three probability distributions in DA(Fréchet) were considered in order to produce various excess samples:

– The Fréchet(1) distribution, for which γ=1\gamma=1 and the second-order regular variation parameter (presented below) ρ=1\rho=-1. The d.f. of Fréchet(β\beta) for β>0\beta>0 is F(x)F(x) == exp(x1/β)\exp{(-x^{-1/\beta})}, x>0x>0.

– The Burr(1, 0.5, 2)(1,\,0.5,\,2) distribution, for which γ=1\gamma=1 and ρ=0.5\rho=-0.5. The d.f. of Burr(β,τ,λ\beta,\,\tau,\,\lambda) for β>0\beta>0, τ>0\tau>0, λ>0\lambda>0 is F(x)F(x) == 1[β/(β+xτ)]λ1\,-\,[\beta/(\beta+x^{\tau})]^{\lambda}, x>0x>0.

– The Log-Gamma(2) distribution, for which γ=1\gamma=1 and ρ=0\rho=0. The density of Log-Gamma(2) is f(x)f(x) == x2(lnx)1x^{-2}(\ln{x})^{-1}, x>0x>0.

The second-order regular variation parameter ρ\rho (ρ0\rho\leq 0) indicates the quality of approximation of FuF_{u} by a GPD(γ,σ(u))(\gamma,\,\sigma(u)) for high values of uu and suitable σ(u)\sigma(u)’s. High values of |ρ||\rho| indicate excellent fitting, whereas values of |ρ||\rho| close to 0 indicate bad fitting.

For each one of these three probability distributions, 100100 data sets of size n=500n=500 were independently generated. For each simulated data set and each value of kk == 5,10,,4955,10,\dots,495, we performed 1 0001\,000 iterations of the Gibbs sampler and only the last 500500 ones were kept.

3.2 First results

For each simulated data set and each value of kk == 5,10,,4955,10,\dots,495 Bayes-QC estimates of γ\gamma and q1pq_{1-p}, p=1/5 000\,p=1/5\,000, are computed as the medians of the resulting 500500 γ^(m)\,\widehat{\gamma}^{(m)}’s and q^1p(m)\,\widehat{q}_{1-p}^{(m)}’s given in the 500 last iterations of the Gibbs sampler. Figures 13 display the averages over the 100100 data sets of these Bayesian estimates of γ\gamma and q1pq_{1-p} as functions of kk. The means and modes were also computed and gave quite similar results. They are not displayed here. ML, MTI(DEdH) and ZipfG estimates of γ\gamma and q1pq_{1-p}, p=1/5 000\,p=1/5\,000, were also computed for each of those simulated data sets and the same values of kk. Figures 13 display the averages over the 100100 data sets of these estimates of γ\gamma and q1pq_{1-p}.

The left panels of Figures 13 show that our estimates of γ\gamma based on posterior medians (continuous line curves) perform rather well compared to the other ones, and give estimates close to ML and ZipfG. The right panels of Figures 13 show that our estimates of q1pq_{1-p} are very close to those obtained by ML. Both give better estimates than MTI(DEdH) but in general do not perform as well as ZipfG, although they seem to be more stable as functions of kk. Again, credibility intervals for γ\gamma and q1pq_{1-p} and potential improvements when prior information is available are strong arguments supporting the use of our Bayesian procedure.

3.3 Bayes credibility and Monte-Carlo confidence intervals

Monte Carlo simulations were also used to study whether Bayesian credibility intervals could be used as approximate frequentist confidence intervals for γ\gamma and q1pq_{1-p}. For each simulated data set, the last 500500 iterations of our Gibbs sampler provide 500500 estimates γ^(m)\widehat{\gamma}^{(m)} and 500500 estimates q^1p(m)\widehat{q}_{1-p}^{(m)}, mm == 501,,1 000501,\dots,1\,000. They can be sorted to provide approximate 90%90~\% credibility intervals for γ\gamma and q1pq_{1-p}. The precision of these credibility intervals was studied through Monte Carlo simulations: for each one of the three probability distributions considered and for each value of kk, we counted the number of simulated data sets (out of 100100) for which the true values of γ\gamma and q1pq_{1-p} fell within the corresponding 90%90~\% credibility intervals. Figure 4 exhibits the coverage rates for each simulated distribution and for k=5,10,,495k=5,10,\dots,495. These credibility intervals are very accurate for the Fréchet distribution. For the Log-Gamma distribution, the credibility intervals have good coverage rates for q1pq_{1-p} but not for γ\gamma.

This rather unexpected behavior when ρ=0\rho=0 can be explained in terms of penultimate approximation, see Diebolt, Guillou and Worms (2003): it can be proved that for ρ=0\rho=0, the distribution of excesses is better approximated by a GPD with scale parameter γ+ak(F)\gamma+a_{k}(F), where ak(F)a_{k}(F) is some correction term, than by a GPD with scale parameter γ\gamma (see, e.g., the paper coauthored by Kaufmann, pages 183–190 in Reiss and Thomas (2001) along with references therein and Worms (2001)). This explains why in the case ρ=0\rho=0 the estimates of γ\gamma strongly deviate from the true value. Furthermore, Diebolt et al. (2003) have established for all sufficiently regular estimators (γ^,σ^)(\widehat{\gamma},\,\widehat{\sigma}) of the parameters (γ,σ)(\gamma,\,\sigma) such as ML or PWM, that when ρ=0\rho=0 the estimated survival function F¯γ^,σ^\bar{F}_{\widehat{\gamma},\,\widehat{\sigma}} is a bias-corrected estimate of F¯γ,σ\bar{F}_{\gamma,\,\sigma}. We think that this result partially explains the good and stable coverage rates observed for q1pq_{1-p} for the Log-Gamma distribution. These trials show that the credibility intervals computed through our procedure give very promising results.

For each one of the three simulated probability distributions and each value of k{5,10,,495}k\in\{5,10,\dots,495\} the 100 simulated data sets give 100 estimates of (γ,q1p)(\gamma,q_{1-p}). The empirical 0.05 and 0.95 quantiles of the previous estimates give 90%90\% Monte-Carlo Confidence intervals (MCCI) for (γ,q1p)(\gamma,q_{1-p}). The width of these MCCI’s are used in Figure 5 to compare the precision of our Bayes Quasi-conjugate q1pq_{1-p} estimator to ML, ZipfG and MTI(DEdH) estimators. For Burr data sets (left panel of Figure 5) ZipfG gives the most precise quantile estimators, it is followed by ML, Bayes-QC and MTI(DEdH). For Fréchet, Bayes-QC and ML have the best precisions. It is worth noting that the Bayes-QC is used here in its empirical version. Its precision will increase when combined with expert opinions.

Refer to caption

Figure 1: Means of γ^\hat{\gamma} and q^\hat{q} on the 100100 Monte Carlo replications from Burr(1, 0.5, 2)(1,\,0.5,\,2) for different values of the number kk of excesses.

Refer to caption

Figure 2: Means of γ^\hat{\gamma} and q^\hat{q} on the 100100 Monte Carlo replications from Fréchet(1) for different values of kk.

Refer to caption

Figure 3: Means of γ^\hat{\gamma} and q^\hat{q} on the 100100 Monte Carlo replications from Log-Gamma(2) for different values of kk.

Refer to caption

Figure 4: Coverage rates of our 90%90~\% Bayesian credibility intervals for γ\gamma and q1pq_{1-p} based on the 100100 Monte Carlo replications for different values of kk.

Refer to caption

Figure 5: Width of 90%90\% Monte-Carlo confidence intervals of  q^1p\hat{q}_{1-p} for different values of kk.

Refer to caption

Figure 6: 90%90~\% Bayes credibility and Monte-Carlo confidence intervals for  q1pq_{1-p} for different values of kk.

In Figure 6 the average Bayes credibility intervals (averaged over the 100 simulated data sets) are compared to the 90%90\% Monte-Carlo Confidence intervals (MCCI) for our Bayes-QC estimator. For both Burr (left panel) and Fréchet (right panel) ditributions lower bounds of the average Bayes credibility intervals and MCCI are very close. Upper bounds of average Bayes credibility intervals are larger than those of MCCI. It is interesting to note that the width of the average Bayes credibility intervals are the narrowest for kk where Bayes estimate of q1pq_{1-p} is the closest to the true value q1pq_{1-p}. This could be used to chose optimal values of the number of excesses kk. Finally, Figure 1 suggests that the optimal value of kk in the Burr(1, 0.5, 2)(1,\,0.5,\,2) case is close to 9090. For k=90k=90, the credibility intervals for both γ\gamma and q1pq_{1-p} are still satisfactory.

4 Application to real data sets

Here, advantages and good performance of our Bayesian estimators are illustrated through the analysis of extreme events described by two benchmark real data sets.

Nidd river data are widely used in extreme value studies (Hosking and Wallis, 1987 and Davison and Smith, 1990). The raw data consist in 154154 exceedances of the level 6565 m3s-1 by the river Nidd (Yorkshire, England) during the period 1934-1969 (35 years). The NN-year return level is the water level which is exceeded on average once in NN years. Hydrologists need to estimate extreme quantiles in order to predict return levels over long periods (250 years, i.e. p=9 104p=9\,10^{-4}, or even 500 years).

Fire reinsurance data were first studied by Schnieper (1993) and Reiss and Thomas (1999). They represent insurance claims exceeding uu == 2222 millions of Norwegian Kröner from 1983 to 1992 (1985 prices are used).

Many Goodness-of-fit tools (see for example Embrechts et al. 1997) suggested that excesses from Nidd river data and Fire reinsurance data are well modeled by GPD probability distribution. It was also shown that there are no problems of stationarity violation.

4.1 Nidd river data

Bayes quasi-conjugate estimates and related 90%90~\% credibility intervals for γ\gamma and q1pq_{1-p} are depicted in Figure 7 for several values of kk. Compared to other estimators, our approach provides the most stable estimates as kk varies. For 88 values of kk Grimshaw’s algorithm for computing ML estimates did not converge: see the broken ML curves in Figure 7. Histograms of γ\gamma’s and q1pq_{1-p}’s for k=82k=82 are displayed in Figure 8. Table 1 summarizes results of the estimation of the 5050-year and 100100-year return levels of the Nidd river when the threshold uu is set equal to either 100100 (k=39k=39) or 120120 (k=24k=24).

Remark 3 . – Recall that the NN-year return level RLN{RL}_{N} is the water level which is exceeded on average once in NN years. Equations relating q1pq_{1-p} to the return level RLN{RL}_{N} follow from Davison and Smith (1990): RL^N\widehat{RL}_{N} \simeq u(σ^/γ^)[1(λ^N)γ^]u\,-\,(\widehat{\sigma}\big{/}\widehat{\gamma})[1-(\widehat{\lambda}N)^{\widehat{\gamma}}], where it is assumed that the exceedance process is Poisson with annual rate λ\lambda. If we have observed kk excesses above the threshold uu during 3535 years, then λ\lambda is estimated by k/35k\big{/}35. It follows that RL^N\widehat{RL}_{N} == q^1p\widehat{q}_{1-p} with pp \simeq k/λ^Nnk\,\big{/}\,\widehat{\lambda}Nn. For the Nidd river data, this yields pp \simeq 35/Nn35\,\big{/}\,Nn. Therefore, estimating the 100100-year return level is equivalent to estimating q1pq_{1-p} by pp == 35/(100×154)35\big{/}(100\times 154) \simeq 2.27 1032.27\,10^{-3}. \Box

As shown in Table 1, our credibility intervals with level 95%95~\% (see the last column) compare well to the Bayesian confidence intervals obtained by Davison and Smith (1990), which are based on uniform priors for q1pq_{1-p}, λ\lambda and γ\gamma (see Smith and Naylor, 1987 for more details). Actually, ours are slightly narrower.

Refer to caption

Figure 7: γ^\widehat{\gamma} and q^1p\widehat{q}_{1-p} for the Nidd river data set with several values of kk.

Refer to caption

Figure 8: Histograms of 500500 γ\gamma’s and q1pq_{1-p}’s simulated from the posterior distribution for the Nidd river data set, k=82k=82.

4.2 Fire reinsurance data and net premium estimation

In the excess-of-loss (XL) reinsurance agreements, the re-insurer pays only for excesses over a high value uu of the individual claim sizes. The net premium is the expectation of the total claim amounts that the re-insurer will pay during the future period [0,T][0,\,T]: 𝔼(SNT)\mathbb{E}(S_{N_{T}}) == i=1NTYi\sum_{i=1}^{N_{T}}Y_{i}, where NTN_{T} is the random number of claims exceeding uu during [0,T][0,\,T] and Y1,Y2,Y_{1},Y_{2},\dots are the amounts of excesses above uu. If the YiY_{i}’s are modeled by a GPD(γ\gamma, σ\sigma) and the exceedance arrival process is modeled by a homogeneous Poisson process with annual rate λ\lambda, then the net premium over the coming year is approximated by 𝔼(SN1)\mathbb{E}(S_{N_{1}}) \simeq 𝔼(N1)𝔼(Yi)\mathbb{E}(N_{1})\,\mathbb{E}(Y_{i}) == λσ/(1γ)\lambda\sigma\big{/}(1-\gamma). Reiss and Thomas (1999, 2001) estimate the net premium of the Norwegian fire claims reinsurance by analysing a data set (see Table 2) of large Norwegian fire claims between 1983 and 1992 (1985 prices). They use a Bayesian inference method for the GPD parameters and assume that the exceedance process is Poisson. Independent gamma priors are used for λ\lambda and α\alpha and an inverse-gamma prior, with parameters (aa, bb), is chosen for β\beta. Posterior means of λ\lambda, α\alpha and β\beta are computed using Monte Carlo numerical approximations of integrals. Table 3 compares estimates of (γ\gamma, σ\sigma) and the net premium λσ/(1γ)\lambda\sigma\big{/}(1-\gamma) obtained by our quasi-conjugate approach with those obtained by Reiss and Thomas for ML and Bayesian estimates with different values of the hyperparameters aa and bb of the inverse-gamma. Our approach has the advantage of indicating the precision of the estimates.

Analysis ML Bayes-QC Uniform Bayes Bayes-QC
point Return level estimate Credibility Intervals
5050-year return level
u=100u=100 305 374 [210,   775] [266,   672]
u=120u=120 280 403 [215,   850] [304,   690]
100100-year return level
u=100u=100 340 457 [220,   935] [306,   911]
u=120u=120 307 499 [225,   940] [354,   961]
Table 1: Uniform and quasi-conjugate Bayesian 95%95~\% credibility intervals for 5050-year and 100100-year return levels. Nidd river data.
year claim sizes year claim sizes
(in millions) (in millions)
1983 42.719 23.208
1984 105.860 1990 37.772
1986 29.172 34.126
22.654 27.990
1987 61.992 1992 53.472
35.000 36.269
1988 26.891 31.088
1989 25.590 25.907
24.130
Table 2: Norwegian fire claims sizes over 22.0 millions NKr from 1983 to 1992 (1985 prices), from Schnieper (1993).
Analysis γ^\widehat{\gamma} σ^\widehat{\sigma} Net premium
Point estimate 90%90~\% credib. interv.
ML for GPD 0.254 11.948 27.23
Bayes (Reiss and Thomas)
Inv.-Gamma(a=2a=2, b=2b=2) 0.288 11.658 27.83
Inv.-Gamma(a=4a=4, b=6b=6) 0.274 11.814 27.66
Bayes-QC approach 0.384 10.332 30.03 [17.09,   84.39]
Table 3: Bayesian estimates of γ\gamma, σ\sigma and the XL net premium for fire reinsurance data.

5 Discussion

The proposed quasi-conjugate Bayes approach has many advantages when compared to standard GPD parameters and extreme quantiles estimators:

– it can incorporate experts prior information and improve estimation of extreme events even when data are scarce,

– it provides Bayes credibility intervals assessing the quality of the extreme events estimation,

– it often gives estimators with weak dependence on the number kk of used excesses,

– the variances of the empirical quasi-conjugate Bayes estimators compares well to the variances of the standard estimators. This suggests that quasi-conjugate Bayes estimators including experts opinion would give very accurate extreme quantile estimators, this point will be illustrated in a forthcoming paper.

We deeply describe the proposed quasi-conjugate Bayes approach for the most frequent case of DA(Fréchet) where tails are heavy (γ>0\gamma>0), the case of DA(Gumbel) is analytically discussed in Appendix A. Future work is needed to extend this approach to the general case where the user has no prior idea on γ\gamma.

The present paper is the first of a series of papers devoted to various developments of the Bayesian inference methodology that we introduced here. In particular, we will study how to determine and compute hyperparameters in a hierarchical structure setting based on the quasi-conjugate class defined here to take into account realistic expert prior information on extreme events.

Finally, note that it would be possible to include a Poisson parameter for the stream of exceedances as in Reiss and Thomas (2000). Also, spatial quantile estimation and multivariate or time-series extensions based on our approach are natural and very promising.

Appendix A

We present here a brief account of the simple case where Bayesian inference is made for exponential distributions, rather than GPD’s with both parameters unknown. This simple setup is of interest since it is the Bayesian analogue of the Exponential Tail (ET) method (Breiman et al. 1990), and all calculations lead to explicit analytical formulas.

Set λ=1/σ\lambda=1/\sigma and fλ(y)f_{{\lambda}}(y) == λeλy 1y>0\lambda\,e^{-\lambda y}\,\mathbf{1}_{y>0}. Denote by πa,b\pi_{a,\,b} the prior Gamma(a,b)(a,\,b) density (a,ba,\,b >0>0). The posterior density πa,b(λ|y1,,yk)\pi_{a,\,b}(\lambda|\,y_{1},\dots,y_{k}) is Gamma(a+k,b+Sk)(a+k,\,b+S_{k}), where SkS_{k} == i=1kyj\sum_{i=1}^{k}y_{j}. Expert information is reflected in the choice of aa and bb. The corresponding posterior predictive distribution is GPD (a+k,b+Sk)(a+k,\,b+S_{k}), with γpred\gamma_{\mbox{\rm\scriptsize pred}} == 1/(a+k)1/(a+k) and σpred\sigma_{\mbox{\rm\scriptsize pred}} == (b+Sk)/(a+k)(b+S_{k})/(a+k). Our first estimate of q1pq_{1-p} is based on the posterior mean λ^bayes\widehat{\lambda}_{\mbox{\rm\scriptsize bayes}} == (a+k)/(b+Sk)(a+k)\big{/}(b+S_{k}) of λ\lambda:

q^1p,bayes=u+b+Ska+kln(knp).\widehat{q}_{1-p,\;\mbox{\rm\scriptsize bayes}}\;=\;u\;+\;\frac{b+S_{k}}{a+k}\ln{\left(\frac{k}{np}\right)}\,.

Our second estimate is based on the posterior predictive distribution:

q^1p,pred=u+(b+Sk)[(knp)1/(a+k) 1].\widehat{q}_{1-p,\;\mbox{\rm\scriptsize pred}}\;=\;u\;+\;\left(b+S_{k}\right)\left[\left(\frac{k}{np}\right)^{1/(a+k)}\;-\;1\right]\,.

Our third estimate is based on the transformed posterior distribution. Since the posterior on σ\sigma == 1/λ1/\lambda is an inverse-gamma distribution with density

(b+Sk)a+kΓ(a+k)σa+k+1exp(b+Skσ) 1σ>0\frac{\left(b+S_{k}\right)^{a+k}}{\Gamma\left(a+k\right)\,\sigma^{a+k+1}}\,\exp{\left(-\,\frac{b+S_{k}}{\sigma}\right)}\,\mathbf{1}_{\sigma>0}

the corresponding distribution of u+σln(k/np)u+\sigma\ln{(k/np)} has a similar shape, and has mean

q^1p,post=u+b+Ska+k1ln(knp)\widehat{q}_{1-p,\;\mbox{\rm\scriptsize post}}\;=\;u\;+\;\frac{b+S_{k}}{a+k-1}\,\ln{\left(\frac{k}{np}\right)}

For kk large enough, q^1p,bayes\widehat{q}_{1-p,\;\mbox{\rm\scriptsize bayes}} is close to q^1p,post\widehat{q}_{1-p,\;\mbox{\rm\scriptsize post}} with respect to the standard deviation scale, which is of the order of (b+Sk)(a+k)3/2ln(k/np)(b+S_{k})(a+k)^{-3/2}\ln{(k/np)}. On the contrary, a Taylor expansion shows that when ln(k/np)/(a+k)\ln{(k/np)}/(a+k) is not too large,

q^1p,predu+b+Ska+kln(knp)[1+ln(knp)2(a+k)].\widehat{q}_{1-p,\;\mbox{\rm\scriptsize pred}}\;\approx\;u\;+\;\frac{b+S_{k}}{a+k}\ln{\left(\frac{k}{np}\right)}\left[1\;+\;\frac{\ln{\left(\frac{k}{np}\right)}}{2(a+k)}\right]\,.

The distance between q^1p,pred\widehat{q}_{1-p,\;\mbox{\rm\scriptsize pred}} and each of the two other estimates can be significant, and q^1p,pred\widehat{q}_{1-p,\;\mbox{\rm\scriptsize pred}} exhibits a positive bias with respect to the other estimates. We have observed a similar behavior when dealing with GPD’s: This is the reason why we have discarded the analogous of q^1p,pred\widehat{q}_{1-p,\;\mbox{\rm\scriptsize pred}} in that setting, and selected estimates of q1pq_{1-p} based on its posterior distribution.

Appendix B

We propose here two examples for introducing expert opinion in our Bayesian framework. In the first one, we use a partial expert opinion: It acts only on one parameter of the GPD distribution, whereas the second one is derived from the empirical choice made in Subsection 2.4. In the second one, the expert opinion acts on both parameters of the GPD distribution.

Example 1.

In this situation, the expert provides a rare value qmaxq_{\mbox{\scriptsize max}} of the variable as well as an interval [p1,p2][p_{1},\,p_{2}] containing the probability pp to overpass qmaxq_{\mbox{\scriptsize max}} and a (small) probability ε\varepsilon measuring the uncertainty of this opinion. From Pickands theorem, we deduce q~1p2qmaxq~1p1\tilde{q}_{1-p_{2}}\leq q_{\mbox{\scriptsize max}}\leq\tilde{q}_{1-p_{1}} where

q~1p=u+β[(npk)1/α1].\tilde{q}_{1-p}=u+\beta\left[\left(\frac{np}{k}\right)^{-1/\alpha}-1\right]. (18)

Plugging u=xnk,nu=x_{n-k,n} yields

β[(np2k)1/α1]qmaxxnk,nβ[(np1k)1/α1].\beta\left[\left(\frac{np_{2}}{k}\right)^{-1/\alpha}-1\right]\leq q_{\mbox{\scriptsize max}}-x_{n-k,n}\leq\beta\left[\left(\frac{np_{1}}{k}\right)^{-1/\alpha}-1\right]. (19)

Replacing α\alpha by its Hill estimate α^\hat{\alpha} (similarly to Subsection 2.4), we obtain the following bounds for β\beta:

β1ββ2 where β1=qmaxxnk,n(np1/k)1/α^1 and β2=qmaxxnk,n(np2/k)1/α^1.\beta_{1}\leq\beta\leq\beta_{2}\ \ \textrm{ where }\ \ \beta_{1}=\displaystyle\frac{q_{\mbox{\scriptsize max}}-x_{n-k,n}}{(np_{1}/k)^{-1/\hat{\alpha}}-1}\ \ \textrm{ and }\ \ \beta_{2}=\frac{q_{\mbox{\scriptsize max}}-x_{n-k,n}}{(np_{2}/k)^{-1/\hat{\alpha}}-1}\,.

Note that, the converse approach (fixing β\beta and bounding α\alpha) can also be considered. The prior distribution of β\beta given α=α^\alpha=\hat{\alpha} is Gamma(δα^+1,δη)(\delta\hat{\alpha}+1,\,\delta\eta). Suppose that [0,β1][0,\beta_{1}] and [β2,+)[\beta_{2},+\infty) both have probability ε/2\varepsilon/2 for this gamma distribution. We thus have two nonlinear equations permitting to obtain δ\delta and η\eta. Approaching the gamma distribution by a Gaussian one yields explicit solutions:

δ\displaystyle\displaystyle\delta =\displaystyle= 1α^[z1ε/22(β1+β2β2β1)21].\displaystyle\frac{1}{\hat{\alpha}}\left[z_{1-\varepsilon/2}^{2}\left(\frac{\beta_{1}+\beta_{2}}{\beta_{2}-\beta_{1}}\right)^{2}-1\right]. (20)
η\displaystyle\displaystyle\eta =\displaystyle= 2α^z1ε/2β1+β2(β2β1)2[z1ε/22(β1+β2β2β1)21]1,\displaystyle 2\hat{\alpha}z_{1-\varepsilon/2}\,\frac{\beta_{1}+\beta_{2}}{(\beta_{2}-\beta_{1})^{2}}\left[z_{1-\varepsilon/2}^{2}\left(\frac{\beta_{1}+\beta_{2}}{\beta_{2}-\beta_{1}}\right)^{2}-1\right]^{-1}, (21)

where z1ε/2z_{1-\varepsilon/2} is the 1ε/21-\varepsilon/2 quantile of the standard Gaussian distribution. The parameter μ\mu is determined by imposing that the gamconII(δ,η/μ\delta,\,\eta/\mu) prior distribution of α\alpha has mode α^\hat{\alpha}. From (14), we obtain:

μ=ηexp[lnδ+ψ(α^)ψ(δα^+1)],\displaystyle\mu=\eta\exp\left[\ln\delta+\psi(\hat{\alpha})-\psi\left(\delta\hat{\alpha}+1\right)\right], (22)

where ψ\psi denotes the digamma function.

Example 2.

Here, the expert provides two rare values qmax,1q_{\mbox{\scriptsize max},1} and qmax,2q_{\mbox{\scriptsize max},2} of the variable as well as their associated probabilities p1p_{1} and p2p_{2} to be overcome. The confidence on this opinion is measured by δ\delta (see Remark 1). Equation (18) yields two nonlinear equations from which the GPD parameters (α0,β0)(\alpha_{0},\beta_{0}) can be computed. This allows to determinate the hyperparameters η\eta and μ\mu. To this end, we impose α0\alpha_{0} and β0\beta_{0} to be respectively the modes of the prior distribution of α\alpha and β\beta given α\alpha. From (14), we obtain

η\displaystyle\eta =\displaystyle= α0/β0\displaystyle\alpha_{0}/\beta_{0} (23)
μ\displaystyle\mu =\displaystyle= ηexp(lnδ+ψ(α0)ψ(α0δ+1)),\displaystyle\eta\exp(\ln\delta+\psi(\alpha_{0})-\psi(\alpha_{0}\delta+1)), (24)

where ψ\psi denotes the digamma function.

References

  • [1] Bacro, J., and Brito, M. Strong limiting behaviour of a simple tail Pareto-index estimator. Statist. Dec. 3, 133–143, (1993).
  • [2] Balkema, A., and de Haan, L. Residual life time at a great age. Annals of Probability, 2, 792–804, (1974).
  • [3] Beirlant, J., Dierckx, G., and Guillou, A. Estimation of the extreme value index and regression on generalized quantile plots. Preprint, Catholic University of Leuven, Catholic University of Leuven, Belgium. (2001).
  • [4] Beirlant, J., and Teugels, J. Asymptotic normality of Hill’s estimator. In Extreme Value Theory (1989), eds., J. Hüsler and R.D. Reiss, Proceedings of the Oberwolfach Conference, 1987. Springer-Verlag New York.
  • [5] Bottolo, L., Consonni, G., Dellaportas, P. and Lijoi, A. Bayesian analysis of extreme values by mixture modeling, Extremes 6, 25–47, (2003).
  • [6] Breiman, L., Stone, C., and Kooperberg, C. Robust confidence bounds for extreme upper quantiles. J. Statist. Comput. Simul. 37, 127–149, (1990).
  • [7] Caers, J., Beirlant, J., and Vynckier, P. Bootstrap confidence intervals for tail indices. Computational Statistics and Data Analysis 26, 259–277, (1998).
  • [8] Castillo, E., and Hadi, A. Fitting the Generalized Pareto Distribution to data. J. Amer. Stat. Assoc. 92, 440, 1609–1620, (1997).
  • [9] Coles, S. An introduction to statistical modelling of extreme values. Springer series in Statistics, 2001.
  • [10] Coles, S., and Powell, E. Bayesian methods in extreme value modelling: A review and new developments. International Statistical Review 64, 1, 119–136, (1996).
  • [11] Coles, S., and Tawn, J. A Bayesian analysis of extreme rainfall data. Applied Statistics 45, 4, 463–478, (1996).
  • [12] Damsleth, E. Conjugate classes of Gamma distributions. Scandinavian Journal of Statistics: Theory and Applications 2, 80–84, (1975).
  • [13] Davison, A., and Smith, R. Models for exceedances over high thresholds. J. R. Statist. Soc. B 52, 3, 393–442, (1990).
  • [14] Dekkers, A., Einmahl J.H.J., and de Haan, L. A moment estimator for the index of an extreme-value distribution. The Annals of Statistics 17, 4, 1833–1855, (1989).
  • [15] Diebolt, J., Guillou, A., and Worms, R. Asymptotic behaviour of the probability-weighted moments and penultimate approximation. ESAIM: Probability and Statistics 7, 219–238, (2003).
  • [16] Dupuis, D. Exceedances over high thresholds: A guide to threshold selection. Extremes 3, 1, 251–261, (1998).
  • [17] Embrechts, P., Klüppelberg, C., and Mikosch, T. Modelling Extremal Events. Springer, 1997.
  • [18] Garrido, M. Modélisation des évènements rares et estimation des quantiles extrêmes, Méthodes de sélection de modèles pour les queues de distribution. PhD thesis, Université Joseph Fourier de Grenoble (France), 2002.
  • [19] Grimshaw, S. Computing maximum likelihood estimates for the Generalized Pareto Distribution. Technometrics 35, 2, May, 185–191, (1993).
  • [20] Haeusler, E., and Teugels, J. On asymptotic normality of Hill’s estimator for the exponent of regular variation. The Annals of Statistics 13, 2, 743–756, (1985).
  • [21] Hill, B. A simple general approach to inference about the tail of a distribution. The Annals of Statistics, 3, 5, 1163–1174, (1975).
  • [22] Hosking, J., and Wallis, J. Parameter and quantile estimation for the Generalized Pareto Distribution. Technometrics 29, 3, August, 339–349, (1987).
  • [23] Pickands, J. Statistical inference using extreme order statistics. The Annals of Statistics, 3, 119–131, (1975).
  • [24] Reiss, R., and Thomas, M. A new class of Bayesian estimators in paretian excess-of-loss reinsurance. Astin Journal 29, 2, 339–349, (1999).
  • [25] Reiss, R., and Thomas, M. Statistical Analysis of Extreme Values. Birkhauser Verlag, 2001.
  • [26] Robert, C.P. Discretization and MCMC Convergence Assessment. Lecture Notes in Statistics 135, Springer Verlag, 1998.
  • [27] Schnieper, R. Praktische Erfahrungen mit Grossschadenverteilungen. Mitteil. Schweiz. Verein Versicherungsmath, 149–165, (1993).
  • [28] Schultze, J., and Steinebach, J. On least squares estimates of an exponential tail coefficient. Statist. Decisions 14, 353–372, (1996).
  • [29] Singh, V., and Guo, H. Parameter estimation for 2-parameter Generalized Pareto Distribution by POME. Stochastic Hydrology and Hydraulics 11, 3, 211–227, (1997).
  • [30] Smith, R. Estimating tails of probability distributions. The Annals of Statistics 15, 3, 1174–1207, (1987).
  • [31] Smith, R., and Naylor, J. A comparison of maximum likelihood and Bayesian estimators for the three-parameter Weibull distribution. Appl. Statist. 36, 3, 358–369, (1987).
  • [32] Tancredi, A., Anderson, C. and O’Hagan, A. A Bayesian model for threshold uncertainty in extreme value theory. Working paper 2002.12, Dipartimento di Scienze Statistiche, Universita di Padova. (2002).
  • [33] Worms, R. Vitesse de convergence de l’approximation de Pareto Généralisée de la loi des excès. Comptes-Rendus de l’Académie des Sciences, série 1, Tome 333, 65–70, (2001).