
Credit migration: Generating generators

Richard J. Martin^1

^1 Department of Mathematics, Imperial College London, Exhibition Road, London SW7 2AZ, U.K. Email: richard.martin1@imperial.ac.uk
Abstract

Markovian credit migration models are a reasonably standard tool nowadays, but there are fundamental difficulties with calibrating them. We show how these are resolved using a simplified form of matrix generator and explain why risk-neutral calibration cannot be done without volatility information. We also show how to use elementary ideas from differential geometry to make general inferences about calibration stability.

This is the longer version of a paper published in RISK (Feb 2021).

Introduction

Credit markets face uncertain times, and the current upset caused by the coronavirus pandemic is likely to result in deterioration of the credit fundamentals for many issuers. Although there has been a huge rally since the end of the Global Financial Crisis, which now seems a distant memory, there have been upsets in the Eurozone (2011) and in commodity and energy markets (2014–16), while in emerging markets there have been downgrades in the sovereigns of Brazil, Russia and Turkey; even during the benign year of 2019, the great majority of LatAm corporate names on rating-watch were potential downgrades rather than upgrades (see e.g. the Credit Suisse Latin America Chartbook). It is worthwhile, we think, to revisit the subject of credit migration.

A standard modelling framework, introduced by Jarrow et al. [Jarrow97] over twenty years ago, uses a Markov chain, parametrised by a generator matrix, to represent the evolution of the credit rating of an issuer. A good overview of subsequent developments, including its extension to multivariate settings, is given by [Bielecki11]; in terms of how it relates to the credit markets and default history, an excellent series of annual reports is produced by Standard & Poors, of which the most recent is [SandP19]. In the taxonomy of credit models, the ratings-based Markov chain model is best understood as a sort of reduction of the credit barrier models, in the sense that the firm value, or distance to default, is replaced by the credit rating. This reduces the dimensionality from a continuum of possible default distances to a small number of rating states, a natural idea as credit rating is a convenient way of thinking about a performing issuer.

The Markov chain model can be used in a number of ways, making it of general interest:

  • Econometrically, to tie potential credit losses to econometric indicators and thence to bank capital requirements [Fei13];

  • Risk management of vanilla and exotic credit portfolios;

  • Valuation of structured credit products. It is worth bearing in mind that although the synthetic CDO market is a fraction of its former self, the CLO market is still thriving, with new CLOs being issued frequently. Further, the ‘reg-cap’ sector (trades providing regulatory capital relief for banks) is growing, and requires the pricing and risk management of contingent credit products;

  • Understanding the impact of rating-dependent collateral calls in XVA;

  • Stripping the credit curve from bond prices, as a way of constructing sensibly-shaped yield curves for different ratings and tenors.

The model can, therefore, be used in both subjective (historical) and market-implied (risk-neutral) modelling, and we shall consider both here. Nevertheless there are several difficulties that have not been discussed in the literature, detracting from the utility of the model and limiting its application hitherto:

  • (I)

    The number of free parameters is uncomfortably high (with n states plus default it is n^{2});

  • (II)

    Computation of transition probabilities requires the generator matrix to be exponentiated, but this is a trappy subject [Moler03];

  • (III)

    There are algebraic hurdles associated with trying to make a generator matrix time-varying.

Additionally the following fundamental matters are insufficiently understood:

  • (IV)

    What is the best way to calibrate to an empirical (historical) transition matrix, and what level of accuracy may reasonably be asked for?

  • (V)

    Model calibration is one of the most important disciplines in quantitative finance, and yet no serious attempt has been made to answer the following: Can we uniquely identify the risk-neutral generator from the term structure of credit spreads for each rating? If not, what extra information is needed?

We address all of these here, solving most of the above problems using a new parametrisation in which a tridiagonal generator matrix is coupled with a Lévy stochastic time change (we call this TDST).

1 Tridiagonal–stochastic time change (TDST) model

We consider an n-states-plus-default continuous-time Markov chain and write G\in\mathbb{R}^{n+1,n+1} for the generator matrix, with the (n+1)th state denoting default (we will write this state as \maltese). The off-diagonal elements of G must be nonnegative and the rows sum to zero; the last row is all zero because default is an absorbing state. The probability of default in time \tau, conditional on starting at state i, is the ith element of the last column of the matrix \exp(G\tau): in other words [\exp(G\tau)]_{i\maltese}.

Let us begin by writing the generator matrix G\in\mathbb{R}^{n+1,n+1} in the form

G=\begin{bmatrix}H&v\\ 0&0\end{bmatrix} (1)

where H\in\mathbb{R}^{n,n}, and v\in\mathbb{R}^{n} is such that the rows of G sum to zero; the off-diagonal elements of H, and the elements of v, must of course be nonnegative. The \tau-period transition matrix is given by

\exp(G\tau)=\begin{bmatrix}\exp(H\tau)&\mathcal{E}_{1}(H\tau)\cdot v\tau\\ 0&1\end{bmatrix}

where the function \mathcal{E}_{1} is defined by \mathcal{E}_{1}(x)=(e^{x}-1)/x. As is apparent, we have used a matrix exponential, and also the function \mathcal{E}_{1}, and it is worth making some general remarks about analytic functions of matrices.
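Before those general remarks, the block formula above is easy to check numerically. The following is a minimal Python sketch (not from the paper; numpy/scipy and all rate values are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import expm

# n = 2 performing states plus default; rows of G sum to zero (hypothetical rates)
H = np.array([[-0.10, 0.08],
              [0.05, -0.30]])
v = np.array([0.02, 0.25])
G = np.zeros((3, 3))
G[:2, :2] = H
G[:2, 2] = v          # last row stays zero: default is absorbing

tau = 1.0
M = expm(G * tau)     # tau-period transition matrix

# Default column via E_1(x) = (e^x - 1)/x applied to the matrix H*tau:
# E_1(H*tau) = (expm(H*tau) - I) (H*tau)^{-1}, then multiplied by v*tau
E1 = (expm(H * tau) - np.eye(2)) @ np.linalg.inv(H * tau)
default_col = E1 @ (v * tau)
```

Here `default_col` reproduces the last column of `M` restricted to the performing states, and the rows of `M` sum to one, as the block formula requires.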

Let f be a function analytic on a domain \mathcal{D}\subset\mathbb{C}. Let B(z_{0},\rho) denote the open disc of centre z_{0} and radius \rho, and assume that this lies inside \mathcal{D}. Then f possesses a Taylor series \sum_{k=0}^{\infty}f_{k}(z-z_{0})^{k} convergent for z\in B(z_{0},\rho). It is immediate, proven for example by reduction to Jordan normal form [Cohn84], that the expansion

\sum_{k=0}^{\infty}f_{k}(A-z_{0}I)^{k}

enjoys the same convergence provided that all the eigenvalues of A lie in B(z_{0},\rho). We can therefore write f(A) without ambiguity, and will use this notation henceforth. This construction establishes the existence of the matrix exponential (f_{k}=1/k! and \rho=\infty), though it does not at all make it a good computational method, because roundoff errors can be substantial (for example in computing e^{-z} at z=20, the roundoff error is far in excess of the correct answer). Taylor series expansion is one of the many bad ways of calculating the matrix exponential [Higham08, Moler03]—a point that is routinely ignored. A better idea is

\exp(A)=\lim_{n\to\infty}(I+A/n)^{n}=\lim_{n\to\infty}(I-A/n)^{-n} (2)

and in each case computation is easiest when n is a power of two, because it can then be accomplished by repeated squaring. The second expression is slightly harder to calculate, as it requires a matrix inverse, but it has an advantage: for n\in\mathbb{N}, (I-A/n)^{-n} is a valid transition matrix, whereas (I+A/n)^{n} may not be.
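The repeated-squaring computation of the second limit in (2) can be sketched as follows (Python with numpy assumed; the choice k=30 is an illustrative assumption):

```python
import numpy as np

def expm_via_resolvent(A, k=30):
    """Approximate exp(A) as (I - A/n)^{-n} with n = 2**k.
    One matrix inverse, then k squarings: each squaring doubles the power.
    For a generator A, (I - A/n)^{-1} is itself a valid transition matrix,
    so every intermediate product is one too."""
    m = A.shape[0]
    B = np.linalg.inv(np.eye(m) - A / 2.0**k)
    for _ in range(k):
        B = B @ B
    return B
```

For a 2-state generator with known exponential (a single decay rate to an absorbing state) this reproduces the closed form to high accuracy.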

Having discussed the matrix exponential, we also have something to say about its inverse. It is in principle possible to write down a \tau-period (commonly one-year) transition matrix M and ask whether it comes from a generator G, which amounts to finding a matrix logarithm of M. If such a generator exists then M can be raised to any real positive power, and it is sometimes said to be embeddable: there exists a smooth map \mathfrak{m} from \mathbb{R}_{\geq 0} into the space of valid transition matrices, obeying \mathfrak{m}(s+t)=\mathfrak{m}(s)\mathfrak{m}(t); it is, of course, \mathfrak{m}(t)=M^{t}. Not all valid transition matrices are embeddable. By identifying a generator at the outset we avoid this problem.

Another important thread is the use of diagonalisation, implicit in the above discussion because we have already mentioned eigenvalues. If A=PEP^{-1} for some invertible P and diagonal E, then f(A)=Pf(E)P^{-1}. However, while a real matrix is generically diagonalisable (meaning that if it is not, it is arbitrarily close to one that is), it may be that P is almost singular. There is also the minor irritation that P,E may not be real. This method is another popular way of exponentiating matrices, but again one that is only reliable in certain contexts, for example symmetric matrices: more of this presently.

A simplification of the model is to assume that H is tridiagonal: that is to say, the only nonzero elements are those immediately above, below, or on the leading diagonal. We define a transitive generator to be one in which all the super- and sub-diagonal elements of H are strictly positive. An intransitive generator does not permit certain transitions to neighbouring states to occur: informally this causes the rating to 'get stuck' and can be ruled out on fundamental grounds, though it does retain some interest as a limiting case. Also, we define a restricted generator to be one in which all the rows of H, except for the nth, sum to zero: equivalently, v is zero except for its bottom element. Both these simplified models have a clear connection to credit modelling as discretisations of the structural (Merton) model, in which the firm value is replaced by the credit rating as a distance-to-default measure. A restricted tridiagonal generator corresponds to a Brownian motion hitting a barrier, while an unrestricted tridiagonal generator has an additional dynamic of the firm suddenly jumping to default. The transitivity condition simply ensures that the volatility always exceeds zero.

It is clear that the space of restricted tridiagonal generators has 2n-1 degrees of freedom, and the unrestricted one 3n-2. These are substantially lower than the n^{2} needed to define a general model. Also, tridiagonality is useful not just for dimensionality reduction. A transitive generator can always be diagonalised, and has real eigenvalues. This is because H=D\widetilde{H}D^{-1} with D a positive diagonal matrix and \widetilde{H} a symmetric tridiagonal matrix; then \widetilde{H} can be diagonalised via an orthogonal change of basis as \widetilde{H}=QEQ', with ' denoting transpose. (The method is particularly fast for tridiagonal matrices: see e.g. [NRC, §11.3].) Thus

\exp(G\tau)=\begin{bmatrix}DQ\exp(E\tau)Q'D^{-1}&\ast\\ 0&1\end{bmatrix}.

The right-hand column of the matrix, written as '\ast', need not be computed directly because it can be obtained from the rows summing to unity. Note that the eigenvalues of H, i.e. the (diagonal) elements of E, are real and negative. This makes computation of the matrix exponential straightforward. (Technically, in the limit of an intransitive generator D becomes singular, and the construction then fails, but this seems to be a theoretical issue only.) As an aside we note the connection between tridiagonal matrices, orthogonal polynomials and Riemann–Hilbert problems [Deift00, Ch.2].
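The symmetrising similarity transform can be sketched as follows (Python; scipy's `eigh_tridiagonal` assumed; the rates are hypothetical). The scaling D is fixed by requiring (d_{i+1}/d_i)^2 = H_{i+1,i}/H_{i,i+1}, which makes the symmetrised off-diagonal equal to the geometric mean of each up/down pair:

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

# Hypothetical transitive tridiagonal H: super-diagonal = downgrade rates,
# sub-diagonal = upgrade rates; row sums <= 0 (the leak feeds v).
diag = np.array([-0.15, -0.12, -0.30])
sup = np.array([0.10, 0.08])    # H[i, i+1] > 0
sub = np.array([0.02, 0.05])    # H[i+1, i] > 0
H = np.diag(diag) + np.diag(sup, 1) + np.diag(sub, -1)

# D = diag(d) with d_{i+1}/d_i = sqrt(H[i+1,i]/H[i,i+1]); then D^{-1} H D is
# symmetric tridiagonal with off-diagonal sqrt(H[i,i+1] * H[i+1,i]).
d = np.cumprod(np.concatenate(([1.0], np.sqrt(sub / sup))))
D = np.diag(d)
E, Q = eigh_tridiagonal(diag, np.sqrt(sup * sub))  # real spectrum, orthogonal Q

tau = 1.0
expH = D @ Q @ np.diag(np.exp(E * tau)) @ Q.T @ np.linalg.inv(D)
```

Reassembling D Q E Q' D^{-1} recovers H, and the eigenvalues come out real and negative, as stated in the text.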

The problem with tridiagonal generators is that they do not permit multiple downgrades over an infinitesimal time period, and so the probability of such events over short (non-infinitesimal) periods, while positive, is not high enough. In this respect, the presence of a jump-to-default transition achieves little, because that is not the main route to default: far more likely is the occurrence of multiple downgrades, often by several notches at a time. So how do we make these sudden multiple downgrades occur?

To retain the structure above we employ a stochastic time change, replacing \tau in the above equation by \tau_{t}, with t denoting 'real' time and \tau_{t} 'business' time. We are going to make \tau_{t} a pure jump process, with the idea that business time occasionally takes a big leap, causing multiple transitions, and we call this model TDST (tridiagonal with stochastic time change). Formally the process (\tau_{t}) is to be a monotone-increasing Lévy process (see e.g. [Schoutens03] for a general introduction) described by the generator \varphi, i.e.:

\mathbf{E}_{0}[e^{u(\tau_{t}-\tau_{0})}]\equiv e^{\varphi(u)t}.

The T-period transition matrix is obtained by integrating over all paths \{\tau_{t}\}_{t=0}^{T}. As the dependence on \tau is through an exponential, this is straightforward, and the effect is to replace H by \varphi(H) in (1), adjusting the last column as necessary. (In context \varphi will be regular in the left half-plane, so we can legitimately write the matrix function \varphi(H).) Accordingly the generator and T-period transition matrix are

G_{\varphi}=\begin{bmatrix}\varphi(H)&\ast\\ 0&0\end{bmatrix};\qquad M_{\varphi}(T)=\begin{bmatrix}\exp(\varphi(H)T)&\ast\\ 0&1\end{bmatrix}=\begin{bmatrix}DQ\cdot\exp(\varphi(E)T)\cdot Q'D^{-1}&\ast\\ 0&1\end{bmatrix}. (3)

We can impose that \tau_{t}/t have unit mean (as otherwise the model is over-specified, because H and \tau can be scaled in opposite directions), so \varphi'(0)=1. An obvious choice is the CMY process

\varphi(u)=\frac{\beta}{\gamma}\big(1-(1-u/\beta)^{\gamma}\big),\qquad\beta>0,\quad\gamma<1 (4)

which has as special cases the Inverse Gaussian process and the Gamma process, respectively (\gamma=\frac{1}{2} and the limit \gamma\to 0):

\varphi(u)=2\beta(1-\sqrt{1-u/\beta}),\qquad\varphi(u)=-\beta\log(1-u/\beta).

This retains the connection with structural (Merton) modelling, in that we have a discretisation of a Lévy process hitting a barrier—a standard idea in credit risk modelling [Lipton02c, Martin09a, Martin10b, Dalessandro11]. An incidental remark about the Gamma process is that it allows the matrix function \varphi(H) to be calculated in an alternative simple way, as noted in the Appendix.

It is perhaps worth emphasising that, despite our using the term 'stochastic time change', this model specification still has a static generator. The stochastic time change serves only as a mechanism for generating multiple downgrades in an infinitesimal time period using a tridiagonal generator matrix. The number of parameters for this model is 2n-1+n_{\varphi}, where n_{\varphi} (=2 here) is the number needed to specify \varphi.
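The construction of eq. (3) can be illustrated end-to-end in a few lines of Python (not from the paper; numpy/scipy assumed, the 3-state H is hypothetical, and \beta,\gamma are merely of the same order as the fitted values in Figure 1):

```python
import numpy as np
from scipy.linalg import expm

def phi_cmy(u, beta, gamma):
    # CMY Laplace exponent, eq. (4); note phi(0) = 0 and phi'(0) = 1
    return (beta / gamma) * (1.0 - (1.0 - u / beta) ** gamma)

def matrix_phi(H, beta, gamma):
    # phi(H) as a matrix function: diagonalise H (a transitive tridiagonal
    # generator has real eigenvalues) and apply phi to the eigenvalues
    E, P = np.linalg.eig(H)
    return (P @ np.diag(phi_cmy(E, beta, gamma)) @ np.linalg.inv(P)).real

# Hypothetical 3-state tridiagonal H (row sums <= 0; the leak is default intensity)
H = np.array([[-0.15, 0.10, 0.00],
              [0.02, -0.12, 0.08],
              [0.00, 0.05, -0.30]])

phiH = matrix_phi(H, beta=0.024, gamma=0.8)  # time-changed generator block
M = expm(phiH)    # one-year probabilities among the performing states
```

The resulting M is substochastic (nonnegative entries, row sums at most one), with the deficit being the default probability.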

2 Calibration to historical transition matrix

Israel et al. [Israel00] spend some time showing that empirical transition matrices are not necessarily embeddable (q.v.). In fact it is only necessary to test whether the empirical matrix is statistically likely to be so. Taking S&P’s data in [SandP19, Tables 21,23], we ask if we can find a TDST model that fits well enough.

There are a number of issues to consider. The first is the right measure of closeness of fit. From the perspective of statistical theory the most appropriate is the Kullback–Leibler divergence (a Taylor series expansion around p=q gives the well-known chi-squared measure of "(O-E)^{2}/E, summed", with O observed and E expected; this can also be used to define the fitting error, and it gives similar results):

\mathsf{d}_{M}=\sum_{i=1}^{n}\sum_{j=1}^{n+1}p_{ij}\ln\frac{p_{ij}}{q_{ij}} (5)

where p_{ij} are the historical probabilities of transition from state i to j (with n+1 denoting default) and q_{ij} the model ones. This is essentially the log likelihood ratio statistic, i.e. the logarithm of the ratio of the likelihood of the model compared with that of the maximum likelihood estimator [Cox74, Ch.4]. Note that \mathsf{d}_{M} is not symmetrical in p,q: nor should it be. To see why, consider first the effect of letting q_{ij}\to 0 with p_{ij}>0: this means that the event has been observed but that the model probability is zero, an error that should be heavily penalised. On the other hand, p_{ij}\to 0 with q_{ij}>0 simply means that the model is attaching positive probability to an event that has not been observed, which is not a major objection.
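The fitting error (5) is one line of numpy (a sketch, not the paper's code; the convention that 0 ln 0 = 0 is handled explicitly):

```python
import numpy as np

def kl_fit_error(P, Q):
    """d_M of eq. (5): sum over rows and columns of p_ij * ln(p_ij / q_ij).
    Terms with p_ij = 0 contribute zero; q_ij -> 0 with p_ij > 0 blows up,
    reflecting the asymmetry discussed in the text."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))
```

The divergence is zero when model and data agree and strictly positive otherwise.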

Empirical transition data reveal a problem: the probability of transiting from rated to unrated ('NR') is quite high. A standard idea is to distribute this probability pro rata amongst the elements of each row, excepting the transition to default: doing so retains, as it must, the property that each row sums to unity. However, this is not the only way of making the adjustment, and although the 'NR' problem is unfortunate, it allows for, or at least cannot rule out, considerable leeway in the fitting—which is why attempts to fit the matrix exactly are misguided. Another issue is that low empirical probabilities have proportionally high standard deviations: the finiteness of the samples used to construct empirical transition matrices introduces considerable relative uncertainty.
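The pro-rata adjustment can be sketched as follows (Python with numpy assumed; the column layout, with default second-to-last and NR last, and the numbers are hypothetical):

```python
import numpy as np

def remove_nr(P_with_nr):
    """Redistribute the 'NR' column pro rata over the rating columns.
    The default probability is left untouched, so each row still sums to one."""
    P = np.asarray(P_with_nr, dtype=float)
    ratings, default, nr = P[:, :-2], P[:, -2], P[:, -1]
    scale = 1.0 + nr / ratings.sum(axis=1)   # spread NR mass proportionally
    return np.column_stack([ratings * scale[:, None], default])

# e.g. a row (ratings 0.90, 0.04; default 0.01; NR 0.05):
# the rating entries are scaled up by 1 + 0.05/0.94, default stays at 0.01
```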

We then minimise \mathsf{d}_{M} with respect to the elements of H (and the bottom element of v) and the parameters \beta,\gamma in (4). Optimisation of the 7-state model takes a fraction of a second and the results are shown in Figure 1. It is seen that the fit is good, and furthermore the fitting errors are small by comparison with the uncertainties introduced by the 'NR' problem. The errors are also typically smaller than the standard errors given by S&P, which are in [SandP19, Table 21] but omitted here. (One has to be careful in interpreting these, for a reason that we have already touched on. Certain transitions have never been observed, and for these S&P show the probability and standard error as zero. Now if the true frequency is zero, then the mean and standard deviation of the observed frequency will be zero. But what we really want is an estimate of the true frequency and the standard deviation of that estimate. Clearly if Bayesian inference is followed then the latter is not zero, because the true frequency may be >0.)

If we pass to an 18-state model (AAA, AA+, AA, …, CCC+, CCC) we still have a large number of parameters even after the reduction to a TDST model (324 down to around 36). It is possible to fit these, but some further simplification proves advantageous. Suppose that the free parameters in the matrix H are the sub/superdiagonal elements for the following transitions only: AA→AA±, A→A±, BBB→BBB±, BB→BB±, B→B±, CCC→CCC+, and also the bottom element of v, giving CCC→\maltese. (Hence there are 12 in all.) The values for intermediate states (AA−→AA/A+ and so on) are found by logarithmic interpolation. (For convenience AA+→AAA is taken to be the same as AA→AA+, and AAA→AA+ and AA+→AA the same as AA→AA−.) A minor complication is that the S&P data group CCC+, CCC and lower ratings into one. We have ignored any rating lower than CCC because in practice transition through these states is very rapid, and default almost always ensues [SandP19, Chart 10]. Again the results are good: see Figure 2.

(i)

(ii) One-year transition matrix adjusted for NR (%):

        AAA    AA     A      BBB    BB     B      CCC    \maltese
AAA     89.82  9.42   0.55   0.05   0.08   0.03   0.05   0.00
AA      0.52   90.64  8.17   0.51   0.05   0.06   0.02   0.02
A       0.03   1.77   92.29  5.40   0.30   0.13   0.02   0.06
BBB     0.01   0.10   3.64   91.62  3.85   0.49   0.12   0.17
BB      0.01   0.03   0.12   5.35   85.86  7.37   0.61   0.65
B       0.00   0.02   0.09   0.20   5.66   85.52  5.07   3.44
CCC     0.00   0.00   0.14   0.25   0.75   16.76  55.21  26.89

(iii) TDST fitted one-year transition matrix M_{\varphi}(1) (%):

        AAA    AA     A      BBB    BB     B      CCC    \maltese
AAA     89.69  8.90   1.08   0.23   0.05   0.02   0.00   0.03
AA      0.56   91.12  7.49   0.66   0.09   0.04   0.00   0.04
A       0.02   1.83   92.44  5.21   0.33   0.09   0.01   0.07
BBB     0.00   0.11   3.64   91.55  4.03   0.47   0.04   0.16
BB      0.00   0.02   0.30   5.21   85.68  7.69   0.46   0.63
B       0.00   0.01   0.06   0.43   5.43   85.43  5.69   2.96
CCC     0.00   0.00   0.02   0.12   0.96   16.72  55.06  27.12

and its corresponding generator G_{\varphi} (%):

        AAA    AA     A      BBB    BB      B       CCC     \maltese
AAA     -10.91 9.84   0.78   0.19   0.04    0.02    0.00    0.02
AA      0.62   -9.42  8.16   0.49   0.08    0.03    0.00    0.04
A       0.01   2.00   -8.05  5.66   0.24    0.08    0.01    0.06
BBB     0.00   0.08   3.95   -9.08  4.54    0.33    0.03    0.14
BB      0.00   0.02   0.22   5.87   -15.88  8.99    0.27    0.51
B       0.00   0.01   0.05   0.30   6.35    -16.91  8.28    1.93
CCC     0.00   0.00   0.02   0.09   0.56    24.34   -60.85  35.84

(iv) Parameters:

        \uparrow   0         \downarrow
AAA                -0.1371   0.1371
AA      0.0086     -0.1184   0.1098
A       0.0269     -0.1024   0.0755
BBB     0.0527     -0.1172   0.0646
BB      0.0835     -0.2179   0.1344
B       0.0949     -0.2434   0.1485
CCC     0.4364     -1.0281   0.5918

\gamma = 0.8154, \beta = 0.0241

Figure 1: S&P 7-state historical transition matrix: (i) raw [SandP19, Table 21]; (ii) adjusted for transition to unrated (NR); (iii) TDST fitted one-year transition matrix M_{\varphi}(1), and its corresponding generator G_{\varphi}, as per eq. (3); (iv) parameters, i.e. the subdiagonal (upgrade, \uparrow), leading diagonal (rating unchanged, 0), and superdiagonal (downgrade, \downarrow) in eq. (1), and the parameters \beta,\gamma specifying \varphi in eq. (4).

[Fitted] (%)

      AAA   AA+   AA    AA-   A+    A     A-    BBB+  BBB   BBB-  BB+   BB    BB-   B+    B     B-    CCC+  CCC   \maltese
AAA   85.65 9.49  2.50  1.06  0.56  0.31  0.18  0.11  0.06  0.03  0.02  0.01  0.01  0.01  0.00  0.00  0.00  0.00  0.01
AA+   2.64  83.70 9.19  2.36  0.99  0.49  0.27  0.15  0.09  0.04  0.02  0.01  0.01  0.01  0.01  0.00  0.00  0.00  0.01
AA    0.19  2.56  83.90 9.12  2.32  0.92  0.45  0.24  0.13  0.06  0.03  0.02  0.01  0.01  0.01  0.00  0.00  0.00  0.02
AA-   0.03  0.25  3.44  83.33 9.04  2.19  0.86  0.41  0.21  0.10  0.05  0.03  0.02  0.01  0.01  0.01  0.00  0.00  0.03
A+    0.01  0.05  0.44  4.52  82.58 8.78  2.09  0.79  0.36  0.16  0.08  0.04  0.03  0.02  0.01  0.01  0.00  0.00  0.03
A     0.00  0.02  0.11  0.73  5.82  81.35 8.61  1.99  0.73  0.29  0.13  0.07  0.04  0.03  0.02  0.01  0.01  0.00  0.05
A-    0.00  0.01  0.04  0.21  1.01  6.30  80.83 8.47  1.90  0.60  0.25  0.12  0.07  0.05  0.04  0.02  0.01  0.00  0.07
BBB+  0.00  0.00  0.02  0.08  0.31  1.19  6.91  80.28 8.31  1.66  0.56  0.24  0.13  0.08  0.06  0.04  0.02  0.01  0.11
BBB   0.00  0.00  0.01  0.04  0.13  0.39  1.41  7.57  79.69 7.70  1.65  0.57  0.28  0.17  0.11  0.06  0.03  0.01  0.17
BBB-  0.00  0.00  0.00  0.02  0.06  0.17  0.50  1.70  8.65  76.78 8.40  1.85  0.74  0.38  0.24  0.13  0.06  0.02  0.29
BB+   0.00  0.00  0.00  0.01  0.04  0.09  0.23  0.63  2.06  9.32  73.84 9.08  2.33  0.94  0.51  0.26  0.12  0.04  0.48
BB    0.00  0.00  0.00  0.01  0.02  0.05  0.12  0.29  0.77  2.25  9.97  70.38 10.54 2.76  1.19  0.53  0.24  0.08  0.79
BB-   0.00  0.00  0.00  0.00  0.01  0.03  0.06  0.14  0.33  0.76  2.17  8.96  70.16 11.11 3.15  1.17  0.48  0.16  1.28
B+    0.00  0.00  0.00  0.00  0.01  0.01  0.03  0.07  0.15  0.30  0.68  1.83  8.67  69.83 11.91 3.04  1.07  0.34  2.05
B     0.00  0.00  0.00  0.00  0.00  0.01  0.02  0.04  0.07  0.14  0.27  0.56  1.76  8.52  70.18 11.40 2.88  0.79  3.36
B-    0.00  0.00  0.00  0.00  0.00  0.00  0.01  0.02  0.04  0.07  0.13  0.25  0.65  2.17  11.38 65.48 11.44 2.41  5.91
CCC+  0.00  0.00  0.00  0.00  0.00  0.00  0.01  0.01  0.02  0.04  0.07  0.13  0.31  0.89  3.33  13.27 59.82 10.22 11.87
CCC   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.01  0.01  0.02  0.03  0.06  0.14  0.37  1.23  3.77  13.75 50.26 30.34

Figure 2: As above but with the 18-state model. One-year transition matrix [SandP19, Table 23], TDST fit, and parameters.

3 Making the generator time-varying

3.1 Theoretical considerations

We start by pointing out a trap that is very easy to fall into. When the generator varies with time, the transition matrix is not given by the following expression (or its expectation conditional on information known at time 0, for a stochastically varying G):

M(T)\stackrel{??}{=}\exp\left(\int_{0}^{T}G_{t}\,dt\right)\qquad\mbox{(Wrong in general)}. (6)

Instead, the so-called ordered exponential, sometimes written '\mathcal{T}\exp', must be used. (To see where the problem is, suppose that G_{t} equals G^{i} for t\in[0,T/2) and G^{ii} for t\in[T/2,T], both constant matrices. Then the period-T transition matrix is \exp(G^{i}T/2)\exp(G^{ii}T/2)\neq\exp(G^{i}T/2+G^{ii}T/2), because equality requires the matrices to commute, which in general they do not. See the Wikipedia page on 'ordered exponential'.) This can be thought of as

\lim_{N\to\infty}\Big[(I+G_{0}\,\delta t)(I+G_{\delta t}\,\delta t)(I+G_{2\delta t}\,\delta t)\cdots(I+G_{(N-1)\delta t}\,\delta t)\Big],\qquad\delta t=T/N.

What is beguiling about (6) is its simplicity, its ‘obvious’ connection to standard ideas on rates modelling, and that its RHS is a valid transition matrix, albeit the wrong one.
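The two-regime counterexample is easy to verify numerically (a Python sketch with hypothetical generators; scipy assumed):

```python
import numpy as np
from scipy.linalg import expm

# Two constant generators, one on [0, T/2) and one on [T/2, T].
# They do not commute, so the naive formula (6) fails.
Gi = np.array([[-0.2, 0.2],
               [0.0, 0.0]])
Gii = np.array([[-0.5, 0.5],
                [0.1, -0.1]])
T = 2.0

ordered = expm(Gi * T/2) @ expm(Gii * T/2)   # correct transition matrix
naive = expm(Gi * T/2 + Gii * T/2)           # eq. (6): wrong in general
```

Both matrices have rows summing to one, which is exactly why the error is beguiling: the wrong answer still looks like a perfectly valid transition matrix.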

We can invoke the Feynman–Kac representation, which is that if X_{t} obeys the SDE

dX_{t}=\mu(X_{t})\,dt+\sigma(X_{t})\,dW_{t},

and G is a function of x only, then the matrix M obeys the backward equation

\frac{\partial M}{\partial t}=G(x)M+\mu(x)\frac{\partial M}{\partial x}+\frac{\sigma^{2}(x)}{2}\frac{\partial^{2}M}{\partial x^{2}},\qquad M(0)=I;

but this equation is unlikely to have a closed-form solution, so it has to be solved numerically, e.g. on a trinomial tree.

Another route that is likely to cause difficulties is an over-reliance on diagonalisation methods. We have said that if we can write G=PEP^{-1} then it may be easy to exponentiate G, depending on how well-conditioned P is. To make G time-varying it is superficially attractive to allow E to vary while fixing P. The problem with that is that there is no guarantee that a valid transition matrix is thereby produced.

There is one case which, interestingly, links to both these ideas. That is where the effect of time variation is simply to scale G by a factor, i.e. G_{t}=X_{t}\overline{G} with \overline{G} a static matrix and X_{t} some scalar positive process, which may as well have unit mean. The commutativity problem disappears, so that (6) is now correct, and only the eigenvalues vary over time. This model may or may not be appropriate, depending on the context, as we discuss later.

An obvious choice of dynamics is the CIR process,

dX_{t}=\kappa(\overline{X}-X_{t})\,dt+\sigma\sqrt{X_{t}}\,dW_{t}. (7)

Importantly, the joint distribution of X_{t} and \exp\big(\int_{0}^{t}X_{s}\,ds\big) is known via the double Laplace transform [Lamberton12, Prop. 2.5]:

\mathbf{E}_{0}\left[e^{-\lambda X_{t}-\mu\int_{0}^{t}X_{s}\,ds}\right]=\mathcal{A}(\lambda,\mu,t)e^{-\mathcal{B}(\lambda,\mu,t)X_{0}} (8)

with

\mathcal{A}(\lambda,\mu,t)=\left(\frac{2\widehat{\kappa}_{\mu}e^{(\widehat{\kappa}_{\mu}+\kappa)t/2}}{\sigma^{2}\lambda(e^{\widehat{\kappa}_{\mu}t}-1)+\widehat{\kappa}_{\mu}-\kappa+e^{\widehat{\kappa}_{\mu}t}(\widehat{\kappa}_{\mu}+\kappa)}\right)^{2\kappa\overline{X}/\sigma^{2}}

\mathcal{B}(\lambda,\mu,t)=\frac{\lambda\big(\widehat{\kappa}_{\mu}+\kappa+e^{\widehat{\kappa}_{\mu}t}(\widehat{\kappa}_{\mu}-\kappa)\big)+2\mu(e^{\widehat{\kappa}_{\mu}t}-1)}{\sigma^{2}\lambda(e^{\widehat{\kappa}_{\mu}t}-1)+\widehat{\kappa}_{\mu}-\kappa+e^{\widehat{\kappa}_{\mu}t}(\widehat{\kappa}_{\mu}+\kappa)}

and \widehat{\kappa}_{\mu}=\sqrt{\kappa^{2}+2\sigma^{2}\mu}. Accordingly the T-period transition matrix conditional on X_{0} is (the functions z\mapsto\mathcal{A}(0,z,T) and z\mapsto\mathcal{B}(0,z,T) are regular for \mathrm{Re}\,z\geq 0, so can be evaluated 'at' z=-\overline{G})

\mathbf{E}_{0}\left[e^{\overline{G}\int_{0}^{T}X_{t}\,dt}\right]=\mathcal{A}(0,-\overline{G},T)e^{-\mathcal{B}(0,-\overline{G},T)X_{0}}; (9)

the unconditional one is obtained by integrating X_{0} out, which is immediate because its unconditional distribution is Gamma of mean \overline{X} and shape parameter 2\kappa\overline{X}/\sigma^{2}.
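The scalar versions of \mathcal{A} and \mathcal{B} are straightforward to code (a Python sketch, not the paper's implementation; parameter values in the checks are arbitrary). For (9) they would then be evaluated at a matrix argument via diagonalisation, as discussed earlier:

```python
import numpy as np

def cir_laplace(lmbda, mu, t, kappa, Xbar, sigma):
    """Double Laplace transform of (X_t, int_0^t X_s ds) for CIR, eq. (8):
    E[exp(-lambda X_t - mu int X)] = A * exp(-B * X0)."""
    kh = np.sqrt(kappa**2 + 2.0 * sigma**2 * mu)       # kappa-hat_mu
    ekt = np.exp(kh * t)
    den = sigma**2 * lmbda * (ekt - 1.0) + kh - kappa + ekt * (kh + kappa)
    A = (2.0 * kh * np.exp((kh + kappa) * t / 2.0) / den) ** (2.0 * kappa * Xbar / sigma**2)
    B = (lmbda * (kh + kappa + ekt * (kh - kappa)) + 2.0 * mu * (ekt - 1.0)) / den
    return A, B
```

Two sanity checks follow from the definition: at \lambda=\mu=0 the expectation is 1 (so A=1, B=0), and at t=0 it is e^{-\lambda X_0} (so A=1, B=\lambda).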

3.2 Historical calibration with a time-varying generator

Default rates fluctuate over time, and we wish to capture this effect in a model that can then be linked to suitable econometric indicators [Fei13]. Thus we need a stochastically-driven model, as opposed to a deterministic but time-inhomogeneous one as considered in [Bluhm07].

As just mentioned, a simple scaling of the generator matrix can be done easily. However, it is unrealistic, because in benign years there are more upgrades and fewer downgrades, and the reverse in bad ones: it is not correct simply to scale both by the same factor. Returning to (3), write

Gφ=Gφ++GφG_{\varphi}=G_{\varphi}^{+}+G_{\varphi}^{-}

in which the terms on the RHS are valid generator matrices that are, respectively, lower- and upper-triangular. These are generators for an upgrade-only and a downgrade-only Markov chain, and so this decomposition is unique. Except in trivial cases we will always have G+GGG+G^{+}G^{-}\neq G^{-}G^{+}. Now scale these two terms by different positive variables Xt+X^{+}_{t} and XtX^{-}_{t} representing the contribution from upgrades and downgrades respectively, so that

$$G_{\varphi,t}=X^{+}_{t}G_{\varphi}^{+}+X^{-}_{t}G_{\varphi}^{-}.\qquad(10)$$

It is easiest to do this in a discrete-time setting, because our natural source of data for calibration is annual. Then the one-year transition matrix is the exponential of $G_{\varphi,t}$, but we no longer have a convenient closed-form calculation for it, and have to resort to (2).

We do not have yearly transition matrices to calibrate to: even if we did, they would be quite 'noisy', so that for example we might have an instance of a transition A to BB, but none to BB+. However, two pieces of annual information are available [SandP19, Table 6]: the downgrade:upgrade ratio and the default rate, both of which are averaged across rating states. Roughly speaking, the former quantity gives information about $X^{-}_{t}/X^{+}_{t}$, and the latter about $X^{-}_{t}$. They can be matched exactly, and uniquely, by a simple numerical search, except in singular cases where there are no upgrades/downgrades/defaults. Doing this for each year gives Figure 3.
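The matching step can be illustrated in a few lines. The sketch below is our own: it uses the second toy generator of Figure 4 as a stand-in for $G_\varphi$, and counts any move to a lower state (including default) as a downgrade, both assumptions rather than the paper's exact procedure. It recovers the scalings $(X^{+}_{t},X^{-}_{t})$ from a target downgrade:upgrade ratio and default rate by root-finding:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import fsolve

# Toy generator (states A, B, C, default); last row is absorbing.
G = np.array([[-0.14,  0.10,  0.03, 0.01],
              [ 0.20, -0.41,  0.16, 0.05],
              [ 0.00,  0.05, -0.15, 0.10],
              [ 0.00,  0.00,  0.00, 0.00]])

# Split G = G+ + G-: upgrade-only (below-diagonal moves) and
# downgrade-only (above-diagonal moves, incl. default) generators.
lo, up = np.tril(G, -1), np.triu(G, 1)
Gp = lo - np.diag(lo.sum(axis=1))
Gm = up - np.diag(up.sum(axis=1))

def stats(x):
    """Average downgrade:upgrade ratio and default rate for scalings x."""
    P = expm(x[0]*Gp + x[1]*Gm)            # one-year transition matrix (10)
    dn  = np.mean([P[i, i+1:].sum() for i in range(3)])
    up_ = np.mean([P[i, :i].sum()   for i in range(3)])
    return np.array([dn/up_, P[:3, -1].mean()])

target = stats([0.8, 1.3])                 # pretend these were observed
sol = fsolve(lambda x: stats(x) - target, [1.0, 1.0])
```

The two observed statistics pin down the two unknowns, consistent with the uniqueness claim in the text.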

The sample statistics of $X^{+}$ and $X^{-}$ are: mean 0.95, 0.95; standard deviation 0.42, 0.65; correlation 0.55. This shows, as anticipated, that it is necessary to have two factors rather than just a single one; in fact, this is a similar conclusion to that in [Andersson00, §2]. It is perhaps surprising that the correlation is positive, because 'good years' should see $X^{+}$ high and $X^{-}$ low, and oppositely in bad years, which was the whole point of having two factors. The author is indebted to M. van Beek for pointing out that over shorter time scales, data suggest a negative correlation. In other words, looking at all of 2009 is unhelpful as it is a year of two parts: first multiple downgrades and defaults, then multiple upgrades. It is hard to be precise about what stochastic process $X^{\pm}_{t}$ follow, as we have only 38 data points, but a sensible model choice for each of $X^{\pm}_{t}$ is the exponential of an autoregressive (AR) process. That is to say, we write $x_{t}$ for the logarithm of $X_{t}$, with the mean subtracted; then the AR($p$) model is

$$x_{t}=-\sum_{j=1}^{p}a_{j}x_{t-j}+e_{t}$$

where $\{e_{t}\}$ is a white noise process and the $(a_{j})$ are coefficients to be determined. The properties of these processes, and the fitting of their coefficients, are well understood, e.g. [Makhoul75, Marple87]. An AR(1) process captures mean-reversion. An AR(2) process allows more subtlety, including the possibility of rating momentum, in the sense that next year's up/downgrade rate is likely to be higher than this year's if this year's was higher than last year's; in equations,

$$\mathbf{E}[(x_{n}-x_{n-1})(x_{n-1}-x_{n-2})]>0,$$

a condition that can be expressed in terms of the autocovariance function, and thence of the $(a_{j})$, via the Yule–Walker equations. However, there is little evidence for this in the data used here, suggesting that if rating momentum exists it is observed over a time scale faster than a year, and is possibly more of an idiosyncratic effect, in the sense that troubled issuers often suffer successive downgrades over a period of months.
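The Yule–Walker fit itself is easy to sketch. The code below is our illustration (a synthetic AR(1) series stands in for the 38-point sample); it estimates the $(a_j)$, in the sign convention of the equation above, from sample autocovariances:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def ar_fit(x, p):
    """Yule-Walker estimate of (a_j) in x_t = -sum_j a_j x_{t-j} + e_t."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    # Sample autocovariances r_0, ..., r_p
    r = np.array([x[:n-k] @ x[k:] for k in range(p+1)]) / n
    # Yule-Walker: Toeplitz(r_0..r_{p-1}) phi = (r_1..r_p),
    # with x_t = sum_j phi_j x_{t-j} + e_t, i.e. a_j = -phi_j
    phi = solve_toeplitz(r[:p], r[1:p+1])
    return -phi

# Synthetic AR(1) series with phi = 0.6 (i.e. a_1 = -0.6)
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, 5000):
    x[t] = 0.6*x[t-1] + rng.standard_normal()
a = ar_fit(x, 1)                     # should be close to [-0.6]
```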

Figure 3: Evolution of the coefficients $X^{\pm}_{t}$ in the time-varying model. The green trace ($X^{+}_{t}$) controls upgrades and the red trace ($X^{-}_{t}$) controls downgrades.

4 Risk-neutral calibration

4.1 Theoretical considerations

Risk-neutral calibration is the identification of a generator matrix that fits a given set of CDS spreads or bond prices, via the survival curve (the valuation of CDS and bonds in terms of the survival curve is well understood, e.g. [OKane08]). This is an entirely different problem from fitting to a prescribed historical transition matrix.

The first point we make is that in reality we should not expect anywhere near a perfect fit in doing this: there is no reason why the market should trade a credit in line with its public rating, as the market provides a current view of the firm’s future solvency, and the rating may not be up to date. Thus it is quite possible to have two different BBB- firms trading at 80bp and 140bp at the 5y tenor; once we get down to CCC+ we can have names trading anywhere between 500bp and 5000bp. Also there may be no bonds of a particular rating: this will almost certainly be true when rating modifiers are used. All these examples can be found in the dataset we used. Therefore we must be prepared for very ‘noisy’ data, and the objective of fitting must be to come up with sensibly-shaped curves that give a reasonable fit.

It is superficially attractive to describe this as a ‘bond pricing’ model. However, bonds are quoted on price or spread-to-Treasury, so there is no need for a pricing model as such. Arguably it is suitable for ‘matrix pricing’ of illiquid bonds, but an illiquid bond is likely to trade at a significant yield premium to a liquid one of the same rating and tenor, so this would have to be borne in mind. In reality, the model is most suited to relative value analysis or for a parsimonious representation of a certain sector, e.g. “Where do US industrials trade for different rating and maturity?” Another application is as an input to the evaluation of EM corporate bonds, coupling the developed market spread to the country spread [Martin18c].

Our next point is different, but no less fundamental. Even in a hypothetical world of bond/CDS spreads exactly lining up with their public ratings, and trading with great liquidity and low bid-offer, does the term structure, for each rating, allow us to uniquely identify the generator? There is a clear intuition about why the term structure on its own does not suffice. Knowing the full transition matrix allows us to value any contingent claim, including claims that require us to know about rating (and hence price) volatility. But such information cannot be gleaned from bond prices alone.

The term structure provides (in principle) $2n$ clearly-interpretable pieces of information, in the form of the short-term spread and the gradient of the spread curve at the short end, but in practice observations of these quantities will be 'noisy'. Denoting by $Q_{i}(T)$ the survival probability for time $T$, conditional on being in state $i$ at time zero, we have for small $T$,

$$Q_{i}(T)=1-G_{i\maltese}T-(G^{2})_{i\maltese}\frac{T^{2}}{2}+O(T^{3}).$$

If we convert to a par CDS spread then the PVs of the payout leg and the coupon leg are, for small maturity $T$, respectively

$$(1-\Re)\big(1-Q_{i}(T)\big)=(1-\Re)\left(G_{i\maltese}T+(G^{2})_{i\maltese}\frac{T^{2}}{2}+\cdots\right)$$

(with $\Re$ the recovery rate), and

$$T\,\frac{1+Q_{i}(T)}{2}=T\left(1-G_{i\maltese}\frac{T}{2}+\cdots\right)$$

and the CDS spread is the ratio of the two:

$$\frac{s_{i}(T)}{1-\Re}=G_{i\maltese}+\big[(G^{2})_{i\maltese}+(G_{i\maltese})^{2}\big]\frac{T}{2}+\cdots$$

Alternatively we can use the continuously-compounded zero-coupon Z-spread $s_{i}(T)=-T^{-1}\ln Q_{i}(T)$, and the result is the same. Writing the matrix square $G^{2}$ as an explicit sum, we obtain after some elementary algebra:

$$s_{i}(0)=(1-\Re)G_{i\maltese},\qquad s_{i}'(0)=\frac{1-\Re}{2}\sum_{j=1}^{n}G_{ij}(G_{j\maltese}-G_{i\maltese})\qquad(11)$$

The first of these is obvious: the short-term spread is explained by the instantaneous probability of transition to default. The second expresses the extra loss arising from transition to riskier states ($i\to j$ with $G_{j\maltese}>G_{i\maltese}$), balanced against transition to less-risky states ($G_{j\maltese}<G_{i\maltese}$): if the former exceeds the latter, the spread curve is upward-sloping, and vice versa. In particular, the highest rating necessarily has an upward-sloping curve and the lowest a downward-sloping one.
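Equation (11) is easy to check numerically. The following sketch is our own verification, using the first toy generator of Figure 4 (with $\Re=0.4$): it builds the par spread from the exact survival probabilities and compares its small-$T$ behaviour with (11):

```python
import numpy as np
from scipy.linalg import expm

R = 0.4                                       # recovery, as in the toy example
G = np.array([[-0.14,  0.10,  0.03, 0.01],
              [ 0.00, -0.05,  0.00, 0.05],
              [ 0.00,  0.05, -0.15, 0.10],
              [ 0.00,  0.00,  0.00, 0.00]])   # last state = default

def par_spread(i, T):
    Q = 1.0 - expm(G*T)[i, -1]                # survival probability Q_i(T)
    return (1-R)*(1-Q) / (T*(1+Q)/2)          # payout leg / coupon leg

i = 0                                         # rating A
s0  = (1-R)*G[i, -1]                          # s_i(0) from (11)
s0p = (1-R)/2*sum(G[i, j]*(G[j, -1]-G[i, -1]) for j in range(3))
slope = (par_spread(i, 1e-3) - s0)/1e-3       # numerical estimate of s_i'(0)
```

For rating A this gives $s_i(0)=0.006$ and $s_i'(0)=0.00201$ (per unit notional, before converting to bp), and the finite-difference slope agrees.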

Now suppose we have two term structures that agree on $s_{i}(0)$ and $s_{i}'(0)$ for each $i$. They might look different at the long end ($T\to\infty$), but how many extra parameters would such a difference entail? So it seems plausible that the term structure provides a little over $2n$ pieces of information, and certainly nowhere near $n^{2}$. A 'toy' example shows this well (Figure 4; the letter codes A, B, C do not correspond to S&P ratings, being just labels of convenience with A the best and C the worst; recovery $\Re=0.4$). The term structures are not identical, but are so close that there is no realistic way of telling the generators apart: we chose, as is easily done, matrices for which the information in (11) is identical. Rating B clearly has higher volatility in the second matrix.

4.2 Spectral theory

It is possible to use basic differential geometry to provide some quantitative precision about invertibility. There is a smooth pricing map $f\in\mathcal{C}^{\infty}$ from the space of generator matrices $\mathcal{G}$ to the space of term structures $\mathcal{S}$, that is, a set of curves of spread vs maturity, one for each rating. At any point $G\in\mathcal{G}$ it has a derivative $\partial f$ that relates, linearly, infinitesimal changes in generator matrix to infinitesimal changes in term structure, and is represented by the jacobian $J_{f}$. (The jacobian of a map $f:\mathcal{X}\to\mathcal{Y}$ at $x\in\mathcal{X}$ is the matrix $J_{f}(x)$ whose $(i,j)$th element is $\partial f_{i}/\partial x_{j}$ evaluated at $x$.) To construct it, take $G\in\mathcal{G}$ and perform, one by one, $n^{2}$ bumps on it, in which each bump consists in increasing the element $G_{ij}$, for $i\neq j$, by a small amount $\epsilon$ while adding $-\epsilon$ to $G_{ii}$ so as to keep the $i$th row sum zero. Compute the new term structure $f(G+\delta G)$, written as a set of $Nn$ points (maturities 1y, 2y, …, $N$y and $n$ ratings). The difference $\big(f(G+\delta G)-f(G)\big)/\epsilon$ estimates the directional derivative, and repeating this for each possible bump gives the jacobian, which is of dimension $Nn\times n^{2}$, containing the sensitivity of all points on the term structure to all $n^{2}$ bumps. The degree to which $f$ is locally invertible can be understood by examining the singular value decomposition (SVD, [NRC, §2.6]) of the jacobian, $J_{f}=U\Delta V'$, in which $U,V$ are orthogonal and $\Delta=\mathrm{diag}(\delta_{1},\delta_{2},\ldots)$ with $\delta_{1}\geq\delta_{2}\geq\cdots\geq 0$: these elements are the roots of the characteristic equation $\det(\delta^{2}I-{J_{f}}'J_{f})=0$, and are the singular values. (The singular values of $J_{f}$ only tell us about local invertibility: it is easy to construct an $f$ that causes the domain to be folded on itself in such a way that $J_{f}$ is nonsingular but $f$ is not one-to-one. A simple example is $f:\mathbb{R}^{2}\to\mathbb{R}^{2}$ given by $f(x,y)=e^{x}(\cos y,\sin y)$: the singular values of $J_{f}$ are identical, being $e^{x}$, but $f$ is periodic in $y$, and so obviously not one-to-one. In practice, therefore, establishing that a function $f:\mathbb{R}^{d}\to\mathbb{R}^{d}$ is invertible is hard if $d>1$. In context this means that we might have two totally different generator matrices giving precisely the same term structure, though this is perhaps unlikely.) In attempting to locally invert $f$ we have to divide by $\Delta$, so small singular values cause a problem. As an overall scaling of the matrix does not affect its conditioning, standard procedure is to plot $\hat{\delta}_{i}=\delta_{i}/\delta_{1}$, for $i=1,2,\ldots$, on a logarithmic scale, and we call this the normalised spectrum of $\partial f$. In well-conditioned problems the $\hat{\delta}_{i}$ are all roughly equal; in ill-conditioned problems some are very small, and in these coordinate directions inversion will be impossible without grossly amplifying any observation noise.
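The bump-and-SVD recipe can be sketched in a few lines. This is our illustration, not the paper's code: the pricing map is a simple Z-spread term structure built from the first toy generator of Figure 4, and with an absorbing default row each of the $n$ non-default rows has $n$ bumpable entries (the other rating states plus default), giving $n^2$ bumps in total:

```python
import numpy as np
from scipy.linalg import expm

G = np.array([[-0.14,  0.10,  0.03, 0.01],
              [ 0.00, -0.05,  0.00, 0.05],
              [ 0.00,  0.05, -0.15, 0.10],
              [ 0.00,  0.00,  0.00, 0.00]])

def f(G, mats=(1, 2, 3, 4, 5)):
    """Pricing map: generator -> stacked Z-spread curves (an Nn-vector)."""
    curves = []
    for T in mats:
        Q = 1.0 - expm(G*T)[:-1, -1]       # survival, each non-default state
        curves.append(-np.log(Q)/T)
    return np.concatenate(curves)

def jacobian(G, eps=1e-6):
    cols, base = [], f(G)
    for i in range(G.shape[0] - 1):        # bump every off-diagonal entry,
        for j in range(G.shape[0]):        # keeping the row sum zero
            if i == j:
                continue
            dG = np.zeros_like(G)
            dG[i, j], dG[i, i] = eps, -eps
            cols.append((f(G + dG) - base)/eps)
    return np.column_stack(cols)

d = np.linalg.svd(jacobian(G), compute_uv=False)
spectrum = d/d[0]                          # normalised spectrum delta_i/delta_1
```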

Taking one further step, we also want to examine whether a particular parametrised subspace of $\mathcal{G}$ permits stable inversion. Let $\mathcal{P}$ be the space of parameters (in context, of course, the TDST parameters), with a map $\pi:\mathcal{P}\to\mathcal{G}$ taking a particular parameter set to its associated generator. We can construct $J_{\pi}$ and $J_{f\circ\pi}$ by bumping the parameters $p$ and seeing the effect on the generator $\pi(p)\in\mathcal{G}$ and the term structure $f(\pi(p))\in\mathcal{S}$. Notionally the singular values of the restricted $\partial f$ (i.e. the restriction of $\partial f$ to the tangent space of $\pi(\mathcal{P})\subset\mathcal{G}$) are simply those of $J_{f\circ\pi}\cdot J_{\pi}^{-1}$, but this is not meaningful as $J_{\pi}$ is a rectangular matrix: instead, the values we seek are the roots $(\delta_{i})$ of the more general characteristic equation

$$\det\big(\delta^{2}{J_{\pi}}'J_{\pi}-{J_{f\circ\pi}}'J_{f\circ\pi}\big)=0.\qquad(12)$$

This construction is invariant under local reparametrisation: if we have two different parametrisations of the same subspace of $\mathcal{G}$, i.e. $\pi_{1}:\mathcal{P}_{1}\to\mathcal{G}$ and $\pi_{2}:\mathcal{P}_{2}\to\mathcal{G}$, with $\pi_{1}^{-1}\circ\pi_{2}$ locally invertible, then the above equation does not depend on the choice $(\pi_{1},\mathcal{P}_{1})$ or $(\pi_{2},\mathcal{P}_{2})$: in other words it depends only on $f$ and on the subspace of $\mathcal{G}$.
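Computationally, (12) is a symmetric-definite generalized eigenproblem, which standard libraries solve directly. A minimal sketch (with random matrices as stand-ins for $J_{\pi}$ and $J_{f\circ\pi}$; in a real calibration they would come from the bumping procedure):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
J_pi = rng.standard_normal((20, 5))    # d(generator)/d(params): 20 x 5
J_fp = rng.standard_normal((30, 5))    # d(term structure)/d(params): 30 x 5

# det(delta^2 J_pi'J_pi - J_fp'J_fp) = 0 is the generalized eigenproblem
# (J_fp'J_fp) v = delta^2 (J_pi'J_pi) v, with J_pi'J_pi positive definite.
A, B = J_pi.T @ J_pi, J_fp.T @ J_fp
deltas = np.sqrt(eigh(B, A, eigvals_only=True))[::-1]   # descending order
```

Each $\delta_i$ returned this way makes the matrix in (12) singular, which is easy to verify after the fact.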

4.3 Empirical work

We fitted an 18-state matrix to US bond data from the manufacturing sector on 05-Feb-20. To reduce the dimensionality of the problem further we used the same trick as in §2, specifying sub- and super-diagonal elements for AA, A, BBB, BB, B, CCC and dealing with rating modifiers by interpolation: this gives 14 parameters ($=2\times 6+n_{\varphi}$). The performance surface was quite flat: different parameter sets gave almost the same quality of fit. We have more to discuss, but the reader may wish to glance at Figure 5, which shows the end result. To avoid cluttering the plot, rating modifiers are not shown, so that for example BBB+, BBB, BBB- are all coloured the same as BBB. One obvious feature is the high degree of dispersion of bond spreads around their fitted curves. We can also see from Figure 6(a) that there is no realistic way of fitting anywhere near $n^{2}=324$ parameters, as the singular values roll off so rapidly. This first graph can be constructed for any generator, so it has nothing to do with TDST. Turning next to Figure 6(b), which pertains to TDST, we see from the green trace that some instability is likely, as the smallest normalised singular values are quite small. This points to the conclusion that further constraining is required. We have already anticipated that volatility assumptions need to be incorporated, and so we address that next.

4.4 Adding spread volatility

If we wish to capture spread volatility in a realistic way, we should augment the model to allow the generator to be stochastic, thereby capturing day-to-day fluctuations in spread that are not associated with rating transition. Let us therefore return to §3.1 and revisit the idea of a CIR process to drive this. Equation (8), giving the joint law of $X_{t}$ and $\exp(\overline{G}\int_{0}^{t}X_{s}\,ds)$, allows a wide range of contingent claims to be valued. However, single-name CDS options scarcely trade (see [Martin11c] for a review and discussion of this subject), so there would be nothing to calibrate to.

A more practical idea is to compute the instantaneous spread volatility and attempt to match it to historical data. As the effect of $X$ is approximately to scale the credit spreads proportionally, the instantaneous volatility of the period-$T$ spread for rating $i$ is

$$\sigma^{s}_{i}(T)\approx\left(\sum_{j=1}^{n}G_{ij}\big(\ln s_{j}(T)-\ln s_{i}(T)\big)^{2}+\sigma^{2}\right)^{1/2}\qquad(13)$$

where $\sigma$ is the volatility parameter in the CIR process.
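A short sketch of (13), again our own illustration: the model spreads are taken as Z-spreads from the first toy generator of Figure 4, and the CIR parameter $\sigma$ is an arbitrary assumed value, not a calibrated one:

```python
import numpy as np
from scipy.linalg import expm

G = np.array([[-0.14,  0.10,  0.03, 0.01],
              [ 0.00, -0.05,  0.00, 0.05],
              [ 0.00,  0.05, -0.15, 0.10],
              [ 0.00,  0.00,  0.00, 0.00]])
sigma, T = 0.15, 5.0                       # assumed CIR vol; tenor in years

s = -np.log(1.0 - expm(G*T)[:-1, -1])/T    # Z-spreads s_i(T), i = A, B, C
n = len(s)
vol = np.sqrt([sum(G[i, j]*(np.log(s[j]) - np.log(s[i]))**2
                   for j in range(n) if j != i) + sigma**2
               for i in range(n)])         # instantaneous spread vols, (13)
```

The transition term is nonnegative, so each $\sigma^s_i(T)$ is floored at the CIR contribution $\sigma$.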

Turning now to empirical matters, we make some simple observations, based on time series analysis of some 1500 bond and CDS spreads going back 10–20 years. The first is that credit spread volatility is roughly 40%, and that this does not depend strongly on rating, tenor, or spread level. The second is that there is little evidence of mean reversion, and so $\kappa$ in (7) needs to be quite low: we have fixed it at 0.1 (/year), though the model calibration is not sensitive to this choice. Thus, at the expense of adding one more parameter, $\sigma$ (as $\overline{X}$ can be set to unity), we can penalise the deviation of each $\sigma^{s}_{i}(T)$ from our assumed value of 0.4. This was found to stabilise the calibration.

We can now redo the spectral analysis of Figure 6, augmenting the pricing map $f$ so that it takes a given generator $G\in\mathcal{G}$ to the term structure of spread and also of instantaneous volatility, notated $f:\mathcal{G}\to\mathcal{S}\times\mathcal{V}$. The singular values, as expected, roll off more slowly (blue trace), showing that new information is being supplied by the volatility term structure. By Figure 6(a) this is, unsurprisingly, still not enough to allow 324 parameters to be fitted, but Figure 6(b) suggests that enough stability may be conferred within the TDST parametrisation, which is what we found in practice.

A final point about this kind of model is that it is capable of capturing the effects seen in early 2008, when high-grade credits had inverted yield curves even at modest spreads of 100–200bp. As we said earlier, a static generator cannot deal with this, as the spread curve for the best rating state cannot possibly be inverted, and in any realistic parametrisation none of the investment-grade curves will be. But with a dynamic generator this is now possible, if one introduces $X_{0}$ into the calibration. When it is very much higher than its long-term mean, short-term spreads will be elevated, as happened in that time period.

         A       B       C    $\maltese$
A     -0.14    0.10    0.03    0.01
B      0.00   -0.05    0.00    0.05
C      0.00    0.05   -0.15    0.10

         A       B       C    $\maltese$
A     -0.14    0.10    0.03    0.01
B      0.20   -0.41    0.16    0.05
C      0.00    0.05   -0.15    0.10

Figure 4: Two different hypothetical generators (last row omitted) and corresponding term structures, which are seen to be virtually identical.
Figure 5: Implied par spreads from fitting to US bond data (manufacturing sector, 05-Feb-20).
Figure 6: Normalised spectrum $\hat{\delta}_{i}$ vs $i$ of $\partial f$ (the derivative of the pricing map), using the model calibration for Figure 5. In each case "spreads" means $f:\mathcal{G}\to\mathcal{S}$, and "spreads+vols" means $f:\mathcal{G}\to\mathcal{S}\times\mathcal{V}$. Plot (a) refers to the map $f$, and (b) to its restriction to $\pi(\mathcal{P})\subset\mathcal{G}$, using (12) to compute the singular values.

5 Conclusions

We have presented a new model (TDST) based on a tridiagonal generator matrix coupled with a stochastic time change, which allows multiple downgrades to occur with a probability that is not unrealistically low. The model may be calibrated historically or to bond/CDS spreads, and the relevant conclusions are given below. An incidental advantage is that the matrix exponential is generally easy to calculate.

With regard to historical calibration, the presence of a 'WR/NR' (unrated or withdrawn rating) state, and the fact that some transitions are very rare, give rise to considerable uncertainty as to the true underlying transition probabilities. The TDST model as described here is sufficiently flexible to calibrate within these uncertainties, without having too many parameters. If the model is made time-varying, two factors are needed to capture the volatility of upgrades and downgrades.

With regard to risk-neutral calibration, it is impossible to identify a generator matrix unambiguously from the term structure alone, and two ingredients are essential: (i) dimensionality reduction, e.g. through TDST, and (ii) information about volatility. Once this is provided, the calibration works well even in the presence of real data, which must be expected to be very 'noisy'.

Using the theory of §4.2 to analyse the derivative of the pricing map should be a standard procedure in any calibration exercise. It is not at all restricted to the application in this paper.

Acknowledgements

I thank Di Wang and Huong Vu for their work in earlier stages of this project, and Roland Ordovàs and Misha van Beek for helpful discussions.

Appendix

A.1 Comment on the Gamma process

In the limit of one or more super- or sub-diagonal elements of $H$ being zero, i.e. an intransitive generator, the $H=D\widetilde{H}D^{-1}$ construction will no longer work, because $D$ becomes singular. However, in the case of the Gamma model the transition matrix is simply a matrix power, as it is

$$\exp(\varphi(H)t)=(I-H/\beta)^{-\beta t}.$$

Going back to (2), we can now see why the second expression is a valid transition matrix. It can be computed directly, as follows. First, compute $A=(I-H/\beta)^{-1}$, which is most easily done by LU factorisation of $I-H/\beta$. Next, raise this to the power $\nu=\beta t$. As raising a matrix to an integer power is easy (by successive squaring and using the binary representation of the exponent), we only have to deal with the case $\nu\in(0,1)$. Applying the binomial theorem twice and interchanging the order of summation:

$$A^{\nu}=(I+(A-I))^{\nu}=\sum_{r=0}^{m}\frac{\Gamma(\nu+1)(A-I)^{r}}{\Gamma(\nu+1-r)\,r!}+\epsilon_{m}=\sum_{r=0}^{m}\frac{\Gamma(\nu+1)}{\Gamma(\nu+1-r)\,r!}\sum_{k=0}^{r}(-)^{r-k}{r\choose k}A^{k}+\epsilon_{m}=\sum_{k=0}^{m}A^{k}c_{k}(\nu)+\epsilon_{m}$$

$$c_{k}(\nu)=\frac{1}{k!}\sum_{r=k}^{m}\frac{(-)^{r-k}\,\Gamma(\nu+1)}{(r-k)!\,\Gamma(\nu+1-r)}$$

where $\epsilon_{m}$ denotes the truncation error. It is easy to see that $c_{k}(\nu)$ is a polynomial of degree $m$ that vanishes at $\nu=0,1,\ldots,k-1,k+1,\ldots,m$, while $c_{k}(k)=1$ (see [Gradshteyn94, §0.154]). Therefore this method writes $A^{\nu}$ as the Lagrange interpolant between $I,A^{1},\ldots,A^{m}$; the simplest formula is $m=1$, which gives a linear interpolation between $I$ and $A$. The expansion is convergent because in context the eigenvalues of $A$ lie in $(0,1]$.
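A hedged sketch of the expansion (our code, using the equivalent direct form $A^{\nu}\approx\sum_{r=0}^{m}\binom{\nu}{r}(A-I)^{r}$ rather than regrouping into the $c_{k}(\nu)$ coefficients; the toy matrix has eigenvalues 1.0 and 0.7, inside the stated region of convergence $(0,1]$):

```python
import numpy as np
from scipy.special import binom

def frac_power(A, nu, m=40):
    """A^nu for nu in (0,1) via the truncated binomial series
    sum_{r<=m} binom(nu, r) (A - I)^r; needs eigenvalues of A in (0, 1]."""
    out = np.zeros_like(A)
    term = np.eye(A.shape[0])          # (A - I)^0
    B = A - np.eye(A.shape[0])
    for r in range(m + 1):
        out = out + binom(nu, r)*term  # binom handles real nu via Gamma
        term = term @ B
    return out

A = np.array([[0.8, 0.2],
              [0.1, 0.9]])             # eigenvalues 1.0 and 0.7
half = frac_power(A, 0.5)              # matrix square root via the series
```

Squaring `half` should recover `A`, which is a convenient self-check on the truncation.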