
Markovian projections for Itô semimartingales with jumps

Martin Larsson (Department of Mathematical Sciences, Carnegie Mellon University, larsson@cmu.edu)    Shukun Long (Department of Mathematical Sciences, Carnegie Mellon University, shukunl@andrew.cmu.edu)
Abstract

Given a general Itô semimartingale, its Markovian projection is an Itô process, with Markovian differential characteristics, that matches the one-dimensional marginal laws of the original process. We construct Markovian projections for Itô semimartingales with jumps, whose flows of one-dimensional marginal laws are solutions to non-local Fokker–Planck–Kolmogorov equations (FPKEs). As an application, we show how Markovian projections appear in building calibrated diffusion/jump models with both local and stochastic features.

1 Introduction

The Markovian projection arises in the problem where we want to mimic the one-dimensional marginal laws of an Itô process using another one with simpler dynamics. More specifically, suppose we are given an Itô process $X$ whose characteristics are general stochastic processes. Our goal is to find another Itô process $\widehat{X}$ solving a Markovian SDE, i.e. one whose coefficients are functions of time and the process itself, such that the law of $\widehat{X}_{t}$ agrees with the law of $X_{t}$ for every $t\geq 0$. The process $\widehat{X}$ is called a Markovian projection of $X$.

The terminology Markovian projection has no standard definition, but is widely used in the literature. Some authors require the mimicking process $\widehat{X}$ to be a true Markov process, while others (including this paper) only require $\widehat{X}$ to solve a Markovian SDE, in which case the Markov property is not guaranteed in general. Some authors also prefer alternative terminology such as “mimicking process” or “mimicking theorem” when referring to the same problem.

The idea of Markovian projections for Itô processes originated with Gyöngy [8], who was inspired by Krylov [12]. In [8], Markovian projections were constructed for continuous Itô semimartingales, under some boundedness and non-degeneracy conditions on the coefficients. Brunick and Shreve [5] extended the results of [8] by relaxing the assumptions therein to an integrability condition. They also proved mimicking theorems for functionals of sample paths, such as the running average and running maximum, using techniques of updating functions. Bentata and Cont [4] studied Markovian projections for Itô semimartingales with jumps. Their proof was based on a uniqueness result for the FPKE, and the mimicking process they constructed was Markov. To obtain such results, they imposed relatively strong assumptions on the coefficients, such as continuity, which is not always easy to check in practice. See also Köpfer and Rüschendorf [11] for work closely related to [4].

In this paper, we construct Markovian projections for càdlàg Itô semimartingales. Our results hold under reasonable integrability and growth conditions. In the context of mimicking the marginal laws of the process itself, this paper complements Brunick and Shreve [5] by allowing the diffusion process to have jumps. On the other hand, we work in a different setting from Bentata and Cont [4]. Our assumptions are weaker in most cases, at the cost of not guaranteeing the uniqueness and Markov property of the mimicking process. One of our main tools is the superposition principle established by Röckner, Xie and Zhang [16], which constitutes a bridge from weak solutions of non-local FPKEs to martingale solutions for the associated non-local operator. The idea of using a superposition principle to prove a mimicking theorem seems to have been first used in Lacker, Shkolnikov and Zhang [14].

This paper is organized as follows. In Section 2 we gather all the required preliminaries. In Section 3 we state and prove our main result (Theorem 3.2). In Section 4 we provide several examples to illustrate how the theorem can be applied.

Throughout this paper, we let $(\Omega,\mathcal{F},(\mathcal{F}_{t})_{t\geq 0},\mathbb{P})$ be a filtered probability space satisfying the usual conditions, and we use the following notation:

  • $\mathbb{R}_{+}=[0,\infty)$.

  • $\mathbb{S}_{+}^{d}$ is the set of symmetric positive semi-definite $d\times d$ real matrices.

  • $C_{0}(\mathbb{R}^{d})$ (resp. $C_{c}(\mathbb{R}^{d})$) is the set of continuous functions on $\mathbb{R}^{d}$ which vanish at infinity (resp. have compact support).

  • $\mu(f)=\int f\,d\mu$, for $\mu$ a measure and $f$ a measurable function on some space such that the integral is well-defined.

  • $\mathcal{P}(X)$ is the space of Borel probability measures on a Polish space $X$, endowed with the topology of weak convergence.

2 Prerequisites and Preliminary Results

This section serves as preparation for stating and proving our main results. In the sequel, we review some standard notions and present two key lemmas.

2.1 Transition Kernel

In the study of the characteristics of Itô semimartingales with jumps (see Section 2.3), and in other fields such as Markov processes, the notion of a transition kernel comes into play. In this subsection, we recall some standard definitions and fix some terminology for later use.

Definition 2.1 (Transition kernel).

Let $(X,\mathcal{A})$, $(Y,\mathcal{B})$ be two measurable spaces. We call $\kappa:X\times\mathcal{B}\to[0,\infty]$ a transition kernel from $(X,\mathcal{A})$ to $(Y,\mathcal{B})$ if:

  (i) for each $x\in X$, the map $\kappa(x,\cdot):\mathcal{B}\to[0,\infty]$ is a measure,

  (ii) for each $B\in\mathcal{B}$, the map $\kappa(\cdot,B):X\to[0,\infty]$ is a measurable function.

We often say $\kappa$ is a transition kernel from $X$ to $Y$ if there is no ambiguity about the $\sigma$-algebras $\mathcal{A}$, $\mathcal{B}$. Unless otherwise specified, on a topological space we consider its Borel $\sigma$-algebra; on a product space we consider the product $\sigma$-algebra. In particular, when working with stochastic processes, we assume by default that $\Omega\times\mathbb{R}_{+}$ is equipped with the $\sigma$-algebra $\mathcal{F}\times\mathcal{B}(\mathbb{R}_{+})$. If we require stronger measurability, e.g. with respect to the predictable $\sigma$-algebra, we will say so explicitly.
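To fix ideas, a simple example: the Gaussian kernel

$\kappa(x,B)=\int_{B}\frac{1}{\sqrt{2\pi}}e^{-(y-x)^{2}/2}\,dy,\quad x\in\mathbb{R},\ B\in\mathcal{B}(\mathbb{R}),$

is a transition kernel from $\mathbb{R}$ to $\mathbb{R}$: each $\kappa(x,\cdot)$ is a (probability) measure, and $x\mapsto\kappa(x,B)$ is continuous by dominated convergence, hence measurable.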

When $X=\Omega$, we also call $\kappa$ a random measure. We often use the notation $\kappa(dy)$, omitting its dependence on $\omega\in\Omega$. When $X=\Omega\times\mathbb{R}_{+}$, for fixed $t\geq 0$ the map

$\Omega\times\mathcal{B}\ni(\omega,B)\mapsto\kappa(\omega,t,B)\in[0,\infty]$

is a random measure, and we denote it by $\kappa_{t}(dy)$.

The following terminology will be convenient for our later use.

Definition 2.2.

Let $(X,\mathcal{A})$, $(Y,\mathcal{B})$ be two measurable spaces, and let $\kappa$ be a transition kernel from $X$ to $Y$.

  (i) We say $\kappa$ is a finite transition kernel if for each $x\in X$, $\kappa(x,dy)$ is a finite measure on $Y$.

  (ii) When $Y=\mathbb{R}^{d}$, we say $\kappa$ is a Lévy transition kernel if for each $x\in X$, $\kappa(x,dy)$ is a Lévy measure on $\mathbb{R}^{d}$, i.e.

  $\kappa(x,\{0\})=0\quad\text{and}\quad\int_{\mathbb{R}^{d}}1\land|y|^{2}\,\kappa(x,dy)<\infty.$

  (iii) When $X=\Omega\times\mathbb{R}_{+}$ and $\mathcal{A}$ is the predictable $\sigma$-algebra, we say $\kappa$ is a predictable transition kernel. That is, for each $B\in\mathcal{B}$, $(\omega,t)\mapsto\kappa(\omega,t,B)$ is a predictable process.
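For example, for $\alpha\in(0,2)$ the $\alpha$-stable kernel $\kappa(x,dy)=|y|^{-1-\alpha}\,dy$ on $\mathbb{R}\setminus\{0\}$ (constant in $x$) is a Lévy transition kernel: near the origin the integrand $1\land|y|^{2}$ contributes $|y|^{1-\alpha}$, which is integrable since $\alpha<2$, while the tail $|y|^{-1-\alpha}$ is integrable since $\alpha>0$.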

2.2 Key Lemmas

We now present two lemmas which are crucial for proving our main results. These lemmas are also of interest in their own right. The first one, quoted below, was proved by Brunick and Shreve [5].

Lemma 2.3 (cf. [5], Proposition 5.1).

Let $X$ be an $\mathbb{R}^{d}$-valued measurable process, and let $\alpha$ be a $C$-valued measurable process, where $C\subseteq\mathbb{R}^{n}$ is a closed convex set, satisfying

$\mathbb{E}\biggl[\int_{0}^{t}|\alpha_{s}|\,ds\biggr]<\infty,\quad\forall\,t>0.$

Then, there exists a measurable function $a:\mathbb{R}_{+}\times\mathbb{R}^{d}\to C$ such that for Lebesgue-a.e. $t\geq 0$,

$a(t,X_{t})=\mathbb{E}[\alpha_{t}\,|\,X_{t}].$
Remark 2.4.

For each fixed $t\geq 0$, it is standard that $\mathbb{E}[\alpha_{t}\,|\,X_{t}]$ is some measurable function of $X_{t}$. However, the joint measurability of $a$ in $(t,x)$ is less obvious, and this is the key point of Lemma 2.3. The proof of this lemma is constructive. Indeed, one defines the $\sigma$-finite measure $\mu$ and the $\sigma$-finite vector-valued measure $\nu$ via

$\mu(A)\coloneqq\mathbb{E}\biggl[\int_{0}^{\infty}\bm{1}_{A}(s,X_{s})\,ds\biggr],\quad A\in\mathcal{B}(\mathbb{R}_{+}\times\mathbb{R}^{d}),$
$\nu(A)\coloneqq\mathbb{E}\biggl[\int_{0}^{\infty}\alpha_{s}\bm{1}_{A}(s,X_{s})\,ds\biggr],\quad A\in\mathcal{B}(\mathbb{R}_{+}\times\mathbb{R}^{d}).$   (2.1)

Clearly, $\nu\ll\mu$. Then, one can choose the function $a$ to be any version of the Radon–Nikodym derivative $\frac{d\nu}{d\mu}$. For more details, see the proof in [5].

The second lemma is novel, and it is an analogue of Lemma 2.3 in terms of transition kernels. We will construct a kernel $k(t,x,d\xi)$ from $\mathbb{R}_{+}\times\mathbb{R}^{d}$ to $\mathbb{R}^{d}$ satisfying an identity involving conditional expectations. The key point is to find a family of measures indexed by $(t,x)$ while simultaneously preserving joint measurability in $(t,x)$.

Lemma 2.5.

Let $X$ be an $\mathbb{R}^{d}$-valued measurable process, and let $\kappa$ be a transition kernel from $\Omega\times\mathbb{R}_{+}$ to $\mathbb{R}^{d}$ satisfying

$\mathbb{E}\biggl[\int_{0}^{t}\kappa_{s}(\mathbb{R}^{d})\,ds\biggr]<\infty,\quad\forall\,t>0.$   (2.2)

Then, there exists a finite transition kernel $k$ from $\mathbb{R}_{+}\times\mathbb{R}^{d}$ to $\mathbb{R}^{d}$ such that for Lebesgue-a.e. $t\geq 0$,

$k(t,X_{t},A)=\mathbb{E}[\kappa_{t}(A)\,|\,X_{t}],\quad\forall\,A\in\mathcal{B}(\mathbb{R}^{d}).$   (2.3)
Proof.

By the integrability condition (2.2), without loss of generality we may assume that $\kappa$ is a finite transition kernel. Otherwise, we can simply modify $\kappa(\cdot,\cdot,d\xi)\coloneqq 0$ on a $(\mathbb{P}\otimes dt)$-null set.

Our proof is based on the Riesz–Markov–Kakutani representation theorem for the dual space of $C_{0}(\mathbb{R}^{d})$. Since nonzero constant functions do not belong to $C_{0}(\mathbb{R}^{d})$, for technical reasons we first consider the function space

$C_{\ell}(\mathbb{R}^{d})\coloneqq C_{0}(\mathbb{R}^{d})\oplus\mathbb{R}=\{f+c:f\in C_{0}(\mathbb{R}^{d}),\,c\in\mathbb{R}\}.$

In other words, $C_{\ell}(\mathbb{R}^{d})$ is the space of continuous functions on $\mathbb{R}^{d}$ which admit a finite limit at infinity. We endow $C_{\ell}(\mathbb{R}^{d})$ with the supremum norm. Since $C_{0}(\mathbb{R}^{d})$ is a separable Banach space, it is easy to check that $C_{\ell}(\mathbb{R}^{d})$ is also a separable Banach space. Let $\mathcal{C}$ be a countable dense subset of $C_{\ell}(\mathbb{R}^{d})$ with $1\in\mathcal{C}$. Let $\mathcal{L}$ be the $\mathbb{Q}$-span of $\mathcal{C}$, i.e. the collection of all finite linear combinations of elements of $\mathcal{C}$ with rational coefficients. Clearly, $\mathcal{L}$ is a countable dense subset of $C_{\ell}(\mathbb{R}^{d})$ with $1\in\mathcal{L}$. Moreover, $\mathcal{L}$ is a vector space over $\mathbb{Q}$ by construction.

For each $\varphi\in\mathcal{L}$, by (2.2) and Lemma 2.3, there exists an $\mathbb{R}$-valued measurable function of $(t,x)\in\mathbb{R}_{+}\times\mathbb{R}^{d}$, denoted by $L_{t,x}(\varphi)$, such that for Lebesgue-a.e. $t\geq 0$,

$L_{t,X_{t}}(\varphi)=\mathbb{E}\biggl[\int_{\mathbb{R}^{d}}\varphi(\xi)\,\kappa_{t}(d\xi)\,\bigg|\,X_{t}\biggr].$   (2.4)

Now for fixed $(t,x)\in\mathbb{R}_{+}\times\mathbb{R}^{d}$, we can view $\varphi\mapsto L_{t,x}(\varphi)$ as a functional on $\mathcal{L}$. We expect $L_{t,x}$ to be a positive $\mathbb{Q}$-linear functional, but this is not guaranteed unless, for each $\varphi\in\mathcal{L}$, we carefully modify the function $(t,x)\mapsto L_{t,x}(\varphi)$.

As discussed in Remark 2.4, $(t,x)\mapsto L_{t,x}(\varphi)$ is defined via the Radon–Nikodym derivative $\frac{d\nu_{\varphi}}{d\mu}$, where $\mu$ is as defined in (2.1) and

$\nu_{\varphi}(A)\coloneqq\mathbb{E}\biggl[\int_{0}^{\infty}\bm{1}_{A}(s,X_{s})\int_{\mathbb{R}^{d}}\varphi(\xi)\,\kappa_{s}(d\xi)\,ds\biggr],\quad A\in\mathcal{B}(\mathbb{R}_{+}\times\mathbb{R}^{d}).$

For $\varphi\in\mathcal{L}$ with $\varphi\geq 0$, we have that $\nu_{\varphi}$ is a (positive) measure, so there exists a $\mu$-null set $N_{\varphi}$ such that for all $(t,x)\notin N_{\varphi}$,

$L_{t,x}(\varphi)\geq 0.$   (2.5)

For $\varphi,\psi\in\mathcal{L}$ and $p,q\in\mathbb{Q}$, by the uniqueness of the Radon–Nikodym derivative, there exists a $\mu$-null set $N_{\varphi,\psi,p,q}$ such that for all $(t,x)\notin N_{\varphi,\psi,p,q}$,

$L_{t,x}(p\varphi+q\psi)=pL_{t,x}(\varphi)+qL_{t,x}(\psi).$   (2.6)

We define the $\mu$-null set

$N\coloneqq\Biggl(\bigcup_{\varphi\in\mathcal{L},\,\varphi\geq 0}N_{\varphi}\Biggr)\cup\Biggl(\bigcup_{\varphi,\psi\in\mathcal{L},\,p,q\in\mathbb{Q}}N_{\varphi,\psi,p,q}\Biggr).$

For each $\varphi\in\mathcal{L}$, we modify $L_{t,x}(\varphi)\coloneqq 0$ for $(t,x)\in N$ and keep the same notation. Now by construction, (2.5) holds for all $(t,x)\in\mathbb{R}_{+}\times\mathbb{R}^{d}$ and $\varphi\in\mathcal{L}$ with $\varphi\geq 0$, and (2.6) holds for all $(t,x)\in\mathbb{R}_{+}\times\mathbb{R}^{d}$, $\varphi,\psi\in\mathcal{L}$, $p,q\in\mathbb{Q}$. Thus, for fixed $(t,x)$ we see that $L_{t,x}$ is a positive $\mathbb{Q}$-linear functional on $\mathcal{L}$. Moreover, for each $\varphi\in\mathcal{L}$, the function $(t,x)\mapsto L_{t,x}(\varphi)$ is still a version of $\frac{d\nu_{\varphi}}{d\mu}$, so (2.4) remains true for Lebesgue-a.e. $t\geq 0$.

The next step is to extend $L_{t,x}$ to $C_{\ell}(\mathbb{R}^{d})$ for each fixed $(t,x)\in\mathbb{R}_{+}\times\mathbb{R}^{d}$. Let $\varphi\in\mathcal{L}$, and take a sequence $(q_{n})_{n\in\mathbb{N}}\subset\mathbb{Q}$ decreasing to $\lVert\varphi\rVert_{\infty}$. Note that $|\varphi|\leq q_{n}$ for all $n$, so it follows that

$L_{t,x}(\varphi)=-L_{t,x}(q_{n}-\varphi)+q_{n}L_{t,x}(1)\leq q_{n}L_{t,x}(1),$
$L_{t,x}(\varphi)=L_{t,x}(q_{n}+\varphi)-q_{n}L_{t,x}(1)\geq-q_{n}L_{t,x}(1),$

i.e. $|L_{t,x}(\varphi)|\leq q_{n}L_{t,x}(1)$. Letting $n\to\infty$, we obtain

$|L_{t,x}(\varphi)|\leq L_{t,x}(1)\lVert\varphi\rVert_{\infty}.$   (2.7)

By (2.7) and the density of $\mathcal{L}$ in $C_{\ell}(\mathbb{R}^{d})$, we can uniquely extend $L_{t,x}$ to a bounded linear functional on $C_{\ell}(\mathbb{R}^{d})$, and (2.7) holds for all $\varphi\in C_{\ell}(\mathbb{R}^{d})$. (This extension is based on a standard argument. One delicate point is that $\mathcal{L}$ is a vector space over $\mathbb{Q}$ while $C_{\ell}(\mathbb{R}^{d})$ is a vector space over $\mathbb{R}$; in the proof of the linearity of $L_{t,x}$ on $C_{\ell}(\mathbb{R}^{d})$, we need an extra step using the density of $\mathbb{Q}$ in $\mathbb{R}$.) Moreover, let $\varphi\in C_{\ell}(\mathbb{R}^{d})$ with $\varphi\geq 0$, and take a sequence $(\varphi_{n})_{n\in\mathbb{N}}\subset\mathcal{L}$ converging to $\varphi$. Let $0<\varepsilon\in\mathbb{Q}$. Since $\varphi_{n}\geq-\varepsilon$ for $n$ large enough and $L_{t,x}$ is positive on $\mathcal{L}$, it follows that

$L_{t,x}(\varphi)=\lim_{n\to\infty}L_{t,x}(\varphi_{n})=\lim_{n\to\infty}L_{t,x}(\varphi_{n}+\varepsilon)-\varepsilon L_{t,x}(1)\geq-\varepsilon L_{t,x}(1).$

Sending $\varepsilon\to 0$ along rational numbers, we get $L_{t,x}(\varphi)\geq 0$. Thus, $L_{t,x}$ is a positive bounded linear functional on $C_{\ell}(\mathbb{R}^{d})$, and in particular on $C_{0}(\mathbb{R}^{d})$. By the Riesz–Markov–Kakutani representation theorem, there exists a finite (positive) Radon measure, denoted by $k(t,x,d\xi)$, such that

$L_{t,x}(\varphi)=\int_{\mathbb{R}^{d}}\varphi(\xi)\,k(t,x,d\xi),\quad\forall\,\varphi\in C_{0}(\mathbb{R}^{d}).$   (2.8)

We claim that $k$ is a finite transition kernel from $\mathbb{R}_{+}\times\mathbb{R}^{d}$ to $\mathbb{R}^{d}$. For fixed $(t,x)\in\mathbb{R}_{+}\times\mathbb{R}^{d}$, by construction $k(t,x,d\xi)$ is a finite measure. On the other hand, $L_{t,x}(\varphi)$ is measurable in $(t,x)$ for all $\varphi\in\mathcal{L}$, and thus for all $\varphi\in C_{0}(\mathbb{R}^{d})$ by pointwise convergence. Since the indicator function of an open cube can be approximated by functions in $C_{0}(\mathbb{R}^{d})$, from (2.8) and the monotone convergence theorem we know that $k(t,x,A)$ is measurable in $(t,x)$ for all open cubes $A$. Then by Dynkin’s $\pi$-$\lambda$ theorem, measurability holds for all $A\in\mathcal{B}(\mathbb{R}^{d})$. This proves our claim.

It only remains to verify (2.3) for Lebesgue-a.e. $t\geq 0$. The way we argue is similar to the previous paragraph. We already know that for Lebesgue-a.e. $t\geq 0$:

  (i) (2.4) holds for all $\varphi\in\mathcal{L}$, since $\mathcal{L}$ is countable,

  (ii) $\mathbb{E}[\kappa_{t}(\mathbb{R}^{d})]<\infty$, due to (2.2).

We fix such a “good” $t$. Now for $\varphi\in C_{0}(\mathbb{R}^{d})$, take a sequence in $\mathcal{L}$ converging to $\varphi$. By pointwise convergence on the left-hand side and $L^{1}$-convergence on the right-hand side of (2.4), it is easy to check that (2.4) holds for all $\varphi\in C_{0}(\mathbb{R}^{d})$. Then by (2.8) and the monotone convergence theorem, we know that (2.3) holds for all open cubes $A$. Finally, Dynkin’s $\pi$-$\lambda$ theorem yields that (2.3) holds for all $A\in\mathcal{B}(\mathbb{R}^{d})$. This finishes the proof. ∎

Remark 2.6.

Under the framework of Lemma 2.5, with a bit more effort, one can show that for Lebesgue-a.e. $t\geq 0$,

$\int_{\mathbb{R}^{d}}g(X_{t},\xi)\,k(t,X_{t},d\xi)=\mathbb{E}\biggl[\int_{\mathbb{R}^{d}}g(X_{t},\xi)\,\kappa_{t}(d\xi)\,\bigg|\,X_{t}\biggr]$   (2.9)

holds for all bounded measurable functions $g:\mathbb{R}^{2d}\to\mathbb{R}$. Indeed, (2.3) implies that (2.9) holds for all $g$ of the form $\bm{1}_{A_{1}\times A_{2}}$ with $A_{1},A_{2}\in\mathcal{B}(\mathbb{R}^{d})$. Dynkin’s $\pi$-$\lambda$ theorem then tells us that (2.9) holds for all $g$ of the form $\bm{1}_{E}$ with $E\in\mathcal{B}(\mathbb{R}^{2d})$. Finally, a standard approximation argument yields the desired result.

2.3 Differential Characteristics

In this subsection we briefly review the concept of differential characteristics of Itô semimartingales. For a detailed discussion, the reader can refer to [9], Chapter II.2. Note that in this paper, all semimartingales have càdlàg sample paths by convention.

Definition 2.7.

We say $h:\mathbb{R}^{d}\to\mathbb{R}^{d}$ is a truncation function if $h$ is measurable, bounded and $h(x)=x$ in a neighborhood of $0$.
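For instance, the canonical choice $h(x)=x\bm{1}_{\{|x|\leq 1\}}$ is a truncation function, as is $h(x)=x\bm{1}_{\{|x|\leq r\}}$ for any $r>0$, which is the choice used throughout this paper; note that continuity is not required.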

Now we give the definition of differential characteristics. Recall that an Itô semimartingale is a semimartingale whose characteristics are absolutely continuous in the time variable.

Definition 2.8.

Let $X=(X^{i})_{1\leq i\leq d}$ be an $\mathbb{R}^{d}$-valued Itô semimartingale. The differential characteristics of $X$ associated with a truncation function $h$ is the triplet $(\beta,\alpha,\kappa)$, consisting of:

  (i) $\beta=(\beta^{i})_{1\leq i\leq d}$, an $\mathbb{R}^{d}$-valued predictable process such that $\int_{0}^{\cdot}\beta_{s}\,ds$ is the predictable finite variation part of the special semimartingale

  $X(h)_{t}=X_{t}-\sum_{s\leq t}(\Delta X_{s}-h(\Delta X_{s})).$

  (ii) $\alpha=(\alpha^{ij})_{1\leq i,j\leq d}$, an $\mathbb{S}_{+}^{d}$-valued predictable process such that

  $\int_{0}^{\cdot}\alpha_{s}^{ij}\,ds=\langle X^{i,c},X^{j,c}\rangle,\quad 1\leq i,j\leq d,$

  where $X^{c}=(X^{i,c})_{1\leq i\leq d}$ is the continuous local martingale part of $X$.

  (iii) $\kappa$, a predictable Lévy transition kernel from $\Omega\times\mathbb{R}_{+}$ to $\mathbb{R}^{d}$ such that $\kappa_{t}(d\xi)dt$ is the compensator of the random measure $\mu^{X}$ associated to the jumps of $X$, namely

  $\mu^{X}(dt,d\xi)=\sum_{s>0}\bm{1}_{\{\Delta X_{s}\neq 0\}}\delta_{(s,\Delta X_{s})}(dt,d\xi).$
Remark 2.9.

We require the differential characteristics $(\beta,\alpha,\kappa)$ to be predictable. As discussed in [9], Proposition II.2.9, we can always find such a “good” version. We also note that $\alpha$ and $\kappa$ do not depend on the choice of the truncation function $h$, while $\beta=\beta(h)$ does. For two truncation functions $h$, $\widetilde{h}$, the relationship between the corresponding drifts is given by [9], Proposition II.2.24:

$\beta(h)_{t}-\beta(\widetilde{h})_{t}=\int_{\mathbb{R}^{d}}(h(\xi)-\widetilde{h}(\xi))\,\kappa_{t}(d\xi).$   (2.10)

Using differential characteristics, one can write an Itô semimartingale in its canonical decomposition ([9], Theorem II.2.34):

$X_{t}=X_{0}+\int_{0}^{t}\beta_{s}\,ds+X^{c}_{t}+\int_{0}^{t}\int_{\mathbb{R}^{d}}h(\xi)\,(\mu^{X}(ds,d\xi)-\kappa_{s}(d\xi)ds)+\int_{0}^{t}\int_{\mathbb{R}^{d}}(\xi-h(\xi))\,\mu^{X}(ds,d\xi).$

Possibly after enlarging the probability space, we have the representation $X^{c}=\int_{0}^{\cdot}(\alpha_{s})^{1/2}\,dB_{s}$ for some $d$-dimensional Brownian motion $B$, and this is what we usually see in applications. As our proof does not rely on such Itô integrals, we stick to the more general setting.
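As a basic illustration: if $X$ is a Lévy process with characteristic triplet $(b,a,\nu)$ relative to $h$, then its differential characteristics are constant, namely $\beta_{t}=b$, $\alpha_{t}=a$ and $\kappa_{t}(d\xi)=\nu(d\xi)$; see [9], Chapter II.4, where deterministic characteristics are related to processes with independent increments.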

Finally, we give a well-known property of Itô semimartingales, which will be used in our main results. Since the proof is short, we present it below for completeness.

Proposition 2.10.

Let $X$ be an Itô semimartingale. Then, for each fixed $t\geq 0$, $\Delta X_{t}=0$ $\mathbb{P}$-a.s.

Proof.

Let $\kappa$ be the third differential characteristic of $X$, i.e. $\kappa_{s}(d\xi)ds$ is the compensator of $\mu^{X}$. Fix $t\geq 0$; then by the definition of compensators,

$\mathbb{P}(\Delta X_{t}\neq 0)=\mathbb{E}\biggl[\int_{\mathbb{R}_{+}\times\mathbb{R}^{d}}\bm{1}_{\{s=t\}}\,\mu^{X}(ds,d\xi)\biggr]=\mathbb{E}\biggl[\int_{\mathbb{R}_{+}}\int_{\mathbb{R}^{d}}\bm{1}_{\{s=t\}}\,\kappa_{s}(d\xi)\,ds\biggr]=0.$ ∎

3 Main Results

In this section we present our main results on Markovian projections for Itô semimartingales with jumps. Our proof uses the superposition principle for non-local FPKEs established in [16]. As a consequence, we construct Markovian projections which are solutions to martingale problems, or equivalently, weak solutions to SDEs.

First we recall the notion of a martingale problem. Since we are working with semimartingales with jumps, consider the path space $\mathbb{D}(\mathbb{R}_{+};\mathbb{R}^{d})$ of all càdlàg functions from $\mathbb{R}_{+}$ to $\mathbb{R}^{d}$, endowed with the Skorokhod topology. Let $X$ be the canonical process, i.e. $X_{t}(\omega)=\omega(t)$ for $\omega\in\mathbb{D}(\mathbb{R}_{+};\mathbb{R}^{d})$ and $t\geq 0$. Let $\mathbb{F}^{0}$ be the natural filtration generated by $X$, and $\mathbb{F}$ the right-continuous regularization of $\mathbb{F}^{0}$. Consider the non-local operator $\mathcal{L}=(\mathcal{L}_{t})_{t\geq 0}$ given, for $f\in C^{2}(\mathbb{R}^{d})\cap C_{b}(\mathbb{R}^{d})$ and $x\in\mathbb{R}^{d}$, by

$\mathcal{L}_{t}f(x)\coloneqq b(t,x)\cdot\nabla f(x)+\frac{1}{2}\mathrm{tr}(a(t,x)\nabla^{2}f(x))+\int_{\mathbb{R}^{d}}\bigl(f(x+\xi)-f(x)-\nabla f(x)\cdot\xi\bm{1}_{\{|\xi|\leq r\}}\bigr)\,k(t,x,d\xi),$   (3.1)

where $b:\mathbb{R}_{+}\times\mathbb{R}^{d}\to\mathbb{R}^{d}$, $a:\mathbb{R}_{+}\times\mathbb{R}^{d}\to\mathbb{S}_{+}^{d}$ are measurable functions, $k$ is a Lévy transition kernel from $\mathbb{R}_{+}\times\mathbb{R}^{d}$ to $\mathbb{R}^{d}$, and $r>0$ is a constant.

Definition 3.1 (Martingale Problem).

Let $\mu_{0}\in\mathcal{P}(\mathbb{R}^{d})$. We call $\widehat{\mathbb{P}}\in\mathcal{P}(\mathbb{D}(\mathbb{R}_{+};\mathbb{R}^{d}))$ a solution to the martingale problem (or a martingale solution) for $\mathcal{L}$ with initial law $\mu_{0}$, if

  (i) $\widehat{\mathbb{P}}\circ(X_{0})^{-1}=\mu_{0}$,

  (ii) for each $f\in C_{c}^{2}(\mathbb{R}^{d})$, the process

  $M^{f}_{t}\coloneqq f(X_{t})-f(X_{0})-\int_{0}^{t}\mathcal{L}_{s}f(X_{s})\,ds$

  is well-defined and an $\mathbb{F}$-martingale under $\widehat{\mathbb{P}}$.

Under some regularity conditions, e.g. local boundedness of $b$, $a$, and $\int_{\mathbb{R}^{d}}1\land|\xi|^{2}\,k(\cdot,\cdot,d\xi)$ (which holds under the assumptions of Theorem 3.2), (ii) in Definition 3.1 implies that for each $f\in C^{2}(\mathbb{R}^{d})\cap C_{b}(\mathbb{R}^{d})$, $M^{f}$ is an $\mathbb{F}$-local martingale under $\widehat{\mathbb{P}}$. In particular, by [9], Theorem II.2.42, $X$ admits differential characteristics $b(t,X_{t-})$, $a(t,X_{t-})$ and $k(t,X_{t-},d\xi)$, associated with the truncation function $h(x)=x\bm{1}_{\{|x|\leq r\}}$. We sometimes also say a process $\widetilde{X}$ is a solution to the martingale problem for $\mathcal{L}$. By this, we mean there exist a filtered probability space and an adapted càdlàg process $\widetilde{X}$ on it, such that (i) and (ii) in Definition 3.1 are satisfied by $\widetilde{X}$ on its underlying probability space. We can think of this as an analogue of the notion of weak solutions of SDEs.
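To fix ideas, a simple instance of (3.1): with $d=1$, $b=0$, $a=0$ and $k(t,x,d\xi)=\lambda\,\delta_{1}(d\xi)$ for a constant $\lambda>0$ (and any $r<1$, so that the compensating gradient term vanishes), the operator reduces to

$\mathcal{L}_{t}f(x)=\lambda\bigl(f(x+1)-f(x)\bigr),$

the generator of a Poisson process with rate $\lambda$, and the law of that process solves the corresponding martingale problem. This special case reappears in Section 4.2.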

Now we can state our main results.

Theorem 3.2 (Markovian Projection).

Let $X$ be an $\mathbb{R}^{d}$-valued Itô semimartingale with differential characteristics $(\beta,\alpha,\kappa)$ associated with the truncation function $h(x)=x\bm{1}_{\{|x|\leq r\}}$ for some $r>0$. Suppose that $(\beta,\alpha,\kappa)$ satisfies

$\mathbb{E}\biggl[\int_{0}^{t}\biggl(|\beta_{s}|+|\alpha_{s}|+\int_{\mathbb{R}^{d}}1\land|\xi|^{2}\,\kappa_{s}(d\xi)\biggr)\,ds\biggr]<\infty,\quad\forall\,t>0.$   (3.2)

Then, there exist measurable functions $b:\mathbb{R}_{+}\times\mathbb{R}^{d}\to\mathbb{R}^{d}$, $a:\mathbb{R}_{+}\times\mathbb{R}^{d}\to\mathbb{S}_{+}^{d}$, and a Lévy transition kernel $k$ from $\mathbb{R}_{+}\times\mathbb{R}^{d}$ to $\mathbb{R}^{d}$ such that for Lebesgue-a.e. $t\geq 0$,

$b(t,X_{t-})=\mathbb{E}[\beta_{t}\,|\,X_{t-}],$
$a(t,X_{t-})=\mathbb{E}[\alpha_{t}\,|\,X_{t-}],$
$\int_{A}1\land|\xi|^{2}\,k(t,X_{t-},d\xi)=\mathbb{E}\biggl[\int_{A}1\land|\xi|^{2}\,\kappa_{t}(d\xi)\,\bigg|\,X_{t-}\biggr],\quad\forall\,A\in\mathcal{B}(\mathbb{R}^{d}).$   (3.3)

Furthermore, if $(b,a,k)$ satisfies the condition

$\sup_{(t,x)\in\mathbb{R}_{+}\times\mathbb{R}^{d}}\biggl[\frac{|b(t,x)|}{1+|x|}+\frac{|a(t,x)|}{1+|x|^{2}}+\int_{\mathbb{R}^{d}}\biggl(\bm{1}_{\{|\xi|<r\}}\frac{|\xi|^{2}}{1+|x|^{2}}+\bm{1}_{\{|\xi|\geq r\}}\log\biggl(1+\frac{|\xi|}{1+|x|}\biggr)\biggr)\,k(t,x,d\xi)\biggr]<\infty,$   (3.4)

then there exists a solution $\widehat{X}$ to the martingale problem for $\mathcal{L}$, where $\mathcal{L}$ is as defined in (3.1), such that for each $t\geq 0$, the law of $\widehat{X}_{t}$ agrees with the law of $X_{t}$.

Before proving Theorem 3.2, we make a few remarks to give more insight into this theorem.

Remark 3.3.

Consider the measure $\widetilde{\mu}$ defined as follows:

$\widetilde{\mu}(A)\coloneqq\mathbb{E}\biggl[\int_{0}^{\infty}\bm{1}_{A}(s,X_{s})\,ds\biggr]=\mathbb{E}\biggl[\int_{0}^{\infty}\bm{1}_{A}(s,X_{s-})\,ds\biggr],\quad A\in\mathcal{B}(\mathbb{R}_{+}\times\mathbb{R}^{d}).$

Intuitively, we can think of $\widetilde{\mu}$ as the “law” of $(t,X_{t}(\omega))$ or $(t,X_{t-}(\omega))$ (though $\widetilde{\mu}$ is not a probability measure). One can easily check that the triplet $(b,a,k(\cdot,\cdot,d\xi))$, which satisfies (3.3) for Lebesgue-a.e. $t\geq 0$, is unique up to a $\widetilde{\mu}$-null set. Moreover, the Markovian projection $\widehat{X}$ is a martingale solution for $\mathcal{L}$, regardless of which version of $(b,a,k)$ is used in (3.1). Indeed, for each $f\in C_{c}^{2}(\mathbb{R}^{d})$, the function $(t,x)\mapsto\mathcal{L}_{t}f(x)$ is uniquely defined up to a $\widetilde{\mu}$-null set. We also note that by Fubini’s theorem and the mimicking property, $\widetilde{\mu}$ can be written as

$\widetilde{\mu}(A)=\widehat{\mathbb{E}}\biggl[\int_{0}^{\infty}\bm{1}_{A}(s,\widehat{X}_{s})\,ds\biggr],\quad A\in\mathcal{B}(\mathbb{R}_{+}\times\mathbb{R}^{d}),$

where $\widehat{\mathbb{E}}$ is the expectation on the underlying probability space of $\widehat{X}$. It follows that different versions of $(b,a,k)$ lead to indistinguishable processes $\int_{0}^{\cdot}\mathcal{L}_{s}f(\widehat{X}_{s})\,ds$. As a consequence of this observation, condition (3.4) can be weakened by replacing the supremum with the $\widetilde{\mu}$-essential supremum.

Remark 3.4.

In the theorem we take a truncation function $h(x)=x\bm{1}_{\{|x|\leq r\}}$ for some $r>0$. Recall that $\beta$ depends on $r$, while $\alpha$, $\kappa$ do not. By (2.10), we see that the integrability condition (3.2) does not depend on the choice of $r$. However, the growth condition (3.4) does depend on $r$. One can check that for $0<r<\widetilde{r}$, if (3.4) holds for $r$, then it also holds for $\widetilde{r}$ (note that $b$ also depends on $r$). The converse is not true in general. In applications, we can pick any specific $r$ such that the assumptions of the theorem are satisfied.

Remark 3.5.

Under (3.2), one sufficient condition on $X$ that automatically implies (3.4) with the $\widetilde{\mu}$-essential supremum is the following: the process

$\frac{|\beta_{t}|}{1+|X_{t}|}+\frac{|\alpha_{t}|}{1+|X_{t}|^{2}}+\int_{\mathbb{R}^{d}}\biggl(\bm{1}_{\{|\xi|<r\}}\frac{|\xi|^{2}}{1+|X_{t}|^{2}}+\bm{1}_{\{|\xi|\geq r\}}\log\biggl(1+\frac{|\xi|}{1+|X_{t}|}\biggr)\biggr)\,\kappa_{t}(d\xi)$

(or equivalently with $X$ replaced by $X_{-}$) is bounded up to a $(\mathbb{P}\otimes dt)$-null set. The proof is simply by taking conditional expectations $\mathbb{E}[\cdot\,|\,X_{t-}]$.

Remark 3.6.

In the case where $X$ is a continuous Itô semimartingale, i.e. $\kappa=0$, the growth condition (3.4) is not needed. This is exactly Corollary 3.7 (Process itself) in Brunick and Shreve [5]. We will discuss the continuous case further at the end of this section.

Now we prove our main theorem.

Proof of Theorem 3.2.

The existence of $b$ and $a$ follows from (3.2) and Lemma 2.3, noticing that $\mathbb{S}_{+}^{d}$ is a closed convex set in $\mathbb{R}^{d\times d}$. To get the existence of $k$, consider the transition kernel $\widetilde{\kappa}_{t}(d\xi)\coloneqq 1\land|\xi|^{2}\,\kappa_{t}(d\xi)$ from $\Omega\times\mathbb{R}_{+}$ to $\mathbb{R}^{d}$. Then (3.2) and Lemma 2.5 yield a finite transition kernel $\widetilde{k}$ from $\mathbb{R}_{+}\times\mathbb{R}^{d}$ to $\mathbb{R}^{d}$ such that for Lebesgue-a.e. $t\geq 0$,

$\widetilde{k}(t,X_{t-},A)=\mathbb{E}[\widetilde{\kappa}_{t}(A)\,|\,X_{t-}],\quad\forall\,A\in\mathcal{B}(\mathbb{R}^{d}).$

For $(t,x)\in\mathbb{R}_{+}\times\mathbb{R}^{d}$, define $k(t,x,d\xi)\coloneqq(1\land|\xi|^{2})^{-1}\widetilde{k}(t,x,d\xi)$ on $\mathbb{R}^{d}\setminus\{0\}$ and $k(t,x,\{0\})\coloneqq 0$. Then, $k$ is a Lévy transition kernel from $\mathbb{R}_{+}\times\mathbb{R}^{d}$ to $\mathbb{R}^{d}$ that satisfies (3.3). Moreover, Remark 2.6 further tells us that for Lebesgue-a.e. $t\geq 0$,

$\int_{\mathbb{R}^{d}}g(X_{t-},\xi)\,k(t,X_{t-},d\xi)=\mathbb{E}\biggl[\int_{\mathbb{R}^{d}}g(X_{t-},\xi)\,\kappa_{t}(d\xi)\,\bigg|\,X_{t-}\biggr]$   (3.5)

holds for all measurable functions $g:\mathbb{R}^{2d}\to\mathbb{R}$ satisfying $|g(x,\xi)|\leq C(1\land|\xi|^{2})$ for all $x,\xi\in\mathbb{R}^{d}$ and some constant $C>0$.

Now we prove the second part of Theorem 3.2. By [9], Theorem II.2.42, we know that for each $f\in C_{c}^{2}(\mathbb{R}^{d})$, the process

$M_{t}^{f}\coloneqq f(X_{t})-f(X_{0})-\int_{0}^{t}\biggl(\beta_{s}\cdot\nabla f(X_{s-})+\frac{1}{2}\mathrm{tr}(\alpha_{s}\nabla^{2}f(X_{s-}))+\int_{\mathbb{R}^{d}}\bigl(f(X_{s-}+\xi)-f(X_{s-})-\nabla f(X_{s-})\cdot h(\xi)\bigr)\,\kappa_{s}(d\xi)\biggr)\,ds$

is a local martingale. In particular, $M^{f}$ is locally bounded, thus locally square-integrable, and $\langle M^{f},M^{f}\rangle$ is well-defined. We claim that $M^{f}$ is a (true) martingale. To show this, it suffices to check $\mathbb{E}[\langle M^{f},M^{f}\rangle_{t}]<\infty$ for all $t\geq 0$. Let us first compute $[M^{f},M^{f}]$. Note that $M^{f}-f(X)+f(X_{0})$ is a continuous finite variation process, so we have $[M^{f},M^{f}]=[f(X),f(X)]$. By Itô’s formula, the continuous local martingale part of $f(X)$ is given by $\sum_{i=1}^{d}\int_{0}^{\cdot}\partial_{i}f(X_{s-})\,dX^{i,c}_{s}$. Then, it follows from [9], Theorem I.4.52 that

$[f(X),f(X)]_{t}=\sum_{i=1}^{d}\sum_{j=1}^{d}\int_{0}^{t}\partial_{i}f(X_{s-})\partial_{j}f(X_{s-})\,d\langle X^{i,c},X^{j,c}\rangle_{s}+\sum_{s\leq t}(f(X_{s})-f(X_{s-}))^{2}$
$=\int_{0}^{t}\nabla f(X_{s-})\cdot\alpha_{s}\nabla f(X_{s-})\,ds+\int_{0}^{t}\int_{\mathbb{R}^{d}}(f(X_{s-}+\xi)-f(X_{s-}))^{2}\,\mu^{X}(ds,d\xi).$

Since $\langle M^{f},M^{f}\rangle$ is the compensator of $[M^{f},M^{f}]=[f(X),f(X)]$, we deduce that

$\langle M^{f},M^{f}\rangle_{t}=\int_{0}^{t}\biggl(\nabla f(X_{s-})\cdot\alpha_{s}\nabla f(X_{s-})+\int_{\mathbb{R}^{d}}(f(X_{s-}+\xi)-f(X_{s-}))^{2}\,\kappa_{s}(d\xi)\biggr)\,ds\leq C\int_{0}^{t}\biggl(|\alpha_{s}|+\int_{\mathbb{R}^{d}}1\land|\xi|^{2}\,\kappa_{s}(d\xi)\biggr)\,ds,$

where $C=C(\lVert f\rVert_{\infty},\lVert\nabla f\rVert_{\infty})>0$ is some constant, and we used the fact that

$|f(x+\xi)-f(x)|^{2}\leq C(1\land|\xi|^{2}),\quad\forall\,x,\xi\in\mathbb{R}^{d}.$

Thus, by (3.2) we get $\mathbb{E}[\langle M^{f},M^{f}\rangle_{t}]<\infty$ for all $t\geq 0$, which proves our claim that $M^{f}$ is a martingale.

From the martingale property established above, we have $\mathbb{E}[M_{t}^{f}]=\mathbb{E}[M_{0}^{f}]=0$ for each $t\geq 0$. This allows us to compute

$\mathbb{E}[f(X_{t})]-\mathbb{E}[f(X_{0})]=\int_{0}^{t}\mathbb{E}\biggl[\beta_{s}\cdot\nabla f(X_{s-})+\frac{1}{2}\mathrm{tr}(\alpha_{s}\nabla^{2}f(X_{s-}))+\int_{\mathbb{R}^{d}}\bigl(f(X_{s-}+\xi)-f(X_{s-})-\nabla f(X_{s-})\cdot h(\xi)\bigr)\,\kappa_{s}(d\xi)\biggr]\,ds$
$=\int_{0}^{t}\mathbb{E}\biggl[\mathbb{E}[\beta_{s}\,|\,X_{s-}]\cdot\nabla f(X_{s-})+\frac{1}{2}\mathrm{tr}(\mathbb{E}[\alpha_{s}\,|\,X_{s-}]\nabla^{2}f(X_{s-}))+\mathbb{E}\biggl[\int_{\mathbb{R}^{d}}\bigl(f(X_{s-}+\xi)-f(X_{s-})-\nabla f(X_{s-})\cdot h(\xi)\bigr)\,\kappa_{s}(d\xi)\,\bigg|\,X_{s-}\biggr]\biggr]\,ds$
$=\int_{0}^{t}\mathbb{E}\biggl[b(s,X_{s-})\cdot\nabla f(X_{s-})+\frac{1}{2}\mathrm{tr}(a(s,X_{s-})\nabla^{2}f(X_{s-}))+\int_{\mathbb{R}^{d}}\bigl(f(X_{s-}+\xi)-f(X_{s-})-\nabla f(X_{s-})\cdot h(\xi)\bigr)\,k(s,X_{s-},d\xi)\biggr]\,ds$
$=\int_{0}^{t}\mathbb{E}[\mathcal{L}_{s}f(X_{s-})]\,ds,$   (3.6)

where in the first equality Fubini’s theorem is justified by (3.2) and the fact that

$|f(x+\xi)-f(x)-\nabla f(x)\cdot h(\xi)|\leq C(1\land|\xi|^{2}),\quad\forall\,x,\xi\in\mathbb{R}^{d},$   (3.7)

for some constant $C=C(\lVert f\rVert_{\infty},\lVert\nabla^{2}f\rVert_{\infty})>0$, and in the last but one equality we used (3.3), (3.5) and (3.7) once more.

Let $\mu_{t}$ denote the law of $X_{t}$. Since $X$ is a càdlàg process, it is easy to see that the map $\mathbb{R}_{+}\ni t\mapsto\mu_{t}\in\mathcal{P}(\mathbb{R}^{d})$ is càdlàg and that $\mu_{t-}$ is the law of $X_{t-}$. Moreover, by Proposition 2.10, for fixed $t\geq 0$ we have $\Delta X_{t}=0$ $\mathbb{P}$-a.s., i.e. $X_{t}=X_{t-}$ $\mathbb{P}$-a.s. This implies that $\mu_{t}=\mu_{t-}$, so the map $\mathbb{R}_{+}\ni t\mapsto\mu_{t}\in\mathcal{P}(\mathbb{R}^{d})$ is actually continuous. Then, (3.6) can be written as

$\mu_{t}(f)=\mu_{0}(f)+\int_{0}^{t}\mu_{s}(\mathcal{L}_{s}f)\,ds,\quad\forall\,t\geq 0,\,f\in C_{c}^{2}(\mathbb{R}^{d}).$   (3.8)

This shows that $(\mu_{t})_{t\geq 0}$ is a weak solution to the non-local FPKE associated with $\mathcal{L}$ in the sense of [16], Definition 1.1. Together with the growth condition (3.4), we are now in a position to apply [16], Theorem 1.5. (In the proof of the superposition principle in [16], the authors assumed without loss of generality that $r\leq 1/\sqrt{2}$; this is only for simplicity in some upper bound estimates, avoiding complicated constants involving $r$, and the result actually holds for all $r>0$.) We conclude that there exists a solution $\widehat{\mathbb{P}}\in\mathcal{P}(\mathbb{D}(\mathbb{R}_{+};\mathbb{R}^{d}))$ to the martingale problem for $\mathcal{L}$ such that for each $t\geq 0$, the time-$t$ marginal of $\widehat{\mathbb{P}}$ agrees with $\mu_{t}$. Equivalently, there exists a martingale solution $\widehat{X}$ for $\mathcal{L}$ which mimics the one-dimensional marginal laws of $X$. This finishes the proof. ∎

As mentioned in Remark 3.6, when $X$ is a continuous Itô semimartingale, Theorem 3.2 holds without assumption (3.4). In this case, the setting of the theorem simplifies considerably: we have $\kappa=0$, thus $k=0$. We also do not need the truncation function $h$, so $\beta$ and $b$ do not depend on $r$. The same type of proof still works. Indeed, following a similar argument, one can derive the FPKE (3.8). Now $\mathcal{L}$ is a local FPK operator, so we refer to Trevisan [18], which implies that the superposition principle holds under the assumption

$\Gamma_{t}\coloneqq\int_{0}^{t}\int_{\mathbb{R}^{d}}\bigl(|b(s,x)|+|a(s,x)|\bigr)\,\mu_{s}(dx)\,ds<\infty,\quad\forall\,t\geq 0.$

This is an immediate consequence of (3.2) and (3.3), once we rewrite $\Gamma_{t}$ in the following way:

$\Gamma_{t}=\int_{0}^{t}\mathbb{E}\bigl[|b(s,X_{s})|+|a(s,X_{s})|\bigr]\,ds\leq\int_{0}^{t}\mathbb{E}\bigl[|\beta_{s}|+|\alpha_{s}|\bigr]\,ds<\infty.$

For local FPK operators, the superposition principle thus holds under relatively mild integrability assumptions. In the non-local case, however, the literature is limited, and to the best of our knowledge no comparable result exists. Some boundedness or growth conditions need to be imposed, as in [16]. As of now, assumption (3.4) is needed for general discontinuous Itô semimartingales. Removing or weakening this assumption is a possible direction for future work.

4 Examples

In applications, Markovian projections usually appear in the inversion problem. More specifically, suppose we start with a relatively simple process $\widehat{X}$. Our goal is to construct a more complicated process $X$ while keeping the one-dimensional marginal laws unchanged. If we manage to find an $X$ such that $\widehat{X}$ is a Markovian projection of $X$, then the marginal law constraints are automatically satisfied. This is what we mean by “inverting the Markovian projection”. In this section, we present three examples where our Markovian projection theorem can be applied.

4.1 Local Stochastic Volatility (LSV) Model.

One of the most famous applications of Markovian projections is the calibration of the LSV model in mathematical finance (see [3], Appendix A, [7], Chapter 11, and the references therein). Under the risk-neutral measure, the stock price is modeled via the following SDE (assuming a constant interest rate $r$ and no dividends):

$dS_{t}=rS_{t}\,dt+\eta_{t}\sigma(t,S_{t})S_{t}\,dB_{t},$   (4.1)

where $\eta$ is the stochastic volatility, $\sigma$ is a function to be determined, and $B$ is a Brownian motion. Assume that $\eta$ is bounded from above and below by positive constants. One requires the LSV model to be perfectly calibrated to European call option prices (which depend only on the one-dimensional marginal laws). By the seminal work of Dupire [6], we have perfect calibration to European calls in the local volatility (LV) model:

$d\widehat{S}_{t}=r\widehat{S}_{t}\,dt+\sigma_{\text{Dup}}(t,\widehat{S}_{t})\widehat{S}_{t}\,d\widehat{B}_{t},\quad\sigma_{\text{Dup}}^{2}(t,K)\coloneqq\frac{\partial_{t}C(t,K)+rK\partial_{K}C(t,K)}{(1/2)K^{2}\partial_{KK}C(t,K)},$

where $\widehat{B}$ is a Brownian motion, $C(t,K)$ denotes the price of the European call with maturity $t$ and strike $K$, and we assume that $\sigma_{\text{Dup}}$ is bounded. Thus, it suffices to have $\widehat{S}$ be a Markovian projection of $S$. One can choose

$\sigma(t,x)\coloneqq\frac{\sigma_{\text{Dup}}(t,x)}{\sqrt{\mathbb{E}[\eta_{t}^{2}\,|\,S_{t}=x]}},$   (4.2)

where the conditional expectation is understood in the sense of Lemma 2.3. Plugging (4.2) into (4.1) yields the McKean–Vlasov type SDE

$dS_{t}=rS_{t}\,dt+\frac{\eta_{t}}{\sqrt{\mathbb{E}[\eta_{t}^{2}\,|\,S_{t}]}}\sigma_{\text{Dup}}(t,S_{t})S_{t}\,dB_{t}.$   (4.3)

Suppose (4.3) admits a solution $S$ starting from $s_{0}>0$. The differential characteristics of $S$ are

$\beta_{t}=rS_{t},\quad\alpha_{t}=\frac{\eta_{t}^{2}}{\mathbb{E}[\eta_{t}^{2}\,|\,S_{t}]}\sigma_{\text{Dup}}^{2}(t,S_{t})S_{t}^{2},\quad\kappa_{t}(d\xi)=0.$

By a standard Grönwall type argument, one can show that $S$ is bounded in $L^{2}$ on any finite time interval $[0,t]$. Thus, assumption (3.2) is satisfied. Taking conditional expectations $\mathbb{E}[\cdot\,|\,S_{t}]$, we get

$b(t,x)=rx,\quad a(t,x)=\sigma_{\text{Dup}}^{2}(t,x)x^{2},\quad k(t,x,d\xi)=0.$

It then follows from Theorem 3.2 that $\widehat{S}$ is indeed a Markovian projection of $S$.

However, the SDE (4.3) is notoriously hard to solve, and doing so in full generality remains an open problem. Partial results exist when $\eta$ is of the form $f(Y)$. For instance, Abergel and Tachet [1] proved short-time existence of solutions to the corresponding FPKE, with $Y$ being a multi-dimensional diffusion process. Jourdain and Zhou [10] showed weak existence when $Y$ is a finite-state jump process and $f$ satisfies a structural condition. Lacker, Shkolnikov and Zhang [13] showed strong existence and uniqueness of stationary solutions when $\sigma_{\text{Dup}}$ does not depend on $t$ and $Y$ solves an independent time-homogeneous SDE.
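Despite these theoretical difficulties, (4.3) is routinely approximated numerically with the particle method of [7], Chapter 11, replacing the conditional expectation by a kernel regression across particles. The following is a minimal Python sketch of that scheme; the volatility driver (a clipped exponential Ornstein–Uhlenbeck process), the flat placeholder Dupire surface and all numerical parameters are our own illustrative assumptions, not part of the model above.

import numpy as np

rng = np.random.default_rng(0)
N, n_steps, T = 2_000, 200, 1.0      # particles, time steps, horizon
dt = T / n_steps
r, s0 = 0.02, 100.0

def sigma_dup(t, s):
    # Placeholder for the Dupire local volatility surface; in practice it is
    # obtained from call prices C(t, K) via Dupire's formula.
    return 0.2 * np.ones_like(s)

S = np.full(N, s0)                   # particle approximation of S_t
Z = np.zeros(N)                      # OU driver of the stochastic volatility
for k in range(n_steps):
    t = k * dt
    eta = np.clip(np.exp(Z), 0.5, 2.0)        # bounded above/below, as assumed
    # Nadaraya-Watson estimate of E[eta_t^2 | S_t] with a Gaussian kernel
    h = 1.06 * max(S.std(), 1e-8) * N ** (-0.2)
    w = np.exp(-0.5 * ((S[:, None] - S[None, :]) / h) ** 2)
    cond_eta2 = w @ (eta ** 2) / w.sum(axis=1)
    vol = eta / np.sqrt(cond_eta2) * sigma_dup(t, S)
    S = S * (1 + r * dt + vol * np.sqrt(dt) * rng.standard_normal(N))  # Euler step for (4.3)
    Z += -0.5 * Z * dt + 0.3 * np.sqrt(dt) * rng.standard_normal(N)    # OU driver

By the mimicking property, the particle marginals of $S_{T}$ should approximately match those of the LV model, which can be checked by comparing against a direct simulation of $d\widehat{S}_{t}=r\widehat{S}_{t}\,dt+\sigma_{\text{Dup}}(t,\widehat{S}_{t})\widehat{S}_{t}\,d\widehat{B}_{t}$.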

4.2 Local Stochastic Intensity (LSI) Model.

The LSI model (see [2]) is a jump process analogue of the LSV model. It is often used in credit risk applications to model the number of defaults via a counting process $X$ whose intensity has the form $\eta_{t}\lambda(t,X_{t-})$, where $\eta$ is the stochastic intensity and $\lambda$ is a function to be determined. In other words, the process

$X_{t}-\int_{0}^{t}\eta_{s}\lambda(s,X_{s-})\,ds$

is a (local) martingale. As in Example 4.1, we want the one-dimensional marginal laws of the LSI model to match those of the local intensity (LI) model, which can be perfectly calibrated to collateralized debt obligation (CDO) tranche prices (see [17]). Note that in the LI model, defaults are modeled via a counting process $\widehat{X}$ whose intensity has the form $\lambda_{\text{Loc}}(t,\widehat{X}_{t-})$.

Assume that $\eta$ is bounded from above and below by positive constants, and that $\lambda_{\text{Loc}}$ is bounded. One can choose

$\lambda(t,x)=\frac{\lambda_{\text{Loc}}(t,x)}{\mathbb{E}[\eta_{t}\,|\,X_{t-}=x]},$

which yields the McKean–Vlasov type martingale problem:

$\biggl(X_{t}-\int_{0}^{t}\frac{\eta_{s}}{\mathbb{E}[\eta_{s}\,|\,X_{s-}]}\lambda_{\text{Loc}}(s,X_{s-})\,ds\biggr)_{t\geq 0}\text{ is a martingale}.$

The differential characteristics of $X$ are

$\beta_{t}=0,\quad\alpha_{t}=0,\quad\kappa_{t}(d\xi)=\frac{\eta_{t}}{\mathbb{E}[\eta_{t}\,|\,X_{t-}]}\lambda_{\text{Loc}}(t,X_{t-})\delta_{1}(d\xi),$

where we used the truncation function $h(x)=x\bm{1}_{\{|x|\leq r\}}$ for $r<1$. Taking conditional expectations $\mathbb{E}[\cdot\,|\,X_{t-}]$, we get

$b(t,x)=0,\quad a(t,x)=0,\quad k(t,x,d\xi)=\lambda_{\text{Loc}}(t,x)\delta_{1}(d\xi).$

Clearly, (3.2) and (3.4) are satisfied, so it follows from Theorem 3.2 that $\widehat{X}$ is a Markovian projection of $X$. When $\widehat{X}$ is a Poisson process (i.e. $\lambda_{\text{Loc}}$ is constant, or a deterministic function of time $t$), we call $X$ a fake Poisson process.
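As a quick numerical illustration of the mimicking property, the calibrated LSI dynamics can be simulated with a particle system; since $X$ is integer-valued, the conditional expectation $\mathbb{E}[\eta_{t}\,|\,X_{t-}]$ can be estimated by exact binning. The minimal sketch below takes a constant $\lambda_{\text{Loc}}$ (the fake Poisson case); the exogenous intensity driver and all numerical parameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
N, n_steps, T = 5_000, 1_000, 5.0    # particles, time steps, horizon
dt = T / n_steps
lam_loc = 1.0                        # constant local intensity => fake Poisson

X = np.zeros(N, dtype=int)           # particle approximation of the counting process
Z = np.zeros(N)                      # driver of the stochastic intensity eta
for k in range(n_steps):
    eta = np.clip(np.exp(Z), 0.5, 2.0)        # bounded above/below, as assumed
    # E[eta_t | X_{t-} = x] estimated by averaging eta over particles with X = x
    cond = np.ones(N)
    for x in np.unique(X):
        mask = X == x
        cond[mask] = eta[mask].mean()
    intensity = eta / cond * lam_loc
    X += (rng.random(N) < intensity * dt).astype(int)   # one jump step (thinning)
    Z += 0.3 * np.sqrt(dt) * rng.standard_normal(N)

# The marginal law of X_T should be close to Poisson(lam_loc * T):
print(X.mean(), lam_loc * T)

Comparing the empirical distribution of $X_{T}$ against the Poisson law with mean $\lambda_{\text{Loc}}T$ gives a direct check of the mimicking property.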

Alfonsi, Labart and Lelong [2] constructed solutions to the LSI model when $\eta_{t}=f(Y_{t})$, with $Y$ either a discrete-state Markov chain or the solution of an SDE of the following type:

$dY_{t}=b(t,X_{t-},Y_{t-})\,dt+\sigma(t,X_{t-},Y_{t-})\,dB_{t}+\gamma(t,X_{t-},Y_{t-})\,dX_{t},$

where $B$ is a Brownian motion. In recent work [15], we prove the existence of solutions to the LSI model under milder regularity conditions, with $\eta$ an exogenously given process not in the above feedback form involving $X$. We also allow the jump sizes of $X$ to follow any discrete law with a finite first moment.

4.3 Fake Hawkes Processes.

A Hawkes process $\widehat{X}$ is a self-exciting counting process whose intensity is given by

$\lambda_{t}=\lambda_{0}+\int_{0}^{t-}K(t-s)\,d\widehat{X}_{s}=\lambda_{0}+\sum_{i:\widehat{\tau}_{i}<t}K(t-\widehat{\tau}_{i}),$

where $\lambda_{0}>0$ is the background intensity, $K\in L^{1}(\mathbb{R}_{+};\mathbb{R}_{+})$ is the excitation function and $\widehat{\tau}_{1}<\widehat{\tau}_{2}<\cdots$ are the jump times of $\widehat{X}$. In this example, we consider the most basic excitation function, namely the exponential $K(t)=ce^{-\theta t}$ for some $c,\theta>0$.

We are interested in inverting the Markovian projection of $\widehat{X}$. However, the intensity of $\widehat{X}$ depends on the history of $\widehat{X}$; in other words, the differential characteristics of $\widehat{X}$ are not functions of time and the process itself. Therefore, we cannot expect $\widehat{X}$ to be a Markovian projection of some process. To tackle this problem, we lift $\widehat{X}$ to the pair $(\widehat{X},\widehat{Y})$ by incorporating the right-continuous version of the intensity process, $\widehat{Y}=\lambda_{+}$, and our goal becomes inverting the Markovian projection of $(\widehat{X},\widehat{Y})$.

The specific form of the excitation function allows us to derive the dynamics of $\widehat{Y}$:

$d\widehat{Y}_{t}=\theta(\lambda_{0}-\widehat{Y}_{t-})\,dt+c\,d\widehat{X}_{t}.$

We see that the differential characteristics of $(\widehat{X},\widehat{Y})$ are

$\widehat{\beta}_{t}=\bigl(0,\theta(\lambda_{0}-\widehat{Y}_{t-})\bigr),\quad\widehat{\alpha}_{t}=0_{2\times 2},\quad\widehat{\kappa}_{t}(d\xi_{1},d\xi_{2})=\widehat{Y}_{t-}\delta_{(1,c)}(d\xi_{1},d\xi_{2}),$

where we used the truncation function $h(x)=x\bm{1}_{\{|x|\leq r\}}$ for $r<\sqrt{1+c^{2}}$ (the jump size of $(\widehat{X},\widehat{Y})$). This inspires us to define $(X,Y)$ as follows: $X$ is a counting process with intensity

$\frac{\eta_{t}}{\mathbb{E}[\eta_{t}\,|\,X_{t-},Y_{t-}]}Y_{t-},$

and $Y$ satisfies

$Y_{t}=\lambda_{0}+\int_{0}^{t}ce^{-\theta(t-s)}\,dX_{s}=\lambda_{0}+\sum_{i:\tau_{i}\leq t}ce^{-\theta(t-\tau_{i})},$

where $\eta$ is some stochastic intensity bounded from above and below by positive constants, and $\tau_{1}<\tau_{2}<\cdots$ are the jump times of $X$. We can similarly write down the differential characteristics of $(X,Y)$ with the same truncation function:

$\beta_{t}=\bigl(0,\theta(\lambda_{0}-Y_{t-})\bigr),\quad\alpha_{t}=0_{2\times 2},\quad\kappa_{t}(d\xi_{1},d\xi_{2})=\frac{\eta_{t}}{\mathbb{E}[\eta_{t}\,|\,X_{t-},Y_{t-}]}Y_{t-}\delta_{(1,c)}(d\xi_{1},d\xi_{2}).$

One can show that $(X,Y)$ is bounded in $L^{1}$ on any finite time interval $[0,t]$. Thus, (3.2) and (3.4) are satisfied, and Theorem 3.2 tells us that $(X,Y)$ has the same one-dimensional marginal laws as $(\widehat{X},\widehat{Y})$. We call $(X,Y)$ a fake Hawkes process. In our recent work [15], we prove the existence of such fake Hawkes processes.
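A particle sketch analogous to the previous examples can be used to simulate the fake Hawkes pair $(X,Y)$; here the conditioning variable $(X_{t-},Y_{t-})$ has a continuous component, so we estimate $\mathbb{E}[\eta_{t}\,|\,X_{t-},Y_{t-}]$ by a Nadaraya–Watson regression on the pair. The exogenous $\eta$, the fixed kernel bandwidth and all numerical parameters are again illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
N, n_steps, T = 2_000, 1_000, 2.0    # particles, time steps, horizon
dt = T / n_steps
lam0, c, theta = 1.0, 0.5, 1.0       # background intensity, jump size, decay

X = np.zeros(N)                      # counting component
Y = np.full(N, lam0)                 # intensity component, Y_0 = lambda_0
Z = np.zeros(N)                      # driver of eta
for k in range(n_steps):
    eta = np.clip(np.exp(Z), 0.5, 2.0)        # bounded above/below, as assumed
    # Nadaraya-Watson estimate of E[eta | X, Y] with a Gaussian kernel on (X, Y)
    d2 = (X[:, None] - X[None, :]) ** 2 + (Y[:, None] - Y[None, :]) ** 2
    w = np.exp(-0.5 * d2 / 0.5 ** 2)          # fixed bandwidth 0.5, an arbitrary choice
    cond = w @ eta / w.sum(axis=1)
    jumps = rng.random(N) < eta / cond * Y * dt     # one jump step (thinning)
    X += jumps
    Y += theta * (lam0 - Y) * dt + c * jumps        # dY = theta(lam0 - Y)dt + c dX
    Z += 0.3 * np.sqrt(dt) * rng.standard_normal(N)

# E[X_T] should match that of the Hawkes process with kernel K(t) = c*exp(-theta*t)
print(X.mean())

Matching the histograms of $(X_{T},Y_{T})$ against a direct simulation of the Hawkes pair $(\widehat{X},\widehat{Y})$ provides a numerical sanity check of the mimicking property.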

References

  • [1] F. Abergel and R. Tachet. A nonlinear partial integro-differential equation from mathematical finance. Discrete Contin. Dyn. Syst., 27(3):907–917, 2010.
  • [2] A. Alfonsi, C. Labart, and J. Lelong. Stochastic local intensity loss models with interacting particle systems. Math. Finance, 26(2):366–394, 2016.
  • [3] L. B. Andersen and V. V. Piterbarg. Interest Rate Modeling. Volume 3: Products and Risk Management. Atlantic Financial Press, 2010.
  • [4] A. Bentata and R. Cont. Mimicking the marginal distributions of a semimartingale. arXiv preprint arXiv:0910.3992v5, 2012.
  • [5] G. Brunick and S. Shreve. Mimicking an Itô process by a solution of a stochastic differential equation. Ann. Appl. Probab., 23(4):1584–1628, 2013.
  • [6] B. Dupire. Pricing with a smile. Risk, 7(1):18–20, 1994.
  • [7] J. Guyon and P. Henry-Labordère. Nonlinear option pricing. Chapman & Hall/CRC Financial Mathematics Series. CRC Press, Boca Raton, FL, 2014.
  • [8] I. Gyöngy. Mimicking the one-dimensional marginal distributions of processes having an Itô differential. Probab. Theory Relat. Fields, 71(4):501–516, 1986.
  • [9] J. Jacod and A. N. Shiryaev. Limit theorems for stochastic processes, volume 288 of Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, second edition, 2003.
  • [10] B. Jourdain and A. Zhou. Existence of a calibrated regime switching local volatility model. Math. Finance, 30(2):501–546, 2020.
  • [11] B. Köpfer and L. Rüschendorf. Markov projection of semimartingales—application to comparison results. Stochastic Process. Appl., 162:361–386, 2023.
  • [12] N. V. Krylov. Once more about the connection between elliptic operators and Itô’s stochastic equations. In Statistics and control of stochastic processes (Moscow, 1984), Transl. Ser. Math. Engrg., pages 214–229. Optimization Software, New York, 1985.
  • [13] D. Lacker, M. Shkolnikov, and J. Zhang. Inverting the Markovian projection, with an application to local stochastic volatility models. Ann. Probab., 48(5):2189–2211, 2020.
  • [14] D. Lacker, M. Shkolnikov, and J. Zhang. Superposition and mimicking theorems for conditional McKean-Vlasov equations. J. Eur. Math. Soc. (JEMS), 25(8):3229–3288, 2023.
  • [15] M. Larsson and S. Long. Inverting the Markovian projection for pure jump processes. Working paper, 2024.
  • [16] M. Röckner, L. Xie, and X. Zhang. Superposition principle for non-local Fokker-Planck-Kolmogorov operators. Probab. Theory Related Fields, 178(3-4):699–733, 2020.
  • [17] P. Schönbucher. Portfolio losses and the term structure of loss transition rates: a new methodology for the pricing of portfolio credit derivatives. Technical report, Citeseer, 2005.
  • [18] D. Trevisan. Well-posedness of multidimensional diffusion processes with weakly differentiable coefficients. Electron. J. Probab., 21:Paper No. 22, 41, 2016.