
Maximum Principle for Optimal Control of Interacting Particle System: Stochastic Flow Model

Andrey A. Dorogovtsev Institute of Mathematics, NAS of Ukraine, Tereschenkivska st 3, Kyiv 01024, Ukraine. Yuecai Han School of Mathematics, Jilin University, Changchun 130012, China. Kateryna Hlyniana Institute of Mathematics, NAS of Ukraine Tereschenkivska st 3, Kyiv 01024, Ukraine, and School of Mathematics, Jilin University, Changchun 130012, China.  and  Yuhang Li School of Mathematics, Jilin University, Changchun 130012, China.
(Date: …)
Abstract.

In this paper, we consider the stochastic optimal control problem for an interacting particle system. We obtain a stochastic maximum principle for the optimal control problem by introducing a generalized backward stochastic differential equation with interaction, and we prove the existence and uniqueness of the solution of this type of equation. We derive a necessary condition that the optimal control must satisfy. As an application, the linear quadratic case is investigated to illustrate the main results.

Key words and phrases:
Maximum principle, backward stochastic differential equation with interaction, differentiation of measure, linear quadratic control.
1991 Mathematics Subject Classification:
60H10, 93E20, 93E03.
This work was partially supported by the National Science Foundation of China (grant no. 11871244 and 12201241).

1. Introduction

In this paper, we investigate a stochastic optimal control problem for a family of particles whose evolution is described by an equation with interaction. Equations with interactions were introduced by A.A. Dorogovtsev in [1, 2]. Here we use the following form of the stochastic differential equation (SDE) with interactions:

$$\begin{cases} dX(u,t)=b\big(t,X(u,t),\mu_t\big)\,dt+\sigma\big(t,X(u,t),\mu_t\big)\,dW_t,\\ X(u,0)=u,\quad u\in\mathbb{R}^d,\\ \mu_t=\mu_0\circ X(\cdot,t)^{-1}. \end{cases}\tag{1}$$

with an $m$-dimensional standard Wiener process $\{W_t\}_{t\ge 0}$ and a function $\sigma$ with values in $\mathbb{R}^{d\times m}$. Equation (1) describes the dynamics of a family of particles $\{X(u,\cdot)\}_{u\in\mathbb{R}^d}$ that start from every point of the space $\mathbb{R}^d$. The initial mass distribution of the family of particles is denoted by $\mu_0$, which is assumed to be a probability measure on the Borel $\sigma$-field of $\mathbb{R}^d$. The mass distribution of the family evolves over time, and at time $t$ it is the push-forward of $\mu_0$ under the mapping $X(\cdot,t)$. The drift and diffusion coefficients in Equation (1) depend on the mass distribution $\mu_t$. This makes it possible to describe the motion of a particle that depends on the positions of all other particles in the system.

Analogously to the usual stochastic optimal control problem, one can state the following control problem for an SDE with interaction:

$$\begin{cases} dX(u,t)=b\big(t,X(u,t),\mu_t,\alpha_t\big)\,dt+\sigma\big(t,X(u,t),\mu_t,\alpha_t\big)\,dW_t,\\ X(u,0)=u,\\ \mu_t=\mu_0\circ X(\cdot,t)^{-1}. \end{cases}\tag{2}$$

The goal of the stochastic optimal control problem is to find a control process $(\alpha_t)_{0\le t\le T}$ that minimizes a cost function $J(\alpha)$ of the form

$$J(\alpha)=\mathbb{E}\int_{\mathbb{R}^d}\Big[\int_0^T f\big(t,X(u,t),\mu_t,\alpha_t\big)\,dt+g\big(X(u,T),\mu_T\big)\Big]\mu_0(du).\tag{3}$$
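To make the cost (3) concrete, the following is a minimal Monte Carlo sketch for $d=m=1$ under assumptions that are ours, not the paper's: the initial measure is discrete, $\mu_0=\sum_i p_i\delta_{u_i}$, the control is a deterministic function of time, and the state equation (2) is discretized by the Euler-Maruyama scheme with one Wiener increment per path shared by all particles. The function name `estimate_cost` and the calling convention of the coefficient callables are hypothetical.

```python
import numpy as np


def estimate_cost(b, sigma, f, g, atoms, weights, alpha,
                  T=1.0, n_steps=100, n_paths=20000, seed=0):
    """Monte Carlo estimate of J(alpha) in (3), sketched for d = m = 1.

    Illustrative assumptions: mu_0 = sum_i weights[i] * delta_{atoms[i]},
    alpha(t) is a deterministic open-loop control, and b, sigma, f are
    callables (t, x, w, a) returning an array shaped like x, where x has
    shape (n_paths, n_atoms); on each path mu_t is given by the atoms
    x[path, :] with masses w. g is a callable (x, w) of the same kind.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    w = np.asarray(weights, dtype=float)
    x = np.tile(np.asarray(atoms, dtype=float), (n_paths, 1))
    running = np.zeros(n_paths)
    for k in range(n_steps):
        t = k * dt
        a = alpha(t)
        # Left-point rule for the running cost, integrated against mu_0(du).
        running += (f(t, x, w, a) @ w) * dt
        # One Wiener increment per path, shared by every particle u, as in (2).
        dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, 1))
        # Both coefficients are evaluated at the current (pre-update) state.
        x = x + b(t, x, w, a) * dt + sigma(t, x, w, a) * dW
    terminal = g(x, w) @ w
    return float(np.mean(running + terminal))
```

With $b=\alpha$, $\sigma=1$, $f=\alpha^2$, $g(x,\mu)=x^2$, $\mu_0=\delta_0$ and $\alpha\equiv0.5$, one has $X(0,1)=0.5+W_1$, so the exact cost is $0.25+0.25+1=1.5$ and the estimate should be close to it. An interaction drift such as `b=lambda t, x, w, a: a + (x @ w)[:, None] - x` (attraction to the centre of mass of $\mu_t$) can be passed in the same way.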

The time evolution of system (2) depends on a stochastic process $\alpha=(\alpha_t,\ 0\le t\le T)$, which is assumed to be progressively measurable with respect to the filtration $\{\mathcal{F}_t\}_{t\ge 0}$ generated by the Wiener process. The value $\alpha_t$ is regarded as a control, chosen at time $t$. The investigation of such an optimal control problem was inspired by the Monge-Kantorovich transport problem [3, 4, 5], which deals with mapping one probability measure to another while minimizing a transportation cost. In the stochastic optimal control problem for an equation with interaction, the system starts from the mass distribution $\mu_0$, and the cost function (3) has two parts: a running cost (the time integral of $f$) and a terminal cost $g$ that depends on the mass distribution of the system at the final time. In terms of the Monge-Kantorovich transport problem, $g$ can play the role of a distance between a given measure $\nu$ and the mass distribution of the system at the final time. For example, for a given probability measure $\nu$ we can take $g(X(u,T),\mu_T)=\rho(\mu_T,\nu)$, where the distance $\rho$ between measures can be given by

$$\rho^2(\mu,\nu)=\iint_{\mathbb{R}^2}\Gamma(u-v)\,[\mu(du)-\nu(du)]\,[\nu(dv)-\mu(dv)],\tag{4}$$

with a non-negative continuous function $\Gamma$ such that $\Gamma(0)=0$.
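As an illustration of (4), the sketch below evaluates $\rho^2(\mu,\nu)$ for two discrete measures on $\mathbb{R}$. The choice $\Gamma(x)=|x|$ is our assumption; with it, (4) equals the energy distance $2\mathbb{E}|X-Y|-\mathbb{E}|X-X'|-\mathbb{E}|Y-Y'|$ for independent $X,X'\sim\mu$ and $Y,Y'\sim\nu$, which is non-negative. The function name `rho_squared` is hypothetical.

```python
import numpy as np


def rho_squared(mu_atoms, mu_weights, nu_atoms, nu_weights, Gamma=np.abs):
    """Evaluate (4) for discrete measures on R (illustrative sketch).

    mu = sum_i mu_weights[i] * delta_{mu_atoms[i]}, and nu likewise.
    Gamma defaults to |x|, an assumed choice that is non-negative with Gamma(0) = 0.
    """
    points = np.concatenate([np.asarray(mu_atoms, dtype=float),
                             np.asarray(nu_atoms, dtype=float)])
    # Signed weights of the measure mu - nu, supported on all atoms.
    s = np.concatenate([np.asarray(mu_weights, dtype=float),
                        -np.asarray(nu_weights, dtype=float)])
    G = Gamma(points[:, None] - points[None, :])
    # (4) pairs [mu - nu](du) with [nu - mu](dv) = -[mu - nu](dv): hence the minus sign.
    return float(-(s @ G @ s))
```

For instance, $\rho^2(\tfrac12\delta_0+\tfrac12\delta_2,\delta_1)=2\cdot1-1-0=1$, and $\rho^2(\delta_0,\delta_1)=2\Gamma(1)=2$.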

In the case of a discrete initial measure $\mu_0=\sum_{i=1}^n p_i\delta_{u_i}$, $u_i\in\mathbb{R}$, $1\le i\le n$, the mass distribution of the system equals $\mu_t=\sum_{i=1}^n p_i\delta_{X(u_i,t)}$. Hence the dependence of the coefficients in equation (2) and of the cost function (3) on the measure $\mu_t$ is realized through the positions of the heavy points $X(u_1,t),\ldots,X(u_n,t)$. In this case the optimal control process $(\alpha_t)_{0\le t\le T}$ for problem (2)-(3) can be found from the corresponding $n$-dimensional stochastic optimal control problem for the system of heavy particles $X(u_1,t),\ldots,X(u_n,t)$ (see [6]). Having found the optimal control process $(\alpha_t)_{0\le t\le T}$, we solve equation (2) for all zero-mass points $X(u,\cdot)$, $u\in\mathbb{R}\setminus\{u_1,\ldots,u_n\}$. This example shows how one can describe the dynamics of a family of particles $\{X(u,t),\ u\in\mathbb{R}^d,\ t\ge 0\}$ driven not only by external forces but also by internal forces. More precisely, the internal forces are represented through the dependence of the coefficients in (2) on the positions of the heavy particles in the family.
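The two-stage description above can be sketched numerically for $d=m=1$ without control. The drift $b(t,x,\mu)=\int(y-x)\,\mu(dy)$ (attraction to the centre of mass) and the constant diffusion coefficient are illustrative assumptions, not taken from the paper, and the function name is hypothetical. The heavy particles determine $\mu_t$; the zero-mass points are moved by the same Wiener increments and feel $\mu_t$ without contributing to it. With this drift, the centre of mass $\int y\,\mu_t(dy)$ changes in the Euler scheme exactly by $\sigma W_t$ (the drift integrates to zero against $\mu_t$), and every relative distance is multiplied by $1-\Delta t$ at each step.

```python
import numpy as np


def simulate_heavy_and_tracers(heavy, masses, tracers, sigma=0.3,
                               T=1.0, n_steps=200, seed=1):
    """Euler scheme for (1) in d = m = 1 with mu_0 = sum_i masses[i] * delta_{heavy[i]}.

    Illustrative coefficients (our assumption): b(t, x, mu) = int (y - x) mu(dy)
    and a constant sigma. Tracers are zero-mass points: they feel mu_t
    but do not enter it. Masses are assumed to sum to 1.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    p = np.asarray(masses, dtype=float)
    x = np.asarray(heavy, dtype=float).copy()    # heavy particles X(u_i, t)
    z = np.asarray(tracers, dtype=float).copy()  # zero-mass points X(u, t)
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # one Wiener path for all u
    for k in range(n_steps):
        center = p @ x  # int y mu_t(dy), with mu_t = sum_i p_i delta_{x_i}
        x = x + (center - x) * dt + sigma * dW[k]
        z = z + (center - z) * dt + sigma * dW[k]
    return x, z, float(dW.sum())
```

Because only the heavy particles enter `center`, the tracers could equally be computed in a second pass once the heavy trajectories are known, which mirrors the procedure described in the text.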

It should be mentioned that the SDEs and BSDEs with interaction considered here are different from the McKean-Vlasov equation, which is actively investigated by many researchers [7, 8, 9, 10]. First of all, the coefficients of the McKean-Vlasov equation depend on the probability law of the position of a particle, not on the mass distribution of a system, as in SDEs with interaction. More specifically, McKean-Vlasov equations and SDEs with interaction describe particle systems from different viewpoints. For the McKean-Vlasov equation, there are many particles driven by independent noises, and their initial positions are i.i.d. random variables; the empirical distribution of their positions converges to the limiting probability law. For SDEs with interaction, the initial positions of the particles are deterministic and their distribution is known. There are not many independent noises: for instance, the particles may be driven by a Brownian sheet, so that they receive different random perturbations at different locations. Consider the following example with $N$ particles. For the McKean-Vlasov type, the state process has the form

$$\begin{cases} dX^i(t)=b\Big(t,X^i(t),\frac1N\sum_{j=1}^N\delta_{X^j(t)}\Big)dt+\sigma\Big(t,X^i(t),\frac1N\sum_{j=1}^N\delta_{X^j(t)}\Big)dW^i_t,\\ X^i(0)=x^i_0 \end{cases}$$

for $1\le i\le N$, where $x^i_0$ are i.i.d. random variables with distribution $\mathcal{L}_0$ and $W^i_t$ are independent Brownian motions. For the interaction type, the equation has the form

$$\begin{cases} dX(u^i,t)=b\Big(t,X(u^i,t),\frac1N\sum_{j=1}^N\delta_{X(u^j,t)}\Big)dt\\ \qquad\qquad\quad+\int_{\mathbb{R}^d}\sigma\Big(t,X(u^i,t),\frac1N\sum_{j=1}^N\delta_{X(u^j,t)},q\Big)W(dq,dt),\\ X(u^i,0)=u^i, \end{cases}$$

where the points $u^i$ are deterministic and the initial distribution $\mu_0=\frac1N\sum_{i=1}^N\delta_{u^i}$ is known. Letting $N\to\infty$, we obtain the limiting forms of the two equations above:

$$\begin{cases} dX(t)=b\big(t,X(t),\mathcal{L}_t\big)dt+\sigma\big(t,X(t),\mathcal{L}_t\big)dW_t,\\ X(0)=x_0\sim\mathcal{L}_0,\\ \mathcal{L}_t=\mathbb{P}\circ X(t)^{-1}, \end{cases}$$

and

$$\begin{cases} dX(u,t)=b\big(t,X(u,t),\mu_t\big)\,dt+\int_{\mathbb{R}^d}\sigma\big(t,X(u,t),\mu_t,q\big)\,W(dq,dt),\\ X(u,0)=u,\\ \mu_t=\mu_0\circ X(\cdot,t)^{-1}. \end{cases}$$

We see that for SDEs with interaction one can determine a stochastic flow $\{X(\cdot,t),\ t\ge 0\}$ whose evolution depends on the measure transported by the flow. This measure can be treated as the mass distribution of the interacting particle system. This is the main difference from the McKean-Vlasov type.

It is then natural to consider controlling a system whose equation contains interaction. For example, we may control the trajectories of the particles in order to bring their mass distribution close to a given distribution $\nu$, while using as little control energy as possible. The state equation then has the form (2) and the cost function is

$$J(\alpha)=\mathbb{E}\int_0^T A_t\alpha_t^2\,dt+\mathbb{E}\rho^2(\mu_T,\nu),$$

where $A_t$ is a given deterministic positive function and $\rho$ is a distance between measures, for example, the one defined in (4).

For optimal control problems, the adjoint process is used to derive the necessary condition for optimal control, known as the Pontryagin maximum principle [11, 12, 13, 14]; for related works, see [15, 16, 17, 18, 19, 20]. More precisely, for the classical stochastic optimal control problem, in which the coefficients do not depend on the mass distribution, the adjoint process is the solution $(Y,Z)$ of a backward stochastic differential equation (BSDE) [21, 22, 23] of the following form:

$$\begin{cases} -dY_t=\big[b_x(t,X_t,\alpha_t)Y_t+\sigma_x(t,X_t,\alpha_t)Z_t+f_x(t,X_t,\alpha_t)\big]dt-Z_t\,dW_t,\\ Y_T=g_x(X_T). \end{cases}$$

In our case, the state equation (2), as well as the functions $f$ and $g$ in the cost function (3), depends on the mass distribution $\mu_t$ of the system. This makes it necessary to use a derivative of a function with respect to a probability measure in order to define an adjoint process for the optimal control problem with interaction. Using the existence and uniqueness of the solution of the generalized BSDE, we obtain an analogue of the Pontryagin stochastic maximum principle for optimal control of the equation with interaction.

In this paper, we construct an adjoint process in order to obtain the condition that the optimal control must satisfy, and we obtain an analogue of the Pontryagin stochastic maximum principle for the optimal control problem (2)-(3). To this end, we introduce a generalized backward stochastic differential equation with interaction and prove the existence and uniqueness of its solution. A form of backward stochastic differential equation was considered in [24], but in order to apply it to a control problem with interaction we need a BSDE of a more general form. The main difficulty is that in such a BSDE with interaction, the coefficients of the equation depend on a random mass distribution of the flow. The existence and uniqueness result is of independent interest.

The rest of this paper is organized as follows. In Section 2, we recall the notion of L-differentiability for functions of probability measures, which we use to introduce an adjoint equation for a control problem with interaction. In Section 3, we define the generalized backward stochastic differential equation with interaction and prove the existence and uniqueness of its solution. Section 4 is devoted to the control problem with interaction: the corresponding adjoint equation is given in the form of a backward stochastic differential equation with interaction from Section 3, and the section ends with the maximum principle for optimal control. Some examples are considered in Section 5 to illustrate the main results.

2. Differentiation of Functions of a Measure Argument

As we mentioned in the Introduction, in order to define the adjoint process for the stochastic optimal control problem for the equation with interaction, we need to differentiate the coefficients of the equations with respect to a probability measure. There are many approaches to defining the differentiability of a real-valued function on a space of measures (see, for example, [25, 26, 27, 28]). We use the concept of differentiability in the sense of P.-L. Lions for functions of measures (see, for example, [29]). Let $L^p(\mathbb{R}^d,\mathcal{B},\mu;\mathbb{R}^d)$ be the set of $\mathcal{B}$-measurable maps $X:\mathbb{R}^d\to\mathbb{R}^d$ such that $\|X\|_{L^p}=\big(\int_{\mathbb{R}^d}|X(x)|^p\mu(dx)\big)^{1/p}<\infty$. Let $\mathcal{P}(\mathbb{R}^d)$ be the set of probability (mass) measures $\mu$ on $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$, and

$$\mathcal{P}_2(\mathbb{R}^d):=\Big\{\mu\in\mathcal{P}(\mathbb{R}^d):\int_{\mathbb{R}^d}|x|^2\mu(dx)<\infty\Big\}.$$

The 2-Wasserstein metric for $\mu^1,\mu^2\in\mathcal{P}_2(\mathbb{R}^d)$ is defined as

$$W_2(\mu^1,\mu^2):=\inf\Big\{\Big(\int_{\mathbb{R}^d\times\mathbb{R}^d}|x-y|^2\rho(dx,dy)\Big)^{1/2}:\ \rho\in\mathcal{P}_2(\mathbb{R}^d\times\mathbb{R}^d),\ \rho(\cdot,\mathbb{R}^d)=\mu^1(\cdot),\ \rho(\mathbb{R}^d,\cdot)=\mu^2(\cdot)\Big\}.$$
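In one dimension the infimum over couplings $\rho$ in this definition is attained by the monotone (quantile) coupling, so for two empirical measures with the same number $n$ of equally weighted atoms it suffices to sort both samples. The sketch below assumes $d=1$ and equal weights; the function name is hypothetical.

```python
import numpy as np


def w2_empirical_1d(xs, ys):
    """W_2 between (1/n) sum_i delta_{xs[i]} and (1/n) sum_i delta_{ys[i]} on R.

    In one dimension the monotone (sorted) coupling is optimal for the
    quadratic cost, so no optimisation over couplings rho is needed.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch assumes the same number of equally weighted atoms")
    return float(np.sqrt(np.mean((xs - ys) ** 2)))
```

For example, shifting every atom by $c$ gives $W_2=|c|$, and reordering the atoms of a sample does not change the measure, so the distance is $0$.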

For any function $h:\mathcal{P}_2(\mathbb{R}^d)\to\mathbb{R}$, we define the "lifted" function $\tilde h:L^2(\mathcal{B};\mathbb{R}^d)\to\mathbb{R}$ by $\tilde h(X)=h(\mu_X)$, where $X\in L^2(\mathcal{B};\mathbb{R}^d)$ is a random variable with distribution $\mu_X$. If for $\mu\in\mathcal{P}_2(\mathbb{R}^d)$ there exists $X\in L^2(\mathcal{B};\mathbb{R}^d)$ such that $\mu=\mu_X$ and $\tilde h$ is Fréchet differentiable at $X$, then $h:\mathcal{P}_2(\mathbb{R}^d)\to\mathbb{R}$ is said to be differentiable at $\mu$, and the derivative is defined as follows.

Definition 1.

Let $X\in L^2(\mathcal{B};\mathbb{R}^d)$. We say that $\tilde h$ is Fréchet differentiable at $X$ if there exists $D\tilde h(X)\in L\big(L^2(\mathcal{B};\mathbb{R}^d),\mathbb{R}\big)$ such that for all $Y\in L^2(\mathcal{B};\mathbb{R}^d)$,

$$\tilde h(X+Y)-\tilde h(X)=D\tilde h(X)(Y)+o(\|Y\|_{L^2}),\quad\text{as }\|Y\|_{L^2}\to0,\tag{5}$$

where $\|Y\|_{L^2}^2=\int_{\mathbb{R}^d}|y|^2\mu_Y(dy)$.

Since $L^2(\mathcal{B};\mathbb{R}^d)$ is a Hilbert space, by the Riesz representation theorem and, as shown by Lions [30], there exists a Borel measurable function $g:\mathbb{R}^d\to\mathbb{R}^d$ such that

$$D\tilde h(X)(Y)=\int_{\mathbb{R}^d\times\mathbb{R}^d}g(x)\cdot y\,\rho(dx,dy),\qquad Y\in L^2(\mathcal{B};\mathbb{R}^d),$$

where $\rho$ is the joint probability distribution of $X$ and $Y$, and $g$ depends on $X$ only through its law $\mu_X$. Thus, we can write (5) as

$$h(\mu_{X+Y})-h(\mu_X)=\int_{\mathbb{R}^d\times\mathbb{R}^d}g(x)\cdot y\,\rho(dx,dy)+o(\|Y\|_{L^2}).$$

The function $g(\cdot)$ is called the derivative of $h:\mathcal{P}_2(\mathbb{R}^d)\to\mathbb{R}$ at $\mu=\mu_X$ and is denoted by $h_\mu(\mu,y)=g(y)$, $y\in\mathbb{R}^d$.
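The lift can be made concrete on a finite probability space $\Omega=\{1,\dots,n\}$ with uniform probability, where a random variable is a vector $x\in\mathbb{R}^n$ (here $d=1$) and $\langle X,Y\rangle_{L^2}=\frac1n\sum_i x_iy_i$. If $D\tilde h(X)(Y)=\mathbb{E}[g(X)Y]$, the partial derivative of $\tilde h$ in $x_i$ equals $g(x_i)/n$, so $h_\mu(\mu,x_i)$ can be approximated by $n$ times a central difference. This is a sketch under these assumptions, and the function name is hypothetical. For $h(\mu)=\int y^2\mu(dy)-\big(\int y\,\mu(dy)\big)^2$ (the variance), the lift is $\tilde h(X)=\mathrm{Var}(X)$ and $h_\mu(\mu,y)=2\big(y-\int r\,\mu(dr)\big)$.

```python
import numpy as np


def l_derivative_fd(h_tilde, x, eps=1e-5):
    """Finite-difference approximation of h_mu(mu, x_i), i = 1..n,
    for mu = (1/n) sum_i delta_{x_i} on R (d = 1).

    h_tilde maps a sample vector x (a random variable on the uniform space
    {1, ..., n}) to h(mu_x). Its partial derivative in x_i is h_mu(mu, x_i) / n,
    so we multiply the central difference by n.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    out = np.empty(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        out[i] = n * (h_tilde(x + e) - h_tilde(x - e)) / (2.0 * eps)
    return out
```

`l_derivative_fd(np.var, x)` should agree with `2 * (x - x.mean())` up to rounding; `np.var` is a valid lift because the variance of an equally weighted sample depends on $x$ only through its empirical law.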

Example 1.

Consider $a(u,\mu)=\int_{\mathbb{R}}K(u-v)\mu(dv)$, $\mu\in\mathcal{P}_2(\mathbb{R})$, where $K$ is a smooth function. Then $\tilde a(u,X)=\overline{K(u-X)}$, where $X\sim\mu$ and the bar denotes expectation. Let $Y$ be another random variable with $\|Y\|_{L^2}$ small enough, and let $\rho(\cdot,\cdot)$ be the joint distribution of $X$ and $Y$. Then

$$\begin{aligned}\tilde a(u,X+Y)-\tilde a(u,X)&=\overline{K\big(u-(X+Y)\big)}-\overline{K(u-X)}\\ &=-\overline{K'(u-X)Y}+o(\|Y\|_{L^2})=-\int_{\mathbb{R}\times\mathbb{R}}K'(u-x)\,y\,\rho(dx,dy)+o(\|Y\|_{L^2}),\end{aligned}$$

which shows that $D\tilde a(u,X)(Y)=-\int_{\mathbb{R}\times\mathbb{R}}K'(u-x)\,y\,\rho(dx,dy)$ and $a_\mu(u,\mu,x)=-K'(u-x)$.
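Example 1 can be checked numerically: the derivative of the lift in a direction $Y$ should equal $\int a_\mu(u,\mu,x)\,y\,\rho(dx,dy)=-\mathbb{E}[K'(u-X)Y]$. In the sketch below, the Gaussian kernel $K(x)=e^{-x^2}$, the point $u=0.3$, the sample size, and the use of an empirical sample in place of $\mu$ are our illustrative choices.

```python
import numpy as np


def K(x):
    return np.exp(-x ** 2)  # illustrative smooth kernel (our choice)


def K_prime(x):
    return -2.0 * x * np.exp(-x ** 2)


def a_tilde(u, sample):
    # Lift of a(u, mu) = int K(u - v) mu(dv): expectation under the empirical law.
    return np.mean(K(u - sample))


rng = np.random.default_rng(0)
u = 0.3
X = rng.normal(size=10_000)  # sample playing the role of X ~ mu
Y = rng.normal(size=10_000)  # perturbation direction; (X, Y) has joint law rho

eps = 1e-4
fd = (a_tilde(u, X + eps * Y) - a_tilde(u, X - eps * Y)) / (2 * eps)
formula = np.mean(-K_prime(u - X) * Y)  # int a_mu(u, mu, x) y rho(dx, dy)
print(fd, formula)
```

Both quantities are computed on the same sample, so they differ only by the $O(\varepsilon^2)$ error of the central difference, regardless of the sample size.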

3. Backward stochastic differential equation with interaction

Consider the following generalized BSDE with interaction

$$\begin{cases} -dy(u,t)=f\big(u,t,y(u,t),z(u,t),\mathcal{M}^{y,u}_t,\mathcal{M}^{z,u}_t\big)dt-z(u,t)\,dW_t,\\ y(u,T)=\xi(u),\quad u\sim\mu_0,\\ \mathcal{M}^{y,u}_t=\mu_0\circ\Phi^{-1}\big(u,t,\cdot,y(\cdot,t)\big),\\ \mathcal{M}^{z,u}_t=\mu_0\circ\Psi^{-1}\big(u,t,\cdot,z(\cdot,t)\big), \end{cases}\tag{10}$$

where $\mu_0\in\mathcal{P}(\mathbb{R}^d)$ is the given mass distribution and, for all $u,v\in\mathbb{R}^d$, $\mathbb{E}|\xi(u)-\xi(v)|^2\le L|u-v|^2$. The functions $\Phi(u,t,v,y)$ and $\Psi(u,t,v,z)$ are measurable functions on $\mathbb{R}^d\times[0,T]\times\mathbb{R}^d\times\mathbb{R}^d$ and $\mathbb{R}^d\times[0,T]\times\mathbb{R}^d\times\mathbb{R}^{d\times m}$, respectively, with values in $\mathbb{R}^d$. The function $f(u,t,y,z,\mu,\nu)$ is $\mathcal{F}_t$-adapted, defined on $\mathbb{R}^d\times[0,T]\times\mathbb{R}^d\times\mathbb{R}^{d\times m}\times\mathcal{P}(\mathbb{R}^d)\times\mathcal{P}(\mathbb{R}^d)$, with values in $\mathbb{R}^d$.

In order to formulate the definition of a solution to this generalized BSDE with interaction, we introduce the following notation. Let $\mu_0$ be a given mass measure.

$\mathbb{H}_T^2(\mathbb{R}^d)$ denotes the space of all predictable processes $\varphi:\Omega\times\mathbb{R}^d\times[0,T]\to\mathbb{R}^d$ such that $\|\varphi\|^2=\mathbb{E}\int_{\mathbb{R}^d}\int_0^T|\varphi(u,t)|^2\,dt\,\mu_0(du)<+\infty$.

For $\beta>0$, $\|\phi\|_\beta^2$ denotes $\mathbb{E}\int_{\mathbb{R}^d}\int_0^Te^{\beta t}|\phi(u,t)|^2\,dt\,\mu_0(du)$.

$\mathbb{H}_{T,\beta}^2(\mathbb{R}^d)$ denotes the space $\mathbb{H}_T^2(\mathbb{R}^d)$ endowed with the norm $\|\cdot\|_\beta$.

Definition 2.

A solution of the generalized BSDE with interaction (10) is a pair of processes $(y,z)\in\mathbb{H}_{T,\beta}^2(\mathbb{R}^d)\times\mathbb{H}_{T,\beta}^2(\mathbb{R}^{d\times m})$ such that for all $t\in[0,T]$,

$$y(u,t)=\xi(u)+\int_t^Tf\big(u,s,y(u,s),z(u,s),\mathcal{M}^{y,u}_s,\mathcal{M}^{z,u}_s\big)ds-\int_t^Tz(u,s)\,dW_s,$$

where

$$\mathcal{M}^{y,u}_t=\mu_0\circ\Phi^{-1}\big(u,t,\cdot,y(\cdot,t)\big),\qquad\mathcal{M}^{z,u}_t=\mu_0\circ\Psi^{-1}\big(u,t,\cdot,z(\cdot,t)\big).$$

Note that a solution of this type of BSDE satisfies

$$\begin{aligned} y(u,t)&=\mathbb{E}^{\mathcal{F}_t}\Big[\xi(u)+\int_t^Tf\big(u,s,y(u,s),z(u,s),\mathcal{M}^{y,u}_s,\mathcal{M}^{z,u}_s\big)ds\Big]\\ &=\mathbb{E}^{\mathcal{F}_t}\Big[\xi(u)+\int_0^Tf\big(u,s,y(u,s),z(u,s),\mathcal{M}^{y,u}_s,\mathcal{M}^{z,u}_s\big)ds\Big]\\ &\quad-\int_0^tf\big(u,s,y(u,s),z(u,s),\mathcal{M}^{y,u}_s,\mathcal{M}^{z,u}_s\big)ds\\ &=M(u,t)-\int_0^tf\big(u,s,y(u,s),z(u,s),\mathcal{M}^{y,u}_s,\mathcal{M}^{z,u}_s\big)ds, \end{aligned}$$

where $M(u,t)=\mathbb{E}^{\mathcal{F}_t}\big[\xi(u)+\int_0^Tf(u,s,y(u,s),z(u,s),\mathcal{M}^{y,u}_s,\mathcal{M}^{z,u}_s)\,ds\big]$ is a martingale, and the process $z(u,t)$ is given by the martingale representation theorem

$$M(u,t)=M(u,T)-\int_t^Tz(u,s)\,dW_s.$$

We assume that there exist constants $L_1,L_2>0$ such that for all $t\in[0,T]$, $u,v\in\mathbb{R}^d$,

$$|f(u,t,y_1,z_1,\mu_1,\nu_1)-f(u,t,y_2,z_2,\mu_2,\nu_2)|^2\le L_1R_1^2,\tag{11}$$

and

$$|\Phi(u,t,v_1,x_1)-\Phi(u,t,v_2,x_2)|^2+|\Psi(u,t,v_1,x_1)-\Psi(u,t,v_2,x_2)|^2\le L_2R_2^2,\tag{12}$$

where $R_1^2=|y_1-y_2|^2+|z_1-z_2|^2+W_2^2(\mu_1,\mu_2)+W_2^2(\nu_1,\nu_2)$, $W_2(\cdot,\cdot)$ is the 2-Wasserstein metric, and $R_2^2=|v_1-v_2|^2+|x_1-x_2|^2$.

Lemma 1.

Assume that (11) and (12) hold. Then the equation (10) has a unique solution.

Proof.

We use the contraction mapping argument of [23, Chapter 2]. For any $\mathcal{F}_t$-adapted continuous processes $y^1(u,t)$, $y^2(u,t)$, $z^1(u,t)$, $z^2(u,t)$ with finite $\beta$-norm that are measurable with respect to the spatial variable, let

$$\begin{cases} -dY^i(u,t)=f\big(u,t,y^i(u,t),z^i(u,t),\mathcal{M}^{y^i,u}_t,\mathcal{M}^{z^i,u}_t\big)dt-Z^i(u,t)\,dW_t,\\ Y^i(u,T)=\xi(u), \end{cases}$$

for $i=1,2$. Our aim is to show that the mapping $T:(y,z)\mapsto(Y,Z)$ is a contraction under the $\beta$-norm.

We denote

$$\delta l(u,t)=l^1(u,t)-l^2(u,t)$$

for $l=Y,Z,y,z$, and

$$\begin{aligned}\delta f(u,t)=\;&f\big(u,t,y^1(u,t),z^1(u,t),\mathcal{M}^{y^1,u}_t,\mathcal{M}^{z^1,u}_t\big)\\ &-f\big(u,t,y^2(u,t),z^2(u,t),\mathcal{M}^{y^2,u}_t,\mathcal{M}^{z^2,u}_t\big).\end{aligned}$$

By Itô's formula, we have

$$\begin{aligned} d\big(e^{\beta t}\delta Y(u,t)^2\big)&=\beta e^{\beta t}\delta Y(u,t)^2dt+2e^{\beta t}\delta Y(u,t)\,d\delta Y(u,t)+e^{\beta t}\big(d\delta Y(u,t)\big)^2\\ &=e^{\beta t}\big[\beta\delta Y(u,t)^2-2\delta Y(u,t)\delta f(u,t)+\delta Z(u,t)^2\big]dt+2e^{\beta t}\delta Y(u,t)\delta Z(u,t)\,dW_t. \end{aligned}$$

Taking the integral from tt to TT and taking the expectation, we have

$$\begin{aligned} &\mathbb{E}e^{\beta t}\delta Y(u,t)^2+\mathbb{E}\int_t^Te^{\beta s}\delta Z(u,s)^2ds\\ &=\mathbb{E}\int_t^Te^{\beta s}\big[-\beta\delta Y(u,s)^2+2\delta Y(u,s)\delta f(u,s)\big]ds\\ &\le\mathbb{E}\int_t^Te^{\beta s}\Big[-\beta\delta Y(u,s)^2+\beta\delta Y(u,s)^2+\frac1\beta\delta f(u,s)^2\Big]ds\\ &\le\frac{L_1}{\beta}\mathbb{E}\int_t^Te^{\beta s}\Big[\delta y(u,s)^2+\delta z(u,s)^2+W_2^2(\mathcal{M}^{y^1,u}_s,\mathcal{M}^{y^2,u}_s)+W_2^2(\mathcal{M}^{z^1,u}_s,\mathcal{M}^{z^2,u}_s)\Big]ds. \end{aligned}\tag{13}$$

Notice that

$$\begin{aligned}W_2^2(\mathcal{M}^{y^1,u}_s,\mathcal{M}^{y^2,u}_s)&\le\int_{\mathbb{R}^d}\big|\Phi\big(u,s,v,y^1(v,s)\big)-\Phi\big(u,s,v,y^2(v,s)\big)\big|^2\mu_0(dv)\\ &\le L_2\int_{\mathbb{R}^d}\delta y(v,s)^2\mu_0(dv).\end{aligned}$$

So by Fubini’s theorem, we have that

$$\mathbb{E}\int_0^Te^{\beta s}W_2^2(\mathcal{M}^{y^1,u}_s,\mathcal{M}^{y^2,u}_s)\,ds\le L_2\|\delta y\|_\beta^2.\tag{14}$$

In the same way, we also get

$$\mathbb{E}\int_0^Te^{\beta s}W_2^2(\mathcal{M}^{z^1,u}_s,\mathcal{M}^{z^2,u}_s)\,ds\le L_2\|\delta z\|_\beta^2.\tag{15}$$

From (13) we have

$$\begin{aligned}\mathbb{E}e^{\beta t}\delta Y(u,t)^2&\le\frac{L_1}{\beta}\mathbb{E}\int_t^Te^{\beta s}\Big[\delta y(u,s)^2+\delta z(u,s)^2+W_2^2(\mathcal{M}^{y^1,u}_s,\mathcal{M}^{y^2,u}_s)+W_2^2(\mathcal{M}^{z^1,u}_s,\mathcal{M}^{z^2,u}_s)\Big]ds\\ &\le\frac{L_1}{\beta}\mathbb{E}\int_0^Te^{\beta s}\Big[\delta y(u,s)^2+\delta z(u,s)^2+W_2^2(\mathcal{M}^{y^1,u}_s,\mathcal{M}^{y^2,u}_s)+W_2^2(\mathcal{M}^{z^1,u}_s,\mathcal{M}^{z^2,u}_s)\Big]ds,\end{aligned}\tag{16}$$

and

$$\mathbb{E}\int_t^Te^{\beta s}\delta Z(u,s)^2ds\le\frac{L_1}{\beta}\mathbb{E}\int_t^Te^{\beta s}\Big[\delta y(u,s)^2+\delta z(u,s)^2+W_2^2(\mathcal{M}^{y^1,u}_s,\mathcal{M}^{y^2,u}_s)+W_2^2(\mathcal{M}^{z^1,u}_s,\mathcal{M}^{z^2,u}_s)\Big]ds.$$

Integrating (16) with respect to $\mu_0(du)$ and over $t\in[0,T]$, and using (14) and (15), we obtain

$$\|\delta Y\|_\beta^2\le\frac{L_1(1+L_2)T}{\beta}\big(\|\delta y\|_\beta^2+\|\delta z\|_\beta^2\big).$$

Setting $t=0$ in the estimate for $\mathbb{E}\int_t^Te^{\beta s}\delta Z(u,s)^2ds$ and integrating with respect to $\mu_0(du)$, we get

$$\|\delta Z\|_\beta^2\le\frac{L_1(1+L_2)}{\beta}\big(\|\delta y\|_\beta^2+\|\delta z\|_\beta^2\big).$$

Thus we have

$$\|\delta Y\|_\beta^2+\|\delta Z\|_\beta^2\le\frac{L_1(1+L_2)(T+1)}{\beta}\big(\|\delta y\|_\beta^2+\|\delta z\|_\beta^2\big).$$

Choosing $\beta>L_1(1+L_2)(T+1)$, we see that the mapping $T:(y,z)\mapsto(Y,Z)$ is a contraction from $\mathbb{H}_{T,\beta}^2(\mathbb{R}^d)\times\mathbb{H}_{T,\beta}^2(\mathbb{R}^{d\times m})$ into itself. Hence it has a unique fixed point, which is the unique continuous solution of the BSDE (10). ∎
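The contraction argument can be illustrated in a deterministic special case where the fixed point is also available in closed form. Assume (our choice, not the paper's) $d=m=1$, $\mu_0=\sum_ip_i\delta_{u_i}$ with $p_i\ge0$ summing to $1$, deterministic $\xi$, $\Phi(u,t,v,y)=y$, and $f(u,t,y,z,\mathcal{M}^y,\mathcal{M}^z)=ay+b\int r\,\mathcal{M}^y(dr)$; since $|\int r\,\mu_1(dr)-\int r\,\mu_2(dr)|\le W_2(\mu_1,\mu_2)$, condition (11) holds. With no randomness, $z\equiv0$ and (10) reduces to $y(u,t)=\xi(u)+\int_t^Tf\,ds$, so the map $y\mapsto Y$ is the mapping $T$ from the proof restricted to $z=0$. The sketch iterates it on a time grid with the trapezoid rule (a discretization we chose); the function name is hypothetical. Writing $m(t)=\sum_ip_iy(u_i,t)$, the exact solution is $m(t)=m(T)e^{(a+b)(T-t)}$ and $y(u_i,t)=m(t)+\big(\xi(u_i)-m(T)\big)e^{a(T-t)}$.

```python
import numpy as np


def picard_bsde_interaction(xi, weights, a, b, T=1.0, n_steps=1000, n_iter=60):
    """Picard iteration y -> Y for a deterministic special case of (10), d = m = 1.

    Illustrative assumptions: mu_0 = sum_i weights[i] delta_{u_i}, xi(u_i)
    deterministic, Phi(u, t, v, y) = y and
        f(u, t, y, z, M^y, M^z) = a * y + b * int r M^y(dr),
    where int r M^y_t(dr) = sum_i weights[i] * y(u_i, t). Here z = 0.
    Returns the time grid and y with shape (n_steps + 1, number of atoms).
    """
    xi = np.asarray(xi, dtype=float)
    w = np.asarray(weights, dtype=float)
    h = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    y = np.tile(xi, (n_steps + 1, 1))  # initial guess: y(u, t) = xi(u)
    for _ in range(n_iter):
        mean_y = y @ w                           # int r M^y_t(dr) at each grid time
        f = a * y + b * mean_y[:, None]
        incr = 0.5 * h * (f[1:] + f[:-1])        # trapezoid rule on [t_j, t_{j+1}]
        tail = np.zeros_like(y)
        tail[:-1] = np.cumsum(incr[::-1], axis=0)[::-1]  # int_{t_j}^T f ds
        y = xi[None, :] + tail                   # Y = xi + int_t^T f(y) ds
    return t, y
```

Because the map is of Volterra type, the iterates converge on the whole interval $[0,T]$; the $\beta$-weighted norm used in the proof is equivalent to the unweighted one on a bounded interval.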

Lemma 2.

Assume that $\xi$ satisfies $\mathbb{E}|\xi(u)-\xi(v)|^{2k}\le M_k|u-v|^{2k}$ for all $u,v\in\mathbb{R}^d$, and that (11) and (12) hold. Then there exist constants $C_k>0$, depending on $M_k$, $L_1$, $L_2$ and $T$, such that

$$\mathbb{E}|y(u,t)-y(v,t)|^{2k}\le C_k|u-v|^{2k},\quad\forall t\in[0,T].$$
Proof.

We first consider the case $k=1$. Denote

$$\Delta l_t=l(u,t)-l(v,t)$$

for $l=y,z$, and

$$\Delta f_t=f\big(u,t,y(u,t),z(u,t),\mathcal{M}^{y,u}_t,\mathcal{M}^{z,u}_t\big)-f\big(v,t,y(v,t),z(v,t),\mathcal{M}^{y,v}_t,\mathcal{M}^{z,v}_t\big).$$

So

$$\begin{aligned}d|\Delta y_t|^2&=2\Delta y_t\,d\Delta y_t+(d\Delta y_t)^2\\ &=-2\Delta y_t\Delta f_t\,dt+|\Delta z_t|^2dt+2\Delta y_t\Delta z_t\,dW_t.\end{aligned}$$

Taking the integral from tt to TT and taking the expectation, we have

$$\begin{aligned}\mathbb{E}|\Delta y_t|^2&=\mathbb{E}|\xi(u)-\xi(v)|^2+2\mathbb{E}\int_t^T\Delta y_s\Delta f_s\,ds-\mathbb{E}\int_t^T|\Delta z_s|^2ds\\ &\le M_1|u-v|^2+\mathbb{E}\int_t^T\Big(c|\Delta y_s|^2+\frac1c|\Delta f_s|^2\Big)ds-\mathbb{E}\int_t^T|\Delta z_s|^2ds.\end{aligned}$$

Notice that

$$\begin{aligned}&W_2^2(\mathcal{M}^{y,u}_t,\mathcal{M}^{y,v}_t)+W_2^2(\mathcal{M}^{z,u}_t,\mathcal{M}^{z,v}_t)\\ &\le\int_{\mathbb{R}^d}\big|\Phi\big(u,t,r,y(r,t)\big)-\Phi\big(v,t,r,y(r,t)\big)\big|^2\mu_0(dr)+\int_{\mathbb{R}^d}\big|\Psi\big(u,t,r,z(r,t)\big)-\Psi\big(v,t,r,z(r,t)\big)\big|^2\mu_0(dr)\\ &\le L_2\int_{\mathbb{R}^d}|u-v|^2\mu_0(dr)=L_2|u-v|^2.\end{aligned}$$

So

$$\begin{aligned}|\Delta f_t|^2&\le L_1\Big(|u-v|^2+|\Delta y_t|^2+|\Delta z_t|^2+W_2^2(\mathcal{M}^{y,u}_t,\mathcal{M}^{y,v}_t)+W_2^2(\mathcal{M}^{z,u}_t,\mathcal{M}^{z,v}_t)\Big)\\ &\le L_1(1+L_2)|u-v|^2+L_1\big(|\Delta y_t|^2+|\Delta z_t|^2\big).\end{aligned}\tag{17}$$

Taking $c=2L_1$, we get

$$\mathbb{E}|\Delta y_t|^2+\frac12\,\mathbb{E}\int_t^T|\Delta z_s|^2ds\le\Big(M_1+\frac{T(1+L_2)}{2}\Big)|u-v|^2+\Big(2L_1+\frac12\Big)\int_t^T\mathbb{E}|\Delta y_s|^2ds.$$

By Gronwall's inequality, we obtain the claim for $k=1$. Moreover, we also get

$$\mathbb{E}\int_0^T|z(u,s)-z(v,s)|^2ds\le\tilde C|u-v|^2\tag{18}$$

for some $\tilde C>0$. Similarly to the case $k=1$, we get

$$\begin{aligned}\mathbb{E}|\Delta y_t|^{2k}=\;&\mathbb{E}|\xi(u)-\xi(v)|^{2k}+2k\,\mathbb{E}\int_t^T\Delta y_s^{2k-1}\Delta f_s\,ds\\ &-k(2k-1)\,\mathbb{E}\int_t^T|\Delta y_s|^{2k-2}|\Delta z_s|^2ds.\end{aligned}$$

Notice that $\Delta y_s^{2k-1}\Delta f_s\le\frac12\big(|\Delta y_s|^{2k}+|\Delta y_s|^{2k-2}|\Delta f_s|^2\big)$. In the same way as in the case $k=1$, the conclusion follows. ∎

4. Stochastic Maximum Principle

We consider the following control problem. The state equation is

$$\left\{\begin{array}{ll}dX(u,t)=b\big(t,X(u,t),\mu_{t},\alpha_{t}\big)dt+\sigma\big(t,X(u,t),\mu_{t},\alpha_{t}\big)dW_{t},\qquad 0\leq t\leq T,\\ X(u,0)=u,\\ \mu_{t}=\mu_{0}\circ X(\cdot,t)^{-1},\end{array}\right.\tag{22}$$

where $\mu_{t}=\mu_{0}\circ X(\cdot,t)^{-1}$ is the mass distribution of all particles at time $t$. We choose the cost functional

$$J(\alpha_{t})=\mathbb{E}\int_{\mathbb{R}^{d}}\Big[\int_{0}^{T}f\big(t,X(u,t),\mu_{t},\alpha_{t}\big)dt+g\big(X(u,T),\mu_{T}\big)\Big]\mu_{0}(du),\tag{23}$$

where $b(t,x,\mu,\alpha)$ and $\sigma(t,x,\mu,\alpha)$ are measurable functions on $\mathbb{R}\times\mathbb{R}^{d}\times\mathcal{P}_{2}(\mathbb{R}^{d})\times\mathbb{R}^{k}$ with values in $\mathbb{R}^{d}$ and $\mathbb{R}^{d\times m}$, respectively, which are differentiable with respect to $x,\mu,\alpha$. The functions $f(t,x,\mu,\alpha)$ and $g(x,\mu)$ are real-valued measurable functions on $\mathbb{R}\times\mathbb{R}^{d}\times\mathcal{P}_{2}(\mathbb{R}^{d})\times\mathbb{R}^{k}$ and $\mathbb{R}^{d}\times\mathcal{P}_{2}(\mathbb{R}^{d})$, respectively, differentiable with respect to $x,\mu,\alpha$ (for $g$, with respect to $x,\mu$). We assume that

$$\begin{aligned}
&|b(t,x,\mu,\alpha)|+|\sigma(t,x,\mu,\alpha)|\leq L\big(1+|x|+M_{2}(\mu)\big),\\
&|b(t,x_{1},\mu_{1},\alpha)-b(t,x_{2},\mu_{2},\alpha)|+|\sigma(t,x_{1},\mu_{1},\alpha)-\sigma(t,x_{2},\mu_{2},\alpha)|\leq LR_{3}
\end{aligned}$$

for some constant $L>0$, where $R_{3}=|x_{1}-x_{2}|+W_{2}(\mu_{1},\mu_{2})$.

We denote by $\mathbb{U}$ the set of admissible controls $\alpha=(\alpha_{t})_{0\leq t\leq T}$ taking values in a given closed convex set $\mathbf{U}\subset\mathbb{R}^{k}$ and satisfying $\mathbb{E}\int_{0}^{T}|\alpha_{t}|^{2}dt<\infty$.

To simplify notation, and without loss of generality, we consider only the case $d=m=k=1$. Assume that $\alpha_{t}^{*}$ is an optimal control, i.e.,

$$J(\alpha_{t}^{*})=\min_{\alpha_{t}\in\mathbb{U}}J(\alpha_{t}).$$

For $0<\varepsilon<1$, let

$$\alpha_{t}^{\varepsilon}=(1-\varepsilon)\alpha_{t}^{*}+\varepsilon\tilde{\alpha}_{t}=\alpha^{*}_{t}+\varepsilon\beta_{t},\qquad\beta_{t}:=\tilde{\alpha}_{t}-\alpha^{*}_{t},$$

where $\tilde{\alpha}_{t}$ is any other admissible control; since $\mathbf{U}$ is convex, $\alpha^{\varepsilon}$ is again admissible. Let $X^{\varepsilon}(u,t)$ and $X^{*}(u,t)$ be the state processes corresponding to $\alpha^{\varepsilon}_{t}$ and $\alpha^{*}_{t}$, respectively.

Define $V(u,t)$ by

$$\left\{\begin{array}{l}dV(u,t)=\Big[b_{x}^{*}(u,t)V(u,t)+\int_{\mathbb{R}}b_{\mu}^{*}(u,t)(v)V(v,t)\mu_{0}(dv)+b_{\alpha}^{*}(u,t)\beta_{t}\Big]dt\\ \qquad\qquad+\Big[\sigma_{x}^{*}(u,t)V(u,t)+\int_{\mathbb{R}}\sigma_{\mu}^{*}(u,t)(v)V(v,t)\mu_{0}(dv)+\sigma_{\alpha}^{*}(u,t)\beta_{t}\Big]dW_{t},\\ V(u,0)=0,\end{array}\right.\tag{27}$$

where

$$\begin{aligned}
b_{x}^{*}(u,t)&=b_{x}\big(t,X^{*}(u,t),\mu_{t}^{*},\alpha^{*}_{t}\big), & b_{\alpha}^{*}(u,t)&=b_{\alpha}\big(t,X^{*}(u,t),\mu_{t}^{*},\alpha^{*}_{t}\big),\\
\sigma_{x}^{*}(u,t)&=\sigma_{x}\big(t,X^{*}(u,t),\mu_{t}^{*},\alpha^{*}_{t}\big), & \sigma_{\alpha}^{*}(u,t)&=\sigma_{\alpha}\big(t,X^{*}(u,t),\mu_{t}^{*},\alpha^{*}_{t}\big),\\
b^{*}_{\mu}(u,t)(v)&=b_{\mu}\big(t,X^{*}(u,t),\mu_{t}^{*},\alpha^{*}_{t}\big)(X^{*}(v,t)), & &\\
\sigma^{*}_{\mu}(u,t)(v)&=\sigma_{\mu}\big(t,X^{*}(u,t),\mu_{t}^{*},\alpha^{*}_{t}\big)(X^{*}(v,t)). & &
\end{aligned}$$
Lemma 3.

Let $V(u,t)$ be defined by (27). Then

$$\sup_{0\leq t\leq T}\lim_{\varepsilon\to 0}\mathbb{E}\int_{\mathbb{R}}\Big[\frac{X^{\varepsilon}(u,t)-X^{*}(u,t)}{\varepsilon}-V(u,t)\Big]^{2}\mu_{0}(du)=0.$$
Proof.

The proof is classical. Let $\Delta X(u,t)=X^{\varepsilon}(u,t)-X^{*}(u,t)$ and $Y(u,t)=\frac{X^{\varepsilon}(u,t)-X^{*}(u,t)}{\varepsilon}-V(u,t)$. Applying Taylor's expansion to the coefficients in the equation for $X^{\varepsilon}(u,t)-X^{*}(u,t)$, we get

$$\begin{aligned}
\Delta X(u,t)=&\int_{0}^{t}\Big[b_{x}^{*}(u,s)\Delta X(u,s)+\int_{\mathbb{R}}b_{\mu}^{*}(u,s)(v)\Delta X(v,s)\mu_{0}(dv)+\varepsilon b_{\alpha}^{*}(u,s)\beta_{s}\Big]ds\\
&+\int_{0}^{t}\Big[\sigma_{x}^{*}(u,s)\Delta X(u,s)+\int_{\mathbb{R}}\sigma_{\mu}^{*}(u,s)(v)\Delta X(v,s)\mu_{0}(dv)+\varepsilon\sigma_{\alpha}^{*}(u,s)\beta_{s}\Big]dW_{s}+o(\varepsilon),
\end{aligned}$$

and hence

$$\begin{aligned}
Y(u,t)=&\int_{0}^{t}\Big[b_{x}^{*}(u,s)Y(u,s)+\int_{\mathbb{R}}b_{\mu}^{*}(u,s)(v)Y(v,s)\mu_{0}(dv)\Big]ds\\
&+\int_{0}^{t}\Big[\sigma_{x}^{*}(u,s)Y(u,s)+\int_{\mathbb{R}}\sigma_{\mu}^{*}(u,s)(v)Y(v,s)\mu_{0}(dv)\Big]dW_{s}+o(1).
\end{aligned}$$

Consequently,

$$\mathbb{E}Y(u,t)^{2}\leq 2(T+1)L^{2}\mathbb{E}\int_{0}^{t}\Big[Y(u,s)^{2}+\Big(\int_{\mathbb{R}}Y(v,s)\mu_{0}(dv)\Big)^{2}\Big]ds+o(1).$$

Integrating with respect to $\mu_{0}(du)$ and using Fubini's theorem, Jensen's inequality and Gronwall's inequality, we obtain the conclusion. ∎
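Lemma 3 can be checked numerically on a finite-particle discretization. The sketch below is our own illustration, not the authors' method: the coefficients `b` and `sigma`, the constant control `a_star = 0.5`, the direction $\beta_{t}=\cos t$ and the uniform empirical $\mu_{0}$ are all hypothetical choices, and (22), (27) are discretized with the Euler–Maruyama scheme driven by one Brownian motion shared by all particles. Since the discrete $V$ is exactly the $\varepsilon$-derivative of the discrete state, the mean-square error of Lemma 3 should decrease roughly like $\varepsilon^{2}$.

```python
import math
import random


# Hypothetical coefficients (d = m = k = 1); the measure enters only through its mean m.
def b(x, m, a):
    return -x + math.sin(m) + a


def sigma(x, m, a):
    return 0.2 + 0.1 * math.cos(x) + 0.1 * m + 0.1 * a


# Their derivatives, as used in (27):
#   b_x = -1,              b_mu(x, mu)(y) = cos(m),   b_alpha = 1,
#   sigma_x = -0.1 sin(x), sigma_mu(x, mu)(y) = 0.1,  sigma_alpha = 0.1.


def lemma3_error(eps, n_particles=10, n_steps=100, n_paths=20, T=1.0, seed=0):
    """Monte Carlo estimate of E int ((X^eps - X^*)/eps - V)^2 mu_0(du) at t = T."""
    rng = random.Random(seed)  # same seed -> same Brownian paths for every eps
    dt = T / n_steps
    u = [i / n_particles for i in range(n_particles)]  # atoms of mu_0, uniform weights
    total = 0.0
    for _ in range(n_paths):
        xs, xe, v = list(u), list(u), [0.0] * n_particles
        for n in range(n_steps):
            beta = math.cos(n * dt)  # hypothetical direction beta_t
            a_star, a_eps = 0.5, 0.5 + eps * beta
            dw = rng.gauss(0.0, math.sqrt(dt))  # common noise for all particles
            ms, me = sum(xs) / n_particles, sum(xe) / n_particles
            mv = sum(v) / n_particles  # int V(v, t) mu_0(dv)
            # Euler step of (27), using X^* at the current time (xs is updated below).
            v = [
                v[i]
                + (-v[i] + math.cos(ms) * mv + beta) * dt
                + (-0.1 * math.sin(xs[i]) * v[i] + 0.1 * mv + 0.1 * beta) * dw
                for i in range(n_particles)
            ]
            xs = [x + b(x, ms, a_star) * dt + sigma(x, ms, a_star) * dw for x in xs]
            xe = [x + b(x, me, a_eps) * dt + sigma(x, me, a_eps) * dw for x in xe]
        total += sum(((xe[i] - xs[i]) / eps - v[i]) ** 2 for i in range(n_particles)) / n_particles
    return total / n_paths


for eps in (1e-1, 1e-2, 1e-3):
    print(eps, lemma3_error(eps))
```

Reusing the same seed for every $\varepsilon$ couples the state processes through identical Brownian increments, which mirrors how $X^{\varepsilon}$ and $X^{*}$ are compared pathwise in the lemma.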

Expanding $J(\alpha_{t}^{\varepsilon})-J(\alpha_{t}^{*})$ by Taylor's formula and using Lemma 3, we obtain

$$\begin{aligned}
\frac{J(\alpha_{t}^{\varepsilon})-J(\alpha_{t}^{*})}{\varepsilon}\to\mathbb{E}\int_{\mathbb{R}}\Big[&\int_{0}^{T}\Big(f_{x}^{*}(u,t)V(u,t)+\int_{\mathbb{R}}f_{\mu}^{*}(u,t)(v)V(v,t)\mu_{0}(dv)\Big)dt\\
&+\int_{0}^{T}f_{\alpha}^{*}(u,t)\beta_{t}dt+g_{x}\big(X^{*}(u,T),\mu_{T}^{*}\big)V(u,T)\\
&+\int_{\mathbb{R}}g_{\mu}\big(X^{*}(u,T),\mu_{T}^{*}\big)(X^{*}(v,T))V(v,T)\mu_{0}(dv)\Big]\mu_{0}(du),
\end{aligned}\tag{28}$$

as $\varepsilon\to 0$, where

$$\begin{aligned}
f_{x}^{*}(u,t)&=f_{x}\big(t,X^{*}(u,t),\mu_{t}^{*},\alpha^{*}_{t}\big),\qquad f_{\alpha}^{*}(u,t)=f_{\alpha}\big(t,X^{*}(u,t),\mu_{t}^{*},\alpha^{*}_{t}\big),\\
f^{*}_{\mu}(u,t)(v)&=f_{\mu}\big(t,X^{*}(u,t),\mu_{t}^{*},\alpha^{*}_{t}\big)(X^{*}(v,t)).
\end{aligned}$$

Define the Hamiltonian function $H$ by

$$H(t,x,\mu,\alpha,p,q)=b(t,x,\mu,\alpha)p+\sigma(t,x,\mu,\alpha)q+f(t,x,\mu,\alpha).\tag{29}$$

The adjoint equation is

$$\left\{\begin{array}{ll}-dp(u,t)=&\Big[b_{x}^{*}(u,t)p(u,t)+\int_{\mathbb{R}}b_{\mu}^{*}(v,t)(u)p(v,t)\mu_{0}(dv)+\sigma_{x}^{*}(u,t)q(u,t)\\ &+\int_{\mathbb{R}}\sigma_{\mu}^{*}(v,t)(u)q(v,t)\mu_{0}(dv)+f_{x}^{*}(u,t)+\int_{\mathbb{R}}f_{\mu}^{*}(v,t)(u)\mu_{0}(dv)\Big]dt\\ &-q(u,t)dW_{t},\\ p(u,T)=&g_{x}\big(X^{*}(u,T),\mu_{T}^{*}\big)+\int_{\mathbb{R}}g_{\mu}\big(X^{*}(v,T),\mu_{T}^{*}\big)(X^{*}(u,T))\mu_{0}(dv),\end{array}\right.\tag{37}$$

i.e.,

$$\left\{\begin{array}{l}-dp(u,t)=\Big[H^{*}_{x}(u,t)+\int_{\mathbb{R}}H_{\mu}^{*}(v,t)(u)\mu_{0}(dv)\Big]dt-q(u,t)dW_{t},\\ p(u,T)=g_{x}\big(X^{*}(u,T),\mu_{T}^{*}\big)+\int_{\mathbb{R}}g_{\mu}\big(X^{*}(v,T),\mu_{T}^{*}\big)(X^{*}(u,T))\mu_{0}(dv),\end{array}\right.\tag{41}$$

where

$$H^{*}(u,t)=H\big(t,X^{*}(u,t),\mu^{*}_{t},\alpha^{*}_{t},p(u,t),q(u,t)\big).$$
Lemma 4.

Let $(\alpha_{t}^{*})_{0\leq t\leq T}$ be the optimal control, $(X_{t}^{*})_{0\leq t\leq T}$ the corresponding state process, and $(p_{t},q_{t})$ the adjoint process satisfying (37). Then the Gâteaux derivative of $J$ at $\alpha^{*}_{t}$ in the direction $\tilde{\alpha}_{t}-\alpha_{t}^{*}$ is

$$\frac{d}{d\varepsilon}J\big(\alpha_{t}^{*}+\varepsilon(\tilde{\alpha}_{t}-\alpha_{t}^{*})\big)\Big|_{\varepsilon=0}=\mathbb{E}\int_{\mathbb{R}}\int_{0}^{T}H^{*}_{\alpha}(u,t)\,(\tilde{\alpha}_{t}-\alpha_{t}^{*})\,dt\,\mu_{0}(du),\tag{42}$$

where $\tilde{\alpha}_{t}$ is any other admissible control.

Proof.

For simplicity, write $\beta_{t}=\tilde{\alpha}_{t}-\alpha_{t}^{*}$. By Itô's formula,

$$\begin{aligned}
d\big(p(u,t)V(u,t)\big)&=p(u,t)dV(u,t)+V(u,t)dp(u,t)+dp(u,t)\,dV(u,t)\\
&=p(u,t)\Big[b_{x}^{*}(u,t)V(u,t)+\int_{\mathbb{R}}b_{\mu}^{*}(u,t)(v)V(v,t)\mu_{0}(dv)+b_{\alpha}^{*}(u,t)\beta_{t}\Big]dt\\
&\quad-V(u,t)\Big[b_{x}^{*}(u,t)p(u,t)+\int_{\mathbb{R}}b_{\mu}^{*}(v,t)(u)p(v,t)\mu_{0}(dv)+\sigma_{x}^{*}(u,t)q(u,t)\\
&\qquad+\int_{\mathbb{R}}\sigma_{\mu}^{*}(v,t)(u)q(v,t)\mu_{0}(dv)+f_{x}^{*}(u,t)+\int_{\mathbb{R}}f_{\mu}^{*}(v,t)(u)\mu_{0}(dv)\Big]dt\\
&\quad+q(u,t)\Big[\sigma_{x}^{*}(u,t)V(u,t)+\int_{\mathbb{R}}\sigma_{\mu}^{*}(u,t)(v)V(v,t)\mu_{0}(dv)+\sigma_{\alpha}^{*}(u,t)\beta_{t}\Big]dt+M_{t}^{u}dW_{t}\\
&=\Big[p(u,t)\int_{\mathbb{R}}b_{\mu}^{*}(u,t)(v)V(v,t)\mu_{0}(dv)-V(u,t)\int_{\mathbb{R}}b_{\mu}^{*}(v,t)(u)p(v,t)\mu_{0}(dv)\Big]dt\\
&\quad+\Big[q(u,t)\int_{\mathbb{R}}\sigma_{\mu}^{*}(u,t)(v)V(v,t)\mu_{0}(dv)-V(u,t)\int_{\mathbb{R}}\sigma_{\mu}^{*}(v,t)(u)q(v,t)\mu_{0}(dv)\Big]dt\\
&\quad-\Big[f_{x}^{*}(u,t)V(u,t)+V(u,t)\int_{\mathbb{R}}f_{\mu}^{*}(v,t)(u)\mu_{0}(dv)\Big]dt\\
&\quad+\Big[b_{\alpha}^{*}(u,t)p(u,t)+\sigma_{\alpha}^{*}(u,t)q(u,t)\Big]\beta_{t}dt+M_{t}^{u}dW_{t},
\end{aligned}\tag{43}$$

where $M_{t}^{u}$ is an $\mathcal{F}_{t}$-adapted process.

By Fubini’s theorem we have

$$\begin{aligned}
&\int_{\mathbb{R}}p(u,t)\int_{\mathbb{R}}b_{\mu}^{*}(u,t)(v)V(v,t)\mu_{0}(dv)\mu_{0}(du)\\
&=\int_{\mathbb{R}}V(v,t)\int_{\mathbb{R}}b_{\mu}^{*}(u,t)(v)p(u,t)\mu_{0}(du)\mu_{0}(dv)\\
&=\int_{\mathbb{R}}V(u,t)\int_{\mathbb{R}}b_{\mu}^{*}(v,t)(u)p(v,t)\mu_{0}(dv)\mu_{0}(du).
\end{aligned}\tag{44}$$

In the same way we get

$$\int_{\mathbb{R}}q(u,t)\int_{\mathbb{R}}\sigma_{\mu}^{*}(u,t)(v)V(v,t)\mu_{0}(dv)\mu_{0}(du)=\int_{\mathbb{R}}V(u,t)\int_{\mathbb{R}}\sigma_{\mu}^{*}(v,t)(u)q(v,t)\mu_{0}(dv)\mu_{0}(du),\tag{45}$$

and

$$\int_{\mathbb{R}}V(u,t)\int_{\mathbb{R}}f_{\mu}^{*}(v,t)(u)\mu_{0}(dv)\mu_{0}(du)=\int_{\mathbb{R}}\int_{\mathbb{R}}f^{*}_{\mu}(u,t)(v)V(v,t)\mu_{0}(dv)\mu_{0}(du).\tag{46}$$

Integrating (43) over $[0,T]$ and with respect to $\mu_{0}(du)$, taking expectations, and using (44), (45) and $V(u,0)=0$, we obtain

$$\begin{aligned}
&\mathbb{E}\int_{\mathbb{R}}\Big[g_{x}\big(X^{*}(u,T),\mu_{T}^{*}\big)V(u,T)+\int_{\mathbb{R}}g_{\mu}\big(X^{*}(u,T),\mu_{T}^{*}\big)(X^{*}(v,T))V(v,T)\mu_{0}(dv)\Big]\mu_{0}(du)\\
&=\mathbb{E}\int_{\mathbb{R}}p(u,T)V(u,T)\mu_{0}(du)=\mathbb{E}\int_{\mathbb{R}}\int_{0}^{T}d\big(p(u,t)V(u,t)\big)\mu_{0}(du)\\
&=-\mathbb{E}\int_{\mathbb{R}}\int_{0}^{T}\Big[f_{x}^{*}(u,t)V(u,t)+V(u,t)\int_{\mathbb{R}}f^{*}_{\mu}(v,t)(u)\mu_{0}(dv)\Big]dt\,\mu_{0}(du)\\
&\quad+\mathbb{E}\int_{\mathbb{R}}\int_{0}^{T}\Big[b_{\alpha}^{*}(u,t)p(u,t)+\sigma_{\alpha}^{*}(u,t)q(u,t)\Big]\beta_{t}\,dt\,\mu_{0}(du).
\end{aligned}\tag{47}$$

Then, substituting (47) into (28) and using (46), we obtain

$$\begin{aligned}
\frac{d}{d\varepsilon}J(\alpha_{t}^{*}+\varepsilon\beta_{t})\Big|_{\varepsilon=0}
&=\mathbb{E}\int_{\mathbb{R}}\int_{0}^{T}\Big(b_{\alpha}^{*}(u,t)p(u,t)+\sigma_{\alpha}^{*}(u,t)q(u,t)+f_{\alpha}^{*}(u,t)\Big)\beta_{t}\,dt\,\mu_{0}(du)\\
&=\mathbb{E}\int_{\mathbb{R}}\int_{0}^{T}H_{\alpha}^{*}(u,t)\,\beta_{t}\,dt\,\mu_{0}(du).
\end{aligned}\tag{48}$$

∎
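The index swaps (44) to (46) are easiest to see when $\mu_{0}$ is an empirical measure: the double integrals become double sums, and (44) is only a renaming of the summation indices. The sketch below is our own check, with a random matrix `K` standing in for $b_{\mu}^{*}(u_{i},t)(u_{j})$ and random vectors standing in for $p(u_{i},t)$ and $V(u_{i},t)$ (all hypothetical data).

```python
import random


def fubini_swap(n=6, seed=1):
    """Both sides of (44) when mu_0 is uniform on n atoms u_1, ..., u_n."""
    rng = random.Random(seed)
    # K[i][j] plays the role of b_mu^*(u_i, t)(u_j); p[i], V[i] of p(u_i, t), V(u_i, t).
    K = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    p = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    V = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    w = 1.0 / n  # mass of each atom
    # int p(u) int b_mu(u)(v) V(v) mu_0(dv) mu_0(du)
    lhs = sum(w * p[i] * sum(w * K[i][j] * V[j] for j in range(n)) for i in range(n))
    # int V(u) int b_mu(v)(u) p(v) mu_0(dv) mu_0(du): note the transposed index K[j][i]
    rhs = sum(w * V[i] * sum(w * K[j][i] * p[j] for j in range(n)) for i in range(n))
    return lhs, rhs


lhs, rhs = fubini_swap()
print(lhs, rhs)  # equal up to floating-point rounding
```

The transposition `K[j][i]` is exactly the switch from $b_{\mu}^{*}(u,t)(v)$ to $b_{\mu}^{*}(v,t)(u)$ that appears in the adjoint equation (37).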

Theorem 1.

Assume that $(\alpha_{t}^{*})_{0\leq t\leq T}$ is an optimal control, and let $(X_{t}^{*})_{0\leq t\leq T}$ and $(p_{t},q_{t})_{0\leq t\leq T}$ be the corresponding state process and adjoint process satisfying (37), respectively, with $\mu^{*}_{t}=\mu_{0}\circ X^{*}(\cdot,t)^{-1}$. Then

$$\Big[\int_{\mathbb{R}}H_{\alpha}^{*}(u,t)\mu_{0}(du)\Big]\cdot(\tilde{\alpha}_{t}-\alpha^{*}_{t})\geq 0,\qquad\forall\tilde{\alpha}_{t}\in\mathbb{U},\quad d\lambda\otimes dP\text{-a.s.}\tag{49}$$
Proof.

Since $(\alpha_{t}^{*})_{0\leq t\leq T}$ is optimal and $\alpha^{*}+\varepsilon(\tilde{\alpha}-\alpha^{*})\in\mathbb{U}$ for $\varepsilon\in[0,1]$, we have

$$\frac{d}{d\varepsilon}J\big(\alpha_{t}^{*}+\varepsilon(\tilde{\alpha}_{t}-\alpha_{t}^{*})\big)\Big|_{\varepsilon=0}\geq 0.$$

By Lemma 4, we get

$$\mathbb{E}\int_{\mathbb{R}}\int_{0}^{T}H_{\alpha}^{*}(u,t)\,(\tilde{\alpha}_{t}-\alpha_{t}^{*})\,dt\,\mu_{0}(du)\geq 0.$$

Since $\tilde{\alpha}\in\mathbb{U}$ is arbitrary, a standard localization argument (choosing $\tilde{\alpha}$ equal to $\alpha^{*}$ outside $[t,t+h]\times\mathcal{A}$ with $\mathcal{A}\in\mathcal{F}_{t}$, dividing by $h$ and letting $h\downarrow 0$) gives, for a.e. $t\in[0,T]$,

$$\mathbb{E}\Big[\mathbf{1}_{\mathcal{A}}\int_{\mathbb{R}}H_{\alpha}^{*}(u,t)\mu_{0}(du)\cdot(\tilde{\alpha}_{t}-\alpha_{t}^{*})\Big]\geq 0,\qquad\forall\mathcal{A}\in\mathcal{F}_{t}.$$

Thus,

$$\Big[\int_{\mathbb{R}}H_{\alpha}^{*}(u,t)\mu_{0}(du)\Big]\cdot(\tilde{\alpha}_{t}-\alpha^{*}_{t})\geq 0,\qquad\forall\tilde{\alpha}_{t}\in\mathbb{U},\quad d\lambda\otimes dP\text{-a.s.}$$

∎

Remark 1.

If the optimal control $(\alpha_{t}^{*})_{0\leq t\leq T}$ takes values in the interior of $\mathbf{U}$, then (49) can be replaced by the condition

$$\int_{\mathbb{R}}H_{\alpha}^{*}(u,t)\mu_{0}(du)=0.$$

We thus obtain the Hamiltonian system, which is a forward-backward SDE with interaction:

$$\left\{\begin{array}{l}dX^{*}(u,t)=H_{p}^{*}(u,t)dt+H^{*}_{q}(u,t)dW_{t},\\ -dp(u,t)=\Big[H^{*}_{x}(u,t)+\int_{\mathbb{R}}H_{\mu}^{*}(v,t)(u)\mu_{0}(dv)\Big]dt-q(u,t)dW_{t},\\ X^{*}(u,0)=u,\\ p(u,T)=g_{x}\big(X^{*}(u,T),\mu_{T}^{*}\big)+\int_{\mathbb{R}}g_{\mu}\big(X^{*}(v,T),\mu_{T}^{*}\big)(X^{*}(u,T))\mu_{0}(dv),\\ \mu_{t}^{*}=\mu_{0}\circ X^{*}(\cdot,t)^{-1}.\end{array}\right.\tag{59}$$

The optimal control must satisfy

$$\int_{\mathbb{R}}H_{\alpha}^{*}(u,t)\mu_{0}(du)=0,\tag{60}$$

where

$$\begin{aligned}
H^{*}(u,t)&=H\big(t,X^{*}(u,t),\mu^{*}_{t},\alpha^{*}_{t},p(u,t),q(u,t)\big),\\
H(t,x,\mu,\alpha,p,q)&=b(t,x,\mu,\alpha)p+\sigma(t,x,\mu,\alpha)q+f(t,x,\mu,\alpha).
\end{aligned}$$

5. Linear Quadratic Case

In this section, we consider two linear quadratic (LQ) problems. The first example is a discrete case (finitely many particles), which we use to check that the results of Section 4 agree with the classical theory. The second example is more general; for it we derive the optimal control and prove its uniqueness.

Example 2.

Consider the following system

$$\left\{\begin{array}{l}dr_{i}(t)=\frac{1}{N}\sum_{k=1}^{N}a\big(\alpha_{t},r_{i}(t)-r_{k}(t)\big)dt+\sigma dW_{t},\\ r_{i}(0)=r_{i}^{0},\end{array}\right.\tag{63}$$

with the cost function

$$J(\alpha_{t})=\mathbb{E}\Big[\int_{0}^{T}\alpha_{t}^{2}dt+\frac{1}{N^{2}}\sum_{k,j=1}^{N}\Gamma\big(r_{k}(T)-r_{j}(T)\big)\Big],\tag{64}$$

where $\Gamma$ is a non-negative definite function. This is a system with interaction with $\mu_{0}=\frac{1}{N}\sum_{i=1}^{N}\delta_{r^{0}_{i}}$. Applying (59) and (60) from Section 4, we obtain the adjoint equation

$$\left\{\begin{array}{l}-dp_{i}(t)=\frac{1}{N}\sum_{k=1}^{N}\Big[a_{r}\big(\alpha_{t}^{*},r_{i}^{*}(t)-r^{*}_{k}(t)\big)\big(p_{i}(t)-p_{k}(t)\big)\Big]dt-q_{i}(t)dW_{t},\\ p_{i}(T)=\frac{1}{N}\sum_{j=1}^{N}\Gamma^{\prime}\big(r_{i}^{*}(T)-r_{j}^{*}(T)\big)-\frac{1}{N}\sum_{k=1}^{N}\Gamma^{\prime}\big(r_{k}^{*}(T)-r^{*}_{i}(T)\big).\end{array}\right.\tag{68}$$

Moreover, if $\Gamma$ is even, i.e. $\Gamma(r)=\Gamma(-r)$ (so that $\Gamma^{\prime}(-r)=-\Gamma^{\prime}(r)$), the terminal condition becomes $p_{i}(T)=\frac{2}{N}\sum_{j=1}^{N}\Gamma^{\prime}\big(r_{i}^{*}(T)-r_{j}^{*}(T)\big)$. The optimal control must then satisfy

$$\frac{1}{N^{2}}\sum_{i=1}^{N}p_{i}(t)\sum_{k=1}^{N}a_{\alpha}\big(\alpha_{t}^{*},r_{i}^{*}(t)-r_{k}^{*}(t)\big)+2\alpha_{t}^{*}=0.\tag{69}$$

On the other hand, this is a classical optimal control problem with an $N$-dimensional state process (see [6, 13]), for which one obtains the same conditions as (68) and (69) (see, for example, equation (3.23) in [31]). This confirms that the results of Section 4 are consistent with the classical theory.
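The terminal condition in (68) can be compared with the classical $N$-dimensional formulation directly: for the terminal cost $\frac{1}{N^{2}}\sum_{k,j}\Gamma(r_{k}-r_{j})$, the value $p_{i}(T)$ should equal $N$ times the partial derivative of that cost in $r_{i}$ (the factor $N$ comes from the weights $\frac{1}{N}$ of $\mu_{0}$). The sketch below checks this, and the simplified form for even $\Gamma$, by central finite differences. The choice $\Gamma(r)=e^{-r^{2}}$ and the positions `r_T` are hypothetical.

```python
import math


def gamma(r):
    return math.exp(-r * r)  # hypothetical even Gamma (Gaussian kernel)


def dgamma(r):
    return -2.0 * r * math.exp(-r * r)  # Gamma'


def terminal_cost(r):
    """(1/N^2) sum_{k,j} Gamma(r_k - r_j): the terminal part of (64)."""
    n = len(r)
    return sum(gamma(r[k] - r[j]) for k in range(n) for j in range(n)) / n**2


def p_terminal(r, i):
    """p_i(T) as written in (68)."""
    n = len(r)
    return (sum(dgamma(r[i] - r[j]) for j in range(n))
            - sum(dgamma(r[k] - r[i]) for k in range(n))) / n


def p_terminal_even(r, i):
    """Simplified p_i(T) for even Gamma."""
    n = len(r)
    return 2.0 * sum(dgamma(r[i] - r[j]) for j in range(n)) / n


def grad_fd(r, i, h=1e-6):
    """Central finite difference of terminal_cost in the coordinate r_i."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return (terminal_cost(rp) - terminal_cost(rm)) / (2.0 * h)


r_T = [0.3, -1.2, 0.7, 2.0]  # hypothetical particle positions r_i^*(T)
N = len(r_T)
for i in range(N):
    print(i, p_terminal(r_T, i), p_terminal_even(r_T, i), N * grad_fd(r_T, i))
```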

Example 3.

Consider the following linear quadratic problem, with state equation

$$\left\{\begin{array}{l}dX(u,t)=\big(AX(u,t)+B\bar{X}_{t}+C\alpha_{t}\big)dt+\big(DX(u,t)+F\bar{X}_{t}+H\alpha_{t}\big)dW_{t},\\ X(u,0)=u,\\ \mu_{t}=\mu_{0}\circ X(\cdot,t)^{-1},\qquad 0\leq t\leq T,\end{array}\right.\tag{73}$$

where $\bar{X}_{t}=\int_{\mathbb{R}}u\,\mu_{t}(du)=\int_{\mathbb{R}}X(u,t)\mu_{0}(du)$. The cost functional is

$$J(\alpha_{t})=\frac{1}{2}\mathbb{E}\int_{0}^{T}\Big(Q\int_{\mathbb{R}}X(u,t)^{2}\mu_{0}(du)+S\bar{X}^{2}_{t}+R\alpha_{t}^{2}\Big)dt+\mathbb{E}\rho^{2}(\mu_{T},\nu),\tag{74}$$

with $Q,S\geq 0$ and $R>0$, where

$$\rho^{2}(\mu,\nu)=\iint_{\mathbb{R}^{2}}(u-v)^{2}\,[\mu(du)-\nu(du)]\,[\nu(dv)-\mu(dv)],$$

and $\nu$ is a given probability measure.
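A short computation, which is ours and not part of the original argument, shows why the derivative in Lemma 5 does not depend on $u$. Assuming $\mu$ and $\nu$ have finite second moments, set $\kappa:=\mu-\nu$, a signed measure of total mass zero; expanding $(u-v)^{2}=u^{2}-2uv+v^{2}$ gives

```latex
\begin{aligned}
\rho^{2}(\mu,\nu)
&= -\iint_{\mathbb{R}^{2}}(u-v)^{2}\,\kappa(du)\,\kappa(dv),
  \qquad \kappa:=\mu-\nu,\quad \kappa(\mathbb{R})=0,\\
&= -\Big[\,2\,\kappa(\mathbb{R})\int_{\mathbb{R}}u^{2}\,\kappa(du)
   -2\Big(\int_{\mathbb{R}}u\,\kappa(du)\Big)^{2}\Big]
 = 2\Big(\int_{\mathbb{R}}u\,\mu(du)-\int_{\mathbb{R}}v\,\nu(dv)\Big)^{2}.
\end{aligned}
```

So $\rho^{2}(\mu,\nu)$ depends only on the means of $\mu$ and $\nu$, which is consistent with the constant derivative $4\big(\bar{X}^{*}_{T}-\int_{\mathbb{R}}v\,\nu(dv)\big)$ in (75).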

Lemma 5.

Let $h(\mu)=\rho^{2}(\mu,\nu)$. Then the differential of $h$ at $\mu_{T}^{*}$ is

$$h_{\mu}(\mu^{*}_{T})(X^{*}(u,T))=4\Big(\bar{X}^{*}_{T}-\int_{\mathbb{R}}v\,\nu(dv)\Big),\qquad\forall u\in\mathbb{R},\tag{75}$$

where

$$\bar{X}^{*}_{T}=\int_{\mathbb{R}}x\,\mu^{*}_{T}(dx)=\int_{\mathbb{R}}X^{*}(u,T)\mu_{0}(du).$$
Proof.
$$\begin{aligned}
h(\mu)&=\iint_{\mathbb{R}^{2}}(u-v)^{2}\big[-\mu(du)\mu(dv)+\mu(du)\nu(dv)+\mu(dv)\nu(du)-\nu(du)\nu(dv)\big]\\
&=-\iint_{\mathbb{R}^{2}}(u-v)^{2}\mu(du)\mu(dv)+2\iint_{\mathbb{R}^{2}}(u-v)^{2}\mu(du)\nu(dv)-\iint_{\mathbb{R}^{2}}(u-v)^{2}\nu(du)\nu(dv)\\
&=-h_{1}(\mu)+2h_{2}(\mu)-h_{3}(\nu).
\end{aligned}$$

Let $\delta X(u,T)=X^{\varepsilon}(u,T)-X^{*}(u,T)$. Notice that

$$\begin{aligned}
&h_{1}(\mu^{\varepsilon}_{T})-h_{1}(\mu_{T}^{*})\\
&=\iint_{\mathbb{R}^{2}}(u-v)^{2}\mu^{\varepsilon}_{T}(du)\mu^{\varepsilon}_{T}(dv)-\iint_{\mathbb{R}^{2}}(u-v)^{2}\mu^{*}_{T}(du)\mu^{*}_{T}(dv)\\
&=\iint_{\mathbb{R}^{2}}\Big[\big(X^{\varepsilon}(u,T)-X^{\varepsilon}(v,T)\big)^{2}-\big(X^{*}(u,T)-X^{*}(v,T)\big)^{2}\Big]\mu_{0}(du)\mu_{0}(dv)\\
&=2\iint_{\mathbb{R}^{2}}\big(X^{*}(u,T)-X^{*}(v,T)\big)\big(\delta X(u,T)-\delta X(v,T)\big)\mu_{0}(du)\mu_{0}(dv)+o(\varepsilon)\\
&=2\iint_{\mathbb{R}^{2}}\Big[X^{*}(u,T)\delta X(u,T)-X^{*}(u,T)\delta X(v,T)-X^{*}(v,T)\delta X(u,T)+X^{*}(v,T)\delta X(v,T)\Big]\mu_{0}(du)\mu_{0}(dv)+o(\varepsilon)\\
&=4\int_{\mathbb{R}}X^{*}(u,T)\delta X(u,T)\mu_{0}(du)-4\iint_{\mathbb{R}^{2}}X^{*}(v,T)\delta X(u,T)\mu_{0}(du)\mu_{0}(dv)+o(\varepsilon)
\end{aligned}$$

and

$$\begin{aligned}
&h_{2}(\mu^{\varepsilon}_{T})-h_{2}(\mu_{T}^{*})\\
&=\iint_{\mathbb{R}^{2}}(u-v)^{2}\mu^{\varepsilon}_{T}(du)\nu(dv)-\iint_{\mathbb{R}^{2}}(u-v)^{2}\mu^{*}_{T}(du)\nu(dv)\\
&=\iint_{\mathbb{R}^{2}}\Big[\big(X^{\varepsilon}(u,T)-v\big)^{2}-\big(X^{*}(u,T)-v\big)^{2}\Big]\mu_{0}(du)\nu(dv)\\
&=2\iint_{\mathbb{R}^{2}}\big(X^{*}(u,T)-v\big)\delta X(u,T)\mu_{0}(du)\nu(dv)+o(\varepsilon)\\
&=2\int_{\mathbb{R}}X^{*}(u,T)\delta X(u,T)\mu_{0}(du)-2\iint_{\mathbb{R}^{2}}v\,\delta X(u,T)\mu_{0}(du)\nu(dv)+o(\varepsilon).
\end{aligned}$$

Therefore,

$$\begin{aligned}
h(\mu^{\varepsilon}_{T})-h(\mu^{*}_{T})&=-\big(h_{1}(\mu^{\varepsilon}_{T})-h_{1}(\mu_{T}^{*})\big)+2\big(h_{2}(\mu^{\varepsilon}_{T})-h_{2}(\mu_{T}^{*})\big)\\
&=-4\iint_{\mathbb{R}^{2}}v\,\delta X(u,T)\mu_{0}(du)\nu(dv)+4\iint_{\mathbb{R}^{2}}X^{*}(v,T)\delta X(u,T)\mu_{0}(du)\mu_{0}(dv)+o(\varepsilon)\\
&=\int_{\mathbb{R}}h_{\mu}(\mu_{T}^{*})(X^{*}(u,T))\delta X(u,T)\mu_{0}(du)+o(\varepsilon),
\end{aligned}$$

where

$$h_{\mu}(\mu_{T}^{*})(X^{*}(u,T))=4\int_{\mathbb{R}}X^{*}(v,T)\mu_{0}(dv)-4\int_{\mathbb{R}}v\,\nu(dv)=4\Big(\bar{X}^{*}_{T}-\int_{\mathbb{R}}v\,\nu(dv)\Big).$$

∎
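Formula (75) can be checked numerically when $\mu_{0}$ is uniform on finitely many atoms, so that $\mu_{T}$ is the empirical law of the values $X^{*}(u_{i},T)$. The sketch below, our own illustration with hypothetical data `xs`, `ys`, `delta`, evaluates $\rho^{2}$ from its double-integral definition and compares a central difference of $\varepsilon\mapsto h(\mu^{\varepsilon}_{T})$ with $\int_{\mathbb{R}}h_{\mu}(\mu_{T}^{*})(X^{*}(u,T))\,\delta X(u,T)\,\mu_{0}(du)$.

```python
def rho2(xs, ys):
    """rho^2(mu, nu) from its double-integral definition, for the empirical laws of xs and ys."""
    kappa = [(x, 1.0 / len(xs)) for x in xs] + [(y, -1.0 / len(ys)) for y in ys]  # mu - nu
    # rho^2 = int int (u - v)^2 [mu - nu](du) [nu - mu](dv) = -int int (u - v)^2 kappa(du) kappa(dv)
    return -sum(wa * wb * (a - c) ** 2 for a, wa in kappa for c, wb in kappa)


def h_mu(xs, ys):
    """Formula (75): the same value 4 (mean(xs) - mean(ys)) at every atom."""
    return 4.0 * (sum(xs) / len(xs) - sum(ys) / len(ys))


def directional_fd(xs, ys, delta, eps=1e-4):
    """Central difference of eps -> h(law of xs + eps * delta)."""
    plus = [x + eps * d for x, d in zip(xs, delta)]
    minus = [x - eps * d for x, d in zip(xs, delta)]
    return (rho2(plus, ys) - rho2(minus, ys)) / (2.0 * eps)


def directional_formula(xs, ys, delta):
    """int h_mu(mu_T)(X(u, T)) delta X(u, T) mu_0(du), with mu_0 uniform on the atoms."""
    return sum(h_mu(xs, ys) * d for d in delta) / len(xs)


xs = [0.1, 0.5, -0.3, 1.2, 0.8]     # hypothetical X^*(u_i, T)
ys = [0.0, 2.0, -1.0]               # hypothetical atoms of nu
delta = [1.0, -0.5, 0.3, 0.0, 2.0]  # hypothetical perturbation delta X(u_i, T)
print(directional_fd(xs, ys, delta), directional_formula(xs, ys, delta))
```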

Remark 2.

For any probability measures $\mu_{1},\mu_{2}$ and random variables $X_{1},X_{2}$ with distributions $\mu_{1},\mu_{2}$, a computation similar to the proof above gives

$$h(\mu_{2})-h(\mu_{1})\geq\big\langle h_{\mu}(\mu_{1})(X_{1}),X_{2}-X_{1}\big\rangle,\tag{76}$$

where $\langle X,Y\rangle=\int_{\mathbb{R}^{2}}xy\,\rho(dx,dy)$ with $\rho$ the joint distribution of $(X,Y)$, and

$$h(\mu_{1})+h(\mu_{2})\geq 2h(\mu_{3}),\tag{77}$$

where $\mu_{3}$ is the distribution of $\frac{X_{1}+X_{2}}{2}$. Inequalities (76) and (77) will be used below.
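Because $\rho^{2}(\mu,\nu)$ reduces, after expanding the square, to twice the squared difference of the means, (76) and (77) amount to convexity of a quadratic function of the mean. The sketch below is our own check with hypothetical Gaussian samples: it evaluates $\rho^{2}$ from its double-integral definition for empirical laws, couples $X_{1}$ and $X_{2}$ on the same sample points (so that $\langle\cdot,\cdot\rangle$ becomes a sample average), and reports the smallest slack observed in (76) and (77).

```python
import random


def rho2(xs, ys):
    """rho^2(mu, nu) for the empirical laws of xs and ys, from the double-integral definition."""
    kappa = [(x, 1.0 / len(xs)) for x in xs] + [(y, -1.0 / len(ys)) for y in ys]  # mu - nu
    return -sum(wa * wb * (a - c) ** 2 for a, wa in kappa for c, wb in kappa)


def check_remark2(trials=100, n=20, seed=2):
    """Smallest slack observed in (76) and (77) over random coupled samples (X_1, X_2)."""
    rng = random.Random(seed)
    ys = [rng.gauss(0.0, 1.0) for _ in range(10)]  # hypothetical atoms of nu
    slack76 = slack77 = float("inf")
    for _ in range(trials):
        s1, s2 = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
        x1 = [s1 + rng.gauss(0.0, 1.0) for _ in range(n)]  # samples of X_1 (law mu_1)
        x2 = [s2 + rng.gauss(0.0, 1.0) for _ in range(n)]  # samples of X_2 (law mu_2)
        x3 = [(a + c) / 2.0 for a, c in zip(x1, x2)]       # samples of (X_1 + X_2)/2 (law mu_3)
        h1, h2, h3 = rho2(x1, ys), rho2(x2, ys), rho2(x3, ys)
        grad = 4.0 * (sum(x1) / n - sum(ys) / len(ys))      # h_mu(mu_1)(x) from (75)
        pairing = sum(grad * (c - a) for a, c in zip(x1, x2)) / n  # <h_mu(mu_1)(X_1), X_2 - X_1>
        slack76 = min(slack76, h2 - h1 - pairing)
        slack77 = min(slack77, h1 + h2 - 2.0 * h3)
    return slack76, slack77


print(check_remark2())  # both values should be >= 0 up to rounding
```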

Hence, by (59) and (60), the adjoint equation is

$$\left\{\begin{array}{ll}-dp(u,t)&=\Big[Ap(u,t)+B\bar{p}_{t}+Dq(u,t)+F\bar{q}_{t}+QX^{*}(u,t)+S\bar{X}_{t}^{*}\Big]dt-q(u,t)dW_{t},\\ p(u,T)&=4\Big(\bar{X}^{*}_{T}-\int_{\mathbb{R}}v\,\nu(dv)\Big),\end{array}\right.\tag{83}$$

where $\bar{p}_{t}=\int_{\mathbb{R}}p(u,t)\mu_{0}(du)$ and $\bar{q}_{t}=\int_{\mathbb{R}}q(u,t)\mu_{0}(du)$. The necessary condition (60) for the optimal control reads

$$\alpha_{t}^{*}=-R^{-1}\big(C\bar{p}_{t}+H\bar{q}_{t}\big),\quad t\in[0,T].\tag{84}$$
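For the reader's convenience, the derivation of (83) and (84) from (59) and (60) can be written out under our reading of (74): running cost $f(t,x,\mu,\alpha)=\frac{1}{2}(Qx^{2}+S\bar{x}_{\mu}^{2}+R\alpha^{2})$ with $\bar{x}_{\mu}=\int_{\mathbb{R}}y\,\mu(dy)$, and terminal cost $g(x,\mu)=\rho^{2}(\mu,\nu)$. Note that the letter $H$ denotes both the Hamiltonian and the coefficient of $\alpha_{t}$ in the diffusion term of (73).

```latex
\begin{aligned}
H(t,x,\mu,\alpha,p,q)&=\big(Ax+B\bar{x}_{\mu}+C\alpha\big)p+\big(Dx+F\bar{x}_{\mu}+H\alpha\big)q
  +\tfrac{1}{2}\big(Qx^{2}+S\bar{x}_{\mu}^{2}+R\alpha^{2}\big),\\
H_{x}&=Ap+Dq+Qx,\qquad \partial_{\mu}H(t,x,\mu,\alpha,p,q)(y)=Bp+Fq+S\bar{x}_{\mu},\qquad H_{\alpha}=Cp+Hq+R\alpha,\\
\int_{\mathbb{R}}H_{\alpha}^{*}(u,t)\,\mu_{0}(du)&=C\bar{p}_{t}+H\bar{q}_{t}+R\alpha_{t}^{*}=0
\ \Longrightarrow\ \alpha_{t}^{*}=-R^{-1}\big(C\bar{p}_{t}+H\bar{q}_{t}\big).
\end{aligned}
```

Integrating $\partial_{\mu}H^{*}(v,t)(u)$ against $\mu_{0}(dv)$ gives $B\bar{p}_{t}+F\bar{q}_{t}+S\bar{X}^{*}_{t}$, which together with $H_{x}$ reproduces the drift in (83).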
Theorem 2.

The control $\alpha_{t}^{*}=-R^{-1}(C\bar{p}_{t}+H\bar{q}_{t})$, $t\in[0,T]$, is the unique optimal control for the LQ problem (73), (74), where $(p_{t},q_{t})$ is defined by (83).

Proof.

We now prove that $\alpha_{t}^{*}$ is an optimal control. For any $\alpha_{t}\in\mathbb{U}$, let $X$ and $X^{*}$ be the state processes corresponding to $\alpha_{t}$ and $\alpha_{t}^{*}$, respectively. Denote $\delta X(u,t)=X(u,t)-X^{*}(u,t)$, $\delta\bar{X}_{t}=\bar{X}_{t}-\bar{X}^{*}_{t}$ and $\delta\alpha_{t}=\alpha_{t}-\alpha^{*}_{t}$. By Itô's formula,

$$\begin{aligned}
d\big(p(u,t)\delta X(u,t)\big)=&\Big[B\big(p(u,t)\delta\bar{X}_{t}-\bar{p}_{t}\delta X(u,t)\big)+F\big(q(u,t)\delta\bar{X}_{t}-\bar{q}_{t}\delta X(u,t)\big)\\
&-\delta X(u,t)\big(QX^{*}(u,t)+S\bar{X}^{*}_{t}\big)+\delta\alpha_{t}\big(Cp(u,t)+Hq(u,t)\big)\Big]dt+M_{t}^{u}dW_{t}.
\end{aligned}$$

Since $p(u,T)=h_{\mu}(\mu_{T}^{*})(X^{*}(u,T))$ and $\delta X(u,0)=0$, integrating over $[0,T]$ and with respect to $\mu_{0}(du)$ (the $B$ and $F$ terms cancel by Fubini's theorem), and using (76) and (84), we get

$$\begin{aligned}
\mathbb{E}[h(\mu_{T})-h(\mu^{*}_{T})]&\geq\mathbb{E}\int_{\mathbb{R}}p(u,T)\delta X(u,T)\mu_{0}(du)=\mathbb{E}\int_{\mathbb{R}}\int_{0}^{T}d\big(p(u,t)\delta X(u,t)\big)\mu_{0}(du)\\
&=-\mathbb{E}\int_{0}^{T}\Big[Q\int_{\mathbb{R}}X^{*}(u,t)\delta X(u,t)\mu_{0}(du)+S\bar{X}^{*}_{t}\delta\bar{X}_{t}+R\alpha_{t}^{*}\delta\alpha_{t}\Big]dt.
\end{aligned}$$

Using $a^{2}-b^{2}-2b(a-b)=(a-b)^{2}\geq 0$, we get

$$\begin{aligned}
&J(\alpha_{t})-J(\alpha^{*}_{t})\\
&=\frac{1}{2}\mathbb{E}\int_{0}^{T}\Big[Q\int_{\mathbb{R}}\big(X(u,t)^{2}-X^{*}(u,t)^{2}\big)\mu_{0}(du)+S\big(\bar{X}^{2}_{t}-(\bar{X}_{t}^{*})^{2}\big)+R\big(\alpha_{t}^{2}-(\alpha_{t}^{*})^{2}\big)\Big]dt+\mathbb{E}[h(\mu_{T})-h(\mu^{*}_{T})]\\
&\geq\frac{1}{2}\mathbb{E}\int_{0}^{T}\Big[Q\int_{\mathbb{R}}\big(X(u,t)^{2}-X^{*}(u,t)^{2}\big)\mu_{0}(du)-2Q\int_{\mathbb{R}}X^{*}(u,t)\delta X(u,t)\mu_{0}(du)\\
&\qquad+S\big(\bar{X}^{2}_{t}-(\bar{X}_{t}^{*})^{2}\big)-2S\bar{X}^{*}_{t}\delta\bar{X}_{t}+R\big(\alpha_{t}^{2}-(\alpha_{t}^{*})^{2}\big)-2R\alpha_{t}^{*}\delta\alpha_{t}\Big]dt\geq 0.
\end{aligned}$$

This shows that $\alpha_{t}^{*}$ is an optimal control. We now prove that $\alpha^{*}_{t}$ is unique. Assume that both $\alpha_{t}^{*,1}$ and $\alpha_{t}^{*,2}$ are optimal controls, and let $X^{1}(u,t)$ and $X^{2}(u,t)$ be the corresponding state processes. Since (73) is linear, $\frac{X^{1}(u,t)+X^{2}(u,t)}{2}$ is the state process corresponding to $\frac{\alpha_{t}^{*,1}+\alpha_{t}^{*,2}}{2}$. Let $\theta\geq 0$ denote the common optimal value,

$$J(\alpha_{t}^{*,1})=J(\alpha_{t}^{*,2})=\theta.$$

Using the identity $a^{2}+b^{2}=2\big[(\frac{a+b}{2})^{2}+(\frac{a-b}{2})^{2}\big]$ and (77), we have

$$\begin{aligned}
2\theta=&\,J(\alpha_{t}^{*,1})+J(\alpha_{t}^{*,2})\\
=&\,\frac{1}{2}\mathbb{E}\int_{0}^{T}\Big[Q\int_{\mathbb{R}}\big(X^{1}(u,t)^{2}+X^{2}(u,t)^{2}\big)\mu_{0}(du)+S\big((\bar{X}^{1}_{t})^{2}+(\bar{X}^{2}_{t})^{2}\big)+R\big((\alpha_{t}^{*,1})^{2}+(\alpha_{t}^{*,2})^{2}\big)\Big]dt\\
&+\mathbb{E}\big(\rho^{2}(\mu_{T}^{*,1},\nu)+\rho^{2}(\mu_{T}^{*,2},\nu)\big)\\
\geq&\,\mathbb{E}\int_{0}^{T}\Big[Q\int_{\mathbb{R}}\Big(\frac{X^{1}(u,t)+X^{2}(u,t)}{2}\Big)^{2}\mu_{0}(du)+S\Big(\frac{\bar{X}^{1}_{t}+\bar{X}^{2}_{t}}{2}\Big)^{2}\\
&\qquad+R\Big(\frac{\alpha_{t}^{*,1}+\alpha_{t}^{*,2}}{2}\Big)^{2}+R\Big(\frac{\alpha_{t}^{*,1}-\alpha_{t}^{*,2}}{2}\Big)^{2}\Big]dt+2\mathbb{E}\rho^{2}\big(\mu_{T}^{*,3},\nu\big)\\
=&\,2J\Big(\frac{\alpha_{t}^{*,1}+\alpha_{t}^{*,2}}{2}\Big)+\mathbb{E}\int_{0}^{T}R\Big(\frac{\alpha_{t}^{*,1}-\alpha_{t}^{*,2}}{2}\Big)^{2}dt\\
\geq&\,2\theta+\frac{R}{4}\mathbb{E}\int_{0}^{T}|\alpha_{t}^{*,1}-\alpha_{t}^{*,2}|^{2}dt,
\end{aligned}$$

where $\mu_{T}^{*,3}$ is the distribution of $\frac{X_{T}^{*,1}+X_{T}^{*,2}}{2}$. Thus,

$$\mathbb{E}\int_{0}^{T}|\alpha_{t}^{*,1}-\alpha_{t}^{*,2}|^{2}dt\leq 0,$$

which shows that $\alpha_{t}^{*,1}=\alpha_{t}^{*,2}$, $d\lambda\otimes dP$-a.s. ∎

Remark 3.

Consider replacing the terminal cost in (74) by $\tilde{\rho}^{2}(\mu_{T})=\iint_{\mathbb{R}^{2}}(u-v)^{2}\mu_{T}(du)\mu_{T}(dv)$. By the same method as in Lemma 5, the differential of $\tilde{\rho}^{2}$ at $\mu^{*}_{T}$ is $\partial_{\mu}\tilde{\rho}^{2}(\mu^{*}_{T})(X^{*}(u,T))=4\big(X^{*}(u,T)-\bar{X}_{T}^{*}\big)$, and (76), (77) still hold for $\tilde{\rho}^{2}$. Hence the unique optimal control is obtained in the same way, with the terminal value of the adjoint equation (83) replaced by $\partial_{\mu}\tilde{\rho}^{2}(\mu^{*}_{T})(X^{*}(u,T))$.
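The derivative in Remark 3 can be checked in the same way as Lemma 5. For an empirical law, $\tilde{\rho}^{2}(\mu)$ equals twice the (population) variance, and a central difference along a perturbation $\delta X$ should match $\int_{\mathbb{R}}4\big(X(u,T)-\bar{X}_{T}\big)\delta X(u,T)\mu_{0}(du)$. The sketch below is our own check; `xs` and `delta` are hypothetical data, with $\mu_{0}$ uniform on the atoms.

```python
def rho_tilde2(xs):
    """tilde rho^2(mu) = int int (u - v)^2 mu(du) mu(dv) for the empirical law of xs."""
    n = len(xs)
    return sum((a - c) ** 2 for a in xs for c in xs) / n**2


def d_mu_rho_tilde2(xs):
    """Derivative 4 (x - mean) from Remark 3, evaluated at each atom x."""
    xbar = sum(xs) / len(xs)
    return [4.0 * (x - xbar) for x in xs]


def check_remark3(xs, delta, eps=1e-4):
    """Central difference along delta versus the integral of the claimed derivative."""
    plus = [x + eps * d for x, d in zip(xs, delta)]
    minus = [x - eps * d for x, d in zip(xs, delta)]
    fd = (rho_tilde2(plus) - rho_tilde2(minus)) / (2.0 * eps)
    formula = sum(g * d for g, d in zip(d_mu_rho_tilde2(xs), delta)) / len(xs)
    return fd, formula


xs = [0.1, 0.5, -0.3, 1.2, 0.8]     # hypothetical X^*(u_i, T)
delta = [1.0, -0.5, 0.3, 0.0, 2.0]  # hypothetical perturbation delta X(u_i, T)
print(check_remark3(xs, delta))
```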

References

  • [1] Andrey Dorogovtsev. Stochastic flows with interaction and measure-valued processes. International Journal of Mathematics and Mathematical Sciences, 67, 11 2003.
  • [2] Andrey A Dorogovtsev. Measure-valued processes and stochastic flows, volume 3. Walter de Gruyter GmbH & Co KG, 2023.
  • [3] Svetlozar T Rachev. The Monge–Kantorovich mass transference problem and its stochastic applications. Theory of Probability & Its Applications, 29(4):647–676, 1985.
  • [4] Vladimir I Bogachev and Aleksandr V Kolesnikov. The Monge–Kantorovich problem: achievements, connections, and perspectives. Russian Mathematical Surveys, 67(5):785, 2012.
  • [5] Christian Léonard. From the Schrödinger problem to the Monge–Kantorovich problem. Journal of Functional Analysis, 262(4):1879–1920, 2012.
  • [6] René Carmona. Lectures on BSDEs, Stochastic Control, and Stochastic Differential Games with Financial Applications. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2016.
  • [7] Peter Kotelenez. A class of quasilinear stochastic partial differential equations of McKean–Vlasov type with mass conservation. Probability Theory and Related Fields, 102(2):159–188, 1995.
  • [8] Rainer Buckdahn, Boualem Djehiche, Juan Li, and Shige Peng. Mean-field backward stochastic differential equations: A limit approach. The Annals of Probability, 37(4):1524–1565, 2009.
  • [9] Nacira Agram, Yaozhong Hu, and Bernt Øksendal. Mean-field backward stochastic differential equations and applications. Systems & Control Letters, 162:105196, 2022.
  • [10] Vassili N. Kolokoltsov. Nonlinear Markov Processes and Kinetic Equations. Cambridge Tracts in Mathematics. Cambridge University Press, 2010.
  • [11] H. J. Kushner. Necessary conditions for continuous parameter stochastic optimization problems. SIAM Journal on Control, 10(3):550–565, 1972.
  • [12] Jean-Michel Bismut. An introductory approach to duality in optimal stochastic control. SIAM Review, 20(1):62–78, 1978.
  • [13] A. Bensoussan. Lectures on stochastic control. In Sanjoy K. Mitter and Antonio Moro, editors, Nonlinear Filtering and Stochastic Control, pages 1–62, Berlin, Heidelberg, 1982. Springer Berlin Heidelberg.
  • [14] Jiongmin Yong and Xun Yu Zhou. Stochastic controls: Hamiltonian systems and HJB equations, volume 43. Springer Science & Business Media, 1999.
  • [15] Shige Peng. A general stochastic maximum principle for optimal control problems. SIAM Journal on Control and Optimization, 28(4):966–979, 1990.
  • [16] Xun Yu Zhou. Stochastic near-optimal controls: necessary and sufficient conditions for near-optimality. SIAM Journal on Control and Optimization, 36(3):929–947, 1998.
  • [17] Shige Peng and Zhen Wu. Fully coupled forward-backward stochastic differential equations and applications to optimal control. SIAM Journal on Control and Optimization, 37(3):825–843, 1999.
  • [18] Yuecai Han, Shige Peng, and Zhen Wu. Maximum principle for backward doubly stochastic control systems with applications. SIAM Journal on Control and Optimization, 48(7):4224–4241, 2010.
  • [19] Jiongmin Yong. Linear-quadratic optimal control problems for mean-field stochastic differential equations. SIAM Journal on Control and Optimization, 51(4):2809–2838, 2013.
  • [20] Yuecai Han, Yaozhong Hu, and Jian Song. Maximum principle for general controlled systems driven by fractional Brownian motions. Applied Mathematics & Optimization, 67(2):279–322, 2013.
  • [21] Etienne Pardoux and Shige Peng. Adapted solution of a backward stochastic differential equation. Systems & Control Letters, 14(1):55–61, 1990.
  • [22] Shige Peng. Backward stochastic differential equations and applications to optimal control. Applied Mathematics and Optimization, 27(2):125–144, 1993.
  • [23] Nicole El Karoui, Shige Peng, and Marie Claire Quenez. Backward stochastic differential equations in finance. Mathematical Finance, 7(1):1–71, 1997.
  • [24] Jasmina Dordevic and Andrey Dorogovtsev. Backward stochastic differential equations with interaction, 2022.
  • [25] Donald A. Dawson. Measure Valued Processes. École d’été de probabilités de Saint-Flour, 1991.
  • [26] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient flows: In metric spaces and in the space of probability measures. 2nd edition, 2008.
  • [27] Cédric Villani. Optimal transport, old and new. Springer, Grundlehren der mathematischen Wissenschaften, 2008.
  • [28] Max Renesse and Karl-Theodor Sturm. Entropic measure and Wasserstein diffusion. The Annals of Probability, 37, May 2007.
  • [29] Pierre Cardaliaguet. Notes on mean field games. Technical report, 2010.
  • [30] P. L. Lions. Cours au Collège de France: Théorie des jeux à champs moyens. 2013.
  • [31] Jiongmin Yong. Stochastic optimal control: a concise introduction. Mathematical Control & Related Fields, 12, January 2019.