
Event-based EV Charging Scheduling in A Microgrid of Buildings

Qilong Huang, Li Yang, Chen Hou, Zhiyong Zeng, and Yaowen Qi

This work has been accepted by IEEE Transactions on Transportation Electrification. Please refer to the final version in IEEE. Q. Huang, L. Yang and Y. Qi are with the School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China (email: huangql@njust.edu.cn, yangli945@126.com, qiyaowen@njust.edu.cn). Li Yang is the corresponding author.

Abstract

With the popularization of electric vehicles (EVs), EV charging demand is becoming an important load in buildings. Considering the mobility of EVs from building to building and their uncertain charging demand, it is of great practical interest to control the EV charging process in a microgrid of buildings to optimize the total operation cost while ensuring the transmission safety between the microgrid and the main grid. We consider this important problem in this paper and make the following contributions. First, we formulate this problem as a Markov decision process to capture the uncertain supply and EV charging demand in the microgrid of buildings. Besides reducing the total operation cost of the buildings, the model also considers the power exchange limitation to ensure transmission safety. Second, this model is reformulated under the event-based optimization framework to alleviate the impact of the large state and action spaces. By appropriately defining the event and the event-based action, the EV charging process can be optimized by searching for a randomized parametric event-based control policy in the microgrid controller and implementing a selecting-to-charging rule in each building controller. Third, a constrained gradient-based policy optimization method with an adjusting mechanism is proposed to iteratively find the optimal event-based control policy for the EV charging demand in each building. Numerical experiments considering a microgrid of three buildings are conducted to analyze the structure and the performance of the event-based control policy for EV charging.

Index Terms:
Electric vehicle, building energy management, Markov decision process, discrete event dynamic system, event-based optimization.

I Introduction

Electric vehicles (EVs) have attracted more and more attention in recent years due to their lower emissions and reduced dependence on fossil fuels. In order to achieve its carbon peak and neutrality goals, China has made great efforts to encourage EV adoption. For example, EV adoption in 2020 increased nearly nine-fold compared to 2015. In 2020, there were about 4.17 million EVs on the road, with a vehicle-to-pile ratio of 3:1[1]. Although EV popularization helps to alleviate the fossil-fuel crisis and environmental pollution, it brings a new challenge to the operation of the microgrid if there is no charging control for the increasing number of EVs[2].

As EVs are charged by connecting to the charging piles in a building, the building is the main infrastructure that interacts with EVs. The impact of EV charging on the building lies in two aspects. On one hand, the charging profile of EVs influences the energy operation of the building. The building energy operator needs to procure extra power in order to satisfy the EV charging demand and achieve load balance. If no proper charging control policy is implemented, the energy operation cost of the building may increase, and this cost will be paid by the users in the end. On the other hand, if each building only considers the charging control of the EVs parked in that building, the charging actions of multiple buildings may be homogeneous, which may bring a new load peak to the grid. In a microgrid with multiple buildings, this may make the aggregated load exceed the contract capacity. When this happens, the distribution feeders and transformers may become congested, which may cause voltage fluctuations in the microgrid[3].

According to the UK national travel survey, EVs are usually parked in the building 96.5% of the time[4]. Therefore, EVs can be considered as mobile storage devices with short unavailable time. In this context, it is of great practical interest to schedule the EV charging in a microgrid of buildings to reduce the EV charging impact on the economic operation of the building while limiting the load peak impact to the main grid. This problem is non-trivial to solve due to the following difficulties:

First, there are uncertainties on both the supply side and the demand side of the buildings. The supply-side uncertainty comes from the uncertain generation of distributed renewable energy[5]. The demand-side uncertainty comes from the uncertain charging demand of EVs: the arrival time, the parking time and the required charging energy are all unknown before an EV begins to park and charge in the building[6].

Second, the large number of EVs in the buildings introduces a large state space and action space. Currently there are usually dozens to hundreds of charging piles in a building[7]. If all these charging piles are occupied, the charging states and charging actions are high-dimensional and their number increases exponentially with the number of EVs. This makes it difficult to find an optimal charging control policy for this problem.

Third, the aggregated charging power in a microgrid of buildings is limited. As mentioned above, a new load peak may appear if all the buildings implement a homogeneous charging control policy. Therefore, the power exchange of each building in the microgrid should also be considered during scheduling in order to avoid increasing the load peak. This further increases the difficulty of solving this problem.

In recent years, EV charging control has attracted much attention due to the popularization of EVs and their impact on the grid[8, 9]. Various control methods have been established for EV charging scheduling. Based on the control model used, these approaches can be mainly divided into two categories: the deterministic control approach and the stochastic control approach.

In the deterministic charging control approach, the EV charging process is usually formulated as a deterministic programming model. In other words, it assumes that the future EV charging demand, such as the arrival time, parking duration and required charging energy, is known a priori. Therefore, many traditional methods can be applied to search for an optimized charging control policy, such as linear/quadratic programming[10], mixed-integer programming[11], heuristic approaches[12], model predictive control[13], etc. These methods enjoy high optimization efficiency because they assume perfect information about the uncertainty.

However, the assumption introduced above rarely holds, and these uncertainties cannot be neglected in practice. For example, if the charging control policy is derived from a deterministic programming model by assuming that some EVs are parked for charging, this policy may be sub-optimal or even infeasible when these EVs are absent due to prediction error. Therefore, many researchers study stochastic control methods, such as scenario-based optimization[6], robust optimization[14], reinforcement learning[15], simulation-based policy improvement[16], etc. The first two methods usually try to convert the stochastic charging control problem into a deterministic control problem. These two methods must be carried out periodically, and no experience is accumulated across the control policies obtained at each time. The latter two methods usually formulate the problem as a Markov decision process (MDP) and can be considered state-based methods. Due to the well-known curse of dimensionality[17], these methods suffer from the large state space and action space that arise with a large number of EVs in a microgrid of buildings.

Most existing works mainly consider EV charging control in a parking lot or charging station[12, 6]. The interaction between EV charging and building energy operation has also gained attention in recent years. Reference [13] considers the coordination between EV charging and building-integrated wind energy. Reference [18] studies the impact of EV charging on the operation cost of the building. In [19], a transactive real-time EV charging management scheme is proposed to coordinate EV charging with the distributed photovoltaic (PV) generation in the building. However, few works consider EV charging control in a microgrid of buildings to avoid homogeneous charging actions. This is critical to avoid exceeding the load capacity of the microgrid and to ensure operation safety.

Based on the discussions above, we study the EV charging scheduling problem in a microgrid of buildings in this paper. Compared with the published literature, the main contributions of this paper are as follows:

1) The EV charging scheduling is formulated as a Markov decision process model to capture the uncertain distributed renewable energy and EV charging demand in the microgrid of buildings. Besides reducing the charging impact on the economic operation of the buildings, this model also limits the power exchange with the main grid to avoid increasing the load peak, for transmission safety.

2) In order to alleviate the impact of the large state space and action space when solving the proposed MDP model, we reformulate it within the event-based optimization framework. By appropriately defining the event and the event-based action, the EV charging process can be optimized by searching for a randomized parametric event-based control policy in the microgrid controller and implementing a selecting-to-charging rule in each building controller.

3) A constrained gradient-based policy optimization method with an adjusting mechanism is proposed to iteratively find the optimal event-based control policy for the EV charging demand in each building while ensuring transmission safety. Numerical experiments considering a microgrid of three buildings are conducted to analyze the structure and the performance of the event-based control policy for EV charging.

The rest of this paper is organized as follows. We formulate the problem in Section II, present the solution methodology in Section III, discuss the numerical results in Section IV, and briefly conclude in Section V.

II Problem Formulation

II-A System Description

We consider a microgrid of buildings as depicted in Fig. 1. In the microgrid, each building is equipped with distributed renewable energy (DRE), hydrogen energy storage (HES) and charging piles. Each building should provide charging service and keep load balance. We assume that the building procures power from the grid through the microgrid operation controller only when the output of DRE and HES cannot satisfy the EV charging demand and the building load. The building can also sell power to the grid if the output of DRE is large. In order to reduce the EV charging impact and ensure transmission safety, the microgrid operation controller should regulate the EV charging behavior in each building based on the supply and demand status.

Figure 1: System description of EV scheduling in a microgrid of buildings.

In the following, we first formulate this multi-stage stochastic decision problem as a Markov decision process to capture the uncertain renewable energy and EV charging demand in each building. To simplify the discussion, the following assumptions are made:

1) The charge level of each EV is fixed.

2) The distributed renewable energy is free.

3) The electricity price from the grid is deterministic and known a priori.

II-B System Model

We consider this EV scheduling problem in a microgrid of buildings over the discretized horizon $t=1,2,\ldots,T$, where $t$ denotes the decision epoch and $\Delta T$ denotes the decision interval. There are $K$ buildings and $N$ EVs in the microgrid. The MDP model of the proposed problem is shown below.

1) System States: The system state at stage $t$ is defined as $S_{t}=[s_{t}^{1},s_{t}^{2},\ldots,s_{t}^{K}]$, where $s_{t}^{k}$, $k=1,2,\ldots,K$, denotes the state of the $k$th building. Each $s_{t}^{k}$ is defined as $s_{t}^{k}=[r_{t}^{k},b_{t}^{k},n_{t,m}^{k},n_{t,c}^{k}]$, where $r_{t}^{k}$ denotes the output of DRE in the building, $b_{t}^{k}$ denotes the state of charge (SOC) of HES, $n_{t,m}^{k}$ denotes the number of EVs which must be charged at stage $t$ and $n_{t,c}^{k}$ denotes the number of EVs which can be charged at stage $t$.

2) Action Space: The control action at stage $t$ is defined as $A_{t}=[a_{t}^{1},a_{t}^{2},\ldots,a_{t}^{K}]$, where $a_{t}^{k}$ denotes the specific action for the $k$th building. Each building should decide whether to provide charging service for the connected EVs at stage $t$. Therefore, $a_{t}^{k}=[z_{t,1}^{k},z_{t,2}^{k},\ldots,z_{t,N}^{k}]$, where $z_{t,i}^{k}\in\{0,1\}$, $i=1,2,\ldots,N$. When $z_{t,i}^{k}=1$, the $k$th building provides charging service for the $i$th EV at stage $t$ if it is parked there; otherwise $z_{t,i}^{k}=0$.

3) System Dynamics: As the energy statuses of the EVs and the HES are both time-coupled, their system dynamics should be considered when the action $A_{t}$ is decided for the current state $S_{t}$.

For each EV, we use a tuple $(T_{t}^{i},E_{t}^{i},L_{t}^{i})$ to represent its remaining parking time, remaining required charging energy and parking location, where $L_{t}^{i}\in\{0,1,2,\ldots,K\}$ and $L_{t}^{i}=0$ if the $i$th EV is on the road. Then, there is

$$T_{t+1}^{i}=\begin{cases}T_{t}^{i}-\Delta T,&\text{if}\ L_{t}^{i}>0\\ \tau_{t+1}^{i},&\text{if}\ L_{t+1}^{i}\times(1-L_{t}^{i})>0\\ 0,&\text{if}\ L_{t+1}^{i}=0\end{cases}\tag{1}$$

$$E_{t+1}^{i}=\begin{cases}E_{t}^{i}-z_{t}^{i}P\Delta T\psi^{c},&\text{if}\ L_{t}^{i}>0\\ \eta_{t+1}^{i},&\text{if}\ L_{t+1}^{i}\times(1-L_{t}^{i})>0\\ 0,&\text{if}\ L_{t+1}^{i}=0\end{cases}\tag{2}$$

where $P$ is the charge power, $\psi^{c}$ denotes the charge efficiency, and $\tau_{t+1}^{i}$ and $\eta_{t+1}^{i}$ are nonnegative random variables which describe the stochastic EV charging demand in the future. As the location transitions of EVs are time-variant, the location transition probability can be established as

$$P(L_{t+1}^{i}|L_{t}^{i})=\begin{bmatrix}p_{11}(t)&p_{12}(t)&\cdots&p_{1K}(t)&p_{10}(t)\\ p_{21}(t)&p_{22}(t)&\cdots&p_{2K}(t)&p_{20}(t)\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ p_{K1}(t)&p_{K2}(t)&\cdots&p_{KK}(t)&p_{K0}(t)\end{bmatrix}\tag{3}$$

where $p_{K1}(t)$ denotes the probability that an EV parked in the $K$th building at stage $t$ moves to the first building, and so forth.
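The EV dynamics (1)-(3) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the cases of (1)-(2) overlap when a parked EV departs, so the sketch assumes the departure case takes precedence, and each row of (3) is assumed to be ordered as buildings $1,\ldots,K$ followed by the on-road state $0$.

```python
import random


def step_ev(T, E, L, L_next, z, P, dT, psi_c, tau_next, eta_next):
    """One-step update of (remaining parking time, remaining energy), per (1)-(2).

    Assumption: when the cases overlap (parked EV that leaves), departure wins.
    """
    if L_next == 0:      # EV is on the road at stage t+1
        return 0.0, 0.0
    if L == 0:           # EV arrives at a building: draw new random demand
        return tau_next, eta_next
    # EV was parked at stage t: time elapses, energy drops if it was charged
    return T - dT, E - z * P * dT * psi_c


def sample_next_location(row, rng):
    """Draw L_{t+1} from one row of (3); columns ordered 1..K, then 0."""
    K = len(row) - 1
    locations = list(range(1, K + 1)) + [0]
    return rng.choices(locations, weights=row, k=1)[0]
```

For example, `step_ev(4.0, 20.0, 1, 1, 1, 7.0, 1.0, 0.9, 0.0, 0.0)` reduces the remaining parking time by one interval and the remaining energy by `P * dT * psi_c`.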

According to [20], the system dynamics of HES in each building can be described as follows

$$\kappa_{t+1}^{k}=\begin{cases}\max\{\kappa_{t}^{k}-h_{t}^{k}/\varphi^{\text{H2P}},0\},&\text{if}\ h_{t}^{k}\geq 0\\ \min\{\kappa_{t}^{k}-h_{t}^{k}\varphi^{\text{P2H}},\kappa^{\text{cap}}\},&\text{if}\ h_{t}^{k}\leq 0\end{cases}\tag{4}$$

where $\kappa_{t}^{k}$ is the stored hydrogen of HES in the $k$th building at stage $t$, $\kappa^{\text{cap}}$ is the hydrogen storage capacity of HES, $\varphi^{\text{H2P}}$ is the round-trip efficiency from hydrogen to power, $\varphi^{\text{P2H}}$ is the round-trip efficiency from power to hydrogen, and $h_{t}^{k}$ is the discharge power of HES if $h_{t}^{k}\geq 0$; otherwise its magnitude is the charge power of HES. Considering $\kappa_{e,t}^{k}=\kappa_{t}^{k}\sigma_{H_{2}}$, where $\kappa_{e,t}^{k}$ is the stored hydrogen energy and $\sigma_{H_{2}}$ is the lower heating value of hydrogen, equation (4) can be rewritten as follows by multiplying both sides by $\sigma_{H_{2}}/\kappa_{e}^{\text{cap}}$

$$b_{t+1}^{k}=\begin{cases}\max\{b_{t}^{k}-h_{t}^{k}/(\eta^{dc}\kappa_{e}^{\text{cap}}),0\},&\text{if}\ h_{t}^{k}\geq 0\\ \min\{b_{t}^{k}-h_{t}^{k}\eta^{c}/\kappa_{e}^{\text{cap}},1\},&\text{if}\ h_{t}^{k}\leq 0\end{cases}\tag{5}$$

where $\kappa_{e}^{\text{cap}}=\kappa^{\text{cap}}\sigma_{H_{2}}$ denotes the energy capacity of HES, $\eta^{dc}=\varphi^{\text{H2P}}/\sigma_{H_{2}}$ denotes the discharge efficiency of HES and $\eta^{c}=\varphi^{\text{P2H}}\sigma_{H_{2}}$ denotes the charge efficiency of HES.
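The SOC update (5) is short enough to sketch directly. The function and argument names below are hypothetical, and, as in (5), the power `h` is used without an explicit interval length.

```python
def hes_soc_next(b, h, eta_dc, eta_c, kappa_e_cap):
    """Next state of charge of the HES, following (5).

    h >= 0 means discharging, h < 0 means charging.
    """
    if h >= 0:
        # Discharging lowers the SOC, but never below empty
        return max(b - h / (eta_dc * kappa_e_cap), 0.0)
    # Charging (h negative) raises the SOC, but never above full
    return min(b - h * eta_c / kappa_e_cap, 1.0)
```

With `b = 0.5`, `kappa_e_cap = 100`, `eta_dc = 0.5` and `eta_c = 0.8`, discharging at `h = 10` lowers the SOC to 0.3, while charging at `h = -10` raises it to 0.58.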

For each HES in the building, we stipulate that the HES discharges when the DRE cannot satisfy the building demand and charges when the DRE is sufficient to meet the building demand, i.e.,

$$h_{t}^{k}=\begin{cases}-\min\{r_{t}^{k}-l_{t}^{k}-p_{t}^{k},\overline{h_{t,c}^{k}}\},&\text{if}\ r_{t}^{k}\geq l_{t}^{k}+p_{t}^{k}\\ \min\{l_{t}^{k}+p_{t}^{k}-r_{t}^{k},\overline{h_{t,dc}^{k}}\},&\text{if}\ r_{t}^{k}\leq l_{t}^{k}+p_{t}^{k}\end{cases}\tag{6}$$

where $l_{t}^{k}$ denotes the net demand in the $k$th building, $p_{t}^{k}=\sum_{i=1}^{N}z_{t,i}^{k}P$ is the total charge power in the $k$th building, and $\overline{h_{t,c}^{k}}$ and $\overline{h_{t,dc}^{k}}$ satisfy

$$\begin{aligned}\overline{h_{t,dc}^{k}}&=\min\{h^{\text{cap}},b_{t}^{k}\kappa_{e}^{\text{cap}}\eta^{dc}\},\\ \overline{h_{t,c}^{k}}&=\min\{h^{\text{cap}},(1-b_{t}^{k})\kappa_{e}^{\text{cap}}/\eta^{c}\}\end{aligned}\tag{7}$$

in which $h^{\text{cap}}$ is the maximum charge/discharge power of HES.
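The dispatch rule (6) with the limits (7) can be illustrated with the following sketch; the function names are hypothetical and the code only mirrors the two equations.

```python
def hes_limits(b, h_cap, kappa_e_cap, eta_dc, eta_c):
    """Maximum charge and discharge power of the HES, following (7)."""
    h_dc_max = min(h_cap, b * kappa_e_cap * eta_dc)
    h_c_max = min(h_cap, (1.0 - b) * kappa_e_cap / eta_c)
    return h_c_max, h_dc_max


def hes_power(r, l, p, h_c_max, h_dc_max):
    """HES power following (6): negative when charging, positive when discharging."""
    if r >= l + p:
        # DRE surplus: store as much of it as the charge limit allows
        return -min(r - l - p, h_c_max)
    # DRE deficit: cover as much of it as the discharge limit allows
    return min(l + p - r, h_dc_max)
```

For instance, with limits of 20 in both directions, a surplus of 10 gives `h = -10`, whereas a deficit of 30 is only partly covered with `h = 20`.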

Based on the supply and demand status in the building, each building can sell excess power to the grid or procure power from it. Therefore, the exchange power $g_{t}^{k}$ between the building and the grid can be described as follows,

$$g_{t}^{k}=\begin{cases}\max\{l_{t}^{k}+p_{t}^{k}-r_{t}^{k}-h_{t}^{k},0\},&\text{if}\ h_{t}^{k}\geq 0\\ -\max\{r_{t}^{k}+h_{t}^{k}-l_{t}^{k}-p_{t}^{k},0\},&\text{if}\ h_{t}^{k}<0\end{cases}\tag{8}$$

4) Constraints: The action $A_{t}$ corresponding to the state $S_{t}$ should satisfy the following constraints,

$$\sum_{k=1}^{K}g_{t}^{k}=G_{t}\tag{9}$$

$$\underline{G}\leq G_{t}\leq\overline{G}\tag{10}$$

$$\underline{g^{k}}\leq g_{t}^{k}\leq\overline{g^{k}}\tag{11}$$

$$0\leq E_{t}^{i}\leq E_{\text{cap}}\tag{12}$$

$$0\leq E_{t}^{i}\leq P\cdot B_{t}^{i}\tag{13}$$

$$r_{t}^{k}+h_{t}^{k}+g_{t}^{k}=l_{t}^{k}+p_{t}^{k}\tag{14}$$

Constraint (9) defines the total exchange power in a microgrid of buildings. Constraints (10) and (11) set the lower and upper bounds of the total exchange power of the microgrid and of the exchange power of each building, respectively, to ensure transmission safety. Constraint (12) requires that the required charging energy of each EV not exceed the battery capacity. Constraint (13) requires that the remaining required charging energy not exceed the maximum energy that can be supplied during the remaining parking time. This constraint is used to satisfy the driver's charging demand. Constraint (14) describes the load balance in each building.

5) Objective Function: As the responsibility of the microgrid operator controller is to optimize the operation cost over the entire horizon, the following expected cumulative cost within a sliding window is chosen as the objective function, considering the uncertain charging demand and output of DRE in the buildings, i.e.,

$$\min J=\sum_{t=t_{0}}^{t_{0}+T_{w}-1}\mathbf{E}^{\pi}\Big[\sum_{k=1}^{K}c_{t}(s_{t}^{k},a_{t}^{k})\Big]\tag{15}$$

where $t_{0}$ denotes the current decision stage, $T_{w}$ denotes the length of the sliding window, $\pi$ denotes the EV scheduling policy and $c_{t}(s_{t}^{k},a_{t}^{k})$ denotes the one-step cost incurred by taking action $a_{t}^{k}$ at state $s_{t}^{k}$ for the $k$th building, which is defined as follows:

$$c_{t}(s_{t}^{k},a_{t}^{k})=\begin{cases}\omega_{t}(l_{t}^{k}+p_{t}^{k}+\overline{h_{t,c}^{k}}-r_{t}^{k}),&\text{if}\ p_{t}^{k}\leq r_{t}^{k}-l_{t}^{k}-\overline{h_{t,c}^{k}}\\ \omega_{t}(l_{t}^{k}+p_{t}^{k}-\overline{h_{t,dc}^{k}}-r_{t}^{k}),&\text{if}\ p_{t}^{k}\geq r_{t}^{k}-l_{t}^{k}+\overline{h_{t,dc}^{k}}\\ 0,&\text{else}\end{cases}\tag{16}$$

where $\omega_{t}$ denotes the electricity price.
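As an illustration of how the one-step cost (16) relates to the exchange power (8), the sketch below implements both with hypothetical function names. Under the dispatch rule (6), the cost in (16) works out to the price times the exchange power: negative when the building sells a surplus, zero when the HES absorbs the whole imbalance.

```python
def exchange_power(r, l, p, h):
    """Grid exchange of one building, following (8); positive means buying."""
    if h >= 0:
        return max(l + p - r - h, 0.0)
    return -max(r + h - l - p, 0.0)


def one_step_cost(omega, r, l, p, h_c_max, h_dc_max):
    """One-step cost of one building, following (16)."""
    if p <= r - l - h_c_max:
        # Surplus remains even after charging the HES at its limit
        return omega * (l + p + h_c_max - r)
    if p >= r - l + h_dc_max:
        # Deficit remains even after discharging the HES at its limit
        return omega * (l + p - h_dc_max - r)
    # The HES alone balances supply and demand
    return 0.0
```

For example, with `omega = 2`, `r = 10`, `l = 30`, `p = 10` and a discharge limit of 20, the building buys 10 units and the cost is 20.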

Based on the proposed model above, the microgrid operator controller should minimize $J$ to find an optimal scheduling policy $\pi^{*}$ for each decision stage. However, due to the coupled constraint (10) and the large state space and action space of the problem, the exact optimal solution of the above model can rarely be derived[21]. In the next section, we will explore an event-based approach with gradient search to approximately solve the problem.

III Solution Methodology

III-A Event Definition

The proposed MDP model suffers from a large state space, as it is a state-based model whose state space increases exponentially with the number of buildings and EVs. Therefore, we propose an event-based optimization (EBO) framework to address this difficulty. In the EBO framework, the model and the solution focus on the event, which describes a set of state transitions[22]. The decision is event-triggered, which reduces the computation burden. When the event is defined appropriately, EBO can be applied to solve MDPs with a large state space[23]. Due to these advantages, EBO has been applied in various fields, such as HVAC control[23, 24], communication networks[25], stock trading[26], etc.

In this paper, our idea to use EBO comes from the fact that it may not be necessary to describe the detailed EV charging state and the output of the DRE, which are what incur the large state space. Instead, we can use the event to describe the elastic ratios of the EV charging, DRE and HES. A larger elastic ratio means a larger control margin during scheduling.

Based on this idea, we first define the elastic ratio of EV charging in each building, i.e.,

$$I_{t,EV}=\frac{(n_{t,c}^{k}-n_{t,m}^{k})P}{\overline{n^{k}}P}\tag{17}$$

where $\overline{n^{k}}$ denotes the number of charging piles in the $k$th building. Equation (17) describes the share of charging which can be delayed. Secondly, the elastic ratio of HES in each building can be defined as its SOC, i.e., $I_{t,HES}=b_{t}^{k}$. A larger $b_{t}^{k}$ means that the HES can provide more flexible charge/discharge power for operation cost optimization. Thirdly, the elastic ratio of DRE can be defined as follows,

$$I_{t,DRE}=\frac{r_{t}^{k}}{\overline{r^{k}}}\tag{18}$$

where $\overline{r^{k}}$ denotes the generation capacity of the DRE in the $k$th building. Equation (18) describes the generation level of the DRE, and a higher level indicates a larger potential for operation cost saving.

Based on the elastic ratios introduced above, we can finally define the event as follows,

$$\mathcal{E}=\{e_{t}^{k}\,|\,t=1,2,\ldots,T,\ k=1,2,\ldots,K\}\tag{19}$$

where

$$e_{t}^{k}=\{\langle s_{t-1}^{k},s_{t}^{k}\rangle\,|\,(I_{t,EV}+I_{t,HES}+I_{t,DRE})/3\in[0,1]\}\tag{20}$$

In equation (19), $e_{t}^{k}$ denotes the triggered event in the $k$th building at stage $t$. The value of $e_{t}^{k}$ lies within $[0,1]$. If we divide this interval equally with a discretization step of 0.1, the event space of each building contains only 10 events, which is far smaller than the state space of the proposed MDP model.
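The mapping from a building state to a discretized event in (17)-(20) can be sketched as follows. The function name and the binning convention are assumptions: the sketch uses ten equal-width bins and places the boundary value 1 in the last bin, which the text does not specify.

```python
def event_index(n_c, n_m, n_piles, b, r, r_cap, n_bins=10):
    """Return the averaged elastic ratio of (20) and its bin index in 0..n_bins-1."""
    i_ev = (n_c - n_m) / n_piles   # (17): the charge power P cancels out
    i_hes = b                      # elastic ratio of HES is its SOC
    i_dre = r / r_cap              # (18)
    e = (i_ev + i_hes + i_dre) / 3.0
    # Assumed convention: e == 1.0 falls into the last bin
    return e, min(int(e * n_bins), n_bins - 1)
```

With 30 chargeable EVs of which 10 must be charged, 40 piles, an SOC of 0.5 and DRE at half capacity, all three ratios equal 0.5 and the event falls into bin 5.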

III-B Randomized Parametric Event-based Policy

Another difficulty of the proposed MDP is the large action space. Computing the charge control variables $z_{t,i}^{k}$ for each EV imposes a heavy computation burden. Therefore, we propose a randomized parametric event-based policy to alleviate the impact of the large action space.

The charge control of each EV is implemented in two steps. Firstly, the microgrid operator controller decides a parametric charge ratio $\alpha_{t}^{k}$ for each building, which is chosen as the event-based action, i.e., $a_{t}^{k}=\alpha_{t}^{k}\in[0,1]$. In this way, the total charge power of each building can be described as follows,

$$p_{t}^{k}=[n_{t,m}^{k}+a_{t}^{k}(n_{t,c}^{k}-n_{t,m}^{k})]P\tag{21}$$

As the charge ratio $\alpha_{t}^{k}\in[0,1]$, the action space of the proposed problem is largely restricted. According to [24], a randomized policy may perform better than a deterministic policy and may be easier to obtain. Therefore, we search for a randomized parametric event-based policy $\sigma$ for the proposed problem, i.e.,

$$\sigma:\mathcal{E}\rightarrow\mathcal{P}(a_{t}^{k})\tag{22}$$

where $\mathcal{P}$ denotes a probability distribution over the action space. Equation (22) means that the action $a_{t}^{k}$ is chosen according to the probability distribution $\mathcal{P}$ and the observed event $e_{t}^{k}$. When an optimal randomized parametric event-based policy is obtained, the action with the highest probability can be selected for implementation in practice.

Secondly, after the microgrid operator controller allocates the charge ratio for each building, the charge controller in each building should decide which EVs are charged and keep the total number of charged EVs within $n_{t,m}^{k}+a_{t}^{k}(n_{t,c}^{k}-n_{t,m}^{k})$. Therefore, we use a modified least-laxity-longer-processing-time-first (mLLLP) principle, introduced in [27], to select the EVs to charge. The mLLLP principle generates a complete order among EVs and selects EVs based on the remaining processing time $E_{t}^{i}/P$ and the EV laxity $T_{t}^{i}-E_{t}^{i}/P$.
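The second step can be illustrated with the sketch below. It is a simplified stand-in and not the exact mLLLP rule of [27]: it assumes EVs are ranked by ascending laxity $T_{t}^{i}-E_{t}^{i}/P$, with ties broken in favor of the longer processing time $E_{t}^{i}/P$, and that the number of charged EVs is rounded down to stay within the bound derived from (21). The function name and the tuple layout are hypothetical.

```python
import math


def select_evs(evs, a, n_must, P):
    """Pick the EVs to charge in one building for a given charge ratio a.

    evs: list of (ev_id, T_remaining, E_remaining) for all chargeable EVs.
    n_must: number of EVs that must be charged at this stage.
    """
    n_can = len(evs)
    # Upper bound on charged EVs implied by (21), rounded down
    budget = n_must + math.floor(a * (n_can - n_must))

    def priority(ev):
        _, T, E = ev
        processing = E / P
        laxity = T - processing
        # Smaller laxity first; for equal laxity, longer processing first
        return (laxity, -processing)

    ranked = sorted(evs, key=priority)
    return [ev_id for ev_id, _, _ in ranked[:budget]]
```

For example, with `P = 7` and EVs `("a", 4, 14)`, `("b", 2, 14)`, `("c", 6, 7)`, `("d", 3, 21)`, EVs `b` and `d` have zero laxity, so with `n_must = 2` and `a = 0.5` the selection is `d`, `b` and then `a`.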

III-C Constrained Gradient-based Policy Optimization for EBO

Due to the existence of constraint (10), the proposed MDP is coupled among all the buildings in the microgrid. Therefore, in this paper we propose a constrained gradient-based policy optimization method to search for the optimal randomized parametric event-based policy under the coupled constraint.

First, we neglect constraint (10), so that the proposed model can be decoupled into single-building optimization problems. For each building, let $\sigma$ and $\nu$ denote two event-based policies. The following state-based performance difference formula can be derived based on [28].

$$\begin{split}&J_{t_{0}}^{k,\sigma}(s_{t_{0}}^{k})-J_{t_{0}}^{k,\nu}(s_{t_{0}}^{k})\\ &=\beta_{t_{0}}^{k,\sigma}P_{t_{0}}^{k,\sigma}(J_{t_{0}+1}^{k,\sigma}-J_{t_{0}+1}^{k,\nu})\\ &+\beta_{t_{0}}^{k,\sigma}[c_{t_{0}}^{k,\sigma}+P_{t_{0}}^{k,\sigma}J_{t_{0}+1}^{k,\nu}-(c_{t_{0}}^{k,\nu}+P_{t_{0}}^{k,\nu}J_{t_{0}+1}^{k,\nu})]\\ &=\sum_{t=t_{0}}^{t_{0}+T_{w}-1}\beta_{t}^{k,\sigma}[c_{t}^{k,\sigma}-c_{t}^{k,\nu}+(P_{t}^{k,\sigma}-P_{t}^{k,\nu})J_{t+1}^{k,\nu}]\end{split}\tag{23}$$

where $J_{t_{0}}^{k,\sigma}$ and $J_{t_{0}}^{k,\nu}$ denote the value functions from stage $t_{0}$ to $t_{0}+T_{w}-1$ corresponding to the event-based policies $\sigma$ and $\nu$ for the $k$th building, $\beta_{t}^{k,\sigma}$ is the state distribution at stage $t$ corresponding to $\sigma$, $P_{t}^{k,\sigma}$ and $P_{t}^{k,\nu}$ denote the transition probabilities at stage $t$ corresponding to $\sigma$ and $\nu$, and $c_{t}^{k,\sigma}$ and $c_{t}^{k,\nu}$ denote the one-step costs at stage $t$ corresponding to $\sigma$ and $\nu$. Note that the initial distribution $\beta_{t_{0}}^{k,\sigma}$ is independent of the policy $\sigma$ or $\nu$. The last equality is obtained by recursion, using $\beta_{t_{0}}^{k,\sigma}P_{t_{0}}^{k,\sigma}=\beta_{t_{0}+1}^{k,\sigma}$.

Based on (23), we can extend the performance difference formula to event-based optimization, i.e.,

$$\begin{split}&J_{t_{0}}^{k,\sigma}(s_{t_{0}}^{k})-J_{t_{0}}^{k,\nu}(s_{t_{0}}^{k})\\ &=\sum_{t=t_{0}}^{t_{0}+T_{w}-1}\sum_{s_{t}^{k}\in\mathcal{S}}\beta_{t}^{k,\sigma}(s_{t}^{k})[c_{t}^{k,\sigma}-c_{t}^{k,\nu}+(P_{t}^{k,\sigma}-P_{t}^{k,\nu})J_{t+1}^{k,\nu}]\\ &=\sum_{t=t_{0}}^{t_{0}+T_{w}-1}\sum_{s_{t}^{k}\in\mathcal{S}}\sum_{e_{t}^{k}\in\mathcal{E}}\beta_{t}^{k,\sigma}(s_{t}^{k},e_{t}^{k})\cdot\\ &\qquad\qquad\qquad[c_{t}^{k,\sigma}-c_{t}^{k,\nu}+(P_{t}^{k,\sigma}-P_{t}^{k,\nu})J_{t+1}^{k,\nu}]\\ &=\sum_{t=t_{0}}^{t_{0}+T_{w}-1}\sum_{e_{t}^{k}\in\mathcal{E}}\beta_{t}^{k,\sigma}(e_{t}^{k})\Big\{\sum_{s_{t}^{k}\in\mathcal{S}}\beta_{t}^{k,\sigma}(s_{t}^{k}|e_{t}^{k})\cdot\\ &\qquad\qquad\qquad[c_{t}^{k,\sigma}-c_{t}^{k,\nu}+(P_{t}^{k,\sigma}-P_{t}^{k,\nu})J_{t+1}^{k,\nu}]\Big\}\end{split}\tag{24}$$

where $\mathcal{S}$ denotes the state space. When policy $\nu$ is close to $\sigma$, the performance gradient at stage $t_{0}$ can be derived as follows by observing event $e_{t_{0}}^{k}$,

$$\begin{split}\frac{\partial J_{t_{0}}^{k,\sigma}(e_{t_{0}}^{k})}{\partial\sigma_{t_{0}}}=&\beta_{t_{0}}^{k,\sigma_{t_{0}}}(e_{t_{0}}^{k})\sum_{s_{t_{0}}^{k}\in\mathcal{S}}\beta_{t_{0}}^{k,\sigma_{t_{0}}}(s_{t_{0}}^{k}|e_{t_{0}}^{k})\cdot\\ &\Big[\frac{\partial c_{t_{0}}^{k,\sigma_{t_{0}}}}{\partial\sigma_{t_{0}}}+\frac{\partial P_{t_{0}}^{k,\sigma_{t_{0}}}}{\partial\sigma_{t_{0}}}J_{t_{0}+1}^{k,\sigma}\Big]\end{split}\tag{25}$$

where $\sigma_{t_{0}}$ denotes the detailed charge control policy at stage $t_{0}$ for the $k$th building.

Note that policy $\sigma$ is a randomized policy which selects each action with a specific probability. Suppose there are $M$ actions for the $k$th building at stage $t_{0}$ and each action is denoted as $\alpha_{t_{0}}^{k,m}$. Let $p_{t_{0}}^{k,m}$ denote the probability of choosing $\alpha_{t_{0}}^{k,m}$. Then the following relationships hold; the proof is given in the Appendix.

$$\begin{split}\frac{\partial P_{t_{0}}^{k,\sigma_{t_{0}}}}{\partial p_{t_{0}}^{k,m}}=&\frac{\sum_{i=1}^{M}p_{t_{0}}^{k,i}-p_{t_{0}}^{k,m}}{(\sum_{i=1}^{M}p_{t_{0}}^{k,i})^{2}}P(s_{t_{0}+1}^{k}|s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})\\ &+\sum_{i\neq m}\frac{-p_{t_{0}}^{k,i}}{(\sum_{i=1}^{M}p_{t_{0}}^{k,i})^{2}}P(s_{t_{0}+1}^{k}|s_{t_{0}}^{k},\alpha_{t_{0}}^{k,i})\end{split}\tag{26}$$

$$\begin{split}\frac{\partial c_{t_{0}}^{k,\sigma_{t_{0}}}}{\partial p_{t_{0}}^{k,m}}=&\frac{\sum_{i=1}^{M}p_{t_{0}}^{k,i}-p_{t_{0}}^{k,m}}{(\sum_{i=1}^{M}p_{t_{0}}^{k,i})^{2}}c(s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})\\ &+\sum_{i\neq m}\frac{-p_{t_{0}}^{k,i}}{(\sum_{i=1}^{M}p_{t_{0}}^{k,i})^{2}}c(s_{t_{0}}^{k},\alpha_{t_{0}}^{k,i})\end{split}\tag{27}$$

Substituting (26) and (27) into (25), the policy gradient can be finally obtained which is shown below,

Jt0k,σ(et0k)pt0k,m=βt0k,σt0(et0k)st0k𝒮βt0k,σt0(st0k|et0k){i=1Mpt0k,ipt0k,m(i=1Mpt0k,i)2[c(st0k,αt0k,m)+V(st0k,αt0k,m)]+impt0k,i(i=1Mpt0k,i)2[c(st0k,αt0k,i)+V(st0k,αt0k,i)]}\begin{split}\frac{\partial J_{t_{0}}^{k,\sigma}(e_{t_{0}}^{k})}{\partial p_{t_{0}}^{k,m}}=&\beta_{t_{0}}^{k,\sigma_{t_{0}}}(e_{t_{0}}^{k})\sum_{s_{t_{0}}^{k}\in\mathcal{S}}\beta_{t_{0}}^{k,\sigma_{t_{0}}}(s_{t_{0}}^{k}|e_{t_{0}}^{k})\cdot\\ &\{\frac{\sum_{i=1}^{M}p_{t_{0}}^{k,i}-p_{t_{0}}^{k,m}}{(\sum_{i=1}^{M}p_{t_{0}}^{k,i})^{2}}[c(s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})+V(s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})]\\ &+\sum_{i\neq m}\frac{-p_{t_{0}}^{k,i}}{(\sum_{i=1}^{M}p_{t_{0}}^{k,i})^{2}}[c(s_{t_{0}}^{k},\alpha_{t_{0}}^{k,i})+V(s_{t_{0}}^{k},\alpha_{t_{0}}^{k,i})]\}\end{split} (28)

where

\begin{equation}
V(s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})=\sum_{s_{t_{0}+1}^{k}\in\mathcal{S}}P(s_{t_{0}+1}^{k}|s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})J_{t_{0}+1}^{k,\sigma}(s_{t_{0}+1}^{k})
\tag{29}
\end{equation}

denotes the future total cost incurred when taking action $\alpha_{t_{0}}^{k,m}$ in the current state $s_{t_{0}}^{k}$, and $\frac{\partial J_{t_{0}}^{k,\sigma}(e_{t_{0}}^{k})}{\partial\sigma_{t_{0}}}=\left(\frac{\partial J_{t_{0}}^{k,\sigma}(e_{t_{0}}^{k})}{\partial p_{t_{0}}^{k,1}},\frac{\partial J_{t_{0}}^{k,\sigma}(e_{t_{0}}^{k})}{\partial p_{t_{0}}^{k,2}},\ldots,\frac{\partial J_{t_{0}}^{k,\sigma}(e_{t_{0}}^{k})}{\partial p_{t_{0}}^{k,M}}\right)$.
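As a minimal sketch of how (28) could be evaluated once its ingredients are known, the Python snippet below combines the event probability, the conditional state probabilities, the one-step costs and the future costs $V$ from (29). The function names and the dictionary layout of `cost` and `value` are hypothetical choices made for illustration and are not part of the original method description.

```python
def normalized_derivative(p, m):
    """Coefficients d(p_i / sum(p)) / d p_m for every i, as used in (26)-(28)."""
    total = sum(p)
    return [((total - p[m]) if i == m else -p[i]) / total**2
            for i in range(len(p))]


def policy_gradient(p, beta_event, beta_state_given_event, cost, value):
    """Evaluate (28) for every action index m.

    p: selection probabilities p^{k,i} of the M actions
    beta_event: probability that the event is observed
    beta_state_given_event: dict state -> probability of that state given the event
    cost[s][i], value[s][i]: one-step cost c(s, alpha_i) and future cost V(s, alpha_i)
    (this data layout is an assumption of the sketch)
    """
    M = len(p)
    grad = []
    for m in range(M):
        coeff = normalized_derivative(p, m)
        inner = 0.0
        for s, b in beta_state_given_event.items():
            inner += b * sum(coeff[i] * (cost[s][i] + value[s][i])
                             for i in range(M))
        grad.append(beta_event * inner)
    return grad
```

With two equally likely actions whose total costs are 1 and 3, the gradient is negative for the cheaper action and positive for the more expensive one, so a descent step shifts probability toward the cheaper action.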

Based on the policy gradient (28), the randomized parametric event-based policy can be updated as follows during policy optimization,

\begin{equation}
\sigma_{t_{0},j+1}=\sigma_{t_{0},j}-\delta_{j}\frac{\partial J_{t_{0}}^{k,\sigma_{j}}(e_{t_{0}}^{k})}{\partial\sigma_{t_{0}}^{k}}
\tag{30}
\end{equation}

where $\sigma_{t_{0},j}$ denotes the event-based policy at the $j$th iteration, $\delta_{j}=1/(1+\xi j)$ denotes the update step size at the $j$th iteration, and $\xi$ denotes the decay factor.

Due to the uncertainties in the DRE generation and EV charging demand, it is impractical to compute the expectations in (28) analytically. Therefore, Monte Carlo simulation is adopted to estimate (28). The estimation procedure is summarized in Algorithm 1.

Algorithm 1 Policy Gradient Estimation
1:  Input: policy $\sigma$.
2:  Generate and record sample paths under policy $\sigma$, i.e., $\{s_{t_{0},i}^{k},a_{t_{0},i}^{k},s_{t_{0}+1,i}^{k},a_{t_{0}+1,i}^{k},\ldots,s_{t_{0}+T_{w}-1,i}^{k},a_{t_{0}+T_{w}-1,i}^{k}\}$, where $i=1,2,\ldots,L$, $L$ is the total number of sample paths, and $s_{t_{0},i}^{k}$ and $a_{t_{0},i}^{k}$ denote the observed state and selected action of the $k$th building in the $i$th sample path.
3:  Count the number of sample paths $L(e_{t_{0}}^{k})$ in which event $e_{t_{0}}^{k}$ happens. Then $\beta_{t_{0}}^{k,\sigma_{t_{0}}}(e_{t_{0}}^{k})=L(e_{t_{0}}^{k})/L$.
4:  Count the number of sample paths $L(s_{t_{0}}^{k}|e_{t_{0}}^{k})$ in which event $e_{t_{0}}^{k}$ happens and the system observes state $s_{t_{0}}^{k}$. Then $\beta_{t_{0}}^{k,\sigma_{t_{0}}}(s_{t_{0}}^{k}|e_{t_{0}}^{k})=L(s_{t_{0}}^{k}|e_{t_{0}}^{k})/L(e_{t_{0}}^{k})$.
5:  Count the number of sample paths $L(e_{t_{0}}^{k},s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})$ in which $e_{t_{0}}^{k}$ happens and the state-action pair $(s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})$ appears (denote this set of sample paths by $\mathcal{L}(e_{t_{0}}^{k},s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})$). Then $V(s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})=\frac{1}{L(e_{t_{0}}^{k},s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})}\sum_{i\in\mathcal{L}(e_{t_{0}}^{k},s_{t_{0}}^{k},\alpha_{t_{0}}^{k,m})}\sum_{t=t_{0}+1}^{t_{0}+T_{w}-1}c_{t}^{k,\sigma}$.
6:  Substitute the above estimates into (28).
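The counting steps of Algorithm 1 can be sketched in Python as follows. The representation of a sample path as a dictionary with the keys `event`, `state`, `action` and `costs` is an assumption made for illustration; the source does not prescribe a data layout.

```python
from collections import defaultdict


def estimate_from_paths(paths, event):
    """Estimate beta(e), beta(s|e) and V(s, alpha_m) by counting (Algorithm 1).

    Each path is a dict (hypothetical layout) with keys
      "event":  event observed at stage t0
      "state":  state observed at stage t0
      "action": index m of the action selected at stage t0
      "costs":  stage costs c_t for t = t0, ..., t0 + Tw - 1
    """
    L = len(paths)
    hits = [path for path in paths if path["event"] == event]
    beta_event = len(hits) / L if L else 0.0

    state_count = defaultdict(int)
    pair_count = defaultdict(int)
    future_sum = defaultdict(float)
    for path in hits:
        s, m = path["state"], path["action"]
        state_count[s] += 1
        pair_count[(s, m)] += 1
        # Future cost covers stages t0+1 .. t0+Tw-1, i.e. all but the first stage.
        future_sum[(s, m)] += sum(path["costs"][1:])

    beta_state = {s: n / len(hits) for s, n in state_count.items()}
    value = {key: future_sum[key] / pair_count[key] for key in pair_count}
    return beta_event, beta_state, value
```

State-action pairs that never occur in the sampled paths simply have no entry in `value`; how to treat them (for example, by drawing more paths) is left open here.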

As mentioned before, the derived policy gradient neglects the coupled constraint (10). In order to satisfy this transmission safety constraint, the following adjusting mechanism is proposed to ensure the feasibility of policy $\sigma$.

Adjusting Step I: If the total exchange power exceeds the upper bound of (10) by $\Delta$, the total number of EVs to be charged should be reduced by $\Delta/P$. Each building should reduce its number of charged EVs by $\frac{\partial J_{t_{0}}^{k,\sigma}/\partial\sigma_{t_{0}}^{k}}{\sum_{k=1}^{K}\partial J_{t_{0}}^{k,\sigma}/\partial\sigma_{t_{0}}^{k}}\cdot\frac{\Delta}{P}$. The policy $\sigma$ can then be updated by solving the following quadratic programming problem,

\begin{equation}
\begin{split}
&\min_{p_{t_{0},j+1}^{k,i}\in[0,1]}\ \sum_{i=1}^{M}(p_{t_{0},j+1}^{k,i}-p_{t_{0},j}^{k,i})^{2}\\
&\text{s.t.}\quad\sum_{i=1}^{M}p_{t_{0},j+1}^{k,i}\alpha_{t_{0}}^{k,m}=\sum_{i=1}^{M}p_{t_{0},j}^{k,i}\alpha_{t_{0}}^{k,m}-\frac{\partial J_{t_{0}}^{k,\sigma}/\partial\sigma_{t_{0}}^{k}}{\sum_{k=1}^{K}\partial J_{t_{0}}^{k,\sigma}/\partial\sigma_{t_{0}}^{k}}\cdot\frac{\Delta}{P(n_{t,c}^{k}-n_{t,m}^{k})}
\end{split}
\tag{31}
\end{equation}

where $p_{t_{0},j+1}^{k,i}$ denotes the selection probability in $\sigma_{t_{0},j+1}$.

Adjusting Step II: If the total exchange power falls below the lower bound of (10) by $\Delta$, the total number of EVs to be charged should be increased by $\Delta/P$. Each building should increase its number of charged EVs by $\left(1-\frac{\partial J_{t_{0}}^{k,\sigma}/\partial\sigma_{t_{0}}^{k}}{\sum_{k=1}^{K}\partial J_{t_{0}}^{k,\sigma}/\partial\sigma_{t_{0}}^{k}}\right)\cdot\frac{\Delta}{P}$. Similarly, the policy $\sigma$ can be updated by solving the following problem,

\begin{equation}
\begin{split}
&\min_{p_{t_{0},j+1}^{k,i}\in[0,1]}\ \sum_{i=1}^{M}(p_{t_{0},j+1}^{k,i}-p_{t_{0},j}^{k,i})^{2}\\
&\text{s.t.}\quad\sum_{i=1}^{M}p_{t_{0},j+1}^{k,i}\alpha_{t_{0}}^{k,m}=\sum_{i=1}^{M}p_{t_{0},j}^{k,i}\alpha_{t_{0}}^{k,m}+\left(1-\frac{\partial J_{t_{0}}^{k,\sigma}/\partial\sigma_{t_{0}}^{k}}{\sum_{k=1}^{K}\partial J_{t_{0}}^{k,\sigma}/\partial\sigma_{t_{0}}^{k}}\right)\cdot\frac{\Delta}{P(n_{t,c}^{k}-n_{t,m}^{k})}
\end{split}
\tag{32}
\end{equation}

Note that as $\partial J_{t_{0}}^{k,\sigma}/\partial\sigma_{t_{0}}^{k}$ can be regarded as the marginal operation cost of the $k$th building, the proposed adjusting mechanism allocates the reduction or increase in the number of charged EVs to each building according to this marginal cost. When the charging demand must be reduced, a building with a large marginal cost should reduce its number of charged EVs by a large amount. Conversely, when the charging demand must be increased, a building with a small marginal cost should increase its number of EVs to be charged by a large amount. Solving (31) and (32) minimizes the probability difference between the adjacent policies $\sigma_{t_{0},j+1}$ and $\sigma_{t_{0},j}$ while reducing or increasing the expected charge ratio to meet the reduction or increase allocated to the $k$th building.
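Problems (31) and (32) share one structure: find the probabilities closest to the current ones, within $[0,1]$, whose weighted sum hits a shifted target. The Python sketch below solves that generic problem by bisection on the Lagrange multiplier. It rests on two assumptions that the source does not state: each action $i$ carries its own non-negative charge ratio `alpha[i]` as the weight in the constraint, and `target` is the already computed right-hand side of (31) or (32). The function name is hypothetical.

```python
def project_policy(p, alpha, target, tol=1e-10, max_iter=200):
    """Closest point to p (in squared distance) with entries in [0, 1] and
    sum_i q_i * alpha_i == target, assuming all alpha_i >= 0.

    The KKT conditions give q_i = clip(p_i + lam * alpha_i, 0, 1); the
    multiplier lam is found by bisection because the constraint value is
    nondecreasing in lam.
    """
    def clip(x):
        return min(1.0, max(0.0, x))

    def ratio(lam):
        return sum(a * clip(p_i + lam * a) for p_i, a in zip(p, alpha))

    if not 0.0 <= target <= sum(alpha):
        raise ValueError("target is not attainable with probabilities in [0, 1]")

    lo, hi = -1.0, 1.0
    while ratio(lo) > target:  # widen the bracket until it contains the target
        lo *= 2.0
    while ratio(hi) < target:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [clip(p_i + lam * a) for p_i, a in zip(p, alpha)]
```

For example, with `p = [0.5, 0.5]`, `alpha = [0.0, 1.0]` and `target = 0.3`, only the second probability moves, since the first action does not contribute to the expected charge ratio. A general-purpose QP solver would serve equally well; bisection is used here only to keep the sketch dependency-free.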

The idea of this adjusting mechanism is illustrated in Fig. 2. As there are a large number of discrete variables and non-linear constraints, the feasible policy space can be regarded as disconnected. When constraint (10) is not violated, the policy update stays within a feasible policy set. When it is violated, the policy update moves to another feasible policy set through the adjusting mechanism. Finally, the proposed constrained gradient-based policy optimization is summarized in Algorithm 2.

Figure 2: Illustration of the adjusting mechanism.
Algorithm 2 Constrained Gradient-based Policy Optimization
1:  for $t_{0}=1,2,\ldots,T$ do
2:     Set $j\leftarrow 0$ and select the initial policy $\sigma_{t_{0},j}$.
3:     for $k=1,2,\ldots,K$ do
4:        Compute the policy gradient based on Algorithm 1 when observing the event $e_{t_{0}}^{k}$.
5:     end for
6:     Check whether constraint (10) is violated. If violated, update the policy by the adjusting mechanism and go to Step 3.
7:     If $\left\|\frac{\partial J_{t_{0}}^{k,\sigma_{j}}}{\partial\sigma_{t_{0}}}\right\|_{2}\leq\epsilon$ or $\left\|\frac{\partial J_{t_{0}}^{k,\sigma_{j}}}{\partial\sigma_{t_{0}}}-\frac{\partial J_{t_{0}}^{k,\sigma_{j-1}}}{\partial\sigma_{t_{0}}}\right\|_{2}\leq\epsilon$, go to Step 1.
8:     Update the policy using (30) and go to Step 3.
9:  end for
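The inner loop of Algorithm 2 (Steps 3 to 8 for a single stage and building) can be sketched in Python as below. The callables `gradient`, `violates` and `adjust` are placeholders for Algorithm 1, the check of constraint (10) and the adjusting mechanism; the iteration cap `max_iter` and the reset of the gradient-change test after an adjustment are assumptions of this sketch rather than details given in the source.

```python
import math


def optimize_stage(p0, gradient, violates, adjust, eps=0.1, xi=0.1, max_iter=100):
    """Iterate Steps 3-8 of Algorithm 2 for one decision stage.

    gradient(p) -> list of partial derivatives (e.g. a Monte Carlo estimate)
    violates(p) -> True if the exchange-power constraint is violated
    adjust(p)   -> policy returned by the adjusting mechanism
    All three callables are placeholders supplied by the caller.
    """
    p, prev_grad = list(p0), None
    for j in range(max_iter):
        grad = gradient(p)
        if violates(p):
            p = adjust(p)
            prev_grad = None  # sketch-level choice: restart the change test
            continue
        norm = math.sqrt(sum(g * g for g in grad))
        if prev_grad is None:
            change = float("inf")
        else:
            change = math.sqrt(sum((g - h) ** 2 for g, h in zip(grad, prev_grad)))
        if norm <= eps or change <= eps:
            break
        delta = 1.0 / (1.0 + xi * j)  # step size delta_j used in (30)
        p = [p_i - delta * g_i for p_i, g_i in zip(p, grad)]
        prev_grad = grad
    return p
```

On a toy quadratic cost with gradient `p[0] - 0.2` and no constraint, the first step uses $\delta_0 = 1$ and lands on the minimizer directly, after which the gradient-norm test stops the loop.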

IV Numerical Results

In this section, we evaluate the proposed EV charging scheduling method through simulations. The following experiments analyze the charge control policies of different types of buildings and compare the performance of the proposed solution method with that of other charging policies.

IV-A Parameter Settings

We take building load data from [29] as shown in Table I. In the experiment, we consider a microgrid of three buildings: the first two are residential buildings and the third is an office building. We take distributed wind generation data from [5] and use them as the predicted values. The actual DRE output in each building is assumed to follow a normal distribution whose mean is the predicted value and whose standard deviation is $10\%$ of the predicted value. Fig. 3 shows a realization of the actual DRE output in the three buildings. The time-of-use electricity price is shown in Table II.

TABLE I: Building Load Data (unit: kW)
Time (h) $k=1$ $k=2$ $k=3$ Time (h) $k=1$ $k=2$ $k=3$
1 139 180 367 13 185 216 953
2 103 120 353 14 420 516 953
3 144 180 333 15 430 516 947
4 127 147 333 16 743 876 967
5 151 180 433 17 1132 1356 953
6 150 180 387 18 1340 1596 1053
7 67 84 520 19 2681 2160 1033
8 202 240 567 20 1780 1643 1000
9 216 264 820 21 1760 1400 967
10 151 191 1053 22 1648 1320 820
11 147 170 967 23 1251 1500 700
12 150 258 973 24 559 660 593
Figure 3: Realization of actual wind output in three buildings.
TABLE II: Electricity Price $\omega_{t}$
Price Time
0.3515RMB/kWh 23:00-6:00
0.8135RMB/kWh 7:00-10:00
0.4883RMB/kWh 11:00-18:00
0.8135RMB/kWh 19:00-22:00
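As an illustrative sketch, Table II can be encoded as a lookup function in Python. The table leaves the hour between the end of one range and the start of the next unassigned (for example 6:00 to 7:00), so the sketch assumes each range lists inclusive hour labels; this reading, and both function names, are assumptions.

```python
def price_rmb_per_kwh(hour):
    """Time-of-use price for an hour of day in 0..23, read from Table II.

    Assumption: each range lists inclusive hour labels, so 06:00-06:59
    falls in the 23:00-6:00 tier.
    """
    h = hour % 24
    if h >= 23 or h <= 6:
        return 0.3515
    if 7 <= h <= 10 or 19 <= h <= 22:
        return 0.8135
    return 0.4883  # 11:00-18:00


def energy_cost(power_kw, hour, duration_h=0.5):
    """Cost in RMB of drawing power_kw from the grid for one stage
    (30 minutes by default)."""
    return price_rmb_per_kwh(hour) * power_kw * duration_h
```

For instance, `energy_cost(3.6, 12)` gives the cost of one EV charging at 3.6 kW for a 30-minute stage at noon.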

We consider 200 EVs in the microgrid, whose drivers are split evenly between the two residential buildings and work in the office building. In the experiment, we assume that the departure times from the residential and office buildings follow the normal distributions $\mathcal{N}(7{:}00,60\text{ min})$ and $\mathcal{N}(17{:}00,60\text{ min})$, respectively. The trip time between building #1 ($k=1$) and building #3 ($k=3$) follows the normal distribution $\mathcal{N}(60\text{ min},30\text{ min})$. The trip time between building #2 ($k=2$) and building #3 ($k=3$) follows the normal distribution $\mathcal{N}(90\text{ min},30\text{ min})$.
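The mobility assumptions above can be sampled with a few lines of Python, shown here as a sketch. It assumes that the second parameter of each normal distribution is a standard deviation in minutes (the text does not say whether it is a standard deviation or a variance) and floors trip times at one minute to avoid non-positive draws; the function names and the seed are hypothetical.

```python
import random


def sample_commute(home_building, rng):
    """Sample one EV's morning trip as (departure, arrival) in minutes after midnight.

    Assumptions: the second parameter of each normal distribution is a
    standard deviation in minutes, and trip times are floored at 1 minute.
    """
    trip_mean = {1: 60.0, 2: 90.0}[home_building]  # trip to the office, building #3
    departure = rng.gauss(7 * 60, 60.0)
    trip = max(1.0, rng.gauss(trip_mean, 30.0))
    return departure, departure + trip


def sample_fleet(n_evs=200, seed=0):
    """Return (home_building, departure, arrival) for every EV."""
    rng = random.Random(seed)
    # EVs are split evenly between residential buildings #1 and #2.
    return [(1 + i % 2, *sample_commute(1 + i % 2, rng)) for i in range(n_evs)]
```

The evening trips back from the office building could be sampled the same way with a mean departure time of 17:00.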

The battery specification of the Nissan Leaf EV [30] is used in the experiments. The required charging energy $\eta_{t+1}^{i}$ of a future parking event is sampled from the probability distribution of the trip distance and the electric drive efficiency introduced in [5]. We take the parameters of HES from [31]. The detailed parameters are shown in Table III.

TABLE III: Parameter Settings
Parameter Setting Parameter Setting
$E_{\text{cap}}$ 36 kWh $P$ 3.6 kW
$\psi^{c}$ 0.92 $\kappa_{e}^{\text{cap}}$ 166.65 kWh
$h^{\text{cap}}$ 50 kW $\eta^{c}$ 0.82
$\eta^{dc}$ 0.62 $\epsilon$ 0.1
$\underline{G}$ -5600 kW $\overline{G}$ 5600 kW
$L$ 50 $\xi$ 0.1

In the experiment, we consider this problem on a daily basis with $T=48$, $\Delta T=30$ minutes and $T_{w}=12$. The event is evenly discretized with a step of 0.1. The action selection probabilities of the initial policy $\sigma$ are all set equal.
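To make the time discretization concrete, the following Python sketch maps stage indices to clock times and snaps an event value to the 0.1 grid. It assumes that stage 1 starts at 00:00 and that the event is a scalar; neither is stated explicitly here, and the helper names are hypothetical.

```python
T = 48            # decision stages per day
DELTA_T_MIN = 30  # stage length in minutes
T_W = 12          # look-ahead window in stages


def stage_start(t):
    """Clock time (HH:MM) at which stage t = 1..T begins,
    assuming stage 1 starts at 00:00."""
    minutes = (t - 1) * DELTA_T_MIN
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def discretize_event(value, step=0.1):
    """Snap a scalar event value to the grid with the given step."""
    return round(round(value / step) * step, 10)


window_hours = T_W * DELTA_T_MIN / 60  # 12 stages * 30 min = 6 hours
```

The 48 stages of 30 minutes cover exactly one day, and the window of 12 stages corresponds to a 6-hour look-ahead.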

IV-B Result Analysis

Figure 4: Scheduling results for a microgrid of three buildings with 200 EVs. Panels: (a)-(c) optimized selection probability for $k=1,2,3$; (d)-(f) total charge power and event for $k=1,2,3$; (g)-(i) SOC of HES for $k=1,2,3$.

The scheduling results are shown in Fig. 4, which presents the optimized selection probability, total charging power, observed event and SOC of HES for the three buildings at each stage. Fig. 4(a), Fig. 4(b) and Fig. 4(c) show the optimized selection probability of each charge ratio action in the randomized event-based policy after observing the events shown in Fig. 4(d), Fig. 4(e) and Fig. 4(f). A darker color indicates a higher selection probability. From Fig. 4(a), Fig. 4(b) and Fig. 4(c), it can be seen that the selection probabilities tend to reach high values near the departure time, such as 7:00 in building #1 and building #2 and 17:00 in building #3. The charging probability is small at arrival and becomes large near departure for two reasons. First, as the parking deadline approaches, the charging demand must be satisfied. Second, the distributed wind power begins to increase during the intervals (2:30-7:30) and (14:00-16:30). Note that as there are few EVs in building #1 and building #2 during (8:00-16:00), no action has a selection probability significantly higher than the others. The same holds for building #3 during (24:00-6:00). Furthermore, as no EV is parked in building #3 after 20:00 and the time window $T_{w}$ is set to 6 hours, the policy gradient stays at zero and the selection probability of each action remains unchanged and equal.

The total charging power and observed event are shown in Fig. 4(d), Fig. 4(e) and Fig. 4(f). Due to the stochastic charging demand of EVs and their distinct departure times, charging occurs during (16:00-10:00) for building #1 and building #2 and during (10:00-18:00) for building #3. Furthermore, the peak of the total charging power after 15:00 occurs later in building #2 than in building #1. This is because the trip time from building #3 to building #2 is longer than that from building #3 to building #1. Owing to the delayed feature of the optimized selection probability in building #3, the peak of the charging power is also delayed to 15:00 and then gradually decreases as EVs depart. It can also be seen that the trend of the observed event is similar to the trend of the SOC of HES in each building, as shown in Fig. 4(g), Fig. 4(h) and Fig. 4(i). This is because the SOC of HES is one of the main factors influencing the value of the observed event. The difference between the two trends is caused by the distributed wind power generation and EV charging elasticity. In Fig. 4(g), Fig. 4(h) and Fig. 4(i), a decreasing SOC indicates that the HES is supplying power to balance the building load and EV charging load, and an increasing SOC indicates excess distributed wind power generation. The small peak at 10:00 in building #2 is due to insufficient distributed wind power generation and a larger load demand compared with building #1.

In order to analyze the performance of the optimized event-based EV charging policy, we compare it with a rule-based charging policy and an ideal charging policy. The rule-based charging policy satisfies the EV charging demand as soon as possible once an EV is connected to a charging pile in the building. The ideal charging policy is derived by applying model predictive control (MPC) with precise information on the EV charging demand and wind power generation and with the same sliding-window length. In particular, the scheduling of HES is also included as a control variable in MPC. This introduces some non-linear constraints into the MPC model, such as products of integer and continuous variables, which must be linearized. The performance of the three policies is shown in Table IV. The rule-based policy incurs the highest operation cost because its EV charging control ignores the building load and supply. In contrast, the ideal charging policy achieves the lowest operation cost. However, this policy cannot be implemented in practice because it requires perfect knowledge of the future. Comparing the event-based policy with ideal charging, the cost of our policy is close to that of the ideal policy (24965 RMB versus 24811 RMB) and lower than that of the rule-based charging policy (28163 RMB). This demonstrates the effectiveness of the proposed EV charging control method.

TABLE IV: Total Operation Cost of Microgrid under Different Policies
Rule-based Charging Event-based Charging Ideal Charging
28163RMB 24965RMB 24811RMB
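The relative differences implied by Table IV can be worked out directly; the short Python computation below uses only the three totals from the table, with variable names chosen for illustration.

```python
costs_rmb = {"rule": 28163, "event": 24965, "ideal": 24811}  # totals from Table IV

# Saving of the event-based policy relative to the rule-based policy.
saving_vs_rule = (costs_rmb["rule"] - costs_rmb["event"]) / costs_rmb["rule"]
# Extra cost of the event-based policy relative to the ideal policy.
gap_to_ideal = (costs_rmb["event"] - costs_rmb["ideal"]) / costs_rmb["ideal"]

print(f"saving vs rule-based: {saving_vs_rule:.1%}")
print(f"gap to ideal:         {gap_to_ideal:.2%}")
```

In other words, the event-based policy saves 3198 RMB (roughly 11%) relative to rule-based charging and costs 154 RMB (roughly 0.6%) more than the ideal policy.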

The total exchange power of the microgrid under the three policies is shown in Fig. 5. The peak of the total exchange power at 19:00 is caused by the high building load in building #1 and building #2 at night. The rule-based charging policy exceeds the maximum transmission power $\overline{G}=5600\text{ kW}$, while the peaks of the other two policies stay below this upper bound. This indicates that the proposed constrained gradient-based policy optimization method can ensure transmission safety. The figure also shows two main reasons why the event-based and ideal charging policies outperform the rule-based charging policy. First, they postpone exchange power during (9:00-12:00) to benefit from the lower electricity price after 11:00. Second, they bring exchange power forward during (13:00-22:00) to avoid the high electricity price during (19:00-22:00) and to use free wind power generation around 15:00.

Figure 5: Total exchange power of microgrid under different policies.

Lastly, it is important to investigate the convergence rate of the proposed constrained gradient-based policy optimization. Fig. 6 shows the total number of iterations at each decision stage. In this experiment, the minimum number of iterations is 5, which occurs at 7:30, and the maximum is 50, which occurs at 16:30 and 19:30. On average, it takes about 25.3 iterations to find the optimal event-based charging policy at each decision stage. The average computation time per decision stage is 5.2 minutes on an i7-11700K@3.60GHz machine, which is well within the 30-minute stage length. This indicates that the proposed algorithm requires a limited number of iterations and that its running time is acceptable in practice.

Figure 6: Total iteration number at each decision stage.

V Conclusion

In this paper, the EV charging scheduling problem in a microgrid of buildings is studied to optimize the total operation cost of the microgrid while ensuring its transmission safety. An MDP formulation is introduced to represent the uncertain supply and EV charging demand in the buildings. To alleviate the difficulties caused by the large state and action spaces, we reformulate the problem within an event-based optimization framework with a searchable control policy space. A constrained gradient-based policy optimization approach is proposed to find an optimal randomized parametric control policy for EV charging. We analyze the structure of the control policy through numerical experiments and demonstrate that the proposed method can reduce the total operation cost while ensuring transmission safety in the microgrid of buildings.

Appendix A Proof of Equations (26) and (27)

Proof.

Based on the randomized control policy σ\sigma, there is

\begin{equation}
P(\alpha_{t_{0}}^{k,m}|s_{t_{0}}^{k})=\frac{p_{t_{0}}^{k,m}}{\sum_{i=1}^{M}p_{t_{0}}^{k,i}}
\tag{33}
\end{equation}

\begin{equation}
P_{t_{0}}^{k,\sigma}(s_{t_{0}+1}^{k}|s_{t_{0}}^{k})=\sum_{i=1}^{M}P(\alpha_{t_{0}}^{k,i}|s_{t_{0}}^{k})P(s_{t_{0}+1}^{k}|s_{t_{0}}^{k},\alpha_{t_{0}}^{k,i})
\tag{34}
\end{equation}

For the selection probability $P(\alpha_{t_{0}}^{k,i}|s_{t_{0}}^{k})$, there is

\begin{equation}
\frac{\partial P(\alpha_{t_{0}}^{k,i}|s_{t_{0}}^{k})}{\partial p_{t_{0}}^{k,m}}=\begin{cases}\frac{\sum_{i=1}^{M}p_{t_{0}}^{k,i}-p_{t_{0}}^{k,m}}{(\sum_{i=1}^{M}p_{t_{0}}^{k,i})^{2}},&\text{if}\ i=m\\ \frac{-p_{t_{0}}^{k,i}}{(\sum_{i=1}^{M}p_{t_{0}}^{k,i})^{2}},&\text{if}\ i\neq m\end{cases}
\tag{35}
\end{equation}

As only $P(\alpha_{t_{0}}^{k,i}|s_{t_{0}}^{k})$ depends on $p_{t_{0}}^{k,m}$ in equation (34), equation (26) is obtained by taking the derivative of equation (34) with respect to $p_{t_{0}}^{k,m}$ and substituting (35) into it.

Similarly, for the one-step cost $c_{t_{0}}^{k,\sigma_{t_{0}}}$, there is

\begin{equation}
c_{t_{0}}^{k,\sigma}(s_{t_{0}}^{k})=\sum_{i=1}^{M}P(\alpha_{t_{0}}^{k,i}|s_{t_{0}}^{k})c(s_{t_{0}}^{k},\alpha_{t_{0}}^{k,i})
\tag{36}
\end{equation}

Taking the derivative of equation (36) with respect to $p_{t_{0}}^{k,m}$ and substituting (35) into it, equation (27) is obtained. ∎
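As a sanity check on (35), the closed form can be compared against a central finite difference of the normalized selection probability (33). The Python sketch below does this; the function names and the step `h` are illustrative choices.

```python
def selection_prob(p, i):
    """Normalized selection probability (33)."""
    return p[i] / sum(p)


def analytic_derivative(p, i, m):
    """Closed form (35) for d P(alpha_i | s) / d p_m."""
    total = sum(p)
    return ((total - p[m]) if i == m else -p[i]) / total**2


def numeric_derivative(p, i, m, h=1e-6):
    """Central finite difference of (33) with respect to p_m."""
    up = list(p)
    up[m] += h
    down = list(p)
    down[m] -= h
    return (selection_prob(up, i) - selection_prob(down, i)) / (2 * h)
```

For any positive vector `p`, the two derivatives should agree up to the finite-difference error, and for fixed `m` the analytic derivatives sum to zero over `i`, which reflects that the normalized probabilities always sum to one.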

References

  • [1] Annual Report on New Energy Vehicle Industry in China (2020).   Social Sciences Academic Press, 2020.
  • [2] H. Das, M. Rahman, S. Li, and C. Tan, “Electric vehicles standards, charging infrastructure, and impact on grid integration: A technological review,” Renewable and Sustainable Energy Reviews, vol. 120, 2020.
  • [3] J.-H. Teng, S.-H. Liao, and C.-K. Wen, “Design of a fully decentralized controlled electric vehicle charger for mitigating charging impact on power grids,” IEEE Trans. Ind. Appl., vol. 53, no. 2, pp. 1497–1505, 2016.
  • [4] J. Bates and D. Leibling, Spaced out: Perspectives on parking policy.   RAC Foundation, 2012.
  • [5] Q. Huang, Q.-S. Jia, and X. Guan, “A multi-timescale and bilevel coordination approach for matching uncertain wind supply with EV charging demand,” IEEE Trans. Autom. Sci. Eng., vol. 14, no. 2, pp. 694–704, 2016.
  • [6] O. Fallah-Mehrjardi, M. H. Yaghmaee, and A. Leon-Garcia, “Charge scheduling of electric vehicles in smart parking-lot under future demands uncertainty,” IEEE Trans. Smart Grid, vol. 11, no. 6, pp. 4949–4959, 2020.
  • [7] Z. Xiaodi, S. Yunlian, Y. Junwei, and D. Qing, “The method of charging piles planning in parking lot,” in 2016 IEEE International Conference on Power System Technology (POWERCON).   IEEE, 2016, pp. 1–5.
  • [8] A. S. Al-Ogaili, T. J. T. Hashim, N. A. Rahmat, A. K. Ramasamy, M. B. Marsadek, M. Faisal, and M. A. Hannan, “Review on scheduling, clustering, and forecasting strategies for controlling electric vehicle charging: challenges and recommendations,” IEEE Access, vol. 7, pp. 353–371, 2019.
  • [9] J. Kang, H. Kong, Z. Lin, and A. Dang, “Mapping the dynamics of electric vehicle charging demand within Beijing’s spatial structure,” Sustainable Cities and Society, 2022.
  • [10] B. Sun, Z. Huang, X. Tan, and D. H. K. Tsang, “Optimal scheduling for electric vehicle charging with discrete charging levels in distribution grid,” IEEE Trans. Smart Grid, vol. 9, no. 2, pp. 624–634, 2018.
  • [11] A.-M. Koufakis, E. S. Rigas, N. Bassiliades, and S. D. Ramchurn, “Offline and online electric vehicle charging scheduling with V2V energy transfer,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 5, pp. 2128–2138, 2020.
  • [12] M. F. Shaaban, S. Mohamed, M. Ismail, K. A. Qaraqe, and E. Serpedin, “Joint planning of smart EV charging stations and DGs in eco-friendly remote hybrid microgrids,” IEEE Trans. Smart Grid, vol. 10, no. 5, pp. 5819–5830, 2019.
  • [13] Y. Yang, Q.-S. Jia, X. Guan, X. Zhang, Z. Qiu, and G. Deconinck, “Decentralized EV-based charging optimization with building integrated wind energy,” IEEE Trans. Autom. Sci. Eng., vol. 16, no. 3, pp. 1002–1017, 2019.
  • [14] S. Zhao, X. Lin, and M. Chen, “Robust online algorithms for peak-minimizing EV charging under multistage uncertainty,” IEEE Trans. Autom. Control, vol. 62, no. 11, pp. 5739–5754, 2017.
  • [15] H. Li, Z. Wan, and H. He, “Constrained EV charging scheduling based on safe deep reinforcement learning,” IEEE Trans. Smart Grid, vol. 11, no. 3, pp. 2427–2439, 2020.
  • [16] Q. Huang, Q.-S. Jia, and X. Guan, “Robust scheduling of EV charging load with uncertain wind power integration,” IEEE Trans. Smart Grid, vol. 9, no. 2, pp. 1043–1054, 2018.
  • [17] H. S. Chang, J. Hu, M. C. Fu, and S. I. Marcus, Simulation-based algorithms for Markov decision processes.   Springer, 2007.
  • [18] E. M. Elgqvist and J. Pohl, “Evaluating utility costs savings for EV charging infrastructure,” National Renewable Energy Lab.(NREL), Golden, CO (United States), Tech. Rep., 2019.
  • [19] Q. Wu, M. Shahidehpour, C. Li, S. Huang, and W. Wei, “Transactive real-time electric vehicle charging management for commercial buildings with PV on-site generation,” IEEE Trans. Smart Grid, vol. 10, no. 5, pp. 4939–4950, 2018.
  • [20] T. Long, Q.-S. Jia, G. Wang, and Y. Yang, “Efficient real-time EV charging scheduling via ordinal optimization,” IEEE Trans. Smart Grid, 2021.
  • [21] D. Bertsekas, Dynamic programming and optimal control: Volume I.   Athena Scientific, 2012, vol. 1.
  • [22] X.-R. Cao, Stochastic Learning and Optimization: A Sensitivity-Based Approach.   Springer, 2007.
  • [23] Q.-S. Jia, J. Wu, Z. Wu, and X. Guan, “Event-based HVAC control-a complexity-based approach,” IEEE Trans. Autom. Sci. Eng., vol. 15, no. 4, pp. 1909–1919, 2018.
  • [24] Z. Wu, Q.-S. Jia, and X. Guan, “Optimal control of multiroom HVAC system: An event-based approach,” IEEE Trans. Control Syst. Technol., vol. 24, no. 2, pp. 662–669, 2015.
  • [25] Y. Wardi, C. G. Cassandras, and X.-R. Cao, “Perturbation analysis: A framework for data-driven control and optimization of discrete event and hybrid systems,” Annual Reviews in Control, vol. 45, pp. 267–280, 2018.
  • [26] R.-B. Xue, X.-S. Ye, and X.-R. Cao, “Optimization of stock trading with additional information by limit order book,” Automatica, vol. 127, 2021.
  • [27] Q.-S. Jia and J. Wu, “A structural property of charging scheduling policy for shared electric vehicles with wind power generation,” IEEE Trans. Control Syst. Technol., 2020.
  • [28] Y. Zhao, “Optimization theories and methods for Markov decision processes in resource scheduling of networked systems,” Ph.D. dissertation, Tsinghua University, 2010.
  • [29] D. Wang, X. Guan, J. Wu, P. Li, P. Zan, and H. Xu, “Integrated energy exchange scheduling for multimicrogrid system with electric vehicles,” IEEE Trans. Smart Grid, vol. 7, no. 4, pp. 1762–1774, 2015.
  • [30] Nissan Leaf Price and Specification [online]. Available: https://ev-database.org/car/1106/Nissan-Leaf#charge-table.
  • [31] T. Wu, X. Ji, G. Wang, Y. Liu, Q. Yang, Z. Bao, and J. Peng, “Hydrogen energy storage system for demand forecast error mitigation and voltage stabilization in a fast-charging station,” IEEE Trans. Ind. Appl., 2021.