Real-Time Differential Epidemic Analysis and Prediction for COVID-19 Pandemic
Abstract
In this paper, we propose a new real-time differential virus transmission model that gives more accurate and robust short-term predictions of COVID-19 transmission while retaining the benefits of near-term trend projection. Existing Susceptible-Exposed-Infected-Removed (SEIR) based virus transmission models fit pandemic data well when sufficient historical data are available. The new model, which is also SEIR based, instead uses a short history of data to find the trend of the changing disease dynamics for the infected, the dead and the recovered, so that it naturally accommodates real-time changes in disease mitigation, business activity and social behavior of populations. Our work is inspired by the observation that contagious disease transmission prediction is similar to weather prediction: only short-term prediction is typically accurate, because many time-varying, interacting factors and social and behavioral uncertainties are involved. At the same time, an accurate short-term prediction, such as one week or ten days ahead, can give local government and hospital decision makers sufficient lead time to plan the healthcare resources and critical personnel needed for in-patient treatment. As the parameters of the improved SEIR model are trained on data from a short history window for accurate trend prediction, our differential epidemic model is essentially a window-based time-varying SEIR model. Since the SEIR model is still a physics-based disease transmission model, its near-term (about one month) projection can still be very instrumental for policy makers in guiding real-time decisions on disease mitigation and business activity policy. This is especially useful if the pandemic lasts more than one year with different phases across the world, like the 1918 flu pandemic. Numerical results on recent COVID-19 data from China, Italy and the US, including California and New York states, are analyzed.
A dedicated website has been built to show the projections based on the latest data [1].
I Introduction
The novel coronavirus (COVID-19) epidemic is generating significant social, economic, and health impacts, and has highlighted the importance of real-time analysis and prediction of emerging infectious diseases for planning health care resources and personnel and for guiding economic activity.
One of the well-known models that is reasonably predictive for human-to-human transmission is the so-called Susceptible-Infectious-Removed (SIR) model, which was published in its first form around the 1920s [21]. The model was later extended to more complicated situations, for example as the Susceptible-Exposed-Infected-Removed (SEIR) model [3]. Those models describe very successfully how the disease dynamics change over time once sufficient data are available (from outbreak to finish). Recently, those models have been applied to study COVID-19 transmission in different countries with some success [16, 4, 17, 25, 15]. For example, those models show that a city-wide lockdown can lower the transmission rate substantially. On the other hand, data-driven and curve-fitting methods for the prediction of COVID-19, such as the Gaussian-function-based fitting method in the IHME projection [10], exponential curve fitting in [24], and machine-learning-based approaches in [23, 9], can fit the data well. However, those methods generally lack physical insight into transmission and do not work well when the data do not fit their models (for instance, when there are two waves of infections). For real-time prediction, they have to give large projection ranges, which makes the projection less valuable for planning medical resources and mitigation policy.
Furthermore, a worldwide public health crisis like the 1918 pandemic, which killed an estimated 50 million people worldwide, including an estimated 675,000 people in the United States, is difficult to model, or even to project [2]. The H1N1 flu pandemic lasted for about two years from its start in 1918, with three major phases across different parts of the world. Cities like San Francisco even experienced a strong second wave of deaths when social distancing was relaxed too early. Such long-term, multi-phase and multi-year transmission dynamics are very difficult for existing pandemic models to capture. As a result, real-time short-term prediction and near-term trend projection can give city and county policy makers and resource planners extremely valuable information to guide disease mitigation and business open/close decisions in real time.
On the other hand, traditional physics-based SIR/SEIR epidemic models and their variants suffer from several drawbacks, especially for real-time disease transmission prediction, because they are static models in which key parameters such as the transmission rate and recovery rate are typically fixed values obtained from fitting. However, in countries like the US, local government interventions and mitigation policies, such as active surveillance, contact tracing, quarantine, massive testing, school and business closures, shelter in place and social distancing, keep changing in response to the local epidemic situation. Each US state also has a different timeline for implementing prevention and mitigation measures, which makes prediction with a static model even more difficult. Furthermore, social behaviors of the population, such as wearing face masks, are not consistent across regions and cities as the pandemic progresses, which affects the transmission rate as well. The availability of healthcare resources in each city and county also changes over time and may affect the recovery rate. As a result, existing static SEIR models are not well suited to real-time disease prediction, because key parameters such as the transmission rate and recovery rate, which are closely related to the effective reproduction number, are time-varying. One important observation is that contagious disease transmission prediction is similar to weather prediction: only short-term prediction is accurate, because many time-varying, interacting factors and social and behavioral uncertainties are involved. The widely watched IHME (Institute for Health Metrics and Evaluation) model for real-time COVID-19 epidemic prediction [10] is based purely on mathematical curve-fitting techniques. It lacks the theoretical foundation of epidemic transmission found in SIR models, and thus its prediction accuracy is highly debatable.
To mitigate this problem, time-varying SEIR models have been proposed in the past. Dureau et al. modeled the time-varying effects in SEIR models by considering partial and noisy data. They introduced stochastic processes into the SEIR models and solved the resulting stochastic SEIR differential equations using Markov Chain Monte Carlo methods, in which the transmission is modeled as a random walk and which are very expensive to compute [8]. This approach was applied to study the transmission within and outside Wuhan from January to February 2020 [11]. Recently, Chen et al. proposed a time-dependent SIR model for the COVID-19 outbreak in China [6]. In this model, the parameters are computed on a daily basis. As a result, it lacks good predictability, because all the key parameters have to be predicted a priori, and some ad hoc methods were introduced to predict the parameters in the near future.
In this work, we propose a new real-time differential virus transmission modeling method that gives more accurate and robust short-term predictions of COVID-19 transmission while still maintaining the benefit of near-term projection. The new model is based on an enhanced Susceptible-Exposed-Infected-Removed (SEIR) virus transmission model. It obtains a differential view of the pandemic dynamics in a short history window to analyze the short-term trend of the transmission, so that it stays accurate over different periods of time as disease mitigation, business activity and social behavior of populations change. As the parameters of the improved SEIR model are trained on data from a short history window for accurate trend prediction, our differential epidemic model is essentially a window-based time-varying SEIR model. Since the SEIR model is still a physics-based disease transmission model, its near-term (about one month) projection can still be very instrumental for policy makers in guiding real-time decisions on disease mitigation and business activity policy. Numerical results on recent COVID-19 data from China, Italy and the US, including California and New York states, are analyzed. A dedicated website has been built to show the projections based on the latest data [1].

Fig. 1 illustrates the modeling and prediction results from the proposed differential SEIR model for COVID-19 in California from early March to late July 2020. At the beginning, the number of infected grows very fast, and the projected growth reflects this fast rate of change. As social distancing and stay-at-home policies were introduced across cities and counties in California in late March, the growth rate went down, and the projections made at different time points reflect this trend as well. Based on the projection of our model on May 1, 2020, the infected cases will reach a peak of about 72.37K around June 14 in California, which indicates the peak medical resources needed. In contrast, the widely watched IHME prediction [10] puts the peak of medical resource needs around April 17. We will show more significant differences between our model and the IHME prediction [10] for New York state later.
II The enhanced SEIR base model
In this section, we first present the enhanced SEIR model, which is an extension of the classical SEIR model [20, 19, 12, 18, 7, 22, 13], as the base model for the proposed differential modeling shown later. The proposed base model, called the SEIRDP model, is similar to the recently proposed generalized SEIR model for studying the COVID-19 disease in Wuhan and China [17]. We removed the quarantine compartment in the proposed SEIRDP model, as there is no quarantine data for most countries outside China. The resulting SEIRDP (Susceptible, Exposed, Infected, Recovered, Death, Insusceptible (P)) model is shown in Fig. 2. In this base model, we have six states, which indicate the numbers of susceptible cases, insusceptible cases, exposed cases (infected but not yet infectious, in a latent period), infectious cases (confirmed with infectious capacity), recovered and immune cases, and closed cases (or deaths). The total population of a given region or county is the sum of these six states. The model coefficients represent the protection rate, infection rate, average latent time, cure rate, and mortality rate, respectively. The insusceptible compartment represents a gradually changing (or growing) population that will not be infected because of strong disease mitigation measures, such as enforced shelter in place or the strict city lockdowns in China and European countries. The basic reproduction number represents the number of secondary infections from a primary infected individual in a fully susceptible population, and can be computed by [17]
(1) |
where the time variable is the number of days. We remark that if we force the protection rate to zero, the proposed SEIRDP model essentially becomes the classical SEIR model (if we put the recovered and death cases into one recovery compartment).
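The compartment structure described in this section can be sketched in code. The following is a minimal illustration only, assuming a standard SEIR-type flow with an added insusceptible compartment (S to P at the protection rate, S to E through contact with I, E to I after the latent period, I to R at the cure rate, I to D at the mortality rate). The exact equations, the names `Rates`, `seirdp_rhs` and `simulate`, and all numerical values are assumptions made for illustration and are not taken from the paper.

```python
from typing import NamedTuple

class Rates(NamedTuple):
    alpha: float  # protection rate (S -> P), assumed constant here
    beta: float   # infection rate
    gamma: float  # inverse of the average latent time (E -> I)
    lam: float    # cure rate (I -> R)
    kappa: float  # mortality rate (I -> D)

def seirdp_rhs(state, r, n):
    """Time derivatives of (S, P, E, I, R, D) for the assumed SEIRDP flow."""
    s, p, e, i, rec, d = state
    new_exposed = r.beta * s * i / n
    return (
        -new_exposed - r.alpha * s,            # dS/dt
        r.alpha * s,                           # dP/dt
        new_exposed - r.gamma * e,             # dE/dt
        r.gamma * e - (r.lam + r.kappa) * i,   # dI/dt
        r.lam * i,                             # dR/dt
        r.kappa * i,                           # dD/dt
    )

def simulate(state0, r, days, steps_per_day=10):
    """Forward-Euler integration; returns one state tuple per day."""
    n = sum(state0)  # total population is the sum of the six states
    dt = 1.0 / steps_per_day
    state = tuple(state0)
    daily = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            deriv = seirdp_rhs(state, r, n)
            state = tuple(x + dt * dx for x, dx in zip(state, deriv))
        daily.append(state)
    return daily

# Hypothetical population and rates, chosen only to exercise the code.
trajectory = simulate(
    (9990.0, 0.0, 0.0, 10.0, 0.0, 0.0),
    Rates(alpha=0.02, beta=1.0, gamma=0.2, lam=0.05, kappa=0.01),
    days=30,
)
```

Because the six derivatives sum to zero, the total population is conserved up to rounding error, which is a convenient sanity check for any implementation of this kind. Setting `alpha` to zero in this sketch leaves the insusceptible compartment empty, which corresponds to the reduction to the classical SEIR model noted above.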

III The proposed differential SEIR model
With the base SEIR model in place, we now present our differential SEIR model, in which the key parameters become time dependent.

In our work, we introduce the concept of a time window, which represents a short period of a few days, as shown in Fig. 3. In this figure, the red box represents the history length of the data we use for training the model. The red box moves forward by one window size per step. For instance, five days can be a good time window. In each time window, all the key parameters in the base SEIR model are constant. This reflects the fact that the transmission and recovery conditions for a population do not change within a short time window, so we can use a traditional static SEIR model fitted to the history of data around this window. We can also predict a short period into the future with sufficient accuracy, assuming that the disease dynamics do not change dramatically over that period, which is also reasonable. Based on this observation, we can present the resulting ordinary differential equations for the proposed differential SEIR model as follows:
(2) |
where time belongs to the th time window. If we use to indicate the last day in the window , then we have , where is the window size. The window can be moved forward with stride days. Therefore, we have the relationship . The cure rate and mortality rate are time-dependent, which again are dependent on two parameters. They again are time-window dependent (explained below):
Those two equations basically say that the cure rate (or recovery rate) approaches a constant value exponentially over time, while the mortality rate decays exponentially over time [5].
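The qualitative behavior described above can be illustrated with a short sketch. The functional forms below (a saturating exponential for the cure rate and a decaying exponential for the mortality rate, each with two parameters) are an assumption consistent with the verbal description; the function and parameter names `cure_rate`, `mortality_rate`, `lam0`, `lam1`, `kappa0` and `kappa1` are hypothetical.

```python
import math

def cure_rate(t, lam0, lam1):
    """Assumed form: rises from 0 and saturates exponentially at lam0."""
    return lam0 * (1.0 - math.exp(-lam1 * t))

def mortality_rate(t, kappa0, kappa1):
    """Assumed form: starts at kappa0 and decays exponentially to 0."""
    return kappa0 * math.exp(-kappa1 * t)

# Hypothetical parameter values, for illustration only.
early_cure = cure_rate(1.0, lam0=0.1, lam1=0.5)
late_cure = cure_rate(30.0, lam0=0.1, lam1=0.5)
```

In this sketch `t` is measured in days from the start of the window, so each window would carry its own pair of values for each rate.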
We note that all the key parameters of the enhanced SEIR model are time-varying instead of fixed values. However, they are time-window dependent: unlike in existing time-varying SEIR models, they do not change every day. Specifically, we select a fixed number of days as a window (the window size is a hyperparameter and can be optimized for different regions and countries). We make a prediction over each non-overlapping window and then slide one window forward for the next prediction, and so on. As a result, within each window, the seven model parameters are kept constant. The parameters are found by a regression process described in the next section. The stride, measured in days, is another hyperparameter of our model. Since all the parameters are time-window dependent, the reproduction number also becomes time dependent; it is then called the effective reproduction number, and within each time window it can be computed as follows:
(3) |
We note that the window-based SEIR model obtained for a given window also depends on the historical data from the previous windows. The reason is that our differential model can be viewed as performing differential operations on a dynamic system at a specific time frame or window, not on a static function. As a result, the impact of the historical data from the previous windows is represented by the initial conditions used for solving the ODE (2) and for obtaining the parameters by fitting the discretized ODE to the data in this window.
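The window bookkeeping described in this section can be sketched as follows. This is an illustrative helper under stated assumptions, not the authors' implementation: the function name `rolling_windows`, its arguments, and the convention of end-exclusive day indices are all hypothetical.

```python
def rolling_windows(n_days, history, window, stride):
    """Return (fit_start, fit_end, predict_end) day indices, end-exclusive.

    history: number of past days used to fit the parameters
    window:  number of future days predicted with those parameters
    stride:  number of days the window moves forward per step
    """
    out = []
    fit_end = history
    while fit_end <= n_days:
        out.append((fit_end - history, fit_end, fit_end + window))
        fit_end += stride
    return out

# Hypothetical setting: 30 days of data, 10-day history, 5-day window and stride.
windows = rolling_windows(n_days=30, history=10, window=5, stride=5)
```

In a full pipeline, the state at `fit_start` of each step would come from the model fitted in the previous step, which is how the earlier history enters as an initial condition.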
IV Window-dependent model parameter estimation for differential SEIR models
For the proposed differential SEIR model, we estimate the seven model parameters in each differential window, which is indexed by its last day. As a result, we can rewrite the ordinary differential equation (2) as the following initial value problem in matrix form, where the initial condition captures the impact of the historical data before the window:
(4) |
where IC denotes the initial condition.
Then we perform the time-domain discretization using the simple forward Euler method or a higher-order explicit Runge-Kutta method, and we end up with a number of algebraic equations. By solving the resulting equations in the time domain, we can obtain the seven parameters from the given data of the past 10 or 15 days, with the initial state conditions computed from the previous time window. In this paper, we use a nonlinear least-squares solver to estimate the parameters by the expression [5]
(5) |
where the parameter vector collects the seven parameters of the estimated model, the time samples are taken from the real data, the fitted targets are the infected, recovered and death cases from the real data, and SEIRDP represents the SEIRDP model.
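The idea of fitting model parameters to the data in a window by least squares can be illustrated with a deliberately reduced sketch. It differs from the procedure in the text in three ways that should be kept in mind: it fits only one parameter (the infection rate) instead of seven, it uses a three-compartment forward-Euler model instead of the full SEIRDP model, and it swaps the nonlinear least-squares solver for a plain golden-section search on the squared error. All names and numbers are hypothetical.

```python
import math

def simulate_infected(beta, state0, days, gamma=0.2, removal=0.06, steps_per_day=10):
    """Forward-Euler run of a reduced S-E-I model; returns daily infected counts."""
    s, e, i = state0
    n = s + e + i
    dt = 1.0 / steps_per_day
    infected = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n
            s, e, i = (s - dt * new_exposed,
                       e + dt * (new_exposed - gamma * e),
                       i + dt * (gamma * e - removal * i))
        infected.append(i)
    return infected

def sum_squared_error(beta, state0, observed):
    """Least-squares objective between modeled and observed infected counts."""
    model = simulate_infected(beta, state0, len(observed) - 1)
    return sum((m - o) ** 2 for m, o in zip(model, observed))

def fit_beta(state0, observed, lo=0.0, hi=2.0, iters=80):
    """Golden-section search for the beta that minimizes the squared error."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc = sum_squared_error(c, state0, observed)
    fd = sum_squared_error(d, state0, observed)
    for _ in range(iters):
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = sum_squared_error(c, state0, observed)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = sum_squared_error(d, state0, observed)
    return (a + b) / 2.0

# Synthetic 10-day window generated with a known beta, then recovered by the fit.
initial_state = (999900.0, 50.0, 50.0)   # window initial condition (S, E, I)
true_beta = 0.8
observed_infected = simulate_infected(true_beta, initial_state, days=10)
estimated_beta = fit_beta(initial_state, observed_infected)
```

The golden-section search assumes the squared error has a single minimum inside the bracket, which is reasonable for this short, early-growth window but is not guaranteed in general. A multi-parameter fit would need a proper nonlinear least-squares routine, as the text describes.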
V Results and discussion
V-A Analysis for public data from Hubei Province, China



We first show the modeling results for Hubei province of China from early January to the middle of April 2020. Fig. 4 shows the time evolution of the numbers of currently infected, recovered and death cases over this period. Note that many public websites show the cumulative cases, which differ from the currently infected cases. The currently infected cases equal the cumulative cases minus the recovered and dead cases. The circles in the figure (and in all the figures) are the measured data, and the solid line is the last projection in early April 2020, since the outbreak in China has basically run its course and we have all the historical data from early outbreak to finish.
Notice that on Feb. 12, 2020, the COVID-19 testing criteria were relaxed in Hubei Province. As a result, there is a huge jump in confirmed infected cases, which can be viewed as an outlier in the data. Consequently, the projection based on the history before and around Feb. 12 is quite far off the track of the actual cases, as shown in Fig. 4. For most of the days over the two-month period, however, the projected and actual measured cases match very well.
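The relationship between cumulative and currently infected cases stated above is simple arithmetic, sketched here for daily series. The function name and the sample numbers are hypothetical and do not come from the Hubei data.

```python
def currently_infected(cumulative, recovered, dead):
    """Currently infected = cumulative confirmed - recovered - dead, per day."""
    if not (len(cumulative) == len(recovered) == len(dead)):
        raise ValueError("all series must cover the same days")
    return [c - r - d for c, r, d in zip(cumulative, recovered, dead)]

# Made-up three-day example.
active = currently_infected([100, 150, 210], [5, 20, 40], [1, 3, 6])
```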
Fig. 5 shows the effective reproduction number over the same period of time. As we can see, the effective reproduction number becomes less than 1 around Feb. 19, which is about four weeks after the city lockdown in Wuhan on Jan. 23, 2020. This shows the effect of the strict city lockdown in Hubei province in reducing the transmission of the virus. Feb. 19 is also close to the time when the peak of infected people was reached, as shown in Fig. 4. As a result, analysis and prediction of the effective reproduction number can give us more insight into how the pandemic dynamics will play out and when the peak or turning points will happen in real time. Fig. 6 shows the 5-day predicted mean errors, in percent, against the measured data. The mean errors are computed as the average estimated error in each time window (every five days) between the 5-day projected infected cases and the measured cases. This is the case for all the mean error computations in the sequel.
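The per-window mean error can be sketched as follows. The text does not state whether the error is signed or absolute, so this sketch assumes the mean absolute error relative to the measured value, expressed in percent; the function name and the sample numbers are hypothetical.

```python
def window_mean_error_percent(projected, measured):
    """Mean absolute relative error (in percent) over one projection window."""
    errors = [abs(p - m) / m * 100.0
              for p, m in zip(projected, measured) if m != 0]
    if not errors:
        raise ValueError("no nonzero measured values in this window")
    return sum(errors) / len(errors)

# Made-up 5-day window: projected versus measured infected cases.
mean_error = window_mean_error_percent(
    [110.0, 90.0, 100.0, 105.0, 95.0],
    [100.0, 100.0, 100.0, 100.0, 100.0],
)
```

Days with a measured value of zero are skipped in this sketch to avoid division by zero; a different convention may be preferable early in an outbreak.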
V-B Analysis and prediction for Italy



Fig. 7 shows COVID-19 disease modeling and prediction for Italy from late February to the middle of April 2020, as Italy was the country most severely impacted by COVID-19 in Europe. As we can see, our differential models match the historical data well. The prediction around early March is quite aggressive, as the actual growth rates for both infected and death cases are very high. This is also reflected in the effective reproduction number, which is about 6-7 in those days, as shown in Fig. 8. As of April 13, the effective reproduction number has dropped to around 2 and is still going down. Based on our 10-day prediction, the currently infected cases will reach a peak of about 107K cases around April 21. Fig. 9 shows the 5-day projected mean errors. As time progresses, the projected error goes down and is limited to about 10% for the three types of cases analyzed.
V-C Analysis and prediction for the United States
For the US data, we show the results for the US as a whole, New York state and California state. New York is the epicenter of COVID-19 in the US, and California has the largest population in the US. The two states also have dramatically different transmission situations.
Fig. 10 first shows the modeling and projection for the US from early March to the middle of July 2020 for the currently infected, recovered and death cases. The results from the models match the given data very well.
The effective reproduction numbers over the same time period and the relative projection errors are shown in Fig. 11 and Fig. 12, respectively. As we can see, the effective reproduction number seems higher than the basic reproduction number, which was estimated at about 2-7 for COVID-19 [14]. This may reflect the significant portion of unconfirmed and asymptomatic population at the beginning of the outbreak in the US (this is the case for all the other countries as well). As time progresses, the effective reproduction number tends to level off in a more reasonable range as more confirmed cases are reported.



Fig. 13 and Fig. 14 show the daily newly confirmed infected cases and the daily death cases for the US over the same period, respectively. The blue lines are the actual data and the brown lines are the projected data from April 13 to the middle of July 2020. We want to stress that such projections are subject to change because of many changing factors in the near term, such as mitigation policies and people's behavior.


Fig. 15 shows the existing daily confirmed infected cases for the US from March to the middle of July. We project that the total number will probably reach about 1.1 million on June 12.

V-D Analysis and prediction for California state
For California, the modeling and projection results over the same period as for the US are shown in Fig. 1, Fig. 16 and Fig. 17.
The effective reproduction number is around 5 as of April 13, which is close to the basic reproduction number of COVID-19, estimated at about 2-7 [14].


Fig. 18, Fig. 19 and Fig. 20 show the daily confirmed infected cases, the death cases and the total cumulative infected cases for California over the same period, respectively.



V-E Analysis and prediction for New York state
For New York state, the modeling and projection results over the same period as for the US are shown in Fig. 21, Fig. 22 and Fig. 23. The effective reproduction numbers estimated by the differential models have dropped below 5 and continue to go down around April 13, 2020.



Fig. 24, Fig. 25 and Fig. 26 show the daily confirmed infected cases, the death cases and the total cumulative infected cases for New York state over the same period, respectively.



VI Conclusion
In this paper, we have proposed a new real-time differential virus transmission model that gives more accurate and robust short-term predictions of COVID-19 transmission while retaining the benefits of near-term trend projection. The new model is based on an enhanced Susceptible-Exposed-Infected-Removed (SEIR) virus transmission model. As the parameters of the improved SEIR model are trained on data from a short history window for accurate trend prediction, our differential epidemic model is essentially a window-based time-varying SEIR model. Numerical results on recent COVID-19 data from China, Italy and the US, including California and New York states, have been analyzed.
VI-A Acknowledgment
The authors would like to thank Dr. Cheynet for his open-source SEIR model [5] and for his comments on our work, which improved the presentation of the article.
References
- [1] “Covid-19 pandemic projection.” [Online]. Available: https://intra.ece.ucr.edu/~stan/project/vsclab_wiki_new/index.php/COVID-19_Pandemic_Projection
- [2] D. Jordan, T. Tumpey, and B. Jester, “The deadliest flu: The complete story of the discovery and reconstruction of the 1918 pandemic virus,” Centers for Disease Control and Prevention, Dec 2019. [Online]. Available: https://www.cdc.gov/flu/pandemic-resources/reconstruction-1918-virus.html
- [3] R. M. Anderson and R. M. May, Infectious diseases of humans: Dynamics and control. Oxford University Press, 1991.
- [4] Y. Chen, J. Cheng, Y. Jiang, and K. Liu, “A time delay dynamical model for outbreak of 2019-ncov and the parameter identification,” Journal of Inverse and Ill-posed Problems, vol. 28, no. 2, p. 243–250, Apr 2020. [Online]. Available: http://dx.doi.org/10.1515/jiip-2020-0010
- [5] E. Cheynet, “Generalized SEIR Epidemic Model (fitting and computation),” Apr 2020. [Online]. Available: https://www.github.com/ECheynet/SEIR
- [6] Y. chun Chen, P.-E. Lu, and C.-S. Chang, “A time-dependent sir model for covid-19,” ArXiv, vol. abs/2003.00122, 2020.
- [7] S. J. Clifford, C. A. B. Pearson, P. Klepac, K. Van Zandvoort, B. J. Quilty, R. M. Eggo, and S. Flasche, “Interventions targeting air travellers early in the pandemic may delay local outbreaks of sars-cov-2,” medRxiv, 2020. [Online]. Available: https://www.medrxiv.org/content/early/2020/02/28/2020.02.12.20022426
- [8] J. Dureau, K. Kalogeropoulos, and M. Baguelin, “Capturing the time-varying drivers of an epidemic using stochastic dynamical systems,” Biostatistics, vol. 14, no. 3, pp. 541–555, 01 2013. [Online]. Available: https://doi.org/10.1093/biostatistics/kxs052
- [9] Z. Hu, Q. Ge, S. Li, L. Jin, and M. Xiong, “Artificial intelligence forecasting of covid-19 in china,” 2020.
- [10] IHME, “Covid-19 projects, institute for health metrics and evaluation (ihme), university of washington,” https://covid19.healthdata.org/united-states-of-america.
- [11] A. J. Kucharski, T. W. Russell, C. Diamond, Y. Liu, J. Edmunds, S. Funk, R. M. Eggo, F. Sun, M. Jit, J. D. Munday, N. Davies, A. Gimma, K. van Zandvoort, H. Gibbs, J. Hellewell, C. I. Jarvis, S. Clifford, B. J. Quilty, N. I. Bosse, S. Abbott, P. Klepac, and S. Flasche, “Early dynamics of transmission and control of COVID-19: a mathematical modelling study,” The Lancet Infectious Diseases, Mar. 2020.
- [12] J. Labadin and B. H. Hong, “Transmission Dynamics of 2019-nCoV in Malaysia,” medRxiv, 2020. [Online]. Available: https://www.medrxiv.org/content/early/2020/02/11/2020.02.07.20021188
- [13] X. Li, X. Zhao, and Y. Sun, “The lockdown of hubei province causing different transmission dynamics of the novel coronavirus (2019-ncov) in wuhan and beijing,” medRxiv, 2020. [Online]. Available: https://www.medrxiv.org/content/early/2020/02/17/2020.02.09.20021477
- [14] Y. Liu, A. A. Gayle, A. Wilder-Smith, and J. Rocklöv, “The reproductive number of COVID-19 is higher compared to SARS coronavirus.” Journal of Travel Medicine, vol. 27, no. 2, Mar. 2020.
- [15] B. F. Maier and D. Brockmann, “Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China,” Science, 2020. [Online]. Available: https://science.sciencemag.org/content/early/2020/04/07/science.abb4557
- [16] I. Nesteruk, “Statistics based predictions of coronavirus 2019-nCoV spreading in mainland China,” medRxiv, 2020. [Online]. Available: https://www.medrxiv.org/content/early/2020/02/13/2020.02.12.20021931
- [17] L. Peng, W. Yang, D. Zhang, C. Zhuge, and L. Hong, “Epidemic analysis of COVID-19 in China by dynamical modeling,” medRxiv, 2020. [Online]. Available: https://www.medrxiv.org/content/early/2020/02/18/2020.02.16.20023465
- [18] M. Shen, Z. Peng, Y. Guo, Y. Xiao, and L. Zhang, “Lockdown may partially halt the spread of 2019 novel coronavirus in hubei province, china,” medRxiv, 2020. [Online]. Available: https://www.medrxiv.org/content/early/2020/02/13/2020.02.11.20022236
- [19] B. Tang, N. L. Bragazzi, Q. Li, S. Tang, Y. Xiao, and J. Wu, “An updated estimation of the risk of transmission of the novel coronavirus (2019-ncov),” Infectious Disease Modelling, vol. 5, pp. 248 – 255, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S246804272030004X
- [20] B. Tang, X. Wang, Q. Li, N. L. Bragazzi, S. Tang, Y. Xiao, and J. Wu, “Estimation of the Transmission Risk of the 2019-nCoV and Its Implication for Public Health Interventions,” Journal of Clinical Medicine, vol. 9, no. 2, Feb. 2020.
- [21] H. Weiss, “The sir model and the foundations of public health,” MATerials MATemàtics, vol. 2013, no. 3, pp. 1–17, 2013.
- [22] H. Xiong and H. Yan, “Simulating the infected population and spread trend of 2019-ncov under different policy by eir model,” medRxiv, 2020. [Online]. Available: https://www.medrxiv.org/content/early/2020/02/12/2020.02.10.20021519
- [23] T. Zeng, Y. Zhang, Z. Li, X. Liu, and B. Qiu, “Predictions of 2019-ncov transmission ending via comprehensive methods,” 2020.
- [24] S. Zhao, Q. Lin, J. Ran, S. S. Musa, G. Yang, W. Wang, Y. Lou, D. Gao, L. Yang, D. He, and M. H. Wang, “Preliminary estimation of the basic reproduction number of novel coronavirus (2019-ncov) in china, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak,” International Journal of Infectious Diseases, 2020. [Online]. Available: https://doi.org/10.1016/j.ijid.2020.01.050
- [25] T. Zhou, Q. Liu, Z. Yang, J. Liao, K. Yang, W. Bai, X. Lu, and W. Zhang, “Preliminary prediction of the basic reproduction number of the wuhan novel coronavirus 2019-ncov,” Journal of Evidence-Based Medicine, vol. 13, no. 1, pp. 3–7, 2020. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/jebm.12376