Model Selection for Infectious Disease Dynamics

Date: Mar. 14, 2020

Model selection with SINDy (sparse identification of nonlinear dynamics)

Brunton et al. introduced sparse identification of nonlinear dynamics (SINDy), a method for recovering the governing equations of a dynamical system from time series data. The method can also be used to construct models from experimental data. SINDy selects the terms of the dynamics by performing sparse regression over a library of candidate nonlinear functions.
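The sparse regression step can be sketched with sequentially thresholded least squares, one common way to solve the SINDy problem. This is a minimal illustration and not the code used in the report: the function name `stlsq`, the threshold value, and the toy system dx/dt = -2x are all assumptions chosen for the example.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares (illustrative sketch).

    theta: (n_samples, n_features) library of candidate functions
    dxdt:  (n_samples, n_states) derivative estimates
    Returns an (n_features, n_states) coefficient matrix with small
    entries set to zero.
    """
    # Start from the ordinary least-squares solution.
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        # Zero out coefficients below the threshold ...
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # ... and refit each state using only the surviving terms.
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(
                    theta[:, keep], dxdt[:, k], rcond=None
                )[0]
    return xi

# Toy data from dx/dt = -2x (hypothetical example system).
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)
dxdt = (-2.0 * x).reshape(-1, 1)  # exact derivative, one state variable

# Candidate library: [1, x, x^2].
theta = np.column_stack([np.ones_like(x), x, x**2])

xi = stlsq(theta, dxdt, threshold=0.1)
print(xi.ravel())  # coefficients for [1, x, x^2]
```

On clean data like this, only the coefficient of x should survive the thresholding. With noisy data, the result depends strongly on the threshold and on how the derivatives are estimated.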

In this report, we test the SINDy model selection algorithm on several infectious disease models, namely the SIR model with a constant, a noisy, and a decreasing transmission rate. We also collected data on reported COVID-19 infections, recoveries, and deaths in Hubei province, China, and used SINDy to construct a model of the disease spreading process.
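For reference, synthetic test data of this kind can be generated from the standard SIR equations, dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I. The sketch below assumes that standard form and uses a hand-written RK4 integrator. All parameter values, the exponential decay chosen for the decreasing rate, and the function names are hypothetical and are not taken from the report.

```python
import math

def sir_rhs(t, y, beta, gamma, n):
    """Right-hand side of the standard SIR model; beta is a function of t."""
    s, i, r = y
    new_infections = beta(t) * s * i / n
    return (-new_infections, new_infections - gamma * i, gamma * i)

def rk4(rhs, y0, t0, t1, steps, *args):
    """Classical fourth-order Runge-Kutta; returns a list of (t, state)."""
    h = (t1 - t0) / steps
    t, y = t0, tuple(y0)
    trajectory = [(t, y)]
    for _ in range(steps):
        k1 = rhs(t, y, *args)
        k2 = rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k1)), *args)
        k3 = rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k2)), *args)
        k4 = rhs(t + h, tuple(a + h * b for a, b in zip(y, k3)), *args)
        y = tuple(
            a + h / 6 * (p + 2 * q + 2 * u + v)
            for a, p, q, u, v in zip(y, k1, k2, k3, k4)
        )
        t += h
        trajectory.append((t, y))
    return trajectory

# Hypothetical parameters, chosen only for illustration.
N = 1000.0
GAMMA = 0.1
y0 = (N - 1.0, 1.0, 0.0)  # one initial infection

def constant_beta(t):
    return 0.3

def decreasing_beta(t):
    # Assumed exponential decay of the transmission rate.
    return 0.3 * math.exp(-0.05 * t)

traj_const = rk4(sir_rhs, y0, 0.0, 100.0, 1000, constant_beta, GAMMA, N)
traj_decr = rk4(sir_rhs, y0, 0.0, 100.0, 1000, decreasing_beta, GAMMA, N)
```

A noisy transmission rate could be sketched the same way by perturbing beta at each time step, although a fixed-step deterministic integrator like this one is only a rough fit for that case.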

Model construction with COVID-19 data

From the JHU CSSE (Johns Hopkins University Center for Systems Science and Engineering) website, we obtained the reported COVID-19 data for Hubei province, China, from January 22 to March 14, 2020, which gives the numbers of infections, recoveries, and deaths over 53 days. In this section, we treat both recoveries and deaths as the removed population. Since most of the infections in Hubei province occurred in the city of Wuhan, which has a population of about 11 million, we set the total population to N = 10^7.
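The bookkeeping described above can be checked with a few lines of Python. This is only a sketch: the helper `compartments` is hypothetical, it assumes that "infections" means currently infected individuals, and the counts passed to it are made-up placeholders rather than reported data.

```python
from datetime import date

# Both endpoints are included, hence the +1.
start, end = date(2020, 1, 22), date(2020, 3, 14)
n_days = (end - start).days + 1

N = 10**7  # model population used in this section

def compartments(infected, recovered, deaths, n=N):
    """Map daily counts to (S, I, R) with recoveries and deaths both removed."""
    removed = recovered + deaths
    susceptible = n - infected - removed
    return susceptible, infected, removed

# Placeholder counts, not real data.
s, i, r = compartments(infected=100, recovered=20, deaths=5)
print(n_days, s, i, r)
```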

To keep SINDy from failing due to insufficient sampling, we fitted a nonlinear OLS regression to the collected data and used the fitted curves to produce more data points (see the figure below).

Comparison between the early COVID-19 data and the nonlinear OLS approximations
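As a rough illustration of this resampling step, the sketch below fits a curve by nonlinear least squares with `scipy.optimize.curve_fit` and evaluates it on a finer time grid. The logistic form, the parameter names `k`, `r`, `t0`, and the synthetic observations are assumptions made for the example. The functional form used in the report is not stated here.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, k, r, t0):
    """Logistic growth curve (assumed functional form for this sketch)."""
    return k / (1.0 + np.exp(-r * (t - t0)))

# Synthetic daily observations over 53 days; not real data.
t_obs = np.arange(53, dtype=float)
y_obs = logistic(t_obs, 5.0e4, 0.2, 20.0)

# Rough initial guess: plateau near the largest value, midpoint mid-series.
p0 = (y_obs.max(), 0.1, t_obs.mean())
popt, _ = curve_fit(logistic, t_obs, y_obs, p0=p0)

# Evaluate the fitted curve on a finer grid (10 points per day).
t_fine = np.linspace(t_obs[0], t_obs[-1], 521)
y_fine = logistic(t_fine, *popt)
```

Resampling a fitted curve gives smooth inputs for numerical differentiation, but any misfit of the chosen curve is carried into the model that SINDy identifies.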

 

Using fourth-order central differencing to approximate the derivatives and polynomials of order 1 to 3 as the function library, we applied SINDy to identify a model and then solved it numerically. The predictions of the SINDy model are plotted along with the COVID-19 data and its regression in the figures below.

Numerical solution to the SINDy-constructed model along with COVID-19 data and its nonlinear OLS regression (model population N = 10^7)
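The two ingredients named above, a fourth-order central difference and a polynomial library of order 1 to 3, can be sketched as follows. The function names and the choice of three state variables are assumptions for illustration, and the stencil shown covers interior points only. How the report handles the boundary points is not specified here.

```python
from itertools import combinations_with_replacement

import numpy as np

def central_diff4(x, h):
    """Fourth-order central difference along axis 0.

    Uses (-x[i+2] + 8 x[i+1] - 8 x[i-1] + x[i-2]) / (12 h), so the result
    covers the interior points x[2:-2] only.
    """
    return (-x[4:] + 8.0 * x[3:-1] - 8.0 * x[1:-3] + x[:-4]) / (12.0 * h)

def poly_library(states, min_degree=1, max_degree=3):
    """Build all monomials of the state variables with the given degrees.

    states: (n_samples, n_states). Returns the library matrix and, for each
    column, the tuple of state indices multiplied together.
    """
    n_states = states.shape[1]
    columns, terms = [], []
    for degree in range(min_degree, max_degree + 1):
        for combo in combinations_with_replacement(range(n_states), degree):
            columns.append(np.prod(states[:, list(combo)], axis=1))
            terms.append(combo)
    return np.column_stack(columns), terms

# Check the stencil on a cubic, for which it is exact up to rounding.
t = np.linspace(0.0, 1.0, 21)
h = t[1] - t[0]
deriv = central_diff4(t**3, h)

# Library for three state variables (for example S, I, R): 3 + 6 + 10 terms.
demo_states = np.column_stack([t, t**2, 1.0 - t])
theta, terms = poly_library(demo_states)
print(theta.shape, terms[:4])
```

Because the derivative is defined only on interior points, the library rows would need to be trimmed to `states[2:-2]` before the regression so that both sides have the same number of samples.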

Code: https://github.com/aprilzhizhou/data_driven_methods/tree/main/model_selection_for_infectious_disease_dynamics 

Full report: https://github.com/aprilzhizhou/data_driven_methods/blob/main/model_selection_for_infectious_disease_dynamics/model_selection_report_SINDy.pdf