Forecasting dengue in the Mekong Delta region

This code fits and evaluates models for forecasting the incidence of dengue cases 2 months ahead at the district level in the Mekong Delta region of Vietnam. Three main types of models are fit:

Bayesian spatiotemporal models. These models use lags of meteorological variables (e.g., temperature, precipitation), sociodemographic variables (e.g., urbanization, income), and lags of observed incidence. They also include spatiotemporal random effects with different structures (e.g., uncorrelated AR(1) terms by district, spatial random effects, spatiotemporal random effects).
hhh4 models. These semi-mechanistic spatiotemporal models account for baseline variations in incidence, autoregression, and spatial spread. Each of these components can be modeled as a function of covariates.
An experimental approach that models the time series from each district independently, using the lagged cases from the district of interest and from all of the other districts. A large matrix of these lagged cases is created, and Y-aware principal components analysis is performed. The top N principal components, which together explain 85% of the variation, are used in the model along with harmonic terms and an AR(1) random intercept.
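
The Y-aware PCA step of the third approach can be illustrated with a minimal sketch in base R. This is not the repository's implementation: the data are simulated, the object names (lagged_cases, y_aware_scale, n_keep) are hypothetical, and the scaling rule (multiplying each centered predictor by its univariate regression slope on the outcome) is one common way to make PCA "Y-aware" and is assumed here.

```r
# Y-aware PCA sketch on simulated data (all names are illustrative assumptions).
set.seed(1)
n_months <- 120
n_districts <- 10

# Hypothetical matrix of lagged cases: one column per district.
lagged_cases <- matrix(
  rnorm(n_months * n_districts),
  nrow = n_months,
  dimnames = list(NULL, paste0("district_", seq_len(n_districts)))
)

# Simulated outcome for the district of interest; it depends on two columns.
y <- 0.8 * lagged_cases[, 1] - 0.5 * lagged_cases[, 2] + rnorm(n_months, sd = 0.5)

# Center each predictor and multiply it by its univariate regression slope
# on y, so that a unit change in any scaled column is a unit change in y.
y_aware_scale <- function(x, y) {
  scaled <- sapply(seq_len(ncol(x)), function(j) {
    slope <- unname(coef(lm(y ~ x[, j]))[2])
    slope * (x[, j] - mean(x[, j]))
  })
  colnames(scaled) <- colnames(x)
  scaled
}

x_scaled <- y_aware_scale(lagged_cases, y)

# Columns are already centered and scaled, so prcomp must not rescale them.
pca <- prcomp(x_scaled, center = FALSE, scale. = FALSE)

# Keep the smallest number of components reaching 85% of the variance.
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_keep <- which(var_explained >= 0.85)[1]
pc_scores <- pca$x[, seq_len(n_keep), drop = FALSE]
```

The columns of pc_scores would then enter the forecasting model as covariates, next to the harmonic terms and the AR(1) random intercept described above.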

Several variations of each of these models, with different covariates and random effects structures, are evaluated. Each model is evaluated using time series cross-validation, moving the end of the training period forward by 1 time unit at a time. Forecasting performance at 2 months is evaluated using the continuous ranked probability score (CRPS) and Brier scores. We create an ensemble based on the CRPS scores.
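
As a rough illustration of the evaluation scheme, the sketch below builds expanding-window folds with a 2-step horizon and computes a sample-based CRPS and a Brier score in base R. All function names are hypothetical, and the repository itself relies on scoringutils for scoring. The README does not state how the CRPS scores are turned into ensemble weights, so the inverse-CRPS weighting shown here is only an assumption.

```r
# Expanding-window folds: train on time points 1..end, then score the
# forecast made for time point end + horizon.
rolling_origin_folds <- function(n_time, first_train_end, horizon = 2) {
  stopifnot(first_train_end + horizon <= n_time)
  train_ends <- seq(first_train_end, n_time - horizon)
  lapply(train_ends, function(end) {
    list(train = seq_len(end), test = end + horizon)
  })
}

# Sample-based CRPS: E|X - y| - 0.5 * E|X - X'|, where X and X' are
# independent draws from the forecast distribution. Lower is better.
crps_sample <- function(draws, observed) {
  mean(abs(draws - observed)) - 0.5 * mean(abs(outer(draws, draws, "-")))
}

# Brier score for a binary event, e.g. incidence exceeding a threshold.
brier_score <- function(prob, outcome) mean((prob - outcome)^2)

# ASSUMPTION: ensemble weights proportional to 1 / mean CRPS per model.
inverse_crps_weights <- function(mean_crps) {
  w <- 1 / mean_crps
  w / sum(w)
}

folds <- rolling_origin_folds(n_time = 24, first_train_end = 12, horizon = 2)
length(folds)      # 11 folds: training periods end at time points 12 to 22
folds[[1]]$test    # 14

set.seed(42)
draws <- rpois(1000, lambda = 20)   # simulated posterior predictive draws
crps_sample(draws, observed = 18)

inverse_crps_weights(c(mod_a = 2, mod_b = 4))   # mod_a: 2/3, mod_b: 1/3
```

With this weighting, a model whose mean CRPS is half that of another receives twice the weight; other schemes, such as stacking, would be equally compatible with the description above.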

Getting started.

The code in this repository is intended to be run on an HPC.

In an interactive session, install the packages listed in R/load.R, particularly INLA and scoringutils. From the shell, run:

salloc

module load R/4.2.0-foss-2020b

R

remotes::install_version("INLA", version="23.04.24",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/testing"), dep=TRUE)

#### When prompted, choose option 3 so that no other packages are updated: > 3

library(INLA)

options(timeout=300)

inla.binary.install()

#select option 2 to work on the Yale HPC

2

library(INLA)

install.packages('scoringutils')

If there are other packages that need to be installed, install them here as well.

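If the .sh scripts are created or modified on a Windows machine, they will contain Windows (CRLF) line endings and will fail with an error when run on the cluster. To convert a script to Unix line endings, run the following in the terminal on the cluster: > dos2unix mod1.sh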
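
To check whether a script needs converting before submitting it, the raw bytes of the file can be inspected from R. The helper below is a small sketch, not part of the repository; has_crlf is a hypothetical name, and the demonstration uses temporary files rather than the real job scripts.

```r
# Return TRUE if the file contains at least one CRLF ("\r\n") line ending.
has_crlf <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  cr <- which(bytes == as.raw(0x0d))     # positions of carriage returns
  cr <- cr[cr < length(bytes)]           # a trailing CR has no next byte
  any(bytes[cr + 1] == as.raw(0x0a))     # CR immediately followed by LF
}

# Demonstration on temporary files with Unix and Windows line endings.
unix_file <- tempfile(fileext = ".sh")
dos_file  <- tempfile(fileext = ".sh")
writeBin(charToRaw("#!/bin/bash\necho hi\n"), unix_file)
writeBin(charToRaw("#!/bin/bash\r\necho hi\r\n"), dos_file)

has_crlf(unix_file)   # FALSE
has_crlf(dos_file)    # TRUE
```

Any script for which has_crlf() returns TRUE should be passed through dos2unix before being submitted.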

When running INLA on Linux, use the following. Without this, the model unpredictably fails:

library(INLA)

inla.setOption(mkl=TRUE)

IMPORTANT: set num.threads to the number of cores requested from the cluster (e.g., 8): inla(..., num.threads=8)
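
One way to keep num.threads in step with the job request is to read the core count from the environment instead of hard-coding it. The sketch below assumes the job was submitted with --cpus-per-task, in which case SLURM sets the SLURM_CPUS_PER_TASK variable; requested_cores is a hypothetical helper, not a function from this repository or from INLA.

```r
# Read the number of cores granted by SLURM, falling back to a default
# when the variable is unset or not a positive integer (e.g., on a laptop).
requested_cores <- function(default = 1L) {
  n <- suppressWarnings(
    as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = NA))
  )
  if (is.na(n) || n < 1L) default else n
}

n_threads <- requested_cores()

# The value would then be passed to the model call, for example:
# inla(..., num.threads = n_threads)
```

This avoids a mismatch between the cores requested in the .sh script and the threads INLA tries to use.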

Modifying the code

More Bayesian spatiotemporal models can be defined in 99_define_inla_spacetime_mods.R. More hhh4 models can be defined in fun_hhh4.R.
Specify which INLA spacetime models and hhh4 models you want to run in 01_call_inla_spacetime.R, 02_call_hhh4.R, and 03_call_lag_district_pca.R.
Modify the *.sh scripts to reflect the number of models and time points being tested:

#SBATCH --array=1-504 # With K models and J hold-out time points, this should be 1-J*K

N_models = __ #this should be the number of models that were specified in the call_*.R files
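
Each array task has to work out which model and which hold-out time point it is responsible for. The repository's own mapping is not shown in this README, so the following is only a sketch of one possible scheme in R; task_to_job is a hypothetical helper, and the values 7 and 72 are illustrative numbers chosen so that their product matches --array=1-504.

```r
# Map a 1-based SLURM array task ID onto a (model, hold-out time) pair.
# The model index cycles fastest; the time index advances every n_models tasks.
task_to_job <- function(task_id, n_models, n_times) {
  stopifnot(task_id >= 1, task_id <= n_models * n_times)
  list(
    model = (task_id - 1) %% n_models + 1,
    time  = (task_id - 1) %/% n_models + 1
  )
}

N_models <- 7    # illustrative value only (K)
N_times  <- 72   # illustrative value only (J); 7 * 72 = 504

# SLURM sets SLURM_ARRAY_TASK_ID for each element of the job array;
# default to task 1 when running outside the scheduler.
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))
job <- task_to_job(task_id, N_models, N_times)

task_to_job(1, N_models, N_times)     # model 1, time 1
task_to_job(8, N_models, N_times)     # model 1, time 2
task_to_job(504, N_models, N_times)   # model 7, time 72
```

If the array range and J*K disagree, some model and time point combinations are either never run or requested beyond the valid range, which is why both values must be updated together.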

Running the models

Each of the 3 model types needs to be submitted separately. At the command prompt, change to the directory containing the .sh files using > cd XX

> salloc

> sbatch 1_inlaspacetime.sh

> sbatch 1_hhh4.sh

> sbatch 1_lag_district_pca.sh

