Using SARIMAX for Time Series Forecasting on Seasonal Data Influenced by Exogenous Variables
Data Provided: Traffic data (see train.csv for details)
Data description
| Column | Description |
| --- | --- |
| date_time | Date and hour at which the data was collected, in local IST time |
| is_holiday | Categorical; Indian national holidays combined with regional holidays |
| air_pollution_index | Air Quality Index (10-300) |
| humidity | Numeric humidity (unit given as Celsius in the dataset description) |
| wind_speed | Numeric wind speed in miles per hour |
| wind_direction | Cardinal wind direction (0-360 degrees) |
| visibility_in_miles | Visibility distance in miles |
| dew_point | Numeric dew point in Celsius |
| temperature | Numeric average temperature in Kelvin |
| rain_p_h | Numeric amount of rain, in mm, that fell in the hour |
| snow_p_h | Numeric amount of snow, in mm, that fell in the hour |
| clouds_all | Numeric percentage of cloud cover |
| weather_type | Categorical; short textual description of the current weather |
| weather_description | Categorical; longer textual description of the current weather |
| traffic_volume | Numeric hourly traffic volume bound in a specific direction |

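
As a rough illustration of how these columns could be split into a target series and exogenous regressors, the sketch below parses a few rows using only Python's standard library. The sample rows are made-up placeholders, not values from train.csv. The timestamp format and the use of the string "None" to mark non-holidays are assumptions, and only a subset of the columns is shown.

```python
import csv
import io
from datetime import datetime

# Made-up placeholder rows in the column layout described above.
# These are NOT real values from train.csv; only a few columns are shown.
SAMPLE_CSV = """date_time,is_holiday,temperature,rain_p_h,clouds_all,traffic_volume
2020-01-01 00:00:00,New Year,280.5,0.0,40,1200
2020-01-01 01:00:00,None,279.9,0.2,75,950
2020-01-01 02:00:00,None,279.1,0.0,90,700
"""

# Assumed timestamp format; adjust to match the actual file.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Numeric columns used as exogenous regressors in this sketch.
EXOG_COLUMNS = ["temperature", "rain_p_h", "clouds_all"]


def split_row(row):
    """Turn one CSV row into (timestamp, target, exogenous feature list)."""
    timestamp = datetime.strptime(row["date_time"], TIME_FORMAT)
    target = float(row["traffic_volume"])
    # Assumption: non-holiday rows carry the literal string "None".
    holiday_flag = 0.0 if row["is_holiday"] == "None" else 1.0
    features = [holiday_flag] + [float(row[name]) for name in EXOG_COLUMNS]
    return timestamp, target, features


def load_series(text):
    """Parse CSV text into parallel lists, sorted by timestamp."""
    rows = sorted(
        (split_row(r) for r in csv.DictReader(io.StringIO(text))),
        key=lambda item: item[0],
    )
    timestamps = [r[0] for r in rows]
    endog = [r[1] for r in rows]  # series to forecast (traffic_volume)
    exog = [r[2] for r in rows]   # regressors aligned with endog
    return timestamps, endog, exog


timestamps, endog, exog = load_series(SAMPLE_CSV)
print(endog)
print(exog[0])
```

Keeping the target and the regressors aligned by timestamp matters here, because the model pairs each traffic_volume observation with the exogenous values recorded for the same hour.
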
The traffic_volume attribute has to be forecast from the time series data provided, taking the exogenous variables into account.
Approach used: SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous variables)
Reason: The data provided is a seasonal time series, and several exogenous variables influence the value being forecast. SARIMAX handles both properties in a single statistical model, which makes it a suitable choice for this task.
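
For reference, one common way to write such a model is as a regression with SARIMA errors. The symbols below are introduced here for illustration and do not come from the project itself: y_t is traffic_volume at hour t, x_t is the vector of exogenous variables, B is the backshift operator, s is the seasonal period (24 would be a natural assumption for a daily cycle in hourly data), and ε_t is white noise.

```latex
\begin{aligned}
y_t &= \beta^{\top} x_t + u_t, \\
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,u_t
  &= \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t
\end{aligned}
```

Here φ_p and Φ_P are the non-seasonal and seasonal autoregressive polynomials, θ_q and Θ_Q are the corresponding moving-average polynomials, and d and D are the orders of ordinary and seasonal differencing.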
Main Modules Used:
statsmodels package in Python