Time Series Forecasting - ARIMA [Part 3]

This is the final part of the Time Series Forecasting - ARIMA series. If you have not yet read the previous two articles in the series, please go through them first.


In the earlier parts we checked the series for volatility and stationarity, and transformed it to be non-volatile and stationary. We also divided the dataset into two parts: training and validation. Now we are ready to perform ARIMA modeling on the training dataset.

Next Step: Model Identification
The order of an ARIMA (autoregressive integrated moving-average) model is usually denoted ARIMA(p,d,q), which can be read as AR(p), I(d), MA(q):
  1. p = Order of autoregression. Individual values of the time series are described by a linear model based on preceding observations. For instance, the prediction equation could be x(t) = 3 x(t-1) - 4 x(t-2).
  2. d = Order of differencing (the number of times the data must be differenced to become stationary).
  3. q = Order of moving average (the number of lagged forecast errors in the prediction equation). Past forecasting errors are taken into account when estimating the next time series value. The difference between the estimated value of x(t) and the actually observed value x(t) is denoted ε(t). For instance, the prediction equation could be x(t) = 3 ε(t-1) - 4 ε(t-2).
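To make the roles of p and q concrete, here is a minimal Python sketch (not SAS) of a one-step prediction that combines an autoregressive part and a moving-average part. The function name `arma_one_step`, the coefficients, and the data are all made up for illustration, and the plus-sign convention for the MA terms follows the examples above rather than any particular software.

```python
def arma_one_step(history, errors, ar_coefs, ma_coefs, const=0.0):
    """Predict the next value from past observations and past forecast errors.

    history[-1] is the most recent observation x(t-1), history[-2] is x(t-2), ...
    errors[-1] is the most recent forecast error e(t-1), errors[-2] is e(t-2), ...
    len(ar_coefs) plays the role of p, len(ma_coefs) the role of q.
    """
    ar_part = sum(c * history[-(i + 1)] for i, c in enumerate(ar_coefs))
    ma_part = sum(c * errors[-(i + 1)] for i, c in enumerate(ma_coefs))
    return const + ar_part + ma_part


# Illustrative values only: p = 2 (two lagged observations), q = 1 (one lagged error)
prediction = arma_one_step(
    history=[1.0, 2.0],      # x(t-2) = 1.0, x(t-1) = 2.0
    errors=[0.1, -0.2],      # e(t-2) = 0.1, e(t-1) = -0.2
    ar_coefs=[0.5, -0.3],    # 0.5 * x(t-1) - 0.3 * x(t-2)
    ma_coefs=[0.4],          # 0.4 * e(t-1)
)
print(prediction)
```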

Many simple time series models are special cases of the ARIMA model:
  1. Simple Exponential Smoothing: ARIMA(0,1,1)
  2. Holt's Exponential Smoothing: ARIMA(0,2,2)
  3. White noise: ARIMA(0,0,0)
  4. Random walk: ARIMA(0,1,0) with no constant
  5. Random walk with drift: ARIMA(0,1,0) with a constant
  6. Autoregression: ARIMA(p,0,0)
  7. Moving average: ARIMA(0,0,q)
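The first item in the list can be checked numerically. The Python sketch below (not SAS) compares one-step-ahead forecasts from simple exponential smoothing with smoothing weight alpha against forecasts from an ARIMA(0,1,1) recursion written as x(t) = x(t-1) + e(t) - theta * e(t-1), using theta = 1 - alpha. The function names, the starting value (first forecast equal to the first observation), and the data are assumptions made for this illustration.

```python
def ses_forecasts(series, alpha):
    """One-step-ahead forecasts from simple exponential smoothing."""
    forecasts = [series[0]]  # assumed initialisation: first forecast = first value
    for x in series[:-1]:
        previous = forecasts[-1]
        forecasts.append(previous + alpha * (x - previous))
    return forecasts


def arima011_forecasts(series, theta):
    """One-step-ahead forecasts from x(t) = x(t-1) + e(t) - theta * e(t-1)."""
    forecasts = [series[0]]  # same assumed initialisation as above
    for x in series[:-1]:
        error = x - forecasts[-1]          # forecast error for the current point
        forecasts.append(x - theta * error)
    return forecasts


series = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0]  # illustrative values
alpha = 0.3
ses = ses_forecasts(series, alpha)
arima = arima011_forecasts(series, 1 - alpha)
print(all(abs(a - b) < 1e-9 for a, b in zip(ses, arima)))
```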

We can do the model identification in two ways:

1. Using ACF and PACF functions

2. Using the Minimum Information Criteria (MINIC) matrix (recommended)

Autocorrelation Function (ACF)

Autocorrelation is a correlation coefficient. However, instead of measuring the correlation between two different variables, it measures the correlation between values of the same variable at two different times, x(t) and x(t-h), where h is the lag. The ACF gives this correlation for each lag h.
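As a rough illustration of the definition, the following Python sketch (not SAS) computes the usual sample autocorrelation at a given lag. The function name `autocorrelation` and the toy series are assumptions for this example.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation between x(t) and x(t-lag)."""
    n = len(series)
    mean = sum(series) / n
    # Total variation around the mean (the lag-0 autocovariance times n)
    denominator = sum((x - mean) ** 2 for x in series)
    # Co-variation between each value and the value `lag` steps earlier
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# A toy series that flips sign every step is strongly negatively correlated at lag 1
toy = [1.0, -1.0] * 10
print(autocorrelation(toy, 0), autocorrelation(toy, 1))
```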

Partial Autocorrelation Function (PACF)

For a time series, the partial autocorrelation between x(t) and x(t-h) is defined as the conditional correlation between x(t) and x(t-h), conditional on x(t-h+1), ..., x(t-1), the set of observations that come between the time points t-h and t.
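One common way to obtain partial autocorrelations from ordinary autocorrelations is the Durbin-Levinson recursion. The Python sketch below (not SAS) is a minimal version of that recursion; the function name `pacf_from_acf` is hypothetical, and the input is assumed to be a list `rho` with `rho[0] = 1` and `rho[k]` the autocorrelation at lag k.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations for lags 1..len(rho)-1 via Durbin-Levinson."""
    max_lag = len(rho) - 1
    pacf = []
    prev = []  # prev[j-1] holds phi(k-1, j), the order-(k-1) AR coefficients
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            current = [phi_kk]
        else:
            numerator = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
            denominator = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
            phi_kk = numerator / denominator
            # Update the lower-order coefficients, then append the new one
            current = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# For an AR(1) process with coefficient 0.6 the theoretical ACF is 0.6 ** k,
# so the PACF should be 0.6 at lag 1 and (numerically) zero afterwards.
theoretical_acf = [0.6 ** k for k in range(5)]
print(pacf_from_acf(theoretical_acf))
```

This cut-off after lag p in the PACF is what makes it useful for choosing the autoregressive order.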

[Figure: ARIMA - ACF and PACF]


ARIMA Procedure
identify var=VariableY(PeriodsOfDifferencing);
estimate p=OrderOfAutoregression q=OrderOfMovingAverage;
where VariableY is modeled as ARIMA(p,d,q) with p = OrderOfAutoregression, d = the order of differencing (determined from PeriodsOfDifferencing), and q = OrderOfMovingAverage.

Using the p and q values identified this way, we run the ARIMA model:
PROC ARIMA DATA= Training ;
IDENTIFY VAR = Log_Air(1,12) ;
ESTIMATE P =1 Q =1 OUTSTAT= stats ;
Forecast lead=12 interval = month id = date
out = result;
RUN;
Quit;
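The differencing specification Log_Air(1,12) asks for a lag-1 difference followed by a lag-12 (seasonal) difference. The Python sketch below (not SAS) shows what that transformation does to a series; the helper names and the synthetic series are assumptions for illustration, not the airline data.

```python
def difference(series, lag):
    """Return x(t) - x(t-lag) for every t where both values exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def difference_1_12(series):
    """Lag-1 difference followed by a lag-12 difference."""
    return difference(difference(series, 1), 12)


# Synthetic monthly series: linear trend plus a repeating 12-month pattern
seasonal = [5, 3, 8, 6, 9, 12, 15, 14, 10, 7, 4, 6]
series = [2 * t + seasonal[t % 12] for t in range(36)]

stationary = difference_1_12(series)
print(len(stationary), stationary[:5])
```

For this synthetic series the trend and the seasonal pattern are both removed, and 13 observations are lost at the start (1 from the first difference, 12 from the second).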

We strongly suggest following the Minimum Information Criteria Matrix approach, though.

Minimum Information Criteria Matrix approach

A MINIC table is constructed from the values BIC(m,j), where m = pmin, ..., pmax and j = qmin, ..., qmax.
[Figure: ARIMA Orders]
We first run the following code to get the MINIC table:
PROC ARIMA DATA= Training;
IDENTIFY VAR = Log_Air(1,12) MINIC;
RUN;
Quit;
It produces the matrix shown below. Find the minimum value (the most negative entry) in the matrix.

[Figure: ARIMA - MINIC matrix]

MINIC suggests p = 3 and q = 0 here. We take the larger of the two, max(3,0) = 3, and then fit the ARIMA model for every combination of p = 0 to 3 and q = 0 to 3 (except 0,0).
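The selection logic can be sketched in a few lines of Python (not SAS). The table values below are made up and are NOT the article's MINIC output; the layout (rows as AR orders, columns as MA orders) and the function names are assumptions for this illustration.

```python
def minic_argmin(table):
    """Return (p, q) of the smallest entry; rows = AR order, columns = MA order."""
    best_value, best_p, best_q = None, None, None
    for p, row in enumerate(table):
        for q, value in enumerate(row):
            if best_value is None or value < best_value:
                best_value, best_p, best_q = value, p, q
    return best_p, best_q


def candidate_orders(p_min, q_min):
    """All (p, q) pairs from 0 up to max(p_min, q_min), excluding (0, 0)."""
    max_order = max(p_min, q_min)
    return [
        (p, q)
        for p in range(max_order + 1)
        for q in range(max_order + 1)
        if (p, q) != (0, 0)
    ]


# Hypothetical BIC values, chosen so that the minimum sits at p = 3, q = 0
example_table = [
    [-5.10, -5.32, -5.30, -5.28],
    [-5.35, -5.31, -5.29, -5.26],
    [-5.36, -5.30, -5.27, -5.24],
    [-5.41, -5.33, -5.25, -5.22],
]
best_p, best_q = minic_argmin(example_table)
orders = candidate_orders(best_p, best_q)
print(best_p, best_q, len(orders))
```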


%Macro top_models;

/* Fit one ARIMA model for every (p,q) pair from 0 to 3 */
%do p = 0 %to 3 ;
%do q = 0 %to 3 ;

PROC ARIMA DATA= Training ;
IDENTIFY VAR = Log_Air(1,12)  ;
ESTIMATE P = &p. Q =&q.  OUTSTAT= stats_&p._&q. ;
Forecast lead=12 interval = month id = date 
out = result_&p._&q.;
RUN;
Quit;

data stats_&p._&q.;
set   stats_&p._&q.;
p = &p.;
q = &q.;
Run;

data result_&p._&q.;
set   result_&p._&q.;
p = &p.;
q = &q.;
Run;

%end;
%end;

/* Stack the fit statistics of all models into one dataset */
Data final_stats ;
set %do p = 0 %to 3 ;
%do q = 0 %to 3 ;
stats_&p._&q.
%end;
%end;;
Run;
/* Stack the forecasts of all models into one dataset */
Data final_results ;
set %do p = 0 %to 3 ;
%do q = 0 %to 3 ;
result_&p._&q.
%end;
%end;;
Run;

%Mend;
%top_models

/* Then calculate the mean of AIC and SBC for each model */

proc sql;
create table final_stats_1  as select p,q, sum(_VALUE_)/2 as mean_aic_sbc from final_stats
where _STAT_ in ('AIC','SBC')
group by p,q
order by mean_aic_sbc;
quit;

Save the AIC and SBC values of all the iterations and choose the top 5-7 models with the lowest mean(AIC,SBC) values.
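The ranking step can be mirrored in Python (not SAS) as a quick sanity check of the logic. The rows below are made-up numbers in the shape (p, q, statistic name, value), and the function name `rank_by_mean_aic_sbc` is hypothetical.

```python
def rank_by_mean_aic_sbc(rows, top_n=5):
    """Average AIC and SBC per (p, q) and return the top_n lowest averages."""
    values = {}
    for p, q, stat, value in rows:
        if stat in ("AIC", "SBC"):          # ignore every other statistic
            values.setdefault((p, q), []).append(value)
    ranked = sorted((sum(v) / len(v), order) for order, v in values.items())
    return [(order, mean) for mean, order in ranked[:top_n]]


# Hypothetical fit statistics, not taken from the article's models
rows = [
    (0, 1, "AIC", -440.0), (0, 1, "SBC", -434.0),
    (1, 0, "AIC", -445.0), (1, 0, "SBC", -439.0),
    (1, 1, "AIC", -443.0), (1, 1, "SBC", -435.0),
    (1, 1, "SSE", 0.2),
]
top_two = rank_by_mean_aic_sbc(rows, top_n=2)
print(top_two)
```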

For all the models selected by their AIC and SBC average, we now calculate MAPE on the validation data: the 12-month forecasts of each selected (p,q) model are compared against the actual values in the validation set.

Mean Absolute Percentage Error (MAPE) for each model:

MAPE = Mean( Abs(Actual - Predicted) / Actual ) * 100

Use  the following code to calculate MAPE :

Proc SQL;
create table final_results_1 as select a.p, a.q, a.date,a.forecast, b.log_air
from final_results as a join validation as b
on a.date = b.date;
quit;

Data Mape;
set final_results_1 ;
/* Absolute percentage error for each forecast date */
Ind_Mape = abs(log_air - forecast) / log_air * 100;
Run;


Proc Sql;
create table mape as select p, q, mean(ind_mape) as mape from mape
group by p, q
order by mape ;
quit;
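As a small cross-check of the MAPE arithmetic, here is the same calculation in Python (not SAS) on made-up numbers. The function name `mape` and the sample values are assumptions for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical actuals and forecasts: each forecast is off by 10% of the actual
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted))
```

Note that the article's SAS code computes this on the log-transformed series (log_air), so the percentages refer to the log scale.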

Results:
[Figure: ARIMA - MAPE]
The model with the lowest MAPE is your final model, which in this case is p = 0, q = 3.

5 Responses to "Time Series Forecasting - ARIMA [Part 3]"

  1. Hi,

    Thank you for the detailed explanation of the Time Series Forecasting model, it's really helpful.

    Could you please also elaborate, after selecting the model with the least MAPE, how would we predict the value for the next time period i.e. Jan 61.

  2. Instead of using Forecast lead=12, you can use a higher number in the lead option to forecast further.

  3. Hi,

    Thank you for explaining ARIMA so elaborately.

    I have a doubt after reading this article. How can we divide the dataset into training and validation data? While doing so, don't we lose the sequence of the time series?

    Replies
    1. You do not lose the sequence of the data. The first 70-80% of the dataset works as the training set and the rest is used for validation.

