Time Series Forecasting - ARIMA [Part 1]

Introduction:

Time Series : A time series is a sequence of values recorded over successive points in time, e.g. the daily BSE Sensex closing value, the weekly sales or the monthly profit of a company.

Typically, in a time series it is assumed that the value at any given point in time is a result of its historical values. This assumption is the basis of time series analysis. The ARIMA technique exploits autocorrelation (the correlation of an observation with its own lags) for forecasting.

Mathematically,

V_t = p(V_{t-n}) + e

That is, the value (V) at time t is a function p of the value n periods earlier, plus an error term (e). The value at time t can depend on a single lag or on several lags of different orders.

Example :
Suppose Mr. X started his job in 2010 with a starting salary of $5,000 per month. Every year he is appraised, and his salary reached $20,000 per month in 2014. His annual salary can be considered a time series, and it is clear that each year's salary is a function of the previous year's salary (here the function is the appraisal rating).

Components of a Time Series :

1. Trend
A series could be steadily increasing, steadily decreasing, or first increasing for a considerable time period and then decreasing. This trend is identified and then removed from the time series in the ARIMA forecasting process.

2. Seasonality
Repeating pattern with fixed period.
Example - Sales in festive seasons. In the US, sales of candies peak every October and sales of chocolates peak every December, because Halloween and Christmas fall in those months. The time series should be de-seasonalized in the ARIMA forecasting process.

3. Random Variation (Irregular Component / Residual)
This is the unexplained variation in the time series, which is totally random: erratic movements that are not predictable because they follow no pattern. It is also known as the residual.
Example - Earthquake
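
As a rough illustration of removing trend and seasonality, the sketch below applies differencing, which is one common way to do it. The series, the 12-month period and the helper name `difference` are all made up for this example, and the random component is left out so that the effect of each step is easy to see.

```python
# Hypothetical monthly series: linear trend plus a repeating 12-month pattern.
PERIOD = 12
SLOPE = 2.0
seasonal_pattern = [0, 1, 3, 6, 4, 2, 0, -1, -3, -6, -4, -2]

series = [100 + SLOPE * t + seasonal_pattern[t % PERIOD] for t in range(48)]


def difference(values, lag=1):
    """Return values[t] - values[t - lag] for every t where both exist."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


# Seasonal differencing (lag 12) cancels the repeating pattern.
deseasonalized = difference(series, lag=PERIOD)

# First differencing (lag 1) then removes what is left of the trend.
detrended = difference(deseasonalized, lag=1)

print(deseasonalized[:3])
print(detrended[:3])
```

With a real series the final result would not be flat; what remains after both steps is the irregular component.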


Terminologies related to Time Series

1. Stationary Series

A stationary series is one whose mean and variance are constant over time.
The series has to be stationary before building an ARIMA model. Most time series are non-stationary. If a series is non-stationary, we need to make it stationary with detrending, differencing etc.

Why Stationary?

To calculate the expected value, we generally take a mean across time intervals. The mean across many time intervals makes sense only when the expected value is the same across those time periods. If the mean and population variance can vary, there is no point estimating by taking an average across time.
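
A small sketch of this point, using simulated data: it compares the mean and variance of the first and second half of a series. The seed, the series length, the slope of 0.01 and the helper name `half_summary` are assumptions chosen only for illustration.

```python
import random
from statistics import mean, pvariance

rng = random.Random(42)  # fixed seed so the run is repeatable
N = 2000

# Stationary: independent draws around a fixed level.
stationary = [rng.gauss(0, 1) for _ in range(N)]
# Non-stationary: the same kind of noise around a rising trend line.
trending = [0.01 * t + rng.gauss(0, 1) for t in range(N)]


def half_summary(values):
    """Return (mean, variance) for the first and second half of the series."""
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


for name, values in [("stationary", stationary), ("trending", trending)]:
    (m1, v1), (m2, v2) = half_summary(values)
    print(f"{name}: mean {m1:.2f} -> {m2:.2f}, variance {v1:.2f} -> {v2:.2f}")
```

For the stationary series the two halves give similar means, so a single average is a sensible summary. For the trending series the mean shifts between halves, so an average over the whole period describes neither half well.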

2. White Noise

A white noise process is one with a constant mean of zero, a constant variance and no correlation between its values at different times. White noise series exhibit very erratic, jumpy, unpredictable behavior. Since the values are uncorrelated, previous values do not help us forecast future values.
White noise series themselves are quite uninteresting from a forecasting standpoint (they are not linearly forecastable).
Image Source : Forecasting: principles and practice Book
3. Autocorrelation

Autocorrelation refers to the correlation of a time series with its own past and future values. Autocorrelation is also sometimes called “lagged correlation” or “serial correlation”.
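
The sketch below computes a sample autocorrelation by hand, to show how a white noise series differs from a series that depends on its own past. The function name `autocorrelation`, the seed and the 0.8 persistence factor are assumptions made for this example.

```python
import random


def autocorrelation(values, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(values)
    avg = sum(values) / n
    denominator = sum((v - avg) ** 2 for v in values)
    numerator = sum(
        (values[t] - avg) * (values[t - lag] - avg) for t in range(lag, n)
    )
    return numerator / denominator


rng = random.Random(7)
white_noise = [rng.gauss(0, 1) for _ in range(2000)]

# A series that depends on its own past: each value keeps 80% of the last one.
persistent = [0.0]
for shock in white_noise[1:]:
    persistent.append(0.8 * persistent[-1] + shock)

# Columns: lag, autocorrelation of white noise, autocorrelation of persistent.
for lag in (1, 2, 3):
    print(lag, round(autocorrelation(white_noise, lag), 3),
          round(autocorrelation(persistent, lag), 3))
```

The white noise values should stay close to zero at every lag, while the persistent series should show a clearly positive autocorrelation that fades as the lag grows.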

4. Random Walk

A random walk is defined as a process where the current value of a variable is the past value plus an error term that is white noise (for example, a normal variable with zero mean and constant variance). Algebraically, a random walk is represented as follows:  y_t = y_{t−1} + e_t

The implication of a process of this type is that the best prediction of y for the next period is the current value; in other words, the process does not allow us to predict the change (y_t − y_{t−1}). That is, the change in y is completely random. A random walk process is non-stationary because its variance increases with t (and, if a drift term is added, its mean changes with t as well).
Time Series : Random Walk
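
To see the non-stationarity numerically, the sketch below simulates many random walks and measures how widely they are spread at two points in time. For a walk without a drift term, the variance grows roughly in proportion to t. The number of walks, the number of steps and the seed are arbitrary choices for this illustration.

```python
import random
from statistics import pvariance

rng = random.Random(123)
N_WALKS, N_STEPS = 500, 400


def random_walk(n_steps):
    """Cumulative sum of white-noise shocks, starting from zero."""
    path, level = [], 0.0
    for _ in range(n_steps):
        level += rng.gauss(0, 1)  # y_t = y_{t-1} + e_t
        path.append(level)
    return path


walks = [random_walk(N_STEPS) for _ in range(N_WALKS)]

# Spread of the walks at two points in time (index 99 is step 100, and so on).
var_100 = pvariance([w[99] for w in walks])
var_400 = pvariance([w[399] for w in walks])
print(round(var_100, 1), round(var_400, 1))
```

The second number should be several times larger than the first, which is what rules out a constant variance.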
ARIMA (Box-Jenkins Approach)

ARIMA stands for Auto-Regressive Integrated Moving Average. It is also known as the Box-Jenkins approach. It is one of the most popular techniques used for time series analysis and forecasting.

We will cover ARIMA in a series of posts, starting from the introduction and theory and ending with the process of running ARIMA in SAS.

Coming back to ARIMA: as its full form indicates, it involves two main components:
  1. Auto-regressive component
  2. Moving average component
The "Integrated" part of the name refers to the differencing applied to make the series stationary. We will first go through the two components one by one.

1. Auto-regressive Component

It implies a relationship between the value of a series at a point in time and its own previous values. Such a relationship can exist with any order of lag.

Lag -

A lag is simply the value at a previous point in time. It can have various orders, as shown in the table below. It hints toward a pointed relationship, i.e. a dependence on one specific earlier value.

Time Series : Lag
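
A lag table of this kind can be built in a few lines. The sketch below uses made-up monthly values and a helper named `lag`; both are assumptions for illustration only.

```python
def lag(values, k):
    """Shift a series down by k periods; the first k slots have no history."""
    return [None] * k + values[:len(values) - k]


sales = [120, 135, 128, 150, 162, 158]  # hypothetical monthly values

print("value  lag1  lag2")
for row in zip(sales, lag(sales, 1), lag(sales, 2)):
    print(row)
```

Each row pairs the current value with the value one period and two periods earlier, which is the input an auto-regressive model works with.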

2. Moving average component

It implies that the current deviation from the mean depends on previous deviations. Such a relationship can exist with any number of lags, and that number decides the order of the moving average.

Moving Average -

A moving average is the average of consecutive values over several time periods. It can have various orders, as shown in the table below. It hints toward a distributed relationship, since the moving average is itself derived from several lags.

Moving Average Explanation
A moving average is itself considered one of the most rudimentary methods of forecasting. If you drag the average formula in Excel further (beyond Dec-15), it gives you the forecast for the next month.
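
The same idea outside a spreadsheet, as a minimal sketch: the data, the window of 3 and the helper name `moving_average` are assumptions for this example.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


monthly_sales = [10, 12, 14, 13, 15, 17]  # hypothetical data
ma3 = moving_average(monthly_sales, 3)

# Naive forecast for the next month: the average of the last 3 observed months.
forecast = ma3[-1]
print(ma3)       # [12.0, 13.0, 14.0, 15.0]
print(forecast)  # 15.0
```

A larger window gives a smoother but slower-reacting forecast; a smaller window follows recent values more closely.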

The ARIMA technique uses both the auto-regressive (lag-based) and moving average components together to forecast a time series.
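
To make the combination concrete, here is a hedged sketch that simulates a series with one auto-regressive term and one moving average term, i.e. y_t = mean + phi*(y_{t-1} - mean) + e_t + theta*e_{t-1}. The function name `simulate_arma11` and the coefficient values phi = 0.6 and theta = 0.3 are assumptions picked for illustration; this generates data from such a process and does not fit a model.

```python
import random


def simulate_arma11(shocks, phi, theta, mean=0.0):
    """Generate y_t = mean + phi*(y_{t-1} - mean) + e_t + theta*e_{t-1}."""
    series, prev_dev, prev_shock = [], 0.0, 0.0
    for shock in shocks:
        # Auto-regressive part: phi * previous deviation from the mean.
        # Moving average part: theta * previous random shock.
        deviation = phi * prev_dev + shock + theta * prev_shock
        series.append(mean + deviation)
        prev_dev, prev_shock = deviation, shock
    return series


rng = random.Random(1)
shocks = [rng.gauss(0, 1) for _ in range(200)]
y = simulate_arma11(shocks, phi=0.6, theta=0.3, mean=50.0)
print([round(v, 2) for v in y[:5]])
```

Feeding in a single shock, for example `simulate_arma11([1.0, 0.0, 0.0], phi=0.5, theta=0.5)`, shows how one random disturbance carries forward through both terms before it fades.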

Next, we will jump directly to the ARIMA process in SAS.

Part 2 : Time Series Forecasting : ARIMA

About the Author -
This article was originally written by Rajat Agarwal; Deepanshu later gave the post its final touches. Rajat is an analytics professional with more than 8 years of work experience in diverse business domains. He has gained expert knowledge in Excel and SAS. He loves to create innovative and imaginative dashboards with Excel. He is the founder, lead author and editor at Ask Analytics.

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.

4 Responses to "Time Series Forecasting - ARIMA [Part 1]"

  1. HI Folk,
    thanks for providing us a rich article on Forecasting,
    Could you please elaborate or explain White Noise again,
    Definition above for White Noise is ONE WITH CONSTANT MEAN AND VARIATION, by this I am getting it that both mean and variance are constant.

    But when again in short definition for White noise has been explained in Random Walk column then things are quite different . It is mentioned that with zero mean and variance one.

    Could you be so kind to explain the thin line of difference between them.

    Replies
    1. A white noise series has a constant mean (of zero), a constant variance and no correlation. Hence, it is stationary. A random walk, by contrast, is non-stationary because its variance increases over time.

  2. I am not able to understand that how it can be stationary if it has sudden jumps or erratic changes

    Replies
    1. Hi Rishabh,

      I believe that for white noise, at any instant the probability associated with the occurrence of any particular value is 0. Further, each value is independent of the others. These justify the fact that overall, the mean value of white noise is zero and therefore a constant.

      For a mathematical explanation, the definition of a white series is that the covariance matrix should be an identity matrix (I).

      Let x be a random vector. Covariance matrix of a random vector is E{x*x'}. Mean of the random vector is m = E{x}. Let y be a white random
      vector with zero mean so that x = y + m.

      Now,
      E{x*x'} = E{(y+m)*(y'+m')} = E{y*y'}+E{y*m'}+E{m*y'}+E{m*m'} = I + E{y}*m' + m*E{y'} + m*m' = I + m*m'.
      Clearly, for x to be white series,
      (I + m*m') should be equal to I => random vector is not white if the mean is not zero.

      Hope this helps :)

