This tutorial demonstrates various ways to calculate lag and lead in SAS. In time series data, it is generally required to calculate lag and lead of one or more measured variables.
Time series or longitudinal data are considered one of the most challenging data manipulation tasks. This tutorial covers various data manipulation tricks to make it easy.
Looking back some number of periods or rows.
Lag of first order - Looking back 1 observation (last value). It is denoted by lag1_value in the image shown below.
Lag of second order - Looking back 2 observations. It is denoted by lag2_value in the image shown below.
Calculate Lag in SAS |
The program below creates a sample data for demonstration.
data example; input id value; cards; 1 25 1 30 1 35 1 40 2 25 2 30 2 37 ; run;
In SAS, the LAG function is used to compare the current value to its predecessors. If you want to calculate lag of second order, use LAG2 function. Similarly, you can use LAG3 function for measuring lag of third order.
data temp; set example; lag1_value = lag(value); lag2_value = lag2(value); run;
Suppose you are asked to calculate LAG of first order by a grouping variable. In this case, there is no SAS function directly available to accomplish this task. We need to apply some tricks to make it happen.
The program below uses BY statement and FIRST. to measure lag of first order by ID. First, we are calculating lag and then make the value missing against the first record in a group i.e. ID.
data temp; set example; by id; lag_value = lag(value); if first.id then lag_value = .; run;
Lag by Group |
The POINT= Option is used to point to a specified row number. The _N_ system variable is used to generate a serial number which are used as an index. The NOBSoption is used to calculate the number of rows in a variable.
data example; set example; lagid = lag(id); run; data lags; if _n_ <= k then do; set example point = _n_; if lagid = id then do; N = _N_ - 1; set example point = N; lag_value = value; end; end; else lag_value = .; set example nobs = k; by id; drop lagid; run;
In the program below, we are first creating a sequence of values and later we are joining two tables taking primary key as current value from the first table with predecessor value from the second table.
data example; set example; by id; if first.id then N = 1; else N + 1; run;
proc sql; select a.* , b.value as lag_value from example a left join example b on a.id = b.id and a.N = b.N + 1; quit;
PROC EXPAND is one of the most useful procedure of SAS ETS. It has multiple functions such as creating lag, lead and moving average variables by group(s), aggregation of previous k rows etc.
proc expand data= example out= t2; convert value = lag_value /transformout = (lag 1); by id; run;
It requires license of SAS ETS package.
Looking ahead some number of rows or periods.
Lead Value of First Order - Looking ahead one observation of a variable (next row).
Lead Value of Second Order -Looking ahead two observations of a variable (next to next row).
Lead Value |
In SAS, there is no direct function for calculating LEAD. I feel the same there should be a SAS function for it :-) It can be calculated with the following logic :
Create a sequence of numbers and then sort the sequence by descending order. Then we calculate lag of the variable for which we need to calculate lead. At last, we sort the data by sequence ID.
data temp; set example; x + 1; run; proc sort data = temp; by descending x; run; data temp2; set temp; lead_value = lag(value); run; proc sort data = temp2; by x; run;
The NOBS option counts the number of rows in a variable. The _N_+ 1 creates a sequence of numbers start from 2 to (number of records+ 1).The POINT= points to a row when the sequence of numbers are less than or equal to number of rows.
data temp; _N_+1; if _N_ <= k then do; set example point=_N_; lead_value = value; end; else lead_value = .; set example nobs=k; proc print noobs; run;
In this section, we will cover how to calculate lead by group.
In this case, we are applying the same trick that we have used above to calculate lead. The only difference is we need to incorporate grouping variable to calculate lead by a grouping variable (ID).
data temp; set example; by id; if first.id then x =1; else x + 1; run; proc sort data = temp; by id descending x; run; data temp2; set temp; by id descending x; l = lag(x); lead_val = lag(value); if first.id then do; l = .; lead_val = .; end; run; proc sort data = temp2; by id x; run;
Lead Variables |
In the program below, we are performing self joins on next value with the current value.
data example; set example; by id; if first.id then N = 1; else N + 1; run; proc sql; select a.* , b.value as lag_value from example a left join example b on a.id = b.id and a.N + 1 = b.N; quit;
In the program below, we have used the same POINT= and NOBStrick that we have used in the above article.
data example; set example; lagid = lag(id); run;
data leads; _n_ + 1; if _n_ <= n then do; set example point=_n_; if lagid = id then do; y = value; end; end; else y = .; set example nobs = n; by id; drop lagid; proc print noobs; run;
PROC EXPAND can be used to calculate leads by a grouping variable. It can also calculate moving averages.
proc expand data= example out= t2; convert value = lead_value /transformout = (lead 1); by id; run;
The above method works only if you have a license of SAS ETS package.
Share Share Tweet