Extract last 4 characters / digits of value in SAS

This tutorial explains how to extract last n characters or numbers of a variable in SAS. In this tutorial, we will cover several cases in which we pull last 4 character or numeric values from a column. In MS Excel, it is easily possible with RIGHT() function but there is no-inbuilt function to do it in SAS. The RIGHT() function of SAS is used for something else i.e. it right aligns string or character value.

Example 1

Suppose you have a product ID in which last 4 characters refers to a product category so you are asked to pull product category information.
data example;
productID = "AB123ZX253";
run;

data example;
set example;
referenceid = substr(productID,length(productID)-3,4);
run;

The output is shown in the image below -
Get last N Characters
Explanation

The SUBSTR() function returns sub-string from a character variable.
= SUBSTR(character-variable, beginning-position, number-of-characters-to-pull)
The LENGTH() function returns the length of a character variable. In this case, it is 10 characters long. The calculated SUBSTR() function would work like below -
= SUBSTR(productID, 10-3, 4)
= SUBSTR(productID, 7, 4)

Example 2 : Handle missing while extracting

Suppose you have multiple product IDs in which some of them are missing.
data example2;
input productID $10.;
cards;
AB123ZX253

AB123ZX254
AB123ZX255
;
run;
When you apply SUBSTR() function in case of missing cases, it returns a note in log 'Invalid second argument'. See the image below

SUBSTR : Missing Cases
To workaround this problem, we can use SUBSTRN() which handles missing cases while extracting.
data example2;
set example2;
referenceid = substrn(productID,length(productID)-3,4);
run;
Example 3 : Get last N digits from a numeric variable

Suppose product ID variable is numeric.
data example3;
input productID;
cards;
12345253

12354234
12354235
;
run;
The SUBSTR() / SUBSTRN() function works only for character variables. In order to make it run for numeric variables, we first need to convert numeric variables to character variables with PUT() function.
data example3;
set example3;
x = put(productID, 10.);
referenceid = input(substrn(x,length(x)-3,4),8.);
drop x;
proc print noobs;
run;
Since SUBSTRN() returns character variable, we need to convert it to numeric with INPUT() function.

Another Method -
data example3;
set example3;
referenceid = input(substrn(productID,INT(LOG10(productID))+1-3,4),8.);
proc print noobs;
run;
How it works -

Step 1 : LOG10(1021) returns 3.009
Step 2 : INT(LOG10(1021)) makes it 3
Step 3 : Adding 1 to step2 makes it 4, which is result of number of digits in 1021.

Best Online Course : Practical SAS Programming with 50+ Case Studies

- Explain Programming Concepts in Simple English
- Live Projects & Case Studies
- Job Placement Assistance
- Money Back Guarantee


SAS Tutorials : 100 Free SAS Tutorials

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.


While I love having friends who agree, I only learn from those who don't.

Let's Get Connected: Email | LinkedIn

Get Free Email Updates :
*Please confirm your email address by clicking on the link sent to your Email*

Related Posts:

1 Response to "Extract last 4 characters / digits of value in SAS"

  1. This was very useful. Thanks so much!

    ReplyDelete

Next → ← Prev