This tutorial explains how to extract last n characters or numbers of a variable in SAS. In this tutorial, we will cover several cases in which we pull last 4 character or numeric values from a column. In MS Excel, it is easily possible with RIGHT() function but there is no-inbuilt function to do it in SAS. The RIGHT() function of SAS is used for something else i.e. it right aligns string or character value.

Suppose you have a product ID in which last 4 characters refers to a product category so you are asked to pull product category information.

The output is shown in the image below -

Suppose you have multiple product IDs in which some of them are missing.

To workaround this problem, we can use

Suppose product ID variable is numeric.

Step 1 : LOG10(1021) returns 3.009

Step 2 : INT(LOG10(1021)) makes it 3

Step 3 : Adding 1 to step2 makes it 4, which is result of number of digits in 1021.

**Example 1**Suppose you have a product ID in which last 4 characters refers to a product category so you are asked to pull product category information.

data example;

productID = "AB123ZX253";

run;

data example;

set example;

referenceid =substr(productID,length(productID)-3,4);

run;

The output is shown in the image below -

Get last N Characters |

**Explanation**
The

**SUBSTR()**function returns sub-string from a character variable.= SUBSTR(character-variable, beginning-position, number-of-characters-to-pull)The

**LENGTH()**function returns the length of a character variable. In this case, it is 10 characters long. The calculated SUBSTR() function would work like below -= SUBSTR(productID, 10-3, 4)

= SUBSTR(productID, 7, 4)

**Example 2 : Handle missing while extracting**Suppose you have multiple product IDs in which some of them are missing.

data example2;When you apply SUBSTR() function in case of missing cases, it returns a note in log '

input productID $10.;

cards;

AB123ZX253

AB123ZX254

AB123ZX255

;

run;

**Invalid second argument'.**See the image belowSUBSTR : Missing Cases |

**SUBSTRN()**which handles missing cases while extracting.data example2;

set example2;

referenceid =substrn(productID,length(productID)-3,4);

run;

**Example 3 : Get last N digits from a numeric variable**Suppose product ID variable is numeric.

data example3;The

input productID;

cards;

12345253

12354234

12354235

;

run;

**SUBSTR() / SUBSTRN()**function works only for character variables. In order to make it run for numeric variables, we first need to convert numeric variables to character variables with**PUT()**function.data example3;Since

set example3;

x = put(productID, 10.);

referenceid = input(substrn(x,length(x)-3,4),8.);

drop x;

proc print noobs;

run;

**SUBSTRN()**returns character variable, we need to convert it to numeric with**INPUT()**function.**Another Method -**data example3;

set example3;

referenceid = input(substrn(productID,INT(LOG10(productID))+1-3,4),8.);

proc print noobs;

run;

**How it works -**Step 1 : LOG10(1021) returns 3.009

Step 2 : INT(LOG10(1021)) makes it 3

Step 3 : Adding 1 to step2 makes it 4, which is result of number of digits in 1021.

This was very useful. Thanks so much!

ReplyDelete