This tutorial explains how to extract last n characters or numbers of a variable in SAS. In this tutorial, we will cover several cases in which we pull last 4 character or numeric values from a column. In MS Excel, it is easily possible with RIGHT() function but there is no-inbuilt function to do it in SAS. The RIGHT() function of SAS is used for something else i.e. it right aligns string or character value.

Suppose you have a product ID in which last 4 characters refers to a product category so you are asked to pull product category information.

The output is shown in the image below -

Suppose you have multiple product IDs in which some of them are missing.

To workaround this problem, we can use

Suppose product ID variable is numeric.

Step 1 : LOG10(1021) returns 3.009

Step 2 : INT(LOG10(1021)) makes it 3

Step 3 : Adding 1 to step2 makes it 4, which is result of number of digits in 1021.

**Example 1**Suppose you have a product ID in which last 4 characters refers to a product category so you are asked to pull product category information.

data example;

productID = "AB123ZX253";

run;

data example;

set example;

referenceid =substr(productID,length(productID)-3,4);

run;

The output is shown in the image below -

Get last N Characters |

**Explanation**
The

**SUBSTR()**function returns sub-string from a character variable.= SUBSTR(character-variable, beginning-position, number-of-characters-to-pull)The

**LENGTH()**function returns the length of a character variable. In this case, it is 10 characters long. The calculated SUBSTR() function would work like below -= SUBSTR(productID, 10-3, 4)

= SUBSTR(productID, 7, 4)

**Example 2 : Handle missing while extracting**Suppose you have multiple product IDs in which some of them are missing.

data example2;When you apply SUBSTR() function in case of missing cases, it returns a note in log '

input productID $10.;

cards;

AB123ZX253

AB123ZX254

AB123ZX255

;

run;

**Invalid second argument'.**See the image belowSUBSTR : Missing Cases |

**SUBSTRN()**which handles missing cases while extracting.data example2;

set example2;

referenceid =substrn(productID,length(productID)-3,4);

run;

**Example 3 : Get last N digits from a numeric variable**Suppose product ID variable is numeric.

data example3;The

input productID;

cards;

12345253

12354234

12354235

;

run;

**SUBSTR() / SUBSTRN()**function works only for character variables. In order to make it run for numeric variables, we first need to convert numeric variables to character variables with**PUT()**function.data example3;Since

set example3;

x = put(productID, 10.);

referenceid = input(substrn(x,length(x)-3,4),8.);

drop x;

proc print noobs;

run;

**SUBSTRN()**returns character variable, we need to convert it to numeric with**INPUT()**function.**Another Method -**data example3;

set example3;

referenceid = input(substrn(productID,INT(LOG10(productID))+1-3,4),8.);

proc print noobs;

run;

**How it works -**Step 1 : LOG10(1021) returns 3.009

Step 2 : INT(LOG10(1021)) makes it 3

Step 3 : Adding 1 to step2 makes it 4, which is result of number of digits in 1021.

This was very useful. Thanks so much!

ReplyDeletebut ..when numeric variable when used in character function ..implicit conversion from numeric to character happen ....why convert explicitly with put statement.

ReplyDeleteHow can we find last position of two digit number

ReplyDeletecould anyone explain the meaning of -3.

ReplyDeleteSecond parameter in SUBSTR() refers to starting position. In the first example, it returns 7 after subtracting 3 from productID. End result would return 7th, 8th, 9th and 10th character. In total it returns last 4 characters.

Delete-3 becoz we want last 4 digit so when we are doing lenght(productID) it gives 10 and 10-3 gives 4 that's why there is -3.

DeleteIt gives the result from right to left

Deletewhat if you have different lengths of a variable? in that case , how will you extract last 4 characters

DeleteIf we want last 4 characters, then we would want the substr()'s second argument be the fourth character's position from right to left, which is the length of the string - 3; If we want the last 5, then it should be the length - 4.

DeleteCan I use substr (, -1,4) for getting last 4 characters?

ReplyDeleteNO

Deletewhat if the input variables has different length and we have to extract some characters from the string of different length

ReplyDelete