This tutorial explains how to extract last N characters or digits of a variable in SAS, along with examples.

In MS Excel, it is easily possible with the RIGHT() function but there is no built-in function to do it in SAS. The RIGHT() function of SAS is used for something else i.e. it right aligns string or character value.

Suppose you have a product ID in which last 4 characters refers to a product category so you are asked to pull product category information.

data example; productID = "AB123ZX253"; run; data example; set example; referenceid =substr(productID,length(productID)-3,4); run;

The output is shown in the image below -

The **SUBSTR()** function returns sub-string from a character variable.

= SUBSTR(character-variable, beginning-position, number-of-characters-to-pull)

The **LENGTH()** function returns the length of a character variable. In this case, it is 10 characters long. The calculated SUBSTR() function would work like below -

= SUBSTR(productID, 10-3, 4) = SUBSTR(productID, 7, 4)

Suppose you have multiple product IDs in which some of them are missing.

data example2; input productID $10.; cards; AB123ZX253 AB123ZX254 AB123ZX255 ; run;

When you apply SUBSTR() function in case of missing cases, it returns a note in log '**Invalid second argument'.** See the image below

To workaround this problem, we can use **SUBSTRN() **which handles missing cases while extracting.

data example2; set example2; referenceid =substrn(productID,length(productID)-3,4); run;

Suppose product ID variable is numeric.

data example3; input productID; cards; 12345253 12354234 12354235 ; run;

The **SUBSTR() / SUBSTRN()** function works only for character variables. In order to make it run for numeric variables, we first need to convert numeric variables to character variables with **PUT() **function.

data example3; set example3; x = put(productID, 10.); referenceid = input(substrn(x,length(x)-3,4),8.); drop x; proc print noobs; run;

Since **SUBSTRN()** returns character variable, we need to convert it to numeric with **INPUT()** function.

data example3; set example3; referenceid = input(substrn(productID,INT(LOG10(productID))+1-3,4),8.); proc print noobs; run;

- LOG10(1021) returns 3.009
- INT(LOG10(1021)) makes it 3
- Adding 1 to step 2 makes it 4, which is the result of the number of digits in 1021

In this process, we first calculate the logarithm base 10 of the number 1021, which gives us a result of 3.009. Then, by using the INT function, we extract the integer part of the logarithm, resulting in 3. Finally, by adding 1 to the integer part, we determine that the number 1021 has 4 digits.

This was very useful. Thanks so much!

ReplyDeletebut ..when numeric variable when used in character function ..implicit conversion from numeric to character happen ....why convert explicitly with put statement.

ReplyDeleteHow can we find last position of two digit number

ReplyDeletecould anyone explain the meaning of -3.

ReplyDeleteSecond parameter in SUBSTR() refers to starting position. In the first example, it returns 7 after subtracting 3 from productID. End result would return 7th, 8th, 9th and 10th character. In total it returns last 4 characters.

Delete-3 becoz we want last 4 digit so when we are doing lenght(productID) it gives 10 and 10-3 gives 4 that's why there is -3.

DeleteIt gives the result from right to left

Deletewhat if you have different lengths of a variable? in that case , how will you extract last 4 characters

DeleteIf we want last 4 characters, then we would want the substr()'s second argument be the fourth character's position from right to left, which is the length of the string - 3; If we want the last 5, then it should be the length - 4.

DeleteCan I use substr (, -1,4) for getting last 4 characters?

ReplyDeleteNO

Deletewhat if the input variables has different length and we have to extract some characters from the string of different length

ReplyDeleteCan anyone answer how can I output the date in YYYY-MM-DD format if I have diferent tipe of dates?

ReplyDeleteExample

17.Jan/2013

13-11-14

3/12/16

January 16 2019

16 feb 2020

haha i think recently u attend interview for hcl screening round

Deletebrother u got answer or not

Deleteplease let us know

data have;

Deleteinput char_date : $20. fmt : $20.;

num_char=inputn(char_date,fmt);

format num_char date9.;

cards;

12-08-2015 DDMMYY12.

12/8/2016 MMDDYY12.

05/25/2015 MMDDYY12.

;

run;

proc print;run;

how do i get the position for a numeric value?

ReplyDeleteI HAVE NEED FRIST FOUR STRINGS OF FRIST SPACE

ReplyDeletedata ab;

ReplyDeleteinput var1 var2 var3 var4 var5 var6 var7 var8 var9 var10;

cards;

1 2 3 4 5 6 7 8 9 10

run;

how can we convert the above 10 numeric varianbles into character variables by using arrays?

data ink2;

ReplyDeleteinput joindate;

informat joindate :anydtdte.;

cards;

17.Jan/2013

13-11-14

3/12/16

january 16 2019

16feb2020

;

proc print;

run;

can anyone answer this query am facing error

data ink2;

Deleteinput joindate:anydtdte;

cards;

17.Jan/2013

13-11-14

3/12/16

january 16 2019

16feb2020

;

proc print;

run;

pradeep.malegave123@yahoo.com in this mail id how can i extract my last name. ie malegave. Please give solution in base sas.

ReplyDeletedata db;

Deletey='pradeep.malegave123@yahoo.com';

pos1=index(y,'.');

pos2=index(y,'@');

out=compress(substr(y,pos1+1,pos2-pos1),'','ak');

put out=;

drop pos1 pos2;

run;

data db;

Deletey='pradeep.malegave123@yahoo.com';

pos1=index(y,'.');

pos2=index(y,'@');

out=compress(substr(y,pos1+1,pos2-pos1),'','ak');

put out=;

drop pos1 pos2;

run;

data db;

Deletey='pradeep.malegave123@yahoo.com';

pos1=index(y,'.');

pos2=index(y,'@');

out=compress(substr(y,pos1+1,pos2-pos1),'','ak');

put out=;

drop pos1 pos2;

run;

Use of length function inside substr function was useful. Thanks. - Dr. Abhijeet Safai

ReplyDelete