This tutorial explains how to use the SUBSTR function in SAS, with examples.
The SUBSTR function in SAS is used to extract a specific part of a string.
Syntax of SUBSTR Function
Below is the syntax of the SUBSTR function in SAS.
SUBSTR(string, start, length)
string: The string from which you want to extract a substring.
start: The position at which the extraction starts. Positions are counted from 1.
length: The number of characters to extract from the string. This argument is optional. If you don't specify it, SAS extracts all characters from the starting position to the end of the string.
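As a quick illustration of the optional third argument, here is a minimal sketch. The variable names text, first5 and rest are made up for this example and are not part of the tutorial's dataset.

```sas
data _null_;
  text = 'Hello World';
  first5 = substr(text, 1, 5);   /* 5 characters starting at position 1 */
  rest   = substr(text, 7);      /* no length: position 7 to the end    */
  put first5= rest=;
run;
```

The log should show first5=Hello rest=World.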
Let's create a sample SAS dataset that will be used to demonstrate examples in this tutorial.
data readin;
  input name $20.;
  datalines;
John Simpson
Dane Stewart
Deepanshu Bhalla
Jonathan Lee
;
run;
How to extract First N Characters?
In the example below, we extract the first name from the full name by pulling the first 4 characters from the variable name.
data readin2;
  set readin;
  firstname = substr(name, 1, 4);
run;

proc print data=readin2;
run;
Incorrect Method: The first names of the last two individuals are longer than 4 characters, so the output truncates them to "Deep" and "Jona". Hence, this method gives incorrect output.
Correct Method: We can use the FIND function to find the position of the first space in the name. The "firstname" variable then holds the characters from the beginning of the name up to, but not including, that space. Combining SUBSTR with FIND makes the code dynamic, so it works for first names of any length.
data readin2;
  set readin;
  firstname = substr(name, 1, find(name, ' ') - 1);
run;

proc print data=readin2;
run;
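One edge case is worth guarding against. If a value contains no space at all (for example, a single word that fills the full length of the variable), FIND returns 0 and the length argument becomes -1, which SAS reports as an invalid argument in the log. Below is a minimal sketch of a guard, assuming you want to keep the whole value in that case. The variable pos and the dataset name readin3 are hypothetical names introduced only for this illustration.

```sas
data readin3;
  set readin;
  length firstname $20;
  pos = find(name, ' ');            /* 0 when no space is found          */
  if pos > 1 then firstname = substr(name, 1, pos - 1);
  else firstname = name;            /* assumption: keep the whole value  */
  drop pos;
run;
```

With the sample data every name contains a space, so the result should match the previous example.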
How to extract Last N Characters?
In this example, we show how to extract the last 3 characters using the SUBSTR function in SAS.
The LENGTH function returns the length of the value in the variable "name", excluding trailing blanks. We subtract 2 from it to get the starting position, so that SUBSTR fetches the last 3 characters of "name".
data readin2;
  set readin;
  last3 = substr(name, length(name) - 2, 3);
run;

proc print data=readin2;
run;
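The same idea generalizes to the last N characters: the starting position is length(name) - N + 1. The sketch below is only an illustration; the variables n and last_n are hypothetical names, and the IF condition is an added assumption that skips values shorter than N, since the starting position would otherwise fall below 1.

```sas
data _null_;
  name = 'Jonathan Lee';
  n = 3;                                   /* number of trailing characters wanted */
  if length(name) >= n then
    last_n = substr(name, length(name) - n + 1, n);
  put last_n=;
run;
```

The log should show last_n=Lee.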
To extract the last name from the variable name, you can use the code below.
data readin2;
  set readin;
  lastname1 = substr(name, 6);
  lastname2 = substr(name, find(name, ' ') + 1);
run;

proc print data=readin2;
run;
- The lastname1 variable is static and always extracts from the 6th character to the end, so it is correct only when the first name has exactly 4 characters.
- The lastname2 variable extracts the characters after the first space to the end of the name. As in the previous example, we use the FIND function to determine the position of the space.
Note: In the code above, we have not specified the third argument of the SUBSTR function. When it is omitted, SAS extracts everything from the starting position to the end of the string.
How to use SUBSTR in IF-ELSE?
Suppose you want to create a new variable based on the first letter of the variable "name". If the first letter is 'J', the new variable is set to 'pass'; otherwise it is set to 'fail'.
data readin2;
  set readin;
  if substr(name, 1, 1) = 'J' then newvar = 'pass';
  else newvar = 'fail';
run;

proc print data=readin2;
run;
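The example above works because 'pass' and 'fail' have the same length. SAS sets the length of a new character variable from its first assignment, so a longer value in the ELSE branch would be truncated. Here is a minimal sketch that declares the length up front. The dataset name readin3, the variable result and the value 'failed' are hypothetical and used only for illustration.

```sas
data readin3;
  set readin;
  length result $6;                  /* long enough for 'failed' */
  if substr(name, 1, 1) = 'J' then result = 'pass';
  else result = 'failed';
run;
```

Without the LENGTH statement, result would get a length of 4 from 'pass', and 'failed' would be stored as 'fail'.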
How to replace characters using SUBSTR?
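Suppose you have a phone number and you want to change its country code. When SUBSTR is used on the left side of an assignment, it replaces characters in the variable instead of extracting them. Here we replace the second and third characters with '87'.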
data _null_;
  phone = '(91) 9811-4343';
  substr(phone, 2, 2) = '87';
  put phone=;
run;
Result: phone=(87) 9811-4343
phone has been changed from (91) 9811-4343 to (87) 9811-4343.