This tutorial covers the most frequently used SAS character functions with examples. Dealing with character strings can be a little tricky compared to numeric values. Therefore, it is necessary to understand the practical usage of character functions.
The COMPBL function compresses multiple blanks to a single blank.
In the example below, the Name variable contains a record "Sandy David". It has multiple spaces between the first and last name.
Data char; Input Name $ 1-50 ; Cards; Sandy David Annie Watson Hello ladies and gentlemen Hi, I am good ; Run;
Data char1; Set char; char1 = compbl(Name); run;
The STRIP function removes leading and trailing spaces.
Data char1; Set char; char1 = strip(Name); run;
The COMPRESS function removes leading, between and trailing spaces.
SYNTAXCOMPRESS(String, characters to be removed, Modifier)
Data char1; Set char; char1 = compress(Name); run;
data _null_; x='ABCDEF-!1.234'; string=compress(x,'!4'); put string=; run;It returns ABCDEF-1.23. In SAS 9.1.3, the additional parameter called MODIFIER was added to the function.
- a – Remove all upper and lower case characters from String.
- ak - Keep only alphabets from String.
- kd - Keeps only numeric values
- d – Remove numerical values from String.
- i – Remove specified characters both upper and lower case from String.
- k – keeps the specified characters in the string instead of removing them.
- l – Remove lowercase characters from String.
- p – Remove Punctuation characters from String.
- s – Remove spaces from String. This is default.
- u – Remove uppercase characters from String.
data _null_; x='ABCDEF-!1234'; string=compress(x,'','ak'); put string=; run;It returns ABCDEF
data _null_; x='ABCDEF-!1234'; string=compress(x,'','kd'); put string=; run;It returns 1234
data _null_; x='ABCDEF-!1234'; string=compress(x,'','p'); put string=; run;It returns ABCDEF1234
data _null_; x='ABCDEF-!1.234'; string=compress(x,'0123456789.','k'); put string=; run;It returns 1.234
The LEFT function moves leading blanks to the end of the value. The length of the string does not change.
Data char1; Set char; char1 = left(Name); run;
The TRIM function removes trailing spaces.
Data char1; Set char; char1 = trim(Name); run;
It is equivalent to STRIP function. It first removes leading spaces and then trailing spaces.
The CAT function concatenates character strings. It is equivalent to || sign.
data _null_; a = 'abc'; b = 'xyz'; c= a || b; d= cat(a,b); put c= d =; run;
Both c and d returns "abcxyz".
data _null_; x = "Temp"; y = 22; z = x||y;z1 = cats(x,y);z2 = catx("",x,y); put z = z1= z2 =; run;z = Temp 22
z1=Temp22
z2=Temp 22
- The || keyword inserts multiple spaces when numeric and text values are concatenated.
- CATS strips both leading and trailing blanks, and does not insert separators.
- CATX strips both leading and trailing blanks, and inserts separators. The first argument to CATX specifies the separator.
The SCAN Function extracts words within a value that is marked by delimiters.
SCAN( text, nth word, <delimiters>)For example :
We wish to extract first word in a sentence 'Hi, How are you doing?'. In this case, delimiter is a blank.
data _null_; string='Hi, How are you doing?'; first_word=scan(string, 1, ' ' ); put first_word =; run;first_word returns 'Hi,' since it is the first word in the above sentence using blank as a delimiter. We wish to extract last word in a sentence 'Hi, How are you doing?'. In this case, delimiter is a blank.
data _null_; string='Hi, How are you doing?'; last_word=scan(string, -1, ' ' ); put last_word =; run;
last_word returns 'doing?' since it is the last word in the above sentence.
Let's make it a little complicated.
Suppose, delimiter is a character instead of blank or special sign.
string='Hello SAS community people'; beginning= scan( string, 1, 'S' ); ** returns "Hello "; middle = scan( string, 2, 'S' ); ** returns "A"; end= scan( string, 3, 'S' ); **returns " community people";
The SUBSTR function extracts strings based on character position and length. It is equivalent to MS Excel's MID Function.
= substr(old_var, starting_position, number of characters to keep);
Examples :
data _null_; t="AFHood Analytics Group"; new_var=substr(t,8,9); put new_var =; run;
Result: new_var=Analytics
It replaces a portion of string with new string
data _null_; string='old_variable'; substr(string,1,8) = "new_data"; put string =; run;
Result: string=new_dataable
In this case, SUBSTR replaces the first 8 characters of string with the value "new_data". After this line executes, the value of string will be 'new_dataable'
LOWCASE converts the character string to lowercase.
UPCASE converts the character string to uppercase.
PROPCASE returns the word having uppercase in the first letter and lowercase in the rest of the letter (sentence format).
data _null_; name = 'Hello world'; name_upper = upcase(name); name_lower = lowcase(name); name_proper = propcase(name); put name_upper=; put name_lower=; put name_proper=; run;Output: name_upper=HELLO WORLD name_lower=hello world name_proper=Hello World
The INDEX function finds characters or words in a character variable
data _null_; string='Hi,How are you doing?'; x = index(string, "How"); put x=; run;
x returns 4 as "How" starts from 4th character.
To select all the records having 'ian' in their character.
if index(name,'ian') > 0;
To select all the records having first letter 'H'
if name =: 'H';
The FIND function locates a substring within a string
FIND(character-value, find-string <,'modifiers'> <,start>)
STRING1 = "Hello hello goodbye"
Examples :
1. FIND(STRING1, "hello") returns 7
2. FIND("abcxyzabc","abc",4) 7
The TRANWRD function replaces all occurrences of a word in a character string. It doesn't replace full phrase (entire value content).
TRANWRD (variable name, find what , replace with)
Example : name : Mrs. Joan Smith
name=tranwrd(name, "Mrs.", "Ms.");
Result : Ms. Joan Smith
The TRANSLATE function replaces specific characters in a character expression.
TRANSLATE(source, replace with, find what)
Example : x = translate('XYZW','AB','VW');
Result : "XYZB"
The TRANSLATE function converts every occurrence of a user-supplied character to another character. TRANSLATE can scan for more than one character in a single call. In doing this, however, TRANSLATE searches for every occurrence of any of the individual characters within a string. That is, if any letter (or character) in the target string is found in the source string, it is replaced with the corresponding letter (or character) in the replacement string.
The TRANWRD function differs from TRANSLATE in that it scans for words (or patterns of characters) and replaces those words with a second word (or pattern of characters).
The PRXMATCH function can be used for the following cases :
- When you want to identify if there is alphanumeric (has any letter from A to Z) in a variable.
- If you need to search a character variable for multiple different substrings.
PRXMATCH (perl-regular-expression, source);Perl Regular Expression
- ^ - start with
- $ - end with
- \D - any non digits
- \d - digits
- ? - may or may not have?
- | - or
- * - repeating
- ( i:) - turns ON the case insensitive search
- (-i:) - turn OFF the case insensitive search
DATA test; INPUT string $ 1-8; prxmatch=prxmatch("/[a-zA-Z]/",string); CARDS; ACBED 11 12 zx 11 2c abc123 ; run;
Note : prxmatch("/[a-zA-Z]/",string) checks first character.
2. Replace multiple words with a new word
if prxmatch('/Luthir|Luthr|Luther/',name) then name='Luthra' ;
The INPUT Function is used to convert character variable to numeric.
new_num=input(character-variable, 4.);Example -
data temp; x = '12345'; new_x = input(x,5.); run;
In the above example, the variable x is a character variable as it is defined in quotes '12345'. The newly created variable new_x is in numeric format.
The PUT Function is used to convert numeric variable to character.
new_char=put(numeric,4.0);
data temp; x = 12345; new_x = put(x,5.); run;
In this example, the variable x is originally in numeric format and later converted to character in the new variable new_x.
The LENGTH function returns length of a string.
data _null_; x='ABCDEF-!1.234'; n= length(x); put n=; run;
It returns 13.
If you need to calculate the number of digits in a numeric variable -
First, we need to convert our numeric variable to character to count the number of digits as LENGTH function works only for character variable.
data _null_; x = 12345; cnt = length(strip(put(x,12.))); put cnt=; run;
In the above nested function, we first converted the variable x to character and then remove spaces by using STRIP function and then count number of digits by using LENGTH() function.
Another Method -data _null_; x = 12345; cnt = int(log10(x)) + 1; put cnt=; run;
We can also use LOG10 function to solve it. LOG10 has a property which says :
Number of Digits = Integer value of [LOG10(x)] + 1. For example, LOG10(100) = 2 so Number of digits in 100 = 2 +1. See LOG10(1100) = 3.04 => INT(3.04) = 3 => 3+1 = Number of digits in 1100.
IF THEN replaces the entire phrase in a string.
data mydata; input names $30.; cards; Raj Gates Allen Lee Dave Sandy William Gates Jon Jedi ; run; data mydata2; set mydata; length new_names $30.; if find(names, "Raj") then new_names = "Raj Kumar"; else new_names = names; run;
In the above SAS program we are checking if the string "Raj" is found within the variable "names". The FIND function is used to perform this search. If the string "Raj" is found, the if block is executed, and the value "Raj Kumar" is assigned to the "new_names" variable in the mydata2 dataset. If the string "Raj" is not found in the names variable, the else block is executed, and the value of the "names" variable is assigned to the "new_names" variable.
The COUNT function counts the number of times that a specified substring appears within a character string.
data _null_; name = "DeepAnshu Bhalla"; x = count(name,"a"); x1 = count(name,"a","i"); put x= x1=; run;
Result : x=2 as there are 2 lower case 'a's in the variable name. x1=3 as there are 3 'a's in the variable name (The 'i' modifier ignores case sensitivity)
The COUNTW function counts the number of words in a character string.
data readin; input name$15.; cards; Trait Jhonson 3+3=6 ; run; data out; set readin; x = countw(name); x1 = countw(name,' '); proc print; run;
Output : COUNTW Function |
If you don't specify delimiter in the second parameter of COUNTW function, the function will use the default delimiter, which is a blank space.
Hi ,I am unable to understand this concept.
ReplyDeleteThere is no Delimiter specified other than blank in the specified string .
Question is why we are using 'S' in the delimiter field.
===============================================================
Let's make it a little complicated.
Suppose, delimiter is a character instead of blank or special sign.
string='Hello SAS community people';
beginning= scan( string, 1, 'S' ); ** returns "Hello ";
middle = scan( string, 2, 'S' ); ** returns "A";
end= scan( string, 3, 'S' ); **returns " community people";
We can use any character as a delimeter.
DeleteIf we specify 'S' as a delimeter, SAS would read it like -
First Word - Hello Second Word - A Third Word - community people
third word is "community" only correct ?
Deletehere "S" acting as a delimiter so the third word will whole "community people".
Deleteif we give "S" as delimiter how it will take third word as community people can you explain in brief
Deletebecause you are considering blank also....whereas the delimiter here is 'S' and not space. So it will consider 'community people' as one word not two.
DeleteHere "S" acting as delimiter
DeleteSo obviously it will be
Hello "S" A "S" community people
Can we give any other delimiter other than "s"
DeleteIt is so nice to read.. keep posting
ReplyDeleteThank you for your appreciation!
DeleteDeepanshu, I simply love reading your posts, in fact I just passed Base SAS Certification, and I would give a great deal of credit to you for that. Extremely grateful for providing us such a useful website. Have a small request, is it possible for you to provide PDF format for SAS interview questions ? Regards,
ReplyDeleteGlad you found posts useful and congratulations for clearing Base SAS Certification. Cheers!
DeleteIN SAS University Edition the space between 2 words is always single space. thus there is no point of using COMPBL.
ReplyDeleteIn real-world datasets, we generally encounter a case wherein we need to remove extra space within the text. In corporate, you would not use SAS University Edition. You would rather use enterprise version of SAS or SAS EG to perform data cleaning.
DeleteHi Deepanshu!
ReplyDeleteI have collected data on change in educational practices using a questionaire and wish to do the analysis. What all types of analysis can be done using SAS?
SAS supports many survey analysis such as Crosstabs, Linear Regression, Logistic Regression, Cox-Regression Analysis for survey data. You can explore PROC SURVEYLOGISTIC, PROC SURVEYREG, PROC SURVEYPHREG, PROC SURVEYSELECT etc
DeleteThanks Man!
DeleteHi Deepanshu!
ReplyDeleteNeed a favor from you. Can you suggest some topics for last year project that uses SAS Analytics. Thnaks in advance!
Hi
ReplyDeleteWhat is the difference between strip and compress function please ?
regards
STRIP removes leading and trailing spaces. Whereas, COMPRESS removes leading, within text spaces and trailing spaces. Check the article above for examples.
DeleteHi Deepanshu.
ReplyDeleteneed a favour could you also place the pharma data and programs(TLF's) which will help full for pharma ppl also.
Hi Deepanshu,
ReplyDeleteVery nice post. I am stuck with a problem. I have a file with a column that contains the following types of strings:
Word1 Word2 ABCD XYZ. or
Word1 ABC XYZ. or
Word1 AB XYZ.
What I need to do is to separate strings ABCD XYZ, ABC XYZ and
AB XYZ and put them in separate column. So my final result would be 2 columns:
COLUMN1| COLUMN2
Word1 Word2 | ABCD XYZ
Word1 |ABC XYZ
Word1 |AB XYZ
I tried to combine reverse and substr functions to separate strings into two columns but was not able to do because length of the string ABCD XYZ is 8, length of ABC XYZ is 7 and of AB XYZ is 6.
Would you be able to tell me how would you separate these strings into two columns?
Thanks
try this :
Deletee=substr(a,length(a)-4,5);
change the digits n variable name as per your requirement
can you explain in detail .
Delete
DeleteDATA X;
STR="ABCDXYZ";
RUN;
DATA ABC;
SET X;
COLUMN2= SUBSTR(STR,1,2)||SUBSTR(STR,5,3);
COLUMN1= SUBSTR(STR,1,7);
WORD1= SUBSTR(STR,1,4);
WORD2= SUBSTR(STR,5,3);
RUN;
PROC TRANSPOSE DATA = ABC OUT=KK;
VAR COLUMN1 COLUMN2 WORD1 WORD2;
QUIT;
Hi Deepanshu, Many thanks for this info. It is simple and easy to understand.
ReplyDeleteI have learned base SAS. Can you please guide me how to prepare and clear base SAS exam.
Your suggestions are utmost important to me.
Thanks
Naresh
Thank you for the great post. The examples were really helpful in clarifying. I have my base sas cert and looking for work, any suggestions?
ReplyDeletethanks
Charles
data t;
ReplyDeleteinput a $200.;
cards;
red blood cells(rbc)
white blood cells(wbc)
;
run;
I need output as
Red Blood Cells(RBC)
White Blood Cells(WBC)
How to write the code for this output?
Thanks in Advance
data p;
Deleteset t;
b=scan(a,1,'');
c=scan(a,2,'');
d=scan(a,3,'( ') ;
e=substr(a,length(a)-4,5);
f=propcase(b);
g=propcase(c);
h=propcase(d);
i=upcase(e);
j=compbl(f||(g||h));
k=cats(j,i);
drop b c d e f g h i j;
run;
DATA D;
DeleteSET T;
NE=PROPCASE(scan(a,1));
d=propcase(scan(a,2));
e=propcase(scan(a,3));
cb=substr(a,length(a)-4,5);
cv=upcase(cb);
full=catx(" ",ne,d,e,cv);
keep full;
run;
data t2;
Deleteset t;
a=propcase(a);
a=tranwrd(a,'bc)','BC)');
run;
DATA W;
DeleteSET t;
x=propcase(scan(a,1,'('));
y=upcase(substr(a,index(a,'('),5));
z=cats(x,y);
drop x y;
run;
data p;
ReplyDeleteset t;
b=scan(a,1,'');
c=scan(a,2,'');
d=scan(a,3,'( ') ;
e=substr(a,length(a)-4,5);
f=propcase(b);
g=propcase(c);
h=propcase(d);
i=upcase(e);
j=compbl(f||(g||h));
k=cats(j,i);
drop b c d e f g h i j;
run;
data asd;
ReplyDeleteset t;
a=propcase(scan(a,1,''));
b=propcase(scan(a,2,''));
c=scan(a,3,'');
e=find(c,'(');
f=propcase(substr(c,1,e-1))||upcase(substr(c,e,length(c)-1));
n=catx("",a,b);
m=catx("",n,f);
keep a m;
run;
Hi All,
ReplyDeleteAny idea on : "The || keyword inserts multiple spaces when numeric and text values are concatenated." How many spaces are being inserted in between Temp and 22 in this example in case of cat()?
Also, can someone please share the default size of string value and numeric value in sas?
data _null_;
ReplyDeletestring='Hi, How are you doing?';
first_word=scan(string, 1, ' ' );
put first_word =;
run;
Should return "Hi," instead of Hi.
Thank you.
Hi, please check the below code for "Hi," result.
Deletedata _null;
string='Hi, How are you doing?';
first_word=quote(scan(string, 1,' '));
*quote is used to give quotes for varaible;
put first_word =;
run;
data names;
ReplyDeleteinfile datalines;
input name $;
datalines;
arun krishna
gopal rao
venu vardhan reddy
rames krishna rao sunkara
kiran
;
run;
I want to 3 more variables like
first name
middle name
sir name
if the variable does not have the middle name or last name then it should be blank please can you tell me the how to solve that particular problem
data na (keep=name firstname middlename lastname);
Deleteset names ;
firstname=scan(name,1);
y=scan(name,2);
z=scan(name,3);
r=scan(name,4);
q=catx(' ',y,z,r);
lastname=scan(q,-1);
s1=scan(q,-2);
s2=scan(q,-3);
middlename=catx(' ',s1,s2);
run;
Thanks for excellent materials .
ReplyDeletequestion 1.Left function will not remove leading spaces.it will move the content to left and spaces will be moved to right.but your description seems to be wrong.
data _null_;
ReplyDeletex = 12345.6;
cnt = length(strip(put(x,12.)));
put cnt=;
run;
should return 6 but its returning 7,please help
data _null_;
Deletex = 12345;
cnt = length(strip(put(x,12.1)));
put cnt=;
run;
should return 6 but its returning 7,please help
prev post is incorrect sorry
in this case the value of x is "12345.0" , so the length is 7.
Deletehow to find second occurrence of a word or letter in a string.kindly explain
ReplyDeleteHello Deepanshu,
ReplyDeletedata readin;
input name$15.;
cards;
Trait Jhonson
3+3=6
;
run;
data out;
set readin;
x = countw(name);
x1 = countw(name,' ');
proc print;
run;
Output : COUNTW Function
In this example involving 3+3=6, how x gives 2 as the number of words? Are + or = consider as special characters?
Hi,
ReplyDeleteIf x="Ramachandra";
In this character Value, I need to output only "r" and "R" Like x1="Rr" .
Do we have any particular function to achieve this ?!
data g;
Deletex="Ramachandra";
x1=compress(x,'Rr','k');
run;
Hi,
ReplyDeleteIf x="Ramachandra";
In this character Value, I need to output only "r" and "R" Like x1="Rr" .
Do we have any particular function to achieve this ?!
Hi Harish,
DeleteYes we have compress function with modifier. Please check the below code.
data temp2;
x="Ramachandra";
y=compress(x,'R''r','k');
run;
data p1;
Deletename='Ramchandra';
p2=compress(name,'amchnd');
run;
Hi
ReplyDeleteYou are doing a great job. Thanks.
Please would you be able to explain the round function.
Great Work sir, keep posting always.
ReplyDeleteThank for creating listen data.
data qwer;
ReplyDeleteinput name $18.;
datalines;
imran khan
mohsin uzzama
mohammed oviz khan
;
run;
output should be like this ;-
NAME Fname Lname MailId
imran khan imran khan imran.khan@gmail.com
mohsin uzzama mohsin uzzama mohsin.uzzama@gmail.com
mohammed oviz khan mohammed oviz khan mohammed.oviz.khan@gmail.com
Hi, Please check the below code.
ReplyDeletedata temp;
set qwer;
Fname = scan(name,1,' ');
Lname= scan(name,-2,"");
mailid= tranwrd(cats(name,"@gmail.com"), " ",".");
run;
data t;
ReplyDeleteinput a $200.;
cards;
red blood cells(rbc)
white blood cells(wbc)
;
run;
I need output as
Red Blood Cells(RBC)
White Blood Cells(WBC)
How to write the code for this output?
Thanks in Advance
Ans:
HI the below code will give the result as WBC AND RBC.
data t2;
set t;
b= (scan(a,-1,'('));
b= tranwrd(b,')',' ');
b=upcase(b);
run;
Thanks.
hellon
ReplyDelete