SAS : Character Functions

Live Online Training : SAS Programming with 50+ Case Studies

- Explain Programming Concepts in Simple English
- Live Projects
- Scenario Based Questions
- Job Placement Assistance
- Get 20% off till July 14, 2017

This tutorial covers most frequently used SAS character functions with examples. It's a little bit tricky to deal character strings as compared to numeric values. Hence, it is required to know the practical usage of character functions.

1. COMPBL Function

It compresses multiple blanks to a single blank.

In the example below, the Name variable contains a record "Sandy   David". It has multiple spaces between the first and last name.

Create a dummy data
Data char;
Input Name $ 1-50 ;
Cards;
Sandy    David
Annie Watson
Hello ladies and gentlemen
Hi, I am good
;
Run;
Use COMPBL Function
Data char1;
Set char;
char1 = compbl(Name);
run;
Output



2. STRIP Function

It removes leading and trailing spaces.
Data char1;
Set char;
char1 = strip(Name);
run;
3. COMPRESS Function

SYNTAX 
COMPRESS(String, characters to be removed, Modifier)
Default - It removes leading, between and trailing spaces
Data char1;
Set char;
char1 = compress(Name);
run;
Output


Remove specific characters
data _null_;
x='ABCDEF-!1.234';
string=compress(x,'!4');put string=;
run;
It returns ABCDEF-1.23.

In SAS 9.1.3, the additional parameter called MODIFIER was added to the function.

The following keywords can be used as modifiers-
  1. a – Remove all upper and lower case characters from String.
  2. ak - Keep only alphabets from String.
  3. kd - Keeps only numeric values
  4. d – Remove numerical values from String.
  5. i – Remove specified characters both upper and lower case from String.
  6. k – keeps the specified characters in the string instead of removing them.
  7. l – Remove lowercase characters from String.
  8. p – Remove Punctuation characters from String.
  9. s – Remove spaces from String. This is default.
  10. u – Remove uppercase characters from String.

Example 1 : Keep only alphabets from alphanumeric values
data _null_;
x='ABCDEF-!1234';
string=compress(x,'','ak');put string=;
run;
It returns ABCDEF

Example 2 : Keep only numeric from alphanumeric
data _null_;
x='ABCDEF-!1234';
string=compress(x,'','kd');
put string=;
run;
It returns 1234

Example 3 : Remove all punctuation from string
data _null_;
x='ABCDEF-!1234';
string=compress(x,'','p');
put string=;
run;
It returns ABCDEF1234

Example 4 : Keep Integer Values from String
data _null_;
x='ABCDEF-!1.234';
string=compress(x,'0123456789.','k');put string=;
run;
It returns 1.234

4. LEFT Function

It removes leading spaces.
Data char1;
Set char;
char1 = left(Name);
run;

5. TRIM Function

It removes trailing spaces.
Data char1;
Set char;
char1 = trim(Name);
run;

6. TRIM(LEFT(string))

It is equivalent to STRIP function. It first removes leading spaces and then trailing spaces.


7. CAT Function

It concatenates character strings. It is equivalent to || sign.
data _null_;
a = 'abc';
b = 'xyz';
c= a || b;
d= cat(a,b);
put c= d =;
run;
Both c and d returns "abcxyz".

Concatenate String and Numeric Value
data _null_;
x = "Temp";
y = 22;
z = x||y;
z1 = cats(x,y);
z2 = catx("",x,y);

put z = z1= z2 =;
run;
z = Temp          22 
z1=Temp22 
z2=Temp 22

Note -

  1. The || keyword inserts multiple spaces when numeric and text values are concatenated.
  2. CATS strips both leading and trailing blanks, and does not insert separators.
  3. CATX strips both leading and trailing blanks, and inserts separators. The first argument to CATX specifies the separator.


8. SCAN Function

It extracts words within a value that is marked by delimiters.
SCAN( text, nth word, <delimiters>)
For example : 

We wish to extract first word in a sentence 'Hi, How are you doing?'. In this case, delimiter is a blank.
data _null_;
string='Hi, How are you doing?';
first_word=scan(string, 1, ' ' );
put first_word =;
run;
first_word returns 'Hi' since it is the first word in the above sentence.

We wish to extract last word in a sentence 'Hi, How are you doing?'. In this case, delimiter is a blank.
data _null_;
string='Hi, How are you doing?';
last_word=scan(string, -1, ' ' );
put last_word =;
run;
last_word returns 'doing?' since it is the last word in the above sentence.

Let's make it a little complicated.

Suppose, delimiter is a character instead of blank or special sign.
string='Hello SAS community people';
beginning= scan( string, 1, 'S' ); ** returns "Hello ";
middle = scan( string, 2, 'S' ); ** returns "A";
end= scan( string, 3, 'S' ); **returns " community people";

9. SUBSTR

It extracts strings based on character position and length. It is equivalent to MS Excel's MID Function.
= substr(old_var, starting_position, number of characters to keep);
Examples :
t="AFHood Analytics Group";
new_var=substr(t,8,9);
Result : Analytics

10. SUBSTR(Left of =) Function

It replaces a portion of string with new string
substr("old_variable",1,8) = new_data;
Result : New_dataable

11. LOWCASE, UPCASE and PROPCASE

LOWCASE converts the character string to lowercase.

UPCASE converts the character string to uppercase.

PROPCASE returns the word having uppercase in the first letter and lowercase in the rest of the letter (sentence format).


12. INDEX Function

It finds characters or words in a character variable
data _null_;
string='Hi,How are you doing?';
x = index(string, "How");
put x=;
run;
x returns 4 as "How" starts from 4th character.

To select all the records having 'ian' in their character.
if index(name,'ian') > 0;
To select all the records having first letter 'H'
if name =: 'H';

13. FIND Function

To locate a substring within a string
FIND (character-value, find-string <,'modifiers'> <,start>)

STRING1 = "Hello hello goodbye"

Examples :

1.  FIND(STRING1, "hello") returns 7

2.  FIND("abcxyzabc","abc",4) 7


14. TRANWRD Function

It replaces all occurrences of a word in a character string. It doesn't replace full phrase (entire value content).
TRANWRD (variable name, find what , replace with)

Example :  name : Mrs. Joan Smith

name=tranwrd(name, "Mrs.", "Ms.");

Result : Ms. Joan Smith


15. TRANSLATE Function

It replaces specific characters in a character expression
TRANSLATE(source, replace with, find what)
Examples:

x = translate('XYZW','AB','VW');

Result :  "XYZB"

Difference between TRANWRD and TRANSLATE Functions

The TRANSLATE function converts every occurrence of a user-supplied character to another character. TRANSLATE can scan for more than one character in a single call. In doing this, however, TRANSLATE searches for every occurrence of any of the individual characters within a string. That is, if any letter (or character) in the target string is found in the source string, it is replaced with the corresponding letter (or character) in the replacement string. 
The TRANWRD function differs from TRANSLATE in that it scans for words (or patterns of characters) and replaces those words with a second word (or pattern of characters).

16. PRXMATCH

It can be used for the following cases :
  1. When you want to identify if there is alphanumeric (has any letter from A to Z) in a variable.
  2. If you need to search a character variable for multiple different substrings.
PRXMATCH (perl-regular-expression, source);
Perl Regular Expression
  1. ^ - start with 
  2. $ - end with 
  3. \D - any non digits 
  4. \d - digits 
  5. ? - may or may not have? 
  6. | - or 
  7. * - repeating 
  8. ( i:) - turns ON the case insensitive search
  9. (-i:) - turn OFF the case insensitive search

1. Check alphanumeric value

DATA test;
INPUT string $ 1-8;
prxmatch=prxmatch("/[a-zA-Z]/",string);
CARDS;
ACBED
11
12
zx
11 2c
abc123
;
run; 

Note : prxmatch("/[a-zA-Z]/",string) checks first character.


2. Replace multiple words with a new word
if prxmatch('/Luthir|Luthr|Luther/',name) then name='Luthra' ;

17. INPUT and PUT Function

INPUT Function is used to convert character variable to numeric.
new_num=input(character-variable, 4.);
Example - 
data temp;
x = '12345';
new_x = input(x,5.);
run;
In the above example, the variable x is a character variable as it is defined in quotes '12345'. The newly created variable new_x is in numeric format.

PUT Function is used to convert numeric variable to character.
new_char=put(numeric,4.0);
data temp;
x = 12345;
new_x = put(x,5.);
run;
In this example, the variable x is originally in numeric format and later converted to character in the new variable new_x.

18. LENGTH

It return length of a string.
data _null_;
x='ABCDEF-!1.234';
n= length(x);
put n=;
run;
It returns 13.

If you need to calculate the number of digits in a numeric variable -

First, we need to convert our numeric variable to character to count the number of digits as LENGTH function works only for character variable.
data _null_;
x = 12345;
cnt = length(strip(put(x,12.)));
put cnt=;
run;
In the above nested function, we first converted the variable x to character and then remove spaces by using STRIP function and then count number of digits by using LENGTH() function.

Another Method -
data _null_;
x = 12345;
cnt = int(log10(x)) + 1;
put cnt=;
run;
We can also use LOG10 function to solve it. LOG10 has a property which says :
Number of Digits = Integer value of [LOG10(x)] + 1. For example, LOG10(100) = 2 so Number of digits in 100 = 2 +1. See LOG10(1100) = 3.04 => INT(3.04) = 3 => 3+1 = Number of digits in 1100.


19. IF THEN
It replaces entire phrase.

20. COUNT Function

It counts the number of times that a specified substring appears within a character string.
data _null_;
name = "DeepAnshu Bhalla";
x = count(name,"a");
x1 = count(name,"a","i");
put x= x1=;
run;
Result :  x=2 as there are 2 lower case 'a's in the variable name. x1=3 as there are 3 'a's in the variable name (The 'i' modifier ignores case sensitivity)

21. COUNTW Function

It counts the number of words in a character string.
data readin;
input name$15.;
cards;
Trait Jhonson
3+3=6
;
run;

data out;
set readin;
x = countw(name);
x1 = countw(name,' ');
proc print;
run;
Output : COUNTW Function
 If you don't specify delimiter in the second parameter of COUNTW function, it would automatically special characters as a delimiter.

SAS Tutorials : 100 Free SAS Tutorials

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.


While I love having friends who agree, I only learn from those who don't.

Let's Get Connected: Email | LinkedIn

Get Free Email Updates :
*Please confirm your email address by clicking on the link sent to your Email*

Related Posts:

27 Responses to "SAS : Character Functions"

  1. Hi ,I am unable to understand this concept.
    There is no Delimiter specified other than blank in the specified string .
    Question is why we are using 'S' in the delimiter field.
    ===============================================================
    Let's make it a little complicated.

    Suppose, delimiter is a character instead of blank or special sign.
    string='Hello SAS community people';
    beginning= scan( string, 1, 'S' ); ** returns "Hello ";
    middle = scan( string, 2, 'S' ); ** returns "A";
    end= scan( string, 3, 'S' ); **returns " community people";

    ReplyDelete
    Replies
    1. We can use any character as a delimeter.
      If we specify 'S' as a delimeter, SAS would read it like -

      First Word - Hello Second Word - A Third Word - community people

      Delete
    2. third word is "community" only correct ?

      Delete
    3. here "S" acting as a delimiter so the third word will whole "community people".

      Delete
  2. It is so nice to read.. keep posting

    ReplyDelete
  3. Deepanshu, I simply love reading your posts, in fact I just passed Base SAS Certification, and I would give a great deal of credit to you for that. Extremely grateful for providing us such a useful website. Have a small request, is it possible for you to provide PDF format for SAS interview questions ? Regards,

    ReplyDelete
    Replies
    1. Glad you found posts useful and congratulations for clearing Base SAS Certification. Cheers!

      Delete
  4. IN SAS University Edition the space between 2 words is always single space. thus there is no point of using COMPBL.

    ReplyDelete
    Replies
    1. In real-world datasets, we generally encounter a case wherein we need to remove extra space within the text. In corporate, you would not use SAS University Edition. You would rather use enterprise version of SAS or SAS EG to perform data cleaning.

      Delete
  5. Hi Deepanshu!
    I have collected data on change in educational practices using a questionaire and wish to do the analysis. What all types of analysis can be done using SAS?

    ReplyDelete
    Replies
    1. SAS supports many survey analysis such as Crosstabs, Linear Regression, Logistic Regression, Cox-Regression Analysis for survey data. You can explore PROC SURVEYLOGISTIC, PROC SURVEYREG, PROC SURVEYPHREG, PROC SURVEYSELECT etc

      Delete
  6. Hi Deepanshu!
    Need a favor from you. Can you suggest some topics for last year project that uses SAS Analytics. Thnaks in advance!

    ReplyDelete
  7. Hi

    What is the difference between strip and compress function please ?

    regards

    ReplyDelete
    Replies
    1. STRIP removes leading and trailing spaces. Whereas, COMPRESS removes leading, within text spaces and trailing spaces. Check the article above for examples.

      Delete
  8. Hi Deepanshu.
    need a favour could you also place the pharma data and programs(TLF's) which will help full for pharma ppl also.

    ReplyDelete
  9. Hi Deepanshu,

    Very nice post. I am stuck with a problem. I have a file with a column that contains the following types of strings:

    Word1 Word2 ABCD XYZ. or
    Word1 ABC XYZ. or
    Word1 AB XYZ.

    What I need to do is to separate strings ABCD XYZ, ABC XYZ and
    AB XYZ and put them in separate column. So my final result would be 2 columns:

    COLUMN1| COLUMN2
    Word1 Word2 | ABCD XYZ
    Word1 |ABC XYZ
    Word1 |AB XYZ

    I tried to combine reverse and substr functions to separate strings into two columns but was not able to do because length of the string ABCD XYZ is 8, length of ABC XYZ is 7 and of AB XYZ is 6.

    Would you be able to tell me how would you separate these strings into two columns?

    Thanks

    ReplyDelete
    Replies
    1. try this :

      e=substr(a,length(a)-4,5);

      change the digits n variable name as per your requirement

      Delete
  10. Hi Deepanshu, Many thanks for this info. It is simple and easy to understand.

    I have learned base SAS. Can you please guide me how to prepare and clear base SAS exam.

    Your suggestions are utmost important to me.

    Thanks
    Naresh

    ReplyDelete
  11. Thank you for the great post. The examples were really helpful in clarifying. I have my base sas cert and looking for work, any suggestions?

    thanks
    Charles

    ReplyDelete
  12. This comment has been removed by the author.

    ReplyDelete
  13. data t;
    input a $200.;
    cards;
    red blood cells(rbc)
    white blood cells(wbc)
    ;
    run;

    I need output as
    Red Blood Cells(RBC)
    White Blood Cells(WBC)

    How to write the code for this output?
    Thanks in Advance

    ReplyDelete
    Replies
    1. data p;
      set t;

      b=scan(a,1,'');
      c=scan(a,2,'');
      d=scan(a,3,'( ') ;
      e=substr(a,length(a)-4,5);
      f=propcase(b);
      g=propcase(c);
      h=propcase(d);
      i=upcase(e);
      j=compbl(f||(g||h));
      k=cats(j,i);
      drop b c d e f g h i j;
      run;

      Delete
  14. data p;
    set t;

    b=scan(a,1,'');
    c=scan(a,2,'');
    d=scan(a,3,'( ') ;
    e=substr(a,length(a)-4,5);
    f=propcase(b);
    g=propcase(c);
    h=propcase(d);
    i=upcase(e);
    j=compbl(f||(g||h));
    k=cats(j,i);
    drop b c d e f g h i j;
    run;

    ReplyDelete
  15. This comment has been removed by the author.

    ReplyDelete
  16. data asd;
    set t;

    a=propcase(scan(a,1,''));
    b=propcase(scan(a,2,''));
    c=scan(a,3,'');
    e=find(c,'(');
    f=propcase(substr(c,1,e-1))||upcase(substr(c,e,length(c)-1));

    n=catx("",a,b);
    m=catx("",n,f);
    keep a m;
    run;

    ReplyDelete

Next → ← Prev