SAS SCAN Function: Learn with Examples

Deepanshu Bhalla

This article explains how to use the SCAN function in SAS. It includes various examples to practice and master the function.

What does the SCAN Function do?

The SCAN function extracts words from a character string in SAS. A character string is a value made up of text, usually stored in a character variable. For example, suppose you have a variable that contains the phrase "I love SAS" and you wish to extract the second word, "love", from it.

Syntax of SCAN Function

The syntax of the SCAN Function is as follows -

SCAN(text, nth-word, [delimiters], [modifiers])
  • text: The string from which you want to extract the word.
  • nth-word: The nth-word is the position of the word you want to extract. A positive value extracts from left to right, and a negative value extracts from right to left.
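  • delimiters: The characters that separate the words within the string. It is optional. Each character you list is treated as a separate delimiter. If you omit this argument, SAS uses a default set of delimiters that includes the blank and common punctuation such as , . - / ; ( ) (the exact default list depends on whether the system uses ASCII or EBCDIC).
  • modifiers: An optional argument that changes or expands the default behavior of the SCAN function. For example, it can be used to scan backward or to ignore the case of the delimiter characters. See the table of modifiers below.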
Modifiers Description
a Adds alphabetic characters to the list of characters.
b Scans backward from right to left instead of default behavior of left to right.
c Adds control characters to the list of characters.
d Adds digits to the list of characters.
f Adds an underscore and English letters to the list of characters.
g Adds graphic characters to the list of characters. Graphic characters are characters that, when printed, produce an image on paper.
h Adds a horizontal tab to the list of characters.
i Ignores the case of the characters in the list of characters.
k Causes all characters not in the list of characters to be treated as delimiters.
l Adds lowercase letters to the list of characters.
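m Treats multiple consecutive delimiters, and delimiters at the beginning or end of the string, as marking words of zero length instead of collapsing them into a single delimiter.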
n Adds digits, an underscore, and English letters to the list of characters.
o Processes the charlist and modifier arguments only once, rather than every time the SCAN function is called.
p Adds punctuation marks to the list of characters.
q Ignores delimiters that are inside substrings enclosed in quotation marks.
r Removes leading and trailing blanks from the word that SCAN returns.
s Adds space characters to the list of characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed).
t Trims trailing blanks from the string and charlist arguments.
u Adds uppercase letters to the list of characters.
w Adds printable (writable) characters to the list of characters.
x Adds hexadecimal characters to the list of characters.
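
A few of these behaviors are easier to see in the log. The following is a minimal sketch, assuming an ASCII system (the default delimiter list differs on EBCDIC). The variable names and sample strings are made up for illustration.

```sas
data _null_;
  length month without_m with_m amount $20;

  /* No third argument: the hyphen is one of the default delimiters */
  month = scan("2024-01-15", 2);

  /* Without "m", the two consecutive commas count as one delimiter */
  without_m = scan("a,,c", 2, ",");

  /* With "m", the empty field between the commas is word 2 */
  with_m = scan("a,,c", 2, ",", "m");

  /* With "q", the comma inside the quoted substring is not a delimiter */
  amount = scan('"Smith, John",42', 2, ",", "q");

  put month= without_m= with_m= amount=;
run;
```

Here month should come back as 01, without_m as c, with_m as a blank value, and amount as 42.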

Examples

Let's take a simple example. We have the string "I love SAS Programming" and we want to extract the second word from it.

data _null_;
text = "I love SAS Programming";
result = scan(text,2);
put result=;
run;
Output

The log shows result=love, so "love" is the second word extracted from the string. The DATA _NULL_ statement is used when you need to perform operations or calculations without creating an output dataset. In the above example, we did not create a dataset because we only wanted to show how the SCAN function works.

How to Extract Second Last Word?

There are two ways to scan from right to left in the SCAN function.

Solution 1

The SCAN function can also read from right to left. When you specify a negative number as the second argument (nth-word), SAS starts scanning from the right. For example, -1 means the last word of the string.

Since we wish to find the second last word in the string, we have mentioned -2 in the second argument of the SCAN function.

data _null_;
text = "I love SAS Programming";
result = scan(text,-2);
put result=;
run;
Output

The log shows result=SAS, so the program returns "SAS" as the second-to-last word.

Solution 2 : Use Modifier

We can also scan from right to left using a modifier, which is the fourth argument of the SCAN function. The "b" modifier tells SAS to scan backward, which means reading the string from right to left.

data _null_;
text = "I love SAS Programming";
result = scan(text,2," ","b");
put result=;
run;
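
As a quick check, the sketch below runs both approaches side by side and also pulls the last word with -1. The variable names are chosen only for this illustration.

```sas
data _null_;
  text = "I love SAS Programming";
  by_negative = scan(text, -2);           /* negative count */
  by_modifier = scan(text, 2, " ", "b");  /* "b" modifier   */
  last_word   = scan(text, -1);           /* last word      */
  put by_negative= by_modifier= last_word=;
run;
```

Both by_negative and by_modifier should show SAS in the log, and last_word should show Programming.
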
How to Handle Delimiters?

Let's create a sample dataset for demonstration. In the dataset, we have a variable (column) named 'text' which contains names that are not in a proper format. Suppose you are asked to extract the last name from the variable.

Comma (,) is a separator or delimiter in the example below. Hence we will use comma as a third argument in the function to extract last name from the variable 'text'.

data readin;
input text $30.;
datalines;
Mrs Serena, Williams
Mr. Dave, Sandy
Rakesh Kumar, Arora
Peter,Sandreas
;
run;

data readin2;
set readin;
/* STRIP removes the blank that follows the comma in most rows */
lastname = strip(scan(text, 2, ","));
run;

proc print data=readin2;
run;
Output

In the above SAS program, we have created a new variable named 'lastname' which contains last names.
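
Because every character in the third argument is treated as a separate delimiter, another way to get the last name is to list both the comma and the blank and take the last word. This is only a sketch, and it assumes each last name is a single word. The dataset name readin3 and the variable lastname_alt are made up for illustration.

```sas
data readin;
input text $30.;
datalines;
Mrs Serena, Williams
Mr. Dave, Sandy
Rakesh Kumar, Arora
Peter,Sandreas
;
run;

data readin3;
  set readin;
  /* Comma and blank are both delimiters; -1 picks the last word */
  lastname_alt = scan(text, -1, ", ");
run;

proc print data=readin3;
run;
```

With this approach no leading blank is left in the result, because the blank after the comma is itself a delimiter.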

How to Convert a String into Multiple Observations?

Suppose you have a string that consists of multiple substrings delimited by commas, and you wish to transform it into multiple observations (rows).

data readin;
input text $30.;
datalines;
live, love, laugh, repeat
;
run;

data readin2(keep=word);
set readin;
  do i = 1 to countw(text, ',');
    /* STRIP removes the blank that follows each comma */
    word = strip(scan(text, i, ','));
    output;
  end;
run;

proc print data=readin2;
run;
Explanation
  1. A DO loop is initiated with the variable 'i' iterating from 1 to the number of words in the 'text' variable, separated by commas. This is done using the 'COUNTW' function.
  2. Within the loop, the SCAN function is used to extract each word from the 'text' variable based on the current value of 'i' and the comma delimiter. The extracted word is then assigned to the 'word' variable.
  3. The 'OUTPUT' statement is used to store each word as a separate observation in the new dataset named 'readin2'. It is run within the loop.
  4. In the new dataset named 'readin2', we have kept the variable 'word' only. We didn't retain the variables 'i', 'text'.
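
The same result can also be reached without COUNTW by looping until SCAN returns a blank, which it does once the requested position exceeds the number of words. This is a sketch of that alternative, not a replacement for the approach above. It assumes no item between two commas is blank, since a blank item would end the loop early. The dataset name readin3 is made up for illustration.

```sas
data readin;
input text $30.;
datalines;
live, love, laugh, repeat
;
run;

data readin3(keep=word);
  set readin;
  i = 1;
  word = strip(scan(text, i, ','));
  /* SCAN returns a blank once i exceeds the number of words */
  do while (word ne '');
    output;
    i + 1;
    word = strip(scan(text, i, ','));
  end;
run;

proc print data=readin3;
run;
```
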
How to Convert a String into Multiple Variables?

In the previous example, we converted a string into multiple rows. Here, we are transforming it into multiple columns (variables). We have used the same sample dataset in this example.

data readin;
input text $30.;
datalines;
live, love, laugh, repeat
;
run;
data _null_;
set readin;
call symputx('nWords', countw(text, ","));  
run;

data readin2;
  set readin;
  array word[&nWords] $12 word1-word&nWords;
  do i = 1 to &nWords;
    /* STRIP removes the blank that follows each comma */
    word[i] = strip(scan(text, i, ','));
  end;
  drop i;
run;

proc print data=readin2;
run;
Explanation

First we counted the number of words in the variable "text" using the COUNTW function, with a comma (",") as the delimiter. The CALL SYMPUTX routine then created a macro variable named "nWords" and assigned the count of words to it.

We declared an array named "word" with a size equal to the value stored in the macro variable "nWords". Each array element is a character variable of length 12, and the variables are named "word1" to "word4". Then we executed a do-loop that iterates from 1 to the value of "nWords".

Within the do-loop, we assigned each word from the variable "text" to the corresponding element of the array "word" using the scan function with a comma (",") as the delimiter. In simple words, we created a new variable for each word in the variable "text". Later we removed the variable "i" from the dataset.
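
The sample dataset has a single row. If it had several rows with different word counts, the first step would leave "nWords" holding the count from the last row only. One possible adjustment, sketched here with a made-up second row and a helper variable named maxWords, is to store the largest count found in any row.

```sas
data readin;
input text $30.;
datalines;
live, love, laugh, repeat
eat, sleep
;
run;

/* Store the largest word count found in any row */
data _null_;
  set readin end=last;
  retain maxWords 0;
  maxWords = max(maxWords, countw(text, ","));
  if last then call symputx('nWords', maxWords);
run;

data readin2;
  set readin;
  array word[&nWords] $12 word1-word&nWords;
  do i = 1 to &nWords;
    /* Rows with fewer words leave the extra variables blank */
    word[i] = strip(scan(text, i, ','));
  end;
  drop i;
run;

proc print data=readin2;
run;
```

For the second row, word3 and word4 stay blank because SCAN returns a blank when the requested position is past the last word.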

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.