This article explains how to use the SCAN function in SAS. It includes various examples to practice and master the function.
What does the SCAN Function do?
The SCAN function extracts a word from a character string in SAS. A character string is a text value, such as the value of a character variable. For example, suppose a variable contains the phrase "I love SAS" and you want to extract the second word, "love".
Syntax of SCAN Function
The syntax of the SCAN Function is as follows -
SCAN(text, nth-word, [delimiters], [modifiers])
- text: The string from which you want to extract the word.
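- nth-word: The position of the word you want to extract. A positive value counts words from left to right, and a negative value counts from right to left. If its absolute value is larger than the number of words in the string, SCAN returns a blank value.
- delimiters: The characters that separate the words in the string. This argument is optional. If you omit it, SAS does not split on blanks only: in ASCII environments the default delimiters are the blank and the characters `! $ % & ( ) * + , - . / ; < ^ |` (EBCDIC environments use a slightly different list).
- modifiers: An optional argument that changes or extends the default behavior of the SCAN function, for example to ignore case when matching delimiters or to scan from right to left. The modifiers are listed in the table below, where "the list of characters" means the delimiters given in the third argument.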
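Because the default delimiter list includes punctuation such as the period and the hyphen, omitting the delimiters argument can split words you meant to keep together. A minimal sketch, assuming an ASCII SAS session; the string, the dataset name delims, and the variable names are illustrative.

```sas
data delims;
  s = "Mr.Dave Sandy-Jones";
  /* Default delimiters: the period and the hyphen both split words */
  d2 = scan(s, 2);        /* "Dave"  (words: Mr, Dave, Sandy, Jones) */
  d4 = scan(s, 4);        /* "Jones" */
  /* Blank as the only delimiter: "Mr.Dave" stays one word */
  b1 = scan(s, 1, " ");   /* "Mr.Dave" */
  b3 = scan(s, 3, " ");   /* only two words exist, so the result is blank */
  put d2= d4= b1= b3=;
run;
```

Passing the delimiters explicitly, as in the b1 and b3 lines, is the safer choice whenever the text may contain punctuation.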
| Modifier | Description |
|---|---|
| a | Adds alphabetic characters to the list of characters. |
| b | Scans backward, from right to left, instead of the default left to right, regardless of the sign of nth-word. |
| c | Adds control characters to the list of characters. |
| d | Adds digits to the list of characters. |
| f | Adds an underscore and English letters (the valid first characters of a SAS variable name) to the list of characters. |
| g | Adds graphic characters to the list of characters. Graphic characters are characters that, when printed, produce an image on paper. |
| h | Adds a horizontal tab to the list of characters. |
| i | Ignores case when matching characters against the list of characters. |
| k | Treats all characters that are not in the list of characters as delimiters, so the characters in the list are kept in the returned word. |
| l | Adds lowercase letters to the list of characters. |
| m | Treats multiple consecutive delimiters, and delimiters at the beginning or end of the string, as enclosing zero-length words. Without m, consecutive delimiters count as one and leading or trailing delimiters are ignored. |
| n | Adds digits, an underscore, and English letters to the list of characters. |
| o | Processes the delimiters and modifiers arguments only once, rather than every time the SCAN function is called. This can make SCAN faster in a loop where those arguments do not change. |
| p | Adds punctuation marks to the list of characters. |
| q | Ignores delimiters inside substrings enclosed in quotation marks. |
| r | Removes leading and trailing blanks from the word that SCAN returns. |
| s | Adds space characters to the list of characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed). |
| t | Trims trailing blanks from the text and delimiters arguments. |
| u | Adds uppercase letters to the list of characters. |
| w | Adds printable (writable) characters to the list of characters. |
| x | Adds hexadecimal characters to the list of characters. |
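Several of these modifiers are easiest to understand by comparing results with and without them. A minimal sketch; the input strings, the dataset name modifiers, and the variable names are made up for illustration, and the comments give the value each assignment should produce.

```sas
data modifiers;
  length m1 m2 i1 i2 k1 q1 $20;
  /* m: consecutive delimiters enclose a zero-length word */
  m1 = scan("a,,c", 2, ",");             /* "c": the two commas count as one delimiter */
  m2 = scan("a,,c", 2, ",", "m");        /* blank: word 2 is the empty word between the commas */
  /* i: delimiters are matched without regard to case, so "X" splits too */
  i1 = scan("fooXbarxbaz", 2, "x");      /* "baz" (words: fooXbar, baz) */
  i2 = scan("fooXbarxbaz", 2, "x", "i"); /* "bar" (words: foo, bar, baz) */
  /* k: characters in the list are kept; everything else acts as a delimiter */
  k1 = scan("abc123def456", 2, "0123456789", "k");  /* "456" */
  /* q: the comma inside the quoted substring is not a delimiter; quotes are kept */
  q1 = scan('x,"y,z",w', 2, ",", "q");   /* "y,z" including its quotation marks */
  put m1= m2= i1= i2= k1= q1=;
run;
```

Adding the r modifier to the q example (`"qr"`) would also remove one layer of quotation marks from the returned word.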
Examples
Let's take a simple example. We have the string "I love SAS Programming" and we want to extract the second word from it.
```sas
data _null_;
  text = "I love SAS Programming";
  result = scan(text, 2);
  put result=;
run;
```
The PUT statement writes result=love to the SAS log, so "love" is the second word extracted from the string. The DATA _NULL_ statement runs a DATA step without creating an output dataset. We used it here because the example only needs to show how the SCAN function works.
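If nth-word is larger than the number of words, SCAN returns a blank value rather than an error. A minimal sketch that uses COUNTW, which counts the words in a string, to find the last valid position; the dataset wordcount and the variable names are illustrative, and a dataset is created instead of DATA _NULL_ only so the values can be inspected afterwards.

```sas
data wordcount;
  text = "I love SAS Programming";
  n = countw(text);             /* 4: number of words with the default delimiters */
  last_word = scan(text, n);    /* "Programming" */
  beyond = scan(text, n + 1);   /* no fifth word exists, so the result is blank */
  put n= last_word= beyond=;
run;
```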
There are two ways to scan from right to left with the SCAN function.
The first way is a negative value for nth-word, the second argument. When nth-word is negative, SAS counts words from the right; for example, -1 means the last word of the string.
Since we want the second-to-last word in the string, we pass -2 as the second argument of the SCAN function.
```sas
data _null_;
  text = "I love SAS Programming";
  result = scan(text, -2);
  put result=;
run;
```
The log shows result=SAS, so the program returns "SAS" as the second-to-last word.
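The second way is the "b" modifier, passed as the fourth argument of the SCAN function. It tells SAS to scan backward, reading the string from right to left. Because modifiers come after the delimiters argument, the third argument must also be given; here it is a single blank. This program also returns "SAS".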
```sas
data _null_;
  text = "I love SAS Programming";
  result = scan(text, 2, " ", "b");
  put result=;
run;
```
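Scanning from the right is handy when the number of words varies and you want the last one, such as a file extension. A small sketch; the path, the dataset name paths, and the variable names are made up for illustration. It shows that a negative nth-word and the b modifier with a positive nth-word pick the same word.

```sas
data paths;
  path = "C:/data/report.final.csv";
  /* With "." as the only delimiter, the last word is the extension */
  ext = scan(path, -1, ".");        /* "csv" */
  /* The b modifier with a positive count gives the same word */
  ext_b = scan(path, 1, ".", "b");  /* "csv" */
  /* Both slash characters are delimiters here, so the last word is the file name */
  file = scan(path, -1, "/\");      /* "report.final.csv" */
  put ext= ext_b= file=;
run;
```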
Let's create a sample dataset for demonstration. It has one variable (column) named 'text' that contains names in an inconsistent format. Suppose you are asked to pull the last name from this variable.
The comma (,) separates the last name from the rest of the name in the example below, so we pass a comma as the third argument of the function to extract the last name from the variable 'text'.
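```sas
data readin;
  input text $30.;
  datalines;
Mrs Serena, Williams
Mr. Dave, Sandy
Rakesh Kumar, Arora
Peter,Sandreas
;
run;

data readin2;
  set readin;
  /* The comma is the only delimiter; strip() removes the blank that follows it */
  lastname = strip(scan(text, 2, ","));
run;

proc print data=readin2;
run;
```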
In the above SAS program, we have created a new variable named 'lastname' which contains last names.
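Because only the comma is listed as a delimiter, the blank after each comma belongs to the returned word unless it is removed. A sketch of the r modifier from the table, which removes leading and trailing blanks from the word SCAN returns; the step re-creates the readin data so it runs on its own, and the names readin_r and lastname_r are illustrative.

```sas
data readin;
  input text $30.;
  datalines;
Mrs Serena, Williams
Mr. Dave, Sandy
Rakesh Kumar, Arora
Peter,Sandreas
;
run;

data readin_r;
  set readin;
  /* Without "r" the first row would give " Williams" with a leading blank */
  lastname_r = scan(text, 2, ",", "r");
run;

proc print data=readin_r;
run;
```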
Suppose you have a string that consists of multiple substrings delimited by commas, and you wish to transform it into multiple observations (rows).
```sas
data readin;
  input text $30.;
  datalines;
live, love, laugh, repeat
;
run;

data readin2(keep=word);
  set readin;
  do i = 1 to countw(text, ',');
    /* strip() removes the blank that follows each comma */
    word = strip(scan(text, i, ','));
    output;
  end;
run;

proc print data=readin2;
run;
```
- A DO loop is initiated with the variable 'i' iterating from 1 to the number of words in the 'text' variable, separated by commas. This is done using the 'COUNTW' function.
- Within the loop, the SCAN function is used to extract each word from the 'text' variable based on the current value of 'i' and the comma delimiter. The extracted word is then assigned to the 'word' variable.
- The 'OUTPUT' statement runs inside the loop and writes each word as a separate observation to the new dataset named 'readin2'.
- The KEEP= dataset option keeps only the variable 'word' in 'readin2'; the variables 'i' and 'text' are dropped.
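The same loop works when the dataset has several rows. A minimal sketch, assuming each row carries a numeric identifier; the dataset names phrases and words and the variables id and position are hypothetical. Because the loop's upper bound is computed with COUNTW for each row, rows with different numbers of words need no special handling, and keeping id and position records where each word came from.

```sas
data phrases;
  /* "|" separates the id from the text, so the blanks inside the text are kept */
  infile datalines dlm="|" truncover;
  length text $40;
  input id text $;
  datalines;
1|live, love, laugh, repeat
2|eat, sleep, code
;
run;

data words(keep=id position word);
  set phrases;
  length word $12;
  /* One output row per comma-separated word */
  do position = 1 to countw(text, ",");
    word = strip(scan(text, position, ","));
    output;
  end;
run;

proc print data=words;
run;
```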
In the previous example, we converted a string into multiple rows. Here, we are transforming it into multiple columns (variables). We have used the same sample dataset in this example.
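```sas
data readin;
  input text $30.;
  datalines;
live, love, laugh, repeat
;
run;

/* Count the words and store the count in the macro variable nWords */
data _null_;
  set readin;
  call symputx('nWords', countw(text, ","));
run;

data readin2;
  set readin;
  array word[&nWords] $12 word1-word&nWords;
  do i = 1 to &nWords;
    /* strip() removes the blank that follows each comma */
    word[i] = strip(scan(text, i, ','));
  end;
  drop i;
run;

proc print data=readin2;
run;
```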
First, we count the words in the variable "text" using the COUNTW function with a comma (",") as the delimiter. The CALL SYMPUTX routine creates a macro variable named "nWords" and assigns the word count to it.
We declared an array named "word" whose size equals the value stored in the macro variable "nWords". Each array element is a character variable of length 12, and the variables are named "word1" to "word4". Then we executed a DO loop that iterates from 1 to the value of "nWords".
Within the loop, the SCAN function with a comma (",") as the delimiter assigns each word of "text" to the corresponding element of the array "word". In simple words, we created a new variable for each word in "text". Finally, the DROP statement removes the variable "i" from the dataset.
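The CALL SYMPUTX step runs once per row, so with more than one row the macro variable ends up holding the word count of the last row only, which may be too small for the widest row. A sketch, assuming a hypothetical dataset phrases whose rows have different word counts, that stores the maximum count instead (max_words and wide are illustrative names). Columns beyond a row's own word count are left blank, because SCAN returns a blank when nth-word exceeds the number of words.

```sas
data phrases;
  input text $40.;
  datalines;
live, love, laugh, repeat
eat, sleep, code
;
run;

/* Keep the largest word count across all rows, not just the last one */
data _null_;
  set phrases end=last;
  retain max_words 0;
  max_words = max(max_words, countw(text, ","));
  if last then call symputx("nWords", max_words);
run;

data wide;
  set phrases;
  array word[&nWords] $12 word1-word&nWords;
  do i = 1 to &nWords;
    /* Blank when the row has fewer than i words */
    word[i] = strip(scan(text, i, ","));
  end;
  drop i;
run;

proc print data=wide;
run;
```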