This tutorial explains how to use regular expressions (pattern matching) in SAS.
Sample Data
data x;
infile datalines truncover;
input name $100.;
datalines;
Deepanshu
How are you, deepanshu
dipanshu
deepanshu is a good boy
My name is deepanshu
Deepanshu Bhalla
Deepanshuuu
DeepanshuBhalla
Bhalla Deepanshu
;
run;
Important Functions for Pattern Matching
1. PRXMATCH
Searches for a pattern match and returns the position at which the pattern is found.
PRXMATCH (perl-regular-expression, variable_name)
It returns the position at which the match begins. If there is no match, PRXMATCH returns zero.
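As a quick check of the return value described above, here is a minimal sketch that calls PRXMATCH on two literal strings. The variable names pos1 and pos2 are only illustrative.

```sas
data _null_;
  /* "deepanshu" starts at the 12th character of the first string */
  pos1 = prxmatch("/deepanshu/i", "My name is deepanshu");
  /* no match in the second string, so the result is 0 */
  pos2 = prxmatch("/deepanshu/i", "dipanshu");
  put pos1= pos2=;   /* writes pos1=12 pos2=0 to the log */
run;
```

Because any non-zero position means a match was found, the examples below test the result with > 0.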
Example 1 :
data xx;
set x;
if prxmatch("/Deepanshu/", name) > 0 then flag = 1;
if prxmatch("/Deepanshu/i", name) > 0 then flag1 = 1;
if prxmatch("/^Deepanshu/i", name) > 0 then flag2 = 1;
if prxmatch("/\bDeepanshu\b/i", name) > 0 then flag3 = 1;
if prxmatch("/D[ai]panshu/i", name) > 0 then flag4 = 1;
if prxmatch("/D.panshu/i", name) > 0 then flag5 = 1;
proc print;
run;
Output : Pattern Matching
Important Points
- The /i at the end of the regular expression makes the search case-insensitive.
- The ^ in the regular expression anchors the match to the start of the string, so only values that begin with the search string match.
- The \b in the regular expression matches a word boundary, so the search string must appear as a whole word.
- The \B in the regular expression matches a non-word boundary, that is, a position inside a word.
- The [ai] in the regular expression matches any one of the characters listed inside the brackets (here, a or i).
- The . in the regular expression matches any single character.
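Example 1 uses \b but not \B, so the following sketch puts the two side by side on three of the sample names. It is only an illustration: the flag names whole_word and inside_word are made up for this example.

```sas
data _null_;
  length name $30;
  do name = "Deepanshu Bhalla", "DeepanshuBhalla", "Deepanshuuu";
    /* \b on both sides: Deepanshu must stand alone as a word */
    whole_word  = prxmatch("/\bDeepanshu\b/i", name) > 0;
    /* \B after the name: Deepanshu must be followed by another word character */
    inside_word = prxmatch("/Deepanshu\B/i", name) > 0;
    put name= whole_word= inside_word=;
  end;
run;
```

Only "Deepanshu Bhalla" sets whole_word to 1, while "DeepanshuBhalla" and "Deepanshuuu" set inside_word to 1.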
Example 2 : Search Multiple Sub Strings
data temp;
Input company $30.;
cards;
Tata
tata
Tataz
TataM Jan
Tata Motor
Reliance World
Reliance Ltd
Reliance Petro
Reliance Global
Vanucoverltd Company
;
run;
data temp1;
set temp;
if prxmatch("/\b(Tata|Reliance)\b/i", company) > 0;
run;
Example 3 : Find Pattern
Suppose you are asked to find strings that are exactly 4 characters long. The first character must be a letter and the remaining three characters must be digits.
data _null_;
x = 'A345';
x2 = 'A55A';
y = prxmatch("/^[a-zA-Z][0-9]{3}$/", x);
y2 = prxmatch("/^[a-zA-Z][0-9]{3}$/", x2);
put y= y2=;
run;
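To see what the $ at the end of the pattern contributes, the sketch below runs the same pattern with and without it on a 5-character string. It also shows a common pitfall, on the assumption that your data sits in a character variable longer than the value. SAS pads such variables with trailing blanks, so the $ no longer lines up with the last digit unless the value is stripped first. All variable names here are illustrative.

```sas
data _null_;
  /* With $, nothing may follow the three digits, so 'A3456' is rejected */
  with_anchor    = prxmatch("/^[a-zA-Z][0-9]{3}$/", 'A3456');
  /* Without $, the pattern only checks the first four characters */
  without_anchor = prxmatch("/^[a-zA-Z][0-9]{3}/", 'A3456');

  /* A $10 variable holding 'A345' carries six trailing blanks */
  length code $10;
  code = 'A345';
  padded   = prxmatch("/^[a-zA-Z][0-9]{3}$/", code);
  stripped = prxmatch("/^[a-zA-Z][0-9]{3}$/", strip(code));

  put with_anchor= without_anchor= padded= stripped=;
run;
```

Here with_anchor and padded are 0, while without_anchor and stripped are 1.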
2. PRXCHANGE Function
It performs a pattern-matching replacement.
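PRXCHANGE(regular-expression, -1, variable)
The second argument is the number of times to search and replace. A value of -1 replaces every match in the string.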
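The effect of the second argument is easiest to see on a small string. In this sketch the variable names first_only and every_match are only illustrative.

```sas
data _null_;
  /* 1 : replace the first match only */
  first_only  = prxchange('s/a/X/', 1, 'banana');
  /* -1 : keep replacing until the end of the string */
  every_match = prxchange('s/a/X/', -1, 'banana');
  put first_only= every_match=;   /* first_only=bXnana every_match=bXnXnX */
run;
```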
Suppose you are asked to replace 'Tata' with 'Tata Group'.
Note : The s at the start of the regular expression indicates substitution.
data temp2;
set temp;
Company0 = PrxChange('s/\b(Tata)\b/Tata Group/i' , -1 , strip(company));
proc print;
run;
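To remove a list of keywords such as Jan, Ltd and Company, replace them with an empty string. The statement below goes inside a DATA step such as temp2 above.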
Company1 = PrxChange('s/\b(Jan|ltd|Company)\b//i' , -1 , strip(company));
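The keyword removal can be tried on its own with the sketch below, which applies the same expression to three of the sample company names instead of reading the temp data set. The extra strip() after the replacement is an addition for this example: it drops the blank left behind when a trailing keyword is removed.

```sas
data _null_;
  length company company1 $30;
  do company = "TataM Jan", "Reliance Ltd", "Vanucoverltd Company";
    /* Replace each whole-word keyword with nothing */
    company1 = prxchange('s/\b(Jan|ltd|Company)\b//i', -1, strip(company));
    /* Remove the blank left where the keyword used to be */
    company1 = strip(company1);
    put company= company1=;
  end;
run;
```

"Vanucoverltd Company" becomes "Vanucoverltd": the ltd inside Vanucoverltd is kept because \b only matches at a word boundary.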
Note : In Example 3, the $ anchors the match at the end of the string, so the three digits must be the last characters.