Pattern Matching with SAS

This tutorial explains how to use Perl regular expressions (pattern matching) in SAS.

Sample Data
data x;
infile datalines truncover;
input name $100.;
datalines;
Deepanshu
How are you, deepanshu
dipanshu
deepanshu is a good boy
My name is deepanshu
Deepanshu Bhalla
Deepanshuuu
DeepanshuBhalla
Bhalla Deepanshu
;
run;

Important Functions for Pattern Matching

1. PRXMATCH

Searches for a pattern match and returns the position at which the match begins.
PRXMATCH (perl-regular-expression, variable_name)
If there is no match, PRXMATCH returns 0.
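
As a quick illustration of the return value, the sketch below calls PRXMATCH on two literal strings taken from the sample data. The variable names pos1 and pos2 are made up for this example.

```sas
data _null_;
  /* the match starts at the 14th character of the string */
  pos1 = prxmatch("/deepanshu/", "How are you, deepanshu");
  /* no match, so the function returns 0 */
  pos2 = prxmatch("/deepanshu/", "dipanshu");
  put pos1= pos2=;   /* writes pos1=14 pos2=0 to the log */
run;
```

Because the return value is 0 only when nothing matches, comparing it with > 0 (as the examples in this tutorial do) turns it into a simple match test.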

Example 1 :
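data xx;
set x;
if prxmatch("/Deepanshu/", name) > 0 then flag = 1;         /* case-sensitive, anywhere */
if prxmatch("/Deepanshu/i", name) > 0 then flag1 = 1;       /* case-insensitive, anywhere */
if prxmatch("/^Deepanshu/i", name) > 0 then flag2 = 1;      /* at the start of the string */
if prxmatch("/\bDeepanshu\b/i", name) > 0 then flag3 = 1;   /* as a whole word */
if prxmatch("/D[ai]panshu/i", name) > 0 then flag4 = 1;     /* D, then a or i, then panshu */
if prxmatch("/D.panshu/i", name) > 0 then flag5 = 1;        /* D, then any one character, then panshu */
run;
proc print;
run;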
Important Points

  1. The /i modifier at the end of the regular expression makes the search case-insensitive.
  2. The ^ anchors the match to the start of the string, so only strings that start with the search string match.
  3. The \b matches a word boundary.
  4. The \B matches a non-word boundary (it is not used in the example above).
  5. The [ai] matches a single character that is either a or i.
  6. The . matches any single character.
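
Since \B does not appear in Example 1, here is a minimal sketch of it run against the same sample data set x. The data set name xb and the variable glued are hypothetical names chosen for this illustration.

```sas
data xb;
  set x;
  /* 1 when "Deepanshu" is immediately followed by another word character, else 0 */
  glued = (prxmatch("/Deepanshu\B/i", name) > 0);
run;
proc print data=xb;
run;
```

With the sample data, this should flag only Deepanshuuu and DeepanshuBhalla, because in every other row the name is followed by a blank or by the end of the value.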
Example 2 : Search Multiple Sub Strings

data temp;
Input company $30.;
cards;
Tata
tata
Tataz
TataM Jan
Tata Motor
Reliance World
Reliance Ltd
Reliance Petro
Reliance Global
Vanucoverltd Company
;
run;
data temp1;
set temp;
/* Subsetting IF: keep only rows that contain Tata or Reliance as a whole word */
if prxmatch("/\b(Tata|Reliance)\b/i", company) > 0;
run;

Example 3 : Find Pattern

Suppose you are asked to find strings that are exactly 4 characters long, where the first character is a letter and the remaining three characters are digits.
data _null_;
x =  'A345';
x2 = 'A55A';
y =  prxmatch("/^[a-zA-Z][0-9]{3}$/", x);
y2 = prxmatch("/^[a-zA-Z][0-9]{3}$/", x2);
put y= y2=;
run;
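
One thing to watch for when applying this pattern to a data set variable rather than a short literal: SAS pads character variables with trailing blanks up to their declared length, and those blanks sit between the last digit and the $ anchor. The sketch below assumes a hypothetical data set codes with a $10 variable code, and shows two common workarounds: passing strip(code), or allowing optional whitespace with \s* before the $.

```sas
data codes;
  length code $10;
  input code $;
  /* trailing blanks in the padded value keep $ from matching */
  padded  = prxmatch("/^[a-zA-Z][0-9]{3}$/", code);
  /* strip() removes leading and trailing blanks before matching */
  trimmed = prxmatch("/^[a-zA-Z][0-9]{3}$/", strip(code));
  /* alternatively, let the pattern accept trailing whitespace */
  spaced  = prxmatch("/^[a-zA-Z][0-9]{3}\s*$/", code);
  datalines;
A345
A55A
B12
;
run;
proc print data=codes;
run;
```

For the row A345, padded should be 0 while trimmed and spaced should both be 1. The other two rows do not fit the pattern under any of the three variants.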
2. PRXCHANGE Function

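It performs a pattern-matching replacement.
PRXCHANGE(perl-regular-expression, times, variable_name)
The second argument is the number of times to search and replace. A value of -1 replaces every match.
Suppose you are asked to replace 'Tata' with 'Tata Group'.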
data temp2;
set temp;
Company0 = PrxChange('s/\b(Tata)\b/Tata Group/i' , -1 , strip(company));
run;
proc print;
run;
Note : The s at the start of the regular expression indicates substitution, in the form s/pattern/replacement/.
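
To see what the second argument of PRXCHANGE does, the sketch below runs the same substitution with 1 and with -1 on a made-up string. The variable names text, first_only and all_hits are hypothetical.

```sas
data _null_;
  text = 'Tata and Tata';
  /* times = 1: only the first match is replaced */
  first_only = prxchange('s/Tata/Tata Group/', 1, text);
  /* times = -1: every match is replaced */
  all_hits = prxchange('s/Tata/Tata Group/', -1, text);
  put first_only=;   /* first_only=Tata Group and Tata */
  put all_hits=;     /* all_hits=Tata Group and Tata Group */
run;
```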
 
To remove a list of keywords such as Jan, Ltd and Company, leave the replacement part of the expression empty:
Company1 = PrxChange('s/\b(Jan|ltd|Company)\b//i' , -1 , strip(company));
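
The statement above needs to sit inside a DATA step, and deleting a keyword leaves behind the blanks that surrounded it. A minimal sketch, assuming the temp data set from Example 2 and a hypothetical output data set named temp3, uses COMPBL to collapse repeated blanks and STRIP to trim the ends:

```sas
data temp3;
  set temp;
  length Company1 $30;
  /* delete the keywords wherever they appear as whole words */
  Company1 = prxchange('s/\b(Jan|ltd|Company)\b//i', -1, strip(company));
  /* collapse runs of blanks left by the deletion, then trim the ends */
  Company1 = strip(compbl(Company1));
run;
proc print data=temp3;
run;
```

Because of the \b on both sides, the ltd inside Vanucoverltd should be left alone, while the separate word Company in the same row is removed.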
