Pattern Matching with SAS

This tutorial explains how to use regular expressions (pattern matching) in SAS.

Sample Data
data x;
infile datalines truncover;
input name $100.;
datalines;
How are you, deepanshu
deepanshu is a good boy
My name is deepanshu
Deepanshu Bhalla
Bhalla Deepanshu
;
run;

Important Functions for Pattern Matching


1. PRXMATCH Function

It searches for a pattern match and returns the position at which the match begins. If there is no match, PRXMATCH returns zero.
PRXMATCH(perl-regular-expression, variable_name)
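To make the return value concrete, here is a minimal sketch that calls PRXMATCH on a literal string instead of a data set variable. The variable names pos_hit and pos_miss are made up for illustration.

```sas
data _null_;
  /* "boy" starts at the 21st character of the string */
  pos_hit  = prxmatch("/boy/",  "deepanshu is a good boy");
  /* "girl" does not occur, so PRXMATCH returns 0 */
  pos_miss = prxmatch("/girl/", "deepanshu is a good boy");
  put pos_hit= pos_miss=;   /* log: pos_hit=21 pos_miss=0 */
run;
```

Because a non-match is 0, the test prxmatch(...) > 0 used in the examples works as a simple found / not found check.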

Example 1 :
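data xx;
set x;
if prxmatch("/Deepanshu/", name) > 0 then flag = 1;        /* case-sensitive match */
if prxmatch("/Deepanshu/i", name) > 0 then flag1 = 1;      /* case-insensitive match */
if prxmatch("/^Deepanshu/i", name) > 0 then flag2 = 1;     /* value starts with Deepanshu */
if prxmatch("/\bDeepanshu\b/i", name) > 0 then flag3 = 1;  /* Deepanshu as a whole word */
if prxmatch("/D[ai]panshu/i", name) > 0 then flag4 = 1;    /* D, then a or i, then panshu; no sample row matches */
if prxmatch("/D.panshu/i", name) > 0 then flag5 = 1;       /* D, then any one character, then panshu; no sample row matches */
run;
proc print;
run;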
Output : Pattern Matching
Important Points

  1. The /i at the end of the regular expression makes the search case-insensitive.
  2. The ^ in the regular expression anchors the match to the start of the string, so only values that begin with the search string match.
  3. The \b in the regular expression matches a word boundary.
  4. The \B in the regular expression matches a position that is not a word boundary.
  5. The [ai] in the regular expression matches any one of the characters listed inside the brackets.
  6. The . in the regular expression matches any single character.
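The difference between \b and \B is easier to see side by side. The sketch below is only an illustration on two literal strings; the variable names are hypothetical.

```sas
data _null_;
  /* \b requires a word boundary on both sides of Tata */
  whole_word  = prxmatch("/\bTata\b/", "Tata Motor");  /* 1: Tata is followed by a space */
  inside_word = prxmatch("/\bTata\b/", "TataM Jan");   /* 0: Tata is followed by the letter M */
  /* \B requires that Tata is NOT followed by a word boundary */
  no_boundary = prxmatch("/Tata\B/",   "TataM Jan");   /* 1: Tata runs straight into M */
  put whole_word= inside_word= no_boundary=;
run;
```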
Example 2 : Search Multiple Sub Strings

data temp;
input company $30.;
datalines;
TataM Jan
Tata Motor
Reliance World
Reliance Ltd
Reliance Petro
Reliance Global
Vanucoverltd Company
;
run;
data temp1;
set temp;
/* Keep only rows where Tata or Reliance appears as a whole word */
if prxmatch("/\b(Tata|Reliance)\b/i", company) > 0;
run;

Example 3 : Find Pattern

Suppose you are asked to find strings that are exactly 4 characters long, where the first character is a letter and the remaining three characters are digits.
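data _null_;
x =  'A345';
x2 = 'A55A';
/* ^ anchors the start and $ anchors the end, so the whole value must be one letter followed by three digits */
y =  prxmatch("/^[a-zA-Z][0-9]{3}$/", x);    /* 1: matches */
y2 = prxmatch("/^[a-zA-Z][0-9]{3}$/", x2);   /* 0: last character is not a digit */
put y= y2=;
run;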
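One thing to watch when the $ anchor is applied to a data set variable rather than a literal: SAS pads character variables with trailing blanks up to their declared length, and those blanks sit between the last digit and the end of the string. The sketch below assumes a hypothetical variable code of length 10 and shows two common ways around this.

```sas
data _null_;
  length code $10;
  code = 'A345';   /* stored as 'A345' plus six trailing blanks */
  raw     = prxmatch("/^[a-zA-Z][0-9]{3}$/",    code);         /* 0: trailing blanks come before the end */
  trimmed = prxmatch("/^[a-zA-Z][0-9]{3}$/",    strip(code));  /* 1: blanks removed before matching */
  padded  = prxmatch("/^[a-zA-Z][0-9]{3}\s*$/", code);         /* 1: pattern allows trailing whitespace */
  put raw= trimmed= padded=;
run;
```

The example above does not hit this problem because x and x2 are created from 4-character literals, so their length is exactly 4.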
2. PRXCHANGE Function

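It performs a pattern-matching replacement. The second argument is the number of times to search for and replace a match; -1 replaces every match.
PRXCHANGE(regular-expression, -1, variable)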
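As a rough illustration of how the second argument changes the result, the sketch below runs the same substitution once with 1 and once with -1. The string and the variable names are made up for the example.

```sas
data _null_;
  text = 'Tata Tata Motor';
  first_only = prxchange('s/Tata/X/',  1, text);   /* replaces the first match only */
  all_hits   = prxchange('s/Tata/X/', -1, text);   /* replaces every match */
  put first_only= all_hits=;   /* log: first_only=X Tata Motor all_hits=X X Motor */
run;
```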
Suppose you are asked to replace 'Tata' with 'Tata Group'.
data temp2;
set temp;
/* Replace the whole word Tata (any case) with Tata Group */
Company0 = PrxChange('s/\b(Tata)\b/Tata Group/i' , -1 , strip(company));
run;
proc print;
run;
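Note : The s at the start of the regular expression indicates substitution.
To remove a list of keywords such as Jan, Ltd and Company, replace them with an empty string by adding this statement to the data step above: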
Company1 = PrxChange('s/\b(Jan|ltd|Company)\b//i' , -1 , strip(company));
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

5 Responses to "Pattern Matching with SAS"
  1. Hi,

    Can any one please explain me from Example 3 : Find Pattern: Program

    data _null_;
    x = 'A345';
    x2 = 'A55A';
    y = prxmatch("/^[a-zA-Z][0-9]{3}$/", x);
    y2 = prxmatch("/^[a-zA-Z][0-9]{3}$/", x2);
    put y= y2=;

    in Y and Y2 variable expression why $ symbol used before completing the expression?

    "/^[a-zA-Z][0-9]{3}$/" - expression used in Y and Y2 variables.

    Thank you in advance

  2. why do we use ?? in sas.can you please give the solution.

  3. sir please give me idea about fresher interview rounds

