This tutorial explains how to use Perl regular expressions (pattern matching) in SAS.

**Sample Data**

data x;

infile datalines truncover;

input name $100.;

datalines;

Deepanshu

How are you, deepanshu

dipanshu

deepanshu is a good boy

My name is deepanshu

Deepanshu Bhalla

Deepanshuuu

DeepanshuBhalla

Bhalla Deepanshu

;

run;

**Important Functions for Pattern Matching**

**1. PRXMATCH**

Searches for a pattern match and returns the position at which the pattern is found.

PRXMATCH (perl-regular-expression, variable_name)

It returns the position at which the match begins. If there is no match, PRXMATCH returns zero.

**Example 1 :**

data xx;

set x;

if prxmatch("/Deepanshu/", name) > 0 then flag = 1;

if prxmatch("/Deepanshu/i", name) > 0 then flag1 = 1;

if prxmatch("/^Deepanshu/i", name) > 0 then flag2 = 1;

if prxmatch("/\bDeepanshu\b/i", name) > 0 then flag3 = 1;

if prxmatch("/D[ai]panshu/i", name) > 0 then flag4 = 1;

if prxmatch("/D.panshu/i", name) > 0 then flag5 = 1;

proc print;

run;

Output : Pattern Matching

**Important Points**

- The **/i** in the regular expression makes the search case-insensitive.
- The **^** in the regular expression tells SAS to match only strings that start with the search string.
- The **\b** in the regular expression tells SAS to match a word boundary.
- The **\B** in the regular expression tells SAS to match a non-word boundary.
- The **[ai]** in the regular expression matches any one of the characters listed inside the brackets.
- The **.** in the regular expression matches any single character.
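Example 1 does not use **\B**, so the following minimal sketch contrasts it with **\b** and also shows the position that PRXMATCH returns. The variable names (`s1`, `s2`, `s3`, `pos_word`, `pos_glued`, `pos_spaced`) are made up for illustration; the values are taken from the sample data.

```sas
data _null_;
  length s1 s2 s3 $40;
  s1 = 'Bhalla Deepanshu';
  s2 = 'DeepanshuBhalla';
  s3 = 'Deepanshu Bhalla';

  /* \b: Deepanshu must stand alone as a word; the match starts at position 8 */
  pos_word   = prxmatch('/\bDeepanshu\b/i', s1);

  /* \B: Deepanshu must be followed directly by another word character */
  pos_glued  = prxmatch('/Deepanshu\B/i', s2);   /* 1: "B" follows immediately */
  pos_spaced = prxmatch('/Deepanshu\B/i', s3);   /* 0: a blank follows, which is a word boundary */

  put pos_word= pos_glued= pos_spaced=;
run;
```

Any value greater than zero means the pattern was found, which is why the examples test `prxmatch(...) > 0`.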

**Example 2 : Search Multiple Sub Strings**

data temp;

Input company $30.;

cards;

Tata

tata

Tataz

TataM Jan

Tata Motor

Reliance World

Reliance Ltd

Reliance Petro

Reliance Global

Vanucoverltd Company

;

run;

data temp1;

set temp;

if prxmatch("/\b(Tata|Reliance)\b/i",company) > 0;

run;

**Example 3 : Find Pattern**

Suppose you are asked to find strings that are exactly 4 characters long, where the first character must be a letter and the remaining three characters must be digits.

data _null_;

x = 'A345';

x2 = 'A55A';

y = prxmatch("/^[a-zA-Z][0-9]{3}$/", x);

y2 = prxmatch("/^[a-zA-Z][0-9]{3}$/", x2);

put y= y2=;

run;
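To see what the **^** and **$** anchors contribute, the sketch below runs the same pattern with and without **$**. It also illustrates a common pitfall: a SAS character variable is padded with trailing blanks up to its declared length, so **$** can fail unless the value is trimmed or the pattern allows trailing whitespace. The values and variable names here are hypothetical and are not part of the original example.

```sas
data _null_;
  x = 'A3456';                                         /* five characters: one digit too many */
  anchored   = prxmatch('/^[a-zA-Z][0-9]{3}$/', x);    /* 0: a fourth digit sits before the end */
  unanchored = prxmatch('/^[a-zA-Z][0-9]{3}/',  x);    /* 1: the prefix A345 is enough */

  length z $10;
  z = 'A345';                                          /* stored as A345 plus six trailing blanks */
  padded  = prxmatch('/^[a-zA-Z][0-9]{3}$/', z);       /* 0: the blanks come between the digits and the end */
  trimmed = prxmatch('/^[a-zA-Z][0-9]{3}$/', trim(z)); /* 1: trailing blanks removed first */
  spaces  = prxmatch('/^[a-zA-Z][0-9]{3}\s*$/', z);    /* 1: pattern tolerates trailing blanks */

  put anchored= unanchored= padded= trimmed= spaces=;
run;
```

In Example 3 the literals 'A345' and 'A55A' give `x` and `x2` a length of exactly 4, so no trimming is needed there.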

**2. PRXCHANGE Function**

It performs a pattern-matching replacement.

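PRXCHANGE(regular-expression, -1, variable)

The second argument is the number of times to search for and replace a match; -1 replaces every match in the string.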

**Suppose you are asked to replace 'Tata' with 'Tata Group'.**

data temp2;

set temp;

Company0 = PrxChange('s/\b(Tata)\b/Tata Group/i' , -1 , strip(company));

proc print;

run;

**Note :** The **s** at the start of the expression indicates substitution.
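The effect of the second PRXCHANGE argument is easier to see on a string that contains the keyword more than once. This is a minimal sketch; the input string and the variable names are made up for illustration and do not come from the `temp` dataset.

```sas
data _null_;
  length s first_only all_matches $60;
  s = 'Tata Motor and Tata Steel';   /* hypothetical value with two matches */

  /* times = 1: only the first match is replaced */
  first_only  = prxchange('s/\bTata\b/Tata Group/i',  1, strip(s));

  /* times = -1: every match is replaced */
  all_matches = prxchange('s/\bTata\b/Tata Group/i', -1, strip(s));

  put first_only=;    /* Tata Group Motor and Tata Steel */
  put all_matches=;   /* Tata Group Motor and Tata Group Steel */
run;
```

Declaring the result variables with an explicit LENGTH is a design choice here: the replacement makes the string longer, so the target should have room for the expanded text.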

**Remove a list of keywords such as Jan, Ltd, Company**

Company1 = PrxChange('s/\b(Jan|ltd|Company)\b//i' , -1 , strip(company));
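The statement above only works inside a DATA step. A complete version might look like the sketch below, which uses a small stand-in dataset (`demo`, a hypothetical name) with three rows copied from Example 2 so that it runs on its own. The COMPBL and STRIP cleanup is an addition, assumed to be wanted because deleting a keyword can leave stray blanks behind.

```sas
/* stand-in data: three rows taken from the temp dataset in Example 2 */
data demo;
  input company $30.;
  datalines;
TataM Jan
Reliance Ltd
Vanucoverltd Company
;
run;

data demo_clean;
  set demo;
  length Company1 $30;

  /* delete the keywords wherever they appear as whole words */
  Company1 = prxchange('s/\b(Jan|ltd|Company)\b//i', -1, strip(company));

  /* assumption: collapse repeated blanks and drop leading/trailing ones */
  Company1 = strip(compbl(Company1));
run;

proc print data=demo_clean;
run;
```

Because of the **\b** boundaries, the "ltd" inside "Vanucoverltd" is left alone while the standalone "Ltd" and "Company" are removed.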

Hi,

Can anyone please explain the program from Example 3 : Find Pattern?

data _null_;

x = 'A345';

x2 = 'A55A';

y = prxmatch("/^[a-zA-Z][0-9]{3}$/", x);

y2 = prxmatch("/^[a-zA-Z][0-9]{3}$/", x2);

put y= y2=;

run;

Why is the $ symbol used at the end of the expression for the Y and Y2 variables?

"/^[a-zA-Z][0-9]{3}$/" is the expression used for both Y and Y2.

Thank you in advance

$ anchors the match to the end of the string, so the three digits must be the last characters.

Why do we use ?? in SAS? Can you please give the solution?

Sir, please give me an idea about fresher interview rounds.

Leap year concept in SAS.