This tutorial explains how to use regular expressions (pattern matching) in SAS.


**Sample Data**

data x;

infile datalines truncover;

input name $100.;

datalines;

Deepanshu

How are you, deepanshu

dipanshu

deepanshu is a good boy

My name is deepanshu

Deepanshu Bhalla

Deepanshuuu

DeepanshuBhalla

Bhalla Deepanshu

;

run;

**Important Functions for Pattern Matching**

**1. PRXMATCH Function**

It searches for a pattern match and returns the position at which the pattern is found.

PRXMATCH (perl-regular-expression, variable_name)

The returned value is the position at which the match begins. If there is no match, PRXMATCH returns zero.
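As a quick illustration of the return value, the sketch below calls PRXMATCH on two literal strings taken from the sample data. The variable names `pos_found` and `pos_missing` are made up for this illustration.

```sas
data _null_;
   /* "deepanshu" starts at character 14 of the first string */
   pos_found   = prxmatch("/deepanshu/", "How are you, deepanshu");
   /* the second string does not contain "deepanshu", so the result is 0 */
   pos_missing = prxmatch("/deepanshu/", "dipanshu");
   put pos_found= pos_missing=;
run;
```

The log should show `pos_found=14 pos_missing=0`.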

**Example 1 :**

data xx;

set x;

if prxmatch("/Deepanshu/", name) > 0 then flag = 1;

if prxmatch("/Deepanshu/i", name) > 0 then flag1 = 1;

if prxmatch("/^Deepanshu/i", name) > 0 then flag2 = 1;

if prxmatch("/\bDeepanshu\b/i", name) > 0 then flag3 = 1;

if prxmatch("/D[ai]panshu/i", name) > 0 then flag4 = 1;

if prxmatch("/D.panshu/i", name) > 0 then flag5 = 1;

proc print;

run;

Output : Pattern Matching

**Important Points**

- The **/i** in the regular expression makes the search case-insensitive.
- The **^** in the regular expression tells SAS to match only strings that start with the search string.
- The **\b** in the regular expression tells SAS to match a word boundary.
- The **\B** in the regular expression tells SAS to match a non-word boundary.
- The **[ai]** in the regular expression matches any one of the characters listed inside the brackets.
- The **.** in the regular expression matches any single character.
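Example 1 does not use **\B**, so here is a small sketch that contrasts it with **\b**. It uses the literal `DeepanshuBhalla` from the sample data; the variable names `pos_b` and `pos_nb` are made up for this illustration.

```sas
data _null_;
   /* \b needs a word boundary right after "Deepanshu".
      In "DeepanshuBhalla" the next character is a letter, so there is none. */
   pos_b  = prxmatch("/Deepanshu\b/i", "DeepanshuBhalla");
   /* \B needs the opposite: no word boundary right after "Deepanshu". */
   pos_nb = prxmatch("/Deepanshu\B/i", "DeepanshuBhalla");
   put pos_b= pos_nb=;
run;
```

The log should show `pos_b=0 pos_nb=1`.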

**Example 2 : Search Multiple Sub Strings**

data temp;

Input company $30.;

cards;

Tata

tata

Tataz

TataM Jan

Tata Motor

Reliance World

Reliance Ltd

Reliance Petro

Reliance Global

Vanucoverltd Company

;

run;

data temp1;

set temp;

if prxmatch("/\b(Tata|Reliance)\b/i",company) > 0;

run;
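Because of the **\b** on both sides, the filter keeps `Tata`, `tata`, `Tata Motor` and the four `Reliance` rows, and drops `Tataz` and `TataM Jan`. The same filter can also be written by compiling the pattern once with PRXPARSE and passing the returned pattern identifier to PRXMATCH. This is an optional variation rather than something Example 2 requires; the names `temp1_alt` and `pattern_id` are made up here.

```sas
data temp1_alt;
   set temp;
   retain pattern_id;
   /* compile the regular expression once, on the first observation */
   if _n_ = 1 then pattern_id = prxparse("/\b(Tata|Reliance)\b/i");
   /* subsetting IF: keep only the rows where the pattern is found */
   if prxmatch(pattern_id, company) > 0;
   drop pattern_id;
run;
```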

**Example 3 : Find Pattern**

Suppose you are asked to find strings that are exactly 4 characters long, where the first character is a letter and the remaining three characters are digits.

data _null_;

x = 'A345';

x2 = 'A55A';

y = prxmatch("/^[a-zA-Z][0-9]{3}$/", x);

y2 = prxmatch("/^[a-zA-Z][0-9]{3}$/", x2);

put y= y2=;

run;
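In the program above, `y` is 1 and `y2` is 0. The sketch below looks at two details of the same pattern: what the trailing **$** contributes, and what happens when a character variable is longer than its value, since SAS then pads the value with blanks. The variable names and the `$10` length are assumptions chosen for illustration.

```sas
data _null_;
   /* 1. Effect of $: without it, extra characters after the 3 digits are allowed */
   anchored   = prxmatch("/^[a-zA-Z][0-9]{3}$/", "A3456");
   unanchored = prxmatch("/^[a-zA-Z][0-9]{3}/",  "A3456");

   /* 2. Trailing blanks: code is longer than its value, so SAS pads it with blanks */
   length code $10;
   code = 'A345';
   padded   = prxmatch("/^[a-zA-Z][0-9]{3}$/", code);        /* blanks sit before the end */
   trimmed  = prxmatch("/^[a-zA-Z][0-9]{3}$/", strip(code)); /* blanks removed first */
   tolerant = prxmatch("/^[a-zA-Z][0-9]{3}\s*$/", code);     /* pattern allows trailing blanks */

   put anchored= unanchored= padded= trimmed= tolerant=;
run;
```

The log should show `anchored=0 unanchored=1 padded=0 trimmed=1 tolerant=1`. When the pattern ends in **$** and the input is a data set variable, wrapping the variable in STRIP or allowing `\s*` before the **$** is one way to avoid missing matches.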

**2. PRXCHANGE Function**

It performs a pattern-matching replacement.

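PRXCHANGE(regular-expression, -1, variable)

The second argument is the number of times to search for and replace a match. A value of -1 replaces every match.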
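To see what the second argument does, the sketch below runs the same substitution with 1 and with -1. The string and the variable names are made up for this illustration.

```sas
data _null_;
   text = "tata tata tata";
   /* 1 : replace only the first match */
   first_only  = prxchange("s/tata/Tata/", 1, text);
   /* -1 : keep replacing until no match is left */
   all_matches = prxchange("s/tata/Tata/", -1, text);
   put first_only= all_matches=;
run;
```

The log should show `first_only=Tata tata tata all_matches=Tata Tata Tata`.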

**Suppose you are asked to replace 'Tata' with 'Tata Group'.**

data temp2;

set temp;

Company0 = PrxChange('s/\b(Tata)\b/Tata Group/i' , -1 , strip(company));

proc print;

run;

**Note :** The **s** at the start of the expression indicates substitution.
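The parentheses around `Tata` form a capture group, and the matched text can be reused in the replacement as `$1`. The sketch below is a variation on the example above, not a replacement for it: because of **/i** the pattern also matches `tata`, and `$1` keeps whatever case was matched instead of always writing `Tata`. The variable name `changed` is made up here.

```sas
data _null_;
   /* $1 inserts the text captured by (Tata), here the lowercase "tata" */
   changed = prxchange('s/\b(Tata)\b/$1 Group/i', -1, 'tata Motor');
   put changed=;
run;
```

The log should show `changed=tata Group Motor`.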

**Remove a list of keywords such as Jan, Ltd, Company**

Company1 = PrxChange('s/\b(Jan|ltd|Company)\b//i' , -1 , strip(company));
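The statement above needs to sit inside a DATA step, and deleting a keyword leaves its neighbouring blank behind (for example, `TataM Jan` becomes `TataM` followed by a blank). Here is a minimal sketch, assuming the `temp` data set from Example 2. The data set name `temp3`, the `$30` length and the use of COMPBL to collapse repeated blanks are additions for illustration.

```sas
data temp3;
   set temp;
   length Company1 $30;   /* same width as company; an illustrative choice */
   /* remove the keywords, then collapse repeated blanks and trim both ends */
   Company1 = strip(compbl(prxchange('s/\b(Jan|ltd|Company)\b//i', -1, strip(company))));
run;

proc print data=temp3;
run;
```

Note that `Vanucoverltd Company` becomes `Vanucoverltd`: the `ltd` inside `Vanucoverltd` is not removed because **\b** requires a word boundary before it.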

Hi,

Can anyone please explain this program from Example 3 : Find Pattern?

data _null_;

x = 'A345';

x2 = 'A55A';

y = prxmatch("/^[a-zA-Z][0-9]{3}$/", x);

y2 = prxmatch("/^[a-zA-Z][0-9]{3}$/", x2);

put y= y2=;

run;

In the expression for the Y and Y2 variables, why is the $ symbol used at the end of the expression?

"/^[a-zA-Z][0-9]{3}$/" - expression used in Y and Y2 variables.

Thank you in advance

$ anchors the match to the end of the string, so the three digits must be the last characters.

Why do we use ?? in SAS? Can you please give the solution?