SAS : Advanced String Manipulation

Deepanshu Bhalla
This post covers how to handle some advanced string operations in SAS. SAS offers many functions for working with character strings, but sometimes they are not enough on their own for a particular manipulation task.

Example 1 : Generate frequently used keywords

Suppose you have a list of customer complaints with open-ended comments, and you are asked to analyze them. The most common (and most basic) text mining technique is to find the most frequently used words in the complaints. This is easy with SAS Text Miner but a little more involved in Base SAS. The following SAS macro accomplishes this task.
%macro frequency(inputdata=,var=,outdata=);

/* Lowercase the comment, keep only letters and blanks, and output one row per word */
data test2;
set &inputdata.;
varr = compress(lowcase(&var.),' ','ak');
do i= 1 to countw(varr);
var1= scan(varr,i);
output;
end;
run;

/* Count each word longer than 2 characters and sort by frequency */
proc sql noprint;
create table &outdata. as
select var1, count(*) as N from test2
where length(var1) > 2
group by 1
order by N desc;
quit;
%mend;

%frequency(inputdata=temp,var=var,outdata=freqlist);
Macro Parameters

  1. inputdata : Specify the name of the dataset in which open-ended comments exist
  2. var : Specify the name of the variable which contains comments
  3. outdata : Specify the name you want to assign to the output dataset
[Figure: SAS : Frequency of Words]

Areas of Improvement
In the macro, the line "where length(var1) > 2" removes every word with two or fewer characters. The aim is to drop common non-meaningful words like "a", "an", "be", "is", "am", "of", "on", "in" etc. However, it does not catch longer non-meaningful words such as "the", "and", "that". This WHERE condition can also remove important keywords, such as abbreviations of a department or business unit; for example, CA refers to Corporate Agency. So, instead of using this line of code, prepare an exclusion list of non-meaningful keywords and filter the words against it.
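A minimal sketch of the exclusion-list idea, assuming the stop words are stored in a one-column dataset. The dataset names STOPWORDS, COMPLAINTS and _WORDS, the variables WORD and COMMENT, the macro name FREQUENCY_EXCL, its STOPLIST= parameter, the stop words themselves and the two sample comments are all hypothetical. The word-splitting step is the same as in the macro above; only the filter changes.

```sas
/* Hypothetical stop-word list (lowercase, to match the lowercased words) */
data stopwords;
  length word $30;
  input word $ @@;
  cards;
a an be is am of on in the and that to for
;
run;

/* Hypothetical sample comments, only so that this sketch runs on its own */
data complaints;
  length comment $200;
  infile cards truncover;
  input comment $200.;
  cards;
The call centre did not answer the call
CA team did not respond to the complaint
;
run;

%macro frequency_excl(inputdata=, var=, outdata=, stoplist=);
  /* Same splitting step as the original macro: one lowercase word per row */
  data _words(keep=var1);
    set &inputdata.;
    varr = compress(lowcase(&var.), ' ', 'ak');
    do i = 1 to countw(varr);
      var1 = scan(varr, i);
      output;
    end;
  run;

  /* Count words, skipping any word listed in the stop-word dataset */
  proc sql noprint;
    create table &outdata. as
    select var1, count(*) as N
    from _words
    where var1 not in (select word from &stoplist.)
    group by var1
    order by N desc;
  quit;
%mend;

%frequency_excl(inputdata=complaints, var=comment, outdata=freqlist, stoplist=stopwords);
```

Because "ca" is not on the stop list, it survives in FREQLIST, whereas the original length filter would have dropped it. Words such as "the" and "to" are removed only because they appear in STOPWORDS.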

Example 2 : Reverse a Character String

Suppose you have a list of words and you are asked to reverse each one.

Create a Sample Dataset
data temp;
input list $50.;
cards;
listendata
saspythonr
datascience
analytics
;
run;
REVERSE Function
data temp2;
set temp;
x = left(reverse(list));
run;
SAS has a built-in function for reversing a string, called REVERSE. Because LIST is stored with a length of 50, its values carry trailing blanks; REVERSE turns them into leading blanks, and the LEFT function wrapped around REVERSE moves them back to the end.
[Figure: SAS : Reverse String]
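A small variation, offered as a sketch: applying TRIM before REVERSE strips the trailing blanks first, so no leading blanks are produced and LEFT is no longer needed. To keep the block self-contained it loops over the four sample words instead of reading TEMP; the dataset name REV_TRIM is hypothetical.

```sas
data rev_trim;
  length list x $50;
  /* The four sample words from TEMP, listed inline so the step runs on its own */
  do list = 'listendata', 'saspythonr', 'datascience', 'analytics';
    /* TRIM removes trailing blanks, so the reversed value starts with a letter */
    x = reverse(trim(list));
    output;
  end;
run;
```

Against the TEMP dataset, the same idea is simply x = reverse(trim(list)); in place of x = left(reverse(list));.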
If you want to get your hands dirty, you can also reverse a string without the REVERSE function: extract each letter into its own row with a DO loop, sort the letters in descending order of position with PROC SORT, and then rebuild the string using RETAIN and the FIRST. and LAST. variables. See the code below:
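data test;
set temp;
/* One row per letter, keeping its position i */
do i= 1 to length(list);
list1= substr(list,i,1);
output;
end;
run;

/* Within each word, order the letters by descending position */
proc sort data = test;
by list descending i ;
run;

/* Rebuild each word by appending its letters in the new order */
data test2;
set test(keep = list list1);
length list2 $50;
retain list2;
by list;
if first.list then list2 = '';
list2 = cats(list2, list1);
if last.list;
keep list list2;
run;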
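The same manual idea also fits in a single DATA step without PROC SORT: loop over the positions from the last character back to the first and append each letter. This is a sketch, not the method shown above; the dataset REV_LOOP and the variable REV are hypothetical, and the four sample words are listed inline so the step runs on its own. Like the code above, it relies on CATS, which drops blanks, so embedded spaces in a multi-word string would be lost.

```sas
data rev_loop;
  length list rev $50;
  do list = 'listendata', 'saspythonr', 'datascience', 'analytics';
    rev = '';
    /* Walk backwards from the last non-blank character to the first */
    do i = length(list) to 1 by -1;
      rev = cats(rev, substr(list, i, 1));
    end;
    output;
  end;
  drop i;
run;
```

To apply it to TEMP, replace the outer DO loop with set temp; and remove its matching END.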

Example 3 : Extracting Alternate Letters from a String

Suppose you are asked to extract alternate letters (the 1st, 3rd, 5th, ...) from a character string. The logic is similar to the manual reverse code above, with two changes: (1) the loop increments by 2 instead of 1, and (2) the letters are not sorted in descending order of position.

[Figure: SAS : Alternate Letters]
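data test2;
set temp;
/* One row for every second letter: positions 1, 3, 5, ... */
do i= 1 to length(list) by 2;
list1= substr(list,i,1);
output;
end;
run;

/* Group rows by word; PROC SORT keeps the original order of i within each word */
proc sort data = test2;
by list;
run;

/* Concatenate the selected letters of each word */
data test3;
set test2(keep= list list1);
length list2 $50;
retain list2;
by list;
if first.list then list2 = '';
list2 = cats(list2, list1);
if last.list;
keep list list2;
run;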
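As with reversing, the sort-and-retain steps can be folded into one DATA step that appends every second letter directly. This is a sketch under the same assumptions as before: the dataset ALT_LOOP and the variable ALT are hypothetical, the four sample words are listed inline so the step runs on its own, and CATS drops any blanks it meets.

```sas
data alt_loop;
  length list alt $50;
  do list = 'listendata', 'saspythonr', 'datascience', 'analytics';
    alt = '';
    /* Take positions 1, 3, 5, ... and append each letter to ALT */
    do i = 1 to length(list) by 2;
      alt = cats(alt, substr(list, i, 1));
    end;
    output;
  end;
  drop i;
run;
```

For "listendata" this keeps positions 1, 3, 5, 7 and 9, giving "lsedt".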
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.
