This tutorial demonstrates how we can read or import data with a character variable of varying length. We generally encounter this situation when we have company names or first and last names of a person in our dataset.
In the following example, the variable "Name" has varying length i.e. not all observations of this variable has similar length.
Example Dataset |
Read Messy Data |
Method I : Use COLON Modifier
We can use colon modifier : to tell SAS to read variable "Name" until there is a space or other delimiter. The $30. defines the variable as a character variable having max length 30.
data example1;The colon modifier is also used to read numeric data that contains special characters such as comma For example 1,000.
input ID Name :$30. Score;
cards;
1 DeepanshuBhalla 22
2 AttaPat 21
3 XonxiangnamSamnuelnarayan 33
;
proc print noobs;
run;
Suppose you want to read a variable which holds numeric values with comma in thousands place (or thousand separator).
data ex2; input ID Name:$30. Score fee:$10.; cards; 1 DeepanshuBhalla 22 1,000 2 AttaPat 21 2,000 3 XonxiangnamSamnuelnarayan 33 3,000 ; run;In the above program, we have used colon modifier to load "fee" variable and used $ sign to read this variable. It is stored as a character variable.If you would not use $ sign for the same, it will return missing values. See the program below how to store it as a numeric variable.
data ex2; input ID Name:$30. Score fee comma5. ; cards; 1 DeepanshuBhalla 22 1,000 2 AttaPat 21 2,000 3 XonxiangnamSamnuelnarayan 33 3,000 ; run;comma5. informat removes comma and store it as a numeric variable. 5 refers to width of the input field. To read bigger number like 3,000,000, you can use comma10.
Method II : Use LENGTH statement prior to INPUT Statement
In the following program, we use a length statement prior to input statement to adjust varying length of a variable. In this case, the variable Name would be read first. Use only $ instead of $30. after "Name" in INPUT statement.
data example2;
length Name $30.;
input ID Name $ Score;
cards;
1 DeepanshuBhalla 22
2 AttaPat 21
3 XonxiangnamSamnuelnarayan 33
;
proc print noobs;
run;
Output |
Method III : Use Ampersand (&) and Put Extra Space
We can use ampersand (&) to tell SAS to read the variable until there are two or more spaces as a delimeter. This technique is very useful when the variable contains two or more words. For example, if we have observation like "Deepanshu Bhalla" rather than "DeepanshuBhalla".
Note : 2 spaces before 22, 21 and 33data example1;
input ID Name & $30. Score;
cards;
1 DeepanshuBhalla 22
2 AttaPat 21
3 XonxiangnamSamnuelnarayan 33
;
proc print noobs;
run;
In this case, we have a space between First Name and Last Name and we want to store both the first and last names in a single variable.
Example 2 : Read Messy Data |
In this case, the following methods do not work.
- Colon modifier (:) does not work for a variable having multiple words
- LENGTH Statement prior to INPUT Statement does not work here.
Use Ampersand (&) and add ADDITIONAL space works.
data example1;This trick works in reading data from external file.
input ID Name & $30. Score;
cards;
1 Deepanshu Bhalla 22
2 Atta Pat 21
3 Xonxiangnam Samnuelnarayan 33
;
proc print noobs;
run;
data temp;
infile "C:\Users\Deepanshu\Desktop\file1.txt";
input ID Name & $30. Score;
proc print noobs;
run;
Awesome
ReplyDeleteNice tips to resolve time consuming issues for SAS beginners
ReplyDeletevery good post
ReplyDeleteHi,
ReplyDeleteIt it possible to get all sas tuturials as an PDF file ? :)
really learnt a new thing here.....
ReplyDeleteappreciating your efforts
thanx
I appreciate your efforts in explainnig this...
ReplyDeleteThanks.
really helpful
ReplyDeletedata read;
ReplyDeleteinput cc spent;
cards;
cc spend
1 100
1 200
1 550
1 100
1 200
1 550
1 100
2 200
2 550
2 200
2 200
2 550
2 200
2 900
3 750
3 550
3 1300
3 1900
3 750
;
run;
this code is giving error could you please tell me why?
Hi , you have created numeric variable as cc and spent and you are passing character value in your first line of cards("cc", "spend".)
DeleteAmit sir se nahi puche ye....:D
Deletejust remove cc and spend after cards;
DeleteRemove the CC Spend from cards
DeleteAdd $ after "cc$ & Spend$"
Deletejust remove cc spend after cards and it will work
Deletedata read;
Deleteinput cc spent;
cards;
1 100
1 200
1 550
1 100
1 200
1 550
1 100
2 200
2 550
2 200
2 200
2 550
2 200
2 900
3 750
3 550
3 1300
3 1900
3 750
;
run;
In this senario name variable having space between first variable and last variable how can we read the data normally we r using & but it's NT working can u guys tell me
ReplyDeleteEx
Data student;
Input studid studname$ rank;
Cards;
101 Rajkumar varma 20
102 Rajesh 23
103 Manojkumar p 19
104 saravanakumar prudhvi 21
Run;
Can u tell me like this data how can we read please explain me
Please use below code. It should work.
DeleteData student;
Input studid studname & $30. rank;
Cards;
101 Rajkumar varma 20
102 Rajesh 23
103 Manojkumar p 19
104 saravanakumar prudhvi 21
;
proc print noobs;
Run;
Just don't forget to put 2 spaces before numbers 20, 23, 19, 21.
DeleteAs you are having space in name so you have to use & to read it
Deleteif u don't want to put 2 spaces before 20, 23, 19, 21 then use this code
Delete/* first read the data*/
Data student;
length studname $30;
Input studid studname & $30. ;
Cards;
101 Rajkumar varma 20
102 Rajesh 23
103 Manojkumar p 19
104 saravanakumar prudhvi 21
;
proc print noobs;
Run;
/*second create rank and studname*/
data student;
set student;
rank=substr(studname,length(studname)-1,3);
studname=substr(studname,1,length(studname)-2);
proc print noobs;
Run;
run;
In above first code, if we have used 30. wid length, then why are we again using it wid input
DeleteIn this senario name variable having space between first variable and last variable how can we read the data normally we r using & but it's NT working can u guys tell me
ReplyDeleteEx
Data student;
Input studid studname$ rank;
Cards;
101 Rajkumar varma 20
102 Rajesh 23
103 Manojkumar p 19
104 saravanakumar prudhvi 21
Run;
Can u tell me like this data how can we read please explain me
Use below code
DeleteData student;
Input studid studname& $21. rank;
Cards;
101 Rajkumar varma 20
102 Rajesh 23
103 Manojkumar p 19
104 saravanakumar prudhvi 21
Run;
I have given double space between studid and rank
Struck with same kind of problem.
Deletewere you able to find the way to read the data with spaces between first variable an second variable?
please let me know.
I need the output of lastname of this type of data:
data ss;
input name$ 40.;
cards;
Shanmugam ram anand
vadi vel raja kumar
ram jaya
ravi
SERVICIOS PROTEXA CONSTRUCTION
;
run;
proc print;
run;
i need output as follows
output:
anand
kumar
jaya
construction
please suggest the sas programme to get this output.
data sub;
Deleteset ss;
e= scan(name,-1,'');
name=e;
keep name;
run;
Hi Shilpi, Can you please assist in explaining the use of name=e in the fourth step of the above code.
DeleteRegards
name=e means whatever comes from scan function in e goes to name.
DeleteIn Example 1 where there are no spaces between first and last name, we can also use truncover to avoid the problem of SAS reading next variable when the variable length is less than passed in input statement. Correct me if I am wrong.
ReplyDeleteReally helpful. Thanks :)
ReplyDeletevery useful
ReplyDeleteAfter importing the excel file to SAS, how to view the data set ? Is there is any other step involved before importing the data like, assign a name to the data or anything like that ? please provide clarification regarding this.
ReplyDeletevery helpful.
ReplyDeletethanks for this amazing content .
ReplyDeleteplease help me on below.
'The colon modifier is also used to read numeric data that contains special characters such as comma For example 1,000.'
I have tried but wrong o/p.
data ex2;
input ID Name:$30. Score fee:10. ;
cards;
1 DeepanshuBhalla 22 1,000
2 AttaPat 21 2,000
3 XonxiangnamSamnuelnarayan 33 3,000
;
run;
Added more explanation in the post. Hope it helps!
DeleteYou have missed the $ sign while defining fee varible in input statement.try fee:$10.
ReplyDeleteotherwise fee comma5.
both will work.
Suppose I am Reading data from external data set,
ReplyDeleteData set is as follows :
1 Ritu Ritika Uttar Pradesh
2 Ria Andhra Pradesh
3 Saloni Maharashtra
Note : Ritu Ritika is single name
Code
Data abc;
Infile 'c:\folder\random.txt'
Input Cust_Id Cust_Name$ State& $30.;
Run;
So by using above code I am getting the full name of the state but when I use same thing with name it's not working.
How to use this for two columns?
Try this it will work:
DeleteData abc;
Infile 'c:\folder\random.txt'
Input Cust_Id Cust_Name:$20. State& $30.;
Run;
please suggest any code
ReplyDeleteHi, do not mention [cc spent] again after cards statement
ReplyDeletedata read;
input cc spent;
cards;
1 100
1 200
1 550
1 100
1 200
1 550
1 100
2 200
2 550
2 200
2 200
2 550
2 200
2 900
3 750
3 550
3 1300
3 1900
3 750
;
run;
proc print data=read;
run;
Below code is not showing values in output.. Please help
ReplyDeletedata manualdata;
input ID Name &$20. marks ;
cards;
1 Lakshmi priya 30
;
run;
add one more space between priya and 30.
Delete