SAS : Read Character Variable of Varying Length

This tutorial demonstrates how we can read or import data with a character variable of varying length. We generally encounter this situation when we have company names or both first and last names of a person in our dataset.

Example I

In the following example, the variable "Name" has varying length, i.e. its values are not all the same length.

Example Dataset
Read Messy Data

Method I : Use COLON Modifier


We can use the colon modifier (:) to tell SAS to read the variable "Name" until it reaches a space or other delimiter. The $30. informat defines it as a character variable with a maximum length of 30.
data example1;
input ID Name :$30. Score;
cards;
1 DeepanshuBhalla 22
2 AttaPat 21
3 XonxiangnamSamnuelnarayan 33
;
proc print noobs;
run;
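To see why the colon matters, here is a minimal sketch of the same read without it. The dataset name no_colon and the TRUNCOVER option are additions for illustration, not part of the original example. Without the colon, $30. is applied as formatted input, so SAS reads a fixed 30 columns for Name and the score ends up inside it.

```sas
/* Illustration only: same data as above, but Name is read WITHOUT the colon. */
data no_colon;
/* TRUNCOVER (an addition here) stops SAS from jumping to the next data line
   when it runs out of columns while looking for Score. */
infile cards truncover;
input ID Name $30. Score;
cards;
1 DeepanshuBhalla 22
2 AttaPat 21
3 XonxiangnamSamnuelnarayan 33
;
proc print noobs;
run;
```

In this sketch Name should come back holding values such as "DeepanshuBhalla 22", and Score should be missing.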
The colon modifier is also used to read numeric data that contains special characters such as commas, for example 1,000.

Suppose you want to read a variable that holds numeric values with a comma as the thousands separator.
data ex2;
input ID Name:$30. Score fee:$10.;
cards;
1 DeepanshuBhalla 22 1,000
2 AttaPat 21 2,000
3 XonxiangnamSamnuelnarayan 33 3,000
;
run;
In the above program, we have used the colon modifier together with the $ sign to load the "fee" variable, so it is stored as a character variable. If you omit the $ sign (for example fee :10.), SAS returns missing values because the comma is not valid numeric data for the standard numeric informat. The program below shows how to store it as a numeric variable.
data ex2;
input ID Name:$30. Score fee comma5. ;
cards;
1 DeepanshuBhalla 22 1,000
2 AttaPat 21 2,000
3 XonxiangnamSamnuelnarayan 33 3,000
;
run;
The comma5. informat removes the comma and stores the value as a numeric variable. The 5 is the width of the input field, including the comma. To read a bigger number such as 3,000,000, use a wider informat such as comma10.
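As a hedged variation, the colon modifier can be combined with the comma informat so that the width only acts as a maximum and values of different lengths can be read. The dataset name ex3, the 2,000,000 value, and the FORMAT statement are assumptions added for illustration.

```sas
data ex3;
/* :comma10. reads up to the next space and strips commas, so values of
   different widths can share one informat. */
input ID Name :$30. Score fee :comma10.;
/* Display the numeric value with thousands separators again. */
format fee comma12.;
cards;
1 DeepanshuBhalla 22 1,000
2 AttaPat 21 2,000,000
3 XonxiangnamSamnuelnarayan 33 3,000
;
proc print noobs;
run;
```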

Method II : Use LENGTH statement prior to INPUT Statement

In the following program, we use a LENGTH statement before the INPUT statement to set the length of Name to 30. Because the LENGTH statement comes first, Name is defined first and becomes the first variable in the dataset. Use only $ instead of $30. after "Name" in the INPUT statement.
data example2;
length Name $30;
input ID Name $ Score;
cards;
1 DeepanshuBhalla 22
2 AttaPat 21
3 XonxiangnamSamnuelnarayan 33
;
proc print noobs;
run;
Output
The order of variables changes: Name becomes the first column because the LENGTH statement defines it before the INPUT statement mentions ID and Score.
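If the original column order matters, one possible workaround is to list every variable in the LENGTH statement in the order you want. This is a sketch rather than the author's method; the dataset name example2b and the length of 8 for the numeric variables (the SAS default) are assumptions.

```sas
data example2b;
/* Variables enter the dataset in the order they are first mentioned,
   so declaring ID before Name keeps ID in the first column. */
length ID 8 Name $30 Score 8;
input ID Name $ Score;
cards;
1 DeepanshuBhalla 22
2 AttaPat 21
3 XonxiangnamSamnuelnarayan 33
;
proc print noobs;
run;
```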

Method III : Use Ampersand (&) and Put Extra Space

We can use the ampersand (&) to tell SAS to read the variable until it reaches two or more consecutive spaces as a delimiter. This technique is very useful when the variable contains two or more words, for example "Deepanshu Bhalla" rather than "DeepanshuBhalla".

Note: there are 2 spaces before 22, 21 and 33.
data example1;
input ID Name & $30. Score;
cards;
1 DeepanshuBhalla  22
2 AttaPat  21
3 XonxiangnamSamnuelnarayan  33
;
proc print noobs;
run;

Example II : When a variable contains more than 1 word

In this case, we have a space between First Name and Last Name and we want to store both the first and last names in a single variable.

Example 2 : Read Messy Data

In this case, the following methods do not work.

  1. Colon modifier (:) does not work for a variable having multiple words
  2.  LENGTH Statement prior to INPUT Statement does not work here.
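
A minimal sketch of the first failure, using a hypothetical dataset name colon_fail: with the colon modifier, SAS stops reading Name at the first space, so the last name is passed on to Score.

```sas
data colon_fail;
/* The colon modifier ends Name at the first space. */
input ID Name :$30. Score;
cards;
1 Deepanshu Bhalla 22
2 Atta Pat 21
3 Xonxiangnam Samnuelnarayan 33
;
proc print noobs;
run;
```

Here Name should hold only the first name, and Score should be missing because SAS tries to read the last name as a number. The log reports invalid data for Score.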

Using the ampersand (&) and adding an ADDITIONAL space before the score works.
data example1;
input ID Name & $30. Score;
cards;
1 Deepanshu Bhalla  22
2 Atta Pat  21
3 Xonxiangnam Samnuelnarayan  33
;
proc print noobs;
run;
This trick also works when reading data from an external file.
data temp;
infile "C:\Users\Deepanshu\Desktop\file1.txt";
input ID Name & $30. Score;
proc print noobs;
run;
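To try this without creating a file by hand, the sketch below writes a small temporary file and reads it back. The fileref demo, the dataset name temp2, and the TRUNCOVER option are assumptions for illustration; the example above reads file1.txt directly.

```sas
/* Temporary file managed by SAS; the fileref name "demo" is hypothetical. */
filename demo temp;

/* Write sample records with two spaces between the name and the score. */
data _null_;
file demo;
put "1 Deepanshu Bhalla  22";
put "2 Atta Pat  21";
put "3 Xonxiangnam Samnuelnarayan  33";
run;

/* Read the file back with the ampersand modifier. */
data temp2;
infile demo truncover;
input ID Name & $30. Score;
run;

proc print data=temp2 noobs;
run;

/* Release the temporary fileref. */
filename demo clear;
```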
About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 8 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Telecom and Human Resource.

32 Responses to "SAS : Read Character Variable of Varying Length"
  1. Nice tips to resolve time consuming issues for SAS beginners

  2. Hi,

    Is it possible to get all SAS tutorials as a PDF file? :)

  3. really learnt a new thing here.....
    appreciating your efforts
    thanx

  4. I appreciate your efforts in explaining this...
    Thanks.

  5. data read;
    input cc spent;
    cards;
    cc spend
    1 100
    1 200
    1 550
    1 100
    1 200
    1 550
    1 100
    2 200
    2 550
    2 200
    2 200
    2 550
    2 200
    2 900
    3 750
    3 550
    3 1300
    3 1900
    3 750
    ;
    run;

    this code is giving error could you please tell me why?

    Replies
    1. Hi, you have declared cc and spent as numeric variables, but the first line after cards ("cc spend") passes character values.

    2. You didn't ask Amit sir about this....:D

  6. In this scenario the name variable has a space between the first name and the last name. How can we read the data? Normally we use & but it's not working. Can you guys tell me?
    Ex
    Data student;
    Input studid studname$ rank;
    Cards;
    101 Rajkumar varma 20
    102 Rajesh 23
    103 Manojkumar p 19
    104 saravanakumar prudhvi 21
    Run;
    Can you please explain how to read data like this?

    Replies
    1. Please use below code. It should work.
      Data student;
      Input studid studname & $30. rank;
      Cards;
      101 Rajkumar varma 20
      102 Rajesh 23
      103 Manojkumar p 19
      104 saravanakumar prudhvi 21
      ;
      proc print noobs;
      Run;

    2. Just don't forget to put 2 spaces before numbers 20, 23, 19, 21.

    3. As you are having space in name so you have to use & to read it

    4. if u don't want to put 2 spaces before 20, 23, 19, 21 then use this code


      /* first read the data*/
      Data student;
      length studname $30;
      Input studid studname & $30. ;
      Cards;
      101 Rajkumar varma 20
      102 Rajesh 23
      103 Manojkumar p 19
      104 saravanakumar prudhvi 21
      ;

      proc print noobs;
      Run;

      /*second create rank and studname*/
      data student;
      set student;
      /* take the last word as a numeric rank, then strip it from the name */
      rank=input(scan(studname,-1,' '),8.);
      studname=substr(studname,1,length(studname)-length(scan(studname,-1,' '))-1);
      proc print noobs;
      run;

    5. Rakhi Aggarwal14 May 2018 at 21:45

      In the first code above, if we have already set 30 with LENGTH, why are we using it again in INPUT?

  7. In this scenario the name variable has a space between the first name and the last name. How can we read the data? Normally we use & but it's not working. Can you guys tell me?
    Ex
    Data student;
    Input studid studname$ rank;
    Cards;
    101 Rajkumar varma 20
    102 Rajesh 23
    103 Manojkumar p 19
    104 saravanakumar prudhvi 21
    Run;
    Can you please explain how to read data like this?

    Replies
    1. This comment has been removed by the author.

    2. Use below code
      Data student;
      Input studid studname& $21. rank;
      Cards;
      101 Rajkumar varma  20
      102 Rajesh  23
      103 Manojkumar p  19
      104 saravanakumar prudhvi  21
      ;
      Run;
      I have given a double space between studname and rank

    3. Stuck with the same kind of problem.
      Were you able to find a way to read the data with spaces between the first variable and second variable?
      Please let me know.


  8. I need the output of lastname of this type of data:


    data ss;
    input name$ 40.;
    cards;
    Shanmugam ram anand
    vadi vel raja kumar
    ram jaya
    ravi
    SERVICIOS PROTEXA CONSTRUCTION
    ;
    run;
    proc print;
    run;




    i need output as follows
    output:
    anand
    kumar
    jaya
    construction

    please suggest the sas programme to get this output.

    Replies
    1. data sub;
      set ss;
      e= scan(name,-1,'');
      name=e;
      keep name;
      run;

    2. Hi Shilpi, Can you please assist in explaining the use of name=e in the fourth step of the above code.

      Regards

  9. This comment has been removed by the author.

  10. In Example 1 where there are no spaces between first and last name, we can also use truncover to avoid the problem of SAS reading next variable when the variable length is less than passed in input statement. Correct me if I am wrong.

  11. After importing the Excel file to SAS, how do I view the data set? Is there any other step involved before importing the data, like assigning a name to the data or anything like that? Please provide clarification regarding this.

  12. thanks for this amazing content .
    please help me on below.

    'The colon modifier is also used to read numeric data that contains special characters such as comma For example 1,000.'

    I have tried but wrong o/p.
    data ex2;
    input ID Name:$30. Score fee:10. ;
    cards;
    1 DeepanshuBhalla 22 1,000
    2 AttaPat 21 2,000
    3 XonxiangnamSamnuelnarayan 33 3,000
    ;
    run;

    Replies
    1. Added more explanation in the post. Hope it helps!

  13. You have missed the $ sign while defining the fee variable in the input statement. Try fee:$10.
    Otherwise use fee comma5.
    Both will work.

