Avoid Truncation in PROC IMPORT

This tutorial explains how to stop truncation in character variables while importing CSV or Tab files with PROC IMPORT. It is a common issue when your CSV file has a character variable having inconsistent length such as open-end comments, company names and addresses etc.

Important Note :
By default, SAS scans 20 rows to determine the appropriate data type and length for the columns.

Sample Dataset

The sample data is shown in the image below. We have two variables named ID and Score. ID is a numeric variable and the Score is a character variable.

Sample Dataset

Importing CSV File with PROC IMPORT 
/* For demonstration, use 3 rows for guessing the column length */
proc import datafile="C:\Users\Deepanshu\Documents\dat2.csv"
    dbms=csv replace
    out=temp;
    guessingrows=3; /* if omitted, the default is 20 */
proc print noobs;
run;
Output
The numeric variable "ID" didn't get truncated. However, the character variable Score got truncated. As we have defined guessingrows=3, SAS considers the length of the character variable based on the third row (including header row) of the respective variable.

PROC IMPORT : Truncation Issue

For demonstration purpose, we have used guessingrows=3. If this option is omitted, SAS would scan first 20 rows.

Method I : Fix Truncating Character Variables

The simplest way to fix this issue is to use the higher number in GUESSINGROWS option.

Change GUESSINGROWS to 3000 (or higher value)
proc import datafile="C:\Users\Deepanshu\Documents\dat2.csv"
    dbms=csv replace
    out=temp;
    guessingrows=3000;
proc print noobs;
run;
In SAS 9.2, the maximum value of GUESSINGROWS is 32,767.
In SAS 9.3 or above, the maximum value of GUESSINGROWS is 2147483647
To define the max value, write GUESSINGROWS = MAX

Should I use GUESSINGROWS= MAX for simplicity?

It depends. If your file is heavy and contains hundreds of thousands of records, it would make the import process VERY SLOW. You might find that earlier it was taking 20-30 seconds to import the file when you were not using  GUESSINGROWS= MAX. Later it is taking 5-10 minutes to import it. If your file is short, don't hesitate to use GUESSINGROWS = MAX.


Method 2 : Use the generated PROC IMPORT code and Modify it

Step 1. Run PROC IMPORT Code Once

440      data WORK.TEMP    ;
441      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
442      infile 'C:\Users\Deepanshu\Documents\dat2.csv' delimiter = ',' MISSOVER DSD lrecl=32767
442! firstobs=2 ;
443         informat ID best32. ;
444         informat Score $6. ;
445         format ID best12. ;
446         format Score $6. ;
447      input
448                  ID
449                  Score $
450      ;
451      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
452      run;

Step 2. Copy the generated import code from the log to your program, remove the line numbers.

Step 3. Change width of the Score Variable from 6 to 30 in both INFORMAT and FORMAT. It's done!

Other Application of GUESSINGROWS

It is useful if you have both character and numeric values in a variable and the first 20 observations are all numeric and the remaining observations are character values. So SAS would make all the character values blank.

Determine Variable Type

ListenData Logo
Spread the Word!
Share
Related Posts
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and Human Resource.

0 Response to "Avoid Truncation in PROC IMPORT"

Post a Comment

Next → ← Prev
Looks like you are using an ad blocker!

To continue reading you need to turnoff adblocker and refresh the page. We rely on advertising to help fund our site. Please whitelist us if you enjoy our content.