Avoid Truncation in PROC IMPORT

Deepanshu Bhalla Add Comment

This tutorial explains how to stop truncation in character variables while importing CSV or Tab files with PROC IMPORT. It is a common issue when your CSV file has a character variable having inconsistent length such as open-end comments, company names and addresses etc.

Important Note :
By default, SAS scans 20 rows to determine the appropriate data type and length for the columns.
Sample Dataset

The sample data is shown in the image below. We have two variables named ID and Score. ID is a numeric variable and the Score is a character variable.

Sample SAS Dataset
Importing CSV File with PROC IMPORT
/* For demonstration, use 3 rows for guessing the column length */
proc import datafile="C:\Users\Deepanshu\Documents\dat2.csv"
    dbms=csv replace
    guessingrows=3; /* if omitted, the default is 20 */
proc print noobs;

The numeric variable "ID" didn't get truncated. However, the character variableScore got truncated. As we have defined guessingrows=3, SAS considers the length of the character variable based on the third row (including header row) of the respective variable.

PROC IMPORT : Truncation Issue

For demonstration purpose, we have used guessingrows=3. If this option is omitted, SAS would scan first 20 rows.

Method I : Fix Truncating Character Variables

The simplest way to fix this issue is to use the higher number in GUESSINGROWS.

Change GUESSINGROWS to 3000 (or higher value)

proc import datafile="C:\Users\Deepanshu\Documents\dat2.csv"
    dbms=csv replace
proc print noobs;

In SAS 9.2, the maximum value of GUESSINGROWS is 32,767.

In SAS 9.3 or above, the maximum value of GUESSINGROWS is 2147483647

To define the max value, write GUESSINGROWS = MAX
Should I use GUESSINGROWS= MAX for simplicity?

It depends. If your file is heavy and contains hundreds of thousands of records, it would make the import process VERY SLOW. You mightfind that earlier it was taking 20-30 seconds to import the file when you were not using GUESSINGROWS= MAX. Later it is taking 5-10 minutes to import it. If your file is short, don't hesitate to use GUESSINGROWS = MAX.

Method 2 : Use the generated PROC IMPORT code and Modify it

Step 1 : Run PROC IMPORT Code Once.

440   data WORK.TEMP  ;
441   %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
442   infile 'C:\Users\Deepanshu\Documents\dat2.csv' delimiter = ',' MISSOVER DSD lrecl=32767
442! firstobs=2 ;
443     informat ID best32. ;
444     informat Score $6. ;
445     format ID best12. ;
446     format Score $6. ;
447   input
448         ID
449         Score $
450   ;
451   if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
452   run;

Step 2 : Copy the generated import code from the log to your program, remove the line numbers.

Step 3 : Change width of the Score Variable from 6 to 30 in both INFORMAT and FORMAT. It's done!

Other Application of GUESSINGROWS

It is useful if you have both character and numeric values in a variable and the first 20 observations are all numeric and the remaining observations are character values. So SAS would make all the character values blank.

SAS : Determine Variable Type
Determine Variable Type
Related Posts
Spread the Word!
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

Post Comment 0 Response to "Avoid Truncation in PROC IMPORT"
Next → ← Prev