This tutorial explains how to stop truncation in character variables while importing CSV or Tab files with PROC IMPORT. It is a common issue when your CSV file has a character variable having inconsistent length such as open-end comments, company names and addresses etc.
Important Note :
Sample Dataset
The sample data is shown in the image below. We have two variables named ID and Score. ID is a numeric variable and the Score is a character variable.
Importing CSV File with PROC IMPORT
The numeric variable "ID" didn't get truncated. However, the character variable Score got truncated. As we have defined guessingrows=3, SAS considers the length of the character variable based on the third row (including header row) of the respective variable.
For demonstration purpose, we have used guessingrows=3. If this option is omitted, SAS would scan first 20 rows.
Method I : Fix Truncating Character Variables
The simplest way to fix this issue is to use the higher number in GUESSINGROWS option.
Important Note :
By default, SAS scans 20 rows to determine the appropriate data type and length for the columns.
Sample Dataset
The sample data is shown in the image below. We have two variables named ID and Score. ID is a numeric variable and the Score is a character variable.
![]() |
Sample Dataset |
Importing CSV File with PROC IMPORT
/* For demonstration, use 3 rows for guessing the column length */Output
proc import datafile="C:\Users\Deepanshu\Documents\dat2.csv"
dbms=csv replace
out=temp;
guessingrows=3; /* if omitted, the default is 20 */
proc print noobs;
run;
The numeric variable "ID" didn't get truncated. However, the character variable Score got truncated. As we have defined guessingrows=3, SAS considers the length of the character variable based on the third row (including header row) of the respective variable.
![]() |
PROC IMPORT : Truncation Issue |
For demonstration purpose, we have used guessingrows=3. If this option is omitted, SAS would scan first 20 rows.
Method I : Fix Truncating Character Variables
The simplest way to fix this issue is to use the higher number in GUESSINGROWS option.
Change GUESSINGROWS to 3000 (or higher value)
proc import datafile="C:\Users\Deepanshu\Documents\dat2.csv"
dbms=csv replace
out=temp;
guessingrows=3000;
proc print noobs;
run;
In SAS 9.2, the maximum value of GUESSINGROWS is 32,767.
In SAS 9.3 or above, the maximum value of GUESSINGROWS is 2147483647
To define the max value, write GUESSINGROWS = MAX
Should I use GUESSINGROWS= MAX for simplicity?
It depends. If your file is heavy and contains hundreds of thousands of records, it would make the import process VERY SLOW. You might find that earlier it was taking 20-30 seconds to import the file when you were not using GUESSINGROWS= MAX. Later it is taking 5-10 minutes to import it. If your file is short, don't hesitate to use GUESSINGROWS = MAX.
Method 2 : Use the generated PROC IMPORT code and Modify it
Step 1. Run PROC IMPORT Code Once
440 data WORK.TEMP ;
441 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
442 infile 'C:\Users\Deepanshu\Documents\dat2.csv' delimiter = ',' MISSOVER DSD lrecl=32767
442! firstobs=2 ;
443 informat ID best32. ;
444 informat Score $6. ;
445 format ID best12. ;
446 format Score $6. ;
447 input
448 ID
449 Score $
450 ;
451 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
452 run;
Step 2. Copy the generated import code from the log to your program, remove the line numbers.
Step 3. Change width of the Score Variable from 6 to 30 in both INFORMAT and FORMAT. It's done!
Other Application of GUESSINGROWS
It is useful if you have both character and numeric values in a variable and the first 20 observations are all numeric and the remaining observations are character values. So SAS would make all the character values blank.
![]() |
Determine Variable Type |
Post a Comment