This tutorial explains how to stop truncation in character variables while importing CSV or Tab files with PROC IMPORT. It is a common issue when your CSV file has a character variable having inconsistent length such as open-end comments, company names and addresses etc.
By default, SAS scans 20 rows to determine the appropriate data type and length for the columns.
The sample data is shown in the image below. We have two variables named ID and Score. ID is a numeric variable and the Score is a character variable.
/* For demonstration, use 3 rows for guessing the column length */ proc import datafile="C:\Users\Deepanshu\Documents\dat2.csv" dbms=csv replace out=temp; guessingrows=3; /* if omitted, the default is 20 */ proc print noobs; run;
The numeric variable "ID" didn't get truncated. However, the character variableScore got truncated. As we have defined guessingrows=3, SAS considers the length of the character variable based on the third row (including header row) of the respective variable.
For demonstration purpose, we have used guessingrows=3. If this option is omitted, SAS would scan first 20 rows.
Method I : Fix Truncating Character Variables
The simplest way to fix this issue is to use the higher number in GUESSINGROWS.
Change GUESSINGROWS to 3000 (or higher value)
proc import datafile="C:\Users\Deepanshu\Documents\dat2.csv" dbms=csv replace out=temp; guessingrows=3000; proc print noobs; run;
In SAS 9.2, the maximum value of GUESSINGROWS is 32,767.
In SAS 9.3 or above, the maximum value of GUESSINGROWS is 2147483647
To define the max value, write GUESSINGROWS = MAX
It depends. If your file is heavy and contains hundreds of thousands of records, it would make the import process VERY SLOW. You mightfind that earlier it was taking 20-30 seconds to import the file when you were not using GUESSINGROWS= MAX. Later it is taking 5-10 minutes to import it. If your file is short, don't hesitate to use GUESSINGROWS = MAX.
Method 2 : Use the generated PROC IMPORT code and Modify it
Step 1 : Run PROC IMPORT Code Once.
440 data WORK.TEMP ; 441 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ 442 infile 'C:\Users\Deepanshu\Documents\dat2.csv' delimiter = ',' MISSOVER DSD lrecl=32767 442! firstobs=2 ; 443 informat ID best32. ; 444 informat Score $6. ; 445 format ID best12. ; 446 format Score $6. ; 447 input 448 ID 449 Score $ 450 ; 451 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ 452 run;
Step 2 : Copy the generated import code from the log to your program, remove the line numbers.
Step 3 : Change width of the Score Variable from 6 to 30 in both INFORMAT and FORMAT. It's done!
It is useful if you have both character and numeric values in a variable and the first 20 observations are all numeric and the remaining observations are character values. So SAS would make all the character values blank.
Determine Variable Type |
Share Share Tweet