In this tutorial, we will cover multiple ways to select a random sample in SAS.
PROC SURVEYSELECT: Select a Random Sample with a Fixed Number of Observations
proc surveyselect data=mydata  /* Select random sample from dataset "mydata" */
    out=newdata                /* Output the random sample to dataset "newdata" */
    method=srs                 /* Use Simple Random Sampling method */
    sampsize=5                 /* Select 5 observations randomly */
    seed=123;                  /* Set seed to get same random sampling every time you run */
run;
PROC SURVEYSELECT: Select a Random Sample with a Fixed Percentage of Observations
proc surveyselect data=mydata  /* Select random sample from dataset "mydata" */
    out=newdata                /* Output the random sample to dataset "newdata" */
    method=srs                 /* Use Simple Random Sampling method */
    samprate=0.3               /* Select 30% of observations randomly */
    seed=123;                  /* Set seed to get same random sampling every time you run */
run;
PROC SQL: Select a Random Sample with a Fixed Number of Observations
proc sql outobs=5 /* Select 5 observations randomly */;
    create table newdata as
    select * from mydata
    order by ranuni(123) /* Set seed 123 to get same random sampling every time you run */;
quit;
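Data Step: Select a Random Sample with a Fixed Number of Observations
data newdata;
    call streaminit(123);   /* Seed RAND so the sample is reproducible */
    set mydata;
    n = rand("uniform");    /* Random sort key between 0 and 1 */
run;
proc sort data=newdata out=newdata2(drop=n);
    by n;                   /* Sorting by the random key shuffles the rows */
run;
data newdata2;
    set newdata2 (obs=5) /* Select 5 observations randomly */;
run;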
In the examples below, we are using the built-in SAS dataset sashelp.cars.
PROC SURVEYSELECT: Random Sampling
Random Sample with a Fixed Number of Observations using PROC SURVEYSELECT
The following SAS code shows how to use PROC SURVEYSELECT to perform simple random sampling (SRS). After running this code, the "newdata" dataset will contain a simple random sample of 5 observations from the "sashelp.cars" dataset.
proc surveyselect data=sashelp.cars
    out=newdata
    method=srs
    sampsize=5
    seed=123;
run;
The sampsize=5 option sets the sample size to 5 rows. The seed=123 option sets the random number generator seed to 123. Setting the seed ensures reproducibility: if you run this code multiple times, you will get the same random sample each time.
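One way to check the reproducibility claim is to draw the sample twice with the same seed and compare the results. The following is a minimal sketch; the dataset names sample_a and sample_b are illustrative and not part of the original example.

```sas
/* Draw the same sample twice with an identical seed.
   sample_a and sample_b are hypothetical names used only for this check. */
proc surveyselect data=sashelp.cars out=sample_a
    method=srs sampsize=5 seed=123 noprint;
run;

proc surveyselect data=sashelp.cars out=sample_b
    method=srs sampsize=5 seed=123 noprint;
run;

/* Compare the two samples; no differences are expected when the seeds match */
proc compare base=sample_a compare=sample_b;
run;
```

Changing the seed in one of the two steps should make PROC COMPARE report differences, since a different sample is then drawn.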
Random Sample with a Fixed Percentage of Observations using PROC SURVEYSELECT
The sashelp.cars dataset has 428 observations in total. The following code randomly selects 30% of these 428 observations, resulting in 129 observations in the "newdata" dataset (30% of 428 is 128.4, which PROC SURVEYSELECT rounds up). The samprate=0.3 option indicates the desired sampling rate of 30%.
proc surveyselect data=sashelp.cars
    out=newdata
    method=srs
    samprate=0.3
    seed=123;
run;
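To confirm the size of the 30% sample, you could count the rows in the output dataset. This is a small sketch that assumes the PROC SURVEYSELECT step above has already run; the column alias n_rows is illustrative.

```sas
/* Count the rows in the sampled dataset.
   Assumes "newdata" was created by the samprate=0.3 step above. */
proc sql;
    select count(*) as n_rows   /* n_rows is an illustrative alias */
    from newdata;
quit;
```

With sashelp.cars and samprate=0.3, the count should match the 129 observations stated above.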
PROC SQL: Random Sampling
The following SAS code shows how to use PROC SQL to perform random sampling. The outobs= option limits the output to only 5 observations.
proc sql outobs=5;
    create table newdata as
    select * from sashelp.cars
    order by ranuni(123);
quit;
order by ranuni(123): The ORDER BY clause sorts the rows by ranuni(123), which generates a random number between 0 and 1 for each row, starting from the seed value 123. Because the rows end up in random order, the first 5 rows kept by outobs=5 form a random sample.
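To see what the random ordering does, it can help to keep the random key as a visible column. The sketch below is a variation on the query above, not the original method: the column name rand_key and the table name newdata_keyed are illustrative assumptions.

```sas
/* Variation that stores the random sort key as a column so it can be inspected.
   rand_key and newdata_keyed are hypothetical names used only for illustration. */
proc sql outobs=5;
    create table newdata_keyed as
    select *, ranuni(123) as rand_key   /* Random number between 0 and 1 per row */
    from sashelp.cars
    order by rand_key;                  /* Smallest keys first, so row order is random */
quit;
```

The resulting table holds the 5 rows with the smallest rand_key values, which shows that the sample is simply the top of a randomly sorted table.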
Data Step: Random Sampling
The following code shows how to use the SAS Data Step to perform random sampling.
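data newdata;
    call streaminit(123);   /* Seed RAND so the sample is reproducible */
    set sashelp.cars;
    n = rand("uniform");    /* Random sort key between 0 and 1 */
run;
proc sort data=newdata out=newdata2(drop=n);
    by n;                   /* Sorting by the random key shuffles the rows */
run;
data newdata2;
    set newdata2 (obs=5) /* Select 5 observations randomly */;
run;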
The code above works in three steps. The first DATA step copies every row of sashelp.cars into a new dataset named newdata and adds a variable "n" that holds a uniform random number for each row. PROC SORT then sorts newdata by "n", which puts the rows in random order, and saves the result as newdata2 with "n" dropped. The final DATA step overwrites newdata2 with only its first 5 observations: the obs=5 option in the SET statement stops reading after 5 rows, so those rows form the random sample.