4 Ways to Select a Random Sample in SAS

Deepanshu Bhalla

In this tutorial, we will cover multiple ways to select a random sample in SAS.

PROC SURVEYSELECT: Select a Random Sample with a Fixed Number of Observations
proc surveyselect data=mydata /* Select random sample from dataset "mydata" */
    out=newdata /* Output the random sample to dataset "newdata" */
    method=srs /* Use Simple Random Sampling method */
    sampsize=5 /* Select 5 observations randomly */
    seed=123; /* Set seed to get same random sampling every time you run */
run;
PROC SURVEYSELECT: Select a Random Sample with a Fixed Percentage of Observations
proc surveyselect data=mydata /* Select random sample from dataset "mydata" */
    out=newdata /* Output the random sample to dataset "newdata" */
    method=srs /* Use Simple Random Sampling method */
    samprate=0.3 /* Select 30% of observations randomly */
    seed=123; /* Set seed to get same random sampling every time you run */
run;
PROC SQL: Select a Random Sample with a Fixed Number of Observations
proc sql outobs=5 /* Select 5 observations randomly */;
create table newdata as
select * from mydata
order by ranuni(123) /* Set seed 123 to get same random sampling every time you run */;
quit;
Data Step: Select a Random Sample with a Fixed Number of Observations
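data newdata;
call streaminit(123); /* Set seed to get same random sampling every time you run */
set mydata;
n = rand("uniform"); /* Assign a uniform random number to each row */
run;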

proc sort data=newdata out=newdata2(drop=n);
by n;
run;

data newdata2;
set newdata2 (obs=5) /* Select 5 observations randomly */;
run;

In the examples below, we are using the built-in SAS dataset sashelp.cars.

PROC SURVEYSELECT: Random Sampling

Random Sample with a Fixed Number of Observations using PROC SURVEYSELECT

The following SAS code uses PROC SURVEYSELECT to perform simple random sampling (SRS). After running this code, the "newdata" dataset contains a simple random sample of 5 observations from the "sashelp.cars" dataset.

proc surveyselect data=sashelp.cars
    out=newdata
    method=srs
    sampsize=5
    seed=123;
run;

The method=srs option selects rows with equal probability and without replacement. The sampsize=5 option sets the sample size to 5 rows. The seed=123 option sets the random number generator seed to 123. Setting the seed ensures reproducibility: if you run this code multiple times, you get the same random sample each time.
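One way to check the reproducibility claim is to draw the sample twice with the same seed and compare the results. The sketch below is illustrative only: the dataset names sample_a and sample_b are hypothetical, and the noprint option is added just to suppress the procedure's summary output.

```sas
/* Draw the same sample twice with an identical seed */
proc surveyselect data=sashelp.cars out=sample_a
    method=srs sampsize=5 seed=123 noprint;
run;

proc surveyselect data=sashelp.cars out=sample_b
    method=srs sampsize=5 seed=123 noprint;
run;

/* PROC COMPARE reports whether the two datasets differ */
proc compare base=sample_a compare=sample_b;
run;
```

If the seed behaves as described, PROC COMPARE should report no differences between the two datasets. Changing the seed in one of the calls should normally produce a different sample.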

PROC SURVEYSELECT: Random Sampling

Random Sample with a Fixed Percentage of Observations using PROC SURVEYSELECT

The sashelp.cars dataset has 428 observations in total. The following code randomly selects 30% of these 428 observations. Since 30% of 428 is 128.4, PROC SURVEYSELECT rounds the sample size up, resulting in 129 observations in the "newdata" dataset. The samprate=0.3 option sets the sampling rate to 30%.

proc surveyselect data=sashelp.cars
    out=newdata
    method=srs
    samprate=0.3
    seed=123;
run;
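The sample size arithmetic can be reproduced with a short check. This is a minimal sketch that assumes the sample size is the sampling rate times the row count, rounded up, which matches the 129 observations reported above. The macro variable n_total is a hypothetical name introduced for illustration.

```sas
/* Count the rows in the input dataset and store the count in a macro variable */
proc sql noprint;
    select count(*) into :n_total trimmed
    from sashelp.cars;
quit;

/* Apply the 30% rate and round up to a whole number of rows */
data _null_;
    expected = ceil(0.3 * &n_total);
    put "Input rows: &n_total";
    put "Expected sample size: " expected;
run;
```

The values are written to the SAS log, where they can be compared with the number of observations in "newdata".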

PROC SQL: Random Sampling

The following SAS code uses PROC SQL to perform random sampling. The outobs= option limits the output to 5 observations.

proc sql outobs=5;
create table newdata as
select * from sashelp.cars
order by ranuni(123);
quit;

The order by ranuni(123) clause sorts the rows in random order. The ranuni(123) function generates a uniform random number between 0 and 1 for each row, using 123 as the seed. Because the rows are sorted randomly, the first 5 rows kept by outobs=5 form a random sample.
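To see the random sort key that drives the ordering, the random number can be kept as a column. This is a sketch of a variant, not the code used above: the table name newdata_keyed and the column name rand_key are hypothetical.

```sas
proc sql outobs=5;
    create table newdata_keyed as
    select *,
           ranuni(123) as rand_key /* random sort key, one value per row */
    from sashelp.cars
    order by rand_key;             /* smallest keys come first */
quit;
```

The 5 rows in the output are those with the smallest values of rand_key, which makes it easier to see why sorting by a random number yields a random sample.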

Data Step: Random Sampling

The following code shows how to use the SAS Data Step to perform random sampling.

data newdata;
call streaminit(123); /* Set seed to get same random sampling every time you run */
set sashelp.cars;
n = rand("uniform"); /* Assign a uniform random number to each row */
run;

proc sort data=newdata out=newdata2(drop=n);
by n;
run;

data newdata2;
set newdata2 (obs=5) /* Select 5 observations randomly */;
run;

In the SAS code above, the first data step copies the sashelp.cars dataset into a new dataset named "newdata" and adds a variable "n" that holds a uniform random number for each row. The call streaminit(123) statement seeds the random number generator so that the sample is the same every time you run the code. Next, the newdata dataset is sorted by the variable "n", and the sorted dataset is saved as newdata2, dropping the variable "n" from it. Finally, the last data step overwrites newdata2 with its own first 5 observations. The obs=5 option in the set statement limits the number of observations read to 5. Because the rows are in random order after the sort, these 5 rows form a random sample.
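The same three steps can be parameterized so that the sample size and seed are set in one place. This is a minimal sketch under stated assumptions: the macro variables n_sample and seed and the dataset names keyed, shuffled and sample_out are hypothetical names chosen for illustration.

```sas
%let n_sample = 5;   /* hypothetical macro variable: number of rows to keep */
%let seed     = 123; /* hypothetical macro variable: random number seed */

/* Step 1: attach a uniform random number to every row */
data keyed;
    call streaminit(&seed);
    set sashelp.cars;
    n = rand("uniform");
run;

/* Step 2: sort by the random number and drop it from the output */
proc sort data=keyed out=shuffled(drop=n);
    by n;
run;

/* Step 3: keep the first &n_sample rows of the shuffled data */
data sample_out;
    set shuffled(obs=&n_sample);
run;
```

Using separate dataset names for each step keeps the intermediate results available for inspection instead of overwriting them.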
