Advanced SAS Interview Questions and Answers

Deepanshu Bhalla 33 Comments
These are the most frequently asked Advanced SAS Interview Questions and Answers. It covers topics on PROC SQL, SAS Macros, Advanced Data Manipulation case studies. These questions are best suited for interviews for SAS Developer, SAS Programmer role. It includes some common tricky and tough questions that are generally asked in an interview.
Advanced SAS Interview Questions and Answers
To see a list of Base SAS Interview Questions, Click on this link :  50+ Base SAS Interview Questions and Answers

1. Two ways to select every second row in a data set
data example;
set sashelp.class;
if mod(_n_,2) eq 0;
run;
MOD Function returns the remainder from the division of the first argument by the second argument. _N_ corresponds to each row. The second row would be calculated like (2/2) which returns zero remainder.
data example1;
do i = 2 to nobs by 2;
set sashelp.class point=i nobs=nobs;
output;
end;
stop;
run;

2. How to select every second row of a group

Suppose we have a table sashelp.class. We want every second row by variable 'sex'.
proc sort data = sashelp.class;
by sex;
run;
data example2 (drop = N);
set sashelp.class;
by sex;
if first.sex then N = 1;
else N +1;
if N = 2 then output;
run;
Tutorial : First. and Last. Variables


3. How to calculate cumulative sum by group

Create Sample Data
data abcd;
input x y;
cards;
1 25
1 28
1 27
2 23
2 35
2 34
3 25
3 29
;
run; 
Cumulative Sum by Group
Cumulative Sum by X
data example3;
set abcd;
if first.x then z1 = y;
else z1 + y;
by x;
run;
4. Can both WHERE and IF statements be used for subsetting on a newly derived variable?
SAS : WHERE vs. IF
No. Only IF statement can be used for subsetting when it is based on a newly derived variable. WHERE statement would return an error "newly derived variable is not on file".

Please note that WHERE Option can be used for subsetting on a newly created variable.
data example4 (where =(z <=50));
set abcd;
z = x*y;
run;
5. Select the Second Highest Score with PROC SQL
data example5;
input Name $ Score;
cards;
sam 75
dave 84
sachin 92
ram 91
;
run;
proc sql;
select *
from example5
where score in (select max(score) from example5 where score not in (select max(score) from example5));
quit; 
Tutorial : Learn PROC SQL with 20 Examples


6. Two ways to create a macro variable that counts the number of observations in a dataset

data _NULL_;
if 0 then set sashelp.class nobs=n;
call symputx('totalrows',n);
stop;
run;
%put nobs=&totalrows.;
proc sql;
select count(*) into: nrows from sashelp.class;
quit;
%put nobs=%left(&nrows.);

7. Suppose you have data for employees. It comprises of employees' name, ID and manager ID. You need to find out manager name against each employee ID.

SQL: Self Join
Create Sample Data
data example2;
input Name $ ID ManagerID;
cards;
Smith 123 456
Robert 456  .
William 222 456
Daniel 777 222
Cook 383 222
;
run;
SQL Self Join
proc sql;
create table want as
select a.*, b.Name as Manager
from example2 as a left join example2 as b
on a.managerid = b.id;
quit;

Data Step : Self Join 
proc sort data=example2 out=x;
by ManagerID;
run;
proc sort data=example2 out=y (rename=(Name=Manager ID=ManagerID ManagerID=ID));
by ID;
run;
data want;
merge x (in= a) y (in=b);
by managerid;
if a;
run;

8.  Create a macro variable and store TomDick&Harry

Issue : When the value is assigned to the macro variable, the ampersand placed after TomDick may cause SAS to interpret it as a macro trigger and an warning message would be occurred.
%let x = %NRSTR(TomDick&Harry);
%PUT &x.;
%NRSTR function is a macro quoting function which is used to hide the normal meaning of special tokens and other comparison and logical operators so that they appear as constant text as well as to mask the macro triggers ( %, &).
9. Difference between %STR and %NRSTR
Both %STR and %NRSTR functions are macro quoting functions which are used to hide the normal meaning of special tokens and other comparison and logical operators so that they appear as constant text. The only difference is %NRSTR can mask the macro triggers ( %, &) whereas %STR cannot.

10. How to pass unmatched single or double quotations text in a macro variable
%let eg  = %str(%'x);
%let eg2 = %str(x%");
%put &eg;
%put &eg2;
If the argument to %STR or %NRSTR contains an single or double quotation mark or an unmatched open or close parenthesis, precede each of these characters with a % sign.

11. How can we use COUNTW function in a macro
%let cntvar = %sysfunc(countw(&nvar));
There are several useful Base SAS function that are not directly available in Macro, %Sysfunc enables those function to make them work in a macro.

12.
%let x=temp;
%let n=3;
%let x3=result;
%let temp3 = result2;
 What %put &&x&n; and %put &&&x&n; would return?
  1. &&x&n : Two ampersands (&&) resolves to one ampersand (&) and scanner continues and then N resolves to 3 and then &x3 resolves to result.
  2. &&&x&n :  First two ampersands (&&) resolves to & and then X resolves to temp and then N resolves to 3. In last, &temp3 resolves to result2.

13. How to reference a macro variable in selection criteria
Use double quotes to reference a macro variable in a selection criteria. Single quotes would not work.
SAS : Reference Macro Variable

14. How to debug %IF %THEN statements in a macro code
MLOGIC option will display how the macro variable resolved each time in the LOG file as TRUE or FALSE for %IF %THEN.

15. Difference between %EVAL and %SYSEVALF functions 

Both %EVAL and %SYSEVALF are used to perform mathematical and logical operation with macro variables. %let last = %eval (4.5+3.2); returns error as %EVAL cannot perform arithmetic calculations with operands that have the floating point values. It is when the %SYSEVALF function comes into picture.
%let last2 = %sysevalf(4.5+3.2);
%put &last2;

16. What would be the value of i after the code below completes
data test;
set temp;
array nvars {3} x1-x3;
do i = 1 to 3;
if nvars{i} > 3 then nvars{i} =.;
end;
run;
Answer is 4. It is because when the first time the loop processes, the value of count is 1; the second time, 2; and the third time, 3. At the beginning of the fourth iteration, the value of count is 4, which is found to be greater than the stop value of 3 so the loop stops. However, the value of i is now 4 and not 3, the last value before it would be greater than 3 as the stop value.

17. How to compare two tables with PROC SQL

The EXCEPT operator returns rows from the first query that are not part of the second query.
proc sql;
select * from newfile
except
select * from oldfile;
quit;

18. Selecting Random Samples with PROC SQL

The RANUNI and OUTOBS functions can be used for selecting random samples. The RANUNI function is used to generate random numbers.
proc sql outobs = 10;
create table tt as
select * from sashelp.class
order by ranuni(1234);
quit;
19. How to use NODUPKEY kind of operation with PROC SQL

In PROC SORT, NODUPKEY option is used to remove duplicates based on a variable. In SQL, we can do it like this :
proc sql noprint;
create table tt (drop = row_num) as
select *, monotonic() as row_num
from readin
group by name
having row_num = min(row_num)
order by ID;
quit;

20. How to make SAS stop macro processing on Error

Check out this link - Stop SAS Macro on Error


21. Count Number of Variables assigned in a macro variables
%macro nvars (ivars);
%let n=%sysfunc(countw(&ivars));
%put &n;
%mend;
%nvars (X1 X2 X3 X4);

22. Two ways to assign incremental value by group

See the snapshot below -

Advanced SAS Interview Questions
Prepare Input Data
data xyz;
input x $;
cards;
AA
AA
AA
BB
BB
;
run;
Data Step Code
data example22;
set xyz;
if first.x then N+1;
by x;
proc print;
run;

PROC SQL Code
proc sql;
select a.x, b.N from xyz a
inner join
(select x, monotonic() as N
from (
select distinct x
from xyz)) b
on a.x=b.x;
quit;

23. Prepare a Dynamic Macro with %DO loop

Check out this link - Dynamic SAS Macro


24. Write a SAS Macro to extract Variable Names from a Dataset
*Selecting all the variables;
proc sql noprint;
select name into : vars separated by " "
from dictionary.columns
where LIBNAME = upcase("work")
and MEMNAME = upcase("predata");
quit;
The DICTIONARY.COLUMNS contains information such as name, type, length, and format, about all columns in the table. LIBNAME : Library Name, MEMNAME : Dataset Name
%put variables = &vars.;

25. How would DATA STEP MERGE and PROC SQL JOIN works on the following datasets shown in the image below?
Many to Many Merge

The DATA step does not handle many-to-many matching very well. When we perform many to many merges. the result should be a cartesian (cross) product. For example, if there are three records that match from one contributing data set to two records from the other, the resulting data set should have 3 × 2 = 6 records. Whereas, PROC SQL creates a cartesian product in case of many to many relationship.

Detailed Explanation - Many to Many Merge

26. Two ways to create a blank table

Copy structure of existing table
PROC SQL;
CREATE TABLE EXAMPLE2 LIKE TEMP;
QUIT;
Enforce FALSE condition in Selection Criteria
PROC SQL NOPRINT;
CREATE TABLE EXAMPLE2 AS
SELECT * FROM TEMP
WHERE 1=0;
QUIT;
27. How to insert rows in a table with PROC SQL



28. Difference between %LOCAL and %GLOBAL
%LOCAL is used to create a local macro variable during macro execution. It gets removed when macro finishes its processing.
%GLOBAL is used to create a global macro variable and would remain accessible till the end of a session . It gets removed when session ends.

29. Write a macro with CALL EXECUTE

Detailed Explanation of CALL EXECUTE


30. Write a macro to split data into N number of datasets

Suppose you are asked to write a macro to split large data into 2 parts (not static 2). In the macro, user should have flexibility to change the number of datasets to be created.
%macro split(inputdata=, noofsplits=2);
data %do i = 1 %to &noofsplits.;
split&i. %end;;
retain x;
set &inputdata. nobs=nobs;
if _n_ eq 1 then do;
if mod(nobs,&noofsplits.) eq 0
then x=int(nobs/&noofsplits.);
else x=int(nobs/&noofsplits.)+1;
end;
if _n_ le x then output split1;
%do i = 2 %to &noofsplits.;
else if _n_ le (&i.*x)
then output split&i.;
%end;
run;
%mend split;
%split(inputdata=temp, noofsplits=2);

31. Store value in each row of a variable into macro variables
data _null_;
set sashelp.class ;
call symput(cats('x',_n_),Name);
run;
%put &x1. &x2. &x3.;
The CATS function is used to concatenate 'x' with _N_ (row index number) and removes leading and trailing spaces to the result.

32. How PROC TRANSPOSE works?

Tutorial : PROC TRANSPOSE Explained

33. How to check if SAS dataset is empty?

Tutorial : Check number of observations


End Note

The above list of SAS interview questions are designed especially for experienced SAS programmers and analysts. These are real world examples with proper explanation. Most of the tough SAS interviews include SAS SQL and Macros questions. Before going for interview, you need to brush up your concepts of SAS programming. It is advised to go through concepts when practicing above interview questions.
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

33 Responses to "Advanced SAS Interview Questions and Answers"
  1. All these are an awesome resource.Thanks for these. Could you suggest any alternate solution to question no 7 using datastep ?

    ReplyDelete
    Replies
    1. Yes, question no. 7 can easily be done with Data step. I have updated the solution of question no. 7. Thanks!

      Delete
    2. Hi Deepanshu, I think a small thing need to be updated in merge syntax for the Question No. 7
      Data want;
      merge x(in=a) y(in=b drop = ID);
      by managerID;
      if a;
      run;
      we need to drop ID to get the desired result

      Delete
  2. Hi Deepanshu.. Your blogs are really helpful. I have finished by Base Programming certification with 98%. I am put up in Bangalore. Can you help me with companies looking for SAS candidates? Thanks in Advance

    ReplyDelete
  3. hi Deepanshu..i am looking for more sas macro question along with example.
    can you please forward me on mail address along with some macro book.
    thanks.. yashpalsingh942@gmail.com

    ReplyDelete
  4. how can we create an algo in sas to find any combination of say 3 letters. These 3 letters can have any other alphabets in between.

    ReplyDelete
  5. Great job man. I think ques 19 doesnt include the Nodupkey in the proc sql query. can you please correct it?

    ReplyDelete
    Replies
    1. Q19 explains how to replicate the usage of NODUPKEY in PROC SQL query.

      Delete
  6. Great Job dude. I have one query..
    Q : I need to check dataset is empty or not. How can i do this?

    ReplyDelete
    Replies
    1. data new;
      old data;
      if _n_ >=1 then newvar= nonempty;
      else newvar=empty;
      run;

      Delete
    2. using proc content we check

      Delete
    3. I have just written an article on this topic. See the link in Q33. Thanks!

      Delete
  7. Also how can we print multiple datasets using single proc print statement horizontally in output window. Can anyone help here

    ReplyDelete
  8. This comment has been removed by the author.

    ReplyDelete
  9. Can you please explain me the 30th question- Split macro
    Why do we need the last Do loop? in split macro?

    Please explain the program.
    Thanks!

    ReplyDelete
  10. Excellent question...mostly asked during interview....thanks for writing in such detail.

    ReplyDelete
  11. please post more real life scenarios

    ReplyDelete
  12. Hi for Q7 i am getting error as variable managerid has been defined as both character and numeric .

    Any suggestions why this is coming?

    ReplyDelete
  13. please give the answer:

    use data in the Data Sheet
    1.       Consider you have Sales Data and a variable Order_date is the Date of Order Placed. Write a Macro that will Display the Report of all the Sales Conducted on Daily Basis ?

    Example: If today is 01 January 2000, If I call macro today, It shows orders placed on 01 January 2000. If I call same macro on 02 January, It should give Report of orders placed on 02 January.

    proc print data=Sales_Data;

    where Order_date='01January2000'd;

    title “Below are the details of Sales data for 01-January-2000”;

    Run;


    2.       Consider Same Sales Data, Write a Macro to get summarized report on any particular year ?
    Example: I want to run proc means on sales data for the year 2004, may be tomorrow I want to get for 2005 or so.

    proc print data=Sales_Data;

    where Order_date='2004'd;

    Run;


    3.       Consider the Same Sales Data. Suppose you have 3 Types of Orders and a variable Order_type having values either Type 1 Type 2 or Type 3 for Each Order Placed and you have another variable Quantity, having the total number of quantities ordered for each order. Write a Macro to display the frequency of quantity ordered for given order type in any particular year.
    Example: If I mention value of year 2004 and type 3, It should display frequency of quantities placed in 2004 with type 3 with appropriate title. If I just mention year 2004, It should display frequency of all quantities ordered in 2004 with all types.


    4.       Write a Macro to get pdf report of orders placed in range of years with appropriate title?
    Example: If I select range of years 2003 to 2006, I should get pdf report of orders placed in all these years separately on desktop.



    5.       Write a Macro to get a Report for each year and each order type, so we get the mean of total_retail_price in the title ?

    Example: Suppose I choose year = 2001 and order_type = 2, It should display orders placed in these parameters and I also want the mean of total_retail_price in the title of the report.













    6.       Suppose you are having live Data, Orders coming from different Countries. Write a Macro to split this master Data into smaller Datasets depending on the Unique value of each countries ?

    Example: Suppose I got Order placed from US, CA Today, so I call the macro and it will split the master data into 2 data US and CA having those observation. Tomorrow suppose I have order placed from CH, JP and AU, I call same macro and it will split master data into 3 smaller Data.

    ReplyDelete
  14. please give the answer:

    1.       Consider you have Sales Data and a variable Order_date is the Date of Order Placed. Write a Macro that will Display the Report of all the Sales Conducted on Daily Basis ?

    Example: If today is 01 January 2000, If I call macro today, It shows orders placed on 01 January 2000. If I call same macro on 02 January, It should give Report of orders placed on 02 January.

    - 2.       Consider Same Sales Data, Write a Macro to get summarized report on any particular year ?
    Example: I want to run proc means on sales data for the year 2004, may be tomorrow I want to get for 2005 or so.

    - 3.       Consider the Same Sales Data. Suppose you have 3 Types of Orders and a variable Order_type having values either Type 1 Type 2 or Type 3 for Each Order Placed and you have another variable Quantity, having the total number of quantities ordered for each order. Write a Macro to display the frequency of quantity ordered for given order type in any particular year.
    Example: If I mention value of year 2004 and type 3, It should display frequency of quantities placed in 2004 with type 3 with appropriate title. If I just mention year 2004, It should display frequency of all quantities ordered in 2004 with all types.


    - 4.       Write a Macro to get pdf report of orders placed in range of years with appropriate title?
    Example: If I select range of years 2003 to 2006, I should get pdf report of orders placed in all these years separately on desktop.

    - 5.       Write a Macro to get a Report for each year and each order type, so we get the mean of total_retail_price in the title ?

    Example: Suppose I choose year = 2001 and order_type = 2, It should display orders placed in these parameters and I also want the mean of total_retail_price in the title of the report.

    Thanks,
    Shweta

    ReplyDelete
    Replies
    1. 1. Consider you have Sales Data and a variable Order_date is the Date of Order Placed.
      Write a Macro that will Display the Report of all the Sales Conducted on Daily Basis ?
      ;

      Data Sales;
      input Product $ sales_id order_id ;
      cards;
      Mobile 101 201
      Laptop 102 202
      Mobile 103 203
      Laptop 104 304
      ;
      run;

      Data order;
      input order_date order_id;
      informat order_date date9.;
      format order_date date9.;
      cards;
      01DEC2017 201
      02DEC2017 202
      02DEC2017 203
      01DEC2017 204
      ;
      run;


      %let daily=%sysfunc(today());

      %put &daily;

      proc sql;
      select count(a.product) into:Total_Order from sales as a left join order as b
      on a.order_id=b.order_id
      where b.order_date=&daily;
      quit;

      %put &Total_Order;

      Title 'Find the total sales as on &daily is &Total_Order';

      - 2. Consider Same Sales Data, Write a Macro to get summarized report on any particular year ?
      Example: I want to run proc means on sales data for the year 2004, may be tomorrow I want to get for 2005 or so.

      ;

      Data sales_data;
      input sales_id qty item $ sales_date;
      informat sales_date date9.;
      format sales_date date9.;
      cards;
      101 2 Mobile 25jun2005
      101 3 Mobile 25jul2005
      101 4 Mobile 25aug2005
      101 6 Mobile 25jun2004
      101 8 Mobile 25jul2004
      101 2 Mobile 25aug2004
      ;
      run;

      %let year=2005; * Change year as per your requirements;

      proc sql;
      select item,total_qty_sales from(
      select item,year(sales_date) as year,sum(qty) as total_qty_sales from sales_data
      group by 1,2)
      where year=&year;
      quit;


      3. Consider the Same Sales Data. Suppose you have 3 Types of Orders and a variable Order_type having
      values either Type 1 Type 2 or Type 3 for Each Order Placed and you have another variable Quantity, having
      the total number of quantities ordered for each order.
      Write a Macro to display the frequency of quantity
      ordered for given order type in any particular year.
      Example: If I mention value of year 2004 and type 3, It should display frequency of quantities placed in
      2004 with type 3 with appropriate title. If I just mention year 2004, It should display frequency of all
      quantities ordered in 2004 with all types.
      ;


      Data sales_data;
      input sales_id qty item $ sales_date order_type;
      informat sales_date date9.;
      format sales_date date9.;
      cards;
      101 2 Pdrive 25jun2005 1
      101 3 Pdrive 25jul2005 1
      101 4 Mobile 25aug2005 2
      101 6 Mobile 25jun2004 2
      101 8 Mobile 25jul2004 2
      101 2 Mobile 25aug2004 2
      102 2 Pdrive 25jun2005 1
      102 3 Mobile 25jul2005 2
      102 4 Laptop 25aug2005 3
      102 6 Mobile 25jun2004 2
      102 8 Mobile 25jul2004 2
      102 2 Mobile 25aug2004 2
      103 2 Laptop 25jun2005 3
      103 3 Mobile 25jul2005 2
      103 4 Laptop 25aug2005 3
      103 6 Mobile 25jun2004 2
      103 8 Mobile 25jul2004 2
      103 2 Laptop 25aug2004 3
      ;
      ;
      run;

      %macro freq(year=,type=);
      proc sql;
      select item,total_qty from(
      select item,year(sales_date) as year ,order_type ,sum(qty) as total_qty from sales_data
      group by 1,2,3)
      where year=&year and order_type=&type;
      quit;
      %mend;
      %freq(year=2005,type=1);

      Delete
    2. Hello,
      When I tried to execute the macro to print the freq of orders placed based on type and year,I am getting below error.Please help.

      %macro freq(year=,type=);
      proc sql;
      select total_qty from(
      select year(Order_date) as year ,order_type ,sum(quantity) as total_qty from Vani_Sas.sales_data
      group by Order_type)
      where year=&year and order_type=&type;
      quit;
      %mend;
      %freq(year=2005,type=1);

      NOTE: The query requires remerging summary statistics back with the original data.
      NOTE: No rows were selected.

      Delete
    3. its not the error it is the note.

      try this on

      Data sales ;
      input sales_id qty item $ sales_date order_type;
      informat sales_date date9.;
      format sales_date date9.;
      cards;
      101 2 Pdrive 25jun2005 1
      101 3 Pdrive 25jul2005 1
      101 4 Mobile 25aug2005 2
      101 6 Mobile 25jun2004 2
      101 8 Mobile 25jul2004 2
      101 2 Mobile 25aug2004 2
      102 2 Pdrive 25jun2005 1
      102 3 Mobile 25jul2005 2
      102 4 Laptop 25aug2005 3
      102 6 Mobile 25jun2004 2
      102 8 Mobile 25jul2004 2
      102 2 Mobile 25aug2004 2
      103 2 Laptop 25jun2005 3
      103 3 Mobile 25jul2005 2
      103 4 Laptop 25aug2005 3
      103 6 Mobile 25jun2004 2
      103 8 Mobile 25jul2004 2
      103 2 Laptop 25aug2004 3
      ;
      ;
      run;



      %macro freq(year,type);
      proc sql;
      select year,quant,order_type from (
      select distinct year(sales_date) as year,order_type,sum(qty) as quant from sales group by order_type)
      where year=&year and order_type=&type;
      quit;
      %mend;

      %freq(2005,1);

      Delete
  15. Hi Deepanshu,

    Awesome explanation and cleared so many doubts here.
    Can you please tell me what are the companies that offer Jobs based on SAS in Hyderabad?


    Thank you.

    ReplyDelete
  16. Short code for question 30....

    options mlogic symbolgen mprint;

    %macro split(ds,n);

    %do i=1 %to &n;

    data data&i. ;
    set &ds. nobs=nobs;
    f=ceil(_n_*&n/nobs);

    if f=&i then output data&i;

    run;

    %end;

    %mend();

    %split( ds=sasuser.cargorev,n=2);

    ReplyDelete
  17. I have data set
    Id test$ results
    101 WBC 90
    101 RBC 103
    101 alc 100
    101 WBC 92
    101 alc 93
    101 RBC 92
    101 WBC 96
    101 RBC 92
    101 alc 93
    I need this one sub ,one test second highest value

    ReplyDelete
  18. Hi Deepanshu,

    Do you provide coachings for SAS?
    if yes then let me know how can i connect with you.

    ReplyDelete
  19. Post Questions based on, creating datasets based on dates using macros

    ReplyDelete
  20. data xyz;
    input x $;
    cards;
    AA
    AA
    AA
    BB
    BB
    ;
    run;

    **// incremental value by group ///******;
    proc sql;
    select *, case when x="AA" then 1
    else 1+1 end as n from xyz;quit;

    ReplyDelete
  21. Nice Post... this will help mainly for Experience persons.

    ReplyDelete
Next → ← Prev