SAS Statistical Business Analyst Certification Questions and Answers

Deepanshu Bhalla 25 Comments ,
This article covers some of the SAS Statistical Business Analyst certification questions with detailed answers. This certification covers some of the most widely used statistical techniques such as ANOVA, linear and logistic regression.

To crack the exam, candidates should prepare the following topics. The weightage assigned to each topic is mentioned below :
  1. Analysis of Variance (ANOVA) - 10%
  2. Linear Regression - 20%
  3. Logistic Regression - 25% 
  4. Preparing Inputs for Predictive Models - 20%
  5. Measuring Model Performance - 25%

There would be 60 multiple-choice questions a candidate has to answer in 2 hours.  A candidate must achieve a minimum 68% marks to pass the exam

Question 1
Which of the following two sampling methods are acceptable while splitting data into multiple samples - training, validation and test samples? 

A. Simple random sampling without replacement
B. Simple random sampling with replacement 
C. Stratified random sampling without replacement 
D. Sequential random sampling with replacement

Answer : A, C

Explanation : When we split our data into 3 parts - training, validation and test, we perform sampling without replacement. It means a row can be selected only one time which would either move to training, validation or test sample. In other words, same row can never be found in more than one sample. The opposite of this is sampling with replacement. Why not sampling with replacement? If we perform sampling with replacement, we would not be able to assess model performance correctly because same data points that were used to train model exists in validation or test datasets. The explanation of Stratified Sampling is provided in the next question.


Question 2

Which SAS program will divide the original data set into 60% training and 40% validation data sets, stratified by county?

SAS Statistical Business Analyst Question

Answer : C

Explanation : It is required to sort the variable you want to use to stratify sample before running PROC SURVEYSELECT.

Stratified Sampling helps to keep the initial ratio of events to non-events in both the training and validation data sets. It is important in the case of rare-event model. In this case, we are keeping initial ratio of country variable in both the training and validation sample.

Question 3

In order to perform honest assessment on a predictive model, which is an acceptable division between training, validation, and testing data?

A. Training: 50% Validation: 0% Testing: 50%
B. Training: 100% Validation: 0% Testing: 0%
C. Training: 0% Validation: 100% Testing: 0%
D. Training: 50% Validation: 50% Testing: 0%
Answer : D

Explanation : There is no fixed optimal splitting rule. Some researchers use splitting rule - 70% training and 30% validation. Some use 60% training-20% validation -20% test. It is important to note that 20 to 50% of data should be used as a validation set in order to measure model performance.

Question 4

A marketing campaign will send brochures describing an expensive product to a set of customers.
The cost for mailing and production per customer is $50. The company makes $500 revenue for each sale. What is the profit matrix for a typical person in the population?
Profit Matrix


Answer : C

Explanation : It is 450 because $500 revenue was generated and $50 mailing cost was incurred when purchase was made and mail was sent. So, profit = 500 - 50 =450. Profit matrix is used to choose optimal predicted probability cutoff. It is more used rather than sensitivity or specificity to decide the cutoff. The optimal cutoff maximizes the total expected profit.

Question 5

What is a drawback to performing data cleansing (imputation, transformations, etc.) on raw data
prior to partitioning the data for honest assessment as opposed to performing the data cleansing
after partitioning the data?

A. It violates assumptions of the model.
B. It requires extra computational effort and time.
C. It omits the training (and test) data sets from the benefits of the cleansing methods.
D. There is no ability to compare the effectiveness of different cleansing methods.

Answer : D

Explanation : If we perform data cleaning before splitting data into training and validation datasets, we would not be able to compare models based on different imputations / transformations methods.

Question 6

ROC Curve
As you move along the ROC curve, what changes?
A. The priors in the population
B. The true negative rate in the population
C. The proportion of events in the training data
D. The probability cutoff for scoring

Answer: D

Explanation: As you move along the ROC curve, you get more true positive (Sensitivity) but also more false positive (1-Specificity). It also changes the probability cutoff for scoring as the idea is to maximize the difference between True Positive and False Positive.

Question 7

How multicollinearity can affect the regression model?

A. Inflate Standard Error of Estimates
B. Deflate Standard Error of Estimates
C  Does not affect the model
D  Help interpreting Estimates

Answer : A

Explanation : Multicollinearity implies high correlation between independent variables. High multicollinearity inflates standard error of parameter estimates and makes the interpretation of estimates incorrect.


Question 8

Which of the following is an assumption of ANOVA?

A. No correlation between any one observation with another.
B. No correlation between independent and dependent variable
C. No correlation between independent variables
D. High correlation between any one observation with another.

Answer : A

Explanation : The most important assumption of ANOVA is independent observations. It implies the response value of one observation does not influence the response value of another.


Question 9

You have 50 observations in ANOVA and you calculate the residuals. What will they sum to?

A. 50
B. 2500
C. 0
D. -50

Answer : C

Explanation : The residuals always sum to 0 no matter the number of observations in your dataset.


Question 10

If you want to compare the average monthly salary of males and females, which of the following two statistical method should you choose?

A. two sample t-test
B. one sample t-test
C. two way ANOVA
D. one way ANOVA

Answer : A, D

Explanation : 
You can use one-way ANOVA and two-sample t-test because you are comparing two groups, males and females. You can use two-way ANOVA when you have more than one independent variable.
Question 11

What values are not affected by oversampling in a rare event model?
A. Predicted Probabilities
B. Intercept
C. Negative Predicted Value
D. Sensitivity and Specificity

Answer: D

Explanation : 
Oversampling does not affect sensitivity or specificity measures. It affects Intercept of a model.

Question 12

An analyst has a sufficient volume of data to perform a 3-way partition of the data into training,
validation, and test sets to perform honest assessment during the model building process.
What is the purpose of the test data set?
A. To provide a unbiased measure of assessment for the final model.
B. To compare models and select and fine-tune the final model.
C. To reduce total sample size to make computations more efficient.
D. To build the predictive models.
Answer: A
Explanation : The test data set is used to assess model without any biaseness.

Question 13

An analyst generates a model using the LOGISTIC procedure. They are now interested in getting
the sensitivity and specificity statistics on a validation data set for a variety of cutoff values.

Which statement and option combination will generate these statistics?
A. Scoredata=valid1 out=roc;
B. Scoredata=valid1 outroc=roc;
C. mode1resp(event= '1') = gender region/outroc=roc;
D. mode1resp(event"1") = gender region/ out=roc;

Answer: B

Explanation: In PROC LOGISTIC, the OUTROC= option tells SAS to generate data for the ROC curve to the SAS data set named roc.

Question 14

Assume a $10 cost for soliciting a non-responder and a $200 profit for soliciting a responder. The logistic regression model gives a probability score named P_R on a SAS data set called VALID. The VALID data set contains the responder variable Purch, a 1/0 variable coded as 1 for responder. Customers will be solicited when their probability score is more than 0.05.

Which SAS program computes the profit for each customer in the data set VALID?
SAS Certified Statistical Business Analyst Questions

A. Option A
B. Option B
C. Option C
D. Option D

Answer: A

Explanation: Profit = Revenue - Cost


Question 15

How c statistics is calculated :

A. percent concordant + (1.5* percent tied)
B. percent concordant + (0.5 * percent tied)
C. percent discordant +  (0.5 * percent tied)
D. percent discordant + (1.5* percent tied)

Answer : B

Explanation : c statistics is also called AUC (Area under curve). See the example below -
SAS Output
Percent Concordant     82.3
Percent Discordant      17.5
Percent Tied                 0.2
c statistics                    0.824 =(82.3/100) + (0.5 * (0.2/100))
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

25 Responses to "SAS Statistical Business Analyst Certification Questions and Answers"
  1. These questions were very useful. Thank you.

    ReplyDelete
  2. good one.. Thank you so much

    ReplyDelete
  3. This is really helpful.I didn't find explanation anywhere.Thanks much buddy!

    ReplyDelete
  4. Very informative. Really enjoyed it! Thank you!

    ReplyDelete
  5. I never really understand when they attach a director to a project that should be epic whether in action or budget who doesn’t have a ton of experience. china sourcing agent

    ReplyDelete
  6. Hey what a brilliant post I have come across and believe me I have been searching out for this similar kind of post for past a week and hardly came across this. Thank you very much and will look for more postings from you. auto insurance services in avebury

    ReplyDelete
  7. The following LOGISTIC procedure output analyzes the relationship between a binary response and an ordinal predictor variable, wrist_size Using reference cell coding, the analyst selects Large (L) as the reference level.
    Analysis of Maximum Likelihood Estimates
    Parameter DF Estimate Standard Error Wald Chi-Square Pr > ChiSq
    Intercept 1 -1.0415 0.4749 4.8101 0.0283
    Wrist_size M 1 1.1234 0.4989 5.0697 0.0243
    Wrist_size S 1 1.6078 0.5478 8.6133 0.0033


    What is the estimated log it for a person with large wrist size?
    You can use calculator if needed.

    Can you explain this?
    Thanks

    ReplyDelete
    Replies
    1. NS Khan, the question is asking for the estimated logit for Large Wrist size, which (as noted) is the reference level. Reference level is the Intercept in the logistic procedure output.

      So we can find the intercept in the Parameter column, and find it's logit estimated value in the Estimate column; and we find that the answer is -1.0415.

      The question is testing:
      1.) If you understand that the reference level = the intercept
      2.) If you know in which output column to find the estimated logit.

      Delete
  8. hi. I observe that you’re most likely interested in generating quality backlinks and stuff. I’m promoting scrapebox auto approve link lists. Do you want to trade ? harga saham

    ReplyDelete
  9. This comment has been removed by the author.

    ReplyDelete
  10. definitely believe that which you stated Your most wanted justification seemed to be on the internet the easiest thing to be aware of I speak to you, I unquestionably become irked while people consider worries that they simply don’t understand about You managed to hit the nail upon the top along with as well defined out the entire thing without having side-effects , persons know how to take a signal Will likely be back to grow further Thanks answering services

    ReplyDelete
  11. The quickest a business owner can receive the money in their business account is 48 to 72 hours, from the time they submit a complete application. bookkeeping chicago

    ReplyDelete
  12. woh I enjoy your articles , saved to bookmarks ! . Crowdfunding Platforms

    ReplyDelete
  13. Can I recently say what a relief to seek out someone who really knows what theyre discussing over the internet. You certainly learn how to bring a concern to light and produce it critical. The best way to ought to look at this and fully grasp this side from the story. I cant think youre less popular because you absolutely contain the gift. Short term finance

    ReplyDelete
  14. This comment has been removed by the author.

    ReplyDelete
  15. great resources here. Ill be back for the next your posting. keep writing and happy blogging. sslc result

    ReplyDelete
  16. This comment has been removed by the author.

    ReplyDelete
  17. Hello, this weekend is good for me, since this time i am reading this enormous informative article here at my home. Bolt Posts

    ReplyDelete
  18. Informative Content. Please Keep Sharing Thanks!

    ReplyDelete
Next → ← Prev