
This tutorial explains, with an example, how to calculate concordance, discordance and the c statistic (AUC). Statistical packages such as SAS, SPSS and R report these model fit measures by default when you run a logistic regression. However, it is important to know how these performance metrics are calculated mathematically. Knowing the calculation also gives you an edge over your peers when a predictive model needs calibration or refitting.

Figure: Understanding Concordance and AUC

Download the SAS data file from the **UCLA website**.

**Steps to calculate concordance / discordance and AUC**

- Calculate the predicted probability from the logistic regression model.
- Divide the data into two datasets. The first contains the observations whose actual value of the dependent variable is 1 (event), together with their predicted probabilities. The second contains the observations whose actual value is 0 (non-event), together with their predicted probabilities.
- Compare each predicted value in the first dataset with each predicted value in the second dataset. Total number of pairs to compare = x * y, where:
  - x: number of observations in the first dataset (actual value of 1 in the dependent variable)
  - y: number of observations in the second dataset (actual value of 0 in the dependent variable)

  This step performs a **cartesian product (cross join) of events and non-events**. For example, with 100 events and 1,000 non-events, it creates 100,000 (100 * 1,000) pairs for comparison.
- A pair is concordant if the 1 (observation with the desired outcome, i.e. event) has a higher predicted probability than the 0 (observation without the outcome, i.e. non-event).
- A pair is discordant if the 0 (observation without the desired outcome, i.e. non-event) has a higher predicted probability than the 1 (observation with the outcome, i.e. event).
- A pair is tied if the 1 (observation with the desired outcome, i.e. event) has the same predicted probability as the 0 (observation without the outcome, i.e. non-event).
- The final percent values are calculated using the formulas below:

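Percent Concordant = (Number of concordant pairs) / Total number of pairs

Percent Discordant = (Number of discordant pairs) / Total number of pairs

Percent Tied = (Number of tied pairs) / Total number of pairs

Area under curve (c statistic) = Percent Concordant + 0.5 * Percent Tied

These ratios are proportions between 0 and 1; multiply the first three by 100 to report them as percentages. The c statistic is computed from the proportions, so it also lies between 0 and 1.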
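The formulas can be checked by hand on a very small example. The sketch below uses five made-up observations (two events and three non-events) with hypothetical predicted probabilities. The dataset and variable names (`toy`, `toy_pairs`, `toy_summary`, `y`, `p`) are illustrative assumptions and are not part of the program presented later in this tutorial.

```sas
/* Hypothetical toy data: y = actual outcome, p = predicted probability */
data toy;
    input y p;
    datalines;
1 0.8
1 0.4
0 0.6
0 0.4
0 0.2
;
run;

proc sql;
    /* Cross join: every event paired with every non-event (2 * 3 = 6 pairs) */
    create table toy_pairs as
    select e.p as pred1,
           n.p as pred0,
           (e.p > n.p) as concordant,
           (e.p < n.p) as discordant,
           (e.p = n.p) as tied
    from toy(where=(y = 1)) as e
         cross join
         toy(where=(y = 0)) as n;

    /* Proportion of each pair type, and c = concordant + 0.5 * tied */
    create table toy_summary as
    select count(*)         as total_pairs,
           mean(concordant) as prop_concordant,
           mean(discordant) as prop_discordant,
           mean(tied)       as prop_tied,
           mean(concordant) + 0.5 * mean(tied) as c_statistic
    from toy_pairs;
quit;

proc print data=toy_summary noobs;
run;
```

Working through the six pairs by hand gives 4 concordant, 1 discordant and 1 tied pair, so the proportions are 4/6, 1/6 and 1/6, and the c statistic is 4/6 + 0.5 * 1/6 = 0.75.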

**Interpretation of Concordant, Discordant and Tied Percent**

**Percent Concordant :** Percentage of pairs where the observation with the desired outcome (event) has a higher predicted probability than the observation without the outcome (non-event).

**Percent Discordant :** Percentage of pairs where the observation with the desired outcome (event) has a lower predicted probability than the observation without the outcome (non-event).

**Percent Tied :** Percentage of pairs where the observation with the desired outcome (event) has the same predicted probability as the observation without the outcome (non-event).

**c statistic (AUC) :** The c statistic is also called the area under the curve (AUC). It is calculated by adding the concordant percent and 0.5 times the tied percent.

In general, higher percentages of concordant pairs and lower percentages of discordant and tied pairs indicate a more desirable model.

**SAS Code for Concordant / Discordant / AUC :**

The code below calculates these performance metrics in SAS. The program implements each of the steps explained above.

    /* Creates library reference. The data file is stored in this directory */
    libname file "C:\Users\Deepanshu\Downloads";

    /* Run logistic regression and generate estimated probability in the dataset named "estprob" with variable name "pred" */
    Proc logistic data=file.binary descending;
        class rank / param=ref;
        model admit = gre gpa rank;
        output out=estprob p=pred;
    run;

    /* Divide the data into two datasets: event and non-event */
    Data event nonevent;
        Set estprob;
        If admit = 1 then output event;
        else if admit = 0 then output nonevent;
    run;

    /* Cartesian product of event and non-event actual cases */
    Proc SQL noprint;
        create table pairs as
        select a.admit as admit1, b.admit as admit0,
               a.pred as pred1, b.pred as pred0
        from event a cross join nonevent b;
    quit;

    /* Calculating concordant, discordant and tied percent */
    Data pairs;
        set pairs;
        concordant = 0;
        discordant = 0;
        tied = 0;
        If pred1 > pred0 then concordant = 1;
        else If pred1 < pred0 then discordant = 1;
        else tied = 1;
    run;

    /* Mean values - Final Result */
    Proc Means Data=Pairs Mean;
        Var Concordant Discordant Tied;
    Run;
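The `Proc Means` step reports the three proportions but not the c statistic itself. The sketch below applies the last formula from the steps above. It assumes the `pairs` dataset created by the program is still available in the WORK library; the table name `auc` and the column names `prop_concordant`, `prop_discordant`, `prop_tied` and `c_statistic` are illustrative choices, not part of the original program.

```sas
/* Summarise the pair-level 0/1 flags and derive the c statistic (AUC). */
/* Assumes the PAIRS dataset built in the previous steps exists.        */
proc sql;
    create table auc as
    select count(*)         as total_pairs,
           mean(concordant) as prop_concordant,
           mean(discordant) as prop_discordant,
           mean(tied)       as prop_tied,
           mean(concordant) + 0.5 * mean(tied) as c_statistic
    from pairs;
quit;

proc print data=auc noobs;
run;
```

The result can be compared with the "Association of Predicted Probabilities and Observed Responses" table that `Proc logistic` prints. Small differences are possible because, depending on the SAS/STAT version and the `BINWIDTH=` setting on the `model` statement, the procedure may group predicted probabilities into bins before counting pairs, whereas the cross join compares the exact values.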

**Comments**

- Very precise and clear explanation of concordance and discordance. Also the code helps in better understanding of the phenomenon. Thanks.
  - Thank you for your appreciation. Cheers!
- Neat explanations, really helpful to understood these definitions. Thanks!
- Very clear explanation, thank you :)
- Thanks for the post! Shouldn't it be proc logistic with descending option? as we are treating 1s as events and 0 as nonevents
  - Corrected! Thanks for pointing it out.
- First time I understood concordance and discordance. Thanks
- For a good model what should be the concordance?
  - Concordance Percent should be 80 or above.
- Very good explanation
- Very informative, clear, and to the point
- Very good explanation and informative. Thanks Buddy keep sharing
- Can you please give the calculation of concordance and disconcordance in excel format with example which will be easy to understand the calculation.
- The above codes are very useful. Any suggestions for weighted data?