Calculating Concordant, Discordant and Tied Pairs


This tutorial explains, with an example, how to calculate concordance, discordance and the c statistic (AUC). Statistical packages such as SAS report these model fit measures when you run a logistic regression, and they can also be obtained in SPSS and R. However, it is important to know how these model performance metrics are calculated mathematically. Knowing the calculation also gives you an edge over your peers when your predictive model needs calibration or refitting.
[Figure: Understanding Concordance and AUC]

Steps to calculate concordance / discordance and AUC
1. Calculate the predicted probability for each observation from the logistic regression model.
2. Divide the data into two datasets. The first contains the observations whose actual value of the dependent variable is 1 (events), along with their predicted probabilities. The second contains the observations whose actual value is 0 (non-events), along with their predicted probabilities.
3. Compare each predicted value in the first dataset with each predicted value in the second dataset.
4. Total number of pairs to compare = x * y, where
x: Number of observations in the first dataset (actual values of 1 in the dependent variable)
y: Number of observations in the second dataset (actual values of 0 in the dependent variable)

In this step, we perform a Cartesian product (cross join) of events and non-events. For example, if you have 100 events and 1,000 non-events, the cross join creates 100,000 (100 * 1,000) pairs for comparison.

5. A pair is concordant if the 1 (the observation with the desired outcome, i.e. the event) has a higher predicted probability than the 0 (the observation without the outcome, i.e. the non-event).
6. A pair is discordant if the 0 (the non-event) has a higher predicted probability than the 1 (the event).
7. A pair is tied if the 1 (the event) has the same predicted probability as the 0 (the non-event).
8. The final values are calculated using the formulas below. Each ratio is a proportion between 0 and 1; multiply by 100 to express it as a percentage.
Percent Concordant = (Number of concordant pairs) / (Total number of pairs)
Percent Discordant = (Number of discordant pairs) / (Total number of pairs)
Percent Tied = (Number of tied pairs) / (Total number of pairs)
Area under curve (c statistic) = Percent Concordant + 0.5 * Percent Tied, with both terms expressed as proportions
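
The steps above can also be sketched outside SAS. The following is a minimal Python illustration, not the SAS procedure itself. The function name pair_stats and the sample values are assumptions made up for this example. It takes the actual outcomes (1 = event, 0 = non-event) and the predicted probabilities, forms every event/non-event pair, and returns the three proportions plus the c statistic.

```python
from itertools import product


def pair_stats(actual, predicted):
    """Return proportions of concordant, discordant and tied pairs, plus AUC.

    `pair_stats` is a hypothetical helper name used only for this sketch.
    """
    # Step 2: split the predicted probabilities by actual outcome.
    events = [p for a, p in zip(actual, predicted) if a == 1]
    nonevents = [p for a, p in zip(actual, predicted) if a == 0]

    # Steps 3-4: every event is compared with every non-event (x * y pairs).
    total = len(events) * len(nonevents)
    if total == 0:
        raise ValueError("need at least one event and one non-event")

    # Steps 5-7: classify each pair.
    concordant = discordant = tied = 0
    for p1, p0 in product(events, nonevents):
        if p1 > p0:
            concordant += 1
        elif p1 < p0:
            discordant += 1
        else:
            tied += 1

    # Step 8: convert counts to proportions and derive the c statistic.
    pct_c = concordant / total
    pct_d = discordant / total
    pct_t = tied / total
    return {
        "concordant": pct_c,
        "discordant": pct_d,
        "tied": pct_t,
        "auc": pct_c + 0.5 * pct_t,
    }


# Made-up illustration data: 3 events and 3 non-events give 9 pairs.
actual = [1, 1, 1, 0, 0, 0]
predicted = [0.9, 0.6, 0.4, 0.6, 0.3, 0.2]
result = pair_stats(actual, predicted)
print(result)
```

With this made-up data, 7 of the 9 pairs are concordant, 1 is discordant and 1 is tied, so the c statistic is (7 + 0.5) / 9, or about 0.83.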

Interpretation of Concordant, Discordant and Tied Percent

Percent Concordant : Percentage of pairs where the observation with the desired outcome (event) has a higher predicted probability than the observation without the outcome (non-event).

Percent Discordant : Percentage of pairs where the observation with the desired outcome (event) has a lower predicted probability than the observation without the outcome (non-event).

Percent Tied : Percentage of pairs where the observation with the desired outcome (event) has the same predicted probability as the observation without the outcome (non-event).

c statistic (AUC) : The c statistic is also called the area under the curve (AUC). It is calculated by adding the concordant proportion and 0.5 times the tied proportion.

In general, higher percentages of concordant pairs and lower percentages of discordant and tied pairs indicate a more desirable model.
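
To make the interpretation concrete, here is a small Python sketch of three extreme cases. The function name c_statistic and all probability values are hypothetical, chosen only for illustration. It shows a model that ranks every event above every non-event, a model that gives everyone the same score, and a model that ranks every event below every non-event.

```python
def c_statistic(event_probs, nonevent_probs):
    """Return (concordant, discordant, tied, auc) as proportions of all pairs.

    `c_statistic` is a hypothetical helper name used only for this sketch.
    """
    pairs = [(p1, p0) for p1 in event_probs for p0 in nonevent_probs]
    n = len(pairs)
    conc = sum(p1 > p0 for p1, p0 in pairs) / n
    disc = sum(p1 < p0 for p1, p0 in pairs) / n
    tied = sum(p1 == p0 for p1, p0 in pairs) / n
    return conc, disc, tied, conc + 0.5 * tied


# Every event scores above every non-event: all pairs are concordant.
perfect = c_statistic([0.8, 0.9], [0.1, 0.2])

# Everyone gets the same score: all pairs are tied, so the c statistic is 0.5.
constant = c_statistic([0.5, 0.5], [0.5, 0.5])

# Every event scores below every non-event: all pairs are discordant.
reversed_model = c_statistic([0.1, 0.2], [0.8, 0.9])

print(perfect)
print(constant)
print(reversed_model)
```

The three proportions always sum to 1. A model whose pairs are all tied carries no ranking information, and its c statistic of 0.5 is the same value that random ordering would give.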

SAS Code for Concordant / Discordant / AUC :

The code below calculates these performance metrics in SAS. The program implements each of the steps described above.
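/* Creates library reference. The data file is stored in this directory*/
/* The path below is a placeholder: point it at the folder that holds the "binary" dataset */
libname file "/path/to/data";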

/* Run logistic regression and generate estimated probability in the dataset named "estprob" with variable name "pred"*/
Proc logistic data= file.binary descending;
class rank / param=ref ;
model admit = gre gpa rank;
output out = estprob p= pred;
run;

/*Divide the data into two datasets- event and non-event*/
Data event nonevent;
Set estprob;
If admit = 1 then output event;
else if admit = 0 then output nonevent;
run;

/*Cartesian product of event and non-event actual cases*/
Proc SQL noprint;
create table pairs as
select a.pred as pred1, b.pred as pred0
from event a cross join nonevent b;
quit;

/*Flag each pair as concordant, discordant or tied*/
Data pairs;
set pairs;
concordant =0;
discordant=0;
tied=0;
If pred1 > pred0 then concordant = 1;
else If pred1 < pred0 then discordant = 1;
else tied = 1;
run;

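/*Mean values - Final Result. The mean of each 0/1 flag is the proportion of concordant, discordant and tied pairs; AUC = mean of concordant + 0.5 * mean of tied*/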
Proc Means Data= Pairs Mean;
Var Concordant Discordant Tied;
Run;
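
The cross join above grows as x * y, which can become large. As an alternative, the same counts can be obtained by sorting the non-event probabilities once and locating each event probability with a binary search. The Python sketch below is an assumption-laden illustration rather than a translation of the SAS program: the function name pair_counts_sorted and the sample values are made up for this example.

```python
from bisect import bisect_left, bisect_right


def pair_counts_sorted(event_probs, nonevent_probs):
    """Count concordant, discordant and tied pairs without listing every pair.

    `pair_counts_sorted` is a hypothetical helper name used only for this sketch.
    """
    sorted_nonevents = sorted(nonevent_probs)
    n0 = len(sorted_nonevents)
    concordant = discordant = tied = 0
    for p1 in event_probs:
        below = bisect_left(sorted_nonevents, p1)      # non-events strictly below p1
        at_or_below = bisect_right(sorted_nonevents, p1)  # non-events at or below p1
        concordant += below
        tied += at_or_below - below
        discordant += n0 - at_or_below
    return concordant, discordant, tied


# Made-up illustration data: 3 events and 3 non-events.
events = [0.9, 0.6, 0.4]
nonevents = [0.6, 0.3, 0.2]
concordant, discordant, tied = pair_counts_sorted(events, nonevents)
total = len(events) * len(nonevents)
auc = (concordant + 0.5 * tied) / total
print(concordant, discordant, tied, round(auc, 4))
```

This gives the same counts as comparing all pairs one by one, because a pair is concordant exactly when the non-event probability is strictly below the event probability, and tied exactly when the two are equal.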


Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.


16 Responses to "Calculating Concordant, Discordant and Tied Pairs"

1. Very precise and clear explanation of concordance and discordance. Also the code helps in better understanding of the phenomenon. Thanks.

1. Thank you for your appreciation. Cheers!

2. Neat explanations, really helpful for understanding these definitions. Thanks!

3. Very clear explanation, thank you :)

4. Thanks for the post! Shouldn't it be proc logistic with descending option? as we are treating 1s as events and 0 as nonevents

1. Corrected! Thanks for pointing it out.

5. First time I understood concordance and discordance. Thanks

6. For a good model what should be the concordance?

1. Concordance Percent should be 80 or above.

7. Very good explanation

8. Very informative, clear, and to the point

9. Very good explanation and informative. Thanks Buddy keep sharing

10. Can you please give the calculation of concordance and discordance in Excel format with an example? That would make the calculation easy to understand.

11. The above codes are very useful. Any suggestions for weighted data?

12. Hello, I want to know, what to do in cases where tied percentage is high, say 20%. How to reduce tied percentage?

13. Excellent Work. Thanks for such detailed description.
