Important Credit Risk Modeling Projects
Probability of Default (PD) is the likelihood that a borrower will default on a debt (loan or credit card). In simple words, it is the expected probability that a customer fails to repay the loan.
Loss Given Default (LGD) is the proportion of the total exposure that is lost when a borrower defaults. It is calculated as (1 - Recovery Rate). For example, someone takes a $200,000 loan from a bank to purchase a flat. The borrower pays some installments and then stops paying. At the time of default, the loan has an outstanding balance of $100,000. The bank takes possession of the flat and is able to sell it for $90,000. The net loss to the bank is $10,000 ($100,000 - $90,000), so the LGD is 10%, i.e. $10,000 / $100,000.
Exposure at Default (EAD) is the amount the borrower owes the bank at the time of default. In the LGD example above, the outstanding balance of $100,000 is the EAD.
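
The LGD arithmetic in the example above can be checked with a short Python sketch. The function name `loss_given_default` and its parameters are illustrative and do not come from any library. The sketch assumes the recovery equals the sale proceeds and ignores selling costs and discounting.

```python
def loss_given_default(ead: float, recovered: float) -> float:
    """Return LGD = 1 - recovery rate, where recovery rate = recovered / EAD."""
    if ead <= 0:
        raise ValueError("EAD must be positive")
    recovery_rate = recovered / ead
    return 1.0 - recovery_rate


# Figures from the example: $100,000 outstanding at default, flat sold for $90,000.
ead = 100_000.0
recovered = 90_000.0

lgd = loss_given_default(ead, recovered)
net_loss = ead - recovered

print(f"Net loss: ${net_loss:,.0f}, LGD: {lgd:.0%}")
```

With the numbers from the example, the recovery rate is 90% and the LGD is 10%.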
Datasets for Credit Risk Modeling Projects
We have gathered data from several sources, listed below. The following websites own the copyright on these data and authorize their reproduction.
- Kaggle
- UCI Machine Learning Repository
- Econometric Analysis, a book by William H. Greene
- Credit Scoring and Its Applications, a book by Lyn C. Thomas
- Credit Risk Analytics, a book by Harald Scheule, Daniel Rösch and Bart Baesens
- Lending Club
- PAKDD 2009 Data Mining Competition, organized by NeuroTech Ltd. and Center for Informatics of the Federal University of Pernambuco
- Credit bureau variables which contains details about borrower's previous credits provided by other banks
- Previous Loans that the applicant had with Home Credit
- Previous Point of sales and cash loans that the applicant had with Home Credit
- Previous Credit Cards that the applicant had with Home Credit
Variable Name | Description |
---|---|
SeriousDlqin2yrs | Person experienced 90 days past due delinquency or worse |
RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit except real estate and no installment debt like car loans divided by the sum of credit limits |
age | Age of borrower in years |
NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due but no worse in the last 2 years. |
DebtRatio | Monthly debt payments, alimony and living costs divided by monthly gross income |
MonthlyIncome | Monthly income |
NumberOfOpenCreditLinesAndLoans | Number of Open loans (installment like car loan or mortgage) and Lines of credit (e.g. credit cards) |
NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due. |
NumberRealEstateLoansOrLines | Number of mortgage and real estate loans including home equity lines of credit |
NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due but no worse in the last 2 years. |
NumberOfDependents | Number of dependents in family excluding themselves (spouse, children etc.) |
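
The two ratio variables in the table, RevolvingUtilizationOfUnsecuredLines and DebtRatio, are simple quotients. Below is a minimal sketch of how such ratios could be computed from raw figures. The function names and every input number are hypothetical, chosen only for illustration, and are not taken from the dataset.

```python
def revolving_utilization(total_balance: float, total_credit_limit: float) -> float:
    """Balance on cards and personal lines divided by the sum of credit limits."""
    if total_credit_limit <= 0:
        raise ValueError("total credit limit must be positive")
    return total_balance / total_credit_limit


def debt_ratio(monthly_obligations: float, monthly_gross_income: float) -> float:
    """Monthly debt payments, alimony and living costs over monthly gross income."""
    if monthly_gross_income <= 0:
        raise ValueError("monthly gross income must be positive")
    return monthly_obligations / monthly_gross_income


# Hypothetical borrower, for illustration only.
utilization = revolving_utilization(total_balance=2_500.0, total_credit_limit=10_000.0)
ratio = debt_ratio(monthly_obligations=1_500.0, monthly_gross_income=5_000.0)

print(f"Utilization: {utilization:.2f}, debt ratio: {ratio:.2f}")
```

Both functions reject a non-positive denominator instead of returning infinity, which is one possible design choice; real data may need a different treatment of missing or zero income.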
You can download data and its description from this link
- Taiwan: http://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
- Germany: http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
- Australia: http://archive.ics.uci.edu/ml/datasets/Statlog+%28Australian+Credit+Approval%29
- Japan: http://archive.ics.uci.edu/ml/datasets/Japanese+Credit+Screening
- Poland: http://archive.ics.uci.edu/ml/datasets/Polish+companies+bankruptcy+data
The dataset on credit card defaults in Taiwan contains several attributes (characteristics) that can be used to test various machine learning algorithms for building a credit scorecard.
Note: The Poland dataset contains attributes of companies rather than retail customers.
To download the datasets below, visit the link and fill in the required details in the form. Once the form is submitted, you can download the datasets.
1. Data Set HMEQ: The data set HMEQ reports characteristics and delinquency information for 5,960 home equity loans. A home equity loan is a loan where the obligor uses the equity of his or her home as the underlying collateral.
2. Data Set Mortgage: The data set mortgage is in panel form and reports origination and performance observations for 50,000 residential U.S. mortgage borrowers over 60 periods. The periods have been deidentified. As in the real world, loans may originate before the start of the observation period (this is an issue where loans are transferred between banks and investors as in securitization). The loan observations may thus be censored as the loans mature or borrowers refinance. The data set is a randomized selection of mortgage-loan-level data collected from the portfolios underlying U.S. residential mortgage-backed securities (RMBS) securitization portfolios and provided by International Financial Research (www.internationalfinancialresearch.org).
3. Data Set LGD: The data set has been kindly provided by a European bank and has been slightly modified and anonymized. It includes 2,545 observations on loans and LGDs.
4. Data Set Ratings: The ratings data set is an anonymized data set with corporate ratings where the ratings have been numerically encoded (1 = AAA, etc.).
The data description is shown below:
Variable Name | Description |
---|---|
Bad | Good/bad indicator: 1 = Bad, 0 = Good |
yob | Year of birth (if unknown, the year will be 99) |
nkid | Number of children |
dep | Number of other dependents |
phon | Is there a home phone (1 = yes, 0 = no) |
sinc | Spouse's income |
aes | Applicant's employment status: V = Government, W = Housewife, M = Military, P = Private sector, B = Public sector, R = Retired, E = Self employed, T = Student, U = Unemployed, N = Others, Z = No response |
dainc | Applicant's income |
res | Residential status: O = Owner, F = Tenant furnished, U = Tenant unfurnished, P = With parents, N = Other, Z = No response |
dhval | Value of home: 0 = no response or not owner, 000001 = zero value, blank = no response |
dmort | Mortgage balance outstanding: 0 = no response or not owner, 000001 = zero balance, blank = no response |
doutm | Outgoings on mortgage or rent |
doutl | Outgoings on loans |
douthp | Outgoings on hire purchase |
doutcc | Outgoings on credit cards |
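
The coded variables in the data description above (Bad, yob, aes, res) are easier to work with once decoded. A minimal sketch follows, assuming each row is available as a Python dictionary with single-letter string codes and an integer yob. The function name `decode_record` and the sample row are hypothetical; only the code-to-label mappings come from the description.

```python
# Code-to-label mappings taken from the data description.
EMPLOYMENT_STATUS = {
    "V": "Government", "W": "Housewife", "M": "Military", "P": "Private sector",
    "B": "Public sector", "R": "Retired", "E": "Self employed", "T": "Student",
    "U": "Unemployed", "N": "Others", "Z": "No response",
}
RESIDENTIAL_STATUS = {
    "O": "Owner", "F": "Tenant furnished", "U": "Tenant unfurnished",
    "P": "With parents", "N": "Other", "Z": "No response",
}


def decode_record(record: dict) -> dict:
    """Return a copy of the record with coded fields replaced by readable values."""
    decoded = dict(record)
    decoded["aes"] = EMPLOYMENT_STATUS.get(record["aes"], "Unrecognized code")
    decoded["res"] = RESIDENTIAL_STATUS.get(record["res"], "Unrecognized code")
    # A year of birth of 99 means "unknown" according to the description.
    decoded["yob"] = None if record["yob"] == 99 else record["yob"]
    decoded["Bad"] = "Bad" if record["Bad"] == 1 else "Good"
    return decoded


# Hypothetical row, for illustration only (assumes yob is stored as an integer).
sample = {"Bad": 0, "yob": 99, "aes": "P", "res": "O"}
decoded = decode_record(sample)
print(decoded)
```

Treating the unknown year of birth as `None` keeps the sentinel value 99 out of any later age calculation.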