Credit Risk Modelling - WIP

Python Code WIP available in my GitHub

Contents

  • About dataset
  • Exploring the data
  • Cleaning the data
    • Creating dummy variables for categorical variables
    • Dealing with Missing values
    • Labeling good borrowers and bad borrowers from loan status (target/dependent variable)
  • Splitting the dataset into train and test
  • Preprocessing data (calculating Information Value)
    • Calculating Weight of Evidence and Visualizing WoE for easier Coarse classing
  • Probability of Default model with Logistic Regression
  • PD model validation
  • Loss Given Default and Exposure at Default models
  • Expected Loss

About Dataset

The dataset used for this project comes from Lending Club, a peer-to-peer lending company based in the US that matches people looking to invest money with people looking to borrow it. When investors invest through Lending Club, their money is passed on to borrowers, and when borrowers repay their loans, the capital plus interest passes back to the investors.

The Lending Club dataset contains complete loan data for all loans issued from 2007 to 2015, including the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information. Features (aka variables) include credit scores, number of finance inquiries, address (including zip code and state), and collections, among others. Collections indicates whether the customer has missed one or more payments and the team is trying to recover their money. The file is a matrix of about 890 thousand observations and 75 variables.

Exploring the data

Cleaning data

Converting string values into int to make them easier to work with
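As a minimal sketch of this step, the snippet below strips the text from two string columns and keeps the number. The column names (`emp_length`, `term`) and the value formats are assumptions for illustration, not taken from the project code.

```python
import pandas as pd

# Toy frame; column names and value formats are assumed for illustration.
loans = pd.DataFrame({
    "emp_length": ["10+ years", "< 1 year", "3 years", None],
    "term": [" 36 months", " 60 months", " 36 months", " 60 months"],
})

# Map "< 1 year" to 0 first, then keep the first run of digits in each string.
emp = loans["emp_length"].str.replace("< 1 year", "0", regex=False)
loans["emp_length_int"] = (
    pd.to_numeric(emp.str.extract(r"(\d+)", expand=False), errors="coerce")
    .fillna(0)  # missing employment length is treated as 0 here (an assumption)
    .astype(int)
)

# " 36 months" -> 36
loans["term_int"] = pd.to_numeric(
    loans["term"].str.extract(r"(\d+)", expand=False)
).astype(int)
```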

Converting to datetime to get the time since the first cr_line in months
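A rough sketch of the month calculation follows. It assumes the column is called `earliest_cr_line`, that dates look like "Jan-85", and that months are counted up to a fixed reference date. All three are assumptions made for this example.

```python
import pandas as pd

loans = pd.DataFrame({"earliest_cr_line": ["Jan-85", "Aug-99", "Dec-07"]})

# Assumed snapshot date against which the elapsed months are measured.
reference_date = pd.Timestamp("2017-12-01")

# Parse "Mon-YY" strings. Two-digit years before 69 are read as 20xx,
# so very old credit lines would need a manual correction.
dates = pd.to_datetime(loans["earliest_cr_line"], format="%b-%y")

# Whole months between each date and the reference date.
loans["mths_since_earliest_cr_line"] = (
    (reference_date.year - dates.dt.year) * 12
    + (reference_date.month - dates.dt.month)
)
```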

Creating Dummy variables
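One common way to do this in pandas is `pd.get_dummies`. The sketch below is illustrative only: the columns `grade` and `home_ownership` and the "variable:category" naming are assumptions.

```python
import pandas as pd

loans = pd.DataFrame({
    "grade": ["A", "B", "A", "C"],
    "home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT"],
})

# One 0/1 column per category, named "variable:category".
dummies = pd.get_dummies(
    loans[["grade", "home_ownership"]], prefix_sep=":", dtype=int
)

# Keep the original columns and append the dummies next to them.
loans = pd.concat([loans, dummies], axis=1)
```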

Replacing Null Values

Total high credit/credit limit nulls are replaced with the total amount committed to that loan.

Nulls in annual income are replaced with the mean annual income; nulls in the remaining variables are replaced with zero.
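The three replacement rules above can be sketched as follows. The column names (`total_rev_hi_lim`, `funded_amnt`, `annual_inc`, `delinq_2yrs`) are assumed for this example.

```python
import numpy as np
import pandas as pd

loans = pd.DataFrame({
    "total_rev_hi_lim": [20000.0, np.nan, 5000.0],
    "funded_amnt": [10000.0, 8000.0, 3000.0],
    "annual_inc": [60000.0, np.nan, 40000.0],
    "delinq_2yrs": [0.0, np.nan, 1.0],
})

# Missing credit limit -> amount committed to that loan (row by row).
loans["total_rev_hi_lim"] = loans["total_rev_hi_lim"].fillna(loans["funded_amnt"])

# Missing annual income -> mean of the observed incomes.
loans["annual_inc"] = loans["annual_inc"].fillna(loans["annual_inc"].mean())

# Everything still missing -> 0.
loans = loans.fillna(0)
```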

Labeling good borrowers and bad borrowers

Bad borrowers are those with a loan status of "Charged Off", "Default", "Does not meet credit policy. Status:Charged Off" or "Late (31-120 days)". Good borrowers are the rest.
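A minimal sketch of the labeling, with good borrowers coded as 1 and bad borrowers as 0. The column names `loan_status` and `good_bad`, the 1/0 coding, and the exact spelling of the status strings are assumptions; the strings should be checked against the values actually present in the data.

```python
import numpy as np
import pandas as pd

# Statuses treated as "bad"; spelling assumed, verify with loan_status.unique().
BAD_STATUSES = [
    "Charged Off",
    "Default",
    "Does not meet the credit policy. Status:Charged Off",
    "Late (31-120 days)",
]

loans = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Late (31-120 days)"],
})

# 0 = bad borrower, 1 = good borrower.
loans["good_bad"] = np.where(loans["loan_status"].isin(BAD_STATUSES), 0, 1)
```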

Splitting the Dataset

scikit-learn is used to split the dataset into train and test sets: 80% for training and 20% for testing.
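An 80/20 split with scikit-learn's `train_test_split` might look like the sketch below. The toy data, the `good_bad` target name and the `random_state` value are assumptions made for reproducibility of the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

loans = pd.DataFrame({
    "annual_inc": range(10),
    "good_bad": [1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
})

X = loans.drop(columns="good_bad")  # features
y = loans["good_bad"]               # target

# test_size=0.2 keeps 20% of the rows for testing and 80% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```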

Calculating WoE and Information Value
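For reference, the usual definitions are WoE = ln(% of good borrowers / % of bad borrowers) for each category, and IV = sum over categories of (% good - % bad) * WoE. The helper below is a sketch under those definitions; the function name `woe_iv` and the toy data are hypothetical, and the target is assumed to be 1 for good and 0 for bad.

```python
import numpy as np
import pandas as pd

def woe_iv(feature: pd.Series, target: pd.Series):
    """Return a per-category WoE table and the total Information Value."""
    df = pd.DataFrame({"x": feature, "y": target})
    table = df.groupby("x")["y"].agg(n_obs="count", n_good="sum")
    table["n_bad"] = table["n_obs"] - table["n_good"]
    # Share of all goods / all bads that falls into each category.
    table["pct_good"] = table["n_good"] / table["n_good"].sum()
    table["pct_bad"] = table["n_bad"] / table["n_bad"].sum()
    # A category with no goods or no bads yields an infinite WoE.
    table["woe"] = np.log(table["pct_good"] / table["pct_bad"])
    iv = ((table["pct_good"] - table["pct_bad"]) * table["woe"]).sum()
    return table.sort_values("woe"), float(iv)

grade = pd.Series(["A"] * 5 + ["B"] * 5)
good_bad = pd.Series([1, 1, 1, 1, 0, 1, 0, 0, 0, 0])
table, iv = woe_iv(grade, good_bad)
```

In this toy example, grade A holds 80% of the goods and 20% of the bads, so its WoE is ln(4); grade B is the mirror image with WoE of -ln(4).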

Visualizing for easier Coarse Classing

This is repeated for the rest of the variables (see code for more)

Based on the graphs, categories with similar WoE are grouped together, taking care that each group contains a broadly similar number of observations.
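Once the groups have been chosen, applying them is a simple mapping. The sketch below uses a hypothetical grouping for an assumed `home_ownership` column; the actual groups depend on the WoE plots and are not taken from the project code.

```python
import pandas as pd

loans = pd.DataFrame({
    "home_ownership": ["RENT", "OTHER", "OWN", "MORTGAGE", "NONE"],
})

# Hypothetical coarse classes: categories judged to have similar WoE.
coarse_classes = {
    "RENT_OTHER_NONE": ["RENT", "OTHER", "NONE"],
    "OWN": ["OWN"],
    "MORTGAGE": ["MORTGAGE"],
}

# Invert to a category -> class lookup and apply it.
category_to_class = {
    cat: name for name, cats in coarse_classes.items() for cat in cats
}
loans["home_ownership_coarse"] = loans["home_ownership"].map(category_to_class)

# Check how many observations end up in each class.
class_sizes = loans["home_ownership_coarse"].value_counts()
```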
