Lending Club is an online marketplace that matches borrowers with lenders.
I analyze borrowers' demographic and financial information
to predict creditworthiness and model the expected value of loans.
Lending Club (LC) asks prospective borrowers to supply demographic and financial information (e.g., zip code and credit score). LC uses this information to assign each borrower a grade, which prospective lenders can use to infer the borrower's creditworthiness.
The LC grade is generally predictive of the borrower's creditworthiness.
If we break down the borrowers according to whether they have a mortgage or rent their home, some differences among LC grade levels emerge. (Error bars indicate 95% confidence intervals.)
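The write-up does not say how the 95% intervals were computed. As a minimal sketch, assuming the plotted quantity is a default/delinquency rate per group and that a normal-approximation interval is acceptable, the error bars could be derived as follows. The counts and the group labels are made up for illustration and are not Lending Club data.

```python
import math

def proportion_ci(bad, total, z=1.96):
    """Return (rate, low, high) using a normal-approximation interval.

    z=1.96 corresponds to a 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    rate = bad / total
    half_width = z * math.sqrt(rate * (1 - rate) / total)
    # Clip to [0, 1] because a rate cannot fall outside that range.
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)

# Hypothetical counts: (grade, home ownership) -> (bad loans, total loans).
counts = {
    ("B", "MORTGAGE"): (40, 1000),
    ("B", "RENT"): (70, 1000),
}

intervals = {group: proportion_ci(bad, total) for group, (bad, total) in counts.items()}
for (grade, home), (rate, low, high) in intervals.items():
    print(f"{grade} {home}: {rate:.3f} [{low:.3f}, {high:.3f}]")
```

When the intervals of two groups within the same grade do not overlap, that suggests the difference between the groups is unlikely to be sampling noise alone.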
We continue to see a difference between mortgage holders and renters when we break down the loans according to the subgrade assigned by Lending Club. Here are the loans within the B grade.
The interest rate is determined in large part by the LC grade.
I created a regularized logistic regression model to predict which loans would default or become delinquent. In cross-validation, the model reaches 90% accuracy but only 33% recall.
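High accuracy with low recall is typical when the positive class is rare: a model can be right about most loans simply by predicting that almost none of them go bad. A minimal sketch of the two metrics, using made-up labels (1 = default or delinquent) rather than the actual model output:

```python
def accuracy_and_recall(y_true, y_pred):
    """Compute accuracy and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Recall: share of truly bad loans that the model flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

# Hypothetical example: 10 bad loans out of 100.
# The model flags 3 of the bad loans and raises 1 false alarm.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 3 + [0] * 7 + [1] * 1 + [0] * 89

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

In this toy case the model is correct on 92 of 100 loans yet catches only 3 of the 10 bad ones, which is the same pattern as the reported 90% accuracy and 33% recall.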
I also created an ensemble model by grouping the loans by LC grade and fitting a separate model to each grade. The ensemble has 72% recall and 60% accuracy (i.e., it correctly identifies 72% of the loans that defaulted or became delinquent, at the cost of lower overall accuracy).
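The per-grade approach can be sketched as follows. This is an illustration under assumptions, not the code used for the reported results: it assumes an L2 penalty (the write-up only says "regularized"), fits by plain batch gradient descent, and uses hypothetical helper names (`fit_logistic`, `fit_per_grade`, `predict`) and made-up single-feature rows.

```python
import math
from collections import defaultdict

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, l2=0.01, lr=0.1, epochs=500):
    """Fit L2-regularized logistic regression by batch gradient descent.

    Returns (weights, bias). The bias is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def fit_per_grade(rows):
    """rows: iterable of (grade, features, label). Returns {grade: (w, b)}."""
    by_grade = defaultdict(lambda: ([], []))
    for grade, features, label in rows:
        by_grade[grade][0].append(features)
        by_grade[grade][1].append(label)
    return {grade: fit_logistic(X, y) for grade, (X, y) in by_grade.items()}

def predict(models, grade, features, threshold=0.5):
    """Route a loan to the model fitted for its grade and return 0 or 1."""
    w, b = models[grade]
    score = sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)
    return int(score >= threshold)

# Hypothetical rows: the same feature points in opposite directions
# in the two grades, which a single pooled model could not capture.
rows = [
    ("A", [0.1], 0), ("A", [0.2], 0), ("A", [0.8], 1), ("A", [0.9], 1),
    ("B", [0.1], 1), ("B", [0.2], 1), ("B", [0.8], 0), ("B", [0.9], 0),
]
models = fit_per_grade(rows)
print({grade: round(w[0], 2) for grade, (w, b) in models.items()})
```

Fitting one model per grade lets each coefficient vary with the grade, which is one plausible reason such a grouping could change recall relative to a single pooled model.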
Since very few borrowers reach delinquency or default, I over-sampled the default/delinquency group when fitting the model.
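A minimal sketch of random over-sampling, assuming the minority class is duplicated with replacement until the two classes are the same size. The write-up does not state the target ratio, so the 1:1 balance, the helper name `oversample_minority`, and the rows below are assumptions for illustration.

```python
import random

def oversample_minority(rows, label_of, seed=0):
    """Duplicate randomly chosen minority-class rows until classes balance."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    positives = [r for r in rows if label_of(r) == 1]
    negatives = [r for r in rows if label_of(r) == 0]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:
        raise ValueError("need at least one row of each class")
    # Sample with replacement to make up the size difference.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced

# Hypothetical data: 95 good loans (label 0) and 5 bad loans (label 1).
rows = [(f"good{i}", 0) for i in range(95)] + [(f"bad{i}", 1) for i in range(5)]
balanced = oversample_minority(rows, label_of=lambda r: r[1])
print(len(balanced), sum(label for _, label in balanced))
```

In a cross-validated setup, over-sampling is normally applied only to the training portion of each fold, so that duplicated rows do not leak into the data used for evaluation.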
The model coefficients across LC grade levels are displayed separately for borrowers who rent and borrowers who have mortgages.