Machine Learning Based Application Scorecards

1 Background

Lending organizations need to predict the risk of disbursing a loan to a customer in order to run a profitable business. They typically use rule-based engines to make underwriting decisions. In general, these involve a manual decision process based on a pre-defined set of business rules, such as requiring that a customer's income and debt-to-income ratio fall within certain ranges.
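The rule-based approach above can be sketched as a hand-coded decision function. The thresholds below are illustrative assumptions, not taken from any real lending policy:

```python
# Minimal sketch of a rule-based underwriting engine: every hand-coded
# business rule must pass for the application to be approved.
# All thresholds are invented for illustration.

def rule_based_decision(income, monthly_debt, age):
    """Return 'APPROVE' or 'REJECT' from a fixed set of business rules."""
    debt_to_income = monthly_debt / income if income > 0 else float("inf")
    rules = [
        income >= 25000,          # minimum income rule (assumed threshold)
        debt_to_income <= 0.40,   # debt-to-income ratio must fall in range
        21 <= age <= 60,          # eligible age band (assumed)
    ]
    return "APPROVE" if all(rules) else "REJECT"
```

A real engine would hold dozens of such rules, each maintained by hand, which is exactly the rigidity discussed in the next section.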

2 Limitations of Rule Based Underwriting

For borrowers, these rule-based manual decision processes mean filling out long application forms, submitting volumes of documents, several rounds of back and forth with lenders, long due diligence through opaque methods with no status updates, and finally decisions with no justification that sometimes arrive too late to be of any use. In this age of the consumer, the pressure is on lending organizations to keep customers happy with speed and convenience. Lenders are therefore looking for alternative decision-making mechanisms that are quick and consistent.

3 What is Machine Learning (ML) Based Underwriting?

Machine learning is a method of training computers to parse data, learn from it, and then make a determination or prediction about new data. Rather than hand-coding a specific set of instructions to accomplish a particular task, the machine is trained on large amounts of data, using algorithms that learn how to perform the task. Thanks to rapid increases in data availability and computing power, machine learning now plays a vital role in both technology and business, and it can contribute significantly to underwriting a consumer loan application. A lender can apply ML to the available data about its consumers to predict the probability that an applicant will default, and hence decide whether to approve the loan.

4 ML vs Rule Based Underwriting

A machine learning model, unconstrained by some of the assumptions of classic statistical models, can yield insights that a human analyst could not infer from the data. Rule-based underwriting is not sufficient to tackle the vast amount of data available today, which requires access to big data and a specific set of skills to analyze. ML-based underwriting enables an automated, quick decision-making process, which in turn allows lenders to rapidly scale up their operations and grow revenue. Additionally, profitability is buoyed by higher operating leverage, as the change in business model from “manual, judgmental and relationship driven” to “digital, credit scoring and transaction driven” leads to partial substitution of variable costs by fixed costs.

5 Who Can Use ML Based Application Scorecards?

6 Which Organizations Can Use ML Based Application Scorecards?

Within NBFCs, ML application scorecard models are useful in many segments, such as:

7 How Are ML Based Application Scorecards Developed?

8 Typical Data Requirements

8.1 First-to-Credit Customer Information

  • Age
  • Income
  • Loan Amount applied for
  • Tenure
  • EMI
  • Rate of Interest
  • Address type
  • Occupation and others
  • Geographical location of Prospect
  • Identification Proofs

8.2 For Repeat Lending

  • Types of Loans availed (Two-wheeler loans, Personal loans, etc.)
  • High-Credit
  • Current Balance
  • Amount Overdue
  • Payment History
  • Collateral details
  • Write-off details
  • Loans currently being repaid and others

9 Data Exploration

  • Structuring the raw data: Raw data is often unstructured, i.e. it does not fit into rows and columns under specific labels, and must first be structured.
  • Missing Value Analysis: Missing values may be present for a variable. When the percentage of missing values is small, numeric variables are imputed with the median or mean, and categorical variables are replaced with the mode or a new category.
  • Outlier Treatment: Check for outliers that lie far from the rest of the data and distort overall patterns, then impute or remove them with suitable statistical techniques.
  • Visualization for detecting patterns: Build exploratory graphs to analyze the distributions of individual variables and to uncover hidden relations between variables.
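The imputation and outlier steps above can be sketched with pandas on a toy applicant table; the column names and values are invented for illustration:

```python
# Sketch of missing-value imputation and outlier treatment from the list
# above (toy data; column names and the 1.5*IQR cap are assumptions).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [30000, 45000, np.nan, 52000, 900000],   # one missing, one outlier
    "occupation": ["salaried", None, "self-employed", "salaried", "salaried"],
})

# Missing value analysis: median for numerics, mode for categoricals
df["income"] = df["income"].fillna(df["income"].median())
df["occupation"] = df["occupation"].fillna(df["occupation"].mode()[0])

# Outlier treatment: cap values lying outside 1.5 * IQR of the column
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["income"] = df["income"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
```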

9.1 Defining Goods and Bads

Once the target variable is chosen, further analysis such as vintage analysis (defining the interval of time to be considered when defining a bad) is done in order to categorize a customer as ‘Good’ or ‘Bad’.

9.1.1 Vintage Analysis

In vintage analysis, customer behavior is analyzed over time. For example, once a loan has been issued, the EMI payment behavior is tracked. Say there are three buckets: bucket A for customers who missed an EMI once, bucket B for customers who missed an EMI twice, and bucket C for customers who missed an EMI three times. Customers are segregated into these buckets to analyze their behavior.
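The bucket segregation above can be sketched directly; the bucket definitions follow the text, while the customer data is invented:

```python
# Sketch of vintage-analysis bucketing: segregate customers by how many
# EMIs they missed in the observation window (customer data is invented).
missed_emis = {"cust1": 1, "cust2": 3, "cust3": 2, "cust4": 1}

def bucket(misses):
    """Map a missed-EMI count (>= 1 assumed) to the delinquency buckets above."""
    if misses == 1:
        return "A"   # missed EMI once
    if misses == 2:
        return "B"   # missed EMI twice
    return "C"       # missed EMI three (or more) times

buckets = {cust: bucket(n) for cust, n in missed_emis.items()}
```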


9.2.1 Model Building

  • Languages: R or Python, depending on the business call
  • The baseline model is logistic regression, as it is linear and generalizes well
  • In the presence of highly non-linear data, use bagging and boosting techniques such as random forest or XGBoost
  • The best model is chosen based on metrics of predictive power such as Area Under the Curve (AUC), accuracy, and the KS statistic
  • Once the model generates a probability of default, test it on the test set to check its behavior on new data; tune the model until performance is similar on both the train set (on which the model is built) and the test set (new data)
  • Once all this is done, calibrate the probability of default to an application score
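The baseline flow above can be sketched end to end: fit a logistic regression, compare train vs test AUC, and compute the KS statistic. The data-generating process below is simulated purely for illustration:

```python
# Sketch of the baseline modeling flow: logistic regression on simulated
# applicant data, evaluated by train/test AUC and the KS statistic.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                        # applicant features
logit = X @ np.array([1.5, -1.0, 0.5, 0.0])           # simulated true signal
y = (rng.random(1000) < 1 / (1 + np.exp(-logit))).astype(int)  # 1 = defaulter

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

auc_train = roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1])
auc_test = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# KS statistic: maximum separation between the score distributions of
# defaulters (bads) and non-defaulters (goods) on the test set
scores = model.predict_proba(X_te)[:, 1]
ks = ks_2samp(scores[y_te == 1], scores[y_te == 0]).statistic
# Similar train and test AUC suggests the model generalizes to new data.
```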

9.2.2 Is the model ready for production?

Once we finalize the champion model, it is ready for production after the following checks:

  • Run the model against data it has never been exposed to, such as a time period different from the one it was built on; performance should be broadly similar to that on the train and test sets
  • Check the execution time for predicting on a single data point and try to minimize it
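The single-record latency check above can be sketched as a simple timing harness; the 50 ms budget and the stand-in scoring function are assumptions:

```python
# Sketch of a single-record latency check: time one scoring call and
# compare it to a latency budget (the 50 ms budget is an assumed target).
import time

def predict_one(record):
    """Stand-in for the deployed model's single-record scoring call."""
    return sum(record) * 0.01

record = [0.2, 0.5, 0.1]
start = time.perf_counter()
_ = predict_one(record)
elapsed_ms = (time.perf_counter() - start) * 1000
within_budget = elapsed_ms < 50  # optimize the model/pipeline if this fails
```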

9.2.3 Monitoring Model Performance during Production

An ML application scorecard model is deployed in the production environment for a period of time, and its predictive performance often degrades over time. A deployed model needs to be abandoned once its performance degradation hits a threshold. Therefore, cautious monitoring of the deployed model is needed. The following are a few of the checks:

  • Evaluate and track accuracy metrics over time
  • Define multiple levels of performance-degradation thresholds so that warning notifications can be generated
  • Send a notification to the concerned team when a threshold is crossed
  • Log the events of model-monitoring tasks
  • Track shifts in the distributions of input and output variables
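Distribution-shift tracking from the list above is often done with the Population Stability Index (PSI); the decile binning and the 0.1/0.25 alert thresholds below are common industry conventions, not taken from the text:

```python
# Sketch of input-distribution shift monitoring via the Population
# Stability Index (PSI); data and thresholds are assumptions.
import numpy as np

def psi(expected, actual, bins=10):
    """PSI between a baseline sample and a production sample of one variable."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)   # avoid log(0) on empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(1)
baseline = rng.normal(0, 1, 5000)     # distribution at model-build time
stable = rng.normal(0, 1, 5000)       # production data, no shift
shifted = rng.normal(1, 1, 5000)      # production data after drift
# Common rule of thumb: PSI < 0.1 stable, 0.1-0.25 warning, > 0.25 alert.
```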

9.2.4 How to Use an ML Based Application Scorecard?

The application scorecard model can help optimize business operations by providing the lending organization with efficient and quick analysis of its customers. When a new application is run through the model, the customer can be categorized as ‘HIGH’, ‘MODERATE’, or ‘LOW’ risk. HIGH-risk customers can be rejected directly, LOW-risk customers can be approved, and MODERATE-risk customers should be handled with some human intervention.
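The scorecard usage above can be sketched as a two-step mapping: scale the probability of default to a points score (the points-to-double-odds scaling is a common convention, assumed here), then map score bands to the three risk tiers. The base score, base odds, PDO, and cutoffs are illustrative assumptions:

```python
# Sketch of scorecard usage: probability of default (PD) -> points score
# -> HIGH / MODERATE / LOW decision. All parameters are assumptions.
import math

def to_score(pd_, base_score=600, base_odds=50, pdo=20):
    """Scale PD so base_odds maps to base_score and odds double every pdo points."""
    odds = (1 - pd_) / pd_                      # good:bad odds
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log(odds)

def decision(score, low_cut=550, high_cut=650):
    """Map an application score to the three risk tiers described above."""
    if score >= high_cut:
        return "LOW risk: approve"
    if score >= low_cut:
        return "MODERATE risk: manual review"
    return "HIGH risk: reject"
```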

10 New/Upcoming Trends

In recent times, new data points are being considered for ML based application scorecards to capture the consumer’s debt payment behavior. Some of them include:

  • Utilities: spend patterns for gas, electricity, etc.
  • Telecom: usage bills for mobile phone, TV, broadband, etc.
  • Rent
  • Property/asset record: owns a refrigerator, TV, washing machine, etc.
  • Number of Dependents
  • House type
  • Shopping and Travelling Patterns
  • Social Media such as Facebook, LinkedIn, and Twitter
  • Web clickstream

11 Advantages of ML Based Application Scorecards

  • Automated / supported decision process
  • Unbiased, fast, consistent and scalable decision making
  • Ability to detect patterns in large volumes of available data
  • Predicting BAD customers more accurately at the LOS (Loan Origination System) stage itself, and hence increased profits

Karvy Analytics

115 Broadway, Suite 1506
New York, NY 10006
Tel: 212 267 4334
Fax: 212 267 4335
Registered Address
"Karvy House", 46 Avenue 4,
Street No. 1, Banjara Hills,
Hyderabad 500 034
Tel No:(+91-40) 23312454, 23320751
Fax No:(+91-40) 23311968