XGBoost for Interpretable Credit Models
Model documentation for Stakeholders and Validation
Created on April 11|Last edited on April 11
- Executive Summary and Model Overview
- Model Stakeholders
- Model Development Purpose and Intended Use
- Model Description and Overview
- Overview of Results
- Model Interdependencies
- Model Data Overview
- Data Source Overview and Appropriateness
- Input Data Extraction, Preparation, and Quality & Completeness
- Data Assumptions
- Protected Classes
- Model Theoretical Framework and Methodology
- Model Development Overview
- Model Methodology
- Model Performance and Stability
- Model Validation Stability
- Model Performance
- Model Interpretability
- Model Implementation and Output Reporting
- Version Control
Executive Summary and Model Overview
Handcrafted credit scorecards are still common across many areas of finance, partly because they are easier to interpret than more complex credit-scoring methods. However, more complex modeling techniques such as XGBoost can improve the performance of credit assessment while retaining interpretability for internal Risk Management functions as well as external regulators.
Model Stakeholders
Model Development Purpose and Intended Use
An XGBoost classifier will be used to predict whether a submitted loan application will default.
Typically, we would include a summary of the business need for this particular model and, concretely, how the model will be used to address that business problem. Furthermore, we should describe precisely every model use covered by this document. These descriptions address the following statement in the regulatory guidance FRB SR 11-7: "Even a fundamentally sound model producing accurate outputs consistent with the design objective of the model may exhibit high model risk if it is misapplied or misused."
Model Description and Overview
Model -> XGBoost
Preprocessing steps
Maybe a picture
Overview of Results
Put performance details
Model Interdependencies
Understanding interdependent relationships makes it easier to manage and aggregate model risk across the company. Explain how this model is interconnected with other models in the model inventory. If the output of this model feeds an interdependent model, the direction of that relationship is "downstream"; otherwise it is "upstream." In addition to the directional relationship, provide a brief description of each interconnected model.
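One way to keep these relationships consistent is to record them as structured data and derive the direction from a single flag. The following is a minimal sketch only: the `ModelLink` class and both inventory model names are hypothetical placeholders, not part of this report or of any real inventory.

```python
from dataclasses import dataclass


@dataclass
class ModelLink:
    """One interdependency between this model and another inventory model."""
    other_model: str   # hypothetical inventory name
    description: str   # brief description of the interconnected model
    feeds_other: bool  # True if this model's output is an input to other_model

    @property
    def direction(self) -> str:
        # Follows the rule in the text: feeding another model is "downstream",
        # anything else is "upstream".
        return "downstream" if self.feeds_other else "upstream"


# Made-up entries for illustration only.
links = [
    ModelLink("loss-forecasting-model", "Consumes predicted default probabilities.", True),
    ModelLink("income-estimation-model", "Supplies an estimated-income feature.", False),
]


def summarize(model_links):
    """Return (model name, direction) pairs for the documentation table."""
    return [(link.other_model, link.direction) for link in model_links]


for name, direction in summarize(links):
    print(f"{name}: {direction}")
```

Deriving the direction from `feeds_other` rather than typing it by hand avoids entries where the label and the actual data flow disagree.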
Model Data Overview
Data Source Overview and Appropriateness
raw-dataset
Input Data Extraction, Preparation, and Quality & Completeness
Data Assumptions
Protected Classes
`AgeInMonths` as a feature might be suspect, and we might be introducing some bias into our model based on this feature alone. Generally, a creditor such as a lender or dealer cannot use your age to make credit decisions.
There are exceptions to this rule, such as:
- If the applicant is too young to enter into a contract. State law governs the age at which you can enter into a legally binding contract.
- Age can be considered in a valid credit scoring system. The credit scoring system may not disfavor applicants 62 years old or older, although it may favor them.
- A lender or dealer may relate your age to other information about you that the lender or dealer considers in evaluating creditworthiness. For example, a lender or dealer may consider your job and length of time to retirement to determine whether your income, including your retirement income, will be adequate for the life of the loan.
For the fairness assessment later on, we'll treat `AgeInMonths` as a protected feature.
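As a rough illustration of what such a fairness assessment might look at, the sketch below compares approval rates for applicants under 62 with those aged 62 or older, using `AgeInMonths` converted at 12 months per year. The function names, the sample data, and the choice of a simple rate ratio are all assumptions made for illustration; this is not a legal test and not necessarily the assessment used in this report.

```python
def approval_rate_by_age_group(ages_in_months, approved, threshold_years=62):
    """Approval rate for applicants under vs. at-or-over the age threshold."""
    counts = {"under": [0, 0], "at_or_over": [0, 0]}  # [approved, total]
    for months, ok in zip(ages_in_months, approved):
        group = "at_or_over" if months >= threshold_years * 12 else "under"
        counts[group][0] += int(ok)
        counts[group][1] += 1
    # None marks a group with no applicants, so no rate can be computed.
    return {g: (a / t if t else None) for g, (a, t) in counts.items()}


def rate_ratio(rates):
    """Ratio of the 62+ approval rate to the under-62 rate, or None if undefined."""
    if not rates["under"] or rates["at_or_over"] is None:
        return None
    return rates["at_or_over"] / rates["under"]


# Made-up applicants: AgeInMonths and the model's approve (1) / decline (0) decision.
ages = [300, 420, 760, 800, 350, 900]
decisions = [1, 1, 1, 1, 1, 0]

rates = approval_rate_by_age_group(ages, decisions)
print(rates, rate_ratio(rates))
```

A ratio well below 1 would suggest that older applicants are approved less often and that the role of `AgeInMonths` deserves a closer look.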
Model Theoretical Framework and Methodology
Model Development Overview
Model Methodology
Model Performance and Stability
Model Validation Stability
Data Partitioning Methodology
Model Performance
Explored Model Specifications
Sweeps to investigate hyperparameter space
Model Interpretability
Key Drivers and Dependence Plots
Prediction Explanations
SHAP as a Supervised Embedding
Treating the SHAP values together with the labels as a supervised embedding, we'll use the tables functionality to visualize this embedding in our report.
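To make the idea concrete: each row is mapped to a vector of per-feature attributions, and that vector, paired with the row's label, is the embedding. The sketch below computes exact Shapley values by enumerating feature subsets for a tiny hand-written scoring function. It is a toy illustration under stated assumptions: `toy_score`, the baseline, and the sample rows are hypothetical, and a real report would obtain attributions for the XGBoost model from a SHAP implementation rather than from this brute-force loop.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, row, baseline):
    """Exact Shapley values of `row` relative to `baseline` (exponential cost)."""
    n = len(row)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                known = set(subset)
                # Features outside the coalition take their baseline value.
                without_i = [row[j] if j in known else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = row[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(x):
    # Hypothetical stand-in for a model's raw output, with one interaction term.
    return 0.5 * x[0] - 2.0 * x[1] + x[0] * x[2]


baseline = [0.0, 0.0, 0.0]
rows = [[1.0, 0.5, 2.0], [2.0, 1.0, 0.0]]
labels = [0, 1]

# Each entry pairs an attribution vector with its label: one row of the embedding table.
embedding = [(shapley_values(toy_score, r, baseline), y) for r, y in zip(rows, labels)]
for vector, label in embedding:
    print(label, vector)
```

Because the attributions of a row sum to the difference between its score and the baseline score, rows that the model scores similarly for similar reasons end up close together in this space.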
Model Implementation and Output Reporting
Version Control