
Aspects of MLOps on Cloud

 


Focus on Retail Banking

by: Rajeev Verma

I. Introduction

Over the last five years, the adoption of cloud-enabled computing in the banking industry has gained traction. Moving from on-premises model deployment to big-data clusters was a first step towards Machine Learning Operations (MLOps).

MLOps processes are defined taking into account model characteristics such as end usage, scoring frequency and model algorithm. Beyond these characteristics, it is also important to understand the regulatory requirements around model monitoring, fair lending and model explainability before initiating the MLOps process, and to establish strong governance by clearly defining roles and responsibilities. In this paper, I cover different aspects of the MLOps process along with reference solutions.

II. Key aspects of MLOps

In the current industry scenario, modelling units and business stakeholders are increasingly focusing on faster model deployment, and significant resources are therefore being dedicated to seamless cloud migration. The following sections discuss the key components involved at each stage, the challenges, and the best practices adopted while migrating the process to the cloud.

i. Data/Model Pipelines

Data pipelines ensure that data is transported from source to target efficiently, in a usable form and in an automated manner. An automated data pipeline becomes critical when the objective is to make real-time, data-driven decisions. The important features of a reliable pipeline in the context of risk and regulatory predictive models are:

a) Data quality checks
b) Automated ETL
c) Disaster recovery of the data
d) Flexibility in handling stress scenarios

New-age data pipelines enable easy access to data from various sources such as apps and APIs, and open up better analytics and insight opportunities. In the BFSI domain, critical use cases such as AML and fraud detection need a real-time data feed established through an automated data pipeline. However, for a task such as credit scoring, which is more often a batch process, a fully automated data pipeline may not be needed.
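To make the data quality aspect concrete, below is a minimal Python sketch of an automated quality gate that could run before a scoring batch is consumed downstream. The column names, thresholds, file location and pandas-based implementation are illustrative assumptions, not a prescribed design.

import pandas as pd

# Illustrative data quality gate for a batch scoring pipeline.
# Column names and thresholds are assumptions made for this sketch.
REQUIRED_COLUMNS = ["customer_id", "balance", "days_past_due"]
MAX_NULL_FRACTION = 0.05  # fail the run if more than 5% of a column is missing

def run_quality_checks(df: pd.DataFrame) -> list:
    """Return a list of human-readable data quality failures."""
    failures = []

    # 1. Schema check: every expected column must be present.
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        failures.append(f"missing columns: {missing}")
        return failures  # no point checking further

    # 2. Completeness check: null rates within tolerance.
    for col in REQUIRED_COLUMNS:
        null_frac = df[col].isna().mean()
        if null_frac > MAX_NULL_FRACTION:
            failures.append(f"{col}: {null_frac:.1%} nulls exceeds threshold")

    # 3. Basic validity check: no duplicate customer records.
    if df["customer_id"].duplicated().any():
        failures.append("duplicate customer_id values found")

    return failures

if __name__ == "__main__":
    batch = pd.read_parquet("scoring_batch.parquet")  # assumed input location
    problems = run_quality_checks(batch)
    if problems:
        # In an automated pipeline this would raise an alert and stop downstream scoring.
        raise RuntimeError("Data quality gate failed: " + "; ".join(problems))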

ii. CI/CD Pipelines

One question that arises here is: how complex should the CI/CD paradigm be? It depends on many things, such as the business context, modelling complexity and usage of the model. However, it is recommended to start with a simple CI/CD pipeline rather than a complex one, and let MLOps engineers enhance it periodically.
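As a simple starting point, below is a sketch of the kind of validation gate a CI pipeline could run on every model change before the deployment step; the file paths, acceptance threshold and use of scikit-learn/joblib are assumptions made for the example, and the non-zero exit code is what fails the CI job.

import sys

import joblib
import pandas as pd
from sklearn.metrics import roc_auc_score

MIN_AUC = 0.70  # assumed acceptance threshold for this example

def main() -> int:
    holdout = pd.read_csv("holdout.csv")           # assumed validation set
    model = joblib.load("candidate_model.joblib")  # assumed serialised candidate model

    features = holdout.drop(columns=["target"])
    scores = model.predict_proba(features)[:, 1]
    auc = roc_auc_score(holdout["target"], scores)

    print(f"Candidate model AUC on holdout: {auc:.3f}")
    if auc < MIN_AUC:
        print("FAIL: below acceptance threshold, blocking the deployment stage")
        return 1  # non-zero exit code fails the CI job
    return 0

if __name__ == "__main__":
    sys.exit(main())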

iii. Monitoring and Observability

Monitoring refers to the process of checking whether model health checks are in place and whether model performance is within the benchmark. However, monitoring alone does not help data scientists. It is observability that answers the question "Why did the model performance deteriorate?" and helps identify the root cause of model and data drift.
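To make the distinction concrete, the sketch below shows how a monitoring job could record not only the headline performance metric (monitoring) but also per-feature summaries that later help explain a deterioration (observability). The feature names, synthetic data and output format are assumptions made purely for illustration.

import json
from datetime import datetime, timezone

import numpy as np
import pandas as pd

# Features assumed for the illustration; a real job would read the model's
# feature list from its registry or metadata store.
MONITORED_FEATURES = ["utilisation", "tenure_months", "bureau_score"]

def build_observability_record(scored: pd.DataFrame, auc: float) -> dict:
    """Combine the headline metric with per-feature context.

    The headline metric supports monitoring (is performance within the
    benchmark?); the feature summaries support observability (why did it move?).
    """
    record = {
        "run_timestamp": datetime.now(timezone.utc).isoformat(),
        "auc": auc,
        "feature_summary": {},
    }
    for col in MONITORED_FEATURES:
        record["feature_summary"][col] = {
            "mean": float(scored[col].mean()),
            "null_rate": float(scored[col].isna().mean()),
            "p95": float(scored[col].quantile(0.95)),
        }
    return record

if __name__ == "__main__":
    # Synthetic scored batch purely to demonstrate the record structure.
    rng = np.random.default_rng(0)
    batch = pd.DataFrame({
        "utilisation": rng.uniform(0, 1, 1000),
        "tenure_months": rng.integers(1, 120, 1000),
        "bureau_score": rng.normal(600, 50, 1000),
    })
    # In a real pipeline the record would be pushed to a metrics store or dashboard.
    print(json.dumps(build_observability_record(batch, auc=0.72), indent=2))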

 

(a) Concept of Drift

It is a well-known saying that "change is the only constant", and it applies well to model development. Model monitoring is a key aspect of MLOps and covers two concepts: first, monitoring model performance, which includes comparing statistical metrics against their respective thresholds; second, tracking changes in the data, which includes tracking population shift[1].

(b) Data Drift and Model Drift

Data drift is defined as any statistically significant change in the distribution of the data compared with the training and testing environment. Model drift is defined as a breach of the model performance threshold. Generally, credit scoring models developed on a stable portfolio perform consistently for three to four years. The life of the model gets shortened by external or internal policy changes, regulatory intervention, macro-economic factors or pandemics such as Covid-19. The pace and intensity of drift depend on the characteristics of the portfolio on which the model is built[2].
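A common way to quantify the population shift and data drift described above is the Population Stability Index (PSI). The sketch below computes PSI between a training baseline and a recent scoring sample; the decile binning and the 0.1/0.25 interpretation bands are conventional rules of thumb rather than regulatory thresholds, and the synthetic data is only there to make the example runnable.

import numpy as np

def population_stability_index(expected: np.ndarray,
                               actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a baseline (training) sample and a recent sample.

    Bin edges come from the baseline's quantiles so both samples are
    compared on the same scale; a small epsilon avoids division by zero.
    """
    eps = 1e-6
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf

    expected_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_frac = np.histogram(actual, bins=edges)[0] / len(actual)

    expected_frac = np.clip(expected_frac, eps, None)
    actual_frac = np.clip(actual_frac, eps, None)

    return float(np.sum((actual_frac - expected_frac)
                        * np.log(actual_frac / expected_frac)))

# Conventional interpretation (rule of thumb, not a regulatory threshold):
# PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    baseline = rng.normal(600, 50, 10000)  # e.g. bureau scores at model build
    recent = rng.normal(585, 55, 10000)    # e.g. scores in the latest month
    print(f"PSI: {population_stability_index(baseline, recent):.3f}")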

When we talk about monitoring, one obvious question that comes to mind is how frequently the monitoring and retraining/recalibration exercise must be performed. The answer depends on factors such as the business domain, the frequency of data collection and a cost-benefit analysis[3].
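As one way to encode such a cost-benefit view, the small sketch below shows an illustrative retraining decision rule; the inputs and thresholds are placeholders for discussion rather than recommended values.

def should_retrain(expected_lift_pct: float,
                   pipeline_run_days: float,
                   psi: float,
                   auc_drop: float) -> bool:
    """Illustrative retraining decision rule with placeholder thresholds.

    Retrain only when drift or performance degradation is material and the
    expected benefit justifies the pipeline effort.
    """
    drift_material = psi > 0.25 or auc_drop > 0.02
    benefit_justified = expected_lift_pct >= 1.0 or pipeline_run_days <= 1
    return drift_material and benefit_justified

# Echoing footnote [3]: a one-week pipeline promising only a 0.5% lift would
# not trigger retraining under this rule.
print(should_retrain(expected_lift_pct=0.5, pipeline_run_days=7,
                     psi=0.12, auc_drop=0.01))  # -> False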

III. What level of cloud maturity should we adopt?

As discussed so far, for any large business the decision to migrate to cloud infrastructure is not an easy path to walk. Various cloud service providers have defined different levels of maturity for cloud adoption. The table below maps various model types against the probable level of maturity required and the corresponding features.

Model type                        Level (Low(1) to High(3))    Feature
Risk Scoring Models               Level 1                      Manual Pipeline
Response/Recommendation Models    Level 2                      Automated Pipeline
Fraud/AML                         Level 3                      Automated Pipeline and Deployment

 

Migrating to the cloud opens up scope for innovation for any product owner; however, given the differences in book size and current infrastructure, not all migration/cloud adoption will follow the same path. In the image, I have shown a personal view of an MLOps Maturity Grid that helps perform a gap assessment and guides an organization to decide on the right MLOps capabilities rather than going straight for a fully mature environment. The following table gives a reference to different cloud components and indicative levels of maturity in adopting the MLOps process.

 

Table: Reference Grid of MLOps Component Maturity Levels across Model Types (image)

 

IV. Conclusion

In summary, the paper provides a comprehensive overview of the transformation of cloud-enabled computing in the banking sector, with a specific emphasis on MLOps processes and their various components. It sheds light on the importance of considering model characteristics, regulatory requirements, and governance practices in successful MLOps implementation. By dissecting key aspects of MLOps and cloud maturity, the paper offers valuable insights into the challenges and strategies associated with adopting these transformative practices in the banking industry.

 

[1] Population shift could be due to economic changes, changes in underwriting policies, changes in customer behaviour, etc.

[2] If the portfolio is new and the product is newly launched, model performance may decay faster than for a model built on a stable portfolio.

[3] Areas like AML, fraud detection, recommendation engines or cyber security need more frequent retraining of models. If the model monitoring pipeline takes one week to complete and delivers only a 0.5% expected lift, a high frequency of model updates may not be justified. Challenges can arise if the ground truth or actual performance is captured with a lag or delay. In practice, some models are retrained daily (e.g. recommendation engines) while others are retrained once or twice a year (e.g. credit scoring models).

 
