
The importance of monitoring machine learning models

Changing assumptions and ever-changing data mean the work doesn’t end after deploying machine learning models to production. These best practices keep complex models reliable.

Agile development teams must ensure that microservices, applications, and databases are observable, have monitoring in place to identify operational issues, and use AIops to correlate alerts into manageable incidents.

When users and business stakeholders want enhancements, many devops teams follow agile methodologies to process feedback and deploy new versions.

Even if there are few requests, devops teams know they must upgrade apps and patch underlying components; otherwise, the software developed today will become tomorrow’s technical debt.

The life-cycle management of machine learning models is more complex than that of software.

Andy Dang, cofounder and head of engineering at WhyLabs, explains, “Model development life cycle resembles software development life cycle from a high level, but with much more complexity. We treat software as code, but data, the foundation of an ML model, is complex, highly dimensional, and its behaviour is unpredictable.”

In addition to code, components, and infrastructure, models are built using algorithms, configuration, and training data sets. These are selected and optimised at design time but need updating as assumptions and the data change over time.

Why monitor machine learning models?

Like monitoring applications for performance, reliability, and error conditions, machine learning model monitoring gives data scientists visibility into model performance. ML monitoring is especially important when models are used for predictions or when they run on datasets with high volatility.

Dmitry Petrov, cofounder and CEO of Iterative, says, “The main goals around model monitoring focus on performance and troubleshooting as ML teams want to be able to improve on their models and ensure everything is running as intended.”

Rahul Kayala, principal product manager at Moveworks, offers this explanation of ML model monitoring.

“Monitoring can help businesses balance the benefits of AI predictions with their need for predictable outcomes,” he says. “Automated alerts can help ML operations teams detect outliers in real time, giving them time to respond before any harm occurs.”

Stu Bailey, cofounder of ModelOp, adds, “Coupling robust monitoring with automated remediation accelerates time to resolution, which is key for maximising business value and reducing risk.”

In particular, data scientists need to be notified of unexpected outliers. “AI models are often probabilistic, meaning they can generate a range of results,” says Kayala.

“Sometimes, models can produce an outlier, an outcome significantly outside the normal range. Outliers can be disruptive to business outcomes and often have major negative consequences if they go unnoticed. To ensure AI models are impactful in the real world, ML teams should also monitor trends and fluctuations in product and business metrics that AI impacts directly.”

For example, let’s consider predicting a stock’s daily price. When market volatility is low, algorithms such as long short-term memory (LSTM) networks can provide rudimentary predictions, and more comprehensive deep learning algorithms can improve accuracy. But most models will struggle to make accurate predictions when markets are highly volatile, and model monitoring can alert on these conditions.
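As a minimal sketch of that kind of condition alert, assume daily closing prices are available as a pandas Series; the 30-day window, 2% threshold, and the notify_ml_team hook are illustrative, not anything prescribed by the article.

```python
import pandas as pd

def volatility_alert(prices: pd.Series, window: int = 30, threshold: float = 0.02) -> bool:
    """Flag when recent realised volatility exceeds a threshold.

    prices: daily closing prices indexed by date (illustrative input).
    Returns True when the price model's output should be treated with caution.
    """
    daily_returns = prices.pct_change().dropna()
    realised_vol = daily_returns.rolling(window).std().iloc[-1]
    return bool(realised_vol > threshold)

# Example: check market conditions before trusting the day's prediction.
# notify_ml_team is a hypothetical alerting hook, not a real API.
# if volatility_alert(closing_prices):
#     notify_ml_team("High volatility: prediction confidence degraded")
```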

Another type of ML model performs classification, and precision and recall metrics can help track its accuracy.

Precision measures how many of the positives the model selected are actually positive, while recall tracks a model’s sensitivity: how many of the actual positives it identifies. ML monitoring can also alert on model drift, such as concept drift, when the underlying statistics of what’s being predicted change, or data drift, when the input data changes.
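A quick sketch of computing both metrics, here with scikit-learn; the library choice and the toy labels are illustrative.

```python
from sklearn.metrics import precision_score, recall_score

# y_true: observed labels collected after the fact; y_pred: the model's predictions.
# Both are toy placeholders for whatever labelled feedback a team gathers in production.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # of the positives the model selected, how many were correct
recall = recall_score(y_true, y_pred)        # of the actual positives, how many the model found

print(f"precision={precision:.2f}, recall={recall:.2f}")  # both 0.75 for this toy example
```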

A third concern is explainable ML, where models are probed to determine which input features contribute most significantly to the results. This issue relates to model bias, where the training data has statistical flaws that skew the model toward erroneous predictions.
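One common way to estimate those feature contributions is permutation importance, which shuffles each feature in turn and measures how much the model’s score degrades. The sketch below uses scikit-learn and synthetic data purely for illustration; it does not reflect any particular vendor’s approach.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real training set.
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature and measure how much the test score drops:
# features with a large drop contribute most to the model's predictions.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature_{i}: {importance:.3f}")
```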

These issues can erode trust and cause significant business problems. Model performance management aims to address them across the development, training, deployment, and monitoring phases.

Krishnaram Kenthapadi, chief scientist at Fiddler, believes that explainable ML with reduced risk of biases requires model performance management.

“To ensure ML models are not unduly discriminating, enterprises need solutions that deliver context and visibility into model behaviours throughout the entire life cycle—from model training and validation to analysis and improvement,” says Kenthapadi.

“Model performance management ensures models are trustworthy and helps engineers and data scientists identify bias, monitor the root cause, and provide explanations for why those instances occurred in a timely manner.”

Best practices in ML monitoring

Modelops, ML monitoring, and model performance management are terms for practices and tools to ensure machine learning models operate as expected and provide trustworthy predictions. What underlying practices should data science and devops teams consider in their implementations?

Josh Poduska, chief field data scientist at Domino Data Lab, says, “Model monitoring is a critical, ongoing process. To improve future accuracy for a model that has drifted, retrain it with fresher data, along with its associated ground truth labels that are more representative of the current reality.”

Ira Cohen, chief data scientist and cofounder at Anodot, shares important factors in ML model monitoring. “First, models should monitor output and input features’ behaviour, as shifts in the input features can cause issues,” he says. He suggests using proxy measures when model performance cannot be measured directly or quickly enough.

Cohen adds that data scientists need tools built for model monitoring: “Monitoring models manually is not scalable, and dashboards and reports are not equipped to handle the complexity and volume of the monitoring data generated when many AI models are deployed.”

Here are some recommended practices for ML model monitoring and performance management. Petrov advises, “Ensure you have the tools and automation in place upstream at the beginning of the model development life cycle to support your monitoring needs.”

Meanwhile, Dang says, “Data engineers and scientists should run preliminary validations to ensure their data is in the expected format. As the data and the code move through a CI/CD pipeline, they should enable data unit testing through validations and constraint checks.”
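A minimal sketch of the kind of constraint checks Dang describes, assuming incoming data arrives as a pandas DataFrame; the column names and value ranges are hypothetical.

```python
import pandas as pd

def validate_batch(df: pd.DataFrame) -> list:
    """Return a list of constraint violations for an incoming data batch."""
    errors = []
    # Schema check: the columns this hypothetical pipeline expects.
    expected = {"customer_id", "age", "purchase_amount"}
    missing = expected - set(df.columns)
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    # Constraint checks: uniqueness, nulls, and value ranges.
    if df["customer_id"].duplicated().any():
        errors.append("duplicate customer_id values")
    if df["age"].isna().any() or not df["age"].between(0, 120).all():
        errors.append("age contains nulls or out-of-range values")
    if (df["purchase_amount"] < 0).any():
        errors.append("negative purchase_amount values")
    return errors

# In a CI/CD pipeline, fail the run when any constraint is violated:
# violations = validate_batch(new_batch)
# assert not violations, violations
```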

Cohen suggests, “Use scalable anomaly detection algorithms that learn the behaviour of each model’s inputs and outputs to alert when they deviate from the norm, effectively using AI to monitor AI.”
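In its simplest form, that idea amounts to learning a signal’s normal behaviour and alerting on deviations. The rolling z-score sketch below is far simpler than a production anomaly detector, and the window and threshold are illustrative, but it shows the pattern.

```python
import numpy as np

def zscore_anomalies(values: np.ndarray, window: int = 100, threshold: float = 3.0) -> np.ndarray:
    """Flag points that deviate strongly from a rolling baseline.

    values: a time series of a monitored signal, e.g. the hourly mean of a
    model's predictions or of one input feature (illustrative).
    """
    flags = np.zeros(len(values), dtype=bool)
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = baseline.mean(), baseline.std()
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            flags[i] = True
    return flags
```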

In addition, Kayala says, “Track the drift in the distribution of features. A large change in distribution indicates the need to retrain our models to achieve optimal performance.”
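One way to quantify the shift Kayala describes is a two-sample Kolmogorov-Smirnov test comparing a feature’s training-time and live distributions; the SciPy call and p-value threshold here are illustrative, not a prescribed method.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values, live_values, p_threshold: float = 0.01) -> bool:
    """Compare a feature's live distribution with its training-time distribution.

    A small p-value suggests the two samples come from different distributions,
    which is a signal to investigate and possibly retrain.
    """
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < p_threshold

# Synthetic example: the live feature has shifted upward by half a standard deviation.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5000)
live = rng.normal(loc=0.5, scale=1.0, size=5000)
print(feature_drifted(train, live))  # True: the distribution has drifted
```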

Also, Bailey adds, “Increasingly, organisations are looking to monitor model risk and ROI as part of more comprehensive model governance programs, ensuring that models meet business and technical KPIs.”

Software development largely focuses on maintaining the code, monitoring application performance, improving reliability, and responding to operational and security incidents. In machine learning, ever-changing data, volatility, bias, and other factors require data science teams to manage models across their life cycle and monitor them in production.