This is a guest blog by Ranjan Bhattacharya
The earlier post on this topic covered the need for MLOps. This post will look into the various maturity levels for MLOps and also review some of the MLOps tools available.
As shown in the diagram below, an ML system can comprise a large and complex infrastructure, with cross-functional responsibilities spanning multiple teams:
The first two stages, part of the “ML Pipeline”, include tasks related to data preparation, model experimentation, and training, typically performed by data engineers, data scientists, and ML researchers. The last two stages, part of the “Deployment Pipeline”, involve model deployment, monitoring, and automatic retraining, which are handled by infrastructure engineers. The degree of automation achieved across these stages can be assessed using maturity levels of MLOps practices. Industry experts have proposed quite a few measures of maturity; one of the more popular is Google's “MLOps: Continuous delivery and automation pipelines in machine learning”, which is notable for its simplicity and coverage.
Organizations will have to iteratively evolve and mature the stages of the ML lifecycle based on the nature of the business problem being addressed. At one end of the spectrum, at MLOps Level 0, data scientists may be working individually on small datasets on their local machines and manually deploying models to production only a few times a year.
With increasing complexity of business cases, tools, and oversight, it becomes necessary to move up the maturity scale. MLOps Level 1 is recommended for companies whose datasets change more frequently, possibly monthly. At this level, the ML pipeline stage is automated, while the testing and deployment of the models can still be manual.
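The split described for Level 1 (an automated ML pipeline followed by a manual deployment step) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "model" is a trivial mean predictor, and every name here (`prepare_data`, `run_ml_pipeline`, `deploy`, `approved_by`) is hypothetical rather than part of any MLOps tool.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class TrainedModel:
    """A trivial stand-in model that always predicts the training mean."""
    prediction: float
    validation_mae: float  # mean absolute error on the held-out data


def prepare_data(raw: List[Optional[float]]) -> Tuple[List[float], List[float]]:
    """Drop missing values, then split 80/20 into train and validation sets."""
    clean = [x for x in raw if x is not None]
    split = int(len(clean) * 0.8)
    return clean[:split], clean[split:]


def train(train_set: List[float]) -> float:
    """'Training' here is just computing the mean of the training set."""
    return mean(train_set)


def evaluate(prediction: float, validation_set: List[float]) -> float:
    """Mean absolute error of the constant prediction on the validation set."""
    return mean([abs(x - prediction) for x in validation_set])


def run_ml_pipeline(raw: List[Optional[float]]) -> TrainedModel:
    """Automated part: data preparation, training, and evaluation in one call."""
    train_set, validation_set = prepare_data(raw)
    prediction = train(train_set)
    return TrainedModel(prediction, evaluate(prediction, validation_set))


def deploy(model: TrainedModel, approved_by: Optional[str] = None) -> str:
    """Manual part: nothing is deployed until a person signs off."""
    if not approved_by:
        return "pending manual approval"
    return f"deployed (approved by {approved_by})"


model = run_ml_pipeline([1.0, 2.0, None, 3.0, 4.0, 5.0])
print(model, deploy(model))
```

Rerunning `run_ml_pipeline` whenever new data arrives needs no human involvement, whereas `deploy` stays gated on an explicit approval, which mirrors the automated/manual boundary described above.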
Web-scale companies with very frequent data changes, daily if not hourly, need to be at MLOps Level 2 to allow for continuous retraining and deployment of ML models to multiple nodes or servers. Operating at this level may also require ML lifecycle responsibilities to be spread across multiple teams.
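As a rough rule of thumb, the three levels above can be read as a function of how often the data changes. The sketch below encodes that reading; the cut-off values (one day, 31 days) and the function name `recommend_level` are assumptions made for illustration, not thresholds taken from Google's guide.

```python
from enum import IntEnum


class MLOpsLevel(IntEnum):
    MANUAL = 0               # Level 0: manual, infrequent releases
    PIPELINE_AUTOMATION = 1  # Level 1: automated ML pipeline
    CICD_AUTOMATION = 2      # Level 2: continuous retraining and deployment


def recommend_level(days_between_data_changes: float) -> MLOpsLevel:
    """Map data change frequency to a maturity level (illustrative thresholds)."""
    if days_between_data_changes <= 0:
        raise ValueError("days_between_data_changes must be positive")
    if days_between_data_changes <= 1:    # daily or hourly changes
        return MLOpsLevel.CICD_AUTOMATION
    if days_between_data_changes <= 31:   # roughly monthly or more often
        return MLOpsLevel.PIPELINE_AUTOMATION
    return MLOpsLevel.MANUAL              # a few changes per year


for days in (1 / 24, 30, 120):
    print(days, recommend_level(days).name)
```

In practice, data change frequency is only one input; team structure, oversight requirements, and the number of deployment targets also influence the right level.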
For software (SaaS) companies that are helping enterprises build ML capabilities, the considerations for choosing the right MLOps maturity level get more nuanced. For companies building general-purpose ML solutions that target multiple cloud providers, being at the top of the MLOps maturity curve allows them to deploy a repeatable end-to-end process that addresses the needs of different organizations, although some customization may be unavoidable. Software companies specializing in a specific domain, such as language or image processing, chatbots, or fraud detection, may need different approaches to MLOps automation.
There is a wide choice of tools and technologies available for enabling MLOps in enterprise and software companies. All major public cloud providers offer a wide coverage of MLOps features which may be sufficient for most organizations.
Amazon SageMaker is AWS’s fully managed, integrated environment for AI/ML that provides a set of capabilities to build, train, deploy, and monitor ML models. It comes with the following services:
- Amazon SageMaker Studio, a visual interface for ML development, including organizing, tracking, comparing, and evaluating different machine learning experiments.
- Amazon SageMaker Model Monitor for detecting quality deviations for deployed machine learning models.
- Amazon SageMaker Autopilot for building machine learning models automatically with full visibility and control.
- Amazon SageMaker Pipelines, a continuous integration and continuous delivery (CI/CD) service for ML.
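To make "detecting quality deviations" concrete, here is a tool-agnostic sketch of the idea. It is not the SageMaker Model Monitor API: the function name `quality_deviation`, the use of per-batch accuracy scores, and the threshold of three standard errors are all assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def quality_deviation(
    baseline: Sequence[float],
    live: Sequence[float],
    threshold: float = 3.0,
) -> bool:
    """Flag a deviation when the live mean drifts too far from the baseline mean.

    The distance is measured in standard errors, using the baseline's
    sample standard deviation and the size of the live window.
    """
    base_mean = mean(baseline)
    base_std = stdev(baseline)  # requires at least two baseline points
    if base_std == 0:
        # A constant baseline: any change in the live mean is a deviation.
        return mean(live) != base_mean
    standard_error = base_std / len(live) ** 0.5
    z_score = abs(mean(live) - base_mean) / standard_error
    return z_score > threshold


# Hypothetical per-batch accuracy scores recorded at deployment time and later.
baseline_accuracy = [0.90, 0.92, 0.91, 0.89, 0.93]
print(quality_deviation(baseline_accuracy, [0.91, 0.90, 0.92, 0.91]))
print(quality_deviation(baseline_accuracy, [0.70, 0.72, 0.68, 0.71]))
```

A managed monitoring service typically adds scheduling, data capture, and alerting around a check of this kind, and may use richer statistics than a simple mean shift.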
Azure ML is Microsoft’s AI/ML service that offers similar capabilities for enabling ML pipelines. It comes with the following services:
- Azure Machine Learning Designer, a visual canvas to build, test, and deploy machine learning models.
- Azure Pipelines for building and managing automated ML deployments.
- Azure Monitor for tracking and analyzing metrics.
- Azure MLOps, Microsoft’s own open-source MLOps environment.
Google AI Platform is Google Cloud’s AI/ML service offering which provides the following services:
- AI Platform Notebooks, a managed Jupyter Notebook service that offers an integrated and secure JupyterLab environment to experiment, develop, and deploy models into production.
- AI Platform Training, a fully managed model training service.
- AI Platform Predictions to deploy and run models at scale and make them available for online and batch prediction requests.
- AI Platform Pipelines for applying MLOps with best practices and robust, repeatable pipelines.
Each of these cloud providers also offers additional capabilities to enable ML at scale, such as distributed data storage and replication, stream processing, containerization, and access to GPUs. They also offer specialized ML services for text, audio, and image processing, which can address specific business needs.
For software companies whose products need to deploy to multiple cloud providers, or enterprises exploring a multi-cloud strategy, it may be worth looking into cloud-agnostic MLOps tools. There are quite a few choices in this space, ranging from fully free and open-source to subscription-based. Keep in mind, though, that because this field is so nascent, no single tool may satisfy all requirements. Most companies will need a combination of different tools for end-to-end lifecycle automation. The major categories of tools in this space are versioning, orchestration, experiment tracking, and model deployment.
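Two of these categories, versioning and experiment tracking, can be illustrated with the standard library alone. The sketch below is not modeled on any particular tool: `dataset_version` and `log_run` are hypothetical helpers, and the 12-character hash prefix is an arbitrary choice.

```python
import hashlib
import json
from typing import Any, Dict, List


def dataset_version(rows: List[List[Any]]) -> str:
    """Derive a version identifier from the dataset's content.

    Identical data always yields the same identifier, so a training run
    can be tied to the exact data it used.
    """
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def log_run(
    params: Dict[str, Any],
    metrics: Dict[str, float],
    data_version: str,
) -> str:
    """Serialize one experiment run as a JSON record."""
    record = {"params": params, "metrics": metrics, "data_version": data_version}
    return json.dumps(record, sort_keys=True)


version = dataset_version([[1, 2], [3, 4]])
print(log_run({"learning_rate": 0.01}, {"accuracy": 0.91}, version))
```

Dedicated tools add storage, comparison views, and collaboration on top of records like this, which is part of why most teams end up combining several tools.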
For enterprise and software companies to advance their ML capabilities and enable agility, reproducibility, auditability, and maintainability of their ML models, it is becoming increasingly necessary to incorporate MLOps practices. Thinking in terms of MLOps maturity levels can help assess the current state of the practice and also create a roadmap for getting to the next maturity level as dictated by the strategic objectives of the business.
Ranjan is a senior technology leader from Boston who has built and led technology teams to build innovative SaaS solutions, several of them incorporating Data Science/Machine Learning, in different companies across multiple industries.