MLOps [5] - What DevOps practices can be applied to Machine Learning projects?
This video and post are part of a One Dev Question series on MLOps - DevOps for Machine Learning. See the full video playlist here, and the rest of the blog posts here.
There are many lessons and good practices we've learned from decades of software development and more than a decade of "DevOps". Most (but not all) of these can be applied to machine learning projects.
The day-to-day work of data scientists for machine learning projects is usually very different from the day-to-day work of traditional software developers. While there's still (usually) code, there's more experimentation, a focus on data, longer cycle times, etc.
Because of this, it can be tempting to think all the process around that work must also be different. Data scientists are (usually) not developers after all.
But DevOps is broad, and although many of its practices can be prescriptive, the fundamental ideas are not. Even the practices, when tweaked and applied appropriately, fit perfectly well into an ML world.
The open source MLflow platform is a perfect example of how some of these DevOps concepts align nicely to machine learning projects.
Earlier this month, Azure Machine Learning announced a public preview of native support for MLflow, meaning (amongst other things) that you can take your existing on-prem workloads and MLOps processes and move them to the cloud. There's support for MLflow Tracking, Projects, Models, and Registry.
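To give a flavour of how little needs to change, here's a minimal sketch of pointing an existing MLflow workload at an Azure ML workspace. It assumes the azureml-mlflow package is installed and a config.json for your workspace is available locally; the experiment name is just a placeholder.

```python
import mlflow
from azureml.core import Workspace  # requires the azureml-core and azureml-mlflow packages

# Load the workspace from a local config.json (downloaded from the Azure portal).
ws = Workspace.from_config()

# Point the MLflow client at the Azure ML workspace instead of a local ./mlruns folder.
# Existing MLflow code keeps working; runs now land in the workspace's experiment history.
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.set_experiment("my-on-prem-experiment")  # placeholder experiment name
```

From this point on, any standard MLflow logging calls in your existing scripts record to the workspace rather than local files.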
Let's look again at the definition for DevOps that we use at Microsoft:
DevOps is the union of people, process, and products to enable continuous delivery of value to the end user.
What we're really talking about is coordination in order to deliver value. That also fits perfectly with what MLOps is trying to achieve. The people, process, and products may differ, but the focus on delivering value continuously does not.
So let's look briefly at some of the fundamental practices in DevOps to see how they can align with MLOps:
- Planning and tracking work - Keeping track of current and historical experiments gives us continuity and provenance: we can easily look back at what was tried and what the results were (see the tracking sketch after this list).
- Source control - Keeping all ML code in source control means team members can collaborate more easily. If practices like trunk-based development and appropriate branching strategies are used, it also means there's a central repository of the main production-candidate code, alongside any work in progress.
- Infrastructure as code - By defining the training environment requirements in code, resources can be spun up and torn down efficiently and consistently. When combined with cloud-first training, this means effective elastic compute that you pay for only when you need it. And by defining the production or inferencing environments in code, you can scale up and down easily, and even spin up test environments that you know will mirror production.
- Continuous Integration - While the actual "build" (read: training run) is almost always more compute-, time-, and data-intensive than a traditional software build, that doesn't mean some of the same concepts can't be applied. Quality gates and tests can still be run, and executing training runs from the main branch in source control gives consistency and reproducibility.
- Continuous Delivery - Following on from an automated training run, deployable artefacts in the form of packaged models can be versioned and tracked (see the model registry sketch after this list). When combined with production environment definitions in code, new models can be deployed at a moment's notice.
- Continuous Deployment - Automated deployment of portable, packaged models is the next step. Controlling the rollout of these new models is also something DevOps does really well: gradual rollout techniques like canary deployments, rolling deployments, deployment rings, and even A/B testing or shadow deployments of models can be extremely useful.
- Production Monitoring - Watching what happens in production is key to closing the loop. Production monitoring is more than just watching for errors; it's about knowing whether your hypotheses about what's "valuable" are actually correct. This is even more true for machine learning projects, where it's often difficult to really evaluate "value" until the model has been running in production for some time.
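As promised above, here's a minimal sketch of what experiment tracking looks like with MLflow. The parameter values, metric, and artifact file name are placeholders for whatever your training script actually produces.

```python
import mlflow

# A minimal training-run skeleton: every run records its parameters, metrics, and
# artifacts, so past experiments can be compared and reproduced later.
with mlflow.start_run(run_name="baseline-logistic-regression"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("max_iter", 200)

    # ... train and evaluate the model here ...
    accuracy = 0.87  # placeholder result from evaluation

    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_artifact("confusion_matrix.png")  # any file produced by the run (must exist on disk)
```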
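And here's a sketch of the model packaging and registration step that feeds continuous delivery. It assumes scikit-learn is installed and that the MLflow tracking URI points at a registry-capable backend (such as an Azure ML workspace or a database-backed MLflow server); the registered model name is hypothetical.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small example model.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    # Log the trained model as a versioned, portable MLflow artifact and register it.
    # Registering the same name again creates a new model version in the registry,
    # which downstream deployment pipelines can pick up and roll out gradually.
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        registered_model_name="iris-classifier",  # hypothetical registry name
    )
```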
These are just some of the practices in DevOps that can be applied very effectively to machine learning projects. Of course, consideration must be given to the differences in the flow of work, but the core ideas are still present.
I'd strongly encourage you to check out Azure Machine Learning's support for MLflow. In particular, we have some great docs that walk you through how to use Azure ML to track experiments.
If you want to get straight into running code, there are some excellent MLflow examples in the Azure Machine Learning Notebooks repository on GitHub.