DevOps for Machine Learning

Machine learning is easier to use than ever before, but how do we coordinate the teams involved? Software engineering has been maturing for decades, but introducing an unfamiliar parallel workstream like data science brings new challenges. Thankfully, we can draw on the cross-team collaborative practices of DevOps to bring data science and software engineering into sync.

Building effective predictive models involves data acquisition and preparation, followed by resource-intensive experimentation and training. Data scientists don't tend to focus on testing models and deploying them to production in step with software releases. Working with software engineers and IT operations through a repeatable, automated, coherent pipeline is still an afterthought in many organisations, and data science teams are often left out of the loop.

Let's look at an end-to-end software and data science delivery pipeline that is repeatable and robust. We'll focus on model source control, repeatable data preparation, model training and continuous retraining, code validation and testing, model storage and versioning, and production deployment. Along the way, we'll see how data scientists and software engineers can work together effectively to produce smart software. Let's learn how!