DevOps.com
What is MLOps? DataOps? And Why do They Matter?
Let’s look at three distinct disciplines: DevOps, MLOps and DataOps. In 2011, Marc Andreessen famously proclaimed that software was “eating the world.” A little more than a decade later, it’s all but impossible to argue with his prediction, as software has embedded itself into virtually every industry.
But now another shift is underway and AI is gobbling up software as we know it. Our emails can finish our sentences for us, connected cars are helping us stay safer on the road and predictive technology is helping minimize supply chain disruptions caused by COVID-19. In the near future, almost every piece of software we interact with will have intelligence built-in, and as AI gains acceptance and new use cases are rolled out, applications will only continue to become more intelligent. The widespread move toward ML-enabled software has the potential to be as transformative and far-reaching as what Andreessen predicted.
But innovation is rarely easy or simple—and intelligent applications are no exception. While conventional software involves one thing—code—and not much else, intelligent software relies on a complex relationship between three interconnected variables or legs of a three-legged stool.
A Three-Legged Stool: DevOps, MLOps and DataOps
Model: One or more AI/ML models (e.g., linear regressions, rules, deep neural networks) trained to recognize patterns in data and make decisions are what make an application “intelligent.”
Data: Most often, a model is trained on historical data and programmed to emulate it. As a result, an application’s behavior depends heavily on the data—the raw input data, labels and features, as well as the new data to which the model is applied and ground-truth for this data.
Code: Code is the language an application uses to function. When used in the context of intelligent applications, the code pillar might refer to business logic, calls to models, receiving outputs, decision making and calls to other data systems.
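To make the three pillars concrete, here is a minimal sketch of how they fit together in one toy “intelligent” application. Everything in it is hypothetical: the linear scorer stands in for a trained model, the hard-coded feature dictionary stands in for real input data, and the decision function is the code pillar’s business logic.

```python
# Model: weights a real system would learn from historical training data.
# These values are invented for illustration only.
MODEL_WEIGHTS = {"speed": -0.8, "distance_to_obstacle": 1.2}

def predict(features: dict) -> float:
    """Model pillar: a toy linear scorer over named features."""
    return sum(MODEL_WEIGHTS[name] * value for name, value in features.items())

def is_safe_to_proceed(features: dict, threshold: float = 0.0) -> bool:
    """Code pillar: business logic turning the model's raw score into a decision."""
    return predict(features) > threshold

# Data pillar: the new input the model is applied to at inference time.
sample = {"speed": 0.5, "distance_to_obstacle": 0.9}
print(is_safe_to_proceed(sample))  # prints True for this sample
```

Removing any pillar breaks the application: without the weights there is no intelligence, without the features there is nothing to score, and without the decision logic the score never becomes behavior.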
A Real-World Example
To see how these pillars play out in real life, consider the example of a perception system for a self-driving car. Such systems use data (provided by offline sources and real-time cameras and sensors), models and code to determine myriad ever-changing variables (for example, whether there are obstacles on the road, what lane markings look like, how fast a car can safely travel, etc.). While greatly simplified, this view captures the basic steps needed to build and deploy such a system, and each step belongs to one of the three pillars.
Regardless of application type, the legs that intelligent applications stand on are always the same: Model, data and code. And if you try to remove any one of them, the application will topple over.
With the three distinct pillars that make up intelligent applications, three distinct disciplines have emerged to keep each individual pillar functioning as efficiently as possible: DataOps, MLOps and DevOps. The rest of this post examines the details that make each discipline distinct and necessary to the model life cycle.
DevOps
Once upon a time, companies used waterfall processes to develop software. The process often moved so slowly that products didn’t realize their potential before they died or were replaced. In 2001, the Agile Manifesto was published, allowing developers to work quickly and iteratively in tight sprints where they were constantly delivering. By 2009, Flickr, for example, was the envy of the industry, routinely performing 10+ deploys per day. Companies seeking to increase their own productivity combined software development with established operations principles, and the term “DevOps” was coined.
The familiar DevOps loop goes from the planning stages to creation, testing/verification, packaging, release, configuration and monitoring, then starts over again with planning. Over the past two decades, a host of unique tools have emerged to support individual steps in the DevOps life cycle, helping developers ensure quality by building in continuous integration and continuous testing from the start and helping speed up time-to-market with workflow automation. The DevOps ecosystem is robust, well-thought-out and always iteratively improving.
MLOps
MLOps helps teams move more quickly and efficiently through the model life cycle—a loop that involves six distinct steps, each of which contains unique considerations.
Since MLOps is related to DevOps, businesses sometimes try to adapt the processes used to write conventional software to execute the unfamiliar task of machine learning operationalization. But the approach doesn’t work, as the two disciplines are actually quite different.
MLOps primarily deals with models, not code—and models are quite different from code. For example, model training and testing look very different from code building and testing. (What does “test case” even mean for a model?) And monitoring model performance is completely different than monitoring traditional software. (DevOps looks at things like CPU utilization, latency and throughput, while MLOps looks at model quality, drift and data quality.)
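To illustrate how model monitoring differs from watching CPU or latency, here is a sketch of one common drift check: the population stability index (PSI), which compares the distribution of a feature at training time against what the model sees in production. The binning scheme, epsilon and the 0.2 alert threshold are conventional choices, not a prescription from any particular tool.

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 5) -> float:
    """Population Stability Index between training-time (expected) and live
    (actual) samples of one feature. Higher means more drift; a common rule
    of thumb flags PSI > 0.2 for investigation."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bucket_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1  # clamp values outside the training range
        # Small epsilon avoids log(0) when a bucket is empty.
        return [(c + 1e-6) / (len(values) + 1e-6 * bins) for c in counts]

    e, a = bucket_fractions(expected), bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

training = [0.1 * i for i in range(100)]        # feature seen at training time
live_ok = [0.1 * i for i in range(100)]         # same distribution: PSI near 0
live_bad = [5.0 + 0.1 * i for i in range(100)]  # shifted distribution: large PSI

print(psi(training, live_ok) < 0.1)   # no drift detected
print(psi(training, live_bad) > 0.2)  # drift flagged
```

Nothing in this check involves CPU utilization or throughput; the “health signal” is a statistical property of the data itself, which is exactly why MLOps monitoring needs its own tooling.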
Furthermore, the people that MLOps engineers are building for are quite different than their DevOps counterparts. As one might expect, the primary users of models are data scientists, while DevOps generally serves software developers. Most often, MLOps engineers come from a background in ML and data science, or are software engineers who picked up ML.
As with DevOps, no single all-inclusive MLOps platform exists. Rather, different tools support individual steps in the model life cycle and provide some amount of workflow automation to speed up processes, but the ecosystem is still somewhat fragmented. Ideally, MLOps will evolve to offer more DevOps-like tools that work well together and offer more automation. Companies looking to build an MLOps platform should assemble a suite of supporting tools with speed, safety and automation in mind.
DataOps
We’ve seen very well-defined life cycles for the other two pillars discussed above; however, when it comes to DataOps, the life cycle is still being defined. But the overall objective of DataOps should sound familiar: DataOps involves processes and tools to ship high-quality data frequently—which requires a combination of data engineering, data quality, data security and data integration.
As with DevOps and MLOps, a DataOps platform or DataOps engineer supports the tools that perform these activities and helps build workflows around them. For example, a workflow may take data from a database, apply some transformation to it and then make it available to a business intelligence (BI) tool. That’s the kind of automation DataOps involves.
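The database-to-BI workflow described above can be sketched as a small extract-transform-load step. This is a toy illustration, not any particular DataOps product: an in-memory SQLite table stands in for the operational database, and a CSV file stands in for the extract a BI tool would ingest. The table and column names are invented.

```python
import csv
import sqlite3

# Extract: pull raw order rows from an operational database.
# An in-memory SQLite DB stands in for the production source here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 120.0), ("east", 80.0), ("west", 200.0)],
)

# Transform: aggregate revenue per region inside the pipeline step.
rows = conn.execute(
    "SELECT region, SUM(amount) AS revenue FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()

# Load: publish the result as a CSV extract a BI tool could pick up.
with open("revenue_by_region.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["region", "revenue"])
    writer.writerows(rows)

print(rows)  # [('east', 200.0), ('west', 200.0)]
```

A real DataOps pipeline would add scheduling, data-quality checks and access controls around each of these stages, which is precisely where the discipline’s distinct skillset comes in.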
As you might imagine, the skillset and the tools required for DataOps are quite different from the ones needed in MLOps and DevOps—so expecting a DevOps engineer to also be an expert in DataOps seems a bit unrealistic. While MLOps requires a much tighter integration with data than DevOps historically has required, MLOps is not the same thing as DataOps. More so than the other pillars that support intelligent applications, DataOps tends to be quite separate and distinct.
Conclusion
From perception systems to enterprise-scale fintech software, every intelligent application depends on three distinct disciplines: DevOps, MLOps and DataOps. And therein lies the complexity of developing intelligent products. Each of these fields is distinct, dealing with a different set of questions and objectives within the ML life cycle and requiring different kinds of people and tools. However, they are all fundamentally united by a common goal: Optimizing quality and speed of iteration of the ML life cycle. The tools, practices and organizations that will rise to the top are the ones that will enable more seamless collaboration between different ML teams, helping them move through the ML life cycle quickly and efficiently.