DataOps (data operations) is an agile, process-oriented methodology for developing and delivering analytics. It brings together DevOps teams with data engineers and data scientists to provide the tools, processes and organizational structures to support the data-focused enterprise. Michele Goetz, vice president and principal analyst at Forrester, defines DataOps as, "the ability to enable solutions, develop data products, and activate data for business value across all technology tiers from infrastructure to experience."
According to Dataversity, the goal of DataOps is to streamline the design, development, and maintenance of applications based on data and data analytics. It seeks to improve the way data is managed and products are created, and to coordinate these improvements with the goals of the business.
DevOps is a software development methodology that brings continuous delivery to the systems development lifecycle by combining development teams and operations teams into a single unit responsible for a product or service. DataOps builds on that concept by adding data specialists — data analysts, data developers, data engineers, and/or data scientists — to focus on the collaborative development of data flows and the continuous use of data across the organization.
"You've got the modern trend for development of DevOps, but more and more people are injecting some sort of data science capability into development, into systems, so you need someone on the DevOps team who has a data frame of mind," says Ted Dunning, CTO for MapR at HPE and co-author of Machine Learning Logistics: Model Management in the Real World.
Like DevOps, DataOps takes its cues from the agile methodology. The approach values continuous delivery of analytic insights with the primary goal of satisfying the customer.
According to the DataOps Manifesto, DataOps teams value analytics that work, measuring the performance of data analytics by the insights they deliver. DataOps teams also embrace change and seek to constantly understand evolving customer needs. They self-organize around goals and seek to reduce “heroism” in favor of sustainable and scalable teams and processes.
DataOps teams also seek to orchestrate data, tools, code, and environments from beginning to end, with the aim of providing reproducible results. DataOps teams tend to view analytic pipelines as analogous to lean manufacturing lines and regularly reflect on feedback provided by customers, team members, and operational statistics.
Enterprises today are increasingly injecting machine learning into a vast array of products and services and DataOps is an approach geared toward supporting the end-to-end needs of machine learning.
"For example, this style makes it more feasible for data scientists to have the support of software engineering to provide what is needed when models are handed over to operations during deployment," Dunning and co-author Ellen Friedman, principal technologist at HPE, write.
"The DataOps approach is not limited to machine learning," they add. "This style of organization is useful for any data-oriented work, making it easier to take advantage of the benefits offered by building a global data fabric."
They also note DataOps fit well with microservices architectures.
To make the most of DataOps, enterprises must evolve their data management strategies to deal with data at scale and in response to real-world events as they happen, according to Dunning and Friedman.