A DataOps strategy relies heavily on collaboration as data flows between managers and consumers throughout the business. Because collaboration is so central to DataOps success, it's important to start with the right team to drive these initiatives.
It's natural to think of DataOps as simply DevOps for data, but that's not quite right. It would be more accurate to say that DataOps aims to achieve for data what DevOps achieves for code: a dramatic improvement in productivity and quality. However, DataOps has other problems to solve as well, in particular how to maintain a mission-critical system in continuous production.
The distinction matters when putting together a DataOps team. If the DevOps approach is the template, with Product Managers, Scrum Masters and Developers, the focus will end up on delivery. DataOps must also focus on continuous maintenance, which means drawing on other frameworks.
One key influence on DataOps has been Lean manufacturing techniques. Managers often use terms taken from the classic Toyota Production System, which has been much studied and imitated. Terms such as data factory also come up when the conversation turns to data pipelines in production.
This approach requires a distinctive team structure. Let's first look at some roles within a DataOps team.
The roles described here are for a DataOps team deploying data science in mission-critical production. What about teams that are less focused on data science, such as those running a data warehouse? Do they need DataOps, too? Certainly, some of the techniques may be similar, but a traditional team of extract, transform and load (ETL) developers and data architects will probably serve them well.
A data warehouse, by its nature, is less dynamic and more stable than an Agile, pipelined data environment. The DataOps team roles that follow handle the rather more volatile world of pipelines, algorithms and self-service users. Nevertheless, DataOps techniques are becoming more relevant as data warehouse teams push to be ever more Agile, especially with cloud deployments and data lakehouse architectures.

Let's start by defining the roles required for these new analytics techniques, beginning with the data scientist. Data scientists do research. If an organization knows what it wants and just needs someone to implement a predictive process, it should hire a developer who knows their way around algorithms.
The data scientist, on the other hand, explores for a living, discovering what is relevant and meaningful as they go. In the course of exploration, a data scientist may try numerous algorithms, often in ensembles of diverse models. They may even write their own algorithms. The DataOps team can make the difference between an enterprise that occasionally does cool things with data and an enterprise that runs efficiently and reliably on data, analytics and insight.
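To make the idea of an ensemble of diverse models concrete, here is a minimal sketch, assuming Python with scikit-learn and a synthetic dataset; the specific models, parameters and data are illustrative assumptions, not a method prescribed in this article.

```python
# A sketch of the kind of exploration described above: combining structurally
# different model types into a simple ensemble and comparing it by cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for an operational source (ERP, CRM and so on).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# An ensemble of diverse models, combined by averaging predicted probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("boost", GradientBoostingClassifier(random_state=42)),
    ],
    voting="soft",
)

# Cross-validated accuracy gives a first read on whether the ensemble is worth
# keeping -- the sort of comparison a data scientist iterates on many times.
scores = cross_val_score(ensemble, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.3f}")
```

In practice, a data scientist would repeat this loop with different features, models and metrics rather than settle on a single run.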
The key attributes for this role are restless curiosity and an interest in the domain, as well as the technical insight -- especially in statistics -- to understand the significance of what they discover and the real-world impact of their work. This diligence matters. It is not enough to find one good model and stop there, because business domains rapidly evolve. Also, while not everyone works in areas with compelling ethical dilemmas, data scientists in every domain sooner or later come across issues of personal or commercial privacy. This is a technical role, but don't overlook the human side, especially if the organization is hiring only one data scientist.
A good data scientist is a good communicator who can explain findings to a nontechnical audience, often executives, while being straightforward about what is and is not possible. Finally, the data scientist, especially one working in a domain that is new to them, is unlikely to know all the operational data sources -- ERP, CRM, HR systems and so on -- but they certainly need to work with the data.