What is DataOps?
DataOps (data operations) is an emerging discipline that brings together DevOps teams with data engineers and data scientists to provide the tools, processes and organizational structures needed to support the data-focused enterprise.
"You've got the modern trend for development of DevOps, but more and more people are injecting some sort of data science capability into development, into systems, so you need someone on the DevOps team who has a data frame of mind," says Ted Dunning, chief applications architect at MapR Technologies and co-author of Machine Learning Logistics: Model Management in the Real World.
Like DevOps, the DataOps approach takes its cues from the agile methodology. The approach values continuous delivery of analytic insights with the primary goal of satisfying the customer.
DataOps teams value analytics that work; they measure the performance of data analytics by the insights they deliver. DataOps teams embrace change, and seek to constantly understand evolving customer needs.
DataOps teams are teams. They self-organize around goals, and seek to reduce “heroism” in favor of sustainable and scalable teams and processes.
DataOps teams seek to orchestrate data, tools, code, and environments from beginning to end. Reproducible results are essential. DataOps teams tend to view analytic pipelines as analogous to lean manufacturing lines.
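To make the idea of a reproducible, end-to-end pipeline concrete, here is a minimal Python sketch. It is an illustration only and does not come from Dunning or Friedman. The stage names (extract, drop_small_orders, add_tier), the synthetic order data and the thresholds are all hypothetical. The sketch assumes each stage is a deterministic function and that hashing the final output is an acceptable way to confirm that two runs produced the same result.

```python
import hashlib
import json
import random
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[List[Record]], List[Record]]


def extract(seed: int, n: int = 100) -> List[Record]:
    """Hypothetical source stage: build synthetic orders from a fixed seed."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    return [
        {"order_id": i, "amount": round(rng.uniform(5, 500), 2)}
        for i in range(n)
    ]


def drop_small_orders(records: List[Record]) -> List[Record]:
    """Hypothetical cleaning stage: keep orders of at least 20.00."""
    return [r for r in records if r["amount"] >= 20.0]


def add_tier(records: List[Record]) -> List[Record]:
    """Hypothetical enrichment stage: label each order by size."""
    return [
        {**r, "tier": "high" if r["amount"] >= 250.0 else "standard"}
        for r in records
    ]


def run_pipeline(records: List[Record], stages: List[Stage]) -> List[Record]:
    """Pass the records through each stage in order, like stations on a line."""
    for stage in stages:
        records = stage(records)
    return records


def fingerprint(records: List[Record]) -> str:
    """Hash a canonical JSON form of the output so runs can be compared."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


STAGES: List[Stage] = [drop_small_orders, add_tier]


def run(seed: int) -> str:
    """Run the whole pipeline for a given seed and return its fingerprint."""
    return fingerprint(run_pipeline(extract(seed), STAGES))
```

Calling run(42) twice should return the same fingerprint, while a different seed should produce a different one. A real team would apply the same check to versioned data, code and environments rather than to a random seed.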
Where DataOps fits
Enterprises today are increasingly injecting machine learning into a vast array of products and services, Dunning says, and DataOps is an approach geared to supporting the end-to-end needs of machine learning.
"For example, this style makes it more feasible for data scientists to have the support of software engineering to provide what is needed when models are handed over to operations during deployment," Dunning and co-author Ellen Friedman, principal technologist at MapR, write.
"The DataOps approach is not limited to machine learning," they add. "This style of organization is useful for any data-oriented work, making it easier to take advantage of the benefits offered by building a global data fabric."
They also note that DataOps fits well with microservices architectures.
DataOps in practice
As enterprises adopt emerging data technologies, Dunning and Friedman say they must evolve their approach so they can work with data at scale and respond to real-world events as they happen.
"Traditionally siloed roles can prove too rigid and slow to be a good fit in big data organizations undergoing digital transformation," they write. "That's where a DataOps style of work can help."
The DevOps approach brings together specialists in software development and operations to more closely align development with business objectives and to shorten development cycles and increase deployment frequency. It emphasizes cross-functional teams that cut across "skill guilds" like operations, software engineering, architecture and planning, and product management. DataOps adds data science and data engineering roles to the mix, with the aim of increasing collaboration and communication among developers, operations professionals and data experts.
Dunning emphasizes that attaining the alignment promised by DataOps requires embedding data scientists in the DataOps team.
"I think the most important thing to do here is to not stick with the more traditional Ivory Tower organization where data scientists live apart from dev teams," Dunning says. "The most important step you can take is to actually embed data scientists in a DevOps team. When they live in the same room, eat the same meals, hear the same complaints, they will naturally gain alignment."
"Don't make them a thing apart," he adds. "They need to hear frontline comments, recommend the same solutions, undergo the same triage. That embedding is the key step to take."
However, Dunning also notes that data scientists aren't necessarily permanently embedded in a DataOps team.
"Typically, there's a data scientist embedded in the team for a time," Dunning says. "Their capabilities and sensibilities begin to rub off. Someone on the team then takes on the role of data engineer and kind of a low-budget data scientist. The actual data scientist embedded in the team then moves along. It's a fluid situation."
How to build a DataOps team
Building a DataOps team doesn't necessarily mean you must hire new specialists. Friedman notes that many enterprises already have the nucleus of a DataOps team in existing DevOps teams. The next step is to identify both the projects that need data-intensive development and someone with data training to work on them. That person may even be a data engineer rather than a full-on data scientist.
"When you're covering these different skills, and putting them together toward this common goal, that doesn't necessarily mean you're having to hire a bunch of people to fill these roles," Friedman says. "Often you have these people with the key skills. It just requires a realignment to understand what the key roles are."
The important part, she says, is improving collaboration between skill sets for efficiency and better use of people's time and expertise.
"In large-scale projects, a particular DataOps role may be filled by more than one person, but it's also common that some people will cover more than one role," Dunning and Friedman write in their book. "Operations and software engineering skills may overlap; team members with software engineering experience also may be qualified as data engineers. Often, data scientists have data engineering skills. It's rare, however, to see overlap between data science and operations."
Dunning and Friedman say it's also key that DataOps teams share a common goal: meeting the data-driven needs of the services they support.
"With engineering teams, good engineers, what you need to do is you need to set goals well," Dunning says. "Once there's a common goal, solving a problem, then the team organizes itself very often toward solving that problem. The difficulty comes when different people see different aspects of the problem. Ops people are going to be worried about reliability, that you get an answer within a certain time. The data science person tends to be focused on the accuracy of the answer. You've already got a bit of a divergence. But if they're trying to solve the same problem and they're willing to compromise on how it's solved, I think it's a pretty easy social structure to build up."