As 2016 draws to a close, a new study suggests big data is growing in maturity and surging in the cloud.
AtScale, which specializes in BI on Hadoop using OLAP-like cubes, recently conducted a survey of more than 2,550 big data professionals at 1,400 companies across 77 countries. The survey was conducted in conjunction with Cloudera, Hortonworks, MapR, Cognizant, Trifacta and Tableau.
AtScale's 2016 Big Data Maturity Survey found that nearly 70 percent of respondents have been using big data for more than a year (compared with 59 percent last year). Seventy-six percent of respondents are using Hadoop today, and 73 percent say they are now using Hadoop in production (compared with 65 percent last year). Additionally, 74 percent have more than 10 Hadoop nodes and 20 percent have more than 100 nodes.
"The maturity of respondents in this survey is a key consideration," Thomas Dinsmore, big data analytics industry analyst and author of the book "Disruptive Analytics," said in a statement Wednesday. "One in five respondents has more than 100 nodes and 74 percent of them are in production, indicating double-digit growth year-over-year."
Respondents also say they are increasingly turning to the cloud to host their big data analytics. Fifty-three percent of respondents say they have already deployed big data in the cloud, and 14 percent of respondents have all their big data in the cloud. Seventy-two percent plan to use the cloud for a big data deployment in the future.
"There's been a clear surge in use of big data in the cloud over the last year and what's perhaps as interesting is the fact that respondents are far more likely to achieve tangible value when their data is in the cloud," says AtScale CTO and co-founder Matt Baird.
Hadoop is better off-premises
"Hadoop is freaking hard," adds Dave Mariani, CEO and founder of AtScale. "It's really hard to deploy, it's really hard to manage. I see a lot of customers really like not having to worry about managing their Hadoop cluster. Being able to elastically scale, not just add new nodes but also shrink them, and to use object storage as a persistent layer to do that, that is a completely different notion than on-prem Hadoop."
Alongside big data's increasing maturity, the primary workloads are also shifting.
"The number one workload last year was ETL, then business intelligence, then data science," says Bruno Aziza, chief marketing officer of AtScale. "This year, the number one workload was business intelligence."
BI is big
ETL and data science remain popular big data workloads, but business intelligence (BI), which was already trending upward last year, has become the predominant workload, with 75 percent of respondents using or planning to use BI on big data. And that's not slowing down any time soon if the indications are correct. Fully 97 percent of respondents said they would do as much or more with big data over the next three months.
While there has been a lot of hype around Spark, the survey found that 42 percent of organizations use Spark for educational purposes but have no real project using it yet. A third of respondents say Spark is primarily in development today, while 25 percent say they have deployed Spark in both development and production.
"There's a lot of excitement around Spark, but very little real-life deployment," Aziza says.
"If you look at those planning on using Hadoop, most people go in thinking, 'I'm going to be using Spark as my primary engine.' But when you actually start using Hadoop, most people use Hive," Mariani adds. "You would never use Spark for an ETL pipeline. You're going to use Hive for that. But we would never use Hive for interactive queries; we'd use Spark or Impala for that."
It should be noted, however, that organizations that have deployed Spark in production were 85 percent more likely to achieve value.
Among the concerns respondents raised about big data, accessibility, security and governance have become the fastest-growing year over year, with worries related to governance growing the most, at 21 percent.