Big data, according to Harper Reed, is "bullshit". It's a surprising sentiment from the man who, as chief technology officer of the Obama for America campaign, helped head up the tech side of re-electing the US president in what was widely hailed as a political coming of age for big data.
Reed helped build a technology organisation that spearheaded the digital side of Barack Obama's election campaign, providing tools essential to 'get out the vote' efforts in a country where voting is not compulsory.
"Big data is a term we used in 2007 because it was hard to store data," Reed told the opening session of the CeBIT trade show in Sydney. "It was literally hard."
"It was expensive. It cost billions of dollars. The people who were doing it were doing it very well, but they were doing it in a closed room with huge budgets. And we were just kind of hippy computer science guys in the corner going, 'I have too much data! How do I store this?'"
The solutions to the problem have existed for a while now, Reed said. Technologies such as Hadoop, a platform created by Doug Cutting for distributed data crunching, and HBase, which is used to run Facebook's messaging system, along with the paper released by Google researchers outlining the search company's BigTable database system, mean that handling large data sets is now a lot easier.
"When we started talking about 'big data' it was about storage. The thing about storage is it has nothing to do with analysis, nothing to do with questions and answers. It's only about storing," Reed said.
"When I hear big data, I immediately hear marketing and a lot of people saying, like, 'Oh, well, we need to invest in big data.' ... I just look over and you see all these great brands... and they're doing really great things, but they've really jumped into this marketing world of talking about problems that are pretty much solved."
What companies on the big data bandwagon are really offering, Reed said, are analytics platforms for getting answers out of data.
"I think that's really the important thing," Reed said. "I'm just tired of it being called big data. It should just be called data. And the other thing is I bet there are very few people in this room who actually have data that is big. You probably have large data or medium data or long data. But it's the big data that's actually a pain in the arse still and it's hard."
Rayid Ghani, the campaign's chief data scientist, quipped that the amount of data the 'big data' Obama for America campaign had to deal with was less than he had in his home. "I have more hard disks in my apartment than the campaign had data," Ghani said.