No unified stack soon for big data

Hadoop and other big data tools need a standardized software stack to deploy more easily, experts agreed at a GigaOm panel

Despite the growing interest in big data platforms, it may be some time before organizations can deploy a standardized big data software stack, speakers concluded Wednesday during a virtual panel hosted by GigaOm.

The panelists agreed that a standardized stack of big data analysis software would make it easier to develop large-scale data analysis systems, in much the same way the open source LAMP stack engendered a whole generation of Web 2.0 services over the past decade. But the ways software such as Hadoop can be used vary so much that it may be difficult to settle on one core package of technologies, the panelists said.

LAMP is an abbreviation for a set of software programs that work very well together: Linux, the Apache Web server, the MySQL database and a set of programming languages--Perl, Python and PHP.

LAMP "provided a common framework upon which people could build. It was freely available. It was easily understood. It ran on almost anything. It created a foundation upon which a generation of startups grew up," said independent consultant Paul Miller, who moderated the panel, "Designing for Big Data: The New Architectural Stack."

"As we're beginning to see an explosion of interest in big data, do we need a stack that is similarly ubiquitous? Do we need a LAMP stack for big data?" Miller asked.

All agreed that not having a standardized stack slows deployments of big data systems. "There isn't a standard stack, and people aren't clear which piece works best for a particular workload. There's a trial and error period going on now," said Jo Maitland, a research director covering cloud technology for GigaOm Pro.

One reason LAMP was so popular was that its users all had similar needs, all based around putting services online, pointed out Mark Baker, Canonical Ubuntu server product manager. The needs around analysis, on the other hand, tend to vary from business to business, and change often, he noted.

Large Web services companies that use Hadoop, such as eBay and Twitter, are running in a "continuous beta," and they hire a lot of technically competent staff to handle the pace of rapid change, said Mark Staimer, president of Dragon Slayer Consulting.

"Having a constantly evolving platform and stack is fine for them. They have the process and culture within the company to manage it," Staimer said. The more traditional "brick and mortar" companies are "much more conservative," Staimer added. "They like to see a fully baked solution."

Arriving at such a stack may be difficult, given the variety of technologies available and the difficulty of connecting them in various configurations.

"Now we have loads of different pieces out there that you can plug together. Just in the database space, there is MongoDB, Cassandra, HSpace," Maitland said. All this choice "makes it more difficult for people. We're in a mashup situation with all these different components."

Such variety came about to address differing needs among users, Baker said. MySQL, for instance, is very fast at reading data, whereas the Cassandra data store can write data more quickly. The production company behind the U.K. television show "Britain's Got Talent" used a Cassandra database to log the votes of viewers choosing their favorite performer, because it could ingest a high number of simultaneous writes, Baker noted.

A number of companies, such as Cloudera, Hortonworks and MapR, have released commercial Hadoop distributions in which all the software components are integrated. But even Hadoop itself is not suited for all jobs, Maitland argued. It processes data as batch jobs, meaning the full data set must be written to a file before it can be analyzed. Many jobs, however, involve the analysis of continually updated data, such as click streams or Twitter messages.
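
The difference Maitland describes can be illustrated with a minimal Python sketch. It does not use Hadoop or any other big data tool; the function names and the sample click stream are hypothetical, chosen only to contrast a batch count, which needs the complete data set up front, with a running count that is updated as each event arrives.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_count(events: list[str]) -> Counter:
    """Batch style: the complete data set must exist before analysis starts."""
    return Counter(events)


def streaming_count(events: Iterable[str]) -> Iterator[Counter]:
    """Streaming style: update the running totals as each event arrives."""
    totals: Counter = Counter()
    for event in events:
        totals[event] += 1
        yield totals.copy()  # a snapshot is available after every event


# Hypothetical click stream, for illustration only.
clicks = ["home", "search", "home", "cart"]

final = batch_count(clicks)                # one answer, after all data is in
snapshots = list(streaming_count(clicks))  # one answer per incoming event
```

In the batch version nothing is known until the whole list has been collected, while the streaming version yields a usable result after every click. Both end with the same totals.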

Also, a stack would need to have support from more than one company to be an industry standard, Maitland said. "If there is going to be a stack, it needs to be [managed by] an open source organization and not necessarily managed by a specific company," Maitland said.

Another problem with not having a standardized stack is that it drives up the cost of hiring experts to manage and use such systems. Right now the competition for experts is fierce.

"Trying to build [a big data system] takes knowledge and skill. To plug those into your infrastructure can take time and money," Baker said. "There is no standard roadmap -- it is a feeling along process. Putting it all together is not a simple task."

"You can't have the explosive growth in an industry with so much specialized knowledge that is required as of right now," Maitland said.

"The average business analyst can't write queries against Hadoop," Staimer added.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
