Cloudera preps Hadoop for the enterprise
- 30 June, 2010 00:07
Cloudera has unveiled a new set of Hadoop management tools, called Cloudera Enterprise, that the company will offer for an annual subscription fee, it announced on Tuesday. It has also updated its open-source distribution package of Hadoop.
Both new releases, as well as several new partnerships with data management software vendors, show the company gearing up to offer the emerging database technology -- now mostly used by Web giants like Google and Yahoo -- to the enterprise market as an alternative to relational databases.
"Our bet is not only the big Web companies, but banks, hospitals and insurance companies will discover they need to analyze complex and structured data together, and Hadoop was made for that," said Cloudera CEO Mike Olson. "Hadoop solves a new problem, in a new way."
One of a growing number of non-SQL, or NoSQL, technologies, Hadoop is based on Google's MapReduce, a framework for processing data in parallel across large numbers of computer nodes. Hadoop, now being developed as an open-source project by the Apache Software Foundation, offers an alternative to traditional relational databases, at least for analyzing large, quickly changing data sets.
It can work with both SQL and non-SQL data, and is more resilient to server failure than relational databases, Olson said.
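To make the MapReduce model mentioned above concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. It shows the three stages conceptually: a map step applied to each record independently, a grouping step, and a reduce step per key.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over an iterable of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Because each call to `map_phase` depends only on its own record, and each call to `reduce_phase` only on its own key, a framework can in principle spread those calls across many machines, which is the parallelism the article describes.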
Cloudera is packaging Hadoop for midlevel organizations, both with its Hadoop distribution and with its newly released set of management tools. Both packages should allow organizations without a lot of in-depth technical experience in Hadoop to run the software, Olson said. "There is this myth that Hadoop is usable if you have Google-scaled data. There are many users who have merely a few terabytes of data that they wish to analyze," Olson said.
Cloudera's Distribution for Hadoop (CDH) is an open-source package of pre-integrated software programs built around Hadoop Common, formerly named Hadoop Core. The package includes Hive, which provides a data warehouse infrastructure; HBase, a database built on top of Hadoop; Pig, a compiler for MapReduce programs; ZooKeeper, a coordination service for applications running across multiple servers; and MapReduce.
In the newly released version 3, the package includes three programs that the company has released as open-source projects under the Apache 2.0 open-source license. One is Flume, which assists in loading data into Hadoop. Another new addition is Oozie, a workflow management tool. The last is the Hadoop User Environment (HUE) code, which provides a user interface for managing Hadoop.
"HUE allows anyone to build applications targeted at analysts. It knows how to talk to the Hadoop clusters," Olson said.
The Cloudera Enterprise package augments CDH version 3 with additional management tools. This new software, which is not open source, allows administrators to control access management through the Lightweight Directory Access Protocol. Programs are also provided for provisioning resources, managing configuration and monitoring performance.
Olson would not discuss how much Cloudera has made from subscription and consulting fees thus far, but noted that in the first quarter of 2010 the company made as much as it earned through half of 2009. Among different industries, financial services, telecommunications, retail, government and Web commerce companies have shown an interest in the technology, Olson said.
"The things that companies are doing with Hadoop vary. In general, these people are catching lots of data from lots of places and need to subject it to sophisticated analytics," Olson said. "Financial services are interested in using Hadoop for fraud detection. In telecommunications, there is a real need to optimize networks and reduce churn of customers."
In addition to offering these packages, Cloudera has been rallying support for Hadoop from providers of business intelligence (BI) and data management software.
Olson plans to announce, during his keynote at the Hadoop Summit 2010, taking place in Santa Clara, California, on Tuesday, that BI vendor MicroStrategy will support Hadoop use.
Another new partner is Talend, a vendor of open source data integration software. The company has extended its Talend Integration Suite to interface with Hadoop databases. Its suite allows administrators to manage and aggregate multiple data sources from a single console. With Hadoop, the software "can natively insert or retrieve data, and process the data within the Hadoop architecture," said Talend vice president of marketing, Yves de Montcheuil.
MicroStrategy and Talend join a growing number of companies prepping open-source or commercial management tools for Hadoop. Last week, Cloudera and Quest embarked on a project to build software that can link Hadoop with Oracle databases. In May, open-source business intelligence company Pentaho announced that its BI suite would work with Hadoop databases.
In a separate interview with IDG News Service, Yahoo CTO Raymie Stata pointed out that Hadoop could reduce the need for building supercomputers to analyze large data sets. Traditionally, large data sets have been moved from storage into the supercomputer, which is a pooled set of servers, to be analyzed. In contrast, Hadoop moves the analytic computation to where the data resides, eliminating the need for a central, giant number-crunching machine. Yahoo was an early leading contributor to Hadoop.
In addition to Cloudera's offering, Hadoop is also being commercialized by IBM, which recently started offering a set of analytic services that use the technology.
Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com.