Yahoo drops its own Hadoop distribution
- 02 February, 2011 07:24
Yahoo is discontinuing its distribution of the Hadoop platform and will instead focus on Apache Hadoop, the Hadoop Team at Yahoo said this week.
Hadoop, which was built initially by Apache Chairman Doug Cutting while he was at Yahoo, has become prominent in data centers and cloud computing. Yahoo will halt its own distribution, remove all references to a Yahoo distribution from its Web site, and close its GitHub facility for Hadoop. "Our intent is to return to helping Apache produce binary releases of Apache Hadoop that are so bulletproof that Yahoo and other production Hadoop users can run them unpatched on their clusters," said Eric Baldeschwieler, vice president of Hadoop development at Yahoo, in the company's announcement.
The Apache Hadoop community has been "very turbulent" lately, according to Baldeschwieler. "Over the last few months we have been developing Hadoop enhancements in our internal git repository while doing a complete review of our options. Our commitment to open sourcing our work was never in doubt, but the future of the Yahoo distribution of Hadoop was far from clear. We've concluded that focusing on Apache Hadoop is the way forward," said Baldeschwieler.
Yahoo will have to sort out how to contribute several man-years' worth of work to Apache to "unwind the Yahoo git repositories," Baldeschwieler said. Yahoo has proposed a 20.100 release of Hadoop, focused on stability and high performance. Yahoo has also set up a feature branch called hadoop-future. A draft list of proposed features includes federation, which allows more storage per Hadoop cluster; a new metrics framework; and optimization of the Hadoop MapReduce parallel applications framework for small jobs.
Yahoo said that until the Hadoop 0.20 release, Yahoo committers worked as release masters to produce binary Apache Hadoop releases for the entire community to use on clusters. "As the community grew, we experimented with using the Yahoo distribution of Hadoop as the vehicle to share our work. Unfortunately, Apache is no longer the obvious place to go for Hadoop releases. The Yahoo team wants to return to a world where anyone can download and directly use releases of Hadoop from Apache. We want to contribute to the stabilization and testing of those releases," Baldeschwieler said.
This article, "Yahoo drops its own Hadoop distribution," was originally published at InfoWorld.com.