Hadoop challenger works to add developers
- 22 December, 2011 22:29
LexisNexis has worked for more than a decade to develop a large-scale system for Big Data manipulation, and it believes that it has produced something that's better and more mature than the better-known Hadoop technology.
The company just needs developers to agree.
LexisNexis developed the parallel processing data platform to handle the demands of its own data-intensive research business. It wants to extend use of the technology, dubbed HPCC Systems, to broader markets, but is clearly aware that open source Hadoop has already established itself as a strong presence.
The company has open sourced the HPCC platform, and says it is challenging Hadoop in benchmarks.
The company says there are now about 1,000 HPCC Systems developers worldwide, most of whom have been trained since the platform was open sourced in June.
By contrast, a Hadoop developer conference last summer drew a crowd of some 1,700.
To help demonstrate its capabilities, LexisNexis ran a Terasort benchmark to compare HPCC against a similar benchmark and workload run by SGI on a Hadoop cluster, announced in October.
LexisNexis says its benchmark was 25% faster, and ran on far less hardware: a 4-node cluster versus a 20-node cluster on the SGI system. The LexisNexis test was done on Dell PowerEdge two-socket servers with six-core Intel Xeon processors.
Flavio Villanustre, vice president of infrastructure and products at LexisNexis Risk Solutions, attributed the test results, in part, to the number of lines of code needed for the sorting compared with Hadoop.
LexisNexis developed its own language, ECL, for this system.
It took three lines of ECL code to do the sorting, compared to 100-plus lines in Java, which is what is used in Hadoop, said Villanustre.
Asked to respond to the HPCC benchmark, Bill Mannel, vice president of product marketing at SGI, said in a statement that "there are many variations of distributed processing which can run Terasort. HPCC Systems is running Terasort on ECL code, which is different than SGI running on a MapReduce-based Hadoop. SGI remains committed to pushing the bar on performance and beating and improving our own record." MapReduce is a software framework.
Villanustre believes HPCC could do well in the marketplace against Hadoop, but he doesn't take anything for granted. He said that he wants to avoid ending up like Betamax, which lost the video format wars to VHS, or IBM's OS/2 operating system, which was crushed by Microsoft Windows.
"We want to ensure adoption and that's why we are pushing so much," said Villanustre.
The company has also made its HPCC system available in the cloud via Amazon Web Services.
The platform is available through a dual licensing strategy that allows a community edition and a commercial enterprise platform.
Matt Aslett, an analyst at The 451 Group, believes LexisNexis can be a lot more aggressive "given the large and growing ecosystem of developers and vendors that has formed around Apache Hadoop."
Specifically, Aslett believes the dual licensing strategy enables the company to protect the code from forking and generate revenue from adopters, "but dual licensing strategies have traditionally not been very successful at generating a developer community."
Aslett said that "releasing the software under a more permissive license or contributing it to an established open source foundation would have been more likely to drive developer adoption."
Bruce Perens, a leading open source advocate and a strategic consultant to LexisNexis, developed the licensing approach, called The Covenant, for the HPCC Systems platform. He agrees that dual-licensing strategies have had a mixed history, but says the HPCC licensing approach is designed to address that problem.
Perens said the present version of the code will always remain open and there's no way to withdraw an open source license. "One assigns code to HPCC only if one wishes HPCC to maintain it from then on - which, of course, is very desirable," he said.
Every time a developer adds code and then assigns the copyright to the company, there's a three-year guarantee to each contributor that the HPCC code will remain open source, under the Covenant.
The three-year provision "is a guarantee to help developers be confident about the destiny of their contribution, not a way of holding the project at ransom," said Perens, in an email response to questions.
"HPCC always has the option to go to a less restrictive license if dual-licensing doesn't work for them, but this is not expected," said Perens. Everybody loves to get a gift, "but it's not always fair to the party that writes the code" to give it as a no-strings-attached gift to competitors.
Perens argues that dual-licensing puts some economic sense in Open Source, and "the covenant repairs the community side of dual licensing," he said.
Patrick Thibodeau covers SaaS and enterprise applications, outsourcing, government IT policies, data centers and IT workforce issues for Computerworld. Follow Patrick on Twitter at @DCgov, or subscribe to Patrick's RSS feed. His e-mail address is pthibodeau@computerworld.com.