The concept of a "data lake," sometimes called an "enterprise data hub," is a seductive one.
The data lake is the landing zone for all the data in your organization, whether structured, unstructured or semi-structured: a central repository where all data is ingested and stored at its original fidelity. All your enterprise workloads, from batch processing and interactive SQL to enterprise search and advanced analytics, then draw upon that data substrate.
Generally, the idea is to use HDFS (Hadoop Distributed File System) to store all your data in a single, large repository. But building out such a next-generation data infrastructure requires more than simply deploying Hadoop; there's a whole ecosystem of related technologies that needs to integrate with Hadoop to make it happen. And while Hadoop itself is open source, many of the other technologies that can help you build that infrastructure are open core or fully proprietary.
What's worse is that they typically use wildly varying pricing metrics, from the amount of data stored to the number of nodes used or the number of CPUs involved. Keeping track of the various licenses involved to understand how your costs will change as you scale out is challenging at best.
Simplifying Licensing With Pivotal Big Data Suite
In an effort to simplify that process and make deploying a data lake as painless as possible, Pivotal today announced the Pivotal Big Data Suite, an annual subscription-based software, support and maintenance package that bundles its Pivotal Greenplum Database, Pivotal GemFire real-time distributed data store, Pivotal SQLFire (a SQL layer for the real-time distributed data store), Pivotal GemFire XD (in-memory SQL over HDFS), Pivotal HAWQ parallel query engine over HDFS and Pivotal HD Hadoop distribution.
"We've assembled the pieces to allow customers to start pursuing this business data lake architecture," says Joshua Klahr, vice president of product management at Pivotal. "Until now, the purchase process was actually pretty complicated for our customers because the pricing was different across the different products."
"We're launching a big data suite pricing model," Klahr adds. "It's an all-you-can-eat license with a single unit of measure - all the assets you need to build this data lake."
Essentially, Klahr says, you buy the Pivotal Big Data Suite in cores. As part of the deal, you get the capability to deploy Pivotal's Hadoop distribution, Pivotal HD, to an unlimited number of cores.
The cores purchased as part of the suite can then be used to deploy Pivotal's other purpose-built offerings on top of Hadoop however the organization sees fit.
An organization that buys 500 cores of the Pivotal Big Data Suite can use all of those 500 cores for Greenplum Database, or use 250 for Greenplum DB and 250 for HAWQ. Moreover, those cores can be shifted to the various technologies in the suite as needed, with no modification required to the subscription.
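The arithmetic of that pooled-core model can be sketched in a few lines of Python. This is an illustration only, not a Pivotal tool or API: the `CoreSubscription` class, its `assign` method and the product keys are hypothetical names. It also assumes that Pivotal HD itself is left out of the count, since the article says it can be deployed to an unlimited number of cores.

```python
from dataclasses import dataclass, field

# Hypothetical product keys for illustration; Pivotal HD is omitted because
# the article says it may be deployed to an unlimited number of cores.
LICENSED_PRODUCTS = {"greenplum", "hawq", "gemfire", "sqlfire", "gemfire_xd"}


@dataclass
class CoreSubscription:
    """Tracks how a purchased pool of cores is split across suite products."""

    purchased_cores: int
    allocations: dict = field(default_factory=dict)

    def allocated(self) -> int:
        """Total cores currently assigned to licensed products."""
        return sum(self.allocations.values())

    def assign(self, product: str, cores: int) -> None:
        """Set the core count for one product, staying within the pool."""
        if product not in LICENSED_PRODUCTS:
            raise ValueError(f"unknown product: {product}")
        if cores < 0:
            raise ValueError("cores must be non-negative")
        # Cores held by every other product, excluding this one's old share.
        others = self.allocated() - self.allocations.get(product, 0)
        if others + cores > self.purchased_cores:
            raise ValueError("allocation exceeds purchased cores")
        self.allocations[product] = cores


sub = CoreSubscription(purchased_cores=500)
sub.assign("greenplum", 500)  # all 500 cores on Greenplum Database
sub.assign("greenplum", 250)  # later, free up half of them...
sub.assign("hawq", 250)       # ...and move them to HAWQ
```

The only constraint the sketch enforces is the one the article describes: the total across products cannot exceed the cores purchased, while the split between products can change at any time.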
"It makes the buying process really simple," Klahr says. "They can deploy a vanilla Hadoop cluster as big as they want it to be. Where we start deriving value is when the customer starts deploying services on top of that Hadoop. We're not going to charge you a license fee for storing data. We'll charge you a license fee when you start deriving value by analyzing that data or operationalizing that data."
The suite is available as of today on two- and three-year terms.