Big Data, Cheap Storage Bring In-Memory Analytics Into Spotlight
- 06 December, 2012 14:05
If you're paying attention to big data, lately you've probably heard terms such as in-memory analytics or in-memory technologies. Like many tech trends that appear new only because their histories are obscured by newer and sexier tech, or because time has yet to catch up with them (server virtualization and the cloud are just reinventions from the mainframe days, after all), in-memory is a term being resurrected by two trends today: big data and cheap, fast commodity storage, particularly DRAM.
"In-memory has been around a long, long time," says Dave Smith, vice president of marketing for Revolution Analytics, a commercial provider of software, services and support for R, the open source programming language underpinning much of the predictive analytics landscape. "Now that we have big data, it's only the availability of terabyte (TB) systems and massive parallel processing [that makes] in-memory more interesting."
If you haven't already, you'll start to see offerings, including SAP HANA and Oracle Exalytics, which aim to bring big data and analytics together on the same box. You can also get HANA as a platform supported in the cloud by Amazon Web Services or SAP's NetWeaver platform, which includes Java and some middleware.
Meanwhile, analytics providers such as SAS, Cognos, Pentaho, Tableau and Jaspersoft have all rolled out offerings to take advantage of the in-memory buzz, even if some of these offerings are mere bolt-ons to their existing product suites, says Gary Nakamura, general manager of in-memory database player Terracotta, a Software AG company.
"They're saying, 'Hey, we're putting 10 gigs of memory into our product capability because that's all it can handle, but we're calling it an in-memory solution,'" Nakamura says. The question, he adds, is whether they can scale to handle real-world problems and data flows. (To be fair, Terracotta has just released two competing products, BigMemory Max and BigMemory Go, the latter of which is free up to 32 GB. Both products scale into the TB range and can run on virtual machines or in distributed environments.)
In-Memory Technology Removes Latency From Analytics
"What it comes down to," says Shawn Blevins, executive vice president of sales and general manager at Opera Solutions, is that each product has "an actual layer where we can stage the data model itself, not just the data, and they exist in the same platform and the same box in flash memory."
From a business point of view, this is really what matters. In-memory technology gets complicated quickly. If you want to understand how all the bits and bytes line up, then it's probably best to call down to your IT guys for another rousing round of "What's that part do again?" However, if you want to understand why in-memory is becoming the buzzword du jour, that's a little easier: It provides business insights that lead to better business outcomes in real time.
Essentially, in-memory analytics technology lets businesses take advantage of performance metrics gleaned from production systems and turn those into KPIs they can do something about. A company such as Terracotta can give away 32 GB of capacity because in-memory analytics doesn't require the entire fire hose of data that a traditional BI app needs in order to produce useful results.
"The deal with in-memory analytics is the analysis process is all about search," says Paul Barth, co-founder of data consulting firm NewVantage Partners. You're trying to see how many different combinations of things, such as blue car owners and ZIP code, are correlated, he adds.
For every one of those correlations, it takes time to pull the data, cluster it, notice what the dependencies are and see how strongly one variable is affected by the others. Every time you pivot that table to find something new or get some clarity, data moves and gets reorganized. That introduces latency, which is precisely the problem in-memory analytics is designed to defeat.
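To make that kind of search concrete, here is a minimal Python sketch. It is not taken from any vendor's product, and the records, field names (color, zip) and the purchase flag are all invented for illustration. It holds a small dataset in memory and, for every combination of attributes, counts how often that combination goes with a purchase, which is roughly the regrouping that happens on each pivot.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical in-memory records; every field and value here is made up.
RECORDS = [
    {"color": "blue", "zip": "43004", "bought": True},
    {"color": "blue", "zip": "43004", "bought": True},
    {"color": "red", "zip": "43004", "bought": False},
    {"color": "blue", "zip": "02108", "bought": False},
    {"color": "red", "zip": "02108", "bought": True},
]


def purchase_rates(records, attributes, outcome="bought"):
    """Map each observed attribute combination to its purchase rate."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    # Try every non-empty subset of attributes (one "pivot" per subset).
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            for rec in records:
                key = tuple((name, rec[name]) for name in combo)
                totals[key] += 1
                hits[key] += int(rec[outcome])
    return {key: hits[key] / totals[key] for key in totals}


rates = purchase_rates(RECORDS, ["color", "zip"])
blue_in_43004 = rates[(("color", "blue"), ("zip", "43004"))]
print(blue_in_43004)  # prints 1.0
```

The number of subsets doubles with each attribute added, so the inner loop runs many times over the same records. That repeated access is where keeping the data in memory, rather than re-reading it from disk, pays off.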
High-Frequency, Low-Computation Analysis, For Now
At this stage of the game, big data analytics is really about discovery. Running iterations to see correlations between data points doesn't happen without milliseconds of latency, multiplied by millions (or billions) of iterations. Working in memory is three orders of magnitude faster than going to disk, Barth says. "Speed matters in this business."
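A back-of-the-envelope calculation shows why that gap matters. The figures in this Python sketch are assumptions chosen only to match the article's rough terms (1 millisecond per disk-bound iteration, memory three orders of magnitude faster, one million iterations); real latencies vary widely by hardware and workload.

```python
# Assumed figures for illustration only; not measurements.
DISK_LATENCY_US = 1_000   # 1 ms per iteration when going to disk
SPEEDUP = 1_000           # "three orders of magnitude"
ITERATIONS = 1_000_000    # one million passes over the data

MEMORY_LATENCY_US = DISK_LATENCY_US // SPEEDUP  # 1 microsecond


def total_seconds(latency_us, iterations):
    """Total wall-clock time in seconds if every iteration pays the latency."""
    return latency_us * iterations / 1_000_000


disk_total_s = total_seconds(DISK_LATENCY_US, ITERATIONS)
memory_total_s = total_seconds(MEMORY_LATENCY_US, ITERATIONS)
print(f"disk: {disk_total_s:.0f} s, memory: {memory_total_s:.0f} s")
```

Under these assumptions the disk-bound run takes about 17 minutes and the in-memory run takes about one second. At a billion iterations the same ratio separates days from minutes.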
Ever wonder how Facebook can tag you in a photo as soon as it goes live on the site? A photo is a big file, and Facebook has exabytes of photos on file. Facebook runs an algorithm against every photo to find faces and reduce those faces to a few data points, says Revolution's Smith. This reduces a 40 MB photo down to about 40 bytes of data. The data then goes into a "black box," which determines whose face it is, tags it, searches for that person's account and all the accounts associated with that person, and sends everyone a message.
That's big data at work. But it's also how in-memory analytics makes big data work. Currently, most people don't put more than 100 MB into an in-memory cache at any one time because of Java's limitations. The more data that's put into memory, Nakamura says, the more you have to tune the Java virtual machine. "It gets slower, not faster, and that is problematic when you are a performance-at-scale play." (Terracotta's BigMemory product line gets around this issue.)
For now, in-memory analytics is well-suited to high-frequency, low-computation number crunching. Of course, when you have terabytes of data available to run real-time analytics, that behavior will change. In this case, the technology needs to catch up to the need, not the other way around. The need exists, the data exists and, based on the number of announcements coming from Hadoop World in October, the technology is on its way. No chicken and egg here.
Allen Bernard is a Boston native now living in Columbus, Ohio. He has covered IT management and the integration of technology into the enterprise since 2000. You can follow him on Twitter @allen_bernard1.