Why your BI strategy needs a universal semantic data layer
10 November 2017 22:00
To make the most of your corporate data, your analysts should have universal access to data that their tool of choice can understand. But the siloed nature of data repositories, coupled with semantic data layers tailored to specific BI tools, has long scuttled that goal. Enter the universal semantic data layer, which, when applied to a data lake, can give your BI strategy a universal boost.
What is a universal semantic data layer?
A universal semantic data layer is a single business representation of all corporate data. It aims to help end users access all corporate data using common business terms via the business intelligence (BI) and analytics tools of their choice.
The concept of a semantic layer underpinning BI platforms has been around for some time. It was patented by Business Objects in 1991, and the patent was successfully challenged by MicroStrategy in 2003. But these semantic layers have always been purpose-built for specific BI tools, used by specific teams within the enterprise.
In the past decade, the advent of the data lake (a single repository of all enterprise data stored in its native format) gave rise to the promise that enterprises would be able to access all their data with whatever BI or analytics tools they chose, without having to move the data.
But that promise hasn't been realized, says Dave Mariani, co-founder and CEO of startup AtScale and former vice president of development, user data and analytics at Yahoo. Mariani says the missing piece is the universal semantic data layer.
The advantages of a universal semantic data layer
"Data lakes are just collection areas for files," Mariani says. "Without semantics on top of those, it's impossible to get any value out of it. I think of it as an abstraction layer. We're abstracting how the data's stored and where it's stored. We're taking what is essentially raw data and we're giving it semantic meaning for the business."
Consider the concept of "net sales," writes Matthew Baird, co-founder and CTO of AtScale.
"Is it net of invoice line-item costs and/or net of rebates? A small use case may contain tens of these calculations, while a departmental model may contain hundreds," Baird writes. "Without some level of abstraction, business is beholden to IT to generate and run reports or risk making big, costly, and worst of all, hidden mistakes. Can you afford to have each of your employees independently trying to replicate this logic correctly in their spreadsheets and reports? Will you be able to catch the subtle yet impactful errors?"
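To make Baird's point concrete, here is a minimal sketch with made-up figures. Two analysts compute "net sales" from the same invoice lines: one nets out only line-item costs, the other also nets out rebates. The `InvoiceLine` type, its fields, and the numbers are hypothetical illustrations, not anything from AtScale; the point is only that both calculations look reasonable yet disagree.

```python
from dataclasses import dataclass


@dataclass
class InvoiceLine:
    # Hypothetical invoice line; all amounts in dollars.
    gross: float
    line_item_costs: float
    rebate: float


lines = [
    InvoiceLine(gross=1000.0, line_item_costs=50.0, rebate=100.0),
    InvoiceLine(gross=500.0, line_item_costs=20.0, rebate=0.0),
]


def net_sales_analyst_a(rows):
    # Analyst A's spreadsheet: net of line-item costs only.
    return sum(r.gross - r.line_item_costs for r in rows)


def net_sales_analyst_b(rows):
    # Analyst B's report: net of line-item costs AND rebates.
    return sum(r.gross - r.line_item_costs - r.rebate for r in rows)


print(net_sales_analyst_a(lines))  # 1430.0
print(net_sales_analyst_b(lines))  # 1330.0
```

Neither result is obviously wrong on its face, which is exactly the kind of subtle, hidden discrepancy Baird describes.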
The semantic layers of the past have been point solutions, smoothing over this issue for individual BI tools. The idea behind the universal semantic data layer is to take an inventory of all the key business metrics, gather the definitions that already exist in the various BI tools, and collect them in a single abstraction layer where they can be managed and changed in one place.
"It gives you one place with which to manage those metrics," Mariani says. "It still allows you to have different ways of visualizing those metrics or working with them; you put them in one place and let them be consumed in different forms."
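The "define once, consume in different forms" idea Mariani describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AtScale's product or API: the `METRICS` registry, the `metric` decorator, the `dashboard_tile` and `spreadsheet_cell` consumers, and the chosen "net sales" definition are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical central registry: each business metric is defined exactly once.
METRICS: Dict[str, Callable[[List[dict]], float]] = {}


def metric(name: str):
    """Register a metric definition under a business-friendly name."""
    def decorator(fn):
        if name in METRICS:
            raise ValueError(f"metric {name!r} is already defined")
        METRICS[name] = fn
        return fn
    return decorator


@metric("net sales")
def net_sales(rows):
    # The single agreed definition (assumed here): gross minus line-item costs minus rebates.
    return sum(r["gross"] - r["line_item_costs"] - r["rebate"] for r in rows)


def evaluate(name: str, rows: List[dict]) -> float:
    # Every consumer goes through the registry, never its own copy of the logic.
    return METRICS[name](rows)


def dashboard_tile(name: str, rows: List[dict]) -> str:
    # Hypothetical BI dashboard: renders the metric as formatted text.
    return f"{name.title()}: ${evaluate(name, rows):,.2f}"


def spreadsheet_cell(name: str, rows: List[dict]) -> float:
    # Hypothetical spreadsheet export: returns the raw number.
    return round(evaluate(name, rows), 2)


rows = [
    {"gross": 1000.0, "line_item_costs": 50.0, "rebate": 100.0},
    {"gross": 500.0, "line_item_costs": 20.0, "rebate": 0.0},
]

print(dashboard_tile("net sales", rows))    # Net Sales: $1,330.00
print(spreadsheet_cell("net sales", rows))  # 1330.0
```

Because both front ends call `evaluate`, changing the definition of "net sales" in one place changes it everywhere, while each tool still presents the number in its own way.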
Regaining IT control of data in a self-service BI world
There's an upside for CIOs too: The universal semantic data layer puts control of data pipelines back in the purview of the IT function, while continuing to offer the business the speed and agility of self-service BI.
"You've taken away all of that data movement and data pipeline work that's been distributed out to the business and put it back in control of the data teams who are trained to do just that," Mariani says. "Because they have that view of all data, not just the data for one business unit, it enables them to create at the right scale and to react quickly when new data sources arrive. They can incorporate them holistically rather than looking through a very narrow lens."
Mariani also says that by minimizing data movement and the creation of multiple copies of data throughout the enterprise, the universal semantic data layer allows you to simplify and better secure your infrastructure.
"You can define security on the data lake itself in Hadoop using Kerberos or Sentry or Ranger," Mariani says. "Anyone who logs in and runs queries on the data lake is going to be secured at the data bit level rather than at the application that's using it. Now data is being secured as it's written as opposed to as it's used. You can't do that if you're sending data extracts out to the business and the business is dealing with it on its own."