This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter’s approach.
As organizations work to make big data broadly available in the form of easily consumable analytics, they should consider outsourcing functions to the cloud. By opting for a Big Data as a Service solution that handles the resource-intensive and time-intensive operational aspects of big data technologies such as Hadoop, Spark, Hive and more, enterprises can focus more on the benefits of big data and less on the grunt work.
The advent of big data raises fundamental questions about how organizations can embrace its potential, bring its value to greater parts of the organization and incorporate that data with pre-existing enterprise data stores, such as enterprise data warehouses (EDWs) and data marts.
The dominant big data technology in commercial use today is Apache Hadoop. It’s used alongside other technologies that are part of the greater Hadoop ecosystem, such as the Apache Spark in-memory processing engine, the Apache Hive data warehouse infrastructure, and the Apache HBase NoSQL storage system.
To include big data in their core enterprise data architecture, enterprises must adopt and invest in Big Data as a Service technologies. A modern data architecture suited to today's demands should comprise the following components:
* High-performance, analytic-ready data store on Hadoop. How can big data be speedy and analysis-ready? A best practice for building an analysis-friendly big data environment is to create an analytic data store that loads the most commonly used datasets from the Hadoop data lake and structures them into dimensional models. With an analytic-ready data store on top of Hadoop, organizations can get the fastest response to queries. These models are easy for business users to understand, and they facilitate the exploration of how business contexts change over time.
This analytic data store must not only support reporting for the known-use cases, but also exploratory analysis for unplanned scenarios. The process should be seamless to the user, eliminating the need to know whether to query the analytic data store or Hadoop directly.
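The dimensional-model idea above can be sketched in miniature. The following is an illustrative example only, using Python's built-in sqlite3 as a stand-in for an analytic data store; in practice the tables would live in Hive or a similar store on Hadoop, and all table and column names here are hypothetical.

```python
import sqlite3

# Minimal star-schema sketch: a fact table of numeric measures keyed to
# a dimension table of descriptive attributes. Names are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: attributes business users filter and group by.
cur.execute("""
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT
    )""")

# Fact table: measures, one row per business event.
cur.execute("""
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    )""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Hardware"), (2, "Software")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 1, 100.0), (2, 1, 250.0), (3, 2, 400.0)])

# A typical analytic query: aggregate measures by dimension attributes.
rows = cur.execute("""
    SELECT p.category, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # [('Hardware', 350.0), ('Software', 400.0)]
```

Because the most commonly used datasets are pre-structured this way, such aggregations answer quickly, and the category/measure shape is easy for business users to reason about.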
* Semantic layer that facilitates “business language” data analysis. How can big data be accessible to more business users? To hide the complexities in raw data and to expose data to business users in easily understood business terms, a semantic overlay is required. This semantic layer is a logical representation of data, where business rules can be applied.
For example, a semantic layer can define “high-value customers” as “those who have been customers for more than three years and are making new or renewal purchases on a regular basis.” The data for “high-value customers” might have been sourced from different tables and gone through different levels of calculation and transformation before arriving at the semantic layer, all invisible to the business user who queries for “high-value customer.”
Previously, business users would have to query Hadoop directly, which is impractical, or request information from IT, which means waiting in a queue of reporting requests. A semantic layer enables business users to analyze and explore data using familiar business terms — without the need to wait for IT to prioritize requests. It also allows for the reuse of data, reports and analysis across different users, maintaining alignment and consistency and saving IT the effort of responding to every individual request on a case-by-case basis.
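One common way to implement a semantic-layer rule like "high-value customers" is as a database view, so the business definition lives in one governed place. The sketch below uses Python's sqlite3 purely for illustration; the schema, the three-year tenure threshold and the three-purchases-per-year cutoff are all assumptions, not a prescribed implementation.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        signup_date TEXT
    )""")
cur.execute("""
    CREATE TABLE purchases (
        customer_id   INTEGER,
        purchase_date TEXT
    )""")

today = date.today()
four_years_ago = (today - timedelta(days=4 * 365)).isoformat()
last_month = (today - timedelta(days=30)).isoformat()

cur.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Acme Corp", four_years_ago),  # long-standing customer
    (2, "Newco Ltd", last_month),      # recent signup
])
cur.executemany("INSERT INTO purchases VALUES (?, ?)", [
    (1, last_month), (1, last_month), (1, last_month),
    (2, last_month),
])

# The business rule is defined once, centrally. Thresholds here
# (three years' tenure, three purchases in the past year) are
# illustrative assumptions.
cur.execute("""
    CREATE VIEW high_value_customers AS
    SELECT c.customer_id, c.name
    FROM customers c
    JOIN purchases p ON p.customer_id = c.customer_id
    WHERE c.signup_date <= date('now', '-3 years')
      AND p.purchase_date >= date('now', '-1 year')
    GROUP BY c.customer_id, c.name
    HAVING COUNT(*) >= 3
""")

# Business users simply query the business term.
rows = cur.execute("SELECT name FROM high_value_customers").fetchall()
print(rows)  # [('Acme Corp',)]
```

The joins, filters and thresholds behind the term are invisible to whoever runs the final query, which is the point: the definition can be reused consistently across reports and users.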
* A multi-tenant big data environment. How can big data be accessed throughout the organization, no matter where people sit? With widespread demand for analytics, organizations need to embrace a hybrid centralized and decentralized approach to data. This allows different teams to incorporate local data sets and semantic definitions while also accessing the enterprise data resources that IT creates.
This hybrid approach can be achieved with a multi-tenant data architecture. In this architecture, IT collects and cleanses data into a shared Hadoop data lake and prepares a central semantic layer and analytic data store from that data.
IT then creates virtual copies of the centralized data environment for different business groups, such as finance, sales, marketing and customer support. This way, IT keeps the authority in data governance and semantic rules, while business groups and departments can truly see the impact of their daily business activities against historical or corporate data stored in Hadoop.
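The "virtual copy" idea can be approximated with per-tenant views over a single shared table: IT governs the shared data, and each business group queries only its own scoped slice. This is a minimal sketch in Python's sqlite3; the `department` column and all names are assumptions for illustration.

```python
import sqlite3

# One shared, IT-governed table; each business group gets a view
# scoped to its own slice ("virtual copy"). Names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE shared_metrics (
        department TEXT,
        metric     TEXT,
        value      REAL
    )""")
cur.executemany("INSERT INTO shared_metrics VALUES (?, ?, ?)", [
    ("finance",   "spend",   120.0),
    ("marketing", "leads",    42.0),
    ("finance",   "revenue", 900.0),
])

# One view per tenant; governance and storage stay with the shared table.
for dept in ("finance", "marketing"):
    cur.execute(
        f"CREATE VIEW {dept}_metrics AS "
        f"SELECT metric, value FROM shared_metrics "
        f"WHERE department = '{dept}'"
    )

# The finance team sees only finance data, via its own "copy".
rows = cur.execute(
    "SELECT metric, value FROM finance_metrics ORDER BY metric"
).fetchall()
print(rows)  # [('revenue', 900.0), ('spend', 120.0)]
```

Because the views are virtual, there is no duplicated data to drift out of sync, and a change to the shared table or to a central semantic rule is immediately visible to every tenant.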
* User-friendly ways of consuming analytics. How can the experience of big data analysis be user friendly? A final consideration for the end-user delivery of big data is the form in which data will be represented. These data interfaces should meet the unique and individual needs of all users. This requirement includes providing highly interactive and responsive dashboards for business users, intuitive visual discovery for analysts and pixel-perfect, scheduled reports for information consumers.
While each style is unique, the best practice is to ensure that each interface is not a separate tool, so that creating, collaborating and publishing information is done with consistency and accuracy. This is only achievable through a semantic layer that ensures data values remain consistent, while data presentations might differ from one user interface to another.
Big data is increasingly vital to the enterprise and a fundamental part of the enterprise data architecture. To tap big data's full potential, enterprises need to accelerate investments in technologies that efficiently and effectively analyze and store data. Cloud solutions for big data and analytics make that possible. With them, enterprises can position themselves well for future data growth and, in turn, excel in the ever-evolving big data ecosystem.