MapR Technologies updated its Hadoop distribution today to support self-service SQL analytics.
The company introduced Apache Drill 0.5 in September of last year and has now replaced it with the Apache Drill 1.0 release.
Drill is an open source, distributed ANSI SQL query engine for self-service data exploration. It is an open source version of Google's Dremel system for interactively querying large datasets, which powers Google's BigQuery service. The stated goal of the Apache Drill project is to scale to 10,000 servers or more while processing petabytes of data and trillions of records in seconds.
Drilling into data
Drill lets you interact with data from both legacy transactional systems and new data sources -- including Internet of Things (IoT) sensors, Web click-streams and other semi-structured data -- and it supports popular business intelligence (BI) and data visualization tools. Perhaps most important, it is a schema-free SQL engine for big data. Because it doesn't require predefined schemas, IT doesn't have to insert itself into the middle of the discovery process to flatten the data.
"IT is being stressed as it is, and this is a chicken and egg problem," says Jack Norris, CMO of MapR. "We'd like to explore this data, but how do we prioritize what we want to work on if we don't know what we're looking for."
The advantage Drill provides, Norris says, is data agility. JSON files, for instance, are messy structures. They carry their own schemas, which can be complex and can change almost record by record. One document might list purchases by name. The next might include purchases with data about a spouse and children nested within. With IoT data, you might have JSON files from hundreds or thousands of devices, with each dataset potentially in a different format.
"You've got to flatten it or do some sort of sub-select," Norris says. "That's typically an IT function to determine how to represent this data. That's what's going on with other tools."
Drill, on the other hand, is designed to deal with the nested structure and doesn't require IT to step in to flatten it out and figure out what data is important ahead of time.
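To make the contrast concrete, here is a minimal sketch in plain Python. It is not Drill code and does not reflect how Drill works internally. The two sample records and the helper names `get_path` and `flatten` are hypothetical, chosen to mirror the purchases and spouse/children example above. It shows the same nested records handled two ways: read in place by path, and flattened up front into a fixed set of columns.

```python
import json

# Hypothetical sample data: one JSON document per line, each with its own shape.
RAW = """\
{"name": "Ann", "purchases": [{"item": "lamp", "price": 30}]}
{"name": "Bob", "purchases": [{"item": "desk", "price": 120}], "family": {"spouse": "Cy", "children": ["Di", "Ed"]}}
"""


def get_path(record, path, default=None):
    """Follow a dotted path through nested dicts; return default if a key is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def flatten(record, prefix=""):
    """Flatten nested dicts into one level with dotted column names (lists are left as-is)."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=column + "."))
        else:
            flat[column] = value
    return flat


records = [json.loads(line) for line in RAW.splitlines() if line.strip()]

# Reading the nested field in place: records without it simply yield None.
spouses = [get_path(r, "family.spouse") for r in records]

# Flattening up front: the column set must cover every shape seen so far,
# and it has to be revisited whenever a new record shape shows up.
columns = sorted({column for r in records for column in flatten(r)})

print(spouses)
print(columns)
```

In the flattening approach, someone has to decide the column set before anyone can query the data, which is the up-front step Norris describes as an IT function.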
Norris also notes that the MapR partner ecosystem is embracing Apache Drill, including Information Builders, JReport (Jinfonet Software), MicroStrategy, Qlik, SAP, Simba, Tableau and TIBCO. They've all been working closely with MapR and the Drill community to make their BI tools interoperable with Drill through standard ODBC/JDBC connectivity. Drill Explorer sits inside the ODBC driver, where it browses data available via Drill and exposes a transparent view into the schema, enabling seamless and fast self-service data exploration.
Gaining real-time insight
"The availability of Apache Drill in the MapR distribution is a major milestone for the SQL-on-Hadoop project, which is significant in delivering real-time insights from complex data formats without requiring any data preparation," Matt Aslett, research director, data platforms and analytics, 451 Research, said in a press statement Tuesday.