The traditional database, known as RDBMS (Relational Database Management System), is a marvel of engineering and computer science. In one piece of software, it stores your data safely, provides a unified SQL interface to your data, and even reasons about your data in certain contexts.
Yet at a foundational level it carries assumptions from the single-machine past, and those assumptions are preventing enterprises from growing their analytics capabilities. As Web companies like Google, Amazon, and others unbundle the database, the technologies at its core are experiencing a revival and are yielding even more value for their users than their original versions did, especially for analysis.
What has changed since IBM's System R and later Oracle redefined the landscape? First, the big data revolution has led many more companies to collect substantial data and a lot more people in those companies are called on to work with that data day-to-day. It's not just the BI or decision support teams anymore.
Second, networking technology has become more advanced and made it possible to move terabytes easily while compute power has spread everywhere outside the corporate data center, from your desktop to your pocket to the cloud.
Third, the consequences of not using the data correctly are becoming dire, from a competitive standpoint as well as from regulatory and security standpoints. More on these trends at my XLDB talk at Stanford: “There's no data like more data”.
Traditional databases -- and the business model behind them (charging based on data size, for instance) -- are predicated on the notion that storage must be done together with computation, which is a requirement for certain kinds of data integrity, as in the case of database transactions. If you debit $10 from one account and credit it to another, the system ensures that no money is lost or accidentally created in the process. That's OLTP (online transaction processing). As the value of enterprise analytics is realized (say, understanding money flows across accounts rather than processing individual debits and credits), the share of OLAP (online analytical processing) is growing relative to OLTP. I'll wager that the split of CPU cycles between OLTP and OLAP has gone from 90/10 to 10/90 in the last decade.
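To make the debit/credit guarantee concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the account names, and the `transfer`, `total_balance`, and `demo` helpers are all hypothetical, made up for illustration; the point is only that both updates commit together or not at all, so the total across accounts never changes.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit of work."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, src, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("insufficient funds or unknown source account")
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )
        if cur.rowcount != 1:
            # The debit above has already run; raising here undoes it.
            raise ValueError("unknown destination account")


def total_balance(conn):
    """An OLAP-flavored question: how much money is in the system overall?"""
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )

    transfer(conn, "alice", "bob", 10)  # succeeds: both updates commit

    try:
        transfer(conn, "alice", "nobody", 10)  # credit fails, debit is undone
    except ValueError:
        pass

    balances = dict(conn.execute("SELECT id, balance FROM accounts"))
    return balances, total_balance(conn)


if __name__ == "__main__":
    print(demo())
```

The failed second transfer leaves both balances exactly as the first transfer left them, which is the "no money lost or created" property. Note that the guarantee here depends on both rows living behind the same engine, which is the coupling of storage and computation described above.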
Next-generation analytical applications are now combining data from many more sources, both from inside the organization (spreadsheets, machine-generated logs, SaaS applications like Salesforce and Workday; disclaimer: Workday is an investor in Metanautix) and from outside (web, social media, weather, etc.). As underlying storage system options multiply from the traditional RDBMS to NoSQL to high-throughput filesystems like AWS/S3 (in the cloud) or Hadoop/HDFS (perhaps on-prem), it matters less where the data is stored than what you're doing with it. In fact, you're nearly guaranteed a series of re-platforming exercises aimed at reducing costs, improving performance, shoring up compliance or audit, and other goals. So the value of the data is growing relative to its container, and the value of agility to the organization is growing relative to the cost of management, which is why organizations are investing in high-priced data science teams, sometimes even without specific objectives.
As always, we'd love to hear your feedback on this. Please feel free to email me or leave comments below.
In the 1960s, Edsger Dijkstra wrote the heartfelt "Go To Statement Considered Harmful" that inspired many other "XYZ Considered Harmful." The title was intended to challenge orthodox views on a topic. This is a lighthearted series of posts in that vein.