Big data - Part 1

The rate of data growth in the world is mind-boggling

Movideo chief technology officer, Cameron Moore, is using parallel database technology to store and analyse terabytes of video data.

According to IDC’s Digital Universe report, the data created globally each year will leap from 1.2 zettabytes this year to 35 zettabytes in 2020 (one zettabyte is equal to one billion terabytes).

Even when scaled back to the data generated by a single business, the numbers can be comparatively scary. The internet has opened new marketing and communication channels whose digital nature results in the generation of data by the gigabyte, while many organisations have also become better at capturing transactional data. And most organisations have not even begun to analyse the massive piles of unstructured data being generated through social media.

Many of the data sets in existence today defy being crunched with conventional tools, so the wisdom they could yield lies unrealised.

It’s little wonder that the explosion of data volumes and complexity — and the techniques for dealing with it — has become known as Big Data.

While the data sets might be daunting, Big Data promises to unlock business value through delivering better intelligence in areas such as fraud detection, loan risk analysis and customer behavioural analytics — and do so potentially even in real time.

Director of the business intelligence specialist company C3, Cameron Wall, says Big Data is pushing the limits of the current technology, and the glory days of traditional relational database management technology are fading.

“The move to capturing more information and the move to real time analysis has meant that businesses are now starting to think seriously about how they store their data and what they do with it,” Wall says. “It’s not good enough to just have a bigger repository sitting there collecting data in batch mode.”

For some sectors, such as genomics and bio-informatics research, dealing with Big Data issues is nothing new, and in many ways the technologies pioneered in these sectors are forming the basis of solutions used in the commercial world. The reason commercial organisations are becoming more interested in Big Data techniques is simple: Big Data analysis can deliver insights above and beyond that which is possible using even current relational technology. Online companies such as Amazon and eBay in particular are showing what can be done, and rivals are keen to whittle down their competitive advantage.

“There used to be this catch-cry that knowledge is power, and that information is the asset of the organisation, and I think companies were paying lip service to that,” Wall says. “But now they are paying full attention and are forced to deliver on that promise, because they are seeing companies overseas perform, and perform well — especially the online businesses.”

The Australian commercial online video company Movideo is using Big Data analysis techniques to perform advanced analytics on its video feeds. Movideo is a subsidiary of the digital media company MCM Entertainment and streams video content for popular television programming such as MasterChef and Formula 1 racing, as well as music video streaming.

Movideo’s chief technology officer, Cameron Moore, says his company is using massively parallel database technology from Greenplum to create a data warehouse for storing and analysing the terabytes of data collected in relation to its video streams, such as how long a viewer watches a video, where they are watching it from, and in what format.

“Basically we use that information to work with key performance indicators of the platform and content editors use that to work out if the content is doing well or not,” Moore says.

The advantage of using Greenplum is that Movideo can store and process terabytes of data while keeping the data in its most basic native form, rather than ‘rolling it up’ into averaged data.

“We don’t roll up our data, which the traditional systems do,” Moore says. “Once you roll that data up, you destroy what’s underneath and you can never go and do an ad hoc query after the fact. If you roll up all your data for the averages, you can’t drill down to a specific moment in time.”

For instance, were Movideo to roll up regional data to a state or country level, it would never be able to go back and pinpoint activity at a suburb level.

“If you are a local provider, having data at a country level is pretty much useless to you,” he says.
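The trade-off Moore describes can be sketched in a few lines. The schema and figures below are hypothetical (Movideo’s actual tables are not public); the point is simply that keeping raw events allows ad hoc drill-down queries that a pre-aggregated rollup would make impossible:

```python
import sqlite3

# Hypothetical view-event table -- illustrative only, not Movideo's schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE view_events (
    video_id TEXT, suburb TEXT, state TEXT, seconds_watched INTEGER)""")
conn.executemany(
    "INSERT INTO view_events VALUES (?, ?, ?, ?)",
    [("ep1", "Fitzroy", "VIC", 120),
     ("ep1", "Carlton", "VIC", 300),
     ("ep1", "Newtown", "NSW", 45)])

# A rolled-up table would keep only the state-level averages...
rolled_up = conn.execute(
    "SELECT state, AVG(seconds_watched) FROM view_events GROUP BY state"
).fetchall()

# ...but because the raw events are retained, a suburb-level question
# can still be asked after the fact.
by_suburb = conn.execute(
    "SELECT suburb, SUM(seconds_watched) FROM view_events "
    "WHERE state = 'VIC' GROUP BY suburb ORDER BY suburb"
).fetchall()
print(by_suburb)  # [('Carlton', 300), ('Fitzroy', 120)]
```

Had only `rolled_up` been stored, the suburb column would be gone and the second query could never be answered.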

The massively parallel nature of Greenplum means that huge tasks can be broken down into manageable chunks and executed in parallel on commodity hardware, significantly reducing the cost of processing Big Data problems.

Greenplum was acquired by EMC in 2010, and is just one of dozens of tools that have emerged to handle these problems. Other distributed processing tools commonly used to tackle these problems include MapReduce, Hadoop and Cassandra, which are all designed to process large data sets across clusters of computers. MapReduce was developed by Google in 2004; the latter two are open source projects, with Cassandra initially developed by Facebook to power its Inbox Search feature. The online dating service eHarmony uses Hadoop to determine with whom members are ideally matched.
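The MapReduce model mentioned above can be illustrated with a toy example. This is a single-process sketch of the programming model only, not Google’s or Hadoop’s implementation; the log records are invented for illustration:

```python
from collections import defaultdict
from functools import reduce

# Toy log: country code and video id per view event (invented data).
logs = ["AU ep1", "AU ep2", "NZ ep1", "AU ep1"]

# Map phase: each record independently emits a (key, value) pair,
# so this step can run on many machines at once.
mapped = [(line.split()[0], 1) for line in logs]

# Shuffle phase: group values by key. On a real cluster this moves
# data between machines; here it is just a dictionary.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce phase: each key's values are combined independently,
# again allowing the work to be spread across commodity hardware.
counts = {key: reduce(lambda a, b: a + b, values)
          for key, values in grouped.items()}
print(counts)  # {'AU': 3, 'NZ': 1}
```

The same three-phase shape underlies Hadoop jobs such as eHarmony’s matching workloads, just at cluster scale.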

Read Part 2 of Big Data.

Moore says Movideo has been working with Greenplum for about 12 months, having previously used traditional open source databases. But these simply weren’t fast enough to process the growth in the data that it was collecting.

Now, he says, Movideo is able to offer customers a filtering system whereby they can ‘drag’ in a video and then immediately analyse metrics such as how often it was viewed in a specific region and for how long.

“The system can do an ad hoc query on the fly and return that data, and draw that on a map.”

In future Moore hopes to be able to use the power of massively parallel processing to provide new options to consumers, such as ad hoc matching of preferences to help them discover additional content.

“What we are looking at doing right now is exposing that data back to the users in a social networking sense,” Moore says. “One user watching content can see other users watching content at the same time, so we can create a social atmosphere around premium content.”
