Microsoft beats data-sorting record with new approach
- 23 May, 2012 04:56
Besting a record set by Yahoo in 2009, Microsoft's research arm has deployed a new technique, called Flat Datacenter Storage (FDS), for quickly sorting large amounts of data.
The researchers will discuss their work at an Association for Computing Machinery conference dedicated to databases this week in Scottsdale, Arizona. They are also implementing their data-sorting techniques in Microsoft's Bing search engine, where they could boost response times to user queries.
"Improving big-data performance has a wide range of implications across a huge number of businesses," said Microsoft Research project leader Jeremy Elson, in an online entry describing the work. "Almost any big-data problem now becomes more efficient, which, in many cases, will be the difference between the work being economically feasible or not."
In tests conducted under the MinuteSort benchmark, the system set up by Elson and his colleagues was able to sort 1,401GB of data in a minute, beating Yahoo's previous record of 500GB in the same time. Microsoft also boasted of sorting the data using fewer resources: The system used 1,033 disks in 250 machines, while Yahoo required 5,624 disks across 1,406 machines to complete its operation.
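To put the resource comparison in perspective, the figures quoted above can be divided out per disk and per machine. The short Python sketch below does only that arithmetic; it assumes the quoted volumes are gigabytes sorted in one minute, and the names `records` and `summary` are illustrative, not part of either company's tooling.

```python
# Figures as quoted in the article (data volume assumed to be in GB per minute).
records = {
    "Microsoft FDS": {"gb": 1401, "disks": 1033, "machines": 250},
    "Yahoo (2009)": {"gb": 500, "disks": 5624, "machines": 1406},
}

# Derive the data sorted per disk and per machine for each record.
summary = {
    name: {
        "gb_per_disk": r["gb"] / r["disks"],
        "gb_per_machine": r["gb"] / r["machines"],
    }
    for name, r in records.items()
}

for name, s in summary.items():
    print(f"{name}: {s['gb_per_disk']:.3f} GB/disk, "
          f"{s['gb_per_machine']:.3f} GB/machine")
```

On these numbers, the Microsoft system sorted roughly 15 times as much data per disk as the Yahoo system did in the same minute.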
FDS starts with an approach similar to Google's MapReduce -- as it is implemented in Apache Hadoop -- by moving the computational sorting to each individual data server. Unlike Hadoop, however, every server trades information with all the other servers in the sorting cluster. The researchers used an additional Microsoft networking technology, called full bisection bandwidth networks, to boost the bandwidth, allowing each computer to both send and receive up to 2GB per second.
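The idea of every server exchanging data with every other server can be illustrated with a toy, single-process simulation. The Python sketch below is not Microsoft's FDS code; it is a generic range-partitioned sort under assumed conditions (integer keys in a known range, and "servers" modelled as plain lists), with the hypothetical function name `partition_exchange_sort`.

```python
import random


def partition_exchange_sort(shards, key_max):
    """Simulate an all-to-all sort across len(shards) 'servers'.

    Each server owns one contiguous slice of the key range [0, key_max).
    Every server sends each of its records to the owner of that record's
    slice, and each owner then sorts what it received locally.
    """
    n = len(shards)
    # inboxes[d] collects the records destined for server d.
    inboxes = [[] for _ in range(n)]
    for local_records in shards:          # every server ...
        for rec in local_records:         # ... routes each of its records
            dest = min(rec * n // key_max, n - 1)
            inboxes[dest].append(rec)
    # Local sort on each server; reading the servers in order
    # then yields a globally sorted sequence.
    return [sorted(box) for box in inboxes]


random.seed(0)
# Four hypothetical servers, each starting with five unsorted records.
shards = [[random.randrange(1000) for _ in range(5)] for _ in range(4)]
result = partition_exchange_sort(shards, key_max=1000)
flat = [rec for box in result for rec in box]
print(flat)
```

In a real cluster the routing loop becomes network traffic between every pair of machines, which is why the bandwidth available to each server matters so much for this style of sort.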