Microsoft beats data-sorting record with new approach
- 23 May, 2012 04:56
Besting a record set by Yahoo in 2009, the research arm of Microsoft has deployed a new technique, called Flat Datacenter Storage (FDS), for quickly sorting large amounts of data.
The researchers will discuss their work at an Association for Computing Machinery conference dedicated to databases this week in Scottsdale, Arizona. They are also implementing their data-sorting techniques in Microsoft's Bing search engine, where they could improve response times to user queries.
"Improving big-data performance has a wide range of implications across a huge number of businesses," said Microsoft Research project leader Jeremy Elson, in an online entry describing the work. "Almost any big-data problem now becomes more efficient, which, in many cases, will be the difference between the work being economically feasible or not."
In tests conducted under the MinuteSort benchmark, the system set up by Elson and his colleagues was able to sort 1,401GB of data in a minute, which beat Yahoo's previous record of 500GB in the same time. Microsoft also boasted of sorting the data using fewer resources: The system used 1,033 disks in 250 machines, while Yahoo required 5,624 disks across 1,406 machines to complete its operation.
FDS starts with an approach similar to Google's MapReduce, as it is implemented in Apache Hadoop, by moving the computational sorting to each individual data server. Unlike in Hadoop, however, every server trades information with all the other servers in the sorting cluster. The researchers used an additional Microsoft networking technology, called full bisection bandwidth networks, to boost the bandwidth, allowing each computer to both send and receive up to 2GB per second.
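The general pattern described here, in which every server sends a slice of its data to every other server and then sorts what it receives, can be illustrated with a short single-process simulation. This is only a sketch of an all-to-all range-partitioned sort, not Microsoft's FDS code: the function names, the in-memory lists standing in for servers, and the hand-picked splitter values are all assumptions made for the illustration.

```python
from bisect import bisect_left


def partition_by_range(records, splitters):
    """Split one server's records into one bucket per destination server.

    `splitters` is a sorted list of upper bounds (hypothetical values);
    bucket i holds the records destined for server i.
    """
    buckets = [[] for _ in range(len(splitters) + 1)]
    for record in records:
        buckets[bisect_left(splitters, record)].append(record)
    return buckets


def all_to_all_sort(servers, splitters):
    """Simulate a sort in which every server exchanges data with every other.

    `servers` is a list of unsorted lists, one per simulated server, and
    must contain exactly len(splitters) + 1 entries.
    """
    n = len(servers)
    if n != len(splitters) + 1:
        raise ValueError("need exactly one more server than splitters")

    # Step 1: each server buckets its own records by destination.
    outgoing = [partition_by_range(data, splitters) for data in servers]

    # Step 2: all-to-all exchange. Server `dst` receives bucket `dst`
    # from every source server (in a real cluster this is network traffic).
    incoming = [
        [record for src in range(n) for record in outgoing[src][dst]]
        for dst in range(n)
    ]

    # Step 3: each server sorts only the key range it now owns.
    return [sorted(chunk) for chunk in incoming]


if __name__ == "__main__":
    cluster = [[5, 1, 9], [3, 8, 2], [7, 4, 6]]
    print(all_to_all_sort(cluster, splitters=[3, 6]))
```

Because each simulated server ends up owning a contiguous key range, reading the servers' outputs in order yields the fully sorted data. The exchange in step 2 is the part that stresses the network, since every machine sends to and receives from every other machine at once.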