Of course, it would be far more difficult to build an operating system capable of handling all the resources of a data center than it is to build one that allows a single device to run its applications. Data centers have teams of IT pros to make sure all the servers are running and that applications get enough storage and so forth, but the job is getting so big that a much more expansive OS to handle the whole data center is becoming necessary.
That's what UC-Berkeley Ph.D. student Matei Zaharia argued at this week's Usenix annual technical conference in Portland, Ore.
He's not the first to propose an OS for large clusters of computing systems, but he believes the need is getting more critical because of the growing diversity of applications and users, programming frameworks and storage systems.
A data center OS would wrap those all together into one management platform and provide resource sharing, data sharing, programming abstraction and debugging.
"These are the same reasons we developed time sharing and operating systems for computers," Zaharia said.
One audience member noted that the idea of building operating systems for clusters has been around for decades, and challenged Zaharia to describe what's new today and why it might succeed now.
Zaharia countered that early versions of data center operating systems are already being built. He pointed to Google and the sophisticated methods the company has employed to run its data centers, which haven't been completely revealed to the public.
"Google's software stack is something that is designed with operating system-like thinking," he said.
Zaharia and colleagues described their thoughts in a paper titled, aptly, "The Datacenter Needs an Operating System," which can be read on the Usenix website.
"Datacenters already host a diverse array of applications (storage systems, web applications, long-running services, and batch analytics), and as new cluster programming frameworks are developed, we expect the number of applications to grow," the paper states. "For example, Google has augmented its MapReduce framework with Pregel (a specialized framework for graph applications), Dremel (a low-latency system for interactive data mining), and Percolator (an incremental indexing system). At the same time, the number of cluster users is growing: for example, Facebook's Hadoop data warehouse runs near-interactive SQL queries from hundreds of users. Consequently, it is crucial for datacenter operators to be able to multiplex resources efficiently both between users of an application and across applications."
Zaharia didn't claim to have built a data center operating system, but said his team has taken an initial step by designing a cluster manager called Mesos "that enables fine-grained sharing across applications."
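To make the idea of sharing a cluster across applications concrete, here is a minimal Python sketch of max-min fair sharing, one textbook way to split a fixed pool of resources. It is an illustration only and is not taken from the paper or from Mesos; the function name, the application names and the slot counts are all hypothetical.

```python
def max_min_share(capacity, demands):
    """Split `capacity` identical slots among applications by max-min fairness.

    Illustrative only: this is a generic textbook policy, not Mesos's mechanism.
    demands: dict mapping application name -> number of slots requested.
    Returns a dict mapping application name -> number of slots granted.
    """
    allocation = {app: 0 for app in demands}
    remaining = capacity
    # Applications whose demand has not been fully met yet.
    unsatisfied = {app for app, want in demands.items() if want > 0}

    while remaining > 0 and unsatisfied:
        # Offer each unsatisfied application an equal share of what is left
        # (at least one slot, so every round makes progress).
        share = max(remaining // len(unsatisfied), 1)
        # Iterate over a sorted copy so ties are broken by name and the set
        # can be modified safely inside the loop.
        for app in sorted(unsatisfied):
            if remaining == 0:
                break
            grant = min(share, demands[app] - allocation[app], remaining)
            allocation[app] += grant
            remaining -= grant
            if allocation[app] == demands[app]:
                unsatisfied.discard(app)
    return allocation


if __name__ == "__main__":
    # Hypothetical cluster of 100 slots shared by three applications.
    requests = {"batch_analytics": 80, "web_app": 20, "graph_job": 50}
    print(max_min_share(100, requests))
```

In this sketch an application that asks for little gets everything it requested, and the slots it leaves unused are divided among the applications that want more, so no single workload can monopolize the cluster.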
Questions that still need to be answered include how to build standardized interfaces, handle streaming data and guarantee storage performance.
But Zaharia sees numerous companies, including Google, Amazon and Microsoft, working on these problems.
"Software platforms such as the Hadoop stack, LAMP, Amazon Web Services, Windows Azure, and Google's GFS / BigTable / MapReduce stack form today's de facto datacenter OS," he writes. "These platforms are gradually evolving to cope with the increased diversity of datacenter users and workloads (for example, substantial effort was put into Hadoop scheduling for multi-user clusters), but datacenter applications are still generally hard to develop and do not interoperate easily."
Follow Jon Brodkin on Twitter: www.twitter.com/jbrodkin