
Linux is culprit in leap-second lapses: Cassandra exec

The Linux kernel has a bug in how it handles leap seconds, according to the creator of the Cassandra database

A number of high-profile outages that took place last weekend can be traced back to how the Linux OS kernel mishandled a leap second added to the official time, charges the CTO of DataStax, a company that manages the open source Cassandra database.

"Initial reporting often fingered Java or even Cassandra as the culprit ... but the actual problem was a kind of livelock in the Linux system calls responsible for timers," wrote DataStax CTO and Cassandra creator Jonathan Ellis, in a blog post.

At midnight Greenwich Mean Time (GMT) on Saturday, an extra second was added to Coordinated Universal Time (UTC), the official time used to coordinate servers across the Internet. Although the Network Time Protocol (NTP), the most widely used mechanism for synchronizing time across the Internet, was designed to handle leap seconds, a number of popular Internet services briefly went offline after the second was inserted on their servers, including Reddit, LinkedIn and the Qantas airline reservation system.
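To see why an inserted second is awkward for software, the short Python sketch below may help. It assumes the leap second in question was labelled 23:59:60 UTC on 30 June 2012 (the date is not stated in this article), and the helper names are illustrative only. It shows that POSIX-style second counting gives the leap second the same number as the following midnight, and that Python's own `datetime` type refuses a seconds value of 60.

```python
import calendar
from datetime import datetime

# Assumption: the leap second was labelled 23:59:60 UTC on 30 June 2012.
LEAP_SECOND = (2012, 6, 30, 23, 59, 60)
NEXT_MIDNIGHT = (2012, 7, 1, 0, 0, 0)


def posix_seconds(utc_fields):
    """Convert (year, month, day, hour, minute, second) in UTC to POSIX seconds.

    calendar.timegm assumes every day has exactly 86,400 seconds, so it has
    no separate slot for a leap second.
    """
    return calendar.timegm(utc_fields + (0, 0, 0))


def datetime_accepts(utc_fields):
    """Return True if datetime can represent the given UTC fields."""
    try:
        datetime(*utc_fields)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    # The leap second and the next midnight collapse to the same count.
    print(posix_seconds(LEAP_SECOND) == posix_seconds(NEXT_MIDNIGHT))
    # datetime only allows seconds in the range 0..59.
    print(datetime_accepts(LEAP_SECOND))
```

This only illustrates the representation problem. It does not reproduce the kernel timer bug that Ellis describes.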

Reddit engineers had initially assumed that Cassandra, along with Java, was the source of the site's leap-second-related outage on Saturday. The problem wasn't with either of those technologies, Ellis countered, but rather with the underlying OS. (Oracle, which manages Java, did not immediately respond to a request for comment.)

A system administrator would have first noticed the problem manifesting as an extremely high system load or even a system crash that could be traced back, via the normal administrative tools, to an application such as Cassandra, the Java Virtual Machine, Hadoop, or MySQL. The actual culprit, however, turned out to be a harder-to-pinpoint bug in the way Linux updated its clocks when a leap second was introduced, Ellis said.

Many administrators found that restarting the application did not restore the server to normal operation. They could, however, remedy the issue by resetting the system clock or rebooting the server.
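The article does not say exactly how administrators reset the clock. As a rough sketch only, one way to write the current time back to the system clock from Python on a Unix-like system is shown below. The function name and the `dry_run` flag are hypothetical, and it is an assumption that re-applying the current time counts as the kind of reset described here. Setting the clock normally requires root privileges, so the sketch does nothing unless `dry_run` is turned off.

```python
import time


def reset_realtime_clock(dry_run=True):
    """Write the current wall-clock time back to the system clock.

    Hypothetical helper: returns the timestamp that was (or would be) set.
    With dry_run=True nothing is changed, which is the safe default.
    """
    now = time.time()
    if not dry_run:
        # Unix only; raises PermissionError without sufficient privileges.
        time.clock_settime(time.CLOCK_REALTIME, now)
    return now


if __name__ == "__main__":
    print("Would set the clock to:", reset_realtime_clock(dry_run=True))
```

Rebooting, the other remedy mentioned, needs no code, but it is far more disruptive on a production server.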

Whatever the cause, the bug disrupted the Saturday evenings of many a system administrator. On the Time-Nuts mailing list, one admin reported spending the evening rebooting hundreds of servers.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
