When Roger Hardy, IT director for the US city of Jeffersonville, gets an alert from an automated monitoring system that his data centre air conditioning is failing, he has 20 minutes to fix the system before the computer room's temperature reaches what he describes as its "death point".
If the thermometer inside the data centre hits 91 degrees Fahrenheit (about 33 degrees Celsius), Jeffersonville's IT equipment is cooked, literally. That's what happened this month when the city lost $US20,000 worth of equipment after its strained air conditioning system shut down during a spell of warm weather.
Hardy, who is the sole IT worker for the community of about 29,000 residents, was obtaining approval from city officials this week to add more chiller capacity. That wasn't an IT problem he expected when he took the job in Jeffersonville last December, since the city built a new data centre just last year.
But Hardy's predecessor died very early in the project. And later construction decisions didn't fully account for future cooling needs, Hardy said, adding that it wasn't until after he was hired that he discovered that the new computer room had only a fraction of the required cooling capacity.
To try to avoid unpleasant surprises like the one that occurred in Jeffersonville, IT managers are increasingly investing in computer-aided studies that map the airflow in data centres — similar to the computational fluid dynamics studies that automotive or aircraft manufacturers use to see how air moves around objects.
Even if an IT facility has ample cooling capacity, it could still have heat problems if equipment isn't properly arranged. High-density systems such as blade servers are particularly vulnerable to airflow problems.
But airflow studies can cost as much as $US150,000, said Mark Evanko, president and principal engineer at US-based Bruns-Pak, which conducts computational fluid dynamics studies as part of its data centre engineering and design services.
The studies can be complicated, Evanko said. Assembling the data for a computerized model can involve going from rack to rack and verifying every aspect of airflow in a data centre, he said. The modelling also has to be able to account for possible changes in a data centre's configuration.
Question of effectiveness
In addition, there is debate about how effective the computational fluid dynamics studies are within data centres. Studies of airflow "look good", said John Musilli, a data centre operations manager at Intel. "But at the end of the day, it only works when you have a pristine design."
Musilli, who also is a member of the Data Centre Institute think tank within the US AFCOM professional association for data centre managers, said that as soon as users begin adding equipment to data centres or moving systems around, it creates turbulence that can upset the airflow models.
Intel does use computational modelling in its data centres, but Musilli said that it is just one tool. The models "will tell you if you have a big problem", he said. But they can show large, ominous-looking red areas over racks of servers that upon closer inspection "may not be significant", according to Musilli.
Heat-related server problems may be obvious in some cases but less so in others. For instance, sporadic disk-drive failures may be attributable to normal mechanical problems and not necessarily to hot spots in data centres.
But Bob Sullivan, an engineer and consultant at the US-based Uptime Institute, said he believes that excessive heat is at the root of many IT equipment problems. "The problem is larger than people think," he suggested.
The Uptime Institute looked at 30 computer rooms totalling 27,871 square metres of data centre space and found that on average, 10 percent of the server cabinets had hot spots — areas around them where the temperature was 25 degrees Celsius or higher.
Sullivan said IT managers can do a lot to control the hot-spot problem by placing thermometer strips on their IT equipment, checking them regularly and taking action if temperatures are rising.
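The routine Sullivan describes could be approximated with a short script once the strip readings are written down. The Python sketch below is illustrative only and is not a tool used by the Uptime Institute or anyone quoted here. It assumes readings are logged by hand in degrees Celsius, reuses the 25-degree hot-spot threshold cited above, and treats three consecutive increases as "rising", which is an arbitrary choice. The cabinet names and readings are invented.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hot-spot threshold taken from the Uptime Institute figure quoted in the article.
HOT_SPOT_THRESHOLD_C = 25.0


@dataclass
class CabinetLog:
    name: str                 # hypothetical cabinet label
    readings_c: list[float]   # hand-logged readings in Celsius, oldest first


def is_hot_spot(log: CabinetLog) -> bool:
    """True when the most recent reading is at or above the threshold."""
    return bool(log.readings_c) and log.readings_c[-1] >= HOT_SPOT_THRESHOLD_C


def is_rising(log: CabinetLog, checks: int = 3) -> bool:
    """True when each of the last `checks` readings exceeds the one before it.

    The default of three checks is an assumption, not a recommendation.
    """
    recent = log.readings_c[-checks:]
    if len(recent) < checks:
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def cabinets_needing_action(logs: list[CabinetLog]) -> list[str]:
    """Names of cabinets that are already hot or are trending upward."""
    return [log.name for log in logs if is_hot_spot(log) or is_rising(log)]


def hot_spot_share(logs: list[CabinetLog]) -> float:
    """Fraction of cabinets whose latest reading counts as a hot spot."""
    if not logs:
        return 0.0
    return sum(is_hot_spot(log) for log in logs) / len(logs)


if __name__ == "__main__":
    # Invented sample data for illustration only.
    sample = [
        CabinetLog("A1", [21.0, 21.5, 21.0]),
        CabinetLog("A2", [22.0, 23.0, 24.0]),
        CabinetLog("B1", [24.5, 25.0, 25.5]),
        CabinetLog("B2", [26.0, 25.5, 25.0]),
    ]
    print(cabinets_needing_action(sample))
    print(hot_spot_share(sample))
```

With the invented sample, cabinet A2 is flagged because its readings keep climbing even though it is still below 25 degrees, which is the kind of early warning regular checks are meant to provide.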
Mark Levin, an independent consultant at US-based Metrics Based Assessments, said that although data on heat-related system failures is lacking, he has seen evidence that IT managers are scrambling to cope with the problem.
The need to do something is often apparent when Levin tours data centres. "When you walk through a data centre and can feel the hot spots, you know there is a problem," he said.
Hardy hopes to increase the cooling capacity in Jeffersonville's data centre within the next two months. But until that happens, he said, "it's going to be a rough road and a few late nights".