GRAPEVINE, Texas -- When it came to managing most of AOL's 6 petabytes of data, a Fibre Channel SAN sufficed. But for its most critical relational database, AOL found that the SAN was too constrained and caused its IT shop to miss business unit service levels more than 50% of the time.
After investigating what was causing the I/O bottlenecks, AOL traced the problem to its backend storage. To fix it, AOL decided to build a 50TB storage area network (SAN) from solid-state technology.
The upgrade worked, quadrupling throughput to the SQL database over the Fibre Channel SAN while preserving storage admins' flexibility to migrate data between storage systems: the NAND flash memory sat behind an existing virtualization appliance, which aggregates all of the backend storage and serves it up as a single pool.
AOL engineers found that the throughput problems didn't stem from a lack of hardware. The company has five large high-end arrays with 15,000 or 10,000 rpm drives for primary storage and two lower-end arrays with serial ATA (SATA) drives for nearline backup. But the storage arrays could only move data to and from their internal drives as fast as the SAS backbone allowed, which was about 6Gbit/sec.
AOL's storage infrastructure supports about 4,000 servers that feed information both to online users and to the company's own backend applications.
Dan Pollack, senior operations architect for AOL, said he considered adding solid state drives to the company's legacy storage arrays, but decided against it because they would be held back by the arrays' controller bandwidth.
"So you find you can take a high-performance SSD device and half the performance is lost at the head of the array," Pollack told an audience at the Storage Networking World conference here this week.
Pollack also considered using solid state drives in servers, but ruled them out: they couldn't be clustered into his SAN, they didn't offer the required capacity, they couldn't be swapped out non-disruptively, and data couldn't be migrated between systems.
Pollack settled on a solid state array from Violin Memory. It could plug directly into his Fibre Channel network, and could sit behind his storage virtualization appliance from YadaYada, which was purchased last year by storage giant EMC.
Pollack said the rollout, including planning and testing, took only eight weeks. The system has been live since July 1. Since then, the SSD array has had zero downtime and has allowed his shop to meet all of the business SLAs without exception. I/O response times are typically less than 1 millisecond, and there has been no impact on his IT team's ability to manage the backend storage.
"It's very easy to fall in love with this stuff once you're on it," he said.
But love can be expensive. Without offering an exact price tag, Pollack said the solid state array cost AOL about $20 per gigabyte, which, with 50TB of capacity, adds up to about $1 million.
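As a back-of-the-envelope check, the quoted figures line up (a sketch, assuming decimal units, i.e. 1TB = 1,000GB):

```python
# Rough cost check for the flash SAN, assuming decimal units (1 TB = 1,000 GB).
cost_per_gb = 20            # dollars per gigabyte, per Pollack's estimate
capacity_gb = 50 * 1000     # 50 TB expressed in gigabytes
total_cost = cost_per_gb * capacity_gb
print(total_cost)           # 1000000, i.e. about $1 million
```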
Pollack said that money was less of an obstacle than the alternative -- throwing more manpower at the database problems.
The flash array has RAID 5 internal protection, so the drives are hot swappable. For added protection, Pollack configured the "preliminary" installation so that it mirrored data across two clusters of Violin appliances -- each cluster being made up of six Violin appliances.
"Because this is all new and unproven, and this is a tier one heavily visible application, we felt it was prudent to spend the extra money and time and provide that additional protection. In the future we won't do that," he said.
And, while the array didn't offer the 1 million-plus I/Os per second (IOPS) that vendor marketing material boasts, it does come in at around 250,000 IOPS, which Pollack said is more than enough for his purposes.
There were also significant cost savings associated with the flash memory, he added.
For one, Pollack said, Fibre Channel arrays often use only about 10% of the capacity in their top-tier drives because storage admins often short-stroke the drives, using only the outer tracks of each drive to reduce I/O response times. That can eat as much as 20 kilowatts of power per array, he said.
By comparison, the flash array only uses 2 kilowatts and 90% of its capacity is utilized. On top of that, the Violin storage array takes up less floor space and produces less heat, which also helps to reduce power required from HVAC systems.
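Putting those two figures together shows why the savings compound: the flash array draws a tenth of the power while making nine times as much of its capacity usable. The sketch below assumes, hypothetically, equal raw capacity per array so the numbers can be compared directly:

```python
# Relative power efficiency per unit of *usable* capacity,
# assuming (hypothetically) equal raw capacity in both arrays.
fc_power_kw, fc_utilization = 20, 0.10        # short-stroked Fibre Channel array
flash_power_kw, flash_utilization = 2, 0.90   # flash array

# Power divided by the fraction of capacity actually usable
# (raw capacity normalized to 1 for both).
fc_per_usable = fc_power_kw / fc_utilization          # 200.0
flash_per_usable = flash_power_kw / flash_utilization # ~2.22

print(fc_per_usable / flash_per_usable)  # 90.0 -- roughly 90x less power per usable GB
```

Under those assumptions, the combined effect is roughly a 90-fold reduction in power per usable gigabyte, before counting the HVAC savings.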
Violin doesn't use NAND flash in hard drive form factors, as many solid state companies do today. Instead it uses flash chips directly on cards called Violin Intelligent Memory Modules (VIMMs). VIMMs are similar to DIMMs, only built out of flash instead of DRAM.
"So you're getting the 4GB/sec of PCIe bandwidth, not the 5Gbit/sec or 6Gbit/sec SAS bandwidth. You're getting almost an order of magnitude more bandwidth to the storage internally just because you're using an interface that's capable of it," Pollack said.
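Note that the quoted figures mix units -- PCIe in gigabytes per second, SAS in gigabits per second. Converting to common units (a sketch; 1 GB/sec = 8 Gbit/sec) makes the comparison concrete:

```python
# Unit conversion for the quoted figures: PCIe in GB/sec vs. SAS in Gbit/sec.
pcie_gb_per_sec = 4
pcie_gbit_per_sec = pcie_gb_per_sec * 8   # 32 Gbit/sec
sas_gbit_per_sec = 6                      # the 6Gbit/sec SAS backbone

print(pcie_gbit_per_sec / sas_gbit_per_sec)   # ~5.3x the SAS link rate
```

By that conversion, the PCIe path offers roughly five to six times the raw link rate of a 6Gbit/sec SAS lane, before accounting for protocol overhead or lane aggregation.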
Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian , or subscribe to Lucas's RSS feed . His e-mail address is firstname.lastname@example.org .