If Eastman Kodak Co. had waited four more days to implement disaster recovery plans in Florida, it would have been too late. In September 1998, Hurricane Georges was barrelling toward the Miami computing center, which supports some of the photography giant's Latin American operations. Only four days after the first test of the recovery plans had been completed at the site, managers declared a disaster and closed up shop in enough time not only to back up vital records but also to do another set of backups when the shipping company destroyed the first ones with an X-ray machine. Back at headquarters in Rochester, N.Y., systems safely restored, CIO John Chiazza sent out certificates declaring those involved in the close call "disaster masters."
The deadly storm that dumped two feet of water on parts of the Sunshine State and left hundreds of thousands of residents without electricity isn't the only event that has tested aspects of Eastman Kodak's disaster recovery program. Each piece of the program is tested at least once a year but would get a workout even without planned tests. In separate incidents, backhoes five miles from headquarters destroyed main and backup telephone cables, air conditioning hoses spewed 8,000 gallons of water into computer rooms, and ice storms caused power outages and left employees stranded. Tie-clad employees tell these war stories calmly, as if they watched from a heavily armoured tank.
The business continuity program at this Fortune 500 company with 80,000-plus employees reaches into computer rooms around the world, spanning the mainframe to call center to telecommunications to vital records. The program has won Kodak accolades from insurance agents and analysts, and in 1999 it earned its long-time manager, Richard Corcoran, a place in Contingency Planning and Management magazine's Hall of Fame. And now that recovery plans for the mission-critical portions of IS are so ingrained that they are part of the company's ERP rollout, the IS team wants to spread the word, by educating business units and by using year 2000 lessons to move knowledge throughout the supply chain. Chiazza wants everyone to make time for the painful questions that start with "what if."
PICKING AT THE PIECES
In the late 1980s, while Kodak's then-CIO Katherine Hudson was famously deciding to outsource the company's mainframes, a companywide disaster recovery program was quietly being born. Kodak had a patchwork of plans in place, with some testing and offsite storage of vital records, but no verified recovery program. Realising that the company had come to rely on IT's infrastructure and that the CIO was expected to protect it, Hudson asked Corcoran, then information security administrator, to conduct a study identifying the biggest risks of potential disasters and the cost of mitigating them.
Based on that report, Hudson started building a formal disaster recovery program, beginning with the most vulnerable parts of IS, at this behemoth manufacturing company that sells photographic supplies to everyone from consumers to NASA. Only 18 months and three tests later, Corcoran and outsourcing provider IBM had created and verified a recovery program that could restore the most critical mainframes with their vital data at a backup site within four to six days of a disaster. Next, the IS team focused on a telecommunications recovery program; a contract with Nortel promises a mobile PBX unit with 3,000 phones within two days of a disaster. The next crucial piece was the distributed recovery program for the company's 1,000-plus LANs--some of which can be restarted in days, and some of which don't need to be restored for a month. Finally came a call center recovery program; at a nearby Kodak training facility, 100 agents can be answering phones within one day of a disaster. Chiazza, who calls disaster recovery "insulation," says that the programs have become "part of the rule book by which we do IT."
The ERP system, which went live in mid-1997 and quickly became the most critical component of contingency planning, is the latest chapter. Before a business unit can have ERP, both business and IT managers must conduct a business impact analysis (BIA) that asks two questions: If the ERP data center is down, how will you continue your business function? And if the ERP is up but the area in which you do business is gone, how will you continue the business function? Then they must have plans in place to support the BIA. "When we got into the ERP program, this was baked in right at the front end, because we were so used to thinking about it," Chiazza says. "With our ERP environment, it's all or nothing. The stakes are high."
Donna Scott, vice president and research director at Gartner Group, a research company based in Stamford, Conn., had not heard of this approach but practically applauded when she did. "Usually businesses roll out ERP systems, and they don't think about the continuity side of it until later," she says. "You shouldn't deploy a major new operation without having a business contingency plan in place, but it happens all the time."
But Chiazza doesn't really want to talk about disasters; he wants to talk business. "The more you talk about 'disaster recovery,' the more it gets pushed off," Chiazza says. "The more you talk about 'business continuity' and how we as a company can stay in business, the more we get [business people] engaged. If they think of it as only an IT thing, then they think they can do nothing."
Corcoran's title (he's now manager of business continuity) illustrates this shift in thinking: The word recovery has been scratched out. Simply put, disaster recovery is the restoration of computing and telecommunications services. Contingency or continuity planning is the set of procedures that business functions follow to keep going until IT facilities are recovered. Kodak's overall business continuity program sits on three legs: disaster recovery programs, continuity planning programs and an emergency management plan, under which a six-person team that bridges the business and IT sides is responsible for mitigating problems and declaring a disaster when necessary. Although Corcoran indirectly reports to Chiazza through information security systems, executives at some companies have moved business contingency into the auditing department, or even to the vice president level, to increase visibility.
Regardless of where the program is placed, however, management must play a major role, Chiazza says, partly because of the high-level dictate needed for a successful program, and partly because IT doesn't necessarily have a handle on organizationwide priorities. At Kodak, executives from every corner of the business settled on a nine-step restoration sequence in which order entry and inventory control come first and second, followed by manufacturing processes, purchasing and warehouse control. Next come payroll and accounting, and then quality assurance and master data. Not only does the list help IS prioritise, but the sequence also allows members of each business unit to understand their importance in the company. Individual business units still have some leeway based on their needs.
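For readers who want to see how such a sequence can drive recovery work, here is a minimal sketch in Python. It is illustrative only: the function name is hypothetical, and the relative order within the pairs the article names together (payroll and accounting, quality assurance and master data) is an assumption based on the order in which they are listed.

```python
# The nine-step restoration sequence as listed in the article.
# Order within "payroll and accounting" and "quality assurance and
# master data" is assumed from the order of mention.
RESTORATION_SEQUENCE = [
    "order entry",
    "inventory control",
    "manufacturing processes",
    "purchasing",
    "warehouse control",
    "payroll",
    "accounting",
    "quality assurance",
    "master data",
]


def restoration_order(affected):
    """Sort the affected business functions into restoration order.

    Raises ValueError for any function that is not in the sequence,
    so that an unranked function is never silently dropped.
    """
    rank = {name: i for i, name in enumerate(RESTORATION_SEQUENCE)}
    unknown = [name for name in affected if name not in rank]
    if unknown:
        raise ValueError(f"not in restoration sequence: {unknown}")
    return sorted(affected, key=lambda name: rank[name])


if __name__ == "__main__":
    # Three functions hit by the same outage, reported in arbitrary order.
    print(restoration_order(["payroll", "order entry", "purchasing"]))
```

An agreed, written-down ranking like this is what lets IS decide what to bring up first without renegotiating priorities in the middle of an outage.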
As wording shifts from recovery to continuity, the role of individual business units is changing too. Each business unit has an IS staff that reports to Chiazza through both regional and functional IS directors. Although big-picture disaster recovery--like that of the ERP system--can be handled centrally, every workstation and assembly line can't be the direct responsibility of one unit. To that end, executives at Kodak emphasise that business units are the owners of contingency plans. "They will bear the consequences of any outages, so they're in the best position to make the cost-benefit decisions," Chiazza says.
Each business unit is responsible for looking at its needs and evaluating whether its IS staff is meeting them--both in terms of the recovery point objective (how current the restored information needs to be) and the recovery time objective (how soon systems must be restored). "There's a give and take," Corcoran explains. "The manager says, 'Well, maybe I can go two days. How much is that going to cost me?'" Once managers know how long they may be without computing and telecommunications services, they can figure out what to do in the meantime.
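The give and take Corcoran describes comes down to comparing what a business unit needs with what its IS staff can deliver. The Python sketch below is illustrative only: the class names, the field names and the example figures are assumptions made for the sake of the example, not a description of Kodak's tools.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Objectives:
    """What the business unit says it needs (hypothetical structure)."""
    rpo: timedelta  # recovery point objective: how old restored data may be
    rto: timedelta  # recovery time objective: how long systems may be down


@dataclass
class Capability:
    """What the IS staff can deliver today (hypothetical structure)."""
    backup_interval: timedelta  # worst-case age of the most recent backup
    restore_time: timedelta     # time needed to bring systems back


def unmet_objectives(need, have):
    """Return the names of the objectives the current capability misses."""
    missed = []
    if have.backup_interval > need.rpo:
        missed.append("RPO")
    if have.restore_time > need.rto:
        missed.append("RTO")
    return missed


if __name__ == "__main__":
    # A manager who "can go two days" and tolerates day-old data,
    # measured against nightly backups and a four-day restore.
    need = Objectives(rpo=timedelta(hours=24), rto=timedelta(days=2))
    have = Capability(backup_interval=timedelta(hours=24),
                      restore_time=timedelta(days=4))
    print(unmet_objectives(need, have))
```

Any objective the function returns is where the cost conversation starts: either the business unit pays for a faster capability or it relaxes the objective and plans for a longer gap.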
Because Kodak is a US$14 billion company with manufacturing in 10 countries, this proactive ownership is crucial. Individual business units are more aware of the risks they face based on location. While the Miami unit plans for hurricanes, Rochester businesses worry about blizzards and a nearby nuclear power plant, and a plant in China deals with an unreliable telephone system.
Fred Joy, senior research analyst at the Stamford, Conn.-based research company Meta Group Inc., emphasises the importance of a system in which both business and IT have significant involvement. "Certain aspects are useful to centralise, like the overall policy and planning," he says. "The IT organisation of things, like procurement of hardware, is also really good to centralise because it's specialised.... [But] there is a level at which you really want individual units to be responsible for coming up with a plan."
At Kodak, a set of internal control standards establishes the rules that business units are audited against. Corcoran, who acts as a consultant for the business units, is working on giving the units the tools to create those plans, including templates users can download, through an intranet site he posted in April 1999. The site's information was developed with IS in mind, but he says the principles of the six-step plan are more broadly applicable: initiate the project, conduct a vital business assessment, develop business continuity options, design a business plan, test it and maintain it.
Gartner Group Inc.'s Scott says a setup in which personalised templates are made available is typical. "You don't want to reinvent the wheel," she says. "You want to modify them to have your own terminology and your own company's culture so that business units understand them."
If continuity is Kodak's religion, then the company has begun evangelising to the wider world. Terry Breslawski, the former Y2K manager, has turned his efforts to the plans of suppliers--and the plans of their suppliers. A recent explosion at a factory in Japan illustrates an oddity of business continuity: disasters help. After the explosion, the chemical manufacturer sent Kodak a letter saying that it wouldn't be able to meet its delivery requirements, and Breslawski has evidence of what he wants to prevent.
As the manager of business systems in the worldwide purchasing department, Breslawski has been asking suppliers about their continuity plans. (See "Weak Links," below.) "We expect our top suppliers to know that this is important. We can't make them do it, though." Although he says Kodak is not terminating contracts because of a lack of continuity plans, evidence of good planning helps new suppliers land the deals.
Although a majority of Y2K plans are gathering dust, analysts say, some companies have used Y2K as a springboard. "The year 2000 [problem] raised the responsibility of contingency planning to the highest level," and has also increased funding, says Gartner Group's Scott.
Fellow photography and manufacturing company Polaroid, for example, used Y2K to standardise its patchwork of business continuity plans. "We developed a common template for the organisation, and we ensured that everyone's continuation plans were designed accordingly and that everybody had them," says Thomas R. Hennigan, CIO and vice president at the US$2 billion company known for its instant film and cameras. "Because we ran the whole year 2000 program, we wound up being responsible for making sure all the entities had developed continuation plans." Those entities--manufacturing plants, for example, or the human resources department--had to answer a set of 20 questions about their business continuity plans. Many entities already had plans, but some did not; the answers ranged from three pages long to binders full.
WORTH THE MONEY
Not that Kodak hasn't been glad for those Y2K plans too. Although mission-critical recovery plans are all in place, some of the systems deemed "essential" are not covered by a full-blown contingency plan. Chiazza is tight-lipped about details but says that a computing service interruption in Europe during the first half of this year could have had severe implications. The Y2K plan, however, was recent enough that managers could use it to restart business. "Would they have figured it out in a day or two? Maybe. But they had a plan sitting on the shelf and went right to it," Chiazza says. "There was no question about where to start, what to do, who to call, who had to be involved."
Chiazza's other major projects now involve shortening the time needed to restore the ERP system (currently eight hours for full operation at a standby site) and strengthening Kodak's Internet-facing infrastructure. Of course, there are costs associated with building and maintaining a program. CIOs trying to drum up high-level support should start by researching how other companies have been affected by the lack of contingency planning, Chiazza says, and then look at how their companies would be affected by a power outage to a sensitive operation that's heavily supported by technology. CIOs should also note that businesses are legally required to have business continuity plans in place--to meet due diligence requirements for stockholders, for instance, or IRS requirements for keeping records on file--and that good continuity plans can have a significant impact on insurance costs.
Even so, "it doesn't come free," Chiazza says. "But when you weigh it against the implications of being out of business, it's one of those small insurance policies that you take and religiously follow. The relatively modest amount we invest each year is well worth the money."
At press time, nothing catastrophic was preventing Sarah D. Scalet from receiving feedback at firstname.lastname@example.org.
Kodak's internal control standards--a set of companywide policies against which business units are audited--offer the following definitions:
- Contingency planning: The prearranged plans and procedures that critical business functions will execute to ensure business continuity until computer and telecommunications facilities are reestablished following a disaster.
- Critical application: One that supports major revenue activities, movement of goods to customers or a strategic manufacturing process, or fulfils contractual or regulatory obligations. In addition, the application's availability is deemed by management to be vital to the continued functioning of business.
- Disaster: A loss of computing or telecommunication resources to the extent that routine recovery measures cannot restore normal service levels. Applies to losses expected to last longer than one day and significantly impact business operations.
- Recovery: The restoration of computing and telecommunications services following an outage resulting from a disaster.
- Vital business assessment: A process required to determine what business functions and supporting applications are critical for the company to continue to conduct business in the event of a disaster.
In IT circles, the name Eastman Kodak brings to mind not floodwaters or photos, but former CIO Katherine Hudson's landmark decision to outsource the company's mainframes to IBM in 1989. Less known is the fact that outsourcing jitters started Kodak down a path toward best-practice plans for the worst.
"Outsourcing was new to us; it was new to the whole industry," recalls current CIO John Chiazza, then a manager in Kodak's Logistics Organisation. With the new setup, Kodak was consolidating at least five data centres from the United States and Canada, increasing the risks of a system failure.
Hudson made sure the outsourcing contract included a clause that IBM would develop a disaster recovery plan for the mainframe center. Meanwhile, what began as a systems problem turned into discussions about business. "It was the first time we got people together from a variety of functions and started to ask the question, What if? If you had an interruption, what would you bring up first? What did it really mean to be 'in business?'" Chiazza says. "It also got us thinking about how we could put in some insulation so that we wouldn't have to bear the consequences of that really painful restart."
Hudson, now CEO and president of Brady Corp. in Milwaukee, is surprised by her role in this disaster recovery legend. "I remember big, thick reports," she recalls when asked what prompted her to request a study of the company's recovery program. "But it was just part of the grand scheme." And what about the mandate she got the CEO to issue, stating that every critical business unit needed a disaster recovery program? She shrugs that off too. "I needed the involvement of the business leaders themselves. You've got to have the hammer of the CEO. I went to him and said, 'I need your hammer,' so he hammered."
The former Y2K manager at Kodak, Terry Breslawski, has turned his focus to how well Kodak's supply chain is preparing for the worst. When he asks suppliers for evidence of a business continuity plan, he says, "We get everything from, 'What's that?' to, 'We can't send you the whole thing; what do you want?'" Once Breslawski has the answers, he checks a "yes" or "needs" column down the left side of the following checklist. When the sales people don't know the answers, they usually take the list straight to IT. How would your company fare?
Evidence of plan
- Top management commitment and approval
- Threat and risk analysis (including natural and man-made disasters)
- Enterprisewide business impact analysis (BIA)
- Preventative measures taken
Evidence of response, recovery and continuity plans
- Safety and evacuation plans for personnel
- Emergency response and command plans
- Disaster recovery plans derived from BIA: How would we recover from a shutdown?
- Business continuity plans: What are our key operations during a shutdown?
- Communication plans with personnel, media and business partners
Evidence of preparedness, testing and maintenance
- Plan awareness and personnel training
- 12- to 18-month review and revision process
- Plan audit and/or testing to determine results and improvement opportunities
Evidence of supplier readiness
- Key suppliers investigated for business continuity plans