No one likes to think about disasters, but pretending they can't happen is asking for even worse trouble.

When a huge fire at Esso's Longford gas plant cut supplies to Victoria last September, authorities faced a Herculean effort to respond to the crisis. The Department of Human Services had to establish a 1000-seat call centre virtually overnight to help vulnerable people cope with the loss of services, while shared IT service department Energy Information Solutions had just 48 hours to build Internet-based systems to tie in with it. Before supplies were even restored, gas authority VENCorp had to set up a 500-seat helpline for the thousands of householders calling to register for reconnection assistance.
Meanwhile, up to 89,000 businesses suffered losses estimated at $1.3 billion; as much as $200 million was knocked off the state's export earnings, while the restaurateurs, manufacturers, and tourism and hospitality operators most affected by the crisis had to improvise their own emergency responses to the supply interruption. If ever there was an argument for continuity planning, the Longford gas crisis was it. And if, as seems likely, failures in essential services become more common as suppliers continue to make do with fewer people and cut back further on routine maintenance, continuity planning will only become more of a focus.
And the buck stops with you -- the CIO. While more and more companies are now evaluating continuity planning from a whole-of-business point of view, systems remain a priority -- and that means the CIO continues to be seen as having the primary responsibility.
Continuity planning has evolved considerably since its inception in the '80s, when disaster recovery planning was the focus and the glasshouse the centre of activity. Disaster recovery planning became business recovery planning (BRP) once organisations realised that not only the systems themselves, but also every business area reliant on those systems had to be capable of rapid recovery in the event of a disaster. Nowadays, says Bergman Voysey and Associates managing director Peter Voysey, heavily IT-dependent businesses are calling the activity business continuity planning (BCP), and its sole aim is to minimise the need for any recovery effort at all in the event of a disaster.
But BCP by itself -- with its detailed planning, its sophisticated backup strategies and its mirrored sites -- is still not enough to ensure corporate peace of mind, Voysey says. Just as important, particularly to big business, is a full-blown crisis management policy. A gas plant explosion is one thing; the well-ordered business must also find ways to keep operating even if the CEO is kidnapped or the building is bombed. Here, too, the CIO has an important role to play, both as a participant in the recovery process and as a key player in crisis management.
However, as Peter Berents, risk manager for telco Optus Communications, says, BCP is like walking a tightrope: you have to get the balance right. "Those directly involved in business continuity planning know that if we don't have the right equipment, don't practise, and don't have appropriate support, the consequences for us and our organisations could be catastrophic," Berents told the Survive! '98 annual conference in Sydney last year. "We also know that we don't have unlimited funds. We therefore must ensure that what we do is appropriate. Management time is precious, so we need to keep management focused on the key issues. Most people are mainly interested in looking at the positive developments in their organisations rather than worrying about what can go wrong -- but we all know we must [worry]."

Line managers responsible for BCP still see securing senior management approval for BCP projects as one of the key barriers facing continuity planners today. Addressing exactly that audience, Richard Heron, senior consultant, IT service delivery, at ANZ Banking Group, told Survive! that in many organisations senior management either does not want to commit funds or resources to BCP or fails to appreciate the importance of continuity management.
"The 'What if we do have a disaster?' approach turns into a 'What if we don't have a disaster?' discussion," Heron said. BCP consultants typically advise BCP managers to approach the board of directors, state the reasons why BCP is needed and then win the board's support on the strength of the argument. That approach simply doesn't work, said Heron, whose paper focused in part on ways line managers could turn their CIOs on to the importance of BCP. "The best approach is to slowly 'plant' the need for continuity management, whether that means disaster recovery or business continuity . . . use every opportunity at all meetings you attend to keep up the momentum that business continuity is important . . .

"The advent of a new chief information officer or general manager of information technology can be a great opportunity to make a difference with your pursuit of business continuity. Previous IT managers or chief information officers may not have cared too much about business continuity. New incumbents in these positions are different. They need to be seen to perform, especially in their first three to four months. Use this ideal opportunity to provide them with an awareness of the current business continuity issues and to get some support from them," Heron advised his audience.
Emergency Management Australia (EMA) is the federal agency responsible for reducing the effect of natural and man-made disasters on the Australian community. It is also the lead federal agency for disaster relief and the producer of "Non Stop Service Guidelines on Continuity Management for Public Sector Agencies". According to EMA assistant director of policy Jonathon Abrahams, BCP simply should not be left to middle managers, who lack the authority to push the plan through. "It's the same old story, it's really project management at its best, but the decisions will need to be made as to what the priorities are, and I don't think it's really the responsibility of lower level managers to take that on board. Senior people need to be making those decisions," he said.

But while the buck should stop with the CIO, the focus must be much wider than the IT side of continuity management or worst-case-scenario planning. The needs of the entire organisation must be taken into account, and it may well be up to the CIO to get the board on side. "It [BCP] needs to be driven from the highest level; it has to have the sanction of the highest level; and then it needs to filter through the organisation, because you might find that your shop-front staff are as critical to the organisation as the senior managers," Abrahams said.
Dr Carl Gibson, of Victoria's Department of Premier and Cabinet, agrees. He told Survive! that for risk management to become a "living, breathing" part of the business, it had to be owned by individuals at the business unit level and be capable of delivering improvements back to those business units. "Success is very dependent upon providing the very visible support of senior management, an appreciation of the business role of risk management and the necessary skills and tools to implement it," he said. Gibson's department achieved this by having senior executives take part in workshops with up to 10 local managers, by providing comprehensive training and analysis tools, by delegating responsibility for the process to the business units and by helping them build risk management issues into their local operational plans. The work has paid a rich dividend in identifying strategic improvements with potential impact across the whole business. By recognising potential events such as the loss of key staff or of electronic and paper-based information, for instance, the department has identified significant risks to its knowledge base. That has let it formulate improved strategies in the human resources and IT areas to mitigate those risks.
Old Discipline, New Realities

Back in the days when the glasshouse reigned supreme, you couldn't add an application to the mainframe until you had filled in a host of forms designed to ensure the addition caused minimal disruption to the system. Procedures were highly prescribed and rigorously monitored. Too many managers have lost that old glasshouse discipline; but for those who are serious about BCP, it is high time to start re-acquiring it, said Fulcrum Group of Companies principal consultant Danny Davis, who insists BCP must be about clarity, segmentation and structure. "It's one of those areas where the IT people are being called on increasingly to be business analysts, because it is very much a business analyst focus from where you need to start," Davis said. "The CIO position is where the role should be . . . IT needs to be taking a leading role in defining the processes that run the business. Part of that is asking what happens when those processes go wrong."

Instead of declaring a goal of having disrupted processes back up within four hours, ask what the company would do if those processes were out for four hours. "What do you do if it is out for half an hour; what if it's out for four days? Because in 99.5 per cent of cases it might only be out for four hours -- but it may be out for four days; what are you going to do?" Davis asked. "What is the company going to do outside of just trying to get the service back up again? Put in call centres? Coordinate with people to get an emergency response? Coordinate with competitors? There are all sorts of things that an organisation may need to do.
"How will the business handle public relations? Now I don't think the IT people should be handling public relations; but they should be making sure these questions are being asked and documented, and that an appropriate public relations-type person is identified who needs to be consulted in a crisis," Davis said.

While plans must incorporate ways to localise problems when they occur in order to reduce their impact, it is equally important to reduce the incidence of problems in the first place, Davis said. "There are certain components that are really mission-critical. Take IP networking. There might be a small system somewhere that's serving your security and when that goes off the air nobody can log into anything any more," Davis said. "We have seen it happen, that somebody has got completely locked out of their security system and brought everything to its knees because of trouble with a tiny little machine they ignored. They hadn't done that vital step of structuring and isolating, identifying what has universal impact, identifying which components are mission-critical then reducing the [chance of] incidence on those."

The process must be iterative, involving a regular cycle of audits that check the plan still matches the needs of the business. And finally: standardise planning around a business framework and a service-levels framework, and make sure no one can add anything to the system without documenting the new facility in terms of that framework.

The need for iterative BCP is something Berents knows all about. Optus quickly found its business continuity planning effort needed to keep pace with constantly changing organisational and IT requirements. "The strategies put in place one month required urgent review after only another six," he said at Survive! "To counter this, we implemented a structured review of the Optus risks, undertaken by senior managers under the auspices of the Contingency Planning Committee. As part of this intense process we were able to discard some of the traditional business continuity planning concepts and fast-track the task to arrive at not only a new continuity plan but to entrench a structure for the ongoing review, reformulation and retesting of the plan."

Ultimately, the objective must be to balance the exposure to risk against the treatment of that risk. Every organisation should invest in risk control focused both on preparation for crisis management and on planning for the recovery of business operations. The caveat, Abrahams warns, is that external sources of supply frequently prove the weak point in the supply chain. "The infrastructure agencies are very important, as are the emergency services. One of the problems that an agency might have is that it focuses on local disasters like a building fire, but doesn't plan for widespread disasters that might occur where its demand for resources is in a hierarchy which might not suit. It might be that essential services get higher priority in terms of providing those utilities," Abrahams said. "You need to consider a wide range of potential scenarios covering the things they can handle internally as well as those where they will need assistance from outside, whether it be the emergency services or the utilities or any other suppliers."

The key, he said, was to invest heavily in upfront analysis: determine what the risks are, understand where they might come from, and then manage those risks as efficiently as possible. "And that means you don't try to achieve a zero risk environment; you try to knock off as much of the risk as you can with the resources available." Look at ways to reduce the risk: even simple measures such as smoke detectors, sprinkler systems and barriers that stop vehicles entering the building can help. EMA's guidelines are full of examples.
And exploit the efficiencies to be gained from bringing BCP under the same umbrella as other risk management activities such as security, fraud and occupational health and safety, Abrahams advised.

At Optus, for every key risk identified the telco appointed an owner responsible for identifying potential impacts, mitigating factors and first response strategies. "Checklists to assist in the response -- including supplies, consultant contacts and so on -- were also prepared. The objective was not to produce a footstool that nobody used but everybody had, but a user-friendly document that would provide real assistance if we had an incident. We supported this by establishing a first response capability by ensuring we had available, 24 hours a day, a trained emergency coordinator for every Optus facility," Berents said.
The Big Bang
But while such preparations might be adequate for normal disruptions, it would be a trap to believe they will suffice for Y2K, warns Howard Rubin, senior consultant with the Cutter Consortium's Business-IT Alignment Advisory Service and Y2000 Service. Writing in The Cutter Edge newsletter last year, Rubin pointed to significant differences between "normal" disruptions and the types and sources of disruption that may result from the Y2000 problem (see related story "Planning for the Worst", page 44). These include the possibility that failures, or disruptions, might happen simultaneously in a number of locations, elements or systems, and in functions, devices and systems that are interdependent. Multiple failures may occur in multiple locations, both within a company's normal geographic area of operation and outside that "footprint", while errors in one device, function or system may well trigger anomalous behaviour in otherwise remediated devices, functions or systems, all of which may mask the triggering event.
Twelve steps to recovery
1. Enlist the cooperation of upper management
2. Seek help from qualified experts
3. Conduct a business impact analysis to identify key business functions and IT resources
4. Assess the risk of particular disasters based on company profile and location
5. Devise a detailed, flexible plan that outlines staff responsibilities
6. Select a company- or vendor-based recovery option (that is, a redundant system, hot site, mobile data unit or quick shipment solution)
7. Cover all IT resources, including telecommunications networks and LANs
8. Select IT equipment vendors that can provide prompt service
9. Maintain updated vendor information
10. Test your plan at least once a year
11. Don't underestimate IT needs: maintain a strong technical support staff and plan to replace lost equipment with more powerful equipment
12. Structure the workload to address top priorities first

Source: Research Company of America