Tsunami, bombings, hurricanes - the past year's disasters have been a wake-up call for Australia's government risk management practitioners.
Although we're half a world away from dramas such as the Katrina devastation and the London bombings, business continuity planning (BCP) is no less a concern for government organizations in Australia. Here and elsewhere around the globe a lengthening string of disasters is fuelling a growing awareness that bad things can happen anywhere - and that those once-theoretical BCP exercises could well become lifesavers.
The challenges of ensuring continuity have been fodder for numerous conferences in Australia and overseas this year. One recurring theme of these events is that the recent spate of high-profile disasters has forced government and private-sector organizations alike to reconsider their BCP in terms of the "new normal" - a changed operational state in which the perceived likelihood that an emergency plan will be put into action has escalated dramatically.
At the NSW Treasury's Office of State Revenue (OSR), the need to revisit BCP has driven a change in philosophy, particularly when it comes to ensuring continuity of IT systems. "We've been moving away from just talking about focusing on IT solely for disaster recovery, and moving to an all-hazards approach that includes the business, resources and other aspects," says Jason Thewlis, business continuity manager with OSR. "There are a lot of risk management procedures in place already in most [organisations], but the challenge is in pulling them all together to have a unified front in risk management."
While risk managers know that the mark of good business continuity planning is that nothing happens, arguments for strict BCP can lose their strength in the fierce competition for budget funding. The September 11 terrorist attacks changed all that, but four years later many organizations still lack fundamentally sound business continuity practices.
John Worthington, Australian representative for the UK-based Business Continuity Institute, sees the disconnect between perception of risk and actual BCP activities all the time. "It's unfortunate that it takes an event of magnitude to reinforce the imperative for all organizations to embrace BCP as a standard management function," he says. "Too often it's a reactive process that comes after the event, but once you get a worst-case scenario it's quite frightening what the consequences are going to be. All too often, organizations take a limited view of this and forget that most of the things that could happen to them are within their control."
Because of their unique role in the economic and social landscape, government agencies face even more pressing business continuity requirements, due largely to their monopolies over delivery of their particular services. Fortunately for government organizations, recent disasters have lent political weight to efforts to improve business continuity planning inside and outside of the IT department. Most government bodies already have continuity plans in place, and current efforts are focused more on refining them than on building them up from scratch.
The NSW government, for one, has recently pressured its subsidiary organizations to ensure their continuity plans meet today's needs, while an ongoing campaign from the Australian National Audit Office (ANAO) has prompted reviews of key Commonwealth agencies. Centrelink recently completed a comprehensive ANAO audit, and the Department of Immigration, Multicultural and Indigenous Affairs (DIMIA) is next up to bat with an audit planned for the current financial year.
"To ensure the continuity of services to its clients," ANAO wrote in its audit proposal, "DIMIA must have BCM and/or associated risk management procedures and plans in place that minimize the likelihood of a significant business outage; and in the event of such an outage, minimize disruption of critical services to customers."
ANAO's focus on Centrelink and DIMIA - two of the larger departments in the Commonwealth portfolio - suggests an expectation that their audits will provide guidance for other agencies with less complicated BCP requirements. The audit will evaluate DIMIA's BCM capabilities including frameworks, approaches, strategies, plans, capabilities and recent performance in both BCM and related elements of risk management.
When developing its BCP strategies, DIMIA closely follows the guidelines of ANAO's own Business Continuity Management Better Practice guide and workbook, which play a role in the structure of a large four-year process review that DIMIA kicked off in the middle of this year.
Deputy CIO Michelle Foster is confident that DIMIA's history of attention to BCP will help the organization's processes hold up well to the ANAO audit. "This isn't suddenly a new thing just because of the last five years' worth of major events," Foster says.
"We have been, and are doing, major improvements to IT disaster recovery capabilities. It's definitely a core element of managing IT. It's a priority to keep the environment running, and on any project you've got to look at all levels - data, applications and then the process and people sides. They're the same things you have to look at in any project," she says.
Small Threats, Big Problems
The frequency of recent high-impact world events has increased many people's fears that an attack could threaten the nation's infrastructure. Media attention tends to amplify the perception of threat severity; such events are in fact rare in an operational environment where minor disruptions are an everyday occurrence. In most cases, such disruption comes from common problems - a tripped circuit, failed server, local power outage, accidentally cut fibre optic cable, or an outside denial-of-service attack - happening on a small enough scale that they slip under most people's radar.
As the blackout that happened in Los Angeles in September demonstrated (700,000 LA residents lost power after a worker mistakenly cut the wrong line), minor disasters can quickly grow into real show-stoppers. Yet studies show that many organizations aren't committed to planning even for these small events. Where budgets are tight, it's likely that many risk managers are tackling the large disasters at the expense of smaller, everyday threats to continuity.
This is the reason Thewlis, Foster and their peers at other government organizations tend to base their plans on generic scenarios - in which, say, an undefined event has limited or cut off access to an office building - rather than focus on specific disasters like earthquakes or terrorist attacks.
"History tells us that if we try to get too detailed in our scenario planning, we back ourselves into corners and don't have the flexibility to deal with the situation," Thewlis says. "There are a thousand things that can happen to an organisation that stop it from its goals, and we try to give a scenario that gives us options to plan. That scenario is that the building is inaccessible; anything less is a good thing."
The focus, then, is on recovery time objectives (RTOs), or the allowable time for each individual system or process to be down while the organization works around the loss of building access. Specific issues - such as the fact that one of OSR's seven NSW offices is located within a flood plain - can then be layered on top of those generic plans.
An overall understanding of the organization's operations is critical to prioritizing RTOs, says DIMIA's Foster. "You have a sliding scale," she says. "Some systems have to be non-stop under any circumstances, some have to be up x number of hours a day, and others might have to be up y number of days per month."
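The sliding scale Foster describes can be sketched in a few lines of Python. This is only an illustration, not a description of how DIMIA or OSR actually manage their RTOs: the system names, the RTO figures and the helper functions `recovery_order` and `breached` are all hypothetical, invented here to show how ranking systems by allowable downtime yields a recovery order.

```python
from dataclasses import dataclass


# Hypothetical model: every name and number below is invented for illustration.
@dataclass(frozen=True)
class System:
    name: str
    rto_hours: float  # allowable downtime in hours; 0 means "non-stop"


def recovery_order(systems):
    """Return the systems sorted so the tightest RTO is restored first."""
    return sorted(systems, key=lambda s: s.rto_hours)


def breached(systems, hours_down):
    """Return the names of systems whose RTO is exceeded after hours_down."""
    return [s.name for s in systems if hours_down > s.rto_hours]


SYSTEMS = [
    System("monthly reporting", 72.0),
    System("client payments", 0.0),
    System("staff email", 8.0),
]

if __name__ == "__main__":
    # Generic scenario: the building is inaccessible and everything is down.
    for s in recovery_order(SYSTEMS):
        print(f"{s.name}: RTO {s.rto_hours} h")
    print("Breached after 12 h:", breached(SYSTEMS, hours_down=12))
```

Because the scenario is generic, the same ranking applies whatever caused the outage; site-specific issues, such as an office on a flood plain, would be layered on as adjustments to individual RTOs rather than as separate plans.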
Bridging the Divides
Although such abstractions have remained relevant for decades, it's important to pair high-level BCP efforts with some very concrete plans for IT service continuity - providing an additional layer of protection above the technology-enabled disaster recovery planning that's commonly run as part of large IT projects.
Such disaster recovery often relies on purpose-built technologies: for example, a new storage array might have internal redundancy built into it and be linked by redundant fibre-optic cable to a second array in another suburb. Full business continuity, however, depends not on the technology but on the way that other related IT systems, such as desktop computers, accommodate the swap-over. There are also other, broader issues such as the movement of staff between offices, and backup plans should the secondary site also be affected by a spreading disaster such as flood or chemical contamination.
Growing functional interdependence between government organizations is another area that demands careful attention, since data and process sharing can be seriously compromised by a catastrophic event. Similarly, local government organizations would seem to be most seriously behind the eight ball, since many lack the structured BCP resources and budgets necessary to provide risk management that's on par with their state and Commonwealth cousins.
Other IT activities that are also having an impact on BCP efforts include the trends towards consolidation and outsourcing of key business processes. In the first case, consolidation of IT resources reduces geographical redundancy and creates more tightly focused resource pools that must be protected even more carefully than they were previously.
In the second case, organizations working with outsourcers must make sure their partners and suppliers have BCP policies that are in line with their own. For this reason, the Critical Infrastructure Protection (CIP) division of the Commonwealth Attorney-General's Department has been working to facilitate communications between private-sector and government organizations, and the infrastructure providers upon which they rely. This includes mainstream IT platforms, data security, and even automated SCADA (Supervisory Control And Data Acquisition) systems that remotely monitor valves, pipes and other process-related equipment.
"We see [SCADA] as being a major vulnerability, and we also bring IT security issues into the mainstream," says CIP assistant secretary Mike Rothery. "Most organizations have a connection into a service but have no visibility into what goes on behind that, and don't realize how vulnerable their operations are to other service providers behind the scenes. We're trying to give people a better understanding of the vulnerabilities in those services that they are dependent on, because they inherit that risk. Where we can't guarantee that infrastructure, we're talking about risk management and not risk avoidance."
Around 90 percent of CIP's operations involve dealings with private-sector companies, since they are the ones that manage the majority of critical infrastructure in Australia. Recognizing that many organizations baulk at the cost of formal audits, ten months ago CIP set up a grants program under which it subsidizes half of the cost of a risk audit for interested organizations. The goal: to encourage proactivity on the part of organizations, so that potential risks are identified and remediated before they become real problems.
Although such issues are endemic to the relationships with outside providers, they have also come to apply internally as IT organizations increasingly manage relationships in the context of formal service provision agreements. In many departments, however, a historical lack of clear communication has created a situation where IT groups are wearing much of the risk for business continuity, but can struggle to get the necessary support and contextual understanding from agency officials and department heads.
It's not enough to blame management for the lack of communication, however. "Probably the hardest thing to change over recent years has been IT's view of business continuity," says NSW Treasury's Thewlis.
"IT people are the hardest to change, and it takes work to get them away from the idea that we just have to look after the systems. We actually want to look after the resources that look after that system, to make sure they're available as well, and not just the particular system."
The Same Old Normal?
Is it premature to refer to today's situation as a new normal?
Isn't all this concern over disaster recovery planning just more of the same thing that government organizations have been dealing with for decades?
It may be, but anything that increases the profile of BCP can't be a bad thing, argues Carl Gibson, who until recently served as chief risk officer with WorkCover Victoria and is currently in the process of establishing a new Risk Management Unit at La Trobe University.
"BCP has been around for a number of years, and some organizations did a very good job of it," says Gibson. "Post-September 11, however, has really focused attention that there can be a massive catastrophe - not necessarily even involving your own organization, but somewhere else in the value chain or the community - that can have profound effects on your own resilience. There are significant risks within the very structures we've set up, but we're not necessarily focused on them because we believe somebody else is looking after them."
In a similar vein, there is a broad discrepancy between organizations as to who is actually responsible for BCP efforts. A recent survey by The Economist Intelligence Unit found that CFOs (19%) and CIOs (14%) were most commonly responsible for BCP, with the business continuity manager (9%), the chief risk officer (6%), chief strategy officer (5%) and internal auditors less frequently putting in an appearance. Fully 19% of respondents didn't even know who was responsible for BCP - a telling and frightening statistic given the inarguable importance of BCP for organizations operating within the new normal.
In some cases, the problems that BCP addresses exist outside the sphere of IT or the business. People issues, for example, can be a major problem for business continuity: if a natural disaster affects business continuity, it may be impractical for people to even get to their work - a problem that has been particularly pointed in New Orleans, where most people are more concerned with finding loved ones and staying alive than implementing their company's business continuity plans. And without people, even the best BCP process will fall flat on its face.
In the end, of course, risk is all about what we can accept: "It's unrealistic to be 100 percent bulletproof," says BCI's Worthington. "Budget limitations are probably going to stop that. You've got to accept the fact that it's not going to be perfect, but it will get you by."