Sometimes there's nothing like a short sharp shock to alter your perspective.
When a burst water main took wholesale merchant banking company Dresdner Australia's computers down for a day three years ago, it proved a valuable object lesson in the import of a business recovery strategy.
More by luck than good planning, Dresdner found itself better positioned than most to weather such a storm, even with only "limited" disaster recovery in place. As a result of that good fortune, it ultimately got through the incident with minimal detriment to either its clients or the bottom line. Since then it has developed a comprehensive business recovery plan consistent with its business strategy and customer service philosophy and built on the lessons learned from the entire experience.
But that plan contains an interesting paradox. On the one hand, Dresdner sees the route to its long-term salvation as a move away from the usual heavy reliance on technology. On the other, it is relying heavily on a technological solution for data recovery which it is confident will serve its needs well into the future. Taken together, the two prongs of its disaster planning fork make perfect sense.
Dresdner Australia is a subsidiary of the German Dresdner Bank Group.
Specialising in part in securities, money market, foreign exchange and precious metal trading, Dresdner has operated in Australia since February 1990, when it purchased the treasury operations of the then Elders Finance company.
As a wholesale merchant bank, Dresdner trades large amounts of money across a spectrum of different markets, all with the potential to be volatile and all having inherent exchange rate risks. Those risks are managed by the computer systems, making high availability extremely important to the organisation. The bank currently employs 90 staff, including eight IT staff in Sydney and Melbourne.
According to Treasury Operations director Les Andrews, the small size of its staff, and their high degree of loyalty, proved its salvation when a burst water main flooded the Sydney office, robbing the bank of all power for 24 hours.
"We were lucky because we have such a small staff turnover, and we have people working with us who have been working with us for years. We wouldn't have been able to get through if the people who worked for me hadn't worked for me for 10 years and knew how to do things manually," Andrews says. "They balanced positions manually, they processed manually and that's the only thing that got us through."Meanwhile, without power the simplest things caused problems. Like the fact that most calculators in use were solar powered. Combine that with tinted windows and no lights, and it meant that staff had to crowd around the windows to use their calculators. It was a valuable lesson in the need to reduce reliance on technology of all sorts.
"It becomes a bit impossible to do a fully comprehensive, fault proof disaster recovery plan and that's why you need people with manual skills," says Andrews.
That experience gave rise to one of the main components of its business recovery plan: the recognition that technology, while fundamental to a business whose environment is so dynamic that people need computers to achieve business objectives, is ultimately just a tool for users. As far as Dresdner is concerned, the business should never be constrained by or fully reliant on computers.
"What we're looking at really is the fact that you can have disaster recovery, but you also need to maintain manual skills," Andrews says.
Or as Information Services associate director Greg Gilmour puts it: "Over the years one of the roles of IT has been to automate menial and error-prone processes to such an extent that we're now forgetting how the manual processes work that actually went into creating this system. You can't rely totally on computers. I guess we are looking to stay in touch with our roots."
At the heart of the disaster recovery plan is the recognition that there are four potential types of disasters the company needs to be shielded from: threats to the building or building access, threats to communications, threats to systems and threats to or from staff. Dresdner has invested considerable effort into working out what amount of down time would be acceptable in the event of a disaster, and in investigating the full gamut of statutory and industry requirements for business recovery.
It has identified the information and materials which are critical in order for its staff to maintain minimal operations and has set up alternative hot sites.
The merchant bank has also worked out a chain of command to be deployed in the event of a disaster, and has invested considerable effort in disaster recovery plan testing and revision.
Other components of that plan include a recognition that the merchant bank's core requirement is to reduce market risk for customers; that it must continually aim to provide "value add" to deals - the "service behind the rate"; that best practice involves provision of dynamic market information to clients on demand; and that the greater affordability of disaster recovery solutions now makes it unacceptable not to have one in place.
The key plank of that disaster recovery solution is where the technology comes in. Dresdner has become the first Australian company to use Object Mirroring System/400 (OMS/400) from IBM business partner Vision Solutions, supplied in Australia by AS/400 Value Added Remarketer Primur Systems. A mirroring solution specifically for IBM's AS/400, OMS/400 is built on top of the OS/400 operating system, provides a seamless interface to the AS/400 and uses built-in AS/400 capabilities.
Dresdner inherited its IT infrastructure from Elders Finance, and remains with AS/400 because the platform supports the MIDAS application suite from Kapiti, a de facto standard in banking circles; because of its inherent reliability and security; and because its ease of use and built-in systems management make life simpler for an MIS group which had no experience with the AS/400 prior to the Elders buy-out. It also runs six IBM PC Servers and 150 IBM PCs running general desktop applications.
OMS/400 provides an automated means of maintaining duplicate databases across two or more AS/400 processors. This is accomplished via communication links, sophisticated communications programs and the IBM Journalling Facility.
The product supports both centralised and decentralised networks. Because data is mirrored on a real-time basis, remote locations are instantly updated with all data changes in the network. In addition, central locations are online for real-time data transmissions from remote locations.
Dresdner also uses Object Distribution System/400 (ODS/400), which provides for automated distribution of application software, authority changes, folders/documents, user profile changes, system values, subsystem descriptions, job descriptions, logical files, OutQ descriptions and JobQ descriptions throughout an AS/400 network using an event-driven process.
It currently operates on an AS/400 Model 310 at its main production site in Sydney, with another Model 300 in Melbourne used for both production and as a backup environment.
According to Gilmour, the merchant bank sees a comprehensive disaster recovery strategy as fundamental to running a professional, responsible organisation.
"From a business perspective it should be part of the daily overheads of doing business and we do take it seriously," he says.
Gilmour says the bank had a policy of buying product off the shelf wherever possible, and so did not consider developing its own mirroring software.
"There is another product in the market, but I was fortunate enough to meet someone who had tried to install it," he says. "It had got a bit of bad press, and I didn't really like the way it would have worked in the environment we have here."Dresdner received the OMS/400 software in January and had it up and running by July. Both ODS/400 and OMS/ 400 offered easy integration with existing applications, making the disaster recovery solution simple to implement and manage. Gilmour says difficulties during implementation were largely to do with the bank's environment, with technical problems to do with the application software drawing out the process a bit longer than anticipated.
Mirroring runs as a series of batch processes in a sub-system, just as with any other jobs on an AS400. All data and object changes are mirrored in real time.
"OMS captures all the transactions before they get written to the database, via journalling into the journal receivers, and the journals are sent to the target machine as they are being applied by the application software or the operating system on the source machine," says Gilmour.
Gilmour has thoroughly tested the solution since it was implemented.
"As far as testing goes, I synchronised the two machines - basically I cloned the target machine from the source machine - then turned on the mirroring software. We then ran for nearly four weeks over an end of month.
"After that I came to work on a Friday morning and turned off the source machine. We disconnected the port controller box, which has all the hardware and all the devices connected to it, and plugged it into the target machine, then changed all the PC's connections across to the target machine. After that we ran an input cycle stage - which is just our normal daily processing stage, with all its complexities and interfaces and the like - then we ran an end of day, which is the process by which payments are generated, reports are generated, mature deals are netted out of the system and so on.
"After that we had a small team come in on the Saturday morning to review the results. And at no point from the start of that process to the end was there a single hiccup in terms of the data, although there were one or two minor environmental issues. And I then took the data off that target machine, put it back onto the source machine, and on Monday morning when staff came in we were working back on the main machine."Gilmour says the whole procedure went smoothly and there has been absolutely no fallout in the months since that initial testing was done. Indeed, part of the merchant bank's confidence in the reliability of the disk mirroring technology lies in the fact that it sees the entire, regular routine disk mirroring process as basically a "24 hour a day test".
According to Andrews, it is all very well for people to insist that a disaster recovery plan should be tested once or twice a year. In practice, such testing is by no means as easy as it sounds.
"We've all faced this before, where you try to test something, you walk over to another building with floppy disks or some tape and you try to jump start another machine. You find it's configured differently, or the person who set the machine up is on annual leave in Mexico and you can't get hold of him in the Mayan Indian temples he's visiting and so someone has to second-guess the way it's done.
"Even when you see people trying to restore information on machines with software they know everything about, they have difficulties, and I can just imagine a disaster recovery site where you have to pull your tapes from storage, go to another site, start the machine up . . . it's just a recipe for disaster," Andrews says. "If you've got disk mirroring, the thing is running constantly, so basically you've got a 24 hour test."So is the merchant bank confident OMS disk mirroring will be a reliable and viable solution into the future?"It's an elegant, simple system built on top of an operating system, it utilises the operating system's functions, and on that basis even to an outsider it must be an attractive option," says Gilmour.
"I'm confident in the solution partly because it is IBM. This gives me the confidence as far as the infrastructure itself goes, so I know at that sort of foundation level it's pretty solid. As far as the technology is concerned, well, it works, and I know it works, but how far do you take these things?"