The political uproar surrounding the revamped Job Network can’t eclipse the Herculean effort expended to get one of government’s most complicated information systems up and running. We talked with DEWR to find out what went wrong with application EA3000 - and how the department made it right.
Australia’s media wasted no time jumping on anecdotal reports about the performance of the federal Department of Employment and Workplace Relations’ (DEWR’s) new EA3000 application which, shortly after its July 1 launch, made some surprising occupational suggestions for a number of unemployed people.
There was, of course, the Tasmanian man the system suggested could work as a call girl. Then there was the 62-year-old clothing cutter offered a job as a junior assembler, and the 56-year-old Ballarat woman - struggling with arthritis, high blood pressure and shortness of breath - who the system suggested would make a great combat officer.
It may have made for good “current-affairs” fodder, but the early problems with the system were in fact due to the kind of data quality issues that plague most large systems implementations. Combined with reports of technological problems and the usual political grandstanding, the climate for the introduction of the third phase of the Job Network program - also known as ESC3 (Employment Services Contract 3) - was anything but welcoming.
The situation became even worse in August, when Minister for Employment Services Mal Brough was forced to defend an alleged $2.1 billion government bailout for the 109 Job Network subcontractors that signed onto the outcomes-based program expecting more job seekers to front up for interviews than ever did. Labor has used the dramas around the project to vilify the Howard government’s proclivity towards privatisation, while the Liberals continue to argue that the current system has been a major improvement on its predecessor.
Far away from the political limelight, the fact remains that ESC3 is a major shift from previous unemployment services models. It is a technology-enabled initiative with far-reaching implications for service delivery, and the story of its genesis reflects both the environment it was created in and the inherent complexity of such a far-reaching information system.
Building a New Job Network
ESC3 is only the latest phase of Job Network, which was conceived in the Howard government’s early years as a way to improve the effectiveness of the government-run Commonwealth Employment Service (CES) in helping the country’s nearly 750,000 unemployed.
The CES was terminated in 1998, with responsibility passed on to a network of Job Network subcontractors. Anticipating that the shift would require more complex data management, DEWR had in 1996 upgraded the CES’s 1988-vintage Fujitsu mainframe to a more open Hitachi MVS-compatible mainframe system that used DB2 and CICS to track the system’s interactions with workers. Reflecting the competitive nature of the Job Network scheme, the database was partitioned so each member could “own” its own customers.
Aiming to make the shift as painless as possible, DEWR initially gave in to cries of protest from contracted providers who were not keen on having their IT strategy dictated to them. The department built an EDI-based (electronic data interchange) system that would allow providers to keep their current employment systems, yet still communicate with the Job Network mainframe.
Taking stock of lessons learned over the original 18-month ESC1 program, the more ambitious ESC2 began in 2000, with 194 contracted service providers awarded three-year contracts worth around $3 billion.
Supporting them was EA2000, a Visual Basic application designed to improve access to the DEWR mainframe. Yet EA2000 faced its own problems: as a classic client/server application, it proved difficult for many employment agencies to install and maintain. The EDI-based approach was also becoming “a huge legacy and an impediment”, according to Anthony Parsons, general manager of employment systems with DEWR.
Facing the impending expiry of ESC2 contracts, in 2002 DEWR embarked on what was to become a complete reworking of the Job Network technology model. The move was seen both as an opportunity for the network to upgrade its technology and add new features, and as a way of breaking the department’s links with an ever more burdensome past.
Job Network had been reviewed extensively during ESC1 and ESC2 - including major reviews by the OECD and the Productivity Commission - and ESC3 reflected major policy changes that drew on many of the findings of these reviews. “We used this opportunity to significantly adjust policy settings,” says Parsons. “It was time to reflect on the evolution of the labour market and how well some of those policy settings had worked, and to decide which changes to put on the table for ESC3.”
Several key changes soon emerged as the driving goals of the new program.
First, the new system would only give job seekers access to ongoing service from one Job Network provider of their choice for as long as they were unemployed. This approach was intended to eliminate previously fragmented arrangements that had seen job seekers having to register with multiple providers to maximise their chances of finding work. Multiple registrations had made it impossible for Job Network to keep a continuous record of interactions with each person, and individual providers’ knowledge of each person was limited to their own experiences.
A second problem was in keeping job seekers actively engaged with their Job Network provider. Historical data had shown that Job Network members’ efforts to find work were intense at the beginning of their interaction, tapered off to a low plateau for much of the relationship, then rose sharply in the lead-up to expiration of the provider’s window of opportunity.
To keep efforts at a more consistent level throughout, EA3000 was designed not only to record the job search but to proactively assist it by matching job seekers’ skills with available positions. Each job seeker would build an online resume with up to five areas of specialty, then let the system regularly search available positions on their behalf. New communication channels - including SMS messaging, an IVR (interactive voice response) phone system and interactive touch-screen kiosks - were introduced to give job seekers more ways to find out about job matches.
Finally, EA3000 was enlisted to help speed the process of linking people with their Job Network provider, which could easily take four to six weeks under the old system. Improving this centred around the creation of an online diary application that would be used by Centrelink to instantly query each Job Network provider’s schedule and book available slots - potentially as early as the same day - on the spot for the job seeker.
Managing the Complexity
Such were the terms of reference for EA3000. Faced with pressure to deliver a working system by the deadline of July 1 this year, Parsons’s team was charged with delivering what from the beginning was going to be an ambitious and massive project.
Just how massive? At its completion, the system was measured as having approximately 9970 function points - a standard method for measuring applications in terms of the number of functions they have, such as printing a document or displaying a screen. Tenderers compare bids by measuring the cost per function point ($1000 per function point is a rule of thumb). Most moderate-sized applications might have around 500 function points, and anything over 1500 is generally considered high risk. EA3000 was, it turns out, more than six times that limit. And while the number of function points was not immediately quantifiable, the system’s complexity was clear as soon as political leaders described their vision for the system.
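As a rough illustration of the arithmetic above, the following sketch compares EA3000’s measured size with the 1500-point risk threshold and applies the $1000-per-point rule of thumb. The variable names are illustrative only, and the dollar figure is simply what the rule of thumb implies, not a reported project cost.

```python
# Figures quoted in the article.
EA3000_FUNCTION_POINTS = 9970
HIGH_RISK_THRESHOLD = 1500        # "anything over 1500 is generally considered high risk"
RULE_OF_THUMB_COST_PER_FP = 1000  # dollars per function point

# How many times larger than the high-risk threshold the system is.
risk_ratio = EA3000_FUNCTION_POINTS / HIGH_RISK_THRESHOLD

# What the rule of thumb would imply (an illustration, not a reported budget).
implied_cost = EA3000_FUNCTION_POINTS * RULE_OF_THUMB_COST_PER_FP

print(f"{risk_ratio:.2f} times the high-risk threshold")
print(f"${implied_cost:,} at the rule-of-thumb rate")
```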
“When you’re in a front-line policy department like this, you have all sorts of pressures to respond to political change quickly,” Parsons says. “When they described the magnitude of change, we knew that it couldn’t be achieved within 12 months using our traditional systems development approach. We jettisoned as much optional functionality as we could, and went searching for an iterative approach that would let us achieve as much as possible.”
In an ideal world, that approach would have been the Rational Unified Process (now owned by IBM), but its high cost led DEWR to consider alternatives. The department settled on Object Consulting’s Process MeNtOR methodology, a comprehensive change management process that was far more cost-effective than the Rational approach.
As dozens of DEWR analysts began planning the development project, Parsons’s earlier feeling that traditional ways of doing things were not going to cut it was confirmed. Process MeNtOR is based on a train metaphor, with teams seen as a collection of cars whose members get on and off at stations along the way. The methodology recommends a project never have more than seven “trains” at a time, but in EA3000’s early days DEWR had more than 30.
Adopting Process MeNtOR helped Parsons’s team - which rapidly grew to include more than 30 business analysts, 30 testers, 60 programmers and around 80 training, support and other staff - to plan an aggressive yet effective course of development for the project.
The latter half of 2002 was a flurry of planning, development, testing and redevelopment as EA3000 took shape. Yet one very important thing was missing: probity requirements meant DEWR could not communicate with companies bidding to become Job Network providers until after the tenders had closed. This meant it was difficult to ascertain, early on, exactly what they would expect from the system.
“During the early days of scoping and design, I was forbidden from talking with any tender respondent in order to keep the tender process absolutely beyond dispute,” Parsons recalls. “We had to second-guess some of the end-user requirements, saying ‘we think they’d like this’ but couldn’t talk with them to justify our assumptions. When the dust cleared, we had covered around 60 to 70 per cent of their wish list.”
Towards a Better Interface
While designing EA3000, DEWR faced a very common problem: how to build a modern application with a modern user interface, but manage data using an antiquated back-end legacy system such as DEWR’s 1996-era mainframe.
In the past, direct mainframe access - bolstered in EA2000 by a rudimentary interface - had been a simple compromise. But as available technologies improved, it became clear that EA3000 was going to need strong back-end application integration and a flexible presentation layer capable of delivering messages via SMS, e-mail, IVR, a Web browser, in-office kiosks or other interfaces.
Simply building on the previous system, it was clear, was no longer going to cut it. DEWR also made the strategic decision to discontinue the EDI system, mandating that ESC3 members update their IT to reflect current technological realities. “One of our challenges was to take the substantial, legacy mainframe that powered ESC2 and make it such that it could deliver messages through various channels,” Parsons says.
“The old mainframe tool was clunky to navigate and not at all pre-emptive, and users had to know the right steps to complete an operation. We wanted EA3000 to turn that around to follow the logical sequence of business rules, so that we could provide a best-of-breed tool that was really attuned to the workflow. The policy settings were so radically different from ESC2 that there was no easy evolution from the old IT systems that may have been built.”
With EA2000, the department had experimented with bog-standard HTML in order to provide an interface that would run on as many desktops as possible. However, DEWR soon realised that it could build a much richer, more usable application interface if it built EA3000 around Windows and associated technologies. Most specifically, this meant using Microsoft’s .NET Framework and WinForms components, which enable direct application calls to the Windows environment to provide features that are impossible using generic Web technologies.
To make the .NET approach work, providers had to be running Windows XP - a requirement that DEWR wrote into the initial Job Network contracts. Yet when subcontractors asked for configuration specifics, the department deferred to Microsoft’s Web site, which is known for providing optimistically low minimum configuration suggestions.
Down the track, DEWR traced some employment providers’ problems to this miscommunication: one company was using Pentium II-based systems with just 64MB of memory. They’d managed to get Windows XP running, but the unsurprisingly poor performance was being attributed to EA3000 and not to their outdated technology. Such issues demonstrate the inherent complexity in ensuring that 109 different companies were reading from the same page as DEWR.
Troubleshooting = Political Damage
After a fiercely busy development cycle, EA3000 went live on July 1. The department was on the back foot soon afterwards as minister Brough suffered a roasting from political opponents - first over the highly lampooned anomalous job matches, and second over the more political issue of the number of unemployed people actually showing up at interviews to participate in ESC3.
As is common in large IT projects, the matching problem came from a relatively straightforward data quality issue. DEWR, keen to expand the number of positions available to Job Network participants, had sourced data from newspapers and other job outlets. Those positions were labelled using different headings from those used by DEWR, meaning that EA3000 labelled anything that did not match as ‘Other’ - a category that threw up potential matches for every Job Network participant.
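The failure mode described above can be sketched in a few lines. This is a hypothetical illustration, not DEWR’s matching logic: the heading list and the function names `normalise`, `naive_match` and `safer_match` are all assumptions made for the example. It shows how a catch-all ‘Other’ category that matches everyone produces absurd suggestions, and how refusing to match unmapped headings avoids them.

```python
# Hypothetical set of occupation headings the matcher understands.
KNOWN_HEADINGS = {"clothing cutter", "assembler", "call centre operator"}


def normalise(heading):
    """Map an externally sourced heading to a known one, or to 'other'."""
    cleaned = heading.strip().lower()
    return cleaned if cleaned in KNOWN_HEADINGS else "other"


def naive_match(seeker_skills, vacancy_heading):
    """Catch-all behaviour: an 'other' vacancy matches every job seeker."""
    category = normalise(vacancy_heading)
    return category == "other" or category in seeker_skills


def safer_match(seeker_skills, vacancy_heading):
    """Unmapped vacancies are held back until their heading is mapped."""
    category = normalise(vacancy_heading)
    if category == "other":
        return False
    return category in seeker_skills


skills = {"clothing cutter"}
print(naive_match(skills, "Combat Officer"))   # unmapped heading still matches
print(safer_match(skills, "Combat Officer"))   # unmapped heading is rejected
print(safer_match(skills, "Clothing Cutter"))  # genuine match still works
```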
If data quality issues tainted public perception of EA3000, other issues were causing dissension among the ranks of the Job Network members. Soon after the system’s launch, some providers began reporting slow performance when running queries against the system’s 100GB DB2 database.
One member surveyed its staff and found a range of anecdotal claims reporting transaction latencies of 10 to 30 seconds or more - particularly from offices in regional areas. Others alleged it had taken up to 80 minutes to log onto EA3000, which was processing more than three million transactions a day shortly after its debut.
Investigating the claims, DEWR discovered strangely inconsistent behaviour: the system would run beautifully on Friday, says Parsons, but by Tuesday it might be struggling to keep up when under peak load. Additional servers were added and DEWR doubled bandwidth - from 40Mbps to 80Mbps, split between Telstra and Optus connections - running into its data centre.
Yet even with these improvements, performance continued to suffer. The cause ultimately uncovered: one of the telecommunications lines had gone down, which meant that one out of every three transactions sent to the system was simply being lost. Since the backup was still operational, however, service was not cut altogether, so the problem had not been noticed.
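One way to picture this kind of silent failure is the sketch below. It rests on assumptions the article does not state: that traffic was spread round-robin across three equal paths, and that nothing checked whether a path was up. The link names and the `dispatch` function are hypothetical. With one of three paths down and no health check, a third of transactions vanish while the service still appears to be running.

```python
from itertools import cycle


def dispatch(transactions, links, health_check=False):
    """Send transactions round-robin over links; return (delivered, lost).

    links maps a link name to True (up) or False (down).
    """
    names = list(links)
    if health_check:
        # Only route over links known to be up.
        names = [name for name in names if links[name]]
    if not names:
        return 0, transactions

    rotation = cycle(names)
    delivered = lost = 0
    for _ in range(transactions):
        if links[next(rotation)]:
            delivered += 1
        else:
            lost += 1
    return delivered, lost


# Hypothetical paths: two up, one down.
paths = {"path_a": True, "path_b": True, "path_c": False}
print(dispatch(300, paths))                     # a third of the traffic is lost
print(dispatch(300, paths, health_check=True))  # nothing is lost
```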
Although fixing the problem improved EA3000’s performance somewhat, strange delays at peak periods were still plaguing the system. In a process that Parsons likens to peeling an onion, the team went through weeks of debugging and testing. This was particularly difficult on the new Windows platform, since DEWR had been used to the comprehensive system monitoring built into their mainframe environment for years.
The lack of equivalent granularity in Microsoft’s systems monitoring technology made it hard to get information that was detailed enough to spot the problem quickly (DEWR later learned that Microsoft application developers must ‘instrument’ their code to provide detailed insights into transaction processing).
Eventually, the team peeled away the last layer of the onion to reveal the show-stopper: the mainframe ran HTTP 1.0, while the Microsoft Web server preferred HTTP 1.1. Slight differences in the way multiple connections were configured meant that increasing the number of simultaneous connections on the Windows servers was having no effect on the number of connections the mainframe supported.
“We kept on ratcheting up the number of connections on the Microsoft side, but the mainframe wasn’t aware of that,” Parsons says. “Every time we’d push it up, we had to wait until the next week’s peak times to realise response times were still going up.”
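The bottleneck Parsons describes can be modelled very simply: when two tiers each cap the number of simultaneous connections, the lower cap wins. The sketch below is an illustration only; the function names and every number in it are hypothetical, not DEWR’s actual configuration.

```python
def effective_connections(front_end_pool, back_end_limit):
    """Usable concurrency is capped by whichever tier allows fewer connections."""
    return min(front_end_pool, back_end_limit)


def queued_requests(concurrent_requests, front_end_pool, back_end_limit):
    """Requests left waiting once every usable connection is busy."""
    usable = effective_connections(front_end_pool, back_end_limit)
    return max(0, concurrent_requests - usable)


# Hypothetical figures: the back end stays at 40 connections while the
# front-end pool is raised week after week.
for pool in (50, 100, 200):
    print(pool, effective_connections(pool, 40), queued_requests(150, pool, 40))
```

Raising only the front-end figure leaves both the usable connections and the queue unchanged, which is why each adjustment appeared to do nothing until the next peak.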
The team methodically worked through the environment’s server configuration and found out how to work around the problem, and performance immediately picked up. Resolving this issue helped the EA3000 team reclaim control over its system, improving performance problems that deputy opposition leader Jenny Macklin had proclaimed in July would be the death of the Job Network scheme.
Playing the Features Game
Months into its existence, resolution of the new Job Network’s IT teething problems has shifted the focus of ministerial discussions away from the perceived shortcomings of the IT system, and more onto the issue of ESC3 participant numbers.
These days, Parsons says, EA3000 is regularly delivering half-second response times for as many as 3.8 million Job Network member transactions in a single day. During its nightly job matching run, EA3000 sends out more than 65,000 potential job matches through 15,000 SMSes, 10,000 e-mails, and 3000 IVR enquiries. The Australian Job Search Web site (www.jobsearch.gov.au) is serving up roughly half a million pages per day.
Internally, EA3000 is proving its worth in other ways. For example, OLAP (online analytical processing) capabilities allow department managers and Job Network members to analyse data in real time rather than having to pore through regular paper reports as in the past. They can also sign up for e-mail alerts that are triggered when activity patterns trend past defined thresholds - for example, if one particular provider sees a 10 per cent drop in new referrals over a period of time.
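A threshold alert of the kind described above might look like the following sketch. The function names and the sample referral counts are assumptions for illustration; the article does not describe how EA3000 actually computes or delivers its alerts.

```python
def referral_drop(previous, current):
    """Fractional drop in referrals between two periods (0.0 if no baseline)."""
    if previous <= 0:
        return 0.0
    return (previous - current) / previous


def should_alert(previous, current, threshold=0.10):
    """True when referrals have fallen by at least the threshold (10% by default)."""
    return referral_drop(previous, current) >= threshold


# Hypothetical referral counts for one provider over two periods.
print(should_alert(200, 170))  # 15 per cent drop
print(should_alert(200, 190))  # 5 per cent drop
```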
Confident that the system’s teething problems are behind it, Parsons says the development team has been focusing on expanding the system through feedback from users and the addition of many features that were taken out of the system’s original design.
The September update, for example, incorporated an enhanced resume editor and XML compression technology that Parsons says halved EA3000 message sizes during testing. The next release, planned for December 6, will let EA3000 users tweak job weightings - for example, to suit seasonal occupations - and tailor the logic used to do job matching. In the future, the development team will also revisit features such as the collaborative notice board, something “we thought was an insignificant add-on to the diary” that has become unexpectedly popular.
Another planned feature is better support for offline data, something requested by Job Network members who travel into the field.
As the EA3000 project settles into technical maintenance mode, Parsons says some things could have been done differently. One of the biggest problems was the technical team’s lack of access to bidding Job Network members during critical design stages. Another, in a similar vein, was the challenge of getting DEWR consultants to block off time in which to sit down and work through emerging issues with the design team.
“If we could have put together a working prototype, and simulated some real-life experiences, we might have gained early insights into how to best build the system to suit end-user behaviour,” he says. “I needed their expertise here to do justice to the agile development approach, since they’re the ones that understand policy implications. In the last three months of the project, [increasing involvement from users meant] the productivity rate climbed markedly.”
Clear requirements planning, effective project management, user involvement in design and testing - the things typical of most large corporate and government systems deployments - were no less important in the development of ESC3. Even as the dust settles on the scheme’s complex introduction, EA3000 is humming away in the background, testament to the importance of careful planning, testing and buy-in from across the board.