Shooting Star

The Australian Institute of Health and Welfare is one of the first government agencies in the country to get serious about metadata.

It might not be one of the busiest Web addresses in the world - its weekly traffic is measured in double-digits - but the Australian Institute of Health and Welfare's (AIHW's) metadata portal, as well as the entire system that underlies it, is about to complete a total revamp that will cement its position as a world leader in assisting the organized dissemination of vital health and welfare data.

The revamp itself will also not rattle the world's financial markets. At a million dollars all up, it is small beer compared with the usual round of projects costing tens and hundreds of millions of dollars; but in terms of value, it is up there with the best of them.

The AIHW is an independent Australian statutory authority established in 1987. Based in Canberra, it is part of the federal government's Health and Ageing portfolio. It collates and publishes national health, community services and housing assistance statistics and related information, including biennial reports to Parliament on the nation's health and welfare services, and more than 120 reports each year.

This information is used by policy makers at all levels Australia-wide, as well as academics, students and the general public, and is shared with overseas organizations such as the World Health Organization and the OECD.

Most of the data is numeric codes and classifications that anonymously describe individuals, their characteristics, circumstances, institutions, services and so on.

And it is big. The hospital data alone increases by approximately seven million records per year.

Needless to say, such a massive data bank requires serious organization. As David Braddock, head of the AIHW's Metadata Management Unit (MMU), admits, "the data is absolutely useless unless you can standardize the way it is collected and reported". If the data is not collected according to standard definitions and formats (that is, the metadata), the resulting inconsistencies make any subsequent analysis less meaningful; it becomes like comparing apples with oranges, as Braddock puts it.

Which is where the MMU and METeOR come in.

METeOR is the repository of the metadata, the "data on data" definitions and standards that provide the underlying structure to support the collection, exchange, reporting and use of data within a defined context. Metadata needs to accompany data being transmitted, otherwise the data cannot be understood.

The metadata registry is a system or application where these structures and definitions are stored and managed, and through which they are made available to users.

Everything from field descriptors such as "name", "age" and "height" to a plethora of medical conditions and personal circumstances - thousands of items in total - makes up the metadata registry. It is a constantly evolving collection that expands and contracts (but more commonly the former) as the data itself and the uses it is put to change over time.

The AIHW metadata itself was organized in a system called the Knowledgebase, an Oracle database using Oracle's Webserver technology (PL/SQL). Launched in 1997, the Knowledgebase was based on an international standard for data element definition (ISO/IEC 11179 (1994) - Information technology - Specification and standardization of data elements).

The metadata that populates the Knowledgebase is approved by the national information management group for each sector it serves: the National Health Information Group, the National Community Services Information Management Group and the National Housing Data Agreement Management Group.

At the time, the Knowledgebase was one of the first examples in the world of such an online implementation of the 11179 standard, and certainly a world first for health, community services and related data. As such, over the years it has elicited much interest from other local and international metadata authorities, both health-oriented and otherwise, and continues to play a leading role as a metadata authority.

Not bad for just three or four people at the time, and now only six dedicated to the development of its successor.

Changing Times

It is an axiom that the more data you have, the more interest you create, which means the more players in the field, which in turn creates more demand for still more information. Everyone can find holes in current data sets that they think should be filled, and this can lead to an exponential growth in data that is the bane of data collectors and data classifiers.

This has been precisely the case with the Knowledgebase. "The data collection is growing, but data standards need to grow before that data collection grows," Braddock says.

And grow it has. According to Braddock, five years ago the AIHW would add about 60 new metadata elements (definitions, classifications and so on) per year. Last year they added 500. He says that new groups of users coming in have added new categories and subdivided existing categories to a more granular level.

Currently the MMU manages between 2500 and 3500 metadata items, a figure that fluctuates as new items are added, new areas of interest raised and old items discarded as superseded, amalgamated or whatever. Approximately 1300 items have been dropped from the registry over the years as no longer relevant. Nevertheless, in recent times, there had been a growing view that the international standard that underpinned the metadata was simply not up to the requirements of the users.

"We had been struggling with the limitations of the old standard," Braddock says. "Trying to squeeze whatever metadata we had into old concepts, you ended up with some very funny looking elements."

Added to this was the realization that the Knowledgebase's underlying technology and interface were proving to be seriously dated.

The news of a revision to the international standard, launched in 2003 and titled ISO/IEC 11179, Information technology - Metadata registries, was the final blow. Time for a change.

In 2003 the AIHW began planning for a changeover that would include: a total redevelopment of the Knowledgebase; an improved user-friendly Web interface with greater functionality; comprehensive background information, assistance and tools for those developing metadata; and enhanced functionality for those whose job it is to maintain the metadata content.

These changes would be supported by a new system of up to 10 registrars and 10-20 metadata stewards. The registrars will be AIHW staff (outside the MMU's own small contingent) who process new metadata items, and the stewards will be subject experts scattered throughout the community of users, developers and commentators. The stewards will be given responsibility for keeping the metadata up to date.

Braddock says they have never had people in a steward role before, a further indication of the need for increasing sophistication in the metadata field.

While 2003 was taken up primarily with conceptual issues, in February 2004 systems specifications and design were developed. A call for tenders was issued in May 2004 with a very tight deadline: a closing date of June 1. Ten tenders were received, and in July the AIHW chose a small, R&D-based organization called Synop, located in Sydney and Canberra, which would build the system on its Sytadel XML-based content management system.

Synop's other clients have included the United Nations Joint Logistics Centre and the Australian Competition and Consumer Commission, both of which have engaged the company to build Internet and Intranet/Extranet Web site/CMS facilities. This is the first metadata registry project the company has been involved in.

The AIHW's plan is to launch the revised Knowledgebase - retitled METeOR for "metadata online registry" (the extraneous "e" added to signify "electronic" as well as for aesthetic reasons) - in March 2005.

"This is a big project," Braddock admits, "a massive change from the old standard to the new one . . . quite complex . . . and of course we have discovered complexities we did not expect."

He adds, almost as a word of warning to other metadata and database managers, that the data re-engineering issues should not be underestimated. "But it has been a very valuable exercise in terms of providing flexible building blocks for metadata developers. It's quite worthwhile doing."

METeORic Challenges

"We regard this project very much as bringing together the three different sectors [health, community services and housing assistance]. Meeting their needs in an integrated fashion is quite a challenge," Braddock continues.

"A key issue for us is also the usability of the site - we're actually building in a lot of user support functionality that explains the workflow and the process that data standards create with the aim of trying to create high-quality metadata. There's various help functions that we're building in, and a lot of online data entry. So there's quite a bit of building in there but we're optimistic that it will work quite well."

To an outsider, it seems amazing that some of the facilities and programs being instituted for METeOR have not been done earlier. However, Braddock warns that it takes years to get the proper governance processes up and running.

This means that metadata management units around the world and the databases they support have apparently been soldiering on with less than perfect systems, which must have an impact on the ability of users to make the most of the often extensive content of databases.

In the end, much of the content remains unusable because the data has been collected inconsistently. This means that much of the data collection effort goes to waste.

The MMU is using the redevelopment to restructure its metadata items, although this has meant a major increase in their number. Prior to the redevelopment, it had about 1300 metadata items, but this has increased to upwards of 3300 in METeOR. Much of this increase has resulted from moving to the new structure, which entails breaking down each existing metadata item into several components.

"We've developed the migration environment, so we've extracted the metadata from the Knowledgebase into an interim environment that supports both old and new structures. We're basically restructuring the metadata as we go along and doing quality assurance on the product before we download that environment into METeOR.

"As we go through the restructuring process, we also identify any inconsistencies in the existing metadata items. So what we're doing is documenting all of those concerns which will go through the resolution process [at the end]."

While the increase of 2000 metadata items is, thankfully, less than they expected, it is unlikely to be the final figure.
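The restructuring described above, in which each old item is broken into reusable components and any inconsistencies are noted for later resolution, could be sketched roughly as follows. This is only an illustration under stated assumptions: the flat item layout, the field names and the `migrate` function are all hypothetical, and nothing here reflects the actual Knowledgebase schema or the AIHW's migration tooling.

```python
def migrate(old_items):
    """Split flat items into shared components and log definition conflicts.

    Hypothetical sketch: the input layout is invented for illustration.
    """
    object_classes = {}  # name -> first definition seen
    properties = {}      # name -> first definition seen
    data_elements = []   # (object class, property, value domain) triples
    issues = []          # inconsistencies held over for later resolution

    for item in old_items:
        components = (
            ("object class", object_classes, item["object"], item["object_definition"]),
            ("property", properties, item["property"], item["property_definition"]),
        )
        for kind, store, name, definition in components:
            # setdefault stores the definition only the first time a name appears,
            # so a later, different definition shows up as a mismatch.
            if store.setdefault(name, definition) != definition:
                issues.append(f"{kind} '{name}' is defined differently in '{item['name']}'")
        data_elements.append((item["object"], item["property"], item["value_domain"]))

    return object_classes, properties, data_elements, issues


# Invented sample data; the third item defines "Person" differently on purpose.
old_items = [
    {"name": "Height", "object": "Person",
     "object_definition": "An individual human being",
     "property": "height", "property_definition": "Vertical extent of the body",
     "value_domain": "length in cm"},
    {"name": "Weight", "object": "Person",
     "object_definition": "An individual human being",
     "property": "weight", "property_definition": "Body mass",
     "value_domain": "mass in kg"},
    {"name": "Date of birth", "object": "Person",
     "object_definition": "A client of a service",
     "property": "date of birth", "property_definition": "Date on which a person was born",
     "value_domain": "date"},
]

object_classes, properties, data_elements, issues = migrate(old_items)
total = len(object_classes) + len(properties) + len(data_elements)
print(len(old_items), "old items became", total, "components")
for issue in issues:
    print("To resolve:", issue)
```

Even in this toy case the item count goes up after restructuring, because each old item contributes several components, while shared components such as the object class are stored only once.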

The project tender specified that the final system would have to allow for a potential 10-fold increase over the next two years. This astounding growth, Braddock asserts, is an extreme scenario, but such is the growth and interest in the field that it is a possibility. "There are a lot of areas we don't cover, so new groups are adding a lot of new topics. And the more metadata we add, the more we support the production of quality and consistent data. So while it is a lot more work and maintenance, the end result will be better data."

Plus, if the new system succeeds, the user population should grow considerably. Not that the numbers are expected to be overwhelming: the tender documents anticipate about 100 users in total, with only about 10 of these concurrent at most.

"This is a very specialized area," Braddock says. "This [100] is the number of people who actually know about this stuff and understand it; they tend to be developers."

There might, in fact, only be one developer, say, per state government program, with that person acting as conduit for all the suggestions and requests for metadata changes from a greater number of data users.

Despite the small number of users, security is considered an issue, particularly as much of the content in the AIHW health, services and housing databases is identifiable on an individual person level. AIHW staff are bound by a privacy provision of the AIHW Act and must sign a confidentiality undertaking that they will not reveal any person-specific information. Penalties for breaches include imprisonment. Like priests, the AIHW's staff and collaborators cannot be forced to reveal confidential AIHW data, even in a court of law.

While the metadata information is not particularly confidential - it is purely a listing of categorizations and definitions, not the data itself - there was a need to ensure that the METeOR Web site could not act as a potential back-door breach into the AIHW data.

Consequently a decision was made to adopt external hosting of the site, which will be handled, via the Synop contract, by Planet Internet Services.

Building Block Approach

A major issue for a unit such as Braddock's is integrating the metadata input from different sources, which he says "is a bigger job than anyone could imagine".

"It's a massive problem that everyone talks a different language.

"Everyone defines things according to what makes sense to them. That's fine for them, it serves their data purposes, but it causes problems when trying to compare data between silos. For our purposes, if you can't communicate data between sectors then you're in trouble in terms of comparing the usage of different service types.

"The new model for the ISO is very good because it has components that you can reuse again and again for different metadata, for greater consistency and comparability of data."

Braddock describes this as "a building block system" that creates a picture of an individual or circumstance based on standard elements. For instance, the descriptor for a person's height is made up of five components, starting from the most basic definition of what a person is (called an object class), the definition of what height is (property), a description of the gathering process of person height information (data element concept), the concept of measuring length or height (value domain), and the actual height to nearest 0.1 cm (data element).
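To make the building block idea concrete, here is a minimal sketch of the five components as simple Python data classes. The class names follow the terms used above; the field names, definitions and the naming scheme are assumptions made for illustration and are not taken from METeOR or from the text of the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    name: str
    definition: str


@dataclass(frozen=True)
class Property:
    name: str
    definition: str


@dataclass(frozen=True)
class DataElementConcept:
    # Pairs an object class with one of its properties, e.g. Person + height.
    object_class: ObjectClass
    prop: Property

    @property
    def name(self) -> str:
        return f"{self.object_class.name}-{self.prop.name}"


@dataclass(frozen=True)
class ValueDomain:
    # How values are represented: here a unit and a recording precision.
    name: str
    unit: str
    precision: float


@dataclass(frozen=True)
class DataElement:
    # A concept plus the value domain used to record it.
    concept: DataElementConcept
    value_domain: ValueDomain

    @property
    def name(self) -> str:
        return f"{self.concept.name}, {self.value_domain.name}"


# Illustrative definitions only.
person = ObjectClass("Person", "An individual human being")
height = Property("height", "Vertical extent of the body")
weight = Property("weight", "Body mass")
length_cm = ValueDomain("length in centimetres", unit="cm", precision=0.1)

person_height = DataElement(DataElementConcept(person, height), length_cm)
person_weight_concept = DataElementConcept(person, weight)  # reuses the same object class

print(person_height.name)
```

The point of the sketch is reuse: the single `person` object class is shared by both the height and the weight concepts, so its definition is written once and stays consistent wherever it appears.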

This may sound complicated at first, but the old system recorded a simple measurement and made no allowance for conceptual or definitional differences, which created misunderstandings and errors.

The new process is more systematic, useful and flexible, Braddock says.

While METeOR is cutting edge in terms of its concept, as an IT project, it is reasonably conventional. The Knowledgebase has never suffered a system crash and it is expected that METeOR will run as smoothly and efficiently as its predecessor. The major IT issue has been data migration, but at time of writing, there had been no problems.

What have proved to be of concern are the project management issues such as restructuring the metadata items, incorporating the new standard and, eventually, educating the user base.

"We have a lot of developers, registrars and approving committees that will need to be brought up to speed on what we've done, so we see that as a pretty major task. We will certainly be doing training of these people," Braddock says.

METeOR's Future Course

Considering the MMU's leading role in health, community services and housing assistance data, it could conceivably play a consulting role in assisting others with similar registries to establish or redefine their own operations.

This might not be limited to health and welfare metadata.

The AIHW's Web site suggests that it can provide services in such fields as defining and classifying data, developing minimum data sets, managing data collections, and developing and testing data standards.

And so it can, in other areas of the AIHW. But at the moment, entrepreneurial activity is not the primary focus of the Metadata Management Unit.

Braddock admits that the current project is taking up all of his unit's time. Not to say there is not the occasional informal advice passed to other agencies, but formal consulting processes are still some time down the road. Six staff can only stretch so far.

There is the possibility that METeOR may form the basis for a metadata server for national electronic health records and Health Connect. If this happens, then METeOR would quickly assume mission critical status, "and that will be a whole different ball game", Braddock acknowledges.

Looking even further afield, if there should ever be the need for a global metadata registry for health, community services and housing assistance, then the AIHW's metadata unit is well placed to fill the role, at least as a model if not the actual facility.

But just not straight away.

In the meantime, the MMU is flat out with its current load. However, despite its small size, the minuscule user base and the apparently routine project budget ("We're a small agency, so this is a fairly big investment for us"), the METeOR project should give health, community services and housing assistance policy makers in Australia (and at times the world) access to better organized and more accessible information, supporting better informed assessments and policy decisions.
