Dirty Little Secrets


For years the Australian Customs Service has endured ongoing problems resulting from dirty data.

For example, consider the confusion generated by having five different standards for naming a port within a single export system. Or the headaches caused when a number of postcode tables, all derived from the same original source (Australia Post), were independently modified to suit the needs of different systems. Or the fact that most of the data reaching Customs via commercial invoices has been generated by businesses resident outside Australia that have little interest in ensuring the data conforms to Customs' standards and requirements.
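The port-naming problem is the classic case for a single reference table that every system resolves against. The following is a minimal Python sketch of that idea, not Customs' actual approach: the alias table, the UN/LOCODE-style codes and the function name `normalise_port` are all hypothetical, chosen for illustration.

```python
from datetime import datetime  # noqa: F401  (not needed; kept out below)
```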

The situation is so bad that director data management Barbara Toohey describes the task of cleaning up and integrating some of that historical data as "immense".

"We're talking about a lot of data, and we're talking about an integration problem that would be expensive to implement in the back end, in a data warehouse solution," Toohey says. "It would probably cost at least half a million dollars - even more - just to put in place a couple of simple integration exercises. We did look about 18 months ago at cleaning up our client identifiers. That was just one data field: one very small piece, to allow us to put the data relating to the same clients together. We found we were looking at $0.5 million just to do that."

Now Customs is taking a closer look at the concept of data quality and taking significant - and determined - steps to achieve it.

It's not that having dirty data affects Customs' main function of controlling the import and export of goods into and out of Australia. On the whole, it doesn't. In almost all cases, Customs captures the data well enough to enable that business process to occur. What it hasn't been able to do, until now, is perform some of the integration exercises it needs in order to gain a better picture of each client's overall profile, so that it can make better risk analysis decisions. Toohey is determined to change all that.

"We want to get a better picture of the client's whole trading pattern, across both imports and exports, so that we can make appropriate risk assessments on their complete trading profile," Toohey says. "It's more about identifying appropriate clients and facilitating their cargo, so that we can focus more carefully on a smaller group who may be of a higher risk. It's called 'reducing the lake to the size of a pond that we might be able to search'," she says.

Then there have been the ongoing headaches associated with what Toohey calls the "query users from hell".

"The query user from hell is the research and analysis guy who's got access to the tables and goes in and decides that he wants something different to the standard report that's out there," Toohey says. "He chooses a few tables, makes what he thinks is an appropriate join, asks for six months' worth of detailed information which he doesn't really need but that he thinks he needs, and sets it running.

"There are people like that in every organisation. For us it's important that the information that they're accessing is in an environment that is going to be protected from our production-transaction environments, because we can't have those slowed down by that query from hell."

Toohey knows even the most experienced, mature and knowledgeable power user will occasionally mistakenly run the query from hell. She's even done it herself. She also knows the key to addressing the problem doesn't lie in restricting power users - who comprise around 5 per cent of the organisation's staff - from access to functions they can competently use.

She also accepts that giving people the analysis capabilities they need automatically carries the risk that they'll slow down the system by putting in a big query. "That's a risk you have to wear if you want to make that research capability available to users."

Instead, she sees training, education and access to good quality data as the key to resolving most of Customs' data headaches. After all, it was making that research capability available that was a key business driver for the introduction of a data warehouse to Customs in 1997. "One of the main reasons for data warehousing, of course, is that you restructure your data in a more appropriate format for research," Toohey says. "But another main reason is that you are protecting your production environment, your transaction environment, from the demands of research queries."

Now Toohey hopes new and better focused data management strategies have put Customs firmly on the path to resolving these and numerous other problems to do with data quality.

Customs' core business centres on the movement of people and goods. During the 1997-98 financial year, Customs facilitated movement of more than 14.8 million passengers through international airports and cleared more than two million parcels and 1.2 million bags of letter-class mail. Customs generates $20 billion in excise duties for the federal government every year, second only to the amount raised by the Taxation Department.

Among other things, the service manages border control and airport passenger checks to prevent illegal drug smuggling and illegal entry of goods into Australia, facilitates trade and movement of people across the border, and helps Australian industry through delivering government support measures like bounties and tariffs. Its 4000 staff manage an extraordinarily diverse range of activities including assisting with 3.6 million import and export entries a year, clearing 3.7 million air cargo consignments, and conducting nine daily surveillance flights over coastal and offshore areas.

While Customs had long known that its continued ability to serve those goals depended on overcoming its data quality limitations through better front-end data management strategies, the issue came to a head with the introduction of the data warehouse.

Needful Things

In 1996, dedicated business groups within Customs identified a general need for data warehousing capabilities across the organisation to increase the accessibility and effective use of information and reporting. Customs' priority was to obtain access to a full range of reporting functions, with multiple delivery channels, including its intranet. It also was determined to help a range of employees to access and manipulate data quickly and intuitively.

To that end, it chose Brio Insight for intranet delivery and reporting, and Brio Explorer for analysis and research, running on a Sequent [now IBM] NUMA-Q platform interfacing with an Oracle RDBMS. In a later iteration, it used Brio's Executive Information Systems (EIS) capability to develop front-end menus. "We recognised that a data warehouse was needed to meet some of our research and analysis and business information requirements," Toohey says. "It proved to be very successful from the end user's point of view, with power users and people in compliance and research and analysis finding it to be extremely successful."

One benefit of the data warehouse system is the ability it gives the Excise branch to apply research and analysis techniques to historical data to improve operations. For example, one program gives certain industries, such as mining and farming, a rebate on diesel fuel purchases. Until the introduction of the data warehouse, Customs had minimal control over illegal claims. Now the data is stored in a structured environment, and staff can write ad hoc queries to track a claimant's historical data and patterns so that Customs can better target its compliance activities.
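Once claims sit in a structured store, "tracking claimant historical data and patterns" can be as simple as comparing each new claim with that claimant's own history. The Python sketch below uses a z-score rule; that rule, the threshold of 3 and the function name are our illustrative assumptions, not Customs' actual method.

```python
from statistics import mean, pstdev


def flag_unusual_claims(history, new_claims, z_threshold=3.0):
    """Flag rebate claims that are out of line with the claimant's own history.

    history:    dict of claimant_id -> list of past claim amounts
    new_claims: list of (claimant_id, amount) tuples
    Returns a list of (claimant_id, amount, reason) tuples.
    """
    flagged = []
    for claimant, amount in new_claims:
        past = history.get(claimant, [])
        if len(past) < 2:
            # Too little history to judge, so route the claim to manual review.
            flagged.append((claimant, amount, "insufficient history"))
            continue
        mu, sigma = mean(past), pstdev(past)
        if sigma == 0:
            # Perfectly steady history: any increase at all is unusual.
            unusual = amount > mu
        else:
            # Number of standard deviations above the claimant's usual claim.
            unusual = (amount - mu) / sigma > z_threshold
        if unusual:
            flagged.append((claimant, amount, "above historical pattern"))
    return flagged
```

Comparing a claimant against its own baseline, rather than against an industry-wide average, avoids penalising large but consistent claimants; a real program would also need to account for seasonality.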

By tracking the results of compliance audits over time, staff can identify areas where compliance has provided most return. Customs will soon use this ability to examine the historic results of its randomly conducted examinations of air and sea cargo consignments. This will determine what type of consignments represent the highest risk and will allow the most efficient use of resources to effectively manage border control.
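Because these cargo examinations are conducted at random, the share of examinations that found a problem is a fair estimate of the risk for each consignment type. A minimal Python sketch of that calculation follows; the consignment-type labels and the function name are hypothetical.

```python
from collections import defaultdict


def hit_rates(examinations):
    """examinations: iterable of (consignment_type, found_problem) pairs,
    where found_problem is a bool.
    Returns [(type, hits, total, rate)] sorted from highest to lowest rate."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ctype, found in examinations:
        totals[ctype] += 1
        hits[ctype] += int(found)
    rows = [(t, hits[t], totals[t], hits[t] / totals[t]) for t in totals]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Keeping the raw hit and total counts alongside the rate matters: a 50 per cent rate from two examinations is far weaker evidence than 25 per cent from thousands.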

Already, Toohey says, some "crooks" have been captured whose activities might otherwise never have been detected. Customs has also identified some other high-risk clients whose activities it may never have been aware of without the data warehouse capability.

The data warehouse has also reinforced the knowledge that the data Customs currently holds is simply not good enough to allow it to do some of the advanced analysis it requires, or to meet some of its data mining objectives. "It really highlighted that we had to do something about the quality of our data, do something about capturing what our data means to people - and that's meta data - and then do something about making that information about the data available to people through some organisational strategies," Toohey says.

Now the Cargo Management Re-engineering project (CMR), an ongoing re-engineering exercise, has provided an ideal opportunity to clean up some of Customs' dirty data. It has also proved an ideal means of introducing improved data management controls and ensuring that future Customs data will be of the highest possible standard.

Meanwhile, since 1996, Australian and New Zealand customs services have been jointly working on the Trans-Tasman Cargo Management Project, designed to harmonise systems and procedures and reduce compliance costs for businesses. Discussing the initiative in December 1998, trade ministers described it as "a useful initiative looking at ways to reduce compliance costs at the border".

Going with the Flow

CMR represents a step forward in the way in which Customs deals with the increasing flow of cargo into and out of Australia. Under the project, the service gets an integrated IT system to replace its current separate systems and also vastly more sophisticated tools to facilitate timely identification of high-risk goods.

CMR will allow for the development of flexible business relationships which will enable the CEO of Customs, in consultation with relevant government bodies, to enter into a compliance agreement with individuals and entities involved in the import and export of goods. The idea is to enhance voluntary compliance and facilitate international business trading processes by allowing approved persons to operate in a more flexible environment.

At the same time, a sub-set of CMR, the Customs Connect Facility (CCF), will facilitate secure Internet and voice communications with Customs cargo systems as well as other Customs business applications. The CCF will support a number of commonly used messaging standards and provide access through a range of communications technologies.

Toohey says that along with some significant re-hosting developments under way within Customs, the re-engineering exercise is proving an excellent catalyst for data cleansing. "Our whole cargo management re-engineering project is about bringing together existing systems that are disparate into one integrated system. That in itself will improve the quality of our data because it will be integrated from the word go."

CMR brings together the four completely independent systems that currently capture air cargo reports, sea cargo reports, export declarations and import declarations. All of these have independent reference tables, independent standards and independent client identifiers. "Trying to integrate that data has proven to be extremely difficult. So the fact that we're re-engineering those whole four systems into one integrated whole should make a huge difference," Toohey says.
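Bringing together four systems with independent client identifiers means recording which identifiers refer to the same client and merging those links transitively. A union-find structure is one standard way to do this. The sketch below assumes the links themselves have already been established (by matching or manual review); the system labels and identifiers are hypothetical.

```python
class ClientLinker:
    """Union-find over (system, local_id) pairs, so identifiers known to refer
    to the same client collapse into a single group."""

    def __init__(self):
        self.parent = {}

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a, b):
        """Record that identifiers a and b belong to the same client."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self):
        """Return one set of identifiers per distinct client."""
        out = {}
        for x in list(self.parent):
            out.setdefault(self._find(x), set()).add(x)
        return list(out.values())


linker = ClientLinker()
linker.link(("air", "A1"), ("sea", "S9"))
linker.link(("sea", "S9"), ("import", "I3"))  # A1, S9 and I3 are now one client
```

The transitive merge is the point: no single source ever linked the air-cargo and import identifiers directly, yet they end up in the same group.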

Customs is also focusing strongly on ensuring only relevant data providing good business returns is fed into the data warehouse. Otherwise, Toohey says, the warehouse would quickly become too big to be manageable. "We're focusing very much on identifying what data will give us the best return and what data will be the best indicators for our risk analysis at the end of the day."

Foreign Objects

The genesis for many of Customs' data headaches is that areas outside of its control generate so much of its data.

For instance, businesses resident outside Australia typically generate commercial invoices for exports, which may pass through the hands of several agents before reaching Customs. Shipping information arrives via cargo reports, and Customs declarations for value and import duties and any tax requirements like GST arrive through agents like brokers.

"The original source is the same, but data quite often comes through more than one set of agents' hands, and for goods that are sold on the way to Australia it can come through a multitude of agents' hands," Toohey says. "There's no reason at all why they should conform to our requirements. It is up to us then to try and match up that data."
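Matching up party names that have been retyped by several agents usually means normalising them first and then comparing approximately. The sketch below uses Python's standard `difflib` for the comparison; the list of legal suffixes, the 0.85 cutoff and the function names are assumptions for illustration.

```python
import difflib
import re

# Hypothetical list of words that vary between agents but carry no identity.
LEGAL_SUFFIXES = {"pty", "ltd", "limited", "inc", "co", "company", "corp"}


def normalise_name(name: str) -> str:
    """Lower-case, drop punctuation and strip common legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def best_match(name, known_names, cutoff=0.85):
    """Return the known name most similar to `name`, or None if nothing
    reaches `cutoff` on difflib's similarity ratio (0 to 1)."""
    index = {normalise_name(k): k for k in known_names}
    hits = difflib.get_close_matches(normalise_name(name), list(index), n=1, cutoff=cutoff)
    return index[hits[0]] if hits else None
```

Names that fall below the cutoff return `None` and would go to a person to resolve, rather than being forced onto the nearest client.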

Australian Customs is not alone. The customs arms of governments all around the world suffer the same problems with data standards and integration. In fact, there are numerous international projects under way that aim to standardise the use of commercial data in the global trade environment. These will make a difference, over time.

There are also well established systems operating within many commercial clients' own environments that are designed to ensure they meet Customs' formatting needs and reporting requirements. However, these were introduced when those businesses were relying on EDI for data exchange with Customs and are sorely in need of updating.

"As a result, any changes that are happening internationally are going to take some time to be implemented into business, and any changes that we try to introduce through new systems are going to take some time to be implemented into the business systems as well," Toohey says. "For us, it's really a matter now of recognising the problems being caused and the need to address them as far as we can within the realms of our control."

Full Consultation

There are highly formalised processes within the CMR project to ensure end users are fully consulted. Customs has also held lengthy discussions with industry, and a dedicated change management section is managing the process. That's in keeping with Toohey's belief that the biggest barrier to getting more out of Customs' data is that staff have insufficient understanding of the data on hand.

"We can use advanced technology and put data into the hands of users and give them powerful analysis tools to manipulate that data. However, that won't help if they don't have enough understanding of the data itself," Toohey says. "The data that we're making available to them for research and analysis has come from our transaction systems. The data there is not necessarily equal to the data you see on the screen of the transaction system because we're extracting data from the back-end database that people don't usually see. The learning of what that data means should never be underestimated."

Users sometimes mistakenly assume that data presented to them is complete, not understanding that there can be gaps or the potential for duplicates. That makes education crucial to their success in using the data. In fact, Toohey says, less than 5 per cent of users fully understand the data available in the warehouse. For power users, that means intensive programs to give them a better insight into the nature and limitations of the data at hand.
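Gaps and duplicates are exactly what a simple data profile can surface before users draw conclusions. The Python sketch below counts missing values per field and repeated keys; the field names (`entry_no`, `client`, `value`) are hypothetical.

```python
from collections import Counter


def profile(records, key_fields, required_fields):
    """records: list of dicts. Reports how many records are missing each
    required field and which key combinations occur more than once."""
    # Absent keys, None and empty strings all count as gaps (0 does not).
    missing = {
        f: sum(1 for r in records if r.get(f) in (None, ""))
        for f in required_fields
    }
    keys = Counter(tuple(r.get(f) for f in key_fields) for r in records)
    duplicates = {k: n for k, n in keys.items() if n > 1}
    return {"rows": len(records), "missing": missing, "duplicate_keys": duplicates}
```

Publishing a profile like this alongside each extract is one way to tell users up front that the data is not complete.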

However, to address the needs of the other 95 per cent, who neither have nor should be expected to have that understanding, Toohey is working on developing some well-structured data marts addressing individual business needs. "These give them the flexibility of being able to choose the dimensions and the values that they want to look at, and of extracting information with the confidence that they're not going to make mistakes.

"What we haven't got available for users at the moment is some of the critical data for the organisation. We haven't got the cargo report data available to users at all. We have it in no research environment that they can get at. It's not accessible to business users. It's data that we haven't got in the data warehouse at the moment. It's in our applications.

"But we will be capturing relevant data for that into the data warehouse as a result of this re-engineering project. That should make a huge difference to most users," Toohey says.

Expert Advice: Maintaining Data Quality

Mark Atkins, president and CEO of Boston-based Vality Technology, offers insight and advice on how companies can ensure the quality and integrity of their data, and optimise their information assets in areas like data warehousing, business intelligence systems, and e-commerce.

Q. What is it about online exchanges that makes them particularly prone to "dirty" data? Also what kind of problems can an online exchange face if it doesn't ensure the accuracy of data?

A. Data from multiple suppliers is a Pandora's box. First, there's the sheer volume of it: an online exchange receives catalogue data from dozens to thousands of suppliers, and each supplier may have thousands of items in its catalogues. The data arrives in different formats and different classification schemes, with each supplier using its own product numbers or other IDs, different names and descriptions for the same products, and different terms and abbreviations in the descriptions and attributes.
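A first step towards normalising such catalogue text is to expand suppliers' abbreviations into one standard vocabulary, so that two descriptions of the same product compare equal. The abbreviation table and function name below are hypothetical; a real exchange would maintain such tables per product category.

```python
import re

# Hypothetical abbreviation table mapping supplier shorthand to standard terms.
ABBREVIATIONS = {"blk": "black", "pk": "pack", "ctn": "carton", "w/": "with"}


def normalise_description(text: str) -> str:
    """Lower-case, split on whitespace, commas and semicolons, and expand
    known abbreviations so equivalent descriptions compare equal."""
    tokens = re.split(r"[\s,;]+", text.lower().strip())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens if t)


print(normalise_description("Pen, BLK, 12 PK"))  # pen black 12 pack
```

This only handles the terminology problem; differing classification schemes and product IDs need separate mapping tables of their own.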

What's more, there may be various errors and inconsistencies within a single catalogue's data. To compound the problem, product data is always in flux, especially prices, so by the time the exchange gets the data it may already be out of date.

It's no wonder that normalising this data is a big and continuous job. But it's necessary - for customers to find what they want. If they don't, they click away. That means that a poor site catalogue will eventually alienate the member suppliers as well as the buyers.

The worst case scenario concerns inaccurate pricing: if an exchange has an out-of-date price that the supplier may not want to honour, there may be legal hassles for both the supplier and the exchange, especially if the supplier can show that it sent the right price to the exchange and the exchange didn't publish it online in a timely manner.
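One defence against that scenario is to check routinely whether any price a supplier has sent is still not reflected online after an agreed interval. The Python sketch below assumes a four-hour limit and prices held in cents; both choices, and the function name, are hypothetical.

```python
from datetime import datetime, timedelta


def stale_prices(published, supplier_feed, now, max_lag=timedelta(hours=4)):
    """published:     dict sku -> price currently shown online (in cents)
    supplier_feed: dict sku -> (latest supplier price in cents, time it was sent)
    Returns SKUs whose latest supplier price has not been published within max_lag."""
    stale = []
    for sku, (supplier_price, sent_at) in supplier_feed.items():
        out_of_sync = published.get(sku) != supplier_price
        if out_of_sync and now - sent_at > max_lag:
            stale.append(sku)
    return sorted(stale)


now = datetime(2000, 1, 1, 12, 0)
feed = {"A": (995, datetime(2000, 1, 1, 6, 0)), "B": (500, datetime(2000, 1, 1, 11, 0))}
print(stale_prices({"A": 1095, "B": 450}, feed, now))  # ['A']
```

SKU "B" is also out of sync but within the allowed lag, so it is not yet reported. Logging when each supplier update arrived also gives the exchange the timeline it would need in any dispute.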

Q. If I invest in data quality software or services, how would it affect my bottom line? How does it help business (not just IT) goals?

A. Let me give you some concrete examples, drawn from my experience, of the benefits our customers have received. First, here are some hard numbers on how data quality software and services affect the bottom line.

By uncovering duplicates for the same chemical compound, a global chemical producer was able to shrink inventory by 27 per cent.

By consolidating vendor information for its three chains, a major grocery and drug retailer was able to save $US500,000 on its first procurement effort after the consolidation project was completed.

By cleaning and consolidating customer records, the transaction-processing clearinghouse for a group of insurance companies identified dollars paid on claims due from other insurance companies, saving $1 million every three years. The data re-engineering also saved the company $US600,000 via volume discounts from window-repair shops most frequented by policyholders. Finally, the company was able to segregate a group of policyholders more likely to have adverse claims - who paid $US3 million less in premiums than the company was paying to cover their claims - and so was able to rectify the situation.

In the case of a merger/acquisition between two major banks consolidating their customer file data, manual integration was estimated to require the time and costs of 5 clerks working 45 weeks on double shift. Working with data quality software, a single programmer did the job in 18 weeks, with exceptions handled by a single clerk working 2 weeks.
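The bank example's effort figures can be put side by side. This assumes, as our own reading rather than the source's, that "double shift" means two shifts per clerk per week, and that a week of programmer time and a week of clerk time are comparable units:

```latex
\begin{aligned}
\text{manual} &= 5 \times 45 \times 2 = 450 \ \text{clerk-shift-weeks} \\
\text{software-assisted} &= 18 + 2 = 20 \ \text{person-weeks} \\
\text{ratio} &= 450 / 20 = 22.5
\end{aligned}
```

On that reading, the software-assisted approach used roughly one twenty-second of the labour, and elapsed time fell from 45 weeks to between 18 and 20 weeks, depending on whether the exception handling overlapped the programming.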

An industrial firm using data quality software to clean and consolidate its data for a warehouse used for sales and services strategies estimates that the warehouse brings in $2.8 million additional revenues annually.

Now for some examples of the business benefits, often less quantifiable, of using data quality software and services.

By incorporating data quality software in their master patient indexing software (which enables health administrative personnel and employees to access the consolidated records of an individual patient in real time), healthcare software companies enable healthcare providers to deliver better care on the spot. They can also avoid huge liabilities. Consider what could happen if, during an emergency, an unconscious person was delivered a drug he or she was allergic to - through lack of access to the patient's historical records.

A major healthcare/pharmaceutical company that had used data quality software to load its data warehouse with clean data then used data mining to uncover links between certain illnesses and effective treatments. It found alternative, less costly but just-as-effective drugs for certain gastrointestinal ailments, helping its customers cut prescription costs by 10-15 per cent.

Across the board our financial services, telecommunications, utilities, pharmaceuticals, government agencies, and other customers tell us how data re-engineering helps them to become more customer-focused - to get business intelligence on their customers and households for better marketing, sales, and service.

Q. Do you think data quality has a significant effect on retaining customer loyalty?

A. Definitely. The more you understand your customers, their wants and needs, the more effective you'll be at gaining and retaining their loyalty. Don't take my word for it; look at all the companies spending millions of dollars on CRM initiatives for that very reason. However, without including a data quality initiative as part of a CRM initiative, the results may be a lot less than desired.

I believe Lou Agosta, an analyst at the Giga Information Group, summed it up: "Data quality is the weak underbelly of customer relationship management . . . " With that said, I think a little clarification of exactly what data quality involves, as it relates to CRM, is appropriate. Certainly, having the correct mailing address in the correct format for each customer is important. However, data quality extends far beyond a simple name/address cleansing.

Crucial data quality components include identifying the relationships between your customers, your products and your locations. Closing a store that isn't showing the highest profits for your company makes sense on the surface, but what if that location is the closest one to your top 10 per cent of customers? Loyalty is the last thing closing that store would improve.
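Answering the store-closure question needs customer revenue and location data that can be trusted enough to join. The Python sketch below counts, for each store, how many top customers have it as their nearest store. The use of straight-line distance on planar coordinates is a simplifying assumption (real addresses would need geocoding and geodesic or travel distance), and all names are hypothetical.

```python
import math


def stores_serving_top_customers(customers, stores, top_percent=10):
    """customers: list of (customer_id, annual_revenue, (x, y))
    stores:    dict store_id -> (x, y)
    Returns dict store_id -> number of top customers for whom it is the nearest store."""
    ranked = sorted(customers, key=lambda c: c[1], reverse=True)
    # Integer ceiling of top_percent of the customer base (at least one customer).
    top_n = max(1, (len(ranked) * top_percent + 99) // 100)
    counts = {store: 0 for store in stores}
    for _cid, _revenue, location in ranked[:top_n]:
        nearest = min(stores, key=lambda s: math.dist(location, stores[s]))
        counts[nearest] += 1
    return counts
```

A store with low profits but a high count here is precisely the case the text warns about.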

What will the delay of a shipment of parts mean to your best customers? You won't be able to answer that question if you don't know the relationship of your customers to your products and of your products to your vendors. Could disaster be averted and loyalty improved if those relationships are known? Definitely. No one likes to wait around for a service provider (especially during the 3-4 hour "window" that most provide).
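With clean customer-to-product and product-to-vendor relationships in place, the question above becomes a two-step join. A minimal Python sketch, with hypothetical names and data shapes:

```python
def customers_affected(delayed_vendor, product_vendor, customer_products, best_customers):
    """product_vendor:    dict product -> vendor that supplies it
    customer_products: dict customer -> set of products the customer buys
    best_customers:    set of customer ids
    Returns the best customers who buy at least one product from delayed_vendor."""
    affected_products = {p for p, v in product_vendor.items() if v == delayed_vendor}
    return sorted(
        c for c in best_customers
        if customer_products.get(c, set()) & affected_products
    )
```

If either mapping contains duplicate or mislabelled products, the answer silently misses customers, which is why this counts as a data quality problem rather than just a reporting one.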

Understanding the physical location of your customers and their relationship to your service centres (or even to each other) would result in more efficient service. Who would you be more loyal to: the provider who can give you the date they'll be there, or the one who can tell you the hour?

At first glance, location information may not seem like a data quality issue, but it is. Quality goes beyond correctness and may be relative to the use or the purpose it serves. While data may be of sufficient quality to get a bill to someone, it may not be of high-enough quality to allow the maximisation of marketing efforts or the minimisation of credit risk. So, it's important to keep in mind that data quality has many different meanings depending on who's using the data and for what purpose.
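The idea that quality depends on purpose can be made explicit by checking the same record against different rule sets. The rule sets and field names below are hypothetical; the point is only that one record can pass for billing and fail for marketing.

```python
from typing import List, Tuple

# Hypothetical purpose-specific requirements for a customer record.
RULES = {
    "billing": ["name", "postal_address"],
    "marketing": ["name", "postal_address", "email", "segment"],
    "credit_risk": ["name", "postal_address", "date_of_birth", "payment_history"],
}


def fit_for(record: dict, purpose: str) -> Tuple[bool, List[str]]:
    """Return (is_fit, missing_fields) for the given purpose.
    A field counts as missing if it is absent or empty."""
    missing = [f for f in RULES[purpose] if not record.get(f)]
    return (not missing, missing)
```

Real rules would test formats and currency as well as presence, but even this simple form makes the "quality for what?" question answerable per use.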
