Taming Enterprise Data

One day soon Telstra wants to be able to fill its customer service representatives in on every single thing there is to know about each and every customer they talk to - before they even pick up the phone. All the data is there now. Like many organisations, Telstra's real difficulty is finding ways to integrate that wealth of detail from a host of separate databases. Now the telco is well into an innovative pilot dedicated to finding new, more efficient ways to do the job - a pilot aimed squarely at cutting the notoriously high cost of such integration exercises.

"The cost of data integration has always been high," says Telstra executive director of customer process and information Dwight King. "People can build a GUI for front-of-house staff - and, of course, there's lots of options in how to do that," he says. "The hard part is moving that data from wherever it is in your company to that front-of-house system. And typically for these front-of-house systems, you just end up building many, many interfaces, and 80 per cent of your costs and time goes into the interfaces." So Telstra's pilot project with developer Expert Software Services aims to build interfaces to its core systems using Microsoft's Windows Distributed interNet Applications (DNA) architecture. DNA is based on the Component Object Model (COM), Microsoft's framework for creating software components, and is used to build distributed Internet applications.

Like Telstra, many other organisations in both the public and private sectors have long struggled with the problems of integrating data from a range of disparate databases. E-commerce implementations and follow-on business-to-business efforts are compounding the problem, placing unprecedented demands on databases as growing numbers of organisations strive to link their back-end databases and applications to their Web sites. At the core of Telstra's project is its effort to prove the new way of building systems, which uses a component architecture to develop only the components required, as opposed to building or buying whole new systems.

At the time of going to press King had no idea whether the pilot would handle the volumes required or be sufficiently "fast, cheap and disposable". He says he's never seen a real-life volume application that would allow, for instance, a new sales program to be put up every few weeks. "We see people make table changes, but when those require different data to be brought to the front of house, that's really hard. We're seeing it still but we're talking in the few months category versus the nine months to a year that it typically takes to build and deploy those interfaces." But the need for data integration is so crucial that if this pilot doesn't stand up, King says, Telstra will immediately turn to the next approach that might help it live up to its dream.

The real challenge in delivering data to the Web lies in finding cost-effective ways to integrate data from an increasing number of distributed data sources which historically were never expected to talk to each other. There are many different approaches to the problem - not all of them very effective. "Too often what people do is to simply transfer data from their existing systems in bulk to their Web systems, and the Web systems then become rather disconnected from their main internal systems," Informix technical director John MacLeod says. "You often get significantly more value by making direct connections between the Web systems - the external systems - and the internal systems, so that you have real-time access rather than a bulk access."

At its most fundamental, data integration is whatever set of processes occurs when multiple applications from diverse developers exchange information. The process typically involves four steps: extraction, transport, transformation and loading. Some organisations adopt separate programs to perform each of the four separate functions. Others invest in message-brokering software in the interests of automatically managing all aspects of the data flow.
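The four steps can be sketched in a few lines of Python. This is a minimal illustration only, not any vendor's implementation: the CSV export, the field names and the dictionary standing in for a target system are all invented for the example.

```python
import csv
import io
import json

# Hypothetical export from a source system (invented sample data).
SOURCE_CSV = "cust_id,name,phone\n17,Ann Lee,03 5550 0101\n42,Bo Chan,03 5550 0199\n"


def extract(raw_csv):
    """Extraction: read rows out of the source system's export."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transport(rows):
    """Transport: serialise the rows so they can move between systems."""
    return json.dumps(rows)


def transform(message):
    """Transformation: reshape each record into the target's format."""
    rows = json.loads(message)
    return [
        {
            "customer_id": int(row["cust_id"]),
            "full_name": row["name"].upper(),
            "phone": row["phone"].replace(" ", ""),
        }
        for row in rows
    ]


def load(records, target):
    """Loading: write the transformed records into the target store."""
    for record in records:
        target[record["customer_id"]] = record
    return target


def run_pipeline(raw_csv):
    """Chain the four steps; a dict stands in for the target database."""
    return load(transform(transport(extract(raw_csv))), {})


front_of_house = run_pipeline(SOURCE_CSV)
```

Keeping each step as a separate function mirrors the organisations that run a separate program for each stage; message-brokering software would, in effect, manage the chaining done here by run_pipeline.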

Message-brokering tools can add real value by creating a single source to which all applications can go for data integrated from multiple sources, although such tools tend to be used mostly for real-time integration. Another tool in the data integration armoury is XML, which is increasingly proving instrumental in cutting data integration costs by easing the challenge of unifying the formats of data from different sources.

And it is XML that may ultimately prove the salvation for the Australian Oceanographic Data Centre, as it attempts to integrate ocean current, temperature and salinity data for use by the Defence Department in mine warfare. Director Ben Searle says the project is highly complex, in part because oceanographic marine data can incorporate three or even four dimensions and even a simple temperature observation can take something like 200 ASCII characters. There's also a wide range of instruments used to measure observations and create data. To make matters worse, most scientists who collect data have their own peculiar data formats and data structures. "So you can get the same parameter - it might be ocean temperature - measured by different instruments to different levels of accuracy in 10, 20, or 30 different formats and structures," Searle says.

Integrating the data has been a laborious project that has involved running a simple program to convert data from one structure to another. It's an enormous job, with the process of writing, testing and checking each translator taking several days. To get around the problems the centre is evaluating the possibility of using object structures for data, which would let it store people's data closer to their originator's format and close to a real-world structure within an object environment. It is also looking at trying to establish a marine-specific version of XML. That's where the power of the Internet really shows up, Searle says, by providing an underlying framework developers need to know little about.

"The Internet is providing a very powerful framework, on which people can build applications that in most cases they know will work, without necessarily understanding the underlying technologies. That's a linkage and a framework that we haven't had before and it is making life much easier," he says. "We can send a packet of information around the world to somebody and as long as it is ASCII they can read it. Whether they can make use of it is another story. That is where something like XML is making the difference, enabling people to use machines to understand it rather than having people sit there and try and translate what it's all about."
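The translation problem Searle describes can be illustrated with a short Python sketch. Both input formats and the XML element names below are invented for the example; they are not the centre's formats, nor any real marine XML standard. The point is only that once each source has a small translator into one common structure, every downstream program reads a single self-describing format.

```python
import xml.etree.ElementTree as ET


def parse_format_a(line):
    """Hypothetical format A: 'lat,lon,depth_m,temp_c' in metric units."""
    lat, lon, depth, temp = line.split(",")
    return {
        "lat": float(lat),
        "lon": float(lon),
        "depth_m": float(depth),
        "temp_c": float(temp),
    }


def parse_format_b(line):
    """Hypothetical format B: 'TEMP_F=..;DEPTH_FT=..;POS=lat/lon'."""
    fields = dict(part.split("=") for part in line.split(";"))
    lat, lon = fields["POS"].split("/")
    return {
        "lat": float(lat),
        "lon": float(lon),
        # Convert feet to metres and Fahrenheit to Celsius.
        "depth_m": round(float(fields["DEPTH_FT"]) * 0.3048, 2),
        "temp_c": round((float(fields["TEMP_F"]) - 32) * 5 / 9, 2),
    }


def to_xml(observation):
    """Emit the common structure as XML (element names are assumptions)."""
    root = ET.Element("observation")
    for key, value in observation.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# The same reading, reported by two instruments in two formats.
xml_a = to_xml(parse_format_a("-33.9,151.2,10.0,20.0"))
xml_b = to_xml(parse_format_b("TEMP_F=68.0;DEPTH_FT=32.8;POS=-33.9/151.2"))
```

Each new instrument format still needs its own translator, but only one, and only into the shared structure rather than into every other format in use.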

Open Standards

Things are slightly less complicated but standards no less important for the NSW Land Information Centre as it makes its maps available on the Internet. The project involves integrating customer information stored in an Oracle database with spatial information stored in Informix. An API handles user authentication and access audits as the centre tries to leverage existing technology in a modular way.

"By keeping it separate and keeping it in fairly open industry-standard formats we're able to use the best bits of technology for each link in the chain, rather than going for a one-vendor solution across everything," says Tony Rooke, executive manager of information management and technology. "That's important because in a spatial data warehouse scenario, which is what we are, there is no single-vendor solution that does everything well."

A major challenge for the centre has been the need to cater for a heterogeneous internal environment embracing Unix, Mac and PC machines. Rooke says it's been hard to develop tools that can run in multiple browsers such as Internet Explorer or Netscape, have the level of functionality required, and yet are as browser-version independent as possible. The centre is dealing with the problem by doing as much server-side processing as possible. "That way we are just largely painting pictures and sending it out to relatively thin-client-based solutions. Given the moving targets of Java and browsers themselves, for the moment it's really the only viable solution," Rooke says.

Different Strokes

Facing similar problems, South East Water has adopted a rather different approach to integrating its data. When it sprang from the former Melbourne Water five years ago, it became obvious South East Water would need to make available to field staff and customers data from various databases spread right across the organisation. To address the problem, Expert Software Services initially redeveloped a call centre application for South East Water from an Access application to an Access VB Server application, and is now looking at turning that into a Windows DNA application.

Martin Dunkley, manager of the geographic information system, South East Water, says the organisation uses middleware to connect its Custima customer database with its Intergraph GIS system. Integration has been made easier by the organisation's use of paracentroids, a concept used in proprietary GIS software packages to describe a variable, tied to a coordinate system, that represents the approximate centre of a polygon.

"In our map base we have the street numbers, and behind the street numbers is that paracentroid, which is a unique number. So you would see a street number in our graphics, and the text note or the text point for it would be the paracentroid; that's the x and y coordinates and that's the common denominator we use between the two databases," Dunkley says. "So we have an application interface that does the ODBC connection for us. We've had this sort of connectivity and this sort of logic in our map-based CIS for many years now. When we updated to Custima about four years ago the concepts were all there, and we just repeated the concepts with Custima."
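The idea of one shared key linking two otherwise separate databases can be sketched as follows. This is an illustration under stated assumptions, not South East Water's system: two in-memory SQLite databases stand in for the GIS and the customer database, and the table names, columns and sample values are all invented.

```python
import sqlite3

# Stand-in for the GIS: street numbers carry a paracentroid identifier.
gis = sqlite3.connect(":memory:")
gis.execute(
    "CREATE TABLE street_numbers "
    "(paracentroid TEXT PRIMARY KEY, street_no TEXT, x REAL, y REAL)"
)
gis.execute(
    "INSERT INTO street_numbers VALUES ('P1001', '12', 334500.0, 5802100.0)"
)

# Stand-in for the customer system: properties carry the same identifier.
cis = sqlite3.connect(":memory:")
cis.execute(
    "CREATE TABLE properties "
    "(paracentroid TEXT PRIMARY KEY, owner TEXT, meter_id TEXT)"
)
cis.execute("INSERT INTO properties VALUES ('P1001', 'A. Citizen', 'M-77')")


def property_details(street_no):
    """Look up a map feature, then follow its key into the other database."""
    row = gis.execute(
        "SELECT paracentroid FROM street_numbers WHERE street_no = ?",
        (street_no,),
    ).fetchone()
    if row is None:
        return None  # No such street number on the map.
    return cis.execute(
        "SELECT owner, meter_id FROM properties WHERE paracentroid = ?",
        row,
    ).fetchone()


details = property_details("12")
```

Neither database needs to know anything about the other's structure; the only thing they must agree on is the key.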

Dunkley says since the organisation had never dealt with a Progress database before, configuring the ODBC connection proved a little bit complicated, although at least the concepts involved were familiar. He says the system, which gets "quite a heavy belting", is pretty robust and works reasonably fast.

Perils and Pitfalls

Whatever approach you take to data integration, there are things you can do right at the start that will stand you in good stead and help you avoid constant cost blow-outs.

For a start, you should take a leaf out of South East Water's book and always go with the lowest common denominator, Dunkley says. "Paracentroids identify property, and over on our CIS that's pretty much the lowest common denominator there is. So we go property to property, and that enables us then to travel through the Custima database to get to owners, to get to metering information, to get to any other information that we need to. But we've gone with a really low common denominator, something that's inherent in the map, and it's a logical piece of data in the map, and it's also a logical piece of data on our CIS."

It's also crucial to figure out what data is needed, how to convert it and how to clean it up so that it can be loaded into new applications, Informix's MacLeod says. "I guess the temptation is to assume that the source data is correct and that all that is needed is to simply pull it from multiple places into a single place and put it together," he says. "That's the biggest pitfall. You've got to go into it with your eyes wide open, prepared for some hard work in reconciling apparently conflicting data from different sources." MacLeod says one of the most valuable exercises you can undertake before integrating your data is to get a firm handle on what data is reliable and what is suspect.
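One way to get that handle is to compare the sources before merging them and list every field on which they disagree. The sketch below assumes two hypothetical systems keyed by the same customer number; the system names, fields and sample records are invented for illustration.

```python
def find_conflicts(source_a, source_b):
    """Return (key, field, value_a, value_b) for every shared field that differs."""
    conflicts = []
    for key in sorted(source_a.keys() & source_b.keys()):
        shared_fields = source_a[key].keys() & source_b[key].keys()
        for field in sorted(shared_fields):
            if source_a[key][field] != source_b[key][field]:
                conflicts.append(
                    (key, field, source_a[key][field], source_b[key][field])
                )
    return conflicts


# Invented sample records from two hypothetical systems.
billing = {
    17: {"name": "Ann Lee", "postcode": "3000"},
    42: {"name": "Bo Chan", "postcode": "3181"},
}
crm = {
    17: {"name": "Ann Lee", "postcode": "3006"},
    99: {"name": "Cy Diaz", "postcode": "3121"},
}

conflicts = find_conflicts(billing, crm)
```

A report like this does not say which source is right; it only shows where someone has to decide, which is the hard work MacLeod warns about.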

And avoid above all the costs of sending a boy to do a man's job, says Lloyd Borrett, general manager Expert Software Services. Too many organisations rely on Web developers with no expertise in integrating the data from their back-end systems. "Most of the existing so-called Web developers or e-business providers are of the polo-necked, pony-tailed brigade, who come at it from a design point of view and an aesthetic point of view, but have got no idea what's required to integrate the data into back-end systems," Borrett says. "You've got to start with people that really know what they're talking about and aren't just talking the hype and the sizzle - that there's some actual meat and potatoes there in capability to deliver these solutions and understand the complexity of large-scale back-end systems."

Be prepared to re-engineer your business, if necessary. It's no use getting the technology right unless the rest of your processes are going to make maximum use of that technology, and vice versa, Borrett says. He draws on Bill Gates' concept of the digital nervous system - as outlined in his book, Business at the Speed of Thought - to illustrate the point. The digital nervous system concept describes using technology to enable collaboration and information sharing and to streamline business processes and line-of-business applications, while also using e-commerce to collaborate and partner with customers, suppliers and distributors.

"It strikes a chord, as a good way to explain to people these are the areas you've really got to focus on here, and you've got to make them work collaboratively together over this new technology platform," Borrett says. To achieve that kind of capability involves building things in entirely new ways. That's why, as Telstra's King set out to prove the new approach, the first lesson he learned was that a new approach demands an entirely new team. "Our focus was on attacking the time to market," King says. "It wasn't a technology solution we were looking for; it was, how do we get good time to market for new products and services and sales campaigns and so forth. Everything we did was focused around reducing that time.

"And if you've really come up with a tool that allows you to build things differently, why would you use the same design methodologies and why would you use the same testing protocols as usual? If I've got a system that I've got to roll out to 22 locations, and I've got to swap it all over at once, I'd better do rigorous testing and make sure it's all okay, because I'm going to change everything at once. If I've got something I'm going to put on a server farm and just have certain people accessing it using IP-type protocols, well gee, I just put it up on one place. I've only got a few people accessing it; if it doesn't work, I shut it down and I start over."

The new reality means the approach taken by traditional testing teams used to a three-month testing cycle will no longer wash, King says. "Some people can change, some people find it hard. So typically I find it helps to start with a fresh team that's got a different mindset." You also need to develop partnerships with very patient and willing users to help prove the new techniques, King says.

Telstra is not yet prepared to endorse the new component development methods, King says, but is determined to find ways to make the change. And he fully agrees with Borrett about the need to be prepared to re-engineer processes to maximise data integration efforts. "We're not ready to say that this is the right thing to do. You can see there is the one piece of it, which is the system, and the other part . . . is how we would like to do business. That takes more than one system, that takes a mindset change in how you want to relate with your customers," King says.
