Will dependent data marts strike a balance between huge data warehouse rollouts and piecemeal data mart implementations?
Read this story to learn:
How to avoid data warehouse project failure
The best ways to create an enterprise data warehouse strategy
Team structures for creating and supporting a data warehouse

The invoice is one of the most elementary units of information in a company.
It is the protozoan in the information food chain, the single-celled organism that can't get any smaller. Name, address, item, price, date, amount paid...it just doesn't get any lower on the ol' corporate hard drive.
Massing these invoices together into a primordial swamp of information is the theory behind the data warehouse. The ooze of invoices becomes the basis for all the analytical information the company produces, the base numbers from which all calculations are derived. Staffers can analyse the company at the lowest level of specificity and the highest level of abstraction, gaining powerful new insights into customer habits, developing new products and selling more of them. All of which makes CEOs spontaneously yell, "Cha-ching!"
The promises of data warehousing are so sweet that in 1997, according to a survey by Meta Group Inc., a Stamford, Conn.-based consultancy, companies spent an average of $1.9 million on data warehousing projects. But the reality of data warehousing is much more risky and difficult than the promise. Those big bucks are being poured into projects involving unprecedented levels of technical difficulty and painful organisational change. Another report from Meta Group says that 50 percent of these multimillion-dollar projects fail to meet the desired levels of success among the IS groups charged with combining all the ooze. And although few companies actually abandon data warehousing after an initial failure, many CIOs and IT project managers report failed first and sometimes even second attempts at creating a successful data warehouse.
Though data warehousing has been around for 10 years, the reasons for these multiple failures are only now beginning to be widely understood. Experience has shown that size really does matter to data warehousing success. Companies that build a single vast collection of data find these multiterabyte databases are expensive, difficult to build and maintain and can quickly overwhelm a company's network and technology infrastructure. Meanwhile, the small-scale, narrowly focused databases (a.k.a. data marts) pose a much more insidious threat. They have emerged as the quicker and ostensibly cheaper alternatives to data warehouses, costing as much as 80 percent less than a data warehouse, according to GartnerGroup Inc., a Stamford, Conn.-based research and consulting company. But data marts, usually built to accommodate the information needs of a particular division or function (finance, for example), present no incentives to share information with the rest of the company and can lead to a chaos of incompatible technologies, duplicate data and an overwhelming demand for system maintenance. "You don't notice there's a problem until the problems are manifest," says Bill Inmon, chief technical officer at Pine Cone Systems Inc., a data warehouse software company based in Englewood, Colo. "It's not until you build the third or fourth data mart that you see the problems with integrating data and with maintaining adequate IT support." IS must take the blame and do the cleanup when either of these extremes goes awry because so much of the budget goes toward the technical details.
According to a Meta Group survey of its clients, 60 percent to 80 percent of a data warehouse project budget is spent on processing data: rooting it out of legacy systems, correcting it, converting it into a readable format and creating a usable, accurate collection of info ooze. Worse than the expense, IS alone cannot perform these tasks. Because no one can afford to spend money on data that has no direct relevance to the key business processes of the company, the business must help ferret out the data and define it within the data warehouse. With so much money being thrown at a complex, abstract effort and the large time commitments required from the business's best and brightest, CIOs have a lot to lose by managing one of these projects. The only way to prevent costly "do overs" in data warehousing is by putting together a strategy that balances the business's anxiousness to access valuable data with IS's need to create a stable database infrastructure that can be shared across the company and can be supported by something less than an army of database technologists.
Although approaches to creating such a strategy are all over the map, a hybrid emerges that combines a central data warehouse (to reduce IS's support nightmare) with many smaller data marts that pull data through the data warehouse so that everyone uses identically formatted data in their calculations. Known as dependent data marts, the approach can mean less chaos for IS and faster, more focused access to data for the business.
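To make the hybrid concrete, here is a minimal sketch, assuming a toy in-memory warehouse. The record layout, the function names and the sample invoices are all hypothetical; the only point is that every mart is derived from the same cleaned warehouse rows rather than from the operational systems directly.

```python
# Hypothetical raw invoices from two operational systems with inconsistent formats.
RAW_INVOICES = [
    {"system": "billing", "cust": "ACME ", "dept": "finance", "amount": "120.50"},
    {"system": "orders", "cust": "acme", "dept": "marketing", "amount": "79.50"},
    {"system": "orders", "cust": "Globex", "dept": "marketing", "amount": "10.00"},
]


def load_warehouse(raw_rows):
    """Clean every row once, centrally, so all marts share identical formats."""
    return [
        {
            "customer": row["cust"].strip().upper(),  # one agreed customer format
            "department": row["dept"],
            "amount": float(row["amount"]),
        }
        for row in raw_rows
    ]


def build_dependent_mart(warehouse, department):
    """A dependent mart is a subset pulled through the warehouse, not from source systems."""
    return [row for row in warehouse if row["department"] == department]


def total_by_customer(rows):
    """Sum invoice amounts per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


WAREHOUSE = load_warehouse(RAW_INVOICES)
MARKETING_MART = build_dependent_mart(WAREHOUSE, "marketing")

if __name__ == "__main__":
    print(total_by_customer(WAREHOUSE))
    print(total_by_customer(MARKETING_MART))
```

Because the customer name is normalised once in the warehouse, the marketing mart and the enterprise totals agree on who "ACME" is, which is the consistency the dependent approach is meant to buy.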
Sharing Is Hard
In 1995 3M Corp. built an enterprise strategy for data warehousing to go along with its plan to present "one face, one voice" to customers. Though 3M has long had a strong centralised corporate IS group, "we had the equivalent of independent data marts for 25 years," says Al Messerli, manager of global information management for the St. Paul, Minn.-based maker of chemicals and office products. According to Messerli, "The business units and geographies were very independent, and they had their own decision support systems. All those different databases had different definitions of data. Three or four years ago we realised that this worked as long as you had autonomous units but not if you wanted to begin sharing data about customers, products and markets across the company." Before he began building the company's new data warehouse architecture, Messerli spent a year and a half researching dozens of data stores and lining up support among the different "data owners" in 3M's many business units. But old habits die hard in a company with as many fiercely independent business units as 3M. Despite strong support from 3M's executive staff, "the honest truth," says Messerli, "is we're still dealing with the pride-of-data-ownership issues in the different business units. We won't be done integrating the data warehouse until late 1999." Messerli's team is working to root out the "thousands" of home-grown applications that exist around the company and tie them to the data warehouse.
He is doing so quietly and gently. "There are large numbers of people in the business unit support groups that built and maintained their own decision-support applications, and now we are asking them to depend on others for some of those applications," he says. "In many cases, they've been doing it their way for 25 years, so I would call that a major cultural change." Messerli says these local data czars will see some compromises in their applications and in overall performance, but they are also beginning to see the many advantages of being able to access data from all areas of the company. For example, the first implementation of the global data warehouse in 1997 focused on replacing 3M's U.S. sales reporting and order history databases and was fed with data from the nascent (and only partially full) data warehouse as proof of the new centralised concept. "The first step in convincing people to give up their own systems and go to the shared data warehouse was proving we can deliver," he says.
Data Marts Will Overwhelm IT
State Farm Insurance Cos. discovered the hard way that its data marts couldn't exist independently without some kind of enterprise-linking strategy. After building a 3.8-terabyte data mart in 1996 for its actuarial group to help it more quickly calculate insurance rates, State Farm's data management group was besieged by requests from other departments that wanted to access the actuarial database, requests that could not be fulfilled. "The actuarial data mart was designed to handle large amounts of data rather than many users," says Scott Hartema, staff director of the State Farm systems department at the Bloomington, Ill.-based company. "It was designed to be a departmental system." But Hartema realised that if State Farm kept building data marts this way (department by department, each pulling data directly out of many different operational systems and each customising the content and definitions for its particular users), his IS group would have an integration nightmare on its hands.
And the business would have conflicting, inconsistent data flowing from those myriad data marts.
"With a well-focused, short-term project scope, you can show ROI quickly [with a data mart]," says Hartema. "It's hard to say to business managers, 'Do the big picture first,' because they get what they want from a data mart.
Marketing can put one of these things up in three months, and it's tough to say that's a bad decision. But there are issues of data quality and consistency that cannot be addressed at the departmental level." For example, if each data mart has a different way of describing a customer, which one is correct? Is a customer someone who walks through the front door or someone who actually spends money? "If there's a top-down effort to [decide one code for everyone], it may add three to six months to the project. But that's time well spent for the long term," Hartema says.
Cost studies of data mart strategies seem to agree with Hartema. To build six independent data marts, each with a distinct set of applications and the extra storage required for those applications, costs $1.4 million in initial acquisition costs and $1.4 million in annual maintenance for the set. But it costs only $560,000 in acquisition and $761,000 in maintenance to build six dependent data marts that link to a central data warehouse, according to Pine Cone Systems' Inmon.
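The arithmetic behind those figures can be worked through in a short sketch. The dollar amounts are the ones cited above; the function names and the multi-year horizon are assumptions made for illustration, and the sketch assumes that both maintenance figures are annual and stay flat from year to year.

```python
# Figures quoted above (US dollars, for a set of six data marts).
INDEPENDENT = {"acquisition": 1_400_000, "annual_maintenance": 1_400_000}
DEPENDENT = {"acquisition": 560_000, "annual_maintenance": 761_000}


def total_cost(costs, years):
    """Acquisition plus maintenance over `years`, assuming flat annual maintenance."""
    return costs["acquisition"] + costs["annual_maintenance"] * years


def savings(years):
    """How much less the dependent approach costs over a hypothetical horizon."""
    return total_cost(INDEPENDENT, years) - total_cost(DEPENDENT, years)


if __name__ == "__main__":
    for years in (1, 3):
        print(years, total_cost(INDEPENDENT, years),
              total_cost(DEPENDENT, years), savings(years))
```

Under these assumptions the first-year totals are $2,800,000 for the independent marts against $1,321,000 for the dependent ones, and the gap widens each year because the maintenance bill is the larger part of the difference.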
Data About Data Is Vital
Metadata is the information about the information contained in the data warehouse. It answers questions like, What system does this data come from? When was it last updated? How is a particular category of data, like "customer," defined in the database? Having many independent data marts across the company makes it difficult for new users (and IS, for that matter) to understand what the developers had in mind when they built the database and what data is contained inside. Having an enterprise plan for developing, installing and describing data marts gives IS and users the chance to create a single set of "assembly instructions" that everyone can read and understand.
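A minimal sketch of what one such metadata entry might hold, answering the three questions above. The class name, field names and sample values are hypothetical and are not drawn from any company's actual catalogue.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetadataEntry:
    """One record of 'data about data' for a warehouse field (illustrative only)."""
    field_name: str      # the category of data, e.g. "customer"
    source_system: str   # what system does this data come from?
    last_updated: date   # when was it last updated?
    definition: str      # how is the category defined in the database?


def describe(catalogue, field_name):
    """Return a one-line description of a field; raises KeyError if it is unknown."""
    entry = catalogue[field_name]
    return (f"{entry.field_name}: {entry.definition} "
            f"(source: {entry.source_system}, updated {entry.last_updated.isoformat()})")


# Hypothetical sample catalogue shared by IS and every data mart.
CATALOGUE = {
    "customer": MetadataEntry(
        field_name="customer",
        source_system="order-entry",
        last_updated=date(1998, 1, 15),
        definition="someone who has paid at least one invoice",
    )
}

if __name__ == "__main__":
    print(describe(CATALOGUE, "customer"))
```

Keeping one shared catalogue of this kind, rather than one per data mart, is what lets a new user look up a term like "customer" and get the same answer everywhere.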
Enterprise Strategies Require Quick Payback
Beginning in 1991, State Farm's Hartema spent 18 months researching an enterprise data model designed to first define all the types of data available, second, show how to pull the data out of various old computer systems to make it consistent and third, demonstrate how to pool data in a data warehouse.
But State Farm's business unit chiefs sent him back to the drawing board.
"As soon as we got out of the research mode, we encountered immediate resistance," says Hartema. "The problem was the amount of time people in the LOBs saw they'd have to invest in the effort, and they couldn't see the business benefit." Hartema and his team realised that they could not afford to put the business on hold while they pooled all the data the company would ever need. If Hartema and his team were going to get business support for an enterprise data sharing strategy, the business people were going to need access to at least some reasonable chunks of data during each phase of the project.
So Hartema is now building a data mart for the State Farm marketing group that includes policy information (name, address and so on) approved by all the departments within the company. Data for the mart will be coming from his nascent warehouse. This way, marketing gets quicker payback for its investment and Hartema gets new building blocks for the big data warehouse to come.
Enlist Information Czars
Every company has business people who are obsessed with data and ways to slice it for new insights. In the old days of mainframes, they were the people who used to hang around the data centre waiting for reports. Today they may well be running their own secret databases to serve their division or function, and they know lots about how people in the company use and share data. IS usually doesn't have their kind of knowledge. That's why it is critical to enlist these people as "data owners" to act as advocates and teachers within their business departments. At 3M, these people were easy for Messerli to spot. "The advantage of the old independent business unit structure was that each unit had people who did this stuff," he says. "So we got them to be on the core user team when we started the data warehouse project. How are you going to get them to accept it in the first place? They have to be part of it, right?"
Data Warehouses Require Constant Maintenance
Once business people get access to the data warehouse, watch out for the avalanche of requests. "As people start using [a warehouse], creative thinking starts to explode. I thought it would take people some time to adapt to the new technology," Hartema says.
But IS quickly discovers that there are trade-offs in terms of storage costs and access performance that come from adding new columns to the tables in the data warehouse. "We've gone through the data warehouse four or five times already to get rid of stuff that people don't use and use that space to add attributes that they will use," says Duane McKinley, director of information warehouse services for PCS Health Systems Inc., a drug benefit management company based in Scottsdale, Ariz.
As companies recognise the need to develop a strategy for sharing information across the company, CIOs start feeling more lonely. There is no "technology" or packaged software available to do the job. CIOs have to do much of the strategic work of data warehousing on their own or with consultants. The only constants CIOs can count on are that data warehousing will become more difficult and consume more IT resources. The number of data warehouses projecting more than 500 users jumped to 28 percent in 1997 from 6 percent in 1996, according to a Meta Group survey. Fragmented, departmental or divisional strategies for creating data warehouses and data marts will not survive this kind of surge in demand. IS must minimise duplication of effort through centralised, repeatable processes that do not create bureaucratic delays in getting the business the information it needs. The strategic challenge is huge but will only get bigger the longer IS waits. "You have to think big, plan big, then start small and deliver quickly," says Messerli. "That's not easy."
Senior Editor Christopher Koch can be reached at firstname.lastname@example.org.
Beat 'Em and Join 'Em
Different teaming strategies enlist aid and support across the company, but the goal is always the same: head off potential resistance through involvement.
At the Prudential Insurance Co. of America, which has a decentralised business and IS structure, Pat Komar, vice president of information systems at the Newark, N.J.-based company, developed a data group made up of representatives from the six different IS units in the company, each of which serves a product line business unit. "Most of the data reps in the different divisions had never met each other before this began," says Komar. "That's how decentralised we were." The data reps from the different IS groups within Prudential were charged with moving data from old computer systems to the centralised warehouse. They had to enlist "power users" in their business units to prioritise the specific information needs of the users and define unit-specific data. The combined teams of data reps and power users worked together to mould a company standard for such issues as data definitions, data consistency and technology infrastructure. The team also decided on the rules for building new data marts so that the marts fit into these guidelines while still letting the different units get at the specific information they needed.
At State Farm Insurance Cos., a technical group and a business group meet at least once a month, according to Scott Hartema, staff director of the systems department at the Bloomington, Ill.-based company. Data mart project teams also send representatives to the meeting to report on project status and raise any relevant problems or issues for group discussion. The group structure puts pressure on the various business representatives to enlist support and cooperation from their home bases because the central teams make decisions that will greatly influence people's ability to access and analyse important information locally. "These people recognise that they have responsibility to get buy-in from their own groups and from the data warehouse group," says Hartema. "Negotiating [data definitions] can be a tough process." Meanwhile, the group structure helps IT focus on whether its work on building the data warehouse and the various data marts is not only visible but understood.
Sometimes IS has the advantage of high-level executive support for an enterprise data warehouse strategy right from the beginning and can take time to plan. Al Messerli, manager of global information management for St. Paul, Minn.-based 3M Corp., was able to spend a year and a half researching and preparing an enterprise plan in parallel with 3M's broader effort to create "one face, one voice." During the course of the research effort, he created three teams that fanned out across 3M's many business units and geographic locations. The data research team interviewed hundreds of potential data warehouse users to understand their needs and requirements. The global data standardisation team brought together IS representatives to develop and implement standardised data formats and transform procedures to create consistent, integrated data in the data warehouse. The third team was entrusted with benchmarking the different hardware and software platforms available from vendors. "We tested five major database platforms and scaled from 1 to 200 concurrent users," says Messerli. "That separated the wheat from the chaff pretty quickly." With 3M's global workforce, people needed to be able to access the data warehouse from anywhere. Any platform that couldn't work with a Web browser was crossed off the list as was any platform unable to support access from 30,000 people. "If we had gone into this on a small scale, we would not have succeeded," says Messerli. "If we had gone in and started doing pilots or data marts without thinking about all these elements, we would have failed the first time we tried to do something with many users concurrently or with people who needed remote access. You have to think about these sorts of things before you get in too deep." -C. Koch