With the hype surrounding data marts as a faster alternative to building a data warehouse and with an increasing number of vendors offering data mart products, users are pushing many IS organisations to implement them. While they may seem attractive as a data warehouse alternative, enterprises should approach them with much the same planning strategies as they would a data warehouse.
A data mart is a decentralised subset of data found either in a data warehouse or as a standalone subset designed to support the unique business unit requirements of a specific decision-support system. Even the most strategic of enterprises may end up with data marts, and some implementation guidelines can aid in this process. But before building a data mart, enterprises should be aware of a few common myths.
Data marts are small. The industry debate surrounding size as the determinant of whether an enterprise has implemented a data mart or a data warehouse is naive. One definition suggests that a database of less than 50GB is a data mart and a database of more than 50GB is a data warehouse. In truth, a data mart focuses on the specific requirements of a particular business function and maintains data and a data model to meet that need. Absolute size of a data mart is not the distinguishing characteristic because it may contain several hundred gigabytes of detailed data. A data mart also could have only several gigabytes of summarised data to meet the requirements of an executive information system-oriented application. The issue is that the data model of a data mart (which can be a subset of a data warehouse) is provided specifically to meet the known requirements of the application.
Data marts are easier to build and faster to deploy. A single data mart is less complex than a data warehouse because it focuses on a specific business problem that needs to be solved, but many complexity issues surrounding data acquisition do not go away. Data acquisition involves the extraction, consolidation and integration of data from the operational data sources to the data mart or data warehouse.
Most data marts require data from more than a single operational source and thus require applications that can perform the data acquisition process from multiple operational sources. This process takes time because data integration demands the same planning, administration and data modelling effort as a data warehouse implementation. These upfront efforts can stretch deployment to an 18-month time frame.
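The three acquisition steps named above (extraction, consolidation and integration) can be illustrated with a minimal sketch. The two source systems, their field names and the sample rows are hypothetical, invented only to show why a second source adds reconciliation work; a real acquisition process would read from actual operational databases.

```python
from collections import defaultdict

# Hypothetical extracts from two operational systems (illustrative data only).
ORDER_SYSTEM = [
    {"cust_id": "C001", "amount": 120.0},
    {"cust_id": "C002", "amount": 75.5},
    {"cust_id": "C001", "amount": 30.0},
]
BILLING_SYSTEM = [
    {"customer": "c001", "region": "NE"},
    {"customer": "c002", "region": "ID"},
]


def extract(source):
    """Extraction: copy rows out of an operational source."""
    return [dict(row) for row in source]


def consolidate(orders):
    """Consolidation: total the order amounts for each customer."""
    totals = defaultdict(float)
    for row in orders:
        totals[row["cust_id"]] += row["amount"]
    return dict(totals)


def integrate(totals, billing):
    """Integration: reconcile differing key names and formats, then join."""
    region_by_customer = {row["customer"].upper(): row["region"] for row in billing}
    return [
        {
            "customer_id": customer,
            "region": region_by_customer.get(customer),
            "total_amount": amount,
        }
        for customer, amount in sorted(totals.items())
    ]


def load_mart():
    """Run the three steps in order and return the rows destined for the mart."""
    orders = extract(ORDER_SYSTEM)
    billing = extract(BILLING_SYSTEM)
    return integrate(consolidate(orders), billing)
```

Even in this toy case the two sources disagree on the name and the letter case of the customer key, which is the kind of mismatch that data investigation has to uncover before any mart can be loaded.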
A criticism of the data warehouse is that it becomes obsolete before it is rolled out because of quickly changing business requirements. However, data marts are more at risk of becoming obsolete because they focus on application-specific requirements, while data warehouses are built with application neutrality. As is often the case with data warehouses, by the time data marts are deployed, requirements have changed, departments have been re-engineered or functions have been moved, making data marts less useful than originally thought.
Data marts easily can grow into data warehouses. In reality, data marts with a focus on a business-specific requirement cannot be scaled easily. They deliver application-specific data models that contain data elements spanning multiple subject areas and are modelled to the requirements of the application. But because data marts are application-specific, adding supplementary data is difficult without first reworking the data model.
Also, because many infrastructure issues are ignored when implementing a data mart, it is more difficult to add to it when attempting to scale up with a greater breadth of data. For example, a data mart is a quick solution to a business problem, such as how to capture sales figures in Nebraska for the black patent leather pump. To add further information about that shoe, such as the percentage of new customers in Idaho, a new data model has to be created. In contrast, data warehouses -- built in stages, one or two subject areas at a time -- provide application neutrality and the infrastructure to grow when adding subject areas and applications.
With those myths dispelled, Gartner Group recommends some data mart guidelines for the informed executive.
1. Staff data mart projects with a team separate from data warehouse projects.
Do not underestimate the resources needed to implement a data mart. A common mistake is using the same data warehouse implementation team to build a data mart. That team likely will be distracted by the immediate tactical needs of the business area for which the data mart is being built, thereby failing to provide the emphasis and planning needed for the data warehouse architecture.
With separate implementation teams, the work from the data mart team can be used for the data warehouse at a considerable saving of resources.
2. Apply the data mart planning effort toward the data warehouse project.
The best approach to data mart implementation is to satisfy the requirements of the business area while targeting the planning and implementation effort toward the strategic goals of the data warehouse. Many vendors of data mart solutions (for example, Informatica Corp with its PowerMart suite) say such implementations are easy and avoid discussion of the more arduous tasks of data investigation, integration and transformation, all of which are required for data marts and data warehouses alike.
The data mart project leader should coordinate and communicate with the data warehouse team leader to avoid redundant data investigation efforts. The information gathered from this activity should be recorded in a repository that describes the data contained in the data warehouse, along with the source of that data and the transformations or derivations that may have been performed to create the data elements.
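One way to picture such a repository is as a shared catalogue that both teams write to and read from. The sketch below is only an assumption about what an entry might hold; the class names, fields and the rule that rejects duplicate entries are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ElementMetadata:
    """What one team learned about a single data element (hypothetical fields)."""
    name: str
    source_system: str
    source_field: str
    transformations: list = field(default_factory=list)


class MetadataRepository:
    """In-memory stand-in for a repository shared by the mart and warehouse teams."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Rejecting duplicates points the second team at work already done.
        if entry.name in self._entries:
            raise ValueError(f"{entry.name} is already documented")
        self._entries[entry.name] = entry

    def lookup(self, name):
        """Return the recorded metadata, or None if nobody has investigated it yet."""
        return self._entries.get(name)
```

A team that calls `lookup` before starting its own investigation can reuse the recorded source and transformation details instead of rediscovering them.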
3. Be sensitive to business areas with the most urgent need for a tactical solution.
With pressure from business users to obtain critical business data, enterprises must determine which areas have the most urgent need for a data mart. This process can be expedited by analysing the data to be used and selecting business areas with overlapping data requirements, consequently reducing the amount of data investigation and deployment time required. To minimise the resources dedicated to numerous data mart projects, management must prioritise the projects.
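The idea of selecting business areas with overlapping data requirements can be sketched as a simple set comparison. The business areas and data elements below are invented for illustration, and counting shared elements is just one plausible way to rank candidates; urgency and cost would also weigh in a real prioritisation.

```python
from itertools import combinations

# Hypothetical data requirements per business area (illustrative only).
REQUIREMENTS = {
    "sales": {"customer", "product", "order", "region"},
    "marketing": {"customer", "product", "campaign", "region"},
    "finance": {"order", "invoice", "ledger"},
}


def rank_overlaps(requirements):
    """Return (area, area, shared elements) tuples, largest overlap first."""
    pairs = []
    for (area_a, needs_a), (area_b, needs_b) in combinations(
        sorted(requirements.items()), 2
    ):
        shared = needs_a & needs_b  # elements both areas need
        pairs.append((area_a, area_b, sorted(shared)))
    pairs.sort(key=lambda pair: len(pair[2]), reverse=True)
    return pairs
```

Areas at the top of the ranking share the most data elements, so investigation done for one can largely be reused for the other.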
4. Limit the number of data sources to three.
Data acquisition is the most complex aspect of any data mart or data warehouse implementation. For some enterprises, only one or two relational databases serve as the data source. However, because the average IS department supports five to eight data management technologies and 30 to 50 data stores, the data acquisition and integration complexities quickly can become too much for most data mart products to handle efficiently. If a data mart implementation deemed to be critical requires data from more than three sources, the added time and resources needed to build the data acquisition process may be better applied to a different data mart opportunity. The alternative to building a new data mart is to restrict the number of sources and deploy a data mart of limited usability.
5. Define a policy to prevent data mart proliferation.
Each data mart implementation brings additional data acquisition processes that must be built and maintained, which increases deployment, maintenance and administration costs. What would have been a subproject of operational system maintenance now becomes more complex, with multiple acquisition processes to modify each time an operational system change occurs. Enterprises should therefore enact policies to prevent the proliferation of data marts and require that the data warehouse, once implemented, be used as the source of data for data marts.
Data marts can provide business areas with an application-specific resource and data model to aid in decision making. Using the guidelines in this column, enterprises can provide fast data mart solutions to areas in need while advancing efforts toward building a strategic data warehouse architecture.
Kevin Strange is research director at Gartner Group's Strategic Data Management Service in the US.