
Dealing with an accidental data warehouse

Johannesburg, 19 Sep 2006

In many organisations we find what is called the accidental data warehouse. These data warehouses have typically grown as the result of individual, uncoordinated projects.

Characteristics

In the accidental data warehouse we often find different data stores supporting reporting and analytics for different business units or different user communities, but with no integration between them. These are typically individual analytical or reporting silos with no common models and almost no common understanding of each other's functions in the business.

In this environment one also often finds tool-specific information delivery - an OLAP cube here for one community's OLAP tool, flat file extracts over there for another community's analytical tool, and one or more conventional reporting tools operating off data marts, some with dimensional models and others with hybrid, spit-and-bubblegummed data models.

In such an environment you often find the same data extracted more than once from the same source system, but each time transformed using different business rules. Standard definitions typically do not exist, and you often get the same term defined in multiple ways by different business communities. Users often create applications (such as complex Excel spreadsheets or loose little Access databases) on their desktops to further tweak the data before reporting from it.
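To make the problem concrete, here is a minimal sketch of how two teams, extracting the same orders from the same source system but applying different business rules for "revenue", end up publishing figures that disagree. The order data, team names and rules are invented for illustration only.

```python
# Illustrative source data: the same three orders, extracted by both teams.
orders = [
    {"id": 1, "amount": 100.0, "status": "shipped", "returned": False},
    {"id": 2, "amount": 250.0, "status": "shipped", "returned": True},
    {"id": 3, "amount": 300.0, "status": "pending", "returned": False},
]

def revenue_team_a(rows):
    # Team A's rule: revenue = everything that has shipped, returns included.
    return sum(r["amount"] for r in rows if r["status"] == "shipped")

def revenue_team_b(rows):
    # Team B's rule: revenue = shipped orders, minus anything returned.
    return sum(r["amount"] for r in rows
               if r["status"] == "shipped" and not r["returned"])

print(revenue_team_a(orders))  # 350.0
print(revenue_team_b(orders))  # 100.0
```

Both numbers are "correct" under their own rule, which is precisely why the two reports cannot be reconciled on the same desk.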

In short, it results in disparate and inconsistent reports, and total confusion should the same data ever end up on the same desk along different routes.

Implications

The biggest danger this introduces is that you get a different answer to the same question depending on whom in the organisation you ask. Now the question is: which of these answers actually finds its way to the exco? And is it the correct one?

Other implications are a lack of data availability across the organisation, lack of common definitions, serious problems in the compliance space, increased information delivery costs, excessively long "time to market" of any new ventures, and very crucially, a serious lack of confidence in the data.

This results in reduced business productivity. A recent survey showed that 70% of users' time is spent manipulating data and only 30% of their time is spent analysing the data. It is estimated that 75% of a typical IS function's time is spent on extracting, preparing and manipulating data, with only 25% on reporting, analysing and presenting it to the business.

What to do

The first step with any such serious problem is to realise that you have the problem, acknowledge its presence, and make the conscious decision to do something about it. Of course, that is not so easy in a large, highly federated environment.

The next step is to get proper team structures in place that can eliminate the problems while carrying on with day-to-day operations as well. This requires very strong leadership of the team and very strong business sponsorship. Such an initiative must be driven and mandated from the top. A small unit hidden away somewhere under finance or customer care is not going to have the clout to eradicate the inconsistency and independence demons. It also requires skilled and experienced business analysts, data modellers and data architects to resolve and correct the inconsistencies.

The next step is to scope and prioritise what exactly is going to be fixed. Which silos are going to be integrated, what are the business functions they support, and if any are excluded from this exercise, why is that allowed, and for how long?
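One way to keep this prioritisation honest is to reduce it to a simple, repeatable scoring exercise. The sketch below ranks silos by business value per unit of integration effort; the silo names, scores and tie-breaking rule are illustrative assumptions, not a prescribed method.

```python
# Illustrative inventory of silos with rough value/effort scores (1-10).
silos = [
    {"name": "finance mart",   "value": 9, "effort": 3},
    {"name": "marketing cube", "value": 6, "effort": 2},
    {"name": "ops extracts",   "value": 4, "effort": 4},
]

def prioritise(silos):
    # Highest value-per-effort first; ties broken by raw business value.
    return sorted(silos,
                  key=lambda s: (s["value"] / s["effort"], s["value"]),
                  reverse=True)

for s in prioritise(silos):
    print(s["name"], round(s["value"] / s["effort"], 2))
```

A silo that scores poorly here is a candidate for explicit, time-boxed exclusion rather than silent neglect.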

This is followed by detailed analysis, focussing on the usage of each silo's data. Which user groups are supported? Are there any cross references between these? What are the business value contributions of each silo? What is the granularity and frequency of the data utilised? Where are the data items originally obtained from?

The rest of the process is "divide and conquer". Based on the analysis performed, mini business cases must be drawn up, estimating the potential costs of fixing the problem and the resulting increase in revenue or decrease in operational costs.
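A mini business case of this kind often reduces to a few lines of arithmetic: the once-off cost of fixing a silo against the recurring benefit of having fixed it. The figures below are invented purely to show the shape of the calculation.

```python
# Illustrative mini business case for integrating one silo.
fix_cost = 500_000           # once-off cost of the integration work
annual_saving = 200_000      # reduced rework, duplicate extracts, reconciliation
annual_revenue_gain = 50_000 # new analysis the integrated data enables

annual_benefit = annual_saving + annual_revenue_gain
payback_years = fix_cost / annual_benefit
print(round(payback_years, 1))  # 2.0
```

Even a rough payback figure like this gives the sponsor something concrete to defend when the effort is challenged.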

This is followed by developing high-level subject area data models for each area, ensuring that these models are consistent with respect to each other and that they conform exactly on the various integration touch points. This is a crucial step. The reason it is done per subject area is to limit the scope of each exercise to a manageable chunk of work with a limited and sufficiently focussed audience. The crux, though, is that these subject area models must be integrated and conformed.
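Conformance on the touch points can be checked mechanically: any dimension shared by two subject area models must carry identical attributes in both. The sketch below shows such a check over two hypothetical models; the model contents are illustrative only.

```python
# Illustrative subject area models: dimension name -> list of attributes.
sales_model = {
    "customer": ["customer_id", "name", "segment"],
    "product":  ["product_id", "category"],
}
service_model = {
    "customer": ["customer_id", "name", "segment"],
    "case":     ["case_id", "severity"],
}

def conformance_issues(a, b):
    # Any dimension present in both models must be defined identically.
    return [dim for dim in a.keys() & b.keys() if a[dim] != b[dim]]

print(conformance_issues(sales_model, service_model))  # []
```

Here the shared "customer" dimension is defined identically in both models, so the check passes; any mismatch would surface as a named integration touch point to resolve.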

Part of this process is forcing the business to come up with standard definitions and business rules, which have to be accepted and signed off across the entire business. This is a political minefield to walk through, and this is where the business sponsor must ruthlessly blast down the obstacles to clear the way for the business analysts to perform this task efficiently. Political games and domain protection can seriously stall these efforts and frustrate the people doing the work.

From the integrated subject area models, an implementation roadmap can be drawn up - how the various parts of the models can be populated and what reporting and analysis can then be performed from these. Of course, this has to happen in conjunction with supporting on-going reporting and analytics requirements, so the implementation roadmap is often dove-tailed with the projects on the table at the time. The philosophy is basically: if you touch an area for new business requirements, fix it into the integrated subject area data models at the same time, and redo all reporting of the affected business areas off the improved data structures.

In this way the data warehouse is integrated silo by silo, in alignment with current business activities, every effort being cost-justified by its mini business case.

Summary

An accidental data warehouse is like a hereditary disease - it is not a shame if you inherited it, you just have to adapt your lifestyle and care programme to deal with it. Once an organisation has realised and acknowledged that it has an accidental data warehouse, and it has decided at a high enough level that it wants to do something constructive about it, there are processes and techniques that can be followed to slowly fix it without a drastic impact on the business, and eventually even eliminate all its resulting complications.
