
Six steps to successful data warehousing

Johannesburg, 26 Apr 2001

Pat Holgate, managing consultant at Teradata Solutions Group, discusses six steps to successful data warehousing.

1) Business justification and return on investment

One of the most fundamental aspects of a successful data warehouse is business justification and return on investment. In today's environment a business cannot proceed with solutions that do not show adequate ROI in a timely manner. Studies have shown that summary data provides low-value answer sets, mainly because the returned information generally raises more questions and leads to very little action by the end user. Value is only realised when the end user is able to act on the information; if the user cannot complete an analysis, the resulting actions are either non-existent or incomplete.

In contrast to this, databases that contain detail data in cross-functional business models have shown ROI in excess of 400%. It is not uncommon for true data warehouse systems to pay for themselves in the first year of operation.
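The ROI claim above is simple arithmetic, and it can help to make it concrete. The figures below are purely hypothetical, chosen only to reproduce the 400% example from the text:

```python
# Illustrative ROI arithmetic with hypothetical figures: a warehouse
# costing 2m that returns 10m in benefit over its first years.
def roi_percent(total_benefit, total_cost):
    """Simple ROI: net gain as a percentage of cost."""
    return (total_benefit - total_cost) / total_cost * 100

cost = 2_000_000      # hypothetical: build plus operations
benefit = 10_000_000  # hypothetical: value of actionable, detailed answers
roi = roi_percent(benefit, cost)
print(f"ROI: {roi:.0f}%")
```

On these assumed numbers the calculation yields 400%, matching the order of magnitude the article cites for detail-data, cross-functional warehouses.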

There are many examples of this type of ROI. One customer was able to run an analysis that completely paid for the system and two years of operations with fewer than three queries. The enabling factor was the ability to ask detailed questions of cross-functional data.

Surprisingly, very few data warehouse projects are undertaken with a clear idea of the business objective and justification. This is a very easy pitfall to avoid up front.

2) Determine data needs and concerns

Once the business objective is understood and the metric is defined, the next step is to determine the data needed to meet the objective. Note that at this stage it does not help to limit yourself to a single department's data. Taking churn avoidance as an example, the data necessary would be customer, billing and usage, service history, and any customer support information. This data may be spread across multiple departments, and limiting oneself to only one at this stage will cause a stovepipe implementation.

After the data needs are defined, you must also determine where the data is. If it lives in multiple places, is it consistent, and if not, why not? Can you capture the data at its inception (or at the most practical point thereafter)?
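A consistency check across sources can be sketched very simply. The records below are hypothetical: two invented source systems holding customer addresses, compared to surface mismatches before any loading is attempted:

```python
# Hypothetical sketch: flag customers whose address differs between two
# source systems, and customers present in only one of them.
billing = {"C001": "12 Oak Rd", "C002": "5 Main St"}
crm     = {"C001": "12 Oak Rd", "C002": "5 Main Street", "C003": "9 Hill Ave"}

# Customers in both systems whose details disagree
inconsistent = sorted(
    cid for cid in billing.keys() & crm.keys()
    if billing[cid] != crm[cid]
)

# Customers captured by one system but not the other
only_in_one = sorted(billing.keys() ^ crm.keys())
```

Even a toy check like this makes the "why not?" question answerable: each flagged record is a concrete case to take back to the owning group.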

If you cannot determine what data is necessary to meet your objective, or cannot locate the data once it is defined, then STOP and rethink. A common pitfall here is the need to get data from other groups. If this is the case, ensure that those groups are willing to cooperate in obtaining and cleansing the data. This is a common political barrier that must be challenged. If you cannot get the groups to cooperate, you may need to call in the executive committee to resolve the issue, but resolve it before proceeding.

3) Create logical picture (architecture and model)

Creating a logical picture is similar to drawing a blueprint before building a house. It is necessary for consistent communication and as a roadmap to where you are going. The logical picture assumes there are no barriers, whether of technology, politics or culture, that will prevent implementation. Do you want all the data together in one place? Do you need nightly (or hourly) updates? Will users access the system remotely? And finally, how is the data modeled from a business perspective? All of this leads to a definition of what you are going to undertake.

This step will also be useful in refining the scope of your implementation. Are you trying to do too much or too little? Will the end result give the users the necessary capability? It is important to not only build the picture of your phase one goals but your longer-term goals as well. This does not need to be a complete diagram, model and architecture but rather a good indication of where you are going. The plans get refined as you go along and will be referenced as you move from phase to phase.

The real advantage in creating a logical picture is that as you implement your first phase, you can perform a "sanity check" against your picture to see if you are putting processes into your first phase that will inhibit later phases from being accomplished.

4) Physical implementation

The physical implementation is an area fraught with risk, because much of your logical picture, which was created without barriers, may now encounter major ones. One of the easiest places to fail is platform selection: few warehouses have clear user objectives or metrics, so platforms and tools are not selected with any goal in mind. Even where business metrics and requirements exist, many platforms are chosen for reasons such as "it is our standard", "we have a site license" and "it is what our staff knows". Given this, one of the easiest ways to ensure success is to make certain your technology will live up to the challenge posed by the business opportunity.

All that said, assume the selection process is now complete. In all likelihood you will make some compromises to your logical picture in the transition to the physical world. When you need to change your model or architecture, it is important to document the change, the reason for it, and what was gained and lost as a result.

There have been cases where this step (moving from logical to physical) has taken months, and sometimes years! The problem is that after all the investment made, very few people are willing to admit to a wrong choice. Recognise when you are expending too much effort to make your system work, and be prepared to rethink your choices.

5) Usage and ROI audit

Once the data warehouse has been placed in production, it is very important to track the user benefits. A few issues arise with the initial implementation. The first is that the user community must understand that what they are getting is capability rather than just "faster" reports. The new system must give users the ability to do something that was not possible before, and then encourage that new usage. Query statistics such as SQL text, run times, execution frequency, rows returned and so on should be captured for later analysis.
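The statistics listed above can be captured with a thin wrapper around whatever executes queries. The sketch below is a minimal illustration, not a real warehouse API; the `execute` callable and all names are hypothetical:

```python
import time
from dataclasses import dataclass

@dataclass
class QueryStat:
    sql: str            # the SQL text that was run
    user: str           # who ran it
    elapsed_s: float    # wall-clock run time in seconds
    rows_returned: int  # size of the answer set

query_log = []  # accumulates one QueryStat per execution

def run_logged(execute, sql, user):
    """Run a query via the supplied executor and record its statistics.
    `execute` is a hypothetical callable taking SQL and returning rows."""
    t0 = time.perf_counter()
    rows = execute(sql)
    query_log.append(QueryStat(sql, user, time.perf_counter() - t0, len(rows)))
    return rows

# Usage with a stand-in executor that returns two fake rows:
run_logged(lambda sql: [("C001",), ("C002",)],
           "SELECT customer_id FROM churn_candidates", "alice")
```

Because every execution lands in one log, frequency falls out of simple counting later, without instrumenting each report separately.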

After the 90-day period, look at the usage to determine patterns: heavy users, infrequent users, long-running queries and queries with a high variance in run times. From this analysis you will be able to: 1) determine what tuning is necessary for well-known and often-run analyses; 2) determine how the system is being used to affect the business metrics identified in step 1 above; and 3) start to identify what the users are missing, in order to decide what new data would be most valuable to bring into the environment.
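The usage analysis described above amounts to simple aggregation over the captured statistics. The log entries, user names and the variance threshold below are all hypothetical, chosen only to illustrate the pattern:

```python
from statistics import mean, pstdev
from collections import Counter

# Hypothetical 90-day log: (user, query_name, run_time_seconds)
log = [
    ("alice", "churn_by_region", 12.0),
    ("alice", "churn_by_region", 150.0),
    ("alice", "daily_billing", 3.0),
    ("bob",   "daily_billing", 2.8),
    ("carol", "churn_by_region", 11.5),
]

# Heavy users: who runs the most queries
queries_per_user = Counter(user for user, _, _ in log)
heavy_users = [u for u, _ in queries_per_user.most_common(1)]

# Queries with a high variance in run times: tuning candidates
runtimes = {}
for _, name, secs in log:
    runtimes.setdefault(name, []).append(secs)
variance_flags = {
    # Assumed rule of thumb: flag when the spread exceeds the mean run time
    name: pstdev(secs) > mean(secs)
    for name, secs in runtimes.items()
}
```

On this invented log, the churn query is flagged (run times swing from seconds to minutes) while the billing query is not, which is exactly the shortlist step 5 asks for.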

This step is often overlooked due to the pressures that arise once the system is operational: batch processes not running as smoothly as desired, users identifying new needs (and where is that phase 2!), and few people really wanting to see whether the ROI was achieved (they will not change course now even if it is not providing payback, and it may highlight business failures on their part). Despite these forces, one must follow the process through to success. Without it, you end up building a "house of cards on a foundation of quicksand". This is the time to enforce the standards and the process before it is too late.

6) Leverage and extend to next business need

After the 90-day evaluation you are ready to start the next cycle of the process, using the initial system as the basis for continued growth and leverage. A good way to start is to go back to the initial business discovery results and look at what else was deemed critical at the time. Is there a good portion (say over 60%) of data overlap between the new need and the existing system? If so, it may be a good candidate; if not, what is the gain in bringing these two areas together? While there will certainly be benefit from the initial implementation, the real value of the warehouse is in the cross-functional capability that should be the goal of the second iteration. The data warehouse starts to become "greater than the sum of its parts". The evolution puts visibility on the "white space" between the functions, and that white space is what we call the "business".
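The 60% overlap rule of thumb is easy to operationalise once data needs are listed as entities. The entity names below are hypothetical, borrowed from the churn example earlier in the article:

```python
# Hypothetical entity lists: what the warehouse already holds versus
# what a candidate new business need requires.
existing = {"customer", "billing", "usage", "service_history", "support_calls"}
new_need = {"customer", "billing", "usage", "campaign_response"}

# Share of the new need already covered by the existing warehouse
overlap_pct = len(existing & new_need) / len(new_need) * 100
good_candidate = overlap_pct > 60  # the article's suggested threshold
```

Here three of the four required entities are already in place (75% overlap), so on the article's rule this need would be a good second-iteration candidate.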

By following the process and continuing to leverage the existing environment for further benefit, you should see short-term tactical benefits that support your longer-term strategic goals.


Editorial contacts

Pat Holgate
Teradata Solutions Group
path@nds.co.za