Phased and distributed BI in Africa

By Yolanda Smit for Harvey Jones Systems

Johannesburg, 27 Nov 2008

Business intelligence (BI) solutions based on the data warehouse approach are finally becoming a standard concept in South Africa. Most large corporations already own and manage a data warehouse, or are in the process of developing one.

The best practice of data warehousing is also close to being perfected; today it is surely closer to a science than an art. This is driven by experts such as Bill Inmon, widely regarded as the father of the data warehouse, and his peer Ralph Kimball. However, all current best practices apply to one standardised data warehouse that is developed, implemented and maintained at a central location and provides access to the whole corporation.

Usually, these centralised data warehouses are developed using a phased approach, due to the changing nature of end-user requirements. This also helps in managing end-user expectations as they see results much earlier in the life of the data warehouse project.

However, there is a different approach to data warehousing that Bill Inmon has touched on: the distributed data warehouse. This approach is unique in that it develops one standardised data warehouse but replicates it at each operation and maintains the copies in a distributed fashion.

Distributed data warehouses have various advantages:

* They simplify the reporting environment for each operation.
* They make information and reports easily accessible and simpler to maintain in a smaller, focused environment.
* Because they are hosted locally at each operation, they overcome the infrastructure and bandwidth issues that are often pervasive in large WANs.

This is especially valuable in enterprises where operations are spread throughout Africa, and WAN connectivity is limited.

As more corporates require their BI solutions to be extended to operations in Africa, they soon realise that a distributed data warehouse is probably the best approach for the roll-out. This is where the challenge rears its head: what is the best practice for developing a distributed data warehouse, especially with a phased development approach?

Extensive experience in the field of phased distributed data warehouse projects points to two major challenges:

Exponential growth in project activities

When starting on the first phase of the project, it is smooth sailing: The project kicks off with a well-planned development phase, followed by rigorous testing, deployment and end-user training.

After the successful deployment at the first operation, you might feel inclined to pat yourself on the back for a project well done, but this is where the fun starts. Usually, this is the point where the development team closes the book on Phase 1 and eagerly throws itself into the adventure of Phase 2.

As the project steers down this road, additional activities rear their heads and require attention. While you dive into the deep end of Phase 2, Operation 2 needs to be prepared for the Phase 1 implementation. Just as you finish deployment at Operation 2, you start change management there, while planning for deployment at Operation 3.

As you continue with deployment at Operation 3, Operation 2 returns after change management with demands for customisation, as the standard Phase 1 solution does not meet all their needs. In the meantime, Operations 4 and 5 are frustrated by the long wait for their BI solution, while Phase 2 needs user-acceptance testing and deployment at the first pilot site.

Before you know it, a small, contained project has erupted into a full-blown BI programme with a host of activities that need to be organised and coordinated. Maybe it's time to reconsider your career as a project manager and rather study to become an orchestra conductor.
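To make the pile-up concrete, the toy schedule below counts how many activities run in parallel in each period. It is a minimal sketch under deliberately simplified, hypothetical assumptions: each phase takes one period to develop, is then deployed at one operation per period, and each deployment is followed by one period of change management at that operation. Testing, UAT and customisation requests are left out, so a real programme would be busier still.

```python
from __future__ import annotations


def concurrent_activities(phases: int, operations: int) -> dict[int, list[str]]:
    """Map each period to the activities running in it.

    Hypothetical schedule: phase p is developed in period p, deployed at
    operation o in period p + o, and followed by change management at that
    operation in period p + o + 1.
    """
    schedule: dict[int, list[str]] = {}
    for p in range(1, phases + 1):
        schedule.setdefault(p, []).append(f"develop phase {p}")
        for o in range(1, operations + 1):
            schedule.setdefault(p + o, []).append(f"deploy phase {p} at operation {o}")
            schedule.setdefault(p + o + 1, []).append(
                f"change management for phase {p} at operation {o}"
            )
    return dict(sorted(schedule.items()))


def peak_load(phases: int, operations: int) -> int:
    """Largest number of activities that overlap in any single period."""
    return max(len(acts) for acts in concurrent_activities(phases, operations).values())


if __name__ == "__main__":
    for period, acts in concurrent_activities(3, 5).items():
        print(f"period {period}: {len(acts)} parallel activities")
```

With one phase and five operations the peak is two parallel activities; with three phases over the same five operations it rises to six, even in this stripped-down model.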

Version control

The diversification in project scope and activities leads to another obstacle: version control.

You have the standard solution on a development server that is used for Phase 2 development, while several replicas sit on different test beds, either in the lab at head office for data testing and verification, or at the various operations for user-acceptance testing. At various points fixes are made, and in some extreme situations ad hoc customisations are done. How do you stay in control of what has changed? And how do you merge so many different versions back into one standard data warehouse that reflects all the changes?

Several development tools can assist with version control, but the problem may persist, because its root lies in logistics rather than technology. Team members involved in deployment make fixes on the fly under pressure from the go-live D-Day. After the dust settles, they might not remember exactly what changes were made and why, and attempts to merge the various versions result in synchronisation errors.
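Tools cannot fix the logistics, but they can make drift visible. A minimal sketch, assuming each copy of the warehouse (the released baseline, the hub's development server and an operation's replica) can be exported as a mapping from object name to definition text such as DDL or a report definition; that export format and the object names in the example are hypothetical, and this is a generic three-way comparison against the last release rather than a feature of any particular BI tool. An object changed differently at the hub and at the replica is reported as a conflict instead of being silently overwritten.

```python
# Hypothetical export format: object name -> definition text (DDL, report spec, ...)
Definitions = dict[str, str]


def three_way_merge(
    base: Definitions, hub: Definitions, replica: Definitions
) -> tuple[Definitions, list[str]]:
    """Merge one replica's changes back into the hub.

    Returns the merged definitions and the names of objects that were
    changed in different ways at the hub and at the replica (conflicts).
    A missing key means the object does not exist in that copy.
    """
    merged: Definitions = {}
    conflicts: list[str] = []
    for name in sorted(base.keys() | hub.keys() | replica.keys()):
        b, h, r = base.get(name), hub.get(name), replica.get(name)
        if h == r:        # identical at both ends (changed or not)
            result = h
        elif h == b:      # only the replica changed it: take the site fix
            result = r
        elif r == b:      # only the hub changed it: keep new development
            result = h
        else:             # both changed it differently: flag for review
            conflicts.append(name)
            result = h    # keep the hub version until someone resolves it
        if result is not None:  # None means the object was deleted
            merged[name] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"fact_sales": "v1", "dim_site": "v1", "rpt_daily": "v1"}
    hub = {"fact_sales": "v2", "dim_site": "v1", "rpt_daily": "v2"}
    replica = {"fact_sales": "v1", "dim_site": "v1-fix",
               "rpt_daily": "v1-local", "rpt_local_kpi": "v1"}
    merged, conflicts = three_way_merge(base, hub, replica)
    print(merged)
    print("needs review:", conflicts)
```

In the example, the site fix to `dim_site` and the new local report flow back to the hub, the Phase 2 change to `fact_sales` is kept, and `rpt_daily` is flagged because both sides changed it.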

So what's best practice then?

How, then, do we overcome this? Is it possible to build a distributed data warehouse using a phased approach, or should we revert to the big bang approach?

The answer can probably be found in the analogy of a growing entrepreneurial business. During the first few years, everything is smooth sailing. The business's five staff members can easily cope with its scope. The business is founded on trust, and no standards, policies or procedures are required, as each staff member takes ownership of their responsibilities and is flexible enough to take on various roles.

However, as the business grows, the need arises to formalise the working environment. Slowly, different departments evolve, supporting functions such as HR and payroll become necessary, procedures are formalised, the need for team building grows, and communication channels and collaboration are formalised.

In the same way, a distributed data warehouse with a phased approach starts off fitting the textbook definition of a project, but pretty soon it explodes into a programme consisting of a multitude of projects.

Usually a BI team consists of a project manager leading a business analyst, a solutions architect, a couple of back-end developers and some front-end developers. The business analyst and solutions architect double as trainers and change agents in the change management process.

Consider restructuring the team and changing the paradigm from managing a project to managing a business. Form four separate teams, each taking ownership of a key area in the normal development lifecycle. These teams will overlap, so use senior team members such as the business analyst and solutions architect to bridge the overlap and ensure effective collaboration between the teams.

* The plateau development team focuses only on the development of the new phase of the solution.

* The testing team has the sole responsibility of coordinating and conducting quality testing. They conduct extensive testing with the development team at the end of each phase. This team also drives testing in preparation for each deployment at the different operations, after which they assist the deployment team with technical testing at the operation during deployment. Finally, they manage and monitor user-acceptance testing (UAT) at each operation.

* The deployment team takes full responsibility for coordinating and conducting the various deployments. They should do an extensive handover of technical aspects to the local IT teams so that technical support can be provided locally as far as possible. This team also takes responsibility for ensuring that all changes made during a deployment are merged back into the main solution at the hub, which supports version control.

* The training and change management team takes core responsibility for getting end-users to use the system effectively, whether through introductory sessions or product training on the generic training solution. They also take ownership of preparing the UAT workshops in collaboration with the testing team, and then facilitate those workshops. It may add more value to include qualified trainers (as opposed to techies) in this team, as literacy levels in Africa place more emphasis on proper training and change management.
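The deployment team's duty to merge on-site changes back into the hub is easier to meet if every fix is logged, with its reason, when it is made. As a minimal sketch, assuming the released baseline and an operation's replica can both be exported as mappings from object name to definition text (the `ChangeRecord` format, operation names and object names are hypothetical), a check like the following could run before the team leaves an operation. It lists every object in the replica that differs from the baseline but has no logged change with a reason, so gaps surface while people still remember why a change was made.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    """One on-site change logged by the deployment team (hypothetical format)."""
    operation: str
    object_name: str
    reason: str


def undocumented_changes(
    baseline: dict[str, str],
    replica: dict[str, str],
    log: list[ChangeRecord],
    operation: str,
) -> list[str]:
    """Objects that differ from the released baseline at one operation
    but have no logged change, with a non-blank reason, explaining them."""
    # Added, removed or edited objects all count as changed.
    changed = {
        name
        for name in baseline.keys() | replica.keys()
        if baseline.get(name) != replica.get(name)
    }
    # Only records for this operation with a real reason count as documentation.
    explained = {
        rec.object_name
        for rec in log
        if rec.operation == operation and rec.reason.strip()
    }
    return sorted(changed - explained)


if __name__ == "__main__":
    baseline = {"fact_sales": "v1", "dim_site": "v1", "rpt_daily": "v1"}
    replica = {"fact_sales": "v1", "dim_site": "v1-fix", "rpt_daily": "v1-local"}
    log = [
        ChangeRecord("Operation 2", "dim_site", "Depot codes missing from source extract"),
        ChangeRecord("Operation 2", "rpt_daily", "  "),  # blank reason: not accepted
    ]
    print(undocumented_changes(baseline, replica, log, "Operation 2"))
```

The check deliberately treats a blank reason as no record at all, since the point is to capture why a change was made, not merely that it happened.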

In conclusion

Be prepared: when you kick off a distributed data warehouse development, know that this is not just a project; it will become a pervasive part of the IT department's life.

Don't run away, though! The end result will definitely add value for the client, and in a geographically dispersed company in Africa, a distributed data warehouse will most probably be the only possible solution. Stick with the best practice of a phased approach, but change the team structure to cope effectively with the unique aspects of a distributed data warehouse.
