
Integration challenges in business intelligence

Johannesburg, 29 Mar 2007

The integration challenges in business intelligence (BI) extend well beyond the sphere of data integration.

Data integration must, by nature, be part of this discussion, but it is a single aspect of a multi-faceted landscape.

This article, therefore, will explore all the areas in typical BI projects where integration plays a role.

Hardware platforms

Modern BI platforms are most often heterogeneous - this extends all the way down to the hardware. A simple example is that the data warehouse database servers are often 64-bit Unix RISC or CISC hardware platforms, whereas the departmental data mart might be on an Intel 32-bit machine running MS Windows, and an end-user might well be using a Web-based interface from a disk-less workstation or even a mobile device.

Yet, architecturally, these machines all participate in the 'BI platform' and need to be integrated to work together to service the BI requirements.

Fortunately, advances in networking standards, particularly the popularity and almost universal implementation of TCP/IP, running on a variety of wired or wireless carriers, make it possible for these machines to communicate with each other, regardless of each machine's particular hardware architecture.

Other industry standards come to the rescue when it comes to accessing data transparently across hardware platforms. ANSI SQL is well established as a query language, and one can hardly imagine a world without it. ODBC and JDBC drivers exist for most vendor databases, making access to various relational data sources transparent to end-user applications. Similarly, the OLAP Council's MDAPI makes accessing various vendor OLAP sources transparent to OLAP front-end applications.
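The value of a standard driver API is that application code stays the same whichever database sits behind it. A minimal sketch in Python: any DB-API 2.0 driver (pyodbc, psycopg2 and so on) exposes the same connect/cursor/execute interface, and here the in-memory sqlite3 module simply stands in for a vendor database. The table and data are illustrative only.

```python
import sqlite3

# Any DB-API 2.0 driver exposes the same connect/cursor/execute calls;
# sqlite3 stands in here for whichever vendor database is actually used.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 100.0), ("south", 250.0)])

# The ANSI SQL query text stays the same regardless of the backing database.
cur.execute("SELECT region, SUM(amount) FROM sales "
            "GROUP BY region ORDER BY region")
rows = cur.fetchall()
print(rows)  # [('north', 100.0), ('south', 250.0)]
```

Swapping the connection line for another driver's `connect()` call is, in principle, the only change needed to point the same query at a different platform.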

The no-brainer - data integration

This kind of integration uses standard tools, in a staging area, to combine data from various source systems, and to integrate this data into a data warehouse or reporting repository. The standard techniques include sorting, de-duplicating, combining, enhancing, cleaning, householding, and a plethora of others.

Data profiling and data integration tools (ETL tools) are the clichéd 'dime a dozen'. A count of available ETL tools on the market, in both open source and proprietary forms, currently exceeds 30, and the number will probably increase.

The programming work done by an ETL programmer can be as simple as using a graphical user interface to define the ETL rules, or as complex as writing the transformations in a 3GL programming language like C, C++ or Java, a 4GL or interpreted language like Sybase and SQL Server's Transact-SQL or Oracle's PL/SQL, or a proprietary programming language like SAS. GUI tools often take a bit longer for the up-front work, but are more self-documenting and metadata-driven, and therefore more valuable in the long run. Roll-your-own solutions are typically quicker to develop, but the responsibility for metadata and documentation lies squarely on the development team, and if this is not done properly, the system may become difficult to maintain (and understand) in the long run.
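A hand-rolled transformation of the kind described above can be sketched in a few lines. This is a minimal, illustrative example only: two hypothetical source extracts are cleaned, combined, de-duplicated on a business key, and sorted for loading, which are exactly the staging-area techniques listed earlier.

```python
# Two hypothetical source extracts with overlapping, inconsistently
# formatted records (names and structure are illustrative only).
source_a = [{"id": 1, "name": " Alice "}, {"id": 2, "name": "bob"}]
source_b = [{"id": 2, "name": "Bob"}, {"id": 3, "name": "Carol"}]

def clean(rec):
    # Cleaning/enhancing: trim whitespace, standardise capitalisation.
    return {"id": rec["id"], "name": rec["name"].strip().title()}

staged = {}
for rec in map(clean, source_a + source_b):  # combining the sources
    staged[rec["id"]] = rec                  # de-duplicating on the key

warehouse_rows = sorted(staged.values(), key=lambda r: r["id"])  # sorting
print(warehouse_rows)
```

A real ETL tool wraps these same steps in metadata-driven rules, which is precisely the documentation and maintainability advantage described above.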

Design integration

The content of the data warehouse has to be consistently maintained across a potentially federated architecture, where data may be housed in anything from large central enterprise-wide reporting repositories to smaller distributed systems. No matter how centralised or distributed the storage of the various data elements is, it is still important, and usually quite a challenge, to maintain consistency across the various implementation cycles. Each effort to enhance the data warehouse with new data feeds has to adhere to the same standards, and has to fit into the larger enterprise-level model for the data warehouse.

One way to ensure such consistency is to utilise the concept of a bus matrix. This method is part of the widely used methodology that was published in the late 1990s by Ralph Kimball and his colleagues from the Kimball Group. In the bus matrix, each and every business process and its corresponding measures are documented, and mapped onto the applicable conformed dimensions. These dimensions are shared across the enterprise, and ensure that roll-ups and groupings are done exactly the same way regardless of the reporting platform.
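In its simplest form, a bus matrix is just a mapping from business processes to the conformed dimensions they share, and can be represented directly in code. The process and dimension names below are purely hypothetical, but the structure shows why conformance matters: any dimension shared by every process must roll up identically wherever it is used.

```python
# Hypothetical bus matrix: each business process mapped to the
# conformed dimensions it uses (names are illustrative only).
bus_matrix = {
    "orders":    {"date", "customer", "product"},
    "shipments": {"date", "customer", "product", "warehouse"},
    "payments":  {"date", "customer"},
}

# Dimensions conformed across every process; these must be defined once
# and shared, so roll-ups and groupings agree on every reporting platform.
shared = set.intersection(*bus_matrix.values())
print(sorted(shared))  # ['customer', 'date']
```

In practice the matrix is usually a design document rather than code, but the discipline is the same: the shared dimensions are defined once, and every process conforms to them.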

Conformance is as much, if not more, a business integration challenge than a technical integration challenge. The data that needs to be conformed may come from various systems and can be manipulated to conform to a standard structure and content, but unless the business was part of the process of defining and approving that standardised structure and standardised content, the conformance effort is probably doomed. This 'intellectual integration' is much more important for project success than the techies would like to believe.

At this point it is necessary to mention metadata, even though it is also discussed in a subsequent section. In a federated BI platform, the shared definition of a data element, especially a shared and distributed data element, is of utmost importance. It is no use that the item was defined, if that definition does not accompany it consistently wherever it gets used.

The tools needed to keep on delivering well-integrated well-designed data properly into an ever-growing BI environment are not something one gets off the shelf. One needs to settle into a consistent methodology that is used and understood by all. The team and the business need to have the skills and understanding to walk the entire journey.

Integrated reporting

Once a proper methodology is in place to ensure consistent delivery of the content of the data warehouse, it makes the job of front-end developers much easier. In a federated environment, however, there is still the added complexity of multiple front-end information delivery tools, from multiple reporting environments to dashboards and analytical applications. Across all of these technologies and audiences, the same consistency and conformance needs to be maintained, from the base level data, all the way through to consistent roll-ups and summaries of data, and the correct use of conformed hierarchies. It is no use if the same business hierarchy is represented in two different ways between, say, a report and an OLAP application.

Consistency is influenced by the frequency of data refreshes, differences in software, and the ability to exchange front-end tool metadata. The data refresh integration is part of the normal ETL integration. Consistency in reporting should be greatly supported by proper design integration and conformance. Standards in technology should address software integration between tools, and lastly, proper metadata integration should synchronise metadata between all the tools in the BI platform.

Metadata integration

There is a plethora of software tools in a BI platform. The obvious ones are the DBMSs, the ETL tools and the reporting tools. There are also system administration tools, design tools, data profiling and data mining tools, to name just a few. Most of these tools employ some form of internal metadata. The problem of integrating the various software tools thus becomes a problem of integrating the underlying metadata.

This is of course where the single-vendor platforms claim to have a large advantage over best-of-breed type implementations. Examples include the Oracle BI Suite and the new SAS 9 BI Platform. In these systems, the ability to share and define metadata across the entire platform is a built-in feature. In best-of-breed implementations, the sharing of metadata has to be achieved deliberately.

There are several industry standards, like the Object Management Group's Common Warehouse Metamodel (CWM), that attempt to assist with this dilemma. There are also third-party metadata integration tools available that import and export tool metadata for a variety of existing tools, with each tool plug-in carrying a price tag.

The option is also open to write your own metadata exchange. This may seem like a daunting task, but once one understands each tool in the environment, its metadata import and export capabilities, and the level to which it supports the various standards, it becomes a relatively simple XML mapping exercise to translate metadata from one tool to the other. Most Web and Java developers nowadays do XML mapping on a day-to-day basis. Some of these XML mappings can even be done by making clever use of XSLT style sheets.
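Such a roll-your-own exchange can be sketched in a few lines with a standard XML library. The export and import formats below are entirely hypothetical stand-ins for two tools' metadata schemas; the point is only that, once both formats are understood, the translation is a mechanical element-and-attribute mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata export from 'tool A' (format is illustrative only).
exported = """<columns>
  <col name="cust_id" type="INTEGER"/>
  <col name="cust_name" type="VARCHAR"/>
</columns>"""

# Translate it into the (equally hypothetical) import format of 'tool B':
# <col name=... type=...> becomes <field id=... datatype=...>.
src = ET.fromstring(exported)
dst = ET.Element("metadata")
for col in src.findall("col"):
    field = ET.SubElement(dst, "field")
    field.set("id", col.get("name"))
    field.set("datatype", col.get("type").lower())

print(ET.tostring(dst, encoding="unicode"))
```

The same mapping could equally be expressed declaratively as an XSLT style sheet; the procedural version above is just easier to extend with tool-specific quirks.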

Teamwork - the final frontier

We have become used to the word 'integrated' being used to describe post-1994 South Africa. This is a very applicable and correct use of the word, if we take the dictionary definition literally. This ability to take everyone in a nation and make them work and exist to 'become one' is just as applicable to a smaller team of people. In any medium to large project it is very important that everyone in the initiative is aligned in their thinking, aiming for the same end goal. This 'soft' issue is often overlooked, and can be a very significant contributing factor in late delivery, or even failure, of a BI project.

Each member of a BI team has to have the correct and applicable skills. Within the team as a whole (not necessarily in each individual) there must also be a large amount of experience. Through interaction and communication, these individual and group strengths should be maximised to deliver the larger project. If each person understands their role, and appreciates the strengths and weaknesses of the other members of the team, a unified and strong effort will result in good and timeous delivery.

Conclusion

This article gave a brief overview of some of the integration challenges that one will face during the lifecycle of a BI project. Around the world, IT and business professionals face these challenges every day, and there are unique and often simple solutions to some of them. By being aware of them up-front, it is possible to face each challenge head-on, and deliver a successful BI project or programme.
