When the going gets tough

The current trend toward integrating enterprise business applications into suites presents new challenges to the extracting, transforming and loading process.
By Julian Field, MD of CenterField Software
Johannesburg, 01 Oct 2002

Extracting, transforming and loading (ETL) from transactional systems into data warehouses has always been a complex task; the current trend toward integrating enterprise business applications into suites presents new challenges to the ETL process.

It was hard enough taking data from an enterprise resource planning (ERP) package and transforming it into information that could populate a data warehouse and produce useful business information. The task became much harder when new applications, such as customer relationship management (CRM) and supply chain management (SCM) - each with its own data sets, partly unique, partly not - were added to the corporate application profile.
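The basic extract-transform-load flow described above can be sketched in a few lines. This is an illustrative sketch only: the table names, columns and in-memory SQLite database are assumptions for the example, not any particular ERP package's schema or a real ETL tool's API.

```python
# Minimal ETL sketch: extract rows from a "transactional" source,
# transform them, and load them into a warehouse-style fact table.
import sqlite3

def extract(conn):
    """Pull raw order rows from the transactional store."""
    return conn.execute(
        "SELECT order_id, amount_cents, region FROM orders"
    ).fetchall()

def transform(rows):
    """Convert cents to a currency amount and normalise region codes."""
    return [(oid, cents / 100.0, region.strip().upper())
            for oid, cents, region in rows]

def load(conn, rows):
    """Insert the cleaned rows into the warehouse fact table."""
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()

# Usage with invented sample data (source and target share one
# connection here purely to keep the sketch self-contained).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, amount_cents INTEGER, region TEXT)")
db.execute("CREATE TABLE fact_orders (order_id INTEGER, amount REAL, region TEXT)")
db.execute("INSERT INTO orders VALUES (1, 1999, ' za '), (2, 500, 'uk')")
load(db, transform(extract(db)))
print(db.execute("SELECT * FROM fact_orders").fetchall())
# → [(1, 19.99, 'ZA'), (2, 5.0, 'UK')]
```

Real ETL tools add what this sketch omits: target models, metadata management and reusable transformation logic, which is precisely the gap discussed below.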

Now, corporations are tending towards integrated application suites, where ERP, CRM, SCM and other applications are built around a common data architecture. To most managers responsible for looking after data, this seems like a good idea: the administration burden is reduced, there are fewer stores of data located around the company and ensuring your data is current becomes easier. However, those who delve deeper into the process often find that these architectures can make extracting timely and meaningful data even more complex.

Some vendors now combine standalone applications with their CRM or SCM products to ensure their products can use these shared architectures with complex ERP systems. Yet more external applications are then added or bundled to assist in the extraction process, but these generally do not fulfil the complete ETL role: they can provide neither target models nor the transformation logic for those targets. The point many of these add-ons seem to miss is that data integration and movement are critical to the success of any data warehousing project, as are applications specifically designed for the ETL process.

Applications included with these systems are generally good at what they do, but the complexities involved in the ETL process are not easily implemented as an add-on or afterthought. The best solution in this scenario, in my experience, is to leave the ETL procedures to those applications designed for that purpose and let the ERP, CRM and other application systems do what they are best at.

In addition, the people involved in developing data warehousing solutions know that only a limited percentage of enterprise data is generated by transactional systems. External data from suppliers and other outside parties now plays a more important role in the overall mix of data sources, complicating the idea of a common data architecture. This needs to be taken into account when planning a data warehouse and the way to populate it with relevant data.

A popular option amid this potential for problems and ETL headaches is to stagger the warehouse implementation: the incremental, "small manageable steps" approach. This generally means staff or highly paid consultants writing scripts; long, drawn-out development and implementation cycles; and regular consulting bills that fall due immediately. It's best to keep the big picture in mind.

The drive to real-time application and data integration is another issue driving the redesign of many ETL offerings and influencing customers in their selection of these tools. There are three basic layers of integration: interface, application and data integration.

Interface-level integration is generally via Web-based interfaces to operational data. This provides a layer of abstraction, but does not actually integrate the data displayed at the user's request. Application integration can occur in various guises: applications can integrate data from disparate modules, or enterprise application integration (EAI, a topic requiring its own column) can provide a way for disparate applications to connect to one another.

Consistent commitment

Data integration is the deepest level of integration because it is not short-lived or operational in nature. It demands consistent definitions, rules, processes and outcomes, as well as a commitment to the long-term collection of mission-critical information.

Data integration is the discipline, or even the art, of information construction. Corporate data is collected and integrated with reference data in order to support decision-making across the enterprise and over any period of time - and is that not the ultimate goal of business intelligence?
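As a toy illustration of that discipline, consider resolving transactional figures against a shared set of reference data, so that every code carries one agreed, consistent definition across the enterprise. The region codes and sales figures below are invented purely for the example.

```python
# Data integration in miniature: transactional records resolved
# against shared reference data under one consistent definition.
sales = [("ZA", 120_000), ("UK", 95_000), ("ZA", 40_000)]  # (region code, amount)
reference = {"ZA": "South Africa", "UK": "United Kingdom"}  # shared reference data

totals: dict[str, int] = {}
for region_code, amount in sales:
    region = reference[region_code]          # one agreed meaning per code
    totals[region] = totals.get(region, 0) + amount

print(totals)
# → {'South Africa': 160000, 'United Kingdom': 95000}
```

The point is not the arithmetic but the shared `reference` table: when every system resolves "ZA" the same way, figures aggregated across the enterprise and over time remain comparable.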

Caveat emptor

Before I end this sermon, I would also like to tighten the noose and offer my two cents on two touchy issues: vendors and consultants. As far as vendors are concerned, a good filter is required. This filter will ignore any information they provide about competitors' products. As you are a potential customer, they need to come to your party, and you should only hear what they can demonstrate about their own products. This should be done using your own data and metadata.

As far as consultants are concerned, I generally prefer suspicion to acceptance of advice directing me to use a particular product under the threat of the project being "a total failure" otherwise. I also have a problem with people giving advice on the purchasing and use of complex tools when they intend to be far away after the initial niceties of pilot projects - when the fan gets really dirty.

Of course, not all vendors and consultants are out to pull the wool over your eyes, but in a tough economy, it is merely good sense to weigh your options carefully. And when it comes to complex processes such as ETL, making sure you get your money's worth is common sense.
