
Avoiding the metadata backlog


Johannesburg, 15 May 2003

In much the same way as a balance sheet provides a view of the financial resources of the organisation, metadata provides an inventory of the organisation's information assets. Which organisation can manage its human resources well without an up-to-date employee register? Which company would manage its buildings and installations without an up-to-date fixed asset register? But the burning question is - in this so-called information era, is our information asset register up to date? Bill Inmon summarised it very aptly when he said that metadata has become an industrial monument to underachievement!

Metadata is not even a new concept. We have been dealing with metadata for decades - from COBOL copy libraries, through database system catalogues and the IBM-type repository systems, and now, more recently, the metadata used in ETL (extract, transform and load) tools and in information delivery tools.

However, it is exactly this use of metadata in business intelligence that highlights its importance. Nowhere have the needs for, and the problems around, metadata been more pressing than in the business intelligence space. Here we have potentially hundreds of users sharing a common information resource, with integrated information from three to six or even more different business areas. When we refer to "margin", do we include or exclude VAT? When we refer to "headcount", do we include or exclude temporary staff? To ensure everyone plays off the same sheet of music, we must have standardised definitions and integrated metadata in this environment. But do we?
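To make this concrete, here is a minimal sketch of how such a standardised definition could be captured in a repository. It is written in Python purely for illustration; the structure and field names are assumptions for the example, not any particular product's schema:

from dataclasses import dataclass

# A hypothetical glossary entry; the fields are illustrative assumptions.
@dataclass
class BusinessTerm:
    name: str         # the shared business term, e.g. "margin"
    definition: str   # plain-language meaning agreed across business areas
    includes: list    # what the measure includes
    excludes: list    # what it explicitly excludes
    owner: str        # the business area accountable for the definition

margin = BusinessTerm(
    name="margin",
    definition="Selling price less cost of sales, per line item",
    includes=["trade discounts"],
    excludes=["VAT"],
    owner="Finance",
)

headcount = BusinessTerm(
    name="headcount",
    definition="Permanent employees on the payroll at month-end",
    includes=["permanent staff"],
    excludes=["temporary staff", "contractors"],
    owner="Human Resources",
)

Once definitions like these are captured centrally, the "include or exclude VAT" question is answered once, for everyone.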

In the business intelligence world, we require both business and technical metadata. Business metadata provides a road map for the business users on how to access the information in the data warehouse - in non-technical terms. It informs the users what information there is, what it means, how up to date it is and how to get to it. Technical metadata gives the developers and technical users the information about how it is implemented - who implemented what, how it is designed, how and when it is populated, how business rules were implemented, which versions run "live", and so on. In traditional terms, the technical metadata forms the system documentation, while the business metadata serves as the user manual.
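As a rough illustration of the two views, the sketch below records business and technical metadata for one warehouse table. Again, the layout and field names are assumptions made for the example, not a prescribed repository design:

# Two metadata views over the same warehouse table (illustrative only).
sales_fact_metadata = {
    "business": {
        "subject": "Sales",
        "meaning": "Daily sales per product, store and day",
        "currency": "Loaded nightly; current as at close of the previous day",
        "access": "Available through the standard sales reporting catalogue",
    },
    "technical": {
        "table": "FACT_SALES",
        "implemented_by": "data warehouse team",   # who implemented what
        "load_job": "etl_sales_daily",             # how and when it is populated
        "business_rules": ["margin = amount_excl_vat - cost_of_sales"],
        "live_version": "2.3",                     # which version runs "live"
    },
}

The business half reads as a user manual; the technical half reads as system documentation.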

So why can't we just use the information delivery tool's metadata? Because, in a large corporate, we find some users using Business Objects, others using Cognos Impromptu, others downloading data into Essbase cubes, and the Excel brigade manipulating the data in spreadsheets before reporting on it... So which tool's metadata do we use, how do we ensure consistency, and how do we put all of it together? This, among many other requirements, clearly shows why a proper integrated metadata repository is required.
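The sketch below shows one way an integrated repository helps: pull each tool's definition of a measure into one place and flag where they disagree. The per-tool definitions here are invented for the example:

# Hypothetical definitions of the same measure, as each front-end tool's
# own metadata layer might hold them.
tool_definitions = {
    "Business Objects": {"margin": "amount_excl_vat - cost_of_sales"},
    "Cognos Impromptu": {"margin": "amount_excl_vat - cost_of_sales"},
    "Excel workbook": {"margin": "amount_incl_vat - cost_of_sales"},
}

def find_inconsistencies(definitions):
    """Group the tools by their definition of each measure; report conflicts."""
    by_measure = {}
    for tool, measures in definitions.items():
        for measure, formula in measures.items():
            by_measure.setdefault(measure, {}).setdefault(formula, []).append(tool)
    return {m: v for m, v in by_measure.items() if len(v) > 1}

for measure, variants in find_inconsistencies(tool_definitions).items():
    print(f'"{measure}" is defined differently across tools:')
    for formula, tools in variants.items():
        print(f"  {formula} in: {', '.join(tools)}")

Here the Excel definition includes VAT where the others exclude it - exactly the kind of inconsistency a single integrated repository makes visible.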

However, merely plugging all the metadata into a single integrated repository is only the first step. We need to use this repository proactively. The right approach is to first capture the metadata, ensure it is correct - and only then to drive the processes from the metadata. For example, first capture the ETL mapping definitions, and then generate the ETL programs from the metadata. First capture the data warehouse table definitions, and then generate the tables from the metadata. First capture the metadata, and then generate the reports and analyses using the metadata. The minute you do the process or implementation first and capture the metadata afterwards, you create the loophole that the metadata can become the catch-up orphan. And it will... I can assure you, with an after-the-fact metadata approach, when the project schedule gets tight, the documentation and metadata are going to be cut. With a proactive, metadata-driven approach, the metadata cannot be cut, as that would cut the process as well.
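As a simple illustration of the metadata-driven approach, consider table generation: the table definition lives in the repository, and the physical table is generated from it, never the other way around. The definition and generator below are a sketch under those assumptions, with invented names:

# The table definition is captured as metadata first...
customer_dim = {
    "table": "DIM_CUSTOMER",
    "columns": [
        ("customer_key", "INTEGER NOT NULL"),
        ("customer_name", "VARCHAR(100)"),
        ("region", "VARCHAR(50)"),
    ],
}

# ...and the DDL is then generated from the metadata.
def generate_ddl(table_def):
    columns = ",\n    ".join(f"{name} {dtype}" for name, dtype in table_def["columns"])
    return f"CREATE TABLE {table_def['table']} (\n    {columns}\n);"

print(generate_ddl(customer_dim))

Because the table can only come into existence through the metadata, the metadata can never lag behind the implementation.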

But is it worth the effort? It is quite easy to calculate. Ask the data warehouse developers to do a proper impact analysis of everything that could change if a key table in the data warehouse were to change. Without proper up-to-date metadata in place, they have to search through table definitions, source-to-target mappings, ETL programs and information delivery reports to "guesstimate" an answer. Ask the data warehouse users what exactly a complex measure means, and where it comes from. Without proper metadata in place, they will deliberate the issue for hours and spend man-days scratching through source systems - when two simple queries on the metadata repository were all that was needed... There is no question: as a documentation, cataloguing and impact analysis tool, the return on investment in terms of productivity and accuracy is immeasurable when compared to the relatively low cost and effort of installing such a repository and using it correctly and proactively throughout the process.
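For the impact analysis case, the query really can be that simple once lineage is captured as metadata. The sketch below walks a set of invented lineage records to find everything that depends, directly or indirectly, on one table:

# Illustrative lineage records: (source object, dependent object).
lineage = [
    ("SRC_ORDERS", "FACT_SALES"),
    ("FACT_SALES", "etl_sales_daily"),
    ("FACT_SALES", "Monthly Margin Report"),
    ("FACT_SALES", "Sales catalogue"),
    ("Sales catalogue", "Regional Dashboard"),
]

def impact_of(obj, lineage):
    """Return everything directly or indirectly dependent on obj."""
    affected, frontier = set(), {obj}
    while frontier:
        frontier = {d for s, d in lineage if s in frontier} - affected
        affected |= frontier
    return affected

print(impact_of("FACT_SALES", lineage))
# Prints the ETL job, the two reports and the catalogue that would be
# affected if FACT_SALES changed (set ordering may vary).

What takes man-days of scratching through source systems without metadata takes seconds with it.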


Editorial contacts

Martin Rennhackkamp
PBT Group
(021) 551 0937
martin@prescient.co.za