
BI focus on data quality highlights enterprise need for real-time solutions

Research shows that ETL-centric approaches to data quality are giving way to real-time enterprise data quality solutions that create trusted views of data. Most new Trillium Software deals now include a real-time component.

By Master Data Management
Johannesburg, 31 Aug 2009

Over the last few years, various BI vendors have been raising the topic of data quality as a critical component of successful BI. It is quite right that this issue is being highlighted - the concept of "garbage in, garbage out" is even more relevant when data is being assimilated from multiple sources before being processed to create business insight. For one of our customers, for example, this led to a report showing that the best-selling product and the next best-selling product were in fact the same product - due to duplicated product data with differing product codes.
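The duplicated-product scenario above can be sketched in a few lines. This is an illustrative example with invented data, not the customer's actual system: aggregating sales by product code lets the same product occupy two positions in the report, while standardising the product name before aggregating exposes the duplicate.

```python
from collections import defaultdict

# Illustrative data: the same product captured under two codes.
sales = [
    {"product_code": "P-1001", "name": "Widget Deluxe", "units": 500},
    {"product_code": "PRD-17", "name": "WIDGET DELUXE ", "units": 450},
    {"product_code": "P-2002", "name": "Gadget Basic", "units": 300},
]

# Naive report: aggregate by product code. The "top two" products
# are really one product counted twice.
by_code = defaultdict(int)
for row in sales:
    by_code[row["product_code"]] += row["units"]

def standardise(name):
    # Shared cleansing rule: normalise case and whitespace.
    return " ".join(name.lower().split())

# Cleansed report: standardise the name before aggregating, so the
# duplicate rolls up into a single best-seller.
by_name = defaultdict(int)
for row in sales:
    by_name[standardise(row["name"])] += row["units"]
```

Real data quality tools apply far richer standardisation and matching rules than this, but the principle is the same: without cleansing, the report's ranking is an artefact of the duplication.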

The most common approach recommended by these vendors is to apply data cleansing rules during the ETL process so as to create a cleaned data source for analytics and reporting. This approach is inherently limiting, in that the cleaned data cannot be used for other purposes, such as MDM, nor does it resolve the fundamental issue that poor quality data should be managed at source - before it enters the database. We may get a more accurate report, but the stock control system could still place an automated order for more raw material - even though tons of the stuff may be in the warehouse under a different product code.

Another issue is that having different data in the operational and analytics systems means that we may get different answers to the same question - daily operational reports may show that our overall credit risk is spread between many customers. Aggregated reports may, however, show that the same customer has been captured many times with small variations of spelling or language, and that our exposure is in fact much higher than was previously thought. Discrepancies of this nature are a major source of risk to any analytics project, as they create doubt in the minds of the business users. Business users will not work with reports that they do not trust.
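The credit-exposure problem can be illustrated with a minimal fuzzy-matching sketch. The data and the similarity threshold are hypothetical, and the standard-library `difflib` comparison is a stand-in for the much more sophisticated matching rules a real data quality engine applies; the point is only that name variants which look like separate customers can conceal a concentrated exposure.

```python
from difflib import SequenceMatcher

# Illustrative accounts: the same customer captured twice with
# spelling variations, plus one genuinely different customer.
accounts = [
    ("Acme Trading (Pty) Ltd", 1_000_000),
    ("ACME Traiding Pty Ltd", 800_000),
    ("Umbrella Corp", 200_000),
]

def similar(a, b, threshold=0.8):
    # Crude similarity test; real matching engines use richer rules.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

# Greedy clustering: attach each account to the first similar cluster,
# accumulating exposure per (probable) real-world customer.
clusters = []  # list of (representative_name, total_exposure)
for name, exposure in accounts:
    for i, (rep, total) in enumerate(clusters):
        if similar(name, rep):
            clusters[i] = (rep, total + exposure)
            break
    else:
        clusters.append((name, exposure))
```

Here the per-record view shows three exposures of at most R1m, while the matched view shows a single customer carrying R1.8m - exactly the discrepancy that undermines trust in aggregated reports.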

According to Forrester Research analyst Rob Karel, the requirement to deliver a "trusted view of data to the right stakeholder at the right time" is driving the importance of real-time data management services. Karel believes reusable cleansing services and matching services are among the most significant services available to data architects trying to fix the confidence problem in the data warehouse, or the ERP or CRM system.

Real-time data quality tools, such as the Trillium Software System, standardise backroom processing through standard interfaces for the ERP, CRM, legacy and ETL applications. The same business rules applied for ETL can be applied to the legacy systems - whether on the mainframe, UNIX, the PC or even the Web - and can be applied as part of a real-time service-oriented architecture to clean and deduplicate new records before they enter the databases. Simplifying the ETL process in this way reduces the cost of data integration and maintenance, leading to a more flexible and cost-effective data warehouse.
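The idea of applying the same cleansing and matching rules at the point of entry can be sketched as follows. This is a simplified, hypothetical stand-in for a real data quality service: a shared standardisation rule is applied when a record arrives, and a match against existing records blocks the duplicate before it ever reaches the database.

```python
def standardise(record):
    # Shared business rule, reusable across ETL and real-time paths:
    # normalise case and whitespace in every field.
    return {k: " ".join(str(v).lower().split()) for k, v in record.items()}

class CustomerStore:
    """Toy stand-in for a database fronted by a real-time cleansing service."""

    def __init__(self):
        self.records = []

    def insert(self, record):
        clean = standardise(record)
        # Match against existing records; a duplicate is rejected (or,
        # in a real system, routed for stewardship review) at source.
        for existing in self.records:
            if existing["name"] == clean["name"]:
                return False  # duplicate: no second record is created
        self.records.append(clean)
        return True

store = CustomerStore()
accepted_first = store.insert({"name": "Acme Trading", "city": "Johannesburg"})
accepted_second = store.insert({"name": " ACME  trading ", "city": "Sandton"})
```

Because the same `standardise` rule serves both the batch ETL path and the real-time entry point, the operational and analytical systems stay consistent instead of diverging.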
