Data profiling continues to be the top obstacle for successful business intelligence (BI) projects.
This statement runs counter to the revelations contained in a report from Computer Sciences Corporation (CSC) and Financial Executives Research Foundation. They ran a survey that found data quality and information integrity have overtaken security as the top concern for CFOs.
The industry has been abuzz with commentary since this report came out, but the survey may have got the priorities a little wrong. Data quality is indeed a valid concern, and without good quality data, companies' warehouse and BI projects are going to yield variable results.
That's why, according to the CSC and Financial Executives Research Foundation survey, 58% of respondents said improving data quality and information integrity was their most critical technology concern. Information security previously held the top spot for two years, but it has now slipped to number four on the list.
But data profiling precedes quality in a data project; companies first need to know where all the data is and what is in each distributed store. Data quality is a far simpler task to perform - there are many tools available that will launch the space shuttle for the company with the right pilot at the helm. But telling the pilot where to go is the hard part.
And that's why, in my experience, most of the data quality tools being sold in SA aren't being used to clean the data - they're being used to profile it.
All profiling does, and it sounds deceptively simple, is show people what content they have, its structure and its quality. The sad truth is that most companies don't know what they have, where it is or what it consists of.
Ralph Kimball is a visionary in the data warehouse business: a well-known speaker, consultant and teacher, author of the "Data Warehouse Designer" column for Intelligent Enterprise magazine and of the best-selling books The Data Warehouse Lifecycle Toolkit and The Data Webhouse Toolkit.
Kimball states there is a growing trend of people who need visibility into their data if they are to better manage their businesses. He also acknowledges that most organisations face myriad disparate data sources, sometimes global, and that regulations and compliance mean these sources can no longer be overlooked.
The problem lies in the facts that the CSC and Financial Executives Research Foundation survey uncovered, but they are only half the story. The other critical component is that most organisations don't know where the data quality problem originates.
Data profiling reveals that. There are two components to a data profiling exercise: strategic and tactical. The strategic component identifies the data sources and shows whether each one contains clean data or requires some effort. The tactical component dives deeper into a data source to reveal the specific technical interventions needed to fix it.
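To make the two levels concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular profiling tool: the source names, the sample records, the helper functions and the rule that any missing value means a source "needs effort" are all assumptions made for the example.

```python
# Hypothetical sample data: two small "sources", each a list of records.
SOURCES = {
    "crm_customers": [
        {"id": "1", "email": "a@example.com", "country": "ZA"},
        {"id": "2", "email": "", "country": "ZA"},
        {"id": "3", "email": "c@example.com", "country": None},
    ],
    "billing_accounts": [
        {"id": "10", "balance": "120.50"},
        {"id": "11", "balance": "75.00"},
    ],
}


def is_missing(value):
    """Treat None and blank strings as missing (an assumed rule)."""
    return value is None or str(value).strip() == ""


def profile_columns(rows):
    """Tactical view: per-column row, missing and distinct counts."""
    columns = sorted({key for row in rows for key in row})
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        missing = sum(1 for v in values if is_missing(v))
        distinct = len({v for v in values if not is_missing(v)})
        profile[col] = {"rows": len(values), "missing": missing, "distinct": distinct}
    return profile


def profile_sources(sources, max_missing_ratio=0.0):
    """Strategic view: label each source 'clean' or 'needs effort'."""
    summary = {}
    for name, rows in sources.items():
        cols = profile_columns(rows)
        total = sum(c["rows"] for c in cols.values())
        missing = sum(c["missing"] for c in cols.values())
        ratio = missing / total if total else 0.0
        summary[name] = "clean" if ratio <= max_missing_ratio else "needs effort"
    return summary


if __name__ == "__main__":
    print(profile_sources(SOURCES))                   # strategic pass
    print(profile_columns(SOURCES["crm_customers"]))  # tactical drill-down
```

The strategic pass gives one verdict per source, which is enough to decide where to spend effort. The tactical pass then shows which columns in a flagged source carry the gaps. A real exercise would also check formats, value ranges and references between sources.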
That's why data profiling absolutely must precede any data quality project. It focuses the project, saves time and effort, and helps it reach a clearly defined goal.
* Mervyn Mooi is director at Knowledge Integration Dynamics.