
Ensure data quality through automated profiling

Many business leaders believe their data stores are critical corporate assets containing valuable information, and they are shocked when projects fail because of poor data quality.
By Julian Field, MD of CenterField Software
Johannesburg, 14 Dec 2004

Few organisations question the quality of their data and have no hesitation in using it to serve customers or make decisions affecting the future of their companies. It's only when silos are broken down and information is shared between departments for the first time, or databases are merged to cater for new applications, that the poor quality of information becomes evident.

What few people seem to realise is that the more a database is used by people creating, modifying and deleting information, the greater the risk of data degradation. Every time a client's personal information is added or changed; structures, relationships and fields are altered to meet changing business needs; or databases designed for one application are shared with others, there is a risk that data will be corrupted.

In an ideal world, all updates would be documented and available for reference when a problem or question arises. In the real world, however, the rush to accomplish business objectives means not enough care is taken to document processes and changes.

Even small databases, which are easier to control because they are confined to limited applications and often function perfectly in their own environments, are at risk when integrated with other systems. Indeed, over 80% of major integration projects fail or overrun their budgets because of the surprises bad data delivers.

Yet the risk potential of bad data is greater than cost overruns and long projects. The effect of months of implementation delays in a new mission-critical system, such as a customer relationship management or enterprise resource planning initiative, can cost a company the window of opportunity to achieve competitive advantage. Organisations today don't have the luxury of allowing delays in software installations or updates; they have to succeed the first time or be left playing catch-up.

The most efficient way database administrators can accurately identify potential data problems and prevent integration failure and other information-related problems is through data profiling. There are two data profiling methodologies in use today: manual and automated.

Manual data profiling

Manual data profiling is the most complex and time-consuming way to determine data quality. The process involves administrators or analysts developing data assessments from existing documentation and making assumptions about the company's data. Their aim is to deliver a best guess as to what and where data problems are likely to be encountered.

Since we`re talking about a small team of experts, they are not realistically going to be able to evaluate all the available data in even an average-sized company. Instead, they take a sample of the data and analyse it to determine if their initial suspicions were correct. If they were, they can then develop applications to fix the problems they encountered.

Further analysis will determine if this was successful and if there are other problems to address. And so this cycle of analysis and correction continues. Unfortunately, there is no guarantee that all issues will be identified and resolved, and undetected data problems could create havoc further along in projects relying on the data.
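The manual cycle described above can be pictured in code. The sketch below is purely illustrative and not from the article: it samples records and tests one analyst assumption against them (the rule, field names and data are all hypothetical).

```python
# Illustrative sketch of manual, sample-based profiling: draw a random
# sample and list the records that break an assumed business rule.
import random

def sample_violations(records, assumption, sample_size=100, seed=0):
    """Return the sampled records that fail the assumed rule."""
    random.seed(seed)
    sample = random.sample(records, min(sample_size, len(records)))
    return [r for r in sample if not assumption(r)]

# Hypothetical rule: every customer record carries a 4-digit postcode.
records = [
    {"postcode": "2196"},
    {"postcode": ""},       # missing value
    {"postcode": "21960"},  # malformed value
]
bad = sample_violations(
    records,
    lambda r: len(r["postcode"]) == 4 and r["postcode"].isdigit(),
)
```

The weakness the article points out is visible here: the analyst only finds problems covered by the rules they thought to write, and only within the sample they drew.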

Automated data profiling

A far more efficient, quicker and more accurate method of examining and fixing data is to use an automated data profiling tool. Using technology makes the process fast enough that all the data can be examined in a relatively short period. Although profiling tools differ, they can all automatically identify virtually all problems in corporate data, providing a complete picture of its structure, relationships and quality.
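To make the idea concrete, here is a minimal sketch (an assumption of the author's editor, not any particular product) of what an automated profiler computes per column: null rate, distinct-value count and the share of numeric values, which together expose missing and mistyped data without any prior assumptions.

```python
# Illustrative sketch of automated column profiling: summarise every
# field of a table rather than testing hand-picked rules on a sample.
def profile(rows, fields):
    """For each field, report row count, nulls, distinct values and
    the share of non-null values that parse as numbers."""
    report = {}
    for field in fields:
        values = [r[field] for r in rows]
        non_null = [v for v in values if v not in ("", None)]
        numeric = sum(
            1 for v in non_null
            if v.lstrip("-").replace(".", "", 1).isdigit()
        )
        report[field] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "numeric_share": numeric / len(non_null) if non_null else 0.0,
        }
    return report

# Hypothetical customer table with typical quality defects.
rows = [
    {"id": "1", "postcode": "2196"},
    {"id": "2", "postcode": ""},         # missing value
    {"id": "3", "postcode": "GAUTENG"},  # text in a numeric field
]
result = profile(rows, ["id", "postcode"])
```

Because the profiler scans every field of every row, the half-numeric postcode column surfaces by itself, which is the advantage over manual sampling the article describes.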

More than merely tools for use in specific integration exercises, the speed and accuracy of profiling tools mean they can also be used for regular data quality audits to ensure organisations always maintain maximum data integrity. Not only will this smooth data processes, but it will also reinforce the value of the organisation's data.

Data is the lifeblood of every organisation, but often the pace of business sees data quality suffering, with the result that the applications and people relying on the information perform below expectation. In addition, when companies or divisions merge and major database integration projects are initiated, data quality, or the lack thereof, can create costly barriers to success. The solution is to pre-empt any problems and ensure data integrity through automated data profiling to guarantee that information will not be the weak link in either internal or external business processes.
