One of the reasons IT departments exist is to exert some control over their company's technology and make sure it is able to efficiently store, distribute and manage information in a manner that adds value and improves business processes. Consequently, almost every task this department performs will have some effect on the organisation's data.
The ability to continually improve and overhaul corporate infrastructure without negatively impacting data is therefore a critical component of the IT department's reason for being. When starting a new company, technical staff do not have to worry about their impact on existing data since it does not exist; it's the work done in companies with a history of data collection and manipulation that will have the greatest effect on the business's information. And this is where data profiling comes into play.
Data profiling tools and best practices should be an automatic component of the planning process for any project that affects existing data and databases. The process of profiling identifies the data and its attributes, and then assesses the quality and complexity of the information.
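As a rough illustration of what identifying the data and its attributes can look like in practice, the sketch below summarises each column of a small table: how many rows it has, how many values are missing, how many distinct values appear and which types are present. The function names (`profile_column`, `profile_table`) and the sample `customers` data are hypothetical and stand in for whatever a commercial profiling tool would report.

```python
from collections import Counter


def is_missing(value):
    """Treat None and blank strings as missing (an assumption for this sketch)."""
    return value is None or str(value).strip() == ""


def profile_column(values):
    """Summarise one column: size, missing values, distinct values and types."""
    present = [v for v in values if not is_missing(v)]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "types": dict(Counter(type(v).__name__ for v in present)),
        "top_values": Counter(present).most_common(3),
    }


def profile_table(rows):
    """Profile every column found in a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample data, invented for illustration only.
customers = [
    {"id": 1, "country": "ZA", "phone": "011 555 0101"},
    {"id": 2, "country": "za", "phone": ""},
    {"id": 3, "country": "South Africa", "phone": None},
    {"id": 4, "country": "ZA", "phone": "0115550104"},
]

report = profile_table(customers)
for column, stats in report.items():
    print(column, stats)
```

Even on four rows the profile shows that `country` holds three different spellings and that half the `phone` values are missing, which is the kind of finding planners want before scoping a project.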
Everybody involved in the project will then understand what data they have to work with and can easily determine the relationships between information and applications. It will also allow planners to more accurately scope the extent of the project in terms of the costs and time required to adapt to or change data structures.
Without profiling, it's common to suddenly discover data problems halfway through the project, such as bad data, missing values or non-standard entries (also called finger trouble). These problems will be even greater when mapping to legacy databases. Apart from the hassles involved in resolving these issues, the diversion is likely to cause cost and time overruns as resources are refocused on the rush job of fixing the data. Knowing about these issues before starting will allow people to plan and cost the project more effectively.
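The problems named above, missing values and non-standard entries, can be surfaced early with a few simple rules. The following is a minimal sketch, assuming each column has a known expected format; the `RULES` patterns, the `find_issues` function and the sample `records` are all hypothetical.

```python
import re

# Hypothetical rules: a well-formed value must match its column's pattern in full.
RULES = {
    "country": re.compile(r"[A-Z]{2}"),  # assumed two-letter upper-case code
    "phone": re.compile(r"0\d{9}"),      # assumed ten digits starting with 0
}


def find_issues(rows, rules):
    """Return (row_index, column, problem, value) for each missing or non-standard value."""
    issues = []
    for index, row in enumerate(rows):
        for column, pattern in rules.items():
            value = row.get(column)
            if value is None or str(value).strip() == "":
                issues.append((index, column, "missing", value))
            elif not pattern.fullmatch(str(value)):
                issues.append((index, column, "non-standard", value))
    return issues


# Hypothetical sample data, invented for illustration only.
records = [
    {"country": "ZA", "phone": "0115550101"},
    {"country": "za", "phone": "011 555 0102"},
    {"country": "ZA", "phone": ""},
]

issues = find_issues(records, RULES)
for issue in issues:
    print(issue)
```

Running a check like this before the project starts turns mid-project surprises into a list that can be costed and scheduled.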
Benefits of a data profiling solution
A data profiling solution will ensure companies get the most out of their information by assisting in the following areas:
* Establishing benchmarks to monitor data quality problems.
* Making project planning more efficient and accurate by completing an inventory of data assets and their quality first.
* Analysing disparate data sources.
* Providing information that helps planners understand the costs of a project and the potential for a return on investment.
* Saving money and time spent correcting data-related problems.
* Providing information about defects and shortcomings in the data to help plan for future projects.
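To make the idea of a benchmark concrete, one simple option is to record a minimum acceptable completeness for each column and compare every new snapshot of the data against it. The sketch below assumes completeness (the share of non-empty values) is the metric of interest; the `BENCHMARKS` thresholds, function names and sample `snapshot` are hypothetical.

```python
def completeness(rows, column):
    """Fraction of rows that hold a non-empty value in the given column."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def check_against_benchmark(rows, benchmarks):
    """Return {column: (score, minimum)} for every column below its benchmark."""
    failures = {}
    for column, minimum in benchmarks.items():
        score = completeness(rows, column)
        if score < minimum:
            failures[column] = (score, minimum)
    return failures


# Hypothetical thresholds: minimum acceptable completeness per column.
BENCHMARKS = {"email": 0.9, "postcode": 0.5}

# Hypothetical snapshot of the data, invented for illustration only.
snapshot = [
    {"email": "a@example.com", "postcode": "2000"},
    {"email": "", "postcode": "8001"},
    {"email": "c@example.com", "postcode": None},
    {"email": "d@example.com", "postcode": "4001"},
]

failures = check_against_benchmark(snapshot, BENCHMARKS)
print(failures)
```

Re-running the same check on later snapshots shows whether data quality is holding steady, improving or slipping against the agreed baseline.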
What projects benefit from profiling?
Charl Barnard, GM, Knowledge Integration Dynamics
Of course, not every project needs to start with a data profiling exercise. Of those that do, the most important are data warehousing, integration and application development projects.
Populating a data warehouse, commonly known as the extract, transform and load (ETL) process, can involve accessing multiple source databases directly. Without an accurate profile of the structures and quality of the data, ETL takes much longer and costs more, and the results may still contain defects that are only discovered later in the project.
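One structural check worth running before any ETL work is a comparison of the source systems themselves: which columns each one has, and whether shared columns hold the same kind of value. The sketch below is an illustration under simple assumptions, with hypothetical source names (`billing`, `crm`) and invented rows; a column that contains only empty values is not seen at all by this approach.

```python
def column_types(rows):
    """Map each column name to the set of Python type names seen in it."""
    seen = {}
    for row in rows:
        for column, value in row.items():
            if value is not None:  # columns holding only None are not recorded
                seen.setdefault(column, set()).add(type(value).__name__)
    return seen


def compare_sources(left, right):
    """Report columns unique to each source and shared columns whose types differ."""
    left_types, right_types = column_types(left), column_types(right)
    shared = set(left_types) & set(right_types)
    return {
        "only_in_left": sorted(set(left_types) - set(right_types)),
        "only_in_right": sorted(set(right_types) - set(left_types)),
        "type_conflicts": sorted(c for c in shared if left_types[c] != right_types[c]),
    }


# Hypothetical extracts from two source systems, invented for illustration only.
billing = [
    {"customer_id": 101, "amount": 250.0},
    {"customer_id": 102, "amount": 99.5},
]
crm = [
    {"customer_id": "101", "region": "Gauteng"},
    {"customer_id": "103", "region": None},
]

diff = compare_sources(billing, crm)
print(diff)
```

Here the two systems store `customer_id` as a number and as text respectively, a mismatch that is far cheaper to find at the planning stage than during the load.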
The same benefits apply to a variety of data integration projects, including consolidating databases or migrating legacy systems. Application development benefits too: all programmers understand the importance of knowing the ins and outs of the databases for which they will be writing or updating code.
Finally, data quality projects are non-starters if they are not preceded by an in-depth profiling phase. It is no coincidence that we use the PQI acronym: data profiling projects first, followed by quality initiatives and finally integration processes.
Fortunately, data profiling has become much easier over the past few years as various products have been released to automate and simplify the process as much as possible. The process can be undertaken with custom-developed tools or even manually, but this is generally not recommended because of the time and labour it consumes.
In the high-pressure environments most of us work in, time is not always devoted to processes, such as data profiling, whose returns are not immediately measurable. However, starting with a profiling exercise will deliver returns later in terms of costs and time saved. It will also raise the business's opinion of IT by allowing planners to more accurately determine the costs and time required to finish a project without cutting corners or dropping functionality at the last minute.