Avoiding the data spaghetti junction
When data is the lifeblood of the organisation, quality content is not enough; tried and tested rules, controls and architectures are needed.
Despite all their efforts and investments in data quality centres of excellence, some enterprises are still grappling with data quality issues, and this at a time when data is more important for business than ever before.
The effects of poor data quality are felt throughout the enterprise, impacting everything from operations to customer experience, costing companies an estimated $3 trillion a year in the US alone.
Data quality will become increasingly crucial as organisations seek to build on their data to benefit from advances in analytics (including big data), artificial intelligence and machine learning.
We find organisations unleashing agile disruptors into their databases without proper controls in place; business divisions failing to standardise their controls and definitions; and companies battling to reconcile data too late in the lifecycle, often resulting in a 'spaghetti junction' of siloed, duplicated and non-standardised data that cannot deliver on its potential business value for the company.
Controls at source
Data quality as a whole has improved in recent years, particularly in banks and financial services facing the pressures of compliance.
However, most of this improvement happens on the wrong side of the fence: after the data has been captured. This may stem from challenges experienced decades ago, when validating data as thousands of clerks captured it could slow down systems and leave customers waiting in banks and stores while their details were entered.
But this practice continues to this day in many organisations, which still check data quality only after capture and so add unnecessary layers of resources for data cleaning.
Ensuring data quality should start with pre-emptive controls, with strict entry validation and verification rules, and data profiling of both structured and unstructured data.
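As an illustration of pre-emptive entry validation, the following is a minimal Python sketch that checks a record against capture rules before it is stored. The record layout (CustomerCapture), the field names and the rules themselves (such as a 13-digit identity number) are assumptions made for the example, not rules taken from any particular organisation.

```python
import re
from dataclasses import dataclass

# Hypothetical capture rule: a deliberately simple e-mail shape check.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class CustomerCapture:
    """Assumed shape of a record as it is typed in at the point of capture."""
    full_name: str
    id_number: str
    email: str


def validate_capture(record: CustomerCapture) -> list[str]:
    """Return the rule violations; an empty list means the record may be stored."""
    errors = []
    if not record.full_name.strip():
        errors.append("full_name is required")
    # Assumed rule for illustration: identity numbers are exactly 13 digits.
    if not (record.id_number.isdigit() and len(record.id_number) == 13):
        errors.append("id_number must be 13 digits")
    if not EMAIL_PATTERN.match(record.email):
        errors.append("email is not well formed")
    return errors
```

A capture screen or API would call validate_capture before writing the record, and reject or flag it when the returned list is not empty, so that errors are stopped at source rather than cleaned up downstream.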
Controls at the integration layer
Standardisation is crucial in supporting data quality, but in many organisations different rules and definitions are applied to the same data, resulting in duplication and an inability to gain a clear view of the business and its customers.
For example, the definition of the data entity called a customer may differ from one bank department to another: for the retail division, the customer is an individual, while for the commercial division, the customer is a registered business, whose directors are also registered as customers. The bank will then have multiple versions of what a customer is, and when data is integrated, there will be multiple definitions and structures involved.
Commonality must be found in definitions, and common structures and rules applied, to reduce this complexity. Relationships in the data must also be understood, with data profiling applied to assess the quality of the data.
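One way to picture a common structure for the customer example is sketched below in Python. The Party type, the field names in the retail and commercial rows, and the rule of keeping one entry per identifier are all hypothetical choices made for the illustration; a real integration layer would take them from the organisation's agreed definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    """Assumed common structure shared by all divisions."""
    party_id: str
    party_type: str  # "INDIVIDUAL" or "ORGANISATION"
    name: str


def from_retail(row: dict) -> Party:
    # Hypothetical retail layout: one individual with separate name fields.
    return Party(row["id_number"], "INDIVIDUAL",
                 f'{row["first_name"]} {row["last_name"]}')


def from_commercial(row: dict) -> list[Party]:
    # Hypothetical commercial layout: a registered business plus its directors.
    business = Party(row["registration_no"], "ORGANISATION", row["trading_name"])
    directors = [Party(d["id_number"], "INDIVIDUAL", d["name"])
                 for d in row["directors"]]
    return [business, *directors]


def integrate(retail_rows: list[dict], commercial_rows: list[dict]) -> dict[str, Party]:
    """Merge both sources, keeping a single Party per identifier."""
    parties: dict[str, Party] = {}
    for row in retail_rows:
        party = from_retail(row)
        parties.setdefault(party.party_id, party)
    for row in commercial_rows:
        for party in from_commercial(row):
            parties.setdefault(party.party_id, party)
    return parties
```

In this sketch, a director who is already a retail customer appears once in the integrated result, linked by the shared identifier, rather than as two unrelated customer records.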
Controls at the physical layer
Wherever a list of data exists, reference data should also be standardised across the organisation instead of using a myriad of conventions across various business units.
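A small Python sketch of standardised reference data follows. The country list and the aliases are invented for the example; they stand in for the differing conventions that business units might use for the same value.

```python
from typing import Optional

# Hypothetical canonical reference list, owned centrally.
CANONICAL_COUNTRIES = {"ZA": "South Africa", "GB": "United Kingdom"}

# Assumed local conventions found across business units, mapped to one code.
ALIASES = {
    "za": "ZA",
    "rsa": "ZA",
    "south africa": "ZA",
    "gb": "GB",
    "uk": "GB",
    "united kingdom": "GB",
}


def standardise_country(value: str) -> Optional[str]:
    """Return the canonical code, or None when the value is not recognised."""
    return ALIASES.get(value.strip().lower())
```

Values that return None are not guessed at; they would be routed to a data steward for review so that the reference list stays authoritative.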
The next prerequisites for data quality are cleaning and data reconciliation. Incorrect, incomplete and corrupt records must be addressed; standardised conventions, definitions and rules applied; and a reconciliation performed. What you put in must balance with what you take out. By using standardised reconciliation frameworks and processes, data quality and compliance are supported.
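The balancing principle can be sketched in Python as a comparison of record counts and control totals. The function name, the row layout and the amount field are assumptions for the illustration; the sketch treats rejected records as part of what comes out, so that nothing is silently dropped.

```python
from decimal import Decimal


def reconcile(source_rows, target_rows, rejected_rows, amount_key="amount"):
    """Compare counts and control totals: source versus target plus rejects."""

    def total(rows):
        # Decimal avoids the rounding drift of binary floats on money values.
        return sum((Decimal(str(r[amount_key])) for r in rows), Decimal("0"))

    count_in = len(source_rows)
    count_out = len(target_rows) + len(rejected_rows)
    total_in = total(source_rows)
    total_out = total(target_rows) + total(rejected_rows)
    return {
        "count_in": count_in,
        "count_out": count_out,
        "total_in": total_in,
        "total_out": total_out,
        "balanced": count_in == count_out and total_in == total_out,
    }
```

A load would be signed off only when balanced is true; otherwise the differences in counts and totals indicate how many records, and what value, went missing between source and target.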
Controls at the presentation layer
On the front end, where data is consumed, there should be a common data portal, or view into the data, with standard access controls. While the consumption and application needs of each organisation vary, 99% of users do not need report authoring capabilities, and those who do should not have the ability to manipulate data out of context or in an unprotected way.
With a common data portal and standardised access controls, data quality can be better protected.
Several practices also support data quality, starting with a thorough needs analysis and the definition of data rules and standards that meet business requirements and comply with legislation.
Architecture and design must be carefully planned, with an integration strategy adopted that takes into account existing designs and meta-data. Development initiatives must adhere to data standards and business rules, and the correctness of meta-data must be verified.
Effective testing must be employed to verify the accuracy of results and designs, and deployment must include monitoring, audit, reconciliation counts and other best practices.
With these controls and practices in place, the organisation achieves tight, well-governed and sustained data quality.
Mervyn Mooi is a director of Knowledge Integration Dynamics (KID), and also a key resource within the company's information management, data warehousing and business intelligence teams. He has been in the IT industry for 36 years, beginning his career as an operator at the CICS bureau in Johannesburg in the early 1980s. Thereafter, he was appointed as a programmer at state-owned oil exploration and production company SOEKOR. In 1986, Mooi joined Anglo American's head office IT department where he remained for almost 12 years. Here he progressed to become a senior programmer, analyst, database administrator and technical support specialist. After completing his degree in informatics, he then left to join Software Futures, where he worked as a senior consultant for 18 months in the data warehousing and business intelligence arena. Mooi joined KID in 1999 as a data warehouse and business intelligence specialist. Mooi's experience in ICT disciplines includes operations, business and systems analysis, application development, database administration, data governance/management, data architecture/modelling, production application and systems software support, data warehousing and business intelligence. He now focuses on enterprise information management, information governance and cloud solutions.