2009: A data odyssey

This is the first in a three-part series of articles on how we came to be in a data crisis, and what to do about it.

Johannesburg, 28 Jul 2009

Greek mythology tells us about Sisyphus, the character doomed forever to roll a rock up a hill. Sweating, veins bulging, muscles straining, he would get the massive rock to the top of the hill, whereupon it would roll straight back down to the bottom, and Sisyphus would begin the task all over again. This scenario was set to play itself out for eternity for poor, departed Sisyphus. The ancient Greeks knew torture.

That's what it feels like for anyone involved in the business of data quality. If a company comes to the realisation that it has a data problem, it begins to work on it, and that "if" is an important qualification: not many companies, in our experience, concede that there is a data problem. They get by with a multitude of data errors and their consequences, either not realising, not caring, or shrugging: "it's just one of those things you have to live with".

The consequences can be costly: damaged customer relationships, missed opportunities in the market and straightforward lost revenue.

And the lost revenue is substantial: returned postal items alone, averaging 12% across industries, cost companies millions a year in SAPO penalties and return handling, not to mention the matching 12% reduction in cash flow from invoices that never reach their recipients and so go unsettled.

The very foundation of customer relationship management (CRM) is accurate customer data, and if you can't communicate correctly and accurately with existing and potential customers, what chance do you have of earning or retaining their business?

Then there is the issue of business intelligence: premises assumed and conclusions reached on the assumption that corporate data is good and clean. When it is not, this can be one of the costliest consequences of poor data quality.

Why so hard?

Why is data so hard? The very word gives the story away. It derives from the Latin verb 'dare', meaning 'to give'. Its participle, 'datum', means 'given'; the plural, 'data', means things that are given, or accepted. In business, then, we accept the data given to us, and we question neither it nor the assumptions arising from it. The problem is that our data is never 100% sound.

Data needs to be represented as something: in an analogue world, it was typically represented as a Roman or an Arabic numeral (though the Sumerians, with cuneiform, probably got there first). The Romans understood early on how to communicate data: miles on a milestone, time on a sundial, the structure of an army. But the Roman system of counting was cumbersome, so the world adopted the Arabic, or decimal, system, which, with its 10 numerals (the same as the number of fingers we have), is still in use today and laid the foundation for the world we know.

Binary method

A new problem arose around the middle of the 20th century, when scientists realised computers needed a different method of counting. So arose the binary method, in which a switch or transistor is either on or off: on means 1, off means 0. This system, built on powers of 2 (each bit offers two choices, 1 or 0), gave rise to the byte of eight bits (eight being 2 to the power of 3), capable of representing 256 distinct values. A byte could now represent something, such as an alphabetic letter, a numeral or a punctuation mark. Data had meaning and life and purpose.

The next problem arose when commercial entities such as IBM, Honeywell, Univac, HP, Control Data and Burroughs, to mention six, each developed its own version of how data should be represented. To overcome this hurdle, the American Standard Code for Information Interchange was created; today most of us know it as ASCII, and it is the standard by which the characters you type at your keyboard are represented.
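To make these two ideas concrete, here is a minimal Python sketch, purely illustrative, showing how a single character is carried as an ASCII code inside one eight-bit byte (the character 'A' is chosen only as an example):

    # A byte is eight bits, so it can hold 2 to the power of 8 = 256 distinct values.
    values_per_byte = 2 ** 8

    # ASCII assigns one of those values to each character.
    code = ord("A")             # 65
    bits = format(code, "08b")  # '01000001': the eight bits of the byte

    print(values_per_byte, code, bits)  # 256 65 01000001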

ASCII provided a common foundation for communication between computers, but only at a low level. Over time, especially in the last decade, the need has arisen for computers to communicate at a higher level, ideally at a process level, where two or more computers exchange forms that replicate internal business processes. The appropriate fields on my invoice map to those on your matching purchase order, leading to electronic, non-manual settlement.

So arose the integration format known as XML, or eXtensible Markup Language, which allowed companies to agree on standards for interpreting the fields on each other's business forms.
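By way of illustration only, the Python sketch below parses two hypothetical XML snippets, an invoice and a purchase order, and matches the invoice's fields to the purchase order's; the element names are invented for the example rather than taken from any agreed XML standard:

    import xml.etree.ElementTree as ET

    # Hypothetical documents; the element names are invented for illustration.
    invoice_xml = "<invoice><orderRef>PO-1001</orderRef><total>4500.00</total></invoice>"
    po_xml = "<purchaseOrder><number>PO-1001</number><amount>4500.00</amount></purchaseOrder>"

    invoice = ET.fromstring(invoice_xml)
    po = ET.fromstring(po_xml)

    # Map the invoice's order reference and total to the purchase order's
    # number and amount: the field-level matching that would precede
    # automatic, non-manual settlement.
    if (invoice.findtext("orderRef") == po.findtext("number")
            and invoice.findtext("total") == po.findtext("amount")):
        print("Invoice matches purchase order; settle electronically.")
    else:
        print("Mismatch; route for manual review.")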

Of course, a Tower of Babel of XML arose, with 41 variants, including Microsoft's own.

Now there were two major challenges: how to allow computers to exchange data in a standard way, and how to resolve the issue of data quality. In other words: how to interchange data, and how to fix the enduring, recurring problem of poor data quality. But that is the subject of the next instalment in this series.

Editorial contacts

Jeanne Swart
Predictive Communications
(011) 452 2923
jeanne@predictive.co.za