Break it down

Canonical data breaks down data into its simplest form, while still meeting business requirements.

By Mervyn Mooi, director of Knowledge Integration Dynamics (KID), who represents the ICT services arm of the Thesele Group.
Johannesburg, 21 Aug 2015

Most companies did not suddenly emerge with hundreds or thousands of employees, hundreds of thousands of customers and millions in turnover. They grew organically, or by merging with, being acquired by or acquiring other companies.

Growth, though mostly desirable in the business world, creates its own problems and challenges within a company. It is likely the cause of one of many companies' biggest headaches - alignment. For example, one part of the business has no idea what another is up to, and executives struggle to get at the information they need to make educated decisions about how to run the growing, merged or acquired business. Furthermore, capturing, recapturing and synchronising data from the full spectrum of systems is a huge manual process - this includes data on printed sheets of paper and even in apps on mobile devices.

Most programs, whether apps, processes or anything else, pass data to and from an underlying data store, such as a database, file or storage repository. Even Outlook is just a fancy front-end sitting on top of a data store, and business systems are no different. Many business applications and processes share a data store - some companies go a step further and create data warehouses and master hubs that collate, integrate and standardise the data from many sources.

The next problem is that so many data stores need to share data. It may come from manufacturing, finance, the product department, human resources and the helpdesk - from all over the business. Sharing that data helps executives and management make smarter decisions about the future based on what's happening in the business today. It also helps improve customer service, makes the supply chain more efficient and delivers a raft of other business benefits.

Spoke in the wheel

Incompatible data is a major snag in system integration. Canonical data helps resolve that issue. Essentially, it means breaking the data down into the simplest possible form that still meets the business's requirements. Even better, there are standard canonical data models for different industries, so much of the work is already done. But not all of it.

The standard models represent typical business subject areas. However, those may not relate to all the business' needs. Models will have to be developed and customised accordingly. A business can figure out the gaps by mapping the data items within the data stores of the subject areas back to the data modelled in the canonical model.
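The gap analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the canonical model, the source systems, the field names and the mapping table are all hypothetical, and a real exercise would draw them from the company's own data stores and chosen industry model.

```python
# Hypothetical canonical model: subject area -> canonical data items.
CANONICAL_MODEL = {
    "customer": {"customer_id", "full_name", "date_of_birth"},
    "account": {"account_id", "customer_id", "opened_on"},
}

# Hypothetical mapping: (source system, source field) -> (subject area, item).
# None marks a source field that has no equivalent in the canonical model.
SOURCE_MAPPING = {
    ("credit_card", "cust_no"): ("customer", "customer_id"),
    ("credit_card", "cust_name"): ("customer", "full_name"),
    ("home_loans", "acc_no"): ("account", "account_id"),
    ("home_loans", "loyalty_tier"): None,
}


def find_gaps(model, mapping):
    """Return canonical items no source feeds, and source fields the model lacks."""
    covered = {target for target in mapping.values() if target is not None}
    unfed = sorted(
        (area, item)
        for area, items in model.items()
        for item in items
        if (area, item) not in covered
    )
    unmodelled = sorted(src for src, target in mapping.items() if target is None)
    return unfed, unmodelled


unfed_items, unmodelled_fields = find_gaps(CANONICAL_MODEL, SOURCE_MAPPING)
print("Canonical items with no source:", unfed_items)
print("Source fields missing from the model:", unmodelled_fields)
```

The first list points to data the business still has to source; the second points to places where the standard model may need to be extended or customised.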

However, a company can still run into snags. Its data modellers must ensure the sources of the data being used are the correct ones. There can often be more than one source of data - imagine a customer's name in a bank, which may originate in the credit card, home loan, transmission or another division - so a merged (integrated) set of that data will need to be created. The next headache is ensuring the customer's name is correctly spelled across data sources, for example. It is also vital to ensure all the data is there.
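To make the bank example concrete, here is a minimal Python sketch of building an integrated set of customer names from several divisions. The division extracts, the shared `customer_id` key and the sample records are assumptions made for illustration. The sketch deliberately flags conflicts and gaps for a person to resolve rather than guessing which spelling is correct.

```python
from collections import defaultdict

# Hypothetical extracts from three divisions, keyed by a shared customer ID.
SOURCES = {
    "credit_card": [
        {"customer_id": 1, "full_name": "Thandi Nkosi"},
        {"customer_id": 2, "full_name": "John Smith"},
    ],
    "home_loans": [
        {"customer_id": 1, "full_name": "Thandi  Nkosi"},
        {"customer_id": 2, "full_name": "Jon Smith"},
    ],
    "transmission": [
        {"customer_id": 3, "full_name": ""},
    ],
}


def integrate(sources):
    """Merge names per customer; flag conflicting spellings and missing names."""
    names = defaultdict(dict)  # customer_id -> {source: whitespace-normalised name}
    for source, records in sources.items():
        for record in records:
            name = " ".join(record["full_name"].split())
            names[record["customer_id"]][source] = name

    merged, issues = {}, []
    for customer_id, by_source in sorted(names.items()):
        # Compare case-insensitively, ignoring sources that hold no name.
        distinct = {n.casefold() for n in by_source.values() if n}
        if not distinct:
            issues.append((customer_id, "missing name"))
        elif len(distinct) > 1:
            issues.append((customer_id, "conflicting spellings"))
        else:
            merged[customer_id] = next(n for n in by_source.values() if n)
    return merged, issues


merged_names, data_issues = integrate(SOURCES)
print("Integrated names:", merged_names)
print("Needs attention:", data_issues)
```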

Source custodians (the credit card, home loans and transmission divisions in the example above) also need to agree to standardise the structure, collection and capture of that data in future, or it throws the canonical model out all over again. That's achieved through artefact definitions (metadata). These work like a dictionary of definitions describing what data is collected, how, when, where, why, and what it will be used for in the process.
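One way to picture such a dictionary of definitions is as a small data structure that incoming records can be checked against. The Python sketch below is an assumption-laden illustration, not a metadata management product: the `ItemDefinition` class, its fields and the two sample entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ItemDefinition:
    """Hypothetical metadata entry for one canonical data item."""
    description: str                    # what is collected
    captured_by: str                    # how and where it is captured
    purpose: str                        # why, and what it is used for
    is_valid: Callable[[object], bool]  # agreed capture rule


# Hypothetical dictionary agreed by the source custodians.
DICTIONARY = {
    "customer_id": ItemDefinition(
        description="Unique customer number shared by all divisions",
        captured_by="Assigned at onboarding",
        purpose="Linking records across data stores",
        is_valid=lambda v: isinstance(v, int) and v > 0,
    ),
    "full_name": ItemDefinition(
        description="Customer's name as it appears on their ID document",
        captured_by="Branch or online onboarding form",
        purpose="Customer identification",
        is_valid=lambda v: isinstance(v, str) and v.strip() != "",
    ),
}


def check_record(record, dictionary):
    """List every way a record departs from the agreed definitions."""
    problems = []
    for name, definition in dictionary.items():
        if name not in record:
            problems.append(f"{name}: missing")
        elif not definition.is_valid(record[name]):
            problems.append(f"{name}: invalid value")
    for name in record:
        if name not in dictionary:
            problems.append(f"{name}: not defined in dictionary")
    return problems


problems = check_record({"full_name": "  ", "nickname": "T"}, DICTIONARY)
print(problems)
```

A record that passes the check conforms to the agreed definitions; anything it reports is a point where a source has drifted from the standard.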

Canonical data requires that processes be clearly defined and enforced. Straying from the process allows errors to creep into data collection, storage, retrieval, copying and destruction, which again throws out the canonical model.

Any IT resource or artefact - whether a data item or a process - falls under the discipline of metadata management.

Canonical data means standardising on the simplest form of data that meets the business's goals, across all impacted data sources. But it's important to go beyond mapping pre-existing models for a specific industry. The company must check the source data and ensure there is a single source (or, where there are multiple sources, certify that they are integrated and the data quality is good). Metadata management thereafter ensures ongoing protection of the company's canonical model, which delivers trustworthy data to the people in the business who will use it regularly.