Understanding the undeniable value of metadata

Metadata is an undervalued contributor to data project delivery and operations, yet it provides a huge advantage in the development of data-driven solutions.

Metadata defines or describes the context of data. It is often described as ‘data about data’, but to my mind this is a very high-level definition.

Describing the context of data gives it more meaning, because it examines the “what, why, who, how and when” of the data, while still allowing for the two main categories of metadata: technical metadata and business metadata.

Technical metadata is information about data as it is stored on a device or system. In a relational database, technical metadata includes the database name, table names, column names and data types, the relationships between tables, constraints, and all other attributes that relate to the physical structure of where the data is stored.
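Most relational databases expose this technical metadata through their own catalogue, so it can be read rather than documented by hand. The sketch below assumes SQLite (via Python's built-in `sqlite3` module) purely for illustration; other databases expose similar information through views such as `information_schema`. The function name `technical_metadata` and the sample tables are hypothetical.

```python
import sqlite3


def technical_metadata(conn: sqlite3.Connection) -> dict:
    """Collect table, column, data type, key and foreign-key metadata from a SQLite catalogue."""
    catalogue = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"column": name, "type": col_type, "not_null": bool(notnull), "primary_key": pk > 0}
            for _cid, name, col_type, notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'
            )
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            {"column": row[3], "references": f"{row[2]}.{row[4]}"}
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        catalogue[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return catalogue


# Hypothetical schema used only to demonstrate the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount REAL
);
""")
meta = technical_metadata(conn)
```

Here `meta["purchase"]["foreign_keys"]` records the relationship between the two tables, which is exactly the kind of structural information described above.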

A more familiar example is an Excel spreadsheet. Technical metadata in an Excel spreadsheet includes the file name, sheet names, and the file's created and modified dates. Technical metadata is created at the time an object or file is created, but it can also be modified later. The object or file is the ‘container’, and is therefore where the data is stored.
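As a rough sketch of reading this container-level metadata programmatically, the example below uses only the Python standard library. It assumes the file is in the `.xlsx` format, which is a ZIP package whose sheet names are listed in the `xl/workbook.xml` part; older `.xls` files would need a different reader. The function name `file_metadata` is hypothetical, and note that a true creation date is only exposed by the operating system on some platforms.

```python
import xml.etree.ElementTree as ET
import zipfile
from datetime import datetime, timezone
from pathlib import Path

# Namespace of the workbook part in an Office Open XML (.xlsx) package.
SPREADSHEET_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def file_metadata(path: str) -> dict:
    """Return technical metadata about a file, plus sheet names for .xlsx files."""
    p = Path(path)
    st = p.stat()
    meta = {
        "file_name": p.name,
        "size_bytes": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        # st_birthtime (true creation time) only exists on some platforms,
        # such as macOS and BSD; elsewhere we report None instead of guessing.
        "created": (
            datetime.fromtimestamp(st.st_birthtime, tz=timezone.utc)
            if hasattr(st, "st_birthtime")
            else None
        ),
    }
    if p.suffix.lower() == ".xlsx":
        # Sheet names are stored as <sheet name="..."> elements in xl/workbook.xml.
        with zipfile.ZipFile(p) as package:
            root = ET.fromstring(package.read("xl/workbook.xml"))
        meta["sheets"] = [s.get("name") for s in root.iter(f"{SPREADSHEET_NS}sheet")]
    return meta
```

None of this reads the cell values themselves: it describes only the container in which the data is stored.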

Business metadata, on the other hand, is the business context or definition of the data. This enables the organisation to gain a common understanding of its available data across the business – and importantly, the presence of business metadata reduces communication barriers between the “users” of the data, while aiding in the decision-making process.


According to Donna Burbank (DATAVERSITY): “Metadata helps both IT and business users understand the data they are working with. Without metadata, the organisation is at risk of making decisions based on the wrong data.”

This is worth remembering, because business glossaries, business rule definitions and data quality definitions are all types of business metadata.
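To make these three types of business metadata concrete, the sketch below captures a glossary term, its business rule and its data quality rules as structured data, so they can be shared and applied consistently. The `GlossaryTerm` class, the owning business unit and the rules themselves are hypothetical illustrations, not a standard model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GlossaryTerm:
    """Business metadata for one term: agreed definition, owner, business rule and quality rules."""
    term: str
    definition: str
    owner: str  # accountable business unit (hypothetical in this example)
    business_rule: str = ""
    # Data quality rules: a readable description mapped to a check on one record.
    quality_rules: dict[str, Callable[[dict], bool]] = field(default_factory=dict)

    def failed_checks(self, record: dict) -> list[str]:
        """Return the descriptions of the quality rules this record breaks."""
        return [name for name, check in self.quality_rules.items() if not check(record)]


customer = GlossaryTerm(
    term="Customer",
    definition="A party that has purchased at least one product or service.",
    owner="Sales",
    business_rule="Count each party once, regardless of the on-boarding system.",
    quality_rules={
        "party_type is 'legal' or 'natural'": lambda r: r.get("party_type") in {"legal", "natural"},
        "customer_id is present": lambda r: bool(r.get("customer_id")),
    },
)

issues = customer.failed_checks({"customer_id": "C1", "party_type": "trust"})
```

In this example `issues` contains only the party-type rule, because the record has a customer ID but an unrecognised party type.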

So, why is metadata important?

The magnitude of confusion that can be caused by misalignment of business definitions across different business units in an organisation is immense. As a result, the practice of managing the organisation’s metadata is extremely important.

I have witnessed many disputes over “incorrect report” complaints about business intelligence (BI) systems that arose purely from a lack of common understanding of business rule definitions in organisations.

For example, when a request comes through to the BI team for a report on the total number of active customers, two important underlying questions must be considered:

  • What is a customer?
  • How is an active customer derived?

A customer can be an entity/party or subscriber that has purchased one or more of the products or services offered by the organisation. One user may want only legal entities counted, while another might want only natural persons.

Furthermore, if the company has more than one customer on-boarding system, the same entity might exist in two or more operational systems. Depending on the type of business user requesting the report, the understanding of what counts as a customer can vary.
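A small sketch shows how much the answer to “how many active customers do we have?” depends on the definition. The two source extracts, the party IDs and the activity window (a purchase within the last 90 days) are all hypothetical; the point is that each choice (which party types count, and whether a party found in two systems is counted once) is an explicit parameter that the business has to agree on.

```python
from datetime import date, timedelta

# Hypothetical extracts from two on-boarding systems; party P1 appears in both.
CRM = [
    {"party_id": "P1", "party_type": "natural", "last_purchase": date(2024, 5, 20)},
    {"party_id": "P2", "party_type": "legal", "last_purchase": date(2024, 1, 10)},
    {"party_id": "P3", "party_type": "natural", "last_purchase": date(2024, 6, 1)},
]
BILLING = [
    {"party_id": "P1", "party_type": "natural", "last_purchase": date(2024, 6, 15)},
    {"party_id": "P4", "party_type": "legal", "last_purchase": date(2024, 6, 10)},
]


def active_customers(systems, as_of, window_days, party_types=None, deduplicate=True):
    """Count active customers under one explicit, parameterised definition."""
    cutoff = as_of - timedelta(days=window_days)
    rows = [r for system in systems for r in system]
    if party_types is not None:
        rows = [r for r in rows if r["party_type"] in party_types]
    if deduplicate:
        # Keep each party once, using its most recent purchase across all systems.
        latest = {}
        for r in rows:
            pid = r["party_id"]
            if pid not in latest or r["last_purchase"] > latest[pid]:
                latest[pid] = r["last_purchase"]
        return sum(1 for d in latest.values() if d >= cutoff)
    return sum(1 for r in rows if r["last_purchase"] >= cutoff)


as_of = date(2024, 6, 30)
print(active_customers([CRM, BILLING], as_of, 90))                            # 3
print(active_customers([CRM, BILLING], as_of, 90, deduplicate=False))         # 4 (P1 counted twice)
print(active_customers([CRM, BILLING], as_of, 90, party_types={"natural"}))   # 2
print(active_customers([CRM, BILLING], as_of, 90, party_types={"legal"}))     # 1
```

The same source data yields four different “correct” answers, which is why the definition itself needs to be recorded as business metadata.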

Additionally, executive management, middle management and end-users may have a different understanding of business definitions. It is therefore important for companies to ensure business metadata is updated, circulated and understood by all data users.

"The essence of business metadata is in reducing or eliminating the barriers of communication between human and human, as well as human and computer, so that the data conveyed from reports, information systems, or business intelligence applications can be crystal clear, can facilitate business operations, and can be leveraged for all business decision-making processes," states Lowell Fryman.

Furthermore, metadata provides a huge advantage in the development of data-driven solutions. Some extract, transform, load (ETL) tools use metadata to auto-generate ETL code that is then executed at run time. This approach has proven to accelerate the delivery of data-driven solutions, as it limits the amount of scripting developers need to do.
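The idea behind metadata-driven ETL can be sketched in a few lines: the developer maintains a source-to-target mapping, and the load statement is generated from it rather than written by hand. The mapping structure, table names and `generate_load_sql` function below are hypothetical, and SQLite stands in for the target database. Because the metadata is interpolated directly into SQL, this assumes the mapping comes from a trusted, controlled metadata repository.

```python
import sqlite3

# Hypothetical mapping metadata; a real framework would read this from a metadata repository.
MAPPING = {
    "source_table": "stg_customer",
    "target_table": "dim_customer",
    "columns": [
        # (target column, source expression)
        ("customer_id", "cust_no"),
        ("full_name", "TRIM(first_name) || ' ' || TRIM(last_name)"),
        ("country_code", "UPPER(country)"),
    ],
    "filter": "cust_no IS NOT NULL",
}


def generate_load_sql(mapping: dict) -> str:
    """Build an INSERT ... SELECT statement from mapping metadata."""
    targets = ", ".join(target for target, _ in mapping["columns"])
    selects = ",\n       ".join(f"{expr} AS {target}" for target, expr in mapping["columns"])
    sql = (
        f"INSERT INTO {mapping['target_table']} ({targets})\n"
        f"SELECT {selects}\n"
        f"FROM {mapping['source_table']}"
    )
    if mapping.get("filter"):
        sql += f"\nWHERE {mapping['filter']}"
    return sql


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_customer (cust_no INTEGER, first_name TEXT, last_name TEXT, country TEXT);
CREATE TABLE dim_customer (customer_id INTEGER, full_name TEXT, country_code TEXT);
INSERT INTO stg_customer VALUES (1, ' Thandi ', 'Nkosi', 'za'), (NULL, 'No', 'Key', 'ae');
""")
conn.execute(generate_load_sql(MAPPING))  # the generated code is executed at run time
rows = conn.execute("SELECT * FROM dim_customer").fetchall()
```

Adding a new target column then means adding one entry to the mapping rather than editing hand-written SQL, which is where the reduction in scripting comes from.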

In fact, I have had the pleasure of being part of a team that managed operations using a metadata-driven ETL framework. The solution was deployed in multiple countries across Africa and the Middle East, and because the metadata repository was centrally hosted, remote work had to be seamless. As a result, deployments were done remotely over VPN connections.

This metadata-centric approach enabled uniform data structures and consistent business rules across the organisation's different operating countries.

In my experience, inefficient metadata management causes organisations to spend unnecessarily on both data project delivery and operations. Additionally, having technical metadata without business metadata slows the project team down when it needs to understand source system data.

This usually means a subject matter expert has to explain the business rules and data definitions to the project team. This hand-holding process is cumbersome, and could easily be avoided if the source systems had data dictionaries, documented business rules and an up-to-date business glossary.
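One way to see the gap between technical and business metadata is to build a simple data dictionary that joins the two and flags columns that have a physical definition but no business definition. The sketch below assumes a SQLite source and a glossary held as a plain dictionary keyed by `table.column`; the table, columns, definitions and the `data_dictionary` function are all hypothetical.

```python
import sqlite3

# Hypothetical business definitions keyed by "table.column"; in practice these would come
# from the data dictionary or business glossary maintained by the source-system owners.
BUSINESS_DEFINITIONS = {
    "policy.policy_no": "Unique policy number issued at inception.",
    "policy.status": "Current lifecycle status: ACTIVE, LAPSED or CANCELLED.",
}


def data_dictionary(conn: sqlite3.Connection, definitions: dict) -> list[dict]:
    """Combine technical metadata from the catalogue with business definitions."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, column, col_type, _notnull, _default, _pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            key = f"{table}.{column}"
            entries.append({
                "column": key,
                "type": col_type,
                # None marks a column with technical metadata but no business metadata.
                "definition": definitions.get(key),
            })
    return entries


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (policy_no TEXT, status TEXT, lapse_cd INTEGER)")
dictionary = data_dictionary(conn, BUSINESS_DEFINITIONS)
undocumented = [e["column"] for e in dictionary if e["definition"] is None]
```

Here `undocumented` lists `policy.lapse_cd`: the kind of cryptic column a project team would otherwise need a subject matter expert to explain.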

From an operational perspective, technical metadata is core to automating the monitoring of systems. Running a live data-driven solution with real-time and batch data feeds requires constant monitoring.

It is unrealistic to expect system administrators to monitor systems around the clock, and without automated monitoring, the cost of overtime and additional headcount would be prohibitive for any organisation.

It is therefore important to capture technical metadata and use it to automate monitoring and issue remediation as far as possible.
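As an illustration of metadata-driven monitoring, the sketch below checks each feed against its own metadata (expected load frequency and minimum row count) instead of relying on someone watching the load logs. The `FeedMetadata` fields, feed names and thresholds are hypothetical; in a real solution the last load time and row count would come from the ETL audit tables, and the alerts could trigger notifications or automated reruns as part of remediation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FeedMetadata:
    """Hypothetical operational metadata for one data feed."""
    name: str
    feed_type: str             # "batch" or "real-time"
    expected_every: timedelta  # how often a new load should arrive
    last_loaded: datetime      # from the load audit log
    last_row_count: int
    min_row_count: int         # fewer rows than this is treated as a failed load


def check_feeds(feeds: list[FeedMetadata], now: datetime) -> list[str]:
    """Return alerts for feeds that are late or suspiciously small."""
    alerts = []
    for feed in feeds:
        overdue = now - feed.last_loaded - feed.expected_every
        if overdue > timedelta(0):
            alerts.append(f"{feed.name}: {feed.feed_type} feed overdue by {overdue}")
        if feed.last_row_count < feed.min_row_count:
            alerts.append(
                f"{feed.name}: loaded {feed.last_row_count} rows, "
                f"expected at least {feed.min_row_count}"
            )
    return alerts


now = datetime(2024, 6, 30, 12, 0)
feeds = [
    FeedMetadata("cdr_stream", "real-time", timedelta(minutes=5),
                 datetime(2024, 6, 30, 11, 57), 1200, 100),
    FeedMetadata("billing_daily", "batch", timedelta(days=1),
                 datetime(2024, 6, 28, 23, 0), 50000, 10000),
    FeedMetadata("crm_hourly", "batch", timedelta(hours=1),
                 datetime(2024, 6, 30, 11, 30), 0, 1),
]
for alert in check_feeds(feeds, now):
    print(alert)
```

This prints an alert that `billing_daily` is 13 hours overdue and that `crm_hourly` loaded no rows, while the healthy real-time feed raises nothing. Adding a new feed only requires adding its metadata, not writing a new monitoring script.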

Windsor Gumede

Director, PBT Innovation at PBT Group


He is a self-motivated, results-driven principal BI consultant with 10 years’ experience in data and analytics. Gumede has worked on numerous data and analytics projects in Africa and the Middle East.

Throughout his career, he has played different roles, from ETL/ELT development, to data modelling, front-end development, solution architecture and design, to pre-sales consulting. The majority of his experience comes from the telecommunications industry, but he is currently maturing his knowledge in the insurance space using big data technologies to help insurance clients comply with regulatory requirements.

Gumede is a strong believer in the core fundamentals of enterprise data management. “I see a huge gap in South Africa with technical resources that have skills in the big data engineering field but don’t have the proper grounding on enterprise data management principles. Skills on tools and technology without the literature is ineffectual.”
