One-stop shop for data democratisation?

Simply making all data available to all users could prove chaotic and risky, so data democratisation is not as easy as it sounds.

Data democratisation, a push to enable everyone in the organisation to access, manipulate and analyse the organisation's rich data stores, has been a dream for some years.

In theory, data democratisation would eliminate data gatekeepers and bottlenecks, making data available and usable across the business. This, in turn, would help unlock more value from data and drive smarter, data-driven businesses.

No sooner had the concept emerged than organisations realised there would be challenges in achieving data democratisation. There were concerns about data governance, data quality and the risks of inexperienced staff not adhering to data management best practices.

Data democratisation raises questions such as what data should be accessed, and where that access should be enabled.

The notion that data can be provisioned easily through a “one-stop shop” for the digital marketplace and individual consumers is a misleading one.

Democratising and sharing data with the right people or processes requires a huge amount of preparation in the background, across the full lifecycle from creation to consumption to destruction. The data must be qualified (legally accessed, profiled, verified, cleansed, validated) and integrated (matched, related, deduplicated, enriched) before it can be provisioned as trustworthy information.
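As a rough illustration of what that qualification and integration work involves, the steps above could be sketched as a small pipeline. This is purely a sketch; the function names, fields and rules here are hypothetical, and real data quality tooling is far richer.

```python
# Hypothetical sketch of data preparation: records are qualified
# (cleansed, validated) and then integrated (matched, deduplicated)
# before being provisioned as trustworthy information.

def cleanse(record):
    """Normalise obvious quality issues (trim whitespace, lowercase email)."""
    return {
        "name": record.get("name", "").strip(),
        "email": record.get("email", "").strip().lower(),
    }

def validate(record):
    """Keep only records that meet minimal quality rules."""
    return bool(record["name"]) and "@" in record["email"]

def deduplicate(records):
    """Match and deduplicate on a simple key (email, for illustration)."""
    seen = {}
    for r in records:
        seen.setdefault(r["email"], r)  # first occurrence wins
    return list(seen.values())

def prepare(raw_records):
    """Qualify, then integrate: only trustworthy records are provisioned."""
    qualified = [r for r in map(cleanse, raw_records) if validate(r)]
    return deduplicate(qualified)

raw = [
    {"name": " Ann ", "email": "ANN@example.com"},
    {"name": "Ann", "email": "ann@example.com "},  # duplicate after cleansing
    {"name": "", "email": "no-name@example.com"},  # fails validation
]
print(prepare(raw))  # a single trustworthy record for Ann
```

Even this toy version shows why the preparation comes at a cost: every rule (what counts as a match, what counts as valid) is a governance decision that someone must make and maintain.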


Moreover, democratising data requires underlying governance: the application of controls in the preparation of the data to ensure data access and processing happen at the right times, by the right people, for the right purpose. All of this work comes at a cost.

Even if all this work is done at the data's source and served directly to the consumer, the provisioning will inevitably draw from many points, and the data will still need to be brought together.

Take the example of a taxi driver looking for the shortest route to a destination. The driver requests possible routes from an information provider (Google Maps, for example), which in turn has to access and combine many datasets from disparate sources: maps, route parameters, dates and times, road maintenance and traffic-control status, and nearby events drawn from news feeds and other internet resources. Only then can it channel the collated and integrated data product to the taxi driver.

Optimising the ‘how’

The potential and opportunities of channelling all data to users are indeed exciting. But the 'how' of this is still being developed: data sourcing, preparation and provisioning, along with the processes, techniques, methods and patterns (functions) used to access and process the data, are yet to be optimised.

In future, it is likely that the data generated in addressing the 'how' will itself offer up new and useful insights for the source providers.

For example, they may be in a position to analyse function performance statistics, compute and throughput resource consumption, affinities, preferences and the least resource-consuming patterns through their systems and applications.

To serve up data while addressing data management and governance concerns, organisations need to centrally define key architectural and governance principles for the sharing and ownership of data, and use enabling technology to execute on those principles.

These concepts are co-dependent: defining the principles alone will make for slow to no progress, while deploying technology without defined principles will lead to poor adoption of the technology and failure of the project.

A potential solution is a data fabric. The core principle behind the data fabric architecture is to provide a curation and abstraction layer for data across multiple disparate sources, but with this layer having the ability to deliver governance and data management uniformly.
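One way to picture that curation and abstraction layer is a single entry point that applies governance uniformly before any source is touched. The sketch below is illustrative only; the class, role names and permission model are assumptions, and real data fabric platforms handle far more (lineage, cataloguing, transformation).

```python
# Illustrative sketch of a data fabric's abstraction layer: consumers
# query one interface across disparate sources, and governance (here, a
# simple role-based access check) is enforced uniformly at that layer.

class DataFabric:
    def __init__(self):
        self.sources = {}      # dataset name -> callable returning records
        self.permissions = {}  # dataset name -> set of allowed roles

    def register(self, name, fetch, allowed_roles):
        """Plug a disparate source into the fabric with its access policy."""
        self.sources[name] = fetch
        self.permissions[name] = set(allowed_roles)

    def query(self, name, role):
        """Single entry point: governance is applied before any access."""
        if role not in self.permissions.get(name, set()):
            raise PermissionError(f"role '{role}' may not read '{name}'")
        return self.sources[name]()

fabric = DataFabric()
fabric.register("sales", lambda: [{"region": "ZA", "total": 100}],
                allowed_roles={"analyst"})

print(fabric.query("sales", role="analyst"))  # permitted
# fabric.query("sales", role="intern")        # would raise PermissionError
```

The point of the design is that the control sits in the layer, not in each source, so policy changes apply everywhere at once.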

Depending on an organisation's requirements and size, a data mesh may also enable data democratisation by assigning responsibility for data management to disparate domain teams, which still adhere to central governance principles in their day-to-day operations.
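In contrast to the fabric's central layer, a mesh distributes ownership while keeping the rules shared. A minimal sketch of that split, under the assumption of a single hypothetical masking rule as the "central governance principle":

```python
# Illustrative data mesh sketch: each domain team owns and serves its
# own data product, but every product applies the same centrally defined
# governance rule (here, masking a hypothetical 'email' field for
# non-privileged roles).

def central_policy(record, role):
    """Shared governance applied by every domain, regardless of owner."""
    if role != "steward" and "email" in record:
        record = {**record, "email": "***masked***"}
    return record

class DomainDataProduct:
    def __init__(self, owner, records):
        self.owner = owner       # the team responsible day to day
        self._records = records  # data managed within the domain

    def serve(self, role):
        """The domain serves its data, with central policy enforced."""
        return [central_policy(r, role) for r in self._records]

crm = DomainDataProduct("marketing-team",
                        [{"name": "Ann", "email": "ann@example.com"}])
print(crm.serve(role="analyst"))  # email masked by the central policy
```

Here the marketing team decides how its data is stored and served, but it cannot opt out of the central policy, which is the balance the mesh approach aims for.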

Veemal Kalanjee

MD of Infoflow.

Veemal Kalanjee is MD of Infoflow, part of the Knowledge Integration Dynamics (KID) group. He has an extensive background in data management sciences, having graduated from Potchefstroom University with an MSc in computer science. He subsequently worked at KID for seven years in various roles within the data management space. Kalanjee later moved to Informatica SA as a senior pre-sales consultant, and recently moved back to the KID Group as MD of Infoflow, which focuses on data management technologies, in particular, Informatica.
