Data consolidation: It’s so simple, even Generation Z are applying it

The benefit of spending a bit more time upfront to plan and consolidate data, to make it easier to work with, is not that difficult to buy into.

My nine-year-old daughter is at the phase in her life where she is learning how to study. It’s been so long since I was in junior school, I had forgotten that there is a time in a person’s life when they don’t know how to approach studying or research – or in other words, how to gather intelligence.

She recently attended some study sessions and, with help from her mom, proceeded to create her very own study guide. Having done that, she couldn’t wait to show it off to me. I spent a fair amount of time going through her study guide, making the appropriate exclamations of surprise and awe every few pages.

In all honesty, it really was quite impressive. There was a separately indexed section per subject area, and each subject area had its own grouping of a related set of detailed information, summaries and even mind maps! And of course, in a great deal of colour.

Admittedly, I found myself getting a bit side-tracked along the way, as I couldn’t help but notice some parallels between the work that she had done, and what we do as architects and engineers in the data industry.

Just like my daughter, we have a great deal of information that we need to digest. We cannot possibly use it all in its native format and so we try to consolidate that information, organise and index it to allow ourselves to better manage the massive amount of unwieldy information we are faced with. We also place a focus on making sure the information is accurately registered, to ensure we know what we have.

Having gone through the entire study guide, I decided to test my youngling. I asked her: “This is all nice and pretty, my girl… but why did you have to do this? What did you gain from doing this?” To which she replied: “Dad, I have sooooo much to learn, I can’t possibly remember it all, and when I need to remember something, I never know where to find it! By creating a study guide, I have found all the important stuff I need, and put it all in one place! It took me a while, but it is so much easier for me to study now!”

Honestly speaking, I wasn’t expecting such a concise, meaningful answer. What made it so surreal was that her answer, while expressed in a youthful manner, is the textbook answer that any data architect would give you in explaining the importance of consolidation.

As a solution and data architect, I am constantly pushing for simplification and consolidation of systems: hardware, software, operational systems and data engineering processes. As data professionals, we know that what makes business intelligence, data engineering, data integration and similar disciplines so complex is the sheer volume of disconnected sets of data that we must work with.

However, time and time again, I engage with clients only to see multiple sets of source data, on diverse hardware and software platforms, in multiple redundant systems. Very often there are valid reasons for this, such as businesses being bought out or companies merging, but sadly there is often neither rhyme nor reason for why the data sources are so disparate. And yet, this “de-consolidation” of data is one of the leading factors in the rising cost of data delivery.

While it would be hugely unrealistic to expect all data to reside in a single database, in a single database instance, on a single server, in a single application, I still question whether we should really be at the other extreme, one that costs businesses unnecessarily. Should there be so many copies of the same database? So many instances of a single database platform? So many different, competing data platforms and application systems?

The benefit of spending a bit more time upfront to plan and consolidate data, making it easier to work with and yielding benefits in the long term, should not be that difficult to buy into. My nine-year-old daughter gets it, and is buying into it enthusiastically… so why not our industry?

To my mind, the reason is quite simply expediency: commercial expediency, planning expediency, political expediency, personal expediency and many other variations, all of a short-term nature.

This, of course, is understandable – we are all human and we are all working under a great deal of pressure. At some point, however, someone in the organisation must acknowledge that the decisions constantly being made for the sake of short-term expediency are starting to drive the long-term strategy of the organisation. And that is bad.

This will lead an organisation down a path where its data platform reaches a point of gridlock: the resulting landscape is so convoluted and difficult to work with that even the smallest change request carries unacceptably high costs and turnaround times. At this point, agility, flexibility, freedom to explore and freedom to experiment are lost for good.

While many of us know this and accept this, what are we doing about this? I am writing content pieces about it and am warning my clients every chance I get. What are you, the reader, doing about this?

Julian Thomas

Principal consultant at PBT Group

Julian Thomas is principal consultant at PBT Group, specialising in delivering solutions in data warehousing, business intelligence, master data management and data quality control. In addition, he assists clients in defining strategies for the implementation of business intelligence competency centres, and implementation roadmaps for a wide range of information management solutions. Thomas has spent most of his career as a consultant in South Africa, and has implemented information management solutions across the continent, using a wide range of technologies. His experience in the industry has convinced him of the importance of hybrid disciplines, in both solution delivery and development. In addition, he has learned the value of robust and flexible ETL frameworks, and has successfully built and implemented complementary frameworks across multiple technologies.
