
PBT Group: Why businesses must rethink their data lakes

Johannesburg, 19 Aug 2025
Why businesses must rethink their data lakes. (image: PBT Group)

While businesses are racing to implement artificial intelligence (AI), many overlook a critical success factor: the quality and structure of the data feeding these models. The reality is that a model is only as good as the data it is trained on. This is according to Julian Thomas, Principal Consultant at PBT Group.

“If your data lake is unmanaged or full of unstructured, incomplete, insufficient or unreliable data, even the most sophisticated AI will not deliver value,” he emphasises.

Thomas explains that too many organisations treat their data lakes as passive repositories, a place to store everything, rather than a curated resource. This approach undermines governance, hinders usability and creates downstream issues for data teams tasked with developing AI and machine learning solutions.

“To get AI right, we need to shift the mindset around data lakes. They should be active environments governed by frameworks like the Medallion architecture, which helps teams clean, refine and enrich data in a structured, layered way.”

PBT Group often uses the Medallion architecture to bring structure to a data lake. It separates data into three layers: Bronze for raw, unfiltered data; Silver for cleaned and enriched data that is more analytics-friendly; and Gold for curated, trusted datasets that are fully governed and ready for use in business intelligence or machine learning. This progression gives teams a consistent base to work from, makes it possible to trace where data comes from, and helps ensure that what is delivered matches the needs of the people using it.
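To make the three layers more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not PBT Group's implementation: the raw records, the `crm_export` source name and the per-customer total are all hypothetical. Bronze keeps records exactly as ingested (tagged with their source), Silver parses and cleans them, and Gold aggregates them into a curated result.

```python
from collections import defaultdict
from datetime import date

# Bronze: raw records exactly as ingested, tagged with where they came from.
# The CSV-like lines and the "crm_export" source name are hypothetical.
bronze = [
    {"source": "crm_export", "raw": "C001, 2025-08-01 , 150.00"},
    {"source": "crm_export", "raw": "c001,2025-08-03,75.50"},
    {"source": "crm_export", "raw": "C002,2025-08-02,"},  # missing amount
]

def to_silver(record):
    """Clean and type a raw Bronze record; return None if it cannot be used."""
    parts = [p.strip() for p in record["raw"].split(",")]
    if len(parts) != 3 or not parts[2]:
        return None
    customer, day, amount = parts
    return {
        "customer_id": customer.upper(),   # normalise casing
        "date": date.fromisoformat(day),   # parse ISO date string
        "amount": float(amount),
        "source": record["source"],        # keep lineage from Bronze
    }

def to_gold(silver_rows):
    """Aggregate Silver rows into a curated per-customer total."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)

silver = [row for row in map(to_silver, bronze) if row is not None]
gold = to_gold(silver)
print(gold)  # {'C001': 225.5}
```

Because each Silver row carries its `source` field forward, a Gold figure can be traced back to the raw records that produced it, which is the lineage benefit the layered approach aims for.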

But a layered structure is only part of the solution. The real differentiator, according to Thomas, is data wrangling.

“Data wrangling is not just a technical clean-up. It is a deliberate, skilled process of transforming messy, inconsistent data into something reliable and fit for purpose. That includes everything from deduplication to validation and enrichment.”
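The steps Thomas lists can be sketched as small, separate functions. This is a minimal illustration, assuming hypothetical customer records and a hypothetical country-to-region lookup for enrichment: records are standardised, deduplicated by ID, validated, and then enriched. Rejected records are kept alongside the reasons they failed rather than being silently dropped.

```python
# Hypothetical customer records; field names are illustrative only.
raw = [
    {"id": "1", "email": "ANA@example.com ", "country": "za"},
    {"id": "1", "email": "ana@example.com", "country": "ZA"},  # duplicate of id 1
    {"id": "2", "email": "not-an-email", "country": "GB"},
    {"id": "3", "email": "ben@example.com", "country": "KE"},
]

# Hypothetical reference table used for enrichment.
COUNTRY_REGION = {"ZA": "Africa", "KE": "Africa", "GB": "Europe"}

def standardise(rec):
    """Trim whitespace and normalise casing so equal values compare equal."""
    return {
        "id": rec["id"].strip(),
        "email": rec["email"].strip().lower(),
        "country": rec["country"].strip().upper(),
    }

def deduplicate(records, key="id"):
    """Keep the first record seen for each key."""
    seen, out = set(), []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            out.append(rec)
    return out

def validate(rec):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if "@" not in rec["email"]:
        problems.append("invalid email")
    if rec["country"] not in COUNTRY_REGION:
        problems.append("unknown country")
    return problems

def enrich(rec):
    """Add a derived field from the reference table."""
    return {**rec, "region": COUNTRY_REGION[rec["country"]]}

clean, rejected = [], []
for rec in deduplicate([standardise(r) for r in raw]):
    issues = validate(rec)
    if issues:
        rejected.append((rec["id"], issues))
    else:
        clean.append(enrich(rec))

print([r["id"] for r in clean])  # ['1', '3']
print(rejected)                  # [('2', ['invalid email'])]
```

Standardising before deduplicating matters: without it, values that differ only in casing or stray whitespace would not be recognised as the same.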

This approach is particularly important in industries like financial services, where it is essential to know exactly where your data comes from and how it has been handled. It is also crucial when training AI models, which depend on accurate historical data to perform reliably and fairly.

Thomas stresses that it is important to understand the main difference between data wrangling and extract, transform and load (ETL). “Data wrangling can be considered ‘informal ETL’, done in the context of machine learning for a given initiative. ETL is effectively the same activity; however, it is automated for long-term use. Once data wrangling has been completed and the resulting trained model has been approved for production implementation, the data wrangling solution must be handed over to a formal engineering team, where it can be converted into formal ETL.”
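One property that typically separates a formal ETL job from exploratory wrangling is that it must be safe to rerun: a scheduled job may retry after a failure, and a retry should not duplicate data. The sketch below is illustrative only and assumes an in-memory dict stands in for the target table and that `id` is the business key. It contrasts a naive append with an upsert keyed on `id`.

```python
def load_append(target, rows):
    """Naive load: every run adds rows again, so retries create duplicates."""
    target.extend(dict(row) for row in rows)
    return target

def load_upsert(target, rows, key):
    """Idempotent load: insert or overwrite rows by key.

    Rerunning with the same batch leaves `target` unchanged, so a scheduled
    job can retry safely.
    """
    for row in rows:
        target[row[key]] = dict(row)
    return target

# Hypothetical batch produced by an earlier transformation step.
batch = [{"id": "1", "total": 225.5}, {"id": "2", "total": 80.0}]

appended = []
load_append(appended, batch)
load_append(appended, batch)    # simulated retry
print(len(appended))            # 4: duplicated

warehouse = {}                  # dict stands in for the target table
load_upsert(warehouse, batch, key="id")
load_upsert(warehouse, batch, key="id")  # simulated retry
print(len(warehouse))           # 2: no duplicates
```

The transformation logic itself can stay the same during hand-over; what changes is that loading, scheduling and failure handling are made repeatable.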

Thomas also cautions against viewing data quality as a once-off project.

“Data governance must be embedded into daily operations. From ingestion to output, quality controls, validation steps and metadata tracking need to be built into every phase.”
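A minimal sketch of what embedding checks into each phase might look like, assuming a hypothetical `quality_gate` function called between stages and a plain list standing in for a metadata store. Each gate records the stage name, row count, share of rows missing required fields, and a UTC timestamp before deciding whether the batch may proceed, so failed checks are logged as well as passing ones.

```python
from datetime import datetime, timezone

# Hypothetical run log; in practice this might be a table or data catalogue.
metadata_log = []

def quality_gate(stage, rows, required_fields, max_null_rate=0.0):
    """Check a batch between phases and record what was checked.

    Raises ValueError if the batch is empty, or if the share of rows missing
    any required field exceeds max_null_rate. The metadata entry is written
    before the threshold check, so failed gates are recorded too.
    """
    if not rows:
        raise ValueError(f"{stage}: batch is empty")
    missing = sum(
        1 for row in rows
        if any(row.get(field) in (None, "") for field in required_fields)
    )
    null_rate = missing / len(rows)
    metadata_log.append({
        "stage": stage,
        "row_count": len(rows),
        "null_rate": null_rate,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    })
    if null_rate > max_null_rate:
        raise ValueError(f"{stage}: null rate {null_rate:.0%} exceeds limit")
    return rows

ingested = [{"id": "1", "amount": 150.0}, {"id": "2", "amount": None}]
quality_gate("ingestion", ingested, ["id"])  # passes: every row has an id
try:
    quality_gate("output", ingested, ["id", "amount"])  # one row lacks amount
except ValueError as err:
    print(err)  # output: null rate 50% exceeds limit
```

The thresholds and required fields per stage are assumptions; the point is that checks and their metadata are produced every time data moves, rather than in a one-off clean-up.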

The payoff? A structured data lake combined with rigorous wrangling makes data more accessible and AI-ready. It enables teams to experiment with confidence, deliver faster iterations and avoid the costly rework that comes from poor input data.

“As AI becomes more integrated into business decisions, the pressure on data teams will only increase. Getting the fundamentals right now, especially how we wrangle and structure our data, will determine who actually succeeds in turning AI into value.”


PBT Group

PBT Group is a technology- and cloud-agnostic Data Specialist and Software services solutions provider. With more than 800 highly skilled consultants, PBT Group has 25 years’ experience across 27 countries, 5 continents and a variety of industries. As a Data Specialist organisation, we provide a one-stop data service offering and pride ourselves on long-term client relationships.

PBT Group operates in Africa and Europe, providing services and creating solutions that capitalise on data-driven insights to make well-timed, intuitive business decisions that consistently position our clients ahead of the curve.

PBT Group is a 51% black-owned, level 1 B-BBEE company, based on the ICT Sector Codes. For more information, visit: www.pbtgroup.co.za.

Editorial contacts

Nicole Allman
INK and Co.
Nicole@inkandco.co.za