
Turning data insight into risk resilience

Sensitive data is dynamic. Continuous discovery and intelligent classification are now the backbone of risk management and safe AI scaling.
Johannesburg, 05 Dec 2025
Erik Du Toit, sales specialist and consultant, OpenText.

Sensitive data used to be easy to define. It was the information you knew you had – customer records, identity numbers, bank details, contracts, board material and so on. You protected it by protecting the systems it lived in. And while that mental model still shapes many governance programmes, it no longer fits how data behaves. 

In hybrid cloud and multicloud estates, sensitive data is everywhere and constantly changing state. It moves from core systems into SaaS workflows, collaboration spaces, spreadsheets, then into automated re-use. It is at rest, in motion and in use (and sometimes all in a single day). “Sensitive data today isn’t defined only by what it is, but by what it does,” says Erik Du Toit, OpenText sales specialist and consultant. “It moves, it multiplies, it gets repurposed. If your definition doesn’t account for that fluidity, your controls will always trail behind.” 

Sprawl meets speed

The first discovery challenge is scale. Data sprawls across on-premises platforms, multiple cloud environments and third-party services, so no single team has full sight of the estate. “Governance cadence matters,” Du Toit continues. “If discovery happens quarterly while the data landscape changes daily, you are managing a historical artefact instead of a living environment.” The second challenge is pace. Data growth was already steep, then generative AI turned a steady incline into a surge. Content is produced continuously – prompts, drafts, meeting transcripts, synthetic datasets, derived documents – and it lands in places traditional governance rarely watches. Periodic discovery exercises can’t keep up. By the time a scan finishes, the estate has shifted again.

Most of the real exposure sits in unstructured data. E-mails, documents, media files, chat platform content, scanned PDFs and even IoT outputs make up most of what enterprises hold, and they do not behave like neat database rows. “Unstructured data is where visibility breaks down first,” adds Du Toit. It lacks uniform formats, lives in scattered locations and resists one-size-fits-all policy. Permissions are manual, inherited and rarely revisited, so yesterday’s access becomes today’s breach risk. When organisations finally do proper discovery, they usually find three things: vast volumes of ROT (redundant, obsolete and trivial data) still taking up space and widening exposure; sensitive files shared too broadly because convenience won; and shadow repositories or legacy archives holding unknown PII with no clear owner. “Discovery is where you see the hidden risk of unstructured data,” says Du Toit. “It’s not just hard to manage, it silently accumulates risk until something triggers it.”

From compliance to capability

Privacy and sovereignty rules are forcing discovery to grow up. POPIA, together with global frameworks like GDPR, makes data visibility a requirement, not a preference. Organisations must know where personal data resides, how it is protected and whether it is moving across borders with the right controls. That reshapes discovery and classification strategies, especially in multilingual, multi-jurisdictional businesses. “Regulation is pushing enterprises from policy thinking to evidence thinking,” Du Toit says. “You cannot demonstrate compliant handling if you cannot continuously show what you hold, where it lives and who can reach it.”

A more interesting shift is happening alongside compliance pressure. Some organisations now treat data discovery as a business enabler, not a checkbox. “The biggest misconception is that discovery is a project,” Du Toit adds. “It’s a practice. If you don’t sustain it, you revert to blind spots by default.” Done well, discovery lifts data quality and reduces model risk, lets teams delete ROT with confidence and strengthens trust because governance is provable. Yet many programmes still trip over the basics, relying on metadata-only scans that miss context, over-deleting without weighing business value, ignoring collaboration platforms where sensitive content spreads fastest and treating cleanup as a once-off exercise. The risk comes back long before governance catches up.

The best practice playbook

So, what does ‘best’ look like in the real world? It starts with continuous discovery across structured and unstructured data, wherever it lives. “You have to treat discovery as a live signal, not a snapshot,” says Du Toit. From there, classification needs to understand context, not just patterns, backed by risk scoring so teams focus on the most exposed or valuable data first. Automation keeps pace without creating noise: context-aware rules cut false positives and AI handles scale, while human oversight defines business-specific sensitivity and validates ambiguity. This mix matters even more now that generative AI creates synthetic and derivative data at speed, which must be classified in real time to avoid leakage into models and outputs. “The organisations that win with AI and compliance at the same time will be those that radically improve data self-awareness,” he concludes. “If you can discover continuously, classify intelligently and act on risk in priority order, you turn governance from a brake into an accelerator.”
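
To make the risk-scoring idea concrete, here is a minimal, hypothetical Python sketch of how a discovery pipeline might rank findings for triage. The labels, weights and fields are illustrative assumptions, not OpenText’s method or product: it simply blends a classification label with exposure (how widely an item is shared) and staleness (how long since its permissions were reviewed), so remediation starts with the riskiest items.

```python
from dataclasses import dataclass

# Illustrative weights; a real programme would tune these to its own risk appetite.
SENSITIVITY_WEIGHTS = {"public": 0, "internal": 1, "confidential": 3, "pii": 5}

@dataclass
class DiscoveredItem:
    path: str                # where discovery found the item
    sensitivity: str         # label produced by classification, e.g. "pii"
    shared_with: int         # number of principals with access (exposure)
    days_since_review: int   # permissions not revisited = growing risk

def risk_score(item: DiscoveredItem) -> float:
    """Blend sensitivity, exposure and staleness into one triage score."""
    sensitivity = SENSITIVITY_WEIGHTS.get(item.sensitivity, 1)
    exposure = min(item.shared_with / 10, 5)          # cap so one huge share doesn't dominate
    staleness = min(item.days_since_review / 365, 3)  # yesterday's access, today's breach risk
    return sensitivity * (1 + exposure + staleness)

items = [
    DiscoveredItem("finance/salaries.xlsx", "pii", shared_with=240, days_since_review=700),
    DiscoveredItem("marketing/brochure.pdf", "public", shared_with=5000, days_since_review=30),
    DiscoveredItem("legal/contract-draft.docx", "confidential", shared_with=12, days_since_review=400),
]

# Act on risk in priority order: most exposed or valuable data first.
for item in sorted(items, key=risk_score, reverse=True):
    print(f"{risk_score(item):6.1f}  {item.path}")
```

In practice, the score would be fed by live discovery and classification signals rather than static fields, and re-ranked continuously as the estate changes, which is what turns a snapshot exercise into the “live signal” Du Toit describes.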
