
Better data management begins with PII discovery

The ins and outs of PII discovery in structured and unstructured data.

Johannesburg, 04 Nov 2021
Ken Wessels, Operations Director, Meniko Records Management.

According to Statista, 74 zettabytes of data will be created by the end of 2021. Spread over the year, that’s roughly 200 trillion megabytes per day, and how all this data is stored – and what kind of information sits inside it – matters. Thanks to the POPI Act and Europe’s GDPR, processing personally identifiable information (PII) is now a critical part of doing business.

“While the POPIA Act took a bit longer than expected to be promulgated, now that it’s come into effect, it has really made a big difference as it has forced data ownership within organisations,” explains Ken Wessels, Meniko Records Management Services operations director. “It’s forced organisations – both in the private and public space – to appoint an individual who becomes a custodian for that data.”

But before an organisation can focus on protecting or encrypting PII, it needs to discover where that PII lies (and how easy it is to access). PII, defined as any kind of data that can be used to identify a specific individual, is divided into different categories, from ID numbers, names and gender to credit card information (PCI) and special personal information (SPI).

“Different information – especially that which contains high risk data – gets categorised and dealt with differently. It comes with its own set of rules,” explains Wessels. PII can also sit within structured or unstructured environments. While structured data typically sits within databases in rows and columns and is easier to work with, unstructured data is often a “lot of mix and match records”, says Wessels. “The main thing that makes PII discovery difficult is the large amount of data defined as PII.”

Between the many different types of PII and the sheer volume found in both structured and unstructured data, discovery isn’t a simple affair. Organisations in industries such as healthcare, insurance and financial services run a multitude of applications that gather data, and some types of PII are harder to find than others: “While ID numbers are easy to find – the grammars can be customised using expressions and pattern recognitions – passport numbers are not as simple,” says Wessels. “Some foreign nationals have passport numbers that are just numbers and 13 digits long. There’s a lot of work that goes into isolating and finding creative ways of discovering PII to ensure it doesn’t take forever.”
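To illustrate the pattern-based approach Wessels describes, here is a minimal Python sketch. It is not Meniko's tooling. It assumes the commonly documented South African ID layout: 13 digits that begin with a YYMMDD birth date and end in a Luhn check digit. The names CANDIDATE, luhn_valid, plausible_birth_date and find_sa_ids are hypothetical, chosen for this example.

```python
import re
from datetime import datetime

# Assumption: an SA ID number is a run of exactly 13 digits,
# not embedded in a longer run of digits.
CANDIDATE = re.compile(r"(?<!\d)\d{13}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def plausible_birth_date(yymmdd: str) -> bool:
    """Return True if the first six digits form a real calendar date."""
    try:
        datetime.strptime(yymmdd, "%y%m%d")
        return True
    except ValueError:
        return False


def find_sa_ids(text: str) -> list[str]:
    """Return 13-digit runs that look like South African ID numbers."""
    return [
        m.group()
        for m in CANDIDATE.finditer(text)
        if plausible_birth_date(m.group()[:6]) and luhn_valid(m.group())
    ]


sample = "Client 8001015009087 holds passport 1234567890123."
print(find_sa_ids(sample))
```

The date and checksum tests filter out most 13-digit passport numbers, but a passport number that happens to pass both would still be reported, which is why a pattern alone is not enough.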

Once found, PII in structured or unstructured data should be encrypted, masked or de-identified. While most well-known software providers build privacy solutions into their data storage, this can actually make PII discovery more complicated. These solutions are often specific to the software, so for larger, legacy organisations running more than one kind of software, a single tool that can discover PII across database types and platforms is advantageous. PII discovery should also be localised. Tools built for the European or American market can produce false positives, so being able to search for South African-specific PII is important if a company doesn’t want to miss valuable privacy information. “A single records management tool can look across a range of databases in the structured world and also look across different source repositories in the unstructured world, looking into file shares and other document management systems,” Wessels explains.
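As a rough illustration of two of the remediation options mentioned here, the Python sketch below masks a value and de-identifies it with a keyed hash. The function names mask and pseudonymise, the choice of HMAC-SHA256 and the example key are assumptions made for this example, not a description of any particular product.

```python
import hashlib
import hmac


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide all but the last `visible` characters of a value."""
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


def pseudonymise(value: str, key: bytes) -> str:
    """Replace a value with a keyed hash so records can still be linked
    without exposing the original. The key must be stored securely."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


id_number = "8001015009087"
print(mask(id_number))  # → *********9087
print(pseudonymise(id_number, key=b"example-key-only"))
```

Masking suits display and reporting, while a keyed hash keeps the same person linkable across tables; which one fits depends on how the data is used afterwards.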

Ultimately, the simplest form of remediation is deletion. Wessels explains that PII discovery begins with identifying ROT – redundant, outdated and trivial data. “We want to reduce the size of the databases. People are sitting with legacy data – outdated information. According to POPIA, this is the type of data a company should be getting rid of – you’re legally not allowed to retain someone’s PII after the regulated retention period has expired.”
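A retention check of this kind could be sketched in Python as follows. The Record fields, the use of a retention period expressed in days and the function names are all assumptions for illustration; actual retention periods depend on the regulations that apply to each record type.

```python
from datetime import date, timedelta
from typing import NamedTuple


class Record(NamedTuple):
    record_id: str
    last_activity: date   # hypothetical field: when the record was last needed
    retention_days: int   # hypothetical field: regulated retention period


def is_expired(record: Record, as_of: date) -> bool:
    """Return True if the retention period ended before `as_of`."""
    return as_of > record.last_activity + timedelta(days=record.retention_days)


def expired_records(records: list[Record], as_of: date) -> list[Record]:
    """Return the records that are candidates for deletion."""
    return [r for r in records if is_expired(r, as_of)]


records = [
    Record("A", date(2012, 3, 1), 5 * 365),
    Record("B", date(2020, 6, 15), 5 * 365),
]
for r in expired_records(records, as_of=date(2021, 11, 4)):
    print(r.record_id)  # → A
```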

What many organisations fail to realise is that PII discovery is not simply about compliance – getting rid of ROT can save time and money when it comes to physical IT resources. Moving less data into the cloud is beneficial because it means fewer database servers and nodes are required.

“Organisations are compelled to be compliant, which means they need to find privacy information and secure it… it’s also about calculating your risk of being breached within different environments. It's one thing to be breached, but why have you got data that you shouldn't even have been holding onto?” asks Wessels. “And now you’ve caused a larger breach because you just haven't done the basics of cleaning up big data and managing your records properly.”

While smaller companies may be scared of being fined, for larger companies PII discovery is also about reputational risk. If an organisation is hacked, it’s important to understand exactly where the incident took place and whether the PII within that database remains secure. “Discovering PII in structured or unstructured data pays for itself because it helps organisations to optimise their data and make better business decisions. It ultimately adds more value by bringing more insights into the data,” Wessels concludes.