Data discovery is where it all begins

Do you know what data you have, where it is and who is using it? If not, you are on a highway of pain on several fronts.
By Paul Meyer, Security solutions executive, iOCO Tech.
Johannesburg, 02 Dec 2022

Data has become the key corporate lever for growth and competitive advantage. Not coincidentally, the digitalisation of business and industry is generating ever-growing volumes of data: 1 000 petabytes per day, with the amount of data created globally each day set to reach 463 exabytes by 2025.

Data volumes are growing unimaginably large, and a growing share of that data is unstructured, which makes it harder to store, analyse and secure. As the value of data has become recognised, it has also attracted the attention of increasingly well-resourced cyber criminals.

Figures vary, but the Identity Theft Resource Center, which tracks publicly available data breach disclosures, found in its key findings for 2021 that nearly 294 million people were affected by data breaches that year.

The 2021 Thales Data Threat Report found that almost half (45%) of US companies had suffered a data breach in the past year.

According to the 2021 Year End Report: Data Breach QuickView, by Risk Based Security and Flashpoint, there were 4 145 publicly disclosed breaches in 2021, exposing over 22 billion records; that is approximately 5% fewer breaches than in 2020. However, additional incidents continue to surface: the number of breaches disclosed for a given year typically rises by a further 5% to 10% as the data matures.

Assuming this pattern continues, the final 2021 breach count is expected to at least match that of 2020, and potentially exceed it by as much as 5%.

Again, the numbers don’t matter so much − just understand that data breaches are increasingly frequent and damaging.


Apart from reputational losses, which are impossible to measure, the average cost of a data breach rose more sharply between 2020 and 2021 than in any other recent year, a spike likely influenced by the COVID-19 pandemic.

The average cost of a data breach in 2022 is estimated at $4.35 million, a 2.6% rise from $4.24 million in 2021.

Enter the regulators, with the European Union’s General Data Protection Regulation (GDPR) now acting as the “de facto global standard”, according to Nader Henein, research vice-president at Gartner.

Gartner believes two-thirds of the global population will have its personal data protected by regulation by 2023.

South Africa’s Protection of Personal Information Act (POPIA) is closely aligned with the GDPR.

To summarise: companies will hold increasing amounts of data, some of which is very valuable and much of which is protected by regulation. This data will also be the target of sophisticated cyber criminals.

Two key data challenges

Based on this analysis, it should be obvious that every organisation faces a set of challenges related to data. Broadly speaking, they are:

Cyber security: The corporate data treasure chest must be protected not only to comply with stringent data regulations, but also to safeguard information that could be useful to competitors. As always, risk and opportunity turn out to be two sides of the same coin.

Operational efficiency: The sheer volume of data is a challenge. Where will it be stored? How will it be managed? How clean is it? (The old saying, “Garbage in, garbage out” remains true.) It is impossible to manage these volumes of data manually and, of course, a large proportion of it is simply not useful. The challenge is to establish where the useful data is stored and to classify it. Then there is the issue of cost: all that data must be hosted somewhere, which means budget is being spent on expensive storage for redundant, obsolete or trivial data.

To solve both these issues, organisations need to begin with a process of data discovery. Only once the company has accurate, granular insight into what data it holds will it be possible to assess the risks, classify the data and take the appropriate steps to manage it effectively.

The pain points

Another way of understanding the challenges related to data is to look at the pain points organisations are experiencing:

  • How do I know what data I have? The challenge is to discover and classify sensitive data of all types across hybrid IT and multi-cloud ecosystems. A related issue is that only 14% of corporate data is business-critical, while 32% of data is redundant, obsolete or trivial.
  • How do I move towards purpose-based data collection? Organisations want to move beyond collecting data indiscriminately to a situation in which they only collect, process and store data for which they have both a use and the right to use it.
  • How do I ensure the data I hold is secure, but usable? Security is critical from both the compliance and reputational points of view, but the data must also be accessible when needed.
  • How do I control data access and usage within the context of the data life cycle? A related issue is ensuring that access to the data is governed by suitable policies, and that an audit trail is kept for compliance purposes. Data has a life cycle and must be treated accordingly. For example, there must be a valid reason for collecting the data, and the relevant regulation may specify how long it needs to be retained. In a similar vein, stale data needs to be deleted to reduce costs.

In my second article, I will look at what the solution to the challenge of data discovery should look like.