Stuck in a swamp: is the era of big data over?

By Jacques du Preez, CEO at Intellinexus

Johannesburg, 19 Apr 2022

Is the growing volume of unstructured data pointing to the end of big data as we know it?

Some analysts estimate that 80%-90% of all data generated today is unstructured, which adds immense complexity to companies' big data ambitions. Unstructured data also increases the risk that data lakes become either difficult to mine for value or, in the worst cases, unsuited to enabling better decision-making.

Trawling bottomless data lakes for value is not an efficient strategy for data-driven decision-making. Companies should instead develop strategies and deploy tools that act as a harpoon gun, homing in on precisely the right data for the problem or opportunity the business is dealing with.

Developing a clear business case for the data produces better outcomes than a spray-and-pray approach. Homing in on a business problem and defining which data sets can support and enable strategic decisions will yield better results than trying to mine endless troves of data in the hope that patterns and insights emerge.

The alternative is wasted time and effort and missed opportunities as business decision-makers drown in data swamps offering little to no actionable insight.

Drain the (data) swamp

The growing volumes of data that many organisations are dealing with are leading to data swamps that quickly become hard to manage and introduce inefficiencies in how companies mine that data for actionable insight.

One of the key challenges of data lakes is that it's all too easy to source vast volumes of data and dump it into a data lake in the hope of finding some value.

This often leads to data swamps where valuable insights are drowned in a sea of irrelevant data, slowing down decision-making and hampering organisational efforts to become truly data-driven.

To combat this, companies need to categorise and catalogue all data going into the data lake and then inform users about what data is available and where to find it.
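As a rough illustration of what categorising and cataloguing might look like in practice, the minimal Python sketch below registers each data set with a category, a location and a few tags, and lets users look up what is available. The class names, fields and the example path are assumptions made for illustration, not a reference to any particular catalogue product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogueEntry:
    """One data set in the lake. All fields are illustrative assumptions."""
    name: str
    location: str      # where users can find the data
    data_format: str   # for example "text" or "parquet"
    category: str      # business category used for browsing
    description: str
    tags: frozenset = field(default_factory=frozenset)


class DataCatalogue:
    """In-memory catalogue: register data sets, then search by category or tag."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Refuse duplicates so every data set has exactly one description.
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already catalogued")
        self._entries[entry.name] = entry

    def find(self, category=None, tag=None):
        # Return entries matching every filter that was supplied, sorted by name.
        matches = []
        for entry in self._entries.values():
            if category is not None and entry.category != category:
                continue
            if tag is not None and tag not in entry.tags:
                continue
            matches.append(entry)
        return sorted(matches, key=lambda e: e.name)


catalogue = DataCatalogue()
catalogue.register(CatalogueEntry(
    name="pos_transactions",
    location="lake/sales/pos/",  # hypothetical path
    data_format="parquet",
    category="sales",
    description="Point-of-sale transactions",
    tags=frozenset({"transactional"}),
))

for entry in catalogue.find(category="sales"):
    print(entry.name, entry.location)
```

The point of the sketch is that nothing enters the lake without a description, so users can ask the catalogue what exists before they go looking.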

Because data arrives in various formats, such as text files or Parquet, it is essential that companies maintain metadata for it and make that metadata accessible, which improves the data's utility.

Having properly defined metadata also adds significant value to that data: when automation is applied to the data pipelines that feed downstream data warehouses and applications, a metadata-driven approach can help eliminate many of the inefficiencies plaguing companies.
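To make the idea of a metadata-driven pipeline more concrete, here is a small Python sketch in which one generic loader handles every data set, and the metadata alone says how each file is delimited, which columns matter, how to type them and where the result should go. The data set names, column names and warehouse targets are hypothetical, and a real pipeline would read its metadata from a catalogue rather than a hard-coded list.

```python
import csv
import io

# Hypothetical metadata records: adding a data set means adding a record,
# not writing another loader.
DATASETS = [
    {
        "name": "customers",
        "delimiter": ",",
        "columns": {"customer_id": int, "segment": str},
        "target": "warehouse.dim_customer",
    },
    {
        "name": "orders",
        "delimiter": "|",
        "columns": {"order_id": int, "customer_id": int, "amount": float},
        "target": "warehouse.fact_order",
    },
]


def load_rows(raw_text, meta):
    """Parse delimited text, keeping and typing only the columns the metadata names."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    rows = []
    for record in reader:
        rows.append({col: cast(record[col]) for col, cast in meta["columns"].items()})
    return rows


def run_pipeline(sources, datasets):
    """Load every described data set and group the rows by warehouse target."""
    loaded = {}
    for meta in datasets:
        loaded[meta["target"]] = load_rows(sources[meta["name"]], meta)
    return loaded


# Stand-ins for files in the lake.
sources = {
    "customers": "customer_id,segment,notes\n1,retail,x\n2,wholesale,y\n",
    "orders": "order_id|customer_id|amount\n10|1|99.50\n",
}

result = run_pipeline(sources, DATASETS)
print(result["warehouse.dim_customer"])
```

Because the loader never mentions a specific data set, undocumented data simply cannot flow through it, which is one way the metadata requirement enforces itself.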

Data habits driving better outcomes

How do organisations minimise the risk of data swamps and ensure they have the most appropriate data and insights at their fingertips to drive positive business outcomes?

Certain good data habits can aid organisational efforts at achieving data-driven decision-making:

Firstly, using accurate organisational data to determine future decisions creates greater transparency and helps break down silos between different departments.

As enterprises create and store more data – for example, transactional data – in digital form, they create greater opportunity to mine that data and guide discovery of new insights that may have been hidden previously.

Segmentation and customisation can provide further opportunities to fine-tune product or service offerings to specific customer segments, which can drive greater revenue and improve business outcomes.

Automation is essential, though. The underlying algorithms that analyse big data sets can be used to replace manual decisions and improve efficiency by doing away with the need for labour-intensive manual calculations. This can have the knock-on effect of optimising processes and improving accuracy and response times.

Finally, organisations that put data at the centre of innovation, for example by analysing purchasing data to identify demand for products in market segments where it was not previously apparent, typically have better processes and systems in place to improve data usage and avoid data swamps.