Big data versus accuracy

Analysis of unstructured big data has the potential to enhance structured data analysis. But don't believe everything it contains.

By Mervyn Mooi, Director of Knowledge Integration Dynamics (KID) and representative of the ICT services arm of the Thesele Group.
Johannesburg, 05 Aug 2016

Companies around the world are buying into the hope that big data analytics, which includes combining structured and unstructured data, will deliver an all-knowing 'crystal ball' to drive competitiveness. Last year, Gartner reported big data analytics services alone was a $40 billion market and growing fast.

A major part of the big data appeal is the promise of combining accurate and structured internal data with fast-changing unstructured external data, offering a complete picture of the market environment and the company's own position within it.

However, while unstructured external data could add useful new methods for information-gathering and decision-making processes, it cannot be considered 100% accurate. In some cases, it will not be even close to accurate, and cannot be counted on as a basis for making crucial business decisions.

Search and repeat

How much unstructured external data is brought into the big data mix, and how much credence it is given, depends on three things: the questions to be addressed, the company's willingness to accept discrepancies in the data when answering a particular question, and the importance of the decisions that will rest on the big data analysis. Searching for useful insights in unstructured external big data may also take several passes before acceptable data is identified.

For example, a new car dealership looking for prospective customers might rely entirely on external data to build a leads list. It might use a search engine to identify companies in the area of the dealership, then narrow down the list to companies likely to need cars and likely to have the budget for new cars. The resulting leads list is a good start, but may still require verification calls to determine whether the prospective customers are still in business, still based in the area and likely to be interested.
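
The narrowing and verification steps in the dealership example can be sketched in a few lines of Python. This is only an illustration: the `Lead` fields, the `build_leads` and `needs_verification` helpers, and the distance and budget thresholds are all hypothetical and are not drawn from any real dealership system.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    """A company found through an external search (all fields are illustrative)."""
    name: str
    distance_km: float          # distance from the dealership
    likely_needs_cars: bool     # a guess drawn from external data
    estimated_budget: float     # a guess drawn from external data
    verified: bool = False      # set to True only after a verification call


def build_leads(candidates, max_distance_km=25.0, min_budget=300_000.0):
    """Narrow raw search results to a provisional leads list.

    The thresholds are assumed values chosen for the example.
    """
    return [
        c for c in candidates
        if c.distance_km <= max_distance_km
        and c.likely_needs_cars
        and c.estimated_budget >= min_budget
    ]


def needs_verification(leads):
    """Return the leads that still require a verification call."""
    return [lead for lead in leads if not lead.verified]
```

Every lead starts out unverified, which mirrors the point above: the filtered list is a starting point, and a lead is trusted only once someone has confirmed it.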

A bank investigating new branches and new markets might combine its own structured customer data with unstructured external data such as a map, to plot a visual representation of where existing customers are, and where there are gaps with potential for marketing to new customers. This insight may require further clarification and does not guarantee new customers in the blank spots on the map, but it does give the bank useful information to work with.

When a company is seeking insights for business-critical decisions, the ratio of qualified structured data to unstructured external data should be around 90:10, with unstructured external data serving to complement the analysis, not form the basis of it. This is because structured (high-value) data is traditionally compliance- and quality-bound and can be trusted.
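
As a rough illustration of the 90:10 guideline, the check below measures the mix by record count. That measure is an assumption made for the sketch, since the guideline does not say how the ratio should be calculated, and the function names and the default threshold parameter are hypothetical.

```python
def structured_share(structured_records: int, unstructured_records: int) -> float:
    """Fraction of the analysis input that is qualified structured data.

    Counting records is an assumed way of measuring the mix.
    """
    total = structured_records + unstructured_records
    if total == 0:
        raise ValueError("no records supplied")
    return structured_records / total


def fit_for_critical_decision(
    structured_records: int,
    unstructured_records: int,
    min_structured_share: float = 0.90,  # the "around 90:10" guideline
) -> bool:
    """True when structured data makes up at least the required share."""
    share = structured_share(structured_records, unstructured_records)
    return share >= min_structured_share
```

A data set of 900 structured and 100 unstructured records would pass this check, while an 800/200 mix would not. Lower-stakes questions could reasonably use a lower threshold.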

When using big data analytics, companies should also note that deriving business value from big data is not an exact science, and there are no guarantees. For instance, a company using its own data in combination with unstructured data to assess interest in its products might count visits to its Web site as an indicator of its popularity.

While the visitor figures might be accurate, the assumptions made based on the figures could be completely wrong, since visitors to the site could have stumbled across it by accident or have been using it for comparison shopping and have no interest in buying the products.

Big data analytics is helpful for traversing high volumes of unstructured data and supplementing the company's existing, qualified data. But, depending on the answers needed, big data will need to achieve greater degrees of accuracy and reliability before business-critical decisions can be made based on its analysis.