Get chatty with GenAI to find data that says it all

Talk to your data and it will talk back to you. Welcome to the brave new world of generative artificial intelligence.
By Wayne Lebotschy, Growth and innovation director, DataOrbis, part of Smollan.
Johannesburg, 07 Jun 2024

More than 20 years ago, the first data warehouses burst onto the scene. While many of those early data warehousing projects may have failed to meet expectations, they did provide an indication of what could be possible.

Fast-forward to the 2020s, and the emergence of the internet of things and the related proliferation of connected devices and sensors pushed the volumes of data being generated off the scale.

In parallel, what one could call “the promise of data” became ever more glittering. Everybody now agreed that data was the organisation’s most valuable asset, and that the ability to analyse it, and draw insights from it, would usher in a new era of fact-based, rather than intuition- or experience-based, decision-making.

Over the years, tools emerged to make it easier to visualise, and thus understand, what the data was saying. But the fact remained that users had to request what reports they wanted, and those reports had to be coded by a steadily growing development team.

Getting exactly the report one wanted was a battle, both time-consuming and expensive. Even worse, the slow “time to insight” meant the insight was often outdated by the time it was gained.

The launch of ChatGPT and the subsequent advances in the capabilities of artificial intelligence (AI) spelled the answer to the longstanding data question. Thanks to generative AI (GenAI), it’s now possible to put queries to data in plain English and receive a useful answer, and then query the results for more information or a related question.

It’s nothing less than a step change. Thanks to GenAI, the query process becomes completely user-friendly and fully interactive. “Show me the sales data for the past six months, and now show me the top 10 items in the last six months compared to the corresponding period last year and the year before,” and so on. There’s no need to contact the IT department to get a business analyst to come and scope the query and then write the necessary code.
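Under the hood, a query layer like this typically translates the natural-language request into SQL and runs it against the existing database. Here is a minimal Python sketch of that flow; the LLM call is stubbed out with simple keyword matching, and the table and function names are purely illustrative, not any specific product's API:

```python
import sqlite3

def nl_to_sql(question: str) -> str:
    """Stand-in for an LLM call: map a natural-language question to SQL.
    A real system would prompt a language model with the schema and question."""
    q = question.lower()
    if "top" in q and "items" in q:
        return ("SELECT item, SUM(amount) AS total FROM sales "
                "GROUP BY item ORDER BY total DESC LIMIT 10")
    if "sales" in q:
        return ("SELECT month, SUM(amount) AS total FROM sales "
                "GROUP BY month ORDER BY month")
    raise ValueError("Question not understood")

# In-memory demo data standing in for existing corporate data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, item TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2024-01", "widget", 100.0),
    ("2024-01", "gadget", 250.0),
    ("2024-02", "widget", 120.0),
])

sql = nl_to_sql("Show me the sales data for the past six months")
rows = conn.execute(sql).fetchall()
print(rows)
```

The follow-up question (“now show me the top 10 items…”) simply becomes another round trip through the same translation step, which is what makes the conversation feel interactive.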

And given the advances in voice-recognition technology, it’s possible to pose the query by voice − and even have the response read back aloud.

What needs to be in place

The future is often portrayed as bleaker than the present − but as regards data, at least, it looks bright. However, a few important principles must be in place:

Any data source must be usable. One of the big problems in the past was the elaborate programmes needed to rejig corporate data into a usable form − time-consuming, expensive and ultimately impractical. The new generation of tools must be able to use existing corporate data as is; a new data strategy should not be required.


Competitive data must be secured. There are whole chapters that could be written about data security − as it has proliferated and its value has become apparent, the need to protect it has become paramount.

Regulations like the General Data Protection Regulation in the European Union and the Protection of Personal Information Act in South Africa are designed to address one aspect, but organisations themselves are increasingly concerned about protecting their own data, as it contains highly competitive information.

At the same time, though, this data needs to be available to GenAI. Users must be able to ask whatever question they want, but the underlying architecture needs to ensure the data doesn’t leak from its database into the large language model (LLM) doing the processing.
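One common pattern for achieving this is to send only the table schema and the question to the LLM, then run the generated SQL inside the organisation’s own database, so row-level data never leaves it. A hedged sketch, in which `call_llm` is a placeholder stubbed with a canned response rather than a real model API:

```python
import sqlite3

SCHEMA = "CREATE TABLE sales (month TEXT, region TEXT, amount REAL)"

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call. The prompt contains only the
    schema and the question - never the rows themselves."""
    return ("SELECT region, SUM(amount) AS total FROM sales "
            "GROUP BY region ORDER BY region")

def answer(question: str, conn) -> list:
    prompt = f"Schema:\n{SCHEMA}\nQuestion: {question}\nSQL:"
    sql = call_llm(prompt)               # schema + question leave the building
    return conn.execute(sql).fetchall()  # the data itself is queried locally

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("2024-01", "north", 100.0), ("2024-01", "south", 40.0)])
print(answer("Total sales per region?", conn))
```

The design choice here is the separation of duties: the LLM only ever writes the query, while the database, sitting behind the organisation’s own access controls, is the only component that ever touches the data.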

The tool must be simple to use, but able to deal with complex questions. This may sound like a new variation on “having your cake and eating it” but in fact, it’s the big promise of GenAI.

Really, the point here is that the interface through which the query is asked must be intelligent enough to be capable of dealing with imprecise, natural language queries framed in typical business language. But, at the back-end, there must be powerful capability to solve complicated business problems.

One other thing: to make it truly user-friendly, it should be possible for the conversation to take place in multiple languages, not just English. Zulu, Hebrew, French, Hindi and so on should all be supported.

The tool must be able to use multiple LLMs. The LLM is the machine learning model where all the magic happens, and LLMs are trained by analysing massive data sets. A general-purpose model like the one behind ChatGPT draws on a wide variety of data, but in business there is a growing move towards sector-specific LLMs trained on specialist data sets.

For example, a user in the finance department will be asking very different questions, and referring to very different types of data, from a user in the marketing department. Ideally, therefore, the GenAI tool must be able to use multiple specialist LLMs in the background, without the user even knowing what is going on.
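One way to picture this routing is a small dispatcher that inspects each question and hands it to the appropriate specialist model. The department names, keyword lists and matching logic below are purely illustrative; a production router might itself use a classifier model rather than keywords:

```python
# Keywords that suggest which hypothetical specialist LLM should handle a
# question. Everything else falls through to a general-purpose model.
SPECIALIST_MODELS = {
    "finance": ["revenue", "margin", "cash flow", "ebitda"],
    "marketing": ["campaign", "brand", "audience", "conversion"],
}

def route(question: str, default: str = "general") -> str:
    """Pick a model for the question; the user never sees this happen."""
    q = question.lower()
    for model, keywords in SPECIALIST_MODELS.items():
        if any(k in q for k in keywords):
            return model
    return default

print(route("What was our EBITDA margin last quarter?"))  # finance
print(route("How did the spring campaign perform?"))      # marketing
```

The point of hiding this behind the interface is exactly as described above: the finance user and the marketing user type into the same box, and the tool quietly sends each question to the model trained on the right specialist data.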

The tool must be contextual or embedded. Rather than users having to leave the application or portal they are using to access the analytics dashboard, it is much more useful if the dashboard can be embedded where it is needed.

Generative AI applications that embody these principles have the potential to change the way we use data, and greatly improve the value we get out of it.

On the one hand, by making it easy for any user to query the data using a wide range of conversational language, they effectively democratise data, and make it quick and easy to get the report one needs. The only real limitation that remains is the user’s ability to ask the right questions.

On the other hand, there is a greatly reduced need for large development teams as GenAI can do the vast bulk of the back-end coding required for each report. Consequently, organisations get a better result for much less money.

We now truly can talk to our data, and have it talk back to us − and save money at the same time.