Get back to basics to capitalise on ML and AI in analytics

Read time 4min 20sec

When artificial intelligence (AI)-enabled predictive and prescriptive analytics first promised companies a crystal ball with which to view their futures, many were so blown away by the glamour of beautiful, interactive graphs that they rushed to invest in the technologies without getting their foundations right.

For many of them, AI projects proved disappointing.

Gartner predicted some years ago that through 2022, 85% of AI projects would deliver erroneous outcomes due to bias in data, algorithms or the teams responsible for managing them. A 2019 white paper by Pactera Technologies, in association with Nimdzi Insights, echoed this high failure rate.

Despite these challenges, AI has become central to most organisations’ digital transformation journeys.

The PricewaterhouseCoopers third annual AI Predictions Report, issued last month, indicates companies still have faith in the potential for AI: over half of US respondents are increasing their AI investments after the COVID-19 crisis, even though 76% of organisations are barely breaking even on their AI investments.

The research found that a quarter of the participants reported widespread AI adoption, up from 18% the year before.

AI needs data

For analytics to deliver, the data in use has to be clean, accurate and relevant. Data analysts have traditionally spent a great deal of their time – up to 70% – on data discovery and preparation rather than on developing data science models.
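As a rough sketch of why preparation consumes so much of that time, the checks below profile a tiny, invented CSV extract for the usual culprits – missing values, duplicate records and inconsistent category labels – using only the Python standard library. The column names and data are hypothetical, chosen purely for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: a small CSV with the kinds of issues
# that consume most of an analyst's preparation time.
RAW = """customer_id,age,region,spend
1001,34,Gauteng,2500
1002,,Western Cape,1800
1003,29,gauteng,
1004,41,Gauteng,3100
1002,,Western Cape,1800
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Profile missing values per column.
missing = Counter()
for row in rows:
    for col, val in row.items():
        if not val:
            missing[col] += 1

# Detect exact duplicate records.
seen, duplicates = set(), 0
for row in rows:
    key = tuple(row.values())
    if key in seen:
        duplicates += 1
    seen.add(key)

# Spot inconsistent category labels (case differences).
regions = {r["region"].strip().lower() for r in rows}

print(dict(missing))   # columns with missing values
print(duplicates)      # duplicate record count
print(regions)         # distinct region labels after normalising
```

Only once issues like these are profiled and fixed does model development become productive.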

Similarly, for AI in analytics to be effective, the right foundations have to be in place. AI depends fundamentally on data, yet many organisations struggle to identify, access and make available the datasets relevant to their use cases; consequently, their AI models do not always yield the expected results.


In South Africa, AI will be the next logical step after the current wave of machine learning adoption. However, since machine learning (ML) is already being deployed to draw conclusions and predictions from sample datasets, it requires the same foundation AI will: quality data that is readily accessible.

If the datasets in use are of poor quality or do not represent the target markets (i.e., they are non-specific or generalised), predictions will be poor, and business decisions will end up being made on incorrect data.
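A toy illustration of the point, with invented segment names and spend figures: a naive "average spend" predictor fitted only to one easily sampled customer segment badly misestimates the market as a whole.

```python
# Hypothetical market split into two segments; a representative model
# should be trained on both, not just the one that is easy to sample.
population = {
    "urban": [3000, 3200, 2800],  # well-sampled segment
    "rural": [900, 1100, 1000],   # under-represented segment
}

all_spend = [v for segment in population.values() for v in segment]
true_mean = sum(all_spend) / len(all_spend)

# A naive "average spend" predictor trained only on urban data...
biased_mean = sum(population["urban"]) / len(population["urban"])

# ...overestimates the whole market by 50%, so pricing or stocking
# decisions based on it would rest on incorrect data.
error = (biased_mean - true_mean) / true_mean
print(true_mean, biased_mean, error)  # → 2000.0 3000.0 0.5
```

The arithmetic is trivial here, but the same skew occurs silently inside far more sophisticated models.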

As the demand for insights escalates and AI enters the analytics arena, organisations will need to look at preparing a catalogue of timely and good quality data, so that data scientists can quickly review the data sources and define algorithms to get good insights into the business use case.
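Such a catalogue can start as little more than structured metadata per source. The minimal sketch below – with invented source names, owners and fields – shows how a catalogue entry lets a data scientist shortlist timely, quality-checked sources for a use case without first exploring the raw data.

```python
from dataclasses import dataclass, field
from typing import List

# A minimal, hypothetical catalogue entry: enough metadata for a
# data scientist to judge a source's fitness for a use case.
@dataclass
class CatalogueEntry:
    name: str
    owner: str
    location: str           # e.g. a table name or object-store path
    refresh_frequency: str  # how timely the data is
    quality_checked: bool
    use_cases: List[str] = field(default_factory=list)

catalogue = [
    CatalogueEntry("customer_transactions", "finance", "s3://dw/tx/",
                   "daily", True, ["churn-prediction"]),
    CatalogueEntry("web_clickstream", "marketing", "s3://dw/clicks/",
                   "hourly", False, ["churn-prediction", "recommendations"]),
]

# Quickly answer: which quality-checked sources serve this use case?
candidates = [e.name for e in catalogue
              if e.quality_checked and "churn-prediction" in e.use_cases]
print(candidates)  # → ['customer_transactions']
```

Commercial catalogue tools add lineage, search and governance on top, but the underlying idea is exactly this lookup.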

The right data in the cloud

Where the data resides is another key consideration. In the past, many organisations lifted and shifted entire big data lakes from data centres into the cloud, in the belief this would offer cost savings. Not only were these savings not achieved, but analysis often became more complex.

The cloud does present an ideal environment for analytics due to its scale, elasticity, redundancy and accessibility to the broader enterprise; however, not all data should be in the cloud – only the right data.

The approach to cloud migration projects is often focused on “getting the data to the cloud”, instead of starting with a top-down approach and focusing only on the data that is pertinent to the business problem.

Unmanaged, non-curated data ingestion and storage bloats cloud platform costs, because unnecessary and irrelevant data is moved and processed there. This is a major challenge for the “extract, load now and transform later” methods traditionally followed by big data projects.
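By contrast, a top-down approach curates only the fields a use case needs before anything is moved. A minimal sketch, using an invented source extract and a hypothetical churn use case:

```python
import csv
import io

# Hypothetical source extract: far more fields than the use case needs,
# including personal data that should not be moved at all.
RAW = """customer_id,name,id_number,churn_flag,last_login,browser,spend
1001,Ann,8001015009087,0,2020-11-01,Chrome,2500
1002,Ben,7902235800085,1,2020-07-14,Edge,1800
"""

# Top-down: the churn use case defines the pertinent fields up front.
PERTINENT = ["customer_id", "churn_flag", "last_login", "spend"]

curated = [
    {col: row[col] for col in PERTINENT}
    for row in csv.DictReader(io.StringIO(RAW))
]

# Only the curated subset is ingested; name, id_number and browser
# never reach the cloud platform, so they incur no storage or
# processing cost there.
print(curated[0])
```

The filtering itself is trivial; the discipline lies in letting the business problem, not the source system, decide what travels.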

When embarking on these initiatives, one of the critical success factors is to focus on a set of high-value use cases on which to base the AI/ML model. These use cases are typically based on a business problem that needs to be addressed.

Data is often collected from a myriad of sources; the key, however, is to identify the data relevant to these high-value use cases, so that the business problem is solved, cloud platform costs are managed, and the AI model delivers more than just a technical benefit.

When companies embark on cloud migration projects, it is essential that they consider their data management strategy for the cloud in unison with the cloud migration planning process.

Use of technology such as data integration platforms, data catalogues, and data quality and data security solutions is key at this step, as it ensures data management principles can be executed in a controlled and standardised way.
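In the absence of a commercial tool, even a simple rule set makes quality checks controlled and standardised: every record passes through the same named rules, and failures are reported consistently. The rules and field names below are illustrative assumptions, not any particular product’s API.

```python
# A hypothetical, standardised rule set: each rule is a named predicate
# applied uniformly to every record before it enters the cloud platform.
RULES = {
    "customer_id present": lambda r: bool(r.get("customer_id")),
    "spend is numeric": lambda r: str(r.get("spend", "")).replace(".", "", 1).isdigit(),
    "age in range": lambda r: 0 < int(r.get("age", 0) or 0) < 120,
}

def validate(record):
    """Return the names of the rules the record fails."""
    return [name for name, check in RULES.items() if not check(record)]

print(validate({"customer_id": "1001", "spend": "2500", "age": "34"}))  # → []
print(validate({"customer_id": "", "spend": "n/a", "age": "34"}))
```

Because the rules are data rather than scattered ad hoc checks, the same governance can be applied to every pipeline feeding the cloud platform.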

Once this data foundation is set, it enables any future initiatives in the cloud to follow a recipe for data curation that leads to better AI models and insights.

Veemal Kalanjee

MD of Infoflow.

Veemal Kalanjee is MD of Infoflow, part of the Knowledge Integration Dynamics (KID) group. He has an extensive background in data management sciences, having graduated from Potchefstroom University with an MSc in computer science. He subsequently worked at KID for seven years in various roles within the data management space. Kalanjee later moved to Informatica SA as a senior pre-sales consultant, and recently moved back to the KID Group as MD of Infoflow, which focuses on data management technologies, in particular, Informatica.
