Using AI to build new AI now a reality
Once the stuff of science fiction, the ability to harness artificial intelligence (AI) to build new AI is now within reach of citizen data scientists, says IBM.
Wolfgang Knupp, Consulting Data and AI Architect at IBM South Africa, says while AI-infused technologies are widely in use across consumer products and services, most people still find themselves doing many of the same laborious tasks they have always done at work. “They don’t need to do so – it has become quite achievable for them to use AI at work to make their lives easier,” he says.
“I would go as far as to say that the time has arrived where it is possible for citizen data scientists to successfully build their own AI, thanks to the fact that AI-infused technology is available to significantly accelerate the previously complex, expensive and time-consuming steps required to build new AI,” Knupp says.
Knupp says the steps involved in building one’s own AI are:
- Articulate what you expect your AI to do for you (the use case);
- Gain access to a modern data platform that includes data fabric and MLOps capabilities, such as IBM’s Cloud Pak for Data. “Cloud Pak for Data is available as a SaaS offering on IBM Cloud, but it can also be deployed on any cloud or on-premises infrastructure, with some key components available as free ‘lite’ plans,” he says.
- Gather/source as much relevant data as possible to train your AI. “The data platform/data fabric you use can help you find, access and shape the data you need for your AI,” Knupp says. “A data platform should include a ‘shopping for data’ capability where users can simply state what they are looking for and the platform will respond with a list of available data assets matching the search request. Should the required data assets not be available in the data catalogue, the data fabric should let you place a request for data into the platform. That request is then automatically routed, via a managed workflow, to a person in your organisation who can source the data and make it available to you.”
- Understand your data. Says Knupp: “The data platform/fabric must allow you to explore the found data assets in terms of context, quality, timeliness, size, origin and so on, so you can ascertain whether they are suitable as a basis for your AI. A good data platform/fabric will have auto-discovery, ML-assisted curation and data quality features that simplify and accelerate these previously laborious and time-consuming data governance and stewardship tasks.”
- Engineer and shape your data to become usable for building AI models. “You may need to combine data from multiple sources, establish common keys on which to join them, change data types, format or standardise codes, remove bad data, and so on,” he says. “To make these tasks manageable for non-experts, the data platform must provide easy-to-use codeless data shaping tooling and scalable data virtualisation.”
- Create/train/test your machine learning model(s). “Until very recently, this was the exclusive domain of data scientists, but thanks to some clever automation and advances in AI, tools like Watson AutoAI now allow citizen analysts to simply present their data sets to the tool and select the column/field they would like the model to predict. AutoAI will then prepare and standardise the data, and iterate through a large number of applicable ML algorithms to train and evaluate which model is best suited to the data,” Knupp says. “It allows you to review the model quality using a variety of metrics and then save the model for deployment.”
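Conceptually, the “shopping for data” capability described in the data-gathering step can be sketched as a keyword search over a catalogue of data assets. The catalogue entries and function below are invented for illustration; they are not an IBM API:

```python
# Illustrative sketch of a "shopping for data" search over a data catalogue.
# The asset records and the search function are hypothetical examples,
# not part of any real product API.

catalog = [
    {"name": "customer_master", "tags": ["customer", "crm"], "owner": "sales"},
    {"name": "churn_history", "tags": ["customer", "churn"], "owner": "analytics"},
    {"name": "warehouse_stock", "tags": ["inventory"], "owner": "logistics"},
]

def shop_for_data(query, assets):
    """Return the assets whose name or tags match the search term."""
    term = query.lower()
    return [a for a in assets if term in a["name"] or term in a["tags"]]

matches = shop_for_data("churn", catalog)
print([a["name"] for a in matches])  # prints ['churn_history']
```

A real data fabric would match on richer metadata (descriptions, lineage, quality scores) and fall back to a managed request workflow when nothing matches, as described above.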
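The model-training step AutoAI automates — fitting several candidate models on the same data, scoring each, and keeping the best — can be illustrated with a deliberately tiny search space. Real AutoAI explores far larger pipeline spaces; the two hand-rolled models and the data below are toy examples:

```python
# Toy version of an automated model search: fit each candidate model,
# score it, and keep the best. The data and both "models" are invented
# stand-ins for the much larger pipeline space a tool like AutoAI explores.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]   # roughly y = 2x

def fit_mean(xs, ys):
    """Baseline: always predict the mean of the target column."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Simple least-squares line through the data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

candidates = {"mean_baseline": fit_mean, "linear_regression": fit_line}
scores = {name: mse(fit(xs, ys), xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # prints linear_regression
```

AutoAI additionally handles data preparation, feature engineering and hyper-parameter optimisation, and reports multiple quality metrics per pipeline rather than a single error score.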
Knupp says his team has had very positive feedback from experienced data scientists who report that this tooling saved them many months of work and significantly reduced the chance of errors.
He says: “You need to deploy the machine learning model(s) to an environment where they can be used for real-time scoring via an API. The deployment and operation of ML models has also been significantly accelerated and simplified when using a modern data platform/fabric like IBM’s Cloud Pak for Data.
In a few simple steps, a saved ML model can be deployed into a production environment in online or batch scoring modes, complete with a secure REST API endpoint that can be called by your application. Finally, you build a new AI-enabled application, or adapt an existing one, to use the deployed ML model and provide the AI assistance you want.”
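Calling such a deployed model from an application amounts to POSTing a JSON payload to the scoring endpoint. The sketch below uses only the Python standard library; the URL and token are placeholders, and the “fields”/“values” payload shape follows the pattern used by Watson Machine Learning online deployments (verify the exact format against your deployment’s API reference):

```python
# Sketch of calling a deployed model's REST scoring endpoint.
# SCORING_URL and API_TOKEN are placeholders, and the payload shape is an
# assumption based on common Watson Machine Learning deployments -- check
# your own deployment's documentation before relying on it.
import json
import urllib.request

SCORING_URL = "https://example.com/ml/v4/deployments/CHANGE_ME/predictions"  # placeholder
API_TOKEN = "CHANGE_ME"  # placeholder bearer token

payload = {
    "input_data": [{
        "fields": ["age", "monthly_spend", "tenure_months"],
        "values": [[42, 410.5, 18]],   # one row to score
    }]
}

request = urllib.request.Request(
    SCORING_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_TOKEN}",
    },
    method="POST",
)

# urllib.request.urlopen(request) would send the request and return the
# model's prediction(s); it is left uncalled here because the endpoint
# above is a placeholder.
print(request.get_method())
```

The same endpoint can serve many applications at once, which is what makes the hand-over to application development teams straightforward.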
Knupp says at this stage of development, the citizen data scientist could hand over to the application development or integration teams. Alternatively, they could build a new data-driven application themselves using development tools like Palantir for Cloud Pak for Data, which takes advantage of the power of the data fabric to construct data-driven, AI-enabled applications using a low-code/no-code approach.