Date: 2025-02-27

It’s crucial that African languages and voices are part of the AI evolution, says Lelapa AI.

Lelapa AI has collaborated with Way With Words and the University of Pretoria’s Data Science for Social Impact to unveil a framework dedicated to artificial intelligence (AI) ethics and inclusion.

Named the Esethu Framework, the model aims to ensure African language speakers are not only contributors to AI research, but also beneficiaries of its growth, says Lelapa AI.

It is a sustainable data curation model that gives African communities greater control over their linguistic data, while ensuring ongoing reinvestment into new African language datasets.

“Lelapa AI has created a novel data framework that prioritises the African language technology ecosystem,” says Jenalea Rajab, research lead at Lelapa AI. “It ensures commercial use is open to African entities, while revenue from non-African companies funds local data creation.”

Lelapa AI is an Africa-centric AI research and product lab dedicated to promoting linguistic diversity and digital inclusion.

African languages remain vastly underrepresented in AI models. According to the lab, AI giants have freely used African language data without reinvesting in the communities that created it.

The Esethu Framework aims to change this. Key features include the Esethu licence and a community-centric licensing scheme, to create clear pathways for ethical commercialisation by ensuring foreign companies using African language data pay it forward, funding more language data collection.

Additionally, it features community-led development by ensuring local linguists and native speakers play a key role in dataset creation. It also focuses on scalability and replicability, so that it can be applied to any low-resource African language that needs better AI representation.

The collaborators have introduced the first dataset developed under the framework, the Vuk’uzenzele isiXhosa Speech Dataset (ViXSD).

The ViXSD dataset is an open-source automatic speech recognition dataset for isiXhosa. It includes 10 hours of isiXhosa speech data, diverse speakers across dialects, age groups and regions, as well as ethical licensing that ensures future isiXhosa data growth.

“This dataset opens new opportunities for building voice AI, transcription tools and multilingual natural language processing models that better serve isiXhosa speakers,” says Lelapa AI.

“The Esethu Framework ensures the same process can be applied to build datasets for other African languages, ensuring long-term, sustainable language AI development across the continent.”