
The lesser spotted language

Human language technologies empower speakers of minority and endangered languages, facilitating their use online.

By Tarryn Giebelmann, Sub-Editor
Johannesburg, 23 Jan 2013
Human language technologies currently in development claim to be able to translate one language into any other language.

The Internet has drastically changed the way people communicate, how they organise their lives and how they use language.

With its wide reach, the Internet has the potential to encourage the use, spread and development of minority and/or endangered languages. However, any attempt to save a minority language or promote its use on the Internet is bound to come up against a number of hurdles, including the digital divide as well as the expense and effort required to prepare a language, in terms of hardware and software, for use on the Internet.

Daniel Prado, executive secretary of the Maaya World Network for Linguistic Diversity, notes that while the Internet has become a tool of daily life for urban populations in industrialised countries, it remains internationally inaccessible to five out of seven individuals. He points out that more than five billion people lacked Internet access at the end of 2010 and that distribution itself is uneven - at most, 10% of Africans are connected, compared to 25% of Asians, 80% of North Americans and 65% of Europeans.

According to Unesco, nearly half of the world's 6 000 languages could disappear by the end of the century. The crux of language disappearance lies in a decrease in speaker numbers - 50% of the world's languages are spoken by fewer than 10 000 individuals.

Changing landscape

Linguist David Crystal has noted that an endangered language will progress if its speakers can make use of electronic technology, and that the Internet has a particularly important role to play in the future of minority languages. However, any discussion on Internet media must be grounded in the realities of access and use, as access to the Internet is far from universal and is far down the list of priorities for many communities.

People's desire to use technology can drive them to find solutions that overcome the technical barriers to using their languages online.

Although the demographics are changing, English has been the dominant language of the Internet since its inception. Daniel Cunliffe, who has conducted extensive research on minority languages, points out that where sufficient online content exists, national languages appear robust in resisting English. However, even if there is sufficient minority language content on the Internet, the software used to create that content is often in English or another majority language, implicitly reinforcing the dominant status of these languages.

According to Prado, a globalisation process is amplifying language extinction, and this extinction rate is only increasing in the information age as the ICT industry promotes the better equipped or more prestigious languages to the detriment of others. This means speakers will gradually lean towards the language that allows them the widest range of expression.


However, according to Crystal, the Internet offers a home to all languages - as soon as their communities have an electricity supply and functioning computer technology. He further notes that Africa has an extremely high percentage of mobile subscribers and is also the region with the highest mobile growth rate. With access to the Internet via mobile phones being one of the major growth areas, this could greatly alter the linguistic scenario of the Internet.

However, African languages remain poorly represented. Prado notes that, in a sample of 1 374 African sites, only 3.22% used an African language as the language of communication.


Sociolinguist Maik Gibson highlights the potential of mobile phones in language-revitalisation efforts. He points to the increased penetration of mobile phones into regions such as Kenya, along with the falling price of Internet-capable phones. Because of this, Gibson notes, Internet access is no longer dependent on a constant electricity supply or broadband cables, which could allow developing nations to leapfrog infrastructure hurdles.

Technical barriers

The Internet is historically an American technology, modelled on English standards and, as a result, many languages with few speakers are not well represented, if at all.

Crowdsourcing

* Arapesh and Facebook: American anthropologist Lise Dobrin is working to preserve Arapesh, a minority language in Papua New Guinea, through recorded conversations. A Facebook community of Arapesh speakers accesses the recordings and contributes to the ongoing initiative.
* Indigenous Tweets: Created by Saint Louis University professor Kevin Scannell in March 2011, Indigenous Tweets tracks 127 languages and 47 806 users through micro-blogging site Twitter. A software program trawls Twitter and analyses tweets using statistical language recognition. The program uses a database of words in minority languages to locate speakers of those languages. If a certain percentage of a person's tweets are in the target language, the tweets of that person's followers are checked, too. A database of speakers of the minority languages is compiled, which can be accessed from the Web site.
* Google's Endangered Languages Project: Arguably the biggest online language initiative to date, the project was launched in June 2012 and allows people to find, share and store information about dialects in danger of disappearing. The site documents more than 3 000 languages that are on the verge of extinction (about half of all languages in the world). Jacob Collard, interim team leader of the ELCat project at Eastern Michigan University, which provides the information on the Google ELP site, notes the site has about 11 000 registered users and 22 registered organisations, all of which are potential contributors. By August 2012, more than 3 000 samples of endangered languages and 170 related documents had been submitted.
* Duolingo: Carnegie Mellon University professor Luis von Ahn created Duolingo, a project that combines translation with language learning. As users gain proficiency in a new language, they use that knowledge to help translate documents on the Internet for others. The model presents a win-win situation - users learn a new language, while Internet content is translated for other users. The project predicts that if one million people used the service to learn, the entirety of English Wikipedia could be translated into Spanish in just 80 hours.
* Thu'um.org: The Internet is also being used to create new languages or to add to the lexicons of existing languages. Thu'um.org is using crowdsourcing in a community-driven initiative to beef up the lexicon of the Dragon Language from The Elder Scrolls V: Skyrim computer game. Contributors can propose new words to be added to the lexicon, such as 'aaznah', which means 'mother' in Dragon Language. More for fun than anything else, this project is a perfect example of how the Internet can be used to communicate in any language users choose.
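
The follower-crawling approach described for Indigenous Tweets can be pictured with a rough sketch. This is not the project's actual code: the word list, the thresholds and the function names are all hypothetical, and a plain word-list match stands in for the statistical language recognition the real program uses.

```python
import re
from collections import deque

# Hypothetical word list for the target language (not real project data).
TARGET_WORDS = {"molo", "enkosi", "unjani", "ndiyaphila", "usale", "kakuhle"}


def is_target_language(text, min_share=0.5):
    """Return True if at least min_share of the words are in the word list."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return False
    hits = sum(1 for w in words if w in TARGET_WORDS)
    return hits / len(words) >= min_share


def find_speakers(seed_user, tweets_by_user, followers_of, min_tweet_share=0.3):
    """Breadth-first search from seed_user through follower lists.

    A user counts as a speaker when at least min_tweet_share of their tweets
    are classified as the target language. Only speakers' followers are
    checked in turn, mirroring the description above.
    """
    speakers, seen, queue = [], {seed_user}, deque([seed_user])
    while queue:
        user = queue.popleft()
        tweets = tweets_by_user.get(user, [])
        if not tweets:
            continue
        share = sum(is_target_language(t) for t in tweets) / len(tweets)
        if share >= min_tweet_share:
            speakers.append(user)
            for follower in followers_of.get(user, []):
                if follower not in seen:
                    seen.add(follower)
                    queue.append(follower)
    return speakers
```

Here `tweets_by_user` and `followers_of` are ordinary dictionaries standing in for whatever data a real crawler would fetch. Because only confirmed speakers have their followers checked, the search stays inside the language community instead of wandering across the whole network.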

According to John Paolillo, an associate professor in the School of Informatics and Computing at Indiana University, large languages, such as Mandarin Chinese, French, German and Spanish, are well served with their own standard character encodings, fonts, keyboards and computer operating systems.

Other languages that employ a Roman alphabet may piggyback off these resources. However, languages that do not have these resources cannot be encoded effectively and are therefore hampered in their use of Internet technology. Adapting these technologies or formulating new ones for under-served languages is a complex, time-consuming and expensive task. "The economics of Internet technology development and use disfavours linguistic diversity," says Paolillo.

Cunliffe notes that, in the past, legacy software was unable to represent non-Latin characters on the Internet, and while modern standards are able to represent a much wider range of characters, often, minority language communities will only have access to older technology, which is less able to support their languages. In many cases, users will adapt their language to suit the available technology rather than wait for the development of technology able to support their languages. People's desire to use technology, he suggests, can drive them to find solutions that overcome the technical barriers to using their languages online.
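
The gap between legacy and modern encodings can be seen in a few lines of Python. This is only an illustrative sketch: the helper name `can_encode` and the sample words are chosen for this example, with ASCII and Latin-1 standing in for older encodings and UTF-8 for a modern standard.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Sample words, written with escapes so the file itself stays plain ASCII.
samples = {
    "English": "mother",
    "French": "\u00e9t\u00e9",          # Latin letters with accents
    "Cherokee": "\u13e3\u13b3\u13a9",   # Cherokee syllabary
}

for language, word in samples.items():
    results = {enc: can_encode(word, enc) for enc in ("ascii", "latin-1", "utf-8")}
    print(language, results)
```

The English word fits all three encodings, the accented French word needs at least Latin-1, and the Cherokee word can be stored only in UTF-8. A community limited to software built around the older encodings would have to respell its language or go without.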

According to Cunliffe, the Internet is still predominantly a textual medium, so where a language has no written form, a limited literary tradition or low levels of literacy, the Internet may further marginalise minority languages. The multimedia capabilities of the Internet offer some possibilities for languages in these situations and, in particular, offer languages without a written form an alternative to being reduced to text. According to Prado, IP telephony, digital radio and television, audio and video downloads, video hosting sites and streaming are now a part of everyday life, allowing all forms of communication to employ electronic channels previously reserved for writing.

Researchers Mikami Yoshiki and Shigeaki Kodama describe the 'localisation problem' as the difficulty of producing technology in regional languages. They suggest technologies do not evenly benefit all language communities, creating the possibility of a 'digital language divide'.

The Universal Coded Character Set, the international standard character code for information interchange, does not yet include every character set used by humankind, meaning languages whose scripts are missing cannot be catered for and cannot be written on the Internet.
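
Whether a given character has been encoded can be checked against the Unicode Character Database, which is kept in step with the Universal Coded Character Set. The sketch below uses Python's standard `unicodedata` module. The helper names are made up for this example, and the answers depend on the Unicode version bundled with the Python installation in use.

```python
import unicodedata


def describe(code_point):
    """Return the Unicode name of a code point, or None if it has no name."""
    return unicodedata.name(chr(code_point), None)


def is_assigned(code_point):
    """True unless the code point is unassigned (general category 'Cn')."""
    return unicodedata.category(chr(code_point)) != "Cn"


print(describe(0x13A0), is_assigned(0x13A0))  # a letter of the Cherokee syllabary
print(describe(0x0378), is_assigned(0x0378))  # a slot with no character assigned
```

A script whose characters have no assigned code points cannot be typed, stored or searched as text until they are added to the standard.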

Tech, gamification, crowdsourcing

Crystal notes that Internet-based media are expected to play an important role in the future of minority languages, pointing to Unesco predictions that there will be more minority language material produced on the Internet than in traditional print or audiovisual forms.

Language technologies, such as machine-facilitated translation, allow people to express themselves in their own languages and enable communication. However, Prado notes that only 1% of the world's languages have an automated translation system at their disposal, and only around 50 languages possess a sufficient number of translated texts. Current technologies are only able to give basic translations and are not yet at the point where they could successfully translate, for example, literary works.

Gamification, or the integration of game dynamics into, in this case, a Web site to enhance participation and learning, is also gaining popularity when it comes to language learning. One such site, Memrise, "uses images and science to make learning languages child's play". The site offers 220 language courses that facilitate language learning through online games. While many of these courses are for more mainstream languages like English, Spanish and German, the site does include some minority languages and even one South African indigenous language - Xhosa.

Crowdsourcing appears to be a popular way to encourage people to use indigenous languages more and help raise awareness. Crowdsourcing is a form of outsourcing - large tasks (such as translation) are outsourced to a group of people in an open call. Crowdsourcing is usually voluntary and is an unpaid, collective effort. In the case of language translation, participants could be attracted by the opportunity to use their indigenous languages and raise awareness of them.

The Holy Grail?

According to Unicef's Gerrit Beger and Akshay Sinha, South Africans are the heaviest users of mobile technology on the continent, and 72% of South African youth aged 15 to 24 have a cellphone. Furthermore, the country boasts a 100.48% mobile penetration rate, meaning there are more active subscriptions than people. This makes it reasonable to assume that nearly all South Africans have access to mobile technology - if they don't personally own a cellphone, they more than likely have access to a friend or relative's cellphone.

Language-related mobile apps

There's an app for everything these days, and language-learning apps are gaining in popularity and usefulness.

* 24/7 range: The 24/7 range of language-learning apps features Spanish, French, Italian, German, Japanese and Mandarin Chinese, but no minority languages. There is, however, an app that teaches Sign Language, and the iHandy Translator app claims to be able to translate into any language. A closer look at this app, however, reveals that only 52 languages are supported, none of them endangered.
* uTalk: The uTalk range aims to teach the basics of languages and covers a number of indigenous South African languages, including uTalk Zulu, uTalk Xhosa, uTalk Sesotho and uTalk Tswana. The range also supports a number of other African languages, including Swahili and Somali.
* Ma! Iwaidja: Developed by Australian National University linguist Bruce Birch and a project team, the Ma! Iwaidja smartphone dictionary app includes a 1 500-entry English-Iwaidja dictionary with audio, a 450-entry phrase book and an information section about Iwaidja and other endangered languages of Arnhem Land. It allows users to record words or phrases with their own translations, involving speakers in the documentation of the language.
* Ojibwe: Ojibwe is an indigenous language spoken in Canada. The app claims to teach proper pronunciation, important phrases and syllabics, while at the same time educating users about the Ojibway People, their history and culture.
* PlaySay: PlaySay helps users learn a language in a non-classroom, game-like setting. Users can learn pronunciation by using the app to complete a series of real-world missions in scenarios such as ordering food, introducing themselves, delivering a pick-up line, or asking someone for help.
* U-STAR: Released by a group of Asian linguists, U-STAR "can ingest speech in any widely spoken language and regurgitate a translation on the spot".

An issue, however, is that smartphone penetration is relatively low in SA, with the majority of the population using basic feature phones. Speaking at last year's Popular Mechanics FutureTech event, Alan Knott-Craig Jnr suggested that about 13% of all phones in SA are smartphones, with the rest being feature or "dumb" phones. Some of these may not be able to access the Internet and they certainly do not accommodate smart applications, which could benefit language learning tremendously in terms of content provision and facilitating interaction.

Hardware and software issues may also stifle the full potential of mobile, so what is needed are "smart apps for dumb phones", says Knott-Craig Jnr. These apps must have low bandwidth requirements so they are relatively inexpensive to operate, and should facilitate cheap communication.

According to Beger and Sinha, most technological advancements in SA have taken place in the mobile sphere, leading to a significant rise in mobile ownership and usage. However, SA continues to struggle with a significant lag in the expansion of ICT infrastructure, ownership of computers and access to the Internet.

Where to from here?

If online language initiatives attract support from minority language speakers as well as the broader Internet audience, there can be little doubt of their success, given the vast number of people who are active on the Internet.

A lot of work is needed, however. Until minority languages have sufficient computer technology and communication infrastructure, they will remain disconnected, as will their speakers. There may still be hope for languages that do not have a writing system, as the Internet and ICTs offer other opportunities for language cultivation, through, for example, video and audio formats.

However, infrastructure development needs institutional and corporate backing. It requires immense funding, time and effort to develop infrastructure that will support minority languages, but it appears this is still not, and will not be, top of the official agenda for some time. Until it is, it may be up to the speakers themselves to protect their languages and the cultures and traditions inherent in them. And the Internet offers the ideal platform for that.
