Open source community should help fight fake news

By making a fake news detection model available as a global open source project, it could be improved to make detection faster and easier.
Aroma Rodrigues, software activist

Fake news, propaganda, deep fakes: these are all manifestations of what has become a major societal and civic problem around the world. Driven to a very large extent by technology, such disinformation can and should be countered by technology. The open source community is ideally placed to take the lead in this. 

That’s the view of Aroma Rodrigues, a full-time Python developer at a major bank in India and a part-time software activist. She told delegates at last week’s PyConZA 2019, part of SA’s Open Source Week, that they can and should be doing more to use their skills for social good.

Rodrigues defined fake news as news that looks real and deludes people into believing it, but is either entirely false (the events it reports never happened) or has been modified to suit vested interests.

While fake news and propaganda have always existed, the widespread use of social media and instant messengers makes this false information far more dangerous.

“In the last few years, fake news has been used to slander communities and incite violence, riots and even, on occasion, lynching and murder,” said Rodrigues. “Fake news has also been used on social media to topple governments, swivel elections and build up mass perspective for and against individuals and organisations. This means that for the modern world and democratic ideals to survive, the menace of fake news must be addressed.” 

For example, the US-based Knight Foundation, which was established to promote excellence in journalism, examined more than 10 million tweets from 700 000 Twitter accounts before, during and after the 2016 US presidential election. The study found that clusters of Twitter accounts repeatedly linked back to more than 600 fake and conspiracy news sites, often in ways that seemed to be co-ordinated, or even automated, in order to sway public opinion one way or another.

“These numbers do not take account of the multiplication factor. The number of people affected could be much larger when one considers the number of ‘likes’ and ‘shares’ each tweet may attract,” Rodrigues added.

It’s not just Twitter. A recent Reuters Institute study of English-language Indian Internet users found that 25% of respondents got all their news via WhatsApp. The same proportion said they got their news from Facebook.

However, content shared via WhatsApp has led to murder. At least 31 people were killed in 2017 and 2018 as a result of mob attacks fuelled by rumours on WhatsApp and social media, a BBC analysis found.

Rodrigues pointed out that while people can conduct a number of checks to determine whether a so-called news item is real or fake, this could take too long. Rather, she said, the checking could be automated, using technology such as the Rapid Automatic Keyword Extraction (RAKE) algorithm and natural language processing libraries.
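
To give a sense of what keyword extraction involves, here is a minimal Python sketch of the general RAKE idea: split text into candidate phrases at stop words and punctuation, then score each phrase by how often its words appear and co-occur. It is not Rodrigues's model, which the talk did not detail. The stop-word list is a tiny illustrative assumption, and the function names `candidate_phrases` and `rake_keywords` are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption for this sketch;
# practical RAKE implementations use a much longer list).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has",
    "have", "in", "is", "it", "of", "on", "or", "that", "the", "this",
    "to", "was", "were", "with",
}


def candidate_phrases(text):
    """Split text into runs of consecutive non-stop words."""
    phrases = []
    # Punctuation always ends a candidate phrase.
    for fragment in re.split(r'[.,;:!?()"\n]+', text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in STOP_WORDS:
                # A stop word also ends the current candidate phrase.
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_keywords(text, top_n=5):
    """Return the top_n (phrase, score) pairs using RAKE-style scoring."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how many times each word occurs
    degree = defaultdict(int)  # co-occurrence count, including the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    # Word score is degree divided by frequency; a phrase scores the sum
    # of its word scores, so longer co-occurring phrases rank higher.
    word_score = {word: degree[word] / freq[word] for word in freq}
    phrase_score = {
        " ".join(phrase): sum(word_score[word] for word in phrase)
        for phrase in phrases
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(phrase_score.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Given the sentence "Mob attacks were fuelled by rumours on WhatsApp and social media.", this sketch ranks "mob attacks" and "social media" above the single-word candidates. Keywords like these could then be searched against trusted sources, which is one plausible way such a check might be automated.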

“Essentially, there are characteristics of fake news that can be translated into a technical model, to predict whether a particular article is fake news or not,” she explained.

Rodrigues said she had developed one such model, which had been tested on several clearly identified and popular (in terms of how often they had been forwarded/copied/retweeted) fake news articles, with 100% accuracy. However, her model had not worked as well for all articles.

“A fine-tuning of the model is necessary, and this is where the Python community, and the open source community as a whole, could come in,” she said.

“A model is only as good as the data you feed into it. If this type of fake news detection model could be centralised and made available around the world as a global open source project, it could be enhanced and improved in ways that would make detection faster and easier.”
