Web of mistrust

By Jon Tullett, Editor: News analysis
Johannesburg, 06 Sept 2012

From star ratings on products to industry analysis of market figures, every part of the technology world hinges on statistics. And those statistics are increasingly coming under fire as dubious maths, guesswork and outright fraud are exposed.

With buying decisions from consumer purchases to multimillion-rand deals hinging on the numbers, trust is rapidly eroding.

The industry is facing a crisis of trust, and with good cause.

Lots of likes

In a recent Securities and Exchange Commission filing, Facebook revealed that it estimated 8.7% of its active users - 83 million accounts out of 955 million - were duplicate or fake accounts. That's an increase from the 5-6% the company estimated at the time of its IPO, and the number could grow still further.
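As a quick sanity check on the filing figures quoted above, the percentage can be recomputed from the two account counts. This is only a minimal sketch; the variable names are illustrative and the inputs are the rounded numbers reported in this article, not values taken from the filing itself.

```python
# Rounded figures as quoted in the article.
active_accounts = 955_000_000
fake_or_duplicate = 83_000_000

# Share of active accounts estimated to be fake or duplicate.
share = fake_or_duplicate / active_accounts
print(f"Fake or duplicate share: {share:.1%}")
```

The result rounds to the 8.7% Facebook reported, so the two quoted counts and the percentage are at least consistent with each other.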

For advertisers, Facebook's currency is its brand engagement, measured in “Likes”, but the company has also recently stepped up efforts to remove fake likes, and estimated that on average, brand pages would lose about 1% of their total likes. (It seems surprising that a fake user base of nearly 10% would only generate 1% fake likes.)

For some the penny had already dropped. Just before Facebook's ill-fated IPO, General Motors announced it was going to cancel $10 million worth of Facebook advertising, citing poor returns.

Similar problems plague other social networks. Estimates vary, but some peg the share of spam accounts on social networks as high as 40%. Earlier this year, spam messages accounted for nearly 10% of all message traffic on social sites, and that share was growing rapidly.

PeekYou, a social media analytics firm, estimated that between 30% and 60% of followers on Twitter are “real”, with some outrageous outliers - PeekYou's analysis showed that 92% of Newt Gingrich's 1.3 million followers were fake.

South African businesses are not exempt. Earlier this year, an online spat turned ugly when blogger Shaun Dewberry exposed an enormous inflation in reported listener stats in local online radio, picking out 2Oceansvibe and Ballz Visual Radio as chief culprits. Those stations were claiming, at the time, 40 000 to 60 000 listeners per hour. Dewberry's research claimed those numbers were “complete fabrications”.

The reaction was swift and brutal. NetDynamics, the streaming provider to the sites in question and the source of the stats, responded immediately with legal papers, claiming defamation and threatening court action, while DJ Darren Scott, on Ballz radio, cold-called Dewberry and attacked his findings on air.

MyBroadband, given access to NetDynamics' logs, then confirmed Dewberry's findings - the stations appeared to be averaging around 200 listeners per hour, hitting about 2 000 per day, not the consistent audience of many tens of thousands they had claimed. 2Oceansvibe promptly severed ties with NetDynamics, and Scott grudgingly apologised to Dewberry.
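To put the gap in perspective, the claimed hourly audience can be divided by the hourly average found in the logs. The sketch below uses only the round numbers quoted in this article; the variable names are illustrative, and the logged figure is an approximate average rather than a precise measurement.

```python
# Hourly listener figures as quoted in the article.
claimed_per_hour = (40_000, 60_000)  # low and high ends of the stations' claims
observed_per_hour = 200              # approximate average found in the logs

# How many times larger each claim is than the logged average.
inflation_factors = [claim / observed_per_hour for claim in claimed_per_hour]
print(inflation_factors)
```

On these figures, the claimed audience was roughly 200 to 300 times the logged one.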

If the bad news is that the problem is almost completely pervasive, the good news is that users are becoming more savvy. Earlier this year, BandwidthBlog posted an infographic from StrategyWorx, showing social media numbers for SA. Readers of the blog were quick to point out obvious discrepancies in the numbers - Johannesburg and Cape Town, for example, were shown to have identical numbers of Facebook users. StrategyWorx MD Steven Ambrose defended the numbers, saying they came directly from Facebook. The numbers were later revised to a more plausible geographic split, but doubts may linger over their veracity.

Incentive to cheat

The uncomfortable truth is that it is simply not in an online platform's best interest to report less-than-glowing numbers, or even to reduce fraud. Glowing Amazon reviews sell more products, Facebook likes convince brands to spend more, Twitter followers build reputation, publishers' stats attract banner revenue, and so on. Ethics be damned: cheating raises those numbers, which is a positive result, commercially speaking, provided your reputation survives.

Providers take action when obvious abuse arises, but for any service where quantity is held in higher regard than quality, token efforts are often ineffective. In contrast, Google, with its complete reliance on ad revenue tied to the quality of its search results, is considerably more aggressive at weeding out attempts to game its systems.

And cheating works. One of the founders of Reddit, a popular discussion site founded in 2005, recently admitted creating numerous fake accounts in the site's early days, under the control of the two founders, to spur discussion and give the impression of an established community. Once other users were drawn in and the community began to reach critical mass, those accounts could be withdrawn from use. The strategy worked - Reddit grew rapidly and was acquired by publishing group Condé Nast in 2006.

Fake user accounts, known colloquially as “sock puppets”, are common in many online forums, including online reviews. UK-based author RJ Ellory was recently outed as publishing positive reviews of his own work, and criticism of his competitors, on sites like Amazon. Ellory is not alone - author Stephen Leather has also admitted using networks of close friends to achieve similar results.

That practice drew widespread criticism from Ellory's fans, and Amazon removed many of his comments, but Ellory had already sold millions of books. Fake reviews are enormously common - it is an effective way for an author, publisher, marketer or promoter to encourage sales of their product, or hinder a competitor.

Fake reviews are common on all varieties of e-commerce, and strongly influence buying behaviour. Researcher Bing Liu recently published a paper arguing that as much as a third of all online reviews are fake.

The industry's voracious appetite for audience engagement means there is a pent-up demand for these canned interactions. Gettingbookreviews.com offered professionally written book reviews until the author, Todd Rutherford, was blackballed by Google and Amazon. Rutherford charged $100 per review, but you can get cheaper engagement (followers, likes, stars, reviews, you name it) for a couple of bucks a pop - the fake content market is engaged in the traditional race to the bottom.

Crisis of trust

If the numbers can't be believed, what does that mean for the industry?

“We're facing a crisis of trust,” says Arthur Goldstuck, MD of World Wide Worx - a local analyst and research organisation. “Consumers are being asked to make decisions based on Facebook likes or product reviews, corporates are aligning media buying or deciding technology strategies based on market research. If that research is not trustworthy, there is a problem.”

“This is incredibly important,” agrees Paula Raubenheimer, who heads up measurement initiatives at the Digital Media and Marketing Association (DMMA). “The industry's entire reputation is built on stats.” In SA, the DMMA is the third party responsible for providing independent verification of publisher stats, and it's eyeing social media as the next frontier, but even everyday Web statistics are becoming trickier, Raubenheimer says.

“It's a fine line we're treading. Even the definition of a page view can vary. We count a view if more than 50% of the page is loaded, but as publishers look for technologies to reduce load times, limit the impact on their servers, dynamic content delivery can mean that we don't always register a page view when we should.”
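The counting rule Raubenheimer describes can be sketched as a simple threshold test. This is an illustration only, not the DMMA's actual implementation: the function name, the `fraction_loaded` input and the example values are all assumptions made for the sake of the example.

```python
def counts_as_page_view(fraction_loaded: float, threshold: float = 0.5) -> bool:
    """Return True when more than `threshold` of the page has loaded."""
    return fraction_loaded > threshold


# Hypothetical cases: a conventional page that loads in full, and a page using
# dynamic content delivery where only 40% is loaded when the measurement fires.
static_page = counts_as_page_view(1.0)
dynamic_page = counts_as_page_view(0.4)
print(static_page, dynamic_page)
```

Under such a rule, a real visit to the dynamically delivered page would go uncounted, which is the under-reporting risk described in the quote.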

Meanwhile, content providers are moving aggressively into mobile apps, where limited measurement options are available. “Some publishers use Web content in apps, so they can count those views, but that's only one way to deliver content. We're still having a debate around how to track apps.”

And social media is another Gordian knot, Raubenheimer says, and so far no one has found a way to cut it. “You have people using 10, 20, 30 different metrics to measure social media, none of them the same. We're hosting forums to debate, if not ways to measure social media yet, at least the terminology so everyone is talking the same language.”

Whichever end of the scale you are at, the only solution is increased scepticism. “If the numbers don't add up, demand to see the sources and the methodology,” says Goldstuck. Authoritative studies, he says, should include details of the data and how it was gathered and analysed.

“The integrity of data, and its interpretation, is vital for business decision-makers and marketers who are investing in social media,” says Fuseware MD Mike Wronski. Fuseware, for example, uses APIs provided by Twitter to pull stats directly from the source, and recently published the South African Social Media Landscape 2012 report, showing marked growth in local usage.

Does the rise of dodgy data and fake reviews risk drowning out the real numbers and diluting the honest reviews? This is nothing new - sites have been exaggerating their numbers for as long as the Web has been popular - but the steady migration of consumer activity and big business towards the Web means the impact is that much higher. Whether it is serious enough to warrant regulation over and above industry groups like the DMMA remains to be seen.