Spotlight spoils data scientists' party at Oscars
Data scientists at Principa who attempted to foretell the winners of the Oscars using predictive analytics got one of their four predictions wrong.
Two weeks ago, the data scientists said the Best Picture award would go to The Revenant, the 2015 American survival western thriller. However, at the 88th Academy Awards ceremony held yesterday, The Revenant was edged out by the biographical crime drama Spotlight.
Nonetheless, the data scientists got three of their predictions spot on. They rightly predicted that the Best Actor award would go to Leonardo DiCaprio, Best Actress to Brie Larson, and Best Director to Alejandro González Iñárritu.
The Oscars are an annual American awards ceremony hosted by the Academy of Motion Picture Arts and Sciences to recognise excellence in cinematic achievements in the film industry as assessed by the academy's voting membership.
Principa uses a combination of statistical packages like SAS, R and Excel to make such predictions.
"We correctly predicted Leonardo DiCaprio for Best Actor, Brie Larson for Best Actress, and Iñárritu for Best Director," says Thomas Maydon, head of data analytics at Principa.
"However, Best Picture was a different story. Our models indicated that Spotlight and The Revenant had an almost equal probability of winning - there was less than 1% difference in probability. It was a virtual coin-toss. We went with The Revenant and in the end Spotlight took the honours for Best Picture. So we got three out of four correct," he points out.
Maydon notes the contest was very close, something Principa saw in the lead-up to the Oscars, when each of the top three films - The Revenant, Spotlight and The Big Short - looked capable of winning key awards.
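To illustrate how a near tie of this kind might be handled, here is a minimal Python sketch. It is not Principa's method: the function name `pick_winner`, the `toss_up_margin` threshold of one percentage point and the nominee probabilities are all hypothetical, chosen only to show a top pick being flagged as a toss-up when the runner-up trails by less than 1%.

```python
def pick_winner(probabilities, toss_up_margin=0.01):
    """Return the most likely nominee and whether the call is a toss-up.

    probabilities: dict mapping nominee name -> predicted win probability.
    toss_up_margin: hypothetical threshold (in probability, 0.01 = 1 point)
    below which the gap between the top two nominees counts as a coin toss.
    """
    # Rank nominees from most to least likely
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    (first, p_first), (_, p_second) = ranked[0], ranked[1]
    return first, (p_first - p_second) < toss_up_margin


# Hypothetical model output, for illustration only (not Principa's figures)
model_output = {"Nominee A": 0.412, "Nominee B": 0.405, "Nominee C": 0.183}
winner, is_toss_up = pick_winner(model_output)
print(winner, is_toss_up)
```

A forecaster could use such a flag to report a pick as low-confidence rather than as a firm call.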
According to Maydon, the main challenge was acquiring the data. "We scoured a variety of data sources to pull the data together, which takes 90% of the time. The remaining 10% is allocated to the actual building of the models.
"We also like to combine quantitative and qualitative data in our models. Quantitative data is data like 'how many Oscars has the director previously won?' or 'what was the age of previous winners?' It's information that's easily expressed as a number. Qualitative data is data like 'how did critics respond to this actor's performance?' These data sets are more tricky to introduce into our models. We used bookie data and recent awards as a proxy for the qualitative data," he explains.
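One common way to turn bookmaker odds into a model input is to convert them to implied probabilities. The Python sketch below assumes decimal odds and simple proportional normalisation. The article does not say how Principa processed its bookie data, so the function name `implied_probabilities` and the odds shown are hypothetical.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal bookmaker odds into probabilities that sum to 1.

    decimal_odds: dict mapping nominee name -> decimal odds (e.g. 2.5).
    """
    # The inverse of decimal odds is the raw implied probability
    raw = {name: 1.0 / odds for name, odds in decimal_odds.items()}
    # Raw values typically sum to more than 1 because of the bookmaker's
    # margin, so rescale them proportionally
    total = sum(raw.values())
    return {name: p / total for name, p in raw.items()}


# Hypothetical odds, for illustration only
odds = {"Film A": 2.0, "Film B": 2.5, "Film C": 5.0}
probs = implied_probabilities(odds)
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

The resulting numbers could then sit alongside quantitative inputs, such as a director's previous Oscar wins, as one more column in a model's training data.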