Data scientists predict Oscar winners

The 88th Academy Awards ceremony will honour the best films of 2015 and is scheduled to take place on 28 February.

Data scientists at Principa, who last year predicted some results of the Rugby World Cup with a degree of success, are at it again.

This time they have used predictive analytics to forecast the winners of the Oscars, the annual American awards presented by the Academy of Motion Picture Arts and Sciences to recognise excellence in film, as judged by the academy's voting membership.

The 88th Academy Awards ceremony will honour the best films of 2015 and is scheduled to take place on 28 February at the Dolby Theatre in Hollywood, California.

Principa's predictions for this year's Oscar winners in the four major categories are as follows:

1) Best Picture: The Revenant - winning by a very slim margin over Spotlight
2) Best Actor: Leonardo DiCaprio - winning by an eight-fold margin over Matt Damon
3) Best Actress: Brie Larson - winning by a 6.7-fold margin over Saoirse Ronan
4) Best Director: Alejandro González Iñárritu - winning by a very slim margin over George Miller

The data scientists sourced large volumes of data on Oscar winners in the four major categories since 1935.

"We tested a few machine learning algorithms and chose a combination of algorithms that showed the greatest predictive power in sample and out of sample," says Thomas Maydon, head of data analytics at Principa.

"We then fed this data into our algorithms to identify patterns and trends, and to determine the best predictors and characteristics of an Oscar winner. Some of the best predictors identified have tended to be wins at other awards ceremonies, critics' ratings and bookie odds. Other predictors have been genre and box office revenue before and after the Oscar nominations."
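Principa has not published its model, but one plausible way to combine predictors like these into a single ranking, and to express a favourite's lead as an "X-fold margin", can be sketched in Python. This is a minimal illustration under loudly stated assumptions: the three features, the `WEIGHTS` values, the `Nominee` fields and the toy nominees are all hypothetical, a linear score with a softmax stands in for whatever combination of algorithms Principa actually chose, and "fold margin" is read here as the ratio of the favourite's win probability to the runner-up's.

```python
import math
from dataclasses import dataclass


@dataclass
class Nominee:
    name: str
    precursor_wins: int   # hypothetical: wins at other awards ceremonies
    critic_rating: float  # hypothetical: critics' rating on a 0-100 scale
    bookie_prob: float    # hypothetical: bookmaker-implied win probability, 0-1


# Hypothetical weights; a real model would learn these from past winners
# and check them both in sample and out of sample.
WEIGHTS = {"precursor_wins": 0.8, "critic_rating": 0.03, "bookie_prob": 4.0}


def score(n):
    """Linear score combining the predictor variables for one nominee."""
    return (WEIGHTS["precursor_wins"] * n.precursor_wins
            + WEIGHTS["critic_rating"] * n.critic_rating
            + WEIGHTS["bookie_prob"] * n.bookie_prob)


def win_probabilities(nominees):
    """Softmax the scores within one category so the probabilities sum to 1."""
    scores = [score(n) for n in nominees]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return {n.name: e / total for n, e in zip(nominees, exps)}


def fold_margin(probs):
    """Ratio of the favourite's probability to the runner-up's."""
    first, second = sorted(probs.values(), reverse=True)[:2]
    return first / second


# Usage with invented nominees (not real Oscar data).
category = [
    Nominee("Nominee A", precursor_wins=4, critic_rating=80, bookie_prob=0.7),
    Nominee("Nominee B", precursor_wins=1, critic_rating=85, bookie_prob=0.2),
    Nominee("Nominee C", precursor_wins=0, critic_rating=75, bookie_prob=0.1),
]
probs = win_probabilities(category)
favourite = max(probs, key=probs.get)
print(favourite, round(fold_margin(probs), 1))
```

Because the softmax is monotonic in the score, the ranking depends only on the weighted predictors; the probabilities just make margins between nominees comparable. Since bookie odds feed directly into the score, a daily change in the odds can shift the margin.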

However, Principa acknowledges that surprises are possible. "There is always a chance of an 'upset', as we saw during the Rugby World Cup," says Maydon. "In terms of our own predictions, our algorithms have been almost 100% accurate when tested against Oscar wins over the past four years."

Maydon says the bookie odds correctly picked every Best Picture and Best Actor winner over the last four years, but got one year wrong in each of the Best Actress and Best Director categories.

"Our model has improved on this by getting every single prediction right in the first three categories, but it made the same mistake in the Best Director category (2012). Interestingly, the difference between our highest-scoring director in 2012 and the runner-up, who actually won in the end, was only 0.001%, showing how incredibly close it was."

Nonetheless, he notes that because the bookie odds change every day, and they are one of Principa's predictors, today's prediction could change as the Oscars draw closer.

"Our focus is more on identifying trends and patterns in data spanning 80 years. We're identifying what makes an Oscar winner by analysing the characteristics of Oscar winners since 1935 and spotting the patterns as well as the trends; for example, are movies based on true stories growing in popularity among academy members? Our analysis shows an upward trend for this over the decades."

Describing the challenges the data scientists faced in making the predictions, Maydon points to sourcing and scrubbing the various datasets to get them into a state in which the company's predictive algorithms could use them.

He explains that most of the data scrubbing involved extracting data from the Internet and packaging this data in the correct format, joining the various datasets together, and carrying out validations on the data to ensure it was fit for purpose.

In addition, the time span of the available data was inconsistent. The company was looking at films and actors nominated for, and winning, an Oscar since 1935. However, many of the variables it chose to include in its calculations, such as Golden Globe and Screen Actors Guild (SAG) awards, bookie odds and critics' ratings, were not available that far back, he adds.
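The scrubbing steps described above (joining datasets together and checking that they are fit for purpose), along with the problem of predictors that exist only for recent years, can be sketched in Python. This is a minimal illustration, not Principa's pipeline: the record layout, the (year, category, nominee) join key, the `join_predictor` and `validate` helpers and the toy records are all hypothetical. Rather than dropping early years, the sketch keeps every nomination and adds a flag marking where a predictor such as a Golden Globe result is missing.

```python
# Hypothetical toy records; real data would first be extracted from the web
# and packaged into a consistent format.
nominations = [
    {"year": 1940, "category": "Best Actress", "nominee": "Nominee X", "won": True},
    {"year": 2014, "category": "Best Actress", "nominee": "Nominee Y", "won": True},
    {"year": 2014, "category": "Best Actress", "nominee": "Nominee Z", "won": False},
]

# Hypothetical predictor table: results exist only for the later year here,
# mimicking variables that are not available back to 1935.
golden_globes = {
    (2014, "Best Actress", "Nominee Y"): True,
    (2014, "Best Actress", "Nominee Z"): False,
}


def key(row):
    """Join key shared by all datasets (an assumption for this sketch)."""
    return (row["year"], row["category"], row["nominee"])


def join_predictor(rows, predictor, name):
    """Left-join a predictor onto nomination rows, flagging missing values."""
    joined = []
    for row in rows:
        value = predictor.get(key(row))  # None when no data exists for that row
        joined.append({**row, name: value, name + "_missing": value is None})
    return joined


def validate(rows):
    """Basic fitness-for-purpose checks; returns a list of problems found."""
    problems = []
    seen = set()
    winners = {}
    for row in rows:
        k = key(row)
        if k in seen:
            problems.append("duplicate nomination: %s" % (k,))
        seen.add(k)
        cat_year = (row["year"], row["category"])
        winners[cat_year] = winners.get(cat_year, 0) + int(row["won"])
    for cat_year, count in winners.items():
        if count == 0:
            problems.append("no winner recorded for %s" % (cat_year,))
    return problems


rows = join_predictor(nominations, golden_globes, "golden_globe_win")
issues = validate(rows)
print(issues)  # []
print([r["golden_globe_win_missing"] for r in rows])  # [True, False, False]
```

Keeping a missing-value flag, instead of discarding pre-Golden Globe years, lets the early decades still contribute to long-run trends such as the rise of films based on true stories.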
