How we found the best place on earth

A SAS project designed to demonstrate the abilities of advanced analytics and machine learning has determined, thanks to a range of carefully weighted variables, where people would most like to live.


Johannesburg, 02 Aug 2018
Craig Stephens, Advisory Solutions Manager, SAS Africa.

While advanced analytics and machine learning have grown significantly, and organisations increasingly recognise the benefits they can offer, the fact remains that many enterprises have yet to grasp just how quickly and easily vast amounts of data can be drawn together to answer a specific question.

Craig Stephens, Advisory Solutions Manager for SAS Africa, explains there is still a lack of awareness as to how rapidly machine learning can parse through vast quantities of disparate data, in order to determine an answer that may otherwise appear impossible to find. It was for this reason, he explains, that SAS undertook a project aimed at demonstrating exactly how analytics works and the benefits it can offer to businesses.

"Realising that virtually everyone loves to travel and see new places, and coupling that with the mindset of some in South Africa, which is focused on issues like crime, the economy and immigration, we decided to undertake a project to use analytics and machine learning to find what was statistically, at least, the best place on earth to live," he says.

"To this end, we initiated a project called 'Paradise Found', which incorporated as much open and available information about as many places around the globe as we could possibly find. By collecting data from a vast range of sources, we were able to develop a case study that clearly demonstrates how analytics and machine learning can be used to find a specific answer from within a massive collection of data."

Stephens indicates the first phase of the project was the data phase, which involved collecting information from multiple sources. These included 57 different city studies, 1 060 international data services, online geo-location services and various forms of social media, to name a few. Ultimately, he says, SAS collected information from around 1 200 unique data sources.

"Moreover, these sources had nothing in common with one another, other than a place, and the idea was to bring together all these various features, issues and variables, and put them through the machine learning process. By the time we began the second phase, we had collected over five million unique data points on more than 148 000 places in 193 different countries.

"The second phase of the project was the discovery phase, where we had to manage all this data and determine the weighting for the appropriate variables. Thanks to data visualisation, we were able to bring the number of places down to 15 000, which contained values from all four original data sets. After this, we imputed missing values, undertook principal component analysis and detected a series of eight dimensions that could be used to rank these places according to the importance the user placed on each dimension."

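For readers curious about what imputing missing values and running a principal component analysis look like in practice, here is a minimal Python sketch. It is not the SAS implementation: the data, the function names and the choice of mean imputation and power iteration are all assumptions made purely for illustration.

```python
from statistics import mean


def impute_mean(rows):
    """Fill each missing value (None) with the mean of its column's observed values."""
    column_means = [
        mean(v for v in column if v is not None) for column in zip(*rows)
    ]
    return [
        [m if v is None else v for v, m in zip(row, column_means)] for row in rows
    ]


def first_principal_component(rows, iterations=100):
    """Approximate the direction of greatest variance by power iteration."""
    n, d = len(rows), len(rows[0])
    column_means = [mean(column) for column in zip(*rows)]
    centred = [[v - m for v, m in zip(row, column_means)] for row in rows]
    # Sample covariance matrix of the centred data (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Uneven start vector, so it is unlikely to be orthogonal to the answer.
    vec = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(v * v for v in vec) ** 0.5
        if norm == 0.0:
            raise ValueError("data has no variance along the start direction")
        vec = [v / norm for v in vec]
    return vec


# Hypothetical values for three places on two made-up variables.
places = [[1.0, 2.0], [2.0, None], [3.0, 6.0]]
filled = impute_mean(places)
component = first_principal_component(filled)
```

In this toy data the second variable is always twice the first once the gap is filled, so a single component captures all of the variation. Real data of the kind described here would need many components and a more careful imputation strategy.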
The last phase, he states, was the deployment phase, where the project was set up using visual analytics to give users their own personal paradise configurator. This was built on the SAS machine learning algorithm, which ranked the places according to the importance the user gave each dimension.
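One simple way to picture such a configurator is as a weighted average: each place has a score per dimension, the user assigns each dimension a weight, and places are sorted by the result. The Python sketch below assumes exactly that. The place names, scores, dimension labels and the `rank_places` function are hypothetical, and the article does not say that the SAS algorithm works this way.

```python
def rank_places(scores, weights):
    """Return place names sorted by weighted average score, best first."""
    total_weight = sum(weights.values())

    def weighted_score(place):
        return (
            sum(weights[dim] * scores[place][dim] for dim in weights)
            / total_weight
        )

    return sorted(scores, key=weighted_score, reverse=True)


# Hypothetical scores between 0 and 1 on two illustrative dimensions.
scores = {
    "Place A": {"nature": 0.9, "health": 0.4},
    "Place B": {"nature": 0.5, "health": 0.8},
}

nature_lover = rank_places(scores, {"nature": 3, "health": 1})
health_first = rank_places(scores, {"nature": 1, "health": 3})
```

Changing the weights changes the winner: the user who values nature most sees Place A first, while the user who values health most sees Place B first.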

"The eight dimensions that were ultimately used were: safety and infrastructure; education and career; restaurants and shopping; nature; health; culture; cost of living; and family. What is particularly interesting is that, after parsing all the data available in the system, the machine learning algorithm pointed out that, overall, the best place on earth to live is West Perth in Australia.

"While this may prove to be a boost to the West Perth Tourism Board, this was, of course, not the intention. The aim was really to provide a clear demonstration as to how one is not only able to access vast quantities of data and extract millions of data points, but more critically, how machine learning algorithms can then extract a result from this mass of information that will provide a definitive answer to whatever question is being posed," he concludes.