What makes a data scientist?
If capitalising on big data depends on hiring scarce data scientists, then the challenge for managers is to learn how to identify that talent, attract it to an enterprise, and make it productive.
The first step in filling the need for data scientists, therefore, is to understand what they do in businesses. The next is to ask: What skills do they need? And what fields are those skills most readily found in?
An introduction to the role of a data scientist
Megan Yates of Ixio Analytics, a data scientist in her own right, tackles these questions head-on: what skills does a data scientist need, and in what fields are these skills most readily found?
Data scientists have an exceedingly broad range of skills, spanning data mining, data management, data analysis and statistics through to business skills. The term data scientist tends to be used by people with widely differing skill sets. Despite these diverse competencies, the function is specific.
The function of this role is to use the results of data-driven analysis to inform future actions. Data scientists use tools to find signals and patterns in data without any human filtering and to provide data-driven actions from this work.
While many of the methods fall under the field of data mining and predictive analytics (and consequently the field of data science isn't vastly different from either), a data scientist would, in addition, be responsible for leading actions that result from the data-driven work. Using data to describe and show what happened historically is business intelligence. In the field of business intelligence, data analysts generally present these historical results to business experts, who decide on the strategy.
Most data scientists have a master's or PhD degree and come from engineering, physical science, maths or computer science backgrounds. While having a master's or PhD is not a criterion for the role, and there are exceptions, a strong educational background helps with the knowledge and tools required for this task.
Technical tools required for the role include R, SAS, Python, SQL and Hadoop. Familiarity with cloud tools such as Amazon EC2 and Amazon S3 and the ability to work with unstructured data are also needed. Statistical and machine learning capabilities are essential.
In addition to the range of technical skills needed, data scientists also need to convey results to business. It may seem like the set of skills is a big ask of a single individual, but for those who enjoy working with data, possess an inordinate amount of curiosity and are passionate about seeing results, these skills develop easily.
Few people leave university with all of these skills (although data science degrees are becoming more and more common) and many people currently working in this field are self-taught, with the drive to learn originating from complex business challenges and need. Self-teaching is a crucial trait because it is almost certain that the tools available in 10 years' time will be different from those we use today. There is a multitude of sites facilitating learning and practice in this field, and Yates highly recommends that individuals test themselves in data science competitions.
The second term of the role - scientist
Despite a distinct function, which is generally agreed upon among experts, little attention is paid to the second term of the role - scientist. The scientific method is an iterative procedure of observation, measurement, experiment, testing and finally adjustment or modification of hypotheses.
The key here is that predictions are tested, and the results of these tests are used to further refine and test one's hypothesis. Good tests come from carefully set-up, controlled and replicated experiments. Fortunately for data scientists, the digital world makes this part considerably simpler than in other science disciplines. Models with control groups and treatments can be deployed and monitored in near real-time.
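As an illustration of what testing a prediction with a control group and a treatment group can look like in practice, here is a minimal Python sketch. It is not taken from Yates's work: the function name, the choice of a two-proportion z-test and the example figures are all assumptions made purely for illustration. It compares the conversion rate of customers who received a change (the treatment) with that of customers who did not (the control).

```python
import math


def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment):
    """Compare conversion rates of a control and a treatment group.

    Hypothetical helper for illustration only. Returns a tuple of
    (difference in rates, z statistic, two-sided p-value).
    """
    rate_control = conv_control / n_control
    rate_treatment = conv_treatment / n_treatment

    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    if std_err == 0:
        raise ValueError("Cannot test: pooled conversion rate is 0 or 1.")

    z = (rate_treatment - rate_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_treatment - rate_control, z, p_value


# Made-up example figures: 1,000 customers in each group.
lift, z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.4f}")
```

A small p-value suggests the difference between the groups is unlikely to be chance alone. In keeping with the scientific method described above, the result would then feed into the next round of hypotheses and tests rather than being treated as final.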
Most businesses have a clear idea of what they do and sell; however, they tend not to run in a scientific way. With the availability of data, cheap computing offered by cloud providers and many open source statistical and machine learning tools, businesses should be tracking, modelling, predicting and testing their customers' behaviour. Data has a time value and the sooner businesses start to follow a scientific method for modelling and predicting customer behaviour, the better for their bottom line and their future strategy.
How would your career path look?
Yates started working in this field in 2012 and her experience in this role has been split into two main areas: the first in optimising operational and cost efficiencies in businesses and the second in customer behavioural modelling, prediction and testing.
The majority of the work in which Yates has been involved over the past three years has been driven by business challenge and need. Continuous improvement in both the solutions and her own skills has been the dominant theme.
Yates believes part of the reason the field of data science is so successful is that practitioners take responsibility for the outcomes. Data scientists are intensely involved in the actions, deployment, testing and further improvement of their solutions. Striving to make solutions better and to get real business results sets them apart from other data, statistical and analytical roles.