Purple people and data scientists

Data scientists communicate meaningful stories from data, regardless of data size or complexity.

In 2011, McKinsey & Co released an analysis of the growing big data market, and predicted that, by 2018, "...the US alone could face a shortage of 140 000 to 190 000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions".

Of course, anybody falling into this category might be curious as to how this figure was arrived at. Understand this: predicting such a broad range of numbers seven years into the future, while statistically risky, is a relatively safe statement from a marketing point of view, as its authors are unlikely to be judged in retrospect. It is precisely the kind of people who look at the statement from both an analytical and a business perspective who would fall into the category of "purple people" or "data scientists".

"Purple people" has been a fairly broad business intelligence (BI) term described quite concisely here, referring to individuals with a skills mix of two traditionally separate BI entities (ie, business and technology, red and blue, resulting in a purple blend). However, the title of "purple person" or even "chief purple person" never really caught on in the IT industry.

Sexy science

Hence, by adding in a pinch of big data with a dash of statistics to the role, the industry has recently and literally coined the term "data scientist". No less a publication than the Harvard Business Review, in October 2012, called it "the sexiest job of the 21st century", which, even for an industry accustomed to hype, was guaranteed to raise some eyebrows.

I was prompted to write this Industry Insight by my experience at the recent 2013 ITWeb BI Summit, where a presenter asked the audience, by a show of hands, "Who is a data scientist?" I was surprised when mine was the lone hand raised, which, considering this is the primary local conference for BI in SA, either makes the McKinsey report eerily prescient, or my understanding of what makes a data scientist biased in my favour.

A more realistic interpretation is that, with the role of a data scientist being relatively new and not consistently defined, raising your hand in a public forum to claim a role which is not formally in your job title would involve sticking your neck out to a degree. Alternatively, your boss sitting behind you may interpret it as a request for a raise, because "with great titles comes great responsibility" (and occasionally, pay).

Rare breed

Yet, from personal experience, these rare purple people do exist in the South African BI industry. What would be a useful definition to help discover these potentially valuable resources? I suggest one here.

A data scientist is a person who has the ability to communicate meaningful stories from data, regardless of data size or complexity.

Consider this definition carefully: an emphasis on statistics or big data does not necessarily make a data scientist. A wide range of skills needs to be harnessed to translate raw data into a meaningful end result, and to communicate that result in a way that tells a story of interest to the audience. To do this, any or all of the following skills may be called on: technical IT, behavioural economics, statistics, visualisation, psychology and business knowledge.

The reason that Michael Lewis (2003 book 'Moneyball') and Nate Silver (predicted 49/50 US state results in the 2008 presidential election) became relatively well known is that both told interesting stories: the former about baseball performance, and the latter about how biased news station pundits were as a result of their political leanings. That both stories were based on data and statistics was just a means of getting to the end result.

One interesting opinion piece makes the distinction between vertical data scientists, who have deep knowledge of a narrow field such as statistics or computer science, and horizontal data scientists, who have a wide range of knowledge over multiple areas. Branding vertical data scientists as 'fake' (as the piece does) sounds extreme, but Lewis and Silver would definitely fall into the horizontal data scientist category.

To sum this up, there will be a general industry shortage of the blended skills required to create value from data, and the BI industry in general will be challenged to move beyond the reporting and dashboard mindset.

Companies which get a head start in identifying and nurturing talent with the attitude, curiosity, intelligence and broad range of skills required to develop into data scientists will acquire a competitive advantage over their slower-moving counterparts.

David Logan

Principal consultant, PBT Group

David Logan has been specialising in the data warehousing and business intelligence (BI) field for over 15 years, working for a variety of clients in the telecoms, insurance, banking and retail industries in the UK and SA. He is currently a principal consultant at PBT Group, with a particular focus on data visualisation and performance in the very large (billion row+) EDW space. He promotes a “back to basics” approach to data warehousing, with an emphasis on results over theory. Experience has taught him that the end-customer experience of the value produced by a data warehouse is the primary determinant of success, with up-to-date, accurate, and accessible business insight being the goal. A current area of interest is in using BI to drive the ETL process for a data warehouse, ie, creating an efficient, dynamic and “thinking” ETL process, which requires little maintenance.
