Data analysis is not just algorithms
Math and stats can tell us all sorts of interesting things about data, by identifying patterns and unlocking buried truths or secrets.
These days, when people think of data analysis, several terms automatically spring to mind: predictive modelling, machine learning, qualitative and quantitative analysis, and the list goes on. All of these tend to be heavily grounded in the fields of mathematics and statistics.
I feel there is something missing in all of this. Over the past few years, our definition and understanding of data analysis, and consequently our focus, have shifted to accommodate these techniques.
There is a great description of data analysis in Business Dictionary: "The process of evaluating data using analytical and logical reasoning to examine each component of the data provided."
The first component of this definition is "analytical", which speaks to the many math and stats techniques available to us for analysing data. However, I fear this is where our focus ends: on the mechanics of analytics. There is a growing reliance on these mechanics, and a decreasing reliance on the logical, deductive reasoning that should partner with them to achieve effective analysis.
The reality is, the math and stats can tell us all sorts of interesting things about the data, by identifying patterns and unlocking buried truths or secrets. However, all of this insight can be meaningless if we don't have the necessary acumen to properly understand the story being told, and to understand how to take advantage of the information that we are presented with following analysis.
What becomes clear, then, is that data analysts need to focus not only on growing their analytic skills, but also on their ability to reason logically. Knowledge of business domains and processes, and an understanding of how systems work, are all important aspects of this. Without this understanding, the impact and quality of data analysis will always be diminished.
What is also often overlooked is the need to look at the data itself, rather than relying only on the numbers. Too often, data analysts perform their analysis at arm's length from the data. I would argue that an important facet of data analysis is getting your hands dirty. When embarking on a complex piece of data analysis, data analysts must take the time to look at the data and understand what they are seeing in their own terms, and through the lens of the business.
I have also observed a growing trend in the industry to compartmentalise the function of data analytics. In other words, if you dig beneath the surface, you will find that the analytics component only really exists in a few carefully constructed teams, such as credit risk, marketing automation, sales optimisation, etc.
One scenario often encountered is in the data engineering field, where teams are building operational/historical data stores, data warehouses, big data lakes, etc. Great energy and expense are invested in building these data solutions; however, the data engineers themselves don't utilise data analytic skills and techniques in the building of these solutions.
Data profiling techniques such as frequency distribution, standard deviation, covariance and correlation form the duct tape and cable ties of data analysis. These techniques can offer tremendous value to data engineering teams.
They can assist the business analysts in understanding their data, they can help the data modellers validate their data models, and the data engineers can use these techniques to build out test and quality controls as part of test-driven development.
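As a rough illustration of the profiling techniques named above, here is a minimal Python sketch that uses only the standard library. The column names and sample values are hypothetical and are not drawn from any real system; the point is only how little code a basic profile needs.

```python
from collections import Counter
from statistics import mean, stdev


def profile_column(values):
    """Frequency distribution, mean and sample standard deviation of a numeric column."""
    return {
        "frequencies": Counter(values),
        "mean": mean(values),
        "stdev": stdev(values),
    }


def covariance(xs, ys):
    """Sample covariance of two equal-length numeric columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("columns must have the same length and at least two values")
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))


# Hypothetical sample data: quantity ordered and order total for five orders.
order_quantity = [1, 2, 2, 3, 4]
order_total = [10.0, 20.0, 20.0, 30.0, 40.0]

quantity_profile = profile_column(order_quantity)
quantity_total_cov = covariance(order_quantity, order_total)
quantity_total_corr = correlation(order_quantity, order_total)
```

A data modeller could use the frequency distribution to confirm that a column really holds the small set of values the model assumes, while a correlation that is unexpectedly weak or strong between two related columns is a prompt to go and look at the underlying rows.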
Going beyond the development effort, data analytics can be included in the production system as a form of continuous quality assurance. This is an area where data analysis can truly be of great, practical use. In my experience, however, this is very rarely done.
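One way such a continuous check might look is sketched below in Python: each incoming batch is compared against a baseline profile, and the batch is flagged when it contains too many missing values or when its mean drifts too far from the baseline. The function name, the thresholds and the sample figures are all assumptions made for illustration, not a prescribed method.

```python
from statistics import mean, stdev


def check_batch(baseline, batch, max_sigma=3.0, max_null_rate=0.05):
    """Return a list of quality issues found in a new batch of a numeric column.

    baseline: historical values considered healthy (no missing values).
    batch: newly loaded values, where None marks a missing value.
    Both thresholds are illustrative defaults, not recommendations.
    """
    if not batch:
        return ["batch is empty"]

    issues = []

    # Check 1: the share of missing values in the batch.
    null_rate = sum(value is None for value in batch) / len(batch)
    if null_rate > max_null_rate:
        issues.append(f"null rate {null_rate:.0%} exceeds {max_null_rate:.0%}")

    # Check 2: how far the batch mean sits from the baseline mean,
    # measured in baseline standard deviations.
    present = [value for value in batch if value is not None]
    baseline_mean, baseline_sd = mean(baseline), stdev(baseline)
    if present and baseline_sd > 0:
        drift = abs(mean(present) - baseline_mean) / baseline_sd
        if drift > max_sigma:
            issues.append(f"mean drifted {drift:.1f} standard deviations from baseline")

    return issues


# Hypothetical daily transaction amounts.
baseline_amounts = [100, 102, 98, 101, 99]
healthy_batch = [101, 99, 100, 102]
suspect_batch = [250, 260, None, 255]

healthy_issues = check_batch(baseline_amounts, healthy_batch)
suspect_issues = check_batch(baseline_amounts, suspect_batch)
```

A check like this does not replace looking at the data. It only tells the team when a batch deserves a closer look.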
Sadly, very little budget is assigned for these activities, and project teams often don't assign ownership and responsibility for them. The result is that less and less real data analysis is being performed in the implementation of data solutions, which, in my experience, goes hand in hand with the increase in quality incidents being reported in testing and post-production.
To end off, I will leave you with some advice one of my early mentors gave me. I remember one of my solutions failing in production after deployment. I basically tried to pass the buck to the business analyst, by saying that I didn't know the data well enough to pick up the problem in testing.
My manager and I always got on very well, so when she became critical of me, it made me stop and listen very carefully, which is what I hope any data engineer reading this is doing now. She said that not knowing the data or the system was a poor excuse for not getting hands-on with the data. She said that, in this digitally connected age, there are very few systems with which we have no familiarity at all, or with which we do not interact at least peripherally.
This provides us with an almost intuitive understanding of all types of data, and should allow us to perform certain deductive, logical tests on any data, to help us better understand what we are seeing.
Ultimately, it is about taking what you know, your own perspective, enhancing this with whatever research you can do, and shining the light of common sense on the data you are looking at. No calculator, computer or complex algorithm is required for this. Just good old-fashioned common sense and reasoning.
Do not over-rely on the tools and underestimate your own ability in the domain of data analysis.
Julian Thomas is principal consultant at PBT Group, specialising in delivering solutions in data warehousing, business intelligence, master data management and data quality control. In addition, he assists clients in defining strategies for the implementation of business intelligence competency centres, and implementation roadmaps for a wide range of information management solutions. Thomas has spent most of his career as a consultant in South Africa, and has implemented information management solutions across the continent, using a wide range of technologies. His experience in the industry has convinced him of the importance of hybrid disciplines, in both solution delivery and development. In addition, he has learned the value of robust and flexible ETL frameworks, and has successfully built and implemented complementary frameworks across multiple technologies.