Data mining is more about process than it is about clever algorithms. When the process is not well understood, all the clever techniques and algorithms can get applied to the wrong data, in the wrong way, and yield incorrect results.
This is the view of US-based analytics guru Michael JA Berry, who has been invited to SA by SAS SA, a leader in business intelligence, and Knowledge Factory, a marketing insights company, to present data mining training in August.
"A corollary is that the skills, knowledge and intuition of the human data miner in coaxing meaning from recalcitrant data are more important than tools and techniques," says Berry.
"Data mining can provide unintended results. Finding data that is inaccurate is more dangerous than finding factual data that is not useful, because important business decisions could be based on incorrect information."
Data mining results often seem reliable because they are based on actual data derived in a seemingly scientific manner. However, this appearance of reliability can be deceiving. The data itself may be incorrect or not relevant to the question at hand. The patterns discovered may reflect past business decisions or nothing at all. Data transformations within the system, such as summarisation, may have destroyed or hidden important information.
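As a rough illustration of how summarisation can hide information, consider the minimal Python sketch below. The two customer segments and their spend figures are invented for this example and do not come from Berry's material. Both segments reduce to the same average, although the underlying behaviour is very different.

```python
from statistics import mean, median

# Hypothetical monthly spend per customer (invented figures, for illustration only).
segment_a = [50, 50, 50, 50, 50]   # every customer spends about the same
segment_b = [0, 0, 0, 0, 250]      # one big spender, four inactive customers


def summarise(values):
    """Reduce the values to a single mean, as a summary table might."""
    return mean(values)


def profile(values):
    """Keep a little more detail: mean, median, minimum and maximum."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# The one-number summaries are identical...
print(summarise(segment_a), summarise(segment_b))
# ...but the fuller profiles show two very different customer bases.
print(profile(segment_a))
print(profile(segment_b))
```

A model fed only the averaged figure cannot tell these segments apart, which is one way a transformation upstream of the analysis can quietly remove the signal.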
Berry points out that humans depend so heavily on patterns in their day-to-day lives that they tend to see patterns even when none exist.
"If you look at the night-time sky, you probably do not see a random arrangement of stars but the Big Dipper, or the Southern Cross, or Orion's Belt," he says.
"Some even see astrological patterns and portents that can be used to predict the future. This was an early form of data mining. The widespread acceptance of outlandish conspiracy theories is further evidence of the human need to find patterns in data."
Berry believes humans have developed such an affinity for patterns because patterns often reflect some underlying truth about the way the world works. For instance, the phases of the moon, the progression of the seasons, the constant alternation of night and day, and even the regular appearance of a favourite TV show at the same time on the same day of the week are useful because they are stable and therefore predictive.
One can use these patterns to decide when it is safe to plant tomatoes or how to program the VCR. Other patterns clearly do not have any predictive power. If a fair coin comes up heads five times in a row, there is still a 50-50 chance that it will come up tails on the sixth toss.
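The coin-toss claim can be checked directly. The Python sketch below is only an illustration, assuming a fair coin and independent tosses; the function names are made up for this example. It first enumerates every equally likely sequence of six tosses, then repeats the check with a seeded simulation.

```python
import random
from itertools import product


def exact_tails_after_heads(streak=5):
    """Enumerate all equally likely sequences of streak + 1 fair tosses and
    return the share ending in tails among those that open with a run of heads."""
    sequences = [
        s for s in product("HT", repeat=streak + 1)
        if all(toss == "H" for toss in s[:streak])
    ]
    tails = sum(1 for s in sequences if s[-1] == "T")
    return tails / len(sequences)


def simulated_tails_after_heads(streak=5, trials=200_000, seed=0):
    """Estimate the same share by simulating many runs of streak + 1 tosses."""
    rng = random.Random(seed)
    qualifying = 0  # runs that start with `streak` heads
    tails = 0       # qualifying runs whose final toss is tails
    for _ in range(trials):
        tosses = [rng.random() < 0.5 for _ in range(streak + 1)]  # True = heads
        if all(tosses[:streak]):
            qualifying += 1
            if not tosses[-1]:
                tails += 1
    return tails / qualifying


print(exact_tails_after_heads())      # exactly one half
print(simulated_tails_after_heads())  # an estimate that should land close to one half
```

The streak of five heads looks like a pattern, but because each toss is independent it carries no information about the next one.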
"The challenge for data miners is to figure out which patterns are predictive and which are not - to separate signal from noise," explains Berry.
During his visit to SA, Berry will present an intensive three-day data mining course at SAS Institute.
A data mining author and expert, Berry has 20 years of data mining experience and specialises in applying advanced analytical techniques to solve practical business problems.
He co-authored the best-selling book in the field, "Data Mining Techniques for Marketing, Sales, and Customer Support" (John Wiley & Sons, 2004, 2nd Edition), with his colleague Gordon S Linoff.
The two authors also collaborated on "Mastering Data Mining", a case-study-based approach to best practices in data mining for every stage of the customer lifecycle.
Berry has extensive experience applying data mining techniques, such as rule induction and memory-based reasoning, to extract actionable information from very large parallel databases to solve real business problems.
In 1997, he founded Data Miners, a consultancy specialising in predictive modelling, data mining education and the integration of data mining into standard business practices.