Reasons for BI mistrust and how to defeat it
A lot of South African businesses still don't trust business intelligence (BI) solutions to give them credible information on which to base their strategic decisions. The standard joke is that the CEO gets the executives together in the boardroom, where they pore over the spreadsheets and graphical reports the minions spent countless hours creating, before tossing them all aside and going with their gut instincts, writes Francois Cross, director of IT Business. The joke exists, and is amusing to those in the industry, precisely because it contains an element of truth. Many BI solutions really are unreliable, and in many cases that's because the data they have to work with is itself unreliable.
In the old days, businesses would collect their data, filter it into a database and then work with that to create a view of what was going on in the business. But the typical number of data stores has grown, and there is a great deal more interaction with other businesses now too. That wouldn't be a problem if everyone used the same type of database, if all databases used the same standards, and if the companies supplying the data ensured it was all spick and span.
Anyone who has spent even a small amount of time in the industry knows that good quality data is a rarity, and that although standards exist, they are seldom adhered to, and architectures differ in the exceptional instances where they're successfully applied. The incentive to supply good quality data is also often one-sided. Cross recently worked with his team to supply a customer's BI solution with data from 12 third-party providers who run national operations in the telecommunications industry.
The customer uses its networks to operate a distributed service for a portion of its subscriber base, which is around 90 000-strong. The financial incentive for the service providers is slim, but for this customer, who needs to understand the subscribers' habits to supply a more robust and focused service, the financial implications are far greater.
At the moment they get about 60% of their data from the third-party suppliers, typically in flat files such as .csv and .txt, uploaded to an FTP site. The rest of the data originates in their own database. And that presents a problem: the 60% of data from the 12 different sources arrives in multiple formats, and the quality is erratic. This is what's known as a lack of referential integrity. It's one of the primary reasons businesspeople don't trust BI solutions, and it's what people mean when they say the data is bad: it's unreliable, and that unreliability ultimately leads to a lack of trust. So how do users deal with that? The ideal solution is to get all of the service providers to supply the right data, in the right format, and to ensure it is always correct before they send it out. While there is a business project under way to achieve just that, it takes time to arrange, and even if an agreement is reached it is unlikely that the quality of the data will be consistently good. In the meantime, the company needs to provide some business value for its customer.
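To make the referential-integrity problem concrete, here is a minimal sketch of the kind of check involved. The file layout, column names and subscriber IDs are all hypothetical, invented for illustration; the point is simply that rows from a third-party file may reference subscribers the customer's own database has never heard of:

```python
# Hypothetical master set of subscriber IDs from the customer's own database.
known_subscribers = {"SUB001", "SUB002", "SUB003"}

def check_referential_integrity(rows, key="subscriber_id"):
    """Split incoming rows into those whose key exists in the master
    set and 'orphan' rows that reference an unknown subscriber."""
    valid, orphans = [], []
    for row in rows:
        (valid if row.get(key) in known_subscribers else orphans).append(row)
    return valid, orphans

# Example upload: one row references a subscriber the master set doesn't know.
upload = [
    {"subscriber_id": "SUB001", "usage_mb": "120"},
    {"subscriber_id": "SUB999", "usage_mb": "45"},
]
valid, orphans = check_referential_integrity(upload)
print(len(valid), len(orphans))  # prints "1 1": one valid row, one orphan
```

Orphan rows like the SUB999 record above are exactly what erodes trust: they either vanish silently from reports or inflate totals, depending on how carelessly they are handled downstream.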
What ITBusiness does is run scripts that automatically check the FTP sites for the daily uploaded files, then pull those files into the customer's system, where they are subjected to an extraction, transformation and loading (ETL) process. The challenge during extraction, which really sets the tone for the remainder of the project, is that data stores use different formats for organising the data. In this particular case they are better off than they could be, because all of the files received are flat files and therefore in a single format, although within that format there are discrepancies. The transformation component is what ITBusiness builds when it develops the rules that are applied to the incoming data and prepares it for operational use by the customer's decision-makers. In essence, that means interrogating the files received and checking for missing rows, missing columns and fields where data has been left out.
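The fetch-and-interrogate step described above might be sketched as follows, using Python's standard FTP and CSV libraries. The host, credentials, directory name and expected column layout are all placeholders, not details from the actual project:

```python
import csv
import io
from ftplib import FTP

# Assumed column layout for a provider's daily flat file (hypothetical).
EXPECTED_COLUMNS = ["subscriber_id", "date", "usage_mb"]

def fetch_daily_files(host, user, password, remote_dir="/daily"):
    """Pull every file from a provider's FTP drop directory into memory.
    Host, credentials and directory are placeholders."""
    files = {}
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for name in ftp.nlst():
            buf = io.BytesIO()
            ftp.retrbinary(f"RETR {name}", buf.write)
            files[name] = buf.getvalue().decode("utf-8", errors="replace")
    return files

def validate_flat_file(text):
    """Interrogate one flat file: report missing columns and
    rows with empty fields, in the spirit of the checks described."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    missing_cols = [c for c in EXPECTED_COLUMNS
                    if c not in (reader.fieldnames or [])]
    if missing_cols:
        problems.append(f"missing columns: {missing_cols}")
    # Data starts on line 2 of the file, after the header row.
    for line_no, row in enumerate(reader, start=2):
        if any(v in (None, "") for v in row.values()):
            problems.append(f"row {line_no}: incomplete")
    return problems
```

Running `validate_flat_file` over each fetched file yields a problem list per provider, which is the raw material for the rules described next.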
They then start creating rules for the software to check those files automatically in future. Over the course of a month, they can build a set of rules robust enough to yield usable data. Those rules form the data quality (DQ) aspect of the project. They also merge the data sets, deduplicate the data, aggregate it and so on. Once the rules have been run, the data is loaded into the warehouse. The way they run this particular operation is to have a few technically skilled consultants operating the systems, which control the largely automated processes. The systems do not require very powerful hardware, so they are quite cost-effective; in this instance there is 600GB of data. The key to getting the most out of the data and maximising the return on investment (ROI) lies in the software tools and the skills to run them. They make the difference between working efficiently with the data and resorting to labour-intensive operations that don't quite meet business requirements. In this case they are using a mixture of software tools, some supplied with popular boxed database software, which the customer therefore already owns, alongside more specialised tools from a global vendor. The mixture means they can contain costs as far as possible while still delivering the business benefits that will ultimately produce the best ROI for the customer.
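The merge, deduplicate and aggregate steps can be sketched in a few lines. This is a simplified illustration with invented file contents, not the project's actual rules: it merges rows from two providers' files, drops exact duplicate rows, then totals usage per subscriber:

```python
import csv
import io
from collections import defaultdict

def merge_and_aggregate(files):
    """Merge rows from several providers' flat files, drop exact
    duplicates, then aggregate usage per subscriber."""
    seen = set()
    totals = defaultdict(float)
    for text in files:
        for row in csv.DictReader(io.StringIO(text)):
            key = (row["subscriber_id"], row["date"], row["usage_mb"])
            if key in seen:  # exact duplicate reported by another provider
                continue
            seen.add(key)
            totals[row["subscriber_id"]] += float(row["usage_mb"])
    return dict(totals)

file_a = ("subscriber_id,date,usage_mb\n"
          "SUB001,2024-01-01,120\n"
          "SUB001,2024-01-02,30\n")
file_b = ("subscriber_id,date,usage_mb\n"
          "SUB001,2024-01-01,120\n"   # duplicate of a row in file_a
          "SUB002,2024-01-01,45\n")
print(merge_and_aggregate([file_a, file_b]))
# prints {'SUB001': 150.0, 'SUB002': 45.0}: the duplicate row counted once
```

Only after rules like these have run cleanly is the data fit to load into the warehouse; loading unmerged or duplicated rows is precisely what produces the untrustworthy boardroom reports the joke is about.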
While the issue of multiple data sources and a lack of referential integrity is a primary cause of BI mistrust, it is not an insurmountable obstacle. It does, however, require a fair level of education on the customer's part, particularly where several sponsors, divisions or departments must buy into and support the project. The technical employees in South African businesses, from the CIO down, usually understand the process and the results that can be obtained, but they are seldom masters of their own budgets and never write their own requirements.