Answers in no time
Internet search engines with instant query responses may have misled enterprises into believing all analytical queries should deliver split-second answers.
With the hype around big data analytics and the convenience of rapid Internet searches, enterprises might be forgiven for expecting to have all answers to all questions at their fingertips in near real-time.
Unfortunately, getting trusted answers to complex questions is a lot more complicated and time-consuming than simply typing a search query. Behind the scenes on any Internet search, a great deal of preparation has already been done in order to serve up the appropriate answers.
Google, for instance, dedicates vast amounts of high-end resources and all of its time to preparing the data necessary to answer a search query instantly. But even Google cannot answer broad questions or make forward-looking predictions.
Almost instant answers are possible where the data is known and trusted, has already been prepared with rules applied, and the search parameters are limited, as on a property Web site. But this is not true business intelligence (BI) or analytics.
Behind the scenes
Within the enterprise, matters become a lot more complicated. When the end-user seeks an answer to a broad query - such as when a marketing firm wants to assess social media to find an affinity for a certain range of products over a six-month period - a great deal of 'churn' must take place in the background to deliver answers. This is not a split-second process, and it may deliver only general trend insights rather than trusted, quality data that can serve as the basis for strategic decisions.
When end-users wish to run a query and are given the power to process their own BI/analytics, lengthy churn must take place. Every time a query, report or instance of data access is converted into useful BI/analytical information for end-consumers, there is a great deal of preparation work to be done along the way, namely: identify data sources > access > verify > filter > pre-process > standardise > look up > match > merge > de-dup > integrate > apply rules > transform > pre-process > format > present > distribute/channel.
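To make the chain of preparation steps concrete, here is a minimal sketch of a few of them (filter, standardise, de-dup, apply rules) run in sequence over a handful of records. The function names, field names and the business rule are hypothetical, chosen only for illustration; a real BI pipeline would involve far more stages and far more data.

```python
# Illustrative only: stage names, fields and the rule below are assumptions,
# not taken from any particular BI product.

def filter_valid(rows):
    # Filter: drop records that have no customer name.
    return [r for r in rows if r.get("customer")]

def standardise(rows):
    # Standardise: trim whitespace and lower-case the customer name.
    return [{**r, "customer": r["customer"].strip().lower()} for r in rows]

def de_dup(rows):
    # De-dup: keep the first record seen for each (customer, product) pair.
    seen = set()
    unique = []
    for r in rows:
        key = (r["customer"], r["product"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def apply_rules(rows):
    # Apply rules: a hypothetical rule that only positive amounts count.
    return [r for r in rows if r["amount"] > 0]

PIPELINE = [filter_valid, standardise, de_dup, apply_rules]

def run_pipeline(rows, stages=PIPELINE):
    # Each stage consumes the output of the previous one.
    for stage in stages:
        rows = stage(rows)
    return rows

raw = [
    {"customer": " Acme ", "product": "widget", "amount": 120},
    {"customer": "acme", "product": "widget", "amount": 120},
    {"customer": "", "product": "gadget", "amount": 50},
    {"customer": "Beta", "product": "gadget", "amount": 0},
]
result = run_pipeline(raw)
print(result)
```

Even in this toy form, every stage makes a full pass over the data, which is why the same chain applied to large, unprepared sources cannot be expected to finish instantly.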
Because most queries have to traverse, link and process millions of rows of data and possibly trillions of words from within the data sources, this background churn could take hours, days or even longer.
A recent TDWI study found organisations are dissatisfied with how long the chain of processes involved in BI, analytics and data warehousing takes to deliver valuable data and insights to business users. The organisations attributed this, in part, to ill-defined project objectives and scope, a lack of skilled personnel, data quality problems, slow development and an inability to access all relevant data.
The problem is that most business users are not BI experts and do not all have analytical minds, so the 'discover and report' method may be iterative (and therefore slow), and in many cases the outputs/results are not of the quality expected. The results may also be inaccurate, because data quality rules may not have been applied and the data may not be linked correctly, unlike in a typical data warehouse, where data has been qualified and pre-defined/derived.
In a traditional situation, with a structured data warehouse where all the preparation is done in one place, and once only, and then shared many times, supported by quality data and predefined rules, it may be possible to get sub-second answers.
But even in this scenario, sub-second insights are often not achieved, since time to insight also depends on properly designed data warehouses, server power and network bandwidth.
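The 'prepare once, share many times' idea behind a structured data warehouse can be sketched as follows. This is a simplified illustration under assumed names (`build_summary`, `query_prepared`, `query_raw` and the sales fields are all hypothetical): the summary is built in a single pass up front, after which each question becomes a lookup, whereas the unprepared path rescans the raw records for every question.

```python
from collections import defaultdict

def build_summary(rows):
    # Preparation step, done once: total the amounts per (region, month).
    totals = defaultdict(float)
    for r in rows:
        totals[(r["region"], r["month"])] += r["amount"]
    return dict(totals)

def query_prepared(summary, region, month):
    # Answering from prepared data is a single dictionary lookup.
    return summary.get((region, month), 0.0)

def query_raw(rows, region, month):
    # Answering from raw data rescans every record on every query.
    return sum(
        r["amount"]
        for r in rows
        if r["region"] == region and r["month"] == month
    )

sales = [
    {"region": "north", "month": "2024-01", "amount": 100.0},
    {"region": "north", "month": "2024-01", "amount": 50.0},
    {"region": "south", "month": "2024-01", "amount": 75.0},
]

summary = build_summary(sales)  # paid for once, reused by every query
print(query_prepared(summary, "north", "2024-01"))
print(query_raw(sales, "north", "2024-01"))
```

Both paths return the same figure; the difference is where the work is done. The prepared path is only as fast and as trustworthy as the preparation behind it.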
Users tend to confuse searching and discovering flat raw data that is already there with generating information and insight at the next level. In more complex BI/analytics, each time a query is run, all the preparation work has to be done from the beginning, and the necessary churn can take a significant amount of time.
Therefore, demanding faster BI 'time to value' and expecting answers in sub-seconds could prove to be a costly mistake. While it is possible to gain some form of output in sub-seconds, these outputs will likely not be qualified, trusted insights that can deliver real strategic value to the enterprise.
Mervyn Mooi is a director of Knowledge Integration Dynamics (KID), and also a key resource within the company's information management, data warehousing and business intelligence teams. He has been in the IT industry for 36 years, beginning his career as an operator at the CICS bureau in Johannesburg in the early 1980s. Thereafter, he was appointed as a programmer at state-owned oil exploration and production company SOEKOR. In 1986, Mooi joined Anglo American's head office IT department where he remained for almost 12 years. Here he progressed to become a senior programmer, analyst, database administrator and technical support specialist. After completing his degree in informatics, he then left to join Software Futures, where he worked as a senior consultant for 18 months in the data warehousing and business intelligence arena. Mooi joined KID in 1999 as a data warehouse and business intelligence specialist. Mooi's experience in ICT disciplines includes operations, business and systems analysis, application development, database administration, data governance/management, data architecture/modelling, production application and systems software support, data warehousing and business intelligence. He now focuses on enterprise information management, information governance and cloud solutions.