Big data as a business driver
Big data is being monetised and is becoming an asset on the balance sheet. Increasingly, executives are seeing how big data offers a short path from insights to revenue.
This was the word from data warehousing thought leader and teacher Ralph Kimball, who was speaking at an ITWeb event in Bryanston yesterday.
While operational data, customer behaviour data and big data have disrupted the data warehouse and forced it to evolve into something quite sophisticated, the mission remains unchanged, he said.
According to Kimball, roughly half of all big data analytics use cases come from behaviour tracking, which covers everything from ad tracking to gesture tracking in online games.
He unpacked the intricacies of ad tracking, explaining how Yahoo auctions, in real time, the advertising that appears in users' browsers. "In the second or two between you clicking on a page and the page appearing, Yahoo conducts an auction with interested advertisers who want their content on the next page." This is part of big data analysis, he pointed out.
Likewise, he described a visit to the offices of Zynga, the social gaming company behind FarmVille and CityVille, among others. According to Kimball, a screen in the corner of the development room showed, in real time, how many people around the globe were playing Zynga games; on a Tuesday morning, the figure was roughly 65 million.
"But they don't track the number of people playing the game; they record every micro-gesture of every player," he said. This helps the company protect its users from hackers and better understand their needs, but it also means Zynga must analyse huge amounts of data.
"The current flood of data is at biblical proportions," he said, stressing that it cannot simply be stored; it must be analysed. In line with this, he pointed out that while many believe big data to be synonymous with unstructured data, it can also be hyper-structured. This kind of data is often generated by sources such as smart utility meters, building sensors and in-flight aircraft systems.
All of this data calls for integration, he said, adding that data silos should be eliminated.
According to Kimball, there are two serious responses to the big data conundrum: MapReduce/Hadoop or a relational database management system (RDBMS).
MapReduce/Hadoop is a framework, or style, for organising a task. It organises vast amounts of data and shuffles it so that parallel reducers can do the work, explained Kimball. An RDBMS allows the architect to create, update and administer a database that holds a collection of items organised in formally described tables, he said.
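To make the map, shuffle and reduce steps concrete, here is a minimal single-machine sketch in Python. It is not Hadoop's API: the event records, the function names and the per-player gesture count are hypothetical, chosen only to echo the Zynga example. The map step emits key/value pairs, the shuffle groups all values under their key, and the grouped keys are then handed to reducers that run in parallel (here, a thread pool standing in for a cluster).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

# Hypothetical gesture-tracking record: (player_id, gesture)
Event = tuple[str, str]


def map_phase(events: Iterable[Event]) -> list[tuple[str, int]]:
    """Emit one (key, 1) pair per event, keyed by player."""
    return [(player, 1) for player, _gesture in events]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group every emitted value under its key, so each key goes to exactly one reducer."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(item: tuple[str, list[int]]) -> tuple[str, int]:
    """Combine all values for a single key (here: total gestures per player)."""
    key, values = item
    return key, sum(values)


def map_reduce(events: Iterable[Event], workers: int = 4) -> dict[str, int]:
    """Run map, then shuffle, then the reducers in parallel on the grouped data."""
    grouped = shuffle(map_phase(events))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(reduce_phase, grouped.items()))


if __name__ == "__main__":
    sample = [("alice", "click"), ("bob", "drag"), ("alice", "tap"),
              ("carol", "click"), ("bob", "tap")]
    print(map_reduce(sample))  # {'alice': 2, 'bob': 2, 'carol': 1}
```

The shuffle is what makes the parallelism safe: because every value for a given key ends up in one group, each reducer can work independently of the others. In a real Hadoop cluster the map tasks also run in parallel across machines, and the shuffle moves data over the network.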
But businesses should note that this is not an either/or situation; there are also hybrid possibilities that combine the two, Kimball pointed out.
Other data warehouse disruptions include the rise of the data scientist, noted Kimball. He called on data warehouse architects to embrace data scientists by building cross-department analytic communities and insisting on shared data warehouse resources.
Big data disrupts the comfortable position data warehouse architects have enjoyed, he said. "But it is not the size of this data that is a concern, it is the variance and the complexity of it," he noted, questioning whether big data has the potential to break the traditional data warehousing model.
"Big data is absolutely a part of our future. We must continuously put our arms around the large task of bringing all the data together and presenting it in a manageable way for business users."