Mind the gap

Todd Goldman, VP and GM for Enterprise Data Integration at Informatica
A lack of knowledge, resources and skills. These are the three main factors stifling companies when dealing with big data.

This is according to Todd Goldman, VP and GM for Enterprise Data Integration at Informatica, who believes that, in addition to the above, enterprises of all sizes need to understand how to capture big data - the technology and processes required to make it actionable - before they can reap its full benefits.

"At the moment, companies are in three positions with regard to big data," he says. "The first is that they don't understand big data and how it can benefit them, and have, therefore, pushed the idea of implementing it onto the back burner. The second position is where the company has a plan for big data in that it has the necessary resources ready and has been successfully running its big data model in a test environment, but lacks the expertise to put it into practice. The third is a very small minority of early adopters who have been able to successfully implement Hadoop-based analytics."

"This is where Informatica comes in, as we provide software that makes it easy to implement business and IT processes to turn raw data into great data that is ready for analysis on Hadoop, without requiring an army of Hadoop developers," he says. "Furthermore, once a company is able to integrate all the data it needs from disparate sources, it is ready to perform the data analysis that gives it a competitive edge."

Use the correct tool for the job at hand

"For small data sets or for smaller companies, a simple, well-structured Oracle database that can be queried is more than sufficient," says Goldman. But many think that a platform like Hadoop is the way to solve all of their big data challenges. This is where the confusion comes in, as Hadoop is designed to process very large data sets, but it doesn't make the data clean, integrated or fit for purpose.

Users are also often confused about what Hadoop actually does. It will not magically integrate, clean and optimise the data for the user. This, unfortunately, is still a manual process, and performing analytics on dirty data will only give bad results. Currently, 80% of data scientists spend their time prepping the data rather than using their expertise to gain valuable insights and make recommendations to improve business performance.

The best way to gain the full benefits of big data is to regard data as a business process that needs to be managed. "Implementing new technology is simply not enough; companies need to make their data scientists more efficient by building processes around managing and analysing their data," he says.

The skills problem

Goldman goes on to say that Hadoop is a technology that is constantly being improved and developed, and so it is very difficult for a company to find and retain employees who are well trained to use it. It is just too complex for the average developer to be successful.

"This is similar to an operating system like the Mac OS. Users buy a Mac because it is easy to use. Many of them don't know that it is based on the Unix operating system, which is very difficult to use and many of them don't care. They just want their Mac to work," says Goldman.

To address this problem, Informatica hides the underlying complexity of Hadoop behind an easy-to-use graphical user interface that lets users integrate, clean, and prepare data for analysis on Hadoop, without requiring any knowledge of how Hadoop works underneath.

"Our graphical development environment also allows our users to take advantage of new code, new features and new functions and roll them out with the next update, without disrupting how a user currently uses the program," he concludes.

Todd Goldman will be speaking at Informatica Day, a half-day event designed to bring the data community's brightest minds together to push the boundaries of current thinking and discuss the role of data in the real-time digital enterprise of the future.

The event will be held in Johannesburg on 21 October. Those interested in attending can fill out the registration form.

