For corporates today, it`s a matter of data, data, everywhere, but not a drop of information, says Alan Raubenheimer, CEO of Embarcadero Africa. He highlights some factors you should consider.
Every business today, regardless of size, faces challenges in managing its digital corporate information. These challenges take many forms. Perhaps most familiar is the now-classic problem of gathering information from various applications, departments and individuals, and consolidating it in a format that can be analysed.
It is no longer possible to address fundamental business questions by querying a single system, or even a few systems. Instead, organisations need to integrate dozens of applications to achieve a global view of all facets of their operation, such as customer behaviour, supply capabilities and business performance. The practice of data warehousing evolved to address this area of concern.
Data warehouses give businesses a single view into their data, enabling them to make more informed decisions. The process of moving and manipulating the data before loading into the data warehouse is widely regarded as the riskiest, least maintainable, most expensive, time-consuming and underestimated portion of any data warehousing project.
Moving data is not a need limited to data warehousing. Many projects require gathering data from various sources for integration into something new. Perhaps you have a home-grown application that you are updating or replacing. You will have to migrate data from the old system, combine it with information stored in another place, and load it into the new application. Or perhaps you are building a reporting data mart, but the scale does not warrant the high cost or complexity of tools designed for enterprise-level data warehousing.
Software applications that assist in these processes have been available for several years. The products were developed specifically to address the needs of enterprise data warehousing projects undertaken by very large companies. Such projects have historically had long timelines and huge budgets, and the cost of the data movement tools reflects both of those characteristics. It is not unusual to hear of allocations of over $1 million for data movement and transformation software in data warehousing budgets. And this does not include implementation costs!
Hand-coding
Because of the high cost of acquiring and implementing tools for data movement and transformation, the majority of these needs are addressed by hand-coding. In other words, writing scripts, lots of scripts. These might be developed in PL/SQL, VB, perl, C or Java. Most staff find this work tedious, recognise it to be a never-ending burden, and would rather be working on more interesting projects.
Your employees are the business subject matter experts. Rarely will you find employees who can do the manual and tedious technical development. Contractors often fill the void under employee supervision. Unfortunately, employees rarely have the opportunity to learn about the work or understand the code, and documentation is usually minimal.
Of course, when something breaks, the people who created the code are long gone. Does this scenario sound familiar?
The challenges of data manipulation
Regardless of the specific business problem being addressed, there are three steps that must be accomplished to get your data in the form and location you need:
1. The data must first be extracted from the various sources, and brought together. These sources can be as diverse as large relational databases, smaller databases and spreadsheets created by business analysts, XML files, and even plain old text files.
2. Once the data has been accessed, it is then transformed. This involves changing data types, removing extraneous fields, cleaning up unwanted characters in text fields, and creating fields that previously did not exist. You must then apply business rules until the raw incoming data is in the format that can be consumed by the target database, application, or file.
3. The final step is to load the transformed data into the target.
The process of extracting, transforming and loading data is so common that it is known by its acronym: ETL.
Various products exist to assist in these tasks, but they are big, complex and expensive. As the days of unlimited IT budgets are gone, the tool you select has to be cost-effective and still provide the richest feature set and performance for your limited budget. The tool must be able to handle the most mundane tasks of ETL right out of the box. Installing and configuring the tool also should not be a project in itself.
It should easily connect to your various data sources, extract the data, join heterogeneous sources, convert data types, eliminate duplicate records, and determine the appropriate targets for the formatted data. You should not have to devote your most skilled programming resources to accomplish this.
Your ETL tool should also meet additional requirements. Aside from being easy to install, easy to learn, and easy to use, it also needs to expand to fit your needs. The tool must flexible enough to manage your more complex transformations. It has to improve the efficiency of your enterprise without constraining you to a single way of approaching any particular problem. It has to bring value to all levels of the organisation, whether it`s the business analyst, DBA, or seasoned developer.
Indeed, it has to be simple, yet extensible. It needs to be flexible and powerful, yet provide immediate productivity gains.
Oh, and wouldn`t if be great if the product documented your processes as they are developed?
Traditionally, experts who recommend using an ETL tool have asserted that the value of these products is realised in terms of the long-term maintainability and flexibility as opposed to immediate productivity.
This is fine for those multi-year, enterprise-wide projects with budgets in the millions of dollars. But long-term ROI is not what you are seeking when looking at the many data manipulation tasks you face on a daily basis, and is not consistent with the new "just-in-time" mindset.
One of the biggest problems with software technology is that it is too complex. Software packages are typically difficult to install and configure, difficult to integrate into the existing environment, and difficult to upgrade. As a result, a very large part of any company`s technology budget goes to software maintenance.
Diminishing the costs of software maintenance should be everyone`s goal. Software vendors must do a better job of application integration out of the box instead of making business do it.
A big step toward overcoming these challenges is addressed by increasing pressure on software vendors to adhere to standards.
Conclusion
This is by no means an exhaustive list of requirements a potential customer should be asking of an ETL tool ... but it will give you a good head-start ... and help you overcome the price tag traditionally associated with ETL.
Editorial contacts

