Evolution of ETL tools: What your firm needs today

Neither traditional nor cloud platform-specific ETL tools alone may be up to the task of integrating data from multiple sources across increasingly hybrid environments.

As organisations move to hybrid cloud environments, we are seeing the same challenges emerging that drove the development of ETL (extract, transform, load) tools in the first place. However, this time, ETL tools themselves are contributing to the complexity.

Thirty years ago, when data warehouses were starting to proliferate within the industry, the easiest mechanism to get data into and out of them was via SQL.

A few years on, the realisation dawned that a more efficient, scalable and maintainable method was needed. Enter the humble “ETL” tool. At the time, it revolutionised the way that data flowed between operational systems and data warehouses, and there were a handful of vendors with products such as Informatica PowerMart (PowerCenter), Ascential DataStage and Oracle Warehouse Builder.

Data integration is a foundational building block for many organisations' data-driven initiatives, ranging from data warehousing to digital transformations. Without a standardised integration architecture and mechanism, organisations often struggle to achieve the desired outcomes.

In 2021, ETL tools are a commodity with the capability baked into many data platforms. However, these capabilities still largely address an on-premises ETL requirement.

With more companies moving to the cloud, the traditional ETL tool is no longer an adequate solution for future-proofing a cloud strategy.

ETL tools native to public cloud environments are often adopted to address the cloud requirement, but few support any future need to move data back and forth through multiple cloud platforms and hybrid environments.

Therefore, for many organisations, moving to cloud brings with it massive integration jobs at huge cost. With myriad ETL tools in use, organisations can face new complexity and the need for additional coding and scripting, which doesn’t enable a workable and sustainable environment. These challenges are much the same as those we saw in data warehouses 30 years ago.

Determining the use case

To future-proof the architecture and minimise future costs, defining the use cases for the data integration technology should be an early priority.

Organisations need to consider what they are moving data to the cloud for. The use cases should drive the type of data integration required: will they need batch integration, real-time (API) integration or built-in data quality, for example? Based on this, they can determine which mechanisms to use.
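As a rough illustration of letting the use case drive the mechanism, the sketch below maps two attributes of a use case to the integration styles named above. The function, its thresholds and its labels are hypothetical, not a real product's logic.

```python
# Hypothetical decision helper: map the attributes of a data-integration
# use case to a suitable mechanism. The categories follow the article
# (batch integration, real-time/API integration, built-in data quality);
# the 60-second latency threshold is purely illustrative.

def choose_mechanism(latency_seconds_required: float, needs_data_quality: bool) -> str:
    """Pick an integration style for one use case (illustrative rules only)."""
    if latency_seconds_required < 60:
        style = "real-time (API) integration"
    else:
        style = "batch integration"
    if needs_data_quality:
        style += " with built-in data quality"
    return style

# Nightly warehouse load: latency measured in hours, quality checks required.
print(choose_mechanism(86400, True))   # batch integration with built-in data quality
# Customer-facing sync: sub-second freshness needed.
print(choose_mechanism(1, False))      # real-time (API) integration
```

In practice an organisation would weigh many more attributes (volume, ordering guarantees, source connectivity), but the principle is the same: classify the use case first, then pick the mechanism.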


Because data integration is core to business operations, and many businesses run multiple customer-facing and back-end applications, the selected data integration technology needs to be versatile enough to address multiple requirements.

Many of the use cases we encounter relate to cloud initiatives, which legacy “ETL” tools no longer address.

The importance of a cloud-native solution with these qualities has become increasingly evident. Decisions on technology need to be based on future-proofing an enterprise data integration strategy, rather than on evaluating siloed use cases.

Many organisations move to the cloud gradually and therefore require a hybrid data integration solution that can integrate data on-premises, on-premises to cloud, and cloud-to-cloud.
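The three hybrid patterns just listed can be made concrete with a small sketch that classifies a data flow by where its source and target live. The `Endpoint` type and the labels are illustrative assumptions, not part of any specific product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    """A hypothetical source or target system in a hybrid landscape."""
    name: str
    location: str  # "on-premises" or "cloud"

def classify_flow(source: Endpoint, target: Endpoint) -> str:
    """Label a data flow by the hybrid pattern it represents (illustrative)."""
    if source.location == target.location == "on-premises":
        return "on-premises"
    if source.location == target.location == "cloud":
        return "cloud-to-cloud"
    if source.location == "on-premises":
        return "on-premises-to-cloud"
    return "cloud-to-on-premises"

print(classify_flow(Endpoint("ERP", "on-premises"), Endpoint("warehouse", "cloud")))
# on-premises-to-cloud
```

A truly hybrid integration platform has to handle all four of these directions through one consistent mechanism, rather than a different tool per pattern.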

Achieving optimal performance

With cloud and big data platforms making available large reserves of processing capability, a data integration technology that still requires its own powerhouse of processing is neither efficient nor cost-effective, considering the data ingress and egress costs associated with cloud platforms.

Future-proof ETL solutions should be able to push processing down to the data platform, eliminating much of this inefficiency and reducing ingress and egress costs.
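A minimal sketch of the pushdown idea: instead of extracting raw rows into a separate ETL engine, the transformation is expressed as SQL and executed inside the platform, so only the small result ever leaves it. Here Python's built-in `sqlite3` stands in for a cloud data warehouse, and the table and column names are hypothetical.

```python
import sqlite3

# sqlite3 used purely as a stand-in for a cloud data warehouse
# (a real platform would be e.g. Snowflake, BigQuery or Redshift).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO raw_orders VALUES
        (1, 'EMEA', 100.0),
        (2, 'EMEA', 50.0),
        (3, 'APAC', 75.0);
""")

# Pushdown style: the aggregation runs inside the platform as SQL,
# so no bulk extract crosses the network to an external ETL engine;
# only the aggregated result is ever read back.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(amount) AS total
    FROM raw_orders
    GROUP BY region
""")

result = dict(conn.execute("SELECT region, total FROM sales_by_region ORDER BY region"))
print(result)  # {'APAC': 75.0, 'EMEA': 150.0}
```

The same contrast applies at scale: pulling millions of raw rows out of a cloud platform incurs egress charges and duplicates compute, while generating SQL (or equivalent) for the platform to run avoids both.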

Data governance and data management principles suffer without an integrated platform. To enable data integration across a diverse cloud and on-premises environment, organisations need to select their integration technologies in parallel with their choice of a cloud platform. Most look at platform analytics and performance, but overlook the critical data integration component.

Because organisations will always have some form of hybrid architecture, they need sustainable integration and interchange between on-premises and cloud, without the need for future redevelopment and cost.

Scalability, diverse connectivity, self-service and a cloud-native architecture that addresses hybrid use cases are now some of the key principles that any organisation needs to consider when evaluating technologies for cloud data integration.

With a robust cloud-native integration technology platform connecting all systems on-premises and across clouds, organisations can standardise how they approach integration, and standardise the skills set needed.

Veemal Kalanjee

MD of Infoflow.

Veemal Kalanjee is MD of Infoflow, part of the Knowledge Integration Dynamics (KID) group. He has an extensive background in data management sciences, having graduated from Potchefstroom University with an MSc in computer science. He subsequently worked at KID for seven years in various roles within the data management space. Kalanjee later moved to Informatica SA as a senior pre-sales consultant, and recently moved back to the KID Group as MD of Infoflow, which focuses on data management technologies, in particular, Informatica.
