Benchmark studies show that cleaning and sorting data before it is loaded into the database can accelerate the load by as much as 90%. Savings of that magnitude make data management an easy investment to justify.
Julian Field, Country Manager, Ascential Software SA
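To make the benchmark claim concrete: the pre-load preparation it refers to can be as simple as dropping malformed rows, de-duplicating on the key, and sorting so the loader appends in key order. A minimal sketch, assuming a hypothetical CSV extract keyed on customer_id (the file and column names are illustrative only, not from any particular product):

```python
import csv

# Hypothetical file and column names, for illustration only.
SOURCE = "daily_extract.csv"
TARGET = "load_ready.csv"
KEY = "customer_id"

with open(SOURCE, newline="") as f:
    reader = csv.DictReader(f)
    fields = reader.fieldnames
    # Drop rows with a missing key: the loader would reject them anyway.
    rows = [r for r in reader if r[KEY] and r[KEY].strip()]

# De-duplicate on the key (last occurrence wins), then sort so the
# bulk loader can append in key order instead of thrashing its indexes.
clean = sorted({r[KEY]: r for r in rows}.values(), key=lambda r: r[KEY])

with open(TARGET, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    writer.writerows(clean)
```

Commercial ETL tools package this kind of preparation, along with far more elaborate transformations, behind a managed interface.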
Yet two years ago we found large corporate clients declining to invest in industry-standard extraction, transformation and load (ETL) applications. Today that resistance has gone, and even the largest groups are switching to these ETL applications as they wrestle with data warehousing and customer relationship management (CRM) initiatives.
The reason is simple: while developing your own ETL routines may look cheaper on the face of it, in the end it costs significantly more. The challenge of multiple back-end systems and as many - or more - target systems means a level of complexity beyond the capability of most IT departments to address.
These IT departments have learned the hard and frustrating way that hand-coding a complete ETL application is a more daunting undertaking than they had realised. Home-grown solutions seem seductively inexpensive in the short term, but as organisational practice and data analysis processes change, the code must be altered continually.
The more the processes and code change, the greater the occurrence of broken, interfering code fragments. Data often becomes inconsistent and incomplete - an outcome that could have been avoided with an end-to-end data integration solution. And, as any company will testify, change is the name of the game in IT and business today: change in internal structure; change through merger and acquisition; change due to competitive pressures; change due to new strategic imperatives, such as e-business; and change due to new reporting requirements or ad hoc business analysis.
There is also the challenge of multiple platforms. The financial department might run on Unix; sales wants to integrate with legacy data; marketing wants its data in mainframe database format. Extracting and processing data housed on a mainframe and integrating it with other enterprise data is a major undertaking, requiring a different set of tools and skills. These are business requirements, and technical difficulty is no excuse for ignoring them. Data needs to be managed across departments with different needs, across different technical specifications, across platforms, enterprise-wide.
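To give a flavour of why mainframe data demands different tools: mainframe extracts typically arrive as fixed-width EBCDIC records rather than delimited ASCII text, so even reading them requires a decoding step. A minimal sketch, assuming a hypothetical 80-byte record layout (real layouts come from COBOL copybooks):

```python
# Hypothetical 80-byte fixed-width layout; real copybooks define the offsets.
RECORD_LENGTH = 80
FIELDS = {"account": (0, 10), "name": (10, 40), "balance": (40, 52)}

def decode_records(path):
    """Yield dicts from fixed-width EBCDIC records (IBM code page 037)."""
    with open(path, "rb") as f:
        while chunk := f.read(RECORD_LENGTH):
            text = chunk.decode("cp037")  # EBCDIC -> Unicode text
            yield {name: text[a:b].strip() for name, (a, b) in FIELDS.items()}

for record in decode_records("mainframe_extract.dat"):
    print(record)
```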
Rigid routines
In short, the business cannot be held captive by inflexible business analysis systems, or rigid IT routines, when the requirement is for flexibility and agility.
There are further complications to writing your own ETL scripts and routines:
- Is there any guarantee that staff will use consistent standards or program in line with international best practice?
- Will they document their methods and processes? Experience indicates that this is seldom the case.
- How long will these staff stay? When they leave, will they have left a solid foundation behind them for other staff to inherit? Again, experience indicates that this is not usually the case.
- When major change occurs - corporate downsizing from a mainframe to a Unix system, for instance - will these staff be able to modify the ETL routines to accommodate the new system?
- Will their home-grown scripts and routines stand up to the rigours of dramatically increased data flows?
- Will they be able to integrate with IBM MQSeries, for instance, to provide real-time data flows to a partner's digital dashboard? (A sketch of such an integration follows this list.)
- Will they be able to integrate metadata, to ensure data is never out of context or misunderstood?
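On the MQSeries point above: even the simplest real-time feed involves connection details, channels and queue handles that a batch script does not provide for free. A minimal sketch using pymqi, a third-party Python client for IBM MQ; the queue manager, channel and queue names here are hypothetical:

```python
import pymqi  # third-party IBM MQ client for Python

# Hypothetical connection details, for illustration only.
queue_manager = "QM1"
channel = "DEV.APP.SVRCONN"
conn_info = "localhost(1414)"

qmgr = pymqi.connect(queue_manager, channel, conn_info)
try:
    queue = pymqi.Queue(qmgr, "PARTNER.DASHBOARD.FEED")
    # Publish one event; a real feed would stream these as they occur.
    queue.put(b'{"customer_id": "42", "event": "order_placed"}')
    queue.close()
finally:
    qmgr.disconnect()
```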
These are not trivial questions, given the way the market is going. For example, e-business applications are generating more data than any company has ever had to deal with. What's more, analysts are predicting explosive data growth to come: Meta Group has predicted 10 000% growth in data volumes - a hundredfold increase. If your business is managing 3TB today, you need a data management programme that can cope with 300TB tomorrow.
There is also the increasing need to interrogate data in real time, or near real time, to give management the most current view of business and customer activity, and to publish current, relevant data to business partners. With daily data loads of up to 3TB and growing, businesses could find themselves overwhelmed before they've begun.
In considering these requirements, there will also be the need to integrate third-party data from business partners, as the drive towards collaborative commerce indicates. This will take ETL tools into the area of enterprise integration, maintaining real-time links between sources at different companies in an extended supply chain.
If the projected increase in data volume is even partially correct, a well-integrated data management programme will provide a major competitive advantage, and its absence will put the business at severe risk.