Cornerstones of data integration performance

Hardware and networks are becoming faster all the time, but that doesn't guarantee high performance in an IT environment.
By Charl Barnard, GM of business intelligence at Knowledge Integration Dynamics
Johannesburg, 28 Nov 2005

In a recent survey at The Data Warehousing Institute's Winter 2005 Conference, 55% of those surveyed said they expected their company's data volumes to rise more than 25% in the next 18 months.

As data volumes explode, user requirements and operating environments constantly change, and acceptable load windows shrink, data integration professionals must make the most of what they have by improving performance. And ultimately, the ability to manage every aspect of performance reduces the risk of having systems that don't perform as well as the business needs them to.

Some 67% of those surveyed also said performance and scalability were their top priorities when selecting a data integration platform - even more critical than price, a unified platform, a codeless solution or ease of use.

Real-world enterprise data integration performance is more complex than a measurement of raw processing power, and is made up of four basic cornerstones:

* Throughput - the rate at which rows or bytes of data can be processed.
* Scalability - the ability to handle increasingly large and complex data integration scenarios.
* Availability - the resilience of the environment.
* Performance manageability - the ability to manage and monitor across an often distributed, heterogeneous environment, respond to change and maintain performance levels; arguably the most important of the four.

Fresher information, faster

High throughput matters because it results in fresher data for the business, faster response to customers and increased operational efficiency.

But hardware is only part of the throughput equation. Multithreading and data partitioning, which enable companies to break up complex processing tasks and spread them across hardware resources, also speed response times. Another option is to capture only changed data and process it in real-time. This improves the freshness of data while focusing processing capacity only on data that has changed.
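The partitioning idea above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory list of rows and a hypothetical transform() function; real data integration tools partition at far larger scale across processes and machines.

```python
# Sketch: split rows into partitions and process them on parallel threads.
from concurrent.futures import ThreadPoolExecutor

def transform(row):
    # Placeholder transformation: uppercase a name field (illustrative only).
    return {**row, "name": row["name"].upper()}

def partition(rows, n):
    """Split rows into n roughly equal partitions by striding."""
    return [rows[i::n] for i in range(n)]

def parallel_load(rows, workers=4):
    """Process each partition on its own thread, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda part: [transform(r) for r in part],
                           partition(rows, workers))
    return [row for part in results for row in part]

rows = [{"name": "alice"}, {"name": "bob"}, {"name": "carol"}]
print(parallel_load(rows))
```

In practice the worker count would be tuned to the available hardware, which is exactly the "spread across hardware resources" point the article makes.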

Predictable growth

Software with good scalability reduces risk, allowing precise estimation of project windows, flexible configuration and optimal resource utilisation. Scalability delivers a solid foundation for long-term performance.

To optimise an organisation's scalability, map out expectations for growth of data volumes and requirements for different types of processing, such as batch or real-time. Consider adding capabilities that enhance scalability, such as server grid/MPP architectures and 64-bit support. These enable organisations to position themselves for growth without requiring extensive purchases of hardware that would be under-utilised for much of the time.

Organisational priorities

If a resource isn't available in the wake of a component failure, then it has no throughput, scalability or manageability. But not all systems are equally important. For some applications and industries, system outages can cost lives. But more often, there is a grey area, and a certain percentage of downtime is acceptable.

Quantify the costs associated with downtime for each system or application. Sometimes, an outage means users are unable to do their jobs, reducing productivity. It can cost revenue if customers are forced (or annoyed enough) to take their business elsewhere, or if companies fail to meet service level agreements. Compare these costs to the cost of providing increased availability to see what makes the most sense.
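The cost comparison above is simple arithmetic. Here is a back-of-envelope sketch; all the figures (downtime hours, cost per hour, upgrade cost) are hypothetical assumptions for illustration only.

```python
# Sketch: compare the annual cost of downtime against the cost of
# buying higher availability. All numbers are assumed, not real data.
def annual_downtime_cost(hours_down_per_year, cost_per_hour):
    return hours_down_per_year * cost_per_hour

current  = annual_downtime_cost(hours_down_per_year=20, cost_per_hour=5000)
improved = annual_downtime_cost(hours_down_per_year=2,  cost_per_hour=5000)
upgrade_cost = 60000  # assumed annual cost of clustering/fail-over capacity

savings = current - improved
print(f"Annual savings: {savings}, upgrade cost: {upgrade_cost}")
print("Upgrade pays for itself" if savings > upgrade_cost
      else "Upgrade not justified")
```

With these assumed figures the upgrade saves 90,000 a year against a 60,000 cost, so it makes sense; with different inputs the same comparison could go the other way.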

Building high availability into a system requires designing it so there are no single points of failure, to ensure resiliency. In addition, implement appropriate fail-over mechanisms and backup capabilities to support the availability requirements for each application.
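A fail-over mechanism of the kind described can be sketched as a simple try-the-primary, fall-back-to-the-standby pattern. The connection functions below are hypothetical stand-ins; a real system would also handle timeouts, retries and state replication.

```python
# Sketch: transparent fail-over from a primary resource to a standby.
def connect_primary():
    raise ConnectionError("primary unreachable")  # simulate an outage

def connect_standby():
    return "standby-connection"

def connect_with_failover(primary, standby):
    """Try the primary first; fall back to the standby on failure."""
    try:
        return primary()
    except ConnectionError:
        return standby()

print(connect_with_failover(connect_primary, connect_standby))
```

The point of the pattern is that callers never see the component failure: the standby absorbs it, which is what eliminating a single point of failure buys.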

Manageability

A company can't consistently achieve good performance if it has poor insight into what's going on and little ability to manage or monitor across the environment. After all, if an element within the system can't be fully managed, how can anyone be sure it is delivering the performance the business needs?

It pays to invest in scheduling tools, which help ensure warehouses and ODSs remain up to date and accurate. Monitoring capabilities are also essential - companies must know about any potential bottlenecks or outages long before users call to complain.
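The monitoring idea above - catching a bottleneck before users complain - amounts to comparing observed behaviour against a baseline. This is a toy sketch; the job names, baselines and tolerance are invented for illustration.

```python
# Sketch: flag any job whose observed run time (seconds) exceeds its
# baseline by more than a tolerance factor. All figures are assumed.
BASELINE = {"nightly_load": 3600, "cdc_apply": 300}

def check_bottlenecks(observed, tolerance=1.5):
    """Return the jobs running more than `tolerance` times their baseline."""
    return [job for job, secs in observed.items()
            if secs > BASELINE.get(job, float("inf")) * tolerance]

observed = {"nightly_load": 7200, "cdc_apply": 280}
print(check_bottlenecks(observed))  # nightly_load is 2x its baseline
```

A real monitoring tool would feed alerts like this to operators automatically, so the warehouse team learns about a slipping load window hours before the business does.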

Even a system that meets the firm's needs perfectly today will need some changes in the future, so the ability to manage change is key to the ongoing value of the system. Organisations should be able to spell out the scope of a change and implement it in a rapid and predictable way. To do this, give developers the tools they need to work effectively; visual development tools are typically the easiest to manage.

Also, the separation of transformation logic and physical execution eliminates any need to make modifications as a result of environment changes, freeing developers from a common chore. This is useful for migrating and deploying between environments or providing flexibility in heterogeneous environments so that processing can be done wherever hardware resources are available and without the need for manual intervention.
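Separating transformation logic from physical execution can be pictured as one declarative mapping that runs unchanged on different engines. The mapping format and engine functions below are hypothetical illustrations of the principle, not any vendor's actual design.

```python
# Sketch: the same declarative mapping executes on different "engines",
# so an environment change needs no change to the logic itself.
MAPPING = [("name", str.upper), ("country", str.strip)]

def run_local(mapping, rows):
    """Execute the mapping in-process (e.g. a development environment)."""
    return [{col: fn(row[col]) for col, fn in mapping} for row in rows]

def run_remote(mapping, rows):
    """Stand-in for pushing the same mapping to a server grid."""
    return run_local(mapping, rows)  # same logic, different hardware

rows = [{"name": "ann", "country": " za "}]
assert run_local(MAPPING, rows) == run_remote(MAPPING, rows)
print(run_local(MAPPING, rows))
```

Because the logic is data, not code tied to one machine, migrating it between environments is a deployment step rather than a rewrite - the "common chore" the article says developers are freed from.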

The biggest payoff of performance manageability becomes apparent at runtime. Manageability enables companies to process data in batch, on-demand and real-time modes. Organisations can fine-tune source and target system interaction and handle process requirements such as profiling and cleansing. Better manageability also improves coordination between hardware resources, helping to eliminate bottlenecks and other performance problems.

Managing performance

Getting from potential to actual performance requires better throughput, scalability, availability and manageability than most businesses have built into their environments. These four cornerstones need to be considered at the outset and reconsidered frequently during the operating life of a system.

Organisations that focus on designing their systems to achieve all four will ultimately realise the benefits of greater performance for increased competitive advantage: cost savings, time savings and greater productivity.