System health - Throwing money at problems

Issued by Barnstone

Johannesburg, 11 Feb 2009

As central banks around the globe announced that they were pumping billions of dollars into the global financial system, analysts had very different views on the question: “Will throwing money at the problem really make it go away?” In the world of business applications we can ask a similar question: “Will throwing hardware at a 'sick' application really solve performance problems?” The answer is: probably not.

As most IT departments are under pressure to manage cost, it is imperative that approved expenditure delivers measurable results. Tackling a problem is very difficult when nobody is quite clear on what the problem is.

Introducing additional hardware into an environment without a clear understanding of its performance bottlenecks will often result in only small performance improvements and limited scalability. Effective tuning is usually a series of incremental successes. Only by removing one performance bottleneck will it be possible to identify the next.
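The point about bottlenecks can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which overall throughput is set by the slowest tier; the tier names and capacity figures are invented for illustration and do not describe any real system.

```python
# Hypothetical capacities (requests per second) for each tier.
# All numbers are made up purely for illustration.
capacity = {"web": 900, "application": 400, "database": 250, "storage": 600}


def bottleneck(caps):
    """Return the (tier, capacity) pair that limits overall throughput."""
    tier = min(caps, key=caps.get)
    return tier, caps[tier]


def throughput(caps):
    """In this simplified model the slowest tier sets system throughput."""
    return min(caps.values())


# Throwing hardware at the wrong tier: doubling web capacity changes nothing,
# because the database is still the limiting tier.
upgraded_web = dict(capacity, web=capacity["web"] * 2)

# Relieving the real bottleneck helps, and only then does the next
# bottleneck (the application tier) become visible.
tuned_db = dict(capacity, database=500)

for label, caps in [("baseline", capacity),
                    ("more web hardware", upgraded_web),
                    ("database tuned", tuned_db)]:
    tier, _ = bottleneck(caps)
    print(f"{label}: throughput={throughput(caps)} req/s, bottleneck={tier}")
```

In this toy model, spending on the web tier leaves throughput unchanged, while addressing the database lifts it to the limit of the next-slowest tier. Real systems are rarely this tidy, but the order of work is the same: measure, remove one bottleneck, measure again.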

Before an informed decision regarding hardware upgrades can be made, the system should be current, healthy and performing optimally. In today's fast-growing and complex environments, a set of parameters that was optimally configured a month ago may again require adjustment, and updated releases of software components may be available.

Although it is not a substitute for ongoing maintenance and tuning, a technical review by a service provider that specialises in system health checks can be very valuable. Such a review will typically include a comparison to best practice, and the deliverable is a report describing the status of the environment together with a set of recommendations.

There is risk associated with making adjustments to an environment, so the implementation should be planned and managed well. Where possible, testing should take place on non-production systems before the changes are migrated to the production environment. Some changes will require production downtime and should be scheduled so that they do not interfere with business activity. Back-out plans should be realistic and well documented.

These are typical characteristics of a “healthy” environment:

* Patch levels - Operating system, database and application patch levels are up-to-date to ensure the latest error corrections and improvements are implemented.
* Performance statistics and error logs - Performance and error information is collected and retained to assist in the identification of problem areas (O/S, database and application).
* Architecture - The server landscape and network are correctly configured.
* Administration/Tuning - Operating system, database and application parameters are correctly configured and adjusted regularly as the environment grows and changes.
* Routine tasks - Cleanup/housekeeping processes are correctly scheduled.
* Activity scheduling - Non-critical processing is scheduled outside peak online periods and load balancing is configured correctly.
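
The characteristics above could be tracked as a simple checklist that feeds a status report. The following is a minimal sketch under stated assumptions: the `Check` class, the `summarise` and `is_healthy` helpers, and the sample results are all hypothetical, and a real health check would derive each result from actual system data rather than hard-coded values.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One health-check area with its outcome (hypothetical structure)."""
    area: str
    passed: bool
    note: str = ""


def summarise(checks):
    """Return report lines, with failed areas listed first."""
    # False sorts before True, and sorted() is stable, so the original
    # order is preserved within the failed and passed groups.
    ordered = sorted(checks, key=lambda c: c.passed)
    lines = []
    for c in ordered:
        status = "OK  " if c.passed else "FAIL"
        detail = f": {c.note}" if c.note else ""
        lines.append(f"{status} {c.area}{detail}")
    return lines


def is_healthy(checks):
    """An environment counts as healthy only if every area passes."""
    return all(c.passed for c in checks)


# Sample results, invented for illustration only.
checks = [
    Check("Patch levels", True),
    Check("Performance statistics and error logs", True),
    Check("Architecture", True),
    Check("Administration/Tuning", False, "parameters not reviewed recently"),
    Check("Routine tasks", True),
    Check("Activity scheduling", False, "batch jobs overlap peak online period"),
]

for line in summarise(checks):
    print(line)
print("Healthy:", is_healthy(checks))
```

Listing the failed areas first keeps the report focused on the recommendations that need attention before any hardware decision is made.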

If there are still performance or stability-related problems in a well-configured and “healthy” environment, at least the cause of the problems will not be concealed behind a cloud of bad tuning. Once the problem is understood, it can be correctly targeted, and throwing money at the right place will make the problem go away.