Background
Certain South African companies have in the past attempted to establish their own unique, dedicated disaster recovery (DR) facilities. In many cases, these sites and facilities have fallen into disuse and the disaster recovery programme has not been sustained. The financial impact has run into millions of rands.
CSA knows of a significant number of these recovery facilities that have either failed outright or failed to meet the primary objective of effective DR.
Some South African companies have established and run successful recovery sites. In many cases, however, these sites are adjacent to the production sites they protect. They provide a level of resilience to technical malfunction, but are not true disaster recovery sites.
The establishment of such sites was often driven by convenience, a factor that invariably compromises the principles of effective DR. The most important of these principles is that the DR site must be a suitable distance from the production site, i.e. outside the CBD and served by different Telkom infrastructure and a different power grid.
Typically in Gauteng, a distance of more than 10 kilometres is deemed to be acceptable. It is preferable to have an inferior standby site situated remotely from the primary site than a state-of-the-art facility nearby.
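As a rough illustration of the distance guideline, the following Python sketch estimates the straight-line distance between a production site and a candidate standby site and compares it with the 10 kilometre figure. The coordinates, the function names and the choice of the haversine formula are assumptions made for this example only. Straight-line distance also says nothing about whether two sites share Telkom infrastructure or a power grid, which would have to be confirmed separately.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius; an approximation
MIN_SEPARATION_KM = 10.0   # Gauteng guideline quoted in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_far_enough(production, standby, minimum_km=MIN_SEPARATION_KM):
    """True if the standby site is more than minimum_km from the production site."""
    return haversine_km(*production, *standby) > minimum_km


if __name__ == "__main__":
    # Hypothetical (latitude, longitude) pairs, for illustration only.
    production_site = (-26.2041, 28.0473)
    adjacent_site = (-26.2050, 28.0480)
    remote_site = (-26.0000, 28.1000)
    for label, site in (("adjacent", adjacent_site), ("remote", remote_site)):
        km = haversine_km(*production_site, *site)
        print(f"{label}: {km:.1f} km, acceptable: {is_far_enough(production_site, site)}")
```

A check of this kind is only a first filter; a site that passes it may still be unsuitable if it depends on the same exchange or substation as the production site.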
A further motivation for in-house solutions is the desire for dedicated facilities and infrastructure, and many organisations interpret the need for dedicated DR as meaning in-house facilities are mandatory.
However, a commercial standby site can provide dedicated services just as easily as syndicated ones. Generally, this is cheaper than an in-house solution, as there are always some facilities that can be shared and no organisation requires a 100% dedicated solution.
This paper seeks to determine which factors may have caused the in-house recovery sites to fail, in an effort to assist organisations to avoid repeating errors made by others in the past.
Failed recovery sites
In most cases, in-house recovery sites were closed due to financial considerations, where the costs of maintaining the disaster site became prohibitive either after a downturn in business or when upgrades were required.
In some cases, the production processing capability was upgraded to the point where the backup equipment could never cope with the production workload in a disaster recovery situation. This state of affairs persisted for an extended period before it was officially acknowledged and acted upon.
Other companies made the error of offloading production or development work onto the standby configuration. Testing then became so difficult that it never took place, critically compromising the ability of the organisations to recover. In other cases, non-critical activity became critical and could not be moved off the "backup" machine.
Where different configurations are used, incompatibilities have on occasion developed between the production and backup environments. Software incompatibilities also arise where an older hardware platform is utilised for recovery: the new platform undergoes incremental software upgrades, and the upgraded software is eventually unable to run on the dated recovery platform.
A common problem affecting in-house recovery sites is maintaining the discipline required to keep the recovery process working. Staff members are, in most cases, judged on normal day-to-day delivery and performance, not on their occasional ability to perform disaster recovery tasks. Critical functions such as change control and the testing of recovery plans are often neglected, and the ability to recover in a real disaster is then compromised.
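The failure modes described above (a standby configuration outgrown by production, one loaded with other work, or one left behind on older software) lend themselves to a routine automated comparison. The Python sketch below is one hypothetical way to express such a check. The `SiteProfile` fields, the capacity units and the version tuples are all assumptions made for illustration and do not describe any particular organisation's environment.

```python
from dataclasses import dataclass, field


@dataclass
class SiteProfile:
    """Hypothetical summary of one site; every field is illustrative."""
    capacity_units: float                                   # any consistent capacity measure
    software_versions: dict = field(default_factory=dict)   # name -> version tuple
    other_workloads: list = field(default_factory=list)     # non-DR work hosted on the site


def readiness_issues(production, standby):
    """Return a list of problems found; an empty list means none were detected."""
    issues = []
    # Assumption: the standby must carry the full production workload.
    if standby.capacity_units < production.capacity_units:
        issues.append("standby capacity is below the production workload")
    if standby.other_workloads:
        issues.append("standby is hosting other work: "
                      + ", ".join(standby.other_workloads))
    for name, prod_version in sorted(production.software_versions.items()):
        standby_version = standby.software_versions.get(name)
        if standby_version is None:
            issues.append(f"{name} is not installed on the standby")
        elif standby_version < prod_version:
            issues.append(f"{name} on the standby is older than production")
    return issues


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    production = SiteProfile(100.0, {"database": (2, 1)})
    standby = SiteProfile(80.0, {"database": (1, 9)}, ["development"])
    for issue in readiness_issues(production, standby):
        print(issue)
```

Run as part of change control, a comparison like this would bring drift to light at the time of each production upgrade rather than during a recovery test or a real disaster.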
Many organisations have subsequently closed their recovery sites and subscribed to a commercial service after the realisation that the recovery site was too close to the production facility and did not constitute a real solution.
Several South African companies continue to run their production and backup sites adjacent to one another because the cost of distancing the sites is considered prohibitive. This is the case, for example, where mirroring is required and the Telkom costs would become exorbitant.
It must be stressed that a recovery site in close proximity to the production site provides no protection whatsoever against large-scale disasters, civil unrest, denial of access, or telecommunications failure. We have also seen many of these companies storing production backup tapes at the adjacent recovery site, creating a false sense of security. One could argue that such a company would be better off with no backup site and its information backups stored securely at a remote site.
In a further instance, management approved budget for an in-house technical recovery solution after much persuasion from the information technology division. However, after several years, growing obsolescence rendered the solution unworkable and the technical staff were reluctant to inform management, who remained under the false impression that the business was adequately protected.
The situation is often further exacerbated by the fact that the in-house solution typically caters for IT recovery only, and little or no thought has been given to wider business recovery and continuity. Invariably, specialist business areas such as call centres, trading environments, and printing facilities have not been provided for. The business will not recover after a major catastrophe even if the central IT function does.
A company's internal second site is also at risk from disgruntled employees, strikes and other industrial action. Moreover, continuous upgrades to the production site in later years can lead to budget constraints, resulting in the recovery site lagging behind and becoming ineffective.
Conclusion
Creating and maintaining an in-house recovery site is very expensive and carries a high level of risk. Only a handful of South African companies continue to run their own effective disaster recovery sites.
Most organisations, both locally and internationally, prefer to utilise a recovery facility provided by a commercial vendor. The value of this approach was borne out in the aftermath of the 11 September 2001 attacks in New York.
Within hours, over a hundred companies had switched their production IT facilities to one of those provided by the commercial vendors. Many of these were dedicated facilities, which meant the organisation was protected even in an extreme situation where multiple organisations invoked the standby services.
Recommendations
Companies should carefully determine their overall business continuity requirements using qualified, external consultants who can provide an objective view. This should cover not only the IT recovery requirements, but also the overall business recovery requirement.
If an in-house solution is utilised, a three-year plan should be agreed on, with all costs shown and provision made for upgrades in the future. This solution should have dedicated staff with the required training and experience.
Furthermore, the solution should be subjected to independent audit on a regular basis if staffed in-house.
The recommended (and normally less expensive) option is to utilise a custom solution designed and provided by a specialist business continuity company. If required, such a company may provide dedicated, online real-time solutions on a commercial basis.
Major cost savings may be realised by the same company also providing additional services on a shared or syndicated basis, enabling recovery of all critical IT systems and business processes, as well as validation of up-to-date business continuity plans with regular tests.
Author: Jorgen Nielsen, Consulting Services Consultant.

