When disaster strikes

Johannesburg, 26 Jun 2006

Here's a sobering thought for those thinking their disaster recovery and business continuity (DR/BC) plans could wait a little longer: a Gartner survey reveals two out of five companies that experience a catastrophic event or prolonged outage never resume operations.

Of those that do, one in three goes out of business within two years as a direct result of that outage or event. The bottom line: if hit by a major disaster, there's a 60% chance the company will be out of business within two years.
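
The 60% figure follows from combining the two survey fractions. A minimal sketch of that arithmetic in Python, using only the numbers quoted above (the variable names are illustrative):

```python
from fractions import Fraction

# Figures quoted from the Gartner survey above.
never_resume = Fraction(2, 5)             # companies that never resume operations
fail_later_if_resumed = Fraction(1, 3)    # of those that resume, share that fold within two years

# Share of all affected companies that resume but still fold within two years.
resume_then_fail = (1 - never_resume) * fail_later_if_resumed

# Total share out of business within two years of the disaster.
out_of_business = never_resume + resume_then_fail

print(f"{float(out_of_business):.0%}")
```

Two fifths never reopen, and a third of the remaining three fifths (one fifth of the total) close later, giving three fifths overall.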

US-based auditing firm McGladrey and Pullen estimates 50% of all businesses that experience a critical system outage of 10 days or more will never recover. In those 10 days, the average business will lose 2% to 3% of its annual revenue.

That's not to mention the damage to reputation a disaster can wreak on a company, says Colin Erasmus, technology security manager at Microsoft SA.

"If your database gets hacked and stolen, how do you recover? What happens if you lose the personal details of a million of your customers? In California, you are obliged by law to tell those customers you lost their details. Pending legislation in SA has a similar notification clause.

"In this case, how do you recover at a customer and a reputation level? Suddenly, disaster recovery means a lot more than just fixing infrastructure," says Erasmus.

For many businesses the scariest part of unplanned downtime is lost customers. Regaining lost customers costs 14 times the initial investment made in gaining them, and that's assuming they can be won back at all.

Business as usual

So what constitutes a disaster? In simple terms, a disaster is any unplanned event that disrupts "business as usual" - whether an earthquake, a terrorist attack or malfunctioning software caused by a computer virus. In fact, 80% of unplanned downtime is caused by software or human error.

It stands to reason, then, that disaster recovery is the process by which business is resumed after a disruptive event. The modern trend is towards business continuity, which is one step beyond disaster recovery and suggests a more comprehensive approach to making sure the business can be kept going after a disaster until normal facilities are restored.

The two terms are often married under the acronym DR/BC - but this does not imply the two terms are interchangeable.

Far too many IT departments think DR and BC are the same thing, and end up taking a purely technology view, says Chamu M'Kombe, business continuity and resiliency services business unit manager for IBM SA.

That's a problem, since recovering from a disaster goes far beyond IT, and touches just about every part of the business. "It is vital the disaster recovery plan is based on a sound business continuity plan that has taken into account the reality of the business requirements for recovery. If the disaster recovery plan cannot meet the requirements of the business units, it is of no value," says M'Kombe.

Business continuity really belongs in the boardroom, says Sheldon Hand, Symantec's storage specialist. "Today, practically every business process relies on IT - effectively harnessing data that is stored somewhere. We're storing more information than ever before, we have to recover it faster, and have to prove it is in the original format."

Ostrich approach

A big obstacle in SA is that many companies are taking the ostrich approach to DR/BC: either pretending they don't need it or having totally inadequate plans in place. Allen Smith, MD of Continuity SA, says that apart from the financial institutions and telecoms companies, the hard fact is there are businesses that don't have a clue about DR/BC.

"Some companies still do not take disaster recovery seriously and may see it as a grudge expense. Others use old tapes or old technology, which could mean their backups are practically useless in any case," says Smith.

Smith estimates 80% to 90% of SA's listed companies have shown an awareness of proper DR strategies. However, many are still not doing a lot about implementation. Only about half of listed companies have actually implemented proper strategies.

Raul del Fabbro, storage division manager at distributor Drive Control, is more optimistic. He feels awareness of the need for DR/BC is definitely on the rise. Still, a common assumption remains that disaster recovery is only available to the biggest companies with the deepest pockets.

Stortech MD Tim Knowles says it's not enough just to make sure the business and IT are aligned in all aspects, from planning to actual recovery. The plan has to be communicated to all levels of the company, so everybody knows what to do when the proverbial paw-paw is rapidly approaching the fan.

Three steps to a good DR/BC plan

1. Know thyself, says Sheldon Hand, storage specialist at Symantec. "See what the company is running - messaging, ERP, CRM - and list them all. Then decide what the impact on the business would be (financial, reputation and so on) if those were unavailable. Not all applications are equal! Focus on the most critical ones."
2. Draw up an enterprise-wide approach. Chamu M'Kombe, a business unit manager at IBM SA, says the plan should include people, processes and IT, and should be driven by senior people in the company.
3. Communicate the plan, and make sure everyone in the company understands it. Frans Nijeboer, a national practice manager at Dimension Data, says it's no use having disaster recovery if nobody knows the game plan when the system goes down. "The plan needs to be drilled into people, with specific teams and responsibilities. You could be preparing for a battle that never happens, but you need to be ready."
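
The first step, listing what the company runs and weighing the impact of losing each system, can be sketched as a simple ranking exercise. The Python below is an illustration only: the `Application` class, the 1-to-5 scoring scale and the scores themselves are assumptions made for the example, not anything Hand prescribes.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    financial_impact: int   # hypothetical score, 1 (low) to 5 (severe)
    reputation_impact: int  # hypothetical score, 1 (low) to 5 (severe)

    @property
    def impact(self) -> int:
        # Simple unweighted sum; a real plan would agree weights with the business units.
        return self.financial_impact + self.reputation_impact


def rank_by_impact(apps):
    """Return applications ordered from most to least critical."""
    return sorted(apps, key=lambda app: app.impact, reverse=True)


# Illustrative inventory only; the scores are made up.
inventory = [
    Application("messaging", 3, 4),
    Application("ERP", 5, 3),
    Application("CRM", 4, 2),
]

for app in rank_by_impact(inventory):
    print(app.name, app.impact)
```

Even a crude list like this makes the point that not all applications are equal, and gives the business and IT a shared starting point for deciding what is recovered first.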

"After that, the tough part of disaster recovery is not getting the data to a remote site or onto a backup. It's recovering that data in time to get the business back before customers even suspect there's a problem," says Knowles.

"One area of weakness is the gap between what the business units think will happen and what IT thinks will happen. Business units might think e-mail will be back up in two hours, but IT has to spend another day recovering data. If your plans are in place, that shouldn't happen," he says.

Frans Nijeboer, national practice manager for data centre and storage solutions at Dimension Data, says rather than hypothesise about potential disasters individually, a DR/BC plan should address two major factors.

How quickly, how much

The first is what is known in the industry as the recovery time objective: how quickly must lost data be recovered after a disaster? Some systems might not need to be recovered immediately, while others must be brought back online as soon as possible.

Popular myths about data integrity

The Companies Act and the King II Report stipulate clearly that organisations are both responsible and accountable for all their data.
Paul Mullon, information governance executive at Metrofile, says few organisations are anywhere near prepared to continue operations in the face of major disasters. They think they have recovery plans in place, but the data itself is often unusable. He cites several myths around data integrity:
1. I'm okay if I make backups.
That's great, but what about the integrity of the data? Can it be read? If it can, is it what is needed? Is it properly recoverable? "You have to be sure your data is tested and audited," says Mullon.
2. I'm okay because I keep backups in a separate safe.
Question is, is that safe fire and flood resistant? Is the storage process auditable - in other words, can it be proved that backups were stored there? "It's a corporate governance risk you can ill afford in today's regulatory environment," says Mullon.
3. I'm okay if I keep backups at home, because that's offsite.
Mullon rolls his eyes. "One of the ways storage media gets damaged is bad transportation and storage. If you value your data, you won't put it in the back of your car."
4. I'm okay if I do regular backups.
Not so: Media has a finite life. Have it tested regularly.
Mullon says vast quantities of information captured electronically in the past decade or two have been lost. This includes records of the Truth and Reconciliation Commission, and information from past space probes - which is irreplaceable.
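
One common way to test whether a backup can actually be read back intact is to restore it and compare checksums against the original. The sketch below uses Python's standard `hashlib` module; the function names are hypothetical, and it assumes both the source file and a restored copy are available on disk. It is not a method Mullon describes.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches_source(source, restored):
    """True if the restored copy is byte-for-byte identical to the source."""
    return sha256_of(source) == sha256_of(restored)
```

A matching checksum only shows the bytes survived. Whether the restored data is the data the business needs is still a question for a full recovery test.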

A financial institution with a trading desk, for example, may lose millions for every minute that the traders cannot trade, and so time to recovery must be instantaneous.

Once these priorities are established, the business needs to decide how much data it can afford to lose. This is typically known as the recovery point objective, and must be quantified for the different areas of the business. To what point do the systems and information need to be recovered? Can a loss of time or a loss of data be incurred? If the last transaction was lost, can it be recovered?
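
Both objectives can be checked against an existing backup routine. The Python sketch below is a simplification under stated assumptions: the function name is illustrative, the worst-case data loss is taken to be one full backup interval, and downtime is taken to be the restore time alone (in practice, detection and decision time add to it). The figures are hypothetical.

```python
from datetime import timedelta


def meets_objectives(backup_interval, restore_duration, rpo, rto):
    """Check a backup schedule against recovery objectives.

    Worst-case data loss is one full backup interval (a failure just before
    the next backup runs); worst-case downtime is the restore duration.
    """
    return {
        "rpo_met": backup_interval <= rpo,
        "rto_met": restore_duration <= rto,
    }


# Hypothetical figures for an e-mail system backed up nightly.
result = meets_objectives(
    backup_interval=timedelta(hours=24),
    restore_duration=timedelta(hours=26),
    rpo=timedelta(hours=24),   # business accepts losing up to a day of mail
    rto=timedelta(hours=2),    # business expects mail back within two hours
)
print(result)
```

In this made-up case the recovery point objective is met but the recovery time objective is not, which is the kind of gap between business expectations and IT reality that planning is meant to expose.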

BMC Software's business development manager Arjen Wiersma suggests if a bomb drops on Johannesburg, the first thing banks will be scuttling for is their balance sheets and financial records, not their ATMs.

"One has to weigh up what is the business value of the data and what is the business risk of not recovering it," says Wiersma. "If a process stops, will it impact on promises made? And if so, is it cheaper to spend R20 million on disaster recovery, or to take a R1 million SLA penalty?"

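Wiersma's question can be framed as a rough comparison of costs. The Python sketch below is deliberately narrow: the function name and the `expected_incidents` parameter are assumptions introduced for illustration, and it ignores lost customers and reputational damage, which the article argues can outweigh any contractual penalty.

```python
def cheaper_option(dr_cost, penalty_per_incident, expected_incidents):
    """Compare DR spend with the SLA penalties expected over the same period.

    expected_incidents is a hypothetical planning estimate, not a known figure.
    """
    expected_penalty = penalty_per_incident * expected_incidents
    if dr_cost < expected_penalty:
        return "invest in DR"
    return "accept the penalty"


# Wiersma's figures: R20 million on disaster recovery versus a R1 million SLA penalty.
print(cheaper_option(20_000_000, 1_000_000, expected_incidents=1))
```

On penalties alone, one expected incident does not justify the R20 million spend. The answer changes once the estimate includes the wider business cost of not recovering.
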
Although restoring key operations is vital, other services cannot be forgotten, including e-mail, voice mail, and access to the intranet or the Internet. As Nijeboer says, these days if the e-mail stops working, the business stops working.

"The big mistake I see companies making is that they simply don't know what they really need to recover to keep their businesses running," says Wiersma. "They must identify what kind of solution their budget will allow."

Shift thinking

Wiersma finds it bizarre that companies religiously back up the exact same data set billions of times onto tape.

"We back up the same stuff so many times it's ridiculous. Some companies are unknowingly backing up their employees' illicit porn stashes, and physically taking them to a safe environment. This could be a problem. And why back up the operating system? XP is XP. You should only back up what you really need."
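
One widely used way to avoid storing the same data over and over is to identify duplicates by content hash before backing up. The Python sketch below illustrates the idea with made-up file names and contents; it is an assumption about how such a filter might look, not a technique the article attributes to Wiersma or BMC.

```python
import hashlib


def unique_blobs(blobs):
    """Yield (name, data) only for content that has not been seen before."""
    seen = set()
    for name, data in blobs:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield name, data


# Hypothetical files; two of them hold identical content under different names.
files = [
    ("report_v1.doc", b"quarterly figures"),
    ("report_copy.doc", b"quarterly figures"),
    ("invoice.pdf", b"invoice 1042"),
]

to_back_up = [name for name, _ in unique_blobs(files)]
print(to_back_up)
```

Deduplication only removes repeated copies. Deciding which data is worth backing up at all remains a business decision.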

The downtime bugbears

Some events are planned and may include the following:
* Application database maintenance
* Data migration
* Hardware upgrades (processor, storage)
* Operating system or DBMS maintenance
* Disaster recovery preparation
Others come totally out of left field:
* Site disasters (floods, power outages, storms, fire)
* Hardware failures (disk, CPU, network)
* Operating system failures
* Operation errors
* Improper data feeds
* User errors or deliberate data corruption
* Application software errors
* Application performance degradation
* Fallout from application change migrations

Another question that needs to be asked is: Who will conduct the recovery? How the recovery will take place is probably the question most open to debate - or most dependent on how deep the company's pockets are.

Helen Vermij, product manager at Storgate, says it's possible to build a cost-effective, yet perfectly suitable disaster recovery strategy in a heartbeat. "It just requires a shift in your thinking process," she says.

Vermij believes most companies can get by implementing a hardware platform that is similar to the one being used in their primary data centre.

Today, virtualisation is the major technology trend affecting disaster recovery. By virtualising servers and storage, the dependency on the underlying hardware is removed. This is changing the shape of disaster recovery, making it easier to recover than before. Instead of having 50 servers in a disaster recovery environment, a virtualised environment can get the business up and running on one server.

Just don't think that when drawing up DR/BC plans, cutting and pasting someone else's will suffice. As Del Fabbro points out, there is no "one size fits all" - every company has unique needs, and using someone else's specifications could spell disaster.

It's crucial, says Del Fabbro, to test everything regularly. Tests must be done at least once a year to ensure the backup procedures are functional, all technology is still compatible and employees are still familiar with the proper procedures.

Knowles says companies should view disaster recovery testing as a quality control exercise, and not a final exam. "If you find bugs, it's good. Don't only test what you know is going to work. If you fail, it's good - you have a problem that can be fixed."

It is imperative to have data recovery processes in place in case disaster strikes, but testing those processes matters too. If recent headlines have taught us anything, it is that there is no such thing as "too safe" when it comes to protecting data. The business's survival could depend on it.

In a business era where data is king, many companies are still treating disaster recovery and business continuity like a beggar - and it could come back to bite them.