When disaster surges

What do rising prices and uncertain power supplies mean for business continuity and disaster recovery? More factors for companies to juggle, that's what.
By Paul Furber, ITWeb contributor
Johannesburg, 23 Jun 2008

What effect has our spate of blackouts had on businesses? Business continuity (BC) and disaster recovery (DR) experts see effects ranging from greater awareness to tighter budgets to meltdown (sometimes literally).

Despite the well-known consequences of poor business continuity and disaster recovery planning, many companies continue to wing it. What's the way forward?

Present at a roundtable to discuss these questions and some possible answers were: Mark Ogden, senior manager of technology and risk services at Ernst & Young; Christelle Larkins, area manager for MGE Office Protection Systems at Eaton; Nick Keene, country manager for Citrix South Africa; Chamu M'Kombe, business unit manager for BCRS at IBM Global Services; Petrus Human, head of professional services at Attix 5; Greg Montjoie, GM of hosting at Internet Solutions; Ingo Tuschardt, MD of Quintica SSA; Vinny La Bella, MD of Gallium Professional Services; Andre Hurter, manager of storage software products at Drive Control Corporation; Brenton Halsted, CTO of data centre and storage solutions at Dimension Data; and Bruce Nicholson, CIO of the i5 Group.

How is load-shedding affecting company budgets for business continuity and disaster recovery?

Greg Montjoie, Internet Solutions: It's not only budgets, but running costs as well. We have provision for outages, but the severe impact for us is that running costs at our facilities are now seven or eight times what they were. A good example is a generator set that typically only needs servicing every four to six months. Now we're looking at maintaining it once or twice a month. So not only fuel and oil, but also time and labour costs have gone off the scale.

Mark Ogden, Ernst & Young: And the likelihood of a generator going down is high. Ours did! It's under severe stress: we're not a small company, so there's more load on it, and when it went down we were out for another six hours, which tripped all the routers too. Also, because of the way the company is structured, all traffic is routed through the Johannesburg office. Even if there's no load-shedding in Cape Town or Durban, if we get taken out, they do too.

Petrus Human, Attix 5: It was either Sanlam or Santam that did a survey, which found that 60% of companies in South Africa have generator sets; the rest don't. In our Cape Town office, we've found that some of the unprotected PCs have suffered hardware failures because of the power resets.

Nick Keene, Citrix: Just from my own office's perspective, I went out and bought extended batteries for all our staff so that we can survive extended blackouts. We don't own any infrastructure in our office except Internet connectivity, but the interesting thing is that everyone else in our office park went to the landlord and organised a generator. That's convenient, of course, but now my operational cost has climbed by R1 000 a month, whereas I spent R6 500 on extended batteries, which I could write off over a decent period. Now I have this recurring cost, which I'll probably never get rid of.
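Keene's comparison comes down to simple break-even arithmetic. The Python sketch below uses only the two figures he quotes (R6 500 once for batteries, R1 000 a month for the shared generator) and deliberately ignores battery replacement, tax write-offs, fuel-price changes and the fact that batteries and a generator do not deliver the same runtime. The function names are illustrative assumptions, not taken from any product.

```python
def break_even_months(one_off_cost: float, monthly_cost: float) -> float:
    """Number of months after which a recurring cost matches a one-off outlay."""
    if monthly_cost <= 0:
        raise ValueError("monthly_cost must be positive")
    return one_off_cost / monthly_cost


def cumulative_costs(one_off_cost: float, monthly_cost: float, months: int) -> tuple[float, float]:
    """Total spent on each option after `months` (assumes no inflation and no replacements)."""
    return one_off_cost, monthly_cost * months


if __name__ == "__main__":
    batteries_once = 6500     # R6 500 spent once on extended batteries (figure from the article)
    generator_monthly = 1000  # R1 000 a month added for the landlord's generator (figure from the article)

    print(f"Break-even after {break_even_months(batteries_once, generator_monthly):.1f} months")
    for months in (6, 12, 36):
        batteries, generator = cumulative_costs(batteries_once, generator_monthly, months)
        print(f"{months:>2} months: batteries R{batteries:,.0f} vs generator R{generator:,.0f}")
```

Under these simplifying assumptions, the recurring charge overtakes the once-off battery spend after six and a half months, and after three years it has cost more than five times as much.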

Christelle Larkins, Eaton: I can confirm what everyone is saying just from the growth in UPS sales. We've seen 150% growth compared with Q1 of last year. The problem is that during the first two months of load-shedding, most purchases were panic buys. People were running around buying whatever they could. We got phone calls from people wanting UPSes to run hair dryers, swimming pools, kettles and TVs. I think a lot of people have gone into this without thinking it through properly. They haven't considered the total cost of ownership (TCO): how much a solution will cost over three to four years.

Chamu M'Kombe, IBM: The way we look at it, there are three levels of electricity supply. The first is the direct line from the supplier, the second is the UPS and the third is the generator. A lot of clients will implement the first and second levels for their critical systems. We find that many clients want to subcontract the third level, the generator, because it's too expensive for them to go out and buy one in the market. We've seen a lot of cases where clients come to us and ask us to support them at the third level with generators if the UPSes don't hold out.

Vinny La Bella, Gallium: Running a professional services team means that when something goes down, we have to make up the time or we catch a lot of flak. The budget for additional resources is something else that needs to be taken into consideration. And it gets hectic!

Human: I can confirm that: we have exactly the same problem. A lot of backups fail when the line drops or when load-shedding happens, and my guys typically spend more time fixing and maintaining customer backups than they used to. We're fortunate in that we've built a resume function into our software for when the line drops. We've also built in scheduling so that a secondary backup can run if the first fails because the power went off.
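Human describes two mechanisms: resuming an interrupted backup instead of restarting it, and scheduling a secondary run if the first one fails. The Python sketch below illustrates both under clearly stated assumptions. It is not Attix5's implementation; the `PowerLost` exception and the `fail_after_chunks` parameter are hypothetical devices for simulating an outage, and a real product would also verify data integrity (for example with checksums) after resuming.

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # bytes copied per step; an illustrative choice


class PowerLost(Exception):
    """Hypothetical stand-in for a dropped line or power cut mid-backup."""


def resumable_copy(src, dst, fail_after_chunks=None):
    """Copy src to dst, resuming from however many bytes dst already holds.

    fail_after_chunks (hypothetical) simulates an outage by raising PowerLost
    after that many chunks have been written in this attempt.
    """
    offset = os.path.getsize(dst) if os.path.exists(dst) else 0
    copied = 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(offset)  # skip the part that survived the previous attempt
        while True:
            chunk = fin.read(CHUNK_SIZE)
            if not chunk:
                return
            fout.write(chunk)
            fout.flush()
            os.fsync(fout.fileno())  # make progress durable before the next chunk
            copied += 1
            if fail_after_chunks is not None and copied >= fail_after_chunks:
                raise PowerLost(f"outage after {copied} chunks")


def run_backup(src, dst, slots):
    """Try the primary slot, then each secondary slot, until the copy completes.

    Each entry in slots is that attempt's simulated outage point (None = no outage).
    Returns the 1-based attempt that finished, or None if every slot failed.
    """
    for attempt, fail_after in enumerate(slots, start=1):
        try:
            resumable_copy(src, dst, fail_after)
            return attempt
        except PowerLost:
            continue  # fall through to the next scheduled slot
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data.bin")
        dst = os.path.join(tmp, "backup.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(10 * CHUNK_SIZE))
        # The primary run loses power after 4 chunks; the secondary slot resumes and finishes.
        finished_on = run_backup(src, dst, slots=[4, None])
        with open(src, "rb") as a, open(dst, "rb") as b:
            print(finished_on, a.read() == b.read())
```

Running it should print `2 True`: the primary attempt stops after four chunks, and the secondary slot picks up from the bytes already on disk rather than starting over.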

Ingo Tuschardt, Quintica SSA: Budgets always have two sides. Who's making money out of all of this? I like to be controversial, but I think it's benefiting us a touch more than it's harming us. It's not all negative.

Larkins: I think a lot of people are throwing solutions out there, over-specifying and over-selling. In our own office, we've put LED lights above everyone's desk, and we take our printer and some of our servers offline when they're not needed. We have a 5kVA UPS system that gives us four to six hours of runtime. So it doesn't have to cost you a lot if you're selective about what you put on it.

Has the current situation of coping with outages meant that companies have taken their eyes off the IT ball?

Montjoie: I think there are two tiers to it. First, it's accelerating the South African IT market's catch-up with world trends in terms of consolidation into the cloud. Virtualisation and centralisation are a massive trend; traditionally bandwidth has been a constraint, but prices are dropping, and load-shedding has simply forced companies to accelerate what they would have done anyway, which is to centralise everything in a data centre with redundant power.

The second tier is that we've become a society that is more conscious of how we manage our power. It's a positive spin-off: we will look back and say that load-shedding got us to a level of maturity we wouldn't otherwise have reached.

M'Kombe: If you look at the scale, at one end we have companies with absolutely no business continuity or disaster recovery implementation, and at the other we have world-class implementers, with everyone else in between. But they have all moved up a couple of notches. You still have people with risky solutions, but a few months ago they had nothing. Now they're thinking about what they could do to get something workable. The net effect is to move everybody up, which is a good thing.

Bruce Nicholson, i5 Group: What companies need is a strategy, and that's where we come in, with advice on, for example, King II (the King Report on Corporate Governance), which says that as part of good governance you need a business continuity strategy and the board is accountable for it. It's not an IT thing.

What we've experienced in SA at the enterprise level is that the vast majority of companies don't have a strategy in place and load-shedding has highlighted that. Load-shedding is just the tip of the disaster iceberg: if we can't cope with four hours without power, how can we cope with something worse?

Ogden: Philosophically, is this really a BC or a DR issue? It's just part of the landscape that we have power outages once or twice a week for four hours at a time. You can't take the view that you've got a disaster once or twice a week. So, businesses and people are adapting. Now BC and DR have a role to play in educating people about manageable solutions. But business needs to adapt and mitigate the problem, either through increased sales or through doing something else during the downtime.

How has the landscape changed the ability to implement DR and BC?

Ogden: The name of the game is resilience. How will you deal with the outages impacting your ability to conduct your operations on a day-to-day basis? How do you put things in place to manage wear and tear with machines coming up and down the whole time? What is the impact on your staff? People are coming to work earlier, going home later, missing morning meetings and so on. How is your business able to cope with all of that?

Keene: I've seen this change priorities within IT strategies. About three years ago, we embarked on a campaign to find out more about working conditions. Flexible working was always there in the rankings, but it was never a priority. Now it has really started to move up. One factor in its favour is that the data centre has always been available and protected.

But one of the problems, depending on the type of organisation, is travel. The other day, I drove from Rivonia to my office in Sloane Street during an outage and it took me two-and-a-half hours. I could try to use my laptop or work on my phone while driving, but that's not conducive to my health.

Tuschardt: Load-shedding is now a day-to-day thing, and I don't think it's part of business continuity or disaster recovery. It's just day-to-day business availability. To me, it highlights the need to plan for events that are out of our control, but it's now everyday business. In Lagos, everyone runs on generators; in Dubai, the traffic jams are so bad that I've missed flights. It's sad, but it highlights problems that others face all the time. It's highlighting our need, but it's not actually BC and DR.

La Bella: What we've also noticed is a lot of our customers are working very closely with metropolitan areas as well. If they have a generator in the basement, they are making it available for the traffic lights so that their staff can get in and out of the building. The size of the generators that some of them are buying is astronomical - you can run small cities off some of them! What it's done is made us better planners.

Larkins: Not having electricity is one part of it. But what South Africans need to realise is that the quality of electricity in this country, yesterday and today, is not what it needs to be. What Eskom has done is increase the voltage in some areas to compensate for lower current. When the power is switched off and back on, it's not controlled. We are seeing a lot of damage, even on our UPSes that have surge protection built in. We've seen 20 or 30 units a week come back with input chargers blown by surges, and I can just imagine the repercussions for the IT equipment behind them.

Nicholson: If I can relate a personal story here, or rather a company one: our UPSes in the server room were blown by extreme voltages. Now no one will give us a UPS, because they say the power coming from Eskom needs to be clean. We say we need to keep our servers running because we host applications, websites and all sorts of other things; this is our company's business. We can't expose the servers directly to the Eskom line because the power coming in isn't clean. A spike will do a lot of damage.

Larkins: It was one of our UPSes that blew [at i5]. We had installed it two days earlier, and the voltage measured when the power came back on was something like 400V. It blew the whole UPS: every single charger and power module was gone, and because so many parts blew, we had to airfreight emergency spares in to replace them. But there's no guarantee that it won't happen again. And the only alternative is to expose the server directly.

Andre Hurter, Drive Control: While we're on that point: yes, you can have surge protection and UPSes, but if spikes hit a server, you can corrupt the data. What happens when your data is corrupt? That's where proper backups and business continuity come in, including failing over to other sites.

Human: We've found that companies have done their research, and we certainly seem to be providing more solutions than pure hardware and software products. It's a good idea to have a mirror site somewhere for backup purposes in case of disk corruption: keeping your eggs in two baskets rather than one.

Montjoie: I agree that, by definition, dealing with load-shedding is not the same as business continuity and disaster recovery. But the point is that we are using the same mechanisms that would traditionally be used to implement BC and DR to carry the industry through those four-hour gaps. So we shouldn't call it BC and DR, but the mechanisms and the backend technology are the same.

Brenton Halsted, Dimension Data: We're doing a lot of server consolidation projects at the moment; it's probably one of the biggest areas of our business. It's all about reducing power and cooling load, and we're seeing that going to the desktop as well. Thin clients are replacing PCs on the desktop because the power requirements are so much less.

Virtualisation also makes disaster recovery much easier. If you have a virtualised infrastructure in your data centre, then you need much less equipment at a secondary site. It is possible to have a ten-to-one consolidation ratio.

Keene: Data centres used to be built out as silos of servers. We would work out that the customer needed a certain number of, say, e-mail servers, and would typically provision N+1 for e-mail, database and application servers, and that would occupy a certain amount of space.

Virtualisation separates the physical from the logical and allows for dynamic provisioning. My problem then becomes how to place the workloads I need to run my business onto the hardware workhorses I have available. Then I can start to shrink the number of servers needed. Typically, dynamic environments need 25% fewer servers than static ones.

Tuschardt: I need to throw something out here. I keep hearing us talk about business continuity and disaster recovery in the same light. But disaster recovery, although it's where we all make our money and it's by far the highest budget item, is a subset - a function - of business continuity. I think it's important that we don't mix those two.

I'll give you an example. I once gave a presentation in the Middle East to the exco of a large commercial institution, and we were talking about business continuity: how they assess the risks and how they go about it. Then I asked: 'What are you guys doing about it?' 'No,' they said, 'we have a disaster recovery site.' This is the standard answer from 95% of businesses. So I said: 'Great, you have a disaster recovery site. What are you doing about that?' And they said: 'Right now, we're commissioning it.' They were swapping facilities with their business partners. For the five years before that, they had been using facilities swapped with another company. But the rest of the board, without exception, didn't know that. Some individuals in the company knew, but the business as a whole wasn't aware of it. They hadn't put the people plans or processes in place. Companies don't test their business continuity plans properly either: they do it once a year instead of every time their business changes.

Light ahead

Eskom may have thrown South African consumers and businesses a bone in the time between this debate and its publication, but the upside of load-shedding has been considerable. We're more efficient, we have better business continuity and disaster recovery infrastructure in place, and we're experiencing what much of the rest of the world takes in its stride. And that's not a bad thing.