Three things you should know about deduplication

Just because storage is becoming more affordable doesn't mean you should store every single piece of data that you have.

Johannesburg, 05 Apr 2018
Claude Schuck, regional manager, Africa, Veeam.

As a result of the explosion of available storage and the ability to outsource it at more affordable rates (i.e., with no need to invest in an expensive onsite data centre), there's a misconception that data deduplication is a dinosaur.

However, Claude Schuck, regional manager for Africa at Veeam, says deduplication is extremely relevant to businesses of all sizes: "But, how it's done and where it is done is evolving."

Data and storage evolution

Deduplication of your stored data is more important than ever before, says Schuck: "Yes, the cost of storage is coming down, but the explosion of both structured and unstructured data is uncontrollable and even at reduced storage rates, it's still going to cost your business a fair amount to store all of that data."

And let's face it, big data really is just that. It consumes storage space and bandwidth, and it needs to be kept secure, all of which demand resources that need to be efficiently managed.

While the cost of disk storage has dropped significantly, and data centres have become more efficient and able to store more data in less space, the outsourced storage model can turn out fairly costly for businesses that are storing multiple versions of their data, says Schuck.
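To see why storing multiple versions is where the waste lies, here is a minimal sketch of one common way deduplication can work: split data into chunks, identify each chunk by a hash, and keep only one copy of every unique chunk. This is an illustration only, not a description of any vendor's product. The function names, the tiny 4-byte chunk size and the sample data are all assumptions made for the example.

```python
import hashlib

# Assumption: a tiny fixed chunk size so the effect is visible on toy data.
# Real systems use far larger (and often variable-sized) chunks.
CHUNK_SIZE = 4


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_versions(versions: list[bytes]) -> tuple[dict[str, bytes], list[list[str]]]:
    """Store each unique chunk once; keep a per-version list of chunk hashes."""
    store: dict[str, bytes] = {}      # hash -> chunk, one copy per unique chunk
    recipes: list[list[str]] = []     # per version: ordered hashes to rebuild it
    for version in versions:
        recipe = []
        for chunk in split_chunks(version):
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)   # only stored if not seen before
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def restore(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Rebuild one version from its list of chunk hashes."""
    return b"".join(store[digest] for digest in recipe)


# Hypothetical data: three versions of a file that mostly overlap.
versions = [b"AAAABBBBCCCC", b"AAAABBBBDDDD", b"AAAABBBBCCCC"]
store, recipes = store_versions(versions)

raw_bytes = sum(len(v) for v in versions)              # everything stored as-is
stored_bytes = sum(len(c) for c in store.values())     # unique chunks only
print(raw_bytes, stored_bytes)
```

In this toy case, the three versions share most of their chunks, so far fewer bytes are kept than were written, and each version can still be rebuilt in full from its list of hashes.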

What are your hosting providers doing?

"The quantity of data that you store is usually quite manageable when you're storing it onsite because you're very aware of how much capacity you have. However, when you start pushing workloads and storage into providers who charge cheap rates per gigabyte, you have to ask the question, are they deduplicating your data or just storing everything that you have?"

This is a valid question: even a relatively low per-gigabyte rate soon adds up if you're paying to store everything. Schuck recommends that businesses deduplicate their data regardless, as the cost savings will be significant. "Deduplication could as much as halve your storage costs. With the General Data Protection Regulation and the Protection of Personal Information Act, businesses are now compelled to store certain data in a specific way for a predetermined period. That cost can be offset against the savings incurred on storage, to some extent."
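The "halve your storage costs" figure corresponds to a deduplication ratio of 2:1, and the arithmetic is easy to check for your own numbers. The sketch below assumes a simple flat per-gigabyte charge. The function name, the 10 000GB volume and the 0.5 rate are hypothetical values chosen for illustration, not figures from Veeam or any provider.

```python
def monthly_storage_cost(logical_gb: float, rate_per_gb: float,
                         dedup_ratio: float = 1.0) -> float:
    """Monthly bill for data billed per stored gigabyte.

    dedup_ratio is logical size divided by stored size, so 2.0 means
    the provider keeps half as many bytes as were written.
    """
    if dedup_ratio < 1.0:
        raise ValueError("dedup_ratio must be at least 1.0")
    stored_gb = logical_gb / dedup_ratio
    return stored_gb * rate_per_gb


# Hypothetical figures: 10 000GB of data at 0.5 currency units per GB.
raw_cost = monthly_storage_cost(10_000, 0.5)                        # no dedup
deduped_cost = monthly_storage_cost(10_000, 0.5, dedup_ratio=2.0)   # 2:1 dedup
print(raw_cost, deduped_cost, raw_cost - deduped_cost)
```

The actual ratio a business achieves depends on how much of its data is duplicated, so the 2:1 case is best read as an upper-end scenario rather than a guarantee.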

Recovery time matters

He also cautions businesses to implement a deduplication technology that works with their infrastructure and meets their needs around the data. "The downside to hardware deduplication, which is used by the majority of the bigger vendors, is that while it's highly efficient at compressing data, restoring that data can take longer than half an hour - and that's just too long when users are wanting a quick recovery of between three and five minutes."

Hardware deduplication, in Schuck's view, is ideally suited to data that is destined for long-term archiving. But for so-called fresh data that needs to be accessed quickly, software deduplication enables instant recovery. Software deduplication works by compressing the data into one big flat file, whereas hardware deduplication pushes the data across multiple disks.

"While software deduplication isn't as efficient as its hardware-based contemporary, the upside is instant access to that data," he says. "In addition, hardware deduplication vendors charge per terabyte, whereas having access to software deduplication in-house can incur cost savings for the business over and above those realised on using less storage, less bandwidth and having to secure less data."
