
The complexity conundrum

By Ilva Pieterse, ITWeb contributor
Johannesburg, 20 Jan 2016

The growing complexity of third-platform technology components, such as big data, is at the forefront of CIOs' minds. With no choice but to embrace these technologies for competitive advantage, organisations face a reality where data growth is outpacing storage capacity growth, and budgets are not keeping pace with the growing space and performance requirements.

Now more than ever, data needs to be carefully managed across different boundaries, without losing control.

Sven Hansen, NetApp

This situation has given rise to a need for new architectures that can transform enterprise storage from an overwhelming problem into a true business enabler. Cloud computing, software-defined storage (SDS) and copy data virtualisation are all presented as possible solutions.

Attila Narin, Amazon Web Services' head of EMEA solutions architecture and business development, believes cloud computing should be at the centre of any strategy dealing with data. "The cloud accelerates big data analytics, machine learning and other business benefits that come from storing vast amounts of data. It provides instant scalability and elasticity for storage and allows companies to focus on deriving business value from their data instead of maintaining and managing storage infrastructure."

With technologies such as mobile, social, big data and the Internet of Things (IoT), more data than ever before needs to be stored and analysed by organisations around the world. "What we hear from customers is that the only way they can affordably, and easily, take advantage of these new areas of technology is through the cloud. Cloud computing ensures that our ability to analyse large amounts of data, and to extract business intelligence, is not limited by capacity or computing power."

NetApp's systems engineering manager for Africa, Sven Hansen, is also seeing customers move towards object-store-style solutions to better manage large data volumes. "In the relative blink of an eye, technology innovation that allows for incredible insight to be derived from information has turned data into a valuable corporate asset that needs to be retrieved efficiently and guarded carefully," says Hansen. "Now more than ever, data needs to be carefully managed across different boundaries, without losing control. Data is extremely valuable and personal, but it is also extremely cumbersome."

Before cloud

In order to best manage data's value, some organisations are moving away from being the builders and maintainers of datacentres themselves, becoming brokers of services that enhance business performance. As a result, Hansen says, there are now many new service providers and vendors competing to meet these needs. "The sooner organisations realise what the cloud is - and what it is not - the more effectively they will be able to use the power of information to their advantage. As the fog of the cloud continues to clear, its complications are simplified and its incredible benefits are revealed," he says.

Before heading full-steam into cloud, however, Actifio solution strategist Gareth Donald suggests reducing data complexity through virtualisation. "Businesses are moving more of their infrastructure and applications into the cloud. However, challenges still remain and are largely centred around antiquated approaches to data management, which continues to bind application data to physical infrastructure. By moving towards the concept of virtual smart data, transformative flexible cloud models can be achieved."

Virtualisation in the age of smart data is the next frontier on the march to the cloud.

Gareth Donald, Actifio

He believes virtualisation in the age of smart data is the next frontier on the march to the cloud. "Copy data management is the solution as it is based on data virtualisation. By decoupling the data from the infrastructure within the datacentre where it is stored, the data becomes much easier to organise. It's basically the same concept as virtualising servers. Virtualising reduces the complexity inherent in linking the data to its physical location, and enables duplicate data to be identified and eliminated. Decoupling data from infrastructure reduces the amount of data that needs to be stored, and makes it much more accessible. Instead of being hidden in a specific application silo, virtual data is available to the whole IT system, and thus to the whole business," he says.
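To make the copy-data idea concrete, here is a minimal, purely illustrative sketch of a catalogue that hands out virtual references to a single "golden" copy of a dataset instead of creating physical duplicates. The class and method names are invented for this example and do not represent Actifio's product or API.

```python
# Illustrative sketch only: a toy "copy data" catalogue that gives consumers
# virtual references to one golden copy rather than physical duplicates.
from dataclasses import dataclass, field


@dataclass
class GoldenCopy:
    dataset_id: str
    location: str          # where the single physical copy actually lives
    size_gb: float


@dataclass
class CopyDataCatalogue:
    golden_copies: dict = field(default_factory=dict)
    virtual_copies: dict = field(default_factory=dict)

    def ingest(self, dataset_id: str, location: str, size_gb: float) -> None:
        """Register the one physical (golden) copy of a dataset."""
        self.golden_copies[dataset_id] = GoldenCopy(dataset_id, location, size_gb)
        self.virtual_copies.setdefault(dataset_id, [])

    def provision_virtual_copy(self, dataset_id: str, consumer: str) -> dict:
        """Give a consumer (dev, test, analytics) a pointer, not a duplicate."""
        ref = {"dataset": dataset_id, "consumer": consumer,
               "points_to": self.golden_copies[dataset_id].location}
        self.virtual_copies[dataset_id].append(ref)
        return ref

    def physical_footprint_gb(self) -> float:
        """Storage consumed is one golden copy per dataset, however many consumers."""
        return sum(g.size_gb for g in self.golden_copies.values())


catalogue = CopyDataCatalogue()
catalogue.ingest("erp-db", "array-01:/vol/erp", size_gb=500.0)
for consumer in ("dev", "test", "analytics"):
    catalogue.provision_virtual_copy("erp-db", consumer)
print(catalogue.physical_footprint_gb())  # 500.0, not 2000.0
```

The point is the footprint calculation at the end: three consumers share one 500GB copy, instead of each holding its own duplicate bound to a particular array.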

Maurice Blackwood, IBM's systems executive, agrees that complexity must be reduced first; simply moving data to the cloud doesn't necessarily solve the problem. "What simplifies the complexity is the software layer, which, in essence, is a first step towards cloud migration. Software-defined storage (SDS) is a key consideration for local enterprises seeking to manage data growth and those now provisioning for cloud," he says.

Complexity

Says SUSE's indirect account executive Derek Rule: "As businesses tighten operational costs and look for more ways to be leaner and more innovative, SDS will change how you budget and how you take on the big data challenges of today."

Traditionally, he says, IT has worked out its storage requirements by looking at data growth levels and making a projection. "It has generally paid for increases by replacing old arrays - in an asset management lifecycle - and put new costs into specific projects, for example, paying for a storage cost in a new business process, even sneaking it in. But as the African market moves to SDS, businesses should have one storage pool, one storage budget, unlimited scale - and a different conversation with suppliers."

This approach is especially beneficial for enterprises that are expanding their geographic footprints across Africa, says Blackwood. "South African IT departments are experiencing increasing complexity in their storage environments, while at the same time facing budget constraints. They are managing and storing petabytes of data at the moment, and next it will be yottabytes. It's being stored in a sprawl of servers and siloed environments. And if they are expanding across borders into neighbouring African countries, they are adding further complexity in that they must integrate new IT environments into the existing enterprise environment as quickly and seamlessly as possible," he says.

As the African market moves to SDS, businesses should have one storage pool, one storage budget, unlimited scale - and a different conversation with suppliers.

Derek Rule, SUSE

Aside from big data analytics being a driver for improved data storage and management, enterprises must also optimise the storage environment for compliance reasons. A range of relatively new legislation has come into effect, Blackwood points out, requiring businesses not only to store and manage data securely, but also to make it readily accessible when required.

Whether for compliance, complexity or capacity reasons, current storage architecture needs to change to adapt to the rapid rate of technological change. As Hansen states: "The innovations and advances in today's technology are happening so quickly and so consistently, it's almost like being on an airplane. During the flight, it feels like you're sitting still, even though you're moving at more than 600 miles per hour."

Six storage requirements

Frederick Strydom, senior systems engineer: SAS specialist for EMC South Africa, has identified six factors the traditional primary storage platform should not compromise on.

1. Control

Control is not just about allocating storage based on the required tier and hoping it delivers the performance the application needs. It has to include proactive controls and monitoring capabilities, so that performance-related issues can be identified and dealt with quickly.

2. Trust (availability, data integrity and encryption)

Redundant hardware is not the only feature a customer should expect from a storage system. It should have redundant active controllers at the bare minimum, as well as advanced fault isolation. Accurate and up-to-date statistics for storage arrays should also be a key criterion. When evaluating system encryption, it's important to ensure the system uses a different key to encrypt every disk.

3. Performance

The architecture should provide the performance each application needs. System resources should run efficiently and not be fixed to individual front-end or back-end ports or emulations.

4. Agility

The key to enabling an agile system is the virtualisation of the storage system and its data services. The system should be able to rebalance itself across controllers and the disks attached to each. One of the most crippling limitations is a system that claims a scale-out architecture but requires manual rebalancing. A truly agile system should also be able to scale beyond a single system, recommend the movement of workloads between systems, and move or migrate applications seamlessly between them.

5. Economics

Power efficiency and operational efficiency should both be considered. For the cost of the system to be worthwhile, it should be able to automatically tier workloads across different disk types, use predictive algorithms to anticipate workload changes, and move data proactively - for at least the next two to three years.
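As a rough illustration of what automated tiering involves, the sketch below places each workload on a tier according to observed I/O intensity. The tier names and thresholds are assumptions chosen for the example, not any vendor's actual policy engine.

```python
# Minimal sketch of automated storage tiering: place each workload on a tier
# based on observed I/O intensity. Thresholds are illustrative assumptions.

TIERS = [
    ("flash",  5000),   # IOPS at or above this go to flash
    ("sas",    500),    # moderately active data on SAS disk
    ("nl-sas", 0),      # cold data on high-capacity NL-SAS
]


def choose_tier(observed_iops: float) -> str:
    """Return the first tier whose threshold the workload meets."""
    for tier, threshold in TIERS:
        if observed_iops >= threshold:
            return tier
    return TIERS[-1][0]


workloads = {"oltp-db": 12000, "file-share": 800, "archive": 15}
placement = {name: choose_tier(iops) for name, iops in workloads.items()}
print(placement)  # {'oltp-db': 'flash', 'file-share': 'sas', 'archive': 'nl-sas'}
```

A real policy engine would also track access history over time and move data proactively before a workload shifts, rather than reacting to a single measurement as this sketch does.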

6. Scale

A storage system requires an architecture with scale-out capabilities. When evaluating scalability, consider the following: the maximum number of LUNs; the maximum LUN size; the maximum number of snapshots (SNAPs) per LUN; the maximum number of snapshots per system; the maximum number of storage groups; and the maximum number of hosts (initiator groups).
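One simple way to apply this checklist is to compare the scale a planned environment needs against a system's published maximums. The sketch below does exactly that; every number in it is a placeholder, not a figure from any particular array's data sheet.

```python
# Illustrative scalability check: compare what a planned environment needs
# against a system's published maximums. All numbers are placeholders.

system_limits = {
    "luns": 64000,
    "lun_size_tb": 64,
    "snaps_per_lun": 256,
    "snaps_per_system": 32000,
    "storage_groups": 16000,
    "hosts": 8000,
}

required = {
    "luns": 12000,
    "lun_size_tb": 32,
    "snaps_per_lun": 96,
    "snaps_per_system": 40000,   # exceeds the limit above
    "storage_groups": 2500,
    "hosts": 1200,
}

for metric, needed in required.items():
    limit = system_limits[metric]
    status = "OK" if needed <= limit else "EXCEEDS LIMIT"
    print(f"{metric}: need {needed}, max {limit} -> {status}")
```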

How Hadoop conquered big data

Source: SAS

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.
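As a small example of "running applications on clusters", the classic word count can be written as two Python scripts and submitted through Hadoop Streaming, which pipes data through any executable that reads standard input and writes standard output. The input and output paths, and the location of the streaming jar, vary by installation.

```python
# wordcount_mapper.py - Hadoop Streaming mapper: reads lines from standard
# input and emits "word<TAB>1" pairs for the reducer to sum.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
# wordcount_reducer.py - sums the counts for each word; Hadoop delivers the
# mapper output sorted by key, so all counts for a word arrive consecutively.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

A job like this is typically launched with `hadoop jar <path-to-hadoop-streaming.jar>` together with `-input`, `-output`, `-mapper` and `-reducer` options; the exact jar path and any options for shipping the scripts to the cluster depend on the distribution.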

What are the benefits of Hadoop?

One of the top reasons that organisations turn to Hadoop is its ability to store and process huge amounts of data - any kind of data - quickly. With data volumes and varieties constantly increasing, especially from social media and the Internet of Things, that's a key consideration.

What is Hadoop used for?

Going beyond its original goal of searching millions (or billions) of web pages and returning relevant results, many organisations are looking to Hadoop as their next big data platform. Popular uses today include:

* Low-cost storage and active data archive. The modest cost of commodity hardware makes Hadoop useful for storing and combining data such as transactional, social media, sensor, machine, scientific, click streams, etc. The low-cost storage lets you keep information that is not deemed currently critical, but that you might want to analyse later.

* Staging area for a data warehouse and analytics store. One of the most prevalent uses is to stage large amounts of raw data for loading into an enterprise data warehouse (EDW) or an analytical store for activities such as advanced analytics, query and reporting, etc.

* Data lake. Hadoop is often used to store large amounts of data without the constraints introduced by schemas commonly found in the SQL-based world. It is used as a low-cost compute-cycle platform that supports processing ETL and data quality jobs in parallel using hand-coded or commercial data management technologies (a minimal example follows this list).

* Sandbox for discovery and analysis. Because Hadoop was designed to deal with volumes of data in a variety of shapes and forms, it can run analytical algorithms. Big data analytics on Hadoop can help your organisation operate more efficiently, uncover new opportunities and derive next-level competitive advantage.

* Recommendation systems. One of the most popular analytical uses by some of Hadoop's largest adopters is for web-based recommendation systems. These systems analyse huge amounts of data in real time to quickly predict preferences before customers leave the web page.
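To illustrate the staging-area and data-lake uses above, here is a minimal PySpark sketch that reads raw JSON landed in HDFS, applies a basic quality filter, and writes partitioned Parquet that a downstream warehouse load could pick up. The paths and column names are assumptions made up for the example.

```python
# Illustrative PySpark staging job (paths and column names are assumptions):
# read raw clickstream JSON from the Hadoop data lake, apply a basic quality
# filter, and write Parquet for a downstream warehouse load.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-staging").getOrCreate()

raw = spark.read.json("hdfs:///data/raw/clickstream/2016-01-20/")

cleaned = (
    raw
    .filter(F.col("user_id").isNotNull())              # drop records without a user
    .withColumn("event_date", F.to_date("timestamp"))  # derive a partition column
)

(cleaned
 .write
 .mode("overwrite")
 .partitionBy("event_date")
 .parquet("hdfs:///data/staged/clickstream/"))

spark.stop()
```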
