Avoid data bottlenecks

Data storage moves from bottleneck to enabler.


Johannesburg, 15 Dec 2017
Dennis Naidoo, Senior Systems Engineer, Middle East and Africa, Tintri.

Data storage bottlenecks occur when the available resources are unable to facilitate the efficient handling of the amount of available data. Dennis Naidoo, Senior Systems Engineer, Middle East and Africa, Tintri, explains how this happens. "Most organisations are on a journey to some type of cloud model, even if it's just driving the increasing virtualisation of workloads.

"However, most traditional storage systems were designed for non-virtual workloads, and don't speak the same language as virtualisation. This obscures the relationship between the virtual objects to be managed and the underlying storage architecture, which leads to bottlenecks in performance and management complexity that hinders the scale, agility and high levels of automation that customers expect to accompany their move to cloud-enabled platforms."

The leading causes of bottlenecks

There are many factors that can contribute to storage bottlenecks in virtual environments, says Naidoo, and Flash alone is not a panacea for storage performance issues.

"In the physical world, applications were given exclusive access to compute, network and storage resources, to guarantee performance for that application. Traditional storage systems were designed in this fashion to allow performance optimisation at the LUN (logical unit number), volume or tier level, to ensure the application provisioned with those resources would have the assured performance it required."

However, virtualisation breaks this model by abstracting the physical layers of compute, network and storage, and putting a virtual administrator in complete control of how those resources are used within the hypervisor. He explains, "That means that a storage administrator can't always guarantee that the LUN provisioned for high performance random writes, for example, will always be used for the intended application or its component."

"While the hypervisor does a great job of ensuring fair distribution of the compute and network layer, bottlenecks usually appear in the storage layer shared by many virtual machines (VMs). Each VM and vDisk has its own input/output (I/O) stream, which is randomised through the hypervisor creating an I/O blender effect that translates into degradation of performance on traditional storage systems. This blender effect gets worse when VM density increases. For this reason, we still often find customers choosing not to virtualise large Tier1 workloads, so they can guarantee performance for these critical applications in a non-virtual environment."

The commoditisation of Flash has had a significant effect in changing the way we manage storage in the enterprise today, according to Naidoo. "It improves performance by as much as 100 percent for random I/O over spinning disk drives, which is essential for addressing the I/O blender effect caused by virtual workloads. It also consumes far less power and floor space, allowing significant rack space consolidation in data centres, with considerable cost benefits as a result."

However, there's another conundrum introduced with virtualisation, called the 'noisy neighbour syndrome'. He explains, "This is where an I/O-intensive virtual workload can monopolise the performance of the storage system, and starve out other VMs sharing that same object. Traditional storage systems designed around LUNs and volumes, which have just transitioned to using Flash as an additional storage medium, usually experience this issue in virtual environments. This is because a LUN addresses I/O in a first-in-first-out approach, which does not prevent a single VM from filling up the LUN queue, at the expense of other VMs that need to share that LUN."
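A toy model in Python shows why a single FIFO queue invites the noisy neighbour problem (the queue depth and request counts are invented): the burst from one VM is serviced in full before the quieter VMs get a turn.

```python
from collections import deque

QUEUE_DEPTH = 32  # how many requests the array services in this round
queue = deque()

# One I/O-hungry VM submits a burst of 30 requests; two quiet VMs
# submit 2 each, arriving after the burst.
for i in range(30):
    queue.append(("noisy-vm", i))
for vm in ("quiet-vm-1", "quiet-vm-2"):
    for i in range(2):
        queue.append((vm, i))

# Strict first-in-first-out service, as on a traditional LUN queue.
served = [queue.popleft() for _ in range(min(QUEUE_DEPTH, len(queue)))]

counts = {}
for vm, _ in served:
    counts[vm] = counts.get(vm, 0) + 1
print(counts)  # {'noisy-vm': 30, 'quiet-vm-1': 2} -- quiet-vm-2 starves
```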

The easiest way for administrators to prevent this is to provision more LUNs or volumes and to distribute workloads across them. The downside of this approach is that it can lead to over-provisioning, complexity and wastage.

Another cause of massive bottlenecks is the simultaneous boot, login or log-off of hundreds or even thousands of virtual desktops when desktop and server workloads are mixed on the same storage system, continues Naidoo.

Storage bottlenecks can also be experienced further up the stack, across the storage network or within the hypervisor or host. "For example, VMs that are not aligned to the blocks on the underlying storage system can result in an increase in I/O reads and writes and significant performance degradation of the applications. Additionally, large I/O-intensive databases should be carefully laid out to ensure that their vDisks are spread across multiple virtual I/O adapters for improved throughput across the hypervisor I/O stack," he advises.
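The alignment point comes down to simple arithmetic, sketched below (a 4KB backend block size is assumed): if a guest partition starts at an offset that is not a whole multiple of the array's block size, each guest-level operation can straddle two backend blocks, roughly doubling physical I/O.

```python
BLOCK_SIZE = 4096  # assumed backend block size in bytes

def is_aligned(partition_offset_bytes, block_size=BLOCK_SIZE):
    """A partition is aligned if its start offset is a whole number
    of backend blocks from the start of the disk."""
    return partition_offset_bytes % block_size == 0

# Legacy guests often started the first partition at sector 63,
# i.e. 63 * 512 = 32256 bytes, which is not a multiple of 4096.
print(is_aligned(63 * 512))      # False -> misaligned, extra backend I/O
print(is_aligned(1024 * 1024))   # True  -> modern 1 MiB offset, aligned
```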

Choosing a storage solution

There are six main factors to consider when choosing a storage solution for a highly virtualised or cloud-enabled data centre, according to Naidoo.

1. The storage system should have the right level of abstraction.
It should be designed from the outset for virtual workloads. Whether those applications run in a virtual machine or a container, the storage system should allow native management at the same virtual object layer (not at the LUN or volume layer). This will simplify management and allow greater control for automation, orchestration and scaling in the environment.

2. The system should be optimally designed for Flash storage.
Techniques like wide striping and wear levelling should be used to prevent write amplification or write-cliffs on the storage systems over time.

3. The system should provide efficiencies.
Things like inline deduplication, inline compression and thin provisioning should be provided natively, to ensure the most effective utilisation of this relatively expensive medium.

4. The system should provide autonomous operations.
No manual tuning or configuration should be needed. The system should be able to provide performance isolation between VMs and vDisks to ensure that the 'noisy neighbour' syndrome is never experienced, even when scaling up from hundreds to thousands of virtual workloads on the same system. This will allow you to deliver predictable high performance, meet SLAs and create service tiers with zero effort. (A token-bucket sketch of this kind of per-VM isolation follows the list.)

5. The system should provide robust and comprehensive analytics for every virtual workload, across the stack.
a. Real-time, actionable analytics on every virtual workload in the environment, with visibility across your infrastructure: compute, network and storage. This will allow you to identify bottlenecks in real-time, wherever they occur.
b. Actionable analytics that help you pinpoint performance problems in your complete cloud or virtualised environment, make the change and immediately see the results.
c. Predictive analytics that can analyse up to three years of historical performance data, allowing you to predict your future storage and compute requirements based on organic growth. (A simple trend-based forecasting sketch follows the list.)

6. The system should also support open APIs, abstracted at the right level of management.
This means abstraction not at the LUN, volume or disk layer, but at the VM, vDisk or container level, to allow you to build powerful automation and orchestration that simplifies mundane daily tasks for your teams.
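On point 4, the article does not describe how per-VM isolation is actually implemented; one common general technique is a per-VM token bucket that caps each workload's IOPS, sketched below with invented VM names and limits.

```python
import time

class TokenBucket:
    """Cap a VM's I/O rate: tokens refill at `iops_limit` per second,
    and each admitted I/O consumes one token."""

    def __init__(self, iops_limit):
        self.rate = iops_limit
        self.tokens = float(iops_limit)   # start with a full bucket
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller queues or throttles this I/O

# One bucket per VM: the noisy VM is capped without touching neighbours.
buckets = {"noisy-vm": TokenBucket(500), "db-vm": TokenBucket(2000)}

def submit_io(vm_name):
    return "serviced" if buckets[vm_name].allow() else "throttled"
```

Because each VM has its own bucket, a burst from one workload is throttled at its own limit instead of consuming the shared queue, which is the isolation point 4 calls for.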
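And on point 5c, here is a minimal sketch of trend-based capacity forecasting (the monthly figures are synthetic, and real analytics engines use richer models): fit a linear trend to historical usage and project it forward.

```python
# Twelve months of synthetic capacity samples (TB used per month).
months = list(range(12))
used_tb = [40 + 2.5 * m + (m % 3) for m in months]

# Ordinary least-squares fit of a straight line, y = slope * x + intercept.
n = len(months)
mean_x = sum(months) / n
mean_y = sum(used_tb) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(months, used_tb))
         / sum((x - mean_x) ** 2 for x in months))
intercept = mean_y - slope * mean_x

def forecast(month):
    """Projected TB used `month` months after the first sample."""
    return intercept + slope * month

print(f"Organic growth trend: {slope:.2f} TB/month")
print(f"Projected usage a year from now: {forecast(24):.1f} TB")
```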

Traditional storage remains one of the primary obstacles to achieving greater virtualisation or building out the enterprise and/or private cloud for most organisations, says Naidoo. "Choosing the right storage system, designed for your modern data centre, will not only ensure that you can achieve more consistent performance SLAs, but also give you the right level of insight to identify bottlenecks in real-time, even if they exist outside of the storage system. With the right storage system, you can change storage from being a bottleneck to a business enabler that delivers value to the teams generating business advantage in your organisation."
