Single instance storage saves

SIS describes a system's ability to keep a single copy of content that can be shared by many users.
By Mike Hamilton
Johannesburg, 18 Sept 2008

While Moore's Law, first postulated in 1965, has been proven correct many times over in the last 43 years, it's time that steps are taken to counter its seemingly endless spiral of increasing processing speeds and memory requirements.

Intel co-founder Gordon Moore first described the phenomenon in 1965. He predicted the number of transistors that could be placed on an integrated circuit would increase exponentially, doubling approximately every two years.

This trend has continued for the last 43 years and has been applied to almost every measure of the capabilities of digital electronic devices, including processing speed and memory capacity.

The same pattern has held for the growth in demand for data storage: hard disk and tape data storage repositories are also doubling in size every two years.

Cause for concern

Unfortunately, data storage costs are now rising alarmingly and the inefficiencies that have been tolerated for so long have been highlighted by increasing emphasis on 'green' issues such as data centre energy consumption. The problem centres on the fact that around 85% of corporate data is redundant, mainly because of duplication.

In the case of data backups, it is highly likely that much of the data saved in a current backup operation is identical to that saved in a previous backup. This situation has been exacerbated because personal computer-based systems do not have intelligent file systems capable of identifying objects that are duplicated but held under different file names.

Software developers, hardware companies and utility software companies are all guilty of propagating the inefficiencies associated with data backup and storage. If the 85% of redundant data held in the corporate environment - on numerous workstations and servers - could be eliminated, fewer data centre resources would be required, the centre itself could be physically smaller (requiring less energy to maintain) and backup processes would benefit, becoming faster and more efficient.

Obviously, the IT industry is in need of a smarter way of storing data. The solution is 'single instance' data storage.

Single instance storage

Single instance storage (SIS) is being addressed by forward-thinking data storage vendors in the wake of demands by users for more cost-efficient ways of holding (and moving) data.

SIS describes a system's ability to keep a single copy of content that can be shared by many users - or computer systems. It addresses the duplication problem by identifying duplicate items, storing each one only once and maintaining references to the stored copy in place of the duplicates.

SIS has been implemented in file systems, e-mail server software, data backup solutions and in a variety of database and database management applications, including electronic messaging systems.

It may be implemented in software systems or in hardware appliances, such as a 'generic' appliance providing benefits to multiple software systems.

SIS works by searching a hard disk volume to identify duplicate files. When SIS finds identical files, it saves one copy of the file to a central repository, called the 'common store', and replaces other copies with pointers to the stored versions.

In finding duplicates, matches are not affected by any differences in file names, properties or attributes. This method ensures exact digital duplicates can be matched and eliminated across data sets, applications, clients and operating system platforms.
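
The matching step can be illustrated with a minimal Python sketch. It is an illustration only, not any vendor's implementation: the function name `deduplicate`, the in-memory dictionary standing in for a disk volume and the use of SHA-256 digests to recognise identical content are all assumptions made for the example.

```python
import hashlib
from typing import Dict, Tuple


def deduplicate(files: Dict[str, bytes]) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Keep one copy of each distinct content and point every file name at it.

    `files` is a stand-in for a disk volume: it maps file name -> contents.
    Returns (common_store, pointers):
      common_store maps content digest -> the single stored copy
      pointers     maps file name      -> digest of the copy it refers to
    """
    common_store: Dict[str, bytes] = {}
    pointers: Dict[str, str] = {}
    for name, content in files.items():
        # Only the bytes are hashed, so names and attributes play no part.
        digest = hashlib.sha256(content).hexdigest()
        common_store.setdefault(digest, content)  # stored once per distinct content
        pointers[name] = digest
    return common_store, pointers


# Hypothetical volume: two identical files under different names, one unique file.
volume = {
    "reports/q3.doc": b"quarterly figures",
    "home/anna/q3-copy.doc": b"quarterly figures",
    "home/ben/notes.txt": b"meeting notes",
}
common_store, pointers = deduplicate(volume)
print(len(volume), "files,", len(common_store), "stored copies")
```

Because only file contents are hashed, the two identically filled files collapse into one stored copy even though their names differ. A production system might additionally compare candidate matches byte for byte before replacing a file with a pointer.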

When an SIS-managed file is modified and a user saves it, the new version is written to the file system and not into the common store. Other users accessing the file continue to be served by the original version held in the common store.
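
This save behaviour can be sketched as a toy model in Python. The class `SisVolume` and its method names are hypothetical, and the dictionaries merely stand in for the common store and the ordinary file system.

```python
import hashlib
from typing import Dict


class SisVolume:
    """Toy model: shared copies in a common store, modified files kept apart."""

    def __init__(self) -> None:
        self.common_store: Dict[str, bytes] = {}  # digest -> shared content
        self.links: Dict[str, str] = {}           # file name -> digest (pointer)
        self.private: Dict[str, bytes] = {}       # file name -> modified content

    def add(self, name: str, content: bytes) -> None:
        """Place a file under SIS management as a pointer to shared content."""
        digest = hashlib.sha256(content).hexdigest()
        self.common_store.setdefault(digest, content)
        self.links[name] = digest
        self.private.pop(name, None)

    def write(self, name: str, content: bytes) -> None:
        """Save a modified file outside the common store; drop its pointer."""
        self.links.pop(name, None)
        self.private[name] = content

    def read(self, name: str) -> bytes:
        if name in self.private:
            return self.private[name]
        return self.common_store[self.links[name]]


vol = SisVolume()
vol.add("anna/plan.doc", b"version 1")
vol.add("ben/plan.doc", b"version 1")
vol.write("anna/plan.doc", b"version 2")  # Anna edits and saves her copy

print(vol.read("anna/plan.doc"))  # Anna sees her own new version
print(vol.read("ben/plan.doc"))   # Ben is still served from the common store
```

The shared copy is never altered in place, which is why other users are unaffected by one user's edit.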

Pointer system

The key to SIS is the 'pointer system' containing a list of the individuals who are allowed access to common content.

While the primary benefit of SIS is a reduction in disk space requirements, a significant benefit is also faster and more efficient delivery of messages sent to large distribution lists.
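
A rough Python sketch of how a pointer list might serve a large distribution list follows. The `MessageStore` class, its method names and the addresses are invented for illustration and do not describe any particular mail server.

```python
import hashlib
from typing import Dict, Iterable, Set


class MessageStore:
    """Toy single-instance message store: one body, many authorised readers."""

    def __init__(self) -> None:
        self.bodies: Dict[str, bytes] = {}      # digest -> the single stored body
        self.readers: Dict[str, Set[str]] = {}  # digest -> users allowed to read it

    def deliver(self, body: bytes, recipients: Iterable[str]) -> str:
        """Store the body once and record every recipient in the pointer list."""
        digest = hashlib.sha256(body).hexdigest()
        self.bodies.setdefault(digest, body)
        self.readers.setdefault(digest, set()).update(recipients)
        return digest

    def open(self, digest: str, user: str) -> bytes:
        if user not in self.readers.get(digest, set()):
            raise PermissionError(user + " is not on the pointer list")
        return self.bodies[digest]

    def delete(self, digest: str, user: str) -> None:
        """Remove one user's pointer; drop the body when no pointers remain."""
        readers = self.readers.get(digest, set())
        readers.discard(user)
        if not readers:
            self.bodies.pop(digest, None)
            self.readers.pop(digest, None)


store = MessageStore()
staff = ["anna@example.com", "ben@example.com", "carl@example.com"]
msg_id = store.deliver(b"Office closed on Friday", staff)
print(len(store.bodies), "stored body for", len(store.readers[msg_id]), "recipients")
```

Delivery to the whole list costs one write of the message body plus one pointer entry per recipient, and the body is only discarded once the last recipient has deleted it.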

The same concept also applies to the transfer of data from site to site.

Let's assume a company has a number of branch offices and a central backup or replication policy is mandated. The movement of duplicated data from site to site is not only a costly exercise - in terms of WAN bandwidth requirements - but a slow one as well.

Recognising and eliminating redundant transmissions across the WAN will significantly improve application performance and help ensure high application availability at sites with multiple WAN links.

WAN optimisation appliances that apply this principle also feature on-board hard drives that allow the storage of large data patterns over long periods of time, permitting the devices to identify and remove repetitive traffic separated by gigabytes of data - producing up to a claimed 100-fold increase in effective WAN capacity.
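
The idea of removing repetitive traffic can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the method of any specific appliance: the fixed 4KB chunk size, the SHA-256 digests and the function names `encode`, `decode` and `wire_size` are all chosen for the example.

```python
import hashlib
from typing import Dict, List, Set, Tuple

CHUNK = 4096  # assumed fixed chunk size, in bytes

# A token is either ("raw", chunk bytes) or ("ref", 32-byte digest of a known chunk).
Token = Tuple[str, bytes]


def encode(data: bytes, seen: Set[bytes]) -> List[Token]:
    """Sending side: replace chunks already sent earlier with short references."""
    tokens: List[Token] = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            tokens.append(("ref", digest))
        else:
            seen.add(digest)
            tokens.append(("raw", chunk))
    return tokens


def decode(tokens: List[Token], cache: Dict[bytes, bytes]) -> bytes:
    """Receiving side: rebuild the data, remembering every raw chunk received."""
    parts: List[bytes] = []
    for kind, payload in tokens:
        if kind == "raw":
            cache[hashlib.sha256(payload).digest()] = payload
            parts.append(payload)
        else:
            parts.append(cache[payload])
    return b"".join(parts)


def wire_size(tokens: List[Token]) -> int:
    """Payload bytes that would cross the WAN (token framing ignored)."""
    return sum(len(payload) for _, payload in tokens)


sender_seen: Set[bytes] = set()
receiver_cache: Dict[bytes, bytes] = {}
backup = b"a" * CHUNK + b"b" * CHUNK  # hypothetical branch-office backup

first = encode(backup, sender_seen)
first_copy = decode(first, receiver_cache)
second = encode(backup, sender_seen)  # the same data sent again later
second_copy = decode(second, receiver_cache)
print(wire_size(first), "bytes sent first time,", wire_size(second), "bytes on repeat")
```

In this sketch the repeat transfer carries only two 32-byte references instead of 8KB of data. The reduction seen in practice depends entirely on how repetitive the traffic is.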

Security

A number of questions have been asked about the security of data linked to SIS. In a sense, SIS adds its own brand of enhancements in the form of an additional layer of security.

Because an SIS data repository is not based on standard formatting principles, a hacker would first have to understand the very complex mapping of the hard drives before gaining access to meaningful information.

Moreover, because the SIS data repository is based on the same concept as a database, anyone with malicious intent who had not gained access to the key, or 'schema', would find it impossible to reconstitute the database from the hacked data - which would be rendered 'unreadable'.

If data compression or encryption were added, security would be taken to an even higher level.

* Mike Hamilton is MD of Channel Data.
