Safeguarding dying data formats

By Reuters
Saanen, Switzerland, 19 May 2010

In a secret bunker deep in the Swiss Alps, European researchers have deposited a "digital genome" that will provide the blueprint for future generations to read data stored using defunct technology.

Accompanied by burly security guards in black uniforms, scientists carried a time capsule through a labyrinth of tunnels and five security zones to a vault near the slopes of chic ski resort Gstaad.

The sealed box containing the key to unpick defunct digital formats will be locked away for the next quarter of a century behind a 3-1/2-tonne door strong enough to resist nuclear attack at the data storage facility, known as the Swiss Fort Knox.

"Einstein's notebooks you can take down off the shelf and read them today. Roll forward 50 years and most of Stephen Hawking's notes will likely only be stored digitally and we might not be able to access them all," said the British Library's Adam Farquhar, one of two computer scientists and archivists entrusted with transferring the capsule.

The capsule is the culmination of the four-year "Planets" project, which draws on the expertise of 16 European libraries, archives and research institutions to preserve the world's digital assets as hardware and software are superseded at a blistering pace.

"The time capsule being deposited inside Swiss Fort Knox contains the digital equivalent of the genetic code of different data formats, a 'digital genome'," said the grey-bearded Farquhar, coordinator of the 15 million-euro project.

"I can't even read my own dissertation anymore except in paper form, because we didn't have anything like this when I wrote it," he said.

Around 100GB of data - equivalent to 24 tonnes of books - has already been created for every single individual on the planet, ranging from holiday snaps to health records, project organisers said, adding that this amounted to over one trillion CDs' worth of data across the globe.

But as technological breakthroughs help people to live longer, the lifespan of technology gets shorter, meaning the European Union alone loses digital information worth at least three billion euros every year, they said.

Studies suggest common data storage formats like CDs and DVDs only last 20 years, while digital file formats have a life expectancy of just five to seven years. Hardware even less.

"Unlike hieroglyphics carved in stone or ink on parchment, digital data has a shelf life of years not millennia," said Andreas Rauber, a professor at the University of Technology of Vienna, which is a partner in the project.

"Failure to implement adequate digital preservation measures now could cost us billions in the future," Rauber said, adding that the project had made open-use software available online to enable people to decipher data stored in defunct formats.

Without supporting software and compatible operating systems, knowing what is on a disc, let alone reading the files, will be impossible, Farquhar said.

The project hopes to preserve "data DNA," the information and tools to access and read historical digital material and prevent digital memory loss into the next century.

"If we can nail the next 100 years, we figure we will be able to nail the next 100 years as well," Farquhar said.

This could have uses for countless different organisations, from pharmaceutical companies trying to access test data decades from now to aerospace companies checking design details of planes built to fly for 30 or 40 years.

People will be puzzled at what they find when they open the time capsule, said Rauber.

"In 25 years people will be astonished to see how little time must pass to render data carriers unusable because they break or because you don't have the devices anymore," he said. "The second shock will probably be what fraction of the objects we can't use or access in 25 years and that's hard to predict."
