Open source code to be preserved for 1 000 years
Today’s open source software is going to live forever.
Well, if not forever, then for at least the next 1 000 years if GitHub and its partners – the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, the Arctic World Archive, Microsoft Research, the Bodleian Library, and Stanford Libraries – have their way.
Through its Archive Program, GitHub has embarked on an ambitious project to preserve the best of today’s open source software – what it terms “priceless knowledge” – in a way that will ensure it is always available to future generations, regardless of what might happen.
In announcing the Archive Program, GitHub acknowledged that code regarded as vital today would probably become tomorrow’s curiosity. It could well be abandoned, forgotten or lost with very little impact on the world. Or not.
However, according to GitHub, there is also “a long history of lost technologies from which the world would have benefitted, as well as abandoned technologies which found unexpected new uses.”
There is also a possibility – albeit a much less likely one – that some sort of global catastrophe could wipe out everything stored on modern, “ephemeral” media: hard drives and SSDs, CDs that are good for a few decades, and backup tapes that may last 30 years, provided they are kept in archives with strictly controlled temperature and humidity.
“Archiving software across multiple organisations and forms of storage will help ensure its long-term preservation: online archivists call this ‘LOCKSS’ for Lots Of Copies Keeps Stuff Safe,” GitHub stated.
GitHub aims to follow this principle by storing multiple copies of today’s software on an ongoing basis, across various data formats and locations.
This includes one very-long-term archive that’s designed to last for at least 1 000 years – the GitHub Arctic Code Vault. This is a data repository which will be preserved in the Arctic World Archive (AWA), a facility located in a decommissioned coal mine 250 metres deep in the Arctic permafrost of Norway’s Svalbard archipelago, home to the world’s northernmost town, far north of the Arctic Circle.
GitHub plans to capture a snapshot of every active public repository on 2 February 2020 and preserve it in the AWA on 1 600-metre film reels made of silver halides on polyester – a medium that simulation tests indicate has a lifespan of 1 000 years. The reels, which can be read with a magnifying glass, will be stored in steel-walled containers inside a sealed chamber in the mine, where historical and cultural data from Italy, Brazil, Norway, the Vatican and many others are also kept. The archipelago is also home to the Svalbard Global Seed Vault, which holds more than 900 000 seed samples from the world’s most valuable plants and crops.
Oxford University’s Bodleian Library will provide redundancy for the Arctic Code Vault by keeping GitHub’s 10 000 most valued repositories in its depository on duplicate film reels.
In addition, the GitHub Archive Program is partnering with Microsoft’s Project Silica to ultimately archive all active public repositories for over 10 000 years by writing them into quartz glass platters using a femtosecond laser.
Skeptics argue that it’s unlikely anyone in 1 000 years would find any value in open source software from the 21st century.
However, GitHub counters that future historians will be able to learn about us from open source projects and their metadata. In addition, because hardware tends to outlast most of today’s storage media, a time could come when working modern computers exist but there is no software to run on them.
In the nearer term, storing data with multiple partners will give people whose access to GitHub might otherwise be restricted other ways to reach public code.
“If GitHub were to become unavailable in any location… those affected could access public code for their projects using the Internet Archive and Software Heritage Foundation,” GitHub concluded.