Breakthroughs in using DNA to store data could shrink an archive that today would fill a large supermarket down to the size of a sugar cube.
Technology companies routinely build sprawling data centres to store all the baby pictures, financial transactions, funny cat videos and email messages their users hoard. But a new technique developed by researchers at the University of Washington (UW) and Microsoft could shrink the space needed to store that data to a fraction of today’s requirements.
The team of computer scientists and electrical engineers has detailed one of the first complete systems to encode, store and retrieve digital data using DNA molecules, which can store information millions of times more compactly than current archival technologies.
In one experiment, the team successfully encoded digital data from four image files into the nucleotide sequences of synthetic DNA snippets.
More significantly, they were also able to reverse that process, retrieving the correct sequences from a larger pool of DNA and reconstructing the images without losing a single byte of information.
“Life has produced this fantastic molecule called DNA that efficiently stores all kinds of information about your genes and how a living system works – it’s very, very compact and very durable,” said co-author Luis Ceze, UW Associate Professor of Computer Science and Engineering.
“We’re essentially repurposing it to store digital data – pictures, videos, documents – in a manageable way for hundreds or thousands of years.”
Photo: Lee Organick, a UW computer science and engineering research scientist, mixes DNA samples for storage. Each tube contains a digital file, which might be a picture of a cat or a Tchaikovsky symphony.
The digital universe – all the data contained in our computer files, historic archives, movies, photo collections and the exploding volume of digital information collected by businesses and devices worldwide – is expected to hit 44 trillion GB (44 zettabytes) by 2020.
That’s a tenfold increase over 2013, and enough data to fill more than six stacks of tablet computers stretching to the moon. While not all of that information needs to be saved, the world is producing data faster than it can build the capacity to store it. DNA molecules can store information many millions of times more densely than existing storage technologies such as flash drives, hard drives, and magnetic and optical media.
Those technologies also degrade after a few years or decades, whereas DNA can reliably preserve information for centuries. Because reading the data back requires sequencing the DNA, the approach is best suited to archival applications rather than to files that need to be accessed immediately.
The team, from the Molecular Information Systems Lab in the UW Electrical Engineering Building, is working in close collaboration with Microsoft Research to develop a DNA-based storage system that it expects could meet the world’s archival storage needs.
First, the researchers developed a novel approach to convert the long strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences: adenine, guanine, cytosine and thymine.
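The basic idea of turning ones and zeroes into the four bases can be illustrated with the simplest possible scheme: two bits per nucleotide. This is a minimal sketch for illustration only, not the researchers’ method. The mapping table (00→A, 01→C, 10→G, 11→T) is an assumption, and practical DNA storage codes add constraints this sketch ignores, such as avoiding long runs of the same base, which are hard to synthesise and sequence accurately.

```python
# Hypothetical 2-bits-per-base mapping, chosen for illustration only.
# Real DNA storage encodings are more elaborate (e.g. they avoid repeated bases).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Reverse encode(): map each base back to two bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)  # CAGACGGC
    assert decode(strand) == b"Hi"
```

With this mapping, each byte becomes exactly four bases, so the round trip is lossless as long as no base is misread.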
The digital data is chopped into pieces and stored by synthesising a massive number of tiny DNA molecules, which can be dehydrated or otherwise preserved for long-term storage. The UW and Microsoft researchers are also one of two teams nationwide to have demonstrated ‘random access’: identifying and retrieving the correct sequences from this large, unordered pool of DNA molecules, a task similar to reassembling one chapter of a story from a library of torn books.
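To make the stored data accessible later, the researchers also encode the equivalent of zip codes and street addresses into the DNA sequences: the zip code identifies which file a molecule belongs to, and the street address gives its position within that file. Using DNA sequencing, the researchers can then read the retrieved molecules and use the ‘street addresses’ to put the pieces back in order, converting them into the original digital files.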
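The chopping, addressing and random-access steps described above can be sketched in a few lines. This is a toy model under stated assumptions, not the team’s system: each molecule is a plain string laid out as zip code + street address + payload, the zip codes (`ACGTAC`, `TGCATG`), the helper names and the tiny chunk sizes are all hypothetical, and error correction is omitted. In the lab, selecting molecules by zip code is a chemical step; here it is modelled as a simple prefix filter.

```python
import random

# Hypothetical layout and sizes, chosen for illustration only.
PAYLOAD_BASES = 8  # data bases carried by each molecule
ADDRESS_BASES = 4  # bases for the "street address" (chunk index, up to 256 chunks)
BASES = "ACGT"


def to_bases(value: int, length: int) -> str:
    """Write a non-negative integer as a fixed-length base-4 string over ACGT."""
    digits = []
    for _ in range(length):
        value, rem = divmod(value, 4)
        digits.append(BASES[rem])
    if value:
        raise ValueError("value does not fit in the given number of bases")
    return "".join(reversed(digits))


def from_bases(dna: str) -> int:
    """Inverse of to_bases()."""
    value = 0
    for base in dna:
        value = value * 4 + BASES.index(base)
    return value


def store(zip_code: str, data: bytes) -> list[str]:
    """Chop one file into molecules: zip code + street address + payload."""
    dna = "".join(to_bases(byte, 4) for byte in data)  # 4 bases per byte
    return [
        zip_code + to_bases(index, ADDRESS_BASES) + dna[start:start + PAYLOAD_BASES]
        for index, start in enumerate(range(0, len(dna), PAYLOAD_BASES))
    ]


def retrieve(pool: list[str], zip_code: str) -> bytes:
    """Random access: keep molecules with the right zip code, reorder by address."""
    hits = [m[len(zip_code):] for m in pool if m.startswith(zip_code)]
    hits.sort(key=lambda m: from_bases(m[:ADDRESS_BASES]))
    dna = "".join(m[ADDRESS_BASES:] for m in hits)
    return bytes(from_bases(dna[i:i + 4]) for i in range(0, len(dna), 4))


if __name__ == "__main__":
    pool = store("ACGTAC", b"cat picture") + store("TGCATG", b"symphony")
    random.shuffle(pool)  # molecules in a tube have no inherent order
    print(retrieve(pool, "ACGTAC"))  # b'cat picture'
```

The sketch assumes all zip codes have the same length, so no zip code can be mistaken for the start of another. Because the street address travels inside every molecule, retrieval does not depend on the order of the pool.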
What does 44 trillion GB of data look like?
Nine trillion CDs
An HD movie running for 1400 million years
A bitmap image 12 times the size of Australia