2. Why DNA?
Stability: DNA is a very stable molecule, especially if it is stored in cold, dry,
and dark conditions. We have found woolly mammoth DNA in colder regions
that has been preserved for thousands of years.
3. Why DNA?
Density: DNA stores data in three dimensions rather than on a flat surface
like a hard drive platter. Data has been stored in DNA at a density of about
700 terabytes per gram, and one gram could eventually hold about two
petabytes, which is equal to about three million CDs. To store 700 TB on hard
drives, you’d need about 233 3 TB drives, weighing a total of 151 kilos. It is
theoretically possible to "store at least 100 million hours of
high-definition video in about a cup of DNA."
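The comparison above is simple arithmetic and can be checked with a short sketch. The 700 TB, 2 PB and 3 TB figures come from the slide; the 700 MB CD capacity and the use of decimal units are assumptions added here for illustration.

```python
import math

TB = 10**12  # decimal terabyte in bytes (assumption: decimal units)
PB = 10**15

# Figures quoted on the slide
demonstrated_bytes = 700 * TB   # density per gram of DNA
projected_bytes = 2 * PB        # possible future density per gram
drive_bytes = 3 * TB            # capacity of one hard drive

# Assumption (not on the slide): one CD holds about 700 MB
cd_bytes = 700 * 10**6

# Whole drives needed to hold 700 TB
drives_needed = math.ceil(demonstrated_bytes / drive_bytes)

# CDs needed to hold 2 PB
cds_equivalent = projected_bytes / cd_bytes

print(f"3 TB drives needed: {drives_needed}")
print(f"CD equivalent: {cds_equivalent / 1e6:.2f} million")
```

Rounding up gives 234 drives, one more than the quoted 233, because 233 drives hold only 699 TB. The CD count comes out just under three million, which matches the slide.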
4. Why DNA?
Redundancy: Artificially synthesized DNA is generally stored as short
redundant fragments, in part due to the errors in synthesizing these
sequences, although this is improving fast. The advent of DNA laser printing
will increase both the speed and accuracy of DNA synthesis.
5. Why DNA?
Price: The Cost per Megabase (million base pairs) graph reflects the
production cost of generating raw, unassembled sequence (reading) data.
The cost per megabase was about $10,000 in 2001 and had fallen to about
10 cents by 2012. Synthesizing artificial sequences is costlier.
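To put the price drop in perspective, the sketch below converts the two quoted data points into an average halving time. It assumes a smooth exponential decline between 2001 and 2012, which the real cost curve does not follow exactly, and it takes roughly 24 months as the Moore's Law doubling period (an assumption, not a figure from the slides).

```python
import math

# Figures quoted on the slide (USD per megabase)
cost_2001 = 10_000.0
cost_2012 = 0.10
years = 2012 - 2001

# Assumption: a commonly cited Moore's Law doubling period
MOORE_MONTHS = 24

fold_drop = cost_2001 / cost_2012              # overall reduction factor
halvings = math.log2(fold_drop)                # number of times the cost halved
halving_time_months = years * 12 / halvings    # average months per halving

print(f"Cost fell about {fold_drop:,.0f}-fold")
print(f"Average halving time: {halving_time_months:.1f} months")
print(f"Faster than a {MOORE_MONTHS}-month doubling: "
      f"{halving_time_months < MOORE_MONTHS}")
```

Under these assumptions the cost halved roughly every eight months on average.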
6. Why not DNA?
Speed: The fastest current technology can sequence (read) DNA at roughly
1 billion bases per hour. Synthesis (writing) is slower still, and more
expensive. That is extremely slow compared to modern storage media, but it
could be acceptable for long-term archival storage. Sequencing speed has
been improving faster than Moore’s Law.
Rewriting: This is essentially a write-once technology, but static data like
government and historical records could benefit from this storage option. The
speed at which DNA can be sequenced or synthesized is slow, but perhaps
someday the speed will be practical.
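A rough sense of how slow this is: the sketch below converts the quoted read rate into bytes per second. It assumes one bit per base (the mapping used in the Harvard work on slide 9) and ignores redundancy and addressing overhead, so real throughput would be lower.

```python
# Figure quoted on the slide
bases_per_hour = 1_000_000_000

# Assumption: one bit per base, as in the Harvard scheme on slide 9
bits_per_base = 1

bytes_per_second = bases_per_hour * bits_per_base / 8 / 3600

# Time for a single sequencer to read back 700 TB (density figure from slide 3)
archive_bits = 700 * 10**12 * 8
hours_to_read = archive_bits / (bases_per_hour * bits_per_base)
years_to_read = hours_to_read / (24 * 365)

print(f"Read throughput: {bytes_per_second / 1000:.1f} KB/s")
print(f"Years to read 700 TB on one machine: {years_to_read:.0f}")
```

Under these assumptions one sequencer reads about 35 KB/s, so reading a full gram's worth of data would rely on massive parallelism.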
7. Cool Facts about DNA
DNA is composed of two strands, which run in opposite directions. The
backbone of each strand is a repeating sugar-phosphate chain, and the
actual coding sequence is carried by four nitrogenous bases. Bases on
opposite strands pair through hydrogen bonds, which hold the double helix
together. The four bases are adenine, thymine, guanine, and cytosine. A
always pairs with T and G always pairs with C.
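Because the pairing rule is fixed and the strands run in opposite directions, one strand fully determines the other. A minimal sketch (the function name is made up for illustration):

```python
# Watson-Crick pairing: A with T, G with C
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own direction.

    The strand is reversed because the two strands are antiparallel,
    then each base is swapped for its pairing partner.
    """
    return "".join(PAIR[base] for base in reversed(strand))


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is why either strand alone is enough to recover the stored sequence.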
8. More Cool Facts about DNA
● The human genome is approximately 3 billion base pairs in length
(haploid).
● DNA usually exists in a highly compact, coiled state in the cell nucleus,
wrapped around proteins.
● If a single cell's DNA were stretched out to full length it would exceed
two meters in length.
● USB-powered DNA sequencing devices can be bought for less than $1,000
and can plug right into your laptop.
● Modern sequencing technology includes protein nanopore, graphene,
and microfluidic silicon chip nanotechnologies.
● Traditional DNA synthesis has a 99 percent accuracy rate per 100 base
pairs, which is problematic and requires redundancy. But new DNA laser
printing technology can possibly achieve a 100 percent accuracy rate,
hypothetically at any strand length.
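The last bullet explains why redundancy matters. The sketch below reads the 99 percent figure as one wrong base per 100, which is only one possible interpretation of the slide, and it assumes errors are independent. It compares a single 100-base strand with a per-base majority vote over three copies.

```python
# Assumption: "99 percent accuracy per 100 pairs" read as a 1% per-base error rate
p_error = 0.01
strand_length = 100

# Chance that a single strand has no errors at all
p_clean_single = (1 - p_error) ** strand_length

# Majority vote over three independent copies: a position is called wrong
# only if at least two copies are wrong there. This is an upper bound,
# since two wrong copies will not always agree with each other.
p_error_voted = 3 * p_error**2 * (1 - p_error) + p_error**3
p_clean_voted = (1 - p_error_voted) ** strand_length

print(f"Error-free, single copy:      {p_clean_single:.3f}")
print(f"Error-free, 3-copy consensus: {p_clean_voted:.3f}")
```

Under these assumptions only about 37 percent of single strands come out perfect, while a three-copy consensus is correct about 97 percent of the time.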
9. What has been done
The European Bioinformatics Institute encoded a 26-second snippet of Martin Luther King Jr.'s 1963
"I Have a Dream" speech, a .jpg, a .pdf of the seminal 1953 paper by Watson and Crick
describing the structure of DNA, a .txt file containing all of Shakespeare's sonnets, and a file about
the encoding system itself (a total equivalent on a computer drive to about 760 kilobytes). Their
method stores multiple copies of overlapping fragments, with each fragment also carrying some
indexing details that identify where in the overall sequence it should sit. This builds redundancy
into the system, meaning that if some fragments become corrupted, the data will not be lost. This
is similar to the way the human genome was actually sequenced: multiple fragments from
multiple copies of the genome were sequenced to ensure the entire genome had coverage.
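The idea of overlapping, indexed fragments can be sketched in a few lines. This is a toy illustration, not the Institute's actual encoding: the fragment size, step, and function names are made up, and the index is just the starting offset of each fragment.

```python
def fragment(seq, size=8, step=2):
    """Split seq into overlapping (offset, piece) fragments.

    Each fragment carries its starting offset as an index.
    """
    last = max(len(seq) - size, 0)
    starts = list(range(0, last, step)) + [last]  # always cover the tail
    return [(start, seq[start:start + size]) for start in starts]


def reassemble(fragments, length):
    """Rebuild the sequence from whichever fragments survived."""
    out = [None] * length
    for offset, piece in fragments:
        for k, base in enumerate(piece):
            out[offset + k] = base
    if None in out:
        raise ValueError("coverage gap: too many fragments were lost")
    return "".join(out)


original = "ACGTTGCAAGCTTAGGCATC"
pieces = fragment(original)

# Simulate losing two fragments; their neighbours still cover the gap
survivors = [p for i, p in enumerate(pieces) if i not in (1, 4)]
print(reassemble(survivors, len(original)) == original)  # True
```

Positions near the two ends are covered by fewer fragments in this toy version, so losing the first or last fragment would leave a gap.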
At Harvard University, strands of DNA that store 96 bits were synthesized, with each of the bases
(TGAC) representing a binary value (thymine and guanine = 1, adenine and cytosine = 0). To aid
with sequencing, each strand of DNA has a 19-bit address block at the start so a whole vat of DNA
can be sequenced out of order, and then sorted into usable data using the addresses. They used
microfluidic silicon chips, which detect the hydrogen ions released as bases are added and
sequence DNA in a massively parallel way. There are numerous other technologies, including
nanopores (proteins from bacteria), which can sequence a single strand of DNA as it runs through
them (albeit slowly). Breakthroughs using graphene (a one-atom-thick carbon sheet) make it
possible to do the same thing, and perhaps at higher speeds.
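The addressing scheme described above can be sketched as follows. The 19-bit address, 96-bit payload, and bit-to-base mapping come from the slide. The rule for choosing between the two bases available for each bit is an assumption made here to keep the example deterministic, and all function names are hypothetical.

```python
import random

ADDRESS_BITS = 19   # from the slide
PAYLOAD_BITS = 96   # from the slide
BIT_OF = {"A": "0", "C": "0", "G": "1", "T": "1"}
CHOICES = {"0": "AC", "1": "GT"}


def bits_to_bases(bits):
    """Map bits to bases; assumption: never repeat the previous base."""
    bases = []
    for bit in bits:
        first, second = CHOICES[bit]
        bases.append(second if bases and bases[-1] == first else first)
    return "".join(bases)


def encode(data: bytes):
    """Split data into addressed strands of 19 + 96 bases."""
    bits = "".join(f"{byte:08b}" for byte in data)
    strands = []
    for index, start in enumerate(range(0, len(bits), PAYLOAD_BITS)):
        payload = bits[start:start + PAYLOAD_BITS].ljust(PAYLOAD_BITS, "0")
        address = f"{index:0{ADDRESS_BITS}b}"
        strands.append(bits_to_bases(address + payload))
    return strands


def decode(strands, n_bytes):
    """Sort strands by address and rebuild the original bytes."""
    blocks = {}
    for strand in strands:
        bits = "".join(BIT_OF[base] for base in strand)
        blocks[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    bits = "".join(blocks[i] for i in sorted(blocks))
    return bytes(int(bits[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))


message = b"DNA stores data densely, if slowly."
strands = encode(message)
shuffled = random.sample(strands, len(strands))  # strands come back out of order
print(decode(shuffled, len(message)) == message)  # True
```

Because every strand carries its own address, the order in which the strands are read does not matter.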
10. Designing and synthesizing DNA with modern
software is a reality, and DNA laser printing
will soon be commonplace.