SlideShare une entreprise Scribd logo
1  sur  12
Télécharger pour lire hors ligne
Data Storage in DNA
SUBMITTED BY
Aditya Nag (34401217057)
Shibham Saha (34401217015)
BACHELOR OF COMPUTER APPLICATIONS
UNDER GUIDANCE OF
Mr. Nantu Shaw
INSPIRIA KNOWLEDGE CAMPUS, SILIGURI
* 2019-2020 *
ACKNOWLEDGEMENT
With immense please we, Mr.Aditya Nag, and Mr. Shibham
Saha, are presenting “Data Storage in DNA” seminar report as
part of the curriculum of “Under Graduate degree course in
Computer Applications”. We wish to thank all the people who
gave us unending support.
We express our profound thanks to Head of Department Mr.
Nantu Shaw, seminar guide Mr. Nantu Shaw and all those
who have indirectly guided and helped us in preparation of
this seminar.
Aditya Nag (34401217057)
Shibham Saha (34401217015)
INSPIRIA KNOWLEDGE CAMPUS, SILIGURI
BACHELOR OF COMPUTER APPLICATIONS
CERTIFICATE
This is to certify that Mr. Aditya Nag bearing Roll No. 34401217057,
and Shibham Saha bearing Roll No. 34401217015, students of
Bachelor of Computer Applications, have successfully completed
seminar on
Data Storage in DNA
to my satisfaction and submitted the same during the academic year
2019-2020 towards the partial fulfilment of undergraduate degree
course under the department of Computer Applications, Inspiria
Knowledge Campus, Siliguri.
Mr.___________________ Mr. ___________________
Seminar Guide HOD (Bachelor of Computer Applications)
Table of Contents
Topic page
1.Introduction 1-2
2.History 2-3
3.DNA Storage System 3
4.Proposed System 3
5.Working of DNA Digital Data 4-5
6.Assumptions and Dependencies 6
7.Conclusion 6
8.Reference 7
1 | P a g e
1.INTRODUCTION
The demand for data storage devices is increasing day by day as more and more data is
generated every single day. Total information in digital format in the year 2012 was about
2.7 Zettabytes. Presently devices such as optical discs, portable hard drives, pen drives and
flash drives are used to store data. But silicon and the other non-biodegradable materials
used in data storage pollute the environment vigorously. Also, they are available in limited
quantities. Thus, they would be exhausted one certain day. The linear density of digital
storage device is 10 kb per square mm. Hence, newer technology is needed for data storage
and storage purposes. As the data increases, the current data storage technology would not
be enough to store data in future, as data is growing every single day. Even potentially
important information can get lost due to lack of storage. One of the most common causes
of data loss is accidental deletion of files without any backup. Every day many people lose
their important data because of deleting files accidentally, because they do not have proper
backup systems. Poor handling of the optical disk can causedataloss in them. Data loss can
happen due to damage to the hard drive. Mechanical damage to the hard drive is common as
it contains a lot of damageable parts moving at very high speed. Hard drives can get
damaged due to accidental drop of computers. Hard drives can get damaged if any liquid
enters it. Liquids can cause damage to electronic parts of drive making it very difficult to
recover data. The hard disk can get damaged due to fire accidents. Solid State Drive has a
limited number of write cycles. Thus after write cyclelimit,it is not possible to write data on
them. A printed book has better life expectancy than best of the data storage method.
There are many ways to backup the data. One can use cloud services to store data. But to
access data which is stored in a remote cloud, an internet connection is needed all the time.
Thus without an internet connection, it is not possible to access the data which is stored in
the cloud. Another way is to store data on an external drive. But external drives are prone
data loss too. Scientists and Researchers for over the past decade have been trying to develop
a robust way of storing data on a medium which is dense, robust and ever- lasting. They are
sticking to the storage medium which is used by nature that is DNA. There are many reasons
to use DNA as the storage medium such as small size and high density. Just 1 gram of dry
DNA can store about 455 Exabyte of data. Thus, data on DNA can be conveniently stored.
The power usage required while working with DNA is a very little compared to a
conventional storage. Even the error rate of DNA storage is much less than normal storage
device.
Fig 1: Nucleotides of DNA
DNA is a very robust material and it has a long shelf life. The information stored in DNA
2 | P a g e
can be recovered even after thousands of years. As long as the DNA is stored in dry, dark
and cold conditions, DNA can be stored for a long time. By using Polymerase Chain
Reaction techniques, it is possibleto get as many copies as required. Thus, copying of data
can be done easily and many copies of data can be obtained. As DNA can retain information
for centuries, DNA can be used for long-term storage. Due to high density, the DNA can
store a large amount of data in very small space. As in approach, the data is stored in long
virtual DNA molecule but encoding is done using synthetically prepared short DNA strand.
Short strands will allow to easily manipulating data. It is possible to read simultaneously
and randomly read files stored in DNA. Also, compression technique is used to compress
data without any loss. The 4 nucleotides of DNA used in themodel are Adenine which will
be denoted as A, Cytosine as C, Guanine as G and Thymine as T.
2.History
The idea of DNA digital data storage dates back to 1959, when the physicist Richard P.
Feynman, in "There's Plenty of Room at the Bottom: An Invitation to Enter a New Field of
Physics" outlined the general prospects for the creation of artificial objects similar to objects
of the microcosm (including biological) and having similar or even more extensive
capabilities. In 1964-65 Mikhail Samoilovich Neiman, the Soviet physicist, published 3
articles about microminiaturization in electronics at the molecular-atomic level, where
independently presented general considerations and some calculations regarding the
possibility of recording, storage, and retrieval of information on synthesized DNA and RNA
molecules.
One of the earliest uses of DNA storage occurred in a 1988 collaboration between artist Joe
Davis and researchers from Harvard. The image, stored in a DNA sequence in E.coli, was
organized in a 5 x 7 matrix that, once decoded, formed a picture of an ancient Germanic
rune representing life and the female Earth. In the matrix, ones corresponded to dark pixels
while zeros corresponded to light pixels.
In 2007 a device was created at the University of Arizona using addressing molecules to
encode mismatch sites within a DNA strand. These mismatches were then able to be read
out by performing a restriction digest, thereby recovering the data
In 2011, George Church, Sri Kosuri, and Yuan Gao carried out an experiment that would
encode a 659-kb book that was co-authored by Church. To do this, the research team did a
two-to-one correspondence where a binary zero was represented by either an adenine or
cytosine and a binary one was represented by a guanine or thymine. After examination, 22
errors were found in the DNA
In 2013, a software called DNACloud was developed by Manish K. Gupta and co-workers
to encode computer files to their DNA representation. It implements a memory efficiency
version of the algorithm proposed by Goldman et al. to encode (and decode) data to DNA
(.dnac files).
In March 2017, Yaniv Erlich and Dina Zielinski of Columbia University and the New York
Genome Center published a method known as DNA Fountain that stored data at a density
of 215 petabytes per gram of DNA. The technique approaches the Shannon capacity of
DNA storage, achieving 85% of the theoretical limit. The method was not ready for large-
scale use, as it costs $7000 to synthesize 2 megabytes of data and another $2000 to read it
In March 2018, University of Washington and Microsoft published results demonstrating
3 | P a g e
storage and retrieval of approximately 200MB of data. The research also proposed and
evaluated a method for random access of data items stored in DNA.
In March 2019, the same team announced they have demonstrated a fully automated system
to encode and decode data in DNA
In June 2019, scientists reported that all 16 GB of Wikipedia have been encoded
into synthetic DNA
3.DNA STORAGE SYSTEM
We imagine DNA digital data storage as the last level of a deep storing hierarchy, giving very
dense and durable storage with access times of many hours to days. DNA synthesis and
sequencing can be made arbitrarily parallel, making the necessary read and write bandwidths
available. We now detail our proposal of a system for DNA based storage with random access
support.
Fig 2: DNA Storage System
4.PROPOSED SYSTEM
In the model proposed in this paper, DNA is used to store data. In this, a delimiter is used at
the end of each file so that data can be accessed randomly. The data will be encoded using
specialized Huffman tree. If required, each file can be given separate Huffman tree for
encoding which will increase data security along with compressing the data. In the case of
any error in data while encoding, this error is contained in that file only. As Huffman tree is
used for encoding, datacompression is achieved. It provides security as anyone cannot decode
it without the original tree. For sequencing of DNA strand, a lot of specialized equipmentis
needed. So without the required equipment, DNA cannot be read. Maximum of 2 nucleotide
repeats, except for delimiters, are there in DNA. There are 2 copies of all the data. So in the
case of data loss, other copy can be used to retrieve data. This method is flexible and the user
can manipulate the method to suit the needs and store all kinds of data.
4 | P a g e
5.WORKING OF DNA DIGITAL DATA
A digital data in DNA should come across 5 levels and are as follows: Coding, Synthesis,
Storage, Retrieval and Decoding.
Fig 3: Working of DNA Digital Data
A.ENCODING
1. Form frequency table of characters of the data.
2. Now Huffman tree of non-repeating nucleotides for encoding is generated as follows:
• Each node in the tree will have 3 children.
• The weights of branches of children will depend on the incoming weight of parent.
• On the off chance that the weight of incoming branch of a parent is A, at that point C
represents to the leftmost child, G represents to the middle child and T represents
the rightmost child.
• On the off chance that the weight of incoming branch of a parent is C, at that point
G represents the leftmost child, T represents the middle child and A represents the
rightmost child.
• On the off chance that the weight of incoming branch of a parent is G, at that point
T represents the leftmost child, A represents the middle child and C represents the
rightmost child.
• On the off chance that the weight of incoming branch of a parent is T, at that point
A represents the leftmost child, C represents the middle child and G represents the
rightmost child.
• T will be considered to be an incoming weight for root.
3. Now split the whole data into overlapping segments of
100 nucleotides with an offset of 50 nucleotides from previous.
4. Form pairs of segments starting from the 1st segment.
1. Index each pair from 0 to 107 and after 107, start from 0 again.
2. Reverse complement 2nd segment in each pair.
3. The index will be of 4 nucleotides long. The index is encoded by a combination of
nucleotides in a sequence of A, C, G, T such that no 2 consecutive nucleotides same.
5 | P a g e
Example: 0=ACAC, 1=ACAG, 2=ACAT.
4. Prepend A and append C to the 1st segment of the pair.
5. Prepend T and append G to the 2nd segment of the pair.
6. Each segment is now synthesized to actual DNA strand of length 106 nucleotides. If the
length of the code of a character is1,then to avoid repetition ofnucleotides,1more nucleotide
is added in the code of the character.
B. DECODING
1. The decoding process is simply the reverse of the encoding process.
2. The 1st nucleotide of DNA will tell whether the DNA is the 1st or 2nd segment of the pair or
whether the data is reverse complemented or not and directionality of strand.
3. If 1st nucleotide is A then:
• Remove 1st nucleotide.
• Next 4 nucleotides will tell us about segment number.
• Next 100 nucleotides will be data.
• The last nucleotide can be used for confirmation of the type of segment.
4. If 1st nucleotide is C then:
• Reverse whole segment.
• Remove 1st nucleotide.
• Next 4 nucleotides will tell us about segment number.
• Next 100 nucleotides will be data.
• The last nucleotide can be used for confirmation of the type of segment.
5. If 1st nucleotide is G then:
• Reverse whole segment.
• Remove 1st nucleotide.
• Next 4 nucleotides will tell us about segment number.
• Reverse complements next 100 nucleotides.
• These 100 nucleotides will now be data.
• The last nucleotide can be used for confirmation of the type of segment.
6. If 1st nucleotide is T then:
• Remove 1st nucleotide.
• Next 4 nucleotides will tell us about segment number.
• Reverse complements next 100 nucleotides.
• These 100 nucleotides will now be data.
• The last nucleotide can be used for confirmation of the type of segment.
7. If TTTT sequence is found, this will denote the end of the file. The new character will
start from next nucleotide.
8. Now by using the same Huffman tree, data can convert the data into original characters. It
is possible to generate different Huffman tree for different files or single Huffman tree for
6 | P a g e
whole data. This will compress the data and decoding cannot be done unless one has the
original tree. As specific orientation nucleotides have been used in the strands, it is possible
to read double number segmentsin the same number of indexes. The user can read the strand
from any direction.
6.ASSUMPTIONS AND DEPENDENCIES
a) Advantages:
DNA has many advantages for storing digital data.
• It is ultra-compact.
• It can last hundreds of thousands of years if kept in a cool, dry place.
• As long as human societies are reading and writing DNA, they will be able to decode it.
• DNA won’t degrade over time like cassette tapes and CDs, and it won’t become
obsolete.
b) Disadvantages:
• High cost.
• DNA is significantly harder and slower to read than conventional computer transistors
i.e., in terms of access speed it is actually less RAM-like than our average computer
SSD or spinning magnetic hard-drive.
7. CONCLUSION
DNA-based storage has the potential to be the ultimate archival storage solution. It is
extremely dense and durable. Thus using DNA for data storage, it is possible to store huge
amount of data in very less size. As DNA can hold data for many years, it is conceivable to
store data for quite a while. By utilizing this strategy, data is packed and the security to the
data is given. Parallel reading of files is also possible, enabling users to read multiple files
at the same time. This technique maintains two copies of data. Hencein caseof data damage,
its copy can be used to read data. In the case of any errors, while encoding the data the error
is restricted tothat particular file and no other file is affected due to that error. This technique
can be used for all kind of files by making minor changes to adapt to the type of file. This
technique can be used to store big data in very small space with little computational
overhead. This method is scalable and can be used to store large files too. Also, multiple
copies can be made easily. This method can be used to store information in archival systems
or big data. Instead of using conventional storage devices which have less capacity to store
data, DNA based storage method can be used in distant future to store data in a secured
manner and for a long time storage andcan solve the problem of limited space.
7 | P a g e
8.REFERENCES
[1] M. Burrows and D. J. Wheeler, “A block-sorting lossless data compression algorithm,”
Digital System Research Center, USA, 1994.
[2] ]G.C.Smith, C.C.Fiddes, J.P.Hawkins, and J.P.L.Cox, “Some possible codes for
encrypting data in DNA”, 2003.
[3] Wikipedia on Data Storage in DNA
[4] Blog article from geeksforgeeks
8 | P a g e

Contenu connexe

Tendances

Datastorage in DNA
Datastorage in DNADatastorage in DNA
Datastorage in DNAAditya Nag
 
DNA as memory storage device
DNA as memory storage deviceDNA as memory storage device
DNA as memory storage deviceKiran Gajare
 
DNA Storage at AGBT 2018
DNA Storage at AGBT 2018DNA Storage at AGBT 2018
DNA Storage at AGBT 2018Yaniv Erlich
 
3d optical data storage ppt
 3d optical data storage ppt 3d optical data storage ppt
3d optical data storage pptMuhammedOwaiz
 
Protein memory
Protein memory Protein memory
Protein memory ajay2604
 
Holographic data Storage
Holographic data StorageHolographic data Storage
Holographic data StorageZaraMudassir
 
3D Optical Data Storage
3D Optical Data Storage 3D Optical Data Storage
3D Optical Data Storage Shobha Rani
 
Dna computing
Dna computingDna computing
Dna computingsathish3
 
Holographic Memory
Holographic MemoryHolographic Memory
Holographic Memorysajayonline
 
Millipede presentation
Millipede presentationMillipede presentation
Millipede presentationAbhishek Gupta
 
Holographic Data Storage
Holographic Data StorageHolographic Data Storage
Holographic Data StorageLikan Patra
 
Difference between HDD & SSD
Difference between HDD & SSDDifference between HDD & SSD
Difference between HDD & SSDmohibullah rahimi
 
3D OPTICAL DATA STORAGE
3D OPTICAL DATA STORAGE3D OPTICAL DATA STORAGE
3D OPTICAL DATA STORAGERishikese MR
 
Biological computers
Biological computers Biological computers
Biological computers AnandhuV2
 

Tendances (20)

Datastorage in DNA
Datastorage in DNADatastorage in DNA
Datastorage in DNA
 
DNA Storage
DNA StorageDNA Storage
DNA Storage
 
DNA as memory storage device
DNA as memory storage deviceDNA as memory storage device
DNA as memory storage device
 
DNA Storage at AGBT 2018
DNA Storage at AGBT 2018DNA Storage at AGBT 2018
DNA Storage at AGBT 2018
 
cloud storage
cloud storagecloud storage
cloud storage
 
3d optical data storage ppt
 3d optical data storage ppt 3d optical data storage ppt
3d optical data storage ppt
 
Holographic memory
Holographic memoryHolographic memory
Holographic memory
 
Protein memory
Protein memory Protein memory
Protein memory
 
Holographic data Storage
Holographic data StorageHolographic data Storage
Holographic data Storage
 
3D Optical Data Storage
3D Optical Data Storage 3D Optical Data Storage
3D Optical Data Storage
 
Dna computing
Dna computingDna computing
Dna computing
 
Holographic memory
Holographic memoryHolographic memory
Holographic memory
 
Holographic Memory
Holographic MemoryHolographic Memory
Holographic Memory
 
Millipede presentation
Millipede presentationMillipede presentation
Millipede presentation
 
Ppt 5d
Ppt 5dPpt 5d
Ppt 5d
 
Holographic Data Storage
Holographic Data StorageHolographic Data Storage
Holographic Data Storage
 
Difference between HDD & SSD
Difference between HDD & SSDDifference between HDD & SSD
Difference between HDD & SSD
 
3D OPTICAL DATA STORAGE
3D OPTICAL DATA STORAGE3D OPTICAL DATA STORAGE
3D OPTICAL DATA STORAGE
 
Biological computers
Biological computers Biological computers
Biological computers
 
DNA Based Computing
DNA Based ComputingDNA Based Computing
DNA Based Computing
 

Similaire à Data Storage in DNA Documentation

Datastorage in DNA
Datastorage in DNADatastorage in DNA
Datastorage in DNAAditya Nag
 
Digital data storage in DNA
Digital data storage in DNADigital data storage in DNA
Digital data storage in DNASharath Raj
 
Dna storage
Dna storageDna storage
Dna storageCareerIn
 
Liquid Steganography presentation.pptx
Liquid Steganography  presentation.pptxLiquid Steganography  presentation.pptx
Liquid Steganography presentation.pptxChandniA5
 
Hafiz sarfraz ali presentation final
Hafiz sarfraz ali presentation finalHafiz sarfraz ali presentation final
Hafiz sarfraz ali presentation finalRaza Umer
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesGuy Coates
 
Data Deduplication: Venti and its improvements
Data Deduplication: Venti and its improvementsData Deduplication: Venti and its improvements
Data Deduplication: Venti and its improvementsUmair Amjad
 
Tape Storage Future Directions and the Data Explosion
Tape Storage Future Directions and the Data ExplosionTape Storage Future Directions and the Data Explosion
Tape Storage Future Directions and the Data ExplosionIBM India Smarter Computing
 
Dna computing
Dna computing Dna computing
Dna computing busyking03
 
A Study on DNA based Computation and Memory Devices
A Study on DNA based Computation and Memory DevicesA Study on DNA based Computation and Memory Devices
A Study on DNA based Computation and Memory DevicesEditor IJCATR
 
Digital Forensic is a part of forensic that focuses on investigati.docx
Digital Forensic is a part of forensic that focuses on investigati.docxDigital Forensic is a part of forensic that focuses on investigati.docx
Digital Forensic is a part of forensic that focuses on investigati.docxcuddietheresa
 
Secondary Storage
Secondary StorageSecondary Storage
Secondary Storagepy7rjs
 

Similaire à Data Storage in DNA Documentation (20)

Dna synopsis
Dna synopsisDna synopsis
Dna synopsis
 
Datastorage in DNA
Datastorage in DNADatastorage in DNA
Datastorage in DNA
 
DNA digital data storage.pptx
DNA digital data storage.pptxDNA digital data storage.pptx
DNA digital data storage.pptx
 
DNA STORAGE
 DNA STORAGE DNA STORAGE
DNA STORAGE
 
Digital data storage in DNA
Digital data storage in DNADigital data storage in DNA
Digital data storage in DNA
 
Dna storage
Dna storageDna storage
Dna storage
 
Liquid Steganography presentation.pptx
Liquid Steganography  presentation.pptxLiquid Steganography  presentation.pptx
Liquid Steganography presentation.pptx
 
Hafiz sarfraz ali presentation final
Hafiz sarfraz ali presentation finalHafiz sarfraz ali presentation final
Hafiz sarfraz ali presentation final
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciences
 
Ack,abs,con,concl,ref
Ack,abs,con,concl,refAck,abs,con,concl,ref
Ack,abs,con,concl,ref
 
Final doc of dna
Final  doc of dnaFinal  doc of dna
Final doc of dna
 
Data Deduplication: Venti and its improvements
Data Deduplication: Venti and its improvementsData Deduplication: Venti and its improvements
Data Deduplication: Venti and its improvements
 
Tape Storage Future Directions and the Data Explosion
Tape Storage Future Directions and the Data ExplosionTape Storage Future Directions and the Data Explosion
Tape Storage Future Directions and the Data Explosion
 
Dna computing
Dna computing Dna computing
Dna computing
 
Dna cryptography
Dna cryptographyDna cryptography
Dna cryptography
 
Dna computing
Dna computingDna computing
Dna computing
 
Genetic data storage
Genetic data storageGenetic data storage
Genetic data storage
 
A Study on DNA based Computation and Memory Devices
A Study on DNA based Computation and Memory DevicesA Study on DNA based Computation and Memory Devices
A Study on DNA based Computation and Memory Devices
 
Digital Forensic is a part of forensic that focuses on investigati.docx
Digital Forensic is a part of forensic that focuses on investigati.docxDigital Forensic is a part of forensic that focuses on investigati.docx
Digital Forensic is a part of forensic that focuses on investigati.docx
 
Secondary Storage
Secondary StorageSecondary Storage
Secondary Storage
 

Plus de Aditya Nag

Customer Experience in a SaaS business
Customer Experience in a SaaS businessCustomer Experience in a SaaS business
Customer Experience in a SaaS businessAditya Nag
 
Home Security System using Arduino & GSM
Home Security System using Arduino & GSM Home Security System using Arduino & GSM
Home Security System using Arduino & GSM Aditya Nag
 
Application Of Python in Medical Science
Application Of Python in Medical ScienceApplication Of Python in Medical Science
Application Of Python in Medical ScienceAditya Nag
 
How To Make Your Windows Fast
How To Make Your Windows Fast How To Make Your Windows Fast
How To Make Your Windows Fast Aditya Nag
 
Documentation on stress management
Documentation on stress managementDocumentation on stress management
Documentation on stress managementAditya Nag
 
Stress Management for Students
Stress Management for StudentsStress Management for Students
Stress Management for StudentsAditya Nag
 
Quiz app android ppt
Quiz app android pptQuiz app android ppt
Quiz app android pptAditya Nag
 
Quiz app (android) Documentation
Quiz app (android) DocumentationQuiz app (android) Documentation
Quiz app (android) DocumentationAditya Nag
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment AnalysisAditya Nag
 

Plus de Aditya Nag (11)

Customer Experience in a SaaS business
Customer Experience in a SaaS businessCustomer Experience in a SaaS business
Customer Experience in a SaaS business
 
Home Security System using Arduino & GSM
Home Security System using Arduino & GSM Home Security System using Arduino & GSM
Home Security System using Arduino & GSM
 
Application Of Python in Medical Science
Application Of Python in Medical ScienceApplication Of Python in Medical Science
Application Of Python in Medical Science
 
Insync
Insync Insync
Insync
 
How To Make Your Windows Fast
How To Make Your Windows Fast How To Make Your Windows Fast
How To Make Your Windows Fast
 
Documentation on stress management
Documentation on stress managementDocumentation on stress management
Documentation on stress management
 
Stress Management for Students
Stress Management for StudentsStress Management for Students
Stress Management for Students
 
Quiz app android ppt
Quiz app android pptQuiz app android ppt
Quiz app android ppt
 
Quiz app (android) Documentation
Quiz app (android) DocumentationQuiz app (android) Documentation
Quiz app (android) Documentation
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
5 g
5 g5 g
5 g
 

Dernier

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 

Dernier (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 

Data Storage in DNA Documentation

  • 1. Data Storage in DNA SUBMITTED BY Aditya Nag (34401217057) Shibham Saha (34401217015) BACHELOR OF COMPUTER APPLICATIONS UNDER GUIDANCE OF Mr. Nantu Shaw
  • 2. INSPIRIA KNOWLEDGE CAMPUS, SILIGURI * 2019-2020 * ACKNOWLEDGEMENT With immense please we, Mr.Aditya Nag, and Mr. Shibham Saha, are presenting “Data Storage in DNA” seminar report as part of the curriculum of “Under Graduate degree course in Computer Applications”. We wish to thank all the people who gave us unending support. We express our profound thanks to Head of Department Mr. Nantu Shaw, seminar guide Mr. Nantu Shaw and all those who have indirectly guided and helped us in preparation of this seminar. Aditya Nag (34401217057) Shibham Saha (34401217015)
  • 3. INSPIRIA KNOWLEDGE CAMPUS, SILIGURI BACHELOR OF COMPUTER APPLICATIONS CERTIFICATE This is to certify that Mr. Aditya Nag bearing Roll No. 34401217057, and Shibham Saha bearing Roll No. 34401217015, students of Bachelor of Computer Applications, have successfully completed seminar on Data Storage in DNA to my satisfaction and submitted the same during the academic year 2019-2020 towards the partial fulfilment of undergraduate degree course under the department of Computer Applications, Inspiria Knowledge Campus, Siliguri. Mr.___________________ Mr. ___________________ Seminar Guide HOD (Bachelor of Computer Applications)
  • 4. Table of Contents Topic page 1.Introduction 1-2 2.History 2-3 3.DNA Storage System 3 4.Proposed System 3 5.Working of DNA Digital Data 4-5 6.Assumptions and Dependencies 6 7.Conclusion 6 8.Reference 7
  • 5. 1 | P a g e 1.INTRODUCTION The demand for data storage devices is increasing day by day as more and more data is generated every single day. Total information in digital format in the year 2012 was about 2.7 Zettabytes. Presently devices such as optical discs, portable hard drives, pen drives and flash drives are used to store data. But silicon and the other non-biodegradable materials used in data storage pollute the environment vigorously. Also, they are available in limited quantities. Thus, they would be exhausted one certain day. The linear density of digital storage device is 10 kb per square mm. Hence, newer technology is needed for data storage and storage purposes. As the data increases, the current data storage technology would not be enough to store data in future, as data is growing every single day. Even potentially important information can get lost due to lack of storage. One of the most common causes of data loss is accidental deletion of files without any backup. Every day many people lose their important data because of deleting files accidentally, because they do not have proper backup systems. Poor handling of the optical disk can causedataloss in them. Data loss can happen due to damage to the hard drive. Mechanical damage to the hard drive is common as it contains a lot of damageable parts moving at very high speed. Hard drives can get damaged due to accidental drop of computers. Hard drives can get damaged if any liquid enters it. Liquids can cause damage to electronic parts of drive making it very difficult to recover data. The hard disk can get damaged due to fire accidents. Solid State Drive has a limited number of write cycles. Thus after write cyclelimit,it is not possible to write data on them. A printed book has better life expectancy than best of the data storage method. There are many ways to backup the data. One can use cloud services to store data. But to access data which is stored in a remote cloud, an internet connection is needed all the time. Thus without an internet connection, it is not possible to access the data which is stored in the cloud. Another way is to store data on an external drive. But external drives are prone data loss too. Scientists and Researchers for over the past decade have been trying to develop a robust way of storing data on a medium which is dense, robust and ever- lasting. They are sticking to the storage medium which is used by nature that is DNA. There are many reasons to use DNA as the storage medium such as small size and high density. Just 1 gram of dry DNA can store about 455 Exabyte of data. Thus, data on DNA can be conveniently stored. The power usage required while working with DNA is a very little compared to a conventional storage. Even the error rate of DNA storage is much less than normal storage device. Fig 1: Nucleotides of DNA DNA is a very robust material and it has a long shelf life. The information stored in DNA
  • 6. 2 | P a g e can be recovered even after thousands of years. As long as the DNA is stored in dry, dark and cold conditions, DNA can be stored for a long time. By using Polymerase Chain Reaction techniques, it is possibleto get as many copies as required. Thus, copying of data can be done easily and many copies of data can be obtained. As DNA can retain information for centuries, DNA can be used for long-term storage. Due to high density, the DNA can store a large amount of data in very small space. As in approach, the data is stored in long virtual DNA molecule but encoding is done using synthetically prepared short DNA strand. Short strands will allow to easily manipulating data. It is possible to read simultaneously and randomly read files stored in DNA. Also, compression technique is used to compress data without any loss. The 4 nucleotides of DNA used in themodel are Adenine which will be denoted as A, Cytosine as C, Guanine as G and Thymine as T. 2.History The idea of DNA digital data storage dates back to 1959, when the physicist Richard P. Feynman, in "There's Plenty of Room at the Bottom: An Invitation to Enter a New Field of Physics" outlined the general prospects for the creation of artificial objects similar to objects of the microcosm (including biological) and having similar or even more extensive capabilities. In 1964-65 Mikhail Samoilovich Neiman, the Soviet physicist, published 3 articles about microminiaturization in electronics at the molecular-atomic level, where independently presented general considerations and some calculations regarding the possibility of recording, storage, and retrieval of information on synthesized DNA and RNA molecules. One of the earliest uses of DNA storage occurred in a 1988 collaboration between artist Joe Davis and researchers from Harvard. The image, stored in a DNA sequence in E.coli, was organized in a 5 x 7 matrix that, once decoded, formed a picture of an ancient Germanic rune representing life and the female Earth. In the matrix, ones corresponded to dark pixels while zeros corresponded to light pixels. In 2007 a device was created at the University of Arizona using addressing molecules to encode mismatch sites within a DNA strand. These mismatches were then able to be read out by performing a restriction digest, thereby recovering the data In 2011, George Church, Sri Kosuri, and Yuan Gao carried out an experiment that would encode a 659-kb book that was co-authored by Church. To do this, the research team did a two-to-one correspondence where a binary zero was represented by either an adenine or cytosine and a binary one was represented by a guanine or thymine. After examination, 22 errors were found in the DNA In 2013, a software called DNACloud was developed by Manish K. Gupta and co-workers to encode computer files to their DNA representation. It implements a memory efficiency version of the algorithm proposed by Goldman et al. to encode (and decode) data to DNA (.dnac files). In March 2017, Yaniv Erlich and Dina Zielinski of Columbia University and the New York Genome Center published a method known as DNA Fountain that stored data at a density of 215 petabytes per gram of DNA. The technique approaches the Shannon capacity of DNA storage, achieving 85% of the theoretical limit. The method was not ready for large- scale use, as it costs $7000 to synthesize 2 megabytes of data and another $2000 to read it In March 2018, University of Washington and Microsoft published results demonstrating
  • 7. 3 | P a g e storage and retrieval of approximately 200MB of data. The research also proposed and evaluated a method for random access of data items stored in DNA. In March 2019, the same team announced they have demonstrated a fully automated system to encode and decode data in DNA In June 2019, scientists reported that all 16 GB of Wikipedia have been encoded into synthetic DNA 3.DNA STORAGE SYSTEM We imagine DNA digital data storage as the last level of a deep storing hierarchy, giving very dense and durable storage with access times of many hours to days. DNA synthesis and sequencing can be made arbitrarily parallel, making the necessary read and write bandwidths available. We now detail our proposal of a system for DNA based storage with random access support. Fig 2: DNA Storage System 4.PROPOSED SYSTEM In the model proposed in this paper, DNA is used to store data. In this, a delimiter is used at the end of each file so that data can be accessed randomly. The data will be encoded using specialized Huffman tree. If required, each file can be given separate Huffman tree for encoding which will increase data security along with compressing the data. In the case of any error in data while encoding, this error is contained in that file only. As Huffman tree is used for encoding, datacompression is achieved. It provides security as anyone cannot decode it without the original tree. For sequencing of DNA strand, a lot of specialized equipmentis needed. So without the required equipment, DNA cannot be read. Maximum of 2 nucleotide repeats, except for delimiters, are there in DNA. There are 2 copies of all the data. So in the case of data loss, other copy can be used to retrieve data. This method is flexible and the user can manipulate the method to suit the needs and store all kinds of data.
  • 8. 4 | P a g e 5.WORKING OF DNA DIGITAL DATA A digital data in DNA should come across 5 levels and are as follows: Coding, Synthesis, Storage, Retrieval and Decoding. Fig 3: Working of DNA Digital Data A.ENCODING 1. Form frequency table of characters of the data. 2. Now Huffman tree of non-repeating nucleotides for encoding is generated as follows: • Each node in the tree will have 3 children. • The weights of branches of children will depend on the incoming weight of parent. • On the off chance that the weight of incoming branch of a parent is A, at that point C represents to the leftmost child, G represents to the middle child and T represents the rightmost child. • On the off chance that the weight of incoming branch of a parent is C, at that point G represents the leftmost child, T represents the middle child and A represents the rightmost child. • On the off chance that the weight of incoming branch of a parent is G, at that point T represents the leftmost child, A represents the middle child and C represents the rightmost child. • On the off chance that the weight of incoming branch of a parent is T, at that point A represents the leftmost child, C represents the middle child and G represents the rightmost child. • T will be considered to be an incoming weight for root. 3. Now split the whole data into overlapping segments of 100 nucleotides with an offset of 50 nucleotides from previous. 4. Form pairs of segments starting from the 1st segment. 1. Index each pair from 0 to 107 and after 107, start from 0 again. 2. Reverse complement 2nd segment in each pair. 3. The index will be of 4 nucleotides long. The index is encoded by a combination of nucleotides in a sequence of A, C, G, T such that no 2 consecutive nucleotides same.
  • 9. 5 | P a g e Example: 0=ACAC, 1=ACAG, 2=ACAT. 4. Prepend A and append C to the 1st segment of the pair. 5. Prepend T and append G to the 2nd segment of the pair. 6. Each segment is now synthesized to actual DNA strand of length 106 nucleotides. If the length of the code of a character is1,then to avoid repetition ofnucleotides,1more nucleotide is added in the code of the character. B. DECODING 1. The decoding process is simply the reverse of the encoding process. 2. The 1st nucleotide of DNA will tell whether the DNA is the 1st or 2nd segment of the pair or whether the data is reverse complemented or not and directionality of strand. 3. If 1st nucleotide is A then: • Remove 1st nucleotide. • Next 4 nucleotides will tell us about segment number. • Next 100 nucleotides will be data. • The last nucleotide can be used for confirmation of the type of segment. 4. If 1st nucleotide is C then: • Reverse whole segment. • Remove 1st nucleotide. • Next 4 nucleotides will tell us about segment number. • Next 100 nucleotides will be data. • The last nucleotide can be used for confirmation of the type of segment. 5. If 1st nucleotide is G then: • Reverse whole segment. • Remove 1st nucleotide. • Next 4 nucleotides will tell us about segment number. • Reverse complements next 100 nucleotides. • These 100 nucleotides will now be data. • The last nucleotide can be used for confirmation of the type of segment. 6. If 1st nucleotide is T then: • Remove 1st nucleotide. • Next 4 nucleotides will tell us about segment number. • Reverse complements next 100 nucleotides. • These 100 nucleotides will now be data. • The last nucleotide can be used for confirmation of the type of segment. 7. If TTTT sequence is found, this will denote the end of the file. The new character will start from next nucleotide. 8. Now by using the same Huffman tree, data can convert the data into original characters. It is possible to generate different Huffman tree for different files or single Huffman tree for
  • 10. 6 | P a g e whole data. This will compress the data and decoding cannot be done unless one has the original tree. As specific orientation nucleotides have been used in the strands, it is possible to read double number segmentsin the same number of indexes. The user can read the strand from any direction. 6.ASSUMPTIONS AND DEPENDENCIES a) Advantages: DNA has many advantages for storing digital data. • It is ultra-compact. • It can last hundreds of thousands of years if kept in a cool, dry place. • As long as human societies are reading and writing DNA, they will be able to decode it. • DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete. b) Disadvantages: • High cost. • DNA is significantly harder and slower to read than conventional computer transistors i.e., in terms of access speed it is actually less RAM-like than our average computer SSD or spinning magnetic hard-drive. 7. CONCLUSION DNA-based storage has the potential to be the ultimate archival storage solution. It is extremely dense and durable. Thus using DNA for data storage, it is possible to store huge amount of data in very less size. As DNA can hold data for many years, it is conceivable to store data for quite a while. By utilizing this strategy, data is packed and the security to the data is given. Parallel reading of files is also possible, enabling users to read multiple files at the same time. This technique maintains two copies of data. Hencein caseof data damage, its copy can be used to read data. In the case of any errors, while encoding the data the error is restricted tothat particular file and no other file is affected due to that error. This technique can be used for all kind of files by making minor changes to adapt to the type of file. This technique can be used to store big data in very small space with little computational overhead. This method is scalable and can be used to store large files too. Also, multiple copies can be made easily. This method can be used to store information in archival systems or big data. Instead of using conventional storage devices which have less capacity to store data, DNA based storage method can be used in distant future to store data in a secured manner and for a long time storage andcan solve the problem of limited space.
  • 11. 7 | P a g e 8.REFERENCES [1] M. Burrows and D. J. Wheeler, “A block-sorting lossless data compression algorithm,” Digital System Research Center, USA, 1994. [2] ]G.C.Smith, C.C.Fiddes, J.P.Hawkins, and J.P.L.Cox, “Some possible codes for encrypting data in DNA”, 2003. [3] Wikipedia on Data Storage in DNA [4] Blog article from geeksforgeeks
  • 12. 8 | P a g e