International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print),
ISSN 0976-6375 (Online), Volume 4, Issue 3, May-June (2013), © IAEME
DATA MINING WITH HUMAN GENETICS TO ENHANCE GENE
BASED ALGORITHM AND DNA DATABASE SECURITY
Vijay Arputharaj J
Research Scholar, Department of Computer Science,
Karpagam University, Coimbatore,
Tamil Nadu, India
Dr. R. Manicka Chezian
Associate Professor, Department of Computer Science,
NGM College (Autonomous),
Pollachi, Tamil Nadu, India
ABSTRACT
The goal of data mining in a DNA database is to examine possible combinations of DNA
sequences and to derive a common, understandable code or algorithm that describes how
sequences change under mutation. Because data mining is an effective technique for analyzing
and extracting data, it also helps in formulating such a common algorithm.
In human genetics, an important goal of data mining is to understand the mapping between
inter-individual variation in human DNA sequences and variability in disease and mutation
susceptibility. In lay terms, it is used to find out how changes in an individual's DNA
sequence affect the risk of developing common diseases and mutations, while the data are kept
under a high level of security. This investigation also supports parental identification
algorithms for DNA sequences and genome expressions. Data mining and data extraction
techniques address the need to analyze large, complex, information-rich data sets of DNA
sequences.
Regulation of gene expression comprises the processes that cells and viruses use to control
how the information in genes is turned into gene products. An important challenge in using
large-scale gene expression data for biological classification arises when the expression
dataset being analyzed involves multiple classes. Data mining is used to overcome this kind
of problem.
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING
& TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 3, May-June (2013), pp. 176-181
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com
Key Words: Data mining, DNA Database, DNA Sequence, Gene Expression, Biological
classification, Multiple class
1. INTRODUCTION
The Human Genome Project is an international scientific research project whose main aim is
to determine the sequence of chemical base pairs that make up DNA, and to identify and map
the genes of the human genome from both a physical and a functional standpoint.
A DNA database, or DNA databank, is a database that contains DNA data. A DNA databank can
be used for parental comparison, the analysis of genetic diseases, genetic fingerprinting
for criminology, genetic genealogy, and similar tasks.
In human genetics, an important goal of data mining is to understand the mapping between
individual variation in human DNA sequences and variability in mutation susceptibility and
parental identification, and to relate these to algorithms for database security. In a
densely populated country such as India, there is a great need for DNA databases, which may
help prevent various types of fraud, such as passport fraud.
Data mining and data extraction techniques address the need to analyze large, complex,
information-rich data sets of DNA sequences. Several visualization and data mining
techniques are already available; they are used to validate existing methods, and to attempt
to discover new ones, for differentiating coding DNA sequences (exons) from non-coding DNA
sequences (introns). Because data mining is an effective technique for analyzing and
extracting data, it also helps in formulating the common algorithm.
2. LITERATURE STUDY
2.1 INTERNATIONAL STATUS
In northern countries, data exploration techniques have been designed to classify DNA
sequences using many different classification techniques, including rule-based classifiers
and neural networks. Visualization of both the original data and the data mining results is
used to help verify patterns and to understand the distinctions between the different types
of data and classifications.
Forensic identification problems are examples in which the study of DNA profiles is a
common approach. Published work presents some of these problems and develops their treatment
with a focus on Object-Oriented Bayesian Networks (OOBN). The use of DNA databases, which
began in 1995 in England, has raised new challenges regarding how they are used. In Portugal,
the legislation for the construction of a genetic database was defined in 2008.
Cryptographic, authentication, and high-definition security approaches for databases are
used in several countries, such as Thailand, the US, and the UK.
2.2 NATIONAL STATUS
Genetic features and environmental factors are both involved in multifactorial diseases.
Studying them requires data mining tools, and a two-phase approach using a specific genetic
algorithm has been proposed. In the first phase, the feature selection problem is handled
with a genetic algorithm (GA). To deal with this very specific problem, advanced mechanisms
such as sharing, random immigrants, and dedicated genetic operators were introduced into the
genetic algorithm, and a particular distance operator was defined. In the second phase, the
features selected during the first phase are clustered with the k-means algorithm.
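The two-phase idea described above can be sketched roughly as follows. This is a minimal illustration rather than the cited method: the fitness function (column variance minus a size penalty), the synthetic data, and all function names are assumptions made for the example, and only the random-immigrant mechanism is included; sharing, the dedicated genetic operators, and the distance operator are omitted.

```python
import random

def fitness(mask, rows):
    """Hypothetical score: summed variance of the kept columns minus a size penalty."""
    score = 0.0
    for j, keep in enumerate(mask):
        if keep:
            col = [row[j] for row in rows]
            mean = sum(col) / len(col)
            score += sum((v - mean) ** 2 for v in col) / len(col)
    return score - 0.1 * sum(mask)

def random_mask(n, rng):
    return [rng.randint(0, 1) for _ in range(n)]

def select_features(rows, rng, pop_size=20, generations=30, immigrants=2):
    """Phase 1: evolve a 0/1 mask over the feature columns."""
    n = len(rows[0])
    pop = [random_mask(n, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, rows), reverse=True)
        parents = pop[: pop_size // 2]            # keep the better half
        children = []
        while len(parents) + len(children) + immigrants < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)             # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1          # flip one bit (mutation)
            children.append(child)
        fresh = [random_mask(n, rng) for _ in range(immigrants)]  # random immigrants
        pop = parents + children + fresh
    return max(pop, key=lambda m: fitness(m, rows))

def kmeans(points, k, rng, iterations=20):
    """Phase 2: plain k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

rng = random.Random(0)
# Synthetic rows (assumed data): columns 0 and 1 separate two groups,
# columns 2 and 3 are near-constant noise.
rows = [[0.0 + 0.1 * i, 0.2, 1.0, 1.01] for i in range(5)] + \
       [[10.0 + 0.1 * i, 9.8, 1.0, 0.99] for i in range(5)]
mask = select_features(rows, rng)
selected = [[v for v, keep in zip(row, mask) if keep] for row in rows]
labels = kmeans(selected, k=2, rng=rng)
```

Keeping the better half of each generation means the best mask found so far is never lost, while the random immigrants keep some diversity in the population.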
INDIA, CHENNAI: The FBI has a DNA index system, and the UK has a similar database. If
Parliament passes the DNA Profiling Bill, 2007, India will join them by creating a national
DNA database that will help police arrest serial offenders and give a boost to forensic
investigation. The bill was drafted, sent to all ministries and departments for their
feedback, and then modified. The final version has been sent to the law ministry, which has
forwarded it to the legal department for final drafting.
2.3 SIGNIFICANCE OF THE STUDY
• This research is useful to society as a whole: the identity of each citizen can be
  stored in a secured DNA database, which helps prevent fraud such as passport fraud
  and ration card fraud.
• This research advances and aids criminal and forensic databases; the application is
  useful for both the government and society.
• This research primarily deals with advancing the genetic algorithm with proper
  security features for DNA databases, and it enhances the special features of DNA
  database security.
3. RESEARCH STUDY AND DEVELOPMENT
3.1 AIMS AND OBJECTIVES
• To enhance database security
This research primarily deals with advancing the genetic algorithm with proper security
features for DNA databases, and it enhances the special features of DNA database security.
• To map relationships between DNA sequences and variability in disease and mutation
susceptibility
• To provide an effective solution for parental identification algorithms for DNA sequences
and genome expressions
3.2 MATERIAL AND METHODS
1. Data mining and information retrieval
2. Visual Analytics and Collaboration
3. Combination of Parallel algorithms for sequence analysis
4. Seamless high-performance computing
5. Security Algorithms
a) Reverse Encryption algorithm to protect data
b) Advanced Cryptography algorithm to protect data
c) Advanced Encryption Standard (AES)
Among the above methodologies, data mining is used for knowledge discovery across the
entire DNA database. There can be three levels of genome data mining. The simplest is an
in-depth analysis of the result of a single query using a genome browser. At this level, one
may start with a gene or marker name, or by mapping a sequence to the genome. Cross
comparison of various annotation 'tracks' may help make sense of the query region. This is
the most popular use of any genome browser. Data mining is the opposite of information
retrieval in the sense that it is not based on predetermined criteria; instead, it uncovers
hidden patterns by exploring the data.
Visual analytics and parallel algorithms are used in implementing security measures in the
database.
Seamless high-performance computing relates to the speed of access to the database records.
Information retrieval, by contrast, is based on predetermined criteria: for example,
retrieving the group of people who belong to a certain class, have a certain mortgage plan,
or have certain characteristics that are already known.
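The contrast between the two can be illustrated with a small sketch. The records, field names, and the choice of k-mer counting as the "mining" step are assumptions made for this example and do not come from the study: retrieval filters on a criterion known in advance, while mining reports subsequences that recur across sequences without being told what to look for.

```python
from collections import Counter

# Hypothetical toy records; ids, groups, and sequences are made up for illustration.
records = [
    {"id": "s1", "group": "A", "seq": "ATGCGA"},
    {"id": "s2", "group": "B", "seq": "TTATGC"},
    {"id": "s3", "group": "A", "seq": "CCCCCC"},
]

def retrieve(records, group):
    """Information retrieval: select records by a predetermined criterion."""
    return [r["id"] for r in records if r["group"] == group]

def frequent_kmers(records, k=3, min_support=2):
    """Data mining: find k-mers that occur in at least min_support sequences."""
    support = Counter()
    for r in records:
        seq = r["seq"]
        # A set, so each k-mer is counted at most once per sequence.
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {kmer: n for kmer, n in support.items() if n >= min_support}

group_a = retrieve(records, "A")
patterns = frequent_kmers(records)
```

Here `retrieve` can only answer the question it was given, whereas `frequent_kmers` surfaces the shared subsequences "ATG" and "TGC" without any prior criterion.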
Cryptography is usually referred to as "the study of secrets", although nowadays it is most
closely associated with encryption. Encryption is the process of converting plain
("unhidden") text into cryptic ("hidden") text to secure it against data thieves. The
process has a second part, in which the cryptic text is decrypted at the other end so that
it can be understood.
In the broad field of cryptography, encryption is the process of encoding messages (or
information) in such a way that hackers cannot read them but authorized parties can.
In an encryption scheme, the message or information, also called plaintext, is encrypted
using an encryption algorithm, which turns it into unreadable ciphertext. This is usually
done with an encryption key, which specifies how the message is to be encoded. Decryption
is then performed by the authorized party.
Encryption is a method of hiding data so that it cannot be read by anyone who does not know
the key. The key is used to lock and unlock the data. To encrypt data, one performs
mathematical functions on it, and the output of these functions looks like garbage to anyone
who does not know how to reverse the operations.
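The lock-and-unlock role of the key can be shown with a toy example. This sketch is not one of the algorithms proposed in the study and is not secure for real use: it assumes a simple keystream built from SHA-256 and XORs it with the data, only to show that the same key reverses the operation and a different key does not. The function names and the sample DNA string are illustrative.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice with the same key undoes it."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

plaintext = b"ACGTTGCAACGT"
ciphertext = xor_cipher(plaintext, b"secret key")      # lock
recovered = xor_cipher(ciphertext, b"secret key")      # unlock with the same key
wrong = xor_cipher(ciphertext, b"another key")         # wrong key yields garbage
```

Because XOR is its own inverse, one function serves for both encryption and decryption here; production systems would instead use a vetted cipher such as AES from a maintained library.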
The Advanced Encryption Standard (AES) is a specification for the encryption of electronic
data established by the U.S. National Institute of Standards and Technology (NIST) in 2001.
STEPS:
1. KeyExpansion: round keys are derived from the cipher key using Rijndael's key
   schedule.
2. InitialRound
   1. AddRoundKey: each byte of the state is combined with the round key using
      bitwise XOR.
3. Rounds
   1. SubBytes: a non-linear substitution step where each byte is replaced with
      another according to a lookup table.
   2. ShiftRows: a transposition step where each row of the state is shifted
      cyclically a certain number of steps.
   3. MixColumns: a mixing operation which operates on the columns of the state,
      combining the four bytes in each column.
   4. AddRoundKey
4. Final Round (no MixColumns)
   1. SubBytes
   2. ShiftRows
   3. AddRoundKey
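The steps listed above can be traced in a compact, educational sketch of AES-128 encryption of a single 16-byte block. It is written in pure Python for readability and is an assumption of how the steps fit together for illustration only; it is not the study's implementation, it has no mode of operation, padding, or side-channel protection, and real systems should rely on a vetted library. All function names are chosen for this example.

```python
def xtime(a):
    """Multiply by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def build_sbox():
    """SubBytes lookup table: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if gmul(x, y) == 1) if x else 0
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]

def key_expansion(key):
    """KeyExpansion: derive 11 round keys (16 bytes each) from a 16-byte key."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]   # RotWord, then SubWord
            temp[0] ^= RCON[i // 4 - 1]
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]

# The state is a flat list of 16 bytes in column-major order: index = row + 4 * column.
def add_round_key(state, round_key):
    return [s ^ k for s, k in zip(state, round_key)]

def sub_bytes(state):
    return [SBOX[b] for b in state]

def shift_rows(state):
    """Row r is rotated left by r positions."""
    return [state[(i % 4) + 4 * ((i // 4 + i % 4) % 4)] for i in range(16)]

def mix_columns(state):
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out

def encrypt_block(plaintext, key):
    round_keys = key_expansion(key)
    state = add_round_key(list(plaintext), round_keys[0])       # InitialRound
    for rnd in range(1, 10):                                    # Rounds 1..9
        state = mix_columns(shift_rows(sub_bytes(state)))
        state = add_round_key(state, round_keys[rnd])
    state = shift_rows(sub_bytes(state))                        # Final round
    return bytes(add_round_key(state, round_keys[10]))

ciphertext = encrypt_block(bytes.fromhex("00112233445566778899aabbccddeeff"),
                           bytes(range(16)))
```

The plaintext and key in the usage line are the AES-128 example from Appendix C.1 of FIPS 197, so the result can be compared with the ciphertext published there.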
3.3 FINDINGS
• The result of sequencing the chemical bases of DNA changes almost in step with the
  biological changes of aging, which yields new knowledge about fundamental
  biological processes. The initial stage of the task, called mapping, fragments the
  chromosomes into groups that form a combined set of regulated expressions.
  High-performance data mining processors can be used to locate these grouped genes
  and their expression.
• Age correlated with an increasing percentage of sperm with highly damaged DNA
  (range: 0–83%) and tended to correlate inversely with the percentage of apoptotic
  sperm (range: 0.3%–23%).
• Gene mutations can prevent one or more proteins from working properly. By
  changing a gene's instructions for making a protein, a mutation can cause the
  protein to malfunction or to be missing entirely. When a mutation alters a protein
  that plays a critical role in the body, it can disrupt normal development or cause a
  medical condition. A condition caused by mutations in one or more genes is called
  a genetic disorder.
• FUTURE OF GENOMIC RESEARCH
  Develop and apply genome-based strategies for the early detection, diagnosis, and
  treatment of diseases.
  Develop new technologies to study genes and DNA on a large scale and to store
  genomic data efficiently.
5. RESULT AND DISCUSSION
The study yields new knowledge about fundamental biological processes. High-performance
data mining processors can be used to locate the grouped genes and their expression.
Various algorithms and ideas for DNA database security were also identified.
AGE CORRELATION
• Age correlated with an increasing percentage of sperm with highly damaged DNA
  (range: 0–83%) and tended to correlate inversely with the percentage of apoptotic
  sperm (range: 0.3%–23%).
• The result of sequencing the chemical bases of DNA changes almost in step with the
  biological changes of aging, which yields new knowledge about fundamental
  biological processes. The initial stage of the task, called mapping, fragments the
  chromosomes into groups that form a combined set of regulated expressions.
  High-performance data mining processors can be used to locate these grouped genes
  and their expression.
6. CONCLUSION
The module for aging sequences of DNA genome expressions has been completed successfully.
The research has yet to achieve its further goals and objectives concerning disease and
mutation susceptibility and parental identification modules with DNA database security.