SlideShare a Scribd company logo
1 of 10
Download to read offline
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME
514
AN ANALOGY OF ALGORITHMS FOR TAGGING OF SINGLE
NUCLEOTIDE POLYMORPHISM AND EVALUATION THROUGH
LINKAGE DISEQUILIBRIUM
Pradipta Deb1
, Moitree Basu2
1,2
Tata Consultancy Services Limited
ABSTRACT
Recent years have seen an explosive growth in biological data. It should be managed and
stored for various purposes. Demand has never been greater for revolutionary technologies that
deliver fast, inexpensive and accurate and easy to comprehend genome information. Here comes the
relevance of tagging data. From huge large DNA sequence information scientists needed some small
efficient dataset that they can do their research on and that is exactly why some optimization needed
to be carried upon these big data. A subset of SNPs that are selected to represent the original
information embedded in the full set of SNPs is referred to as the set of Tag SNPs. Large sequencing
projects are producing increasing quantities of nucleotide sequences. The contents of nucleotide
databases are doubling in size approximately every fourteen months. So to track and analyze this
amount of data scientists need some small set of data that can represent the whole database
characteristically. So computer scientists came up with some innovative algorithms to find tag SNPs.
We have done a comparative study by implementing the popular algorithms and evaluating them by
scoring Linkage Disequilibrium
1. INTRODUCTION
Here we will be discussing about current methods for selection of informative single
nucleotide polymorphisms (SNPs) using data from a dense network of SNPs that have been
genotyped in a relatively small panel of subjects. We discuss the following issues: (1) Optimal
selection of SNPs based upon maximizing either the predictability of unmeasured SNPs or the
predictability of SNP haplotypes as selection criteria. (2) The dependence of the performance of tag
SNP selection methods upon the density of SNP markers genotyped for the purpose of haplotype
discovery and tag SNP selection. (3) The likely power of case-control studies to detect the influence
upon disease risk of common disease-causing variants in candidate genes in a haplotype-based
analysis.
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING &
TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 4, July-August (2013), pp. 514-523
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com
IJCET
© I A E M E
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME
515
To choose a subset of SNPs, [tag SNPs] that can predict other SNPs in the region with small
probability of error and remove redundant information the following methods we have sincerely
worked on:
• Gauss Algorithm
• Gauss-Jordon Algorithm
• Greedy Algorithm
• Binary Optimization Algorithm
We also implemented a SNP scoring procedure to do a comparison study among all these
procedures and to give an overall view of the problem structure.
2. OVERVIEW
2.1 SNP
A single-nucleotide polymorphism (SNP, pronounced snip) is a DNA sequence variation
occurring when a single nucleotide A, T, C or G in the genome (or other shared sequence) differs
between paired chromosomes in an individual. The genetic code is specified by the four nucleotide
"letters" A (adenine), C (cytosine), T (thymine), and G (guanine). SNP variation occurs when a
single nucleotide, such as an A, replaces one of the other three nucleotide letters C, G, or T. An
example of a SNP is the alteration of the DNA segment AAGGTTA to ATGGTTA, where the
second "A" in the first snippet is replaced with a "T". On average, SNPs occur in the human
population more than one percent of the time. Because only about three to five percent of a person's
DNA sequence codes for the production of proteins, most SNPs are found outside of "coding
sequences". SNPs found within a coding sequence are of particular interest to researchers because
they are more likely to alter the biological function of a protein. Because of the recent advances in
technology, coupled with the unique ability of these genetic variations to facilitate gene
identification, there has been a recent flurry of SNP discovery and detection. The most important
application of SNP array is in determining disease susceptibility and consequently, in pharmaco-
genomics by measuring the efficacy of drug therapies specifically for the individual. SNP-based
genetic linkage analysis could be performed to map disease loci, and hence determine disease
susceptibility genes for an individual. The results of the different sectors on SNP studies may help to
gain insights into mechanisms of these diseases and to create targeted drugs.
2.2 Tag SNP
Tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome
with high linkage disequilibrium (the non-random association of alleles at two or more loci). It is
possible to identify genetic variation without genotyping every SNP in a chromosomal region. Tag
SNPs are useful in whole-genome SNP association studies in which hundreds of thousands of SNPs
across the entire genome are genotyped. Tag SNP mainly helps in analyzing the huge (bulk) snp data
in very short time. With the help of tag SNP, a very large SNP data set can be represented with
merely a small chunk of tag SNP data. So, it computationally easier to work with tag SNP than
whole SNP haplotype. It mainly saves the computational time and memory space. Without tag SNP
it would have been impossible to analyze all the SNP data.
3. Problem Solving Methodologies
3.1 Gauss & Gauss-Jordon Algorithm for Finding Tag SNPs
In linear algebra, Gaussian elimination is an algorithm for solving systems of linear
equations. It can also be used to find the rank of a matrix, to calculate the determinant of a matrix,
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME
516
and to calculate the inverse of an invertible square matrix. The method is named after Carl Friedrich
Gauss. We implemented this algorithm in order to find Tag SNPs.
3.2 Gauss Jordan Elimination:
In linear algebra, Gauss–Jordan elimination is an algorithm for getting matrices in reduced
row echelon form using elementary row operations. It is a variation of Gaussian elimination.
Gaussian elimination places zeros below each pivot in the matrix, starting with the top row and
working downwards. Matrices containing zeros below each pivot are said to be in row echelon form.
Gauss–Jordan elimination goes a step further by placing zeros above and below each pivot, such
matrices are said to be in reduced row echelon form. Every matrix has a reduced row echelon form,
and Gauss–Jordan elimination is guaranteed to find it.
Computer science's complexity theory shows Gauss–Jordan elimination to have a time
complexity of O(n3) for an n by n matrix (using Big O Notation). This result means it is efficiently
solvable for most practical purposes. As a result, it is often used in computer software for a diverse
set of applications. However, it is often an unnecessary step past Gaussian elimination. Gaussian
elimination shares Gauss-Jordan's time complexity of O(n3) but is generally faster. Therefore, in
cases in which achieving reduced row echelon form over row echelon form is unnecessary, Gaussian
elimination is typically preferred.
3.3 The Algorithm (Gauss Elimination)
Step1: The first part (Forward Elimination) reduces a given system to either triangular or echelon
form, or results in a degenerate equation, indicating the system has no unique solution but may have
multiple solutions (rank<order). This is accomplished through the use of elementary row operations.
Step2: The second step uses back substitution to find the solution of the system above. Stated
equivalently for matrices, the first part reduces a matrix to row echelon form using elementary row
operations while the second reduces it to reduced row echelon form, or row canonical form.
Step3: From this canonical form the tag SNP are chosen for a particular set of haplotype.
3.4 The Algorithm for (Gauss Jordan Elimination):
Step1: Using Gauss-Jordan Elimination, find Row Reduced Echelon Form (RREF) X of sample
matrix S
Step2: Extract the basis T of sample S
Step3: Factorize sample S = T x reff [X]
Step4: Output set of tags T
Fact: In sample, each SNP is a linear combination of tag SNPs Conjecture: In entire population, each
SNP is same linear combination of tags as in sample follows:
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME
517
Figure 1: Example of Gauss Jordan Elimination Procedure
3.5 Greedy Algorithm
A greedy algorithm is any algorithm that follows the problem solving heuristic of making the
locally optimal choice at each stage with the hope of finding the global optimum. Recent studies
have shown that a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of
haplotype patterns in the block. In reality, some tag SNPs may be missing, and we may fail to
distinguish two distinct haplotypes due to the ambiguity caused by missing data.
In our project we applied a greedy algorithm which can find a subset of SNPs which can still
distinguish all distinct haplotypes even when some tag SNPs are missing. Assume we are given a
haplotype block containing N SNPs and K haplotype patterns. This block is denoted by an N × K
binary matrix Mh (see Figure 2). Define Mh[i,j] ⋴ {1,2} for each i ⋴ [1, N] and j ⋴ [1, K], where 1
and 2 represent the major and minor alleles, respectively. In reality, the haplotype block may also
contain missing data. This formulation can be easily extended to handle missing data by treating
them as wild card symbols. To simplify, we will assume no missing data in the block.
Figure 2: Haplotype block with Snp pattern
Let C be the set of all SNPs in Mh. The tag SNPs C' C are a subset of SNPs which is able
to distinguish each pair of haplotype patterns unambiguously. Note that the missing data may occur
at any SNP locus and thus create different missing patterns.
3.5.1 Problem: Minimum Tag SNPs (MTS)
To solve MTS efficiently, we applied a greedy algorithm which returns a solution not too
larger than the optimal solution. Assume that the SNPs selected by this algorithm are stored in a (m +
1) × |P| table. Here we have considered no missing SNP, so m=0. Initially, each grid in the table is
empty. Once a SNP Sk, (that can distinguish patterns Pi and Pj) is selected, one grid of the column (i,
j) is filled in with Sk, and we say that this grid is covered by Sk.
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME
518
This greedy algorithm works by covering the grids from the first row to the (m + 1)-th row,
and greedily selects a SNP which covers most uncovered grids in the i-th row at each iteration. In
other words, while working on the i-th row, a SNP is selected if its reformulated set S' maximizes |S'
∩ Ri |, where Ri is the set of uncovered grids at the i-th row.
Figure below illustrates an example for this algorithm to tolerate one missing tag SNP (i.e., m
= 1). The SNPs S1, S4, S2, and S3 are selected in order. When all grids in this table are covered,
each pair of patterns is distinguished by (m + 1) SNPs in the corresponding column. Thus, the SNPs
in this table are the robust tag SNPs which can tolerate up to m missing SNPs.
Figure 3: Pictorial example of Greedy Algorithm
The pseudo code of this greedy algorithm is given below:
3.5.2 Greedy Algorithm (C, P, m)
Step1: Ri ← P, i ⋴ [1, m + 1]
Step2: C' ← φ
Step3: for i = 1 to m + 1 do
Step4: while Ri ≠ φ do
Step5: select and remove a SNP S from C that maximizes |S' ∩ Ri|
Step6: C' ← C' ⋃ S
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME
519
Step7: j ← i
Step8: while S' ≠ φ and j ≤ m + 1 do
Step9: Stmp ← S' ∩ Rj //Stmp is a temporary variable for holding the result of S' ∩ Ri
Step10: Rj ← Rj - Stmp
Step11: S' ← S' - Stmp
Step12: j ← j + l
Step13: endwhile
Step14: endwhile
Step15: endfor
Step16: return C'
The time complexity of this algorithm is analyzed as follows. At Line 4, the number of
iterations of the intermediate loop is bounded by |Ri| ≤ |P|. Within the loop body (Lines 5–13), Line 5
takes O (|C||P|) because we need to check all SNPs in C and examine the uncovered grids of Ri. The
inner loop (Lines 8–13) takes only O (|S'|). Thus, the entire program runs in O (m|C||P|2).
3.6 Tag Snp Selection using Binary Optimization Functions
Single nucleotide polymorphisms (SNPs) hold much promise as a basis for disease-gene
association. However, they are limited by the cost of genotyping the tremendous number of SNPs. It
is therefore essential to select only informative subsets (tag SNPs) out of all SNPs. Several promising
methods for tag SNP selection have been proposed, such as the haplotype block-based and block-free
approaches. The block-free methods are preferred by some researchers because most of the block-
based methods rely on strong assumptions, such as prior block-partitioning, bi-allelic SNPs, or a
fixed number or locations for tagging SNPs. We employed the feature selection idea of binary
optimization function (BOF) to find informative tag SNPs. This method is very efficient, as it does
not rely on block partitioning of the genomic region.
3.6.1 The BOF Algorithm
Step1: Take one SNP each at a time.
Step2: Using the sliding window method, cut the SNP in suitable size (according to the function
used).
Step3: Find the fitness value of each window at a time by the fitness function (objective function).
Step4: Add the fitness value of all the windows of a SNP.
Step5: Repeat all the above steps for all the SNPs present in the database
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME
520
Step6: Find the SNPs with same fitness value and make groups.
Step7: Take a random SNP from each fitness group and these are the tag SNPs.
In binary optimization, it is very easy to design some algorithms that are extremely good on
some benchmarks (and extremely bad on some others). It means we have to be very careful when we
choose a test function set.
3.6.2 Goldberg's order-3
The fitness f of a bit-string is the sum of the result of separately applying the following
function to consecutive groups of three components each:
Figure 4: Goldberg Function
If the string size is D, the maximum value is obviously D/3, for the string 1111...111. In
practice, we will then use as fitness the value D/3 - f so that the problem is now to find the
minimum 0.
3.6.3 Bipolar order-6
The fitness is the sum of the result of applying the following function to consecutive groups
of six components each:
Figure 5: Bipolar Six Function
So the solutions are all combinations of sequences 6x1 and 6x0. In particular, 1111...111 and
0000...000 are solutions. The maximum value is D/6.
4. EXPERIMENTAL RESULTS
For our paper, we have selected 3 data sets as ( 774 X 103), ( 120 X 618 ) & ( 93 X 550 ),
where the first value in each set represents the row (no of SNP) and second value represents the no of
column(no of haplotype). The LD Value [7], [8] for each data set is very close to the range (0 ≤ LD ≤
0.3). We have compared (LD/No. of Tag SNP) ratio for each method in each 3 database ie higher
ratio means the tag SNPs are highly correlated with the rest of the SNPs in the data. Also it is
notifiable that the number of tag SNPs for the data sets is fairly low which is good because then only
that small amount of data can represent the whole dataset.
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME
521
Figure 7: No. of Tag Snp values for different methods in each dataset along with their LD values
Figure 8: LD/TAG SNP ratio comparison value
From the graph it clear, among the methods we used greedy gives the best result whereas
Binary Optimization function (Using Goldberg Function) gives a quite moderate result but Bipolar-
Six and Gauss-Jordan Algorithm gives a less convincing result as in those cases no of TAG SNP is
quite high as well as co-relation value is low.
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME
522
5. CONCLUSION
We defined a new natural measure for evaluating the prediction accuracy of a set of tag
SNPs, and use it to develop a new method for tag SNPs selection. This method is based on some
novel algorithm that predicts the values of the rest of the SNPs given the tag SNPs. This methods are
very efficient. We compared different popular methods of tag SNP selection algorithms on some
different genotype datasets from different sources. We also have done a comparative study on the
result of different algorithms. In BOF we had to choose from multiple sequences having same score
in random. So different sequences could be the result at different instances. So to obtain fixed
sequence at every run some mechanism could be introduced here.
6. REFERENCES
1. Halldórsson BV, Bafna V, Lippert R, Schwartz R, Vega FM, ClarkAG,Istrail S: Optimal
haplotype block-free selection of tagging SNPs for genome-wide association studies.
Genome Research2004:1633-1640.
2. Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA:Selecting a maximally
informative set of single-nucleotide polymorphisms for association analyses using linkage
disequilibrium.
3. Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES: High-resolution haplotype
structure in the human genome. Nat Genet2001, 29(2):229-232.
4. Cormen TH, Leiserson CE, Rivest RL, Stein C: Introduction to algorithms The MIT Press;
2001.
5. McCluskey E. Minimization of Boolean Functions. Bell System Technical Journal. 1956;
35:1417–1444.
6. Tomaszewski, S. P., Celik, I. U., Antoniou, G. E., "WWW-based Boolean function
minimization" INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND
COMPUTER SCIENCE, VOL 13; PART 4, pages 577-584, 2003.
7. Zhang K, Qin ZS, Liu JS, Chen T, Waterman MS, Sun F: Haplotype block partition and tag
SNP selection using genotype data and their applications to association studies. Genome
Research 2004, 14:908-916.
8. Zhao JH, Lissarrague S, Essioux L, Sham PC: GENECOUNTING: haplotype analysis with
missing genotypes. Bioinformatics 2002, 18:1694-1695.
9. B. Devlin, and N. Risch. A comparison of linkage disequilibrium measures for fine-scale
mapping. Genomics.29, 311–322, 1995.
10. B. Halldórsson, V. Bafna, R. Lippert, R. Schwartz, F. de la Vega, A. Clark, and S. Istrail.
Optimal haplotype block-free selection of tagging snps for genome-wide association studies.
Genome research.14, 1633-1640, 2004.
11. Z. Meng, D. Zaykin, C. Xu, M. Wagner, and M. Ehm. Selection of genetic markers for
association analyses, using linkage disequilibrium and haplotypes. Am. J. Hum. Genet. 73:
115–130, 2003.
12. He, J. and Zelikovsky, A. (2004) ‘Linear Reduction Methods for Tag SNP Selection’,
Proceedings of the International Conference of the IEEE Engineering in Medicine and
Biology (EMBC’04), pp. 2840–2843.
13. Zhang, K., Qin, Z., Liu, J., Chen, T., Waterman, M., and Sun, F. (2004) ‘Haplotype block
partitioning and tag SNP selection using genotype data and their applications to association
studies’, Genome Research, Vol. 14, pp. 908–916.
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME
523
14. Jingwu He, Kelly Westbrooks and Alexander Zelikovsky ‘Linear Reduction Method for
Predictive and Informative Tag SNP Selection’ Proceedings of the International Conference
of the IEEE Engineering in Medicine and Biology (EMBC’09)
15. Tu Minh Phuong , Zhen Lin Russ B. Altman ‘Choosing SNPs Using Feature Selection’ 2005
IEEE Computational Systems Bioinformatics Conference (CSB’05)
16. H. J. Greenberg, W. E. Hart, and G. Lancia, “Opportunities for combinatorial optimization in
computational biology.,” IN- FORMS Journal on Computing, Vol. 16, No. 3, pp. 211–231,
2004.
17. Shambhavi.B.R and Dr.Ramakanth Kumar.P, “Current State of the Art Pos Tagging for
Indian Languages – A Study”, International Journal of Computer Engineering & Technology
(IJCET), Volume 1, Issue 1, 2010, pp. 250 - 260, ISSN Print: 0976 – 6367, ISSN Online:
0976 – 6375.
18. Moitree Basu and Pradipta Deb, “Tag SNP Selection using Quine-Mccluskey Optimization
Method”, International Journal of Advanced Research in Engineering & Technology
(IJARET), Volume 4, Issue 5, 2013, pp. 74 - 81, ISSN Print: 0976-6480, ISSN Online:
0976-6499.

More Related Content

Viewers also liked

Determination of residual stresses of welded joints prepared under
Determination of residual stresses of welded joints prepared underDetermination of residual stresses of welded joints prepared under
Determination of residual stresses of welded joints prepared underIAEME Publication
 
Investigation into the effects of process parameters on delamination 2
Investigation into the effects of process parameters on delamination 2Investigation into the effects of process parameters on delamination 2
Investigation into the effects of process parameters on delamination 2IAEME Publication
 
Text region extraction from low resolution display board ima
Text region extraction from low resolution display board imaText region extraction from low resolution display board ima
Text region extraction from low resolution display board imaIAEME Publication
 
Human resource management practices in multinational companies a case study i
Human resource management practices in multinational companies  a case study iHuman resource management practices in multinational companies  a case study i
Human resource management practices in multinational companies a case study iIAEME Publication
 
Analysis and simulation of chip formation & thermal effects on tool life usin...
Analysis and simulation of chip formation & thermal effects on tool life usin...Analysis and simulation of chip formation & thermal effects on tool life usin...
Analysis and simulation of chip formation & thermal effects on tool life usin...IAEME Publication
 
A mathematical model for predicting autoclave expansion for portland cement
A mathematical model for predicting autoclave expansion for portland cementA mathematical model for predicting autoclave expansion for portland cement
A mathematical model for predicting autoclave expansion for portland cementIAEME Publication
 
Design of multiloop controller for multivariable system using coefficient 2
Design of multiloop controller for multivariable system using coefficient 2Design of multiloop controller for multivariable system using coefficient 2
Design of multiloop controller for multivariable system using coefficient 2IAEME Publication
 
Presentacio 2
Presentacio 2Presentacio 2
Presentacio 2AgataBM
 
Estudo: Um Ministério conectado ou nem tanto assim?
Estudo: Um Ministério conectado ou nem tanto assim?Estudo: Um Ministério conectado ou nem tanto assim?
Estudo: Um Ministério conectado ou nem tanto assim?Miti Inteligência
 
Problemas na produção de jogos - Projeto Vates (Conceitual)
Problemas na produção de jogos - Projeto Vates (Conceitual)Problemas na produção de jogos - Projeto Vates (Conceitual)
Problemas na produção de jogos - Projeto Vates (Conceitual)Jogos Digitais, PUC - SP
 
APRECIACIACION CRITICA PROBERTANLectura crítica de publicidad médica
APRECIACIACION CRITICA PROBERTANLectura crítica de publicidad médicaAPRECIACIACION CRITICA PROBERTANLectura crítica de publicidad médica
APRECIACIACION CRITICA PROBERTANLectura crítica de publicidad médicaDeysy del Rosario
 
Зміцер Рагачоў, каардынатар праекта "Эпоха князя Міхала Клеафаса", – праект “...
Зміцер Рагачоў, каардынатар праекта "Эпоха князя Міхала Клеафаса", – праект “...Зміцер Рагачоў, каардынатар праекта "Эпоха князя Міхала Клеафаса", – праект “...
Зміцер Рагачоў, каардынатар праекта "Эпоха князя Міхала Клеафаса", – праект “...budzma
 

Viewers also liked (20)

Determination of residual stresses of welded joints prepared under
Determination of residual stresses of welded joints prepared underDetermination of residual stresses of welded joints prepared under
Determination of residual stresses of welded joints prepared under
 
Investigation into the effects of process parameters on delamination 2
Investigation into the effects of process parameters on delamination 2Investigation into the effects of process parameters on delamination 2
Investigation into the effects of process parameters on delamination 2
 
Text region extraction from low resolution display board ima
Text region extraction from low resolution display board imaText region extraction from low resolution display board ima
Text region extraction from low resolution display board ima
 
Human resource management practices in multinational companies a case study i
Human resource management practices in multinational companies  a case study iHuman resource management practices in multinational companies  a case study i
Human resource management practices in multinational companies a case study i
 
Analysis and simulation of chip formation & thermal effects on tool life usin...
Analysis and simulation of chip formation & thermal effects on tool life usin...Analysis and simulation of chip formation & thermal effects on tool life usin...
Analysis and simulation of chip formation & thermal effects on tool life usin...
 
A mathematical model for predicting autoclave expansion for portland cement
A mathematical model for predicting autoclave expansion for portland cementA mathematical model for predicting autoclave expansion for portland cement
A mathematical model for predicting autoclave expansion for portland cement
 
Design of multiloop controller for multivariable system using coefficient 2
Design of multiloop controller for multivariable system using coefficient 2Design of multiloop controller for multivariable system using coefficient 2
Design of multiloop controller for multivariable system using coefficient 2
 
Unicef
UnicefUnicef
Unicef
 
Presentation
PresentationPresentation
Presentation
 
Presentacio 2
Presentacio 2Presentacio 2
Presentacio 2
 
Moysa A Caring Company
Moysa A Caring CompanyMoysa A Caring Company
Moysa A Caring Company
 
Estudo: Um Ministério conectado ou nem tanto assim?
Estudo: Um Ministério conectado ou nem tanto assim?Estudo: Um Ministério conectado ou nem tanto assim?
Estudo: Um Ministério conectado ou nem tanto assim?
 
Problemas na produção de jogos - Projeto Vates (Conceitual)
Problemas na produção de jogos - Projeto Vates (Conceitual)Problemas na produção de jogos - Projeto Vates (Conceitual)
Problemas na produção de jogos - Projeto Vates (Conceitual)
 
Bruno milanez
Bruno milanezBruno milanez
Bruno milanez
 
APRECIACIACION CRITICA PROBERTANLectura crítica de publicidad médica
APRECIACIACION CRITICA PROBERTANLectura crítica de publicidad médicaAPRECIACIACION CRITICA PROBERTANLectura crítica de publicidad médica
APRECIACIACION CRITICA PROBERTANLectura crítica de publicidad médica
 
Geografia
GeografiaGeografia
Geografia
 
Зміцер Рагачоў, каардынатар праекта "Эпоха князя Міхала Клеафаса", – праект “...
Зміцер Рагачоў, каардынатар праекта "Эпоха князя Міхала Клеафаса", – праект “...Зміцер Рагачоў, каардынатар праекта "Эпоха князя Міхала Клеафаса", – праект “...
Зміцер Рагачоў, каардынатар праекта "Эпоха князя Міхала Клеафаса", – праект “...
 
Tarjeta de amor
Tarjeta de amorTarjeta de amor
Tarjeta de amor
 
Time and date
Time and dateTime and date
Time and date
 
Dago view
Dago viewDago view
Dago view
 

Similar to Tag SNPs selection algorithms evaluation

Tag snp selection using quine mc cluskey optimization method-2
Tag snp selection using quine mc cluskey optimization method-2Tag snp selection using quine mc cluskey optimization method-2
Tag snp selection using quine mc cluskey optimization method-2IAEME Publication
 
IRJET-A Novel Approaches for Motif Discovery using Data Mining Algorithm
IRJET-A Novel Approaches for Motif Discovery using Data Mining AlgorithmIRJET-A Novel Approaches for Motif Discovery using Data Mining Algorithm
IRJET-A Novel Approaches for Motif Discovery using Data Mining AlgorithmIRJET Journal
 
Identifying the Coding and Non Coding Regions of DNA Using Spectral Analysis
Identifying the Coding and Non Coding Regions of DNA Using Spectral AnalysisIdentifying the Coding and Non Coding Regions of DNA Using Spectral Analysis
Identifying the Coding and Non Coding Regions of DNA Using Spectral AnalysisIJMER
 
Data mining with human genetics to enhance gene based algorithm and
Data mining with human genetics to enhance gene based algorithm andData mining with human genetics to enhance gene based algorithm and
Data mining with human genetics to enhance gene based algorithm andIAEME Publication
 
SBVRLDNACOMP:AN EFFECTIVE DNA SEQUENCE COMPRESSION ALGORITHM
 SBVRLDNACOMP:AN EFFECTIVE DNA SEQUENCE COMPRESSION ALGORITHM SBVRLDNACOMP:AN EFFECTIVE DNA SEQUENCE COMPRESSION ALGORITHM
SBVRLDNACOMP:AN EFFECTIVE DNA SEQUENCE COMPRESSION ALGORITHMijcsa
 
Innovative Technique for Gene Selection in Microarray Based on Recursive Clus...
Innovative Technique for Gene Selection in Microarray Based on Recursive Clus...Innovative Technique for Gene Selection in Microarray Based on Recursive Clus...
Innovative Technique for Gene Selection in Microarray Based on Recursive Clus...AM Publications
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_versionDago Noel
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_versionDago Noel
 
Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...eSAT Journals
 
Comparative analysis of dynamic programming
Comparative analysis of dynamic programmingComparative analysis of dynamic programming
Comparative analysis of dynamic programmingeSAT Publishing House
 
A new revisited compression technique through innovative partition group binary
A new revisited compression technique through innovative partition group binaryA new revisited compression technique through innovative partition group binary
A new revisited compression technique through innovative partition group binaryIAEME Publication
 
Subgraph relative frequency approach for extracting interesting substructur
Subgraph relative frequency approach for extracting interesting substructurSubgraph relative frequency approach for extracting interesting substructur
Subgraph relative frequency approach for extracting interesting substructurIAEME Publication
 
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Sarvesh Kumar
 

Similar to Tag SNPs selection algorithms evaluation (20)

Tag snp selection using quine mc cluskey optimization method-2
Tag snp selection using quine mc cluskey optimization method-2Tag snp selection using quine mc cluskey optimization method-2
Tag snp selection using quine mc cluskey optimization method-2
 
2015-03-31_MotifGP
2015-03-31_MotifGP2015-03-31_MotifGP
2015-03-31_MotifGP
 
IRJET-A Novel Approaches for Motif Discovery using Data Mining Algorithm
IRJET-A Novel Approaches for Motif Discovery using Data Mining AlgorithmIRJET-A Novel Approaches for Motif Discovery using Data Mining Algorithm
IRJET-A Novel Approaches for Motif Discovery using Data Mining Algorithm
 
Identifying the Coding and Non Coding Regions of DNA Using Spectral Analysis
Identifying the Coding and Non Coding Regions of DNA Using Spectral AnalysisIdentifying the Coding and Non Coding Regions of DNA Using Spectral Analysis
Identifying the Coding and Non Coding Regions of DNA Using Spectral Analysis
 
Data mining with human genetics to enhance gene based algorithm and
Data mining with human genetics to enhance gene based algorithm andData mining with human genetics to enhance gene based algorithm and
Data mining with human genetics to enhance gene based algorithm and
 
SBVRLDNACOMP:AN EFFECTIVE DNA SEQUENCE COMPRESSION ALGORITHM
 SBVRLDNACOMP:AN EFFECTIVE DNA SEQUENCE COMPRESSION ALGORITHM SBVRLDNACOMP:AN EFFECTIVE DNA SEQUENCE COMPRESSION ALGORITHM
SBVRLDNACOMP:AN EFFECTIVE DNA SEQUENCE COMPRESSION ALGORITHM
 
40120130405011
4012013040501140120130405011
40120130405011
 
Bioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmmBioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmm
 
Innovative Technique for Gene Selection in Microarray Based on Recursive Clus...
Innovative Technique for Gene Selection in Microarray Based on Recursive Clus...Innovative Technique for Gene Selection in Microarray Based on Recursive Clus...
Innovative Technique for Gene Selection in Microarray Based on Recursive Clus...
 
2016 bergen-sars
2016 bergen-sars2016 bergen-sars
2016 bergen-sars
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_version
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_version
 
Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...
 
Comparative analysis of dynamic programming
Comparative analysis of dynamic programmingComparative analysis of dynamic programming
Comparative analysis of dynamic programming
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
A new revisited compression technique through innovative partition group binary
A new revisited compression technique through innovative partition group binaryA new revisited compression technique through innovative partition group binary
A new revisited compression technique through innovative partition group binary
 
50120140503004
5012014050300450120140503004
50120140503004
 
50120130406023
5012013040602350120130406023
50120130406023
 
Subgraph relative frequency approach for extracting interesting substructur
Subgraph relative frequency approach for extracting interesting substructurSubgraph relative frequency approach for extracting interesting substructur
Subgraph relative frequency approach for extracting interesting substructur
 
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
 

More from IAEME Publication

IAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdfIAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdfIAEME Publication
 
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...IAEME Publication
 
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURSA STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURSIAEME Publication
 
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURSBROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURSIAEME Publication
 
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONSDETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONSIAEME Publication
 
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONSANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONSIAEME Publication
 
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINOVOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINOIAEME Publication
 
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...IAEME Publication
 
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMYVISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMYIAEME Publication
 
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...IAEME Publication
 
GANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEGANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEIAEME Publication
 
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...IAEME Publication
 
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...IAEME Publication
 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...IAEME Publication
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...IAEME Publication
 
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...IAEME Publication
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...IAEME Publication
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...IAEME Publication
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...IAEME Publication
 
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTA MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTIAEME Publication
 

More from IAEME Publication (20)

IAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdfIAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdf
 
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
 
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURSA STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
 
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURSBROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
 
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONSDETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
 
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONSANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
 
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINOVOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
 
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
 
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMYVISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
 
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
 
GANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEGANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICE
 
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
 
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
 
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
 
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTA MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
 

Recently uploaded

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Tag SNPs selection algorithms evaluation

  • 1. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME 514 AN ANALOGY OF ALGORITHMS FOR TAGGING OF SINGLE NUCLEOTIDE POLYMORPHISM AND EVALUATION THROUGH LINKAGE DISEQUILIBRIUM Pradipta Deb1 , Moitree Basu2 1,2 Tata Consultancy Services Limited ABSTRACT Recent years have seen an explosive growth in biological data. It should be managed and stored for various purposes. Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate and easy to comprehend genome information. Here comes the relevance of tagging data. From huge large DNA sequence information scientists needed some small efficient dataset that they can do their research on and that is exactly why some optimization needed to be carried upon these big data. A subset of SNPs that are selected to represent the original information embedded in the full set of SNPs is referred to as the set of Tag SNPs. Large sequencing projects are producing increasing quantities of nucleotide sequences. The contents of nucleotide databases are doubling in size approximately every fourteen months. So to track and analyze this amount of data scientists need some small set of data that can represent the whole database characteristically. So computer scientists came up with some innovative algorithms to find tag SNPs. We have done a comparative study by implementing the popular algorithms and evaluating them by scoring Linkage Disequilibrium 1. INTRODUCTION Here we will be discussing about current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a dense network of SNPs that have been genotyped in a relatively small panel of subjects. We discuss the following issues: (1) Optimal selection of SNPs based upon maximizing either the predictability of unmeasured SNPs or the predictability of SNP haplotypes as selection criteria. (2) The dependence of the performance of tag SNP selection methods upon the density of SNP markers genotyped for the purpose of haplotype discovery and tag SNP selection. (3) The likely power of case-control studies to detect the influence upon disease risk of common disease-causing variants in candidate genes in a haplotype-based analysis. INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) ISSN 0976 – 6367(Print) ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), pp. 514-523 © IAEME: www.iaeme.com/ijcet.asp Journal Impact Factor (2013): 6.1302 (Calculated by GISI) www.jifactor.com IJCET © I A E M E
  • 2. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME 515 To choose a subset of SNPs, [tag SNPs] that can predict other SNPs in the region with small probability of error and remove redundant information the following methods we have sincerely worked on: • Gauss Algorithm • Gauss-Jordon Algorithm • Greedy Algorithm • Binary Optimization Algorithm We also implemented a SNP scoring procedure to do a comparison study among all these procedures and to give an overall view of the problem structure. 2. OVERVIEW 2.1 SNP A single-nucleotide polymorphism (SNP, pronounced snip) is a DNA sequence variation occurring when a single nucleotide A, T, C or G in the genome (or other shared sequence) differs between paired chromosomes in an individual. The genetic code is specified by the four nucleotide "letters" A (adenine), C (cytosine), T (thymine), and G (guanine). SNP variation occurs when a single nucleotide, such as an A, replaces one of the other three nucleotide letters C, G, or T. An example of a SNP is the alteration of the DNA segment AAGGTTA to ATGGTTA, where the second "A" in the first snippet is replaced with a "T". On average, SNPs occur in the human population more than one percent of the time. Because only about three to five percent of a person's DNA sequence codes for the production of proteins, most SNPs are found outside of "coding sequences". SNPs found within a coding sequence are of particular interest to researchers because they are more likely to alter the biological function of a protein. Because of the recent advances in technology, coupled with the unique ability of these genetic variations to facilitate gene identification, there has been a recent flurry of SNP discovery and detection. The most important application of SNP array is in determining disease susceptibility and consequently, in pharmaco- genomics by measuring the efficacy of drug therapies specifically for the individual. SNP-based genetic linkage analysis could be performed to map disease loci, and hence determine disease susceptibility genes for an individual. The results of the different sectors on SNP studies may help to gain insights into mechanisms of these diseases and to create targeted drugs. 2.2 Tag SNP Tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium (the non-random association of alleles at two or more loci). It is possible to identify genetic variation without genotyping every SNP in a chromosomal region. Tag SNPs are useful in whole-genome SNP association studies in which hundreds of thousands of SNPs across the entire genome are genotyped. Tag SNP mainly helps in analyzing the huge (bulk) snp data in very short time. With the help of tag SNP, a very large SNP data set can be represented with merely a small chunk of tag SNP data. So, it computationally easier to work with tag SNP than whole SNP haplotype. It mainly saves the computational time and memory space. Without tag SNP it would have been impossible to analyze all the SNP data. 3. Problem Solving Methodologies 3.1 Gauss & Gauss-Jordon Algorithm for Finding Tag SNPs In linear algebra, Gaussian elimination is an algorithm for solving systems of linear equations. It can also be used to find the rank of a matrix, to calculate the determinant of a matrix,
  • 3. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME 516 and to calculate the inverse of an invertible square matrix. The method is named after Carl Friedrich Gauss. We implemented this algorithm in order to find Tag SNPs. 3.2 Gauss Jordan Elimination: In linear algebra, Gauss–Jordan elimination is an algorithm for getting matrices in reduced row echelon form using elementary row operations. It is a variation of Gaussian elimination. Gaussian elimination places zeros below each pivot in the matrix, starting with the top row and working downwards. Matrices containing zeros below each pivot are said to be in row echelon form. Gauss–Jordan elimination goes a step further by placing zeros above and below each pivot, such matrices are said to be in reduced row echelon form. Every matrix has a reduced row echelon form, and Gauss–Jordan elimination is guaranteed to find it. Computer science's complexity theory shows Gauss–Jordan elimination to have a time complexity of O(n3) for an n by n matrix (using Big O Notation). This result means it is efficiently solvable for most practical purposes. As a result, it is often used in computer software for a diverse set of applications. However, it is often an unnecessary step past Gaussian elimination. Gaussian elimination shares Gauss-Jordan's time complexity of O(n3) but is generally faster. Therefore, in cases in which achieving reduced row echelon form over row echelon form is unnecessary, Gaussian elimination is typically preferred. 3.3 The Algorithm (Gauss Elimination) Step1: The first part (Forward Elimination) reduces a given system to either triangular or echelon form, or results in a degenerate equation, indicating the system has no unique solution but may have multiple solutions (rank<order). This is accomplished through the use of elementary row operations. Step2: The second step uses back substitution to find the solution of the system above. Stated equivalently for matrices, the first part reduces a matrix to row echelon form using elementary row operations while the second reduces it to reduced row echelon form, or row canonical form. Step3: From this canonical form the tag SNP are chosen for a particular set of haplotype. 3.4 The Algorithm for (Gauss Jordan Elimination): Step1: Using Gauss-Jordan Elimination, find Row Reduced Echelon Form (RREF) X of sample matrix S Step2: Extract the basis T of sample S Step3: Factorize sample S = T x reff [X] Step4: Output set of tags T Fact: In sample, each SNP is a linear combination of tag SNPs Conjecture: In entire population, each SNP is same linear combination of tags as in sample follows:
  • 4. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME 517 Figure 1: Example of Gauss Jordan Elimination Procedure 3.5 Greedy Algorithm A greedy algorithm is any algorithm that follows the problem solving heuristic of making the locally optimal choice at each stage with the hope of finding the global optimum. Recent studies have shown that a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype patterns in the block. In reality, some tag SNPs may be missing, and we may fail to distinguish two distinct haplotypes due to the ambiguity caused by missing data. In our project we applied a greedy algorithm which can find a subset of SNPs which can still distinguish all distinct haplotypes even when some tag SNPs are missing. Assume we are given a haplotype block containing N SNPs and K haplotype patterns. This block is denoted by an N × K binary matrix Mh (see Figure 2). Define Mh[i,j] ⋴ {1,2} for each i ⋴ [1, N] and j ⋴ [1, K], where 1 and 2 represent the major and minor alleles, respectively. In reality, the haplotype block may also contain missing data. This formulation can be easily extended to handle missing data by treating them as wild card symbols. To simplify, we will assume no missing data in the block. Figure 2: Haplotype block with Snp pattern Let C be the set of all SNPs in Mh. The tag SNPs C' C are a subset of SNPs which is able to distinguish each pair of haplotype patterns unambiguously. Note that the missing data may occur at any SNP locus and thus create different missing patterns. 3.5.1 Problem: Minimum Tag SNPs (MTS) To solve MTS efficiently, we applied a greedy algorithm which returns a solution not too larger than the optimal solution. Assume that the SNPs selected by this algorithm are stored in a (m + 1) × |P| table. Here we have considered no missing SNP, so m=0. Initially, each grid in the table is empty. Once a SNP Sk, (that can distinguish patterns Pi and Pj) is selected, one grid of the column (i, j) is filled in with Sk, and we say that this grid is covered by Sk.
  • 5. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME 518 This greedy algorithm works by covering the grids from the first row to the (m + 1)-th row, and greedily selects a SNP which covers most uncovered grids in the i-th row at each iteration. In other words, while working on the i-th row, a SNP is selected if its reformulated set S' maximizes |S' ∩ Ri |, where Ri is the set of uncovered grids at the i-th row. Figure below illustrates an example for this algorithm to tolerate one missing tag SNP (i.e., m = 1). The SNPs S1, S4, S2, and S3 are selected in order. When all grids in this table are covered, each pair of patterns is distinguished by (m + 1) SNPs in the corresponding column. Thus, the SNPs in this table are the robust tag SNPs which can tolerate up to m missing SNPs. Figure 3: Pictorial example of Greedy Algorithm The pseudo code of this greedy algorithm is given below: 3.5.2 Greedy Algorithm (C, P, m) Step1: Ri ← P, i ⋴ [1, m + 1] Step2: C' ← φ Step3: for i = 1 to m + 1 do Step4: while Ri ≠ φ do Step5: select and remove a SNP S from C that maximizes |S' ∩ Ri| Step6: C' ← C' ⋃ S
  • 6. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME 519 Step7: j ← i Step8: while S' ≠ φ and j ≤ m + 1 do Step9: Stmp ← S' ∩ Rj //Stmp is a temporary variable for holding the result of S' ∩ Ri Step10: Rj ← Rj - Stmp Step11: S' ← S' - Stmp Step12: j ← j + l Step13: endwhile Step14: endwhile Step15: endfor Step16: return C' The time complexity of this algorithm is analyzed as follows. At Line 4, the number of iterations of the intermediate loop is bounded by |Ri| ≤ |P|. Within the loop body (Lines 5–13), Line 5 takes O (|C||P|) because we need to check all SNPs in C and examine the uncovered grids of Ri. The inner loop (Lines 8–13) takes only O (|S'|). Thus, the entire program runs in O (m|C||P|2). 3.6 Tag Snp Selection using Binary Optimization Functions Single nucleotide polymorphisms (SNPs) hold much promise as a basis for disease-gene association. However, they are limited by the cost of genotyping the tremendous number of SNPs. It is therefore essential to select only informative subsets (tag SNPs) out of all SNPs. Several promising methods for tag SNP selection have been proposed, such as the haplotype block-based and block-free approaches. The block-free methods are preferred by some researchers because most of the block- based methods rely on strong assumptions, such as prior block-partitioning, bi-allelic SNPs, or a fixed number or locations for tagging SNPs. We employed the feature selection idea of binary optimization function (BOF) to find informative tag SNPs. This method is very efficient, as it does not rely on block partitioning of the genomic region. 3.6.1 The BOF Algorithm Step1: Take one SNP each at a time. Step2: Using the sliding window method, cut the SNP in suitable size (according to the function used). Step3: Find the fitness value of each window at a time by the fitness function (objective function). Step4: Add the fitness value of all the windows of a SNP. Step5: Repeat all the above steps for all the SNPs present in the database
  • 7. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME 520 Step6: Find the SNPs with same fitness value and make groups. Step7: Take a random SNP from each fitness group and these are the tag SNPs. In binary optimization, it is very easy to design some algorithms that are extremely good on some benchmarks (and extremely bad on some others). It means we have to be very careful when we choose a test function set. 3.6.2 Goldberg's order-3 The fitness f of a bit-string is the sum of the result of separately applying the following function to consecutive groups of three components each: Figure 4: Goldberg Function If the string size is D, the maximum value is obviously D/3, for the string 1111...111. In practice, we will then use as fitness the value D/3 - f so that the problem is now to find the minimum 0. 3.6.3 Bipolar order-6 The fitness is the sum of the result of applying the following function to consecutive groups of six components each: Figure 5: Bipolar Six Function So the solutions are all combinations of sequences 6x1 and 6x0. In particular, 1111...111 and 0000...000 are solutions. The maximum value is D/6. 4. EXPERIMENTAL RESULTS For our paper, we have selected 3 data sets as ( 774 X 103), ( 120 X 618 ) & ( 93 X 550 ), where the first value in each set represents the row (no of SNP) and second value represents the no of column(no of haplotype). The LD Value [7], [8] for each data set is very close to the range (0 ≤ LD ≤ 0.3). We have compared (LD/No. of Tag SNP) ratio for each method in each 3 database ie higher ratio means the tag SNPs are highly correlated with the rest of the SNPs in the data. Also it is notifiable that the number of tag SNPs for the data sets is fairly low which is good because then only that small amount of data can represent the whole dataset.
  • 8. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME 521 Figure 7: No. of Tag Snp values for different methods in each dataset along with their LD values Figure 8: LD/TAG SNP ratio comparison value From the graph it clear, among the methods we used greedy gives the best result whereas Binary Optimization function (Using Goldberg Function) gives a quite moderate result but Bipolar- Six and Gauss-Jordan Algorithm gives a less convincing result as in those cases no of TAG SNP is quite high as well as co-relation value is low.
  • 9. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME 522 5. CONCLUSION We defined a new natural measure for evaluating the prediction accuracy of a set of tag SNPs, and use it to develop a new method for tag SNPs selection. This method is based on some novel algorithm that predicts the values of the rest of the SNPs given the tag SNPs. This methods are very efficient. We compared different popular methods of tag SNP selection algorithms on some different genotype datasets from different sources. We also have done a comparative study on the result of different algorithms. In BOF we had to choose from multiple sequences having same score in random. So different sequences could be the result at different instances. So to obtain fixed sequence at every run some mechanism could be introduced here. 6. REFERENCES 1. Halldórsson BV, Bafna V, Lippert R, Schwartz R, Vega FM, ClarkAG,Istrail S: Optimal haplotype block-free selection of tagging SNPs for genome-wide association studies. Genome Research2004:1633-1640. 2. Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA:Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. 3. Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES: High-resolution haplotype structure in the human genome. Nat Genet2001, 29(2):229-232. 4. Cormen TH, Leiserson CE, Rivest RL, Stein C: Introduction to algorithms The MIT Press; 2001. 5. McCluskey E. Minimization of Boolean Functions. Bell System Technical Journal. 1956; 35:1417–1444. 6. Tomaszewski, S. P., Celik, I. U., Antoniou, G. E., "WWW-based Boolean function minimization" INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, VOL 13; PART 4, pages 577-584, 2003. 7. Zhang K, Qin ZS, Liu JS, Chen T, Waterman MS, Sun F: Haplotype block partition and tag SNP selection using genotype data and their applications to association studies. Genome Research 2004, 14:908-916. 8. Zhao JH, Lissarrague S, Essioux L, Sham PC: GENECOUNTING: haplotype analysis with missing genotypes. Bioinformatics 2002, 18:1694-1695. 9. B. Devlin, and N. Risch. A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics.29, 311–322, 1995. 10. B. Halldórsson, V. Bafna, R. Lippert, R. Schwartz, F. de la Vega, A. Clark, and S. Istrail. Optimal haplotype block-free selection of tagging snps for genome-wide association studies. Genome research.14, 1633-1640, 2004. 11. Z. Meng, D. Zaykin, C. Xu, M. Wagner, and M. Ehm. Selection of genetic markers for association analyses, using linkage disequilibrium and haplotypes. Am. J. Hum. Genet. 73: 115–130, 2003. 12. He, J. and Zelikovsky, A. (2004) ‘Linear Reduction Methods for Tag SNP Selection’, Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology (EMBC’04), pp. 2840–2843. 13. Zhang, K., Qin, Z., Liu, J., Chen, T., Waterman, M., and Sun, F. (2004) ‘Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies’, Genome Research, Vol. 14, pp. 908–916.
  • 10. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), © IAEME 523 14. Jingwu He, Kelly Westbrooks and Alexander Zelikovsky ‘Linear Reduction Method for Predictive and Informative Tag SNP Selection’ Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology (EMBC’09) 15. Tu Minh Phuong , Zhen Lin Russ B. Altman ‘Choosing SNPs Using Feature Selection’ 2005 IEEE Computational Systems Bioinformatics Conference (CSB’05) 16. H. J. Greenberg, W. E. Hart, and G. Lancia, “Opportunities for combinatorial optimization in computational biology.,” IN- FORMS Journal on Computing, Vol. 16, No. 3, pp. 211–231, 2004. 17. Shambhavi.B.R and Dr.Ramakanth Kumar.P, “Current State of the Art Pos Tagging for Indian Languages – A Study”, International Journal of Computer Engineering & Technology (IJCET), Volume 1, Issue 1, 2010, pp. 250 - 260, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375. 18. Moitree Basu and Pradipta Deb, “Tag SNP Selection using Quine-Mccluskey Optimization Method”, International Journal of Advanced Research in Engineering & Technology (IJARET), Volume 4, Issue 5, 2013, pp. 74 - 81, ISSN Print: 0976-6480, ISSN Online: 0976-6499.