Biological sequences analysis
A review of two alignment-free methods for sequence comparison
Outline
• Introduction to sequence alignment problem
• Introduction to alignment-free sequence comparison
• An LZ-complexity based alignment method
• A 2D graphical alignment method
• Methods overall comparison
Introduction to sequence alignment
• Goal: determine whether a given sequence is similar to another sequence
• Determine whether a database contains a potentially homologous sequence.
Introduction to sequence alignment
• Two alignment types are used: global and local
• The global approach compares one whole sequence with other entire
sequences.
• The local method takes a subsequence of one sequence and attempts to align it to subsequences of other sequences.
Introduction to sequence alignment
• A global alignment compares the two sequences over their entire length.
GCATTACTAATATATTAGTAAATCAGAGTAGTA
||||||||| ||
AAGCGAATAATATATTTATACTCAGATTATTGCGCG
Introduction to sequence alignment
• By contrast, a local alignment starts from a short seed match, which is then quickly extended.
• The initial seed for the alignment:
TAT
|||
AAGCGAATAATATATTTATACTCAGATTATTGCGCG
• And now the extended alignment:
TATATATTAGTA
||||||||| ||
AAGCGAATAATATATTTATACTCAGATTATTGCGCG
Introduction to sequence alignment
• How can similarities between genetic sequences be found?
• Naive methods: compare all possible alignments (extremely slow)
• Heuristic methods
• Examples: BLAST, FASTA, …
• Optimal solution is not guaranteed
• Tradeoff: speed vs. accuracy
• Dynamic programming methods
• Examples: Needleman & Wunsch, Smith & Waterman
• …faster alternatives?
Alignment-free comparison
• Challenge: overcome the inefficiency of traditional alignment-based algorithms
• Alignment-based methods
• Slow
• May produce incorrect results when used on more divergent but functionally
related sequences
Alignment-free comparison
• Much faster than alignment-based methods
• Most methods run in time linear in the sequence length
• Four categories:
• methods based on k-mer/word frequency,
• methods based on substrings,
• methods based on information theory (LZ-complexity based method) and
• methods based on graphical representation (2D-graphical method)
LZ-complexity based sequence comparison
• Method based on information theory
• Analysis of DNA/protein sequences
• Built upon the LZ-complexity measure
• Dynamic programming algorithm
LZ-complexity
• Complexity measure for finite sequences
• LZ-complexity can serve as an entropy-rate estimator for finite sequences
• Produces a dictionary of productions (components) for a sequence $S$.
• "The proposed complexity measure is related to the number of steps in a self-delimiting production process by which a given sequence is presumed to be generated" (Abraham Lempel and Jacob Ziv, "On the Complexity of Finite Sequences", 1976)
LZ-complexity (production process)
• $m$-step production process of a finite sequence $S$:
$H(S) = S(1, h_1)\, S(h_1 + 1, h_2) \cdots S(h_{m-1} + 1, h_m)$, with $h_0 = 0$ and $h_m$ equal to the length of $S$
• $H(S)$ is called the history of $S$, and $H_i(S) = S(h_{i-1} + 1, h_i)$ is called the $i$-th component of $H(S)$.
• Each component $H_i(S)$ is added to a dictionary.
LZ-complexity (algorithm)
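Initialize the dictionary
repeat until the whole sequence has been consumed:
• Add the next symbol to the current subsequence.
• If the subsequence is not reproducible from the previous history (it cannot be copied from the part of S preceding its last symbol), add it to the dictionary as a new component, increase the index value and start a new subsequence.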
The production process inserts a comma (',') into a sequence $S$ after each new component. A component is the longest prefix that can be reproduced from the preceding part of $S$, followed by the innovative symbol after it. The last component may lack an innovative symbol if the sequence ends first.
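A minimal Python sketch of this production process follows. It assumes the Lempel-Ziv (1976) reading of "reproducible": the current component keeps growing as long as it occurs in the text before its last symbol. The function names `lz76_history` and `lz_complexity` are hypothetical. The naive substring search is slow, so the sketch only suits short sequences.

```python
def lz76_history(s: str) -> list[str]:
    """Return the LZ76 history H(S) of s as a list of components."""
    components: list[str] = []
    start = 0  # first index of the component being built
    while start < len(s):
        end = start + 1
        # Extend while s[start:end] is reproducible, i.e. occurs in s[:end - 1].
        while end <= len(s) and s[start:end] in s[:end - 1]:
            end += 1
        # Either s[start:end] now ends with an innovative symbol, or the input
        # ran out (end == len(s) + 1) and the last component is a plain copy.
        end = min(end, len(s))
        components.append(s[start:end])
        start = end
    return components


def lz_complexity(s: str) -> int:
    """c(S): the number of components in the LZ76 history of s."""
    return len(lz76_history(s))


if __name__ == "__main__":
    s = "ATGGTCGGTTTC"
    print(lz76_history(s))   # ['A', 'T', 'G', 'GT', 'C', 'GGTT', 'TC']
    print(lz_complexity(s))  # 7
```

On the slide example this yields the same seven components as the table. For instance, GGT is still copyable from positions 3 to 5, so the component only closes at the innovative T in GGTT.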
LZ-complexity (Example)
• S = ATGGTCGGTTTC
• The complexity of the sequence S is $c(S) = 7$
• The history of S is $H(S) = \{A, T, G, GT, C, GGTT, TC\}$
Position  Symbol  Add to dictionary  Index
1         A       A                  1
2         T       T                  2
3         G       G                  3
4         G
5         T       GT                 4
6         C       C                  5
7         G
8         G
9         T
10        T       GGTT               6
11        T
12        C       TC                 7
LZ-complexity based sequence comparison
• Based on the number of components in the LZ-complexity
decomposition of the DNA sequences.
• Given two sequences S and Q decomposed using the LZ-complexity:
$S = S_1 S_2 \dots S_k \dots S_m$
$Q = Q_1 Q_2 \dots Q_k \dots Q_n$
• $m$ is the number of fragments of S
• $n$ is the number of fragments of Q
LZ-complexity based sequence comparison
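• Let $\sigma$ be the score function used to fill the dynamic programming matrix, where _ denotes a gap. It is defined as follows:
$\sigma(S_i, \_) = \sigma(\_, Q_j) = 1$
$\sigma(S_i, Q_j) = 1 - \dfrac{N(S_i, Q_j)}{\max(|S_i|, |Q_j|)}$
• where $N(S_i, Q_j)$ is the number of symbols that fragments $S_i$ and $Q_j$ have in common (regardless of position), and $|\cdot|$ is the fragment length.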
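Here is a worked instance of $\sigma$. It reads $N$ as the count of shared symbols regardless of position, which is the reading under which the example matrix that follows is reproduced. It uses fragment GT of S and fragment TGA of Q from that example:

```latex
\sigma(\mathrm{GT}, \mathrm{TGA})
  = 1 - \frac{N(\mathrm{GT}, \mathrm{TGA})}{\max(|\mathrm{GT}|, |\mathrm{TGA}|)}
  = 1 - \frac{2}{3}
  = \frac{1}{3} \approx 0.333
```

This is exactly the increment from $M[G, G] = 0$ to $M[GT, TGA] = 0.333$ in the example matrix.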
LZ-complexity based sequence comparison
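• The sequence similarity matrix $M$ is built using the following formulas:
$M[i, 0] = \sum_{k=1}^{i} \sigma(S_k, \_)$
$M[0, j] = \sum_{k=1}^{j} \sigma(\_, Q_k)$
$M[i, j] = \min\{\, M[i-1, j] + \sigma(S_i, \_),\; M[i-1, j-1] + \sigma(S_i, Q_j),\; M[i, j-1] + \sigma(\_, Q_j) \,\}$
• Each cell therefore depends on its upper, upper-left and left neighbors: $M[i-1, j]$, $M[i-1, j-1]$ and $M[i, j-1]$.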
Example
Q→          A      T      G      TGA    ATGC   AT
S↓     0    1      2      3      4      5      6
A      1    0      1      2      3      4      5
T      2    1      0      1      2      3      4
G      3    2      1      0      1      2      3
GT     4    3      2      1      0.333  1.333  2.333
C      5    4      3      2      1.333  1.083  2.083
GGTT   6    5      4      3      2.333  1.833  1.833
TC     7    6      5      4      3.333  2.833  2.333
$M[m, n]$ is the distance between sequences $S$ and $Q$: the smaller it is, the more similar the sequences. In the example, $M[7, 6] \approx 2.333$.
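The recurrence can be sketched in Python as follows. The sketch makes two assumptions:
• $N(S_i, Q_j)$ is the size of the multiset intersection of the two fragments. This reading reproduces every value in the example matrix but is not stated by the source.
• The empty string stands for the gap symbol _.

The names `sigma` and `lz_distance` are hypothetical. The fragment lists come from the example; Q = ATGTGAATGCAT is simply their concatenation.

```python
from collections import Counter


def sigma(x: str, y: str) -> float:
    """Score between two fragments; '' stands for the gap symbol '_'."""
    if not x or not y:
        return 1.0  # sigma(S_i, _) = sigma(_, Q_j) = 1
    # Assumed reading of N(S_i, Q_j): symbols shared, counted with multiplicity.
    shared = sum((Counter(x) & Counter(y)).values())
    return 1.0 - shared / max(len(x), len(y))


def lz_distance(s_parts: list[str], q_parts: list[str]) -> list[list[float]]:
    """Fill the (m+1) x (n+1) matrix M; M[m][n] is the distance between S and Q."""
    m, n = len(s_parts), len(q_parts)
    M = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # first column: running sum of sigma(S_k, _)
        M[i][0] = M[i - 1][0] + sigma(s_parts[i - 1], "")
    for j in range(1, n + 1):  # first row: running sum of sigma(_, Q_k)
        M[0][j] = M[0][j - 1] + sigma("", q_parts[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            si, qj = s_parts[i - 1], q_parts[j - 1]
            M[i][j] = min(
                M[i - 1][j] + sigma(si, ""),       # S_i against a gap
                M[i - 1][j - 1] + sigma(si, qj),   # S_i against Q_j
                M[i][j - 1] + sigma("", qj),       # gap against Q_j
            )
    return M


if __name__ == "__main__":
    S = ["A", "T", "G", "GT", "C", "GGTT", "TC"]
    Q = ["A", "T", "G", "TGA", "ATGC", "AT"]
    M = lz_distance(S, Q)
    print(round(M[-1][-1], 3))  # 2.333
```

Filling the matrix takes $O(mn)$ evaluations of $\sigma$. Here $m$ and $n$ count LZ fragments rather than nucleotides, so the table is much smaller than a classical alignment matrix.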
Results
• Data set: sequences of the first exon of the 𝛽-globin gene from 11 species
• Method:
• Compute the similarity degree between each pair of sequences with the proposed method (LZ-complexity + dynamic programming)
• Arrange all the similarity degrees into a matrix
• Feed the pairwise distances to a neighbor-joining program from the PHYLIP package
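The hand-off to PHYLIP can be sketched as writing the pairwise distances in PHYLIP's square distance-matrix layout. The layout assumed here is the number of taxa on the first line, then one row per taxon with its name in a fixed 10-character field followed by its distances. Check it against the documentation of the PHYLIP version in use. The function name is hypothetical, and the sketch does not reproduce the authors' own pipeline.

```python
def to_phylip_square(names: list[str], dist: list[list[float]]) -> str:
    """Format a symmetric distance matrix in PHYLIP's square layout."""
    if len(names) != len(dist) or any(len(row) != len(names) for row in dist):
        raise ValueError("distance matrix must be square and match the names")
    lines = [f"{len(names):5d}"]  # number of taxa
    for name, row in zip(names, dist):
        # PHYLIP reads the taxon name from a fixed 10-character field.
        label = name[:10].ljust(10)
        lines.append(label + "  " + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines) + "\n"
```

Names longer than ten characters are truncated, so distinct species labels should stay unique within their first ten characters.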
Results
G. Huang et al. (2D-graphical method)
• Method based on graphical representation
• Four vectors correspond to the four nucleotides:
$A \to (1, -\sqrt{3}/3)$
$T \to (1, \sqrt{3}/2)$
$G \to (1, -\sqrt{5})$
$C \to (1, \sqrt{3})$
G. Huang et al. (2D-graphical method)
• DNA sequence can be turned into a graphical curve
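One plausible way to turn a sequence into such a curve is a cumulative walk: start at the origin and add the vector of each nucleotide in turn. This construction is an assumption, since the slide does not spell it out. The vector values below are read from the partly garbled slide, and the square roots in particular are a reconstruction that should be checked against Huang et al. The name `dna_curve` is hypothetical.

```python
import math

# Nucleotide vectors as reconstructed from the slide (assumed values).
VECTORS = {
    "A": (1.0, -math.sqrt(3) / 3),
    "T": (1.0, math.sqrt(3) / 2),
    "G": (1.0, -math.sqrt(5)),
    "C": (1.0, math.sqrt(3)),
}


def dna_curve(seq: str) -> list[tuple[float, float]]:
    """Points of the walk that starts at (0, 0) and adds one vector per base.

    Raises KeyError for symbols other than A, C, G, T.
    """
    x, y = 0.0, 0.0
    points = [(x, y)]
    for base in seq.upper():
        dx, dy = VECTORS[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points
```

Every vector has x-component 1, so the x-coordinate of the i-th point is simply i. The curve therefore never doubles back on itself, and two sequences can be compared position by position along the x-axis.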
G. Huang et al. (2D-graphical method)
• The graphs show the (dis)similarity between sequences intuitively.
G. Huang et al. (2D-graphical method)
• How can sequences be compared?
• Similarity among sequences can be quantified by computing distances between vectors or points.
• Spatial distances
• Euclidean distance
• Mahalanobis distance
• Standard Euclidean distance
• Cosine similarity
• Stuart et al. (2002)
Euclidean distance
• Given two vectors $A = (a_1, a_2, \dots, a_n)$ and $B = (b_1, b_2, \dots, b_n)$, the Euclidean distance is computed as follows:
$ED(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$
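A direct Python transcription of $ED(A, B)$ follows. The function name is hypothetical. Python 3.8+ also provides `math.dist`, which computes the same quantity.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """ED(A, B) = sqrt(sum_i (a_i - b_i)^2) for vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
```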
Mahalanobis distance
• The Mahalanobis distance takes the covariance structure of the data into account. It is defined as follows:
$MD(A, B) = \sqrt{(A - B)\, CV^{-1} (A - B)'}$
• where $CV$ is the covariance matrix and $(A - B)'$ is the transpose of $A - B$.
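Here is a standard-library-only sketch of $MD(A, B)$. It assumes the usual square-root form and a covariance matrix supplied by the caller. Instead of inverting $CV$ explicitly, it solves $CV\,y = (A - B)'$ by Gaussian elimination with partial pivoting and then takes $\sqrt{(A - B) \cdot y}$. The helper names `solve` and `mahalanobis_distance` are hypothetical.

```python
import math


def solve(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):  # eliminate entries below the pivot
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def mahalanobis_distance(a: list[float], b: list[float], cov: list[list[float]]) -> float:
    """MD(A, B) = sqrt((A - B) CV^-1 (A - B)')."""
    d = [ai - bi for ai, bi in zip(a, b)]
    y = solve(cov, d)  # y = CV^-1 (A - B)'
    return math.sqrt(sum(di * yi for di, yi in zip(d, y)))
```

With the identity matrix as $CV$, the result equals the Euclidean distance.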
Standard Euclidean distance
• The standard Euclidean distance (SED) takes into account only the variances of the $n$ variables, rather than the full covariance matrix used by the Mahalanobis distance.
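The slide gives no formula for SED. The sketch below assumes the usual standardized Euclidean distance, $SED(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2 / s_i^2}$, where $s_i^2$ is the variance of the $i$-th variable. It uses the sample variance (denominator $n - 1$), which may differ from the authors' choice. The function names are hypothetical.

```python
import math
from statistics import variance


def column_variances(data: list[list[float]]) -> list[float]:
    """Sample variance s_i^2 of each variable (column) in a set of observations."""
    return [variance(column) for column in zip(*data)]


def standard_euclidean(a: list[float], b: list[float], variances: list[float]) -> float:
    """SED(A, B) = sqrt(sum_i (a_i - b_i)^2 / s_i^2)."""
    if not len(a) == len(b) == len(variances):
        raise ValueError("vectors and variances must have the same length")
    return math.sqrt(sum((ai - bi) ** 2 / v for ai, bi, v in zip(a, b, variances)))
```

This is the Mahalanobis distance with a diagonal covariance matrix, so it rescales each variable but ignores correlations between variables.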
Cosine similarity
• Stuart et al. define a distance based on the angle between vectors:
$AD(A, B) = \dfrac{A \cdot B}{\|A\| \times \|B\|} = \dfrac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$
$EAD(A, B) = -\ln\left(\dfrac{1 + AD(A, B)}{2}\right)$
• where $AD(A, B)$ is the cosine similarity between $A$ and $B$, and $EAD(A, B)$ represents the evolutionary distance between $A$ and $B$.
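A Python sketch of both formulas follows; `cosine_similarity` and `ead` are hypothetical names. $EAD$ is 0 for parallel vectors and grows without bound as $AD$ approaches $-1$. The sketch returns infinity for exactly opposite vectors and clamps $AD$ to $[-1, 1]$ to absorb rounding.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """AD(A, B) = (A . B) / (|A| |B|)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm = math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def ead(a: list[float], b: list[float]) -> float:
    """EAD(A, B) = -ln((1 + AD(A, B)) / 2)."""
    ad = max(-1.0, min(1.0, cosine_similarity(a, b)))  # guard against rounding
    if ad <= -1.0:
        return math.inf  # opposite vectors: (1 + AD) / 2 = 0
    return -math.log((1.0 + ad) / 2.0)
```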
Results
• Two data sets were used:
• a real sequence set
• Human mitochondrial genome
• a random sequence set
• Obtained by applying random mutations to the real sequence set (1%, 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100% mutation rates)
• Euclidean, SED, Mahalanobis and EAD distances were used
Results
• $d(x)$ denotes the distance between a sequence and its randomly mutated version.
• The Euclidean distance is more sensitive to the mutation rate than the other three distances.
Results
• 35 mitochondrial genome sequences from different mammals (GenBank database)
• Primate species, including human, ape, gorilla, chimpanzee, etc., are grouped together
• The result agrees with those obtained by Yu et al. (2010) and Raina et al. (2005)
Results
Presented methods comparison
LZ-complexity based algorithm        | 2D-graphic based algorithm
Dynamic programming algorithm        | Graphical algorithm
LZ-complexity measure                | Various distances (ED, Mahalanobis, …)
Generic (DNA/proteins)               | DNA-specific
Unrooted phylogenetic-tree results   | Rooted phylogenetic-tree results
Position  Symbol  Add to dictionary  Index  Rate
1         A       A                  1      1
2         T       T                  2      1
3         G       G                  3      1
4         G
5         T       GT                 4      0.80
..        ..      ..                 ..     ..
𝐸𝐷 𝐴, 𝐵 =
𝑖=1
𝑛
𝑎𝑖 − 𝑏𝑖
2
Presented methods comparison
LZ-complexity based algorithm 2D-graphic based algorithm
Dynamic programming algorithm Graphical algorithm
LZ-complexity measure Various distances (ED, Mahalanobis,…)
Generic (DNA/proteins) DNA-specific
Unrooted Phylogenetic-tree results Rooted Phylogenetic-tree results
Presented methods comparison
LZ-complexity based algorithm 2D-graphic based algorithm
Dynamic programming algorithm Graphical algorithm
LZ-complexity measure Various distances (ED, Mahalanobis,…)
Generic (DNA/proteins) DNA-specific
Unrooted Phylogenetic-tree results Rooted Phylogenetic-tree results
Biological sequences analysis
Biological sequences analysis
Biological sequences analysis
Biological sequences analysis

Contenu connexe

Tendances

Tendances (20)

Transcriptome Analysis & Applications
Transcriptome Analysis & ApplicationsTranscriptome Analysis & Applications
Transcriptome Analysis & Applications
 
Similarity
SimilaritySimilarity
Similarity
 
Clustal
ClustalClustal
Clustal
 
PAM matrices evolution
PAM matrices evolutionPAM matrices evolution
PAM matrices evolution
 
RNA structure analysis
RNA structure analysis RNA structure analysis
RNA structure analysis
 
AgBioData: Complexity and Diversity of the Pan-Genome
AgBioData: Complexity and Diversity of the Pan-Genome AgBioData: Complexity and Diversity of the Pan-Genome
AgBioData: Complexity and Diversity of the Pan-Genome
 
Genome annotation
Genome annotationGenome annotation
Genome annotation
 
Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02
 
Cath
CathCath
Cath
 
Phylogenetics1
Phylogenetics1Phylogenetics1
Phylogenetics1
 
Protein protein interactions
Protein protein interactionsProtein protein interactions
Protein protein interactions
 
Metabolomics
MetabolomicsMetabolomics
Metabolomics
 
Protein Structure Alignment and Comparison
Protein Structure Alignment and ComparisonProtein Structure Alignment and Comparison
Protein Structure Alignment and Comparison
 
Sequence similarity tools.pptx
Sequence similarity tools.pptxSequence similarity tools.pptx
Sequence similarity tools.pptx
 
Proteomics
Proteomics Proteomics
Proteomics
 
Protein database
Protein databaseProtein database
Protein database
 
dot plot analysis
dot plot analysisdot plot analysis
dot plot analysis
 
Phylogenetic tree
Phylogenetic treePhylogenetic tree
Phylogenetic tree
 
Microarray Data Analysis
Microarray Data AnalysisMicroarray Data Analysis
Microarray Data Analysis
 
Phylogenetic data analysis
Phylogenetic data analysisPhylogenetic data analysis
Phylogenetic data analysis
 

Similaire à Biological sequences analysis

Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satChenYiHuang5
 
The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment Parinda Rajapaksha
 
Cluster Analysis
Cluster Analysis Cluster Analysis
Cluster Analysis Baivab Nag
 
sequence alignment
sequence alignmentsequence alignment
sequence alignmentammar kareem
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadfalizain9604
 
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...ssuser2624f71
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfsriaisvariyasundar
 
Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool (BLAST)Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool (BLAST)Asiri Wijesinghe
 
Sequence alignment global vs. local
Sequence alignment  global vs. localSequence alignment  global vs. local
Sequence alignment global vs. localbenazeer fathima
 
Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章Tsuyoshi Sakama
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignmentSanaym
 
Parallel DNA Sequence Alignment
Parallel DNA Sequence AlignmentParallel DNA Sequence Alignment
Parallel DNA Sequence AlignmentGiuliana Carullo
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxnikshaikh786
 

Similaire à Biological sequences analysis (20)

Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit sat
 
The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment
 
Cluster Analysis
Cluster Analysis Cluster Analysis
Cluster Analysis
 
Ch06 multalign
Ch06 multalignCh06 multalign
Ch06 multalign
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Ch06 alignment
Ch06 alignmentCh06 alignment
Ch06 alignment
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadf
 
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, arXiv e-...
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdf
 
Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool (BLAST)Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool (BLAST)
 
Sequence alignment global vs. local
Sequence alignment  global vs. localSequence alignment  global vs. local
Sequence alignment global vs. local
 
Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章Elements of Statistical Learning 読み会 第2章
Elements of Statistical Learning 読み会 第2章
 
sorting
sortingsorting
sorting
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
Parallel DNA Sequence Alignment
Parallel DNA Sequence AlignmentParallel DNA Sequence Alignment
Parallel DNA Sequence Alignment
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptx
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
 
Bioinformatics lesson
Bioinformatics lessonBioinformatics lesson
Bioinformatics lesson
 
Bioinformatics lesson
Bioinformatics lessonBioinformatics lesson
Bioinformatics lesson
 

Dernier

CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIADr. TATHAGAT KHOBRAGADE
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxRenuJangid3
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Silpa
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Silpa
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.Silpa
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptxArvind Kumar
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 

Dernier (20)

CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptx
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 

Biological sequences analysis

  • 1. Biological sequences analysis A review of two alignment-free methods for sequence comparison
  • 2. Outline • Introduction to sequence alignment problem • Introduction to alignment-free sequence comparison • An LZ-complexity based alignment method • A 2D graphical alignment method • Methods overall comparison
  • 3. Introduction to sequence alignment • Goal: determine if a particular sequence is like another sequence • determine if a database contains a potential homologous sequence.
  • 4. Introduction to sequence alignment • Two alignment types are used: global and local • The global approach compares one whole sequence with other entire sequences. • The local method uses a subset of a sequence and attempts to align it to subset of other sequences.
  • 5. Introduction to sequence alignment • The global alignment looks for comparison over the entire range of the two sequences involved. GCATTACTAATATATTAGTAAATCAGAGTAGTA ||||||||| || AAGCGAATAATATATTTATACTCAGATTATTGCGCG
  • 6. Introduction to sequence alignment • By contrast, when a local alignment is performed, a small seed is uncovered that can be used to quickly extend the alignment. • The initial seed for the alignment: TAT ||| AAGCGAATAATATATTTATACTCAGATTATTGCGCG
  • 7. Introduction to sequence alignment • By contrast, when a local alignment is performed, a small seed is uncovered that can be used to quickly extend the alignment. • And now the extended alignment: TATATATTAGTA ||||||||| || AAGCGAATAATATATTTATACTCAGATTATTGCGCG
  • 8. Introduction to sequence alignment • How to search similiarities in genetic sequences? • Naive methods: comparing all possibles alignments (extremely slow) • Heuristics methods • Examples: BLAST, FASTA, … • Optimal solution is not guaranteed • Tradeoff: Speed vs Accuracy • Dynamic programming methods • Examples: Needleman & Wunsch, Smith & Waterman
  • 9. Introduction to sequence alignment • How to search similiarities in genetic sequences? • Naive methods: comparing all possibles alignments (extremely slow) • Heuristics methods • Examples: BLASTA, FASTA, … • Optimal solution is not guaranteed • Tradeoff: Speed vs Accuracy • Dynamic programming methods • Examples: Needleman & Wunsch, Smith & Waterman • …faster alternatives?
  • 10. Alignment-free comparison • Challenge: overcome the traditional alignment-based algorithm inefficiency • Alignment-based methods • Slow • May produce incorrect results when used on more divergent but functionally related sequences
  • 11. Alignment-free comparison • Much faster than alignment-based methods • most methods work in linear time • Four categories: • methods based on k-mer/word frequency, • methods based on substrings, • methods based on information theory (LZ-complexity based method) and • methods based on graphical representation (2D-graphical method)
  • 12. LZ-complexity based sequence comparison • Method based on information theory • Analysis of DNA/Proteic sequences • Built upon the LZ-complexity measure • Dynamic programming algorithm
  • 13. LZ-complexity • Complexity measure for finite sequences • LZ-complexity as entropy rate estimator for finite sequences • Produces a dictionary of productions for a sequence 𝑆. • “The proposed complexity measure is related to the number of steps in a self-delimiting production process by which a given sequence is presumed to be generated” (Abraham Lempel and Jacob Ziv, "On the Complexity of Individual Sequences“, 1976)
  • 14. LZ-complexity (production process) • 𝑚-step production process of a finite sequence 𝑆 𝐻 𝑆 = 𝑆 1, ℎ1 ∗ 𝑆 ℎ1 + 1, ℎ2 , … , 𝑆(ℎ 𝑚−1 + 1, ℎ 𝑚) • 𝐻 𝑆 is called history of 𝑆 and 𝐻𝑖 𝑆 = 𝑆(ℎ𝑖−1 + 1, ℎ𝑖) is called the ith component of 𝐻 𝑆 . • Each component 𝐻𝑖 𝑆 is added into a dictionary
  • 16. LZ-complexity (algorithm) • Initialize the dictionary • Repeat until the whole sequence has been consumed: • Add the next symbol to the current subsequence. • If the subsequence is not reproducible from the previous history (its last symbol is an innovation), add it to the dictionary, increase the index value and start a new subsequence. • At the end of the sequence, the remaining subsequence becomes the last component, even if it is reproducible. The production process inserts a comma (',') into the sequence $S$ after each new phrase, which is formed by concatenating the longest subsequence reproducible from the previous history with the innovative symbol that follows.
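The production process can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code. It assumes the usual Lempel-Ziv (1976) notion of reproducibility, in which a subsequence may be copied from any earlier position of S, overlapping copies included. That is why the example yields the component GGTT even though GGT is not itself a dictionary entry. The names `lz76_history` and `lz_complexity` are hypothetical.

```python
def lz76_history(s: str) -> list[str]:
    """Split s into the components of its Lempel-Ziv (1976) history H(S)."""
    history = []
    start, n = 0, len(s)
    while start < n:
        end = start + 1
        # Grow the current subsequence s[start:end] while it is reproducible,
        # i.e. while it occurs in the prefix that stops just before its last
        # symbol (so a copy may overlap the subsequence itself).
        while end <= n and s[start:end] in s[:end - 1]:
            end += 1
        # Either s[start:end] now ends with an innovative symbol, or the
        # sequence ran out and the remaining subsequence is the last component.
        end = min(end, n)
        history.append(s[start:end])
        start = end
    return history


def lz_complexity(s: str) -> int:
    """c(S): the number of components in the history of s."""
    return len(lz76_history(s))


print(lz76_history("ATGGTCGGTTTC"))  # ['A', 'T', 'G', 'GT', 'C', 'GGTT', 'TC']
print(lz_complexity("ATGGTCGGTTTC"))  # 7
```

The substring test makes this sketch roughly cubic in the worst case. It suits short examples like the one in these slides, not whole genomes.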
  • 26. LZ-complexity (Example) • $S$ = ATGGTCGGTTTC • The complexity of the sequence $S$ is $c(S) = 7$ • The history of $S$ is $H(S) = \{A, T, G, GT, C, GGTT, TC\}$

    Position  Symbol  Add to dictionary  Index
    1         A       A                  1
    2         T       T                  2
    3         G       G                  3
    4         G
    5         T       GT                 4
    6         C       C                  5
    7         G
    8         G
    9         T
    10        T       GGTT               6
    11        T
    12        C       TC                 7
  • 27. LZ-complexity based sequence comparison • Based on the number of components in the LZ-complexity decomposition of the DNA sequences. • Given two sequences $S$ and $Q$ decomposed using the LZ-complexity: $S = S_1 S_2 \dots S_k \dots S_m$ and $Q = Q_1 Q_2 \dots Q_k \dots Q_n$, where • $m$ is the number of fragments of $S$ • $n$ is the number of fragments of $Q$
  • 28. LZ-complexity based sequence comparison • Let $\sigma$ be the score function used to build the dynamic programming matrix. It is defined as follows: $\sigma(S_i, \_) = \sigma(\_, Q_j) = 1$ and $\sigma(S_i, Q_j) = 1 - \frac{N(S_i, Q_j)}{\max(\mathrm{length}(S_i), \mathrm{length}(Q_j))}$ • where $N(S_i, Q_j)$ is the number of elements that fragments $S_i$ and $Q_j$ have in common.
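As a worked illustration, here are a few fragment pairs taken from the Example matrix (S fragments GT, GGTT, C, A; Q fragments TGA, AT, ATGC, G). Reading $N$ as the number of nucleotides the two fragments share is an assumption, but it reproduces the values in that matrix:

```latex
\begin{aligned}
\sigma(\mathrm{GT}, \mathrm{TGA})   &= 1 - \tfrac{2}{\max(2, 3)} = \tfrac{1}{3} && \text{(G and T shared)}\\
\sigma(\mathrm{GGTT}, \mathrm{AT})  &= 1 - \tfrac{1}{\max(4, 2)} = \tfrac{3}{4} && \text{(only T shared)}\\
\sigma(\mathrm{C}, \mathrm{ATGC})   &= 1 - \tfrac{1}{\max(1, 4)} = \tfrac{3}{4} && \text{(only C shared)}\\
\sigma(\mathrm{A}, \mathrm{G})      &= 1 - \tfrac{0}{\max(1, 1)} = 1           && \text{(nothing shared)}
\end{aligned}
```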
  • 29. LZ-complexity based sequence comparison • The sequence similarity matrix $M$ is built using the following formulas: $M[i, 0] = \sum_{k=1}^{i} \sigma(S_k, \_)$, $M[0, j] = \sum_{k=1}^{j} \sigma(\_, Q_k)$, $M[i, j] = \min\{M[i-1, j] + \sigma(S_i, \_),\ M[i-1, j-1] + \sigma(S_i, Q_j),\ M[i, j-1] + \sigma(\_, Q_j)\}$
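The recurrence is an edit-distance style dynamic program over LZ fragments rather than single nucleotides. Below is a minimal Python sketch with hypothetical names. It assumes that N(S_i, Q_j) counts shared nucleotides with multiplicity (a multiset intersection). That reading reproduces the interior entries of the Example matrix, but it is an interpretation of "the number of the same elements", not a confirmed detail of the method.

```python
from collections import Counter

GAP = 1.0  # sigma(S_i, _) = sigma(_, Q_j) = 1


def sigma(f: str, g: str) -> float:
    """Substitution score between two LZ fragments.

    Assumption: N(f, g) = number of shared symbols, counted with multiplicity.
    """
    shared = sum((Counter(f) & Counter(g)).values())
    return 1.0 - shared / max(len(f), len(g))


def lz_distance_matrix(s_frags: list[str], q_frags: list[str]) -> list[list[float]]:
    """Fill M with the recurrence of the slide; M[m][n] is the distance."""
    m, n = len(s_frags), len(q_frags)
    M = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # M[i, 0]: running sum of gap scores
        M[i][0] = M[i - 1][0] + GAP
    for j in range(1, n + 1):  # M[0, j]: running sum of gap scores
        M[0][j] = M[0][j - 1] + GAP
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(
                M[i - 1][j] + GAP,                                        # S_i vs gap
                M[i - 1][j - 1] + sigma(s_frags[i - 1], q_frags[j - 1]),  # S_i vs Q_j
                M[i][j - 1] + GAP,                                        # gap vs Q_j
            )
    return M


S = ["A", "T", "G", "GT", "C", "GGTT", "TC"]
Q = ["A", "T", "G", "TGA", "ATGC", "AT"]
M = lz_distance_matrix(S, Q)
print(round(M[len(S)][len(Q)], 3))  # 2.333
```

The fragments here are typed in by hand. In practice they would come from the LZ decomposition of each sequence.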
  • 31. Example • $M[m, n]$ is the similarity distance between sequences $S$ and $Q$ (here $S$ = ATGGTCGGTTTC and $Q$ = ATGTGAATGCAT, so $M[7, 6] \approx 2.333$):

    S↓ \ Q→        _      A      T      G    TGA   ATGC     AT
    _              0      1      2      3      4      5      6
    A              1      0      1      2      3      4      5
    T              2      1      0      1      2      3      4
    G              3      2      1      0      1      2      3
    GT             4      3      2      1  0.333  1.333  2.333
    C              5      4      3      2  1.333  1.083  2.083
    GGTT           6      5      4      3  2.333  1.833  1.833
    TC             7      6      5      4  3.333  2.833  2.333
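To see how the entries arise, two cells can be recomputed by hand from the recurrence. The $\sigma$ values follow the score function, with $N$ read as the number of shared nucleotides (an interpretation that matches the table):

```latex
\begin{aligned}
M[4,4] &= \min\{M[3,4] + 1,\; M[3,3] + \sigma(\mathrm{GT},\mathrm{TGA}),\; M[4,3] + 1\}
        = \min\{2,\; 0 + \tfrac{1}{3},\; 2\} = \tfrac{1}{3}\\
M[7,6] &= \min\{M[6,6] + 1,\; M[6,5] + \sigma(\mathrm{TC},\mathrm{AT}),\; M[7,5] + 1\}
        = \min\{\tfrac{17}{6},\; \tfrac{11}{6} + \tfrac{1}{2},\; \tfrac{23}{6}\} = \tfrac{7}{3} \approx 2.333
\end{aligned}
```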
  • 32. Results • Data set: sequences of the first exon of the β-globin gene of 11 species • Method: • Compute the similarity degree between each pair of sequences using the proposed method (LZ-complexity + dynamic programming) • Arrange all the similarity degrees into a matrix • Feed the pairwise distances into a neighbor-joining program from the PHYLIP package
  • 35. G. Huang et al. (2D-graphical method) • Method based on graphical representation • The four nucleotides are mapped to four vectors: 𝐴 → (1, − 3 3) 𝑇 → (1, 3 2) 𝐺 → (1, − 5) 𝐶 → (1, 3)
  • 36. G. Huang et al. (2D-graphical method) • A DNA sequence can then be turned into a graphical curve
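The following is a minimal sketch of the curve construction. It assumes the common 2D-walk convention, which the slide does not spell out: each nucleotide moves the curve by its vector, so the k-th point is the sum of the first k vectors. Every vector listed above has x-component 1, so x_k = k and only the y-components matter. The y-values in `demo_steps` are placeholders chosen for illustration, not the vectors of Huang et al., and `graphical_curve` is a hypothetical name.

```python
def graphical_curve(seq: str, y_step: dict[str, float]) -> list[tuple[float, float]]:
    """Cumulative 2D walk: start at the origin, then add (1, y_step[base]) per base."""
    x, y = 0.0, 0.0
    points = [(x, y)]
    for base in seq.upper():
        x += 1.0            # every vector has x-component 1
        y += y_step[base]   # the y-component depends on the nucleotide
        points.append((x, y))
    return points


# Placeholder y-components for illustration only (NOT the paper's vectors).
demo_steps = {"A": -1.0, "T": 1.0, "G": -2.0, "C": 2.0}
print(graphical_curve("ATGC", demo_steps))
# [(0.0, 0.0), (1.0, -1.0), (2.0, 0.0), (3.0, -2.0), (4.0, 0.0)]
```

Plotting the point lists of two sequences on the same axes gives the kind of visual comparison the following slides refer to.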
  • 37. G. Huang et al. (2D-graphical method) • The graphs show the (dis)similarity between sequences intuitively.
  • 39. G. Huang et al. (2D-graphical method) • How to compare sequences? • Similarity among sequences can be quantified by computing the distance between vectors or between points. • Spatial distances: • Euclidean distance • Mahalanobis distance • Standardized Euclidean distance • Cosine similarity (Stuart et al., 2002)
  • 40. Euclidean distance • Given two vectors $A = \{a_1, a_2, \dots, a_n\}$ and $B = \{b_1, b_2, \dots, b_n\}$, the Euclidean distance is computed as follows: $ED(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$
  • 41. Mahalanobis distance • The Mahalanobis distance takes into account the covariance structure of the data. It is defined as follows: $MD(A, B) = \sqrt{(A - B)\, CV^{-1}\, (A - B)'}$ • $CV$ is the covariance matrix and $'$ denotes the transpose
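Here is a dependency-free Python sketch of the Mahalanobis distance. The slide does not say how CV is obtained, so the code assumes it is the sample covariance (denominator m − 1) of a set of feature vectors, one vector per row. Solving CV·z = (A − B)′ avoids forming CV⁻¹ explicitly. All function names are hypothetical.

```python
import math


def covariance_matrix(rows: list[list[float]]) -> list[list[float]]:
    """Sample covariance (denominator m - 1) of the columns of `rows`."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (m - 1)
             for j in range(n)] for i in range(n)]


def solve(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve matrix @ z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(x) for x in row] + [float(v)] for row, v in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis(a: list[float], b: list[float], cov: list[list[float]]) -> float:
    """MD(A, B) = sqrt((A - B) CV^-1 (A - B)')."""
    diff = [x - y for x, y in zip(a, b)]
    z = solve(cov, diff)  # z = CV^-1 (A - B)'
    return math.sqrt(sum(d * w for d, w in zip(diff, z)))


data = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
cv = covariance_matrix(data)                # [[5/3, 1], [1, 5/3]]
print(mahalanobis(data[0], data[3], cv))    # ≈ 2.449 (sqrt(6))
```

With the identity matrix as CV, the result coincides with the Euclidean distance. Correlated variables are down-weighted, which is what "takes into account the covariance structure" means in practice.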
  • 42. Standardized Euclidean distance • The Standardized Euclidean Distance (SED) takes into account only the variances of the n variables, not their covariances.
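A short sketch contrasting the two Euclidean variants. It assumes the usual definition SED(A, B) = sqrt(Σ (a_i − b_i)² / s_i²), where s_i² is the sample variance of the i-th variable over the data set. Only the diagonal of the covariance matrix is used, and with all variances equal to 1 the formula reduces to ED. Function names are hypothetical.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """ED(A, B) = sqrt(sum_i (a_i - b_i)^2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def column_variances(rows: list[list[float]]) -> list[float]:
    """Sample variance (denominator m - 1) of each variable (column)."""
    m = len(rows)
    variances = []
    for col in zip(*rows):
        mean = sum(col) / m
        variances.append(sum((x - mean) ** 2 for x in col) / (m - 1))
    return variances


def standardized_euclidean(a: list[float], b: list[float], var: list[float]) -> float:
    """SED(A, B) = sqrt(sum_i (a_i - b_i)^2 / s_i^2): each axis scaled by its variance."""
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, var)))


data = [[1.0, 10.0], [2.0, 30.0], [3.0, 50.0]]
var = column_variances(data)                          # [1.0, 400.0]
print(euclidean(data[0], data[2]))                    # ≈ 40.05, dominated by the 2nd variable
print(standardized_euclidean(data[0], data[2], var))  # ≈ 2.828, both variables weigh equally
```

The example shows why the choice matters: a variable with a large spread dominates ED, while SED rescales it to the same footing as the others.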
  • 43. Cosine similarity • Stuart et al. define a distance based on the angle between vectors. It is defined as follows: $AD(A, B) = \frac{A \cdot B}{\|A\| \times \|B\|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$ and $EAD(A, B) = -\ln\frac{1 + AD(A, B)}{2}$ • where $AD(A, B)$ is the cosine similarity between $A$ and $B$, and $EAD(A, B)$ is the evolutionary distance between $A$ and $B$.
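A minimal Python sketch of both formulas, using hypothetical function names. It relies only on the definitions above. Clamping the cosine to [−1, 1] is an implementation choice added here to guard against floating-point rounding; it is not part of the original definition.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """AD(A, B) = (A . B) / (|A| |B|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp tiny floating-point overshoots so the logarithm below stays defined.
    return max(-1.0, min(1.0, dot / norm))


def evolutionary_distance(a: list[float], b: list[float]) -> float:
    """EAD(A, B) = -ln((1 + AD(A, B)) / 2); zero for vectors pointing the same way."""
    return -math.log((1.0 + cosine_similarity(a, b)) / 2.0)


print(evolutionary_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ~0 (same direction)
print(evolutionary_distance([1.0, 0.0], [0.0, 1.0]))            # ln 2 (orthogonal)
```

Unlike ED, EAD ignores vector length and depends only on the angle. It grows without bound as AD approaches −1, and exactly opposite vectors make the logarithm undefined (Python raises `ValueError`).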
  • 44. Results • Two data sets were used: • a real sequence set: the human mitochondrial genome • a random sequence set, obtained by applying random mutations to the real sequence set (mutation rates of 1%, 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90% and 100%) • The Euclidean, SED, Mahalanobis and EAD distances were used
  • 45. Results • $d(x)$ denotes the distance between a sequence and its randomly mutated version. • The Euclidean distance is more sensitive to the mutation rate than the other three distances.
  • 46. Results • 35 mitochondrial genome sequences from different mammals (GenBank database) • Primate species, including human, ape, gorilla, chimpanzee, etc., are grouped together • The result agrees with those obtained by Yu et al. (2010) and Raina et al. (2005)
  • 48. Presented methods comparison

                        LZ-complexity based algorithm    2D-graphic based algorithm
    Approach            Dynamic programming algorithm    Graphical algorithm
    Similarity measure  LZ-complexity measure            Various distances (ED, Mahalanobis, …)
    Sequences           Generic (DNA/proteins)           DNA-specific
    Result              Unrooted phylogenetic tree       Rooted phylogenetic tree

Editor's notes

  1. PHYLIP is a free package of programs for inferring phylogenies