FBW 20-10-2011 Wim Van Criekinge
Course contents: Bioinformatics
Overview
NCBI - The National Center for Biotechnology Information
http://www.ncbi.nlm.nih.gov/
The National Center for Biotechnology Information (NCBI) is part of the National Library of Medicine (NLM), itself a part of the National Institutes of Health (NIH).

ExPASy - Molecular Biology Server
http://expasy.hcuge.ch/www/
Molecular biology WWW server of the Swiss Institute of Bioinformatics (SIB). This server is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.

EBI - European Bioinformatics Institute
http://www.ebi.ac.uk/
Anno 2002 Anno 2003
Anno 2004
Anno 2005
Anno 2006
Anno 2007
Anno 2009
Anno 2010 Anno 2010
Anno 2011
Definitions
Identity: the extent to which two (nucleotide or amino acid) sequences are invariant.
Homology: similarity attributed to descent from a common ancestor.

RBP:         26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD- 84
                + K ++ + + GTW ++ MA + L + A V T + + L + W +
glycodelin:  23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN 81
Definitions
Orthologous: homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function.
Paralogous: homologous sequences within a single species that arose by gene duplication.
speciation duplication
fly        GAKKVIISAP SAD.APM..F VCGVNLDAYK PDMKVVSNAS CTTNCLAPLA
human      GAKRVIISAP SAD.APM..F VMGVNHEKYD NSLKIISNAS CTTNCLAPLA
plant      GAKKVIISAP SAD.APM..F VVGVNEHTYQ PNMDIVSNAS CTTNCLAPLA
bacterium  GAKKVVMTGP SKDNTPM..F VKGANFDKY. AGQDIVSNAS CTTNCLAPLA
yeast      GAKKVVITAP SS.TAPM..F VMGVNEEKYT SDLKIVSNAS CTTNCLAPLA
archaeon   GADKVLISAP PKGDEPVKQL VYGVNHDEYD GE.DVVSNAS CTTNSITPVA

fly        KVINDNFEIV EGLMTTVHAT TATQKTVDGP SGKLWRDGRG AAQNIIPAST
human      KVIHDNFGIV EGLMTTVHAI TATQKTVDGP SGKLWRDGRG ALQNIIPAST
plant      KVVHEEFGIL EGLMTTVHAT TATQKTVDGP SMKDWRGGRG ASQNIIPSST
bacterium  KVINDNFGII EGLMTTVHAT TATQKTVDGP SHKDWRGGRG ASQNIIPSST
yeast      KVINDAFGIE EGLMTTVHSL TATQKTVDGP SHKDWRGGRT ASGNIIPSST
archaeon   KVLDEEFGIN AGQLTTVHAY TGSQNLMDGP NGKP.RRRRA AAENIIPTST

fly        GAAKAVGKVI PALNGKLTGM AFRVPTPNVS VVDLTVRLGK GASYDEIKAK
human      GAAKAVGKVI PELNGKLTGM AFRVPTANVS VVDLTCRLEK PAKYDDIKKV
plant      GAAKAVGKVL PELNGKLTGM AFRVPTSNVS VVDLTCRLEK GASYEDVKAA
bacterium  GAAKAVGKVL PELNGKLTGM AFRVPTPNVS VVDLTVRLEK AATYEQIKAA
yeast      GAAKAVGKVL PELQGKLTGM AFRVPTVDVS VVDLTVKLNK ETTYDEIKKV
archaeon   GAAQAATEVL PELEGKLDGM AIRVPVPNGS ITEFVVDLDD DVTESDVNAA

Multiple sequence alignment of glyceraldehyde-3-phosphate dehydrogenases
It is very important to realize that all subsequent results depend critically on just how this is done and on what model lies at the basis for the construction of a specific scoring matrix. A scoring matrix is a tool to quantify how well a certain model is represented in the alignment of two sequences, and any result obtained by its application is meaningful exclusively in the context of that model.
G and C: purine-pyrimidine
A and T: purine-pyrimidine
     A  T  C  G
A    0  5  5  1
T    5  0  1  5
C    5  1  0  5
G    1  5  5  0
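A minimal sketch of how the nucleotide matrix above could be applied, assuming it is read as a cost (distance) matrix: 0 for identity, 1 for a transition (A<->G, C<->T) and 5 for a transversion. The function name alignment_cost is illustrative and not part of the original slides.

```python
# Substitution costs copied from the 4x4 matrix above.
COST = {
    "A": {"A": 0, "T": 5, "C": 5, "G": 1},
    "T": {"A": 5, "T": 0, "C": 1, "G": 5},
    "C": {"A": 5, "T": 1, "C": 0, "G": 5},
    "G": {"A": 1, "T": 5, "C": 5, "G": 0},
}

def alignment_cost(seq1, seq2):
    """Sum the per-position costs of two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(COST[a][b] for a, b in zip(seq1.upper(), seq2.upper()))

# One transition (A/G) plus one transversion (T/A): 1 + 5 = 6
print(alignment_cost("ACGT", "GCGA"))
```

Lower totals mean more similar sequences under this model; a similarity matrix would invert the sign convention.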
The Genome Chose Its Alphabet With Care
           A  S  G  L  K  V  T  P  E  D  N  I  Q  R  F  Y  C  H  M  W  Z  B  X
Ala = A    0  1  1  2  2  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2  2  2
Ser = S    1  0  1  1  2  2  1  1  2  2  1  1  2  1  1  1  1  2  2  1  2  2  2
Gly = G    1  1  0  2  2  1  2  2  1  1  2  2  2  1  2  2  1  2  2  1  2  2  2
Leu = L    2  1  2  0  2  1  2  1  2  2  2  1  1  1  1  2  2  1  1  1  2  2  2
Lys = K    2  2  2  2  0  2  1  2  1  2  1  1  1  1  2  2  2  2  1  2  1  2  2
Val = V    1  2  1  1  2  0  2  2  1  1  2  1  2  2  1  2  2  2  1  2  2  2  2
Thr = T    1  1  2  2  1  2  0  1  2  2  1  1  2  1  2  2  2  2  1  2  2  2  2
Pro = P    1  1  2  1  2  2  1  0  2  2  2  2  1  1  2  2  2  1  2  2  2  2  2
Glu = E    1  2  1  2  1  1  2  2  0  1  2  2  1  2  2  2  2  2  2  2  1  2  2
Asp = D    1  2  1  2  2  1  2  2  1  0  1  2  2  2  2  1  2  1  2  2  2  1  2
Asn = N    2  1  2  2  1  2  1  2  2  1  0  1  2  2  2  1  2  1  2  2  2  1  2
Ile = I    2  1  2  1  1  1  1  2  2  2  1  0  2  1  1  2  2  2  1  2  2  2  2
Gln = Q    2  2  2  1  1  2  2  1  1  2  2  2  0  1  2  2  2  1  2  2  1  2  2
Arg = R    2  1  1  1  1  2  1  1  2  2  2  1  1  0  2  2  1  1  1  1  2  2  2
Phe = F    2  1  2  1  2  1  2  2  2  2  2  1  2  2  0  1  1  2  2  2  2  2  2
Tyr = Y    2  1  2  2  2  2  2  2  2  1  1  2  2  2  1  0  1  1  3  2  2  1  2
Cys = C    2  1  1  2  2  2  2  2  2  2  2  2  2  1  1  1  0  2  2  1  2  2  2
His = H    2  2  2  1  2  2  2  1  2  1  1  2  1  1  2  1  2  0  2  2  2  1  2
Met = M    2  2  2  1  1  1  1  2  2  2  2  1  2  1  2  3  2  2  0  2  2  2  2
Trp = W    2  1  1  1  2  2  2  2  2  2  2  2  2  1  2  2  1  2  2  0  2  2  2
Glx = Z    2  2  2  2  1  2  2  2  1  2  2  2  1  2  2  2  2  2  2  2  1  2  2
Asx = B    2  2  2  2  2  2  2  2  2  1  1  2  2  2  2  1  2  1  2  2  2  1  2
??? = X    2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2

The table is generated by calculating the minimum number of base changes required to convert an amino acid in row i to an amino acid in column j.
Note: in this table, Met->Tyr is the only change scored as requiring all 3 codon positions to change.
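The calculation described above can be sketched as follows. This is an illustration under the assumption that the standard genetic code is used and that the distance between two amino acids is the smallest number of differing positions over all pairs of their codons; the function names are hypothetical. Values computed this way need not agree with every cell of the table above, which may rest on different conventions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons_for(aa):
    """Return all codons that encode the one-letter amino acid aa."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]

def min_base_changes(aa1, aa2):
    """Minimum number of differing codon positions between aa1 and aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )

# Met (ATG) versus Tyr (TAT, TAC): all three positions differ.
print(min_base_changes("M", "Y"))
```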
All amino acids have the same general formula
Other similarity scoring matrices might be constructed from any property of amino acids that can be quantified:
- partition coefficients between hydrophobic and hydrophilic phases
- charge
- molecular volume
Unfortunately, …
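As an illustration of a property-based matrix, the sketch below scores a pair of residues by how close they are on a single numeric scale. The Kyte-Doolittle hydropathy index is used here as a stand-in for the partition coefficients mentioned above; that choice, the scoring rule (largest possible difference minus the observed difference) and the function name are assumptions made for this example only.

```python
# Kyte-Doolittle hydropathy index, one value per amino acid (assumed scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Largest difference on the scale; used so that scores are never negative.
MAX_DIFF = max(HYDROPATHY.values()) - min(HYDROPATHY.values())

def property_score(aa1, aa2):
    """Similarity of two residues: high when their property values are close."""
    return MAX_DIFF - abs(HYDROPATHY[aa1] - HYDROPATHY[aa2])

# Full 20x20 matrix as a nested dictionary.
MATRIX = {a: {b: property_score(a, b) for b in HYDROPATHY} for a in HYDROPATHY}

print(MATRIX["I"]["L"], MATRIX["I"]["R"])
```

A matrix built this way captures only the one property chosen, which is the kind of limitation the slide alludes to.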
AAindex
Protein Eng. 1996 Jan;9(1):27-36.
First step: finding “accepted mutations”
Dayhoff’s PAM1 mutation probability matrix  (Transition Matrix)
PAM1: Transition Matrix
Second step: Frequencies of Occurrence
Amino acid frequencies
Third step: Relative Mutabilities
Fourth step: Mutation Probability Matrix
M_ij: the mutation probability matrix gives the probability that an amino acid i will replace an amino acid of type j in a given evolutionary interval, in two related sequences.
Illustration: sequences ADB and ADA, with matrix rows i and columns j labelled A, D, B.
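A sketch of how such a matrix could be assembled from accepted-mutation counts, following the usual Dayhoff-style recipe as an assumption: off-diagonal entries M_ij = lambda * m_j * A_ij / sum_i A_ij and diagonal entries M_jj = 1 - lambda * m_j, with lambda scaled so that 1% of positions are expected to change. The three-letter alphabet, the counts, the frequencies and the simplified mutability (changes divided by frequency) are all hypothetical toy values.

```python
ALPHABET = ["A", "D", "B"]           # toy alphabet, not real amino acid data
COUNTS = [[0, 30, 10],               # COUNTS[i][j]: accepted mutations j -> i
          [30, 0, 20],
          [10, 20, 0]]
FREQ = [0.5, 0.3, 0.2]               # hypothetical frequencies of occurrence

def mutation_probability_matrix(counts, freqs, target=0.01):
    """Build a column-stochastic matrix with an expected change rate of target."""
    n = len(counts)
    col_totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    # Simplified relative mutability: observed changes per unit of frequency.
    mutability = [col_totals[j] / freqs[j] for j in range(n)]
    # Scale factor so that sum_j f_j * (1 - M_jj) equals target.
    lam = target / sum(freqs[j] * mutability[j] for j in range(n))
    m = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                m[i][j] = lam * mutability[j] * counts[i][j] / col_totals[j]
        m[j][j] = 1.0 - lam * mutability[j]
    return m

M = mutation_probability_matrix(COUNTS, FREQ)
for label, row in zip(ALPHABET, M):
    print(label, [round(x, 5) for x in row])
```

Each column j sums to 1 because it distributes the fate of residue j over all possible replacements, including staying unchanged.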
Fifth step: The Evolutionary Distance
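Larger evolutionary distances are commonly modelled by applying the one-step matrix repeatedly, so that PAM-N is the N-th power of PAM1. The sketch below assumes that convention; the 3x3 matrix is a made-up, column-stochastic toy example and pam_n is a hypothetical helper name.

```python
# Toy one-step matrix: TOY_PAM1[i][j] = P(residue j is replaced by residue i).
TOY_PAM1 = [[0.990, 0.006, 0.004],
            [0.007, 0.990, 0.006],
            [0.003, 0.004, 0.990]]

def mat_mul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pam_n(pam1, n):
    """Return pam1 raised to the n-th power by repeated multiplication."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result

pam250 = pam_n(TOY_PAM1, 250)
for row in pam250:
    print([round(x, 3) for x in row])
```

After many steps the diagonal entries drop well below their PAM1 values, reflecting that most positions have had a chance to change.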
Sixth step: Relatedness Odds
Last step: the log-odds matrix
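A sketch of the final conversion, assuming the usual Dayhoff convention: the relatedness odds are R_ij = M_ij / f_i and the score is S_ij = 10 * log10(R_ij), rounded to the nearest integer. The matrix and frequencies are invented toy values, so the resulting scores are not guaranteed to be symmetric as they would be for real data.

```python
import math

ALPHABET = ["A", "D", "B"]            # toy alphabet
M = [[0.60, 0.30, 0.20],              # M[i][j] = P(j replaced by i), toy values
     [0.25, 0.50, 0.30],
     [0.15, 0.20, 0.50]]
FREQ = [0.40, 0.35, 0.25]             # hypothetical background frequencies f_i

def log_odds_matrix(m, freqs):
    """Convert mutation probabilities into rounded log-odds scores."""
    n = len(freqs)
    return [[round(10 * math.log10(m[i][j] / freqs[i])) for j in range(n)]
            for i in range(n)]

S = log_odds_matrix(M, FREQ)
for label, row in zip(ALPHABET, S):
    print(label, row)
```

Positive scores mark replacements seen more often than chance would predict, negative scores those seen less often.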
Dayhoff’s PAM1 mutation probability matrix  (Transition Matrix)
Weighted Random Selection
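Weighted random selection can be sketched as drawing a uniform random number and walking through the cumulative weights until it is exceeded. The example below uses this to mutate a sequence with a toy transition matrix; it is an illustrative Python sketch, not the code of the PAM simulator referred to in these slides, and all names and values are assumptions.

```python
import random

ALPHABET = ["A", "D", "B"]            # toy alphabet
# Toy transition matrix: M[i][j] = P(residue j is replaced by residue i).
M = [[0.990, 0.006, 0.004],
     [0.007, 0.990, 0.006],
     [0.003, 0.004, 0.990]]

def weighted_choice(items, weights, rng):
    """Pick one item with probability proportional to its weight."""
    threshold = rng.random() * sum(weights)
    running = 0.0
    for item, weight in zip(items, weights):
        running += weight
        if threshold < running:
            return item
    return items[-1]  # guard against floating-point round-off

def mutate(sequence, steps, rng):
    """Apply the transition matrix to every position, steps times."""
    residues = list(sequence)
    for _ in range(steps):
        for pos, res in enumerate(residues):
            j = ALPHABET.index(res)
            column = [M[i][j] for i in range(len(ALPHABET))]
            residues[pos] = weighted_choice(ALPHABET, column, rng)
    return "".join(residues)

rng = random.Random(0)  # fixed seed so that runs are repeatable
print(mutate("ADBADBADBA", 250, rng))
```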
PAM-Simulator
Figure: A brief history of time, on an axis running from 4 to 0 billion years ago (BYA). Labelled events: origin of life, origin of eukaryotes, insects, fungi/animal, plant/animal, earliest fossils.
Margaret Dayhoff’s 34 protein superfamilies

Protein           PAMs per 100 million years
Ig kappa chain    37
Kappa casein      33
Lactalbumin       27
Hemoglobin        12
Myoglobin         8.9
Insulin           4.4
Histone H4        0.10
Ubiquitin         0.00
BLOSUM: Blocks Substitution Matrix
BLOSUM (BLOck - SUM) scoring
Block = ungapped alignment
E.g. amino acids D, N, V, A

     a  b  c  d  e  f
1    D  D  N  A  A  V
2    D  N  A  V  D  D
3    N  N  V  A  V  V

S = 3 sequences
W = 6 aa
N = (W*S*(S-1))/2 = 18 pairs
A. Observed pairs

     a  b  c  d  e  f
1    D  D  N  A  A  V
2    D  N  A  V  D  D
3    N  N  V  A  V  V

f_ij (observed pair counts):
     D  N  A  V
D    1
N    4  1
A    1  1  1
V    3  1  4  1

g_ij = f_ij/18 (relative frequency table):
     D     N     A     V
D    .056
N    .222  .056
A    .056  .056  .056
V    .167  .056  .222  .056

Probability of obtaining a pair if randomly choosing pairs from block
B. Expected pairs

Residues in the block (DDNAAV, DNAVDD, NNVAVV): DDDDD NNNN AAAA VVVVV
P_i:  D 5/18   N 4/18   A 4/18   V 5/18

P{Draw DN pair} = P{Draw D, then N or Draw N, then D}
P{Draw DN pair} = P_D P_N + P_N P_D = 2 * (5/18)*(4/18) = .123

e_ij (random relative frequency table):
     D     N     A     V
D    .077
N    .123  .049
A    .123  .099  .049
V    .154  .123  .123  .077

Probability of obtaining a pair of each amino acid drawn independently from block
C. Summary (A/B)
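The three steps (observed pairs, expected pairs, their ratio) can be sketched for the example block as follows. The final scaling, 2 * log2(g_ij / e_ij) rounded to an integer, is the half-bit convention commonly used for BLOSUM matrices and is an assumption here, as are the function names.

```python
import math
from collections import Counter
from itertools import combinations

BLOCK = ["DDNAAV", "DNAVDD", "NNVAVV"]   # the example block from the slides

def observed_pairs(block):
    """Count unordered residue pairs within each column of the block."""
    counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts

def expected_pairs(block):
    """Pair probabilities if residues were drawn independently from the block."""
    residues = Counter("".join(block))
    total = sum(residues.values())
    p = {aa: n / total for aa, n in residues.items()}
    return {(a, b): p[a] * p[b] if a == b else 2 * p[a] * p[b]
            for a in p for b in p if a <= b}

f = observed_pairs(BLOCK)                      # A. observed counts f_ij
n_pairs = sum(f.values())                      # 18 pairs in total
g = {pair: count / n_pairs for pair, count in f.items()}   # g_ij = f_ij / 18
e = expected_pairs(BLOCK)                      # B. expected frequencies e_ij
# C. log-odds of observed over expected, in half-bit units (assumed scaling).
scores = {pair: round(2 * math.log2(g[pair] / e[pair])) for pair in g}

for pair in sorted(scores):
    print(pair, f[pair], round(g[pair], 3), round(e[pair], 3), scores[pair])
```

Pairs that occur more often within columns than independent drawing predicts, such as D with N, receive positive scores.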
Rat versus mouse RBP
Rat versus bacterial lipocalin
Dotplots
Dot Plot References
Visual Alignments (Dot Plots)
Dotplot-simulator.pl
Window size = 1, stringency 100%
Noise in Dot Plots
Reduction of Dot Plot Noise
Self alignment of ACCTGAGCTCACCTGAGTTA
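The effect of window size and stringency on noise can be sketched as below. This is an illustrative Python version rather than the Dotplot-simulator.pl script named in the slides; a dot is placed at (i, j) when at least stringency of the window positions starting at i and j match, and the function names are assumptions.

```python
def dotplot(seq1, seq2, window=1, stringency=1):
    """Return the set of (i, j) start positions whose windows match well enough."""
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots

def render(dots, seq1, seq2, window=1):
    """Draw the dot plot as text, one row per start position in seq1."""
    rows = []
    for i in range(len(seq1) - window + 1):
        rows.append("".join("*" if (i, j) in dots else "."
                            for j in range(len(seq2) - window + 1)))
    return "\n".join(rows)

SEQ = "ACCTGAGCTCACCTGAGTTA"
noisy = dotplot(SEQ, SEQ, window=1, stringency=1)      # every single-base match
filtered = dotplot(SEQ, SEQ, window=3, stringency=3)   # only exact 3-mer matches

print(render(noisy, SEQ, SEQ, window=1))
print()
print(render(filtered, SEQ, SEQ, window=3))
```

With the larger window the main diagonal and the repeated ACCTGAG segment remain, while most isolated single-base dots disappear.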
Chromosome Y self comparison
Available Dot Plot Programs
Weblems

Bioinformatica 20-10-2011-t3-scoring matrices

  • 1.  
  • 2. FBW 20-10-2011 Wim Van Criekinge
  • 3.
  • 4.
  • 5.
  • 14.
  • 15. Identity The extent to which two (nucleotide or amino acid) sequences are invariant. Homology Similarity attributed to descent from a common ancestor. Definitions RBP: 26 RV K ENFDKARFS GTW YA MA KKDPEGLFLQDNIV A EFS V DE T GQMSATAKGRVRL L NN W D- 84 + K ++ + + GTW ++ MA + L + A V T + + L + W + glycodelin: 23 QT K QDLELPKLA GTW HS MA MA-TNNISLMATLK A PLR V HI T SLLPTPEDNLEIV L HR W EN 81
  • 16. Definitions
      Orthologous: homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function.
      Paralogous: homologous sequences within a single species that arose by gene duplication.
  • 18. Multiple sequence alignment of glyceraldehyde-3-phosphate dehydrogenases
      fly        GAKKVIISAP SAD.APM..F VCGVNLDAYK PDMKVVSNAS CTTNCLAPLA
      human      GAKRVIISAP SAD.APM..F VMGVNHEKYD NSLKIISNAS CTTNCLAPLA
      plant      GAKKVIISAP SAD.APM..F VVGVNEHTYQ PNMDIVSNAS CTTNCLAPLA
      bacterium  GAKKVVMTGP SKDNTPM..F VKGANFDKY. AGQDIVSNAS CTTNCLAPLA
      yeast      GAKKVVITAP SS.TAPM..F VMGVNEEKYT SDLKIVSNAS CTTNCLAPLA
      archaeon   GADKVLISAP PKGDEPVKQL VYGVNHDEYD GE.DVVSNAS CTTNSITPVA

      fly        KVINDNFEIV EGLMTTVHAT TATQKTVDGP SGKLWRDGRG AAQNIIPAST
      human      KVIHDNFGIV EGLMTTVHAI TATQKTVDGP SGKLWRDGRG ALQNIIPAST
      plant      KVVHEEFGIL EGLMTTVHAT TATQKTVDGP SMKDWRGGRG ASQNIIPSST
      bacterium  KVINDNFGII EGLMTTVHAT TATQKTVDGP SHKDWRGGRG ASQNIIPSST
      yeast      KVINDAFGIE EGLMTTVHSL TATQKTVDGP SHKDWRGGRT ASGNIIPSST
      archaeon   KVLDEEFGIN AGQLTTVHAY TGSQNLMDGP NGKP.RRRRA AAENIIPTST

      fly        GAAKAVGKVI PALNGKLTGM AFRVPTPNVS VVDLTVRLGK GASYDEIKAK
      human      GAAKAVGKVI PELNGKLTGM AFRVPTANVS VVDLTCRLEK PAKYDDIKKV
      plant      GAAKAVGKVL PELNGKLTGM AFRVPTSNVS VVDLTCRLEK GASYEDVKAA
      bacterium  GAAKAVGKVL PELNGKLTGM AFRVPTPNVS VVDLTVRLEK AATYEQIKAA
      yeast      GAAKAVGKVL PELQGKLTGM AFRVPTVDVS VVDLTVKLNK ETTYDEIKKV
      archaeon   GAAQAATEVL PELEGKLDGM AIRVPVPNGS ITEFVVDLDD DVTESDVNAA
  • 54. Other similarity scoring matrices might be constructed from any property of amino acids that can be quantified:
      - partition coefficients between hydrophobic and hydrophilic phases
      - charge
      - molecular volume
      Unfortunately, …
  • 56. Protein Eng. 1996 Jan;9(1):27-36.
  • 65. Dayhoff’s PAM1 mutation probability matrix (Transition Matrix)
  • 83. Dayhoff’s PAM1 mutation probability matrix (Transition Matrix)
  • 91. A brief history of time (BYA = billion years ago)
      Time axis: 4, 3, 2, 1, 0 BYA
      Events marked: origin of life; origin of eukaryotes; insects; fungi/animal; plant/animal; earliest fossils
  • 92. Margaret Dayhoff’s 34 protein superfamilies
      Protein           PAMs per 100 million years
      Ig kappa chain    37
      Kappa casein      33
      Lactalbumin       27
      Hemoglobin        12
      Myoglobin         8.9
      Insulin           4.4
      Histone H4        0.10
      Ubiquitin         0.00
  • 96. BLOSUM (BLOck-SUM) scoring
      Block = ungapped alignment, e.g. with amino acids D, N, V, A:
             a b c d e f
          1  D D N A A V
          2  D N A V D D
          3  N N V A V V
      S = 3 sequences
      W = 6 aa
      N = (W*S*(S-1))/2 = 18 pairs
  • 97. A. Observed pairs
      Block (columns a-f, sequences 1-3): DDNAAV / DNAVDD / NNVAVV
      Observed pair counts f_ij:
             D     N     A     V
          D  1
          N  4     1
          A  1     1     1
          V  3     1     4     1
      Relative frequency table g_ij = f_ij / 18 (probability of obtaining a pair if randomly choosing pairs from the block):
             D     N     A     V
          D  .056
          N  .222  .056
          A  .056  .056  .056
          V  .167  .056  .222  .056
  • 98. B. Expected pairs
      Residues in the block (DDNAAV / DNAVDD / NNVAVV): DDDDD NNNN AAAA VVVVV
      P_i: D = 5/18, N = 4/18, A = 4/18, V = 5/18
      P{Draw DN pair} = P{Draw D, then N, or draw N, then D}
      P{Draw DN pair} = P_D P_N + P_N P_D = 2 * (5/18) * (4/18) = .123
      Random relative frequency table e_ij (probability of obtaining a pair of each amino acid drawn independently from the block):
             D     N     A     V
          D  .077
          N  .123  .049
          A  .123  .099  .049
          V  .154  .123  .123  .077
  • 105. Rat versus mouse RBP; rat versus bacterial lipocalin
  • 114. Reduction of Dot Plot Noise: self-alignment of ACCTGAGCTCACCTGAGTTA
  • 116. Chromosome Y self comparison
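
The BLOSUM counting steps on slides 96 to 98 can be reproduced with a short script. This is a minimal sketch for the three-sequence example block only, not the procedure used to build the published matrices. The function names are made up for illustration. The final log-odds step (score = 2 * log2(observed / expected), rounded to an integer) does not appear in the slide text and is added here as an assumption about how the frequencies are turned into scores.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Ungapped block from slide 96: S = 3 sequences, W = 6 columns.
block = ["DDNAAV", "DNAVDD", "NNVAVV"]

def observed_pair_counts(block):
    """Count unordered residue pairs f_ij over every column of the block."""
    counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts

def residue_frequencies(block):
    """Fraction of block positions occupied by each residue (P_i)."""
    residues = "".join(block)
    return {aa: n / len(residues) for aa, n in Counter(residues).items()}

def expected_pair_frequency(p, a, b):
    """e_ij for two independent draws: P_i * P_i, or 2 * P_i * P_j if i != j."""
    return p[a] * p[b] if a == b else 2 * p[a] * p[b]

f = observed_pair_counts(block)
total_pairs = sum(f.values())  # W * S * (S - 1) / 2 = 18
g = {pair: n / total_pairs for pair, n in f.items()}
p = residue_frequencies(block)
e = {pair: expected_pair_frequency(p, *pair) for pair in f}

# Assumed final step (not shown on the slides): half-bit log-odds score.
scores = {pair: round(2 * log2(g[pair] / e[pair])) for pair in f}

# Columns printed: pair, f_ij, g_ij, e_ij, score
for pair in sorted(f):
    print(pair, f[pair], round(g[pair], 3), round(e[pair], 3), scores[pair])
```

Pairs seen more often than chance predicts (for example D with N: observed .222 against expected .123) get a positive score, and pairs seen less often than chance get a negative one.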

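Slide 114 names dot plot noise reduction but its figure did not survive extraction. One common way to reduce noise, assumed here rather than taken from the slide, is to place a dot only where a whole window of consecutive positions matches. The sketch below compares the slide's sequence with itself, first base by base and then with an exact 4-base window. The function names and the window size are illustrative choices.

```python
def dot_plot(seq_a, seq_b, window=1, min_matches=1):
    """Return the set of (i, j) window start positions that pass the filter.

    A dot is placed at (i, j) when at least min_matches of the positions
    seq_a[i + k] == seq_b[j + k] agree, for k = 0 .. window - 1.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= min_matches:
                dots.add((i, j))
    return dots

def render(dots, rows, cols):
    """Draw the dot set as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(cols))
        for i in range(rows)
    )

seq = "ACCTGAGCTCACCTGAGTTA"  # sequence from slide 114

raw = dot_plot(seq, seq)                                # every single-base match
filtered = dot_plot(seq, seq, window=4, min_matches=4)  # exact 4-base matches only

print(render(raw, len(seq), len(seq)))
print()
n = len(seq) - 4 + 1  # number of window start positions
print(render(filtered, n, n))
```

In the filtered plot only the main diagonal and the two short off-diagonal runs from the repeated ACCTGAG segment remain. Lowering `min_matches` below `window` would tolerate mismatches inside a window at the cost of more background dots.
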
Editor's notes

  1. Mutation probability matrix for the evolutionary distance of 1 PAM (i.e., one Accepted Point Mutation per 100 amino acids). An element of this matrix, [Mij], gives the probability that the amino acid in column j will be replaced by the amino acid in row i after a given evolutionary interval, in this case 1 PAM. Thus, there is a 0.56% probability that Asp will be replaced by Glu. To simplify the appearance, the elements are shown multiplied by 10,000. (Adapted from Figure 82. Atlas of Protein Sequence and Structure, Suppl 3, 1978, M.O. Dayhoff, ed. National Biomedical Research Foundation, 1979.)
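
The note above describes one PAM1 step. Matrices for larger distances are conventionally obtained by multiplying the PAM1 matrix by itself (PAM250 is PAM1 raised to the 250th power). The sketch below shows that operation on a hypothetical 3-letter alphabet. The numbers in `toy_pam1` are invented for illustration and are not Dayhoff's values; only the convention that each column sums to 1, with entry [i][j] giving the probability that residue j is replaced by residue i, follows the note.

```python
def mat_mul(a, b):
    """Plain square matrix product a * b."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, power):
    """Return m raised to a non-negative integer power (repeated squaring)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    base = m
    while power:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result

# Hypothetical 1-PAM matrix for a 3-letter alphabet (illustrative values only).
# toy_pam1[i][j] = probability that residue j is replaced by residue i.
# Each column sums to 1; the diagonal 0.990 means 1% of residues change per step.
toy_pam1 = [
    [0.990, 0.004, 0.006],
    [0.006, 0.990, 0.004],
    [0.004, 0.006, 0.990],
]

toy_pam250 = mat_pow(toy_pam1, 250)

for row in toy_pam250:
    print([round(x, 4) for x in row])
```

After many steps the columns remain probability distributions, but the diagonal shrinks well below 0.99. Repeated substitutions at the same site are the reason a distance of 250 PAM does not mean that every residue has changed.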