SURYABHAN
SINGH RAWAT
Protein Classification

     A comparison of function
       inference techniques
Why do we need automated
classification?
   Sequencing a genome is only the first
    step.
   Between 35-50% of the proteins in
    sequenced genomes have no assigned
    functionality.
   Direct observation of function is costly,
    time consuming, and difficult.
Protein Domains
The tertiary structure of many proteins is built from
several domains.
Often each domain has a separate function to
perform for the protein, such as:
•binding a small ligand (e.g., a peptide in the
molecule shown here)
•spanning the plasma membrane (
transmembrane proteins)
•containing the catalytic site (enzymes)
•DNA-binding (in transcription factors)
•providing a surface to bind specifically to another
protein
In some (but not all) cases, each domain in a
protein is encoded by a separate exon in the gene
encoding that protein.
Inference through sequence
similarity

 ProtoMap: Automatic Classification
 of Protein Sequences, a Hierarchy
   of Protein Families, and Local
 Maps of the Protein Space (1999)
Final Goal
Observations
   Sometimes you don’t know where the
    domains are.
   It is generally accepted that two
    sequences with over 30% identity are
    likely to have the same fold.
   Homologous proteins have similar
    functions.
   Homology is a transitive relationship.
Departures
   Authors do not attempt to define protein
    domains or motifs.
   Not dependent on predefined groups or
    classifications.
   Chart the space of all proteins in
    SWISSPROT, as opposed to individual
    families
   Produce global organization of sequences.
Algorithm Overview
   We construct a weighted graph where
    the nodes are protein sequences and
    the edges are similarity scores.
   Cluster the network considering only
    those edges above some threshold.
   Decrease similarity threshold and
    repeat.
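
One step of the overview above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes edges carry E-values (lower means stronger similarity) and treats a "cluster" as a connected component of the thresholded graph. The function name and the toy data are hypothetical.

```python
from collections import defaultdict

def cluster_at_threshold(proteins, evalues, threshold):
    """Group proteins into connected components, using only edges whose
    E-value is at or below `threshold` (assumption: lower = more similar)."""
    parent = {p: p for p in proteins}

    def find(x):
        # Walk up to the root of x's set, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), e in evalues.items():
        if e <= threshold:
            parent[find(a)] = find(b)  # merge the two sets

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical toy data: pairwise E-values between five made-up sequences.
proteins = ["A", "B", "C", "D", "E"]
evalues = {("A", "B"): 1e-120, ("B", "C"): 1e-40, ("D", "E"): 1e-90}

strict = cluster_at_threshold(proteins, evalues, 1e-100)
relaxed = cluster_at_threshold(proteins, evalues, 1e-30)
```

Relaxing the threshold admits weaker edges, so clusters can only merge, never split.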
Measuring Sequence Similarity
    The expectation value (E-value) is used. This is the
     normalized probability of the similarity
     occurring at random.
    A lower value implies logarithmically
     stronger similarity.

$$S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad E = \frac{N}{2^{S'}}$$
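
The two formulas can be checked with a short calculation. This is a sketch under stated assumptions: S is taken to be the raw alignment score, N the size of the search space, and the values of λ and K below are illustrative placeholders only (in practice they depend on the scoring matrix and gap penalties).

```python
import math

def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expectation_value(raw_score, search_space, lam, k):
    """E = N / 2^S', where N is the (assumed) search-space size."""
    return search_space / 2 ** bit_score(raw_score, lam, k)

# Illustrative placeholder parameters, not taken from the slides.
LAMBDA, K = 0.267, 0.041
N = 1e9  # hypothetical search-space size

weak = expectation_value(50, N, LAMBDA, K)
strong = expectation_value(100, N, LAMBDA, K)
```

Because 2^S' equals e^(λS) / K, the same E-value can be written as N · K · e^(−λS), which makes the exponential drop with increasing score explicit.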
Blosum62 Scoring Matrix
Finding Homologies
   Very difficult to distinguish a clear
    threshold between homology and
    chance similarity.
   The authors chose E = 0.1, 0.1, and 0.001 for
    Smith-Waterman (SW), FASTA, and BLAST, respectively.
   They spent considerable time empirically
    determining these thresholds.
Clustering
   Clustering is done iteratively.
   Start with a threshold of $E < 10^{-100}$.
   Cluster, then increase the threshold by a
    factor of $10^{5}$.
   Sublinear threshold prevents the collapse
    of sequence space.
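
The iterative schedule can be sketched as below. This is a simplified illustration, not the published procedure: it merges by plain single linkage at each level, whereas the original method may apply further criteria before merging clusters. The stopping exponent is a hypothetical parameter, since the slide does not say where the schedule ends.

```python
def threshold_schedule(start_exp=-100, step_exp=5, stop_exp=0):
    """E-value thresholds 1e-100, 1e-95, ... (each a factor of 1e5 larger).
    stop_exp is an assumed end point."""
    return [float(f"1e{e}") for e in range(start_exp, stop_exp + 1, step_exp)]

def hierarchical_levels(proteins, evalues, thresholds):
    """Return (threshold, number_of_clusters) for each level of the schedule."""
    parent = {p: p for p in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Process edges from strongest (smallest E-value) to weakest.
    edges = sorted(evalues.items(), key=lambda item: item[1])
    levels, i = [], 0
    for t in thresholds:
        while i < len(edges) and edges[i][1] <= t:
            (a, b), _ = edges[i]
            parent[find(a)] = find(b)
            i += 1
        levels.append((t, len({find(p) for p in proteins})))
    return levels

# Hypothetical toy data.
proteins = ["A", "B", "C", "D", "E"]
evalues = {("A", "B"): 1e-120, ("B", "C"): 1e-40, ("D", "E"): 1e-90}

schedule = threshold_schedule()
levels = hierarchical_levels(proteins, evalues, schedule)
```

Each level reuses the merges from the previous one, so the sequence of cluster counts forms a hierarchy from many tight groups to fewer loose ones.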
ProtoMap: Results
   Produces well-defined groups which
    correlate strongly to protein families in
    PROSITE and Pfam.
Results:
Immunoglobulin Superfamily
ProtoMap: Limitations
   Analysis performs poorly on families
    dominated by short/local domains (PH,
    EGF, ER_TARGET, C2, SH2, SH3, etc.)
   High scoring, low complexity segments can
    lead to nonhomogeneous clusters.
   “Hard” clustering vs. “Soft” clustering
   Has difficulty classifying multidomain
    proteins.
ProtoMap: Future Directions
   3D structure/fold
   Biological function
   Domain content
   Cellular location
   Tissue specificity
   Source organism
   Metabolic pathways
Inference through protein
interaction networks

     Functional Classification of
   Proteins for the Prediction of
  Cellular Function from a Protein-
    Protein Interaction Network
               (2003)
PRODISTIN

• Very similar to ProtoMap,
only the data used to
produce the graph is a list
of binary protein-protein
interactions instead of
sequence similarity scores
• Sequence similarity not a
dominating factor in
PRODISTIN clusters
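
One way to turn a list of binary interactions into pairwise distances is the Czekanowski-Dice distance, which compares the sets of interaction partners of two proteins. The sketch below is an assumption about how such a distance could be computed, not a reproduction of the tool itself; the protein names are hypothetical, and each protein is counted as a member of its own partner set.

```python
def partner_sets(interactions):
    """Map each protein to the set of its partners, including itself."""
    partners = {}
    for x, y in interactions:
        partners.setdefault(x, {x}).add(y)
        partners.setdefault(y, {y}).add(x)
    return partners

def czekanowski_dice(partners, a, b):
    """Distance in [0, 1]: 0 for identical partner sets, 1 for disjoint ones."""
    na, nb = partners[a], partners[b]
    return len(na ^ nb) / (len(na | nb) + len(na & nb))

# Hypothetical toy interaction list.
interactions = [("P1", "X"), ("P1", "Y"), ("P2", "X"), ("P2", "Y"), ("P3", "Z")]
partners = partner_sets(interactions)

d_close = czekanowski_dice(partners, "P1", "P2")  # share both partners
d_far = czekanowski_dice(partners, "P1", "P3")    # share nothing
```

Proteins that share many partners end up close together even when their sequences are unrelated, which fits the observation that sequence similarity does not dominate the clusters.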
PRODISTIN Results
Problems with PRODISTIN
• Paucity of protein-protein interaction data
(average # of connections = 2.6)
• Either very robust or very indiscriminate
Problems: Multidomain and
  Nonlocal Proteins
• protein kinases
• hydrolases
• ubiquitin…


PRODISTIN: Presents problems in clustering by
biochemical function
ProtoMap: Can create undesired connections among
unrelated groups
Scale-Free Networks
   • Node connection probability follows
   a power law distribution
   • Maximum degree of separation
   grows as O(lg n)
   • Highly robust under noise, except
   at hubs and superhubs.


$$P(\text{linking to node } i) \sim \frac{k_i}{\sum_j k_j}$$
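
The linking rule can be illustrated with a small growth simulation. This is a minimal sketch of preferential attachment, assuming each new node adds exactly one edge and the network starts from a single connected pair; the function names are hypothetical.

```python
import random

def attachment_probabilities(degrees):
    """P(linking to node i) = k_i / sum_j k_j."""
    total = sum(degrees.values())
    return {node: k / total for node, k in degrees.items()}

def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time; each new node links to an
    existing node chosen with probability proportional to its degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    degrees = {0: 1, 1: 1}
    for new in range(2, n_nodes):
        nodes = list(degrees)
        weights = [degrees[n] for n in nodes]
        target = rng.choices(nodes, weights=weights, k=1)[0]
        edges.append((new, target))
        degrees[target] += 1
        degrees[new] = 1
    return edges, degrees

probs = attachment_probabilities({"a": 3, "b": 1})
edges, degrees = grow_network(200)
```

Nodes that acquire links early tend to keep acquiring them, which is how a few hubs and many sparsely connected nodes emerge.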
The Internet
Social Networks
Metabolic Networks
• The E. coli metabolic network is scale-free.
• Actually, the metabolic networks of all organisms in
all three domains of life appear to be scale-free (43
examined)
• The network diameter of all 43 metabolic networks is
the same, irrespective of the number of proteins
involved.
• Is this counter-intuitive? Yes.


 http://biocomplexity.indiana.edu/research/bionet/
Protein Domain Networks
 • Protein Domains – Nature’s take on writing
 modular code
 • Reconciles apparent paradox of a fixed network
 diameter across species – despite vast differences in
 complexity (some human proteins have 130
 domains)
 • Occurrence of specific protein domains in
 multidomain proteins is scale-free.


http://mbe.oupjournals.org/cgi/content/full/18/9/1694
Protein Domain Graphs
• Prosite domains have a distribution following the
power-law function $f(x) = a(b + x)^{-c}$, with c = 0.89.
There are few highly connected domains and many
rarely connected ones.
• ProDom and Pfam domains follow the power
function $P(k) \approx k^{-\gamma}$, with
γ = 2.5 for ProDom and γ = 1.7 for Pfam.
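
An exponent such as γ can be estimated from a degree distribution by fitting a straight line on log-log axes. The sketch below is a rough illustration on synthetic data, not the procedure used for the values quoted above; log-log regression is a simple but crude estimator, and the function name is hypothetical.

```python
import math

def fit_power_law_exponent(degree_freq):
    """Estimate gamma in P(k) ~ k^(-gamma) from {degree: frequency}
    using the least-squares slope of log P(k) against log k."""
    xs = [math.log(k) for k in degree_freq]
    ys = [math.log(p) for p in degree_freq.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Synthetic, noise-free distributions with known exponents.
prodom_like = {k: k ** -2.5 for k in range(1, 51)}
pfam_like = {k: k ** -1.7 for k in range(1, 51)}

gamma_a = fit_power_law_exponent(prodom_like)
gamma_b = fit_power_law_exponent(pfam_like)
```

A smaller γ means a heavier tail, so under this reading the Pfam-like distribution has relatively more highly connected domains than the ProDom-like one.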
Hub Domains in Signaling
Pathways
Conclusions
• The accuracy of both ProtoMap and PRODISTIN is
limited because they make the tacit assumption of a
random network topology.
• Protein-Protein interaction networks have scale-
free topology, foiling PRODISTIN
• Protein Domain networks have scale-free topology,
foiling ProtoMap
• Any protein classification algorithm that performs
better than ProtoMap is probably going to have to
address this issue.
