Eugenio Belda

Laboratory of Bioinformatic Analysis in Genomic and Metabolism (LABGeM team)
                                     CEA/DSV/IG/Genoscope & CNRS UMR8030
Introduction
 Advances in sequencing technologies have allowed an exponential accumulation
of complete genome sequences in public databases in recent years.
 However, a wide gap exists between rapid advances in genome sequencing and
slow progress in the characterization of new protein functions:
     4712 enzymatic activities (EC number): 25% of orphan reactions
     12273 protein families (Pfam): 26% of unknown functions
 Genoscope (French National Sequencing Center) has
as one fundamental research objective the extension of in
silico sequence annotations with experimental
characterization of new enzymatic functions (Metabolic
Genomics).
     Lab. of Genomics & Biochemistry of Metabolism (LGBM)
     Lab. of Organic Chemistry and Biocatalysis (LCOB)
     Lab. for enzymatic cloning and screening (LCAB)
     Lab. of Bioinformatic Analysis in Genomic and Metabolism (LABGeM)
Three MicroScope components
 Process Management (JBPM Database): Primary Databank Update, Syntactic
Annotations, Functional / relational Analyses, DB Release, Job History
     > 25 methods integrated in a workflow management system
     => full automation:
     • genome annotation
     • primary data up-to-date
 Data Management (PkGDB, MicroCyc): Primary Databanks, Internal Genomic
Objects, Computational results, Pathway Genome DataBases
 Visualization (MaGe Web Interface): Tutorial, Login, Genome overview, Data
Export, Artemis, CGView, LinePlot, Genome browser and Synteny maps, Synton
display, Gene editor, Gene card, Keyword search, Blast and Pattern,
Phylogenetic Profile, Fusion / Fission, Tandem duplications, Minimal Gene
Set, RGPfinder, SNPs / InDels, KEGG, MicroCyc, Metabolic Profile, Pathway /
Synteny

 Vallenet D. et al. «MicroScope - a platform for microbial genome annotation
and comparative genomics» Database 2009
 Vallenet D. et al. «MaGe - a microbial genome annotation system supported
by synteny results» Nucleic Acids Research 2006
Database Management
   Relational DataBase PkGDB (Prokaryotic Genome DataBase)
   EC / reaction correspondence
   • Experimentally elucidated metabolic pathways
   • 1800 pathways from 2216 organisms
   Pathway Tools (P. Karp, SRI, USA)
   A metabolic database is built for each annotated microbial genome
         PGDB = Pathway/Genome Database (orgname_Cyc)
         http://www.genoscope.cns.fr/agc/microcyc
         Today: 1233 organisms (of which 676 public genomes)
   Mapping on the KEGG metabolic maps (http://www.kegg.jp/)
MicroScope Web site
    More than 30 tools are made available to the community
    «guest» access
    Since 2005, more than 50,000 expert annotations per year
    > 1,000 users, 300 active
    www.genoscope.cns.fr/agc/microscope
Curation of metabolic data in Microscope
    CanOE (Candidate genes for Orphan Enzymes): Method for the automatic integration
   of genomic and metabolic contexts that assists expert functional annotation, especially
   in the case of orphan enzymes. Based on the concept of the metabolon (“close” genes in
   the genome sequence associated with “close” metabolic reactions):
    Boyer et al.; Bioinformatics 2005; Dec 1;21(23):4209-15.

    [Figure: genes on genome linked through functional annotations to reactions and
    compounds in metabolic network, showing gene gaps and a reaction gap (orphan reaction)]

      The method provides candidate genes for global/local orphan enzymatic activities
      that are located in the “gaps” of metabolons
https://www.genoscope.cns.fr/agc/microscope/metabolism/canoe.php
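
The "gap" idea on this slide can be illustrated with a minimal sketch. It is not the CanOE algorithm itself: it assumes a much simpler rule (an unannotated gene flanked by two annotated genes becomes a candidate for any orphan reaction that neighbours both flanking reactions), and all gene and reaction names, as well as the function `candidate_genes`, are hypothetical.

```python
def candidate_genes(genome, gene_to_reaction, reaction_neighbors, orphans):
    """Toy illustration of filling metabolon gaps (simplified assumption).

    genome: gene ids in chromosomal order
    gene_to_reaction: annotated gene -> reaction id
    reaction_neighbors: reaction id -> set of adjacent reactions in the network
    orphans: set of reaction ids with no associated gene
    Returns {orphan reaction: [candidate genes]}.
    """
    candidates = {}
    for i in range(1, len(genome) - 1):
        gene = genome[i]
        if gene in gene_to_reaction:
            continue  # already annotated, so not a gene gap
        left = gene_to_reaction.get(genome[i - 1])
        right = gene_to_reaction.get(genome[i + 1])
        if left is None or right is None:
            continue  # both flanking genes must carry a reaction
        # Orphan reactions adjacent to both flanking reactions sit in the reaction gap
        shared = reaction_neighbors.get(left, set()) & reaction_neighbors.get(right, set())
        for rxn in sorted(shared & orphans):
            candidates.setdefault(rxn, []).append(gene)
    return candidates


# Hypothetical toy data: g2 lies between g1 (R1) and g3 (R3); R2 links R1 and R3
genome = ["g1", "g2", "g3", "g4"]
gene_to_reaction = {"g1": "R1", "g3": "R3"}
reaction_neighbors = {"R1": {"R2"}, "R3": {"R2"}}
orphans = {"R2"}

result = candidate_genes(genome, gene_to_reaction, reaction_neighbors, orphans)
print(result)
```

In this toy case the only gene gap (g2) is proposed for the only reaction gap (R2); a real method would also need scoring and comparison across multiple genomes.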
Curation of metabolic data in Microscope
      CanOE (Candidate genes for Orphan Enzymes)
                 Example: Allantoin degradation metabolon in E. coli K12
                 2.1.3.5 is a global orphan reaction (not associated with any gene in any
                 organism)

  Three candidate genes for the EC:2.1.3.5 reaction
  None shares any significant similarity with known carbamoyltransferases
  Protein expression and biochemical assays under way
Smith AAT, Belda E., Viari A., Médigue C., and Vallenet D. “The CanOE strategy: integrating genomic and metabolic contexts across multiple
prokaryote genomes to find candidate genes for orphan enzymes” (PLoS Computational Biology, in revision)
Curation of metabolic data in Microscope

   GPR curation interface: In the context of network reconstruction, the definition of
  Gene-Protein-Reaction associations (genes encoding enzymes/complexes/isozymes
  catalyzing a particular metabolic reaction) is essential:

                                             Thiele & Palsson; Nat Protoc. 2010;5(1):93-121
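
A Gene-Protein-Reaction association is commonly read as a Boolean rule: subunits of a complex are combined with AND, and isozymes with OR. The sketch below assumes that convention and a simple representation (a list of gene sets, one set per alternative enzyme); the gene names and the function `reaction_is_active` are hypothetical.

```python
def reaction_is_active(gpr, present_genes):
    """Evaluate a GPR rule written as a list of alternatives.

    gpr: list of sets; each set holds the genes of one enzyme or complex
         (all genes of a set are required = AND; the sets are isozymes = OR)
    present_genes: set of genes present in the organism
    """
    return any(complex_genes <= present_genes for complex_genes in gpr)


# Hypothetical rule: (geneA AND geneB) OR geneC
gpr = [{"geneA", "geneB"}, {"geneC"}]

print(reaction_is_active(gpr, {"geneA", "geneB"}))  # complex is complete
print(reaction_is_active(gpr, {"geneA"}))           # complex is incomplete, no isozyme
print(reaction_is_active(gpr, {"geneC"}))           # isozyme alone is enough
```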
Curation of metabolic data in Microscope
   GPR curation interface: The gene curation interface of Microscope allows the
  validation of Gene-Reaction associations based on curated gene annotations. Two
  reference reaction resources are available, MetaCyc (functional) and RHEA (under
  development):

      EC numbers in the example: 4.1.3.27, 2.4.2.18
      Automatic retrieval of MetaCyc/Rhea reactions based on EC number
      Keyword search
Curation of metabolic data in Microscope

   Pathway validation interface: Validation/curation of automatically projected MetaCyc
  pathways based on Gene-Reaction associations:
Microme project: www.microme.eu
     A Knowledge-Based Bioinformatics Framework for Microbial Pathway Genomics

 Purpose: develop bioinformatics infrastructures, together with a projection and
curation process, in order to generate:
    - complete metabolic pathways from genome annotations
    - whole-cell metabolic models from pathway assemblies
 Experimental validation of metabolic models using growth phenotype data (i.e., BIOLOG
experiments) generated within the project for a subset of selected species.
 Analytical tools are integrated for comparative and phylogenetic analysis based on
projected pathways and metabolic models

 Partners: AMAbiotics; Centro Nacional de Biotecnología; CEA-Genoscope; European
Bioinformatics Institute; Center for research and Technology Hellas; German Collection
of Microorganisms and Cell Cultures; ISTHMUS; Spanish National Cancer Centre; Molecular
Networks; Tel-Aviv University; Swiss Institute of Bioinformatics; Université Libre de
Bruxelles; Wellcome Trust Sanger Institute; Wageningen University
Microme WP2: Objectives
 Provide EU with a curated microbial metabolic resource

 Implement a unique cyclic and collaborative curation process for metabolic data

 Unification of existing metabolic resources:

                Pivot resources: ChEBI (chemical compounds) and Rhea (chemical reactions)
                Cross-references to external resources (compounds, reactions, pathways):
               KEGG, MetaCyc, metabolic models
 Alcantara R., Axelsen K.B., Morgat A., Belda E., Coudert E., Bridge A., Cao H., de Matos P., Ennis M., Turner S., Owen G., Bougueleret
 L., Xenarios I., and Steinbeck C. (2012) Rhea - a manually curated resource of biochemical reactions. Nucleic Acids Research. 40,
 D754-D760, Database issue.




MicroScope and Microme
 Use MicroScope as reference resource of curated GPR (Gene Protein Reaction)
associations for microbial genomes included in Microme project
 Development of novel interfaces for GPR curation in Microscope environment. Retrieval
of METACYC and RHEA reactions for a particular gene object from EC number annotations
MicroScope and Microme
  Development of web-services to provide Microme partners with curated Gene-Reaction
 associations from the Microscope platform
  [Diagram: Curation tool; Reconstruction of microcyc from PkGDB each night; Web-services]
Test-case: Bacillus subtilis 168 re-annotation

   Second most intensively studied bacterium after Escherichia coli, and a model
  organism for Gram-positive bacteria

   Genome sequenced in 1997: 4.214 Megabases, 4000 CDSs
                                                     Nature 1997 Nov 20;390(6657):249-56

   Re-sequencing and first re-annotation of the genome in 2009
                                                      Microbiology (2009), 155, 1758-1775

   Re-annotation of the genome in the context of the Microme project, with special
  focus on the curation of Gene-Reaction associations using Microscope metabolic
  tools and curation interface. Collaborative work LABGeM (CEA)-SIB-AMAbiotics
  (Antoine Danchin)
Test-case: Bacillus subtilis 168 re-annotation

   Starting data for curation of Gene-Reaction associations

   909 CDSs with a predicted MetaCyc reaction:
        531 CDSs: BBH relationship with E. coli CDSs
        378 CDSs: no BBH relationship with E. coli CDSs
   CDSs with no predicted MetaCyc reaction (310 CDSs and 508 CDSs):
        "Putative enzymes" in Product type annotation
        "Enzymes" in Product type annotation
Test-case: Bacillus subtilis 168 re-annotation
   From the 909 CDS with predicted reaction:

       531 with BBH in E. coli:
            416 with same GPR in B. subtilis and E. coli (EcoCyc): automatic
             validation of Gene-Reaction associations
            115 CDS with different GPR in B. subtilis and E. coli (EcoCyc)
       378 without BBH in E. coli:
            254 with GPR predicted from the curated EC number
            124 with GPR predicted from “product” annotation

   310 CDS with “enzyme” annotation and without predicted reaction
   508 CDS with “enzyme” annotation and without predicted reaction: filter by
  Catalytic activity field in SwissProt annotations (41 CDSs)

   Manual curation of Gene-Reaction associations in Microscope environment:
        Sequence similarity profiles
        Genomic context conservation
        Integration of genomic and metabolic context (CanOE strategy)
        Co-evolution patterns of functionally related genes
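
The breakdown of the 909 CDS with a predicted reaction can be checked with a small triage sketch. It assumes that only CDS with a BBH in E. coli and the same GPR are validated automatically and that every other CDS with a predicted reaction goes to manual curation; the function name `triage` and the record layout are hypothetical.

```python
from collections import Counter


def triage(has_bbh, same_gpr):
    """Route a CDS with a predicted reaction (assumed rule, see lead-in)."""
    if has_bbh and same_gpr:
        return "automatic validation"
    return "manual curation"


# (number of CDS, BBH in E. coli?, same GPR as EcoCyc?) taken from the slide
groups = [
    (416, True, True),
    (115, True, False),
    (254, False, False),  # GPR predicted from the curated EC number
    (124, False, False),  # GPR predicted from "product" annotation
]

counts = Counter()
for n_cds, has_bbh, same_gpr in groups:
    counts[triage(has_bbh, same_gpr)] += n_cds

total = sum(counts.values())
print(dict(counts), total)
```

Under these assumptions 416 CDS are validated automatically and 493 remain for manual curation, which adds up to the 909 CDS with a predicted reaction.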
Test-case: Bacillus subtilis 168 re-annotation

 Problems associated with automatic predictions of Gene-Reaction associations.
Example: generic EC number definition associated with multiple specific reaction
instances in MetaCyc
     No experimental evidence of activity; generic product annotation
     17 predicted reactions based on EC:1.2.1.3 annotation, which is a problem for
    modelling purposes
     Without experimental evidence of specific substrates, only the generic reaction
    has been validated
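
One way to surface this kind of case before curation is to flag EC numbers that expand into many reaction instances. The sketch below assumes an EC-to-reaction mapping is already available as a dictionary; the reaction identifiers are placeholders, not real MetaCyc identifiers, and `flag_generic_ec` and its threshold are hypothetical.

```python
def flag_generic_ec(ec_to_reactions, max_specific=1):
    """Return the EC numbers that map to more than max_specific reactions."""
    return {
        ec: reactions
        for ec, reactions in ec_to_reactions.items()
        if len(reactions) > max_specific
    }


# Placeholder mapping: 17 reaction instances for EC 1.2.1.3 (count from the slide),
# one instance for EC 2.6.1.62 (illustrative assumption)
ec_to_reactions = {
    "1.2.1.3": [f"placeholder-rxn-{i}" for i in range(1, 18)],
    "2.6.1.62": ["placeholder-rxn-biotin"],
}

flagged = flag_generic_ec(ec_to_reactions)
for ec, reactions in flagged.items():
    print(ec, len(reactions), "predicted reactions: review before modelling")
```

Flagged entries would then be reviewed by a curator, who keeps only the generic reaction when no specific substrate is supported experimentally.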
Test-case: Bacillus subtilis 168 re-annotation
        Stats of curation of Gene-Reaction associations in Microscope

                                       Initial Gene-Reaction      Current Gene-Reaction
                                       predictions                associations
                                       (Pathway Tools)            (Manually Curated)
      Nº reactions                     1022                       985 (388)
      Nº CDS                           901                        1006 (517)
      Nº Gene-Reaction associations    1549                       1406 (715)

      105 CDS without automatically predicted reaction in initial projections
      147 new reactions added (not originally predicted)
      184 originally predicted reactions removed
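
The figures on this slide are mutually consistent, which a short arithmetic check makes explicit. It assumes that 1022 reactions and 901 CDS belong to the initial predictions and that 985 reactions and 1006 CDS belong to the curated set; the variable names are hypothetical.

```python
# Reactions: initial predictions, minus removed, plus newly added
initial_reactions = 1022
removed_reactions = 184
added_reactions = 147
current_reactions = initial_reactions - removed_reactions + added_reactions

# CDS: initial predictions plus CDS that had no predicted reaction at the start
initial_cds = 901
newly_associated_cds = 105
current_cds = initial_cds + newly_associated_cds

print(current_reactions, current_cds)
```

The results, 985 reactions and 1006 CDS, match the values reported for the manually curated associations.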
Test-case: Bacillus subtilis 168 re-annotation
  17 possible updates of SwissProt annotations
  6 possible new EC numbers
      Reported to SwissProt/IUBMB curators
  13 possible new metabolic pathways/pathway variants not present in MetaCyc

  New pathway variants:
            Biotin biosynthesis pathway variant
            Lipoate biosynthesis pathway variant
            Myoinositol catabolism pathway variant
            Rhamnogalacturonan type I degradation pathway variant
            Acetoin dehydrogenase pathway variant
            Methionine salvage pathway variant
            Bacillaene biosynthesis pathway
            Aerobic respiration pathway variants

  New metabolic pathways:
            Aromatic polyketide biosynthesis pathway
            2-methylthio-N6-threonylcarbamoyladenosine biosynthesis
            Bacilysocin biosynthesis
            Archaeal-type ether lipid biosynthesis
            Bacillaene biosynthesis pathway
            Methionine-Cysteine interconversion
Test-case: Bacillus subtilis 168 re-annotation
  Biotin biosynthesis pathway variant: Update of DAP aminotransferase pathway variant
 (EC:2.6.1.62)
  [Figure: KEGG pathway (map00780) and MetaCyc pathway (PWY-5005); annotation:
  S-adenosyl-L-methionine as amino group donor]
  L-lysine instead of S-adenosyl-L-methionine as amino group donor in the
  Bacillus subtilis BioA enzyme
Test-case: Bacillus subtilis 168 re-annotation
   Biotin biosynthesis pathway variant: Link with fatty acid metabolism. Improvement of
  genome-scale metabolic models
         iBsu1103: Most up-to-date B. subtilis 168 metabolic model (SEED
         methodology; 1437 reactions, 1103 genes). Henry CS, Zinner JF, Cohoon MP, Stevens RL.
             Genome Biol. 2009;10(6):R69

   [Figure: biotin biosynthesis in iBsu1103. Annotations: EX_pimelate, dead-end
   metabolite; EX_biotin, not included in Biomass equation; auxotrophic for Biotin
   biosynthesis]

   FBA simulations iBsu1103 model (Biomass prod. rate):
         iBsu1103                                122.97
         iBsu1103; Biotin in Biomass               0.00
         iBsu1103; External influx Pimelate      122.97
         iBsu1103; External influx Biotin        122.97
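
The qualitative pattern of these simulations (growth, no growth once biotin is required, growth restored by external pimelate or biotin) can be reproduced with a producibility check. This is a simplified stand-in and not FBA, which solves a linear program over the full iBsu1103 stoichiometry; the two-reaction network, the metabolite names and the functions `producible` and `can_grow` are hypothetical.

```python
def producible(reactions, nutrients):
    """Expand the set of available metabolites until no reaction adds anything."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def can_grow(biomass, reactions, nutrients):
    """Biomass is feasible only if every biomass component is producible."""
    return set(biomass) <= producible(reactions, nutrients)


# Toy network: pimelate has no source (dead end), so biotin cannot be made de novo
reactions = [
    (("glucose",), ("precursors",)),
    (("pimelate",), ("biotin",)),
]

base = can_grow({"precursors"}, reactions, {"glucose"})
with_biotin = can_grow({"precursors", "biotin"}, reactions, {"glucose"})
plus_pimelate = can_grow({"precursors", "biotin"}, reactions, {"glucose", "pimelate"})
plus_biotin = can_grow({"precursors", "biotin"}, reactions, {"glucose", "biotin"})
print(base, with_biotin, plus_pimelate, plus_biotin)
```

Adding biotin to the biomass requirement blocks growth in the toy network until pimelate or biotin is supplied externally, mirroring the four bars of the chart.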
Test-case: Bacillus subtilis 168 re-annotation
 BioI enzyme of B. subtilis 168: cytochrome P450 protein that catalyzes the oxidative
cleavage of acyl-ACP/free fatty acid molecules generated in the context of fatty acid
biosynthesis, yielding pimeloyl-ACP as primary product.

 [Figure: Fatty acids metabolism; an acyl-ACP or a fatty acid is converted by BioI
 (BSU30190) to pimeloyl-ACP; BioF (BSU30220) step with L-Alanine + H+ and CO2 + HoloACP]
Future work

   Extension of the reference set of Microme species to:
       Acinetobacter sp. ADP1
       Pseudomonas putida KT2440
       Bacillus subtilis 168


   Second version of Gene-Reaction curation interface in Microscope
  environment:
        Curation of protein complexes / Isozyme sets
        Management of Rhea reactions in addition to MetaCyc reactions


   Definition of strategies for vertical annotation and propagation of curated
  GPR across multiple microbial genomes


   Use UniPathway as reference resource of metabolic pathways in Microscope;
  species-specific pathway representations based on the combination of Pathway
  modules (http://www.unipathway.org)
Contributions
            Claudine Médigue (Group Leader)
            David Vallenet (Researcher)
            Damien Monrico (Engineer)
            François Lefèvre (Engineer)
            Alexander T. Smith (PhD)
            Eugeni Belda (Post doc)

  IT team   Claude Scarpelli
            Ludovic Fleury


External partners
            Anne Morgat                                    Antoine Danchin



Funding

                                EU Framework Programme 7 Collaborative
                               Project. Grant Agreement Number 222886-2

Contenu connexe

Tendances

Ryan Poplin - Sources of Bias
Ryan Poplin - Sources of BiasRyan Poplin - Sources of Bias
Ryan Poplin - Sources of BiasGenomeInABottle
 
GeneArt® services - Gene synthesis through protein production
GeneArt® services - Gene synthesis through protein productionGeneArt® services - Gene synthesis through protein production
GeneArt® services - Gene synthesis through protein productionThermo Fisher Scientific
 
Consortium to produce_bio_fuels_from_jatropha[1]
Consortium to produce_bio_fuels_from_jatropha[1]Consortium to produce_bio_fuels_from_jatropha[1]
Consortium to produce_bio_fuels_from_jatropha[1]ehiosa
 
Infographic CV MB Long
Infographic CV MB LongInfographic CV MB Long
Infographic CV MB LongMetin Bilgin
 
BITS - Introduction to proteomics
BITS - Introduction to proteomicsBITS - Introduction to proteomics
BITS - Introduction to proteomicsBITS
 
Software for SBML Today
Software for SBML TodaySoftware for SBML Today
Software for SBML TodayMike Hucka
 
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23Sage Base
 
Research presentation-wd
Research presentation-wdResearch presentation-wd
Research presentation-wdWagied Davids
 
Network cheminformatics: gap filling and identifying new reactions in metabol...
Network cheminformatics: gap filling and identifying new reactions in metabol...Network cheminformatics: gap filling and identifying new reactions in metabol...
Network cheminformatics: gap filling and identifying new reactions in metabol...Neil Swainston
 
Structure generation, metabolite space, and metabolite likeness
Structure generation, metabolite space, and metabolite likenessStructure generation, metabolite space, and metabolite likeness
Structure generation, metabolite space, and metabolite likenessVodafoneZiggo
 

Tendances (10)

Ryan Poplin - Sources of Bias
Ryan Poplin - Sources of BiasRyan Poplin - Sources of Bias
Ryan Poplin - Sources of Bias
 
GeneArt® services - Gene synthesis through protein production
GeneArt® services - Gene synthesis through protein productionGeneArt® services - Gene synthesis through protein production
GeneArt® services - Gene synthesis through protein production
 
Consortium to produce_bio_fuels_from_jatropha[1]
Consortium to produce_bio_fuels_from_jatropha[1]Consortium to produce_bio_fuels_from_jatropha[1]
Consortium to produce_bio_fuels_from_jatropha[1]
 
Infographic CV MB Long
Infographic CV MB LongInfographic CV MB Long
Infographic CV MB Long
 
BITS - Introduction to proteomics
BITS - Introduction to proteomicsBITS - Introduction to proteomics
BITS - Introduction to proteomics
 
Software for SBML Today
Software for SBML TodaySoftware for SBML Today
Software for SBML Today
 
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
 
Research presentation-wd
Research presentation-wdResearch presentation-wd
Research presentation-wd
 
Network cheminformatics: gap filling and identifying new reactions in metabol...
Network cheminformatics: gap filling and identifying new reactions in metabol...Network cheminformatics: gap filling and identifying new reactions in metabol...
Network cheminformatics: gap filling and identifying new reactions in metabol...
 
Structure generation, metabolite space, and metabolite likeness
Structure generation, metabolite space, and metabolite likenessStructure generation, metabolite space, and metabolite likeness
Structure generation, metabolite space, and metabolite likeness
 


Biocuration2012 Eugeni Belda

  • 1. Eugenio Belda, Laboratory of Bioinformatic Analysis in Genomic and Metabolism (LABGeM team), CEA/DSV/IG/Genoscope & CNRS UMR8030
  • 2. Introduction. Advances in sequencing technologies have allowed an exponential accumulation of complete genome sequences in public databases in recent years. However, a wide gap exists between rapid advances in genome sequencing and slow progress in the characterization of new protein functions: of 4712 enzymatic activities (EC numbers), 25% are orphan reactions; of 12273 protein families (Pfam), 26% are of unknown function. Genoscope (French National Sequencing Center) has as one fundamental research objective the extension of in silico sequence annotations with experimental characterization of new enzymatic functions (Metabolic Genomics). Laboratories: Lab. of Genomics & Biochemistry of Metabolism (LGBM); Lab. of Organic Chemistry and Biocatalysis (LCOB); Lab. for Enzymatic Cloning and Screening (LCAB); Lab. of Bioinformatic Analysis in Genomic and Metabolism (LABGeM).
  • 3. Three MicroScope components. Process Management: primary databank update, syntactic annotations, and functional / relational analyses (> 25 methods integrated in a workflow); JBPM database, job management system, release history; full automation of genome annotation, with primary data kept up to date. Data Management: PkGDB and MicroCyc (primary databanks, internal genomic objects, computational results, pathway/genome databases). MaGe Web Interface: keyword search, Blast and pattern search, phylogenetic profile, fusion / fission, tandem duplications, minimal gene set, RGPfinder, SNPs / InDels, metabolic profile, pathway / synteny; visualization (genome overview, genome browser, synteny maps, Artemis, CGView, LinePlot, synton display, gene editor, gene card); KEGG and MicroCyc; data export; tutorial; login. References: Vallenet D. et al. «MicroScope - a platform for microbial genome annotation and comparative genomics» Database 2009; Vallenet D. et al. «MaGe - a microbial genome annotation system supported by synteny results» Nucleic Acids Research 2006.
  • 4. Database Management. Relational database: PkGDB (Prokaryotic Genome DataBase). EC / reaction correspondence: experimentally elucidated metabolic pathways; 1800 pathways from 2216 organisms (P. Karp, SRI, USA). Pathway Tools: a metabolic database is built for each annotated microbial genome; PGDB = Pathway/Genome Database (orgname_Cyc); http://www.genoscope.cns.fr/agc/microcyc; today: 1233 organisms (of which 676 public genomes). Mapping on the PkGDB: KEGG metabolic maps (http://www.kegg.jp/).
  • 5. MicroScope Web site. More than 30 tools are made available to the community through «guest» access. Since 2005, more than 50,000 expert annotations per year; > 1,000 users, 300 active. www.genoscope.cns.fr/agc/microscope
  • 6. Curation of metabolic data in MicroScope. CanOE (Candidate genes for Orphan Enzymes): a method for the automatic integration of genomic and metabolic contexts that assists expert functional annotation, especially in the case of orphan enzymes. It is based on the concept of the metabolon (“close” genes in the genome sequence associated with “close” metabolic reactions): Boyer et al., Bioinformatics 2005 Dec 1;21(23):4209-15. [Figure: genes on the genome linked through functional annotations to reactions and compounds in the metabolic network, showing gene gaps, reaction gaps and orphan reactions.] The method provides candidate genes for global/local orphan enzymatic activities that are located in the “gaps” of metabolons. https://www.genoscope.cns.fr/agc/microscope/metabolism/canoe.php
  • 7. Curation of metabolic data in MicroScope. CanOE (Candidate genes for Orphan Enzymes). Example: allantoin degradation metabolon in E. coli K12. EC 2.1.3.5 is a global orphan reaction (not associated with any gene in any organism). There are three candidate genes for the EC:2.1.3.5 reaction; none shares any significant similarity with known carbamoyltransferases; protein expression and biochemical assays are under way. Smith AAT, Belda E., Viari A., Médigue C., and Vallenet D. “The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes” (PLoS Computational Biology, in revision)
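The gap idea described on the two CanOE slides can be illustrated with a toy sketch. This is not the published CanOE algorithm, which integrates contexts across multiple genomes; the function name, the `max_gene_gap` window, and all gene and reaction identifiers below are hypothetical. The sketch only proposes unannotated genes that sit next to genes whose reactions neighbour an orphan reaction in the network.

```python
from typing import Dict, List, Optional, Set


def candidate_genes_for_orphan(
    gene_order: List[str],
    gene_reaction: Dict[str, Optional[str]],
    reaction_neighbours: Dict[str, Set[str]],
    orphan: str,
    max_gene_gap: int = 1,
) -> List[str]:
    """Toy illustration: return unannotated genes lying within
    max_gene_gap positions of a gene whose reaction is adjacent
    to the orphan reaction in the metabolic network."""
    neighbours = reaction_neighbours.get(orphan, set())
    # Positions of genes whose annotated reaction borders the orphan reaction.
    anchors = [
        i for i, gene in enumerate(gene_order)
        if gene_reaction.get(gene) in neighbours
    ]
    candidates = []
    for i, gene in enumerate(gene_order):
        if gene_reaction.get(gene) is not None:
            continue  # already annotated with a reaction, not a gene gap
        if any(abs(i - a) <= max_gene_gap for a in anchors):
            candidates.append(gene)
    return candidates


# Hypothetical data: R2 is the orphan reaction, flanked by R1 and R3.
gene_order = ["g1", "g2", "g3", "g4", "g5", "g6"]
gene_reaction = {"g1": "R1", "g2": None, "g3": "R3",
                 "g4": None, "g5": None, "g6": None}
reaction_neighbours = {"R2": {"R1", "R3"}}

candidates = candidate_genes_for_orphan(
    gene_order, gene_reaction, reaction_neighbours, orphan="R2"
)
print(candidates)
```

Widening `max_gene_gap` trades specificity for recall: a larger window returns more candidates, each of which would still need expert review and experimental testing.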
  • 8. Curation of metabolic data in MicroScope. GPR curation interface: in the context of network reconstruction, the definition of Gene-Protein-Reaction associations (genes encoding the enzymes/complexes/isozymes that catalyze a particular metabolic reaction) is essential: Thiele & Palsson, Nat Protoc. 2010;5(1):93-121
  • 9. Curation of metabolic data in MicroScope. GPR curation interface: the gene curation interface of MicroScope allows the validation of Gene-Reaction associations based on curated gene annotations. Two reference reaction resources are available: MetaCyc (functional) and Rhea (under development). Automatic retrieval of MetaCyc/Rhea reactions based on EC number (EC numbers shown in the example: 4.1.3.27, 2.4.2.18), plus keyword search.
  • 10. Curation of metabolic data in MicroScope. Pathway validation interface: validation/curation of automatically projected MetaCyc pathways based on Gene-Reaction associations.
  • 11. Microme project: www.microme.eu. A Knowledge-Based Bioinformatics Framework for Microbial Pathway Genomics. Purpose: develop bioinformatics infrastructures, together with a projection and curation process, in order to generate complete metabolic pathways from genome annotations and whole-cell metabolic models from pathway assemblies. Experimental validation of metabolic models using growth phenotype data (i.e., BIOLOG experiments) generated within the project for a subset of selected species. Analytical tools are integrated for comparative and phylogenetic analysis based on projected pathways and metabolic models. Partners: AMAbiotics; Centro Nacional de Biotecnología; CEA-Genoscope; European Bioinformatics Institute; Center for Research and Technology Hellas; German Collection of Microorganisms and Cell Cultures; ISTHMUS; Spanish National Cancer Centre; Molecular Networks; Tel-Aviv University; Université Libre de Bruxelles; Swiss Institute of Bioinformatics; Wageningen University; Wellcome Trust Sanger Institute.
  • 12. Microme WP2: Objectives. Provide the EU with a curated microbial metabolic resource. Implement a unique cyclic and collaborative curation process for metabolic data. Unification of existing metabolic resources: the pivot resources are ChEBI (chemical compounds) and Rhea (chemical reactions); the cross-referenced external resources (compounds, reactions, pathways) are KEGG, MetaCyc, and metabolic models. Alcantara R., Axelsen K.B., Morgat A., Belda E., Coudert E., Bridge A., Cao H., de Matos P., Ennis M., Turner S., Owen G., Bougueleret L., Xenarios I., and Steinbeck C. (2012) Rhea - a manually curated resource of biochemical reactions. Nucleic Acids Research 40, D754-D760, Database issue. MicroScope and Microme: use MicroScope as the reference resource of curated GPR (Gene-Protein-Reaction) associations for the microbial genomes included in the Microme project; development of novel interfaces for GPR curation in the MicroScope environment; retrieval of MetaCyc and Rhea reactions for a particular gene object from EC number annotations.
  • 13. MicroScope and Microme. Development of web services to provide Microme partners with curated Gene-Reaction associations from the MicroScope platform. [Diagram: curation tool, PkGDB, MicroCyc reconstruction each night, web services.]
  • 14. Test case: Bacillus subtilis 168 re-annotation. Second most intensively studied bacterium after Escherichia coli, and a model organism for Gram-positive bacteria. Genome sequenced in 1997: 4.214 megabases, 4000 CDSs (Nature 1997 Nov 20;390(6657):249-56). Re-sequencing and first re-annotation of the genome in 2009 (Microbiology (2009), 155, 1758-1775). Re-annotation of the genome in the context of the Microme project, with a special focus on the curation of Gene-Reaction associations using the MicroScope metabolic tools and curation interface. Collaborative work: LABGeM (CEA), SIB, AMAbiotics (Antoine Danchin).
  • 15. Test case: Bacillus subtilis 168 re-annotation. Starting data for the curation of Gene-Reaction associations. [Chart: 909 CDSs with a predicted MetaCyc reaction, split into 531 CDSs with a BBH (bidirectional best hit) relationship with E. coli CDSs and 378 CDSs with no BBH relationship with E. coli CDSs; two further groups of 310 CDSs and 508 CDSs with no predicted MetaCyc reaction, labelled "Enzymes" or "Putative enzymes" in the Product type annotation.]
  • 16. Test case: Bacillus subtilis 168 re-annotation. From the 909 CDSs with a predicted reaction: 531 have a BBH in E. coli, of which 416 have the same GPR in B. subtilis and E. coli (EcoCyc) (automatic validation of Gene-Reaction associations) and 115 have a different GPR in B. subtilis and E. coli (EcoCyc); 378 have no BBH in E. coli, of which 254 have a GPR predicted from the curated EC number and 124 have a GPR predicted from the “product” annotation. 310 CDSs with “enzyme” annotation and without predicted reaction. 508 CDSs with “enzyme” annotation and without predicted reaction: filter by the Catalytic activity field in SwissProt annotations (41 CDSs). Manual curation of Gene-Reaction associations in the MicroScope environment draws on: sequence similarity profiles; genomic context conservation; integration of genomic and metabolic context (CanOE strategy); co-evolution patterns of functionally related genes.
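The split described on this slide can be sketched as a small triage function. The `Cds` record, its field names and the category labels are hypothetical; routing both the "different GPR" and the "no BBH" groups to manual curation is a reading of the slide layout rather than something the slide states outright. The final check only confirms that the counts quoted on the slide add up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cds:
    name: str
    predicted_reaction: Optional[str]      # automatically projected reaction, if any
    has_ecoli_bbh: bool                    # bidirectional best hit found in E. coli
    ecoli_reaction: Optional[str] = None   # reaction of the E. coli BBH (e.g. from EcoCyc)


def triage(cds: Cds) -> str:
    """Assign a CDS to a curation route (illustrative labels only)."""
    if cds.predicted_reaction is None:
        return "no predicted reaction"
    if not cds.has_ecoli_bbh:
        return "manual curation: no BBH in E. coli"
    if cds.ecoli_reaction == cds.predicted_reaction:
        return "automatic validation"
    return "manual curation: different GPR"


# Hypothetical records, one per route.
examples = [
    Cds("cds_a", "rxn_1", True, "rxn_1"),
    Cds("cds_b", "rxn_2", True, "rxn_9"),
    Cds("cds_c", "rxn_3", False),
    Cds("cds_d", None, False),
]
routes = [triage(c) for c in examples]

# Counts quoted on the slide: the sub-groups should sum to their parents.
counts_consistent = (416 + 115 == 531) and (254 + 124 == 378) and (531 + 378 == 909)
print(routes, counts_consistent)
```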
  • 17. Test case: Bacillus subtilis 168 re-annotation. Problems associated with automatic predictions of Gene-Reaction associations. Example: a generic EC number definition associated with multiple specific reaction instances in MetaCyc. No experimental evidence of activity; generic product annotation. 17 predicted reactions are based on the EC:1.2.1.3 annotation, which is a problem for modelling purposes. Without experimental evidence of specific substrates, only the generic reaction has been validated.
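One way to surface this kind of case before modelling is to count how many reaction instances each EC number expands to and flag the ones above a threshold. The sketch below is a hypothetical helper, not part of MicroScope or Pathway Tools; the reaction identifiers are placeholders, and only the figure of 17 instances for EC 1.2.1.3 comes from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_reactions_by_ec(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build an EC number -> list of reaction identifiers mapping."""
    by_ec: Dict[str, List[str]] = defaultdict(list)
    for ec, reaction in pairs:
        by_ec[ec].append(reaction)
    return dict(by_ec)


def ambiguous_ecs(by_ec: Dict[str, List[str]], threshold: int = 1) -> Dict[str, int]:
    """Return EC numbers that expand to more than `threshold` reactions."""
    return {ec: len(rxns) for ec, rxns in by_ec.items() if len(rxns) > threshold}


# Placeholder data: 17 instances for the generic EC 1.2.1.3, one for EC 2.6.1.62.
pairs = [("1.2.1.3", f"placeholder_rxn_{i:02d}") for i in range(17)]
pairs.append(("2.6.1.62", "placeholder_rxn_bioA"))

flagged = ambiguous_ecs(group_reactions_by_ec(pairs))
print(flagged)
```

Genes annotated with a flagged EC number are candidates for manual review, where a curator can keep only the generic reaction when no substrate-specific evidence exists.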
  • 18. Test case: Bacillus subtilis 168 re-annotation. Statistics for the curation of Gene-Reaction associations in MicroScope, comparing the initial Gene-Reaction predictions (Pathway Tools) with the current, manually curated Gene-Reaction associations. Nº reactions: 1022 initial, 985 (388) current. Nº CDS: 901 initial, 1006 (517) current. Nº Gene-Reaction associations: 1549 initial, 1406 (715) current. 105 CDSs had no automatically predicted reaction in the initial projections; 147 new reactions were added (not originally predicted); 184 originally predicted reactions were removed.
  • 19. Test case: Bacillus subtilis 168 re-annotation. 17 possible updates of SwissProt annotations and 6 possible new EC numbers (reported to SwissProt/IUBMB curators). 13 possible new metabolic pathways/pathway variants not present in MetaCyc. New pathway variants: biotin biosynthesis; lipoate biosynthesis; myo-inositol catabolism; rhamnogalacturonan type I degradation; acetoin dehydrogenase; methionine salvage; aerobic respiration. New metabolic pathways: bacillaene biosynthesis; aromatic polyketide biosynthesis; 2-methylthio-N6-threonylcarbamoyladenosine biosynthesis; bacilysocin biosynthesis; archaeal-type ether lipid biosynthesis; methionine-cysteine interconversion.
  • 20. Test case: Bacillus subtilis 168 re-annotation. Biotin biosynthesis pathway variant: update of the DAP aminotransferase pathway variant (EC:2.6.1.62). The KEGG pathway (map00780) and the MetaCyc pathway (PWY-5005) use S-adenosyl-L-methionine as amino group donor; the Bacillus subtilis BioA enzyme uses L-lysine instead of S-adenosyl-L-methionine as amino group donor.
  • 21. Test case: Bacillus subtilis 168 re-annotation. Biotin biosynthesis pathway variant: link with fatty acid metabolism. Improvement of genome-scale metabolic models. iBsu1103: most up-to-date B. subtilis 168 metabolic model (SEED methodology; 1437 reactions, 1103 genes). Henry CS, Zinner JF, Cohoon MP, Stevens RL. Genome Biol. 2009;10(6):R69. [Diagram annotations on the iBsu1103 model: "Biotin biosynthesis", "Dead-end metabolite", "Auxotrophic for", "EX_pimelate", "EX_biotin", "Not included in Biomass equation".] [Figure: FBA simulations of biomass production rate for four conditions (iBsu1103; iBsu1103 with biotin in biomass; iBsu1103 with external influx of pimelate; iBsu1103 with external influx of biotin); the data labels read 122.97 for three conditions and 0.00 for one.]
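The "dead-end metabolite" label on this slide refers to a metabolite that the network can consume but not produce, or produce but not consume. A minimal sketch of how such metabolites can be detected from stoichiometry alone follows. It assumes every reaction is irreversible and written in its forward direction, models exchange reactions as single-metabolite reactions, and uses a made-up four-reaction network rather than iBsu1103; it is not an FBA implementation.

```python
from typing import Dict, Set

# reaction name -> {metabolite: coefficient}; negative = consumed, positive = produced
Stoichiometry = Dict[str, Dict[str, float]]


def dead_end_metabolites(reactions: Stoichiometry) -> Dict[str, str]:
    """Report metabolites that are only consumed or only produced."""
    produced: Set[str] = set()
    consumed: Set[str] = set()
    for coefficients in reactions.values():
        for metabolite, coefficient in coefficients.items():
            if coefficient > 0:
                produced.add(metabolite)
            elif coefficient < 0:
                consumed.add(metabolite)
    report: Dict[str, str] = {}
    for metabolite in sorted(produced | consumed):
        if metabolite not in produced:
            report[metabolite] = "consumed but never produced"
        elif metabolite not in consumed:
            report[metabolite] = "produced but never consumed"
    return report


# Hypothetical toy network: X is needed by r2 but nothing supplies it.
toy_network: Stoichiometry = {
    "uptake_A": {"A": 1.0},
    "r1": {"A": -1.0, "B": 1.0},
    "r2": {"B": -1.0, "X": -1.0, "C": 1.0},
    "export_C": {"C": -1.0},
}

dead_ends = dead_end_metabolites(toy_network)
print(dead_ends)
```

Adding an uptake reaction for the flagged metabolite (the analogue of the external influx conditions in the figure) removes it from the report, which is one quick way to check whether a single missing supply route explains a blocked pathway.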
  • 22. Test case: Bacillus subtilis 168 re-annotation. BioI enzyme of B. subtilis 168: a cytochrome P450 protein that catalyzes the oxidative cleavage of acyl-ACP/free fatty acid molecules generated in the context of fatty acid biosynthesis, yielding pimeloyl-ACP as primary product. [Diagram: fatty acid metabolism supplies an acyl-ACP or a fatty acid; BioI (BSU30190) acts on both; pimeloyl-ACP is then used by BioF (BSU30220) together with L-alanine + H+, releasing CO2 + holo-ACP.]
  • 23. Future work. Extension of the reference set of Microme species to: Acinetobacter sp. ADP1; Pseudomonas putida KT2440; Bacillus subtilis 168. Second version of the Gene-Reaction curation interface in the MicroScope environment: curation of protein complexes / isozyme sets; management of Rhea reactions in addition to MetaCyc reactions. Definition of strategies for vertical annotation and propagation of curated GPR across multiple microbial genomes. Use of UniPathway as the reference resource of metabolic pathways in MicroScope; species-specific pathway representations based on the combination of pathway modules (http://www.unipathway.org).
  • 24. Contributions. Claudine Médigue (Group Leader); David Vallenet (Researcher); Damien Monrico (Engineer); François Lefèvre (Engineer); Alexander T. Smith (PhD); Eugeni Belda (Post doc). IT team: Claude Scarpelli; Ludovic Fleury. External partners: Anne Morgat; Antoine Danchin. Funding: EU Framework Programme 7 Collaborative Project, Grant Agreement Number 222886-2.