Bringing Standards to Life:
 Software Development by the
 Genomic Standards Consortium




            Renzo Kottmann
             Microbial Genomics Group
     Max Planck Institute for Marine Microbiology

      M3 SIG Stockholm July 2009
Genomic Standards Consortium (GSC)

Goal
  • Promote mechanisms that
         standardize the description of genomes
         exchange and integrate genomic data

Open-membership, international working body
  • Established in Sept 2005
  • Participants include DDBJ, EMBL, GenBank, Sanger,
    JCVI, JGI, EBI and a range of US, UK and EU research
    institutions
  • Organized a series of workshops


            http://gensc.org and http://gensc.org/gc_wiki/index.php/GSC_Membership
Minimum Information about a Genome Sequence (MIGS) Specification

MIGS extends what DDBJ/EMBL/GenBank request upon submission of a genome sequence
  • Examples:
       Description of geographic location of a sample and habitat
       “Minimum Information about a Metagenomic Sequence” (MIMS)
         – Temperature
         – pH
       Description of sequence generation
         – Sequencing method
         – Assembly method

                         Field et al. Nat Biotechnol. 2008
MIGS Checklist 2.0

[Figure: MIGS checklist table; M = mandatory]

  Field et al. Nat Biotechnol. 2008
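
A compliance check against a checklist with mandatory items could look roughly like the sketch below. The field names and the choice of which items are mandatory are assumptions made for illustration; they are not taken from the actual MIGS 2.0 checklist, and `missing_mandatory` is a hypothetical helper.

```python
# Illustrative only: these field names and mandatory flags are assumptions,
# not the real MIGS 2.0 checklist.
CHECKLIST = {
    "geographic_location": True,   # True = treated as mandatory here
    "habitat": True,
    "sequencing_method": True,
    "assembly_method": True,
    "temperature": False,
    "ph": False,
}


def missing_mandatory(report, checklist=CHECKLIST):
    """Return the mandatory fields that are absent or empty in a report."""
    missing = []
    for name, mandatory in checklist.items():
        value = report.get(name)
        if mandatory and (value is None or not str(value).strip()):
            missing.append(name)
    return sorted(missing)


report = {
    "geographic_location": "Kiel Fjord, Baltic Sea, Germany",
    "habitat": "Aquatic: Marine",
    "sequencing_method": "",
}
print(missing_mandatory(report))
```

A report would count as compliant in this sketch when the returned list is empty.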
Software Development for MIGS/MIMS

Mechanisms for achieving compliance are needed:
  • Such mechanisms involve
       an appropriate reporting structure for capturing and exchanging data,
       software,
       databases
       and controlled vocabularies and/or ontologies for defining the terms used in the annotations.

Supporting Projects:
  • Habitat-Lite (Ontology specification)
  • Genomic Rosetta Stone (Identifier Mapping)
  • GCDML (MIGS/MIMS specification in XML)
  • Genomes Catalogue (Database and Web Server)

     Field et al. Nat Biotechnol. 2008
Habitat-Lite (= EnvO-Lite)

Easy-to-use (small) set of terms
  • Captures high-level information about habitat
  • Derived from the Environment Ontology (EnvO)

Meets the needs of multiple users
  • Annotators, database providers, biologists, and bioinformaticians alike who need to search and employ such data in comparative analyses

     Hirschman et al. OMICS. 2008
Habitat-Lite

Level 1                              Level 2
Aquatic                              soil
 Aquatic: Freshwater                 sediment
 Aquatic: Marine                     sludge
Terrestrial                          waste water
Air                                  hot spring
Fossil                               hydrothermal vent
Food                                 biofilm
Organism-Associated                  microbial mat
Extreme Habitat
Other

                       < 20 terms

                       Hirschman et al. OMICS. 2008
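
Because the term set is this small, annotation software can hold it in memory and normalize free-text input against it. The sketch below uses the terms listed on this slide; the function name `normalize_term` and the case-insensitive matching rule are assumptions for illustration, not part of the Habitat-Lite specification.

```python
# Terms as listed on the slide; the matching logic is an assumption.
LEVEL_1 = (
    "Aquatic", "Aquatic: Freshwater", "Aquatic: Marine", "Terrestrial",
    "Air", "Fossil", "Food", "Organism-Associated", "Extreme Habitat", "Other",
)
LEVEL_2 = (
    "soil", "sediment", "sludge", "waste water", "hot spring",
    "hydrothermal vent", "biofilm", "microbial mat",
)

# Lookup from a case-folded term to its canonical spelling.
_LOOKUP = {term.casefold(): term for term in LEVEL_1 + LEVEL_2}


def normalize_term(raw):
    """Return the canonical Habitat-Lite term for raw text, or None if unknown."""
    key = " ".join(raw.split()).casefold()  # collapse whitespace, ignore case
    return _LOOKUP.get(key)


print(normalize_term("  aquatic:  marine "))
print(normalize_term("Hot Spring"))
print(normalize_term("lake"))
```

Input that does not match any term returns `None`, so a caller can ask the annotator to pick a listed term or fall back to "Other".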
Habitat-Lite applied




   http://www.megx.net/genomes
Genomic Rosetta Stone (GRS)
  • Create a unified mapping between different genomic resources
  • Improve navigation across these resources
  • Enable the integration of this information in the near future

                    Van Brabant et al. OMICS. 2008
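
The idea of a unified identifier mapping can be sketched as a small two-way index. This is not the GRS data model: the class `IdentifierMap`, the resource names `resource_a` and `resource_b`, and all identifiers below are made-up placeholders.

```python
from collections import defaultdict


class IdentifierMap:
    """Hypothetical two-way index between one genome and its IDs elsewhere."""

    def __init__(self):
        self._by_genome = defaultdict(dict)  # genome key -> {resource: identifier}
        self._reverse = {}                   # (resource, identifier) -> genome key

    def add(self, genome_key, resource, identifier):
        """Record that a resource knows this genome under the given identifier."""
        self._by_genome[genome_key][resource] = identifier
        self._reverse[(resource, identifier)] = genome_key

    def translate(self, resource, identifier, target):
        """Translate an identifier from one resource to another, or return None."""
        genome_key = self._reverse.get((resource, identifier))
        if genome_key is None:
            return None
        return self._by_genome[genome_key].get(target)


# Placeholder data, not real accessions.
grs = IdentifierMap()
grs.add("genome-1", "resource_a", "A-0001")
grs.add("genome-1", "resource_b", "B-42")

print(grs.translate("resource_a", "A-0001", "resource_b"))
```

Keeping one internal key per genome means each new resource adds a single entry per genome instead of a pairwise mapping to every other resource.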
Genomic Contextual Data Markup Language (GCDML)

An Extensible Markup Language (XML)

Aim
  • Implement MIGS/MIMS
  • Provide even more descriptors
  • Facilitate exchange and integration of genomic data

                      Kottmann et al. OMICS. 2008
GCDML Example (excerpt)

<gcdml:originalSample>
  <gcdml:physicalMaterial>
    <gcdml:samplingTime><gcdml:notGiven>unknown</gcdml:notGiven></gcdml:samplingTime>

    <gcdml:samplePointLocation>
      <gml:LocationKeyWord>Baltic Sea</gml:LocationKeyWord>
      <gml:LocationString>Kiel Fjord, Baltic Sea, Germany</gml:LocationString>
      <gcdml:pos2D>54.329 10.149</gcdml:pos2D>
      <gcdml:determinationMethod>derived from literature</gcdml:determinationMethod>
    </gcdml:samplePointLocation>

    <gcdml:marineHabitat>
      <gcdml:waterBody>
        <gcdml:depth>
          <gcdml:measure min="0.00" max="0.05"><gcdml:values uom="m">0.00 0.05</gcdml:values></gcdml:measure>
        </gcdml:depth>
      </gcdml:waterBody>
    </gcdml:marineHabitat>

    <gcdml:materialType>seawater</gcdml:materialType>
    <gcdml:amount><gcdml:measure><gcdml:values uom="ml">100</gcdml:values></gcdml:measure></gcdml:amount>
  </gcdml:physicalMaterial>
</gcdml:originalSample>

                                             Kottmann et al. OMICS. 2008
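
A consumer of such a report might read the location and depth as in the sketch below, which uses Python's standard `xml.etree.ElementTree`. The excerpt above does not declare its namespaces, so the URIs `urn:example:gcdml` and `urn:example:gml` are placeholders rather than the real schema namespaces. Reading `pos2D` as latitude followed by longitude is also an assumption, and `read_sample` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the slide excerpt does not show the real ones.
NS = {"gcdml": "urn:example:gcdml", "gml": "urn:example:gml"}

EXCERPT = """
<gcdml:originalSample xmlns:gcdml="urn:example:gcdml" xmlns:gml="urn:example:gml">
  <gcdml:physicalMaterial>
    <gcdml:samplePointLocation>
      <gml:LocationString>Kiel Fjord, Baltic Sea, Germany</gml:LocationString>
      <gcdml:pos2D>54.329 10.149</gcdml:pos2D>
    </gcdml:samplePointLocation>
    <gcdml:marineHabitat>
      <gcdml:waterBody>
        <gcdml:depth>
          <gcdml:measure min="0.00" max="0.05"><gcdml:values uom="m">0.00 0.05</gcdml:values></gcdml:measure>
        </gcdml:depth>
      </gcdml:waterBody>
    </gcdml:marineHabitat>
  </gcdml:physicalMaterial>
</gcdml:originalSample>
"""


def read_sample(xml_text):
    """Extract location, coordinates and depth range from a GCDML-like sample."""
    root = ET.fromstring(xml_text.strip())
    location = root.find(".//gcdml:samplePointLocation", NS)
    # Assumption: pos2D holds "latitude longitude" separated by whitespace.
    lat, lon = (float(v) for v in location.findtext("gcdml:pos2D", namespaces=NS).split())
    measure = root.find(".//gcdml:depth/gcdml:measure", NS)
    values = measure.find("gcdml:values", NS)
    return {
        "location": location.findtext("gml:LocationString", namespaces=NS),
        "latitude": lat,
        "longitude": lon,
        "depth_min": float(measure.get("min")),
        "depth_max": float(measure.get("max")),
        "depth_unit": values.get("uom"),
    }


sample = read_sample(EXCERPT)
print(sample["location"], sample["latitude"], sample["longitude"])
```

Because the unit travels with the value in the `uom` attribute, a reader does not have to guess whether a depth is given in metres or centimetres.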
Genome Catalogue
Online system for capturing MIGS/MIMS-compliant reports

                    Field et al. Nature 2008
Genome Catalogue
Requirements
  • Rich toolkit, user-friendly
  • Designed to give credit to all contributors
  • XML-based (GCDML)
        Able to maintain all versions of GCDML schemas
  • Web services-based
        Supporting the automated exchange of content
  • Serves as the international GCAT identifier authority
  • Comprehensive
        Containing reports for all taxa and metagenomes
  • Ontology-supportive
  • Shared by the GSC
Current Status
We have specifications:
  • MIGS/MIMS
  • Habitat-Lite
  • Genomic Rosetta Stone
Work on supporting software is ongoing:
  • Genomes Catalogue is in prototype status
  • Funding
        This is a long-term endeavour that cannot be done on a voluntary basis
Discussion
Software is needed for:
  • Creation of MIGS/MIMS data
  • Storage
  • Analysis
Expand standardization efforts to
  • Software specification/development
  • Work on a standardized genomic data management architecture / cyberinfrastructure
Data-intensive science is successful if it works towards one community with one vision
  • World Wide Genomics project
Acknowledgements

All Members of GSC incl.
       Dawn Field
       Peter Sterk
       Saul Kravitz
       Tanya Gray

Megx.net team
       Frank Oliver Glöckner
       Ivaylo Kostadinov
       Melissa Beth Duhaime
       Pier Luigi Buttigieg
       Wolfgang Hankeln
       Pelin Yilmaz


END



Looking forward to the discussion

          Join the GSC
         http://gensc.org


