Modeling a microbial community 
and biodiversity assay with OBI 
and PCO: the gains of a modular 
approach 
ICBO2014, in Houston Oct 6-9 
Philippe Rocca-Serra, Ramona Walls, Jacob Parnell, Rachel Gallery, Jie 
Zheng, Susanna Assunta Sansone and Alejandra Gonzalez-Beltran
Biodiversity in the 
News 
• Grim headlines 
• True for many 
vertebrate species 
• Mankind is only now 
starting to build tools 
enabling true 
exploration of diversity
Exploring the world’s biodiversity 
• Game-changing progress in sequencing 
technology 
– Illumina 
– Oxford Nanopore MinION 
http://dx.doi.org/10.5524/100102
Microbial Diversity
Biodiversity studies with molecular 
techniques 
• Shotgun sequencing: 
– Sequencing as much as possible (probing is 
limited by the sequencing depth available: the 
rarer the species, the deeper the sequencing 
needs to be) 
• Targeted sequencing: 
– Reliance on a ‘marker gene’ whose variability 
will be used to estimate distance between 
species
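As a rough illustration of how a marker gene's variability can stand in for distance between species, here is a minimal sketch that computes the proportion of differing sites (the p-distance) between two aligned sequences. The sequences are made-up toy fragments, not real marker data, and real studies would typically use an evolutionary model rather than a raw mismatch proportion.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns where either sequence has a gap.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)


# Hypothetical aligned marker fragments (illustrative only).
reference = "ACGTACGTAC"
close_relative = "ACGTACGTAT"
distant_relative = "ACGAACCTTT"

print(p_distance(reference, close_relative))
print(p_distance(reference, distant_relative))
```

The smaller value for the close relative reflects fewer accumulated differences in the marker region.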
‘Barcode’ as in Multiplexed 
Libraries 
Genomic DNA isolated from each individual sample is: 
- fragmented (shearing) 
- ligated to a unique short DNA tag (i.e. the so-called barcode) 
- PCR-amplified and sequenced 
- read out as a single collection of reads, which can subsequently be sorted 
by their short DNA tags through computational means (the deconvolution process) 
Credits: http://rdp.cme.msu.edu/wiki/index.php/Pyrosequencing_Help
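The deconvolution step described above can be sketched in a few lines. This is a simplified illustration, not the method of any particular pipeline: it assumes the tag sits at the very start of each read, that all tags have the same length, and that only exact matches are accepted. The barcodes, sample names and reads are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group pooled reads by sample, using the leading tag of each read."""
    # Assumption: every barcode has the same length.
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_length], read[barcode_length:]
        # Reads whose tag is unknown are kept aside rather than dropped.
        sample = barcode_to_sample.get(tag, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical tags and pooled reads (illustrative only).
barcodes = {"ACGT": "soil_A", "TGCA": "soil_B"}
pooled_reads = [
    "ACGTGGGAAATTT",
    "TGCACCCTTTAAA",
    "ACGTGGGAAACTT",
    "NNNNGGGAAATTT",
]

result = demultiplex(pooled_reads, barcodes)
for sample, inserts in result.items():
    print(sample, inserts)
```

Production tools usually also tolerate a small number of mismatches in the tag, which this sketch deliberately leaves out.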
‘Barcode’ as in Barcode of Life 
Credits: http://www.barcodeoflife.org
Ambiguous Language 
• What is a barcode or what is a barcoding 
experiment? 
– Metaphors are impenetrable to computers. 
– Need to make representation unambiguous 
– Barcoding, meaning a technique for 
processing more samples in one go -> 
another word for multiplexing 
– Barcoding, meaning the creation of a unique 
profile as a means to identify types of living 
things
Heaps of sequence data for 
sure... but 
• What is the value in 
the absence of 
accompanying 
descriptors? 
• Essential annotation 
to ascertain identity 
and origin, sampling 
conditions and 
rationale
Helping Data Management 
• MIXS Guidelines checklist 
• SRA XML schema, GenBank records… 
• Tabular Templates for Data Collection 
• Wealth of RDF conversion tools 
– R2RML (W3C standard) 
• Even with the same XML schema and the same 
guidelines, ambiguities persist
ISA templates for Microbial 
Diversity Studies 
• Integrating MIXS checklist in the ISA 
framework 
• Mapping MIXS entities into SRA XML 
schema 
– Properties of sample 
– Properties of sample processing 
– Properties of resulting libraries 
– Properties of data processing
Ambiguities: Barcoding 
• Library / Experiment / Sample uniqueness 
• Use Case: creation of libraries for 
Bacteria, Fungi, Eukaryota with specific genes 
(16S rRNA, ITS, COI) 
• ISA conversion to ENA: 
– 1 sample -> 3 libraries 
• SRA/ENA submission: 
– 3 libraries -> 3 samples
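The consequence of the two conventions for sample counts can be shown with a small sketch. The record layout and identifiers below are hypothetical and chosen only for illustration; they are not the ISA or SRA data models.

```python
# One physical sample, three marker-specific libraries (hypothetical names).
libraries = [
    {"library": "lib_16S", "marker": "16S rRNA", "sample": "soil_core_1"},
    {"library": "lib_ITS", "marker": "ITS", "sample": "soil_core_1"},
    {"library": "lib_COI", "marker": "COI", "sample": "soil_core_1"},
]


def count_samples_shared(records):
    """Count samples when libraries point back to a shared sample."""
    return len({record["sample"] for record in records})


def count_samples_per_library(records):
    """Count samples when each library is submitted with its own sample."""
    return len({(record["sample"], record["library"]) for record in records})


print(count_samples_shared(libraries))       # 1 sample -> 3 libraries
print(count_samples_per_library(libraries))  # 3 libraries -> 3 samples
```

The same experiment therefore reports either one or three samples, depending on the convention chosen at submission.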
Working with OBI, PCO, SO, ChEBI 
Drawn using CMAPtools: http://cmap.ihmc.us
Working with OBI, PCO, SO, ChEBI 
Drawn using CMAPtools: http://cmap.ihmc.us
OBI-PCO based representation 
• ‘targeted gene survey’ 
– has_part some ‘library preparation’ (OBI_0000711) 
• ‘polymerase chain reaction’ (OBI_0000415) is_part_of ‘library preparation’ (OBI_0000711) 
• ‘polymerase chain reaction’ (OBI_0000415) 
– has_specified_input some ‘forward pcr primer’ (OBI_0000722) 
– has_specified_input some ‘reverse pcr primer’ (OBI_0001951) 
– has_specified_input some ‘multiplexing sequence identifier’ 
– has_specified_input some ‘DNA extract’ (OBI_0001051) 
• ‘library preparation’ (OBI_0000711) has_specified_output some ‘single fragment library’ (OBI_0000736) 
• ‘library preparation’ (OBI_0000711) precedes ‘DNA sequencing’ (OBI_0000626) 
• ‘library sequence deconvolution’ is_preceded_by ‘DNA sequencing’ (OBI_0000626) 
• ‘library sequence deconvolution’ is_followed_by ‘sequence analysis data transformation’ (OBI_0200187) 
• ‘sequence analysis data transformation’ (OBI_0200187) has_specified_output some ‘data item’ (IAO_0000027) and is_about ‘population quality’ (PCO_0000003)
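The ordering relations on this slide can be read as a simple chain. The sketch below encodes them as plain Python pairs (a stand-in for OWL axioms, chosen here only for illustration) and recovers the order of the process steps. Term labels and identifiers are those on the slide; terms shown without an identifier are left as None.

```python
# Identifiers as given on the slide; None where the slide gives none.
TERM_IDS = {
    "library preparation": "OBI_0000711",
    "DNA sequencing": "OBI_0000626",
    "library sequence deconvolution": None,
    "sequence analysis data transformation": "OBI_0200187",
}

# (earlier step, later step), read off the precedes / is_preceded_by /
# is_followed_by statements.
PRECEDES = [
    ("library preparation", "DNA sequencing"),
    ("DNA sequencing", "library sequence deconvolution"),
    ("library sequence deconvolution", "sequence analysis data transformation"),
]


def workflow_order(precedes):
    """Return the steps of a single linear chain in execution order."""
    successor = dict(precedes)
    followers = set(successor.values())
    starts = [step for step in successor if step not in followers]
    if len(starts) != 1:
        raise ValueError("expected a single linear chain")
    order = [starts[0]]
    while order[-1] in successor:
        nxt = successor[order[-1]]
        if nxt in order:
            raise ValueError("cycle detected in ordering relations")
        order.append(nxt)
    return order


for step in workflow_order(PRECEDES):
    print(step, TERM_IDS[step])
```

A real implementation would load the ontologies with an OWL toolkit and let a reasoner infer the ordering; the pairs here only mirror what is written on the slide.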
Conclusions 
• We have clarified the OWL representation of 
several assays commonly used in biodiversity 
studies. 
• We have outlined good practice for serializing 
biodiversity experimental processes using the ISA, 
SRA and RDF formats. 
• We have shown how synergies between 
resources of the OBO Foundry can greatly benefit 
the fast development of fit-for-purpose tabular data 
collection templates, which in turn help compliance 
with annotation standard guidelines.
Why does it matter? 
• Correct sample size assessment 
• Assessing independence of samples and 
sampling events. 
• Is it really possible to ascertain the identity of 
samples by relying solely on metadata? 
• How can such uncertainties affect 
downstream analysis / meta-analysis?
Future directions 
• Sample Collection Protocols and 
Procedures as applied in biodiversity 
studies (field studies, “Marine macrofauna 
grab sampling method” and so forth) 
• Clarify the reporting of actual results 
• Continue working with PCO and related 
OBO Foundry efforts.
Acknowledgements 
• Dr. Ramona Walls (iPlant, Uni of Arizona) 
• Prof. Paula Mabee (Uni South Dakota) 
• RCN: Phenotype Ontology Research Coordination 
Network, National Science Foundation 
(NSF-DEB-0956049), (2010 - 2015) 
• Dr. Jie Zheng and OBI companions 
• PCO coworkers and RCN workshop participants 
• ISA Team 
• You
Acknowledgements 2

Modeling a Microbial Community and Biodiversity Assay with OBI and PCO OBO Foundry Ontologies: The Interoperability Gains of a Modular Approach


Editor's notes

  1. Biodiversity science is concerned with documenting the Earth’s life forms wherever they are found. For vertebrates and many macroscopic species the outlook seems grim, as seen in recent headlines, exemplified here by a BBC News title from September 30th. This is all the more troubling because we are only beginning to have the molecular tools to probe life’s very diverse niches.
  2. We are all well aware of the advances in sequencing technologies, with Illumina instruments dominating the market. While those instruments are fast, they are still bulky, and competitors are working hard on alternatives of reduced size. Shown here is the Oxford Nanopore MinION, a USB-connected nanopore sequencer, for which a first dataset has been published in BMC GigaScience.
  3. For a long time, scientists were limited in their exploration by the ‘lens’ through which they were looking. Nowhere is this clearer than in microbiology, where only what could be grown in lab conditions could be characterized. The advent of fast, accurate sequencing techniques opened entirely new horizons for the exploration of life. Here are a few examples, from our happy scientists at the zoology department in Oxford collecting new deep-sea samples, to colleagues monitoring extreme habitats such as mining waters. Other projects, such as Tara Ocean, retrace some of the sea routes followed by 15th-century explorers in an attempt to provide a snapshot of marine biodiversity. Finally, biodiversity is within us too, as shown in this famous Nature article and by projects such as the American Gut.
  4. When it comes to biodiversity studies relying on sequencing techniques, there are two main approaches: global or targeted. In the first case, one tries to sequence as much as possible, which means as deep as possible in order to capture the rarest (i.e. least abundant) species. But deep sequencing is expensive and requires long machine time, which can be an issue with a limited number of instruments and a vast number of samples to process. The other approach is much more parsimonious but only provides an indirect measure of biodiversity. The technique relies on identifying a genomic region that is specific to a genus but variable enough to estimate the spread of subspecies within that genus. Such genomic regions are often coding genes inherited from a common ancestor that have accumulated mutations and can be used as a proxy to estimate the distance between relatives. For Bacteria, the 16S rRNA gene is used; for Fungi, hyper-variable regions of ITS are the prime tool; and the COI gene is often used for Eukaryotes.
  5. This brings the need to disambiguate two very distinct meanings (even though related through their metaphor) of the notion of ‘Barcode’. Remember that instrument occupancy is still a bottleneck, as are reagent costs. Multiplexing techniques offer an extremely valuable way of increasing throughput. Advances in the computational treatment of sequencing reads made it possible to devise library construction techniques that allow tagged samples to be pooled, so that a single reaction well can be used to produce signal. Since the genomic DNA of each sample has been tagged with a ‘multiplex identifier’ (MID), colloquially called a ‘barcode’, it is possible to apply a deconvolution protocol and group together all sequencing reads associated with a tag, and therefore with a sample. This is the first meaning of ‘barcode sequencing’ in the field.
  6. But ‘Barcode’ is also met in the ‘Barcode of Life’ project. Here the aim is to define a true nucleic acid profile (if possible in a single gene region) which would uniquely identify a given species. This slide shows the overall workflow and ambition.
  7. (all on the slide)
  8. Fine, huge amounts of sequencing data are being generated, but those will be of little value if contextual data is missing. The criticality of such annotation was outlined in a 2011 Nature Biotechnology paper by Yilmaz et al., who published the MIxS/MIMARKS minimal information specification. This work was carried out under the Genomic Standards Consortium (GSC) initiative.
  9. The MIxS/MIMARKS checklist provides a framework detailing which metadata to collect, with specific requirements for specific sample types. It is meant to facilitate the exchange of data between centres collecting and archiving environmental samples. We will now show how these guidelines have been implemented by the ISA Team, who generated a set of configurations defining data collection templates.
  10. A quick introduction to the ISA tool suite, which supports data collection, persistence and conversion to a set of formats supported by public repositories. It is an ecosystem revolving around the ISA-TAB format, with support for massively parallel datasets. The slide shows a gradient from left to right, from configuration (annotation guidelines) and curation tools to analysis and usage; people can choose the path that is most convenient for their use case. More recently, we became involved with publishers (NPG and BMC GigaScience).
  11. The main job consisted of 2 steps: i. Step 1 was to create the ISA configurations from the MIMARKS guidelines. This meant binning the metadata tags defined by the GSC to the relevant ISA syntactic elements. For instance, MIxS geo_loc (geographical location) has been mapped to the ISA Source Name element, while ‘collection device’ has been mapped as a Parameter Value associated with a ‘Protocol’. The screenshot shown here illustrates the ‘distribution’ of MIMARKS tags over an ISA workflow, showing only the annotation related to library preparation and data acquisition. ii. Step 2 consisted of adjusting the ISA SRA converter and mapping the metadata onto SRA schema objects. This is where we realized that the same information (MIMARKS) can be mapped in different ways to the same schema (SRA).
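To make the ‘binning’ of tags in note 11 concrete, here is a minimal sketch that records the two mappings mentioned above and derives an ISA-TAB column header from each. It is not the ISA configuration format itself; the dictionary layout, the function name and the protocol name "sample collection" are assumptions made for illustration.

```python
# Only the two mappings named in the note are listed.
# The dictionary layout and the protocol name are hypothetical.
TAG_TO_ISA = {
    "geo_loc": {"element": "Source Name"},
    "collection device": {
        "element": "Parameter Value",
        "protocol": "sample collection",  # hypothetical protocol name
    },
}

def isa_column_header(tag):
    """Return the ISA-TAB column header under which a tag would be recorded."""
    mapping = TAG_TO_ISA[tag]
    if mapping["element"] == "Parameter Value":
        # ISA-TAB qualifies parameter columns with the parameter name in brackets.
        return f"Parameter Value[{tag}]"
    return mapping["element"]

headers = [isa_column_header(tag) for tag in TAG_TO_ISA]
```

Keeping the mapping as data rather than code is one way to let the same table drive both the template generation and a later conversion step.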
  12. The example we consider here is that of an environmental gene survey performed on the same sample but using 3 different sets of PCR primers to amplify genomic regions targeting 3 different genera. Following the ISA templates, the conversion retains that feature, i.e. all libraries are recorded as derived from the same sample. However, other tools will create 3 distinct SRA samples, so the shared origin of the libraries can then only be assumed. This experience has been used to fully describe these types of assays in a BFO-based ontological framework, in order to ensure semantic accuracy and avoid such pitfalls. The following 2 slides present a graphical representation (in the form of a cMAP) of the ‘targeted gene survey’ assay, built by exploiting the OBI assay design pattern and augmenting it to accommodate the specifics of the procedure.
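The effect described in note 12 can be shown with a toy example: the same three libraries yield one sample or three, depending on how the conversion treats them. The record fields and identifiers are hypothetical and do not reproduce the SRA schema.

```python
# Three libraries prepared from one environmental sample with three primer sets.
# All names are hypothetical.
libraries = [
    {"library": "lib_1", "derived_from": "soil_core_1", "primer_set": "primers_A"},
    {"library": "lib_2", "derived_from": "soil_core_1", "primer_set": "primers_B"},
    {"library": "lib_3", "derived_from": "soil_core_1", "primer_set": "primers_C"},
]

def samples_shared(libs):
    """Conversion that keeps the shared origin: one sample per source material."""
    return {lib["derived_from"] for lib in libs}

def samples_per_library(libs):
    """Conversion that mints a new sample record for every library."""
    return {f'{lib["derived_from"]}:{lib["primer_set"]}' for lib in libs}

shared_count = len(samples_shared(libraries))
per_library_count = len(samples_per_library(libraries))
```

Under the first conversion the deposited data reports a single sample, under the second it reports three, although the underlying experiment is identical.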
  13. This shows the component corresponding to the biomaterial sample collection and preservation.
  14. This shows the component corresponding to the biomaterial processing that generates the sequencing libraries, preceding the data acquisition and treatment processes which ultimately produce an information artifact about a population.
  15. The representation can therefore be exploited to convert ISA spreadsheets holding this type of information and fully clarify the semantics of the tables. Such a mapping can be fed to the ISA RDF conversion module (LinkedISA) as a means to make biodiversity data more linked. Obviously, this pattern is independent of the ISA-based representation, but the same representation can be used as a mapping template, thus providing a pattern to represent such data consistently.
  16. Conclusions: (all on the slide, really)
  17. In digging into the details of sequence-based biodiversity assays, we have identified a potential issue in existing representations which affects the ability to accurately assess true sample size. This may result in inconsistencies between the sample size declared in experimental reports and the sample sizes computed from deposited data. While remedial heuristics can be devised to compensate, they have a cost. Those methods will have to rely on computing distance metrics over vectors of metadata values and try to infer identity of origin. The key question will be to understand how this may influence downstream data analysis.
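A remedial heuristic of the kind mentioned in note 17 could look like the sketch below. It assumes a very simple distance (the fraction of metadata fields on which two records disagree) and a greedy grouping; the field names, values and threshold are hypothetical, and a real heuristic would need to handle missing or inconsistently formatted values.

```python
def metadata_distance(a, b, fields):
    """Fraction of the compared metadata fields on which two records differ."""
    mismatches = sum(a.get(field) != b.get(field) for field in fields)
    return mismatches / len(fields)

def infer_sample_groups(records, fields, threshold=0.0):
    """Greedily group records whose distance to a group's first member is within threshold."""
    groups = []
    for record in records:
        for group in groups:
            if metadata_distance(record, group[0], fields) <= threshold:
                group.append(record)
                break
        else:
            # No existing group is close enough: treat the record as a new origin.
            groups.append([record])
    return groups

# Hypothetical deposited sample records; the first three share all metadata values.
fields = ["location", "collection_date", "depth_cm"]
records = [
    {"id": "S1", "location": "site_1", "collection_date": "2013-05-01", "depth_cm": 10},
    {"id": "S2", "location": "site_1", "collection_date": "2013-05-01", "depth_cm": 10},
    {"id": "S3", "location": "site_1", "collection_date": "2013-05-01", "depth_cm": 10},
    {"id": "S4", "location": "site_2", "collection_date": "2013-05-02", "depth_cm": 10},
]
groups = infer_sample_groups(records, fields)
```

The cost alluded to in the note is visible even here: two genuinely distinct samples with identical metadata would be merged, and one sample annotated inconsistently would be split.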
  18. This leads to a discussion of the future directions of work that PCO and OBI could look into. These could range from capturing the specifics of the sampling procedures used in environmental and biodiversity studies, for which a number of protocols and guidelines exist (the “Marine macrofauna grab sampling method”, to give one example), to clarifying the actual measurements produced by such studies. Ideally, working under the OBO Foundry makes cross talk more productive as people grow more familiar with development conventions and practices, with term dispatch and composition protocols becoming more refined and detailed. This also encourages cross-domain development and outreach to existing and sometimes overlapping efforts. OBI and IAO are currently outlining a plan for alignment; these are encouraging signals for the community.
  19. A big thank you to Ramona Walls, Paula Mabee and the RCN Phenotype group for organizing and leading these twin events; to all the participants of the PCO meeting (Robert Guralnick, Pier Luigi Buttigieg, Adam, among others…); to Jie Zheng and the OBI folks; of course to all my colleagues of the ISA Team (Alejandra, Eamonn, Susanna and Milo); and to you for your attention.
  20. I must insist on a heartfelt acknowledgment, as it meant swapping this (Oxford floods in February) for this (Arizona desert, February, same year). It was nice to be somewhere dry and in such great company.