Data Standards for Systems Biology
Neil Swainston
Manchester Centre for Integrative Systems Biology
neil.swainston@manchester.ac.uk
Introduction
• Experimental standards
• Proteomics
• Metabolomics
• Enzyme kinetics
• Modelling standards
• Models
• Simulations
• Results
Why do we need standards?
• Aids researchers by facilitating management of
experimental data
• Facilitates open-source software development
and interoperability
• Allows data to be shared
• Increasingly becoming a requirement for journal
submissions
When are standards developed?
• Standards generally develop organically
• Not during a technique's pioneering phase
• When an experimental technique becomes
established
• When the need for a standard becomes obvious
Who develops standards?
• Usually two or more academic groups
• Commercial providers often less enthusiastic
• Often formed by a Working Group
• Proteomics Standards Initiative
• Metabolomics Standards Initiative
• “Minimum information required” specification
provided
• Followed by data schema, XML standard
MCISB project overview
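• Experimental data feeding the model:
– Enzyme kinetics → parameters (KM, kcat)
– Quantitative metabolomics and proteomics → variables (metabolite, protein concentrations)
• Data stored in repositories, each accessed via a web service:
– PRIDE XML (proteomics)
– MeMo (metabolomics)
– SABIO-RK (enzyme kinetics)
• MeMo-RK web service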
Proteomics
• We wish to store:
• Raw experimental mass spectrometry data
• Protein / peptide identifications
• Protein / peptide quantitations
• Metadata (instrument, search algorithm, user, etc.)
Mass spectrometry data
• How do we represent a mass spectrum? (figure not reproduced)
Mass spectrometry data
• The simple approach: a plain list of m/z and intensity values (figure not reproduced)
Mass spectrometry data
• The simple approach does provide a list of
masses and intensities, but…
• What instrument was used?
• Who ran the instrument?
• What sample was used?
• …etc.
• The simple approach lacks metadata
• Many simple approaches (formats) exist
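As a minimal sketch of the "simple approach", assume a hypothetical plain-text peak list with one `m/z intensity` pair per line (the values are illustrative). Parsing it recovers the masses and intensities, but nothing about the instrument, operator or sample.

```python
from typing import List, Tuple


def parse_peak_list(text: str) -> List[Tuple[float, float]]:
    """Parse whitespace-separated 'm/z intensity' lines into (mz, intensity) tuples.

    Blank lines and lines starting with '#' are skipped. The format is a
    hypothetical stand-in for the many ad hoc peak-list formats.
    """
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        mz_str, intensity_str = line.split()[:2]
        peaks.append((float(mz_str), float(intensity_str)))
    return peaks


example = """
# m/z     intensity
445.12    1520.0
446.13    380.5
519.14    96.2
"""

peaks = parse_peak_list(example)
print(peaks)
# Only numbers survive: no instrument, operator or sample information.
```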
Mass spectrometry data
• The less simple approach: mzData
• Developed by the Proteomics Standards Initiative
(2005)
• Put together by a Working Group of academic and
commercial parties
• Regular meetings, both real and virtual
• Goal: unify the existing “simple” formats into
one
• Support “tagging” with metadata
mzData
• http://www.psidev.info/index.php?q=node/80#mzdata
• XML format, includes…
• Peak lists (m/z and intensity values)
• Experimental protocols
• Admin (Who? When?)
• Instrument details
• etc.
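mzData keeps peak lists as base64-encoded binary arrays next to descriptive metadata, so one file carries both. The sketch below uses a simplified, hypothetical XML fragment modelled on that idea: the element and attribute names are illustrative and do not follow the mzData schema exactly, and little-endian encoding is assumed.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def encode_array(values, precision=64):
    """Pack floats as little-endian binary and base64-encode them."""
    fmt = "<%d%s" % (len(values), "d" if precision == 64 else "f")
    return base64.b64encode(struct.pack(fmt, *values)).decode("ascii")


def decode_array(b64_text, precision, length):
    """Reverse of encode_array: base64-decode and unpack little-endian floats."""
    fmt = "<%d%s" % (length, "d" if precision == 64 else "f")
    return list(struct.unpack(fmt, base64.b64decode(b64_text)))


# Hypothetical document: metadata and peak arrays travel together.
doc = f"""
<spectrumDocument>
  <admin submitter="A. Researcher" date="2010-01-01"/>
  <instrument name="example-instrument"/>
  <spectrum id="1">
    <mzArray precision="64" length="3">{encode_array([445.12, 446.13, 519.14])}</mzArray>
    <intensityArray precision="64" length="3">{encode_array([1520.0, 380.5, 96.2])}</intensityArray>
  </spectrum>
</spectrumDocument>
"""

root = ET.fromstring(doc.strip())
spectrum = root.find("spectrum")


def read_array(element):
    """Decode one binary array element using its precision/length attributes."""
    return decode_array(element.text.strip(),
                        int(element.get("precision")),
                        int(element.get("length")))


mz = read_array(spectrum.find("mzArray"))
intensity = read_array(spectrum.find("intensityArray"))
print(root.find("instrument").get("name"), list(zip(mz, intensity)))
```

Binary encoding keeps files compact and preserves full floating-point precision, while the surrounding XML stays human-readable.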
Controlled vocabularies
• Use of free text is “dangerous”
• Non-standard, ambiguous terms
• Difficult to match / compare
• Controlled vocabularies
• Collection of standardised terms
• Organised into vocabularies or ontologies
• Ontologies contain controlled terms and relationships
between them (predicates)
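A minimal sketch of why controlled terms help: several free-text labels map to one term identifier, and an `is_a` relationship (one kind of predicate) lets software reason over the term hierarchy. The term IDs, names and synonyms below are hypothetical and are not taken from the PSI-MS vocabulary or any other real ontology.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical mini-ontology: term id -> (name, parent ids via "is_a").
TERMS: Dict[str, Tuple[str, List[str]]] = {
    "EX:0001": ("mass analyzer type", []),
    "EX:0002": ("time-of-flight", ["EX:0001"]),
    "EX:0003": ("quadrupole", ["EX:0001"]),
}

# Free-text variants, all normalised to a single controlled term.
SYNONYMS: Dict[str, str] = {
    "tof": "EX:0002",
    "time of flight": "EX:0002",
    "time-of-flight": "EX:0002",
    "quad": "EX:0003",
    "quadrupole": "EX:0003",
}


def to_term(free_text: str) -> Optional[str]:
    """Map a free-text label to a controlled term id, or None if unknown."""
    return SYNONYMS.get(free_text.strip().lower())


def ancestors(term_id: str) -> Set[str]:
    """Follow is_a links upwards and return every ancestor term id."""
    found: Set[str] = set()
    stack = list(TERMS[term_id][1])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(TERMS[parent][1])
    return found


print(to_term("TOF"), to_term("Time of Flight"))  # both resolve to EX:0002
print(ancestors("EX:0002"))                       # {'EX:0001'}
```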
Controlled vocabularies
• Ontology Lookup Service, EBI
mzData
Proteomics data
• Proteomics data is not solely mass
spectrometry data
• Sample preparation protocol?
• Peptide / protein identifications?
• Post-translational modifications?
• Identification scores?
• To support this, an extension is required
• Extension based on a defined set of “minimum
requirements”
• MIAPE: Minimum Information About a Proteomics Experiment
MIAPE
PRIDE
• Proteomics identifications database
– Both a format and a database
– Centralised, standards-compliant, open-source, public
data repository for proteomics data
– Query, submit and retrieve proteomics data in
standardised XML formats
– Public version housed at the EBI
– http://www.ebi.ac.uk/pride/
PRIDE
• Peptide / protein identifications
PRIDE Converter
• User interface
• Usable by biologists
• Interfaces with
Ontology Lookup
Service
• Developed by EBI
• Automatic upload
to PRIDE database
PRIDE database
Future directions
• PRIDE does NOT hold:
• Protein and peptide quantitations
• New approaches being developed
• mzML – mass spectrometry format, enhancement of
mzData, including support for richer datasets
• mzIdentML – storage of protein and peptide
identifications
• mzQuantML – storage of protein and peptide
quantitations
Metabolomics
• We wish to store:
• Raw experimental mass spectrometry (and NMR)
data
• Metabolite identifications
• Metabolite quantitations
• Metadata (instrument, search algorithm, user, etc.)
Metabolomics
• Data standard does NOT currently exist
• Core Information for Metabolomics Reporting
• Metabolomics Standards Initiative (MSI)
• http://msi-workgroups.sourceforge.net/
• MetaboLights being developed at EBI
• Not many details as yet
• In the meantime…
• MCISB has developed its own repository
MeMo
• Metabolomics Model database
• Designed initially for metabolomics data
• SQL / XML hybrid approach
• Holds:
– Experimental metadata (submitter, lab, date)
– Sample metadata (including biological source)
– Instrumentation metadata
– Mass spectra
– Metabolite identifications
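To illustrate what an SQL / XML hybrid can look like, the sketch below keeps searchable metadata in relational columns and stores each spectrum as an XML document in a text column. The table layout, column names and XML are hypothetical; this is not MeMo's actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE experiment (
           id INTEGER PRIMARY KEY,
           submitter TEXT NOT NULL,
           lab TEXT NOT NULL,
           run_date TEXT NOT NULL,
           organism TEXT,
           spectrum_xml TEXT NOT NULL   -- full spectrum kept as an XML document
       )"""
)

# Hypothetical spectrum, serialised as XML.
spectrum_xml = (
    '<spectrum><peak mz="180.06" intensity="1200"/>'
    '<peak mz="342.12" intensity="450"/></spectrum>'
)
conn.execute(
    "INSERT INTO experiment (submitter, lab, run_date, organism, spectrum_xml) "
    "VALUES (?, ?, ?, ?, ?)",
    ("A. Researcher", "Example lab", "2010-03-01", "Saccharomyces cerevisiae", spectrum_xml),
)

# Relational columns make metadata queries cheap...
rows = conn.execute(
    "SELECT id, spectrum_xml FROM experiment WHERE organism = ?",
    ("Saccharomyces cerevisiae",),
).fetchall()

# ...while the XML column keeps the spectrum structured and self-describing.
for exp_id, xml_text in rows:
    peaks = [(float(p.get("mz")), float(p.get("intensity")))
             for p in ET.fromstring(xml_text).iter("peak")]
    print(exp_id, peaks)
```

The trade-off is that queries over spectrum contents need XML parsing, whereas queries over metadata use ordinary SQL.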
MeMo
MeMo web interface
Enzyme kinetics
• How fast does a given enzyme-catalysed reaction, A → B, occur?
• Determination of kinetic constants which define
the kinetics of the reaction
• Experimental approach: perform kinetic assays
Enzyme kinetics
• Many approaches:
– Absorbance
– Fluorescence
– others
• Currently concentrating on absorbance assays
on a BMG NOVOstar instrument
• Requirement: determination of KM and kcat for a
given reaction under particular conditions (pH
and temperature)
Enzyme kinetics: Michaelis-Menten
• Traditionally, for each assay, the initial rate, v, is
determined
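For an absorbance assay, one common way to obtain v is to fit a straight line to the early, roughly linear part of the absorbance trace and convert the slope to a concentration rate with the Beer-Lambert law, A = ε c l. The sketch assumes NADH consumption followed at 340 nm (ε ≈ 6220 M⁻¹ cm⁻¹) and a 1 cm path length; both are assumptions, as is the number of readings treated as "initial". Microplate readers often have a shorter, volume-dependent path length.

```python
from typing import Sequence


def linear_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def initial_rate(times_s, absorbances, n_initial=5, epsilon=6220.0, path_cm=1.0):
    """Initial rate in M/s from the first n_initial absorbance readings.

    epsilon (M^-1 cm^-1) and path_cm are assumptions for an NADH assay at
    340 nm. The sign is flipped so that NADH consumption gives a positive rate.
    """
    slope = linear_slope(times_s[:n_initial], absorbances[:n_initial])  # A per second
    return -slope / (epsilon * path_cm)


# Illustrative trace: absorbance falls by 0.00622 per second, then levels off.
times = [0, 10, 20, 30, 40, 50, 60]
absorbance = [1.0 - 0.00622 * t for t in times[:5]] + [0.72, 0.70]
v = initial_rate(times, absorbance)
print(v)  # approximately 1e-06 M/s
```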
Enzyme kinetics: Michaelis-Menten
• Performing this at various substrate
concentrations allows KM and Vmax to be
determined by fitting the Michaelis-Menten equation (figure not reproduced)
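The Michaelis-Menten equation, its double-reciprocal (Lineweaver-Burk) form, and the relation between Vmax and kcat for a total enzyme concentration [E]0 are:

$$
v = \frac{V_{\max}[S]}{K_M + [S]}, \qquad
\frac{1}{v} = \frac{K_M}{V_{\max}}\cdot\frac{1}{[S]} + \frac{1}{V_{\max}}, \qquad
k_{cat} = \frac{V_{\max}}{[E]_0}
$$

A straight-line fit of 1/v against 1/[S] therefore gives 1/Vmax as the intercept and KM/Vmax as the slope. The sketch below uses this textbook linearisation; it is a swapped-in technique, not necessarily the method used at MCISB, and because it distorts experimental error, direct non-linear fitting is generally preferred for real data. The concentrations and enzyme level are illustrative assumptions.

```python
from typing import List, Tuple


def fit_lineweaver_burk(s: List[float], v: List[float]) -> Tuple[float, float]:
    """Estimate (Vmax, KM) by least-squares fitting 1/v against 1/[S]."""
    x = [1.0 / si for si in s]
    y = [1.0 / vi for vi in v]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept      # intercept = 1 / Vmax
    km = slope * vmax           # slope = KM / Vmax
    return vmax, km


def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


# Illustrative, noise-free rates generated from Vmax = 10 uM/s and KM = 2 mM.
substrate_mM = [0.5, 1.0, 2.0, 4.0, 8.0]
rates = [michaelis_menten(s, 10.0, 2.0) for s in substrate_mM]

vmax, km = fit_lineweaver_burk(substrate_mM, rates)
enzyme_uM = 0.05                # assumed total enzyme concentration [E]0
kcat = vmax / enzyme_uM         # per second, since Vmax is in uM/s
print(round(vmax, 6), round(km, 6), round(kcat, 3))  # 10.0 2.0 200.0
```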
STRENDA guidelines
• Standards for Reporting Enzymology Data
• http://www.beilstein-institut.de/en/projects/strenda/
• Specifies…
• Reactants / products
• Enzyme (wild-type or modified, purification, expression
host)
• Experimental conditions (pH, temperature, buffer)
• Instrument, experiment type
• Submitter (contact details)
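Minimum-information guidelines such as STRENDA lend themselves to automated checking before submission. The sketch below checks a record against a hypothetical checklist loosely derived from the categories listed above; the section and field names are illustrative and are not the official STRENDA checklist.

```python
from typing import Dict, List

# Hypothetical checklist loosely based on the STRENDA categories above.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "reaction": ["substrates", "products"],
    "enzyme": ["name", "variant", "purification", "expression_host"],
    "conditions": ["pH", "temperature_C", "buffer"],
    "method": ["instrument", "assay_type"],
    "submitter": ["name", "email"],
}


def missing_fields(record: dict) -> List[str]:
    """Return 'section.field' entries that are absent or empty in the record."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        values = record.get(section, {})
        for name in fields:
            if values.get(name) in (None, "", []):
                missing.append(f"{section}.{name}")
    return missing


record = {
    "reaction": {"substrates": ["A"], "products": ["B"]},
    "enzyme": {"name": "example enzyme", "variant": "wild-type",
               "purification": "affinity chromatography", "expression_host": "E. coli"},
    "conditions": {"pH": 7.5, "temperature_C": 30, "buffer": ""},
    "method": {"instrument": "BMG NOVOstar", "assay_type": "absorbance"},
    "submitter": {"name": "A. Researcher"},
}

print(missing_fields(record))  # ['conditions.buffer', 'submitter.email']
```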
SABIO-RK
• http://sabio.villa-bosch.de/
• Comprehensive collection of enzyme kinetic
constants
• Adheres to STRENDA recommendations
• Harvested from literature
• Searchable web interface
SABIO-RK
BRENDA
• http://www.brenda-enzymes.org/
• Even more comprehensive
• Slightly less well-curated
• Again, searchable web interface
BRENDA
Other experimental standards
• MIBBI: Minimum Information for Biological and
Biomedical Investigations
• http://mibbi.org/
• Over thirty recommendations for a range of
experimental techniques
Modelling standards
Modelling
• What is a model?
• “An analytic or computational model proposes
specific testable hypotheses about a biological
system”
• Mathematical / computational representation of
a biological system
• May allow computational simulation of the
system
Pathway databases
• Building a model often starts with a topological
description of a pathway or pathways
• What reacts with what?
• A number of existing data resources
• Biochemical knowledge, curated from literature
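Once the topology is known ("what reacts with what"), a common first step towards a mathematical model is the stoichiometric matrix: one row per metabolite, one column per reaction, with negative entries for substrates and positive entries for products. The reactions below are hypothetical placeholders rather than entries from KEGG, MetaCyc or Reactome.

```python
from typing import Dict, List, Tuple

# Hypothetical pathway: reaction id -> (substrates, products) with stoichiometries.
REACTIONS: Dict[str, Tuple[Dict[str, int], Dict[str, int]]] = {
    "R1": ({"A": 1}, {"B": 1}),
    "R2": ({"B": 2}, {"C": 1}),
    "R3": ({"C": 1, "D": 1}, {"E": 1}),
}


def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_ids, matrix), where matrix[i][j] is the
    net stoichiometry of metabolite i in reaction j."""
    metabolites = sorted({m for subs, prods in reactions.values() for m in (*subs, *prods)})
    reaction_ids = list(reactions)
    index = {m: i for i, m in enumerate(metabolites)}
    matrix: List[List[int]] = [[0] * len(reaction_ids) for _ in metabolites]
    for j, rid in enumerate(reaction_ids):
        substrates, products = reactions[rid]
        for m, coeff in substrates.items():
            matrix[index[m]][j] -= coeff   # consumed
        for m, coeff in products.items():
            matrix[index[m]][j] += coeff   # produced
    return metabolites, reaction_ids, matrix


mets, rxns, S = stoichiometric_matrix(REACTIONS)
for met, row in zip(mets, S):
    print(met, row)
```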
KEGG
• Screenshots of KEGG metabolite, enzyme and reaction entries (not reproduced)
MetaCyc
Reactome
Simulation tools
• The systems biology community has developed
a strong software infrastructure
• Many tools exist, including simulators
• Several hundred
• How do we link pathway databases to these
simulators?
• A standard: SBML
• Systems Biology Markup Language
• Recently celebrated its 10th birthday
SBML
• XML markup language describing models
• Contains concepts such as…
• compartments
• species (metabolites, enzymes, RNA, etc.)
• reactions
• Similar to pathway databases
• KEGG2SBML tool exists for converting KEGG pathway
maps to SBML files
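To make the compartment / species / reaction concepts concrete, the sketch below writes a minimal SBML Level 2 Version 4 document using Python's standard `xml.etree.ElementTree`. It is illustrative only: in practice libSBML (which has Python bindings) is the usual way to read, write and validate SBML, and the model, compartment and species identifiers here are hypothetical.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def q(tag: str) -> str:
    """Qualify a tag name with the SBML namespace."""
    return f"{{{SBML_NS}}}{tag}"


def minimal_sbml(model_id, compartment, species, reactions):
    """Return an SBML L2V4 string with one compartment, its species and reactions.

    reactions maps reaction id -> (list of reactant ids, list of product ids).
    """
    ET.register_namespace("", SBML_NS)  # write SBML as the default namespace
    sbml = ET.Element(q("sbml"), {"level": "2", "version": "4"})
    model = ET.SubElement(sbml, q("model"), {"id": model_id})

    compartments = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(compartments, q("compartment"), {"id": compartment, "size": "1"})

    list_of_species = ET.SubElement(model, q("listOfSpecies"))
    for sid in species:
        ET.SubElement(list_of_species, q("species"), {"id": sid, "compartment": compartment})

    list_of_reactions = ET.SubElement(model, q("listOfReactions"))
    for rid, (reactants, products) in reactions.items():
        reaction = ET.SubElement(list_of_reactions, q("reaction"),
                                 {"id": rid, "reversible": "false"})
        for tag, members in (("listOfReactants", reactants), ("listOfProducts", products)):
            if members:
                container = ET.SubElement(reaction, q(tag))
                for sid in members:
                    ET.SubElement(container, q("speciesReference"), {"species": sid})
    return ET.tostring(sbml, encoding="unicode")


xml_text = minimal_sbml("example_model", "cytosol", ["A", "B"], {"R1": (["A"], ["B"])})
print(xml_text)
```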
Mathematical SBML
• Also contains concepts allowing simulations
• Many of these driven by experimental work
• Specification of metabolite and enzyme
concentrations
• Specification of kinetic laws and kinetic
parameters
• Parameterised model = pathways + experimental data
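As a minimal sketch of "pathways + experimental data", assume a single irreversible reaction A → B following Michaelis-Menten kinetics, d[A]/dt = −kcat[E]0[A] / (KM + [A]), with illustrative parameter values standing in for measured ones (for example from SABIO-RK or an in-house assay). The fixed-step fourth-order Runge-Kutta integrator is a simple stand-in for the solvers used by real SBML simulators.

```python
def rate(a, kcat, enzyme, km):
    """Michaelis-Menten rate of A -> B (concentration units per second)."""
    return kcat * enzyme * a / (km + a)


def simulate(a0, b0, kcat, enzyme, km, t_end, dt):
    """Integrate dA/dt = -v, dB/dt = +v with fixed-step RK4; returns (t, A, B) tuples."""
    t, a, b = 0.0, a0, b0
    trajectory = [(t, a, b)]
    steps = int(round(t_end / dt))
    for _ in range(steps):
        k1 = rate(a, kcat, enzyme, km)
        k2 = rate(a - 0.5 * dt * k1, kcat, enzyme, km)
        k3 = rate(a - 0.5 * dt * k2, kcat, enzyme, km)
        k4 = rate(a - dt * k3, kcat, enzyme, km)
        v = (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        a -= dt * v
        b += dt * v  # mass conservation: whatever leaves A appears as B
        t += dt
        trajectory.append((t, a, b))
    return trajectory


# Illustrative parameters (mM and seconds): kcat = 200 /s, [E]0 = 5e-5 mM, KM = 2 mM.
traj = simulate(a0=5.0, b0=0.0, kcat=200.0, enzyme=5e-5, km=2.0, t_end=600.0, dt=1.0)
t, a, b = traj[-1]
print(f"t={t:.0f}s  A={a:.3f} mM  B={b:.3f} mM")
```

For this single reaction the result can be checked against the integrated Michaelis-Menten equation, KM ln([A]0/[A]) + ([A]0 − [A]) = kcat[E]0 t.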
SBML
SBML data resources
• Biomodels.net
• http://www.ebi.ac.uk/biomodels-main/
• Curated collection of biochemical models at EBI
• JWS Online
• http://jjj.mib.ac.uk/
• Also curated
• BUT also includes an online simulator
• You’ll learn more next month…
SBML tools
• Hundreds of ‘em (205)
• http://sbml.org/SBML_Software_Guide
• Different goals
• Whole cell / single pathway
• Deterministic / stochastic simulators
• Different platforms / programming languages
• A matrix describing the capabilities of each
tool is available
• http://sbml.org/SBML_Software_Guide/SBML_Software_Matrix
Making SBML models: CellDesigner
Other model representations
• CellML
• http://www.cellml.org/
• Larger scale modelling
• Inter-cellular, used in whole organ modelling
• BioPAX
• http://www.biopax.org/
• Similar goals to SBML
• Overlap between “competing” representations
is being reduced
• Regular “COMBINE” meetings
MIRIAM
• Minimum Information Required in the
Annotation of Models
• http://www.ebi.ac.uk/miriam/
• Set of guidelines describing how to make
models reusable
• Specify model creator contact details
• Ensure consistent annotation of terms with database
resources
• e.g. use UniProt identifiers for unambiguous
identification of enzymes
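MIRIAM-style annotation links a model element to an external database entry through a resolvable identifier. The sketch validates a UniProt accession against the accession pattern documented by UniProt (worth re-checking against current UniProt documentation) and builds both the MIRIAM URN form and the identifiers.org URL form. "P12345" is only a format example; in an SBML file such URIs would sit inside an RDF annotation (for example with a `bqbiol:is` qualifier) rather than being printed.

```python
import re

# Accession pattern as documented by UniProt (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def uniprot_annotation(accession: str) -> dict:
    """Return the MIRIAM URN and identifiers.org URL for a UniProt accession."""
    if not UNIPROT_ACCESSION.fullmatch(accession):
        raise ValueError(f"not a valid UniProt accession: {accession!r}")
    return {
        "urn": f"urn:miriam:uniprot:{accession}",
        "url": f"http://identifiers.org/uniprot/{accession}",
    }


print(uniprot_annotation("P12345"))
```

Validating the identifier before writing it catches free-text names (for example a gene symbol) that would otherwise produce an unresolvable annotation.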
SBML visualisation: SBGN
• Until recently, no standardised way of viewing
models
• Systems Biology Graphical Notation
• Attempts to define a standard “wiring diagram” for
representing biological systems
Model simulation
• Many simulators exist
• How do we tell a simulator what to simulate?
• Simulation Experiment Description Markup Language
(SED-ML)
• Contains concepts…
• Model (what to run the simulation on)
• Simulation (what to simulate, duration, step size)
• Data generation (post-processing, e.g. normalisation)
• Output (2D plot, 3D plot)
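The four SED-ML concepts can be mirrored in a small data structure to show how they fit together. This is not SED-ML itself, which is an XML format, and the class and field names below are hypothetical; the sketch only illustrates how a simulation description separates the model to run, the time course (duration and step size), the post-processing and the output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class UniformTimeCourse:
    """Simulation settings: start, end and step size of the time course."""
    start: float
    end: float
    step: float

    def time_points(self) -> List[float]:
        n = int(round((self.end - self.start) / self.step))
        return [self.start + i * self.step for i in range(n + 1)]


@dataclass
class SimulationExperiment:
    model_source: str  # e.g. a path or URL to an SBML file
    simulation: UniformTimeCourse
    data_generators: Dict[str, Callable[[List[float]], List[float]]] = field(default_factory=dict)
    output: str = "2D plot"


def normalise_to_max(values: List[float]) -> List[float]:
    """Example post-processing step: scale a time series so its maximum is 1."""
    peak = max(values)
    return [v / peak for v in values]


experiment = SimulationExperiment(
    model_source="example_model.xml",  # hypothetical file name
    simulation=UniformTimeCourse(start=0.0, end=10.0, step=2.5),
    data_generators={"B_normalised": normalise_to_max},
)

print(experiment.simulation.time_points())  # [0.0, 2.5, 5.0, 7.5, 10.0]
print(experiment.data_generators["B_normalised"]([0.0, 1.0, 2.0, 4.0]))
```

Keeping the description separate from the model means the same SBML file can be rerun under different simulation settings without editing it.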
Simulation results: SBRML
• Simulation results are data too, and are
represented by SBRML
• Systems Biology Results Markup Language
• Developed by Joseph Dada, et al. (Manchester)
• Structured format for representing simulation
results
• Dada JO, et al. SBRML: a markup language for associating systems
biology data with models. Bioinformatics 2010, 26, 932-938.
SBRML
Conclusion
• Data standards greatly facilitate computational
systems biology
• Standards exist (and are being continually
developed) for both experimental and modelling
data
• Together, they provide a framework for data sharing and
open-source software tool development
Data Standards for Systems Biology
Neil Swainston
Manchester Centre for Integrative Systems Biology
neil.swainston@manchester.ac.uk

Contenu connexe

En vedette

Continued development of ChEBI towards better usability for the systems biolo...
Continued development of ChEBI towards better usability for the systems biolo...Continued development of ChEBI towards better usability for the systems biolo...
Continued development of ChEBI towards better usability for the systems biolo...
Neil Swainston
 
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
Neil Swainston
 
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
Neil Swainston
 
Data standards for systems biology
Data standards for systems biologyData standards for systems biology
Data standards for systems biology
Neil Swainston
 
Informatics In The Manchester Centre For Integrative Systems Biology
Informatics In The Manchester Centre For Integrative Systems BiologyInformatics In The Manchester Centre For Integrative Systems Biology
Informatics In The Manchester Centre For Integrative Systems Biology
Neil Swainston
 

En vedette (6)

Continued development of ChEBI towards better usability for the systems biolo...
Continued development of ChEBI towards better usability for the systems biolo...Continued development of ChEBI towards better usability for the systems biolo...
Continued development of ChEBI towards better usability for the systems biolo...
 
Network cheminformatics: gap filling and identifying new reactions in metabol...
Network cheminformatics: gap filling and identifying new reactions in metabol...Network cheminformatics: gap filling and identifying new reactions in metabol...
Network cheminformatics: gap filling and identifying new reactions in metabol...
 
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
 
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
The Subliminal Toolbox: automating steps in the reconstruction of metabolic n...
 
Data standards for systems biology
Data standards for systems biologyData standards for systems biology
Data standards for systems biology
 
Informatics In The Manchester Centre For Integrative Systems Biology
Informatics In The Manchester Centre For Integrative Systems BiologyInformatics In The Manchester Centre For Integrative Systems Biology
Informatics In The Manchester Centre For Integrative Systems Biology
 

Similaire à Data standards for systems biology

Integrative information management for systems biology
Integrative information management for systems biologyIntegrative information management for systems biology
Integrative information management for systems biology
Neil Swainston
 
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
EUGM 2014 - Mark Davies (EMBL-EBI): SureChEMBL – Open Patent Data
EUGM 2014 - Mark Davies (EMBL-EBI): SureChEMBL – Open Patent Data  EUGM 2014 - Mark Davies (EMBL-EBI): SureChEMBL – Open Patent Data
EUGM 2014 - Mark Davies (EMBL-EBI): SureChEMBL – Open Patent Data
ChemAxon
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Sonya Liberman
 
The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Amit Sheth
 

Similaire à Data standards for systems biology (20)

Amy Driskell - Information management and data Quality
Amy Driskell - Information management and data QualityAmy Driskell - Information management and data Quality
Amy Driskell - Information management and data Quality
 
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
ChemSpider – disseminating data and enabling an abundance of chemistry platformsChemSpider – disseminating data and enabling an abundance of chemistry platforms
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
 
Integrative information management for systems biology
Integrative information management for systems biologyIntegrative information management for systems biology
Integrative information management for systems biology
 
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
 
ChemValidator – an online service for validating and standardizing chemical s...
ChemValidator – an online service for validating and standardizing chemical s...ChemValidator – an online service for validating and standardizing chemical s...
ChemValidator – an online service for validating and standardizing chemical s...
 
EUGM 2014 - Mark Davies (EMBL-EBI): SureChEMBL – Open Patent Data
EUGM 2014 - Mark Davies (EMBL-EBI): SureChEMBL – Open Patent Data  EUGM 2014 - Mark Davies (EMBL-EBI): SureChEMBL – Open Patent Data
EUGM 2014 - Mark Davies (EMBL-EBI): SureChEMBL – Open Patent Data
 
Data formats and ontologies
Data formats and ontologiesData formats and ontologies
Data formats and ontologies
 
Data retreival system
Data retreival systemData retreival system
Data retreival system
 
Structural Bioinformatics - Homology modeling & its Scope
Structural Bioinformatics - Homology modeling & its ScopeStructural Bioinformatics - Homology modeling & its Scope
Structural Bioinformatics - Homology modeling & its Scope
 
RDA Web service discoverability workshop
RDA Web service discoverability workshopRDA Web service discoverability workshop
RDA Web service discoverability workshop
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 
The Genopolis Microarray database
The Genopolis Microarray databaseThe Genopolis Microarray database
The Genopolis Microarray database
 
The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...
 
Designing a community resource - Sandra Orchard
Designing a community resource - Sandra OrchardDesigning a community resource - Sandra Orchard
Designing a community resource - Sandra Orchard
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...
 
(ATS4-DEV02) Accelrys Query Service: Technology and Tools
(ATS4-DEV02) Accelrys Query Service: Technology and Tools(ATS4-DEV02) Accelrys Query Service: Technology and Tools
(ATS4-DEV02) Accelrys Query Service: Technology and Tools
 
Data integration
Data integrationData integration
Data integration
 
Bots & spiders
Bots & spidersBots & spiders
Bots & spiders
 
2nd Microscopy Congress: Public archiving of bio-imaging data - perspectives,...
2nd Microscopy Congress: Public archiving of bio-imaging data - perspectives,...2nd Microscopy Congress: Public archiving of bio-imaging data - perspectives,...
2nd Microscopy Congress: Public archiving of bio-imaging data - perspectives,...
 
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
 

Plus de Neil Swainston

Data Integration, Mass Spectrometry Proteomics Software Development
Data Integration, Mass Spectrometry Proteomics Software DevelopmentData Integration, Mass Spectrometry Proteomics Software Development
Data Integration, Mass Spectrometry Proteomics Software Development
Neil Swainston
 
Subliminal: exploiting semantic annotations in the reconstruction of metaboli...
Subliminal: exploiting semantic annotations in the reconstruction of metaboli...Subliminal: exploiting semantic annotations in the reconstruction of metaboli...
Subliminal: exploiting semantic annotations in the reconstruction of metaboli...
Neil Swainston
 
ChEBI and genome scale metabolic reconstructions
ChEBI and genome scale metabolic reconstructionsChEBI and genome scale metabolic reconstructions
ChEBI and genome scale metabolic reconstructions
Neil Swainston
 
iQconCAT: quantitative proteomics from instrument to browser
iQconCAT: quantitative proteomics from instrument to browseriQconCAT: quantitative proteomics from instrument to browser
iQconCAT: quantitative proteomics from instrument to browser
Neil Swainston
 
Quantitative Proteomics: From Instrument To Browser
Quantitative Proteomics: From Instrument To BrowserQuantitative Proteomics: From Instrument To Browser
Quantitative Proteomics: From Instrument To Browser
Neil Swainston
 
QconCat: From Instrument To Browser
QconCat: From Instrument To BrowserQconCat: From Instrument To Browser
QconCat: From Instrument To Browser
Neil Swainston
 

Plus de Neil Swainston (8)

Data Integration, Mass Spectrometry Proteomics Software Development
Data Integration, Mass Spectrometry Proteomics Software DevelopmentData Integration, Mass Spectrometry Proteomics Software Development
Data Integration, Mass Spectrometry Proteomics Software Development
 
Subliminal: exploiting semantic annotations in the reconstruction of metaboli...
Subliminal: exploiting semantic annotations in the reconstruction of metaboli...Subliminal: exploiting semantic annotations in the reconstruction of metaboli...
Subliminal: exploiting semantic annotations in the reconstruction of metaboli...
 
ChEBI and genome scale metabolic reconstructions
ChEBI and genome scale metabolic reconstructionsChEBI and genome scale metabolic reconstructions
ChEBI and genome scale metabolic reconstructions
 
SBML Browse
SBML BrowseSBML Browse
SBML Browse
 
iQconCAT: quantitative proteomics from instrument to browser
iQconCAT: quantitative proteomics from instrument to browseriQconCAT: quantitative proteomics from instrument to browser
iQconCAT: quantitative proteomics from instrument to browser
 
Quantitative Proteomics: From Instrument To Browser
Quantitative Proteomics: From Instrument To BrowserQuantitative Proteomics: From Instrument To Browser
Quantitative Proteomics: From Instrument To Browser
 
QconCat: From Instrument To Browser
QconCat: From Instrument To BrowserQconCat: From Instrument To Browser
QconCat: From Instrument To Browser
 
libAnnotationSBML
libAnnotationSBMLlibAnnotationSBML
libAnnotationSBML
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Dernier (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Data standards for systems biology

  • 1. Data Standards for Systems Biology Neil Swainston Manchester Centre for Integrative Systems Biology neil.swainston@manchester.ac.uk
  • 2. Introduction • Experimental standards • Proteomics • Metabolomics • Enzyme kinetics • Modelling standards • Models • Simulations • Results
  • 3. Why do we need standards? • Aids researchers by facilitating management of experimental data • Facilitates open-source software development and interoperability • Allows data to be shared • Increasingly becoming a requirement for journal submissions
  • 4. When are standards developed? • Standards generally are generated organically • Not for pioneers • When an experimental technique becomes established • Need for a standard becomes obvious
  • 5. Who develops standards? • Usually two or more academic groups • Commercial providers often less enthusiastic • Often formed by a Working Group • Proteome Standards Initiative • Metabolomics Standards Initiative • “Minimum information required” specification provided • Followed by data schema, XML standard
  • 6. MCISB project overview Enzyme kinetics Quantitative metabolomics Quantitative proteomics Model Parameters (KM, Kcat) Variables (metabolite, protein concentrations) PRIDE XML MeMo SABIO-RK Web serviceWeb serviceWeb service MeMo-RK Web service
  • 7. Proteomics • We wish to store: • Raw experimental mass spectrometry data • Protein / peptide identifications • Protein / peptide quantitations • Metadata (instrument, search algorithm, user, etc.)
  • 8. Mass spectrometry data • How do we represent the following?
  • 9. Mass spectrometry data • The simple approach:
  • 10. Mass spectrometry data • The simple approach does provide a list of masses and intensities, but… • What instrument was used? • Who ran the instrument? • What sample was used? • …etc. • The simple approach lacks metadata • Many simple approaches (formats) exist
  • 11. Mass spectrometry data • The less simple approach: mzData • Developed by the Proteome Standards Initiative, 2005 • Put together by Working Group of academics and commercial parties • Regular meetings, both real and virtual • Goal: unify the existing “simple” formats into one • Support “tagging” with metadata
  • 12. mzData • http://www.psidev.info/index.php?q=node/80#mzdata • XML format, includes… • Peak lists (mz / intensities) • Experimental protocols • Admin (Who? When?) • Instrument details • etc.
  • 13. Controlled vocabularies • Use of free text is “dangerous” • Non-standard, ambiguous terms • Difficult to match / compare • Controlled vocabularies • Collection of standardised terms • Organised into vocabularies or ontologies • Ontologies contain controlled terms and relationships between them (predicates)
  • 16. Proteomics data • Proteomics data is not solely mass spectrometry data • Sample preparation protocol? • Peptide / protein identifications? • Post-translational modifications • Identification scores? • To support this, an extension is required • Extension based on defined set of “minimum requirements” • MIAPE
  • 17. MIAPE
  • 18. PRIDE • Proteomics identifications database – Both a format and a database – Centralised, standards-compliant, open-source, public data repository for proteomics data – Query, submit and retrieve proteomics data in standardised XML formats – Public version housed at the EBI – http://www.ebi.ac.uk/pride/
  • 19. PRIDE • Peptide / protein identifications
  • 20. PRIDE Converter • User interface • Usable by biologists • Interfaces with Ontology Lookup Service • Developed by EBI • Automatic upload to PRIDE database
  • 22. Future directions • PRIDE does NOT hold: • Protein and peptide quantitations • New approaches being developed • mzML – mass spectrometry format, enhancement of mzData, including support for richer datasets • mzIdentML – storage of protein and peptide identifications • mzQuantML – storage of protein and peptide quantitations
  • 23. Metabolomics • We wish to store: • Raw experimental mass spectrometry (and NMR) data • Metabolite identifications • Metabolite quantitations • Metadata (instrument, search algorithm, user, etc.)
  • 24. Metabolomics • A data standard does NOT currently exist • Core Information for Metabolomics Reporting • Metabolomics Standards Initiative (MSI) • http://msi-workgroups.sourceforge.net/ • MetaboLights being developed at EBI • Not many details as yet • In the meantime… • MCISB has developed its own repository
  • 25. MeMo • Metabolomics Model database • Designed initially for metabolomics data • SQL / XML hybrid approach • Holds: – Experimental meta-data (submitter, lab, date) – Sample meta-data (including biological source) – Instrumentation meta-data – Mass spectra – Metabolite identifications
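One way to read "SQL / XML hybrid" is: keep the searchable metadata in relational columns and store bulky, nested data such as spectra as XML documents. The sketch below illustrates that general idea with Python's built-in `sqlite3`; the table and column names are hypothetical and are not MeMo's actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical schema: relational metadata columns plus one XML column.
conn.execute("""
    CREATE TABLE experiment (
        id INTEGER PRIMARY KEY,
        submitter TEXT, lab TEXT, date TEXT,  -- queryable metadata
        spectrum_xml TEXT                      -- mass spectrum kept as XML
    )""")
spectrum = ("<spectrum><peak mz='180.06' intensity='5200'/>"
            "<peak mz='181.07' intensity='410'/></spectrum>")
conn.execute(
    "INSERT INTO experiment (submitter, lab, date, spectrum_xml) VALUES (?, ?, ?, ?)",
    ("jsmith", "Lab A", "2010-05-01", spectrum),
)

# SQL does the structured search; the XML is parsed only for the matching rows.
rows = conn.execute("SELECT id, spectrum_xml FROM experiment WHERE lab = ?", ("Lab A",)).fetchall()
for exp_id, xml_text in rows:
    peaks = [(float(p.get("mz")), float(p.get("intensity")))
             for p in ET.fromstring(xml_text).iter("peak")]
    print(exp_id, peaks)  # 1 [(180.06, 5200.0), (181.07, 410.0)]
```

The trade-off is that queries into the XML content itself need extra work, while the metadata stays fast to filter.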
  • 26. MeMo
  • 29. Enzyme kinetics • How fast does a given reaction occur? (A → B, catalysed by an enzyme) • Determination of the kinetic constants that define the kinetics of the reaction • Experimental approach: perform kinetic assays
  • 30. Enzyme kinetics • Many approaches: – Absorbance – Fluorescence – Others • Currently concentrating on absorbance assays on a BMG NOVOstar instrument • Requirement: determination of KM and kcat for a given reaction under particular conditions (pH and temperature)
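Absorbance readings are usually turned into concentrations with the Beer-Lambert law, A = ε c l. A minimal sketch follows; the NADH example (ε at 340 nm of about 6220 M⁻¹ cm⁻¹) is a commonly used readout chosen for illustration, not necessarily the assay used here.

```python
def absorbance_to_concentration(absorbance, epsilon, path_length_cm=1.0):
    """Beer-Lambert law A = epsilon * c * l, rearranged to c = A / (epsilon * l).

    epsilon in M^-1 cm^-1 and path length in cm give a concentration in M.
    """
    return absorbance / (epsilon * path_length_cm)

# Example: NADH at 340 nm (epsilon ~ 6220 M^-1 cm^-1), 1 cm path length.
c = absorbance_to_concentration(0.622, epsilon=6220.0, path_length_cm=1.0)
print(f"{c * 1e6:.1f} uM")  # 100.0 uM
```

In a microplate reader the path length is not 1 cm and depends on the well volume, so `path_length_cm` would need to be determined or corrected for.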
  • 31. Enzyme kinetics: Michaelis-Menten • Traditionally, for each assay, the initial rate, v, is determined
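A common way to obtain v is to fit a straight line to the early, approximately linear part of the progress curve. The sketch below uses an ordinary least-squares slope over the first few points; the number of points and the progress-curve data are hypothetical choices for illustration.

```python
def initial_rate(times, concentrations, n_points=5):
    """Estimate the initial rate v as the least-squares slope of the first n_points.

    Only the early part of the progress curve is used, before substrate
    depletion (or product inhibition) makes it bend.
    """
    t = times[:n_points]
    c = concentrations[:n_points]
    n = len(t)
    t_mean = sum(t) / n
    c_mean = sum(c) / n
    sxy = sum((ti - t_mean) * (ci - c_mean) for ti, ci in zip(t, c))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    return sxy / sxx

# Hypothetical progress curve: product concentration (uM) sampled every 10 s.
times = [0, 10, 20, 30, 40, 50, 60, 70]
product = [0.0, 2.0, 4.0, 6.0, 8.0, 9.5, 10.6, 11.4]
v = initial_rate(times, product, n_points=5)
print(f"v = {v:.2f} uM/s")  # v = 0.20 uM/s
```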
  • 32. Enzyme kinetics: Michaelis-Menten • Performing this at various substrate concentrations allows KM and Vmax to be determined:
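The Michaelis-Menten equation is v = Vmax [S] / (KM + [S]), and kcat = Vmax / [E]0, where [E]0 is the total enzyme concentration. As a simple, dependency-free illustration, the sketch below estimates KM and Vmax with the Hanes-Woolf linearisation, [S]/v = [S]/Vmax + KM/Vmax, on synthetic noise-free data; this is one textbook technique, not necessarily the one used at MCISB.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    slope = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
             / sum((a - x_mean) ** 2 for a in x))
    return slope, y_mean - slope * x_mean

def fit_michaelis_menten(substrate, rates):
    """Hanes-Woolf: regress [S]/v on [S]; slope = 1/Vmax, intercept = KM/Vmax."""
    y = [s / v for s, v in zip(substrate, rates)]
    slope, intercept = linear_fit(substrate, y)
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax

# Synthetic, noise-free data generated with KM = 50 uM and Vmax = 2 uM/s.
km_true, vmax_true = 50.0, 2.0
substrate = [10.0, 25.0, 50.0, 100.0, 200.0, 400.0]
rates = [vmax_true * s / (km_true + s) for s in substrate]

km, vmax = fit_michaelis_menten(substrate, rates)
enzyme_total = 0.01         # hypothetical total enzyme concentration [E]0, in uM
kcat = vmax / enzyme_total  # turnover number, s^-1
print(f"KM = {km:.1f} uM, Vmax = {vmax:.2f} uM/s, kcat = {kcat:.0f} s^-1")
# KM = 50.0 uM, Vmax = 2.00 uM/s, kcat = 200 s^-1
```

With real, noisy measurements, linearisations distort the error structure, so nonlinear least-squares fitting of the untransformed equation is usually preferred.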
  • 33. STRENDA guidelines • Standards for Reporting Enzymology Data • http://www.beilstein-institut.de/en/projects/strenda/ • Specifies… • Reactants / products • Enzyme (wild-type or modified, purification, expression host) • Experimental conditions (pH, temperature, buffer) • Instrument, experiment type • Submitter (contact details)
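"Minimum information" guidelines such as STRENDA translate naturally into a completeness check. The sketch below uses a hypothetical record whose fields loosely follow the categories listed above; it is not the official STRENDA checklist, which is more detailed.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class KineticsRecord:
    """Hypothetical record loosely following the STRENDA categories listed above."""
    reactants: Optional[str] = None
    products: Optional[str] = None
    enzyme: Optional[str] = None  # wild-type or modified, purification, expression host
    ph: Optional[float] = None
    temperature_c: Optional[float] = None
    buffer: Optional[str] = None
    instrument: Optional[str] = None
    experiment_type: Optional[str] = None
    submitter_contact: Optional[str] = None

def missing_fields(record):
    """Return the names of checklist fields that have not been filled in."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, "")]

record = KineticsRecord(
    reactants="A", products="B", enzyme="wild-type, His-tag purified",
    ph=7.5, temperature_c=30.0, buffer="50 mM HEPES",
    instrument="plate reader (absorbance)", experiment_type="initial rate assay",
)
print(missing_fields(record))  # ['submitter_contact']
```

A repository can run such a check at submission time and refuse, or flag, incomplete entries.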
  • 34. SABIO-RK • http://sabio.villa-bosch.de/ • Comprehensive collection of enzyme kinetic constants • Adheres to STRENDA recommendations • Harvested from literature • Searchable web interface
  • 38. BRENDA • http://www.brenda-enzymes.org/ • Even more comprehensive • Slightly less well-curated • Again, searchable web interface
  • 40. Other experimental standards • MIBBI: Minimum Information for Biological and Biomedical Investigations • http://mibbi.org/ • Over thirty recommendations for a range of experimental techniques
  • 44. Modelling • What is a model? • “An analytic or computational model proposes specific testable hypotheses about a biological system” • Mathematical / computational representation of a biological system • May allow computational simulations of the system
  • 45. Pathway databases • Building a model often starts with a topological description of a pathway or pathways • What reacts with what? • A number of existing data resources • Biochemical knowledge, curated from literature
  • 46. KEGG
  • 50. Simulation tools • The systems biology community has developed a strong software infrastructure • Many tools exist, including simulators • Several hundred • How do we link pathway databases to these simulators? • A standard: SBML • Systems Biology Markup Language • Recently celebrated its 10th birthday
  • 51. SBML • XML markup language describing models • Contains concepts such as… • compartments • species (metabolites, enzymes, RNA, etc.) • reactions • Similar to pathway databases • KEGG2SBML tool exists for converting KEGG pathway maps to SBML files
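To show how compartments, species and reactions map onto XML, here is a minimal sketch that writes an SBML-like document with Python's `xml.etree.ElementTree`. It uses the SBML Level 3 Version 1 core namespace but deliberately omits many attributes the specification requires (for example on species and species references), so its output is illustrative only and should be validated, e.g. with libSBML, before real use. The model, compartment and species names are hypothetical.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"
ET.register_namespace("", SBML_NS)  # serialise SBML as the default namespace

def q(tag):
    """Namespace-qualified tag name."""
    return f"{{{SBML_NS}}}{tag}"

def build_sbml(model_id, compartment, species, reactions):
    """Build a simplified SBML-like document: compartment, species, reactions."""
    sbml = ET.Element(q("sbml"), level="3", version="1")
    model = ET.SubElement(sbml, q("model"), id=model_id)

    comps = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(comps, q("compartment"), id=compartment, constant="true")

    species_list = ET.SubElement(model, q("listOfSpecies"))
    for sp in species:
        ET.SubElement(species_list, q("species"), id=sp, compartment=compartment)

    rxn_list = ET.SubElement(model, q("listOfReactions"))
    for rxn_id, reactants, products in reactions:
        rxn = ET.SubElement(rxn_list, q("reaction"), id=rxn_id, reversible="false")
        for list_tag, members in (("listOfReactants", reactants), ("listOfProducts", products)):
            refs = ET.SubElement(rxn, q(list_tag))
            for sp in members:
                ET.SubElement(refs, q("speciesReference"), species=sp, stoichiometry="1")
    return ET.tostring(sbml, encoding="unicode")

# Hypothetical two-step pathway A -> B -> C in one compartment.
doc = build_sbml("toy_pathway", "cytosol", ["A", "B", "C"],
                 [("r1", ["A"], ["B"]), ("r2", ["B"], ["C"])])
print(doc)
```

This topology-only content is what a converter from a pathway database would produce; kinetic information is a separate layer on top.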
  • 52. Mathematical SBML • Also contains concepts allowing simulations • Many of these driven by experimental work • Specification of metabolite and enzyme concentrations • Specification of kinetic laws and kinetic parameters • Parameterised model = pathways + experimental data
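"Parameterised model = pathways + experimental data" can be illustrated with a single reaction A → B: the topology comes from the pathway, kcat and KM from kinetic assays, and enzyme and initial metabolite concentrations from proteomics and metabolomics. The sketch below integrates the resulting ODE with a hand-written fixed-step Runge-Kutta scheme; all numerical values are hypothetical, and real simulators use more robust adaptive solvers.

```python
import math

def simulate(a0, b0, vmax, km, t_end, dt):
    """Integrate A -> B with v = Vmax*[A]/(KM + [A]) using fixed-step 4th-order Runge-Kutta.

    Returns a list of (time, [A], [B]) tuples.
    """
    def rate(a):
        return vmax * a / (km + a)

    a, b = a0, b0
    trajectory = [(0.0, a, b)]
    for i in range(round(t_end / dt)):
        # [B] gains exactly what [A] loses, so the rate depends on [A] alone.
        k1 = rate(a)
        k2 = rate(a - 0.5 * dt * k1)
        k3 = rate(a - 0.5 * dt * k2)
        k4 = rate(a - dt * k3)
        v = (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        a, b = a - dt * v, b + dt * v
        trajectory.append(((i + 1) * dt, a, b))
    return trajectory

# Hypothetical parameters (uM, s): kcat and KM from assays, [E]0 and [A]0 measured.
kcat, km, enzyme_total = 200.0, 50.0, 0.01
vmax = kcat * enzyme_total  # 2 uM/s
a0, b0 = 100.0, 0.0

traj = simulate(a0, b0, vmax, km, t_end=60.0, dt=0.1)
t, a, b = traj[-1]
# Sanity check against the integrated Michaelis-Menten equation:
# Vmax*t = KM*ln(A0/A) + (A0 - A); the residual should be close to zero.
residual = vmax * t - (km * math.log(a0 / a) + (a0 - a))
print(f"t = {t:.0f} s, [A] = {a:.2f} uM, [B] = {b:.2f} uM, residual = {residual:.1e}")
```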
  • 53. SBML
  • 54. SBML data resources • Biomodels.net • http://www.ebi.ac.uk/biomodels-main/ • Curated collection of biochemical models at EBI • JWS Online • http://jjj.mib.ac.uk/ • Also curated • BUT also includes an online simulator • You’ll learn more next month…
  • 55. SBML tools • Hundreds of ‘em (205) • http://sbml.org/SBML_Software_Guide • Different goals • Whole cell / single pathway • Deterministic / stochastic simulators • Different platforms / programming languages • A matrix exists, describing the capabilities of each tool • http://sbml.org/SBML_Software_Guide/SBML_Software_Matrix
  • 56. Making SBML models: CellDesigner
  • 57. Other model representations • CellML • http://www.cellml.org/ • Larger scale modelling • Inter-cellular, used in whole organ modelling • BioPAX • http://www.biopax.org/ • Similar goals to SBML • Overlap between “competing” representations is being reduced • Regular “COMBINE” meetings
  • 58. MIRIAM • Minimum Information Required in the Annotation of Models • http://www.ebi.ac.uk/miriam/ • Set of guidelines describing how to make models reusable • Specify model creator contact details • Ensure consistent annotation of terms with database resources • e.g. use UniProt identifiers for unambiguous identification of enzymes
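Consistent annotation is easy to check mechanically. The sketch below validates a UniProt accession against the accession pattern published by UniProt and turns it into a MIRIAM-style URN of the form `urn:miriam:uniprot:<accession>`. The URN style reflects MIRIAM usage of this period; later resources moved towards identifiers.org URLs, so treat the output form as an assumption to adapt.

```python
import re

# UniProt accession pattern (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

def miriam_uniprot_urn(accession):
    """Return a MIRIAM-style URN for a UniProt accession, rejecting malformed IDs."""
    if not UNIPROT_ACCESSION.match(accession):
        raise ValueError(f"not a valid UniProt accession: {accession!r}")
    return f"urn:miriam:uniprot:{accession}"

print(miriam_uniprot_urn("P12345"))  # urn:miriam:uniprot:P12345
```

A free-text label such as "hexokinase" would be rejected, which is the point: the annotation has to name one specific database entry.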
  • 59. SBML visualisation: SBGN • Until recently, no standardised way of viewing models • Systems Biology Graphical Notation • Attempts to provide a standard “wiring diagram” notation for biological representations
  • 61. Model simulation • Many simulators exist • How do we tell a simulator what to simulate? • Simulation Experiment Description Markup Language (SED-ML) • Contains concepts… • Model (what to run the simulation on) • Simulation (define what to simulate, duration, step size) • Data generation (post-processing normalisation) • Output (2D plot, 3D plot)
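The four SED-ML concepts can be pictured as a small data structure. The sketch below is a hypothetical in-memory analogue, not the SED-ML XML schema: a model reference, a uniform time course (here the step size is taken as the time span divided by the number of points), named data generators for post-processing, and a list of outputs. The file name and raw results are invented.

```python
from dataclasses import dataclass, field

@dataclass
class UniformTimeCourse:
    """'Simulation': what to simulate, for how long, and at what resolution."""
    initial_time: float
    output_end_time: float
    number_of_points: int

    @property
    def step_size(self):
        return (self.output_end_time - self.initial_time) / self.number_of_points

@dataclass
class SimulationExperiment:
    model_source: str  # 'Model': which model file to run (hypothetical path)
    simulation: UniformTimeCourse
    data_generators: dict = field(default_factory=dict)  # name -> post-processing function
    outputs: list = field(default_factory=list)           # e.g. ("plot2D", x_name, y_name)

    def generate(self, raw):
        """Apply each data generator to raw simulator output (a dict of lists)."""
        return {name: fn(raw) for name, fn in self.data_generators.items()}

experiment = SimulationExperiment(
    model_source="toy_pathway.xml",
    simulation=UniformTimeCourse(0.0, 60.0, 600),
    data_generators={
        "time": lambda raw: raw["time"],
        "A_normalised": lambda raw: [a / raw["A"][0] for a in raw["A"]],
    },
    outputs=[("plot2D", "time", "A_normalised")],
)
raw = {"time": [0.0, 30.0, 60.0], "A": [100.0, 60.0, 34.0]}  # hypothetical simulator output
print(experiment.simulation.step_size)           # 0.1
print(experiment.generate(raw)["A_normalised"])  # [1.0, 0.6, 0.34]
```

Keeping the experiment description separate from the model is what lets the same description be handed to a different simulator.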
  • 62. Simulation results: SBRML • Simulation results are data too, and are represented by SBRML • Systems Biology Results Markup Language • Developed by Joseph Dada, et al. (Manchester) • Structured format for representing simulation results • Dada JO, et al. SBRML: a markup language for associating systems biology data with models. Bioinformatics 2010, 26, 932-938.
  • 63. SBRML
  • 64. Conclusion • Data standards greatly facilitate computational systems biology • Standards exist (and are being continually developed) for both experimental and modelling data • Together, they provide a framework for data sharing and open-source software tool development