Data Standards for Systems Biology
Neil Swainston
Manchester Centre for Integrative Systems Biology
neil.swainston@manchester.ac.uk
Introduction
• Experimental standards
• Proteomics
• Metabolomics
• Enzyme kinetics
• Modelling standards
• Models
• Simulations
• Results
Why do we need standards?
• Aids researchers by facilitating management of
experimental data
• Facilitates open-source software development
and interoperability
• Allows data to be shared
• Increasingly becoming a requirement for journal
submissions
When are standards developed?
• Standards are generally developed organically
• Not for pioneers
• When an experimental technique becomes
established
• Need for a standard becomes obvious
Who develops standards?
• Usually two or more academic groups
• Commercial providers often less enthusiastic
• Often formed by a Working Group
• Proteome Standards Initiative
• Metabolomics Standards Initiative
• “Minimum information required” specification
provided
• Followed by data schema, XML standard
MCISB project overview
• Experiments: enzyme kinetics, quantitative metabolomics, quantitative proteomics
• Model parameters (KM, kcat)
• Model variables (metabolite, protein concentrations)
• Data resources, each with a web service: PRIDE XML, MeMo, SABIO-RK, MeMo-RK
Proteomics
• We wish to store:
• Raw experimental mass spectrometry data
• Protein / peptide identifications
• Protein / peptide quantitations
• Metadata (instrument, search algorithm, user, etc.)
Mass spectrometry data
• How do we represent the following?
Mass spectrometry data
• The simple approach:
Mass spectrometry data
• The simple approach does provide a list of
masses and intensities, but…
• What instrument was used?
• Who ran the instrument?
• What sample was used?
• …etc.
• The simple approach lacks metadata
• Many simple approaches (formats) exist
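A minimal sketch of what reading a "simple" peak list looks like. The two-column layout and the example values are assumptions made for illustration, not any particular vendor format. Note that the parsed result contains only masses and intensities, with nowhere to record instrument, operator or sample.

```python
from typing import List, Tuple


def parse_peak_list(text: str) -> List[Tuple[float, float]]:
    """Parse whitespace-separated 'm/z intensity' pairs, one pair per line."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        mz, intensity = line.split()[:2]
        peaks.append((float(mz), float(intensity)))
    return peaks


# Hypothetical values, for illustration only.
EXAMPLE = """\
# m/z  intensity
445.12  1200.0
446.12   310.5
"""

peaks = parse_peak_list(EXAMPLE)
print(peaks)
```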
Mass spectrometry data
• The less simple approach: mzData
• Developed by the Proteome Standards Initiative,
2005
• Put together by Working Group of academics and
commercial parties
• Regular meetings, both real and virtual
• Goal: unify the existing “simple” formats into
one
• Support “tagging” with metadata
mzData
• http://www.psidev.info/index.php?q=node/80#mzdata
• XML format, includes…
• Peak lists (mz / intensities)
• Experimental protocols
• Admin (Who? When?)
• Instrument details
• etc.
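To illustrate the idea of tagging a peak list with metadata in XML, here is a rough sketch using Python's standard library. The element and attribute names are hypothetical and do NOT follow the real mzData schema; the point is only that peaks, admin details and instrument details live in one document.

```python
import xml.etree.ElementTree as ET


def build_spectrum_document(peaks, instrument, operator, sample):
    """Wrap a peak list and its metadata in a single XML tree.

    Element names are invented for illustration, not taken from mzData.
    """
    root = ET.Element("experiment")
    admin = ET.SubElement(root, "admin")
    ET.SubElement(admin, "operator").text = operator
    ET.SubElement(admin, "sample").text = sample
    ET.SubElement(root, "instrument").text = instrument
    peak_list = ET.SubElement(root, "peakList")
    for mz, intensity in peaks:
        ET.SubElement(peak_list, "peak", mz=str(mz), intensity=str(intensity))
    return root


# Placeholder metadata values, for illustration only.
doc = build_spectrum_document(
    [(445.12, 1200.0)], "example-instrument", "example-operator", "example-sample"
)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```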
Controlled vocabularies
• Use of free text is “dangerous”
• Non-standard, ambiguous terms
• Difficult to match / compare
• Controlled vocabularies
• Collection of standardised terms
• Organised into vocabularies or ontologies
• Ontologies contain controlled terms and relationships
between them (predicates)
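A small sketch of why controlled terms are easier to match than free text. The vocabulary, its identifiers (EX:0001, EX:0002) and its synonyms are invented for this example and are not taken from any real ontology.

```python
# Hypothetical vocabulary: identifiers and synonyms are invented for illustration.
VOCABULARY = {
    "EX:0001": {
        "name": "electrospray ionisation",
        "synonyms": {"esi", "electrospray", "electrospray ionization"},
    },
    "EX:0002": {
        "name": "time-of-flight analyser",
        "synonyms": {"tof", "time of flight"},
    },
}


def normalise(text):
    """Lower-case, treat hyphens as spaces, and collapse repeated whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def lookup(free_text):
    """Return the controlled term identifier for a free-text label, or None."""
    key = normalise(free_text)
    for term_id, term in VOCABULARY.items():
        names = {normalise(term["name"])} | {normalise(s) for s in term["synonyms"]}
        if key in names:
            return term_id
    return None


print(lookup("ESI"), lookup("Time-of-Flight"), lookup("something else"))
```

Once every record carries an identifier rather than a free-text label, comparing two datasets reduces to comparing identifiers.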
Controlled vocabularies
• Ontology Lookup Service, EBI
mzData
Proteomics data
• Proteomics data is not solely mass
spectrometry data
• Sample preparation protocol?
• Peptide / protein identifications?
• Post-translational modifications
• Identification scores?
• To support this, an extension is required
• Extension based on defined set of “minimum
requirements”
• MIAPE (Minimum Information About a Proteomics Experiment)
MIAPE
PRIDE
• Proteomics identifications database
– Both a format and a database
– Centralised, standards compliant, open source, public
data repository for proteomics data
– Query, submit and retrieve proteomics data in
standardized XML formats
– Public version housed at the EBI
– http://www.ebi.ac.uk/pride/
PRIDE
• Peptide / protein identifications
PRIDE Converter
• User interface
• Usable by biologists
• Interfaces with
Ontology Lookup
Service
• Developed by EBI
• Automatic upload
to PRIDE database
PRIDE database
Future directions
• PRIDE does NOT hold:
• Protein and peptide quantitations
• New approaches being developed
• mzML – mass spectrometry format, enhancement of
mzData, including support for richer datasets
• mzIdentML – storage of protein and peptide
identifications
• mzQuantML – storage of protein and peptide
quantitations
Metabolomics
• We wish to store:
• Raw experimental mass spectrometry (and NMR)
data
• Metabolite identifications
• Metabolite quantitations
• Metadata (instrument, search algorithm, user, etc.)
Metabolomics
• Data standard does NOT currently exist
• Core Information for Metabolomics Reporting
• Metabolomics Standards Initiative (MSI)
• http://msi-workgroups.sourceforge.net/
• MetaboLights being developed at EBI
• Not many details as yet
• In the meantime…
• MCISB has developed its own repository
MeMo
• Metabolomics Model database
• Designed initially for metabolomics data
• SQL / XML hybrid approach
• Holds:
– Experimental meta-data (submitter, lab, date)
– Sample meta-data (including biological source)
– Instrumentation meta-data
– Mass spectra
– Metabolite identifications
MeMo
MeMo web interface
Enzyme kinetics
• How fast does a given reaction occur?
A → B (catalysed by an enzyme)
• Determination of kinetic constants which define
the kinetics of the reaction
• Experimental approach: perform kinetic assays
Enzyme kinetics
• Many approaches:
– Absorbance
– Fluorescence
– others
• Currently concentrating on absorbance assays
on BMG NOVOstar instrument
• Requirement: determination of KM and kcat for a
given reaction under particular conditions (pH
and temperature)
Enzyme kinetics: Michaelis-Menten
• Traditionally, for each assay, the initial rate, v, is
determined
Enzyme kinetics: Michaelis-Menten
• Performing this at various substrate
concentrations allows KM and Vmax to be
determined:
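The Michaelis-Menten rate law is v = Vmax [S] / (KM + [S]), and kcat follows as Vmax divided by the total enzyme concentration. As a rough sketch of how KM and Vmax can be estimated from initial rates, the code below uses the Hanes-Woolf linearisation, [S]/v = [S]/Vmax + KM/Vmax, with an ordinary least-squares line. The data are synthetic and noise-free, and the function names are assumptions for this example; with real, noisy data a non-linear fit is usually preferred.

```python
def michaelis_menten(s, km, vmax):
    """Initial rate v at substrate concentration s."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rates):
    """Estimate (KM, Vmax) from a least-squares line through ([S], [S]/v)."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope = 1 / Vmax
    intercept = mean_y - slope * mean_x    # intercept = KM / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


# Synthetic, noise-free data generated from assumed "true" constants.
true_km, true_vmax = 2.0, 10.0
concs = [0.5, 1.0, 2.0, 5.0, 10.0]
rates = [michaelis_menten(s, true_km, true_vmax) for s in concs]

km, vmax = fit_hanes_woolf(concs, rates)
print(f"KM = {km:.3f}, Vmax = {vmax:.3f}")
```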
STRENDA guidelines
• Standards for Reporting Enzymology Data
• http://www.beilstein-institut.de/en/projects/strenda/
• Specifies…
• Reactants / products
• Enzyme (wild-type, modified, purification, expressed in …)
• Experimental conditions (pH, temperature, buffer)
• Instrument, experiment type
• Submitter (contact details)
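One way to picture a reporting guideline of this kind is as a record with required fields plus a completeness check. The sketch below is an assumption-laden illustration: the class, field names and the check are hypothetical and are not part of STRENDA itself.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class KineticAssayRecord:
    """Hypothetical record loosely mirroring the categories listed above."""
    reactants: List[str]
    products: List[str]
    enzyme: Optional[str] = None
    ph: Optional[float] = None
    temperature_celsius: Optional[float] = None
    buffer: Optional[str] = None
    instrument: Optional[str] = None
    experiment_type: Optional[str] = None
    submitter: Optional[str] = None


def missing_fields(record):
    """Return the names of fields that are empty or were never filled in."""
    return [
        f.name
        for f in fields(record)
        if getattr(record, f.name) in (None, [], "")
    ]


record = KineticAssayRecord(reactants=["A"], products=["B"], ph=7.0)
print(missing_fields(record))
```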
SABIO-RK
• http://sabio.villa-bosch.de/
• Comprehensive collection of enzyme kinetic
constants
• Adheres to STRENDA recommendation
• Harvested from literature
• Searchable web interface
SABIO-RK
BRENDA
• http://www.brenda-enzymes.org/
• Even more comprehensive
• Slightly less well-curated
• Again, searchable web interface
BRENDA
Other experimental standards
• MIBBI: Minimum Information for Biological and
Biomedical Investigations
• http://mibbi.org/
• Over thirty recommendations for a range of
experimental techniques
Modelling standards
Modelling
• What is a model?
• “An analytic or computational model proposes
specific testable hypotheses about a biological
system”
• Mathematical / computational representation of
a biological system
• May allow computational simulations of the
system
Pathway databases
• Building a model often starts with a topological
description of a pathway or pathways
• What reacts with what?
• A number of existing data resources
• Biochemical knowledge, curated from literature
KEGG
KEGG
Metabolite
Enzyme
Reaction
MetaCyc
Reactome
Simulation tools
• The systems biology community has developed
a strong software infrastructure
• Many tools exist, including simulators
• Several hundred
• How do we link pathway databases to these
simulators?
• A standard: SBML
• Systems Biology Markup Language
• Recently celebrated its 10th
birthday
SBML
• XML markup language describing models
• Contains concepts such as…
• compartments
• species (metabolites, enzymes, RNA, etc.)
• reactions
• Similar to pathway databases
• KEGG2SBML tool exists for converting KEGG pathway
maps to SBML files
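As a rough illustration of these concepts, the sketch below builds a tiny SBML-style document (one compartment, two species, one reaction A → B) with Python's standard library. It has not been checked against an SBML validator, the identifiers are made up, and real work would normally use a dedicated SBML library rather than raw XML.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 2 Version 4.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def build_model():
    """Build a minimal, unvalidated SBML-style tree for the reaction A -> B."""
    sbml = ET.Element("sbml", xmlns=SBML_NS, level="2", version="4")
    model = ET.SubElement(sbml, "model", id="example_model")

    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", id="cell", size="1")

    species = ET.SubElement(model, "listOfSpecies")
    for sid in ("A", "B"):
        ET.SubElement(species, "species", id=sid, compartment="cell",
                      initialConcentration="0")

    reactions = ET.SubElement(model, "listOfReactions")
    reaction = ET.SubElement(reactions, "reaction", id="A_to_B", reversible="false")
    reactants = ET.SubElement(reaction, "listOfReactants")
    ET.SubElement(reactants, "speciesReference", species="A")
    products = ET.SubElement(reaction, "listOfProducts")
    ET.SubElement(products, "speciesReference", species="B")
    return sbml


sbml_text = ET.tostring(build_model(), encoding="unicode")
print(sbml_text)
```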
Mathematical SBML
• Also contains concepts allowing simulations
• Many of these driven by experimental work
• Specification of metabolite and enzyme
concentrations
• Specification of kinetic laws and kinetic
parameters
• Parameterised model = pathways + experimental data
SBML
SBML data resources
• Biomodels.net
• http://www.ebi.ac.uk/biomodels-main/
• Curated collection of biochemical models at EBI
• JWS Online
• http://jjj.mib.ac.uk/
• Also curated
• BUT also includes an online simulator
• You’ll learn more next month…
SBML tools
• Hundreds of ‘em (205)
• http://sbml.org/SBML_Software_Guide
• Different goals
• Whole cell / single pathway
• Deterministic / stochastic simulators
• Different platforms / programming languages
• Matrix exists, describing capabilities of each
tool
• http://sbml.org/SBML_Software_Guide/
SBML_Software_Matrix
Making SBML models: CellDesigner
Other model representations
• CellML
• http://www.cellml.org/
• Larger scale modelling
• Inter-cellular, used in whole organ modelling
• BioPAX
• http://www.biopax.org/
• Similar goals to SBML
• Overlap between “competing” representations
is being reduced
• Regular “COMBINE” meetings
MIRIAM
• Minimum Information Required in the
Annotation of Models
• http://www.ebi.ac.uk/miriam/
• Set of guidelines describing how to make
models reusable
• Specify model creator contact details
• Ensure consistent annotation of terms with database
resources
• e.g. use UniProt identifiers for unambiguous
identification of enzymes
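A short sketch of what consistent annotation can look like in practice: each model element is paired with a MIRIAM-style URN of the form urn:miriam:collection:identifier. The helper name, the model element identifier and the choice of accession (P12345, an arbitrary example) are assumptions for illustration; percent-encoding of special characters is included on the assumption that it matches MIRIAM URN conventions.

```python
from urllib.parse import quote


def miriam_urn(collection, identifier):
    """Build a MIRIAM-style URN, percent-encoding the identifier part."""
    return f"urn:miriam:{collection}:{quote(identifier, safe='')}"


# Hypothetical model element annotated with an arbitrary example accession.
annotations = {
    "enzyme_1": miriam_urn("uniprot", "P12345"),
}

for element_id, urn in annotations.items():
    print(element_id, "->", urn)
```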
SBML visualisation: SBGN
• Until recently, no standardised way of viewing
models
• Systems Biology Graphical Notation
• Attempts to generate standard “wiring-diagram” for
biological representations
Model simulation
Model simulation
• Many simulators exist
• How do we tell a simulator what to simulate?
• Simulation Experiment Description Markup Language
(SED-ML)
• Contains concepts…
• Model (what to run the simulation on)
• Simulation (define what to simulate, duration, step-
size)
• Data generation (post-processing normalisation)
• Output (2D plot, 3D plot)
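The four concepts above can be pictured as a small data structure. This is only a sketch: the class and field names are hypothetical and do not follow the SED-ML schema, and the file name is a placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UniformTimeCourse:
    """Simulation settings: time span and number of equally sized steps."""
    start: float
    end: float
    number_of_points: int

    def time_points(self):
        """Return number_of_points + 1 equally spaced times from start to end."""
        step = (self.end - self.start) / self.number_of_points
        return [self.start + i * step for i in range(self.number_of_points + 1)]


@dataclass
class SimulationExperiment:
    """Hypothetical container tying together model, simulation and output."""
    model_source: str              # which model to run the simulation on
    simulation: UniformTimeCourse  # what to simulate, duration, step size
    observed_species: List[str]    # data to generate from the raw results
    output: str                    # e.g. "2D plot"


experiment = SimulationExperiment(
    model_source="example_model.xml",  # placeholder file name
    simulation=UniformTimeCourse(start=0.0, end=10.0, number_of_points=5),
    observed_species=["A", "B"],
    output="2D plot",
)
print(experiment.simulation.time_points())
```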
Simulation results: SBRML
• Simulation results are data too, and are
represented by SBRML
• Systems Biology Results Markup Language
• Developed by Joseph Dada, et al. (Manchester)
• Structured format for representing simulation
results
• Dada JO, et al. SBRML: a markup language for associating systems
biology data with models. Bioinformatics 2010, 26, 932-938.
SBRML
Conclusion
• Data standards greatly facilitate computational
systems biology
• Standards exist (and are being continually
developed) for both experimental and modelling
data
• Provides a framework for data sharing and
open-source software tool development

  • 1. Data Standards for Systems Biology Neil Swainston Manchester Centre for Integrative Systems Biology neil.swainston@manchester.ac.uk
  • 2. Introduction • Experimental standards • Proteomics • Metabolomics • Enzyme kinetics • Modelling standards • Models • Simulations • Results
  • 3. Why do we need standards? • Aids researchers by facilitating management of experimental data • Facilitates open-source software development and interoperability • Allows data to be shared • Increasingly becoming a requirement for journal submissions
  • 4. When are standards developed? • Standards generally are generated organically • Not for pioneers • When an experimental technique becomes established • Need for a standard becomes obvious
  • 5. Who develops standards? • Usually two or more academic groups • Commercial providers often less enthusiastic • Often formed by a Working Group • Proteome Standards Initiative • Metabolomics Standards Initiative • “Minimum information required” specification provided • Followed by data schema, XML standard
  • 6. MCISB project overview Enzyme kinetics Quantitative metabolomics Quantitative proteomics Model Parameters (KM, Kcat) Variables (metabolite, protein concentrations) PRIDE XML MeMo SABIO-RK Web serviceWeb serviceWeb service MeMo-RK Web service
  • 7. Proteomics • We wish to store: • Raw experimental mass spectrometry data • Protein / peptide identifications • Protein / peptide quantitations • Metadata (instrument, search algorithm, user, etc.)
  • 8. Mass spectrometry data • How do we represent the following?
  • 9. Mass spectrometry data • The simple approach:
  • 10. Mass spectrometry data • The simple approach does provide a list of masses and intensities, but… • What instrument was used? • Who ran the instrument? • What sample was used? • …etc. • The simple approach lacks metadata • Many simple approaches (formats) exist
  • 11. Mass spectrometry data • The less simple approach: mzData • Developed by the Proteome Standards Initiative, 2005 • Put together by Working Group of academics and commercial parties • Regular meetings, both real and virtual • Goal: unify the existing “simple” formats into one • Support “tagging” with metadata
  • 12. mzData • http://www.psidev.info/index.php?q=node/80#mzdata • XML format, includes… • Peak lists (mz / intensities) • Experimental protocols • Admin (Who? When?) • Instrument details • etc.
  • 13. Controlled vocabularies • Use of free text is “dangerous” • Non-standard, ambiguous terms • Difficult to match / compare • Controlled vocabularies • Collection of standardised terms • Organised into vocabularies or ontologies • Ontologies contain controlled terms and relationships between them (predicates)
  • 16. Proteomics data • Proteomics data is not solely mass spectrometry data • Sample preparation protocol? • Peptide / protein identifications? • Post-translational modifications • Identification scores? • To support this, an extension is required • Extension based on defined set of “minimum requirements” • MIAPE
  • 17. MIAPE
  • 18. PRIDE • Proteomics identifications database – Both a format and a database – Centralised, standards compliant, open source, public data repository for proteomics data – Query, submit and retrieve proteomics data in standardized XML formats – Public version housed at the EBI – http://www.ebi.ac.uk/pride/
  • 19. PRIDE • Peptide / protein identifications
  • 20. PRIDE Converter • User interface • Usable by biologists • Interfaces with Ontology Lookup Service • Developed by EBI • Automatic upload to PRIDE database
  • 22. Future directions • PRIDE does NOT hold: • Protein and peptide quantitations • New approaches being developed • mzML – mass spectrometry format, enhancement of mzData, including support for richer datasets • mzIdentML – storage of protein and peptide identifications • mzQuantML – storage of protein and peptide quantitations
  • 23. Metabolomics • We wish to store: • Raw experimental mass spectrometry (and NMR) data • Metabolite identifications • Metabolite quantitations • Metadata (instrument, search algorithm, user, etc.)
  • 24. Metabolomics • Data standard does NOT currently exist • Core Information for Metabolomics Reporting • Metabolomics Standards Initiative (MSI) • http://msi-workgroups.sourceforge.net/ • MetaboLights being developed at EBI • Not many details as yet • In the meantime… • MCISB has developed its own repository
  • 25. MeMo • Metabolomics Model database • Designed initially for metabolomics data • SQL / XML hybrid approach • Holds: – Experimental meta-data (submitter, lab, date) – Sample meta-data (including biological source) – Instrumentation meta-data – Mass spectra – Metabolite identifications
  • 26. MeMo
  • 29. Enzyme kinetics • How fast does a given reaction occur? (Diagram: A → B, catalysed by Enzyme) • Determination of kinetic constants which define the kinetics of the reaction • Experimental approach: perform kinetic assays
  • 30. Enzyme kinetics • Many approaches: – Absorbance – Fluorescence – others • Currently concentrating on absorbance assays on BMG NOVOstar instrument • Requirement: determination of KM and kcat for a given reaction under particular conditions (pH and temperature)
  • 31. Enzyme kinetics: Michaelis-Menten • Traditionally, for each assay, the initial rate, v, is determined
  • 32. Enzyme kinetics: Michaelis-Menten • Performing this at various substrate concentrations allows KM and Vmax to be determined:
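The Michaelis-Menten relationship is v = Vmax·[S] / (KM + [S]), and kcat follows from Vmax divided by the total enzyme concentration. As a rough illustration of how KM and Vmax can be estimated from initial rates measured at several substrate concentrations, the Python sketch below uses a double-reciprocal (Lineweaver-Burk) linear fit, 1/v = (KM/Vmax)·(1/[S]) + 1/Vmax. This is one textbook approach chosen for brevity and is not necessarily the method used in the MCISB pipeline; the data are synthetic and noise-free, and the parameter values are arbitrary.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate v at substrate concentration s."""
    return vmax * s / (km + s)


def fit_lineweaver_burk(substrate, rates):
    """Estimate (Vmax, KM) by least-squares on the double-reciprocal plot."""
    xs = [1.0 / s for s in substrate]  # 1/[S]
    ys = [1.0 / v for v in rates]      # 1/v
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # KM / Vmax
    intercept = mean_y - slope * mean_x  # 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


# Synthetic assay: arbitrary "true" values Vmax = 10, KM = 2.
substrate = [0.5, 1.0, 2.0, 5.0, 10.0]
rates = [michaelis_menten(s, vmax=10.0, km=2.0) for s in substrate]
vmax_est, km_est = fit_lineweaver_burk(substrate, rates)
```

With real, noisy measurements the reciprocal transform distorts the error structure, so a direct nonlinear fit of the untransformed equation is generally preferred.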
  • 33. STRENDA guidelines • Standards for Reporting Enzymology Data • http://www.beilstein-institut.de/en/projects/strenda/ • Specifies… • Reactants / products • Enzyme (wild-type, modified, purification, expressed in • Experimental conditions (pH, temperature, buffer) • Instrument, experiment type • Submitter (contact details)
  • 34. SABIO-RK • http://sabio.villa-bosch.de/ • Comprehensive collection of enzyme kinetic constants • Adheres to STRENDA recommendation • Harvested from literature • Searchable web interface
  • 38. BRENDA • http://www.brenda-enzymes.org/ • Even more comprehensive • Slightly less well-curated • Again, searchable web interface
  • 40. Other experimental standards • MIBBI: Minimum Information for Biological and Biomedical Investigations • http://mibbi.org/ • Over thirty recommendations for a range of experimental techniques
  • 42. MCISB project overview Enzyme kinetics Quantitative metabolomics Quantitative proteomics Model Parameters (KM, Kcat) Variables (metabolite, protein concentrations) PRIDE XML MeMo SABIO-RK Web service Web service Web service MeMo-RK Web service
  • 43. MCISB project overview Enzyme kinetics Quantitative metabolomics Quantitative proteomics Model Parameters (KM, Kcat) Variables (metabolite, protein concentrations) PRIDE XML MeMo SABIO-RK Web service Web service Web service MeMo-RK Web service
  • 44. Modelling • What is a model? • “An analytic or computational model proposes specific testable hypotheses about a biological system” • Mathematical / computational representation of a biological system • May allow computational simulations of the system
  • 45. Pathway databases • Building a model often starts with a topological description of a pathway or pathways • What reacts with what? • A number of existing data resources • Biochemical knowledge, curated from literature
  • 46. KEGG
  • 50. Simulation tools • The systems biology community has developed a strong software infrastructure • Many tools exist, including simulators • Several hundred • How do we link pathway databases to these simulators? • A standard: SBML • Systems Biology Markup Language • Recently celebrated its 10th birthday
  • 51. SBML • XML markup language describing models • Contains concepts such as… • compartments • species (metabolites, enzymes, RNA, etc.) • reactions • Similar to pathway databases • KEGG2SBML tool exists for converting KEGG pathway maps to SBML files
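To make the compartment / species / reaction structure concrete, the Python sketch below assembles a skeleton document using the standard library. It follows SBML Level 2 element names, but it is a simplified illustration that has not been validated against the SBML schema; the model, its identifiers (example_model, cytosol, A, B, Enzyme, A_to_B) and the choice of Level 2 Version 4 are assumptions made for the example. Real work would normally go through a dedicated library such as libSBML.

```python
import xml.etree.ElementTree as ET

# Namespace for SBML Level 2 Version 4 (level and version chosen for illustration).
SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def build_model():
    """Build a skeleton model: one compartment, three species, one reaction."""
    sbml = ET.Element("sbml", xmlns=SBML_NS, level="2", version="4")
    model = ET.SubElement(sbml, "model", id="example_model")

    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", id="cytosol", size="1")

    species = ET.SubElement(model, "listOfSpecies")
    for sid in ("A", "B", "Enzyme"):
        ET.SubElement(species, "species", id=sid, compartment="cytosol")

    # Reaction A -> B, with the enzyme listed as a modifier.
    reactions = ET.SubElement(model, "listOfReactions")
    reaction = ET.SubElement(reactions, "reaction", id="A_to_B")
    reactants = ET.SubElement(reaction, "listOfReactants")
    ET.SubElement(reactants, "speciesReference", species="A")
    products = ET.SubElement(reaction, "listOfProducts")
    ET.SubElement(products, "speciesReference", species="B")
    modifiers = ET.SubElement(reaction, "listOfModifiers")
    ET.SubElement(modifiers, "modifierSpeciesReference", species="Enzyme")
    return sbml


document = build_model()
sbml_text = ET.tostring(document, encoding="unicode")
```

At this stage the document is purely topological, much like a pathway database entry: it says what reacts with what, but nothing about how fast.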
  • 52. Mathematical SBML • Also contains concepts allowing simulations • Many of these driven by experimental work • Specification of metabolite and enzyme concentrations • Specification of kinetic laws and kinetic parameters • Parameterised model = pathways + experimental data
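The idea that a parameterised model combines a pathway with experimental data can be sketched in Python as a single reaction A → B given a Michaelis-Menten kinetic law, measured-style parameters (kcat, KM), and initial concentrations. The numbers are arbitrary placeholders, and the fixed-step forward Euler scheme is used only because it is short; it is not what any particular SBML simulator does.

```python
def simulate(a0, enzyme, kcat, km, duration, step):
    """Integrate A -> B with rate = kcat * [E] * [A] / (KM + [A]) by forward Euler.

    Returns a list of (time, [A], [B]) tuples.
    """
    a, b = a0, 0.0
    trajectory = [(0.0, a, b)]
    n_steps = int(round(duration / step))
    for i in range(n_steps):
        rate = kcat * enzyme * a / (km + a)
        delta = min(rate * step, a)  # never consume more A than is available
        a -= delta
        b += delta
        trajectory.append(((i + 1) * step, a, b))
    return trajectory


# Topology (A -> B) plus "experimental" values: concentrations and kinetic constants.
trajectory = simulate(a0=5.0, enzyme=0.01, kcat=100.0, km=2.0, duration=10.0, step=0.01)
final_time, final_a, final_b = trajectory[-1]
```

Changing only the parameter values, and not the reaction structure, changes the predicted time course, which is why the experimental standards and the modelling standards need to connect.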
  • 53. SBML
  • 54. SBML data resources • Biomodels.net • http://www.ebi.ac.uk/biomodels-main/ • Curated collection of biochemical models at EBI • JWS Online • http://jjj.mib.ac.uk/ • Also curated • BUT also includes an online simulator • You’ll learn more next month…
  • 55. SBML tools • Hundreds of ‘em (205) • http://sbml.org/SBML_Software_Guide • Different goals • Whole cell / single pathway • Deterministic / stochastic simulators • Different platforms / programming languages • Matrix exists, describing capabilities of each tool • http://sbml.org/SBML_Software_Guide/SBML_Software_Matrix
  • 56. Making SBML models: CellDesigner
  • 57. Other model representations • CellML • http://www.cellml.org/ • Larger scale modelling • Inter-cellular, used in whole organ modelling • BioPAX • http://www.biopax.org/ • Similar goals to SBML • Overlap between “competing” representations is being reduced • Regular “COMBINE” meetings
  • 58. MIRIAM • Minimum Information Required in the Annotation of Models • http://www.ebi.ac.uk/miriam/ • Set of guidelines describing how to make models reusable • Specify model creator contact details • Ensure consistent annotation of terms with database resources • e.g. use UniProt identifiers for unambiguous identification of enzymes
  • 59. SBML visualisation: SBGN • Until recently, no standardised way of viewing models • Systems Biology Graphical Notation • Attempts to generate standard “wiring-diagram” for biological representations
  • 61. Model simulation • Many simulators exist • How do we tell a simulator what to simulate? • Simulation Experiment Description Markup Language (SED-ML) • Contains concepts… • Model (what to run the simulation on) • Simulation (define what to simulate, duration, step size) • Data generation (post-processing normalisation) • Output (2D plot, 3D plot)
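The four concepts listed above can be mirrored in a small Python data structure. This is only a conceptual sketch: the class names, field names, file name and output dictionary are hypothetical and do not reproduce SED-ML's actual XML syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Simulation:
    """What to simulate: total duration and step size."""
    duration: float
    step_size: float

    def time_points(self):
        n = int(round(self.duration / self.step_size))
        return [i * self.step_size for i in range(n + 1)]


@dataclass
class Experiment:
    """Ties together model, simulation, data generation and output."""
    model_source: str                                     # what to run the simulation on
    simulation: Simulation
    data_generators: dict = field(default_factory=dict)  # name -> post-processing function
    outputs: list = field(default_factory=list)          # e.g. plot descriptions


experiment = Experiment(
    model_source="example_model.xml",  # hypothetical file name
    simulation=Simulation(duration=10.0, step_size=0.5),
    # Post-processing: normalise a series to its maximum value.
    data_generators={"A_normalised": lambda values: [v / max(values) for v in values]},
    outputs=[{"type": "plot2D", "x": "time", "y": "A_normalised"}],
)
time_points = experiment.simulation.time_points()
```

Keeping the experiment description separate from the model means the same model file can be reused under different simulation settings without being edited.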
  • 62. Simulation results: SBRML • Simulation results are data too, and are represented by SBRML • Systems Biology Results Markup Language • Developed by Joseph Dada, et al. (Manchester) • Structured format for representing simulation results • Dada JO, et al. SBRML: a markup language for associating systems biology data with models. Bioinformatics 2010, 26, 932-938.
  • 63. SBRML
  • 64. Conclusion • Data standards greatly facilitate computational systems biology • Standards exist (and are being continually developed) for both experimental and modelling data • Provides a framework for data sharing and open-source software tool development
  • 65. Data Standards for Systems Biology Neil Swainston Manchester Centre for Integrative Systems Biology neil.swainston@manchester.ac.uk