3. Why do we need standards?
• Aids researchers by facilitating management of
experimental data
• Facilitates open-source software development
and interoperability
• Allows data to be shared
• Increasingly becoming a requirement for journal
submissions
4. When are standards developed?
• Standards generally emerge organically
• Not for pioneers
• When an experimental technique becomes
established
• Need for a standard becomes obvious
5. Who develops standards?
• Usually two or more academic groups
• Commercial providers often less enthusiastic
• Often formed by a Working Group
• Proteome Standards Initiative
• Metabolomics Standards Initiative
• “Minimum information required” specification
provided
• Followed by data schema, XML standard
6. MCISB project overview
• Experiments: enzyme kinetics, quantitative metabolomics, quantitative proteomics
• Model parameters (KM, kcat)
• Model variables (metabolite, protein concentrations)
• Data resources: PRIDE XML, MeMo, SABIO-RK, MeMo-RK
• Each resource is exposed through a web service
7. Proteomics
• We wish to store:
• Raw experimental mass spectrometry data
• Protein / peptide identifications
• Protein / peptide quantitations
• Metadata (instrument, search algorithm, user, etc.)
10. Mass spectrometry data
• The simple approach does provide a list of
masses and intensities, but…
• What instrument was used?
• Who ran the instrument?
• What sample was used?
• …etc.
• The simple approach lacks metadata
• Many simple approaches (formats) exist
11. Mass spectrometry data
• The less simple approach: mzData
• Developed by the Proteome Standards Initiative,
2005
• Put together by Working Group of academics and
commercial parties
• Regular meetings, both real and virtual
• Goal: unify the existing “simple” formats into
one
• Support “tagging” with metadata
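The difference between a bare peak list and a metadata-tagged spectrum can be sketched as follows. This is a minimal illustration only: the `Spectrum` class, its field names and the example values are hypothetical and do not follow the mzData schema.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    """A peak list plus the metadata that a bare (m/z, intensity) file lacks."""
    mz: list[float]
    intensity: list[float]
    metadata: dict[str, str] = field(default_factory=dict)

    def missing_metadata(self, required=("instrument", "operator", "sample")):
        """Return the required metadata keys that have not been supplied."""
        return [key for key in required if key not in self.metadata]


# "Simple" approach: masses and intensities only.
bare = Spectrum(mz=[445.12, 446.12], intensity=[1200.0, 310.0])

# "Less simple" approach: the same peaks, tagged with (made-up) metadata.
tagged = Spectrum(
    mz=[445.12, 446.12],
    intensity=[1200.0, 310.0],
    metadata={
        "instrument": "hypothetical-QTOF",
        "operator": "A. User",
        "sample": "yeast lysate",
    },
)

print("bare spectrum lacks:", bare.missing_metadata())
print("tagged spectrum lacks:", tagged.missing_metadata())
```

The bare spectrum cannot answer any of the three questions (instrument, operator, sample), whereas the tagged one can.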
13. Controlled vocabularies
• Use of free text is “dangerous”
• Non-standard, ambiguous terms
• Difficult to match / compare
• Controlled vocabularies
• Collection of standardised terms
• Organised into vocabularies or ontologies
• Ontologies contain controlled terms and relationships
between them (predicates)
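A small sketch of why controlled terms are easier to compare than free text. The accessions (`EX:0000` and so on), the synonym table and the single `is_a` predicate are invented for illustration and are not taken from any real ontology.

```python
# Hypothetical mini-vocabulary: accession -> preferred term name.
TERMS = {
    "EX:0000": "ionisation method",
    "EX:0001": "electrospray ionisation",
    "EX:0002": "matrix-assisted laser desorption ionisation",
}

# Free-text variants that users might type, mapped to one accession.
SYNONYMS = {
    "esi": "EX:0001",
    "electrospray": "EX:0001",
    "electrospray ionization": "EX:0001",
    "maldi": "EX:0002",
}

# Ontology relationships as (subject, predicate, object) triples.
RELATIONS = [
    ("EX:0001", "is_a", "EX:0000"),
    ("EX:0002", "is_a", "EX:0000"),
]


def to_accession(free_text):
    """Map a free-text label to a controlled accession, or None if unknown."""
    key = free_text.strip().lower()
    for accession, name in TERMS.items():
        if key == name:
            return accession
    return SYNONYMS.get(key)


def parents(accession):
    """Return the terms that the given term is related to by 'is_a'."""
    return [obj for subj, pred, obj in RELATIONS
            if subj == accession and pred == "is_a"]


# Different spellings resolve to the same term, so records become comparable.
print(to_accession("ESI"), to_accession(" Electrospray ionization "))
print(parents("EX:0002"))
```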
16. Proteomics data
• Proteomics data is not solely mass
spectrometry data
• Sample preparation protocol?
• Peptide / protein identifications?
• Post-translational modifications
• Identification scores?
• To support this, an extension is required
• Extension based on defined set of “minimum
requirements”
• MIAPE
18. PRIDE
• Proteomics identifications database
– Both a format and a database
– Centralised, standards compliant, open source, public
data repository for proteomics data
– Query, submit and retrieve proteomics data in
standardized XML formats
– Public version housed at the EBI
– http://www.ebi.ac.uk/pride/
20. PRIDE Converter
• User interface
• Usable by biologists
• Interfaces with
Ontology Lookup
Service
• Developed by EBI
• Automatic upload
to PRIDE database
22. Future directions
• PRIDE does NOT hold:
• Protein and peptide quantitations
• New approaches being developed
• mzML – mass spectrometry format, enhancement of
mzData, including support for richer datasets
• mzIdentML – storage of protein and peptide
identifications
• mzQuantML – storage of protein and peptide
quantitations
23. Metabolomics
• We wish to store:
• Raw experimental mass spectrometry (and NMR)
data
• Metabolite identifications
• Metabolite quantitations
• Metadata (instrument, search algorithm, user, etc.)
24. Metabolomics
• Data standard does NOT currently exist
• Core Information for Metabolomics Reporting
• Metabolomics Standards Initiative (MSI)
• http://msi-workgroups.sourceforge.net/
• MetaboLights being developed at EBI
• Not many details as yet
• In the meantime…
• MCISB has developed its own repository
25. MeMo
• Metabolomics Model database
• Designed initially for metabolomics data
• SQL / XML hybrid approach
• Holds:
– Experimental meta-data (submitter, lab, date)
– Sample meta-data (including biological source)
– Instrumentation meta-data
– Mass spectra
– Metabolite identifications
29. Enzyme kinetics
• How fast does a given reaction occur?
• Enzyme-catalysed reaction: A → B
• Determination of kinetic constants which define
the kinetics of the reaction
• Experimental approach: perform kinetic assays
30. Enzyme kinetics
• Many approaches:
– Absorbance
– Fluorescence
– others
• Currently concentrating on absorbance assays
on BMG NOVOstar instrument
• Requirement: determination of KM and kcat for a
given reaction under particular conditions (pH
and temperature)
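One way to picture the determination of KM and kcat is the sketch below. It assumes the reaction follows Michaelis-Menten kinetics, v = kcat·[E]·[S] / (KM + [S]), and estimates the constants with a Lineweaver-Burk (double-reciprocal) linear fit. Both the rate law and the fitting method are assumptions chosen for illustration, not the procedure used in the project, and the data are synthetic and noise-free.

```python
def mm_rate(s, kcat, km, e_total):
    """Michaelis-Menten rate for substrate concentration s."""
    return kcat * e_total * s / (km + s)


def fit_lineweaver_burk(substrate, rates, e_total):
    """Estimate (KM, kcat) from 1/v = (KM/Vmax) * (1/[S]) + 1/Vmax."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept of ys against xs.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax / e_total


# Synthetic assay: made-up "true" constants and substrate concentrations.
TRUE_KCAT, TRUE_KM, E_TOTAL = 10.0, 0.5, 0.01
substrate = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
rates = [mm_rate(s, TRUE_KCAT, TRUE_KM, E_TOTAL) for s in substrate]

km_fit, kcat_fit = fit_lineweaver_burk(substrate, rates, E_TOTAL)
print(f"KM = {km_fit:.3f}, kcat = {kcat_fit:.3f}")
```

With real, noisy absorbance data a non-linear fit is usually preferred, since the double-reciprocal transform amplifies error at low substrate concentrations.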
40. Other experimental standards
• MIBBI: Minimum Information for Biological and
Biomedical Investigations
• http://mibbi.org/
• Over thirty recommendations for a range of
experimental techniques
44. Modelling
• What is a model?
• “An analytic or computational model proposes
specific testable hypotheses about a biological
system”
• Mathematical / computational representation of
a biological system
• May allow computational simulations of the
system
45. Pathway databases
• Building a model often starts with a topological
description of a pathway or pathways
• What reacts with what?
• A number of existing data resources
• Biochemical knowledge, curated from literature
50. Simulation tools
• The systems biology community has developed
a strong software infrastructure
• Many tools exist, including simulators
• Several hundred
• How do we link pathway databases to these
simulators?
• A standard: SBML
• Systems Biology Markup Language
• Recently celebrated its 10th
birthday
51. SBML
• XML markup language describing models
• Contains concepts such as…
• compartments
• species (metabolites, enzymes, RNA, etc.)
• reactions
• Similar to pathway databases
• KEGG2SBML tool exists for converting KEGG pathway
maps to SBML files
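The concepts listed above (compartments, species, reactions) can be sketched as an XML document built with Python's standard library. This is a simplified illustration of the general shape of an SBML Level 2 file. It has not been validated against the SBML schema, and the model content (`toy_pathway`, species `A` and `B`, reaction `r1`) is invented.

```python
import xml.etree.ElementTree as ET


def build_model(model_id, compartment, species, reactions):
    """Build a simplified SBML-style tree: one compartment, species, reactions."""
    sbml = ET.Element("sbml", {
        "xmlns": "http://www.sbml.org/sbml/level2/version4",
        "level": "2",
        "version": "4",
    })
    model = ET.SubElement(sbml, "model", {"id": model_id})

    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", {"id": compartment})

    species_list = ET.SubElement(model, "listOfSpecies")
    for sid in species:
        ET.SubElement(species_list, "species",
                      {"id": sid, "compartment": compartment})

    reaction_list = ET.SubElement(model, "listOfReactions")
    for rid, (reactants, products) in reactions.items():
        reaction = ET.SubElement(reaction_list, "reaction", {"id": rid})
        for tag, members in (("listOfReactants", reactants),
                             ("listOfProducts", products)):
            group = ET.SubElement(reaction, tag)
            for sid in members:
                ET.SubElement(group, "speciesReference", {"species": sid})
    return sbml


# A single reaction r1 converting A to B inside one compartment.
root = build_model("toy_pathway", "cell", ["A", "B"], {"r1": (["A"], ["B"])})
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

In practice a dedicated SBML library would be used rather than hand-built XML, so that the output is checked against the specification.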
52. Mathematical SBML
• Also contains concepts allowing simulations
• Many of these driven by experimental work
• Specification of metabolite and enzyme
concentrations
• Specification of kinetic laws and kinetic
parameters
• Parameterised model = pathways + experimental data
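The idea that a parameterised model combines pathway topology with experimental data can be sketched as below. The sketch assumes a single reaction with a Michaelis-Menten rate law and uses a fixed-step Euler integrator. All names and values are made up for illustration, and real simulators use more robust numerical methods.

```python
# Topology, as might come from a pathway database (hypothetical reaction).
pathway = {"reaction": "r1", "substrate": "A", "product": "B"}

# Experimental data: measured concentrations and kinetic parameters (made up).
concentrations = {"A": 1.0, "B": 0.0, "enzyme": 0.01}
kinetics = {"kcat": 10.0, "km": 0.5}


def simulate(pathway, concentrations, kinetics, t_end=10.0, dt=0.01):
    """Integrate the single reaction forward in time with Euler steps."""
    conc = dict(concentrations)  # copy, so the inputs are left unchanged
    s, p = pathway["substrate"], pathway["product"]
    for _ in range(int(round(t_end / dt))):
        rate = (kinetics["kcat"] * conc["enzyme"] * conc[s]
                / (kinetics["km"] + conc[s]))
        conc[s] -= rate * dt
        conc[p] += rate * dt
    return conc


final = simulate(pathway, concentrations, kinetics)
print({name: round(value, 4) for name, value in final.items()})
```

Neither the topology nor the measurements alone are enough to run this: the pathway says what reacts with what, and the data say how fast and from what starting point.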
54. SBML data resources
• Biomodels.net
• http://www.ebi.ac.uk/biomodels-main/
• Curated collection of biochemical models at EBI
• JWS Online
• http://jjj.mib.ac.uk/
• Also curated
• BUT also includes an online simulator
• You’ll learn more next month…
55. SBML tools
• Hundreds of ‘em (205)
• http://sbml.org/SBML_Software_Guide
• Different goals
• Whole cell / single pathway
• Deterministic / stochastic simulators
• Different platforms / programming languages
• Matrix exists, describing capabilities of each
tool
• http://sbml.org/SBML_Software_Guide/SBML_Software_Matrix
57. Other model representations
• CellML
• http://www.cellml.org/
• Larger scale modelling
• Inter-cellular, used in whole organ modelling
• BioPAX
• http://www.biopax.org/
• Similar goals to SBML
• Overlap between “competing” representations
is being reduced
• Regular “COMBINE” meetings
58. MIRIAM
• Minimum Information Required in the
Annotation of Models
• http://www.ebi.ac.uk/miriam/
• Set of guidelines describing how to make
models reusable
• Specify model creator contact details
• Ensure consistent annotation of terms with database
resources
• e.g. use UniProt identifiers for unambiguous
identification of enzymes
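A sketch of why database identifiers make annotation consistent: two models may label the same enzyme differently in free text, yet share one identifier. The resolver URL pattern below is an assumption about how identifiers are commonly written, the accession `P12345` is only a placeholder, and the model entries are hypothetical.

```python
# Assumed resolver base URL; not taken from the slides.
IDENTIFIERS_ORG = "https://identifiers.org"


def annotation_uri(collection, accession):
    """Build a 'collection:accession' style URI for a database entry."""
    return f"{IDENTIFIERS_ORG}/{collection}:{accession}"


# Two hypothetical models naming the same enzyme differently.
model_a_enzyme = {
    "name": "Enzyme 1",
    "annotation": annotation_uri("uniprot", "P12345"),
}
model_b_enzyme = {
    "name": "E1 (alternative name)",
    "annotation": annotation_uri("uniprot", "P12345"),
}


def same_entity(first, second):
    """Compare species by annotation rather than by free-text name."""
    return first["annotation"] == second["annotation"]


print(model_a_enzyme["annotation"])
print("names match:", model_a_enzyme["name"] == model_b_enzyme["name"])
print("annotations match:", same_entity(model_a_enzyme, model_b_enzyme))
```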
59. SBML visualisation: SBGN
• Until recently, no standardised way of viewing
models
• Systems Biology Graphical Notation
• Attempts to generate standard “wiring-diagram” for
biological representations
61. Model simulation
• Many simulators exist
• How do we tell a simulator what to simulate?
• Simulation Experiment Description Markup Language
(SED-ML)
• Contains concepts…
• Model (what to run the simulation on)
• Simulation (define what to simulate, duration, step size)
• Data generation (post-processing normalisation)
• Output (2D plot, 3D plot)
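The four concepts above can be pictured as plain data structures. This is only a sketch of how the pieces relate: the class and field names are hypothetical, the file name is invented, and none of it follows the actual SED-ML schema.

```python
from dataclasses import dataclass


@dataclass
class Model:
    source: str  # which model file to run the simulation on


@dataclass
class Simulation:
    duration: float
    step_size: float

    def number_of_steps(self):
        """Number of fixed-size steps needed to cover the duration."""
        return round(self.duration / self.step_size)


@dataclass
class DataGenerator:
    variable: str
    scale: float = 1.0  # simple post-processing: normalise by a factor

    def apply(self, values):
        return [v * self.scale for v in values]


@dataclass
class Output:
    kind: str  # e.g. "2D plot"
    x: str
    y: str


@dataclass
class Experiment:
    model: Model
    simulation: Simulation
    data_generators: list
    outputs: list


experiment = Experiment(
    model=Model(source="toy_pathway.xml"),
    simulation=Simulation(duration=10.0, step_size=0.1),
    data_generators=[DataGenerator(variable="A", scale=0.5)],
    outputs=[Output(kind="2D plot", x="time", y="A")],
)
print(experiment.simulation.number_of_steps(), "steps")
```

Keeping the experiment description separate from the model means the same model file can be reused under different simulation settings.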
62. Simulation results: SBRML
• Simulation results are data too, and are
represented by SBRML
• Systems Biology Results Markup Language
• Developed by Joseph Dada, et al. (Manchester)
• Structured format for representing simulation
results
• Dada JO, et al. SBRML: a markup language for associating systems
biology data with models. Bioinformatics 2010, 26, 932-938.
64. Conclusion
• Data standards greatly facilitate computational
systems biology
• Standards exist (and are being continually
developed) for both experimental and modelling
data
• They provide a framework for data sharing and
open-source software tool development
65. Data Standards for Systems Biology
Neil Swainston
Manchester Centre for Integrative Systems Biology
neil.swainston@manchester.ac.uk