Curation
Ewan Birney (tweetable)
Who am I?
• Associate Director at European Bioinformatics Institute (EBI)
• Involved in genomics since I was 19 (> 20 years!)
• Trained as a biochemist – most people think I am CS
• Analysed – sometimes led – human/mouse/rat/platypus etc genomes, ENCODE, others.

EBI is in Hinxton, South Cambridgeshire
EBI is part of EMBL, ~like CERN for molecular biology
Molecular Biology
• The study of how life works – at a molecular level

• Key molecules:
  • DNA – Information store (Disk)
  • RNA – Key information transformer, also does stuff (RAM)
  • Proteins – The business end of life (Chip, robotic arms)
  • Metabolites – Fuel and signalling molecules (electricity)
• Theories of how these interact – no theories to predict what
  they are
• Instead we determine attributes of molecules and store them in
  globally accessible, open, databases
Theory vs Observation

[Figure: fields placed along a spectrum running from "Must directly observe" to "Can accurately predict from models", in the order: Molecular Biology; Geology, Astronomy; Climate modelling; High Energy Physics]
This ratio is not well correlated with data size

[Figure: Data Size (from ~5PB to ~60PB) plotted against Ratio of model predictability, with High Energy Physics near ~60PB, Molecular Biology and Astronomy in between, and Climate Models near ~5PB]
“Knowing stuff” is critical to biology…

• The bases of the human genome
  • … and the Mouse, Rat, Wheat, E. coli, Plasmodium, Cow….
• The functions of proteins
  • Enzymes, Transcription Factors, Signalling….
• The types of cells, their lineages and organ composition
  • …and all the molecular components in each cell
• Small molecules
  • … and their conversions, binding partners
• Structures of molecules, complexes and cells
  • … at atomic and higher resolution
Two fundamental types of information

• Experimental data
  • The result of a specific experiment
  • Often an experiment specific, data heavy part plus a “meta-data” part
  • Might be contradictory
  • “Primary paper”
• Consensus Knowledge
  • Integration of different strands of information on a topic
  • Realised as a computationally accessible scheme
  • “Review article”
Five types of curation
Experimental Data Entry

• IntAct – Protein:Protein
  interactions


• GWAS Catalog –
  extraction of summary
  statistics
Experimental Meta data capture

• Sample, CDS lines in
  ENA
• Sample in MetaboLights,
  PRIDE etc
• Machine and analysis
  specification in PDB,
  PRIDE, ENA
Consensus integration of information

• GENCODE gene models in
  human
• Summaries and GO
  assignment in UniProt
• Pathway information in
  Reactome
• GO assignment and
  summaries in MODs (eg,
  PomBase, WormBase,
  PhytoPathDB etc)
Knowledge frameworks

•   The EC classification
•   Cell type ontologies
•   Cell lineages – Worms!
•   SNOMED, HPO etc
•   GO ontologies
Knowledge management

• Creation of rules
  representing ENA
  standards compliance
• Cross-ontology
  coordination (eg, EFO) or
  tying (GO and ChEBI)
• RuleBase / UniRule
  curation processes
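
As a rough illustration of what "rules representing standards compliance" might look like once written down as code, here is a minimal Python sketch. The field names (`taxon_id`, `collection_date`), the two rules, and the `check_record` helper are all hypothetical; none of them are taken from ENA, RuleBase or UniRule. The sketch only shows the idea of expressing a curation standard as rules a machine can check.

```python
# Hypothetical sketch: a curation standard expressed as machine-checkable rules.
# Field names and rules are illustrative assumptions, not real ENA checks.
import re
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Rule = Tuple[str, Callable[[Record], bool]]

# Each rule pairs a human-readable description with a predicate on a record.
RULES: List[Rule] = [
    (
        "taxon_id must be a positive integer",
        lambda r: r.get("taxon_id", "").isdecimal() and int(r["taxon_id"]) > 0,
    ),
    (
        "collection_date must look like YYYY-MM-DD",
        lambda r: re.fullmatch(r"\d{4}-\d{2}-\d{2}", r.get("collection_date", ""))
        is not None,
    ),
]


def check_record(record: Record) -> List[str]:
    """Return the description of every rule that the record fails."""
    return [description for description, passes in RULES if not passes(record)]


good = {"taxon_id": "9606", "collection_date": "2013-04-08"}
bad = {"taxon_id": "human", "collection_date": "April 2013"}

print(check_record(good))  # no failed rules for this record
print(check_record(bad))   # both rules fail for this record
```

Keeping the rules in a plain list means a curator can add or retire a rule without touching the checking logic, which is one possible reading of "programmatic" curation.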
Data Entry vs Programming

[Figure: a spectrum from Direct Data Entry to Programmatic Data Entry, with intermediate labels: Improved Data entry tools; “Messy” Scripting; RuleBase, Computational Accessible Standards]
Thank You!
Curation Dilemma

• If you do your job well…
  • Everyone assumes it’s easy
  • People forget about the complexity
  • You are ignored
• If you do your job badly…
  • Everyone assumes it’s easy
  • People forget about the complexity
  • People complain
Why we need an infrastructure…
Infrastructures are critical…
But we only notice them when they go wrong
Biology already needs an information
infrastructure

• For the human genome
  • (…and the mouse, and the rat, and… x 150 now, 1000 in the
    future!) - Ensembl
• For the function of genes and proteins
  • For all genes, in text and computational – UniProt and GO
• For all 3D structures
  • To understand how proteins work – PDBe
• For where things are expressed
  • The differences and functionality of cells - Atlas
..But this keeps on going…

• We have to scale across all of (interesting) life
  • There are a lot of species out there!
• We have to handle new areas, in particular medicine
  • A set of European haplotypes for good imputation
  • A set of actionable variants in germline and cancers
• We have to improve our chemical understanding
  • Of biological chemicals
  • Of chemicals which interfere with Biology
ELIXIR’s mission
To build a sustainable European infrastructure for biological
information, supporting life science research and its translation to:
• medicine
• environment
• bioindustries
• society
How?

• Fully Centralised
  • Pros: Stability, reuse, learning ease
  • Cons: Hard to concentrate expertise across all of life science
  • Cons: Geographic, language placement
  • Cons: Bottlenecks and lack of diversity
• Fully Distributed
  • Pros: Responsive, geographic and language responsive
  • Cons: Internal communication overhead
  • Cons: Harder for end users to learn
  • Cons: Harder to provide multi-decade stability
• Research
  • International: EBI / Elixir
  • English
  • Low legalities
• Healthcare
  • National: Healthcare
  • National Language
  • Complex legalities
Other infrastructures needed for biology
• EuroBioImaging
  • Cellular and whole organism Imaging
• BioBanks (BBMRI)
  • We need numbers – European populations – in particular for rare
    diseases, but also for specific sub types of common disease
• Mouse models and phenotypes (Infrafrontier)
  • A baseline set of knockouts and phenotypes in our most tractable
    mammalian model
  • (it’s hard to prove something in human)
• Robust molecular assays in a clinical setting (EATRIS)
  • The ability to reliably use state of the art molecular techniques in a
    clinical research setting
(you can follow me on twitter @ewanbirney)
I blog and update this on Google Plus publicly

Contenu connexe

Tendances

Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0
Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0
Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0Fokhruz Zaman
 
UniProt & Ontologies
UniProt & OntologiesUniProt & Ontologies
UniProt & OntologiesEric Jain
 
Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)Sijo A
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsAyeshaYousaf20
 
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...Natalio Krasnogor
 
TGAC Browser bosc 2014
TGAC Browser bosc 2014TGAC Browser bosc 2014
TGAC Browser bosc 2014Anil Thanki
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOAEBI
 
PhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenix Bioinformatics
 
Introduction to proteomics
Introduction to proteomicsIntroduction to proteomics
Introduction to proteomicsHoffman Lab
 
UniProt and the Semantic Web
UniProt and the Semantic WebUniProt and the Semantic Web
UniProt and the Semantic WebChimezie Ogbuji
 
University of Toronto Chemistry Librarians Workshop June 2012
University of Toronto Chemistry Librarians Workshop June 2012University of Toronto Chemistry Librarians Workshop June 2012
University of Toronto Chemistry Librarians Workshop June 2012Brock University
 
bioinformatics simple
bioinformatics simple bioinformatics simple
bioinformatics simple nadeem akhter
 
Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Chris Mungall
 
Shorter bioinformatics
Shorter bioinformaticsShorter bioinformatics
Shorter bioinformaticsNimrita Koul
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeChris Mungall
 
Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyMelanie Courtot
 
Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Chris Mungall
 

Tendances (20)

Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0
Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0
Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0
 
Ensembl Browser Workshop
Ensembl Browser WorkshopEnsembl Browser Workshop
Ensembl Browser Workshop
 
UniProt & Ontologies
UniProt & OntologiesUniProt & Ontologies
UniProt & Ontologies
 
Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomics
 
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
 
TGAC Browser bosc 2014
TGAC Browser bosc 2014TGAC Browser bosc 2014
TGAC Browser bosc 2014
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOA
 
PhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenes
 
Introduction to proteomics
Introduction to proteomicsIntroduction to proteomics
Introduction to proteomics
 
UniProt and the Semantic Web
UniProt and the Semantic WebUniProt and the Semantic Web
UniProt and the Semantic Web
 
University of Toronto Chemistry Librarians Workshop June 2012
University of Toronto Chemistry Librarians Workshop June 2012University of Toronto Chemistry Librarians Workshop June 2012
University of Toronto Chemistry Librarians Workshop June 2012
 
Intro bioinfo
Intro bioinfoIntro bioinfo
Intro bioinfo
 
bioinformatics simple
bioinformatics simple bioinformatics simple
bioinformatics simple
 
Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015
 
Shorter bioinformatics
Shorter bioinformaticsShorter bioinformatics
Shorter bioinformatics
 
Kegg
KeggKegg
Kegg
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of Life
 
Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontology
 
Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017
 

Similaire à Ewan Birney Biocuration 2013

Computer science history.pdf
Computer science history.pdfComputer science history.pdf
Computer science history.pdfsirwansleman
 
Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomicsNikhil Aggarwal
 
The Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in BiologyThe Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in Biologyrobertstevens65
 
Biocurator2012.41.hu
Biocurator2012.41.huBiocurator2012.41.hu
Biocurator2012.41.hujimhutamu
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesLeighton Pritchard
 
Building and Using Ontologies to do biology
Building and Using Ontologies to do biologyBuilding and Using Ontologies to do biology
Building and Using Ontologies to do biologyrobertstevens65
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08Russ Altman
 
Ontology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesOntology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesConnected Data World
 
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...RussellHanson
 
Intro bioinformatics
Intro bioinformaticsIntro bioinformatics
Intro bioinformaticsChris Dwan
 
Big Data
Big DataBig Data
Big DataSURFnet
 
Molecular basis of evolution and softwares used in phylogenetic tree contruction
Molecular basis of evolution and softwares used in phylogenetic tree contructionMolecular basis of evolution and softwares used in phylogenetic tree contruction
Molecular basis of evolution and softwares used in phylogenetic tree contructionUdayBhanushali111
 
Visualizing the Structural Variome (VMLS-Eurovis 2013)
Visualizing the Structural Variome (VMLS-Eurovis 2013)Visualizing the Structural Variome (VMLS-Eurovis 2013)
Visualizing the Structural Variome (VMLS-Eurovis 2013)Jan Aerts
 
Bi 140 science, technology and society module 4
Bi 140 science, technology and society module 4Bi 140 science, technology and society module 4
Bi 140 science, technology and society module 4Michael Matthews
 
World-wide data exchange in metabolomics, Wageningen, October 2016
World-wide data exchange in metabolomics, Wageningen, October 2016World-wide data exchange in metabolomics, Wageningen, October 2016
World-wide data exchange in metabolomics, Wageningen, October 2016Christoph Steinbeck
 
Using public databases to inform research questions
Using public databases to inform research questionsUsing public databases to inform research questions
Using public databases to inform research questionsamlbinder
 
Introduction to epigenetics and study design
Introduction to epigenetics and study designIntroduction to epigenetics and study design
Introduction to epigenetics and study designamlbinder
 
Genomics and bioinformatics
Genomics and bioinformatics Genomics and bioinformatics
Genomics and bioinformatics Senthil Natesan
 

Similaire à Ewan Birney Biocuration 2013 (20)

Computer science history.pdf
Computer science history.pdfComputer science history.pdf
Computer science history.pdf
 
Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomics
 
The Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in BiologyThe Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in Biology
 
Biocurator2012.41.hu
Biocurator2012.41.huBiocurator2012.41.hu
Biocurator2012.41.hu
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In Sequences
 
Building and Using Ontologies to do biology
Building and Using Ontologies to do biologyBuilding and Using Ontologies to do biology
Building and Using Ontologies to do biology
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
 
Ontology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesOntology Services for the Biomedical Sciences
Ontology Services for the Biomedical Sciences
 
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
 
2014 bangkok-talk
2014 bangkok-talk2014 bangkok-talk
2014 bangkok-talk
 
Intro bioinformatics
Intro bioinformaticsIntro bioinformatics
Intro bioinformatics
 
Big Data
Big DataBig Data
Big Data
 
Molecular basis of evolution and softwares used in phylogenetic tree contruction
Molecular basis of evolution and softwares used in phylogenetic tree contructionMolecular basis of evolution and softwares used in phylogenetic tree contruction
Molecular basis of evolution and softwares used in phylogenetic tree contruction
 
Visualizing the Structural Variome (VMLS-Eurovis 2013)
Visualizing the Structural Variome (VMLS-Eurovis 2013)Visualizing the Structural Variome (VMLS-Eurovis 2013)
Visualizing the Structural Variome (VMLS-Eurovis 2013)
 
Bi 140 science, technology and society module 4
Bi 140 science, technology and society module 4Bi 140 science, technology and society module 4
Bi 140 science, technology and society module 4
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
World-wide data exchange in metabolomics, Wageningen, October 2016
World-wide data exchange in metabolomics, Wageningen, October 2016World-wide data exchange in metabolomics, Wageningen, October 2016
World-wide data exchange in metabolomics, Wageningen, October 2016
 
Using public databases to inform research questions
Using public databases to inform research questionsUsing public databases to inform research questions
Using public databases to inform research questions
 
Introduction to epigenetics and study design
Introduction to epigenetics and study designIntroduction to epigenetics and study design
Introduction to epigenetics and study design
 
Genomics and bioinformatics
Genomics and bioinformatics Genomics and bioinformatics
Genomics and bioinformatics
 

Plus de Iddo

What can Community Challenges do for You?
What can Community Challenges do for You?What can Community Challenges do for You?
What can Community Challenges do for You?Iddo
 
Surviving Scientific Presentations
Surviving Scientific PresentationsSurviving Scientific Presentations
Surviving Scientific PresentationsIddo
 
Friedberg lab-overview-grad-students-2019-nr
Friedberg lab-overview-grad-students-2019-nrFriedberg lab-overview-grad-students-2019-nr
Friedberg lab-overview-grad-students-2019-nrIddo
 
The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...Iddo
 
Why Your Microbiome Analysis is Wrong
Why Your Microbiome Analysis is WrongWhy Your Microbiome Analysis is Wrong
Why Your Microbiome Analysis is WrongIddo
 
Tracing the Ancestry of Genomes in Bacteria
Tracing the Ancestry of Genomes in BacteriaTracing the Ancestry of Genomes in Bacteria
Tracing the Ancestry of Genomes in BacteriaIddo
 
Computational Challenges in Biological Data Science: an Optimistically Cautio...
Computational Challenges in Biological Data Science: an Optimistically Cautio...Computational Challenges in Biological Data Science: an Optimistically Cautio...
Computational Challenges in Biological Data Science: an Optimistically Cautio...Iddo
 
Friedberg lab-overview-grad-students
Friedberg lab-overview-grad-studentsFriedberg lab-overview-grad-students
Friedberg lab-overview-grad-studentsIddo
 
Understanding Biological Function in Times of High Throughput and Low Output
Understanding Biological Function in Times of High Throughput and Low OutputUnderstanding Biological Function in Times of High Throughput and Low Output
Understanding Biological Function in Times of High Throughput and Low OutputIddo
 
Random Musings on Fixing Data Shambles in Science
Random Musings on Fixing Data Shambles in ScienceRandom Musings on Fixing Data Shambles in Science
Random Musings on Fixing Data Shambles in ScienceIddo
 
Genome Informatics 2015 Bacteriocin Discovery
Genome Informatics 2015 Bacteriocin DiscoveryGenome Informatics 2015 Bacteriocin Discovery
Genome Informatics 2015 Bacteriocin DiscoveryIddo
 
Convergent divergent
Convergent divergentConvergent divergent
Convergent divergentIddo
 
Some US Science Funding sources
Some US Science Funding sourcesSome US Science Funding sources
Some US Science Funding sourcesIddo
 
CAFA poster presented at CSHL Genome Informatics 2013
CAFA poster presented at CSHL Genome Informatics 2013CAFA poster presented at CSHL Genome Informatics 2013
CAFA poster presented at CSHL Genome Informatics 2013Iddo
 
Metagenomics Biocuration 2013
Metagenomics Biocuration 2013Metagenomics Biocuration 2013
Metagenomics Biocuration 2013Iddo
 
Ismb grant-writing-2012
Ismb grant-writing-2012Ismb grant-writing-2012
Ismb grant-writing-2012Iddo
 
David Jones AFP/CAFA2011
David Jones AFP/CAFA2011David Jones AFP/CAFA2011
David Jones AFP/CAFA2011Iddo
 
Vienna afp2011
Vienna afp2011Vienna afp2011
Vienna afp2011Iddo
 
Afp cafa djuric
Afp cafa djuricAfp cafa djuric
Afp cafa djuricIddo
 
Go camp 2010_cacao
Go camp 2010_cacaoGo camp 2010_cacao
Go camp 2010_cacaoIddo
 

Plus de Iddo (20)

What can Community Challenges do for You?
What can Community Challenges do for You?What can Community Challenges do for You?
What can Community Challenges do for You?
 
Surviving Scientific Presentations
Surviving Scientific PresentationsSurviving Scientific Presentations
Surviving Scientific Presentations
 
Friedberg lab-overview-grad-students-2019-nr
Friedberg lab-overview-grad-students-2019-nrFriedberg lab-overview-grad-students-2019-nr
Friedberg lab-overview-grad-students-2019-nr
 
The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...
 
Why Your Microbiome Analysis is Wrong
Why Your Microbiome Analysis is WrongWhy Your Microbiome Analysis is Wrong
Why Your Microbiome Analysis is Wrong
 
Tracing the Ancestry of Genomes in Bacteria
Tracing the Ancestry of Genomes in BacteriaTracing the Ancestry of Genomes in Bacteria
Tracing the Ancestry of Genomes in Bacteria
 
Computational Challenges in Biological Data Science: an Optimistically Cautio...
Computational Challenges in Biological Data Science: an Optimistically Cautio...Computational Challenges in Biological Data Science: an Optimistically Cautio...
Computational Challenges in Biological Data Science: an Optimistically Cautio...
 
Friedberg lab-overview-grad-students
Friedberg lab-overview-grad-studentsFriedberg lab-overview-grad-students
Friedberg lab-overview-grad-students
 
Understanding Biological Function in Times of High Throughput and Low Output
Understanding Biological Function in Times of High Throughput and Low OutputUnderstanding Biological Function in Times of High Throughput and Low Output
Understanding Biological Function in Times of High Throughput and Low Output
 
Random Musings on Fixing Data Shambles in Science
Random Musings on Fixing Data Shambles in ScienceRandom Musings on Fixing Data Shambles in Science
Random Musings on Fixing Data Shambles in Science
 
Genome Informatics 2015 Bacteriocin Discovery
Genome Informatics 2015 Bacteriocin DiscoveryGenome Informatics 2015 Bacteriocin Discovery
Genome Informatics 2015 Bacteriocin Discovery
 
Convergent divergent
Convergent divergentConvergent divergent
Convergent divergent
 
Some US Science Funding sources
Some US Science Funding sourcesSome US Science Funding sources
Some US Science Funding sources
 
CAFA poster presented at CSHL Genome Informatics 2013
CAFA poster presented at CSHL Genome Informatics 2013CAFA poster presented at CSHL Genome Informatics 2013
CAFA poster presented at CSHL Genome Informatics 2013
 
Metagenomics Biocuration 2013
Metagenomics Biocuration 2013Metagenomics Biocuration 2013
Metagenomics Biocuration 2013
 
Ismb grant-writing-2012
Ismb grant-writing-2012Ismb grant-writing-2012
Ismb grant-writing-2012
 
David Jones AFP/CAFA2011
David Jones AFP/CAFA2011David Jones AFP/CAFA2011
David Jones AFP/CAFA2011
 
Vienna afp2011
Vienna afp2011Vienna afp2011
Vienna afp2011
 
Afp cafa djuric
Afp cafa djuricAfp cafa djuric
Afp cafa djuric
 
Go camp 2010_cacao
Go camp 2010_cacaoGo camp 2010_cacao
Go camp 2010_cacao
 

Dernier

Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...chandars293
 
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Ishani Gupta
 
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls ServiceGENUINE ESCORT AGENCY
 
Andheri East ) Call Girls in Mumbai Phone No 9004268417 Elite Escort Service ...
Andheri East ) Call Girls in Mumbai Phone No 9004268417 Elite Escort Service ...Andheri East ) Call Girls in Mumbai Phone No 9004268417 Elite Escort Service ...
Andheri East ) Call Girls in Mumbai Phone No 9004268417 Elite Escort Service ...Anamika Rawat
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋TANUJA PANDEY
 
Call Girls Kolkata Kalikapur 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
Call Girls Kolkata Kalikapur 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...Call Girls Kolkata Kalikapur 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
Call Girls Kolkata Kalikapur 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...Namrata Singh
 
Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service AvailableCall Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service AvailableGENUINE ESCORT AGENCY
 
Call Girls Mysore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Mysore Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Mysore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Mysore Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...Anamika Rawat
 
Top Rated Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...
Top Rated  Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...Top Rated  Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...
Top Rated Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...chandars293
 
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...adilkhan87451
 
Top Rated Pune Call Girls (DIPAL) ⟟ 8250077686 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls (DIPAL) ⟟ 8250077686 ⟟ Call Me For Genuine Sex Serv...Top Rated Pune Call Girls (DIPAL) ⟟ 8250077686 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls (DIPAL) ⟟ 8250077686 ⟟ Call Me For Genuine Sex Serv...Dipal Arora
 
Call Girls Madurai Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Madurai Just Call 9630942363 Top Class Call Girl Service AvailableCall Girls Madurai Just Call 9630942363 Top Class Call Girl Service Available
Ewan Birney Biocuration 2013

  • 2. Who am I? • Associate Director at European Bioinformatics Institute (EBI) • Involved in genomics since I was 19 (> 20 years!) • Trained as a biochemist – most people think I am CS • Analysed – sometimes led – human/mouse/rat/platypus etc genomes, ENCODE, others • EBI is in Hinxton, South Cambridgeshire • EBI is part of EMBL, ~like CERN for molecular biology
  • 3. Molecular Biology • The study of how life works – at a molecular level • Key molecules: • DNA – Information store (Disk) • RNA – Key information transformer, also does stuff (RAM) • Proteins – The business end of life (Chip, robotic arms) • Metabolites – Fuel and signalling molecules (electricity) • Theories of how these interact – no theories to predict what they are • Instead we determine attributes of molecules and store them in globally accessible, open databases
  • 4. Theory vs. Observation [Figure: fields placed on a spectrum from "Must directly observe" to "Can accurately predict from models": Molecular Biology; Geology, Astronomy; Climate modelling; High Energy Physics]
  • 5. This ratio is not well correlated with data size [Plot: Data Size (y-axis, ~5PB to ~60PB) against Ratio of model predictability (x-axis), with points for Molecular Biology, Astronomy, Climate Models and High Energy Physics]
  • 6. “Knowing stuff” is critical to biology… • The bases of the human genome • … and the Mouse, Rat, Wheat, E. coli, Plasmodium, Cow… • The functions of proteins • Enzymes, Transcription Factors, Signalling… • The types of cells, their lineages and organ composition • … and all the molecular components in each cell • Small molecules • … and their conversions, binding partners • Structures of molecules, complexes and cells • … at atomic and higher resolution
  • 7. Two fundamental types of information • Experimental data: • The result of a specific experiment • Often an experiment specific, data heavy part plus a “meta-data” part • Might be contradictory • “Primary paper” • Consensus Knowledge: • Integration of different strands of information on a topic • Realised as a computationally accessible scheme • “Review article”
  • 8. Five types of curation
  • 9. Experimental Data Entry • IntAct – Protein:Protein interactions • GWAS Catalog – extraction of summary statistics
  • 10. Experimental metadata capture • Sample, CDS lines in ENA • Sample in MetaboLights, PRIDE etc • Machine and analysis specification in PDB, PRIDE, ENA
  • 11. Consensus integration of information • GENCODE gene models in human • Summaries and GO assignment in UniProt • Pathway information in Reactome • GO assignment and summaries in MODs (e.g. PomBase, WormBase, PhytoPathDB etc)
  • 12. Knowledge frameworks • The EC classification • Cell type ontologies • Cell lineages – Worms! • SNOMED, HPO etc • GO ontologies
  • 13. Knowledge management • Creation of rules representing ENA standards compliance • Cross-ontology coordination (e.g. EFO) or tying (GO and ChEBI) • RuleBase / UniRule curation processes
  • 14. Data Entry vs Programming [Figure: a spectrum from Direct Data Entry to Programmatic Data Entry, labelled with “Messy” Scripting; Improved data entry tools; RuleBase, Computationally Accessible Standards]
  • 16. Curation Dilemma • If you do your job well… • Everyone assumes it’s easy • People forget about the complexity • You are ignored • If you do your job badly… • Everyone assumes it’s easy • People forget about the complexity • People complain
  • 17. Why we need an infrastructure…
  • 19. But we only notice them when they go wrong
  • 20. Biology already needs an information infrastructure • For the human genome • (…and the mouse, and the rat, and… x 150 now, 1000 in the future!) - Ensembl • For the function of genes and proteins • For all genes, in text and computational – UniProt and GO • For all 3D structures • To understand how proteins work – PDBe • For where things are expressed • The differences and functionality of cells - Atlas
  • 21. ..But this keeps on going… • We have to scale across all of (interesting) life • There are a lot of species out there! • We have to handle new areas, in particular medicine • A set of European haplotypes for good imputation • A set of actionable variants in germline and cancers • We have to improve our chemical understanding • Of biological chemicals • Of chemicals which interfere with Biology
  • 22. ELIXIR’s mission: To build a sustainable European infrastructure for biological information, supporting life science research and its translation to: medicine, environment, bioindustries, society
  • 23. How? • Fully Centralised • Pros: Stability, reuse, learning ease • Cons: Hard to concentrate expertise across life science; geographic, language placement; bottlenecks and lack of diversity • Fully Distributed • Pros: Responsive; geographic, language responsive • Cons: Internal communication overhead; harder for end users to learn; harder to provide multi-decade stability
  • 24. Research vs Healthcare • Research: International; EBI / ELIXIR; English; Low legalities • Healthcare: National; Healthcare; National Language; Complex legalities
  • 25. Other infrastructures needed for biology • EuroBioImaging • Cellular and whole organism Imaging • BioBanks (BBMRI) • We need numbers – European populations – in particular for rare diseases, but also for specific sub types of common disease • Mouse models and phenotypes (Infrafrontier) • A baseline set of knockouts and phenotypes in our most tractable mammalian model • (it’s hard to prove something in human) • Robust molecular assays in a clinical setting (EATRIS) • The ability to reliably use state of the art molecular techniques in a clinical research setting
  • 26. (you can follow me on twitter @ewanbirney) I blog and update this on Google Plus publicly