European Bioinformatics Institute -
the home for big data in biology
www.ebi.ac.uk
Advanced Bioinformatics for Genomics
and BioData Driven Research
The European Molecular Biology Laboratory
6 sites in Europe, >1700 personnel, 80+ nationalities
• Heidelberg, Germany: Main Laboratory
• Barcelona, Spain: Tissue Biology, Disease Modeling
• Hinxton, Cambridge, UK: Bioinformatics
• Rome, Italy: Mouse Biology
• Grenoble, France: Structural Biology
• Hamburg, Germany: Structural Biology
Our mission
• Deliver excellent research
• Train the next generation of scientists
• Engage with industry
• Coordinate bioinformatics in Europe
• Deliver scientific services
Data and tools to support life science research
www.ebi.ac.uk/services
Bioinformatics services
What services do we provide?
Labs around the world send us their data and we…
• Archive it
• Classify it
• Share it with other data providers
• Analyse, add value and integrate it
• …provide tools to help researchers use it
A collaborative enterprise
Big Data, big demand for EMBL-EBI data services…
• ~64 million requests to EMBL-EBI websites every day
• Requests from 20 million unique IP addresses
• 273 petabytes of raw storage in our data centres
• 22 500 participants in EMBL-EBI Training events
Data resources at EMBL-EBI
Data resources for Genomics – Molecular Archives
BioSamples database - centralised resource for FAIR sample data
(>12 million samples)
Experimental Factor Ontology - systematic description of experimental
variables available in EBI databases and projects (26,764 terms)
European Genome-phenome Archive - sequence and genotype
experiments, including case-control and population studies (3,445 studies)
European Nucleotide Archive (ENA) - record of the world's nucleotide
sequencing information (>2,400 million sequences, > 7,200 billion bases)
European Variation Archive - sole international resource for human and
non-human variation
Data resources for Genomics – Genes, Genomes & Variation
Ensembl - genome browser (human: >0.6 billion SNV, >6 million SV)
Ensembl Genomes - 275 vertebrate species / strains; Metazoa; Plants;
Fungi; Protists; Bacteria
GWAS Catalog - moved to EBI in 2015 (4,390 publications, >17,000 associations)
HGNC - 41,787 approved gene entries (19,320 protein coding)
International Genome Sample Resource - ensures future usability and
accessibility of 1000 Genomes Project data
The Ensembl Variant Effect Predictor
VEP started as a simple wrapper around the Ensembl API to map variants to transcripts and predict molecular consequence.
As new data sets and algorithms have become available, functionality has increased and VEP is now an extensive and sophisticated tool.
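To make "predict molecular consequence" concrete, here is a toy sketch of the core idea for a single-base substitution inside a coding sequence. It is not the VEP implementation and does not use the Ensembl API. The function name `predict_consequence`, its 0-based coordinate convention and the five consequence labels it returns are assumptions chosen for illustration. The labels reuse Sequence Ontology term names, but the real tool handles transcripts, strands, splice sites and much more.

```python
# Toy consequence caller for a single-nucleotide substitution in a coding
# sequence. Illustrative only: this is NOT how VEP is implemented.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def predict_consequence(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of coding sequence `cds`."""
    cds = cds.upper()
    alt = alt.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if not 0 <= pos < len(cds):
        raise ValueError("position is outside the coding sequence")
    if alt not in BASES or alt == cds[pos]:
        raise ValueError("alt must be one of T, C, A, G and differ from the reference")

    start = pos - pos % 3  # index of the first base of the affected codon
    offset = pos - start
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        # Same amino acid (or stop codon) before and after the change.
        return "stop_retained_variant" if ref_aa == "*" else "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"
```

For example, with the made-up sequence `ATGGCTTGGTAA` (Met-Ala-Trp-stop), `predict_consequence("ATGGCTTGGTAA", 4, "T")` changes `GCT` to `GTT` and is classified as a missense variant, while a `G` to `A` change at position 8 turns `TGG` into the stop codon `TGA`.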
New resource for Genomics
• New resource for gene expression and splicing QTLs
• https://www.ebi.ac.uk/eqtl/
Global Alliance for Genomics and Health (GA4GH)
• Chaired by EMBL-EBI Director Ewan Birney
• EMBL-EBI teams leading various activities in Technical Work Streams:
• Large Scale Genomics (file formats and htsget subgroups)
• Clinical and phenotypic data capture
• Data Use and Researcher identification
• ENA/EGA/EVA and HCA DCP are also Driver Projects
Data resources for Genomics – Molecular Atlas
• Human Cell Atlas Data Coordination Platform
• In 2017, Chan Zuckerberg Initiative (CZI) funding to EMBL-EBI, Broad Institute and the UCSC Genomics Institute, to build a cloud-based data coordination platform
• HCA will generate petabytes of data for billions of cells, across multiple modalities, generated by hundreds of labs around the world
• DCP will organise, curate, standardise and analyse this data and enable open data access
Data resources for Genomics – Proteins and Protein Families
A free-to-use resource for the archiving, assembly, analysis and browsing of microbiome data
Data archiving, Assembly, Analysis
NEW Resource: BioImage Archive
[Figure: imaging across scales (molecules, molecular machines, cells, tissues / organisms) using light sheet, high-throughput, superresolution and cryo electron microscopy. Data volumes per technology range from 0.1 to 40 TB / day and from 0.5 to 20 TB / dataset. Graphic courtesy of Jan Ellenberg]
Correlate Technologies, Integrate Data
Data-driven discovery
Research
www.ebi.ac.uk/research
Research groups at EMBL-EBI
• Zamin Iqbal
• Thomas Keane
• John Marioni
• Janet Thornton
• Andrew Leach
• Evangelia Petsalaki
• Virginie Uhlmann
• Daniel Zerbino
• Paul Flicek
• Nick Goldman
• Rob Finn
• Alvis Brazma
• Pedro Beltrao
• Alex Bateman
• Ewan Birney
• Moritz Gerstung
• Isidro Cortes-Ciriano
• Irene Papatheodorou
In 2018, EMBL-EBI had 165 grants awarded, 120 jointly funded with researchers and institutes in 62 countries
Pedro Beltrao: Functional landscape of the human phosphoproteome
Ochoa et al Nature Biotech 2019
• Created largest phosphoproteome resource to date (120,000 human phosphosites)
• Used machine learning methods to compile and analyse large phosphorylation-related biological datasets
• Identifying new functional phosphosites has enormous potential to progress research into many biological processes and diseases
Evangelia Petsalaki: Inference of kinase-kinase regulatory networks
from phosphoproteomics data (collaboration with Beltrao group)
Invergo*,Petursson* et al, bioRxiv
Moritz Gerstung: Pan-cancer computational histopathology
• Analysis with deep learning extracts histopathological patterns
• Accurately discriminates 28 cancer and 14 normal tissue types
• Predicts: whole genome duplications; focal amplifications and deletions; driver gene mutations
• Correlations with gene expression indicative of immune infiltration and proliferation
• Prognostic information augments conventional grading and histopathology subtyping
https://doi.org/10.1101/813543
Zam Iqbal: Mykrobe – predicting TB drug resistance from WGS data
https://wellcomeopenresearch.org/articles/4-191/v1
Virginie Uhlmann: Mathematical models for bioimage analysis
doi.org/10.1371/journal.pone.0173433
Dictionary Learning for Two-Dimensional Kendall Shapes
https://arxiv.org/abs/1903.11356
An example of best practice for complex datasets
Single Cell RNA-Seq analysis at EMBL-EBI
From Irene Papatheodorou
Team Leader – Gene Expression
ArrayExpress – functional genomics archive
• started in 2000 as an archive for microarray data
• evolved into general archive for high-throughput functional genomics data (microarray- or NGS-based)
• all data are manually curated prior to inclusion
• microarray data stored directly in ArrayExpress
• sequencing data brokered to and stored in ENA
• curated datasets support reproducible and re-usable research
Annotare – Minimum information about a scRNA-Seq experiment
[Figure: metadata captured for a scRNA-Seq experiment: sample metadata; single cell isolation; single cell well quality (OK, doublet, debris); library construction (single cell identifier barcode, UMI, cDNA read); files (R1, R2, I1); post-analysis single cell quality (pass, fail); inferred cell type]
https://arxiv.org/abs/1910.14623
From database to knowledgebase: Expression Atlases
• Expression Atlas (https://www.ebi.ac.uk/gxa): > 3,500 bulk datasets (165 baseline expression, ~ 3,350 differential expression), 62 species, > 955,000 assays
• Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/home): > 120 single-cell datasets, 12 species
Interactive Analysis with Galaxy
https://humancellatlas.usegalaxy.eu/
Flexible
Interoperable
Scalable
Main Points
• Enabling rational choices when composing workflows
• Using a common exchange format as ‘workflow glue’
• Galaxy integrations
What people usually do...
[Figure: three separate pipelines, each running Read → Filter → Normalise → Compare → Cluster → Markers, shown as alternatives (OR)]
What we really should be doing
[Figure: Read → Filter → Normalise → Compare → Cluster → Markers]
... but to do that we need interoperable components
• Problem 1: components in different languages
• Problem 2: need format glue!
Our solution
[Figure: Read → Filter → Normalise → Compare → Cluster → Markers, with one CLI per step in a scripts layer, sitting between environments & containers and workflows]
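The idea of interoperable components joined by a common exchange format can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the team's actual scripts layer. The `Matrix` type is a hypothetical in-memory stand-in for a shared exchange format, and the step names (`filter_cells`, `normalise_to_fraction`, `normalise_per_10k`, `run_workflow`) are invented for this sketch. Because every step reads and writes the same type, any step can be swapped for an alternative without touching its neighbours.

```python
from functools import reduce
from typing import Callable, Dict, List

# Hypothetical stand-in for a common exchange format: cell id -> per-gene values.
Matrix = Dict[str, List[float]]
# Every workflow step has the same signature, so steps are interchangeable.
Step = Callable[[Matrix], Matrix]


def filter_cells(min_total: float) -> Step:
    """Build a step that drops cells whose total count is below `min_total`."""
    def step(matrix: Matrix) -> Matrix:
        return {cell: v for cell, v in matrix.items() if sum(v) >= min_total}
    return step


def normalise_to_fraction(matrix: Matrix) -> Matrix:
    """Scale each cell so that its values sum to 1 (empty cells are left as zeros)."""
    return {cell: [x / (sum(v) or 1.0) for x in v] for cell, v in matrix.items()}


def normalise_per_10k(matrix: Matrix) -> Matrix:
    """Alternative normalisation: scale each cell to 10,000 total counts."""
    return {cell: [1e4 * x / (sum(v) or 1.0) for x in v] for cell, v in matrix.items()}


def run_workflow(steps: List[Step], data: Matrix) -> Matrix:
    """Apply the steps in order, passing the exchange object from one to the next."""
    return reduce(lambda acc, step: step(acc), steps, data)


# Made-up counts for three cells and two genes.
counts: Matrix = {"cell_a": [3.0, 1.0], "cell_b": [0.0, 1.0], "cell_c": [5.0, 5.0]}

# Two workflows that differ only in the normalisation component.
result_fraction = run_workflow([filter_cells(2), normalise_to_fraction], counts)
result_per_10k = run_workflow([filter_cells(2), normalise_per_10k], counts)
```

In a real deployment each step would be a command-line tool, possibly written in a different language, reading and writing a shared file format. The function signature here plays the role of that file format.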
Galaxy integrations
• Extended Galaxy init container:
• Thin tool wrappers leveraging Bioconda wrappers
• Starting tertiary workflows
• Added logic for dynamic destinations
• Leverage existing Kubernetes integrations
• Improved LSF functionality for non-DRMAA clusters:
• Improved CLI executor
https://github.com/ebi-gene-expression-group/container-galaxy-sc-tertiary
Pablo
Moreno
Summary
• ArrayExpress/Annotare for data Submissions
• Expression Atlas/Single Cell Expression Atlas
• Analysis Workflows in Galaxy
Open Targets
Data integration Platforms
Drug discovery
• Finding the right biological target for a
drug requires bioinformatics to:
• identify promising targets
• select candidate medicines.
• EMBL-EBI services support all stages
of drug discovery:
• Ensembl
• UniProt
• ChEMBL
• Protein Data Bank in Europe
• Reactome
• Pinpointing the processes in the human body
that have a demonstrable effect on disease
• Aims to improve the success rate in the
discovery and repurposing of medicines
• A new kind of collaboration with:
• GSK
• EMBL-EBI
• Wellcome Sanger Institute
• Biogen
• Takeda
• Celgene
• Sanofi
Open Targets
www.opentargets.org
Open Targets Platform and Open Targets Genetics
www.targetvalidation.org genetics.opentargets.org
Challenges for the near future
• Non-coding SNVs
• Data standardization to enable AI/ML
• Connecting data
• Moving to the cloud
www.ebi.ac.uk
Stay in touch
Twitter: @emblebi
Facebook: EMBLEBI
LinkedIn: /company/ebi
YouTube: EMBLMedia

Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 

Advanced Bioinformatics for Genomics and BioData Driven Research

  • 1. European Bioinformatics Institute - the home for big data in biology www.ebi.ac.uk Advanced Bioinformatics for Genomics and BioData Driven Research
  • 2. The European Molecular Biology Laboratory Heidelberg, Germany Main Laboratory Barcelona, Spain Tissue Biology, Disease Modeling 80+ nationalities Hinxton, Cambridge, UK Bioinformatics Mouse Biology Rome, Italy >1700 personnel Grenoble, France Hamburg, Germany Structural Biology 6 sites in Europe Structural Biology
  • 3. Our mission Deliver excellent research Train the next generation of scientists Engage with industry Coordinate bioinformatics in Europe Deliver scientific services
  • 4. Data and tools to support life science research www.ebi.ac.uk/services Bioinformatics services
  • 5. What services do we provide? Labs around the world send us their data and we… Archive it Classify it Share it with other data providers Analyse, add value and integrate it …provide tools to help researchers use it A collaborative enterprise
  • 6. ~64 million requests to EMBL-EBI websites every day 273 petabytes of raw storage in our data centres 22 500 participants to EMBL-EBI Training events Requests from 20 million unique IP addresses Big Data, big demand for EMBL-EBI data services…
  • 7. Data resources at EMBL-EBI
  • 8. Data resources for Genomics – Molecular Archives BioSamples database - centralised resource for FAIR sample data (>12 million samples) Experimental Factor Ontology - systematic description of experimental variables available in EBI databases and projects (26,764 terms) European Genome-phenome Archive - sequence and genotype experiments, including case-control and population studies (3,445 studies) European Nucleotide Archive (ENA) - record of the world's nucleotide sequencing information (>2,400 million sequences, > 7,200 billion bases) European Variation Archive - sole international resource for human and non-human variation
  • 9. Data resources for Genomics – Genes, Genomes & Variation Ensembl - genome browser (human: >0.6 billion SNV, >6 million SV) Ensembl Genomes - 275 vertebrate species / strains; Metazoa; Plants; Fungi; Protists; Bacteria GWAS Catalog - moved to EBI in 2015 (4,390 publications, >17,000 associations) HGNC - 41,787 approved gene entries (19,320 protein coding) International Genome Sample Resource - ensures future usability and accessibility of 1000 Genomes Project data
  • 10. The Ensembl Variant Effect Predictor: VEP started as a simple wrapper around the Ensembl API to map variants to transcripts and predict molecular consequence. As new data sets and algorithms have become available, functionality has increased and VEP is now an extensive and sophisticated tool.
  • 11. New resource for Genomics • New resource for gene expression and splicing QTLs • https://www.ebi.ac.uk/eqtl/
  • 12. Global Alliance for Genomics and Health (GA4GH) • Chaired by EMBL-EBI Director Ewan Birney • EMBL-EBI teams leading various activities in Technical Work Streams: • Large Scale Genomics (file formats and htsget subgroups) • Clinical and phenotypic data capture • Data Use and Researcher identification • ENA/EGA/EVA and HCA DCP are also Driver Projects
  • 13. Data resources for Genomics – Molecular Atlas • Human Cell Atlas Data Coordination Platform • In 2017, Chan Zuckerberg Initiative (CZI) funding to EMBL-EBI, Broad Institute and the UCSC Genomics Institute, to build a cloud-based data coordination platform • HCA will generate petabytes of data for billions of cells, across multiple modalities, generated by hundreds of labs around the world • DCP will organise, curate, standardise and analyse this data and enable open data access
  • 14. Data resources for Genomics – Proteins and Protein Families A free to use resource for the archiving, assembly, analysis, & browsing of microbiome data Data archiving Assembly Analysis
  • 15. NEW Resource: BioImage Archive Molecules Cells Tissues / Organisms Molecular Machines Graphic courtesy of Jan Ellenberg Light Sheet Microscopy High Throughput Microscopy Superresolution Microscopy Cryo Electron Microscopy Correlate Technologies Integrate Data 0.1 TB / day 0.5 TB / dataset 0.5 TB / day 7.5 TB / dataset 40 TB / day 10 TB / dataset 5 TB / day 20 TB / dataset
  • 18. Pedro Beltrao: Functional landscape of the human phosphoproteome Ochoa et al., Nature Biotech 2019 • Created largest phosphoproteome resource to date (120,000 human phosphosites) • Used machine learning methods to compile and analyse large phosphorylation related biological datasets • Identifying new functional phosphosites has enormous potential to progress research into many biological processes and diseases
  • 19. Evangelia Petsalaki: Inference of kinase-kinase regulatory networks from phosphoproteomics data (collaboration with Beltrao group) Invergo*, Petursson* et al., bioRxiv
  • 20. Moritz Gerstung: Pan-cancer computational histopathology • Analysis with deep learning extracts histopathological patterns • Accurately discriminates 28 cancer and 14 normal tissue types • Predicts: whole genome duplications; focal amplifications and deletions; driver gene mutations • Correlations with gene expression indicative of immune infiltration and proliferation • Prognostic information augments conventional grading and histopathology subtyping https://doi.org/10.1101/813543
  • 21. Zam Iqbal: Mykrobe – predicting TB drug resistance from WGS data https://wellcomeopenresearch.org/articles/4-191/v1
  • 22. Virginie Uhlmann: Mathematical models for bioimage analysis doi.org/10.1371/journal.pone.0173433
  • 23. Dictionary Learning for Two-Dimensional Kendall Shapes https://arxiv.org/abs/1903.11356
  • 24. An example of best practice for complex datasets Single Cell RNA-Seq analysis at EMBL-EBI From Irene Papatheodorou Team Leader – Gene Expression
  • 25. ArrayExpress – functional genomics archive • started in 2000 as an archive for microarray data • evolved into general archive for high-throughput functional genomics data (microarray- or NGS-based) • all data are manually curated prior to inclusion • microarray data stored directly in ArrayExpress • sequencing data brokered to and stored in ENA • curated datasets support reproducible and re-usable research
  • 26. Annotare – Minimum information about a scRNA-Seq experiment single cell isolation single cell well quality OK doublet debris single cell identifier barcode UMI cDNA read pass fail post-analysis single cell quality library construction inferred cell type R1 R2 I1 files sample metadata https://arxiv.org/abs/1910.14623
  • 27. From database to knowledgebase: Expression Atlases 165 baseline expression ~ 3,350 differential expression > 3,500 bulk datasets 62 species > 955,000 assays > 120 single-cell datasets 12 species https://www.ebi.ac.uk/gxa
  • 29. Interactive Analysis with Galaxy https://humancellatlas.usegalaxy.eu/ Flexible Interoperable Scalable
  • 30. Main Points • Enabling rational choices when composing workflows • Using a common exchange format as ‘workflow glue’ • Galaxy integrations
  • 31. What people usually do... Read Filter Normalise Compare Cluster Markers OR Read Filter Normalise Compare Cluster Markers OR Read Filter Normalise Compare Cluster Markers
  • 32. What we really should be doing Read Filter Normalise Compare Cluster Markers
  • 33. ... but to do that we need interoperable components. Problem 1: components in different languages. Problem 2: need format glue! (Figure: multiple alternative Read Filter Normalise Compare Cluster Markers pipelines)
  • 34. Our solution Read Filter Normalise Compare Cluster Markers Environments & containers Workflows CLI CLI CLI CLI CLI CLI Scripts layer
  • 35. Galaxy integrations • Extended Galaxy init container: • Thin tool wrappers leveraging Bioconda wrappers • Starting tertiary workflows • Added logic for dynamic destinations • Leverage existing Kubernetes integrations • Improved LSF functionality for non-DRMAA clusters: • Improved CLI executor https://github.com/ebi-gene-expression-group/container-galaxy-sc-tertiary Pablo Moreno
  • 37. Summary • ArrayExpress/Annotare for data submissions • Expression Atlas/Single Cell Expression Atlas • Analysis Workflows in Galaxy
  • 39. Drug discovery • Finding the right biological target for a drug requires bioinformatics to: • identify promising targets • select candidate medicines. • EMBL-EBI services support all stages of drug discovery: • Ensembl • UniProt • ChEMBL • Protein Data Bank in Europe • Reactome
  • 40. • Pinpointing the processes in the human body that have a demonstrable effect on disease • Aims to improve the success rate in the discovery and repurposing of medicines • A new kind of collaboration with: • GSK • EMBL-EBI • Wellcome Sanger Institute • Biogen • Takeda • Celgene • Sanofi Open Targets www.opentargets.org
  • 41. Open Targets Platform and Open Targets Genetics www.targetvalidation.org genetics.opentargets.org
  • 42. Challenges for the near future • Non-coding SNVs • Data standardization to enable AI/ML • Connecting data • Moving to the cloud
  • 44. www.ebi.ac.uk Stay in touch Twitter: @emblebi Facebook: EMBLEBI LinkedIn: /company/ebi YouTube: EMBLMedia
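
The workflow idea from slides 30 to 34 (interchangeable steps joined by a common exchange format) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the slides do not name the exchange format or any function, so the JSON layout (`genes`, `cells`, `counts`) and the helpers `filter_cells`, `normalise_to_fraction`, `normalise_per_10k` and `run_pipeline` are hypothetical. The JSON round trip between steps stands in for separate command-line tools exchanging files.

```python
import json
from typing import Callable, Dict, List

# Hypothetical exchange format (not from the slides): a JSON-serialisable dict
# with "genes" (column names), "cells" (row names) and "counts" (rows of values).
Exchange = Dict[str, list]
Step = Callable[[Exchange], Exchange]


def filter_cells(data: Exchange, min_total: int = 1) -> Exchange:
    """Keep only cells whose total count is at least min_total."""
    kept = [
        (cell, row)
        for cell, row in zip(data["cells"], data["counts"])
        if sum(row) >= min_total
    ]
    return {
        "genes": data["genes"],
        "cells": [cell for cell, _ in kept],
        "counts": [row for _, row in kept],
    }


def normalise_to_fraction(data: Exchange) -> Exchange:
    """Scale each cell so that its values sum to 1."""
    counts = []
    for row in data["counts"]:
        total = sum(row) or 1  # avoid division by zero for empty cells
        counts.append([value / total for value in row])
    return {**data, "counts": counts}


def normalise_per_10k(data: Exchange) -> Exchange:
    """Alternative normaliser: scale each cell to 10,000 total counts."""
    counts = []
    for row in data["counts"]:
        total = sum(row) or 1
        counts.append([value * 10000 / total for value in row])
    return {**data, "counts": counts}


def run_pipeline(data: Exchange, steps: List[Step]) -> Exchange:
    """Apply steps in order, passing data through JSON between steps.

    The round trip mimics each step being a separate tool that only
    understands the shared exchange format.
    """
    for step in steps:
        data = json.loads(json.dumps(data))
        data = step(data)
    return data


raw = {
    "genes": ["g1", "g2"],
    "cells": ["c1", "c2", "c3"],
    "counts": [[1, 3], [0, 0], [5, 5]],
}

# The normalisation step is swapped without touching the rest of the workflow.
result_a = run_pipeline(raw, [filter_cells, normalise_to_fraction])
result_b = run_pipeline(raw, [filter_cells, normalise_per_10k])
print(result_a["cells"], result_a["counts"])
print(result_b["cells"], result_b["counts"])
```

Because every step reads and writes the same structure, either normaliser can be dropped into the pipeline, which is the "rational choice when composing workflows" that the slides argue for. A real deployment would wrap each step as a CLI tool in its own environment or container, as slide 34 indicates.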