Jo McEntyre, EMBL-EBI
Mining Data Availability Statements for GWAS data
GWAS and the GWAS Catalog
• GWAS analyse variants across the genome to identify loci associated with a disease or phenotype
GWAS Catalog data
Study metadata including:
- Trait
- Sample information
Publication information
Results
- Lead associations
- Summary statistics
GWAS Catalog content
As of October 2019
• 4,220 publications
• 7,661 studies
• 157,336 variant-trait associations
• 276 publications with summary statistics, >8,000 datasets
www.ebi.ac.uk/gwas
What is Europe PMC?
Europe PMC: free digital archive of biomedical and life sciences research publications
Content in Europe PMC
Europe PMC is a partner in PubMed Central International
Text mining infrastructure
• Gene-disease relationships
• Mutations
• GeneRIFs
• Diseases and phenotypes
• Phosphorylation events
• Transcription factor-target interactions
• Organisms
• Gene/proteins
• GO terms
• ChEBI
• EFO
• Grants
• Accession numbers
Text mining platform: SciLite application
Accession numbers mined from full text publications
ELIXIR Core Data Resources and Deposition Databases
Cross-links between GWAS and Europe PMC
Data Availability statements in Europe PMC
<title> and XML path

Title                            XML path                       Frequency
Data Availability                article:front:notes            90,928
Data accessibility               article:back:sec               2,694
Data Availability                article:back:sec:fn-group      2,580
Data                             article:body:sec               2,265
Availability of supporting data  article:body:sec               1,593
Major datasets                   article:back:sec:sec           1,074
Database survey                  article:body:sec               986
Extended Data                    article:body:sec               851
Data availability                article:body:sec               795
Extended Data Figure 1           article:body:sec:SecTag:fig    689

Top 10 combinations of <title> content containing “data” and XML path
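A table like the one above could be produced by walking the article XML and recording, for each <title> that mentions "data", the chain of elements that contains it. The following is only a minimal sketch of that idea, not the Europe PMC pipeline: the function name, the sample XML and the choice to join ancestor tags with ":" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def data_titles_with_paths(xml_text):
    """Return (title text, ancestor path) pairs for <title> elements mentioning 'data'."""
    root = ET.fromstring(xml_text)
    results = []

    def walk(element, ancestors):
        path_here = ancestors + [element.tag]
        for child in element:
            if child.tag == "title":
                # Collapse whitespace and include text of nested inline tags.
                text = " ".join("".join(child.itertext()).split())
                if "data" in text.lower():
                    results.append((text, ":".join(path_here)))
            walk(child, path_here)

    walk(root, [])
    return results


# Hypothetical, heavily simplified article used only to exercise the function.
SAMPLE = """<article>
  <front><notes><title>Data Availability</title><p>All data are in the paper.</p></notes></front>
  <body><sec><title>Methods</title></sec><sec><title>Data availability</title></sec></body>
  <back><sec><title>Data accessibility</title></sec></back>
</article>"""

pairs = data_titles_with_paths(SAMPLE)
frequency = Counter(pairs)
for (title, path), count in frequency.most_common():
    print(title, path, count)
```

Run over a full-text corpus, the Counter would give the frequency column; here it only counts the three matching titles in the sample.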
Some unhelpful statements
Curating papers for the GWAS catalog
GWAS Catalog literature identification: Query based vs machine learning

            Query-based   Machine learning
Precision   6%            27%
Recall      100%          96%

Improved efficiency: 80% reduction in publications to review (average 144 to 30/week)
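The figures above can be cross-checked with a little arithmetic. The sketch below assumes, which the slide does not state, that the weekly counts and the precision values describe the same stream of candidate papers; the variable and function names are illustrative.

```python
# Figures taken from the slide; treating them as one comparable weekly stream is an assumption.
QUERY_PER_WEEK, QUERY_PRECISION = 144, 0.06
ML_PER_WEEK, ML_PRECISION = 30, 0.27


def expected_eligible(papers_per_week, precision):
    """Expected number of truly eligible papers among those sent for review."""
    return papers_per_week * precision


query_eligible = expected_eligible(QUERY_PER_WEEK, QUERY_PRECISION)
ml_eligible = expected_eligible(ML_PER_WEEK, ML_PRECISION)
workload_reduction = (QUERY_PER_WEEK - ML_PER_WEEK) / QUERY_PER_WEEK

print(f"eligible per week, query-based: {query_eligible:.1f}")
print(f"eligible per week, machine learning: {ml_eligible:.1f}")
print(f"share retained: {ml_eligible / query_eligible:.0%}")
print(f"review workload reduction: {workload_reduction:.0%}")
```

Under that assumption both methods surface roughly eight to nine eligible papers a week, which is in line with the reported 96% recall, while the number of papers a curator has to read drops by about four fifths.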
Summary statistics in the GWAS Catalog by publication year
% of publications with summary statistics over time & in the whole Catalog
Summary statistics for users
Facilitating data integration and downstream analyses
The end
GWAS Catalog literature identification
• Previously used a manual, query-based search
• Query: genomewide OR genome wide OR genome-wide OR GWAS
• Now replaced with a machine learning based search
• Convolutional neural net trained on a corpus of GWAS Catalog publications
• Collaboration with Zhiyong Lu’s group (Lee et al, PMID 30102703, PLoS Comp Bio)
• ML results triaged by curator in a custom PubTator interface
Old literature search and triage process
• Manual search in PubMed
• Query: genomewide OR genome wide OR genome-wide OR GWAS
• Curator assesses each publication for eligibility for inclusion in GWAS Catalog
• Specific eligibility criteria: https://www.ebi.ac.uk/gwas/docs/methods/criteria
• Genome-wide association study of >100,000 variants distributed across the genome
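The slides describe the old search as manual. For readers who want to reproduce the same query programmatically, one option is the NCBI E-utilities esearch endpoint; the sketch below only builds the request URL and makes no network call. Using E-utilities at all, the `retmax` value and the function name are assumptions, not part of the GWAS Catalog process.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint for PubMed.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query string exactly as given on the slide.
QUERY = "genomewide OR genome wide OR genome-wide OR GWAS"


def build_search_url(term, retmax=200):
    """Build an esearch URL that asks PubMed for matching PMIDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


url = build_search_url(QUERY)
print(url)
```

Every PMID returned by such a search would still need the curator's eligibility check described above, which is where the low precision of the query-based approach shows up as workload.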
Machine learning search
Deep learning algorithm (convolutional neural net) trained on corpus of GWAS Catalog publications
Figure 1. Lee et al, PMID 30102703, PLoS Comp Bio
GWAS Catalog machine learning literature search method
• Precision 27%
• Recall 96%
Table 3. Lee et al, PMID 30102703, PLoS Comp Bio
GWAS Catalog machine learning literature search method vs query based search
Machine learning:
• Improved efficiency (80% reduction in publications to review, 144 to 30/week)
• Similar capture of eligible studies
Table 3. Lee et al, PMID 30102703, PLoS Comp Bio
Uses
• Narrow down/prioritise candidate loci
• Drug target discovery
• Predict disease risk
• Understand disease mechanism
• Statistics on disease data and research
DOI citations within DASs
Most popular data repositories based on DOI citations in DASs (Jan-Mar 2019)
(?i)(10[.]\d{4,9})(?=/)(?=[-._;()/:A-Z0-9]+)
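The pattern above captures only the DOI prefix (the registrant part before the slash), which is enough to group citations by repository. A minimal sketch of how it might be applied to a batch of statements follows, taking the digit part of the pattern to be `\d{4,9}`. The example statements and their DOI suffixes are made up for illustration, and the function name is hypothetical.

```python
import re
from collections import Counter

# DOI prefix followed by "/" and at least one valid DOI character (both checked by lookahead).
DOI_PREFIX = re.compile(r"(?i)(10[.]\d{4,9})(?=/)(?=[-._;()/:A-Z0-9]+)")


def count_doi_prefixes(statements):
    """Count DOI prefixes across a list of data availability statements."""
    counts = Counter()
    for text in statements:
        counts.update(DOI_PREFIX.findall(text))
    return counts


# Hypothetical statements; 10.5061 and 10.5281 are the Dryad and Zenodo prefixes.
statements = [
    "Data are available from Dryad: https://doi.org/10.5061/dryad.abc12",
    "Deposited at Zenodo (doi:10.5281/zenodo.1234567) and Dryad (10.5061/dryad.xyz).",
    "All relevant data are within the paper.",
]

prefix_counts = count_doi_prefixes(statements)
for prefix, count in prefix_counts.most_common():
    print(prefix, count)
```

Mapping each prefix to a repository name would need a separate lookup table, which the slides do not provide.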
