Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their impact on genomic data analysis


This talk discusses how changes over time to the Gene Ontology (GO) and to GO annotations (GOA) can affect genomic data analysis, in particular enrichment results. Enrichment analysis was run on more than 2,500 published gene lists (0.5 to 16 years old, median 11), once with GO/GOA from the time of publication and once with a recent version. The two sets of results become less semantically similar as lists age: for 47% of the lists, similarity falls below the 95th percentile of a null distribution built from random pairs of lists, and similarity correlates negatively with age (-0.34). Objective changes can nonetheless conflict with subjective impressions, because terms that disappear are often replaced by closely related ones. Researchers are encouraged to use the GOtrack database to evaluate how such changes affect their own data and results.


Monitoring changes in the Gene Ontology and their impact on genomic data analysis
Paul Pavlidis, PhD
University of British Columbia, Vancouver, BC Canada
https://pavlab.msl.ubc.ca
October 25, 2018
GigaScience Prize Track

Matthew Jacobson, Adriana Estela Sedeño-Cortés

The Gene Ontology in 60 seconds
GO = Hierarchy of >45,000 terms describing gene function
Applied by annotators to genes with evidence codes (“GO annotations” = GOA)
Used in tens of thousands of papers
• Gene description
• Algorithm evaluation
• Enrichment analysis
[Figure label: GRIN1]

Both GO and GOA change over time
Does it matter?
• Are old enrichment results and other interpretations based on GO still valid?
• Will new results be valid in the future?
Researchers have no easy way to evaluate the effects on their own data.

GOtrack database
• Data for 9 model organisms
• Dating back to 2001
• Over 200,000,000 data points
• Updated monthly
Web app functionality
• Track genes and terms
• Track enrichment results

Annotation churn
[Figure legend: term present, term absent]

Ingredients for an enrichment analysis
[Figure label: Statistical test]
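The slide names only a "statistical test" and does not say which one is used. As an illustration, here is a minimal sketch of one common choice for over-representation analysis, a one-sided hypergeometric test. The function name, the gene identifiers, and the two annotation snapshots are hypothetical and are not taken from GOtrack.

```python
from math import comb

def hypergeom_enrichment_p(hit_genes, term_genes, background):
    """One-sided hypergeometric p-value, P(overlap >= observed).

    hit_genes:  genes in the hit list
    term_genes: genes annotated with one GO term (from one GOA snapshot)
    background: all genes considered in the analysis
    """
    background = set(background)
    hits = set(hit_genes) & background
    annotated = set(term_genes) & background
    N, K, n = len(background), len(annotated), len(hits)
    k = len(hits & annotated)
    # Sum the upper tail of the hypergeometric distribution with exact integers.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Hypothetical data: the same hit list tested against two annotation snapshots.
background = {f"g{i}" for i in range(10)}
hit_list = {"g0", "g1", "g2"}
term_at_t0 = {"g0", "g1", "g2", "g3"}   # annotations at publication time
term_at_tnow = {"g3", "g4", "g5"}       # annotations after later revisions

p_t0 = hypergeom_enrichment_p(hit_list, term_at_t0, background)
p_tnow = hypergeom_enrichment_p(hit_list, term_at_tnow, background)
```

Because the annotated gene set is an input to the test, the same hit list can yield a different p-value for the same term when only the annotation snapshot changes.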

Evaluating the effect of GO/GOA changes
Inputs: Gene lists from MSigDB
• >2,500 “hit lists” from the Chemical and genetic perturbations (CGP) collection
• 0.5 to 16 years old (median 11)

Evaluating the effect on enrichment analysis
• Perform enrichment analysis using GO/GOA for the time of publication (t0) and for a recent time point (tnow)
• Compare the lists of enriched terms at t0 and tnow using semantic similarity measures (Jaccard and others)
• Define a null distribution: t0-tnow comparisons for randomly selected pairs of hit lists
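The comparison step can be sketched as follows, assuming the enriched terms for each hit list are already available as sets. Only the plain Jaccard index is shown, not the other similarity measures the talk mentions. The function names, the dictionary layout, and the convention for two empty sets are assumptions made for this illustration.

```python
import random

def jaccard(terms_a, terms_b):
    """Jaccard index between two sets of enriched GO terms."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    # Assumed convention: two empty result sets count as identical.
    return len(a & b) / len(union) if union else 1.0

def null_similarities(enriched_t0, enriched_tnow, n_pairs, seed=0):
    """t0-tnow similarities for randomly selected pairs of different hit lists.

    enriched_t0 / enriched_tnow: dicts mapping hit list name -> set of terms.
    """
    rng = random.Random(seed)
    names = sorted(enriched_t0)
    null = []
    for _ in range(n_pairs):
        first, second = rng.sample(names, 2)  # two distinct hit lists
        null.append(jaccard(enriched_t0[first], enriched_tnow[second]))
    return null

# Hypothetical enrichment results for three hit lists.
enriched_t0 = {"A": {"x", "y"}, "B": {"y", "z"}, "C": {"w"}}
enriched_tnow = {"A": {"x", "y", "z"}, "B": {"z"}, "C": {"v"}}

observed = {name: jaccard(enriched_t0[name], enriched_tnow[name]) for name in enriched_t0}
null = null_similarities(enriched_t0, enriched_tnow, n_pairs=100)
```

Pairing the t0 results of one hit list with the tnow results of a different one gives a baseline for how similar two unrelated result sets are, which is what the observed same-list similarities are judged against.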

New results tend to have more significant terms
Mean number of significant terms: t0 = 21; tnow = 110.
[Figure: one point = one hit list]

Semantic similarity drops over time
• Overall, 47% have results less similar than the 95th percentile of the null
• Correlation between similarity and age is -0.34
[Figure legend: null (random hit list pairs); all t0-tnow comparisons]
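The two summary statistics on this slide can be sketched as below. The slide does not state which percentile definition or which correlation coefficient was used, so the nearest-rank percentile and the Pearson coefficient here are assumptions, and all names and numbers in the example are hypothetical.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile (assumed definition; others interpolate)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def fraction_below_null(observed, null, pct=95):
    """Share of observed similarities below the given percentile of the null."""
    threshold = percentile(null, pct)
    return sum(s < threshold for s in observed) / len(observed)

def pearson(xs, ys):
    """Pearson correlation coefficient (assumed; the slide does not say which)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical inputs: null similarities, observed similarities, list ages in years.
null = [i / 100 for i in range(1, 101)]
observed = [0.0, 0.5, 0.96, 0.99]
ages = [16, 11, 3, 0.5]

share_below = fraction_below_null(observed, null)
r = pearson(observed, ages)
```

A negative coefficient here would mean that older hit lists tend to have lower t0-tnow similarity, which is the direction the slide reports.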

Objective changes may conflict with subjective impressions
DNA replication
mitosis
M phase of mitotic cell cycle
DNA modification
biopolymer methylation
methylation
pattern specification process
regulation of gene expression, epigenetic
somatic stem cell population maintenance
stem cell population maintenance
maintenance of cell number
DNA replication initiation
G1/S transition of mitotic cell cycle
cell cycle G1/S phase transition
mitotic nuclear division
gene silencing
cell fate specification
endoderm development
[Figure: enriched terms at t0 versus tnow for one hit list, shown as an extreme case: Jaccard similarity = 0.0]

Conclusions
• Enrichment results change over time, but how much this matters is difficult to predict
• Use GOtrack to judge for yourself

Sanja Rogic
Shreejoy Tripathy
Lilah Toker
Ogan Mancarci
Marjan Farahbod
Manuel Belmadani
Alex Morin
Margot Gunning
Eric Chu
Nivi Thatra
Nathaniel Lim
Shams Bhuiyan
Simran Rai
Stepan Tesar
Dima Vavilov
Aman Sharma
Calvin Chang
John Phan
Jimmy Liu
Former members
Min Feng
Ellie Hogan
Sophia Ly
Cindy-Lee Crichlow
Brandon Huntington
Ben Callaghan
Matthew Jacobson
Dmitry Tebaykin
James Liu
Patrick Savage
Brenna Li
Justin Leong
Nikolaus Fortelny
Nathan Holmes
Patrick Tan
Kris Anderson
Rachel Edgar
Elodie Portales-Casamar
Adri Sedeño
Jesse Gillis
Leon French
Carolyn Ch’ng
Meeta Mistry
Raymond Lim
Eloi Mercier
Anton Zoubarev
Cameron McDonald
Thea Van Rossum
Nicolas St. George
Frances Lui
Artemis Lai
Gayathiri Charathsandran
Luchia Tseng
John Choi
Fangwen Zhao
Jenni Hantula
Tianna Koreman
Olivia Marais
Hugh Brown
Celia Siu
Cathy Kwok
Willie Kwok
Nathan Eveleigh
Collaborators
Kurt Haas
Doug Allen
Tim O’Connor
Cathy Rankin
Chris Loewen
Chris Overall
Shernaz Bamji
Michael Kobor
Geoff Hicks
Suzanne Lewis
Etienne Sibille
Gustavo Turecki

Editor's notes

Previous work
Previous work tested only small numbers of gene sets, usually over shorter periods of time. This analysis also allows us to look at the effect of time.
Stability analysis of 2,573 published hit lists. (A) Change in number of significant GO terms. Each point is one CGP hit list. Points are jittered to reduce overplotting. (B) Similarity of enrichment results, using the complete Jaccard index. The CGP hit lists are binned into most recent (orange), old (green), and oldest (blue). The distribution for the CPs is in black. The blue vertical line indicates the 95%ile of the null.
This is a worst-case scenario for overlap, but completely typical for how the results look.
“The other side of this situation is whether objectively low scores (compared to the null) match subjective impressions of ‘instability’, as well. The answer is yes, but arguably less convincingly. For example, the hit list BENPORATH_ES_2 (Ben-Porath et al., 2008, 40 genes) has a complete Jaccard similarity between t0 and tnow of 0.0. At t0, the enriched terms included ‘DNA replication’, ‘mitosis’, ‘methylation’, and ‘epigenetic regulation of gene expression’. While none of these terms are enriched at tnow, highly related terms such as ‘DNA replication initiation’, ‘mitotic nuclear division’ and ‘gene silencing’ are enriched (Supplementary File 2).”
The hit list is from “An embryonic stem cell–like gene expression signature in poorly differentiated aggressive human tumors”.