Games for gene annotation and phenotype classification
                                                   Andrew I. Su, Salvatore Loguercio, Benjamin M. Good
 Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA


ABSTRACT

The Empire State Building was built with 7 million hours of human effort. The Panama Canal took 20 million hours to complete. By comparison, it is estimated that up to 150 billion hours are spent playing games every year (9 billion on Solitaire alone). Obviously people play games because they are enjoyable and fun. But aside from that enjoyment, games largely result in no tangible benefit, neither to the individual nor to society at large.

Recently, several groups have built “games with a purpose”, a class of games that focuses on collaboratively harnessing gamers for productive ends. In biology, games have been built to fold proteins and RNAs, and to perform multiple sequence alignment. Here, we present our efforts to apply games to two critical challenges in genetics.

First, we have built games focused on organizing and structuring gene annotations. With the increasing popularity of genome-scale science, many analysis strategies (including gene set enrichment, pathway analysis, and cross-species comparisons) depend on comprehensive and accurate gene annotations. These structured annotations are mostly the result of centralized manual curation efforts, but these initiatives do not scale well with the explosive growth of the biomedical literature. We describe several games that target working biologists to extract their expert domain knowledge in computable form.

Second, we describe a game for predicting human phenotypes from molecular descriptors. Researchers can now relatively easily characterize any biological sample according to a number of features, including genotype, gene expression, and epigenetics. A key challenge in the field is identifying exactly which of those molecular features can be used to predict a clinical phenotype like disease susceptibility or adverse drug events. While statistical classifiers have been applied to this challenge, they typically do not incorporate prior biological knowledge, and they often fail to replicate in external test populations. Here, we present results from ‘The Cure’, a game to help identify biomarker gene sets that can be used to improve predictions of breast cancer prognosis based on gene expression.

Play these games now at: http://genegames.org

Game 1: Dizeez

• Purpose: identify new gene-disease links
• Rules:
  • Select biological area (e.g. ‘cancer’) to start game.
  • Given a gene, guess the related disease.
  • Points are awarded for correct guesses within one minute.
  • ‘Correct’ answers are drawn from text mining.
• Data:
  • When several different players suggest the same ‘incorrect’ gene-disease link, we detect a new candidate gene annotation.

Dizeez Results

  • Time frame: 2 months
  • Unique players: 230
  • Games played: 1,045
  • Guesses collected: 8,525
  • Unique gene-disease pairs: 6,941
  • Guesses that match existing annotation: 4,804 (69%)
  • For 14 novel gene-disease pairs guessed by >3 players, 9 (64%) were validated by a literature search
  • Player consensus correlates with probability of validation [1]

Game 2: GenESP

  • Direct reward for consensus formation
  • Multiplayer
  • Open-ended
  • Tested pattern [2]
  • Work in progress

[Screenshot caption: Guess what genes your partner is thinking about when they see ‘neuroblastoma’]

Game 3: The Cure

The Challenge

[Figure: find patterns that separate cancer from normal samples, then make predictions on new samples]

• With tens of thousands of measurements but only hundreds of samples, many possible patterns are found.
• But which ones are real?
• Prior knowledge encoded in databases has been used to improve classifiers by guiding the search for predictive gene sets [3]
• What about knowledge that is not recorded in structured databases?
• The Cure is designed to motivate and enable people to help improve the feature selection step for predictor inference.

http://genegames.org/cure/

The Game

• Goal: pick the best set of genes.
• Best: the gene set that produces the best decision tree classifier of breast cancer prognosis.
• Classifier: created using training data and selected genes, used to predict phenotype.
• Score: cross-validation performance of decision tree using selected genes and training data.

[Screenshot annotations: Gene info provided from Gene Ontology and GeneRIFs. Search box highlights genes with an annotation match. Your current ‘hand’; a round ends at 5 cards. Decision trees are built automatically using the genes in players’ hands.]

RESULTS

• 214 players registered (125 in 1st week): 40% have a PhD.
• 3,954 games played in 47 days

[Figure labels: genes selected at highest frequency; clinical data (age, etc.)]

• Predictor scored 69% correct on Sage Breast Cancer Prognosis Challenge test set. [4]
• (Best of all submitted predictors scored 72%)
• Awaiting results on external validation set.

REFERENCES

1. Salvatore Loguercio, Benjamin M. Good, Andrew I. Su (2012) Dizeez: an online game for human gene-disease annotation. In: Bio-Ontologies SIG, ISMB: 15 July 2011, Vienna. http://bio-ontologies.knowledgeblog.org/438
2. Luis Von Ahn and Laura Dabbish (2004) Labeling images with a computer game. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
3. Janus Dutkowski and Trey Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology
4. Sage Bionetworks: DREAM7 Breast Cancer Prognosis Challenge. http://www.the-dream-project.org/challenges/sage-bionetworks-dream-breast-cancer-prognosis-challenge

Contact and Acknowledgements

Benjamin Good: bgood@scripps.edu @bgood, Andrew Su: asu@scripps.edu @andrew.su

We acknowledge support from the National Institute of General Medical Sciences (GM089820 and GM083924) and the NIH through the FaceBase Consortium for a particular emphasis on craniofacial genes (DE-20057).
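
The Dizeez data-collection rule (several different players suggesting the same ‘incorrect’ gene-disease link yields a candidate annotation) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the game's actual code: the function name `candidate_links`, the tuple layout of a guess, the placeholder gene and disease names, and the reuse of the ">3 players" threshold from the Dizeez results are all hypothetical.

```python
from collections import defaultdict

def candidate_links(guesses, known_links, more_than=3):
    """Return novel gene-disease pairs guessed by more than `more_than` distinct players.

    guesses     -- iterable of (player, gene, disease) tuples (assumed layout)
    known_links -- set of (gene, disease) pairs already annotated
    """
    players_by_pair = defaultdict(set)
    for player, gene, disease in guesses:
        pair = (gene, disease)
        if pair not in known_links:          # only 'incorrect' guesses are of interest
            players_by_pair[pair].add(player)  # a set counts each player once
    return {pair: len(players)
            for pair, players in players_by_pair.items()
            if len(players) > more_than}

# Placeholder data, invented purely for illustration.
known = {("GENE_B", "disease_y")}
guesses = [
    ("p1", "GENE_A", "disease_x"),
    ("p2", "GENE_A", "disease_x"),
    ("p3", "GENE_A", "disease_x"),
    ("p4", "GENE_A", "disease_x"),
    ("p4", "GENE_A", "disease_x"),  # repeat guess by the same player
    ("p1", "GENE_B", "disease_y"),  # matches an existing annotation
    ("p2", "GENE_C", "disease_z"),  # novel, but only one player
]
flagged = candidate_links(guesses, known)
```

Counting distinct players rather than raw guesses keeps one enthusiastic player from creating a "consensus" alone, which fits the poster's observation that player consensus correlates with the probability of validation.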

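
The scoring rule in The Cure (cross-validation performance of a decision tree built from the selected genes and the training data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a small from-scratch tree using Gini impurity stands in for whatever tree learner the game actually uses, and the names `build_tree`, `score_hand`, the fold scheme, the depth limit, and the toy expression values are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most common label; ties are broken alphabetically for determinism."""
    counts = Counter(labels)
    return max(sorted(counts), key=lambda lab: counts[lab])

def build_tree(rows, labels, genes, depth=0, max_depth=3):
    """Grow a small binary tree that splits on expression thresholds."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None  # (weighted impurity, gene, threshold)
    for gene in genes:
        values = sorted({row[gene] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for r, y in zip(rows, labels) if r[gene] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[gene] > threshold]
            if not left or not right:
                continue
            cost = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or cost < best[0]:
                best = (cost, gene, threshold)
    if best is None:  # no gene in the hand separates these samples
        return majority(labels)
    _, gene, threshold = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[gene] <= threshold]
    hi = [(r, y) for r, y in zip(rows, labels) if r[gene] > threshold]
    return (gene, threshold,
            build_tree([r for r, _ in lo], [y for _, y in lo], genes, depth + 1, max_depth),
            build_tree([r for r, _ in hi], [y for _, y in hi], genes, depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf; leaves are plain labels, nodes are tuples."""
    while isinstance(tree, tuple):
        gene, threshold, low_branch, high_branch = tree
        tree = low_branch if row[gene] <= threshold else high_branch
    return tree

def score_hand(rows, labels, hand, k=5):
    """k-fold cross-validated accuracy of a tree restricted to the genes in `hand`."""
    n, correct = len(rows), 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        tree = build_tree([rows[i] for i in train], [labels[i] for i in train], hand)
        correct += sum(predict(tree, rows[i]) == labels[i] for i in test)
    return correct / n

# Toy expression values, invented for illustration only.
gene_a = [1.0, 1.2, 1.4, 1.6, 1.8, 8.0, 8.2, 8.4, 8.6, 8.8]  # separates the classes
gene_b = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]  # arbitrary values
rows = [{"GENE_A": a, "GENE_B": b} for a, b in zip(gene_a, gene_b)]
labels = ["good"] * 5 + ["poor"] * 5

score_a = score_hand(rows, labels, ["GENE_A"])  # 1.0 on this toy data
score_b = score_hand(rows, labels, ["GENE_B"])
```

Because each fold's tree is trained without the held-out samples, a hand only scores well if its genes generalize beyond the samples used to fit the tree, which is the property the game rewards.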

ASHG poster - Games for gene annotation and phenotype classification
