The Gene Wiki: Crowdsourcing human gene annotation

Andrew Su, Ph.D.
Department of Molecular and Experimental Medicine
The Scripps Research Institute

Biocuration 2012
April 2, 2012
2
The Long Tail is a prolific source of content

[Figure: content produced (y-axis) against contributors, sorted (x-axis), with the Short Head and Long Tail regions labeled]

                     Short Head          Long Tail
  News:              Newspapers          Blogs
  Video:             TV/Hollywood        YouTube
  Product reviews:   Consumer reports    Amazon reviews
  Food reviews:      Food critics        Yelp
  Talent judging:    Olympics            American Idol
  Gene annotation:   Manual curation     Gene Wiki
3




We can harness the Long Tail of scientists to directly participate in the gene annotation process.
4
Wikipedia is reasonably accurate
5
Wikipedia has breadth and depth

[Figure: Wikipedia compared with Britannica Online by number of articles and by words (millions). Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008]
6
Filtering, extracting, and summarizing PubMed

[Figure: Documents -> Concepts]
7
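Wiki success depends on a positive feedback

[Diagram: positive feedback loop linking Gene wiki page utility, number of contributors, and number of users; values shown on the slide: 1 and 2 beside contributors, 100 and 200 beside users]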
8
10,000 gene “stubs” within Wikipedia

[Figure: annotated Gene Wiki stub page. Labeled page elements:]
   • Gene summary
   • Protein interactions
   • Linked references
   • Protein structure
   • Symbols and identifiers
   • Gene Ontology annotations
   • Tissue expression pattern
   • Links to structured databases

Huss, PLoS Biol, 2008
9
Gene Wiki has a critical mass of readers

Total: ~4.3 million views / month

Huss, PLoS Biol, 2008; Good, NAR, 2011
10
Gene Wiki has a critical mass of editors

   • ~10,000 words added / month
   • Total 1.42 million words ≈ 230 full-length articles
   • 4.3 million views / month
   • 1000 edits / month

[Figure: cumulative edits, split into productive edits and vandalism]

Good, NAR, 2011
11
A review article for every gene is powerful

   • Reelin: 68 editors, 543 edits since July 2002
   • Heparin: 175 editors, 320 edits since June 2003
   • AMPK: 44 editors, 84 edits since March 2004
   • RNAi: 232 editors, 708 edits since October 2002

[Callouts on the example article: references to the literature; hyperlinks to related concepts]
12
Making the Gene Wiki more computable

[Figure: Free text -> Structured annotations]
13
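Filling the gaps in gene annotation

[Diagram: a gene article is tied to NCBI Entrez Gene: 3362 by the Gene Wiki mapping; a wikilink in that article is tied to GO:0004993 by a GO exact synonym; together these yield a candidate assertion between the gene and the GO term]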
14
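Filling the gaps in gene annotation

[Diagram: a gene article is tied to NCBI Entrez Gene: 334 by the Gene Wiki mapping; a wikilink in that article is tied to GO:0006897 by a GO exact match; together these yield a candidate assertion between the gene and the GO term]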
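The two diagrams above can be read as a small join between three lookups. The following is a minimal Python sketch under stated assumptions: the Entrez Gene IDs and GO IDs come from the slides, but the article titles, wikilink titles, and synonym strings are hypothetical placeholders (the slides do not show them), and `candidate_assertions` is an illustrative helper rather than the authors' code.

```python
# Gene Wiki mapping: Wikipedia article title -> NCBI Entrez Gene ID.
# Gene IDs are from the slides; the article titles are hypothetical placeholders.
GENE_WIKI_MAPPING = {
    "Article A": 3362,
    "Article B": 334,
}

# GO exact synonyms or labels (lower-cased) -> GO ID.
# GO IDs are from the slides; the label strings are hypothetical placeholders.
GO_EXACT_SYNONYMS = {
    "term x": "GO:0004993",
    "term y": "GO:0006897",
}

# Wikilinks found in each article (hypothetical).
ARTICLE_WIKILINKS = {
    "Article A": ["Term X", "Unrelated page"],
    "Article B": ["Term Y"],
}


def candidate_assertions(mapping, wikilinks, synonyms):
    """Return (entrez_gene_id, go_id) pairs for wikilinks that exactly match a GO synonym."""
    results = []
    for article, gene_id in mapping.items():
        for link in wikilinks.get(article, []):
            go_id = synonyms.get(link.strip().lower())
            if go_id is not None:
                results.append((gene_id, go_id))
    return results


assertions = candidate_assertions(GENE_WIKI_MAPPING, ARTICLE_WIKILINKS, GO_EXACT_SYNONYMS)
```

Wikilinks with no exact GO synonym (such as "Unrelated page" here) simply produce no assertion.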
15
Disease associations mined from the Gene Wiki
Good, BMC Genomics 2011, 12:603

Pipeline: Gene Wiki articles (10,271) -> filter out seeded text -> NCBO Annotator -> matched Disease Ontology terms (2983) -> compare to DO database -> 2147 candidate annotations

Comparison to DO database:
   • 23% exact match
   • 5% match parent
   • 2% match child
   • 70% have no match
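The "compare to database" step sorts each mined term into one of four buckets. A minimal Python sketch of that comparison follows. It rests on assumptions not stated on the slide: "match parent" is read as "the mined term is a direct parent of a term already annotated to the gene", "match child" as the reverse, and the term IDs and `classify` helper are hypothetical.

```python
# Toy ontology: child term -> set of direct parent terms (hypothetical IDs).
PARENTS = {
    "T:2": {"T:1"},
    "T:3": {"T:2"},
}


def classify(candidate, known_terms, parents):
    """Compare one mined term for a gene with that gene's known terms."""
    if candidate in known_terms:
        return "exact match"
    # Candidate is a direct parent of a known term (less specific than known).
    if any(candidate in parents.get(k, set()) for k in known_terms):
        return "match parent"
    # Candidate is a direct child of a known term (more specific than known).
    if parents.get(candidate, set()) & set(known_terms):
        return "match child"
    return "no match"


known = {"T:2"}  # terms already in the reference database for this gene
labels = {c: classify(c, known, PARENTS) for c in ["T:2", "T:1", "T:3", "T:9"]}
```

A fuller version would probably walk all ancestors rather than direct parents only; the slide does not say which was used.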
16
Disease associations mined from the Gene Wiki
Good, BMC Genomics 2011, 12:603

Expert curation:
   • Correct: 86%
   • Maybe: 4%
   • Incorrect: 10%

Overall specificity: 90-93%
17
GO associations mined from the Gene Wiki
Good, BMC Genomics 2011, 12:603

Pipeline: Gene Wiki articles (10,271) -> filter out seeded text -> NCBO Annotator -> matched Gene Ontology terms (11,022) -> compare to GO database -> 6319 candidate annotations

Comparison to GO database:
   • 17% exact match
   • 26% match parent
   • 2% match child
   • 55% have no match
18
GO associations mined from the Gene Wiki
Good, BMC Genomics 2011, 12:603




                      Expert curation



                                    Correct

                              14%
                                        Maybe
                       60%      26%
          Incorrect
                                         Overall specificity: 48-64%
19
Common sources of error in GO associations
Good, BMC Genomics 2011, 12:603

1) Incorrect concept recognition
   OR2F1: “Olfactory receptors … are responsible for the recognition and G protein-mediated transduction of odorant signals.”

   Signal transduction (GO:0007165): The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process…

   Transduction (GO:0009293): The transfer of genetic information to a bacterium from a bacteriophage or between bacterial or yeast cells mediated by a phage vector.
20
Common sources of error in GO associations
Good, BMC Genomics 2011, 12:603

2) Incorrect sentence context
   MEF2C: “Several post translational modifications have been identified including phosphorylation on serine-59 …”

[Diagram: MEF2C linked to Phosphorylation, Neurogenesis, and Myelination, shown beside a list of other process terms: Dephosphorylation, Excretion, Gene expression, Glycosylation, Localization, Methylation, Proteolysis, Secretion, Transport, Transcription, Translation]
21
Novel GO annotations – so what?

   • 11,022 annotations mined from Gene Wiki
   • 4703 (43%) match known annotations
   • 6319 “novel” annotations @ 48-64% specificity
   • ~100,000 annotations from GO consortium
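The counts on this slide can be checked with a few lines of arithmetic. The sketch below also derives a rough estimate that is not on the slide: it assumes the 48-64% specificity applies to the novel annotations, and multiplies through to get an expected number of correct novel annotations.

```python
mined = 11_022          # annotations mined from the Gene Wiki (slide)
known = 4_703           # annotations that match known ones (slide)
novel = mined - known   # remaining "novel" annotations

known_share = known / mined  # fraction matching known annotations

# Assumption: the 48-64% specificity range applies to the novel annotations.
low, high = 0.48, 0.64
est_correct_novel = (round(novel * low), round(novel * high))

print(novel, round(known_share * 100), est_correct_novel)
```

Under that assumption, roughly 3,000 to 4,000 of the novel annotations would be correct, a small but non-trivial addition to the ~100,000 existing annotations.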
22
Gene Wiki content improves enrichment analysis

Workflow: GO term axon guidance (GO:0007411) -> gene list (264 genes) -> PubMed abstracts (811 articles) -> concept recognition -> enrichment analysis

                           GO:0007411
                           Yes       No
Linked genes      Yes       13        2
through PubMed    No       251    12033

P = 1.55 E-20
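The slide does not name the test behind this P value. One common choice for a 2x2 table like this is a one-sided Fisher's exact test, which is the upper tail of a hypergeometric distribution. The sketch below assumes that test and uses only the four counts from the table; `hypergeom_upper_tail` is an illustrative helper.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term."""
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Counts from the slide's 2x2 table.
linked_and_go = 13       # linked through PubMed and annotated with GO:0007411
linked_not_go = 2        # linked through PubMed, not annotated
unlinked_go = 251        # annotated, not linked
unlinked_not_go = 12033  # neither

N = linked_and_go + linked_not_go + unlinked_go + unlinked_not_go  # all genes
K = linked_and_go + unlinked_go    # genes annotated with the GO term
n = linked_and_go + linked_not_go  # genes linked through PubMed

p_value = hypergeom_upper_tail(linked_and_go, K, n, N)
print(f"P = {p_value:.3g}")
```

With these counts the tail probability is on the order of 1e-20, in line with the value on the slide. Exact integer arithmetic via `math.comb` avoids underflow for such small probabilities.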
23
Gene Wiki content improves enrichment analysis

Workflow: GO term muscle contraction (GO:0006936) -> gene list (87 genes) -> PubMed abstracts (251 articles) + Gene Wiki (87 articles) -> concept recognition -> enrichment analysis

Enrichment for GO:0006936:
   • Linked genes through PubMed: P = 1.0
   • Linked genes through PubMed + Gene Wiki: P = 1.22 E-09
24
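Gene Wiki content improves enrichment analysis

[Figure: scatter plot of p-value (PubMed + GW) on the y-axis against p-value (PubMed only) on the x-axis; one region is labeled "More significant PubMed only" and the other "More significant PubMed + GW"; the muscle contraction term is highlighted]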
25
Challenges and future directions


   • How to complement and integrate with
     traditional biocuration workflows?
   • How to disseminate and utilize
     crowdsourced annotations?
26




The Long Tail of scientists is a valuable source of information on gene function
27
Collaborators
   • Doug Howe, ZFIN
   • John Hogenesch, U Penn
   • Jon Huss, GNF
   • Luca de Alfaro, UCSC
   • Angel Pizzaro, U Penn
   • Faramarz Valafar, SDSU
   • Pierre Lindenbaum, Fondation Jean Dausset
   • Michael Martone, Rush
   • Konrad Koehler, Karo Bio
   • Warren Kibbe, Simon Lim, Northwestern
   • Many Wikipedia editors
   • WP:MCB Project

Group members
   • Erik Clarke
   • Ian Macleod
   • Ben Good (*)
   • Chunlei Wu
   • Salvatore Loguercio

See poster # 30 for more on the Gene Wiki and crowdsourcing in biology!

Contact
   http://sulab.org
   asu@scripps.edu
   @andrewsu
   +Andrew Su

Funding and Support
   (BioGPS: GM83924, Gene Wiki: GM089820)
28
Making the Gene Wiki more reliable

[Example text, two passages:]
   • Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …
   • The company name is derived from old Greek, and means "destroyer of birds".
29
Making the Gene Wiki more reliable

[Example text, two passages, each attributed to its author:]
   • High-trust author (36211 total edits): Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …
   • Low-trust author (36 total edits): The company name is derived from old Greek, and means "destroyer of birds".

http://www.wikitrust.net/

Contenu connexe

Similaire à ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...Andrew Su
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...Andrew Su
 
Wikipedia as an engine for scientific communication and collaboration at mass...
Wikipedia as an engine for scientific communication and collaboration at mass...Wikipedia as an engine for scientific communication and collaboration at mass...
Wikipedia as an engine for scientific communication and collaboration at mass...Andrew Su
 
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 UpdateBioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Updatedolleyj
 
Introduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyIntroduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyBarry Smith
 
How Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open ScienceHow Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open Sciencedrnigam
 
20120717 ismb2012
20120717 ismb201220120717 ismb2012
20120717 ismb2012anewgene
 
20120220 Tri-Con Cloud Computing Symposium
20120220 Tri-Con Cloud Computing Symposium20120220 Tri-Con Cloud Computing Symposium
20120220 Tri-Con Cloud Computing SymposiumAndrew Su
 
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...Amit Sheth
 
BioWikis BSB10
BioWikis BSB10BioWikis BSB10
BioWikis BSB10Dan Bolser
 
Integrate Ontologies into your apps
Integrate Ontologies into your appsIntegrate Ontologies into your apps
Integrate Ontologies into your appsIRIDA_community
 
Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnTodd Vision
 
Bio-ontologies in bioinformatics: Growing up challenges
Bio-ontologies in bioinformatics: Growing up challengesBio-ontologies in bioinformatics: Growing up challenges
Bio-ontologies in bioinformatics: Growing up challengesJanna Hastings
 
Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.Monica Munoz-Torres
 
Species pages and portals
Species pages and portals Species pages and portals
Species pages and portals Cyndy Parr
 
Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...Dag Endresen
 

Similaire à ISB2012: The Gene Wiki: Crowdsourcing human gene annotation (20)

NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
 
Wikipedia as an engine for scientific communication and collaboration at mass...
Wikipedia as an engine for scientific communication and collaboration at mass...Wikipedia as an engine for scientific communication and collaboration at mass...
Wikipedia as an engine for scientific communication and collaboration at mass...
 
bioinformatics enabling knowledge generation from agricultural omics data
bioinformatics enabling knowledge generation from agricultural omics databioinformatics enabling knowledge generation from agricultural omics data
bioinformatics enabling knowledge generation from agricultural omics data
 
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 UpdateBioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
 
Introduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyIntroduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental Biology
 
Wikis at work
Wikis at workWikis at work
Wikis at work
 
How Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open ScienceHow Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open Science
 
20120717 ismb2012
20120717 ismb201220120717 ismb2012
20120717 ismb2012
 
BioPortal: ontologies and integrated data resources at the click of a mouse
BioPortal: ontologies and integrated data resourcesat the click of a mouseBioPortal: ontologies and integrated data resourcesat the click of a mouse
BioPortal: ontologies and integrated data resources at the click of a mouse
 
20120220 Tri-Con Cloud Computing Symposium
20120220 Tri-Con Cloud Computing Symposium20120220 Tri-Con Cloud Computing Symposium
20120220 Tri-Con Cloud Computing Symposium
 
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte...
 
BioWikis BSB10
BioWikis BSB10BioWikis BSB10
BioWikis BSB10
 
Integrate Ontologies into your apps
Integrate Ontologies into your appsIntegrate Ontologies into your apps
Integrate Ontologies into your apps
 
Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, Bonn
 
Bio-ontologies in bioinformatics: Growing up challenges
Bio-ontologies in bioinformatics: Growing up challengesBio-ontologies in bioinformatics: Growing up challenges
Bio-ontologies in bioinformatics: Growing up challenges
 
Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.Web Apollo: Lessons learned from community-based biocuration efforts.
Web Apollo: Lessons learned from community-based biocuration efforts.
 
Species pages and portals
Species pages and portals Species pages and portals
Species pages and portals
 
Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...
 
Masson - ViralZone
Masson - ViralZoneMasson - ViralZone
Masson - ViralZone
 

Plus de Andrew Su

Building and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphBuilding and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphAndrew Su
 
Wikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciencesWikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciencesAndrew Su
 
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledgeThe Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledgeAndrew Su
 
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...Andrew Su
 
WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)Andrew Su
 
The case for an open biomedical knowledgebase
The case for an open biomedical knowledgebaseThe case for an open biomedical knowledgebase
The case for an open biomedical knowledgebaseAndrew Su
 
Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)Andrew Su
 
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Andrew Su
 
Citizen Science and Rare Disease Research
Citizen Science and Rare Disease ResearchCitizen Science and Rare Disease Research
Citizen Science and Rare Disease ResearchAndrew Su
 
Open biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen scienceOpen biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen scienceAndrew Su
 
Heart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen ScienceHeart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen ScienceAndrew Su
 
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015Andrew Su
 
Using Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeUsing Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeAndrew Su
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6Andrew Su
 
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)Andrew Su
 
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)Andrew Su
 
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceCrowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceAndrew Su
 
Centralized Model Organism Database (Biocuration 2014 poster)
Centralized Model Organism Database (Biocuration 2014 poster)Centralized Model Organism Database (Biocuration 2014 poster)
Centralized Model Organism Database (Biocuration 2014 poster)Andrew Su
 
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgAndrew Su
 
GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)Andrew Su
 

Plus de Andrew Su (20)

Building and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphBuilding and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graph
 
Wikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciencesWikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciences
 
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledgeThe Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
 
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
 
WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)WikiGenomes Poster (ISMB)
WikiGenomes Poster (ISMB)
 
The case for an open biomedical knowledgebase
The case for an open biomedical knowledgebaseThe case for an open biomedical knowledgebase
The case for an open biomedical knowledgebase
 
Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)
 
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
 
Citizen Science and Rare Disease Research
Citizen Science and Rare Disease ResearchCitizen Science and Rare Disease Research
Citizen Science and Rare Disease Research
 
Open biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen scienceOpen biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen science
 
Heart BD2K, Biocuration, and Citizen Science

Panel on Citizen Science and Crowdsourcing Games - March 27, 2015

Using Citizen Science to organize biomedical knowledge

UCSD / DBMI seminar 2015-02-6

Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)

Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)

Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science

Centralized Model Organism Database (Biocuration 2014 poster)

Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org

GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
 


ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

  • 1. The Gene Wiki: Crowdsourcing human gene annotation. Andrew Su, Ph.D., Department of Molecular and Experimental Medicine, The Scripps Research Institute. Biocuration 2012, April 2, 2012.
  • 2. The Long Tail is a prolific source of content. [Figure: content produced vs. contributors (sorted), showing a short head and a long tail.] Short head vs. long tail: News: newspapers vs. blogs; Video: TV/Hollywood vs. YouTube; Product reviews: Consumer Reports vs. Amazon reviews; Food reviews: food critics vs. Yelp; Talent judging: Olympics vs. American Idol; Gene annotation: manual curation vs. Gene Wiki.
  • 3. We can harness the Long Tail of scientists to directly participate in the gene annotation process.
  • 5. Wikipedia has breadth and depth. [Figure: articles and words (millions), Wikipedia vs. Britannica Online.] Source (July 2008): http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
  • 6. Filtering, extracting, and summarizing PubMed. [Figure labels: documents, concepts.]
  • 7. Wiki success depends on a positive feedback loop. [Figure: loop linking Gene Wiki page utility, number of contributors, and number of users.]
  • 8. 10,000 gene “stubs” within Wikipedia. [Feedback-loop diagram: utility, users, contributors.] Stub contents: protein structure; gene summary; symbols and identifiers; Gene Ontology annotations; protein interactions; tissue expression pattern; linked references; links to structured databases. (Huss, PLoS Biol, 2008)
  • 9. Gene Wiki has a critical mass of readers. [Feedback-loop diagram: utility, users, contributors.] Total: ~4.3 million views / month. (Huss, PLoS Biol, 2008; Good, NAR, 2011)
  • 10. Gene Wiki has a critical mass of editors. [Feedback-loop diagram: utility, users, contributors.] ~10,000 words added / month; total 1.42 million words ≈ 230 full-length articles; 4.3 million views / month; 1000 edits / month. [Figure: cumulative edits, split into productive edits and vandalism.] (Good, NAR, 2011)
  • 11. A review article for every gene is powerful. Examples: Reelin (68 editors, 543 edits since July 2002); Heparin (175 editors, 320 edits since June 2003); AMPK (44 editors, 84 edits since March 2004); RNAi (232 editors, 708 edits since October 2002). Highlighted features: references to the literature; hyperlinks to related concepts.
  • 12. Making the Gene Wiki more computable: from free text to structured annotations.
  • 13. Filling the gaps in gene annotation. [Diagram labels: NCBI Entrez Gene: 3362; Gene Wiki mapping; wikilink; GO exact synonym; GO:0004993; candidate assertion.]
  • 14. Filling the gaps in gene annotation. [Diagram labels: NCBI Entrez Gene: 334; Gene Wiki mapping; wikilink; GO exact match; GO:0006897; candidate assertion.]
  • 15. Disease associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603). Pipeline: Gene Wiki articles (10,271); filter out seeded text; NCBO Annotator; matched Disease Ontology terms (2983); compare to DO database; 2147 candidate annotations. Comparison results: 23% exact match; 5% match parent; 2% match child; 70% have no match.
  • 16. Disease associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603). Expert curation: correct 86%; incorrect 10%; maybe 4%. Overall specificity: 90-93%.
  • 17. GO associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603). Pipeline: Gene Wiki articles (10,271); filter out seeded text; NCBO Annotator; matched Gene Ontology terms (11,022); compare to GO database; 6319 candidate annotations. Comparison results: 17% exact match; 26% match parent; 2% match child; 55% have no match.
  • 18. GO associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603). Expert curation pie chart (labels: correct, maybe, incorrect; values: 14%, 60%, 26%). Overall specificity: 48-64%.
  • 19. Common sources of error in GO associations (Good, BMC Genomics 2011, 12:603). 1) Incorrect concept recognition. OR2F1: “Olfactory receptors … are responsible for the recognition and G protein-mediated transduction of odorant signals.” Signal transduction (GO:0007165): The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process… Transduction (GO:0009293): The transfer of genetic information to a bacterium from a bacteriophage or between bacterial or yeast cells mediated by a phage vector.
  • 20. Common sources of error in GO associations (Good, BMC Genomics 2011, 12:603). 2) Incorrect sentence context. MEF2C: “Several post translational modifications have been identified including phosphorylation on serine-59 …” [Figure: MEF2C shown with the terms dephosphorylation, excretion, phosphorylation, gene expression, glycosylation, localization, neurogenesis, methylation, proteolysis, secretion, transport, myelination, transcription, translation.]
  • 21. Novel GO annotations – so what? 11,022 annotations mined from Gene Wiki; 4703 (43%) match known annotations; 6319 “novel” annotations @ 48-64% specificity; ~100,000 annotations from GO consortium.
  • 22. Gene Wiki content improves enrichment analysis. GO term: axon guidance (GO:0007411); 811 articles; 264 genes. Workflow labels: PubMed abstracts; concept recognition; gene list; enrichment analysis. 2×2 table (rows: linked genes through PubMed, yes/no; columns: GO:0007411, yes/no): yes/yes 13; yes/no 2; no/yes 251; no/no 12033. P = 1.55E-20.
  • 23. Gene Wiki content improves enrichment analysis. GO term: muscle contraction (GO:0006936); 251 articles; 87 genes. Workflow labels: PubMed abstracts + Gene Wiki (87 articles); concept recognition; gene list; enrichment analysis. GO:0006936 vs. linked genes through PubMed: P = 1.0. GO:0006936 vs. linked genes through PubMed + Gene Wiki: P = 1.22E-09.
  • 24. Gene Wiki content improves enrichment analysis. [Figure: p-value (PubMed + GW) plotted against p-value (PubMed only), with regions labeled “more significant, PubMed only” and “more significant, PubMed + GW”; muscle contraction is highlighted.]
  • 25. Challenges and future directions. How to complement and integrate with traditional biocuration workflows? How to disseminate and utilize crowdsourced annotations?
  • 26. The Long Tail of scientists is a valuable source of information on gene function.
  • 27. Collaborators: Doug Howe, ZFIN; John Hogenesch, U Penn; Jon Huss, GNF; Luca de Alfaro, UCSC; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; many Wikipedia editors; WP:MCB Project. Group members: Erik Clarke; Ian Macleod; Ben Good (*); Chunlei Wu; Salvatore Loguercio. See poster # 30 for more on the Gene Wiki and crowdsourcing in biology! Contact: http://sulab.org; asu@scripps.edu; @andrewsu; +Andrew Su. Funding and Support (BioGPS: GM83924, Gene Wiki: GM089820).
  • 28. Making the Gene Wiki more reliable. Example article text: “Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …” Inserted text: “The company name is derived from old Greek, and means "destroyer of birds".”
  • 29. Making the Gene Wiki more reliable. Example article text: “Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …” (high-trust author, 36211 total edits). Inserted text: “The company name is derived from old Greek, and means "destroyer of birds".” (low-trust author, 36 total edits). http://www.wikitrust.net/
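
The counts on slide 21 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are invented here, and applying the 48-64% specificity range uniformly to the 6319 novel candidates is an assumption, not something the slides state.

```python
# Counts quoted on slide 21 (Good, BMC Genomics 2011, 12:603).
mined_annotations = 11022   # GO annotations mined from the Gene Wiki
matched_known = 4703        # mined annotations that match known GO annotations

# Novel candidates are the mined annotations with no known counterpart.
novel = mined_annotations - matched_known
matched_percent = round(100 * matched_known / mined_annotations)

# Assumption: the 48-64% specificity range applies uniformly to the novel set.
specificity_low, specificity_high = 0.48, 0.64
expected_correct = (
    round(novel * specificity_low),
    round(novel * specificity_high),
)

print(novel, matched_percent, expected_correct)
```

Under that assumption, roughly 3,000 to 4,000 of the novel candidates would be expected to be correct.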

Editor's notes

  1. Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization
  2. Structured annotations enable pathway analysis, statistical analyses, cross-species comparisons
  3. Transduction accounts for 70% of the concept recognition problems
  4. Tried on 773 GO categories, significant in 356 cases (46%)
  5. We extended this analysis to all 773 GO terms used in human gene annotations and found a consistent improvement in the enrichment scores
  6. We started working with Doug Howe because he helped us learn a lot about biocuration, but clearly we’d need to expand partners. In particular, since GO curation seems to be largely drawn by organisms
  7. Also want to convince you that the Long Tail of bioinformatics developers is valuable too, but first have to convince you that there is a bottleneck in tool development.
  8. Reverted four minutes later
  9. Reverted four minutes later
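
The 2×2 table on slide 22 is the kind of input a one-sided Fisher's exact test takes. As a minimal sketch, assuming the slide's P value comes from such a test (the slides do not name the test used), the upper tail of the hypergeometric distribution can be computed with the Python standard library alone. The function and argument names are illustrative.

```python
from math import comb


def enrichment_p_value(both, linked_only, term_only, neither):
    """One-sided Fisher's exact test: P(overlap >= observed) under the hypergeometric null."""
    linked = both + linked_only      # genes linked through PubMed
    in_term = both + term_only       # genes annotated with the GO term
    total = both + linked_only + term_only + neither
    # Count the ways to draw an overlap at least as large as the one observed.
    favorable = sum(
        comb(in_term, k) * comb(total - in_term, linked - k)
        for k in range(both, min(linked, in_term) + 1)
    )
    return favorable / comb(total, linked)


# Cell counts from the slide 22 table for axon guidance (GO:0007411).
p = enrichment_p_value(both=13, linked_only=2, term_only=251, neither=12033)
print(f"{p:.2e}")
```

With these counts the result is on the order of 1E-20, in line with the P value shown on the slide. Exact integer arithmetic via `math.comb` avoids the underflow that multiplying many small probabilities could cause.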