University of Waterloo
         MultiText for Genomics


Task-Specific Query Expansion for Genomics
(MultiText Experiments for TREC 2003)


               David L. Yeung
            University of Waterloo,
           Waterloo, Ontario, Canada

                Nov. 20, 2003




              TREC 2003 Genomics Track: University of Waterloo MultiText Project
The MultiText Project

• What is MultiText?
 •   A collection of IR tools developed at U of Waterloo.

• What is MultiText for Genomics?
 •   Based on MultiText.
 •   No external databases or domain-specific knowledge.
 •   A combination of techniques...


MultiText for Genomics
• What is MultiText for Genomics?

[System diagram, left to right: Topic; Query Formulation (Okapi) and Query Tiering (metadata); Feedback (Query expansion); Documents]



Query Formulation (Okapi)
• Two interesting facts:
 •   Gene name type didn't matter
 •   Spacing and punctuation affected performance
• Example (training topic 5):
     •   glycine receptor, alpha 1
     •   Glycine-receptor, alpha1
     •   Alpha 1 Glycine Receptor
     •   glycine receptors... alpha receptor... alpha 1
         •   And so on...




Okapi Search Term Sets
• Generate multiple search term sets:
 •   Okapi 1 (higher precision, lower recall)
     •   Treat gene names as phrases, except for punctuation.
         •   “glycine_receptor_alpha_1”
 •   Okapi 2
     •   Heuristics for guessing role of punctuation; also guess plurals.
 •   Okapi 3 (lower precision, higher recall)
     •   All pairs of tokens from gene names (bigrams).
         •   “glycine[_]receptor”, “receptor[_]alpha”, “alpha[_]1”, etc.
 •   Okapi Fusion
     •   Take the product of the 3 scores.
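The term sets and the fusion step above can be sketched roughly as follows. This is a minimal illustration, not the MultiText implementation: the function names are hypothetical, "all pairs" is read here as adjacent token pairs (matching the examples on the slide), and the `missing` score used for documents absent from a run is an assumption, since the slide does not say how such documents are handled. The Okapi 2 heuristics are not specified on the slide and are left out.

```python
import re
from math import prod

def tokens(gene_name):
    """Split a gene name on whitespace and punctuation, lowercased."""
    return re.findall(r"[a-z0-9]+", gene_name.lower())

def phrase_term(gene_name):
    """Okapi 1 style: the whole gene name as one phrase, punctuation ignored."""
    return "_".join(tokens(gene_name))

def bigram_terms(gene_name):
    """Okapi 3 style: adjacent token pairs (assumed reading of 'all pairs')."""
    toks = tokens(gene_name)
    return [f"{a}_{b}" for a, b in zip(toks, toks[1:])]

def product_fusion(*runs, missing=1.0):
    """Multiply per-document scores across runs.

    Each run maps a document id to a score. A document absent from a run
    contributes `missing` (hypothetical default, not from the slides).
    """
    docs = set().union(*runs)
    return {doc: prod(run.get(doc, missing) for run in runs) for doc in docs}

if __name__ == "__main__":
    name = "glycine receptor, alpha 1"
    print(phrase_term(name))
    print(bigram_terms(name))
    print(product_fusion({"d1": 2.0, "d2": 3.0}, {"d1": 4.0}, {"d1": 0.5, "d2": 2.0}))
```

Treating a missing score as a neutral factor keeps every document retrieved by any run in the fused list, which is one way to be consistent with the fused run having recall as high as Okapi 3; other conventions are possible.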
Results of Okapi Experiments
[Bar chart: Mean Average Precision (MAP), axis from 0 to 0.35, for Okapi 1, Okapi 2, Okapi 3 and Okapi Fusion on the Training and Test topics]



• Two interesting points:
   • The trend in MAP is reversed between the training and test data.
   • Recall (from most to least): Okapi Fusion/Okapi 3, Okapi 2, Okapi 1.




MultiText for Genomics
• Next: Query Tiering



Query Tiering (metadata)
• Use metadata tags in data:
   (“<TagName>”..“</TagName>”) > “search_terms”

• Order them by correlation to relevance:
 •   chemical list (RN)
 •   title (TI)
 •   abstract (AB)
 •   MeSH headings (MH)
 •   PubMed ID (PMID)...
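The query pattern above can be generated per metadata field with a small helper. This is only a sketch: `field_query` and `queries_by_field` are hypothetical names, the code assumes the tag names in the data are the abbreviations shown on the slide (RN, TI, AB, MH, PMID), and it reads `>` as "containing", so the query asks for a tagged span that contains the search terms. Straight quotes are used in place of the slide's typographic quotes.

```python
# Field abbreviations in the order they are listed on the slide.
FIELD_TAGS = ["RN", "TI", "AB", "MH", "PMID"]

def field_query(tag, search_terms):
    """Build a query for spans from <tag> to </tag> that contain the search terms."""
    return f'("<{tag}>".."</{tag}>") > "{search_terms}"'

def queries_by_field(search_terms, tags=FIELD_TAGS):
    """One query per metadata field, in the listed order."""
    return [field_query(tag, search_terms) for tag in tags]

if __name__ == "__main__":
    for query in queries_by_field("glycine receptor"):
        print(query)
```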
The Query Tiers
• 6 Query Tiers:
 •   Tier 1:
     •   Almost exact match in the “chemical list” metadata field.
     •   “glycine receptor, alpha 1” → “glycine receptor alpha1”
 •   Tier 2:
     •   As above, but allow for additional terms.
     •   “RAC1” → “rac1 GTP-Binding Protein”
 •   Tier 3:
     •   Gene name is weakened until a match is made.
     •   “estrogen receptor 1” → “Receptors, Estrogen”

The Query Tiers
• 6 Query Tiers (continued):
 •   Tier 4:
     •   Boolean expression in the “title” metadata field.
     •   “tyrosyl-tRNA synthetase” → “tyrosyl”^“trna”^“synthetase”
 •   Tier 5:
     •   Boolean expression in the “chemical list” metadata field.
 •   Tier 6:
     •   Boolean expression in the “abstract” metadata field.
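The Boolean expressions used by Tiers 4 to 6 can be sketched as a conjunction of the gene name's tokens. The helper names below are hypothetical, and the tokenization (lowercase, split on anything that is not a letter or digit) is an assumption chosen to reproduce the Tier 4 example on the slide.

```python
import re

# Metadata field searched by each Boolean tier, as listed on the slide.
TIER_FIELDS = {4: "title", 5: "chemical list", 6: "abstract"}

def boolean_expression(gene_name):
    """AND together the lowercased tokens of a gene name, e.g. "a"^"b"^"c"."""
    terms = re.findall(r"[a-z0-9]+", gene_name.lower())
    return "^".join(f'"{term}"' for term in terms)

def tier_queries(gene_name):
    """Pair each Boolean tier with its field and the shared expression."""
    expr = boolean_expression(gene_name)
    return [(tier, field, expr) for tier, field in sorted(TIER_FIELDS.items())]

if __name__ == "__main__":
    print(boolean_expression("tyrosyl-tRNA synthetase"))
    for row in tier_queries("tyrosyl-tRNA synthetase"):
        print(row)
```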




Using the Query Tiers

• Can retrieve documents using:
 •   All Tiers (AT)
     •   The tiers are executed in order.
 •   Best Tier (BT)
     •   Once a tier has retrieved non-zero documents, ignore the rest.



               ... then fuse with results of Okapi experiment.
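The two retrieval modes can be sketched as follows, with each tier represented simply as the list of document ids it retrieved. The function names are hypothetical, and two details are assumptions rather than statements from the slides: in All Tiers mode, documents from earlier tiers are ranked ahead of those from later tiers, and a document retrieved by several tiers is kept only at its first position.

```python
def all_tiers(tier_results):
    """All Tiers (AT): concatenate the tiers in order, dropping repeated documents."""
    ranked, seen = [], set()
    for docs in tier_results:
        for doc in docs:
            if doc not in seen:
                seen.add(doc)
                ranked.append(doc)
    return ranked

def best_tier(tier_results):
    """Best Tier (BT): return the first tier that retrieved anything, ignore the rest."""
    for docs in tier_results:
        if docs:
            return list(docs)
    return []

if __name__ == "__main__":
    tiers = [[], ["d3", "d1"], ["d1", "d2"]]  # made-up results for three tiers
    print(all_tiers(tiers))
    print(best_tier(tiers))
```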


Using the Query Tiers
• Fusing with Okapi:
     •   Rank Fusion (-R)
         •   Document's score based on weighted sum of (reverse) rank.
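A weighted sum of reverse ranks can be sketched like this. The slides do not give the weights or the exact definition of reverse rank, so both are assumptions here: a document at position p (counting from 0) in a list of n documents gets reverse rank n - p, a document missing from a list gets 0, and the weights `w_tier` and `w_okapi` are hypothetical parameters.

```python
def reverse_ranks(ranking):
    """Map each document to its reverse rank: the top document gets len(ranking)."""
    n = len(ranking)
    return {doc: n - pos for pos, doc in enumerate(ranking)}

def rank_fusion(tier_ranking, okapi_ranking, w_tier=0.5, w_okapi=0.5):
    """Fuse two ranked lists by a weighted sum of reverse ranks (weights assumed)."""
    tier_rr = reverse_ranks(tier_ranking)
    okapi_rr = reverse_ranks(okapi_ranking)
    docs = set(tier_rr) | set(okapi_rr)
    scores = {
        doc: w_tier * tier_rr.get(doc, 0) + w_okapi * okapi_rr.get(doc, 0)
        for doc in docs
    }
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(docs, key=lambda doc: (-scores[doc], doc))

if __name__ == "__main__":
    print(rank_fusion(["a", "b"], ["b", "c", "a"]))
```

Working on ranks rather than raw scores sidesteps the fact that tier matches and Okapi scores are not on a comparable scale.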

MultiText for Genomics
• Next: Feedback



Feedback (Query expansion)
• Learn “most relevant” chemical:
 •   Using pseudo-relevance feedback
 •   Only if document not matched in Tier 1
 •   Assign score to chemicals using Tf-Idf scoring scheme:
         $w_i = R_i \times \left( \log \frac{N}{f_i} \right)^{\alpha}$
• Example (training topic 27):
 •   cholinergic receptor, muscarinic 3
     •   Receptors, Muscarinic (29880.980020675546)
     •   Muscarinic Antagonists (20430.84754342255)
     •   muscarinic receptor M2 (13976.522895229124)
     •   muscarinic receptor M3 (11159.997636110056)
     •   Carbachol (11101.760218985524)
         •   ... etc.
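The scoring scheme can be sketched in a few lines. The slide does not define its symbols, so the reading used here is an assumption: R_i is how often chemical i appears in the top-ranked (pseudo-relevant) documents, f_i is the number of documents in the collection that list chemical i, N is the collection size, and α is a tuning exponent. The natural logarithm and all numbers in the example are likewise illustrative, not taken from the experiments.

```python
import math

def chemical_weight(r_i, f_i, n_docs, alpha=1.0):
    """w_i = R_i * (log(N / f_i)) ** alpha, with the symbol meanings assumed above."""
    return r_i * math.log(n_docs / f_i) ** alpha

def rank_chemicals(stats, n_docs, alpha=1.0):
    """Sort chemicals by weight, highest first.

    `stats` maps a chemical name to a pair (R_i, f_i).
    """
    weights = {
        name: chemical_weight(r_i, f_i, n_docs, alpha)
        for name, (r_i, f_i) in stats.items()
    }
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    # Made-up counts for illustration only.
    stats = {"chemical A": (5, 10), "chemical B": (1, 10), "chemical C": (5, 500)}
    for name, weight in rank_chemicals(stats, n_docs=1000, alpha=2.0):
        print(f"{name}: {weight:.3f}")
```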
Complete MTG System - Runs
• Complete runs: Okapi Fusion, ATR, BTR, ATRF, BTRF
 •   Fusion with Okapi: Rank Fusion (-R)
 •   Query Tiering: All Tiers (AT), Best Tier (BT)
 •   Feedback: (-F)

Complete MTG System - Results
[Bar chart: Mean Average Precision (MAP), axis from 0 to 0.5, for Okapi Fusion, ATR, BTR, ATRF* and BTRF* on the Training and Test topics]

• Complete runs: Okapi Fusion, ATR, BTR, ATRF*, BTRF*
 •   Fusion with Okapi: Rank Fusion (-R)
 •   Query Tiering: All Tiers (AT), Best Tier (BT)
 •   Feedback: (-F)
• * denotes an official submission

Conclusions
• MultiText supports a variety of standard and non-standard techniques:
 •   Okapi BM25 implementation
 •   Query Tiering and Fusion
 •   Pseudo-relevance Feedback

• Possible to improve performance in the genomics domain even without domain-specific knowledge:
 •   Characteristics of corpus (SSR, metadata)
 •   Merging results of multiple independent methods

• For more information, please see our paper!
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 

Task-Specific Query Expansion for Genomics (MultiText Experiments for TREC 2003)

  • 1. University of Waterloo: MultiText for Genomics
      Task-Specific Query Expansion for Genomics (MultiText Experiments for TREC 2003)
      David L. Yeung, University of Waterloo, Waterloo, Ontario, Canada
      Nov. 20, 2003
      TREC 2003 Genomics Track: University of Waterloo MultiText Project
  • 2. The MultiText Project
      • What is MultiText?
          • A collection of IR tools developed at U of Waterloo.
      • What is MultiText for Genomics?
          • Based on MultiText.
          • No external databases or domain-specific knowledge.
          • A combination of techniques...
  • 3. MultiText for Genomics
      • What is MultiText for Genomics?
      • [System diagram with components: Topic, Query Formulation (Okapi), Query Tiering (metadata), Feedback (Query expansion), Documents]
  • 4. Query Formulation (Okapi)
      • Two interesting facts:
          • Gene name type didn't matter.
          • Spacing and punctuation affected performance.
      • Example (training topic 5):
          • glycine receptor, alpha 1
          • Glycine-receptor, alpha1
          • Alpha 1 Glycine Receptor
          • glycine receptors... alpha receptor... alpha 1
          • And so on...
  • 5. Okapi Search Term Sets
      • Generate multiple search term sets:
          • Okapi 1 (higher precision, lower recall)
              • Treat gene names as phrases, except for punctuation.
              • “glycine_receptor_alpha_1”
          • Okapi 2
              • Heuristics for guessing role of punctuation; also guess plurals.
          • Okapi 3 (lower precision, higher recall)
              • All pairs of tokens from gene names (bigrams).
              • “glycine[_]receptor”, “receptor[_]alpha”, “alpha[_]1”, etc.
          • Okapi Fusion
              • Take the product of the 3 scores.
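The term sets on slide 5 can be sketched in Python as follows. This is only an illustration, not the MultiText implementation: the function names (`tokenize`, `phrase_term`, `adjacent_pairs`, `fuse_by_product`) are hypothetical, "all pairs" is read as adjacent pairs to match the slide's examples, and a document missing from any run is assumed to drop out of the product. Okapi 2's punctuation and plural heuristics are not described on the slide, so they are left out.

```python
import math
import re


def tokenize(gene_name):
    """Lowercase the name and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", gene_name.lower()) if t]


def phrase_term(gene_name):
    """Okapi 1 style: the whole gene name as a single phrase, punctuation dropped."""
    return "_".join(tokenize(gene_name))


def adjacent_pairs(gene_name):
    """Okapi 3 style: one term per pair of adjacent tokens (assumed reading of 'bigrams')."""
    tokens = tokenize(gene_name)
    return [f"{a}[_]{b}" for a, b in zip(tokens, tokens[1:])]


def fuse_by_product(runs):
    """Okapi Fusion style: multiply the per-run scores of each document.

    `runs` is a list of {doc_id: score} dicts. Assumption: a document must
    appear in every run to receive a fused score.
    """
    shared = set(runs[0]).intersection(*runs[1:])
    return {doc: math.prod(run[doc] for run in runs) for doc in shared}


print(phrase_term("glycine receptor, alpha 1"))
print(adjacent_pairs("glycine receptor, alpha 1"))
```

Multiplying scores rewards documents that all three term sets agree on, which is one plausible reason a product was chosen over a sum.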
  • 6. Results of Okapi Experiments
      • [Bar chart: Mean Average Precision (MAP) on the training and test data for Okapi 1, Okapi 2, Okapi 3, and Okapi Fusion; axis from 0 to 0.35]
      • Two interesting points:
          • The trend in MAP is reversed between the training and test data.
          • Recall (from most to least): Okapi Fusion/Okapi 3, Okapi 2, Okapi 1.
  • 7. MultiText for Genomics
      • Next: Query Tiering
      • [System diagram with components: Topic, Query Formulation (Okapi), Query Tiering (metadata), Feedback (Query expansion), Documents]
  • 8. Query Tiering (metadata)
      • Use metadata tags in data:
          • (“<TagName>”..“</TagName>”) > “search_terms”
      • Order them by correlation to relevance:
          • chemical list (RN)
          • title (TI)
          • abstract (AB)
          • MeSH headings (MH)
          • PubMed ID (PMID)...
  • 9. The Query Tiers
      • 6 Query Tiers:
          • Tier 1:
              • Almost exact match in the “chemical list” metadata field.
              • “glycine receptor, alpha 1” → “glycine receptor alpha1”
          • Tier 2:
              • As above, but allow for additional terms.
              • “RAC1” → “rac1 GTP-Binding Protein”
          • Tier 3:
              • Gene name is weakened until a match is made.
              • “estrogen receptor 1” → “Receptors, Estrogen”
  • 10. The Query Tiers
      • 6 Query Tiers (continued):
          • Tier 4:
              • Boolean expression in the “title” metadata field.
              • “tyrosyl-tRNA synthetase” → “tyrosyl”^“trna”^“synthetase”
          • Tier 5:
              • Boolean expression in the “chemical list” metadata field.
          • Tier 6:
              • Boolean expression in the “abstract” metadata field.
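The Boolean expression used in Tiers 4 to 6 can be illustrated with a few lines of Python. This is a sketch under assumptions: the function name `boolean_and_expression` is hypothetical, tokens are split on any non-alphanumeric character, and straight quotes stand in for the slide's typographic quotes. It only builds the expression string; how MultiText evaluates it against a metadata field is not shown on the slides.

```python
import re


def boolean_and_expression(gene_name):
    """Join the lowercased tokens of a gene name with '^', as in the Tier 4 example."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", gene_name.lower()) if t]
    return "^".join(f'"{t}"' for t in tokens)


print(boolean_and_expression("tyrosyl-tRNA synthetase"))
```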
  • 11. Using the Query Tiers
      • Can retrieve documents using:
          • All Tiers (AT)
              • The tiers are executed in order.
          • Best Tier (BT)
              • Once a tier has retrieved non-zero documents, ignore the rest.
      • ... then fuse with results of Okapi experiment.
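The two retrieval modes on slide 11 can be sketched as below. The names `all_tiers` and `best_tier` are hypothetical, and the reading of All Tiers (concatenate each tier's results in tier order, keeping the first occurrence of a document) is an assumption, since the slide only says the tiers are executed in order.

```python
def all_tiers(tier_results):
    """All Tiers (AT): walk the tiers in order and keep each document once.

    `tier_results` is a list of document-id lists, one per tier, Tier 1 first.
    Assumption: documents from earlier tiers rank ahead of later ones.
    """
    seen = set()
    ranked = []
    for docs in tier_results:
        for doc in docs:
            if doc not in seen:
                seen.add(doc)
                ranked.append(doc)
    return ranked


def best_tier(tier_results):
    """Best Tier (BT): return the first tier that retrieved anything, ignore the rest."""
    for docs in tier_results:
        if docs:
            return list(docs)
    return []


tiers = [[], ["d3", "d1"], ["d1", "d2"]]
print(all_tiers(tiers))
print(best_tier(tiers))
```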
  • 12. Using the Query Tiers
      • [System diagram with components: Topic, Query Formulation (Okapi), Query Tiering (metadata), Feedback (Query expansion), Documents]
      • Fusing with Okapi:
          • Rank Fusion (-R)
              • Document's score based on weighted sum of (reverse) rank.
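A minimal sketch of the rank fusion idea on slide 12 follows. The slide does not give the weights or the exact definition of reverse rank, so both are assumptions here: a document at 0-based position `i` in a list of length `n` gets reverse rank `n - i`, and the weights in the example are made up for illustration. The function names are hypothetical.

```python
def reverse_rank_scores(ranking):
    """Map each document to its reverse rank: the top document gets len(ranking), the last gets 1."""
    n = len(ranking)
    return {doc: n - i for i, doc in enumerate(ranking)}


def rank_fusion(rankings, weights):
    """Fuse ranked lists by a weighted sum of reverse ranks.

    Documents absent from a list contribute nothing for that list.
    Ties are broken by document id so the output is deterministic.
    """
    fused = {}
    for ranking, weight in zip(rankings, weights):
        for doc, score in reverse_rank_scores(ranking).items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


tier_run = ["d1", "d2"]          # e.g. output of the query tiers
okapi_run = ["d3", "d2", "d1"]   # e.g. output of Okapi Fusion
print(rank_fusion([tier_run, okapi_run], weights=[2.0, 1.0]))
```

Working on ranks rather than raw scores sidesteps the fact that tier matches and Okapi scores are not on a comparable scale.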
  • 13. MultiText for Genomics
      • Next: Feedback
      • [System diagram with components: Topic, Query Formulation (Okapi), Query Tiering (metadata), Feedback (Query expansion), Documents]
  • 14. Feedback (Query expansion)
      • Learn “most relevant” chemical:
          • Using pseudo-relevance feedback
          • Only if document not matched in Tier 1
          • Assign score to chemicals using Tf-Idf scoring scheme:
            $w_i = R_i \times \left(\log \frac{N}{f_i}\right)^{\alpha}$
      • Example (training topic 27):
          • cholinergic receptor, muscarinic 3
              • Receptors, Muscarinic (29880.980020675546)
              • Muscarinic Antagonists (20430.84754342255)
              • muscarinic receptor M2 (13976.522895229124)
              • muscarinic receptor M3 (11159.997636110056)
              • Carbachol (11101.760218985524)
              • ... etc.
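The Tf-Idf style weight on slide 14, read as w_i = R_i × (log(N / f_i))^α, can be computed as in the sketch below. The slide does not define the symbols, so the interpretation is an assumption: R_i is taken as how often chemical i appears in the top-ranked (pseudo-relevant) documents, f_i as the number of documents in the collection that list it, and N as the collection size. The function name, the default `alpha`, and the toy counts are hypothetical and do not reproduce the scores shown on the slide.

```python
import math


def chemical_weights(feedback_counts, doc_freq, n_docs, alpha=1.0):
    """Score each chemical as R_i * (log(N / f_i)) ** alpha.

    feedback_counts: {chemical: R_i}, counts in the pseudo-relevant documents (assumed meaning)
    doc_freq:        {chemical: f_i}, documents in the collection listing the chemical
    n_docs:          N, total number of documents in the collection
    """
    return {
        chem: r * math.log(n_docs / doc_freq[chem]) ** alpha
        for chem, r in feedback_counts.items()
    }


# Toy numbers for illustration only.
weights = chemical_weights(
    feedback_counts={"rare chemical": 3, "common chemical": 3},
    doc_freq={"rare chemical": 10, "common chemical": 1000},
    n_docs=10000,
    alpha=2.0,
)
for chem in sorted(weights, key=weights.get, reverse=True):
    print(chem, weights[chem])
```

With equal feedback counts, the rarer chemical scores higher, which is the behaviour the log(N / f_i) factor is meant to provide.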
  • 15. Complete MTG System - Runs
      • [System diagram with components: Topic, Query Formulation (Okapi), Query Tiering (metadata), Feedback (Query expansion), Documents]
      • Complete runs: Okapi Fusion, ATR, BTR, ATRF, BTRF
          • Fusion with Okapi: Rank Fusion (-R)
          • Query Tiering: All Tiers (AT), Best Tier (BT)
          • Feedback: (-F)
  • 16. Complete MTG System - Results
      • [Bar chart: Mean Average Precision (MAP) on the training and test data for Okapi Fusion, ATR, BTR, ATRF*, and BTRF*; axis from 0 to 0.5]
      • Complete runs: Okapi Fusion, ATR, BTR, ATRF*, BTRF*
          • Fusion with Okapi: Rank Fusion (-R)
          • Query Tiering: All Tiers (AT), Best Tier (BT)
          • Feedback: (-F)
          • * denotes an official submission
  • 17. Conclusions
      • MultiText supports a variety of standard and non-standard techniques:
          • Okapi BM25 implementation
          • Query Tiering and Fusion
          • Pseudo-relevance Feedback
      • Possible to improve performance in genomics domain even without domain-specific knowledge:
          • Characteristics of corpus (SSR, metadata)
          • Merging results of multiple independent methods
      • For more information, please see our paper!