> design > publish > search!




              How to Search Annotated Text
                      by Strategy?
                                  Roberto Cornacchia
                                     Wouter Alink
                                   Arjen P. De Vries

                                      Spinque B.V.

                               CLIN 2013, 18 January 2013


                                                            http://www.spinque.com/
Search by Strategy


                  Design the way you would like to search

●   A search engine design framework

●   Custom search engines built from “Strategies”, which:
    ●   are designed as graphs
    ●   abstract data processing
    ●   combine different data sources
    ●   incorporate probabilistic reasoning
    ●   translate to database queries





   Don't try and program the ultimate search engine



   Design a number of domain-specific search strategies
   [Figure: example strategy graph. Sources: Crime map, All houses, Query terms.
    Blocks: Rank on location (twice), Select on attribute, Rank full-text,
    combined through Difference and Union.]

   Click. Generate Web search engines on probabilistic DB
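The strategy graph above can be read as a small dataflow program over scored objects. The following is a minimal sketch under stated assumptions, not Spinque's implementation: each block is modelled as a function over a mapping from object id to probability, Union and Difference use the usual formulas for independent events, and the wiring of the blocks, the function names and all sample data are hypothetical.

```python
from typing import Callable, Dict

Scored = Dict[str, float]  # object id -> probability of relevance


def select(items: Scored, keep: Callable[[str], bool]) -> Scored:
    """Select on attribute: keep only objects for which the predicate holds."""
    return {obj: p for obj, p in items.items() if keep(obj)}


def union(a: Scored, b: Scored) -> Scored:
    """Union, assuming independence: P(a or b) = 1 - (1 - pa)(1 - pb)."""
    return {obj: 1 - (1 - a.get(obj, 0.0)) * (1 - b.get(obj, 0.0))
            for obj in a.keys() | b.keys()}


def difference(a: Scored, b: Scored) -> Scored:
    """Difference, assuming independence: P(a and not b) = pa * (1 - pb)."""
    return {obj: p * (1 - b.get(obj, 0.0)) for obj, p in a.items()}


# Hypothetical outputs of the ranking blocks.
near_query = {"h1": 0.9, "h2": 0.6, "h3": 0.4}   # Rank on location (houses)
near_crime = {"h2": 0.5, "h3": 1.0}              # Rank on location (crime map)
text_match = {"h2": 0.7, "h4": 0.2}              # Rank full-text (query terms)
has_garden = {"h1", "h3"}                        # attribute used by Select

# One possible wiring of the blocks (an assumption, the slide only names them).
safe = difference(near_query, near_crime)
safe_with_garden = select(safe, lambda obj: obj in has_garden)
result = union(safe_with_garden, text_match)
ranking = sorted(result, key=result.get, reverse=True)
print(ranking)
```

In a real deployment each block would translate to a database query rather than run in Python; the sketch only illustrates how probabilities can flow through the graph.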
Multiple domains, custom UIs




Strategy Editor




Not only "documents"




What's in the DB?


  Full-text search

    term   obj   freq
    t0     o3    0.03
    t0     o5    0.21
    t1     o2    0.08

  Annotation search

    subj      pred / attr   obj / val   p
    Roberto   speaks_to     You         0.95
    You       listen_to     Roberto     0.6
    speech    minutes       15          0.8

  Feature vectors (CBIR, SVM)

    obj   f1     ...   fN
    o0    0.12   ...   0.84
    o1    0.54   ...   0
    o2    0.23   ...   0.31

  Hierarchical search

    obj   pre   size   level
    o0    100   50     0
    o1    110   20     1
    o2    144   16     2
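To make the annotation-search table concrete, here is a minimal sketch that loads the three example triples into an in-memory SQLite database and answers one query. The table name `triples`, the column names, the query itself and the choice to multiply probabilities (an independence assumption) are illustrative; the slides do not say which database or schema Spinque uses.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT, p REAL)")
con.executemany(
    "INSERT INTO triples VALUES (?, ?, ?, ?)",
    [
        ("Roberto", "speaks_to", "You", 0.95),
        ("You", "listen_to", "Roberto", 0.6),
        ("speech", "minutes", "15", 0.8),
    ],
)

# Who speaks to someone who also listens to them?
# The joint probability is the product of the two triple probabilities,
# assuming the two facts are independent.
rows = con.execute(
    """
    SELECT s.subj, s.obj, s.p * l.p AS joint_p
    FROM triples AS s
    JOIN triples AS l
      ON l.subj = s.obj AND l.obj = s.subj
    WHERE s.pred = 'speaks_to' AND l.pred = 'listen_to'
    ORDER BY joint_p DESC
    """
).fetchall()
print(rows)
```

The same pattern (a join plus a probability expression) extends to the other tables, for example joining the term table with the triples to mix full-text and annotation evidence.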
Choose hot topics from (kid-)news

   http://www.opstel.eu

   [Figure: strategy graph. Blocks: Kid news, Rank on date, Expand, Extract terms.]

Use POS annotations


    Text
        <abstract date="2013-01-15">
          Lilly de pitbull is een held. De hond uit
          de Amerikaanse staat Massachusetts heeft …
        </abstract>



    Annotated text: we are interested in NPs

     <abstract date="2013-01-15">
       <NP>Lilly de pitbull</NP> is <NP>een held</NP>.
       <NP>De hond uit de Amerikaanse staat
       Massachusetts</NP> heeft …
     </abstract>
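
Pulling the NPs out of such annotated text takes only a few lines. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; it assumes the annotation is well-formed XML exactly as shown above and is not the tooling used by the authors.

```python
import xml.etree.ElementTree as ET

annotated = """
<abstract date="2013-01-15">
  <NP>Lilly de pitbull</NP> is <NP>een held</NP>.
  <NP>De hond uit de Amerikaanse staat
  Massachusetts</NP> heeft …
</abstract>
"""

root = ET.fromstring(annotated.strip())
date = root.get("date")

# Collect the text of every NP element, collapsing line breaks and extra spaces.
noun_phrases = [" ".join(np_elem.text.split()) for np_elem in root.iter("NP")]

print(date, noun_phrases)
```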



"Lilly de held" on Alpino




Choose hot topics from (kid-)news

   [Figure: extended strategy graph. Blocks: Kid news, Rank on date, Expand,
    Top terms, Top NPs.]

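One way to picture the "Rank on date, then take the top NPs" idea is the sketch below. It is an assumption-laden illustration, not the opstel.eu strategy: the news items other than the Lilly example are made up, the exponential recency weight with a seven-day half-life is an arbitrary choice, and `hot_noun_phrases` is a hypothetical helper.

```python
from collections import Counter
from datetime import date

# Hypothetical news items: (publication date, NPs found in the abstract).
news = [
    (date(2013, 1, 15), ["Lilly de pitbull", "een held"]),
    (date(2013, 1, 14), ["een held", "de hond"]),
    (date(2012, 6, 1), ["de zomer"]),
]


def hot_noun_phrases(items, today, half_life_days=7.0, k=2):
    """Score each NP by the summed recency weight of the items mentioning it."""
    scores = Counter()
    for published, phrases in items:
        age_days = (today - published).days
        weight = 0.5 ** (age_days / half_life_days)  # newer items weigh more
        for phrase in set(phrases):                  # count each item once per NP
            scores[phrase.lower()] += weight
    return [phrase for phrase, _ in scores.most_common(k)]


top = hot_noun_phrases(news, today=date(2013, 1, 18))
print(top)
```

Swapping the NP lists for plain tokens gives the "Top terms" variant of the same block.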
Topic suggestion for kids


●   Data: Wikipedia, magazines for children, ...

●   Left branch: rank data sources on annotations, e.g.:
    ●   Most seen content – hot topics
    ●   Seen during night-time? Probably not for kids

●   Right branch: query expansion using recent (hot) content

●   Can we improve this by adding ...?
    ●   Text reading level (machine learning)
    ●   Handle spelling mistakes in query expansion
    ●   Syntactic dependencies




Example: syntactic dependencies


●   AEGIR dependency parser for English (Koster et al.)

●   Parses text, outputs dependency triples
    ●   "PGs prevent the mucosal damage .. "

          [PG,SUBJ,prevent]
          [prevent,OBJ,damage]
          [damage,ATTR,mucosal]
          ...

●   CLEFIP 2011: Combining document representations for prior-art
    retrieval, Eva D'hondt, Suzan Verberne, Wouter Alink, Roberto
    Cornacchia
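
To work with such triples downstream, the bracketed notation shown above can be turned into tuples. This is a minimal sketch that only assumes the `[left,RELATION,right]` text format from the slide; it does not call AEGIR, and the field names `left`, `relation` and `right` are neutral labels chosen here, not the parser's terminology.

```python
import re
from typing import List, NamedTuple


class Triple(NamedTuple):
    left: str
    relation: str
    right: str


# Three comma-separated fields inside square brackets, no nesting.
TRIPLE_RE = re.compile(r"\[([^,\[\]]+),([^,\[\]]+),([^,\[\]]+)\]")


def parse_triples(text: str) -> List[Triple]:
    """Extract every [left,RELATION,right] group found in the text."""
    return [
        Triple(*(field.strip() for field in match.groups()))
        for match in TRIPLE_RE.finditer(text)
    ]


parser_output = "[PG,SUBJ,prevent] [prevent,OBJ,damage] [damage,ATTR,mucosal]"
triples = parse_triples(parser_output)
print(triples)
```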






         Prior art search.
Designed by Eva D'hondt, Nijmegen





                          Find patents containing similar triples
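
A minimal sketch of what "similar triples" could mean in practice: rank patents by the overlap between their triple sets and the query's triple set. Jaccard overlap is an assumption made here for illustration; the actual prior-art strategy may weight or match triples differently, and the patent ids and their triples are made up.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of triples."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


query = {
    ("PG", "SUBJ", "prevent"),
    ("prevent", "OBJ", "damage"),
    ("damage", "ATTR", "mucosal"),
}

# Hypothetical patents, each represented by its set of dependency triples.
patents = {
    "patent_A": {("PG", "SUBJ", "prevent"), ("prevent", "OBJ", "damage")},
    "patent_B": {("damage", "ATTR", "mucosal"), ("drug", "SUBJ", "cause")},
    "patent_C": {("engine", "SUBJ", "rotate")},
}

scores = {pid: jaccard(query, triples) for pid, triples in patents.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```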

Recap


●   Strategies encapsulate domain expert knowledge (how to find)

●   Strategies abstract away search expert knowledge (how to search)
    YOU can easily experiment with (new) data representations,
    ranking formulas, annotations, etc.

●   Strategies facilitate knowledge management
    ●   Store / share / publish / refine

●   Minimise the effort needed to design/update complex
    domain-specific search engines

   [Figure: example strategy graph. Sources: Crime map, All houses, Query terms.
    Blocks: Rank on location (twice), Select on attribute, Rank full-text,
    combined through Difference and Union.]





                                 Thank you

                               www.spinque.com



