Usage and impact of controlled vocabularies in a
subject repository for indexing and retrieval

Dr. Timo Borst

LIBER 2011
Barcelona
29.6.-2.7.2011




                                ZBW is a member of the Leibniz Association
Overview

1. Terminology webservices as a means for supporting retrieval in the
   realm of library applications

2. Logfile analysis as an approach for analysing users' search
   behaviour

3. Results

4. Conclusions and suggestions for improving search interfaces




Terminology webservices
General idea: “Provide a framework for integrating authority data, which
is both normative and flexible enough to tolerate local idiosyncrasies on
a string level.”
Approach: Concept modelling based on Semantic Web /
SKOS standards (for concepts, persons, institutions,…)
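
To illustrate "normative, yet tolerant on a string level": a concept can carry one identifier and still be reachable through several label strings. The following Python sketch is only an assumption of how such a lookup might look, not the web service described here. The `Concept` class, the `normalize` helper, the example URI and the labels are hypothetical; the two label fields mirror the SKOS properties skos:prefLabel and skos:altLabel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A concept in the spirit of SKOS: one identifier, several labels."""
    uri: str                 # hypothetical identifier
    pref_label: str          # corresponds to skos:prefLabel
    alt_labels: tuple = ()   # corresponds to skos:altLabel (entry terms)


def normalize(label: str) -> str:
    """Reduce string-level variation: case and superfluous whitespace."""
    return " ".join(label.casefold().split())


def build_label_index(concepts):
    """Map every normalized label (preferred or alternative) to its concept."""
    index = {}
    for concept in concepts:
        for label in (concept.pref_label, *concept.alt_labels):
            index[normalize(label)] = concept
    return index


# Made-up example data, not actual thesaurus content.
concepts = [
    Concept(
        uri="http://example.org/concept/1",
        pref_label="Labour market",
        alt_labels=("Labor market", "Job market"),
    ),
]
index = build_label_index(concepts)

hit = index.get(normalize("  labor   MARKET "))
print(hit.pref_label if hit else "no match")
```

Whatever spelling a local application uses, the lookup resolves to the same concept, and the preferred label remains the normative form.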




Terminology webservices
Architecture




Terminology webservices
Terminology?

• "STW Thesaurus for Economics",
  http://zbw.eu/stw/versions/latest/about.en.html
• More than 6,000 standardized subject headings and 18,000 entry
  terms
• Contains concepts from economics and business research, but
  also from law, sociology and politics
• Part of the Semantic Web and the LOD cloud
• Integrated into our own retrieval applications, downloaded by
  many institutions


Terminology webservices
How does it work?




Logfile analysis
• Many approaches to the analysis of user behaviour (logfile analysis,
  real-time tracking, usability studies, questionnaires, …)
• To us, logfiles serve as a basis for analysing string patterns in
  queries, hence search behaviour on a linguistic level
• Basic idea: each user request is logged in a standardized way, e.g.
  by a web server

• Query strings are automatically processed and analysed, e.g. through
  scripts (Perl), regular expressions or UNIX shell commands (grep, sed, awk, …)
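
The processing step can be sketched as follows. This is a minimal Python illustration, not the scripts actually used. It assumes the web server writes the common "combined" log format and that the search string arrives in a URL parameter named `q`. The sample lines, the `/search` path and the crude robot filter are made up for the example.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Matches the request, status and (optional) user-agent part of a log line
# in combined log format.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

BOT_HINTS = ("bot", "crawler", "spider")  # crude, illustrative robot filter


def extract_query(line, param="q"):
    """Return the lower-cased search string of a log line, or None."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    agent = (match.group("agent") or "").lower()
    if any(hint in agent for hint in BOT_HINTS):
        return None  # request looks like a robot or crawler
    values = parse_qs(urlsplit(match.group("target")).query).get(param)
    return values[0].strip().lower() if values else None


# Made-up sample lines with anonymized IP addresses.
sample_log = [
    '0.0.0.0 - - [29/Jun/2011:10:00:00 +0200] "GET /search?q=labour+market HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '0.0.0.0 - - [29/Jun/2011:10:00:05 +0200] "GET /search?q=Labour%20Market HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '0.0.0.0 - - [29/Jun/2011:10:00:09 +0200] "GET /search?q=inflation HTTP/1.1" 200 4096 "-" "ExampleBot/1.0"',
    '0.0.0.0 - - [29/Jun/2011:10:00:12 +0200] "GET /about.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

queries = [q for q in map(extract_query, sample_log) if q]
query_counts = Counter(queries)
print(query_counts.most_common())
```

Decoding and lower-casing the parameter makes differently encoded spellings of the same query count as one string pattern.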

Logfile analysis


Pros

+ Automatic generation and persistence of logfiles
+ Can be processed at any time by different tools
+ Filtering of robots, crawlers etc. possible

Cons

- Access through proxies and browser caches -> no user
  identification and counting possible
- Sometimes restricted by data privacy rules (e.g., no IP
  tracking allowed)
- No real-time processing

Logfile analysis
To be investigated:

1. What is the current rate of search queries with controlled
   vocabulary?

2. What is the potential mapping of uncontrolled search terms to
   controlled vocabulary?

3. How does the use of controlled vocabulary affect document views?




Results
What is the current rate of search queries with controlled vocabulary
(JEL, STW terms by autosuggest and search term expansion
with/without scrolling)?
[Pie chart "rate of controlled queries". Legend: STW terms, JEL terms, STW expansion terms, STW expansion terms (scrolled), non-controlled. Values shown: 12%, 14%, 6%, 67%, 1%.]




Results
What is the potential mapping of uncontrolled search terms to
controlled vocabulary (internal search)?*
[Pie chart "potential for controlled queries / internal search". Legend: matching terms, non-matching terms, other. Values shown: 18%, 15%, 67%.]

*Approach: running search terms against a Lucene/Solr index of STW terms with stemming
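
The matching approach can be sketched in a few lines of Python. Note the assumptions: an in-memory dictionary stands in for the Lucene/Solr index, and `naive_stem` is a deliberately rough suffix stripper, not the stemmer Solr provides. The vocabulary and queries are made up, and the sketch only separates matching from non-matching terms; it does not model the "other" category.

```python
from collections import Counter


def naive_stem(word):
    """Very rough English suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ies", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + ("y" if suffix == "ies" else "")
    return word


def stem_key(term):
    """Normalize a (multi-word) term into a tuple of stemmed tokens."""
    return tuple(naive_stem(token) for token in term.lower().split())


def build_index(vocabulary):
    """Index controlled terms by their stemmed form."""
    return {stem_key(term): term for term in vocabulary}


def classify(queries, index):
    """Count queries that do or do not match a controlled term."""
    counts = Counter()
    for query in queries:
        label = "matching" if stem_key(query) in index else "non-matching"
        counts[label] += 1
    return counts


# Made-up vocabulary and queries, for illustration only.
vocabulary = ["Labour market", "Monetary policy", "Tax"]
queries = ["labour markets", "monetary policies", "taxes", "smith 2009"]

index = build_index(vocabulary)
counts = classify(queries, index)
shares = {label: n / len(queries) for label, n in counts.items()}
print(shares)
```

Because both the vocabulary and the queries pass through the same stemming step, plural and singular forms meet on the same key even though the stemmer itself is crude.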




Results
How does the use of controlled vocabulary affect document views
(Google search)?*
[Pie chart "potential for controlled queries / Google search". Legend: matching terms, non-matching terms, other. Values shown: 18%, 13%, 69%.]

*Approach: running search terms against a Lucene/Solr index of STW terms with stemming




Conclusions and suggestions for improving
search interfaces (I)
• Significant use of and potential for controlled vocabulary – if the
  vocabulary is big enough and constantly maintained
• Significant rate of uncontrolled terms belonging to other categories
  like "names" and "document titles" – how to support this better?
  - Different searches for names according to different roles (e.g.
    search for (co-)authors, in citations, author information etc.)
  - Suggesting names from authority files
• Result sets resulting from search term expansion are scrolled quite
  often – how to avoid this?
  - Adding filters
  - Sorting by column
  - Cascading search

Conclusions and suggestions for improving
search interfaces (II)
• Mapping of uncontrolled terms to the vocabulary may still be further
  improved by linguistic techniques – main goal: convergence
  between the "information system's language" and the users' language
• Uncontrolled internal search (in our repository) and Google search
  formally do not differ much – what does that mean?
  - Statement: Adaptation to Google's text-based search is not
    appropriate for domain-specific scientific search. Instead, we
    need
    - suggest services based on authority data for terms, names,
      institutions etc. to better anticipate domain-specific queries
    - visible real-time information about other users' search
      behaviour (community building)
    - visualization and navigation of domain-specific topics
Thank you!


Questions?

 Dr. Timo Borst
t.borst@zbw.eu




