Usage and impact of controlled vocabularies in a
subject repository for indexing and retrieval

Dr. Timo Borst

LIBER 2011
Barcelona
29.6.-2.7.2011




                                ZBW is member of the Leibniz Association
Overview

1. Terminology webservices as a means for supporting retrieval in the
   realm of library applications

2. Logfile analysis as an approach for analysing users' search
   behaviour

3. Results

4. Conclusions and suggestions for improving search interfaces




Terminology webservices
General idea: “Provide a framework for integrating authority data, which
is both normative and flexible enough to tolerate local idiosyncrasies on
a string level.”
Approach: Concept modelling based on Semantic Web /
SKOS standards (for concepts, persons, institutions,…)
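
As a rough illustration of the idea above, the following Python sketch models a SKOS-style concept with one preferred label and several entry terms, then resolves differently spelled strings to the same concept. The class, the helper names, the URI and the labels are hypothetical and chosen only for illustration. They are not taken from the STW data or from the ZBW web services.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A SKOS-style concept: one URI, one preferred label, many entry terms."""
    uri: str
    pref_label: str                                        # cf. skos:prefLabel
    alt_labels: list[str] = field(default_factory=list)   # cf. skos:altLabel


def normalize(label: str) -> str:
    """Fold case and collapse whitespace so string variants compare equal."""
    return " ".join(label.casefold().split())


def build_index(concepts: list[Concept]) -> dict[str, Concept]:
    """Map every normalized label (preferred or alternative) to its concept."""
    index: dict[str, Concept] = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[normalize(label)] = concept
    return index


# Hypothetical concept; URI and labels are illustrative, not STW data.
concepts = [
    Concept(
        uri="http://example.org/concept/1",
        pref_label="Labour market",
        alt_labels=["Labor market", "Job market"],
    )
]
index = build_index(concepts)

# A locally idiosyncratic spelling still resolves to the normative concept.
match = index.get(normalize("  labor   MARKET "))
print(match.uri if match else "no match")
```

The concept URI stays normative, while the set of labels attached to it absorbs local spelling variants.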




Terminology webservices
Architecture




Terminology webservices
Terminology?

 • "STW Thesaurus for Economics",
   http://zbw.eu/stw/versions/latest/about.en.html
 • More than 6,000 standardized subject headings and 18,000 entry
   terms
 • Contains concepts from economics and business research, but
   also from law, sociology and politics
 • Part of the Semantic Web and the LOD cloud
 • Integrated into our own retrieval applications, downloaded by
   many institutions

Terminology webservices
How does it work?




Logfile analysis
 • Many approaches to the analysis of user behaviour (logfile analysis,
   real-time tracking, usability studies, questionnaires…)
 • To us, logfiles serve as a basis for analysing string patterns in
   queries, hence search behaviour on a linguistic level
 • Basic idea: each user request is logged in a standardized way, e.g.
   by a web server
 • Query strings are automatically processed and analysed, e.g. through
   scripts (Perl), regular expressions or UNIX shell commands (grep, sed, awk,…)
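
The processing step described above might look roughly like the following Python sketch. It assumes log lines in the Apache Combined Log Format and a search URL whose query string carries the search terms in a parameter named `query`. The slides specify neither, so the parameter name, the `/search` path, the robot markers and the sample lines are all assumptions made for illustration.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Request part of a Common/Combined Log Format line, e.g. "GET /path?x=1 HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST) (?P<target>\S+) HTTP/[\d.]+"')

# Assumed substrings that identify robots in the user-agent field.
BOT_MARKERS = ("bot", "crawler", "spider")


def user_agent(line):
    """Return the last quoted field (the user agent in Combined Log Format)."""
    parts = line.rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def extract_queries(lines, param="query"):
    """Yield the decoded, lowercased search string of each non-robot request."""
    for line in lines:
        agent = user_agent(line).lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue  # filter robots and crawlers
        m = REQUEST_RE.search(line)
        if not m:
            continue
        # parse_qs decodes both '+' and '%20' to a space
        params = parse_qs(urlsplit(m.group("target")).query)
        for value in params.get(param, []):
            yield value.strip().lower()


# Hypothetical sample lines (documentation IP range, invented requests).
sample_log = [
    '192.0.2.1 - - [29/Jun/2011:10:00:00 +0200] "GET /search?query=labour+market HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [29/Jun/2011:10:00:05 +0200] "GET /search?query=Labour%20Market HTTP/1.1" 200 498 "-" "Mozilla/5.0"',
    '192.0.2.3 - - [29/Jun/2011:10:00:09 +0200] "GET /search?query=inflation HTTP/1.1" 200 300 "-" "ExampleBot/1.0"',
    '192.0.2.4 - - [29/Jun/2011:10:00:12 +0200] "GET /about.html HTTP/1.1" 200 120 "-" "Mozilla/5.0"',
]

counts = Counter(extract_queries(sample_log))
print(counts.most_common())
```

Lowercasing and URL-decoding before counting makes differently encoded variants of the same query fall together, which is what a string-level analysis of search behaviour needs.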

Logfile analysis

Pros
+ Automatic generation and persistence of logfiles
+ Can be processed at any time by different tools
+ Filtering of robots, crawlers etc. possible

Cons
- Access through proxies and browser caches -> no user
  identification and counting possible
- Sometimes restricted by data privacy rules (e.g., no IP
  tracking allowed)
- No real-time processing

Logfile analysis
To be investigated:

1. What is the current rate of search queries with controlled
   vocabulary?

2. What is the potential mapping of uncontrolled search terms to
   controlled vocabulary?

3. How does the use of controlled vocabulary affect document views?




Results
What is the current rate of search queries with controlled vocabulary
(JEL, STW terms by autosuggest and search term expansion
with/without scrolling)?
[Pie chart: rate of controlled queries. Legend: STW terms, JEL terms,
STW expansion terms, STW expansion terms (scrolled), non-controlled.
Segment labels: 12%, 14%, 6%, 1%, 67%.]

Results
What is the potential mapping of uncontrolled search terms to
controlled vocabulary (internal search)?*
[Pie chart: potential for controlled queries / internal search.
Legend: matching terms, non-matching terms, other.
Segment labels: 18%, 15%, 67%.]

*Approach: running search terms against a Lucene/Solr index of STW terms
with stemming.

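
The matching approach named in the footnote can be sketched in plain Python as follows. This is not the Lucene/Solr setup used for the study: the suffix-stripping function is a deliberately crude stand-in for a real stemmer, the vocabulary and queries are hypothetical, and the sketch only separates matching from non-matching terms without the third "other" category shown in the chart.

```python
def naive_stem(word: str) -> str:
    """Very crude English suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ies", "es", "s", "ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            # "policies" -> "policy"; other suffixes are simply dropped
            return word[: -len(suffix)] + ("y" if suffix == "ies" else "")
    return word


def stem_key(term: str) -> str:
    """Lowercase a term and stem each of its tokens."""
    return " ".join(naive_stem(tok) for tok in term.lower().split())


def classify(queries, vocabulary):
    """Label each query as matching if its stemmed form is a vocabulary term."""
    stems = {stem_key(term) for term in vocabulary}
    return {
        q: ("matching" if stem_key(q) in stems else "non-matching")
        for q in queries
    }


# Hypothetical vocabulary and queries, for illustration only.
vocabulary = ["Labour market", "Monetary policy", "Tax"]
queries = ["labour markets", "monetary policies", "taxes", "keynes"]

result = classify(queries, vocabulary)
for query, label in result.items():
    print(f"{query}: {label}")
```

Stemming both the vocabulary and the query before comparison is what lets plural or inflected search terms count as potential controlled queries.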
Results
How does the use of controlled vocabulary affect document views
(Google search)?*
[Pie chart: potential for controlled queries / Google search.
Legend: matching terms, non-matching terms, other.
Segment labels: 18%, 13%, 69%.]

*Approach: running search terms against a Lucene/Solr index of STW terms
with stemming.

Conclusions and suggestions for improving
search interfaces (I)
 • Significant use of and potential for controlled vocabulary, provided
   the vocabulary is big enough and constantly maintained
 • Significant rate of uncontrolled terms belonging to other categories
   such as "names" and "document titles". How to support this better?
    - Different searches for names according to different roles (e.g.
      search for (co-)authors, in citations, author information etc.)
    - Suggesting names from authority files
 • Result sets resulting from search term expansion are scrolled quite
   often. How to avoid this?
    - Adding filters
    - Sorting by column
    - Cascading search

Conclusions and suggestions for improving
search interfaces (II)
 • Mapping of uncontrolled terms to the vocabulary may still be further
   improved by linguistic techniques. Main goal: convergence
   between the "information system's language" and user language
 • Uncontrolled internal search (in our repository) and Google search
   formally do not differ much. What does that mean?
    - Statement: Adaptation to Google's text-based search is not
      appropriate for domain-specific scientific search. Instead, we
      need
       - suggest services based on authority data for terms, names,
         institutions etc. to better anticipate domain-specific queries
       - visible real-time information about other users' search
         behaviour (community building)
       - visualization and navigation of domain-specific topics

Thank you!


Questions?

 Dr. Timo Borst
t.borst@zbw.eu





Usage and impact of controlled vocabularies in a subject repository for indexing and retrieval

  • 1. Usage and impact of controlled vocabularies in a subject repository for indexing and retrieval Dr. Timo Borst LIBER 2011 Barcelona 29.6.-2.7.2011 ZBW is member of the Leibniz Association
  • 2. Overview
      1. Terminology webservices as a means for supporting retrieval in the realm of library applications
      2. Logfile analysis as an approach for analysing users' search behaviour
      3. Results
      4. Conclusions and suggestions for improving search interfaces
  • 3. Terminology webservices
      General idea: "Provide a framework for integrating authority data, which is both normative and flexible enough to tolerate local idiosyncrasies on a string level."
      Approach: Concept modelling based on Semantic Web / SKOS standards (for concepts, persons, institutions, …)
  • 5. Terminology webservices: Terminology?
      - "STW Thesaurus for Economics", http://zbw.eu/stw/versions/latest/about.en.html
      - More than 6,000 standardized subject headings and 18,000 entry terms
      - Contains concepts from Economics and Business Research, but also from law, sociology and politics
      - Part of the Semantic Web and the LOD cloud
      - Integrated into our own retrieval applications, downloaded from many institutions
  • 7. Logfile analysis
      - Many approaches to the analysis of user behaviour (logfile analysis, real-time tracking, usability studies, questionnaires, …)
      - To us, logfiles serve as a basis for analysing string patterns in queries, hence search behaviour on a linguistic level
      - Basic idea: each user request is logged in a standardized way, e.g. by a web server
      - Query strings are automatically processed and analysed, e.g. through Perl scripts, regular expressions or UNIX shell commands (grep, sed, awk, …)
  • 8. Logfile analysis
      Pros
      + Automatic generation and persistence of logfiles
      + Can be processed at any time by different tools
      + Filtering of robots, crawlers etc. possible
      Cons
      - Access through proxies and browser caches -> no user identification and counting possible
      - Sometimes restricted by data privacy rules (e.g., no IP tracking allowed)
      - No real-time processing
  • 9. Logfile analysis: To be investigated
      1. What is the current rate of search queries with controlled vocabulary?
      2. What is the potential mapping of uncontrolled search terms to controlled vocabulary?
      3. How does the use of controlled vocabulary affect document views?
  • 10. Results: What is the current rate of search queries with controlled vocabulary (JEL, STW terms by autosuggest and search term expansion with/without scrolling)?
      [Pie chart "rate of controlled queries". Legend: STW terms, JEL terms, STW expansion terms, STW expansion terms (scrolled), non-controlled. Segment labels: 12%, 14%, 6%, 67%, 1%]
  • 11. Results: What is the potential mapping of uncontrolled search terms to controlled vocabulary (internal search)?*
      [Pie chart "potential for controlled queries / internal search". Legend: matching terms, non-matching terms, other. Segment labels: 18%, 15%, 67%]
      *approach: Running search terms against Lucene/SOLR index of STW terms with stemming
  • 12. Results: How does the use of controlled vocabulary affect document views (Google search)?*
      [Pie chart "potential for controlled queries / Google search". Legend: matching terms, non-matching terms, other. Segment labels: 18%, 13%, 69%]
      *approach: Running search terms against Lucene/SOLR index of STW terms with stemming
  • 13. Conclusions and suggestions for improving search interfaces (I)
      - Significant use of and potential for controlled vocabulary, if the vocabulary is big enough and constantly maintained
      - Significant rate of uncontrolled terms belonging to other categories like "names" and "document titles": how to support this better?
          - Different searches for names according to different roles (e.g. search for (co-)authors, in citations, author information etc.)
          - Suggesting names by authority files
      - Result sets resulting from search term expansion are scrolled quite often: how to avoid this?
          - Adding filters
          - Sorting by column
          - Cascading search
  • 14. Conclusions and suggestions for improving search interfaces (II)
      - Mapping of uncontrolled terms to the vocabulary may still be improved by linguistic techniques. Main goal: convergence between the "information system's language" and user language
      - Uncontrolled internal search (in our repository) and Google search formally do not differ much: what does that mean?
      - Statement: Adaptation to Google text-based search is not appropriate for domain-specific scientific search. Instead, we do need
          - suggest services based on authority data for terms, names, institutions etc. to better anticipate domain-specific queries
          - visible real-time information about other users' search behaviour (community building)
          - visualization and navigation of domain-specific topics
  • 16. Thank you! Questions? Dr. Timo Borst, t.borst@zbw.eu
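
The logfile approach from slides 7 and 11 (extract query strings from web server logs, then run them against the controlled vocabulary with stemming) might look roughly like the following minimal Python sketch. Everything specific in it is an assumption for illustration: the sample log lines, the `q` parameter name, the three-term vocabulary and the `crude_stem` helper are hypothetical. The slides mention Perl, shell tools and a Lucene/SOLR index of STW terms; this sketch swaps in an in-memory set and naive suffix stripping instead of a real stemmer or search index.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Hypothetical log lines in Common Log Format (not taken from the slides).
SAMPLE_LOG = [
    '192.0.2.1 - - [29/Jun/2011:10:00:00 +0200] "GET /search?q=monetary+policies HTTP/1.1" 200 512',
    '192.0.2.2 - - [29/Jun/2011:10:00:05 +0200] "GET /search?q=Inflation HTTP/1.1" 200 734',
    '192.0.2.3 - - [29/Jun/2011:10:00:09 +0200] "GET /search?q=john+smith HTTP/1.1" 200 120',
    '192.0.2.4 - - [29/Jun/2011:10:00:12 +0200] "GET /robots.txt HTTP/1.1" 200 42',
]

# Hypothetical stand-in for a controlled vocabulary such as the STW.
VOCABULARY = ["Monetary policy", "Inflation", "Labour market"]

# Captures the request target from the quoted request line.
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')


def extract_queries(lines, param="q"):
    """Return the decoded values of the search parameter from log lines."""
    queries = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        params = parse_qs(urlsplit(match.group(1)).query)
        queries.extend(params.get(param, []))
    return queries


def crude_stem(word):
    """Naive suffix stripping; a placeholder for a real stemmer."""
    word = word.lower()
    for suffix in ("ies", "es", "s", "y"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(phrase):
    """Reduce a phrase to a tuple of stemmed, lowercased words."""
    return tuple(crude_stem(word) for word in phrase.split())


def classify(query, stemmed_vocabulary):
    """Label a query by whether its stemmed form is in the vocabulary."""
    if normalize(query) in stemmed_vocabulary:
        return "matching"
    return "non-matching"


stemmed_vocabulary = {normalize(term) for term in VOCABULARY}
queries = extract_queries(SAMPLE_LOG)
counts = Counter(classify(query, stemmed_vocabulary) for query in queries)

for label, count in counts.items():
    print(f"{label}: {count} of {len(queries)} queries")
```

In this toy data, "monetary policies" only matches "Monetary policy" because both sides are stemmed before comparison, while the name query falls into the non-matching group, mirroring the slides' observation that names make up a notable share of uncontrolled terms.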