Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems
philipp.mayr@gesis.org
Workshop on Scholarly Big Data: Challenges and Ideas. IEEE BigData 2013
Intro
• What are Big Scholarly Information Systems?
Intro
• What are bibliometric-enhanced IR models?
– set of methods to quantitatively analyze scientific and technological literature
– E.g. citation analysis (h-index)
– CiteSeer was a pioneer bibliometric-enhanced IR system
Background
• DFG-funded (2009-2013): Projects IRM I and IRM II
– IRM = Information Retrieval Mehrwertdienste (value-added IR services)
• Goal: Implementation and evaluation of value-added IR services for digital library systems
• Main idea: Applying scholarly (science) models for IR
– Co-occurrence analysis of controlled vocabularies (thesauri)
– Bibliometric analysis of core journals (Bradford’s law)
– Centrality in author networks (betweenness)
• In IRM I we concentrated on the basic evaluation
• In IRM II we concentrate on the implementation of reusable (web) services
http://www.gesis.org/en/research/external-funding-projects/archive/irm/
Search Term Recommender (Petras 2006)
Search Term Service: recommending strongly associated terms from a controlled vocabulary
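The co-occurrence idea behind such a term recommender can be sketched as follows. This is a minimal illustration, not the IRM implementation: the Jaccard index as association measure, the function name `recommend_terms`, and the toy records are all assumptions made for this example.

```python
from collections import Counter

def recommend_terms(query_term, records, top_n=3):
    """Rank controlled-vocabulary descriptors by their association with a free-text term.

    records: list of (free_text_terms, descriptors) pairs, each a set of strings.
    The association measure (Jaccard index) is an assumption for illustration.
    """
    term_df = 0            # number of records containing the query term
    desc_df = Counter()    # number of records carrying each descriptor
    co_df = Counter()      # number of records with both the term and the descriptor
    for terms, descriptors in records:
        has_term = query_term in terms
        term_df += has_term
        for d in descriptors:
            desc_df[d] += 1
            if has_term:
                co_df[d] += 1
    # Jaccard: |records with both| / |records with either|
    scores = {d: co / (term_df + desc_df[d] - co) for d, co in co_df.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical toy collection: free-text terms paired with thesaurus descriptors.
records = [
    ({"unemployment", "youth"}, {"labor market", "adolescent"}),
    ({"unemployment", "policy"}, {"labor market", "social policy"}),
    ({"migration"}, {"labor market", "migrant"}),
    ({"unemployment"}, {"labor market"}),
]
suggestions = recommend_terms("unemployment", records)
```

Descriptors that co-occur with the query term in many records, and rarely appear without it, end up at the top of the suggestion list.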
Bradfordizing (White 1981, Mayr 2009)
Bradford's Law of Scattering (Bradford 1948): idealized example for 450 articles
Nucleus/Core: 150 papers in 3 journals
Zone 2: 150 papers in 9 journals
Zone 3: 150 papers in 27 journals
Ranking by Bradfordizing: sorting the core journal papers / core books on top
Bradfordized list of journals in informetrics
Applied to monographs: publisher as sorting criterion
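In the idealized example the zones hold equal numbers of papers while the journal counts grow as 1 : n : n² (here 3, 9, 27, so n = 3). A minimal sketch of Bradfordizing a result list under that reading is given below. The function names, the zone-assignment rule (cumulative paper count split into thirds), and the toy data are assumptions for illustration, not the procedure used in the cited studies.

```python
from collections import Counter

def bradford_zones(journal_of_doc, n_zones=3):
    """Map each journal to a zone (0 = core) so that zones hold roughly equal paper counts."""
    counts = Counter(journal_of_doc.values())
    total = sum(counts.values())
    zone_of_journal, cumulative = {}, 0
    # Most productive journals first; ties broken by name to keep the result deterministic.
    for journal, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # The zone depends on how many papers precede this journal in the ranking.
        zone_of_journal[journal] = min(cumulative * n_zones // total, n_zones - 1)
        cumulative += n
    return zone_of_journal

def bradfordize(ranked_doc_ids, journal_of_doc):
    """Stable re-sort: core-journal papers first, then zone 2, then zone 3."""
    zones = bradford_zones(journal_of_doc)
    return sorted(ranked_doc_ids, key=lambda d: zones[journal_of_doc[d]])

# Hypothetical result set: 18 papers spread over 7 journals (zones of 1, 2 and 4 journals).
papers_per_journal = {"A": 6, "B": 3, "C": 3, "D": 2, "E": 2, "F": 1, "G": 1}
journal_of_doc = {}
for journal, n in papers_per_journal.items():
    for i in range(n):
        journal_of_doc[f"{journal}{i}"] = journal

reranked = bradfordize(["G0", "B1", "A3", "D0", "A0"], journal_of_doc)
```

Because Python's sort is stable, papers within the same zone keep the order of the original ranking.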
Author Centrality (Mutschke 2001, 2004)
Ranking by Author Centrality: sorting central author papers on top
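A sketch of how ranking by author centrality might look: build a co-authorship graph from the result set, compute betweenness with Brandes' algorithm, and sort papers by their authors' centrality. Scoring a paper by its most central author, all function names, and the toy data are assumptions for illustration and are not taken from Mutschke's work.

```python
from collections import defaultdict, deque
from itertools import combinations

def coauthor_graph(authors_of_doc):
    """Undirected graph: authors are nodes, joint papers create edges."""
    graph = defaultdict(set)
    for authors in authors_of_doc.values():
        for a in authors:
            graph[a]  # make sure solo authors appear as isolated nodes
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                      # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

def rank_by_author_centrality(authors_of_doc):
    """Sort papers by the betweenness of their most central author (assumed scoring rule)."""
    central = betweenness(coauthor_graph(authors_of_doc))
    doc_score = {d: max((central[a] for a in authors), default=0.0)
                 for d, authors in authors_of_doc.items()}
    return sorted(doc_score, key=lambda d: (-doc_score[d], d))

# Hypothetical result set: a chain of co-authorships plus one solo author.
authors_of_doc = {
    "d1": ["ann", "bob"], "d2": ["bob", "cy"], "d3": ["cy", "dan"],
    "d4": ["dan", "eve"], "d5": ["fay"],
}
ranking = rank_by_author_centrality(authors_of_doc)
```

In this toy chain, `cy` sits on the most shortest paths, so the two papers co-authored by `cy` come first.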
Scenarios for combined ranking services
iterative use: simultaneous use:
Result Set
Core Journal Papers
Central Author Papers
Relevant
Papers
Result Set
Central Author Papers
Core Journal Papers
Prototype
http://multiweb.gesis.org/irsa/IRMPrototype
Evaluation
Main Research Issue:
Contribution to retrieval quality and usability
• Precision:
– Do central authors (core journals) provide more relevant hits?
– Do highly associated cowords have any positive effects?
• Value-adding effects:
– Do central authors (core journals) provide OTHER relevant hits?
– Do coword-relationships provide OTHER relevant search terms?
• Mashup effects:
– Do combinations of the services enhance the effects?
Evaluation Design
• precision in existing evaluation data:
– CLEF 2003-2007: 125 topics; 65,297 SOLIS documents
– KoMoHe 2007: 39 topics; 31,155 SOLIS documents
• plausibility tests:
– author centrality / journal coreness ↔ precision
– Bradfordizing ↔ author centrality
• precision tests with users (Online-Assessment-Tool)
• usability tests with users (acceptance)
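The precision tests above rest on a simple measure. A minimal sketch of precision at rank k, assuming binary relevance judgments per topic (the function name and the sample data are hypothetical):

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked documents that were judged relevant."""
    top = ranked_doc_ids[:k]
    if not top:
        return 0.0
    return sum(d in relevant_ids for d in top) / k

# Hypothetical ranking and relevance judgments for one topic.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d8"}
p5 = precision_at_k(ranked, relevant, k=5)
```

Averaging this value over all topics gives one precision figure per ranking service.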
Evaluation of Bradfordizing on CLEF Data (Mayr 2013)
Precision between Bradford zones (core, zone 2 and zone 3):

                    core    z2      z3
2003 articles       0.29    0.22    0.16
2004 articles       0.23    0.18    0.13
2005 articles       0.31    0.24    0.17
2006 articles       0.29    0.27    0.24
2007 articles       0.28    0.26    0.22
2005 monographs     0.21    0.16    0.19
2006 monographs     0.28    0.28    0.24
2007 monographs     0.24    0.21    0.23

• journal articles: significant improvement of precision from zone 3 to core
• monographs: slight improvement of precision distribution between the three zones
Evaluation of Author Centrality on CLEF Data
• moderate positive relationship between rate of networking and precision
• precision of TF-IDF rankings (0.60) significantly higher than author centrality based rankings (0.31) – BUT:
• very little overlap of documents on top of the ranking lists: 90% of relevant hits provided by author centrality did not appear on top of TF-IDF rankings
→ added precision of 28%
Figure: scatter plot of Precision and GiantSize. Correlation Precision10 - Giant Size: 0.25
• author centrality seems to favor relevant documents OTHER than those favored by traditional rankings
• value-adding effect: a different view of the information space
avg. number of docs: 517
avg. number of authors: 664
avg. number of co-authors: 302
avg. giant size: 24
Result: overlap
Intersection of suggested top n=10 documents over all topics and services (Mutschke et al. 2011)
The top-10 result lists overlap only marginally!
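The overlap between services can be measured as the size of the intersection of their top-n lists. The sketch below shows this for a single topic; the function name, the service labels, and the document ids are assumptions for illustration.

```python
from itertools import combinations

def top_n_overlap(rankings, n=10):
    """Pairwise intersection sizes of the top-n lists; rankings maps service name to ranked doc ids."""
    tops = {name: set(docs[:n]) for name, docs in rankings.items()}
    return {(a, b): len(tops[a] & tops[b]) for a, b in combinations(sorted(tops), 2)}

# Hypothetical top-3 lists of three services for one topic.
rankings = {
    "tfidf": ["d1", "d2", "d3"],
    "bradfordizing": ["d3", "d4", "d5"],
    "author_centrality": ["d6", "d1", "d7"],
}
overlap = top_n_overlap(rankings, n=3)
```

Small intersection sizes across topics would indicate that each service surfaces different documents.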
IRSA
IRSA: Workflow
Analysis
Output
Returning suggestions for any query term
Integration
www.sowiport.de is using query suggestions from IRSA
IRM & Modeling Science
measuring contribution of bibliometric-enhanced services to retrieval quality
deeper insights into structure & functioning of science
Bibliometric-enhanced services (structural attributes of science system)
way towards a formal model of science
References
• Mutschke, P., Mayr, P., Schaer, P., & Sure, Y. (2011). Science models as value-added services for scholarly information systems. Scientometrics, 89(1), 349–364. doi:10.1007/s11192-011-0430-x
• Lüke, T., Schaer, P., & Mayr, P. (2013). A framework for specific term recommendation systems. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’13 (pp. 1093–1094). New York, New York, USA: ACM Press. doi:10.1145/2484028.2484207
• Mayr, P. (2013). Relevance distributions across Bradford Zones: Can Bradfordizing improve search? In J. Gorraiz, E. Schiebel, C. Gumpenberger, M. Hörlesberger, & H. Moed (Eds.), 14th International Society of Scientometrics and Informetrics Conference (pp. 1493–1505). Vienna, Austria. Retrieved from http://arxiv.org/abs/1305.0357
• Hienert, D., Schaer, P., Schaible, J., & Mayr, P. (2011). A Novel Combined Term Suggestion Service for Domain-Specific Digital Libraries. In S. Gradmann, F. Borri, C. Meghini, & H. Schuldt (Eds.), International Conference on Theory and Practice of Digital Libraries (TPDL) (pp. 192–203). Berlin: Springer. doi:10.1007/978-3-642-24469-8_21
Using IRSA
Thank you!
Dr Philipp Mayr
GESIS Leibniz Institute for the Social Sciences
Unter Sachsenhausen 6-8
50667 Cologne
Germany
philipp.mayr@gesis.org

Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems

  • 1. Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems
      philipp.mayr@gesis.org
      Workshop on Scholarly Big Data: Challenges and Ideas. IEEE BigData 2013
  • 2. Intro
      • What are Big Scholarly Information Systems?
  • 3. Intro
      • What are bibliometric-enhanced IR models?
          – set of methods to quantitatively analyze scientific and technological literature
          – E.g. citation analysis (h-index)
          – CiteSeer was a pioneer bibliometric-enhanced IR system
  • 4. Background
      • DFG-funded (2009-2013): Projects IRM I and IRM II
          – IRM = Information Retrieval Mehrwertdienste (value-added IR services)
      • Goal: Implementation and evaluation of value-added IR services for digital library systems
      • Main idea: Applying scholarly (science) models for IR
          – Co-occurrence analysis of controlled vocabularies (thesauri)
          – Bibliometric analysis of core journals (Bradford’s law)
          – Centrality in author networks (betweenness)
      • In IRM I we concentrated on the basic evaluation
      • In IRM II we concentrate on the implementation of reusable (web) services
      • http://www.gesis.org/en/research/external-funding-projects/archive/irm/
  • 5. Search Term Recommender (Petras 2006) Search Term Service: recommending strongly associated terms from controlled vocabulary
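As a rough illustration of recommending strongly associated controlled-vocabulary terms, the sketch below scores each controlled term by its co-occurrence with a free-text search term. The Jaccard measure, the function names, and the toy documents are assumptions made for this example; the slides do not state which association measure the Search Term Service uses.

```python
from collections import Counter
from itertools import product


def build_cooccurrence(docs):
    """docs: list of (free_terms, controlled_terms) pairs, each a set of strings.

    Returns document frequencies for free terms, controlled terms,
    and (free term, controlled term) pairs.
    """
    free_df, ctrl_df, pair_df = Counter(), Counter(), Counter()
    for free_terms, controlled_terms in docs:
        free_df.update(free_terms)
        ctrl_df.update(controlled_terms)
        pair_df.update(product(free_terms, controlled_terms))
    return free_df, ctrl_df, pair_df


def recommend(term, stats, top_n=3):
    """Return the controlled terms most strongly associated with `term`."""
    free_df, ctrl_df, pair_df = stats
    scores = {}
    for (free, ctrl), both in pair_df.items():
        if free == term:
            # Jaccard index: documents with both terms / documents with either term
            scores[ctrl] = both / (free_df[free] + ctrl_df[ctrl] - both)
    # Highest score first; ties broken alphabetically for a stable result
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]


# Toy data (hypothetical): title words paired with thesaurus descriptors
docs = [
    ({"unemployment", "youth"}, {"labor market", "adolescent"}),
    ({"unemployment", "policy"}, {"labor market", "social policy"}),
    ({"youth", "school"}, {"adolescent", "education"}),
]
stats = build_cooccurrence(docs)
suggestions = recommend("unemployment", stats)
```

Here `suggestions` lists "labor market" first, because it is the only descriptor that appears in every document containing "unemployment".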
  • 6. Bradfordizing (White 1981, Mayr 2009)
      • Bradford Law of Scattering (Bradford 1948): idealized example for 450 articles
          – Nucleus/Core: 150 papers in 3 journals
          – Zone 2: 150 papers in 9 journals
          – Zone 3: 150 papers in 27 journals
      • Ranking by Bradfordizing: sorting the core journal papers / core books on top
      • Bradfordized list of journals in informetrics
      • Applied to monographs: publisher as sorting criterion
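The re-ranking idea can be sketched in a few lines: count how many papers each journal contributes to a result set, sort journals by that count, cut the sorted list into three zones holding roughly equal numbers of papers, and move papers from the most productive journals to the top. The function name, the zone-cutting rule, and the scaled-down toy data (27 papers in 1, 3 and 9 journals, mirroring the 3:9:27 pattern of the idealized example) are assumptions for illustration, not the implementation used in the IRM project.

```python
from collections import Counter


def bradfordize(papers):
    """papers: list of (paper_id, journal) tuples in their original ranking order.

    Returns (reranked_papers, zone_of_journal), where zones are 1 (core), 2, 3.
    """
    counts = Counter(journal for _, journal in papers)
    # Most productive journals first; ties broken by name for determinism
    journals = sorted(counts, key=lambda j: (-counts[j], j))
    third = len(papers) / 3
    zone_of = {}
    cumulative = 0
    for journal in journals:
        # The zone depends on how many papers were already assigned
        zone_of[journal] = min(int(cumulative // third) + 1, 3)
        cumulative += counts[journal]
    rank = {journal: i for i, journal in enumerate(journals)}
    # sorted() is stable, so the original order is kept within a journal
    reranked = sorted(papers, key=lambda p: rank[p[1]])
    return reranked, zone_of


# Toy result set (hypothetical): 1 core journal, 3 zone-2 journals, 9 zone-3 journals
journal_sizes = {"A": 9, "B": 3, "C": 3, "D": 3}
journal_sizes.update({f"S{i}": 1 for i in range(1, 10)})
papers = [(f"{j}-{k}", j) for j, n in journal_sizes.items() for k in range(n)]
papers.reverse()  # simulate an initial ranking with the core papers at the bottom

reranked, zone_of = bradfordize(papers)
journals_per_zone = dict(Counter(zone_of.values()))
papers_per_zone = dict(Counter(zone_of[j] for _, j in papers))
```

With this toy data each zone holds 9 papers, spread over 1, 3 and 9 journals, and all papers from journal "A" end up at the top of `reranked`.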
  • 7. Author Centrality (Mutschke 2001, 2004) Ranking by Author Centrality: sorting central author papers on top
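A minimal sketch of ranking by author centrality, assuming a co-authorship graph built from the result set and betweenness computed with Brandes' algorithm for unweighted graphs. Scoring a paper by the highest betweenness among its authors is an assumption made here for illustration; the slides only say that papers of central authors are sorted on top. All names and the toy data are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def coauthor_graph(papers):
    """papers: dict paper_id -> list of author names. Returns an adjacency dict."""
    graph = defaultdict(set)
    for authors in papers.values():
        for a in authors:
            graph[a]  # touch the key so single authors appear as isolated nodes
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes) for an undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search counting shortest paths from `source`
        stack, queue = [], deque([source])
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[source] = 1
        dist = {source: 0}
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint
    return {v: s / 2 for v, s in score.items()}


def rank_by_author_centrality(papers):
    """Sort paper ids so that papers with the most central author come first."""
    centrality = betweenness(coauthor_graph(papers))
    paper_score = {p: max(centrality[a] for a in authors)
                   for p, authors in papers.items()}
    return sorted(papers, key=lambda p: -paper_score[p])


# Toy result set (hypothetical), listed in its original ranking order
papers = {
    "p5": ["Fay"],
    "p4": ["Dee", "Eve"],
    "p1": ["Ann", "Bob"],
    "p2": ["Ann", "Cem"],
    "p3": ["Ann", "Dee"],
}
centrality = betweenness(coauthor_graph(papers))
ranking = rank_by_author_centrality(papers)
```

In this example Ann connects most of the other authors, so her papers move to the top, while the paper by the isolated author Fay drops to the bottom.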
  • 8. Scenarios for combined ranking services
      • Iterative use (diagram labels): Result Set, Core Journal Papers, Central Author Papers, Relevant Papers
      • Simultaneous use (diagram labels): Result Set, Central Author Papers, Core Journal Papers
  • 11. Main Research Issue: Contribution to retrieval quality and usability
      • Precision:
          – Do central authors (core journals) provide more relevant hits?
          – Do highly associated cowords have any positive effects?
      • Value-adding effects:
          – Do central authors (core journals) provide OTHER relevant hits?
          – Do coword-relationships provide OTHER relevant search terms?
      • Mashup effects:
          – Do combinations of the services enhance the effects?
  • 12. Evaluation Design
      • Precision in existing evaluation data:
          – CLEF 2003-2007: 125 topics; 65,297 SOLIS documents
          – KoMoHe 2007: 39 topics; 31,155 SOLIS documents
      • Plausibility tests:
          – author centrality / journal coreness ↔ precision
          – Bradfordizing ↔ author centrality
      • Precision tests with users (Online-Assessment-Tool)
      • Usability tests with users (acceptance)
  • 13. Evaluation of Bradfordizing on CLEF Data (Mayr 2013)
      Precision between Bradford zones (core, zone 2 and zone 3):

      Document set        core    z2      z3
      2003 articles       0.29    0.22    0.16
      2004 articles       0.23    0.18    0.13
      2005 articles       0.31    0.24    0.17
      2006 articles       0.29    0.27    0.24
      2007 articles       0.28    0.26    0.22
      2005 monographs     0.21    0.16    0.19
      2006 monographs     0.28    0.28    0.24
      2007 monographs     0.24    0.21    0.23

      • Journal articles: significant improvement of precision from zone 3 to core
      • Monographs: slight improvement of precision distribution between the three zones
  • 14. Evaluation of Author Centrality on CLEF Data
      • Moderate positive relationship between rate of networking and precision
          – [Scatter plot: giant size vs. precision] Correlation between Precision10 and giant size: 0.25
      • Precision of TF-IDF rankings (0.60) is significantly higher than that of author centrality based rankings (0.31)
      • BUT: very little overlap of documents on top of the ranking lists: 90% of relevant hits provided by author centrality did not appear on top of TF-IDF rankings → added precision of 28%
      • Author centrality seems to favor relevant documents OTHER than those favored by traditional rankings
      • Value-adding effect: another view of the information space
      • Averages: number of docs 517, number of authors 664, number of co-authors 302, giant size 24
  • 15. Result: overlap
      • Intersection of suggested top n=10 documents over all topics and services (Mutschke et al. 2011)
      • The top 10 result lists overlap only marginally!
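The two quantities behind these results, precision in the top n and the intersection of top-n lists, are simple to compute. The sketch below uses made-up document ids and relevance judgments purely to show the arithmetic; it does not reproduce the CLEF or KoMoHe figures, and the function names are hypothetical.

```python
def precision_at_k(ranking, relevant, k=10):
    """Share of the top-k ranked documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def top_overlap(ranking_a, ranking_b, k=10):
    """Documents that appear in the top k of both rankings."""
    return set(ranking_a[:k]) & set(ranking_b[:k])


# Toy data (hypothetical): two rankings for one topic and its relevant documents
tfidf_ranking = ["d1", "d2", "d3", "d4", "d5"]
centrality_ranking = ["d9", "d2", "d8", "d7", "d6"]
relevant = {"d1", "d2", "d3", "d8", "d9"}

p_tfidf = precision_at_k(tfidf_ranking, relevant, k=5)
p_centrality = precision_at_k(centrality_ranking, relevant, k=5)
shared = top_overlap(tfidf_ranking, centrality_ranking, k=5)

# Relevant hits that only the centrality ranking surfaces in its top 5
other_relevant = (set(centrality_ranking[:5]) & relevant) - set(tfidf_ranking[:5])
```

In this toy case both rankings reach the same precision, yet they share only one top document, and `other_relevant` holds the relevant hits that the second ranking adds.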
  • 21. IRM & Modeling Science
      • Measuring contribution of bibliometric-enhanced services to retrieval quality
      • Deeper insights in structure & functioning of science
      • Bibliometric-enhanced services (structural attributes of science system)
      • Way towards a formal model of science
  • 22. References
      • Mutschke, P., Mayr, P., Schaer, P., & Sure, Y. (2011). Science models as value-added services for scholarly information systems. Scientometrics, 89(1), 349–364. doi:10.1007/s11192-011-0430-x
      • Lüke, T., Schaer, P., & Mayr, P. (2013). A framework for specific term recommendation systems. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’13 (pp. 1093–1094). New York, New York, USA: ACM Press. doi:10.1145/2484028.2484207
      • Mayr, P. (2013). Relevance distributions across Bradford Zones: Can Bradfordizing improve search? In J. Gorraiz, E. Schiebel, C. Gumpenberger, M. Hörlesberger, & H. Moed (Eds.), 14th International Society of Scientometrics and Informetrics Conference (pp. 1493–1505). Vienna, Austria. Retrieved from http://arxiv.org/abs/1305.0357
      • Hienert, D., Schaer, P., Schaible, J., & Mayr, P. (2011). A Novel Combined Term Suggestion Service for Domain-Specific Digital Libraries. In S. Gradmann, F. Borri, C. Meghini, & H. Schuldt (Eds.), International Conference on Theory and Practice of Digital Libraries (TPDL) (pp. 192–203). Berlin: Springer. doi:10.1007/978-3-642-24469-8_21
  • 24. Thank you!
      Dr Philipp Mayr
      GESIS Leibniz Institute for the Social Sciences
      Unter Sachsenhausen 6-8, 50667 Cologne, Germany
      philipp.mayr@gesis.org