Between information retrieval services and bibliometrics research. New ways of semantic browsing and visual analytics

Rob Koopman, Shenghui Wang
OCLC Research
Andrea Scharnhorst
DANS-KNAW
November 7, 2015
ASIST, sigmetrics workshop

Content
- New approach to find structure in bibliographic information – ARIADNE (2 Method)
- Applications:
  - Data curation – author disambiguation (1 Motivation)
  - Illustration of topics – the case of digital humanities; topical browsing – DEMO (3)
- Excursion into bibliometrics – the Berlin group challenge (4)
- Wrapping up (5)

Data curation – author disambiguation

Mapping topics, communities, research fronts, …
Bibliometrics
Documents are similar because
they:
- Cite each other
- Are cited together
- Use the same references
- Use the same vocabulary
- Have the same authors
Information retrieval
Documents are similar because
they:
- Use the same vocabulary
- …
ARIADNE is about similarity of entities!
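
Two of the bibliometric similarity notions listed above, "use the same references" (bibliographic coupling) and "are cited together" (co-citation), can be read off a citation matrix. The following is a minimal sketch on a hypothetical four-document toy matrix; the data and variable names are illustrative assumptions and are not taken from ARIADNE.

```python
import numpy as np

# Hypothetical toy citation matrix: A[i, j] = 1 if document i cites document j.
A = np.array([
    [0, 0, 1, 1],  # doc 0 cites docs 2 and 3
    [0, 0, 1, 1],  # doc 1 cites docs 2 and 3
    [0, 0, 0, 1],  # doc 2 cites doc 3
    [0, 0, 0, 0],  # doc 3 cites nothing
])

# Bibliographic coupling: entry (i, j) counts the references shared by docs i and j.
coupling = A @ A.T

# Co-citation: entry (i, j) counts the documents that cite both doc i and doc j.
cocitation = A.T @ A

print("shared references of doc 0 and doc 1:", coupling[0, 1])
print("documents citing both doc 2 and doc 3:", cocitation[2, 3])
```

Vocabulary- or author-based similarity would follow the same pattern, with a document-by-term or document-by-author matrix in place of `A`.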

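Document/work, Record and Entity
[Diagram: a bibliographic record with the fields Authors, Title, Journal, …, Reference, Subject is broken down into entities: author names (e.g. "Glänzel, W." and "Glanzel, W."), topical terms (e.g. bibliometrics, citations, …, Casimir effect), references and journals. N=SUM (doc)]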

A MARC record
title
authors
issn
dewey
publisher

Demo examples
• http://thoth.pica.nl/demo/relate WorldCat
• http://thoth.pica.nl/relate ArticleFirst
• http://thoth.pica.nl/astro/relate Astrophysics data (Berlin group)

Dataset
● WorldCat, 300+ million records
● Selected 13 million items (topical terms, authors, ISSNs, Dewey decimal codes, publishers, subject headings)
● Represented by 6 million topical terms
But a matrix of 13M x 6M is too big to process
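
To make "too big" concrete, here is a back-of-the-envelope calculation of the dense size of such a matrix. The 4 bytes per cell (32-bit values) is an assumption for illustration; the slides do not state a storage format.

```python
n_items = 13_000_000   # items selected from WorldCat (from the slide)
n_terms = 6_000_000    # topical terms (from the slide)
bytes_per_cell = 4     # assumption: one 32-bit value per cell

cells = n_items * n_terms
dense_terabytes = cells * bytes_per_cell / 1e12

print(f"{cells:.2e} cells, about {dense_terabytes:.0f} TB if stored densely")
```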

Step 1: Building the semantic matrix, with dimension reduction based on Random Projection
- C: a co-occurrence matrix
- R: a random matrix of +/-1
- C’: approximation of C after random projection (the semantic matrix)
Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne’s thread: Interactive navigation in a world of networked information. In: CHI’15 Extended Abstracts. (2015)
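
The random projection named on this slide can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the ARIADNE implementation: the toy sizes, the target dimension `k`, the Poisson toy counts and the 1/sqrt(k) scaling are all choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; the real matrix is far larger.
n_items, n_terms, k = 200, 1000, 256

# C: toy co-occurrence counts (items x terms), mostly zeros.
C = rng.poisson(0.05, size=(n_items, n_terms)).astype(np.float64)

# R: random matrix of +/-1. Scaling by 1/sqrt(k) is an assumption that keeps
# vector lengths roughly unchanged after projection.
R = rng.choice([-1.0, 1.0], size=(n_terms, k)) / np.sqrt(k)

# C_reduced plays the role of C': each item is now described by k numbers.
C_reduced = C @ R

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarities in the reduced space approximate those in the original space.
print("original:", cosine(C[0], C[1]))
print("reduced: ", cosine(C_reduced[0], C_reduced[1]))
```

The point of the design is that `C_reduced` has only `k` columns instead of one column per term, while distances and angles between rows are approximately preserved.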

Step 2: Interactive exploration
- Provide a simple search/text box
- Calculate the top 500 most related candidates
- Find mutually related items
- Convert distances to probabilities
- Project to 2D
- Enhance interface with links to other spaces
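
The computational steps in this list can be sketched as follows. The slides do not say how distances are converted to probabilities or how the 2D projection is computed, so the exponential weighting, the `temperature` parameter and the PCA-style projection below are assumptions; the random `vectors` merely stand in for rows of the semantic matrix, and the search box and interface links are not covered.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the semantic matrix: one unit-length row per entity.
vectors = rng.normal(size=(2000, 64))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

def top_related(query_idx, k=500):
    """Return indices and cosine similarities of the k entities closest to the query."""
    sims = vectors @ vectors[query_idx]
    order = np.argsort(-sims)[:k]
    return order, sims[order]

def to_probabilities(sims, temperature=0.1):
    """Assumed conversion: exponentially down-weight cosine distance, then normalise."""
    weights = np.exp(-(1.0 - sims) / temperature)
    return weights / weights.sum()

def project_2d(points):
    """Assumed projection: the first two principal components of the candidates."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

idx, sims = top_related(query_idx=0)
mutual = vectors[idx] @ vectors[idx].T   # how the candidates relate to each other
probs = to_probabilities(sims)
coords = project_2d(vectors[idx])        # one (x, y) position per candidate
```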

Exploration of a topic
http://thoth.pica.nl/relate?input=hirsch%20index&fsize=100&ncluster=

Illustration of context around a topic/field – journal view
[Figure: journal view with annotated regions: digital libraries; science, computer science, ontologies; many different humanities fields, prominently language and literary studies]
Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive navigation in a world of networked information. In: CHI'15 Extended Abstracts. (2015)

As visual exploration of any dataset – astrophysics case

Wrapping up – future work
● Compare the algorithm to other existing algorithms – benchmarking
● More metadata fields (publisher, subject, identifiers) – ongoing
● Identify further problems to which Ariadne can be applied:
  - Curation (e.g. author name disambiguation)
  - Knowledge discovery (e.g. matching chemical molecules)
  - Information science – population of libraries, subject areas, …
● Feedback from users – prepare user scenarios for usability testing and set up an evaluation project – tbd
● Improve visualisation
● More functionality (timeline, history)
● Extend the implementation to other databases

Thank you
rob.koopman@oclc.org
shenghui.wang@oclc.org
Andrea.scharnhorst@dans.knaw.nl
http://thoth.pica.nl/relate (ArticleFirst)
http://thoth.pica.nl/astro/relate (Astrophysics articles)
http://thoth.pica.nl/demo/relate (WorldCat)

References
Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive
navigation in a world of networked information. In: B. Begole, J. Kim, K. Inkpen, W. Woo
(eds.) Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human
Factors in Computing Systems, CHI 2015 Extended Abstracts, Seoul, Republic of Korea,
April 18-23, 2015, pp. 1833–1838. ACM (2015). DOI 10.1145/2702613.2732781. URL
http://doi.acm.org/10.1145/2702613.2732781 (Preprint Arxiv.org)
Koopman, R., Wang, S., Scharnhorst, A.: Contextualization of Topics - Browsing through
Terms, Authors, Journals and Cluster Allocations. In: A.A. Salah, Y. Tonta, A.A.A.
Salah, C. Sugimoto, U. Al (eds.) Proceedings of ISSI 2015 Istanbul. 15th International
Society of Scientometrics and Informetrics Conference, Istanbul, Turkey, 29th June to 4th
July 2015, pp. 1042–1053. Bogazici University Printhouse, Istanbul (2015). URL
http://www.issi2015.org/en/Proceedings-of-ISSI-2015.html
