
Between information retrieval services and bibliometrics research. New ways of semantic browsing and visual analytics


R. Koopman, S. Wang, A. Scharnhorst (2015). Between information retrieval services and bibliometrics research. New ways of semantic browsing and visual analytics. Presentation at the Sigmetrics workshop, ASIST 2015, November 7, 2015, St. Louis, Missouri.



  1. Between information retrieval services and bibliometrics research: new ways of semantic browsing and visual analytics. Rob Koopman and Shenghui Wang (OCLC Research), Andrea Scharnhorst (DANS-KNAW). November 7, 2015, ASIST, Sigmetrics workshop.
  2. Content
     - New approach to find structure in bibliographic information: ARIADNE (2 Method)
     - Applications:
       - Data curation: author disambiguation (1 Motivation)
       - Illustration of topics: the case of digital humanities. Topical browsing, DEMO (3)
       - Excursion into bibliometrics: the Berlin group challenge (4)
     - Wrapping up (5)
  3. Data curation: author disambiguation
  4. Mapping topics, communities, research fronts, …
     Bibliometrics: documents are similar because they
     - cite each other
     - are cited together
     - use the same references
     - use the same vocabulary
     - have the same authors
     Information retrieval: documents are similar because they
     - use the same vocabulary
     - …
     ARIADNE is about similarity of entities!
  5. Document/work, Record and Entity. [Diagram labels: record fields (Authors, Title, Journal, …, Reference, Subject); entity types (author names, topical terms, reference, journal); example entities (Glänzel, W.; Glanzel, W.; bibliometrics; citations; Casimir effect); annotation: N=SUM (doc)]
  6. A MARC record: title, authors, issn, dewey, publisher
  7. Demo examples
     - http://thoth.pica.nl/demo/relate (WorldCat)
     - http://thoth.pica.nl/relate (ArticleFirst)
     - http://thoth.pica.nl/astro/relate (Astrophysics data, Berlin group)
  8. Dataset
     - WorldCat, 300+ million records
     - Selected 13 million items (topical terms, authors, ISSNs, Dewey decimal codes, publishers, subject headings)
     - Represented by 6 million topical terms
     But a matrix of 13M x 6M is too big to process.
  9. Step 1: Building the semantic matrix, with dimension reduction based on Random Projection
     - C: a co-occurrence matrix
     - R: a random matrix of +/-1
     - C': approximation of C after random projection (the semantic matrix)
     Source: Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive navigation in a world of networked information. In: CHI'15 Extended Abstracts.
  10. Step 2: Interactive exploration
     - Provide a simple search/text box
     - Calculate the top 500 most related candidates
     - Find mutually related items
     - Convert distances to probabilities
     - Project to 2D
     - Enhance interface with links to other spaces
  11. Exploration of a topic: http://thoth.pica.nl/relate?input=hirsch%20index&fsize=100&ncluster=
  12. Illustration of context around a topic/field: journal view. [Figure annotations: EINS 1st PLENARY; Digital libraries; Science, Computer Science, ontologies; Many different humanities fields; Prominently language & literary studies] Source: Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive navigation in a world of networked information. In: CHI'15 Extended Abstracts. (2015)
  13. As visual exploration of any dataset: astrophysics case
  14. Wrapping up: future work
     - Compare the algorithm to other existing algorithms (benchmarking)
     - More metadata fields (publisher, subject, identifiers): ongoing
     - Identify further problems to which Ariadne can be applied
       - Curation (e.g. author name disambiguation)
       - Knowledge discovery (e.g. matching chemical molecules)
       - Information science: population of libraries, subject areas, …
     - Feedback from users: prepare user scenarios for usability testing and set up an evaluation project (tbd)
     - Improve visualisation
     - More functionality (timeline, history)
     - Extend the implementation to other databases
  15. Thank you
     - rob.koopman@oclc.org
     - shenghui.wang@oclc.org
     - Andrea.scharnhorst@dans.knaw.nl
     - http://thoth.pica.nl/relate (ArticleFirst)
     - http://thoth.pica.nl/astro/relate (Astrophysics articles)
     - http://thoth.pica.nl/demo/relate (WorldCat)
  16. References
     - Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive navigation in a world of networked information. In: B. Begole, J. Kim, K. Inkpen, W. Woo (eds.) Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, CHI 2015 Extended Abstracts, Seoul, Republic of Korea, April 18-23, 2015, pp. 1833-1838. ACM (2015). DOI 10.1145/2702613.2732781. URL http://doi.acm.org/10.1145/2702613.2732781 (Preprint Arxiv.org)
     - Koopman, R., Wang, S., Scharnhorst, A.: Contextualization of Topics - Browsing through Terms, Authors, Journals and Cluster Allocations. In: A.A. Salah, Y. Tonta, A.A.A. Salah, C. Sugimoto, U. Al (eds.) Proceedings of ISSI 2015 Istanbul. 15th International Society of Scientometrics and Informetrics Conference, Istanbul, Turkey, 29th June to 4th July 2015, pp. 1042-1053. Boğaziçi University Printhouse, Istanbul (2015). URL http://www.issi2015.org/en/Proceedings-of-ISSI-2015.html
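Slide 4 lists citation-based reasons why documents are similar. Two of them, "use the same references" and "are cited together", are commonly known in bibliometrics as bibliographic coupling and co-citation. The sketch below counts both on hypothetical toy data; the document labels and the `cites` mapping are invented for illustration and do not come from the slides.

```python
# Hypothetical toy data: each document maps to the set of documents it cites.
cites = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"E"},
    "D": set(),
    "E": set(),
}


def bibliographic_coupling(cites, x, y):
    """Count the references shared by x and y ("use the same references")."""
    return len(cites[x] & cites[y])


def co_citation(cites, x, y):
    """Count the documents that cite both x and y ("are cited together")."""
    return sum(1 for refs in cites.values() if x in refs and y in refs)


coupling_ab = bibliographic_coupling(cites, "A", "B")  # A and B share C and D
cocited_cd = co_citation(cites, "C", "D")              # A and B both cite C and D
print(coupling_ab, cocited_cd)
```

Both counts compare documents only. The slides stress that ARIADNE instead compares entities such as authors, terms, and journals.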
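Slides 8 and 9 say that a 13M x 6M matrix is too big to process and that a random matrix of +/-1 is used to build a smaller approximation. The sketch below first estimates the dense size (about 312 TB, assuming 4 bytes per cell) and then projects a tiny co-occurrence matrix. The toy matrix `C`, the target dimension `k`, and the `1/sqrt(k)` scaling are assumptions for illustration; the slides only state that `R` contains +/-1 entries, so this is not necessarily the authors' implementation.

```python
import math
import random

# Sizes quoted on slide 8; 4 bytes per cell is an assumption for illustration.
ROWS, COLS, BYTES_PER_CELL = 13_000_000, 6_000_000, 4
dense_terabytes = ROWS * COLS * BYTES_PER_CELL / 1e12  # dense storage estimate


def random_sign_matrix(n_rows, n_cols, rng):
    """Build R: every entry is drawn uniformly from {-1, +1}."""
    return [[rng.choice((-1, 1)) for _ in range(n_cols)] for _ in range(n_rows)]


def project(C, R):
    """Compute C' = C R / sqrt(k), where k is the number of columns of R."""
    k = len(R[0])
    scale = 1.0 / math.sqrt(k)  # assumed scaling, not stated on the slides
    return [
        [scale * sum(c * R[j][col] for j, c in enumerate(row)) for col in range(k)]
        for row in C
    ]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


rng = random.Random(0)
# Hypothetical co-occurrence counts: 3 entities (rows) x 8 topical terms (columns).
C = [
    [2, 1, 0, 0, 3, 0, 0, 1],
    [2, 1, 0, 0, 3, 0, 0, 1],  # same profile as the first entity
    [0, 0, 4, 1, 0, 2, 0, 0],
]
k = 4  # reduced dimension, smaller than the 8 original columns
R = random_sign_matrix(len(C[0]), k, rng)
C_prime = project(C, R)
print(dense_terabytes, len(C_prime), len(C_prime[0]))
```

Entities with identical co-occurrence profiles keep identical projected rows. Similarities between different rows are only approximated, and the approximation generally improves as `k` grows.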
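Slide 10 outlines the exploration step: rank the most related candidates, then convert distances to probabilities before projecting to 2D. The sketch below covers only the ranking and the conversion. The slides name neither the distance measure nor the conversion formula, so cosine distance and a normalised `exp(-d)` weighting are assumptions, as are the toy vectors. The mutual-relatedness check and the 2D projection are left out because the slides do not specify them.

```python
import math

# Hypothetical semantic vectors (rows of a reduced matrix), for illustration only.
vectors = {
    "h-index": [0.9, 0.1, 0.0],
    "citations": [0.8, 0.2, 0.1],
    "bibliometrics": [0.7, 0.3, 0.0],
    "Casimir effect": [0.0, 0.1, 0.9],
}


def cosine_distance(u, v):
    """1 - cosine similarity (assumed measure; the slides do not name one)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def top_related(query, vectors, n):
    """Return the n entities closest to the query, nearest first."""
    dists = [
        (name, cosine_distance(vectors[query], vec))
        for name, vec in vectors.items()
        if name != query
    ]
    return sorted(dists, key=lambda pair: pair[1])[:n]


def to_probabilities(dists):
    """Turn distances into weights exp(-d) and normalise them to sum to 1."""
    weights = [math.exp(-d) for _, d in dists]
    total = sum(weights)
    return [(name, w / total) for (name, _), w in zip(dists, weights)]


related = top_related("h-index", vectors, n=2)  # the slide uses the top 500
probs = to_probabilities(related)
print(related)
print(probs)
```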
