Improved term maps using full-text data


1. The document discusses using full-text data, rather than only metadata, to create improved term maps for visualizing topics in scientific literature.
2. It compares different approaches for creating term maps using full-text data from publications in the Journal of Informetrics: titles and abstracts vs. full text, binary vs. full counting of term co-occurrences, and mapping at the publication level vs. the paragraph level.
3. The results show that full-text data yields richer maps than titles and abstracts alone, and that full counting is preferable to binary counting when using full text. Paragraph-level maps provide more fine-grained structure, but their areas may not always represent topics in the literature.


Using full-text data to create
improved term maps
Nees Jan van Eck¹, Ludo Waltman¹, Min Song², and Yoo Kyung Jeong²
¹ Centre for Science and Technology Studies, Leiden University, Leiden, The Netherlands
² Department of Library and Information Science, Yonsei University, Seoul, Republic of Korea
16th International Conference on Scientometrics & Informetrics
Wuhan, China, October 19, 2017

Introduction
• Traditionally, bibliometric analyses are based on
metadata of scientific publications
• Full text of scientific publications is increasingly
becoming available in structured formats
• We study different approaches for creating
VOSviewer term maps using full-text data
• We compare these approaches with a traditional
approach based on titles and abstracts

Interpretation of a term map
• Size:
– The larger a term, the higher the frequency of occurrence of the
term
• Distance:
– In general, the smaller the distance between two terms, the
higher the relatedness of the terms, as measured by
co-occurrences
– Horizontal and vertical axes have no special meaning
• Colors:
– Colors indicate clusters of closely related terms

Creating a term map
1. Input English-language text corpus
2. Identify terms
3. Count co-occurrences of terms
4. Create layout and clustering
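
Steps 1 to 3 above can be sketched in a few lines of Python. This is only an illustrative sketch, not the VOSviewer implementation: the toy corpus, the fixed term list, and the function names are all hypothetical, and matching against a fixed vocabulary stands in for real term identification. Step 4 (layout and clustering) is left out.

```python
from collections import Counter
from itertools import combinations

# Step 1: hypothetical toy corpus, for illustration only.
CORPUS = [
    "Citation analysis and the h-index in citation databases",
    "The h-index and the impact factor",
    "Citation analysis of the impact factor",
]
# Hypothetical fixed vocabulary, standing in for real term identification.
TERMS = ["citation analysis", "h-index", "impact factor"]


def identify_terms(text, vocabulary):
    """Step 2 (simplified): return the vocabulary terms found in the text."""
    lowered = text.lower()
    return sorted(term for term in vocabulary if term in lowered)


def count_cooccurrences(corpus, vocabulary):
    """Step 3: count, per pair of terms, the documents containing both."""
    pair_counts = Counter()
    for document in corpus:
        present = identify_terms(document, vocabulary)
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts


cooccurrences = count_cooccurrences(CORPUS, TERMS)
for (term_a, term_b), count in sorted(cooccurrences.items()):
    print(f"{term_a} -- {term_b}: {count}")
```

The resulting pair counts are the kind of input a layout and clustering step would work from.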

Counting co-occurrences of terms
• Full counting:
– All occurrences of a term in a document are counted
• Binary counting:
– Only the presence or absence of a term matters
– Number of occurrences of a term is not taken into account
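
The difference between the two counting methods can be made concrete with a small sketch. The example sentence, the term list, and the function names are hypothetical. Weighting a pair of terms by the product of their occurrence counts is an assumption made here for illustration; the slides do not specify how occurrence counts are combined into co-occurrence weights.

```python
import re
from collections import Counter
from itertools import combinations


def term_occurrences(text, terms, counting="full"):
    """Count term occurrences in one document.

    counting="full":   every occurrence of a term is counted.
    counting="binary": a term counts once if present, regardless of frequency.
    """
    lowered = text.lower()
    counts = Counter()
    for term in terms:
        n = len(re.findall(re.escape(term), lowered))
        if n > 0:
            counts[term] = n if counting == "full" else 1
    return counts


def cooccurrence_weights(occurrences):
    """Pair weight = product of the two occurrence counts (assumed convention)."""
    return {
        (a, b): occurrences[a] * occurrences[b]
        for a, b in combinations(sorted(occurrences), 2)
    }


# Hypothetical one-document example.
doc = ("The h-index is popular. Critics of the h-index "
       "prefer citation counts to the h-index.")
terms = ["h-index", "citation"]

full = term_occurrences(doc, terms, counting="full")
binary = term_occurrences(doc, terms, counting="binary")

print("full:  ", dict(full), cooccurrence_weights(full))
print("binary:", dict(binary), cooccurrence_weights(binary))
```

Under full counting, a term repeated many times in a long document contributes more weight; under binary counting, every document contributes at most once per term.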

Data
• Full text of publications in the Journal of Informetrics
• 688 publications in the period 2007-2016
• Downloaded in XML format using the Elsevier
ScienceDirect Article Retrieval API
Unit         Average per publication
Sections     6.0
Paragraphs   42.1
Sentences    191.1

Term maps
• Figure: titles and abstracts / binary counting
• Figure: full text, publication level / full counting
• Figure: full text, paragraph level / full counting

Conclusions
• Full text vs. titles and abstracts:
– Full text yields richer maps than titles and abstracts
– Richer maps may be useful for interactive visualization, but perhaps
less so for static visualization
• Full counting vs. binary counting:
– When using full-text data, full counting is preferable to binary
counting
• Paragraph level vs. publication level:
– Paragraph-level maps have more fine-grained structure than
publication-level maps
– However, areas in paragraph-level maps do not always represent
topics in the literature
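
The effect of the unit of analysis can be illustrated with a short sketch. The example publication, the term list, and the function name are hypothetical, and for simplicity the sketch counts only whether two terms appear together in a unit, not how often each occurs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical publication, represented as a list of paragraphs.
PUBLICATION = [
    "We collect citation data from a bibliographic database.",
    "A clustering algorithm groups the publications.",
    "The clustering is validated using citation data.",
]
# Hypothetical term list, for illustration only.
TERMS = ["citation data", "bibliographic database", "clustering"]


def cooccurring_pairs(units, terms):
    """Count, per pair of terms, the units of text in which both appear."""
    pairs = Counter()
    for unit in units:
        present = sorted(t for t in terms if t in unit.lower())
        pairs.update(combinations(present, 2))
    return pairs


# Publication level: the whole publication is a single unit.
publication_level = cooccurring_pairs([" ".join(PUBLICATION)], TERMS)
# Paragraph level: each paragraph is its own unit.
paragraph_level = cooccurring_pairs(PUBLICATION, TERMS)

print("publication level:", dict(publication_level))
print("paragraph level:  ", dict(paragraph_level))
```

In this toy case, "bibliographic database" and "clustering" co-occur at the publication level but never within the same paragraph, so the paragraph-level counts link only terms that are used close together.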

Future research
• Use full-text data for creating other types of maps,
in particular co-citation maps
