This document summarizes a presentation on the large-scale analysis of bibliometric data sources. It introduces the speaker's background in bibliometrics and their research center, CWTS, which has full access to large bibliographic databases and focuses on bibliometric and scientometric research. It describes two software tools for constructing and analyzing bibliometric networks, VOSviewer and CitNetExplorer, which the speaker developed together with Ludo Waltman. Network analysis techniques such as community detection and layout algorithms are also covered. Finally, the presentation applies bibliometric methods to the field of data science: it identifies publications that mention 'data' in their title or abstract and maps the structure of data-driven research fields.
Analysis of bibliometric data sources and data science fields
1. Large-scale analysis of bibliometric
data sources
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
8th LCDS Meeting: Statistics & Data Science
Leiden, November 13, 2015
2. About myself
• Master in computer science
• PhD thesis on bibliometric
mapping of science
• Researcher at CWTS since 2009
• Research focus on analysis and
visualization of bibliometric
networks
3. Centre for Science and Technology
Studies (CWTS)
• Research center at Leiden University
focusing on science and technology
studies
• About 30 staff members
• History of more than 25 years in
bibliometric and scientometric
research
• Contract research
• Full access to large bibliographic
databases (Web of Science and
Scopus)
4. Bibliographic databases: ‘Big data’
               Web of Science   Scopus
Journals       12,000           20,000
Publications   45 million       35 million
Citations      1 billion        0.9 billion
5. Bibliometric networks
[Figure: a bibliographic database (Web of Science, Scopus) as the source of five types of bibliometric networks]
• Citation network of publications
• Co-authorship network of authors / organizations
• Co-citation network of pubs / authors / journals
• Co-occurrence network of terms
• Bibliographic coupling network of pubs / authors / journals
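As an illustration of two of the network types on this slide, the sketch below derives a bibliographic coupling network and a co-citation network from reference lists. It is a minimal example under stated assumptions: the `references` dictionary is hypothetical toy data, the function names are made up for illustration, and nothing here reflects how VOSviewer or CitNetExplorer are implemented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing publication maps to the publications it cites.
references = {
    "P1": ["A", "B"],
    "P2": ["A", "B", "C"],
    "P3": ["C"],
}

def bibliographic_coupling(references):
    """Weight each pair of citing publications by the number of references they share."""
    weights = Counter()
    for p, q in combinations(sorted(references), 2):
        shared = len(set(references[p]) & set(references[q]))
        if shared:
            weights[(p, q)] = shared
    return weights

def co_citation(references):
    """Weight each pair of cited publications by how often they are cited together."""
    weights = Counter()
    for cited in references.values():
        for a, b in combinations(sorted(set(cited)), 2):
            weights[(a, b)] += 1
    return weights

coupling = bibliographic_coupling(references)
cocitation = co_citation(references)
```

In this toy data, P1 and P2 are coupled through their shared references A and B, while A and B are co-cited because P1 and P2 both cite them together.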
8. Software tools
• VOSviewer (www.vosviewer.com)
– Tool for constructing and visualizing bibliometric networks
• CitNetExplorer (www.citnetexplorer.nl)
– Tool for visualizing and analyzing citation networks of
publications
• Both tools have been developed together
with my colleague Ludo Waltman
15. Smart local moving algorithm
[Figure: flow diagram with the stages Original network, Local moving heuristic, Local moving heuristic in subnetworks, and Reduced network; values shown: Q = 0.3791 and Q = 0.4198]
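The slide does not define Q; assuming it denotes modularity, the quality function commonly used in community detection, the sketch below shows how Q can be computed and how a basic local moving heuristic improves it by moving one node at a time. This covers only the plain local moving step. It does not include the subnetwork and network reduction steps named on the slide, and it is not the speaker's implementation. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of an unweighted, undirected graph without self-loops."""
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    degree = defaultdict(int)    # summed node degrees per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def local_moving(edges, community):
    """Repeatedly move single nodes to the neighboring community that most increases Q."""
    community = dict(community)
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    improved = True
    while improved:
        improved = False
        for node in sorted(neighbors):
            best_c, best_q = community[node], modularity(edges, community)
            for c in sorted({community[n] for n in neighbors[node]}):
                q = modularity(edges, {**community, node: c})
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_c, best_q = c, q
            if best_c != community[node]:
                community[node] = best_c
                improved = True
    return community

# Toy graph: two triangles connected by one edge; every node starts in its own community.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
result = local_moving(edges, {n: n for n in range(6)})
q_final = modularity(edges, result)
```

Recomputing Q from scratch for every candidate move keeps the sketch short but is far too slow for networks of the size discussed in this presentation; practical implementations update Q incrementally.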
16. Algorithmically constructed
classification system of science
• 16.2 million publications from the period 2000–2014
indexed in Web of Science
• 241.7 million citation relations
• Classification system of 3 hierarchical levels:
– 28 broad disciplines
– 813 fields
– 3,822 subfields
17. Breakdown of scientific literature into
813 fields
[Figure: map of the 813 fields, with areas labeled Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
21. What is data science?
• Empirical operationalization of data science based
on publications with ‘data’ in title or abstract
Wikipedia: “Data Science is an interdisciplinary field
about processes and systems to extract knowledge
or insights from data … which is a continuation of
some of the data analysis fields such as statistics,
data mining, and predictive analytics”
LCDS: “Data Science … deals with finding, analyzing
and validating complex patterns in data. Data
Science methods are indispensable for maintaining a
competitive edge in all disciplines in science”
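The operationalization on this slide, publications with 'data' in title or abstract, can be sketched as a simple text filter. The matching rule below (case-insensitive, whole word) is an assumption, since the slide does not say how the term was matched, and `mentions_data` is a hypothetical helper name.

```python
import re

# Assumption: whole-word, case-insensitive match. The slide does not specify the rule.
DATA_PATTERN = re.compile(r"\bdata\b", re.IGNORECASE)

def mentions_data(title, abstract):
    """Return True if the word 'data' occurs in the title or the abstract."""
    return bool(DATA_PATTERN.search(title) or DATA_PATTERN.search(abstract))
```

With this rule, a title such as "Big Data analytics" or "Data-driven methods" counts, while "Database design" does not, because "data" is not a separate word there. A different rule, such as plain substring matching, would give different counts.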
23. Breakdown of scientific literature into
813 fields
[Figure: map of the 813 fields, with areas labeled Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
24. Data-driven nature of different
scientific fields
[Figure: map of scientific fields, with areas labeled Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering. Legend: % of publications with ‘data’ in title or abstract]
25. Data-driven nature of different
scientific fields
[Figure: map of scientific fields, with labels: artificial intelligence, statistics, bioinformatics, neuroimaging, pattern recognition, astronomy, earth, water, weather, climate, remote sensing, nutrition, obesity, addiction. Legend: % of publications with ‘data’ in title or abstract]
26. Data science fields (at least 20% ‘data’
publications)
[Figure: map of scientific fields, with areas labeled Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
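The 20% criterion in this slide's title can be sketched as a share computed per field and compared against a threshold. The input format, a list of (field, has_data) pairs, and the function name are assumptions made for illustration. The toy records are invented and do not reproduce any figure from the presentation.

```python
from collections import defaultdict

def data_science_fields(publications, threshold=0.20):
    """Return {field: share} for fields whose share of 'data' publications meets the threshold.

    publications: iterable of (field, has_data) pairs, one per publication.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for field, has_data in publications:
        totals[field] += 1
        hits[field] += bool(has_data)
    shares = {field: hits[field] / totals[field] for field in totals}
    return {field: share for field, share in shares.items() if share >= threshold}

# Hypothetical toy records, not taken from the presentation.
toy = [
    ("statistics", True), ("statistics", False),
    ("algebra", False), ("algebra", False),
]
selected = data_science_fields(toy)
```

In the toy records, half of the "statistics" publications mention 'data', so that field is selected, while "algebra" is not.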
28. Leiden University’s publication output
in data science fields
[Figure: map of scientific fields, with areas labeled Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
29. Leiden University’s institutes with most
publications in data science fields
• Leiden Observatory
• LUMC
• Faculty of Archaeology
• Institute of Psychology (FSW)
• Centre for Science and Technology Studies (FSW)
• Mathematical Institute (Science)
• Institute of Biology Leiden (Science)
• Leiden Institute of Advanced Computer Science
(Science)
30. LUMC departments with most
publications in data science fields
• Medical Statistics and Bioinformatics
• Rheumatology
• Psychiatry
• Radiology
• Clinical Epidemiology
• Human Genetics
• Neurosurgery
• Cardiology
• Clinical Oncology
• Endocrinology
31. Term map based on Leiden University’s
publications in data science fields