Seed+Expand
aggregating the scientific output of the
Netherlands, 2000-2010
Linda Reijnhoudt, Rodrigo Costas, Ed Noyons,
Katy Börner, Andrea Scharnhorst
1. linda.reijnhoudt@dans.knaw.nl, andrea.scharnhorst@dans.knaw.nl
DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, the Netherlands
2. rcostas@cwts.leidenuniv.nl, noyons@cwts.leidenuniv.nl
Center for Science and Technology Studies (CWTS)-Leiden University, Leiden, the Netherlands
3. katy@indiana.edu
Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University, Bloomington, Indiana, United States of America
goal
to study the dynamics of the output of
Dutch professors (2001-2011)
the problem
but: lack of data on
the output of full professors!
given a Dutch professor in the NARCIS system
find all his/her publications
how to connect bibliographic data from CWTS
with the NARCIS system?
context
CWTS bibliometric publications database:
● author
● author-order
● email (sometimes)
● affiliation (sometimes)
● journal
DANS NARCIS, Dutch scholars:
● name, initials
● DAI
● affiliations
● organisation
● email
= ?
non trivial I
● misspelled names
○ Van Knienberg instead of Van Knippenberg
● different initials / first name
○ Johannes and Hans
● different formats in the data across sources
○ Prefixes separated in the NARCIS system
■ P.M.P. | van | Bergen en Henegouwen
○ Reduced to initials or concatenated in WoS
■ Henegouwen, PMPVE (Henegouwen, Paul M. P. van Bergen En)
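The format mismatch above can be illustrated with a minimal Python sketch. It is not the matching procedure used in the study: the function names (`ascii_fold`, `narcis_key`, `wos_key`, `may_match`) and the rule "same last surname token, one initials string is a prefix of the other" are assumptions chosen only to show why exact string comparison fails on these two records.

```python
import re
import unicodedata


def ascii_fold(text: str) -> str:
    """Strip diacritics and lowercase, e.g. 'Börner' -> 'borner'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def narcis_key(initials: str, prefix: str, surname: str) -> tuple[str, str]:
    """Key for a NARCIS-style record (initials | prefix | surname).

    The prefix ('van') is deliberately ignored in this sketch, because
    WoS may fold it into the initials instead of the surname.
    """
    last_token = ascii_fold(surname).split()[-1]
    return last_token, re.sub(r"[^A-Za-z]", "", initials).upper()


def wos_key(author: str) -> tuple[str, str]:
    """Key for a WoS-style 'Surname, INITIALS' string."""
    surname, _, initials = author.partition(",")
    return ascii_fold(surname).strip(), re.sub(r"[^A-Za-z]", "", initials).upper()


def may_match(narcis: tuple[str, str], wos: tuple[str, str]) -> bool:
    """Candidate match: same surname token, and one initials string
    is a prefix of the other (hypothetical rule, not from the slides)."""
    (n_name, n_init), (w_name, w_init) = narcis, wos
    if n_name != w_name or not n_init or not w_init:
        return False
    return w_init.startswith(n_init) or n_init.startswith(w_init)


narcis = narcis_key("P.M.P.", "van", "Bergen en Henegouwen")
wos = wos_key("Henegouwen, PMPVE")
print(narcis, wos, may_match(narcis, wos))
```

A rule this loose only produces candidates; it would also pair unrelated people who share a surname and first initials, which is the homonymy problem.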
non trivial II
● multiple scholars have the same author
name (homonymy)
● the same scholar with multiple author
names (synonymy)
○ changes over time, e.g., due to marriage
the raw data
NARCIS database (DANS)
○ 8378 Dutch full professors
■ affiliation to Dutch organizations
■ name, initials
■ email
■ DAI
CWTS bibliometric data system
○ close to 23 million publications in more than 12,000
journals
○ no unique author identifier for all authors
the Gold Standard
we already know the complete oeuvre of
1400 Dutch full professors, due to manually
verified publication lists by CWTS (2001-2010)
USEFUL TO VALIDATE OUR METHODOLOGY
the 1400 of the 8376 (17%) full professors
who already appear in this list:
the Gold Standard
the sources & main overview
Seed+Expand main concept
● seed creation, precision
○ given a full professor, {initials, name, email, affiliations}
○ find one or more publications that are most likely
authored by this professor
● seed expansion, recall
○ given these 'seed' publications,
○ find publications by the same author
1. publication-based classifications
2. Scopus Author Identifier
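The two-step concept can be sketched as a small Python skeleton. This is only an outline under stated assumptions: `seed_and_expand`, the shape of the `professor` record, and the callables passed in are hypothetical names, not the authors' implementation.

```python
from typing import Callable, Iterable, Set

# Hypothetical shapes: a professor is a dict with initials, name, email,
# affiliations; a publication is referred to by a string identifier.
Professor = dict
PubId = str
SeedMethod = Callable[[Professor], Iterable[PubId]]
ExpandMethod = Callable[[Professor, Set[PubId]], Iterable[PubId]]


def seed_and_expand(
    professor: Professor,
    seed_methods: Iterable[SeedMethod],
    expand: ExpandMethod,
) -> Set[PubId]:
    """Step 1 (precision): collect seed papers from every seed method.
    Step 2 (recall): grow the seed set with an expansion method."""
    seeds: Set[PubId] = set()
    for find_seed in seed_methods:
        seeds |= set(find_seed(professor))
    if not seeds:
        # Without a seed there is nothing to expand from.
        return set()
    return seeds | set(expand(professor, seeds))


# Toy usage with stand-in methods.
prof = {"name": "Example", "initials": "A.B.", "email": "a.b@example.org"}
by_email = lambda p: ["pub1"]
by_address = lambda p: ["pub1", "pub2"]
same_cluster = lambda p, seeds: ["pub3"] if "pub2" in seeds else []
print(sorted(seed_and_expand(prof, [by_email, by_address], same_cluster)))
```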
seed creation
1. Email seed (EM)
2. Author Address approaches (*)
a. Reprint Author (RP)
b. Direct linkage author-addresses (DL)
c. Approximate linkage author addresses (AL)
3. Digital Author Identifier seed (DAI)
(*) For these seeds, very common
names have been excluded
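As an illustration of the simplest seed, the email seed (EM), here is a minimal Python sketch. It assumes exact, case-insensitive address matching and a hypothetical publication record with `id` and `emails` fields; the slides do not specify how the matching was actually done.

```python
def email_seed(professor_email, publications):
    """Return ids of publications that list the professor's email address.

    `publications` is an iterable of dicts with hypothetical keys
    'id' and 'emails' (the latter may be missing, since email is
    only sometimes available in the bibliometric data).
    """
    target = professor_email.strip().lower()
    if not target:
        return set()
    return {
        pub["id"]
        for pub in publications
        if target in {e.strip().lower() for e in pub.get("emails", [])}
    }


pubs = [
    {"id": "pub1", "emails": ["A.B@Example.org"]},
    {"id": "pub2", "emails": ["someone.else@example.org"]},
    {"id": "pub3"},  # no email recorded
]
print(email_seed("a.b@example.org", pubs))
```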
seed expansion
1. CWTS Paper-Based Classification
(2001-2011)
○ based on citation relationships of publications
○ 672 meso, over 20K micro disciplines
○ micro: +23% unique papers over seed
○ meso: +34% unique papers over seed
2. Scopus Author Identifier (1996-2011)
○ +69% unique papers over seed
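A classification-based expansion could look like the Python sketch below. The rule used here (add papers from the same cluster as a seed paper that also carry one of the seed author names) is an assumption made for illustration, as are the field names `cluster` and `authors`; the slides only state that the classification is built from citation relationships.

```python
def expand_by_cluster(seed_ids, author_names, publications):
    """Return the seed papers plus same-cluster papers with a matching author name.

    `publications` maps a publication id to a dict with hypothetical
    keys 'cluster' (micro or meso discipline) and 'authors'.
    """
    seed_clusters = {
        publications[p]["cluster"] for p in seed_ids if p in publications
    }
    names = {n.lower() for n in author_names}
    expanded = set(seed_ids)
    for pub_id, pub in publications.items():
        same_cluster = pub["cluster"] in seed_clusters
        same_name = bool(names & {a.lower() for a in pub["authors"]})
        if same_cluster and same_name:
            expanded.add(pub_id)
    return expanded


def gain_over_seed(seed_ids, expanded_ids):
    """Relative growth in unique papers, e.g. 0.23 for '+23% over seed'."""
    n_seed = len(set(seed_ids))
    if n_seed == 0:
        raise ValueError("gain is undefined for an empty seed set")
    return (len(set(expanded_ids)) - n_seed) / n_seed


toy_pubs = {
    "p1": {"cluster": "c1", "authors": ["Bergen, P"]},
    "p2": {"cluster": "c1", "authors": ["Bergen, P", "Other, X"]},
    "p3": {"cluster": "c1", "authors": ["Other, X"]},
    "p4": {"cluster": "c2", "authors": ["Bergen, P"]},
}
expanded = expand_by_cluster({"p1"}, ["Bergen, P"], toy_pubs)
print(sorted(expanded), gain_over_seed({"p1"}, expanded))
```

In the toy data, `p4` is left out although the name matches, because it sits in a different cluster; that is the trade-off between the finer micro level and the broader meso level.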
evaluation
Gold standard: 2001-2010
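Evaluation against a gold standard is usually expressed as precision and recall. The Python sketch below uses the standard set-based definitions; the function name and the toy publication ids are illustrative and not taken from the study.

```python
def precision_recall(found, gold):
    """Standard definitions for one author's oeuvre:
    precision = |found ∩ gold| / |found|
    recall    = |found ∩ gold| / |gold|
    """
    found, gold = set(found), set(gold)
    correct = len(found & gold)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy example: 4 papers retrieved, 5 in the verified list, 3 in common.
retrieved = {"a", "b", "c", "d"}
verified = {"a", "b", "c", "e", "f"}
print(precision_recall(retrieved, verified))
```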
results
● 80% of Dutch professors detected
● Micro-disciplines: highest precision (88.5)
● Scopus Author id & micro disciplines:
same recall (95.9)
● This methodology can be applied to other
sets and author identity schemes (ORCID,
VIVO, etc.)
● Further research on disciplinary
differences and improvements
general discussion
● increasing bibliographic data sources, but
still lacking author-disambiguated data!
● lack of research on how to connect
databases
○ repositories
○ bibliographic databases (WoS, Scopus, etc.)
○ altmetrics
● e-mail data and DAI/ORCID-like
identifiers are powerful linking elements
across systems
the end ...
thank you very much for your attention!
questions?
comments?
five seeds
combined: 6753 of 8376 full professors found
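Combining the seeds amounts to taking the union of the professors found by each method. The Python sketch below shows this with toy identifiers (the per-seed sets are invented for illustration) and then recomputes the share from the two totals given on the slide.

```python
def combine_seeds(found_by_seed):
    """Union of the professors found by each seed method."""
    combined = set()
    for professors in found_by_seed.values():
        combined |= set(professors)
    return combined


def coverage_pct(n_found, n_total):
    """Share of professors with at least one seed, in percent."""
    return round(100 * n_found / n_total, 1)


# Toy data: ids are hypothetical, only the seed labels come from the slides.
toy = {
    "EM": {"prof1", "prof2"},
    "RP": {"prof2", "prof3"},
    "DL": {"prof3"},
    "AL": {"prof4"},
    "DAI": {"prof1", "prof5"},
}
print(sorted(combine_seeds(toy)))

# Totals from the slide: 6753 of 8376 full professors found.
print(coverage_pct(6753, 8376))
```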

Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 

Seed and Expand

  • 1. Seed+Expand: aggregating the scientific output of the Netherlands, 2000-2010
      Linda Reijnhoudt, Rodrigo Costas, Ed Noyons, Katy Börner, Andrea Scharnhorst
      1 linda.reijnhoudt@dans.knaw.nl, andrea.scharnhorst@dans.knaw.nl: DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), the Hague, the Netherlands
      2 rcostas@cwts.leidenuniv.nl, noyons@cwts.leidenuniv.nl: Center for Science and Technology Studies (CWTS)-Leiden University, Leiden, the Netherlands
      3 katy@indiana.edu: Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University, Bloomington, Indiana, United States of America
  • 2. goal: to study the dynamics of the output of Dutch professors (2001-2011); but there is a lack of data on the output of full professors!
  • 3. the problem: given a Dutch professor in the NARCIS system, find all his/her publications. How to connect bibliographic data from CWTS with the NARCIS system?
  • 4. context
      CWTS bibliometric publications database:
          ● author
          ● author order
          ● email (sometimes)
          ● affiliation (sometimes)
          ● journal
      = ?
      DANS NARCIS, Dutch scholars:
          ● name, initials
          ● DAI
          ● affiliations
          ● organisation
          ● email
  • 5. non trivial I
      ● misspelled names
          ○ Van Knienberg instead of Van Knippenberg
      ● different initials / first names
          ○ Johannes and Hans
      ● different formats in the data across sources
          ○ prefixes separated in the NARCIS system
              ■ P.M.P. | van | Bergen en Henegouwen
          ○ prefixes turned into initials or concatenated in WoS
              ■ Henegouwen, PMPVE (Henegouwen, Paul M. P. van Bergen En)
  • 6. non trivial II
      ● multiple scholars have the same author name (homonymy)
      ● the same scholar has multiple author names (synonymy)
          ○ changes over time, e.g., due to marriage
  • 7. the raw data
      NARCIS database (DANS)
          ○ 8378 Dutch full professors
              ■ affiliation to Dutch organizations
              ■ name, initials
              ■ email
              ■ DAI
      CWTS bibliometric data system
          ○ close to 23 million publications in more than 12,000 journals
          ○ no unique author identifier for all authors
  • 8. the Gold Standard: we already know the complete oeuvre of 1400 Dutch full professors, thanks to publication lists manually verified by CWTS (2001-2010). Useful to validate our methodology. The Gold Standard consists of the 1400 of the 8376 (17%) full professors who already appear in this list.
  • 9. the sources & main overview
  • 10. Seed+Expand main concept
      ● seed creation, precision
          ○ given a full professor, {initials, name, email, affiliations}
          ○ find one or more publications that are most likely authored by this professor
      ● seed expansion, recall
          ○ given these 'seed' publications,
          ○ find publications by the same author
              1. publication-based classifications
              2. Scopus Author Identifier
  • 11. seed creation
      1. Email seed (EM)
      2. Author address approaches (*)
          a. Reprint author (RP)
          b. Direct linkage author-addresses (DL)
          c. Approximate linkage author-addresses (AL)
      3. Digital Author Identifier seed (DAI)
      (*) For these seeds, very common names have been excluded
  • 12. seed expansion
      1. CWTS Paper-Based Classification (2001-2011)
          ○ based on citation relationships of publications
          ○ 672 meso, over 20K micro disciplines
          ○ micro: +23% unique papers over seed
          ○ meso: +34% unique papers over seed
      2. Scopus Author Identifier (1996-2011)
          ○ +69% unique papers over seed
  • 14. results
      ● 80% of Dutch professors detected
      ● micro-disciplines: highest precision (88.5)
      ● Scopus Author Identifier and micro-disciplines: same recall (95.9)
      ● this methodology can be applied to other sets and author identity schemes (ORCID, VIVO, etc.)
      ● further research is needed on disciplinary differences and improvements
  • 15. general discussion
      ● a growing number of bibliographic data sources, but author-disambiguated data is still lacking!
      ● lack of research on how to connect databases
          ○ repositories
          ○ bibliographic databases (WoS, Scopus, etc.)
          ○ altmetrics
      ● e-mail data and DAI/ORCID-like identifiers are powerful linking elements across systems
  • 16. the end ... thank you very much for your attention! questions? comments?
  • 17. five seeds combined: 6753 of 8376 full professors found
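
The seed-then-expand idea from slides 10 to 12 can be illustrated with a small Python sketch. It is not the CWTS implementation: the `Publication` record, its fields (`author_key`, `email`, `cluster`), the function names and the toy data are all assumptions made for illustration. Only one seed type (the email seed) and one expansion route (same author name within the seed's micro-discipline) are shown, and the result is scored against a hand-made gold standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Publication:
    # Hypothetical record layout, not the real CWTS schema.
    pub_id: str
    author_key: str        # normalized author name, e.g. "jansen, a"
    email: Optional[str]   # author email, present only on some records
    cluster: str           # assumed micro-discipline label


def create_seed(pubs, professor_email):
    """Seed creation (aims at precision): keep publications carrying the professor's email."""
    target = professor_email.lower()
    return {p.pub_id for p in pubs if p.email and p.email.lower() == target}


def expand_seed(pubs, seed_ids, author_key):
    """Seed expansion (aims at recall): add same-name publications in the seed's clusters."""
    seed_clusters = {p.cluster for p in pubs if p.pub_id in seed_ids}
    extra = {p.pub_id for p in pubs
             if p.author_key == author_key and p.cluster in seed_clusters}
    return seed_ids | extra


def precision_recall(found, gold):
    """Score a set of found publication ids against a verified oeuvre."""
    hits = len(found & gold)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy data: p3 shares the author name but sits in another field (a homonym).
pubs = [
    Publication("p1", "jansen, a", "a.jansen@example.nl", "micro-17"),
    Publication("p2", "jansen, a", None, "micro-17"),
    Publication("p3", "jansen, a", None, "micro-902"),
    Publication("p4", "visser, b", None, "micro-17"),
]
gold = {"p1", "p2"}  # manually verified oeuvre of the toy professor

seed_ids = create_seed(pubs, "A.Jansen@example.nl")
oeuvre = expand_seed(pubs, seed_ids, "jansen, a")
```

In this toy case the email seed alone finds only `p1`, so recall is low, while the expansion also picks up `p2` and leaves out the homonym `p3`, because the expansion is restricted to the clusters where the seed already sits.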