The Data Citation Index indexes over 2.6 million data records and datasets from repositories across scientific fields. It contains over 400,000 citations to these records. However, 88% of the indexed resources remain uncited. Citations are more common for data studies than datasets. Certain fields like crystallography, biochemistry and molecular biology have higher than average citation rates for datasets. Sociology, demography and economics have higher citation rates for data studies. The authors note that data sharing could be lower than expected due to lack of recognition and incentives for researchers.
How many citations are there in the Data Citation Index?
D Torres-Salinas, E Jiménez-Contreras & N Robinson-Garcia
EC3 Research Group
EC3Metrics
University of Granada
19th International Conference on Science and Technology Indicators
3-5 September 2014 Leiden, The Netherlands
3. Rationale
“The data deluge has arrived.[…] If the rewards of the data deluge are to be reaped, then researchers who produce those data must share them”
Borgman, 2012
Peng, 2011
4. Rationale
“The ‘dirty little secret’ behind the promotion of data sharing is that not much sharing may be taking place”
Borgman, 2012
“The lack of recognition incentives is regarded as a crucial and unresolved obstacle to establishing a data sharing culture”
Piwowar et al., 2008
5. Data and citations
“A consistent, rigorous approach to data citation is lacking”
Parsons et al., 2010
What do we cite?
Original study <- Piwowar et al.
Data papers <- Scientific Data
Data sets <- Data Citation Index
6. Data Citation Index
GENERAL DESCRIPTION
Multidisciplinary database launched in 2012
It indexes data repositories from all scientific fields, along with the citation data associated with them
Follows an evaluation and selection process at the repository level, based on subject, editorial content, and geographic origin and scope
7. Data Citation Index
PUBLICATION TYPES
Data repositories: databases comprising datasets and data studies, which store and provide access to the raw data.
Datasets: a single or coherent set of data, or a data file, provided by the repository as part of a collection, data study or experiment.
Data studies: descriptions of studies or experiments held in repositories, together with the associated data used in the study.
10. Data Citation Index
MATERIAL AND METHODS
Data retrieval in May-June 2013
Analysis by areas: Science, Engineering & Technology, Social Sciences and Arts & Humanities
arXiv:1306.6584
11. Data Citation Index
GENERAL INDICATORS
                      All Document Types     Datasets   Data studies
Total Citations                  404,211      294,051        106,895
Total Records                  2,623,528    2,468,736        154,674
Uncited Records                2,311,553    2,185,062        126,428
% Uncited                          88.11        88.51          81.74
Citation Average                    0.15         0.12           0.69
Standard Deviation                  3.06         0.36           9.56
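The two derived rows of the general indicators (% Uncited and Citation Average) follow directly from the three count rows. The sketch below is illustrative only and is not the authors' method: it assumes % Uncited = uncited records / total records and Citation Average = total citations / total records, and the names `indicators` and `derive` are hypothetical. The standard deviation cannot be recomputed because the slide gives no per-record citation counts.

```python
# Counts copied from the GENERAL INDICATORS slide.
indicators = {
    "All document types": {"citations": 404_211, "records": 2_623_528, "uncited": 2_311_553},
    "Datasets": {"citations": 294_051, "records": 2_468_736, "uncited": 2_185_062},
    "Data studies": {"citations": 106_895, "records": 154_674, "uncited": 126_428},
}


def derive(row):
    """Return (% uncited, citation average), both rounded to two decimals.

    Assumed definitions (not stated on the slide):
      % uncited        = 100 * uncited records / total records
      citation average = total citations / total records
    """
    pct_uncited = 100 * row["uncited"] / row["records"]
    citation_average = row["citations"] / row["records"]
    return round(pct_uncited, 2), round(citation_average, 2)


for name, row in indicators.items():
    pct, avg = derive(row)
    print(f"{name:20s} % uncited = {pct:5.2f}   citation average = {avg:4.2f}")
```

Under these assumed definitions the recomputed values agree with the slide's figures to two decimals, which suggests the flattened columns were read in the right order.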
12. Data Citation Index
REPOSITORIES BY AREA
Engineering & Technology: 1
Science: 67
Social Sciences: 19
Humanities & Arts: 9
13. Data Citation Index
RECORDS AND CITATIONS BY AREA AND TYPE
                            Datasets   Citations   Data studies   Citations
Engineering & Technology       1,545         890            240          26
Humanities & Arts             44,588           1          6,847      20,459
Science                    2,004,449     293,193        114,338      26,189
Social Sciences              424,952           7         37,855      69,659
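To compare areas of very different size, the counts by area and type can be turned into citations per record. This is a minimal sketch, not part of the original analysis: it assumes each row of the slide lists datasets, dataset citations, data studies and data-study citations in that order, and the names `by_area` and `citations_per_record` are hypothetical.

```python
# Assumed column order per area (read from the flattened slide):
# (datasets, dataset citations, data studies, data-study citations)
by_area = {
    "Engineering & Technology": (1_545, 890, 240, 26),
    "Humanities & Arts": (44_588, 1, 6_847, 20_459),
    "Science": (2_004_449, 293_193, 114_338, 26_189),
    "Social Sciences": (424_952, 7, 37_855, 69_659),
}


def citations_per_record(records, citations):
    """Average number of citations per indexed record (0.0 if no records)."""
    return citations / records if records else 0.0


for area, (datasets, ds_cit, studies, st_cit) in by_area.items():
    ds_rate = citations_per_record(datasets, ds_cit)
    st_rate = citations_per_record(studies, st_cit)
    print(f"{area:26s} datasets: {ds_rate:.3f}   data studies: {st_rate:.3f}")
```

If the column reading is right, dataset citations are concentrated almost entirely in Science, while data studies in the Social Sciences and in Humanities & Arts receive more than one citation per record on average.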
14. Data Citation Index
TOP 10 CATEGORIES HIGHLY CITED FOR DATASETS
[Figure: citation average and standard deviation, and % of total citations from the DCI, per category. Shares labelled on the chart: 47%, 23%, 16%.]
Categories shown: Crystallography; Biochemistry & Mol. Biology; Genetics & Heredity; Geosciences; Physics, Atomic, Molecular; Evolutionary Biology; Cell Biology; Spectroscopy; Medical Laboratory Tech.; Nanoscience & Nanotech.
15. Data Citation Index
TOP 10 CATEGORIES HIGHLY CITED FOR DATA STUDIES
[Figure: citation average and standard deviation, and % of total citations from the DCI, per category. Share labelled on the chart: 30%.]
Categories shown: Sociology; Demography; Economics; Business; Political Science; Biochemistry & Mol. Biology; Genetics & Heredity; Health Care Sciences; Criminology & Penology; Family Studies
16. Data Citation Index
MAIN REPOSITORIES IN THE DCI, CITATIONS & RECORDS
[Figure: total number of citations in the Data Citation Index plotted against total number of records indexed in the Data Citation Index, per repository. Legend: size = total citations; pie chart = % of citations.]
Repositories shown: miRBase; Gene Expression; UniProt Knowledgebase; Crystallography Open Database; U.S. Census Bureau TIGER; Protein Data Bank; ArrayExpress Archive; PANGAEA; UK Data Archive; Inter-university Consortium for Political and Social Research; Animal QTL Database
17. Discussion
I. High rate of uncitedness (88%)
II. Biased towards Science
III. Datasets vs. data studies (Two Cultures?)
IV. Too soon or too presumptuous?
18. THANK YOU
D Torres-Salinas torressalinas@gmail.com
N Robinson-Garcia elrobin@ugr.es
E Jiménez-Contreras evaristo@ugr.es