1. Crowdsourcing in Article Evaluation
Isabella Peters1, Stefanie Haustein1,2 & Jens Terliesner1
isabella.peters@uni-duesseldorf.de | s.haustein@fz-juelich.de | jens.terliesner@uni-duesseldorf.de
[Figure: multidimensional journal evaluation; legible label: journal management]
GENERAL IDEA OF CROWDSOURCING ARTICLE & JOURNAL EVALUATION
Past: traditional journal evaluation uses cumulated citation numbers of articles.
Problem: citations do not appropriately reflect articles' influence on readers, because only those readers are counted who also write articles and publish in journals.
Present: access, download and subscription statistics of electronic articles should reflect usage of articles and journals.
Problem: measuring is problematic although standards exist. Global usage statistics are not available.
Future: a multidimensional approach which combines available usage numbers.
Focus: data from STM social bookmarking systems (e.g., CiteULike) for measuring journal perception and reader perception of articles as a crowdsourced alternative.
DATA COLLECTION & TEST SETS
Test set I: 45 solid state physics journals; all publications from 2004 to 2008; bibliographic data for 168,109 documents from Web of Science.
Data collection: searching via DOI / ISSN / abbreviation / title in CiteULike, BibSonomy and Connotea; matching of articles and bookmarks via DOI.
Test set II: 13,760 correct bookmarks retrieved for articles of test set I.
Number of bookmarks retrieved: BibSonomy 940; CiteULike 10,778; Connotea 2,042.
Bookmarks: 13,608 bookmarks were matched to 10,280 articles; 8,511 articles were bookmarked only once.
Users: 2,441 unique users; 1,179 users posted one article; 75% of content is created by 21% of users.
Most active users: satoshi (322 posts), bronckobuster (238 posts), rice (234 posts).
Tags: 88.4% of all retrieved bookmarks were tagged; 8,208 tags were assigned 38,241 times.
[Figure: number of bookmarks per user; users from BibSonomy, CiteULike and Connotea]
[Figure: tag frequency distribution; axes: tags, frequency (logarithmic, 1 to 10,000)]
[Figure: tag cloud of all tags assigned at least 50 times to articles in CiteULike, BibSonomy and Connotea]
[Figure: journals; legible labels: REV MOD PHYS, J PHYS A, PHYS REV E, SOFT MATTER]
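The statement that 75% of content is created by 21% of users can be checked with a simple cumulative count over users sorted by activity. The following is a minimal sketch, not the authors' procedure: the function name `user_share_for_content` and the toy counts are hypothetical and only illustrate the arithmetic.

```python
def user_share_for_content(posts_per_user, content_share=0.75):
    """Return the smallest fraction of users whose posts cover
    `content_share` of all posts (most active users counted first)."""
    counts = sorted(posts_per_user, reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no posts to analyse")
    covered = 0
    for rank, n in enumerate(counts, start=1):
        covered += n
        if covered >= content_share * total:
            return rank / len(counts)
    return 1.0


# Toy data (hypothetical, NOT the poster's data set): posts per user.
toy_counts = [12, 4, 1, 1, 1, 1]
share = user_share_for_content(toy_counts, 0.75)
print(f"{share:.0%} of users create 75% of the content")
```

Applied to the real per-user bookmark counts, the same loop would give the share of users behind any chosen share of the content.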
RQ: DO TAGS REFLECT OTHER VIEWS ON ARTICLES THAN AUTHOR OR INTERMEDIARY KEYWORDS?
Comparison of: author keywords; Inspec subject headings; Web of Science KeyWords Plus™; title terms; abstract terms; tags from BibSonomy, CiteULike and Connotea.
Procedure: matching via DOI; 724 articles of test set I&II contained all keyword types; comparison of keyword sets on article level via cosine similarity coefficient.
Preprocessing and cleaning of keyword sets:
aim: to obtain a linguistically homogeneous keyword collection
all keywords: removed special characters (except hyphens and underscores), lower case, British English to American English (BE to AE), stemming with Porter 2*
author keywords: removed stop words and dataset specific terms (e.g., imported)
tags for comparison with title & abstract terms: split at separating character (e.g., hyphen or underscore)
tags for comparison with automatic & controlled keywords: deletion of separating character and blanks
Results from preprocessing and cleaning:
author: -55.3% spelling variants
intermediary: -2.8% spelling variants
automatic: -5.3% spelling variants
tags: -8.4% variants
author: +34.1% overlap**
intermediary: +21% overlap
automatic: +20.6% overlap
** at least one term in common
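The cleaning steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the spelling map `BE_TO_AE`, the stop list `STOP_WORDS`, the set `DATASET_TERMS` and all function names are hypothetical, the character filter keeps only ASCII letters and digits, and the Porter 2 stemming step is left out because it needs an external stemmer.

```python
import re

# Hypothetical, deliberately tiny British-to-American spelling map.
BE_TO_AE = {"colour": "color", "behaviour": "behavior", "modelling": "modeling"}

# Hypothetical stop list and dataset-specific terms.
STOP_WORDS = {"the", "of", "and", "in"}
DATASET_TERMS = {"imported"}


def clean_keyword(keyword):
    """Lower-case, drop special characters except hyphens and underscores,
    collapse blanks, and map British to American spellings word by word."""
    keyword = keyword.lower()
    keyword = re.sub(r"[^a-z0-9_\- ]", "", keyword)
    words = keyword.split()
    return " ".join(BE_TO_AE.get(word, word) for word in words)


def remove_stop_terms(terms):
    """Drop stop words and dataset-specific terms such as 'imported'."""
    return [t for t in terms if t not in STOP_WORDS and t not in DATASET_TERMS]


def split_tag(tag):
    """Variant for comparison with title and abstract terms:
    split the tag at separating characters (hyphen, underscore, blank)."""
    return [part for part in re.split(r"[-_ ]+", clean_keyword(tag)) if part]


def join_tag(tag):
    """Variant for comparison with automatic and controlled keywords:
    delete separating characters and blanks."""
    return re.sub(r"[-_ ]", "", clean_keyword(tag))


print(split_tag("Bose-Einstein_condensation"))
print(join_tag("Spin-Glass"))
```

Keeping the two tag variants as separate functions mirrors the two comparison settings: split tags line up with single title and abstract words, while joined tags line up with multi-word keywords written without separators.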
RESULTS OF TERM SET COMPARISON
[Figure: mean overlap tag ratio (tags in terms), mean overlap term ratio (terms in tags), and mean cosine similarity between tags and keywords; overlap: at least one term in common]
crowdsourcing article & journal evaluation
tags reveal user perception of articles
[Figure: tags for articles of the journal J Stat Mech]
[Figure: tags assigned to articles of the journal J Stat Mech]
Analysis over time can reveal shifts in thematic focus areas
[Figure: intermediary keywords for articles published in J Phys Condens Matter in 2004]
[Figure: tags assigned to articles published in J Phys Condens Matter in 2008]
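For a single article, the three measures named in this section could be computed along the following lines. The poster does not give the formulas, so this sketch assumes that the overlap tag ratio is the share of an article's tags that also occur among its terms, that the overlap term ratio is the share of terms that also occur among the tags, and that cosine similarity is taken over term frequency vectors. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two term-frequency vectors."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def overlap_ratios(tags, terms):
    """Return (tag ratio, term ratio) under the assumed definitions:
    shared / number of tags, and shared / number of terms."""
    tags, terms = set(tags), set(terms)
    shared = tags & terms
    tag_ratio = len(shared) / len(tags) if tags else 0.0
    term_ratio = len(shared) / len(terms) if terms else 0.0
    return tag_ratio, term_ratio


# Toy article (hypothetical stemmed terms, NOT taken from the data set).
tags = ["graphen", "transport", "spin"]
terms = ["graphen", "spin", "magnet", "lattic"]

tag_ratio, term_ratio = overlap_ratios(tags, terms)
print(round(tag_ratio, 3), round(term_ratio, 3),
      round(cosine_similarity(tags, terms), 3))
```

Averaging these per-article values over all articles that carry every keyword type would give mean values of the kind the figure reports.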
1 Department of Information Science, Heinrich-Heine-University Düsseldorf, Universitätsstraße 1, 40225 Düsseldorf (Germany)
2 Central Library, Forschungszentrum Jülich, 52425 Jülich (Germany)
* http://snowball.tartarus.org/download.php.