The California Digital Library (CDL) developed a value-based strategy for assessing journals that is now a major part of the University of California's systemwide e-journal collection planning process. The strategy uses objective metrics to calculate the value of scholarly journals and to identify titles that make a greater or lesser contribution to the University's mission of teaching, research, and public service. A key aspect of this strategy is the CDL Weighted Journal Value Algorithm, which assesses multiple vectors of value for each journal title under review: utility, quality, and cost effectiveness.
Of all the metrics the Algorithm uses, usage data remains the key one. However, usage data is not as reliable or comparable as might be expected. One reason is that the design of a publisher's electronic interface can have a measurable effect on electronic journal usage statistics. In 2014, CDL conducted a research project to study the impact of publisher website design on usage data. The presentation also covers how vendor interfaces and other factors affect usage data.
INFLATED JOURNAL VALUE RANKINGS: PITFALLS YOU SHOULD KNOW ABOUT HTML AND PDF USAGE
1. INFLATED JOURNAL VALUE RANKINGS: PITFALLS YOU SHOULD KNOW ABOUT HTML AND PDF USAGE
Chan Li & Jacqueline Wilson
California Digital Library, University of California
ALA 2015, San Francisco
3. JOURNAL USAGE IS CENTRAL TO COLLECTION DEVELOPMENT AND ASSESSMENT
• Usage
• Subjects
• Cost
• Impact Factor, Eigenfactor
• Source Normalized Impact Per Paper (SNIP)
• Cost per citation
• Cost Per Use
4. MANY FACTORS CAN AFFECT PUBLISHER E-JOURNAL USAGE REPORTING
Usage = Download
• Data mining/crawling activities
• Other sources of usage like institutional repositories, aggregators, etc.
• Open access journals
• Usage capture and reporting mechanism
• Journal characteristics: article type, quality, subjects, etc.
• Publisher website design & discovery services linking
5. JOURNAL CONTENT TYPES CAN AFFECT USAGE
New England Journal of Medicine 2013
Content Type Distribution
• Article: 544, 29%
• Letter: 748, 40%
• Short Survey: 202, 11%
• Note: 121, 7%
• Editorial: 110, 6%
• Review: 66, 4%
• Erratum: 59, 3%
• Conference Paper: 1, 0%
Usage Distribution
• PDF: 25%
• HTML: 75%
6. PUBLISHER WEBSITE DESIGN & DISCOVERY SERVICES LINKING CAN AFFECT USAGE REPORTS
• Landing page: abstract, links to both HTML and PDF versions
• Landing page: full-text HTML
7. PUBLISHER WEBSITE ISSUE
• Usus: an independent community-run website for those interested in the usage of online content.
• Issue Report: “I believe there is a problem with some publishers double-counting use in JR1s. Some publishers link directly to HTML via a link resolver (1 use) and when the same user clicks on the PDF version moments later (2nd use) a second use is counted.” April 13, 2015
• Usus Response: “COUNTER investigated this issue a few years back and addressed concerns raised about the inflation of full text request counts due to the “interface effect” where the users are automatically presented with the HTML version and access the PDF from the HTML view. COUNTER’s investigation into the situation resulted in the introduction of separate PDF and HTML counts in COUNTER JR1 and JR1 GOA reports…”
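The double-counting concern in the issue report can be illustrated with a minimal sketch. The request log, its field layout, and the function name `count_requests` are hypothetical, and the tally is a simplified stand-in for a full-text request count, not COUNTER's actual processing rules.

```python
from collections import Counter

# Hypothetical request log: (session_id, article_id, format).
requests = [
    ("s1", "a1", "HTML"),  # link resolver lands the reader on full-text HTML
    ("s1", "a1", "PDF"),   # the same reader opens the PDF moments later
    ("s2", "a1", "PDF"),   # another reader arrives at an abstract page and picks PDF
]


def count_requests(log):
    """Tally full-text requests in total and by format."""
    by_format = Counter(fmt for _, _, fmt in log)
    return {
        "total": sum(by_format.values()),
        "html": by_format["HTML"],
        "pdf": by_format["PDF"],
    }


counts = count_requests(requests)

# Number of distinct (session, article) pairs, i.e. actual readings.
distinct_readings = len({(session, article) for session, article, _ in requests})

print(counts)             # {'total': 3, 'html': 1, 'pdf': 2}
print(distinct_readings)  # 2
```

In this toy log, two readings produce three total requests, while the PDF count alone matches the number of readings. Separate HTML and PDF counts make that kind of inflation visible.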
9. RESEARCH QUESTION IDENTIFICATION
• Do the following affect HTML and PDF usage reports:
– Discovery services linking
– Document type
– Journal citation value
– Disciplinary differences
– Publisher platform
10. DEFINING THE LANDSCAPE: PUBLISHER SAMPLE ANALYSIS
Journal Articles
• 40 Publishers
• From each publisher, we randomly selected articles published in 2000, 2009 and 2014
Discovery Services Linking
• Google Scholar
• Publisher website
• WorldCat Local
• PubMed
Publisher Landing Page
• Abstract, HTML Option, PDF Option
• Full-text: HTML or PDF version
11. DISCOVERY SERVICES & PUBLISHER WEBSITE LINKING
Linking Bias Group: 23 publishers link to full-text HTML articles as the default from some of the search portals below.
• Publisher Landing Page: HTML Full-Text
• Search portals: Google Scholar, PubMed, WorldCat, Publisher Journal Site
No Linking Bias Group: 17 publishers link to article abstracts and options to download either HTML or PDF full-text articles as the default from all of the search portals below.
• Landing Page: Abstract + Options to Download either HTML Full-Text or PDF
• Search portals: Google Scholar, PubMed, WorldCat, Publisher Journal Site
12. HTML AND PDF USAGE RATIOS VARY GREATLY AMONG PUBLISHERS
Figure: UC 2013 HTML Usage To Total Ratio by Publisher (sorted from high to low). Bars show HTML % and PDF % for each publisher on a 0% to 100% axis.
Publishers in chart order: NEJM, BMJ, NPG, University of Chicago Press, AAAS, LWW, Elsevier, Project Muse, AACR, BMJ Special, PsycArticle, Karger, CoB, Thieme, PNAS, OUP, AIP, SPIE, Royal Society of London, RSC, Springer, Wiley, BioOne, TF, IOP, UC Press, American Meteorological Society, SAGE, MAL, ACS, OSA, ACM Digital Library, Cambridge, NRC, IEEE, Bepress, Duke, MIT, APS, American Mathematical…
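The ratio plotted on this slide is HTML usage divided by total (HTML plus PDF) usage. A minimal sketch of that arithmetic follows. The publisher names and counts are placeholders, not UC data, and `html_ratio` is a hypothetical helper name.

```python
def html_ratio(html_uses, pdf_uses):
    """Share of full-text usage that is HTML; 0.0 when there is no usage."""
    total = html_uses + pdf_uses
    return html_uses / total if total else 0.0


# Placeholder usage counts per publisher: (HTML uses, PDF uses).
usage = {
    "Publisher A": (750, 250),
    "Publisher B": (300, 700),
    "Publisher C": (600, 400),
}

ratios = {name: html_ratio(html, pdf) for name, (html, pdf) in usage.items()}

# Sort publishers from the highest HTML share to the lowest, as the chart does.
ranked = sorted(ratios, key=ratios.get, reverse=True)

for name in ranked:
    print(f"{name}: HTML {ratios[name]:.0%}, PDF {1 - ratios[name]:.0%}")
```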
13. HTML USAGE IS GREATLY IMPACTED BY DISCOVERY SERVICES LINKING & PUBLISHER WEBSITE DESIGN
Figure: UC 2013 HTML Usage To Total Ratio by Publisher (sorted from high to low), on a 0% to 100% axis, for the 40 publishers from NEJM to American Mathematical Society.
• Orange bars: HTML % from influence group publishers
• Blue bars: HTML % from no influence group publishers
• Red bars: PDF % from all publishers
Linking Bias: HTML Ratios
– HTML Full-text from all 4
– HTML Full-text from 3
– HTML Full-text from 1
No Linking Bias: HTML Ratios
– Abstract from all 4
14. DATA COLLECTION & REGRESSION ANALYSIS
• Analysis sample:
– 20+ journal publishers with 4,500 journals
• Methodology
– Regression analysis to estimate the effect of various factors on HTML/PDF usage
• Factors Analyzed
– Document Type
– Journal Citation Value
– Disciplinary Differences
– Publisher Platform
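As a rough illustration of the regression idea, the sketch below fits ordinary least squares with a single predictor. It is a simplification and not the study's model, which considered several factors at once. The data points are made up, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up journals: 1 = platform lands readers on full-text HTML,
# 0 = platform lands readers on an abstract page.
html_landing = [0, 0, 1, 1]
# Made-up HTML share of full-text usage for the same four journals.
html_share = [0.30, 0.34, 0.70, 0.74]

intercept, slope = fit_line(html_landing, html_share)

print(f"baseline HTML share: {intercept:.2f}")
print(f"estimated platform effect: {slope:+.2f}")
```

With a 0/1 predictor, the fitted slope equals the difference between the two group means, so it reads directly as the estimated effect of the landing-page design in this toy data.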
15. DOCUMENT TYPE HAS LIMITED EFFECT ON HTML USAGE
All Publishers: a 10% increase in the percentage of research articles corresponds to a 0.18% decrease in the percentage of HTML downloads.
Note: Research article is defined as the combination of article, review and conference papers. The document type data is from Scopus.
16. JOURNAL CITATION VALUE EFFECT: HIGHLY CITED JOURNALS HAVE FEWER HTML DOWNLOADS AND MORE PDF DOWNLOADS
A 10% increase in Eigenfactor corresponds to a 2.7% decrease in HTML downloads.
Data source: Eigenfactor: http://www.eigenfactor.org/faq.php
17. DISCIPLINARY DIFFERENCES OVERALL HAVE LIMITED EFFECT ON HTML USAGE
• More HTML Usage %: Nursing, Social Science, Neuroscience, Computer science
• More PDF Usage %: Mathematics, Veterinary, Dentistry, Economics
Journal discipline categories are from Scopus
18. JOURNAL PUBLISHER PLATFORM HAS THE MOST SIGNIFICANT EFFECT ON HTML USAGE
• More HTML Usage %: NEJM, SPIE, NATURE, Elsevier
• More PDF Usage %: Springer, NRC, MAL, OUP
Journal discipline categories are from Scopus
19. RESULTS & REPORT WRITING
• Discovery services linking options and publisher website choices have a significant effect on reader behavior
• Document type, journal citation value and disciplinary differences have a relatively limited effect on usage data
20. APPLYING RESEARCH FINDINGS TO LIBRARY USAGE ASSESSMENT
Our research indicates:
– Counting total usage, which combines PDF and HTML, is not a reliable metric because of the various factors studied
– Counting HTML only is incomplete and skewed
– BEST Options:
• Count PDF only
• Use a range from total usage to PDF usage
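One way to apply the "range from total usage to PDF usage" option is to report a metric such as cost per use as an interval, not a single number. The sketch below assumes that reading. The cost and usage figures are placeholders, and `usage_range` and `cost_per_use_range` are hypothetical helper names.

```python
def usage_range(html_uses, pdf_uses):
    """Return (low, high) usage: PDF only as the floor, HTML + PDF as the ceiling."""
    return pdf_uses, html_uses + pdf_uses


def cost_per_use_range(annual_cost, html_uses, pdf_uses):
    """Return (best case, worst case) cost per use, or None without PDF usage."""
    low_use, high_use = usage_range(html_uses, pdf_uses)
    if low_use == 0:
        return None  # cost per PDF use is undefined when there are no PDF uses
    return annual_cost / high_use, annual_cost / low_use


# Placeholder figures for a single journal.
best, worst = cost_per_use_range(annual_cost=1200.0, html_uses=750, pdf_uses=250)

print(f"cost per use between {best:.2f} and {worst:.2f}")
```

A wide interval flags titles whose apparent value depends heavily on HTML counts.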
21. WHAT MIGHT HAPPEN IF YOUR LIBRARY CHANGED ITS JOURNAL USAGE COUNTING METHODOLOGY
TOTAL USAGE VS PDF USAGE: PUBLISHER RANKINGS VARY
Figure: Publisher Ranking by Total Usage compared with Publisher Ranking by PDF Usage
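A small sketch shows how rankings can shift when the counting method changes. The publishers and counts are placeholders chosen only to show the effect, not figures from the study.

```python
# Placeholder usage counts per publisher: (HTML uses, PDF uses).
usage = {
    "Publisher A": (900, 300),
    "Publisher B": (200, 700),
    "Publisher C": (500, 450),
}


def rank_by(metric):
    """Return publisher names ordered from highest to lowest by the given metric."""
    return sorted(usage, key=lambda name: metric(*usage[name]), reverse=True)


rank_total = rank_by(lambda html, pdf: html + pdf)
rank_pdf = rank_by(lambda html, pdf: pdf)

for name in usage:
    print(
        f"{name}: rank {rank_total.index(name) + 1} by total usage, "
        f"rank {rank_pdf.index(name) + 1} by PDF usage"
    )
```

In this toy data, the publisher that ranks first by total usage ranks last by PDF usage because most of its total comes from HTML requests.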
22. HTML USAGE RATIO: SIGNIFICANT GROWTH OVER TIME
Change in UC’s HTML Usage Ratio 2011-2014
• 2011: 34%
• 2012: 39%
• 2013: 43%
• 2014: 59%
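The year-over-year change in these ratios can be worked out directly, assuming the four values on the chart map to 2011 through 2014 in order. The variable names are illustrative.

```python
# HTML usage ratio (percent of total usage) by year, as shown on the slide.
html_ratio_by_year = {2011: 34, 2012: 39, 2013: 43, 2014: 59}

years = sorted(html_ratio_by_year)

# Change in percentage points from each year to the next.
yearly_change = {
    later: html_ratio_by_year[later] - html_ratio_by_year[earlier]
    for earlier, later in zip(years, years[1:])
}

overall_change = html_ratio_by_year[years[-1]] - html_ratio_by_year[years[0]]

print(yearly_change)   # {2012: 5, 2013: 4, 2014: 16}
print(overall_change)  # 25
```

The largest single-year jump, 16 percentage points, falls between 2013 and 2014.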
23. REASONS FOR HTML GROWTH: TRENDS IN PUBLISHER WEBSITE DESIGN: ENHANCED HTML: PRESENTATION, CONTENT AND CONTEXT
24. INFLATED JOURNAL VALUE RANKINGS: PITFALLS YOU NOW KNOW ABOUT HTML AND PDF USAGE
Acknowledgements:
Research partners:
*Nicole Contaxis, MLIS 2015, School of Education and Information Studies, UCLA
*Alex Wood-Doughty, Ph.D. in progress, School of Economics, UC-Santa Barbara
*Significant earlier research on usage data problems by Terry Bucknell, Philip M Davis, and Jason Price
Chan.Li@ucop.edu
Jacqueline.Wilson@ucop.edu