This document summarizes a study on the persistence of retracted scientific articles online and in personal libraries. The study found that 12% of retracted articles from PubMed were available publicly online, mostly in their published version, and without retraction notices. It also found 75% of retracted articles in the Mendeley database. The document proposes that CrossMark could help address the problem by providing retraction notices to readers, but has some limitations. It advocates a tripartite solution involving checks before reading, writing, and publication to help update the scholarly record on retracted articles.
The Persistence of Error (2011 CrossRef Annual Meeting)
1. The Persistence of Error:
A Study of Retracted Articles on the
Internet and in Personal Libraries
2011 CrossRef Annual Member Meeting
November 15, 2011
Phil Davis, Ph.D.
pmd8@cornell.edu
3. What We Know
• Number of retractions small but increasing (Wager &
Williams, 2011; Steen, 2011)
• Retracted articles continue to be cited as valid studies
(Budd et al., 1998, 2011; Redman et al., 2008)
• Journal publishers are inconsistent in alerting
readers: 41% of articles are watermarked, 32% contain no
notification anywhere (Steen, 2011)
• Most publishers allow some form of self-archiving
(SHERPA/Romeo; Morris, 2009)
• Authors often ignore publisher policy (Davis & Connolly,
2007)
• Journal articles are likely to be found on non-publisher
websites (Wren, 2005)
3 November 21, 2011
4. What We Assume
• Reaching readers is a communication problem that is
not being solved by publishers and indexers alone.
• There is more than one access conduit to the scholarly
literature
• Proliferation of article versions
• Scholars hoard articles in personal libraries
• Article status is static unless stated otherwise
• As retraction numbers are small, little incentive to
search for updates (high-cost, low return)
5. What We Don’t Know
• Extent of proliferation of retracted papers on the
public internet (out of the control of the publisher)
• Where they exist and which version(s)?
• What exists in readers' personal libraries?
6. What We Did
1. Searched for copies of retracted papers on the
public Internet. Excluded published version on
publisher’s website
2. Created an API that searched the Mendeley
database for retracted articles
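The second step can be illustrated with a minimal sketch. The slides do not say which identifier was used for matching or how the Mendeley catalog was queried, so everything here is an assumption: matching is done on DOI, the catalog is a hypothetical in-memory list, and the names `normalize_doi` and `match_retracted` are invented for illustration. No real Mendeley call is shown.

```python
from statistics import mean


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common prefixes so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def match_retracted(retracted_dois, catalog):
    """Return the catalog entries whose DOI is in the retracted set."""
    retracted = {normalize_doi(d) for d in retracted_dois}
    return [entry for entry in catalog if normalize_doi(entry["doi"]) in retracted]


# Hypothetical data: three retracted records and a small reader catalog.
retracted_dois = ["10.1000/example.1", "10.1000/example.2", "10.1000/example.3"]
catalog = [
    {"doi": "doi:10.1000/EXAMPLE.1", "readers": 5},
    {"doi": "https://doi.org/10.1000/example.3", "readers": 1},
    {"doi": "10.1000/unrelated.9", "readers": 12},
]

hits = match_retracted(retracted_dois, catalog)
found_share = len(hits) / len(retracted_dois)      # share of retracted records found
mean_readers = mean(e["readers"] for e in hits)    # mean readers per found record
max_readers = max(e["readers"] for e in hits)      # most-read retracted record

print(f"found {len(hits)} of {len(retracted_dois)} ({found_share:.0%}), "
      f"mean readers = {mean_readers}, max = {max_readers}")
```

The same loop could be run against a personal library export, which is one way a reader might check a collection for retracted items without opening each PDF.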
27. Public Copies on the Web
[Figure: retracted articles per year, 1973-2010. X-axis: Year; y-axis: Retracted articles (0-180). Series: No public copies; Found public copies.]
28. Summary of Web Study
• 1,779 retracted articles from PubMed (1973-2010)
• 308 (12%) publicly accessible copies (excluding
published version on journal website)
• 29 could be found in more than one location (max 5)
• 90% of copies were published version; 9% final
manuscripts; 1% other
• 41% in PMC; 28% on educational sites; 7% commercial
• 24% copies with retraction notices (5% excluding PMC
page view)
29. A window into what is on computers
30. Mendeley API
Our API: http://www.fireisborn.org/retract/
31. Results from Mendeley
• 75% (1,340 of 1,779 records) could be found in
Mendeley (mean readers = 3.4, max = 133)
• Caveat: We are not certain if they have the PDF
• Concentration of “readers” in top journals
• High-readership articles more than 3 times as likely to be
found on public (non-repository) websites (OR
3.28, 2.33-4.61, p<.0001)
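For readers unfamiliar with the odds ratio reported above, the following sketch shows how such a figure and its interval are typically computed from a 2x2 table. The slides do not give the underlying counts or the interval method, so the counts below are hypothetical and the Wald interval on the log odds ratio is an assumption, not necessarily what the study used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: high readership, found on public websites
    b: high readership, not found
    c: low readership, found on public websites
    d: low readership, not found
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(estimate)
    return estimate, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, chosen only to illustrate the arithmetic.
estimate, low, high = odds_ratio_ci(20, 10, 10, 20)
print(f"OR {estimate:.2f} ({low:.2f}-{high:.2f})")
```

An interval that excludes 1, as the reported 2.33-4.61 does, indicates that the association is unlikely to be due to chance alone.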
32. Implications
• The problem of persistence cannot be controlled by
copyright. Publishers lack control of articles
• Increased access comes with a versioning problem
• Essential problem: How do you reach readers when a
Version of Record is no longer a Version of Record?
33. Solutions
Given that 90% of public copies are the publisher's
version, CrossMark would be seen by the future reader
Caveats:
• Reader still responsible for initiating the verification check
• Authors often write directly from bibliographic software
• Doesn’t prevent reuse/recycling of citations
• Doesn’t automatically update older PDFs (without symbol)
• Institutional self-archiving mandates may increase author
manuscripts