NISO Patron Privacy VM#3-Richard Entlich: user-based information offered by publishers
1. Patron data collected and
offered by publishers
NISO Patron Privacy Virtual Forum #3
May 22, 2015
Richard Entlich
Collection Analyst Librarian
Cornell University Library
2. The Landscape
Publishers collect a variety of information
about user interaction with their systems
◦ Some is specifically at the request and for the
benefit of libraries, such as COUNTER
reports
◦ Some consists of proprietary reports of
various kinds (outside of COUNTER) that
some publishers provide routinely; others
will provide such reports upon request.
◦ Some is to meet publisher objectives, such as
protecting intellectual property (e.g., to detect
“excessive downloads”), or for marketing
3. Understanding publisher data
collection activity
Libraries don’t know the scope of data that
publishers are collecting about their
patrons’ use of licensed e-resources
Publisher web site terms & conditions and
privacy policy statements provide some
information but don’t offer a complete
picture
Looking at the data publishers are already
providing to libraries can provide some
insight
4. Examples of publisher-provided
data about users
Licensed e-journals
Licensed e-books
Web scale discovery systems
5. Licensed e-journals
Full-text article downloads by month by IP
address, platform level
◦ Provided routinely by many publishers
American Chemical Society
Association for Computing Machinery
IEEE
Nature Publishing Group
Royal Society of Chemistry
… and many others
◦ Most publishers can provide such reports
upon request, even if they don’t offer them on
their administrative portals
6. Downloads by month by IP address,
platform level
◦ Very similar in design to the COUNTER JR1
report, except by IP instead of journal title
Full-text downloads for [Whatever] University
(by month by IP address)
IP Address        Jan-2014  Feb-2014  Mar-2014  YTD total  PDF total  HTML total
123.124.125.126          3         6         8         17         17           0
123.124.125.127          6        15        17         38         38           0
123.124.125.128          9        20        22         51         51           0
123.124.125.129         12        25        27         64         40          24
123.124.125.130         15         8        10         33         33           0
123.124.125.131          9        16        14         39         39           0
123.124.125.132          9        15        29         53         53           0
Total                   63       105       127        295        271          24
7. Downloads by IP, article level
Combines highly granular demographic
and bibliographic data
At least one third-party analytics provider,
MPS, makes such a report an option for
publishers using its MPSInsight product
Some publishers make the report
available to libraries
10. Full-text article downloads by
IP
Data returned (for a one month period)
◦ Journal [journal title]
◦ DOI [Digital object identifier]
◦ Title [article title]
◦ Volume
◦ Issue
◦ IP Address [full IPv4 in dot-decimal notation]
◦ Total Successful Full-Text Article Requests
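As an illustration only, the fields listed above could be modeled as one record per article per IP address, and then rolled up into platform-level counts by IP (the less granular report described earlier). The class name, function name, and sample values below are hypothetical and are not taken from any publisher's actual export format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleDownloadRow:
    """One row of a hypothetical article-level report (field names assumed)."""
    journal: str
    doi: str
    title: str
    volume: str
    issue: str
    ip_address: str
    requests: int  # Total Successful Full-Text Article Requests


def platform_totals_by_ip(rows):
    """Sum requests per IP address, discarding all bibliographic detail."""
    totals = defaultdict(int)
    for row in rows:
        totals[row.ip_address] += row.requests
    return dict(totals)


# Made-up sample rows; the DOIs and addresses are placeholders.
rows = [
    ArticleDownloadRow("Journal A", "10.9999/example.1", "First article",
                       "12", "3", "192.0.2.10", 2),
    ArticleDownloadRow("Journal B", "10.9999/example.2", "Second article",
                       "7", "1", "192.0.2.10", 1),
    ArticleDownloadRow("Journal A", "10.9999/example.1", "First article",
                       "12", "3", "192.0.2.11", 4),
]

totals = platform_totals_by_ip(rows)
```

The roll-up keeps only the IP address and a count, which shows how much bibliographic detail the article-level report adds on top of the platform-level one.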
11. Licensed e-books
Full-text page, section, or chapter
downloads by IP address, platform level
◦ Provided routinely by some publishers
◦ Very similar in appearance to the comparable
e-journal report
◦ Most publishers can provide such reports
upon request, even if they don’t offer them on
their administrative portals
14. Usage statistics with
authentication details
Data returned (for a full calendar year)
◦ Customer Number
◦ Collection
◦ MiL EAN/ISBN
◦ Title [e-book title]
◦ Publisher
◦ Pub e-EAN/ISBN
◦ Hardcover EAN/ISBN
◦ Paper EAN/ISBN
◦ LC Subject Heading
◦ LC Class
◦ # of Hits
◦ Number of Pages Viewed
◦ Number of Pages Downloaded
◦ Number of Pages Printed
◦ Checkouts
◦ License
◦ Authentication / Login type
◦ Login Date
◦ IP Address [full IPv4 in dot-decimal notation]
◦ Session ID
21. IP addresses and usage data
An IP address does not identify a person,
but comes uncomfortably close
What level of bibliographic data is
acceptable to combine with IP addresses?
Should publisher systems be retaining
such data or sharing it with libraries?
Do libraries have the right to ask
publishers not to collect it? Retain it?
Share it? Sell it?
22. Library use for IP address data
◦ Platform level download counts by IP
address can be very useful
◦ IP addresses can be converted into
demographic categories and then
removed, allowing for demographic
analysis of licensed e-resource use (e.g.
at the college or department level)
◦ Under the widely used IP-authentication
model, publisher systems are the best
source of such data
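A minimal sketch of the "convert, then remove" idea above, assuming the library holds a table mapping campus subnets to departments. The subnets, department names, and download counts here are invented for illustration (the addresses come from reserved documentation ranges); a real mapping would come from campus network administrators.

```python
import ipaddress
from collections import Counter

# Hypothetical subnet-to-department table; not real campus data.
SUBNET_TO_UNIT = {
    ipaddress.ip_network("192.0.2.0/25"): "Chemistry",
    ipaddress.ip_network("192.0.2.128/25"): "Engineering",
}


def unit_for_ip(ip_text):
    """Return the department whose subnet contains the address, else 'Unknown'."""
    ip = ipaddress.ip_address(ip_text)
    for network, unit in SUBNET_TO_UNIT.items():
        if ip in network:
            return unit
    return "Unknown"


def downloads_by_unit(rows):
    """Aggregate (ip, downloads) pairs by department; IPs are not retained."""
    totals = Counter()
    for ip_text, downloads in rows:
        totals[unit_for_ip(ip_text)] += downloads
    return dict(totals)


# Made-up platform-level report rows: (IP address, YTD downloads).
report = [
    ("192.0.2.10", 17),
    ("192.0.2.20", 38),
    ("192.0.2.200", 51),
    ("198.51.100.7", 5),
]

summary = downloads_by_unit(report)
```

Only the department-level summary would need to be kept for analysis; the original per-IP rows can be discarded once the conversion is done.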
23. Recommended reading
Some publishers provided “too much
information”
Some publishers declined to share IP-based
data with the library, citing privacy concerns
The University of Virginia library significantly
altered its funding model for licensed
e-resources based on usage patterns that the IP
data revealed
http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1422&context=charleston