This document outlines a meeting about a collaborative project among several Maine libraries. The libraries will analyze circulation data and holdings to make data-driven decisions about retaining and withdrawing print monographs as part of a sustainable shared collection strategy. Key players from the participating libraries and experts from Sustainable Collections will analyze data extracted from the libraries and use it to develop retention and withdrawal scenarios that guide collaborative collection management decisions. The proposed multi-step process cleans and analyzes the data, develops scenarios, facilitates discussions, and produces lists of titles for retention and withdrawal for each library.
2. The SCS Team
Rick Lugg and Ruth Fischer
• R2 Founders, principals
• Recognized as experts in:
– Selection-to-access workflows
– Integration of vendor and library systems
– Adapting library organizations for the 21st century
Andy Breeding
• Focus on content-management, web and search solutions, user experience
• Most recently: User-Experience Team Manager at Harvard Business School
• 20+ years in special libraries
Eric Redman
• Former Chief Architect and Director of IT at Blackwell North America
• Led Development of Blackwell’s Collection Manager 7 application
• Deep knowledge of bibliographic data, search, and information architecture
• 28 years IT experience
Sustainablecollections.com 2
4. The SCS Approach: Data-Driven Decisions
• Circulation and other local use data (in-house, reserves)
• Location
• Year of publication
• Year Acquired
• Holdings in other libraries (national, state, peer)
• Overlap within MSCS libraries
• Secure digital copy (Hathi)
5. The SCS Approach: Project Success
• Partnership & collaboration
• M/SCS
• Flexibility
• New Ground
– Internet Archive
– Academic/Public
– LC/DDC
– FRBR-on
• Custom MSCS Data Set
6. Project Scope: Participating Libraries
• Colby College
• Bates College
• Bowdoin College
• Portland Public Library
• University of Maine/Orono (URSUS)
• University of Southern Maine (URSUS)
• Bangor Public Library (URSUS)
• Maine State Library (URSUS)
• [Bangor Theological Seminary]
7. Project Scope: Material Types
• Circulating print monographs
• Reference books
• Special Collections monographs
• Out of Scope
– eBooks
– Government Documents
– Non-print formats
– Maps, scores
– Journals
8. Project Scope: Key Questions
• What monographs should the eight partner libraries designate for long-term retention for the benefit of shared collections in the State of Maine?
• What is an equitable and/or common-sense distribution of retention responsibilities?
• What monographs held by the partners are candidates for incorporating into POD/EOD services by virtue of Hathi Trust or Internet Archive programs for public domain material?
• What monograph copies (by library) could optionally be deselected, once retention decisions have been finalized?
9. Project Management
• Roles
– Program Manager
– Project Team
– SCS: analyze & present data; facilitate discussions on data, interpretation, and policy options
• Decision-Making
– Retention/Withdrawal Scenarios
– Title Protection rules, etc.
• Communication
– Listserv?
– Direct or via Program Manager?
10. High-level project schedule
Task (tentative dates): Description
• Planning Meetings (February 2013): Key players discuss data extracts, anomalies, peers, etc.
• Data Preparation (March 2013): Libraries prepare and deliver extracts to SCS. SCS validates, normalizes, matches, and performs holdings lookups.
• Group Collection Summary (April 2013): Categorical overview of the group data set. Used to gauge opportunities and guide scenario development.
• Scenario Development (begin April 2013): Project leaders suggest preliminary withdrawal and preservation criteria. SCS iterates and revises.
• Candidate Lists (2013): Detailed Excel spreadsheets for review, based on finalized criteria for withdrawal. Modify as necessary.
• Discussions Facilitation (throughout): This will be needed at many points, but especially around scenario development, allocation, and policy development.
• Allocation (2013): Assignment of withdrawal opportunities and retention commitments, based on many factors.
• Production of Picklists and Keeplists (2013): Once allocation decisions have been made, SCS will derive title/item lists for use by individual libraries.
• Ongoing Data Management (…): SCS will maintain (but will not update) the MSCS dataset for 2 years, which can be used for additional projects.
11. Collecting and preparing the libraries’ data
• Bibliographic, item, circulation, and holdings data extracted, transformed, and loaded to an MSCS database
• Filter out-of-scope bib records (eBooks, maps, scores, DVDs, Gov Docs)
• Eliminate duplicate bib records
• Normalize call numbers
• Eliminate trailing spaces in control numbers
• Validate OCLC numbers
• Match bib records on OCLC number (with title-string check)
• LCCN/title-string lookups for records lacking OCLC#
• Identify and accommodate unusual implementations of MARC
• Map item-level data and interpret codes
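A few of the preparation steps above (trimming control numbers, validating OCLC numbers, and matching bib records on OCLC number with a title-string check) can be sketched in Python. This is only an illustration under stated assumptions, not SCS's actual process: the record layout (dicts with `oclc` and `title` keys), the prefix handling, and the 20-character title key are all hypothetical.

```python
import re
from collections import defaultdict


def normalize_oclc(raw):
    """Trim spaces and common prefixes; return the digits, or None if invalid."""
    if raw is None:
        return None
    value = re.sub(r"^\(OCoLC\)", "", raw.strip()).strip()
    value = re.sub(r"^(ocm|ocn|on)", "", value, flags=re.IGNORECASE)
    value = value.strip().lstrip("0")
    return value if value.isdigit() else None


def title_key(title, length=20):
    """Crude title string used to sanity-check a match (hypothetical rule)."""
    return re.sub(r"[^a-z0-9]", "", title.lower())[:length]


def match_bibs(records):
    """Group records by OCLC number; set aside records that cannot be matched."""
    groups = defaultdict(list)
    unmatched = []  # candidates for LCCN/title-string lookups
    for rec in records:
        oclc = normalize_oclc(rec.get("oclc"))
        if oclc is None:
            unmatched.append(rec)
            continue
        group = groups[oclc]
        # Title-string check against the first record already in the group.
        if group and title_key(group[0]["title"]) != title_key(rec["title"]):
            unmatched.append(rec)
            continue
        group.append(rec)
    return dict(groups), unmatched
```

Records that end up in `unmatched` would then go through the LCCN/title-string lookups or manual review.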
12. MSCS Data
Record Type Expected Current Working #
Bib Records 2,415,000 2,901,973
Item Records 3,000,000 4,950,549
Libraries 8 9 (BTS added)
13. Opportunities for Local Data Remediation
Bib Records Received 695,567
Bib Records included for analysis 683,545
Bib Records out-of-scope for analysis 12,022
Duplicate bib records received 388
Government docs 712 (gpo nbr is not null or gov doc nbr is not null)
Rec Type not equal to 'a' 11,010 (non-language materials per MARC leader 06)
Bib Level not equal to 'a' or 'm' 154 (non-monographic materials per MARC leader 07)
Non-print resources 408 (medium is not null: videos, electronic materials, sound recordings, etc.)
Unable to obtain OCLC number 226
Bib Title/Author mismatch with OCLC 233
Multiple OCLC numbers per record 0
Local holding not set in WorldCat 99,437
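The scope tests named on this slide (MARC leader position 06 for record type, position 07 for bibliographic level, plus the government-document and medium checks) could be expressed roughly as follows. This is a minimal sketch: the function name and its parameters are hypothetical, and how the gov doc number and medium are stored in a given library's records is an assumption.

```python
def scope_problems(leader, gov_doc_number=None, medium=None):
    """Return the reasons a bib record would be treated as out of scope."""
    problems = []
    if leader[6] != "a":  # MARC leader 06: type of record
        problems.append("record type is not 'a' (non-language material)")
    if leader[7] not in ("a", "m"):  # MARC leader 07: bibliographic level
        problems.append("bib level is not 'a' or 'm' (non-monographic)")
    if gov_doc_number:
        problems.append("government document number present")
    if medium:
        problems.append("medium present (non-print resource)")
    return problems


# Counts from the slide: received minus out-of-scope equals included.
assert 695_567 - 12_022 == 683_545
```

A record with an empty result list stays in the analysis set; returning every reason rather than the first one makes it easier to report each category separately.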
15. Additional Factors
• Comparator Libraries
• Title Protection Rules
• Subject Analysis
• Authoritative Title Lists
• Today’s task: make sure that decision factors
are represented in the data before we begin
17. By “titles” we can mean two different things
1. Title Set
Dominguez Hills, Fullerton, Long Beach, Los Angeles, Northridge, Pomona
2. Title Holding
18. Each “Title-Holding” has different characteristics
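• Dominguez Hills: 0 total circulations; last circulation date -none-; date added to collection 6/27/02
• Fullerton: 19 total circulations; last circulation date 11/30/11; date added to collection 4/23/02
• Long Beach: 16 total circulations; last circulation date 12/16/08; date added to collection 9/21/01
• Los Angeles: 12 total circulations; last circulation date 5/30/07; date added to collection 5/03/00
• Northridge: 13 total circulations; last circulation date 4/27/07; date added to collection 11/11/02
• Pomona: 8 total circulations; last circulation date 3/11/08; date added to collection 8/11/00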
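One way to picture the distinction between a title set and a title-holding is as a small data structure: one record per title, containing one holding per library. The sketch below uses the six libraries' figures from this slide. The class and field names are hypothetical, two-digit years are assumed to mean 20xx, and pairing these figures with the title named in the editor's note is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TitleHolding:
    """One library's holding of a title, with its own usage history."""
    library: str
    total_circs: int
    last_circ: Optional[date]
    date_added: date


@dataclass
class TitleSet:
    """All holdings of the same title across the group."""
    title: str
    holdings: List[TitleHolding]

    def total_circs(self) -> int:
        return sum(h.total_circs for h in self.holdings)

    def never_circulated(self) -> List[str]:
        return [h.library for h in self.holdings if h.total_circs == 0]


example = TitleSet(
    title="California in the New Millennium",
    holdings=[
        TitleHolding("Dominguez Hills", 0, None, date(2002, 6, 27)),
        TitleHolding("Fullerton", 19, date(2011, 11, 30), date(2002, 4, 23)),
        TitleHolding("Long Beach", 16, date(2008, 12, 16), date(2001, 9, 21)),
        TitleHolding("Los Angeles", 12, date(2007, 5, 30), date(2000, 5, 3)),
        TitleHolding("Northridge", 13, date(2007, 4, 27), date(2002, 11, 11)),
        TitleHolding("Pomona", 8, date(2008, 3, 11), date(2000, 8, 11)),
    ],
)
```

Keeping usage on the holding rather than on the title is what lets a scenario treat the uncirculated copy differently from the heavily used ones.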
19. Pilot Group Holdings and Avg Total Charges by LC
[Figure: two bar charts by LC class (A through Z): HOLDINGS, on a scale of 0 to 800,000, and AVG CHARGES, on a scale of 0.0 to 10.0]
20. Average number of CNY library holdings per title by publication year
[Figure: line chart with publication year (1900 to 2010) on the x-axis and average # of institutions holding (0 to 3) on the y-axis]
• 1965 = 2.39 (peak value)
• 2012 = 1.75
21. Circulation Counts
Sample Library Title-Holding Counts All Libraries Percent
1 All Title Holdings - Filtered 3,575,321 100%
2 Total Charges = 0 (all available circ data) 1,161,359 32%
3 Total Charges = 1 to 3 (all available circ data) 1,071,029 30%
4 Total Charges = 4 to 9 (all available circ data) 699,350 20%
5 Total Charges = 10+ (all available circ data) 643,583 18%
6 Last charge after 2010 501,890 14%
7 Last charge after 2007 914,325 26%
8 Last charge after 2005 1,157,845 32%
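The circulation rows above amount to bucketing each title-holding by its total charges and reporting each bucket's share of the filtered total. A minimal sketch of that tally follows; the function names are hypothetical and the input is assumed to be one total-charges integer per title-holding.

```python
from collections import Counter


def charge_bucket(total_charges):
    """Map a total-charges count to the ranges used on the slide."""
    if total_charges == 0:
        return "0"
    if total_charges <= 3:
        return "1 to 3"
    if total_charges <= 9:
        return "4 to 9"
    return "10+"


def bucket_shares(charges):
    """Return {bucket: (count, rounded percent of all title-holdings)}."""
    counts = Counter(charge_bucket(c) for c in charges)
    total = sum(counts.values())
    return {b: (n, round(100 * n / total)) for b, n in counts.items()}
```

The four charge buckets on the slide (1,161,359 + 1,071,029 + 699,350 + 643,583) sum to the filtered total of 3,575,321, so every title-holding falls in exactly one bucket.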
22. WorldCat™ Counts
Sample Library Title-Holding Counts All Libraries Percent
1 All Title Holdings - Filtered 3,575,321 100%
9 0-9 Holdings in USA 122,092 3%
10 10-19 Holdings in USA 73,656 2%
11 20-49 Holdings in USA 234,822 7%
12 50-99 Holdings in USA 405,321 11%
13 100-199 Holdings in USA 752,079 21%
14 200+ Holdings in USA 1,987,329 56%
15 0-9 Holdings in California 426,536 12%
16 10-49 Holdings in California 1,858,850 52%
17 50+ Holdings in California 1,289,913 36%
23. Think MSCS
Overlap within Group of 6 Libraries
Sample Library Title-Holding Counts All Libraries Percent
1 All Title Holdings - Filtered 3,575,321 100%
18 Title-holdings present in 1 library 978,728 27%
19 Title-holdings present in 2 libraries 717,012 20%
20 Title-holdings present in > 2 libraries 1,879,581 53%
21 Title-holdings present in 3 libraries 630,176 18%
22 Title-holdings present in 4 libraries 556,887 16%
23 Title-holdings present in 5 libraries 445,660 12%
24 Title-holdings present in 6 libraries 246,858 7%
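Overlap counts like these can be derived by grouping holdings on the matched title identifier and counting the distinct libraries in each group. The sketch below assumes the input is a list of (OCLC number, library) pairs and that a title held by n libraries contributes n title-holdings to the "present in n libraries" row; both are assumptions about the method, not something the slide states.

```python
from collections import Counter, defaultdict


def overlap_counts(holdings):
    """Tally title-holdings by the number of libraries holding the title.

    holdings: iterable of (oclc_number, library) pairs.
    """
    libraries_by_title = defaultdict(set)
    for oclc_number, library in holdings:
        libraries_by_title[oclc_number].add(library)

    counts = Counter()
    for libraries in libraries_by_title.values():
        # A title held by n libraries accounts for n title-holdings.
        counts[len(libraries)] += len(libraries)
    return dict(counts)
```

The rows on the slide are consistent with a tally of this kind: the 1, 2, and > 2 rows sum to 3,575,321, and the 3, 4, 5, and 6 rows sum to 1,879,581.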
24. Date Related Counts
Sample Library Title-Holding Counts All Libraries Percent
1 All Title Holdings - Filtered 3,575,321 100%
30 Publication Year before 2005 3,356,176 94%
31 Publication Year before 2000 3,102,731 87%
32 Publication Year before 1990 2,600,033 73%
33 Last Item Add-Date before 2005 3,257,574 91%
25. Hathi Trust Matches
Sample Library Title-Holding Counts All Libraries Percent
1 All Title Holdings - Filtered 3,575,321 100%
34 Hathi Trust Public Domain Match 101,822 3%
35 Hathi Trust In-Copyright Match 1,626,447 45%
27. Sample Pilot Group - Title-Holdings by Holdings Level
[Figure: bar chart of title-holdings by # of Pilot Group libraries holding the title]
• 1 library: 978,728 (Uniquely Held Titles)
• 2 libraries: 717,012
• 3-6 libraries: 1,879,581 (Commonly Held Titles)
28. Sample Pilot Group - Title-Holdings by Holdings Level
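[Figure: stacked bar chart of title-holdings by # of Pilot Group libraries holding the title, split by circulation level]
• 1 library: 362,050 with 0 circs; 311,240 with 1-3 circs; 305,438 with 4+ circs
• 2 libraries: 239,202 with 0 circs; 220,071 with 1-3 circs; 257,739 with 4+ circs
• 3-6 libraries: 560,107 with 0 circs; 539,718 with 1-3 circs; 779,756 with 4+ circs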
29. Titles Published and Acquired before 2000
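Shared Withdrawal Scenarios within the Sample Pilot Group
• Keep 1 Title-holding: 623,382 (0 circulations); 850,392 (1 or fewer circulations); 1,077,845 (3 or fewer circulations)
• Keep 2 Title-holdings: 408,135 (0 circulations); 534,642 (1 or fewer circulations); 648,965 (3 or fewer circulations)
• Keep 3 Title-holdings: 238,548 (0 circulations); 299,848 (1 or fewer circulations); 348,723 (3 or fewer circulations)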
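A withdrawal scenario of this kind combines a retention level ("keep N title-holdings") with a circulation threshold. The slide does not spell out the exact rule, so the sketch below is one possible reading, labeled as an assumption: a title qualifies only when every holding in the group is at or below the threshold, and each qualifying title yields (number of holdings minus N) withdrawal candidates. The input is assumed to be already limited to titles published and acquired before 2000.

```python
def withdrawal_candidates(title_sets, keep, max_circs):
    """Count title-holdings that could be withdrawn under one scenario.

    title_sets: dict mapping a title id to the list of per-library
    circulation totals for that title (hypothetical layout).
    """
    total = 0
    for circs in title_sets.values():
        # Assumed rule: every holding must be at or below the threshold.
        if all(c <= max_circs for c in circs):
            total += max(0, len(circs) - keep)
    return total
```

Under this reading the counts behave as they do on the slide: loosening the circulation threshold raises the number of candidates, and keeping more title-holdings lowers it.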
34. Post-Summary Outputs
• Iterations of Retention Scenarios
• Single group-wide retention list
• Allocation of retention commitments &
withdrawal opportunities
• Allocation database
• 2-year access to MSCS data set
• Things we probably haven’t anticipated
36. Please describe your library’s retention priorities.
Do you have any remote storage or compact shelving?
Do you plan to reduce the size of your local print
collection or stacks? If so, do you have a goal in mind?
Are any libraries under significant space pressure?
37. How many circulating print monographs
are there in your collection?
38. How many reference books?
How many juvenile books?
How can SCS identify/segregate
these parts of your collection?
39. Your OCLC symbol? Symbols?
What is the local practice with regard to
setting holdings?
Recent OCLC reclamation project?
Include or exclude titles where the holding
has not been set?
40. Classification
What is your library’s primary
classification scheme?
Secondary classification scheme?
Are these segregated by location?
Where are local call numbers stored?
41. Call Numbers in Bibliographic Records Call numbers in Item Records
LC Dewey Local LC Local Dewey Local
MARC Field 050 082 090 092 095 945$a 945$b
Bates 181,930 252,824 2,639 758
Bowdoin 189,187 298,047 43,960 23,390
Colby 316,514 230,399 174,718 11,189 233
Portland Public Library (PPL) - - 183,707 260,196 24,226
URSUS 888,772 875,282 607,032 399,111 3,100,433 1,952,982
BTS 23,981 22,590 - 2 62,181 5,442
42. How many years of circulation data are available?
Total charges
Last charge date
Are there any internal processes
that routinely “charge” items?
43. In-House Usage
Re-shelving counts?
Any other
systematic tallies?
44. Are item add dates available?
Date accessioned?
If yes, how many years of add/acq data are available?
45. How does the library handle multiple copies?
What is the best way for
SCS to differentiate
multiple copies from a
multi-volume set?
46. Contact Info
• andy@sustainablecollections.com
• rick@sustainablecollections.com
48. Discussion & Decisions Needed
• Comparator Libraries
• Title Protection Rules
• Data Presentation: LC, DDC, combined?
• Internet Archive: how much to invest
49. Scenario Building: Issues to Consider
• Archive copies vs. Service copies
• Dispersion of title-holdings / delivery times
• MSCS ‘unique’ titles: how to handle
• Preservation commitments (in what context?)
• Role/relationship with other regional libraries?
• Physical condition
50. Think about…
• Think about the questions you want to ask
• Think about which data points (and combinations of
points) can help answer those questions
• Think about the MSCS’s 2.9 million title-holdings as if
it were a single distributed collection (this is only an
exercise)
• Think first about titles that have never circulated and
are held by multiple libraries
• Think about storage, retention, and withdrawal
• Ask: what is the worst-case scenario?
51. Comparator libraries
• SCS can support three groups with a
maximum of 20 OCLC symbols each
• These are in addition to US Holdings, State
Holdings, Groupwide Holdings, HathiTrust,
and Internet Archive
• Not of primary interest to MSCS?
52. Local Interest Rules
• Categories to be taken off the table
• Retained regardless of circulation/use
• Examples: Maine, Atlantic Coast
• Rules consist of keywords and classification
ranges, e.g. Local Maine History
• DDC 974.1
• LC F16-30
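A rule built from keywords and classification ranges, like the Local Maine History example above, might be checked as in the sketch below. The keyword list, the function names, and the call-number parsing are illustrative assumptions; real call numbers would need more careful normalization than this.

```python
import re

# Hypothetical keyword list based on the examples on the slide.
KEYWORDS = ("maine", "atlantic coast")


def lc_in_range(call_number, letters, low, high):
    """True if an LC call number has the given class letters and number range."""
    match = re.match(r"\s*([A-Z]+)\s*(\d+)", call_number.upper())
    if not match:
        return False
    return match.group(1) == letters and low <= int(match.group(2)) <= high


def is_protected(title, lc=None, ddc=None):
    """Apply the Local Maine History rule: keywords, DDC 974.1, LC F16-30."""
    text = title.lower()
    if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in KEYWORDS):
        return True
    if lc and lc_in_range(lc, "F", 16, 30):
        return True
    if ddc and ddc.strip().startswith("974.1"):
        return True
    return False
```

Titles that match would be retained regardless of circulation, so the check would run before any withdrawal scenario is applied.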
53. Subject Analysis
• LC
• DDC
• Augmented DDC/LC
• Conspectus
• Can we learn what is needed by looking
through one lens?
54. Internet Archive
• Because the Internet Archive API is not designed for large-scale
batch queries, SCS must obtain the full set of Open Library data (of
which IA is a subset).
• SCS must parse the Open Library records to identify the IA titles.
These are large files, e.g., the Open Library Editions file contains 25
million lines. About 6.8 million of these appear to have OCLC
numbers. As of 1/2/13, the IA “Texts” division contains 3.7 million
items, not all of which are books. It will require some digging to
verify the various relationships and the quality of the data. We
believe that the actual number of full-text books in IA is between
2.2 and 2.5 million.
• SCS must identify items which appear both in IA and in HathiTrust
to minimize duplication of counts.
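The parsing and de-duplication steps described above could look roughly like this. The sketch assumes each line of the Open Library Editions file is tab-separated with a JSON record in the last column, and that an edition scanned by the Internet Archive carries an `ocaid` field alongside an `oclc_numbers` list; those details should be verified against the actual files before relying on them. The function names are hypothetical.

```python
import json


def ia_oclc_numbers(lines):
    """Map OCLC number -> IA identifier for editions that have both."""
    found = {}
    for line in lines:
        # Assumed layout: metadata columns, then the JSON record last.
        record = json.loads(line.rstrip("\n").split("\t")[-1])
        ocaid = record.get("ocaid")
        if not ocaid:
            continue  # not an Internet Archive title
        for oclc in record.get("oclc_numbers", []):
            found[oclc.strip()] = ocaid
    return found


def ia_only(ia_numbers, hathi_numbers):
    """OCLC numbers found in IA but not in HathiTrust, to avoid double counting."""
    return set(ia_numbers) - set(hathi_numbers)
```

Because the function iterates line by line, it can be fed an open file object directly, which matters for a file of roughly 25 million lines.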
Editor's notes
California in the New Millennium: The Changing Social and Political Landscape, Publication Year 2000 (Paperback version 2002). This title is held by all 23 CSU Libraries; we have details on the 6 in the Pilot Group.