This document discusses HathiTrust, a digital repository managed by over 60 partner institutions, as a potential repository for government documents. It provides an overview of HathiTrust, noting that it contains over 10 million volumes, including over 2.9 million public domain volumes. The document estimates that around 300,000 titles in HathiTrust, or 4% of the total, are US government documents, with 80% of those being in the public domain. It compares the ability to find full text of documents between HathiTrust and Google Books. The document outlines who can access what within HathiTrust and how content can be searched, discovered, and loaded into local library catalogs. It acknowledges challenges around searching, copyright statuses, and metadata.
HathiTrust--a GovDocs Repository?
1. HathiTrust--a GovDocs
Repository?
Brian Vetruba, Catalog Librarian/Germanic Studies
Librarian
Washington University in St. Louis
bvetruba@wustl.edu
Leveraging Your Strengths: Regional Government
Documents Conference | Federal Reserve Bank St. Louis
May 4, 2012
2. Overview
• Began in 2008
• Over 10.2 million volumes
• Over 2.9 million public domain
(PD) volumes (“full view”)
• Over 60 partners
hāthī (हाथी) (pronounced
HAH-tee) is the Hindi word
for elephant.
3. Content
HathiTrust Partners:
Content by call number, language, date And more...
http://www.hathitrust.org/statistics_visualizations
4. US Gov Docs in HathiTrust
• Ca. 300,000 = 4% of all
titles in HathiTrust
• 80% of gov docs in
HathiTrust in public
domain
6. HathiTrust Compared to Google Books
More titles found in Google but HathiTrust
provides more full-text
1940s sample (total docs = 385)
              Titles found   Full-text
HathiTrust         98            90
Google            181             4
HathiTrust better for searching serials
One record for all issues of a title
Title changes noted
Sare (2012)
7. Who can do what
Everyone
• View PD content
• Search PD and copyright materials
• View public collections
• Download single PD pages
• Download MARC records
HathiTrust Partners
• Download entire volumes of PD materials
• Create private or public collections
• Have a voice in the future of HathiTrust
10. Searching & Discovering Content
Loading records into local ILS
http://www.hathitrust.org/data
Bibliographic and Data APIs
http://www.hathitrust.org/data
Widgets
http://www.hathitrust.org/widgets
12. Searching & Discovering Content
Embed links
Public collections
http://babel.hathitrust.org/cgi/mb
Individual items
From a LibGuide for a
German literature
course
13. Final Thoughts
CHALLENGES
• Searching/retrieval obstacles (e.g. no SuDoc search)
• Inaccurate copyright statuses impeding access
• Inaccurate linkages and bibliographic info
PROGRESS
• Commitment to expand and enhance access to gov docs
• Research study examining how to improve access
• Coordination with Committee on Institutional Cooperation
and others to create a digital corpus of 1+ million print
docs
14. Questions about HathiTrust
http://www.hathitrust.org/help
feedback@issues.hathitrust.org
@hathitrust
More info:
http://libguides.wustl.edu/hathitrust
15. More info on loading records into ILS
Kent State Univ.:
http://techserv.lib.muohio.edu/ovgtsl11/presentations/Panchyshyn.pptx
Univ. of Denver:
http://www.slideserve.com/holleb/harvesting-hathitrust-documents-a-new-model-for-online-access
Univ. of Colorado-Denver:
Beall, Jeffrey. 2009. “Free Books: Loading Brief MARC Records for Open-
Access Books in an Academic Library Catalog.” Cataloging & Classification
Quarterly 47 (5) (January 4): 452–463. doi:10.1080/01639370902870215.
16. Bibliography
Malpas, Constance. 2011. Cloud-sourcing Research Collections: Managing
Print in the Mass-digitized Library Environment. Dublin, Ohio: OCLC
Research. Accessed May 2, 2012
http://www.oclc.org/research/publications/library/2011/2011-01.pdf
York, Jeremy. 2012. HathiTrust: Issues and Challenges in Preserving the
Published Record [PowerPoint slides]. Accessed April 30, 2012
http://www.hathitrust.org/documents/HathiTrust-Amigos-201202.pptx
Brown, Christopher C. 2011. Harvesting HathiTrust Documents: A New Model
for Online Access [PowerPoint slides]. Accessed April 30, 2012
http://www.slideserve.com/holleb/harvesting-hathitrust-documents-a-new-model-for-online-access
York, Jeremy. 2012. “HathiTrust: The Elephant in the Library.” Library Issues:
Briefings for Faculty and Administrators 32 (3) (January). Accessed May 2,
2012 http://www.libraryissues.com/sub/LI320003.asp .
Sare, Laura. 2012. “A Comparison of HathiTrust and Google Books Using
Federal Publications.” Practical Academic Librarianship: The International
Journal of the SLA Academic Division 2 (1): 1–25. Accessed May 2, 2012
http://journals.tdl.org/pal/article/viewFile/5880/5922
HathiTrust was launched in 2008 by a 12-university consortium known as the Committee on Institutional Cooperation (CIC), along with the University of California system. It has grown to more than 60 partners, including Columbia, Princeton, Yale, Duke, and Johns Hopkins. Also Missou
Unlike other e-book initiatives, both PRESERVATION and ACCESS are main focal points. HathiTrust's stated intention is to preserve digital volumes over the long term. Trustworthy Repository Audit and Certification (TRAC).
hāthī (हाथी) (pronounced HAH-tee) is the Hindi word for elephant, an animal highly regarded for its memory, wisdom, and strength.
5,422,301 book titles
269,168 serial titles
Much of the current content in HathiTrust was digitized as part of Google Books. Another major source is the Internet Archive. An increasing amount of content is coming out of digitization by partner libraries. So if most of the content in HT is in Google or IA, how is it different?
-- digital library organized by libraries for libraries and their users
-- catalog structure to facilitate access
-- use all the metadata
-- fine-tune search interfaces to fit users' needs
-- open access to data and metadata
-- key goal: PRESERVATION over the long haul
-- locally digitized collections from partners increasingly important
-- coordination w/ other digital library initiatives like the Digital Public Library of America, as well
CLICK ON HATHI TRUST TO SHOW CONTENT VISUALIZATIONS.
Overlap -- Median overlap for ARL libraries is 50%. Higher for smaller college libraries (already 50% in May 2011). Jeremy York, "HathiTrust: Aspiring to Build the Universal Library". UKSG Annual Conference, March 26, 2012.
According to the report on the HT Constitutional Convention, there are an estimated 300,000 US documents in HathiTrust, which according to the report is 1/5 to 1/3 of all printed documents. Others estimate that gov docs constitute about 4% of all titles in HathiTrust. Malpas in her report Cloud-sourcing notes that ~80% of gov docs are in the public domain and thus are, or should be, viewable in their entirety. Gov docs account for a high percentage of public domain materials.
Automatic rights determination: conducted on all works at time of ingest and when records are modified.
-- Public domain worldwide: US works published before 1923, US federal government publications, non-US works published prior to 1872
-- Public domain in the United States: non-US works published prior to 1923
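The automatic rights rules listed above can be sketched as a small decision function. This is only an illustration of the rules as stated in these notes, not HathiTrust's actual implementation; the `Work` class, the field names, and the status labels ("pd-worldwide", "pd-us", "restricted") are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Minimal description of a volume (hypothetical fields for illustration)."""
    year: int                     # publication year
    us_published: bool            # True if published in the US
    us_federal_doc: bool = False  # True if a US federal government publication


def rights_status(work: Work) -> str:
    """Apply the rules from the notes, most permissive outcome first."""
    # Public domain worldwide: US federal government publications
    if work.us_federal_doc:
        return "pd-worldwide"
    # Public domain worldwide: US works published before 1923
    if work.us_published and work.year < 1923:
        return "pd-worldwide"
    # Public domain worldwide: non-US works published prior to 1872
    if not work.us_published and work.year < 1872:
        return "pd-worldwide"
    # Public domain in the United States only: non-US works prior to 1923
    if not work.us_published and work.year < 1923:
        return "pd-us"
    # Everything else is treated as not automatically public domain
    return "restricted"
```

Under these rules a 1962 Senate hearing would come out as public domain because of its federal origin, regardless of date. Any inaccurate flag in the catalog record (for example, a gov doc not marked as federal) would push the same volume to "restricted", which is one way the copyright-status problems noted under Challenges could arise.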
Using the Monthly Catalog of US Government Publications, 1895-1976 via ProQuest and the Catalog of Government Publications (1976 onward), Christopher Brown at the University of Denver looked at the percentage of government documents in HathiTrust. Best coverage is for the 1970s-1980s; worse coverage for the late 19th century. And of course it drops off in the 2000s with GPO's decision to do away with most print documents.
Sare, Laura. 2012. “A Comparison of HathiTrust and Google Books Using Federal Publications.” Practical Academic Librarianship: The International Journal of the SLA Academic Division 2
Using a random sample of 1540 federal documents published between 1943 and 1976, Sare looked at number of titles found, full text, search interface, and quality of bibliographic records.
OVERALL -- Sare found more docs listed in Google Books, but more documents were available as full text in HathiTrust.
1940s -- 385 total
HT found 98 titles; 90 were full text
Google found 181 titles; only 4 were full text
DUE TO GOOGLE DECISION TO CONSIDER ANYTHING PUBLISHED AFTER 1923 AS IN-COPYRIGHT MATERIAL BECAUSE THEY FALL WITHIN ORPHAN WORKS TIMEFRAME. THOSE NON-FULL TEXT TITLES FOUND IN GOOGLE BOOKS EITHER HAD “SNIPPET” VIEWS OR “NO PREVIEW” AVAILABLE.
--- BETTER BIBLIOGRAPHIC DATA IN HATHITRUST RECORDS, ESPECIALLY FOR SERIALS. MULTIPLE RECORDS AND MULTIPLE LINKS IN WORLDCAT RECORDS FOR GOOGLE BOOKS ESPECIALLY CUMBERSOME. PUTTING ON CATALOGING HAT -- TITLE CHANGES NOTED IN HT RECORDS WHEREAS NOT IN GOOGLE BOOKS.
SNIPPET views are useful. Also, the fact that users can see keywords in context is beneficial.
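To put the 1940s counts above in proportion, the short sketch below converts them to percentages. The counts (385, 98/90, 181/4) come from the notes; the function and variable names are just illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


SAMPLE_1940S = 385  # documents in the 1940s portion of the sample

# (titles found, titles available as full text), as reported in the notes
COUNTS = {
    "HathiTrust": (98, 90),
    "Google": (181, 4),
}


def summarize(found: int, full_text: int, sample: int = SAMPLE_1940S) -> dict:
    """Share of the sample found, and share of found titles that are full text."""
    return {
        "found_pct": pct(found, sample),
        "full_text_pct_of_found": pct(full_text, found),
    }


if __name__ == "__main__":
    for source, (found, full_text) in COUNTS.items():
        print(source, summarize(found, full_text))
```

On these figures Google listed roughly 47% of the 1940s sample against about 25% for HathiTrust, but about 92% of the titles HathiTrust found were full text, compared with about 2% for Google.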
Everyone can view books, journals, and docs in the public domain and read them online. Single pages can be downloaded. In fact, anyone can download multiple pages from a public domain volume, just one page at a time. The one main difference for users from a HathiTrust partner library is that they can download entire volumes of public domain materials. The other difference is that users from HT partners can save sets of records. This can be done even if not from a partner library, although it is very difficult.
---------
NOW LET’S GO INTO HATHITRUST
CLICK ON ICON TO GO TO HT CATALOG
AU: United states
AU: Commerce
KW: Lumber
Problems of the softwood lumber industry : hearings before the Committee on Commerce, United States Senate, Eighty-seventh Congress, second session, on impact of lumber imports...by United States. Congress. Senate. Committee on Commerce. Published 1962
FULL TEXT SEARCH -- both public domain and copyright
Cuban
Castro
immigration
Collections:
Official gazette of the United States Patent Offic
British Foreign Office
NASA Technical reports
UNDER ABOUT
OUR RESEARCH CENTER. AUTHOR SEARCH* DIFFERENT THAN GOOGLE’S N-GRAM, ALSO SEARCHES IN COPYRIGHT.
HathiTrust was launched in 2008 by a 12-university consortium known as the Committee on Institutional Cooperation (CIC), along with the University of California system. It has grown to more than 60 partners, including Columbia, Princeton, Yale, Duke, and Johns Hopkins. Also Missou
Content includes Google Books, Internet Archive, and digital collections from partners.
hāthī (हाथी) (pronounced HAH-tee) is the Hindi word for elephant, an animal highly regarded for its memory, wisdom, and strength.
LOADING RECORDS INTO LOCAL ILS -- AVAILABLE TO ANY LIBRARY, REGARDLESS OF WHETHER PARTNER OR NOT
1. University of Michigan provides an OAI feed of MARC21 and Dublin Core records for public domain items. An OAI Toolkit is available to assist in harvesting records.
2. Tab-delimited files can be used to retrieve metadata.
3. Exporting records from WorldCat -- public domain materials not identified; would need to check items individually.
Non-partner libraries that have done this:
-- Ball State University
-- Kent State University (records available via OhioLINK??) -- end of 2009/beginning of 2010. Link to PowerPoint at end.
-- University of Colorado at Denver -- did this in May 2008 but deleted records in 2011 when it started using Summon. Article by Jeffrey Beall on this at end.
**HATHI DOES ASSIST WITH CREATING CUSTOMIZED DATA SETS.
Issues:
-- initial labor
-- continual maintenance
-- would need to regularly download new files
-- can your system handle it? Sluggishness
-- duplication
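For option 1, harvesting from an OAI feed generally follows the standard OAI-PMH pattern: request ListRecords, then keep following the resumptionToken until none is returned. The sketch below shows that loop under stated assumptions: the base URL is a placeholder (not the real HathiTrust endpoint), the "marc21" metadata prefix is assumed and should be checked against the feed's own ListMetadataFormats response, and the `fetch` argument is a hypothetical callable that returns the response body for a URL (so the page-handling logic can be tried without network access).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 XML namespace
BASE_URL = "https://example.org/oai"  # placeholder, NOT the real feed address


def list_records_url(base_url, metadata_prefix="marc21", resumption_token=None):
    """Build a ListRecords request URL.

    Per OAI-PMH, a request that carries a resumptionToken must not repeat
    the metadataPrefix argument.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iter(f"{OAI_NS}identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    has_token = token_el is not None and token_el.text and token_el.text.strip()
    return ids, (token_el.text.strip() if has_token else None)


def harvest(base_url, fetch, metadata_prefix="marc21"):
    """Collect identifiers from every page; fetch(url) must return XML text."""
    all_ids, token = [], None
    while True:
        url = list_records_url(base_url, metadata_prefix, resumption_token=token)
        ids, token = parse_page(fetch(url))
        all_ids.extend(ids)
        if not token:  # an empty or missing token marks the last page
            return all_ids
```

In practice `fetch` could wrap `urllib.request.urlopen`, and each page's MARC records would be written out for loading into the ILS rather than only collecting identifiers. Re-running the harvest on a schedule is what the "continual maintenance" item in the list of issues refers to.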
CURRENT CHALLENGES WITH HATHITRUST CATALOG -- CAN’T LIMIT TO GOV DOCS; CAN’T SEARCH BY SUDOC NUMBER. AND THE ONGOING ISSUE OF AGENCY NAME CHANGES -- ALL IMPEDIMENTS TO USERS FINDING GOV DOCS.
ANOTHER THORNY ISSUE IS INACCURATE COPYRIGHT STATUS. SOMETIMES GOV DOCS IN PUBLIC DOMAIN ARE NOTED AS “IN COPYRIGHT” AND THUS NOT AVAILABLE IN FULL VIEW. GOOGLE LOCKED DOWN ALL MATERIALS PUBLISHED AFTER 1923 REGARDLESS OF WHETHER GOV DOC OR NOT.