Building a Public Research Center for the HathiTrust Digital Library
JCDL 2011: Big Data! Big Deal? Panel, June 14, 2011
Robert H. McDonald, Associate Dean for Library Technologies and Digital Libraries; Associate Director, Data to Insight Center, Pervasive Technology Institute, Indiana University
@hathitresearch | @hathitrust | http://www.hathitrust-research.org

HathiTrust Research Center (HTRC) Team
  • Indiana University: Beth Plale (Director), Robert McDonald (Executive Committee)
  • University of Illinois: Scott Poole (Co-Director), John Unsworth (Executive Committee)

HathiTrust Digital Library History
  • To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.
  • Launched in October 2008 (University of Michigan, Indiana University)
  • Used the Google Books repository at Michigan as a model
  • Expanded to include content from CIC member libraries, UC system libraries, and the University of Virginia
  • Now includes more than 50 partner institutions and more than 8 million volumes

Towards a HathiTrust Research Center
  • Started in response to the proposed Google Settlement, June 2009
  • Worked to identify key stakeholders from HT institutions to collaborate and write the RFP
  • Google Settlement in early 2011 did not stop the center
  • Developed a specific RFP for HathiTrust to solicit proposals, Summer/Fall 2009
  • HTRC RFP Working Group
  • RFP released, Winter 2010

Our Collaboration
HTRC is founded as a joint venture between Indiana University and the University of Illinois at Urbana-Champaign, aimed at solving the difficult challenges of increasing computational access to the public-domain and copyrighted material in HathiTrust.

Our Mission
  • Phase I: starting April 2011 and running for 18 months
  • Phase II: starting Fall 2012 and going for …
  • Goal: enable strong computational research and education on a collection that has never before been amenable to computational exploration.

Our Goals
  • Maintain a repository of text mining algorithms and retrieval tools, available online for human and programmatic discovery. Also register derived data sets, indexes, and versions in a registry repository.
  • Be a user-driven resource, with an active advisory board and a community model that allows users to share algorithms and tools.
  • Support interoperability across collections and institutions through use of InCommon SAML identity.

Our Future
  • Support innovation in cyberinfrastructure to deliver optimal access to and use of the HathiTrust corpus.
  • Implement “non-consumptive” research: a technical and intellectual challenge.
  • Identify and host existing data analysis, text mining, and retrieval tools that are of interest to the community.
  • Stimulate development of new analytical methods and tools. We hope that the scale of the HTRC will promote new levels of collaboration in tool development.

HathiTrust Research Center Today
HTRC is dedicated to providing computational research access to a comprehensive body of published works for scholarship and education.
  • Lightweight organization
  • Executive Committee: Beth Plale (Indiana), Scott Poole (Illinois), Robert H. McDonald (Indiana), John Unsworth (Illinois)
  • Advisory Board: TBD
  • HathiTrust Executive Committee Liaison: Laine Farley (California Digital Library)

HathiTrust Research Center Today (continued)
  • $250K in funding for the initial 18-month startup
  • Creating themed collections for early use cases: Astronomy, Victorian Literature, Influenza
  • Ingest and replication mechanisms between HT and HTRC
  • Full-text Solr indexes
  • Data Capsule integration
  • Karma integration
  • Integration with SEASR/Meandre SOA services at NCSA
  • Alignment with Bamboo Technology Project
  • Alignment with international Google Books Research Centers
  • Establishing long-term non-consumptive research methodologies

HTRC Proposed Technical Architecture
[Figure courtesy of IU Data to Insight Center, Beth Plale/Yiming Sun]

Current SEASR Integration Demo
[Figure courtesy of IU Data to Insight Center, Felix Terkhorn/Yiming Sun]
Tag Cloud Viewer data flow:
  1. User enters author name or volume title
  2. Query RIS for author name or volume title
  3. Volume ID
  4. Invoke Tag Cloud service with URL
  5. Use URL to retrieve volume
  6. OCR for volume
  7. Tag cloud returned to user
Labels in the figure:
  • Sample Collection Bibliography Database
  • JS/PHP Auto-completer
  • Book Search Interface by Author or Title
  • Converted from MARC to RIS
  • Public-domain OCR Web Access Servlet: a persistent RESTful web service
  • Sample Public Domain Collection
  • Meandre Workbench
  • Organized as pairtree for demo only
  • SEASR Infrastructure
  • Administrator creates tag cloud viewer in advance through SEASR

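The seven-step data flow on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the demo's code: the in-memory BIBLIOGRAPHY and OCR_STORE, the volume identifiers, the placeholder texts, the stopword list, and every function name are hypothetical stand-ins for the bibliography database, the OCR web service, and the SEASR tag cloud service. The pairtree_path helper applies only the single-character mapping and two-character splitting described in the Pairtree specification and leaves out its hex-encoding step for rarely used characters.

```python
import re
from collections import Counter

# Hypothetical stand-in for the sample bibliography database (steps 1-3).
BIBLIOGRAPHY = [
    {"volume_id": "demo.0001", "author": "Example, Ada", "title": "Notes on Comets"},
    {"volume_id": "demo.0002", "author": "Sample, Ben", "title": "A Village Story"},
]

# Hypothetical stand-in for the public-domain OCR store (steps 5-6).
OCR_STORE = {
    "demo.0001": "The comet returned. The comet was bright, and the orbit "
                 "of the comet was long. An orbit is an ellipse.",
    "demo.0002": "The village was quiet.",
}

STOPWORDS = {"the", "and", "of", "was", "an", "is", "a"}


def find_volume_ids(query):
    """Steps 1-3: case-insensitive author/title search returning volume IDs."""
    q = query.lower()
    return [
        rec["volume_id"]
        for rec in BIBLIOGRAPHY
        if q in rec["author"].lower() or q in rec["title"].lower()
    ]


def pairtree_path(identifier):
    """Map an identifier to a pairtree directory path (simplified: no hex-encoding)."""
    cleaned = identifier.translate(str.maketrans({"/": "=", ":": "+", ".": ","}))
    return "/".join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


def fetch_ocr(volume_id):
    """Steps 5-6: return OCR text; a real service would fetch it by URL."""
    return OCR_STORE[volume_id]


def tag_cloud(text, top_n=3):
    """Step 7: the most frequent non-stopword terms with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    for vid in find_volume_ids("comets"):
        print(vid, pairtree_path(vid), tag_cloud(fetch_ocr(vid), top_n=2))
```

A tag cloud viewer would scale each term's display size by its count; the sketch stops at the counts themselves.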
Non-Consumptive Research Track
No action or set of actions on the part of HathiTrust Research Center users, either acting alone or in cooperation with other users over the duration of one or multiple sessions, can result in sufficient information gathered from the HathiTrust collection to reassemble pages from the collection.
Beth Plale (Indiana University), Atul Prakash (University of Michigan), Geoffrey Fox (Indiana University), Robert H. McDonald (Indiana University)

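One way to make the non-consumptive requirement concrete is an output check that refuses to release any result containing a contiguous run of source words. The sketch below is purely illustrative and is not described in the slides: the function names and the five-word threshold are assumptions, and a check on a single result does not by itself cover the "multiple sessions" or "cooperation with other users" parts of the definition, which would need tracking across releases.

```python
def shingles(tokens, n):
    """Return the set of all contiguous n-word sequences in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def leaks_source_text(output, source, n=5):
    """True if the output shares any run of n consecutive words with the source."""
    out_tokens = output.lower().split()
    src_tokens = source.lower().split()
    return bool(shingles(out_tokens, n) & shingles(src_tokens, n))


if __name__ == "__main__":
    # Hypothetical page text and two candidate results.
    page = "the comet returned after many years and the orbit was long"
    aggregate = "comet: 1, orbit: 1, years: 1"
    excerpt = "returned after many years and the orbit"
    print(leaks_source_text(aggregate, page))  # aggregate counts only
    print(leaks_source_text(excerpt, page))    # verbatim run of source words
```

Aggregate results such as term counts pass this check, while a verbatim excerpt does not, which matches the intent that derived data may leave the system but reassemblable page text may not.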
HTRC Managed Data-Intensive Compute Resources
  • HathiTrust Digital Library content
  • Access to HT copyrighted indices


HathiTrust Research Center Events
  • HTRC kickoff event at the Digital Humanities Conference 2011, Stanford University, June 20, 2011
  • Working on models for collaborative research
  • AHRC/ESRC/IMLS/JISC/NEH/NSF/NWO/SSHRC Digging into Data Round 2: http://www.diggingintodata.org/
  • Working on early advanced user case studies for the HathiTrust corpus

Support and Acknowledgements
  • IU UITS Research Technologies
  • National Center for Supercomputing Applications
  • IU Data to Insight Center
  • iCHASS
  • Illinois Informatics Institute
  • Lilly Endowment, Inc.
  • The Alfred P. Sloan Foundation

For More on HathiTrust Research Center
  • See http://www.hathitrust-research.org
  • Follow us @hathitresearch on Twitter
  • Robert H. McDonald: @mcdonald on Twitter, robert@indiana.edu

Editor's Notes

  1. State core team names. Talk about the partnership between IU and UIUC.
  2. Basic History of HathiTrust Digital Library – Digital Public Library of America - LAC