The HathiTrust Research Center:
An Overview of Advanced
Computational Services
April 18, 2015 | DPLAFest 2015 | Indianapolis, IN
Robert H. McDonald
Indiana University Libraries | Data To Insight Center
Indiana University
Tweet us - #HTRC #DPLAFest
HATHI TRUST RESEARCH CENTER
Many thanks …
HTRC IU Team
• Beth Plale
• Robert H. McDonald
• Dirk Herr-Hoyman
• Miao Chen
• Guangchen Ruan
• Zong Peng
• Milinda Pathirage
• Samitha Liyanage
• Leena Unnikrishnan
• Nicholae Cline
HTRC UIUC Team
• J. Stephen Downie
• Beth Namachchivaya
• Ryan Dubnicek
• Megan Senseney
• Sayan Bhattacharyya
• Colleen Fallaw
• Loretta Auvil
• Boris Capitanu
• Harriet Green
• Jacob Jett
• Dan Bassett
4/18/15 #HTRC @HathiTrust
HathiTrust Digital Library
• HathiTrust is a partnership of academic &
research institutions, offering a collection of
millions of titles digitized from libraries around
the world.
– IU is a founding member of the HathiTrust along
with University of Michigan, University of
California, and the University of Virginia.
http://www.hathitrust.org/htrc
http://www.hathitrust.org
HathiTrust “Wow” Numbers
• 13,284,163 total volumes
• 6,742,394 book titles
• 352,534 serial titles
• 4,649,457,050 pages
• 595 terabytes
• 157 miles
• 10,793 tons
• 4,979,599 volumes in the public domain
1. Michigan 4,712,752
2. California 3,612,596
3. Harvard 838,115
4. Wisconsin 561,094
5. Indiana 529,601
6. Cornell 510,286
7. Penn State 388,713
8. Illinois 329,136
9. NYPL 294,883
10. Princeton 252,837
11. Minnesota 193,124
12. Madrid 117,291
13. Library of Congress 108,892
14. Keio University 90,112
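A few ratios follow directly from the figures above. The sketch below is only illustrative arithmetic: the variable names are made up, and it assumes the per-institution numbers are volume counts measured against the 13,284,163 total, which the slide does not state explicitly. The page figure divides to exactly 350 pages per volume, which suggests an estimate rather than a page count; that reading is an inference.

```python
# Figures copied from the slides; all variable names are illustrative.
total_volumes = 13_284_163
total_pages = 4_649_457_050
public_domain_volumes = 4_979_599

# Assumption: these per-institution figures are volume counts.
contributions = {
    "Michigan": 4_712_752,
    "California": 3_612_596,
    "Harvard": 838_115,
    "Wisconsin": 561_094,
    "Indiana": 529_601,
    "Cornell": 510_286,
    "Penn State": 388_713,
    "Illinois": 329_136,
    "NYPL": 294_883,
    "Princeton": 252_837,
    "Minnesota": 193_124,
    "Madrid": 117_291,
    "Library of Congress": 108_892,
    "Keio University": 90_112,
}

# Share of all volumes that are in the public domain.
public_domain_share = public_domain_volumes / total_volumes

# Average pages per volume implied by the two totals.
pages_per_volume = total_pages / total_volumes

# Share of the total covered by the two largest and by all listed contributors.
top_two_share = (contributions["Michigan"] + contributions["California"]) / total_volumes
listed_share = sum(contributions.values()) / total_volumes

print(f"Public-domain share:         {public_domain_share:.1%}")
print(f"Pages per volume:            {pages_per_volume:.1f}")
print(f"Michigan + California share: {top_two_share:.1%}")
print(f"All 14 listed contributors:  {listed_share:.1%}")
```

Under these assumptions, roughly three in eight volumes are in the public domain, and the two largest contributors account for well over half of the collection.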
Goals for HTRC
• Provide a persistent and sustainable structure to
enable scholars to ask and answer new questions.
– Leverage data storage and computational infrastructure at Indiana
& Illinois
– Stimulate community development of new functionality and tools
– Use tools to enable discoveries that would not be possible
without the HTRC
• Enable scholars to make full use of HathiTrust Digital
Library content while preventing intellectual property
misuse, within the bounds of U.S. copyright law.
– Provide a secure computational and data environment for
scholars to perform research using HathiTrust Digital Library.
HathiTrust and HTRC
[Diagram labels: HathiTrust (University of Michigan); HathiTrust Research Center (University of Illinois, Indiana University); Non-Consumptive Research]
• Board of Governors
• Executive Committee
• Executive Director
Non-Consumptive Research Paradigm
• No action or set of actions on the part of users,
either acting alone or in cooperation with
other users, over the duration of one or multiple
sessions, can result in sufficient information being
gathered from a collection of copyrighted works
to reassemble pages from that collection.
• The definition disallows collusion between users
and accumulation of material over time.
It differentiates a human researcher from a proxy,
which is not a user. Users are human beings.
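One commonly cited way to work in this spirit is to release only derived data, such as term counts, instead of page text. The sketch below illustrates that idea and nothing more: it is not HTRC's implementation, the function names and sample pages are hypothetical, and discarding word order does not by itself prove that pages cannot be reassembled.

```python
import re
from collections import Counter


def page_term_counts(page_text: str) -> Counter:
    """Count lowercase alphabetic tokens on one page; word order is discarded."""
    tokens = re.findall(r"[a-z]+", page_text.lower())
    return Counter(tokens)


def workset_term_counts(pages) -> Counter:
    """Aggregate term counts across pages, so only totals leave the function."""
    total = Counter()
    for page in pages:
        total.update(page_term_counts(page))
    return total


# Hypothetical two-page sample standing in for restricted text.
sample_pages = [
    "The theory of the text is the text itself.",
    "A theory without text is only theory.",
]

counts = workset_term_counts(sample_pages)
print(counts.most_common(3))
```

The researcher sees aggregate frequencies for the whole set of pages, while the page text itself stays inside the function that computed them.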
HTRC Services
Working with HTRC Tools
Get started at: https://htrc2.pti.indiana.edu/
Build Worksets
Execute Algorithms
Visualize Term Frequency
http://sandbox.htrc.illinois.edu/bookworm/
Working with HTRC Staff
• Scholarly Commons: workshops, tutorials, and guidance for using HTRC
• Advanced Collaborative Support: one-on-one research support provided through a competitive awards process
• Advanced Research: collaborative research partnership with HTRC
SHARC (v 3.1)
Secure HathiTrust Analytics Research Commons
HTRC Personal Account Creation
https://htrc2.pti.indiana.edu
Registration
Data Capsule
UnCamp 2015
Research Engagements
ACS (Advanced Collaborative Support)
1. Tracing Technology Diffusion Over Time: Dr. Michelle Alexopoulos, a scholar in
economics from the University of Toronto.
2. Detecting Literary Plagiarisms: The Case of Oliver Goldsmith: Douglas Duhaime,
University of Notre Dame. Will develop tools for detecting plagiarism, using
machine learning techniques to identify Oliver Goldsmith's literary thefts.
3. Taxonomizing the Texts: Towards Cultural-Scale Models of Full Text: Colin Allen,
Jaimie Murdock, Indiana University Bloomington. Will carry out a cultural-scale
investigation and topic modeling of HT public-domain full text, using random
sampling to select collections according to the Library of Congress Subject
Headings (LCSH).
4. The Trace of Theory: Geoffrey Rockwell, Laura Mandell, Stefan Sinclair, Matthew
Wilkens, Susan Brown; University of Alberta, Texas A&M University, University of
Notre Dame. Aims to identify subsets of theoretical texts in the HT public corpus
and apply large-scale topic modeling to them. The researchers will develop
tools and computational methods for tracking the concept of "theory."
WCSA Funded Projects
1. Workset Creation through Image Analysis of
Document Pages - Texas A&M University (PI: Keith
Biggers)
2. Semantic Analysis of Documents from the HathiTrust
Corpus - Waikato University (PI: Annika Hinze)
3. Distributed Metadata Correction and Annotation -
Maryland Institute for Technology in the Humanities,
University of Maryland (PI: Trevor Muñoz)
4. ElEPHãT: Early English Print in HathiTrust, a Linked
Semantic Workset Prototype - Oxford University (PI:
Kevin Page)
HTRC Data Capsule for Secure Text-Mining at Scale
Funded at $606,000 by the Alfred P. Sloan Foundation; Beth
Plale, Indiana University, PI; Atul Prakash, University of Michigan,
Co-PI; Fall 2011 – Fall 2014.
Goal: Prototype a system that enables secure text mining to be
carried out at scale using public cloud resources, including:
1. a software cloud infrastructure based on OpenStack
2. mechanisms for managing a secure virtual machine
The Sloan Cloud will provide users with dedicated virtual
machines that are pre-configured with appropriate tools and
provide secure access to remote data that cannot be funneled
through the VM to outside filesystems.
NEH Bookworm+HTRC Project
http://sandbox.htrc.illinois.edu/bookworm/
Upcoming Events
HTRC Upcoming Events
1. Tutorial at JCDL 2015 – June 21, 2015 –
Knoxville, TN
– Topic Exploration with the HTRC Data Capsule for
Non-Consumptive Research
– http://www.jcdl2015.org/tutorials-workshops
2. HASTAC 2015 Post Conference Workshop – May
30, 2015 – East Lansing, MI
– Workshop on Text Mining with the HathiTrust
Research Center
– http://www.hastac2015.org/schedule/post-conference-workshops/

Contenu connexe

Tendances

Data management: The new frontier for libraries
Data management: The new frontier for librariesData management: The new frontier for libraries
Data management: The new frontier for librariesLEARN Project
 
From Open Data to Open Science, by Geoffrey Boulton
 From Open Data to Open Science, by Geoffrey Boulton From Open Data to Open Science, by Geoffrey Boulton
From Open Data to Open Science, by Geoffrey BoultonLEARN Project
 
Only as good as our sources
Only as good as our sourcesOnly as good as our sources
Only as good as our sourcesPru Mitchell
 
Research as infrastructure, Digital Humanities Congress, Sheffield 2012
Research as infrastructure, Digital Humanities Congress, Sheffield 2012Research as infrastructure, Digital Humanities Congress, Sheffield 2012
Research as infrastructure, Digital Humanities Congress, Sheffield 2012University of South Australlia
 
Authority files - Jisc Digital Festival 2014
Authority files - Jisc Digital Festival 2014Authority files - Jisc Digital Festival 2014
Authority files - Jisc Digital Festival 2014Jisc
 
2-6-14 ESI Supplemental Webinar: The Data Information Literacy Project
2-6-14 ESI Supplemental Webinar: The Data Information  Literacy Project2-6-14 ESI Supplemental Webinar: The Data Information  Literacy Project
2-6-14 ESI Supplemental Webinar: The Data Information Literacy ProjectDuraSpace
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?LEARN Project
 
Managing and sharing data
Managing and sharing dataManaging and sharing data
Managing and sharing dataSarah Jones
 
Laurent Romary #OAdata 7 May 2013
Laurent Romary #OAdata 7 May 2013Laurent Romary #OAdata 7 May 2013
Laurent Romary #OAdata 7 May 2013dri_ireland
 
APLIC 2012: Discovering & Dealing with Data
APLIC 2012: Discovering & Dealing with DataAPLIC 2012: Discovering & Dealing with Data
APLIC 2012: Discovering & Dealing with DataHamilton Public Library
 
How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...LEARN Project
 
Keystone summer school 2015 paolo-missier-provenance
Keystone summer school 2015 paolo-missier-provenanceKeystone summer school 2015 paolo-missier-provenance
Keystone summer school 2015 paolo-missier-provenancePaolo Missier
 
Putting Research Data into Context: A Scholarly Approach to Curating Data for...
Putting Research Data into Context: A Scholarly Approach to Curating Data for...Putting Research Data into Context: A Scholarly Approach to Curating Data for...
Putting Research Data into Context: A Scholarly Approach to Curating Data for...OCLC
 
Open science, open data - FOSTER training, Potsdam
Open science, open data - FOSTER training, PotsdamOpen science, open data - FOSTER training, Potsdam
Open science, open data - FOSTER training, PotsdamPlatforma Otwartej Nauki
 

Tendances (20)

Arlitsch may4-3
Arlitsch may4-3Arlitsch may4-3
Arlitsch may4-3
 
Data management: The new frontier for libraries
Data management: The new frontier for librariesData management: The new frontier for libraries
Data management: The new frontier for libraries
 
From Open Data to Open Science, by Geoffrey Boulton
 From Open Data to Open Science, by Geoffrey Boulton From Open Data to Open Science, by Geoffrey Boulton
From Open Data to Open Science, by Geoffrey Boulton
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
March 18 NISO Two Part Webinar: Is Granularity the Next Discovery Frontier? P...
March 18 NISO Two Part Webinar: Is Granularity the Next Discovery Frontier? P...March 18 NISO Two Part Webinar: Is Granularity the Next Discovery Frontier? P...
March 18 NISO Two Part Webinar: Is Granularity the Next Discovery Frontier? P...
 
Introduction to open-data
Introduction to open-dataIntroduction to open-data
Introduction to open-data
 
Only as good as our sources
Only as good as our sourcesOnly as good as our sources
Only as good as our sources
 
Research as infrastructure, Digital Humanities Congress, Sheffield 2012
Research as infrastructure, Digital Humanities Congress, Sheffield 2012Research as infrastructure, Digital Humanities Congress, Sheffield 2012
Research as infrastructure, Digital Humanities Congress, Sheffield 2012
 
Authority files - Jisc Digital Festival 2014
Authority files - Jisc Digital Festival 2014Authority files - Jisc Digital Festival 2014
Authority files - Jisc Digital Festival 2014
 
2-6-14 ESI Supplemental Webinar: The Data Information Literacy Project
2-6-14 ESI Supplemental Webinar: The Data Information  Literacy Project2-6-14 ESI Supplemental Webinar: The Data Information  Literacy Project
2-6-14 ESI Supplemental Webinar: The Data Information Literacy Project
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?
 
Managing and sharing data
Managing and sharing dataManaging and sharing data
Managing and sharing data
 
Data 101: A Gentle Introduction
Data 101: A Gentle IntroductionData 101: A Gentle Introduction
Data 101: A Gentle Introduction
 
Laurent Romary #OAdata 7 May 2013
Laurent Romary #OAdata 7 May 2013Laurent Romary #OAdata 7 May 2013
Laurent Romary #OAdata 7 May 2013
 
APLIC 2012: Discovering & Dealing with Data
APLIC 2012: Discovering & Dealing with DataAPLIC 2012: Discovering & Dealing with Data
APLIC 2012: Discovering & Dealing with Data
 
How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...
 
Keystone summer school 2015 paolo-missier-provenance
Keystone summer school 2015 paolo-missier-provenanceKeystone summer school 2015 paolo-missier-provenance
Keystone summer school 2015 paolo-missier-provenance
 
Putting Research Data into Context: A Scholarly Approach to Curating Data for...
Putting Research Data into Context: A Scholarly Approach to Curating Data for...Putting Research Data into Context: A Scholarly Approach to Curating Data for...
Putting Research Data into Context: A Scholarly Approach to Curating Data for...
 
Open science, open data - FOSTER training, Potsdam
Open science, open data - FOSTER training, PotsdamOpen science, open data - FOSTER training, Potsdam
Open science, open data - FOSTER training, Potsdam
 
Data Management for Librarians
Data Management for LibrariansData Management for Librarians
Data Management for Librarians
 

Similaire à The HathiTrust Research Center: An Overview of Advanced Computational Services

The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkThe HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkRobert H. McDonald
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesRobert H. McDonald
 
Building a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital LibraryBuilding a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital LibraryRobert H. McDonald
 
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...Robert H. McDonald
 
The HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoThe HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoRobert H. McDonald
 
Helping Faculty Help Themselves: Open Access and Data Management Consulting A...
Helping Faculty Help Themselves: Open Access and Data Management Consulting A...Helping Faculty Help Themselves: Open Access and Data Management Consulting A...
Helping Faculty Help Themselves: Open Access and Data Management Consulting A...Spencer Keralis
 
Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014Beth Plale
 
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...PrattSILS
 
Building Capacities and Communities for Digital Scholarship: The "Digging Dee...
Building Capacities and Communities for Digital Scholarship: The "Digging Dee...Building Capacities and Communities for Digital Scholarship: The "Digging Dee...
Building Capacities and Communities for Digital Scholarship: The "Digging Dee...Harriett Green
 
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...Allen Press
 
Next generation data services at the Marriott Library
Next generation data services at the Marriott LibraryNext generation data services at the Marriott Library
Next generation data services at the Marriott LibraryRebekah Cummings
 
Open Data as OER for Transversal Skills - WOERC 2017
Open Data as OER for Transversal Skills - WOERC 2017Open Data as OER for Transversal Skills - WOERC 2017
Open Data as OER for Transversal Skills - WOERC 2017Leo Havemann
 
The Power of Open Data!
The Power of Open Data!The Power of Open Data!
The Power of Open Data!Renaine Julian
 
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsCase Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsBeth Plale
 
Bridging Digital Humanities Research and Big Data Repositories of Digital Text
Bridging Digital Humanities Research and Big Data Repositories of Digital TextBridging Digital Humanities Research and Big Data Repositories of Digital Text
Bridging Digital Humanities Research and Big Data Repositories of Digital TextBeth Plale
 
Curating the Scholarly Record: Data Management and Research Libraries
Curating the Scholarly Record: Data Management and Research LibrariesCurating the Scholarly Record: Data Management and Research Libraries
Curating the Scholarly Record: Data Management and Research LibrariesKeith Webster
 
HathiTrust Reserach Center Nov2013
HathiTrust Reserach Center Nov2013HathiTrust Reserach Center Nov2013
HathiTrust Reserach Center Nov2013Beth Plale
 
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...African Open Science Platform
 
Research Data Census
Research Data CensusResearch Data Census
Research Data CensusJerry Sheehan
 

Similaire à The HathiTrust Research Center: An Overview of Advanced Computational Services (20)

The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkThe HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening Slides
 
Building a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital LibraryBuilding a Public Research Center for the HathiTrust Digital Library
Building a Public Research Center for the HathiTrust Digital Library
 
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
 
The HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoThe HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and Demo
 
Helping Faculty Help Themselves: Open Access and Data Management Consulting A...
Helping Faculty Help Themselves: Open Access and Data Management Consulting A...Helping Faculty Help Themselves: Open Access and Data Management Consulting A...
Helping Faculty Help Themselves: Open Access and Data Management Consulting A...
 
Ps rwebinar january2019final
Ps rwebinar january2019finalPs rwebinar january2019final
Ps rwebinar january2019final
 
Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014
 
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall...
 
Building Capacities and Communities for Digital Scholarship: The "Digging Dee...
Building Capacities and Communities for Digital Scholarship: The "Digging Dee...Building Capacities and Communities for Digital Scholarship: The "Digging Dee...
Building Capacities and Communities for Digital Scholarship: The "Digging Dee...
 
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...
Biodiversity—A Healthy Ecosystem Thrives on Fresh Ideas (Part 1 of 3), Phil J...
 
Next generation data services at the Marriott Library
Next generation data services at the Marriott LibraryNext generation data services at the Marriott Library
Next generation data services at the Marriott Library
 
Open Data as OER for Transversal Skills - WOERC 2017
Open Data as OER for Transversal Skills - WOERC 2017Open Data as OER for Transversal Skills - WOERC 2017
Open Data as OER for Transversal Skills - WOERC 2017
 
The Power of Open Data!
The Power of Open Data!The Power of Open Data!
The Power of Open Data!
 
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsCase Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
 
Bridging Digital Humanities Research and Big Data Repositories of Digital Text
Bridging Digital Humanities Research and Big Data Repositories of Digital TextBridging Digital Humanities Research and Big Data Repositories of Digital Text
Bridging Digital Humanities Research and Big Data Repositories of Digital Text
 
Curating the Scholarly Record: Data Management and Research Libraries
Curating the Scholarly Record: Data Management and Research LibrariesCurating the Scholarly Record: Data Management and Research Libraries
Curating the Scholarly Record: Data Management and Research Libraries
 
HathiTrust Reserach Center Nov2013
HathiTrust Reserach Center Nov2013HathiTrust Reserach Center Nov2013
HathiTrust Reserach Center Nov2013
 
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
 
Research Data Census
Research Data CensusResearch Data Census
Research Data Census
 

Plus de Robert H. McDonald

ER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations PanelER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations PanelRobert H. McDonald
 
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Robert H. McDonald
 
TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15Robert H. McDonald
 
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterRobert H. McDonald
 
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Robert H. McDonald
 
ER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote SlidesER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote SlidesRobert H. McDonald
 
HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14Robert H. McDonald
 
Owning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsOwning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsRobert H. McDonald
 
Kuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for LibrariesKuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for LibrariesRobert H. McDonald
 
Charleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to CloudCharleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to CloudRobert H. McDonald
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science Robert H. McDonald
 
New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...Robert H. McDonald
 
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...Robert H. McDonald
 
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...Robert H. McDonald
 
HathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast VersionHathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast VersionRobert H. McDonald
 
Building a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceRobert H. McDonald
 
Panel Session: VIVO and the data culture of universities-VIVO@IU
Panel Session: VIVO and the data culture of universities-VIVO@IUPanel Session: VIVO and the data culture of universities-VIVO@IU
Panel Session: VIVO and the data culture of universities-VIVO@IURobert H. McDonald
 

Plus de Robert H. McDonald (20)

ER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations PanelER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations Panel
 
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
 
TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15
 
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
 
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
The HathiTrust Research Center: An Overview of Advanced Computational Services

  • 1. The HathiTrust Research Center: An Overview of Advanced Computational Services April 18, 2015 | DPLAFest 2015 | Indianapolis, IN Robert H. McDonald Indiana University Libraries | Data To Insight Center Indiana University Tweet us - #HTRC #DPLAFest HATHI TRUST RESEARCH CENTER
  • 2. Many thanks … HTRC IU Team • Beth Plale • Robert H. McDonald • Dirk Herr-Hoyman • Miao Chen • Guangchen Ruan • Zong Peng • Milinda Pathirage • Samitha Liyanage • Leena Unnikrishnan • Nicholae Cline HTRC UIUC Team • J. Stephen Downie • Beth Namachchivaya • Ryan Dubnicek • Megan Senseney • Sayan Bhattacharyya • Colleen Fallaw • Loretta Auvil • Boris Capitanu • Harriet Green • Jacob Jett • Dan Bassett
  • 3. 4/18/15 #HTRC @HathiTrust HathiTrust Digital Library • HathiTrust is a partnership of academic & research institutions, offering a collection of millions of titles digitized from libraries around the world. – IU is a founding member of the HathiTrust along with University of Michigan, University of California, and the University of Virginia. http://www.hathitrust.org/htrc http://www.hathitrust.org
  • 4. HathiTrust “Wow” Numbers • 13,284,163 total volumes • 6,742,394 book titles • 352,534 serial titles • 4,649,457,050 pages • 595 terabytes • 157 miles • 10,793 tons • 4,979,599 volumes in the public domain
  • 5. [Chart, axis 0% to 100%] 1. Michigan 4,712,752; 2. California 3,612,596; 3. Harvard 838,115; 4. Wisconsin 561,094; 5. Indiana 529,601; 6. Cornell 510,286; 7. Penn State 388,713; 8. Illinois 329,136; 9. NYPL 294,883; 10. Princeton 252,837; 11. Minnesota 193,124; 12. Madrid 117,291; 13. Library of Congress 108,892; 14. Keio University 90,112
  • 6. Goals for HTRC • Provide a persistent and sustainable structure to enable scholars to ask and answer new questions. – Leverage data storage and computational infrastructure at Indiana & Illinois – Stimulate community development of new functionality and tools – Use tools to enable discoveries that would not be possible without the HTRC • Enable scholars to fully utilize content of HathiTrust Library while preventing intellectual property misuse within U.S. copyright law. – Provide a secure computational and data environment for scholars to perform research using HathiTrust Digital Library.
  • 7. HathiTrust and HTRC HathiTrust University of Illinois Indiana University HathiTrust Research Center University of Michigan • Board of Governors • Executive Committee • Executive Director
  • 9. Non-Consumptive Research Paradigm • No action or set of actions on the part of users, either acting alone or in cooperation with other users over the duration of one or multiple sessions, can result in sufficient information being gathered from a collection of copyrighted works to reassemble pages from the collection. • Definition disallows collusion between users, or accumulation of material over time. Differentiates human researcher from proxy which is not a user. Users are human beings.
  • 11. Working with HTRC Tools Get started at: https://htrc2.pti.indiana.edu/ Build Worksets Execute Algorithms Visualize Term Frequency http://sandbox.htrc.illinois.edu/bookworm/
  • 12. Working with HTRC Staff Advanced Collaborative Support Scholarly Commons Advanced Research Workshops, tutorials, and guidance for using HTRC One-on-one research support provided through a competitive awards process Collaborative research partnership with HTRC
  • 13. SHARC (v 3.1) Secure HathiTrust Analytics Research Commons
  • 14. HTRC Personal Account Creation https://htrc2.pti.indiana.edu
  • 19. ACS (Advanced Collaborative Support) 1. Tracing Technology Diffusion Over Time: Dr. Michelle Alexopoulos, a scholar in economics from the University of Toronto. 2. Detecting Literary Plagiarisms: The Case of Oliver Goldsmith: Douglas Duhaime, University of Notre Dame. Will work on developing tools for detecting plagiarisms, focusing on the case of Oliver Goldsmith and using machine learning techniques to detect Goldsmith's literary thefts. 3. Taxonomizing the Texts: Towards Cultural-Scale Models of Full Text: Colin Allen, Jaimie Murdock, Indiana University Bloomington. Allen and Murdock will carry out a cultural-scale investigation and topic modeling on HT public-domain full text through random sampling to select collections according to the Library of Congress Subject Headings (LCSH). 4. The Trace of Theory: Geoffrey Rockwell, Laura Mandell, Stefan Sinclair, Matthew Wilkens, Susan Brown. University of Alberta, Texas A&M University, University of Notre Dame. Aim to extract theoretical subsets from the HT public corpus and apply large-scale topic modeling on the subsets. The researchers will develop tools and computational methods for tracking the concept of “theory.”
  • 20. WCSA Funded Projects 1. Workset Creation through Image Analysis of Document Pages - Texas A&M University (PI: Keith Biggers) 2. Semantic Analysis of Documents from the HathiTrust Corpus - Waikato University (PI: Annike Hinze) 3. Distributed Metadata Correction and Annotation - Maryland Institute for Technology in the Humanities, University of Maryland (PI: Trevor Muñoz) 4. ElEPHãT: Early English Print in HathiTrust, a Linked Semantic Workset Prototype - Oxford University (PI: Kevin Page)
  • 21. HTRC Data Capsule for Secure Text-Mining at Scale Funded at $606,000 by The Alfred P. Sloan Foundation; Beth Plale, Indiana University, PI; Atul Prakash, University of Michigan, Co-PI; Fall 2011 – Fall 2014. Goal: Prototype a system that enables secure text mining to be carried out at scale using public cloud resources, including: 1. a software cloud infrastructure based on OpenStack 2. mechanisms for managing a secure virtual machine. The Sloan Cloud will provide users with dedicated virtual machines that are pre-configured with appropriate tools and provide secure access to remote data that cannot be funneled through the VM to outside filesystems.
  • 22. NEH Bookworm+HTRC Project http://sandbox.htrc.illinois.edu/bookworm/
  • 24. HTRC Upcoming Events 1. Tutorial at JCDL 2015 – June 21, 2015 – Knoxville, TN – Topic Exploration with the HTRC Data Capsule for Non-Consumptive Research – http://www.jcdl2015.org/tutorials-workshops 2. HASTAC 2015 Post Conference Workshop – May 30, 2015 – East Lansing, MI – Workshop on Text Mining with the HathiTrust Research Center – http://www.hastac2015.org/schedule/post-conference-workshops/
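
The non-consumptive paradigm on slide 9 and the "Visualize Term Frequency" step on slide 11 can be illustrated with a small sketch. This is not HTRC's implementation: the helper name `page_term_counts`, the tokenization rule, and the sample sentence are all assumptions made for illustration. The idea shown is only that a page can be reduced to unordered token counts, which support frequency analysis while discarding the word order of the page.

```python
import re
from collections import Counter


def page_term_counts(page_text: str) -> Counter:
    """Hypothetical helper: reduce one page to lowercase token counts.

    Only the counts are returned; the order of words on the page is discarded.
    """
    # Assumed tokenization: runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", page_text.lower())
    return Counter(tokens)


# Made-up sample text, not drawn from the HathiTrust corpus.
sample_page = "The whale, the white whale! The sea."
counts = page_term_counts(sample_page)

# Show the two most frequent terms on the sample page.
print(counts.most_common(2))
```

A researcher working this way would receive per-page counts such as `counts["whale"]` rather than the page text itself.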

Editor's notes

  1. Data Capsule, Enhanced Feature Extraction, Alpha Version of Bookworm