Topic Exploration with the HTRC Data Capsule for Non-Consumptive Research
Joint Conference on Digital Libraries 2015 | Knoxville, TN | 06.21.15
Robert H. McDonald | Jiaan Zeng - Data To Insight Center
Jaimie Murdock – InPho Project
Indiana University
Tweet us - @HathiTrust #HTRC
HATHI TRUST RESEARCH CENTER
Tweet us - @InPhoproject
#HTRC @HathiTrust
Tutorial Agenda
• 9:00-9:15 - An overview of the HTRC (Robert McDonald)
• 9:15-9:30 - HTRC Data Capsule Intro (Jiaan Zeng)
• 9:30-9:45 - Intro to Topic Models and the InPho Explorer (Jaimie Murdock)
• 9:45-10:30 - Hands-On Parts 1 & 2
• 10:30-10:45 - Break
• 10:45-11:30 - Hands-On Parts 3 & 4
• 11:30-11:45 - Advanced Notebooks (Jaimie Murdock)
• 11:45-12:00 - HTRC Advanced Collaborative Support (Robert McDonald)
HTRC@Events
• HTRC UnCamp 2015 – March 30-31, 2015, Ann Arbor, MI
• Stephen Downie keynote at JCDL 2015
• Digital Humanities 2015 – June 29-July 3, 2015, Sydney, Australia
• Linguistic Society of America (LSA) Biennial Linguistic Institute – July 13, 2015, Chicago, IL
• HILT 2015 – July 28-29, 2015, Indianapolis, IN
Many thanks …
HTRC IU Team
• Beth Plale (PI)
• Robert H. McDonald
• Miao Chen
• Guangchen Ruan
• Zong Peng
• Milinda Pathirage
• Samitha Liyanage
• Jiaan Zeng
• Leena Unnikrishnan
• Nicholae Cline
HTRC UIUC Team
• J. Stephen Downie (PI)
• Beth Namachchivaya
• Megan Senseney
• Sayan Bhattacharyya
• Loretta Auvil
• Boris Capitanu
• Harriet Green
• Eleanor Dickson
Outline
• What is the HTRC?
• Non-Consumptive Research Paradigm
• Current Architecture
• Future Architecture
• Advanced Collaborative Support (RFP)
HathiTrust Digital Library
• HathiTrust is a partnership of 90+ academic & research institutions, offering a collection of millions of digitized titles.
• http://hathitrust.org
– IU is a founding member of the HathiTrust along with University of Michigan, University of California, and the University of Virginia
HathiTrust Research Center
Mission
• Public research arm of HathiTrust
• Goal: enable researchers world-wide to accomplish tera-scale text data-mining and analysis
– Develop cutting-edge software tools for processing and analyzing text
– Develop cyberinfrastructure to enable HPC access to the HathiTrust Digital Library
• Established: July 2011
• Collaborative center: Indiana University & University of Illinois
HTRC Timeline
• Phase I: development, 01 Jul 2011 – 31 Mar 2013
– HTRC software and services release v1.0: https://github.com/htrc
• Phase II: outreach, 01 Apr 2013 – 30 June 2014
– 2nd HTRC UnCamp, Sep ’13
• Phase III: operations, 01 July 2014 – present (2014-2018)
HTRC Current Users (ca 2014)
• 194 existing user accounts
• Lots of user accounts; good starting point.
Improve:
• Increase amount of real work being accomplished as measured by usage on HTRC’s compute resources Quarry and Big Red II at IU
• Develop educational uses
• Develop informatics uses
• Decrease number of observers to 10%
Projected Use 2019
• Digital Humanities (60)
• Education (60)
• Informatics (60)
• Observers (20)
• Project 200 users at any one time, of which 90% are doing relevant education/scholarship
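The projected figures on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary name is hypothetical, and assigning the four chart segments to "Projected Use 2019" is an assumption read off the slide, not an HTRC data structure.

```python
# Projected 2019 user mix as read from the slide's chart.
# Assumption: the four segments belong to the "Projected Use 2019" panel.
projected_users = {
    "Digital Humanities": 60,
    "Education": 60,
    "Informatics": 60,
    "Observers": 20,
}

total = sum(projected_users.values())
observer_share = projected_users["Observers"] / total
active_share = 1 - observer_share

print(f"total users: {total}")
print(f"observers: {observer_share:.0%}")
print(f"doing education/scholarship: {active_share:.0%}")
```

Under that reading, the segment counts agree with the slide's stated targets of 200 users and a 10% observer share.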
HTRC Current Users (ca Now)
Non-Consumptive Research Paradigm
• No action or set of actions on the part of users, either acting alone or in cooperation with other users over the duration of one or multiple sessions, can result in sufficient information being gathered from a collection of copyrighted works to reassemble pages from the collection.
• The definition disallows collusion between users and accumulation of material over time. It differentiates a human researcher from a proxy, which is not a user. Users are human beings.
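One way to picture the paradigm is an analysis that releases only derived features, never readable page text. The sketch below is a hypothetical illustration, not the HTRC or Data Capsule implementation: the function names and sample pages are invented here, and releasing aggregate counts does not by itself guarantee compliance with the definition above.

```python
from collections import Counter
import re


def page_features(page_text: str) -> Counter:
    """Reduce one page to unordered, lowercased token counts (hypothetical helper)."""
    tokens = re.findall(r"[a-z]+", page_text.lower())
    return Counter(tokens)


def releasable_result(pages: list[str]) -> dict[str, int]:
    """Aggregate counts across pages; word order and page boundaries are dropped."""
    totals: Counter = Counter()
    for text in pages:
        totals.update(page_features(text))
    return dict(totals)


# Invented sample pages standing in for protected text.
pages = ["The whale surfaced.", "The crew saw the whale."]
print(releasable_result(pages))
```

The result carries enough signal for word-frequency or topic-model style analysis, while the original sentences cannot be read back from it directly.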
[Figure: HTRC architecture diagram. Labels: Request; Complexity hiding interface; All the complexity; outputs: Tabular info, Statistical plots, Spatial plots]
HTRC Version 2.0
HTRC Goals
• Provide a persistent and sustainable structure to enable original and cutting-edge research.
– Leverage data storage and computational infrastructure at Indiana & Illinois
– Stimulate community development of new functionality and tools
– Use tools to enable discoveries that would not be possible without the HTRC
• Enable scholars to fully utilize content of HathiTrust Library while preventing intellectual property misuse within U.S. copyright law.
– Provision secure computational and data environment for scholars to perform research using HathiTrust Digital Library.
HTRC Organization 2014-18
• HTRC Executive Mgmt
• Administrative Support
• Core Development
• Advanced Research
• Advanced Collaborative Support
• Scholarly Commons
HTRC Data Capsule
HTRC Data Capsule@IU Team
• Beth Plale (PI)
• Jiaan Zeng
• Guangchen Ruan
HTRC Data Capsule@Michigan Team
• Atul Prakash (PI)
• Alexander Crowell
Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non-consumptive use of texts. In Proceedings of the 5th ACM Workshop on Scientific Cloud Computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI=10.1145/2608029.2608031 http://doi.acm.org/10.1145/2608029.2608031
Special Thanks to
• Samitha Liyanage
• Milinda Pathirage
• Zong Peng
• Earlence Fernandes
• Ajit Aluri
HTRC Data Capsule Workflow
Data Capsule Screenshots
Maintenance Mode
Secure Mode
HTRC Advanced Collaborative Support
• ACS will be offered on a rolling basis over the next four years (2014-18)
• 1st RFP Call Deadline was Jan 8, 2015, 5:00 pm Eastern
– RFP - http://www.hathitrust.org/htrc/acs-rfp
• For more info on Advanced Collaborative Support, please contact: htrc.acs.awards@gmail.com
Scholarly Commons
User Support Service
• Develop training materials
• Educational workshops
• Tool and workset creation
• Collaborate with librarians and DH centers at HT institutions
• Assist researchers in HTRC text data mining research projects
• Led out of University of Illinois Library; smaller group at IU
• Resourced at 2.7 FTE.
Administrative Support
• Senior Library Personnel (4 supervisors at .05 FTE)
• Senior Project Coordinator (.25 FTE)
• Executive Assistant (.5 FTE)
Core Development
• Sr. Software Architect (1.0 FTE)
• Research Programmer (.5 FTE)
• Library Research Programmer (.5 FTE)
• IU Systems Administrator (.25 FTE)
• User Interface Specialist (2 years at 1.0 FTE)
• Informatics Developers (2 developers for 2 years at .15 FTE)
Advanced Research
• CS PhD Students
• LIS PhD Students
• UI Systems Administrator (.5 FTE)
Advanced Collaborative Support (coordinated by M. Chen)
• Research Programmer (.5 FTE)
• Computational Research Liaison (.5 FTE)
• Asst Dir Outreach & Education (M. Chen) (1 year at .25 FTE)
Scholarly Commons
• Dig Humanities Specialist (1.0 FTE)
• CLIR Postdoctoral Research Associate (2 years at 1.0 FTE)
• Digital Research Librarian support (.2 FTE)
• Scholars Commons Support (.5 FTE)
• LIS MS Students (.25 FTE) (.11 FTE)
Key: Area; Proposed for funding by HathiTrust
HTRC Future Work
• Copyrighted content in progress
• Advanced Collaborative Support
– The award model
– Award content is HTRC ACS staff time
– Collaborate with scholars on addressing their research needs related to HTRC
– E.g. prototyping, running text analysis
– Advocate open source; encourage extending the work to a grant submission
• Scholars Commons
– Interaction with scholars to help them use HTRC tools and services
– An interface to HTRC users via the Scholars Commons channel
– Series of workshops at IU and other places
– Weekly consulting time: every Wed 2:30 – 4:30pm, IU library, Scholars Commons 157R
– Contact: Miao Chen, Nicholae Cline
• For details: http://www.hathitrust.org/htrc/faq
• General contact info
– J. Stephen Downie, Co-Director HTRC, jdownie@Illinois.edu
– Beth Plale, Co-Director HTRC, plale@indiana.edu
• Requests for capability, interest
– Robert McDonald, rhmcdona@indiana.edu
Important URLs
• HTRC Portal
– http://sharc.hathitrust.org
• Data Capsule Tutorial
– http://shoutkey.com/gin
• VNC Installation Directions
– http://shoutkey.com/peat

JCDL 2015 Tutorial Opening Slides

  • 1. Topic Exploration with the HTRC Data Capsule for Non-Consumptive Joint Conference on Digital Libraries 2015 | Knoxville, TN| 06.21.15 Robert H. McDonald | Jiaan Zeng - Data To Insight Center Jaimie Murdock – InPho Project Indiana University Tweet us - @HathiTrust #HTRC HATHI TRUST RESEARCH CENTER Tweet us - @InPhoproject
  • 2. #HTRC @HathiTrust Tutorial Agenda • 9:00-9:15 - An overview of the HTRC (Robert McDonald) • 9:15-9:30 - HTRC Data Capsule Intro (Jiaan Zeng) • 9:30-9:45 - Intro to Topic Models and the InPho Explorer (Jaimie Murdock) • 9:45-10:30 - Hands-On Parts 1&2 • 10:30-10:45 - Break • 10:45-11:30 - Hands-On Parts 3&4 • 11:30-11:45 – Advanced Notebooks (Jaimie Murdock) • 11:45-12:00 – HTRC Advanced Collaborative Support (Robert McDonald)
  • 3. HTRC@Events • HTRC UnCamp 2015 – March 30-31, 2015 Ann Arbor, MI • Stephen Downie Keynote at JCDL 2015 • Digital Humanities 2015 – June 29-July 3, 2015 Sydney Australia • (LSA)'s Biennial Linguistic Institute, July 13, 2015 Chicago, IL • HILT 2015 – July 28-29, 2015 Indianapolis, IN HATHI TRUST RESEARCH CENTER
  • 4. Many thanks … HTRC IU Team • Beth Plale (PI) • Robert H. McDonald • Miao Chen • Guangchen Ruan • Zong Peng • Milinda Pathirage • Samitha Liyanage • Jiaan Zeng • Zong Peng • Leena Unnikrishnan • Nicholae Cline HTRC UIUC Team • J. Stephen Downie (PI) • Beth Namachchivaya • Megan Senseney • Sayan Bhattacharyya • Loretta Auvil • Boris Capitanu • Harriet Green • Eleanor Dickson
  • 5. #HTRC @HathiTrust Outline • What is the HTRC? • Non-Consumptive Research Paradigm • Current Architecture • Future Architecture • Advanced Collaborative Support (RFP)
  • 6. #HTRC @HathiTrust HathiTrust Digital Library • HathiTrust is a partnership of 90+ academic & research institutions, offering a collection of millions of digitized titles. • http://hathitrust.org – IU is a founding member of the HathiTrust along with University of Michigan, University of California, and the University of Virginia
  • 7. #HTRC @HathiTrust HathiTrust Research Center Mission • Public research arm of HathiTrust • Goal: enable researchers world-wide to accomplish tera-scale text data-mining and analysis – Develop cutting-edge software tools for processing, analyzing text – Develop cyberinfrastructure to enable HPC access to the HathiTrust Digital Library • Established: July, 2011 • Collaborative center: Indiana University & University of Illinois
  • 8. #HTRC @HathiTrust HTRC Timeline • Phase I: development 01 Jul 2011 – 31 Mar 2013 – HTRC software and services release v1.0 https://github.com/htrc • Phase II: outreach, 01 Apr 2013 – 30 June 2014 – 2nd HTRC UnCamp Sep ’13 • Phase III: operations, 01 July 2014 – present (2014-2018)
  • 9. HTRC Current Users (ca 2014) Projected Use 2019 Digital Humanities (60) Education (60) Informatics (60) Observers (20) 194 existing user accounts Lots of user accounts; good starting point. Improve : • Increase amount of real work being accomplished as measured by usage on HTRC’s compute resources Quarry and Big Red II at IU • Develop educational uses • Develop informatics uses • Decrease number of observers to 10%  Project 200 users at any one time of which 90% are doing relevant education/scholarship 9
  • 10. HTRC Current Users (ca Now)
  • 11. #HTRC @HathiTrust Non-Consumptive Research Paradigm • No action or set of actions on part of users, either acting alone or in cooperation with other users over duration of one or multiple sessions can result in sufficient information gathered from collection of copyrighted works to reassemble pages from collection. • Definition disallows collusion between users, or accumulation of material over time. Differentiates human researcher from proxy which is not a user. Users are human beings.
  • 12. HTRC Complexity hiding interface All the complexity Tabular info Statistical plots Spatial plots Request
  • 14. HTRC Goals • Provide a persistent and sustainable structure to enable original and cutting edge research. – Leverage data storage and computational infrastructure at Indiana & Illinois – Stimulate community development of new functionality and tools – Use tools to enable discoveries that would not be possible without the HTRC • Enable scholars to fully utilize content of HathiTrust Library while preventing intellectual property misuse within U.S. copyright law. – Provision secure computational and data environment for scholars to perform research using HathiTrust Digital Library.
  • 16. HTRC Data Capsule HTRC Data Capsule@IU Team • Beth Plale (PI) • Jiaan Zeng • Guangchen Ruan HTRC Data Capsule@Michigan Team • Atul Prakash (PI) • Alexander Crowell Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non- consumptiveuse of texts. In Proceedings of the 5th ACM workshop on Scientific cloud computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI=10.1145/2608029.2608031 http://doi.acm.org/10.1145/2608029.2608031 Special Thanks to • Samitha Liyanage • Milinda Pathirage • Zong Peng • Earlence Fernandes • Ajit Aluri @hathitrust
  • 17. HTRC Data Capsule Workflow
  • 19. #HTRC @HathiTrust HTRC Advanced Collaborative Support • ACS will be offered on a rolling basis over next four years 2014-18 • 1st RFP Call Deadline was Jan 8, 2015 5:00pm eastern – RFP - http://www.hathitrust.org/htrc/acs-rfp • For more info on the Advanced Collaborative Support please contact: htrc.acs.awards@gmail.com
  • 20. #HTRC @HathiTrust Scholarly Commons User Support Service • Develop training materials • Educational workshops • Tool and workset creation • Collaborate with librarians and DH centers at HT institutions • Assist researchers in HTRC text data mining research projects • Led out of University of Illinois Library; smaller group at IU • Resourced at 2.7 FTE. Administrative Support Senior Library Personnel (4 supervisors at .05 FTE) Senior Project Coordinator (.25 FTE) Executive Assistant (.5 FTE) Core Development Sr. Software Architect (1.0 FTE) Research Programmer (.5 FTE) Library Research Programmer (.5 FTE) IU Systems Administrator (.25 FTE) User Interface Specialist (2 years at 1.0 FTE) Informatics Developers (2 developers for 2 years at .15 FTE) Advanced Research CS PhD Students LIS PhD Students UI Systems Administrator (.5 FTE) Advanced Collaborative Support (coordinated by M. Chen) Research Programmer (.5 FTE) Computational Research Liaison (.5 FTE) Asst Dir Outreach & Education (M. Chen) (1 year at .25 FTE) Scholarly Commons Dig Humanities Specialist (1.0 FTE) CLIR Postdoctoral Research Associate (2 years at 1.0 FTE) Digital Research Librarian support (.2 FTE) Scholars Commons Support (.5 FTE) LIS MS Students (.25 FTE) (.11 FTE) Key: Area Proposed for funding by HathiTrust
  • 21. #HTRC @HathiTrust HTRC Future Work • Copyrighted content in progress • Advanced Collaborative Support – The award model – Award content is HTRC ACS staff time – Collaborate with scholars on addressing their research needs related to HTRC – E.g. prototyping, running text analysis – Advocate open source; encourage extending the work to a grant submission • Scholars Commons – Interaction with scholars to help using HTRC tools and services – An interface to interact with HTRC users via the channel of scholars commons – Series of workshops at IU and other places – Weekly consulting time – Every Wed 2:30 – 4:30pm, IU library, Scholars Commons 157R – Contact: Miao Chen, Nicholae Cline
  • 22. #HTRC @HathiTrust • For details http://www.hathitrust.org/htrc/faq • General contact info – J. Stephen Downie, Co-Director HTRC, jdownie@Illinois.edu – Beth Plale, Co-Director HTRC, plale@indiana.edu • Requests for capability, interest – Robert McDonald, rhmcdona@indiana.edu
  • 23. #HTRC @HathiTrust Important URLs • HTRC Portal – http://sharc.hathitrust.org • Data Capsule Tutorial – http://shoutkey.com/gin • VNC Installation Directions – http://shoutkey.com/peat

Editor's Notes

  1. HTRC hides complexity of analytics. In this sense, it is like Google search, which is a simple interface that hides complexity to search billions of pages. The kinds of things returned from HTRC interaction are spatial relationship of words (and their frequency obviously), statistical plots of information or tabular information.
  2. Shifting the complexity hiding interface to the right, we open up the cloud to see what’s inside. HTRC at its simplest has 1) algorithms, which are drawn from SEASR and from other analysis tool suites including Mahout and MapReduce, 2) the HT corpus (and subsets of the corpus that users either hold personally as part of a workset or that are publicly available), and 3) other data sets that are used. HTRC brokers the bringing together of these pieces so that computation can take place on a resource like Big Red II (or XSEDE). Note that there is an arrow from the compute engine to the complexity hiding interface. This is because researcher interaction with the texts isn’t an automated workflow; it is one requiring levels of interaction with the computation as it is running.
  3. Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non-consumptive use of texts. In Proceedings of the 5th ACM workshop on Scientific cloud computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI=10.1145/2608029.2608031 http://doi.acm.org/10.1145/2608029.2608031
  4. The Scholarly Commons User Support service gives HT institutions exclusive access to training and learning materials that help them establish programs that integrate HTRC tools and services into their scholarly commons programs in libraries and digital humanities centers. The SC will be physically located in the University of Illinois Library’s Scholarly Commons. Several Library staff and faculty will support this service. Key among these is the Digital Humanities Research Specialist, who will assist with the development of training and outreach initiatives in support of researchers working with the HathiTrust Research Center and HathiTrust digital library affiliates who seek to start their own HTRC research services. This will involve planning, implementation and continuous development of training materials, educational workshops, potential tools, and outreach activities in support of the usage of HTRC tools and datasets. The HTRC Digital Humanities Specialist will focus on development of HTRC research services at HathiTrust member institutions, and will collaborate with public services and data services librarians at HathiTrust member institutions on developing support services for digital humanities research with the HTRC corpus. The specialist will work closely with the English and Digital Humanities Librarian at the University of Illinois Library to develop research data services for the humanities, with particular emphasis on the HTRC corpus and tools. Additional professionals are focused on related aspects of HTRC work, including a CLIR Postdoc researching user requirements for HTRC tools, a Technical Specialist and other technical support.
These professionals contribute to the work of the Scholarly Commons and to the HT community by helping to articulate the relationship between new technologies and humanities scholarship to the community of humanists, by advising teaching faculty on the usage of digitized textual corpora, and by providing technical support for use of analytical tools. The scope and responsibilities will evolve in accordance with priorities established by the Library and HathiTrust community. The specialist will spend up to 20 percent of their time on the support of research work with the HTRC. Examples of currently supported digital humanities projects involving the HTRC corpus include: a text mining project of eighteenth-century novels for changes in dialect; a textual analysis of nineteenth-century women's serial novels for thematic patterns; a comparative literature textual analysis project; and topic modeling of twentieth-century texts for depictions of African-American women.