SlideShare une entreprise Scribd logo
1  sur  20
HathiTrust Research Center
  Architecture Overview
      Robert H. McDonald | @mcdonald
Executive Committee-HathiTrust Research Center (HTRC)
         Deputy Director-Data to Insight Center
           Associate Dean-University Libraries
                Indiana University
Follow Along




http://slidesha.re/U4z1gW
HTRC Architecture Group
Indiana University   University of Illinois
• Beth Plale, Lead   • J. Stephen Downie
• Yiming Sun         • Loretta Auvil
• Stacy Kowalczyk    • Boris Capitanu
• Aaron Todd
                     • Kirk Hess
• Jiaan Zeng
                     • Harriett Green
• Guangchen Ruan
• Zong Peng
• Swati Nagde
Presentation Overview
•   Considerations for Current Architecture
•   Architecture - Use Case Methodology
•   Technical Overview
•   UnCamp Sessions for Further Review
Main Case – Data Near
             Computation
                                   HTRC
  HT                              Volume
                 HT              Store and
Volume
              Volume               Index
 Store
               Store               (IUB)
 (UM)                                              XSEDE
              (IUPUI)
                                                 Compute
                         FutureGrid              Allocation
                        Computation
                           Cloud                UIUC
                                             Compute
                                             Allocation
                             IU
                         Compute
                         Allocation
Non-Consumptive Research Paradigm
• No action or set of actions on part of users,
  either acting alone or in cooperation with
  other users over duration of one or multiple
  sessions can result in sufficient information
  gathered from collection of copyrighted works
  to reassemble pages from collection.
• Definition disallows collusion between users,
  or accumulation of material over time.
  Differentiates human researcher from proxy
  which is not a user. Users are human beings.
Amicus Brief and NCR
• Jockers, Sag, Schultz –
• http://tinyurl.com/cy34hhr
Use Cases for Phase 1 Architecture
• Use Case #1 - Previously registered user
  submitted algorithm retrieved and run with
  results set
• Use Case #2 - HTRC applications/portal access
  (SEASR)
• Use Case #3 – Blacklight Lucene/Solr faceted
  access
• Use Case #4 - Direct programmatic access
  through Secure Data API
HTRC Current Infrastructure
• Servers
  – 14 production-level quad-core servers
     • 16 – 32GB of memory
     • 250 – 500GB of local disk each
  – 6-node Cassandra cluster for volume store
  – Ingest service and secure Data API access point
• Storage (IU University Infrastructure)
  – 13TB of 15,000 RPM SAS disk storage
  – Increase up to 17TB by end of 2012
  – 500TB available in late year 2-year 3
Key Components of Architecture
•   Portal Access
•   Blacklight Access
•   Agent
•   Registry
•   Secured Data API Access
HTRC Architecture
 Portal Access
              Blacklight
                                                            Direct
          Agent                                         programmatic
                                                          access (by
Application       Collection                          programs running
submission         building                          on HTRC machines)


                                      Security (OAuth2)
                                                    Data API access interface           Solr Proxy
               Registry (WSO2)                                           Audit
                               Meandre
         Algorithms                                              Cassandra
                               Workflows
                                                                 cluster
                                                                 volume store
         Result Sets           Collections
                                                                           Solr index




 Compute resources
                                                           Storage resources
HTRC Architecture                                         Portal Access
 Portal Access
              Blacklight
                                                                HTRC Portal
                                                            Direct
          Agent                                         programmatic
                                                          access (by
Application       Collection                          programs running                  Blacklight
submission         building                          on HTRC machines)


                                      Security (OAuth2)
                                                    App SEAR                     App Blacklight
                                                    Data API access interface              Solr Proxy
               Registry (WSO2)                                           Audit
                               Meandre
         Algorithms                                              Cassandra
                               Workflows
                                                                 cluster
                                                                 volume store
         Result Sets           Collections
                                                                           Solr index




 Compute resources
                                                           Storage resources
HTRC Architecture                                              Agent
 Portal Access
                                                           HTRC Agent
              Blacklight
                                                           Direct
          Agent                                   Application
                                                       programmatic       Collection
                                                         access (by
Application       Collection
                                                  submission
                                                     programs running      building
submission         building                         on HTRC machines)


                                      Security (OAuth2)
                                                    Data API access interface          Solr Proxy
               Registry (WSO2)                                          Audit
                               Meandre
         Algorithms                                              Cassandra
                               Workflows
                                                                 cluster
                                                                 volume store
         Result Sets           Collections
                                                                          Solr index




 Compute resources
                                                          Storage resources
HTRC Architecture                                         HTRC Registry
 Portal Access
                                                            Registry (WSO2)
              Blacklight
                                                                                Meandre
                                                     Algorithms
                                                         Direct
                                                        programmatic
                                                                                Workflows
          Agent
                                                          access (by
Application       Collection                          programs running




                                                                     1
submission         building                          on HTRC Sets
                                                     Resultmachines)            Collections

                                      Security (OAuth2)
                                                    Data API access interface            Solr Proxy
               Registry (WSO2)                                           Audit
                               Meandre
         Algorithms                                              Cassandra
                               Workflows
                                                                 cluster
                                                                 volume store
         Result Sets           Collections
                                                                           Solr index




 Compute resources
                                                           Storage resources
HTRC Architecture
                                                             Secure Data API
 Portal Access
              Blacklight                                • RESTful Web Service
                                                    Direct     –   Language agnostic
          Agent                                 programmatic –     Clients don’t have to
                                                  access (by
Application       Collection                  programs running     deal with Cassandra
submission         building
                                                             • Simple OAuth2
                                             on HTRC machines)

                                                                  authentication
                                  Security (OAuth2)          • HTTP over SSL
                                                             • Audits
                                                 Data API access interface client access
                                                                                  Solr Proxy
                Registry (WSO2)
                                                             • Protected behind
                                                                      Audit
         Algorithms
                            Meandre
                           Workflows
                                                                  firewall, accessible
                                                              Cassandra
                                                                  only to authorized IPs
                                                              cluster
                                                         volume store
         Result Sets           Collections
                                                                    Solr index

                                                                        HTRC
 Compute resources
                                                   Storage resources
NoSQL Methodology
• Currently HT content is stored in a pair-tree file
  system convention (CDL)
• Moving these files into a NoSQL store like
  Cassandra enabled HTRC to aggregate them into
  larger sets of files for use in retrieval
• Use of Cassandra enabled HTRC to share content
  over a commodity based Cassandra cluster of
  virtual machines
• Originally investigated use of MongoDB,
  CouchDB, Hbase and Cassandra
HTRC Solr index
• The Solr Data API 0.1 test version
  – Preserves all query syntax of original Solr
  – Prevents user from modification
  – Hides the host machine and port number HTRC
    Solr is actually running on
  – Creates audit log of requests
  – Provides filtered term vector for words starting
    with user-specified letter
Data Capsules VM
            Cluster

                                          HTRC Volume
                                         Store and Index

      Remote
                  Provide secure
      Desktop
                  VM
      Or VNC

                       Submit secure
 Scholars              capsule
                                          FutureGrid
                       map/reduce Data
                                         Computation
                       Capsule images
                                            Cloud
                       to FutureGrid.
                       Receive and
                       review results


Non-Consumptive Research-Secure Data Capsule
Sessions for Further Review
• For more on API – Tues Topic I/II
  (Yiming Sun)
• For more on Portal/SEASR – Tues Topic II
  (Loretta Auvil)
• For more on Portal/Blacklight – Tues Topic III
  (Stacy Kowalczyk)
Contact Information
• Robert H. McDonald
  – Email – robert@indiana.edu
  – Chat – rhmcdonald on googletalk | skype
  – Twitter - @mcdonald
  – Blog – http://www.rmcdonald.net
  – Twitter Hashtag: #HTRC12

Contenu connexe

En vedette

To_Infinity_and_Beyond_Internet_Scale_Workloads_Data_Center_Design_v6
To_Infinity_and_Beyond_Internet_Scale_Workloads_Data_Center_Design_v6To_Infinity_and_Beyond_Internet_Scale_Workloads_Data_Center_Design_v6
To_Infinity_and_Beyond_Internet_Scale_Workloads_Data_Center_Design_v6
John Sing
 

En vedette (12)

Cloud Architecture in the Data Center
Cloud Architecture in the Data CenterCloud Architecture in the Data Center
Cloud Architecture in the Data Center
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan Broeder
 
MetaFabric Architecture
MetaFabric ArchitectureMetaFabric Architecture
MetaFabric Architecture
 
Data Center: Earth
Data Center: EarthData Center: Earth
Data Center: Earth
 
Architectural Evolution Starting from Hadoop
Architectural Evolution Starting from HadoopArchitectural Evolution Starting from Hadoop
Architectural Evolution Starting from Hadoop
 
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWSArquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
Arquitetura Hibrida - Integrando seu Data Center com a Nuvem da AWS
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network Architecture
 
Data Center Floor Design - Your Layout Can Save of Kill Your PUE & Cooling Ef...
Data Center Floor Design - Your Layout Can Save of Kill Your PUE & Cooling Ef...Data Center Floor Design - Your Layout Can Save of Kill Your PUE & Cooling Ef...
Data Center Floor Design - Your Layout Can Save of Kill Your PUE & Cooling Ef...
 
Saving money with smart data center design
Saving money with smart data center designSaving money with smart data center design
Saving money with smart data center design
 
Data Center Free Cooling in the Middle East
Data Center Free Cooling in the Middle EastData Center Free Cooling in the Middle East
Data Center Free Cooling in the Middle East
 
To_Infinity_and_Beyond_Internet_Scale_Workloads_Data_Center_Design_v6
To_Infinity_and_Beyond_Internet_Scale_Workloads_Data_Center_Design_v6To_Infinity_and_Beyond_Internet_Scale_Workloads_Data_Center_Design_v6
To_Infinity_and_Beyond_Internet_Scale_Workloads_Data_Center_Design_v6
 

Similaire à HTRC Architecture Overview

Wc Mand Connectors2
Wc Mand Connectors2Wc Mand Connectors2
Wc Mand Connectors2
day
 
Denial of Service in Software Defined Netoworks
Denial of Service in Software Defined NetoworksDenial of Service in Software Defined Netoworks
Denial of Service in Software Defined Netoworks
Mohammad Faraji
 
Choosing Your Windows Azure Platform Strategy
Choosing Your Windows Azure Platform StrategyChoosing Your Windows Azure Platform Strategy
Choosing Your Windows Azure Platform Strategy
drmarcustillett
 
Virtualization Map Tech Ed2009
Virtualization Map Tech Ed2009Virtualization Map Tech Ed2009
Virtualization Map Tech Ed2009
rsnarayanan
 
How to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastHow to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and Fast
MapR Technologies
 

Similaire à HTRC Architecture Overview (20)

HathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast VersionHathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast Version
 
OGCE MSI Presentation
OGCE MSI PresentationOGCE MSI Presentation
OGCE MSI Presentation
 
Proactive ops for container orchestration environments
Proactive ops for container orchestration environmentsProactive ops for container orchestration environments
Proactive ops for container orchestration environments
 
Wc Mand Connectors2
Wc Mand Connectors2Wc Mand Connectors2
Wc Mand Connectors2
 
Introduction to Gruter and Gruter's BigData Platform
Introduction to Gruter and Gruter's BigData PlatformIntroduction to Gruter and Gruter's BigData Platform
Introduction to Gruter and Gruter's BigData Platform
 
IoT Sensor Analytics with Python, Jupyter, TensorFlow, Keras, Apache Kafka, K...
IoT Sensor Analytics with Python, Jupyter, TensorFlow, Keras, Apache Kafka, K...IoT Sensor Analytics with Python, Jupyter, TensorFlow, Keras, Apache Kafka, K...
IoT Sensor Analytics with Python, Jupyter, TensorFlow, Keras, Apache Kafka, K...
 
Denial of Service in Software Defined Netoworks
Denial of Service in Software Defined NetoworksDenial of Service in Software Defined Netoworks
Denial of Service in Software Defined Netoworks
 
Choosing Your Windows Azure Platform Strategy
Choosing Your Windows Azure Platform StrategyChoosing Your Windows Azure Platform Strategy
Choosing Your Windows Azure Platform Strategy
 
Virtualization Map Tech Ed2009
Virtualization Map Tech Ed2009Virtualization Map Tech Ed2009
Virtualization Map Tech Ed2009
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source
 
IoT Sensor Analytics with Kafka, ksqlDB and TensorFlow
IoT Sensor Analytics with Kafka, ksqlDB and TensorFlowIoT Sensor Analytics with Kafka, ksqlDB and TensorFlow
IoT Sensor Analytics with Kafka, ksqlDB and TensorFlow
 
Microservices
MicroservicesMicroservices
Microservices
 
Processing IoT Data from End to End with MQTT and Apache Kafka
Processing IoT Data from End to End with MQTT and Apache Kafka Processing IoT Data from End to End with MQTT and Apache Kafka
Processing IoT Data from End to End with MQTT and Apache Kafka
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud Computing
 
Avenue Omg
Avenue OmgAvenue Omg
Avenue Omg
 
How to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastHow to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and Fast
 
Io t data streaming
Io t data streamingIo t data streaming
Io t data streaming
 
Scalable Computing Labs (SCL).
Scalable Computing Labs (SCL).Scalable Computing Labs (SCL).
Scalable Computing Labs (SCL).
 
Play with cloud foundry
Play with cloud foundryPlay with cloud foundry
Play with cloud foundry
 
Lambda Architecture Using SQL
Lambda Architecture Using SQLLambda Architecture Using SQL
Lambda Architecture Using SQL
 

Plus de Robert H. McDonald

Plus de Robert H. McDonald (20)

ER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations PanelER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations Panel
 
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
 
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening Slides
 
TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15
 
The HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesThe HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational Services
 
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
 
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
 
ER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote SlidesER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote Slides
 
HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14
 
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkThe HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
 
Owning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsOwning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your Patrons
 
Kuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for LibrariesKuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for Libraries
 
Charleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to CloudCharleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to Cloud
 
The HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoThe HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and Demo
 
SCONUL Kuali OLE Briefing
SCONUL Kuali OLE BriefingSCONUL Kuali OLE Briefing
SCONUL Kuali OLE Briefing
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science
 
New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...
 
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
 
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
 

Dernier

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Dernier (20)

Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 

HTRC Architecture Overview

  • 1. HathiTrust Research Center Architecture Overview Robert H. McDonald | @mcdonald Executive Committee-HathiTrust Research Center (HTRC) Deputy Director-Data to Insight Center Associate Dean-University Libraries Indiana University
  • 3. HTRC Architecture Group Indiana University University of Illinois • Beth Plale, Lead • J. Stephen Downie • Yiming Sun • Loretta Auvil • Stacy Kowalczyk • Boris Capitanu • Aaron Todd • Kirk Hess • Jiaan Zeng • Harriett Green • Guangchen Ruan • Zong Peng • Swati Nagde
  • 4. Presentation Overview • Considerations for Current Architecture • Architecture - Use Case Methodology • Technical Overview • UnCamp Sessions for Further Review
  • 5. Main Case – Data Near Computation HTRC HT Volume HT Store and Volume Volume Index Store Store (IUB) (UM) XSEDE (IUPUI) Compute FutureGrid Allocation Computation Cloud UIUC Compute Allocation IU Compute Allocation
  • 6. Non-Consumptive Research Paradigm • No action or set of actions on part of users, either acting alone or in cooperation with other users over duration of one or multiple sessions can result in sufficient information gathered from collection of copyrighted works to reassemble pages from collection. • Definition disallows collusion between users, or accumulation of material over time. Differentiates human researcher from proxy which is not a user. Users are human beings.
  • 7. Amicus Brief and NCR • Jockers, Sag, Schultz – • http://tinyurl.com/cy34hhr
  • 8. Use Cases for Phase 1 Architecture • Use Case #1 - Previously registered user submitted algorithm retrieved and run with results set • Use Case #2 - HTRC applications/portal access (SEASR) • Use Case #3 – Blacklight Lucene/Solr faceted access • Use Case #4 - Direct programmatic access through Secure Data API
  • 9. HTRC Current Infrastructure • Servers – 14 production-level quad-core servers • 16 – 32GB of memory • 250 – 500GB of local disk each – 6-node Cassandra cluster for volume store – Ingest service and secure Data API access point • Storage (IU University Infrastructure) – 13TB of 15,000 RPM SAS disk storage – Increase up to 17TB by end of 2012 – 500TB available in late year 2-year 3
  • 10. Key Components of Architecture • Portal Access • Blacklight Access • Agent • Registry • Secured Data API Access
  • 11. HTRC Architecture Portal Access Blacklight Direct Agent programmatic access (by Application Collection programs running submission building on HTRC machines) Security (OAuth2) Data API access interface Solr Proxy Registry (WSO2) Audit Meandre Algorithms Cassandra Workflows cluster volume store Result Sets Collections Solr index Compute resources Storage resources
  • 12. HTRC Architecture Portal Access Portal Access Blacklight HTRC Portal Direct Agent programmatic access (by Application Collection programs running Blacklight submission building on HTRC machines) Security (OAuth2) App SEAR App Blacklight Data API access interface Solr Proxy Registry (WSO2) Audit Meandre Algorithms Cassandra Workflows cluster volume store Result Sets Collections Solr index Compute resources Storage resources
  • 13. HTRC Architecture Agent Portal Access HTRC Agent Blacklight Direct Agent Application programmatic Collection access (by Application Collection submission programs running building submission building on HTRC machines) Security (OAuth2) Data API access interface Solr Proxy Registry (WSO2) Audit Meandre Algorithms Cassandra Workflows cluster volume store Result Sets Collections Solr index Compute resources Storage resources
  • 14. HTRC Architecture HTRC Registry Portal Access Registry (WSO2) Blacklight Meandre Algorithms Direct programmatic Workflows Agent access (by Application Collection programs running 1 submission building on HTRC Sets Resultmachines) Collections Security (OAuth2) Data API access interface Solr Proxy Registry (WSO2) Audit Meandre Algorithms Cassandra Workflows cluster volume store Result Sets Collections Solr index Compute resources Storage resources
  • 15. HTRC Architecture Secure Data API Portal Access Blacklight • RESTful Web Service Direct – Language agnostic Agent programmatic – Clients don’t have to access (by Application Collection programs running deal with Cassandra submission building • Simple OAuth2 on HTRC machines) authentication Security (OAuth2) • HTTP over SSL • Audits Data API access interface client access Solr Proxy Registry (WSO2) • Protected behind Audit Algorithms Meandre Workflows firewall, accessible Cassandra only to authorized IPs cluster volume store Result Sets Collections Solr index HTRC Compute resources Storage resources
  • 16. NoSQL Methodology • Currently HT content is stored in a pair-tree file system convention (CDL) • Moving these files into a NoSQL store like Cassandra enabled HTRC to aggregate them into larger sets of files for use in retrieval • Use of Cassandra enabled HTRC to share content over a commodity based Cassandra cluster of virtual machines • Originally investigated use of MongoDB, CouchDB, Hbase and Cassandra
  • 17. HTRC Solr index • The Solr Data API 0.1 test version – Preserves all query syntax of original Solr – Prevents user from modification – Hides the host machine and port number HTRC Solr is actually running on – Creates audit log of requests – Provides filtered term vector for words starting with user-specified letter
  • 18. Data Capsules VM Cluster HTRC Volume Store and Index Remote Provide secure Desktop VM Or VNC Submit secure Scholars capsule FutureGrid map/reduce Data Computation Capsule images Cloud to FutureGrid. Receive and review results Non-Consumptive Research-Secure Data Capsule
  • 19. Sessions for Further Review • For more on API – Tues Topic I/II (Yiming Sun) • For more on Portal/SEASR – Tues Topic II (Loretta Auvil) • For more on Portal/Blacklight – Tues Topic III (Stacy Kowalczyk)
  • 20. Contact Information • Robert H. McDonald – Email – robert@indiana.edu – Chat – rhmcdonald on googletalk | skype – Twitter - @mcdonald – Blog – http://www.rmcdonald.net – Twitter Hashtag: #HTRC12

Notes de l'éditeur

  1. Registry – agent can deploy any service listed in this digram and can run with the computational resources – Original Plan iis to use XSEDE – not using this on IIS machine but are using ODIN (128 node cluster each core has 4Gb memory and 4 computation cores)– smoketree (D2I server)(24 cores physical 48 loical cores 128 GB memory) – these are not long term just using for now -
  2. Registry – agent can deploy any service listed in this digram and can run with the computational resources – Original Plan iis to use XSEDE – not using this on IIS machine but are using ODIN (128 node cluster each core has 4Gb memory and 4 computation cores)– smoketree (D2I server)(24 cores physical 48 loical cores 128 GB memory) – these are not long term just using for now -
  3. Registry – agent can deploy any service listed in this diagram and can run with the computational resources – Original Plan iis to use XSEDE – not using this on IIS machine but are using ODIN (128 node cluster each core has 4Gb memory and 4 computation cores)– smoketree (D2I server)(24 cores physical 48 loical cores 128 GB memory) – these are not long term just using for now -
  4. Registry – agent can deploy any service listed in this diagram and can run with the computational resources – Original Plan iis to use XSEDE – not using this on IIS machine but are using ODIN (128 node cluster each core has 4Gb memory and 4 computation cores)– smoketree (D2I server)(24 cores physical 48 loical cores 128 GB memory) – these are not long term just using for now -
  5. Registry – agent can deploy any service listed in this digram and can run with the computational resources – Original Plan iis to use XSEDE – not using this on IIS machine but are using ODIN (128 node cluster each core has 4Gb memory and 4 computation cores)– smoketree (D2I server)(24 cores physical 48 loical cores 128 GB memory) – these are not long term just using for now -