SlideShare une entreprise Scribd logo
1  sur  21
Experience with Ingestion of Large Collections
Stuart Kenny
Research IT
Trinity College Dublin
Stuart Kenny
Research IT
Trinity College Dublin
The Fairy Tales of Charles Perrault. Illustrated by Harry Clarke.
Intro. Thomas Bodkin. London: George G. Harrap, [1922].
Internet Archive version of a copy in the New York Public Library.
Web. 25 December 2012.
My what a big collection you
have!
About DRI (https://repository.dri.ie/)
● DRI is an interactive trusted digital repository for
contemporary and historical, social and cultural
data held by Irish institutions
● RIA (lead), NUIM, TCD, DIT, NUIG, NCAD
● Partners: academic, cultural, social, government
Outline
• What’s our problem?
• Example collections
• Ingest solutions
• Current ingest process
• Possible future process
Ingesting Objects
• Ingest form
o Suitable for single
objects/small collections
o Flat hierarchies
o Simple metadata standards
• Multiple standards
o e.g., MARC, EAD
o XML upload
• How to handle complex
standards, many
objects?
Example Collection: Clarke Stained Glass
• MODS metadata
• 10,025 objects
• 42 sub-collections
• 20,047 files, 2.82 TB
• Problems:
o Large number of objects
o Data transfer
Example Collection: TCD Children’s Books
• MARC metadata
• 207,889 objects
• 16 sub-collections
• Problems:
o Large number of objects
o Very slow to ingest
o Timeouts and errors
Example Collection: Kilkenny Design Workshop
• EAD metadata
• 2,040 objects
• 2,734 series/files
• 2,231 files, 1.2GB
• Problems:
o Very complex metadata standard
o Hierarchical structure
EAD, and why I don’t quite hate it as much as I did...
• Single XML file upload
• Structure encoded in metadata
• URLs to files
• But
o One-shot ingest
o How to edit/update?
o Slow to ingest
o Requires a lot of resources
Sufia Batch Upload
• Add multiple files
• New work for each
• Metadata for each
work
• How to handle
multiple standards?
• Different metadata
for each work?
Avalon Batch Ingest
• Ingest package
o Manifest file
o Plus content files
• Manifest file is spreadsheet
o Metadata for items
o Names of content files
• Ingest package uploaded to Avalon DropBox
Approach up to now
• Command line client
o Enter text commands at ‘command prompt’
• Written in Ruby
• Run locally by user
• Metadata and asset files arranged in fixed directory structure
• Client iterates over directory creates each object as single
ingest
Problems
• Lack of user familiarity with command line
• Multiple platform support
o i.e., Windows
• Difficulty of installing
• Multiple single ingests
o Slow
o Error prone
• Required lots of user support
• Mostly in the end ingests performed by dev team
Current Attempt
• Web-based UI
• Borrow heavily from Avalon approach
• Upload metadata XML plus assets to online storage
• Add manifest spreadsheet
o Each row contains path to metadata
o Paths to zero or more asset files
o Paths relative to online storage directory
• Backend processes manifest and ingests as background task
• UI updates status
Current Attempt
UI
Online
Storage Repository
Select
manifest
Retrieve
remote
files
Ingest
Update
status
• Hydra BrowseEverything
o Gem to access cloud storage
o DropBox, Google Drive…
• User uploads files
• In UI selects collection
and manifest to ingest
• Everything handled
server side in
background
• Can view status in UI
Outstanding Issues
• Online storage
o Dropbox type storage size limits
• Creating spreadsheet less easy than directory structure
• Possible solutions
o Provide online storage
o Has to be per user
o Generate required manifest from uploaded directory structure

Contenu connexe

Tendances

Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Oscar Corcho
 
Working with Islandora
Working with Islandora Working with Islandora
Working with Islandora eohallor
 
IFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked DataIFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked DataLars G. Svensson
 
Harvesting Repositories: DPLA, Europeana, & Other Case Studies
Harvesting Repositories:  DPLA, Europeana, & Other Case StudiesHarvesting Repositories:  DPLA, Europeana, & Other Case Studies
Harvesting Repositories: DPLA, Europeana, & Other Case Studieseohallor
 
Intro to Linked Open Data in Libraries Archives & Museums.
Intro to Linked Open Data in Libraries Archives & Museums.Intro to Linked Open Data in Libraries Archives & Museums.
Intro to Linked Open Data in Libraries Archives & Museums.Jon Voss
 
Best Practices for Descriptive Metadata for Web Archiving
Best Practices for Descriptive Metadata for Web ArchivingBest Practices for Descriptive Metadata for Web Archiving
Best Practices for Descriptive Metadata for Web ArchivingOCLC
 
Linked Open Data for Cultural Heritage
Linked Open Data for Cultural HeritageLinked Open Data for Cultural Heritage
Linked Open Data for Cultural HeritageNoreen Whysel
 
Intro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsIntro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsJon Voss
 
Digital Document Retention and SharePoint
Digital Document Retention and SharePointDigital Document Retention and SharePoint
Digital Document Retention and SharePointChris Grant
 
DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15Michele Mennielli
 
DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016Sebastian Hellmann
 
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...Oscar Corcho
 
Ontology Web services for Semantic Applications
Ontology Web services for Semantic ApplicationsOntology Web services for Semantic Applications
Ontology Web services for Semantic ApplicationsTrish Whetzel
 
DSpace-CRIS Workshop OR2015: Slides
DSpace-CRIS Workshop OR2015: SlidesDSpace-CRIS Workshop OR2015: Slides
DSpace-CRIS Workshop OR2015: SlidesAndrea Bollini
 
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?Gudmundur Thorisson
 

Tendances (20)

The Danish National Bibliography as LOD
The Danish National Bibliography as LODThe Danish National Bibliography as LOD
The Danish National Bibliography as LOD
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Working with Islandora
Working with Islandora Working with Islandora
Working with Islandora
 
IFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked DataIFLA LIDASIG Open Session 2017: Introduction to Linked Data
IFLA LIDASIG Open Session 2017: Introduction to Linked Data
 
Edina cigs-21-september-2012
Edina cigs-21-september-2012Edina cigs-21-september-2012
Edina cigs-21-september-2012
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Harvesting Repositories: DPLA, Europeana, & Other Case Studies
Harvesting Repositories:  DPLA, Europeana, & Other Case StudiesHarvesting Repositories:  DPLA, Europeana, & Other Case Studies
Harvesting Repositories: DPLA, Europeana, & Other Case Studies
 
Intro to Linked Open Data in Libraries Archives & Museums.
Intro to Linked Open Data in Libraries Archives & Museums.Intro to Linked Open Data in Libraries Archives & Museums.
Intro to Linked Open Data in Libraries Archives & Museums.
 
Best Practices for Descriptive Metadata for Web Archiving
Best Practices for Descriptive Metadata for Web ArchivingBest Practices for Descriptive Metadata for Web Archiving
Best Practices for Descriptive Metadata for Web Archiving
 
DBpedia InsideOut
DBpedia InsideOutDBpedia InsideOut
DBpedia InsideOut
 
Linked Open Data for Cultural Heritage
Linked Open Data for Cultural HeritageLinked Open Data for Cultural Heritage
Linked Open Data for Cultural Heritage
 
Intro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsIntro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & Museums
 
Digital Document Retention and SharePoint
Digital Document Retention and SharePointDigital Document Retention and SharePoint
Digital Document Retention and SharePoint
 
NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...
NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...
NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...
 
DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15
 
DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016
 
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
 
Ontology Web services for Semantic Applications
Ontology Web services for Semantic ApplicationsOntology Web services for Semantic Applications
Ontology Web services for Semantic Applications
 
DSpace-CRIS Workshop OR2015: Slides
DSpace-CRIS Workshop OR2015: SlidesDSpace-CRIS Workshop OR2015: Slides
DSpace-CRIS Workshop OR2015: Slides
 
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?
 

En vedette

Kev Long - Administrative Roles in the DRI
Kev Long - Administrative Roles in the DRIKev Long - Administrative Roles in the DRI
Kev Long - Administrative Roles in the DRIdri_ireland
 
Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection
Tim Keefe - DRI Training Series Day UCC: Digitising Your CollectionTim Keefe - DRI Training Series Day UCC: Digitising Your Collection
Tim Keefe - DRI Training Series Day UCC: Digitising Your Collectiondri_ireland
 
Kathryn Cassidy - Using MOAB versioning for preservation storage
Kathryn Cassidy - Using MOAB versioning for preservation storageKathryn Cassidy - Using MOAB versioning for preservation storage
Kathryn Cassidy - Using MOAB versioning for preservation storagedri_ireland
 
Kathryn Cassidy - What metadata do we need for preservation?
Kathryn Cassidy - What metadata do we need for preservation?Kathryn Cassidy - What metadata do we need for preservation?
Kathryn Cassidy - What metadata do we need for preservation?dri_ireland
 
Rebecca Grant - DRI Training Series: 1. Organising Your Collection
Rebecca Grant - DRI Training Series: 1. Organising Your Collection Rebecca Grant - DRI Training Series: 1. Organising Your Collection
Rebecca Grant - DRI Training Series: 1. Organising Your Collection dri_ireland
 
Rebecca Grant, Kathryn Cassidy, Marta Bustillo - Implementing Orphan Works Le...
Rebecca Grant, Kathryn Cassidy, Marta Bustillo - Implementing Orphan Works Le...Rebecca Grant, Kathryn Cassidy, Marta Bustillo - Implementing Orphan Works Le...
Rebecca Grant, Kathryn Cassidy, Marta Bustillo - Implementing Orphan Works Le...dri_ireland
 
Kathryn Cassidy - What metadata do we need for preservation?
Kathryn Cassidy - What metadata do we need for preservation?Kathryn Cassidy - What metadata do we need for preservation?
Kathryn Cassidy - What metadata do we need for preservation?dri_ireland
 
Clare Lanigan - DRI Training Day UCC: Understanding Copyright
Clare Lanigan - DRI Training Day UCC: Understanding CopyrightClare Lanigan - DRI Training Day UCC: Understanding Copyright
Clare Lanigan - DRI Training Day UCC: Understanding Copyrightdri_ireland
 
Kathryn Cassidy - Using MOAB versioning for preservation storage
Kathryn Cassidy - Using MOAB versioning for preservation storage Kathryn Cassidy - Using MOAB versioning for preservation storage
Kathryn Cassidy - Using MOAB versioning for preservation storage dri_ireland
 
Dr Natalie Harrower - DRI and Open Data
Dr Natalie Harrower - DRI and Open DataDr Natalie Harrower - DRI and Open Data
Dr Natalie Harrower - DRI and Open Datadri_ireland
 
Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...
Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...
Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...dri_ireland
 
Clare Lanigan - DRI Training Series: 3. Understanding Copyright
Clare Lanigan - DRI Training Series: 3. Understanding CopyrightClare Lanigan - DRI Training Series: 3. Understanding Copyright
Clare Lanigan - DRI Training Series: 3. Understanding Copyrightdri_ireland
 
Rebecca Grant - DH research data: identification and challenges (DH2016)
Rebecca Grant - DH research data: identification and challenges (DH2016)Rebecca Grant - DH research data: identification and challenges (DH2016)
Rebecca Grant - DH research data: identification and challenges (DH2016)dri_ireland
 
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)dri_ireland
 
Ingrid Dillo - Digital humanities challenges and the Research Data Alliance
Ingrid Dillo - Digital humanities challenges and the Research Data AllianceIngrid Dillo - Digital humanities challenges and the Research Data Alliance
Ingrid Dillo - Digital humanities challenges and the Research Data Alliancedri_ireland
 
Natalie Harrower - Digital Data Sharing (DH2016)
Natalie Harrower - Digital Data Sharing (DH2016)Natalie Harrower - Digital Data Sharing (DH2016)
Natalie Harrower - Digital Data Sharing (DH2016)dri_ireland
 
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)dri_ireland
 

En vedette (17)

Kev Long - Administrative Roles in the DRI
Kev Long - Administrative Roles in the DRIKev Long - Administrative Roles in the DRI
Kev Long - Administrative Roles in the DRI
 
Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection
Tim Keefe - DRI Training Series Day UCC: Digitising Your CollectionTim Keefe - DRI Training Series Day UCC: Digitising Your Collection
Tim Keefe - DRI Training Series Day UCC: Digitising Your Collection
 
Kathryn Cassidy - Using MOAB versioning for preservation storage
Kathryn Cassidy - Using MOAB versioning for preservation storageKathryn Cassidy - Using MOAB versioning for preservation storage
Kathryn Cassidy - Using MOAB versioning for preservation storage
 
Kathryn Cassidy - What metadata do we need for preservation?
Kathryn Cassidy - What metadata do we need for preservation?Kathryn Cassidy - What metadata do we need for preservation?
Kathryn Cassidy - What metadata do we need for preservation?
 
Rebecca Grant - DRI Training Series: 1. Organising Your Collection
Rebecca Grant - DRI Training Series: 1. Organising Your Collection Rebecca Grant - DRI Training Series: 1. Organising Your Collection
Rebecca Grant - DRI Training Series: 1. Organising Your Collection
 
Rebecca Grant, Kathryn Cassidy, Marta Bustillo - Implementing Orphan Works Le...
Rebecca Grant, Kathryn Cassidy, Marta Bustillo - Implementing Orphan Works Le...Rebecca Grant, Kathryn Cassidy, Marta Bustillo - Implementing Orphan Works Le...
Rebecca Grant, Kathryn Cassidy, Marta Bustillo - Implementing Orphan Works Le...
 
Kathryn Cassidy - What metadata do we need for preservation?
Kathryn Cassidy - What metadata do we need for preservation?Kathryn Cassidy - What metadata do we need for preservation?
Kathryn Cassidy - What metadata do we need for preservation?
 
Clare Lanigan - DRI Training Day UCC: Understanding Copyright
Clare Lanigan - DRI Training Day UCC: Understanding CopyrightClare Lanigan - DRI Training Day UCC: Understanding Copyright
Clare Lanigan - DRI Training Day UCC: Understanding Copyright
 
Kathryn Cassidy - Using MOAB versioning for preservation storage
Kathryn Cassidy - Using MOAB versioning for preservation storage Kathryn Cassidy - Using MOAB versioning for preservation storage
Kathryn Cassidy - Using MOAB versioning for preservation storage
 
Dr Natalie Harrower - DRI and Open Data
Dr Natalie Harrower - DRI and Open DataDr Natalie Harrower - DRI and Open Data
Dr Natalie Harrower - DRI and Open Data
 
Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...
Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...
Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...
 
Clare Lanigan - DRI Training Series: 3. Understanding Copyright
Clare Lanigan - DRI Training Series: 3. Understanding CopyrightClare Lanigan - DRI Training Series: 3. Understanding Copyright
Clare Lanigan - DRI Training Series: 3. Understanding Copyright
 
Rebecca Grant - DH research data: identification and challenges (DH2016)
Rebecca Grant - DH research data: identification and challenges (DH2016)Rebecca Grant - DH research data: identification and challenges (DH2016)
Rebecca Grant - DH research data: identification and challenges (DH2016)
 
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
 
Ingrid Dillo - Digital humanities challenges and the Research Data Alliance
Ingrid Dillo - Digital humanities challenges and the Research Data AllianceIngrid Dillo - Digital humanities challenges and the Research Data Alliance
Ingrid Dillo - Digital humanities challenges and the Research Data Alliance
 
Natalie Harrower - Digital Data Sharing (DH2016)
Natalie Harrower - Digital Data Sharing (DH2016)Natalie Harrower - Digital Data Sharing (DH2016)
Natalie Harrower - Digital Data Sharing (DH2016)
 
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)
 

Similaire à Stuart Kenny; Kathryn Cassidy - Experience with Ingestion of Large Collections at DRI

3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar SlidesDuraSpace
 
From Box to Hydra via Archivematica
From Box to Hydra via ArchivematicaFrom Box to Hydra via Archivematica
From Box to Hydra via ArchivematicaJisc RDM
 
London HUG
London HUGLondon HUG
London HUGBoudicca
 
Common Crawl: An Open Repository of Web Data
Common Crawl: An Open Repository of Web DataCommon Crawl: An Open Repository of Web Data
Common Crawl: An Open Repository of Web Datahuguk
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationMANENDRASINGH30
 
SWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebSWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebPascal-Nicolas Becker
 
ARK de Triumph: Linking Finding Aids & Digital Libraries Using a Low-Tech App...
ARK de Triumph: Linking Finding Aids & Digital Libraries Using a Low-Tech App...ARK de Triumph: Linking Finding Aids & Digital Libraries Using a Low-Tech App...
ARK de Triumph: Linking Finding Aids & Digital Libraries Using a Low-Tech App...Andrea Payant
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on DockerDataWorks Summit
 
What is New in W3C land?
What is New in W3C land?What is New in W3C land?
What is New in W3C land?Ivan Herman
 
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กร
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กรการประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กร
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กรDr. Thiti Vacharasintopchai, ATSI-DX, CISA
 
Web archiving challenges and opportunities
Web archiving challenges and opportunitiesWeb archiving challenges and opportunities
Web archiving challenges and opportunitiesAhmed AlSum
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Roxanne Missingham
 
NLW Linked Open Data Sets
NLW Linked Open Data SetsNLW Linked Open Data Sets
NLW Linked Open Data SetsGlen Robson
 
Fergus Fahey - DRI/ARA(I) Training: Introduction to EAD - EAD Workshop
Fergus Fahey - DRI/ARA(I) Training: Introduction to EAD - EAD WorkshopFergus Fahey - DRI/ARA(I) Training: Introduction to EAD - EAD Workshop
Fergus Fahey - DRI/ARA(I) Training: Introduction to EAD - EAD Workshopdri_ireland
 
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...WARCnet
 
Darwin Core Archive (DwC-A) validation: A New Collaborative Effort
Darwin Core Archive (DwC-A) validation: A New Collaborative EffortDarwin Core Archive (DwC-A) validation: A New Collaborative Effort
Darwin Core Archive (DwC-A) validation: A New Collaborative Effortkristgen
 
Collections.ed – Launching the University Collections Online, Ianthe Sutherla...
Collections.ed – Launching the University Collections Online, Ianthe Sutherla...Collections.ed – Launching the University Collections Online, Ianthe Sutherla...
Collections.ed – Launching the University Collections Online, Ianthe Sutherla...Repository Fringe
 
Exhibition recommendation using British Museum data and Event Registry - ESWC...
Exhibition recommendation using British Museum data and Event Registry - ESWC...Exhibition recommendation using British Museum data and Event Registry - ESWC...
Exhibition recommendation using British Museum data and Event Registry - ESWC...eswcsummerschool
 

Similaire à Stuart Kenny; Kathryn Cassidy - Experience with Ingestion of Large Collections at DRI (20)

3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
 
From Box to Hydra via Archivematica
From Box to Hydra via ArchivematicaFrom Box to Hydra via Archivematica
From Box to Hydra via Archivematica
 
London HUG
London HUGLondon HUG
London HUG
 
Common Crawl: An Open Repository of Web Data
Common Crawl: An Open Repository of Web DataCommon Crawl: An Open Repository of Web Data
Common Crawl: An Open Repository of Web Data
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and Education
 
SWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic WebSWIB14 Weaving repository contents into the Semantic Web
SWIB14 Weaving repository contents into the Semantic Web
 
ARK de Triumph: Linking Finding Aids & Digital Libraries Using a Low-Tech App...
ARK de Triumph: Linking Finding Aids & Digital Libraries Using a Low-Tech App...ARK de Triumph: Linking Finding Aids & Digital Libraries Using a Low-Tech App...
ARK de Triumph: Linking Finding Aids & Digital Libraries Using a Low-Tech App...
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
 
What is New in W3C land?
What is New in W3C land?What is New in W3C land?
What is New in W3C land?
 
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กร
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กรการประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กร
การประยุกต์ใช้ DSpace Open Source ในการจัดการความรู้ขององค์กร
 
Web archiving challenges and opportunities
Web archiving challenges and opportunitiesWeb archiving challenges and opportunities
Web archiving challenges and opportunities
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012
 
Welcome to the CTDA
Welcome to the CTDAWelcome to the CTDA
Welcome to the CTDA
 
NLW Linked Open Data Sets
NLW Linked Open Data SetsNLW Linked Open Data Sets
NLW Linked Open Data Sets
 
Fergus Fahey - DRI/ARA(I) Training: Introduction to EAD - EAD Workshop
Fergus Fahey - DRI/ARA(I) Training: Introduction to EAD - EAD WorkshopFergus Fahey - DRI/ARA(I) Training: Introduction to EAD - EAD Workshop
Fergus Fahey - DRI/ARA(I) Training: Introduction to EAD - EAD Workshop
 
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
 
Darwin Core Archive (DwC-A) validation: A New Collaborative Effort
Darwin Core Archive (DwC-A) validation: A New Collaborative EffortDarwin Core Archive (DwC-A) validation: A New Collaborative Effort
Darwin Core Archive (DwC-A) validation: A New Collaborative Effort
 
Collections.ed – Launching the University Collections Online, Ianthe Sutherla...
Collections.ed – Launching the University Collections Online, Ianthe Sutherla...Collections.ed – Launching the University Collections Online, Ianthe Sutherla...
Collections.ed – Launching the University Collections Online, Ianthe Sutherla...
 
Exhibition recommendation using British Museum data and Event Registry - ESWC...
Exhibition recommendation using British Museum data and Event Registry - ESWC...Exhibition recommendation using British Museum data and Event Registry - ESWC...
Exhibition recommendation using British Museum data and Event Registry - ESWC...
 
Quick and dirty islandora
Quick and dirty islandoraQuick and dirty islandora
Quick and dirty islandora
 

Plus de dri_ireland

NORFest 2023 Lightning Talks Session Two
NORFest 2023 Lightning Talks Session TwoNORFest 2023 Lightning Talks Session Two
NORFest 2023 Lightning Talks Session Twodri_ireland
 
NORFest 2023: Early Career Researcher Panel on Research Assessment
NORFest 2023: Early Career Researcher Panel on Research AssessmentNORFest 2023: Early Career Researcher Panel on Research Assessment
NORFest 2023: Early Career Researcher Panel on Research Assessmentdri_ireland
 
NORFest 2023: National Open Research Fund 2023, Projects Launch
NORFest 2023: National Open Research Fund 2023, Projects LaunchNORFest 2023: National Open Research Fund 2023, Projects Launch
NORFest 2023: National Open Research Fund 2023, Projects Launchdri_ireland
 
NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three dri_ireland
 
NORFest 2023 Lightning Talks Session One
NORFest 2023 Lightning Talks Session OneNORFest 2023 Lightning Talks Session One
NORFest 2023 Lightning Talks Session Onedri_ireland
 
NORFest2023 Keynote address: Chelle Gentemann (NASA)
NORFest2023 Keynote address: Chelle Gentemann (NASA)NORFest2023 Keynote address: Chelle Gentemann (NASA)
NORFest2023 Keynote address: Chelle Gentemann (NASA)dri_ireland
 
The Archiving Reproductive Health project as a FAIR data resource for humanit...
The Archiving Reproductive Health project as a FAIR data resource for humanit...The Archiving Reproductive Health project as a FAIR data resource for humanit...
The Archiving Reproductive Health project as a FAIR data resource for humanit...dri_ireland
 
Developing a self-care protocol for working with potentially traumatic data: ...
Developing a self-care protocol for working with potentially traumatic data: ...Developing a self-care protocol for working with potentially traumatic data: ...
Developing a self-care protocol for working with potentially traumatic data: ...dri_ireland
 
An Introduction to the Digital Repository of Ireland
An Introduction to the Digital Repository of Ireland An Introduction to the Digital Repository of Ireland
An Introduction to the Digital Repository of Ireland dri_ireland
 
DRI Copyright and Licencing_UCC_Mar23.pptx
DRI Copyright and Licencing_UCC_Mar23.pptxDRI Copyright and Licencing_UCC_Mar23.pptx
DRI Copyright and Licencing_UCC_Mar23.pptxdri_ireland
 
The Digital Repository of Ireland Digital Preservation and Research Sustainab...
The Digital Repository of Ireland Digital Preservation and Research Sustainab...The Digital Repository of Ireland Digital Preservation and Research Sustainab...
The Digital Repository of Ireland Digital Preservation and Research Sustainab...dri_ireland
 
DRI's role in WorldFAIR: Cultural Heritage / Image Sharing
DRI's role in WorldFAIR: Cultural Heritage / Image SharingDRI's role in WorldFAIR: Cultural Heritage / Image Sharing
DRI's role in WorldFAIR: Cultural Heritage / Image Sharingdri_ireland
 
Introduction to research data management
Introduction to research data managementIntroduction to research data management
Introduction to research data managementdri_ireland
 
Archiving Ports, Ports as Archives
Archiving Ports, Ports as ArchivesArchiving Ports, Ports as Archives
Archiving Ports, Ports as Archivesdri_ireland
 
Preservation, Access, Discovery
Preservation, Access, DiscoveryPreservation, Access, Discovery
Preservation, Access, Discoverydri_ireland
 
Dublin in the Fingal Archives
Dublin in the Fingal ArchivesDublin in the Fingal Archives
Dublin in the Fingal Archivesdri_ireland
 
Dublin Ghost Signs
Dublin Ghost SignsDublin Ghost Signs
Dublin Ghost Signsdri_ireland
 
Mapping Memories: Participatory Media, Place-Based Stories, Refugee Youth
Mapping Memories: Participatory Media, Place-Based Stories, Refugee YouthMapping Memories: Participatory Media, Place-Based Stories, Refugee Youth
Mapping Memories: Participatory Media, Place-Based Stories, Refugee Youthdri_ireland
 
Supporting Activists to Preserve Video Documentation
Supporting Activists to Preserve Video Documentation Supporting Activists to Preserve Video Documentation
Supporting Activists to Preserve Video Documentation dri_ireland
 
Making the Future
Making the FutureMaking the Future
Making the Futuredri_ireland
 

Plus de dri_ireland (20)

NORFest 2023 Lightning Talks Session Two
NORFest 2023 Lightning Talks Session TwoNORFest 2023 Lightning Talks Session Two
NORFest 2023 Lightning Talks Session Two
 
NORFest 2023: Early Career Researcher Panel on Research Assessment
NORFest 2023: Early Career Researcher Panel on Research AssessmentNORFest 2023: Early Career Researcher Panel on Research Assessment
NORFest 2023: Early Career Researcher Panel on Research Assessment
 
NORFest 2023: National Open Research Fund 2023, Projects Launch
NORFest 2023: National Open Research Fund 2023, Projects LaunchNORFest 2023: National Open Research Fund 2023, Projects Launch
NORFest 2023: National Open Research Fund 2023, Projects Launch
 
NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three
 
NORFest 2023 Lightning Talks Session One
NORFest 2023 Lightning Talks Session OneNORFest 2023 Lightning Talks Session One
NORFest 2023 Lightning Talks Session One
 
NORFest2023 Keynote address: Chelle Gentemann (NASA)
NORFest2023 Keynote address: Chelle Gentemann (NASA)NORFest2023 Keynote address: Chelle Gentemann (NASA)
NORFest2023 Keynote address: Chelle Gentemann (NASA)
 
The Archiving Reproductive Health project as a FAIR data resource for humanit...
The Archiving Reproductive Health project as a FAIR data resource for humanit...The Archiving Reproductive Health project as a FAIR data resource for humanit...
The Archiving Reproductive Health project as a FAIR data resource for humanit...
 
Developing a self-care protocol for working with potentially traumatic data: ...
Developing a self-care protocol for working with potentially traumatic data: ...Developing a self-care protocol for working with potentially traumatic data: ...
Developing a self-care protocol for working with potentially traumatic data: ...
 
An Introduction to the Digital Repository of Ireland
An Introduction to the Digital Repository of Ireland An Introduction to the Digital Repository of Ireland
An Introduction to the Digital Repository of Ireland
 
DRI Copyright and Licencing_UCC_Mar23.pptx
DRI Copyright and Licencing_UCC_Mar23.pptxDRI Copyright and Licencing_UCC_Mar23.pptx
DRI Copyright and Licencing_UCC_Mar23.pptx
 
The Digital Repository of Ireland Digital Preservation and Research Sustainab...
The Digital Repository of Ireland Digital Preservation and Research Sustainab...The Digital Repository of Ireland Digital Preservation and Research Sustainab...
The Digital Repository of Ireland Digital Preservation and Research Sustainab...
 
DRI's role in WorldFAIR: Cultural Heritage / Image Sharing
DRI's role in WorldFAIR: Cultural Heritage / Image SharingDRI's role in WorldFAIR: Cultural Heritage / Image Sharing
DRI's role in WorldFAIR: Cultural Heritage / Image Sharing
 
Introduction to research data management
Introduction to research data managementIntroduction to research data management
Introduction to research data management
 
Archiving Ports, Ports as Archives
Archiving Ports, Ports as ArchivesArchiving Ports, Ports as Archives
Archiving Ports, Ports as Archives
 
Preservation, Access, Discovery
Preservation, Access, DiscoveryPreservation, Access, Discovery
Preservation, Access, Discovery
 
Dublin in the Fingal Archives
Dublin in the Fingal ArchivesDublin in the Fingal Archives
Dublin in the Fingal Archives
 
Dublin Ghost Signs
Dublin Ghost SignsDublin Ghost Signs
Dublin Ghost Signs
 
Mapping Memories: Participatory Media, Place-Based Stories, Refugee Youth
Mapping Memories: Participatory Media, Place-Based Stories, Refugee YouthMapping Memories: Participatory Media, Place-Based Stories, Refugee Youth
Mapping Memories: Participatory Media, Place-Based Stories, Refugee Youth
 
Supporting Activists to Preserve Video Documentation
Supporting Activists to Preserve Video Documentation Supporting Activists to Preserve Video Documentation
Supporting Activists to Preserve Video Documentation
 
Making the Future
Making the FutureMaking the Future
Making the Future
 

Dernier

Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一F sss
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excelysmaelreyes
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 

Dernier (20)

Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excel
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 

Stuart Kenny; Kathryn Cassidy - Experience with Ingestion of Large Collections at DRI

  • 1. Experience with Ingestion of Large Collections Stuart Kenny Research IT Trinity College Dublin
  • 2. Stuart Kenny Research IT Trinity College Dublin The Fairy Tales of Charles Perrault. Illustrated by Harry Clarke. Intro. Thomas Bodkin. London: George G. Harrap, [1922]. Internet Archive version of a copy in the New York Public Library. Web. 25 December 2012. My what a big collection you have!
  • 3. About DRI (https://repository.dri.ie/) ● DRI is an interactive trusted digital repository for contemporary and historical, social and cultural data held by Irish institutions ● RIA (lead), NUIM, TCD, DIT, NUIG, NCAD ● Partners: academic, cultural, social, government
  • 4. Outline • What’s our problem? • Example collections • Ingest solutions • Current ingest process • Possible future process
  • 5. Ingesting Objects • Ingest form o Suitable for single objects/small collections o Flat hierarchies o Simple metadata standards • Multiple standards o e.g., MARC, EAD o XML upload • How to handle complex standards, many objects?
  • 6.
  • 7. Example Collection: Clarke Stained Glass • MODS metadata • 10,025 objects • 42 sub-collections • 20,047 files, 2.82 TB • Problems: o Large number of objects o Data transfer
  • 8. Example Collection: TCD Children’s Books • MARC metadata • 207,889 objects • 16 sub-collections • Problems: o Large number of objects o Very slow to ingest o Timeouts and errors
  • 9. Example Collection: Kilkenny Design Workshop • EAD metadata • 2,040 objects • 2,734 series/files • 2,231 files, 1.2GB • Problems: o Very complex metadata standard o Hierarchical structure
  • 10. EAD, and why I don’t quite hate it as much as I did... • Single XML file upload • Structure encoded in metadata • URLs to files • But o One-shot ingest o How to edit/update? o Slow to ingest o Requires a lot of resources
  • 11. Sufia Batch Upload • Add multiple files • New work for each • Metadata for each work • How to handle multiple standards? • Different metadata for each work?
  • 12. Avalon Batch Ingest • Ingest package o Manifest file o Plus content files • Manifest file is spreadsheet o Metadata for items o Names of content files • Ingest package uploaded to Avalon DropBox
  • 13. Approach up to now • Command line client o Enter text commands at ‘command prompt’ • Written in Ruby • Run locally by user • Metadata and asset files arranged in fixed directory structure • Client iterates over directory creates each object as single ingest
  • 14.
  • 15. Problems • Lack of user familiarity with command line • Multiple platform support o i.e., Windows • Difficulty of installing • Multiple single ingests o Slow o Error prone • Required lots of user support • Mostly in the end ingests performed by dev team
  • 16. Current Attempt • Web-based UI • Borrow heavily from Avalon approach • Upload metadata XML plus assets to online storage • Add manifest spreadsheet o Each row contains path to metadata o Paths to zero or more asset files o Paths relative to online storage directory • Backend processes manifest and ingests as background task • UI updates status
  • 18. • Hydra BrowseEverything o Gem to access cloud storage o DropBox, Google Drive… • User uploads files • In UI selects collection and manifest to ingest • Everything handled server side in background • Can view status in UI
  • 19.
  • 20.
  • 21. Outstanding Issues • Online storage o Dropbox type storage size limits • Creating spreadsheet less easy than directory structure • Possible solutions o Provide online storage o Has to be per user o Generate required manifest from uploaded directory structure