Scaling up to archive the UK Web
Helen Hockx-Yu
www.bl.uk 2
Web Archiving Timeline
2001-2002: Explore
• Launch Domain.UK project
• No public access
2003-2008: Collaborate
• Establish Web Archiving Programme
• Lead UK Web Archiving Consortium
• Launch UK Web Archive
2008-2011: Build capacity
• People, systems and processes
• Curatorial expertise
• Technical know-how
2011: BAU
• Web Archiving as operational unit
• Implement non-print Legal Deposit since April 2013
Before (6 April 2013)
• Selective archiving of websites that
– reflect the diversity of lives, interests and activities throughout the UK
– contain research value or are of research interest
– feature political, cultural, social and economic events of national interest
– demonstrate innovative use of the web
– Also prioritise websites at risk and web-only content
• Permission based
– Permission to archive, to provide online access and to preserve. Also ask for 3rd party rights clearance
– 30% success rate, 5% explicit refusal (mostly due to 3rd party rights)
• Online access through UK Web Archive
Toolset
• Selection and Permission Tool
– selection and permission management
– Integrated with the Web Curator Tool
• Web Curator Tool
– Job scheduling
– Metadata
– Access control
– Harvesting (uses Heritrix)
– QA
• Indexing and SIP generation – scripts and Solr (for full-text index)
• Wayback – rendering tool for WARCs
• UK Web Archive – web-based end user interface
Access
• Currently 3 ways to access the web archive
– Online through the UK Web Archive
– Catalogue records (of special collections)
– Keyword search through Primo (corporate resource discovery system)
• Conduct researcher survey / research projects to understand requirements
A catalogue record for a collection
Keyword search through Primo
UK Web Archive
• 14,118 websites, 60,482 instances, 17.6TB WARCs
• Over 182,761 unique visits, 1st April 2012 to 31st March 2013
• Key websites include videos
• Full-text, N-gram, title and URL search
• Browse by subject / special collection, visual browsing
• Analytical access
http://www.webarchive.org.uk
Analytical access
• Shift of focus from the level of single webpages or websites to the entire web archive collection
• Use web archives as datasets, access to metadata and knowledge about websites
• Support survey, annotation, contextualisation and visualisation
• Allows discovery of patterns, trends and relationships in inter-linked web pages
• Helps address a number of challenging issues
– Scalability
– Accessibility of individual websites
– Components missed by crawlers
After (6 April 2013)
• Government introduced Non-print Legal Deposit Regulations 2013
• Apply to material published digitally and online, including articles, books and websites
• 6 UK Legal Deposit Libraries
• Deposited content accessible “on library premises controlled by the deposit library”
– after 7 days of collection or deposit
– Single concurrent access
– Catalogue records allowed to be searchable online
– Digital copying not permitted
Legal Deposit of UK websites
• In scope
– Sites that use a .uk or other UK geographic top-level domain
– Sites where part of the publishing process takes place in the UK
• Will not archive
– sites concerning film and recorded sound where the audio-visual content predominates
– private intranets and emails
• Over 10 million .uk registered domains
– 4th largest TLD, after .com, .de and .net
– UK organisations also use non-.uk domain names (e.g. .com or .org)
– scale unknown
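The domain-based part of the scope rule could be sketched roughly as below. This is an illustration only, not the Library's actual crawler configuration: the function name `in_scope_by_domain` is hypothetical, and the TLDs listed besides `uk` are assumptions, since the slide does not enumerate the "other UK geographic" domains.

```python
from urllib.parse import urlsplit

# Assumption: illustrative list. Only ".uk" is named on the slide; the other
# entries stand in for "other UK geographic top-level domains".
UK_GEOGRAPHIC_TLDS = {"uk", "scot", "wales", "cymru", "london"}


def in_scope_by_domain(url: str) -> bool:
    """Return True if the URL's host sits under a UK geographic TLD."""
    host = urlsplit(url).hostname  # None when the URL has no network location
    if host is None:
        return False
    # Take the last dot-separated label of the host as the TLD.
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in UK_GEOGRAPHIC_TLDS


if __name__ == "__main__":
    for candidate in ("http://www.webarchive.org.uk", "https://example.com/uk"):
        print(candidate, in_scope_by_domain(candidate))
```

A check like this covers only the first criterion. Whether part of the publishing process takes place in the UK cannot be read off a URL, which is one reason the number of in-scope non-.uk sites is unknown.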
Collecting strategy
Domain crawl:
• Broad sweep of UK domain
• Once or twice a year
Events & key sites and news:
• Events of UK interest
• High value, high impact sites
• National & regional news
Special Collection:
• Focused, thematic collections
• Support priority subjects
(Diagram labels: Domain Crawl; News; Key sites; Events; Special collection, shown four times)
Access strategy
• Deposited content cannot be accessed outside the reading rooms.
• Online access can be provided to metadata and selected content to showcase the Legal Deposit web archive of the UK
– Bibliographic metadata
– Analysis and visualisation of aggregated content
– Statistical and contextual data
– Copy of deposited content with direct permission
• For sites from outside the UK, permission both to harvest and for public access will be required
Before and after: what has changed
• Everything!
Aspect | BEFORE | AFTER
Scale | 14,000 | 4 – 5 million
Purpose | Advocacy, demonstrating benefits | Legal Deposit
Workflow (and tools) | Selection prior to harvesting | Selection / curation can happen post harvesting
Permission to archive | Required | Can collect in-scope material without permission
Access | Online | Reading rooms only (unless with direct permission for online access)
Nature of QA | Quality control leading to deselection | Flagging up quality issues
Ownership | British Library | Legal Deposit Libraries
Progress
• Experimental domain crawl, August to December 2012, no access
– Started with 4.8 million seeds
– Collected 27TB of data plus 1TB of crawl logs
• 1st Legal Deposit domain crawl started in April 2013
– Started with 3.8 million seeds
– Ran from 8th April to 21st June and collected over 31TB of data
• Focused collection on National Health Service Reform
– Showcase end-to-end processes including ingest and access in reading room in early July
• Selecting key sites, news sites and events
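For a rough sense of scale, the two crawls can be compared by average data collected per seed. The sketch below uses only the figures quoted above and rests on two assumptions: "over 31TB" is treated as exactly 31, so the second ratio is a lower bound, and 1 TB is taken as 1,000,000 MB. A seed is a starting URL, not a whole website, so the ratio is indicative only.

```python
# Figures quoted on the Progress slide. "Over 31TB" is entered as 31.0,
# which makes the second ratio a lower bound (assumption).
CRAWLS = {
    "experimental_2012": {"seeds": 4_800_000, "data_tb": 27.0},
    "legal_deposit_2013": {"seeds": 3_800_000, "data_tb": 31.0},
}


def mb_per_seed(seeds: int, data_tb: float) -> float:
    """Average data collected per seed in MB, assuming 1 TB = 1,000,000 MB."""
    return data_tb * 1_000_000 / seeds


if __name__ == "__main__":
    for name, crawl in CRAWLS.items():
        print(f"{name}: {mb_per_seed(**crawl):.1f} MB per seed")
```

On these inputs the experimental crawl averages about 5.6 MB per seed and the first Legal Deposit crawl at least about 8.2 MB per seed.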
Collection: National Health Service (NHS) Reform
Challenges
• Legal deposit territoriality and scope
• Advanced content
• User experience
• Monitoring software, rendering engine
• Change of business processes
Thank you

Contenu connexe

Tendances

LoCloud: Local Content in a Europeana Cloud
LoCloud: Local Content in a Europeana CloudLoCloud: Local Content in a Europeana Cloud
LoCloud: Local Content in a Europeana Cloudlocloud
 
Ktisis: Building an Open Access Institutional and Cultural Repository
Ktisis: Building an Open Access Institutional and Cultural RepositoryKtisis: Building an Open Access Institutional and Cultural Repository
Ktisis: Building an Open Access Institutional and Cultural RepositoryLibrary and Information Services
 
WEB ARCHIVING PROJECTS END-USER PERSPECTIVE
WEB ARCHIVING PROJECTS END-USER PERSPECTIVEWEB ARCHIVING PROJECTS END-USER PERSPECTIVE
WEB ARCHIVING PROJECTS END-USER PERSPECTIVEBogdan Trifunovic
 
What’s in a URL? Analysing COVID-19 web archive collections
What’s in a URL? Analysing COVID-19 web archive collectionsWhat’s in a URL? Analysing COVID-19 web archive collections
What’s in a URL? Analysing COVID-19 web archive collectionsWARCnet
 
Consolidating Openness : Developing Rijksmuseum Research Services
Consolidating Openness : Developing Rijksmuseum Research ServicesConsolidating Openness : Developing Rijksmuseum Research Services
Consolidating Openness : Developing Rijksmuseum Research ServicesSaskia Scheltjens
 
Anne couteux - Audiovisual archiving at Ina
Anne couteux - Audiovisual archiving at InaAnne couteux - Audiovisual archiving at Ina
Anne couteux - Audiovisual archiving at InaFIAT/IFTA
 
The LoCloud lightweight digital library and alternative content sources, Adam...
The LoCloud lightweight digital library and alternative content sources, Adam...The LoCloud lightweight digital library and alternative content sources, Adam...
The LoCloud lightweight digital library and alternative content sources, Adam...locloud
 
Local content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providersLocal content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providerslocloud
 
National Services for Web Archiving: a way to preserve and provide access to...
National Services for Web Archiving: a way to preserve and  provide access to...National Services for Web Archiving: a way to preserve and  provide access to...
National Services for Web Archiving: a way to preserve and provide access to...Paulo Leitao
 
LoCloud: Local Cultural Heritage Online and in the Cloud
LoCloud: Local Cultural Heritage Online and in the CloudLoCloud: Local Cultural Heritage Online and in the Cloud
LoCloud: Local Cultural Heritage Online and in the Cloudlocloud
 
Tuesday 5 May: IIPC activities, Olga Holownia, IIPC
Tuesday 5 May: IIPC activities, Olga Holownia, IIPCTuesday 5 May: IIPC activities, Olga Holownia, IIPC
Tuesday 5 May: IIPC activities, Olga Holownia, IIPCWARCnet
 
A National Library's Digitisation Guide for Digital Humanists
A National Library's Digitisation Guide for Digital HumanistsA National Library's Digitisation Guide for Digital Humanists
A National Library's Digitisation Guide for Digital HumanistsRossitza Atanassova
 
20yrs: 2005_warwick3
20yrs: 2005_warwick3 20yrs: 2005_warwick3
20yrs: 2005_warwick3 Neil Beagrie
 
Seminar 20111122 - MediaMosa projects
Seminar 20111122 - MediaMosa projectsSeminar 20111122 - MediaMosa projects
Seminar 20111122 - MediaMosa projectsMediaMosa
 
A Wiki for Archivists? ArchiefWiki.org
A Wiki for Archivists? ArchiefWiki.orgA Wiki for Archivists? ArchiefWiki.org
A Wiki for Archivists? ArchiefWiki.orgTom Cobbaert
 
2017 IIIF Conference - The Vatican - SACHA
2017 IIIF Conference - The Vatican - SACHA2017 IIIF Conference - The Vatican - SACHA
2017 IIIF Conference - The Vatican - SACHAGeorg Petz
 
20yrs: 2013 Screening the Future
20yrs: 2013 Screening the Future20yrs: 2013 Screening the Future
20yrs: 2013 Screening the FutureNeil Beagrie
 

Tendances (20)

LoCloud: Local Content in a Europeana Cloud
LoCloud: Local Content in a Europeana CloudLoCloud: Local Content in a Europeana Cloud
LoCloud: Local Content in a Europeana Cloud
 
Ktisis: Building an Open Access Institutional and Cultural Repository
Ktisis: Building an Open Access Institutional and Cultural RepositoryKtisis: Building an Open Access Institutional and Cultural Repository
Ktisis: Building an Open Access Institutional and Cultural Repository
 
WEB ARCHIVING PROJECTS END-USER PERSPECTIVE
WEB ARCHIVING PROJECTS END-USER PERSPECTIVEWEB ARCHIVING PROJECTS END-USER PERSPECTIVE
WEB ARCHIVING PROJECTS END-USER PERSPECTIVE
 
Webarchiv - Curatorial approaches, topic collections and cooperation with the...
Webarchiv - Curatorial approaches, topic collections and cooperation with the...Webarchiv - Curatorial approaches, topic collections and cooperation with the...
Webarchiv - Curatorial approaches, topic collections and cooperation with the...
 
What’s in a URL? Analysing COVID-19 web archive collections
What’s in a URL? Analysing COVID-19 web archive collectionsWhat’s in a URL? Analysing COVID-19 web archive collections
What’s in a URL? Analysing COVID-19 web archive collections
 
Consolidating Openness : Developing Rijksmuseum Research Services
Consolidating Openness : Developing Rijksmuseum Research ServicesConsolidating Openness : Developing Rijksmuseum Research Services
Consolidating Openness : Developing Rijksmuseum Research Services
 
Anne couteux - Audiovisual archiving at Ina
Anne couteux - Audiovisual archiving at InaAnne couteux - Audiovisual archiving at Ina
Anne couteux - Audiovisual archiving at Ina
 
The LoCloud lightweight digital library and alternative content sources, Adam...
The LoCloud lightweight digital library and alternative content sources, Adam...The LoCloud lightweight digital library and alternative content sources, Adam...
The LoCloud lightweight digital library and alternative content sources, Adam...
 
Local content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providersLocal content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providers
 
National Services for Web Archiving: a way to preserve and provide access to...
National Services for Web Archiving: a way to preserve and  provide access to...National Services for Web Archiving: a way to preserve and  provide access to...
National Services for Web Archiving: a way to preserve and provide access to...
 
Luca Martinelli Europeana
Luca Martinelli EuropeanaLuca Martinelli Europeana
Luca Martinelli Europeana
 
LoCloud: Local Cultural Heritage Online and in the Cloud
LoCloud: Local Cultural Heritage Online and in the CloudLoCloud: Local Cultural Heritage Online and in the Cloud
LoCloud: Local Cultural Heritage Online and in the Cloud
 
Tuesday 5 May: IIPC activities, Olga Holownia, IIPC
Tuesday 5 May: IIPC activities, Olga Holownia, IIPCTuesday 5 May: IIPC activities, Olga Holownia, IIPC
Tuesday 5 May: IIPC activities, Olga Holownia, IIPC
 
A National Library's Digitisation Guide for Digital Humanists
A National Library's Digitisation Guide for Digital HumanistsA National Library's Digitisation Guide for Digital Humanists
A National Library's Digitisation Guide for Digital Humanists
 
Work Package 4 - Month 6 by Sam Leon
Work Package 4 - Month 6 by Sam LeonWork Package 4 - Month 6 by Sam Leon
Work Package 4 - Month 6 by Sam Leon
 
20yrs: 2005_warwick3
20yrs: 2005_warwick3 20yrs: 2005_warwick3
20yrs: 2005_warwick3
 
Seminar 20111122 - MediaMosa projects
Seminar 20111122 - MediaMosa projectsSeminar 20111122 - MediaMosa projects
Seminar 20111122 - MediaMosa projects
 
A Wiki for Archivists? ArchiefWiki.org
A Wiki for Archivists? ArchiefWiki.orgA Wiki for Archivists? ArchiefWiki.org
A Wiki for Archivists? ArchiefWiki.org
 
2017 IIIF Conference - The Vatican - SACHA
2017 IIIF Conference - The Vatican - SACHA2017 IIIF Conference - The Vatican - SACHA
2017 IIIF Conference - The Vatican - SACHA
 
20yrs: 2013 Screening the Future
20yrs: 2013 Screening the Future20yrs: 2013 Screening the Future
20yrs: 2013 Screening the Future
 

En vedette

Un paseo por Aragón a través de la Biblioteca Nacional de España. Ana Santos ...
Un paseo por Aragón a través de la Biblioteca Nacional de España. Ana Santos ...Un paseo por Aragón a través de la Biblioteca Nacional de España. Ana Santos ...
Un paseo por Aragón a través de la Biblioteca Nacional de España. Ana Santos ...Biblioteca Nacional de España
 
Biblioteca Digital del Patrimonio Iberoamericano: normalización y tecnología ...
Biblioteca Digital del Patrimonio Iberoamericano: normalización y tecnología ...Biblioteca Digital del Patrimonio Iberoamericano: normalización y tecnología ...
Biblioteca Digital del Patrimonio Iberoamericano: normalización y tecnología ...Biblioteca Nacional de España
 
Comunicacion ME2.0: influencia de la marca personal en la organización. Nuria...
Comunicacion ME2.0: influencia de la marca personal en la organización. Nuria...Comunicacion ME2.0: influencia de la marca personal en la organización. Nuria...
Comunicacion ME2.0: influencia de la marca personal en la organización. Nuria...Biblioteca Nacional de España
 
PADICAT, el archivo web de la Biblioteca de Catalunya. Eugènia Serra
PADICAT, el archivo web de la Biblioteca de Catalunya. Eugènia SerraPADICAT, el archivo web de la Biblioteca de Catalunya. Eugènia Serra
PADICAT, el archivo web de la Biblioteca de Catalunya. Eugènia SerraBiblioteca Nacional de España
 
Mitos y realidades de las redes sociales e Internet. Mario Tascón
Mitos y realidades de las redes sociales e Internet. Mario TascónMitos y realidades de las redes sociales e Internet. Mario Tascón
Mitos y realidades de las redes sociales e Internet. Mario TascónBiblioteca Nacional de España
 
Una nueva Ley de Depósito Legal. Un gran éxito y un gran reto. Concha Jiménez...
Una nueva Ley de Depósito Legal. Un gran éxito y un gran reto. Concha Jiménez...Una nueva Ley de Depósito Legal. Un gran éxito y un gran reto. Concha Jiménez...
Una nueva Ley de Depósito Legal. Un gran éxito y un gran reto. Concha Jiménez...Biblioteca Nacional de España
 
Productos y servicios de información de AENOR de interés para bibliotecas. Ro...
Productos y servicios de información de AENOR de interés para bibliotecas. Ro...Productos y servicios de información de AENOR de interés para bibliotecas. Ro...
Productos y servicios de información de AENOR de interés para bibliotecas. Ro...Biblioteca Nacional de España
 

En vedette (8)

Un paseo por Aragón a través de la Biblioteca Nacional de España. Ana Santos ...
Un paseo por Aragón a través de la Biblioteca Nacional de España. Ana Santos ...Un paseo por Aragón a través de la Biblioteca Nacional de España. Ana Santos ...
Un paseo por Aragón a través de la Biblioteca Nacional de España. Ana Santos ...
 
Biblioteca Digital del Patrimonio Iberoamericano: normalización y tecnología ...
Biblioteca Digital del Patrimonio Iberoamericano: normalización y tecnología ...Biblioteca Digital del Patrimonio Iberoamericano: normalización y tecnología ...
Biblioteca Digital del Patrimonio Iberoamericano: normalización y tecnología ...
 
Comunicacion ME2.0: influencia de la marca personal en la organización. Nuria...
Comunicacion ME2.0: influencia de la marca personal en la organización. Nuria...Comunicacion ME2.0: influencia de la marca personal en la organización. Nuria...
Comunicacion ME2.0: influencia de la marca personal en la organización. Nuria...
 
PADICAT, el archivo web de la Biblioteca de Catalunya. Eugènia Serra
PADICAT, el archivo web de la Biblioteca de Catalunya. Eugènia SerraPADICAT, el archivo web de la Biblioteca de Catalunya. Eugènia Serra
PADICAT, el archivo web de la Biblioteca de Catalunya. Eugènia Serra
 
Mitos y realidades de las redes sociales e Internet. Mario Tascón
Mitos y realidades de las redes sociales e Internet. Mario TascónMitos y realidades de las redes sociales e Internet. Mario Tascón
Mitos y realidades de las redes sociales e Internet. Mario Tascón
 
Una nueva Ley de Depósito Legal. Un gran éxito y un gran reto. Concha Jiménez...
Una nueva Ley de Depósito Legal. Un gran éxito y un gran reto. Concha Jiménez...Una nueva Ley de Depósito Legal. Un gran éxito y un gran reto. Concha Jiménez...
Una nueva Ley de Depósito Legal. Un gran éxito y un gran reto. Concha Jiménez...
 
Productos y servicios de información de AENOR de interés para bibliotecas. Ro...
Productos y servicios de información de AENOR de interés para bibliotecas. Ro...Productos y servicios de información de AENOR de interés para bibliotecas. Ro...
Productos y servicios de información de AENOR de interés para bibliotecas. Ro...
 
Una nueva Ley de Depósito Legal
Una nueva Ley de Depósito LegalUna nueva Ley de Depósito Legal
Una nueva Ley de Depósito Legal
 

Similaire à Scaling up to archive the UK Web. Helen Hockx-Yu

Digging into the Web Archive at the British Library 2014-11-27
Digging into the Web Archive at the British Library 2014-11-27Digging into the Web Archive at the British Library 2014-11-27
Digging into the Web Archive at the British Library 2014-11-27Andy Jackson
 
IWMW 2006: Archiving the Web What can Institutions learn from National and In...
IWMW 2006: Archiving the Web What can Institutions learn from National and In...IWMW 2006: Archiving the Web What can Institutions learn from National and In...
IWMW 2006: Archiving the Web What can Institutions learn from National and In...IWMW
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Roxanne Missingham
 
Web archiving challenges and opportunities
Web archiving challenges and opportunitiesWeb archiving challenges and opportunities
Web archiving challenges and opportunitiesAhmed AlSum
 
Introduction to British Library digital resources for social scientists
Introduction to British Library digital resources for social scientistsIntroduction to British Library digital resources for social scientists
Introduction to British Library digital resources for social scientistsjohnkayebl
 
Continuity and change: Opportunities and challenges for the future of researc...
Continuity and change: Opportunities and challenges for the future of researc...Continuity and change: Opportunities and challenges for the future of researc...
Continuity and change: Opportunities and challenges for the future of researc...Michael Day
 
Web Preservation, or Managing your Organisation’s Online Presence After the O...
Web Preservation, or Managing your Organisation’s Online Presence After the O...Web Preservation, or Managing your Organisation’s Online Presence After the O...
Web Preservation, or Managing your Organisation’s Online Presence After the O...lisbk
 
Prospects and pitfalls in using web archives for research
Prospects and pitfalls in using web archives for researchProspects and pitfalls in using web archives for research
Prospects and pitfalls in using web archives for researchPeter Webster
 
Implementing digital preservation strategy: collection profiling at the Briti...
Implementing digital preservation strategy: collection profiling at the Briti...Implementing digital preservation strategy: collection profiling at the Briti...
Implementing digital preservation strategy: collection profiling at the Briti...Michael Day
 
Archiving Web-Based #musetech for Institutional Memory
Archiving Web-Based #musetech for Institutional MemoryArchiving Web-Based #musetech for Institutional Memory
Archiving Web-Based #musetech for Institutional MemorySamantha Norling
 
Preservation planning at the British Library
Preservation planning at the British LibraryPreservation planning at the British Library
Preservation planning at the British LibraryMichael Day
 
NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three dri_ireland
 
Investigating the PROMISE of a Belgian web archive
Investigating the PROMISE of a Belgian web archive Investigating the PROMISE of a Belgian web archive
Investigating the PROMISE of a Belgian web archive Sally Chambers
 

Similaire à Scaling up to archive the UK Web. Helen Hockx-Yu (20)

Digging into the Web Archive at the British Library 2014-11-27
Digging into the Web Archive at the British Library 2014-11-27Digging into the Web Archive at the British Library 2014-11-27
Digging into the Web Archive at the British Library 2014-11-27
 
IWMW 2006: Archiving the Web What can Institutions learn from National and In...
IWMW 2006: Archiving the Web What can Institutions learn from National and In...IWMW 2006: Archiving the Web What can Institutions learn from National and In...
IWMW 2006: Archiving the Web What can Institutions learn from National and In...
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012
 
Aglin
AglinAglin
Aglin
 
Web archiving challenges and opportunities
Web archiving challenges and opportunitiesWeb archiving challenges and opportunities
Web archiving challenges and opportunities
 
Pandora
PandoraPandora
Pandora
 
Introduction to British Library digital resources for social scientists
Introduction to British Library digital resources for social scientistsIntroduction to British Library digital resources for social scientists
Introduction to British Library digital resources for social scientists
 
Continuity and change: Opportunities and challenges for the future of researc...
Continuity and change: Opportunities and challenges for the future of researc...Continuity and change: Opportunities and challenges for the future of researc...
Continuity and change: Opportunities and challenges for the future of researc...
 
Web Preservation, or Managing your Organisation’s Online Presence After the O...
Web Preservation, or Managing your Organisation’s Online Presence After the O...Web Preservation, or Managing your Organisation’s Online Presence After the O...
Web Preservation, or Managing your Organisation’s Online Presence After the O...
 
Prospects and pitfalls in using web archives for research
Prospects and pitfalls in using web archives for researchProspects and pitfalls in using web archives for research
Prospects and pitfalls in using web archives for research
 
8 a pleased
8 a pleased8 a pleased
8 a pleased
 
Implementing digital preservation strategy: collection profiling at the Briti...
Implementing digital preservation strategy: collection profiling at the Briti...Implementing digital preservation strategy: collection profiling at the Briti...
Implementing digital preservation strategy: collection profiling at the Briti...
 
EDINA Serials UKLA SafeNet
EDINA Serials UKLA SafeNetEDINA Serials UKLA SafeNet
EDINA Serials UKLA SafeNet
 
Archiving Web-Based #musetech for Institutional Memory
Archiving Web-Based #musetech for Institutional MemoryArchiving Web-Based #musetech for Institutional Memory
Archiving Web-Based #musetech for Institutional Memory
 
Preservation planning at the British Library
Preservation planning at the British LibraryPreservation planning at the British Library
Preservation planning at the British Library
 
NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three
 
Internet content as research data
Internet content as research dataInternet content as research data
Internet content as research data
 
Investigating the PROMISE of a Belgian web archive
Investigating the PROMISE of a Belgian web archive Investigating the PROMISE of a Belgian web archive
Investigating the PROMISE of a Belgian web archive
 
WAPWG Jan 2020 Rossi
WAPWG Jan 2020 Rossi WAPWG Jan 2020 Rossi
WAPWG Jan 2020 Rossi
 
Bill Stockting - UKAD Forum 2016
Bill Stockting - UKAD Forum 2016Bill Stockting - UKAD Forum 2016
Bill Stockting - UKAD Forum 2016
 

Plus de Biblioteca Nacional de España

La colección de relaciones de sucesos en la Biblioteca Nacional de España
La colección de relaciones de sucesos en la Biblioteca Nacional de EspañaLa colección de relaciones de sucesos en la Biblioteca Nacional de España
La colección de relaciones de sucesos en la Biblioteca Nacional de EspañaBiblioteca Nacional de España
 
Identidad común: las fuentes del patrimonio bibliográfico. Ana Santos Aramburo
Identidad común: las fuentes del patrimonio bibliográfico. Ana Santos AramburoIdentidad común: las fuentes del patrimonio bibliográfico. Ana Santos Aramburo
Identidad común: las fuentes del patrimonio bibliográfico. Ana Santos AramburoBiblioteca Nacional de España
 
La Biblioteca Nacional de España como centro de apoyo a la investigación. Ana...
La Biblioteca Nacional de España como centro de apoyo a la investigación. Ana...La Biblioteca Nacional de España como centro de apoyo a la investigación. Ana...
La Biblioteca Nacional de España como centro de apoyo a la investigación. Ana...Biblioteca Nacional de España
 
RDA. Autoridades. Fundamentos. Identificación de entidades. Relaciones
RDA. Autoridades. Fundamentos. Identificación de entidades. RelacionesRDA. Autoridades. Fundamentos. Identificación de entidades. Relaciones
RDA. Autoridades. Fundamentos. Identificación de entidades. RelacionesBiblioteca Nacional de España
 
Pleno del Real Patronato. Biblioteca Nacional de España
Pleno del Real Patronato. Biblioteca Nacional de EspañaPleno del Real Patronato. Biblioteca Nacional de España
Pleno del Real Patronato. Biblioteca Nacional de EspañaBiblioteca Nacional de España
 
Objetivos 2019. Pleno del Real Patronato. Biblioteca Nacional de España
Objetivos 2019. Pleno del Real Patronato. Biblioteca Nacional de EspañaObjetivos 2019. Pleno del Real Patronato. Biblioteca Nacional de España
Objetivos 2019. Pleno del Real Patronato. Biblioteca Nacional de EspañaBiblioteca Nacional de España
 
Pleno del Real Patronato. Biblioteca Nacional de España. Evaluación actuacion...
Pleno del Real Patronato. Biblioteca Nacional de España. Evaluación actuacion...Pleno del Real Patronato. Biblioteca Nacional de España. Evaluación actuacion...
Pleno del Real Patronato. Biblioteca Nacional de España. Evaluación actuacion...Biblioteca Nacional de España
 
Evaluación actuaciones 2018. Planificación actuaciones 2019
Evaluación actuaciones 2018. Planificación actuaciones 2019Evaluación actuaciones 2018. Planificación actuaciones 2019
Evaluación actuaciones 2018. Planificación actuaciones 2019Biblioteca Nacional de España
 
Pleno CCB. Consejo de Cooperación Bibliotecaria. Ana Santos Aramburo
Pleno CCB. Consejo de Cooperación Bibliotecaria. Ana Santos AramburoPleno CCB. Consejo de Cooperación Bibliotecaria. Ana Santos Aramburo
Pleno CCB. Consejo de Cooperación Bibliotecaria. Ana Santos AramburoBiblioteca Nacional de España
 
Descubrir, aprender, disfrutar en la Biblioteca Nacional de España. Ana Santo...
Descubrir, aprender, disfrutar en la Biblioteca Nacional de España. Ana Santo...Descubrir, aprender, disfrutar en la Biblioteca Nacional de España. Ana Santo...
Descubrir, aprender, disfrutar en la Biblioteca Nacional de España. Ana Santo...Biblioteca Nacional de España
 

Plus de Biblioteca Nacional de España (20)

La colección de relaciones de sucesos en la Biblioteca Nacional de España
La colección de relaciones de sucesos en la Biblioteca Nacional de EspañaLa colección de relaciones de sucesos en la Biblioteca Nacional de España
La colección de relaciones de sucesos en la Biblioteca Nacional de España
 
Identidad común: las fuentes del patrimonio bibliográfico. Ana Santos Aramburo
Identidad común: las fuentes del patrimonio bibliográfico. Ana Santos AramburoIdentidad común: las fuentes del patrimonio bibliográfico. Ana Santos Aramburo
Identidad común: las fuentes del patrimonio bibliográfico. Ana Santos Aramburo
 
La Biblioteca Nacional de España como centro de apoyo a la investigación. Ana...



Scaling up to archive the UK Web. Helen Hockx-Yu

  • 1. Scaling up to archive the UK Web. Helen Hockx-Yu
  • 2. Web Archiving Timeline (2001-2013)
      • 2001-2002, Explore
          – Launch Domain.UK project
          – No public access
      • 2003-2008, Collaborate
          – Establish Web Archiving Programme
          – Lead UK Web Archiving Consortium
          – Launch UK Web Archive
      • 2008-2011, Build capacity
          – People, systems and processes
          – Curatorial expertise
          – Technical know-how
      • 2011, BAU
          – Web Archiving as operational unit
          – Implement non-print Legal Deposit since April 2013
  • 3. Before (6 April 2013)
      • Selective archiving of websites that
          – reflect the diversity of lives, interests and activities throughout the UK
          – contain research value or are of research interest
          – feature political, cultural, social and economic events of national interest
          – demonstrate innovative use of the web
          – Also prioritise websites at risk and web-only content
      • Permission based
          – Permission to archive, to provide online access and to preserve. Also ask for 3rd party rights clearance
          – 30% success rate, 5% explicit refusal (mostly due to 3rd party rights)
      • Online access through UK Web Archive
  • 4. Toolset
      • Selection and Permission Tool
          – Selection and permission management
          – Integrated with the Web Curator Tool
      • Web Curator Tool
          – Job scheduling
          – Metadata
          – Access control
          – Harvesting (uses Heritrix)
          – QA
      • Indexing and SIP generation: scripts and SOLR (for full-text index)
      • Wayback: rendering tool for WARCs
      • UK Web Archive: web-based end user interface
  • 5. Access
      • Currently 3 ways to access the web archive
          – Online through the UK Web Archive
          – Catalogue records (of special collections)
          – Keyword search through Primo (corporate resource discovery system)
      • Conduct researcher survey / research projects to understand requirements
  • 6. A catalogue record for a collection
  • 8. UK Web Archive
      • 14,118 websites, 60,482 instances, 17.6TB WARCs
      • Over 182,761 unique visits 1st April ‘12 – 31st March ‘13
      • Key websites include videos
      • Full-text, N-gram, title and URL search
      • Browse by subject / special collection, visual browsing
      • Analytical access
      • http://www.webarchive.org.uk
  • 9. Analytical access
      • Shift of focus from the level of single webpages or websites to the entire web archive collection
      • Use web archives as datasets, with access to metadata and knowledge about websites
      • Support survey, annotation, contextualisation and visualisation
      • Allows discovery of patterns, trends and relationships in inter-linked web pages
      • Helps address a number of challenging issues
          – Scalability
          – Accessibility of individual websites
          – Components missed by crawlers
  • 10. After (6 April 2013)
      • Government introduced Non-print Legal Deposit Regulations 2013
      • Apply to material published digitally and online, including articles, books and websites
      • 6 UK Legal Deposit Libraries
      • Deposited content accessible “on library premises controlled by the deposit library”
          – After 7 days of collection or deposit
          – Single concurrent access
          – Catalogue records allowed to be searchable online
          – Digital copying not permitted
  • 11. Legal Deposit of UK websites
      • In scope
          – Sites that use a .uk or other UK geographic top-level domain
          – Sites where part of the publishing process takes place in the UK
      • Will not archive
          – Sites concerning film and recorded sound where the audio-visual content predominates
          – Private intranets and emails
      • Over 10 million .uk registered domains
          – 4th TLD after .com, .de and .net
          – UK organisations also use non .uk domain names (e.g. .com or .org); scale unknown
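The first in-scope rule on slide 11 (a .uk or other UK geographic top-level domain) can be illustrated with a small host-name check. This is only a sketch under stated assumptions, not the British Library's actual scoping logic: the function name `in_scope_by_domain` and the `UK_SUFFIXES` tuple are hypothetical, and the second rule (part of the publishing process taking place in the UK) cannot be decided from a URL alone, so it is not modelled here.

```python
from urllib.parse import urlsplit

# Hypothetical suffix list: the slide names only ".uk" explicitly.
# Other UK geographic TLDs would have to be added by the reader.
UK_SUFFIXES = (".uk",)


def in_scope_by_domain(url: str, suffixes=UK_SUFFIXES) -> bool:
    """Return True if the URL's host name ends with a listed UK suffix.

    Only the host is inspected; path segments such as "/uk" are ignored.
    URLs without a scheme have no parsed host and are treated as out of scope.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return any(host.endswith(suffix) for suffix in suffixes)


if __name__ == "__main__":
    for candidate in (
        "http://www.webarchive.org.uk",
        "http://www.bl.uk/",
        "https://example.com/uk",
    ):
        print(candidate, in_scope_by_domain(candidate))
```

A check like this would only cover the domain-based part of the scope; UK sites on .com or .org, which the slide notes are of unknown scale, would need a separate, manual or evidence-based decision.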
  • 12. Collecting strategy
      • Domain crawl
          – Broad sweep of UK domain
          – Once or twice a year
      • Events, key sites and news
          – Events of UK interest
          – High value, high impact sites
          – National & regional news
      • Special collections
          – Focused, thematic collections
          – Support priority subjects
  • 13. Access strategy
      • Deposited content cannot be accessed outside the reading rooms
      • Online access can be provided to metadata and selected content to showcase the Legal Deposit web archive of the UK
          – Bibliographic metadata
          – Analysis and visualisation of aggregated content
          – Statistical and contextual data
          – Copy of deposited content with direct permission
      • For sites from outside the UK, permission both to harvest and for public access will be required
  • 14. Before and after: what has changed
      • Everything!
      • Scale
          – Before: 14,000
          – After: 4 – 5 million
      • Purpose
          – Before: advocacy, demonstrating benefits
          – After: Legal Deposit
      • Workflow (and tools)
          – Before: selection prior to harvesting
          – After: selection / curation can happen post harvesting
      • Permission to archive
          – Before: required
          – After: can collect in-scope material without permission
      • Access
          – Before: online
          – After: reading rooms only (unless with direct permission for online access)
      • Nature of QA
          – Before: quality control leading to deselection
          – After: flagging up quality issues
      • Ownership
          – Before: British Library
          – After: Legal Deposit Libraries
  • 15. Progress
      • Experimental domain crawl in August-December 2012, no access
          – Started with 4.8 million seeds
          – Collected 27TB data + 1TB of crawl logs
      • 1st Legal Deposit domain crawl started in April
          – Started with 3.8 million seeds
          – Ran between 8th April and 21st June and collected over 31TB data
      • Focused collection on National Health Service Reform
          – Showcase end-to-end processes including ingest and access in reading room in early July
      • Selecting key sites, news sites and events
  • 16. Collection: National Health Service (NHS) Reform
  • 17. Challenges
      • Legal deposit territoriality and scope
      • Advanced content
      • User experience
      • Monitoring software, rendering engine
      • Change of business processes