SlideShare une entreprise Scribd logo
1  sur  27
Télécharger pour lire hors ligne
Webarchiv
Czech web archive of National Library of the Czech Republic
Curatorial approaches, topic collections and cooperation with the research
communities
Acquisition of web resources – curatorial approaches
● What to archive?
● What will be important in the future?
● User community?
● How to build archive content?
word cloud of all domains in Webarchiv
History
• 2000 – project of National Library of the Czech Republic, Moravian Library
and Masaryk University
• 2001 – first archived website
• 2005 – regular harvesting of content
• 2007 – joining the IIPC – International Internet Preservation Consortium
one of our oldest archival copy - website of Charles University
archive copy of National Library of the Czech Republic website
Today
• 385 TB archived data
• 9,5 billion digital objects (text, images, audio and video objects, software,
scripts, etc.)
• 3,5 people in the department + 1 IT guy
Legal Issues
• Copyright act – Library Licence allows library to make a reproduction of a
work for its own archiving and conservation purposes
• Legal deposit act – does not cover born digital documents
• Online access – based on contract with publishers or on Creative Commons
licence
Software
• crawler: Heritrix 3.4
• access: Open Wayback 2.3.1
• curators: Seeder (developed in-house, available on github https://github.com/webarchivcz/)
Seeder – software for managing electronic resources, websites and harvests
Collection policy
• Comprehensive harvests
• Selective harvests
• Topic collections
Comprehensive harvests
• contract with czech domain provider CZ.NIC
• once or twice a year crawl of the whole .cz domain
• accessible only in the library
• 1,4 millions of second order domains / domain.cz
• maximum of 5000 harvested files per site
“Archived versions of this page are only available from the Reference Centre of the National Library
of the Czech Republic.”
Selective harvests
• selective approach
• bohemical character (territory, language, authorship, topic/content), not only on czech domain
• resources with historical, scientific or cultural value
• curated resources
• online access – contract or Creative Commons
• more than 5000 archived websites with online access
• crawled periodically
• maximum of 15 000 harvested files per site
Curators – workflow
• selecting and evaluating resources
• contracting with publishers
• cataloging (RDA rules, conspectus method)
• access and quality assurance
Topic collections
collections of resources related to certain event or topic
deeper capture of the topic in electronic resources
current events – harvesting usually in several stages: before, during and after the event
• planned: elections, anniversaries
• unexpected: floods, terrorist attacks
long-term collections – continuous harvesting
• Creative Commons, Periodical publications, Charles University
collaboration with IIPC (Olympics and Paralympics, Climate Change, Artificial Intelligence)
IIPC - collaborative collection 2018 Winter Olympics and Paralympics
Challenges
● make the archive as accessible to the public as possible (legislative
restrictions)
● collection policy – collection profiling
● full-text search
● development of tools for working with archived data, big data and metadata
● cooperation with the research communities
Current cooperation with the researchers / institutions
● methodological support for building own archives
The National Archives, Czech Academy of Sciences, Office for supervision of economic affairs
of political parties and political movements
● topic collections
The National Archives – Public Authorities
● archiving specific resources
Czech Language Institute of the Czech Academy of Sciences (periodical publications)
Development of centralized interface for extracting big data
from web archives – research project
● Webarchiv – National Library of the Czech Republic
● University of West Bohemia – Faculty of Applied Sciences, The Department of Cybernetics
● Institute of Sociology of the Czech Academy of Sciences
First steps:
● legislative analysis
● index analysis
● analysis of provenance, authenticity
and technical parameters of archive data
● workshop for researchers
analysis of file formats in Webarchiv
project accepted into the program of the Ministry of Culture which helps to support applied research and experimental development of national
and cultural identity (NAKI)
Thank you for your attention
Marie Haškovcová
www.webarchiv.cz
www.facebook.com/webarchivcz
marie.haskovcova@nkp.cz

Contenu connexe

Tendances

Tendances (20)

2017 IIIF Conference - The Vatican - SACHA
2017 IIIF Conference - The Vatican - SACHA2017 IIIF Conference - The Vatican - SACHA
2017 IIIF Conference - The Vatican - SACHA
 
What can libraries do for researchers?
What can libraries do for researchers?What can libraries do for researchers?
What can libraries do for researchers?
 
The integration and management of archaeological datasets: the Europeana proj...
The integration and management of archaeological datasets: the Europeana proj...The integration and management of archaeological datasets: the Europeana proj...
The integration and management of archaeological datasets: the Europeana proj...
 
ARACHNE at the German Archaeological Institute (DAI)
ARACHNE at the German Archaeological Institute (DAI)ARACHNE at the German Archaeological Institute (DAI)
ARACHNE at the German Archaeological Institute (DAI)
 
Do MORe with your data
Do MORe with your dataDo MORe with your data
Do MORe with your data
 
Increasing Visibility of Cultural Heritage Objects: A Case of Turkish Conten...
Increasing Visibility of Cultural Heritage Objects:  A Case of Turkish Conten...Increasing Visibility of Cultural Heritage Objects:  A Case of Turkish Conten...
Increasing Visibility of Cultural Heritage Objects: A Case of Turkish Conten...
 
The Europeana Newspapers Project
The Europeana Newspapers ProjectThe Europeana Newspapers Project
The Europeana Newspapers Project
 
Once upon a time there was a website - Archiving websites for the Musicologic...
Once upon a time there was a website - Archiving websites for the Musicologic...Once upon a time there was a website - Archiving websites for the Musicologic...
Once upon a time there was a website - Archiving websites for the Musicologic...
 
Libraries in the Big Data Era: Strategies and Challenges in Archiving and Sha...
Libraries in the Big Data Era: Strategies and Challenges in Archiving and Sha...Libraries in the Big Data Era: Strategies and Challenges in Archiving and Sha...
Libraries in the Big Data Era: Strategies and Challenges in Archiving and Sha...
 
LoCloud Collections, or how to make your local heritage available on-line
LoCloud Collections, or how to make your local heritage available on-lineLoCloud Collections, or how to make your local heritage available on-line
LoCloud Collections, or how to make your local heritage available on-line
 
Scaling up to archive the UK Web. Helen Hockx-Yu
Scaling up to archive the UK Web. Helen Hockx-YuScaling up to archive the UK Web. Helen Hockx-Yu
Scaling up to archive the UK Web. Helen Hockx-Yu
 
Sharing cultural heritage the linked open data way: why you should sign up
Sharing cultural heritage the linked open data way: why you should sign up Sharing cultural heritage the linked open data way: why you should sign up
Sharing cultural heritage the linked open data way: why you should sign up
 
IIIF at europeana, IIIF conference, Vatican, 2017
IIIF at europeana, IIIF conference, Vatican, 2017IIIF at europeana, IIIF conference, Vatican, 2017
IIIF at europeana, IIIF conference, Vatican, 2017
 
Digital Archiving at the Meertens Institute
Digital Archiving at the Meertens InstituteDigital Archiving at the Meertens Institute
Digital Archiving at the Meertens Institute
 
IIIF and Mirador at the YCBA: image based scholarly collaboration and research
IIIF and Mirador at the YCBA: image based scholarly collaboration and researchIIIF and Mirador at the YCBA: image based scholarly collaboration and research
IIIF and Mirador at the YCBA: image based scholarly collaboration and research
 
Digitising Hansard
Digitising HansardDigitising Hansard
Digitising Hansard
 
LoCloud Overview
LoCloud OverviewLoCloud Overview
LoCloud Overview
 
Eaa2014 Opportunities and Challenges with Open Access and Open Data in the UK
Eaa2014 Opportunities and Challenges with Open Access and Open Data in the UKEaa2014 Opportunities and Challenges with Open Access and Open Data in the UK
Eaa2014 Opportunities and Challenges with Open Access and Open Data in the UK
 
The LoCloud lightweight digital library and alternative content sources, Adam...
The LoCloud lightweight digital library and alternative content sources, Adam...The LoCloud lightweight digital library and alternative content sources, Adam...
The LoCloud lightweight digital library and alternative content sources, Adam...
 
AddressingHistory - Tracing the Past
AddressingHistory - Tracing the PastAddressingHistory - Tracing the Past
AddressingHistory - Tracing the Past
 

Similaire à Webarchiv - Curatorial approaches, topic collections and cooperation with the research communities

Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)
Web@rchive Austria
 
Dunning seedi-2013-130517083015-phpapp02
Dunning seedi-2013-130517083015-phpapp02Dunning seedi-2013-130517083015-phpapp02
Dunning seedi-2013-130517083015-phpapp02
The European Library
 
LIBER, Europeana and the Europeana Newspapers Project
LIBER, Europeana and the Europeana Newspapers ProjectLIBER, Europeana and the Europeana Newspapers Project
LIBER, Europeana and the Europeana Newspapers Project
Europeana Newspapers
 

Similaire à Webarchiv - Curatorial approaches, topic collections and cooperation with the research communities (20)

ARCLib project presentation from Pasig 2016
ARCLib project presentation from Pasig 2016ARCLib project presentation from Pasig 2016
ARCLib project presentation from Pasig 2016
 
Investigating the PROMISE of a Belgian web archive
Investigating the PROMISE of a Belgian web archive Investigating the PROMISE of a Belgian web archive
Investigating the PROMISE of a Belgian web archive
 
Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)
 
Developing a webarchiving strategy for national movements in Flanders
Developing a webarchiving strategy for national movements in FlandersDeveloping a webarchiving strategy for national movements in Flanders
Developing a webarchiving strategy for national movements in Flanders
 
Europeana Libraries: bringing content to the researcher
Europeana Libraries: bringing content to the researcherEuropeana Libraries: bringing content to the researcher
Europeana Libraries: bringing content to the researcher
 
Introduction to LoCloud
Introduction to LoCloudIntroduction to LoCloud
Introduction to LoCloud
 
G00 holy locloud_introduction
G00 holy locloud_introductionG00 holy locloud_introduction
G00 holy locloud_introduction
 
G00 holy locloud_introduction
G00 holy locloud_introductionG00 holy locloud_introduction
G00 holy locloud_introduction
 
Digitization Of Audiovisual Collections
Digitization Of Audiovisual CollectionsDigitization Of Audiovisual Collections
Digitization Of Audiovisual Collections
 
Introduction to British Library digital resources for social scientists
Introduction to British Library digital resources for social scientistsIntroduction to British Library digital resources for social scientists
Introduction to British Library digital resources for social scientists
 
Dunning seedi-2013-130517083015-phpapp02
Dunning seedi-2013-130517083015-phpapp02Dunning seedi-2013-130517083015-phpapp02
Dunning seedi-2013-130517083015-phpapp02
 
You’ve Digitised Your Collection. What Next ?
You’ve Digitised Your Collection. What Next ?You’ve Digitised Your Collection. What Next ?
You’ve Digitised Your Collection. What Next ?
 
You've Digitised. What Next ?
You've Digitised. What Next ?You've Digitised. What Next ?
You've Digitised. What Next ?
 
Ktisis: Building an Open Access Institutional and Cultural Repository
Ktisis: Building an Open Access Institutional and Cultural RepositoryKtisis: Building an Open Access Institutional and Cultural Repository
Ktisis: Building an Open Access Institutional and Cultural Repository
 
LIBER, Europeana and the Europeana Newspapers Project
LIBER, Europeana and the Europeana Newspapers ProjectLIBER, Europeana and the Europeana Newspapers Project
LIBER, Europeana and the Europeana Newspapers Project
 
Dag Hensten - Nasjonalmuseet collections online
Dag Hensten - Nasjonalmuseet collections onlineDag Hensten - Nasjonalmuseet collections online
Dag Hensten - Nasjonalmuseet collections online
 
Introduction of the project "Books Discovered Once Again"
Introduction of the project "Books Discovered Once Again" Introduction of the project "Books Discovered Once Again"
Introduction of the project "Books Discovered Once Again"
 
Jussi Nuorteva - Power of Open Data in Archives
Jussi Nuorteva - Power of Open Data in Archives Jussi Nuorteva - Power of Open Data in Archives
Jussi Nuorteva - Power of Open Data in Archives
 
Estermann Wikidata GLAM Example Projects 20170914
Estermann Wikidata GLAM Example Projects 20170914Estermann Wikidata GLAM Example Projects 20170914
Estermann Wikidata GLAM Example Projects 20170914
 
Museums and Europeana
Museums and EuropeanaMuseums and Europeana
Museums and Europeana
 

Plus de Webarchive of National Library of the Czech Republic

Plus de Webarchive of National Library of the Czech Republic (20)

Inzerat - datovy analytik / datova analyticka
Inzerat - datovy analytik / datova analyticka Inzerat - datovy analytik / datova analyticka
Inzerat - datovy analytik / datova analyticka
 
Inzerát datovy analytik_wa
Inzerát datovy analytik_waInzerát datovy analytik_wa
Inzerát datovy analytik_wa
 
Sys admin wa_rvv
Sys admin wa_rvvSys admin wa_rvv
Sys admin wa_rvv
 
Volné pracovní místo - kurátor/ka webového archivu
Volné pracovní místo - kurátor/ka webového archivuVolné pracovní místo - kurátor/ka webového archivu
Volné pracovní místo - kurátor/ka webového archivu
 
Volné místo - analytik českého webového archivu
Volné místo - analytik českého webového archivuVolné místo - analytik českého webového archivu
Volné místo - analytik českého webového archivu
 
Webarchiv aneb až po lokty v mrtvolách
Webarchiv aneb až po lokty v mrtvoláchWebarchiv aneb až po lokty v mrtvolách
Webarchiv aneb až po lokty v mrtvolách
 
Kurz webové archivace 2018/2
Kurz webové archivace 2018/2Kurz webové archivace 2018/2
Kurz webové archivace 2018/2
 
Blok expertu
Blok expertuBlok expertu
Blok expertu
 
Kurz webové archivace 2018/1
Kurz webové archivace 2018/1Kurz webové archivace 2018/1
Kurz webové archivace 2018/1
 
Webarchiv
WebarchivWebarchiv
Webarchiv
 
Datovy analytik
Datovy analytikDatovy analytik
Datovy analytik
 
Kurz webové archivace 2017/4
Kurz webové archivace 2017/4Kurz webové archivace 2017/4
Kurz webové archivace 2017/4
 
Kurz webové archivace 2017/3
Kurz webové archivace 2017/3Kurz webové archivace 2017/3
Kurz webové archivace 2017/3
 
Kurz webové archivace 2017/2
Kurz webové archivace 2017/2Kurz webové archivace 2017/2
Kurz webové archivace 2017/2
 
Kurz webové archivace 2017/1
Kurz webové archivace 2017/1Kurz webové archivace 2017/1
Kurz webové archivace 2017/1
 
Tematické kolekce jako měřítko kvality webových archivů
Tematické kolekce jako měřítko kvality webových archivůTematické kolekce jako měřítko kvality webových archivů
Tematické kolekce jako měřítko kvality webových archivů
 
WARC 1.1 je skoro tady - co přinese nová verze?
WARC 1.1 je skoro tady - co přinese nová verze?WARC 1.1 je skoro tady - co přinese nová verze?
WARC 1.1 je skoro tady - co přinese nová verze?
 
WARC 1.1 je skoro tady - co přinese nová verze
WARC 1.1 je skoro tady - co přinese nová verzeWARC 1.1 je skoro tady - co přinese nová verze
WARC 1.1 je skoro tady - co přinese nová verze
 
Mezi snem a realitou. Otevřená data českého webového archivu.
Mezi snem a realitou. Otevřená data českého webového archivu.Mezi snem a realitou. Otevřená data českého webového archivu.
Mezi snem a realitou. Otevřená data českého webového archivu.
 
Český webový archiv
Český webový archivČeský webový archiv
Český webový archiv
 

Dernier

Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Chandigarh Call girls 9053900678 Call girls in Chandigarh
 
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
 
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
nilamkumrai
 

Dernier (20)

Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
 
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft DatingDubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
 
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
 
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
 
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts ServiceReal Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
 
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
Russian Call Girls in %(+971524965298  )#  Call Girls in DubaiRussian Call Girls in %(+971524965298  )#  Call Girls in Dubai
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
 
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 

Webarchiv - Curatorial approaches, topic collections and cooperation with the research communities

  • 1. Webarchiv Czech web archive of National Library of the Czech Republic Curatorial approaches, topic collections and cooperation with the research communities
  • 2.
  • 3. Acquisition of web resources – curatorial approaches ● What to archive? ● What will be important in the future? ● User community? ● How to build archive content? word cloud of all domains in Webarchiv
  • 4. History • 2000 – project of National Library of the Czech Republic, Moravian Library and Masaryk University • 2001 – first archived website • 2005 – regular harvesting of content • 2007 – joining the IIPC – International Internet Preservation Consortium
  • 5. one of our oldest archival copy - website of Charles University
  • 6. archive copy of National Library of the Czech Republic website
  • 7. Today • 385 TB archived data • 9,5 billion digital objects (text, images, audio and video objects, software, scripts, etc.) • 3,5 people in the department + 1 IT guy
  • 8.
  • 9.
  • 10. Legal Issues • Copyright act – Library Licence allows library to make a reproduction of a work for its own archiving and conservation purposes • Legal deposit act – does not cover born digital documents • Online access – based on contract with publishers or on Creative Commons licence
  • 11. Software • crawler: Heritrix 3.4 • access: Open Wayback 2.3.1 • curators: Seeder (developed in-house, available on github https://github.com/webarchivcz/)
  • 12. Seeder – software for managing electronic resources, websites and harvests
  • 13. Collection policy • Comprehensive harvests • Selective harvests • Topic collections
  • 14. Comprehensive harvests • contract with czech domain provider CZ.NIC • once or twice a year crawl of the whole .cz domain • accessible only in the library • 1,4 millions of second order domains / domain.cz • maximum of 5000 harvested files per site
  • 15. “Archived versions of this page are only available from the Reference Centre of the National Library of the Czech Republic.”
  • 16. Selective harvests • selective approach • bohemical character (territory, language, authorship, topic/content), not only on czech domain • resources with historical, scientific or cultural value • curated resources • online access – contract or Creative Commons • more than 5000 archived websites with online access • crawled periodically • maximum of 15 000 harvested files per site
  • 17.
  • 18. Curators – workflow • selecting and evaluating resources • contracting with publishers • cataloging (RDA rules, conspectus method) • access and quality assurance
  • 19. Topic collections collections of resources related to certain event or topic deeper capture of the topic in electronic resources current events – harvesting usually in several stages: before, during and after the event • planned: elections, anniversaries • unexpected: floods, terrorist attacks long-term collections – continuous harvesting • Creative Commons, Periodical publications, Charles University collaboration with IIPC (Olympics and Paralympics, Climate Change, Artificial Intelligence)
  • 20.
  • 21. IIPC - collaborative collection 2018 Winter Olympics and Paralympics
  • 22. Challenges ● make the archive as accessible to the public as possible (legislative restrictions) ● collection policy – collection profiling ● full-text search ● development of tools for working with archived data, big data and metadata ● cooperation with the research communities
  • 23. Current cooperation with the researchers / institutions ● methodological support for building own archives The National Archives, Czech Academy of Sciences, Office for supervision of economic affairs of political parties and political movements ● topic collections The National Archives – Public Authorities ● archiving specific resources Czech Language Institute of the Czech Academy of Sciences (periodical publications)
  • 24.
  • 25. Development of centralized interface for extracting big data from web archives – research project ● Webarchiv – National Library of the Czech Republic ● University of West Bohemia – Faculty of Applied Sciences, The Department of Cybernetics ● Institute of Sociology of the Czech Academy of Sciences First steps: ● legislative analysis ● index analysis ● analysis of provenance, authenticity and technical parameters of archive data ● workshop for researchers analysis of file formats in Webarchiv project accepted into the program of the Ministry of Culture which helps to support applied research and experimental development of national and cultural identity (NAKI)
  • 26.
  • 27. Thank you for your attention Marie Haškovcová www.webarchiv.cz www.facebook.com/webarchivcz marie.haskovcova@nkp.cz