Digitizing all Dutch books, newspapers & magazines -
                730 million pages in 20 years -
              storing it, and getting it out there

                                        Olaf D. Janssen

               Koninklijke Bibliotheek (KB), National Library of the Netherlands,
                  Prins Willem-Alexanderhof 5, The Hague, The Netherlands
                                      olaf.janssen@kb.nl



       Abstract. In the next 20 years, the Dutch national library will digitize all
       printed publications since 1470, some 730M pages. To realize the first
       milestone of this ambition, KB made deals with Google and Proquest to digitize
       42M pages.
       Since 2003 KB has operated its e-Depot, a system for permanent digital object
       storage. KB is now replacing it with a new solution to better deal with future
       demands, allowing improved storage of its mass digitization output.
       To meet user demand for centralized access, KB is also replacing its scattered
       full-text online portfolio by a National Platform for Digital Publications, both a
       content delivery platform for its mass digitization output and a national domain
       aggregator for publications. From 2011 onwards, this collaborative, open and
       scalable platform will be expanded with more partners, content and
       functionalities.
       The KB is also involved in setting up a Dutch cross-domain aggregator,
       enabling content exposure in Europeana.

       Keywords: National libraries, Digital library workflows, Mass digitization,
       Google, Proquest, Permanent storage, Integrated access, Cross-domain cultural
       heritage, Aggregation, Interoperability, Europeana




1 Digitizing the KB

The KB 1 started digitizing its holdings in 1995, for reasons of accessibility and long-
term preservation. In the first years small scale efforts focused on scanning visually
attractive materials, highlights of the collection for the widest possible audiences. One
of the first projects was 100 highlights of the Koninklijke Bibliotheek 2 , followed by
Memory of the Netherlands 3 , the national programme for digitizing Dutch cultural
heritage, which was focused on image based materials. It was not until 1999 that the
KB started digitizing historical textual publications (books, newspapers &
magazines).

For the last 8 years, the focus has been on large-scale digitization of text corpora for
study and research in the humanities using public funding. In 2003 a project took off
to scan the complete run of Dutch Parliamentary Papers 4 . Consisting of 2.3 million
pages, this was at that time an unprecedented quantity for the Netherlands. At the end
of 2006 the KB was awarded the Historical Newspapers project 5 . By the end of
2011, it will have scanned 8 million pages from popular Dutch regional, national and
colonial newspapers from the period 1618-1995.

In addition, in February 2011 the Early Dutch Books Online digitization effort 6
delivered 2.1 million full-text pages from the special book collections of the KB and
the university libraries of Amsterdam and Leiden. Furthermore, by the end of this
year, some 1.5 million pages from the most frequently consulted old magazines
(1840-1950) will have been converted into full text.

In 2010 the KB announced its ambitious plans to digitize all Dutch books,
newspapers, magazines and other printed publications from 1470 onwards, a total of
730 million pages. A first milestone is set for 2013, by which time the library should
have scanned 10% of this amount. To realize its ambition, the KB cannot rely on public
funding alone, especially in times when government support for cultural heritage is in
a downward trend. It has therefore entered into strategic public-private partnerships
with both Google 7 and Proquest 8 to digitize 210,000 books (some 42M pages) from
its public domain collections.
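
The scale of these targets is easier to judge with the arithmetic written out. The following is a minimal sketch that uses only the figures quoted above; the variable names are illustrative, and the even spread over 20 years is an assumption made purely for comparison, not a KB planning figure.

```python
# Figures quoted in the text; names are illustrative only.
TOTAL_PAGES = 730_000_000        # all Dutch printed publications from 1470 onwards
MILESTONE_PERCENT = 10           # share to be scanned by 2013
PARTNERSHIP_BOOKS = 210_000      # books covered by the Google and Proquest agreements
PARTNERSHIP_PAGES = 42_000_000   # estimated pages in those books
PROGRAMME_YEARS = 20             # duration stated in the abstract

# Integer arithmetic avoids floating-point rounding for the milestone.
milestone_pages = TOTAL_PAGES * MILESTONE_PERCENT // 100
pages_per_book = PARTNERSHIP_PAGES / PARTNERSHIP_BOOKS
partnership_share = PARTNERSHIP_PAGES / milestone_pages
# Assumption: an even spread over the programme, for comparison only.
average_pages_per_year = TOTAL_PAGES / PROGRAMME_YEARS

print(f"2013 milestone:            {milestone_pages:,} pages")
print(f"Implied pages per book:    {pages_per_book:.0f}")
print(f"Partnerships vs milestone: {partnership_share:.1%}")
print(f"Average pages per year:    {average_pages_per_year:,.0f}")
```

On these figures the two partnerships would cover somewhat more than half of the 2013 milestone, with the remainder coming from the publicly funded projects described above.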


2 Permanent storage, now & in the future

As the national library, the KB has a duty to permanently store not only printed
publications, but also digital ones (both born-digital and digitized). As early as 1994
the KB recognized the importance of such an electronic depot and took action
accordingly. It started making pilot agreements with major international publishers for
depositing e-journals (“safehaven”) and undertook market research to acquire a
technical solution for permanent storage. Such a system turned out not to be available
off-the-shelf, so in 2000 KB joined forces with IBM to build the world’s first OAIS-
based processing and preservation system for permanent storage of digital objects.
This has resulted in the operational e-Depot 9 , which the KB has been running since
2003.

Nowadays, this deposit is a safehaven for over 15 million scientific articles from
some of the world’s biggest publishers 10 , focusing on international scientific,
technical and medical journals (STM-publications). In addition it houses digital
monographs, periodicals and reports from Dutch publishers and materials from the
scientific repositories of Dutch universities, as part of the NARCIS 11 initiative.


2.1 Towards a new e-Depot

In 2012 the KB’s maintenance contract with IBM will run out and components of the
system will no longer be supported. The current implementation of the e-Depot is
based on requirements set in the late ‘90s. Some of these have become outdated with
respect to current & expected future requirements for speed and collection
management facilities. Additionally, with the 'seven-year itch' of the system 12 having
passed, it has already outlived most other IT systems. Other reasons for
upgrading the e-Depot are:
      Volume & scalability: digital publishing has led to enormous growth of
         KB’s digital collections. Furthermore, the KB wants to permanently store the
         hundreds of millions of files resulting from its mass digitization programme
         output.
      Heterogeneity & flexibility: the current system is only optimized for
         processing and storing relatively small numbers of homogeneous single
         objects, i.e. mostly PDFs. In other words, it is not able to give fast access to
         large numbers of diverse and compound content, which will become
         increasingly common in the near future (e.g. enriched publications, e-books,
         websites)

In defining new requirements, the KB consulted its international
colleagues, most notably the National Library of Germany (DNB) and SUB
Göttingen. This collaboration was based on the joint use of the IBM-based system.
In early 2009, the KB and DNB sought cooperation with other European national
libraries to share experience, knowledge and resources. Another reason for doing so
was the lack of suitable commercial off-the-shelf products; the solutions that are
available bring the risk of vendor lock-in. If national libraries joined forces in
defining requirements and tendering, this could prompt commercial suppliers to invest
more in developing solutions that answer their requirements. Together with the
national libraries of the UK, Germany, Norway, Spain, Portugal, Switzerland and the
Czech Republic, KB defined an architectural outline, based on a two-layered OAIS
model and a modular setup of the preservation system. Unfortunately, later that year
the libraries decided not to have a joint tender due to different timelines.

To guarantee continued technical innovation and development of the e-Depot, the KB
is a partner in the SCAPE project 13 . This EU-funded initiative will provide ongoing
technical input by developing scalable preservation planning and execution services
that can be deployed in the new e-Depot system within the next three to five years.


3 Providing access & adding value

The back-end data standards 14 are identical across all KB-run mass digitization
projects, making the outputs in theory fully interoperable. However, this potential has
not yet been optimized in the front-end presentation of the KB’s full-text collections.
So far this has been done via separate websites (4, 5, 15), each with its own specific
branding, URLs, design and search & object display functionalities.

For end-users the KB-collections thus appear to be unrelated and scattered, making
them relatively difficult to use given the expectations of modern users. They demand
all content to be available via a single point of entry, with the ability to apply multiple
views & filters (by theme, by time, by geographical location, by object type etc.) to
the interoperable, contextualized, enriched and re-usable content, with minimum
copyright limitations. In addition, users are primarily interested in the digital content
itself, and much less in the physical object or institution from which it was derived.


3.1 Providing access – the Dutch National Platform for Digital Publications

The KB has taken these user demands seriously and has just finished designing and
implementing the first basic iteration of the Dutch National Platform for Digital
Publications (working name). This full-text content distribution platform will give
access to digitized books, newspapers and magazines. Not only will it include the
output of the KB’s mass digitization projects, but it will also be open for text
collections from other libraries. Access will be central via a modern Web 2.0 site, as
well as distributed via search and display APIs. These can deliver content to users in
their normal workflows (via regular social networks, on mobile devices, in
professional virtual research environments & communities, in products like Zotero,
RefWorks, EndNote etc.), as well as allow others (both businesses and consumers) to
build their own applications based on the content.

Further key design choices of the platform include:

    1.   Open: everybody can bring and get content, as long as it fits the scope
         (Dutch textual publications) and certain standards (e.g. metadata & object
         quality). This will enable small institutions without much in-house expertise
         or infrastructure to expose their content on a national level. Depending on
         the rights on the objects, the content can be used, re-used, shared or enriched
         by third parties.

    2.   Scalable: given the ambitions of the KB to make all (to be) digitized
         collections available online, the platform must be able to cope with huge
         amounts of metadata and objects in the future. This means the service should
         allow for step-by-step upscaling towards more content and functionalities,
         with as little manual programming or data conversion work as possible.

    3.   Collaborative: as said above, the platform will be an open network of KB
         and other institutions, starting with a coalition of the willing. To guarantee
         buy-in from the start, partners will need to work collaboratively on both
         operational and strategic levels. This includes not only technical but also
         organizational issues, such as funding, sustainability, governance and policy
         development.

This collaborative approach means that:
     responsibilities (e.g. financial, technical, business, product development) are
         shared among the partners,
     national expertise about e.g. semantic & metadata interoperability is brought
         together,
     barriers for new partners to join the network are lowered,
     positions for joint support funding requests (both on national and European
         levels) become stronger, and thus
     future sustainability of the platform is more likely.

Furthermore, the National Platform for Digital Publications will improve the
visibility of the KB as an attractive business-to-business service & data provider for
partners in the Netherlands. KB could for instance offer a package of (paid)
permanent object storage in its e-Depot, with an option to present the object on the
platform to end users free of charge.

The platform marks a turning point towards centralized access of KB text collections.
Starting with the output of the Early Dutch Books Online project in May 2011, the
content of the platform will be expanded step-by-step in the years to come. The
current planning is as follows:
      2011: Early Dutch Books Online (2.1M pages), first set of old magazines
         (1840-1950, up to 1.5M pages), first set of early 20th century books (1913
         onwards)
      2012: Historical Newspaper collection (8M pages, by transferring the content
         of http://kranten.kb.nl into the platform), Collection of historical children’s
         books from the Rotterdam public library
      2012-2014: output from the Google & Proquest efforts to be included, up to
         42M pages

Finally, the National Platform for Digital Publications will be positioned as a full-text
and metadata aggregator, with the aim of making the content interoperable and
exporting it to cross-domain initiatives at national, European and global levels.
See Section 4 for more details.


3.2 Improved access leads to added value creation

In the past decade, cultural heritage institutions have invested increasingly in their
digital services, making their collections accessible and at the same time bringing new
economic and social benefits within reach. A report 16 by the Dutch Foundation for
Economic Research has shown that the total benefits of digitization and accessibility
outweigh the costs. The heritage sector, creative industries, the education sector and
consumers will all experience immediate benefits from widespread availability of
cultural heritage objects. In other words, digital collections represent significant
potential economic and social value, provided they are made easily accessible.

To understand how institutions should make their collections accessible to
generate maximum added value, the BMICE 17 distribution ring model 18 of Figure 1
gives guidance.

Figure 1. The BMICE ring model - Distribution rings showing four forms of access
to cultural heritage. The outward arrow represents the direction of added value.

The four rings represent the following levels of access:
    1. Analogue in house: The work is displayed physically or made physically
         accessible in an archive, exhibition or reading room.
    2. Digital in house: The work is described digitally and may be digitized. It is
         made available within the walls of the institution by means of a closed
         network (or through digital data carriers), such as a computer or terminal at
         the institution that visitors can use to search through the collection database.
    3. Online: All or part of the digital collection of the institution is offered online
         through the institution’s website, but without explicit rights of use or reuse.
    4. Online in the network: Digital collections of the institution are made
         available in online networks. Rights of use are granted to third parties (the
         public, other institutions) for use or reuse.
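
The four rings can be read as a small decision rule on how a collection is offered. The sketch below is one possible reading, not part of the BMICE model itself: the `Collection` fields and the `classify` function are hypothetical names, and presence in online networks is collapsed into the single flag `reuse_rights_granted` for brevity.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessRing(IntEnum):
    """The four BMICE access levels, numbered from the inside out."""
    ANALOGUE_IN_HOUSE = 1
    DIGITAL_IN_HOUSE = 2
    ONLINE = 3
    ONLINE_IN_NETWORK = 4


@dataclass(frozen=True)
class Collection:
    # Hypothetical descriptor; the field names are assumptions for illustration.
    name: str
    described_digitally: bool    # a digital description or digitized copy exists
    online: bool                 # offered on the open web
    reuse_rights_granted: bool   # third parties may use or reuse the objects


def classify(collection: Collection) -> AccessRing:
    """Return the outermost ring a collection reaches under this simplified reading."""
    if collection.online and collection.reuse_rights_granted:
        return AccessRing.ONLINE_IN_NETWORK
    if collection.online:
        return AccessRing.ONLINE
    if collection.described_digitally:
        return AccessRing.DIGITAL_IN_HOUSE
    return AccessRing.ANALOGUE_IN_HOUSE


reading_room = Collection("reading room only", False, False, False)
website_only = Collection("institutional website", True, True, False)
open_platform = Collection("open platform", True, True, True)

for c in (reading_room, website_only, open_platform):
    print(f"{c.name}: ring {classify(c).value} ({classify(c).name})")
```

Because the enum is ordered, comparing two `AccessRing` values shows directly which collection sits further out, which is the direction of added value in Figure 1.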

Heritage institutions have traditionally focused on - and felt safe in - the first ring,
with ring 2 opening up since the start of the digital age in the late ‘80s. The 3rd ring
has come into view since the mid ‘90s, when the web entered everyday life. The rise
of the social web in the ‘00s has given momentum to providing access to objects in the 4th
ring. Even nowadays, many content holders are only just beginning to enter this ring
and understand the huge benefits of opening up their collections within rights-
controlled networks & communities; for many this means a big step outside their
trusted safe zones. The yellow outward arrow in Figure 1 represents the direction of
added value. It can thus be concluded that “the more heritage institutions move
outside their comfort zones, the greater the value that is created.”

Some examples of activities in the outermost ring are:
    On-demand digital archive: Users can search & order (free or paid,
       depending on the rights) cultural heritage sources using various search
       functions.
    Online museum experience: Alternative to or expansion of the museum using
       web 2.0 tools and platforms. Target users are approached actively by
       offering widgets, setting up discussion groups on social networks, and so on.
    Collaborative storytelling: Users tell their own personal stories on platforms.
       Heritage institutions often provide specific rights-cleared archive material
       that users can then integrate into their narrative.
    Distributed online research: Technical platforms, tools and social networks
       where users can jointly conduct and present research. This guarantees a
       certain degree of reliability with regard to the information, the relationship
       between the sources and the members of the community. An example of this
       is wikipedia.org.
    Social tagging: Users are given the facility of tagging digitized cultural
       heritage sources. The tags can contain a description or can express some
       appreciation, and they enrich the collection, making it easier and more
       worthwhile to discover.
    Online marketplace: This offers users the chance to bid online for cultural
       heritage objects and works of art.

Another example of a 4th ring service is the National Platform for Digital
Publications. As said above, it will be an open & collaborative service, providing
search and display APIs for delivering content to the places and networks where users are.
Similar to YouTube, it will offer widget-based embeddable content, possibilities for
user annotation, user profile pages, and cross-collection searching & display.


4 The cross-domain & international dimensions

As the national library, the KB has a very important facilitating and networking role
in the Dutch scientific and cultural infrastructure. Using this position, it has the
potential to set up and stimulate different levels of collaboration to make online
heritage more accessible. This is illustrated by the 3-tier collaborative model in Figure 2.

Figure 2. Dutch national collaborative aggregation model. The KB is responsible for
aggregating publications in the National Platform for Digital Publications.

Lower level: domain specific collaboration & aggregation
As said in Section 3, KB’s National Platform for Digital Publications will be
positioned as an aggregator for Dutch full-texts, aiming to make the content - and the
network of content delivering partners - interoperable and ready for participation in
cross-domain initiatives on national and international levels.

Besides the KB with its platform, organizations from other domains are working on
interoperability and aggregation for their specific sectors. Led by the Institute for
Sound & Vision 19 , institutions from the audio-visual domain collaborate to enable
aggregation of AV-materials. Similar initiatives are taking place for the archival
domain, with the National Archives 20 as the facilitator, and for the museum sector.
For the latter, the Rijksdienst voor het Cultureel Erfgoed 21 is the main player.

Content aggregation and its supporting technical and organizational structures are
not set up uniformly but differ across the domains. Based on sector-specific
best practices, knowledge and culture, each aggregator is setting up domain
interoperability in the best possible way. This is however not done in isolation; the
domains are in regular contact to reach consensus on issues such as “which content
goes where”, to learn from each other and to avoid overlapping work. This way
responsibilities & roles are kept clear, while at the same time synergies are exploited
where possible.

Middle level: national cross-domain collaboration & aggregation
To enable these sector-specific aggregation initiatives to come together, the results of
the NED! project 22 are used. It delivered a basic infrastructure for the interoperability
of Dutch digital heritage, using open standards including XML, Dublin Core, OAI-PMH
and SRU. It is now being expanded to build a cross-domain heritage aggregator
that can become the national hub for content delivery to international initiatives.
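
To make the role of these standards concrete, the sketch below shows how an aggregator might harvest Dublin Core records over OAI-PMH. It is a minimal illustration under stated assumptions: the endpoint `https://aggregator.example.org/oai`, the set name `books` and the sample record are invented, and no network request is made. Only the OAI-PMH verb `ListRecords`, the `metadataPrefix` and `set` arguments, and the XML namespaces come from the public OAI-PMH 2.0 and Dublin Core specifications. SRU searching is not shown.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "https://aggregator.example.org/oai"  # hypothetical endpoint

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Invented sample of what a provider could return for one record.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:book-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Voorbeeldtitel</dc:title>
          <dc:date>1795</dc:date>
          <dc:type>book</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def parse_records(xml_text):
    """Extract the identifier and Dublin Core fields from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        dc_root = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc_root is not None:
            for element in dc_root:
                # Strip the "{namespace}" prefix; DC elements may repeat, so keep lists.
                name = element.tag.split("}", 1)[-1]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, **fields})
    return records


request_url = list_records_url(BASE_URL, set_spec="books")
records = parse_records(SAMPLE_RESPONSE)
print(request_url)
print(records)
```

A real harvester would also follow the `resumptionToken` element that OAI-PMH uses for paging through large result sets; that step is omitted here to keep the sketch short.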

Building a national aggregator is however a step-by-step process, not finished
overnight. Until that time domain-specific aggregators - in case of the library domain
the Dutch National Platform for Digital Publications or The European Library 23 -
will continue to have an important role in routing Dutch library content directly to
top-level services. Finally, it should be noted that the cross-domain hub is envisioned
as a “dark aggregator”, i.e. a B2B service without an interface (website) for end users
(however, see item 5 below).

Top level: International cross-country collaboration & aggregation
Having established national cross-domain aggregation and interoperability on as
many levels as possible 24 , Dutch content can be shown and used on international
stages, most notably Europeana 25 .

This fast growing, largely EU-funded, metadata aggregator and display space for
European digitized works enables people to explore the resources of Europe's
museums, libraries, archives and audio-visual collections. It promotes discovery and
networking opportunities in a multilingual space where users can engage, share in and
be inspired by the rich diversity of Europe's cultural and scientific heritage.
Europeana always connects users to the original source of the material so authenticity
is ensured. The digital objects they can find are not stored centrally with Europeana,
but remain hosted at the providing cultural institutions.

Europeana offers the following added values for (Dutch) content holding institutions:

    1.   It enriches the experience of their users by making relations between their
         objects and information from other countries and in other formats. This
         enables cross-border and interdisciplinary research, as well as enriching the
         content by presenting it in a wider context.

    2.   Users expect integrated content – they want to see videos, listen to sound
         recordings, look at images and read texts, all in one place. Using Europeana
         they can find related content in multiple formats, from different countries
         and from diverse domains and disciplines.

    3.   Europeana makes their content findable in search engines.

    4.   Europeana generates extra visits to their holdings by redirecting users to the
         original source of the content (i.e. the content holders’ websites).

    5.   Europeana offers a set of APIs 26 . These not only enable reuse of Europeana
         content by third parties, but also allow the contextualized & enriched content
         of the providing institutions to be used in their own environments. The APIs,
         in other words, make it possible to create user interface elements for (dark)
         aggregation services on the lower and middle levels, as indicated in Figure 2
         by the dotted API arrows.

    6.   Knowledge transfer can be a major added value for participants in the
         Europeana network. Europeana collaborates with professionals from digital
         libraries across Europe and the US. Knowledge generated by these experts is
         fed back into the network via presentations, workshops and seminars. This
         way valuable knowledge about the theory and practice of metadata
         standards, multilinguality, semantic web, information architectures, usability,
         geolocation, object modeling and many other subjects becomes available for
         content suppliers.

All advantages mentioned in Section 3 about openness, scalability and collaboration
apply equally to Europeana, as these key design choices were also the foundations on
which Europeana was built. Similar to the National Platform for Digital Publications,
Europeana is also a service in the 4th ring of the BMICE model. Becoming partners in
the Europeana network and making their content (re-)usable there will thus allow
Dutch institutions to add another layer of added value to Dutch cultural & scientific
heritage.




1  Koninklijke Bibliotheek (KB), national library of the Netherlands, http://www.kb.nl
2  100 highlights of the KB, http://www.kb.nl/galerie/100hoogtepunten/index-en.html
3  Memory of the Netherlands, the national programme for digitizing Dutch cultural heritage, http://www.geheugenvannederland.nl
4  Filming and digitization of the Dutch parliamentary papers 1814-1995, http://www.kb.nl/hrd/digitalisering/archief/staten-generaal-en.html (project information) & http://www.statengeneraaldigitaal.nl/ (website)
5  Dutch Historical Newspapers 1618-1945, http://www.kb.nl/hrd/digi/ddd/index-en.html (project information) & http://kranten.kb.nl (website)
6  EDBO – Early Dutch Books Online - 10,000 full-text digitized books from 1781-1800, 2.1 million pages, http://www.earlydutchbooksonline.nl (from 26-5-2011 onwards)
7  KB and Google sign book digitization agreement, http://www.kb.nl/nieuws/2010/google-en.html
8  Digitization by Proquest of early printed books in KB collection, http://www.kb.nl/nieuws/2011/proquest-en.html
9  E-Depot, the KB’s digital archiving environment for permanent access to digital objects - http://www.kb.nl/hrd/dd/index-en.html
10  Including, but not limited to Elsevier, BioMed Central, Blackwell Publishing, Oxford University Press, Springer and Brill. For a complete list, see http://www.kb.nl/dnp/e-depot/operational/background/policy_archiving_agreements-en.html
11  NARCIS, National Academic Research and Collaborations Information System, http://www.narcis.nl/about/Language/en
12  Wijngaarden, H. van.: The seven year itch. Developing a next generation e-Depot at the KB. Paper for the 76th IFLA General Conference and Assembly, 10-15 August 2010, Gothenburg, Sweden, http://www.ifla.org/files/hq/papers/ifla76/157-wijngaarden-en.pdf (accessed on 28-03-2011)
13  SCAPE - SCAlable Preservation Environments, http://www.scape-project.eu/
14  KB’s open digitization & accessibility standards, http://www.kb.nl/hrd/digitalisering/standaarden-en.html
15  Digitization of ANP news items, http://www.kb.nl/hrd/digitalisering/archief/anp-en.html (project information) & http://anp.kb.nl (website)
16  Hof, B.J.F. et al.: Baten in beeld; Kengetallen kosten-batenanalyse: beelden voor de toekomst, SEO Amsterdam (2006), ISBN13 9789067333405, http://www.kennisland.nl/uploads/.../8ba66f40-51c9-4f7f-9e60-8404c8aa84e8 (accessed on 27-03-2011)
17  BMICE, Business Model Innovatie Cultureel Erfgoed / Business Model Innovation Cultural Heritage, http://www.bmice.nl/
18  BMICE ring model, taken from http://www.den.nl/getasset.aspx?id=Businessmodellen/KL_BusModIn_web_eng_04.pdf&assettype=attachments
19  The Netherlands Institute for Sound & Vision, http://instituut.beeldengeluid.nl
20  National Archives of the Netherlands, http://www.en.nationaalarchief.nl/default.asp
21  Rijksdienst voor het Cultureel Erfgoed, http://www.cultureelerfgoed.nl
22  NED! - Nederlands Erfgoed Digitaal!, http://www.nederlandserfgoeddigitaal.nl/
23  The European Library; on the one hand a free service that offers access to the resources of the 48 national libraries of Europe in 35 languages, on the other hand an international library domain aggregator for Europeana, http://www.theeuropeanlibrary.org
24  Establishing interoperability on as many levels as possible: technical, metadata, semantical, human, inter-domain, organizational, political, etc.
25  Europeana; paintings, music, films and books from over 1500 of Europe's galleries, libraries, archives and museums, http://www.europeana.eu
26  Europeana Application Programming Interfaces, http://version1.europeana.eu/web/api

BL Labs Roadshow 2016 - Digital Research Team
 
Building library networks with linked data
Building library networks with linked dataBuilding library networks with linked data
Building library networks with linked data
 
Journées ABES 2014 - Projet CIB - Uwe Rich
Journées ABES 2014 - Projet CIB - Uwe Rich Journées ABES 2014 - Projet CIB - Uwe Rich
Journées ABES 2014 - Projet CIB - Uwe Rich
 
How to Build a Digital Library
How to Build a Digital LibraryHow to Build a Digital Library
How to Build a Digital Library
 
Proyecto Arrow. Ana Manchado Mangas
Proyecto Arrow. Ana Manchado MangasProyecto Arrow. Ana Manchado Mangas
Proyecto Arrow. Ana Manchado Mangas
 
Bl labs roadshow aab_sheffield.2016
Bl labs roadshow aab_sheffield.2016Bl labs roadshow aab_sheffield.2016
Bl labs roadshow aab_sheffield.2016
 
Digitallibrary
DigitallibraryDigitallibrary
Digitallibrary
 
Representation and Absence in Digital Resources: The Case of Europeana Newspa...
Representation and Absence in Digital Resources: The Case of Europeana Newspa...Representation and Absence in Digital Resources: The Case of Europeana Newspa...
Representation and Absence in Digital Resources: The Case of Europeana Newspa...
 
Linking data in digital libraries the case of puglia digital library
Linking data in digital libraries the case of puglia digital libraryLinking data in digital libraries the case of puglia digital library
Linking data in digital libraries the case of puglia digital library
 
You’ve Digitised Your Collection. What Next ?
You’ve Digitised Your Collection. What Next ?You’ve Digitised Your Collection. What Next ?
You’ve Digitised Your Collection. What Next ?
 
You've Digitised. What Next ?
You've Digitised. What Next ?You've Digitised. What Next ?
You've Digitised. What Next ?
 
Save This Book
Save This BookSave This Book
Save This Book
 
GI2012 pekarek-liber
GI2012 pekarek-liberGI2012 pekarek-liber
GI2012 pekarek-liber
 
Enabling Accessible Resource Access via Service Providers
Enabling Accessible Resource Access via Service ProvidersEnabling Accessible Resource Access via Service Providers
Enabling Accessible Resource Access via Service Providers
 
Library labs as experimental incubators for digital humanities research
Library labs as experimental incubators for digital humanities researchLibrary labs as experimental incubators for digital humanities research
Library labs as experimental incubators for digital humanities research
 
LIBER, Europeana and the Europeana Newspapers Project
LIBER, Europeana and the Europeana Newspapers ProjectLIBER, Europeana and the Europeana Newspapers Project
LIBER, Europeana and the Europeana Newspapers Project
 
20yrs: 2007 Brussels Digital Preservation: Setting the Course for a Decade of...
20yrs: 2007 Brussels Digital Preservation: Setting the Course for a Decade of...20yrs: 2007 Brussels Digital Preservation: Setting the Course for a Decade of...
20yrs: 2007 Brussels Digital Preservation: Setting the Course for a Decade of...
 

Plus de Olaf Janssen

Verbinden van bibliotheekcollecties met Wikimedia-projecten, KNVI-congres 201...
Verbinden van bibliotheekcollecties met Wikimedia-projecten, KNVI-congres 201...Verbinden van bibliotheekcollecties met Wikimedia-projecten, KNVI-congres 201...
Verbinden van bibliotheekcollecties met Wikimedia-projecten, KNVI-congres 201...
Olaf Janssen
 
Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia - DCH,...
Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia - DCH,...Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia - DCH,...
Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia - DCH,...
Olaf Janssen
 
Joining forces with Wikipedia reasons, experiences and impact - Sharing is Ca...
Joining forces with Wikipedia reasons, experiences and impact - Sharing is Ca...Joining forces with Wikipedia reasons, experiences and impact - Sharing is Ca...
Joining forces with Wikipedia reasons, experiences and impact - Sharing is Ca...
Olaf Janssen
 

Plus de Olaf Janssen (20)

Atlas De Wit + Wikimedia Commons + Wikidata = nieuwe manieren van zoeken & v...
Atlas De Wit + Wikimedia Commons + Wikidata  = nieuwe manieren van zoeken & v...Atlas De Wit + Wikimedia Commons + Wikidata  = nieuwe manieren van zoeken & v...
Atlas De Wit + Wikimedia Commons + Wikidata = nieuwe manieren van zoeken & v...
 
Verbinden van bibliotheekcollecties met Wikimedia-projecten, KNVI-congres 201...
Verbinden van bibliotheekcollecties met Wikimedia-projecten, KNVI-congres 201...Verbinden van bibliotheekcollecties met Wikimedia-projecten, KNVI-congres 201...
Verbinden van bibliotheekcollecties met Wikimedia-projecten, KNVI-congres 201...
 
Interview InformatieProfessional KNVI Smart Humanity 2019 special met Olaf Ja...
Interview InformatieProfessional KNVI Smart Humanity 2019 special met Olaf Ja...Interview InformatieProfessional KNVI Smart Humanity 2019 special met Olaf Ja...
Interview InformatieProfessional KNVI Smart Humanity 2019 special met Olaf Ja...
 
Uitleg zelfgemaakte foto's uploaden naar Wikimedia Commons, Wikicafe Tiel, 10...
Uitleg zelfgemaakte foto's uploaden naar Wikimedia Commons, Wikicafe Tiel, 10...Uitleg zelfgemaakte foto's uploaden naar Wikimedia Commons, Wikicafe Tiel, 10...
Uitleg zelfgemaakte foto's uploaden naar Wikimedia Commons, Wikicafe Tiel, 10...
 
Beelddonaties: enkele reis of retour? Studiemiddag Wiki Wetenschappers SAE 26...
Beelddonaties: enkele reis of retour? Studiemiddag Wiki Wetenschappers SAE 26...Beelddonaties: enkele reis of retour? Studiemiddag Wiki Wetenschappers SAE 26...
Beelddonaties: enkele reis of retour? Studiemiddag Wiki Wetenschappers SAE 26...
 
Kennisbijeenkomst Wikimedia en Bibliotheken, 15 mei 2019
Kennisbijeenkomst Wikimedia en Bibliotheken, 15 mei 2019Kennisbijeenkomst Wikimedia en Bibliotheken, 15 mei 2019
Kennisbijeenkomst Wikimedia en Bibliotheken, 15 mei 2019
 
Introductie Delpher - Wikicafe Tilburg, 6 december 2018
Introductie Delpher - Wikicafe Tilburg, 6 december 2018Introductie Delpher - Wikicafe Tilburg, 6 december 2018
Introductie Delpher - Wikicafe Tilburg, 6 december 2018
 
Leven lang leren met Wikipedia & de KB, Teamdag KB, 29 mei 2018, Den Haag
Leven lang leren met Wikipedia & de KB, Teamdag KB, 29 mei 2018, Den HaagLeven lang leren met Wikipedia & de KB, Teamdag KB, 29 mei 2018, Den Haag
Leven lang leren met Wikipedia & de KB, Teamdag KB, 29 mei 2018, Den Haag
 
Hoe zet je zelfgemaakte foto's op Wikpedia? - Openbare bibliotheek 's-Hertoge...
Hoe zet je zelfgemaakte foto's op Wikpedia? - Openbare bibliotheek 's-Hertoge...Hoe zet je zelfgemaakte foto's op Wikpedia? - Openbare bibliotheek 's-Hertoge...
Hoe zet je zelfgemaakte foto's op Wikpedia? - Openbare bibliotheek 's-Hertoge...
 
Wikipedia - artikel in Boekenwereld, jrg 33, nr 1, febr 2017
Wikipedia - artikel in Boekenwereld, jrg 33, nr 1, febr 2017Wikipedia - artikel in Boekenwereld, jrg 33, nr 1, febr 2017
Wikipedia - artikel in Boekenwereld, jrg 33, nr 1, febr 2017
 
Uitleg fotos uploaden Wikimedia Commons, Wikicafe Tilburg, 04-01-2018
Uitleg fotos uploaden Wikimedia Commons, Wikicafe Tilburg, 04-01-2018Uitleg fotos uploaden Wikimedia Commons, Wikicafe Tilburg, 04-01-2018
Uitleg fotos uploaden Wikimedia Commons, Wikicafe Tilburg, 04-01-2018
 
Wikipedia en de Koninklijke Bibliotheek: samen een wereldwijd bereik - Netwe...
 Wikipedia en de Koninklijke Bibliotheek: samen een wereldwijd bereik - Netwe... Wikipedia en de Koninklijke Bibliotheek: samen een wereldwijd bereik - Netwe...
Wikipedia en de Koninklijke Bibliotheek: samen een wereldwijd bereik - Netwe...
 
Slimmer worden met de KB en Wikipedia - Filosofie op de Kaart, 20-10-2017, KB...
Slimmer worden met de KB en Wikipedia - Filosofie op de Kaart, 20-10-2017, KB...Slimmer worden met de KB en Wikipedia - Filosofie op de Kaart, 20-10-2017, KB...
Slimmer worden met de KB en Wikipedia - Filosofie op de Kaart, 20-10-2017, KB...
 
Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia - DCH,...
Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia - DCH,...Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia - DCH,...
Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia - DCH,...
 
Joining forces with Wikipedia reasons, experiences and impact - Sharing is Ca...
Joining forces with Wikipedia reasons, experiences and impact - Sharing is Ca...Joining forces with Wikipedia reasons, experiences and impact - Sharing is Ca...
Joining forces with Wikipedia reasons, experiences and impact - Sharing is Ca...
 
Digitale Toegankelijkheid in de praktijk - Deel 1. Koninklijke Bibliotheek, 2...
Digitale Toegankelijkheid in de praktijk - Deel 1. Koninklijke Bibliotheek, 2...Digitale Toegankelijkheid in de praktijk - Deel 1. Koninklijke Bibliotheek, 2...
Digitale Toegankelijkheid in de praktijk - Deel 1. Koninklijke Bibliotheek, 2...
 
Introductie Wikipedia Fotodag OB Midden-Brabant, Tilburg, 01-04-2017
Introductie Wikipedia Fotodag OB Midden-Brabant, Tilburg, 01-04-2017Introductie Wikipedia Fotodag OB Midden-Brabant, Tilburg, 01-04-2017
Introductie Wikipedia Fotodag OB Midden-Brabant, Tilburg, 01-04-2017
 
Linked Open Data case study (illegal newspapers WW2, Wikipedia, DBpedia) - Le...
Linked Open Data case study (illegal newspapers WW2, Wikipedia, DBpedia) - Le...Linked Open Data case study (illegal newspapers WW2, Wikipedia, DBpedia) - Le...
Linked Open Data case study (illegal newspapers WW2, Wikipedia, DBpedia) - Le...
 
Kansen voor Wikipedia-fotodag in de Spoozone in Tilburg
Kansen voor Wikipedia-fotodag in de Spoozone in TilburgKansen voor Wikipedia-fotodag in de Spoozone in Tilburg
Kansen voor Wikipedia-fotodag in de Spoozone in Tilburg
 
Lunchlezing Arnhemsche Eau de Cologne-fabriek 1873-1876, Koninklijke Biblioth...
Lunchlezing Arnhemsche Eau de Cologne-fabriek 1873-1876, Koninklijke Biblioth...Lunchlezing Arnhemsche Eau de Cologne-fabriek 1873-1876, Koninklijke Biblioth...
Lunchlezing Arnhemsche Eau de Cologne-fabriek 1873-1876, Koninklijke Biblioth...
 

Dernier

Structuring and Writing DRL Mckinsey (1).pdf
Structuring and Writing DRL Mckinsey (1).pdfStructuring and Writing DRL Mckinsey (1).pdf
Structuring and Writing DRL Mckinsey (1).pdf
laloo_007
 
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabiunwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
Abortion pills in Kuwait Cytotec pills in Kuwait
 

Dernier (20)

Structuring and Writing DRL Mckinsey (1).pdf
Structuring and Writing DRL Mckinsey (1).pdfStructuring and Writing DRL Mckinsey (1).pdf
Structuring and Writing DRL Mckinsey (1).pdf
 
Marel Q1 2024 Investor Presentation from May 8, 2024
Marel Q1 2024 Investor Presentation from May 8, 2024Marel Q1 2024 Investor Presentation from May 8, 2024
Marel Q1 2024 Investor Presentation from May 8, 2024
 
BeMetals Investor Presentation_May 3, 2024.pdf
BeMetals Investor Presentation_May 3, 2024.pdfBeMetals Investor Presentation_May 3, 2024.pdf
BeMetals Investor Presentation_May 3, 2024.pdf
 
TVB_The Vietnam Believer Newsletter_May 6th, 2024_ENVol. 006.pdf
TVB_The Vietnam Believer Newsletter_May 6th, 2024_ENVol. 006.pdfTVB_The Vietnam Believer Newsletter_May 6th, 2024_ENVol. 006.pdf
TVB_The Vietnam Believer Newsletter_May 6th, 2024_ENVol. 006.pdf
 
CROSS CULTURAL NEGOTIATION BY PANMISEM NS
CROSS CULTURAL NEGOTIATION BY PANMISEM NSCROSS CULTURAL NEGOTIATION BY PANMISEM NS
CROSS CULTURAL NEGOTIATION BY PANMISEM NS
 
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabiunwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
 
Falcon Invoice Discounting: Tailored Financial Wings
Falcon Invoice Discounting: Tailored Financial WingsFalcon Invoice Discounting: Tailored Financial Wings
Falcon Invoice Discounting: Tailored Financial Wings
 
Call 7737669865 Vadodara Call Girls Service at your Door Step Available All Time
Call 7737669865 Vadodara Call Girls Service at your Door Step Available All TimeCall 7737669865 Vadodara Call Girls Service at your Door Step Available All Time
Call 7737669865 Vadodara Call Girls Service at your Door Step Available All Time
 
Lucknow Housewife Escorts by Sexy Bhabhi Service 8250092165
Lucknow Housewife Escorts  by Sexy Bhabhi Service 8250092165Lucknow Housewife Escorts  by Sexy Bhabhi Service 8250092165
Lucknow Housewife Escorts by Sexy Bhabhi Service 8250092165
 
Putting the SPARK into Virtual Training.pptx
Putting the SPARK into Virtual Training.pptxPutting the SPARK into Virtual Training.pptx
Putting the SPARK into Virtual Training.pptx
 
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 MonthsSEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 Months
 
Falcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business Potential
 
Falcon Invoice Discounting: Empowering Your Business Growth
Falcon Invoice Discounting: Empowering Your Business GrowthFalcon Invoice Discounting: Empowering Your Business Growth
Falcon Invoice Discounting: Empowering Your Business Growth
 
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...
 
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAIGetting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
 
Pre Engineered Building Manufacturers Hyderabad.pptx
Pre Engineered  Building Manufacturers Hyderabad.pptxPre Engineered  Building Manufacturers Hyderabad.pptx
Pre Engineered Building Manufacturers Hyderabad.pptx
 
Uneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration PresentationUneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration Presentation
 
HomeRoots Pitch Deck | Investor Insights | April 2024
HomeRoots Pitch Deck | Investor Insights | April 2024HomeRoots Pitch Deck | Investor Insights | April 2024
HomeRoots Pitch Deck | Investor Insights | April 2024
 
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
 
Cannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 UpdatedCannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 Updated
 

Digitizing all Dutch books, newspapers & magazines - 730 million pages in 20 years - storing it, and getting it out there

Digitizing all Dutch books, newspapers & magazines - 730 million pages in 20 years - storing it, and getting it out there

Olaf D. Janssen
Koninklijke Bibliotheek (KB), National Library of the Netherlands, Prins Willem-Alexanderhof 5, The Hague, The Netherlands
olaf.janssen@kb.nl

Abstract. In the next 20 years, the Dutch national library will digitize all printed publications since 1470, some 730M pages. To realize the first milestone of this ambition, KB made deals with Google and Proquest to digitize 42M pages.
Since 2003 KB has operated its e-Depot, a system for permanent digital object storage. KB is now replacing it with a new solution to better deal with future demands, allowing improved storage of its mass digitization output.
To meet user demand for centralized access, KB is also replacing its scattered full-text online portfolio with a National Platform for Digital Publications, both a content delivery platform for its mass digitization output and a national domain aggregator for publications. From 2011 onwards, this collaborative, open and scalable platform will be expanded with more partners, content and functionalities.
The KB is also involved in setting up a Dutch cross-domain aggregator, enabling content exposure in Europeana.

Keywords: National libraries, Digital library workflows, Mass digitization, Google, Proquest, Permanent storage, Integrated access, Cross-domain cultural heritage, Aggregation, Interoperability, Europeana

1 Digitizing the KB

The KB 1 started digitizing its holdings in 1995, for reasons of accessibility and long-term preservation. In the first years, small-scale efforts focused on scanning visually attractive materials, highlights of the collection for the widest possible audiences. One of the first projects was 100 highlights of the Koninklijke Bibliotheek 2 , followed by Memory of the Netherlands 3 , the national programme for digitizing Dutch cultural heritage, which focused on image-based materials.

It was not until 1999 that the KB started digitizing historical textual publications (books, newspapers & magazines). For the last 8 years, the focus has been on large-scale digitization of text corpora for study and research in the humanities using public funding. In 2003 a project took off to scan the complete run of Dutch Parliamentary Papers 4 . Consisting of 2.3 million pages, this was at that time an unprecedented quantity for the Netherlands. At the end of 2006 the KB was awarded the Historical Newspapers project 5 . By the end of 2011, it will have scanned 8 million pages from popular Dutch regional, national and colonial newspapers from the period 1618-1995.

In addition, in February 2011 the Early Dutch Books Online digitization effort 6 delivered 2.1 million full-text pages from the special book collections of the KB and the university libraries of Amsterdam and Leiden. Furthermore, by the end of this year, some 1.5 million pages from the most frequently consulted old magazines (1840-1950) will have been converted into full text.

In 2010 the KB announced its ambitious plans to digitize all Dutch books, newspapers, magazines and other printed publications from 1470 onwards, a total of 730 million pages. A first milestone is set for 2013, by which time the library should have scanned 10% of this amount.

To realize its ambition, the KB cannot rely on public funding alone, especially at a time when government support for cultural heritage is on a downward trend. It has therefore entered into strategic public-private partnerships with both Google 7 and Proquest 8 to digitize 210,000 books (some 42M pages) from its public domain collections.

2 Permanent storage, now & in the future

As the national library, the KB has a duty to permanently store not only printed publications, but also digital ones (both born-digital and digitized). As early as 1994 the KB recognized the importance of such an electronic depot and took action accordingly. It started making pilot agreements with major international publishers for depositing e-journals (“safehaven”) and undertook market research to acquire a technical solution for permanent storage.
Such a system turned out not to be available off-the-shelf, so in 2000 KB joined forces with IBM to build the world’s first OAIS-based processing and preservation system for permanent storage of digital objects. This has resulted in the operational e-Depot 9 , which the KB has been running since 2003.

Nowadays, this depot is a safe haven for over 15 million scientific articles from some of the world’s biggest publishers 10 , focusing on international scientific, technical and medical journals (STM publications). In addition it houses digital monographs, periodicals and reports from Dutch publishers and materials from the scientific repositories of Dutch universities, as part of the NARCIS 11 initiative.

2.1 Towards a new e-Depot

In 2012 the KB’s maintenance contract with IBM will run out and components of the system will no longer be supported. The current implementation of the e-Depot is based on requirements set in the late ’90s. Some of these have become outdated with respect to current & expected future requirements for speed and collection management facilities. Additionally, with the 'seven-year itch' of the system 12 having passed, it has already outlived most other IT systems.

Other reasons for upgrading the e-Depot are:

• Volume & scalability: digital publishing has led to enormous growth of KB’s digital collections. Furthermore, the KB wants to permanently store the hundreds of millions of files resulting from its mass digitization programme.
• Heterogeneity & flexibility: the current system is only optimized for processing and storing relatively small numbers of homogeneous single objects, i.e. mostly PDFs. In other words, it is not able to give fast access to large numbers of diverse and compound content, which will become increasingly common in the near future (e.g. enriched publications, e-books, websites).

In defining new requirements, the KB consulted its international colleagues, most notably the National Library of Germany (DNB) and SUB Göttingen. This collaboration was based on the joint use of the IBM-based system.

In early 2009, the KB and DNB sought cooperation with other European national libraries to share experience, knowledge and resources. Another reason for doing so was the lack of suitable commercial off-the-shelf products; the solutions that are available bring the risk of vendor lock-in. If national libraries joined forces in defining requirements and tendering, this could prompt commercial suppliers to invest more in developing solutions that meet those requirements.

Together with the national libraries of the UK, Germany, Norway, Spain, Portugal, Switzerland and the Czech Republic, KB defined an architectural outline, based on a two-layered OAIS model and a modular setup of the preservation system. Unfortunately, later that year the libraries decided not to hold a joint tender due to different timelines.

To guarantee continued technical innovation and development of the e-Depot, the KB is a partner in the SCAPE project 13 . This EU-funded initiative will provide ongoing technical input by developing scalable preservation planning and execution services that can be deployed in the new e-Depot system within the next three to five years.

3 Providing access & adding value

The back-end data standards 14 are identical across all KB-run mass digitization projects, making the outputs in theory fully interoperable. However, this potential has not yet been realized in the front-end presentation of the KB’s full-text collections. So far these have been presented via separate websites (4, 5, 15), each with its own specific branding, URLs, design and search & object display functionalities.

For end-users the KB collections thus appear to be unrelated and scattered, making them relatively difficult to use given the expectations of modern users. These users expect all content to be available via a single point of entry, with the ability to apply multiple views & filters (by theme, by time, by geographical location, by object type etc.) to the interoperable, contextualized, enriched and re-usable content, with minimum copyright limitations. In addition, users are primarily interested in the digital content itself, and much less in the physical object or institution from which it was derived.

3.1 Providing access – the Dutch National Platform for Digital Publications

The KB has taken these user demands seriously and has just finished designing and implementing the first basic iteration of the Dutch National Platform for Digital Publications (working name). This full-text content distribution platform will give access to digitized books, newspapers and magazines. Not only will it include the output of the KB’s mass digitization projects, but it will also be open to text collections from other libraries.

Access will be central via a modern Web 2.0 site, as well as distributed via search and display APIs. These can deliver content to users in their normal workflows (via regular social networks, on mobile devices, in professional virtual research environments & communities, in products like Zotero, RefWorks, EndNote etc.), as well as allow others (both businesses and consumers) to build their own applications based on the content.

Further key design choices of the platform include:

1. Open: everybody can bring and get content, as long as it fits the scope (Dutch textual publications) and certain standards (e.g. metadata & object quality). This will enable small institutions without much in-house expertise or infrastructure to expose their content on a national level. Depending on the rights on the objects, the content can be used, re-used, shared or enriched by third parties.
2. Scalable: given the ambitions of the KB to make all (to be) digitized collections available online, the platform must be able to cope with huge amounts of metadata and objects in the future. This means the service should allow for step-by-step upscaling towards more content and functionalities, with as little manual programming or data conversion work as possible.
3. Collaborative: as noted above, the platform will be an open network of KB and other institutions, starting with a coalition of the willing. To guarantee buy-in from the start, partners will need to work collaboratively on both operational and strategic levels. This includes not only technical but also organizational issues, such as funding, sustainability, governance and policy development.

This collaborative approach means that

• responsibilities (e.g. financial, technical, business, product development) are shared among the partners,
• national expertise about e.g. semantic & metadata interoperability is brought together,
• barriers for new partners to join the network are lowered,
• positions for joint support funding requests (both on national and European levels) become stronger, and thus
• future sustainability of the platform is more likely.

Furthermore, the National Platform for Digital Publications will improve the visibility of the KB as an attractive business-to-business service & data provider for partners in the Netherlands. KB could for instance offer a package of (paid) permanent object storage in its e-Depot, with an option to present the object on the platform to end users free of charge.

The platform marks a turning point towards centralized access to KB text collections. Starting with the output of the Early Dutch Books Online project in May 2011, the content of the platform will be expanded step-by-step in the years to come. The current planning is as follows:

• 2011: Early Dutch Books Online (2.1M pages); first set of old magazines (1840-1950, up to 1.5M pages); first set of early 20th century books (1913 onwards)
• 2012: Historical Newspaper collection (8M pages, by transferring the content of http://kranten.kb.nl into the platform); collection of historical children’s books from the Rotterdam public library
• 2012-2014: output from the Google & Proquest efforts to be included, up to 42M pages

Finally, the National Platform for Digital Publications will be positioned as a full-text and metadata aggregator, with the aim of making the content interoperable and exporting it to cross-domain initiatives at national, European and global levels. See Section 4 for more details.

3.2 Improved access leads to added value creation

In the past decade, cultural heritage institutions have invested increasingly in their digital services, making their collections accessible and at the same time bringing new economic and social benefits within reach.
A report 16 by the Dutch Foundation for Economic Research has shown that the total benefits of digitization and accessibility outweigh the costs. The heritage sector, creative industries, the education sector and consumers will all experience immediate benefits from widespread availability of cultural heritage objects. In other words, digital collections represent significant potential economic and social value, provided they are made easily accessible.

The BMICE 17 distribution ring model 18 in Figure 1 offers guidance on how institutions should make their collections accessible in order to generate maximum added value.

Figure 1. The BMICE ring model - Distribution rings showing four forms of access to cultural heritage. The outward arrow represents the direction of added value.

The four rings represent the following levels of access:

1. Analogue in house: The work is displayed physically or made physically accessible in an archive, exhibition or reading room.
2. Digital in house: The work is described digitally and may be digitized. It is made available within the walls of the institution by means of a closed network (or through digital data carriers), such as a computer or terminal at the institution that visitors can use to search through the collection database.
3. Online: All or part of the digital collection of the institution is offered online through the institution’s website, but without explicit rights of use or reuse.
4. Online in the network: Digital collections of the institution are made available in online networks. Rights of use are granted to third parties (the public, other institutions) for use or reuse.

Heritage institutions have traditionally focused on - and felt safe in - the first ring, with ring 2 opening up since the start of the digital age in the late ’80s. The 3rd ring has come into view since the mid ’90s, when the web entered everyday life. The rise of the social web in the ’00s has given momentum to providing access to objects in the 4th ring. Even now, many content holders are only just beginning to enter this ring and to understand the huge benefits of opening up their collections within rights-controlled networks & communities; for many this means a big step outside their trusted safe zones.

The yellow outward arrow in Figure 1 represents the direction of added value. It can thus be concluded that “the more heritage institutions move outside their comfort zones, the greater the value that is created.”
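
The four access levels listed above can be read as an ordered classification. The following is a minimal sketch of that reading, not part of the BMICE model itself: the names `AccessRing`, `Offering` and `classify`, and the three boolean attributes, are hypothetical and simplify each ring definition to a single distinguishing condition.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessRing(IntEnum):
    """The four access levels of the ring model, innermost ring first."""
    ANALOGUE_IN_HOUSE = 1
    DIGITAL_IN_HOUSE = 2
    ONLINE = 3
    ONLINE_IN_NETWORK = 4


@dataclass
class Offering:
    # Hypothetical attributes, chosen only to mirror the ring definitions.
    described_digitally: bool   # catalogued digitally and possibly digitized
    on_public_web: bool         # offered online through the institution's website
    reuse_rights_granted: bool  # third parties may use or reuse the content


def classify(offering: Offering) -> AccessRing:
    """Return the outermost ring that an offering reaches (simplified reading)."""
    if offering.on_public_web and offering.reuse_rights_granted:
        return AccessRing.ONLINE_IN_NETWORK
    if offering.on_public_web:
        return AccessRing.ONLINE
    if offering.described_digitally:
        return AccessRing.DIGITAL_IN_HOUSE
    return AccessRing.ANALOGUE_IN_HOUSE
```

Because the rings are modelled as an `IntEnum`, comparing two values follows the direction of the outward arrow: under these assumptions, a higher ring number corresponds to a form of access further from the institution's own walls.
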

Some examples of activities in the outermost ring are:
• On-demand digital archive: Users can search & order (free or paid, depending on the rights) cultural heritage sources using various search functions.
• Online museum experience: Alternative to or expansion of the museum using web 2.0 tools and platforms. Target users are approached actively by offering widgets, setting up discussion groups on social networks, and so on.
• Collaborative storytelling: Users tell their own personal stories on platforms. Heritage institutions often provide specific rights-cleared archive material that users can then integrate into their narrative.
• Distributed online research: Technical platforms, tools and social networks where users can jointly conduct and present research. This guarantees a certain degree of reliability with regard to the information, the relationship between the sources and the members of the community. An example of this is wikipedia.org.
• Social tagging: Users are given the facility of tagging digitized cultural heritage sources. The tags can contain a description or can express some appreciation, and they enrich the collection, making it easier and more worthwhile to discover.
• Online marketplace: This offers users the chance to bid online for cultural heritage objects and works of art.

Another example of a 4th ring service is the National Platform for Digital Publications. As noted above, it will be an open & collaborative service, providing search and display APIs for delivering content to the places and networks where users are. Similar to YouTube, it will offer widget-based embeddable content, possibilities for user annotation, user profile pages, and cross-collection searching & display.

4 The cross-domain & international dimensions

As the national library, the KB has a very important facilitating and networking role in the Dutch scientific and cultural infrastructure.
Using this position, it has the potential to set up and stimulate different levels of collaboration to make online heritage more accessible. This is illustrated by the three-tier collaborative model in Figure 2.

Figure 2. Dutch national collaborative aggregation model. The KB is responsible for aggregating publications in the National Platform for Digital Publications.

Lower level: domain-specific collaboration & aggregation

As said in Section 3, KB's National Platform for Digital Publications will be positioned as an aggregator for Dutch full-texts, aiming to make the content - and the network of content delivering partners - interoperable and ready for participation in cross-domain initiatives on national and international levels.

Besides the KB with its platform, organizations from other domains are working on interoperability and aggregation for their specific sectors. Led by the Institute for Sound & Vision 19 , institutions from the audio-visual domain collaborate to enable aggregation of AV materials. Similar initiatives are taking place for the archival domain, with the National Archives 20 as the facilitator, and for the museum sector. For the latter, the Rijksdienst voor het Cultureel Erfgoed 21 is the main player.

Content aggregation and its supporting technical and organizational structures are not set up uniformly, but differ across the domains. Based on sector-specific best practices, knowledge and culture, each aggregator is setting up domain interoperability in the way that suits its sector best. This is however not done in isolation; the domains are in regular contact to reach consensus on issues such as "which content goes where", to learn from each other and to avoid overlapping work. This way responsibilities & roles are kept clear, while at the same time synergies are exploited where possible.

Middle level: national cross-domain collaboration & aggregation

To enable these sector-specific aggregation initiatives to come together, the results of the NED! project 22 are used. It delivered a basic infrastructure for the interoperability of Dutch digital heritage, using open standards including XML, Dublin Core, OAI-PMH and SRU. It is now being expanded to build a cross-domain heritage aggregator that can become the national hub for content delivery to international initiatives.

Building a national aggregator is however a step-by-step process that will not be finished overnight. Until it is in place, domain-specific aggregators - in case of the library domain the Dutch National Platform for Digital Publications or The European Library 23 - will continue to have an important role in routing Dutch library content directly to top-level services.

Finally, it should be noted that the cross-domain hub is envisioned as a "dark aggregator", i.e. a B2B service without an interface (website) for end users (however, see item 5 below).

Top level: international cross-country collaboration & aggregation

Having established national cross-domain aggregation and interoperability on as many levels as possible 24 , Dutch content can be shown and used on international stages, most notably Europeana 25 . This fast-growing, largely EU-funded metadata aggregator and display space for European digitized works enables people to explore the resources of Europe's museums, libraries, archives and audio-visual collections. It promotes discovery and networking opportunities in a multilingual space where users can engage, share in and be inspired by the rich diversity of Europe's cultural and scientific heritage.

Europeana always connects users to the original source of the material, so authenticity is ensured. The digital objects they can find are not stored centrally with Europeana, but remain hosted at the providing cultural institutions.

Europeana offers the following added values for (Dutch) content-holding institutions:
1. It enriches the experience of their users by making relations between their objects and information from other countries and in other formats. This enables cross-border and interdisciplinary research, as well as enriching the content by presenting it in a wider context.
2. Users expect integrated content: they want to see videos, listen to sound recordings, look at images and read texts, all in one place. Using Europeana they can find related content in multiple formats, from different countries and from diverse domains and disciplines.
3. Europeana makes their content findable in search engines.
4. Europeana generates extra visits to their holdings by redirecting users to the original source of the content (i.e. the content holders' websites).
5. Europeana offers a set of APIs 26 . These not only enable reuse of Europeana content by third parties, but also allow the contextualized & enriched content of the providing institutions to be used in their own environments. The APIs, in other words, make it possible to create user interface elements for (dark) aggregation services on the lower and middle levels, as indicated in Figure 2 by the dotted API arrows.
6. Knowledge transfer can be a major added value for participants in the Europeana network. Europeana collaborates with professionals from digital libraries across Europe and the US. Knowledge generated by these experts is fed back into the network via presentations, workshops and seminars. This way valuable knowledge about the theory and practice of metadata standards, multilinguality, semantic web, information architectures, usability, geolocation, object modeling and many other subjects becomes available to content suppliers.

All advantages mentioned in Section 3 about openness, scalability and collaboration apply equally to Europeana, as these key design choices were also the foundations on which Europeana was built. Similar to the National Platform for Digital Publications, Europeana is also a service in the 4th ring of the BMICE model. Becoming partners in the Europeana network and making their content (re-)usable there will thus allow Dutch institutions to add another layer of added value to Dutch cultural & scientific heritage.

1 Koninklijke Bibliotheek (KB), national library of the Netherlands, http://www.kb.nl
2 100 highlights of the KB, http://www.kb.nl/galerie/100hoogtepunten/index-en.html
3 Memory of the Netherlands, the national programme for digitizing Dutch cultural heritage, http://www.geheugenvannederland.nl
4 Filming and digitization of the Dutch parliamentary papers 1814-1995, http://www.kb.nl/hrd/digitalisering/archief/staten-generaal-en.html (project information) & http://www.statengeneraaldigitaal.nl/ (website)
5 Dutch Historical Newspapers 1618-1945, http://www.kb.nl/hrd/digi/ddd/index-en.html (project information) & http://kranten.kb.nl (website)
6 EDBO - Early Dutch Books Online: 10,000 full-text digitized books from 1781-1800, 2.1 million pages, http://www.earlydutchbooksonline.nl (from 26-5-2011 onwards)
7 KB and Google sign book digitization agreement, http://www.kb.nl/nieuws/2010/google-en.html
8 Digitization by Proquest of early printed books in KB collection, http://www.kb.nl/nieuws/2011/proquest-en.html
9 E-Depot, the KB's digital archiving environment for permanent access to digital objects, http://www.kb.nl/hrd/dd/index-en.html
10 Including, but not limited to Elsevier, BioMed Central, Blackwell Publishing, Oxford University Press, Springer and Brill. For a complete list, see http://www.kb.nl/dnp/e-depot/operational/background/policy_archiving_agreements-en.html
11 NARCIS, National Academic Research and Collaborations Information System, http://www.narcis.nl/about/Language/en
12 Wijngaarden, H. van.: The seven year itch. Developing a next generation e-Depot at the KB. Paper for the 76th IFLA General Conference and Assembly, 10-15 August 2010, Gothenburg, Sweden, http://www.ifla.org/files/hq/papers/ifla76/157-wijngaarden-en.pdf (accessed on 28-03-2011)
13 SCAPE - SCAlable Preservation Environments, http://www.scape-project.eu/
14 KB's open digitization & accessibility standards, http://www.kb.nl/hrd/digitalisering/standaarden-en.html
15 Digitization of ANP news items, http://www.kb.nl/hrd/digitalisering/archief/anp-en.html (project information) & http://anp.kb.nl (website)
16 Hof, B.J.F. et al.: Baten in beeld; Kengetallen kosten-batenanalyse: beelden voor de toekomst, SEO Amsterdam (2006), ISBN13 9789067333405, http://www.kennisland.nl/uploads/.../8ba66f40-51c9-4f7f-9e60-8404c8aa84e8 (accessed on 27-03-2011)
17 BMICE, Business Model Innovatie Cultureel Erfgoed / Business Model Innovation Cultural Heritage, http://www.bmice.nl/
18 BMICE ring model, taken from http://www.den.nl/getasset.aspx?id=Businessmodellen/KL_BusModIn_web_eng_04.pdf&assettype=attachments
19 The Netherlands Institute for Sound & Vision, http://instituut.beeldengeluid.nl
20 National Archives of the Netherlands, http://www.en.nationaalarchief.nl/default.asp
21 Rijksdienst voor het Cultureel Erfgoed, http://www.cultureelerfgoed.nl
22 NED! - Nederlands Erfgoed Digitaal!, http://www.nederlandserfgoeddigitaal.nl/
23 The European Library; on the one hand a free service that offers access to the resources of the 48 national libraries of Europe in 35 languages, on the other hand an international library domain aggregator for Europeana, http://www.theeuropeanlibrary.org
24 Establishing interoperability on as many levels as possible: technical, metadata, semantic, human, inter-domain, organizational, political, etc.
25 Europeana; paintings, music, films and books from over 1500 of Europe's galleries, libraries, archives and museums, http://www.europeana.eu
26 Europeana Application Programming Interfaces, http://version1.europeana.eu/web/api