Presented by Peter Burnhill, Director of EDINA, at PARSE.insight workshop on Preservation, Access and Re-use of Scientific Data, Darmstadt, Germany, 22 September 2009.
Piloting an E-Journals Preservation Registry Service (PEPRS)
Adapted from Progress Report to ISSN Directors’ Meeting, Beijing, September 2009
Peter Burnhill, Director, EDINA
University of Edinburgh
2. Bio: I’m a data person turned into something else
Began at the University of Edinburgh in 1979 as a survey statistician in a
research centre
Changed career in 1984 to set up the University Data Library,
later combining that role with:
Co-director, Regional Research [GIS] Laboratory for Scotland, 1987-93
Past-President of IASSIST, 1996 - 2001
• International association for data librarians and archivists
www.iassistdata.org
Director, Digital Curation Centre, 2004 - 2006 (Phase 1)
www.dcc.ac.uk
Director, EDINA national data centre, 1996 - present day
4. Re-thinking stewardship for scholarly works
The central task:
• to ensure that researchers, students & their
teachers have continuity of access to the online
scholarly resources they need.
• Digital preservation is crucial, but we need to keep the
focus on ‘continuity of access’.
"I am in no way interested in immortality,
but only in the taste of tea."
Lu T'ung (born 755 A.D., reputedly lived 400 years)
5. Emergence of Digital Library
• mix of the document tradition (signifying objects & their use)
and the computation tradition (applying algorithmic, logical,
mathematical, and mechanical techniques to information management)
– “Both traditions are needed. Information Science is rooted in part in
humanities and qualitative social sciences. The landscape of
Information Science is complex. An ecumenical view is needed.”
* M. Buckland, Journal of the American Society for Information Science, 50, pp. 970-974, 1999
• The digital library has words, numbers, pictures and sounds
– Numeric data, online learning & teaching materials, digital pictures
and other audio-visual materials
• What do researchers do?
• And what do they want/need of a digital library
– that they cannot do for themselves?
6. Infrastructure to support four ‘demand-side’ verbs
discover information object of interest
e.g. dataset, article referenced in database, etc.
locate organisation offering service
e.g. data centre
or document delivery service
request use of service
via open access, privilege of membership, payment of money
access object of interest
via online access, document delivery, personal visit
based on MODELS workshops (UKOLN/JISC eLib)
7. Scientific / Scholarly Record
I have never believed that Science equates to
what is published in refereed journals, but:
• Record of Science
does contain what is published in refereed journals
• Record of Scholarship (including Science)
contains what is published in journals (& books)
8. What’s the Problem for E-Journal Content?
• First, the Good News!
– Researchers and students now have online access to journal
articles
* to read & download: Any-where, Any-time …
• Next, the Bad News!
– What is now in digital form may not always be available
– Prevents the ‘tipping point’ from print to online
* Frustrates the economic benefits of existing investment in digital content
* Not good for libraries, not good for publishers
9. Why Worry About Digital Preservation?
• All that is now digital may not always be available
– for a variety of reasons
* Natural disaster
– Earthquake, Fire, Flood
* Computing failure
– Digital decay
» Bit rot, Format obsolescence
* Human folly
– Criminal/political action
» Hacking
– Commercial issues
» Publisher ceases publication with no transfer
» Publisher goes out of business with no transfer
10. Some Consequences of Web
• Essentials of supply chain have changed
* licence to access, not sale of content
• Libraries no longer take physical custody of much key content
* online remotely, not on-shelf locally
• Role of libraries as trusted keepers of information and culture
has been disrupted
– Need assurance of continuity of access
* of all content for future generations
* of the back copies, post-cancellation of the licence
• Scholarly, cultural & intellectual heritage is at risk
11. What’s the Answer?
1. Think: Consider how we ensured continuing access to printed works
over the long term
– Human-readable format; relatively enduring media (paper)
– Multiple copies held in multiple places (a network of libraries)
2. Think again: Understand what is different about the digital
– Formats become obsolete; unseen digital decay (‘bit rot’)
– Can easily be altered (authenticity), copied and transported (theft)
3. Propose: Develop digital preservation policy
• Including practices that address threats & risks
4. Act: Implement policy & practices for global effect
– Need to command consensus across stakeholders (Transparency)
– Need to be sustainable, in organisational, technical & financial terms
5. Reflect: Test, monitor and report: Community & Transparency
12. How important are E-journals?
• 96.1% of Science journals are online
• 86.5% of Arts and Humanities journals are online
• 2006-2007: 102,000,000 downloads
– Up 21% from previous year
• 17% of usage is at the weekend
Source: E-journals: their use, value and impact.
Research Information Network, UK, April 2009.
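As a quick check on what these figures imply, the arithmetic can be sketched as below. It assumes that "17% of usage" refers to the same download count, which the slide does not state explicitly; the variable names are illustrative only.

```python
# Figures quoted on the slide (RIN, UK, April 2009)
downloads_2006_07 = 102_000_000
growth = 0.21          # "Up 21% from previous year"
weekend_share = 0.17   # "17% of usage is at the weekend"

# A 21% rise means: this year's total = previous total * 1.21,
# so the previous year's total is this year's total divided by 1.21.
previous_year = downloads_2006_07 / (1 + growth)

# Assumption: the weekend share applies to the same download count.
weekend_downloads = downloads_2006_07 * weekend_share

print(f"Implied 2005-2006 downloads: {previous_year:,.0f}")
print(f"Implied weekend downloads, 2006-2007: {weekend_downloads:,.0f}")
```

On these assumptions, the previous year saw roughly 84.3 million downloads, and about 17.3 million of the 2006-2007 downloads fell at weekends.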
13. There are now lots of E-Journals and E-Serials
[Figure: E-journals and preservation. Bar chart of journal counts for Ulrich, ISSN and Academic journals; bar labels read 66,000, 59,549 and 30,000.]
14. Why a Preservation Registry?
• Many schemes emerging to meet challenge
• But who is doing what?
– How can libraries & policy-makers assess which e-journals
are being archived, by what methods, and under what
terms of access?
• JISC commissioned a scoping study for an
e-journals preservation registry
– the idea had been mentioned in the literature
15. Scoping Study Report Prior to PEPRS
• Rightscom / Loughborough University, 2007
– Confirmed expressed need among libraries and
policy makers
– Warned of potential burden on digital
preservation agencies
– Recommended:
* an e-journals preservation registry should be built
* the UK Union Catalogue of Serials (SUNCAT)
or SHERPA (Open Access) should get involved
– SUNCAT is hosted and managed at EDINA
16. Piloting …
PEPRS
Project: Funded by JISC,
• over two years, starting August 2008.
– review after 18 months of the prospect for a move into service
Partners: EDINA and ISSN International Centre (Paris)
– Support of Governing Body and Directors of ISSN Network
Purpose: Scope, develop & test a registry service
– Establish and test an Information Architecture
– Seek consensus across stakeholders
– Technical & financial sustainability
17. PEPRS is a project funded by JISC
Joint Information Systems Committee
JISC manages funding from all the UK government agencies responsible for
higher and further education
•‘to provide world-class leadership in the innovative use of ICT to support
education and research’
•JISC manages and funds more than 200 projects within 15 programmes.
Outputs and lessons are made available to the HE and FE community.
•JISC also supports 50 Services that provide expertise, advice, guidance and
resources to address the needs of all users in HE and FE.
• The three largest services are JANET(UK) - which oversees networking -
and two national academic data centres, EDINA and Mimas, based
respectively at the Universities of Edinburgh and Manchester
21. Project deliverables
• Now mid-way in a two-year project:
1. Problem statement, including definition of
user/stakeholder requirements
2. Formal statement of the information architecture
and proposed m2m interfaces, standards and
protocols
3. Prototype and then a working demonstrator, suitable
for external evaluation and as a platform for an
e-journals preservation registry service
4. Business plan, with value proposition
5. Project-to-service plan, for roll-out and launch of
service and phased enhancement of functionality.
22. Presentations & Publication
1. JISC Journals Working Group, London, August 2008
2. ISSN National Directors Meeting, Tunis, September 2008
3. NASIG, 24th Annual Conference, Asheville, NC, USA, 4 June 2009
4. Library of the Chinese Academy of Sciences, Beijing, 15 September 2009
5. ISSN National Directors Meeting, Beijing, 17 September 2009
6. PARSE.Insight Workshop, Darmstadt, Germany, 21 September 2009
7. …
P. Burnhill, F. Pelle, P. Godefroy, F. Guy, M. Macgregor, A. Rusbridge & C. Rees.
Piloting an e-journals preservation registry service.
Serials 22(1), March 2009. [UK Serials Group]
P. Burnhill.
Tracking e-journal preservation: archiving registry service anyone?
Against the Grain 21(1), February 2009, pp. 32, 34, 36.
23. E-Journals
PEPRS
Scope: Journal and other serial content in digital format
– Focus on those serials with the ISSN identifier
* If it's worth saving, it should have an ISSN
Multi-level: article is the information object of desire
– Focus on Journal Title-level
– Issued Content, i.e. Volumes (Year), Articles
International:
– Matters for the UK
* But matters to all countries
– Cannot be resolved in (national) isolation
24. Preservation
PEPRS
Scope: digital preservation agencies for journal content
Multi-level:
– 3rd Party organisations (e.g. CLOCKSS & Portico; PubMed)
– National Libraries (e.g. BL (UK), KB (Netherlands)), some with
legal deposit
– Libraries and library consortia (eg UK LOCKSS Alliance)
25. Registry
PEPRS
Scope: what is being done by digital preservation
agencies for e-journals
Multi-level:
– Who can register, who decides who…
– What should be registered
* Intention, ingest pending (agreed), ingest in progress, ingest
completion.
– Self-statement of methods, using comparable vocabulary
International:
– Registry must be international
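The four registration stages above form an ordered vocabulary. A minimal sketch of how a registry might encode them follows; the enum name, the numeric ordering and the "never moves backwards" rule are assumptions for illustration, not part of the PEPRS design as stated.

```python
from __future__ import annotations

from enum import IntEnum


class IngestStatus(IntEnum):
    """Hypothetical encoding of the stages listed on the slide.

    The integer values only express order: stated intention comes before
    agreed (pending) ingest, which comes before ingest in progress and completion.
    """
    INTENTION = 1
    INGEST_PENDING = 2      # agreed, but not yet started
    INGEST_IN_PROGRESS = 3
    INGEST_COMPLETE = 4


def is_forward_transition(old: IngestStatus, new: IngestStatus) -> bool:
    """Assumption: a registered status never moves backwards."""
    return new > old


def furthest_status(statuses: list[IngestStatus]) -> IngestStatus | None:
    """Most advanced status reported for one title across agencies, if any."""
    return max(statuses) if statuses else None


# Usage with made-up agency reports for a single title
reports = {"Agency A": IngestStatus.INTENTION, "Agency B": IngestStatus.INGEST_IN_PROGRESS}
print(furthest_status(list(reports.values())).name)  # INGEST_IN_PROGRESS
```

Using an ordered type rather than free text is one way to get the "comparable vocabulary" the slide asks for: statuses from different agencies can then be compared directly.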
26. Service
PEPRS
Scope: delivering value for various use communities
Multi-use communities:
– Librarians
– Policy makers and funders
– Digital preservation agencies
– Publishers
– Subscription Agents
– etc
International:
– Action taken in and for the UK
– How to provide international service?
27. 4. Digital Preservation Agencies in the Pilot
* Two 3rd Party Organisations
– CLOCKSS
– Portico
* Two National Libraries (cf. legal deposit)
– British Library (BL)
British Library e-Journal Digital Archive
– Koninklijke Bibliotheek (KB e-Depot)
KB, National Library of the Netherlands
* One library cooperative
– UK LOCKSS Alliance
31. Legal Deposit
• Works well with print via legislation and national
libraries.
• Countries with legislation enacted (or ‘in train’)
for e-materials include: Canada, Denmark,
Finland, France, Germany, Iceland, New
Zealand, Norway, South Africa, Sweden, UK
• But not all countries have such legislation (notably the USA), and in the UK
the legislation supports voluntary deposit, with
restrictions on mode of access
35. Piloting an E-journals Preservation Registry Service (PEPRS)
JISC-funded project, EDINA & ISSN-IC as partners
[Diagram: E-J Preservation Registry Service, combining (a) METADATA on extant e-journals with (b) METADATA on preservation actions]
36. Piloting an E-journals Preservation Registry Service
[Diagram: E-J Preservation Registry Service, combining (a) METADATA on extant e-journals with (b) METADATA on preservation action, linked by a data dependency]
KEY DATA (a): Serial Title-level
• Title + ISSN; Publisher; related
• Extent issued in digital form?
KEY DATA (b): Agency; Status
• Serial Title-level, ISSN?
• Policies, e.g. on access
• Extent preserved
37. Piloting an E-journals Preservation Registry Service
[Diagram: (c) SERVICES: user requirements, built on the E-J Preservation Registry Service, which combines (a) METADATA on extant e-journals with (b) METADATA on preservation action]
38. Piloting an E-journals Preservation Registry Service
[Diagram: E-J Preservation Registry Service, combining (a) METADATA on extant e-journals with (b) METADATA on preservation action; (a) has a data dependency on the ISSN Register]
39. Piloting an E-journals Preservation Registry Service
[Diagram: E-J Preservation Registry Service, combining (a) METADATA on extant e-journals, with a data dependency on the ISSN Register, and (b) METADATA on preservation action, supplied by Digital Preservation Agencies, e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.]
40. Abstract Data Model: Figure 1 in reference paper in Serials, March 2009
[Diagram: (c) SERVICES: user requirements, built on the E-J Preservation Registry Service, which combines (a) METADATA on extant e-journals, with a data dependency on the ISSN Register, and (b) METADATA on preservation action, supplied by Digital Preservation Agencies, e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.]
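The abstract model rests on a data dependency: each preservation statement (b) should resolve to title-level data (a) through the ISSN. A minimal sketch follows, assuming the ISSN is the only join key; the class and function names, and all identifiers in the example, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TitleRecord:
    """(a) Title-level key data on an extant e-journal, derived from the ISSN Register."""
    issn: str
    title: str
    publisher: str


@dataclass(frozen=True)
class PreservationRecord:
    """(b) Key data on preservation action, as reported by one agency."""
    agency: str
    issn: str              # the link back to (a): the data dependency
    status: str
    extent_preserved: str


def unresolved_actions(titles: list[TitleRecord],
                       actions: list[PreservationRecord]) -> list[PreservationRecord]:
    """Agency records whose ISSN does not resolve to any title-level record.

    These are the cases the registry cannot attach to (a), for example
    content being preserved before an ISSN has been assigned.
    """
    known = {t.issn for t in titles}
    return [a for a in actions if a.issn not in known]


# Made-up identifiers and agency names, for illustration only
titles = [TitleRecord("0000-0001", "Example Journal A", "Publisher X")]
actions = [
    PreservationRecord("Agency A", "0000-0001", "ingest complete", "1998-2008"),
    PreservationRecord("Agency B", "0000-0009", "intention", ""),
]
print([a.issn for a in unresolved_actions(titles, actions)])  # ['0000-0009']
```

Records that fail to resolve are exactly the "ISSN not assigned" cases raised later in the talk, so surfacing them explicitly is more useful than silently dropping them.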
41. 6. Project Progress & Achievements
• Data implementation model for Project
• Screenshots from ‘working’ Prototype
• Liaison with Archiving Agencies
– sample data & data fields
• (Presentations & publications)
42. Data Model for Prototype & Working Demonstrator:
(1) obtain subsets of data from the ISSN Register and from Preservation Agencies;
(2) set up a secure system for project purposes;
(3) develop the prototype / demonstrator
[Diagram: Pilot of E-J Preservation Registry Service (Project): the E-Journal Preservation Registry takes E-J metadata from the ISSN Register and preservation action metadata from Digital Preservation Agencies, e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.]
43. [Screenshot] This is a ‘Prototype’, being shared by project partners, and may be shown to project
associates & the funders (JISC): this shows the Basic Search
46. [Screenshot] Example of a search that reports no known preservation activity for this e-journal
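The search behaviour shown in the screenshot can be sketched as a lookup that distinguishes three outcomes. The data shapes, function name and example identifiers are assumptions for illustration. Matching on both e-ISSN and print ISSN reflects the agency fields listed later, since an agency may report either.

```python
from __future__ import annotations


def preservation_report(issn: str,
                        register: dict[str, str],
                        agency_records: list[dict[str, str]]) -> str:
    """Report what the registry knows about one ISSN.

    register: ISSN -> title, standing in for the e-journals register (hypothetical shape).
    agency_records: rows reported by agencies, with 'agency', 'e_issn', 'print_issn'.
    """
    if issn not in register:
        return f"{issn}: not in the e-journals register"
    # An agency may report either the e-ISSN or the print ISSN, so match on both
    hits = sorted({r["agency"] for r in agency_records
                   if issn in (r.get("e_issn"), r.get("print_issn"))})
    if not hits:
        # Absence of a report is not evidence that nothing is being preserved
        return f"{issn} ({register[issn]}): no known preservation activity"
    return f"{issn} ({register[issn]}): preservation reported by {', '.join(hits)}"


# Made-up register entries and agency rows
register = {"0000-0001": "Example Journal A", "0000-0002": "Example Journal B"}
records = [{"agency": "Agency A", "e_issn": "0000-0001", "print_issn": ""}]
print(preservation_report("0000-0002", register, records))
```

The wording "no known" matters: the registry can only reflect what agencies have reported, so the sketch avoids claiming that a title is unpreserved.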
47. ISSN MARC 21 fields (source: ISSN data)
001 Control Number (internal)
008 Fixed-Length Data Elements, incl. country code
022 International Standard Serial Number (ISSN & ISSN-L)
007 Medium of publication
222 Key Title
210 Abbreviated Title
245 Title proper
246 Varying Form of Title
710 Added Entry - Corporate Name
260 Publication, Distribution, etc. (Imprint)
362 Dates of Publication and/or Sequential Designation
776 Additional Physical Form Entry
780 Preceding Entry
785 Successor Title
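To show how the title-level key data (a) might be drawn from these fields, here is a minimal sketch. It uses a hypothetical, simplified record shape (plain dictionaries, not any real MARC library's API). It relies on subfield $a of 022 for the ISSN, $l of 022 for the ISSN-L, and $x of the linking-entry fields (776, 780, 785) for related ISSNs. The example record values are made up.

```python
from __future__ import annotations

from typing import Dict, List, Optional

# Hypothetical, simplified record shape (not a real MARC library's API):
# MARC tag -> list of fields; each field maps subfield code -> value.
Record = Dict[str, List[Dict[str, str]]]


def first_subfield(record: Record, tag: str, code: str) -> Optional[str]:
    """Value of the first occurrence of subfield `code` in field `tag`, if any."""
    for field in record.get(tag, []):
        if code in field:
            return field[code]
    return None


def linked_issns(record: Record, tag: str) -> list[str]:
    """ISSNs ($x) carried in linking-entry fields such as 776, 780 and 785."""
    return [field["x"] for field in record.get(tag, []) if "x" in field]


def title_level_key_data(record: Record) -> dict:
    return {
        "issn": first_subfield(record, "022", "a"),
        "issn_l": first_subfield(record, "022", "l"),     # linking ISSN
        "key_title": first_subfield(record, "222", "a"),
        "publisher": first_subfield(record, "260", "b"),
        "other_format_issns": linked_issns(record, "776"),
        "preceding_issns": linked_issns(record, "780"),
        "succeeding_issns": linked_issns(record, "785"),
    }


# Made-up example record
example: Record = {
    "022": [{"a": "0000-0001", "l": "0000-0002"}],
    "222": [{"a": "Example journal (Online)"}],
    "260": [{"a": "Paris", "b": "Publisher X"}],
    "776": [{"t": "Example journal (Print)", "x": "0000-0002"}],
    "780": [{"x": "0000-0003"}],
}
print(title_level_key_data(example))
```

The linking-entry ISSNs are what let the registry connect an online title to its print form and to earlier or later titles.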
48. Possible Agency fields (source: agency data)
Archiving Agency
e-ISSN
Print-ISSN
Title
Publisher
Preservation Status
Holdings (Volume, Issue)
Start Date of Committed Titles
End Date of Committed Titles
Start Date of Processed Titles
End Date of Processed Titles
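Sample data from agencies has to be normalised before it can be loaded. A minimal sketch follows, assuming an agency supplies a CSV export with a subset of the fields above. The column names, the dataclass and the use of years for committed dates are all hypothetical choices.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgencyTitle:
    agency: str
    e_issn: Optional[str]
    print_issn: Optional[str]
    title: str
    status: str
    committed_start: Optional[int]
    committed_end: Optional[int]   # None = open-ended commitment


def _text(value: str) -> Optional[str]:
    """Blank cells become None rather than empty strings."""
    value = value.strip()
    return value or None


def _year(value: str) -> Optional[int]:
    value = value.strip()
    return int(value) if value else None


def parse_agency_csv(text: str) -> list[AgencyTitle]:
    """Parse one agency's sample export into typed records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        AgencyTitle(
            agency=row["agency"].strip(),
            e_issn=_text(row["e_issn"]),
            print_issn=_text(row["print_issn"]),
            title=row["title"].strip(),
            status=row["status"].strip(),
            committed_start=_year(row["committed_start"]),
            committed_end=_year(row["committed_end"]),
        )
        for row in reader
    ]


# Made-up sample export
SAMPLE = """agency,e_issn,print_issn,title,status,committed_start,committed_end
Agency A,0000-0001,0000-0002,Example Journal A,ingest complete,1998,2008
Agency B,0000-0003,,Example Journal B,intention,2005,
"""

for rec in parse_agency_csv(SAMPLE):
    print(rec.agency, rec.e_issn, rec.print_issn, rec.committed_start, rec.committed_end)
```

Turning blank cells into explicit `None` keeps "no print ISSN" and "open-ended commitment" distinguishable from data-entry errors.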
49. Thoughts and action
Still early days:
• Use E-Journals Register, sourced from ISSN Register
– Over 66,000 e-serials now have ISSN
• Need to agree what users want to know
– descriptors of digital preservation policy & practices
• Use network interoperability (to search or to harvest)
– for up-to-date, reliable information held by preservation agencies,
and for their statements about policies and coverage
• ‘Titles’ is easy, but ‘Holdings’ is difficult!
– role for DOI and ONIX for Serials?
• Ensure that e-journals you care about get an ISSN identifier!
– The Directory of Open Access Journals (DOAJ) requires it
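Harvesting keeps the registry current without agencies re-sending everything. The sketch below assumes, purely for illustration, that an agency exposes its records over OAI-PMH; the slide names no protocol. It builds an incremental `ListRecords` request and merges harvested statements so that the newest one per agency and ISSN wins. The base URL and record shape are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode


def list_records_url(base_url: str, since: str, metadata_prefix: str = "oai_dc") -> str:
    """OAI-PMH ListRecords request for records changed since the last harvest."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix, "from": since})
    return f"{base_url}?{query}"


def merge_harvest(registry: dict[tuple[str, str], dict],
                  harvested: list[dict]) -> dict[tuple[str, str], dict]:
    """Keep the most recently updated statement per (agency, ISSN).

    'updated' is an ISO 8601 date string, so plain string comparison orders it correctly.
    """
    merged = dict(registry)
    for rec in harvested:
        key = (rec["agency"], rec["issn"])
        current = merged.get(key)
        if current is None or rec["updated"] > current["updated"]:
            merged[key] = rec
    return merged


print(list_records_url("https://agency.example.org/oai", "2009-09-01"))

# Made-up registry state and harvested update
registry = {("Agency A", "0000-0001"): {"agency": "Agency A", "issn": "0000-0001",
                                        "status": "ingest in progress", "updated": "2009-06-30"}}
harvested = [{"agency": "Agency A", "issn": "0000-0001",
              "status": "ingest complete", "updated": "2009-09-15"}]
merged = merge_harvest(registry, harvested)
print(merged[("Agency A", "0000-0001")]["status"])  # ingest complete
```

Searching agencies live, the other option on the slide, avoids copying data but makes the registry only as available as its slowest agency. Harvesting trades freshness for that robustness.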
50. Questions, Questions, Questions ….
• What to do about e-serial content that is being preserved
where the ISSN has not been assigned?
• How, or whether, to include print journals whose content is
digitised retrospectively?
* some of which may have a print ISSN, but many will not
• How to collect, record and display ‘holdings’ information?
– The extent preserved: years? issues? articles?
• How to be an international registry, and will that scale?
• If attention is switching from preservation to post-cancellation
access, should PEPRS try to adapt?
– But that is for a national registry (PeCAN Project)
– A national not an international responsibility
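One small part of the ‘holdings’ question is display: turning a list of preserved units into something readable, and showing what is missing. A minimal sketch follows, assuming holdings are recorded at year level (the slide leaves the unit open). The function names and example years are hypothetical.

```python
from __future__ import annotations


def summarise_ranges(values: list[int]) -> str:
    """Collapse e.g. preserved years into compact ranges: [1998, 1999, 2001] -> '1998-1999, 2001'."""
    nums = sorted(set(values))
    if not nums:
        return ""
    ranges = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n != prev + 1:          # gap: close the current range
            ranges.append((start, prev))
            start = n
        prev = n
    ranges.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def gaps(issued: list[int], preserved: list[int]) -> str:
    """Years issued in digital form but not (yet) reported as preserved."""
    return summarise_ranges(sorted(set(issued) - set(preserved)))


# Made-up holdings for one title
issued = list(range(1995, 2010))
preserved = [1998, 1999, 2000, 2001, 2002, 2003, 2005, 2007, 2008]
print(summarise_ranges(preserved))  # 1998-2003, 2005, 2007-2008
print(gaps(issued, preserved))      # 1995-1997, 2004, 2006, 2009
```

The same function works for volume numbers. Issue- and article-level holdings need a richer key (e.g. volume plus issue), which is where the difficulty flagged on slide 49 begins.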
51. ISSN-IC looking at assignment workflow
• As part of PEPRS project, ISSN-IC has drafted a workflow.
An example might be:
1. Discover ‘new’ (unassigned) e-serial from a digital preservation
agency
– Establish ISSN eligibility and ISSN jurisdiction for that publication
2. Temporary use of identifier local to PEPRS
3. ISSN-IC work with National ISSN Centre
* according to pre-agreed schedule
4. E-serial becomes included in ISSN Register
5. Metadata and ‘pointer’ updated in the e-journals preservation registry
service
6. Happiness!
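The draft workflow can be read as a simple state machine in which each step must complete before the next. The sketch below encodes it that way, with checks for the identifiers that steps 2 and 4 introduce. The step names, the `Candidate` class and the local identifier format are hypothetical, not ISSN-IC's actual workflow system.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Step(Enum):
    DISCOVERED = auto()            # 1. 'new' e-serial found via a preservation agency
    ELIGIBILITY_CHECKED = auto()   # 1. ISSN eligibility and jurisdiction established
    LOCAL_ID_ASSIGNED = auto()     # 2. temporary identifier local to PEPRS
    WITH_NATIONAL_CENTRE = auto()  # 3. ISSN-IC works with the National ISSN Centre
    IN_ISSN_REGISTER = auto()      # 4. e-serial included in the ISSN Register
    REGISTRY_UPDATED = auto()      # 5. metadata and 'pointer' updated in the registry


ORDER = list(Step)  # definition order = workflow order


@dataclass
class Candidate:
    title: str
    step: Step = Step.DISCOVERED
    local_id: Optional[str] = None
    issn: Optional[str] = None


def advance(c: Candidate, *, local_id: Optional[str] = None,
            issn: Optional[str] = None) -> Step:
    """Move a candidate to the next step, enforcing the identifiers each step needs."""
    i = ORDER.index(c.step)
    if i == len(ORDER) - 1:
        raise ValueError("workflow already complete")
    nxt = ORDER[i + 1]
    if nxt is Step.LOCAL_ID_ASSIGNED:
        if local_id is None:
            raise ValueError("a temporary PEPRS-local identifier is required")
        c.local_id = local_id
    elif nxt is Step.IN_ISSN_REGISTER:
        if issn is None:
            raise ValueError("the assigned ISSN is required")
        c.issn = issn
    c.step = nxt
    return nxt


# Walk a made-up e-serial through the workflow
c = Candidate("Example e-serial")
advance(c)                            # eligibility and jurisdiction established
advance(c, local_id="PEPRS-TMP-0001")
advance(c)                            # passed to the National ISSN Centre
advance(c, issn="0000-0001")
advance(c)                            # registry record updated
print(c.step.name, c.local_id, c.issn)
```

Keeping the temporary local identifier after the ISSN arrives lets earlier registry entries be redirected to the new ISSN.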
52. Project developments
• Interaction with Preservation Agencies
– Blogging workshop for all Project participants.
Seek views on data flows, data fields, vocabularies
etc.
• Development of demonstrator, to support
pilot activity
Planned for autumn/winter 2009
• Assessment of future of pilot, and future
funding
Scheduled for February 2010
53. Project Update at Project Website
http://edina.ac.uk/projects/peprs/index.html