British Library Datasets Programme
John Kaye, Lead Content Specialist (Datasets) at the British Library, spoke on the British Library's Datasets Programme and the DataCite project.
2. The British Library
Exists for everyone who wants to do
research – for academic, personal, and
commercial purposes.
Covers all subject areas – sciences,
technology, medicine, arts, humanities,
social sciences…
Receives a copy of every item
published in the UK.
Holds over 150 million items, with 3
million items added each year.
Used by over 16,000 people each day
(on site and online).
3. Data and the Digital Landscape
Seismic measurements taken by a
geologist.
Genetic data collected by a medical
researcher.
A survey of public opinions collected
by a sociologist.
4. The Foundation for Research
Data is a crucial component of the scholarly record.
Re-acquisition may be impossible
Datasets are essential to the British Library’s mission
to advance the World’s knowledge.
5. Currently…
There is:
No effective way to link between datasets and articles;
No widely used method to identify datasets;
No widely used method to cite datasets.
As a result, datasets are:
Difficult to discover;
Difficult to access;
In danger of being lost.
6. Datasets Strategy - Vision
Researchers can discover, access, adapt, reuse and
reference datasets in the course of their research
Researchers will be able to track the impact that their datasets
have and receive appropriate credit
The British Library will be an essential component of an
interconnected network of service providers
Datasets from all disciplines remain intact, discoverable,
useable and vital for future generations
7. The Datasets Programme
We envision a future where researchers can:
Discover, access, reuse, and reference
datasets.
Track the impact of the data that they
generate and receive appropriate credit.
Our approach is to:
Provide a focus for the community to
establish needs, requirements and
agreement.
Explore novel technology and creative
solutions.
8. Projects – DataCite
DataCite is an international consortium which
aims to:
Establish easier access to scientific research
data on the Internet
Increase acceptance of research data as
legitimate, citable contributions to the scientific
record
Support data archiving that will permit results
to be verified and re-purposed for future study
9. Projects – DataCite
Founded on 1 Dec 2009
Now 12 members from 9 different countries:
German National Library of Science and Technology (TIB)
British Library (BL), UK
ETH Zurich Library, Switzerland
Institute for Scientific and Technical Information (INIST-CNRS), France
National Technical Information Center (DTIC), Denmark
TU Delft Library, Netherlands
Canada Institute for Scientific and Technical Information (CISTI)
Australian National Data Service (ANDS)
California Digital Library (CDL), USA
Purdue University Libraries (PUL), USA
German National Library of Medicine (ZB MED)
GESIS - Leibniz Institute of Social Sciences, Germany
10. Projects – DataCite
DataCite:
Supports researchers by enabling them to locate,
identify, and cite research datasets with
confidence
Supports data centres by providing workflows and
standards for data publication
Supports publishers by enabling research articles
to be linked to the underlying data
11. A Key Component for Many Goals
[Diagram: Persistent Identification at the centre, surrounded by the goals it supports]
Cite
Reuse
Make Visible
Find
Verify
Track Impact
Access
?
12. Connecting an Article with the Underlying Data
URLs are not persistent
(e.g. Wren JD: URL decay in MEDLINE - a 4-year follow-up study. Bioinformatics. 2008, Jun 1;24(11):1381-5).
Digital Object Identifiers (DOIs) offer a solution
Most widely used identifier for scientific articles
Researchers, authors, publishers know how to use them
Put datasets on the same playing field as articles
Dataset
Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA.
doi:10.1594/PANGAEA.587840
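As a rough illustration of how a dataset DOI like the PANGAEA one on this slide can be turned into a resolvable link and a citation string, here is a minimal Python sketch. The function names are hypothetical, the regular expression is only a loose shape check rather than the full DOI syntax, and the citation layout simply mirrors the slide's example (creators, year, title, publisher, identifier); none of this is presented as DataCite's own tooling.

```python
import re

# Loose shape check (an assumption, not the full DOI syntax):
# a "10." prefix, a numeric registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalise_doi(raw: str) -> str:
    """Strip an optional 'doi:' prefix or resolver URL and check the basic shape."""
    doi = raw.strip()
    for prefix in ("doi:", "https://doi.org/", "http://dx.doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_url(raw: str) -> str:
    """Build a link through the public doi.org resolver."""
    return "https://doi.org/" + normalise_doi(raw)


def cite_dataset(creators: str, year: int, title: str, publisher: str, doi: str) -> str:
    """Format a dataset citation in the same order as the slide's example."""
    return f"{creators} ({year}). {title}. {publisher}. {doi_url(doi)}"


if __name__ == "__main__":
    print(cite_dataset(
        "Yancheva et al", 2007,
        "Analyses on sediment of Lake Maar",
        "PANGAEA", "doi:10.1594/PANGAEA.587840",
    ))
```

Because the link goes through the resolver rather than pointing at the data centre's current web address, the citation can stay valid even if the dataset's landing page moves, which is the persistence argument the slide makes against plain URLs.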
13. The Cost of Visibility
DOI-registration and search results: €0.01 – €1
Storage, quality assurance, and metadata: €50 – €500 (approx 1% of data creation cost)
Harvesting and production: €5,000 – €5,000,000
15. Social Science Collections and Research Datasets Strategy
Content
Continue to build existing content (print and electronic): OECD, World
Bank, UN etc
Enhance links to Economic and Social Data Service: International
Government, Longitudinal, Qualidata
Partnerships
Key partners: UK Data Archive, ONS, The National Archives
Involved in UK Data Forum; signatory to National Data Strategy for
Economic and Social Data
Resource discovery
Resource/user guides – add value to SSCR projects
Dataset Cataloguing
Census 2011 exhibition
Capacity building
Datasets Content Lead Recruited
Training for Reference Team (Social Science, Science)
16. Challenges to Explore
Long-term preservation of data
Standards for data citation and metadata
Methods for assuring quality and integrity of data
Attribution and credit for data producers
Effective discovery and accessibility
17. John Kaye
Lead Content Specialist – Datasets
Social Science Collections and Research
The British Library
96 Euston Road London
NW1 2DB
Telephone: 020 7412 7450
Email: john.kaye@bl.uk
datasets@bl.uk