7. [Diagram: Scholarly record. Labels: Discovery, Access, Record Permanence, Citation, Metadata Exposure, Trust Fabrics, Copyright]
11. Difficult to discover. Good luck finding the data! “Source: Committee on Climate Change”
16. Research training is based on scholarly communication. [Scholarly record diagram] Rarely includes data.
17. Scholarly communication requires intellectual exchanges. [Scholarly record diagram] No such data fabric.
18. Scholarly discourse requires a record and provenance. [Scholarly record diagram] Almost non-existent for data.
[Slide to remind people we are focusing on research data] When you read a research article, you are reading someone’s interpretation of some underlying evidence. And that’s a subjective interpretation. When we talk about data, we are really talking about the solid evidence that underpins these research articles.
Data is the foundation for research. It is an essential component of the scientific record. It is time-consuming and costly to produce, and re-acquisition may be impossible. It is therefore essential that it is preserved and shared.
Despite the importance of this data, there is… As a result, datasets are … Many researchers are willing to share their data, but … .
This is like grey literature in the 1980s and early 1990s, before the web. If you wanted a conference paper, you either had to show up in person or know someone who was there. This split research institutes and universities into haves and have-nots. You had to be on the ‘secret paper passing network’ to learn about the hottest, latest research.
This means that almost 80% of researchers put their datasets where? On their laptops, desktops, desk drawers, and departmental servers. This is not a serious way to run a serious business! Management practices vary tremendously: some researchers will have good practices, but many will not, placing the data at risk. 88% said they would make data available and 43% expressed the need to access others’ data. Researchers who produce the essential data that drive new science are often unrewarded, and data centres have considerable challenges justifying their budget and existence. And how is the state of resource discovery for datasets?
UKRDS Study
(1) Data is difficult to retain and manage once project funding ceases (compare grey and published articles)
(2) Only 12% do not make their data available, but informal networks are predominant
(3) 43% expressed the need to access others' data
As a result, datasets are:
- Difficult to discover
- Difficult to access
- In danger of being lost
This is widely recognised…
[The Economist - citation appears in print and web versions, so not to save space] Good luck finding the data! Cannot:
- Validate the author’s claims
- Investigate the data for other interesting facts…
I am here to tell you about the datasets programme, which has come about because of rapid changes in the digital landscape. People are generating and sharing ever-increasing volumes of data. We refer to collections of data as datasets. While the nature of datasets varies across disciplines, researchers within each discipline typically agree on what constitutes a dataset for them. Examples of datasets include (1) volcanic data, (2) a sound archive, (3) a cluster of chromosomes inside a breast cancer cell, and (4) a UK poll of voting intention (blue Conservative, red Labour, yellow Liberal Democrat). Within the datasets programme, we consider a dataset to be an organised collection of digital objects that is produced or consumed during research. We emphasise the role that the dataset plays in the research activity, its importance to researchers, its impact, and its potential for reuse. Despite the differing nature of datasets, many of the services required by researchers are shared, such as methods of citation, discovery, and preservation.
The current situation for data isn’t good. Articles are well catered for by libraries and publishers. The underlying data is being neglected. Unsatisfactory.
We can even ask the question: are datasets first-class citizens in the record of science? Contrast this situation with the one that we have for research articles:
- Libraries ensure long-term storage and management of articles.
- There are well-established services for giving access to articles.
- Nearly all published articles are held in multiple national libraries.
- Articles and citations form the backbone of impact analysis of researchers.
- Catalogues and full-text search support discovery.
Clearly, this is an untenable situation and we need to take action!
The datasets programme has been established to explore how the Library can help… Not only do we want to ensure data is preserved, we envision a future where… Our approach is to foster collaboration and…
How can we achieve this? We are working on a number of projects – see www.bl.uk/datasets
DataCite 2: We see persistent identification as a key component for this…
So what can organisations like the British Library do to help address these issues? Libraries have a reasonable level of credibility with identifiers and metadata to enable discovery and enhance access. We are cross-discipline, have established relationships with publishers, universities, researchers, and funders, and play a core role in the national research infrastructure. We feel that we can address some of the barriers that we are seeing to data citation. We are clear that we do not want to re-invent the wheel and that we want to ensure that the right incentives are there.
DataCite 3: The approach that DataCite is taking, using DOIs, has some important social benefits. Researchers, authors, and publishers are comfortable with DOIs, understand them, and know how to use them. DOIs put datasets on a level playing field with articles. [Add citation of data in an article… REAL ONE!]
Example Project 1 – DataCite. Our long-term vision is to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence. Members:
- Germany - TIB
- Germany - GESIS Leibniz Institute
- Germany - German Library of Medicine
- United Kingdom - The British Library
- France - INIST
- Switzerland - ETH Zürich
- Denmark - TIC
- Netherlands - TU Delft
- Canada - CISTI
- Australia - ANDS
- USA - CDL
- USA - Purdue
Today we will be talking about DataCite, an international association of 15 organisations. We have just had our one-year anniversary (DataCite was founded at the British Library in December 2009). We are working together to…
What is a DOI? A unique identifier, similar in concept to an ISBN. It consists of a prefix and a suffix.
(NOTE – this DOI will not resolve!)
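To make the prefix and suffix structure concrete, here is a minimal Python sketch. It assumes the usual DOI convention that the prefix starts with "10." and is separated from the suffix by the first slash. The function name split_doi and the example DOI are made up for illustration; like the DOI on the slide, the example will not resolve.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    Assumption: the prefix begins with "10." and everything after
    the first "/" is the suffix (which may itself contain slashes).
    """
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not of the form 10.<registrant>/<suffix>: {doi!r}")
    return prefix, suffix


# Hypothetical DOI for illustration only; it is not registered.
example = "10.1234/example-dataset"
prefix, suffix = split_doi(example)
print("prefix:", prefix)
print("suffix:", suffix)
```

The prefix identifies the organisation that registered the DOI, and the suffix is chosen by that organisation to identify the individual item, which is why the sketch only validates the prefix and leaves the suffix unconstrained.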
We have built a service for minting DOIs. This is what we will tell you about today. But first, we will quickly introduce DOIs.