Publishing your research: Research Data Management (Introduction) (November 2013) slides. Delivered as part of the Durham University Researcher Development Programme. Further training is available at https://www.dur.ac.uk/library/research/training/
Publishing your research: Research Data Management (Introduction)
1. Publishing your Research
Research Data Management
James Bisset james.bisset@durham.ac.uk
Academic Liaison Librarian (Research Support)
Sebastian Pałucha sebastian.palucha@dur.ac.uk
Research Data Manager (CIS/Library)
2. Session outline
- What... is “Research Data”?
- Small group activity
- What... is “Research Data Management”?
- Data life cycle, existing practice and policy
- Why... is Research Data Management important?
- Drivers for change, requirements on & benefits for researchers
- How... to manage and secure research data
- Data management planning, document storage and back-up
- How... to share data
- Benefits of sharing data and tools available
14. Research Data
There is no single or simple definition of what constitutes “research data”:
- it is used to support the production or validation of original research
- it can be “born digital”, or it can be analogue (and then digitised)
- it is situational...
15. Data is situational
Ship logbooks:
- historical record of events
- data to reconstruct weather patterns
- data on naval personnel (genealogical / demographic)
- extrapolation of data on ration provisions etc.
16. Data is situational
CCTV footage:
- data on crime & prevention
- data on foot-fall
- demographic data
20. Data is situational
Data can be used...
... and re-used...
... for purposes you may not have thought of...
... even after you have extracted all the value you need from it.
21. “Research data ... is collected, observed, or created, for purposes of analysis to produce original research results.”
Research Data Explained (2013) Edinburgh University MANTRA
http://datalib.edina.ac.uk/mantra/
26. Where is your data?
JISC RDM Survey
- Russell Group institutions average over 2PB of data
- significant data storage on external drives, hard drives etc.
- 23% of institutions had lost research data
- how would this impact upon your PhD?
28. “Research data management concerns the organisation of data, from its entry to the research cycle through to the dissemination and archiving of valuable results.”
Whyte, A., Tedds, J. (2011). “Making the Case for Research Data Management”. DCC Briefing Papers.
http://www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
34. … it is a requirement
“Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.”
RCUK Common Principles on Data Policy
http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
35. … it is a requirement
The European Commission is developing an Open Data Pilot to:
“facilitate research data registration, discovery, access and re-use”
Horizon 2020 – Outline of a Pilot for Open Research Data
http://www.coar-repositories.org/files/Horizon_2020_Open_Data_Pilot_20130703_final.pdf
40. … it is good practice
• Knowing where it is aids retrieval
• You may need to retrieve the data 3 months or 3 years after you have “created” it
- e.g. when writing up your PhD or article
• To safeguard your data from loss / theft / corruption or damage / obsolescence
41. … it is good practice
• Project in 1986
• Multiple formats of data (image, video, text) stored on Laser Disc
• Copyright issues
http://www.bbc.co.uk/news/technology-13367398
http://en.wikipedia.org/wiki/BBC_Domesday_Project
43. … boosts your profile
“10,555 studies … we found that studies that made data available in a public repository received 9% more citations than similar studies for which the data was not made available.”
Piwowar H. & Vision T. (2013) “Data reuse and the open data citation advantage” PeerJ
http://peerj.com/articles/175/
45. … data can be re-used
• You can share something which can be built upon in ways you might not have imagined
- inter-disciplinary research
- collaboration opportunities
• Data can be tested and replicated
- identify fraud and error
- Fraud in cancer care
- Sir Cyril Burt (1883-1971): Heritability of IQ
- Reinhart-Rogoff revisited
46. Why manage your data?
• You are increasingly likely to be required to
• It is good research practice
- to defend your research publications
- to secure against loss of data
• It boosts your citation potential
• Your data can be re-used and replicated
50. Data Management Planning
• The majority of UK funders ask for a data management plan as part of a funding application
• Purpose:
- to help you properly manage your data
- to provide a funder with confidence that you are a good investment.
52. Questions to consider
• What is the story of the data?
• What form and format are the data in?
• What is the expected lifespan of the data?
• How could the data be used, re-used or re-purposed?
• How large is the data set? Will it grow?
• Who are the potential audiences?
• Who owns the data?
• Is the data sensitive?
• What publications are linked to the data?
• How should the data be made accessible?
Witt & Carlson (2007) “Conducting a Data Interview” Scientist
53. Who is involved?
• Influences and dependencies…
- Researcher requirements
- Funder and institutional requirements
- Availability and suitability of data storage
- Research group requirements
- Publisher requirements
- Legal requirements (FoI, Copyright, Ethics, Data Protection)
54. Who is involved?
• Actors and interactions
- Researcher & PI
- Research Office
- IT Business Partner for Research / Research Data Manager
- Librarians / archivists / records managers (metadata schema, curation)
- FOI officers
- Technical and laboratory staff
61. DM Plan: common themes
• Data collection, what and how (i.e. volume, format)
• Documentation, administrative data and metadata
• Ethics and legal compliance (the FOI, IPR and DP acts; confidentiality and embargoes)
• Storage and backup (day-to-day practices)
• Data sharing and preservation (where, when and who will have access)
63. Organising your data
• Plan a hierarchy of files and folders, organised by…
- type of data (text, image, model, sound, video etc.)
- type of research activity (survey, interview etc.)
- type of material (documentation, publication, etc.)
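One way to set up such a hierarchy at the start of a project is with a short script. The Python sketch below is illustrative only: the `LAYOUT` dictionary, the `create_layout` helper and the folder names are assumptions based on the three groupings above, not a prescribed Durham structure. In practice you would probably choose the single grouping that suits your project.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical layout: one top-level folder per grouping named on the slide.
LAYOUT: Dict[str, List[str]] = {
    "data_type": ["text", "image", "model", "sound", "video"],
    "activity": ["survey", "interview"],
    "material": ["documentation", "publication"],
}


def create_layout(root: Path) -> List[Path]:
    """Create the folder hierarchy under root (safe to run more than once)."""
    created = []
    for group, subfolders in LAYOUT.items():
        for name in subfolders:
            folder = root / group / name
            folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
            created.append(folder)
    return created


# Example run in a throwaway directory; "CARD" is the project identifier used later in these slides.
with tempfile.TemporaryDirectory() as tmp:
    folders = create_layout(Path(tmp) / "CARD")
    relative = sorted(f.relative_to(tmp).as_posix() for f in folders)
print(len(folders))
print(relative[:3])
```

Because `exist_ok=True` is used, re-running the script on an existing project adds any missing folders without touching files already in place.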
64. Organising your data
• Be systematic and consistent with naming conventions and housekeeping from the start…
- files should be sortable by name
- filenames should indicate the “version”
- filenames should be easily distinguishable
66. Thinking about filenames
• Consider including elements in filenames…
- Date: 2013_12_12
- Project identifier: CARD
- Content description: RDM_presentation
- Version: v1_2
2013_12_12-CARD-RDM_presentation-v1_2.pptx
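The naming elements above can be assembled mechanically, which helps keep names consistent. This Python sketch is illustrative only: the helper `build_filename` and its parameters are hypothetical, not part of any tool mentioned in these slides. It puts the date first in `YYYY_MM_DD` form so that an alphabetical sort is also a chronological one, and writes the version with an underscore rather than a dot.

```python
from datetime import date


def build_filename(day: date, project: str, description: str,
                   major: int, minor: int, ext: str) -> str:
    """Join date, project identifier, content description and version.

    Hypothetical helper: hyphens separate the elements,
    underscores are used within them.
    """
    date_part = day.strftime("%Y_%m_%d")   # year first, so names sort by date
    version_part = f"v{major}_{minor}"     # underscore instead of a dot
    return f"{date_part}-{project}-{description}-{version_part}.{ext}"


name = build_filename(date(2013, 12, 12), "CARD", "RDM_presentation", 1, 2, "pptx")
print(name)  # 2013_12_12-CARD-RDM_presentation-v1_2.pptx

# Date-first names sort chronologically with a plain alphabetical sort.
later = build_filename(date(2014, 1, 3), "CARD", "RDM_presentation", 1, 3, "pptx")
print(sorted([later, name]))
```

One caveat with plain alphabetical sorting: `v1_10` sorts before `v1_2`, so if you expect ten or more versions, zero-padding the numbers (e.g. `v01_02`) keeps the order correct.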
67. Thinking about filenames
• Consider including elements in filenames…
- Date: 2013_12_12
- Project identifier: CARD
- Content description: RDM_presentation
- Version: v1_2
CARD/2013_12_12-RDM_presentation-v1_2.pptx
68. Thinking about filenames
• Pitfalls to avoid
- Whitespace
- Unsupported characters in filenames
- Capitalisation
2013_12_12-RO-RDM_presentation-v1.2.pptx
2013_12_12-RO-rdm_presentation-v1.2.pptx
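The pitfalls above can be checked automatically before files pile up. The following Python sketch is a hypothetical checker, not an official tool: the `SAFE_NAME` pattern (letters, digits, `_`, `-`, then a single dot before the extension) is an assumed convention you would adapt locally. Run on the two example names from the slide, it flags the extra dot in `v1.2` and the pair of names that differ only in capitalisation.

```python
import re
from typing import Iterable, List, Tuple

# Assumed "safe" pattern: letters, digits, '_' and '-', then exactly one '.'
# followed by the extension. Adjust to your own local convention.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def filename_problems(name: str) -> List[str]:
    """Return the pitfalls found in a single filename."""
    problems = []
    if any(ch.isspace() for ch in name):
        problems.append("whitespace")
    if not SAFE_NAME.fullmatch(name):
        problems.append("unsupported characters or extra dots")
    return problems


def case_clashes(names: Iterable[str]) -> List[Tuple[str, str]]:
    """Find pairs of names that differ only in capitalisation."""
    first_seen = {}
    clashes = []
    for name in names:
        key = name.lower()
        if key in first_seen and first_seen[key] != name:
            clashes.append((first_seen[key], name))
        else:
            first_seen.setdefault(key, name)
    return clashes


names = [
    "2013_12_12-RO-RDM_presentation-v1.2.pptx",
    "2013_12_12-RO-rdm_presentation-v1.2.pptx",
]
for n in names:
    print(n, filename_problems(n))
print(case_clashes(names))
```

On a case-sensitive (UNIX) file system these two names are separate files; on a case-insensitive one, copying both into the same folder may overwrite one with the other, which is why the clash check is worth running.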
71. Data about Data
• To keep track of data…
• … and to describe what data is available to a secondary user
• Spreadsheet?
• Lab notebook?
- electronic / paper?
• Database?
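A spreadsheet inventory can be started automatically and then annotated by hand. This Python sketch is an assumption-laden illustration: the `write_inventory` helper and its column names are hypothetical, and the ESDS template mentioned in the speaker notes may use different fields. It writes a CSV file (which opens in Excel) with one row per file and an empty description column for you to fill in.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_inventory(data_dir: Path, inventory_csv: Path) -> int:
    """Write one CSV row per file under data_dir; return the number of files.

    Keep inventory_csv outside data_dir so it does not list itself.
    """
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    with inventory_csv.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "size_bytes", "modified_utc", "description"])
        for path in files:
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            writer.writerow([
                path.relative_to(data_dir).as_posix(),  # path relative to the project folder
                info.st_size,
                modified.isoformat(timespec="seconds"),
                "",  # description: to be filled in by hand
            ])
    return len(files)


# Example run in a throwaway directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    survey = root / "CARD" / "survey"
    survey.mkdir(parents=True)
    (survey / "2013_12_12-responses-v1_0.csv").write_bytes(b"id\n1\n")
    count = write_inventory(root / "CARD", root / "inventory.csv")
    rows = (root / "inventory.csv").read_text(encoding="utf-8").splitlines()
print(count)
print(rows[0])
```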
77. Data formats
• Think about what format you are saving your data in…
Prefer this… … over this
- ASCII, human readable (.txt, .xml, .csv) over binary formats (.exe, .doc)
- Open standard (.odt, .ods) over proprietary (.docx, .xlsx)
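As an example of the “prefer this” column, tabular data can be saved as CSV, a plain-text format that any spreadsheet or text editor can open. This Python sketch uses only the standard `csv` module; the helper names and the placeholder rows are hypothetical and for illustration only.

```python
import csv
import io
from typing import Dict, List


def to_csv_text(rows: List[Dict[str, str]], fieldnames: List[str]) -> str:
    """Serialise rows to CSV: plain, human-readable text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv_text(text: str) -> List[Dict[str, str]]:
    """Read the CSV back; every value comes back as a string."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Placeholder values for illustration only.
rows = [
    {"sample": "A", "measured_on": "2013-11-20", "value": "1.5"},
    {"sample": "B", "measured_on": "2013-11-20", "value": "2.0"},
]
text = to_csv_text(rows, ["sample", "measured_on", "value"])
print(text)
```

The trade-off is that CSV keeps values but not formatting, formulas or units, so those belong in accompanying documentation (see “Data about Data” above).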
79. Data back-ups
• Are you just digitising / photocopying?
• Are you saving files in multiple locations (pendrives, hard drive, external hard drive)?
• Tip for Durham research students:
- (stevens)(j:) your Durham network drive
• Other tools available: SyncToy, Time Machine, Déjà Dup
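Whatever tool you use, a backup is only useful if the copy is intact. This Python sketch is a minimal illustration, not a replacement for the tools listed above: the `back_up` helper is hypothetical, and the two destination folders are temporary stand-ins for, say, a network drive and an external hard drive. It copies a file to several locations and verifies each copy with a SHA-256 checksum.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(src: Path, destinations: List[Path]) -> List[Path]:
    """Copy src into every destination folder and verify each copy."""
    expected = sha256_of(src)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / src.name
        shutil.copy2(src, copy)  # copy2 also keeps the modification time
        if sha256_of(copy) != expected:
            raise OSError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies


# Example run: both "drives" are temporary folders (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "2013_12_12-CARD-notes-v1_0.txt"
    original.write_text("field notes\n", encoding="utf-8")
    made = back_up(original, [root / "network_drive", root / "external_drive"])
    all_match = all(sha256_of(p) == sha256_of(original) for p in made)
    names = [p.parent.name for p in made]
print(names, all_match)
```

Storing the checksums alongside the data also lets you detect silent corruption later, long after the copy was made.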
81. Data security
• Password vault
– Do you use passwords of 8+ characters?
– Public Key Infrastructure (PKI): use 128–256-bit encryption
• Virtual Encrypted Drive
– TrueCrypt, FileVault
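Password vaults can generate long random passwords for you. To show why length matters, this Python sketch uses the standard `secrets` module; the `make_password` and `entropy_bits` helpers and the 94-character alphabet are assumptions for illustration. Each extra character multiplies the number of possible passwords by the alphabet size, so strength in bits grows linearly with length.

```python
import math
import secrets
import string

# 52 letters + 10 digits + 32 punctuation characters = 94 symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def make_password(length: int = 16) -> str:
    """Random password drawn with the cryptographically secure `secrets` module."""
    if length < 8:
        raise ValueError("use at least 8 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int) -> float:
    """Strength of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(len(ALPHABET))


pw = make_password()
print(len(pw), round(entropy_bits(8), 1), round(entropy_bits(16), 1))
```

These figures only hold for randomly generated passwords; a memorable word with a digit appended is far weaker than its length suggests.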
82. Data security
• Secure Internet Protocols
– Wi-Fi: use WPA2, not WEP
– Browser: HTTPS
– Virtual Private Network (VPN), Secure Shell (SSH)
• How to access the J: drive off campus
– DU MDS Anywhere
– WinSCP, Macfusion, sftp
86. Further Reading
• DCC training materials on RDM
- http://www.dcc.ac.uk/training/train-trainer/disciplinary-rdm-training/conceptualise/conceptualise
• Examples of Research Data plans
- http://relu.data-archive.ac.uk/data-sharing/planning/examples/
- http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples
• Data Management Plan templates
- https://dmponline.dcc.ac.uk/
88. Image Credits
[3] Via Flickr Creative Commons, and by Eric Fischer: Available at http://www.flickr.com/photos/24431382@N03/4671562937
[15] Via Flickr Creative Commons, and by L. Whittaker: Available at http://www.flickr.com/photos/7577311@N06/1490557341
[16] Via Flickr Creative Commons, and by What What: Available at http://www.flickr.com/photos/99136715@N00/26553280
[27] Via Flickr Creative Commons, and by FutUndBeidl: Available at http://www.flickr.com/photos/61423903@N06/7369580478
[31] Via Flickr Creative Commons, and by barks photo stream: Available at http://www.flickr.com/photos/49503168860@N01/4257136773
[48] Via Flickr Creative Commons, and by Darwin Bell: Available at http://www.flickr.com/photos/darwinbell/1454251440/
Emphasis: - this session is an introductory, awareness session. - The aim is not that you will go away as experts. - We want you to leave thinking you need to know more and read more. - Further reading is at the end of the slides. - Many of the topics discussed are wider than just your research degree, - but many of the principles are applicable to your research degree. - INTRODUCE SEBASTIAN - He and Paul Drummond in CIS will be looking at developing policy and systems support across the University over the next 3 years, in line with policy directions from the UK and Europe, and you may meet him over that time. - He will also be looking at the need to develop and provide additional training and guidance.
5 minutes discussion in groups of 3-4 / yell out to front of class
There is no single definition, so let's agree some basics... Situational, e.g. CCTV footage (crime prevention, measure of footfall, demographic data) / ships' logs (historical record of events, record of weather patterns, personnel lists) / photographs (historical record of objects or locations, a source of data on techniques or chemical processes of photo development)
I said there was no single definition of what data is, but I'm going to leave you with one...
Ask students where they are storing their data. - Are they backing it up? - What do they plan to do with it once completed? - What if they are asked for it in 7 years' time? - If it is only on one device, what happens if that device is stolen/lost/damaged?
Given that H.264 video (half HD) streams at about 25 Mbit/s, or 3.125 MB/s, streaming 1 PB would take roughly 10 years (about 11 years if 1 PB is taken as 2^50 bytes). Check with the Google calculator: https://www.google.com.au/search?q=25+Mbit%2Fs+%2a+1+hour#q=25+Mbit%2Fs+*+22+years
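The streaming estimate in the note above can be checked with a few lines of arithmetic. In this Python sketch the 25 Mbit/s bit rate comes from the note; the 365.25-day year and the decimal (10^15 bytes) versus binary (2^50 bytes) petabyte comparison are assumptions, added because “1 PB” is ambiguous. The 2 PB line matches the Russell Group average quoted on slide 26.

```python
BITS_PER_SECOND = 25 * 10**6            # 25 Mbit/s, as in the note
BYTES_PER_SECOND = BITS_PER_SECOND / 8  # 3.125 MB/s
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year


def years_to_stream(n_bytes: float) -> float:
    """Years needed to stream n_bytes continuously at the note's bit rate."""
    return n_bytes / BYTES_PER_SECOND / SECONDS_PER_YEAR


PB = 10**15   # decimal petabyte
PIB = 2**50   # binary pebibyte

print(f"1 PB : {years_to_stream(PB):.1f} years")      # 10.1
print(f"1 PiB: {years_to_stream(PIB):.1f} years")     # 11.4
print(f"2 PB : {years_to_stream(2 * PB):.1f} years")  # 20.3
```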
Click on the image to go through to the Data Archive.
Creating data: You need to plan ahead. What storage will you require? What formats will the data be in, and how will these be supported? What ethical and legal considerations do you need to take into account in both collecting and storing the data, and how will these affect your ability to share the data?
Processing data: As you digitise, transcribe, translate, anonymise, check and clean the data created or collected, you need to start putting some of your planning into practice: storing data, describing data. Here you might be creating new sets of metadata which will be key to any future re-use: your notebooks, codebooks (if coding qualitative data), and records of the decisions and workflows applied in cleaning and checking data.
Analysing data: This is the bulk of your actual research, but at this point you will also need to think about how the data can and should be preserved. For example, when publishing your research you may need either to provide the underpinning data, or to indicate if, how and where it can be accessed by a reader, so you may need to provide access to data from this point.
Preserving data: The best format for you to use may not be the best format for the data to be preserved for future use. Here you will need to work with colleagues to ensure the data is stored and backed up effectively. To aid retrieval, you will also need to ensure the metadata and documentation describing the data are robust. Finally, you will need to think about how the data will continue to be preserved over time.
Giving access to data: This is how and where you provide access to the data, and where you make clear any copyright issues arising.
Re-using data: How is the data then re-used in further research… and the cycle begins again.
Mention Sebastian and Paul's role in supporting RDM across the institution. Mention that pages with guidance, advice and contact details will be up shortly.
What would happen if you lost an external hard drive with a few years of research data for your PhD? Image from Peter Murray-Rust blog, CC-BY: http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-data-management-plan/
What would happen if you lost an external hard drive with a few years of research data for your PhD? http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-data-management-plan/ - “I have lost 5 years of research data”
5 minutes discussion in groups of 3-4...
Funders are asking “why do you need to collect new data? It may already exist.”
Requirements, or moves to recognise the need to manage and share data, also come from other organisations. Mention also that journals (e.g. in biosciences) may require you to submit data alongside a published article as standard practice.
Another example: NASA re-used (recorded over) 200,000 master tapes which were thought to have outlived their usefulness. But they were later required, and NASA instead had to rely on poorer-quality and less complete sets of broadcast images, which brought their own copyright issues with them.
You might be thinking: I don't want people to find out if I have made a mistake… Well, you may have, and then you can own up and move on. But what you should be more worried about is being able to identify whether others have made a mistake and how that might affect your research.
http://archive.ics.uci.edu/ml/datasets/Iris Image credits: http://en.wikipedia.org/wiki/Iris_flower_data_set; Iris setosa – taken by Radomil – CC-BY-SA 3.0 Unported; Iris versicolor – taken by Danielle Langlois – CC-BY-SA 3.0 Unported; Iris virginica – shraveli, BLUE FLAG, from Flickr http://flickr.com/photos/33397993@N05/3352169862 – CC-BY-SA 2.0 Generic
DCC introductory video, concentrate on research integrity:- http://www.youtube.com/watch?v=2JBQS0qKOBU first 3 min
Research data planning is a joint endeavour, with multiple participants contributing to different stages of the research data life cycle. All have to follow the same map to avoid the risk of not arriving at the same destination. In 1999 NASA lost a $125 million Mars probe because the agency used metric units whereas its contractor used imperial units. At least two multi-million £ research grants have been lost by top UK research institutions because of failing to provide an adequate and robust data management plan as part of the grant application.
(*) EPSRC asks institutions to develop and implement an institutional data policy. At least two multi-million £ research grants have been lost by top UK research institutions because of failing to provide an adequate and robust data management plan as part of the grant application. Reminder: this is applicable to a much wider context than just your PhD data. Because you and others might want to know where you are going. Because it saves money in the long run. Because it leads to better quality research and enables high-quality curation and reuse.
5 minutes discussion in groups of 3-4... Think how to engage them: research data story…
Witt & Carlson (2007) “Conducting a Data Interview” Scientist
Different influences -> different plans. Broader: country, funding body, outcome (commercial or public domain), whether the work is reproducible or not. Funder: desirable place for long-term curation, data in certain formats. Internal requirements: institutional repositories, self-imposed ethics, softer influences related to disciplinary differences or even personal preferences. Publisher: copyright ownership signed over to a publisher may not be compatible with institutional policies. Legal: UK/EU legislation, such as the recent Dropbox issue and the Safe Harbour agreement. Legal: example of paediatric research, where there is a legal requirement to seek consent again once the children are grown up.
One of the major challenges is communication between academics and other stakeholders. RO: RDM pages, a key hub for all RDM activities. Explore the RDM resources library. You will have our (RD/CIS) support… invite us to discussions with academics when you talk about DMP aspects…
Stress the RO's role in pointing people to the tool and to CIS, where together we can fill the gaps in everybody's knowledge.
DCC. (2013). Checklist for a Data Management Plan. v.4.0. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/data-management-plans http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf
Themes: administrative data; data collection, what and how; documentation and metadata; ethics and legal compliance (the FOI, IPR and DP acts); storage and backup; selection and preservation; data sharing; responsibilities and resources.
Sortable by name (so putting the date first can be useful). Where version control is important, it should be clear in the name: do not just move a file to a different folder or name it “draft” or “old”. Distinguishable: don't have files with the same name in different folders, as this could cause confusion if files are copied elsewhere or re-used.
Organisation: helps facilitate future retrieval. Context: helps judge content without opening the file. Consistency: benefits processing of a growing number of files.
Think of labels that help you retrieve a document later: I might only remember part of the name, but the context will help me judge whether this is the file I'm looking for.
Capitalisation: on UNIX, names that differ only in capitalisation refer to two entirely different files. Searching for r will not find R.
Data inventory: a simple MS Excel spreadsheet could be used. ESDS data inventory template example: http://www.esds.ac.uk/aandp/create/datatemplate.xls
Introduce the Google search concept: keywords, phrases. Important if sharing the data on a repository. ODIN cover page: http://figshare.com/articles/D2_3_First_year_communication_report_including_results_from_first_year_event/843603 DDI – Data Documentation Initiative; METS – Metadata Encoding and Transmission Standard; TEI – Text Encoding Initiative; QDE – QuDEx – Qualitative Data Exchange
Important if sharing the data on a repository. Emphasise: not covered in detail in this session, but support will be available (check with Sebastian?). DDI – Data Documentation Initiative; METS – Metadata Encoding and Transmission Standard; TEI – Text Encoding Initiative; QDE – QuDEx – Qualitative Data Exchange
Examples: a Microsoft Excel example to use? Older versions? Microsoft Works files in earlier versions of Word.
Show J: drive “snapshot” example…
Mention Sebastian's Dropbox tip…
10-15 mins from end? Time to have a look, or ask Sebastian questions? Dryad: data underpinning medical and science publications, traditionally strong in health and biomedical sciences. Primarily peer-reviewed, but also some non-peer-reviewed material such as dissertations and theses. Spreadsheets, photos, software code, video... Up to 10GB per publication. Sample search: Protein. Figshare: not just data (but remember data is “situational”). Multidisciplinary. The majority of usage is amongst PhD students and postdocs. A lot of presentations, posters and diagrams, but also datasets, code and publications. Sample browse: Chemistry. ESRC Data Store: social sciences, linked to ESRC-funded research projects. Not all data is accessible; some records may be metadata only, and may link to other repositories where publications have been deposited. Sample search: China OR Asia OR Brand OR Market OR Finance
Re3data: registry of research data repositories. DataCite: service providing unique DOIs for data sets to enable citation; it also includes a register of data sets and repositories. Linked to Databib, another service for locating data repositories.
Engage for further discussion?
Overview of Twitter. Don't show how to create an account (it's on the handout). Headlined / about; profile and home page; @ Connections page (mentions and interactions); search function (tweets, users, lists).