Research Data Management: obstacles faced by the novice data manager
1. CSIR Research Data Management:
the way forward
Louise Patterton
CSIRIS
September 2013
2. The future of research data in the CSIR
• Definition
• Global trends
• Current situation
• Problems
• Solving the problems
• Plan of action
• Policy
• Summary……………..
4. OBSTACLE
The concept of Research Data
Management (RDM)
new!
OBSTACLE
CSIR Research Data status quo
UNKNOWN!
OBSTACLE
CSIR Research Data policy
NON-EXISTENT!
OBSTACLE
Global trends
WE ARE FALLING
BEHIND!
6. Dear Colleague,
We will be repeating the NeDICC workshop for rookie data managers in Pretoria on 28 August. This very
successful workshop was launched at the 5th African Conference for Digital Scholarship and Curation during
June and due to demand we have decided to make it available in Gauteng as well. Space is limited so
unfortunately we can accommodate no more than 50 attendees.
Venue: CSIR Knowledge Commons
Date: 28 August 2013
Time: 09:00 - 14:00
Price: R399 VAT included
A light lunch will be served at 13:00.
The workshop will provide those who are starting out on the data management journey the opportunity to hear
how other rookie data managers are coping with the new challenges, where they find their information and who
they talk to. Delegates will also have the opportunity to gain advice from those who have already engaged with
researchers and those who are providing their research clients with appropriate training. They will also have the
opportunity to hear from one institution where data management has become part of the way in which things get
done.
OBSTACLE #1: Unfamiliarity with ‘Research Data
Management’ concept.........
35. Obstacle #4: Global trends............way
ahead
• Training tools: (courses, degrees)
* DMTpsych (psychology)
* Mantra (wide coverage)
* Cairo (creative arts)
* DATUM for Health (health studies)
* DataTrain (archaeology)
• Data Archives/Data Repositories/Data banks
* UK Data Archive (soc science in UK)
* National Space Science Data Center (space)
• Funder requirements
* DMP is essential
36. • UK: Legal requirements…all Research Councils now have
research data management policies, based on a set of
common principles formulated by Research Councils UK
• USA: National Institutes of Health: Data Sharing Policy:
Supports the sharing of research data and expects
researchers funded at $500,000 or more to include a data
sharing plan in their grant proposals
• USA: National Science Foundation (NSF): Dissemination and
Sharing of Research Results: Beginning January 18, 2011, NSF
will require grant proposals to include a supplementary
data management plan of no more than 2 pages. This
requirement is a new implementation of the long-standing
NSF Data Sharing Policy
• Australia: Monash University Policy Bank: The purpose of this
policy is to ensure that research data is stored, retained,
made accessible for use and reuse, and/or disposed of,
according to legal, statutory, ethical and funding bodies’
Global Research Data Management Policy
trends
37. Research Funders % elements
National Science Foundation (NSF) 53%
NSF Basic Research to Enable Agricultural Development (BREAD) 59%
NSF Division of Earth Sciences (EAR) 65%
NSF Division of Ocean Sciences 59%
NSF Integrated Ocean Drilling Program 47%
NSF Ocean Acidification Research 59%
DOE Atmospheric Radiation Measurement Program (ARM) 76%
National Aeronautics and Space Administration (NASA) - Earth Sciences 65%
NIH - National Human Genome Research Institute 88%
NIH - Genome-Wide Association Studies (GWAS) 76%
American Heart Association 0%
Issues in Science and Technology Librarianship: Percent of total data elements addressed by policy (Dietrich et al, 2012)
39. • “Homeless” data quickly become no data at all: curation NB
• There is no economic “magic bullet” that does not require
someone, somewhere, to pay: funding required
• What happens to valuable data when project funding ends:
long term planning required
• Additionally:
* infrastructure
* policy/guidelines/training
* team
• Data management planning does not happen in a vacuum
Some final points to ponder on….
41. THE WAY FORWARD:
Step 1:
• survey/audit/inventory
• aim: Research Data Management Practices
• questionnaire edited, refined
• ethics clearance
• target sample chosen: Research Group Leaders
• audio recording…transcribed
• all units, all Research Group Leaders
• confidentiality
• benchmark against similar studies
42. THE WAY FORWARD:
Step 2:
Analysis………………………………………………………………………………………
………………………………………………………..…..……………………………..
Step 3:
Recommendations:
• personnel
• infrastructure
• cost
Step 99:
• CSIR Research Data Policy
• Training/Guidelines
• Data Repository
• Sharing
This is such a new field that one cannot help but focus on the obstacles blocking the way……
So…I have decided to rename my presentation: “Excuse me sir, do you have a minute to talk about carrots”. Yes, it might not make sense now, but it should….in a minute or two.
This is a cut and paste from the ad we emailed to the SA Online User Group as well as other Library and Information Science groups. The nature of the replies received from the professional community indicates that research data management is, at the moment, still a very confusing subject or field.
For this reason, I am going back to basics, and will explain the concept of research data management in the simplest possible way.
This is a carrot cake factory.
The CSIR carrot cake factory, to be more precise.
Our products are not new research, articles, conference papers, or technology demonstrators….but carrot cake…and carrot salad. It is liked and loved and very popular….we are a national household name and our products have even made a name globally.
For many years now, clients have been happy with the carrot cake….when suddenly things changed. They now…in addition to carrot cake….would like to buy the carrots and make their own products!
Which brings us to the following dilemma: do we even know what is going on with our carrots? Can we supply something that we have not really paid attention to? This is the crux of my analogy: carrots are the datasets. Carrot cake and salad…..the articles and discoveries and scientific breakthroughs. So what is needed now is a detailed inventory or audit of the carrots to establish what the current situation is like.
(I hope the analogy is by now clear to all……?)
In this audit, we need to establish the origins and harvesting of the carrots (how data was collected or generated, and how decisions about the data were made)
We need to establish the quality of the carrots…..
The various growth stages they go through (data versions, rate of growth)
How the data is sorted, file formats used…..
Where do you store the carrots? In a dilapidated storeroom or archive……
Or a modern state-of-the-art warehouse (this is actually a real farm warehouse…..)
Is the data protected from corruption or damage? Is there a data disaster recovery plan in place?
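One common way to notice corruption or damage is to record a checksum for every file and re-check it later. The sketch below is only an illustration of that idea, assuming the data sits in one folder; the function names `build_manifest` and `find_damaged` are hypothetical and do not describe any existing CSIR procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to the folder) to its checksum."""
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def find_damaged(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose checksum no longer matches."""
    damaged = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(name)
    return damaged
```

The manifest would be built once when the data is stored and kept somewhere separate from the data itself; running `find_damaged` at intervals then shows which files need to be restored from a backup.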
How is the data grouped? What are the naming conventions used? How are the various versions named? How about renaming…how is that done? How is data retrieved when searched for?
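To make the naming and versioning questions concrete, here is a minimal sketch of one possible convention: `project_description_YYYYMMDD_vNN.ext`. The pattern itself and the helper names (`build_name`, `parse_name`, `next_version`) are assumptions chosen for illustration, not a convention any CSIR research group is known to use.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical convention: project_description_YYYYMMDD_vNN.ext
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9-]+)_(?P<description>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2,})\.(?P<ext>[a-z0-9]+)$"
)


def _slug(text: str) -> str:
    """Lowercase the text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_name(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Build a file name that follows the convention."""
    return f"{_slug(project)}_{_slug(description)}_{day:%Y%m%d}_v{version:02d}.{ext.lower()}"


def parse_name(filename: str) -> Optional[dict[str, str]]:
    """Split a conforming file name into its parts, or return None."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def next_version(filename: str) -> str:
    """Return the same file name with the version number increased by one."""
    parts = parse_name(filename)
    if parts is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    bumped = int(parts["version"]) + 1
    return (
        f"{parts['project']}_{parts['description']}_{parts['date']}"
        f"_v{bumped:02d}.{parts['ext']}"
    )
```

A name that follows a fixed pattern can be searched, sorted and checked by a program, which is exactly what a file called "final FINAL data.xls" does not allow: `parse_name` returns `None` for it.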
Is data documentation done? Are codebooks, data dictionaries, instrument calibration and other procedures or aspects crucial to data understanding, documented?
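A simple way to test whether documentation keeps up with the data is to compare the column names in a data file with the entries in its data dictionary. The sketch below assumes the data is a CSV file with a header row and that the data dictionary is a plain mapping from column name to description; the function name `undocumented_columns` and the column names in the usage example are made up for illustration.

```python
import csv
import io


def undocumented_columns(csv_text: str, data_dictionary: dict[str, str]) -> list[str]:
    """Return header columns with no (or an empty) data dictionary entry."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file has no columns to check
    return [
        column
        for column in header
        if not data_dictionary.get(column, "").strip()
    ]


# Usage with made-up columns: depth_cm has a blank description and
# moisture_pct has no entry at all, so both would be reported.
sample = "site_id,depth_cm,moisture_pct\nA1,10,23.5\n"
dictionary = {"site_id": "Sampling site code", "depth_cm": "  "}
missing = undocumented_columns(sample, dictionary)
```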
Will data be shared? Is access restricted…and how is access controlled? Are embargoes ever required?
When data is shared, how will it be done? Will a web browser be used, or is FTP the preferred method?
What about data misuse, or misinterpretation? What are the dangers of another researcher tarnishing the original data collector’s reputation?
Finally….data destruction? Will data ever need to be destroyed, and if so, what are the procedures/methods to be used?