This session considers why researchers should manage their research data.
It examines five aspects of data maturity: storage, description/discovery, people, training and governance.
CDU Library is interested in engaging with researchers and Colleges on the FAIR data agenda and on research data management (RDM) capability more generally. The training is designed to let participants assess and reflect on how well they are managing their research data, with a view to making more data more FAIR. Where there are gaps, participants reflect on where to access CDU support to better manage their research data, and identify potential improvements towards making their research data FAIR.
The National eResearch and Data management landscape - CDU Data Readiness training - Nov 18
1. CDU - DATA READINESS
THE NATIONAL ERESEARCH & DATA MANAGEMENT
LANDSCAPE
SLIDES: @VALUEMGMT CC-BY*: DR RICHARD FERRERS, ARDC
*except where indicated. 19-20 November 2018, Darwin NT
Enabling 21st Century Research
2. ARDC - IMAGINING THE RESEARCH DATA COMMONS
20TH CENTURY RESEARCH
▸ Citations -> Grants
▸ Results -> Journal -> Citations
▸ Data -> drawer
▸ Focus: journal impact factor, citations,
grants
=> Research in the age of the Journal
3. ARDC - EMPOWERING 21ST CENTURY RESEARCH
WHAT IS THE PROBLEM WITH 20TH CENTURY RESEARCH?
▸ Scarcity and declining government research funding
▸ Power of journals, long delays in publishing
▸ Strong research competition for grants; low return on effort
▸ Deluge of literature; not enough time
▸ Incentives: promotion, funding due to citations, journal impact factor
=> stress, stress, stress
4. ARDC - ACCELERATING RESEARCH
A VISION FOR 21ST CENTURY RESEARCH
▸ data intensive; data commons (FAIR)
▸ globally collaborative (FA)
▸ cross-disciplinary (IR)
▸ Focus; data/software/services ->
impact -> value creation
=> research in the age of the Internet
machine-actionable | scalable
7. 1. THE DATA DELUGE
Copyright xtremesport4u.com. Permission requested.
8. ARDC - FOR DATA DRIVEN RESEARCH
1.1 THE DATA DELUGE EVIDENCE - DATA DOIS MINTED (@SEPT ’18) / PAPERS
         World        AU         Figshare    CrossRef
Total    15,200,000   276,000*   1,060,000   100M
2018     2,700,000    62,000     219,000     8M
2017     3,200,000    45,000     318,000     7M
* Paradisec = 237,000; one project, ten staff.
Source: stats.datacite.org; data.crossref.org/reports/statusReport.html
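To put the figures above in proportion, the short Python sketch below computes two shares from the slide: Australia's share of all data DOIs, and Paradisec's share of the Australian total. The variable and function names are illustrative only; the numbers are copied from the slide.

```python
# Figures copied from the slide (data DOIs minted, as at Sept 2018).
world_total = 15_200_000
au_total = 276_000
paradisec_total = 237_000  # footnote on the slide: one project, ten staff


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


au_share_of_world = share(au_total, world_total)
paradisec_share_of_au = share(paradisec_total, au_total)

print(f"AU share of world data DOIs: {au_share_of_world}%")
print(f"Paradisec share of AU data DOIs: {paradisec_share_of_au}%")
```

On these numbers Australia accounts for just under 2% of data DOIs minted worldwide, and a single project accounts for most of the Australian total.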
12. FUNDERS - PUBLISHING DATA
1.5 AUST FUNDERS - ENVIRONMENTAL $44M
▸ NESP data guidelines - ‘research outputs
including data and information products must
be publicly available… openly accessible
(website or repository)… with a persistent link’.
▸ with an open licence
▸ within 12 months of finished data collection
▸ data management planning essential, inc.
description standards, contact person, methods
and procedures
14. 2. NATIONAL AND INTERNATIONAL
DEVELOPMENTS
TOWARDS 21ST CENTURY RESEARCH
15. ARDC - DIGITALLY TRANSFORMING, ACCELERATING AUSTRALIA'S 21ST CENTURY RESEARCH
2. NATIONAL / INTERNATIONAL DEVELOPMENTS IN DATA, RESEARCH
▸ Int’l - Royal Society (UK) - Science as an Open
Enterprise (2012)
▸ AU - National Data Statement (NISA)
(Dec 2015) | National Data Commissioner (Jul ’18)
▸ Int’l - China; Statement on Scientific Data
Management Measures (April 2018)
▸ Int’l - US; Statement of National Academies
- Science, Engineering and Medicine - Open Science by Design.
(July 2018)
17. RESEARCH DATA ALLIANCE - 7,000
SCIENTISTS; 137 COUNTRIES;
BUILDING BRIDGES TO DATA SHARING
2.2 RD-A in a nutshell
ARDC - ENABLING 21ST CENTURY RESEARCH
18. 2.3 AU National Data Statement (2015); innovation.gov.au
ARDC - FOR DATA INTENSIVE RESEARCH
“THE AUSTRALIAN GOVERNMENT COMMITS TO OPTIMISE THE USE AND
REUSE OF PUBLIC DATA.…
AUSTRALIAN GOVERNMENT ENTITIES WILL:
- MAKE NON-SENSITIVE DATA OPEN BY DEFAULT … [FOR] INNOVATION
AND PRODUCTIVITY [FOR] ALL …
- WHERE POSSIBLE, ENSURE NON-SENSITIVE PUBLICLY FUNDED
RESEARCH DATA IS MADE OPEN FOR USE AND REUSE”
19. “THE NEW NATIONAL DATA COMMISSIONER WILL BE THE TRUSTED
OVERSEER OF A NEW DATA SHARING AND RELEASE FRAMEWORK,
ALLOWING AUSTRALIA TO REALISE THE FULL POTENTIAL OF DATA
WHILE MAINTAINING PUBLIC TRUST IN THE DATA SYSTEM” (LINK)
2.4 National Data Commissioner appointed 01.07.18
ARDC - DATA COMPUTE CLOUD PEOPLE
20. LET THE OPENING OF SCIENTIFIC DATA
BECOME THE NORM “ALL SCIENTIFIC DATA
GENERATED IN CHINA MUST BE SUBMITTED TO
GOVERNMENT-SANCTIONED DATA CENTERS BEFORE
APPEARING IN PUBLICATIONS [FROM TODAY].” SCIENCE
2.5 Scientific Data Management Measures - China State Council
17.03.18
ARDC - POWERING 21ST CENTURY RESEARCH
21. “RESEARCH FUNDERS SHOULD PROVIDE
EXPLICIT AND CONSISTENT SUPPORT
FOR PRACTICES ... THAT FACILITATE
[OPEN SCIENCE] SHIFT IN CULTURE AND
INCENTIVES…” (P.130)
2.6 US National Academies of Sciences, Engineering, and Medicine
(Jul 2018)
ARDC - ENABLING 21ST CENTURY RESEARCH
22. “RESEARCH INSTITUTIONS SHOULD WORK TO
CREATE A CULTURE THAT ACTIVELY SUPPORTS
OPEN SCIENCE… BY BETTER REWARDING &
SUPPORTING RESEARCHERS ENGAGED IN
OPEN SCIENCE PRACTICES…” (P.130)
2.7 US National Academies of Sciences, Engineering, and Medicine
(Jul 2018)
ARDC - ENABLING 21ST CENTURY RESEARCH
24. 3. NCRIS - NATIONAL RESEARCH
INFRASTRUCTURE INVESTMENT
ENABLING 21ST CENTURY RESEARCH
25. 3.1 WHAT IS NCRIS?
$1.9 BILLION CAPITAL (2018 - 30) [1]
$1.5 BILLION OPERATING (2015 - 25) [2]
Implementing 21st Century Research
in Australia - the Federal Government response
National Collaborative Research Infrastructure Strategy
26. 3.2 NCRIS -“GREAT RESEARCH
INFRASTRUCTURE ATTRACTS AND
NURTURES TALENT AND UNDERWRITES A
NATION’S REPUTATION FOR HIGH-IMPACT
RESEARCH”
AU Chief Scientist Alan Finkel, National Research Infrastructure
Roadmap (2016)
ARDC - ACCELERATING RESEARCH
27. ARDC - POWERING 21ST CENTURY RESEARCH
3.3 NCRIS NINE FOCUS AREAS
…. Australia … as an emerging or established global leader:
• Digital Data and eResearch Platforms
• Platforms for Humanities, Arts and Social
Sciences
• Characterisation
• Advanced Fabrication and Manufacturing
• Advanced Physics and Astronomy
• Earth and Environmental Systems
• Biosecurity
• Complex Biology
• Therapeutic Development
Dept of Education
32. ARDC - DIGITALLY TRANSFORMING RESEARCH
4.2 WHAT IS ARDC?
▸ $20M pa operating budget, post 01.07.18
▸ $77M capital investment (2018-22)
▸ Cloud compute | storage
▸ Discipline focussed virtual labs
▸ Research data management
▸ Five year Strategic Planning
- about to commence
▸ Research Data Australia - the national data catalogue
33. ARDC - ENABLING 21ST CENTURY RESEARCH
4.3 DATA ENHANCED VIRTUAL LABS
Marine | Astro | Geo | Eco Cloud
Humanities / Social Science
Bio
Characterisation Imaging
Agriculture (Link)
More platforms
34. ARDC - TALKING DATA
4.4 WHAT COULD GO WRONG WITH YOUR DATA?
▸ Forgotten
▸ Lost
▸ Ignored
▸ Undocumented
35. ARDC - ENABLING 21ST CENTURY RESEARCH
4.5 WHAT DO I DO? RSCH DATA MANAGEMENT
=> FAIR data
▸ Findable | Accessible
▸ Interoperable | Reusable
41. ARDC - ACCELERATING RESEARCH
1. DATA STORAGE
▸ The Australian Code for the Responsible Conduct of Research
2018
▸ ANDS Guide on Data Storage
▸ Advanced / Basic features of storage
▸ CDU Policy / Procedure -> see Governance
42. ARDC - TRANSFORMING RESEARCH
1.1 THE CODE - STORE AND SHARE RESPONSIBLY WHERE APPROPRIATE
▸ P3 Research Principle; Transparency
▸ “reporting… share and communicate… data and findings …openly, responsibly and
accurately”
▸ R8 Responsibilities of Institutions
▸ “provide facilities for safe and secure storage of research data… where possible and
appropriate allow access and reference”
▸ R22 Responsibilities of Researchers
▸ “retain clear, accurate, secure and complete records of all research including research data…
where possible and appropriate, allow access and reference to these by interested parties”
43. ARDC - ENABLING 21ST CENTURY RESEARCH
1.2 ANDS GUIDE TO DATA STORAGE - TYPES
▸ Working data (local, individual, project)
▸ Institutional repository / data store eg QUT
▸ National data storage eg RDS (40PB), Cloudstor (1TB per person; 10TB / Uni)
▸ Cloud data storage eg Amazon, Microsoft, figshare
▸ Discipline repository eg Dryad, Worldwide Protein Data Bank
44. ARDC - FOR NEXT GENERATION RESEARCH
1.3 ANDS - FEATURES OF DATA STORAGE
▸ Reliability / durability eg local vs Institutional
▸ Discoverability eg data store vs (discipline)
repository
▸ Backup eg institutional vs Cloudstor
▸ Working vs published data eg changing vs
locked
▸ Speed/convenience of access eg tape vs SSD
▸ Size eg Institutional (1TB) vs National (1PB)
45. ARDC - EMPOWERING RESEARCHERS
1.4 DATA STORAGE - BASIC FEATURES
▸ secure
▸ reliable / durable
▸ convenient
▸ accessible
▸ backed up
▸ Source: ANDS Guide - Data Storage
46. ARDC - ACCELERATING RESEARCH
1.5 DATA STORAGE - ADVANCED FEATURES
▸ internal collaboration
▸ external collaboration
▸ publishable (durable, discoverable)
▸ self-service vs request
▸ challenges; sensitive - encrypt
▸ challenges; privacy - remain within jurisdiction
▸ Source: lessons from engaging with
Institutions
47. ARDC - ENVISIONING 21ST CENTURY RESEARCH
2. DATA DESCRIPTION / DATA DISCOVERY
▸ Research Data Australia (RDA) - the national data
catalogue
▸ RDS med.data - 90 collections - 2,200TB
▸ RDS earth systems - 40 collections - 9,397TB
▸ CDU examples of medical/environmental data
descriptions published
▸ Scabies Repository [1] [CDU eSpace]
▸ NT Datalink Child cohort (N = 50,000) [2]
▸ Oz Flux Tower, near Katherine (Tern)
48. ARDC - EMPOWERING RESEARCHERS
2.1 WHAT IS RDA?
▸ The national research data catalogue
▸ harvested from research institutions eg CDU
▸ 140,000+ data descriptions available
▸ 51,000+ open datasets available
▸ 59,000+ grant descriptions available
▸ faceted | structured | API accessible
▸ researchdata.ands.org.au
49. ARDC - ENABLING 21ST CENTURY RESEARCH
2.2 WHO CONTRIBUTES TO RDA?
▸ Universities
▸ State / Federal Gov - data.gov.au, data.vic.gov.au
▸ Federal research groups - CSIRO, ALA, Tern
▸ NCRIS facilities - AURIN - urban planning
▸ Data intensive research groups - Paradisec -
Pacific languages
50. ARDC - POWERING 21ST CENTURY RESEARCH
2.3 MORE DATA TO DISCOVER
▸ Another 2,000 research data repositories
▸ Suggested Researcher ACTIONS:
▸ List your data on your CV
▸ Describe your data - hint: talk to a data librarian
▸ Publish your data descriptions
▸ Publish your data*
(* where appropriate)
51. ARDC - DIGITALLY TRANSFORMING RESEARCH
2.4 NT DATALINK - CHILD COHORT
▸ Description
▸ Aggregated by
▸ Institution
▸ View count
▸ Subjects
▸ Identifier
▸ Data time period
52. ARDC - TRANSFORMING RESEARCH
2.5 SCABIES REPOSITORY
▸ Description | Access the data
▸ Aggregated by, Principal Investigator
▸ Institution
▸ Grant ID
▸ Subjects
▸ Identifier eg DOI: 10.5061/dryad.014v6/1
▸ Related Publications: PLOS
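As a small illustration of why identifiers matter for discovery, a DOI such as the one listed above can be turned into a link by prefixing it with the public resolver address https://doi.org/. The sketch below is a minimal Python example; the function name `doi_to_url` is hypothetical and not part of any library.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # the public DOI resolver


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI (or one prefixed with 'DOI:') into a resolvable link.

    Hypothetical helper for illustration only.
    """
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Keep '/' readable; percent-encode anything unsafe in a URL path.
    return DOI_RESOLVER + quote(cleaned, safe="/")


print(doi_to_url("DOI: 10.5061/dryad.014v6/1"))
```

A link built this way can be placed in a CV, a data description or a reference list, so that readers reach the dataset record directly.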
53. ARDC - ACCELERATING RESEARCH
3. DATA PEOPLE / DATA SUPPORT
▸ data scientist; using eResearch data tools
▸ data mentor; training PhD students
▸ data trainer eg Library Carpentry, Software
Carpentry, Data Carpentry (Resbaz)
▸ data managers; manage repository
▸ data librarians; describing data, engaging
▸ data technologists; building tools/platforms
▸ data champions; promoting data
54. PEOPLE
3.1 ERESEARCH / DATA
MANAGEMENT NEEDS
data scientist
data mentor
data librarian
data technologist
data trainer
data leader
56. ARDC - ACCELERATING RESEARCH
4. DATA TRAINING
▸ who; students, early career, support staff
▸ what; awareness, tools, process
▸ where; national, local, institutional, discipline
▸ when; induction (sample), annual, milestone
57. ARDC - EMPOWERING RESEARCH
4.1 CDU - ANDS / ARDC DATA TRAINING
▸ ANDS - CDU 2015; RDM 101 - why care about
data? Sharing data to raise your impact? More
▸ ARDC - CDU 2018; RDM 202 - data readiness,
national eResearch capability; store, describe,
train, people, governance
▸ CDU - Data Management Lib Guide (2013)
▸ National list of data resources, policies, training
58. ARDC - DATA DRIVEN RESEARCH
4.2 NATIONAL DATA MANAGEMENT & ERESEARCH TRAINING
▸ 23 Research Data Things - self-paced
▸ 10 Health and Medical Things | 10 Sports Science Things
▸ 10 Ecological Things | ecoEd - workshop modules for UG
▸ ANDS Sensitive Data Guide | Nectar Cloud Computing - self-paced training
▸ ANDS data mgmt Guides and presentations eg Health & Med Webinar series
▸ Other: 23 Data Impact Case Studies; Agriculture, Anthropology, Chemistry,
History
59. ARDC - TRANSFORMING RESEARCH
4.3 10 HEALTH AND MEDICAL (SPORTS, ECO) THINGS
▸ Getting started | Issues in research data mgmt
▸ Data discovery
▸ Sensitive data; share?
▸ Publishers and funders
▸ Data citation | data identifiers
▸ Describing data | data management plans*
▸ Data publishing | Licensing data | Spatial data^
▸ * 10 Sports Science Things only
▸ ^ 10 Eco Data Things only
60. ARDC - ENVISIONING 21ST CENTURY RESEARCH
4.4 INSTITUTIONAL DATA TRAINING - EXAMPLE
▸ University of Melbourne - Doing Data Better - Managing Data @Melbourne
▸ 1. Getting started with research data
▸ 2. Developing your data management plan
▸ 3. Ethical and legal issues
▸ 4. Organising, storing and backing up your data
▸ 5. Documenting and describing your data
▸ 6. Sharing and preserving your data
Copyright: The University of Melbourne 1994-2017
61. ARDC - DATA DRIVEN RESEARCH
4.5 RESEARCH TOOLS & DATA TRAINING - INTERNATIONAL INITIATIVES
▸ Software Carpentry - self-paced or classroom, train the trainer (Resbaz AU 2019)
▸ Programming in R | Python |
▸ Automating tasks and version control in Git | Unix
▸ Data Carpentry
▸ Data Organisation | Analysis | Visualisation
▸ Ecology | Genomics | Social Science | Ocean/Atmosphere | Astro
▸ Library Carpentry
▸ Intro to data | Unix | Git | Data cleaning | SQL | Python | Archiving data
Resbaz History
63. ARDC - ACCELERATING RESEARCH
5. DATA GOVERNANCE
▸ The Australian Code for the Responsible Conduct of Research
2018
▸ National Statement on Ethical Conduct in Human Research 2018
▸ Funders eg NHMRC Data Statement
▸ CDU Institutional Policy / Procedure
▸ Local Departmental guidance, culture
64. ARDC - TRANSFORMING RESEARCH
5.1 THE CODE - RESPONSIBILITIES AND PRINCIPLES
▸ P8 Research Principle; Promotion
▸ “promote and foster a research culture and environment that support responsible
conduct of research”
▸ R3 Responsibilities of Institutions
▸ “develop policies and procedures which ensure institutional practices are consistent
with principles and responsibilities of The Code”
▸ R14 Responsibilities of Researchers: “support a culture of responsible practice”
▸ R16 “promote education and training in responsible research conduct”
65. ARDC - EMPOWERING RESEARCH
5.2 NATIONAL STATEMENT ON ETHICAL CONDUCT IN HUMAN RESEARCH (2018)
▸ 3.1.45 “For all research, researchers should develop a data management plan… collection, use,
storage, access … sharing and re-use of data and information”
▸ 3.1.37 “when…collect information… considered to be of long term value… should obtain
consent for perpetual retention”
▸ 3.1.56 “when planning to share data or information with other researchers… or add them to a
databank, researchers must develop a data management plan (see 3.1.45)”
▸ 3.1.60 “Researchers should be aware of expectations and policies regarding sharing … of
participant data… and should consider the value of the data or information for future research…
if consent to future use was not obtained at the time of collection, then reviewers considering
proposed re-use of this data… may consider a waiver of consent…or seek additional consent”
66. ARDC - TRANSFORMING RESEARCH
5.3 FUNDERS - NHMRC OPEN ACCESS POLICY (2018)
▸ “NHMRC acknowledges the importance of making research data publicly accessible
and therefore strongly encourages researchers to consider the reuse value of their
data and to take reasonable steps to share research data and associated metadata
arising from NHMRC supported research.” (4.2 Research Data, p.7)
▸ “The level of access may range from highly restricted …to fully open access.”
▸ “When sharing data, researchers should ensure that appropriate metadata [ie
description] accompany the datasets”
▸ “A further example in which sharing of data is crucial is during public health
emergencies.”
67. ARDC - ACCELERATING RESEARCH
5.4 LOCAL GOVERNANCE
▸ Culture of responsible research practice - The Code -
safe and secure data storage eg Mission (La Trobe),
Governance (UNSW)
▸ Training including induction, support staff, develop
local processes eg IMAS
▸ Ethical committees - national statement - data
management plan
▸ Institutional Governance - data management plan
▸ Recommended: IT, Library, Researchers, Office of
Research - ongoing conversation to improve data
management processes
68. ARDC - DIGITALLY TRANSFORMING RESEARCH
5.5 CDU GOVERNANCE - RDM PROCEDURES
▸ “Research data management is a shared
responsibility”
▸ “store and maintain data securely… and backed
up… and copy should be … housed remotely”
▸ “provide a copy of data to [Research & Innovation] prior
to …publication of the research”
▸ “all researchers … are required to develop a
Research Data Management Plan at the
commencement of each research project”
69. ARDC - ENVISIONING 21ST CENTURY RESEARCH
6. KEY TAKEAWAYS
▸ DATA IS OUR VISION FOR THE FUTURE
▸ DATA IS ABUNDANT
▸ DATA NEEDS DESCRIPTION AND STORAGE
▸ DATA NEEDS TO BE CONNECTED
▸ EXPLOIT NATIONAL INVESTMENT IN DATA
▸ MAKE YOUR DATA FAIR OR OPEN
▸ BUILD YOUR DATA LEGACY - add Data
Descriptions to your CV
70. ARDC - TRAINING RESEARCHERS IN FAIR DATA
6.1 HOW DO I DO - FAIR DATA?
▸ Describe your data (FA)
▸ Licence your data (R)
▸ Publish your data descriptions to your CV (A)
▸ Store your data (safely, long term, identifiably) (F)
▸ Promote your data (A)
▸ Connect your data (grants, publications,
literature, problem statements) (I)
=> Estimated Cost: 1 FTE per 100 Researchers
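The action list above can be read as a simple self-assessment: tick the actions already done and see which FAIR letters remain uncovered. The Python sketch below is one hedged way to express that; the action keys and the function `fair_coverage` are hypothetical names, while the letter mapping is copied from the slide.

```python
# Mapping copied from the slide: each action and the FAIR letters it supports.
# The short action keys are hypothetical labels chosen for this sketch.
ACTION_TO_FAIR = {
    "describe": "FA",
    "licence": "R",
    "publish_descriptions": "A",
    "store": "F",
    "promote": "A",
    "connect": "I",
}


def fair_coverage(actions_done: set[str]) -> dict[str, bool]:
    """Report which of F, A, I, R are supported by at least one completed action."""
    covered = {letter for action in actions_done for letter in ACTION_TO_FAIR[action]}
    return {letter: letter in covered for letter in "FAIR"}


# Example: data has been described and stored, but not yet licensed or connected.
status = fair_coverage({"describe", "store"})
gaps = [letter for letter, ok in status.items() if not ok]
print(status)
print("Gaps:", gaps)
```

In this example the gaps are I and R, which points to the "Connect your data" and "Licence your data" actions as the next steps.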
71. ARDC - ENVISIONING 21ST CENTURY RESEARCH
6.2 WHAT CAN I DO RIGHT NOW?
▸ Get an ORCiD - a permanent identifier for you
eg RICHARD FERRERS, PAUL WONG
▸ Describe your data assets in your CV (LinkedIn)
▸ Get DOIs for your most valuable data (Figshare,
CDU)
▸ Cite your data in your publications eg AJTDE
▸ Link your data descriptions to publications,
grants, your research team, key literature
72. “IN 100 YEARS, IT WILL BE MY DATA
THAT IS MY LASTING LEGACY.”
6.3 Prof. Craig Johnson, data leader
ARDC - EMPOWERING 21ST CENTURY RESEARCH
74. … SUPPORT THOSE AROUND YOU TO
MAKE THE CHANGE TO 21ST CENTURY
RESEARCH…
6.5 We are all journeying together
ARDC - ENABLING 21ST CENTURY RESEARCH
75. RICHARD FERRERS, ARDC-CDU LIAISON
RESEARCH DATA SPECIALIST
MYDATA - GOOGLE: FERRERS FIGSHARE
ARDC - ACCELERATING 21ST CENTURY RESEARCH
ORCiD: LinkedIn
Figshare: MyData
@valuemgmt
richard.ferrers@ardc.edu.au