Best Practice in Data Management and Sharing

This presentation covers:
Introduction: What Is “Research Data”? and the Data Lifecycle
Part 1:
Why Manage Your Data?
Formatting and organizing the data
Storage and Security of Data
Data documentation and metadata
Quality Control
Version controlling
Working with sensitive data
Controlled Vocabulary
Centralized Data Management
Part 2:
Data sharing
What are publishers & funders saying about data sharing?
Researchers’ Attitudes
Benefits of data sharing
Considerations before data sharing
Methods of Data Sharing
Shared Data Uses and Their Limitations
Data management plans
Brief summary
Acknowledgments, References

  1. 1. Best Practice in Data Management and Sharing Mojtaba Lotfaliany; MD, PhDc PhD Student @ Non-Communicable Disease Control, School of Population and Global Health, University of Melbourne Researcher @ Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences
  2. 2. First things first! We deeply appreciate the contribution of the following organizations. Most of the information in this presentation was derived from the Australian National Data Service, the UK Data Service, the UK Data Archive and the valuable book “Managing and Sharing Research Data: A Guide to Good Practice”. Non Communicable Disease Unit
  3. 3. What is this presentation about? • Research funders are increasingly mandating open access to research data • Governments internationally are demanding transparency in research • The economic climate is requiring much greater reuse of data • Fear of data loss calls for more robust information security practices. • Journal publishers increasingly require submission of the data upon which publications are based for peer review. • Researchers and data users recognize the long-term value of well-prepared data
  4. 4. What is this presentation about? All these factors mean that researchers will need to improve, enhance and professionalize their research data management skills to meet the challenge of producing the highest quality shareable and reusable research outputs in a responsible and efficient way.
  5. 5. What is this presentation about? Robust research data management techniques give researchers and data professionals the skills required to deal with the rapid developments in the data management environment. This presentation contains a brief introduction to the most important data management and data sharing skills. It aims to help researchers implement data management (and sharing) policies in order to maximize openness of data, and the transparency and accountability of the research they support.
  6. 6. What is this presentation about? Introduction: What Is “Research Data”? and Data Lifecycle Part 1: • Why Manage Your Data? • Formatting and organizing the data • Storage and Security of Data • Data documentation and metadata • Quality Control • Version controlling • Working with sensitive data • Controlled Vocabulary • Centralized Data Management
  7. 7. What is this presentation about? Part 2: • Data sharing • What are publishers & funders saying about data sharing? • Researchers’ Attitudes • Benefits of data sharing • Considerations before data sharing • Methods of Data Sharing • Shared Data Uses and Their Limitations • Data management plans • Brief summary • Acknowledgments, References
  8. 8. What is “Research Data”?
  9. 9. What Is “Research Data”? Research data is data that is collected, observed, or created, for purposes of analysis to produce original research results. • Observational • Experimental • Simulation • Derived or compiled • Reference or canonical
  10. 10. What Is “Research Data”? Research data is data that is collected, observed, or created, for purposes of analysis to produce original research results. • Observational • Experimental • Simulation • Derived or compiled • Reference or canonical • Text or Word documents, spreadsheets • Laboratory notebooks, field notebooks, diaries • Questionnaires, transcripts, codebooks • Audiotapes, videotapes • Photographs, films • Test responses • Slides, artifacts, specimens, samples • Collection of digital objects acquired and generated during the process of research
  11. 11. What Is “Research Data”? Research data is data that is collected, observed, or created, for purposes of analysis to produce original research results. • Observational • Experimental • Simulation • Derived or compiled • Reference or canonical • Text or Word documents, spreadsheets • Laboratory notebooks, field notebooks, diaries • Questionnaires, transcripts, codebooks • Audiotapes, videotapes • Photographs, films • Test responses • Slides, artifacts, specimens, samples • Collection of digital objects acquired and generated during the process of research • Data files • Database contents including video, audio, text, images • Models, algorithms, scripts • Contents of an application such as input, output, log files for analysis software, simulation software, schemas • Methodologies and workflows • Standard operating procedures and protocols
  12. 12. What Is “Research Data”? Research data is data that is collected, observed, or created, for purposes of analysis to produce original research results. • Observational • Experimental • Simulation • Derived or compiled • Reference or canonical • Text or Word documents, spreadsheets • Laboratory notebooks, field notebooks, diaries • Questionnaires, transcripts, codebooks • Audiotapes, videotapes • Photographs, films • Test responses • Slides, artifacts, specimens, samples • Collection of digital objects acquired and generated during the process of research • Data files • Database contents including video, audio, text, images • Models, algorithms, scripts • Contents of an application such as input, output, log files for analysis software, simulation software, schemas • Methodologies and workflows • Standard operating procedures and protocols • Correspondence including electronic mail and paper-based correspondence • Project files • Grant applications • Ethics applications • Technical reports • Research reports • Master lists • Signed consent forms
  13. 13. How data differs across disciplines RCSB Protein Data Bank Australian Data Archive
  14. 14. Data Lifecycle
  15. 15. Data Management
  16. 16. Why Manage Your Data? Effective research data management of medical, health and clinical data is increasingly recognised as a critical part of the research process. It enables: • Trust in data you obtain for reuse from other sources • Reproducibility of research through increasing veracity of data • Increased quality of your research • Strengthening of researchers’ reputation through increased citations and reach of all research outputs
  17. 17. Why Manage Your Data? Effective research data management of medical, health and clinical data is increasingly recognised as a critical part of the research process. It enables: • Trust in data you obtain for reuse from other sources • Reproducibility of research through increasing veracity of data • Increased quality of your research • Strengthening of researchers’ reputation through increased citations and reach of all research outputs • Increased connectivity between all research outputs, and researchers • More efficient use of scarce research funds • Data description for sharing and collaboration • Reduced risk of loss or corruption of data
  18. 18. Why Manage Your Data? By data management we mean all data practices, manipulations, enhancements and processes that ensure that research data are of a high quality, are well organized, documented, preserved, sustainable, accessible and reusable
  19. 19. Why Manage Your Data? Video As you watch the cartoon, jot down the data management mistakes you notice and think about how those mistakes could have been avoided.
  20. 20. Why Manage Your Data?
  21. 21. Formatting and organizing the data
  22. 22. Choosing File Formats • All digital data exist in specific file formats; the form in which information is coded so that a software program can read and interpret those data. • A particular file format is usually linked to a specific software program. • If the same file is to be read by a different program it may need to be converted.
  23. 23. Choosing File Formats • Format best suited for data creation • Format best suited for data analyses and other planned uses; • Format best suited for long-term sustainability and sharing of data
  24. 24. Choosing File Formats • Non-proprietary or open (CSV vs. MS Excel) • Lossless format (TIFF vs. JPEG) • Common, used by the research community (SPSS) • Standard representation (ASCII, Unicode) • Easy to track changes • Easy to be converted without data loss • Minimal human intervention
  25. 25. Data conversion • To present the data • To analyse the data in a different package • To convert images to text (OCR software) • Data preservation o After any data conversion, the files should be checked for errors, changes, or loss.
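A minimal Python sketch of such a conversion with a basic post-conversion check, assuming pandas (and the openpyxl engine for .xlsx files) is available; the file names are hypothetical:

```python
import pandas as pd

# Hypothetical file names; reading .xlsx with pandas needs the openpyxl engine installed.
source = "questionnaire_responses.xlsx"
target = "questionnaire_responses.csv"

df = pd.read_excel(source)      # data in a proprietary format
df.to_csv(target, index=False)  # open, non-proprietary copy for sharing and preservation

# Basic post-conversion check: the round-tripped file should keep the same
# number of rows, number of columns and column names as the original.
converted = pd.read_csv(target)
assert converted.shape == df.shape
assert list(converted.columns) == list(df.columns)
```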
  26. 26. File Names • Sensible file names and well-organized folder structures make it easier to find and keep track of data files. • Develop a naming system that works for your project and use it consistently. • Good file names can provide useful cues to the content, status and version of a file, can uniquely identify a file and can help in classifying and sorting files.
  27. 27. Best Practice for File Naming • Create meaningful but brief names • Use file names to classify broad types of files • Do not use spaces, dots and special characters such as $ or ? or ! • Use hyphens '-' or underscores '_' to separate logical elements in a file name • Avoid very long file names • Reserve the 3-letter file extension for application-specific codes that represent the file format, such as .doc, .xls, .mov and .tif
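A small Python helper illustrating these naming rules; the element order and the example values are assumptions for illustration, not a prescribed standard:

```python
from datetime import date

def build_file_name(project, content, version, ext, when=None):
    """Build a name like 'biodiv_plankton-count_2016-05-30_v02.csv'.

    Underscores separate the logical elements, spaces become hyphens,
    and other special characters are stripped, following the rules above.
    """
    def clean(text):
        return "".join(c for c in text.replace(" ", "-") if c.isalnum() or c == "-").lower()

    when = when or date.today()
    return f"{clean(project)}_{clean(content)}_{when.isoformat()}_v{version:02d}.{ext}"

print(build_file_name("Biodiv", "plankton count", 2, "csv", date(2016, 5, 30)))
# biodiv_plankton-count_2016-05-30_v02.csv
```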
  28. 28. Best Practice for File Structure • Think carefully how best to structure files in folders • When working in collaboration, the need for an orderly structure is even higher. • Consider the best hierarchy for files, deciding whether a deep or shallow hierarchy is preferable.
  29. 29. Best Practice for File Structure Research project files could be organized according to: • Research activity, such as interviews, surveys or focus groups; • Data type, such as images, text or database; • Kind of material, for example, publications, deliverables or documentation.
  30. 30. Organize Files Logically Make sure your file system is logical and efficient. Example folder hierarchy: Project 1 > Time_point1, Time_point2 > Biomarkers, Anthropometrics. Example file names encoding project name, location, experiment name, date and file format: Biodiv_H20_heatExp_2005_2008.csv; Biodiv_H20_predatorExp_2001_2003.csv; Biodiv_H20_planktonCount_start2001_active.csv; Biodiv_H20_chla_profiles_2003.csv
  31. 31. Storage and Security of Data
  32. 32. Best Practice in Storing Data and Preservation • Store data uncompressed in non-proprietary or open standard formats for long-term software readability • Copy or migrate data files to new media every two to five years • Check the data integrity of stored data files at regular intervals.
  33. 33. Best Practice in Storing Data and Preservation • Store data uncompressed in non-proprietary or open standard formats for long-term software readability • Copy or migrate data files to new media every two to five years • Check the data integrity of stored data files at regular intervals • Organize and label stored data clearly so they are easy to locate and physically accessible • Ensure that areas and rooms for storage of digital or non-digital data are fit for the purpose, structurally sound and free from the risk of flood and fire • Create digital versions of paper-based data or information in PDF/A format for long-term preservation and storage.
  34. 34. Backup Your Data • Reduce the risk of damage or loss • Use multiple locations (here, near, far) • Create a backup schedule • Use reliable backup medium • Test your backup system (i.e., test file recovery)
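A minimal sketch of how stored and backed-up files might be integrity-checked at regular intervals, using only the Python standard library; the directory and manifest names are assumptions for illustration:

```python
import hashlib
from pathlib import Path

def checksum(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(data_dir, manifest):
    """Record a checksum for every file under data_dir."""
    lines = [f"{checksum(p)}  {p.relative_to(data_dir)}"
             for p in sorted(data_dir.rglob("*")) if p.is_file()]
    manifest.write_text("\n".join(lines) + "\n")

def verify_manifest(data_dir, manifest):
    """Return the files whose current checksum no longer matches the manifest."""
    mismatches = []
    for line in manifest.read_text().splitlines():
        recorded, name = line.split("  ", 1)
        if checksum(data_dir / name) != recorded:
            mismatches.append(name)
    return mismatches

# Hypothetical locations: run write_manifest once after depositing the data,
# then verify_manifest at each scheduled integrity check.
write_manifest(Path("project_data"), Path("checksums.txt"))
print(verify_manifest(Path("project_data"), Path("checksums.txt")))
```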
  35. 35. Physical data security • Controlling access to rooms and buildings where data, computers or media are held; • Logging the removal of, and access to, media or hardcopy material in store rooms; • Transporting sensitive data only under exceptional circumstances.
  36. 36. Network Security • Not storing confidential data such as those containing personal information on servers or computers connected to an external network, particularly servers that host Internet services • Firewall protection and security-related upgrades and patches to operating systems to avoid viruses and malicious codes.
  37. 37. Security of computer systems • Locking computer systems with a password and installing a firewall system • Protecting servers by power surge protection systems through line-Interactive uninterruptible power supply (UPS) systems • Imposing non-disclosure agreements for managers or users of confidential data • Not sending personal or confidential data via email or other file transfer means without first encrypting them • Remembering that file-sharing services such as Google Docs and Dropbox may not be suitable for certain types of information.
  38. 38. Data Encryption
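As one illustration of the earlier point about encrypting personal or confidential data before they are stored or transferred, a minimal sketch using the third-party cryptography package (Fernet symmetric encryption); the file names are hypothetical:

```python
from cryptography.fernet import Fernet  # third-party 'cryptography' package

# Generate a key once and store it separately from the data
# (for example in a password manager), never alongside the encrypted file.
key = Fernet.generate_key()
fernet = Fernet(key)

# Hypothetical file names for illustration.
with open("participants.csv", "rb") as f:
    token = fernet.encrypt(f.read())

with open("participants.csv.enc", "wb") as f:
    f.write(token)

# The encrypted copy can now be transferred; decryption requires the same key.
original_bytes = Fernet(key).decrypt(token)
```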
  39. 39. Access controlling and security • Needing specific authorization from the data owner to access data • Placing confidential data under embargo for a given period of time until confidentiality is no longer pertinent • providing access to approved researchers only • providing secure access to data through enabling remote analysis of confidential data but excluding the ability to download data • Mixed levels of access regulations
  40. 40. Mixed levels of access regulations
  41. 41. Data documentation and metadata
  42. 42. Data documentation • The collective term 'data documentation' includes information on why and how data were created, prepared or digitized, what they mean, what their content and structure are, and any alterations or coding that may have taken place. • Good documentation is critical for understanding data in the short, medium and longer term; and is vital for successful long-term data preservation.
  43. 43. Data documentation levels Data documentation requires descriptive material at two levels. • The high-level information, commonly known as study-level documentation, describes the research project, the data creation processes, rights and general contexts. • The data-level information covers descriptions and annotations at the file and within-file level. Metadata are a specific subset of data documentation that provides structured, searchable information.
  44. 44. Good study-level data documentation includes: • Research design and context of data collection • Data collection methods • Structure of data files, with number of cases, records, files and variables, as well as any relationships among such items; • Secondary data sources used and provenance • Data validation, checking, proofing, cleaning and other quality assurance procedures
  45. 45. Good study-level data documentation includes: • Research design and context of data collection • Data collection methods • Structure of data files, with number of cases, records, files and variables, as well as any relationships among such items; • Secondary data sources used and provenance • Data validation, checking, proofing, cleaning and other quality assurance procedures • Modifications made to data over time since their original creation and identification of different versions of datasets; • Information on data confidentiality, access and any applicable conditions of use; • Publications, presentations and other research outputs that explain or draw on the data.
  46. 46. Data-level data documentation Metadata can be generated manually, or it can be created automatically, either within the database or in separate files.
  47. 47. Within-file metadata
  48. 48. Within-file metadata
  49. 49. Data Dictionary Project Documentation: • Context of data collection • Data collection methods • Structure, organization of data files • Data sources used • Data validation, quality assurance • Transformations of data from the raw data through analysis • Information on confidentiality, access and use conditions. Dataset Documentation: • Variable names and descriptions • Explanation of codes and schemas used • Algorithms used to transform data • File format and software (including version) used
  50. 50. Data Dictionary Structure
  51. 51. Good data-level data documentation (for tabular data) includes: • Names, labels and descriptions • Value code labels • Coding and classification schemes • Codes for missing values • Derived data • Weighting and grossing variables
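A minimal sketch of what such data-level documentation might look like as a machine-readable data dictionary; the variables, codes and file name are hypothetical examples:

```python
import json

# Hypothetical variables for a survey data file, covering the elements listed
# above: labels, value code labels, missing-value codes and derivations.
data_dictionary = {
    "sbp": {
        "label": "Systolic blood pressure (mmHg)",
        "type": "numeric",
        "valid_range": [70, 260],
        "missing_codes": {"-9": "not measured"},
    },
    "smoker": {
        "label": "Current smoking status",
        "type": "categorical",
        "value_labels": {"0": "non-smoker", "1": "smoker"},
        "missing_codes": {"-9": "not asked"},
    },
    "bmi": {
        "label": "Body mass index (kg/m2)",
        "type": "derived",
        "derivation": "weight_kg / (height_m ** 2)",
    },
}

with open("data_dictionary.json", "w") as f:
    json.dump(data_dictionary, f, indent=2)
```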
  52. 52. Quality Control
  53. 53. Data Quality Control Data are of high quality when they are fit for their intended uses in operations, decision making and planning (in line with the ISO 9000:2015 definition of quality). Common quality dimensions: • Completeness • Validity • Accuracy • Consistency • Availability • Timeliness
  54. 54. Data Quality Control in Data Entry • Calibration of instruments • Taking multiple measurements, observations or samples • Checking the truth of the record with an expert; • Using standardized methods and protocols for capturing observations • Customize questions
  55. 55. Data Checking During data checking, data are edited, cleaned, verified, cross-checked and validated. • Double-checking coding of observations or responses and out-of-range values • Checking data completeness • Verifying random samples of the digital data against the original data • Double entry of data • Statistical analyses such as frequencies, means, ranges or clustering to detect errors and anomalous values • Proof-reading transcriptions • Peer review
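A minimal pandas sketch of a few of these checks (completeness, duplicates, out-of-range values and invalid codes); the file, variables and valid ranges are hypothetical:

```python
import pandas as pd

# Hypothetical file, variables and valid ranges, for illustration only.
df = pd.read_csv("questionnaire_responses.csv")

checks = {
    # completeness: the identifier should never be missing
    "missing_participant_id": int(df["participant_id"].isna().sum()),
    # each participant should appear only once
    "duplicate_participant_id": int(df["participant_id"].duplicated().sum()),
    # out-of-range values for a numeric variable
    "age_out_of_range": int((~df["age"].between(18, 99)).sum()),
    # invalid codes for a categorical variable
    "invalid_sex_code": int((~df["sex"].isin([1, 2])).sum()),
}

for name, count in checks.items():
    print(f"{name}: {count}")
```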
  56. 56. Data Quality Control • Misleading data • Duplicate data • Incorrect data • Inaccurate data • Non-integrated data • Data that violates business rules • Data without a generalized formatting • Incorrectly punctuated or spelled data
  57. 57. Data Quality Control • Manually? • OpenRefine (formerly Google Refine) is a valuable open source tool that is similar to Excel but more powerful. You can use it to record data, manipulate data, clean up dirty data, and transform datasets. • Other alternatives
  58. 58. Version controlling and tracking
  59. 59. What is a version? • A version is “a particular form of something differing in certain respects from an earlier form or other forms of the same type of thing”. • In the case of research data, a new version of a dataset may be created when an existing dataset is reprocessed, corrected or appended with additional data. • Versioning is one means by which to track changes associated with ‘dynamic’ data that are not static over time.
  60. 60. What is a version? • Scenario 1: a new observation is created and it should be added to the dataset • Scenario 2: an existing observation is removed and it should be deleted from the dataset • Scenario 3: an error was identified in one of the existing observations stored in the dataset and this error must be corrected.
  61. 61. Version controlling and tracking • Version information makes a revision of a dataset uniquely identifiable. • Uniqueness can be used by researchers to determine whether and how data has changed over time and to determine specifically which version of a dataset they are working with. • Explicit versioning allows for repeatability in research, enables comparisons, and prevents confusion.
  62. 62. Version controlling and tracking
  63. 63. Version controlling and tracking
  64. 64. Tools for version controlling
  65. 65. Version control tables
  66. 66. Best Practice in Version Controlling • Decide how many versions of a file to keep • Identify milestone versions to keep • Uniquely identify different versions of files using a systematic naming convention, such as version numbers or dates • Record changes made to a file when a new version is created
  67. 67. Best Practice in Version Controlling • Decide how many versions of a file to keep • Identify milestone versions to keep • Uniquely identify different versions of files using a systematic naming convention, such as version numbers or dates • Record changes made to a file when a new version is created • Record relationships between items where needed • Track the location of files if they are stored in a variety of locations • Regularly synchronize files in different locations • Identify a single location for the storage of milestone and master versions.
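A minimal sketch of one way to record such changes: a version-control table kept next to the data files as a CSV log. The file names, columns and example row are assumed conventions, not a standard:

```python
import csv
from datetime import date

def log_version(log_file, file_name, version, change, author):
    """Append one row describing a new dataset version to a CSV log."""
    with open(log_file, "a", newline="") as f:
        csv.writer(f).writerow(
            [date.today().isoformat(), file_name, version, change, author]
        )

log_version(
    "version_log.csv",
    "cohort_baseline_v03.csv",
    "v03",
    "Corrected out-of-range systolic blood pressure values",
    "Researcher A",
)
```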
  68. 68. Working with sensitive data
  69. 69. What is sensitive data Sensitive data are data that can be used to identify an individual, species, object, or location, and that therefore introduce a risk of discrimination, harm, or unwanted attention.
  70. 70. What is sensitive data Some examples but not all:
  71. 71. Data de-identification A person's identity can be disclosed from: • Direct identifiers such as names, addresses, postcode information, telephone numbers or pictures • Indirect identifiers which, when linked with other publicly available information sources, could identify someone, for example information on workplace, occupation or exceptional values of characteristics like salary or age
  72. 72. Data de-identification • Removing direct identifiers, e.g. name or address • Aggregating or reducing the precision of information or a variable, e.g. replacing date of birth by age groups • Generalizing the meaning of detailed text, e.g. replacing a doctor's detailed area of medical expertise with an area of medical specialty • Using pseudonyms • Restricting the upper or lower ranges of a variable to hide outliers, e.g. top-coding salaries • Consider statistical disclosure control (SDC) techniques
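A minimal pandas sketch of three of these techniques (dropping direct identifiers, converting date of birth to age groups, top-coding an outlier-prone variable); the file, column names, reference date and thresholds are hypothetical:

```python
import pandas as pd

df = pd.read_csv("participants.csv")  # hypothetical identifiable dataset

# 1. Remove direct identifiers.
deid = df.drop(columns=["name", "address", "phone"])

# 2. Reduce precision: replace date of birth with an age group
#    (the reference date and group boundaries are arbitrary examples).
age_years = (pd.Timestamp("2016-01-01")
             - pd.to_datetime(deid.pop("date_of_birth"))).dt.days // 365
deid["age_group"] = pd.cut(age_years, bins=[0, 30, 45, 60, 120],
                           labels=["<30", "30-44", "45-59", "60+"])

# 3. Top-code an outlier-prone variable to hide extreme values.
deid["salary"] = deid["salary"].clip(upper=150_000)

deid.to_csv("participants_deidentified.csv", index=False)
```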
  73. 73. Data de-identification
  74. 74. Data de-identification Read more about techniques • 'Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers'. • The UK Data Service outlines approaches for de-identifying quantitative and qualitative data.
  75. 75. Controlled Vocabulary
  76. 76. Controlled Vocabulary • Controlled vocabularies ensure shared understanding of the terminologies used in taxonomies and classifications. • Using established vocabularies promotes interoperability, discovery and re-use of data. • Goal of Controlled vocabulary
  77. 77. Controlled Vocabulary Video
  78. 78. Controlled Vocabulary examples
  79. 79. METeOR METeOR is Australia’s repository for national metadata standards for health, housing and community services statistics and information. An example: Person—weight (measured), total kilograms N[NN].N
  80. 80. Have your own controlled Vocabulary
  81. 81. Controlled Vocabulary
  82. 82. Controlled Vocabulary 1. Find and learn about controlled vocabularies relevant to research. 2. Access those vocabularies and reuse them in your community. 3. Integrate vocabularies into your local information systems at a technical level. 4. Upload and describe a vocabulary to share with others. 5. Make a vocabulary machine readable (more easily integrated into others' systems). 6. Create new or import existing vocabularies and manage them with your community's input.
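A minimal sketch of step 3: checking and mapping free-text values in a dataset against a local controlled vocabulary. The terms, mappings and file name are invented for illustration:

```python
import pandas as pd

# A small local controlled vocabulary mapping free-text entries to preferred terms.
preferred_terms = {
    "type 2 diabetes": "Diabetes mellitus type 2",
    "t2dm": "Diabetes mellitus type 2",
    "high blood pressure": "Hypertension",
}

df = pd.read_csv("diagnoses.csv")  # hypothetical file with a 'diagnosis' column
df["diagnosis_std"] = df["diagnosis"].str.strip().str.lower().map(preferred_terms)

# Values that did not map need review before the data are analysed or shared.
unmapped = df.loc[df["diagnosis_std"].isna(), "diagnosis"].unique()
print("Terms needing review:", unmapped)
```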
  83. 83. A good example The Australian Longitudinal Study of Ageing
  84. 84. Centralized Data Management
  85. 85. Centralized Data Management • researchers can share good practice and data management experiences with each other • building capacity, collective knowledge and resources for the center • new researchers can immediately implement good data practices from this shared expertise • a uniform approach to data management by creating standard data policies and procedures • keeping track of projects and owners of data over time, especially when researchers come and go • storing and backing up data in a central location; • making researchers and staff aware of duties, responsibilities, funder and legal requirements relating to research data, with easy access to relevant information • ensuring that data management is costed into funding proposals.
  86. 86. A Centralized Data Management may include: • what the data mean • how they were created • where they were obtained • who owns them • who has access, use and editing rights • who is responsible for managing them • storage and backup strategies • data quality control procedures • different versions of files • how they will or can be shared
  87. 87. A Centralized Data Management may include: • Acts and Regulations and a local statement or policy on data sharing • codes of practice or professional standards relevant to research data • exemplar data management plans • a statement of institutional IT data management and existing backup procedures • a security policy for data storage and data format recommendations • quality control standards for data collection and data entry • file-naming and version control guidance • template consent forms and information sheets • example ethical review forms and data anonymization guidelines • confidentiality agreements for data handlers.
  88. 88. A Centralized Data Management, How? • An institutional or departmental drive where access can be provided to external researchers, for example through remote access via virtual private network (VPN) techniques • A secure file transfer protocol (FTP) server • A Virtual Research Environment (VRE) or portal environment.
  89. 89. A Centralized Data Management, How? • A content management system such as Drupal • Cloud-based file-sharing areas such as Dropbox, Google Docs, Google Drive • A data repository such as DSpace, Fedora, EPrints, CKAN or cloud-based figshare.
  90. 90. Part 2: Data sharing
  91. 91. Open / Shared / Closed: The world of data Video
  92. 92. What are publishers & funders saying about data sharing?
  93. 93. The Data Sharing Agenda Organization for Economic Cooperation and Development (OECD) Principles and Guidelines for Access to Research Data from Public Funding: Publicly funded research data are a public good, produced in the public interest, and should be made openly available with as few restrictions as possible in a timely and responsible manner, without harming intellectual property (OECD, 2007).
  94. 94. The Data Sharing Agenda Organization for Economic Cooperation and Development (OECD) Principles and Guidelines for Access to Research Data from Public Funding: Publicly funded research data are a public good, produced in the public interest, and should be made openly available with as few restrictions as possible in a timely and responsible manner, without harming intellectual property (OECD, 2007). The Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities: The Berlin Declaration called for promoting knowledge dissemination through the open access paradigm via the internet, which requires the worldwide web to be sustainable, interactive and transparent, with openly accessible and compatible content and tools (Berlin Declaration, 2003).
  95. 95. The Data Sharing Agenda Organization for Economic Cooperation and Development (OECD) Principles and Guidelines for Access to Research Data from Public Funding: Publicly funded research data are a public good, produced in the public interest, and should be made openly available with as few restrictions as possible in a timely and responsible manner, without harming intellectual property (OECD, 2007). The Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities: The Berlin Declaration called for promoting knowledge dissemination through the open access paradigm via the internet, which requires the worldwide web to be sustainable, interactive and transparent, with openly accessible and compatible content and tools (Berlin Declaration, 2003). The High Level Expert Group on Scientific Data: Noting the rising tide of data, proposed that we are on the verge of a great new leap in scientific capability, fuelled by data, with a need for a scientific e-infrastructure that supports seamless access, use, reuse and trust of data (European Commission, 2010). The report sketches the benefits and costs of accelerating the development of a fully functional e-infrastructure for scientific data. Open infrastructure, open culture and open content need to go hand in hand.
  96. 96. Data sharing policies of major medical funders
  97. 97. Data sharing policies of major medical funders
  98. 98. Data sharing policies of major medical funders
  99. 99. Data sharing policies of major medical funders
  100. 100. Data sharing policies of major medical funders
  101. 101. Data sharing policies of major medical funders
  102. 102. Data sharing policies of major medical funders
  103. 103. Data sharing policies of major medical funders
  104. 104. Journals and Publishers • Data sharing policies are becoming increasingly common in Australia and internationally. • More and more journal publishers are asking authors to make the data underpinning a journal article available.
  105. 105. New journal data policies
  106. 106. Researchers’ Attitudes
  107. 107. Researcher motivations for sharing data Source
  108. 108. Researcher motivations for sharing data Source
  109. 109. Researcher motivations for sharing data Source
  110. 110. Why some researchers remain reluctant to share their own research data • 42% Intellectual property or confidentiality issues • 36% My funder/institution does not require data sharing • 26% I am concerned that my research will be scooped • 26% I am concerned about misinterpretation or misuse • 23% Ethical concerns • 22% I am concerned about being given proper citation credit or attribution • 21% I did not know where to share my data • 20% Insufficient time and/or resources • 16% I did not know how to share my data • 12% I don’t think it is my responsibility • 12% I did not consider the data to be relevant • 11% Lack of funding • 7% Other Source
  111. 111. Why some researchers remain reluctant to share their own research data • My data are not of interest or use to anyone else. • I want to publish my work before anyone else sees my data. • I have not got the time or money to prepare data for sharing. • If I ask my respondents for consent to share their data then they will not agree to participate in the study. • Other researchers would not understand my data at all or may use it in a wrong way
  112. 112. Data sharing trends by country Source: http://www.acscinf.org/PDF/Giffi-%20Researcher%20Data%20Insights%20-- %20Infographic%20FINAL%20REVISED.pdf
  113. 113. Benefits of data sharing
  114. 114. Benefits for researchers • Increases visibility of scholarly work; • Likely to increase citation rates; • Enables new collaborations; • Encourages scientific enquiry and debate; • Promotes innovation and potential new data uses; • Establishes links to the next generation of researchers.
  115. 115. Benefits for research funders: • Promotes primary and secondary use of data; • Makes optimal use of publicly funded research; • Avoids duplication of data collection; • Maximizes return on investment.
  116. 116. Benefits for the scholarly community • Maintains professional standards of open inquiry; • Maximizes transparency and accountability; • Promotes innovation through unanticipated and new uses of data; • Enables scrutiny of research findings; • Improves quality from verification, replication and trustworthiness; • Encourages the improvement and validation of research methods; • Provides resources for teaching and learning.
  117. 117. Benefits for research participants • Allows maximum use of contributed information; • Minimizes data collection on difficult-to-reach or over-researched populations; • Allows participants' experiences to be understood as widely as ethically possible.
  118. 118. Benefits for the public • Advances science to the benefit of society; • Adopts emerging norms such as open access publishing • To be, and appear to be, open and accountable; • Complies with openness laws and regulations.
  119. 119. Considerations before data sharing
  120. 120. Considerations before data sharing • Good data management • Meeting ethical and legal obligations • Intellectual property rights • Data licensing • Metadata schema and crosswalking
  121. 121. Good data management • Data can only be shared if they are of high quality, well-curated, well-documented, and can be referenced and indexed. • Data integrity translates as accuracy and consistency and is ensured through quality control.
  122. 122. Legal obligations Legislation that may impact on the sharing of data: • Privacy Act 1988 • Human Rights Act 2004 • Freedom of Information Act 1982 • The Freedom of Information Amendment (Reform) Act 2010
  123. 123. Human Research Ethics Researchers should: • Inform participants how research data will be stored, preserved and used in the long-term • Inform participants how confidentiality will be maintained, e.g. by anonymizing data • Obtain informed consent, either written or verbal, for data sharing
  124. 124. Levels of consent • ‘Specific': limited to the specific project under consideration • ‘Extended': given for the use of data or tissue in future research projects that are either (i) an extension of, or closely related to, the original project; or (ii) in the same general area of research (for example, genealogical, ethnographical, epidemiological, or chronic illness research); • ‘Unspecified': given for the use of data or tissue in any future research.
  125. 125. Intellectual Property Rights • In most research institutions, such as universities, the institution owns IP rights arising from research undertaken by employees in the course of their employment. • A research funder may also wish to exert some claim over rights, although, in most cases, IP rights are attributed to the researcher unless an output becomes commercially viable. • If a university research project has commercial collaborators there may be joint IP rights in the research outputs, which are best handled via consortium agreements or legal contracts.
  126. 126. Intellectual Property Rights • Copyright and exemptions under fair dealing: copyright is an intellectual property right assigned automatically to the creator. • Copyright cannot be taken away without consent and cannot be abused without the possibility of legal action ensuing. • Most research outputs, including spreadsheets, publications, textual files, reports and computer programs, fall under literary work and are therefore protected by copyright.
  127. 127. The Freedom of Information Legislation (FOI) • There exist rights for people to request access to recorded information held by public sector organizations. • This can include research data held by universities or research institutions. • Many countries have some form of Freedom of Information legislation, which is designed to ensure accountability and good governance in public authorities. • Research data can be requested under the FOI Act and legally supplied to anyone, but copyright and IP rights to such data remain with the original researcher.
  128. 128. What is a license? Why apply a license? • When considering sharing your data, you need to consider how you want your data to be reused by other researchers or students. • You can specify this by licensing the data to match the intended uses. • The data publisher, be it a data center, archive or repository, usually does not expect to have rights in the data collections it distributes or provides access to. • Rather, a researcher or data creator will retain the copyright in their data and give the center a non-exclusive license to redistribute the data.
  129. 129. What is a license? Why apply a license? • All copyright holders with some claim over the data collection need to agree to the terms of deposit. • Without this license agreement in place, a data center or institutional repository cannot legally provide access to the data.
  130. 130. AusGOAL Framework
  131. 131. AusGOAL Framework
  132. 132. How do I apply a license? • You must ‘own’ the data to apply the license • Look at your institution's IP policies • When partnering: agree – before collecting the data – who can apply the license and what that license will be • Include this information in the HREC application
  133. 133. How open can I be? • Consent? (For what?) • Potential for harm/discrimination? • Data modified to address identification, limit harm? • HREC approval?
  134. 134. Metadata schema and crosswalking • A metadata standard is a schema that has been formally approved and published (for example, ANZLIC and DDI). • Numerous metadata standards exist and the standard chosen to describe resources such as research data should be appropriate to the project or discipline. • Directory of Disciplinary Metadata
  135. 135. Metadata crosswalking • Many of these contributors use different metadata schemas in creating their research data records. • The records of each contributor need to be ‘cross-walked’ • A schema crosswalk is a table that shows equivalent elements in more than one database schema. • It maps the elements in one schema to the equivalent elements in another schema.
  136. 136. Metadata crosswalking • The flexible structure of XML makes it possible to convert data from one metadata standard to another using an XSLT. • XSLT (Extensible Stylesheet Language Transformations) is a language for transforming XML documents into other XML documents.
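A minimal, self-contained sketch of such an XSLT transformation using the Python lxml library; the source record, the target elements and the element mapping are invented for illustration and are not a published crosswalk:

```python
from lxml import etree  # third-party 'lxml' package

# A minimal source record in an invented schema.
source = etree.XML(
    b"""<record>
          <title>Household Income Survey 2015</title>
          <creator>Example Research Group</creator>
        </record>"""
)

# A minimal XSLT crosswalk mapping the invented elements to Dublin Core-style
# dc:title and dc:creator elements.
stylesheet = etree.XML(
    b"""<xsl:stylesheet version="1.0"
           xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
          <xsl:template match="/record">
            <metadata>
              <dc:title><xsl:value-of select="title"/></dc:title>
              <dc:creator><xsl:value-of select="creator"/></dc:creator>
            </metadata>
          </xsl:template>
        </xsl:stylesheet>"""
)

transform = etree.XSLT(stylesheet)
print(etree.tostring(transform(source), pretty_print=True).decode())
```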
  137. 137. Methods of Data Sharing
  138. 138. The crucial role of data repositories • Informal sharing • Specialist data centers, archive or repository • An institutional repository • Submitting to a journal to support a publication • Publish in a data journal • Dissemination via a project or institutional website • Self-publishing via a cloud-based system
  139. 139. Informal sharing Data can be shared with colleagues and trusted collaborators, or upon request. • Easy • No citation • Little credit • Not easy to find the appropriate data • No data preservation • Duplication of effort
  140. 140. Specialist data centers, archive or repository • Repositories enable discovery of data by publishing data descriptions ("metadata") about the data they hold - like a library catalogue describes individual materials held in a library. • You can publish a description (i.e. the metadata) of your data without making the data itself openly accessible, which enables you to place conditions around access to the data.
  141. 141. Specialist data centers, archive or repository • Assurance that data meet set quality standards • Long-term preservation of data in standard file formats which can be upgraded when needed due to software upgrades or changes • Safe-keeping of data in a secure environment with the ability to control access where this is required • Regular data backups • Online resource discovery of data through data catalogues
  142. 142. Specialist data centers, archive or repository • Access to data in popular file formats • Licensing arrangements to acknowledge data rights and appropriate handling of confidential data • Standard citation mechanism to acknowledge data creation; • Promotion of data to many users • Monitoring of the secondary usage of data • Management of access to data and user queries on behalf of the data owner
  143. 143. An institutional repository • Wide-scale visibility for scholars of the institution • Meeting security and ethical obligations • Concerns about data preservation • Concerns about data shareability
  144. 144. Access controlling • Data centers typically liaise with the researchers who own the data in selecting the most suitable type of access for data. • Access regulations should always be proportionate to the kind of data and confidentiality involved. • Access conditions which require that the data center contact the researcher directly about each particular request may result in extended delays before access is granted.
  145. 145. Data portals Data portals or aggregators draw together research data records from a number of repositories: • Research Data Australia (RDA) aggregates records from over 100 Australian research repositories • re3data, the largest and most comprehensive registry of data repositories available on the web • Subject-specific portals
  146. 146. Data portals
  147. 147. Journals and Data Publishing • As supplementary material (mandatory/optional) • Data Paper or Data Article • Metadata or data • Data journals or regular journals • Data papers are subject to full peer review • Quality control and technical reviews
  148. 148. Journals and Data Publishing
  149. 149. Project websites and Cloud space • Project Websites and Linked Open Data: project websites can provide easy, immediate storage and simplified access to research data • Not sustainable for the longer term • Difficult to control who is using data and how • Needs a backup and exit plan for sharing data • Should be viewed as a short-term, impact-generating facility and not a long-term data storage solution.
  150. 150. What is data citation? • Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to outputs such as journal articles, reports and conference papers. • Citing data is increasingly being recognized as one of the key practices leading to recognition of data as a primary research output.
  151. 151. The importance of data citation? • Acknowledges the author's sources; • Makes identifying data easier; • Promotes the reproduction of research results; • Makes it easier to find data; • Allows the impact of data to be tracked; • Provides a structure that recognizes and can reward data creators.
  152. 152. Data citation
  153. 153. Data citation conventions • Uniform Resource Names (URN) • Uniform Resource Locators (URL) • Digital Object Identifiers (DOI) Some other similar identifiers: • International Standard Book Number (ISBN) • The Open Researcher and Contributor ID (ORCID)
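A small illustration of assembling a DOI-based data citation string in a DataCite-style pattern (Creator (Year). Title. Publisher. DOI); all example values, including the DOI, are hypothetical:

```python
def format_data_citation(creator, year, title, publisher, doi):
    """Assemble a citation string in the pattern:
    Creator (Year). Title. Publisher. https://doi.org/DOI
    """
    return f"{creator} ({year}). {title}. {publisher}. https://doi.org/{doi}"

# Entirely hypothetical example values, including the DOI.
print(format_data_citation(
    "Smith, J.", 2015,
    "National Diet Survey 2014 [data set]",
    "Example Data Archive",
    "10.0000/example.1234",
))
```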
  154. 154. Shared Data Uses and Their Limitations
  155. 155. Shared data can be used for: • Descriptive and Historical studies • Comparative Studies • Secondary Analysis • Replication and Validation of Published Articles • Research Design and Methodological Advancement • Teaching and Learning
  156. 156. Shared data limitations: • Lack of availability of suitable data • Fit of the data for the secondary analysis • Time needed to become familiar with the data • Unfamiliarity with appropriate statistical methods for secondary analysis • Lack of rich-enough documentation • Concern about ethical reuse of data
  157. 157. Data management plans
  158. 158. Importance of data management plans • We have explored several important data management concepts during this presentation. • A Data Management Plan (DMP) documents how data will be managed, stored and shared during and after a research project. • Some research funders and human research ethics committees are now requesting that researchers submit a DMP as part of their project proposal.
  159. 159. What a good Data Management Plan includes Important: each element is linked to further information on the ANDS website.
  160. 160. Preparing a Data Management Plan • The best research practice is to consider these at the start of a project. • By planning ahead the research team can improve research efficiency, guard against data loss, enhance data security, and ensure research data integrity and replicability. • Many Data Management Plan templates are now freely available for reuse.
  161. 161. Data Management Plan • Data Management Checklists • Online planning tools Video
  162. 162. Brief summary
  163. 163. Do you have successful data?
  164. 164. Need to know more about data management? • Deakin University Library's pages on Managing your Research Data • Cambridge University pages on Research Data Management • University of Leicester Data management support for researchers pages • 23 (research data) Things - ANDS
  165. 165. Need to know more about data management?
  166. 166. Acknowledgments, References We deeply appreciate the contribution of the Australian National Data Service
  167. 167. Acknowledgments, References We deeply appreciate the contribution of the book “Managing and Sharing Research Data: A Guide to Good Practice” by Louise Corti, Veerle Van den Eynden, Libby Bishop and Matthew Woollard (SAGE, 20 March 2014). * Most of the information in this presentation was derived from this valuable book.
  168. 168. Acknowledgments, References We deeply appreciate the contribution of the UK Data Service and the UK Data Archive
  169. 169. Acknowledgment
  170. 170. Acknowledgment Data Management, Metadata and Data Sharing Workgroup The Iran Cohort Consortium (ICC)
  171. 171. Acknowledgment Sherry Lake Kathryn Unsworth
  172. 172. Acknowledgment Non Communicable Disease Unit Davood Khalili Brian Oldenburg
  173. 173. Contact info • Email: mlotfaliany@student.unimelb.edu.au • Tel: +61 450 55 1367 • Please don’t hesitate to contact me!
