Preparing Data for Sharing: The FAIR Principles

An introduction to the FAIR principles and a discussion of key issues that must be addressed to ensure data is findable, accessible, interoperable and re-usable. The session explored the role of the CDISC and DDI standards for addressing these issues.

Presented by Gareth Knight at the ADMIT Network conference, organised by the Association for Data Management in the Tropics, in Antwerp, Belgium on December 1st 2015.

  1. 1. PREPARING DATA FOR SHARING The FAIR Principles Gareth Knight London School of Hygiene & Tropical Medicine gareth.knight@lshtm.ac.uk ADMIT Network Meeting 01 December 2015
  2. 2. FAIR Principles Make your data: • Findable • Accessible • Interoperable • Reusable. Findable: • Descriptive metadata • Persistent identifiers. Accessible: • Determining what to share • Participant consent and risk management • Access status. Interoperable: • XML standards • Data Documentation Initiative • CDISC. Reusable: • Rights and licence models • Permitted and non-permitted use. http://datafairport.org/
  3. 3. Data Sharing in the sciences • Data sharing has always taken place in some form • The Enlightenment of the 17th and 18th centuries built upon open debate and sharing of knowledge • Science depends on openness and transparency to advance – Replicate results – Correct errors & address bias • Negative as well as positive findings need to be in the public domain “Systematic Dictionary of the Sciences, Arts, and Crafts” Diderot & d'Alembert (1751 onwards)
  4. 4. Data Sharing in the News “To make progress in science, we need to be open and share.” Neelie Kroes (2012) vice president of the European Commission http://europa.eu/rapid/press-release_SPEECH-12-258_en.htm
  5. 5. Key Motivators • Research / policy development • Ensure validity • Funder requirements • Publisher requirements
  6. 6. Data reuse improves citation rate • Studies that made data available in a public repository received 9% more citations than similar studies where data was not available • Creators tend to cite their own data for up to 2 years • Third party use grew over time: for 100 datasets deposited in year 0, – 40 reuse papers in PubMed in year 2 – 100 by year 4 – 150+ by year 5. Piwowar, H.A. & Vision, T.J. (2013). Data reuse and the open data citation advantage. https://peerj.com/articles/175/ Study of 10,557 articles published between 2001 and 2009 that collected gene expression microarray data
  7. 7. DATA DISCOVERY Is your data findable?
  8. 8. Discovery Metadata • Descriptive metadata created to describe key attributes of data: – Title – Creator – Content description • Data repositories/journals capture and publish discovery metadata in several formats (DC, DataCite, DDI) • Metadata ‘harvested’ by research data catalogues & search engines • Metadata available to all, even if data is not Registry of Research Data Repositories http://service.re3data.org
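As an illustration of the discovery metadata described on this slide, the following is a minimal sketch that writes the three listed attributes (title, creator, content description) as a simple Dublin Core record. The wrapper element `record`, the function name `build_dc_record` and the sample values are assumptions made for the example; a real repository would follow its own export profile.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(title, creator, description):
    """Return a small Dublin Core record as an XML string.

    The <record> wrapper is a hypothetical container chosen for this
    sketch; it is not defined by Dublin Core itself.
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    fields = (("title", title), ("creator", creator), ("description", description))
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical sample values, not taken from the presentation.
xml_text = build_dc_record(
    "Example household survey 2015",
    "Example, A.",
    "Hypothetical dataset used to illustrate discovery metadata.",
)
print(xml_text)
```

A record like this can be published and harvested even when the data itself is under controlled access, which is the point the slide makes about metadata being available to all.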
  9. 9. Citing Data • Research data are a citable resource, same as papers & books • 44-75 days is the estimated average lifespan of web URLs • A unique, long-term identifier is necessary to enable citation • Many persistent ID systems developed to solve problem – DOI, Handle, ARK, etc. • Data citation in reports and publications UK Data Service: Citing Data https://www.ukdataservice.ac.uk/use-data/citing-data
  10. 10. DATA ACCESS Do you have permission to share? If so, what?
  11. 11. Data Selection Motivation: • Meet funder / journal obligations • Encourage research use • Higher citation rate • Reproduce & validate results. Constraints: • Concern that sharing will attract a lower rate of response or that people will be less honest • Intellectual Property Rights issues • Participant consent doesn’t address sharing • Data Protection legislation. Data sharing decisions are built upon recognition of all influencing factors. Information Commissioner’s Office. Data Sharing Code of Practice http://www.ico.org.uk/for_organisations/data_protection/topic_guides/data_sharing/
  12. 12. Handling individual level data • Collected and analysed for specific purpose • Stored no longer than is necessary • Kept securely and safely to prevent unauthorised or unlawful access, process, loss, or destruction EU Data Protection Directive 95/46/EC establishes limitations on how information on living individuals is held and used Reform of the data protection legal framework in the EU http://ec.europa.eu/justice/data-protection/reform/index_en.htm
  13. 13. Data Sharing as a barrier Investigation of influence of open data policies on consent rate: • No participants declined to participate, regardless of condition • Rates of drop-out vs completion did not vary between open/non-open policies • No significant change in potential consent rates when participants openly asked about the influence of open data policies on their likelihood of consent. Some researchers consider sharing obligations to be a barrier to research participation
  14. 14. Access Status Open vs. controlled access. Control method: • Data Transfer Agreement • Access controls Application process: • Request form • Review process Access criteria: • Permitted users – how do you identify? • Permitted use – topic, academic use • Other criteria: encryption, time period https://www.flickr.com/photos/toruokada/16958186672/
  15. 15. DATA INTEROPERABILITY Can data be analysed and harmonized?
  16. 16. Data Standards Data exchange is dependent upon: • Open formats • Common standards • Documented metadata specification • Consistent vocabulary • Documented workflows https://biosharing.org/
  17. 17. Clinical Data Interchange Standards Consortium Standards intended to improve consistency across the clinical trial lifecycle • Protocol: Protocol Representation Model • Data Collection: Clinical Data Acquisition Standards Harmonization (CDASH) • Data Tabulation: Study Data Tabulation Model (SDTM) • Data Analysis: Analysis Data Model (ADaM) • Archiving and exchange: Operational Data Model (ODM) and Define-XML
  18. 18. Data Documentation Initiative • Maintained & developed by DDI Alliance • Supported by data archives, producers, research data centers, university data libraries, statistics organizations, etc. • Two versions: – DDI2 / Codebook: An archived instance of a study – DDI3 / DDI Lifecycle: Suitable for longitudinal and repeated surveys An XML-based metadata standard developed for social science and economic statistics http://www.ddialliance.org/
  19. 19. Survey Data Model [Diagram relating Study, Concepts, Universes, Survey Instruments, Questions, Responses, Variables, Categories/Codes, Numbers and Data Files, with the relationship labels: measures, using, made up of, about, collect, resulting in, with values of, comprised of.] Slide source: https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.33/2011/mtg2/WP_1_Arofan.ppt
  20. 20. DDI Codebook A codeBook consists of: 1. docDscr: describes the DDI document 2. stdyDscr: Title, abstract, methodologies, agencies, access policy 3. fileDscr: a description of files in the dataset 4. dataDscr: variables (name, code, etc.), variable groups, cubes 5. othMat: other related materials, e.g. document citation 3 levels - Study, dataset, variable Preserves the collection of files associated with an archival copy of a survey
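The five-part layout listed on this slide can be sketched as an empty XML skeleton. This is only an outline under stated assumptions: the element names come from the slide, but the skeleton omits the XML namespace and the child elements that an actual DDI Codebook file would carry, so it is not schema-valid. The helper names `codebook_skeleton` and `sections_present` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The five top-level sections named on the slide, in slide order.
SECTIONS = ["docDscr", "stdyDscr", "fileDscr", "dataDscr", "othMat"]


def codebook_skeleton():
    """Build an empty codeBook element with one child per section.

    Illustrative only: no namespace and no required children are
    added, so the result is not a valid DDI Codebook instance.
    """
    root = ET.Element("codeBook")
    for name in SECTIONS:
        ET.SubElement(root, name)
    return root


def sections_present(root):
    """Return the slide's section names that appear under root."""
    found = {child.tag for child in root}
    return [name for name in SECTIONS if name in found]


skeleton = codebook_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
print(sections_present(skeleton))
```

The three levels mentioned on the slide map onto this outline roughly as study (`stdyDscr`), dataset (`fileDscr`) and variable (`dataDscr`).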
  21. 21. DDI Lifecycle http://www.ddialliance.org/what Data collector Data Analyst Data Curator Secondary user Each stage may be performed by different groups
  22. 22. DDI Metadata reuse Basic metadata can be reused during study life: • Concepts, questions, responses, variables, categories, codes, survey instruments, etc. may be adopted from earlier waves Referencing earlier iterations: • Unique identifier • Version number - control over time Common metadata ‘groups’ maintained by specific agencies: • Schemes: lists of items of a single type • Modules: metadata for a specific purpose or lifecycle stage • All maintainable metadata has a known owner or agency
  23. 23. Unique ID example urn="urn:ddi:3_0:VariableScheme.Variable=pop.umn.edu:STUDY0145_VarSch01(1_0).V101(1_1)" • This is a URN • From DDI Version 3.0 • For a variable • The scheme agency is pop.umn.edu • With identifier STUDY012345_VarSch01 • Version 1.0 • Variable ID is V101 • Version 1.1 http://www.iza.org/conference_files/eddi09/ppt/thomas_wendy_course.pdf
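The breakdown of the example URN can be reproduced with a short parser. This is a sketch that assumes every URN follows the shape of the single example on the slide; it is not derived from the full DDI URN specification, and the function name `parse_ddi_urn` and the dictionary keys are hypothetical. It reads the scheme identifier exactly as it appears in the URN string.

```python
import re

# Pattern modelled only on the slide's example URN (an assumption).
URN_PATTERN = re.compile(
    r"^urn:ddi:(?P<ddi_version>[0-9_]+):"
    r"(?P<scheme_type>\w+)\.(?P<item_type>\w+)="
    r"(?P<agency>[^:]+):"
    r"(?P<scheme_id>[^()]+)\((?P<scheme_version>[0-9_]+)\)\."
    r"(?P<item_id>[^()]+)\((?P<item_version>[0-9_]+)\)$"
)


def parse_ddi_urn(urn):
    """Split a DDI 3.0 style URN into its named parts.

    Whitespace is removed first, because URNs copied from slides are
    often wrapped across lines. Version numbers written as 1_0 are
    returned in dotted form (1.0).
    """
    compact = "".join(urn.split())
    match = URN_PATTERN.match(compact)
    if match is None:
        raise ValueError(f"Not a recognised DDI URN: {urn!r}")
    parts = match.groupdict()
    for key in ("ddi_version", "scheme_version", "item_version"):
        parts[key] = parts[key].replace("_", ".")
    return parts


example = "urn:ddi:3_0:VariableScheme.Variable=pop.umn.edu: STUDY0145_VarSch01(1_0).V101(1_1)"
for key, value in parse_ddi_urn(example).items():
    print(f"{key}: {value}")
```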
  24. 24. DDI Cross-study comparison Variables are comparable if they possess the same properties: • Age is comparable if it has: – Same concept (e.g., age at last birthday) – Same top-level universe (people) – Same representation (e.g., an integer from 0-99) DDI Comparison module: • Place similar items in the same group and perform tailored comparison • Mappings are context-dependent, i.e. sufficient for the purposes of particular research
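The comparability rule on this slide can be expressed as a simple equality check on three properties. The sketch below is not the DDI Comparison module, which records context-dependent mappings between items; it only shows the strict case where concept, universe and representation all match. The `Variable` class, its fields and the sample variables are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical minimal description of a survey variable."""
    name: str
    concept: str
    universe: str
    representation: str


def comparable(a, b):
    """True when both variables share concept, universe and representation."""
    return (a.concept, a.universe, a.representation) == (
        b.concept,
        b.universe,
        b.representation,
    )


# Hypothetical variables from two survey waves.
age_wave1 = Variable("AGE", "age at last birthday", "people", "integer 0-99")
age_wave2 = Variable("Q12_AGE", "age at last birthday", "people", "integer 0-99")
age_banded = Variable("AGEGRP", "age at last birthday", "people", "5-year bands")

print(comparable(age_wave1, age_wave2))
print(comparable(age_wave1, age_banded))
```

Variable names are deliberately ignored: two waves may label the same measure differently and still be comparable, whereas a banded age variable differs in representation and would need a mapping.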
  25. 25. DDI Tools DDI Codebook: • Nesstar Publisher & Server • IHSN Microdata Management Toolkit • Colectica • NADA • UKDA - DExT, ODaF DeXtris DDI Lifecycle: • Colectica Designer, Colectica for Excel, Portal • Sledgehammer DDI Tools http://www.ddialliance.org/resources/tools
  26. 26. DATA REUSE Can data be used for further research?
  27. 27. Data Rights • Many rights apply to data – Copyright – Moral – Database – Patents & trade secrets • Rights issues vary between countries • Ensure your project has clarified rights issues before sharing https://www.flickr.com/photos/riekhavoc/4813140176/ Rights issues influence how data can be shared, used and cited
  28. 28. FAIR data Findable: • Describe your data in a data repository • Apply a persistent identifier. Accessible: • Consider what will be shared • Obtain participant consent & perform risk management. Interoperable: • Use open formats • Consistent vocabulary • Common metadata standards. Reusable: • Consider permitted use • Apply appropriate licence
  29. 29. Thank You for your attention! Questions