3. https://www.asis.org/rdap/
“Epi” Data Characteristics
• Sensitive
• Often recycled, daisy-chained
– Big data
• Complex and heterogeneous
• Flat-file vs. relational databases
• Often coded numerically, even for non-numeric responses – data dictionaries are essential!
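To illustrate why data dictionaries matter for numerically coded responses, here is a minimal Python sketch. The variable names, codes, and labels are hypothetical and do not come from any particular study; the point is only that the numbers cannot be interpreted without the dictionary.

```python
# Hypothetical data dictionary: variable name -> {numeric code: label}.
# These codes are illustrative assumptions, not taken from a real study.
DATA_DICTIONARY = {
    "sex": {1: "male", 2: "female", 9: "unknown"},
    "smoker": {0: "no", 1: "yes", 9: "unknown"},
}


def decode_record(record, dictionary=DATA_DICTIONARY):
    """Return a copy of `record` with coded values replaced by their labels."""
    decoded = {}
    for variable, value in record.items():
        codes = dictionary.get(variable)
        if codes is None:
            # Not a coded variable (e.g. an ID or a true numeric such as age).
            decoded[variable] = value
        else:
            # Flag codes missing from the dictionary instead of passing them on.
            decoded[variable] = codes.get(value, f"UNDEFINED CODE {value}")
    return decoded


record = {"id": 101, "age": 47, "sex": 2, "smoker": 0}
decoded = decode_record(record)
print(decoded)
```

Without the dictionary, `sex = 2` and `smoker = 0` are just numbers; a second dataset could use the same codes with different meanings, which is one reason reused, daisy-chained data needs its dictionary to travel with it.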
4.
Researcher Needs
• HIPAA-aligned storage
• High-capacity storage and computation
• Protection of personal investment in data
• Incentives for sharing data
• Metadata interoperability
5.
Library Services for Epi Data
• Technology
– Repository with access controls
OR
Long-term embargoes for data
– High-capacity preservation (OA and dark)
– Ability to mint PIDs for data
6.
Library Services for Epi Data
• Training
– Data management specific to epi
– Metadata standards and uses
– De-identification – how and why
– Data citation using PIDs
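As a starting point for the "how" of de-identification training, the following Python sketch drops direct identifiers and replaces the participant ID with a keyed hash. The field names, the key, and the list of identifiers are assumptions for illustration only. This is not a complete or compliant de-identification procedure: real work also has to handle dates, geography, and other indirect identifiers, and should follow IRB guidance.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymize_id(participant_id, secret_key):
    """Replace an ID with a keyed hash (HMAC-SHA256), truncated for readability.

    The same ID and key always give the same pseudonym, so records can still
    be linked across files, but the pseudonym cannot be recomputed without the key.
    """
    message = str(participant_id).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]


def deidentify(record, secret_key):
    """Drop direct identifiers and pseudonymize the participant ID."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonymize_id(record["participant_id"], secret_key)
    return cleaned


# The key must be stored separately from the shared dataset.
key = b"example-key-stored-outside-the-dataset"
raw = {"participant_id": 101, "name": "Jane Doe", "phone": "555-0100",
       "age": 47, "sex": 2}
safe = deidentify(raw, key)
print(safe)
```

A keyed hash is used here rather than a plain hash because short numeric IDs are easy to guess and re-hash; whoever holds the key effectively holds the linkage file.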
7.
Resources
• Informed Consent: Lutz, K., et al. (2012). Research ethics board approval
for an international thromboprophylaxis trial. Journal of Critical Care.
• Workflows: Enanoria, W. (2004). Data Management Issues in
Epidemiology. Berkeley, CA: Center for Infectious Diseases & Emergency
Readiness. Retrieved from
www.idready.org/slides/data_management.ppt
• Workflows: Thomas, R. K. (Ed.). (2003). Chapter 12: Information Sources
and Data Management. Health Services Planning.
• Metadata: Brandt, C. A., Gadagkar, R., Rodriguez, C., & Nadkarni, P. M.
(2004). Managing complex change in clinical study metadata. Journal of
the American Medical Informatics Association: JAMIA.
• Disciplinary Metadata (DCC): http://www.dcc.ac.uk/resources/metadata-standards
Editor's Notes
Dr. John Snow is famous for his investigations into the causes of the 19th-century cholera epidemics and is also known as the father of (modern) epidemiology. He began by noticing the significantly higher death rates in two areas supplied by the Southwark Company. His identification of the Broad Street pump as the cause of the Soho epidemic is considered the classic example of epidemiology. He used chlorine in an attempt to clean the water and had the pump handle removed, thus ending the outbreak.
• Reuses data from other sources
• Daisy chain of related studies
• Often ePHI/sensitive information (therefore subject to HIPAA)
• Privacy and security are paramount! Conformance with laws and regulations is especially important
• Big data
• Complex and heterogeneous: associating public health studies with genomics research, demographic information with health information, etc.
• Requires quality data to reproduce studies and verify results
• Requires reuse of workflow modules to execute the same commands on different data
• Supervision of collections and data sharing by oversight committees, rather than individuals, is common
• Researcher incentives in the current system cause researchers to view data as proprietary, rather than a public good; lots of data hugging
• Data is often stored either in Excel spreadsheets or relational databases
• Data is often coded into numeric values, since epidemiologists often work with statistical analyses and most statistical routines require that non-numeric information be coded into numeric answers
Training is done in cooperation with the IRB, Research Administration, and the University IT group. Take a holistic approach: researchers don’t want to go to three different trainings if they can avoid it. Integrate concepts and issues where you can’t have a single workshop.