This document summarizes a workshop on health and social development analytics using big data. It discusses how data sources are becoming larger and more diverse and are being used for multiple purposes. This presents opportunities to better understand health and social issues, but also challenges around privacy, bias and data quality. The workshop aims to identify partnership opportunities and prototype projects that use integrated data to address health and social issues. Case studies from various institutions illustrate combined data sources such as medical records, surveys and environmental factors.
SDAL AIR Health and Social Development (Jan. 27, 2014)
1. Health & Social Development Analytics and Big Data – A Joint AIR and Virginia Tech Workshop
SALLIE KELLER, DIRECTOR
SOCIAL AND DECISION ANALYTICS LABORATORY
VIRGINIA BIOINFORMATICS INSTITUTE AT VIRGINIA TECH
Social and Decision Analytics Laboratory
2. Starting the Journey
“In attempting to arrive at the truth, I have applied everywhere for information,
but scarcely an instance have I been able to obtain hospital records fit for
any purpose of comparison. If they could be obtained, they would enable us to
decide many other questions besides the ones alluded to. They would show
subscribers how their money was spent, what amount of good was really
being done with it, or whether their money was not doing mischief rather than
good.”
Florence Nightingale (1864)
3. Outline
• Pressures & Opportunities of Today
• Big data
– Why important?
– What about privacy?
• Health & Social Development analytics
– What makes it big data?
– How does big data change current approaches?
• Selected examples
• Methodology challenges
4. Health and Social Development Pressures
Source: Congressional Budget Office.
• Health as a percent of GDP
– 5% in 1960 to 18% in 2012
• Changing demographics
– Increasing minority populations
– Rapidly aging populations
– Rural vs. urban living
– Increasing inequality
• Focus on the patient
– Health outcomes
5. Health Care Analytics Opportunities
• Drivers behind health care costs
– Technology, infectious and chronic diseases
• Workforce demand
– Caregivers, biomedical researchers, IT specialists
• Prevention and personalization
– Changing demographics and lifestyles
6. Social Development Analytics Opportunities
• Understanding and anticipating
– Changes in population growth, aging and diversity
– Adapting to increasing urbanization
– Building individual and community resiliency
• Tailoring programs and policies by defined subpopulations
7. Big Data - Doesn’t matter what it’s called, only what you do with it
• Big data
– Structured & unstructured
– Collections
• Designed
• Observational/convenience
• Statistics / analytics
– Replication, reproducibility, representativeness
– Description, association, causation
• prediction ≠ correlation
• Cost drivers
– Analytics and informatics, NOT data collection
8. Now Big Data is Changing Social Sciences
• Social science research
– Traditionally informed by surveys and statistically designed experiments
– Clean, well-controlled, limited in scale (~10^3)
• Bringing “Big data” to bear for social policy
– Data-informed computational social science models
– Quantitative social science methods & practice at scale
9. Methodological Issues
• New methods and tools are needed to ensure
– Data access
– Data quality
– Representativeness
– Replication
– Reproducibility
– Characterization of noisy data
• Managing biases
– Selection bias
– Measurement bias
National Research Council 2013
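Selection bias in observational or convenience collections is one of the issues listed above. One standard correction, not prescribed by the slides, is post-stratification: reweight respondents so the sample's mix of a known characteristic (here, age group) matches the population. This is a minimal sketch that assumes illustrative population shares and a made-up sample; the function names are hypothetical.

```python
from collections import Counter


def poststratification_weights(groups, population_shares):
    """Weight each respondent so the sample's group mix matches known population shares."""
    n = len(groups)
    sample_counts = Counter(groups)
    # Weight = population share of the group / sample share of the group.
    return [population_shares[g] / (sample_counts[g] / n) for g in groups]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical convenience sample: 8 respondents aged 18-64, 2 aged 65+.
groups = ["18-64"] * 8 + ["65+"] * 2
has_condition = [0, 0, 1, 0, 0, 0, 0, 1, 1, 1]
# Assumed population shares (illustrative, not real census figures).
population_shares = {"18-64": 0.75, "65+": 0.25}

weights = poststratification_weights(groups, population_shares)
print("unweighted:", sum(has_condition) / len(has_condition))
print("reweighted:", round(weighted_mean(has_condition, weights), 4))
```

Reweighting only corrects the part of the bias explained by the stratifying variable; if volunteers differ from the population in other, unmeasured ways, the estimate can still be biased.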
11. Personal Data - New Asset Class
• European Council 1995/1996:
– “… any information relating to an identified or identifiable natural person (data subject); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity.”
• World Economic Forum 2011:
– “… digital data created by and about people.”
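The definition above counts a person as identifiable indirectly, through combinations of factors. A common way to check this on a released table, not named in the slides, is k-anonymity: every combination of quasi-identifiers should be shared by at least k records. This sketch is illustrative only; the records, field names, and the k_anonymity helper are hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values())


# Hypothetical released records; none contains a name or ID number.
records = [
    {"zip3": "240", "birth_year": 1950, "sex": "F", "diagnosis": "MCI"},
    {"zip3": "240", "birth_year": 1950, "sex": "F", "diagnosis": "none"},
    {"zip3": "240", "birth_year": 1962, "sex": "M", "diagnosis": "asthma"},
    {"zip3": "241", "birth_year": 1962, "sex": "M", "diagnosis": "none"},
]

# With ZIP prefix included, two people are unique (k = 1) and could be re-identified.
print(k_anonymity(records, ["zip3", "birth_year", "sex"]))
# Dropping the ZIP prefix generalizes the table so every combination appears twice (k = 2).
print(k_anonymity(records, ["birth_year", "sex"]))
```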
12. World Economic Forum 2013
Yesterday
• Definition of personal data is predetermined and binary
• Individual provides legal consent but not truly engaged
• Policy framework focuses on minimizing risk to individual
Today
• Definition of personal data is contextual and dependent on social norms
• Individual engaged and understands how data is used and value created
• Policy needs to focus on balancing protection with innovation and economic growth
13. Further Privacy Thoughts
• Will people voluntarily give up their data if they can see a
personal or societal benefit?
• Are norms/expectations changing with generations?
• What are technical fixes for multi-level privacy/classification?
• What is the optimal level of privacy for studies of interest?
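One family of technical fixes, and a way to make "level of privacy" an explicit, tunable parameter, is differential privacy, which the slides do not mention. This sketch applies the standard Laplace mechanism to a single count query (sensitivity 1): smaller epsilon means stronger privacy and noisier answers. It assumes a made-up true count of 500; private_count is a hypothetical helper, not a library API.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count under epsilon-differential privacy (a count has sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)
for eps in (0.1, 1.0, 10.0):
    releases = [private_count(500, eps, rng) for _ in range(1000)]
    mean_abs_error = sum(abs(r - 500) for r in releases) / len(releases)
    print(eps, round(mean_abs_error, 2))
```

For the Laplace mechanism the expected absolute error is 1/epsilon, so the printed errors should sit near 10, 1, and 0.1, making the privacy/utility trade-off concrete.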
14. Can we table privacy for the duration of the workshop?
• Deserves serious, devoted conversation
• We should be leaders in this conversation
• Will need to specifically address as projects develop
15. Changing Landscape of Health Data
• Electronic Health Records
• Interoperability challenges
• Public choices
– 23andMe
– Google Health
– HealthVault
P. Bruegel, The Tower of Babel (1563)
16. Personal Health Data
• Today
– Medical history
– Lab results
– Imaging results (X-ray, MRI)
– Medication records
– Allergies
– Vaccination records
– Demographic data
– Billing information
• Tomorrow
– Genome sequence
– Epigenome
– Transcriptome
– Proteome
– Metabolome
– Immunome
– Microbiome
– Survey data
– Health monitor data
17. Omics
"Omics" datasets are large,
require sophisticated
interpretation, and will have to
be reinterpreted over time as
knowledge and standard of care
change
• Tomorrow
– Genome sequence
– Epigenome
– Transcriptome
– Proteome
– Metabolome
– Immunome
– Microbiome
– Survey data
– Health monitor data
18. Self Reported Data
These self-reported data will
vary widely in quality and utility for
research, but will be an important
source of phenotype information
• Tomorrow
– Genome sequence
– Epigenome
– Transcriptome
– Proteome
– Metabolome
– Immunome
– Microbiome
– Survey data
– Health monitor data
19. Tomorrow is Today
• Infrastructure is being created to enable large longitudinal
studies that combine:
– Comprehensive electronic health records
– Behavioral and environmental factors (survey information)
– Genetic information (partial or complete genome sequence)
NIH - Electronic Medical Records and Genomics Network
Wellcome Trust - UK Biobank
Vanderbilt University - BioVU
Kaiser Permanente – Research Program on Genes, Environment, and Health
Veterans Administration - Million Veteran Program
20. Tomorrow is Today
• Began collecting DNA in 2007; now has 167,250 samples
• Opt-out program; relatively few patients opt out
• Samples are matched with deidentified EHRs
• Use is restricted to Vanderbilt researchers
NIH - Electronic Medical Records and Genomics Network
Wellcome Trust - UK Biobank
Vanderbilt University - BioVU
Kaiser Permanente – Research Program on Genes, Environment, and Health
Veterans Administration - Million Veteran Program
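BioVU matches samples with de-identified EHRs. The slides do not say how. One common pattern, shown here only as an assumption and not as BioVU's documented procedure, is for a trusted broker to replace medical record numbers (MRNs) in both sources with a keyed hash (HMAC) so researchers can join the two without ever seeing an MRN. The identifiers, key, and pseudonym helper are hypothetical.

```python
import hashlib
import hmac


def pseudonym(medical_record_number: str, secret_key: bytes) -> str:
    """Derive a stable study ID from an MRN; without the key, the ID cannot be recomputed."""
    digest = hmac.new(secret_key, medical_record_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Hypothetical key, held only by the broker and never shipped to researchers.
key = b"held-by-a-trusted-broker-only"
dna_samples = {"MRN-0001": "sample-A17", "MRN-0002": "sample-B03"}
ehr_extracts = {"MRN-0001": {"dx": "T2D"}, "MRN-0002": {"dx": "none"}}

# The broker replaces MRNs with pseudonyms in both sources before release.
released_samples = {pseudonym(m, key): s for m, s in dna_samples.items()}
released_ehr = {pseudonym(m, key): r for m, r in ehr_extracts.items()}

# Researchers join on the pseudonym without ever seeing an MRN.
linked = {pid: (released_samples[pid], released_ehr[pid]) for pid in released_samples}
print(linked)
```

The key matters: MRNs come from a small, guessable space, so an unkeyed SHA-256 of each MRN could be reversed simply by hashing every possible MRN.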
21. Additional Characteristics that Make the Data Big
• Multi-sourced
• Observational
• Noisy
• Multi-purposed
22. Multi-Sourced Data
Health and social development occur within a context
• Individual and family history and experiences
• Environment
• Access to care, programs, and facilities
• Local, state, and national health and welfare systems
• Political and economic factors
Information and communication technology opens the opportunity to capture metadata and the provenance of the information
Challenge: integration and interpretation of data captured
under such varied circumstances
23. Observational Data
• Can come from every stakeholder, source, or technology that interacts with the patient, caregiver, or facility
• Little discrimination on what is captured
– Internet medical surveys, on-line disease tracking, prevention
activities, attitudes on blogs, etc.
• On-demand data from multiple systems
– Social networks, education records, work history, medical
records, extramural activities, etc.
Presents an opportunity to study health and development processes as they naturally occur
Challenge: manage biases, data quality, and data linkage
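Data linkage across systems (medical records, surveys, education records) is listed as a challenge above. The simplest approach is deterministic linkage on normalized identifiers. This sketch assumes two hypothetical sources keyed by name and date of birth; the field names and helpers are illustrative, not from the slides.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lower-case and strip accents, punctuation and spaces, so "O'Neil, José" becomes "oneiljose"."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def link(left, right):
    """Deterministic linkage: pair records whose normalized name and birth date agree exactly."""
    # If two right-hand records share a key, the later one overwrites the earlier one.
    index = {(normalize_name(r["name"]), r["dob"]): r for r in right}
    matches, unmatched = [], []
    for rec in left:
        key = (normalize_name(rec["name"]), rec["dob"])
        if key in index:
            matches.append((rec, index[key]))
        else:
            unmatched.append(rec)
    return matches, unmatched


medical = [
    {"name": "O'Neil, José", "dob": "1948-03-02", "dx": "COPD"},
    {"name": "Smith, Ann", "dob": "1951-07-19", "dx": "none"},
]
survey = [
    {"name": "oneil jose", "dob": "1948-03-02", "smoker": True},
    {"name": "Smith, Anne", "dob": "1951-07-19", "smoker": False},
]
matches, unmatched = link(medical, survey)
print(len(matches), [r["name"] for r in unmatched])
```

The second person does not link because "Ann" and "Anne" differ. Catching such near-matches is the job of probabilistic (Fellegi-Sunter style) linkage, at the cost of occasional false matches.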
24. Noisy Data
Meanwhile, if the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn’t. Most of it is just noise, and the noise is increasing faster than the signal.
Nate Silver, 2013
Challenge: uncertainty quantification
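For the uncertainty-quantification challenge, one simple, assumption-light starting point (not prescribed by the slides) is the percentile bootstrap: resample the noisy observations with replacement and read an interval off the spread of the recomputed statistic. The readings below are simulated (a true level of 5.0 plus Gaussian noise); bootstrap_ci is a hypothetical helper.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of noisy data."""
    rng = random.Random(seed)
    # Recompute the statistic on many same-size resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_resamples))
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Simulated noisy readings around a true level of 5.0.
rng = random.Random(1)
readings = [5.0 + rng.gauss(0, 2) for _ in range(200)]
lo, hi = bootstrap_ci(readings)
print(f"mean={statistics.mean(readings):.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```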
25. Multi-Purposed Data
• Individual health and well-being versus the population
• Data reuse for multiple purposes
– Macro-level: regional, state, national, and international
– Meso-level: institution-wide
– Micro-level: individuals, cohorts, and groups
An opportunity to more fully use data
Challenge: What is optimal for an individual may not be
optimal for the population and vice versa
Source: Buckingham Shum, S. (2012)
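Reusing the same records at the micro, meso, and macro levels can be as simple as re-aggregating individual-level data by a different key. This sketch uses made-up patient records and a hypothetical readmission field to compute institution-wide and state-level rates from one micro-level table.

```python
from collections import defaultdict

# Hypothetical micro-level records: one row per patient.
patients = [
    {"id": 1, "hospital": "H1", "state": "VA", "readmitted": 1},
    {"id": 2, "hospital": "H1", "state": "VA", "readmitted": 0},
    {"id": 3, "hospital": "H2", "state": "VA", "readmitted": 0},
    {"id": 4, "hospital": "H3", "state": "MD", "readmitted": 1},
]


def rate_by(records, level):
    """Re-use the same micro records to compute a readmission rate at another level."""
    totals, events = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[level]] += 1
        events[r[level]] += r["readmitted"]
    return {k: events[k] / totals[k] for k in totals}


print(rate_by(patients, "hospital"))  # meso level: institution-wide
print(rate_by(patients, "state"))     # macro level: state
```

The state-level rate for VA hides the difference between H1 and H2, one concrete way in which what looks optimal at one level may not be at another.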
26. Case Studies from VT Colleagues and Collaborators
• Bureau of Economic Analysis Health Accounts
• Out-of-Hospital Cardiac Arrest
• EMBERS
• Mild Cognitive Impairment
• Synthetic Information
27. Household Consumption Expenditures for Medical Care:
An Alternate Presentation
Ana Aizcorbe, Eli B. Liebman, David M. Cutler, and Allison B. Rosen
• Health care predicted to reach 20% of GDP by 2020
• Health care expenditures increased ~29% (2002-2006)
• Developing a satellite account on medical care spending
• Data include public and private sources
Survey of Current Business, June 2012: 34-47
http://www.bea.gov/scb/pdf/2012/06%20June/0612_healthcare.pdf
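To put the ~29% increase on an annual footing, assume it compares 2006 spending with 2002 spending, i.e. four years of compounding. The implied average annual growth rate g is then:

```latex
(1 + g)^{4} = 1.29
\quad\Longrightarrow\quad
g = 1.29^{1/4} - 1 = e^{\ln(1.29)/4} - 1 \approx e^{0.0637} - 1 \approx 0.066
```

So roughly 6.6% per year; a simple average (29% / 4 = 7.25% per year) slightly overstates the compounded rate.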
32. Open Source Indicators for Forecasting ILI Case Counts and Rare Disease Outbreaks
Naren Ramakrishnan (PI) – involves large multi-institutional team
• EMBERS: Early Model-based Event Recognition using
Surrogates
• Fully automated processing of data and delivery of warnings
Source: https://www.cs.vt.edu/node/6565
33. EMBERS Prediction Pipeline
[Figure: data sources feeding the pipeline: Google Flu Trends, Google Search Trends, HealthMap, Weather, Twitter, OpenTable, Parking Lot Imagery]
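EMBERS fuses many surrogate sources into automated warnings. Its actual models are not described on these slides; this sketch shows only a generic idea under stated assumptions: score each source by how unusual this week is relative to its own history (a z-score), take a weighted average, and warn above a threshold. Source names, counts, weights, and the 2.0 threshold are all hypothetical.

```python
import statistics


def zscore(history, current):
    """How unusual the current value is relative to the source's own recent history."""
    mu = statistics.mean(history)
    sd = statistics.stdev(history)
    return (current - mu) / sd if sd > 0 else 0.0


def fused_alert(surrogates, weights, threshold=2.0):
    """Weighted average of per-source z-scores; warn when it exceeds the threshold."""
    score = sum(weights[name] * zscore(hist, cur) for name, (hist, cur) in surrogates.items())
    score /= sum(weights[name] for name in surrogates)
    return score, score > threshold


# Hypothetical weekly counts: (recent history, this week) for each surrogate source.
surrogates = {
    "flu_searches": ([100, 110, 95, 105, 102], 160),
    "flu_tweets": ([40, 38, 45, 42, 41], 55),
    "restaurant_cancellations": ([20, 22, 19, 21, 20], 21),
}
weights = {"flu_searches": 0.5, "flu_tweets": 0.3, "restaurant_cancellations": 0.2}

score, warn = fused_alert(surrogates, weights)
print(round(score, 2), warn)
```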
35. Family Triad Perceptions of Mild Cognitive Impairment (MCI)
Karen A. Roberto, Rosemary Blieszner and Tina Savla
• Age-related decline in memory and executive functioning
• 10-20% of individuals aged 65+ have MCI
• Data Sources
– Memory clinics, churches, senior housing
– Family-level data: Elder with MCI age 60+, primary care partner, secondary care partner
Journal of Gerontology: Social Sciences
2011(6): 756-768
36. Brain Functioning
[Figure: brain diagram labeling functions: reasoning, planning, speech, movement, emotions, problem-solving; vision; perception of touch, pressure, temperature, pain; perception and recognition of auditory stimuli; memory. Executive Function is highlighted.]
37. Benefits of Multiple Informants
[Figure: families grouped by acknowledgement of MCI: complete, partial, passive, and no acknowledgement]
38. Synthetic Information – Disease (Pandemic) Evolution
Stephen Eubank, Bryan Lewis, and many others
Source: Roberto, Blieszner, McCann, & McPherson 2011
http://supercomputing.vbi.vt.edu/
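The synthetic information work models disease evolution with detailed simulations over synthetic populations. As a much simpler stand-in, and not the method used there, the classic SIR compartmental model below shows how an outbreak's course follows from a transmission rate beta and a recovery rate gamma. Parameters are illustrative and not calibrated to any disease; simulate_sir is a hypothetical helper.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model: returns daily (S, I, R) counts."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population  # well-mixed contact assumption
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative parameters (R0 = beta / gamma = 2.5), not calibrated to any real outbreak.
hist = simulate_sir(population=10_000, initial_infected=10, beta=0.5, gamma=0.2, days=120)
peak_day = max(range(len(hist)), key=lambda d: hist[d][1])
print(peak_day, round(hist[peak_day][1]))
```

In general, agent-based synthetic-population models replace this single well-mixed population with individual contact networks, which is what lets interventions be targeted and evaluated for specific subpopulations.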
43. Goals for the Workshop
• Imagine a different world; the case studies are examples
• Look for synergistic capabilities to build partnerships
• Assess opportunities to integrate multiple sources of data and approaches to comprehensively understand health and social development issues
• Propose prototype projects to work on together to set the stage for future projects