Chicago Health Atlas: The Promise, Process, and Problems of using electronic health record data for population health
1. Chicago Health Atlas:
The Promise, Process, and Problems of
using electronic health record data for
population health
April 4, 2013
Abel Kho, MD Roderick (Eric) Jones, MPH
Northwestern University Chicago Dept. Public Health
2. Session Preview
• What is the Chicago Health Atlas?
• The Promise:
Contextual factors that play a role in the
collaboration
• The Process:
Getting started, developing matching algorithms,
minimizing reidentification risk
• The Problems:
Deriving meaning and delivering it to people who
can use it
3. Chicago Health Atlas is a . . .
collaboration
• Informatics researchers from multiple
healthcare institutions
• Chicago Regional Extension Center
(CHITREC)
• Chicago Community Trust
• Chicago Department of Public Health
5. Chicago Health Atlas is a . . .
database
• De-identified electronic health record
data for ~1 million Chicagoans
• In-patient and out-patient visits spanning
2006-2011
• Individual patient records matched
across institutions
7. Chicago: Person, Place, Time
Group | Percent change, 2000-2010 | Percent of total in 2010
Chicago | 7 | [2.7 million]
Non-Hispanic black | 17 | 32
Non-Hispanic white | 6 | 32
Hispanic | 3 | 29
Non-Hispanic Asian | 14 | 5
8. Chicago: Person, Place, Time
229 square miles
77 neighborhood “community areas” with population median of 31,000 (range, 3,000 – 99,000)
[Map of Chicago community areas; labels: Lake Michigan, O’Hare, Loop, Midway, Suburban Cook County]
Stem-and-leaf plot of community area population:
Stem Leaf        #  Boxplot
   9 9           1     0
   9 4           1     |
   8                   |
   8 02          2     |
   7 99          2     |
   7 23          2     |
   6                   |
   6 44          2     |
   5 556667      6     |
   5 223         3     |
   4 559         3  +-----+
   4 0124        4  |     |
   3 5666799     7  |  +  |
   3 01112233    8  *-----*
   2 55669       5  |     |
   2 01123334    8  |     |
   1 568888899   9  +-----+
   1 01233334    8     |
   0 6679        4     |
   0 33          2     |
     ----+----+----+----+
Multiply Stem.Leaf by 10**+4
All but two community areas have larger populations than the least-populated Illinois county
10. Healthy Chicago sets goals for. . .
• Public policy and legislation (n=56)
• Health education and awareness (n=45)
• Interventions and programs (n=92)
12. Highlights
Infrastructure
• Establish an Office of Epidemiology and
Public Health Informatics
• Expand epidemiology capacity through an
increase in staff and the development of
strategic partnerships with other entities who
use or collect public health data
13. NYC Macroscope
Scientific Advisory Group
• New York City has embarked on a study to
validate population health estimates from its
Primary Care Information Project
• CDPH involvement has led to collaboration
on developing vision and methodology for
more widespread use of EHR data for public
health
14. Highlights
Infrastructure
• Increase the availability of public health data
through the City of Chicago website
17. Even if we don’t have a mature
HIE or a Regenstrief Institute,
is it possible to . . .
• Leverage existing EHR data
• Weave together data from multiple
institutions with publicly available data
• Measure disease burden and care delivered?
19. Design Considerations
• Limit sharing of any protected health
information
• Yet account for care of the same patient
at multiple institutions
• Protect anonymity of
patients/providers/institutions
• Enable linkage to new information and
sources as they become available
– Patient level
– Geographic location
20. Process – getting started
• Coordinated IRB approval across multiple
institutions.
– Constrained to adults aged 18-89
– Limited to structured data, no free text
– Focus on 606xx zip codes, with known
overlapping care institutions and high
population density
• Instead of an EMPI, create a lightweight
software application to pass identifiers through
a standard set of preprocessing steps, and then
“hash” the data
22. How we “Hashed” our Data
-Hash algorithms accept variable-size input messages and produce a small
fixed-size output called a hash value or message digest
-The hash is collision-resistant: distinct input messages can in principle share a
hash value, but finding two that do is computationally infeasible, so in practice
each hash value stands for 1 input message
-The hash is 1-way: easy to go from message to hash value, very hard to go
from hash value to message
-We initially used an early hash, Secure Hash Algorithm-1 (SHA-1).
http://csrc.nist.gov/publications/nistbul/b-May-2008.pdf
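The fixed-size and input-sensitivity properties listed above can be checked with Python's standard `hashlib` module. This is only a small illustration; the helper name `sha1_hex` is made up for this sketch and is not part of the Atlas software.

```python
import hashlib


def sha1_hex(message: str) -> str:
    """Return the SHA-1 digest of a text message as 40 hex characters."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()


# Variable-size input, fixed-size output: both digests are 160 bits (40 hex chars).
short_digest = sha1_hex("a")
long_digest = sha1_hex("a" * 10_000)
print(len(short_digest), len(long_digest))

# A one-character change in the message gives an unrelated digest, which is why
# identifiers must be normalized before hashing.
print(sha1_hex("John") == sha1_hex("john"))
```

Because the digest changes completely with any difference in the input, two institutions only produce the same hash for the same patient if they format the identifiers identically.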
23. Preliminary SHA-1 Single
Institution Validation
5-Variable Hash
Inputs: William | Galanter | 3/31/1962 | M | SSN
Concatenate: WilliamGalanter22732M123456789
SHA1: 20802322ED366A1EFD562A6219C4D7AF993BADAD
4-Variable Hash
Inputs: William | Galanter | 3/31/1962 | M
Concatenate & SHA1: 12345678901234567890123456789012345
24. Updated Hash Method
• SHA-1 was found to have a potential security issue, moved to a
second generation Hash, SHA-512* (512 bit)
• Significant focus on data pre-processing / normalization
• Trimming spaces and non A-Z characters, lower case
_Jimmy__ O’Brien Jr. → jimmy, obrien
• Remove “-” from SSN and remove all invalid combinations
• Only allow Birth year >1921
• Use “F” and “M” for sex
• Replace missing elements with missing data indicators
*http://csrc.nist.gov/groups/ST/toolkit/secure_hashing.html
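The pre-processing rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's Java program: the missing-data indicator `#`, the suffix list, the `|` separator, the MMDDYYYY date layout, and the specific SSN validity checks (all-zero area, group, or serial) are all choices made for this sketch, since the slides do not spell them out.

```python
import hashlib
import re
from datetime import date

MISSING = "#"  # assumed missing-data indicator; the slides do not name one
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}  # assumed list of suffixes to drop


def normalize_name(raw):
    """Lower-case, drop non a-z characters, and drop trailing suffixes."""
    tokens = [re.sub(r"[^a-z]", "", t) for t in (raw or "").lower().split()]
    tokens = [t for t in tokens if t]
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return "".join(tokens) or MISSING


def normalize_ssn(raw):
    """Strip '-' and other non-digits; treat malformed values as missing."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) != 9:
        return MISSING
    # Assumed validity rule: all-zero area, group, or serial is invalid.
    if digits[:3] == "000" or digits[3:5] == "00" or digits[5:] == "0000":
        return MISSING
    return digits


def normalize_dob(dob):
    """Only allow birth years after 1921; format as MMDDYYYY."""
    if dob is None or dob.year <= 1921:
        return MISSING
    return dob.strftime("%m%d%Y")


def normalize_sex(raw):
    """Map values such as 'male' or 'f' to 'M' or 'F'."""
    first = (raw or "").strip().upper()[:1]
    return first if first in ("F", "M") else MISSING


def hash_id(first, last, dob, sex, ssn):
    """Concatenate the normalized fields and return the SHA-512 hex digest."""
    message = "|".join([
        normalize_name(first), normalize_name(last),
        normalize_dob(dob), normalize_sex(sex), normalize_ssn(ssn),
    ])
    return hashlib.sha512(message.encode("utf-8")).hexdigest()
```

With these rules, `hash_id("John", "O'Dwyer", date(1970, 6, 12), "M", "987-65-4329")` and `hash_id(" john ", "O Dwyer", date(1970, 6, 12), "male", "987654329")` yield the same 128-character digest, which is the point of normalizing before hashing.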
25. Updated Hash Method (cont.)
• Creates 5 hash IDs (with probability weights) depending on availability of
last name, first name, date of birth (DOB), gender, SSN.
– All data available (1.0)
– All fields except one: no DOB, or no first and last name, or no SSN (0.3)
– All fields, but only first three letters of names available (0.1)
– SOUNDEX codes (phonetic equivalents) of the first and last name plus
date of birth and gender (0.1)
• Wrapped up into a standalone Java program
• Can readily consume other data sources (e.g. Social Security Death Index
Tables)
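One way to picture the weighted hash IDs is the sketch below. It assumes the identifiers have already been normalized, and it enumerates each variant listed on the slide as its own tier, which gives six entries rather than five; the slide does not say how the variants were grouped, so the tier names, the `|` separator, and the tier label mixed into each hash are assumptions of this sketch. The `soundex` function follows the standard American Soundex rules.

```python
import hashlib

_CODES = {
    ch: digit
    for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                           "4": "l", "5": "mn", "6": "r"}.items()
    for ch in letters
}


def soundex(name):
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(c, "")
        if code and code != prev:
            out += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (out + "000")[:4]


def _digest(tier, *parts):
    # The tier label is mixed in so hashes from different tiers never collide.
    return hashlib.sha512("|".join((tier,) + parts).encode("utf-8")).hexdigest()


def hash_ids(rec):
    """Return {tier: (weight, hash)} for a normalized record (hypothetical keys)."""
    f, l, d, s, n = rec["first"], rec["last"], rec["dob"], rec["sex"], rec["ssn"]
    return {
        "full": (1.0, _digest("full", f, l, d, s, n)),
        "no_dob": (0.3, _digest("no_dob", f, l, s, n)),
        "no_name": (0.3, _digest("no_name", d, s, n)),
        "no_ssn": (0.3, _digest("no_ssn", f, l, d, s)),
        "first3": (0.1, _digest("first3", f[:3], l[:3], d, s, n)),
        "soundex": (0.1, _digest("soundex", soundex(f), soundex(l), d, s)),
    }
```

Two records that spell a name differently, such as "john odwyer" and "jon odwier" with the same date of birth and sex, disagree on the full hash but agree on the SOUNDEX tier, so they can still be linked at the lower weight.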
26. [Diagram: hashing at each institution, matching at the honest broker]
Institution A, patient with diabetes (250.xx):
John | O’Dwyer | 6/12/1970 | 987-65-4329 | M
→ Pre-process → john | odwyer | 06121970 | 987654329 | m
→ Hash Fxn → Hash ID-1, Hash ID-2, Hash ID-3, Hash ID-4, Hash ID-5
Institution B, patient with HTN (401.xx):
John | O dwyer | 6/12/70 | male
→ Pre-process → john | odwyer | 06121970 | m
→ Hash Fxn → Hash ID-1, Hash ID-2, Hash ID-3, Hash ID-4, Hash ID-5
Institution C / Honest Broker:
Replace matched Hash IDs with unique Study ID
Study ID | 250.xx | 401.xx
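The honest-broker step in the diagram can be sketched as a lookup from hash IDs to study IDs. This is a simplified illustration: it treats any shared hash ID as a match, ignores the probability weights, and assigns IDs greedily in submission order. The function name, the `S000001` ID format, and the tuple layout are hypothetical.

```python
from itertools import count


def assign_study_ids(submissions):
    """Map (institution, local_id) to a study ID shared by matching patients.

    submissions: iterable of (institution, local_id, list_of_hash_ids).
    The broker only ever sees hash IDs, never names, dates, or SSNs.
    """
    seen = {}          # hash ID -> study ID already assigned
    counter = count(1)
    result = {}
    for institution, local_id, hashes in submissions:
        # Reuse the study ID of the first hash ID that has been seen before.
        study_id = next((seen[h] for h in hashes if h in seen), None)
        if study_id is None:
            study_id = f"S{next(counter):06d}"
        for h in hashes:
            seen.setdefault(h, study_id)
        result[(institution, local_id)] = study_id
    return result


ids = assign_study_ids([
    ("A", "a-17", ["h-full-1", "h-nossn-1"]),   # diabetes (250.xx) record
    ("B", "b-42", ["h-full-x", "h-nossn-1"]),   # HTN (401.xx) record, no SSN
    ("B", "b-43", ["h-full-2", "h-nossn-2"]),   # unrelated patient
])
print(ids)
```

Here the records from Institutions A and B share one hash ID and therefore one study ID, so the 250.xx and 401.xx diagnoses end up attached to the same de-identified patient.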
27. Data Dictionary
• Standardized data specifications for data
extractions from participating sites
– Demographics
– Vital signs
– Diagnoses
• Study ID | Month/Year | Encounter type | Encounter
number | Diagnosis code
– Medications
– Laboratory tests
• Study ID | Month/Year | Lab test name | Result |
Units | Normal Range | Specimen type
29. De-Identified Health Information
De-identified health information neither identifies nor provides
a reasonable basis to identify an individual. There are two
ways to de-identify information; either:
(1) a formal determination by a qualified statistician;
(2) the removal of specified identifiers of the individual and of
the individual’s relatives, household members, and
employers is required, and is adequate only if the covered
entity has no actual knowledge that the remaining
information could be used to identify the individual.
30. HIPAA Expert Determination
(abridged)
Certify via “generally accepted
statistical and scientific principles &
methods, that the risk is very small
that the information could be used,
alone or in combination with other
reasonably available information, by
the anticipated recipient to identify the
subject of the information.”
33. Uniqueness Analysis
Model Uniques (%) Uniques (People)
Safe Harbor 0.000064% 13
Chicago Health Atlas 0.3% 8,050
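A uniqueness count of this kind can be sketched as follows: a record is "unique" when no other record shares its combination of quasi-identifiers. This is a generic illustration, not the analysis behind the figures in the table; the field names and sample records are invented.

```python
from collections import Counter


def uniqueness(records, quasi_identifiers):
    """Return (number of unique records, percent unique)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    n_unique = sum(1 for k in keys if counts[k] == 1)
    return n_unique, 100.0 * n_unique / len(keys)


# Invented sample: two people share a combination, two do not.
sample = [
    {"zip": "60601", "birth_year": 1970, "sex": "M", "race": "white"},
    {"zip": "60601", "birth_year": 1970, "sex": "M", "race": "white"},
    {"zip": "60601", "birth_year": 1971, "sex": "F", "race": "black"},
    {"zip": "60637", "birth_year": 1985, "sex": "F", "race": "asian"},
]
print(uniqueness(sample, ["zip", "birth_year", "sex", "race"]))
```

The more detailed the released fields, the more combinations occur only once, which is why a model that keeps finer detail than Safe Harbor shows more uniques.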
35. Completing the Re-identification
Requires Resources
[Diagram: Safe Harbored records or Chicago Health Atlas model records, combined with identified population records (identified resource), yield identified clinical records]
• Could link to registries
– Birth – Marriage
– Death – Divorce
• What’s in vogue?
Voter registration DBs
Benitez & Malin. JAMIA. 2010.
36. Risk will Vary Across Regions
Voter Registration Databases
State | IL | MN | TN | WA | WI
Who | Registered political committees (anyone, in person) | MN voters | Anyone | Anyone | Anyone
Format | Disk | Disk | Disk | Disk | Disk
Cost | $500 | $46; “use ONLY for elections, political activities, or law enforcement” | $2500 | $30 | $12,500
Fields compared: Name | Address | Date of Birth | Sex | Race
Benitez & Malin. JAMIA. 2010.
38. Uniqueness Analysis
Model | Uniques (%) | Uniques (People)
Safe Harbor | 0.000064% | 13
Chicago Health Atlas | 0.3% | 8,050
Safe Harbor, linked to voter registration | Really small | 0
Chicago Health Atlas, linked to voter registration | 0.004% | 80
40. Next Steps
• Consider re-identification risk options
– Coarsen ZIP codes
– Coarsen Ethnicities
– Coarsen Age groups
• Search* for tradeoffs between information
utility (e.g., epidemiologic findings) and
privacy (i.e., re-identification risk)
*Benitez & Malin. JAMIA. 2011.
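The coarsening options above trade detail for privacy, and the effect can be illustrated with a small sketch. The 3-digit ZIP prefix, the 10-year age bands, and the sample records are assumptions chosen for illustration, not the project's settings.

```python
from collections import Counter


def coarsen(record, zip_digits=3, age_band=10):
    """Return a copy with ZIP truncated and age replaced by a band label."""
    low = (record["age"] // age_band) * age_band
    return {**record,
            "zip": record["zip"][:zip_digits],
            "age": f"{low}-{low + age_band - 1}"}


def count_uniques(records, fields=("zip", "age", "sex")):
    """Count records whose combination of fields appears exactly once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1)


people = [
    {"zip": "60601", "age": 34, "sex": "F"},
    {"zip": "60605", "age": 36, "sex": "F"},
    {"zip": "60601", "age": 71, "sex": "M"},
]
before = count_uniques(people)
after = count_uniques([coarsen(p) for p in people])
print(before, after)
```

In this toy sample, coarsening removes two of the three uniques, but it also discards the ZIP-level detail an epidemiologist might want, which is the utility and privacy tradeoff the slide describes.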
43. Data contribution summary,
April 2013
Data Type | Institution 1 | 2 | 3 | 4 | 5 | 6
Demographics | C | C | C | C | C | PC
Diagnoses | C | C | C | C | C | PC
Visit type | C | C | C | C | C | PC
BMI, BP | C | PP | N | N | N | PC
Glucose, HbA1c | C | C | C | N | N | PC
Medications | C | C | C | N | N | PC
C: complete; N: not yet incorporated;
PP: partial time period; PC: partial cohort
44. How many patients receive care
at more than one institution?
No. of institutions | Number | % | Cumulative %
4 or 5 | 393 | 0.0 | 0.0
3 | 8,409 | 0.9 | 0.9
2 | 74,372 | 7.6 | 8.5
1 | 892,468 | 91.4 | 100.0
Includes the 5 institutions with all patient visits 2006-2010 submitted (as of April 2013).
45. Sample size/cohort comparison,
by residential ZIP code,
BRFSS* vs. Chicago Health Atlas
Source | Min | Median | Mean | Max
IL BRFSS, Chicago 2011 respondents | 4 | 15 | 16 | 33
Chicago Health Atlas, patients with 2010 visit | 1,339 | 10,031 | 9,270 | 21,289
*CDC Behavioral Risk Factor Surveillance System survey, Chicago
sub-sample from Illinois dataset.
46. Diabetes prevalence estimate
by residential ZIP
Percent =
(# of patients with > 1 diabetes mellitus diagnosis code)
/ (# of patients with visit in 2006-2010)
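The ratio above can be computed per ZIP code with a short sketch. The field names `zip` and `dm_codes` are hypothetical, every input row is assumed to be a patient with at least one visit in 2006-2010, and the diagnosis-code threshold is left as a parameter rather than fixed, since the sketch should not decide how many codes qualify a patient.

```python
from collections import defaultdict


def prevalence_by_zip(patients, min_codes):
    """Percent of patients per ZIP with at least `min_codes` diagnosis codes."""
    numerator = defaultdict(int)
    denominator = defaultdict(int)
    for p in patients:
        denominator[p["zip"]] += 1          # every patient with a visit counts
        if p["dm_codes"] >= min_codes:
            numerator[p["zip"]] += 1        # patient meets the case definition
    return {z: 100.0 * numerator[z] / denominator[z] for z in denominator}


# Invented sample data for two ZIP codes.
patients = [
    {"zip": "60601", "dm_codes": 0},
    {"zip": "60601", "dm_codes": 3},
    {"zip": "60601", "dm_codes": 0},
    {"zip": "60601", "dm_codes": 0},
    {"zip": "60637", "dm_codes": 2},
    {"zip": "60637", "dm_codes": 0},
]
print(prevalence_by_zip(patients, min_codes=2))
```

Note that the denominator is patients seen at participating institutions, not residents of the ZIP code, so the result describes the patient population rather than the community.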
47. Finding type 2 diabetes
in the health record
• Diagnosis codes
• Labs
• Medications
• Number of visits
Outcome: Yes, patient has type 2 diabetes / No, patient does not have type 2 diabetes
48. Diabetes prevalence estimate
by residential ZIP
Percent =
(# of patients with > 1 diabetes mellitus diagnosis code or lab criteria met)
/ (# of patients with visit in 2006-2010)
49. Percent of Atlas patients with
diabetes diagnosis in 2006-2010
[Chart: percent of patients with a diabetes diagnosis (y-axis) by minimum number of visits recorded (x-axis)]
Illinois BRFSS estimates the prevalence of diabetes in Chicago at 9-11%.
50. Hypertension prevalence estimate
by residential ZIP
Percent =
(# of patients with > 1 hypertension diagnosis code)
/ (# of patients with visit in 2006-2010)
51. Coronary heart disease prevalence
estimate
by residential ZIP
Percent =
(# of patients with > 1 CHD diagnosis code)
/ (# of patients with visit in 2006-2010)
52. Gunshot wound prevalence
estimate
by residential ZIP
Percent =
(# of patients with > 1 gunshot wound diagnosis code)
/ (# of patients with visit in 2006-2010)
53. Problem
Applying estimates to Chicago
– rather than patient – populations
55. Race-ethnicity comparison
Group | Atlas (percent of total) | 2010 Census (percent of total)
Non-Hispanic black | 31 | 32
Non-Hispanic white | 20 | 32
Hispanic | 14 | 29
Non-Hispanic Asian | 4 | 5
Not given/Unknown | 31 | 0
59. Imputation of ZIP code rates to
community area
Diabetes hospitalization, 2010
[Maps: Rates by ZIP | Imputed using age & sex | Imputed using age, sex, & race-ethnicity]
61. Maps courtesy of Chieko Maene, University of Chicago, as part of CDPH-UC Diabetes Translational Research Collaboration.
63. Dasymetric areal interpolation
1. Calculate for each ZIP code
Male & female x 19 age groups = 28 rates
or
Male & female x 19 age groups x 4 race-ethnicity
groups = 84 rates
2. Apply rates to corresponding population group
in each census block to get counts
3. Sum counts to Community area
4. Calculate rates based on community area
population denominators
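The four steps above can be sketched as a single pass over census blocks. The data layout is hypothetical: `zip_rates` maps a (ZIP, stratum) pair to a per-person rate, and each block carries its ZIP code, its community area, and its population by stratum, where a stratum is whatever sex, age, and race-ethnicity grouping was used in step 1.

```python
from collections import defaultdict


def community_area_rates(zip_rates, blocks):
    """Apply ZIP-level stratum rates to census blocks and sum to community areas."""
    cases = defaultdict(float)
    population = defaultdict(float)
    for block in blocks:
        for stratum, n in block["pop"].items():
            # Step 2: expected count = ZIP rate for this stratum x block population.
            cases[block["community_area"]] += zip_rates[(block["zip"], stratum)] * n
            # Step 3: counts and denominators accumulate by community area.
            population[block["community_area"]] += n
    # Step 4: rate = summed counts / community area population.
    return {ca: cases[ca] / population[ca] for ca in population if population[ca] > 0}


# Invented example: one community area that straddles two ZIP codes.
zip_rates = {("60601", "F 45-49"): 0.10, ("60605", "F 45-49"): 0.20}
blocks = [
    {"zip": "60601", "community_area": "Loop", "pop": {"F 45-49": 100}},
    {"zip": "60605", "community_area": "Loop", "pop": {"F 45-49": 300}},
]
rates = community_area_rates(zip_rates, blocks)
print(rates)
```

In the example the community area receives a rate of about 0.175, a population-weighted blend of the two ZIP-level rates, because three quarters of its residents in that stratum live in the higher-rate ZIP code.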
66. Dataset description elements
• Description (who, what, where, when)
• Definitions
• Calculations and formulas
• Limitations, disclaimers, sources of error
• Benchmarks and references
67. Chicago Health Atlas Funders
• Otho S.A. Sprague Institute
• Northwestern Memorial Hospital
Community Engagement
68. Health Atlas Team
• Northwestern University: John Cashy, Anna Roberts, Sara
Lake
• Univ. of Illinois-Chicago: Bill Galanter, John Lazaro
• Cook County Hospital System: Bala Hota, Amanda Grasso
• Univ. of Chicago Medical Center: Chris Lyttle, Ben Vekhter,
David Meltzer
• Alliance of Chicago: Erin Kaleba, Fred Rachman, Jermaine
Dellahousaye
• Rush University Medical Center: Shannon Sims, Aaron Tabor
• Vanderbilt University: Brad Malin
• UIC Intern team: Ariadna Garcia, Pravin Babu Karuppaiah,
Shazia Sathar, Ulas Keles (Sid Battacharya, Faculty mentor)