The Doctor Social Graph Project, hosted on MedStartr (a healthcare version of Kickstarter), is a healthcare data project opening up physician referral data and much more. http://thehealthcareblog.com/blog/2012/11/05/tracking-the-social-doctor-opening-up-physician-referral-data-and-much-more/
Introduction to Doctor Social Graph Project
1. Introduction to the Doctor
Social Graph project
Brandon Weinberg : November 29, 2012
This presentation is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
2. Before I Start...
● As the Doctor Social Graph project rapidly
progresses, obsolescence will kick in, rendering
this content stale and "old news"
● This presentation was published on
Slideshare 11/29/2012 when the Doctor
Social Graph Project was quite new
● Details as of 11/29 are gradually emerging;
Most content in these slides is paraphrased
from official project announcements thus far
● Let's get started!
3. Organizer
● Fred Trotter
● Celebrated Health IT Expert in USA
● One of the Designees of the Direct Project
(Mandated HIE Protocol in USA)
● Co-Authored First Health IT Book for O'Reilly
and Most Popular Book on Meaningful Use
Standards: Meaningful Use And Beyond
● Values Open Source
4. Announcement
● Strata RX 2012 - O'Reilly Strata Conference
● October 16, 2012
● San Francisco
● Fred's Keynote Titled "The Ethos of
Healthcare Data Science"
● This Was When Data Was Initially Released
(Open Source Licensed), For Healthcare
Data Scientists in Audience
5. Social Dataset
● Collaborative Relationship Data
● How Doctors, Hospitals, Labs and Other
Healthcare Providers Collaborate To Treat
Medicare Patients
● Data Includes: Referrals to Specialists
● Data Includes: Lab Providers and Hospitals
A Doctor Often Works With
● Data Includes: Real Names and Addresses
● Representative of How USA Healthcare
System is Delivering Care
6. Doctor Social Graphs
● Graphical Representations of Group
Interactions During Medicare Treatment
● Diagrams Based on Math Models
● Use Nodes and Connections
● Nodes: Providers, e.g. Doctors, Hospitals,
Labs, Etc
● Connections: Degree to Which Providers
Work Together Treating Specific Patients
● Will Be the Largest Real-Name Social Graph
That is Publicly Available, Of Any Kind!
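The nodes-and-connections idea can be sketched in a few lines of Python. This is only an illustration, not the project's own code: the three rows are the sample rows shown on the Sample Data slide of this deck, and `build_graph` is a hypothetical helper name.

```python
from collections import defaultdict

# Rows in the dataset's layout: (NPI seen first, NPI seen second, count).
ROWS = [
    ("1184710477", "1548387418", 55),
    ("1548387418", "1326047754", 62),
    ("1548387418", "1598971913", 24),
]

def build_graph(rows):
    """Return {first_npi: {second_npi: weight}}, a directed weighted graph."""
    graph = defaultdict(dict)
    for first, second, count in rows:
        # Sum counts in case the same ordered pair appears more than once.
        graph[first][second] = graph[first].get(second, 0) + count
    return graph

graph = build_graph(ROWS)
# A node is any NPI that appears on either side of a connection.
nodes = set(graph) | {npi for targets in graph.values() for npi in targets}
print(len(nodes), "nodes")
print(dict(graph["1548387418"]))
```

A dictionary of dictionaries keeps the sketch dependency-free; a graph library would be a natural next step for the full file.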
7. Doctor Social Graphs
● Visualization of Social Graph begins at 1:10
● http://youtu.be/L4C3cloZEQk
8. Other Social Graphs
● Facebook, Twitter, LinkedIn Exemplify
Private Big Data Social Graphs
● Most Portions of Data Remain In-House
● Do You Know Any Data Scientists Good at
Graphing and Graph Theory? They May
Appreciate Doctor Social Graph
9. Preparing Data
● Initial Dataset Was Obtained by Fred Trotter
● He Filed A Freedom of Information Act
Request Against Medicare Claims Database
● For Phase 1 Improvement, He Purchased
Board Credentialing Data in All 50 States
● Was $50-$1,000 Per State to Download
● Board Credentialing Data is Analogous to
"Credit History" for Doctors. e.g. Medical
Schools, Board Certifications and Board-
Imposed Punishments
10. Preparing Data
● After Merging Initial (Referral and Teaming)
Dataset with State Credentialing Data, the
Disparate Data Sources Will Be Formatted
For Usability, e.g. in CSV, JSON, XML
● Merged Dataset To Be Released in Late
November or Early December to MedStartr
Backers (Explained Later)
11. Doctor Performance
● Fairly Evaluate Doctor Quality in USA
● "My Most Important Project For This Data Is
Simple: I Want To Create Algorithms To
Rate Doctors That Patients Find Useful And
That Doctors Find Fair." Fred Trotter
(paragraph 10)
● "The Development of Objective, Fair and
Useful Doctor Rating Systems" Fred Trotter
12. Doctor Performance
● Referrals From Doctors, For Example, May
Be Used As Doctor "Votes" For Each Other
● See the Third Paragraph of "Why This
Matters To Patients" For Challenges and
Biases in Current Doctor Rating Systems
● Examples Abound How Patients, Doctors,
Insurance Companies, Hospitals, Labs,
Academics, Scientists, Health Policy Makers
and Others May Leverage Data For Their
Particular Research Interests
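One crude way to read referrals as "votes" is to total the counts each provider receives as the second party in a pair. The sketch below assumes that simple weighted in-degree measure; it is not the rating algorithm the project has in mind, and `referral_votes` is a hypothetical name. The rows are the sample rows from the Sample Data slide.

```python
from collections import Counter

def referral_votes(rows):
    """Total the counts each NPI receives as the second party in a pair."""
    votes = Counter()
    for _first, second, count in rows:
        votes[second] += count
    return votes

rows = [
    ("1184710477", "1548387418", 55),
    ("1548387418", "1326047754", 62),
    ("1548387418", "1598971913", 24),
]

votes = referral_votes(rows)
# List providers from most to fewest incoming "votes".
for npi, total in votes.most_common():
    print(npi, total)
```

A raw total favors labs and large hospitals, which receive many routine "referrals", so any fair rating would need to adjust for provider type and volume.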
13. Hospital Performance
● Hospital Performance Data Sources Will Be
Merged and Improve Dataset
● e.g. Phase 3
● Example Question: Which Cardiologists
Refer to Hospitals With Poor Central Line
Infection Rates?
● "We Want to Turn This Into the Ultimate
Source For Open Doctor and Hospital Data."
Fred Trotter
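The example question about cardiologists amounts to joining referral pairs with two lookup tables. The sketch below assumes such tables exist; every NPI, specialty label, infection rate and the threshold are made-up placeholders, and the function name is hypothetical.

```python
# All values are made-up placeholders, not real providers or real rates.
REFERRALS = [
    ("1000000001", "2000000001", 40),  # cardiologist -> hospital A
    ("1000000002", "2000000002", 25),  # cardiologist -> hospital B
    ("1000000003", "2000000001", 30),  # family doctor -> hospital A
]
SPECIALTY = {
    "1000000001": "cardiology",
    "1000000002": "cardiology",
    "1000000003": "family medicine",
}
# Hypothetical central line infection rate per hospital NPI.
INFECTION_RATE = {"2000000001": 2.4, "2000000002": 0.6}

def cardiologists_referring_to_poor_hospitals(referrals, specialty, rates, threshold):
    """Keep pairs where a cardiologist refers to a hospital above the threshold."""
    hits = []
    for first, second, count in referrals:
        is_cardiologist = specialty.get(first) == "cardiology"
        is_poor_hospital = rates.get(second, 0.0) > threshold
        if is_cardiologist and is_poor_hospital:
            hits.append((first, second, count))
    return hits

result = cardiologists_referring_to_poor_hospitals(
    REFERRALS, SPECIALTY, INFECTION_RATE, threshold=1.0
)
print(result)
```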
14. Overview of Data
● 2011 Dataset is 1.3 GB file
● 3.7 Million Entries
● Contains Nearly One Million Nodes
● Node = Person or Organization That
Provided Health Care Service to a Medicare
Patient
● Graph Data is Keyed Using National
Provider Identifiers (NPIs)
15. NPI
● NPI = Unique Provider Number
● Individual and Organization Providers
● NPI is Mandated by HIPAA (as a
Replacement for UPIN)
● Doctors and Hospitals Must Use Their NPI
for Medical Billing, e.g. Medicare Billing or
Prescribing Medication
16. Sample Data
● A few lines from a random search (grep) on
a specific NPI...
grep 1548387418 refer.2011.csv > Methodist_Hospital_Referrals.csv
NPI_Seen_First,NPI_Seen_Second,Seen_Count
1184710477,1548387418,55
1548387418,1326047754,62
1548387418,1598971913,24
● Pretty Cool, Huh? Full Sample is on
Pastebin
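The same lookup can be sketched in Python. Unlike grep, which matches the digits anywhere on a line (including inside the count), this version compares whole fields. It assumes the file starts with the header row shown above; `rows_for_npi` is a hypothetical helper, and the last sample row is the illustrative placeholder row used elsewhere in this deck.

```python
import csv
import io

SAMPLE_CSV = """NPI_Seen_First,NPI_Seen_Second,Seen_Count
1184710477,1548387418,55
1548387418,1326047754,62
1548387418,1598971913,24
1112223334,5556667778,1111
"""

def rows_for_npi(lines, npi):
    """Return rows where the NPI is either the first or the second provider."""
    reader = csv.DictReader(lines)
    return [
        row for row in reader
        if npi in (row["NPI_Seen_First"], row["NPI_Seen_Second"])
    ]

matches = rows_for_npi(io.StringIO(SAMPLE_CSV), "1548387418")
for row in matches:
    print(row["NPI_Seen_First"], row["NPI_Seen_Second"], row["Seen_Count"])
```

For the real file, an open file handle such as `open("refer.2011.csv", newline="")` could be passed in place of the `io.StringIO` sample.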
17. Tip For Providers
● Are You A Health Care Provider?
● Good Time To Update Your NPI Record
● e.g. No Need to List Your Home Address
● Public Database
● Updated Weekly
● Fred Built a Very Clean NPI Search Tool
● Or Use Government NPI Search Tool
18. Referral and Teaming
● Graph Has 49,685,586 Referring Party Pairs
(Collaborative Relationships)
● When Providers Work On The Same Group
of Patients Within The Same Time Frame =
Teaming Relationship
● Interactions Traditionally Considered
Referral Relationships = Majority of Data
● If Provider A Sees the Same Patient As
Provider B Within 30 Days, It Counts As +1
19. Referral and Teaming
● What's Counted is How Often Two Providers
Bill Medicare For The Same Patients in 30
Days
● How Can Patient Identification Be Avoided,
You May Ask
● For Each Entry in Dataset, At Least 11
Patients Were Involved in Transaction
● 11 = CMS Standard
● 11 Solves "Elvis Problem"
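The counting rule and the 11-patient minimum can be sketched as follows. This is an assumption-laden illustration, not the procedure actually used to produce the dataset: the claim layout `(patient, npi, date)`, the function name and all sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW = timedelta(days=30)   # "within 30 days"
MIN_PATIENTS = 11             # minimum distinct patients per released pair

def pair_counts(claims):
    """Count ordered provider pairs per patient, then suppress small groups."""
    by_patient = defaultdict(list)
    for patient, npi, day in claims:
        by_patient[patient].append((day, npi))

    counts = defaultdict(int)
    patients = defaultdict(set)
    for patient, visits in by_patient.items():
        visits.sort()  # chronological order
        for i, (day_a, npi_a) in enumerate(visits):
            for day_b, npi_b in visits[i + 1:]:
                if day_b - day_a > WINDOW:
                    break  # later visits are even further away
                if npi_a != npi_b:
                    counts[(npi_a, npi_b)] += 1
                    patients[(npi_a, npi_b)].add(patient)

    # Release only pairs backed by at least MIN_PATIENTS distinct patients.
    return {
        pair: n for pair, n in counts.items()
        if len(patients[pair]) >= MIN_PATIENTS
    }

claims = []
for p in range(11):
    claims.append((f"patient-{p}", "A", date(2011, 1, 5)))
    claims.append((f"patient-{p}", "B", date(2011, 1, 20)))
# A single patient seeing A and then C must not appear in the output.
claims.append(("elvis", "A", date(2011, 2, 1)))
claims.append(("elvis", "C", date(2011, 2, 10)))

released = pair_counts(claims)
print(released)
```

The pair backed by 11 patients is released with its count, while the single-patient pair is dropped entirely, so its existence cannot be inferred from the output.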
20. Elvis Problem
● Everyone Knows Elvis' Doctor
● Everyone Knows Elvis' Doctor Has One
Patient
● If Elvis' Doctor Refers to a Cardiologist, Then
Everyone Knows Elvis Has Heart Problems
● At Least 11 Patients Take Part In Each
Given "Referral Count"
● Enforcing a Minimum of 11 Patients in the
Transaction Addresses Said Problem
21. Privacy
● Aside From Knowing a Score Reflects 11 or
More Patients, Little Else Can Be Derived
From Relationship Scores About Patients
● e.g. Referral Relationship Score = 1,100
● You Know it Reflects 11 or More Patients
● Was It 11 Patients With 100 Referrals?
● Was It 100 Patients With 11 Referrals?
● Bottom Line. Data Reflects the Relationship
Score Between Two Nodes, While Omitting
Patient-Specific Data
22. Privacy
● No Patient-Specific Data is Released in
Dataset; Patient-Specific Data is Entirely
Omitted (Not Deidentified)
● Doctors Who Bill Medicare Are Government
Contractors; Some Will Be Surprised As
Public Data Becomes Increasingly
Accessible
● Freedom of Information Act Makes
Government Contractor Data Available to
Public for Accountability
23. Privacy
● It is Fair to Presume Organizations Are
Already Using Such Healthcare Data
● e.g. Insurance Companies, Pharmacy
Chains, Government, Etc
● Ironically, Patients and Doctors Have Had
Least Access To Study Such Data
24. Data Overlay
● Information Will Be Discoverable By
Overlaying Private or Public Data On Top of
the Dataset
● Dataset With Medicare Referral and
Teaming Patterns Was a Starting Point to
Merge Data
● Dataset Will Be Steadily Improved
● In Phase 2, For Example, Publicly
Available Nursing Home Data To Be Merged
25. Geo-Encoded
● Each Provider Identifier Contains Practice
Location Address and Mailing Address
● Data Can Be Overlaid Geographically and
Merged With Geo Databases
● Graph Gets Input From a Geo-Encoded Key
● 80%: Specific Latitude and Longitude
● 20%: Zip Code for General Location
● Localized Healthcare Data
26. Sample Data, Re-Examined
● 1112223334,5556667778,1111
● 1112223334 = NPI of Node That Saw
Medicare Patient First
● 5556667778 = NPI of Node That Saw
Medicare Patient Second
● 1111 = Number of Times This Happened in
a 30-Day Period During A Year (Connection)
● 1111 = Relationship Score Between Real-
Named Nodes and Connections
● Often (Not Always) the PCP = First Variable
27. Most Popular Referrals
● Fred Uploaded the Top 100 Organizations
by Number of Nodes in Dataset to Pastebin
● One of Most Frequent "Referrals" is to Get
Lab Work Done at LabCorp, Quest or Other
Local Lab Providers
● Also Very Common "Referrals" are to
Hospital Emergency Departments and
Treatment Facilities Like DaVita
28. Taxonomy
● Public NPI File Has Provider-Type Ontology
Classifying Doctor and Organization Types
● Hospitals, Primary Care Doctors, Specialist
Types and Labs are Coded in NPI File in
This Provider-Type Ontology; Which is
Maintained by AMA's National Uniform Claim
Committee
● Not Perfect, But Usually Accurate
29. Funding Overview
● Funding is Occasionally Needed to Improve
Dataset and Fred Uses Crowdfunding Model
● Project is Currently Hosted on MedStartr
(Healthcare Version of Kickstarter)
● Backers Can Receive Early Access (6
Months) to a Rich Healthcare Dataset
● Entire Dataset Will Become Open Sourced
in Mid-2013 and Free to the Public
● License To Be Creative Commons
Attribution-ShareAlike 3.0 Unported License
30. Funding Overview
● MedStartr Backers Have Bought 1 of 2 Data
Licenses
● Open Source Data License
● $100-$120: Access to Entire Database and
Sharing of Any Integrated Data Required
● Proprietary-Friendly Data License
● $1,200-$5,000: Access to Entire Database
and Sharing of Integrated Data Not Required
31. Funding Details
● For Phase 1 Improvements to the Initial
Dataset $23,720 Collected From 88
MedStartr Backers; 51 Receive Data
● 39 Get Open Source Data License and 12
Get Proprietary-Friendly Data License
● Data Price Rises Per Phase Between Now
and Mid-2013 (Until Data Becomes Free)
● Dual-License = No Data Hoarding; Lets
Organizations Pay Steep Price to Innovate in
Private, Without Blocking Open Research
32. Crowdfunding
● Fred Effectively Said, "If A Few Hundred
People Want To Pool Small Amounts of
Money Together For This Project, I'll Buy
and Prepare Public-Yet-Inaccessible
Healthcare Data So Scientists Can Use It To
Improve Healthcare, and It Will Never Be
Hoarded."
● Clinical Trial Fundraiser Diabetes App
● Extend Features Patient Relationship App
● Not-Just-For-Profits: Transparent Funding
33. Call To Innovators
● "All of The Cool Discoveries in This Dataset
Should Happen in the First Six Months."
Fred Trotter
● "All of the Really Amazing Discoveries in
This Dataset Will Be Made in the Next Few
Months, By Those Who Either Attended
Strata RX, or Who Participate in This
Project." Fred Trotter
● Phase 2 Underway on MedStartr
34. Conclusion
● This presentation was made for people
learning about the Doctor Social Graph
project
● I hope it provides them a few things which
make understanding the project and data
easier and faster
● Have fun using the Doctor Social Graph
● Questions/Comments: Brandon Weinberg
● Email: b@brandonweinberg.com