healthdata.gov
    now and next
 challenges overview

hhs ocio, health datapalooza 2012
session agenda
•   now
    –   tools and features


•   next
    –   target architecture


•   challenges
    –   explanations in sequence


now – tools and features
•   Drupal
    –   publishing workflow and community engagement
•   Solr
    –   faceted search
•   CKAN
    –   ‘on demand resources’ (RESTful API and feeds)
•   EC2
    –   powered by GovCloud
•   github.com/hhs
    –   public repos coming soon!

publishing workbench
•   insert interesting workbench screenshot




community engagement
•   insert interesting community engagement
    screenshot
•   question and/or ideas example




faceted search




hub.healthdata.gov/api/rest/dataset


step 1: HTTP GET /dataset collection as JSON (GUID or name)
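
A minimal Python sketch of step 1, assuming the collection comes back as a JSON array of dataset names or GUIDs, as "GUID or name" suggests. The `http://` scheme, the `BASE_URL` constant, the helper names and the sample payload are illustrative assumptions, not taken from the slide; the network call is defined but not invoked.

```python
import json
import urllib.request

# Assumption: the slide gives only host and path; the scheme is a guess.
BASE_URL = "http://hub.healthdata.gov/api/rest"


def dataset_collection_url(base_url: str = BASE_URL) -> str:
    """Return the URL of the /dataset collection."""
    return f"{base_url}/dataset"


def parse_collection(body: str) -> list[str]:
    """Parse the collection response, assumed to be a JSON array of strings."""
    items = json.loads(body)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of dataset identifiers")
    return [str(item) for item in items]


def fetch_collection(base_url: str = BASE_URL) -> list[str]:
    """HTTP GET the collection (needs network access; not called below)."""
    with urllib.request.urlopen(dataset_collection_url(base_url)) as response:
        return parse_collection(response.read().decode("utf-8"))


# Hypothetical payload standing in for a live response.
sample_body = '["hospice-medicare-cost-report-data", "another-dataset"]'
names = parse_collection(sample_body)
print(names)
```

Each returned identifier can then be substituted for `{name}` in the per-dataset URL.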




hub.healthdata.gov/api/rest/dataset/{name}



step 2: HTTP GET each /dataset (as JSON, RDF/XML, or N3)
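
Step 2 could be sketched in Python as below. It assumes the representation is selected with an HTTP `Accept` header; the slide does not say whether the hub uses content negotiation or URL suffixes, so treat that as a guess. The media types are the standard ones for JSON, RDF/XML and N3; the scheme, constant and function names are hypothetical. Requests are built but not sent.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://hub.healthdata.gov/api/rest"  # assumed scheme

# Standard media types for the three representations named on the slide.
MEDIA_TYPES = {
    "json": "application/json",
    "rdfxml": "application/rdf+xml",
    "n3": "text/n3",
}


def dataset_request(name: str, fmt: str = "json") -> urllib.request.Request:
    """Build (but do not send) a GET request for one dataset."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    # Percent-encode the name so it is safe as a single path segment.
    url = f"{BASE_URL}/dataset/{quote(name, safe='')}"
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


for dataset_name in ["hospice-medicare-cost-report-data"]:
    request = dataset_request(dataset_name, "rdfxml")
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```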




hub.healthdata.gov/api/search/dataset?q=medicare+costs




JSON results for ‘medicare’ and ‘costs’ search query
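
A short Python sketch of building this search URL and reading a response. The `+` in the slide's URL is the form-encoding of a space, which `urlencode` produces. The scheme and the response shape (`count` plus a `results` list) are assumptions, and the sample body is invented for illustration.

```python
import json
from urllib.parse import urlencode

SEARCH_URL = "http://hub.healthdata.gov/api/search/dataset"  # assumed scheme


def search_url(terms: list[str]) -> str:
    """Join terms with spaces; urlencode renders each space as '+'."""
    return f"{SEARCH_URL}?{urlencode({'q': ' '.join(terms)})}"


def result_names(body: str) -> list[str]:
    """Read results from a body assumed to look like {"count": n, "results": [...]}."""
    payload = json.loads(body)
    return list(payload.get("results", []))


print(search_url(["medicare", "costs"]))

# Hypothetical response body, for illustration only.
sample = '{"count": 1, "results": ["hospice-medicare-cost-report-data"]}'
print(result_names(sample))
```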


hub.healthdata.gov/feeds/dataset.atom



atom feed for all datasets (including recent updates and changes)
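
One way a client might consume such a feed, sketched in Python with the standard library. The feed document below is a made-up stand-in for what `/feeds/dataset.atom` might return; only the Atom namespace and element names come from the Atom format itself.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content; a live feed would be fetched over HTTP.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>datasets</title>
  <entry>
    <title>Example dataset A</title>
    <updated>2012-05-01T12:00:00Z</updated>
  </entry>
  <entry>
    <title>Example dataset B</title>
    <updated>2012-06-01T12:00:00Z</updated>
  </entry>
</feed>
"""


def latest_entries(feed_xml: str) -> list[tuple[str, str]]:
    """Return (updated, title) pairs, most recently updated first."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        entries.append((updated, title))
    # Timestamps in the same format and timezone sort correctly as strings.
    return sorted(entries, reverse=True)


for updated, title in latest_entries(SAMPLE_FEED):
    print(updated, title)
```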




hub.healthdata.gov/feeds/custom.atom?q=medicare+cost




custom search query result atom feed (anything with ‘medicare+cost’)



next – target architecture
•   linked data
    –   (closed) google knowledge graph
    –   open health knowledge graph
•   integration framework
    –   top down modeling
    –   bottom up mapping
    –   social curation




#gkg – (closed) ‘things, not strings’



“The Knowledge Graph helps us understand the relationships between things [… that are] linked in our graph. […] It’s not just a catalog of objects; it also models all these inter-relationships.”




open health knowledge graph




health.data.gov/id/hospital/393303




clinical quality linked data (HDI II)




lifting and enrichment




Linked Data Integration Framework
[diagram labels: GKG/Watson/Siri/…; healthdata.gov; PCAST DEAS; HKG; Variety, Volume, Velocity; Health Data Actor]
social meta/data – graph curation




i2 challenges
• two types
  – three domain specific
     • improve the integration and liquidity of data made available
  – four platform specific
     • enhance the capabilities of the technology components


• 3 release rounds
  – sequenced to leverage dependencies
     • round 1: June through October 2012
     • round 2: November 2012 through May 2013
     • round 3: June through December 2013

round 1 challenges
• June 2012 through October 2012

  – domain specific
     • [1.1] cross domain and domain specific metadata
         – voluntary consensus standards organizations, de facto
           standards, other


  – platform specific
     • [1.2] Simplified Sign On (SSO)
         – WebID identity provider and relying parties, HDP infrastructure
           components


  – $35K: $20K 1st, $10K 2nd, $5K 3rd place prizes
round 2 challenges
• November 2012 through May 2013

  – domain specific
     • [2.3] Mapping, Reconciliation and Correlation
         – structural variety, authoritative URIs, linking heuristics


  – platform specific
     • [2.4] Faceted Browsing and Visualization
         – D3 (backbone, jQuery, etc.)
     • [2.5] Custom API
         – Linked Data API ‘configurator’ for dataset resources


             » each of these builds on [1.1] results

round 3 challenges
• June 2013 through December 2013

  – domain specific
     • [3.6] Correlating HHS and NHS Classifications
        – structural variety, authoritative URIs, linking heuristics


  – platform specific
     • [3.7] Linked Data API based Data Element Access Services
        – ‘securing the data, not just the device’
             » builds on [1.1], [1.2], and [2.5]
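
The "builds on" relations across the three rounds can be checked mechanically. The Python sketch below encodes the rounds and dependencies stated in the deck (the [3.6] entry follows its detail slide, which names [1.1] and [2.3]) and confirms that every dependency is released in an earlier round. The dictionary and function names are illustrative.

```python
from graphlib import TopologicalSorter

# Round in which each challenge is released, per the round slides.
ROUND = {"1.1": 1, "1.2": 1, "2.3": 2, "2.4": 2, "2.5": 2, "3.6": 3, "3.7": 3}

# "builds on" relations stated in the deck: challenge -> its prerequisites.
BUILDS_ON = {
    "1.1": set(),
    "1.2": set(),
    "2.3": {"1.1"},
    "2.4": {"1.1"},
    "2.5": {"1.1"},
    "3.6": {"1.1", "2.3"},
    "3.7": {"1.1", "1.2", "2.5"},
}


def violations() -> list[tuple[str, str]]:
    """List (challenge, dependency) pairs whose dependency is not in an earlier round."""
    return [
        (challenge, dep)
        for challenge, deps in BUILDS_ON.items()
        for dep in sorted(deps)
        if ROUND[dep] >= ROUND[challenge]
    ]


# A valid release order: every prerequisite appears before its dependents.
order = list(TopologicalSorter(BUILDS_ON).static_order())
print(order)
print(violations())
```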




domain challenge [1.1]
• Metadata
  – requests the application of existing voluntary
    consensus standards for metadata common to all
    open government data
  – and invites new designs for health domain specific
    metadata to classify datasets in our growing catalog,
    creating entities, attributes and relations
  – that form the foundations for better discovery,
    integration and liquidity.


• 374 on challenge.gov

W3C SKOS – concept schemes




W3C DCAT – data catalogs




hub.healthdata.gov/dataset/hospice-medicare-cost-report-data.rdf




rdf/xml output uses dublin core and dcat metadata (mapping issues to work out, N3 output is incomplete, etc.)
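
To illustrate what reading Dublin Core and DCAT terms from such output might look like, here is a Python sketch over an invented RDF/XML document. The namespaces are the published DCAT and DC Terms ones; the document shape, title and keywords are assumptions and may not match what the hub actually emits. RDF/XML can serialize the same graph in several ways, so a real client would more likely use an RDF parser than raw XML paths.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Hypothetical document; only the dataset URL path comes from the slide.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://hub.healthdata.gov/dataset/hospice-medicare-cost-report-data">
    <dct:title>Example title</dct:title>
    <dcat:keyword>medicare</dcat:keyword>
    <dcat:keyword>hospice</dcat:keyword>
  </dcat:Dataset>
</rdf:RDF>
"""


def describe_datasets(rdf_xml: str) -> list[dict]:
    """Extract URI, title and keywords from each top-level dcat:Dataset element."""
    root = ET.fromstring(rdf_xml)
    about = f"{{{NS['rdf']}}}about"  # ElementTree spells rdf:about as {namespace}about
    described = []
    for dataset in root.findall("dcat:Dataset", NS):
        described.append({
            "uri": dataset.get(about),
            "title": dataset.findtext("dct:title", default="", namespaces=NS),
            "keywords": [k.text for k in dataset.findall("dcat:keyword", NS)],
        })
    return described


print(describe_datasets(SAMPLE_RDF))
```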

https://github.com/HHS/hd2-ckan/blob/master/templates/package/read.rdf




ckan script that creates dc and dcat metadata tags / values (thanks @JoshData! public github repo soon :-)

W3C Data Cube – statistics




refactor CQLD vocabs/data? start here and follow imports

W3C Provenance – change mgmt




apply to CKAN /revisions




hub.healthdata.gov/revision




W3C org – organization




quantity, units, dimensions, time




OGC GeoSPARQL – geospatial




OMG BMM – business motivation




CQLD domain specific




platform challenge [1.2]
• WebID based SSO
  – will improve community engagement
  – by providing simplified sign on (SSO) for external
    users interacting across multiple HDP technology
    components,
  – making it easier for community collaborators to
    contribute,
  – leveraging new approaches to decentralized
    authentication.


• 375 on challenge.gov
relying party WebID login




identity provider WebID login




edit WebID property ACL at IdP




property is now visible to the RP




domain challenge [2.3]
• Mapping, Reconciliation and Correlation
   – builds on the Metadata domain challenge [1.1]
   – begins by acknowledging disparate open government publishing
     practices
   – and seeks the demonstration of an innovative and automated
     solution for transforming semi-structured data into structured data,
   – reconciles decentralized distributions about the same data entity
     against the master identity of an authoritative source,
   – and correlates these master identities when multiple authoritative
     sources exist,
   – enabling the network effect by introducing strong identity resolution
     techniques that ease the ability to aggregate different data about
     the same entities from independent publishers.
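
The reconciliation step, in its simplest form, maps differently spelled names for one entity onto a single authoritative URI. The Python sketch below uses exact matching after normalization as a naive stand-in for real linking heuristics. The registry, hospital names and ids are invented; only the `health.data.gov/id/hospital/{id}` URI pattern comes from the deck.

```python
import re
from typing import Optional

# Hypothetical authoritative registry: normalized name -> master URI.
AUTHORITATIVE = {
    "example general hospital": "http://health.data.gov/id/hospital/000001",
    "sample county medical center": "http://health.data.gov/id/hospital/000002",
}


def normalize(name: str) -> str:
    """Lowercase, turn runs of punctuation or whitespace into one space, and trim."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def reconcile(name: str) -> Optional[str]:
    """Return the master URI for a publisher's string, or None if unmatched."""
    return AUTHORITATIVE.get(normalize(name))


# Three independent publishers naming entities in their own styles.
for raw in ["EXAMPLE General Hospital,", "Sample-County  Medical Center", "Unknown Clinic"]:
    print(repr(raw), "->", reconcile(raw))
```

Once strings resolve to the same URI, records from independent publishers can be aggregated by that URI rather than by name.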


automating structural transformations




‘reconciling’ strings to things




result: turtle is the new JSON!




link automation heuristics editor




platform challenge [2.4]
• Faceted Browsing and Visualization
   – builds on the Metadata domain challenge [1.1]
   – uses the most popular browser based UI frameworks and libraries
     to realize novel exploration and discovery techniques for traversing
     large amounts of interrelated data,
   – contributing to a growing collection of open source widgets that
     make it easy for third parties to create new applications and embed
     health data in their content.




surfing the domain schemata




no domain knowledge required to discover entities and relationships
agents construct e/r queries




Siri, which {LA County} Hospitals have the best {Heart Attack} stats?
d3 (jQuery, backbone, etc.)




platform challenge [2.5]
• Custom API
  – also builds on the Metadata domain challenge [1.1]
  – makes it possible to tune programmatic access in accordance
    with dataset metadata, leveraging an existing ‘Web 3.0’
    framework and Linked Data API (LDA) implementation to provide
    specialized interfaces




a ‘Web 3.0’ API ‘configurator’

• Linked Data API (LDA)
 – http://code.google.com/p/linked-data-api/
   •   open source impl here
       –   http://code.google.com/p/puelia-php/
   •   example usage here
       –   http://reference.data.gov.uk/doc/department
   •   example api reference docs here
       –   http://environment.data.gov.uk/lab/doc/api-bwq-reference-v0.2.html
   •   commercialization example here
       –   http://kasabi.com/tour


domain challenge [3.6]
• Correlating HHS – NHS Classifications
   – builds on both the Metadata [1.1] and Mapping, Reconciliation and
     Correlation [2.3] domain challenges,
   – and uses the US and UK health domain specific classification
     schemes to exercise the capabilities demonstrated by the
     automated solution to [2.3],
   – resulting in better international integration of frameworks for
     understanding societal outcomes and their corresponding health
     statistics.




platform challenge [3.7]
• Linked Data API based Data Element Access Services
   – builds on the Metadata domain challenge [1.1], and the WebID
     based SSO [1.2] and Custom API [2.5] platform challenges
   – augmenting WebID based authentication with metadata driven
     authorization,
   – introducing an innovative security and privacy implementation of
     ‘data element access services’ (DEAS) as described by the PCAST
     Health IT Report,
   – resulting in a Custom API configured by domain specific metadata
     that governs fine grained access to provide the right data to the
     right user.


• ‘secure the data, not just the devices’
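
A toy Python sketch of attribute-level authorization in the spirit of this challenge: the same record yields different views depending on who asks. It assumes that ‘1101’ in the later slides names the requested resource. The policy table, user names and record fields are hypothetical; a real DEAS would derive permissions from metadata rather than a hard-coded dictionary.

```python
# Hypothetical policy: the attributes each user may read.
POLICY = {
    "user1": {"name", "address", "score"},
    "user2": set(),
}

# Hypothetical record; "1101" is assumed to be a resource identifier.
RECORD = {"id": "1101", "name": "Example", "address": "1 Example St", "score": 0.9}


def authorized_view(user: str, record: dict, policy: dict) -> dict:
    """Return only the attributes the user may see; the id is always kept."""
    allowed = policy.get(user, set())  # unknown users are granted nothing
    return {key: value for key, value in record.items() if key == "id" or key in allowed}


print(authorized_view("user1", RECORD, POLICY))  # every attribute
print(authorized_view("user2", RECORD, POLICY))  # identifier only
```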
LDA + PPO = DEAS




Privacy Preference Ontology (PPO)




user 1 AuthZ ‘1101’ all attributes




multiple machine readable formats




user 2 AuthZ ‘1101’ no attributes




thanks!
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix drm: <http://vocab.data.gov/def/drm#> .
@prefix sdo: <http://schema.org/> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix dc: <http://purl.org/dc/terms/> .


<http://hhs.gov/staff/georgethomas#>
    rdf:type drm:DataSteward , sdo:Person ;
    vcard:email "george dot thomas 1 at hhs dot gov" ;
    dc:contributor <healthdata.gov>, <data.gov/semantic> .


HDI III - Healthdata.gov - Now, Next and Challenges
