http://metacognition.info/presentations/SW-usecases-outcomes-research.ppt




  Semantic Web use cases
  in outcomes research
  Experiences from building a patient repository and
  developing standards



                                                            Chimezie Ogbuji
                                                  Metacognition Inc. (Owner)
Outline
• Me
• Semantic Web and Semantic Web technologies
  • RDF, GRDDL, OWL, RIF, and SPARQL
• Cleveland Clinic SemanticDB project
  •   Content repository
  •   Data collection workflow
  •   Quality and outcomes reporting
  •   Cohort identification
• Use of the system
Me and the Semantic Web
• I’ve been developing software using standards of the Semantic
  Web since 2001
  • Worked at a startup that developed an XML & RDF content
    repository
• Began working on Cleveland Clinic SemanticDB project in 2003
• Began working in the World Wide Web Consortium (W3C),
  developing the SPARQL and GRDDL standards in 2007 and
  2006, respectively
• I contribute to and maintain several open source software
  projects related to Semantic Web technologies:
  • RDFLib (https://code.google.com/p/rdflib/)
  • FuXi (https://code.google.com/p/fuxi/)
  • Akamu (https://code.google.com/p/akamu/)
The Semantic Web
• The Semantic Web
  • What is it? Like asking “What is the Matrix?”
  • A vision of how the existing WWW can be extended such that
    machines can interpret the meaning of data involved in protocol
    interactions
  • A vision of Tim Berners-Lee, inventor of the World Wide Web and
    founder of the World Wide Web Consortium (W3C)
• Semantic Web technologies / standards
  • Layers of W3C standards (“Layer cake”)
  • A technological roadmap that attempts to realize this vision
  • The technologies are well-suited to addressing many enterprise
    software architecture challenges
http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/
http://www.bnode.org/blog/2009/07/08/the-semantic-web-not-a-piece-of-cake
“Focus” standards
•   Resource Description Framework (RDF)
•   Gleaning Resource Descriptions from Dialects of Languages (GRDDL)
•   SPARQL Protocol And RDF Query Language (SPARQL)
•   Web Ontology Language (OWL)
RDF
• A framework for representing information on the WWW.
• Motivation
  • machine-interpretable metadata about web resources
  • mashup of application data
  • automated processing of web information by software agents
• Graph data model (directed, labeled graph)




• Nodes and links are labeled with URIs
• Some nodes are not labeled (Blank nodes)
• Links are called RDF sentences or triples
                                   http://www.w3.org/TR/rdf-concepts/
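The graph model above can be sketched in a few lines of plain Python. This is only an illustration, not RDFLib or any other real RDF API: the `http://example.org/` URIs, the `BlankNode` class, and the `objects` helper are all hypothetical names made up for this example.

```python
# A tiny RDF-like graph as a set of (subject, predicate, object) triples.
# All URIs below are hypothetical examples, not terms from SemanticDB.
EX = "http://example.org/"


class BlankNode:
    """An unlabeled node: it has identity within the graph but no URI."""


graph = set()
patient = EX + "patient/1"
operation = BlankNode()  # an operation is known to exist, but has no URI

# Each triple is one directed, labeled link in the graph.
graph.add((patient, EX + "hasOperation", operation))
graph.add((operation, EX + "procedureType", EX + "CABG"))


def objects(graph, subject, predicate):
    """Follow the links labeled `predicate` that leave `subject`."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


# Traverse two links: patient -> operation -> procedure type.
ops = objects(graph, patient, EX + "hasOperation")
procedure_types = objects(graph, ops[0], EX + "procedureType")
```

Using a set mirrors the fact that an RDF graph is a set of triples: adding the same statement twice changes nothing.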
GRDDL
    • A protocol for sowing semantics in structured (XML) web
      content for harvest
    • Vast amount of latent semantics
      in web documents
    • Web content today is
      primarily built for human
      consumption




http://www.w3.org/TR/grddl/
Faithful Rendition
“By specifying a GRDDL transformation, the author of a document states that
the transformation will provide a faithful rendition in RDF of information (or
some portion of the information) expressed through the XML dialect used in
the source document.”
• Licenses an interpretation of an XML document that is
  certified by the author
  [Diagram: XHTML / XML instances are mapped to RDF by an embedded
  transform; an XML namespace is mapped to RDF by a namespace transform]
Architectural value
• XML is well-suited for messaging, data collection, and
  structural validation
• RDF is well-suited for expressive logical assertions, querying,
  and inference.
• RDF graphs can be created, updated, deleted, etc. (managed)
  using a particular XML vocabulary
  • vocabulary can be specific to a particular purpose
• GRDDL facilitates mutually-beneficial use of XML and RDF
  processing and representation
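A rough sketch of the XML-to-RDF division of labor described above. GRDDL transformations are typically written in XSLT; here a plain Python function stands in for the transform so the example stays self-contained. The XML vocabulary, the `http://example.org/` URIs, and the `transform` function are assumptions made for illustration, not the SemanticDB vocabulary.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # hypothetical namespace for illustration

# A purpose-specific XML vocabulary (hypothetical): easy to collect and validate.
XML_DOC = """
<patientRecord id="record/42">
  <operation id="operation/7" type="ValveRepair"/>
  <operation id="operation/8" type="CABG"/>
</patientRecord>
"""


def transform(xml_text):
    """Stand-in for a GRDDL transformation: XML in, a set of triples out."""
    root = ET.fromstring(xml_text)
    record = EX + root.get("id")
    triples = {(record, EX + "type", EX + "PatientRecord")}
    for op in root.findall("operation"):
        op_uri = EX + op.get("id")
        triples.add((record, EX + "hasOperation", op_uri))
        triples.add((op_uri, EX + "procedureType", EX + op.get("type")))
    return triples


triples = transform(XML_DOC)
```

The XML document stays the managed artifact (created, updated, deleted), and the triples are regenerated from it whenever it changes.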
SPARQL
   • The query language for RDF content
   • It operates over an RDF dataset
      • composed of RDF graphs that are each named by a URI, plus a
        single RDF graph without a name
   • Operationally and structurally similar to SQL
   • Many implementations (including the ones we used) build on
     existing relational database management systems
      • translate SPARQL queries into SQL queries




Elliott et al. A complete translation from SPARQL into efficient SQL. 2009
                                             http://www.w3.org/TR/sparql11-query/
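To make the SPARQL-to-SQL idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The single three-column `triples` table, the example URIs, and the hand-written SQL are assumptions for illustration; they are not the schema or translation algorithm of Elliott et al. or of the stores used in SemanticDB.

```python
import sqlite3

EX = "http://example.org/"  # hypothetical namespace

# Store triples in one relational table: (subject, predicate, object).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        (EX + "patient/1", EX + "hasOperation", EX + "operation/7"),
        (EX + "operation/7", EX + "procedureType", EX + "CABG"),
        (EX + "patient/2", EX + "hasOperation", EX + "operation/8"),
        (EX + "operation/8", EX + "procedureType", EX + "ValveRepair"),
    ],
)

# The SPARQL query being translated (shown for comparison, not executed).
SPARQL = """
SELECT ?patient WHERE {
  ?patient ex:hasOperation ?op .
  ?op ex:procedureType ex:CABG .
}
"""

# Each triple pattern becomes one alias of the table; the shared
# variable ?op becomes the join condition t1.o = t2.s.
SQL = """
SELECT t1.s
FROM triples AS t1
JOIN triples AS t2 ON t1.o = t2.s
WHERE t1.p = ? AND t2.p = ? AND t2.o = ?
"""

rows = conn.execute(
    SQL, (EX + "hasOperation", EX + "procedureType", EX + "CABG")
).fetchall()
patients = [row[0] for row in rows]
```

The structural similarity to SQL shows up directly: variables shared between triple patterns play the role of join keys.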
OWL
• Language for describing and constraining the semantics of an
  RDF vocabulary
• Such constraints (often hierarchical) are called ontologies
• An ontology specifies a conceptualization of a particular
  domain as categories, relationships between them, and
  constraints on both
• By defining an OWL document for the terms in an RDF
  graph, additional RDF sentences can be inferred
• Additionally, an RDF graph can be determined to be consistent
  or inconsistent with respect to the ontology
• Both tasks can be performed by a logical reasoning engine
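The two reasoning tasks above can be sketched with hand-written rules. This covers only a tiny fragment (subclass inheritance and class disjointness) and is not how a full OWL reasoner works; the property URIs, class names, and the `infer` and `is_consistent` functions are hypothetical names chosen for this example.

```python
EX = "http://example.org/"  # hypothetical namespace
TYPE = EX + "type"
SUBCLASS = EX + "subClassOf"
DISJOINT = EX + "disjointWith"

# A small ontology: a class hierarchy plus one disjointness constraint.
ontology = {
    (EX + "CABG", SUBCLASS, EX + "CardiacSurgery"),
    (EX + "CardiacSurgery", SUBCLASS, EX + "Operation"),
    (EX + "Operation", DISJOINT, EX + "Patient"),
}
data = {(EX + "operation/7", TYPE, EX + "CABG")}


def infer(graph):
    """Forward-chain: if x has type C and C is a subclass of D, x has type D."""
    graph = set(graph)
    while True:
        new = {
            (x, TYPE, d)
            for (x, p1, c) in graph if p1 == TYPE
            for (c2, p2, d) in graph if p2 == SUBCLASS and c2 == c
        } - graph
        if not new:
            return graph
        graph |= new


def is_consistent(graph):
    """A graph is inconsistent if a resource belongs to two disjoint classes."""
    types = {(x, c) for (x, p, c) in graph if p == TYPE}
    return not any(
        (x, a) in types and (x, b) in types
        for (a, p, b) in graph if p == DISJOINT
        for (x, _c) in types
    )


closure = infer(ontology | data)
consistent = is_consistent(closure)
# Asserting that the operation is also a Patient violates the constraint.
broken = closure | {(EX + "operation/7", TYPE, EX + "Patient")}
broken_consistent = is_consistent(broken)
```

Here `closure` gains the inferred statements that `operation/7` is a `CardiacSurgery` and an `Operation`, neither of which was asserted in the data.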
Semantic Database (SDB)
    • Cleveland Clinic’s Heart and Vascular Institute (HVI)
    • Challenges:
       • fragmented gathering and storing of clinical research data
       • compartmentalization of medical science and practice
       • clinical knowledge is often expressed in ambiguous, idiosyncratic
         terminology
       • problematic for longitudinal patient data that can feasibly span
         multiple, geographically separated sources and disciplines
    • Longitudinal patient record:
       • patient records from different times, providers, and sites of care
         that are linked to form a lifelong view of a patient’s health care
         experience
Institute of Medicine. The computer-based patient record: an essential technology for
health care. 1997
                   http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClinic/
Project goals
• Create a framework for context-free data management
• Usable for any domain with nothing (or little) assumed about
  the domain
• Expert-provided, domain-specific knowledge is used to control
  most aspects of
  •   Data entry
  •   Storage
  •   Display
  •   Retrieval
  •   Formatting for external systems
Components
   • Content repository
      • supports data collection, document management, and knowledge
        representation for use in managing longitudinal clinical data
      • manages patient record documents as XML and converts them to
        RDF graphs for downstream semantic processing
   • Data collection workflow management
      • process of transcribing details of a heart procedure from the EHR
        into a registry
      • RDF used as the state machine of a workflow engine


Pierce et al. SemanticDB: A Semantic Web Infrastructure for Clinical Research and
Quality Reporting. 2012
Ogbuji. A Role for Semantic Web Technologies in Patient Record Data Collection.
2009
Workflow State as RDF Dataset
• Each task is an XML document in a content repository
• Mirrored into a named RDF graph that shares a web location
  (the name) with the document
• (SPARQL) query is dispatched against a workflow dataset to
  find tasks in particular states or assigned to particular people
• Applications interact with task information and fetch:
  • JSON and XML representations (for client-side web applications)
  • XHTML documents that render as faceted views of a collection of
    tasks
  • faceted view includes links to subsequent stages in workflow and
    into other web applications on server
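A minimal sketch of the workflow-dataset idea, assuming a dictionary that maps each graph name (the task document's web location) to a set of triples. The URIs, the property names `state` and `assignedTo`, and the `find_tasks` helper are hypothetical; in the real system this lookup is a SPARQL query against the repository.

```python
import json

BASE = "http://example.org/tasks/"    # hypothetical task document locations
EX = "http://example.org/workflow#"   # hypothetical workflow vocabulary

# Workflow dataset: graph name (the document's URI) -> triples about the task.
dataset = {
    BASE + "1": {
        (BASE + "1", EX + "state", "pending"),
        (BASE + "1", EX + "assignedTo", "nurse-a"),
    },
    BASE + "2": {
        (BASE + "2", EX + "state", "complete"),
        (BASE + "2", EX + "assignedTo", "nurse-a"),
    },
    BASE + "3": {
        (BASE + "3", EX + "state", "pending"),
        (BASE + "3", EX + "assignedTo", "nurse-b"),
    },
}


def find_tasks(dataset, **criteria):
    """Return the names of task graphs matching every property=value pair."""
    matches = []
    for name, graph in sorted(dataset.items()):
        if all((name, EX + prop, value) in graph
               for prop, value in criteria.items()):
            matches.append(name)
    return matches


pending = find_tasks(dataset, state="pending")
pending_for_a = find_tasks(dataset, state="pending", assignedTo="nurse-a")

# One of the representations a client-side application might fetch.
as_json = json.dumps({"tasks": pending_for_a})
```

Because the graph name and the task document share a URI, a query result doubles as a list of links back to the documents themselves.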
Reporting challenges
   • Reporting places a heavy burden on institutions to produce
     data in specific formats with precise definitions
   • Definitions vary across reports
      • makes it difficult to use the same source data for all reports
   • Institutions are typically forced to manually abstract the data
     for each report
   • This is done separately to conform to the requirements for
     each report




Pierce et al. SemanticDB: A Semantic Web Infrastructure for Clinical Research and
Quality Reporting. 2012
Components: reporting
   • Quality and outcomes reporting
      • generate outcomes reports both for internal and external
        consumption
      • internal reports were generated monthly and external reports
        quarterly
      • quarterly reports submitted to Society of Thoracic Surgeons (STS)
        Adult Cardiac Surgery National Database and American College of
        Cardiology (ACC) CathPCI Database
      • submissions are required for certification




Pierce et al. SemanticDB: A Semantic Web Infrastructure for Clinical Research and
Quality Reporting. 2012
Cohort identification
  • SPARQL and RDF datasets are well-suited as infrastructure for
    a longitudinal patient record data warehouse
  • HVI software development team partnered with Cycorp to
    build a cohort identification interface called the Semantic
    Research Assistant (SRA)
  • Based on the Cyc inference engine
     • a powerful reasoning system and knowledge base with built-in
       capability for natural language (NL) processing, forward-chaining
       inference, and backward-chaining inference
     • incorporates Cyc's NL processing to permit a user to compose a
       cohort selection query by typing an English sentence or sentence
       fragment

Lenat et al. Harnessing Cyc to Answer Clinical Researchers' Ad Hoc Queries. 2010.
RDF dataset warehouse
• CycL to SPARQL
  • domain-specific medical ontologies in conjunction with the Cyc
    general ontology are used to convert the NL query into a formal
    representation and then into SPARQL queries.
  • SPARQL queries are submitted to the SemanticDB RDF store for
    execution
• Cleveland Clinic’s registry of 200,000 patient records
  comprises an RDF graph of roughly 80 million RDF assertions
Dataset topology
• An RDF dataset with no default graph and one named graph
  per patient record (a patient record graph)
• Beyond identifying the cohort, most subsequent query
  processing happens within a single patient record graph
• In our vocabulary, there are instances of
  PatientRecord, Operation, Patient, MedicalEvent,
  HospitalEpisode, etc.
• PatientRecord resources share a URI with their containing
  graph
• GRAPH operator can be used to optimize the search space
• Optimal for the following cohort querying paradigm
   • Constraints in the first part of the query are cross-graph; those
      in the second part are intra-graph
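The two-phase querying paradigm can be sketched as follows, again modeling the dataset as a dictionary from graph name to a set of triples. In SPARQL the second phase would correspond to a pattern scoped by the GRAPH operator; here plain Python stands in for the query engine. The URIs, the `ageAtOperation` property, the sample values, and the `record_graph` helper are all invented for illustration.

```python
EX = "http://example.org/"  # hypothetical namespace
TYPE = EX + "type"


def record_graph(record_uri, procedure, age):
    """Build one patient record graph; the record shares the graph's URI."""
    op = record_uri + "/operation"
    return {
        (record_uri, TYPE, EX + "PatientRecord"),
        (record_uri, EX + "hasOperation", op),
        (op, TYPE, EX + "Operation"),
        (op, EX + "procedureType", EX + procedure),
        (record_uri, EX + "ageAtOperation", age),
    }


# No default graph: one named graph per patient record (made-up data).
dataset = {
    uri: record_graph(uri, procedure, age)
    for uri, procedure, age in [
        (EX + "record/1", "CABG", 71),
        (EX + "record/2", "ValveRepair", 64),
        (EX + "record/3", "CABG", 58),
    ]
}

# Phase 1 (cross-graph): scan every named graph to identify the cohort.
cohort = sorted(
    name for name, graph in dataset.items()
    if any(p == EX + "procedureType" and o == EX + "CABG"
           for (_s, p, o) in graph)
)

# Phase 2 (intra-graph): for each cohort member, search only its own graph.
ages = {
    name: next(o for (s, p, o) in dataset[name]
               if s == name and p == EX + "ageAtOperation")
    for name in cohort
}
```

Once the cohort is known, each follow-up lookup touches a single small graph instead of the whole dataset, which is the search-space reduction the topology is designed for.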
Use of system
• From 2009 through June of 2011
  • over 200 clinical investigations utilized SemanticDB to identify
    study cohorts and retrieve appropriate data for analysis
  • studies ranged from relatively simple feasibility assessments to
    extremely complex investigations of time-related events and
    competing risks of the patient experiencing a certain outcome
    after treatment
  • previously, cohort identification and data export queries for
    studies would have been performed by a skilled database
    administrator (DBA) interpreting instructions from domain experts
  • Using SemanticDB and the SRA, a non-technical domain expert
    performed most of the queries

Semantic Web use cases in outcomes research

  • 1. http://metacognition.info/presentations/SW-usecases-outcomes-research.ppt Semantic Web use cases in outcomes research Experiences from building a patient repository and developing standards Chimezie Ogbuji Metacognition Inc. (Owner)
  • 2. Outline • Me • Semantic Web and Semantic Web technologies • RDF, GRDDL, OWL, RIF, and SPARQL • Cleveland Clinic Semantic DB project • Content repository • Data collection workflow • Quality and outcomes reporting • Cohort identification • Use of the system
  • 3. Me and the Semantic Web • I’ve been developing software using standards of the Semantic Web since 2001 • Worked at a startup that developed an XML & RDF content repository • Began working on the Cleveland Clinic SemanticDB project in 2003 • Began working in the World Wide Web Consortium (W3C), developing the SPARQL and GRDDL standards in 2007 and 2006, respectively • I contribute to and maintain several open source software projects related to Semantic Web technologies: • RDFLib (https://code.google.com/p/rdflib/) • FuXi (https://code.google.com/p/fuxi/) • Akamu (https://code.google.com/p/akamu/)
  • 4. The Semantic Web • The Semantic Web • What is it? Like asking “What is the Matrix?” • A vision of how the existing WWW can be extended such that machines can interpret the meaning of data involved in protocol interactions • A vision of the founder of the World Wide Web Consortium (W3C) and inventor of the World Wide Web (Tim Berners-Lee) • Semantic Web technologies / standards • Layers of W3C standards (“Layer cake”) • A technological roadmap that attempts to realize this vision • The technologies are well-suited to addressing many enterprise software architecture challenges
  • 7. “Focus” standards • Resource Description Framework • Gleaning Resource Descriptions from Dialects of Languages • SPARQL Protocol And RDF Query Language • Web Ontology Language
  • 8. RDF • A framework for representing information on the WWW. • Motivation • machine-interpretable metadata about web resources • mashup of application data • automated processing of web information by software agents • Graph data model (directed, labeled graph) • Nodes and links are labeled with URIs • Some nodes are not labeled (Blank nodes) • Links are called RDF sentences or triples http://www.w3.org/TR/rdf-concepts/
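The graph data model on the RDF slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `http://example.org/` namespace, the property names, and the `BlankNode` class are hypothetical, and a real application would use an RDF library rather than a set of tuples.

```python
# Hypothetical namespace and terms, chosen only for illustration.
EX = "http://example.org/"


class BlankNode:
    """A node without a URI label; it is identified only by object identity."""


procedure = BlankNode()

# Each triple is one directed, labeled link: (subject, predicate, object).
graph = {
    (EX + "patient1", EX + "hasRecord", EX + "record1"),
    (EX + "record1", EX + "documents", procedure),
    (procedure, EX + "procedureType", "CABG"),  # the object here is a literal
}


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


records = objects(graph, EX + "patient1", EX + "hasRecord")
procedure_types = objects(graph, procedure, EX + "procedureType")
```

The blank node can still be the subject or object of triples even though it has no URI; only nodes and links that need to be referenced from outside the graph require one.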
  • 9. GRDDL • A protocol for sowing semantics in structured (XML) web content for harvest • Vast amount of latent semantics in web documents • Web content today is primarily built for human consumption http://www.w3.org/TR/grddl/
  • 10. Faithful Rendition “By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.” • Licenses an interpretation of an XML document that is certified by the author (embedded) transform XHTML / XML RDF (instances) namespace transform XML namespace RDF
  • 11. Architectural value • XML is well-suited for messaging, data collection, and structural validation • RDF is well-suited for expressive logical assertions, querying, and inference. • RDF graphs can be created, updated, deleted, etc. (managed) using a particular XML vocabulary • vocabulary can be specific to a particular purpose • GRDDL facilitates mutually-beneficial use of XML and RDF processing and representation
  • 12. SPARQL • The query language for RDF content • It operates over an RDF dataset • composed of named RDF graphs (each named by a URI) and a single RDF graph without a name • Operationally and structurally similar to SQL • Many implementations (including the ones we used) build on existing relational database management systems • translate SPARQL queries into SQL queries Elliott et al. A complete translation from SPARQL into efficient SQL. 2009 http://www.w3.org/TR/sparql11-query/
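As a rough illustration of managing an RDF graph through a purpose-specific XML vocabulary, the sketch below derives triples from a small XML document. Everything in it is an assumption made for the example: the XML element names, the `http://example.org/` vocabulary, and the `transform` function, which merely stands in for the transformation (typically XSLT) that a GRDDL-aware document would reference.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # hypothetical vocabulary namespace

# A made-up XML dialect for a patient record document.
XML_RECORD = """
<record id="record1">
  <patient id="patient1"/>
  <operation id="op1" type="CABG"/>
</record>
"""


def transform(xml_text):
    """Derive a set of (subject, predicate, object) triples from the XML."""
    root = ET.fromstring(xml_text)
    record = EX + root.get("id")
    triples = set()
    patient = root.find("patient")
    if patient is not None:
        triples.add((record, EX + "about", EX + patient.get("id")))
    for op in root.findall("operation"):
        op_uri = EX + op.get("id")
        triples.add((record, EX + "documents", op_uri))
        triples.add((op_uri, EX + "procedureType", op.get("type")))
    return triples


triples = transform(XML_RECORD)
```

The XML document remains the unit that is collected and validated, while the derived triples are what gets queried and reasoned over.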
  • 13. OWL • Language for describing and constraining the semantics of an RDF vocabulary • Such constraints (often hierarchical) are called ontologies • An ontology specifies a conceptualization of a particular domain as categories, relationships between them, and constraints on both • By defining an OWL document for the terms in an RDF graph, additional RDF sentences can be inferred • Additionally, an RDF graph can be determined to be consistent or inconsistent with respect to the ontology • Both tasks can be performed by a logical reasoning engine
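How an ontology lets additional RDF sentences be inferred can be sketched with a toy forward-chaining loop. This is not an OWL reasoner: it applies only two subclass rules, the class names are hypothetical, and the prefixed strings such as `rdf:type` are plain Python strings standing in for URIs.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer_types(triples):
    """Apply two subclass rules repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                # Transitivity: s subClassOf o, o subClassOf o2.
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # Membership: s2 is a member of s, s subClassOf o.
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if new <= inferred:
            return inferred
        inferred |= new


asserted = {
    ("ex:CABG", SUBCLASS, "ex:CardiacSurgery"),
    ("ex:CardiacSurgery", SUBCLASS, "ex:Operation"),
    ("ex:op1", TYPE, "ex:CABG"),
}
closure = infer_types(asserted)
```

Only one type was asserted for `ex:op1`, yet the closure also classifies it as an `ex:CardiacSurgery` and an `ex:Operation`, so a query for all operations would find it.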
  • 14. Semantic Database (SDB) • Cleveland Clinic’s Heart and Vascular Institute (HVI) • Challenges: • fragmented gathering and storing of clinical research data • compartmentalization of medical science and practice • clinical knowledge is often expressed in ambiguous, idiosyncratic terminology • problematic for longitudinal patient data that can feasibly span multiple, geographically separated sources and disciplines • Longitudinal patient record: • patient records from different times, providers, and sites of care that are linked to form a lifelong view of a patient’s health care experience Institute of Medicine. The computer-based patient record: an essential technology for health care. 1997 http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClinic/
  • 15. Project goals • Create a framework for context-free data management • Usable for any domain with nothing (or little) assumed about the domain • Expert-provided, domain-specific knowledge is used to control most aspects of • Data entry • Storage • Display • Retrieval • Formatting for external systems
  • 16. Components • Content repository • supports data collection, document management, and knowledge representation for use in managing longitudinal clinical data • manages patient record documents as XML and converts them to RDF graphs for downstream semantic processing • Data collection workflow management • process of transcribing details of a heart procedure from the EHR into a registry • RDF used as the state machine of a workflow engine Pierce et al. SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting. 2012 Ogbuji. A Role for Semantic Web Technologies in Patient Record Data Collection. 2009
  • 17. Workflow State as RDF Dataset • Each task is an XML document in a content repository • Mirrored into a named RDF graph that shares a web location (the name) with the document • (SPARQL) query is dispatched against a workflow dataset to find tasks in particular states or assigned to particular people • Applications interact with task information and fetch: • JSON and XML representations (for client-side web applications) • XHTML documents that render as faceted views of a collection of tasks • faceted view includes links to subsequent stages in workflow and into other web applications on server
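The lookup of tasks by state or assignee might be sketched as follows, with the workflow dataset modeled as a mapping from graph name to a set of triples. The task URIs, the `state` and `assignedTo` properties, and the `find_tasks` helper are all hypothetical; in the system described, this lookup is a SPARQL query over the workflow dataset.

```python
EX = "http://example.org/workflow#"  # hypothetical vocabulary
TASKS = "http://example.org/tasks/"  # hypothetical document locations

# One named graph per task; the graph name is the task document's location.
dataset = {
    TASKS + "1": {
        (TASKS + "1", EX + "state", "pending"),
        (TASKS + "1", EX + "assignedTo", "alice"),
    },
    TASKS + "2": {
        (TASKS + "2", EX + "state", "complete"),
        (TASKS + "2", EX + "assignedTo", "alice"),
    },
    TASKS + "3": {
        (TASKS + "3", EX + "state", "pending"),
        (TASKS + "3", EX + "assignedTo", "bob"),
    },
}


def find_tasks(dataset, state=None, assignee=None):
    """Return the names of task graphs that match the given constraints."""
    matches = []
    for name, graph in dataset.items():
        if state is not None and (name, EX + "state", state) not in graph:
            continue
        if assignee is not None and (name, EX + "assignedTo", assignee) not in graph:
            continue
        matches.append(name)
    return sorted(matches)


pending = find_tasks(dataset, state="pending")
alice_pending = find_tasks(dataset, state="pending", assignee="alice")
```

Because each graph name is also the web location of the task document, a match can be dereferenced directly to fetch the corresponding representation.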
  • 18.
  • 19. Reporting challenges • Reporting places a heavy burden on institutions to produce data in specific formats with precise definitions • Definitions vary across reports • makes it difficult to use the same source data for all reports • Institutions are typically forced to manually abstract the data for each report • This is done separately to conform to the requirements for each report Pierce et al. SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting. 2012
  • 20. Components: reporting • Quality and outcomes reporting • generate outcomes reports both for internal and external consumption • internal reports are generated monthly and external reports are generated quarterly • quarterly reports submitted to Society of Thoracic Surgeons (STS) Adult Cardiac Surgery National Database and American College of Cardiology (ACC) CathPCI Database • submissions are required for certification Pierce et al. SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting. 2012
  • 21.
  • 22. Cohort identification • SPARQL and RDF datasets are well-suited as infrastructure for a longitudinal patient record data warehouse • HVI software development team partnered with Cycorp to build a cohort identification interface called the Semantic Research Assistant (SRA) • Based on the Cyc inference engine • a powerful reasoning system and knowledge base with built-in capability for natural language (NL) processing, forward-chaining inference and backward-chaining inference. • incorporates Cyc's NL processing to permit a user to compose a cohort selection query by typing an English sentence or sentence fragment Lenat et al. Harnessing Cyc to Answer Clinical Researchers' Ad Hoc Queries. 2010.
  • 23.
  • 24. RDF dataset warehouse • CycL to SPARQL • domain-specific medical ontologies in conjunction with the Cyc general ontology are used to convert the NL query into a formal representation and then into SPARQL queries. • SPARQL queries are submitted to the SemanticDB RDF store for execution • Cleveland Clinic’s registry of 200,000 patient records comprises an RDF graph of roughly 80 million RDF assertions
  • 25. Dataset topology • An RDF dataset with no default graph and one named graph per patient record (a patient record graph) • Beyond identifying the cohort, most subsequent query processing happens within a single patient record graph • In our vocabulary, there are instances of PatientRecord, Operation, Patient, MedicalEvent, HospitalEpisode, etc. • PatientRecord resources share a URI with their containing graph
  • 26. • GRAPH operator can be used to optimize the search space • Optimal for the following cohort querying paradigm • Constraints in the first part of the query are cross-graph and those in the second part are intra-graph
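The two-part querying paradigm can be sketched in plain Python, again modeling the dataset as one named graph per patient record. The property names, record URIs, and dates are invented for the example, and the two functions only imitate what a SPARQL query would express by binding a graph name with the GRAPH operator and then matching further patterns inside that graph.

```python
EX = "http://example.org/"  # hypothetical vocabulary


def record_graph(record, op, procedure, date):
    """Build a small patient record graph; `record` doubles as the graph name."""
    return {
        (record, "rdf:type", EX + "PatientRecord"),
        (record, EX + "documents", op),
        (op, "rdf:type", EX + "Operation"),
        (op, EX + "procedureType", procedure),
        (op, EX + "date", date),
    }


dataset = {
    EX + "record1": record_graph(EX + "record1", EX + "op1", "CABG", "2009-03-02"),
    EX + "record2": record_graph(EX + "record2", EX + "op2", "PCI", "2010-07-15"),
    EX + "record3": record_graph(EX + "record3", EX + "op3", "CABG", "2011-01-20"),
}


def identify_cohort(dataset, procedure):
    """Cross-graph step: scan every graph and keep the names that match."""
    return sorted(
        name
        for name, graph in dataset.items()
        if any(p == EX + "procedureType" and o == procedure for _, p, o in graph)
    )


def operation_dates(graph):
    """Intra-graph step: read values from a single patient record graph."""
    return sorted(o for _, p, o in graph if p == EX + "date")


cohort = identify_cohort(dataset, "CABG")
results = {name: operation_dates(dataset[name]) for name in cohort}
```

Once the cohort is fixed, each later lookup touches a single record graph rather than the whole dataset, which is where restricting the search space pays off.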
  • 27. Use of system • From 2009 through June of 2011 • over 200 clinical investigations utilized SemanticDB to identify study cohorts and retrieve appropriate data for analysis • studies ranged from relatively simple feasibility assessments to extremely complex investigations of time-related events and competing risks of the patient experiencing a certain outcome after treatment • prior cohort identification and data export queries for studies would have been performed by a skilled database administrator (DBA) interpreting instructions from domain experts • Using SemanticDB and the SRA, a non-technical domain expert performed most of the queries