Role of Semantic Web
           in Health Informatics

Tutorial at 2012 ACM SIGHIT International Health Informatics
           Symposium (IHI 2012), January 28-30, 2012

       Satya S. Sahoo, GQ Zhang
        Division of Medical Informatics, Case Western Reserve University
       Amit Sheth
        Kno.e.sis Center, Wright State University
Outline
• Semantic Web
   o Introductory Overview
• Clinical Research
   o Physio-MIMI
• Bench Research and Provenance
   o Semantic Problem Solving Environment for T.cruzi
• Clinical Practice
   o Active Semantic Electronic Medical Record
Semantic Web
Landscape of Health Informatics
[Figure: Clinical Research, Bench Research, and Clinical Practice, with shared concerns: Patient Care, Personalized Medicine, Drug Development, Privacy, Cost]
* Images from case.edu
Challenges
• Information Integration: Reconcile heterogeneity
  o Syntactic Heterogeneity: DOB vs. Date of Birth
  o Structural Heterogeneity: Street + Apt + City vs.
    Address
  o Semantic Heterogeneity: Age vs. Age at time of surgery
    vs. Age at time of admission
• Humans can (often) interpret such data accurately, but doing so is
  extremely difficult for machines
  o Role for Metadata/Contextual Information/Semantics
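As a hedged illustration of the heterogeneity listed above, the sketch below maps two hypothetical patient records onto one shared vocabulary. The alias table, the sample values, and the `normalize` helper are assumptions made for this example; they are not taken from any system described in the tutorial.

```python
# Hypothetical alias table: syntactic variants of one concept map to one shared term.
ALIASES = {"DOB": "date_of_birth", "Date of Birth": "date_of_birth"}

# Hypothetical structural rule: these parts together form one "Address" value.
ADDRESS_PARTS = ("Street", "Apt", "City")


def normalize(record):
    """Return a copy of record that uses the shared vocabulary."""
    out = {}
    for key, value in record.items():
        if key in ADDRESS_PARTS:
            continue  # handled below as a single structural unit
        out[ALIASES.get(key, key)] = value
    parts = [record[p] for p in ADDRESS_PARTS if p in record]
    if parts:
        out["Address"] = ", ".join(parts)
    return out


site_a = {"DOB": "1950-03-01", "Street": "10 Main St", "Apt": "4B", "City": "Cleveland"}
site_b = {"Date of Birth": "1950-03-01", "Address": "10 Main St, 4B, Cleveland"}

print(normalize(site_a) == normalize(site_b))  # True
```

Renaming and regrouping handle only the syntactic and structural cases. Semantic heterogeneity (Age vs. Age at time of surgery) cannot be resolved this way, because the fields mean different things; that is where explicit metadata and context are needed.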
Semantic Web
• Web of Linked Data
• Introduced by Berners-Lee et al. as the next step for the
  Web of Documents
• Allows “machine understanding” of data
• Creates “common” models of domains using a
  formal language: ontologies
[Figure: Semantic Web Layer Cake. Image source: http://www.w3.org]
Resource Description Framework
[Figure: Company IBM, with Location values “Armonk, New York, United States” and “Zurich, Switzerland”]
• Resource Description Framework – Recommended by
  W3C for metadata modeling [RDF]
• A standard common modeling framework – usable by
  humans and machine understandable
RDF: Triple Structure, IRI, Namespace
[Figure: IBM → Headquarters located in → Armonk, New York, United States]
• RDF Triple
   o Subject: The resource that the triple is about
   o Predicate: The property of the subject that is described by the triple
   o Object: The value of the property
• Web Addressable Resource: Uniform Resource Locator (URL), Uniform
  Resource Identifier (URI), Internationalized Resource Identifier (IRI)
• Qualified Namespace: http://www.w3.org/2001/XMLSchema#
  as xsd:
   o xsd:string instead of http://www.w3.org/2001/XMLSchema#string
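The triple structure and namespace shorthand can be sketched in a few lines of Python. This is a minimal illustration, not an RDF library: the `companyExample` namespace IRI and the local names `headquartersLocatedIn` and `Armonk` are placeholders invented for the example; only the `xsd` namespace comes from the slide.

```python
# Prefix table: short name -> namespace IRI. The xsd entry is from the slide;
# the companyExample namespace IRI is a made-up placeholder.
PREFIXES = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "companyExample": "http://example.org/company#",
}


def expand(qname):
    """Expand a qualified name such as 'xsd:string' into a full IRI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


# One RDF triple as a (subject, predicate, object) tuple of IRIs.
triple = (
    expand("companyExample:IBM"),
    expand("companyExample:headquartersLocatedIn"),  # hypothetical property name
    expand("companyExample:Armonk"),                 # hypothetical resource name
)

print(expand("xsd:string"))  # http://www.w3.org/2001/XMLSchema#string
print(triple)
```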
RDF Representation
• Two types of property values in a triple
   o Web resource: IBM → Headquarters located in → Armonk, New York, United States
   o Typed literal: IBM → Has total employees → “430000”^^xsd:integer
 • The graph model of RDF (node-arc-node) is the primary
   representation model
 • Secondary notations: Triple notation
    o companyExample:IBM companyExample:has-Total-Employee “430000”^^xsd:integer .
RDF Schema
[Figure: instance level and schema level
   IBM → Headquarters located in → Armonk, New York, United States
   Oracle → Headquarters located in → Redwood Shores, California, United States
   Company → Headquarters located in → Geographical Location]
• RDF Schema: Vocabulary for describing groups of
  resources [RDFS]
RDF Schema
 • Property domain (rdfs:domain) and range (rdfs:range)
     Domain: Company → Headquarters located in → Range: Geographical Location
 • Class Hierarchy/Taxonomy: rdfs:subClassOf
     SubClass → rdfs:subClassOf → (Parent) Class
     Computer Technology Company → rdfs:subClassOf → Company
     Banking Company → rdfs:subClassOf → Company
     Insurance Company → rdfs:subClassOf → Company
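To make the effect of rdfs:domain, rdfs:range, and rdfs:subClassOf concrete, here is a minimal sketch that applies three RDFS-style rules to a handful of triples. It covers only a small subset of RDFS entailment (for example, subclass transitivity is left out), and the `ex:` names are illustrative assumptions rather than identifiers from the tutorial.

```python
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN, RANGE, TYPE = "rdfs:domain", "rdfs:range", "rdf:type"

# Schema and instance triples as plain tuples (illustrative names).
triples = {
    ("ex:ComputerTechnologyCompany", SUBCLASS_OF, "ex:Company"),
    ("ex:headquartersLocatedIn", DOMAIN, "ex:Company"),
    ("ex:headquartersLocatedIn", RANGE, "ex:GeographicalLocation"),
    ("ex:IBM", TYPE, "ex:ComputerTechnologyCompany"),
    ("ex:Oracle", "ex:headquartersLocatedIn", "ex:RedwoodShores"),
}


def infer(graph):
    """Apply three RDFS-style rules until no new triple appears."""
    graph = set(graph)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                # An instance of a subclass is an instance of the parent class.
                if p == TYPE and p2 == SUBCLASS_OF and o == s2:
                    new.add((s, TYPE, o2))
                # The subject of a property belongs to the property's domain.
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
                # The object of a property belongs to the property's range.
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
        if new <= graph:
            return graph
        graph |= new


inferred = infer(triples)
print(("ex:IBM", TYPE, "ex:Company") in inferred)     # True, via subClassOf
print(("ex:Oracle", TYPE, "ex:Company") in inferred)  # True, via domain
```

Note that domain and range are used here to infer new facts, not to reject data, which matches how RDFS treats them.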
Ontology: A Working Definition
• Ontologies are shared conceptualizations of a
  domain represented in a formal language*
• Ontologies in health informatics:
      o Common representation model - facilitate
        interoperability, integration across different projects,
        and enforce consistent use of terminology
      o Closely reflect domain-specific details (domain
        semantics) essential to answering end-user queries
      o Support reasoning to discover implicit knowledge

* Paraphrased from Gruber, 1993
OWL2 Web Ontology Language
• A language for modeling ontologies [OWL]
• OWL2 is declarative
• An OWL2 ontology (schema) consists of:
  o Entities: Company, Person
  o Axioms: Company employs Person
  o Expressions: A Person Employed by a Company =
    CompanyEmployee
• Reasoning: Draw a conclusion given certain
  constraints are satisfied
  o RDF(S) Entailment
  o OWL2 Entailment
OWL2 Constructs

• Class Disjointness: An instance of class A cannot be an
  instance of class B
• Complex Classes: Combining multiple classes with
  set theory operators:
  o Union: Parent = ObjectUnionOf(:Mother :Father)
  o Logical negation: UnemployedPerson =
    ObjectComplementOf(:EmployedPerson)
  o Intersection: Mother = ObjectIntersectionOf(:Parent
    :Woman)
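The set-theoretic reading of these constructs can be sketched with Python sets, treating each class as the set of its known members. The individuals are hypothetical, and the sketch is a closed-world simplification: an OWL2 reasoner works under the open-world assumption, so a complement there is not simply "everyone not listed".

```python
# Hypothetical individuals; each class is modeled as the set of its members.
person = {"alice", "bob", "carol", "dave"}
woman = {"alice", "carol"}
mother = {"alice"}
father = {"bob"}
employed_person = {"alice", "dave"}

# Union: Parent = ObjectUnionOf(:Mother :Father)
parent = mother | father

# Intersection: Mother = ObjectIntersectionOf(:Parent :Woman)
mother_again = parent & woman

# Negation, taken here relative to Person (closed-world simplification).
unemployed_person = person - employed_person

# Disjointness: the two classes share no instance.
disjoint = mother.isdisjoint(father)

print(sorted(parent))             # ['alice', 'bob']
print(mother_again == mother)     # True
print(sorted(unemployed_person))  # ['bob', 'carol']
print(disjoint)                   # True
```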
OWL2 Constructs

• Property restrictions: classes defined by constraints over a property
• Existential Quantification:
  o Parent = ObjectSomeValuesFrom(:hasChild :Person)
  o To capture incomplete knowledge
• Universal Quantification:
  o US President = ObjectAllValuesFrom(:hasBirthPlace
    :UnitedStates)
• Cardinality Restriction
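The difference between existential and universal restrictions is easy to miss, so here is a small sketch over hypothetical property assertions. The helper names `some_values_from` and `all_values_from` are invented for this example and evaluate only the facts listed, which is a closed-world simplification of what an OWL2 reasoner does.

```python
# Hypothetical property assertions: individual -> set of property values.
has_child = {"alice": {"carol"}, "bob": {"carol"}}
has_birth_place = {"ann": {"UnitedStates"}, "ben": {"Germany"}}
person = {"alice", "bob", "carol"}


def some_values_from(prop, cls):
    """Individuals with at least one value of prop in cls (existential)."""
    return {s for s, values in prop.items() if values & cls}


def all_values_from(prop, cls, individuals):
    """Individuals whose known values of prop all lie in cls (universal)."""
    return {s for s in individuals if prop.get(s, set()) <= cls}


parents = some_values_from(has_child, person)
us_born = all_values_from(has_birth_place, {"UnitedStates"}, {"ann", "ben", "carol"})

print(sorted(parents))  # ['alice', 'bob']
print(sorted(us_born))  # ['ann', 'carol']
```

"carol" satisfies the universal restriction vacuously because she has no birth place recorded at all. This is a common surprise with universal quantification, and one reason it is often combined with an existential or cardinality restriction.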
SPARQL: Querying Semantic Web Data

• A SPARQL query pattern is composed of triples
• Triples correspond to the RDF triple structure, but
  can have a variable at:
  o Subject: ?company ex:hasHeadquaterLocation ex:NewYork .
  o Predicate: ex:IBM ?whatislocatedin ex:NewYork .
  o Object: ex:IBM ex:hasHeadquaterLocation ?location .
• The result of a SPARQL query is a list of values;
  these values can replace the variables in the query pattern
SPARQL: Query Patterns
• An example query pattern
PREFIX ex: <http://www.eecs600.case.edu/>
SELECT ?company ?location WHERE
{ ?company ex:hasHeadquaterLocation ?location . }
• Query Result (multiple matches)

   company                 location
   IBM                     NewYork
   Oracle                  RedwoodCity
   MicrosoftCorporation    Bellevue
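How a single triple pattern produces the rows above can be sketched without a SPARQL engine. The `match` function below is a hypothetical helper written for this illustration; it handles one triple pattern only, whereas a real SPARQL processor also joins multiple patterns, applies filters, and so on. The graph content mirrors the example result table.

```python
# Tiny in-memory graph with the three companies from the result table.
GRAPH = [
    ("ex:IBM", "ex:hasHeadquaterLocation", "ex:NewYork"),
    ("ex:Oracle", "ex:hasHeadquaterLocation", "ex:RedwoodCity"),
    ("ex:MicrosoftCorporation", "ex:hasHeadquaterLocation", "ex:Bellevue"),
]


def match(pattern, graph):
    """Yield one variable binding (a dict) per triple matching the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable would need two different values
            elif term != value:
                break
        else:
            yield binding


rows = list(match(("?company", "ex:hasHeadquaterLocation", "?location"), GRAPH))
for row in rows:
    print(row["?company"], row["?location"])
```

Replacing a variable with a constant narrows the result: the pattern `("?company", "ex:hasHeadquaterLocation", "ex:NewYork")` binds `?company` to `ex:IBM` only.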
SPARQL: Query Forms
• SELECT: Returns the values bound to the variables
• CONSTRUCT: Returns an RDF graph
• DESCRIBE: Returns a description (RDF graph) of
  a resource (e.g. IBM)
  o The content of the RDF graph is determined by the SPARQL
    query processor
• ASK: Returns a Boolean
  o True
  o False
Semantic Web + Clinical Research Informatics =
              Physio-MIMI
Physio-MIMI Overview
• Physio-MIMI: Multi-Modality, Multi-Resource Environment for
  Physiological and Clinical Research
• NCRR-funded, multi-CTSA-site project (RFP 08-001) for
  providing informatics tools to clinical investigators and clinical
  research teams at and across CTSA institutions to enhance the
  collection, management and sharing of data
• Collaboration among Case Western, U Michigan, Marshfield
  Clinic and U Wisconsin Madison
• Use Sleep Medicine as an exemplar, but also generalizable
• Two year duration: Dec 2008 – Dec 2010
Features of Physio-MIMI
• Federated data integration environment
   – Linking existing data resources without a centralized data
     repository
• Query interface directly usable by clinical researchers
   – Minimize the role of the data-access middleman
• Secure and policy-compliant data access
   – Fine-grained access control, dual SSL, auditing
• Tools for curating PSGs
[Figure: Physio-MIMI, with labels Data Integration Framework and SHHS Portal]
Data Access, Secondary Use
Measure success not by the size of the database, but by the
number of secondary studies it supports
Query Interface – driven by access
•   Visual Aggregator and Explorer (VISAGE)
•   Federated, Web-based
•   Driven by Domain Ontology (SDO)
•   PhysioMap to connect autonomous data sources


[Figure: two data-access workflows among Clinical Investigator, Data Analyst / Data Manager, and Database, each with steps numbered 1 to 3]
• GQ Zhang et al. VISAGE: A Query Interface for Clinical Research, Proceedings of the 2010 AMIA
  Clinical Research Informatics Summit, San Francisco, March 12-13, pp. 76-80, 2010
Physio-MIMI Components
[Figure: Physio-MIMI architecture, labels from top to bottom
   Users: Sleep Researcher, Domain Expert, Informatician
   META SERVER: VISAGE, Query Builder, Query Manager, Query Explorer, DB-Ontology Mapper
   DATA SERVER: Institutional Databases, each with an Institutional Firewall]
VISAGE screenshot
Components of VISAGE
Case Control Study Design
• Case-control is a common study design
• Used for epidemiological studies involving two cohorts,
  one representing the cases and the second representing the controls
• Adjusting the matching ratio to improve statistical power
Example (CFS)
• Suppose we are interested in the question of whether
  sleep parameters (EEG) differ by obesity in age and race
  matched males
• Case: adult 55-75, male, BMI 35-50 (obese)
• Control: adult 55-75, male, BMI 20-30 (non-obese)
• Matching 1:2 on race (minimize race as a factor initially)
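The idea of 1:k matching on one attribute can be sketched as follows. This is a simple greedy assignment written for illustration; the slides do not describe the algorithm VISAGE actually uses, and the subjects, identifiers, and the `match_controls` helper are hypothetical.

```python
from collections import defaultdict


def match_controls(cases, controls, key, ratio):
    """Greedily assign up to `ratio` unused controls to each case on `key`."""
    pool = defaultdict(list)
    for control in controls:
        pool[control[key]].append(control)
    matched = {}
    for case in cases:
        candidates = pool[case[key]]
        # Take the first `ratio` controls and remove them from the pool.
        matched[case["id"]] = [c["id"] for c in candidates[:ratio]]
        del candidates[:ratio]
    return matched


# Hypothetical subjects, assumed already filtered to adult 55-75, male,
# and split into cases and controls by BMI range.
cases = [{"id": "c1", "race": "A"}, {"id": "c2", "race": "B"}]
controls = [
    {"id": "n1", "race": "A"}, {"id": "n2", "race": "B"},
    {"id": "n3", "race": "A"}, {"id": "n4", "race": "A"},
]

print(match_controls(cases, controls, key="race", ratio=2))
# {'c1': ['n1', 'n3'], 'c2': ['n2']}
```

Case `c2` receives only one control because the pool runs out of matching subjects. Raising the ratio makes such shortfalls more likely, which is one reason to draw controls from an additional data source.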
Adult 55-75, male, BMI 35-50
Adult 55-75, male, BMI 20-30
Set up 1:2 Matching
1:2 Matching Result



[Screenshot: Case, Control, and Matched cohorts]
1:5 Matching?
1:5 Matching – CFS+SHHS
Modify Control to Include TWO data sources
Sleep Domain Ontology (SDO)
•   Standardize terminology and semantics (define variations) [RO]
•   Facilitate definition of data elements
•   Valuable for data collection, data curation
•   Data integration
•   Data sharing and access
•   Take advantage of progress in related areas (e.g. Gene Ontology)
•   Improving data quality – provenance, reproducibility
Sleep Domain Ontology (SDO)
   https://mimi.case.edu/concepts
VISAGE Query Builder showing a data query on Parkinsonian Disorders and REM sleep
behavior disorder with race demographics
Semantic Web + Provenance + Bench Research =
T.cruzi Semantic Problem Solving Environment
Semantic Problem Solving Environment for T.cruzi
Provenance in Scientific Experiments




[Figure: New Parasite Strains]
Provenance in Scientific Experiments
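[Figure: gene knockout experiment workflow
   Gene Name → Sequence Extraction → 3' & 5' Region
   3' & 5' Region + Drug Resistant Plasmid → Plasmid Construction → Knockout Construct Plasmid
   Knockout Construct Plasmid + T.cruzi sample → Transfection → Transfected Sample
   Transfected Sample → Drug Selection → Selected Sample
   Selected Sample → Cell Cloning → Cloned Sample
   Callout: Cloned Sample → ? → Gene Name]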
Provenance in Scientific Experiments
• Provenance, from the French word “provenir”, describes the lineage or
  history of a data entity
• For Verification and Validation of Data Integrity, Process Quality, and
  Trust
• Semantic Provenance Framework addresses three aspects [Prov]
   o Provenance Modeling
   o Provenance Query Infrastructure
   o Scalable Provenance System
[Figure: the gene knockout experiment workflow, from Gene Name to Cloned Sample]
Domain-specific Provenance ontology
[Figure: PROVENIR ONTOLOGY extended by the PARASITE EXPERIMENT ONTOLOGY
   Class labels: agent, data, process, data_collection, parameter, spatial_parameter,
   temporal_parameter, domain_parameter, location, sample, Tcruzi_sample, transfection,
   drug_selection, cell_cloning, strain_creation_protocol, transfection_machine,
   transfection_buffer, Time:DateTimeDescription
   Relation labels: is_a, subPropertyOf, has_agent, has_input_value, has_parameter,
   has_temporal_parameter]
• Total Number of Classes - 118
• DL Expressivity – ALCHQ(D)
Provenance Query Classification
Classified Provenance Queries into Three Categories
• Type 1: Querying for Provenance Metadata
   o Example: Which gene was used to create the cloned sample with ID = 66?
• Type 2: Querying for Specific Data Set
   o Example: Find all knockout construct plasmids created by researcher
     Michelle using “Hygromycin” drug resistant plasmid between April 25, 2008
     and August 15, 2008
• Type 3: Operations on Provenance Metadata
   o Example: Were the two cloned samples 65 and 46 prepared under
     similar conditions – compare the associated provenance
     information
Provenance Query Operators
Four Query Operators – based on Query
  Classification
• provenance() – Closure operation, returns the
  complete set of provenance metadata for an input data
  entity
• provenance_context() – Given a set of constraints
  defined on provenance, retrieves datasets that
  satisfy the constraints
• provenance_compare() – Compares two sets of provenance
  information by adapting the RDF graph equivalence definition
• provenance_merge() – Two sets of provenance
  information are combined using the RDF graph
  merge
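Three of the operators can be approximated over a toy provenance graph. This is a rough sketch under strong assumptions: the entity identifiers and the `derived_from` relation are invented, and all triples are ground (no blank nodes), so graph merge reduces to set union and graph equivalence to set equality. The framework in [Prov] works over RDF with ontology-defined relations and is considerably richer.

```python
# Hypothetical provenance graph: (entity, relation, source) edges pointing
# from a result back to what it was derived from.
EDGES = {
    ("cloned_sample_66", "derived_from", "selected_sample_12"),
    ("selected_sample_12", "derived_from", "transfected_sample_7"),
    ("transfected_sample_7", "derived_from", "knockout_plasmid_3"),
    ("knockout_plasmid_3", "derived_from", "gene_X"),
    ("cloned_sample_65", "derived_from", "selected_sample_11"),
}


def provenance(entity, edges):
    """Closure: every edge reachable by following edges back from entity."""
    result, frontier = set(), [entity]
    while frontier:
        current = frontier.pop()
        for edge in edges:
            if edge[0] == current and edge not in result:
                result.add(edge)
                frontier.append(edge[2])
    return result


def provenance_merge(a, b):
    """Merge two provenance sets; for ground triples this is set union."""
    return a | b


def provenance_compare(a, b):
    """Naive comparison: equal sets of ground triples."""
    return a == b


p66 = provenance("cloned_sample_66", EDGES)
p65 = provenance("cloned_sample_65", EDGES)
print(("knockout_plasmid_3", "derived_from", "gene_X") in p66)  # True
print(provenance_compare(p66, p65))                             # False
print(len(provenance_merge(p66, p65)))                          # 5
```

The closure step answers a Type 1 query such as "which gene was used to create cloned sample 66" by walking back to the start of the workflow.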
Answering Provenance Queries using provenance ()
                 Operator
Implementation: Provenance Query Engine
• Three modules:
  o Query Composer
  o Transitive closure
  o Query Optimizer
• Deployable over an RDF store with support for reasoning
[Figure: query engine modules, including QUERY OPTIMIZER and TRANSITIVE CLOSURE]
Application in T.cruzi SPSE Project



                     • Provenance tracking
                       for gene knockout,
                       strain creation,
                       proteomics, microarray
                       experiments
                     • Part of the Parasite
                       Knowledge Repository
                       [BKR]
W3C Provenance Working Group
• Define a “provenance interchange language for
  publishing and accessing provenance”
• Three working drafts:
  o PROV-Data Model: A conceptual model for
    provenance representation
  o PROV-Ontology: An OWL ontology for provenance
    representation
  o PROV-Access and Query: A framework to query
    and retrieve provenance on the Web
Semantic Web + Clinical Practice Informatics = Active
  Semantic Electronic Medical Record (ASEMR)
Semantic Web application in use
In daily use at Athens Heart Center
  – 28 person staff
     • Interventional Cardiologists
     • Electrophysiology Cardiologists
  – Deployed since January 2006
  – 40-60 patients seen daily
  – 3000+ active patients
  – Serves a population of 250,000 people
Information Overload in Clinical
                 Practice
• New drugs added to market
  – Adds interactions with current drugs
  – Changes possible procedures to treat an illness
• Insurance coverage changes
  – Insurance may pay for drug X but not drug Y even
    though drug X and Y are equivalent
  – Patient may need a certain diagnosis before some
    expensive tests are run
• Physicians need a system to keep track of the ever-changing
  landscape
System throughout the practice
Active Semantic Document (ASD)
A document (typically in XML) with the following features:

• Semantic annotations
   – Linking entities found in a document to ontology
   – Linking terms to a specialized lexicon [TR]


• Actionable information
   – Rules over semantic annotations
   – Violated rules can modify the appearance of the document (Show an
     alert)
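The combination of semantic annotations and rules can be sketched as follows. The drug names, the annotation layout, and the `interaction_alerts` helper are hypothetical and chosen only for illustration. Per the "Semantic Technologies in Use" slide, ASEMR expresses its rules in RDQL over OWL/RDF; Python is used here only as a stand-in.

```python
# Hypothetical annotations extracted from one patient document: each entity
# found in the text is linked to an ontology class.
annotations = [
    {"text": "DrugA", "class": "prescription_drug"},
    {"text": "DrugB", "class": "prescription_drug"},
]

# Hypothetical interaction facts that a drug ontology could supply.
INTERACTIONS = {frozenset({"DrugA", "DrugB"})}


def interaction_alerts(annotations, interactions):
    """Return one alert message per interacting pair of annotated drugs."""
    drugs = [a["text"] for a in annotations if a["class"] == "prescription_drug"]
    alerts = []
    for i, first in enumerate(drugs):
        for second in drugs[i + 1:]:
            if frozenset({first, second}) in interactions:
                alerts.append(f"ALERT: {first} interacts with {second}")
    return alerts


print(interaction_alerts(annotations, INTERACTIONS))
# ['ALERT: DrugA interacts with DrugB']
```

In an active document, a non-empty alert list would change how the document is rendered, for example by highlighting the two drug mentions.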
Active Semantic Patient Record
• An application of ASD
• Three Ontologies
  – Practice
     Information about practice such as patient/physician data
  – Drug
     Information about drugs, interaction, formularies, etc.
  – ICD/CPT
     Describes the relationships between CPT and ICD codes
• Medical Records in XML created from database
Practice Ontology Hierarchy
(showing is-a relationships)
[Figure: classes under owl:thing: facility, ancillary, ambulatory_episode, encounter, event,
 person, patient, practitioner, insurance, insurance_carrier, insurance_plan, insurance_policy]
Drug Ontology Hierarchy
(showing is-a relationships)
[Figure: classes under owl:thing: property, formulary_property, interaction_property,
 indication_property, prescription_drug_property, formulary, indication, non_drug_reactant,
 monograph_ix_class, cpnum_group, prescription_drug, prescription_drug_brand_name,
 brandname_individual, brandname_composite, brandname_undeclared, prescription_drug_generic,
 generic_individual, generic_composite, interaction, interaction_with_non_drug_reactant,
 interaction_with_monograph_ix_class, interaction_with_prescription_drug]
Drug Ontology showing neighborhood of
       PrescriptionDrug concept
Part of Procedure/Diagnosis/ICD9/CPT Ontology
[Figure labels: diagnosis, procedure, specificity, maps_to_diagnosis, maps_to_procedure]
Semantic Technologies in Use
• Semantic Web: OWL, RDF/RDQL, Jena
   – OWL (constraints useful for data consistency), RDF
   – Rules are expressed as RDQL
   – REST Based Web Services: from server side
• Web 2.0: client makes AJAX calls to the ontology, also used for
  auto-complete
Problem:
• Jena in-memory model: large memory footprint, a future scalability
  challenge
• Using Jena’s persistent model (MySQL) is noticeably slower
Architecture & Technology
Benefits: Athens Heart Center Practice Growth
[Chart: Appointments per Month (y-axis 400 to 1400, x-axis jan to dec), one series per year: 2003, 2004, 2005, 2006]
Chart Completion before the preliminary
               deployment of the ASEMR
[Chart: Charts per Month/Year in 2004 and 2005 (y-axis 0 to 600), series: Same Day, Back Log]
Chart Completion after the preliminary
                deployment of the ASEMR
[Chart: Charts per Month/Year (y-axis 0 to 700; x-axis Sept 05, Nov 05, Jan 06, Mar 06), series: Same Day, Back Log]
Benefits of current system
• Error prevention (drug interactions, allergy)
  – Patient care
  – insurance
• Decision Support (formulary, billing)
  – Patient satisfaction
  – Reimbursement
• Efficiency/time
  – Real-time chart completion
  – “semantic” and automated linking with billing
Demo


     On-line demo of Active Semantic Electronic Medical Record
     deployed and in use at Athens Heart Center
Challenges, Opportunities, and Future
             Direction
Conclusions
Benefits of SW in Health Informatics:
• RDF is a “universal” data model; application-purpose
  agnostic (clinical care vs. research)
• Integration “ready,” supporting distributed query
  out of the box
• Semantic interoperability addressed at root level
• Better support of user interfaces for data capture,
  data query, data integration
• Scalability demonstrated
Challenges and Future Directions
• Design and implementation of health information systems
  with RDF as primary data store from ground up
• User-friendly graphical query interface on top of SPARQL
• Managing Protected Health Information (PHI) e.g. data
  encryption “at rest” for RDF store
• From retrospective annotation of data (with ontology) to
  prospective annotation of data: ontology-driven data capture
  with annotation happening at the point of primary source
  (eliminating the need to annotate data retrospectively)
• Let ontology drive “everything”
References
•   [RDF] Manola F, Miller, E.(Eds.). RDF Primer. 2004; Available from:
    http://www.w3.org/TR/rdf-primer/
•   [RDFS] Brickley D, Guha, R.V. RDF Schema. 2004; Available from:
    http://www.w3.org/TR/rdf-schema/
•   [OWL] Hitzler P, Krötzsch, M., Parsia, B., Patel-Schneider, P.F., Rudolph, S. OWL 2
    Web Ontology Language Primer: W3C; 2009
•   [Physio-MIMI]: http://physiomimi.case.edu
•   [ASEMR] A. P. Sheth, Agrawal, S., Lathem, J., Oldham, N., Wingate, H., Yadav, P.,
    Gallagher, K., "Active Semantic Electronic Medical Record," in 5th International
    Semantic Web Conference, Athens, GA, USA, 2006.
•   [BioRDF] BioRDF subgroup: Health Care and Life Sciences interest group Available:
    http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup
•   [TR] A. Ruttenberg, et al., "Advancing translational research with the Semantic Web,"
    BMC Bioinformatics vol. in Press, 2007.
References 2
•   [Visage] GQ Zhang et al. VISAGE: A Query Interface for Clinical Research,
    Proceedings of the 2010 AMIA Clinical Research Informatics Summit, San Francisco,
    March 12-13, pp. 76-80, 2010
•   [Prov] S.S. Sahoo, V. Nguyen, O. Bodenreider, P. Parikh, T. Minning, A.P. Sheth, “A
    unified framework for managing provenance information in translational research.”
    BMC Bioinformatics 2011, 12:461
•   [RO] Smith B, Ceusters W, Klagges B, Kohler J, Kumar A, Lomax J, Mungall C,
    Neuhaus F, Rector AL, Rosse C: Relations in biomedical ontologies. Genome Biol
    2005, 6(5):R46.
•   [BKR] Bodenreider O, Rindflesch, T.C.: Advanced library services: Developing a
    biomedical knowledge repository to support advanced information management
    applications. In. Bethesda, Maryland: Lister Hill National Center for Biomedical
    Communications, National Library of Medicine; 2006.
•   T.cruzi project web site: http://wiki.knoesis.org/index.php/Trykipedia
Acknowledgements
• Collaborators:
  o Susan Redline, Remo Mueller, and other members of
    Physio-MIMI team
  o Rick Tarleton, Todd Minning, Priti Parikh and other
    members of the T.cruzi SPSE team
  o Dr. S. Agrawal and other members at the Athens Heart
    Center, GA
• NIH Support: UL1-RR024989, UL1-RR024989-05S,
  NCRR-94681DBS78, NS076965, and 1R01HL087795

Role of Semantic Web in Health Informatics

  • 1. Role of Semantic Web in Health Informatics Tutorial at 2012 ACM SIGHIT International Health Informatics Symposium (IHI 2012), January 28-30, 2012 Satya S. Sahoo, GQ Zhang (Division of Medical Informatics, Case Western Reserve University) Amit Sheth (Kno.e.sis Center, Wright State University)
  • 2. Outline • Semantic Web o Introductory Overview • Clinical Research o Physio-MIMI • Bench Research and Provenance o Semantic Problem Solving Environment for T.cruzi • Clinical Practice o Active Semantic Electronic Medical Record
  • 4. Landscape of Health Informatics [Figure: Clinical Research, Bench Research, and Clinical Practice surrounding Patient Care, Personalized Medicine, Drug Development, Privacy, Cost] * Images from case.edu
  • 5. Challenges • Information Integration: Reconcile heterogeneity o Syntactic Heterogeneity: DOB vs. Date of Birth o Structural Heterogeneity: Street + Apt + City vs. Address o Semantic Heterogeneity: Age vs. Age at time of surgery vs. Age at time of admission • Humans can (often) interpret these accurately, but it is extremely difficult for machines o Role for Metadata/Contextual Information/Semantics
  • 6. Semantic Web • Web of Linked Data • Introduced by Berners-Lee et al. as the next step for the Web of Documents • Allow “machine understanding” of data • Create “common” models of domains using a formal language - ontologies [Figure: Semantic Web Layer Cake] Layer cake image source: http://www.w3.org
  • 7. Resource Description Framework [Figure: Company IBM with Locations Armonk, New York, United States and Zurich, Switzerland] • Resource Description Framework – Recommended by W3C for metadata modeling [RDF] • A standard common modeling framework – usable by humans and machine understandable
  • 8. RDF: Triple Structure, IRI, Namespace [Figure: IBM – Headquarters located in – Armonk, New York, United States] • RDF Triple o Subject: The resource that the triple is about o Predicate: The property of the subject that is described by the triple o Object: The value of the property • Web Addressable Resource: Uniform Resource Locator (URL), Uniform Resource Identifier (URI), Internationalized Resource Identifier (IRI) • Qualified Namespace: http://www.w3.org/2001/XMLSchema# as xsd: o xsd:string instead of http://www.w3.org/2001/XMLSchema#string
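The qualified-namespace shorthand on slide 8 can be illustrated with a short sketch. This is a minimal illustration and not part of the original tutorial: the `expand` helper and the `PREFIXES` table are hypothetical names, and only the `xsd:` mapping comes from the slide.

```python
# Prefix table. Only the xsd: mapping is taken from the slide;
# the table and the helper below are illustrative assumptions.
PREFIXES = {"xsd": "http://www.w3.org/2001/XMLSchema#"}


def expand(qname, prefixes=PREFIXES):
    """Expand a prefixed name such as 'xsd:string' into a full IRI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local


print(expand("xsd:string"))  # prints http://www.w3.org/2001/XMLSchema#string
```

A real RDF toolkit would manage prefixes for you; the point here is only that a prefixed name is plain string substitution over the namespace IRI.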
  • 9. RDF Representation • Two types of property values in a triple o Web resource: IBM – Headquarters located in – Armonk, New York, United States o Typed literal: IBM – Has total employees – “430000”^^xsd:integer • The graph model of RDF: node-arc-node is the primary representation model • Secondary notations: Triple notation o companyExample:IBM companyExample:has-Total-Employee “430000”^^xsd:integer .
  • 10. RDF Schema [Figure: IBM – Headquarters located in – Armonk, New York, United States; Oracle – Headquarters located in – Redwood Shores, California, United States; schema level: Company – Headquarters located in – Geographical Location] • RDF Schema: Vocabulary for describing groups of resources [RDFS]
  • 11. RDF Schema • Property domain (rdfs:domain) and range (rdfs:range) o Headquarters located in: Domain = Company, Range = Geographical Location • Class Hierarchy/Taxonomy: rdfs:subClassOf o SubClass rdfs:subClassOf (Parent) Class o Computer Technology Company, Banking Company, and Insurance Company are each rdfs:subClassOf Company
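The class hierarchy on slide 11 supports a simple inference: an instance of a subclass is also an instance of every ancestor class. The sketch below shows that idea in plain Python under stated assumptions. The class names follow the slide, the extra `Organization` level is invented for illustration, and the helper names are hypothetical rather than any RDFS reasoner's API.

```python
from collections import deque

# (subclass, superclass) pairs. The first three follow the slide;
# the Company -> Organization axiom is an assumption added so the
# closure has more than one level to traverse.
SUBCLASS_OF = [
    ("ComputerTechnologyCompany", "Company"),
    ("BankingCompany", "Company"),
    ("InsuranceCompany", "Company"),
    ("Company", "Organization"),
]


def superclasses(cls, axioms):
    """Return all classes reachable from cls via rdfs:subClassOf."""
    parents = {}
    for sub, sup in axioms:
        parents.setdefault(sub, set()).add(sup)
    seen = set()
    queue = deque([cls])
    while queue:
        node = queue.popleft()
        for sup in parents.get(node, ()):
            if sup not in seen:
                seen.add(sup)
                queue.append(sup)
    return seen


def inferred_types(asserted_type, axioms):
    """Types of an instance: the asserted class plus all its ancestors."""
    return {asserted_type} | superclasses(asserted_type, axioms)


print(sorted(inferred_types("BankingCompany", SUBCLASS_OF)))
```

The breadth-first traversal tolerates multiple parents per class and cycles, which a single parent pointer would not.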
  • 12. Ontology: A Working Definition • Ontologies are shared conceptualizations of a domain represented in a formal language* • Ontologies in health informatics: o Common representation model - facilitate interoperability, integration across different projects, and enforce consistent use of terminology o Closely reflect domain-specific details (domain semantics) essential to answer end user o Support reasoning to discover implicit knowledge * Paraphrased from Gruber, 1993
  • 13. OWL2 Web Ontology Language • A language for modeling ontologies [OWL] • OWL2 is declarative • An OWL2 ontology (schema) consists of: o Entities: Company, Person o Axioms: Company employs Person o Expressions: A Person Employed by a Company = CompanyEmployee • Reasoning: Draw a conclusion given that certain constraints are satisfied o RDF(S) Entailment o OWL2 Entailment
  • 14. OWL2 Constructs • Class Disjointness: An instance of class A cannot be an instance of class B • Complex Classes: Combining multiple classes with set theory operators: o Union: Parent = ObjectUnionOf(:Mother :Father) o Logical negation: UnemployedPerson = ObjectComplementOf(:EmployedPerson) o Intersection: Mother = ObjectIntersectionOf(:Parent :Woman)
  • 15. OWL2 Constructs • Property restrictions: defined over a property • Existential Quantification: o Parent = ObjectSomeValuesFrom(:hasChild :Person) o To capture incomplete knowledge • Universal Quantification: o US President = ObjectAllValuesFrom(:hasBirthPlace :UnitedStates) • Cardinality Restriction
  • 16. SPARQL: Querying Semantic Web Data • A SPARQL query pattern is composed of triples • Triples correspond to the RDF triple structure, but have a variable at: o Subject: ?company ex:hasHeadquaterLocation ex:NewYork . o Predicate: ex:IBM ?whatislocatedin ex:NewYork . o Object: ex:IBM ex:hasHeadquaterLocation ?location . • Result of a SPARQL query is a list of values – values can replace the variables in the query pattern
  • 17. SPARQL: Query Patterns • An example query pattern PREFIX ex: <http://www.eecs600.case.edu/> SELECT ?company ?location WHERE { ?company ex:hasHeadquaterLocation ?location . } • Query Result (multiple matches), as company / location pairs: IBM / NewYork; Oracle / RedwoodCity; MicrosoftCorporation / Bellevue
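How a triple pattern with variables produces the bindings listed on slide 17 can be sketched without a SPARQL engine. The code below is a simplified illustration, not a SPARQL implementation: the `match` helper is a hypothetical name, it handles a single triple pattern only, and the data simply mirrors the slide (the predicate is spelled as on the slide, and the employee-count triple is an assumed extra that should not match).

```python
# Tiny in-memory graph mirroring the slide's example data.
TRIPLES = [
    ("ex:IBM", "ex:hasHeadquaterLocation", "ex:NewYork"),
    ("ex:Oracle", "ex:hasHeadquaterLocation", "ex:RedwoodCity"),
    ("ex:MicrosoftCorporation", "ex:hasHeadquaterLocation", "ex:Bellevue"),
    ("ex:IBM", "ex:hasTotalEmployees", 430000),  # assumed extra triple
]


def is_var(term):
    """A term is a variable when it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triples):
    """Return one {variable: value} binding per triple matching the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable used twice must bind to the same value.
                if binding.get(p_term, t_term) != t_term:
                    break
                binding[p_term] = t_term
            elif p_term != t_term:
                break
        else:
            results.append(binding)
    return results


for row in match(("?company", "ex:hasHeadquaterLocation", "?location"), TRIPLES):
    print(row["?company"], row["?location"])
```

The same function covers the three variable positions from slide 16: put the variable in the subject, predicate, or object slot of the pattern tuple.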
  • 18. SPARQL: Query Forms • SELECT: Returns the values bound to the variables • CONSTRUCT: Returns an RDF graph • DESCRIBE: Returns a description (RDF graph) of a resource (e.g. IBM) o The contents of the RDF graph are determined by the SPARQL query processor • ASK: Returns a Boolean o True o False
  • 19. Semantic Web + Clinical Research Informatics = Physio-MIMI
  • 20. Physio-MIMI Overview • Physio-MIMI: Multi-Modality, Multi-Resource Environment for Physiological and Clinical Research • NCRR-funded, multi-CTSA-site project (RFP 08-001) for providing informatics tools to clinical investigators and clinical research teams at and across CTSA institutions to enhance the collection, management and sharing of data • Collaboration among Case Western, U Michigan, Marshfield Clinic and U Wisconsin Madison • Use Sleep Medicine as an exemplar, but also generalizable • Two year duration: Dec 2008 – Dec 2010
  • 21. Features of Physio-MIMI • Federated data integration environment – Linking existing data resources without a centralized data repository • Query interface directly usable by clinical researchers – Minimize the role of the data-access middleman • Secure and policy-compliant data access – Fine-grained access control, dual SSL, auditing • Tools for curating PSGs [Figure: Data Integration Framework, Physio-MIMI, SHHS Portal]
  • 23. Measure success not by the size of the database, but by the number of secondary studies it supported
  • 24. Query Interface – driven by access • Visual Aggregator and Explorer (VISAGE) • Federated, Web-based • Driven by Domain Ontology (SDO) • PhysioMap to connect autonomous data sources • GQ Zhang et al. VISAGE: A Query Interface for Clinical Research, Proceedings of the 2010 AMIA Clinical Research Informatics Summit, San Francisco, March 12-13, pp. 76-80, 2010 [Figure: Clinical Investigator, Data Analyst, Data Manager, Database]
  • 25. Physio-MIMI Components [Figure: Sleep Researcher, Domain Expert, Informatician; META SERVER; Query Builder, VISAGE, Query Manager, Query Explorer, DB-Ontology Mapper; DATA SERVER; three Institutional Databases, three Institutional Firewalls]
  • 28. Case Control Study Design • Case-control is a common study design • Used for epidemiological studies involving two cohorts, one representing the cases and the second representing the controls • Adjusting matching ratio to improve statistical power
  • 29. Example (CFS) • Suppose we are interested in the question of whether sleep parameters (EEG) differ by obesity in age and race matched males • Case: adult 55-75, male, BMI 35-50 (obese) • Control: adult 55-75, male, BMI 20-30 (non-obese) • Matching 1:2 on race (minimize race as a factor initially)
  • 30. Adult 55-75, male, BMI 35-50
  • 31. Adult 55-75, male, BMI 20-30
  • 32. Set up 1:2 Matching
  • 33. 1:2 Matching Result Control Matched Case
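The cohort selection and 1:2 matching walked through in slides 29-33 can be sketched as follows. This is a rough illustration under stated assumptions, not the VISAGE implementation: the subject records and race labels are made up, the helper names are hypothetical, and the greedy first-come matching is one simple strategy chosen for the example.

```python
from collections import defaultdict

# Hypothetical subjects; none of these records come from CFS or SHHS.
SUBJECTS = [
    {"id": 1, "age": 60, "sex": "M", "bmi": 38, "race": "A"},
    {"id": 2, "age": 70, "sex": "M", "bmi": 42, "race": "B"},
    {"id": 3, "age": 58, "sex": "M", "bmi": 24, "race": "A"},
    {"id": 4, "age": 66, "sex": "M", "bmi": 27, "race": "A"},
    {"id": 5, "age": 61, "sex": "M", "bmi": 22, "race": "B"},
    {"id": 6, "age": 45, "sex": "M", "bmi": 25, "race": "B"},  # too young
    {"id": 7, "age": 63, "sex": "F", "bmi": 26, "race": "B"},  # not male
]


def select(subjects, bmi_low, bmi_high, sex="M", age_low=55, age_high=75):
    """Filter a cohort by sex, age range, and BMI range (inclusive)."""
    return [
        s for s in subjects
        if s["sex"] == sex
        and age_low <= s["age"] <= age_high
        and bmi_low <= s["bmi"] <= bmi_high
    ]


def match_controls(cases, controls, ratio=2, key="race"):
    """Greedily assign `ratio` unused controls with the same key to each case."""
    pool = defaultdict(list)
    for control in controls:
        pool[control[key]].append(control["id"])
    matched = {}
    for case in cases:
        candidates = pool[case[key]]
        if len(candidates) >= ratio:  # skip cases that cannot be fully matched
            matched[case["id"]] = [candidates.pop(0) for _ in range(ratio)]
    return matched


cases = select(SUBJECTS, 35, 50)      # obese: BMI 35-50
controls = select(SUBJECTS, 20, 30)   # non-obese: BMI 20-30
print(match_controls(cases, controls))  # prints {1: [3, 4]}
```

Raising `ratio` toward 1:5, as in the later CFS+SHHS slide, only works when the control pool is large enough, which is the reason for drawing controls from a second data source.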
  • 35. 1:5 Matching – CFS+SHHS Modify Control to Include TWO data sources
  • 36. Sleep Domain Ontology (SDO) • Standardize terminology and semantics (define variations) [RO] • Facilitate definition of data elements • Valuable for data collection, data curation • Data integration • Data sharing and access • Take advantage of progress in related areas (e.g. Gene Ontology) • Improving data quality – provenance, reproducibility
  • 37. Sleep Domain Ontology (SDO) https://mimi.case.edu/concepts
  • 38. Sleep Domain Ontology (SDO) https://mimi.case.edu/concepts
  • 39. VISAGE Query Builder showing a data query on Parkinsonian Disorders and REM sleep behavior disorder with race demographics
  • 40. Semantic Web + Provenance + Bench Research = T.cruzi Semantic Problem Solving Environment
  • 41. Semantic Problem Solving Environment for T.cruzi
  • 42. Provenance in Scientific Experiments New Parasite Strains
  • 43. Provenance in Scientific Experiments [Figure: gene knockout workflow. Gene Name → Sequence Extraction → 3' & 5' Region; 3' & 5' Region and Drug Resistant Plasmid → Plasmid Construction → Knockout Construct Plasmid; Knockout Construct Plasmid and T.cruzi sample → Transfection → Transfected Sample → Drug Selection → Selected Sample → Cell Cloning → Cloned Sample]
  • 44. Provenance in Scientific Experiments • Provenance, from the French word “provenir”, describes the lineage or history of a data entity • For Verification and Validation of Data Integrity, Process Quality, and Trust • Semantic Provenance Framework addresses three aspects [Prov] o Provenance Modeling o Provenance Query Infrastructure o Scalable Provenance System [Figure: gene knockout workflow from Gene Name to Cloned Sample]
  • 45. Domain-specific Provenance ontology [Figure: PROVENIR ONTOLOGY classes (agent, data, process, parameter, data_collection, spatial_parameter, temporal_parameter, domain_parameter) extended through is_a links by PARASITE EXPERIMENT ONTOLOGY classes, including transfection_machine, location, drug_selection, sample, strain_creation_protocol, transfection, cell_cloning, transfection_buffer, Tcruzi_sample, and Time:DateTimeDescription; properties shown: has_agent, has_parameter, has_temporal_parameter, has_input_value, subPropertyOf] • Total Number of Classes - 118 • DL Expressivity – ALCHQ(D)
  • 46. Provenance Query Classification Classified Provenance Queries into Three Categories • Type 1: Querying for Provenance Metadata o Example: Which gene was used to create the cloned sample with ID = 66? • Type 2: Querying for Specific Data Set o Example: Find all knockout construct plasmids created by researcher Michelle using the “Hygromycin” drug resistant plasmid between April 25, 2008 and August 15, 2008 • Type 3: Operations on Provenance Metadata o Example: Were the two cloned samples 65 and 46 prepared under similar conditions? Compare the associated provenance information
  • 47. Provenance Query Operators Four Query Operators – based on Query Classification • provenance() – Closure operation, returns the complete set of provenance metadata for the input data entity • provenance_context() – Given a set of constraints defined on provenance, retrieves datasets that satisfy the constraints • provenance_compare() – Adapts the RDF graph equivalence definition • provenance_merge() – Two sets of provenance information are combined using the RDF graph merge
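The closure behavior described for the provenance() operator can be sketched as a graph traversal. This is a simplified stand-in, not the tutorial's query engine: the real operator works over RDF and the provenance ontology, whereas here the lineage is an assumed plain dictionary whose node names follow the gene knockout workflow from the earlier slides.

```python
# Each key maps to the entities or processes it was directly derived from.
# Node names follow the gene knockout workflow; the dictionary encoding
# itself is an assumption made for this sketch.
DERIVED_FROM = {
    "cloned_sample": ["cell_cloning"],
    "cell_cloning": ["selected_sample"],
    "selected_sample": ["drug_selection"],
    "drug_selection": ["transfected_sample"],
    "transfected_sample": ["transfection"],
    "transfection": ["knockout_construct_plasmid", "tcruzi_sample"],
    "knockout_construct_plasmid": ["plasmid_construction"],
    "plasmid_construction": ["region_3_5", "drug_resistant_plasmid"],
    "region_3_5": ["sequence_extraction"],
    "sequence_extraction": ["gene_name"],
}


def provenance(entity, derived_from):
    """Return everything the entity depends on, directly or transitively."""
    seen = set()
    stack = [entity]
    while stack:
        node = stack.pop()
        for parent in derived_from.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


lineage = provenance("cloned_sample", DERIVED_FROM)
print("gene_name" in lineage)  # prints True
```

A Type 1 query such as "which gene was used to create this cloned sample" then reduces to filtering the returned set for gene nodes.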
  • 48. Answering Provenance Queries using provenance () Operator
  • 49. Implementation: Provenance Query Engine • Three modules: o Query Composer o Transitive Closure o Query Optimizer • Deployable over an RDF store with support for reasoning [Figure: QUERY OPTIMIZER, TRANSITIVE CLOSURE]
  • 50. Application in T.cruzi SPSE Project • Provenance tracking for gene knockout, strain creation, proteomics, microarray experiments • Part of the Parasite Knowledge Repository [BKR]
  • 51. W3C Provenance Working Group • Define a “provenance interchange language for publishing and accessing provenance” • Three working drafts: o PROV-Data Model: A conceptual model for provenance representation o PROV-Ontology: An OWL ontology for provenance representation o PROV-Access and Query: A framework to query and retrieve provenance on the Web
  • 52. Semantic Web + Clinical Practice Informatics = Active Semantic Electronic Medical Record (ASEMR)
  • 53. Semantic Web application in use In daily use at Athens Heart Center – 28 person staff • Interventional Cardiologists • Electrophysiology Cardiologists – Deployed since January 2006 – 40-60 patients seen daily – 3000+ active patients – Serves a population of 250,000 people
  • 54. Information Overload in Clinical Practice • New drugs added to market – Adds interactions with current drugs – Changes possible procedures to treat an illness • Insurance coverage changes – Insurance may pay for drug X but not drug Y even though drugs X and Y are equivalent – Patient may need a certain diagnosis before some expensive tests are run • Physicians need a system to keep track of the ever-changing landscape
  • 55. System throughout the practice
  • 56. System throughout the practice
  • 57. System throughout the practice
  • 58. System throughout the practice
  • 59. Active Semantic Document (ASD) A document (typically in XML) with the following features: • Semantic annotations – Linking entities found in a document to ontology – Linking terms to a specialized lexicon [TR] • Actionable information – Rules over semantic annotations – Violated rules can modify the appearance of the document (Show an alert)
  • 60. Active Semantic Patient Record • An application of ASD • Three Ontologies – Practice: Information about the practice, such as patient/physician data – Drug: Information about drugs, interactions, formularies, etc. – ICD/CPT: Describes the relationships between CPT and ICD codes • Medical Records in XML created from database
  • 61. Practice Ontology Hierarchy (showing is-a relationships) [Figure: classes under owl:thing include facility, ancillary, insurance_carrier, ambulatory_episode, insurance, insurance_plan, insurance_policy, encounter, person, event, patient, practitioner]
  • 62. Drug Ontology Hierarchy (showing is-a relationships) [Figure: classes under owl:thing include formulary, formulary_property, non_drug_reactant, property, interaction_property, indication, indication_property, monograph_ix_class, prescription_drug, prescription_drug_property, prescription_drug_brand_name, brandname_individual, brandname_composite, brandname_undeclared, prescription_drug_generic, generic_individual, generic_composite, interaction, interaction_with_non_drug_reactant, interaction_with_monograph_ix_class, interaction_with_prescription_drug, cpnum_group]
  • 63. Drug Ontology showing neighborhood of PrescriptionDrug concept
  • 64. Part of Procedure/Diagnosis/ICD9/CPT Ontology [Figure labels: procedure, diagnosis, maps_to_diagnosis, maps_to_procedure, specificity]
  • 65. Semantic Technologies in Use • Semantic Web: OWL, RDF/RDQL, Jena – OWL (constraints useful for data consistency), RDF – Rules are expressed as RDQL – REST-based Web services on the server side • Web 2.0: client makes AJAX calls to the ontology, also used for auto-complete • Problems: – Jena's in-memory model has a large memory footprint, a future scalability challenge – Jena's persistent model (MySQL) is noticeably slower
  • 67. Benefits: Athens Heart Center Practice Growth [Chart: appointments per month, January through December, with one series per year for 2003, 2004, 2005, and 2006; y-axis from 400 to 1400]
  • 68. Chart Completion before the preliminary deployment of the ASEMR [Chart: number of charts completed the same day vs. in back log, by month/year from Jan 04 to Jul 05; y-axis from 0 to 600]
  • 69. Chart Completion after the preliminary deployment of the ASEMR [Chart: number of charts completed the same day vs. in back log, by month/year from Sept 05 to Mar 06; y-axis from 0 to 700]
  • 70. Benefits of current system • Error prevention (drug interactions, allergy) – Patient care – Insurance • Decision support (formulary, billing) – Patient satisfaction – Reimbursement • Efficiency/time – Real-time chart completion – “Semantic” and automated linking with billing
  • 71. Demo: online demo of the Active Semantic Electronic Medical Record deployed and in use at Athens Heart Center
  • 72. Challenges, Opportunities, and Future Direction
  • 73. Conclusions Benefits of SW in Health Informatics: • RDF is a “universal” data model; application-purpose agnostic (clinical care vs. research) • Integration “ready,” supporting distributed query out of the box • Semantic interoperability addressed at the root level • Better support of user interfaces for data capture, data query, data integration • Scalability demonstrated
  • 74. Challenges and Future Directions • Design and implementation of health information systems with RDF as primary data store from ground up • User-friendly graphical query interface on top of SPARQL • Managing Protected Health Information (PHI) e.g. data encryption “at rest” for RDF store • From retrospective annotation of data (with ontology) to prospective annotation of data: ontology-driven data capture with annotation happening at the point of primary source (eliminating the need to annotate data retrospectively) • Let ontology drive “everything”
  • 75. References • [RDF] Manola F, Miller E (Eds.). RDF Primer. 2004. Available from: http://www.w3.org/TR/rdf-primer/ • [RDFS] Brickley D, Guha RV. RDF Schema. 2004. Available from: http://www.w3.org/TR/rdf-schema/ • [OWL] Hitzler P, Krötzsch M, Parsia B, Patel-Schneider PF, Rudolph S. OWL 2 Web Ontology Language Primer. W3C; 2009. • [Physio-MIMI] http://physiomimi.case.edu • [ASEMR] Sheth AP, Agrawal S, Lathem J, Oldham N, Wingate H, Yadav P, Gallagher K. Active Semantic Electronic Medical Record. In: 5th International Semantic Web Conference, Athens, GA, USA; 2006. • [BioRDF] BioRDF subgroup, Health Care and Life Sciences Interest Group. Available from: http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup • [TR] Ruttenberg A, et al. Advancing translational research with the Semantic Web. BMC Bioinformatics, in press; 2007.
  • 76. References 2 • [Visage] Zhang GQ, et al. VISAGE: A Query Interface for Clinical Research. Proceedings of the 2010 AMIA Clinical Research Informatics Summit, San Francisco, March 12-13, 2010, pp. 76-80. • [Prov] Sahoo SS, Nguyen V, Bodenreider O, Parikh P, Minning T, Sheth AP. A unified framework for managing provenance information in translational research. BMC Bioinformatics 2011, 12:461. • [RO] Smith B, Ceusters W, Klagges B, Kohler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C. Relations in biomedical ontologies. Genome Biol 2005, 6(5):R46. • [BKR] Bodenreider O, Rindflesch TC. Advanced library services: Developing a biomedical knowledge repository to support advanced information management applications. Bethesda, Maryland: Lister Hill National Center for Biomedical Communications, National Library of Medicine; 2006. • T.cruzi project web site: http://wiki.knoesis.org/index.php/Trykipedia
  • 77. Acknowledgements • Collaborators: o Susan Redline, Remo Mueller, and other members of Physio-MIMI team o Rick Tarleton, Todd Minning, Priti Parikh and other members of the T.cruzi SPSE team o Dr. S. Agrawal and other members at the Athens Heart Center, GA • NIH Support: UL1-RR024989, UL1-RR024989-05S, NCRR-94681DBS78, NS076965, and 1R01HL087795
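
The Active Semantic Document idea from slide 59 (rules evaluated over semantic annotations, with a violated rule changing how the document is displayed) can be illustrated with a small sketch. This is not the ASEMR implementation, which the slides say expresses rules in RDQL over Jena. The class names, the concept identifiers such as `drug:A`, and the `INTERACTIONS` table are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    text: str     # surface text found in the document
    concept: str  # hypothetical ontology concept identifier


@dataclass
class Document:
    annotations: list
    alerts: list = field(default_factory=list)  # alerts to render with the document


# Hypothetical interaction facts standing in for a drug ontology.
INTERACTIONS = {frozenset({"drug:A", "drug:B"})}


def interaction_rule(doc):
    """Return one message per pair of annotated drugs known to interact."""
    concepts = [a.concept for a in doc.annotations]
    violations = []
    for i, first in enumerate(concepts):
        for second in concepts[i + 1:]:
            if frozenset({first, second}) in INTERACTIONS:
                violations.append(f"Possible interaction: {first} + {second}")
    return violations


def apply_rules(doc, rules):
    """Run every rule; violations become alerts attached to the document."""
    for rule in rules:
        doc.alerts.extend(rule(doc))
    return doc


record = Document([Annotation("Drug A 10mg", "drug:A"),
                   Annotation("Drug B 5mg", "drug:B")])
apply_rules(record, [interaction_rule])
print(record.alerts)
```

In this sketch the "active" part is simply the `alerts` list: a renderer would consult it to decide whether to highlight the record.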

Editor's notes

  1. RDF: Triple structure
  2. Review types of heterogeneity. Why we need to reconcile data heterogeneity. URL (Uniform Resource Locator): a network location, also used as an identifier for resources on the Web. A URL is a specific type of URI; a URI can be used to refer to anything. IRI: in addition to the ASCII character set, allows characters from the Universal Character Set (RFC 3987).
  3. RDF uses XML Schema datatypes
  4. Allows creation of an abstract representation of domain
  5. Allows creation of an abstract representation of domain
  6. Review types of heterogeneity. Why we need to reconcile data heterogeneity
  7. Review types of heterogeneity. Why we need to reconcile data heterogeneity
  8. Review types of heterogeneity. Why we need to reconcile data heterogeneity
  9. Better yet, for those who are not originally involved in developing the data resource
  10. Web based, anywhere anytime, not requiring knowledge of data structure, data elements, how they are stored
  11. Under the hood
  12. Simple interface: select a query representing cases, select a query representing controls, then “explore”!
  13. Walk through an example illustrating VISAGE's case-control features, using the Cleveland Family Study.
  14. Green means matched; visually inspect the pie chart
  15. Try to provide 1:5 matching; not enough controls; color indicator
  16. Controls coming from both the Cleveland Family Study and the Sleep Heart Health Study give a sufficient number of matched controls. This also illustrates the POWER of data federation in PhysioMIMI: combining different data sources
  17. Can accommodate differences, but prevents the same term from changing meaning from one paragraph to the next in the same paper.
  18. Let's take another example from the NIH-funded collaborative project, led by our lab along with the CTEGD at UGA, to identify vaccine targets for the human pathogen T.cruzi that causes… Specifically, we consider the experiment protocol used to create new strains of the parasite by knocking out specific genes, in order to identify the function of the gene, among other functionalities. Proposal writing.
  19. In terms of modeling: a formal representation is needed to support consistent interpretation (and to be machine-processable for large volumes of data), and it must be expressive enough to closely reflect domain-specific details, i.e., domain semantics. Provenance queries have many characteristics, which I will cover later, that need to be supported by the query infrastructure. Provenance queries over scientific data are characterized by high expression complexity and data complexity (I will discuss these aspects in detail). Let's discuss the issue of provenance modeling first.
  20. In terms of modeling: a formal representation is needed to support consistent interpretation (and to be machine-processable for large volumes of data), and it must be expressive enough to closely reflect domain-specific details, i.e., domain semantics. Provenance queries have many characteristics, which I will cover later, that need to be supported by the query infrastructure. Provenance queries over scientific data are characterized by high expression complexity and data complexity (I will discuss these aspects in detail). Let's discuss the issue of provenance modeling first.
  21. We can extend the Provenir ontology to model domain-specific provenance. For example, we extended the Provenir ontology to create PEO, which models the gene knockout and strain creation protocols. We have also extended Provenir to model the Trident ontology, representing oceanography-specific provenance information (discussed later in the evaluation section). PEO has an expressivity of ALCHQ(D) because of the use of qualifiers on properties; for example, cell cloning has an input value of drug-selected sample and an output value of cloned sample, plus a datatype property for researcher notes.
  22. The first type of query is the “standard” provenance query. The second type of query is a complete reverse view: define constraints on provenance information to retrieve datasets that satisfy those constraints (provenance context).
  23. Formally defined, implemented in Prolog (to validate the functional semantics), and mapped to SPARQL, the RDF query language
  24. Query composer: maps the functional semantics of the query operator to SPARQL syntax; in other words, it creates a SPARQL query pattern that conforms to the required behavior of the query operator. One of the challenges we faced in implementing this query composer is the use of highly nested OPTIONAL clauses to account for the fact that not all applications collect the comprehensive provenance information that the query operators have been defined to operate on. Hence, if the query operators are translated to a basic graph pattern in SPARQL, the query will return no results even though partial information is present. OPTIONAL allows us to sidestep this, but we had to map the nesting of the OPTIONAL clauses to reflect the structure of the Provenir ontology schema. For example, given a process, the link to an agent requires a top-level OPTIONAL clause, but the OPTIONAL clause that retrieves the spatial parameters associated with the agent needs to be nested within that top-level OPTIONAL clause. I discussed the transitive closure function earlier; this is implemented using the existing SPARQL ASK form, and I will discuss the experiment results to justify the use of ASK. The query optimizer uses a materialized-view-based approach to significantly improve query performance. The entities in the materialized view are indexed in a B+ tree, and in response to a query, the query optimizer looks up this index to see whether the query can be answered using the materialized view or needs to be sent to the DB. The query optimizer also includes a module for “view selection,” that is, deciding whether or not to materialize a query result; I will expand on this in a few slides.
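
The nesting described in note 24 (a top-level OPTIONAL for the agent of a process, with the OPTIONAL for the agent's spatial parameters nested inside it) can be sketched as a small query-string builder. This is an illustrative sketch only, not the project's query composer: the `ex:` prefix, the property names `ex:has_agent` and `ex:located_in`, and the helper names are assumptions and do not reproduce the actual Provenir vocabulary.

```python
def optional_block(pattern, nested=(), indent=1):
    """Render one OPTIONAL group, recursively nesting child OPTIONAL groups."""
    pad = "  " * indent
    lines = [f"{pad}OPTIONAL {{", f"{pad}  {pattern}"]
    for child_pattern, child_nested in nested:
        lines.extend(optional_block(child_pattern, child_nested, indent + 1))
    lines.append(f"{pad}}}")
    return lines


def compose_query(process_iri):
    """Build a SPARQL query that still matches when provenance is partial."""
    # Required part: the process itself must exist.
    body = [f"  <{process_iri}> a ex:process ."]
    # Optional part: the agent, and inside it the agent's location
    # (hypothetical property names).
    body += optional_block(
        f"<{process_iri}> ex:has_agent ?agent .",
        nested=[("?agent ex:located_in ?region .", ())],
    )
    return "\n".join([
        "PREFIX ex: <http://example.org/prov#>",
        "SELECT ?agent ?region WHERE {",
        *body,
        "}",
    ])


query = compose_query("http://example.org/p1")
print(query)
```

Because the location pattern sits inside the agent's OPTIONAL group, a process with no recorded agent still yields a result row, and a process with an agent but no location yields a row with `?region` unbound.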