Semantic Web: Technologies and Applications for the Real-World. Amit Sheth, LexisNexis Ohio Eminent Scholar, Kno.e.sis Center, Wright State University, http://knoesis.wright.edu. Tutorial at WWW2007 by Amit Sheth and Susie Stephens, Banff, Alberta, Canada. May 2007
Semantic Web: Technologies and Applications for the Real-World. Susie Stephens, Principal Research Scientist
Tutorial Outline
1.30-2.00pm Introduction to the Semantic Web
2.00-3.00pm Real-World Applications (1): Enabling Technologies and Experiences
3.00-3.30pm Break
3.30-4.30pm Real-World Applications (2): Details and War Stories, Case Studies
4.30-5.00pm Semantic Web in Practice
Introduction to the Semantic Web
Agenda
Characterizing the Semantic Web
Creating a Web of Data Source: Ivan Herman
Mashing Data Source: W3C
Drivers for the Semantic Web
Semantic Web Technologies Source:  W3C
Resource Description Framework (RDF)
Resource Description Framework (RDF) Source: W3C
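At its core, the RDF data model is just a set of subject–predicate–object triples, with resources named by URIs and literals allowed in object position. A toy illustration in plain Python (no RDF library; the example.org URIs are hypothetical, only the FOAF namespace is real):

```python
# Toy RDF graph: a set of (subject, predicate, object) triples.
# The example.org URIs are made up for this illustration.
FOAF = "http://xmlns.com/foaf/0.1/"

triples = {
    ("http://example.org/amit", FOAF + "name", "Amit Sheth"),
    ("http://example.org/amit", FOAF + "knows", "http://example.org/susie"),
    ("http://example.org/susie", FOAF + "name", "Susie Stephens"),
}

# Anyone can state a new triple about any resource -- the "web of data" idea.
triples.add(("http://example.org/susie", FOAF + "workplaceHomepage",
             "http://www.oracle.com/"))

# Look up all statements about one subject.
about_susie = [(p, o) for (s, p, o) in triples
               if s == "http://example.org/susie"]
print(len(about_susie))  # 2
```

Because every statement is a self-describing triple, graphs from independent sources can be merged by simple set union, with URIs doing the joining.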
RDF Schema (RDFS)
Web Ontology Language (OWL)
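One payoff of RDFS/OWL semantics is entailment: for example, rdfs:subClassOf is transitive, so new class memberships can be derived rather than stored. A toy forward-chaining sketch (class and instance names are invented; a real reasoner such as Pellet does far more):

```python
# Toy inference over rdfs:subClassOf and rdf:type, illustrating entailment.
# Class and instance names are made up for the example.
subclass_of = {
    ("GlycoProtein", "Protein"),
    ("Protein", "Macromolecule"),
}
rdf_type = {("EPO", "GlycoProtein")}

def entail(subclass_of, rdf_type):
    """Forward-chain to a fixpoint: subclass transitivity + type propagation."""
    sc, ty = set(subclass_of), set(rdf_type)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(sc):
            for (c, d) in list(sc):
                if b == c and (a, d) not in sc:    # subClassOf is transitive
                    sc.add((a, d)); changed = True
        for (i, cls) in list(ty):
            for (a, b) in list(sc):
                if cls == a and (i, b) not in ty:  # instance inherits supertypes
                    ty.add((i, b)); changed = True
    return sc, ty

sc, ty = entail(subclass_of, rdf_type)
print(("EPO", "Macromolecule") in ty)  # True
```

Nothing ever asserted that EPO is a Macromolecule; the triple follows from the schema, which is exactly what a query against an inferencing triple store exploits.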
Merging Ontologies Source: Siderean Software
SPARQL
SPARQL as a Unifying Source Source: Ivan Herman
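Conceptually, a SPARQL query matches a basic graph pattern against the triple set, binding variables consistently across patterns. A minimal sketch of that join over toy in-memory triples (not a real SPARQL engine; the data is illustrative):

```python
# Minimal basic-graph-pattern matching -- the core idea behind SPARQL.
# Triples and the pattern below are illustrative, not from the tutorial.
triples = [
    ("ex:Ariba",       "ex:competesWith", "ex:CommerceOne"),
    ("ex:CommerceOne", "ex:hasTicker",    "CMRC"),
    ("ex:Ariba",       "ex:hasTicker",    "ARBA"),
]

def match(pattern, bindings=None):
    """Yield variable bindings for one triple pattern (vars start with '?')."""
    bindings = bindings or {}
    for triple in triples:
        b = dict(bindings)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if b.get(term, value) != value:   # clash with earlier binding
                    ok = False
                    break
                b[term] = value
            elif term != value:                   # constant mismatch
                ok = False
                break
        if ok:
            yield b

def query(patterns):
    """Join several triple patterns, like a SPARQL WHERE clause."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, s)]
    return solutions

# "SELECT ?c ?t WHERE { ex:Ariba ex:competesWith ?c . ?c ex:hasTicker ?t }"
print(query([("ex:Ariba", "ex:competesWith", "?c"),
             ("?c", "ex:hasTicker", "?t")]))
# [{'?c': 'ex:CommerceOne', '?t': 'CMRC'}]
```

The shared variable `?c` is what makes SPARQL a unifying layer: the same pattern works no matter which back-end source contributed each triple.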
Gleaning Resource Descriptions from Dialects of Languages (GRDDL)
Rules
Semantic Annotations for WSDL (SAWSDL)
Enhanced Enterprise Search Source: Oracle OTN
Improved Reliability of Search Results Source: Segala
Web of Data Source: Freebase
Data Mashups  Source: BioDASH
Data Mashups Source: BioDASH
Document Management Source: MGH
Semantic Data Warehouse Source: University of Texas
Heterogeneous Data Aggregation
Integration with Semantic Mediation
Natural Language Query of Applications
Decision Support Source: AGFA
Summary
Real World Applications  with Enabling Capabilities/Technologies
Enablers and Techniques: Semantic Web Application Lifecycle
Syntax, Structure and Semantics
Describing Semantic Web to Nontechnical Users
Enablers and Techniques
A Typical Enterprise SW Application Lifecycle
Semagix Freedom Architecture: for building ontology-driven information systems. © Semagix, Inc. Managing Semantic Content on the Web
Building an Ontology
Ontology Examples
Ontology Language/Representation Spectrum (increasing expressiveness):
Taxonomy — "is sub-classification of" (syntactic interoperability: XML)
Thesaurus — "has narrower meaning than" (structural interoperability: DB schemas, XML Schema; Relational Model, ER)
Conceptual Model — "is subclass of" (Extended ER, UML, RDFS, XTM)
Logical Theory — "is disjoint subclass of, with transitivity property" (semantic interoperability: Description Logic, DAML+OIL, OWL; First Order Logic, Modal Logic)
Ontology Creation and Maintenance Steps (Approach 1): 1. Ontology Model Creation (Description); 2. Knowledge Agent Creation; 3. Automatic Aggregation of Knowledge; 4. Querying the Ontology (Semantic Query Server). © Semagix, Inc.
Some observations on Ontology Development and Maintenance
GlycO
EnzyO
GlycO
GlycO population
GlycO Population
Diverse Data From Multiple Sources Assures Quality
Ontology population workflow
[][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] {[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc] {}}[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}} Ontology population workflow
<Glycan>
  <aglycon name="Asn"/>
  <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc">
    <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc">
      <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="Man">
        <residue link="3" anomeric_carbon="1" anomer="a" chirality="D" monosaccharide="Man">
          <residue link="2" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc"></residue>
          <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc"></residue>
        </residue>
        <residue link="6" anomeric_carbon="1" anomer="a" chirality="D" monosaccharide="Man">
          <residue link="2" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc"></residue>
        </residue>
      </residue>
    </residue>
  </residue>
</Glycan>
Ontology population workflow
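Residue XML of this shape is straightforward to walk mechanically when populating the ontology. A sketch using Python's stdlib parser, over a simplified fragment of the glycan above (element and attribute names follow the slide; the traversal function is my own illustration):

```python
import xml.etree.ElementTree as ET

# Simplified fragment of the glycan XML shown on the slide.
xml = """
<Glycan>
  <aglycon name="Asn"/>
  <residue link="4" anomeric_carbon="1" anomer="b" chirality="D"
           monosaccharide="GlcNAc">
    <residue link="4" anomeric_carbon="1" anomer="b" chirality="D"
             monosaccharide="GlcNAc">
      <residue link="4" anomeric_carbon="1" anomer="b" chirality="D"
               monosaccharide="Man"/>
    </residue>
  </residue>
</Glycan>
"""

def walk(elem, depth=0):
    """Depth-first walk yielding (depth, monosaccharide, linkage position)."""
    for child in elem.findall("residue"):
        yield depth, child.get("monosaccharide"), child.get("link")
        yield from walk(child, depth + 1)

root = ET.fromstring(xml)
residues = list(walk(root))
print(residues)
# [(0, 'GlcNAc', '4'), (1, 'GlcNAc', '4'), (2, 'Man', '4')]
```

Each `(depth, monosaccharide, link)` tuple carries enough structure to mint an ontology instance and connect it to its parent residue.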
Ontology population workflow
Diverse Data From Multiple Sources Assures Quality
GlycoTree (N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15: 235-251): β-D-GlcpNAc-(1-4)-β-D-GlcpNAc-(1-4)-β-D-Manp core, with an α-D-Manp-(1-6) branch carrying β-D-GlcpNAc-(1-2), and an α-D-Manp-(1-3) branch carrying β-D-GlcpNAc-(1-2) and β-D-GlcpNAc-(1-4)
Pathway representation in Ontology — evidence for this reaction from three experiments. Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia
N-Glycosylation metabolic pathway
GNT-I attaches GlcNAc at position 2: UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2
GNT-V attaches GlcNAc at position 6: UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021
(Ontology concepts: N-acetyl-glucosaminyl_transferase_V, N-glycan_beta_GlcNAc_9, N-glycan_alpha_man_4)
Pathway Steps – Reaction: evidence for this reaction from three experiments. Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia
Pathway Steps – Glycan: abundance of this glycan in three experiments. Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia
 
ProPreO ontology. More at: http://knoesis.wright.edu/research/bioinformatics/
Semantic Annotation
Extracting a Text Document: Syntactic approach
INCIDENT MANAGEMENT SITUATION REPORT Friday August 1, 1997 - 0530 MDT NATIONAL PREPAREDNESS LEVEL II
CURRENT SITUATION: Alaska continues to experience large fire activity. Additional fires have been staffed for structure protection.
SIMELS, Galena District, BLM. This fire is on the east side of the Innoko Flats, between Galena and McGr. The fire is active on the southern perimeter, which is burning into a continuous stand of black spruce. The fire has increased in size, but was not mapped due to thick smoke. The slopover on the eastern perimeter is 35% contained, while protection of the historic cabin continues.
CHINIKLIK MOUNTAIN, Galena District, BLM. A Type II Incident Management Team (Wehking) is assigned to the Chiniklik fire. The fire is contained. Major areas of heat have been mopped up. All crews and overhead will mop up where the fire burned beyond the meadows. No flare-ups occurred today. Demobilization is planned for this weekend, depending on the results of infrared scanning.
LAYOUT Date => day month int ',' int
Taalee Extraction and Knowledgebase Enhancement: Extraction Agent → Web Page → Enhanced Metadata Asset. Taalee Semantic Engine; also IBM Semantics Tools
Metadata Extraction and Semantic Enhancement. Semantic Enhancement Engine [Hammond, Sheth, Kochut 2002]. Also: WebFountain, KIM from OntoText
Semantic Annotation - document
Semantic Annotation – news feed
Semantic annotation of scientific/experimental data
ProPreO population: transformation to RDF. Scientific Data → Computational Methods → Ontology instances
ProPreO: Ontology-mediated provenance
Mass Spectrometry (MS) data — ms/ms peaklist (parent ion m/z, parent ion abundance, parent ion charge; then fragment ion m/z, fragment ion abundance):
830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526
779.4759 38.4939
784.3607 21.7736
1543.7476 1.3822
1544.7595 2.9977
1562.8113 37.4790
1660.7776 476.5043
http://knoesis.org/research/bioinformatics
ProPreO: Ontology-mediated provenance — semantically annotated MS data (element and attribute names are ontological concepts):
<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="688.3214" abundance="0.2526"/>
  <fragment_ion m-z="779.4759" abundance="38.4939"/>
  <fragment_ion m-z="784.3607" abundance="21.7736"/>
  <fragment_ion m-z="1543.7476" abundance="1.3822"/>
  <fragment_ion m-z="1544.7595" abundance="2.9977"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
http://knoesis.org/research/bioinformatics
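Once the peak list carries ontological concept names, ordinary tools can interpret it without format-specific guesswork. A sketch using only the stdlib parser on a trimmed version of the annotated data above (the trimming and the "most abundant fragment" computation are my own illustration):

```python
import xml.etree.ElementTree as ET

# Trimmed version of the semantically annotated ms/ms peak list.
xml = """
<ms-ms_peak_list>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="688.3214" abundance="0.2526"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
"""

root = ET.fromstring(xml)
parent = root.find("parent_ion")
fragments = root.findall("fragment_ion")

# Because parent_ion / fragment_ion / m-z are ontology concepts, any consumer
# knows which numbers are masses, abundances, and charges.
print(parent.get("m-z"), parent.get("z"))  # 830.9570 2
most_abundant = max(fragments, key=lambda f: float(f.get("abundance")))
print(most_abundant.get("m-z"))            # 1660.7776
```

Contrast this with the raw peak list, where the same numbers are bare columns whose meaning lives only in the experimenter's head.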
Real World Applications
Characterizing applications
Review of Applications
Taalee's Semantic Search — highly customizable, precise, and fresh A/V search. Taalee A/V Semantic Search Engine: context- and domain-specific attributes; uniform metadata for content from multiple sources, sortable by any field; relevant information and an exceptional targeting opportunity
Creating a Web of related information What can a context do?
Ontology-driven Semantic Directory. Search for company 'Commerce One': links to news on companies that compete against Commerce One, and on companies Commerce One competes against (to view news on Ariba, click on the link for Ariba). Crucial news on Commerce One's competitors (Ariba) can be accessed easily and automatically
What else can a context do? (a commercial perspective) Semantic Enrichment Semantic Targeting
Semantic/Interactive Targeting — precisely targeted through the use of structured metadata and integration from multiple sources: Buy Al Pacino Videos; Buy Russell Crowe Videos; Buy Christopher Plummer Videos; Buy Diane Venora Videos; Buy Philip Baker Hall Videos; Buy The Insider Video
Semantic Application – Equity Dashboard (Managing Semantic Content on the Web, Internet Computing 2002): focused relevant content organized by topic (semantic categorization); automatic content aggregation from multiple content providers and feeds; related news not specifically asked for (semantic associations); competitive research inferred automatically; automatic 3rd-party content integration
Semantic Applications: Health Care — Active Semantic Electronic Medical Record (ISWC2006 paper). Thanks: Dr. Agrawal, Dr. Wingeth, and others.
Semantic Web Applications in Government — An Ontological Approach to the Document Access Problem of Insider Threat. Primary funding by ARDA, secondary funding by NSF.
Semantic Application in a Global Bank
The Process: Ahmed Yaseer appears on Watchlist (FBI Watchlist), is a member of an organization (Hamas), and works for a company (WorldCom).
Establishing New Account (Global Investment Bank): aggregates World Wide Web content, public records, and blogs/RSS (unstructured text, semi-structured data) with watch lists, law enforcement, and regulator sources (semi-structured government data); scores the entity based on the content and entity relationships; the user can navigate the ontology using a number of different interfaces.
N-Glycosylation Process (NGP): cell culture → glycoprotein fraction (extract) → proteolysis → glycopeptides fraction → separation techniques I and II → PNGase → peptide fractions → mass spectrometry (ms and ms/ms data) → data reduction → ms and ms/ms peaklists → peptide identification and binning → data correlation and signal integration → N-dimensional array → glycopeptide identification and quantification
Workflow based on Web Services = Web Process
ISiS – Integrated Semantic Information and Knowledge System: a Semantic Web process incorporating provenance. Pipeline: Biological Sample → Analysis by MS/MS (Raw Data) → Raw Data to Standard Format Data → Data Pre-process → DB Search (Mascot/Sequest) → Results Post-process (ProValt) → Search Results → Final Output; agents handle each step, with semantic annotation applied throughout and standard-format data stored alongside biological information.
Semantic Annotation Facilitates Complex Queries (ProPreO concepts highlighted in red; a Web service must be invoked). Life Science Data and Information Integration using Semantic Web.
More Life Science Applications — more at Knoesis research in life sciences
Relationship Extraction (UMLS, MeSH, PubMed): a Biologically active substance causes/affects/complicates a Disease or Syndrome (UMLS); Fish Oils (instance_of Lipid, 9284 documents) and Raynaud's Disease (4733 documents) from MeSH; the hypothesized Fish Oils → Raynaud's Disease relationship is supported by 5 PubMed documents.
About the data used — T147 relationship variants: T147—effect, T147—induce, T147—etiology, T147—cause, T147—effecting, T147—induced
Example PubMed abstract (for the domain expert) Abstract Classification/Annotation
Method – Parse Sentences in PubMed. SS-Tagger (University of Tokyo); SS-Parser (University of Tokyo).
(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) )
REmBRANDTS: REtrieval, BRowsing, Analytics and kNowledge Discovery over Text using Semantics
Method – Identify Entities and Relationships in Parse Tree: modifiers, modified entities, composite entities
Hypothesis-driven retrieval of scientific literature. PubMed keyword query: Migraine[MH] + Magnesium[MH]. Complex query linking Migraine, Stress, Patient, Magnesium, and Calcium Channel Blockers through affects, isa, and inhibit relationships, with supporting document sets retrieved
Eli Lilly Case Study
Advances in Science
Tailored Therapeutics
Critical Path Initiative Source: Innovation or Stagnation, FDA Report, March 2004
The Web of Heterogeneous Data Cell/Assay Technologies
Integrative Informatics Platform: Discovery, Clinical, and Public data sources (Proteomics, Imaging, In-Vitro/Vivo, Genotyping, SNPs/Haplotypes, Gene Expression, Lipidomics, Text) connected through a System/Semantic Integration Layer; components include XREP plus IQ Proteomics Informatics, PGI, Lipid Informatics, Info Mining, and Translational Informatics, feeding the Tailored Therapeutic Workbench (TTW) and an Integrative Data Mining/Query System
Discovery Target Assessment Tool
Use Cases for RDF
Summary
Acknowledgements
Semantic Web in Practice
The Semantic Web is Maturing
Agenda
Semantic Web Infrastructure
Triple Stores — there are many others available too…
Boca, IBM Source: IBM
RDF Data Model, Oracle
Virtuoso, OpenLink
Adapting SQL Databases Source: Tim Berners-Lee
Mapping Relational to RDF. Source: DartGrid
RDFizers — a directory of RDFizers is provided at: http://simile.mit.edu/wiki/RDFizers
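The relational-to-RDF idea is mechanical at its core: each row becomes a resource, each column a predicate, and the primary key is used to mint the subject URI. A minimal sketch (the table, namespace, and column names are invented for illustration; real mappers such as DartGrid's add query rewriting and much more):

```python
# Minimal table-to-triples mapping: row -> resource, column -> predicate.
# The table, URIs, and column names are invented for this illustration.
EX = "http://example.org/"

rows = [
    {"id": 1, "name": "Ariba",        "ticker": "ARBA"},
    {"id": 2, "name": "Commerce One", "ticker": "CMRC"},
]

def row_to_triples(table, pk, row):
    """Mint a subject URI from the primary key; one triple per other column."""
    subject = f"{EX}{table}/{row[pk]}"
    return [(subject, f"{EX}{table}#{col}", value)
            for col, value in row.items() if col != pk]

triples = [t for row in rows for t in row_to_triples("company", "id", row)]
print(len(triples))  # 4
print(triples[0])
# ('http://example.org/company/1', 'http://example.org/company#name', 'Ariba')
```

Foreign keys map the same way, except the object is another minted URI rather than a literal, which is how relational joins surface as graph edges.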
Ontology Editors and Environments. Source: Ian Horrocks
Reasoning Systems: KAON2, Pellet, CEL
Semantic Web Tools
Semantic Web Tools — over 500 tools are now available
Lists of Tools
How to get RDF Data?
RDF Data
Vocabularies. Source: Ivan Herman
Collateral
Public Fora at W3C
Pointers for Getting Going
Summary
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

Semantic Web: Technologies and Applications for the Real-World

  • 1. Semantic Web: Technologies and Applications for the Real-World Amit Sheth LexisNexis Ohio Eminent Scholar Kno.e.sis Center Wright State University http://knoesis.wright.edu Tutorial at WWW2007 by Amit Sheth and Susie Stephens, Banff, Alberta, Canada. May 2007
  • 2. Semantic Web: Technologies and Applications for the Real-World Susie Stephens Principal Research Scientist Tutorial at WWW2007 by Amit Sheth and Susie Stephens, Banff, Alberta, Canada. May 2007
  • 3. Tutorial Outline 1.30-2.00pm Introduction to the Semantic Web 2.00-3.00pm Real-World Applications (1): Enabling Technologies and Experiences 3.00-3.30pm Break 3.30-4.30pm Real-World Applications (2): Details and War Stories, Case Studies 4.30-5.00pm Semantic Web in Practice
  • 4. Introduction to the Semantic Web
  • 7. Creating a Web of Data Source: Ivan Herman
  • 12. Resource Description Framework (RDF) Source: W3C
  • 15. Merging Ontologies Source: Siderean Software
  • 17. SPARQL as a Unifying Source Source: Ivan Herman
  • 21. Enhanced Enterprise Search Source: Oracle OTN
  • 22. Improved Reliability of Search Results Source: Segala
  • 23. Web of Data Source: Freebase
  • 24. Data Mashups Source: BioDASH
  • 27. Semantic Data Warehouse Source: University of Texas
  • 33. Real World Applications with Enabling Capabilities/Technologies
  • 34. Enablers and Techniques Semantic Web Application Lifecycle
  • 39. Semagix Freedom Architecture: for building ontology-driven information system © Semagix, Inc. Managing Semantic Content on the Web
  • 42. Ontology Language/Representation Spectrum (flattened diagram): expressiveness increases from taxonomies (is sub-classification of) and thesauri (has narrower meaning than) through conceptual models and DB/XML schemas (is subclass of; relational model, XML, ER, extended ER; RDFS, XTM) to logical theories (is disjoint subclass of, with transitivity property; description logic, first-order logic, modal logic; DAML+OIL, OWL, UML). The corresponding interoperability levels are syntactic, structural, and semantic.
  • 43. 1. Ontology Model Creation (Description) 2. Knowledge Agent Creation 3. Automatic aggregation of Knowledge 4. Querying the Ontology Ontology Creation and Maintenance Steps (Approach 1) © Semagix, Inc. Ontology Semantic Query Server
  • 52. [][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] {[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc] {}}[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}} Ontology population workflow
  • 53. <Glycan> <aglycon name="Asn"/> <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc"> <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc"> <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="Man"> <residue link="3" anomeric_carbon="1" anomer="a" chirality="D" monosaccharide="Man"> <residue link="2" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc"> </residue> <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc"> </residue> </residue> <residue link="6" anomeric_carbon="1" anomer="a" chirality="D" monosaccharide="Man"> <residue link="2" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc"> </residue> </residue> </residue> </residue> </residue> </Glycan> Ontology population workflow
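The nested `<residue>` tree in the glycan XML above can be processed with any standard XML parser. A minimal sketch (using a simplified copy of the structure with plain quoting and only the `monosaccharide` attribute kept) tallies residues and measures the nesting depth of the N-glycan core:

```python
# Sketch: walk the nested <residue> tree of the glycan XML slide
# (simplified: plain quoting, monosaccharide attribute only) using
# the standard library parser.
import xml.etree.ElementTree as ET
from collections import Counter

glycan = """<Glycan><aglycon name="Asn"/>
<residue monosaccharide="GlcNAc"><residue monosaccharide="GlcNAc">
<residue monosaccharide="Man"><residue monosaccharide="Man">
<residue monosaccharide="GlcNAc"/><residue monosaccharide="GlcNAc"/></residue>
<residue monosaccharide="Man"><residue monosaccharide="GlcNAc"/></residue>
</residue></residue></residue></Glycan>"""

root = ET.fromstring(glycan)

# Tally residues per monosaccharide across the whole tree.
counts = Counter(r.get("monosaccharide") for r in root.iter("residue"))
print(counts)

def depth(node):
    """Length of the longest chain of nested <residue> children."""
    children = node.findall("residue")
    return 1 + max((depth(c) for c in children), default=0)

print(depth(root) - 1)  # residue nesting depth of the longest branch
```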
  • 56. N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15: 235-251 GlycoTree β-D-GlcpNAc β-D-GlcpNAc β-D-Manp -(1-4)- -(1-4)- α-D-Manp -(1-6)+ β-D-GlcpNAc -(1-2)- α-D-Manp -(1-3)+ β-D-GlcpNAc -(1-4)- β-D-GlcpNAc -(1-2)+
  • 57. Pathway representation in Ontology Evidence for this reaction from three experiments Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia
  • 58. N-Glycosylation metabolic pathway GNT-I attaches GlcNAc at position 2 UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2 GNT-V attaches GlcNAc at position 6 UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021 N-acetyl-glucosaminyl_transferase_V N-glycan_beta_GlcNAc_9 N-glycan_alpha_man_4
  • 59. Pathway Steps - Reaction Evidence for this reaction from three experiments Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia
  • 60. Pathway Steps - Glycan Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia Abundance of this glycan in three experiments
  • 64. Extracting a Text Document: Syntactic approach INCIDENT MANAGEMENT SITUATION REPORT Friday August 1, 1997 - 0530 MDT NATIONAL PREPAREDNESS LEVEL II CURRENT SITUATION: Alaska continues to experience large fire activity. Additional fires have been staffed for structure protection. SIMELS, Galena District, BLM. This fire is on the east side of the Innoko Flats, between Galena and McGr The fire is active on the southern perimeter, which is burning into a continuous stand of black spruce. The fire has increased in size, but was not mapped due to thick smoke. The slopover on the eastern perimeter is 35% contained, while protection of the historic cabin continues. CHINIKLIK MOUNTAIN, Galena District, BLM. A Type II Incident Management Team (Wehking) is assigned to the Chiniklik fire. The fire is contained. Major areas of heat have been mopped up. All crews and overhead will mop-up where the fire burned beyond the meadows. No flare-ups occurred today. Demobilization is planned for this weekend, depending on the results of infrared scanning. LAYOUT Date => day month int ‘,’ int
  • 65. Extraction Agent Web Page Enhanced Metadata Asset Taalee Extraction and Knowledgebase Enhancement Taalee Semantic Engine , also IBM Semantics Tools
  • 66. Metadata Extraction and Semantic Enhancement Semantic Enhancement Engine [Hammond, Sheth, Kochut 2002] Also, WebFountain, KIM from OntoText
  • 69. Semantic annotation of scientific/experimental data
  • 70. ProPreO population: transformation to rdf Scientific Data Computational Methods Ontology instances
  • 71. ProPreO: Ontology-mediated provenance. MS/MS peak list data (parent ion m/z, parent ion abundance, parent ion charge; fragment ion m/z and abundance): 830.9570 194.9604 2 580.2985 0.3592 688.3214 0.2526 779.4759 38.4939 784.3607 21.7736 1543.7476 1.3822 1544.7595 2.9977 1562.8113 37.4790 1660.7776 476.5043 Mass Spectrometry (MS) Data http://knoesis.org/research/bioinformatics
  • 72. ProPreO: Ontology-mediated provenance <ms-ms_peak_list> <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/> <parent_ion m-z="830.9570" abundance="194.9604" z="2"/> <fragment_ion m-z="580.2985" abundance="0.3592"/> <fragment_ion m-z="688.3214" abundance="0.2526"/> <fragment_ion m-z="779.4759" abundance="38.4939"/> <fragment_ion m-z="784.3607" abundance="21.7736"/> <fragment_ion m-z="1543.7476" abundance="1.3822"/> <fragment_ion m-z="1544.7595" abundance="2.9977"/> <fragment_ion m-z="1562.8113" abundance="37.4790"/> <fragment_ion m-z="1660.7776" abundance="476.5043"/> </ms-ms_peak_list> Ontological Concepts Semantically Annotated MS Data http://knoesis.org/research/bioinformatics
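Because the annotated peak list is plain XML, downstream tools can recover the ontology-typed values with a standard parser. A minimal sketch (using a shortened copy of the peak list above, with plain ASCII quoting) reads the parent ion and finds the most abundant fragment:

```python
# Sketch: parse a shortened version of the semantically annotated
# MS/MS peak list (quoting normalized to plain ASCII) and recover
# parent/fragment ion attributes.
import xml.etree.ElementTree as ET

peaklist = """<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2" mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>"""

root = ET.fromstring(peaklist)
parent = root.find("parent_ion")
fragments = [(float(f.get("m-z")), float(f.get("abundance")))
             for f in root.findall("fragment_ion")]

print(parent.get("m-z"), parent.get("z"))   # parent ion m/z and charge
print(max(fragments, key=lambda f: f[1]))   # most abundant fragment ion
```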
  • 76. Taalee’s Semantic Search Highly customizable, precise and freshest A/V search Taalee A/V Semantic Search Engine Context and Domain Specific Attributes Uniform Metadata for Content from Multiple Sources, Can be sorted by any field Delightful, relevant information, exceptional targeting opportunity
  • 77. Creating a Web of related information What can a context do?
  • 78. Ontology driven Semantic Directory Search for company ‘Commerce One’ Links to news on companies that compete against Commerce One Links to news on companies Commerce One competes against (To view news on Ariba, click on the link for Ariba) Crucial news on Commerce One’s competitors (Ariba) can be accessed easily and automatically
  • 79. What else can a context do? (a commercial perspective) Semantic Enrichment Semantic Targeting
  • 80. Semantic/Interactive Targeting Precisely targeted through the use of Structured Metadata and integration from multiple sources Buy Al Pacino Videos Buy Russell Crowe Videos Buy Christopher Plummer Videos Buy Diane Venora Videos Buy Philip Baker Hall Videos Buy The Insider Video
  • 81. Semantic Application – Equity Dashboard Managing Semantic Content on the Web , Internet Computing 2002 Focused relevant content organized by topic ( semantic categorization ) Automatic Content Aggregation from multiple content providers and feeds Related news not specifically asked for (Semantic Associations) Competitive research inferred automatically Automatic 3 rd party content integration
  • 87. N-Glycosylation Process (NGP) Cell Culture Glycoprotein Fraction Glycopeptides Fraction extract Separation technique I Glycopeptides Fraction n*m n Signal integration Data correlation Peptide Fraction Peptide Fraction ms data ms/ms data ms peaklist ms/ms peaklist Peptide list N-dimensional array Glycopeptide identification and quantification proteolysis Separation technique II PNGase Mass spectrometry Data reduction Data reduction Peptide identification binning n 1
  • 88. Workflow based on Web Services = Web Process
  • 89. ISiS – Integrated Semantic Information and Knowledge System Semantic Annotation Applications Semantic Web Process to incorporate provenance Storage Standard Format Data Raw Data Filtered Data Search Results Final Output Agent Agent Agent Agent Biological Sample Analysis by MS/MS Raw Data to Standard Format Data Pre- process DB Search (Mascot/Sequest) Results Post-process (ProValt) O I O I O I O I O Biological Information
  • 92. Relationship Extraction 9284 documents 4733 documents Biologically active substance Disease or Syndrome causes affects causes complicates Fish Oils Raynaud’s Disease ??????? instance_of instance_of 5 documents UMLS MeSH PubMed Lipid affects
  • 94. Example PubMed abstract (for the domain expert) Abstract Classification/Annotation
  • 95. Method – Parse Sentences in PubMed SS-Tagger (University of Tokyo) SS-Parser (University of Tokyo) (TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) ) REmBRANDTS: REtrieval, BRowsing, Analytics and kNowledge Discovery over Text using Semantics
  • 96. Method – Identify entities and Relationships in Parse Tree Modifiers Modified entities Composite Entities
  • 97. Hypothesis Driven retrieval of Scientific Literature PubMed Keyword query: Migraine[MH] + Magnesium[MH] Complex Query Supporting Document sets retrieved Migraine Stress Patient affects isa Magnesium Calcium Channel Blockers inhibit
  • 98. Eli Lilly Case Study
  • 101. Critical Path Initiative Source: Innovation or Stagnation, FDA Report, March 2004
  • 102. The Web of Heterogeneous Data Cell/Assay Technologies
  • 103. Integrative Informatics Platform Discovery Clinical Public Proteomics Imaging In-Vitro/Vivo Genotyping SNPs/Haplotypes XREP plus IQ Proteomics Informatics System/Semantic Integration Layer Integrative Informatics PGI Lipid Informatics Tailored Therapeutic Workbench (TTW) Integrative Data Mining/Query System Lipidomics Gene Expression Info Mining Text Translational Informatics
  • 108. Semantic Web in Practice
  • 116. Adapting SQL Databases Source: Tim Berners-Lee
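The relational-to-RDF mapping behind "Adapting SQL Databases" can be sketched in a few lines: the table name plus primary key mint a subject URI, each column becomes a predicate, and each cell an object. The table, column names, and base URI below are illustrative, not from the slides:

```python
# Sketch: generate RDF-style triples from SQL rows. Table, columns,
# and the http://example.org/ base URI are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, year INT)")
con.execute("INSERT INTO book VALUES ('000651409X', 'The Glass Palace', 2000)")

def rows_to_triples(con, table, key, base="http://example.org/"):
    """Yield (subject, predicate, object) triples for every row."""
    cur = con.execute(f"SELECT * FROM {table}")   # illustrative only
    cols = [d[0] for d in cur.description]
    for row in cur:
        subject = f"{base}{table}/{row[cols.index(key)]}"  # key mints the URI
        for col, val in zip(cols, row):
            if col != key:
                yield (subject, f"{base}{table}#{col}", val)

for t in rows_to_triples(con, "book", "isbn"):
    print(t)
```

As the notes later point out, such relations can equally be generated on the fly at query time via SQL bridges rather than by materializing the triples.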
  • 120. Reasoning Systems: KAON2, Pellet, CEL

Editor's notes

  1. Tasks often require data to be combined on the web (hotel and travel info). Humans can do that easily, even if different terminology is used. However, machines are ignorant. Partial information is unusable, it is difficult to make sense of an image, drawing analogies is difficult, and it is difficult to combine information, e.g. is <foo:creator> the same as <bar:author>? How do you combine different XML schemas? - The Semantic Web is an interoperability technology. The traditional approach to interoperability is standardization. But standardization in this sense isn't attractive, because it means that one has to anticipate everything about the future, or one has to limit the world somehow. With the Semantic Web there's more of a focus on standardizing how to say things, not what to say. There's less of an emphasis on a priori standardization. The ultimate goal with the SW is serendipitous interoperability: allowing information from independent sources to be combined. Different domains have languages of their own, and their conceptual models of the world differ. The SW was designed to accommodate different points of view. Using SW formalisms lowers the threshold for serendipitous reuse. The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. So the Semantic Web is all about breaking down silos of data, so that people can make better decisions, because they now have a 360 degree view of their operations. The Semantic Web is about forming a web of data, with the data itself residing in databases, spreadsheets, XML repositories, wiki pages or traditional web pages. The Semantic Web is the glue that can bring data from these different sources together. And it's an approach that is based on open standards.
Labeling data on the Web so that both humans and machines can use them more effectively. Associating meaning with data that machines can understand, so as to achieve a lot more automation and off-load more work to machines. Exploiting a common vocabulary and richer modeling of the subject area for much better integration of data.
  2. Map data onto an abstract data representation. Make the data independent of its internal representation. Merge the resulting representations. Start making queries on the whole - queries that could not have been done on the individual data sets. Use labels on the edges (relations). Use unique identifiers. Data export does not always mean physical conversion of the data; relations can be generated on the fly at query time - via SQL bridges, scraping HTML, extracting data from Excel. You can export part of the data. Can use sameAs. E.g. integrating information about authors (ISBN identifiers). Author isA Person, so one could find more in DBpedia for example. Include full classification of books, geographical location, information on inventories and prices - would then need ontologies and rules. The graph representation is independent of the exact structure of, for example, the relational schema. A change in the local database schema or the XHTML does not affect the whole export step. New data and new connections can be added seamlessly, regardless of the structure of other data sources. The Semantic Web makes this possible.
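The export-and-merge idea in this note can be sketched without any RDF library: two independently published data sets are exported as triples, shared URIs make the merge a simple set union, and a query can then span both sources. The identifiers below are hypothetical:

```python
# Minimal sketch (plain Python, no RDF library): merging two exported
# triple sets on shared URIs, then querying the union. All URIs and
# the ISBN are illustrative.
BOOK = "urn:isbn:0-00-651409-X"          # hypothetical identifier

bookstore = {
    (BOOK, "ex:title", "The Glass Palace"),
    (BOOK, "ex:author", "http://example.org/people/ghosh"),
}
library = {
    (BOOK, "ex:year", "2000"),
    ("http://example.org/people/ghosh", "ex:name", "Amitav Ghosh"),
}

merged = bookstore | library             # merge = set union on shared URIs

def objects(graph, subject, predicate):
    """All objects of triples matching (subject, predicate, ?o)."""
    return [o for s, p, o in graph if s == subject and p == predicate]

# A query neither source could answer alone: the name of the author
# of the book that the library says was published in 2000.
author = objects(merged, BOOK, "ex:author")[0]
print(objects(merged, author, "ex:name"))   # -> ['Amitav Ghosh']
```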
  3. Combine hotel and travel information – hard for machines to do. CoCoDat, cell centered database, NeuronDB
  4. Can I trust (meta)data on the web? Is the author who s/he says s/he is? Can I check his/her credentials? Can I trust the inference engine How to name a full graph Protocols and policies to encode/sign full or partial graphs (blank nodes may be a problem for achieving uniqueness) How to express trust (trust in context) So I want to ground my earlier comments by giving you an overview of the semantic web standards. I’m going to start by giving you a high level overview of the different standards and technologies that make up the semantic web. I then have more slides that cover the key standards in some detail. So the semantic web takes advantage of many mature standards as it builds upon unicode, URIs, and XML. The use of URIs grounds RDF in the web. The semantic web uses XML in a couple of different ways. The semantic web takes advantage of XML namespaces for representing URIs. XML schema data types are used to characterize literals in RDF, and the standard way to serialize RDF is to use XML. RDF has been a W3C standard since 2004. RDF is a standard format for data interchange with the semantic web. RDF Schema is a language for describing RDF vocabularies, and is meant for inferencing. OWL has been a W3C standard since 2004. OWL is a more complex languages for describing RDF vocabularies. Again it is intended for inferencing. W3C formed a working group in the area of rules last Nov. It has been recognized that adding rules to the SW framework would provide additional expressivity. The goal of W3C’s work in this area is to provide a rule interchange format, so that the different rule engines can talk to each other. Gary Hallmark is doing an excellent job representing Oracle in this working group. SPARQL is the standard that W3C are working on for querying graphs. So it’s a bit like SQL for the SW. And Fred Zemke is doing a great job representing Oracle in this group. The upper layers of the stack are areas that still need to be standardized in the future. 
A proof language is simply a language that lets us prove whether or not a statement is true. The spectrum of the topic of trust is large - ranging from digital signatures and security to social network analysis. And crypto refers to the need to have support for security throughout the stack.
  5. - RDF or Resource Description Framework is the main syntactic standard for the Semantic Web. - In the Semantic Web things are referred to as 'resources'; a resource can be anything that someone might want to talk about, e.g. Shakespeare, Stratford, "the value of π". A Subject, a Predicate and an Object together form the basic building block for RDF - the triple. Using the triple it is possible to make statements about facts, statements about associations, and statements about statements. Triples become more interesting when more than one triple refers to the same entity. When more than one triple refers to the same thing, sometimes it is convenient to view the triples as a directed graph, in which each triple is an edge from its subject to its object, with the predicate as the label on the edge. One value of representing data in triples is the ease with which information from multiple sources can be combined. Since information is represented simply as triples, merging information from graphs A and B should be as simple as forming the graph of all of the triples from A and B together. Merging graphs to create a combined graph is a straightforward process, but only when it is known which nodes in each of the source graphs match. That is, the essence of the merge comes down to answering the question, "when is a node in one graph the same node as a node in another graph?" In RDF, this issue is resolved through the use of URIs. A URI provides a global identification for a resource that is common across the web. If two agents on the web want to refer to the same resource, recommended practice on the web is for them to agree on a common URI for that resource. This is not a stipulation that is particular to the Semantic Web, but to the web in general; global naming leads to global network effects. RDF borrows the notion of URI to resolve the identity problem in graph merging.
The application is quite simple; a node from one graph is merged with a node from another graph, exactly if they have the same URI. The RDF standard itself takes advantage of the namespace infrastructure to define a small number of standard identifiers in a namespace defined in the standard, a namespace called rdf . rdf:type is a property that provides an elementary typing system in RDF. For example, we can express the relationship between several entities using type information. The subject of rdf:type in these triples can be any identifier, and the object is understood to be a type. There is no restriction on the usage of rdf:type with types; types can have types etc.: rdf:Property is an identifier that is used as a type in RDF to indicate when another identifier is to be used as a predicate, rather than as a subject or an object. We can declare all the identifiers we have used as predicates So using RDF, information can also be assembled into useful blocks of knowledge. Any information expressed in RDF can be connected to any other information expressed in RDF, in much the same manner that any document expressed in HTML can link to any other document in HTML. This makes data very re-useable, and doesn’t trap people into an inflexible data model. Databases actually also use triples, but the components of the triple are a row, a column and the value in the field. RDF triples are very simplistic and flexible and so they can represent data in a table structure, in a tree structure, or in any structure.
  6. RDF is a labeled connection between resources: subject, predicate, object (s, p, o). Use URIs for naming; URIs make the merge possible. Machine-readable formats. The triples form a directed labeled graph (DLG).
  7. This is the simplest form of extra knowledge: define the terms we can use, what restrictions apply, and what extra relations there are. RDF Schema is a vocabulary description language, e.g. use the term novel: every novel is a fiction; The Glass Palace is a novel. Everything in RDF is a resource. Classes are collections of resources. Relationships are defined between classes and resources. Typing: an individual belongs to a particular class - e.g. The Glass Palace is a novel (or, to be more precise, ISBN xxx is a novel). Subclassing: the instance of one is also the instance of another - e.g. every novel is a fiction. RDFS formalizes these notions. Resources and classes are nodes, while types and subClassOf are properties. A resource may belong to several classes: The Glass Palace is a novel and an inventory item. The type information can be very important for applications; it can be used to categorize nodes. Inferred data is not in the original data but can be inferred from the RDFS rules. Some RDF environments can return that triple too. RDF semantics has 44 entailment rules: if such and such triples are in the graph, add these triples; do that recursively until the graph doesn't change. This can be done in polynomial time. Whether triples are added to the graph or deduced at run time is an implementation issue. Property is a special class; properties are also resources identified by URIs. Properties are constrained by their range and domain, i.e. what individuals can serve as object and subject. There is also a possibility for sub-properties: all resources bound by the sub-property are also bound by the parent property. Literals may have a data type; natural language can also be specified. RDFS has some predefined classes and properties. They are not new concepts in the RDF model, just resources with agreed semantics.
collections (aka lists), containers (sequence, bag, alternatives), reification. Terminological assertions and axioms describe the separation of data from schema. So, working up the stack: RDF provides support for vocabularies through RDFS. RDFS supports hierarchies of classes and properties. This makes it easier to interpret data when people are talking at different levels of abstraction. It also allows the domain and range of properties to be constrained. rdfs:domain says what class the subject of a triple using that property belongs to. rdfs:range says what class the object of a triple using that property belongs to. RDFS represents the simplest ontology level. RDF Schema is meant for inferencing. So if Flipper is a dolphin, and dolphins are mammals, then you can infer that Flipper is a mammal. RDFS example: A rdf:type B, B rdfs:subClassOf C => A rdf:type C. Ex: Matt rdf:type Father, Father rdfs:subClassOf Parent => Matt rdf:type Parent
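The subClassOf entailment rule quoted in this note (A rdf:type B, B rdfs:subClassOf C => A rdf:type C) can be sketched as a rule applied to a fixed point, which also matches the note's remark that entailment rules are applied recursively until the graph stops changing. The class names are illustrative:

```python
# Sketch of one RDFS entailment rule applied to a fixed point:
# (A rdf:type B) and (B rdfs:subClassOf C)  =>  (A rdf:type C).
# Class and instance names are hypothetical.
TYPE, SUB = "rdf:type", "rdfs:subClassOf"

graph = {
    ("ex:TheGlassPalace", TYPE, "ex:Novel"),
    ("ex:Novel", SUB, "ex:Fiction"),
    ("ex:Fiction", SUB, "ex:LiteraryWork"),
}

def entail(g):
    g = set(g)
    while True:
        new = {(a, TYPE, c)
               for a, p1, b in g if p1 == TYPE
               for b2, p2, c in g if p2 == SUB and b2 == b} - g
        if not new:                      # fixed point: graph stopped changing
            return g
        g |= new

inferred = entail(graph)
print(("ex:TheGlassPalace", TYPE, "ex:LiteraryWork") in inferred)  # -> True
```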
  8. So moving up the stack further, you get to the Web Ontology Language, or OWL. Ontology is a very general term that is meant to cover the concepts and relationships used to describe and represent an area of knowledge. Ontologies are used to classify the terms used in a particular application, characterize possible relationships, and define possible constraints using those relationships. OWL is intended to be used when the information contained in documents needs to be processed by applications, as opposed to situations where the content only needs to be presented to humans. Ontologies can be very complex, or very simple. OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. exactly one), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes. OWL Lite, OWL DL and OWL Full have increasing expressivity. OWL Lite supports those users primarily needing a classification hierarchy and simple constraints. E.g. while it supports cardinality constraints, it only permits cardinality values of 1 or 0. It provides a quick migration path for thesauri and other taxonomies. It has a lower formal complexity than OWL DL. OWL DL supports those users who want the maximum expressiveness while retaining computational completeness (so all conclusions are guaranteed to be computable) and decidability (all computations will finish in a finite time). OWL DL contains all OWL language constructs, but they can be used only under certain restrictions (e.g. while a class can be a subclass of many classes, a class cannot be an instance of another class). OWL DL is so named due to its correspondence with description logics, a field of research that has studied the logics that form the formal foundations of OWL.
OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees. E.g. in OWL Full a class can be treated simultaneously as a collection of individuals and as an individual in its own right. It is unlikely that any reasoning software will be able to support complete reasoning for every feature of OWL Full. An ontology is an engineering artefact consisting of: a vocabulary used to describe (a particular view of) some domain; an explicit specification of the intended meaning of the vocabulary, which often includes classification-based information; and constraints capturing background knowledge about the domain. Ideally, an ontology should capture a shared understanding of a domain of interest and provide a formal, machine-manipulable model. -- An ontology describes the concepts and relationships that are important in a particular domain, providing a vocabulary for that domain as well as a computerized specification of the meaning of terms used in the vocabulary. Ontologies range from taxonomies and classifications, to database schemas, to fully axiomatized theories. In recent years, ontologies have been adopted in many business and scientific communities as a way to share, reuse and process domain knowledge. Ontologies are now central to many applications such as scientific knowledge portals, information management and integration systems, electronic commerce, and semantic web services.
  9. One last point that I’d like to make about ontologies, is that it is possible to merge them. This means that it’s possible to take several relatively small and modular ontologies, and join them together. The figure shows a number of different ontologies that could be merged together. And by doing that, it would become possible to query across a number of data sources that are typically in different silos. Merging ontologies is still a fairly manual process, but there are research groups that are focused on at least partially automating the task.
  10. Queries become very important for distributed RDF data. The fundamental idea: generalize the approach to graph patterns. The pattern contains unbound symbols. By binding the symbols (if possible), subgraphs of the RDF graph are selected; if there is such a selection, the query returns the bound resources. SPARQL is based on similar systems that already existed in some environments. SPARQL is a programming language-independent query language. Optional patterns: if they match variables, fine; if not, don't care. Limit the number of returned results; remove duplicates; sort them. Specify several data sources (via URIs) within the query (essentially a merge). Construct a graph combining a separate pattern and query results. Use datatypes and/or language tags when matching a pattern. Already many implementations: used locally, i.e. bound to a programming environment, or used remotely, i.e. over the network or into a database. Separate documents define the protocol and the result format: the SPARQL Protocol for RDF with HTTP and SOAP bindings, and SPARQL results in XML or JSON formats.
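The "bind unbound symbols against graph patterns" idea at the heart of SPARQL can be sketched with a naive matcher (a plain-Python stand-in, not a real SPARQL engine; graph contents are illustrative):

```python
# Naive sketch of SPARQL-style basic graph pattern matching: strings
# starting with '?' are variables; joining patterns accumulates
# consistent bindings. Not a real SPARQL engine.
graph = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "foaf:name",  "Bob"),
    ("ex:alice", "foaf:name",  "Alice"),
]

def match(pattern, triple, binding):
    """Extend binding so pattern matches triple, or return None."""
    b = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if b.get(p, t) != t:
                return None              # conflicting variable binding
            b[p] = t
        elif p != t:
            return None
    return b

def query(patterns, bindings=({},)):
    for pattern in patterns:             # join the basic graph patterns
        bindings = [b2 for b in bindings for t in graph
                    for b2 in [match(pattern, t, b)] if b2 is not None]
    return bindings

# Analogue of: SELECT ?name WHERE { ex:alice foaf:knows ?x . ?x foaf:name ?name }
rows = query([("ex:alice", "foaf:knows", "?x"), ("?x", "foaf:name", "?name")])
print([r["?name"] for r in rows])        # -> ['Bob']
```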
  11. GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. The GRDDL specification introduces markup, based on existing standards, for declaring that an XML document includes data compatible with the Resource Description Framework (RDF) and for linking to algorithms (typically represented in XSLT) for extracting this data from the document. The markup includes a namespace-qualified attribute for use in general-purpose XML documents and a profile-qualified link relationship for use in valid XHTML documents. The GRDDL mechanism also allows an XML namespace document (or XHTML profile document) to declare that every document associated with that namespace (or profile) includes gleanable data, and to link to an algorithm for gleaning the data. A corresponding GRDDL Use Case Working Draft provides motivating examples. A GRDDL Primer demonstrates the mechanism on XHTML documents which include widely-deployed dialects known as microformats. A GRDDL Test Cases document illustrates specific issues in this design and provides materials to aid in test-driven development of GRDDL-aware agents. GRDDL formalizes the scraper approach. The user has to provide dc-extract.xsl and use its conventions (making use of the corresponding meta elements, class ids, etc.). But by using the profile attribute, a client is instructed to find and run the transformation processor automatically. There is a mechanism for XML in general; a transformation can also be defined at the XML schema level. A bridge to microformats.
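A gleaning transformation of the kind this note describes can be sketched in miniature. The real mechanism links an XSLT transformation from the document's profile or namespace; the stand-in below instead gleans Dublin Core triples from `<meta>` elements of a small XHTML fragment (the page content and base URI are illustrative):

```python
# Miniature stand-in for a GRDDL transformation: glean Dublin Core
# triples from <meta> elements of an XHTML page. Real GRDDL links an
# XSLT transform from the document; this hand-written glean() is only
# a sketch. Page content and base URI are hypothetical.
import xml.etree.ElementTree as ET

xhtml = """<html><head>
  <meta name="dc.title"   content="Semantic Web Tutorial"/>
  <meta name="dc.creator" content="Amit Sheth"/>
</head><body/></html>"""

def glean(doc, base="http://example.org/page"):
    triples = []
    for meta in ET.fromstring(doc).iter("meta"):
        name = meta.get("name", "")
        if name.startswith("dc."):       # map dc.title -> dc:title, etc.
            triples.append((base, "dc:" + name[3:], meta.get("content")))
    return triples

for t in glean(xhtml):
    print(t)
```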
  12. The Web Services Description Language (WSDL) specifies a way to describe the abstract functionalities of a service and concretely how and where to invoke it. The WSDL 2.0 specification does not include semantics in the description of Web services; therefore, two services can have similar descriptions while meaning totally different things. Resolving this ambiguity in Web services descriptions is an important step toward automating the discovery and composition of Web services — a key productivity enabler in many domains including business application integration. The Semantic Annotations for WSDL and XML Schema (SAWSDL) specification being developed by this Working Group defines mechanisms by which semantic annotations can be added to WSDL components. SAWSDL does not specify a language for representing the semantic models, e.g., ontologies. Instead, it provides mechanisms by which concepts from the semantic models, defined either within or outside the WSDL document, can be referenced from within WSDL components as annotations. These semantics, when expressed in formal languages, can help disambiguate the description of Web services during automatic discovery and composition of the Web services. Based on the member submission WSDL-S, the key design principles for SAWSDL are: the specification enables semantic annotations for Web services using and building on the existing extensibility framework of WSDL; it is agnostic to semantic representation languages; it enables semantic annotations for Web services not only for discovering Web services but also for invoking them. Based on these design principles, SAWSDL defines the following three new extensibility attributes for WSDL 2.0 elements to enable semantic annotation of WSDL components: an extension attribute, named modelReference, to specify the association between a WSDL component and a concept in some semantic model.
This modelReference attribute is used to annotate XML Schema complex type definitions, simple type definitions, element declarations, and attribute declarations, as well as WSDL interfaces, operations, and faults; and two extension attributes, named liftingSchemaMapping and loweringSchemaMapping, that are added to XML Schema element declarations, complex type definitions, and simple type definitions for specifying mappings between semantic data and XML. These mappings can be used during service invocation.
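Mechanically, a SAWSDL annotation amounts to attaching a namespace-qualified sawsdl:modelReference attribute, whose value points at a concept in some semantic model, to a WSDL/XML Schema component. The schema element name and ontology URI below are invented for illustration.

```python
# Sketch: annotating an XML Schema element declaration (as used in WSDL)
# with a sawsdl:modelReference pointing at a (hypothetical) ontology concept.
import xml.etree.ElementTree as ET

SAWSDL_NS = "http://www.w3.org/ns/sawsdl"

wsdl_fragment = """<xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema"
                               name="OrderRequest"/>"""

elem = ET.fromstring(wsdl_fragment)
# Attach the annotation as a namespace-qualified attribute.
elem.set(f"{{{SAWSDL_NS}}}modelReference",
         "http://example.org/ontology#PurchaseOrder")

annotated = ET.tostring(elem, encoding="unicode")
print(annotated)
```

Because the annotation rides on WSDL's normal extensibility attributes, tools that do not understand SAWSDL can simply ignore it.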
  13. Another area where the Semantic Web can be used is in enterprise search, and this is in fact exactly what we're doing on OTN. We're currently just using the capability for the press room, but beta testing will be starting soon for more of the site. I believe that semantic search is of most value where you have a steady stream of highly structured information coming in, and where the semantic metadata is applied automatically in the background, e.g., filling in forms, where the form is backed by an ontology. With OTN we are using RDF to type the content, so that you can dynamically aggregate the site according to your area of interest. Using this approach, you aren't restricted to, say, drilling down into the database silo or the middleware silo; now you can choose to view all forums, or all sites about Java or life sciences. Oracle Technology Network (OTN) aggregates many sources of content through a single portal. Various feeds are pulled, and Oracle's taxonomy is used for additional annotation. Ontologies make it possible to re-group related items, which leads to more comprehensible search results and helps narrow search. Advantages include enhanced search and navigation, a better user interface, and better queries through SPARQL.
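The OTN-style aggregation described above can be sketched simply: once content items are typed with RDF, regrouping them across source silos is just an index over rdf:type triples. The topics and item URIs here are invented for illustration.

```python
# Sketch: dynamic aggregation of portal content by RDF type, regardless of
# which silo (forum, blog, article feed) the item originally came from.
from collections import defaultdict

triples = [
    ("ex:forumPost1", "rdf:type", "otn:JavaContent"),
    ("ex:article7",   "rdf:type", "otn:JavaContent"),
    ("ex:blogPost3",  "rdf:type", "otn:DatabaseContent"),
]

by_topic = defaultdict(list)
for subject, predicate, obj in triples:
    if predicate == "rdf:type":
        by_topic[obj].append(subject)

# "View all content about Java," cutting across the source silos:
print(sorted(by_topic["otn:JavaContent"]))  # ['ex:article7', 'ex:forumPost1']
```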
  14. Improving the reliability of search results. Define content labels: metadata with more information on search results (e.g., suitability for mobile devices, disabled users, or children). With a proper extension (Segala's Search Thresher): (1) search results are processed, trying to locate a content label file (in RDF), and (2) pages that contain a content label file are highlighted and additional information is displayed. More trust in, and warnings on, search results; with proper settings, results can be filtered.
  15. University of Texas SAPPHIRE: an integrated biosurveillance system that integrates data from multiple disparate sources. Data can be viewed from many different perspectives (disease surveillance, environmental protection, biosecurity and bioterrorism, veterinary surveillance, etc.). SAPPHIRE absorbs new data feeds easily (e.g., it was used successfully during the Katrina disaster). Advantages include dynamic adaptability and usage of disparate and heterogeneous data.
  16. Ordnance Survey Geographic Referencing Framework. Maintains definitive mapping data of Great Britain (used by the public and private sector). Integrates different, semantically diverse sources of data. Develops general ontologies (e.g., Hydrology, Administrative Geography, Buildings and Places) to bridge differences. Efficient queries via the ontology or directly in RDF. Advantages include efficient data integration, data repurposing, and better quality control and classification.
  17. The complexity of supply chains has increased; they involve many players of differing size and function. Support for Operational Support Systems (OSS) integration using semantic descriptions of system interfaces and messages. Internet Service Providers integrate their OSS with those of BT (via a gateway). Integration of heterogeneous OSS systems of partners. The approach reduces costs and time-to-market; ontologies allow for reuse of services.
  18. Tata Consultancy Services: Natural Language Interface to Business Applications. A domain OWL ontology aids in the retrieval of relevant data and concepts. Advantages include distinct semantics for the various concepts in the domain and the ability to link in external concepts; the system uses natural language to interact "intelligently" with the user.
  19. Reduction of errors in radiological procedure orders. Clinical (textual) guidelines exist to assist clinicians with appropriate radiological procedures. Clinical guidelines and protocols (from different sources) can be integrated and bound to (OWL) ontologies. SW engines generate recommendations for radiology orders. This helps prevent medical errors, distributes the development/maintenance of knowledge bases, and traces the provenance of facts and rules in decision making.
  20. Databases: focus on local semantics that capture only aspects of the real world; typically keep that semantics implicit; use logic structurally; their schemas are not generally reusable. Ontologies: focus on global semantics of the real world; make that semantics explicit; enable machine interpretability by using a logic-based modeling language; are reusable as true models of a portion of the world. 1. Intro. On the previous slide, I highlighted some of the differences between OWL Lite, OWL DL, and OWL Full. I want to show this slide to help position OWL compared to other technologies in terms of expressivity. 2. Ontologies - So ontologies are about vocabularies and their meanings, with an explicit, expressive, and well-defined semantics, possibly machine-interpretable. Ontologies try to limit the possible formal models of interpretation (semantics) of those vocabularies to the set of meanings a modeler intends, i.e., close to the human conceptualization. 3. DB - None of the other "vocabularies" such as database schemas or object models, with less expressive semantics, do that. The approaches with less expressive semantics typically assume that humans will look at the "vocabularies" and supply the semantics via the human semantic interpreter (your mental model). 5. Modeling - A given ontology cannot completely model any given domain. However, in capturing real-world semantics, you are thereby enabled to reuse, extend, refine, generalize, etc., that semantic model. You cannot reuse database schemas. You might be able to take a database conceptual schema and use that as the basis of an ontology, but that would still be a leap from an Entity-Relationship model to a Conceptual Model (say, UML, i.e., a weak ontology) to a Logical Theory (strong ontology). But logical and physical schemas are typically pretty useless, since they incorporate non-real-world knowledge (and in non-machine-interpretable form). By the time you have the physical schema, you just have relations and key information: you've thrown away the little semantics you had at the conceptual schema level.
  21. concerning classes and class-hierarchies
  22. Websites are converted into Freedom-internal descriptions. Part of the instance data is the CarbBank id, which is not really considered in this slide. Another part is the IUPAC glycan representation, which is two-dimensional and ambiguous. This is converted into the linear LINUCS format, which again is a purely syntactical translation of the IUPAC structure. For that reason, ambiguity might still be there in the individual residue descriptions, whereas the structure is unambiguous. LINUCS is then given to the LINUCS-to-GLYDE converter, which analyzes different spellings of monosaccharide residues to find an unambiguous representation, which is then compared to the instances in the KB.
  25. Websites are converted into Freedom-internal descriptions. Part of the instance data is the CarbBank id, which is not really considered in this slide. Another part is the IUPAC glycan representation, which is two-dimensional and ambiguous. This is converted into the linear LINUCS format, which again is a purely syntactical translation of the IUPAC structure. For that reason, ambiguity might still be there in the individual residue descriptions, whereas the structure is unambiguous. LINUCS is then given to the LINUCS-to-GLYDE converter, which analyzes different spellings of monosaccharide residues to find an unambiguous representation, which is then converted to the representation used in GlycO and compared to the instances already present in the KB.
  26. Use of a canonical representation: model components that may be used in a combinatorial manner. The rules of combination are also modeled as part of the ontology.
  27. A metabolic pathway is (ideally) a chain of reactions leading from one chemical compound to another (ideally, because pathways can branch at some points, leading to different intermediate structures with the same final structure). This slide shows the description of a reaction between two glycans in GlycO. The reaction adds the glycosyl residue N-glycan a-D-Glcp 71 to the acceptor residue N-glycan a-D-Glcp 70, forming the new lipid-linked N-glycan precursor molecule G00008 from the lipid-linked N-glycan precursor molecule G10599 (identifiers are from KEGG). The box with the bar graphs shows evidence for this reaction in different experiments: ES, embryonic stem cells; EB, embryonic body cells; RA, "retinoic acid-treated" cells. ES cells treated this way differentiate differently than cells grown in the absence of LIF, so there are two different kinds of differentiated cells (EB and RA).
  28. 1: The whole pathway is shown from the Dolichol compound, over the first sugar N-Acetyl-D-glucosaminyldiphosphodolichol (or GlcNAc-PP-dol), to the N-Glycan G00022 (KEGG accession no.), or (GlcNAc)7 (Man)3 (Asn)1 (just numbers of residues; the glycan doesn't have a common name, but belongs to a class of "pentaantennary complex-type sugar chains"). 2: GNT-I (UDP-N-acetyl-D-glucosamine:3-(alpha-D-mannosyl)-beta-D-mannosyl-glycoprotein 2-beta-N-acetyl-D-glucosaminyltransferase) catalyzes the reaction from 3-(alpha-D-mannosyl)-beta-D-mannosyl-R to 3-(2-[N-acetyl-beta-D-glucosaminyl]-alpha-D-mannosyl)-beta-D-mannosyl-R. 3: GNT-V (UDP-N-acetyl-D-glucosamine:6-[2-(N-acetyl-beta-D-glucosaminyl)-alpha-D-mannosyl]-glycoprotein 6-beta-N-acetyl-D-glucosaminyltransferase) catalyzes the reaction from 6-(2-[N-acetyl-beta-D-glucosaminyl]-alpha-D-mannosyl)-beta-D-mannosyl-R to 6-(2,6-bis[N-acetyl-beta-D-glucosaminyl]-alpha-D-mannosyl)-beta-D-mannosyl-R, which is part of the glycan G00021. 4: The part of the ontology tree just shows where GNT-V is. 5: The GNT-V entry in the ontology shows that N-Glycan_beta_GlcNAc_9 is added, with the help of the enzyme GNT-V, to a sugar containing the residue N-glycan_alpha_man_4. Why this is important for glycomics: G00021 is a so-called tetraantennary complex N-glycan. When the red GlcNAc beta 1-6 is present due to GNT-V, this chain can be extended with polylactosamine. Polylactosamine is found in some metastatic cells. A challenge now is to find out whether this glycan structure is always made by GNT-V; then we might be able to tell something about GNT-V and cancer. That is where probabilistic reasoning comes into play. Mention that man_4 and glcnac_9 are contextual residues. Mention GlycoTree.
  30. This slide shows a glycan in a pathway. Interesting here is the partonomy: you can see that the glycopeptide G10695 is composed of the glycan moiety N-Glycan G10695 and a representative peptide moiety. Representative means that we don't specify exactly which peptide moiety it is. Interesting for glycomics researchers is the fact that they are looking at a glycopeptide. The glycan moiety itself is composed of parts (the carbohydrate residues), which in turn are composed of atoms.
  31. Ontology-mediated provenance: use of ontology to construct and process provenance metadata.
  32. Zoom into examples
  33. Additional details can be found in http://www.semagix.com/documents/SemagixCIRASFinal_004.pdf
  34. Requirements: (a) serve a global population of 500 users; (b) complete all source checks in 20 seconds or less; (c) integrate with enterprise single sign-on systems; (d) meet complex name matching and disambiguation criteria; (e) adhere to complex security requirements. Results: rapid, accurate KYC checks; automatic audit trails; reduction in false positives; streamlined and enhanced due diligence of potential high-risk accounts.
  35. By N-glycosylation process, we mean the identification and quantification of glycopeptides. Separation and identification of N-glycans: proteolysis (treat with trypsin); separation technique I: chromatography, such as lectin affinity chromatography; from PNGase F we get fractions that contain peptides and glycans – we focus only on the peptides; separation technique II: chromatography, such as reverse-phase chromatography.
  36. Over the last decade there have been tremendous advances in biology. The human genome has been sequenced. It is possible to do large-scale experiments to look at the level at which all genes are active within a single cell. Technologies are available that allow the gene profile of individuals to be determined. Another area of tremendous advance is imaging, and this is twofold: there are high-throughput screening approaches for assays in the lab, but there have also been huge advances in medical imaging; for example, MRI approaches are much better for brain scans. New approaches have been discovered for switching off individual genes, so that we can better understand their functionality. All these advances are allowing companies to approach tailored therapeutics.
  37. Tailored therapeutics means the right dose for the right patient at the right time. Benefits include a better risk profile and lower cost, since trials can be more effective. Some drugs will now make it to market that would have failed otherwise. But it is the end of the blockbuster era.
  38. In order to embrace tailored therapeutics it is necessary to integrate data all of the way from basic research, through prototype design, to pre-clinical and clinical development. This is because it's necessary to combine genetic information with information about how the different patient subsets will respond to disease. This is proving to be a challenge for the pharma industry, as much data has historically been held in silos. The need for better integration in drug discovery and development is so acute that the US FDA has written a report, "Innovation or Stagnation", that specifically encourages companies to make better use of their data.
  39. Described the three main data integration approaches, but the right approach depends upon your data environment: How stable is the data? How much do you need to share it with collaborators? How important is performance? How much can you control the data (e.g., format, identifiers)? Does data provenance matter to you? Are you looking to perform simple queries or look for complex patterns? How critical is security? Are you working with many data types?
  40. It's a hybrid world – lots of relational, XML, and increasingly RDF/OWL. Keep data in its existing representation and map it to RDF; there are many tools for mapping relational and XML data to RDF (D2RQ, SquirrelRDF, DartGrid, GRDDL). Use ontologies as the semantic integration layer.
  41. - To date we have used SemWeb for TAT
  42. For companies to fully benefit from the Semantic Web approach, new software tools need to be introduced at various layers of an application stack. Databases, middleware, and applications must be enhanced in order to be able to work with RDF, OWL and SPARQL. As application stacks become Semantic Web enabled, they form a flexible information bus to which new components can be added. As the Semantic Web continues to mature, there is an increasing range of data repositories available that are able to store RDF and OWL. The earliest implementations of RDF repositories, or triple stores as they are also known, were primarily open source. The technology providers adopted varying architectural approaches. For example, 3store (http://threestore.sourceforge.net/) employs an in-memory representation, while Sesame (http://sourceforge.net/projects/sesame/) and Kowari (http://kowari.org) use a disk-based solution. More recently a number of commercial triple stores have become available. These include RDF Gateway from Intellidimension (http://www.intellidimension.com/), Virtuoso from OpenLink (http://virtuoso.openlinksw.com), Profium Metadata Server from Profium (http://www.profium.com/index.php?id=485), and the Oracle Database from Oracle (http://www.oracle.com/technology/tech/semantic_technologies). An interesting trend is that solutions from both Oracle and OpenLink enable RDF/OWL to be stored in a triple store alongside relational and XML data. This type of hybrid solution enables a single query to span multiple different data representations.
  43. Underlying technologies: Relational database (DB2, Oracle, MySQL) - RDF triples stored in a table (subject, predicate, object, named graph); save space by normalizing URIs and strings to integer ids; extra tables for history, ACLs, replication. J2EE (Jetty, Tomcat, WebSphere) - Jetty: standalone server, checkout from CVS and run for testing; WAS: enterprise-ready Web-application server for real deployment. JMS server (ActiveMQ, WebSphere MQ) - pub-sub messaging used for real-time notifications of triple updates. Named graphs: a named graph is the logical unit of RDF storage in Boca and the first-order unit of data access in the Boca programming model. Each triple exists in exactly one named graph; the same S,P,O in two different graphs implies two separate statements. Adding and removing triples is done in the context of a named graph. Each named graph has a metadata graph, containing information such as ACLs. Named graphs can be exposed via URLs, Web Services, LSIDs. Replication: Boca clients have a persistent local RDF store that mirrors a subset of the triples on the Boca server. The replicated subset is specified by: triple patterns, e.g. (<http://tdwg.org/meetings/GUID-2#>, <http://tdwg.org/preds/hasParticipant>, *); named graph URIs; or triple patterns within named graphs. When a replication is initiated, the service computes what has changed in the subset based on pattern and graph subscriptions. Replication can work as a background process on the client, or be explicitly initiated. Applications can query/write against graphs in the local and server models. Notification: updates to named graphs on the server are published in near real-time to clients. Local replicas can be kept up-to-date between replications.
Notification is central to distributed RDF applications (e.g., workflow, collaboration). Access controls: Boca users can have the following system-wide permission: canInsertNamedGraphs -- a user must have this permission in order to create a new named graph (i.e. insert statements into a graph that does not yet exist in the system). Boca users can have the following per-named-graph permissions (these apply also to the system graph): canRead -- a user with this permission may view the triples in the named graph and in its metadata graph; canAdd -- a user with this permission may insert new triples into the named graph; canRemove -- a user with this permission may remove triples from the named graph; canChangeNamedGraphACL -- a user with this permission may change the ACL triples in the metadata graph; canRemoveNamedGraph -- a user with this permission may entirely remove the named graph from the system. Versioning: an SVN-like approach. When a triple is added to or removed from a named graph, a new revision of that named graph is created. A simple API for reading old revisions provides a straightforward mechanism for concurrent distributed computing: when a client submits an update to a named graph, it may specify the version number that it currently has, and the update will fail if the graph has been more recently modified. Query: users may query Boca in a variety of ways: query the complete database, a subset of named graphs, a particular named graph, or the local store.
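The core Boca ideas described above (each triple lives in exactly one named graph, and reads are checked against a per-graph ACL) can be sketched in a toy model. This is an illustration of the design, not Boca's actual API; class and permission names are simplified.

```python
# Toy model of named-graph storage with a per-graph canRead ACL,
# in the spirit of the Boca design sketched in the notes.

class NamedGraphStore:
    def __init__(self):
        self.graphs = {}   # graph URI -> set of (s, p, o) triples
        self.acls = {}     # graph URI -> set of users with canRead

    def add(self, graph, triple, readers=()):
        # Triples are always added in the context of a named graph.
        self.graphs.setdefault(graph, set()).add(triple)
        self.acls.setdefault(graph, set()).update(readers)

    def read(self, graph, user):
        # Enforce the per-named-graph canRead permission.
        if user not in self.acls.get(graph, set()):
            raise PermissionError(f"{user} lacks canRead on {graph}")
        return self.graphs[graph]

store = NamedGraphStore()
t = ("ex:meeting", "ex:hasParticipant", "ex:alice")
store.add("ex:graph1", t, readers={"alice"})

print(store.read("ex:graph1", "alice"))  # the graph's triples, for an authorized user
```

Note that the same (s, p, o) added to a second graph would be a separate statement, matching the "each triple exists in exactly one named graph" rule.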
  44. Maintains fidelity (user-specified lexical form); supports long literal values. In Oracle 10g Release 2, there is support for both RDF and RDFS. The implementation takes advantage of the database's object-relational capabilities. All RDF triples are parsed out of documents and stored in the system as entries in tables under the MDSYS schema. An RDF triple is treated as one database object; a single RDF document that contains multiple triples will, therefore, result in many database objects. The user interacts with the triples at an object level, and the underlying implementation is transparent. A key feature of Oracle's RDF storage is that subject and object nodes are stored only once, regardless of the number of times they participate in triples. Subject and object nodes are reused if they already exist in the database. A new link, however, is always created whenever a new triple is inserted. When a triple is deleted from the database, the corresponding link is directly removed; however, the nodes attached to this link are not removed if there is at least one other link connected to them. A key component of the RDF Data Model is also its ability to store RDF graphs in separate models, thereby allowing the graphs to be partitioned. Bulk load and incremental DML load. In order to be able to query the RDF data in the database we have extended SQL, using the database's table function capability. The query capability has been designed to meet most of the requirements identified by W3C in SPARQL for graph querying. The table function allows a graph query to be embedded in a SQL query. It has the ability to search for an arbitrary pattern against the RDF data, including inferencing, based on RDF, RDFS, and user-defined rules.
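The storage idea just described (nodes stored once and reused, with a new link per inserted triple) can be sketched in a few lines. This is purely illustrative and not Oracle's actual schema; the store and URIs are invented.

```python
# Sketch: subject/object nodes are interned once; each inserted triple
# creates a new "link" row referencing the node ids.

class InternedTripleStore:
    def __init__(self):
        self.node_ids = {}   # lexical value -> integer node id
        self.links = []      # (subject_id, predicate, object_id)

    def _node(self, value):
        # Reuse the node if it already exists; otherwise assign a new id.
        return self.node_ids.setdefault(value, len(self.node_ids))

    def insert(self, s, p, o):
        self.links.append((self._node(s), p, self._node(o)))

store = InternedTripleStore()
store.insert("ex:Amit", "ex:worksAt", "ex:KnoesisCenter")
store.insert("ex:KnoesisCenter", "ex:locatedIn", "ex:Ohio")

# "ex:KnoesisCenter" participates in two triples but is stored only once:
print(len(store.node_ids), len(store.links))  # 3 nodes, 2 links
```

Deleting a triple in this scheme would remove only its link row; a node would be garbage-collected only when no remaining link references it.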
  45. D2R MAP is a declarative language to describe mappings between relational database schemata and OWL/RDFS ontologies. The mappings can be used by a D2R processor to export data from a relational database into RDF. The mapping process executed by the D2R processor has four logical steps: 1. Selection of a record set from the database using SQL. 2. Grouping of the record set by the d2r:groupBy columns. 3. Creation of class instances and identifier construction. 4. Mapping of the grouped record set data to instance properties. As Semantic Web technologies mature, there is a growing need for RDF applications to access the content of non-RDF, legacy databases without having to replicate the whole database into RDF. D2RQ is a declarative language to describe mappings between relational database schemata and OWL/RDFS ontologies. The mappings allow RDF applications to access the content of huge, non-RDF databases using Semantic Web query languages like SPARQL.
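The four D2R processing steps listed above (select, group, create instances, map properties) can be sketched with an in-memory record set standing in for the SQL result. The table, columns, and ontology terms are invented for illustration; this is the shape of the process, not the D2R MAP language itself.

```python
# Sketch of the four logical D2R mapping steps over a fake record set.
from itertools import groupby

# Step 1: result of a (simulated) SQL selection from a relational table.
records = [
    {"paper_id": 1, "title": "RDF Storage", "author": "Alice"},
    {"paper_id": 1, "title": "RDF Storage", "author": "Bob"},
    {"paper_id": 2, "title": "OWL Reasoning", "author": "Carol"},
]

triples = []
# Step 2: group the record set by the d2r:groupBy column.
records.sort(key=lambda r: r["paper_id"])
for paper_id, rows in groupby(records, key=lambda r: r["paper_id"]):
    rows = list(rows)
    # Step 3: create a class instance and construct its identifier (URI).
    uri = f"ex:paper/{paper_id}"
    triples.append((uri, "rdf:type", "ex:Paper"))
    # Step 4: map the grouped record set data to instance properties.
    triples.append((uri, "ex:title", rows[0]["title"]))
    for row in rows:
        triples.append((uri, "ex:author", row["author"]))

print(len(triples))  # 2 papers -> 2 type + 2 title + 3 author = 7 triples
```

Grouping is what turns the row-per-author join result into one RDF instance with multi-valued properties.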
  46. Although the ability to store data in RDF and OWL is very important, it is expected that much data will remain in relational and XML data formats. Consequently, tools have been developed that allow data in these formats to be made available as RDF. The most commonly used approach for mapping between relational representations and RDF is D2RQ (http://sites.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/), which treats non-RDF relational databases as virtual RDF graphs. Other toolkits for mapping relational data to RDF include SquirrelRDF (http://jena.sourceforge.net/SquirrelRDF/) and DartGrid (http://ccnt.zju.edu.cn/projects/dartgrid/dgv3.html). Alternative approaches for including relational data within the Semantic Web include FeDeRate (http://www.w3.org/2003/01/21-RDF-RDB-access/), which provides mappings between RDF queries and SQL queries, and SPASQL (http://www.w3.org/2005/05/22-SPARQL-MySQL/XTech.html), which eliminates the query-rewriting phase by modifying MySQL to allow parsing SPARQL directly within the relational database server. It is also important to be able to map XML documents and XHTML pages into RDF. W3C will soon have a candidate recommendation, Gleaning Resource Descriptions from Dialects of Languages (GRDDL) (http://www.w3.org/2004/01/rdxh/spec), that enables XML documents to designate how they can be translated into RDF, typically by indicating the location of an XML Stylesheet Translation (XSLT). Work is also underway to develop mapping tools from other data representations to RDF (http://esw.w3.org/topic/ConverterToRdf). The ability to map data into RDF from relational, XML, and other data formats, combined with the data integration capabilities of RDF, makes it exceptionally well positioned as a common language for data representation and query.
  47. Protégé is a free, open-source platform that provides a growing user community with a suite of tools to construct domain models and knowledge-based applications with ontologies. At its core, Protégé implements a rich set of knowledge-modeling structures and actions that support the creation, visualization, and manipulation of ontologies in various representation formats. Protégé can be customized to provide domain-friendly support for creating knowledge models and entering data. Further, Protégé can be extended by way of a plug-in architecture and a Java-based Application Programming Interface (API) for building knowledge-based tools and applications. SWOOP is meant for rapid and easy browsing and development of OWL ontologies. The GrOWL project aims to reconcile modern ontology languages with the philosophy of semantic networks and concept maps by providing a set of graphical idioms that cover almost every knowledge modelling construct, presenting a "knowledge space" to users in an intuitive graphical translation. GrOWL provides a graphical browser and an editor of OWL ontologies that can be used stand-alone or embedded in a web browser. While it is in alpha stage, it is already being used in a number of projects, including the Ecosystem Services Database and SEEK. GrOWL is open source, written in Java, and can be downloaded from its site. TopBraid Composer™ is an enterprise-class platform for developing Semantic Web ontologies and building semantic applications. Fully compliant with W3C standards, Composer offers comprehensive support for developing, managing and testing configurations of knowledge models and their instance knowledge bases. Composer provides a flexible and extensible framework with a published API for developing semantic client/server or browser-based solutions that can integrate disparate applications and data sources.
IODT is a toolkit for ontology-driven development. This toolkit includes the EMF Ontology Definition Metamodel (EODM), the EODM workbench, and an OWL ontology repository (named Minerva). OntoTrack is a new browsing and editing "in one view" ontology authoring tool for OWL (currently a subset of OWL Lite called OWL Lite-). It combines a sophisticated graphical layout with mouse-enabled editing features optimized for efficient navigation and manipulation of large ontologies. SemanticWorks™ 2007 (from Altova) provides powerful, easy-to-use functionality for: visual creation and editing of RDF, RDF Schema (RDFS), OWL Lite, OWL DL, and OWL Full documents using an intuitive, visual interface and drag-and-drop functionality; syntax checking to ensure conformance with the RDF/XML specifications; auto-generation and editing of RDF/XML or N-triples formats based on visual RDF/OWL design; printing the graphical RDF and OWL representations to create documentation for Semantic Web implementations. Typical tasks in ontology engineering: author concept descriptions, refine the ontology, manage errors, integrate different ontologies, (partially) reuse ontologies. These tasks are highly challenging; there is a need for tool & infrastructure support and for design methodologies.
  48. Bossam, a rule-based OWL reasoner (free, well-documented, closed-source). FaCT++, an OWL DL reasoner implemented in C++. KAON2, an infrastructure for managing OWL-DL, SWRL, and F-Logic ontologies; it is capable of manipulating OWL-DL ontologies, and queries can be formulated using SPARQL. Pellet, an open-source Java-based OWL DL reasoner; it can be used in conjunction with both the Jena and OWL API libraries, and can also be downloaded and included in other applications. RacerPro, an OWL reasoner and inference server for the Semantic Web.
  49. Zitgist (Zitgist LLC, USA) SWSE (DERI, Ireland) Swoogle (UMBC)
50. If a company has been able to consistently assign URIs across all of its data sources, then integrating data with RDF is straightforward. However, if different departments initially assigned different URIs to the same entity, then OWL constructs can be used to relate the equivalent URIs. Several vendors provide middleware designed for such ontology-based data integration, including OntoBroker (http://www.ontoprise.de/content/e1171/e1231/), TopBraid Suite (http://www.topquadrant.com/tq_topbraid_suite.htm), Protégé (http://protege.stanford.edu/), and Cogito Data Integration Broker (http://www.cogitoinc.com/databroker.html). A number of organizations have developed toolkits for building Semantic Web applications. The most commonly used is the Jena framework for Java-based Semantic Web applications (http://jena.sourceforge.net/). It provides a programmatic environment for reading, writing, and querying RDF and OWL. A rule-based inference engine is provided, as well as standard mechanisms for using other reasoning engines such as Pellet (http://pellet.owldl.com/) or FaCT++ (http://owl.man.ac.uk/factplusplus/). Sesame is another frequently used Java-based Semantic Web toolkit that supports both query and inference (http://www.openrdf.org/about.jsp). A further commonly used application is the Haystack Semantic Web Browser, which is based on the Eclipse platform (http://haystack.csail.mit.edu/staging/eclipse-download.html). Many other programming environments are available for developing Semantic Web applications in a range of languages (http://esw.w3.org/topic/SemanticWebTools).
RDF browsers: Longwell is a web-based, RDF-powered, highly configurable faceted browser. Semantic Web browsers: Haystack. webMethods: using Semantic Web technologies (RDF/OWL), this metadata allows developers to find assets by context (rather than just name) and to perform complex searches to ascertain relationships and dependencies between assets. In an SOA environment, where services might have complicated interdependencies with other services, this capability is vital for allowing a developer to understand the impact if a given service is changed (for example, knowing which business process depends on the VerifyCredit service). This ability to provide context helps organizations manage the complexity of their environment, which is especially important as more moving parts are introduced. Exhibit is a lightweight structured-data publishing framework that lets you create web pages with support for sorting, filtering, and rich visualizations by writing only HTML and, optionally, some CSS and JavaScript code.
It is like Google Maps and Timeline, but for structured data normally published through database-backed web sites. Exhibit essentially removes the need for a database or a server-side web application. Its JavaScript-based engine makes it easy for anyone with a little knowledge of HTML and a small data set to share it with the world and let people easily interact with it. Haystack is a tool designed to let individuals manage all their information in ways that make the most sense to them. By removing arbitrary barriers created by applications that handle only certain information "types" and that record only a fixed set of relationships defined by the developer, we aim to let users define whichever arrangements of, connections between, and views of information they find most effective. Such personalization of information management will dramatically improve everyone's ability to find what they need when they need it.
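The URI-equivalence approach to integration described above can be sketched in plain Python. This is a toy illustration with made-up URIs and an `owl:sameAs` shorthand, not a real RDF library: two departments assign different URIs to the same customer, and the sameAs link lets their triples merge onto one canonical URI.

```python
# Toy sketch of ontology-based data integration via owl:sameAs.
SAME_AS = "owl:sameAs"

def merge_equivalent(triples):
    """Rewrite subjects/objects so equivalent URIs share one canonical URI."""
    # Build equivalence classes from owl:sameAs statements (union-find style).
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:            # keep the lexicographically smaller URI
                ra, rb = rb, ra    # as the canonical representative
            parent[rb] = ra

    for s, p, o in triples:
        if p == SAME_AS:
            union(s, o)

    # Rewrite every triple onto canonical URIs, dropping the sameAs links.
    return {(find(s), p, find(o)) for s, p, o in triples if p != SAME_AS}

sales = [("urn:sales:cust42", "hasName", '"Ada Lovelace"'),
         ("urn:sales:cust42", "boughtProduct", "urn:prod:17")]
support = [("urn:support:u99", "openTickets", '"3"'),
           ("urn:support:u99", SAME_AS, "urn:sales:cust42")]

merged = merge_equivalent(sales + support)
# All facts now attach to a single canonical customer URI.
```

In a real deployment a reasoner such as Pellet or an OWL-aware store performs this "smushing"; the sketch only shows why a single `owl:sameAs` statement is enough to unify two departments' records.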
51. Options 1 and 2 don't scale! Option 3: use conventions, e.g. class names or meta elements. This is similar to what microformats do (without referring to RDF). Such applications may not extract RDF but instead use the data directly in Web 2.0 applications, though the approach is not that different; other applications may extract the data to yield RDF (e.g. RSS 1.0).
4. GRDDL formalizes the scraper approach. The author has to provide dc-extract.xsl and use its conventions (making use of the corresponding meta elements, class ids, etc.). By using the profile attribute, a client is instructed to find and run the transformation processor automatically. There is a mechanism for XML in general; a transformation can also be defined at the XML Schema level. GRDDL is a bridge to microformats.
5. RDFa extends (X)HTML a bit by defining general attributes that add metadata to any element (a bit like the class attribute in microformats, but via dedicated properties). It provides an almost complete serialization of RDF in XHTML, and the same mechanism can also be used for general XML content. It is like microformats, but with more rigor and more generality: it makes it easy to mix vocabularies, which is not that easy with microformats, and it can easily be combined with GRDDL. Both GRDDL and RDFa aim at binding existing structured data to RDF: GRDDL brings structured data to RDF, while RDFa brings RDF to structured data. The same URI may be interpreted as a web page to be displayed by a browser, or as RDF data to be integrated.
Most data is in relational databases, and converting it all to RDF is impossible. Instead, bridges are being defined: a layer between RDF and the database maps relational tables to RDF graphs on the fly. In some cases the mapping is generic (columns represent properties; cells are, e.g., literals or references to other tables via blank nodes); in other cases separate mapping files define the details. This is a very important source of RDF data. Part of the metadata is already present in tools but thrown away at output; e.g. Photoshop CS stores metadata in RDF. RSS 1.0 amounts to a huge volume of RDF.
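The RDFa idea of adding metadata attributes to ordinary markup can be sketched as follows. The Dublin Core namespace is real, but the document URI and the surrounding page are invented for illustration:

```html
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.org/tutorial">
  <!-- The property attribute attaches an RDF statement to the element:
       the element's text becomes the object of the triple. -->
  <h1 property="dc:title">Semantic Web: Technologies and Applications</h1>
  <span property="dc:creator">Amit Sheth</span>
</div>
```

An RDFa-aware client reads this as two triples about `http://example.org/tutorial`, while an ordinary browser simply renders the heading and the name.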
  52.    DBpedia.org is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data.
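A sophisticated query against DBpedia might look like the following sketch; the `skos:subject` property and the category URI reflect early DBpedia conventions and may differ in later versions:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Ask Wikipedia-derived data a structured question:
# which resources fall in the category "Semantic Web"?
SELECT ?resource ?label
WHERE {
  ?resource skos:subject <http://dbpedia.org/resource/Category:Semantic_Web> .
  ?resource rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
```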
53. FAQ, best practices, standards: semanticweb.org?