2. Hypercube, Chemical Semantics, September 2013
What is this all about?
The principal objective of our enterprise is to create a
testbed for comprehensive exploration of ideas behind
the practical application of the Semantic Web in
computational chemistry.
The aforementioned working testbed (Chemical
Semantics Portal) is initially limited to computational
chemistry and a limited class of users.
In addition, we will focus on the semi-empirical, ab-initio
and density functional (DFT) calculations of quantum
chemistry and their typical results.
The purpose of this talk is to present the ideas of the
Semantic Web and their possible application in
computational chemistry, and to present the working
prototype of the Chemical Semantics Portal.
4. The evolution of the Web
WEB 1.0 - Web of documents
WEB 2.0 - Social, Read/Write Web
WEB 3.0 - Semantic Web = Web of Data
? WEB 4.0 - Intelligent Web ?
The web is only 8287 days* (nearly 23 years) old!
Print – 203,800 days
Newspapers – 142,800 days
Radio – 41,200 days
TV – 28,000 days
* Assuming Christmas 1990 as its beginning (http://en.wikipedia.org/wiki/History_of_the_World_Wide_Web)
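The day count can be checked with a few lines of date arithmetic. This is only a sketch: the starting date follows the slide's Christmas 1990 assumption, and the names `WEB_BIRTHDAY` and `web_age_days` are made up for the illustration.

```python
from datetime import date, timedelta

# Assumed starting point, taken from the slide's footnote: Christmas 1990.
WEB_BIRTHDAY = date(1990, 12, 25)


def web_age_days(on: date) -> int:
    """Number of days between the assumed birthday and the given date."""
    return (on - WEB_BIRTHDAY).days


# The calendar date on which the Web was exactly 8287 days old.
talk_date = WEB_BIRTHDAY + timedelta(days=8287)
age_in_years = 8287 / 365.25

print(talk_date.isoformat(), web_age_days(talk_date), round(age_in_years, 1))
```

Under this assumption the 8287th day falls in early September 2013, which matches the date of the talk.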
7. Web 3.0 – Semantic Web
2010-2020(?) - Web of Data, Linked Data Web
[Figure: Resources connected by Links, illustrated by an Organization linked to HR, Services, Products and People nodes by the typed links humanResources, hasServices, hasProducts, hasPeople, hasProduct and colleague]
9. Web 1.0 & 2.0 major issues
The WEB is TOO BIG to know
Social Web dwells in isolated silos
Data Deluge - Scientific data stored in isolated silos
People look at the Web through Google’s Goggles
11. What is the Semantic Web?
The Semantic Web is a Web of data. It is an extension of
the current Web that provides an easier way to find, share,
reuse and combine information.
“The vision of the Semantic Web is to extend principles of the Web
from documents to data. (...) This also means creation of a common
framework that allows data to be shared and reused across
application, enterprise, and community boundaries, to be
processed automatically by tools as well as manually, including
revealing possible new relationships among pieces of data.”
http://www.w3.org/2001/sw/
12. Foundations of the Semantic Web
“Semantic” in “Semantic Web” is about the MEANING of data, not
about the syntax it is expressed in.
Semantic Web = Web Full of Meaning = Web of meaningful
Data
The Semantic Web is about the representation of THINGS (OBJECTS
and CONCEPTS) and their properties on the Web, not just about
documents
The Semantic Web uses a global NAMING scheme to identify
THINGS, not just to address documents
The Semantic Web links THINGS with TYPED LINKS, not with “blind”
hyperlinks
The Semantic Web allows DISCOVERY of new FACTS about
THINGS, not just browsing through pages
* Picture by Roger Sayle (http://pubs.acs.org/doi/abs/10.1021/ci800243w)
13. Example
COC(=O)[C@H](C1=CC=CC=C1Cl)N2CCC3=C(C2)C=CS3
InChI=1S/C16H16ClNO2S/c1-20-16(19)15(12-4-2-3-5-13(12)17)18-8-6-14-11(10-18)7-9-21-14/h2-5,7,9,15H,6,8,10H2,1H3/t15-/m0/s1
InChIKey=GKTWGGQPFAXNFI-HNNXBMFYSA-N
“Plavix” (Clopidogrel)
* Based on “Foreign Language Translation of Chemical Nomenclature by Computer” by Roger Sayle (DOI: 10.1021/ci800243w)
http://www.chemspider.com/InChIKey=GKTWGGQPFAXNFI-HNNXBMFYSA-N
14. How do we represent THINGS on the Semantic Web?
On the Semantic Web we represent THINGS using elementary UNITS
of data: TRIPLES.
Subject Predicate Object
Thing Property Value
For example:
:H2O gnvc:hasInChIString “1S/H2O/h1H2”
:hasMolecularMass “18.0153”
We can create logical and structural relations between elements of the triple, build taxonomies,
vocabularies and classes and finally “reason” on large sets of triples.
The data model in which we store the triples is called RDF (Resource Description Framework).
“RDF is for THINGS as HTML is for DOCUMENTS”
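To make the triple idea concrete, here is a minimal Python sketch that holds the statements about water as (subject, predicate, object) tuples and looks them up by pattern. It is an illustration only, not an RDF library: `Triple` and `match` are hypothetical names, and attaching `:hasMolecularMass` to `:H2O` is an assumption, since the slide does not repeat the subject.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The two statements from the slide. Using :H2O as the subject of the
# second statement is an assumption made for this sketch.
TRIPLES = [
    Triple(":H2O", "gnvc:hasInChIString", "1S/H2O/h1H2"),
    Triple(":H2O", ":hasMolecularMass", "18.0153"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    for t in triples:
        if s is not None and t.subject != s:
            continue
        if p is not None and t.predicate != p:
            continue
        if o is not None and t.obj != o:
            continue
        yield t


# "What is the molecular mass of :H2O?"
masses = [t.obj for t in match(TRIPLES, s=":H2O", p=":hasMolecularMass")]
print(masses)
```

Leaving any position of the pattern open is the same idea that SPARQL variables express over a real triple store.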
15. How do we Identify Things on the Semantic Web
For unambiguous identification of things (objects) and their
properties on the Web, the Semantic Web uses URIs (Uniform Resource
Identifiers), a generalization of URLs, i.e. ordinary Web addresses:
Water: http://www.chemicalsemantics.com/h2o
Molecular Mass: http://purl.org/chem/ns#MM
“18.0153”: a number
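Once the subject and the property are named by URIs, the statement about water can be written as a single line of N-Triples, one of the standard RDF serializations. The helper below is a hedged sketch: `to_ntriples` is a hypothetical name, and it handles only plain string literals with basic escaping.

```python
def to_ntriples(subject_uri: str, predicate_uri: str, literal: str) -> str:
    """Serialize one triple with a plain literal object as an N-Triples line."""
    # Escape backslashes first, then double quotes (other escapes are omitted here).
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject_uri}> <{predicate_uri}> "{escaped}" .'


# URIs taken from the slide.
WATER = "http://www.chemicalsemantics.com/h2o"
MOLECULAR_MASS = "http://purl.org/chem/ns#MM"

line = to_ntriples(WATER, MOLECULAR_MASS, "18.0153")
print(line)
```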
17. Semantic Web allows Discovery
Semantic Web tools for building “intelligent”
vocabularies, RDFS (RDF Schema) and OWL
ontologies, allow for simple logical INFERENCES
and discovery of IMPLICIT facts.
For example:
When a user searches for a molecule with
specific properties, it is possible to automatically
provide them with other molecules that belong to
the same “class” of molecules.
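The "same class" suggestion can be sketched in a few lines of Python. The class assignments below are invented for illustration (they are not data from the portal), and `related_molecules` is a hypothetical helper, not part of any Semantic Web toolkit.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# Hypothetical (molecule, class) assertions, invented for this sketch.
TYPE_ASSERTIONS = [
    ("ex:quercetin", "ex:Flavonoid"),
    ("ex:kaempferol", "ex:Flavonoid"),
    ("ex:pyridine", "ex:Heterocycle"),
]


def related_molecules(molecule: str, assertions: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the other molecules that share at least one class with `molecule`."""
    members = defaultdict(set)     # class -> molecules in it
    classes_of = defaultdict(set)  # molecule -> classes it belongs to
    for mol, cls in assertions:
        members[cls].add(mol)
        classes_of[mol].add(cls)

    related: Set[str] = set()
    for cls in classes_of[molecule]:
        related |= members[cls]
    related.discard(molecule)
    return related


print(related_molecules("ex:quercetin", TYPE_ASSERTIONS))
```

In a real deployment the class membership would come from an RDFS or OWL reasoner rather than from a hand-written list.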
18. Semantic Web = GGG (Giant Global Graph)
[Figure: graph of an Organization linked to HR, Services, Products and People nodes by the typed links humanResources, hasServices, hasProducts, hasPeople, hasProduct and colleague]
GGG – term coined by Tim Berners-Lee in 2007
Ooops… sorry, but it’s BIG
19. Core Semantic Web Technologies
RDF – Resource Description Framework
RDFa – RDF “in attributes”
RDFS – Resource Description Framework Schema language
OWL – Web Ontology Language
SPARQL – SPARQL Protocol and RDF Query Language
RIF – Rule Interchange Format
RDF deals with THINGS
RDFa makes it possible to embed RDF in ordinary HTML Web pages
RDFS deals with SETS and CLASSES of THINGS
OWL deals with intelligent VOCABULARIES (with logical relations between
concepts)
SPARQL allows for searching through graphs of triples stored in “triple stores”
RIF makes it possible to express and interchange generalized IF...THEN constructs
20. ... and one about Semantic Web Philosophy
AAA – Anyone can say Anything about Any Topic.
OWA – Open World Assumption.
We must assume that a new piece of information may arrive at any time, so
we can’t assume that we have ALL the information at the moment of
information consumption.
It also means that not knowing something does not necessarily
imply falsity!
Hendler Hypothesis:
“A Little Semantics Goes A Long Way”
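The practical effect of the Open World Assumption can be shown with a three-valued lookup: a statement that is not in the store is reported as unknown rather than false. This is a simplified sketch with invented example statements; real OWL reasoners handle negation in a far richer way.

```python
from typing import Optional, Set, Tuple

Statement = Tuple[str, str, str]

# Invented example knowledge, for illustration only.
KNOWN_TRUE: Set[Statement] = {("ex:water", "ex:isLiquidAt", "ex:roomTemperature")}
KNOWN_FALSE: Set[Statement] = set()  # explicitly negated statements would go here


def ask_closed_world(statement: Statement) -> bool:
    """Closed world: anything that is not recorded is treated as false."""
    return statement in KNOWN_TRUE


def ask_open_world(statement: Statement) -> Optional[bool]:
    """Open world: True, False, or None when the answer is simply unknown."""
    if statement in KNOWN_TRUE:
        return True
    if statement in KNOWN_FALSE:
        return False
    return None


unrecorded = ("ex:ethanol", "ex:isLiquidAt", "ex:roomTemperature")
print(ask_closed_world(unrecorded), ask_open_world(unrecorded))
```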
21. Hendler Hypothesis in action...
Linked Data: Four Principles
• Use WEB ADDRESSES (URLs) as names for things.
• Use ADDRESSES THAT WORK ON THE WEB, so that people can look up those names.
• When someone looks up a URL, PROVIDE USEFUL INFORMATION, USING THE STANDARDS (like RDF).
• Include LINKS TO OTHER URLs, so that they can discover more things.
The Semantic Web isn't just about putting data on the web. It is about making links,
so that a person or machine can explore the web of data. With linked data, when you
have some of it, you can find other, related, data. (Tim Berners-Lee)
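The fourth principle, following links to discover more things, amounts to a graph walk. The sketch below runs a breadth-first traversal over a small in-memory stand-in for the web of data; the `example.org` URLs and the `LINKS` table are hypothetical, and no network access takes place.

```python
from collections import deque
from typing import Dict, List

# Hypothetical web of data: each URL maps to the URLs its description links to.
LINKS: Dict[str, List[str]] = {
    "http://example.org/water": [
        "http://example.org/hydrogen",
        "http://example.org/oxygen",
    ],
    "http://example.org/hydrogen": ["http://example.org/element"],
    "http://example.org/oxygen": ["http://example.org/element"],
    "http://example.org/element": [],
}


def discover(start: str, links: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk over links; returns URLs in the order they are found."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in links.get(current, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


print(discover("http://example.org/water", LINKS))
```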
22. Ontologies
“An ontology formally represents knowledge as a set of
concepts within a domain, and the relationships between
pairs of concepts. It can be used to model a domain and
support reasoning about concepts.” (Wikipedia)
The fundamental goals of ontologies:
Define concepts used in Semantic graphs (like RDF)
Enable terminological standardisation
Provide tools for building intelligent dictionaries with
synonyms and cross-references
Enable encoding of taxonomies (hierarchical definitions)
Enable reasoning and inferencing – discovering implicit
knowledge
23. Early ideas in ontology
Antoine Lavoisier, “Traité élémentaire de chimie”
"We think only through the medium of words. --
Languages are true analytical methods. (…) The
art of reasoning is nothing more than a language
well arranged.
Thus, while I thought myself employed only in
forming a Nomenclature, and while I proposed
to myself nothing more than to improve the
chemical language, my work transformed itself
by degrees, without my being able to prevent
it, into a treatise upon the Elements of
Chemistry."
24. Example of Ontology “Hello world”
Nivaldo J. Tro “Chemistry. A Molecular Approach”
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix chem: <http://purl.org/chem/simple_classification#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foo: <http://example.com/this/> .

## Classes
chem:Matter a rdfs:Class ;
    rdfs:label "Matter"@en ;
    rdfs:label "Matière"@fr ;
    rdfs:label "Materia"@pl .

chem:PureSubstances a rdfs:Class ;
    rdfs:label "Pure Substances"@en ;
    rdfs:label "Substances Pures"@fr ;
    rdfs:label "Substancja"@pl ;
    rdfs:subClassOf chem:Matter .

chem:Mixture a rdfs:Class ;
    rdfs:label "Mixture"@en ;
    rdfs:label "Mélange"@fr ;
    rdfs:label "Mieszanina"@pl ;
    rdfs:subClassOf chem:Matter .

chem:Heterogeneous a rdfs:Class ;
    rdfs:label "Heterogeneous"@en ;
    rdfs:label "Hétérogène"@fr ;
    rdfs:label "Heterogeniczny"@pl ;
    rdfs:subClassOf chem:Mixture .

chem:Homogeneous a rdfs:Class ;
    rdfs:label "Homogeneous"@en ;
    rdfs:label "Homogène"@fr ;
    rdfs:label "Jednorodny"@pl ;
    rdfs:subClassOf chem:Mixture .

## Properties
chem:atomicNumber a rdf:Property ;
    rdfs:domain chem:Element ;
    rdfs:range rdfs:Literal .

chem:moleculeName a rdf:Property ;
    rdfs:domain chem:Compound ;
    rdfs:range rdfs:Literal .

chem:componentName a rdf:Property ;
    rdfs:domain chem:Mixture ;
    rdfs:range chem:Matter .
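The `rdfs:subClassOf` statements are what make even this small ontology useful for inference: anything typed as a homogeneous mixture is implicitly also a mixture and matter. The Python sketch below hand-codes that one RDFS rule over the class hierarchy from the slide. It assumes each class has a single parent, which holds for this example but not for RDFS in general, and the function names are hypothetical.

```python
from typing import Dict, List

# rdfs:subClassOf statements copied from the Turtle example (child -> parent).
SUBCLASS_OF: Dict[str, str] = {
    "chem:PureSubstances": "chem:Matter",
    "chem:Mixture": "chem:Matter",
    "chem:Heterogeneous": "chem:Mixture",
    "chem:Homogeneous": "chem:Mixture",
}


def superclasses(cls: str) -> List[str]:
    """All classes reached by following rdfs:subClassOf upwards, nearest first."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def inferred_types(asserted_class: str) -> List[str]:
    """The asserted class plus every class implied by the hierarchy."""
    return [asserted_class] + superclasses(asserted_class)


print(inferred_types("chem:Homogeneous"))
```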
25. Non-Trivial Ontologies in Chemistry
ChEBI – Chemical Entities of Biological Interest
http://www.ebi.ac.uk/chebi/
Project of EMBL-EBI
European Bioinformatics Institute (Cambridge) of European
Molecular Biology Lab (Heidelberg)
OBO Foundry Ontology (http://www.obofoundry.org/ )
The Open Biological and Biomedical Ontologies
Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities
focused on ‘small’ chemical compounds.
The term ‘molecular entity’ refers to any constitutionally or isotopically distinct atom, molecule, ion, ion
pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity.
The molecular entities in question are either products of nature or synthetic products used to intervene
in the processes of living organisms.
ChEBI incorporates an ontological classification, whereby the relationships between molecular entities
or classes of entities and their parents and/or children are specified.
26. Non-Trivial Ontologies in Chemistry
ChemINF – Chemical Information Ontology
https://code.google.com/p/semanticchemistry/
Janna Hastings, Nico Adams, Christoph Steinbeck (EBI)
Leonid Chepelev, Michel Dumontier,
Egon Willighagen, Nico Adams
OBO Foundry Candidate
ChemINF describes:
• Chemical graphs, and various formats for encoding them.
• Chemical descriptors, with definitions and axioms describing what they are
specifically about.
• Specifications for certain descriptors.
• Algorithms and their software implementations and axioms describing their inputs
and outputs.
• Chemical data representation formalisms and formats.
27. Chemical Semantics Ontology
http://purl.org/gc/gc.owl
Gainesville Core (alpha edition)
Gainesville Core describes:
• Molecular Publications
• Molecular Systems
• Molecular Calculations
Molecular Systems contain Molecules
• The Molecules may have Residues (for
biopolymers and polymers)
• Molecular Calculations contain Initial Data
and Results
• The Initial Data may have Methods, Basis
Sets, Functionals, etc.
• The Results may have Energies, Wave
Functions and Spectra, etc.
GC aims at a complete description of a typical
computational chemistry experiment
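The containment structure described above can be pictured as nested records. The dataclasses below are only a reading aid: every class and field name is hypothetical rather than a term from the Gainesville Core ontology, grouping the system and calculations under a publication is an assumption, and the energy value is a placeholder, not a computed result.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Molecule:
    name: str
    residues: List[str] = field(default_factory=list)  # for (bio)polymers


@dataclass
class MolecularSystem:
    molecules: List[Molecule] = field(default_factory=list)


@dataclass
class InitialData:
    method: str
    basis_set: str = ""
    functional: str = ""


@dataclass
class Results:
    energies: List[float] = field(default_factory=list)


@dataclass
class MolecularCalculation:
    initial_data: InitialData
    results: Results


@dataclass
class MolecularPublication:
    system: MolecularSystem
    calculations: List[MolecularCalculation] = field(default_factory=list)


pub = MolecularPublication(
    system=MolecularSystem(molecules=[Molecule(name="water")]),
    calculations=[
        MolecularCalculation(
            initial_data=InitialData(method="DFT", basis_set="6-31G*", functional="B3LYP"),
            results=Results(energies=[-76.4]),  # placeholder value
        )
    ],
)
print(pub.calculations[0].initial_data)
```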
29. Related Ontologies ...
SIO – Semanticscience Integrated Ontology
OPB – Ontology of Physics for Biology
RXNO – Name Reaction Ontology
CMO – Chemical Methods Ontology
MOP – Molecular Process Ontology
SO – The Sequence Ontology Project
30. Importance of Structural Data Structures
CML – Chemical Markup Language
“CML is not 'just another file format'; it is capable of holding extremely complex information
structures and so acting as an interchange mechanism or for archival. It interfaces easily with
modern database architectures such as relational databases or object-oriented databases.
Most importantly, a large amount of generic XML software to process and transform it is
already available from the community.”
P. Murray-Rust, H. S. Rzepa, 2001
CML “paved the road” to Semantics in Chemistry.
Extremely useful as an interchange format between CC software and the Semantic Web
Our position: Chemical Semantics will use CSX, a similar structural format enriched by explicit
description of molecular constituents and enriched description of computation inputs and results.
31. A timeline of Semantic Web
RDF – 1999
CML (Chemical Markup Language) – 1999
FOAF – 2000
RDFa – 2004
DBpedia – 2007
ChEBI (Chemical Entities of Biological Interest) – 2007
GoodRelations – 2008 (Google adoption: November 2, 2010)
Schema.org – June 2011
Google’s Knowledge Graph – May 2012
Facebook Graph Search – January 2013
32. Conclusion
An emerging successor to the web, the
Semantic Web, will likely profoundly
change the very nature of how
scientific knowledge is produced and
shared, in ways that we can now barely
imagine.
34. CS Portal main targets
Interoperable PUBLISHING of Computational
Chemistry calculations
FEDERATION of published data with existing
web-based chemical datasets
Cloud-like ARCHIVING of Computational
Chemistry calculation results, input/output
files etc.
37. http://portal.chemicalsemantics.com/cs
Manual publication (upload)
Automated publication directly from
Modelling Software, via Web API
39. Permanent Chemical URIs
Automated generation of permanent URIs
http://purl.org/chem/pub/2013-08-04-quercetin
Owned & controlled by OCLC (Online Computer Library Center). Is claimed to be persistent and eternal.
Owned by OCLC, controlled by Chemical Semantics, Inc.
Generated by Chemical Semantics, Inc. for the user. Owned by the user.
40. URI naming scheme
Publication:
http://purl.org/chem/pub/2013-08-05-betacyanin
Molecular Calculations:
http://purl.org/chem/pub/2013-08-05-betacyanin/mol-calc
Molecular System:
http://purl.org/chem/pub/2013-08-05-betacyanin/molSys
A Molecule of the system:
http://purl.org/chem/pub/2013-08-05-betacyanin/molSys/m1
Bonds between atoms in the molecule:
http://purl.org/chem/pub/2013-08-05-betacyanin/molSys/m1/a1a12
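The hierarchical scheme above lends itself to simple string composition. The helpers below reproduce the example URIs; they are a sketch, not the portal's actual generator, the function names are hypothetical, and reading `a1a12` as "bond between atom 1 and atom 12" is an assumption.

```python
from datetime import date

BASE = "http://purl.org/chem/pub"


def publication_uri(published: date, name: str) -> str:
    """Publication URI: base, ISO date of publication, then the publication name."""
    return f"{BASE}/{published.isoformat()}-{name}"


def calculations_uri(pub_uri: str) -> str:
    return f"{pub_uri}/mol-calc"


def system_uri(pub_uri: str) -> str:
    return f"{pub_uri}/molSys"


def molecule_uri(pub_uri: str, index: int) -> str:
    return f"{system_uri(pub_uri)}/m{index}"


def bond_uri(mol_uri: str, atom_a: int, atom_b: int) -> str:
    # Assumed reading of the slide's "a1a12": the two bonded atom indices.
    return f"{mol_uri}/a{atom_a}a{atom_b}"


pub = publication_uri(date(2013, 8, 5), "betacyanin")
print(pub)
print(bond_uri(molecule_uri(pub, 1), 1, 12))
```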
41. Dual nature of the URIs
Realizes Linked Data Principles
For Humans (i.e. as seen via a web browser)
http://purl.org/chem/pub/2013-08-02-pyridine_base
Returns: [screenshot not preserved]
42. Dual nature of the URIs
Realizes Linked Data Principles
For Machines (i.e. as seen via Semantic Tools (rdfEditor, Fiddler))
http://purl.org/chem/pub/2013-08-02-pyridine_base
Returns: [screenshot not preserved]
Content negotiation: “One gets what one asks for”
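Content negotiation works through the HTTP `Accept` header: the same URI is requested, and only the preferred media type differs. The sketch below builds the two requests with Python's standard library without sending them. Which RDF media types the portal actually serves is not stated on the slide, so the `text/turtle` and `application/rdf+xml` values are assumptions.

```python
from urllib.request import Request

URI = "http://purl.org/chem/pub/2013-08-02-pyridine_base"


def build_request(uri: str, for_machine: bool) -> Request:
    """Prepare (but do not send) a GET request that asks for HTML or for RDF."""
    if for_machine:
        # Assumed RDF media types; a server may support others.
        accept = "text/turtle, application/rdf+xml;q=0.9"
    else:
        accept = "text/html"
    return Request(uri, headers={"Accept": accept})


human = build_request(URI, for_machine=False)
machine = build_request(URI, for_machine=True)

# Same address, different representation requested.
print(human.full_url, "->", human.get_header("Accept"))
print(machine.full_url, "->", machine.get_header("Accept"))
```

Passing either object to `urllib.request.urlopen` would perform the actual lookup.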
49. SPARQL queries on CS Portal
Counting the number of triples in the graphs of the CS Portal
SELECT ?graph (COUNT(*) AS ?count)
WHERE {
  GRAPH ?graph { ?s ?p ?o . }
}
GROUP BY ?graph
ORDER BY DESC(?count)
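For readers unfamiliar with SPARQL aggregates, the grouping in this query can be mimicked over an in-memory list of quads. The data below is hypothetical and the function is an emulation for illustration, not a way of querying the portal.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object)

# Hypothetical quads standing in for the contents of a triple store.
QUADS: List[Quad] = [
    ("g:quercetin", "ex:m1", "rdf:type", "gc:Molecule"),
    ("g:pyridine", "ex:m2", "rdf:type", "gc:Molecule"),
    ("g:pyridine", "ex:m2", "rdfs:label", "pyridine"),
]


def triples_per_graph(quads: Iterable[Quad]) -> List[Tuple[str, int]]:
    """Equivalent of GROUP BY ?graph with COUNT(*), ordered by descending count."""
    counts = Counter(graph for graph, _s, _p, _o in quads)
    return counts.most_common()


print(triples_per_graph(QUADS))
```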
50. SPARQL queries on CS Portal
Counting the number of elements in all molecular systems on
the CS Portal
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gc: <http://purl.org/gc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?element (COUNT(*) AS ?count)
WHERE {
  ?atom gc:isElement ?element .
}
GROUP BY ?element
ORDER BY DESC(?count)
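A query like this is typically sent to a SPARQL endpoint over HTTP. Under the SPARQL 1.1 Protocol, the simplest form is a GET request with the query text in a `query` parameter. The sketch below only builds such a URL; the endpoint address is a placeholder, since the slides do not give the portal's real endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder address: NOT the portal's real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = """\
PREFIX gc: <http://purl.org/gc/>
SELECT ?element (COUNT(*) AS ?count)
WHERE { ?atom gc:isElement ?element . }
GROUP BY ?element
ORDER BY DESC(?count)
"""


def query_url(endpoint: str, query: str) -> str:
    """Build a 'query via GET' URL as described by the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, QUERY)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```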
51. SPARQL queries on CS Portal
Number of different calculations in all molecular systems of
the CS Portal
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gc: <http://purl.org/gc/>
SELECT ?resultType (COUNT(*) AS ?count)
WHERE {
  GRAPH ?graph {
    ?calc rdf:type gc:Calculation ;
          gc:hasResult ?result .
    ?result rdf:type ?resultType .
  }
}
GROUP BY ?resultType
ORDER BY DESC(?count)
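The graph pattern in this query is a three-way join: find calculations, follow `gc:hasResult`, then read the type of each result. The sketch below emulates that join over a hypothetical in-memory triple list. The `ex:` names and result types are invented, and the named-graph wrapper is ignored for simplicity.

```python
from collections import Counter
from typing import List, Tuple

RDF_TYPE = "rdf:type"

# Hypothetical triples, invented for illustration.
TRIPLES: List[Tuple[str, str, str]] = [
    ("ex:calc1", RDF_TYPE, "gc:Calculation"),
    ("ex:calc1", "gc:hasResult", "ex:res1"),
    ("ex:calc1", "gc:hasResult", "ex:res2"),
    ("ex:res1", RDF_TYPE, "ex:EnergyResult"),
    ("ex:res2", RDF_TYPE, "ex:EnergyResult"),
    ("ex:calc2", RDF_TYPE, "gc:Calculation"),
    ("ex:calc2", "gc:hasResult", "ex:res3"),
    ("ex:res3", RDF_TYPE, "ex:SpectrumResult"),
]


def count_result_types(triples: List[Tuple[str, str, str]]) -> List[Tuple[str, int]]:
    """Join calculation -> result -> result type, then count rows per type."""
    # ?calc rdf:type gc:Calculation
    calcs = {s for s, p, o in triples if p == RDF_TYPE and o == "gc:Calculation"}
    # ?calc gc:hasResult ?result
    results = [o for s, p, o in triples if p == "gc:hasResult" and s in calcs]
    # ?result rdf:type ?resultType
    counts: Counter = Counter()
    for result in results:
        for s, p, o in triples:
            if s == result and p == RDF_TYPE:
                counts[o] += 1
    return counts.most_common()


print(count_result_types(TRIPLES))
```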