This is a brief version of earlier talks, but I think it explains more emphatically what I think Web Science is, why I believe it is realistic, and how SADI/SHARE technologies (or technologies like them) are important for achieving the vision.
Web Science - ISoLA 2012
1. Using OWL Domain Models as
Abstract Workflow Models
Or...
Conducting in silico research in the Web
from hypothesis to publication
Mark Wilkinson
Isaac Peral Senior Researcher in Biological Informatics
Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain
Adjunct Professor of Medical Genetics, University of British Columbia
Vancouver, BC, Canada.
2. Context
“While it took 2,300 years after the first
report of angina for the condition to be
commonly taught in medical
curricula, modern discoveries are
being disseminated at an increasingly
rapid pace. Focusing on the last 150
years, the trend still appears to be
linear, approaching the axis around
2025.”
The Healthcare Singularity and the Age of Semantic
Medicine, Michael Gillam, et al, The Fourth Paradigm: Data-
Intensive Scientific Discovery Tony Hey (Editor), 2009
Slide adapted with permission from Joanne Luciano, Presentation
at Health Web Science Workshop 2012, Evanston IL, USA
June 22, 2012.
3. “The Singularity”
The X-intercept is where, the moment a discovery is
made, it is immediately put into practice
(not only medical practice, but any research endeavour...)
13. We wanted to duplicate
a real, peer-reviewed, bioinformatics analysis
simply by building a model in the Web
describing what the answer
(if one existed)
would look like
16. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies
data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
17. Original Study Simplified
Using what is known about interactions in fly & yeast
predict new interactions with your
human protein of interest
18. Abstracted
Given a protein P in Species X
Find proteins similar to P in Species Y
Retrieve interactors in Species Y
Sequence-compare Y-interactors with Species X genome
(1) Keep only those with homologue in X
Find proteins similar to P in Species Z
Retrieve interactors in Species Z
Sequence-compare Z-interactors with (1)
Putative interactors in Species X
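The abstracted steps above can be read as set operations. The following is a minimal sketch under stated assumptions: the lookup tables, protein identifiers, and function names are all invented for illustration, and mapping each model-organism interactor to a single Species X homologue is a simplification of the sequence-comparison steps.

```python
# Toy lookup tables standing in for real homology and interaction resources.
# Every identifier below is invented for illustration.
SIMILAR = {  # (protein, target species) -> similar proteins in that species
    ("P", "Y"): {"y1"},
    ("P", "Z"): {"z1"},
}
INTERACTORS = {  # protein -> known interactors within the same species
    "y1": {"y2", "y3"},
    "z1": {"z2", "z4"},
}
HOMOLOGUE_IN_X = {  # model-organism protein -> homologue in Species X
    "y2": "x2",
    "y3": "x3",
    "z2": "x2",
    "z4": "x9",
}


def candidates(protein, species):
    """Map interactors found in one model organism back onto Species X."""
    found = set()
    for similar in SIMILAR.get((protein, species), set()):
        for partner in INTERACTORS.get(similar, set()):
            homologue = HOMOLOGUE_IN_X.get(partner)
            if homologue is not None:  # keep only those with a homologue in X
                found.add(homologue)
    return found


def putative_interactors(protein, species_y, species_z):
    """Keep the Species X proteins supported by both model organisms."""
    return candidates(protein, species_y) & candidates(protein, species_z)


result = putative_interactors("P", "Y", "Z")
```

With this toy data only one Species X protein is supported by both comparator species, so it is the only putative interactor returned.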
19. Modeling the answer...
OWL
Web Ontology Language (OWL) is the
language approved by the W3C
for representing knowledge in the Web
20. Modeling the answer...
Note that every word in
this diagram is, in reality, a
URL (because it is OWL)
21. Modeling the answer...
The model of a Potential
Interactor is published in
The Web
It utilizes concepts from
other models published in
The Web
(ours and other’s)
by referencing their URLs
22. Modeling the answer...
The model of a Potential
Interactor is a network of
concepts distributed
within the Web
It will be affected by
changes to those concepts
We do not “own” all of
those concepts!
23. Modeling the answer...
ProbableInteractor
is homologous to
(Potential Interactor from ModelOrganism1…)
and
(Potential Interactor from ModelOrganism2…)
Probable Interactor is defined in OWL as a subclass of Potential Interactor
that requires homologous pairs of interacting proteins to exist in both
comparator model organisms.
(Effectively, an intersection)
25. Running a Web Science 2.0
Experiment
In a local data-file
provide the protein we are interested in
and the two species we wish to use in our comparison
taxon:9606 a i:OrganismOfInterest . # human
uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
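To make the role of each input line explicit, here is a minimal sketch that reads the four statements into a lookup table. It is not an N3 or Turtle parser: it assumes every line has exactly the form `subject a class . # comment`, and the function name is hypothetical.

```python
INPUT = """\
taxon:9606 a i:OrganismOfInterest . # human
uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
"""


def parse_type_statements(text):
    """Return {class: subject} for lines of the form 'subject a class .'."""
    roles = {}
    for line in text.splitlines():
        statement = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not statement:
            continue
        subject, predicate, obj, terminator = statement.split()
        if predicate != "a" or terminator != ".":
            raise ValueError(f"unsupported statement: {line!r}")
        roles[obj] = subject
    return roles


roles = parse_type_statements(INPUT)
```

Each class in the input (`i:ProteinOfInterest`, `i:ModelOrganism1`, and so on) is the hook through which the OWL model refers to the data.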
26. The tricky bit is...
In the abstract, the
search for homology is
“generic” – ANY model
organism.
But when the machine
attempts to do the
experiment, it will have
to use several different
and specific resources
because our question
specifies two different
species
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
27. This is the question we ask:
(the query language here is SPARQL)
PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>
SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {
?protein a i:ProbableInteractor .
}
(i:ProbableInteractor is the reference (URL) to our OWL model of the answer)
28. Our system then derives (and executes) the following workflow automatically
These are different
Web services!
...selected at run-time
based on the same model
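The idea that one abstract step resolves to different Web services at run-time can be sketched as a registry lookup. This is a toy illustration only: the step names and service names are invented, and a plain dictionary stands in for real service discovery.

```python
# Hypothetical registry: one abstract step, several concrete services.
# Step names and service names are invented for illustration.
REGISTRY = {
    ("find_homologues", "taxon:4932"): "yeast-homology-service",
    ("find_homologues", "taxon:7227"): "fly-homology-service",
    ("get_interactors", "taxon:4932"): "yeast-interaction-service",
    ("get_interactors", "taxon:7227"): "fly-interaction-service",
}

# The same abstract model is used for every species.
ABSTRACT_STEPS = ["find_homologues", "get_interactors"]


def plan(taxon):
    """Resolve each abstract step to the service registered for this taxon."""
    return [REGISTRY[(step, taxon)] for step in ABSTRACT_STEPS]


yeast_plan = plan("taxon:4932")
fly_plan = plan("taxon:7227")
```

The two plans share the same abstract steps but name different services, which is the behaviour the slide points at.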
31. There are three very cool things about what you just saw...
The system was able to
create a workflow based on
an OWL model (ontology)
32. There are three very cool things about what you just saw...
The system was able to create a
COMPUTATIONAL workflow
based on a BIOLOGICAL model
33. There are three very cool things about what you just saw...
The workflow it created
(i.e. the services chosen)
differed depending on context
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
34. We got the answer
“simply” by designing a model of the answer!
39. What is the phenotype of every allele of the
Antirrhinum majus DEFICIENS gene?
SELECT ?allele ?image ?desc
WHERE {
locus:DEF genetics:hasVariant ?allele .
?allele info:visualizedByImage ?image .
?image info:hasDescription ?desc
}
Note that there is no “FROM” clause!
We don’t tell it where it should get the information;
the machine has to figure that out by itself...
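One way to picture answering a query with no “FROM” clause is to match each predicate in the query to a service that can generate values for it. The sketch below is a toy under stated assumptions, not SHARE's actual algorithm: the three service functions, their return values, and the in-order evaluation of patterns are all invented for illustration.

```python
# Hypothetical services, each able to generate values for one predicate.
# All returned identifiers and descriptions are invented placeholders.
def variants_of(locus):
    return {"locus:DEF": ["allele:def-1", "allele:def-2"]}.get(locus, [])


def images_of(allele):
    return {"allele:def-1": ["img:1"], "allele:def-2": ["img:2"]}.get(allele, [])


def description_of(image):
    return {"img:1": ["description one"], "img:2": ["description two"]}.get(image, [])


# Predicate -> service that can produce it (a stand-in for service discovery).
SERVICES = {
    "genetics:hasVariant": variants_of,
    "info:visualizedByImage": images_of,
    "info:hasDescription": description_of,
}

# The WHERE clause of the query, one tuple per triple pattern.
PATTERNS = [
    ("locus:DEF", "genetics:hasVariant", "?allele"),
    ("?allele", "info:visualizedByImage", "?image"),
    ("?image", "info:hasDescription", "?desc"),
]


def answer(patterns):
    """Evaluate patterns in order, calling the service matched to each predicate."""
    solutions = [{}]
    for subject, predicate, obj in patterns:
        service = SERVICES[predicate]
        extended = []
        for binding in solutions:
            start = binding.get(subject, subject)  # bound variable or constant
            for value in service(start):
                extended.append({**binding, obj: value})
        solutions = extended
    return solutions


rows = answer(PATTERNS)
```

The query itself never names a data source; the only link between question and data is the predicate.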
44. The query results are live hyperlinks
to the respective Database or images
(the answer is IN the Web!)
45. What pathways does UniProt protein P47989 belong to?
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
uniprot:P47989 pred:isEncodedBy ?gene .
?gene ont:isParticipantIn ?pathway .
}
Note again that there is no “FROM” clause…
I have not told SHARE where to look for the
answer, I am simply asking my question
51. Two different providers of gene information (KEGG and NCBI) were found & accessed
Two different providers of pathway information (KEGG & GO) were found & accessed
52. The results are all links to the original data
(The answer is IN the Web!)
53. Show me the latest Blood Urea Nitrogen and Creatinine levels
of patients who appear to be rejecting their transplants
(I showed you this query in ISoLA 2010… sorry for repeating myself!)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
?patient rdf:type patient:LikelyRejecter .
?patient l:latestBUN ?bun .
?patient l:latestCreatinine ?creat .
}
55. Likely Rejecter:
A patient who has creatinine levels
that are increasing over time
- - Mark D Wilkinson’s definition
56. Likely Rejecter:
…but there is no “likely rejecter”
column or table in our database…
only blood chemistry measurements
at various time-points
60. SHARE “decomposes” the
Likely Rejecter OWL class
into its constituent property restrictions
61. Each property restriction in the Class
is matched with a SADI Service
The matched SADI Service can
generate data that has that property
62. SHARE chains these SADI services
into a workflow...
...the outputs from that workflow are
Instances (OWL Individuals)
of the Likely Rejecter OWL Class
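The decompose, match, and chain steps on slides 60 to 62 can be sketched as backward chaining from the property a class requires. This is a simplified illustration, not SHARE's implementation: the property names, service names, and the `PRODUCERS` table are hypothetical.

```python
# Hypothetical catalogue: property -> the service that can generate it,
# plus the properties that service needs as input. All names are invented.
PRODUCERS = {
    "creatinineSeries": {"name": "fetch-creatinine", "needs": ("patientId",)},
    "slope": {"name": "linear-regression", "needs": ("creatinineSeries",)},
    "increasingOverTime": {"name": "trend-classifier", "needs": ("slope",)},
}


def build_workflow(goal, available):
    """Chain services backwards from a required property to the data we hold."""
    if goal in available:
        return []  # nothing to compute
    if goal not in PRODUCERS:
        raise LookupError(f"no service generates {goal!r}")
    service = PRODUCERS[goal]
    steps = []
    for need in service["needs"]:
        steps += build_workflow(need, available)  # satisfy inputs first
    return steps + [service["name"]]


# The class is reduced here to the single property it ultimately requires.
workflow = build_workflow("increasingOverTime", available={"patientId"})
```

The resulting list is an ordered workflow; running it over patient records would yield the individuals that satisfy the class definition.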
63. For example… SHARE utilizes SADI to discover
analytical services on the Web that do linear regression analysis,
which is required for the “increasing over time” part of the Class definition
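As a rough illustration of the “increasing over time” test, the sketch below fits an ordinary least-squares line to a series of measurements and checks the sign of the slope. The measurements are invented, the function names are hypothetical, and treating any positive slope as “increasing” is an assumption; a real service would presumably apply a stricter criterion.

```python
def slope(points):
    """Ordinary least-squares slope of (time, value) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("all measurements share one time-point")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return sxy / sxx


def is_likely_rejecter(creatinine_series):
    """Assumption: any positive regression slope counts as 'increasing'."""
    return slope(creatinine_series) > 0


# Invented (day, creatinine) measurements for two hypothetical patients.
rising = [(0, 1.0), (7, 1.2), (14, 1.5)]
falling = [(0, 1.5), (7, 1.3), (14, 1.0)]
```

Here `is_likely_rejecter(rising)` is true and `is_likely_rejecter(falling)` is false, even though neither record contains a “likely rejecter” field.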
65. SHARE examines the OWL Class
Gathers, from the Web, the ontologies that are
referenced by that Class
then uses those ontological properties to identify
which data-sources and analytical tools it must
access to create data matching that Class definition
67. The way SHARE builds the workflow varies
depending on the context of the query
(i.e. which data/ontologies it reads – Mine? Yours?)
and on what part of the query
it is trying to answer at any given moment
(which ontological concept is relevant to that clause)
70. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies
data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
71. derives and executes the following workflow automatically
using an OWL ontology that describes the biology
72. The analytical tools chosen for that
workflow were determined based on
context
even though the biological (ontological)
model driving their selection was the
same
77. Every component of the model
Every component of the input data
Every component of the output data
is a URL
Therefore the model, the question,
the experiment, and the results
are inherently IN the Web
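The claim that every component is a URL can be made concrete with the PREFIX declarations from the pathway query. The sketch below expands the prefixed names of one triple pattern into full URLs; the `expand` helper is hypothetical, and the prefix table is copied from that query.

```python
# Prefix table copied from the PREFIX lines of the pathway query.
PREFIXES = {
    "pred": "http://sadiframework.org/ontologies/predicates.owl#",
    "ont": "http://ontology.dumontierlab.com/",
    "uniprot": "http://lsrn.org/UniProt:",
}


def expand(name):
    """Turn a prefixed name such as 'uniprot:P47989' into its full URL."""
    prefix, separator, local = name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {name!r}")
    return PREFIXES[prefix] + local


triple = ("uniprot:P47989", "pred:isEncodedBy", "?gene")

# Variables (starting with '?') are left alone; everything else is a URL.
expanded = tuple(t if t.startswith("?") else expand(t) for t in triple)
```

After expansion, the subject and predicate are ordinary Web addresses, which is what lets other tools link to them.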
78. Every component of the model
Every component of the input data
Every component of the output data
is a URL
The answer, and the knowledge derived from it,
is immediately available to Web search engines
and moreover, can instantly affect the outcome of
other Web Science experiments
85. University of British Columbia
Luke McCarthy: Lead Dev. Everything...
Edward Kawas: SADI Service auto-generator
Benjamin VanderValk: SHARE & SADI & Experimental modeling
Ian Wood: Experimental modeling & myHeath Button project
Soroush Samadian: Cardiovascular data modeling and queries
86. C-BRASS Collaborators at other sites
U of New Brunswick
Dr. Chris Baker
Alexandre Riazanov
Carleton University
Dr. Michel Dumontier
Marc-Alexandre Nolin
Leonid Chepelev
Steve Etlinger
Nichaella Kieth
Jose Cruz
In 1499, when Portuguese explorer Vasco da Gama returned home after completing the first-ever sea voyage from Europe to India, he had less than half of his original crew with him; scurvy had claimed the lives of 100 of the 160 men. Throughout the Age of Discovery, scurvy was the leading cause of death among sailors. Ship captains typically planned for the death of as many as half of their crew during long voyages. A dietary cause for scurvy was suspected, but no one had proved it.
More than a century later, on a voyage from England to India in 1601, Captain James Lancaster placed the crew of one of his four ships on a regimen of three teaspoons of lemon juice a day. By the halfway point of the trip, almost 40% of the men (110 of 278) on three of the ships had died, while on the lemon-supplied ship, every man survived [1]. The British navy responded to this discovery by repeating the experiment, 146 years later.
In 1747, a British navy physician named James Lind treated sailors suffering from scurvy using six randomized approaches and demonstrated that citrus reversed the symptoms. The British navy responded, 48 years later, by enacting new dietary guidelines requiring citrus, which virtually eradicated scurvy from the British fleet overnight. The British Board of Trade adopted similar dietary practices for the merchant fleet in 1865, an additional 70 years later. The total time from Lancaster’s definitive demonstration of how to prevent scurvy to adoption across the British Empire was 264 years [2].
The translation of medical discovery to practice has thankfully improved substantially. But a 2003 report from the Institute of Medicine found that the lag between significant discovery and adoption into routine patient care still averages 17 years [3, 4]. This delayed translation of knowledge to clinical care has negative effects on both the cost and the quality of patient care. A nationwide review of 439 quality indicators found that only half of adults receive the care recommended by U.S. national standards [5].