LA Semantic Web Meetup, Nov 5th 2012
1. The Semantic Web (There and Back Again)
Cartic Ramakrishnan, Research Scientist, Datapop
Pablo N. Mendes, Research Associate, Open Knowledge Foundation
11/5/12
2. Evolution of the Semantic Web
[Timeline: 1945; 1991 + Internet; 2001]
“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers.”
– Tim Berners-Lee
3. Emergent Knowledge in Public Text
[Figure: knowledge graph linking Nicolas Poussin, Nicolas Flamel, Victor Hugo, Leonardo Da Vinci, the Priory of Sion, The Hunchback of Notre Dame and the Louvre through the relations painted_by, mentioned_in, member_of, cryptic_motto_of, displayed_at and written_by]
4. Emergent Knowledge in Biomedical Research Papers
• Dietary fish oils contain eicosapentaenoic acid
• Eicosapentaenoic acid reduces blood viscosity
• Raynaud’s disease patients have elevated blood viscosity
• Confirmed by clinical trials
Swanson, D. R. (1986). "Fish Oil, Raynaud's Syndrome, and Undiscovered Public Knowledge." Perspectives in Biology and Medicine 30(1): 7-18.
• Magnesium can inhibit spreading cortical depression
• Spreading cortical depression may be implicated in migraine attacks
• 12 subsequent studies support hypothesis
Swanson, D. R. (1988). "Migraine and Magnesium: Eleven Neglected Connections." Perspectives in Biology and Medicine 31(4): 526-557.
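The pattern on this slide, two published facts that share a middle term and together point at a connection nobody stated directly, can be sketched as a join over triples. This is a minimal illustration and not Swanson's own procedure: the triples restate the slide, while the helper name `abc_hypotheses`, the lower-cased terms and the decision to ignore edge direction are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Facts restated from the slide as (subject, relation, object) triples.
FACTS = [
    ("dietary fish oils", "contain", "eicosapentaenoic acid"),
    ("eicosapentaenoic acid", "reduces", "blood viscosity"),
    ("raynaud's disease patients", "have elevated", "blood viscosity"),
    ("magnesium", "can inhibit", "spreading cortical depression"),
    ("spreading cortical depression", "may be implicated in", "migraine attacks"),
]


def abc_hypotheses(facts):
    """Return (A, B, C) chains where A and C share B but are not linked directly.

    Hypothetical helper: direction and relation type are ignored, so every
    chain is only a candidate for a human to review.
    """
    neighbours = defaultdict(set)
    for subject, _, obj in facts:
        neighbours[subject].add(obj)
        neighbours[obj].add(subject)
    neighbours = dict(neighbours)  # freeze: no accidental key creation below

    chains = set()
    for middle, linked in neighbours.items():
        for a, c in combinations(sorted(linked), 2):
            if c not in neighbours[a]:
                chains.add((a, middle, c))
    return sorted(chains)


chains = abc_hypotheses(FACTS)
for a, b, c in chains:
    print(f"{a} <-> {b} <-> {c}")
```

Each printed chain pairs two terms that never co-occur in a fact but meet at a shared intermediate, which is the shape of the fish oil and magnesium examples above.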
5. Application of Emergent Knowledge in Biology – Drug Repurposing
[Figure: graph with the nodes Rosiglitazone, Carboplatin, PPARγ (peroxisome proliferator-activated receptor gamma), DNA fragmentation, Metallothionein and Cancer cell death, connected by the relations activates, induces (twice) and downregulates (twice)]
Girnun, G. D., E. Naseri, et al. (2007). Cancer Cell 11(5): 395-406.
6. Research Areas
• Extracting Factual Knowledge from Biomedical Research Articles
  – Entities – “Carboplatin induces Cell Death”
  – Relations – induces(Carboplatin, Cell Death)
  – Supervised Machine Learning
    • Expensive training data
• Discovering Patterns in Factual Knowledge
  – Paths – Carboplatin ??? Rosiglitazone
  – Subgraphs
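The "Paths" item asks which chain of extracted relations connects two entities such as Carboplatin and Rosiglitazone. A minimal sketch of that question is a breadth-first search over relation triples, ignoring edge direction. The node and relation names are taken from the drug repurposing slide, but which nodes each edge joins is an assumption made purely for illustration, as is the helper name `shortest_path`.

```python
from collections import deque

# ASSUMPTION: illustrative wiring only. The names come from the drug
# repurposing slide; the endpoints of each edge are a guess, not data.
TRIPLES = [
    ("Carboplatin", "induces", "DNA fragmentation"),
    ("DNA fragmentation", "induces", "Cancer cell death"),
    ("Rosiglitazone", "activates", "PPARγ"),
    ("PPARγ", "downregulates", "Metallothionein"),
    ("Metallothionein", "downregulates", "Cancer cell death"),
]


def shortest_path(triples, source, target):
    """Breadth-first search over the triples, ignoring edge direction.

    Returns [node, relation, node, ...] or None when no path exists.
    """
    adjacency = {}
    for subject, relation, obj in triples:
        adjacency.setdefault(subject, []).append((relation, obj))
        adjacency.setdefault(obj, []).append((relation, subject))

    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]  # paths always end on a node, never on a relation
        if node == target:
            return path
        for relation, nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [relation, nxt])
    return None


path = shortest_path(TRIPLES, "Carboplatin", "Rosiglitazone")
print(" | ".join(path))
```

On a real extraction graph a single shortest path is rarely the interesting answer, which is why the later slides move from paths to subgraphs.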
7. LA-PDFText – Extracting Text From Research Papers
Ramakrishnan, C., A. Patnia, E. Hovy and G. Burns (2012). "Layout-Aware Text Extraction from Full-text PDF of Scientific Articles." Source Code for Biology and Medicine 7(1): 7.
http://code.google.com/p/lapdftext/
9. Unsupervised Fact Extraction
Dallenbach-Hellweg, G. (1976) Fortschr Med 94(5): 256-263.
Abstract: An excessive endogenous or exogenous stimulation by estrogen induces adenomatous hyperplasia of the endometrium.
Relationship: induces
Subject head: stimulation (nsubj)
Object head: hyperplasia (dobj)
Dependency parse:
nsubj(induces, stimulation)
dobj(induces, hyperplasia)
det(stimulation, An)
amod(stimulation, excessive)
amod(stimulation, endogenous)
conj_or(endogenous, exogenous)
prep_by(stimulation, estrogen)
amod(hyperplasia, adenomatous)
prep_of(hyperplasia, endometrium)
det(endometrium, the)
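The slide reads a fact off the parse by taking the verb together with its nsubj and dobj heads. The sketch below does the same over a plain list of typed dependencies. It is an illustration under assumptions: the head and dependent pairing is reconstructed from the flattened figure, and `extract_relations` and `modifiers` are hypothetical helper names, not part of any parser's API.

```python
# Typed dependencies for the example sentence as (label, head, dependent).
# ASSUMPTION: the pairing is one plausible reading of the slide's parse tree.
DEPENDENCIES = [
    ("nsubj", "induces", "stimulation"),
    ("dobj", "induces", "hyperplasia"),
    ("det", "stimulation", "An"),
    ("amod", "stimulation", "excessive"),
    ("amod", "stimulation", "endogenous"),
    ("conj_or", "endogenous", "exogenous"),
    ("prep_by", "stimulation", "estrogen"),
    ("amod", "hyperplasia", "adenomatous"),
    ("prep_of", "hyperplasia", "endometrium"),
    ("det", "endometrium", "the"),
]


def extract_relations(dependencies):
    """Return (subject head, predicate, object head) for every verb that has
    both an nsubj and a dobj dependent."""
    subjects, objects = {}, {}
    for label, head, dependent in dependencies:
        if label == "nsubj":
            subjects[head] = dependent
        elif label == "dobj":
            objects[head] = dependent
    return [(subjects[v], v, objects[v]) for v in subjects if v in objects]


def modifiers(dependencies, head):
    """Collect the adjectival modifiers (amod) attached directly to a head."""
    return [d for label, h, d in dependencies if label == "amod" and h == head]


relations = extract_relations(DEPENDENCIES)
print(relations)                               # [('stimulation', 'induces', 'hyperplasia')]
print(modifiers(DEPENDENCIES, "hyperplasia"))  # ['adenomatous']
```

No training data is involved: the rule relies only on the parser's output, which is what makes the extraction unsupervised.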
10. Resulting Structure (RDF)
Dallenbach-Hellweg, G. (1976) Fortschr Med 94(5): 256-263.
Abstract: An excessive endogenous or exogenous stimulation by estrogen induces adenomatous hyperplasia of the endometrium.
[Figure: RDF graph in which modified_entity_1 induces composite_entity_1. The nodes modified_entity_1, modified_entity_2 and composite_entity_1 are linked by hasModifier and hasPart edges to the text fragments "An excessive endogenous or exogenous", "stimulation", "estrogen", "adenomatous", "hyperplasia" and "endometrium".]
Cartic Ramakrishnan, Pablo N. Mendes, Shaojun Wang, Amit P. Sheth: Unsupervised Discovery of Compound Entities for Relationship Extraction. EKAW 2008: 146-155
11. Detecting Nested Entities
Chevy Chase Bank on 5th and 3rd
Syntactic Dependencies: nn, prep_on
[[[Chevy Chase] Bank] on 5th and 3rd]
  – [Chevy Chase]: Person
  – [[Chevy Chase] Bank]: Org
  – [[[Chevy Chase] Bank] on 5th and 3rd]: Location
12. Result of Unsupervised Extraction
Abstracts of ~18 million research articles → ~200 million parse trees → Entity Relationship network
[Figure: example RDF graph for the estrogen sentence, in which modified_entity_1 induces composite_entity_1, with hasModifier and hasPart edges to the underlying text fragments]
• 137,414,820 triples with named relations
  – Triple “hair-ball”
14. Discovering Patterns in Factual Knowledge
• Finding Paths
  – Exponential no. of paths: information overload
  – Relevance: not all paths are equally relevant
• Our solution
  – Subgraph detection with fixed node budget
  – Heuristic edge weighting to control relevance
Cartic Ramakrishnan, William H. Milnor, Matthew Perry, Amit P. Sheth: Discovering informative connection subgraphs in multi-relational graphs. SIGKDD Explorations 7(2): 56-63 (2005)
15. Candidate Subgraph Identification
• Bidirectional lock-step growth from S and T
  – Next hop based on edge weights
  – Terminate when cut edge limit reached
  – Results in candidate graph
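The growth procedure in the bullets above can be sketched as two expanding node sets that take turns adding their best outgoing edge. This is a loose illustration, not the published algorithm: it assumes higher weights mean more relevant edges, it reads the "cut edge limit" as a simple budget on the number of collected edges, and `grow_candidate` and the toy graph are made up for the example.

```python
def grow_candidate(graph, source, target, edge_limit):
    """Grow node sets from source and target in alternation.

    graph maps node -> {neighbour: weight}; a higher weight is treated as
    more relevant (ASSUMPTION). Returns the edges collected within budget.
    """
    visited = {source: {source}, target: {target}}
    edges = set()
    progressed = True
    while len(edges) < edge_limit and progressed:
        progressed = False
        for side in (source, target):  # lock-step: one hop per side per round
            best = None
            for node in visited[side]:
                for nxt, weight in graph[node].items():
                    edge = frozenset((node, nxt))
                    if nxt in visited[side] or edge in edges:
                        continue  # nothing new to gain from this edge
                    if best is None or weight > best[0]:
                        best = (weight, node, nxt)
            if best is not None and len(edges) < edge_limit:
                _, node, nxt = best
                visited[side].add(nxt)
                edges.add(frozenset((node, nxt)))
                progressed = True
    return edges


# Toy weighted graph (hypothetical), symmetric by construction.
GRAPH = {
    "S": {"a": 0.9, "b": 0.2},
    "a": {"S": 0.9, "T": 0.8},
    "b": {"S": 0.2, "c": 0.5},
    "c": {"b": 0.5, "T": 0.1},
    "T": {"a": 0.8, "c": 0.1},
}

candidate = grow_candidate(GRAPH, "S", "T", edge_limit=3)
print(sorted(sorted(edge) for edge in candidate))
```

With a budget of three edges the two sides first meet at node `a` and then spend the remaining edge on the next-best hop, so the candidate graph holds the strong S–a–T connection plus one weaker branch.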
16. Finding Best Subgraphs
• Candidate Graph
  – Too large to be useful
  – Listing paths = information overload
• Electrical Circuit
  – Edge weights = resistance
  – +1 volt at source node & ground at target
• Using Ohm’s and Kirchhoff’s laws
  – Find maximum current flow paths through the candidate graph from S to T
Cartic Ramakrishnan, William H. Milnor, Matthew Perry, Amit P. Sheth: Discovering informative connection subgraphs in multi-relational graphs. SIGKDD Explorations 7(2): 56-63 (2005)
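The circuit analogy can be made concrete with a small solver: hold the source at +1 V and the target at 0 V, apply Kirchhoff's current law at every other node to get its voltage, then use Ohm's law to read the current on each edge. The sketch below is an illustration under assumptions. The relaxation method, the function names and the four-node circuit are not from the slides, and turning edge currents into ranked paths is left out.

```python
def node_voltages(resistance, source, target, sweeps=200):
    """Solve Kirchhoff's current law by Gauss-Seidel relaxation.

    resistance maps node -> {neighbour: ohms} and must be symmetric.
    The source is held at +1 V and the target at 0 V (ground).
    """
    volts = {node: 0.0 for node in resistance}
    volts[source] = 1.0
    for _ in range(sweeps):
        for node, nbrs in resistance.items():
            if node in (source, target):
                continue  # boundary voltages stay fixed
            conductance = sum(1.0 / r for r in nbrs.values())
            # Net current zero <=> voltage is the conductance-weighted mean
            # of the neighbouring voltages.
            volts[node] = sum(volts[n] / r for n, r in nbrs.items()) / conductance
    return volts


def edge_currents(resistance, volts):
    """Ohm's law: current from i to j is (V_i - V_j) / R_ij, kept when positive."""
    return {
        (i, j): (volts[i] - volts[j]) / r
        for i, nbrs in resistance.items()
        for j, r in nbrs.items()
        if volts[i] > volts[j]
    }


# Hypothetical candidate graph: two routes from S to T with unequal resistance.
CIRCUIT = {
    "S": {"a": 1.0, "b": 1.0},
    "a": {"S": 1.0, "T": 3.0},
    "b": {"S": 1.0, "T": 1.0},
    "T": {"a": 3.0, "b": 1.0},
}

volts = node_voltages(CIRCUIT, "S", "T")
currents = edge_currents(CIRCUIT, volts)
for edge, amps in sorted(currents.items(), key=lambda item: -item[1]):
    print(edge, round(amps, 3))
```

In this toy circuit the route through `b` has the lower total resistance and therefore carries twice the current of the route through `a`, which is the sense in which current flow ranks connections between S and T.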
17. Semi-automated Knowledge Discovery in Biomedicine – How far are we?
• Trust in extracted facts
  – Extraction errors
  – Poor quality sources
  – No provenance
  – Misleading citations
  – Intentionally misleading research reports
  – Unintentional mistakes in research reports
• Information overload
18. Building A Web of Linked Entities with DBpedia Spotlight
Pablo N. Mendes
Research Associate
Open Knowledge Foundation