1. WP4: TV Data Text Enrichment
Pavel Mihaylov (OT) and partners
2. Contents
• Ontotext and its role in the project
• WP4: text, audio and video
• Goals and achievements
• Demo
• Conclusions
26-27 March 2012, NoTube 3rd review
3. Ontotext
• Semantic technology developer est. in 2000
– Staff: 65 employees and multiple contractors
• Global leader in semantic technologies
– Semantic Databases: high-performance RDF DBMS, scalable reasoning
– Semantic Search: text mining (IE), Information Retrieval (IR)
– Web Mining: focused crawling, screen scraping, data fusion
• Role in NoTube
– WP4 leader
– Semantic Enrichment
– Experience from multiple European projects
4. WP4: Content Enrichment
Content Enrichment
• Adding metadata
• Content about content
• Text: EPGs, programme descriptions
• Audio
• Video
5. Goal: Text enrichment
Semantic annotation component
• Analyses short or free-text text segments
• Extends them with further world knowledge
Recognising items of interest in text
Assigning links to Linked Open Data
6. Goal: Text enrichment (2)
Live at the Apollo 2/6
Not Going Out star Lee Mack presents sets from American comic Rich Hall and Scotland’s very own Danny Bhoy.
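The programme description above contains several items of interest (a show and three comedians) that text enrichment is meant to recognise and link. The following is a minimal sketch of that idea, assuming a tiny hand-written gazetteer and simple longest-match lookup; it is not Lupedia's actual matching algorithm. The `GAZETTEER` contents, the class labels and the `annotate` function are hypothetical, and the URIs merely follow the DBpedia resource naming pattern.

```python
import re

# Hypothetical mini-gazetteer: surface form -> (LOD URI, class).
# URIs follow the DBpedia naming pattern and are assumptions for illustration.
GAZETTEER = {
    "Not Going Out": ("http://dbpedia.org/resource/Not_Going_Out", "TelevisionShow"),
    "Lee Mack": ("http://dbpedia.org/resource/Lee_Mack", "Person"),
    "Rich Hall": ("http://dbpedia.org/resource/Rich_Hall", "Person"),
    "Danny Bhoy": ("http://dbpedia.org/resource/Danny_Bhoy", "Person"),
}


def annotate(text, gazetteer=GAZETTEER):
    """Return gazetteer mentions found in text, in order of appearance."""
    # Longest names first, so a longer name wins over a shorter one it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    annotations = []
    for match in pattern.finditer(text):
        uri, entity_class = gazetteer[match.group(0)]
        annotations.append({
            "start": match.start(),
            "end": match.end(),
            "surface": match.group(0),
            "uri": uri,
            "class": entity_class,
        })
    return annotations


DESCRIPTION = ("Not Going Out star Lee Mack presents sets from American comic "
               "Rich Hall and Scotland's very own Danny Bhoy.")

for a in annotate(DESCRIPTION):
    print(a["start"], a["end"], a["surface"], a["uri"])
```

A real service would also need disambiguation (several entities can share one surface form), which this sketch deliberately leaves out.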
7. Goal: Multilingual
TV world
• English
• Turkish
• German
• Korean
• Italian
• French
• Dutch
• Bulgarian
• Arabic
8. Goal: Graph enrichment
Build upon basic enrichment
• Entities extracted from text
Exploit relations in Semantic Repository
• Follow a chain of LOD predicates
Enrich the basic enrichment
• A richer set of entities
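Following a chain of LOD predicates from an extracted entity can be sketched as a small graph walk. This is a minimal illustration over an in-memory list of triples, assuming nothing about the actual repository or its query interface; the `ex:` identifiers, the predicate names and the helper functions are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical triples; the "ex:" names are placeholders, not real LOD URIs.
TRIPLES = [
    ("ex:FilmA", "ex:director", "ex:PersonX"),
    ("ex:FilmA", "ex:starring", "ex:PersonY"),
    ("ex:PersonX", "ex:birthPlace", "ex:CityZ"),
    ("ex:PersonY", "ex:birthPlace", "ex:CityW"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for quick lookups."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index


def follow_chain(index, start, predicates):
    """Return the nodes reached from start by following predicates in order."""
    frontier = {start}
    for predicate in predicates:
        frontier = {obj
                    for node in frontier
                    for obj in index.get((node, predicate), ())}
    return frontier


def enrich(index, extracted, chains):
    """Extend the extracted entities with everything reachable via the chains."""
    enriched = set(extracted)
    for entity in extracted:
        for chain in chains:
            enriched |= follow_chain(index, entity, chain)
    return enriched


INDEX = build_index(TRIPLES)
print(follow_chain(INDEX, "ex:FilmA", ["ex:director", "ex:birthPlace"]))
```

Here the basic enrichment supplies the starting entities and `enrich` returns the richer set; which chains are worth following is a per-class design decision.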
10. Goal: Graph enrichment (3)
Classes to enrich
• Film
• TelevisionShow
• Work
• Band/MusicalArtist
• Actor
• Place
11. Film enrichment
• Film class
• At least one common indirect relation
12. TelevisionShow enrichment
• TelevisionShow class
• At least two common indirect relations
13. Work enrichment
• Work except Film and TelevisionShow
• At least one common indirect relation
14. Band/MusicalArtist enrichment
• Band and MusicalArtist
• At least one direct relation
15. Actor enrichment
• Actor class
• Starring relation from at least two common Works
16. Place enrichment
• Place class
• At least one direct relation
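The per-class conditions on slides 11 to 16 can be read as a small rule table: each class names a kind of evidence and a minimum count. The sketch below encodes only those thresholds. It assumes the evidence counts have already been computed elsewhere; how "common", "direct" and "indirect" relations are actually counted is not stated on the slides. The names `RULES`, `accept` and the evidence keys are hypothetical.

```python
# Thresholds taken from slides 11-16; evidence keys are illustrative names.
RULES = {
    "Film": ("common_indirect", 1),
    "TelevisionShow": ("common_indirect", 2),
    "Work": ("common_indirect", 1),  # any Work other than Film / TelevisionShow
    "Band": ("direct", 1),
    "MusicalArtist": ("direct", 1),
    "Actor": ("starring_in_common_works", 2),
    "Place": ("direct", 1),
}


def accept(entity_class, evidence):
    """Return True if a candidate of this class meets its class threshold.

    evidence maps an evidence kind to a count computed elsewhere (assumption).
    Classes without a rule are never accepted.
    """
    rule = RULES.get(entity_class)
    if rule is None:
        return False
    kind, minimum = rule
    return evidence.get(kind, 0) >= minimum


print(accept("Film", {"common_indirect": 1}))            # True
print(accept("TelevisionShow", {"common_indirect": 1}))  # False
```

Looking up the most specific class first means a Film or TelevisionShow never falls back to the looser Work rule.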
17. Lupedia Text enrichment service
• Input: plain text, e.g. programme descriptions
• Output: Linked Open Data enrichment
– XML, JSON, RDFa
• Features:
– Multilingual
– Graph enrichment
– Multiple vocabularies
– Configurable
– Fast
18. Lupedia over time
• Disambiguation
• Predicate, heuristics and class weights
• Most specific class in output
• Multiple vocabularies
• Heuristics
• New matching options and filters
• Selectable vocabulary
• Multilingualism
• Better service
• Graph enrichment
19. Evaluation summary
Lupedia compared to OpenCalais and AlchemyAPI
• Only two other similar services
• Much better coverage than either of them
• Comparable precision
Lupedia is a unique service
• Custom vocabularies & filters
• Tuned to TV domain
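The slide does not say how coverage and precision were measured. As a rough sketch, assuming "coverage" means the share of expected entities that a service finds (recall) and "precision" the share of returned entities that are correct, both can be computed from two sets. The function name and the toy identifiers below are hypothetical and are not evaluation data from the project.

```python
def precision_and_coverage(predicted, gold):
    """Return (precision, coverage) of predicted annotations against gold ones."""
    predicted, gold = set(predicted), set(gold)
    correct = predicted & gold
    precision = len(correct) / len(predicted) if predicted else 0.0
    coverage = len(correct) / len(gold) if gold else 0.0
    return precision, coverage


# Toy identifiers, made up purely to exercise the function.
gold = {"e1", "e2", "e3", "e4"}
predicted = {"e1", "e2", "e9"}
precision, coverage = precision_and_coverage(predicted, gold)
print(round(precision, 2), round(coverage, 2))
```

Under these definitions a service can have comparable precision to another while covering many more of the expected entities.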
20. Links to other WPs
• WP1: Entity URIs point to WP1 models
• WP3: Lupedia in NLP-based profiling and enrichment
• WP5: Lupedia in SmartLink and Watch’n’Buy
• WP6: Integration, enrichment in demo apps
• WP7:
– 7a news enrichment
– 7c programme description enrichment
22. Emerging competition
| | Lupedia | Yahoo | WikiMachine | EntityPedia |
|---|---|---|---|---|
| LOD output | DBpedia & LinkedMDB | | DBpedia | ? |
| Multilingual | ar, bg, nl, en, fr, de, it, ko, tr | en, zh | en, pt, it | ? |
| Confidence | yes | yes | yes | ? |
| Graph enrichment | yes | yes* | no | ? |
| Remark | Tuned to TV domain, one of the pioneers | No direct access to LOD, graph enrichment too abstract | Too generic, precision seems lower | Not yet released |
23. Lessons & Impact
Lessons learnt:
• Emerging similar services clearly show the need for such services
• Coverage and language support are important
Lupedia recognised as one of the major players and included in NERD:
• Aggregating named entity services and comparing their performance
• http://nerd.eurecom.fr
Various partners willing to use Lupedia in other projects
24. Life after NoTube
• Possibly an OpenCalais-like service in future
• Will be kept alive as a demo service
• Closed source