2. WP1 Overview
• “Backend” shared datasets and services
• Mappings, integration and common vocabulary
• Extra datasets to support use-case scenarios
Monday, March 26, 2012
3. WP1: Year 3 Direction & Achievements
• Moving from single ‘warehouse’ to distributed set of databases, datasets and services
• Planning for sustainable life-after-project
• Integrating feedback from end-to-end demos
5. Why WP1? Two roles
• NoTube internal: a hub for data sharing
• NoTube external: show how shared datasets and vocabularies help with user-facing “Web and TV” problems
• “show” -critically- includes “thinking out loud” as we explore, via blog, email, Twitter etc.
– scholarly articles rarely reach our target audiences
6. Outreach message
• Let metadata flow widely - advertising content, rather than be a hidden asset
• Identify and link content with useful URLs(*)
• Open APIs to control TV and link devices [WP7c]
...from W3C TV & Web position paper (with Project Baird), Berlin 9 Feb 2011
WP1 concerned primarily with the first two: getting metadata into the Web from
source, rather than scraping, guessing, approximating.
7. Aside: RDFa went mainstream
• Try ‘View source’ on IMDB, Rotten Tomatoes, BBC, tv.com sites to find RDF descriptions of TV content.
• NoTube’s approach was to lead by example, to engage with industry and to plan from the beginning for the ‘afterlife’.
• This strategy worked.
7
Monday, March 26, 2012
8. Facebook OGP
tv.com 'The Wire' page
...simple, extensible standards are being adopted
OGP since 2010; schema.org since 2011...
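As a rough illustration of what ‘View source’ turns up on an OGP-enabled page, the sketch below pulls `og:` properties out of HTML using only the Python standard library. The sample markup, the example.org URL and the names `OGPParser`, `extract_ogp` and `SAMPLE_PAGE` are assumptions made for this example; the markup is not copied from tv.com.

```python
from html.parser import HTMLParser


class OGPParser(HTMLParser):
    """Collect Open Graph <meta property="og:..." content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        if prop.startswith("og:") and "content" in attr_map:
            self.properties[prop] = attr_map["content"]


def extract_ogp(html):
    """Return a dict of og:* properties found in an HTML string."""
    parser = OGPParser()
    parser.feed(html)
    return parser.properties


# Hypothetical page markup, written for this sketch only.
SAMPLE_PAGE = """
<html prefix="og: http://ogp.me/ns#">
<head>
<title>The Wire</title>
<meta property="og:title" content="The Wire" />
<meta property="og:type" content="tv_show" />
<meta property="og:url" content="http://example.org/shows/the-wire" />
</head>
<body></body>
</html>
"""

ogp = extract_ogp(SAMPLE_PAGE)
for name, value in sorted(ogp.items()):
    print(name, "=", value)
```

A full RDFa processor would also resolve prefixes and nested subjects; this sketch only reads the flat `meta` pattern that OGP uses.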
9. TV Data Warehouse
• We still host several crawls of TV EPG data
• Trend is for data to be more cleanly available from source, without scraping
• Crawling, aggregation and integration still useful, but less scraping required
• Crawled 'data warehouse' also used as a research testbed collection
10. WP1: Example Datasets
• WP7c/WP3 use DBpedia/Wikipedia URLs for topics; covers all mainstream areas.
• BBC also using Lonclass/UDC topic codes (we’re helping prepare this for sharing)
• For Music, we adopt MusicBrainz IDs
• Mapping diverse representations of ‘genre’
• “Organic” item/topic similarity measures derived from user data from WP3
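The slides do not say which similarity measure was derived from the WP3 user data. As one possible reading of an “organic” item similarity, the sketch below scores two items by the Jaccard overlap of the sets of users who watched them. The measure, the programme and user identifiers, and the function names are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def item_similarities(viewers_by_item):
    """Score every unordered pair of items by the overlap of their viewers."""
    return {
        (x, y): jaccard(viewers_by_item[x], viewers_by_item[y])
        for x, y in combinations(sorted(viewers_by_item), 2)
    }


# Hypothetical viewing data: item -> set of users who watched it.
viewers_by_item = {
    "programme_a": {"u1", "u2", "u3"},
    "programme_b": {"u2", "u3", "u4"},
    "programme_c": {"u5"},
}

sims = item_similarities(viewers_by_item)
for pair, score in sorted(sims.items()):
    print(pair, score)
```

Because the score comes only from viewing behaviour, it needs no editorial topic codes, which is one sense in which such a measure could be called organic.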
11. WP1: Data Services
• Data Services exposed as static files:
– Show how to embed RDFa in HTML
– Publish as RDF/XML Linked Data
• Interactive Data Services:
– Using W3C SPARQL, SQL or SOLR/Lucene, over HTTP and/or XMPP.
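For the SPARQL-over-HTTP case, a minimal sketch of how a client might form a request: the W3C SPARQL Protocol lets a query be sent as the `query` parameter of an HTTP GET. The endpoint URL below is a placeholder, not a NoTube service, and the query and the helper name `build_sparql_get_url` are assumptions made for this example. The code only builds the URL and performs no network access.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sparql_get_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint; substitute a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

# Illustrative query: labels for a DBpedia-style topic URI.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/The_Wire> rdfs:label ?label .
} LIMIT 5
""".strip()

url = build_sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The resulting URL can be fetched with any HTTP client; result formats are negotiated with the `Accept` header and depend on what the endpoint supports.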
12. WP1: Exploitation and Sustainability
• WP1’s approach designed to outlive NoTube
• Use, augment and contribute to external data
– e.g. DBpedia, Archive.org, W3C & wider Web of data trend (e.g. RDFa adoption)
– also we demonstrate e.g. on blog how we did it - so others can replicate it
– WP4 enrichments can be fed back to externals, e.g. similarity metrics & clusters
13. WP1: Sustainability 2
• NoTube’s 2010 W3C “Web & TV” position paper lobbied for unique IDs & public metadata for video content; this is now going mainstream.
• VUA will continue hosting some data, using PURL.org so hosting can be passed on (e.g. to W3C) later.
• Collab with Facebook OGP (helped with their RDFa adoption) and now the search engines' Schema.org (RDFa and extending TV vocab).
15. Workpackage Links
• Background data for all Workpackages
• Collaborated with WP2 on BMF RDF models
• Closer ties throughout WP3/7 developments
• WP4 entity and topic URIs point to WP1
• Outreach work around RDFa, Position Paper
16. 2nd review comments
• Not clear though how this work has built upon the results of year 1, and how the current progress is in line with the case studies.
– Worked more closely and pragmatically with case studies in WP7, especially 7c and related WP3 work. Moved towards more decentralised model, instead of 'warehouse'.
– 7c collaboration with KMI's 'Watch and Buy' scenario, and with WP4 timed ad insertion work, used EU p2pnext 'limo' work; also egtaMETA from EBU from 7c
– WP1 work became more "hands-on"; we helped WP7 extract datasets such as TED.com and Archive.org which we expect will shortly be replaceable by cleaner information from 'official' sources.
17. 2nd review comments
• No relevant state of the art is documented and no details or citations on automated algorithms are given. Evaluation is restricted to examples and no quantitative data are given.
– We accept weakness in report (lack of scholarly/scientific detail); chose to focus on more informal communication with outside world in final phase. A 2nd version of the doc was produced, but main changes were around 'life after project' themes rather than adding more scientific and scholarly detail.
18. 2nd review comments
• A close collaboration with WP7 is recommended in order to ensure that work meets the requirements of the use cases.
– this very well describes our emphasis in final phase
19. Lessons Learned
• It's hard to simulate an evolving global data ecosystem; but we've played a small part in some huge changes.
• Publishers will adopt simple Semantic Web standards when they are given an incentive.
• It's hard for a 4-year-old plan to stay relevant in such an environment; ability to be agile was critically important.
20. WP1 Summary
• Used open standards (RDF) and largely open data (e.g. Wikipedia/DBpedia)
• Integrated, mapped and data-mined
• Contributing our additions back to the community / commons (highlight: BBC sims)
• Documenting what we learned for external developers and subsequent projects
Questions?
23. WP1: End-to-End issues
• In final year, our End-to-End scenarios have more mature implementations
• Feedback from WP3/7c: key issue is sparsity of large vocabularies when used for record matching. No single solution here.
• Integrating techniques from WP4 (e.g. clustering, data-mining) critical for applying large and chaotic vocabularies for practical recommendations.
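A toy illustration of the sparsity problem, under stated assumptions: two records about similar content can carry entirely different fine-grained topic IDs, so exact matching finds nothing, while mapping each topic to a coarser cluster first recovers the overlap. The topic IDs, the `CLUSTER_OF` table and the function names are invented for this sketch and do not describe the actual WP4 techniques.

```python
def overlap(a, b):
    """Number of identifiers two records share."""
    return len(a & b)


def to_clusters(topics, cluster_of):
    """Replace each topic by its cluster; unmapped topics are kept as they are."""
    return {cluster_of.get(topic, topic) for topic in topics}


# Hypothetical fine-grained topic annotations on two records.
record_a = {"topic:police_procedural", "topic:baltimore"}
record_b = {"topic:detective_fiction", "topic:maryland"}

# Hypothetical topic-to-cluster table, e.g. the output of a clustering step.
CLUSTER_OF = {
    "topic:police_procedural": "cluster:crime",
    "topic:detective_fiction": "cluster:crime",
    "topic:baltimore": "cluster:us_east_coast",
    "topic:maryland": "cluster:us_east_coast",
}

exact_matches = overlap(record_a, record_b)
clustered_matches = overlap(
    to_clusters(record_a, CLUSTER_OF),
    to_clusters(record_b, CLUSTER_OF),
)
print("exact:", exact_matches, "clustered:", clustered_matches)
```

The trade-off is precision: coarser clusters match more records but also match more loosely, which is consistent with the slide's point that there is no single solution.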