1. Research and Development ♥ BBC MMXIII
A Linked Data Context Strategy
for the BBC
Michael Smethurst,
BBC Internet Research and Future Services
With thanks to
Yves Raimond, Tristan Ferne, Olivier Thereaux, Paul Rissen
Why Linked Data?
1. On the web, content needs context to be useful
2. The BBC has data on its output but not on the subjects of its output
3. Commercial data is usually modelled at the wrong level (saleable items)
4. Commercial data doesn’t give you the freedom to build your own APIs on top
5. Using inference minimises workload
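As a rough illustration of the inference point: if the class hierarchy is stated once, only the most specific type of each item needs to be asserted by hand and the broader types can be derived. The sketch below is a minimal Python stand-in for that idea, not a real reasoner. The `ex:` class names and the hierarchy are hypothetical and are not taken from any BBC ontology.

```python
# Hypothetical class hierarchy (child class -> parent class), for illustration only.
SUBCLASS_OF = {
    "ex:RadioProgramme": "ex:Programme",
    "ex:TVProgramme": "ex:Programme",
    "ex:Programme": "ex:CreativeWork",
}


def inferred_types(asserted_type):
    """Return the asserted type followed by every broader type implied by it."""
    types = [asserted_type]
    # Walk up the hierarchy until a class with no stated parent is reached.
    while types[-1] in SUBCLASS_OF:
        types.append(SUBCLASS_OF[types[-1]])
    return types


# Only the most specific type is asserted; the rest is derived.
print(inferred_types("ex:RadioProgramme"))
```

An editor tags a programme once as a radio programme, and anything that asks for programmes or creative works still finds it.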
How do we answer…
Which radio programmes interviewed Nelson Mandela in 1990?
How can I find a picture of a relative in a library’s photo archive?
Was my music used in the background of that TV programme?
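Questions like the first one become answerable once programmes are linked to the people and dates they involve. The sketch below shows the shape of such a query over a tiny in-memory set of triples. It is a toy under stated assumptions: the programme identifiers, the `ex:` property names and every fact in the data are invented for illustration, and a real system would more likely use a triple store and a query language such as SPARQL.

```python
# Invented triples (subject, predicate, object); none of these facts are real data.
TRIPLES = {
    ("prog:p001", "ex:type", "ex:RadioProgramme"),
    ("prog:p001", "ex:interviewee", "dbpedia:Nelson_Mandela"),
    ("prog:p001", "ex:broadcastYear", 1990),
    ("prog:p002", "ex:type", "ex:RadioProgramme"),
    ("prog:p002", "ex:interviewee", "dbpedia:Nelson_Mandela"),
    ("prog:p002", "ex:broadcastYear", 1994),
    ("prog:p003", "ex:type", "ex:TVProgramme"),
    ("prog:p003", "ex:interviewee", "dbpedia:Nelson_Mandela"),
    ("prog:p003", "ex:broadcastYear", 1990),
}


def subjects(predicate, obj):
    """Return every subject that has the given predicate and object."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def radio_interviews(person, year):
    """Radio programmes that interviewed `person` and were broadcast in `year`."""
    return sorted(
        subjects("ex:type", "ex:RadioProgramme")
        & subjects("ex:interviewee", person)
        & subjects("ex:broadcastYear", year)
    )


print(radio_interviews("dbpedia:Nelson_Mandela", 1990))
```

The query is just an intersection of three simple lookups, which is only possible because the subject of the programme (the interviewee) is recorded as data rather than buried in free text.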
The BBC began in radio in the 1920s. It now has 10 national radio channels and more than 40 in the nations and regions.
TV broadcasts since the 1930s.
On the Web since 1994. That's a lot of web history too; we've been doing this for a while.
The BBC Music Website has a content-rich offering. Not surprising when you have 10 major national radio stations, many more local stations, and a lot of music programmes in your TV schedule. But it doesn't mean you have to manage everything from bios to discography from scratch
Data is a first-class citizen
Working on the World Service audio archive: three years of continuous audio
Speech recognition -> automated transcripts + topic identification (at scale)
Kiwi is a framework aimed at automatically identifying topics in speech radio programmes, with topic identifiers being drawn from Linked Open Data sources such as DBpedia. In order to generate such topics in a reasonable time for large programme archives, we built a processing infrastructure distributing computations on cloud resources (e.g. Amazon EC2). We used this infrastructure to automatically tag the entire BBC World Service archive (70,000 programmes) in around two weeks.
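The general pattern described here (tag each programme independently, so the work can be spread across many workers) can be sketched as follows. This is not Kiwi's actual algorithm or infrastructure: the keyword lookup is a deliberately naive stand-in for topic identification, the local thread pool stands in for cloud machines, and `TOPIC_INDEX`, `tag_transcript` and `tag_archive` are hypothetical names introduced for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup from spoken terms to DBpedia-style topic identifiers.
TOPIC_INDEX = {
    "mandela": "http://dbpedia.org/resource/Nelson_Mandela",
    "apartheid": "http://dbpedia.org/resource/Apartheid",
    "cricket": "http://dbpedia.org/resource/Cricket",
}


def tag_transcript(item):
    """Return (programme_id, sorted topic URIs) for one automated transcript."""
    programme_id, transcript = item
    # Normalise words so that "Mandela," and "mandela" match the same entry.
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    topics = sorted(uri for term, uri in TOPIC_INDEX.items() if term in words)
    return programme_id, topics


def tag_archive(transcripts, workers=4):
    """Tag every transcript; each programme is an independent unit of work."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(tag_transcript, transcripts.items()))


if __name__ == "__main__":
    sample = {
        "p1": "Mandela speaks about apartheid.",
        "p2": "Cricket scores from the weekend",
    }
    for programme_id, topics in tag_archive(sample).items():
        print(programme_id, topics)
```

Because no programme depends on another, the number of workers can be raised or lowered without changing the results, which is what makes an archive-sized job practical to run on rented machines.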