2. Long history of archiving & preservation at the BBC
Exploring new workflows & technology to help unlock our archives,
& how the archival process is integrated within the production chain
Background/Current State:
• Manual Tagging: Online Production & Publication
• Automated Tagging: World Service Radio Archive Pilot
WiP/Direction of Travel:
• Manual & Auto: the end-to-end Production & integrated Archive
Background & Work In Progress
3. Manual Tagging: Online Production & Publication
[Diagram: Online Production and Publication; a network of Content, People, Places, Orgs and Events]
• Online journalists manually tag content with structured/semantic data
• Tags are BBC-minted and curated, with relationships to public datasets maintained where possible
• Auto-tag recommendations within the production workflow (work in progress)
• Driving dynamic audience-facing services, e.g. currently the BBC News App
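To make "structured/semantic data" concrete, here is a minimal sketch of a manually applied tag record. It assumes a hypothetical identifier namespace (`BBC_TAG_NS`, with `example.org` as a placeholder) and hypothetical `Tag` and `ContentItem` classes: each tag carries a BBC-minted ID, a type (person, place, organisation or event) and optional `sameAs` links to public datasets such as DBpedia. The `about`/`sameAs` property names are borrowed from schema.org as an assumption; this is an illustration, not the BBC's actual data model.

```python
from dataclasses import dataclass, field

# Hypothetical namespace for BBC-minted tag IDs (example.org is a placeholder).
BBC_TAG_NS = "https://example.org/bbc/things/"

@dataclass(frozen=True)
class Tag:
    """A curated tag: a BBC-minted ID plus optional links to public datasets."""
    local_id: str
    label: str
    kind: str                      # "Person", "Place", "Organisation" or "Event"
    same_as: tuple[str, ...] = ()  # e.g. DBpedia URIs, where a match is maintained

    @property
    def uri(self) -> str:
        return BBC_TAG_NS + self.local_id

@dataclass
class ContentItem:
    url: str
    tags: list[Tag] = field(default_factory=list)

    def tag(self, t: Tag) -> None:
        """Apply a tag manually, ignoring exact duplicates."""
        if t not in self.tags:
            self.tags.append(t)

    def to_linked_data(self) -> dict:
        """Return a JSON-LD-style dict linking the content to its tags."""
        return {
            "@id": self.url,
            "about": [
                {"@id": t.uri, "@type": t.kind, "label": t.label, "sameAs": list(t.same_as)}
                for t in self.tags
            ],
        }

# Usage: a journalist tags a story with a place linked to DBpedia.
london = Tag("london-0001", "London", "Place", ("http://dbpedia.org/resource/London",))
story = ContentItem("https://example.org/news/story-1")
story.tag(london)
story.tag(london)  # duplicate is ignored
doc = story.to_linked_data()
```

Keeping the BBC-minted ID as the primary key and treating public-dataset links as optional mirrors the "maintained where possible" policy: a tag stays valid even when no external match exists.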
4. Auto Tagging: World Service Radio Archive Pilot
[Diagram: network of Content, People, Places and Orgs]
• Large English-language archive with legacy text metadata of varying quality
• R&D: Speech to Text, Speaker ID, Concept Extraction and Auto Tagging
• Crowd moderation to improve the metadata
• http://www.bbc.co.uk/taster/projects/world-service-archive
[Diagram: R&D Analytics; WS Archive]
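The auto-tagging step can be sketched with a deliberately simple stand-in for concept extraction: matching phrases from a controlled vocabulary in a speech-to-text transcript and ranking the resulting tags by mention count. The `VOCABULARY` mapping, the sample transcript and `extract_concepts` are hypothetical, and dictionary matching is not the technique BBC R&D used in the pilot; it only shows how transcript text can become tag IDs.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: surface form -> tag ID.
VOCABULARY = {
    "nelson mandela": "person/nelson-mandela",
    "mandela": "person/nelson-mandela",
    "south africa": "place/south-africa",
    "johannesburg": "place/johannesburg",
    "african national congress": "org/anc",
}

def extract_concepts(transcript: str, vocab: dict[str, str],
                     top_n: int = 3) -> list[tuple[str, int]]:
    """Count vocabulary mentions in a transcript; return the top tag IDs.

    Longer phrases are matched first and blanked out, so "nelson mandela"
    is not counted a second time as "mandela".
    """
    text = transcript.lower()
    counts = Counter()
    for phrase in sorted(vocab, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[vocab[phrase]] += hits
            text = re.sub(pattern, " ", text)
    return counts.most_common(top_n)

# Usage with a toy transcript.
tags = extract_concepts(
    "Nelson Mandela spoke in Johannesburg. Mandela later visited Cape Town, South Africa.",
    VOCABULARY,
)
```

Mapping several surface forms to one tag ID is what lets a transcript mention like "Mandela" resolve to the same curated entity a journalist would choose manually.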
5. Direction of Travel: Manual & Auto Tagging Across the Chain
[Diagram: Plan, Acquire; Auto Tagged and Manual/Workflow metadata]
A blend of auto and manual content metadata enrichment, using common structured data vocabularies and common tools/services, to support improved:
• DISCOVERY for journalists across all our platforms
• REUSE of content and data, with benefits for audiences
• IMPACT & REACH of content by infusing it with structured data, e.g. story-based delivery
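One way to picture the blend is a merge over a shared vocabulary: journalists' manual tags are always kept, and automatic suggestions are added only when they clear a confidence threshold. The `blend_tags` function, the `TagSuggestion` record and the 0.7 threshold are assumptions for illustration, not a description of the BBC's tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TagSuggestion:
    tag_id: str        # ID from the shared vocabulary
    source: str        # "manual" or "auto"
    confidence: float  # 1.0 for manual tags, model score for auto tags

def blend_tags(manual: list[str], auto: list[tuple[str, float]],
               threshold: float = 0.7) -> list[TagSuggestion]:
    """Merge manual and automatic tags drawn from one shared vocabulary.

    Manual tags always win; an auto tag is added only if it is not already
    tagged manually and its score meets the threshold.
    """
    blended = {tag_id: TagSuggestion(tag_id, "manual", 1.0) for tag_id in manual}
    for tag_id, score in auto:
        if tag_id not in blended and score >= threshold:
            blended[tag_id] = TagSuggestion(tag_id, "auto", score)
    # Highest-confidence tags first.
    return sorted(blended.values(), key=lambda s: s.confidence, reverse=True)

# Usage: one manual tag, three hypothetical auto suggestions.
result = blend_tags(
    manual=["place/london"],
    auto=[("place/london", 0.95), ("person/jane-doe", 0.82), ("org/acme", 0.31)],
)
```

The merge only works because both sources emit IDs from the same vocabulary; with separate vocabularies, "place/london" from each side could not be recognised as the same concept.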
[Diagram: Online Production and WS Archive; Produce, Archive, Publish]
6. Window On the Newsroom POC
[Diagram: Analyse & Auto Enrich (Transcribe, Speaker ID, Concept Extract, Aggregate Metadata) on Acquire, Produce, Archive; Content Arrivals Board; Media Tagger; Plan, Discovery, Manual Tagging, Reuse/Publish, Archive; Internal and Audience]
• Manual tagging through the chain: from planning to publication
• Auto tagging on ingest too, with content-type-specific analysis
• DBpedia & BBC datasets
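Auto tagging on ingest with content-type-specific analysis can be sketched as a dispatch table from content type to a list of analysers, whose outputs are aggregated into one metadata record. The analysers are hypothetical stubs standing in for transcription, speaker identification and concept extraction services; `PIPELINES`, `ingest` and `KNOWN_CONCEPTS` are illustrative names, not parts of the POC.

```python
from typing import Callable

Metadata = dict[str, list[str]]
# An analyser sees the ingested item plus the metadata aggregated so far.
Analyser = Callable[[dict, Metadata], Metadata]

KNOWN_CONCEPTS = {"brexit", "parliament", "london"}  # hypothetical vocabulary

def transcribe(item: dict, meta: Metadata) -> Metadata:
    # Stub: a real system would call a speech-to-text service here.
    return {"transcript": [item.get("speech", "")]}

def speaker_id(item: dict, meta: Metadata) -> Metadata:
    # Stub: a real system would identify speakers from the audio.
    return {"speakers": list(item.get("voices", []))}

def concept_extract(item: dict, meta: Metadata) -> Metadata:
    # Use the transcript if one was produced, otherwise the item's text body.
    text = " ".join(meta.get("transcript", [])) or item.get("body", "")
    words = {w.strip(".,!?").lower() for w in text.split()}
    return {"concepts": sorted(words & KNOWN_CONCEPTS)}

# Content-type-specific analysis: which analysers run for which type.
PIPELINES: dict[str, list[Analyser]] = {
    "audio": [transcribe, speaker_id, concept_extract],
    "video": [transcribe, speaker_id, concept_extract],
    "text": [concept_extract],
}

def ingest(item: dict) -> Metadata:
    """Run the analysers for the item's type and aggregate their metadata."""
    aggregated: Metadata = {}
    for analyse in PIPELINES.get(item["type"], []):
        for key, values in analyse(item, aggregated).items():
            aggregated.setdefault(key, []).extend(values)
    return aggregated

# Usage: an audio clip and a text story take different pipelines.
audio_meta = ingest({"type": "audio", "speech": "Parliament met in London.", "voices": ["Speaker A"]})
text_meta = ingest({"type": "text", "body": "Brexit talks resume."})
```

Passing the aggregated metadata into each analyser lets later stages (concept extraction) build on earlier ones (transcription), while text items skip straight to concept extraction.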
WIP:
• Tagging at the (W3C) Media Fragment level to support archive granularity
• Speech-to-text improvements; text in video; image recognition; non-English languages
• POC Production
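Tagging at the Media Fragment level can use the temporal dimension of the W3C Media Fragments URI 1.0 syntax, where `#t=75,130.5` addresses seconds 75 to 130.5 of a media resource. The sketch assumes hypothetical `FragmentTag` and `parse_temporal` helpers and handles only the plain-seconds `t=start[,end]` form (optionally prefixed `npt:`); clock-time values such as `hh:mm:ss` and the spatial, track and id dimensions are out of scope.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urldefrag

@dataclass(frozen=True)
class FragmentTag:
    media_url: str
    start: float
    end: Optional[float]  # None means "to the end of the media"
    tag_id: str           # hypothetical tag ID from the shared vocabulary

    def fragment_uri(self) -> str:
        """Build a temporal media fragment URI using npt seconds."""
        end = "" if self.end is None else f",{self.end:g}"
        return f"{self.media_url}#t={self.start:g}{end}"

def parse_temporal(uri: str) -> tuple[str, float, Optional[float]]:
    """Parse the plain-seconds 't=start[,end]' form; other forms are not handled."""
    base, frag = urldefrag(uri)
    for part in frag.split("&"):
        if part.startswith("t="):
            value = part[2:].removeprefix("npt:")
            start_s, _, end_s = value.partition(",")
            start = float(start_s) if start_s else 0.0  # "t=,20" starts at 0
            end = float(end_s) if end_s else None
            return base, start, end
    raise ValueError("no temporal fragment in URI")

# Usage: tag one segment of a hypothetical archive programme.
tag = FragmentTag("https://example.org/media/prog.mp3", 75, 130.5, "person/jane-doe")
uri = tag.fragment_uri()
parsed = parse_temporal(uri)
```

Storing the tag against a fragment URI rather than the whole file is what gives the archive its finer granularity: a search hit can point a journalist to the exact segment.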