5 June 2013
BBC Linked Data Platform
Using semantic technologies to make our content more connected and more discoverable
A (very) short history
✤ Dynamic Semantic Publishing
✤ BBC Sport - Transition from ‘static’ to ‘dynamic’
✤ Introduction of Semantic Technologies for World Cup 2010
✤ Raising the bar for Olympics 2012
✤ Linked Data Platform & The Creative Work
Olympics 2012
Athletes & Medals: from trackside to our audience
BBC Linked Data Platform
(our logo)
LDP: The CreativeWork
[Diagram labels: Minimal Metadata; Semantically Aggregated Metadata; Triple Store; Website; Mobile; Apps; IPTV; Open API]
CreativeWorks
✤ Minimal metadata
✤ Enough non-semantic metadata to support ‘rich links’ in a wide range of applications
✤ Enough semantic metadata (tags) to support discovery through semantic queries
✤ Full metadata requires a content-type-specific metadata API
✤ Access to content requires a content API
Some use-cases
✤ Automated index pages/feeds
✤ Semantic navigation
✤ Semantic search
✤ A typical query:
✤ Top 10, most recent, BBC News Items about Politicians who are members of The Labour Party
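The typical query above joins content to reference data through shared concept URIs. The following is a minimal in-memory sketch of that idea in Scala, not SPARQL and not the platform's implementation; the `Work` and `Person` records, their field names, and the role and party identifiers are all hypothetical.

```scala
object TypicalQuerySketch {
  // Hypothetical, simplified records; field names are illustrative only.
  final case class Work(title: String, workType: String, about: Set[String], modified: Long)
  final case class Person(uri: String, roles: Set[String], memberOf: Set[String])

  // Top-n most recent works of a given type that are "about" any person
  // who holds `role` and is a member of `party`.
  def topRecent(
      works: Seq[Work],
      people: Seq[Person],
      workType: String,
      role: String,
      party: String,
      n: Int = 10
  ): Seq[Work] = {
    // Step 1: resolve the reference-data side of the query to a set of URIs.
    val matchingPeople: Set[String] =
      people.filter(p => p.roles.contains(role) && p.memberOf.contains(party)).map(_.uri).toSet

    // Step 2: select works tagged with any of those URIs, newest first.
    works
      .filter(w => w.workType == workType && w.about.exists(matchingPeople.contains))
      .sortBy(w => -w.modified)
      .take(n)
  }
}
```

In a triple store the two steps collapse into a single graph pattern; the sketch only separates them to show where the tag on the content meets the facts about the concept.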
Powered by LDP
BBC Sport
BBC Music
BBC Olympics 2012
BBC Knowledge & Learning Beta
BBC News Local Beta
BBC Sport Mobile App
CreativeWork Ontology
CreativeWorks in Code
case class CreativeWork(
  locators: Set[Locator],
  title: String,
  modified: DateTime,
  format: Option[FormatType.FormatType] = None,
  created: Option[DateTime] = None,
  uri: Option[String] = None,
  primaryContentOf: List[PrimaryContentOf] = List(),
  about: List[String] = List(),
  mentions: List[String] = List(),
  `type`: CreativeWorkType = CreativeWorkType.CreativeWork,
  provenance: Option[CreativeWorkProvenance] = None,
  thumbnails: List[Thumbnail] = List(),
  audience: Option[AudienceType] = None,
  category: Option[CreativeWorkCategory] = None
) {
  // At most one Locator of each type
  private val oneLocatorPerType = locators.groupBy(_.`type`).forall(_._2.size == 1)
  // No two Locators share a URI (mapping a Set collapses duplicates)
  private val allLocatorsDistinct = locators.map(_.uri).size == locators.size

  require(title.trim.nonEmpty, "Creative Work has an empty title")
  require(title.length <= CreativeWork.MaxTitleLength,
    "Creative Work title exceeded the maximum length allowed of " + CreativeWork.MaxTitleLength)
  require(oneLocatorPerType, "Creative Work contained multiple Locators of the same type")
  require(allLocatorsDistinct, "Creative Work contained multiple identical Locator URNs")

  // The GUID is the Thing URI with its fixed prefix and "#id" fragment removed
  def guid = uri.map(_.replace("http://www.bbc.co.uk/things/", "")).map(_.replace("#id", ""))
}

object CreativeWork {
  val Locator = "http://www.bbc.co.uk/ontologies/cms/locator"
  val MaxTitleLength = 300
}
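The case class above refers to types that the slide does not show, so it cannot be run as it stands. As a rough, self-contained sketch of how its `require` checks behave, the following trims the model down to the validated fields. `MiniCreativeWork` and this `Locator` are hypothetical stand-ins, not the platform's types.

```scala
import scala.util.Try

object CreativeWorkSketch {
  val MaxTitleLength = 300

  // Hypothetical stand-in for the Locator type referenced on the slide.
  final case class Locator(`type`: String, uri: String)

  // Trimmed-down model: only the fields that the slide's invariants touch.
  final case class MiniCreativeWork(locators: Set[Locator], title: String) {
    require(title.trim.nonEmpty, "Creative Work has an empty title")
    require(title.length <= MaxTitleLength, "Creative Work title is too long")
    // Grouping by type and checking group sizes rejects two locators of one type.
    require(locators.groupBy(_.`type`).forall(_._2.size == 1),
      "Creative Work contained multiple Locators of the same type")
    // Mapping a Set to its URIs collapses duplicates, so a size change means a clash.
    require(locators.map(_.uri).size == locators.size,
      "Creative Work contained multiple identical Locator URIs")
  }

  // `require` throws IllegalArgumentException; Try turns that into a value.
  def build(locators: Set[Locator], title: String): Try[MiniCreativeWork] =
    Try(MiniCreativeWork(locators, title))
}
```

Putting the checks in the constructor means an invalid Creative Work can never exist as a value; callers that prefer not to catch exceptions can go through a wrapper such as `build`.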
Creative Work Query*
CONSTRUCT {
  ?creativeWork a cwork:CreativeWork ;
    a ?type ;
    cwork:title ?title ;
    cwork:about ?about ;
    cwork:mentions ?mentions ;
    cwork:dateModified ?modified .
  ?about bbc:preferredLabel ?aboutPreferredLabel .
  ?mentions bbc:preferredLabel ?mentionsPrefLabel .
}
WHERE {
  {
    SELECT DISTINCT ?creativeWork
    WHERE {
      {{#about}}
        FILTER (?about = <{{about}}>) .
        ?creativeWork cwork:about ?about .
      {{/about}}
      {{#mentions}}
        FILTER (?mentions = <{{mentions}}>) .
        ?creativeWork cwork:mentions ?mentions .
      {{/mentions}}
      ?creativeWork a cwork:CreativeWork ;
        a ?type ;
        cwork:title ?title ;
        cwork:dateModified ?modified .
    }
    ORDER BY DESC(?modified)
    LIMIT 10
    {{#offset}}OFFSET {{offset}}{{/offset}}
  }
  ?creativeWork a cwork:CreativeWork .
  {
    ?creativeWork a cwork:CreativeWork ;
      a ?type ;
      cwork:title ?title ;
      cwork:dateModified ?modified .
    {
      ?type rdfs:subClassOf cwork:CreativeWork .
    } UNION {
      OPTIONAL {
        ?creativeWork cwork:about ?about .
        OPTIONAL { ?about rdfs:label ?aboutLabel . }
        OPTIONAL { ?about bbc:preferredLabel ?aboutPreferredLabel . }
      }
      OPTIONAL {
        ?creativeWork cwork:mentions ?mentions .
        OPTIONAL { ?mentions rdfs:label ?mentionsLabel . }
        OPTIONAL { ?mentions bbc:preferredLabel ?mentionsPrefLabel . }
      }
    }
  }
}
*Simplified
Slide callouts:
✤ SPARQL CONSTRUCT
✤ Inner SELECT
✤ Parameterisation
✤ Pagination
✤ Mustache-templated
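To illustrate how the Mustache sections in the query work (a block such as the `about` filter or the `OFFSET` clause appears only when its parameter is supplied), here is a minimal hand-rolled sketch. It is not a real Mustache library and handles only flat `{{#name}}…{{/name}}` sections and `{{name}}` variables; `MiniMustache`, `render`, and the shortened template are assumptions made for illustration.

```scala
import java.util.regex.Matcher
import scala.util.matching.Regex

object MiniMustache {
  // {{#name}} ... {{/name}}: kept only when `name` is present in the parameters.
  private val Section: Regex = """(?s)\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}""".r
  // {{name}}: replaced by the parameter value (empty string when absent).
  private val Variable: Regex = """\{\{(\w+)\}\}""".r

  def render(template: String, params: Map[String, String]): String = {
    val withSections = Section.replaceAllIn(template, m =>
      if (params.contains(m.group(1))) Matcher.quoteReplacement(m.group(2)) else "")
    Variable.replaceAllIn(withSections, m =>
      Matcher.quoteReplacement(params.getOrElse(m.group(1), "")))
  }

  // A shortened, hypothetical template in the spirit of the slide's inner SELECT.
  val Template: String =
    """SELECT DISTINCT ?creativeWork
      |WHERE {
      |  {{#about}}?creativeWork cwork:about <{{about}}> .{{/about}}
      |  ?creativeWork cwork:dateModified ?modified .
      |}
      |ORDER BY DESC(?modified)
      |LIMIT 10
      |{{#offset}}OFFSET {{offset}}{{/offset}}""".stripMargin
}
```

For example, `MiniMustache.render(MiniMustache.Template, Map("offset" -> "20"))` drops the `cwork:about` pattern and appends `OFFSET 20`, which is how one template can serve both filtering and pagination. The sketch substitutes values verbatim, so any real use would need to validate URIs and numbers before they reach the query.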
Our principal challenge:
Data Management
4 Kinds of Data
✤ Creative Works
✤ Reference Data, managed in sets (Datasets)
✤ Reference Data, managed individually (Resources)
✤ Ontologies
99.99% Availability
Our own URIs
✤ Everything has a ‘Thing URI’:
✤ http://www.bbc.co.uk/things/{GUID}#id
✤ Opaque ID, dereferenceable*
✤ BBC controls identity, therefore quality & consistency
✤ bbc:sameAs to DBpedia, Wikidata, Freebase, etc.
*coming soon
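As a small sketch of the Thing URI pattern, the helper below converts between a GUID and its URI. It assumes the lowercase `#id` fragment that the `guid` method in the CreativeWork code strips; `ThingUri`, `fromGuid`, and `toGuid` are hypothetical names, and the sketch treats the GUID as opaque rather than validating its format.

```scala
object ThingUri {
  private val Prefix = "http://www.bbc.co.uk/things/"
  private val Suffix = "#id" // assumed fragment, following the guid helper in the code slide

  // Build the Thing URI for an opaque GUID.
  def fromGuid(guid: String): String = Prefix + guid + Suffix

  // Recover the GUID, or None when the string does not follow the Thing URI pattern.
  def toGuid(uri: String): Option[String] =
    if (uri.startsWith(Prefix) && uri.endsWith(Suffix) &&
        uri.length > Prefix.length + Suffix.length)
      Some(uri.substring(Prefix.length, uri.length - Suffix.length))
    else None
}
```

Returning an `Option` rather than stripping substrings unconditionally makes it explicit when a URI from an external dataset (for instance a `bbc:sameAs` target) is not a Thing URI at all.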
Our own ontologies
✤ Core set of ontologies that are BBC owned
✤ Creative Work, BBC, (Organisational) Provenance, etc.
✤ Ability to change regularly and unilaterally
✤ Provide ‘mappings’ to more widely used ontologies (e.g. Schema.org)
✤ Domain ontologies can be shared or reused
✤ Sport, Politics, GeoLocation, etc
Open data
✤ Provided through Mashery
✤ ‘Connected Studio’ events will validate our API
✤ Public beta to follow
✤ JSON-LD & Turtle
✤ Future
✤ Self-provisioned, cloud-based triple stores
✤ Data Dumps
The Hard Problems...
Managing concepts across BBC
✤ Which domain ‘owns’ Arnold Schwarzenegger?
✤ News? Entertainment? History? Politics?
✤ Can domains ‘own’ predicates?
✤ Layering information over shared concepts
✤ High quality sub-sets vs. lower quality ‘long-tail’
✤ Synchronisation with external datasets
✤ Tools for creating and managing concepts
✤ Emerging, splitting & combining concepts
✤ Linked Data gives us a language to solve these problems
Metadata
Often subjective, never complete
✤ What is this TV programme about?
✤ Manual tag curation
✤ Subjective
✤ Long-term expense
✤ Inconsistent
✤ Automated tag generation
✤ Short-term expense
✤ Value in data or algorithm?
✤ Complex
✤ Relies on assumptions
✤ Our approach? Invest in both. Validate learnings.
When to reason?
✤ Our options...
✤ Before writing to the triple store
✤ Materialised in the triple store (Forward-chaining inference)
✤ Inferred by the SPARQL engine (Backward-chaining inference)
✤ After SPARQL results have returned
✤ None/some/all of the above
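The difference between materialising inferences and inferring them at query time can be sketched over a tiny in-memory triple set. This is an illustration restricted to a single rule (a resource typed with a class is also typed with that class's superclasses); the `Triple` type, the `ex:` names, and both functions are assumptions, not how any particular triple store implements reasoning.

```scala
object ReasoningSketch {
  final case class Triple(s: String, p: String, o: String)

  val Type = "rdf:type"
  val SubClassOf = "rdfs:subClassOf"

  // Forward chaining: add inferred rdf:type triples until nothing new appears,
  // so later queries can read the types directly from the stored data.
  @scala.annotation.tailrec
  def materialise(triples: Set[Triple]): Set[Triple] = {
    val inferred = for {
      t   <- triples if t.p == Type
      sub <- triples if sub.p == SubClassOf && sub.s == t.o
    } yield Triple(t.s, Type, sub.o)
    val next = triples ++ inferred
    if (next == triples) triples else materialise(next)
  }

  // Backward chaining: leave the data untouched and walk the class hierarchy
  // only when a question about one subject is asked.
  def typesOf(triples: Set[Triple], subject: String): Set[String] = {
    def withSuperClasses(cls: String, seen: Set[String]): Set[String] =
      if (seen(cls)) seen
      else triples
        .filter(t => t.p == SubClassOf && t.s == cls)
        .map(_.o)
        .foldLeft(seen + cls)((acc, parent) => withSuperClasses(parent, acc))

    triples
      .filter(t => t.p == Type && t.s == subject)
      .map(_.o)
      .foldLeft(Set.empty[String])((acc, cls) => withSuperClasses(cls, acc))
  }
}
```

Materialising trades storage and write-time work for simpler, faster reads; the query-time version keeps the store small but repeats the hierarchy walk on every request. Both give the same answer for this rule, which is why the choice is largely an operational one.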
Maturity of Semantic Tech
✤ From a software industry perspective, Semantic (RDF) Technology is not mainstream and is therefore hard to sell
✤ Library/application immaturity can be a hindrance to innovation
✤ I believe the Sem Tech industry needs to focus on simplicity and abstraction
✤ Semantic Technology is complex, but using it need not be
Find out more
✤ Video from QCon London 2013:
✤ http://www.infoq.com/presentations/bbc-data-platform-api
✤ BBC Internet Blog:
✤ http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-Connecting-together-the-BBCs-Online-Content
✤ david.rogers@bbc.co.uk
✤ @daverog

BBC Linked Data Platform (SemTechBiz San Fran 2013)
