1. Sports and Semantic Tech
Paul Kelly
XML Team Solutions
Chair, SportsML Working Party (IPTC)
Spring Meeting, IPTC
Dubai, UAE / 9th March 2011
iptc.org
sportsml.org
2. Let's Talk About This
• Exploratory, not a didactic presentation
• Purpose
– gauge interest among members
– brainstorm
– guide SWP agenda
• Explore
– set of problems
– possible solutions
– or do we have that backwards?
– business cases?
© 2010 IPTC (www.iptc.org) All rights reserved 2
3. Why Sports?
• easy? a no-brainer?
– Silver Oliver, BBC
• "Silver says the BBC has started with sport, because it is simpler. The
events and the actors taking part in those events are known in
advance. For example, even this far ahead you know the fixture list,
venues, teams and probably the majority of the players who are going
to take part in the 2010 World Cup."
– http://blogs.journalism.co.uk/editors/2010/02/24/a-history-of-linked-data-at-the-bbc/
– relationships easy to understand
• hierarchical
• sport/league/event/team/player
4. Sports News Biz
• Business products
– team rosters
– schedules
– pre-event reports (text and statistical)
– live updates
– post-event reports (text and statistical)
– standings/tables
– stat reports
– injury reports
– general news
– wagering
– multimedia
– etc.
5. What are the issues?
• ID resolution or acquisition
• data availability
• what to capture?
– everything rdfable?
– permanent metadata
– narrative
– perishable metadata
• implementing/architecture
• marketing scenarios
6. IDs, Concepts and relationships
• IDs
– player, team, event, league, etc.
• concepts
– player, team, event, league, etc.
– also tournament-stage, season-type, etc.
– goals-scored, shots-missed, shots-on-net, etc.
• relationships
– isCompetitiveSportingOrganisationOf
– isGroupOf
– isMatchOf
– hasStat
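A minimal sketch of how the IDs, concepts and relationships above could be held as subject/predicate/object triples. Only the predicate names and stat concepts come from the slide; every ID (such as `team:t1`), the direction of each relationship and the helper functions are assumptions made for illustration.

```python
# Each fact is a (subject, predicate, object) triple.
# All IDs are hypothetical, and the direction of each relationship is an
# assumption: the slide names the predicates but does not define them.
TRIPLES = [
    ("team:t1", "isCompetitiveSportingOrganisationOf", "league:l1"),
    ("event:e1", "isMatchOf", "league:l1"),
    ("player:p1", "hasStat", ("goals-scored", 2)),
    ("player:p1", "hasStat", ("shots-on-net", 5)),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


player_stats = objects(TRIPLES, "player:p1", "hasStat")
league_matches = subjects(TRIPLES, "isMatchOf", "league:l1")
```

The same list answers both "what stats does this player have?" and "which matches belong to this league?", which is the appeal of keeping relationships as data rather than as a fixed hierarchy.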
7. Data domains
• within sports domain
– e.g. resolving player IDs between providers
– player page with Wikipedia content
• within entire news domain
– when news and sport intersect
• doping, Beckhams, etc.
• multi-domain events like Olympics
• event management
• broader marketing domain
– personal data
• location
• favourite team
• favourite gin
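Resolving player IDs between providers, the first item on this slide, can be sketched as a crosswalk table that maps each provider's local ID to one shared ID. The provider names, local IDs and canonical IDs below are all hypothetical, and a real resolver would need to be populated and maintained somehow; this only shows the lookup.

```python
# Hypothetical crosswalk: (provider, provider-local ID) -> shared canonical ID.
CROSSWALK = {
    ("provider-a", "10442"): "player:0001",
    ("provider-b", "X-17"): "player:0001",
    ("provider-b", "X-18"): "player:0002",
}


def resolve(provider, local_id):
    """Return the canonical ID for a provider's local ID, or None if unknown."""
    return CROSSWALK.get((provider, local_id))


def same_player(ref_a, ref_b):
    """True only when both (provider, local_id) references resolve to the same ID."""
    id_a = resolve(*ref_a)
    id_b = resolve(*ref_b)
    return id_a is not None and id_a == id_b


match = same_player(("provider-a", "10442"), ("provider-b", "X-17"))
```

Unknown references resolve to `None` and never match anything, so a missing mapping shows up as "not the same player" rather than as a false merge.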
8. What's Out There?
• Linked Data state of the art
– DBpedia and Freebase
• compare rosters for Miami Heat
– Google Calendar
• schedules
– Guardian medals spreadsheet
– sportscodes.org
• code resolver
• originally thought of as strictly external
• but ties in with
– internal metadata management
– other apps that produce and consume metadata
– Did I miss anything?
9. Ontologies
• BBC sport ontology
– http://www.bbc.co.uk/ontologies/sport
• The Sport Ontology is a simple lightweight ontology for publishing data
about competitive sports events. The terms in this ontology allow data
to be published about:
– The structure of sports tournaments as a series of events
– Agents competing in a competition
– The type of discipline an event involves
– The award associated with the competition
– ...etc
10. BBC Site
• BBC World Cup Site
– built on top of triple-store; dynamically produced via inference
– Jem Rayfield: "The BBC World Cup 2010 site features 700-plus
team, group and player pages, which are powered by a
high-performance dynamic semantic publishing (DSP) architecture.
Previously, BBC Sport would never have considered creating this
number of indices in the CPS, as each index would need an editor
to keep it up to date with the latest stories, even where automation
rules had been set up. To put this scale of task into perspective, the
World Cup site has more index pages than the rest of the BBC
Sport site."
11. BBC Site
• "This framework facilitates the publication of automated
metadata-driven web pages that are light-touch, requiring
minimal journalistic management, as they automatically
aggregate and render links to relevant stories."
• "The foundation of these dynamic aggregations is a rich
ontological domain model. The ontology describes entity
existence, groups and relationships between the
things/concepts that describe the World Cup. For example, "Frank
Lampard" is part of the "England Squad" and the "England
Squad" competes in "Group C" of the "FIFA World Cup
2010"."
• http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html
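The Frank Lampard example in the quote can be sketched as a walk over linked facts: a story tagged with one concept is aggregated onto the pages of every concept reachable from it. This is an illustrative sketch only, not the BBC's implementation; the predicate names `partOf` and `competesIn` and both function names are assumptions.

```python
# Facts taken from the quoted example; the predicate names are assumed.
FACTS = [
    ("Frank Lampard", "partOf", "England Squad"),
    ("England Squad", "competesIn", "Group C"),
    ("Group C", "partOf", "FIFA World Cup 2010"),
]


def related_concepts(facts, start):
    """Follow links outward from `start`; return each concept reached, nearest first."""
    reached = []
    frontier = [start]
    while frontier:
        current = frontier.pop(0)
        for subject, _predicate, obj in facts:
            if subject == current and obj != start and obj not in reached:
                reached.append(obj)
                frontier.append(obj)
    return reached


def index_pages_for(facts, story_tags):
    """Return, sorted, every index page a story with these tags should appear on."""
    pages = set()
    for tag in story_tags:
        pages.add(tag)
        pages.update(related_concepts(facts, tag))
    return sorted(pages)


pages = index_pages_for(FACTS, ["Frank Lampard"])
```

A journalist tags the story once with "Frank Lampard"; the squad, group and tournament pages pick it up without further editing, which is the "light-touch" property the quote describes.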
12. BBC Site
• John O'Donovan: "Another way to think about all this, is
that we are not publishing pages, but publishing content as
assets which are then organised by the metadata
dynamically into pages"
• "We believe this is the first large scale, mass media site to
be using concept extraction, RDF and a Triple store to
deliver content."
– http://www.bbc.co.uk/blogs/bbcinternet/2010/07/the_world_cup_and_a_call_to_ac.html
• entire BBC sports site will cut over to this architecture for
the 2012 Olympics.
13. What to Capture?
• everything in RDF?
– where to draw the line between flat and deep data?
• vertical (SportsML) and horizontal (RDF)
• kinds of data
– stable metadata
• permanent
– player, team, event, league
• fixed
– schedules
– unpredictable permanent (meta)data
• historical post-event results
– scores
– highlights
– outcome
» historical interest, such as last time England won the World Cup
– the 0-goals, 0-assists guy?
14. Perishable Metadata
• perishable metadata
– the pre-event narrative
• why should I follow this game?
• where should I watch it?
• who should I watch it with?
• more of a marketing opportunity?
15. Pre-event significance
• What makes a sports event significant?
– decisive game
• Cup Final
• avoid relegation
– top teams
– matchup history
– rivalries
– top players
– streaks
• winning
• scoring
• losing
– interesting players
– news intersection (New Orleans Saints @ Super Bowl)
16. Pre-Event Metadata
• These are all narratives
– all of it would be in the prose of a match preview
• Contrast
– structure and predictability of schedule
– unpredictability of narrative --> essential
– Winter Olympics narrative
• controllable?
– Georgian Luger
– "Own the Podium"
17. Next Steps
• What should SportsML Working Party do?
– just SportsML?
– what about codes, concepts and ontologies?
• map SportsML to ontologies
– rename to Sports News and Data Management?