Bringing parliamentary debates to the Semantic Web

Damir Juric (1,3), Laura Hollink (2), Geert-Jan Houben (1)

(1) Delft University of Technology, (2) VU University Amsterdam, (3) FER University of Zagreb



DERIVE 2012
Boston, 12.11.2012.
Motivation




  Cross-media comparison:
• What choices do different media make in the coverage of people and topics while
  reporting on political events?

• Does the representation of topics and people change over time and how do the
  various media types differ?
Background: the PoliMedia project

  • Funded by CLARIN-NL

  • May 2012 - May 2013

  • 3 phases:
     I. Modeling phase: creating a semantic model (this presentation)
     II. Data production phase: creating links between political events and media
     III. Application phase: searching and navigating linked datasets

  • www.polimedia.nl
Research questions

• How to represent political events on the Semantic Web?
• How to represent links between media and political events on
  the Semantic Web?
Political events data set

• Events: Dutch parliamentary debates

 Handelingen der Staten-Generaal, or Dutch Hansard

• Some provenance:
  1. Transcripts are made of the complete debates of the Dutch parliament.
  2. The government publishes them online at http://www.statengeneraaldigitaal.nl/ (1818-1995) and http://officielebekendmakingen.nl/ (from 1995).
  3. The PoliticalMashup project has converted the government PDF and TXT files into XML, including URIs as identifiers; see http://politicalmashup.nl/
  4. We build on that.
Media data sets

• Newspaper articles and radio bulletins

    • at the National Library of the Netherlands

    • Many, mostly regional newspapers, 1950-1995

    • Text + images of newspaper layout

• Newscasts

    • at the Netherlands Institute for Sound and Vision

    • evening news and current affairs programs

    • metadata in Dublin Core and CDMI format

    • enriched with thesaurus terms from the Gemeenschappelijke Thesaurus Audiovisuele Archieven (GTAA)
Semantic model: what do we need to represent? 1/2

• Important information for every parliamentary debate is:
    • When the debate was held
    • What is being said in the debate (topics)
    • Who is giving the speeches in the debate and in which role (persons)
        • Additional information about actors involved in the event (names of the politicians, their party, age, etc.)
    • Structure: subparts of the debate have their own identifiers (a subpart is a part of the debate where only one speaker can be identified as actor)
        • Chronological order (the order in which the subparts occurred within the parliamentary debate)
    • Named entities apart from politicians (persons, locations, etc.)

[Diagram: a Debate with Metadata; Topic 1 containing Speaker 1 / Content, Speaker 2 / Content, Speaker 3 / Content; Topic 2 containing Speaker 1 / Content]
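
The requirements listed on this slide could be captured in a small data structure. The following is only a sketch under stated assumptions: the class and field names (Speech, Topic, Debate, order, held_on) are hypothetical and are not the PoliMedia vocabulary, and the example data is placeholder text, not taken from the Hansard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Speech:
    """A subpart of a debate with exactly one identifiable speaker."""
    uri: str        # each subpart has its own identifier
    speaker: str    # who is giving the speech
    role: str       # in which role the speaker acts
    order: int      # chronological position within the debate
    text: str       # what is being said
    named_entities: List[str] = field(default_factory=list)


@dataclass
class Topic:
    title: str
    speeches: List[Speech] = field(default_factory=list)

    def in_order(self) -> List[Speech]:
        """Return the speeches in chronological order."""
        return sorted(self.speeches, key=lambda s: s.order)


@dataclass
class Debate:
    uri: str
    held_on: date   # when the debate was held
    topics: List[Topic] = field(default_factory=list)

    def speakers(self) -> List[str]:
        """Distinct speakers, in order of first appearance."""
        seen: List[str] = []
        for topic in self.topics:
            for speech in topic.in_order():
                if speech.speaker not in seen:
                    seen.append(speech.speaker)
        return seen


# Placeholder data (hypothetical identifiers and date).
debate = Debate(
    uri="urn:example:debate-1",
    held_on=date(2000, 1, 1),
    topics=[Topic("Topic 1", [
        Speech("urn:example:debate-1.2", "Speaker 2", "member", 2, "..."),
        Speech("urn:example:debate-1.1", "Speaker 1", "chair", 1, "..."),
    ])],
)
print(debate.speakers())
```

Storing an explicit order on each subpart keeps the chronological sequence independent of how the speeches happen to be stored or retrieved.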
Semantic model: what do we need to represent? 2/2

• Various information about media items linked to the debate

• Links between subparts of the debate and news articles, radio bulletins and television newscasts
URIs

• PoliMedia vocabulary: http://purl.org/linkedpolitics/nl/polivoc#Speech

• Politicians, parties: http://purl.org/linkedpolitics/nl/poli#Beel

• Debates and parts of debates: http://purl.org/linkedpolitics/nl/nl.proc.sgd.d.198219830000846.2.11.12

• Media articles, bulletins and newscasts: http://resolver.kb.nl/resolve?urn=ddd:010069811:mpeg21:pdf
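
To make the URI scheme concrete, here is a minimal sketch that writes a few N-Triples statements using the example URIs from this slide. Several things are assumptions: the property polivoc:coveredIn is hypothetical, sem:hasActor is taken from the Simple Event Model namespace as commonly published, and combining these particular URIs in one description is purely illustrative, not a claim about the actual data.

```python
# Namespaces; the purl.org ones come from the slide.
BASE = "http://purl.org/linkedpolitics/nl/"
POLIVOC = BASE + "polivoc#"
POLI = BASE + "poli#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"  # assumed SEM namespace


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Example URIs from the slide; their pairing here is illustrative only.
part_of_debate = BASE + "nl.proc.sgd.d.198219830000846.2.11.12"
politician = POLI + "Beel"
article = "http://resolver.kb.nl/resolve?urn=ddd:010069811:mpeg21:pdf"

triples = [
    triple(part_of_debate, RDF_TYPE, POLIVOC + "Speech"),
    triple(part_of_debate, SEM + "hasActor", politician),
    # Hypothetical property linking a subpart of a debate to a media item.
    triple(part_of_debate, POLIVOC + "coveredIn", article),
]

print("\n".join(triples))
```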
Semantic model

[Figures: diagrams of the semantic model; the images are not recoverable from this transcript. Reference shown on the slides:]

W.R. van Hage, V. Malaisé, R. Segers, L. Hollink and A.Th. Schreiber. Design and use of the Simple Event Model (SEM)
Current work: finding links

• Queries: speaker name + named entities + topics (created using topic modeling methods), extracted from the political events dataset
• Used for retrieval of media articles

[Diagram: query composition]
  TopicList = NamedEntitiesVector (Speech), TopicWordSetVector (Speech), NamedEntitiesVector (PartOfDebate), TopicWordSetVector (PartOfDebate)
  + Speaker X = ActorFromSpeech
  TimeFrame
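
The query composition on this slide could be sketched as follows. This is an illustration under assumptions, not the project's implementation: the slide does not say how the four vectors are combined, so the order-preserving, case-insensitive union used here is a guess, and the 7-day window after the debate date is a hypothetical default for the time frame.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class MediaQuery:
    speaker: str                   # ActorFromSpeech
    topic_list: List[str]          # merged named entities and topic words
    time_frame: Tuple[date, date]  # period in which to search media items


def merge_terms(*vectors: List[str]) -> List[str]:
    """Order-preserving, case-insensitive union of term lists (an assumption)."""
    seen = set()
    merged = []
    for vector in vectors:
        for term in vector:
            key = term.lower()
            if key not in seen:
                seen.add(key)
                merged.append(term)
    return merged


def build_query(speaker: str,
                speech_entities: List[str],
                speech_topic_words: List[str],
                part_entities: List[str],
                part_topic_words: List[str],
                debate_date: date,
                days_after: int = 7) -> MediaQuery:
    """Combine speech-level and debate-part-level terms into one query."""
    topic_list = merge_terms(speech_entities, speech_topic_words,
                             part_entities, part_topic_words)
    window = (debate_date, debate_date + timedelta(days=days_after))
    return MediaQuery(speaker, topic_list, window)


# Placeholder terms and date, for illustration only.
query = build_query("Speaker X",
                    ["Den Haag"], ["budget", "tax"],
                    ["den haag", "NATO"], ["tax"],
                    date(2000, 1, 1))
print(query.topic_list)
```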
Finally

  • SPARQL endpoint with the PoliMedia vocabulary + RDF of Dutch Hansard
    data will be available soon.

  • Feel free to use it!

  • Links to media + search/browse app are expected early next year.
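
Once an endpoint is available, a query against it might look like the one built below. This is a hedged sketch: the slides give no endpoint address, so the code only constructs the query text, and it assumes that speeches are typed polivoc:Speech and linked to their speaker with sem:hasActor, which the published data may model differently. The function name speeches_by is hypothetical.

```python
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"  # assumed SEM namespace
POLIVOC = "http://purl.org/linkedpolitics/nl/polivoc#"


def speeches_by(politician_uri: str, limit: int = 10) -> str:
    """Build a SPARQL SELECT query for speeches by one politician."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # Reject characters that cannot appear inside an <...> IRI reference.
    if any(ch in politician_uri for ch in '<>" \n'):
        raise ValueError("politician_uri is not a valid IRI")
    return (
        f"PREFIX sem: <{SEM}>\n"
        f"PREFIX polivoc: <{POLIVOC}>\n"
        "SELECT ?speech WHERE {\n"
        "  ?speech a polivoc:Speech ;\n"
        f"          sem:hasActor <{politician_uri}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


sparql = speeches_by("http://purl.org/linkedpolitics/nl/poli#Beel")
print(sparql)
```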
Thank you for your attention!

  Henri Beunders (EUR)
  Jaap Blom (NISV)
  Laura Hollink (VU)
  Geert-Jan Houben (TU Delft)
  Damir Juric (TU Delft)
  Max Kemman (EUR)
  Martijn Kleppe (EUR)
  Johan Oomen (NISV)
