Cutting Long Stories Short
Fact Extraction from Wikipedia
Marco Fossati
fossati@spaziodati.eu
Poznan, 25th June 2015
What?
A Google Summer of Code Project for DBpedia
What?
Teaching Machines to Read Natural Language
Why?
Text Contains a Huge Amount of Knowledge
Why?
DBpedia Focuses on Semi-structured Data
Discovery of New Relations
Automatic Knowledge Base Population
How?
Machine Learning + Lexical Semantics
How?
Poland victory World Cup 2014
“Poland won the World Cup in 2014”
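The example above can be pictured as a frame evoked by a lexical unit, with spans of the sentence filling frame elements. The following is a minimal sketch only: the frame name `Victory` and the element names `Winner`, `Competition` and `Time` are hypothetical labels chosen for illustration, not taken from the slides or from a specific frame database.

```python
from dataclasses import dataclass, field


@dataclass
class FrameInstance:
    """One extracted fact: a frame, the word that evokes it, and its fillers."""
    frame: str                 # hypothetical frame label
    lexical_unit: str          # the word in the sentence that evokes the frame
    elements: dict = field(default_factory=dict)  # frame element -> text span


sentence = "Poland won the World Cup in 2014"

# Hand-written target annotation for the example sentence (labels are assumptions).
fact = FrameInstance(
    frame="Victory",
    lexical_unit="won",
    elements={"Winner": "Poland", "Competition": "World Cup", "Time": "2014"},
)


def spans_occur_in(sentence: str, instance: FrameInstance) -> bool:
    """Check that the lexical unit and every filler literally occur in the sentence."""
    spans = [instance.lexical_unit, *instance.elements.values()]
    return all(span in sentence for span in spans)


print(spans_occur_in(sentence, fact))
```

The keyword line "Poland victory World Cup 2014" corresponds to the values stored in this structure; the classifier described later in the deck would be expected to produce it from the raw sentence.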
Approach
1. Lexical Units
1.1. Extraction via POS Tagging
1.2.Statistical Ranking
2. Frame Database (FrameNet, Kicktionary)
The Data-driven Way
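Steps 1.1 and 1.2 could be sketched roughly as follows. This is an illustration under stated assumptions: the input is a tiny hand-made corpus that is already lemmatized and POS-tagged (the tagger itself is not shown), the tag names are made up, and TF-IDF is used as the ranking measure only because the slides say "statistical ranking" without naming one.

```python
import math
from collections import Counter

# Hypothetical pre-tagged documents: lists of (lemma, POS tag) pairs.
docs = [
    [("Poland", "PROPN"), ("win", "VERB"), ("cup", "NOUN")],
    [("team", "NOUN"), ("win", "VERB"), ("match", "NOUN")],
    [("striker", "NOUN"), ("score", "VERB"), ("goal", "NOUN"), ("score", "VERB")],
]


def candidate_lexical_units(tagged_docs, pos="VERB"):
    """Step 1.1: keep only the lemmas carrying the requested POS tag, per document."""
    return [[lemma for lemma, tag in doc if tag == pos] for doc in tagged_docs]


def rank_by_tfidf(lemma_docs):
    """Step 1.2: score each lemma by corpus frequency times inverse document frequency."""
    n_docs = len(lemma_docs)
    term_freq = Counter(lemma for doc in lemma_docs for lemma in doc)
    doc_freq = Counter(lemma for doc in lemma_docs for lemma in set(doc))
    scores = {
        lemma: term_freq[lemma] * math.log(n_docs / doc_freq[lemma])
        for lemma in term_freq
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_tfidf(candidate_lexical_units(docs))
for lemma, score in ranked:
    print(f"{lemma}\t{score:.3f}")
```

The top-ranked lemmas would then be the candidates to look up in the frame database of step 2.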
Approach
3. Frame + Frame Elements Classification
Unsupervised, Rule-based
Supervised
4. Crowdsourced Training Set Construction
5. RDF Serialization
The Data-driven Way
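For step 5, a classified frame could be written out as RDF triples along the following lines. This is a hedged sketch, not the project's actual output format: the `http://example.org/fact/` namespace, the `frame` property and the use of plain string literals for frame elements are all assumptions, and the code emits N-Triples with string formatting instead of an RDF library. Names passed to `iri` are assumed to contain no spaces or other characters that are illegal in an IRI.

```python
BASE = "http://example.org/fact/"  # hypothetical namespace, for illustration only


def iri(local_name: str) -> str:
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def literal(text: str) -> str:
    """Render a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(fact_id: str, frame: str, elements: dict) -> str:
    """Serialize one frame instance: one triple for the frame, one per element."""
    subject = iri(fact_id)
    lines = [f"{subject} {iri('frame')} {iri(frame)} ."]
    for role, value in sorted(elements.items()):
        lines.append(f"{subject} {iri(role)} {literal(value)} .")
    return "\n".join(lines)


print(to_ntriples(
    "f1",
    "Victory",
    {"Winner": "Poland", "Competition": "World Cup", "Time": "2014"},
))
```

In a real pipeline the frame element values would more likely be linked to knowledge base entities than kept as strings; literals are used here only to keep the sketch short.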
Crowdsourcing the Annotation
Label words with Frame Elements
Use Case
Soccer Domain
Widely Represented (223,000 articles)
Lots of Semi-structured Data
Italian Wikipedia
Wanna contribute?
https://github.com/dbpedia/fact-extractor
That’s all Folks!
Marco Fossati
fossati@spaziodati.eu
