CLTL Software and Web Services
Rubén Izquierdo Beviá
Rubén Izquierdo Beviá
About me
- 5-year degree in Computer Science (University of Alicante, Alicante, Spain)
- National NLP projects and one European project (QALLME) (University of Alicante, Alicante, Spain)
- Thesis on NLP and Word Sense Disambiguation (University of Alicante, Alicante, Spain, Sept 2010)
- Postdoc position in the DutchSemCor project (University of Tilburg, Tilburg, Sept 2011-Sept 2012)
- Postdoc position in the OpeNER project (Vrije Universiteit Amsterdam, Sept 2012-)
CLTL software
- A common input/output format across modules
  - KAF
  - NAF, an extension of KAF
- Single components performing single tasks
- Integration of existing modules
  - Adaptation of their input/output formats
- Development of new modules
KAF
Kyoto Annotation Format
- Stand-off, layered, XML-based representation format
- Different types of information are stored in different layers
- Layers are linked by means of references
- Suitable for building pipelines of modules that read and write this format
- Layers:
  - Text → tokens
  - Term → lemmas, part-of-speech, term sentiment, word senses
  - Entities, chunks, opinions…
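The layering and reference mechanism described on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the full KAF schema: the element and attribute names (`wf`, `term`, `span`, `target`, `wid`, `tid`) follow common KAF usage, while the helper functions `build_kaf` and `tokens_of_term` and the part-of-speech values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_kaf(tokens, lemmas, pos_tags):
    """Build a simplified KAF-like document with a text layer and a term layer."""
    root = ET.Element("KAF")
    text_layer = ET.SubElement(root, "text")
    term_layer = ET.SubElement(root, "terms")
    for i, (token, lemma, pos) in enumerate(zip(tokens, lemmas, pos_tags), start=1):
        # Text layer: one <wf> (word form) element per token.
        wf = ET.SubElement(text_layer, "wf", {"wid": f"w{i}", "sent": "1"})
        wf.text = token
        # Term layer: points back to the token by id instead of copying its text.
        term = ET.SubElement(term_layer, "term", {"tid": f"t{i}", "lemma": lemma, "pos": pos})
        span = ET.SubElement(term, "span")
        ET.SubElement(span, "target", {"id": f"w{i}"})
    return root


def tokens_of_term(root, tid):
    """Follow the references from a term back to the token strings in the text layer."""
    words = {wf.get("wid"): wf.text for wf in root.iter("wf")}
    for term in root.iter("term"):
        if term.get("tid") == tid:
            return [words[target.get("id")] for target in term.iter("target")]
    return []


# Illustrative input; the POS values are placeholders, not an official tag set.
kaf = build_kaf(["Hotels", "are", "nice"], ["hotel", "be", "nice"], ["N", "V", "A"])
print(ET.tostring(kaf, encoding="unicode"))
print(tokens_of_term(kaf, "t1"))
```

Because the term layer only stores references, a later module can add a new layer (entities, opinions) without touching the layers produced earlier in the pipeline.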
NAF
NewsReader Annotation Format
- Extension of KAF
- Allows cross-document processing
  - e.g. event coreference
- IDs are converted into valid URIs
- Can store the same type of information provided by different tools
  - e.g. the output of two different POS taggers
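Why URIs help with cross-document processing can be shown with a small sketch. The scheme used here (document URI, then `#`, then the local identifier) and the `example.org` addresses are assumptions for illustration only; the slide does not say how NAF builds its URIs.

```python
from urllib.parse import urldefrag


def to_uri(doc_uri, local_id):
    """Qualify a document-local identifier (e.g. 'ev1') with its document URI."""
    return f"{doc_uri}#{local_id}"


def same_document(uri_a, uri_b):
    """Return True if two qualified identifiers come from the same document."""
    return urldefrag(uri_a).url == urldefrag(uri_b).url


# Hypothetical document URIs; the same local id 'ev1' occurs in both documents.
a = to_uri("http://example.org/news/doc1.naf", "ev1")
b = to_uri("http://example.org/news/doc2.naf", "ev1")
print(a)
print(a != b)
print(same_document(a, b))
```

With plain local identifiers, `ev1` in two documents would clash as soon as the documents are processed together; qualified identifiers stay distinct, so an event coreference module can link them explicitly.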
How the software is provided I
- All modules are publicly available on GitHub
  - CLTL GitHub: http://github.com/cltl
  - NewsReader GitHub: http://github.com/newsreader
  - OpeNER GitHub: http://github.com/opener-project/
How the software is provided II
- Some modules are available as web services
  - Exposed as REST web services
  - Accept an input stream (KAF/NAF)
  - Generate an output stream (KAF/NAF)
  - Easy to call from the command line with curl
  - Easy to chain into module pipelines, in the same way as a pipeline of Linux commands
- http://wordpress.let.vupr.nl/web-services/
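The same chaining can be done from Python. The sketch below assumes that each service accepts the KAF/NAF document as the raw body of an HTTP POST and returns the annotated document in the response body; the slide does not specify the request format, and the endpoint URLs are placeholders, not the real service addresses.

```python
import urllib.request


def post_kaf(url, kaf_bytes, timeout=30):
    """POST a KAF/NAF document to one service and return the annotated document."""
    request = urllib.request.Request(
        url,
        data=kaf_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def run_pipeline(kaf_bytes, service_urls, post=post_kaf):
    """Feed the output of each service into the next one, like a shell pipe."""
    for url in service_urls:
        kaf_bytes = post(url, kaf_bytes)
    return kaf_bytes


# Hypothetical endpoints; replace them with the real service addresses.
SERVICES = [
    "http://example.org/services/tokenizer",
    "http://example.org/services/pos-tagger",
]

# Usage (performs network calls, so it is left commented out):
# annotated = run_pipeline(open("input.kaf", "rb").read(), SERVICES)
```

The `post` parameter makes the transport replaceable, so the pipeline logic can be checked with a stub function without contacting any server.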
Our software I
- General modules (integrated)
  - Tokenizers: whitespace-based, trained with OpenNLP...
  - Sentence splitters: rule-based, OpenNLP
  - POS taggers: TreeTagger, OpenNLP POS taggers
  - Chunker: trained on Alpino data with OpenNLP
  - Parsers: Alpino (nl), Stanford (en)
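To give an idea of what the simplest of these modules do, here is a minimal sketch of a rule-based sentence splitter and a whitespace tokenizer. The single splitting rule and the function names are assumptions for illustration; the integrated modules use their own rules and models.

```python
import re


def split_sentences(text):
    """Rule-based splitter: break after '.', '!' or '?' when followed by
    whitespace and an uppercase letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [part.strip() for part in parts if part.strip()]


def tokenize(sentence):
    """Whitespace tokenizer: split on runs of whitespace only."""
    return sentence.split()


text = "The room was clean. Staff were friendly! Breakfast at 8 a.m. was fine."
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

A whitespace tokenizer leaves punctuation attached to the preceding word ("clean."), which is one reason trained tokenizers are offered as an alternative.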
Our software II
- General modules (developed by us)
  - WordNet tools
    - Functions for working with a WordNet in LMF format
  - Word Sense Disambiguation systems
    - UKB: unsupervised
    - SVM: supervised (for Dutch, derived from DutchSemCor)
  - Multiword tagger
    - Tags multiword sequences of terms according to WordNet
  - OntoTagger
    - Inserts (semantic) labels into the KAF representation on the basis of the lemma or WordNet synset representations of the text
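The idea behind the multiword tagger can be sketched as a lookup of lemma sequences in a lexicon. The greedy longest-match strategy and the tiny lexicon below are assumptions chosen for the example; in the actual module the multiword entries come from WordNet, and its matching strategy may differ.

```python
def tag_multiwords(lemmas, multiwords, max_len=4):
    """Return (start, end, entry) spans for the longest multiword matches,
    scanning the lemma sequence from left to right."""
    spans = []
    i = 0
    while i < len(lemmas):
        match = None
        # Try the longest candidate first so "new york city" beats "new york".
        for n in range(min(max_len, len(lemmas) - i), 1, -1):
            candidate = " ".join(lemmas[i:i + n])
            if candidate in multiwords:
                match = (i, i + n, candidate)
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans


# Hypothetical stand-in for the multiword entries of a WordNet.
LEXICON = {"new york", "new york city", "take place"}
lemmas = ["the", "meeting", "take", "place", "in", "new", "york", "city"]
print(tag_multiwords(lemmas, LEXICON))
```

Matching on lemmas rather than surface forms lets "took place" and "takes place" map to the same entry, provided the term layer already contains lemmas.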
Our software III
- General modules (developed by us)
  - Named Entity Recognizer
    - Detects dates and locations using specific resources plus GeoNames
  - KyBot
    - Extracts tuples and relations using a set of profiles formulated in terms of semantic and structural properties
Our software IV
- OpeNER-related (developed by us)
  - Hotel property tagger
    - Detects aspects related to cleanliness, staff, breakfast, rooms…
  - Term polarity tagger
    - Positive/negative terms, intensifiers, negators…
  - Opinion miner
    - Detects opinions: target + holder + expression
    - Two rule-based versions and one machine-learning version
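How polar terms, intensifiers and negators can interact is shown in the sketch below. The lexicon entries, the weights and the combination rule are all assumptions for illustration; they are not taken from the OpeNER polarity tagger or from either rule-based opinion miner.

```python
# Hypothetical lexicons; real ones would be loaded from language resources.
POLARITY = {"good": 1.0, "clean": 1.0, "dirty": -1.0, "bad": -1.0}
INTENSIFIERS = {"very": 2.0, "slightly": 0.5}
NEGATORS = {"not", "never"}


def score_terms(lemmas):
    """Sum term polarities; preceding intensifiers scale the next polar term
    and preceding negators flip its sign."""
    total = 0.0
    weight = 1.0
    for lemma in lemmas:
        if lemma in NEGATORS:
            weight *= -1.0
        elif lemma in INTENSIFIERS:
            weight *= INTENSIFIERS[lemma]
        elif lemma in POLARITY:
            total += weight * POLARITY[lemma]
            weight = 1.0  # modifiers apply only to the next polar term
        else:
            weight = 1.0  # a neutral word breaks the modifier chain
    return total


print(score_terms(["very", "clean"]))
print(score_terms(["not", "good"]))
print(score_terms(["the", "room", "be", "clean", "but", "the", "staff", "be", "not", "good"]))
```

A full opinion miner goes further than this score: it also has to identify the target and the holder of each opinion expression.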
Our software V
- NewsReader-related (developed by us)
  - Discourse module
    - Splits incoming texts into headers and paragraphs
  - Factuality classifier
    - Classifies whether a statement is factual, probable, possible, or not
  - Event coreference
    - Compares descriptions of events within and across documents to decide whether they refer to the same events

CLTL: Description of web services and software. Nijmegen 2013
