SlideShare une entreprise Scribd logo
1  sur  25
University of Sheffield, NLP
Entity and Event Recognition in
ARCOMEM
University of Sheffield, UK
© The University of Sheffield, 1995-2013
This work is licenced under the Creative Commons Attribution-NonCommercial-ShareAlike Licence
University of Sheffield, NLP
What is Entity and Event Recognition?
● Entity and Event recognition are two of the main components of
an Information Extraction (IE) system.
● IE is about the automatic discovery of new, previously unknown
information, by automatically extracting information such as
entities and events from different textual resources.
● A key element is the linking together of this extracted
information to form new facts or to allow new hypotheses to be
explored further
● For example, entities and events can be used to help find
opinions expressed in the text about people, objects, or things
that have happened
University of Sheffield, NLP
Information Extraction with GATE
● In ARCOMEM, we use GATE for performing the information
extraction and opinion mining tasks
● GATE is an architecture for language processing, in
development at the University of Sheffield since 2000, and used
by hundreds of thousands of researchers, scientists and
organisations all over the world
● It includes components for language processing, e.g. parsers,
machine learning tools, stemmers, IR tools, IE components for
various languages...
● It also includes tools for visualising and manipulating text,
annotations, ontologies, parse trees, etc., and tools for evaluation
University of Sheffield, NLP
ARCOMEM offline processing architecture
GATE
components
University of Sheffield, NLP
Entity Recognition
● The application for Entity Recognition is based on (modified
versions of) ANNIE and TermRaider, which are part of GATE
● It consists of a set of Processing Resources (PRs) executed
sequentially over the corpus of documents
● Document Pre-processing
● Linguistic Pre-processing
● Named Entity Recognition
● Term Extraction
● RDF generation
University of Sheffield, NLP
Linguistic Pre-Processing
● We use the following PRs:
● Tokeniser
● Sentence Splitter
● Language Identification
● POS tagger
● Morphological analyser
● For social media such as twitter, we use a specially adapted
version of some of these
University of Sheffield, NLP
Named Entity Recognition
● The following entity types are recognised by the NER
component, corresponding to the data model
● Persons (e.g. artists, politicians, web 2.0 users)
● Organizations (e.g. companies, music bands, political
parties)
● Locations (e.g. cities, countries)
● Dates and times (of events and content publication)
● NER consists of:
● Gazetteer lookup
● JAPE grammars
● Co-reference
University of Sheffield, NLP
Document annotated with NEs
University of Sheffield, NLP
Document annotated with co-references
items in the co-reference chain
University of Sheffield, NLP
German Entity Extraction
• System for German NER is similar to the English one
• It uses German-specific resources (TreeTagger for lemmatisation
and POS tagging, tailored gazetteers and grammars)
• We use a language identification module (TextCat) to determine
which documents or sentences are in German
University of Sheffield, NLP
Entities in the Rock am Ring forum
Term Recognition
University of Sheffield, NLP
Term Recognition in ARCOMEM
● Terms generally consist of important nouns or noun phrases
that are not usually proper names (unlike most Named Entities)
but reflect the content or topic of the text
● For example, important terms in a political text might be
“government”, “minister”, “financial situation”, “Chinese
economy”
● These are used in a similar way to Named Entities, for things
like event recognition, topic detection and opinion mining
● In GATE, we use a tool called TermRaider for finding terms
University of Sheffield, NLP
TermRaider
• GATE plugin for detecting single and multi-word terms
• Based on a simple web service developed originally in the EU NeOn
project
• This version runs in GATE, with visualisation tools, and extended
functionality such as new scoring systems
• We have developed an adaptation for German.
• Terms are ranked according to three possible scoring systems:
– tf.idf = term frequency (nbr of times the term occurs in the
corpus) divided by document frequency (nbr of documents in
which the term occurs)
– augmented tf.idf = after scoring tf.idf, the scores of hypernyms
are boosted by the scores of hyponyms
– Kyoto domain relevance = document frequency × (1 + nbr of
hyponyms in the corpus), Bosma and Vossen 2010
University of Sheffield, NLP
TermRaider: Methodology
• After linguistic pre-processing (tokenisation, lemmatisation, POS
tagging etc.), nouns and noun phrases are identified as initial term
candidates
• Noun phrases include post-modifiers such as prepositional phrases,
and are marked with head information for determining hyponymy.
Nested nouns and noun phrases are all marked as candidates.
• Term candidates are then scored in 3 ways.
• The results can be viewed in the GATE GUI, exported as RDF
according to the ARCOMEM data model, or saved as CSV files
• The viewer can be used to adjust the cutoff parameter. This is used
to determine the score threshold for a term to be considered valid
• Terms can also be shown as a tag cloud
University of Sheffield, NLP
Term candidates in a document
University of Sheffield, NLP
Top terms from Greek Financial Crisis corpus
University of Sheffield, NLP
Terms can be exported as a tag cloud
University of Sheffield, NLP
Event Recognition
University of Sheffield, NLP
Expression of events
● Events can be expressed by:
● verbal predicates and their arguments (e.g. “The committee
dismissed the proposal”);
● noun phrases headed by nominalizations (e.g. “economic
growth”);
● adjective-noun combinations (e.g. “governmental measure”;
“public money”);
● event-referring nouns (e.g. “crisis”, “cash injection”).
University of Sheffield, NLP
Rule-based event extraction in GATE
● Recognition of entities and the relations between them in order to find
domain-specific events and situations.
● In a (semi-)closed domain, this approach is preferable to an open IE-
based approach which holds no preconceptions about the kinds of
entities and relations possible.
● Application for event recognition is designed to follow the entity
recognition application, so no linguistic pre-processing is necessary
● Basic approach involves finding event-indicative seed words (e.g.
“downturn” might indicate an economic event) and some linguistic
relaation to existing entities (e.g. “Greece”) in the sentence.
University of Sheffield, NLP
Event extraction application
● The application in GATE consists of the following Processing
Resources:
● event gazetteer
● verb phrase chunker
● event recognition JAPE grammars
● RDF generation
● As with the NE recognition module, a German version is also
available, and conditional processors decide, based on the
language identified for each sentence, which version to run.
University of Sheffield, NLP
Events annotated in GATE
University of Sheffield, NLP
Machine Learning Applications
● We have also developed machine learning applications for
performing both entity and event extraction in GATE
● These make use of the language detection and linguistic pre-
processing components, and then replace the JAPE grammars
with a learning algorithm based on annotated training data
● These provide an alternative mechanism for achieving the
same result, but require individual training for each different
domain of text we want to apply them to.
University of Sheffield, NLP
More information
● More details on entity, event and opinion mining from a
technical point of view are available from
http://gate.ac.uk/projects/arcomem/training.html
● Comprehensive training material about using GATE for text
mining is available at http://gate.ac.uk/family/training.html

Contenu connexe

Tendances

Tutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsTutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsAdrian Paschke
 
Semantic Technologies in ST&DL
Semantic Technologies in ST&DLSemantic Technologies in ST&DL
Semantic Technologies in ST&DLAndrea Nuzzolese
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word ExpressionsLifeng (Aaron) Han
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information RetrievalNik Spirin
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Kira
 
3. introduction to text mining
3. introduction to text mining3. introduction to text mining
3. introduction to text miningLokesh Ramaswamy
 
AACL 2018 - Going Beyond Simple Word-list Creation Using CasualConc
AACL 2018 - Going Beyond Simple Word-list Creation Using CasualConcAACL 2018 - Going Beyond Simple Word-list Creation Using CasualConc
AACL 2018 - Going Beyond Simple Word-list Creation Using CasualConcyasuimao
 
Crash-course in Natural Language Processing
Crash-course in Natural Language ProcessingCrash-course in Natural Language Processing
Crash-course in Natural Language ProcessingVsevolod Dyomkin
 
Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-
Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-
Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-RuleML
 
Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Péter Király
 
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...RuleML
 
Natural Language Processing in Practice
Natural Language Processing in PracticeNatural Language Processing in Practice
Natural Language Processing in PracticeVsevolod Dyomkin
 

Tendances (20)

Final presentation
Final presentationFinal presentation
Final presentation
 
IR
IRIR
IR
 
Tutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsTutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and Systems
 
Semantic Technologies in ST&DL
Semantic Technologies in ST&DLSemantic Technologies in ST&DL
Semantic Technologies in ST&DL
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
NLP Project Full Cycle
NLP Project Full CycleNLP Project Full Cycle
NLP Project Full Cycle
 
Aspects of NLP Practice
Aspects of NLP PracticeAspects of NLP Practice
Aspects of NLP Practice
 
Basics
BasicsBasics
Basics
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
co:op-READ-Convention Marburg - Enrique Vidal
co:op-READ-Convention Marburg - Enrique Vidalco:op-READ-Convention Marburg - Enrique Vidal
co:op-READ-Convention Marburg - Enrique Vidal
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)
 
Multilingual document analysis
Multilingual document analysisMultilingual document analysis
Multilingual document analysis
 
3. introduction to text mining
3. introduction to text mining3. introduction to text mining
3. introduction to text mining
 
AACL 2018 - Going Beyond Simple Word-list Creation Using CasualConc
AACL 2018 - Going Beyond Simple Word-list Creation Using CasualConcAACL 2018 - Going Beyond Simple Word-list Creation Using CasualConc
AACL 2018 - Going Beyond Simple Word-list Creation Using CasualConc
 
Crash-course in Natural Language Processing
Crash-course in Natural Language ProcessingCrash-course in Natural Language Processing
Crash-course in Natural Language Processing
 
Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-
Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-
Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-
 
Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)
 
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
 
Natural Language Processing in Practice
Natural Language Processing in PracticeNatural Language Processing in Practice
Natural Language Processing in Practice
 

Similaire à Arcomem training entities-and-events_advanced

Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEDiana Maynard
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEDiana Maynard
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisDiana Maynard
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaDiana Maynard
 
A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisDiana Maynard
 
Arcomem training opinions_advanced
Arcomem training opinions_advancedArcomem training opinions_advanced
Arcomem training opinions_advancedarcomem
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...alessio_ferrari
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyanrudolf eremyan
 
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian CarrierHLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian CarrierBasis Technology
 
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Project
 
A Logic-Based Approach To Semantic Information Extraction
A Logic-Based Approach To Semantic Information ExtractionA Logic-Based Approach To Semantic Information Extraction
A Logic-Based Approach To Semantic Information ExtractionAmber Ford
 
Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxAlyaaMachi
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extractionGabriel Hamilton
 
ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION ijnlc
 
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...kevig
 
Myanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov ModelMyanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov Modelijtsrd
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processingdhruv_chaudhari
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationssChandan Deb
 
NATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxNATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxsaivinay93
 

Similaire à Arcomem training entities-and-events_advanced (20)

Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATE
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media Analysis
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social media
 
A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysis
 
Arcomem training opinions_advanced
Arcomem training opinions_advancedArcomem training opinions_advanced
Arcomem training opinions_advanced
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
 
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian CarrierHLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
 
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
 
A Logic-Based Approach To Semantic Information Extraction
A Logic-Based Approach To Semantic Information ExtractionA Logic-Based Approach To Semantic Information Extraction
A Logic-Based Approach To Semantic Information Extraction
 
Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptx
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
 
ResearchPaper
ResearchPaperResearchPaper
ResearchPaper
 
ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION
 
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
 
Myanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov ModelMyanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov Model
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationss
 
NATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxNATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptx
 

Plus de arcomem

Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)arcomem
 
Arcomem training – Enrichment Beginner (update)
Arcomem training – Enrichment Beginner (update)Arcomem training – Enrichment Beginner (update)
Arcomem training – Enrichment Beginner (update)arcomem
 
Arcomem training Specifying Crawls Advanced
Arcomem training Specifying Crawls AdvancedArcomem training Specifying Crawls Advanced
Arcomem training Specifying Crawls Advancedarcomem
 
Arcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls BeginnersArcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls Beginnersarcomem
 
Arcomem training Topic Analysis Models advanced
Arcomem training Topic Analysis Models advancedArcomem training Topic Analysis Models advanced
Arcomem training Topic Analysis Models advancedarcomem
 
Arcomem training Topic Analysis Models beginners
Arcomem training Topic Analysis Models beginnersArcomem training Topic Analysis Models beginners
Arcomem training Topic Analysis Models beginnersarcomem
 
Arcomem training Twitter Domain Experts advanced
Arcomem training Twitter Domain Experts advancedArcomem training Twitter Domain Experts advanced
Arcomem training Twitter Domain Experts advancedarcomem
 
Arcomem training Cultural Analysis Advanced
Arcomem training Cultural Analysis AdvancedArcomem training Cultural Analysis Advanced
Arcomem training Cultural Analysis Advancedarcomem
 
Arcomem training Cultural Analysis Beginner
Arcomem training Cultural Analysis BeginnerArcomem training Cultural Analysis Beginner
Arcomem training Cultural Analysis Beginnerarcomem
 
Arcomem training twitter-dynamics_advanced
Arcomem training twitter-dynamics_advancedArcomem training twitter-dynamics_advanced
Arcomem training twitter-dynamics_advancedarcomem
 
Arcomem training system-overview_advanced
Arcomem training system-overview_advancedArcomem training system-overview_advanced
Arcomem training system-overview_advancedarcomem
 
Arcomem training specifying-crawls
Arcomem training specifying-crawlsArcomem training specifying-crawls
Arcomem training specifying-crawlsarcomem
 
Arcomem training simple-text-mining_beginner
Arcomem training simple-text-mining_beginnerArcomem training simple-text-mining_beginner
Arcomem training simple-text-mining_beginnerarcomem
 
Arcomem training neer_beginner
Arcomem training neer_beginnerArcomem training neer_beginner
Arcomem training neer_beginnerarcomem
 
Arcomem training neer_advanced
Arcomem training neer_advancedArcomem training neer_advanced
Arcomem training neer_advancedarcomem
 
Arcomem training heritrix_beginner
Arcomem training heritrix_beginnerArcomem training heritrix_beginner
Arcomem training heritrix_beginnerarcomem
 
Arcomem training heritrix_advanced
Arcomem training heritrix_advancedArcomem training heritrix_advanced
Arcomem training heritrix_advancedarcomem
 
Arcomem training enrichment_beginner
Arcomem training enrichment_beginnerArcomem training enrichment_beginner
Arcomem training enrichment_beginnerarcomem
 
Arcomem training enrichment_advanced
Arcomem training enrichment_advancedArcomem training enrichment_advanced
Arcomem training enrichment_advancedarcomem
 
Arcomem training diversification
Arcomem training diversificationArcomem training diversification
Arcomem training diversificationarcomem
 

Plus de arcomem (20)

Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)Arcomem training – Enrichment Advanced (update)
Arcomem training – Enrichment Advanced (update)
 
Arcomem training – Enrichment Beginner (update)
Arcomem training – Enrichment Beginner (update)Arcomem training – Enrichment Beginner (update)
Arcomem training – Enrichment Beginner (update)
 
Arcomem training Specifying Crawls Advanced
Arcomem training Specifying Crawls AdvancedArcomem training Specifying Crawls Advanced
Arcomem training Specifying Crawls Advanced
 
Arcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls BeginnersArcomem training Specifying Crawls Beginners
Arcomem training Specifying Crawls Beginners
 
Arcomem training Topic Analysis Models advanced
Arcomem training Topic Analysis Models advancedArcomem training Topic Analysis Models advanced
Arcomem training Topic Analysis Models advanced
 
Arcomem training Topic Analysis Models beginners
Arcomem training Topic Analysis Models beginnersArcomem training Topic Analysis Models beginners
Arcomem training Topic Analysis Models beginners
 
Arcomem training Twitter Domain Experts advanced
Arcomem training Twitter Domain Experts advancedArcomem training Twitter Domain Experts advanced
Arcomem training Twitter Domain Experts advanced
 
Arcomem training Cultural Analysis Advanced
Arcomem training Cultural Analysis AdvancedArcomem training Cultural Analysis Advanced
Arcomem training Cultural Analysis Advanced
 
Arcomem training Cultural Analysis Beginner
Arcomem training Cultural Analysis BeginnerArcomem training Cultural Analysis Beginner
Arcomem training Cultural Analysis Beginner
 
Arcomem training twitter-dynamics_advanced
Arcomem training twitter-dynamics_advancedArcomem training twitter-dynamics_advanced
Arcomem training twitter-dynamics_advanced
 
Arcomem training system-overview_advanced
Arcomem training system-overview_advancedArcomem training system-overview_advanced
Arcomem training system-overview_advanced
 
Arcomem training specifying-crawls
Arcomem training specifying-crawlsArcomem training specifying-crawls
Arcomem training specifying-crawls
 
Arcomem training simple-text-mining_beginner
Arcomem training simple-text-mining_beginnerArcomem training simple-text-mining_beginner
Arcomem training simple-text-mining_beginner
 
Arcomem training neer_beginner
Arcomem training neer_beginnerArcomem training neer_beginner
Arcomem training neer_beginner
 
Arcomem training neer_advanced
Arcomem training neer_advancedArcomem training neer_advanced
Arcomem training neer_advanced
 
Arcomem training heritrix_beginner
Arcomem training heritrix_beginnerArcomem training heritrix_beginner
Arcomem training heritrix_beginner
 
Arcomem training heritrix_advanced
Arcomem training heritrix_advancedArcomem training heritrix_advanced
Arcomem training heritrix_advanced
 
Arcomem training enrichment_beginner
Arcomem training enrichment_beginnerArcomem training enrichment_beginner
Arcomem training enrichment_beginner
 
Arcomem training enrichment_advanced
Arcomem training enrichment_advancedArcomem training enrichment_advanced
Arcomem training enrichment_advanced
 
Arcomem training diversification
Arcomem training diversificationArcomem training diversification
Arcomem training diversification
 

Dernier

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 

Dernier (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Arcomem training entities-and-events_advanced

  • 1. University of Sheffield, NLP Entity and Event Recognition in ARCOMEM University of Sheffield, UK © The University of Sheffield, 1995-2013 This work is licenced under the Creative Commons Attribution-NonCommercial-ShareAlike Licence
  • 2. University of Sheffield, NLP What is Entity and Event Recognition? ● Entity and Event recognition are two of the main components of an Information Extraction (IE) system. ● IE is about the automatic discovery of new, previously unknown information, by automatically extracting information such as entities and events from different textual resources. ● A key element is the linking together of this extracted information to form new facts or to allow new hypotheses to be explored further ● For example, entities and events can be used to help find opinions expressed in the text about people, objects, or things that have happened
  • 3. University of Sheffield, NLP Information Extraction with GATE ● In ARCOMEM, we use GATE for performing the information extraction and opinion mining tasks ● GATE is an architecture for language processing, in development at the University of Sheffield since 2000, and used by hundreds of thousands of researchers, scientists and organisations all over the world ● It includes components for language processing, e.g. parsers, machine learning tools, stemmers, IR tools, IE components for various languages... ● It also includes tools for visualising and manipulating text, annotations, ontologies, parse trees, etc., and tools for evaluation
  • 4. University of Sheffield, NLP ARCOMEM offline processing architecture GATE components
  • 5. University of Sheffield, NLP Entity Recognition ● The application for Entity Recognition is based on (modified versions of) ANNIE and TermRaider, which are part of GATE ● It consists of a set of Processing Resources (PRs) executed sequentially over the corpus of documents ● Document Pre-processing ● Linguistic Pre-processing ● Named Entity Recognition ● Term Extraction ● RDF generation
  • 6. University of Sheffield, NLP Linguistic Pre-Processing ● We use the following PRs: ● Tokeniser ● Sentence Splitter ● Language Identification ● POS tagger ● Morphological analyser ● For social media such as twitter, we use a specially adapted version of some of these
  • 7. University of Sheffield, NLP Named Entity Recognition ● The following entity types are recognised by the NER component, corresponding to the data model ● Persons (e.g. artists, politicians, web 2.0 users) ● Organizations (e.g. companies, music bands, political parties) ● Locations (e.g. cities, countries) ● Dates and times (of events and content publication) ● NER consists of: ● Gazetteer lookup ● JAPE grammars ● Co-reference
  • 8. University of Sheffield, NLP Document annotated with NEs
  • 9. University of Sheffield, NLP Document annotated with co-references items in the co-reference chain
  • 10. University of Sheffield, NLP German Entity Extraction • System for German NER is similar to the English one • It uses German-specific resources (TreeTagger for lemmatisation and POS tagging, tailored gazetteers and grammars) • We use a language identification module (TextCat) to determine which documents or sentences are in German
  • 11. University of Sheffield, NLP Entities in the Rock am Ring forum
  • 13. University of Sheffield, NLP Term Recognition in ARCOMEM ● Terms generally consist of important nouns or noun phrases that are not usually proper names (unlike most Named Entities) but reflect the content or topic of the text ● For example, important terms in a political text might be “government”, “minister”, “financial situation”, “Chinese economy” ● These are used in a similar way to Named Entities, for things like event recognition, topic detection and opinion mining ● In GATE, we use a tool called TermRaider for finding terms
  • 14. University of Sheffield, NLP TermRaider • GATE plugin for detecting single and multi-word terms • Based on a simple web service developed originally in the EU NeOn project • This version runs in GATE, with visualisation tools, and extended functionality such as new scoring systems • We have developed an adaptation for German. • Terms are ranked according to three possible scoring systems: – tf.idf = term frequency (nbr of times the term occurs in the corpus) divided by document frequency (nbr of documents in which the term occurs) – augmented tf.idf = after scoring tf.idf, the scores of hypernyms are boosted by the scores of hyponyms – Kyoto domain relevance = document frequency × (1 + nbr of hyponyms in the corpus), Bosma and Vossen 2010
  • 15. University of Sheffield, NLP TermRaider: Methodology • After linguistic pre-processing (tokenisation, lemmatisation, POS tagging etc.), nouns and noun phrases are identified as initial term candidates • Noun phrases include post-modifiers such as prepositional phrases, and are marked with head information for determining hyponymy. Nested nouns and noun phrases are all marked as candidates. • Term candidates are then scored in 3 ways. • The results can be viewed in the GATE GUI, exported as RDF according to the ARCOMEM data model, or saved as CSV files • The viewer can be used to adjust the cutoff parameter. This is used to determine the score threshold for a term to be considered valid • Terms can also be shown as a tag cloud
  • 16. University of Sheffield, NLP Term candidates in a document
  • 17. University of Sheffield, NLP Top terms from Greek Financial Crisis corpus
  • 18. University of Sheffield, NLP Terms can be exported as a tag cloud
  • 19. University of Sheffield, NLP Event Recognition
  • 20. University of Sheffield, NLP Expression of events ● Events can be expressed by: ● verbal predicates and their arguments (e.g. “The committee dismissed the proposal”); ● noun phrases headed by nominalizations (e.g. “economic growth”); ● adjective-noun combinations (e.g. “governmental measure”; “public money”); ● event-referring nouns (e.g. “crisis”, “cash injection”).
  • 21. University of Sheffield, NLP Rule-based event extraction in GATE ● Recognition of entities and the relations between them in order to find domain-specific events and situations. ● In a (semi-)closed domain, this approach is preferable to an open IE- based approach which holds no preconceptions about the kinds of entities and relations possible. ● Application for event recognition is designed to follow the entity recognition application, so no linguistic pre-processing is necessary ● Basic approach involves finding event-indicative seed words (e.g. “downturn” might indicate an economic event) and some linguistic relaation to existing entities (e.g. “Greece”) in the sentence.
  • 22. University of Sheffield, NLP Event extraction application ● The application in GATE consists of the following Processing Resources: ● event gazetteer ● verb phrase chunker ● event recognition JAPE grammars ● RDF generation ● As with the NE recognition module, a German version is also available, and conditional processors decide, based on the language identified for each sentence, which version to run.
  • 23. University of Sheffield, NLP Events annotated in GATE
  • 24. University of Sheffield, NLP Machine Learning Applications ● We have also developed machine learning applications for performing both entity and event extraction in GATE ● These make use of the language detection and linguistic pre- processing components, and then replace the JAPE grammars with a learning algorithm based on annotated training data ● These provide an alternative mechanism for achieving the same result, but require individual training for each different domain of text we want to apply them to.
  • 25. University of Sheffield, NLP More information ● More details on entity, event and opinion mining from a technical point of view are available from http://gate.ac.uk/projects/arcomem/training.html ● Comprehensive training material about using GATE for text mining is available at http://gate.ac.uk/family/training.html