The Role of CNL and AMR
in Scalable Abstractive Summarization
for Multilingual Media Monitoring
Normunds Grūzītis and Guntis Bārzdiņš
University of Latvia, IMCS
National information agency LETA
5th Workshop on Controlled Natural Language, 25–26 July 2016, Aberdeen, Scotland
Large-scale media monitoring
BBC monitoring journalists translate from 30 languages into English and follow 400 social media accounts every day.
A monitoring journalist typically monitors 4 TV channels and several online sources simultaneously. This is about the
maximum that any person can cope with mentally and physically. The required human effort thus scales linearly with
the number of monitored sources.
Monitoring journalists constantly need to be on the lookout for more sources and follow important stories—but as it
is, they are tied down with mundane, routine monitoring tasks.
Monitoring 250 video channels results in a daily buffer of 2.5 TB, a weekly buffer of 19 TB, and an annual buffer of 1 PB.
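A rough back-of-the-envelope check of these figures, as a sketch only. It assumes decimal units (1 TB = 10^12 bytes), which the slide does not specify, and takes the daily figure as the starting point. Simple multiplication gives about 17.5 TB per week and about 0.91 PB per year, so the slide's weekly and annual numbers appear to be rounded estimates rather than exact multiples.

```python
# Assumption: decimal terabytes; the slide does not say TB vs. TiB.
TB = 10**12

channels = 250
daily_total_bytes = 2.5 * TB

# Average volume and bitrate per monitored channel.
per_channel_per_day = daily_total_bytes / channels
bitrate_bps = per_channel_per_day * 8 / (24 * 60 * 60)

# Naive extrapolation from the daily figure.
weekly_tb = daily_total_bytes * 7 / TB
annual_pb = daily_total_bytes * 365 / (1000 * TB)

print(f"per channel per day: {per_channel_per_day / 10**9:.1f} GB")
print(f"average bitrate per channel: {bitrate_bps / 10**6:.2f} Mbit/s")
print(f"7 days: {weekly_tb:.1f} TB, 365 days: {annual_pb:.2f} PB")
```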
SUMMA – Scalable Understanding of Multilingual MediA
Identify people, places, events of interest
Discover trends, emerging events, crucial new stories
H2020 grant No. 688139
Timeline
Storyline
Event-based multi-document summarization: storyline highlights across a set of related stories
[Figure annotations: unrestricted; sort of CNL? (templates)]
Abstractive summarization
• Extractive summarization selects representative sentences from the input documents
• Abstractive summarization builds a semantic representation from which a summary is generated
• What semantic representation?
Sentence A: I saw Joe’s dog, which was running in the garden.
Sentence B: The dog was chasing a cat.
Summary: Joe’s dog was chasing a cat in the garden.
Liu F., Flanigan J., Thomson S., Sadeh N., Smith N.A. Toward Abstractive Summarization Using Semantic Representations. NAACL 2015
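The first step of graph-based abstractive summarization, merging the sentence graphs, can be sketched as follows. This is an illustration under stated assumptions and not the method of Liu et al.: the triples are hand-written simplifications of the two example sentences, the sense numbers are illustrative, and nodes are identified purely by their concept label. Selecting the summary subgraph and generating text from it are not shown.

```python
# Hand-written, simplified concept-level triples (source, relation, target).
# Assumption: these are illustrative, not gold AMR annotations.
sentence_a = [
    ("see-01", "ARG0", "i"),
    ("see-01", "ARG1", "dog"),
    ("dog", "poss", "Joe"),
    ("run-02", "ARG0", "dog"),
    ("run-02", "location", "garden"),
]
sentence_b = [
    ("chase-01", "ARG0", "dog"),
    ("chase-01", "ARG1", "cat"),
]


def merge_graphs(*graphs):
    """Union of triples; nodes with the same concept label collapse into one."""
    merged = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def neighbours(graph, concept):
    """All triples that touch the given concept node, in sorted order."""
    return sorted(t for t in graph if concept in (t[0], t[2]))


merged = merge_graphs(sentence_a, sentence_b)
for triple in neighbours(merged, "dog"):
    print(triple)
```

After merging, the single "dog" node links the owner from Sentence A to the chasing event from Sentence B, which is what lets a summary combine facts from both sentences.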
AMR – Abstract Meaning Representation
• A semantic representation aimed at large-scale human annotation
• A practical, replicable amount of abstraction
• Captures many aspects of meaning in a single simple data structure
• Aims to abstract away from (English) syntax
• Rooted, labeled graphs
• Makes heavy use of PropBank framesets
• An actual sembank of nearly 50K sentences
• Sentences paired with their whole-sentence, logical meanings
AMR – Abstract Meaning Representation
• A form of AMR has been around for a long time (Langkilde and Knight, 1998)
• It has changed a lot since then: PropBank, DBpedia, etc.
• Banarescu et al. (2013) – the fundamentals of the current AMR annotation scheme
• Uses the PENMAN notation (Bateman, 1990)
• A way of representing a directed labeled graph in a simple tree-like form
• Easy to read and write (for a human), and to traverse (for a program)
• From semantic role labelling (SRL) to whole-sentence representation
AMR – Abstract Meaning Representation
• Nodes are variables labelled by concepts
• Entities, events, states, properties
• d / dog: d is an instance of dog
• Edges are semantic relations
• E.g. “The dog is eating bones.”
(e / eat-01
   :ARG0 (d / dog)
   :ARG1 (b / bone))
eat.01: consume (VN-class: eat-39.1, FN-frame: Ingestion)
ARG0-PAG: consumer, eater (VN-role: agent)
ARG1-PPT: meal (VN-role: patient)
[Graph: e / eat-01, with an :ARG0 edge to d / dog and an :ARG1 edge to b / bone]
AMR – Abstract Meaning Representation
“Bob ate four cakes that he bought.”
(x2 / eat-01
   :ARG0 (x1 / person
      :name (n / name
         :op1 "Bob")
      :wiki "Bob_X")
   :ARG1 (x4 / cake
      :quant 4
      :ARG1-of (x7 / buy-01
         :ARG0 x1)))
[Graph: eat-01 with :ARG0 x1 / person (name "Bob", wiki "Bob_X") and :ARG1 x4 / cake (quant 4); x7 / buy-01 takes x4 as its ARG1 and re-uses x1 as its ARG0]
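To illustrate how a program can traverse PENMAN notation, here is a minimal recursive-descent reader, written as a sketch rather than a complete parser. The function name `parse_penman` and the tokenizer are assumptions made for this example; for real work a dedicated library such as the `penman` package would be the more robust choice. The sketch reads the "Bob" example and reports re-entrant variables, i.e. variables that are the target of more than one relation.

```python
import re
from collections import Counter

# Tokens: parentheses, slash, quoted strings, :roles, and bare atoms.
TOKEN = re.compile(r'\(|\)|/|"[^"]*"|:[^\s()]+|[^\s()/:"]+')


def parse_penman(text):
    """Parse a PENMAN string into (root variable, instances, relations)."""
    tokens = TOKEN.findall(text)
    pos = 0
    instances = {}   # variable -> concept
    relations = []   # (source variable, role, target variable or constant)

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        var = tokens[pos + 1]
        assert tokens[pos + 2] == "/", "expected '/'"
        instances[var] = tokens[pos + 3]
        pos += 4
        while tokens[pos] != ")":
            role = tokens[pos]
            pos += 1
            if tokens[pos] == "(":
                target = node()        # nested node: recurse
            else:
                target = tokens[pos]   # constant or re-used variable
                pos += 1
            relations.append((var, role, target))
        pos += 1  # consume ")"
        return var

    root = node()
    return root, instances, relations


amr = """
(x2 / eat-01
   :ARG0 (x1 / person
      :name (n / name :op1 "Bob")
      :wiki "Bob_X")
   :ARG1 (x4 / cake
      :quant 4
      :ARG1-of (x7 / buy-01
         :ARG0 x1)))
"""

root, instances, relations = parse_penman(amr)
incoming = Counter(t for _, _, t in relations if t in instances)
reentrant = [var for var, count in incoming.items() if count > 1]
print(root, instances[root])
print("re-entrant variables:", reentrant)
```

The re-use of x1 under buy-01 is what makes the structure a graph rather than a tree: Bob is both the eater and the buyer.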
AMR – Abstract Meaning Representation
Schneider N., Flanigan J., O’Gorman T. AMR Tutorial at NAACL 2015
https://github.com/nschneid/amr-tutorial/
• AMR is still biased towards English or other source languages
• Not an Interlingua, but close: Comparison of English AMRs to Chinese and Czech. Xue N., Bojar O., Hajič J., Palmer M., Uresova Z., Zhang X. LREC 2014
• Meanwhile, AMR is agnostic about how to derive meanings from strings, and vice versa
Natural Language Understanding
• While it has recently been shown that the CNL approach can be scaled up...
• Embedded CNLs allowing for CNL-based domain-specific information extraction
• CNL as an efficient and user-friendly interface for Big Data end-point querying
• CNL for bootstrapping robust NL interfaces
• High-level CNL for legal sources
• ...use cases like media monitoring are not limited to a particular domain: the input sources vary from newswire texts to TV and radio transcripts to user-generated content in social networks
• In the era of Big Data, the dominant view is that Deep Learning is the only way to achieve robust and scalable NLU
• NLU cannot be approached by CNLs, or by grammars in general (?)
SemEval 2016 Task 8 on AMR parsing
1. Riga (University of Latvia / LETA): 0.6196
2. CAMR (Brandeis University / Boulder Learning Inc. / Rensselaer Polytechnic Institute): 0.6195
3. ICL-HD (Ruprecht-Karls-Universität Heidelberg): 0.6005
4. UCL+Sheffield (University College London / University of Sheffield): 0.5983
5. M2L (Kyoto University): 0.5952
6. CMU (Carnegie Mellon University / University of Washington): 0.5636
7. CU-NLP (OK Robot Go Ltd. / University of Colorado): 0.5566
8. UofR (University of Rochester): 0.4985
9. MeaningFactory (University of Groningen): 0.4702*
10. CLIP@UMD (University of Maryland): 0.4370
11. DynamicPower (National Institute for Japanese Language and Linguistics): 0.3706*
* Did not use AMR training data
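The slide does not name the metric behind these scores; AMR parsers are commonly compared with Smatch, an F-score over matching graph triples, and the sketch below assumes something of that kind. It is deliberately simplified: real Smatch searches for the best mapping between the variables of the two graphs, whereas this example assumes both graphs already use the same variable names. The function name `triple_f1` and the sample "predicted" graph are invented for illustration.

```python
def triple_f1(gold, predicted):
    """F1 over exactly matching triples (assumes aligned variable names)."""
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Gold graph for "The dog is eating bones."
gold = {
    ("instance", "e", "eat-01"),
    ("instance", "d", "dog"),
    ("instance", "b", "bone"),
    ("ARG0", "e", "d"),
    ("ARG1", "e", "b"),
}
# Hypothetical parser output with one wrong concept.
predicted = {
    ("instance", "e", "eat-01"),
    ("instance", "d", "dog"),
    ("instance", "b", "bread"),
    ("ARG0", "e", "d"),
    ("ARG1", "e", "b"),
}

score = triple_f1(gold, predicted)
print(round(score, 2))
```

Here four of five triples match in each direction, so precision and recall are both 0.8.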
NLG from AMR
• The potential of grammar-based and CNL approaches becomes obvious in the opposite direction
• e.g. in the generation of story highlights from summarized (pruned) AMR graphs
• Text generation from AMR is still recognized as a future task
• An unexplored niche for grammars and CNLs
• GF, for instance, as an excellent framework for implementing multilingual AMR verbalizers
• Issue: AMR to AST mapping
Pourdamghani N., Gao Y., Hermjakob U., Knight K. Aligning English Strings with Abstract Meaning Representation Graphs. EMNLP 2014
Butler A. Deterministic natural language generation from meaning representations for machine translation. NAACL 2016 Workshop on Semantics-Driven Machine Translation
Pourdamghani N., Knight K., Hermjakob U. Generating English from Abstract Meaning Representations. INLG 2016 (to appear)
Flanigan J., Dyer C., Smith N.A., Carbonell J. Generation from Abstract Meaning Representation using Tree Transducers. NAACL 2016
NLG from AMR
• Butler A. 2016. Deterministic natural language generation from meaning representations for
machine translation. NAACL Workshop on Semantics-Driven Machine Translation
• Converts PENMAN-style representations to Penn-style trees
• Uses Tregex and Tsurgeon utilities which are a part of the Stanford NLP library
• Covers a wide range of constructions
• A simple example: “Girls see a boy.”
AMR to GF conversion: first experiment
“Girls see a boy.”
(x2 (see-01 (:ARG0 (x1 girl)) (:ARG1 (x4 boy))))
mkCl : NP ⟶ VP ⟶ Cl
mkVP : V2 ⟶ NP ⟶ VP
mkNP : Quant ⟶ Num ⟶ CN ⟶ NP
mkCN : N ⟶ CN
(mkCl
   (mkNP a_Quant singularNum (mkCN girl_N))
   (mkVP
      see_V2
      (mkNP a_Quant singularNum (mkCN boy_N))))
adjoin (Cl (VP @)) with PB-frame
move ARG0 under Cl
move ARG1 under VP
adjoin (NP a_Quant singularNum (CN @)) with ARG0/1
excise var
AMR to GF conversion: first experiment
“The boy sees the two pretty girls.”
(x3 (see-01 (:ARG0 (x2 boy)) (:ARG1 (x7 (girl (:quant 2) (:mod (x6 pretty)))))))
mkCN : A ⟶ N ⟶ CN
mkNum : Digits ⟶ Num
mkDigits : Str ⟶ Digits
(mkCl
   (mkNP a_Quant singularNum (mkCN boy_N))
   (mkVP
      see_V2
      (mkNP a_Quant (mkNum (mkDigits "2")) (mkCN pretty_A girl_N))))
move mod under CN
replace Num with quant
adjoin (Num (Digits @)) with quant
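The effect of the transformation rules in the two experiments can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it works on hand-built nested dictionaries instead of Penn-style trees, it does not use Tregex or Tsurgeon, and the lexicon naming (strip the sense number, append `_V2`, `_N` or `_A`) is a guess at a convention rather than something the slides specify. Like the trees on the slides, it always emits `a_Quant` and defaults to `singularNum`, because AMR marks neither articles nor grammatical number.

```python
def noun_phrase(node):
    """Render an AMR argument node as a GF mkNP expression string."""
    noun = f"{node['concept']}_N"
    if "mod" in node:
        # :mod becomes an adjective inside mkCN (mkCN : A -> N -> CN).
        adjective = f"{node['mod']['concept']}_A"
        common_noun = f"(mkCN {adjective} {noun})"
    else:
        common_noun = f"(mkCN {noun})"
    if "quant" in node:
        # :quant replaces the default Num with mkNum (mkDigits "...").
        digits = node["quant"]
        number = f'(mkNum (mkDigits "{digits}"))'
    else:
        number = "singularNum"
    return f"(mkNP a_Quant {number} {common_noun})"


def clause(amr):
    """Render a two-argument PropBank frame as a GF mkCl expression string."""
    verb = amr["concept"].rsplit("-", 1)[0] + "_V2"  # "see-01" -> "see_V2"
    subject = noun_phrase(amr["ARG0"])
    obj = noun_phrase(amr["ARG1"])
    return f"(mkCl {subject} (mkVP {verb} {obj}))"


first = {
    "concept": "see-01",
    "ARG0": {"concept": "girl"},
    "ARG1": {"concept": "boy"},
}
second = {
    "concept": "see-01",
    "ARG0": {"concept": "boy"},
    "ARG1": {"concept": "girl", "quant": 2, "mod": {"concept": "pretty"}},
}

print(clause(first))
print(clause(second))
```

Apart from whitespace, the two printed expressions correspond to the GF trees shown for the two example sentences.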
Story headlines: Templates? Application grammar? CNL?
Multilingual Headlines Generator (a GF toy example by José P. Moreno)
http://grammaticalframework.org/demos/multilingual_headlines.html
Conclusion
• There is potential for cooperating with the DL folks in both NLU and NLG
• Especially in NLG, which is recognized as one of the next problems for DL to “solve”
• Especially in domain-specific use cases that can be approached by CNL
• AMR-to-text issues to be addressed: number, time, co-references, articles, concepts and WSD (for multilingual NLG), named entities, reification; the management of transformation rules
