In the era of Big Data and Deep Learning, there is a common view that machine learning approaches are the only way to cope with robust and scalable information extraction and summarization. It has recently been proposed that the CNL approach could be scaled up, building on the concept of embedded CNL and thus allowing for CNL-based information extraction from, e.g., normative or medical texts, which are rather controlled by nature but still overstep the boundaries of CNL. Although it is arguable whether CNL can be exploited to approach robust, wide-coverage semantic parsing for use cases like media monitoring, its potential becomes much more obvious in the opposite direction: the generation of story highlights from summarized AMR graphs, which is the focus of this position paper.
1. The Role of CNL and AMR in Scalable Abstractive Summarization for Multilingual Media Monitoring
Normunds Grūzītis and Guntis Bārzdiņš
University of Latvia, IMCS
National information agency LETA
5th Workshop on Controlled Natural Language, 25–26 July 2016, Aberdeen, Scotland
2. Large-scale media monitoring
BBC Monitoring journalists translate from 30 languages into English and follow 400 social media accounts every day.
A monitoring journalist typically monitors 4 TV channels and several online sources simultaneously, which is about the maximum that any person can cope with mentally and physically. The required human effort thus scales linearly with the number of monitored sources.
Monitoring journalists constantly need to be on the lookout for more sources and to follow important stories, but as it is, they are tied down with mundane, routine monitoring tasks.
Monitoring 250 video channels results in a daily buffer of 2.5 TB, a weekly buffer of 19 TB, and an annual buffer of 1 PB.
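For scale (my arithmetic, assuming the figures above): 2.5 TB per day across 250 channels is 10 GB per channel per day, i.e. roughly a 1 Mbit/s stream, and about 0.9 PB over a year.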
3. SUMMA – Scalable Understanding of Multilingual MediA
Identify people, places, events of interest
Discover trends, emerging events, crucial new stories
H2020 grant No. 688139
6. Abstractive summarization
• Extractive summarization selects representative sentences from the input documents
• Abstractive summarization builds a semantic representation from which a summary is generated
• What semantic representation?
Sentence A: I saw Joe’s dog, which was running in the garden.
Sentence B: The dog was chasing a cat.
Summary: Joe’s dog was chasing a cat in the garden.
Liu F., Flanigan J., Thomson S., Sadeh N., Smith N.A. Toward Abstractive Summarization Using Semantic Representations. NAACL 2015
7. AMR – Abstract Meaning Representation
• A semantic representation aimed at large-scale human annotation
• A practical, replicable amount of abstraction
• Captures many aspects of meaning in a single simple data structure
• Aims to abstract away from (English) syntax
• Rooted, labeled graphs
• Makes heavy use of PropBank framesets
• An actual sembank of nearly 50K sentences
• Sentences paired with their whole-sentence, logical meanings
8. AMR – Abstract Meaning Representation
• A form of AMR has been around for a long time (Langkilde and Knight, 1998)
• It has changed a lot since then: PropBank, DBpedia, etc.
• Banarescu et al. (2013) – the fundamentals of the current AMR annotation scheme
• Uses the PENMAN notation (Bateman, 1990)
• A way of representing a directed labeled graph in a simple tree-like form
• Easy to read and write (for a human), and to traverse (for a program)
• From semantic role labelling (SRL) to whole-sentence representation
9. AMR – Abstract Meaning Representation
• Nodes are variables labelled by concepts
• Entities, events, states, properties
• d / dog: d is an instance of dog
• Edges are semantic relations
• E.g. “The dog is eating bones.”
(e / eat-01
:ARG0 (d / dog)
:ARG1 (b / bone))
eat.01: consume (VN-class: eat-39.1, FN-frame: Ingestion)
ARG0-PAG: consumer, eater (VN-role: agent)
ARG1-PPT: meal (VN-role: patient)
[Figure: the corresponding AMR graph, with nodes e / eat-01, d / dog and b / bone]
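Because the PENMAN notation is easy for a program to traverse, a few lines of plain Python suffice to turn the example above into (source, role, target) triples. This is only a toy sketch covering the simple cases shown on these slides; a real pipeline would use an established AMR/PENMAN parsing library.

import re

TOKEN = re.compile(r'\(|\)|/|:[A-Za-z0-9-]+|"[^"]*"|[^\s()/]+')

def parse(amr):
    # Return the root variable and a list of (source, role, target) triples.
    tokens = TOKEN.findall(amr)
    pos = 0
    def node():
        nonlocal pos
        pos += 1                                  # consume "("
        var = tokens[pos]; pos += 1
        pos += 1                                  # consume "/"
        concept = tokens[pos]; pos += 1
        triples = [(var, 'instance', concept)]
        while tokens[pos] != ')':
            role = tokens[pos].lstrip(':'); pos += 1
            if tokens[pos] == '(':                # nested node
                child, sub = node()
                triples.append((var, role, child))
                triples.extend(sub)
            else:                                 # constant or re-used variable
                triples.append((var, role, tokens[pos])); pos += 1
        pos += 1                                  # consume ")"
        return var, triples
    return node()

root, triples = parse('(e / eat-01 :ARG0 (d / dog) :ARG1 (b / bone))')
# root == 'e'; triples ==
# [('e', 'instance', 'eat-01'), ('e', 'ARG0', 'd'), ('d', 'instance', 'dog'),
#  ('e', 'ARG1', 'b'), ('b', 'instance', 'bone')]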
10. AMR – Abstract Meaning Representation
“Bob ate four cakes that he bought.”
(x2 / eat-01
:ARG0 (x1 / person
:name (n / name
:op1 "Bob")
:wiki "Bob_X")
:ARG1 (x4 / cake
:quant 4
:ARG1-of (x7 / buy-01
:ARG0 x1)))
[Figure: the corresponding AMR graph; the node x1 / person ("Bob_X") is shared as :ARG0 by both eat-01 and buy-01]
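Note the re-entrancy in this example: the variable x1 is re-used as the :ARG0 of buy-01. In the toy parser sketched above, a re-used variable is simply a plain token, so the shared node falls out of the triples directly:

root, triples = parse('''(x2 / eat-01
    :ARG0 (x1 / person :name (n / name :op1 "Bob") :wiki "Bob_X")
    :ARG1 (x4 / cake :quant 4
        :ARG1-of (x7 / buy-01 :ARG0 x1)))''')
assert ('x7', 'ARG0', 'x1') in triples    # Bob is both the eater and the buyer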
11. AMR – Abstract Meaning Representation
Schneider N., Flanigan J., O’Gorman T. AMR Tutorial at NAACL 2015
https://github.com/nschneid/amr-tutorial/
• AMR is still biased towards English or other source languages
• Not an interlingua, but close: comparison of English AMRs to Chinese and Czech (Xue N., Bojar O., Hajič J., Palmer M., Uresova Z., Zhang X. LREC 2014)
• Meanwhile, AMR is agnostic about how to derive meanings from strings, and vice versa
12. Natural Language Understanding
• While it has recently been shown that the CNL approach can be scaled up…
• Embedded CNLs allowing for CNL-based domain-specific information extraction
• CNL as an efficient and user-friendly interface for Big Data end-point querying
• CNL for bootstrapping robust NL interfaces
• High-level CNL for legal sources
• …use cases like media monitoring are not limited to a particular domain; the input sources vary from newswire texts to TV and radio transcripts to user-generated content in social networks
• In the era of Big Data, there is a dominant view that Deep Learning is the only way to cope with robust and scalable NLU
• NLU cannot be approached by CNLs, or by grammars in general (?)
13. SemEval 2016 Task 8 on AMR parsing (Smatch F1 scores)
1. Riga (University of Latvia / LETA): 0.6196
2. CAMR (Brandeis University / Boulder Learning Inc. / Rensselaer Polytechnic Institute): 0.6195
3. ICL-HD (Ruprecht-Karls-Universität Heidelberg): 0.6005
4. UCL+Sheffield (University College London / University of Sheffield): 0.5983
5. M2L (Kyoto University): 0.5952
6. CMU (Carnegie Mellon University / University of Washington): 0.5636
7. CU-NLP (OK Robot Go Ltd. / University of Colorado): 0.5566
8. UofR (University of Rochester): 0.4985
9. MeaningFactory (University of Groningen): 0.4702*
10. CLIP@UMD (University of Maryland): 0.4370
11. DynamicPower (National Institute for Japanese Language and Linguistics): 0.3706*
* Did not use AMR training data
14. NLG from AMR
• The potential of grammar-based and CNL approaches becomes obvious in the opposite direction
• e.g. in the generation of story highlights from summarized (pruned) AMR graphs
• Text generation from AMR is still recognized as a future task
• An unexplored niche for grammars and CNLs
• GF, for instance, as an excellent framework for implementing multilingual AMR verbalizers
• Issue: AMR to AST mapping
15. Pourdamghani N., Gao Y., Hermjakob U., Knight K. Aligning English Strings with Abstract Meaning Representation Graphs. EMNLP 2014
Butler A. Deterministic natural language generation from meaning representations for machine translation. NAACL 2016 Workshop on Semantics-Driven Machine Translation
Pourdamghani N., Knight K., Hermjakob U. Generating English from Abstract Meaning Representations. INLG 2016 (to appear)
Flanigan J., Dyer C., Smith N.A., Carbonell J. Generation from Abstract Meaning Representation using Tree Transducers. NAACL 2016
16.
17. NLG from AMR
• Butler A. 2016. Deterministic natural language generation from meaning representations for machine translation. NAACL Workshop on Semantics-Driven Machine Translation
• Converts PENMAN-style representations to Penn-style trees
• Uses the Tregex and Tsurgeon utilities, which are part of the Stanford NLP library
• Covers a wide range of constructions
• A simple example: “Girls see a boy.”
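For orientation, a Penn Treebank-style bracketing for this example (an illustrative rendering, not Butler’s actual output) would look like:

(S (NP (NNS Girls))
   (VP (VBP see)
       (NP (DT a) (NN boy)))
   (. .))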
18. AMR to GF conversion: first experiment
“Girls see a boy.”
(x2 (see-01 (:ARG0 (x1 girl)) (:ARG1 (x4 boy))))
mkCl : NP ⟶ VP ⟶ Cl
mkVP : V2 ⟶ NP ⟶ VP
mkNP : Quant ⟶ Num ⟶ CN ⟶ NP
mkCN : N ⟶ CN
(mkCl
(mkNP a_Quant singularNum (mkCN girl_N))
(mkVP
see_V2
(mkNP a_Quant singularNum (mkCN boy_N))))
adjoin (Cl (VP @)) with PB-frame
move ARG0 under Cl
move ARG1 under VP
adjoin (NP a_Quant singularNum (CN @)) with ARG0/1
excise var
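Taken together, the adjoin/move/excise rules above implement a mapping that, for this simple transitive pattern, can be sketched in a few lines of Python on top of the toy PENMAN parser from slide 9 (an illustration, not the authors’ implementation; the input is the standard PENMAN form of the AMR above, and the default a_Quant/singularNum is kept, since number is listed as an open issue on slide 21):

def to_gf(triples, root):
    # Map a simple transitive AMR (verb-01 with :ARG0/:ARG1) onto a GF-style
    # abstract syntax term, mirroring the rules on this slide: every NP gets
    # the default a_Quant/singularNum, and concepts become RGL-style lexicon
    # identifiers (girl -> girl_N, see-01 -> see_V2).
    inst = {s: t for s, r, t in triples if r == 'instance'}
    args = {r: t for s, r, t in triples if s == root and r != 'instance'}
    np = lambda v: f"(mkNP a_Quant singularNum (mkCN {inst[v]}_N))"
    verb = inst[root].split('-')[0]               # see-01 -> see
    return (f"(mkCl {np(args['ARG0'])} "
            f"(mkVP {verb}_V2 {np(args['ARG1'])}))")

root, triples = parse('(x2 / see-01 :ARG0 (x1 / girl) :ARG1 (x4 / boy))')
print(to_gf(triples, root))
# (mkCl (mkNP a_Quant singularNum (mkCN girl_N))
#       (mkVP see_V2 (mkNP a_Quant singularNum (mkCN boy_N))))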
19. AMR to GF conversion: first experiment
“The boy sees the two pretty girls.”
(x3 (see-01 (:ARG0 (x2 boy)) (:ARG1 (x7 (girl (:quant 2) (:mod (x6 pretty)))))))
mkCN : A ⟶ N ⟶ CN
mkNum : Digits ⟶ Num
mkDigits : Str ⟶ Digits
(mkCl
(mkNP a_Quant singularNum (mkCN boy_N))
(mkVP
see_V2
(mkNP a_Quant (mkNum (mkDigits "2")) (mkCN pretty_A girl_N))))
move mod under CN
replace Num with quant
adjoin (Num (Digits @)) with quant
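With the same toy parser (again fed the standard PENMAN form of this AMR), the two modifiers surface as extra triples that the rules above route into the numeral and adjective slots:

root, triples = parse('(x3 / see-01 :ARG0 (x2 / boy) '
                      ':ARG1 (x7 / girl :quant 2 :mod (x6 / pretty)))')
# ('x7', 'quant', '2')  -> replace Num with quant: mkNum (mkDigits "2")
# ('x7', 'mod', 'x6') + ('x6', 'instance', 'pretty')
#                       -> move mod under CN: mkCN pretty_A girl_N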
20. Story headlines: Templates? Application grammar? CNL?
Multilingual Headlines Generator (a GF toy example by José P. Moreno): http://grammaticalframework.org/demos/multilingual_headlines.html
21. Conclusion
• There is potential for cooperating with the DL folks in both NLU and NLG
• Especially in NLG, which is recognized as one of the next problems for DL to “solve”
• Especially in domain-specific use cases that can be approached by CNL
• AMR-to-text issues to be addressed: number, time, co-references, articles, concepts and WSD (for multilingual NLG), named entities, reification; the management of transformation rules