In the era of Big Data and Deep Learning, there is a common view that machine learning approaches are the only way to cope with robust and scalable information extraction and summarization. It has recently been proposed that the CNL approach could be scaled up, building on the concept of embedded CNL and thus allowing for CNL-based information extraction from, e.g., normative or medical texts, which are rather controlled by nature but still overstep the boundaries of CNL. Although it is arguable whether CNL can be exploited to approach robust, wide-coverage semantic parsing for use cases like media monitoring, its potential becomes much more obvious in the opposite direction: the generation of story highlights from summarized AMR graphs, which is the focus of this position paper.
1. The Role of CNL and AMR in Scalable Abstractive Summarization for Multilingual Media Monitoring
Normunds Grūzītis and Guntis Bārzdiņš
University of Latvia, IMCS
National information agency LETA
5th Workshop on Controlled Natural Language, 25–26 July 2016, Aberdeen, Scotland
2. Large-scale media monitoring
BBC Monitoring journalists translate from 30 languages into English and follow 400 social media accounts every day.
A monitoring journalist typically monitors 4 TV channels and several online sources simultaneously, which is about the maximum that any person can cope with mentally and physically. The required human effort thus scales linearly with the number of monitored sources.
Monitoring journalists constantly need to be on the lookout for more sources and to follow important stories, but as it is, they are tied down with mundane, routine monitoring tasks.
Monitoring 250 video channels results in a daily buffer of 2.5 TB, a weekly buffer of 19 TB, and an annual buffer of 1 PB.
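For scale (my arithmetic, assuming the figures above): 2.5 TB per day across 250 channels is 10 GB per channel per day, i.e. roughly a 1 Mbit/s stream, and about 0.9 PB over a year.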
3. SUMMA – Scalable Understanding of Multilingual MediA
Identify people, places, events of interest
Discover trends, emerging events, crucial new stories
H2020 grant No. 688139
6. Abstractive summarization
• Extractive summarization selects representative sentences from the input documents
• Abstractive summarization builds a semantic representation from which a summary is generated
• What semantic representation?
Sentence A: I saw Joe’s dog, which was running in the garden.
Sentence B: The dog was chasing a cat.
Summary: Joe’s dog was chasing a cat in the garden.
Liu F., Flanigan J., Thomson S., Sadeh N., Smith N.A. Toward Abstractive Summarization Using Semantic Representations. NAACL 2015
7. AMR – Abstract Meaning Representation
• A semantic representation aimed at large-scale human annotation
• A practical, replicable amount of abstraction
• Captures many aspects of meaning in a single simple data structure
• Aims to abstract away from (English) syntax
• Rooted, labeled graphs
• Makes heavy use of PropBank framesets
• An actual sembank of nearly 50K sentences
• Sentences paired with their whole-sentence, logical meanings
8. AMR – Abstract Meaning Representation
• A form of AMR has been around for a long time (Langkilde and Knight, 1998)
• It has changed a lot since then: PropBank, DBpedia, etc.
• Banarescu et al. (2013) – the fundamentals of the current AMR annotation scheme
• Uses the PENMAN notation (Bateman, 1990)
• A way of representing a directed labeled graph in a simple tree-like form
• Easy to read and write (for a human), and to traverse (for a program)
• From semantic role labelling (SRL) to whole-sentence representation
9. AMR – Abstract Meaning Representation
• Nodes are variables labelled by concepts
• Entities, events, states, properties
• d / dog: d is an instance of dog
• Edges are semantic relations
• E.g. “The dog is eating bones.”
(e / eat-01
:ARG0 (d / dog)
:ARG1 (b / bone))
eat.01: consume (VN-class: eat-39.1, FN-frame: Ingestion)
ARG0-PAG: consumer, eater (VN-role: agent)
ARG1-PPT: meal (VN-role: patient)
[Figure: the corresponding AMR graph, with nodes e / eat-01, d / dog and b / bone]
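Because the PENMAN notation is easy for a program to traverse, a few lines of plain Python suffice to turn the example above into (source, role, target) triples. This is only a toy sketch covering the simple cases shown on these slides; a real pipeline would use an established AMR/PENMAN parsing library.

import re

TOKEN = re.compile(r'\(|\)|/|:[A-Za-z0-9-]+|"[^"]*"|[^\s()/]+')

def parse(amr):
    # Return the root variable and a list of (source, role, target) triples.
    tokens = TOKEN.findall(amr)
    pos = 0
    def node():
        nonlocal pos
        pos += 1                                  # consume "("
        var = tokens[pos]; pos += 1
        pos += 1                                  # consume "/"
        concept = tokens[pos]; pos += 1
        triples = [(var, 'instance', concept)]
        while tokens[pos] != ')':
            role = tokens[pos].lstrip(':'); pos += 1
            if tokens[pos] == '(':                # nested node
                child, sub = node()
                triples.append((var, role, child))
                triples.extend(sub)
            else:                                 # constant or re-used variable
                triples.append((var, role, tokens[pos])); pos += 1
        pos += 1                                  # consume ")"
        return var, triples
    return node()

root, triples = parse('(e / eat-01 :ARG0 (d / dog) :ARG1 (b / bone))')
# root == 'e'; triples ==
# [('e', 'instance', 'eat-01'), ('e', 'ARG0', 'd'), ('d', 'instance', 'dog'),
#  ('e', 'ARG1', 'b'), ('b', 'instance', 'bone')]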
10. AMR – Abstract Meaning Representation
“Bob ate four cakes that he bought.”
(x2 / eat-01
:ARG0 (x1 / person
:name (n / name
:op1 "Bob")
:wiki "Bob_X")
:ARG1 (x4 / cake
:quant 4
:ARG1-of (x7 / buy-01
:ARG0 x1)))
[Figure: the corresponding AMR graph; the node x1 / person ("Bob_X") is shared as :ARG0 by both eat-01 and buy-01]
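Note the re-entrancy in this example: the variable x1 is re-used as the :ARG0 of buy-01. In the toy parser sketched above, a re-used variable is simply a plain token, so the shared node falls out of the triples directly:

root, triples = parse('''(x2 / eat-01
    :ARG0 (x1 / person :name (n / name :op1 "Bob") :wiki "Bob_X")
    :ARG1 (x4 / cake :quant 4
        :ARG1-of (x7 / buy-01 :ARG0 x1)))''')
assert ('x7', 'ARG0', 'x1') in triples    # Bob is both the eater and the buyer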
11. AMR – Abstract Meaning Representation
Schneider N., Flanigan J., O’Gorman T. AMR Tutorial at NAACL 2015
https://github.com/nschneid/amr-tutorial/
• AMR is still biased towards English or other source languages
• Not an interlingua, but close: comparison of English AMRs to Chinese and Czech (Xue N., Bojar O., Hajič J., Palmer M., Uresova Z., Zhang X. LREC 2014)
• Meanwhile, AMR is agnostic about how to derive meanings from strings, and vice versa
12. Natural Language Understanding
• While it has recently been shown that the CNL approach can be scaled up…
• Embedded CNLs allowing for CNL-based domain-specific information extraction
• CNL as an efficient and user-friendly interface for Big Data end-point querying
• CNL for bootstrapping robust NL interfaces
• High-level CNL for legal sources
• …use cases like media monitoring are not limited to a particular domain; the input sources vary from newswire texts to TV and radio transcripts to user-generated content in social networks
• In the era of Big Data, there is a dominant view that Deep Learning is the only way to cope with robust and scalable NLU
• NLU cannot be approached by CNLs, or by grammars in general (?)
13. SemEval 2016 Task 8 on AMR parsing (Smatch F1 scores)
1. Riga (University of Latvia / LETA): 0.6196
2. CAMR (Brandeis University / Boulder Learning Inc. / Rensselaer Polytechnic Institute): 0.6195
3. ICL-HD (Ruprecht-Karls-Universität Heidelberg): 0.6005
4. UCL+Sheffield (University College London / University of Sheffield): 0.5983
5. M2L (Kyoto University): 0.5952
6. CMU (Carnegie Mellon University / University of Washington): 0.5636
7. CU-NLP (OK Robot Go Ltd. / University of Colorado): 0.5566
8. UofR (University of Rochester): 0.4985
9. MeaningFactory (University of Groningen): 0.4702*
10. CLIP@UMD (University of Maryland): 0.4370
11. DynamicPower (National Institute for Japanese Language and Linguistics): 0.3706*
* Did not use AMR training data
14. NLG from AMR
• The potential of grammar-based and CNL approaches becomes obvious in the opposite direction
• e.g. in the generation of story highlights from summarized (pruned) AMR graphs
• Text generation from AMR is still recognized as a future task
• An unexplored niche for grammars and CNLs
• GF, for instance, as an excellent framework for implementing multilingual AMR verbalizers
• Issue: AMR to AST mapping
15. Pourdamghani N., Gao Y., Hermjakob U., Knight K. Aligning English Strings with Abstract Meaning Representation Graphs. EMNLP 2014
Butler A. Deterministic natural language generation from meaning representations for machine translation. NAACL 2016 Workshop on Semantics-Driven Machine Translation
Pourdamghani N., Knight K., Hermjakob U. Generating English from Abstract Meaning Representations. INLG 2016 (to appear)
Flanigan J., Dyer C., Smith N.A., Carbonell J. Generation from Abstract Meaning Representation using Tree Transducers. NAACL 2016
16.
17. NLG from AMR
• Butler A. 2016. Deterministic natural language generation from meaning representations for machine translation. NAACL Workshop on Semantics-Driven Machine Translation
• Converts PENMAN-style representations to Penn-style trees
• Uses the Tregex and Tsurgeon utilities, which are part of the Stanford NLP library
• Covers a wide range of constructions
• A simple example: “Girls see a boy.”
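For orientation, a Penn Treebank-style bracketing for this example (an illustrative rendering, not Butler’s actual output) would look like:

(S (NP (NNS Girls))
   (VP (VBP see)
       (NP (DT a) (NN boy)))
   (. .))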
18. AMR to GF conversion: first experiment
“Girls see a boy.”
(x2 (see-01 (:ARG0 (x1 girl)) (:ARG1 (x4 boy))))
mkCl : NP ⟶ VP ⟶ Cl
mkVP : V2 ⟶ NP ⟶ VP
mkNP : Quant ⟶ Num ⟶ CN ⟶ NP
mkCN : N ⟶ CN
(mkCl
(mkNP a_Quant singularNum (mkCN girl_N))
(mkVP
see_V2
(mkNP a_Quant singularNum (mkCN boy_N))))
adjoin (Cl (VP @)) with PB-frame
move ARG0 under Cl
move ARG1 under VP
adjoin (NP a_Quant singularNum (CN @)) with ARG0/1
excise var
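Taken together, the adjoin/move/excise rules above implement a mapping that, for this simple transitive pattern, can be sketched in a few lines of Python on top of the toy PENMAN parser from slide 9 (an illustration, not the authors’ implementation; the input is the standard PENMAN form of the AMR above, and the default a_Quant/singularNum is kept, since number is listed as an open issue on slide 21):

def to_gf(triples, root):
    # Map a simple transitive AMR (verb-01 with :ARG0/:ARG1) onto a GF-style
    # abstract syntax term, mirroring the rules on this slide: every NP gets
    # the default a_Quant/singularNum, and concepts become RGL-style lexicon
    # identifiers (girl -> girl_N, see-01 -> see_V2).
    inst = {s: t for s, r, t in triples if r == 'instance'}
    args = {r: t for s, r, t in triples if s == root and r != 'instance'}
    np = lambda v: f"(mkNP a_Quant singularNum (mkCN {inst[v]}_N))"
    verb = inst[root].split('-')[0]               # see-01 -> see
    return (f"(mkCl {np(args['ARG0'])} "
            f"(mkVP {verb}_V2 {np(args['ARG1'])}))")

root, triples = parse('(x2 / see-01 :ARG0 (x1 / girl) :ARG1 (x4 / boy))')
print(to_gf(triples, root))
# (mkCl (mkNP a_Quant singularNum (mkCN girl_N))
#       (mkVP see_V2 (mkNP a_Quant singularNum (mkCN boy_N))))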
19. AMR to GF conversion: first experiment
“The boy sees the two pretty girls.”
(x3 (see-01 (:ARG0 (x2 boy)) (:ARG1 (x7 (girl (:quant 2) (:mod (x6 pretty)))))))
mkCN : A ⟶ N ⟶ CN
mkNum : Digits ⟶ Num
mkDigits : Str ⟶ Digits
(mkCl
(mkNP a_Quant singularNum (mkCN boy_N))
(mkVP
see_V2
(mkNP a_Quant (mkNum (mkDigits "2")) (mkCN pretty_A girl_N))))
move mod under CN
replace Num with quant
adjoin (Num (Digits @)) with quant
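With the same toy parser (again fed the standard PENMAN form of this AMR), the two modifiers surface as extra triples that the rules above route into the numeral and adjective slots:

root, triples = parse('(x3 / see-01 :ARG0 (x2 / boy) '
                      ':ARG1 (x7 / girl :quant 2 :mod (x6 / pretty)))')
# ('x7', 'quant', '2')  -> replace Num with quant: mkNum (mkDigits "2")
# ('x7', 'mod', 'x6') + ('x6', 'instance', 'pretty')
#                       -> move mod under CN: mkCN pretty_A girl_N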
20. Story headlines: Templates? Application grammar? CNL?
Multilingual Headlines Generator (a GF toy example by José P. Moreno): http://grammaticalframework.org/demos/multilingual_headlines.html
21. Conclusion
• There is potential for cooperating with the DL folks in both NLU and NLG
• Especially in NLG, which is recognized as one of the next problems for DL to “solve”
• Especially in domain-specific use cases that can be approached by CNL
• AMR-to-text issues to be addressed: number, time, co-references, articles, concepts and WSD (for multilingual NLG), named entities, reification; the management of transformation rules