EVALITA 2018
EVALUATION OF NLP AND SPEECH TOOLS FOR ITALIAN
iLISTEN
itaLIan Speech acT labEliNg
https://ilisten2018.github.io/
Pierpaolo Basile and Nicole Novielli
University of Bari Aldo Moro
Dipartimento di Informatica
{pierpaolo.basile, nicole.novielli}@uniba.it
@NicoleNovielli @basilepp
EVALITA 2018 Workshop
December 12-13 2018, Turin
Task Description
• Goal
o Annotating dialogue turns with speech act labels
• Speech acts
o Labels define the communicative intention of the
speaker
o e.g., statement, request for information, agreement,
opinion expression, general answer
• Who is telling what to whom?
o Speech acts as a coding standard for natural
dialogue tasks
J. L. Austin. 1962. How to do things with words. William James Lectures. Oxford University Press.
J. R. Searle. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, London.
Motivation
• Conversational access to information
o Chat-oriented dialogue systems
o Simulation of natural dialogues with embodied
conversational agents or chatbots
o Conversational interfaces for smart devices and IoT
• Dialogue analysis
o Chatlog analysis
o Interaction on social media
o Extraction of information of long-lasting value from technical
discussions
• Dedicated venues
Development and Test Data
• Transcripts of 60 dialogues
o 30 speech-based + 30 text-based
o 1,576 user dialogue turns
o 1,611 system turns
o ~22k words
• Development set: 40 dialogues
o 20 speech-based + 20 text-based
• Test set: 20 dialogues
o 10 speech-based + 10 text-based
Development and Test Data
• Corpus of persuasion dialogues with an embodied
conversational agent (ECA)
o Valentina plays the role of an advisor in the
healthy eating domain
o Wizard of Oz studies: the ECA's moves are
predefined
G. Clarizio, I. Mazzotta, N. Novielli, and F. De Rosis. 2006. Social attitude towards a conversational
character. In Proc. of IEEE International Workshop on Robot and Human Interactive Communication, pp. 2–7.
Speech Act Annotation
An excerpt from a dialogue
The turn ID provides an indication of the speaker and the
input modality
Evaluation
• Ranking: classification of user dialogue acts
o F1-score (macro-averaging)
• Precision and Recall are also computed
o Both micro- and macro-averaged
• Baseline: trivial classifier predicting the
majority class
o STATEMENT (33%)
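The evaluation setup above (micro- and macro-averaged precision, recall and F, plus a majority-class baseline) can be sketched as follows. This is a minimal illustration, not the official scorer: the function names are hypothetical, classes that are never predicted are assumed to get precision 0, and macro F is assumed to be the harmonic mean of macro precision and macro recall, which is what the published figures appear consistent with.

```python
from collections import Counter

# The nine user speech-act labels, as listed in the per-class results.
LABELS = [
    "OPENING", "CLOSING", "INFO-REQUEST", "SOLICITATION-REQ-CLARIF",
    "STATEMENT", "GENERIC-ANSWER", "AGREE-ACCEPT", "REJECT",
    "KIND-ATT-SMALLTALK",
]


def safe_div(num, den):
    """Return num / den, or 0.0 when den is zero (e.g. a class never predicted)."""
    return num / den if den else 0.0


def harmonic_mean(p, r):
    return safe_div(2 * p * r, p + r)


def evaluate(gold, pred, labels=LABELS):
    """Micro- and macro-averaged (P, R, F) for single-label classification."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[g] += 1  # gold class gets a false negative
    # Micro-averaging pools counts over all classes; with exactly one
    # predicted label per turn, micro P = micro R = accuracy.
    total_tp = sum(tp.values())
    micro_p = safe_div(total_tp, total_tp + sum(fp.values()))
    micro_r = safe_div(total_tp, total_tp + sum(fn.values()))
    # Macro-averaging weighs every class equally, however rare it is.
    macro_p = sum(safe_div(tp[c], tp[c] + fp[c]) for c in labels) / len(labels)
    macro_r = sum(safe_div(tp[c], tp[c] + fn[c]) for c in labels) / len(labels)
    return {
        "micro": (micro_p, micro_r, harmonic_mean(micro_p, micro_r)),
        # Assumption: macro F = harmonic mean of macro P and macro R.
        "macro": (macro_p, macro_r, harmonic_mean(macro_p, macro_r)),
    }


def majority_baseline(train_gold, n_turns):
    """Predict the most frequent training label for every turn."""
    label, _count = Counter(train_gold).most_common(1)[0]
    return [label] * n_turns
```

In the iLISTEN setting, `gold` and `pred` would hold one label per user turn of the test set. As a sanity check against the reported numbers, `harmonic_mean(0.6810, 0.6274)` gives approximately 0.6531, the macro F listed for Unitor.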
Participants
• Task open to everyone from industry and
academia
• Sixteen participants registered, but only two
teams actually submitted results
o UNITOR (Academia)
- Supervised system based on a structured kernel-based
Support Vector Machine
- Exploits the parse tree and the cosine similarity between
word vectors in a distributional semantics model
o X2Check (Industry) – Report not submitted
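The UNITOR description mentions cosine similarity between word vectors used inside a kernel. The sketch below isolates only that lexical ingredient as a simple "soft" bag-of-words kernel; it is a hypothetical simplification and not UNITOR's method, which the slide describes as a structured kernel over the parse tree. The toy vectors are invented for illustration only.

```python
import math

# Hypothetical toy embeddings; a real system would use vectors
# learned from a large Italian corpus.
TOY_VECTORS = {
    "ciao": [1.0, 0.0, 0.2],
    "salve": [0.9, 0.1, 0.3],
    "grazie": [0.0, 1.0, 0.1],
}


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def soft_lexical_kernel(tokens_a, tokens_b, vectors):
    """Sum of cosine similarities over all pairs of in-vocabulary tokens.

    This equals the dot product of the sums of unit-normalised word
    vectors, so it is a valid (positive semi-definite) kernel.
    Out-of-vocabulary tokens are ignored.
    """
    return sum(
        cosine(vectors[a], vectors[b])
        for a in tokens_a if a in vectors
        for b in tokens_b if b in vectors
    )


if __name__ == "__main__":
    print(soft_lexical_kernel(["ciao"], ["salve"], TOY_VECTORS))
    print(soft_lexical_kernel(["ciao"], ["grazie"], TOY_VECTORS))
```

Two turns with near-synonymous words ("ciao", "salve") score high even though they share no token, which is the point of using distributional similarity instead of exact matching. A Gram matrix built from such a kernel could be passed to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`).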
Results
           Micro-averaged            Macro-averaged
System     Prec    Rec     F         Prec    Rec     F
Unitor     0.7328  0.7328  0.7328    0.6810  0.6274  0.6531
X2Check    0.6848  0.6848  0.6848    0.6076  0.5844  0.5957
Baseline   0.3403  0.3403  0.3403    0.0378  0.1111  0.0564
Unitor: Danilo Croce and Roberto Basili, "A Markovian Kernel-based
Approach for itaLIan Speech acT labEliNg"
• Both systems outperform the baseline
• Some classes are harder to predict
o Low number of examples in the training data
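The macro scores in the results table can be re-derived by hand, which also suggests how macro F was computed: the figures match the harmonic mean of macro-averaged precision and recall rather than the mean of per-class F scores (the per-class F values reported for Unitor average to about 0.64, not 0.6531). This is a reading of the published numbers under the assumption that a class never predicted contributes precision 0; it is not a definition stated by the organizers.

```latex
% Majority baseline: always predicts STATEMENT (34.03% of test turns), 9 classes.
\begin{aligned}
P_{\text{macro}} &= \tfrac{1}{9}\textstyle\sum_{c} P_c = \tfrac{1}{9}\,(0.3403 + 8 \cdot 0) \approx 0.0378\\
R_{\text{macro}} &= \tfrac{1}{9}\textstyle\sum_{c} R_c = \tfrac{1}{9}\,(1 + 8 \cdot 0) \approx 0.1111\\
F_{\text{macro}} &= \frac{2\,P_{\text{macro}}\,R_{\text{macro}}}{P_{\text{macro}} + R_{\text{macro}}} \approx 0.0564\\
% Same formula applied to Unitor's macro precision and recall:
F_{\text{macro}}^{\text{Unitor}} &= \frac{2 \cdot 0.6810 \cdot 0.6274}{0.6810 + 0.6274} \approx 0.6531
\end{aligned}
```

For single-label classification, micro-averaged precision, recall and F all reduce to accuracy, which is why each system's three micro scores are identical.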
Performance by class
                                   Unitor              X2Check
Class                     Freq     Prec  Rec   F       Prec  Rec   F
OPENING                   2%       1.00  1.00  1.00    1.00  0.73  0.84
CLOSING                   2%       0.78  0.70  0.74    0.82  0.90  0.86
INFO-REQUEST              25%      0.78  0.83  0.80    0.74  0.79  0.76
SOLICITATION-REQ-CLARIF   7%       0.40  0.33  0.36    0.44  0.33  0.38
STATEMENT                 33%      0.75  0.94  0.84    0.67  0.89  0.76
GENERIC-ANSWER            10%      0.86  0.92  0.89    0.76  0.90  0.82
AGREE-ACCEPT              5%       0.65  0.46  0.54    0.57  0.50  0.53
REJECT                    5%       0.43  0.08  0.13    0.00  0.00  0.00
KIND-ATT-SMALLTALK        11%      0.50  0.39  0.44    0.47  0.20  0.29
Some classes are harder to predict
- low number of examples in the training data
- the main cause of error is misclassification as STATEMENT
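The claim that most errors are misclassifications as STATEMENT can be checked by counting, for each gold class, how many wrong predictions went to STATEMENT. The sketch below does this from parallel lists of gold and predicted labels; the function names and the toy labels in the check are hypothetical, since the participants' actual predictions are not reproduced here.

```python
from collections import Counter


def errors_into(gold, pred, target="STATEMENT"):
    """Map each gold class to (errors predicted as target, total errors)."""
    wrong = Counter()
    into_target = Counter()
    for g, p in zip(gold, pred):
        if g != p:
            wrong[g] += 1
            if p == target:
                into_target[g] += 1
    return {c: (into_target[c], wrong[c]) for c in wrong}


def share_into(gold, pred, target="STATEMENT"):
    """Fraction of all misclassified turns that were labelled as target."""
    per_class = errors_into(gold, pred, target)
    total_wrong = sum(w for _, w in per_class.values())
    total_into = sum(t for t, _ in per_class.values())
    return total_into / total_wrong if total_wrong else 0.0
```

A share close to 1 would support the observation above; breaking it down by class shows whether rare classes such as REJECT are mostly absorbed by the majority class.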
Ideas for future editions
• The best performing system leverages
syntactic features
o No task-specific features are defined
o Follow-up: extending the benchmark with dialogues
from different domains
• Is the task inherently dependent on the
language?
o To what extent do the approaches generalize beyond
Italian?
o Dialogues in other languages might be included in the
gold standard, as in AMI
Have fun!
• Download our dataset from the GitHub
EVALITA 2018 repository
https://github.com/evalita2018/data
Editor's Notes
Dedicated venues, e.g.:
WOCHAT, Special Session on Chatbots and Conversational Agents
Natural Language Generation for Dialogue Systems special session
(both co-located with the Annual SIGdial Meeting on Discourse and Dialogue)
In particular, a recent research trend has emerged to investigate methodologies that enable intelligent access to information by relying on natural dialogue as the interaction metaphor. In this perspective, chat-oriented dialogue systems are attracting increasing attention from both researchers and practitioners interested in the simulation of natural dialogues with embodied conversational agents (Klüwer, 2011), conversational interfaces for smart devices (McTear et al., 2016) and the Internet of Things (Kar and Haldar, 2016). As a consequence, we are witnessing the flourishing of dedicated research venues on chat-oriented interaction. This is the case of WOCHAT, the Special Session on Chatbots and Conversational Agents, now at its second edition, as well as the Natural Language Generation for Dialogue Systems special session, both co-located with the Annual SIGdial Meeting on Discourse and Dialogue. While they do not capture any deep understanding of the interaction dynamics, speech acts can be successfully employed as a coding standard for natural dialogue tasks.
This approach, while more verbose than a simple accuracy test, arises from the need to correctly address the unbalanced distribution of labels in the dataset. Furthermore, by providing detailed performance metrics, we intend to foster discussion on the nature of the problem and the data, as it might emerge from the participants' final reports. As a baseline, we use the most frequent label for the user speech acts (i.e., STATEMENT).
One possible reason is that statements represent the majority class, thus inducing a bias in the classifiers. Another possible explanation is that dialogue moves that appear linguistically consistent with the typical structure of statements have been annotated differently, according to the actual communicative role they play.