Evalita2018 iListen - itaLIan Speech acT labEliNg

EVALITA 2018
EVALUATION OF NLP AND SPEECH TOOLS FOR ITALIAN
iLISTEN
itaLIan Speech acT labEliNg
https://ilisten2018.github.io/
Pierpaolo Basile and Nicole Novielli
University of Bari Aldo Moro
Dipartimento di Informatica
{pierpaolo.basile, nicole.novielli}@uniba.it
@NicoleNovielli @basilepp

EVALITA 2018 Workshop
December 12-13, 2018, Turin

Task Description
• Goal
o Annotating dialogue turns with speech act labels
• Speech acts
o Labels define the communicative intention of the
speaker
o e.g. statement, request for information, agreement,
opinion expression, general answer
• Who is telling what to whom?
o Speech acts as a coding standard for natural
dialogue tasks
J. L. Austin. 1962. How to do things with words. William James Lectures. Oxford University Press.
J. R. Searle. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, London.

Motivation
• Conversational access to information
o Chat-oriented dialogue systems
o Simulation of natural dialogues with embodied
conversational agents or chatbots
o Conversational interfaces for smart devices and IoT
• Dialogue analysis
o Chatlog analysis
o Interaction on social media
o Extraction of long-lasting value information from technical
discussions
• Dedicated venues

Development and Test Data
• Transcripts of 60 dialogues
o 30 speech-based + 30 text-based
o 1,576 user dialogue turns
o 1,611 system turns
o ~22k words
• Development set: 40 dialogues
• Test set: 20 dialogues

Development and Test Data
• Corpus of persuasion dialogues with an ECA
o Valentina plays the role of an advisor in the healthy eating domain
o Wizard of Oz studies: the ECA’s moves are predefined
G. Clarizio, I. Mazzotta, N. Novielli, and F. De Rosis. 2006. Social attitude towards a conversational character. In Proc. of IEEE International Workshop on Robot and Human Interactive Communication, pp. 2–7.

Speech Acts: User’s Moves
Target of classification

Speech Acts: System’s Moves
Provided as context

Speech Act Annotation
An excerpt from a dialogue
The turn ID indicates the speaker and the input modality

Distribution and Format

Evaluation
• Ranking: classification of user dialogue acts
o F1-score (macro-averaging)
• Precision and recall are also computed
o Both micro- and macro-averaged
• Baseline: trivial classifier predicting the
majority class
o STATEMENT (33%)
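
The evaluation scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the official scorer: the function names (`prf`, `evaluate`, `majority_baseline`) and the toy labels are assumptions made for the example. Macro F is computed here as the harmonic mean of macro precision and macro recall, which appears consistent with the figures reported in the Results table; other tools average the per-class F values instead.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def evaluate(gold, pred):
    """Return (micro, macro, per_class) scores for single-label turns."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[g] += 1  # gold label gets a false negative
    per_class = {c: prf(tp[c], fp[c], fn[c]) for c in labels}
    # Micro: pool the counts over all classes, then score once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: average per-class precision and recall with equal class weight.
    macro_p = sum(v[0] for v in per_class.values()) / len(labels)
    macro_r = sum(v[1] for v in per_class.values()) / len(labels)
    # Assumption: macro F as harmonic mean of macro P and macro R.
    macro_f = 2 * macro_p * macro_r / (macro_p + macro_r) if macro_p + macro_r else 0.0
    return micro, (macro_p, macro_r, macro_f), per_class


def majority_baseline(train_labels, n_turns):
    """Trivial classifier: always predict the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_turns


if __name__ == "__main__":
    # Toy labels, made up for illustration only.
    gold = ["STATEMENT"] * 3 + ["INFO-REQUEST"] * 2 + ["REJECT"]
    pred = majority_baseline(gold, len(gold))
    micro, macro, _ = evaluate(gold, pred)
    print("micro P/R/F:", micro)
    print("macro P/R/F:", macro)
```

With one label per turn, every error counts as both a false positive and a false negative, so micro precision, recall and F coincide with accuracy. Macro-averaging weights rare classes as heavily as frequent ones, which is why a majority-class baseline scores far lower under macro than under micro.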

Participants
• Task open to everyone from industry and
academia
• Sixteen participants registered, but only two
teams actually submitted results
o UNITOR (Academia)
- Supervised system based on Structured Kernel-based
Support Vector Machine
- Exploits the parse tree and the cosine similarity between the
word vectors in a distributional semantics model
o X2Check (Industry) – Report not submitted

Results
                   Micro                      Macro
System     Prec    Rec     F         Prec    Rec     F
Unitor     0.7328  0.7328  0.7328    0.6810  0.6274  0.6531
X2Check    0.6848  0.6848  0.6848    0.6076  0.5844  0.5957
Baseline   0.3403  0.3403  0.3403    0.0378  0.1111  0.0564
Unitor: Danilo Croce and Roberto Basili, A Markovian Kernel-based Approach for itaLIan Speech acT labEliNg

• Both systems outperform the baseline
• Some classes are harder to predict
o Low number of examples in the training data

Performance by class
                                     Unitor               X2Check
Class                     Freq   Prec  Rec   F        Prec  Rec   F
OPENING                   2%     1.00  1.00  1.00     1.00  0.73  0.84
CLOSING                   2%     0.78  0.70  0.74     0.82  0.90  0.86
INFO-REQUEST              25%    0.78  0.83  0.80     0.74  0.79  0.76
SOLICITATION-REQ-CLARIF   7%     0.40  0.33  0.36     0.44  0.33  0.38
STATEMENT                 33%    0.75  0.94  0.84     0.67  0.89  0.76
GENERIC-ANSWER            10%    0.86  0.92  0.89     0.76  0.90  0.82
AGREE-ACCEPT              5%     0.65  0.46  0.54     0.57  0.50  0.53
REJECT                    5%     0.43  0.08  0.13     0.00  0.00  0.00
KIND-ATT-SMALLTALK        11%    0.50  0.39  0.44     0.47  0.20  0.29
Some classes are harder to predict
- low number of examples in the training data
- the main cause of error is the misclassification as STATEMENT
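
As a rough cross-check, the macro-averaged scores can be recomputed from the per-class figures. The sketch below uses the Unitor precision, recall and F values copied from the table above; the variable names are illustrative. Because the inputs are rounded to two decimals, the results only approximate the reported macro scores (0.6810, 0.6274, 0.6531).

```python
# Per-class (precision, recall, F) for Unitor, copied from the table above.
unitor = {
    "OPENING": (1.00, 1.00, 1.00),
    "CLOSING": (0.78, 0.70, 0.74),
    "INFO-REQUEST": (0.78, 0.83, 0.80),
    "SOLICITATION-REQ-CLARIF": (0.40, 0.33, 0.36),
    "STATEMENT": (0.75, 0.94, 0.84),
    "GENERIC-ANSWER": (0.86, 0.92, 0.89),
    "AGREE-ACCEPT": (0.65, 0.46, 0.54),
    "REJECT": (0.43, 0.08, 0.13),
    "KIND-ATT-SMALLTALK": (0.50, 0.39, 0.44),
}

n = len(unitor)
# Macro-averaging: every class counts equally, regardless of frequency.
macro_p = sum(p for p, _, _ in unitor.values()) / n
macro_r = sum(r for _, r, _ in unitor.values()) / n
# Harmonic mean of macro precision and macro recall.
macro_f = 2 * macro_p * macro_r / (macro_p + macro_r)
# Alternative definition: plain mean of the per-class F values.
mean_f = sum(f for _, _, f in unitor.values()) / n

print(f"macro P={macro_p:.4f}  macro R={macro_r:.4f}  macro F={macro_f:.4f}")
print(f"mean of per-class F={mean_f:.4f}")
```

The harmonic-mean variant lands closer to the reported macro F than the mean of per-class F values does, which suggests the former is the definition behind the reported figure. The low REJECT and SOLICITATION-REQ-CLARIF scores pull the macro averages well below the micro ones.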

Ideas for future editions
• The best performing system leverages
syntactic features
o Task-related features are not defined
o Follow-up: extending the benchmark with dialogues
from different domains
• Is the task inherently dependent on the
language?
o To what extent do the approaches generalize beyond
Italian?
o Dialogues in other languages might be included in the
gold standard, as in AMI

Have fun!
• Download our dataset from the GitHub
EVALITA 2018 repository
https://github.com/evalita2018/data
