EVALITA 2018
EVALUATION OF NLP AND SPEECH TOOLS FOR ITALIAN
iLISTEN
itaLIan Speech acT labEliNg
https://ilisten2018.github.io/
Pierpaolo Basile and Nicole Novielli
University of Bari Aldo Moro
Dipartimento di Informatica
{pierpaolo.basile, nicole.novielli}@uniba.it
@NicoleNovielli @basilepp
EVALITA 2018 Workshop
December 12-13 2018, Turin
Task Description
• Goal
o Annotating dialogue turns with speech act labels
• Speech acts
o Labels define the communicative intention of the
speaker
o e.g. statement, request for information, agreement,
opinion expression, general answer
• Who is telling what to whom?
o Speech acts as a coding standard for natural
dialogue tasks
J. L. Austin. 1962. How to do things with words. William James Lectures. Oxford University Press.
J. R. Searle. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, London.
Motivation
• Conversational access to information
o Chat-oriented dialogue systems
o Simulation of natural dialogues with embodied
conversational agents or chatbots
o Conversational interfaces for smart devices and IoT
• Dialogue analysis
o Chatlog analysis
o Interaction on social media
o Extraction of information of long-lasting value from technical
discussions
• Dedicated venues
Development and Test Data
• Transcripts of 60 dialogues
o 30 speech-based + 30 text-based
o 1,576 user dialogue turns
o 1,611 system turns
o ~22k words
• Development set: 40 dialogues
o 20 speech-based + 20 text-based
• Test set: 20 dialogues
o 10 speech-based + 10 text-based
Development and Test Data
• Corpus of persuasion dialogues with an embodied conversational agent (ECA)
o Valentina plays the role of an advisor in the healthy eating domain
o Wizard of Oz studies: the ECA’s moves are predefined
G. Clarizio, I. Mazzotta, N. Novielli, and F. De Rosis. 2006. Social attitude towards a conversational
character. In Proc. of IEEE International Workshop on Robot and Human Interactive Communication, pp. 2–7.
Speech Acts: User’s Moves
Target of classification
Speech Acts: System’s
Moves
Provided as context
Speech Act Annotation
An excerpt from a dialogue
The turn ID provides an indication of the speaker and the
input modality
Distribution and Format
Evaluation
• Ranking: classification of user dialogue acts
o F1-score (macro-averaging)
• Precision and Recall are also computed
o Both micro- and macro-averaging
• Baseline: trivial classifier predicting the
majority class
o STATEMENT (33%)
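The evaluation scheme above can be illustrated with a minimal Python sketch. It is not the organizers' scoring script: the function names (prf, evaluate, majority_baseline) are hypothetical, and taking macro F as the harmonic mean of macro precision and macro recall is an assumption (another common convention averages the per-class F1 scores).

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def evaluate(gold, pred):
    """Micro- and macro-averaged (precision, recall, F) over the gold label set."""
    labels = sorted(set(gold))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[g] += 1  # the gold class gains a false negative
    # Micro-averaging pools the counts of all classes before computing scores.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-averaging computes scores per class, then takes the unweighted mean.
    per_class = [prf(tp[c], fp[c], fn[c]) for c in labels]
    macro_p = sum(p for p, _, _ in per_class) / len(labels)
    macro_r = sum(r for _, r, _ in per_class) / len(labels)
    # Assumption: macro F is the harmonic mean of macro precision and recall.
    denom = macro_p + macro_r
    macro_f = 2 * macro_p * macro_r / denom if denom else 0.0
    return {"micro": micro, "macro": (macro_p, macro_r, macro_f)}


def majority_baseline(train_labels, n):
    """Trivial classifier: always predict the most frequent training label."""
    top = Counter(train_labels).most_common(1)[0][0]
    return [top] * n


# Toy data, invented for illustration only (not taken from the iLISTEN corpus).
toy_gold = ["STATEMENT"] * 3 + ["INFO-REQUEST"] * 2 + ["REJECT"]
toy_pred = majority_baseline(toy_gold, len(toy_gold))
print(evaluate(toy_gold, toy_pred))
```

On the toy data the majority classifier scores much higher under micro-averaging than under macro-averaging, which is why a macro-averaged ranking penalizes systems that ignore the rare classes.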
Participants
• Task open to everyone from industry and
academia
• Sixteen participants registered, but only two
teams actually submitted
o UNITOR (Academia)
- Supervised system based on Structured Kernel-based
Support Vector Machine
- Exploits the parse tree and the cosine similarity between the
word vectors in a distributional semantics model
o X2Check (Industry) – Report not submitted
Results
System     Micro-averaged            Macro-averaged
           Prec    Rec     F         Prec    Rec     F
Unitor     0.7328  0.7328  0.7328    0.6810  0.6274  0.6531
X2Check    0.6848  0.6848  0.6848    0.6076  0.5844  0.5957
Baseline   0.3403  0.3403  0.3403    0.0378  0.1111  0.0564
Danilo Croce and Roberto Basili. A Markovian Kernel-based Approach for itaLIan Speech acT labEliNg.
• Both systems outperform the baseline
• Some classes are harder to predict
o Low number of examples in the training data
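The baseline's macro-averaged scores (0.0378, 0.1111, 0.0564) can be reproduced with a short arithmetic check. This is an observation about the reported figures rather than the official scoring procedure. It assumes nine user classes (the number of rows in the per-class results), a majority-class share of 0.3403 on the test set, a precision of 0 for classes that are never predicted, and a macro F taken as the harmonic mean of macro precision and macro recall.

```python
def harmonic_mean(p, r):
    """F-score as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


N_CLASSES = 9            # assumption: number of user speech act classes
MAJORITY_SHARE = 0.3403  # share of test turns labeled STATEMENT (baseline row)

# Only the majority class is ever predicted: its precision equals its share,
# its recall is 1, and every other class scores 0 on both measures.
macro_precision = MAJORITY_SHARE / N_CLASSES
macro_recall = 1.0 / N_CLASSES
macro_f = harmonic_mean(macro_precision, macro_recall)

print(round(macro_precision, 4), round(macro_recall, 4), round(macro_f, 4))

# The same relation, applied to the reported Unitor macro precision and recall.
unitor_macro_f = harmonic_mean(0.6810, 0.6274)
print(round(unitor_macro_f, 4))
```

Under these assumptions the computed values agree with the reported baseline and Unitor macro F figures up to rounding of the published inputs.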
Performance by class
Class                     Freq   Unitor               X2Check
                                 Prec  Rec   F        Prec  Rec   F
OPENING                    2%    1.00  1.00  1.00     1.00  0.73  0.84
CLOSING                    2%    0.78  0.70  0.74     0.82  0.90  0.86
INFO-REQUEST              25%    0.78  0.83  0.80     0.74  0.79  0.76
SOLICITATION-REQ-CLARIF    7%    0.40  0.33  0.36     0.44  0.33  0.38
STATEMENT                 33%    0.75  0.94  0.84     0.67  0.89  0.76
GENERIC-ANSWER            10%    0.86  0.92  0.89     0.76  0.90  0.82
AGREE-ACCEPT               5%    0.65  0.46  0.54     0.57  0.50  0.53
REJECT                     5%    0.43  0.08  0.13     0.00  0.00  0.00
KIND-ATT-SMALLTALK        11%    0.50  0.39  0.44     0.47  0.20  0.29
Some classes are harder to predict
- low number of examples in the training data
- the main cause of error is the misclassification as STATEMENT
Ideas for future editions
• The best performing system leverages
syntactic features
o Task-related features are not defined
o Follow-up: extending the benchmark with dialogues
from different domains
• Is the task inherently dependent on the
language?
o To what extent do the approaches generalize beyond
Italian?
o Dialogues in other languages might be included in the
gold standard, as in AMI
Have fun!
• Download our dataset from the GitHub
EVALITA 2018 repository
https://github.com/evalita2018/data

More Related Content

Similar to Evalita2018 iListen - itaLIan Speech acT labEliNg

DFlow is dead. Long live Tako!
DFlow is dead. Long live Tako!DFlow is dead. Long live Tako!
DFlow is dead. Long live Tako!Roberto Minelli
 
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...Cataldo Musto
 
The IoT Methodology & An Introduction to the Intel Galileo, Edison and SmartL...
The IoT Methodology & An Introduction to the Intel Galileo, Edison and SmartL...The IoT Methodology & An Introduction to the Intel Galileo, Edison and SmartL...
The IoT Methodology & An Introduction to the Intel Galileo, Edison and SmartL...The Internet of Things Methodology
 
Leaning Lab il Living Lab di Pisa
Leaning Lab il Living Lab di PisaLeaning Lab il Living Lab di Pisa
Leaning Lab il Living Lab di PisaDaniele Mazzei
 
List of Journal after read the abstract.docx
List of Journal after read the abstract.docxList of Journal after read the abstract.docx
List of Journal after read the abstract.docxAdieYadie1
 
Oerri briefing dec11
Oerri briefing dec11Oerri briefing dec11
Oerri briefing dec11Jisc
 
Building better knowledge graphs through social computing
Building better knowledge graphs through social computingBuilding better knowledge graphs through social computing
Building better knowledge graphs through social computingElena Simperl
 
Architecture for Participatory Learning
Architecture for Participatory LearningArchitecture for Participatory Learning
Architecture for Participatory LearningYishay Mor
 
A presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptxA presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptxROHITSHARMA779690
 
Cinzia Battistella; Modeling a business ecosystem: a network analysis approach
Cinzia Battistella; Modeling a business ecosystem: a network analysis approachCinzia Battistella; Modeling a business ecosystem: a network analysis approach
Cinzia Battistella; Modeling a business ecosystem: a network analysis approachCBOD ANR project U-PSUD
 
Teaching Students Collaborative Requirements Engineering. Case Study Red:Wire
Teaching Students Collaborative Requirements Engineering. Case Study Red:WireTeaching Students Collaborative Requirements Engineering. Case Study Red:Wire
Teaching Students Collaborative Requirements Engineering. Case Study Red:WireDagmar Monett
 
Standardization Activities: ISO/IEC JTC1 SC36
Standardization Activities: ISO/IEC JTC1 SC36Standardization Activities: ISO/IEC JTC1 SC36
Standardization Activities: ISO/IEC JTC1 SC36openforum
 
A presentation about my recent projects on goup ideation and deliberation
A presentation about my recent projects on goup ideation and deliberationA presentation about my recent projects on goup ideation and deliberation
A presentation about my recent projects on goup ideation and deliberationLu Xiao
 
Hoe ziet de toekomst van Learning Analytics er uit?
Hoe ziet de toekomst van Learning Analytics er uit?Hoe ziet de toekomst van Learning Analytics er uit?
Hoe ziet de toekomst van Learning Analytics er uit?Hendrik Drachsler
 
New books jun 2014
New books jun 2014New books jun 2014
New books jun 2014maethaya
 
Keynote ACIS/AAI2014 conference
Keynote ACIS/AAI2014 conferenceKeynote ACIS/AAI2014 conference
Keynote ACIS/AAI2014 conferenceKyoto University
 
Fairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matricesFairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matricesPistoia Alliance
 

Similar to Evalita2018 iListen - itaLIan Speech acT labEliNg (20)

DFlow is dead. Long live Tako!
DFlow is dead. Long live Tako!DFlow is dead. Long live Tako!
DFlow is dead. Long live Tako!
 
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
 
The IoT Methodology & An Introduction to the Intel Galileo, Edison and SmartL...
The IoT Methodology & An Introduction to the Intel Galileo, Edison and SmartL...The IoT Methodology & An Introduction to the Intel Galileo, Edison and SmartL...
The IoT Methodology & An Introduction to the Intel Galileo, Edison and SmartL...
 
Digital repertoires of poetry metrics: towards a Linked Open Data ecosystem
Digital repertoires of poetry metrics: towards a Linked Open Data ecosystemDigital repertoires of poetry metrics: towards a Linked Open Data ecosystem
Digital repertoires of poetry metrics: towards a Linked Open Data ecosystem
 
A Methodology for Building the Internet of Things
A Methodology for Building the Internet of ThingsA Methodology for Building the Internet of Things
A Methodology for Building the Internet of Things
 
Leaning Lab il Living Lab di Pisa
Leaning Lab il Living Lab di PisaLeaning Lab il Living Lab di Pisa
Leaning Lab il Living Lab di Pisa
 
List of Journal after read the abstract.docx
List of Journal after read the abstract.docxList of Journal after read the abstract.docx
List of Journal after read the abstract.docx
 
Oerri briefing dec11
Oerri briefing dec11Oerri briefing dec11
Oerri briefing dec11
 
Building better knowledge graphs through social computing
Building better knowledge graphs through social computingBuilding better knowledge graphs through social computing
Building better knowledge graphs through social computing
 
Architecture for Participatory Learning
Architecture for Participatory LearningArchitecture for Participatory Learning
Architecture for Participatory Learning
 
A presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptxA presentation on Applications of ICT in Research.pptx
A presentation on Applications of ICT in Research.pptx
 
Cinzia Battistella; Modeling a business ecosystem: a network analysis approach
Cinzia Battistella; Modeling a business ecosystem: a network analysis approachCinzia Battistella; Modeling a business ecosystem: a network analysis approach
Cinzia Battistella; Modeling a business ecosystem: a network analysis approach
 
Teaching Students Collaborative Requirements Engineering. Case Study Red:Wire
Teaching Students Collaborative Requirements Engineering. Case Study Red:WireTeaching Students Collaborative Requirements Engineering. Case Study Red:Wire
Teaching Students Collaborative Requirements Engineering. Case Study Red:Wire
 
Toward supporting decision-making under uncertainty in digital humanities wit...
Toward supporting decision-making under uncertainty in digital humanities wit...Toward supporting decision-making under uncertainty in digital humanities wit...
Toward supporting decision-making under uncertainty in digital humanities wit...
 
Standardization Activities: ISO/IEC JTC1 SC36
Standardization Activities: ISO/IEC JTC1 SC36Standardization Activities: ISO/IEC JTC1 SC36
Standardization Activities: ISO/IEC JTC1 SC36
 
A presentation about my recent projects on goup ideation and deliberation
A presentation about my recent projects on goup ideation and deliberationA presentation about my recent projects on goup ideation and deliberation
A presentation about my recent projects on goup ideation and deliberation
 
Hoe ziet de toekomst van Learning Analytics er uit?
Hoe ziet de toekomst van Learning Analytics er uit?Hoe ziet de toekomst van Learning Analytics er uit?
Hoe ziet de toekomst van Learning Analytics er uit?
 
New books jun 2014
New books jun 2014New books jun 2014
New books jun 2014
 
Keynote ACIS/AAI2014 conference
Keynote ACIS/AAI2014 conferenceKeynote ACIS/AAI2014 conference
Keynote ACIS/AAI2014 conference
 
Fairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matricesFairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matrices
 

More from Nicole Novielli

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Towards Supporting Emotion Awareness of Software Developers
Towards Supporting Emotion Awareness of Software DevelopersTowards Supporting Emotion Awareness of Software Developers
Towards Supporting Emotion Awareness of Software DevelopersNicole Novielli
 
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open Challenges
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open ChallengesKeynote@QUATIC - Recognizing Developer's Emotions: Advances and Open Challenges
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open ChallengesNicole Novielli
 
To Label or Not? Advances and Open Challenges in SE-specific Sentiment Analysis
To Label or Not? Advances and Open Challenges in SE-specific Sentiment AnalysisTo Label or Not? Advances and Open Challenges in SE-specific Sentiment Analysis
To Label or Not? Advances and Open Challenges in SE-specific Sentiment AnalysisNicole Novielli
 
Emotion Detection Using Noninvasive Low-cost Sensors
Emotion Detection Using Noninvasive Low-cost SensorsEmotion Detection Using Noninvasive Low-cost Sensors
Emotion Detection Using Noninvasive Low-cost SensorsNicole Novielli
 
A Benchmark Study on Sentiment Analysis for Software Engineering Research
A Benchmark Study on Sentiment Analysis for Software Engineering ResearchA Benchmark Study on Sentiment Analysis for Software Engineering Research
A Benchmark Study on Sentiment Analysis for Software Engineering ResearchNicole Novielli
 
The Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer EcosystemThe Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer EcosystemNicole Novielli
 
Deep Tweets: from Entity Linking to Sentiment Analysis
Deep Tweets: from Entity Linking to Sentiment AnalysisDeep Tweets: from Entity Linking to Sentiment Analysis
Deep Tweets: from Entity Linking to Sentiment AnalysisNicole Novielli
 
UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity com...
UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity com...UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity com...
UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity com...Nicole Novielli
 
Towards Discovering the Role of Emotions in Stack Overflow
Towards Discovering the Role of Emotions in Stack OverflowTowards Discovering the Role of Emotions in Stack Overflow
Towards Discovering the Role of Emotions in Stack OverflowNicole Novielli
 
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...A Preliminary Investigation of the Effect of Social Media on Affective Trust ...
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...Nicole Novielli
 
Social Network Analysis for Global Software Engineering: Exploring relationsh...
Social Network Analysis for Global Software Engineering: Exploring relationsh...Social Network Analysis for Global Software Engineering: Exploring relationsh...
Social Network Analysis for Global Software Engineering: Exploring relationsh...Nicole Novielli
 

More from Nicole Novielli (12)

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Towards Supporting Emotion Awareness of Software Developers
Towards Supporting Emotion Awareness of Software DevelopersTowards Supporting Emotion Awareness of Software Developers
Towards Supporting Emotion Awareness of Software Developers
 
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open Challenges
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open ChallengesKeynote@QUATIC - Recognizing Developer's Emotions: Advances and Open Challenges
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open Challenges
 
To Label or Not? Advances and Open Challenges in SE-specific Sentiment Analysis
To Label or Not? Advances and Open Challenges in SE-specific Sentiment AnalysisTo Label or Not? Advances and Open Challenges in SE-specific Sentiment Analysis
To Label or Not? Advances and Open Challenges in SE-specific Sentiment Analysis
 
Emotion Detection Using Noninvasive Low-cost Sensors
Emotion Detection Using Noninvasive Low-cost SensorsEmotion Detection Using Noninvasive Low-cost Sensors
Emotion Detection Using Noninvasive Low-cost Sensors
 
A Benchmark Study on Sentiment Analysis for Software Engineering Research
A Benchmark Study on Sentiment Analysis for Software Engineering ResearchA Benchmark Study on Sentiment Analysis for Software Engineering Research
A Benchmark Study on Sentiment Analysis for Software Engineering Research
 
The Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer EcosystemThe Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer Ecosystem
 
Deep Tweets: from Entity Linking to Sentiment Analysis
Deep Tweets: from Entity Linking to Sentiment AnalysisDeep Tweets: from Entity Linking to Sentiment Analysis
Deep Tweets: from Entity Linking to Sentiment Analysis
 
UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity com...
UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity com...UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity com...
UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity com...
 
Towards Discovering the Role of Emotions in Stack Overflow
Towards Discovering the Role of Emotions in Stack OverflowTowards Discovering the Role of Emotions in Stack Overflow
Towards Discovering the Role of Emotions in Stack Overflow
 
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...A Preliminary Investigation of the Effect of Social Media on Affective Trust ...
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...
 
Social Network Analysis for Global Software Engineering: Exploring relationsh...
Social Network Analysis for Global Software Engineering: Exploring relationsh...Social Network Analysis for Global Software Engineering: Exploring relationsh...
Social Network Analysis for Global Software Engineering: Exploring relationsh...
 

Recently uploaded

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Evalita2018 iListen - itaLIan Speech acT labEliNg

  • 1. EVALITA 2018 EVALUATION OF NLP AND SPEECH TOOLS FOR ITALIAN iLISTEN itaLIan Speech acT labEliNg https://ilisten2018.github.io/ Pierpaolo Basile and Nicole Novielli University of Bari Aldo Moro Dipartimento di Informatica {pierpaolo.basile, nicole.novielli}@uniba.it @NicoleNovielli@basilepp
  • 2. EVALITA 2018 Workshop December 12-13 2018, Turin Task Description • Goal o Annotating dialogue turns with speech act labels • Speech acts o Labels define the communicative intention of the speaker o i.e. statement, request for information, agreement, opinion expression, general answer • Who is telling what to whom? o Speech acts as a coding standard for natural dialogues tasks J. L. Austin. 1962. How to do things with words. William James Lectures. Oxford University Press. J. R. Searle. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, London.
  • 3. EVALITA 2018 Workshop December 12-13 2018, Turin Motivation • Conversational access to information o Chat-oriented dialogue systems o Simulation of natural dialogues with embodied conversational agents or chatbots o Conversational interfaces for smart devices and IoT • Dialogue analysis o Chatlog analysis o Interaction on social media o Extraction of long-lasting value information from technical discussions • Dedicated venues
  • 4. EVALITA 2018 Workshop December 12-13 2018, Turin Development and Test Data • Transcripts of 60 dialogues o 30 speech-based + 30 text-based o 1,576 user dialogue turns o 1,611 system turns o ~22k words • Development set: 40 dialogues o 20 speech-based + 20 text-based • Development set: 20 dialogues o 10 speech-based + 10 text-based
  • 5. Development and Test Data • Corpus of persuasion dialogues with an embodied conversational agent (ECA) o Valentina plays the role of an advisor in the healthy eating domain o Wizard of Oz studies: the ECA's moves are predefined G. Clarizio, I. Mazzotta, N. Novielli, and F. De Rosis. 2006. Social attitude towards a conversational character. In Proc. of IEEE International Workshop on Robot and Human Interactive Communication, pp. 2–7.
  • 6. Speech Acts: User's Moves
  • 7. Speech Acts: User's Moves. Target of classification
  • 8. Speech Acts: System's Moves
  • 9. Speech Acts: System's Moves. Provided as context
  • 10. Speech Act Annotation. An excerpt from a dialogue
  • 11. Speech Act Annotation. An excerpt from a dialogue. The turn ID indicates the speaker and the input modality
  • 12. Distribution and Format
  • 13. Evaluation • Ranking: classification of user dialogue acts o F1-score (macro-averaging) • Precision and Recall are also computed o Both micro- and macro-averaging • Baseline: trivial classifier predicting the majority class o STATEMENT (33%)
  • 14. Participants • Task open to everyone from industry and academia • Sixteen participants registered, but only two teams actually submitted runs o UNITOR (Academia) - Supervised system based on a Structured Kernel-based Support Vector Machine - Exploits the parse tree and the cosine similarity between word vectors in a distributional semantics model o X2Check (Industry) – Report not submitted
  • 15. EVALITA 2018 Workshop December 12-13 2018, Turin
  • 16. Results

        System      Micro-averaged              Macro-averaged
                    Prec    Rec     F           Prec    Rec     F
        Unitor      0.7328  0.7328  0.7328      0.6810  0.6274  0.6531
        X2Check     0.6848  0.6848  0.6848      0.6076  0.5844  0.5957
        Baseline    0.3403  0.3403  0.3403      0.0378  0.1111  0.0564

    Unitor system description: Danilo Croce and Roberto Basili, "A Markovian Kernel-based Approach for itaLIan Speech acT labEliNg".
  • 17. Results • Both systems outperform the baseline • Some classes are harder to predict o Low number of examples in the training data
  • 18. Performance by class

        Class                     Freq   Unitor                 X2Check
                                         Prec   Rec    F        Prec   Rec    F
        OPENING                    2%    1.00   1.00   1.00     1.00   0.73   0.84
        CLOSING                    2%    0.78   0.70   0.74     0.82   0.90   0.86
        INFO-REQUEST              25%    0.78   0.83   0.80     0.74   0.79   0.76
        SOLICITATION-REQ-CLARIF    7%    0.40   0.33   0.36     0.44   0.33   0.38
        STATEMENT                 33%    0.75   0.94   0.84     0.67   0.89   0.76
        GENERIC-ANSWER            10%    0.86   0.92   0.89     0.76   0.90   0.82
        AGREE-ACCEPT               5%    0.65   0.46   0.54     0.57   0.50   0.53
        REJECT                     5%    0.43   0.08   0.13     0.00   0.00   0.00
        KIND-ATT-SMALLTALK        11%    0.50   0.39   0.44     0.47   0.20   0.29

    Some classes are harder to predict:
      o low number of examples in the training data
      o the main cause of error is misclassification as STATEMENT
  • 19. Ideas for future editions • The best performing system leverages syntactic features o Task-related features are not defined o Follow-up: extending the benchmark with dialogues from different domains • Is the task inherently dependent on the language? o To what extent do the approaches generalize beyond Italian? o Dialogues in other languages might be included in the gold standard, as in AMI
  • 20. Have fun! • Download our dataset from the GitHub EVALITA 2018 repository https://github.com/evalita2018/data
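
The figures on the Results and Performance by class slides can be cross-checked against each other. The sketch below is not part of the original deck: it copies the rounded per-class values from the slides, averages them, and compares the outcome with the reported macro scores. The helper names (`mean`, `macro_from_classes`) are illustrative assumptions. Because the inputs are rounded to two decimals, only approximate agreement should be expected. Under that caveat, the reported macro F appears to match the harmonic mean of macro precision and macro recall more closely than the plain average of the per-class F values.

```python
# Per-class (precision, recall, F) values copied from the
# "Performance by class" slide, in slide order.
UNITOR = {
    "OPENING": (1.00, 1.00, 1.00),
    "CLOSING": (0.78, 0.70, 0.74),
    "INFO-REQUEST": (0.78, 0.83, 0.80),
    "SOLICITATION-REQ-CLARIF": (0.40, 0.33, 0.36),
    "STATEMENT": (0.75, 0.94, 0.84),
    "GENERIC-ANSWER": (0.86, 0.92, 0.89),
    "AGREE-ACCEPT": (0.65, 0.46, 0.54),
    "REJECT": (0.43, 0.08, 0.13),
    "KIND-ATT-SMALLTALK": (0.50, 0.39, 0.44),
}
X2CHECK = {
    "OPENING": (1.00, 0.73, 0.84),
    "CLOSING": (0.82, 0.90, 0.86),
    "INFO-REQUEST": (0.74, 0.79, 0.76),
    "SOLICITATION-REQ-CLARIF": (0.44, 0.33, 0.38),
    "STATEMENT": (0.67, 0.89, 0.76),
    "GENERIC-ANSWER": (0.76, 0.90, 0.82),
    "AGREE-ACCEPT": (0.57, 0.50, 0.53),
    "REJECT": (0.00, 0.00, 0.00),
    "KIND-ATT-SMALLTALK": (0.47, 0.20, 0.29),
}

# Macro-averaged (precision, recall, F) as reported on the "Results" slide.
REPORTED_MACRO = {
    "Unitor": (0.6810, 0.6274, 0.6531),
    "X2Check": (0.6076, 0.5844, 0.5957),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def macro_from_classes(per_class):
    """Return (macro P, macro R, mean of per-class F, harmonic mean of macro P and R)."""
    p = mean(v[0] for v in per_class.values())
    r = mean(v[1] for v in per_class.values())
    mean_f = mean(v[2] for v in per_class.values())
    harmonic_f = 2 * p * r / (p + r)
    return p, r, mean_f, harmonic_f


if __name__ == "__main__":
    for name, table in (("Unitor", UNITOR), ("X2Check", X2CHECK)):
        p, r, mean_f, harmonic_f = macro_from_classes(table)
        print(name, round(p, 4), round(r, 4), round(mean_f, 4), round(harmonic_f, 4),
              "reported:", REPORTED_MACRO[name])
```

The same unweighted averaging explains the baseline row: a classifier that only ever predicts one of nine classes has a macro recall of 1/9 ≈ 0.1111, as reported on the slide.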

Editor's Notes

  1. Dedicated venues co-located with the SIGdial Meeting on Discourse and Dialogue, e.g. WOCHAT (Special Session on Chatbots and Conversational Agents) and the Natural Language Generation for Dialogue Systems special session.
  2. In particular, a recent research trend has emerged to investigate methodologies that enable intelligent access to information by relying on natural dialogue as the interaction metaphor. In this perspective, chat-oriented dialogue systems are attracting increasing attention from both researchers and practitioners interested in the simulation of natural dialogues with embodied conversational agents (Klüwer, 2011), conversational interfaces for smart devices (McTear et al., 2016) and the Internet of Things (Kar and Haldar, 2016). As a consequence, we are witnessing the flourishing of dedicated research venues on chat-oriented interaction. This is the case of WOCHAT, the Special Session on Chatbots and Conversational Agents, now at its second edition, as well as the Natural Language Generation for Dialogue Systems special session, both co-located with the Annual SIGdial Meeting on Discourse and Dialogue. While they do not represent any deep understanding of the interaction dynamics, speech acts can be successfully employed as a coding standard for natural dialogue tasks.
  3. This evaluation approach (micro- and macro-averaged precision, recall and F1), while more verbose than a simple accuracy test, arises from the need to correctly address the unbalanced distribution of labels in the dataset. Furthermore, by providing detailed performance metrics, we intend to encourage discussion on the nature of the problem and the data, as it might emerge from the participants' final reports. As a baseline, we use the most frequent label for the user speech acts (i.e., STATEMENT).
  4. One possible reason for the frequent misclassification as STATEMENT is that statements represent the majority class, thus inducing a bias in the classifiers. Another possible explanation is that dialogue moves that appear to be linguistically consistent with the typical structure of statements have been annotated differently, according to the actual communicative role they play.
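
The evaluation scheme mentioned in these notes (micro- and macro-averaged precision, recall and F, with a majority-class baseline) can be illustrated with a minimal pure-Python sketch. This is not the official iLISTEN scorer; the function names and the toy labels are assumptions made for illustration. Macro F is computed here as the harmonic mean of macro precision and macro recall, which is one of two common conventions (the other averages per-class F values), and a class that is never predicted is given precision 0.

```python
from collections import Counter


def safe_div(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return safe_div(2 * p * r, p + r)


def per_class_counts(gold, pred):
    """Return {label: (tp, fp, fn)} for every label seen in gold or pred."""
    counts = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def micro_scores(gold, pred):
    """Pool the counts over all classes, then compute P, R and F once."""
    counts = per_class_counts(gold, pred).values()
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    p, r = safe_div(tp, tp + fp), safe_div(tp, tp + fn)
    return p, r, f_score(p, r)


def macro_scores(gold, pred):
    """Average P and R over classes with equal weight; F is their harmonic mean."""
    counts = per_class_counts(gold, pred).values()
    p = sum(safe_div(tp, tp + fp) for tp, fp, fn in counts) / len(counts)
    r = sum(safe_div(tp, tp + fn) for tp, fp, fn in counts) / len(counts)
    return p, r, f_score(p, r)


def majority_baseline(train_labels, n_test):
    """Trivial classifier: always predict the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_test


if __name__ == "__main__":
    # Toy gold labels (hypothetical, not taken from the iLISTEN corpus).
    gold = ["STATEMENT"] * 3 + ["INFO-REQUEST"] * 2 + ["REJECT"]
    pred = majority_baseline(gold, len(gold))
    print("micro:", micro_scores(gold, pred))
    print("macro:", macro_scores(gold, pred))
```

On unbalanced data the micro scores of the baseline equal its accuracy, while the macro scores are pulled down by every class the baseline never predicts, which is why macro-averaged F is the more informative ranking metric here.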