Considering the subjectivity to rationalise evaluation approaches: the example of Spoken Dialogue Systems
Marianne Laurent, Philippe Bretier (Orange Labs), Ioannis Kanellos (Telecom Bretagne)
23 June 2010, QoMEX 2010, Trondheim, Norway
Spoken Dialogue Systems
Diagram components: user utterance (« I can't connect to the Internet! »); speech understanding (Automatic Speech Recognition, Spoken Language Understanding); Dialogue Manager, linked to an information system; system output (Spoken Language Generation, Text-to-Speech).
Evaluation? A complex task:
- Dynamic interactions: no comparison to an ideal (fidelity)
- Diversity of evaluator profiles, individualities and evaluation situations
Internal review of evaluation methods: ad hoc protocols depending on the evaluator profile…
Laurent, M., Bretier, P. and Manquillet, C. (2010). Ad-hoc evaluations along the lifecycle of industrial spoken dialogue systems: heading to harmonisation? In LREC 2010, Malta.
Internal review of evaluation methods: ad hoc protocols ... and on the evaluation context!
http://www.slideshare.net/MarianneLo/lrecmlaurentposter
Laurent, M., Bretier, P. and Manquillet, C. (2010). Ad-hoc evaluations along the lifecycle of industrial spoken dialogue systems: heading to harmonisation? In LREC 2010, Malta.
Toward one-size-fits-all evaluation protocols?
« Research has exerted considerable effort and attention to devising evaluation metrics that allow for comparison of disparate systems with various tasks and domains. » (Paek, 2007)
« A critical obstacle to progress in this area is the lack of a general framework for evaluating and comparing the performance of different dialogue agents. » (Walker et al., 1997)
« We see a multitude of highly interesting - but virtually incomparable - evaluation exercises, which address different aspects of quality, and which rely on different evaluation criteria. » (Möller, 2009)
Roadmap
1 Evaluation dependent on both context and evaluator
2 The evaluator as a mediator, an anthropocentric framework
3 Software implementation and anticipated added value
1 Evaluation, a rationalising contribution for a decision process
Tasks given to the observer of the same picture:
- Free examination
- Estimate material circumstances of the family
- Give the age of the people
- Surmise what the family had been doing before the arrival of the unexpected visitor
- Remember the clothes worn by the people
- Remember positions of people and objects in the room
Yarbus, A. L. (1967), Eye Movements and Vision, Plenum, New York.
1 Evaluation, a goal-driven argumentation discourse
« Process through which one defines, obtains and delivers useful pieces of information to settle between the alternative possible decisions. »
Daniel Stufflebeam, L'évaluation en éducation et la prise de décision, 1980, Ottawa, Edition NHP.
2 V-model process to define an evaluation
Top-down trend: the situation is interpreted into evaluation needs and procedure.
Bottom-up trend: value judgment, the evaluator creates a meaning.
Levels of the V (top-down step / bottom-up step):
- Nature of the decision to take / Take the final decision
- Identify the objectives / Confront the results with initial objectives (meet the objectives?)
- Define criteria / Rate on a grid of criteria (compare)
- Deduce the indicators / Process data into indicators
- List the data to capture / Capture the data
- Experimental set-up (bottom of the V)
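The two branches of the V can be sketched as a small data structure. This is only an illustration under stated assumptions, not the authors' tooling: the class names, the threshold-based criterion and the example figures are all hypothetical. The plan is specified top-down (objective, criteria, indicators, data to capture) and then read bottom-up (captured data, indicator values, criteria, objective).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Indicator:
    """Hypothetical indicator: names the data it needs and how to compute it."""
    name: str
    required_data: list[str]          # top-down: data to capture
    compute: Callable[[dict], float]  # bottom-up: process data into an indicator


@dataclass
class Criterion:
    """Hypothetical criterion: an indicator compared against a threshold."""
    description: str
    indicator: Indicator
    threshold: float  # minimum acceptable indicator value (an assumption)

    def is_met(self, data: dict) -> bool:
        return self.indicator.compute(data) >= self.threshold


@dataclass
class EvaluationPlan:
    objective: str
    criteria: list[Criterion] = field(default_factory=list)

    def data_to_capture(self) -> set[str]:
        """Top-down branch: derive the data needs from the criteria."""
        return {d for c in self.criteria for d in c.indicator.required_data}

    def objective_met(self, data: dict) -> bool:
        """Bottom-up branch: confront the results with the objective."""
        return all(c.is_met(data) for c in self.criteria)


# Illustrative plan with one criterion (all values are made up).
completion = Indicator(
    name="task_completion_rate",
    required_data=["completed_dialogues", "total_dialogues"],
    compute=lambda d: d["completed_dialogues"] / d["total_dialogues"],
)
plan = EvaluationPlan(
    objective="Decide whether to deploy the new dialogue strategy",
    criteria=[Criterion("At least 80% of tasks completed", completion, 0.8)],
)
captured = {"completed_dialogues": 45, "total_dialogues": 50}

print(sorted(plan.data_to_capture()))
print(plan.objective_met(captured))
```

Keeping the data needs derived from the criteria, rather than listed separately, mirrors the top-down reading of the slide: what gets captured follows from the decision to take.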
2 A meta-model to define evaluations
- Critical viewpoints: interaction performance, interaction quality, efficiency-related aspects, utility & usefulness, etc.
- Analysis: data-driven, goal-driven
- Data processing techniques
- Capture: log files, questionnaires, 3rd-party annotation, physio-metrics
2 A mediator within an “evaluation ecosystem”
Diagram elements: resources, system of constraints, situation, demand system, community of practice, normative system, corpus of evaluations, rationalising system.
3 Software implementation: MPOWERS (Multi Point Of vieW Evaluation Refine Studio)
Sources: log files, third-party annotations, user questionnaires, gathered in a datamart.
Functions: define KPIs; retrieval of KPIs & reports; personalised dashboards.
- Data, as collected in evaluation campaigns
- Parameters, a descriptive view on the system (ITU-T Rec. P.Supp.24: Parameters describing the interaction with SDS)
- KPIs, an analytical statistical view on the system
- Dashboards, ad hoc selection of KPIs with potential graphics
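As a rough sketch of the chain described on this slide (data, parameters, KPIs, dashboards), and not the actual MPOWERS code: the record fields, KPI definitions and evaluator profiles below are hypothetical, chosen only to show how one datamart could feed different ad hoc dashboards.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DialogueRecord:
    """One logged dialogue; field names are illustrative assumptions."""
    turns: int            # descriptive parameter, e.g. from log files
    duration_s: float     # dialogue duration in seconds
    task_completed: bool  # e.g. from third-party annotation
    satisfaction: int     # e.g. questionnaire score on a 1-5 scale


# KPIs: statistical views computed over the collected parameters.
KPIS = {
    "task_completion_rate": lambda recs: mean(
        1.0 if r.task_completed else 0.0 for r in recs
    ),
    "mean_turns": lambda recs: mean(r.turns for r in recs),
    "mean_duration_s": lambda recs: mean(r.duration_s for r in recs),
    "mean_satisfaction": lambda recs: mean(r.satisfaction for r in recs),
}

# Dashboards: ad hoc KPI selections per evaluator profile (hypothetical).
DASHBOARDS = {
    "ergonomist": ["mean_satisfaction", "mean_turns"],
    "business": ["task_completion_rate", "mean_duration_s"],
}


def build_dashboard(profile: str, records: list) -> dict:
    """Compute only the KPIs selected for the given evaluator profile."""
    return {name: KPIS[name](records) for name in DASHBOARDS[profile]}


records = [
    DialogueRecord(turns=4, duration_s=60.0, task_completed=True, satisfaction=4),
    DialogueRecord(turns=8, duration_s=120.0, task_completed=False, satisfaction=2),
]
print(build_dashboard("business", records))
print(build_dashboard("ergonomist", records))
```

The same records serve both profiles; only the KPI selection differs, which is one way to read "personalised dashboards".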
3 Added value: impact both for the individual and for the communities they belong to
Contribution & involvement
COOPERATE: contribute, as in a knowledge-farming cooperative.
Evaluation definition & refinement
CONNECT: identify and make contact with relevant people.
Retrieval of evaluation results
COLLABORATE: discuss/negotiate to converge toward common practices.
Feedback & inspiration
Communities of practice, communities of interest

Thank you. @warius