1. Considering the subjectivity to rationalise evaluation approaches: The example of Spoken Dialogue Systems. Marianne Laurent, Philippe Bretier (Orange Labs), Ioannis Kanellos (Telecom Bretagne). 23 June 2010, Qomex 2010, Trondheim, Norway
2. Spoken Dialogue Systems. Pipeline: speech input → Automatic Speech Recognition → Spoken Language Understanding → Dialogue Manager (backed by an information system) → Spoken Language Generation → Text-to-Speech → system output. Example input: « I can't connect the Internet! » Why is evaluation hard? Complex task; dynamic interactions, so no comparison to an ideal is possible (fidelity); diversity of evaluator profiles, individualities and evaluation situations.
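The component chain on this slide can be sketched as a sequence of stages. This is a minimal illustrative sketch, not the authors' system: all function names, the intent label and the canned responses are hypothetical stand-ins.

```python
# Minimal sketch of a spoken dialogue system pipeline (all names hypothetical).
# Each stage stands in for a real component: ASR, SLU, dialogue manager, SLG.

def automatic_speech_recognition(audio: str) -> str:
    # Stand-in: a real ASR engine would decode audio into a transcription.
    return audio  # here the "audio" is treated as already-transcribed text

def spoken_language_understanding(text: str) -> dict:
    # Stand-in: map the utterance to an intent the dialogue manager can use.
    intent = "connection_problem" if "connect" in text.lower() else "unknown"
    return {"intent": intent, "utterance": text}

def dialogue_manager(semantics: dict) -> str:
    # Stand-in: pick the next system action from the recognised intent.
    if semantics["intent"] == "connection_problem":
        return "ask_modem_status"
    return "ask_rephrase"

def spoken_language_generation(action: str) -> str:
    # Stand-in: turn the chosen action into a natural-language response
    # (a TTS stage would then synthesise it into speech).
    responses = {
        "ask_modem_status": "Is the light on your modem blinking?",
        "ask_rephrase": "Sorry, could you rephrase that?",
    }
    return responses[action]

def run_turn(audio: str) -> str:
    # One dialogue turn through the whole chain.
    return spoken_language_generation(
        dialogue_manager(
            spoken_language_understanding(
                automatic_speech_recognition(audio))))

print(run_turn("I can't connect the Internet!"))
```

Even in this toy form, the chain makes the evaluation problem on the slide concrete: each stage can fail independently, so there is no single ideal output to compare against.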
3. Internal review of evaluation methods: ad hoc protocols depending on the evaluator profile… Laurent, M., Bretier, P. and Manquillet, C. (2010). Ad-hoc evaluations along the lifecycle of industrial spoken dialogue systems: heading to harmonisation? In LREC 2010, Malta.
4. Internal review of evaluation methods: ad hoc protocols… and on the evaluation context! http://www.slideshare.net/MarianneLo/lrecmlaurentposter Laurent, M., Bretier, P. and Manquillet, C. (2010). Ad-hoc evaluations along the lifecycle of industrial spoken dialogue systems: heading to harmonisation? In LREC 2010, Malta.
5. Toward one-size-fits-all evaluation protocols? « Research has exerted considerable effort and attention to devising evaluation metrics that allow for comparison of disparate systems with various tasks and domains. » (Paek, 2007) « A critical obstacle to progress in this area is the lack of a general framework for evaluating and comparing the performance of different dialogue agents. » (Walker et al., 1997) « We see a multitude of highly interesting – but virtually incomparable – evaluation exercises, which address different aspects of quality, and which rely on different evaluation criteria. » (Möller, 2009)
6. Roadmap. 1 Evaluation dependent on both context and evaluator; 2 The evaluator as a mediator, an anthropocentric framework; 3 Software implementation and anticipated added value.
7. 1 Evaluation, a rationalising contribution to a decision process. Yarbus' eye-tracking experiment gave viewers different tasks over the same picture: free examination; estimate the material circumstances of the family; give the ages of the people; surmise what the family had been doing before the arrival of the unexpected visitor; remember the clothes worn by the people; remember the positions of the people and objects in the room. Yarbus, A. L. (1967), Eye Movements and Vision, Plenum, New York.
8. 1 Evaluation, a goal-driven argumentation discourse. « Process through which one defines, obtains and delivers useful pieces of information to settle between the alternative possible decisions. » Daniel Stufflebeam, L'évaluation en éducation et la prise de décision, 1980, Ottawa, Édition NHP.
9. 2 A V-model process to define an evaluation. Top-down trend (the situation is interpreted into evaluation needs and a procedure): nature of the decision to take → identify the objectives → define criteria → deduce the indicators → list the data to capture. Experimental set-up: capture the data. Bottom-up trend (value judgment: the evaluator creates a meaning): process data into indicators → note on a grid of criteria → compare, meet the objectives? → confront the results with the initial objectives → take the final decision.
10. 2 A meta-model to define evaluations. Capture: log files, questionnaires, 3rd-party annotation, physio-metrics. Analysis: data-driven vs goal-driven data processing techniques. Critical viewpoints: interaction performance, interaction quality, efficiency-related aspects, utility & usefulness, etc.
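One way to read this meta-model is as a record with three axes (capture, analysis, viewpoint) whose values are constrained to the options on the slide. The dataclass below is an illustrative assumption, not part of the authors' framework:

```python
# Sketch: the meta-model as a record describing one evaluation set-up.
# The three axes and their value sets come from the slide; the dataclass
# itself and the validity check are illustrative assumptions.

from dataclasses import dataclass

CAPTURE = {"log files", "questionnaires", "3rd-party annotation", "physio-metrics"}
ANALYSIS = {"data-driven", "goal-driven"}
VIEWPOINTS = {"interaction performance", "interaction quality",
              "efficiency-related aspects", "utility & usefulness"}

@dataclass
class EvaluationSetup:
    capture: str    # how data is collected
    analysis: str   # how data is processed
    viewpoint: str  # which quality aspect is judged

    def is_valid(self) -> bool:
        # Check the set-up against the meta-model's admissible values.
        return (self.capture in CAPTURE
                and self.analysis in ANALYSIS
                and self.viewpoint in VIEWPOINTS)

setup = EvaluationSetup("log files", "data-driven", "interaction performance")
print(setup.is_valid())
```

Two evaluations that differ on any axis are, in the slide's terms, different evaluations, which is why results across them are hard to compare.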
11. 2 A mediator within an « evaluation ecosystem »: situation, demand system, system of constraints, resources; community of practice, normative system, corpus of evaluations, rationalising system.
12. 3 Software implementation: MPOWERS (Multi Point Of vieW Evaluation Refine Studio). Data as collected in evaluation campaigns (log files, third-party annotations, user questionnaires) feed a datamart. Parameters: a descriptive view on the system (ITU-T Rec. P.Supp.24: parameters describing the interaction with SDS). KPIs: an analytical, statistical view on the system; define KPIs, retrieve KPIs & reports. Dashboards: ad hoc selection of KPIs with potential graphics, personalised dashboards.
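The datamart → KPI → dashboard flow described here can be illustrated with a toy computation over dialogue log records. The record fields, the two KPIs and the data are assumptions for illustration only, not taken from MPOWERS:

```python
# Illustrative sketch of computing KPIs from dialogue log records, in the
# spirit of a datamart -> KPI -> dashboard flow. Field names, KPIs and
# data are hypothetical, not from MPOWERS.

from statistics import mean

# Toy "datamart": one record per logged dialogue.
logs = [
    {"dialogue_id": 1, "turns": 6,  "task_completed": True},
    {"dialogue_id": 2, "turns": 11, "task_completed": False},
    {"dialogue_id": 3, "turns": 4,  "task_completed": True},
]

def kpi_task_success_rate(records):
    # Share of dialogues in which the user's task was completed.
    return mean(1.0 if r["task_completed"] else 0.0 for r in records)

def kpi_mean_turns(records):
    # Average number of turns per dialogue (a cost-style indicator).
    return mean(r["turns"] for r in records)

# A "dashboard" is then just an ad hoc selection of KPIs.
dashboard = {
    "task success rate": kpi_task_success_rate(logs),
    "mean turns": kpi_mean_turns(logs),
}
print(dashboard)
```

The separation matters: raw logs stay untouched in the datamart, KPIs are derived views, and each evaluator assembles the dashboard that matches their own viewpoint.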
13.
14. Discuss/negotiate to converge toward common practices. Feedback & inspiration: communities of practice, communities of interest.