Evaluation of Multi-user System of
Voice Interaction Using Grammars
        Elizabete Munzlinger, Fabricio da Silva Soares,
            and Carlos Henrique Quartucci Forster
                  {bety, p2p, forster}@ita.br

          ITA – Instituto Tecnológico de Aeronáutica
   EEC-I – Engenharia Eletrônica e Computação – Informática
              Divisão de Ciência da Computação
Agenda
 • Introduction
 • Grammar Design
 • Tests and Results of Accuracy
 • Conclusion
Introduction
   Exhaustive training




Fig. 1. Training the system to recognize a user’s voice through exhaustive reading of texts
Introduction
   Several contexts




           Fig. 2. Systems with particular applications and contexts
Introduction
   Domotic system



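            [Figure: the user says "Por favor, ligue a lâmpada!" ("Please, turn on the lamp!"); the system output reads Port [4], Action [true]; the figure also shows the labels "1" and "on".]

                       Fig. 3. Prototype of the Domotic system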
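Fig. 3 shows a recognized command reduced to two values, a port number and an action. As an illustration only, the sketch below assumes the application keeps the state of its up to 32 devices as a bit mask, with port 0 (the grammar's <porta00>, "tudo") standing for all devices. The class name and the bit layout are hypothetical, and the actual output to the parallel port is not modelled.

```java
// Hypothetical sketch, not the prototype's code: keeps the on/off state of up
// to 32 devices in one int. Assumes port 0 (the grammar's "tudo") means all
// devices and ports 1..32 map to bits 0..31.
final class DeviceState {
    private int bits; // bit (p - 1) is set while the device on port p is on

    /** Applies one recognized command, e.g. Port [4], Action [true]. */
    void apply(int port, boolean on) {
        if (port < 0 || port > 32) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        int mask = (port == 0) ? -1 : 1 << (port - 1); // -1 has all 32 bits set
        bits = on ? (bits | mask) : (bits & ~mask);
    }

    /** Reports whether the device on port 1..32 is on. */
    boolean isOn(int port) {
        if (port < 1 || port > 32) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return (bits & (1 << (port - 1))) != 0;
    }

    /** Current state as a bit mask. */
    int bits() {
        return bits;
    }

    public static void main(String[] args) {
        DeviceState state = new DeviceState();
        state.apply(4, true);                                      // Port [4], Action [true]
        System.out.println(state.isOn(4));                         // true
        System.out.println(Integer.toBinaryString(state.bits()));  // 1000
        state.apply(0, false);                                     // port 0: all devices off
        System.out.println(state.bits());                          // 0
    }
}
```

A single int is enough because 32 devices fit exactly in 32 bits; port 32 maps to the sign bit, which is why the "all devices" mask is -1.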
Introduction
   Domotic system




                Fig. 3. Prototype of the Domotic system
Grammar Design
   Grammar tree

   [Figure: a tree rooted at Main whose rule nodes (Rule1 to Rule8) branch down, level by level, to terminal symbols; Rule1 appears both under Main and under Rule2.]

                      Fig. 4. The grammar tree, composed of nodes
Grammar Design
      Grammar in Java Speech Grammar Format
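#JSGF V1.0 UTF-8;
// JSGF self-identifying header; UTF-8 encoding assumed for the accented words and rule names.
grammar br.ita.domovox;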
public <command> = [<introdução>] <action> [<complemento>] <object> [<complemento>] [<conclusão>];
<introdução> = [<educação>] [<complemento>] [<quem>];
<action> = <ação>;
<complemento> = [<posse>] [<outros>] [<onde>] [<tempo>] [<educação>] [<outros>];
<object> = [<indica>] [<posse>] <dispositivo>;
<conclusão> = [<introdução>];
<educação> = [<outros>] [<tratamento>] [<sistema>] [<tratamento>] [<complemento>];
<quem> = [<sujeito>] [<desejo>];
<posse> = [<outros>] [<possessivo>] [<outros>] [<sujeito>] [<outros>];
<onde> = [<lugar>] | [<outros>];
<tempo> = [<quando>] | [<outros>];
<tratamento> = por favor | faz favor | por gentileza | por obséquio | faça a gentileza | faça o favor | fazer o favor | fazer a gentileza;
<sistema> = pc | computador | notebook | máquina | sistema | domovox | sistema domovox | sistema de voz | sistema de fala | meu | cara | bicho | mano | maluco;
<sujeito> = eu | tu | ele | ela | nós | vós | eles | elas | você | vocês | mim | gente;
<desejo> = [<querer>] | [<desejar>] | [<precisar>] | [<necessitar>] | [<ir>] | [<poder>];
<querer> = quero | queres | quer | queremos | quereis | querem | querendo;
<desejar> = desejo | desejas | deseja | desejamos | desejais | desejam | desejando;
<precisar> = preciso | precisas | precisa | precisamos | precisais | precisam | precisando;
<necessitar> = necessito | necessitas | necessita | necessitamos | necessitais | necessitam | necessitando;
<ir> = vou | vais | vai | vamos | vão;
<poder> = pode | podes;
<ação> = <verdadeiro> | <falso>;
<verdadeiro> = (ligar | ligue | ativar | ative | acender | acenda) {true};
<falso> = (desligar | desligue | desativar | desative | apagar | apague) {false};
<indica> = [<artigo>] | [<indicação>];
<artigo> = o | a | os | as;
<indicação> = esse | essa | este | esta | aquele | aquela | aquilo | todos | todos os | todas as | tudo;
<dispositivo> = <porta00> | <porta01> | <porta02> | <porta03> | <porta04> | <porta05>;
<porta00> = (tudo | dispositivos | aparelhos) {0};
<porta01> = (luz | lâmpada) {1};
<porta02> = (ventilador | aparelho ventilador) {2};
<porta03> = (tv | tevê | televisão | televisor | aparelho de tv | aparelho televisor) {3};
<porta04> = (abajur | luminária | candelabro) {4};
<porta05> = (outros) {5};
<quando> = já | agora | nesse momento | nesse minuto | nesse segundo | agora mesmo;
<lugar> = aqui | aí | lá | ambiente | quarto | sala | peça | lugar | casa | apartamento | ap;
<possessivo> = meu | minha | meus | minhas | nosso | nossa | nossos | nossas | vosso | vossa | vossos | vossas | dele | dela | deles | delas | desse | dessa | desses | dessas | nesse | nessa | nesses | nessas;
<outros> = que | da | de | do | mesmo | para | pra | momento | mandando | também | inclusive | estou | aí | ô | é | ã | hum | mas | pode;
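In the grammar above, the device and action rules attach a tag in braces ({0} to {5}, {true}, {false}) to a parenthesized list of alternatives; the tags carry the values the application needs, a port number or an on/off action. A minimal sketch, assuming only the simple form <rule> = (alt1 | alt2 | ...) {tag}; used by those rules, that builds a phrase-to-tag lookup table from such lines. TagTable is a hypothetical helper, not a JSGF parser and not part of the Java Speech API; every other rule form is skipped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: handles only lines of the form
//   <rule> = (alt1 | alt2 | ...) {tag};
final class TagTable {
    private static final Pattern TAGGED_RULE =
            Pattern.compile("<[^>]+>\\s*=\\s*\\(([^)]*)\\)\\s*\\{([^}]*)\\}\\s*;?");

    /** Maps every alternative (e.g. "luz") to the tag of its rule (e.g. "1"). */
    static Map<String, String> build(List<String> ruleLines) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String line : ruleLines) {
            Matcher m = TAGGED_RULE.matcher(line.trim());
            if (!m.matches()) {
                continue; // not a simple tagged rule, so it is skipped
            }
            String tag = m.group(2).trim();
            for (String alt : m.group(1).split("\\|")) {
                table.put(alt.trim(), tag);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = build(List.of(
                "<porta01> = (luz | l\u00e2mpada) {1};",
                "<porta03> = (tv | tev\u00ea | televis\u00e3o | televisor | aparelho de tv | aparelho televisor) {3};",
                "<falso> = (desligar | desligue | desativar | desative | apagar | apague) {false}"));
        System.out.println(table.get("aparelho de tv")); // 3
        System.out.println(table.get("apague"));         // false
    }
}
```

Multi-word alternatives such as "aparelho de tv" stay intact because only the "|" separator splits them.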
Grammar Design
     Computational resources

      [Graph: CPU usage (0 to 100%) and memory allocation (0 to 1000 MB) over 0 to 3 min; CPU usage reaches 100% and memory reaches 980 MB.]

      Graph. 1. CPU and memory usage during allocation and processing of the grammar structure
Grammar Design
   Redesign of grammar

     Comando (command)
       Complemento* (complement; * = zero or more): por favor, eu, você, sistema, do,
                    preciso, meu, pode, de, quarto, a, o, ...
       Ação (action)
         Verdadeiro (true):  ligar, ativar, acender, ...
         Falso (false):      desligar, desativar, apagar, ...
       Objeto (object)
         Porta 01:  1, luz, lâmpada, ...
         Porta 02:  2, TV, televisor, televisão, ...
         ...

                          Fig. 5. The new grammar tree
Grammar Design
   Computational resources

      [Graph: CPU usage (0 to 100%) and memory allocation (0 to 1000 MB) over 0 to 3 min; memory reaches 423 MB and CPU usage is marked at 5%.]

      Graph. 2. CPU and memory usage during allocation and processing of the redesigned grammar structure
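Graphs 1 and 2 put the memory taken by the grammar structure at 980 MB before the redesign and 423 MB after it. The short calculation below makes the saving explicit; it uses only those two figures, and the class name is illustrative.

```java
import java.util.Locale;

// Percentage reduction between the memory figures of Graph. 1 and Graph. 2.
final class MemoryReduction {
    static double reductionPercent(double beforeMb, double afterMb) {
        return 100.0 * (beforeMb - afterMb) / beforeMb;
    }

    public static void main(String[] args) {
        double pct = reductionPercent(980, 423);
        System.out.printf(Locale.ROOT, "%.1f%%%n", pct); // 56.8%
    }
}
```

A reduction of about 56.8% means the redesigned grammar needs less than half the memory of the original one (423 / 980 is about 0.43).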
Grammar Design
   Representation of grammar




     Fig. 6. Grammar represented as a state machine with a recursive rule
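Fig. 6 represents the grammar as a state machine with a recursive rule. The sketch below assumes that the recursion is the repeatable complement of the redesigned tree (Complemento*) and models it as a self-loop in a three-state machine over an already-recognized word string. It does not call any speech API; CommandMachine is a hypothetical name, the complement list is a small illustrative subset of the 165 terminal symbols, and the device words and tags are taken from <porta01> to <porta04> in the JSGF grammar.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the redesigned rule
//   Comando = Complemento* Acao Complemento* Objeto Complemento*
// as a three-state machine over already-recognized words.
final class CommandMachine {
    record Command(int port, boolean on) {}

    private static final Map<String, Boolean> ACTIONS = Map.of(
            "ligar", true, "ligue", true, "acender", true, "acenda", true,
            "desligar", false, "desligue", false, "apagar", false, "apague", false);

    // Device words and tags from <porta01>..<porta04> in the JSGF grammar.
    private static final Map<String, Integer> DEVICES = Map.of(
            "luz", 1, "l\u00e2mpada", 1, "ventilador", 2,
            "tv", 3, "televisor", 3, "abajur", 4);

    // A few complement (filler) words; illustrative subset only.
    private static final Set<String> COMPLEMENT = Set.of(
            "por", "favor", "eu", "voc\u00ea", "sistema", "preciso",
            "pode", "meu", "do", "de", "quarto", "a", "o");

    private enum State { EXPECT_ACTION, EXPECT_OBJECT, DONE }

    static Optional<Command> parse(String utterance) {
        State state = State.EXPECT_ACTION;
        boolean on = false;
        int port = -1;
        // Split on any run of non-letter characters (spaces, commas, "!").
        for (String word : utterance.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
            if (word.isEmpty() || COMPLEMENT.contains(word)) {
                continue; // Complemento*: the self-loop available in every state
            }
            if (state == State.EXPECT_ACTION && ACTIONS.containsKey(word)) {
                on = ACTIONS.get(word);
                state = State.EXPECT_OBJECT;
            } else if (state == State.EXPECT_OBJECT && DEVICES.containsKey(word)) {
                port = DEVICES.get(word);
                state = State.DONE;
            } else {
                return Optional.empty(); // word not allowed in this state
            }
        }
        return state == State.DONE ? Optional.of(new Command(port, on)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("Por favor, ligue a l\u00e2mpada!")); // Optional[Command[port=1, on=true]]
        System.out.println(parse("apague o ventilador do quarto"));   // Optional[Command[port=2, on=false]]
        System.out.println(parse("a luz, ligue"));                     // Optional.empty
    }
}
```

Because complement words are accepted in every state, the machine stays small even though the complement can repeat any number of times, which is the point of flattening the morphological sub-rules into one complement rule.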
Grammar Design
   Accepted commands




    Table 1. Examples of simple and complex commands based on the grammar rules
Tests and Results of Accuracy
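   Domotic system

      [Figure: the spoken command "Por favor, ligue a lâmpada!" ("Please, turn on the lamp!") as registered during the tests, next to the same command as logged by the system.]

      Fig. 7. Comparison between the registered spoken words and the system log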
Tests and Results of Accuracy
   General acceptance rates

     All commands (simple and complex):
       Accepted without log analysis                  98%
       Disregarding definite articles                 85.7%
       Exact commands with log analysis               24.1%

                Graph. 3. Acceptance rates for all commands
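Graph. 3 separates commands that matched exactly in the log from those that matched once definite articles were disregarded. A minimal sketch of how such rates could be computed, assuming each attempt is stored as a pair of strings (what the user said, as registered, and what the system logged, as in Fig. 7). AcceptanceScorer and Attempt are hypothetical names, and the sample pairs in main are invented for illustration, not data from the study.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical scoring helper: compares what the user said (registered)
// with what the system logged, exactly and ignoring definite articles.
final class AcceptanceScorer {
    record Attempt(String registered, String logged) {}

    // Portuguese definite articles, as in <artigo> = o | a | os | as.
    private static final Set<String> DEFINITE_ARTICLES = Set.of("o", "a", "os", "as");

    /** Lower-cased words, without punctuation, optionally without definite articles. */
    private static List<String> words(String s, boolean dropArticles) {
        return Arrays.stream(s.toLowerCase(Locale.ROOT).split("\\P{L}+"))
                .filter(w -> !w.isEmpty())
                .filter(w -> !(dropArticles && DEFINITE_ARTICLES.contains(w)))
                .collect(Collectors.toList());
    }

    static boolean exactMatch(Attempt a) {
        return words(a.registered(), false).equals(words(a.logged(), false));
    }

    static boolean matchIgnoringArticles(Attempt a) {
        return words(a.registered(), true).equals(words(a.logged(), true));
    }

    /** Percentage of attempts that satisfy the given criterion. */
    static double rate(List<Attempt> attempts, Predicate<Attempt> criterion) {
        long hits = attempts.stream().filter(criterion).count();
        return 100.0 * hits / attempts.size();
    }

    public static void main(String[] args) {
        List<Attempt> attempts = List.of(
                new Attempt("ligue a luz", "ligue a luz"),
                new Attempt("ligue a luz", "ligue o luz"),
                new Attempt("desligue a tv", "desligue tv"),
                new Attempt("ligue o abajur", "desligue o abajur"));
        System.out.println(rate(attempts, AcceptanceScorer::exactMatch));            // 25.0
        System.out.println(rate(attempts, AcceptanceScorer::matchIgnoringArticles)); // 75.0
    }
}
```

Dropping the articles before comparing means a wrong or missing "o"/"a" no longer counts as an error, which is why the article-insensitive rate can only be equal to or higher than the exact rate.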
Tests and Results of Accuracy
   Acceptance rates for simple and complex commands

                                     Simple commands   Complex commands
     Definite articles accepted           10.9%             35.3%
     Definite articles correct             8.9%             33.0%

    Graph. 4. Acceptance rates of definite articles in simple and complex commands
Tests and Results of Accuracy
   Acceptance rates for numbers from 1 to 32

     Word form                 33.2%
     Numeral form              66.8%
     Only word form             0%
     Only numeral form         34%   (6, 7, 14, 19, 23, 24, 25, 26, 28, 29, 32)

           Graph. 5. Acceptance rates for numbers from 1 to 32
Tests and Results of Accuracy
   Error rates in number recognition

       • Highest error rates: 21, 27 and 31
       • Numbers mistaken for similar-sounding words:
           - 21 ("vinte e um") recognized as "20 eu"
           - 31 ("trinta e um") recognized as "30 aí eu", "30 aí vou", "30 aí o", "30 aí os",
             "30 aqui os", "30 aqui eu", "30 eu" or "30 em"
       • This happened in 70% of the cases
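Graph. 5 and the error list compare numbers recognized in word form ("vinte e um") and in numeral form ("21"). When analysing logs, both forms can be normalized to the same integer. A minimal sketch, assuming device numbers 1 to 32 and standard Brazilian Portuguese number words; PortugueseNumber is a hypothetical class, not part of the application. It also shows why a misrecognition such as "20 eu" for 21 cannot be repaired by parsing alone: it is not a well-formed number.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical normalizer for device numbers 1..32 in Brazilian Portuguese.
// Accepts numeral form ("21") or word form ("vinte e um").
final class PortugueseNumber {
    private static final Map<String, Integer> UNITS = new HashMap<>();
    static {
        String[] words = {"um", "dois", "tr\u00eas", "quatro", "cinco", "seis", "sete", "oito",
                "nove", "dez", "onze", "doze", "treze", "quatorze", "quinze", "dezesseis",
                "dezessete", "dezoito", "dezenove"};
        for (int i = 0; i < words.length; i++) {
            UNITS.put(words[i], i + 1); // "um" -> 1 ... "dezenove" -> 19
        }
        UNITS.put("uma", 1);
        UNITS.put("duas", 2);
        UNITS.put("catorze", 14);
    }

    private static final Map<String, Integer> TENS = Map.of("vinte", 20, "trinta", 30);

    static OptionalInt parse(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        if (s.matches("\\d{1,2}")) {
            return inRange(Integer.parseInt(s)); // numeral form
        }
        String[] w = s.split("\\s+");
        if (w.length == 1 && UNITS.containsKey(w[0])) {
            return inRange(UNITS.get(w[0]));
        }
        if (w.length == 1 && TENS.containsKey(w[0])) {
            return inRange(TENS.get(w[0]));
        }
        // Compound word form: <tens> "e" <unit 1..9>, e.g. "vinte e um".
        if (w.length == 3 && TENS.containsKey(w[0]) && w[1].equals("e")
                && UNITS.containsKey(w[2]) && UNITS.get(w[2]) <= 9) {
            return inRange(TENS.get(w[0]) + UNITS.get(w[2]));
        }
        return OptionalInt.empty(); // e.g. "20 eu", logged instead of 21
    }

    private static OptionalInt inRange(int n) {
        return (n >= 1 && n <= 32) ? OptionalInt.of(n) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("vinte e um")); // OptionalInt[21]
        System.out.println(parse("21"));         // OptionalInt[21]
        System.out.println(parse("20 eu"));      // OptionalInt.empty
    }
}
```

Rejecting malformed input rather than guessing keeps the log analysis honest: "20 eu" is counted as an error, not silently turned into 20 or 21.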
Conclusion
 • Behavior of a voice interface system
 • Design of grammar
 • Experiments with users
 • Redesign of grammar
References
1.   Burstein, A., Stolzle, A., Brodersen, R.W.: Using Speech
     Recognition in a Personal Communications System. In:
     Communications, 1992. ICC 92, Conference record,
     SUPERCOMM/ICC ’92, IEEE, Los Alamitos (1992)
2.   Pfaff, G.E.: User Interface Management Systems, p. 72.
     Springer, New York (1985)
3.   Seneff, S.: TINA: A Natural Language System for Spoken
     Language Applications. Comput. Linguist. 18, 61–86 (1992)
4.   Sun Microsystems: Java Speech API Programmer’s Guide,
     Version 1.0. Online at
     http://java.sun.com/products/javamedia/speech/
5.   Vieira, R., Lima, V.L.: Lingüística Computacional: Princípios e
     Aplicações. In: JAIA – ENIA, 2001, Fortaleza (2001)
Speaker notes

  1. Hello, we are from the Technological Institute of Aeronautics (ITA). Our work is titled: Evaluation of Multi-user System of Voice Interaction Using Grammars.
  2. In this paper we present an experimental study of grammar design for a voice interface system. Tests assess how the grammar design influences the accuracy and computational cost of the voice recognition system, and we propose a new composition of the grammar rules. With the redesigned grammar, we show that both characteristics can be significantly improved.
  3. Many speech recognition systems require every new user to train the system to recognize his or her voice by reading texts exhaustively. This training is needed because these systems can use very extensive vocabularies [1]. It is desirable for a system to work without training and still recognize the same words when they are spoken by different voices, with different accents [5].
  4. Applications based on command recognition do not need such an extensive vocabulary; it can be restricted to the needs of the particular application and context, for example science, computing, economics, astronomy, domotics or aeronautics. Associating grammars with the application limits the set of possible words in each context. A well-designed grammar can turn the application into a multi-user system.
  5. The grammar was used in a prototype domotic (home automation) system that controls up to 32 devices by voice. The system uses the computer’s parallel port, which is connected to an electronic circuit designed to switch the devices. IBM ViaVoice was chosen as the Automatic Speech Recognition (ASR) system because it supports Brazilian Portuguese. The domotic application was developed in Java and uses the IBM Java Speech Technology API.
  7. A grammar is built from a set of sentences separated into production rules and structured as a tree of nodes. The grammar’s nodes are held in a static structure that describes a hierarchy starting at the main (root) node, with a set of nodes that depend on it. Laid out in two dimensions, the tree shows the permissible connections between its levels, following the hierarchy down to the terminal symbols.
  8. At first we designed a general-purpose grammar (for systems with several contexts), based on the morphological analysis of Portuguese sentences and made of many rules that define, for example, verbs, subjects, courtesy expressions, pronouns and articles. For instance, the rule that defines an article comprises two sub-rules, one for definite and one for indefinite articles. In the end, the grammar had 64 sub-rules and 167 terminal symbols.
  9. This complex grammar degraded the performance of the recognition system to the point that the application could not run. Allocating and processing the grammar structure took on average 980 MB of memory, with CPU usage at 100% for up to one minute, leaving no computational resources to analyze any sentence. To solve this problem, the grammar was restructured by changing how its rules are composed, which produced a new grammar tree.
  10. The main node of the grammar is the command rule, composed of the sub-rules complement, action and object. The action rule has two sub-rules, true and false, which control the on/off state of the devices in the Domotic system. The object rule has one sub-rule for each of the 32 devices to be controlled, and it must return the acceptance value of exactly one of its sub-rules. The complement sub-rule carries no acceptance value and contains 165 terminal symbols extracted from the 35 sub-rules that had previously been separated morphologically. With this grammar, memory consumption dropped to an average of 423 MB for allocating the grammar structure, and CPU usage peaked at 100% for less than one second.
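The restructuring can be sketched as generating a flat grammar. The minimal Java sketch below merges the terminals of many small sub-rules into one `<complement>` rule and emits Java Speech Grammar Format (JSGF) text in which the `<action>` alternatives carry `{true}`/`{false}` tags and each `<object>` alternative carries its device number. The class name `FlatGrammar`, the vocabulary, the rule order inside `<command>`, and the use of JSGF tags to carry the acceptance values are all assumptions for illustration; the right-recursive `<complement>` is one way to let complement words repeat.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the restructured grammar: terminals from many small sub-rules are merged
 * into one flat <complement> rule, and <command> references only three rules.
 */
public class FlatGrammar {

    /** Merges the terminals of several sub-rules into one de-duplicated list, keeping first-seen order. */
    static Set<String> flatten(Map<String, List<String>> subRules) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> words : subRules.values()) {
            merged.addAll(words);
        }
        return merged;
    }

    /** Emits JSGF text; the tags in braces carry the value the application reads back. */
    static String toJsgf(Set<String> complement, Map<String, String> actions, List<String> devices) {
        List<String> actionAlts = new ArrayList<>();
        for (Map.Entry<String, String> e : actions.entrySet()) {
            actionAlts.add(e.getKey() + " {" + e.getValue() + "}");
        }
        List<String> objectAlts = new ArrayList<>();
        for (int i = 0; i < devices.size(); i++) {
            objectAlts.add(devices.get(i) + " {" + (i + 1) + "}");  // device numbers start at 1
        }
        return "#JSGF V1.0;\n"
                + "grammar br.ita.domovox;\n"
                // Assumed order: optional complements around a mandatory action and object.
                + "public <command> = [<complement>] <action> [<complement>] <object> [<complement>];\n"
                + "<action> = " + String.join(" | ", actionAlts) + ";\n"
                + "<object> = " + String.join(" | ", objectAlts) + ";\n"
                // Untagged and right-recursive: any run of complement words, carrying no value.
                + "<complement> = (" + String.join(" | ", complement) + ") [<complement>];\n";
    }

    public static void main(String[] args) {
        Map<String, List<String>> subRules = new LinkedHashMap<>();
        subRules.put("politeness", List.of("por favor", "obrigado"));
        subRules.put("definite", List.of("o", "a"));
        subRules.put("indicator", List.of("a", "esta"));  // "a" repeats and is kept once
        Map<String, String> actions = new LinkedHashMap<>();
        actions.put("ligue", "true");
        actions.put("desligue", "false");
        System.out.print(toJsgf(flatten(subRules), actions, List.of("luz", "ventilador")));
    }
}
```

Running `main` prints a six-line grammar whose last line is `<complement> = (por favor | obrigado | o | a | esta) [<complement>];`. The design choice mirrors the slide: the 35 small sub-rules disappear, and only three rules remain under the main node.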
  12. The grammar rules can be represented as the states of a machine, as shown in the Figure, where R1, R2 and R3 represent the rules action, object and complement, respectively. The recursion of R3 allows any sequence of complement terminal symbols to be accepted, so both the simple and the complex commands listed in the Table are recognized.
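The state-machine reading can be sketched in Java as a small acceptor. This is a minimal sketch assuming single-word terminals, the order complement* action complement* object complement*, and a hypothetical vocabulary (`ligue`/`desligue`, `luz`, `ventilador`). R3 is a self-loop in every state, R1 moves from the start state to an intermediate state, and R2 moves to the accepting state; the tag values of R1 and R2 are returned. It illustrates the idea of the figure, not the recognizer's internal machinery.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Finite-state sketch of the command grammar: R1 = action, R2 = object, R3 = complement.
 * R3 is a self-loop in every state, so any run of complement words is accepted.
 */
public class CommandMachine {

    enum State { START, AFTER_ACTION, DONE }

    /** Tag values returned when a command is accepted. */
    record Command(boolean on, int device) {}

    private final Map<String, Boolean> actions;  // R1: word -> action tag (true = on)
    private final Map<String, Integer> objects;  // R2: word -> device number
    private final Set<String> complements;       // R3: accepted, carries no value

    CommandMachine(Map<String, Boolean> actions, Map<String, Integer> objects, Set<String> complements) {
        this.actions = actions;
        this.objects = objects;
        this.complements = complements;
    }

    /** Returns the tags of R1 and R2 if the utterance is accepted, otherwise empty. */
    Optional<Command> parse(String utterance) {
        State state = State.START;
        boolean on = false;
        int device = 0;
        for (String word : utterance.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;                          // leading whitespace yields an empty token
            }
            if (complements.contains(word)) {
                continue;                          // R3 self-loop: state unchanged
            } else if (state == State.START && actions.containsKey(word)) {
                on = actions.get(word);            // R1: START -> AFTER_ACTION
                state = State.AFTER_ACTION;
            } else if (state == State.AFTER_ACTION && objects.containsKey(word)) {
                device = objects.get(word);        // R2: AFTER_ACTION -> DONE
                state = State.DONE;
            } else {
                return Optional.empty();           // no transition for this word: reject
            }
        }
        return state == State.DONE ? Optional.of(new Command(on, device)) : Optional.empty();
    }

    public static void main(String[] args) {
        CommandMachine m = new CommandMachine(
                Map.of("ligue", true, "desligue", false),
                Map.of("luz", 1, "ventilador", 2),
                Set.of("por", "favor", "a", "o", "da", "sala"));
        System.out.println(m.parse("por favor ligue a luz da sala"));  // Optional[Command[on=true, device=1]]
        System.out.println(m.parse("ligue a"));                        // Optional.empty
    }
}
```

A simple command such as "desligue o ventilador" and a complex one wrapped in politeness and location words reach the same accepting state; only the R3 loop absorbs the extra words, and a second object word after acceptance is rejected.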
  14. Initially, 16 users tested the application. Because the command acceptance rate was high, an important question arose: is the system really recognizing what the user says? To answer it, we logged every word (token) actually recognized. The log clearly showed mismatches between spoken and recognized words, and its analysis yielded the following:
  15. The acceptance rate across all simple and complex commands was 98%. However, only about 24% exactly matched what the user said; this rose to about 85% when the presence of definite articles was disregarded.
  16. Definite articles were recognized in 10.9% of the selected simple commands, and 18.6% of these were incorrect. Curiously, the rate was 35.3% for the selected complex commands, of which only 6.5% were incorrect.
  17. In tests of commands containing the numbers 1 to 32, both spelled out and as numerals, recognition alternated between the two forms. Recognition in numeral form reached a rate of 66.8%, and 34.3% of the numbers were recognized only in numeral form.
  18. The numbers with the highest recognition error rates were 21, 27 and 31. The system substituted word combinations that sound like the spoken number, such as “20 eu” (“20 I”) for 21 and “30 eu” (“30 I”) for 31, since “vinte e um” and “trinta e um” sound close to “vinte eu” and “trinta eu”. For dictations of the number 31, such substitutions occurred in 70% of cases.
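The two matching rates reported above (exact match, and match ignoring definite articles) can be computed from a log of spoken/recognized pairs. Below is a minimal Java sketch, assuming each log entry stores the reference sentence and the recognized tokens as whitespace-separated text; the `Entry` record, the method names and the sample entries are hypothetical and do not reflect the format or content of the authors' log.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch of the log analysis: compares what each user said with what was recognized. */
public class LogAnalysis {

    /** One log entry: the reference command and the recognizer's token output. */
    record Entry(String spoken, String recognized) {}

    // Portuguese definite articles; dropping them gives the article-insensitive rate.
    static final Set<String> DEFINITE_ARTICLES = Set.of("o", "a", "os", "as");

    /** Splits on whitespace, lowercases, and optionally removes definite articles. */
    static List<String> tokens(String s, boolean dropArticles) {
        return Arrays.stream(s.toLowerCase().trim().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .filter(t -> !(dropArticles && DEFINITE_ARTICLES.contains(t)))
                .collect(Collectors.toList());
    }

    /** Fraction of entries whose token sequences match exactly (optionally ignoring articles). */
    static double matchRate(List<Entry> log, boolean ignoreArticles) {
        if (log.isEmpty()) {
            return 0.0;
        }
        long matches = log.stream()
                .filter(e -> tokens(e.spoken(), ignoreArticles)
                        .equals(tokens(e.recognized(), ignoreArticles)))
                .count();
        return (double) matches / log.size();
    }

    public static void main(String[] args) {
        List<Entry> log = List.of(
                new Entry("ligue a luz", "ligue a luz"),                 // exact match
                new Entry("ligue a luz", "ligue luz"),                   // differs only by an article
                new Entry("ligue o ventilador", "desligue o ventilador"),
                new Entry("ligue a luz 21", "ligue a luz 20 eu"));       // number confusion
        System.out.println(matchRate(log, false));  // 0.25
        System.out.println(matchRate(log, true));   // 0.5
    }
}
```

Note that the 98% acceptance rate is a different measure: it counts commands the grammar accepted at all, whereas these rates count commands whose recognized words match what was actually said.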
  19. In this article we study the behavior of a voice interface system and the implications for the design of the grammars that define the voice commands. The study combined experiments with users, the redesign of a grammar with recursive rules, and a log used to analyze and adjust the grammars. We observed that a grammar with many sub-rules, even with few terminal symbols, demands more computational resources than one with few sub-rules and many terminal symbols. Therefore, a small vocabulary does not by itself guarantee lower computational cost or higher recognition accuracy. A grammar designed specifically for the application and properly tested yields better recognition accuracy, because it lets the recognizer predict the next word. This is crucial for critical and decision-making systems, where recognition must be precise. Multi-user coverage without the need for training is a fundamental feature of today's voice interfaces. Building on this study, a methodology for the automatic generation of grammars for interactive applications can be designed, with careful attention to how rules are composed. This work aims to help usher in a new era of human-machine interaction, in which technology increasingly seeks interfaces that are more natural to people.
  20. References
  21. That is all. Thank you. We thank CAPES for financial support, and all the colleagues who took part in the tests. I am open to questions.