
Nishimoto icchp2010


Published in: Technology, Education

  1. Evaluations of Deletion-Based Method and Mixing-Based Method for Audio CAPTCHAs. Takuya NISHIMOTO (Univ. Tokyo, Japan), Takayuki WATANABE (TWCU, Japan). @nishimotz
  2. CAPTCHA
     • Completely Automated Public Turing test to tell Computers and Humans Apart
       • a popular security technique on the Web
       • prevents abuse by automated programs
     • image-based CAPTCHAs
       • an image containing distorted characters
       • cannot be used by persons with visual disabilities
       • audio CAPTCHAs were created as an alternative
     • goal: create better audio CAPTCHA tasks
       • safeness: the difference in recognition performance between humans and machines
       • usability: the mental workload of a human listening to the speech
  3. Performance gap model
     • machine performance should be lower than human intelligibility
     • gap (safeness): should be large
     • exposed ratio (ER): the amount of provided information
       • 0%: random answer; chance level for both; no gap
       • 100%: best guess; easy for both; no gap
     • practical condition: 0 < ER < 100
     • [Figure: human intelligibility and ASR performance (%, 0 to 100) versus Exposed Ratio (%, 0 to 100; provided information)]
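The gap model can be written as a small calculation: at each exposed ratio, safeness is human intelligibility minus machine accuracy. The sketch below assumes both curves are available as dictionaries keyed by exposed ratio. The function names and the numbers are hypothetical placeholders, not results from the slides.

```python
def safeness_gap(human_pct, machine_pct):
    """Return the human-minus-machine gap (percentage points) per exposed ratio.

    Both arguments map an exposed ratio (0-100) to a recognition rate (0-100).
    Only exposed ratios present in both mappings are compared.
    """
    common = sorted(set(human_pct) & set(machine_pct))
    return {er: human_pct[er] - machine_pct[er] for er in common}


def best_exposed_ratio(human_pct, machine_pct):
    """Return the exposed ratio at which the gap is largest."""
    gaps = safeness_gap(human_pct, machine_pct)
    return max(gaps, key=gaps.get)


# Hypothetical illustration values only (not measured data).
human = {0: 0.0, 50: 80.0, 100: 100.0}
machine = {0: 0.0, 50: 30.0, 100: 100.0}

print(safeness_gap(human, machine))
print(best_exposed_ratio(human, machine))
```

With these placeholder curves the gap vanishes at both ends of the ER axis and peaks in between, which is the shape the slide argues for.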
  4. Safeness: ER control
     • machines are becoming stronger
       • statistical ASR methods are the mainstream
       • supervised machine learning (Hidden Markov Models)
       • techniques to cope with noise
     • CAPTCHA tasks should be created systematically
       • not by trial and error
       • controllability of the Exposed Ratio is essential
     • Mixing-based method: the best way to control ER?
       • mixing noises / distorting signals
       • can hide a portion of the information, however...
       • the ER is difficult to measure, so performance is not easy to predict
       • alternatives must be investigated
  5. Usability: Mental workload
     • CAPTCHAs should not increase mental workload
     • the workload may increase if the task is
       • difficult to listen to or memorize
       • long (many characters): difficult to remember; safer, but higher mental workload
     • requirements
       • information can be obtained easily, in a short time
     • investigation required
       • human auditory sensation
       • language cognition
  6. Top-down knowledge
     • incomplete stimulus: knowledge helps to guess the information
     • visual sensation
       • if part of an image is missing, or part of a word is hidden
       • common knowledge about the characters and the vocabulary can complement the image
     • speech perception
       • if "word familiarity" is high: easy to guess
     • phonemic restoration
       • may help human listening
  7. Deletion-based method
     • delete parts of the signal along the temporal axis, little by little
       • if 30 msec out of every 100 msec period is replaced with silence, 30% of the information is deleted
       • as the ratio of remaining sections goes down, the listening difficulty may increase
     • the Exposed Ratio can be controlled easily
     • however, the result is not easy to understand
     • deletion vs. original: Festival engine KAL (HMM-based)
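The periodic deletion described above can be sketched in a few lines. This is a minimal illustration, assuming the signal is a list of float samples and that the silent section sits at the start of each period (the slide does not say where in the period it falls). The names `delete_periodically` and `exposed_ratio` are hypothetical.

```python
def delete_periodically(samples, sample_rate, period_ms=100, delete_ms=30):
    """Replace the first `delete_ms` of every `period_ms` window with silence.

    Assumption: the deleted section starts at the beginning of each period.
    """
    period = int(sample_rate * period_ms / 1000)
    deleted = int(sample_rate * delete_ms / 1000)
    if period <= 0 or not 0 <= deleted <= period:
        raise ValueError("need 0 <= delete_ms <= period_ms and a non-empty period")
    return [0.0 if (i % period) < deleted else s for i, s in enumerate(samples)]


def exposed_ratio(period_ms=100, delete_ms=30):
    """Percentage of the signal that remains audible."""
    return 100.0 * (period_ms - delete_ms) / period_ms


# One second of a constant dummy signal at 1 kHz, just to show the pattern.
processed = delete_periodically([1.0] * 1000, sample_rate=1000)
print(exposed_ratio())  # 30 ms deleted out of every 100 ms
```

Because the deleted fraction is set directly by `delete_ms / period_ms`, the exposed ratio is known by construction, which is the controllability the slide refers to.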
  8. Phonemic restoration
     • interrupted speech and noise maskers combined
       • the fence effect
       • continuity of the speech signal is perceived
       • may help human listening
       • does not affect machine performance
     • expected to enlarge the gap
       • the performance difference between human and machine
     • deletion + phonemic restoration
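One way to combine interrupted speech with a noise masker is to fill each deleted section with noise instead of silence. The sketch below assumes uniform white noise, a fixed amplitude, and deletion at the start of each period. None of these details are given in the slides, and `fill_gaps_with_noise` is a hypothetical name.

```python
import random


def fill_gaps_with_noise(samples, sample_rate, period_ms=100, delete_ms=30,
                         noise_amp=0.5, seed=0):
    """Replace the first `delete_ms` of every `period_ms` window with noise.

    Assumptions (not from the slides): uniform white noise in
    [-noise_amp, noise_amp], and gaps at the start of each period.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    period = int(sample_rate * period_ms / 1000)
    deleted = int(sample_rate * delete_ms / 1000)
    if period <= 0 or not 0 <= deleted <= period:
        raise ValueError("need 0 <= delete_ms <= period_ms and a non-empty period")
    out = []
    for i, s in enumerate(samples):
        if i % period < deleted:
            out.append(rng.uniform(-noise_amp, noise_amp))  # masker in the gap
        else:
            out.append(s)  # speech is left untouched outside the gap
    return out


masked = fill_gaps_with_noise([1.0] * 200, sample_rate=1000)
```

The speech information removed is the same as with silent gaps. Only the filler changes, which is why the slide expects an effect on human listeners rather than on the recognizer.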
  9. NASA-TLX
     • evaluation of mental workload
       • rating on 6 subscales: Mental, Physical, and Temporal Demands, Frustration, Effort, and Performance
       • range: 0-100
     • weights of the subscales (6 to 1)
       • determined for each participant
       • by ranking the 6 dimensions according to how they relate to the participant's personal definition of workload
     • result: weighted workload (WWL)
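The weighted workload can be sketched as a weighted average of the six ratings. The code below follows the rank-based weighting on this slide (weights 6 down to 1) and assumes the score is divided by the sum of the weights so that it stays on the 0-100 scale. The original NASA-TLX procedure derives weights from pairwise comparisons instead, and the names used here are hypothetical.

```python
SUBSCALES = ("mental", "physical", "temporal",
             "performance", "effort", "frustration")


def weighted_workload(ratings, ranking):
    """Compute a rank-weighted workload score on the 0-100 scale.

    ratings: dict mapping each subscale to a rating in 0-100.
    ranking: the six subscales ordered from most to least relevant
             to the participant's own notion of workload.
    """
    if len(ranking) != len(SUBSCALES) or set(ranking) != set(SUBSCALES):
        raise ValueError("ranking must list each subscale exactly once")
    # The most relevant subscale gets weight 6, the least relevant weight 1.
    weights = {name: len(ranking) - pos for pos, name in enumerate(ranking)}
    total = sum(weights.values())  # 6 + 5 + 4 + 3 + 2 + 1 = 21
    return sum(ratings[name] * weights[name] for name in SUBSCALES) / total


# Hypothetical participant: every subscale rated 50.
flat = {name: 50 for name in SUBSCALES}
print(weighted_workload(flat, list(SUBSCALES)))
```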
  10. Deletion vs Mixing (Exp1)
     • objective: compare intelligibility and mental workload
       • Deletion-Based Method (DBM)
       • Mixing-Based Method (MBM)
       • effect of SNR (signal-to-noise ratio) in MBM
     • human intelligibility test
       • 75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
       • Japanese recorded speech
       • subjects: 15 (5 x 3) undergraduate students
     • mental workload (WWL) by NASA-TLX
       • normalized within each subject
       • so that the average and SD become 50 and 10, respectively
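The within-subject normalization amounts to a linear rescaling of each subject's scores to mean 50 and SD 10. The sketch below assumes the population standard deviation is used (the slides do not say which estimator was applied). `normalize_within_subject` is a hypothetical name.

```python
import statistics


def normalize_within_subject(scores, target_mean=50.0, target_sd=10.0):
    """Rescale one subject's scores to the given mean and SD.

    Assumption: population SD (statistics.pstdev), so the output SD is
    exactly `target_sd`. If all scores are equal, every value maps to the mean.
    """
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        return [target_mean for _ in scores]
    return [target_mean + target_sd * (s - mean) / sd for s in scores]


# Hypothetical raw WWL scores of one subject across three conditions.
print(normalize_within_subject([20.0, 40.0, 60.0]))
```

After this step, scores from subjects who use the rating scale very differently can be compared on a common scale.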
  11. Setup (Exp1)
     • compare DBM and MBM within a person
       • acoustic presentation: given by headphones
       • at the subject's preferred reference loudness level
     • MBM disturbing signals
       • utterances of Japanese sentences, fragmented into short periods, shuffled, and combined
     • groups and trials (Trial 1: D30; Trial 2: M0, Mm10, or Mm20)
       • G1: Trial 1 DBM 30%; Trial 2 MBM SNR 0dB
       • G2: Trial 1 DBM 30%; Trial 2 MBM SNR -10dB
       • G3: Trial 1 DBM 30%; Trial 2 MBM SNR -20dB
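Mixing a disturbing signal at a target SNR comes down to scaling the noise so that 10·log10(P_speech / P_noise) equals the requested value. The sketch below is a generic illustration under that definition, using mean squared amplitude as power. It does not reproduce the shuffled-utterance masker used in the experiment, and the function names are hypothetical.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in samples) / len(samples)


def noise_gain_for_snr(speech, noise, snr_db):
    """Gain to apply to `noise` so that speech-to-noise power ratio is `snr_db`."""
    n = min(len(speech), len(noise))
    if n == 0:
        raise ValueError("speech and noise must be non-empty")
    p_speech = mean_power(speech[:n])
    p_noise = mean_power(noise[:n])
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Solve 10*log10(p_speech / (gain**2 * p_noise)) = snr_db for gain.
    return math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))


def mix_at_snr(speech, noise, snr_db):
    """Add scaled noise to speech; the output has the shorter input's length."""
    gain = noise_gain_for_snr(speech, noise, snr_db)
    return [s + gain * v for s, v in zip(speech, noise)]


# Dummy signals: at -20 dB the noise carries 100 times the speech power.
speech = [1.0, -1.0] * 50
noise = [0.5] * 100
mixed = mix_at_snr(speech, noise, snr_db=-20)
```

Note that the SNR fixes only the power ratio. How much speech information the listener or the recognizer can still recover at a given SNR is a separate question.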
  12. Performance (Exp1)
     • DBM (T1): marginally significant (p<0.1) (G1>G2)
     • the DBM 30% task is harder than MBM 0dB, -10dB, -20dB
     • MBM (T2): the effect of SNR conditions is significant, however, only between 0dB & -10dB (p<0.05) (G1>G2)
     • [Figure: T1 and T2 scores (vertical axis 30 to 100) for each subject (s101-s105, s201-s205, s301-s305); panels: DBM 30% vs MBM 0dB, DBM 30% vs MBM -10dB, DBM 30% vs MBM -20dB]
  13. Workload (Exp1)
     • WWL: individual differences cancelled
       • the DBM (D30) score was subtracted from the MBM (M0, Mm10 and Mm20) score
     • [Figure: WWL difference (vertical axis -60 to 20) for each subject (s101-s105, s201-s205, s301-s305); panels: DBM 30% vs MBM 0dB, DBM 30% vs MBM -10dB, DBM 30% vs MBM -20dB]
     • WWL: MBM 0dB < DBM 30%? no significance (ANOVA)
     • MBM: task difficulty is not easy to control
  14. Human vs Machine (Exp2)
     • the deletion-based method (DBM) is evaluated
     • automatic speech recognition using HMM
       • task: numbers (1-7 digits) in Japanese
       • training: 8440 utterances, 18 states, 20 mixtures
       • evaluation: 1001 utterances, sentence recognition
     • human intelligibility test
       • 75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
       • subjects: 17 undergraduate students
     • mental workload (WWL) by NASA-TLX
       • normalized within each subject
  15. Results (Exp2)
     • DBM: the Exposed Ratio can control the gap size
     • [Figure: Human Ave. (%) and Machine (%) (axis 30 to 100) and Workload (axis 30 to 70) at 30%, 50%, and 70%]
     • DBM 30%: the gap is very large, however, the workload is very high
     • significant difference (p<0.05)
  16. Conclusion
     • audio CAPTCHA task using phonemic restoration
       • deletion-based method (DBM)
     • evaluation of CAPTCHA tasks
       • performance + mental workload (NASA-TLX)
     • comparison between DBM and MBM
       • DBM: easier to control the task
     • future work
       • ASR evaluation of the mixing-based method
       • improve the noise
       • investigation of phonemic restoration: really improving performance? only decreasing workload?
       • word familiarity, speech rate, synthesized speech, ...