Nishimoto icchp2010

Evaluations of Deletion-Based Method
and Mixing-Based Method
for Audio CAPTCHAs

Takuya NISHIMOTO (Univ. Tokyo, Japan)
Takayuki WATANABE (TWCU, Japan)
@nishimotz

1

CAPTCHA
 Completely Automated Public Turing test
to tell Computers and Humans Apart
 popular security techniques on the Web
 prevent automated programs from abusing
 image-based CAPTCHAs
 image containing distorted characters
 prevents use by persons with visual disabilities
 audio CAPTCHAs were created
 create better audio CAPTCHA tasks
 safeness: the difference in recognition performance between humans and machines
 usability: mental workload of humans when listening to speech

2

Performance gap model
 performance of machine should be lower
 than the intelligibility of human
 gap: safeness
 should be large
 exposed ratio (ER)
 0%: random answer
 chance-level; no gap
 100%: best guess
 easy for both; no gap
 practical condition
 0 < ER < 100
[Figure: Intelligibility (%) of Human and ASR versus Exposed Ratio (%) (Provided Information), both axes from 0 to 100]
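The gap described on this slide can be written compactly. The symbols below are illustrative assumptions, not notation from the slides: I_human is human intelligibility and P_machine is ASR performance, both in percent at a given exposed ratio.

```latex
\mathrm{gap}(ER) = I_{\mathrm{human}}(ER) - P_{\mathrm{machine}}(ER), \qquad 0 < ER < 100
```

At ER = 0 both sides are at chance level and at ER = 100 both are near their best, so the gap is expected to vanish at both ends.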
3

Safeness: ER control
 machines are becoming stronger
 statistical ASR methods are the mainstream
 supervised machine learning (Hidden Markov Models)
 techniques to cope with noise
 CAPTCHA tasks should be created systematically
 it should not be created by trial and error
 controllability of Exposed Ratio is essential
 Mixing-based method: best way to control ER?
 mixing noises / distorting signals
 can hide portion of information, however...
 difficult to measure the ER, performance is not easy to predict
 alternatives must be investigated
4

Usability: Mental workload
 CAPTCHAs should not increase mental workload
 the workload may increase, if they are..
 difficult to listen / memorize the task
 long task (many characters)
 difficult to remember
 safer, but higher mental workload
 requirements
 information can be obtained in short time, easily
 investigation required
 human auditory sensation
 language cognition

5

Top-down knowledge
 incomplete stimulus
 knowledge helps to guess the information
 visual sensation:
 if part of image is missing, or part of the word is hidden
 common knowledge can complement image
 about the character and the vocabulary
 speech perception:
 if "word familiarity" is high: easy to guess
 phonemic restoration
 may help the human listening

6

Deletion-based method
 delete some parts on temporal axis little by little
 if 30 msec out of every 100 msec is replaced
with silence, 30% of the information is deleted
 if the ratio of remaining sections goes down, the degree of
listening difficulty may increase
 Exposed Ratio can be controlled easily
 however, not easy to understand....
[Audio sample: deletion (original); Festival engine, KAL (HMM-based)]
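The periodic deletion described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice to silence the first part of each window, and treating the exposed ratio as the complement of the deleted fraction are all assumptions.

```python
def delete_periodic(samples, sample_rate, period_ms=100, deleted_ms=30):
    """Replace the first `deleted_ms` of every `period_ms` window with silence.

    Which part of the window is silenced is an assumption of this sketch.
    """
    period = sample_rate * period_ms // 1000    # samples per window
    deleted = sample_rate * deleted_ms // 1000  # silenced samples per window
    out = list(samples)
    for i in range(len(out)):
        if i % period < deleted:
            out[i] = 0.0
    return out


def exposed_ratio(period_ms=100, deleted_ms=30):
    """Percentage of the signal left untouched (assumed complement of deletion)."""
    return 100.0 * (period_ms - deleted_ms) / period_ms
```

With the slide's example (30 msec deleted in every 100 msec), 30% of the samples are silenced and 70% remain.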
7

Phonemic restoration
 interrupted speech and noise maskers combined
 the fence effect
 continuity of speech signal perceived
 may help human listening
 does not affect machine performance
 expected to enlarge the gap
 performance difference of human and machine

[Audio sample: deletion + phonemic restoration]
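One way to combine interrupted speech with a noise masker is to fill the deleted sections with noise instead of silence. The sketch below assumes uniform white noise at a fixed amplitude; the slides do not specify the masker, so `noise_amplitude`, `seed`, and the function name are hypothetical.

```python
import random


def fill_gaps_with_noise(samples, sample_rate, period_ms=100, deleted_ms=30,
                         noise_amplitude=0.1, seed=0):
    """Replace the first `deleted_ms` of every `period_ms` window with noise.

    The masker here is uniform white noise, an assumption of this sketch.
    """
    rng = random.Random(seed)
    period = sample_rate * period_ms // 1000    # samples per window
    deleted = sample_rate * deleted_ms // 1000  # masked samples per window
    out = list(samples)
    for i in range(len(out)):
        if i % period < deleted:
            out[i] = rng.uniform(-noise_amplitude, noise_amplitude)
    return out
```

The speech information removed is the same as with silence; only the content of the gaps changes, which is why the slide expects machine performance to be unaffected.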

8

NASA-TLX evaluation
 mental workload
 rating 6 subscales
 Mental, Physical, and Temporal
Demands, Frustration, Effort, and
Performance
 range: 0-100
 weights of subscales (6-1)
 for each participant
 by ranking how strongly the 6 dimensions relate
to their personal definition of workload
 weighted workload (WWL)
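The weighted workload can be sketched as a rank-weighted average of the six ratings. This assumes the weights 6 to 1 come from each participant's ranking and that the sum is divided by the total weight (21) so the score stays in the 0-100 range; the slides do not state the normalization, and the function name is hypothetical.

```python
SUBSCALES = [
    "Mental Demand", "Physical Demand", "Temporal Demand",
    "Frustration", "Effort", "Performance",
]


def weighted_workload(ratings, ranking):
    """Compute a WWL score.

    ratings: dict mapping each subscale to a 0-100 rating.
    ranking: list of all six subscales, most relevant to workload first.
    """
    if sorted(ranking) != sorted(SUBSCALES):
        raise ValueError("ranking must contain each subscale exactly once")
    n = len(ranking)
    # Most relevant subscale gets weight 6, least relevant gets weight 1.
    weights = {name: n - pos for pos, name in enumerate(ranking)}
    total = sum(weights.values())  # 21 for six subscales
    return sum(weights[name] * ratings[name] for name in SUBSCALES) / total
```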

9

Deletion vs Mixing (Exp1)
 objective: compare intelligibility and mental workload
 Deletion-Based Method (DBM)
 Mixing-Based Method (MBM)
 effect of SNR (signal-to-noise ratio) in MBM
 human intelligibility test
 75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
 Japanese recorded speech
 subjects: 15 (5 x 3) undergraduate students
 mental workload (WWL) by NASA-TLX
 normalized within every subject
 so that each subject's average and SD become 50 and 10, respectively
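The per-subject normalization can be sketched as a linear rescaling of one subject's WWL scores. Using the population standard deviation is an assumption; the slides do not say which estimator was used, and the function name is hypothetical.

```python
import statistics


def normalize_scores(scores, target_mean=50.0, target_sd=10.0):
    """Rescale one subject's scores to the target mean and SD."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)  # population SD (assumption)
    if sd == 0:
        # All scores identical: no spread to rescale.
        return [target_mean for _ in scores]
    return [target_mean + target_sd * (s - mean) / sd for s in scores]
```

Applying this to each subject separately removes differences in how individuals use the rating scale, so scores can be compared across subjects.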

10

Setup (Exp1)
 compare DBM and MBM within a person
 acoustic presentation: given by headphone
 at the subject’s preferred reference loudness level
 MBM disturbing signals
 utterances of Japanese sentences,
fragmented into short segments, shuffled, and combined
Group   Trial 1 (D30)   Trial 2 (M0, Mm10, Mm20)
G1      DBM 30%         MBM SNR 0dB
G2      DBM 30%         MBM SNR -10dB
G3      DBM 30%         MBM SNR -20dB
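Mixing a disturbing signal at a target SNR, as in the MBM conditions above, can be sketched by scaling the noise before adding it to the speech. This is a generic illustration using mean power over the whole signal; the names are hypothetical and the slides do not describe how the SNR was measured.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a signal."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the given SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # SNR(dB) = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

At -10dB the scaled noise carries ten times the power of the speech, and at -20dB one hundred times.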

11

Performance (Exp1)
 DBM (T1): marginally significant (p<0.1) (G1>G2)
 DBM 30% task is harder than MBM 0dB, -10dB, -20dB
 MBM (T2): effect of SNR condition is significant, but
only between 0dB and -10dB (p<0.05) (G1>G2)
[Figure: performance per subject (s101 to s305) in T1 and T2, axis from 30 to 100; panels: DBM 30% vs MBM 0dB, DBM 30% vs MBM -10dB, DBM 30% vs MBM -20dB]

12

Workload (Exp1)
 WWL: individual difference cancelled
 the DBM (D30) score was subtracted
from the MBM (M0, Mm10, or Mm20) score
[Figure: per-subject WWL difference (MBM minus DBM) for s101 to s305, axis from -60 to 20; panels: DBM 30% vs MBM 0dB, DBM 30% vs MBM -10dB, DBM 30% vs MBM -20dB]
 WWL: MBM 0dB < DBM 30% ?
 no significance (ANOVA)
 MBM: task difficulty is not easy to control

13

Human vs Machine (Exp2)
 deletion-based method (DBM) is evaluated
 automatic speech recognition using HMM
 task: numbers (1-7 digits) in Japanese
 training: 8440 utterances, 18 states, 20 mixtures
 evaluation: 1001 utterances, sentence recognition
 human intelligibility test
 75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
 subjects: 17 undergraduate students
 mental workload (WWL) by NASA-TLX
 normalized within every subject

14

Results (Exp2)
 DBM: Exposed Ratio can control the gap size
[Figure: Human Ave. (%) and Machine (%) at DBM 30%, 50%, 70% (axis from 30 to 100); Workload at DBM 30%, 50%, 70% (axis from 30 to 70); annotated "Significant difference (p<0.05)"]
 DBM 30%: gap is very large, however, workload is very high

15

Conclusion
 audio CAPTCHA task using phonemic restoration
 deletion-based method (DBM)
 evaluation of CAPTCHA task
 performance + mental workload (NASA-TLX)
 comparison between DBM and MBM
 DBM: easier to control the task
 future work
 ASR evaluation of mixing-based method
 improve the noise
 investigation of phonemic restoration
 really improving performance? only decreasing workload?
 word familiarity, speech rate, synthesized speech, ...
16
