4. Ultimate Goal of Spelling Correction
Reducing spelling errors while the user types the same way
as before
Reducing spelling errors that occur at borders between keys
2016-06-14 4
5. Cause of Spelling Error
Differences among individuals' touch distributions
The difference between a key’s area of recognition and an
individual’s touch distribution
6. Review
Machine Learning
Learn through training data
Supervised Learning
Knowing a user's intention is the key to spelling correction
Supervised model
- Refined input & answer information
7. Review (Cont’d)
Problem
Difficult to differentiate which key the user pressed when he or she
presses the border between keys
Other Algorithms
By tracking backspace
- Inferring the answer information
- Learning through supervised learning
Low accuracy
8. Semi-supervised Learning
Supervised learning
A small amount of labeled data (the answer information)
Unsupervised learning
A large amount of unlabeled data (the distribution of pressed keys)
A model that can learn without the answer information when
a user presses the borders between keys
9. Clustering Algorithm
Grouping similar objects into the same group
Distribution-based clustering
Gaussian mixture models
- Using the Expectation-Maximization algorithm
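The Gaussian-mixture-plus-EM idea can be sketched in pure Python for a single touch axis. A minimal sketch; the key positions, spread, and sample counts below are illustrative assumptions, not measured touch data:

```python
import math
import random

def em_two_gaussians(xs, iters=50):
    """Fit a two-component 1-D Gaussian mixture to xs with the EM algorithm."""
    mu = [min(xs), max(xs)]        # initialize means at the extremes
    var = [1.0, 1.0]
    w = [0.5, 0.5]                 # mixture weights
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-((x - mu[k]) ** 2) / (2 * var[k]))
                 for k in range(2)]
            s = p[0] + p[1]
            resp.append((p[0] / s, p[1] / s))
        # M-step: re-estimate weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
    return mu, var, w

random.seed(0)
# hypothetical touch x-coordinates around two key centers ('f' near 10, 'g' near 20)
xs = ([random.gauss(10, 1.5) for _ in range(200)]
      + [random.gauss(20, 1.5) for _ in range(200)])
mu, var, w = em_two_gaussians(xs)
print(sorted(round(m, 2) for m in mu))
```

The recovered means land near the two key centers, which is what lets the model describe each user's personal touch distribution per key.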
10. Clustering Algorithm (Cont’d)
Data near the key center
The user intended that key
Used directly to train the model
Data on key borders
Fed into the clustering algorithm
- Widens a key's area of recognition
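The split above between trusted near-center touches and ambiguous border touches can be sketched as follows. This is a simplified 1-D sketch; the key coordinates, radius threshold, and touch points are all hypothetical:

```python
# Hypothetical 1-D keyboard: nominal key centers (illustrative values)
KEY_CENTERS = {"f": 10.0, "g": 20.0}
CENTER_RADIUS = 3.0   # touches this close to a center are trusted as labeled data

def split_touches(touches):
    """Separate near-center (labeled) touches from border (unlabeled) ones."""
    labeled, unlabeled = [], []
    for x in touches:
        key = min(KEY_CENTERS, key=lambda k: abs(x - KEY_CENTERS[k]))
        if abs(x - KEY_CENTERS[key]) <= CENTER_RADIUS:
            labeled.append((x, key))   # user intended that key: train directly
        else:
            unlabeled.append(x)        # border touch: left for clustering
    return labeled, unlabeled

def personalized_centers(labeled):
    """Shift each key's recognition center toward the user's own touch mean."""
    sums, counts = {}, {}
    for x, key in labeled:
        sums[key] = sums.get(key, 0.0) + x
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

touches = [9.2, 10.5, 11.1, 19.4, 20.8, 14.9, 15.2]   # last two sit on the border
labeled, unlabeled = split_touches(touches)
centers = personalized_centers(labeled)
# border touches are then assigned to the nearest personalized center,
# effectively widening a key's area of recognition toward the user's habits
resolved = [min(centers, key=lambda k: abs(x - centers[k])) for x in unlabeled]
print(labeled, unlabeled, resolved)
```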
16. Problems or Limitations
Not possible to suggest corrections on a contextual basis
When the data set is small: high error rate if false data is
mistakenly input
18. SwiftKey
Natural Language Processing (NLP) for predictions and
spelling corrections
Retroactive correction
19. NLP – Types of Errors
Non-word error (NWE)
bannana → banana
Real-word error (RWE)
Typographical
- two → tow
Cognitive
- two → too
21. Candidate Generation
Words with similar spelling
Words with similar pronunciation (for RWE)
The word itself (for RWE)
22. Candidate Generation
Words with similar spelling
Smallest edit distance between words, where the possible letter edits are
Deletion
Insertion
Substitution
Reversal (Transposition)
80% to 95% of errors are within edit distance 1
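The edit-distance-1 candidate set can be generated directly by enumerating every deletion, insertion, substitution, and reversal. A minimal sketch in the style of Norvig's well-known spell corrector; the tiny dictionary is an assumption for the example:

```python
import string

def edits1(word):
    """All strings at edit distance 1: deletions, insertions,
    substitutions, and reversals (adjacent transpositions)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    inserts = [L + c + R for L, R in splits for c in letters]
    substitutes = [L + c + R[1:] for L, R in splits if R for c in letters]
    reversals = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    return set(deletes + inserts + substitutes + reversals)

def candidates(typo, dictionary):
    """Dictionary words within edit distance 1 of the typo."""
    return sorted(edits1(typo) & dictionary)

# a tiny hypothetical dictionary covering the slide's example
WORDS = {"actress", "cress", "caress", "access", "across", "acres"}
print(candidates("acress", WORDS))
# -> ['access', 'acres', 'across', 'actress', 'caress', 'cress']
```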
23. Candidate Generation
Example (Jurafsky 2012)

Typo   | Candidate | ti | ci | Type
acress | actress   | -  | t  | Deletion
acress | cress     | a  | -  | Insertion
acress | caress    | ac | ca | Reversal
acress | access    | r  | c  | Substitution
acress | across    | e  | o  | Substitution
acress | acres     | s  | -  | Insertion
acress | acres     | s  | -  | Insertion
24. Candidate Selection
Select the candidate for which the following is greatest:

P(candidate | typo) = P(typo | candidate) P(candidate) / P(typo)   (Bayes' Theorem)
∝ P(typo | candidate) P(candidate)

P(typo) is the same for every candidate, so it can be dropped.
P(typo | candidate): Error Model
P(candidate): Language Model
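Candidate selection is then just an argmax over the product of the two models. A toy sketch for the typo "acress"; the probability values below are illustrative assumptions, not real corpus estimates:

```python
error_model = {          # P(typo | candidate): chance this edit produced "acress"
    "actress": 1.17e-4,
    "cress":   1.44e-6,
    "caress":  1.64e-6,
    "access":  2.09e-7,
    "across":  9.30e-6,
    "acres":   3.21e-5,
}
language_model = {       # P(candidate): how common the word is in English text
    "actress": 2.69e-5,
    "cress":   1.20e-7,
    "caress":  1.70e-6,
    "access":  9.16e-4,
    "across":  2.99e-4,
    "acres":   3.18e-5,
}

def best_correction(cands):
    """argmax of P(typo | c) * P(c); the constant P(typo) has been dropped."""
    return max(cands, key=lambda c: error_model[c] * language_model[c])

print(best_correction(error_model))
# -> actress
```

Note how a candidate can win either by being an easy typo to make (error model) or by being a common word (language model); the product balances the two.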
25. Candidate Selection
Language Model
Unigram Model
P(candidate)
The ratio of the frequency of candidate to the total count of words in
the training set
n-gram Model
P(candidate | word1, ..., word n-1)
The frequency of candidate conditioned on the n-1 preceding words
in the training set
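Both estimates are simple frequency ratios over a training corpus. A minimal sketch for the unigram and bigram (n = 2) cases; the tiny corpus is a made-up example:

```python
from collections import Counter

corpus = ("the actress ran across the stage "
          "the actress won an award the award show ran late").split()

# Unigram model: P(w) = count(w) / total word count
unigram = Counter(corpus)
total = len(corpus)

def p_unigram(w):
    return unigram[w] / total

# Bigram model: P(w | prev) = count(prev, w) / count(prev)
bigram = Counter(zip(corpus, corpus[1:]))

def p_bigram(w, prev):
    return bigram[(prev, w)] / unigram[prev] if unigram[prev] else 0.0

print(p_unigram("the"), p_bigram("actress", "the"))
# -> 0.25 0.5
```

In practice the counts come from a large text corpus and are smoothed so that unseen n-grams do not get probability zero.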
Uses Natural Language Processing (NLP) to predict suggestions.
The same algorithm is also used to correct spelling errors.
Retroactively corrects words by selecting the best candidate out of a list of suggestions.
Spelling errors can be classified into two types: non-word errors and real-word errors.
The difference between them is whether the misspelled word is in the dictionary or not.
To correct a non-word error, you first have to detect it.
As mentioned before, if the word is not in the dictionary, then it is indeed a non-word error. So in this case, the bigger the dictionary, the better the detection.
Next, you generate a list of candidates.
And finally, out of the candidates, you select the one which is the best.
In the step of generating candidates we make a list of words that includes the following:
words with similar spelling, words with similar pronunciation, and the word itself.
The last two are for real word errors.
In the case of words with similar spelling, we would find words in the dictionary that have the minimal edit distance to the misspelled word.
The edit distance between two words is the total count of deletions, insertions, substitutions, and reversals (transpositions) needed to turn one into the other.
It is statistically known that more than 80 percent of errors are within an edit distance of 1, and almost all are within 2.
So a simplified spell checker program would generate a list of words with an edit distance of 1.
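The distance just described (deletions, insertions, substitutions, plus adjacent transpositions) can be computed with a small dynamic-programming table. This is the optimal-string-alignment variant of Damerau-Levenshtein distance, offered as a sketch:

```python
def edit_distance(a, b):
    """Minimum number of deletions, insertions, substitutions, and
    adjacent transpositions (reversals) turning a into b."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

print(edit_distance("acress", "actress"),   # deletion of t   -> 1
      edit_distance("acress", "caress"),    # reversal ac/ca  -> 1
      edit_distance("two", "too"))          # substitution    -> 1
```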
Here is an example of the typo "acress".
We have the candidates actress, cress, caress, access, across, and acres at an edit distance of 1, plus the word acress itself.
We can see the types in this column, along with ti and ci, whose purpose we will explain later.
Language Model: "how likely is candidate to appear in an English text?"
Error Model: "how likely is it that the author would type typo by mistake when candidate was intended?"