Auto Correction for Mobile Typing
2016320172 Chan Ho Jun
2016320177 Hyeon Min Park
2016160040 Sun Mook Choi
2016-06-14 1
Contents
• Algorithm Research
• Nota Keyboard
• SwiftKey
• Conclusion
• Reference
ALGORITHM RESEARCH
Chapter 1
Ultimate Goal of Spelling Correction
• Reduce spelling errors without requiring the user to change the way they type
• Reduce spelling errors that occur at the borders between keys
Causes of Spelling Errors
• Touch distributions differ from one individual to another
• A key's recognition area does not match an individual's touch distribution
Review
• Machine Learning
  • Learns from training data
• Supervised Learning
  • Knowing the user's intention is the key to spelling correction
  • Supervised model
    - Refined input and answer information
Review (Cont'd)
• Problem
  • When the user presses the border between keys, it is difficult to tell which key was intended
• Other Algorithms
  • Tracking backspace
    - Infer the answer information
    - Learn through supervised learning
  • Low accuracy
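The backspace-tracking idea can be sketched as a labeling heuristic. This is a minimal illustration under stated assumptions, not the method of any particular keyboard: it assumes a hypothetical event log of `("key", char, x, y)` and `("backspace", None, None, None)` tuples, and labels an erased touch with the key the user typed in its place.

```python
from typing import List, Optional, Tuple

# Hypothetical event format (an assumption for illustration).
Event = Tuple[str, Optional[str], Optional[float], Optional[float]]


def infer_labels(events: List[Event]) -> List[Tuple[float, float, str]]:
    """Infer (x, y, intended_key) training pairs from backspace corrections.

    Heuristic: if a touch is erased by backspace and the next key typed is
    different, assume the erased touch was meant to be that replacement key.
    Touches that are never erased are taken at face value.
    """
    labeled: List[Tuple[float, float, str]] = []
    buffer: List[Tuple[float, float, str]] = []  # touches currently in the text
    erased: List[Tuple[float, float, str]] = []  # erased touches awaiting a replacement
    for kind, key, x, y in events:
        if kind == "backspace":
            if buffer:
                erased.append(buffer.pop())
        elif kind == "key":
            if erased:
                # LIFO: the most recently erased touch sits at the earliest position
                ex, ey, ekey = erased.pop()
                if ekey != key:
                    labeled.append((ex, ey, key))  # user replaced ekey with key
            buffer.append((x, y, key))
    labeled.extend(buffer)
    return labeled
```

As the slide notes, labels inferred this way are noisy (a user may also backspace because they changed their mind), which is one reason this approach has low accuracy.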
Semi-supervised Learning
• Supervised learning
  • A small amount of labeled data (the answer information)
• Unsupervised learning
  • A large amount of unlabeled data (the distribution of pressed keys)
• A model that can learn without answer information when the user presses the borders between keys
Clustering Algorithm
• Groups similar objects into the same group
• Distribution-based clustering
  • Gaussian mixture models
    - Fitted with the Expectation-Maximization (EM) algorithm
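A Gaussian mixture fitted with EM can be sketched in one dimension, for example on touch x-coordinates along a row of keys. This is a minimal stdlib sketch under simplifying assumptions: 1-D data, a fixed number of components, and initial means supplied by the caller (for instance, the key centers). A real keyboard would model 2-D touch points.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of the 1-D normal distribution N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_em_1d(
    data: List[float], init_means: List[float], iters: int = 50
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    k, n = len(init_means), len(data)
    center = sum(data) / n
    spread = sum((x - center) ** 2 for x in data) / n or 1.0
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [spread] * k  # start wide so every point gets some responsibility
    for _ in range(iters):
        # E-step: resp[i][j] = P(component j | point i)
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its responsibility-weighted points
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component lost all its points; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor prevents a collapsed component
    return weights, means, variances
```

For two clusters of touches around 0 and 10, starting the means at 2 and 8 lets EM pull them onto the cluster centers within a few iterations.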
Clustering Algorithm (Cont'd)
• Data near the key center
  • Assumed to be intended for that key
  • Used directly to train the model
• Data on key borders
  • Fed into the clustering algorithm
    - Widens a key's area of recognition
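The split between center data and border data can be sketched as a simple one-pass self-training scheme: touches within a radius of a key center are labeled with that key, a Gaussian is fitted per key from those labels, and border touches go to the key whose Gaussian explains them best. This is a simplified variant, not a full semi-supervised EM; 1-D coordinates, `radius`, and `min_var` are hypothetical choices for illustration.

```python
import math
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def split_by_distance(
    xs: List[float], centers: Dict[str, float], radius: float
) -> Tuple[Dict[str, List[float]], List[float]]:
    """Label touches within `radius` of a key center; collect the rest as border data."""
    labeled: Dict[str, List[float]] = {k: [] for k in centers}
    border: List[float] = []
    for x in xs:
        nearest = min(centers, key=lambda k: abs(x - centers[k]))
        if abs(x - centers[nearest]) <= radius:
            labeled[nearest].append(x)  # near the center: assume this key was intended
        else:
            border.append(x)  # ambiguous: left for the clustering step
    return labeled, border


def log_gauss(x: float, mu: float, var: float) -> float:
    """Log-density of N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def assign_border(
    labeled: Dict[str, List[float]], border: List[float], min_var: float = 1.0
) -> List[Tuple[float, str]]:
    """Assign each border touch to the key whose fitted Gaussian gives it the highest likelihood."""
    # Per-key (mean, variance) from the labeled center data; variance is floored.
    params = {k: (mean(v), max(pvariance(v), min_var)) for k, v in labeled.items() if v}
    return [(x, max(params, key=lambda k: log_gauss(x, *params[k]))) for x in border]


centers = {"a": 0.0, "s": 10.0}
xs = [1.0, 2.0, 3.0, 2.5, 1.5, 9.0, 10.0, 11.0, 10.5, 9.5, 5.5]
labeled, border = split_by_distance(xs, centers, radius=3.0)
print(assign_border(labeled, border))
```

In this example the touch at 5.5 lies on the s side of the geometric midpoint (5.0), but because this user's a touches cluster around 2.0, the fitted model assigns it to a, which effectively widens a's recognition area.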
NOTA KEYBOARD
Chapter 2
Statistics
Metric | Before | After | Change
Error rate | 5.52% | 4.12% | 25.4% decrease
Input speed | 292.0 presses/min | 306.1 presses/min | 4.8% increase
Backspace input | 9.19% | 7.02% | 23.6% decrease
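The percentage changes in the statistics are relative to the first value of each pair, which a few lines of arithmetic confirm (reading the first value as "before" is an assumption based on the direction of each change):

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`; negative means a decrease."""
    return (after - before) / before * 100


stats = {
    "Error rate (%)": (5.52, 4.12),
    "Input speed (presses/min)": (292.0, 306.1),
    "Backspace input (%)": (9.19, 7.02),
}
for name, (before, after) in stats.items():
    print(f"{name}: {relative_change(before, after):+.1f}%")
```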
Usage Map
• 5/8 to 6/10
Typing Video
Correction Moment
Problems or Limitations
• Cannot suggest corrections based on context
• When the data set is small, the error rate is high if incorrect data is entered by mistake
SWIFTKEY
Chapter 3
SwiftKey
• Uses natural language processing (NLP) for predictions and spelling corrections
• Retroactive correction
NLP – Types of Errors
• Non-word error (NWE)
  • bannana → banana
• Real-word error (RWE)
  • Typographical
    - two → tow
  • Cognitive
    - two → too
Correction
• NWE: Detect error → Candidate generation → Candidate selection
• RWE: Candidate generation → Candidate selection
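The two correction pipelines can be sketched as one function. The candidate generator and scorer are passed in as hypothetical callables (for example, edit-distance candidates and a noisy-channel score), and the `real_word_mode` flag is an illustrative assumption, not part of SwiftKey's published design.

```python
from typing import Callable, Iterable, Set


def correct_word(
    word: str,
    dictionary: Set[str],
    generate: Callable[[str], Iterable[str]],
    score: Callable[[str, str], float],
    real_word_mode: bool = False,
) -> str:
    """Correct one word with the NWE or RWE pipeline.

    NWE: detect (word not in dictionary), then generate and select.
    RWE: a dictionary lookup cannot detect the error, so every word goes
    straight to generation and the word itself stays a candidate.
    """
    if not real_word_mode and word in dictionary:
        return word  # detection step: a known word is not a non-word error
    candidates = {c for c in generate(word) if c in dictionary}
    if real_word_mode:
        candidates.add(word)  # "the word itself" is a candidate for RWE
    if not candidates:
        return word  # nothing to suggest
    return max(candidates, key=lambda c: score(word, c))
```

Passing the generator and scorer in keeps the pipeline shape separate from the models that fill each step.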
Candidate Generation
• Words with similar spelling
• Words with similar pronunciation (for RWE)
• The word itself (for RWE)
Candidate Generation
Words with similar spelling
• The words with the smallest edit distance to the typo, where each edit is one of:
  • Deletion
  • Insertion
  • Substitution
  • Reversal (transposition)
• 80% to 95% of errors are within edit distance 1
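Edit distance with these four edit types can be computed by dynamic programming. The sketch below uses the optimal string alignment variant of Damerau-Levenshtein distance, one common way to count a reversal of adjacent letters as a single edit. It is an illustration, not necessarily the exact variant used by the cited sources.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: deletions, insertions,
    substitutions, and adjacent reversals each cost 1."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # reversal
    return d[m][n]
```

For example, `edit_distance("acress", "caress")` is 1, since swapping the first two letters is a single reversal.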
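Candidate Generation
Example
Typo | Candidate | t_i (typo letter) | c_i (correction letter) | Type
acress | actress | - | t | Deletion
acress | cress | a | - | Insertion
acress | caress | ac | ca | Reversal
acress | access | r | c | Substitution
acress | across | e | o | Substitution
acress | acres | s | - | Insertion
acress | acres | s | - | Insertion
(The two acres rows correspond to removing either of the two final s's.)
Source: Jurafsky 2012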
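Candidates like those in the example can be generated by applying every single edit to the typo and keeping the results that are dictionary words. A minimal sketch, assuming a lowercase English alphabet and a small hypothetical dictionary:

```python
import string
from typing import Set


def edits1(word: str, alphabet: str = string.ascii_lowercase) -> Set[str]:
    """All strings one deletion, insertion, substitution, or reversal away.

    Edits are applied to the typo, so deleting a letter from the typo
    recovers a candidate where the typist inserted one (acress -> cress),
    and inserting a letter recovers a deletion (acress -> actress).
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    reversals = {left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1}
    substitutions = {left + c + right[1:] for left, right in splits if right for c in alphabet}
    insertions = {left + c + right for left, right in splits for c in alphabet}
    return deletes | reversals | substitutions | insertions


def candidates(typo: str, dictionary: Set[str]) -> Set[str]:
    """Dictionary words within edit distance 1 of the typo."""
    return edits1(typo) & dictionary


dictionary = {"actress", "cress", "caress", "access", "across", "acres", "banana"}
print(sorted(candidates("acress", dictionary)))
```

Generating edits of the typo and intersecting with the dictionary avoids comparing the typo against every dictionary word, at the cost of producing many non-words along the way.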
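Candidate Selection
• Select the candidate for which the following is greatest:
$$
P(\text{candidate} \mid \text{typo}) = \frac{P(\text{typo} \mid \text{candidate})\,P(\text{candidate})}{P(\text{typo})} \propto \underbrace{P(\text{typo} \mid \text{candidate})}_{\text{error model}}\,\underbrace{P(\text{candidate})}_{\text{language model}}
$$
• The equality is Bayes' theorem; P(typo) is the same for every candidate, so dropping it does not change which candidate is selected.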
Candidate Selection
Language Model
• Unigram model
  • P(candidate)
  • The frequency of the candidate divided by the total number of words in the training set
• n-gram model
  • P(candidate | word_1, …, word_{n-1})
  • The frequency of the candidate following the n-1 preceding words, divided by the frequency of those n-1 words, in the training set
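Both language models can be sketched with maximum-likelihood counts over a toy token list. Only the bigram case (n = 2) is shown, and there is no smoothing, so unseen bigrams get probability 0; a real system would need smoothing. The class name is hypothetical.

```python
from collections import Counter
from typing import List


class NgramLM:
    """Maximum-likelihood unigram and bigram estimates from a token list."""

    def __init__(self, tokens: List[str]):
        self.total = len(tokens)
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))

    def p_unigram(self, word: str) -> float:
        # count(word) / total number of words in the training set
        return self.unigrams[word] / self.total

    def p_bigram(self, word: str, prev: str) -> float:
        # count(prev word) / count(prev); 0.0 if prev never occurs
        if self.unigrams[prev] == 0:
            return 0.0
        return self.bigrams[(prev, word)] / self.unigrams[prev]


lm = NgramLM("the actress sang the song the actress left".split())
print(lm.p_unigram("the"), lm.p_bigram("actress", "the"))
```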
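Candidate Selection
Error Model
• Noisy channel model (Kernighan, Church, Gale 1990)
$$
P(\text{typo} \mid \text{candidate}) \approx
\begin{cases}
\dfrac{\mathrm{del}[c_{i-1}, c_i]}{\mathrm{count}[c_{i-1} c_i]}, & \text{if deletion} \\[2ex]
\dfrac{\mathrm{add}[c_{i-1}, t_i]}{\mathrm{count}[c_{i-1}]}, & \text{if insertion} \\[2ex]
\dfrac{\mathrm{sub}[c_i, t_i]}{\mathrm{count}[c_i]}, & \text{if substitution} \\[2ex]
\dfrac{\mathrm{rev}[c_i, c_{i+1}]}{\mathrm{count}[c_i c_{i+1}]}, & \text{if reversal}
\end{cases}
$$
where
• del[x, y]: count of xy typed as x
• add[x, y]: count of x typed as xy
• sub[x, y]: count of x typed as y
• rev[x, y]: count of xy typed as yx
• c_i: the edited letter in the correction
• t_i: the edited letter in the typo
• count[x]: count of x in the training set
• count[xy]: count of xy in the training set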
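The four cases can be sketched as lookups into confusion-count tables. The table layout, the `(kind, x, y)` edit encoding, and the toy counts below are hypothetical; real counts would have to be estimated from training data.

```python
from collections import Counter
from typing import Dict, Tuple


def channel_prob(
    edit: Tuple[str, str, str],
    confusion: Dict[str, Counter],
    counts: Counter,
) -> float:
    """P(typo | candidate) for one edit, following the four cases of the error model.

    edit = (kind, x, y):
      "del": x = c_{i-1}, y = c_i      -> del[x, y] / count[xy]
      "add": x = c_{i-1}, y = t_i      -> add[x, y] / count[x]
      "sub": x = c_i,     y = t_i      -> sub[x, y] / count[x]
      "rev": x = c_i,     y = c_{i+1}  -> rev[x, y] / count[xy]
    """
    kind, x, y = edit
    denom = counts[x + y] if kind in ("del", "rev") else counts[x]
    if denom == 0:
        return 0.0  # letter (pair) never seen in the training set
    return confusion[kind][(x, y)] / denom


# Hypothetical toy counts, for illustration only.
confusion = {
    "del": Counter({("c", "t"): 2}),  # "ct" typed as "c" twice
    "add": Counter(),
    "sub": Counter({("o", "e"): 3}),  # "o" typed as "e" three times
    "rev": Counter(),
}
counts = Counter({"ct": 100, "o": 1000})
print(channel_prob(("del", "c", "t"), confusion, counts))  # actress -> acress
print(channel_prob(("sub", "o", "e"), confusion, counts))  # across -> acress
```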
[Figures from Kernighan, Church, Gale 1990]
Candidate Selection
Example (Language Model: Unigram, Error Model: Noisy Channel Model)
Candidate | Frequency | P(candidate) | P(typo ∣ candidate) | P(typo ∣ candidate) × P(candidate)
actress | 9,321 | .0000230573 | .000117000 | 2.7000 × 10^-9
cress | 220 | .0000005442 | .000001440 | .00078 × 10^-9
caress | 686 | .0000016969 | .000001640 | .00280 × 10^-9
access | 37,038 | .0000916207 | .000000209 | .01900 × 10^-9
across | 120,844 | .0002989314 | .000009300 | 2.8000 × 10^-9
acres | 12,874 | .0000318463 | .000032100 | 1.0000 × 10^-9
acres | 12,874 | .0000318463 | .000034200 | 1.0000 × 10^-9
Training set: Corpus of Contemporary English (400 million words)
Under the unigram model, across (2.8 × 10^-9) narrowly outscores actress (2.7 × 10^-9).
Source: Jurafsky 2012
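Selecting the candidate is an argmax of the product of the two models. The sketch below simply reuses the probabilities from the unigram example table (Jurafsky 2012); the function and variable names are illustrative.

```python
from typing import List, Tuple

# (candidate, P(candidate), P(typo | candidate)) for the typo "acress".
# The two "acres" rows are two different edits yielding the same word.
ROWS: List[Tuple[str, float, float]] = [
    ("actress", 0.0000230573, 0.000117000),
    ("cress", 0.0000005442, 0.000001440),
    ("caress", 0.0000016969, 0.000001640),
    ("access", 0.0000916207, 0.000000209),
    ("across", 0.0002989314, 0.000009300),
    ("acres", 0.0000318463, 0.000032100),
    ("acres", 0.0000318463, 0.000034200),
]


def best_candidate(rows: List[Tuple[str, float, float]]) -> Tuple[str, float]:
    """Return (candidate, score) maximizing P(typo | candidate) * P(candidate)."""
    scored = [(cand, p_err * p_lm) for cand, p_lm, p_err in rows]
    return max(scored, key=lambda pair: pair[1])


for cand, p_lm, p_err in ROWS:
    print(f"{cand:8s} {p_err * p_lm * 1e9:.5f} x 10^-9")
print(best_candidate(ROWS))
```

The result matches the table: without context, across wins by a small margin, even though actress is the intended word in the example sentence.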
Candidate Selection
Example (Language Model: Bigram)
• "… a stellar and versatile acress whose combination of sass and glamour …"
Training set: Corpus of Contemporary English (400 million words)
P(actress|versatile) = .000021    P(whose|actress) = .0010
P(across|versatile) = .000021    P(whose|across) = .000006
P(actress|versatile) × P(whose|actress) = .000021 × .0010 = 210 × 10^-10
P(across|versatile) × P(whose|across) = .000021 × .000006 ≈ 1 × 10^-10
With the bigram context, actress scores far higher than across.
Source: Jurafsky 2012
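Scoring each candidate in context multiplies the bigram probability of the candidate given the previous word by that of the next word given the candidate. A minimal sketch using only the four probabilities from the slide; unseen bigrams score 0 because no smoothing is applied.

```python
from typing import Dict, Tuple

# Bigram probabilities from the slide (Jurafsky 2012): (previous, word) -> P(word | previous).
BIGRAM: Dict[Tuple[str, str], float] = {
    ("versatile", "actress"): 0.000021,
    ("actress", "whose"): 0.0010,
    ("versatile", "across"): 0.000021,
    ("across", "whose"): 0.000006,
}


def context_score(prev: str, cand: str, nxt: str) -> float:
    """P(cand | prev) * P(nxt | cand); unseen bigrams score 0 (no smoothing)."""
    return BIGRAM.get((prev, cand), 0.0) * BIGRAM.get((cand, nxt), 0.0)


best = max(["actress", "across"], key=lambda c: context_score("versatile", c, "whose"))
print(best)  # actress
```

In a full noisy-channel corrector, this context score would be multiplied by the error-model probability P(typo|candidate) before taking the maximum.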
CONCLUSION
Chapter 4
Nota Keyboard: preventing typos
SwiftKey: correcting typos
REFERENCE
Appendix
Reference
• https://en.wikipedia.org/wiki/Semi-supervised_learning
• https://en.wikipedia.org/wiki/Cluster_analysis#Algorithms
• https://play.google.com/store/apps/details?id=com.notakeyboard&hl=ko
• Kernighan, Mark D., Kenneth W. Church, and William A. Gale. (1990). A Spelling Correction Program Based on a Noisy Channel Model.
• Jurafsky, D. (2012). Spelling Correction and the Noisy Channel. Lecture. Retrieved June 10, 2016, from http://spark-public.s3.amazonaws.com/nlp/slides/spelling.pdf
Q&A
Thank You
You can view this presentation again at https://docs.com/kennyhm97/2659/16-06-14-auto-correction-for-mobile-typing
Speaker Notes

  1. SwiftKey uses natural language processing (NLP) to predict suggestions, and the same algorithm is used to correct spelling errors. It corrects words retroactively by selecting the best candidate from a list of suggestions.
  2. Spelling errors fall into two types: non-word errors and real-word errors. The difference is whether the erroneous word is in the dictionary: a non-word error is not, while a real-word error is another valid word.
  3. To correct a non-word error, you first have to detect it. As mentioned before, if the word is not in the dictionary, it is a non-word error, so the bigger the dictionary, the better the detection. Next, you generate a list of candidates, and finally you select the best one.
  4. When generating candidates, we list words with similar spelling, words with similar pronunciation, and the word itself. The last two are for real-word errors.
  5. For words with similar spelling, we find the dictionary words with the minimal edit distance to the erroneous word. The edit distance between two words is the minimum number of deletions, insertions, substitutions, and reversals (transpositions) needed to turn one into the other. Statistically, more than 80 percent of errors are within edit distance 1, and almost all are within edit distance 2, so a simplified spell checker can generate only the words within edit distance 1.
  6. Here is an example with the typo "acress". The candidates actress, cress, caress, access, across, and acres (listed twice) are all within edit distance 1. The Type column shows the kind of edit, and the t_i and c_i columns show the edited letters, which we will need again later.
  7. Language model: "How likely is the candidate to appear in English text?" Error model: "How likely is it that the author would type the typo by mistake when the candidate was intended?"