American Sign Language Recognizer
By Ming Rutar
ASL Recognizer is a Udacity AI course project.
Udacity is an online school founded by leading AI experts: http://www.udacity.com
Countless ideas float around the academic world (science/theory), but few make it into industry as cutting-edge technologies (technology/practice). Udacity bridges the two: it teaches cutting-edge technologies with academic depth and hands-on practice.
❖ A course lasts 3-6 months and includes 3-7 projects.
❖ The projects are product-like.
❖ The courses focus on core technologies and provide helpers for utility tasks, such as environment setup.
❖ Very active online communities, in which the course instructors also participate.
❖ Student projects are reviewed by subject-matter experts.
❖ Graduates can always access the course materials, which are updated to keep up with technology trends.
❖ Affordable price.
The task
The overall goal of this project is to build a word recognizer for American Sign Language video sequences, demonstrating the power of probabilistic models. In particular, this project employs hidden Markov models (HMMs) to analyze a series of measurements taken from videos of American Sign Language (ASL) collected for research (see the RWTH-BOSTON-104 Database). In this video, the right-hand x and y locations are plotted as the speaker signs the sentence. The raw data, train, and test sets are pre-defined. You will derive a variety of feature sets.
The Dataset
We recognize the meaning of ASL by watching the speaker's hand movements, and the computer mimics us. Today, technology can tag video automatically, but that was not the case in the 1990s. The hand gesture data, such as the Cartesian coordinates of the left and right hands and of the nose (which serves as a reference), has already been preprocessed (extracted from the video). After loading the data, the 'asl' dataframe looks like this:
[Figure: X/Y coordinates of the nose (nx, ny), left hand (lx, ly), and right hand (rx, ry)]
More about the data
The training input file:
video,speaker,word,startframe,endframe
1,woman-1,JOHN,8,17
1,woman-1,WRITE,22,50
1,woman-1,HOMEWORK,51,77
3,woman-2,IX-1P,4,11
3,woman-2,SEE,12,20
3,woman-2,JOHN,20,31
3,woman-2,YESTERDAY,31,40
3,woman-2,IX,44,52
4,woman-1,JOHN,2,13
4,woman-1,IX-1P,13,18
4,woman-1,SEE,19,27
4,woman-1,IX,28,35
4,woman-1,YESTERDAY,36,47
5,woman-2,LOVE,12,21
The test input file:
video,speaker,word,startframe,endframe
2,woman-1,JOHN,7,20
2,woman-1,WRITE,23,36
2,woman-1,HOMEWORK,38,63
7,man-1,JOHN,22,39
7,man-1,CAN,42,47
7,man-1,GO,48,56
7,man-1,CAN,62,73
12,woman-2,JOHN,9,15
12,woman-2,CAN,19,24
12,woman-2,GO,25,34
12,woman-2,CAN,35,51
21,woman-2,JOHN,6,26
The training data contains 112 unique words and the test data 66 unique words; the test data has 40 sentences made of 178 words.
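Both input files are plain CSV with one row per signed word, so sentences can be rebuilt by grouping rows on the video id. The following is a minimal standard-library sketch, not the project's own loader (the helper name load_sentences is hypothetical); it runs on the training excerpt above, so its counts describe only that excerpt, not the full 112-word training set.

```python
import csv
import io
from collections import defaultdict

# The training-file excerpt shown above (not the full file).
TRAIN_EXCERPT = """video,speaker,word,startframe,endframe
1,woman-1,JOHN,8,17
1,woman-1,WRITE,22,50
1,woman-1,HOMEWORK,51,77
3,woman-2,IX-1P,4,11
3,woman-2,SEE,12,20
3,woman-2,JOHN,20,31
3,woman-2,YESTERDAY,31,40
3,woman-2,IX,44,52
4,woman-1,JOHN,2,13
4,woman-1,IX-1P,13,18
4,woman-1,SEE,19,27
4,woman-1,IX,28,35
4,woman-1,YESTERDAY,36,47
5,woman-2,LOVE,12,21
"""


def load_sentences(text):
    """Hypothetical helper: {video id: [(word, startframe, endframe), ...]} in file order."""
    sentences = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        sentences[int(row["video"])].append(
            (row["word"], int(row["startframe"]), int(row["endframe"]))
        )
    return dict(sentences)


sentences = load_sentences(TRAIN_EXCERPT)
unique_words = {word for words in sentences.values() for word, _, _ in words}
print(len(sentences), "sentences,", len(unique_words), "unique words")
```

Running the same grouping on the full files is how figures like "40 sentences made of 178 words" can be checked.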
Feature Extraction
Features are the data we feed into the models. Feature selection is crucial to a model's success; use common sense to select features. Examples:
[Figure: features_ground, the hand coordinates relative to the nose (g-lx, g-ly, g-rx, g-ry)]
features_ground = ['grnd-rx', 'grnd-ry', 'grnd-lx', 'grnd-ly']
asl.df['grnd-ly'] = asl.df['left-y'] - asl.df['nose-y']
asl.df['grnd-lx'] = asl.df['left-x'] - asl.df['nose-x']
...
[Figure: features_polar, the hand positions in polar coordinates around the nose (lr, ltheta, rr, rtheta)]
features_polar = ['polar-rr', 'polar-rtheta', 'polar-lr', 'polar-ltheta']
asl.df['polar-rr'] = np.sqrt((asl.df['right-x']- asl.df['nose-x'])**2 + (asl.df['right-y']-asl.df['nose-y'])**2)
asl.df['polar-rtheta'] = np.arctan2(asl.df['right-x']- asl.df['nose-x'],asl.df['right-y'] - asl.df['nose-y'])
...
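The pandas lines above apply the same arithmetic to every row at once. A per-frame sketch using only the standard library makes the geometry explicit, including the argument order of arctan2; the function name ground_and_polar and the sample coordinates are hypothetical.

```python
import math


def ground_and_polar(hand_x, hand_y, nose_x, nose_y):
    """Hypothetical helper: (grnd-x, grnd-y, polar-r, polar-theta) for one hand in one frame."""
    # Ground feature: hand position relative to the nose.
    gx = hand_x - nose_x
    gy = hand_y - nose_y
    # Polar radius: distance from the nose, same as sqrt(gx**2 + gy**2).
    r = math.hypot(gx, gy)
    # Same argument order as the np.arctan2(x, y) call above,
    # so the angle is measured from the +y axis rather than the +x axis.
    theta = math.atan2(gx, gy)
    return gx, gy, r, theta


print(ground_and_polar(152, 175, 149, 171))
```

With the x and y arguments in this order, a hand lying straight along the +y axis from the nose gets angle 0.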
HMMLearn
HMMLearn (hmmlearn) is a library for unsupervised learning and inference of hidden Markov models (HMMs). An HMM can be represented as a (dynamic) Bayesian network.
We use hmmlearn's GaussianHMM class, in which each hidden state emits observations from a Gaussian distribution (the famous bell curve).
[Figure: Gaussian curves of the word 'Chocolate' with different numbers of hidden states]
● We instantiate the class with the number of hidden states, the number of iterations, and more; see the reference at http://hmmlearn.readthedocs.io/en/latest/api.html#hmmlearn.hmm.GaussianHMM
● For training, we call the method fit() and pass in the training data; it returns the model itself.
● For inference, we call the method score() with a word's feature sequence; it returns a float, the log-likelihood of the input under the model.
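The log-likelihood that score() reports is what the forward algorithm computes. Below is a minimal pure-Python sketch for a one-dimensional Gaussian HMM, meant to illustrate the technique rather than reproduce hmmlearn's implementation; all function names and model parameters are hypothetical.

```python
import math


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def gaussian_logpdf(x, mean, var):
    """Log density of the normal distribution N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(obs, startprob, transmat, means, variances):
    """Forward algorithm: log P(obs) for a 1-D Gaussian HMM (hypothetical helper)."""
    n = len(startprob)
    # alpha[i] = log P(obs[0..t], hidden state at time t is i)
    alpha = [safe_log(startprob[i]) + gaussian_logpdf(obs[0], means[i], variances[i])
             for i in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[j] + safe_log(transmat[j][i]) for j in range(n)])
            + gaussian_logpdf(x, means[i], variances[i])
            for i in range(n)
        ]
    return logsumexp(alpha)


# Hypothetical two-state, left-to-right model and a short observation sequence.
obs = [0.1, 0.2, 2.9, 3.1]
good = dict(startprob=[1.0, 0.0], transmat=[[0.7, 0.3], [0.0, 1.0]],
            means=[0.0, 3.0], variances=[1.0, 1.0])
bad = dict(good, means=[5.0, 8.0])
print(hmm_log_likelihood(obs, **good), hmm_log_likelihood(obs, **bad))
```

The model whose state means match the data gets the higher (less negative) log-likelihood, which is why the recognizer can simply pick the word model with the highest score.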
How do we do it
● We train the models one word at a time with the training data.
● Each word is encoded as a unique integer, the word id.
● Each word has an associated list of feature sequences, one per training example.
● We train a GaussianHMM model on a word's feature sequences, trying different numbers of hidden states, then select the best model for the word.
● After training, each word has its own model.
● We test the models by building a recognizer that:
○ Picks a feature set and a model-selection method, then tests them on full sentences:
■ For each word in a sentence, read its feature sequence.
■ Score it with every word model and pick the model with the highest score.
■ From that model, find the word id.
○ Decodes the sequence of word ids into a sentence.
○ Compares the recognized sentence with the original sentence to get the word error rate (WER).
● The passing criterion for the project is a WER below 60%, i.e., recognizing more than 40% of the words correctly.
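The recognizer loop and the WER can be sketched in a few lines. This assumes each word model is exposed as a callable returning a log-likelihood (for example, a trained model's bound score method); recognize and word_error_rate are hypothetical names, and the toy scorers stand in for real models.

```python
import math


def recognize(test_sequences, word_scorers):
    """Hypothetical helper: return the best-scoring word for each test sequence.

    word_scorers maps each word to a callable that returns a log-likelihood.
    """
    guesses = []
    for seq in test_sequences:
        best_word, best_score = None, -math.inf
        for word, score in word_scorers.items():
            s = score(seq)
            if s > best_score:
                best_word, best_score = word, s
        guesses.append(best_word)
    return guesses


def word_error_rate(guesses, truth):
    """Fraction of words recognized incorrectly.

    Word boundaries are given by the data, so guesses and truth align one-to-one.
    """
    if len(guesses) != len(truth):
        raise ValueError("guesses and truth must have the same length")
    errors = sum(g != t for g, t in zip(guesses, truth))
    return errors / len(truth)


# Toy scorers standing in for trained word models.
scorers = {"JOHN": lambda seq: -abs(sum(seq) - 1.0),
           "CAN": lambda seq: -abs(sum(seq) - 5.0)}
print(recognize([[1.0], [2.0, 3.0]], scorers))  # ['JOHN', 'CAN']
```

With 101 of 178 words correct, word_error_rate gives 77/178, matching the WER printed in the Testing and Output section.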
Model Selection
The raw Gaussian model is a rough cut: in my test, it correctly recognized 58 of 178 words (about 67% error rate). We improve the model selection by using two popular information criteria:
● Bayesian information criterion (BIC)
○ The purpose is to penalize complex models (more hidden states mean more parameters) to prevent overfitting; a lower BIC is better.
○ BIC = −2 log L + p log N
■ where L is the likelihood of the model (score() returns log L), p is the number of free parameters, and N is the number of data points.
■ Counting p correctly is the tricky part.
■ To learn more, check this link: http://www2.imm.dtu.dk/courses/02433/doc/ch6_slides.pdf
● Discriminative Information Criterion (DIC)
○ DIC scores how well a word's model discriminates: it rewards a high likelihood on the word's own training data and penalizes high likelihoods on the competing words; a higher DIC is better.
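Both criteria are simple once the log-likelihoods are available. The sketch below assumes a GaussianHMM with diagonal covariances (hmmlearn's default covariance type), which gives p = n² + 2nd − 1 free parameters for n states and d features, and uses the common DIC form log P(own word) minus the mean log P over the competing words. All function names and the log-likelihood values are hypothetical.

```python
import math


def gaussian_hmm_free_params(n_states, n_features):
    """Free parameters of a diagonal-covariance Gaussian HMM (assumption)."""
    start = n_states - 1                    # start probabilities sum to 1
    trans = n_states * (n_states - 1)       # each transition row sums to 1
    means = n_states * n_features
    variances = n_states * n_features       # one variance per state and feature
    return start + trans + means + variances  # = n^2 + 2*n*d - 1


def bic(log_likelihood, n_states, n_features, n_points):
    """BIC = -2 log L + p log N; lower is better."""
    p = gaussian_hmm_free_params(n_states, n_features)
    return -2.0 * log_likelihood + p * math.log(n_points)


def dic(own_log_likelihood, competing_log_likelihoods):
    """log P(own word) - mean log P(competing words); higher is better."""
    return own_log_likelihood - sum(competing_log_likelihoods) / len(competing_log_likelihoods)


def select_n_states_by_bic(log_likelihoods, n_features, n_points):
    """Pick the number of states with the lowest BIC from {n_states: log L}."""
    return min(log_likelihoods,
               key=lambda n: bic(log_likelihoods[n], n, n_features, n_points))


# Hypothetical log-likelihoods for one word with 4 features and 60 frames.
scores = {2: -520.0, 3: -480.0, 4: -475.0}
print(select_n_states_by_bic(scores, n_features=4, n_points=60))
```

Here 4 states fit slightly better than 3, but their 15 extra parameters cost more in BIC than the likelihood gain, so 3 states are selected.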
Testing and Output
model_selector=SelectorBIC_orig, features=scale_podel
**** WER = 0.43258426966292135
Total correct: 101 out of 178
Video: Recognized | Correct
=====================================================================================================
2: JOHN WRITE HOMEWORK | JOHN WRITE HOMEWORK
7: JOHN *HAVE GO *ARRIVE | JOHN CAN GO CAN
12: JOHN *WHAT *GO1 CAN | JOHN CAN GO CAN
21: JOHN FISH WONT *WHO BUT *CAR *CHICKEN CHICKEN | JOHN FISH WONT EAT BUT CAN EAT CHICKEN
25: JOHN *TELL *LOVE *WHO IX | JOHN LIKE IX IX IX
28: JOHN *WHO *WHO *WHO IX | JOHN LIKE IX IX IX
30: JOHN *MARY *MARY *MARY *MARY | JOHN LIKE IX IX IX
36: MARY VEGETABLE *GIRL *GIVE *MARY *MARY | MARY VEGETABLE KNOW IX LIKE CORN1
40: JOHN *VISIT *CORN *JOHN *MARY | JOHN IX THINK MARY LOVE
43: JOHN *SHOULD BUY HOUSE | JOHN MUST BUY HOUSE
50: *JOHN *SEE BUY CAR SHOULD | FUTURE JOHN BUY CAR SHOULD
54: JOHN *JOHN *MARY BUY HOUSE | JOHN SHOULD NOT BUY HOUSE
57: JOHN *PREFER VISIT MARY | JOHN DECIDE VISIT MARY
67: JOHN *YESTERDAY NOT BUY HOUSE | JOHN FUTURE NOT BUY HOUSE
71: JOHN *FUTURE VISIT MARY | JOHN WILL VISIT MARY
74: *IX *MARY *MARY MARY | JOHN NOT VISIT MARY
77: *JOHN BLAME MARY | ANN BLAME MARY
The Results
features_customer2 is the winner; it consists of scaled Cartesian coordinates plus time deltas.
Simply scaling the values of features_podel pays off: scale_podel outperforms features_podel, recognizing 101 vs. 89 words.
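The slides do not spell out how features_podel is built or exactly how it is scaled, so the following is only a sketch under assumptions: z-score scaling (mean 0, sample standard deviation 1, matching pandas' default std) and first-order time deltas, each applied to one coordinate column, for example per speaker. The function names and sample values are hypothetical.

```python
import statistics


def zscore(values):
    """Hypothetical helper: scale values to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample std (ddof=1), like pandas' Series.std()
    return [(v - mean) / std for v in values]


def time_deltas(values):
    """Hypothetical helper: frame-to-frame differences; the first frame has no predecessor, so 0.0."""
    return [0.0] + [curr - prev for prev, curr in zip(values, values[1:])]


right_x = [149.0, 152.0, 158.0, 161.0]  # hypothetical right-hand x coordinates of one speaker
print(zscore(right_x))
print(time_deltas(right_x))
```

Scaling puts all coordinates on a comparable range regardless of speaker size or camera position, while the deltas capture motion between frames, which is one plausible reason the scaled features score better.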