NLP Classifier
Models & Metrics
Sanghamitra Deb
Staff Data Scientist
Chegg Inc
OUTLINE
Models
• Tfidf features
• Word2vec features
• Simple feedforward NN classifier
• CNN
• Word based
• Character based
• Siamese Networks
Metrics
Text Classification
Text Pre-processing
• Reduces noise
• Ensures quality
• Improves overall performance
Collecting Training Data
• Training data collection / examples of classes that we are trying to model
• Model performance is directly correlated with quality of training data
Model Building
• Model selection
• Architecture
• Parameter tuning
Model Evaluation
• Offline: SME
• Online: User
Preprocessing! --- Project specific
Applicable for text based classifications
• Removing special characters.
• Cleaning numbers.
• Correcting misspellings
o Peter Norvig's spell checker.
https://norvig.com/spell-correct.html
o Using Google word2vec vocabulary to identify misspelled words.
https://mlwhiz.com/blog/2019/01/17/deeplearning_nlp_preprocess/
• Expanding contracted words --- contraction_dict = {"ain't": "is not", "aren't": "are not", "can't": "cannot", …}
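A minimal sketch of the cleaning steps listed on this slide, assuming lowercase English text. The three-entry contraction mapping is only the subset shown on the slide, and masking digits with "#" is one possible way to "clean numbers", not necessarily the one used in the talk.

```python
import re

# Illustrative subset only; a real project would use a much fuller mapping.
CONTRACTIONS = {"ain't": "is not", "aren't": "are not", "can't": "cannot"}

def expand_contractions(text, mapping=CONTRACTIONS):
    # Replace every known contraction with its expanded form.
    pattern = re.compile("|".join(re.escape(k) for k in mapping), flags=re.IGNORECASE)
    return pattern.sub(lambda m: mapping[m.group(0).lower()], text)

def clean_text(text):
    text = expand_contractions(text.lower())
    text = re.sub(r"\d", "#", text)            # mask digits (assumption): "2019" -> "####"
    text = re.sub(r"[^a-z#\s]", " ", text)     # replace special characters with spaces
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace

print(clean_text("They aren't open in 2019, can't wait!!"))
# they are not open in #### cannot wait
```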
TFIDF Features
• ngram_range: (1,3) --- implies unigrams,
bigrams, and trigrams will be taken into account
while creating features.
• min_df: Minimum number of documents in the corpus in which an ngram
must appear to be used as a feature.
Tfidf features can be used with any ML classifier such as logistic regression (LR).
When using LR for NLP tasks, L1 regularization often performs
better since tfidf features are sparse.
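A small sketch of tfidf features feeding an L1-regularized logistic regression, assuming scikit-learn. The six toy sentences, the labels, and the value C=10.0 are made up for illustration; min_df=1 is used only because the toy corpus is tiny.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: 1 = positive, 0 = negative.
texts = ["great movie, loved it", "loved the acting", "terrible movie, hated it",
         "hated the plot", "great acting and great plot", "terrible, just terrible"]
labels = [1, 1, 0, 0, 1, 0]

model = make_pipeline(
    # unigrams, bigrams and trigrams; keep ngrams seen in at least min_df documents
    TfidfVectorizer(ngram_range=(1, 3), min_df=1),
    # the liblinear solver supports the L1 penalty
    LogisticRegression(penalty="l1", solver="liblinear", C=10.0),
)
model.fit(texts, labels)
preds = model.predict(["loved it", "hated it"])
```

On a real corpus a larger min_df (for example 2 or 5) would prune rare ngrams and shrink the feature matrix.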
Transfer Learning – word2vec features
Word2vec learns word vectors either by using context to predict a target word (a method
known as continuous bag of words, or CBOW), or by using a
word to predict its target context, which is called skip-gram
https://medium.com/@zafaralibagh6/a-simple-word2vec-tutorial-61e64e38a6a1
Applying tfidf weighting to word vectors can boost overall model performance
https://towardsdatascience.com/supercharging-word-vectors-be80ee5513d
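One way to read "tfidf weighting of word vectors" is a weighted average of the word vectors in a document, with idf values as weights. The sketch below assumes that reading; the 3-dimensional embeddings and the tiny corpus are hypothetical stand-ins for a pre-trained word2vec model.

```python
import math
import numpy as np

# Hypothetical embeddings; in practice these come from a pre-trained word2vec model.
embeddings = {
    "the":   np.array([0.1, 0.1, 0.1]),
    "movie": np.array([0.9, 0.2, 0.0]),
    "great": np.array([0.0, 1.0, 0.3]),
}
docs = [["the", "movie"], ["the", "great", "movie"], ["the", "great"]]

def idf_weights(docs):
    # smoothed idf: log((1 + n_docs) / (1 + document_frequency)) + 1
    n = len(docs)
    vocab = {w for d in docs for w in d}
    return {w: math.log((1 + n) / (1 + sum(w in d for d in docs))) + 1 for w in vocab}

def doc_vector(tokens, embeddings, idf):
    # idf-weighted average; a repeated token counts once per occurrence (the tf part)
    known = [t for t in tokens if t in embeddings]
    if not known:
        return np.zeros(len(next(iter(embeddings.values()))))
    weights = np.array([idf.get(t, 1.0) for t in known])
    vectors = np.stack([embeddings[t] for t in known])
    return weights @ vectors / weights.sum()

idf = idf_weights(docs)
v = doc_vector(["the", "movie"], embeddings, idf)
```

Because "the" appears in every document it gets the lowest weight, so the document vector moves toward the rarer word "movie".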
Feed forward Neural Network
What is a neuron?
https://www.slideshare.net/tw_dsconf/ss-62245351
[Figure: neuron diagram with labels a1, a2, a3]
Neural Network
• Each node is a function with input
and output vectors
• Every network structure is defined
by a set of functions
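A minimal sketch of "each node is a function", assuming a sigmoid activation and made-up weights. A neuron maps an input vector to one number, a layer maps it to a vector, and the network is the composition of its layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(a, w, b):
    # weighted sum of the inputs plus a bias, passed through an activation
    return sigmoid(np.dot(w, a) + b)

def layer(a, W, b):
    # one neuron per row of W: vector in, vector out
    return sigmoid(W @ a + b)

a = np.array([1.0, -2.0, 0.5])                 # inputs a1, a2, a3
W1, b1 = np.full((4, 3), 0.1), np.zeros(4)     # hypothetical hidden-layer weights
W2, b2 = np.full((2, 4), 0.1), np.zeros(2)     # hypothetical output-layer weights
output = layer(layer(a, W1, b1), W2, b2)       # network = composition of layer functions
```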
Output Layer
• Loss is minimized using
Gradient Descent
• Find network parameters
such that the loss is
minimized
• This is done by taking
derivatives of the loss wrt
parameters.
• Next the parameters are
updated by subtracting
learning rate times the
derivative
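The update rule above can be sketched on a toy problem. This example assumes a one-variable linear model with a mean squared error loss; the data, learning rate, and number of steps are made up for illustration.

```python
import numpy as np

# Toy data generated from y = 2x + 1 (hypothetical).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0
learning_rate = 0.05
for _ in range(2000):
    error = (w * x + b) - y
    # derivatives of the mean squared error with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # update: subtract learning rate times the derivative
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

loss = np.mean(((w * x + b) - y) ** 2)
```

After the loop, w and b end up close to the values 2 and 1 that generated the data.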
Commonly used loss functions
Regression Loss Functions
• Mean Squared Error Loss
• Mean Squared Logarithmic Error Loss
• Mean Absolute Error Loss
Binary Classification Loss Functions
• Binary Cross-Entropy
• Hinge Loss
• Squared Hinge Loss
Multi-Class Classification Loss Functions
• Multi-Class Cross-Entropy Loss
• Sparse Multiclass Cross-Entropy Loss
• Kullback Leibler Divergence Loss
Cost Function – Cross Entropy
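For reference, a sketch of the cross entropy cost in its binary and multi-class forms, written with NumPy. The clipping constant eps is an assumption added to avoid log(0); the function names are illustrative.

```python
import numpy as np

def binary_cross_entropy(y_true, p, eps=1e-12):
    # y_true in {0, 1}; p = predicted probability of class 1
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    # y_onehot and probs have shape (n_samples, n_classes)
    probs = np.clip(probs, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

y = np.array([1.0, 0.0])
confident_right = binary_cross_entropy(y, np.array([0.9, 0.1]))
confident_wrong = binary_cross_entropy(y, np.array([0.1, 0.9]))
```

The loss is small when the model is confident and right, and grows quickly when it is confident and wrong.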
Dropout -- avoid overfitting
• Large weights in a neural network are a
sign of a more complex network that has
overfit the training data.
• Probabilistically dropping out nodes in the
network is a simple and effective
regularization method.
• A large network with more training and the
use of a weight constraint are suggested
when using dropout.
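A sketch of how dropping nodes can be implemented, assuming the common "inverted dropout" variant in which the surviving activations are rescaled during training so that nothing changes at inference time. The rate and array sizes are illustrative.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    # At inference time dropout is a no-op.
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    # Each node is kept independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Rescale so the expected activation stays the same.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((1000, 10))
dropped = dropout(h, rate=0.5, rng=rng)
```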
Activation Functions
• Sigmoid / Softmax
• Tanh
• ReLU: a = max(0,z)
• Leaky ReLU
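The listed activation functions can be written in a few lines of NumPy. This is a sketch; the leaky ReLU slope alpha=0.01 is a common default, not a value taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the max for numerical stability; the result sums to 1 along the last axis
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)              # a = max(0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small slope for negative inputs

z = np.array([-1.0, 0.0, 2.0])
```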
Text Data
Data Source -- https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences
Text Pre-processing with Keras
Tokenizing and padding
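To make the two steps concrete, here is a plain-Python sketch of what tokenizing and padding do. It only mimics the idea behind the Keras utilities and differs in details (for example, it pads on the right, and it splits on whitespace only); the function names are illustrative.

```python
from collections import Counter

def fit_vocab(texts):
    counts = Counter(w for t in texts for w in t.lower().split())
    # most frequent word gets index 1; index 0 is reserved for padding
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}

def texts_to_sequences(texts, vocab):
    # words missing from the vocabulary are skipped
    return [[vocab[w] for w in t.lower().split() if w in vocab] for t in texts]

def pad(sequences, maxlen):
    # truncate long sequences and right-pad short ones with 0
    return [seq[:maxlen] + [0] * (maxlen - len(seq)) for seq in sequences]

texts = ["Hope to see you soon", "Nice to see you again"]
vocab = fit_vocab(texts)
padded = pad(texts_to_sequences(texts, vocab), maxlen=6)
print(padded)
# [[4, 1, 2, 3, 5, 0], [6, 1, 2, 3, 7, 0]]
```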
Start with an Embedding Layer
• The Embedding layer of Keras takes the previously calculated integers and
maps each of them to a dense vector.
o Parameters
▪ input_dim: the size of the vocabulary
▪ output_dim: the size of the dense vector
▪ input_length: the length of the sequence
Example sentences: "Hope to see you soon", "Nice to see you again"
[Figure: embedding vectors for the example sentences after training]
https://stats.stackexchange.com/questions/270546/how-does-keras-embedding-layer-work
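Conceptually, an embedding layer is a lookup table: row i of a weight matrix is the dense vector for word index i. The NumPy sketch below illustrates that idea only; the integer sequences and the random matrix are hypothetical, and in a real model the matrix is learned during training.

```python
import numpy as np

input_dim = 8      # size of the vocabulary (including padding index 0)
output_dim = 2     # size of each dense vector
input_length = 6   # length of each (padded) sequence

rng = np.random.default_rng(0)
# Randomly initialised here; training would adjust these values.
embedding_matrix = rng.normal(size=(input_dim, output_dim))

# Hypothetical integer encodings of two padded sentences.
padded = np.array([[4, 1, 2, 3, 5, 0],
                   [6, 1, 2, 3, 7, 0]])

embedded = embedding_matrix[padded]   # shape: (batch, input_length, output_dim)
```

The same word index always maps to the same vector, so the shared words in the two sentences get identical embeddings.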
Add a pooling layer
• MaxPooling1D/AveragePooling1D or
a GlobalMaxPooling1D/GlobalAveragePooling1D layer
• A way to downsample (reduce the size of) the incoming
feature vectors.
• Global max/average pooling takes the maximum/average of each
feature over the whole sequence, whereas in the other case you have to define the pool size.
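The difference between windowed and global pooling is easiest to see on a small array. This NumPy sketch assumes an input of shape (timesteps, features) and non-overlapping windows; the function names are illustrative, not the Keras layers themselves.

```python
import numpy as np

def max_pool_1d(x, pool_size):
    # non-overlapping windows along the time axis; a trailing remainder is dropped
    steps = x.shape[0] // pool_size
    return x[: steps * pool_size].reshape(steps, pool_size, x.shape[1]).max(axis=1)

def global_max_pool_1d(x):
    return x.max(axis=0)     # one value per feature

def global_average_pool_1d(x):
    return x.mean(axis=0)    # one value per feature

x = np.array([[1.0, 8.0],
              [3.0, 2.0],
              [5.0, 4.0],
              [7.0, 6.0]])   # 4 timesteps, 2 features

print(max_pool_1d(x, pool_size=2))     # [[3. 8.] [7. 6.]]
print(global_max_pool_1d(x))           # [7. 8.]
print(global_average_pool_1d(x))       # [4. 5.]
```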
Definition of the entire model
Training
Using pre-trained word embeddings will lead to an accuracy of
0.82. This is a case of transfer learning.
https://realpython.com/python-keras-text-classification
Convolutional Neural Network
Detect features → Downsample.
What is a CNN?
In a traditional feedforward neural network we connect each
input neuron to each output neuron in the next layer. That’s
also called a fully connected layer, or affine layer.
• We use convolutions over the input layer to compute the
output. This results in local connections, where each region
of the input is connected to a neuron in the output. Each
layer applies different filters and combines the result
• During the training phase, a CNN automatically learns the
values of its filters based on the task you want to perform.
• Inputs --- n_filters, kernel size (=2)
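A sketch of a 1D convolution over a sequence of word vectors, written in NumPy to show the local connections. The shapes, random values, ReLU activation, and the absence of padding are assumptions for illustration; kernel size 2 matches the slide.

```python
import numpy as np

def conv1d(x, filters, bias):
    # x: (timesteps, embed_dim); filters: (n_filters, kernel_size, embed_dim)
    n_filters, kernel_size, _ = filters.shape
    steps = x.shape[0] - kernel_size + 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        window = x[t : t + kernel_size]   # local region of the input
        # each filter produces one number per window
        out[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(0.0, out)           # ReLU activation

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))               # 6 tokens with 4-dim embeddings (hypothetical)
filters = rng.normal(size=(3, 2, 4))      # n_filters=3, kernel size=2
features = conv1d(x, filters, np.zeros(3))   # detect features: shape (5, 3)
pooled = features.max(axis=0)                # downsample with global max pooling: shape (3,)
```

In a trained CNN the filter values are learned; here they are random only to show the shapes.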
Model definition
Character based CNN
https://towardsdatascience.com/character-level-cnn-with-keras-50391c3adf33
Advantages of CNN
• Character Based CNN
• Has the ability to deal with out of vocabulary
words. This makes it particularly suitable for
user generated raw text.
• Works for multiple languages.
• Model size is small since the tokens are
limited to the number of characters ~ 70.
This makes real life deployments easier and
faster.
• Does not need a lot of data cleaning
• Networks with convolutional and pooling layers
are useful for classification tasks in which we
expect to find strong local clues regarding class
membership.
https://machinelearningmastery.com/best-practices-document-classification-deep-learning/
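A sketch of character-level encoding, which is why a character based CNN has no out-of-vocabulary words. The alphabet below (lowercase letters, digits, punctuation, space: 69 symbols) is an assumption chosen to be close to the roughly 70 characters mentioned above.

```python
import string
import numpy as np

# 26 letters + 10 digits + 32 punctuation marks + space = 69 symbols (assumed alphabet)
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
CHAR_TO_INDEX = {c: i + 1 for i, c in enumerate(ALPHABET)}   # 0 = padding / unknown character

def encode_chars(text, maxlen):
    # Every string maps to indices, including misspellings and unseen words.
    ids = [CHAR_TO_INDEX.get(c, 0) for c in text.lower()[:maxlen]]
    return np.array(ids + [0] * (maxlen - len(ids)))

encoded = encode_chars("gr8 m0vie!!", maxlen=16)
```

The resulting integer sequence can be fed to an embedding or one-hot layer exactly like word indices, but the vocabulary never grows beyond the alphabet.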
Siamese Networks
A Siamese neural network is a class of neural network architectures that contain two or more identical subnetworks: they
have the same configuration and the same parameters and weights. Parameter updating is mirrored across the subnetworks.
• More robust to class imbalance
• Ensembling with a classifier can yield
better results.
• Creates more meaningful
embeddings.
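The weight sharing can be sketched in NumPy: both inputs go through the same encoder, and the two embeddings are compared. The single tanh layer, the cosine similarity, and the random inputs are assumptions for illustration, not the architecture used in the talk.

```python
import numpy as np

def encode(x, W, b):
    # shared subnetwork: the same W and b are used for every input
    return np.tanh(W @ x + b)

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 6)), np.zeros(4)   # one set of weights for both branches

x1 = rng.normal(size=6)
x2 = x1 + 0.01 * rng.normal(size=6)           # near-duplicate of x1
x3 = rng.normal(size=6)                       # unrelated input

sim_close = cosine_similarity(encode(x1, W, b), encode(x2, W, b))
sim_far = cosine_similarity(encode(x1, W, b), encode(x3, W, b))
```

Because there is only one W and b, any gradient update changes both branches at once, which is what "mirrored" parameter updating means.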
Confidential Material / © 2020 Chegg, Inc. / All Rights Reserved
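Multi-task Modeling
[Figure: the Question (Q) and the Answer (A) each pass through a CNN model and are compared by a similarity function, with a cross entropy loss on the output. The Question/Answer pair also passes through a CNN model followed by a softmax over the number of courses, with a cross entropy loss on the output.]
Two tasks
• Similarity between
question and answer.
• Classification of courses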
Performance Metrics
Is the model good enough?
Classification
https://en.wikipedia.org/wiki/Precision_and_recall
Precision : TP/(TP+FP) --- what percentage of the predicted positive class
is actually positive?
Recall : TP/(TP+FN) --- what percentage of the actual positive class
gets captured by the model?
Accuracy : (TP+TN)/(TP+FP+TN+FN) --- what percentage of
predictions are correct?
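The three formulas can be checked on a small example. The labels and predictions below are made up; the function name is illustrative.

```python
def classification_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
# TP=2, FP=1, FN=2, TN=5 -> precision 2/3, recall 0.5, accuracy 0.7
```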
Thresholding --- Coverage
In a binary classification, if you choose randomly, the probability of belonging to a class is 0.5.
[Figure: thresholds marked at 0.3 and 0.7]
It is possible to improve the percentage of
correct results at the cost of coverage.
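A sketch of the coverage trade-off, assuming the model only commits to a label when the score is at or below 0.3 or at or above 0.7 and abstains in between. The scores and labels are made up for illustration.

```python
import numpy as np

def predict_with_abstention(probs, low=0.3, high=0.7):
    # 1 = confident positive, 0 = confident negative, -1 = no prediction
    return np.where(probs >= high, 1, np.where(probs <= low, 0, -1))

probs = np.array([0.95, 0.80, 0.60, 0.55, 0.45, 0.40, 0.20, 0.05])
y_true = np.array([1, 1, 0, 1, 1, 0, 0, 0])

# Single threshold at 0.5: every example gets a prediction.
baseline_accuracy = ((probs >= 0.5).astype(int) == y_true).mean()

# Two thresholds: uncertain examples are left unlabelled.
preds = predict_with_abstention(probs)
covered = preds != -1
coverage = covered.mean()
covered_accuracy = (preds[covered] == y_true[covered]).mean()

print(baseline_accuracy, coverage, covered_accuracy)
# 0.75 0.5 1.0
```

In this toy data, accuracy on the covered examples rises from 0.75 to 1.0 while coverage drops to half of the examples.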
Confusion Matrix
ROC & AUC
ROC – Receiver Operating Characteristic
An ROC curve is a graph
showing the performance of a classification model at all
classification thresholds.
AUC – Area Under the Curve.
• AUC is scale-invariant. It measures how well predictions
are ranked, rather than their absolute values.
• AUC is classification-threshold-invariant. It measures the
quality of the model's predictions irrespective of what
classification threshold is chosen.
• Works better than accuracy for imbalanced datasets.
https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
• TPR = TP/(TP+FN)
• FPR = FP/(FP+TN)
[Figure: ROC curve, with the diagonal labelled "Random"]
https://datascience.stackexchange.com/questions/806/advantages-of-auc-vs-standard-accuracy
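A NumPy sketch of the ROC points and of AUC computed as a ranking statistic: the probability that a randomly chosen positive is scored higher than a randomly chosen negative. The four scores are a made-up example; the function names are illustrative.

```python
import numpy as np

def roc_points(y_true, scores):
    # one (FPR, TPR) point per distinct threshold, from strictest to loosest
    points = [(0.0, 0.0)]
    for thr in np.sort(np.unique(scores))[::-1]:
        pred = scores >= thr
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        tn = np.sum(~pred & (y_true == 0))
        points.append((float(fp / (fp + tn)), float(tp / (tp + fn))))
    return points

def auc(y_true, scores):
    # fraction of (positive, negative) pairs ranked correctly; ties count half
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
print(auc(y_true, scores))        # 0.75
print(auc(y_true, scores * 10))   # 0.75, since only the ranking matters
```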
Summary
• Tfidf & word2vec provide simple feature extraction techniques
• As the amount of training data increases, using deep learning becomes a logical choice
• Feed forward Network
• CNN
• Siamese Networks
• It is important to determine which metrics are important before
training data collection and modeling.
Thank You
@sangha_deb
sangha123@gmail.com
Word Vectors with Context!
• In a context free embedding, "crisp" in the sentences "The morning air is
getting crisp" and "getting burned to a crisp" would have the same
vector: f(crisp)
• In a context aware model the embedding would be augmented by the
context in which the word appears: f(crisp, context)
https://www.gocomics.com/frazz/
Bert features
https://towardsdatascience.com/nlp-extract-contextualized-word-embeddings-from-bert-keras-tf-67ef29f60a7b

Harnessing the Power of GenAI for BI and Reporting.pptx
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 

NLP Classifier Models & Metrics

  • 1. NLP Classifier Models & Metrics. Sanghamitra Deb, Staff Data Scientist, Chegg Inc.
  • 2. Outline. Models: • Tfidf features • Word2vec features • Simple feedforward NN classifier • CNN (word based, character based) • Siamese Networks. Metrics.
  • 3. Text Classification. Text pre-processing: • Reduces noise • Ensures quality • Improves overall performance. Collecting training data: • Training data collection: examples of the classes that we are trying to model • Model performance is directly correlated with the quality of the training data. Model building: • Model selection • Architecture • Parameter tuning. Model evaluation: offline (SME) and online (user).
  • 4. Preprocessing (project specific). Applicable for text-based classification: • Removing special characters. • Cleaning numbers. • Correcting misspellings, using Peter Norvig's spell checker (https://norvig.com/spell-correct.html) or the Google word2vec vocabulary to identify misspelled words (https://mlwhiz.com/blog/2019/01/17/deeplearning_nlp_preprocess/). • Expanding contracted words with a lookup table: contraction_dict = {"ain't": "is not", "aren't": "are not", "can't": "cannot", …}
  • 5. TFIDF Features • ngram_range: (1,3) means that unigrams, bigrams, and trigrams are all taken into account when creating features. • min_df: the minimum number of documents in which an n-gram must appear to be used as a feature. Tfidf features can be used with any ML classifier, such as logistic regression (LR). When using LR for NLP tasks, L1 regularization tends to perform better because tfidf features are sparse.
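The two parameters on this slide can be illustrated with a small plain-Python sketch. It is not scikit-learn's implementation: the helper names `ngrams` and `tfidf_features` are hypothetical, tokenization is a simple whitespace split, `min_df` is treated as a document-frequency cutoff, and the smoothed idf formula `ln((1 + n) / (1 + df)) + 1` is used without any row normalization.

```python
import math
from collections import Counter

def ngrams(tokens, ngram_range=(1, 3)):
    """Return all n-grams (space-joined) for every n in the inclusive range."""
    lo, hi = ngram_range
    return [" ".join(tokens[i:i + n])
            for n in range(lo, hi + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf_features(docs, ngram_range=(1, 3), min_df=1):
    """Return (vocabulary, rows), where each row is a sparse {term: tf-idf} dict."""
    # Term counts per document (whitespace tokenization is an assumption).
    doc_terms = [Counter(ngrams(d.lower().split(), ngram_range)) for d in docs]
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for terms in doc_terms for term in terms)
    # Keep only n-grams that appear in at least min_df documents.
    vocab = sorted(t for t, c in df.items() if c >= min_df)
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = [{t: tf * idf[t] for t, tf in terms.items() if t in idf}
            for terms in doc_terms]
    return vocab, rows

docs = ["the movie was great", "the movie was bad", "great acting"]
vocab, rows = tfidf_features(docs, ngram_range=(1, 3), min_df=2)
print(vocab)
```

With `min_df=2`, n-grams that occur in only one document (for example "bad" or "great acting") are dropped, which is why most entries in each row are absent and the feature matrix is sparse.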
  • 6. Transfer Learning: word2vec features. Word2vec embeddings are trained either by using context to predict a target word (a method known as continuous bag of words, or CBOW) or by using a word to predict its target context, which is called skip-gram. https://medium.com/@zafaralibagh6/a-simple-word2vec-tutorial-61e64e38a6a1 Applying tfidf weighting to word vectors can boost overall model performance. https://towardsdatascience.com/supercharging-word-vectors-be80ee5513d
  • 10. Neural Network (figure labels: a1, a2, a3) • Each node is a function with input and output vectors. • Every network structure is defined by a set of functions.
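As a rough illustration of "each node is a function", a single node can be written as a linear combination of its inputs and weights followed by an activation. This is a minimal sketch: the function name `neuron`, the bias term `b`, and the choice of sigmoid as the activation are assumptions made for the example.

```python
import math

def neuron(a, w, b=0.0):
    """One node: z = w . a + b, then the sigmoid activation sigma(z)."""
    z = sum(wi * ai for wi, ai in zip(w, a)) + b  # linear combination
    return 1.0 / (1.0 + math.exp(-z))            # sigmoid squashes z into (0, 1)

# Hypothetical inputs and weights; here z = 0.5 - 0.5 = 0.
activation = neuron([1.0, 2.0], [0.5, -0.25])
print(activation)
```

A layer is then a list of such nodes applied to the same input vector, and a network is a composition of layers.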
  • 12. Loss is minimized using Gradient Descent • Find the network parameters for which the loss is minimized. • This is done by taking derivatives of the loss with respect to the parameters. • The parameters are then updated by subtracting the learning rate times the derivative.
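The update rule described on this slide can be sketched for a single parameter. The toy loss `(w - 3)^2`, the learning rate, and the step count are assumptions chosen only to keep the example small.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly apply w <- w - learning_rate * dLoss/dw."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Toy loss: loss(w) = (w - 3)^2, so dLoss/dw = 2 * (w - 3); the minimum is at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)
```

In a real network the same update is applied to every weight, with the derivatives obtained by backpropagation.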
  • 13. Commonly used loss functions. Regression loss functions: • Mean Squared Error Loss • Mean Squared Logarithmic Error Loss • Mean Absolute Error Loss. Binary classification loss functions: • Binary Cross-Entropy • Hinge Loss • Squared Hinge Loss. Multi-class classification loss functions: • Multi-Class Cross-Entropy Loss • Sparse Multiclass Cross-Entropy Loss • Kullback Leibler Divergence Loss.
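As one concrete case from this list, binary cross-entropy can be sketched in a few lines. The function name and the clipping constant `eps` (used to avoid `log(0)`) are assumptions for illustration.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of -[y * log(p) + (1 - y) * log(1 - p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so that log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

The loss grows without bound as a prediction approaches the wrong extreme, which is why confident mistakes are penalized so heavily.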
  • 15. Dropout -- avoid overfitting • Large weights in a neural network are a sign of a more complex network that has overfit the training data. • Probabilistically dropping out nodes in the network is a simple and effective regularization method. • A large network with more training and the use of a weight constraint are suggested when using dropout.
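Probabilistic node dropping can be sketched as follows. This uses the common "inverted dropout" variant, in which surviving activations are rescaled by `1 / (1 - rate)` during training; the slide does not specify a variant, so that choice and the function name are assumptions.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
dropped = dropout([1.0] * 1000, rate=0.5, rng=rng)
print(dropped.count(0.0), "of", len(dropped), "activations were dropped")
```

Rescaling keeps the expected activation unchanged, so no adjustment is needed when dropout is switched off for prediction.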
  • 16-20. Activation Functions • Sigmoid/Softmax • Tanh • ReLU: a = max(0, z) • Leaky ReLU
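The activation functions listed above can be written out directly. This is a minimal sketch on scalar inputs (softmax takes a list); the leaky ReLU slope `alpha=0.01` is a commonly used default and an assumption here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))      # output in (0, 1)

def tanh(z):
    return math.tanh(z)                    # output in (-1, 1)

def relu(z):
    return max(0.0, z)                     # a = max(0, z)

def leaky_relu(z, alpha=0.01):
    return z if z > 0 else alpha * z       # small slope for negative z

def softmax(zs):
    m = max(zs)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]       # non-negative, sums to 1

print(sigmoid(0.0), relu(-2.0), leaky_relu(-2.0), softmax([1.0, 2.0, 3.0]))
```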
  • 21. Text Data Data Source -- https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences
  • 22. Text Pre-processing with Keras: Tokenizing, Padding
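The two steps can be sketched in plain Python to show what they produce. This is not the Keras API: the helper names are hypothetical, tokenization is a whitespace split, index 0 is reserved for padding, and padding is appended at the end of each sequence (an assumption made for readability).

```python
from collections import Counter

def fit_word_index(texts):
    """Map each word to an integer, most frequent first, starting at 1."""
    counts = Counter(w for t in texts for w in t.lower().split())
    # sorted() is stable, so ties keep their first-seen order.
    ordered = sorted(counts, key=lambda w: -counts[w])
    return {w: i for i, w in enumerate(ordered, start=1)}

def texts_to_sequences(texts, word_index):
    """Replace each known word by its integer index."""
    return [[word_index[w] for w in t.lower().split() if w in word_index]
            for t in texts]

def pad_to_length(seqs, maxlen, value=0):
    """Truncate or right-pad every sequence to exactly maxlen entries."""
    return [s[:maxlen] + [value] * (maxlen - len(s)) for s in seqs]

texts = ["Hope to see you soon", "Nice to see you again"]
word_index = fit_word_index(texts)
padded = pad_to_length(texts_to_sequences(texts, word_index), maxlen=6)
print(padded)
```

After this step every text is an integer vector of the same length, which is the input format an embedding layer expects.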
  • 23. Start with an Embedding Layer • The Keras Embedding layer takes the previously calculated integers and maps them to dense embedding vectors. • Parameters: input_dim, the size of the vocabulary; output_dim, the size of the dense vector; input_length, the length of the sequence. Example sentences (figure, after training): "Hope to see you soon", "Nice to see you again". https://stats.stackexchange.com/questions/270546/how-does-keras-embedding-layer-work
  • 24. Add a pooling layer • MaxPooling1D/AveragePooling1D or GlobalMaxPooling1D/GlobalAveragePooling1D. • Pooling is a way to downsample (reduce the size of) the incoming feature vectors. • Global max/average pooling takes the maximum/average of each feature over the whole sequence, whereas the non-global layers require you to define a pool size.
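The difference between global pooling and pooling with a pool size can be sketched on a small sequence of feature vectors. The helper names are hypothetical, and the windowed version assumes non-overlapping windows that drop any trailing remainder.

```python
def global_max_pool(seq):
    """Maximum of each feature over all time steps."""
    return [max(col) for col in zip(*seq)]

def global_average_pool(seq):
    """Average of each feature over all time steps."""
    return [sum(col) / len(col) for col in zip(*seq)]

def max_pool_1d(seq, pool_size):
    """Max over non-overlapping windows of pool_size time steps."""
    return [global_max_pool(seq[i:i + pool_size])
            for i in range(0, len(seq) - pool_size + 1, pool_size)]

# Four time steps, two features per step (made-up numbers).
seq = [[1, 5], [3, 2], [0, 4], [2, 2]]
print(global_max_pool(seq), global_average_pool(seq), max_pool_1d(seq, 2))
```

Global pooling collapses the whole sequence into one vector, while windowed pooling only shortens it by a factor of the pool size.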
  • 26. Training. Using pre-trained word embeddings leads to an accuracy of 0.82 on this dataset. This is a case of transfer learning. https://realpython.com/python-keras-text-classification
  • 28. What is a CNN? In a traditional feedforward neural network we connect each input neuron to each output neuron in the next layer. That is also called a fully connected layer, or affine layer. • In a CNN we instead use convolutions over the input layer to compute the output. This results in local connections, where each region of the input is connected to a neuron in the output. Each layer applies different filters and combines the results. • During the training phase, a CNN automatically learns the values of its filters based on the task you want to perform. • Inputs: n_filters, kernel size (= 2).
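The local connections described here can be sketched as a 1D convolution with kernel size 2. The filter values below are made up for illustration (in a real CNN they are learned), and the helper name `conv1d` is hypothetical.

```python
def conv1d(seq, kernel, bias=0.0):
    """Slide the kernel over the sequence; each output sees only len(kernel) inputs."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, seq[i:i + k])) + bias
            for i in range(len(seq) - k + 1)]

seq = [1.0, 2.0, 3.0, 4.0]
filters = [[1.0, -1.0], [0.5, 0.5]]  # n_filters = 2, kernel size = 2

# One feature map per filter, followed by ReLU and global max pooling.
feature_maps = [[max(0.0, v) for v in conv1d(seq, f)] for f in filters]
pooled = [max(fm) for fm in feature_maps]
print(feature_maps, pooled)
```

Each filter produces one feature map, and pooling reduces every map to a single number that can be fed to a classifier layer.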
  • 31. Advantages of CNN • Character-based CNN: has the ability to deal with out-of-vocabulary words, which makes it particularly suitable for user-generated raw text; works for multiple languages; model size is small since the tokens are limited to the number of characters (~70), which makes real-life deployments easier and faster; does not need a lot of data cleaning. • Networks with convolutional and pooling layers are useful for classification tasks in which we expect to find strong local clues regarding class membership. https://machinelearningmastery.com/best-practices-document-classification-deep-learning/
  • 32. Siamese Networks. A Siamese neural network is a class of neural network architectures that contain two or more identical subnetworks: they have the same configuration and the same parameters and weights. Parameter updating is mirrored across the subnetworks. • More robust to class imbalance. • Ensembling with a classifier yields better results. • Creates more meaningful embeddings.
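Weight sharing between the subnetworks can be sketched by encoding both inputs with the same function and the same weights. The one-layer encoder, the weight values, and the dot-product similarity are assumptions chosen to keep the example tiny; they are not the architecture from the talk.

```python
def encode(x, weights):
    """Shared subnetwork: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def similarity(a, b, weights):
    """Encode both inputs with the SAME weights, then take the dot product."""
    ea, eb = encode(a, weights), encode(b, weights)
    return sum(p * q for p, q in zip(ea, eb))

weights = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # hypothetical shared parameters
a, b = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
print(similarity(a, b, weights))
```

Because there is only one set of weights, any gradient update changes both branches at once, which is what "mirrored" parameter updating means.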
  • 33. Multi-task Modeling. Two tasks: • Similarity between question and answer: the question (Q) and the answer (A) each pass through the CNN model, followed by a similarity function and a cross-entropy loss. • Classification of courses: the question/answer passes through the CNN model and a softmax over the number of courses, followed by a cross-entropy loss. (Confidential Material / © 2020 Chegg, Inc. / All Rights Reserved)
  • 34. Performance Metrics Is the model good enough?
  • 35. Classification https://en.wikipedia.org/wiki/Precision_and_recall • Precision: TP/(TP+FP). What percentage of the predicted positives are actually positive? • Recall: TP/(TP+FN). What percentage of the actual positives are captured by the model? • Accuracy: (TP+TN)/(TP+FP+TN+FN). What percentage of all predictions are correct?
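The three formulas can be checked on a small hand-made example. The labels and predictions below are invented for illustration, and the helper name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, accuracy) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard against no predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # guard against no actual positives
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
precision, recall, accuracy = classification_metrics(y_true, y_pred)
print(precision, recall, accuracy)
```

Here TP = 3, FP = 1, FN = 1 and TN = 3, so all three metrics happen to equal 0.75; on imbalanced data they usually diverge.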
  • 36. Thresholding and Coverage. In a binary classification, if you choose randomly, the probability of belonging to a class is 0.5. By moving the decision thresholds away from 0.5 (for example to 0.3 and 0.7), it is possible to improve the percentage of correct results at the cost of coverage.
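The trade-off can be sketched with two thresholds: predict the negative class below the lower one, the positive class above the upper one, and abstain in between. The probabilities and labels are invented, and the abstain-in-the-middle scheme is one assumed reading of the slide's 0.3 and 0.7 values.

```python
def evaluate_with_thresholds(y_true, probs, lower, upper):
    """Return (coverage, accuracy on the covered examples)."""
    covered = correct = 0
    for y, p in zip(y_true, probs):
        if p >= upper:
            pred = 1
        elif p <= lower:
            pred = 0
        else:
            continue  # model abstains: the example is not covered
        covered += 1
        correct += int(pred == y)
    coverage = covered / len(y_true)
    accuracy = correct / covered if covered else 0.0
    return coverage, accuracy

y_true = [1, 1, 0, 1, 0, 0, 0, 1]
probs = [0.95, 0.8, 0.6, 0.55, 0.45, 0.2, 0.1, 0.4]
print(evaluate_with_thresholds(y_true, probs, lower=0.5, upper=0.5))
print(evaluate_with_thresholds(y_true, probs, lower=0.3, upper=0.7))
```

On this toy data the single 0.5 threshold covers every example but gets two wrong, while the 0.3/0.7 pair answers only half of them and gets all of those right.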
  • 38. ROC & AUC. ROC: Receiver Operating Characteristic. An ROC curve is a graph showing the performance of a classification model at all classification thresholds. It plots TPR = TP/(TP+FN) against FPR = FP/(FP+TN); the diagonal corresponds to a random classifier. AUC: Area Under the Curve. • AUC is scale-invariant. It measures how well predictions are ranked, rather than their absolute values. • AUC is classification-threshold-invariant. It measures the quality of the model's predictions irrespective of what classification threshold is chosen. • Works better than accuracy for imbalanced datasets. https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc https://datascience.stackexchange.com/questions/806/advantages-of-auc-vs-standard-accuracy
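The ranking interpretation of AUC can be sketched directly: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The scores and labels are invented, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def tpr_fpr(labels, scores, threshold):
    """One point on the ROC curve for a given classification threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

def auc(labels, scores):
    """Probability that a random positive is ranked above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(tpr_fpr(labels, scores, threshold=0.65), auc(labels, scores))
```

Dividing every score by 10 leaves the ranking, and therefore the AUC, unchanged, which is what scale invariance means in practice.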
  • 39. Summary • Tfidf and word2vec provide simple feature extraction techniques. • As the amount of training data increases, using deep learning becomes a logical choice: feed forward network, CNN, Siamese Networks. • It is important to determine which metrics matter before training data collection and modeling.
  • 41. Word Vectors with Context! • In a context-free embedding, "crisp" in the sentences "The morning air is getting crisp" and "getting burned to a crisp" would have the same vector: f(crisp). • In a context-aware model, the embedding would be specific to the context in which the word appears: f(crisp, context). https://www.gocomics.com/frazz/

Editor's Notes

  1. A single node corresponds to two operations: computation of z, which is a linear combination of features (a) and weights (w), and computation of the activation function sigma(z).
  2. We connect each input neuron to each output neuron in the next layer.
  3. If y = 1 and your model predicts 0, you are penalized heavily. Conversely, if y = 0 and your model predicts 1, the penalization is infinite.
  4. When you build your neural network, one of the choices you get to make is what activation function to use in the hidden layers, as well as in the output units of your neural network. So far, we've just been using the sigmoid activation function.
  5. Sigmoid: used in the output layer because if y is either 0 or 1, then it makes sense for y hat to be a number between 0 and 1 rather than between minus 1 and 1.
  9. With CountVectorizer, we had stacked vectors of word counts, and each vector was the same length (the size of the total corpus vocabulary). With Tokenizer, the resulting vectors equal the length of each text, and the numbers don’t denote counts, but rather correspond to the word values from the dictionary tokenizer.word_index.
  10. Power of generalization --- embeddings are able to share information across similar features. Fewer nodes with zero values.
  11. We define two different tasks for optimization. One of them is to match the front of the card with the back of the card. We use the CNN model defined in the previous slide, with the dot product as the similarity function and a cross-entropy loss. For the classification problem we feed the CNN model into a softmax layer to predict the courses. Both tasks are optimized simultaneously.
  12. In a binary classification, if you choose randomly, the probability of belonging to a class is 0.5.