Confidential Material / © 2020 Chegg, Inc. / All Rights Reserved
Developing a Recommendation System to Provide a Personalized Learning Experience at Chegg
Sanghamitra Deb
Staff Data Scientist
Chegg Inc
Outline
• Recommendations at Chegg
• Organizing Content – Knowledge Graph
• Deep Dive: Content Classification
• Cross-Product Recommendations
• Takeaways
Recommendations at Chegg
The goal of recommendations at Chegg is to provide the best possible
learning experience to students. This is fueled by high-quality
content.
Recommender systems provide the backbone for surfacing the most
relevant content to a student. Organizing content into a knowledge
graph and detecting patterns in student behavior help us
personalize the student experience.
Recommendations at Chegg: Chegg Study Home Page
Multiple services: textbook rentals, question answering,
online tutoring, flashcards, writing, math solver, etc.
Knowledge Graph
[Diagram: hierarchy of Subject → Course → Concept → Sub-concepts.]
Example: Physics (subject); Electricity and Magnetism, Mechanics, Quantum Physics (courses); Velocity (concept).
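The hierarchy on this slide can be sketched as a small tree. This is a minimal illustration, not Chegg's implementation: the `Node` class and the `paths` helper are hypothetical names, and attaching Velocity to Mechanics is an assumption, since the slide does not say which course the concept belongs to.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    level: str  # "subject", "course", "concept", or "sub-concept"
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child


def paths(node, prefix=()):
    """Yield every root-to-leaf path as a tuple of node names."""
    current = prefix + (node.name,)
    if not node.children:
        yield current
    for child in node.children:
        yield from paths(child, current)


physics = Node("Physics", "subject")
physics.add(Node("Electricity and Magnetism", "course"))
mechanics = physics.add(Node("Mechanics", "course"))
physics.add(Node("Quantum Physics", "course"))
# Assumption for illustration: Velocity is placed under Mechanics.
mechanics.add(Node("Velocity", "concept"))

all_paths = list(paths(physics))
```

Each path in `all_paths` is one route from the subject down to its most specific node, which is the kind of label a content classifier would need to predict.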
Connecting content to the Knowledge Graph
[Diagram: machine learning classifiers map content items to knowledge graph nodes (Subject, Course, Concept, Sub-concepts).]
Example content:
• Question: "A rightward-moving bicycle increases its speed from 2.0 m/s to 12.0 m/s. Is the bicycle accelerating?"
• Mitosis: a type of cell division that results in two daughter cells each having the same number and kind of chromosomes as the parent nucleus, typical of ordinary tissue growth.
• Writing tools: "Get your physics paper checked by an expert"
Connecting users to the Knowledge Graph
[Diagram: machine learning classifiers link the content a user engages with to knowledge graph nodes (Subject, Course, Concept, Sub-concepts).]
• Question: "A rightward-moving bicycle increases its speed from 2.0 m/s to 12.0 m/s. Is the bicycle accelerating?"
• Writing tools: "Do you need help writing a physics paper?"
• Example nodes: Physics 101, Acceleration, Biology, Mitosis
Edges are created between users and …
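One way to picture user edges is to count how often each user touches content that the classifiers have already assigned to a node. The sketch below assumes that edges link users to courses and concepts, as the Takeaways slide suggests. The content ids, the user names and the `build_user_edges` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical classifier output: content id -> KG nodes it was assigned to.
content_to_nodes = {
    "question-1": {"Physics 101", "Acceleration"},
    "deck-7": {"Biology", "Mitosis"},
}


def build_user_edges(interactions, content_to_nodes):
    """Count user -> KG node edges from (user, content_id) interaction pairs."""
    edges = defaultdict(lambda: defaultdict(int))
    for user, content_id in interactions:
        # Content that has not been classified contributes no edges.
        for node in content_to_nodes.get(content_id, ()):
            edges[user][node] += 1
    return edges


interactions = [
    ("alice", "question-1"),
    ("alice", "question-1"),
    ("bob", "deck-7"),
    ("bob", "unclassified-item"),
]
user_edges = build_user_edges(interactions, content_to_nodes)
```

The edge counts act as a simple weight: a node a student keeps returning to is a stronger signal of what they are currently studying.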
Content Classification Pipeline
Stages: Text Pre-processing → Collecting Training Data → Model Building → Model Evaluation
Text Pre-processing
• Reduces noise
• Ensures quality
• Improves overall performance
Collecting Training Data
• Examples of the classes that we are trying to model
• Model performance is directly correlated with the quality of the training data
Model Building
• Model selection
• Architecture
• Parameter tuning
Model Evaluation
• Offline: SME
• Online: Student
Classification Problem
Assigning decks to courses
• Decks are lists of cards that students group together
for studying.
• There are several thousand courses. A course is typically
more granular than a subject but less granular than a
concept.
Modeling Approaches
• TF-IDF features with an SVM classifier
Pros –
  • Gives decent performance on small training data.
  • Straightforward training pipeline.
Cons –
  • Does not do well for subjects dominated by symbols.
  • Including both word- and character-based features makes the token space and the model extremely large.
• Character-based CNN
  • Can handle out-of-vocabulary words, which makes it particularly suitable for raw user-generated text.
  • Works for multiple languages.
  • Model size is small since the tokens are limited to the number of characters (~70). This makes real-life deployments easier and faster.
  • Networks with convolutional and pooling layers are useful for classification tasks in which we expect to find strong local clues regarding class membership.
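A character-level input keeps the vocabulary small and leaves no word out of vocabulary. The sketch below shows why: every string, including misspellings and symbols, maps onto the same short alphabet. The alphabet here (lowercase letters, digits and ASCII punctuation, 68 symbols) is an assumption. The slide says only "~70" and does not list the characters. The names `ALPHABET` and `encode` are hypothetical.

```python
import string

# Assumed alphabet; the exact character set used in the talk is not given.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation
# Index 0 is reserved for padding and for characters outside the alphabet
# (including spaces in this toy version).
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
PAD_OR_UNKNOWN = 0


def encode(text, max_len=32):
    """Map text to a fixed-length list of character indices."""
    ids = [CHAR_TO_INDEX.get(ch, PAD_OR_UNKNOWN) for ch in text.lower()[:max_len]]
    return ids + [PAD_OR_UNKNOWN] * (max_len - len(ids))


formula = encode("F=ma", max_len=6)
made_up_word = encode("zxqv", max_len=4)
```

An embedding table for this input needs only one row per character, which is where the small model size comes from. A word-level TF-IDF vocabulary grows with every new term students type.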
CNN Model Architecture
[Diagram. Labeled components: feature length (input), convolutions, convolution & pool layers (2 layers of convolution & pooling), GlobalMaxPool1D, dense layer, norm, PReLU, dropout.]
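The convolution and global max pooling steps named in the diagram can be illustrated with a toy forward pass in plain Python. This is a sketch with a single hand-picked kernel, not the model from the talk: the kernel values and the input signals are made up, and ReLU stands in for the PReLU activation shown on the slide.

```python
def conv1d(sequence, kernel):
    """Valid 1D convolution (cross-correlation) of a sequence with one kernel."""
    width = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]


def relu(values):
    """Zero out negative activations (a simplification of PReLU)."""
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    """Keep only the strongest activation, wherever it occurs in the sequence."""
    return max(values)


kernel = [1.0, -1.0, 1.0]  # responds to the local pattern "1, 0, 1"
signal_early = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # pattern at the start
signal_late = [0.0, 0.0, 0.0, 1.0, 0.0, 1.0]   # same pattern at the end

feature_early = global_max_pool(relu(conv1d(signal_early, kernel)))
feature_late = global_max_pool(relu(conv1d(signal_late, kernel)))
```

Both inputs produce the same pooled feature, which is the sense in which these layers pick up strong local clues regardless of where they appear in the text.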
Multi-task Modeling
[Diagram. Task 1: Card Front and Card Back each pass through a CNN Model; a Similarity Function on front and back feeds a Cross Entropy Loss output. Task 2: the Card passes through a CNN Model, then a Softmax over the number of courses, with a Cross Entropy Loss output.]
Two tasks
• Similarity between card front and back.
• Classification of courses.
Adding another task improves the accuracy by a few percent.
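The two tasks can be combined into one training objective by adding their losses. The sketch below is an assumption about how that might look, not the formulation used in the talk: cosine similarity squashed through a sigmoid stands in for the unspecified similarity function, and the `weight` parameter balancing the tasks is hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def course_loss(logits, true_course):
    """Cross-entropy over course classes."""
    return -math.log(softmax(logits)[true_course])


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similarity_loss(front_vec, back_vec, is_pair):
    """Binary cross-entropy on a cosine similarity squashed into (0, 1)."""
    prob = 1.0 / (1.0 + math.exp(-cosine(front_vec, back_vec)))
    return -math.log(prob if is_pair else 1.0 - prob)


def multitask_loss(logits, true_course, front_vec, back_vec, is_pair, weight=0.5):
    # `weight` is a hypothetical knob balancing the two tasks.
    return course_loss(logits, true_course) + weight * similarity_loss(
        front_vec, back_vec, is_pair
    )


logits = [2.0, 0.0, 0.0]
loss_aligned = multitask_loss(logits, 0, [1.0, 0.0], [1.0, 0.0], True)
loss_opposed = multitask_loss(logits, 0, [1.0, 0.0], [-1.0, 0.0], True)
```

With the same course prediction, the loss is lower when the front and back encodings of a true pair agree. The auxiliary task therefore pushes the shared encoder toward representations that capture what a card is about.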
Model Performance
Top-3 accuracy: 73% on offline test data.
Challenges
• Imbalanced training data
• Some classes have too few training examples
Solutions
• Collect more training data.
• Use rule-based techniques to augment training data.
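Top-3 accuracy counts a prediction as correct when the true course is among the three highest-scoring classes. A minimal sketch of the metric follows; the scores and labels are invented for illustration and are unrelated to the 73% figure above.

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


scores = [
    [0.1, 0.5, 0.2, 0.2],    # true class 1 is ranked first
    [0.4, 0.3, 0.2, 0.1],    # true class 3 is ranked last, a miss at k=3
    [0.25, 0.05, 0.3, 0.4],  # true class 0 is ranked third
]
labels = [1, 3, 0]

top3 = top_k_accuracy(scores, labels, k=3)
top1 = top_k_accuracy(scores, labels, k=1)
```

With several thousand courses, many of them closely related, a top-k metric is more forgiving than strict top-1 accuracy. The third example counts as a hit at k=3 and as a miss at k=1.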
Cross-Product Recommendations
Cold-start problem: users often use one product, such as Chegg Study, and may
only browse other products, such as Chegg Practice or Flash Cards.
Solutions (personalized):
• Content filtering: use the KG to determine the courses, concepts and sub-concepts
that users are currently studying, and recommend trending content in those
categories.
• Text similarity: match content to what users have engaged with, using in-house
language models optimized for Chegg content.
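The content filtering idea can be sketched as a filter followed by a sort. Keep the items in the target product that share a KG node with the user, then rank them by a trending signal. Everything in this example is hypothetical: the catalog entries, the use of view counts as the trending signal, and the `recommend` function.

```python
# Hypothetical catalog: each item belongs to a product, is tagged with one KG
# node, and carries a recent-engagement count used as the "trending" signal.
catalog = [
    {"id": "deck-1", "product": "flashcards", "node": "Mitosis", "views": 120},
    {"id": "deck-2", "product": "flashcards", "node": "Acceleration", "views": 300},
    {"id": "deck-3", "product": "flashcards", "node": "Acceleration", "views": 80},
    {"id": "quiz-1", "product": "practice", "node": "Acceleration", "views": 50},
]


def recommend(user_nodes, target_product, catalog, limit=2):
    """Return the most-viewed items in `target_product` tagged with the user's KG nodes."""
    candidates = [
        item
        for item in catalog
        if item["product"] == target_product and item["node"] in user_nodes
    ]
    candidates.sort(key=lambda item: item["views"], reverse=True)
    return [item["id"] for item in candidates[:limit]]


# A Chegg Study user whose questions were classified under "Acceleration"
# has no flashcard history, yet still receives flashcard recommendations.
flashcard_recs = recommend({"Acceleration"}, "flashcards", catalog)
```

The user needs no history in the target product, because the KG nodes learned from one product carry over to the others.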
Takeaways
• Content drives recommendations
  • High quality
  • Relevant
• Organizing the content into a Knowledge Graph (KG) facilitates content-based
recommendations.
• Accuracy of classifiers is important: models are constantly iterated on, even
for a gain of a few percent.
• The KG helps connect students with courses and concepts, which helps with
personalized recommendations.
• Cross-product recommendations are possible through the KG.
• Cold-start problems are made easier.
Questions

More Related Content

What's hot

[NDC 14] 가죽 장화를 먹게 해달라니 - <야생의>의 자유도 높은 아이템 시스템 구현
[NDC 14] 가죽 장화를 먹게 해달라니 - <야생의>의 자유도 높은 아이템 시스템 구현[NDC 14] 가죽 장화를 먹게 해달라니 - <야생의>의 자유도 높은 아이템 시스템 구현
[NDC 14] 가죽 장화를 먹게 해달라니 - <야생의>의 자유도 높은 아이템 시스템 구현Hoyoung Choi
 
A Survey of Image Steganography
A Survey of Image SteganographyA Survey of Image Steganography
A Survey of Image SteganographyEditor IJCATR
 
Skin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptxSkin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptxVishalLabde
 
이원, 온라인 게임 프로젝트 개발 결산 - 마비노기 개발 완수 보고서, NDC2011
이원, 온라인 게임 프로젝트 개발 결산 - 마비노기 개발 완수 보고서, NDC2011이원, 온라인 게임 프로젝트 개발 결산 - 마비노기 개발 완수 보고서, NDC2011
이원, 온라인 게임 프로젝트 개발 결산 - 마비노기 개발 완수 보고서, NDC2011devCAT Studio, NEXON
 
구매 기록 데이터 기반 솔루션 제공
구매 기록 데이터 기반 솔루션 제공구매 기록 데이터 기반 솔루션 제공
구매 기록 데이터 기반 솔루션 제공정재 전
 
Machine learning in image processing
Machine learning in image processingMachine learning in image processing
Machine learning in image processingData Science Thailand
 
Bandwidth enhancement patch antenna
Bandwidth enhancement patch antennaBandwidth enhancement patch antenna
Bandwidth enhancement patch antennaAnurag Anupam
 
Image compression in digital image processing
Image compression in digital image processingImage compression in digital image processing
Image compression in digital image processingDHIVYADEVAKI
 
문석진, 프로젝트DH의 절차적 애니메이션 시스템 Ⅱ, NDC2018
문석진, 프로젝트DH의 절차적 애니메이션 시스템 Ⅱ, NDC2018문석진, 프로젝트DH의 절차적 애니메이션 시스템 Ⅱ, NDC2018
문석진, 프로젝트DH의 절차적 애니메이션 시스템 Ⅱ, NDC2018devCAT Studio, NEXON
 
Introduction to computer graphics and multimedia
Introduction to computer graphics and multimediaIntroduction to computer graphics and multimedia
Introduction to computer graphics and multimediaShweta Shah
 
3GPP SON Series: Cell Outage Detection and Compensation (COD & COC)
3GPP SON Series: Cell Outage Detection and Compensation (COD & COC)3GPP SON Series: Cell Outage Detection and Compensation (COD & COC)
3GPP SON Series: Cell Outage Detection and Compensation (COD & COC)3G4G
 
Edge detection of video using matlab code
Edge detection of video using matlab codeEdge detection of video using matlab code
Edge detection of video using matlab codeBhushan Deore
 
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.ozlael ozlael
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Line Detection in Computer Vision
Line Detection in Computer VisionLine Detection in Computer Vision
Line Detection in Computer VisionParth Nandedkar
 
[NDC 2010] 그럴듯한 랜덤 생성 컨텐츠 만들기
[NDC 2010] 그럴듯한 랜덤 생성 컨텐츠 만들기[NDC 2010] 그럴듯한 랜덤 생성 컨텐츠 만들기
[NDC 2010] 그럴듯한 랜덤 생성 컨텐츠 만들기Yongha Kim
 
윤석주, 인하우스 웹 프레임워크 Jul8 제작기, NDC2018
윤석주, 인하우스 웹 프레임워크 Jul8 제작기, NDC2018윤석주, 인하우스 웹 프레임워크 Jul8 제작기, NDC2018
윤석주, 인하우스 웹 프레임워크 Jul8 제작기, NDC2018devCAT Studio, NEXON
 
[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지
[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지
[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지강 민우
 

What's hot (20)

[NDC 14] 가죽 장화를 먹게 해달라니 - <야생의>의 자유도 높은 아이템 시스템 구현
[NDC 14] 가죽 장화를 먹게 해달라니 - <야생의>의 자유도 높은 아이템 시스템 구현[NDC 14] 가죽 장화를 먹게 해달라니 - <야생의>의 자유도 높은 아이템 시스템 구현
[NDC 14] 가죽 장화를 먹게 해달라니 - <야생의>의 자유도 높은 아이템 시스템 구현
 
Coco dataset
Coco datasetCoco dataset
Coco dataset
 
A Survey of Image Steganography
A Survey of Image SteganographyA Survey of Image Steganography
A Survey of Image Steganography
 
Audio steganography - LSB
Audio steganography - LSBAudio steganography - LSB
Audio steganography - LSB
 
Skin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptxSkin melanoma stage detection - CNN.pptx
Skin melanoma stage detection - CNN.pptx
 
이원, 온라인 게임 프로젝트 개발 결산 - 마비노기 개발 완수 보고서, NDC2011
이원, 온라인 게임 프로젝트 개발 결산 - 마비노기 개발 완수 보고서, NDC2011이원, 온라인 게임 프로젝트 개발 결산 - 마비노기 개발 완수 보고서, NDC2011
이원, 온라인 게임 프로젝트 개발 결산 - 마비노기 개발 완수 보고서, NDC2011
 
구매 기록 데이터 기반 솔루션 제공
구매 기록 데이터 기반 솔루션 제공구매 기록 데이터 기반 솔루션 제공
구매 기록 데이터 기반 솔루션 제공
 
Machine learning in image processing
Machine learning in image processingMachine learning in image processing
Machine learning in image processing
 
Bandwidth enhancement patch antenna
Bandwidth enhancement patch antennaBandwidth enhancement patch antenna
Bandwidth enhancement patch antenna
 
Image compression in digital image processing
Image compression in digital image processingImage compression in digital image processing
Image compression in digital image processing
 
문석진, 프로젝트DH의 절차적 애니메이션 시스템 Ⅱ, NDC2018
문석진, 프로젝트DH의 절차적 애니메이션 시스템 Ⅱ, NDC2018문석진, 프로젝트DH의 절차적 애니메이션 시스템 Ⅱ, NDC2018
문석진, 프로젝트DH의 절차적 애니메이션 시스템 Ⅱ, NDC2018
 
Introduction to computer graphics and multimedia
Introduction to computer graphics and multimediaIntroduction to computer graphics and multimedia
Introduction to computer graphics and multimedia
 
3GPP SON Series: Cell Outage Detection and Compensation (COD & COC)
3GPP SON Series: Cell Outage Detection and Compensation (COD & COC)3GPP SON Series: Cell Outage Detection and Compensation (COD & COC)
3GPP SON Series: Cell Outage Detection and Compensation (COD & COC)
 
Edge detection of video using matlab code
Edge detection of video using matlab codeEdge detection of video using matlab code
Edge detection of video using matlab code
 
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
 
Line Detection in Computer Vision
Line Detection in Computer VisionLine Detection in Computer Vision
Line Detection in Computer Vision
 
[NDC 2010] 그럴듯한 랜덤 생성 컨텐츠 만들기
[NDC 2010] 그럴듯한 랜덤 생성 컨텐츠 만들기[NDC 2010] 그럴듯한 랜덤 생성 컨텐츠 만들기
[NDC 2010] 그럴듯한 랜덤 생성 컨텐츠 만들기
 
윤석주, 인하우스 웹 프레임워크 Jul8 제작기, NDC2018
윤석주, 인하우스 웹 프레임워크 Jul8 제작기, NDC2018윤석주, 인하우스 웹 프레임워크 Jul8 제작기, NDC2018
윤석주, 인하우스 웹 프레임워크 Jul8 제작기, NDC2018
 
[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지
[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지
[IGC 2017] 아마존 구승모 - 게임 엔진으로 서버 제작 및 운영까지
 

Similar to Personalized Learning Recommendations at Chegg

cache teaching analogy dataa naylatics Download PDF(Updated Curriculum in Bo...
cache teaching  analogy dataa naylatics Download PDF(Updated Curriculum in Bo...cache teaching  analogy dataa naylatics Download PDF(Updated Curriculum in Bo...
cache teaching analogy dataa naylatics Download PDF(Updated Curriculum in Bo...Mayurkumarpatil1
 
3Edge Corporate Presentation
3Edge Corporate Presentation3Edge Corporate Presentation
3Edge Corporate Presentation3Edge
 
Discovering the New SuccessFactors LMS Admin Features
Discovering the New SuccessFactors LMS Admin FeaturesDiscovering the New SuccessFactors LMS Admin Features
Discovering the New SuccessFactors LMS Admin FeaturesAshton Plusquellec
 
Supervised learning
Supervised learningSupervised learning
Supervised learningankit_ppt
 
OpenEd 2013: Designing Open Badges and an Open Course to Enhance and Extend...
OpenEd  2013: Designing Open Badges and an Open Course  to Enhance and Extend...OpenEd  2013: Designing Open Badges and an Open Course  to Enhance and Extend...
OpenEd 2013: Designing Open Badges and an Open Course to Enhance and Extend...Dan Randall
 
Adhyyan presentation.pptx
Adhyyan presentation.pptxAdhyyan presentation.pptx
Adhyyan presentation.pptxRashmiM58
 
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f..."Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...Edge AI and Vision Alliance
 
Applied Machine Learning Course - Jodie Zhu (WeCloudData)
Applied Machine Learning Course - Jodie Zhu (WeCloudData)Applied Machine Learning Course - Jodie Zhu (WeCloudData)
Applied Machine Learning Course - Jodie Zhu (WeCloudData)WeCloudData
 
ML_Internship Presentation_Infidata_2021.pptx
ML_Internship Presentation_Infidata_2021.pptxML_Internship Presentation_Infidata_2021.pptx
ML_Internship Presentation_Infidata_2021.pptxAltafSMT
 
Introduction to EMA highlights
Introduction to EMA highlightsIntroduction to EMA highlights
Introduction to EMA highlightsNick Bunyan
 
Monika_Bansal_resume
Monika_Bansal_resumeMonika_Bansal_resume
Monika_Bansal_resumeMonika Bansal
 
Bridging the Divide: High Technology in Low-resource Settings -- an update (S...
Bridging the Divide: High Technology in Low-resource Settings -- an update (S...Bridging the Divide: High Technology in Low-resource Settings -- an update (S...
Bridging the Divide: High Technology in Low-resource Settings -- an update (S...James BonTempo
 
NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsSanghamitra Deb
 
Bringing Blackboard to Bath
Bringing Blackboard to BathBringing Blackboard to Bath
Bringing Blackboard to Bathkateboardman
 
altafppt.pptx
altafppt.pptxaltafppt.pptx
altafppt.pptxAltafSMT
 
Improving the student experience using digital insights
Improving the student experience using digital insightsImproving the student experience using digital insights
Improving the student experience using digital insightsJisc
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Lucidworks
 
altafppt.pptx
altafppt.pptxaltafppt.pptx
altafppt.pptxAltafAS
 

Similar to Personalized Learning Recommendations at Chegg (20)

Using weak supervision and transfer learning techniques to build knowledge gr...
Using weak supervision and transfer learning techniques to build knowledge gr...Using weak supervision and transfer learning techniques to build knowledge gr...
Using weak supervision and transfer learning techniques to build knowledge gr...
 
cache teaching analogy dataa naylatics Download PDF(Updated Curriculum in Bo...
cache teaching  analogy dataa naylatics Download PDF(Updated Curriculum in Bo...cache teaching  analogy dataa naylatics Download PDF(Updated Curriculum in Bo...
cache teaching analogy dataa naylatics Download PDF(Updated Curriculum in Bo...
 
3Edge Corporate Presentation
3Edge Corporate Presentation3Edge Corporate Presentation
3Edge Corporate Presentation
 
Discovering the New SuccessFactors LMS Admin Features
Discovering the New SuccessFactors LMS Admin FeaturesDiscovering the New SuccessFactors LMS Admin Features
Discovering the New SuccessFactors LMS Admin Features
 
Supervised learning
Supervised learningSupervised learning
Supervised learning
 
OpenEd 2013: Designing Open Badges and an Open Course to Enhance and Extend...
OpenEd  2013: Designing Open Badges and an Open Course  to Enhance and Extend...OpenEd  2013: Designing Open Badges and an Open Course  to Enhance and Extend...
OpenEd 2013: Designing Open Badges and an Open Course to Enhance and Extend...
 
Adhyyan presentation.pptx
Adhyyan presentation.pptxAdhyyan presentation.pptx
Adhyyan presentation.pptx
 
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f..."Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
 
Applied Machine Learning Course - Jodie Zhu (WeCloudData)
Applied Machine Learning Course - Jodie Zhu (WeCloudData)Applied Machine Learning Course - Jodie Zhu (WeCloudData)
Applied Machine Learning Course - Jodie Zhu (WeCloudData)
 
ML_Internship Presentation_Infidata_2021.pptx
ML_Internship Presentation_Infidata_2021.pptxML_Internship Presentation_Infidata_2021.pptx
ML_Internship Presentation_Infidata_2021.pptx
 
Introduction to EMA highlights
Introduction to EMA highlightsIntroduction to EMA highlights
Introduction to EMA highlights
 
Nagacv
NagacvNagacv
Nagacv
 
Monika_Bansal_resume
Monika_Bansal_resumeMonika_Bansal_resume
Monika_Bansal_resume
 
Bridging the Divide: High Technology in Low-resource Settings -- an update (S...
Bridging the Divide: High Technology in Low-resource Settings -- an update (S...Bridging the Divide: High Technology in Low-resource Settings -- an update (S...
Bridging the Divide: High Technology in Low-resource Settings -- an update (S...
 
NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-experts
 
Bringing Blackboard to Bath
Bringing Blackboard to BathBringing Blackboard to Bath
Bringing Blackboard to Bath
 
altafppt.pptx
altafppt.pptxaltafppt.pptx
altafppt.pptx
 
Improving the student experience using digital insights
Improving the student experience using digital insightsImproving the student experience using digital insights
Improving the student experience using digital insights
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
 
altafppt.pptx
altafppt.pptxaltafppt.pptx
altafppt.pptx
 

More from Sanghamitra Deb

Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningSanghamitra Deb
 
Computer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureComputer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureSanghamitra Deb
 
NLP Classifier Models & Metrics
NLP Classifier Models & MetricsNLP Classifier Models & Metrics
NLP Classifier Models & MetricsSanghamitra Deb
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsSanghamitra Deb
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningSanghamitra Deb
 
Democratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsDemocratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsSanghamitra Deb
 
Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Sanghamitra Deb
 
Extracting knowledgebase from text
Extracting knowledgebase from textExtracting knowledgebase from text
Extracting knowledgebase from textSanghamitra Deb
 
Extracting medical attributes and finding relations
Extracting medical attributes and finding relationsExtracting medical attributes and finding relations
Extracting medical attributes and finding relationsSanghamitra Deb
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data ScienceSanghamitra Deb
 
Understanding Product Attributes from Reviews
Understanding Product Attributes from ReviewsUnderstanding Product Attributes from Reviews
Understanding Product Attributes from ReviewsSanghamitra Deb
 

More from Sanghamitra Deb (14)

odsc_2023.pdf
odsc_2023.pdfodsc_2023.pdf
odsc_2023.pdf
 
Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learning
 
Computer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureComputer Vision Landscape : Present and Future
Computer Vision Landscape : Present and Future
 
Intro to ml_2021
Intro to ml_2021Intro to ml_2021
Intro to ml_2021
 
NLP Classifier Models & Metrics
NLP Classifier Models & MetricsNLP Classifier Models & Metrics
NLP Classifier Models & Metrics
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Democratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsDemocratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUs
 
Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.
 
Data day2017
Data day2017Data day2017
Data day2017
 
Extracting knowledgebase from text
Extracting knowledgebase from textExtracting knowledgebase from text
Extracting knowledgebase from text
 
Extracting medical attributes and finding relations
Extracting medical attributes and finding relationsExtracting medical attributes and finding relations
Extracting medical attributes and finding relations
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
 
Understanding Product Attributes from Reviews
Understanding Product Attributes from ReviewsUnderstanding Product Attributes from Reviews
Understanding Product Attributes from Reviews
 

Recently uploaded

4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleCeline George
 

Recently uploaded (20)

Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS

Personalized Learning Recommendations at Chegg

  • 1. Confidential Material / © 2020 Chegg, Inc. / All Rights Reserved. Developing Recommendation System to provide a Personalized Learning experience at Chegg. Sanghamitra Deb, Staff Data Scientist, Chegg Inc
  • 2. Outline • Recommendations at Chegg • Organizing Content – Knowledge Graph • Deep Dive: Content Classifications • Cross Product Recommendations • Takeaways
  • 3. Recommendations at Chegg: The goal of recommendations at Chegg is to provide the best possible learning experience to students. This is fueled by high-quality content. Recommender systems provide a backbone to surface the most relevant content to a student. Organizing content into a knowledge graph and detecting patterns in student behavior help us personalize the student experience.
  • 4. Recommendations at Chegg: Chegg Study Home Page. Multiple services: textbook rentals, question answering, online tutoring, flashcards, writing, math solver, etc.
  • 5. Knowledge Graph (diagram): hierarchy levels are Subject, Course, Concept and Sub-concepts. Example nodes: Physics (subject); Electricity and Magnetism, Mechanics, Quantum Physics (courses); Velocity (concept).
  • 6. Connecting content to the Knowledge Graph (diagram). Levels: Subject, Course, Concept, Sub-concepts. Labels: Writing tools; Machine Learning Classifiers. Example content: "A rightward-moving bicycle increases its speed from 2.0 m/s to 12.0 m/s. Is the bicycle accelerating?"; "Mitosis: a type of cell division that results in two daughter cells each having the same number and kind of chromosomes as the parent nucleus, typical of ordinary tissue growth."; "Get your physics paper checked by an expert".
  • 7. Connecting users to the Knowledge Graph: Subject Course Concept Sub-concepts "A rightward-moving bicycle increases its speed from 2.0 m/s to 12.0 m/s. Is the bicycle accelerating?" Writing tools Machine Learning Classifiers Physics 101 Acceleration "Do you need help writing a physics paper?" Edges are created between users and Biology Mitosis
  • 8. Content Classification Pipeline: Text Pre-processing Collecting Training Data Model Building Offline SME • Reduces noise • Ensures quality • Improves overall performance • Training Data Collection / Examples of classes that we are trying to model • Model performance is directly correlated with quality of training data Model Evaluation • Model selection • Architecture • Parameter Tuning Student Online
  • 9. Classification Problem: Assigning decks to Courses • Decks are lists of cards grouped together by students for studying. • There are several thousand courses; a course is typically more granular than a subject but less granular than a concept.
  • 10. Modeling Approaches • TF-IDF features with an SVM classifier. Pros: • Gives decent performance on small training data. • Straightforward training pipeline. Cons: • Does not do well for subjects dominated by symbols. • Including word- and character-based features makes the token space and model extremely large. • Character-based CNN. • Has the ability to deal with out-of-vocabulary words. This makes it particularly suitable for user-generated raw text. • Works for multiple languages. • Model size is small since the tokens are limited to the number of characters (~70). This makes real-life deployments easier and faster. • Networks with convolutional and pooling layers are useful for classification tasks in which we expect to find strong local clues regarding class membership.
  • 11. CNN Model Architecture (diagram labels): GlobalMaxPool1D; Convolutions; Feature Length; DenseLayer; Dropout; PReLU; Norm; Convolution & pool layer; 2 layers of convolution & pooling
  • 12. Multi-task Modeling (diagram): Card Front and Card Back each pass through the CNN Model into a Similarity Function with a Cross Entropy Loss; the Card passes through the CNN Model into a Softmax over the number of courses with a Cross Entropy Loss. Two tasks: • Similarity between card front and back. • Classification of courses. Adding another task improves the accuracy by a few percent.
  • 13. Model Performance: Top-3 accuracy of 73% on offline test data. Challenges • Imbalanced training data • Some classes have too few training examples Solutions • Collect more training data. • Use rule-based techniques to augment training data.
  • 14. Cross Product Recommendations! Cold Start Problem: Users often use one product, such as Chegg Study, and may just browse other products that provide Chegg Practice or Flash Cards. Solutions: Personalized • Content Filtering: Use the KG to determine the courses, concepts and sub-concepts that users are currently studying and recommend trending content in that category. • Text Similarity: Based on their content engagement. Use in-house language models optimized for Chegg content.
  • 15. Takeaways • Content drives recommendations • High quality • Relevant • Organizing the content into a Knowledge Graph (KG) facilitates content-based recommendations. • Accuracy of classifiers is important: models are constantly iterated on, even for a gain of a few percent. • The KG helps connect students and courses/concepts, which helps with personalized recommendations. • Cross product recommendations are possible through the KG. • Cold start problems are made easier.
  • 16. Questions
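
The knowledge-graph idea in slides 5 to 7 and the content-filtering solution in slide 14 can be illustrated with a small sketch. This is only a toy under stated assumptions, not Chegg's implementation: the node names come from the slides, but the content ids, view counts, the `PARENT` and `CONTENT` tables, and the ranking rule (shared graph nodes first, then popularity) are all hypothetical.

```python
# Toy hierarchy: child -> parent (Subject > Course > Concept).
# Node names follow the slides; the exact edges are assumed for illustration.
PARENT = {
    "Mechanics": "Physics",
    "Electricity and Magnetism": "Physics",
    "Quantum Physics": "Physics",
    "Velocity": "Mechanics",
    "Acceleration": "Mechanics",
    "Mitosis": "Biology",
}

# Hypothetical content table: id -> (KG node, product, view count).
# In the talk, classifiers assign the KG node; here it is hard-coded.
CONTENT = {
    "q1": ("Acceleration", "Study", 120),
    "deck7": ("Acceleration", "Flashcards", 45),
    "deck9": ("Velocity", "Flashcards", 80),
    "deck3": ("Mitosis", "Flashcards", 300),
}


def ancestors(node):
    """Return the path from a node up to its subject."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def user_nodes(viewed_ids):
    """Connect a user to every KG node reachable from viewed content."""
    nodes = set()
    for content_id in viewed_ids:
        nodes.update(ancestors(CONTENT[content_id][0]))
    return nodes


def recommend(viewed_ids, product, k=2):
    """Recommend unseen content from another product that shares KG nodes."""
    nodes = user_nodes(viewed_ids)
    candidates = []
    for content_id, (node, prod, views) in CONTENT.items():
        if prod != product or content_id in viewed_ids:
            continue
        shared = sum(1 for n in ancestors(node) if n in nodes)
        if shared:
            # More shared nodes means a closer match; views break ties.
            candidates.append((shared, views, content_id))
    candidates.sort(reverse=True)
    return [content_id for _, _, content_id in candidates[:k]]


print(recommend(["q1"], "Flashcards"))
```

A user who has only viewed a Study question on Acceleration still gets flashcard decks from the same concept and course, which is how the graph eases the cold start between products in this sketch.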

Editor's Notes

  1. I am going to talk about personalizing the learning experience at Chegg using recommendation systems.
  2. Here is an outline of the presentation.
  3. Chegg is a centralized learning platform where a student comes to learn concepts required for academic performance, job interviews or other activities. The goal of any RS is to present content that is of high quality and relevant, i.e., we show students what they want to study. For example, let's say the student has a data analyst job interview. We know this from past user interactions, so we show the student content related to learning "SQL".
  4. This is an example of the student experience at Chegg. A student logs in and finds suggestions in Mechanical Engineering and Chemistry. As you can see, this model suggests textbook solutions for users based on their past behavior, and it is accompanied by the message "based on your progress". Another example is a concept-based recommendation module in Study, which is placed below an expert answer that the student is viewing. I wanted to use this slide to give you a look into our content. As you can see, most of our content is academic material.
  5. Now I will segue into how this content is organized. We have built a knowledge graph which represents a hierarchy of subjects, courses and concepts. The nodes in this graph are provided by subject matter experts. We constantly iterate on this graph as we get suggestions for more nodes and edges. The machine learning component comes in when we create edges between concept nodes and content. How does this look?
  6. Here is an example of how we connect content from different products to the nodes of the knowledge graph.
  7. When users interact with the content, we are able to connect users to a node of the knowledge graph. Since user interactions constantly change with time, the edges between users and KG nodes are constantly updated.
  8. Let's now do a deep dive into content classification, since that is the backbone of all the recommendations here.
  9. Convolution and pooling layers are good at picking up signatures at the n-gram level, i.e., they are able to pick up when certain phrases are indicative of certain class memberships.
  10. The two layers ensure that the correlations between n-grams are picked up at two different scales.
  11. We define two different tasks for optimization. One of them is to match the front of the card with the back of the card. We use the CNN model defined in the previous slide, with the dot product as the similarity function and a cross entropy loss. For the classification problem we feed the CNN model into a softmax layer to predict the courses. Both tasks are optimized simultaneously.
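
The two-task objective described in the last note can be sketched as a single combined loss. The notes only say that the similarity task uses a dot product with a cross entropy loss and that the course task uses a softmax; everything else below is an assumption made for illustration: the similarity term is written as binary cross entropy on the sigmoid of the dot product, the two losses are simply added with a hypothetical `weight` parameter, and plain Python lists stand in for the CNN outputs.

```python
import math


def dot(u, v):
    """Dot product, used as the similarity between front and back vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax_cross_entropy(logits, target_index):
    """Cross entropy of softmax(logits) against one target class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_index]


def similarity_loss(front_vec, back_vec, is_match):
    """Binary cross entropy on sigmoid(dot(front, back)) (assumed form)."""
    score = dot(front_vec, back_vec)
    # -log(sigmoid(s)) for a matching pair, -log(1 - sigmoid(s)) otherwise;
    # both are softplus(z), computed here in a numerically stable way.
    z = -score if is_match else score
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def multitask_loss(front_vec, back_vec, is_match,
                   course_logits, course_index, weight=1.0):
    """Sum of the similarity loss and the weighted course loss."""
    return (similarity_loss(front_vec, back_vec, is_match)
            + weight * softmax_cross_entropy(course_logits, course_index))


# A matching front/back pair with aligned vectors and a confident,
# correct course prediction gives a small combined loss.
print(multitask_loss([1.0, 2.0], [1.0, 2.0], True, [4.0, 0.0, 0.0], 0))
```

Minimizing the sum updates the shared encoder on both signals at once, which is one simple way to realize "both tasks are optimized simultaneously"; the relative weighting actually used is not stated in the source.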