Ankara Big Data Meetup - Bilkent Cyberpark, August 5, 2015
Data Science at Udemy
Larry Wai
Principal Data Scientist @Udemy
Overview of talk
● What is data science?
● Udemy in a nutshell
● Data science projects at Udemy
● Data science work cycle
● What does it mean to be a data scientist?
What is data science?
data science in consumer internet = application of the scientific method using big data computational methods to
ascertain, predict, and utilize user behavior for business purposes
Inherits from three historical schools of thought
1. Research of natural phenomena using the scientific method
○ e.g. physics, astronomy
○ data science arises from replacing the study of natural phenomena with the study of user behavior
2. Research of computational methods
○ e.g. mathematics, computer science
○ data science arises from pushing the limits of existing methods to compute that which could not be
computed before
3. Research of human behavior
○ e.g. economics, psychology
○ data science arises from applying big data to the study of microscopic human behavior, i.e. millions of
users x thousands of items = billions of user-item calculations
Other definitions (too general IMO):
● data science > statistics (only); stats does not require engineering skills
● data science > computer science (only); engineering does not require training in the scientific method
● data science > business analytics (only); analytics requires neither engineering skills nor training in the
scientific method
Udemy in a nutshell
● consumer online education marketplace
● instructors get 50% of enrollment fee
● no certification requirements
● typical enrollment price point (paid) is $20-$40
● get to critical mass (instructors and students)
in each language through marketing
● above critical mass, leverage marketplace
(organic) driven growth
● Udemy currently has ~7 million students, ~30
thousand courses
● relevance of search and recommendations is
key to fostering growth
● learning goal data science is key to fostering
long term growth
[Figure: Google search trends for selected online education companies]
● Udemy (blue). Exponential marketplace growth.
● Coursera (yellow), Udacity (red), Lynda (green).
Incremental growth.
● note: this chart convinced me to join Udemy :)
Udemy web site
Data science projects at Udemy
search & recommendation
● real time recommendation (web, mobile)
● real time search
● batch e-mail recommendation
learning goals
● course learning process optimization
● learning goal paths
● career learning goals
+ more projects
Search and recommendation (in experiment)
Feature classes
● course historical averages
● personal historical behavior
● search term matching
Overall ranking strategy
● compute global score per visitor per
course per day
● consider modules as filters on the total
available inventory
● the module score will be the sum of the
global course scores for the top N
courses in the module
● individual courses are ranked within
each module according to the global
course score
[Figure: grid of course 1 through course 12, with modules A, B, and C]
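The ranking strategy above can be sketched in a few lines. This is a minimal illustration, assuming the global scores for one visitor on one day are already computed; the function name `rank_modules`, the `top_n` parameter, and the sample scores are hypothetical and not taken from the talk.

```python
from typing import Dict, List, Tuple


def rank_modules(
    course_scores: Dict[str, float],
    modules: Dict[str, List[str]],
    top_n: int = 3,
) -> List[Tuple[str, float, List[str]]]:
    """Return (module, module_score, ranked_courses), best module first."""
    ranked = []
    for name, course_ids in modules.items():
        # A module is a filter on the inventory: keep only its scored courses.
        courses = [c for c in course_ids if c in course_scores]
        # Rank courses inside the module by their global score, best first.
        courses.sort(key=lambda c: course_scores[c], reverse=True)
        # Module score = sum of the global scores of its top N courses.
        module_score = sum(course_scores[c] for c in courses[:top_n])
        ranked.append((name, module_score, courses))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical global scores for a single visitor on a single day.
scores = {
    "course 1": 0.9,
    "course 2": 0.4,
    "course 3": 0.7,
    "course 4": 0.2,
    "course 5": 0.8,
}
modules = {
    "module A": ["course 1", "course 2", "course 4"],
    "module B": ["course 3", "course 5", "course 2"],
}
ranking = rank_modules(scores, modules, top_n=2)
```

Because the same global score drives both the module order and the order of courses inside each module, a course can appear in several modules without needing a separate score per module.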
Learning goals (conceptual stage)
Course learning goal clustering
● goals are hierarchical
● goals are linked
● goals are dynamic
Overall learning goal strategy
● continuously update learning goal
clustering
● quantify and evaluate student progress
towards learning goals
● identify learning goal paths according
to desired careers or hobbies
[Figure: goal 1 through goal 6, with course A and course B]
Data science work cycle
[Figure: cycle of exploratory analysis, model building, model deployment, experiment setup, and data collection]
ideal cycling time is ~days to ~weeks
Exploratory analysis
● data to be explored can in general be defined
as a multi-dimensional cube, a.k.a.
“hypercube”, where each side of the
hypercube is an exploratory “dimension” and
the “measures” of the user behavior are
aggregates in each cell
● the hypercube is the minimal representation
required for the exploratory analysis; e.g. we
minimize cardinality for continuous variables
● the human mind is unable to easily
comprehend more than 3 dimensions,
therefore exploratory analysis must be broken
down into actions which project the entire
hypercube onto different dimensions in
sequence
● goal for the analyst is to understand the multi-
dimensional user behavior, which may take
many projections in sequence (~100)
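The hypercube and its projections can be illustrated with plain dictionaries. This is a minimal sketch, assuming an additive measure (a sum or count); the dimension names, the `enrollments` measure, and the helper names `build_hypercube` and `project` are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[str, ...]


def build_hypercube(rows: Iterable[dict], dims: List[str], measure: str) -> Dict[Cell, float]:
    """Aggregate raw rows into cells keyed by one value per dimension."""
    cube: Dict[Cell, float] = defaultdict(float)
    for row in rows:
        cube[tuple(row[d] for d in dims)] += row[measure]
    return dict(cube)


def project(cube: Dict[Cell, float], dims: List[str], keep: List[str]) -> Dict[Cell, float]:
    """Project the cube onto `keep` by summing out every other dimension."""
    idx = [dims.index(d) for d in keep]
    out: Dict[Cell, float] = defaultdict(float)
    for cell, value in cube.items():
        out[tuple(cell[i] for i in idx)] += value
    return dict(out)


# Hypothetical granular rows; each side of the cube is one dimension.
rows = [
    {"platform": "web", "language": "en", "price_band": "paid", "enrollments": 3},
    {"platform": "web", "language": "tr", "price_band": "free", "enrollments": 5},
    {"platform": "mobile", "language": "en", "price_band": "paid", "enrollments": 2},
    {"platform": "web", "language": "en", "price_band": "paid", "enrollments": 1},
]
dims = ["platform", "language", "price_band"]
cube = build_hypercube(rows, dims, "enrollments")

# Two projections in sequence, each small enough to read at a glance.
by_platform = project(cube, dims, ["platform"])
by_language_price = project(cube, dims, ["language", "price_band"])
```

Only additive measures can be summed out this way; a ratio such as conversion rate would need its numerator and denominator carried as separate measures and divided after projecting.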
model building
● platforms such as R allow us to leverage open
source modeling packages and compare
models with relatively low overhead
● most user behavior features are non-linear
and correlated; thus, the simplest “black box”
non-linear models which handle correlations
are practical to use, e.g. decision trees
● use residuals on holdout to validate model
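Holdout validation with residuals can be shown on a toy problem. The sketch below is an assumption-heavy stand-in: it uses synthetic data and a hand-rolled depth-1 regression tree (a "stump") in Python rather than the R modeling packages mentioned above, purely to show the train / holdout / residual pattern.

```python
import random
from statistics import mean
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left mean, right mean)


def fit_stump(xs: List[float], ys: List[float]) -> Stump:
    """Fit a depth-1 regression tree by minimizing squared error."""
    best = None
    # Candidate thresholds: every distinct x except the minimum,
    # so that neither side of the split is ever empty.
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def predict(stump: Stump, x: float) -> float:
    t, lm, rm = stump
    return lm if x < t else rm


# Synthetic non-linear behavior: a step at x = 5 plus noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [(1.0 if x > 5 else 0.0) + random.gauss(0, 0.1) for x in xs]

# Train on the first 150 points, hold out the last 50.
train_x, hold_x = xs[:150], xs[150:]
train_y, hold_y = ys[:150], ys[150:]
stump = fit_stump(train_x, train_y)

# Residuals on the holdout: a mean near zero suggests little bias,
# and the RMSE summarizes the typical error size.
residuals = [y - predict(stump, x) for x, y in zip(hold_x, hold_y)]
mean_residual = mean(residuals)
rmse = mean(r * r for r in residuals) ** 0.5
```

A linear fit would leave large, structured residuals on this step-shaped data, which is the kind of pattern holdout residuals are meant to expose.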
model deployment
● the standardized Predictive Model Markup Language (PMML) allows abstraction of models in deployment
● “plug-in” model deployment is agile because no new production code is needed for model updates
● shifts focus of algo development from production code development to data mining methods
● this approach allows a single person to build and deploy models quickly
● this approach is cutting edge and is being tested now at Udemy
[Figure: model pipeline]
● model building
○ create training dataset
○ create predictive model, e.g. decision trees, random forest
○ offline analysis; residuals; feature importance
● model storage
○ predictive model store (PMML format)
● model deployment
○ in memory model; load on initialization; periodic updates
● model scoring
○ loop through courses, compute feature vector per course
○ compute score per course
○ sort by score
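The "plug-in" idea can be sketched as follows: production code depends only on a scoring interface, so a new model can be swapped in without touching the scoring loop. Everything here is hypothetical (the `Model` protocol, `ModelStore`, `WeightedSum`, and the feature values); in particular, `WeightedSum` stands in for a model parsed from a PMML file, and no PMML parsing is shown.

```python
from typing import Callable, Dict, List, Protocol, Sequence, Tuple


class Model(Protocol):
    def score(self, features: Sequence[float]) -> float: ...


class ModelStore:
    """Keeps the current model in memory and swaps it on reload."""

    def __init__(self, loader: Callable[[], Model]) -> None:
        self._loader = loader
        self._model = loader()  # load on initialization

    def reload(self) -> None:
        self._model = self._loader()  # called by a periodic update job

    @property
    def model(self) -> Model:
        return self._model


def rank_courses(
    store: ModelStore,
    courses: List[str],
    featurize: Callable[[str], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Loop through courses, score each feature vector, sort by score."""
    model = store.model
    scored = [(course, model.score(featurize(course))) for course in courses]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


class WeightedSum:
    """Toy model; a placeholder for whatever the stored model encodes."""

    def __init__(self, weights: Sequence[float]) -> None:
        self.weights = list(weights)

    def score(self, features: Sequence[float]) -> float:
        return sum(w * f for w, f in zip(self.weights, features))


features: Dict[str, List[float]] = {
    "course 1": [0.2, 0.9],
    "course 2": [0.8, 0.1],
    "course 3": [0.5, 0.5],
}
current = {"weights": [1.0, 0.0]}  # stands in for the model store contents
store = ModelStore(lambda: WeightedSum(current["weights"]))
before = rank_courses(store, list(features), features.__getitem__)

# A model update changes only the stored model, not the scoring code.
current["weights"] = [0.0, 1.0]
store.reload()
after = rank_courses(store, list(features), features.__getitem__)
```

The ranking changes between `before` and `after` even though `rank_courses` is untouched, which is the property the slide describes as agile.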
experiment setup
Practical requirements for experiments, a.k.a. A/B tests
● need enough users to measure an interesting
effect
● conversely, if an effect is not large enough to
measure, then it is not interesting, at least from a
data science point of view, and potentially from a
business point of view
● e.g. an interesting effect from a business point of
view would be +5% relative lift of conversion rate
● to detect a +5% relative lift at the 95% confidence level
(on, say, a typical 1 conversion per 10 sessions),
need about 30,000 sessions in each of the A and B
samples, i.e. >60,000 sessions in total
● ideally, would like to measure lift within ~days; so
need >60,000 sessions per day
● Udemy currently has >200,000 sessions per day
(but 2 years ago it was more like 20,000 sessions
per day, so 10x slower to run experiments)
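The session count quoted above can be approximately reproduced with a normal approximation for the difference of two proportions. This is a sketch under stated assumptions: a two-sided test, equal-sized arms, and each arm sized so that the expected difference equals the critical value times its standard error (no additional power term). The function name and parameters are illustrative and not from the talk.

```python
import math
from statistics import NormalDist


def sessions_per_arm(base_rate: float, relative_lift: float, confidence: float = 0.95) -> int:
    """Sessions needed in each of A and B to resolve the given relative lift."""
    p_a = base_rate
    p_b = base_rate * (1.0 + relative_lift)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Variance of the difference in conversion rates, per session in each arm.
    variance = p_a * (1.0 - p_a) + p_b * (1.0 - p_b)
    return math.ceil(z ** 2 * variance / (p_b - p_a) ** 2)


# 1 conversion per 10 sessions, +5% relative lift.
n = sessions_per_arm(base_rate=0.10, relative_lift=0.05)
total = 2 * n

# Days to collect the sample if every session enters the experiment.
days_now = total / 200_000
days_before = total / 20_000
```

Under these assumptions `n` comes out at roughly 28,000 sessions per arm, in line with the rounded 30,000 figure. Requiring statistical power well above 50% would raise the number noticeably.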
1. smoke test (~few days)
○ 1% for test variant(s)
○ verify that nothing is broken
○ 40% CONTROL_1, 40% CONTROL_2
○ validate that control is set up correctly
2. initial ramp (~1 week)
○ 5-10% for test variant(s)
○ sizing depends upon whether we’ve tested
something like this before, and any
revenue concerns
3. intermediate ramp (~few weeks)
○ 25%-50% for test variant
○ 40%-50% for CONTROL_1
4. final ramp / launch
○ 90% for test variant
○ 10% for CONTROL_1 (optional); turn off
after a few weeks of monitoring
○ rename “test” as new baseline
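The talk does not say how visitors are assigned to variants. One common approach, shown here purely as an assumption, is deterministic hash bucketing: hash the visitor ID into a point in [0, 100) and map cumulative percentage ranges to variants. The names `assign_variant`, `SMOKE_TEST`, `INITIAL_RAMP`, and `UNASSIGNED` are hypothetical, as are the control percentages in `INITIAL_RAMP`.

```python
import hashlib
from typing import List, Tuple

Allocation = List[Tuple[str, float]]  # (variant name, percent of traffic)

# Stage 1 from the list above: 1% test, two 40% controls.
SMOKE_TEST: Allocation = [("TEST", 1.0), ("CONTROL_1", 40.0), ("CONTROL_2", 40.0)]
# Stage 2 with a 10% test share; the control shares are an assumption.
INITIAL_RAMP: Allocation = [("TEST", 10.0), ("CONTROL_1", 45.0), ("CONTROL_2", 45.0)]


def assign_variant(
    visitor_id: str,
    experiment: str,
    allocation: Allocation,
    default: str = "UNASSIGNED",
) -> str:
    """Map a visitor to a variant; the same visitor always gets the same answer."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    # First 32 bits of the hash, scaled to a point in [0, 100).
    point = int(digest[:8], 16) / 0x100000000 * 100.0
    upper = 0.0
    for name, percent in allocation:
        upper += percent
        if point < upper:
            return name
    return default  # traffic not covered by the allocation


visitors = [str(i) for i in range(20000)]
smoke = [assign_variant(v, "ranking_v2", SMOKE_TEST) for v in visitors]
ramp = [assign_variant(v, "ranking_v2", INITIAL_RAMP) for v in visitors]
```

Listing the test variant first means its range only grows from the lower end as the ramp proceeds, so a visitor who saw the test variant during the smoke test keeps seeing it in later stages.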
data collection
● data should be collected at the most granular
level, e.g. typically per visitor per item per day
● data should be pre-arranged in a way which
facilitates fast hypercube production, i.e. star
schema
● most granular data is located at the star core
● experiment variants can be incorporated as
an additional dimension in one of the star
limbs
[Figure: “star schema” (with intermediate mapping)]
● core table with grouping fields A, B, C
● limb table with grouping field A
● limb table with grouping fields A, B
● limb table with grouping field B
● limb table with grouping fields B, C
● limb table with grouping fields A, B, D
● mapping table with grouping field C and other field D
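A small in-memory example may make the star layout concrete. This sketch uses Python's built-in `sqlite3` module; the table names (`core`, `visitor_limb`, `course_limb`), the columns, and the rows are all invented for illustration. The core table holds the most granular data (per visitor, per course, per day), and the experiment variant is carried as a dimension on the visitor limb.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE core (
        visitor_id INTEGER, course_id INTEGER, day TEXT, enrollments INTEGER
    );
    CREATE TABLE visitor_limb (visitor_id INTEGER PRIMARY KEY, variant TEXT);
    CREATE TABLE course_limb (course_id INTEGER PRIMARY KEY, language TEXT);
    """
)
conn.executemany(
    "INSERT INTO core VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2015-08-01", 1),
        (1, 11, "2015-08-01", 0),
        (2, 10, "2015-08-01", 1),
        (3, 11, "2015-08-02", 1),
    ],
)
conn.executemany(
    "INSERT INTO visitor_limb VALUES (?, ?)",
    [(1, "TEST"), (2, "CONTROL_1"), (3, "TEST")],
)
conn.executemany(
    "INSERT INTO course_limb VALUES (?, ?)",
    [(10, "en"), (11, "tr")],
)

# Producing a hypercube is a join from the core to the limbs plus a GROUP BY
# over the chosen dimensions; here the dimensions are variant and language.
cells = conn.execute(
    """
    SELECT v.variant, c.language, SUM(core.enrollments)
    FROM core
    JOIN visitor_limb AS v ON v.visitor_id = core.visitor_id
    JOIN course_limb AS c ON c.course_id = core.course_id
    GROUP BY v.variant, c.language
    ORDER BY v.variant, c.language
    """
).fetchall()
```

Adding another exploratory dimension then means adding a column to a limb table and to the `GROUP BY`, without reshaping the granular core data.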
What does it mean to be a data scientist?
A successful data scientist is somebody who can independently execute the entire data
science work cycle on the time scale of days to weeks.
Important personal factors
● technical chops in math, computational methods, and the scientific method
● a genuine research interest in the underlying user behavior
● good intuition for how the business works
Important environmental factors
● top-down knowledge of, and commitment to, data science
● an excellent data architect
● best-practice data science infrastructure
Udemy is hiring!
https://about.udemy.com/careers/

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Data Science at Udemy

  • 1. Ankara Big Data Meetup - Bilkent Cyberpark, August 5, 2015 Data Science at Udemy Larry Wai Principal Data Scientist @Udemy
  • 2. Overview of talk
      ● What is data science?
      ● Udemy in a nutshell
      ● Data science projects at Udemy
      ● Data science work cycle
      ● What does it mean to be a data scientist?
  • 3. What is data science?
      data science in consumer internet = application of the scientific method using big data computational methods to ascertain, predict, and utilize user behavior for business purposes
      Inherits from three historical schools of thought
      1. Research of natural phenomena using the scientific method
         ○ e.g. physics, astronomy
         ○ data science arises from substituting the study of natural phenomena with study of user behavior
      2. Research of computational methods
         ○ e.g. mathematics, computer science
         ○ data science arises from pushing the limits of existing methods to compute that which could not be computed before
      3. Research of human behavior
         ○ e.g. economics, psychology
         ○ data science arises from applying big data to the study of microscopic human behavior, i.e. millions of users x thousands of items = billions of user-item calculations
      Other definitions (too general IMO):
      ● data science > statistics (only); stats does not require engineering skills
      ● data science > computer science (only); engineering does not require training in the scientific method
      ● data science > business analytics (only); analytics does not require engineering skills nor training in the scientific method
  • 4. Udemy in a nutshell
      ● consumer online education marketplace
      ● instructors get 50% of enrollment fee
      ● no certification requirements
      ● typical enrollment price point (paid) is $20-$40
      ● get to critical mass (instructors and students) in each language through marketing
      ● above critical mass, leverage marketplace (organic) driven growth
      ● Udemy currently has ~7 million students, ~30 thousand courses
      ● relevance of search and recommendations is key to fostering growth
      ● learning goal data science is key to fostering long term growth
      Google search trends for selected online education companies
      ● Udemy (blue). Exponential marketplace growth.
      ● Coursera (yellow), Udacity (red), Lynda (green). Incremental growth.
      ● note: this chart convinced me to join Udemy :)
  • 5. Udemy web site
  • 6. Data science projects at Udemy
      search & recommendation
      ● real time recommendation (web, mobile)
      ● real time search
      ● batch e-mail recommendation
      learning goals
      ● course learning process optimization
      ● learning goal paths
      ● career learning goals
      + more projects
  • 7. Search and recommendation (in experiment)
      Feature classes
      ● course historical averages
      ● personal historical behavior
      ● search term matching
      Overall ranking strategy
      ● compute global score per visitor per course per day
      ● consider modules as filters on the total available inventory
      ● the module score will be the sum of the global course scores for the top N courses in the module
      ● individual courses are ranked within each module according to the global course score
      [Diagram: courses 1 to 12 grouped into module A, module B, module C]
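The ranking strategy on slide 7 can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `rank_modules`, the score values, and the module memberships are made up, and the slides do not say how the global course scores themselves are computed.

```python
def rank_modules(course_scores, modules, top_n=3):
    """Rank modules by the sum of their top-N course scores.

    course_scores: {course_id: global score for one visitor on one day}
    modules: {module name: list of course ids}; each module acts as a
             filter on the total available inventory.
    Returns a list of (module name, module score, ranked course ids).
    """
    ranked = []
    for name, course_ids in modules.items():
        # Keep only courses that have a score, then sort best-first.
        scored = [(c, course_scores[c]) for c in course_ids if c in course_scores]
        scored.sort(key=lambda cs: cs[1], reverse=True)
        # Module score = sum of the global scores of its top N courses.
        module_score = sum(score for _, score in scored[:top_n])
        ranked.append((name, module_score, [c for c, _ in scored]))
    # Modules with the highest score come first.
    ranked.sort(key=lambda m: m[1], reverse=True)
    return ranked


# Hypothetical scores and module memberships, for illustration only.
course_scores = {
    "course 1": 0.9, "course 2": 0.2, "course 3": 0.5,
    "course 4": 0.7, "course 5": 0.1, "course 6": 0.4,
}
modules = {
    "module A": ["course 1", "course 2", "course 3"],
    "module B": ["course 3", "course 4", "course 6"],
    "module C": ["course 2", "course 5"],
}
result = rank_modules(course_scores, modules, top_n=2)
for name, score, courses in result:
    print(name, round(score, 2), courses)
```

A course may appear in more than one module, since modules are filters over the same inventory rather than a partition of it.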
  • 8. Learning goals (conceptual stage)
      Course learning goal clustering
      ● goals are hierarchical
      ● goals are linked
      ● goals are dynamic
      Overall learning goal strategy
      ● continuously update learning goal clustering
      ● quantify and evaluate student progress towards learning goals
      ● identify learning goal paths according to desired careers or hobbies
      [Diagram: goals 1 to 6 linked to course A and course B]
  • 9. Data science work cycle
      [Cycle diagram with stages: experiment setup, exploratory analysis, model deployment, model building, data collection]
      ideal cycling time is ~days to ~weeks
  • 10. Exploratory analysis
      ● data to be explored can in general be defined as a multi-dimensional cube, a.k.a. “hypercube”, where each side of the hypercube is an exploratory “dimension” and the “measures” of the user behavior are aggregates in each cell
      ● the hypercube is the minimal representation required for the exploratory analysis; e.g. we minimize cardinality for continuous variables
      ● the human mind is unable to easily comprehend more than 3 dimensions, therefore exploratory analysis must be broken down into actions which project the entire hypercube onto different dimensions in sequence
      ● goal for the analyst is to understand the multi-dimensional user behavior, which may take many projections in sequence (~100)
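A projection of the hypercube onto a subset of its dimensions can be sketched as follows. The dimension names, cell values, and the `project` helper are assumptions for illustration; the sketch also assumes the measures are additive counts, so that rates are computed only after projecting.

```python
from collections import defaultdict

# Hypothetical dimensions; each cell holds additive measures
# (sessions, enrollments) aggregated over user behavior.
DIMENSIONS = ("country", "device", "category")
cube = {
    ("TR", "web", "programming"): (100, 10),
    ("TR", "mobile", "programming"): (50, 2),
    ("US", "web", "design"): (200, 30),
    ("US", "web", "programming"): (150, 15),
}


def project(cube, dimensions, keep):
    """Sum the measures over every dimension that is not in `keep`."""
    idx = [dimensions.index(d) for d in keep]
    out = defaultdict(lambda: [0, 0])
    for cell, (sessions, enrollments) in cube.items():
        key = tuple(cell[i] for i in idx)
        out[key][0] += sessions
        out[key][1] += enrollments
    return {key: tuple(totals) for key, totals in out.items()}


# Two projections in sequence, each low-dimensional enough to inspect.
by_country = project(cube, DIMENSIONS, ("country",))
by_device_category = project(cube, DIMENSIONS, ("device", "category"))

for key, (sessions, enrollments) in sorted(by_country.items()):
    # Rates are derived after projection, from the summed counts.
    print(key, sessions, enrollments, round(enrollments / sessions, 3))
```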
  • 11. model building
      ● platforms such as R allow us to leverage open source modeling packages and compare models with relatively low overhead
      ● most user behavior features are non-linear and correlated; thus, the simplest “black box” non-linear models which handle correlations are practical to use, e.g. decision trees
      ● use residuals on holdout to validate model
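Validating on holdout residuals can be illustrated with a toy example. The slides mention R and decision trees; the sketch below instead uses plain Python and a depth-1 regression tree (a "stump") as a stand-in, on synthetic data. The function `fit_stump` and the data-generating process are assumptions, not anything described in the talk.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature.

    Tries every observed value as a split threshold and keeps the one
    with the lowest sum of squared errors. Needs >= 2 distinct x values.
    """
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm


# Synthetic non-linear behavior: a step at x = 5 plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(400):
    x = rng.uniform(0, 10)
    y = (1.0 if x >= 5 else 0.2) + rng.gauss(0, 0.1)
    data.append((x, y))

# Train on the first 300 rows, hold out the remaining 100.
train, holdout = data[:300], data[300:]
model = fit_stump([x for x, _ in train], [y for _, y in train])

# Residuals on data the model has not seen.
residuals = [y - model(x) for x, y in holdout]
mean_residual = mean(residuals)
rmse = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
print(round(mean_residual, 3), round(rmse, 3))
```

A mean holdout residual near zero and an RMSE close to the noise level suggest the model has captured the structure without overfitting; a large gap between training and holdout error would suggest the opposite.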
  • 12. model deployment
      ● standardized predictive model markup language (PMML) allows abstraction of models in deployment
      ● “plug-in” model deployment is agile because no new production code is needed for model updates
      ● shifts focus of algo development from production code development to data mining methods
      ● this approach allows a single person to build and deploy models quickly
      ● this approach is cutting edge and is being tested now at Udemy
      [Diagram with panels: model building, model deployment, model storage, model scoring]
      ○ create training dataset
      ○ create predictive model, e.g. decision trees, random forest
      ○ offline analysis; residuals; feature importance
      ○ loop through courses, compute feature vector per course
      ○ compute score per course
      ○ sort by score
      ○ predictive model store (PMML format)
      ○ in memory model; load on initialization; periodic updates
  • 13. experiment setup
      Practical requirements for experiments, a.k.a. A/B tests
      ● need enough users to measure an interesting effect
      ● conversely, if an effect is not large enough to measure, then it is not interesting, at least from a data science point of view, and potentially from a business point of view
      ● e.g. an interesting effect from a business point of view would be +5% relative lift of conversion rate
      ● to achieve +5% relative lift at 95% confidence level (on say typical 1 conversion per 10 sessions), need to have 30,000 sessions in each of A and B samples, i.e. >60,000 sessions
      ● ideally, would like to measure lift within ~days; so need >60,000 sessions per day
      ● Udemy currently has >200,000 sessions per day (but 2 years ago it was more like 20,000 sessions per day, so 10x slower to run experiments)
      1. smoke test (~few days)
         ○ 1% for test variant(s)
         ○ verify that nothing is broken
         ○ 40% CONTROL_1, 40% CONTROL_2
         ○ validate that control is set up correctly
      2. initial ramp (~1 week)
         ○ 5-10% for test variant(s)
         ○ sizing depends upon whether we’ve tested something like this before, and any revenue concerns
      3. intermediate ramp (~few weeks)
         ○ 25%-50% for test variant
         ○ 40%-50% for CONTROL_1
      4. final ramp / launch
         ○ 90% for test variant
         ○ 10% for CONTROL_1 (optional); turn off after a few weeks of monitoring
         ○ rename “test” as new baseline
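The 30,000-sessions-per-arm figure can be reproduced approximately with a two-proportion normal approximation. This is a sketch under assumptions: the slides do not state which formula was used, and the version below sizes the sample so that the lift just equals the two-sided critical difference at the given confidence level, with no statistical power term. The function name `sessions_per_arm` is made up for the example.

```python
from math import ceil
from statistics import NormalDist


def sessions_per_arm(base_rate, relative_lift, confidence=0.95):
    """Sessions per arm at which the lift is just significant.

    Uses the normal approximation for the difference of two proportions
    and a two-sided test. There is no power term, so this is a lower
    bound on what a fully powered design would need.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p_a = base_rate
    p_b = base_rate * (1 + relative_lift)
    # Variance of the difference of two conversion rates, per session.
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    delta = p_b - p_a
    return ceil(z ** 2 * variance / delta ** 2)


# 1 conversion per 10 sessions, +5% relative lift, 95% confidence.
n = sessions_per_arm(0.10, 0.05)
print(n, 2 * n)  # per arm, and total across A and B
```

The result is a little over 28,000 sessions per arm, which is consistent with the slide's rounded figure of 30,000. Requiring conventional power (for example 80%) would increase the number substantially.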
  • 14. data collection
      ● data should be collected at the most granular level, e.g. typically per visitor per item per day
      ● data should be pre-arranged in a way which facilitates fast hypercube production, i.e. star schema
      ● most granular data is located at the star core
      ● experiment variants can be incorporated as an additional dimension in one of the star limbs
      [Diagram: “star schema” (with intermediate mapping)]
      ○ core table with grouping fields A, B, C
      ○ limb table with grouping field A
      ○ limb table with grouping fields A, B
      ○ limb table with grouping field B
      ○ limb table with grouping fields B, C
      ○ limb table with grouping fields A, B, D
      ○ mapping table with grouping field C and other field D
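The core, limb, and mapping tables on slide 14 can be sketched with an in-memory SQLite database. All table names, column names, and rows below are hypothetical. The core table holds per visitor per course per day data, a mapping table supplies an extra field (category), and limb tables are aggregates of the core, one of which carries the experiment variant as a dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star core: most granular data, one row per day, visitor and course.
CREATE TABLE core (
    day TEXT, visitor_id INTEGER, course_id INTEGER,
    views INTEGER, enrollments INTEGER
);
-- Mapping tables: attach a further field to a grouping field.
CREATE TABLE course_category (course_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE visitor_variant (visitor_id INTEGER PRIMARY KEY, variant TEXT);
""")
conn.executemany("INSERT INTO core VALUES (?, ?, ?, ?, ?)", [
    ("2015-08-01", 1, 10, 3, 1),
    ("2015-08-01", 2, 10, 1, 0),
    ("2015-08-01", 2, 11, 2, 1),
    ("2015-08-02", 1, 11, 4, 0),
])
conn.executemany("INSERT INTO course_category VALUES (?, ?)",
                 [(10, "programming"), (11, "design")])
conn.executemany("INSERT INTO visitor_variant VALUES (?, ?)",
                 [(1, "control"), (2, "test")])

conn.executescript("""
-- Limb table grouped by day and category (via the mapping table).
CREATE TABLE limb_day_category AS
    SELECT c.day, m.category,
           SUM(c.views) AS views, SUM(c.enrollments) AS enrollments
    FROM core c JOIN course_category m ON m.course_id = c.course_id
    GROUP BY c.day, m.category;
-- Limb table with the experiment variant as an additional dimension.
CREATE TABLE limb_day_variant AS
    SELECT c.day, v.variant,
           SUM(c.views) AS views, SUM(c.enrollments) AS enrollments
    FROM core c JOIN visitor_variant v ON v.visitor_id = c.visitor_id
    GROUP BY c.day, v.variant;
""")

by_category = conn.execute(
    "SELECT day, category, views, enrollments "
    "FROM limb_day_category ORDER BY day, category").fetchall()
by_variant = conn.execute(
    "SELECT day, variant, views, enrollments "
    "FROM limb_day_variant ORDER BY day, variant").fetchall()
print(by_category)
print(by_variant)
```

Because the limb tables are pre-aggregated, exploratory queries can read them directly instead of rescanning the granular core each time.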
  • 15. What does it mean to be a data scientist?
      A successful data scientist is somebody who can independently execute the entire data science work cycle on the time scale of days to weeks.
      Important personal factors
      ● technical chops in math, computational methods, and the scientific method
      ● a genuine research interest in the underlying user behavior
      ● good intuition for how the business works
      Important environmental factors
      ● top-down knowledgeability and commitment to data science
      ● excellent data architect
      ● best practices data science infrastructure
  • 16. Udemy is hiring! https://about.udemy.com/careers/