1st edition | July 8-11, 2019
BigML, Inc #DutchMLSchool
Introduction to BigML
Making Machine Learning Beautifully Simple
Poul Petersen
CIO, BigML, Inc
Sampling the Audience
Expert: Published papers at KDD, ICML, NIPS, etc or
developed own ML algorithms used at large scale
Aficionado: Understands pros/cons of different
techniques and/or can tweak algorithms as needed
Practitioner: Very familiar with ML packages (Weka,
Scikit, BigML, etc.)
Newbie: Just taking Coursera ML class or reading an
introductory book to ML
Absolute beginner: ML sounds like science fiction
A Present for You
Free 1-Month PRO Subscription
https://bigml.com/accounts/register/
dutchmlschool
A Brief History of BigML
• BigML Mission: To make Machine
Learning Beautifully Simple
• BigML Founded in Corvallis,
Oregon in 2011 - long before ML
was "cool"
• You’ve never heard of it?
• Most innovative city in the United
States!
BigML Platform
• Web-based Frontend with Visualizations
• Distributed Machine Learning Backend: SOURCE, DATASET, MODEL, PREDICTION, EVALUATION, SAMPLE, and WHIZZML servers
• Tools - https://bigml.com/tools
• REST API - https://bigml.com/api
• Smart Infrastructure (auto-deployable, auto-scalable)
[Infrastructure diagram labels: servers, events, Gearman queue, message queue, runqueue scaler, busy scaler, AWS costs, desired topology, auto topology, actual topology]
BigML Platform
[Same architecture diagram as the previous slide, with one added label: On-Premises]
Machine Learning Motivation
Imagine:
• You are looking to buy a house
• Recently found a house you like
• Is the asking price fair?
What Next?
Machine Learning Motivation
Why not ask an expert?
• Experts can be rare / expensive
• Hard to validate experience:
• Experience with similar properties?
• Do they consider all relevant variables?
• Knowledge of market up to date?
• Hard to validate answer:
• How many times expert right / wrong?
• Probably can’t explain decision in detail
• Humans are not good at intuitive statistics
Data vs Expert
Replace the expert with data?
• Intuition: square footage relates to price.
• Collect data from past sales
PRICE = 125.3*SQFT + 96535
SQFT  SOLD    PREDICT
2424  360000  400262
1785  307500  320195
1003  185000  222211
4135  600000  614651
1676  328500  306538
1012  247000  223339
3352  420000  516541
2825  435350  450508
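The line on this slide can be checked by hand. The sketch below applies the slide's formula to the eight past sales and compares the result with the SOLD column. The slope and intercept are copied from the slide; the function and variable names are illustrative only.

```python
# Slope and intercept as printed on the slide: PRICE = 125.3*SQFT + 96535
SLOPE = 125.3
INTERCEPT = 96535

# (SQFT, SOLD) pairs from the slide's table of past sales
sales = [
    (2424, 360000), (1785, 307500), (1003, 185000), (4135, 600000),
    (1676, 328500), (1012, 247000), (3352, 420000), (2825, 435350),
]


def predict_price(sqft):
    """Apply the slide's straight-line model to one square footage."""
    return SLOPE * sqft + INTERCEPT


predictions = [predict_price(sqft) for sqft, _ in sales]

# Average size of the miss, in dollars, over the eight past sales
mean_abs_error = sum(
    abs(predicted - sold) for predicted, (_, sold) in zip(predictions, sales)
) / len(sales)

for (sqft, sold), predicted in zip(sales, predictions):
    print(f"{sqft:>5} sqft  sold {sold:>7}  predicted {predicted:>9.0f}")
print(f"mean absolute error: {mean_abs_error:.0f}")
```

The predictions agree with the PREDICT column to within a dollar, which suggests the slide's coefficients were rounded for display.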
Data vs Expert
Replace the expert scorecard
• Experts can be rare / expensive
• Hard to validate experience:
• Experience with similar properties?
• Do they consider all relevant variables?
• Knowledge of market up to date?
• Hard to validate answer:
• How many times expert right / wrong?
• Probably can’t explain decision in detail
• Humans are not good at intuitive statistics
More Data!
SQFT  BEDS  BATHS  ADDRESS             LOCATION           LOT SIZE  YEAR BUILT  PARKING SPOTS  LATITUDE    LONGITUDE    SOLD
2424  4     3      1522 NW Jonquil     Timberhill SE 2nd  5227      1991        2              44.594828   -123.269328  360000
1785  3     2      7360 NW Valley Vw   Country Estates    25700     1979        2              44.643876   -123.238189  307500
1003  2     1      2620 NW Chinaberry  Tamarack Village   4792      1978        2              44.593704   -123.295424  185000
4135  5     3.5    4748 NW Veronica    Suncrest           6098      2004        3              44.5929659  -123.306916  600000
1676  3     2      2842 NW Monterey    Corvallis          8712      1975        2              44.5945279  -123.291523  328500
1012  3     1      2320 NW Highland    Corvallis          9583      1959        2              44.591476   -123.262841  247000
3352  4     3      1205 NW Ridgewood   Ridgewood 2        60113     1975        2              44.579439   -123.333888  420000
2825  3            411 NW 16th         Wilkins Addition   4792      1938        1              44.570883   -123.272113  435350
Uhhhh……..
• Can we still fit a line to 10 variables? (well, yes)
• Will fitting a line give good results? (unlikely)
• What about those text fields and categorical values?
Modeling Home Prices
What just happened?
[Diagram: Home Data → Model (Square Feet? Location?) → Prediction: Price=418K]
Some Terminology…
[Diagram: Training Data (Home Data) → ML Platform → ML Resource (Model) → Prediction: Price=418K]
ML Resource:
• Modeling
• Clustering
• Anomaly Detection
• Association Discovery
“Consume” the model or “put into production”:
• Dashboard
• Custom Application
• Wearable / Edge device
• Batch Process
Model Choices
• A single Decision Tree was easy to understand, but could we
build something stronger?
• There are actually hundreds of algorithms…
• BigML carefully implements the best in terms of interpretability
and the ability to work with real-world data:
• Linear Regression
• Logistic Regression
• Single Decision Trees
• Decision Forest / Random Decision Forest
• Boosted Trees
• Deepnets (wait - those are hard, right?)
Deepnets are Hard, Right?
[Network diagram: Inputs x1 x2 x3 x4 (4 Features) → Hidden layer h1 … h5 → further hidden layers of varying size (h1 … h9, …) → Outputs y1 y2 y3 (3 Classes)]
h1 = activation?(wx, x) ?
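To make the question on this slide concrete, here is a minimal sketch of one forward pass through a network of the shape drawn: 4 input features, one hidden layer of 5 nodes, and 3 output classes. Everything in it is an assumption made for illustration (tanh as the activation, softmax on the outputs, random untrained weights, and the sample input values). It is not a description of how BigML Deepnets are implemented.

```python
import math
import random


def dense(weights, biases, inputs):
    """One fully connected layer: a weighted sum plus bias for each node."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Inputs -> hidden layer (tanh activation) -> class probabilities."""
    hidden = [math.tanh(z) for z in dense(w_hidden, b_hidden, x)]
    return softmax(dense(w_out, b_out, hidden))


rng = random.Random(0)  # fixed seed so the sketch is repeatable


def random_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Untrained, random weights: 4 features -> 5 hidden nodes -> 3 classes
w_hidden, b_hidden = random_matrix(5, 4), [0.0] * 5
w_out, b_out = random_matrix(3, 5), [0.0] * 3

probs = forward([5.1, 3.5, 1.4, 0.2], w_hidden, b_hidden, w_out, b_out)
print(probs)
```

Each question mark on the slide is a choice this sketch hard-codes: which activation, how many nodes, how many layers. Training would then have to find good values for every weight.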
BigML Deepnets
• The success of a Deepnet depends on getting the right
network structure for the dataset
• But, there are too many parameters:
• Nodes, layers, activation function, learning rate, etc…
• And setting them takes significant expert knowledge
• Solution: Metalearning (a good initial guess)
• Solution: Network search (try a bunch)
Choosing the Algorithm
[Chart: algorithms arranged along two axes]
• Horizontal axis: Decreasing Interpretability / Better Representation / Longer Training
• Vertical axis: Increasing Data Size / Complexity
• Stages: Early Stage (Rapid Prototyping), Mid Stage (Proven Application), Late Stage (Critical Performance)
• Algorithms shown: Single Tree Model, Logistic Regression, Decision Forest, Random Decision Forest, Boosted Trees, Deepnets
STILL TOO HARD?
OptiML
• Each resource has several parameters that impact quality
• Number of trees, missing splits, nodes, weight
• Rather than trial and error, we can use ML to find ideal
parameters
• Why not make the model type, Decision Tree, Boosted Tree,
etc, a parameter as well?
• Similar to Deepnet network search, but finds the optimum
machine learning algorithm and parameters for your data
automatically
• Outputs the top performing algorithms and parameters for your
data… Why use just one “best” result?
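The idea of treating the model type as one more parameter can be sketched on the small SQFT/SOLD table from earlier in the deck. This toy version tries every candidate exhaustively and ranks them by leave-one-out error. That is a deliberately simple stand-in and not the search strategy OptiML itself uses. The two candidate algorithms (a fitted line and k-nearest neighbors) and all names are assumptions chosen for illustration.

```python
# (SQFT, SOLD) pairs from the earlier "Data vs Expert" slide
homes = [
    (2424, 360000), (1785, 307500), (1003, 185000), (4135, 600000),
    (1676, 328500), (1012, 247000), (3352, 420000), (2825, 435350),
]


def fit_line(train):
    """Least-squares line on one feature; returns a predict function."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_knn(train, k):
    """k-nearest neighbors on SQFT; predicts the mean price of the k closest homes."""
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


# The search space: the algorithm itself is just another parameter
candidates = [("line", {}, fit_line)]
candidates += [("knn", {"k": k}, fit_knn) for k in (1, 2, 3)]


def leave_one_out_error(fit, params, data):
    """Mean absolute error when each home is predicted from all the others."""
    errors = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        model = fit(train, **params)
        errors.append(abs(model(x) - y))
    return sum(errors) / len(errors)


# Rank every candidate instead of keeping only a single winner
leaderboard = sorted(
    ((leave_one_out_error(fit, params, homes), name, params)
     for name, params, fit in candidates),
    key=lambda entry: entry[0],
)

for error, name, params in leaderboard:
    print(f"{name:<5} {params!s:<10} mean abs error {error:,.0f}")
```

Keeping the whole ranked list, not only the top entry, mirrors the last bullet above: the runners-up are useful too.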
Fusions
• Similar to an Ensemble, but we can mix different model types
• Logistic Regression, plus a Deepnet for example
• You can also create a fusion with different training sets!
• Last week, plus last month data, etc
• Or a Fusion of OptiML models
• Combines the “best of the best”
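A rough sketch of mixing model types: assume each component model returns a probability per class and the fusion combines them with a (optionally weighted) average. The slides do not spell out the combination rule, so the averaging, the `fuse` helper, and the example probabilities are all assumptions made for illustration.

```python
def fuse(predictions, weights=None):
    """Average per-class probabilities from several models.

    predictions: one dict per model, mapping class name to probability.
    weights: optional per-model weights; equal weighting if omitted.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(weights)
    classes = set().union(*predictions)  # every class any model mentions
    return {
        c: sum(w * p.get(c, 0.0) for w, p in zip(weights, predictions)) / total
        for c in sorted(classes)
    }


# Hypothetical outputs from two different model types for the same home
logistic_regression = {"deal": 0.8, "no deal": 0.2}
deepnet = {"deal": 0.6, "no deal": 0.4}

fused = fuse([logistic_regression, deepnet])
best = max(fused, key=fused.get)
print(fused, best)
```

The same helper works for models trained on different periods (last week, last month): pass one prediction per model and, if wanted, weights that favor the more recent one.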
OptiML & Fusions
ML Workflows
[Workflow diagram: SOLD HOMES → FILTER → NEW FEATURES → DATASET → MODEL; FORSALE HOMES → FILTER → NEW FEATURES → DATASET → BATCH PREDICTION → DEALS]
• Real-world ML Applications
are workflows!
• Often requires
unsupervised learning!
Let’s build a recommender
Typical way to shop for a home…
Recommender Idea
[Diagram: All Homes Forsale → Sample → ? ? ? ? → Preference Data → Preference Model]
… then use the Preference Model to
filter all the homes on the market
What if there are really unusual homes in the data?
• A mansion with 20 bathrooms
• A home with no bedrooms
• A lot size that is smaller than the home?
We don’t want to show these as suggestions
because they are unusual…. How do we detect
anomalies?
Anomaly Detection
What just happened?
• We wanted to find and remove unusual houses.
• We created an Anomaly Detector and examined
the top anomalies.
• We found some unusual houses to remove and
discovered bad data (missing values) that we want
to fix.
A clever way to fix missing data
Let’s use Machine Learning…
SQFT   PRICE       BEDS  BATHS
3,125  US$530,000  5     3
2,100  US$460,000  ?     2
1,200  US$250,000  3     ?
3,950  US$610,000  6     4
Predicted values for the missing fields: BEDS = 4, BATHS = 1.5
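The idea can be sketched in a few lines: for each field with gaps, train a model on the rows where the field is present, then predict it for the rows where it is missing. The stand-in model here is a one-feature least-squares line on SQFT, chosen only to keep the example short. The model choice and all names are assumptions, and with so few rows the filled values need not match the ones on the slide.

```python
# Rows from the slide; None marks a missing value
homes = [
    {"sqft": 3125, "price": 530000, "beds": 5, "baths": 3},
    {"sqft": 2100, "price": 460000, "beds": None, "baths": 2},
    {"sqft": 1200, "price": 250000, "beds": 3, "baths": None},
    {"sqft": 3950, "price": 610000, "beds": 6, "baths": 4},
]


def fit_line(points):
    """Least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute(rows, target, feature="sqft"):
    """Return copies of rows with missing `target` values predicted from `feature`."""
    known = [(r[feature], r[target]) for r in rows if r[target] is not None]
    slope, intercept = fit_line(known)
    filled = []
    for r in rows:
        r = dict(r)  # copy so the original rows stay untouched
        if r[target] is None:
            r[target] = slope * r[feature] + intercept
        filled.append(r)
    return filled


# Fill BEDS first, then BATHS
filled = impute(impute(homes, "beds"), "baths")
for row in filled:
    print(row)
```

A real workflow would use all the other fields as inputs and a stronger model, but the loop is the same: one model per field with gaps.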
WhizzML
What just happened?
• We had a Dataset with missing values.
• We wanted to apply an algorithm to fix the missing
values with Machine Learning
• Rather than write the algorithm, we found what we
needed in the WhizzML public gallery.
• Now that we have cloned the Script we can use it
again and again.
• We can write new ones too!
Recommender Problem #2
• How can we avoid showing essentially the
same house over and over?
[Diagram: All Homes → Sample → ? ? ?, all labeled Modern]
[Diagram: All Homes divided into groups (Modern, Lots of Land), with a sample drawn from each group]
• Great! What if we don’t know how to group
them? Or how many groups?
Clustering
What just happened?
• Since we don’t know how many groups of homes
there should be, we used G-means Clustering to find
the optimum number of groups of homes
• Our recommender will use these groups to create a
better sampling for user preference
• We also tried to understand the home clusters using
“model clusters” but the models were difficult to
interpret
Understanding Clusters Better
What if we could get rules like…
If SQFT >= 3,125 THEN “Cluster 1”
SQFT   PRICE       BEDS  BATHS  CLUSTER
3,125  US$530,000  5     3      Cluster 1
2,100  US$460,000  4     2      Cluster 3
1,200  US$250,000  3     1.5    Cluster 5
3,950  US$610,000  6     4      Cluster 1
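How good is a rule like this one? Two common measures are support (the share of all rows where the whole rule holds) and confidence (the share of rows matching the "if" part that also match the "then" part). The sketch below computes both for the slide's rule on the four rows shown. The `rule_stats` helper is a hypothetical name used only for this illustration.

```python
# SQFT and CLUSTER columns from the slide's table
homes = [
    {"sqft": 3125, "cluster": "Cluster 1"},
    {"sqft": 2100, "cluster": "Cluster 3"},
    {"sqft": 1200, "cluster": "Cluster 5"},
    {"sqft": 3950, "cluster": "Cluster 1"},
]


def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) for the rule: IF antecedent THEN consequent."""
    matches = [r for r in rows if antecedent(r)]
    hits = [r for r in matches if consequent(r)]
    support = len(hits) / len(rows)
    confidence = len(hits) / len(matches) if matches else 0.0
    return support, confidence


# The rule from the slide: If SQFT >= 3,125 THEN "Cluster 1"
support, confidence = rule_stats(
    homes,
    antecedent=lambda r: r["sqft"] >= 3125,
    consequent=lambda r: r["cluster"] == "Cluster 1",
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

On these four rows the rule covers half the homes and is never wrong, which is the kind of simple, readable explanation of a cluster the slide is asking for.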
Association Discovery
What just happened?
• We used a Batch Centroid to add the Cluster
assignment of each home as a feature to the Dataset
• We used Association Discovery to find “interesting”
relationships between the features including the Cluster
assignment
Recommender Problem #3
There is much more interesting information than just the
number of BEDS, BATHS, etc.
• Unfortunately, these "remarks" are not available in the
Redfin download
• Adding them to our dataset requires crawling the
website
• Like most ML projects, preparing the data is 80% of
the difficulty (fortunately I already did it!)
Topic Modeling
What just happened?
• We extended the home dataset with the syndicated
remarks text field
• We built a model to predict sale price and explored how
key words discovered in the remarks impacted price
• We used topic modeling to create a deeper thematic
understanding of the remarks
• Homes that are "in-town" or "out-of-town"
• We extended the dataset with fields that represent how
related each home is to each of these topics
• This will allow our clustering to group homes by a deeper
meaning than just BEDS, BATHS, etc
• Is there a better way to capture “locality”?
Idea: Better Feature
[Map labels: Worth More, Worth Less]
A Better Feature for Home Prices
LATITUDE   LONGITUDE    REFERENCE LATITUDE  REFERENCE LONGITUDE  Distance (m)
44.583     -123.296775  44.5638             -123.2794            700
44.604414  -123.296129  44.5638             -123.2794            30.4
44.600108  -123.29707   44.5638             -123.2794            19.38
44.603077  -123.295004  44.5638             -123.2794            37.8
44.589587  -123.301154  44.5638             -123.2794            23.39
Haversine Formula
https://en.wikipedia.org/wiki/Haversine_formula
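For reference, the haversine formula fits in a few lines. This sketch computes the great-circle distance in meters from each home to the reference point, using the coordinates from the previous slide. The mean Earth radius of 6,371 km and the function name are assumptions of this example, and the printed values are simply what the formula yields; they are not taken from the slide's Distance column.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Reference point and home coordinates as listed on the previous slide
REFERENCE = (44.5638, -123.2794)
homes = [
    (44.583, -123.296775),
    (44.604414, -123.296129),
    (44.600108, -123.29707),
    (44.603077, -123.295004),
    (44.589587, -123.301154),
]

distances = [haversine_m(lat, lon, *REFERENCE) for lat, lon in homes]
for (lat, lon), meters in zip(homes, distances):
    print(f"{lat:.6f}, {lon:.6f} -> {meters:,.0f} m")
```

The result is a single numeric field per home that can be added to the dataset as a new feature.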
Feature Engineering
What just happened?
• We wanted to create a new feature “distance from OSU”
• This is possible with Flatline, a DSL for feature engineering
• Rather than writing the code for the coordinate
transformation, we found a ready-made script shared in
the WhizzML gallery
• We cloned the script and transformed the dataset
• This can be easily repeated with new datasets: fresh data
or different cities
Recommender Idea
[Diagram: samples (?) drawn from each group of homes (Modern, Lots of Land, Small) → Preference Data → Preference Model]
House Recommender
Co-organized by: Sponsor:
Business Partners:

Contenu connexe

Tendances

DutchMLSchool. Machine Learning End-to-End
DutchMLSchool. Machine Learning End-to-EndDutchMLSchool. Machine Learning End-to-End
DutchMLSchool. Machine Learning End-to-EndBigML, Inc
 
DutchMLSchool. ML Automation
DutchMLSchool. ML AutomationDutchMLSchool. ML Automation
DutchMLSchool. ML AutomationBigML, Inc
 
DutchMLSchool. Associations and Topic Models
DutchMLSchool. Associations and Topic ModelsDutchMLSchool. Associations and Topic Models
DutchMLSchool. Associations and Topic ModelsBigML, Inc
 
BSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, EvaluationsBSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, EvaluationsBigML, Inc
 
BSSML17 - Deepnets
BSSML17 - DeepnetsBSSML17 - Deepnets
BSSML17 - DeepnetsBigML, Inc
 
BSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBigML, Inc
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Alok Singh
 
BSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBigML, Inc
 
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature EngineeringBigML, Inc
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanHakka Labs
 
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016MLconf
 
MLSEV. Automating Decision Making
MLSEV. Automating Decision MakingMLSEV. Automating Decision Making
MLSEV. Automating Decision MakingBigML, Inc
 
Big Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao PauloBig Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao PauloOCTO Technology
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityDaniel Tunkelang
 
Yuri M. Brovman, Data Scientist, eBay
Yuri M. Brovman, Data Scientist, eBayYuri M. Brovman, Data Scientist, eBay
Yuri M. Brovman, Data Scientist, eBayMLconf
 
[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSM[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSMSunView Software, Inc.
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningTamir Taha
 
VSSML18 Introduction to Supervised Learning
VSSML18 Introduction to Supervised LearningVSSML18 Introduction to Supervised Learning
VSSML18 Introduction to Supervised LearningBigML, Inc
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big DataDataWorks Summit
 

Tendances (20)

DutchMLSchool. Machine Learning End-to-End
DutchMLSchool. Machine Learning End-to-EndDutchMLSchool. Machine Learning End-to-End
DutchMLSchool. Machine Learning End-to-End
 
DutchMLSchool. ML Automation
DutchMLSchool. ML AutomationDutchMLSchool. ML Automation
DutchMLSchool. ML Automation
 
DutchMLSchool. Associations and Topic Models
DutchMLSchool. Associations and Topic ModelsDutchMLSchool. Associations and Topic Models
DutchMLSchool. Associations and Topic Models
 
BSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, EvaluationsBSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, Evaluations
 
BSSML17 - Deepnets
BSSML17 - DeepnetsBSSML17 - Deepnets
BSSML17 - Deepnets
 
BSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBSSML17 - Feature Engineering
BSSML17 - Feature Engineering
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
 
BSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBSSML17 - API and WhizzML
BSSML17 - API and WhizzML
 
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature Engineering
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong Yan
 
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
 
MLSEV. Automating Decision Making
MLSEV. Automating Decision MakingMLSEV. Automating Decision Making
MLSEV. Automating Decision Making
 
L11. The Future of Machine Learning
L11. The Future of Machine LearningL11. The Future of Machine Learning
L11. The Future of Machine Learning
 
Big Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao PauloBig Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao Paulo
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for Productivity
 
Yuri M. Brovman, Data Scientist, eBay
Yuri M. Brovman, Data Scientist, eBayYuri M. Brovman, Data Scientist, eBay
Yuri M. Brovman, Data Scientist, eBay
 
[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSM[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSM
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
VSSML18 Introduction to Supervised Learning
VSSML18 Introduction to Supervised LearningVSSML18 Introduction to Supervised Learning
VSSML18 Introduction to Supervised Learning
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big Data
 

Similaire à DutchMLSchool. Introduction to Machine Learning with the BigML Platform

BSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleBigML, Inc
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLBigML, Inc
 
VSSML18. Clustering and Latent Dirichlet Allocation
VSSML18. Clustering and Latent Dirichlet AllocationVSSML18. Clustering and Latent Dirichlet Allocation
VSSML18. Clustering and Latent Dirichlet AllocationBigML, Inc
 
BigML Education - OptiML
BigML Education - OptiMLBigML Education - OptiML
BigML Education - OptiMLBigML, Inc
 
MLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, DeepnetsMLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, DeepnetsBigML, Inc
 
DutchMLSchool. Clusters and Anomalies
DutchMLSchool. Clusters and AnomaliesDutchMLSchool. Clusters and Anomalies
DutchMLSchool. Clusters and AnomaliesBigML, Inc
 
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...Sri Ambati
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018HJ van Veen
 
MLSEV. Models, Evaluations and Ensembles
MLSEV. Models, Evaluations and Ensembles MLSEV. Models, Evaluations and Ensembles
MLSEV. Models, Evaluations and Ensembles BigML, Inc
 
VSSML18. Advanced WhizzML Workflows
VSSML18. Advanced WhizzML WorkflowsVSSML18. Advanced WhizzML Workflows
VSSML18. Advanced WhizzML WorkflowsBigML, Inc
 
DutchMLSchool. Your first BigML Project
DutchMLSchool. Your first BigML ProjectDutchMLSchool. Your first BigML Project
DutchMLSchool. Your first BigML ProjectBigML, Inc
 
How to start your data career
How to start your data careerHow to start your data career
How to start your data careerAdwait Bhave
 
MLSEV. Machine Learning: Business Perspective
MLSEV. Machine Learning: Business PerspectiveMLSEV. Machine Learning: Business Perspective
MLSEV. Machine Learning: Business PerspectiveBigML, Inc
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkVivian S. Zhang
 
MLSD18. Real World Use Case II
MLSD18. Real World Use Case IIMLSD18. Real World Use Case II
MLSD18. Real World Use Case IIBigML, Inc
 
BSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBigML, Inc
 
MLSEV. Machine Learning: Technical Perspective
MLSEV. Machine Learning: Technical PerspectiveMLSEV. Machine Learning: Technical Perspective
MLSEV. Machine Learning: Technical PerspectiveBigML, Inc
 
VSSML18. OptiML and Fusions
VSSML18. OptiML and FusionsVSSML18. OptiML and Fusions
VSSML18. OptiML and FusionsBigML, Inc
 
Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...Lviv Startup Club
 

Similaire à DutchMLSchool. Introduction to Machine Learning with the BigML Platform (20)

BSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and Evaluations
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at Scale
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
VSSML18. Clustering and Latent Dirichlet Allocation
VSSML18. Clustering and Latent Dirichlet AllocationVSSML18. Clustering and Latent Dirichlet Allocation
VSSML18. Clustering and Latent Dirichlet Allocation
 
BigML Education - OptiML
BigML Education - OptiMLBigML Education - OptiML
BigML Education - OptiML
 
MLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, DeepnetsMLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, Deepnets
 
DutchMLSchool. Clusters and Anomalies
DutchMLSchool. Clusters and AnomaliesDutchMLSchool. Clusters and Anomalies
DutchMLSchool. Clusters and Anomalies
 
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
MLSEV. Models, Evaluations and Ensembles
MLSEV. Models, Evaluations and Ensembles MLSEV. Models, Evaluations and Ensembles
MLSEV. Models, Evaluations and Ensembles
 
VSSML18. Advanced WhizzML Workflows
VSSML18. Advanced WhizzML WorkflowsVSSML18. Advanced WhizzML Workflows
VSSML18. Advanced WhizzML Workflows
 
DutchMLSchool. Your first BigML Project
DutchMLSchool. Your first BigML ProjectDutchMLSchool. Your first BigML Project
DutchMLSchool. Your first BigML Project
 
How to start your data career
How to start your data careerHow to start your data career
How to start your data career
 
MLSEV. Machine Learning: Business Perspective
MLSEV. Machine Learning: Business PerspectiveMLSEV. Machine Learning: Business Perspective
MLSEV. Machine Learning: Business Perspective
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
 
MLSD18. Real World Use Case II
MLSD18. Real World Use Case IIMLSD18. Real World Use Case II
MLSD18. Real World Use Case II
 
BSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 Sessions
 
MLSEV. Machine Learning: Technical Perspective
MLSEV. Machine Learning: Technical PerspectiveMLSEV. Machine Learning: Technical Perspective
MLSEV. Machine Learning: Technical Perspective
 
VSSML18. OptiML and Fusions
VSSML18. OptiML and FusionsVSSML18. OptiML and Fusions
VSSML18. OptiML and Fusions
 
Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...
 

Plus de BigML, Inc

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingBigML, Inc
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationBigML, Inc
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceBigML, Inc
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesBigML, Inc
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector BigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionBigML, Inc
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyBigML, Inc
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorBigML, Inc
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsBigML, Inc
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsBigML, Inc
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool. Introduction to Machine Learning with the BigML Platform

  • 1. 1st edition | July 8-11, 2019 1
  • 2. BigML, Inc #DutchMLSchool Introduction to BigML Making Machine Learning Beautifully Simple Full Name Role, Company 2 Poul Petersen CIO, BigML, Inc
  • 3. Sampling the Audience
      • Expert: Published papers at KDD, ICML, NIPS, etc., or developed own ML algorithms used at large scale
      • Aficionado: Understands pros/cons of different techniques and/or can tweak algorithms as needed
      • Practitioner: Very familiar with ML packages (Weka, Scikit, BigML, etc.)
      • Newbie: Just taking a Coursera ML class or reading an introductory book on ML
      • Absolute beginner: ML sounds like science fiction
  • 4. A Present for You
  • 5. Free 1-Month PRO Subscription: https://bigml.com/accounts/register/ dutchmlschool
  • 6. A Brief History of BigML
      • BigML Mission: To make Machine Learning Beautifully Simple
      • BigML was founded in Corvallis, Oregon in 2011, long before ML was "cool"
      • You’ve never heard of it?
      • Most innovative city in the United States!
  • 7. A Brief History of BigML
  • 8. BigML Platform
      [Architecture diagram: Web-based Frontend and Visualizations; Tools (https://bigml.com/tools); REST API (https://bigml.com/api); Distributed Machine Learning Backend with SOURCE, DATASET, MODEL, PREDICTION, EVALUATION, SAMPLE, and WHIZZML servers; Smart Infrastructure (auto-deployable, auto-scalable) with server events, Gearman queue, message queue, AWS costs, run-queue and busy scalers, and auto, desired, and actual topology]
  • 9. BigML Platform
      [Same architecture diagram as slide 8, with the added label: On-Premises]
  • 10. Machine Learning Motivation
      Imagine:
      • You are looking to buy a house
      • You recently found a house you like
      • Is the asking price fair?
      What next?
  • 11. Machine Learning Motivation: Why not ask an expert?
      • Experts can be rare / expensive
      • Hard to validate experience:
          • Experience with similar properties?
          • Do they consider all relevant variables?
          • Knowledge of market up to date?
      • Hard to validate answer:
          • How many times was the expert right / wrong?
          • Probably can’t explain decision in detail
      • Humans are not good at intuitive statistics
  • 12. Data vs Expert: Replace the expert with data?
      • Intuition: square footage relates to price.
      • Collect data from past sales
      • PRICE = 125.3*SQFT + 96535

      SQFT | SOLD   | PREDICT
      2424 | 360000 | 400262
      1785 | 307500 | 320195
      1003 | 185000 | 222211
      4135 | 600000 | 614651
      1676 | 328500 | 306538
      1012 | 247000 | 223339
      3352 | 420000 | 516541
      2825 | 435350 | 450508
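The PREDICT column on this slide can be reproduced by applying the slide's line to each SQFT value. The sketch below does only that: the coefficients 125.3 and 96535 are copied from the slide, not re-fit from these eight rows, and the names `SALES` and `predict_price` are illustrative.

```python
# Past sales copied from the slide: (SQFT, SOLD price)
SALES = [
    (2424, 360000), (1785, 307500), (1003, 185000), (4135, 600000),
    (1676, 328500), (1012, 247000), (3352, 420000), (2825, 435350),
]

def predict_price(sqft):
    """Apply the slide's line PRICE = 125.3*SQFT + 96535."""
    return 125.3 * sqft + 96535

# Compare each prediction with the actual sale price.
for sqft, sold in SALES:
    predicted = predict_price(sqft)
    print(f"{sqft:>5} sqft  sold {sold:>7}  predicted {predicted:>9.0f}  error {predicted - sold:>+9.0f}")
```

The error column makes the slide's point concrete: a single variable already gets some homes close and others (such as the 3352 sqft home) far off.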
  • 13. Data vs Expert: Replace the expert scorecard
      • Experts can be rare / expensive
      • Hard to validate experience:
          • Experience with similar properties?
          • Do they consider all relevant variables?
          • Knowledge of market up to date?
      • Hard to validate answer:
          • How many times was the expert right / wrong?
          • Probably can’t explain decision in detail
      • Humans are not good at intuitive statistics
  • 14. Data vs Expert: Replace the expert with data
      • Intuition: square footage relates to price.
      • Collect data from past sales
      • PRICE = 125.3*SQFT + 96535

      SQFT | SOLD
      2424 | 360000
      1785 | 307500
      1003 | 185000
      4135 | 600000
      1676 | 328500
      1012 | 247000
      3352 | 420000
      2825 | 435350
  • 15. More Data!

      SQFT | BEDS | BATHS | ADDRESS            | LOCATION          | LOT SIZE | YEAR BUILT | PARKING SPOTS | LATITUDE   | LONGITUDE   | SOLD
      2424 | 4    | 3     | 1522 NW Jonquil    | Timberhill SE 2nd | 5227     | 1991       | 2             | 44,594828  | -123,269328 | 360000
      1785 | 3    | 2     | 7360 NW Valley Vw  | Country Estates   | 25700    | 1979       | 2             | 44,643876  | -123,238189 | 307500
      1003 | 2    | 1     | 2620 NW Chinaberry | Tamarack Village  | 4792     | 1978       | 2             | 44,593704  | -123,295424 | 185000
      4135 | 5    | 3,5   | 4748 NW Veronica   | Suncrest          | 6098     | 2004       | 3             | 44,5929659 | -123,306916 | 600000
      1676 | 3    | 2     | 2842 NW Monterey   | Corvallis         | 8712     | 1975       | 2             | 44,5945279 | -123,291523 | 328500
      1012 | 3    | 1     | 2320 NW Highland   | Corvallis         | 9583     | 1959       | 2             | 44,591476  | -123,262841 | 247000
      3352 | 4    | 3     | 1205 NW Ridgewood  | Ridgewood 2       | 60113    | 1975       | 2             | 44,579439  | -123,333888 | 420000
      2825 | 3    |       | 411 NW 16th        | Wilkins Addition  | 4792     | 1938       | 1             | 44,570883  | -123,272113 | 435350

      Uhhhh……..
      • Can we still fit a line to 10 variables? (well, yes)
      • Will fitting a line give good results? (unlikely)
      • What about those text fields and categorical values?
  • 17. What just happened?
      [Diagram labels: Home Data; Model (Square Feet? Location?); Prediction: Price=418K]
  • 18. Some Terminology…
      [Diagram labels: Home Data; Model; Prediction: Price=418K]
      Training Data
      • Modeling
      • Clustering
      • Anomaly Detection
      • Association Discovery
      ML Resource
      ML Platform
      “Consume” the model or “put into production”
      • Dashboard
      • Custom Application
      • Wearable / Edge device
      • Batch Process
  • 19. Model Choices
      • A single Decision Tree was easy to understand, but could we build something stronger?
      • There are actually hundreds of algorithms…
  • 21. Model Choices
      • A single Decision Tree was easy to understand, but could we build something stronger?
      • There are actually hundreds of algorithms…
      • BigML carefully implements the best in terms of interpretability and the ability to work with real-world data:
          • Linear Regression
          • Logistic Regression
          • Single Decision Trees
          • Decision Forest / Random Decision Forest
          • Boosted Trees
          • Deepnets (wait - those are hard, right?)
  • 22. Deepnets are Hard, Right?
      [Diagram: a neural network with 4 input features (x1 to x4), several hidden layers of nodes (h1, h2, …), and 3 output classes (y1 to y3). Annotation: h1 = activation?(wx, x) ?]
  • 23. BigML Deepnets
      • The success of a Deepnet depends on getting the right network structure for the dataset
      • But there are too many parameters:
          • Nodes, layers, activation function, learning rate, etc…
          • And setting them takes significant expert knowledge
      • Solution: Metalearning (a good initial guess)
      • Solution: Network search (try a bunch)
  • 25. Choosing the Algorithm
      [Chart: horizontal axis "Decreasing Interpretability / Better Representation / Longer Training"; vertical axis "Increasing Data Size / Complexity"; stages: Early Stage (Rapid Prototyping), Mid Stage (Proven Application), Late Stage (Critical Performance); models placed on the chart: Single Tree Model, Logistic Regression, Decision Forest, Random Decision Forest, Boosted Trees, Deepnets]
      STILL TOO HARD?
  • 26. OptiML
      • Each resource has several parameters that impact quality
          • Number of trees, missing splits, nodes, weight
      • Rather than trial and error, we can use ML to find ideal parameters
      • Why not make the model type (Decision Tree, Boosted Tree, etc.) a parameter as well?
      • Similar to Deepnet network search, but finds the optimum machine learning algorithm and parameters for your data automatically
      • Outputs the top performing algorithms and parameters for your data…
      Why use just one “best” result?
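The idea of treating the model type itself as a parameter can be sketched as a search over candidate fitting functions, each scored on held-out rows. This is only a toy illustration and not how OptiML is implemented: the two candidates (a mean baseline and a one-variable least-squares line), the 6/2 train/holdout split, and all function names are assumptions; the rows are the SQFT/SOLD pairs from the earlier slides.

```python
from statistics import mean

# (SQFT, SOLD) rows from the deck's house-price example
ROWS = [
    (2424, 360000), (1785, 307500), (1003, 185000), (4135, 600000),
    (1676, 328500), (1012, 247000), (3352, 420000), (2825, 435350),
]

def fit_mean(train):
    """Candidate 1: always predict the average training price."""
    avg = mean(y for _, y in train)
    return lambda x: avg

def fit_line(train):
    """Candidate 2: ordinary least-squares line on a single variable."""
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    return lambda x: my + slope * (x - mx)

def mean_abs_error(model, rows):
    return mean(abs(model(x) - y) for x, y in rows)

def search(candidates, train, holdout):
    """Fit every candidate and rank them by holdout error (best first)."""
    return sorted((mean_abs_error(fit(train), holdout), name) for name, fit in candidates.items())

train, holdout = ROWS[:6], ROWS[6:]
ranking = search({"mean baseline": fit_mean, "linear regression": fit_line}, train, holdout)
for error, name in ranking:
    print(f"{name}: mean absolute error {error:,.0f}")
```

Returning the whole ranking rather than a single winner mirrors the slide's closing question about using more than one "best" result.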
  • 27. Fusions
      • Similar to an Ensemble, but we can mix different model types
          • Logistic Regression plus a Deepnet, for example
      • You can also create a Fusion with different training sets!
          • Last week's data, plus last month's data, etc.
      • Or a Fusion of OptiML models
          • Combines the “best of the best”
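For a numeric target, one simple way to combine models of different types is to average their predictions. The sketch below assumes plain averaging for illustration; it is not taken from the deck and may differ from how BigML weights the components of a Fusion. `line_model` reuses the line from the earlier slides, while `per_sqft_model` and its 160 per square foot rate are hypothetical.

```python
from statistics import mean

def line_model(sqft):
    """The line from the earlier slide: PRICE = 125.3*SQFT + 96535."""
    return 125.3 * sqft + 96535

def per_sqft_model(sqft):
    """Hypothetical second model: a flat price per square foot."""
    return 160.0 * sqft

def fuse(models, x):
    """Average the numeric predictions of several models for input x."""
    return mean(model(x) for model in models)

print(fuse([line_model, per_sqft_model], 2000))
```

Because `fuse` only needs callables, the components can be different algorithms or the same algorithm trained on different datasets, as the slide suggests.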
  • 29. ML Workflows
      [Workflow diagram labels: SOLD HOMES, FILTER, NEW FEATURES, DATASET, MODEL; FORSALE HOMES, FILTER, NEW FEATURES, DATASET, BATCH PREDICTION, DEALS]
      • Real-world ML applications are workflows!
      • They often require unsupervised learning!
  • 30. Let’s build a recommender
      Typical way to shop for a home…
  • 31. Recommender Idea
      [Diagram labels: All Homes For Sale; Sample (? ? ? ?); Preference Data; Preference Model]
      … then use the Preference Model to filter all the homes on the market
  • 32. What if there are really unusual homes in the data?
      • A mansion with 20 bathrooms
      • A home with no bedrooms
      • A lot size that is smaller than the home?
      We don’t want to show these as suggestions because they are unusual….
      How do we detect anomalies?
  • 34. What just happened?
      • We wanted to find and remove unusual houses.
      • We created an Anomaly Detector and examined the top anomalies.
      • We found some unusual houses to remove and discovered bad data (missing values) that we want to fix.
  • 35. A clever way to fix missing data
      Let’s use Machine Learning…

      SQFT  | PRICE      | BEDS      | BATHS
      3.125 | US$530.000 | 5         | 3
      2.100 | US$460.000 | (missing) | 2
      1.200 | US$250.000 | 3         | (missing)
      3.950 | US$610.000 | 6         | 4

      Predicted values for the missing cells: BEDS 4, BATHS 1.5
  • 37. What just happened?
      • We had a Dataset with missing values.
      • We wanted to apply an algorithm to fix the missing values with Machine Learning.
      • Rather than write the algorithm, we found what we needed in the WhizzML public gallery.
      • Now that we have cloned the Script, we can use it again and again.
      • We can write new ones too!
  • 38. Recommender Problem #2
      • How can we avoid showing essentially the same house over and over?
      [Diagram labels: All Homes; Sample (? ? ?); Modern]
  • 39. Recommender Problem #2
      • How can we avoid showing essentially the same house over and over?
      [Diagram labels: All Homes; groups Modern and Lots of Land, each with its own sample (?)]
      • Great! What if we don’t know how to group them? Or how many groups?
  • 41. What just happened?
      • Since we don’t know how many groups of homes there should be, we used G-means Clustering to find the optimum number of groups of homes
      • Our recommender will use these groups to create a better sampling for user preference
      • We also tried to understand the home clusters using “model clusters”, but the models were difficult to interpret
  • 42. Understanding Clusters Better
      What if we could get rules like…
      If SQFT >= 3,125 THEN “Cluster 1”

      SQFT  | PRICE      | BEDS | BATHS | CLUSTER
      3.125 | US$530.000 | 5    | 3     | Cluster 1
      2.100 | US$460.000 | 4    | 2     | Cluster 3
      1.200 | US$250.000 | 3    | 1,5   | Cluster 5
      3.950 | US$610.000 | 6    | 4     | Cluster 1
  • 44. What just happened?
      • We used a Batch Centroid to add the Cluster assignment of each home as a feature to the Dataset
      • We used Association Discovery to find “interesting” relationships between the features, including the Cluster assignment
  • 45. Recommender Problem #3
      There is much more interesting information than just the number of BEDS, BATHS, etc.
      • Unfortunately, these "remarks" are not available in the Redfin download
      • Adding them to our dataset requires crawling the website
      • Like most ML projects, preparing the data is 80% of the difficulty (fortunately I already did it!)
  • 47. What just happened?
      • We extended the home dataset with the syndicated remarks text field
      • We built a model to predict sale price and explored how key words discovered in the remarks impacted price
      • We used topic modeling to create a deeper thematic understanding of the remarks
          • Homes that are "in-town" or "out-of-town"
      • We extended the dataset with fields that represent, for each home, how related it is to each of these topics
          • This will allow our clustering to group homes by a deeper meaning than just BEDS, BATHS, etc.
      • Is there a better way to capture “locality”?
  • 48. Idea: Better Feature
      [Figure labels: Worth More; Worth Less]
  • 49. A Better Feature for Home Prices

      LATITUDE  | LONGITUDE   | REFERENCE LATITUDE | REFERENCE LONGITUDE | Distance (m)
      44,583    | -123,296775 | 44,5638            | -123,2794           | 700
      44,604414 | -123,296129 | 44,5638            | -123,2794           | 30,4
      44,600108 | -123,29707  | 44,5638            | -123,2794           | 19,38
      44,603077 | -123,295004 | 44,5638            | -123,2794           | 37,8
      44,589587 | -123,301154 | 44,5638            | -123,2794           | 23,39
  • 50. Haversine Formula: https://en.wikipedia.org/wiki/Haversine_formula
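The haversine formula gives the great-circle distance between two latitude/longitude points, which is what a "distance from a reference point" feature needs. Below is a minimal sketch in plain Python rather than Flatline or WhizzML. The Earth radius of 6,371,000 m is a common approximation, the function name is illustrative, and the coordinates are the first row and the reference point from the previous slide's table with decimal commas read as decimal points.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (an approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Reference point and first home from the slide's table
REFERENCE = (44.5638, -123.2794)
home = (44.583, -123.296775)
print(f"{haversine_m(*home, *REFERENCE):.0f} m from the reference point")
```

Applying `haversine_m` to every row turns the two raw coordinate columns into a single numeric feature that a model can use directly.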
  • 52. What just happened?
      • We wanted to create a new feature “distance from OSU”
      • This is possible with Flatline, a DSL for feature engineering
      • Rather than writing the code for the coordinate transformation, we found a ready-made script shared in the WhizzML gallery
      • We cloned the script and transformed the dataset
      • This can be easily repeated with new datasets: fresh data or different cities
  • 53. Recommender Idea
      [Diagram labels: groups Modern, Lots of Land, and Small, each sampled (?); Preference Data; Preference Model]