SlideShare une entreprise Scribd logo
1  sur  30
Télécharger pour lire hors ligne
Visual Diagnostics
at Scale
EuroSciPy 2019
Dr. Rebecca Bilbro
Chief Data Scientist, ICX Media
Co-creator, Scikit-Yellowbrick
Author, Applied Text Analysis with Python
@rebeccabilbro
A tale of
three datasets
Census Dataset
500K instances
50 features
(age, occupation,
education, sex, ethnicity
marital status)
Sarcasm Dataset
50K instances
5K features
(“love”, 🙄, “totally”, “best”,
“surprise”, “Sherlock”,
capitalization, timestamp)
Sensor Dataset
5M instances
15 features
(Ammonia, Acetaldehyde,
Acetone, Ethylene, Ethanol,
Toluene ppmv)
Scaling pain
points are
dataset-
specific
● Many features
● Many instances
● Feature variance
● Heteroskedasticity
● Covariance
● Noise
Logistic Regression Fit Times (seconds)
500 - 5M instances / 5 - 50 features
10 seconds
Multilayer Perceptron Fit Times (seconds)
500 - 5M instances / 5 - 50 features
5 min, 48
seconds
Support Vector Machine Fit Times (seconds)
500 - 500K instances / 5 - 50 features
5 hours, 24
seconds
Support Vector Machine Fit Times (seconds)
500 - 500K instances / 5 - 50 features
5 hours, 24
seconds
😵
How to
optimize?
● Be patient
● Be wrong
● Be rich
● Steer
Adventures in
Model Visualization
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from yellowbrick.features import ParallelCoordinates
data = load_iris()
oz = ParallelCoordinates(ax=axes[idx], fast=True)
oz.fit_transform(data.data, data.target)
oz.finalize()
Each point drawn individually
as connected line segment
With standardization
Points grouped by class, each class
drawn as single segment
Bumps
class Estimator(object):
def fit(self, X, y=None):
"""
Fits estimator to data.
"""
# set state of self
return self
def predict(self, X):
"""
Predict response of X
"""
# compute predictions pred
return pred
class Transformer(Estimator):
def transform(self, X):
"""
Transforms the input data.
"""
# transform X to X_prime
return X_prime
class Pipeline(Transfomer):
@property
def named_steps(self):
"""
Returns a sequence of estimators
"""
return self.steps
@property
def _final_estimator(self):
"""
Terminating estimator
"""
return self.steps[-1]
The scikit-learn API
self.X
class Visualizer(Estimator):
def draw(self):
"""
Draw called from scikit-learn methods.
"""
return self.ax
def finalize(self):
self.set_title()
self.legend()
def poof(self):
self.finalize()
plt.show()
import matplotlib.pyplot as plt
from yellowbrick.base import Visualizer
class MyVisualizer(Visualizer):
def __init__(self, ax=None, **kwargs):
super(MyVisualizer, self).__init__(ax, **kwargs)
def fit(self, X, y=None):
self.draw(X)
return self
def draw(self, X):
if self.ax is None:
self.ax = self.gca()
self.ax.plt(X)
def finalize(self):
self.set_title("My Visualizer")
The Yellowbrick API
Oneliners
from sklearn.linear_model import Lasso
from yellowbrick.regressor import ResidualsPlot
# Option 1: scikit-learn style
viz = ResidualsPlot(Lasso())
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.poof()
from sklearn.linear_model import Lasso
from yellowbrick.regressor import residuals_plot
# Option 2: Quick Method
viz = residuals_plot(
Lasso(), X_train, y_train, X_test, y_test
)
��
Visual Comparison Tests
=========================================== test session starts ============================================
platform darwin -- Python 3.7.1, pytest-5.0.0, py-1.8.0, pluggy-0.12.0
rootdir: /Users/rbilbro/pyjects/yb, inifile: setup.cfg
plugins: flakes-4.0.0, cov-2.7.1
collected 932 items
tests/__init__.py s... [ 0%]
tests/base.py s [ 0%]
tests/conftest.py s [ 0%]
tests/fixtures.py s [ 0%]
tests/images.py s [ 0%]
tests/rand.py s [ 0%]
tests/test_base.py s............ [ 2%]
...........................................................................................................
...........................................................................................................
...........................................................................................................
...........................................................................................................
tests/test_utils/test_target.py s............ [ 68%]
tests/test_utils/test_timer.py s..... [ 68%]
tests/test_utils/test_types.py s.................................................................... [ 70%]
....x................................x.............................................................. [ 72%]
.... [ 73%]
tests/test_utils/test_wrapper.py s....
===================== 854 passed, 72 skipped, 6 xfailed, 33 warnings in 225.96 seconds =====================
Roadmap
Brushing and Filtering
Ok for only 5 features Not good for 23 features
Machine-learning oriented aggregation
YB (current) Seaborn
Parallelization with joblib
Elbow Curve Validation Curve
from yellowbrick.features import Rank2D
from yellowbrick.pipeline import VisualPipeline
from yellowbrick.model_selection import CVScores
from yellowbrick.regressor import PredictionError
viz_pipe = VisualPipeline([
('rank2d', Rank2D(features=features, algorithm='covariance')),
('prederr', PredictionError(model)),
('cvscores', CVScores(model, cv=cv, scoring='r2'))
])
Visual
Pipelines
Models are aggregations,
so are visualizations...
…so use visualizations
to steer model selection!
scikit-yb.org

Contenu connexe

Tendances

Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalParallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Gianmario Spacagna
 

Tendances (20)

AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple steps
 
Exposé Ontology
Exposé OntologyExposé Ontology
Exposé Ontology
 
Part 2 Python
Part 2 PythonPart 2 Python
Part 2 Python
 
Java performance
Java performanceJava performance
Java performance
 
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning Overview
 
OpenML 2019
OpenML 2019OpenML 2019
OpenML 2019
 
Open and Automated Machine Learning
Open and Automated Machine LearningOpen and Automated Machine Learning
Open and Automated Machine Learning
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Intro to programing with java-lecture 3
Intro to programing with java-lecture 3Intro to programing with java-lecture 3
Intro to programing with java-lecture 3
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
OpenML NeurIPS2018
OpenML NeurIPS2018OpenML NeurIPS2018
OpenML NeurIPS2018
 
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalParallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data ScientistsIntro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data Scientists
 
Le Machine Learning de A à Z
Le Machine Learning de A à ZLe Machine Learning de A à Z
Le Machine Learning de A à Z
 
Linear Regression, Machine learning term
Linear Regression, Machine learning termLinear Regression, Machine learning term
Linear Regression, Machine learning term
 
Learning how to learn
Learning how to learnLearning how to learn
Learning how to learn
 
Intro to scikit-learn
Intro to scikit-learnIntro to scikit-learn
Intro to scikit-learn
 

Similaire à EuroSciPy 2019: Visual diagnostics at scale

ScalaCheck
ScalaCheckScalaCheck
ScalaCheck
BeScala
 
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
PAPIs.io
 

Similaire à EuroSciPy 2019: Visual diagnostics at scale (20)

Visual diagnostics at scale
Visual diagnostics at scaleVisual diagnostics at scale
Visual diagnostics at scale
 
Spark for Reactive Machine Learning: Building Intelligent Agents at Scale
Spark for Reactive Machine Learning: Building Intelligent Agents at ScaleSpark for Reactive Machine Learning: Building Intelligent Agents at Scale
Spark for Reactive Machine Learning: Building Intelligent Agents at Scale
 
Art & music vs Google App Engine
Art & music vs Google App EngineArt & music vs Google App Engine
Art & music vs Google App Engine
 
ScalaCheck
ScalaCheckScalaCheck
ScalaCheck
 
обзор Python
обзор Pythonобзор Python
обзор Python
 
Intro to computer vision in .net
Intro to computer vision in .netIntro to computer vision in .net
Intro to computer vision in .net
 
Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...
Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...
Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...
 
Information from pixels
Information from pixelsInformation from pixels
Information from pixels
 
Pythonic Graphics
Pythonic GraphicsPythonic Graphics
Pythonic Graphics
 
Coding in Style
Coding in StyleCoding in Style
Coding in Style
 
Pythonで機械学習入門以前
Pythonで機械学習入門以前Pythonで機械学習入門以前
Pythonで機械学習入門以前
 
Building an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowBuilding an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflow
 
Akka with Scala
Akka with ScalaAkka with Scala
Akka with Scala
 
Spock
SpockSpock
Spock
 
I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)
 
Ember
EmberEmber
Ember
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
 
Static code analysis: what? how? why?
Static code analysis: what? how? why?Static code analysis: what? how? why?
Static code analysis: what? how? why?
 

Plus de Rebecca Bilbro

Conflict-Free Replicated Data Types (PyCon 2022)
Conflict-Free Replicated Data Types (PyCon 2022)Conflict-Free Replicated Data Types (PyCon 2022)
Conflict-Free Replicated Data Types (PyCon 2022)
Rebecca Bilbro
 
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Rebecca Bilbro
 
The Incredible Disappearing Data Scientist
The Incredible Disappearing Data ScientistThe Incredible Disappearing Data Scientist
The Incredible Disappearing Data Scientist
Rebecca Bilbro
 
Data Intelligence 2017 - Building a Gigaword Corpus
Data Intelligence 2017 - Building a Gigaword CorpusData Intelligence 2017 - Building a Gigaword Corpus
Data Intelligence 2017 - Building a Gigaword Corpus
Rebecca Bilbro
 
Building a Gigaword Corpus (PyCon 2017)
Building a Gigaword Corpus (PyCon 2017)Building a Gigaword Corpus (PyCon 2017)
Building a Gigaword Corpus (PyCon 2017)
Rebecca Bilbro
 

Plus de Rebecca Bilbro (15)

Data Structures for Data Privacy: Lessons Learned in Production
Data Structures for Data Privacy: Lessons Learned in ProductionData Structures for Data Privacy: Lessons Learned in Production
Data Structures for Data Privacy: Lessons Learned in Production
 
Conflict-Free Replicated Data Types (PyCon 2022)
Conflict-Free Replicated Data Types (PyCon 2022)Conflict-Free Replicated Data Types (PyCon 2022)
Conflict-Free Replicated Data Types (PyCon 2022)
 
Anti-Entropy Replication for Cost-Effective Eventual Consistency
Anti-Entropy Replication for Cost-Effective Eventual ConsistencyAnti-Entropy Replication for Cost-Effective Eventual Consistency
Anti-Entropy Replication for Cost-Effective Eventual Consistency
 
The Promise and Peril of Very Big Models
The Promise and Peril of Very Big ModelsThe Promise and Peril of Very Big Models
The Promise and Peril of Very Big Models
 
Beyond Off the-Shelf Consensus
Beyond Off the-Shelf ConsensusBeyond Off the-Shelf Consensus
Beyond Off the-Shelf Consensus
 
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
 
Words in space
Words in spaceWords in space
Words in space
 
The Incredible Disappearing Data Scientist
The Incredible Disappearing Data ScientistThe Incredible Disappearing Data Scientist
The Incredible Disappearing Data Scientist
 
Camlis
CamlisCamlis
Camlis
 
Learning machine learning with Yellowbrick
Learning machine learning with YellowbrickLearning machine learning with Yellowbrick
Learning machine learning with Yellowbrick
 
Data Intelligence 2017 - Building a Gigaword Corpus
Data Intelligence 2017 - Building a Gigaword CorpusData Intelligence 2017 - Building a Gigaword Corpus
Data Intelligence 2017 - Building a Gigaword Corpus
 
Building a Gigaword Corpus (PyCon 2017)
Building a Gigaword Corpus (PyCon 2017)Building a Gigaword Corpus (PyCon 2017)
Building a Gigaword Corpus (PyCon 2017)
 
NLP for Everyday People
NLP for Everyday PeopleNLP for Everyday People
NLP for Everyday People
 
Commerce Data Usability Project
Commerce Data Usability ProjectCommerce Data Usability Project
Commerce Data Usability Project
 

Dernier

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Dernier (20)

Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 

EuroSciPy 2019: Visual diagnostics at scale

  • 2. Dr. Rebecca Bilbro Chief Data Scientist, ICX Media Co-creator, Scikit-Yellowbrick Author, Applied Text Analysis with Python @rebeccabilbro
  • 3. A tale of three datasets
  • 4. Census Dataset 500K instances 50 features (age, occupation, education, sex, ethnicity marital status) Sarcasm Dataset 50K instances 5K features (“love”, 🙄, “totally”, “best”, “surprise”, “Sherlock”, capitalization, timestamp) Sensor Dataset 5M instances 15 features (Ammonia, Acetaldehyde, Acetone, Ethylene, Ethanol, Toluene ppmv)
  • 5. Scaling pain points are dataset- specific ● Many features ● Many instances ● Feature variance ● Heteroskedasticity ● Covariance ● Noise
  • 6. Logistic Regression Fit Times (seconds) 500 - 5M instances / 5 - 50 features 10 seconds
  • 7. Multilayer Perceptron Fit Times (seconds) 500 - 5M instances / 5 - 50 features 5 min, 48 seconds
  • 8. Support Vector Machine Fit Times (seconds) 500 - 500K instances / 5 - 50 features 5 hours, 24 seconds
  • 9. Support Vector Machine Fit Times (seconds) 500 - 500K instances / 5 - 50 features 5 hours, 24 seconds 😵
  • 10. How to optimize? ● Be patient ● Be wrong ● Be rich ● Steer
  • 11.
  • 13.
  • 14.
  • 15. import matplotlib.pyplot as plt from sklearn.datasets import load_iris from yellowbrick.features import ParallelCoordinates data = load_iris() oz = ParallelCoordinates(ax=axes[idx], fast=True) oz.fit_transform(data.data, data.target) oz.finalize() Each point drawn individually as connected line segment With standardization Points grouped by class, each class drawn as single segment
  • 16.
  • 17. Bumps
  • 18. class Estimator(object): def fit(self, X, y=None): """ Fits estimator to data. """ # set state of self return self def predict(self, X): """ Predict response of X """ # compute predictions pred return pred class Transformer(Estimator): def transform(self, X): """ Transforms the input data. """ # transform X to X_prime return X_prime class Pipeline(Transfomer): @property def named_steps(self): """ Returns a sequence of estimators """ return self.steps @property def _final_estimator(self): """ Terminating estimator """ return self.steps[-1] The scikit-learn API self.X
  • 19. class Visualizer(Estimator): def draw(self): """ Draw called from scikit-learn methods. """ return self.ax def finalize(self): self.set_title() self.legend() def poof(self): self.finalize() plt.show() import matplotlib.pyplot as plt from yellowbrick.base import Visualizer class MyVisualizer(Visualizer): def __init__(self, ax=None, **kwargs): super(MyVisualizer, self).__init__(ax, **kwargs) def fit(self, X, y=None): self.draw(X) return self def draw(self, X): if self.ax is None: self.ax = self.gca() self.ax.plt(X) def finalize(self): self.set_title("My Visualizer") The Yellowbrick API
  • 20. Oneliners from sklearn.linear_model import Lasso from yellowbrick.regressor import ResidualsPlot # Option 1: scikit-learn style viz = ResidualsPlot(Lasso()) viz.fit(X_train, y_train) viz.score(X_test, y_test) viz.poof() from sklearn.linear_model import Lasso from yellowbrick.regressor import residuals_plot # Option 2: Quick Method viz = residuals_plot( Lasso(), X_train, y_train, X_test, y_test ) ��
  • 22. =========================================== test session starts ============================================ platform darwin -- Python 3.7.1, pytest-5.0.0, py-1.8.0, pluggy-0.12.0 rootdir: /Users/rbilbro/pyjects/yb, inifile: setup.cfg plugins: flakes-4.0.0, cov-2.7.1 collected 932 items tests/__init__.py s... [ 0%] tests/base.py s [ 0%] tests/conftest.py s [ 0%] tests/fixtures.py s [ 0%] tests/images.py s [ 0%] tests/rand.py s [ 0%] tests/test_base.py s............ [ 2%] ........................................................................................................... ........................................................................................................... ........................................................................................................... ........................................................................................................... tests/test_utils/test_target.py s............ [ 68%] tests/test_utils/test_timer.py s..... [ 68%] tests/test_utils/test_types.py s.................................................................... [ 70%] ....x................................x.............................................................. [ 72%] .... [ 73%] tests/test_utils/test_wrapper.py s.... ===================== 854 passed, 72 skipped, 6 xfailed, 33 warnings in 225.96 seconds =====================
  • 24. Brushing and Filtering Ok for only 5 features Not good for 23 features
  • 26. Parallelization with joblib Elbow Curve Validation Curve
  • 27. from yellowbrick.features import Rank2D from yellowbrick.pipeline import VisualPipeline from yellowbrick.model_selection import CVScores from yellowbrick.regressor import PredictionError viz_pipe = VisualPipeline([ ('rank2d', Rank2D(features=features, algorithm='covariance')), ('prederr', PredictionError(model)), ('cvscores', CVScores(model, cv=cv, scoring='r2')) ]) Visual Pipelines
  • 28. Models are aggregations, so are visualizations...
  • 29. …so use visualizations to steer model selection!