This meetup was recorded in Mountain View on November 12, 2019.
Recording of the presentation can be viewed here: https://www.youtube.com/watch?v=aRKZTVnyfPM&list=PLNtMya54qvOErCPus07wKDqHlTyMjgTbX&index=2&t=0s
Description:
H2O Driverless AI is H2O.ai's flagship platform for automatic machine learning. It fully automates the data science workflow, including some of the most challenging tasks in applied data science such as feature engineering, model tuning, model optimization, and model deployment. Driverless AI turns Kaggle Grandmaster recipes into a fully functioning platform that delivers "an expert data scientist in a box" from training to deployment. Driverless AI empowers data scientists to work on projects faster, using automation and state-of-the-art computing power from GPUs to accomplish in minutes tasks that used to take months.
We're excited to have recently added the ability for users, partners, and customers to extend the platform with Bring-Your-Own-Recipe. Domain experts and advanced data scientists can now write their own recipes and seamlessly extend Driverless AI with their favorite tools from the rich ecosystem of open-source data science and machine learning libraries.
----------------------------------------------------------------------------
Ana's Bio:
Ana is a Data Science Evangelist for H2O.ai. Before H2O.ai, she worked as an Evangelist for Hortonworks (Cloudera). She holds a B.S. in Electrical Engineering and is currently pursuing a Master's degree in Statistics with a concentration in Machine Learning at San Jose State University. When not at H2O.ai or school, she can be found in Fresno working with farmers to identify ML solutions for their agricultural challenges.
Meetup: Custom Machine Learning Recipes: Ingredients for Success
1. Custom Machine Learning Recipes: Ingredients for Success
Get Started with Open Source Custom Recipes
Ana Castro
Ana.Castro@h2o.ai
Rafael Coss
Rafael@h2o.ai
@racoss
2.
• aquarium.h2o.ai
– H2O.ai’s cloud environment that provides access to various tools
– Recommended for training sessions, workshops, and tutorials
• Driverless AI Test Drive Setup Instructions
– https://h2oai.github.io/tutorials/getting-started-with-driverless-ai-test-drive/#0
H2O Aquarium
3.
• Automatic Machine Learning Workflow
• Extending Automatic Machine Learning … Open?
• What are custom recipes?
• Tutorial: Using custom recipes
Custom Machine Learning Recipes
4.
Company
Founded in Silicon Valley in 2012
Funding: $147M | Series D
Investors: Goldman Sachs, Ping An, Wells Fargo, NVIDIA, Nexus Ventures
Products
H2O Open Source Machine Learning
H2O Driverless AI: Automatic Machine Learning
Community
20,000 companies using open source
160,000-strong meetup community
Team
185 AI experts (expert data scientists, 13 Kaggle Grandmasters, distributed computing, visualization)
Global
Mountain View, NYC, London, Paris, Ottawa, Prague, Chennai, Singapore
H2O.ai Snapshot
5.
[Figure: Driverless AI workflow stages: Data Integration, Data Quality and Transformation, Modeling Table (Features, Target), Model Building, Model]
Automates Data Science and ML Workflows
6.
ML Solves Business-Critical Problems Across Industries
Save Time. Save Money. Gain a Competitive Edge.
Financial Services
Wholesale / Commercial Banking
• Know Your Customer (KYC)
• Anti-Money Laundering (AML)
Card / Payments Business
• Transaction fraud
• Collusion fraud
• Real-time targeting
• Credit risk scoring
• In-context promotion
Retail Banking
• Deposit fraud
• Customer churn prediction
• Auto loans
Healthcare and Life Science
• Early cancer detection
• Product recommendations
• Personalized prescription matching
• Medical claim fraud detection
• Flu season prediction
• Drug discovery
• ER and hospital management
• Remote patient monitoring
• Medical test predictions
Telecom
• Predictive maintenance
• Avoidable truck rolls
• Customer churn prediction
• Improved customer viewing experience
• Master data management
• In-context promotions
• Intelligent ad placements
• Personalized program recommendations
Marketing and Retail
• Funnel predictions
• Personalized ads
• Credit scoring
• Fraud detection
• Next best offer
• Next best action
• Customer segmentation
• Customer churn
• Customer recommendations
• Ad predictions and fraud
7.
Key Capabilities of H2O Driverless AI
• Automatic Feature Engineering
• Automatic Visualization
• Machine Learning Interpretability (MLI)
• Automatic Scoring Pipelines
• Natural Language Processing
• Time Series Forecasting
• Flexibility of Data & Deployment
• NVIDIA GPU Acceleration
• Bring-Your-Own Recipes
9.
Make Your Own AI
[Figure: Automatic Model Optimization. Model Recipes (i.i.d. data, Time-series, NLP); Advanced Feature Engineering + Algorithm + Model Tuning; Survival of the Fittest]
New Capabilities
Challenge
• Customize for a domain use case
– Need additional algorithms, feature engineering, or optimization for a custom scorer
• Leverage their company's IP (secret sauce)
• AI is a fast-innovation space and cannot wait for vendor updates
10.
Make Your Own AI via Bring Your Own Recipe Capability
[Figure: Automatic Model Optimization. Model Recipes (i.i.d. data, Time-series, NLP); Advanced Feature Engineering + Algorithm + Model Tuning; Survival of the Fittest]
New Capabilities
Challenge
• Customize for a domain use case
– Need additional algorithms, feature engineering, or optimization for a custom scorer
• Leverage their company's IP (secret sauce)
• AI is a fast-innovation space and cannot wait for vendor updates
Solution
• Modular and extensible AutoML optimization
• App Store for AI
– Open source catalog of recipes (100+)
– Leverage company AI IP
• Integrate the latest machine learning techniques
[Figure: recipe types: Transformations, Algorithms, Scorers]
11.
Make Your Own AI via Bring Your Own Recipe Capability
[Figure: Data; Automatic Model Optimization (Model Recipes for i.i.d. data, Time-series, NLP; Advanced Feature Engineering + Algorithm + Model Tuning); New Capabilities: Transformations, Algorithms, Scorers]
Bring Your Own
✔ Import from open source (100+)
✔ H2O company catalog/GitHub
✔ Develop and upload new recipe
• Modular and extensible AutoML optimization
• App Store for AI
– Open source catalog of recipes (100+)
– Leverage a company’s domain expertise
• Integrate the latest machine learning techniques
• Customize for a domain use case
• Import the latest algorithms and techniques without needing to upgrade the entire platform
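The "Bring Your Own" idea on this slide (adding recipes without upgrading the whole platform) is, architecturally, a plugin registry: the core looks recipes up by kind and name instead of hard-coding them. The sketch below is a hypothetical, dependency-free illustration of that pattern only; `RECIPES`, `register_recipe`, `score`, and the recipe name are invented for this example and are not Driverless AI's API.

```python
from typing import Callable, Dict

# Hypothetical registry: recipe kind -> {recipe name -> recipe callable}.
# Not Driverless AI's API; this only illustrates the plugin pattern.
RECIPES: Dict[str, Dict[str, Callable]] = {
    "transformer": {},
    "model": {},
    "scorer": {},
}


def register_recipe(kind: str, name: str):
    """Decorator that adds a recipe to the registry under the given kind."""
    if kind not in RECIPES:
        raise ValueError(f"unknown recipe kind: {kind}")

    def decorator(func: Callable) -> Callable:
        RECIPES[kind][name] = func
        return func

    return decorator


# A new scorer "uploaded" by a user; the core function below does not change.
@register_recipe("scorer", "mean_absolute_error")
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def score(scorer_name: str, actual, predicted) -> float:
    """Core code: finds the scorer by name at run time instead of hard-coding it."""
    return RECIPES["scorer"][scorer_name](actual, predicted)


print(score("mean_absolute_error", [1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # prints 0.5
```

The design point is that `score` never changes when a new scorer is registered, which mirrors the slide's claim that new techniques can be imported without a platform upgrade.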
12.
H2O Driverless AI: How It Works
1. Drag and Drop Data: bring data in from cloud, big data, and desktop systems (SQL, Local, Amazon S3, HDFS, Google BigQuery, Azure Blob Storage, Snowflake)
2. Automatic Visualization: understand the data shape, outliers, missing values, etc.
3. Automatic Model Optimization: use best-practice model recipes and the power of high-performance computing to iterate across thousands of possible models, including advanced feature engineering and parameter tuning
– Model Recipes: i.i.d. data, Time-series, more on the way
– Advanced Feature Engineering + Algorithm + Model Tuning
– Survival of the Fittest
4. Extensible and Open Recipes: Transformations, Algorithms, Scorers
5. Automatic Scoring Pipelines: deploy ultra-low-latency Python or Java Automatic Scoring Pipelines that include feature transformations and models
• Machine Learning Interpretability
• Model Documentation
• Deploy Low-latency Scoring to Production
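The deck does not describe how the "Survival of the Fittest" search works inside Driverless AI. As a loose, hypothetical illustration of the idea only, the toy below evolves a population of candidate settings: each generation keeps the best-scoring half and refills the population with randomly mutated copies of the survivors. The objective `toy_validation_score`, the parameter names `learning_rate` and `max_depth`, and `evolve` are all made up for this sketch; in a real search the score would come from validating a trained model with the chosen scorer.

```python
import random
from typing import Dict, List

Candidate = Dict[str, float]


def toy_validation_score(c: Candidate) -> float:
    """Made-up objective (higher is better) standing in for a validation score.

    It peaks at learning_rate=0.1 and max_depth=6.
    """
    return -((c["learning_rate"] - 0.1) ** 2) * 100 - (c["max_depth"] - 6) ** 2 / 10


def mutate(c: Candidate, rng: random.Random) -> Candidate:
    """Return a randomly perturbed copy of a candidate, clipped to valid ranges."""
    lr = c["learning_rate"] * rng.uniform(0.5, 1.5)
    depth = round(c["max_depth"] + rng.choice([-1, 0, 1]))
    return {
        "learning_rate": min(1.0, max(0.001, lr)),
        "max_depth": float(min(12, max(1, depth))),
    }


def evolve(generations: int = 20, population_size: int = 8, seed: int = 0) -> List[Candidate]:
    """Toy genetic search; returns the final population sorted best-first."""
    rng = random.Random(seed)
    population = [
        {"learning_rate": rng.uniform(0.001, 1.0), "max_depth": float(rng.randint(1, 12))}
        for _ in range(population_size)
    ]
    for _ in range(generations):
        # Survival of the fittest: rank by score and keep the top half.
        population.sort(key=toy_validation_score, reverse=True)
        survivors = population[: population_size // 2]
        # Refill the population with mutated copies of randomly chosen survivors.
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    population.sort(key=toy_validation_score, reverse=True)
    return population


best = evolve()[0]
print(best, toy_validation_score(best))
```

Because the survivors are carried over unchanged, the best score can never get worse from one generation to the next; the random mutations are what let the search explore new settings.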
13.
• Machine learning pipelines model prepared data to solve a business question
– Transformations are applied to the original data to make it clean and as predictive as possible
– Additional datasets may be brought in to add insights
– The data is modeled using an algorithm to find the optimal rules to solve the problem
– We determine the best model by using a specific metric, or scorer
• BYOR stands for Bring Your Own Recipe; it allows domain scientists to solve their problems faster and with more precision by adding their expertise in the form of Python code snippets
• By providing your own custom recipes, you can gain control over the optimization choices that Driverless AI makes to best solve your machine learning problems
What is a Recipe…
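To make the pipeline bullets above concrete, here is a minimal, dependency-free Python sketch of where each kind of recipe plugs in: a transformation cleans the raw data, several candidate models are fit, and a scorer decides which one wins. Every function and data value here is hypothetical and deliberately trivial; none of it is Driverless AI code.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence


def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Transformation: replace missing values (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def threshold_model(threshold: float) -> Callable[[float], int]:
    """Algorithm: a one-rule classifier that predicts 1 when x exceeds a threshold."""
    return lambda x: 1 if x > threshold else 0


def accuracy(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Scorer: fraction of predictions that match the labels (higher is better)."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical raw feature with a missing value, and binary labels (1 = churned).
raw_x = [1.0, 2.0, None, 8.0, 9.0]
y = [0, 0, 0, 1, 1]

x = impute_mean(raw_x)  # observed mean is 5.0, so x = [1.0, 2.0, 5.0, 8.0, 9.0]
candidates = {t: threshold_model(t) for t in (1.5, 4.0, 6.0)}
scores = {t: accuracy(y, [m(v) for v in x]) for t, m in candidates.items()}
best_threshold = max(scores, key=scores.get)
print(best_threshold, scores[best_threshold])  # prints 6.0 1.0
```

Swapping in a different transformation, model, or scorer changes one function without touching the rest of the flow, which is the role custom recipes play.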
17.
• aquarium.h2o.ai
– H2O.ai’s cloud environment that provides access to various tools
– Recommended for training sessions, workshops, and tutorials
• Driverless AI Test Drive
– https://h2oai.github.io/tutorials/getting-started-with-driverless-ai-test-drive/#0
• Your data is deleted after 2 hours
– You can run the lab as many times as needed
H2O Aquarium
18.
About the dataset:
– Kaggle’s Telco customer churn dataset: https://www.kaggle.com/becksddf/churn-in-telecoms-dataset
Add the data:
– /data/Splunk/churn
Launch a base experiment:
– Predict: Customer Churn
Launch a Base Experiment
19.
1. Experiments -> Exp1. Baseline -> New Model Same Parameters
2. Expert Settings -> Official Recipes External
3. Branch rel-1.8.0 -> Transformers -> Numeric -> sum.py
4. Recipes -> Include Specific Transformer -> Select Values
5. Verify Transformer -> Launch Experiment
Custom Transformer
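The transformer selected above, `sum.py`, sits under Transformers -> Numeric in the recipes repository and, as its name suggests, builds a new feature by adding numeric columns together. The real recipe is written against Driverless AI's own transformer base class and data frames, which are not available outside the product. The sketch below is a hypothetical, dependency-free stand-in that mirrors a fit/transform shape using plain Python rows; `SumTransformerSketch` and its `min_cols` parameter are illustrative names only.

```python
from typing import List, Sequence


class SumTransformerSketch:
    """Hypothetical stand-in for a 'sum' transformer: new feature = row-wise sum.

    Mirrors a fit/transform shape; this is not Driverless AI's API.
    """

    def __init__(self, min_cols: int = 2):
        # Summing fewer than two columns would just copy an existing feature.
        self.min_cols = min_cols

    def fit_transform(self, rows: Sequence[Sequence[float]], y=None) -> List[float]:
        # Summing needs no learned state, so fitting is a no-op.
        return self.transform(rows)

    def transform(self, rows: Sequence[Sequence[float]]) -> List[float]:
        if rows and len(rows[0]) < self.min_cols:
            raise ValueError(f"need at least {self.min_cols} numeric columns")
        return [float(sum(row)) for row in rows]


# Two numeric columns, e.g. day and evening call minutes (illustrative values).
features = [[120.5, 80.0], [200.0, 10.5], [0.0, 0.0]]
print(SumTransformerSketch().fit_transform(features))  # prints [200.5, 210.5, 0.0]
```

Because the transformer is stateless, `fit_transform` simply delegates to `transform`; a transformer that learns something from training data (for example, a target encoding) would store that state during fitting instead.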
20.
1. Experiments -> Exp1. Baseline -> New Model Same Parameters
2. Expert Settings -> Official Recipes External
3. Branch rel-1.8.0 -> Scorers -> Classification -> binary -> brier_loss.py
4. Recipes -> Include Scorer -> Select Values
5. Scorer -> Select Brier -> Launch Experiment
Learn more: https://en.wikipedia.org/wiki/Brier_score
Custom Scorer
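For a binary target, the Brier score linked above is the mean squared difference between the predicted probability of the positive class and the actual 0/1 outcome, so lower is better. A minimal Python version of that formula follows; it is a sketch of the metric itself, not the code inside `brier_loss.py`, and the labels and probabilities are illustrative.

```python
from typing import Sequence


def brier_score(actual: Sequence[int], prob_positive: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    if not actual or len(actual) != len(prob_positive):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for a, p in zip(actual, prob_positive)) / len(actual)


# Churn labels and predicted churn probabilities (illustrative values).
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.1]
print(brier_score(labels, probs))
```

Here the squared errors are 0.01, 0.04, 0.16, and 0.01, so the score is 0.22 / 4 = 0.055. Unlike accuracy, the Brier score rewards well-calibrated probabilities, not just correct 0/1 decisions.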
21.
1. Experiments -> Exp1. Baseline -> New Model Same Parameters
2. Expert Settings -> Official Recipes External
3. Branch rel-1.8.0 -> Models -> algorithms -> extra_trees.py -> RAW
4. Recipes -> Include Model -> Select Values
5. Scorer -> ExtraTrees -> Launch Experiment
Learn more: https://scikit-learn.org/
Custom Model
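Extra Trees (extremely randomized trees, available in scikit-learn as `ExtraTreesClassifier` and `ExtraTreesRegressor`) differ from a random forest mainly in how splits are chosen: candidate thresholds are drawn at random instead of being searched exhaustively. To show that idea without any dependencies, the sketch below averages many depth-1 trees (stumps) whose feature and threshold are both picked at random, which is the most randomized setting with a single random candidate per split. `RandomStumpEnsemble` is a hypothetical teaching toy, not the contents of `extra_trees.py`, and the data is illustrative.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


class RandomStumpEnsemble:
    """Toy extremely randomized ensemble of depth-1 trees for binary classification.

    Each stump picks one feature and one threshold uniformly at random (a single
    random candidate split), then predicts the positive rate on each side.
    """

    def __init__(self, n_stumps: int = 50, seed: int = 0):
        self.n_stumps = n_stumps
        self.rng = random.Random(seed)
        # Each stump: (feature index, threshold, left positive rate, right positive rate).
        self.stumps: List[Tuple[int, float, float, float]] = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "RandomStumpEnsemble":
        overall = mean(y)
        self.stumps = []
        for _ in range(self.n_stumps):
            j = self.rng.randrange(len(X[0]))
            column = [row[j] for row in X]
            t = self.rng.uniform(min(column), max(column))
            left = [label for value, label in zip(column, y) if value <= t]
            right = [label for value, label in zip(column, y) if value > t]
            # Fall back to the overall positive rate if a side is empty.
            left_rate = mean(left) if left else overall
            right_rate = mean(right) if right else overall
            self.stumps.append((j, t, left_rate, right_rate))
        return self

    def predict_proba(self, X: Sequence[Sequence[float]]) -> List[float]:
        """Average the stumps' positive-class rates for each row."""
        return [
            mean(lr if row[j] <= t else rr for j, t, lr, rr in self.stumps)
            for row in X
        ]


# One feature (e.g. number of customer service calls); 1 = churned (illustrative data).
X = [[0.0], [1.0], [2.0], [3.0], [6.0], [7.0], [8.0], [9.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = RandomStumpEnsemble(n_stumps=100, seed=42).fit(X, y)
print(model.predict_proba([[1.0], [8.0]]))
```

Each individual stump is a weak, noisy rule, but averaging many of them yields a higher churn probability for the high-value row than for the low-value one. Real Extra Trees grow full-depth trees and compare several random candidate splits per node.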