Custom Machine Learning Recipes: Ingredients for Success
Get Started with Open Source Custom Recipes
Ana Castro
Ana.Castro@h2o.ai
Rafael Coss
Rafael@h2o.ai
@racoss
2
• aquarium.h2o.ai
– H2O.ai’s cloud environment that provides access to various tools
– Recommended for training, workshops, and tutorials
• Driverless AI Test Drive Setup Instructions
– https://h2oai.github.io/tutorials/getting-started-with-driverless-ai-test-drive/#0
H2O Aquarium
3
• Automatic Machine Learning Workflow
• Extending Automatic Machine Learning … Open?
• What are custom recipes?
• Tutorial: Using custom recipes
Custom Machine Learning Recipes
4
Company
Founded in Silicon Valley in 2012
Funding: $147M | Series D
Investors: Goldman Sachs, Ping An, Wells Fargo, NVIDIA, Nexus Ventures
Products
H2O Open Source Machine Learning
H2O Driverless AI: Automatic Machine Learning
Community
20,000 companies using open source
160,000 strong meetup community
Team
185 AI experts (Expert data scientists, 13 Kaggle Grandmasters, Distributed Computing, Visualization)
Global
Mountain View, NYC, London, Paris, Ottawa, Prague, Chennai, Singapore
H2O.ai Snapshot
5
Driverless AI
Features Target
Data Quality and Transformation
Modeling Table
Model Building
Model
Data Integration
+
Automates Data Science and ML Workflows
6
ML Solves Business Critical Problems Across Industries
Save Time. Save Money. Gain a Competitive Edge.
Financial Services
Wholesale / Commercial Banking
• Know Your Customers (KYC)
• Anti-Money Laundering (AML)
Card / Payments Business
• Transaction frauds
• Collusion fraud
• Real-time targeting
• Credit risk scoring
• In-context promotion
Retail Banking
• Deposit fraud
• Customer churn prediction
• Auto-loan
Healthcare and Life Science
• Early cancer detection
• Product recommendations
• Personalized prescription matching
• Medical claim fraud detection
• Flu season prediction
• Drug discovery
• ER and hospital management
• Remote patient monitoring
• Medical test predictions
Telecom
• Predictive maintenance
• Avoidable truck-rolls
• Customer churn prediction
• Improved customer viewing experience
• Master data management
• In-context promotions
• Intelligent ad placements
• Personalized program recommendations
Marketing and Retail
• Funnel predictions
• Personalized ads
• Credit scoring
• Fraud detection
• Next best offer
• Next best action
• Customer segmentation
• Customer churn
• Customer recommendations
• Ad predictions and fraud
7
Key Capabilities of H2O Driverless AI
• Automatic Feature Engineering
• Automatic Visualization
• Machine Learning Interpretability (MLI)
• Automatic Scoring Pipelines
• Natural Language Processing
• Time Series Forecasting
• Flexibility of Data & Deployment
• NVIDIA GPU Acceleration
• Bring-Your-Own Recipes
Confidential
Democratize AI
Make every company an AI company.
8
9
Automatic Model Optimization
Make Your Own AI
Model Recipes
• i.i.d. data
• Time-series
• NLP
Advanced Feature Engineering + Algorithm + Model Tuning
Survival of the Fittest
New Capabilities
Challenge
• Customize for domain use case
– Need additional algorithms, feature engineering, or optimization for a custom scorer
• Leverage their company IP (secret sauce)
• AI is a fast-moving innovation space, and teams cannot wait for vendor updates
10
Automatic Model Optimization
Make Your Own AI
via Bring Your Own Recipe Capability
Model Recipes
• i.i.d. data
• Time-series
• NLP
Advanced Feature Engineering + Algorithm + Model Tuning
Survival of the Fittest
New Capabilities
Challenge
• Customize for domain use case
– Need additional algorithms, feature engineering, or optimization for a custom scorer
• Leverage their company IP (secret sauce)
• AI is a fast-moving innovation space, and teams cannot wait for vendor updates
Solution
• Modular and extensible auto ML optimization
• App Store for AI
– Open source catalog of recipes (100+)
– Leverage company AI IP
• Integrate latest Machine Learning techniques
Transformations
...
Algorithms
...
Scorers
...
11
Make Your Own AI
via Bring Your Own Recipe Capability
Scorers
Algorithms
Transformations
New Capabilities
Data
Automatic Model Optimization
Model Recipes
• i.i.d. data
• Time-series
• NLP
Advanced Feature Engineering + Algorithm + Model Tuning
Bring Your Own
✔ Import from open source (100+)
✔ H2O company catalog/Github
✔ Develop and upload new recipe
• Modular and extensible autoML optimization
• App Store for AI
– Open source catalog of recipes (100+)
– Leverage a company’s domain expertise
• Integrate latest Machine Learning techniques
• Customize for domain use case
• Import latest algorithms, techniques without needing to upgrade entire platform.
12
H2O Driverless AI: How It Works
SQL
Local
Amazon S3
HDFS
X Y
Automatic Model Optimization
Automatic
Scoring Pipeline
Machine learning
Interpretability
Deploy
Low-latency
Scoring to
Production
Modelling
Dataset
Model Recipes
• i.i.d. data
• Time-series
• More on the way
Advanced Feature Engineering + Algorithm + Model Tuning
Survival of the Fittest
1 Drag and Drop Data
Bring data in from cloud, big data, and desktop systems
Google BigQuery
Azure Blob Storage
Snowflake
2 Automatic Visualization
Understand the data shape, outliers, missing values, etc.
3 Automatic Model Optimization
Use best-practice model recipes and the power of high-performance computing to iterate across thousands of possible models, including advanced feature engineering and parameter tuning
4 Extensible and Open Recipes
Transformations
...
Algorithms
...
Scorers
...
5 Automatic Scoring Pipelines
Deploy ultra-low-latency Python or Java Automatic Scoring Pipelines that include feature transformations and models
Model Documentation
13
• Machine learning pipelines model prepped data to answer a business question
– Transformations are applied to the original data so that it is clean and as predictive as possible
– Additional datasets may be brought in to add insights
– The data is modeled using an algorithm to find the optimal rules to solve the problem
– The best model is chosen using a specific metric, or scorer
• BYOR stands for Bring Your Own Recipe. It lets domain scientists solve their problems faster and with more precision by adding their expertise in the form of Python code snippets
• By providing your own custom recipes, you gain control over the optimization choices that Driverless AI makes to best solve your machine learning problems
What is a Recipe…
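The three ingredients named on this slide (transformation, model, scorer) can be sketched in plain Python. This is only an illustrative toy, not the Driverless AI recipe API: the data, the function names, and the threshold "model" are all hypothetical and chosen to keep the example self-contained.

```python
from statistics import mean

# Hypothetical toy data: two numeric columns per row and a 0/1 target.
ROWS = [(1.0, 2.0), (2.0, 1.0), (6.0, 5.0), (7.0, 8.0)]
TARGET = [0, 0, 1, 1]


def sum_transformer(row):
    """Transformation: derive one engineered feature from the original columns."""
    return sum(row)


def make_threshold_model(threshold):
    """Model: predict class 1 when the engineered feature exceeds the threshold."""
    def predict(feature):
        return 1 if feature > threshold else 0
    return predict


def accuracy_scorer(actual, predicted):
    """Scorer: fraction of rows predicted correctly (higher is better)."""
    return mean(1.0 if a == p else 0.0 for a, p in zip(actual, predicted))


def run_pipeline(rows, target, thresholds):
    """Try each candidate model and keep the one the scorer ranks best."""
    features = [sum_transformer(row) for row in rows]
    best = None
    for threshold in thresholds:
        model = make_threshold_model(threshold)
        score = accuracy_scorer(target, [model(f) for f in features])
        if best is None or score > best[1]:
            best = (threshold, score)
    return best


best_threshold, best_score = run_pipeline(ROWS, TARGET, thresholds=[2.0, 5.0, 12.0])
print(best_threshold, best_score)
```

Swapping any one of the three functions changes what the search optimizes, which is the kind of control a custom recipe is meant to provide.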
14
https://github.com/h2oai/driverlessai-recipes
FAQ / Architecture Diagram etc.
16
• Automatic Machine Learning Workflow
• Extending Automatic Machine Learning … Open?
• What are custom recipes?
• Tutorial: Using custom recipes
– Transformer
– Scorer
– Model
Custom
Machine
Learning
Recipes
17
https://h2oai.github.io/tutorials/
https://h2oai.github.io/tutorials/get-started-with-open-source-custom-recipes-tutorial
18
• aquarium.h2o.ai
– H2O.ai’s cloud environment that provides access to various tools
– Recommended for training, workshops, and tutorials
• Driverless AI Test Drive
– https://h2oai.github.io/tutorials/getting-started-with-driverless-ai-test-drive/#0
• Your data will disappear after 2 hours
– Run as many times as needed
H2O Aquarium
19
About the dataset:
– Kaggle’s customer churn Telco dataset:
https://www.kaggle.com/becksddf/churn-in-telecoms-dataset
Add the data:
– /data/Splunk/churn
Launch base experiment
– Predict: Customer Churn
Launch a base Experiment
20
1. Experiments -> Exp1. Baseline -> New Model Same Parameters
2. Expert Settings -> Official Recipes External
3. Branch rel-1.8.0 -> Transformers -> Numeric -> sum.py
4. Recipes -> Include Specific Transformer -> Select Values
5. Verify Transformer -> Launch Experiment
Custom Transformer
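For intuition about what a numeric "sum" transformer does, here is a minimal standalone sketch. It assumes the transformer produces one new feature per row by adding that row's numeric columns. The class name and the `fit_transform` / `transform` method names are illustrative stand-ins and not the actual Driverless AI base class, so consult sum.py in the recipes repository for the real implementation.

```python
class SumTransformerSketch:
    """Hypothetical stand-in for a numeric transformer recipe (not the Driverless AI API)."""

    def fit_transform(self, X, y=None):
        # A plain sum has nothing to learn, so fitting simply delegates to transform.
        return self.transform(X)

    def transform(self, X):
        # One engineered value per row: the sum of that row's numeric columns.
        return [sum(row) for row in X]


# Hypothetical numeric columns for two customers.
rows = [[1, 2, 3], [4.5, 0.5, 1.0]]
print(SumTransformerSketch().fit_transform(rows))
```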
21
1. Experiments -> Exp1. Baseline -> New Model Same Parameters
2. Expert Settings -> Official Recipes External
3. Branch rel-1.8.0 -> Scorers -> Classification -> binary -> brier_loss.py
4. Recipes -> Include Scorer -> Select Values
5. Scorer -> Select Brier -> Launch Experiment
Learn more: https://en.wikipedia.org/wiki/Brier_score
Custom Scorer
22
1. Experiments -> Exp1. Baseline -> New Model Same Parameters
2. Expert Settings -> Official Recipes External
3. Branch rel-1.8.0 -> Models -> algorithms -> extra_trees.py -> RAW
4. Recipes -> Include Model -> Select Values
5. Scorer -> ExtraTrees -> Launch Experiment
Learn more: https://scikit-learn.org/
Custom Model
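To give a feel for the idea behind extremely randomized trees, the sketch below builds an ensemble of depth-one trees whose split feature and threshold are drawn at random rather than optimized. It is a deliberate simplification written in plain Python. It is neither the scikit-learn implementation nor the Driverless AI model API, and the class name, parameters, and toy data are all hypothetical.

```python
import random


class RandomizedStumpEnsemble:
    """Toy binary classifier: depth-one trees with randomly drawn splits."""

    def __init__(self, n_estimators=50, seed=0):
        self.n_estimators = n_estimators
        self.seed = seed
        self.stumps = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_features = len(X[0])
        overall_rate = sum(y) / len(y)
        self.stumps = []
        for _ in range(self.n_estimators):
            j = rng.randrange(n_features)
            column = [row[j] for row in X]
            # The threshold is drawn uniformly over the feature's range, not searched.
            threshold = rng.uniform(min(column), max(column))
            left = [t for v, t in zip(column, y) if v <= threshold]
            right = [t for v, t in zip(column, y) if v > threshold]
            # Each leaf stores its share of positive rows; an empty leaf
            # falls back to the overall positive rate.
            left_rate = sum(left) / len(left) if left else overall_rate
            right_rate = sum(right) / len(right) if right else overall_rate
            self.stumps.append((j, threshold, left_rate, right_rate))
        return self

    def predict_proba(self, X):
        # Average the leaf rates of all stumps to get a probability of class 1.
        probabilities = []
        for row in X:
            votes = [lr if row[j] <= t else rr for j, t, lr, rr in self.stumps]
            probabilities.append(sum(votes) / len(votes))
        return probabilities


# Hypothetical toy data: the first column separates the classes, the second is noise.
X_train = [[0.0, 5.0], [1.0, 3.0], [9.0, 4.0], [10.0, 6.0]]
y_train = [0, 0, 1, 1]
model = RandomizedStumpEnsemble(n_estimators=50, seed=0).fit(X_train, y_train)
print(model.predict_proba(X_train))
```

Because no split is searched for, each tree is cheap to build and the ensemble relies on averaging many weak, randomized learners.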
23
• http://catalog.h2o.ai/
– https://github.com/h2oai/driverlessai-recipes
• https://h2oai.github.io/tutorials/
• https://h2oai-community.slack.com/
Resources
Thank You
25
Giving Back to the
Community
26
27
H2O AI and ML Meetups Around the World
28
Giving Back to the Community
Giving H2O talks at other meetups
29
We are hiring!
Find out more:
h2o.ai/careers

Meetup: Custom Machine Learning Recipes: Ingredients for Success
