SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
Data Scientist, H2O.ai
GitHub: mlandry22, Email: mark@h2o.ai
Mark Landry
An Analysis of Driverless AI Feature Creation
Driver vs Driverless
Overview
• Analyze two recent problems with an emphasis on feature
generation
• Overview of the problem
• Discussion of my approach
• Show the features Driverless created
• Feature creation
• Feature representation
• Results
Who am I?
• Data scientist @ H2O since 2015, user since 2014
• 15 top 20 finishes in Kaggle, highest rank 33
• R | H2O | data.table | GBM
• The “driver”
Problem #1
How Many Attempts will a Student Make
• Online question/answer platform with computer science
problems
• Predict the number of attempts a particular student will make on
a particular problem
• Data
• Student: level, ranking, highest ranking
• Problem: type, 3-tier level, points awarded
• Training: 124,000 attempt counts
• Testing: 60,000 attempt counts
Regression as Classification
• Natural problem is numerical
• End user prefers buckets
• Volume
• 1: 53%
• 2 31%
• 3 9%
• 4 4%
• 5 1%
• 6 2%
The Driver Approach
• Think of it like a recommender problem
• Standard: Matrix factorization, collaborative filtering
• GBM: use deep categorical encodings
• Frequent use of target encoding
interaction
interaction
interaction
The Driver Approach
A messy chain of hierarchical target encoding and if/else statements
The Driver Approach
h2o.gbm feature importance – primarily using three target encodings
The Driverless Approach
The Driverless Approach
The Driverless Approach
The Driverless Approach
Top 5 Features: divided into components
• {16} {CV TE} {problem_id} {0}
• {51} {CV TE} {points * problem_id} {0}
• {3} {max_rating}
• {32} {freq} {last_online_time_seconds * rating}
• {61} {NumToCatTE} {rating * max_rating * points *
last_online_time_seconds} {0}
The Driverless Approach
Feature #1: same base target encoding I found to be the best
• {16} {CV TE} {problem_id} {0}
• 16: indicator of base features – 16 is later used twice more in the top
15
• CV TE: Cross-Fold target encoding
• Problem_id: feature used as basis for target encoding
• 0: the target; multinomial, so the two other uses are for class 1 & 5
• {51} {CV TE} {points * problem_id} {0}
• {3} {max_rating}
• {32} {freq} {last_online_time_seconds * rating}
• {61} {NumToCatTE} {rating * max_rating * points *
last_online_time_seconds} {0}
The Driverless Approach
Feature #2: deeper interaction
• {16} {CV TE} {problem_id} {0}
• {51} {CV TE} {points * problem_id} {0}
• 51: base feature ID
• CV TE: out of sample target encoding result
• points * problem: interaction of two different features, both related to
the problem; it is subdividing the problem further
• 0: again, rate of class 0 as the target
• {3} {max_rating}
• {32} {freq} {last_online_time_seconds * rating}
• {61} {NumToCatTE} {rating * max_rating * points *
last_online_time_seconds} {0}
The Driverless Approach
Feature #3: no transformation
• {16} {CV TE} {problem_id} {0}
• {51} {CV TE} {points * problem_id} {0}
• {3} {max_rating}
• max rating
• used as is – no alternate encoding; was first natural feature in my model as well
• this is the first variable of the student dimension
• a 4-digit number with close to a normal distribution
• {32} {freq} {last_online_time_seconds * rating}
• {61} {NumToCatTE} {rating * max_rating * points *
last_online_time_seconds} {0}
The Driverless Approach
Feature #4: frequency encoding
• {16} {CV TE} {problem_id} {0}
• {51} {CV TE} {points * problem_id} {0}
• {3} {max_rating}
• {32} {freq} {last_online_time_seconds * rating}
• Counting the occurrences of two fields
• Last online & rating are both numerics in the student dimension
• {61} {NumToCatTE} {rating * max_rating * points *
last_online_time_seconds} {0}
The Driverless Approach
Feature 5: four-way interaction w/ target encoding
• {16} {CV TE} {problem_id} {0}
• {51} {CV TE} {points * problem_id} {0}
• {3} {max_rating}
• {32} {freq} {last_online_time_seconds * rating}
• {61} {NumToCatTE} {rating * max_rating * points *
last_online_time_seconds} {0}
• Target encoding of class 0
• Finding the rate for each value of the result of a four way interaction
The Driverless Approach
Top 5 Features: did I try?
• YES {16} {CV TE} {problem_id} {0}
• NO {51} {CV TE} {points * problem_id} {0}
• YES {3} {max_rating}
• NO {32} {freq} {last_online_time_seconds * rating}
• NO {61} {NumToCatTE} {rating * max_rating * points *
last_online_time_seconds} {0}
The Driverless Approach
Problem #2
Bank Customer Churn
• Identify customers likely to churn balances in the next quarter
by 50%
• Data
• 300,00 training rows; 200,000 testing rows
• 377 columns
• Customer: age, gender, demographics
• Reported assets, liabilities
• Monthly balance history
The Driver Approach
Exploit before Explore
• 377 columns made [quick] manual investigation harder
• Rather than iterate: {analyze > model > analyze > … },
I changed to {model > analyze > model > … }
lagged features
The Driver Approach
After lagging, try differences and ratios
• Lagging features present the balance features at
several time steps.
• But often, the interesting part is not the raw balance
itself, but whether it is growing or shrinking
• Decision trees have a hard time “seeing” this so it is
wise to engineer mathematical features: + - * /
The Driver Approach
After lagging, try differences and ratios
• I used the leading monthly feature from the model and
created new features representing month-over-month
differences and a binary indicator
• One field, one specific length (1 month), two calculations
The Driverless Approach
The Driverless Approach
The Driverless Approach
It knows math!
subtraction
subtraction
subtraction
The Driverless Approach
Top 10 Features: divided into categories
• (3) Subtraction: #1, #6, #8
• (1) Truncated SVD components: #2
• (2) Cluster Distances: #3, #9
• (1) Target encoding: #4
• (3) Direct features: #5, #7, #10
The Driverless Approach
Lagged balances also used in clusters & SVD
• Distance to cluster #1 after segmenting columns into 6 clusters
• BAL_prev6
• D_prev1
• D_prev2
• I_AQB_PrevQ1
• Component #1 of truncated SVD of
• D_prev1
• D_prev2
• EOP_prev1_1
The Driverless Approach
Final Analysis
• On first iteration, Driverless AI had surpassed my manual
modeling
• Features were well beyond what I would have ever attempted
• Accuracy was stable: Driverless AI self-reported scores within
1% of competition submission
The Driverless Approach
Competition Results
Kaggle Grandmaster
Kaggle Grandmaster
Also Driverless AI
100% Driverless AI
The End
Thank You

Contenu connexe

Tendances

Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsTuri, Inc.
 
Modern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and PracticesModern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and PracticesWill Gardella
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsJustin Basilico
 
MLconf seattle 2015 presentation
MLconf seattle 2015 presentationMLconf seattle 2015 presentation
MLconf seattle 2015 presentationehtshamelahi
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...Vishal Chowdhary
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLPaco Nathan
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In ProductionSamir Bessalah
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NETDev Raj Gautam
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedLaurenz Wuttke
 
Catch Me If You Can: Keeping Up With ML Models in Production
Catch Me If You Can: Keeping Up With ML Models in ProductionCatch Me If You Can: Keeping Up With ML Models in Production
Catch Me If You Can: Keeping Up With ML Models in ProductionDatabricks
 
DutchMLSchool. ML Automation
DutchMLSchool. ML AutomationDutchMLSchool. ML Automation
DutchMLSchool. ML AutomationBigML, Inc
 
Agile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender SystemsAgile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender SystemsJohann Schleier-Smith
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...Bill Liu
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureIvo Andreev
 
Workshop: Your first machine learning project
Workshop: Your first machine learning projectWorkshop: Your first machine learning project
Workshop: Your first machine learning projectAlex Austin
 

Tendances (20)

Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
 
Modern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and PracticesModern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and Practices
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
 
MLconf seattle 2015 presentation
MLconf seattle 2015 presentationMLconf seattle 2015 presentation
MLconf seattle 2015 presentation
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons Learned
 
Catch Me If You Can: Keeping Up With ML Models in Production
Catch Me If You Can: Keeping Up With ML Models in ProductionCatch Me If You Can: Keeping Up With ML Models in Production
Catch Me If You Can: Keeping Up With ML Models in Production
 
Deploying ml
Deploying mlDeploying ml
Deploying ml
 
DutchMLSchool. ML Automation
DutchMLSchool. ML AutomationDutchMLSchool. ML Automation
DutchMLSchool. ML Automation
 
Agile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender SystemsAgile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender Systems
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with Azure
 
Agile data visualisation
Agile data visualisationAgile data visualisation
Agile data visualisation
 
Workshop: Your first machine learning project
Workshop: Your first machine learning projectWorkshop: Your first machine learning project
Workshop: Your first machine learning project
 

Similaire à Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product Manager, H2O.ai

From science to engineering, the process to build a machine learning product
From science to engineering, the process to build a machine learning productFrom science to engineering, the process to build a machine learning product
From science to engineering, the process to build a machine learning productBruce Kuo
 
[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)Kunwoo Park
 
Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem Dataiku
 
Just the Facets, Ma'am
Just the Facets, Ma'amJust the Facets, Ma'am
Just the Facets, Ma'amTeamstudio
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...Institute of Contemporary Sciences
 
Test Cases - are they dead?
Test Cases - are they dead?Test Cases - are they dead?
Test Cases - are they dead?SQALab
 
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"Fwdays
 
Model-Driven Optimization: Generating Smart Mutation Operators for Multi-Obj...
 Model-Driven Optimization: Generating Smart Mutation Operators for Multi-Obj... Model-Driven Optimization: Generating Smart Mutation Operators for Multi-Obj...
Model-Driven Optimization: Generating Smart Mutation Operators for Multi-Obj...SEAA 2022
 
EKON 23 Code_review_checklist
EKON 23 Code_review_checklistEKON 23 Code_review_checklist
EKON 23 Code_review_checklistMax Kleiner
 
Model-based Testing: Taking BDD/ATDD to the Next Level
Model-based Testing: Taking BDD/ATDD to the Next LevelModel-based Testing: Taking BDD/ATDD to the Next Level
Model-based Testing: Taking BDD/ATDD to the Next LevelBob Binder
 
JUG Poznan - 2017.01.31
JUG Poznan - 2017.01.31 JUG Poznan - 2017.01.31
JUG Poznan - 2017.01.31 Omnilogy
 
Taking your machine learning workflow to the next level using Scikit-Learn Pi...
Taking your machine learning workflow to the next level using Scikit-Learn Pi...Taking your machine learning workflow to the next level using Scikit-Learn Pi...
Taking your machine learning workflow to the next level using Scikit-Learn Pi...Philip Goddard
 
Predictive Analytics based Regression Test Optimization
Predictive Analytics based Regression Test OptimizationPredictive Analytics based Regression Test Optimization
Predictive Analytics based Regression Test OptimizationSTePINForum
 
Database and application performance vivek sharma
Database and application performance vivek sharmaDatabase and application performance vivek sharma
Database and application performance vivek sharmaaioughydchapter
 
Technical debt management strategies
Technical debt management strategiesTechnical debt management strategies
Technical debt management strategiesRaquel Pau
 
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationPR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationSunghoon Joo
 
DevOps: Find Solutions, Not More Defects
DevOps: Find Solutions, Not More DefectsDevOps: Find Solutions, Not More Defects
DevOps: Find Solutions, Not More DefectsTechWell
 

Similaire à Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product Manager, H2O.ai (20)

From science to engineering, the process to build a machine learning product
From science to engineering, the process to build a machine learning productFrom science to engineering, the process to build a machine learning product
From science to engineering, the process to build a machine learning product
 
[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)
 
Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem
 
Before Kaggle
Before KaggleBefore Kaggle
Before Kaggle
 
Just the Facets, Ma'am
Just the Facets, Ma'amJust the Facets, Ma'am
Just the Facets, Ma'am
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...
 
Test Cases - are they dead?
Test Cases - are they dead?Test Cases - are they dead?
Test Cases - are they dead?
 
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
 
Model-Driven Optimization: Generating Smart Mutation Operators for Multi-Obj...
 Model-Driven Optimization: Generating Smart Mutation Operators for Multi-Obj... Model-Driven Optimization: Generating Smart Mutation Operators for Multi-Obj...
Model-Driven Optimization: Generating Smart Mutation Operators for Multi-Obj...
 
EKON 23 Code_review_checklist
EKON 23 Code_review_checklistEKON 23 Code_review_checklist
EKON 23 Code_review_checklist
 
Model-based Testing: Taking BDD/ATDD to the Next Level
Model-based Testing: Taking BDD/ATDD to the Next LevelModel-based Testing: Taking BDD/ATDD to the Next Level
Model-based Testing: Taking BDD/ATDD to the Next Level
 
JUG Poznan - 2017.01.31
JUG Poznan - 2017.01.31 JUG Poznan - 2017.01.31
JUG Poznan - 2017.01.31
 
Taking your machine learning workflow to the next level using Scikit-Learn Pi...
Taking your machine learning workflow to the next level using Scikit-Learn Pi...Taking your machine learning workflow to the next level using Scikit-Learn Pi...
Taking your machine learning workflow to the next level using Scikit-Learn Pi...
 
Predictive Analytics based Regression Test Optimization
Predictive Analytics based Regression Test OptimizationPredictive Analytics based Regression Test Optimization
Predictive Analytics based Regression Test Optimization
 
Database and application performance vivek sharma
Database and application performance vivek sharmaDatabase and application performance vivek sharma
Database and application performance vivek sharma
 
Technical debt management strategies
Technical debt management strategiesTechnical debt management strategies
Technical debt management strategies
 
MDE in Practice
MDE in PracticeMDE in Practice
MDE in Practice
 
OOP.pptx
OOP.pptxOOP.pptx
OOP.pptx
 
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationPR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
 
DevOps: Find Solutions, Not More Defects
DevOps: Find Solutions, Not More DefectsDevOps: Find Solutions, Not More Defects
DevOps: Find Solutions, Not More Defects
 

Plus de Sri Ambati

Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFSri Ambati
 

Plus de Sri Ambati (20)

Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
 

Dernier

Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sectoritnewsafrica
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 

Dernier (20)

Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 

Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product Manager, H2O.ai

  • 1.
  • 2. Data Scientist, H2O.ai GitHub: mlandry22, Email: mark@h2o.ai Mark Landry
  • 3. An Analysis of Driverless AI Feature Creation Driver vs Driverless
  • 4. Overview • Analyze two recent problems with an emphasis on feature generation • Overview of the problem • Discussion of my approach • Show the features Driverless created • Feature creation • Feature representation • Results
  • 5. Who am I? • Data scientist @ H2O since 2015, user since 2014 • 15 top 20 finishes in Kaggle, highest rank 33 • R | H2O | data.table | GBM • The “driver”
  • 6. Problem #1 How Many Attempts will a Student Make • Online question/answer platform with computer science problems • Predict the number of attempts a particular student will make on a particular problem • Data • Student: level, ranking, highest ranking • Problem: type, 3-tier level, points awarded • Training: 124,000 attempt counts • Testing: 60,000 attempt counts
  • 7. Regression as Classification • Natural problem is numerical • End user prefers buckets • Volume • 1: 53% • 2 31% • 3 9% • 4 4% • 5 1% • 6 2%
  • 8. The Driver Approach • Think of it like a recommender problem • Standard: Matrix factorization, collaborative filtering • GBM: use deep categorical encodings • Frequent use of target encoding interaction interaction interaction
  • 9. The Driver Approach A messy chain of hierarchical target encoding and if/else statements
  • 10. The Driver Approach h2o.gbm feature importance – primarily using three target encodings
  • 14. The Driverless Approach Top 5 Features: divided into components • {16} {CV TE} {problem_id} {0} • {51} {CV TE} {points * problem_id} {0} • {3} {max_rating} • {32} {freq} {last_online_time_seconds * rating} • {61} {NumToCatTE} {rating * max_rating * points * last_online_time_seconds} {0}
  • 15. The Driverless Approach Feature #1: same base target encoding I found to be the best • {16} {CV TE} {problem_id} {0} • 16: indicator of base features – 16 is later used twice more in the top 15 • CV TE: Cross-Fold target encoding • Problem_id: feature used as basis for target encoding • 0: the target; multinomial, so the two other uses are for class 1 & 5 • {51} {CV TE} {points * problem_id} {0} • {3} {max_rating} • {32} {freq} {last_online_time_seconds * rating} • {61} {NumToCatTE} {rating * max_rating * points * last_online_time_seconds} {0}
  • 16. The Driverless Approach Feature #2: deeper interaction • {16} {CV TE} {problem_id} {0} • {51} {CV TE} {points * problem_id} {0} • 51: base feature ID • CV TE: out of sample target encoding result • points * problem: interaction of two different features, both related to the problem; it is subdividing the problem further • 0: again, rate of class 0 as the target • {3} {max_rating} • {32} {freq} {last_online_time_seconds * rating} • {61} {NumToCatTE} {rating * max_rating * points * last_online_time_seconds} {0}
  • 17. The Driverless Approach Feature #3: no transformation • {16} {CV TE} {problem_id} {0} • {51} {CV TE} {points * problem_id} {0} • {3} {max_rating} • max rating • used as is – no alternate encoding; was first natural feature in my model as well • this is the first variable of the student dimension • a 4-digit number with close to a normal distribution • {32} {freq} {last_online_time_seconds * rating} • {61} {NumToCatTE} {rating * max_rating * points * last_online_time_seconds} {0}
  • 18. The Driverless Approach Feature #4: frequency encoding • {16} {CV TE} {problem_id} {0} • {51} {CV TE} {points * problem_id} {0} • {3} {max_rating} • {32} {freq} {last_online_time_seconds * rating} • Counting the occurrences of two fields • Last online & rating are both numerics in the student dimension • {61} {NumToCatTE} {rating * max_rating * points * last_online_time_seconds} {0}
  • 19. The Driverless Approach Feature 5: four-way interaction w/ target encoding • {16} {CV TE} {problem_id} {0} • {51} {CV TE} {points * problem_id} {0} • {3} {max_rating} • {32} {freq} {last_online_time_seconds * rating} • {61} {NumToCatTE} {rating * max_rating * points * last_online_time_seconds} {0} • Target encoding of class 0 • Finding the rate for each value of the result of a four way interaction
  • 20. The Driverless Approach Top 5 Features: did I try? • YES {16} {CV TE} {problem_id} {0} • NO {51} {CV TE} {points * problem_id} {0} • YES {3} {max_rating} • NO {32} {freq} {last_online_time_seconds * rating} • NO {61} {NumToCatTE} {rating * max_rating * points * last_online_time_seconds} {0}
  • 22. Problem #2 Bank Customer Churn • Identify customers likely to churn balances in the next quarter by 50% • Data • 300,00 training rows; 200,000 testing rows • 377 columns • Customer: age, gender, demographics • Reported assets, liabilities • Monthly balance history
  • 23. The Driver Approach Exploit before Explore • 377 columns made [quick] manual investigation harder • Rather than iterate: {analyze > model > analyze > … }, I changed to {model > analyze > model > … } lagged features
  • 24. The Driver Approach After lagging, try differences and ratios • Lagging features present the balance features at several time steps. • But often, the interesting part is not the raw balance itself, but whether it is growing or shrinking • Decision trees have a hard time “seeing” this so it is wise to engineer mathematical features: + - * /
  • 25. The Driver Approach After lagging, try differences and ratios • I used the leading monthly feature from the model and created new features representing month-over-month differences and a binary indicator • One field, one specific length (1 month), two calculations
  • 28. The Driverless Approach It knows math! subtraction subtraction subtraction
  • 29. The Driverless Approach Top 10 Features: divided into categories • (3) Subtraction: #1, #6, #8 • (1) Truncated SVD components: #2 • (2) Cluster Distances: #3, #9 • (1) Target encoding: #4 • (3) Direct features: #5, #7, #10
  • 30. The Driverless Approach Lagged balances also used in clusters & SVD • Distance to cluster #1 after segmenting columns into 6 clusters • BAL_prev6 • D_prev1 • D_prev2 • I_AQB_PrevQ1 • Component #1 of truncated SVD of • D_prev1 • D_prev2 • EOP_prev1_1
  • 31. The Driverless Approach Final Analysis • On first iteration, Driverless AI had surpassed my manual modeling • Features were well beyond what I would have ever attempted • Accuracy was stable: Driverless AI self-reported scores within 1% of competition submission
  • 32. The Driverless Approach Competition Results Kaggle Grandmaster Kaggle Grandmaster Also Driverless AI 100% Driverless AI