Presented by Hila Lamm, Chief Strategy Officer at Firefly.ai
With all the hype around automated machine learning (AutoML) for computer vision, businesses with structured data are left wondering: Is AutoML relevant for enterprise data? Can it alleviate the bottleneck that data science teams are experiencing?
Our team experimented with different types of enterprise challenges, from optimizing pricing to credit card fraud detection to retail banking customer behavior, and was able to automatically build models that produced top-ranking Kaggle results within a few hours. In this session, through customer use cases and under-the-hood insights, you will learn about the capabilities of AutoML as applied on Firefly. Oh, and we’ll also talk about how we attained a Kaggle 1st place score in just half an hour.
5. • The printing press was a factor in the establishment of a community of scientists who could easily communicate their discoveries through widely disseminated scholarly journals, helping to bring on the scientific revolution.
• Because the printing process ensured that the same information fell on the same pages, page numbering, tables of contents, and indices became common.
• The arrival of mechanical movable type printing introduced the era of mass communication, which permanently altered the structure of society. The relatively unrestricted circulation of information and revolutionary ideas transcended borders.
https://courses.lumenlearning.com/suny-hccc-worldhistory/chapter/the-printing-revolution/
6. Two Types of Applied ML
• Mimic human ability: Automation = faster, cheaper, more consistent
• Improve on human computation ability: Better decision making
7. Which Decision Can I Superpower?
• To lend or not to lend?
• When should I service the equipment?
• Is she about to leave me? What can I do to make her stay?
• Is this a cyberattack or are you just happy to see me?
• Is there someone on the train tracks or is it just a cloud?
9. Reality Check
Kaggle 2017, The State of Data Science & Machine Learning
What barriers are faced at work? (7,376 responses, showing top 15 responses)
• Lack of data science talent
• Lack of management support
• Lack of clear questions to ask
10. Lack of clear questions to ask
1. Find the questions behind key daily decisions
2. Evaluate the impact of making those decisions faster or more accurately
3. Go fetch the data
11. Lack of management support >
What do executives want?
Business value (ROI)
Predictable time to delivery
Easy to scale up
Machine learning techniques
Research project
A new research project
FOCUS ON THE BUSINESS
SHORT TIME TO DELIVERY
SCALABLE AND AUTOMATED
FAST, COST EFFECTIVE EXPERIMENTS
12. Lack of data science talent
• Liberate talent from routine work
• Use tools to make machine learning accessible to more people
13. Automate the design of machine learning models: AutoML
• Apply data preprocessing
• Research to pinpoint the right ML algorithm
• Optimize hyperparameters for selected algorithms
• Golden ML ensemble
Automatically build multiple models in parallel and select the best
AutoML features to look for:
• Algorithm availability
• Preprocessing capabilities
• Search methods
• Ensembles
• Explainability
• Enterprise readiness
Open-Source/Paid AutoML tools:
• AutoWEKA
• Auto-sklearn
• TPOT
• Google Cloud AutoML
• H2O AutoML
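The steps named on this slide (preprocessing, algorithm selection, hyperparameter optimization, ensembling) can be illustrated with a toy, standard-library-only sketch. This is not Firefly's implementation or any listed tool's API: every function name, the two candidate model families, the synthetic data, and the majority-vote ensemble are assumptions chosen for illustration. Candidates are also scored one after another here rather than in parallel.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def fit_scaler(rows):
    """Preprocessing: learn per-column mean/std and return a transform."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return lambda row: [(v - m) / s for v, (m, s) in zip(row, stats)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn(k):
    """Candidate family 1: k-nearest neighbours, with k as the hyperparameter."""
    def fit(X, y):
        def predict(x):
            nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x))[:k]
            return Counter(y[i] for i in nearest).most_common(1)[0][0]
        return predict
    return fit


def nearest_centroid(X, y):
    """Candidate family 2: assign the class whose mean vector is closest."""
    centroids = {
        c: [mean(col) for col in zip(*(x for x, t in zip(X, y) if t == c))]
        for c in sorted(set(y))
    }
    return lambda x: min(centroids, key=lambda c: sq_dist(centroids[c], x))


def cv_accuracy(fit, X, y, folds=3):
    """Score one candidate with a simple interleaved k-fold cross-validation."""
    scores = []
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(sum(predict(X[i]) == y[i] for i in test) / len(test))
    return mean(scores)


def automl(X, y, top_n=3):
    """Preprocess, score every candidate, and ensemble the top_n by majority vote."""
    # Simplification: the scaler is fitted on all rows before cross-validation.
    scale = fit_scaler(X)
    Xs = [scale(row) for row in X]
    candidates = {f"knn(k={k})": knn(k) for k in (1, 3, 5, 7)}
    candidates["nearest_centroid"] = nearest_centroid
    leaderboard = sorted(
        ((cv_accuracy(fit, Xs, y), name) for name, fit in candidates.items()),
        reverse=True,
    )
    members = [candidates[name](Xs, y) for _, name in leaderboard[:top_n]]

    def ensemble(row):
        votes = Counter(member(scale(row)) for member in members)
        return votes.most_common(1)[0][0]

    return leaderboard, ensemble


# Synthetic two-class data; the second feature is on a much larger scale,
# which is why the preprocessing step matters for distance-based models.
rng = random.Random(0)
X, y = [], []
for i in range(60):
    label = i % 2
    X.append([rng.gauss(4 * label, 1), rng.gauss(400 * label, 100)])
    y.append(label)

leaderboard, ensemble = automl(X, y)
for score, name in leaderboard:
    print(f"{name}: {score:.3f}")
```

The leaderboard plays the role of "select the best", and the voting function stands in for the ensemble step. A real AutoML tool would search far larger algorithm and hyperparameter spaces.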
14. AutoML Platform
Firefly Lab - Model building:
• Data Import & Analysis
• Preprocessing
• Meta learning
• Algorithm selection, Hyperparameter optimization
• Generate solution: Ensemble of best algorithms and models
• Report results and model insights
Problem types: Regression, Classification, Recommendation, Time Series, Anomaly Detection
Firefly Predict:
• Real-time predict requests
• Batch predict requests
• Upload dataset for batch predictions
• Deploy on premises in operational system
Access: Firefly API, Firefly user interface, Model exports
15. Case Study: Homeland Security
Target: Reduce false alarm rate of existing video analytics system
Data: Feature extraction from moving objects in the videos - a series of ellipses indicating areas of change
Solution: Model per camera/sensor location
Results: False alarms reduced by 90%
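The "model per camera/sensor location" idea can be sketched as follows. This is an illustrative assumption, not the system described in the case study: the record layout `(camera_id, ellipse_area, label)`, the camera names, the sample values, and the single-threshold "model" are all hypothetical stand-ins for a real per-location classifier.

```python
from collections import defaultdict


def fit_threshold(samples):
    """Pick the area threshold that best separates real alarms (1) from false alarms (0)."""
    best_t, best_acc = None, -1.0
    for t in sorted({area for area, _ in samples}):
        acc = sum((area >= t) == bool(label) for area, label in samples) / len(samples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def fit_per_camera(records):
    """Group labelled records by camera and fit one small model per location."""
    by_camera = defaultdict(list)
    for camera_id, area, label in records:
        by_camera[camera_id].append((area, label))
    return {camera: fit_threshold(samples) for camera, samples in by_camera.items()}


def is_real_alarm(models, camera_id, area):
    """Route the prediction to the model fitted for that camera."""
    return area >= models[camera_id]


# Hypothetical labelled history: (camera_id, ellipse_area, 1 = real alarm, 0 = false alarm)
records = [
    ("gate", 5, 0), ("gate", 8, 0), ("gate", 20, 1), ("gate", 25, 1),
    ("fence", 40, 0), ("fence", 55, 0), ("fence", 90, 1), ("fence", 120, 1),
]
models = fit_per_camera(records)
print(models)
print(is_real_alarm(models, "gate", 22), is_real_alarm(models, "fence", 22))
```

The same observation is treated differently depending on where it was recorded, which is the point of training a dedicated model per location instead of one global model.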
17. Case Study: Cybersecurity
Target: Identify cyberattacks based on behavioral indicators
Data: Hundreds of features of network IoT data
Need: Fast experiments to identify relevant features per environment
Solution: Highly accurate, dedicated models per environment
18. Mercedes-Benz reliability prediction
Predict the time it takes to pass testing for different permutations of Mercedes-Benz car features
• 1st place of 3835 teams
• 377 features
• 30 min of Data Scientist time
19. Santander Bank Kaggle challenge
Predicting customer satisfaction using customer features
• 1st place of 5123 teams
• 370 features
• 20 min of Data Scientist time