DataRobot: Changes in the Analysis Procedure and Benefits When Applying Automated Analytics - Woonpyo Hong, Data Scientist, DataRobot :: AWS Summit Seoul 2019
1. SUMMIT SEOUL
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
2.
DataRobot, Automated ML
changes in methodology and benefits
Woonpyo Hong
Customer Facing Data Scientist
DataRobot, Korea
3.
Agenda
1. DataRobot Introduction
Corporate Vision, Introduction
2. Data Science and Practices
Science Methodology, Data Scientist, Agony
3. Automated ML
What & How, Benefits, Live Demo
4. Future Direction
What & How
4.
5.
Strategic Vision
Enabling the AI-Driven Enterprise
Where AI is applied in every business process to predict outcomes.
The AI-Driven Enterprise adapts to new conditions at incredible
speeds and continually self-optimizes based on predicting the future.
“If your competitor is rushing to build AI and you don’t, it will crush you.”
-Elon Musk
6.
The Opportunity for Machine Learning in Any Business
Marketing
Predicting customer Lifetime Value (LTV)
Churn
Customer segmentation
Product mix (best product mix to reduce churn)
Cross selling/recommendation algorithms
Up selling
Channel optimization
Discount targeting
Response rates
Reactivation likelihood
Adwords optimization and ad buying
In store traffic patterns
Aircraft scheduling
Sales
Lead prioritization
Demand forecasting
Pricing
Market Basket
Inventory management / Dynamic Pricing
Promos / Upgrades / Offers
Human Resources
Resume screening
Employee churn
Training recommendation
Talent management
Risk
Credit risk
Fraud detection
Accounts Payable Recovery
Anti-money laundering
Insurance Claims prediction
Readmission Risk
Warranty Analytics
Claim Prediction
Logistics
Procurement
Warehousing
Cost Analysis
Product life cycle
Demand Forecast
Assembly
Turnover
Banking
Insurance
Healthcare
Media
Pharma
Telco
Retail
Government
Energy
Transportation
7.
Filling the Gap
Accelerate the process of researching, testing and deploying predictive algorithms. Enable more people to
help research, test and deploy predictive algorithms.
KEY
Demand for predictive models
Supply of data scientists
Turn Analysts & Engineers into Data Scientists
Increase Data Scientist productivity
Unmet demand for Data Science
8.
The world’s most advanced Enterprise Machine Learning Automation platform
2012
Founded, HQ in Boston, MA
$224M
In funding
1,000,000,000+
Models built on DataRobot Cloud
250+
Data Scientists & Engineers (of 600+)
4
#1 ranked Data Scientists
50+
Top 3 finishes
INSURANCE FINTECH HEALTHCARE MARKETING BANKING MANY MORE
9.
Best Practices and Technology
The top ranked Data Scientists in the world
Owen Zhang
Product Advisor
Highest: 1st
MASTER
Xavier Conort
Chief Data Scientist
Highest: 1st
MASTER
Sergey Yurgenson
Data Scientist
Highest: 1st
MASTER
Amanda Schierz
Data Scientist
Current: 1st Female, 1st in UK
MASTER
Jeremy Achin
CEO & Co-Founder
Highest: 20th
MASTER
The best technologies in the world
Tom de Godoy
CTO & Co-Founder
Highest: 20th
MASTER
10.
11.
Data Scientists
[Venn diagram: the Data Scientist sits at the intersection of Programming Skills, Math & Stats, and Domain Expertise]
Required Capabilities
1. Knowledge of the business
2. Knowledge of the data
3. Ability to write code to gather data
4. Ability to write code to explore/inspect data
5. Ability to write code to manipulate data
6. Ability to write code to extract actionable intel
7. Ability to write code to build models
8. Ability to write code to implement models
9. Foundational statistics
10. Internals of algorithms
11. Practical knowledge and experience
12. Knowing how to interpret and explain models
12.
Scientific Methodology (Metaphysics)
Karl Popper
Observation/Rationale
Hypothesis
Experiment
Theory
13.
Data Science Methodology
Limited resources leave room for improvement:
No target goal
Only a few algorithms, prone to overfitting
Model aging
Insufficient explanations
14.
Data Science Landscape
Where do you mostly work?
Business space
Feature space
Algorithm space
(LoB, DE, DS) (DE, DS)
ROI, Availability, Accuracy
LOB : Line of Business
DE : Data Engineer (ETL)
DS : Data Scientist
Actionable predictive model,
Valuable insights
(LOB, DS)
15.
Algorithm
Best Practices
Business-driven, feature-oriented analysis
[Diagram: three ways of weighting Business, Feature, and Algorithm work, labeled "Data Scientist Drives", "Business Drives", and "Balanced and Promising" (√)]
16.
Motivations for AutoML
Value of diverse set of algorithms
Source:
http://statweb.stanford.edu/~tibs/ElemStatLearn/
Methodology driven
Problem driven
17.
And trends
Most modelling software uses a 20th-century paradigm
Expert systems with modelling intelligence
18.
19.
What is Automated Machine Learning
• 10 steps to building models
• An expert system that knows how to do each of these 10 steps without human instructions
• Human friendly – not a black box
• Fast and accurate
• Replicable data science
20.
What about DataRobot?
Key Points
• End-to-end automated machine learning: all 10 steps are automated
• Hundreds of algorithms in the repository, with new algorithms being added regularly
• Chooses the best algorithms for your data
• Best-in-class human-friendly insights
• Widest range of deployment options
• Enterprise ready
• Automatic model reports
• Large support team around the world
21.
DataRobot Workflow
22.
Different but powerful way of analysis
A few perspectives (many more)
Single model vs. multiple models
Single model: Only an interpretable algorithm is chosen; a linear model is preferable.
Multiple models: Interpretability is model-agnostic.
Single model: No need for a hold-out partition; train/test or k-fold CV is enough.
Multiple models: A hold-out partition is used to evaluate several models.
Single model: Blending starts from an existing model.
Multiple models: Blending is on a fair basis, reflecting the performance of multiple models, with speed vs accuracy data.
Single model: Interactions should be considered for model performance (linear model).
Multiple models: Interactions are automatically reflected in tree-based algorithms. If interactions are important, DR has a GA2M model and R/Python API support for that.
Single model: Parameter tuning is limited to one model and is time-consuming.
Multiple models: Parameter tuning is exhaustive for all candidate models. One can easily confine the search space and quickly get the results.
23.
Benefits : safer model
Robust model free from the risk of overfitting
The average of these 5 validation scores is the cross-validation score.
The holdout is completely hidden from the models during the training process. After you have selected your optimal model, you can score it on the holdout to get your holdout score.
[Figure: 5-fold cross-validation. In each of CV-Fold #1 to #5, a different one of Partitions 1 to 5 is used for VALIDATION and the other four for TRAINING; the Holdout is kept apart from every fold.]
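The partitioning described on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not DataRobot's implementation: the function names, the 20% holdout fraction, and the toy "model" (the mean of the training targets) are all hypothetical.

```python
import random
from statistics import mean


def split_holdout_and_folds(n_rows, n_folds=5, holdout_frac=0.2, seed=0):
    """Shuffle row indices, set aside a holdout, split the rest into folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = int(n_rows * holdout_frac)
    holdout = indices[:n_holdout]          # never seen during training
    rest = indices[n_holdout:]
    folds = [rest[i::n_folds] for i in range(n_folds)]
    return holdout, folds


def cross_validation_score(folds, fit, score):
    """Train on all folds but one, validate on the remaining one, average."""
    scores = []
    for k, validation in enumerate(folds):
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        model = fit(training)
        scores.append(score(model, validation))
    return mean(scores)


# Toy usage (hypothetical data): the "model" is the mean of the training targets.
y = [float(i % 7) for i in range(100)]
holdout, folds = split_holdout_and_folds(len(y))
fit = lambda rows: mean(y[i] for i in rows)
score = lambda model, rows: mean(abs(y[i] - model) for i in rows)  # mean absolute error

cv_score = cross_validation_score(folds, fit, score)
# Only after model selection: refit on all folds and score once on the holdout.
holdout_score = score(fit([i for fold in folds for i in fold]), holdout)
```

The holdout indices are touched only in the last line, which mirrors the slide's point that the holdout stays hidden until a model has been selected.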
24.
Benefits : more effort on feature space
“Feature engineering is the art of data science” (Sergey Yurgenson)
Business space
Feature space
Algorithm space
(LoB, DE, DS) (DE, DS)
ROI, Availability, Accuracy
Actionable predictive model,
Valuable insights
(LOB, DS)
Feature Impact
Feature Effects
Prediction Explanations
25.
Benefits : Explainability
Model-agnostic explanation
[Feature Impact]
• The importance of each feature
• Does it coincide with domain knowledge?
• Any new insights?
[Feature Effect]
• Relationship between the target and a feature
• Does the relationship reflect domain knowledge?
• Any new insights or feature transforms?
[Prediction Explanation]
• What is the basis of the prediction?
• Are the predictions reliable to business people?
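One common way to get a model-agnostic feature importance is permutation importance: shuffle one feature at a time and measure how much the error grows. The slides do not say how Feature Impact is computed, so the sketch below is an illustration of that general technique only; the function name and the toy model are assumptions.

```python
import random


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Return the increase in mean squared error when each feature is shuffled.

    `predict` can be any callable that maps a row to a prediction, so the
    method does not depend on the kind of model behind it.
    """
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    importances = []
    for f in range(n_features):
        column = [r[f] for r in rows]
        rng.shuffle(column)  # break the link between feature f and the target
        shuffled = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, column)]
        importances.append(mse(shuffled) - baseline)
    return importances


# Toy usage (hypothetical): the model uses feature 0 and ignores feature 1.
toy_rows = [[float(i), float(i % 3)] for i in range(30)]
toy_targets = [3.0 * r[0] for r in toy_rows]
toy_model = lambda r: 3.0 * r[0]

impact = permutation_importance(toy_model, toy_rows, toy_targets, n_features=2)
```

In this toy case the ignored feature gets an importance of zero while the used feature gets a positive one, which is the kind of result to compare against domain knowledge.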
26.
Benefits : effective blending
Search over candidates that promise tangible improvement
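As a rough illustration of searching over blend candidates, the sketch below greedily adds a model to an averaging blend only when it lowers the validation error by more than a threshold. The slides do not describe the actual search, so the greedy strategy, the `min_gain` threshold, and all names here are assumptions.

```python
def mean_abs_error(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def average(pred_lists):
    """Element-wise mean of several prediction lists (a simple average blend)."""
    return [sum(vals) / len(vals) for vals in zip(*pred_lists)]


def greedy_blend(candidates, targets, min_gain=1e-3):
    """candidates: dict of model name -> validation predictions.

    Start from the best single model, then repeatedly add the candidate that
    lowers the blended error the most, stopping once the gain is too small.
    """
    best_name = min(candidates, key=lambda n: mean_abs_error(candidates[n], targets))
    chosen = [best_name]
    best_err = mean_abs_error(candidates[best_name], targets)
    while True:
        trials = {
            name: mean_abs_error(average([candidates[n] for n in chosen + [name]]), targets)
            for name in candidates
            if name not in chosen
        }
        if not trials:
            break
        name = min(trials, key=trials.get)
        if best_err - trials[name] <= min_gain:
            break  # no candidate promises a tangible improvement
        chosen.append(name)
        best_err = trials[name]
    return chosen, best_err


# Toy usage (hypothetical predictions): "a" overshoots, "b" undershoots.
targets = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "a": [1.5, 2.5, 3.5, 4.5],
    "b": [0.5, 1.5, 2.5, 3.5],
    "c": [3.0, 3.0, 3.0, 3.0],
}
blend, blend_error = greedy_blend(candidates, targets)
```

Here the errors of "a" and "b" cancel out when averaged, so they are blended, while "c" is left out because adding it would make the blend worse.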
27.
Benefits : Hyper-param Tuning
Gradient-free and effective pattern search
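Pattern search needs only objective values, not gradients: probe a fixed step in each direction, move when the objective improves, and shrink the step when nothing improves. Below is a minimal compass-search sketch of that idea. It is not DataRobot's tuner; the function name, step schedule, and toy objective are assumptions.

```python
def pattern_search(objective, start, step=1.0, min_step=1e-3, max_iter=1000):
    """Minimize `objective` by probing +/- step along each coordinate."""
    point = list(start)
    best = objective(point)
    for _ in range(max_iter):
        if step < min_step:
            break
        moved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = point.copy()
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    # Accept the first improving probe along this coordinate.
                    point, best, moved = trial, value, True
                    break
        if not moved:
            step /= 2.0  # no direction helped, so refine the pattern
    return point, best


# Toy usage: a hypothetical validation loss over two hyperparameters,
# minimized at (2, -1).
def toy_loss(params):
    x, y = params
    return (x - 2.0) ** 2 + (y + 1.0) ** 2


best_params, best_loss = pattern_search(toy_loss, [0.0, 0.0])
```

In practice the objective would be a validation score obtained by training a model with the given hyperparameters, which is expensive, so confining the search space (as the earlier comparison slide notes) keeps the number of evaluations small.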
28.
Benefits : API integration
Data scientists and developers can use the API
Application server
Prediction worker
REST API, R/Python package
Model Factory
Automatic Model Refresh
Model Diagnostics & Visualization
Feature Engineering
App. Integration
Custom analysis and various analyses
Notebook, Web UI, Console
US, EU region
29.
30.
Demo : Bleedout prediction
Binary classification for QA
Process: Coating of thin film by covering the surface with coating solution and drying, followed by polymerizing with UV light.
Problem: Unintended precipitation of powder, such as unpolymerized monomer or antioxidant, occurs and causes “bleedout”. It spoils the product and contaminates the production line.
Data:
● Material: length of film roll
● Project type: production vs experiment
● Control: winding tension, UV-exposure duration, O2 concentration, etc.
[Diagram labels: Winding in, Coating, Drying, Tension, UV exposure (O2 purged)]
31.
Demo : Bleedout MFG process
1) Unwinding 2) Coating 3) Drying
4) UV exposure 5) Tension Control 6) Winding
32.
Demo : Jupyter Notebook
SageMaker Notebook: project created and run automatically
33.
34.
Full-fledged Automation
Expanded coverage of automation, full automation
Consumption: Consuming and applying advanced analytics in the form of dashboards, decisions, and analytics-powered applications.
Prep, Blend, Agg and ETL: Self-service BI, data prep, blending, transformation, feature engineering, and sharing of insights. Data pipeline and workflow execution.
Data Management: Data cataloging, organization, and collaboration. Automatic indexing and knowledge gathering made available to the entire organization.
Analytics (Advanced, Simple): Simple: self-service BI, charts, graphs, tables, queries. Advanced: automated data investigation for insights, predictions, and recommendations.
Deployment: Powering business applications by providing advanced analytics insights, predictions, monitoring, and refresh on new data. Hosted as an API, SDK, or code.
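35. We look forward to your feedback!
Share your thoughts on the event on social media with the #AWSSummit hashtag.
Please take part in the session evaluation and survey through the AWS Summit Seoul 2019 mobile app and QR code.
We ask for your valuable input to help shape next year's Summit.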
36. Thank you!
Woonpyo Hong
woonpyo.hong@datarobot.com