SlideShare une entreprise Scribd logo
1  sur  55
Models in Minutes not Months:
Data Science as Microservices
Sarah Aerni, PhD
saerni@salesforce.com
@itweetsarah
​Einstein Platform
InfoQ.com: News & Community Site
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
salesforce-ai-microservices
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon London
www.qconlondon.com
LIVE DEMO
Agenda
​BUILDING AI APPS: Perspective Of A Data Scientist
• Journey to building your first model
• Barriers to production along the way
​DEPLOYING MODELS IN PRODUCTION: Built For Reuse
• Where engineering and applications meet AI
• DevOps in Data Science – monitoring, alerting and iterating
​AUTO MACHINE LEARNING: Machine Learning Pipelines as a Collection of Microservices
• Create reusable ML pipeline code for multiple applications customers
• Data Scientists focus on exploration, validation and adding new apps and models
ENABLING DATA SCIENCE
​A DATA SCIENTISTS VIEW OF BUILDING MODELS
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
A data scientist’s view of
the journey to building
models
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
A data scientist’s view of
the journey to building
models
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
#Leads
Created Date
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
Engineer Features
Empty fields
One-hot encoding (pivoting)
Email domain of a user
Business titles of a user
Historical spend
Email-Company Name Similarity
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy​>>> from sklearn import svm
>>> from numpy import loadtxt as l, random as r
>>> pls = numpy.loadtxt("leadFeatures.data", delimiter=",")
>>> testSet = r.choice(len(pls), int(len(pls)*.7), replace=False)
>>> X, y = pls[-testSet,:-1], pls[-testSet:,-1]
>>> clf = svm.SVC()
>>> clf.fit(X,y)
SVC(C=1.0, cache_size=200, class_weight=None,
coef0=0.0,decision_function_shape=None, degree=3,
gamma='auto', kernel='rbf', max_iter=-1,
tol=0.001, verbose=False)
>>> clf.score(pls[testSet,:-1],pls[testSet,-1])
0.88571428571428568
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
Geographies
#Leads
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
Access and
Explore Data
Engineer
Features and
Build Models
Interpret Model
Results and
Accuracy
Fresh Data Input
Delivery of
Predictions
Bringing a Model to Production Requires a Team
​Applications deliver predictions for
customer consumption
​Predictions are produced by the
models live in production
​Pipelines deliver the data for modeling
and scoring at an appropriate latency
​Monitoring systems allow us to check
the health of the models, data,
pipelines and app
Source: Salesforce Customer Relationship Survey conducted 2014-2016 among 10,500+ customers randomly selected. Response sizes per question vary.
Data Engineers
Provide data access and management
capabilities for data scientists
Set up and monitor data pipelines
Improve performance of data
processing pipelines
Front-End Developers
Build customer-facing UI
Application instrumentation and
logging
​Data Scientists
​Continue evaluating models
​Monitor for anomalies and degradation
​Iteratively improve models in production
Bringing a Model to Production Requires a Team
Platform Engineers
Machine resource management
Alerting and monitoring
Product Managers
Gather requirements & feedback
Provide business context
Supporting a Model in Production is Complex
Only a small fraction of real-world ML systems is a composed of ML code, as
shown by the small black box in the middle. The required surrounding
infrastructure is fast and complex.
D. Sculley, et al. Hidden technical debt in machine learning systems. In Neural Information Processing
Systems (NIPS). 2015
MODELS IN PRODUDCTION
​WHAT IT TAKES TO DEPLOY AN AI-POWERED
APPLICATION
Supporting Models in Production is Mostly NOT AI
Only a small fraction of real-world ML systems
is a composed of ML code, as shown by the
small black box in the middle. The required
surrounding infrastructure is fast and
complex.
Adapted from D. Sculley, et al. Hidden technical debt in machine
learning systems. In Neural Information Processing Systems
(NIPS). 2015
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
Data
Sources
…
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
Data
Sources
…
Web UICLI
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
Admin / Authentication Service
Data
Sources
…
Web UICLI
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor ServiceDatastore Service
Admin / Authentication Service
Data
Sources
…
Web UICLI
Why Data Services are Critical
…
DataConnector
Data Hub
Object Store in
Blob Storage
Catalog
AccessControl
Data Registry
Applications
App
Data
Prov
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Data Puller
Data
Pusher
Datastore Service
Admin / Authentication Service
Data
Sources
…
Web UICLI
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Data Puller
Data
Pusher
Datastore Service
Admin / Authentication Service
Data
Sources
…
Web UICLI Exploration Tool
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Data Puller
Data
Pusher
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Web UICLI Exploration Tool
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Data Puller
Data
Preparator
Data
Pusher
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Web UICLI Exploration Tool
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Data Puller
Data
Preparator
Data
Pusher
CI Service
Auxiliary Services
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Web UICLI Exploration Tool
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Monitoring Tool
Data Puller
Data
Preparator
Data
Pusher
CI Service
Monitoring
Service
Auxiliary Services
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Web UICLI Exploration Tool
Monitoring your AI’s health like any other app
​Pipelines, Model Performance, Scores – Invest your time where it is needed!
Sample Dashboard on Simulated Data
​Model Performance at Evaluation​Distribution of Scores at Evaluation
​105,874
Scores Written Per
Hour(1 day moving
avg)
​0.86
Evaluation auROC
​Total Number of Scores Written Per Hour
​150,000
​
100,000
​
50,000
​
0
​Total Number of Scores Written Per Week
​10,000,000
​
100,000
​
100
​
0
​PercentofTotalPredictionsin
Bucket
​Prediction Probability Bucket
​TotalPredictionCount
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Monitoring Tool
Data Puller
Data
Preparator
Data
Pusher
CI Service
Monitoring
Service
Auxiliary Services
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Provisioning Service
Web UICLI Exploration Tool
Why Data Services are Critical
…
DataConnector
Data Hub
Object Store in
Blob Storage
Catalog
AccessControl
Data Registry
Applications
App
Data
Prov
Applications
App 1
App 2
App 3
Prov
Prov
Prov
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
c
Executor Service
Monitoring Tool
Data Puller
Data
Preparator
Data
Pusher
Scheduler
Model
Management
Service
Control System
CI Service
Monitoring
Service
Auxiliary Services
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Provisioning Service
Web UICLI Exploration Tool
c
Executor Service
Monitoring Tool
Data Puller
Data
Preparator
Data
Pusher
Scheduler
Model
Management
Service
Control System
CI Service
Monitoring
Service
Auxiliary Services
Datastore Service
Admin / Authentication Service
Data
Sources
…
Modeling /
Scoring
Provisioning Service
Web UICLI Exploration ToolMicroservice architecture
Customizable model-evaluation &
monitoring dashboards
Scheduling and workflow
management
In-platform secured
experimentation and exploration
Data Scientists focus their
efforts on modeling and
evaluating results
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
Why Stop at Microservices for Supporting Your ML Code?
Why stop here?
Your ML code can also be just a
collection of microservices!
Configuration
Data Collection
Data Verification
Machine Resource
Management
Monitoring
Serving
Infrastructure
Feature
Extraction
Analysis Tools
Process Management Tools
ML Code
Auto Machine Learning
​Building reusable ML code
Leveraging Platform Services to Easily Deploy 1000s of Apps
Data Scientists on App #1
Leveraging Platform Services to Easily Deploy 1000s of Apps
Data Scientists on App #2Data Scientists on App #1
Let’s Add a Third App
Data Scientists on App #2 Data Scientists on App #3Data Scientists on App #1
How This Process Would Look in Salesforce
150,000 customers
​LeadIQ App ​Activity App ​Predictive
Journeys App
​App #5 ​App #6 ​App #7 ​App #8
​Opportunity
App
​App #9 ​App #10 ​App #11 ​App #12
​LeadIQ App ​Activity App ​Predictive
Journeys App
​App #5 ​App #6 ​App #7 ​App #8
​Opportunity
App
​App #9 ​App #10 ​App #11 ​App #12
​LeadIQ App ​Activity App ​Predictive
Journeys App
​App #5 ​App #6 ​App #7 ​App #8
​Opportunity
App
​App #9 ​App #10 ​App #11 ​App #12
Einstein’sNewApproachtoAI
Democratizing AI for Everyone
Classical
Approach
Data
Sampling
Feature
Selection
Model
Selection
Score
Calibration
Integrate to
Application
Artificial
Intelligence
Einstein
Auto-ML
Data already prepped
Models automatically built
Predictions delivered in context
AI for CRM
Discover
Predict
Recommend
Automate
Numerical BucketsCategorical Variables Text Fields
​AutoML for feature engineering
Repeatable Elements in Machine Learning Pipelines
Categorical Variables
​AutoML for feature engineering
Repeatable Elements in Machine Learning Pipelines
1 0 0
1 0 0
0 0 1
0 0 1
0 1 0
0 0 1
0 0 0
0 1 0
Text Fields
​AutoML for feature engineering
Repeatable Elements in Machine Learning Pipelines
Word Count
Word Count (no
stop words)
Is English Sentiment
4 2 1 1
6 3 1 1
9 4 0 0
6 4 0 -1
7 3 1 0
5 1 1 0
7 3 1 0
Numerical Buckets
​AutoML for feature engineering
Repeatable Elements in Machine Learning Pipelines
What Now? How autoML can choose your model
​>>> from sklearn import svm
>>> from numpy import loadtxt as l, random as r
>>> clf = svm.SVC()
>>> pls = numpy.loadtxt("leadFeatures.data", delimiter=",")
>>> testSet = r.choice(len(pls), int(len(pls)*.7), replace=False)
>>> X, y = pls[-testSet,:-1], pls[-testSet:,-1]
>>> clf.fit(X,y)
SVC(C=1.0, cache_size=200, class_weight=None,
coef0=0.0,decision_function_shape=None, degree=3,
gamma='auto', kernel='rbf', max_iter=-1,
probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
>>> clf.score(pls[testSet,:-1],pls[testSet,-1])
0.88571428571428568
Should we try other model forms?Should we try other model forms?
Features?
Should we try other model forms?
Features?
Kernels or hyperparameters?
Each use case will have its own
model and features to use. We
enable building separate models
and features with 1 code base
using OP
Model 1
83% accuracy
Model 3
73% accuracy
Model 4
89% accuracy
…
Model 1 Model 2
Model 3 Model 4
…
MODEL GENERATION MODEL TESTINGCUSTOMER A
Model 2
91% accuracy
Customer ID
Age
Age Group
Gender
Valid Address
1 2 3 4 5
22 30 45 23 60
A A B A C
M F F M M
Y N N Y Y
A tournament of models!
A tournament of models!
…
Model 1 Model 2
Model 3 Model 4
…
CUSTOMER B
Customer ID
Age
Age Group
Gender
Valid Address
1 2 3 4 5
34 22 66 58 41
B A C C B
M M F M F
Y Y N N Y
Model 1
Model 3
Model 4
Customer B
Model 2
Customer A
MODEL GENERATION MODEL TESTINGCUSTOMER A
Customer ID
Age
Age Group
Gender
Valid Address
1 2 3 4 5
22 30 45 23 60
A A B A C
M F F M M
Y N N Y Y
Deploy Monitors, Monitor, Repeat!
Sample Dashboard on Simulated Data
​134
Models in
Production
​215
Models Trained
(curr.month)
​98.51%
Models with Above
Chance Performance
​35,573,664
Predictions Written
Per Day (7 day avg)
​8
Experiments Run this
Week
Deploy Monitors, Monitor, Repeat!
​Pipelines, Model Performance, Scores – Invest your time where it is needed!
Sample Dashboard on Simulated Data
​Model Performance at Evaluation​Distribution of Scores at Evaluation
​105,874
Scores Written Per
Hour(1 day moving
avg)
​0.86
Evaluation auROC
​Total Number of Scores Written Per Hour
​150,000
​
100,000
​
50,000
​
0
​Total Number of Scores Written Per Week
​10,000,000
​
100,000
​
100
​
0
​PercentofTotalPredictionsin
Bucket
​Prediction Probability Bucket
​TotalPredictionCount
Deploy Monitors, Monitor, Repeat!
Sample Dashboard on Simulated Data
​134
Models in
Production
​215
Models Trained
(curr.month)
​98.51%
Models with Above
Chance Performance
​216
Models Trained
(curr.month)
​99.25%
Models with Above
Chance Performance
​35,573,664
Predictions Written
Per Day (7 day avg)
​12
Experiments Run this
Week
Key Takeaways
• Deploying machine learning in production is hard
• Platforms are critical for enabling data scientist productivity
• Plan for multiple apps… always
• To ensure enabling rapid identification of areas of improvement and efficacy of new approaches
provide
• Monitoring services
• Experimentation frameworks
• Identify opportunities for reusability in all aspects, even your machine learning pipelines
• Help simplify the process of experimenting, deploying, and iterating
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
salesforce-ai-microservices

Contenu connexe

Plus de C4Media

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/awaitC4Media
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaC4Media
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?C4Media
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseC4Media
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinC4Media
 

Plus de C4Media (20)

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with Brooklin
 

Dernier

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Dernier (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Models in Minutes not Months: AI as Microservices

  • 1. Models in Minutes not Months: Data Science as Microservices Sarah Aerni, PhD saerni@salesforce.com @itweetsarah ​Einstein Platform
  • 2. InfoQ.com: News & Community Site Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ salesforce-ai-microservices • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon London www.qconlondon.com
  • 5. Agenda ​BUILDING AI APPS: Perspective Of A Data Scientist • Journey to building your first model • Barriers to production along the way ​DEPLOYING MODELS IN PRODUCTION: Built For Reuse • Where engineering and applications meet AI • DevOps in Data Science – monitoring, alerting and iterating ​AUTO MACHINE LEARNING: Machine Learning Pipelines as a Collection of Microservices • Create reusable ML pipeline code for multiple applications customers • Data Scientists focus on exploration, validation and adding new apps and models
  • 6. ENABLING DATA SCIENCE ​A DATA SCIENTISTS VIEW OF BUILDING MODELS
  • 7. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy A data scientist’s view of the journey to building models
  • 8. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy A data scientist’s view of the journey to building models
  • 9. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy #Leads Created Date
  • 10. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy Engineer Features Empty fields One-hot encoding (pivoting) Email domain of a user Business titles of a user Historical spend Email-Company Name Similarity
  • 11. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy​>>> from sklearn import svm >>> from numpy import loadtxt as l, random as r >>> pls = numpy.loadtxt("leadFeatures.data", delimiter=",") >>> testSet = r.choice(len(pls), int(len(pls)*.7), replace=False) >>> X, y = pls[-testSet,:-1], pls[-testSet:,-1] >>> clf = svm.SVC() >>> clf.fit(X,y) SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,decision_function_shape=None, degree=3, gamma='auto', kernel='rbf', max_iter=-1, tol=0.001, verbose=False) >>> clf.score(pls[testSet,:-1],pls[testSet,-1]) 0.88571428571428568
  • 12. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy Geographies #Leads
  • 13. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy
  • 14. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy Fresh Data Input Delivery of Predictions
  • 15. Bringing a Model to Production Requires a Team ​Applications deliver predictions for customer consumption ​Predictions are produced by the models live in production ​Pipelines deliver the data for modeling and scoring at an appropriate latency ​Monitoring systems allow us to check the health of the models, data, pipelines and app Source: Salesforce Customer Relationship Survey conducted 2014-2016 among 10,500+ customers randomly selected. Response sizes per question vary.
  • 16. Data Engineers Provide data access and management capabilities for data scientists Set up and monitor data pipelines Improve performance of data processing pipelines Front-End Developers Build customer-facing UI Application instrumentation and logging ​Data Scientists ​Continue evaluating models ​Monitor for anomalies and degradation ​Iteratively improve models in production Bringing a Model to Production Requires a Team Platform Engineers Machine resource management Alerting and monitoring Product Managers Gather requirements & feedback Provide business context
  • 17. Supporting a Model in Production is Complex Only a small fraction of real-world ML systems is a composed of ML code, as shown by the small black box in the middle. The required surrounding infrastructure is fast and complex. D. Sculley, et al. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS). 2015
  • 18. MODELS IN PRODUDCTION ​WHAT IT TAKES TO DEPLOY AN AI-POWERED APPLICATION
  • 19. Supporting Models in Production is Mostly NOT AI Only a small fraction of real-world ML systems is a composed of ML code, as shown by the small black box in the middle. The required surrounding infrastructure is fast and complex. Adapted from D. Sculley, et al. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS). 2015 Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code
  • 20. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location Data Sources …
  • 21. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location Data Sources … Web UICLI
  • 22. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location Admin / Authentication Service Data Sources … Web UICLI
  • 23. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor ServiceDatastore Service Admin / Authentication Service Data Sources … Web UICLI
  • 24. Why Data Services are Critical … DataConnector Data Hub Object Store in Blob Storage Catalog AccessControl Data Registry Applications App Data Prov
  • 25. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Data Puller Data Pusher Datastore Service Admin / Authentication Service Data Sources … Web UICLI
  • 26. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Data Puller Data Pusher Datastore Service Admin / Authentication Service Data Sources … Web UICLI Exploration Tool
  • 27. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Data Puller Data Pusher Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UICLI Exploration Tool
  • 28. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Data Puller Data Preparator Data Pusher Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UICLI Exploration Tool
  • 29. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Data Puller Data Preparator Data Pusher CI Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UICLI Exploration Tool
  • 30. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UICLI Exploration Tool
  • 31. Monitoring your AI’s health like any other app ​Pipelines, Model Performance, Scores – Invest your time where it is needed! Sample Dashboard on Simulated Data ​Model Performance at Evaluation​Distribution of Scores at Evaluation ​105,874 Scores Written Per Hour(1 day moving avg) ​0.86 Evaluation auROC ​Total Number of Scores Written Per Hour ​150,000 ​ 100,000 ​ 50,000 ​ 0 ​Total Number of Scores Written Per Week ​10,000,000 ​ 100,000 ​ 100 ​ 0 ​PercentofTotalPredictionsin Bucket ​Prediction Probability Bucket ​TotalPredictionCount
  • 32. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Provisioning Service Web UICLI Exploration Tool
  • 33. Why Data Services are Critical … DataConnector Data Hub Object Store in Blob Storage Catalog AccessControl Data Registry Applications App Data Prov Applications App 1 App 2 App 3 Prov Prov Prov
  • 34. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location c Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher Scheduler Model Management Service Control System CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Provisioning Service Web UICLI Exploration Tool
  • 35. c Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher Scheduler Model Management Service Control System CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Provisioning Service Web UICLI Exploration ToolMicroservice architecture Customizable model-evaluation & monitoring dashboards Scheduling and workflow management In-platform secured experimentation and exploration Data Scientists focus their efforts on modeling and evaluating results How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location
  • 36. Why Stop at Microservices for Supporting Your ML Code? Why stop here? Your ML code can also be just a collection of microservices! Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code
  • 38. Leveraging Platform Services to Easily Deploy 1000s of Apps Data Scientists on App #1
  • 39. Leveraging Platform Services to Easily Deploy 1000s of Apps Data Scientists on App #2Data Scientists on App #1
  • 40. Let’s Add a Third App Data Scientists on App #2 Data Scientists on App #3Data Scientists on App #1
  • 41. How This Process Would Look in Salesforce 150,000 customers ​LeadIQ App ​Activity App ​Predictive Journeys App ​App #5 ​App #6 ​App #7 ​App #8 ​Opportunity App ​App #9 ​App #10 ​App #11 ​App #12 ​LeadIQ App ​Activity App ​Predictive Journeys App ​App #5 ​App #6 ​App #7 ​App #8 ​Opportunity App ​App #9 ​App #10 ​App #11 ​App #12 ​LeadIQ App ​Activity App ​Predictive Journeys App ​App #5 ​App #6 ​App #7 ​App #8 ​Opportunity App ​App #9 ​App #10 ​App #11 ​App #12
  • 42. Einstein’sNewApproachtoAI Democratizing AI for Everyone Classical Approach Data Sampling Feature Selection Model Selection Score Calibration Integrate to Application Artificial Intelligence Einstein Auto-ML Data already prepped Models automatically built Predictions delivered in context AI for CRM Discover Predict Recommend Automate
  • 43. Numerical BucketsCategorical Variables Text Fields ​AutoML for feature engineering Repeatable Elements in Machine Learning Pipelines
  • 44. Categorical Variables ​AutoML for feature engineering Repeatable Elements in Machine Learning Pipelines 1 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 1 0
  • 45. Text Fields ​AutoML for feature engineering Repeatable Elements in Machine Learning Pipelines Word Count Word Count (no stop words) Is English Sentiment 4 2 1 1 6 3 1 1 9 4 0 0 6 4 0 -1 7 3 1 0 5 1 1 0 7 3 1 0
  • 46. Numerical Buckets ​AutoML for feature engineering Repeatable Elements in Machine Learning Pipelines
  • 47. What Now? How autoML can choose your model ​>>> from sklearn import svm >>> from numpy import loadtxt as l, random as r >>> clf = svm.SVC() >>> pls = numpy.loadtxt("leadFeatures.data", delimiter=",") >>> testSet = r.choice(len(pls), int(len(pls)*.7), replace=False) >>> X, y = pls[-testSet,:-1], pls[-testSet:,-1] >>> clf.fit(X,y) SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,decision_function_shape=None, degree=3, gamma='auto', kernel='rbf', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False) >>> clf.score(pls[testSet,:-1],pls[testSet,-1]) 0.88571428571428568 Should we try other model forms?Should we try other model forms? Features? Should we try other model forms? Features? Kernels or hyperparameters? Each use case will have its own model and features to use. We enable building separate models and features with 1 code base using OP
  • 48. Model 1 83% accuracy Model 3 73% accuracy Model 4 89% accuracy … Model 1 Model 2 Model 3 Model 4 … MODEL GENERATION MODEL TESTINGCUSTOMER A Model 2 91% accuracy Customer ID Age Age Group Gender Valid Address 1 2 3 4 5 22 30 45 23 60 A A B A C M F F M M Y N N Y Y A tournament of models!
  • 49. A tournament of models! … Model 1 Model 2 Model 3 Model 4 … CUSTOMER B Customer ID Age Age Group Gender Valid Address 1 2 3 4 5 34 22 66 58 41 B A C C B M M F M F Y Y N N Y Model 1 Model 3 Model 4 Customer B Model 2 Customer A MODEL GENERATION MODEL TESTINGCUSTOMER A Customer ID Age Age Group Gender Valid Address 1 2 3 4 5 22 30 45 23 60 A A B A C M F F M M Y N N Y Y
  • 50. Deploy Monitors, Monitor, Repeat! Sample Dashboard on Simulated Data ​134 Models in Production ​215 Models Trained (curr.month) ​98.51% Models with Above Chance Performance ​35,573,664 Predictions Written Per Day (7 day avg) ​8 Experiments Run this Week
  • 51. Deploy Monitors, Monitor, Repeat! ​Pipelines, Model Performance, Scores – Invest your time where it is needed! Sample Dashboard on Simulated Data ​Model Performance at Evaluation​Distribution of Scores at Evaluation ​105,874 Scores Written Per Hour(1 day moving avg) ​0.86 Evaluation auROC ​Total Number of Scores Written Per Hour ​150,000 ​ 100,000 ​ 50,000 ​ 0 ​Total Number of Scores Written Per Week ​10,000,000 ​ 100,000 ​ 100 ​ 0 ​PercentofTotalPredictionsin Bucket ​Prediction Probability Bucket ​TotalPredictionCount
  • 52. Deploy Monitors, Monitor, Repeat! Sample Dashboard on Simulated Data ​134 Models in Production ​215 Models Trained (curr.month) ​98.51% Models with Above Chance Performance ​216 Models Trained (curr.month) ​99.25% Models with Above Chance Performance ​35,573,664 Predictions Written Per Day (7 day avg) ​12 Experiments Run this Week
  • 53. Key Takeaways • Deploying machine learning in production is hard • Platforms are critical for enabling data scientist productivity • Plan for multiple apps… always • To ensure enabling rapid identification of areas of improvement and efficacy of new approaches provide • Monitoring services • Experimentation frameworks • Identify opportunities for reusability in all aspects, even your machine learning pipelines • Help simplify the process of experimenting, deploying, and iterating
  • 54.
  • 55. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ salesforce-ai-microservices