July 4 - 6, 2022
2nd Edition
BigML, Inc #DutchMLSchool
The road to production
Automating and deploying Machine Learning projects
jao
CTO, BigML
Outline
1 ML as a system service
2 ML as a RESTful cloudy service
3 Machine Learning workflows
4 Client–side automation
5 Server–side workflow automation
6 A first taste of WhizzML: abstraction is back
7 And back to the (distributed) client: BigMLOps
Machine Learning as a System Service
The goal
Machine Learning as a system-level service
• Accessibility
• Integrability
• Automation
• Ease of use
The means
• APIs: ML building blocks
• Abstraction layer over feature engineering
• Abstraction layer over algorithms
• Automation
RESTful-ish ML Services
RESTful done right: Whitebox resources
• Your data, your model
• Model reverse engineering becomes moot
• Maximizes reach (Web, CLI, desktop, IoT)
RESTful-ish ML Services
• Excellent abstraction layer
• Transparent data model
• Immutable resources and UUIDs: traceability
• Simple yet effective interaction model
• Easy access from any language (API bindings)
Algorithmic complexity and computing resource management problems mostly washed away
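The interaction model listed above (create an immutable, uniquely identified resource, then poll it until it reaches a final state) can be sketched in a few lines. This is a minimal illustration against an in-memory stand-in, not BigML's API: the `FakeService` class, its status strings, and the `wait_until_finished` helper are hypothetical names invented for the example.

```python
import itertools
import time


class FakeService:
    """In-memory stand-in for a RESTful ML service (hypothetical, for illustration)."""

    def __init__(self, polls_until_done=2):
        self._ids = itertools.count(1)
        self._resources = {}
        self._polls_until_done = polls_until_done

    def create(self, kind, origin=None):
        # Each creation returns a new, uniquely identified resource; existing
        # resources are never modified, so the origin chain stays traceable.
        resource_id = f"{kind}/{next(self._ids):04d}"
        self._resources[resource_id] = {"origin": origin, "polls": 0}
        return resource_id

    def get(self, resource_id):
        # Simulate asynchronous work: the resource finishes after a few polls.
        record = self._resources[resource_id]
        record["polls"] += 1
        done = record["polls"] >= self._polls_until_done
        return {
            "resource": resource_id,
            "origin": record["origin"],
            "status": "finished" if done else "in-progress",
        }


def wait_until_finished(service, resource_id, delay=0.0, max_polls=50):
    """Poll a resource until it reports a final state, then return it."""
    for _ in range(max_polls):
        resource = service.get(resource_id)
        if resource["status"] == "finished":
            return resource
        time.sleep(delay)
    raise TimeoutError(f"{resource_id} did not finish after {max_polls} polls")


service = FakeService()
dataset = service.create("dataset", origin="source/0000")
model = service.create("model", origin=dataset)
finished_model = wait_until_finished(service, model)
```

Because every resource records its origin and is never changed after creation, the chain from model back to source can be followed later, which is the traceability point made above.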
Textbook Machine Learning workflows
Dr. Natalia Konstantinova (http://nkonst.com/machine-learning-explained-simple-words/)
ML workflows for real
Tumor detection using anomalies
Given data about a tumor:
• Extract the relevant features that characterize it (unsupervised learning)
• Classify the tumor as either benign or malignant, improving diagnosis and avoiding unnecessary surgery
Example: University of Wisconsin Hospital’s Cancer dataset
https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/
Tumor detection using anomalies: workflow
Tumor detection using anomalies: Evaluation
Is the anomaly score a good predictor in real cases?
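One common way to answer this question is to compare anomaly scores against known diagnoses and compute the area under the ROC curve. The sketch below is an assumption about how such a check could look, not the evaluation used in the talk; the `roc_auc` helper is hypothetical and the scores and labels are made up for illustration.

```python
def roc_auc(scores, labels):
    """Probability that a randomly chosen positive case scores higher than a
    randomly chosen negative one (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up anomaly scores and labels (1 = malignant, 0 = benign), illustration only.
anomaly_scores = [0.90, 0.80, 0.35, 0.30, 0.20, 0.10]
is_malignant = [1, 1, 0, 1, 0, 0]
auc = roc_auc(anomaly_scores, is_malignant)  # 8 of the 9 positive/negative pairs are ordered correctly
```

A value near 1.0 would suggest the score separates the two classes well; a value near 0.5 would suggest it is no better than chance.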
Tumor detection using anomalies: Automation?
Web UI
(Non) automation via Web UI
Strengths of Web UI
Simple: Just clicking around
Discoverable: Exploration and experimenting
Abstract: Transparent error handling and scalability
Problems of Web UI
Only simple: Simple tasks are simple, hard tasks quickly get hard
No automation or batch operations: Clicking humans don’t scale well
Abstracting over raw HTTP: bindings
Example workflow
Example workflow: Python bindings
from bigml.api import BigML
api = BigML()
source = 'source/5643d345f43a234ff2310a3e'
dataset = api.create_dataset(source)
api.ok(dataset)
r, s = 0.8, "seed"
train_dataset = api.create_dataset(dataset, {"rate": r, "seed": s})
test_dataset = api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True})
api.ok(train_dataset)
model = api.create_model(train_dataset)
api.ok(model)
api.ok(test_dataset)
evaluation = api.create_evaluation(model, test_dataset)
api.ok(evaluation)
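The train and test datasets above are built by sampling the same dataset twice with the same rate and seed, the second time with `out_of_bag` set. A minimal sketch of why that yields two complementary subsets follows. It is not BigML's sampling implementation: `sample_rows` is a hypothetical helper and the use of Python's `random` module is an assumption made for illustration.

```python
import random


def sample_rows(n_rows, rate, seed, out_of_bag=False):
    """Return the row indices selected by a deterministic, seeded sample.

    With out_of_bag=True, return exactly the rows that the same
    (rate, seed) sample would have left out.
    """
    rng = random.Random(seed)
    in_bag = set(rng.sample(range(n_rows), k=round(n_rows * rate)))
    if out_of_bag:
        return sorted(set(range(n_rows)) - in_bag)
    return sorted(in_bag)


train_rows = sample_rows(100, rate=0.8, seed="seed")
test_rows = sample_rows(100, rate=0.8, seed="seed", out_of_bag=True)
```

Because the seed fixes the sample, the two calls never overlap and together cover every row, so the model is evaluated only on rows it did not see during training.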
Is this production code?
How do we generalize to, say, 100 datasets?
Example workflow: Python bindings
# Now do it 100 times, serially
model, evaluation = [], []
for i in range(100):
    r, s = 0.8, i
    train = api.create_dataset(dataset, {"rate": r, "seed": s})
    test = api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True})
    api.ok(train)
    model.append(api.create_model(train))
    api.ok(model[i])
    api.ok(test)
    evaluation.append(api.create_evaluation(model[i], test))
    api.ok(evaluation[i])
Example workflow: Python bindings
# More efficient if we parallelize, but at what level?
train, test, model, evaluation = [], [], [], []
for i in range(100):
    r, s = 0.8, i
    train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
    test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
    # Do we wait here?
    api.ok(train[i])
    api.ok(test[i])
for i in range(100):
    model.append(api.create_model(train[i]))
    api.ok(model[i])
for i in range(100):
    evaluation.append(api.create_evaluation(model[i], test[i]))
    api.ok(evaluation[i])
Example workflow: Python bindings
# More efficient if we parallelize, but at what level?
train, test, model, evaluation = [], [], [], []
for i in range(100):
    r, s = 0.8, i
    train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
    test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
for i in range(100):
    # Or do we wait here?
    api.ok(train[i])
    model.append(api.create_model(train[i]))
for i in range(100):
    # and here?
    api.ok(model[i])
    api.ok(test[i])
    evaluation.append(api.create_evaluation(model[i], test[i]))
    api.ok(evaluation[i])
Example workflow: Python bindings
# More efficient if we parallelize, but how do we handle errors?
train, test, model, evaluation = [], [], [], []
for i in range(100):
    r, s = 0.8, i
    train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
    test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
for i in range(100):
    api.ok(train[i])
    model.append(api.create_model(train[i]))
for i in range(100):
    try:
        api.ok(model[i])
        api.ok(test[i])
        evaluation.append(api.create_evaluation(model[i], test[i]))
        api.ok(evaluation[i])
    except Exception:
        # How to recover if test[i] has failed? New datasets? Abort?
        raise
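One possible answer to the questions raised in these slides is to treat each split as an independent unit of work, run the units in a thread pool, and keep failures separate from results. The sketch below uses only the standard library's `concurrent.futures`; `run_split` is a hypothetical stand-in for the dataset, model, and evaluation chain, and it fails on purpose for some seeds so that the error path is exercised.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_split(seed):
    """Hypothetical stand-in for one train/test/model/evaluation chain.

    A real version would call the bindings; here seeds divisible by 7 fail
    on purpose so that the error path is exercised.
    """
    if seed % 7 == 0:
        raise RuntimeError(f"split {seed} failed")
    return {"seed": seed, "evaluation": f"evaluation-for-seed-{seed}"}


def run_all(seeds, worker, max_workers=8):
    """Run every split in parallel, keeping results and failures apart."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, seed): seed for seed in seeds}
        for future in as_completed(futures):
            seed = futures[future]
            try:
                results[seed] = future.result()
            except Exception as error:  # record the failure, keep the other splits
                failures[seed] = error
    return results, failures


results, failures = run_all(range(100), run_split)
```

Even this small sketch has to choose a pool size, a bookkeeping scheme for failures, and (not shown) a retry policy, none of which belong to the Machine Learning problem itself.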
Client-side Machine Learning Automation
Problems of bindings-based, client solutions
Complexity: Lots of details outside the problem domain
Reuse: No inter-language compatibility
Scalability: Client-side workflows are hard to optimize
Reproducibility: Noisy, complex and hard-to-audit development environment
Not enough abstraction
A partial solution: CLI declarative tools
# "1-click" dataset with parameterized fields
bigmler --train data/diabetes.csv \
        --no-model \
        --name "4-featured diabetes" \
        --dataset-fields \
        "plasma glucose,insulin,diabetes pedigree,diabetes" \
        --output-dir output/diabetes \
        --project "Certification Workshop"
# "1-click" ensemble
bigmler --train data/iris.csv \
        --number-of-models 500 \
        --sample-rate 0.85 \
        --output-dir output/iris-ensemble \
        --project "Certification Workshop"
Rich, parameterized workflows: cross-validation
# Parameterized input: --dataset is the dataset to validate,
# --k-folds is the number of folds used during validation
bigmler analyze --cross-validation \
        --dataset $(cat output/diabetes/dataset) \
        --k-folds 3 \
        --output-dir output/diabetes-validation
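Conceptually, k-fold cross-validation splits the rows into k disjoint folds and evaluates k models, each trained on all folds but one. The sketch below shows only that partitioning idea; it is not how BigMLer implements the command, `k_fold_indices` is a hypothetical helper, and the interleaved fold assignment is an assumption (real tools typically shuffle rows first).

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k disjoint folds and yield
    (train_indices, test_indices) pairs, one per fold."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = sorted(
            row for i, fold in enumerate(folds) if i != held_out for row in fold
        )
        yield train, test


splits = list(k_fold_indices(n_rows=9, k=3))
# First split: train on rows [1, 2, 4, 5, 7, 8], test on rows [0, 3, 6]
```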
Client-side Machine Learning automation
Problems of client-side solutions
Complex: Too fine-grained, leaky abstractions
Cumbersome: Error handling, network issues
Hard to reuse: Tied to a single programming language
Hard to scale: Parallelization is again a problem
Hard to generalize: Declarative client tools hide complexity at the cost of flexibility
Hard to combine: Black–box tools cannot be easily integrated as parts of bigger client–side workflows
Hard to audit: Client–side development environments are complex and very hard to sandbox
Not enough automation
Not enough abstraction
The algorithmic complexity and computing resource management problems that had mostly been washed away are back!
Machine Learning Automation
Solution (scalability, reuse): Back to the server
Server–side automation: Scriptify
Solution (complexity, reuse): Domain–specific languages
In a Nutshell
1. Workflows reified as server–side, RESTful resources
2. Domain–specific language for ML workflow automation
Workflows as RESTful Resources
Library: Reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.
Script: Executable code that describes an actual workflow.
• Imports: List of libraries with code used by the script.
• Inputs: List of input values that parameterize the workflow.
• Outputs: List of values computed by the script and returned to the user.
Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
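The relationship between a script, its declared inputs and outputs, and an execution can be sketched as a small data model. This is only an illustration of the idea that a workflow runs once its set of inputs is complete; the `Script` class and `execute` function are hypothetical and do not mirror BigML's resource schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Script:
    """Hypothetical model of a workflow script with declared inputs and outputs."""
    name: str
    inputs: tuple          # names of the values that parameterize the workflow
    outputs: tuple         # names of the values returned to the user
    body: object           # callable taking the inputs as keyword arguments
    imports: tuple = ()    # libraries whose definitions the script uses


def execute(script, given_inputs):
    """Run a script only when the set of inputs is complete."""
    missing = [name for name in script.inputs if name not in given_inputs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    values = script.body(**{name: given_inputs[name] for name in script.inputs})
    return {name: values[name] for name in script.outputs}


addition = Script(
    name="addition",
    inputs=("a", "b"),
    outputs=("result",),
    body=lambda a, b: {"result": a + b},
)
outputs = execute(addition, {"a": 1, "b": 2})
```

Keeping the declaration of inputs and outputs separate from the body is what lets a script be stored, shared, and executed later with different parameters.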
Workflows as RESTful Resources: the bazaar
Workflows as RESTful Resources: metaprogramming
Resources that create resources that create resources that create resources that create
Different ways of executing WhizzML Scripts
• Web UI
• BigMLer
• Bindings
→ Executions
Executing WhizzML scripts: bindings
from bigml.api import BigML
api = BigML()
# choose workflow
script = 'script/567b4b5be3f2a123a690ff56'
# define parameters
inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
# execute
api.ok(api.create_execution(script, inputs))
Creating and executing WhizzML scripts with BigMLer
bigmler execute --code "(+ 1 2)" --output-dir simple_exe
bigmler execute --script script/50a2bb64035d0706db000643
bigmler execute --script script/50a2bb64035d0706db000643 \
        --inputs my_inputs.json
bigmler execute --code '(define addition (+ a b))' \
        --declare-inputs my_inputs_dec.json \
        --declare-outputs my_outputs_dec.json \
        --no-execute
Example workflow: Python bindings
from bigml.api import BigML
api = BigML()
source = 'source/5643d345f43a234ff2310a3e'
dataset = api.create_dataset(source)
api.ok(dataset)
r, s = 0.8, "seed"
train_dataset = api.create_dataset(dataset, {"rate": r, "seed": s})
test_dataset = api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True})
api.ok(train_dataset)
model = api.create_model(train_dataset)
api.ok(model)
api.ok(test_dataset)
evaluation = api.create_evaluation(model, test_dataset)
api.ok(evaluation)
Syntactic Abstraction in WhizzML: Simple workflow
;; ML artifacts are first-class citizens,
;; we only need to talk about our domain
(let ([train-id test-id] (create-dataset-split id 0.8)
      model-id (create-model train-id))
  (create-evaluation test-id
                     model-id
                     {"name" "Evaluation 80/20"
                      "missing_strategy" 0}))
Ready for production!
Domain Specificity and Scalability: Trivial parallelization
;; Workflow for 1 resource
(let ([train-id test-id] (create-dataset-split id 0.8)
      model-id (create-model train-id))
  (create-evaluation test-id model-id))
Domain Specificity and Scalability: Trivial parallelization
;; Workflow for arbitrary number of resources
(let (splits (for (id input-datasets)
               (create-dataset-split id 0.8)))
  (for (split splits)
    (create-evaluation (create-model (split 0)) (split 1))))
Ready for production!
ML workflows for real
Syntactic Abstraction in WhizzML: Simple workflow
(let (score (create-anomalyscore anomaly-id input))
  (if (> score threshold)
    (raise "Input is too weird to predict")
    (create-prediction model-id input)))
Ready for production!
Domain Specificity and Scalability: Trivial parallelization
(for (input inputs)
  (when (< (create-anomalyscore anomaly-id input) threshold)
    (create-prediction model-id input)))
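The same gating logic, written as plain Python for readers less familiar with WhizzML: predict only for inputs whose anomaly score stays below a threshold. This is a sketch under stated assumptions, not the bindings' API; `predict_if_typical`, `toy_score`, and `toy_predict` are hypothetical stand-ins for an anomaly detector and a model.

```python
def predict_if_typical(inputs, anomaly_score, predict, threshold):
    """Predict only for inputs whose anomaly score is below the threshold.

    Returns one entry per input: the prediction, or None when the input was
    judged too anomalous to predict reliably.
    """
    results = []
    for row in inputs:
        if anomaly_score(row) < threshold:
            results.append(predict(row))
        else:
            results.append(None)
    return results


# Hypothetical stand-ins for an anomaly detector and a model.
def toy_score(row):
    return min(1.0, abs(row["size"] - 5) / 10)


def toy_predict(row):
    return "malignant" if row["size"] > 6 else "benign"


rows = [{"size": 4}, {"size": 8}, {"size": 40}]
predictions = predict_if_typical(rows, toy_score, toy_predict, threshold=0.6)
```

The third row scores above the threshold, so it is skipped rather than given an unreliable prediction.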
Ready for production!
Package and deploy BigML workflows in a few clicks
1 Create an Application
2 Connect to BigML and add workflows and models
3 Package everything in a container
4 Deploy and monitor your application

Contenu connexe

Similaire à DutchMLSchool 2022 - Automation

VSSML17 L7. REST API, Bindings, and Basic Workflows
VSSML17 L7. REST API, Bindings, and Basic WorkflowsVSSML17 L7. REST API, Bindings, and Basic Workflows
VSSML17 L7. REST API, Bindings, and Basic WorkflowsBigML, Inc
 
MLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning WorkflowsMLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning WorkflowsBigML, Inc
 
2021 06 19 ms student ambassadors nigeria ml net 01 slide-share
2021 06 19 ms student ambassadors nigeria ml net 01   slide-share2021 06 19 ms student ambassadors nigeria ml net 01   slide-share
2021 06 19 ms student ambassadors nigeria ml net 01 slide-shareBruno Capuano
 
2021 02 23 MVP Fusion Getting Started with Machine Learning.Net and AutoML
2021 02 23 MVP Fusion Getting Started with Machine Learning.Net and AutoML2021 02 23 MVP Fusion Getting Started with Machine Learning.Net and AutoML
2021 02 23 MVP Fusion Getting Started with Machine Learning.Net and AutoMLBruno Capuano
 
The importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systemsThe importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systemsFrancesca Lazzeri, PhD
 
What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?Matei Zaharia
 
Clipper at UC Berkeley RISECamp 2017
Clipper at UC Berkeley RISECamp 2017Clipper at UC Berkeley RISECamp 2017
Clipper at UC Berkeley RISECamp 2017Dan Crankshaw
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101QuantUniversity
 
2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challanges2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challangesIvica Crnkovic
 
Proposal for google summe of code 2016
Proposal for google summe of code 2016 Proposal for google summe of code 2016
Proposal for google summe of code 2016 Mahesh Dananjaya
 
Data ops: Machine Learning in production
Data ops: Machine Learning in productionData ops: Machine Learning in production
Data ops: Machine Learning in productionStepan Pushkarev
 
VSSML18. REST API and Bindings
VSSML18. REST API and BindingsVSSML18. REST API and Bindings
VSSML18. REST API and BindingsBigML, Inc
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Benjamin Bengfort
 
Mining attributes
Mining attributesMining attributes
Mining attributesSandra Alex
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemDatabricks
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
Automated machine learning - Global AI night 2019
Automated machine learning - Global AI night 2019Automated machine learning - Global AI night 2019
Automated machine learning - Global AI night 2019Marco Zamana
 
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...Databricks
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 

Similaire à DutchMLSchool 2022 - Automation (20)

VSSML17 L7. REST API, Bindings, and Basic Workflows
VSSML17 L7. REST API, Bindings, and Basic WorkflowsVSSML17 L7. REST API, Bindings, and Basic Workflows
VSSML17 L7. REST API, Bindings, and Basic Workflows
 
MLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning WorkflowsMLSD18. Automating Machine Learning Workflows
MLSD18. Automating Machine Learning Workflows
 
2021 06 19 ms student ambassadors nigeria ml net 01 slide-share
2021 06 19 ms student ambassadors nigeria ml net 01   slide-share2021 06 19 ms student ambassadors nigeria ml net 01   slide-share
2021 06 19 ms student ambassadors nigeria ml net 01 slide-share
 
2021 02 23 MVP Fusion Getting Started with Machine Learning.Net and AutoML
2021 02 23 MVP Fusion Getting Started with Machine Learning.Net and AutoML2021 02 23 MVP Fusion Getting Started with Machine Learning.Net and AutoML
2021 02 23 MVP Fusion Getting Started with Machine Learning.Net and AutoML
 
The importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systemsThe importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systems
 
What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?
 
Clipper at UC Berkeley RISECamp 2017
Clipper at UC Berkeley RISECamp 2017Clipper at UC Berkeley RISECamp 2017
Clipper at UC Berkeley RISECamp 2017
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
 
2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challanges2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challanges
 
Proposal for google summe of code 2016
Proposal for google summe of code 2016 Proposal for google summe of code 2016
Proposal for google summe of code 2016
 
Data ops: Machine Learning in production
Data ops: Machine Learning in productionData ops: Machine Learning in production
Data ops: Machine Learning in production
 
VSSML18. REST API and Bindings
VSSML18. REST API and BindingsVSSML18. REST API and Bindings
VSSML18. REST API and Bindings
 
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel...
 
Mining attributes
Mining attributesMining attributes
Mining attributes
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving System
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Automated machine learning - Global AI night 2019
Automated machine learning - Global AI night 2019Automated machine learning - Global AI night 2019
Automated machine learning - Global AI night 2019
 
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
 
Introduction to ML.NET
Introduction to ML.NETIntroduction to ML.NET
Introduction to ML.NET
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 

Plus de BigML, Inc

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingBigML, Inc
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceBigML, Inc
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesBigML, Inc
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector BigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionBigML, Inc
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLBigML, Inc
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyBigML, Inc
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorBigML, Inc
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsBigML, Inc
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsBigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleBigML, Inc
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIBigML, Inc
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object DetectionBigML, Inc
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image ProcessingBigML, Inc
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureBigML, Inc
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorBigML, Inc
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotBigML, Inc
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...BigML, Inc
 
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceBigML, Inc
 

Plus de BigML, Inc (20)

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly Detection
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven Company
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal Sector
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe Stadiums
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Automation

  • 1. July 4 - 6, 2022, 2nd Edition
  • 2. BigML, Inc #DutchMLSchool. The road to production: automating and deploying Machine Learning projects. jao, CTO, BigML
  • 3. Outline: 1. ML as a system service; 2. ML as a RESTful cloudy service; 3. Machine Learning workflows; 4. Client–side automation; 5. Server–side workflow automation; 6. A first taste of WhizzML: abstraction is back; 7. And back to the (distributed) client: BigMLOps
  • 4. Machine Learning as a System Service. The goal: Machine Learning as a system level service, with accessibility, integrability, automation, and ease of use.
  • 6. Machine Learning as a System Service. The means: APIs as ML building blocks; an abstraction layer over feature engineering; an abstraction layer over algorithms; automation.
  • 10. RESTful done right: Whitebox resources. Your data, your model; model reverse engineering becomes moot; maximizes reach (Web, CLI, desktop, IoT).
  • 11. RESTful-ish ML Services: excellent abstraction layer; transparent data model; immutable resources and UUIDs, which give traceability; simple yet effective interaction model; easy access from any language (API bindings). Algorithmic complexity and computing resources management problems are mostly washed away.
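To make the "simple interaction model" concrete, here is a minimal sketch of what creating a resource over raw HTTP could look like. It only builds the request and never sends it. The base URL, the `username=...;api_key=...` query format, and the helper name `build_create_request` are assumptions made for illustration, not taken from the slides; the source id is the one used in the deck's later examples.

```python
import json
import urllib.request

# Assumed base URL of the service (not stated in the slides).
BIGML_URL = "https://bigml.io"


def build_create_request(resource_type, payload, username, api_key):
    """Build (without sending) a POST request that would create a resource.

    Hypothetical helper: every resource type gets the same treatment,
    a JSON body POSTed to /<resource_type> with credentials in the query.
    """
    url = f"{BIGML_URL}/{resource_type}?username={username};api_key={api_key}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Creating a dataset from an existing source: the new resource is described
# entirely by the id of the resource it derives from.
request = build_create_request(
    "dataset",
    {"source": "source/5643d345f43a234ff2310a3e"},
    username="alice",           # placeholder credentials
    api_key="not-a-real-key",
)
```

Because each resource is immutable and identified by its id, the body of a creation request is just a reference to its parent plus optional parameters, which is what makes workflows traceable.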
  • 15. Textbook Machine Learning workflows. Source: Dr. Natalia Konstantinova (http://nkonst.com/machine-learning-explained-simple-words/)
  • 16. ML workflows for real
  • 20. Tumor detection using anomalies. Given data about a tumor: extract the relevant features that characterize it (unsupervised learning), then classify the tumor as either benign or malignant, improving diagnosis and avoiding unnecessary surgery. Example: University of Wisconsin Hospital's Cancer dataset, https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/
  • 21. Tumor detection using anomalies: workflow
  • 25. Tumor detection using anomalies: Evaluation. Is the anomaly score a good predictor in real cases?
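One way to approach the evaluation question is to treat "anomaly score above a threshold" as a prediction of malignancy and count how often it agrees with known labels. The sketch below does that on a handful of made-up (score, label) pairs; the data, the threshold, and the function names are purely illustrative assumptions and are not results from the Wisconsin dataset.

```python
def confusion_at_threshold(scored, threshold):
    """Count agreement between 'score > threshold' and the true label.

    `scored` is a list of (anomaly_score, is_malignant) pairs.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for score, is_malignant in scored:
        flagged = score > threshold
        if flagged and is_malignant:
            counts["tp"] += 1
        elif flagged and not is_malignant:
            counts["fp"] += 1
        elif not flagged and is_malignant:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall(counts):
    """Precision and recall of the 'flagged as malignant' decision."""
    flagged = counts["tp"] + counts["fp"]
    malignant = counts["tp"] + counts["fn"]
    precision = counts["tp"] / flagged if flagged else 0.0
    recall = counts["tp"] / malignant if malignant else 0.0
    return precision, recall


# Toy, made-up data: (anomaly score, tumor is malignant).
toy_scores = [(0.9, True), (0.8, True), (0.7, False),
              (0.4, True), (0.3, False), (0.2, False)]
counts = confusion_at_threshold(toy_scores, threshold=0.6)
precision, recall = precision_recall(counts)
```

Sweeping the threshold and repeating the count is a simple way to see whether any cut-off separates the two classes well enough to be useful.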
  • 26. Tumor detection using anomalies: Automation?
  • 29. (Non) automation via Web UI. Strengths of Web UI: Simple (just clicking around); Discoverable (exploration and experimenting); Abstract (transparent error handling and scalability). Problems of Web UI: Only simple (simple tasks are simple, hard tasks quickly get hard); No automation or batch operations (clicking humans don't scale well).
  • 31. Abstracting over raw HTTP: bindings
  • 33. Example workflow: Python bindings
        from bigml.api import BigML

        api = BigML()  # credentials are read from the environment
        source = 'source/5643d345f43a234ff2310a3e'
        dataset = api.create_dataset(source)
        api.ok(dataset)  # block until the dataset is finished
        r, s = 0.8, "seed"
        # Same rate and seed: the out-of-bag sample is the complement of the training sample
        train_dataset = api.create_dataset(dataset, {"rate": r, "seed": s})
        test_dataset = api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True})
        api.ok(train_dataset)
        model = api.create_model(train_dataset)
        api.ok(model)
        api.ok(test_dataset)
        evaluation = api.create_evaluation(model, test_dataset)
        api.ok(evaluation)
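The train/test split in this workflow relies on sampling the same dataset twice with the same rate and seed, once normally and once "out of bag". The following sketch illustrates why that yields two complementary sets. It is a local illustration using Python's `random` module, not the service's actual sampling algorithm, and `sample_rows` is a hypothetical helper.

```python
import random


def sample_rows(n_rows, rate, seed, out_of_bag=False):
    """Return the row indices kept by a deterministic, seeded sample.

    With the same `rate` and `seed`, the out-of-bag call returns exactly
    the rows that the regular call left out.
    """
    rng = random.Random(seed)  # same seed, same sequence of draws
    picked = [rng.random() < rate for _ in range(n_rows)]
    return [i for i, keep in enumerate(picked) if keep != out_of_bag]


train_rows = sample_rows(1000, 0.8, "seed")
test_rows = sample_rows(1000, 0.8, "seed", out_of_bag=True)
```

Since both calls replay the same random draws, no row can end up in both sets and no row is lost, which is the property the evaluation depends on.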
  • 34. Is this production code? How do we generalize to, say, 100 datasets?
  • 35. Example workflow: Python bindings
        # Now do it 100 times, serially
        model, evaluation = [], []
        for i in range(0, 100):
            r, s = 0.8, i
            train = api.create_dataset(dataset, {"rate": r, "seed": s})
            test = api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True})
            api.ok(train)
            model.append(api.create_model(train))
            api.ok(model[i])
            api.ok(test)
            evaluation.append(api.create_evaluation(model[i], test))
            api.ok(evaluation[i])
  • 36. Example workflow: Python bindings
        # More efficient if we parallelize, but at what level?
        train, test, model, evaluation = [], [], [], []
        for i in range(0, 100):
            r, s = 0.8, i
            train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
            test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
            # Do we wait here?
            api.ok(train[i])
            api.ok(test[i])
        for i in range(0, 100):
            model.append(api.create_model(train[i]))
            api.ok(model[i])
        for i in range(0, 100):
            evaluation.append(api.create_evaluation(model[i], test[i]))
            api.ok(evaluation[i])
  • 37. Example workflow: Python bindings
        # More efficient if we parallelize, but at what level?
        train, test, model, evaluation = [], [], [], []
        for i in range(0, 100):
            r, s = 0.8, i
            train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
            test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
        for i in range(0, 100):
            # Or do we wait here?
            api.ok(train[i])
            model.append(api.create_model(train[i]))
        for i in range(0, 100):
            # and here?
            api.ok(model[i])
            api.ok(test[i])
            evaluation.append(api.create_evaluation(model[i], test[i]))
            api.ok(evaluation[i])
  • 38. Example workflow: Python bindings
        # More efficient if we parallelize, but how do we handle errors?
        train, test, model, evaluation = [], [], [], []
        for i in range(0, 100):
            r, s = 0.8, i
            train.append(api.create_dataset(dataset, {"rate": r, "seed": s}))
            test.append(api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True}))
        for i in range(0, 100):
            api.ok(train[i])
            model.append(api.create_model(train[i]))
        for i in range(0, 100):
            try:
                api.ok(model[i])
                api.ok(test[i])
                evaluation.append(api.create_evaluation(model[i], test[i]))
                api.ok(evaluation[i])
            except Exception:
                # How to recover if test[i] is failed? New datasets? Abort?
                raise
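One possible answer to "at what level do we parallelize, and what about errors?" is to treat each split, model, evaluation chain as a single unit of work, run the units concurrently, and record failures per unit instead of aborting everything. The sketch below shows that shape with a thread pool. `FakeAPI` is a hypothetical in-memory stand-in so the example runs without any network access; it is not the real bindings and its method names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeAPI:
    """Hypothetical in-memory stand-in for the bindings; makes no network calls."""

    def __init__(self, failing_seeds=()):
        self.failing_seeds = set(failing_seeds)

    def create_split(self, dataset, rate, seed):
        # Simulate a server-side failure for selected seeds.
        if seed in self.failing_seeds:
            raise RuntimeError(f"split with seed {seed} failed")
        return f"{dataset}/train-{seed}", f"{dataset}/test-{seed}"

    def create_model(self, train):
        return f"model-for:{train}"

    def create_evaluation(self, model, test):
        return f"evaluation({model}, {test})"


def run_one(api, dataset, seed, rate=0.8):
    """Run the whole split -> model -> evaluation chain for one seed."""
    train, test = api.create_split(dataset, rate, seed)
    model = api.create_model(train)
    return api.create_evaluation(model, test)


def run_all(api, dataset, seeds, max_workers=8):
    """Run the chains concurrently; collect results and failures separately."""
    evaluations, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {seed: pool.submit(run_one, api, dataset, seed) for seed in seeds}
        for seed, future in futures.items():
            try:
                evaluations[seed] = future.result()
            except RuntimeError as error:
                # One failed chain does not stop the others.
                failures[seed] = str(error)
    return evaluations, failures


evaluations, failures = run_all(FakeAPI(failing_seeds={3}), "dataset/demo", range(10))
```

Even in this small form, the pool size, the retry policy, and the bookkeeping all live in client code and have nothing to do with the ML problem itself, which is the complexity the following slides complain about.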
  • 39. Client-side Machine Learning Automation. Problems of bindings-based, client solutions: Complexity (lots of details outside the problem domain); Reuse (no inter-language compatibility); Scalability (client-side workflows are hard to optimize); Reproducibility (noisy, complex and hard to audit development environment). Not enough abstraction.
  • 40. A partial solution: CLI declarative tools
        # "1-click" dataset with parameterized fields
        bigmler --train data/diabetes.csv \
                --no-model \
                --name "4-featured diabetes" \
                --dataset-fields "plasma glucose,insulin,diabetes pedigree,diabetes" \
                --output-dir output/diabetes \
                --project "Certification Workshop"

        # "1-click" ensemble
        bigmler --train data/iris.csv \
                --number-of-models 500 \
                --sample-rate 0.85 \
                --output-dir output/iris-ensemble \
                --project "Certification Workshop"
  • 41. Rich, parameterized workflows: cross-validation
        # Parameterized input: the dataset id is read from a previous output directory;
        # --k-folds sets the number of folds used during validation
        bigmler analyze --cross-validation \
                        --dataset $(cat output/diabetes/dataset) \
                        --k-folds 3 \
                        --output-dir output/diabetes-validation
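For readers unfamiliar with what a k-fold cross-validation does under the hood, here is a minimal sketch of the partitioning step: the rows are divided into k folds, and each fold serves once as the test set while the others are used for training. This illustrates the general technique only; it makes no claim about how bigmler builds its folds, and `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_rows, k):
    """Yield (train, test) row-index lists, one pair per fold."""
    # Assign rows to folds round-robin: row i goes to fold i % k.
    folds = [list(range(start, n_rows, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = sorted(
            row
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for row in fold
        )
        yield train, test


# Three folds over nine rows, matching the --k-folds 3 setting above.
splits = list(k_fold_indices(9, 3))
```

Each of the k (train, test) pairs then goes through the usual model and evaluation steps, and the k evaluations are averaged.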
  • 42. Client-side Machine Learning automation. Problems of client-side solutions: Hard to generalize (declarative client tools hide complexity at the cost of flexibility); Hard to combine (black–box tools cannot be easily integrated as parts of bigger client–side workflows); Hard to audit (client–side development environments are complex and very hard to sandbox). Not enough automation.
  • 43. Client-side Machine Learning automation. Problems of client-side solutions: Complex (too fine-grained, leaky abstractions); Cumbersome (error handling, network issues); Hard to reuse (tied to a single programming language); Hard to scale (parallelization is again a problem); plus the three problems of declarative tools: hard to generalize, hard to combine, hard to audit. Not enough abstraction.
  • 45. Client-side Machine Learning automation. Algorithmic complexity and computing resources management problems are back!
  • 48. Solution (scalability, reuse): Back to the server
  • 51. Solution (complexity, reuse): Domain–specific languages
  • 52. In a Nutshell: 1. Workflows reified as server–side, RESTful resources. 2. Domain–specific language for ML workflow automation.
  • 53. Workflows as RESTful Resources. Library: a reusable building block, a collection of WhizzML definitions that can be imported by other libraries or scripts. Script: executable code that describes an actual workflow, with Imports (list of libraries with code used by the script), Inputs (list of input values that parameterize the workflow), and Outputs (list of values computed by the script and returned to the user). Execution: given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
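The relationship between these three resources can be sketched as a small data model: a script declares its imports, inputs, and outputs, and an execution is only possible once every declared input has a value. The class and function names below are hypothetical and chosen for illustration; they do not mirror the service's actual resource schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Script:
    """Hypothetical model of a script resource: code plus its declared interface."""
    source_code: str
    inputs: tuple = ()    # names of values that parameterize the workflow
    outputs: tuple = ()   # names of values returned to the user
    imports: tuple = ()   # ids of libraries whose definitions the code uses


def missing_inputs(script, provided):
    """Names of declared inputs that have no value yet."""
    return [name for name in script.inputs if name not in provided]


def can_execute(script, provided):
    """An execution needs a script and a complete set of inputs."""
    return not missing_inputs(script, provided)


evaluate_script = Script(
    source_code="(create-evaluation model-id test-id)",
    inputs=("model-id", "test-id"),
    outputs=("evaluation-id",),
)
partial = {"model-id": "model/123"}
complete = {"model-id": "model/123", "test-id": "dataset/456"}
```

Declaring the interface up front is what lets the server validate an execution request before any resource is created.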
  • 54. Workflows as RESTful Resources: the bazaar
  • 55. Workflows as RESTful Resources: metaprogramming. Resources that create resources that create resources that create resources that create
  • 56. Different ways of executing WhizzML scripts: Web UI, BigMLer, Bindings → Executions
  • 57. Executing WhizzML scripts: bindings
        from bigml.api import BigML

        api = BigML()
        # choose workflow
        script = 'script/567b4b5be3f2a123a690ff56'
        # define parameters
        inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
        # execute
        api.ok(api.create_execution(script, inputs))
  • 58. Creating and executing WhizzML scripts with BigMLer
        bigmler execute --code "(+ 1 2)" --output-dir simple_exe
        bigmler execute --script script/50a2bb64035d0706db000643
        bigmler execute --script script/50a2bb64035d0706db000643 \
                        --inputs my_inputs.json
        bigmler execute --code '(define addition (+ a b))' \
                        --declare-inputs my_inputs_dec.json \
                        --declare-outputs my_outputs_dec.json \
                        --no-execute
  • 60. Example workflow: Python bindings
        from bigml.api import BigML

        api = BigML()
        source = 'source/5643d345f43a234ff2310a3e'
        dataset = api.create_dataset(source)
        api.ok(dataset)
        r, s = 0.8, "seed"
        train_dataset = api.create_dataset(dataset, {"rate": r, "seed": s})
        test_dataset = api.create_dataset(dataset, {"rate": r, "seed": s, "out_of_bag": True})
        api.ok(train_dataset)
        model = api.create_model(train_dataset)
        api.ok(model)
        api.ok(test_dataset)
        evaluation = api.create_evaluation(model, test_dataset)
        api.ok(evaluation)
  • 62. Syntactic Abstraction in WhizzML: Simple workflow
        ;; ML artifacts are first-class citizens,
        ;; we only need to talk about our domain
        (let ([train-id test-id] (create-dataset-split id 0.8)
              model-id (create-model train-id))
          (create-evaluation test-id
                             model-id
                             {"name" "Evaluation 80/20"
                              "missing_strategy" 0}))
    Ready for production!
  • 63. Domain Specificity and Scalability: Trivial parallelization
        ;; Workflow for 1 resource
        (let ([train-id test-id] (create-dataset-split id 0.8)
              model-id (create-model train-id))
          (create-evaluation test-id model-id))
  • 65. Domain Specificity and Scalability: Trivial parallelization
        ;; Workflow for arbitrary number of resources
        (let (splits (for (id input-datasets)
                       (create-dataset-split id 0.8)))
          (for (split splits)
            (create-evaluation (create-model (split 0)) (split 1))))
    Ready for production!
  • 66. ML workflows for real
  • 67. Syntactic Abstraction in WhizzML: Simple workflow
        (let (score (create-anomalyscore anomaly-id input))
          (if (> score threshold)
            (raise "Input is too weird to predict")
            (create-prediction model-id input)))
    Ready for production!
  • 68. Domain Specificity and Scalability: Trivial parallelization
        ;; Predict only for inputs whose anomaly score is below the threshold
        (for (input inputs)
          (when (< (create-anomalyscore anomaly-id input) threshold)
            (create-prediction model-id input)))
    Ready for production!
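The anomaly-gated prediction pattern from the last two slides can be mirrored in plain Python to make the control flow explicit: score the input first, and only ask the model for a prediction when the input looks normal. The scoring and prediction functions below are toy stand-ins with made-up rules, and all names are hypothetical; in the deck these steps are server-side resources, not local functions.

```python
def predict_if_normal(row, score_fn, predict_fn, threshold):
    """Single input: refuse to predict when the anomaly score is too high."""
    score = score_fn(row)
    if score > threshold:
        raise ValueError("Input is too weird to predict")
    return predict_fn(row)


def predict_batch(rows, score_fn, predict_fn, threshold):
    """Many inputs: silently skip the anomalous ones, predict the rest."""
    return [predict_fn(row) for row in rows if score_fn(row) < threshold]


# Toy stand-ins for the anomaly detector and the model (made-up rules).
def toy_score(row):
    return abs(row["size"] - 5) / 10


def toy_predict(row):
    return "malignant" if row["size"] > 5 else "benign"


rows = [{"size": 4}, {"size": 6}, {"size": 40}]
predictions = predict_batch(rows, toy_score, toy_predict, threshold=0.6)
```

The single-input version fails loudly while the batch version filters, which matches the difference between the `raise` and `when` forms in the two slides.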
  • 70. BigMLOps: package and deploy BigML workflows in a few clicks. 1. Create an Application. 2. Connect to BigML and add workflows and models. 3. Package everything in a container. 4. Deploy and monitor your application.