Brazilian Summer School in Machine Learning 2016
Day 2 - Lecture 3: REST API, Bindings, and Basic Workflows
Lecturer: Dr. José Antonio Ortega - jao (BigML)
Automating ML API, BigMLer and Basic Workflows
1. Automating Machine Learning
API, bindings, BigMLer and Basic Workflows
#BSSML16
December 2016
#BSSML16 Automating Machine Learning December 2016 1 / 29
2. Outline
1 Introduction: ML as a System Service
2 ML as a RESTful Cloudy Service
3 Client-side workflows: REST API and bindings
4 Client-side workflows: BigMLer
4. Machine Learning as a System Service
The goal
Machine Learning as a system-level service
The means
• APIs: ML building blocks
• Abstraction layer over feature engineering
• Abstraction layer over algorithms
• Automation
10. RESTful-ish ML Services
• Excellent abstraction layer
• Transparent data model
• Immutable resources and UUIDs: traceability
• Simple yet effective interaction model
• Easy access from any language (API bindings)
Algorithmic complexity and computing resources management problems mostly washed away
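To make the traceability point concrete: because every resource is immutable and has its own identifier, a derived resource can always be traced back to the data it was built from. The sketch below is only an illustration. It uses a hypothetical in-memory dictionary in place of the remote service, and the `parents` field and the `lineage` helper are assumptions, not part of any API.

```python
# Hypothetical in-memory stand-in for a store of immutable resources.
# Each key has the form "<type>/<id>"; "parents" is an assumed field
# listing the resources that a given resource was built from.
RESOURCES = {
    "source/a1": {"parents": []},
    "dataset/b2": {"parents": ["source/a1"]},
    "cluster/c3": {"parents": ["dataset/b2"]},
    "batchcentroid/d4": {"parents": ["cluster/c3", "dataset/b2"]},
}


def resource_type(resource_id):
    """Return the type prefix of an identifier such as 'dataset/b2'."""
    return resource_id.split("/", 1)[0]


def lineage(resource_id, store=RESOURCES):
    """List every ancestor of a resource, nearest first, without repeats."""
    seen = []
    pending = list(store[resource_id]["parents"])
    while pending:
        current = pending.pop(0)
        if current not in seen:
            seen.append(current)
            pending.extend(store[current]["parents"])
    return seen


if __name__ == "__main__":
    print(lineage("batchcentroid/d4"))
```

Since resources never change after creation, a lineage computed once stays valid, which is what makes this kind of audit trail cheap.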
11. RESTful done right: Whitebox resources
• Your data, your model
• Model reverse engineering becomes moot
• Maximizes reach (Web, CLI, desktop, IoT)
14. Example workflow: Batch Centroid
Objective: Label each row in a Dataset with its associated centroid.
We need to...
• Create Dataset
• Create Cluster
• Create BatchCentroid from Cluster and Dataset
• Save BatchCentroid as new Dataset
15. Example workflow: building blocks
curl -X POST "https://bigml.io/dataset?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"source": "source/56fbbfea200d5a3403000db7"}'
curl -X POST "https://bigml.io/cluster?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65"}'
curl -X POST "https://bigml.io/batchcentroid?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65",
          "cluster": "cluster/33e2e231a34fff333000b65"}'
curl -X GET "https://bigml.io/dataset/1234ff45eab8c0034334?$AUTH"
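The same building blocks can be expressed without curl. The following is a minimal sketch using only the Python standard library. It builds the HTTP requests but does not send them. The helper names `creation_request` and `retrieval_request` are hypothetical, the credentials are placeholders, and the sketch assumes the resource path comes before the authentication query string.

```python
import json
import urllib.request

BASE_URL = "https://bigml.io"


def creation_request(resource_type, payload, auth):
    """Build (but do not send) a POST request that creates a resource."""
    url = f"{BASE_URL}/{resource_type}?{auth}"
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def retrieval_request(resource_id, auth):
    """Build (but do not send) a GET request for an existing resource."""
    url = f"{BASE_URL}/{resource_id}?{auth}"
    return urllib.request.Request(url, method="GET")


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    auth = "username=alice;api_key=secret"
    request = creation_request(
        "cluster", {"dataset": "dataset/43ffe231a34fff333000b65"}, auth
    )
    print(request.get_method(), request.full_url)
    # Sending would be: urllib.request.urlopen(request)
```

Every step of the workflow has the same shape: one POST per resource to create, followed by GETs on the returned identifier until the resource is ready.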
16. Example workflow: Web UI
18. Example workflow: Python bindings
from bigml.api import BigML
api = BigML()
source = 'source/5643d345f43a234ff2310a3e'
# create dataset and cluster, waiting for both
dataset = api.create_dataset(source)
api.ok(dataset)
cluster = api.create_cluster(dataset)
api.ok(cluster)
# create a batch centroid with output to dataset
centroid = api.create_batch_centroid(cluster, dataset,
                                     {'output_dataset': True,
                                      'all_fields': True})
api.ok(centroid)
# wait again, via polling, until the dataset is finished
batch_dataset_id = centroid['object']['output_dataset_resource']
batch_dataset = api.get_dataset(batch_dataset_id)
api.ok(batch_dataset)
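Each "wait" in the workflow above hides a polling loop. As a rough illustration of what that involves, here is a generic sketch. It is not the bindings' implementation. The `wait_until_finished` helper, the `fetch` callable, the status codes, and the `object.status.code` layout are all assumptions made for this example.

```python
import time

FINISHED = 5   # assumed status code for a finished resource
FAULTY = -1    # assumed status code for a failed resource


def wait_until_finished(fetch, resource_id, interval=1.0, max_tries=10,
                        sleep=time.sleep):
    """Poll fetch(resource_id) until the resource is finished or faulty.

    `fetch` is any callable returning a dict shaped like
    {"object": {"status": {"code": <int>}}} (an assumed layout).
    """
    for _ in range(max_tries):
        resource = fetch(resource_id)
        code = resource["object"]["status"]["code"]
        if code == FINISHED:
            return resource
        if code == FAULTY:
            raise RuntimeError(f"{resource_id} failed")
        sleep(interval)
    raise TimeoutError(f"{resource_id} not finished after {max_tries} tries")


if __name__ == "__main__":
    # Fake fetcher that reports progress on successive calls.
    codes = iter([1, 3, FINISHED])

    def fake_fetch(resource_id):
        return {"resource": resource_id,
                "object": {"status": {"code": next(codes)}}}

    done = wait_until_finished(fake_fetch, "dataset/abc", interval=0.0)
    print(done["object"]["status"]["code"])
```

Even this small loop has to decide on intervals, retry limits, and failure handling, none of which belongs to the clustering problem itself.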
19. Client-side automation via bindings
Strengths of bindings-based solutions
• Versatility: Maximum flexibility and possibility of encapsulation (via proper engineering)
• Native: Easy to support any programming language
• Offline: Whitebox models allow local use of resources (e.g., real-time predictions)
20. Client-side automation via bindings
Strengths of bindings-based solutions
from bigml.model import Model
model_id = 'model/5643d345f43a234ff2310a3e'
# Download of (whitebox) resource
local_model = Model(model_id)
# Purely local calculations
local_model.predict({'plasma glucose': 132})
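To show why a whitebox model allows purely local calculations, here is a simplified sketch of predicting with a downloaded decision tree. The tree layout, the threshold, and the labels are hypothetical. They are not the format or values of any real model, and the `predict` function is an illustration rather than the bindings' code.

```python
# Hypothetical, simplified tree layout: internal nodes compare one numeric
# field against a threshold; leaves carry the predicted label.
TREE = {
    "field": "plasma glucose",
    "threshold": 123.5,                   # made-up split point
    "left": {"prediction": "negative"},   # taken when value <= threshold
    "right": {"prediction": "positive"},  # taken when value > threshold
}


def predict(tree, inputs):
    """Walk the tree from the root to a leaf using only local data."""
    node = tree
    while "prediction" not in node:
        value = inputs[node["field"]]
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["prediction"]


if __name__ == "__main__":
    print(predict(TREE, {"plasma glucose": 132}))
```

Once the model structure is on the client, a prediction is a handful of comparisons with no network round trip, which is what makes real-time use feasible.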
21. Client-side automation via bindings
Problems of bindings-based solutions
• Complexity: Lots of details outside the problem domain
• Reuse: No inter-language compatibility
• Scalability: Client-side workflows are hard to optimize
Not enough abstraction
29. Client-side Machine Learning Automation
Problems of client-side solutions
• Complex: Too fine-grained, leaky abstractions
• Cumbersome: Error handling, network issues
• Hard to reuse: Tied to a single programming language
• Hard to scale: Parallelization is again a problem
• Hard to generalize: CLI tools like bigmler hide complexity at the cost of flexibility
Algorithmic complexity and computing resources management problems are back!
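The "cumbersome" item is easy to underestimate. As an illustration, here is a minimal sketch of the retry logic a client-side workflow typically has to wrap around every remote call. The `call_with_retries` helper, its defaults, and its choice of exceptions are assumptions for this example and do not come from any library.

```python
import time


def call_with_retries(operation, retries=3, base_delay=0.5, sleep=time.sleep,
                      retry_on=(ConnectionError, TimeoutError)):
    """Run operation(), retrying transient failures with exponential backoff.

    The delay doubles after each failed attempt; the last failure is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky():
        # Fails twice with a simulated network error, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("simulated network issue")
        return "created"

    print(call_with_retries(flaky, base_delay=0.0))
```

Code like this has to be rewritten, or at least re-tested, in every language a workflow is ported to, which ties back to the reuse problem listed above.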