Valencian Summer School 2015
Day 1
Lecture 7
A developers’ overview of the world of predictive APIs
Louis Dorard (PAPIs.io)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2015
19.
                AMAZON   GOOGLE   PREDICSIS   BIGML
ACCURACY        0.862    0.743    0.858       0.790
TRAINING TIME   135s     76s      17s         5s
TEST TIME       188s     369s     5s          1s
louisdorard.com/blog/machine-learning-apis-comparison
42. Experiment on “ScienceCluster”
• Distributed jobs
• Collaborative workspace
• Serialize chosen model
Deploy model as API on “ScienceOps”
• Load balancing
• Auto scaling
• Monitoring (API calls, accuracy)
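As a rough illustration of the "serialize chosen model, then deploy it" hand-off, here is a minimal sketch using only the Python standard library. The toy nearest-centroid model and the function names `train_nearest_centroid` and `predict` are assumptions made for illustration; they are not part of ScienceCluster or ScienceOps, whose actual serialization format the slide does not describe.

```python
import json


def train_nearest_centroid(rows, labels):
    """Return per-class mean vectors (a tiny nearest-centroid model)."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: distance(model[label]))


# Experiment side: train, then serialize the chosen model to a JSON string.
model = train_nearest_centroid(
    [[0, 0], [0, 2], [10, 10], [10, 12]], ["a", "a", "b", "b"]
)
serialized = json.dumps(model)

# Deployment side: restore the model and answer a prediction request.
restored = json.loads(serialized)
prediction = predict(restored, [9, 9])
```

The serialized string is the only thing that crosses from the experiment environment to the serving environment, which is what makes the two stages separable.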
43. Your API endpoints
• 1 for serving predictions
• 1 for running ML experiment (i.e. train and evaluate models on given data)?
• 1 for deploying ML models?
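The three endpoint roles on this slide could be sketched as a small in-process dispatcher. Everything concrete here is an assumption for illustration: the paths `/experiments`, `/deployments` and `/predictions`, the payload fields, and the toy majority-class "model" are hypothetical, and a real service would sit behind an HTTP server.

```python
MODELS = {}       # model_id -> trained model (here: just a majority label)
DEPLOYED = set()  # ids of models that are allowed to serve predictions


def run_experiment(payload):
    """Train a toy majority-class model and report its training accuracy."""
    labels = payload["labels"]
    majority = max(sorted(set(labels)), key=labels.count)
    model_id = "model-%d" % (len(MODELS) + 1)
    MODELS[model_id] = majority
    return {"model_id": model_id, "accuracy": labels.count(majority) / len(labels)}


def deploy_model(payload):
    """Mark a previously trained model as available for serving."""
    model_id = payload["model_id"]
    if model_id not in MODELS:
        return {"error": "unknown model"}
    DEPLOYED.add(model_id)
    return {"deployed": model_id}


def serve_prediction(payload):
    """Return a prediction from a deployed model."""
    model_id = payload["model_id"]
    if model_id not in DEPLOYED:
        return {"error": "model not deployed"}
    return {"prediction": MODELS[model_id]}


# Hypothetical routes: the slide names the three roles, not the paths.
ROUTES = {
    ("POST", "/experiments"): run_experiment,
    ("POST", "/deployments"): deploy_model,
    ("POST", "/predictions"): serve_prediction,
}


def handle(method, path, payload):
    """Dispatch a request to the matching endpoint handler."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return {"error": "unknown endpoint"}
    return handler(payload)


result = handle("POST", "/experiments", {"labels": ["yes", "no", "yes"]})
handle("POST", "/deployments", {"model_id": result["model_id"]})
answer = handle("POST", "/predictions", {"model_id": result["model_id"]})
```

Keeping experiments, deployments and predictions as separate endpoints means a model can be evaluated without being exposed to prediction traffic.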
57. Open Source AutoML?!
• Spearmint: “Bayesian optimization” for tuning parameters → Whetlab → Twitter
• Auto-sklearn: “automated machine learning toolkit and drop-in replacement for a scikit-learn estimator”
• See automl.org and challenge
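58. Scikit Python
from sklearn import svm
from sklearn import datasets

# Support vector classifier with fixed hyperparameters
model = svm.SVC(gamma=0.001, C=100.)
digits = datasets.load_digits()
# Train on every sample except the last one
model.fit(digits.data[:-1], digits.target[:-1])
# predict expects a 2D array, so slice to keep the last sample as a single row
model.predict(digits.data[-1:])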
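60. AutoML Scikit
import autosklearn.classification
from sklearn import datasets

# Drop-in replacement for the scikit-learn estimator; in current
# auto-sklearn releases the class lives in autosklearn.classification
model = autosklearn.classification.AutoSklearnClassifier()
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])
# predict expects a 2D array, so slice to keep the last sample as a single row
model.predict(digits.data[-1:])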
62.
                AMAZON   GOOGLE   PREDICSIS   BIGML
ACCURACY        0.862    0.743    0.858       0.790
TRAINING TIME   135s     76s      17s         5s
TEST TIME       188s     369s     5s          1s
louisdorard.com/blog/machine-learning-apis-comparison
63. Automated Benchmark?!
• Requirements:
  • train/test splits on local machine
  • compute evaluation on local machine
• Solutions:
  • adapt bigmler and use local evaluations?
  • use scikit-learn framework?
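The two requirements (split locally, evaluate locally) can be sketched with the standard library alone. This is only a sketch under stated assumptions: `train_fn` and `predict_fn` are hypothetical stand-ins for whatever wraps a given prediction API, and the threshold "service" below is a toy, not any of the benchmarked products.

```python
import random
import time


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices locally and split into train and test portions."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx], [labels[i] for i in train_idx],
        [rows[i] for i in test_idx], [labels[i] for i in test_idx],
    )


def benchmark(train_fn, predict_fn, rows, labels):
    """Time training and prediction, and compute accuracy on the local machine."""
    x_train, y_train, x_test, y_test = train_test_split(rows, labels)
    start = time.perf_counter()
    model = train_fn(x_train, y_train)
    training_time = time.perf_counter() - start
    start = time.perf_counter()
    predicted = [predict_fn(model, row) for row in x_test]
    test_time = time.perf_counter() - start
    correct = sum(p == t for p, t in zip(predicted, y_test))
    return {
        "accuracy": correct / len(y_test),
        "training_time": training_time,
        "test_time": test_time,
    }


def train_threshold(rows, labels):
    """Toy 'service': threshold at the midpoint between the two class means."""
    pos = [r[0] for r, l in zip(rows, labels) if l == 1]
    neg = [r[0] for r, l in zip(rows, labels) if l == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, row):
    return 1 if row[0] > threshold else 0


# Two well-separated classes on a single feature.
rows = [[float(i)] for i in range(10)] + [[100.0 + i] for i in range(10)]
labels = [0] * 10 + [1] * 10
report = benchmark(train_threshold, predict_threshold, rows, labels)
```

Because the split and the scoring happen in `benchmark`, every service is judged on exactly the same test rows and the same metric, regardless of what evaluation tools it offers itself.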
64. API standards?!
• Python de facto standard: scikit-learn
• “Sparkit-learn aims to provide scikit-learn functionality and API on PySpark. The main goal of the library is to create an API that stays close to sklearn’s.”
• REST standard: PSI (Protocols & Structures for Inference)
• Pretty similar to BigML API!
• Implementation for scikit available
• Easy benchmarking! Ensembles!
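To illustrate why a shared interface makes ensembles easy, here is a minimal sketch in which any object exposing a scikit-learn-style `predict(rows)` method can take part in a majority vote. The `ThresholdModel` class and `majority_vote` function are hypothetical names for illustration; they are not part of scikit-learn, PSI or the BigML API.

```python
from collections import Counter


class ThresholdModel:
    """Toy model exposing a scikit-learn-like predict() over a list of rows."""

    def __init__(self, feature, threshold):
        self.feature = feature
        self.threshold = threshold

    def predict(self, rows):
        return [int(row[self.feature] > self.threshold) for row in rows]


def majority_vote(models, rows):
    """Combine any models sharing predict(rows) by per-row majority vote."""
    all_predictions = [model.predict(rows) for model in models]
    combined = []
    for votes in zip(*all_predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Three models that could, in principle, come from three different providers.
models = [ThresholdModel(0, 0.5), ThresholdModel(1, 0.5), ThresholdModel(0, 2.0)]
ensemble_predictions = majority_vote(models, [[1.0, 1.0], [1.0, 0.0], [0.0, 0.0]])
```

`majority_vote` never inspects how a model was built, only that it answers `predict`, which is the property a common API standard would guarantee across providers.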
65. Getting started
• VM with Jupyter notebooks (Python & Bash)
• API wrappers preinstalled: BigML & Google Prediction API
• Notebook for easy setup of credentials
• Scikit-learn and Pandas preinstalled
• Open source VM provisioning script & notebooks
• Search public Snaps on terminal.com: “machine learning”