The talk I gave at Scale By The Bay.
Deploying, serving, and monitoring machine learning models built with different ML frameworks in production. An Envoy-proxy-powered serving mesh. TensorFlow, Spark ML, Scikit-learn, and custom functions on CPU and GPU.
2. About
Mission: Accelerate Machine Learning to Production
Open-source products:
- Mist: Serverless proxy for Spark
- ML Lambda: ML Function as a Service
- Sonar: Data and ML Monitoring
Business model: subscription services and hands-on consulting
10. Functions Registry
The functions registry is responsible for the model life cycle and all the business logic required to configure models for serving. The mesh of serving runtimes is the actual serving cluster. Infrastructure integration: ECS for AWS; Kubernetes for GCE and on-premise.
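As an illustrative sketch only (the names and structure are hypothetical, not ML Lambda's actual API), the registry can be thought of as owning versioned model metadata that the serving mesh consumes:

```python
# Hypothetical sketch of a functions registry: the registry owns
# model life-cycle metadata; the serving mesh only runs what the
# registry tells it to.
registry = {}

def register(name, version, runtime, image):
    # One immutable record per (model, version) pair.
    registry[(name, version)] = {
        "runtime": runtime,   # e.g. "tensorflow", "spark-ml", "scikit-learn"
        "image": image,       # immutable Docker image for this version
        "status": "ready",
    }

register("churn-model", 3, "tensorflow", "models/churn-model:3")
print(registry[("churn-model", 3)]["image"])  # → models/churn-model:3
```

Keeping the registry as the single source of truth is what lets the serving cluster stay stateless about business logic.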
11. UX: Models and Applications
Applications provide public virtual endpoints for models and for compositions of models.
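A minimal sketch of the idea (stage and application names are invented for illustration): an application is a named composition of independently deployed model stages, exposed to callers as one virtual endpoint.

```python
# Hypothetical "application": one public endpoint composing several
# model stages, each of which could live in a different ML runtime.

def preprocess(features):
    # Stage 1: e.g. a Spark ML or scikit-learn transformer
    return [x * 2 for x in features]

def classify(features):
    # Stage 2: e.g. a TensorFlow model
    return 1 if sum(features) > 10 else 0

# The application maps a virtual endpoint name to an ordered
# composition of stages.
application = {"name": "fraud-detection", "stages": [preprocess, classify]}

def serve(app, payload):
    # Invoke each stage in order, feeding outputs forward.
    result = payload
    for stage in app["stages"]:
        result = stage(result)
    return result

print(serve(application, [1, 2, 3]))  # → 1
```

Callers see only the endpoint; the stage composition behind it can be rewired or re-versioned without breaking clients.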
12. Why Not Just One Big Neural Network?
● Not always possible
● Stages could be independent
● Ad-hoc rule-based models
● Physics models (e.g. LIDAR)
● Big E2E DL requires black-magic skills
13. Why Not just one Python script?
● Modularity. Stages could be developed by different teams
● Traceability and Monitoring
● Versioning
● Independent deployment, A/B testing and Canary
● Request Shadowing and other cool stuff
● Could require different ML runtimes (TF, Scikit, Spark ML, etc.)
● We need more microservices :)
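The deployment benefits above (A/B testing, canary releases, request shadowing) can be sketched as a tiny traffic router; the function names and split logic are illustrative, not a real serving API:

```python
import random

def model_v1(x):
    # Current production model
    return x + 1

def model_v2(x):
    # Candidate model under evaluation
    return x + 2

shadow_log = []

def route(x, canary_fraction=0.1, rng=random):
    # Canary: send a small fraction of live traffic to v2.
    primary = model_v2 if rng.random() < canary_fraction else model_v1
    # Shadowing: also run v2 on the side; its output is logged for
    # comparison but never returned to the caller.
    shadow_log.append(model_v2(x))
    return primary(x)

random.seed(0)
print(route(10))  # → 11 (v1 handles the request with this seed)
```

In practice this routing lives at the proxy layer (e.g. Envoy traffic shifting and shadowing) rather than in application code, which is exactly why per-stage microservices pay off.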
14. Why Not Just TF Serving?
● Other ML runtimes (DL4J, Scikit, Spark ML); Servables are overkill
● Need better versioning and immutability (a Docker image per version)
● Don’t want to deal with state (model loaded, offloaded, etc.)
● Want to re-use the microservices stack (tracing, logging, metrics)
● Need better scalability