Kubeflow and Data Science in Kubernetes

Deploying machine learning pipelines robustly at scale is one of the biggest challenges within an organization. Kubeflow is an open-source platform for distributed training, tuning, and serving of models on Kubernetes. As a comprehensive solution for deploying and managing end-to-end data science and machine learning pipelines, Kubeflow is rapidly accelerating analytics innovation and adoption. John provides an overview of Kubeflow and how he has been using it in the wild.

Published in: Data & Analytics

  1. KUBEFLOW AND DATA SCIENCE IN KUBERNETES. John Liu, Intelluron Corporation. Nashville Kubernetes Meetup, October 9, 2019. @drjohncliu
  2. Source: IBM
  3. AIML Tools
  4. 40% of digital transformation initiatives will use AIML services. AIML spending will grow to $52.2 billion in 2021. Source: IDC
  5. Corporate Adoption: AIML Awareness, AIML Adoption, AIML Native
  6. Business Value is Only Achieved when Models are in Production
  7. Too Common Scenario. DS: I’ve built a model that can … using XLNet and a 1000-layer ResNet model in TensorFlow. Eng: Great, but it takes HOW LONG to process a single document? DevOps: You need a 1024 GPU cluster?!
  8. Data Science Cycle: Data ETL, Model Building, Model Tuning, Model Deployment, Model Serving
  9. Data ETL •  Data Lake → Data Warehouse → Query •  Exploration using Notebooks, BI, Excel •  Visualization Tools
  10. Model Building •  Notebooks –  JupyterHub –  JupyterLab •  Code repos •  Gaps: –  Versioning –  Reproducibility –  Distributed training –  Big data/models
  11. Model Tuning •  Hyperparameter optimization –  Grid search/Parallel runs –  Tracking and reporting –  Neural architecture search –  Containerization
  12. Model Deployment •  Model Packaging •  REST/gRPC API •  Pipelines •  CI/CD •  GitOps/MLOps
  13. Model Serving •  Infrastructure •  API Gateway •  Rollouts •  Autoscaling •  Telemetry •  Security
  14. Sums it up… Most Folks Magical AI Goodness Need a Team
  15. Kubeflow
  16. Cloud Native ML + K8S = Kubeflow •  Composability – ML stages are independent systems •  Portability – Dev/UAT/Prod (RaspPi/Laptop/Cloud) •  Scalability – Hyperparameter tuning, production workloads
  17. Kubeflow Components •  JupyterHub on K8S – Isolated kernels – Team sharing – RBAC – Distributed training •  Kubeflow Fairing
  18. Kubeflow Components •  Katib Optimizer – Based on Google Vizier – Hyperparameter Tuning – Neural Architecture Search. Diagram labels: Katib manager, Random, Grid, Hyperband, Worker1 Metrics, Worker2 Metrics, WorkerN Metrics, Bayesian, RL
  19. Kubeflow Components •  Pipeline/Workflows – Kubeflow Pipelines – Argo Workflows – Docker Pipelines – Airflow
  20. Kubeflow Components •  Ingress – Ambassador
  21. Kubeflow Components •  Telemetry/Policy – Istio (Envoy/Mixer) – Prometheus/Grafana
  22. Kubeflow Components •  Serving – KFServing – TFServing – Seldon Core – NVIDIA Inference – PyTorch Serving
  23. Pipelines: Kubeflow, Argo, Docker, Airflow. Notebooks: JupyterHub. Training: Chainer, MXNet, PyTorch, Tensorboard, Fairing. Tuning: Katib/Vizier. Serving: KFServing, TFServing, Seldon, NVIDIA Infer, PyTorch Serving. Other: Kustomize, Metadata. Ingress: Ambassador. Telemetry: Istio, Prometheus, Grafana
  24. Lessons Learned in the Wild 1.  There aren’t good alternatives to Kubeflow for scaling deep learning models 2.  Kubeflow favors GKE (not as friendly on EKS installation, haven’t tried AKS) 3.  Data and Notebook versioning still missing (Polyaxon, DVC, or Pachyderm may help)
  25. Lessons Learned in the Wild 4.  Katib is buggy, metrics logging to stdout 5.  PyTorch Operator + Seldon integration 6.  ksonnet replacement: kustomize vs helm 7.  No plans for Tekton/Kubeflow integration yet 8.  Kubeflow >> associate ML Engineer
  26. Questions
  27. John Liu, Intelluron Corporation @drjohncliu
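
The Model Tuning and Katib slides mention grid and random search, parallel trial workers, and metrics logged to stdout. The sketch below shows that idea in plain Python, without any Kubeflow or Katib API. Everything in it is an assumption made for illustration: the toy `objective` function, the search space `SPACE`, the helper names, and the `name=value` log format do not come from the slides. In a real cluster each trial would be a containerized training job rather than a local function call.

```python
import itertools
import random


def objective(lr, batch_size):
    """Hypothetical stand-in for a training run; returns a fake validation loss."""
    return (lr - 0.01) ** 2 * 1e4 + abs(batch_size - 64) / 64


def grid_search(space):
    """Yield every combination of the listed values (exhaustive search)."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_search(space, n_trials, seed=0):
    """Yield n_trials parameter sets, each value sampled independently."""
    rng = random.Random(seed)
    names = sorted(space)
    for _ in range(n_trials):
        yield {name: rng.choice(space[name]) for name in names}


def run_trials(trials):
    """Run each trial, print its metrics to stdout, and return the best one."""
    results = []
    for index, params in enumerate(trials):
        loss = objective(**params)
        # One line per trial in an assumed name=value format,
        # so a log scraper could collect the metrics.
        print(f"trial={index} lr={params['lr']} "
              f"batch_size={params['batch_size']} loss={loss:.4f}")
        results.append((loss, params))
    return min(results, key=lambda item: item[0])


# Hypothetical search space, not taken from the talk.
SPACE = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}

best_loss, best_params = run_trials(grid_search(SPACE))
```

Swapping `grid_search(SPACE)` for `random_search(SPACE, n_trials=4)` trades exhaustive coverage for fewer runs, which is the usual reason to prefer random or Bayesian strategies as the search space grows.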
