Scalable Deep Learning
Big Data, NLP, Machine Perception
Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel, Kurt Keutzer: “ImageNet Training in Minutes” (2018)
Facebook: “Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour” (2017)
› Scaling options
› Technology stack
› Frameworks + comparison
› Lessons learned
› Outlook
› Hits a physical limit
› More resources per machine (CPU/GPU/RAM)
Icons by Freepik / CC 3.0 BY
› Cluster of machines
› Limited parallelizability
› Periodic synchronization
› Same network, different data
[Figure: Machine 1, Machine 2]
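The scheme on this slide (every machine trains the same network on different data, and the replicas are synchronized) can be sketched in plain Python. This is only a toy illustration under assumptions that are not in the slides: a one-parameter linear model, two equally sized shards, and synchronization after every step rather than at longer intervals. The names `local_gradient` and `train_data_parallel` are hypothetical.

```python
# Toy data-parallel SGD: every worker holds the same weight, sees a different
# data shard, and the replicas are synchronized by averaging their gradients.
# All names and numbers here are illustrative assumptions.

def local_gradient(w, shard):
    """Gradient of the mean squared error of y = w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def train_data_parallel(shards, w=0.0, lr=0.05, steps=200):
    """Run synchronous SGD over equally sized shards and return the weight."""
    for _ in range(steps):
        # Each worker computes a gradient on its own data (in parallel in practice).
        grads = [local_gradient(w, shard) for shard in shards]
        # Synchronization step: average the gradients across workers.
        avg_grad = sum(grads) / len(grads)
        # Every replica applies the same update, so the weights stay identical.
        w -= lr * avg_grad
    return w


# Data generated from y = 3 * x, split across two "machines".
machine_1 = [(1.0, 3.0), (2.0, 6.0)]
machine_2 = [(3.0, 9.0), (4.0, 12.0)]
w_final = train_data_parallel([machine_1, machine_2])
print(round(w_final, 3))
```

With equally sized shards, the averaged gradient equals the gradient a single machine would compute on the combined data, which is why the replicas behave like one larger-batch training run.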
Technology Stack
› Container virtualization
› Lightweight
› Most widely used container format
› Availability of registries
› Hardware abstraction
› Runs containers
› Service discovery & networking
› Configuration management
› Package manager
› Templating functionality
› Convenience
helm init
helm install --set worker=2
[Figure: Service (name: worker-0), Service (name: worker-1)]
{{- /* $worker is assumed to be defined earlier in the template, e.g. from .Values.worker */ -}}
{{- range $i := until (int $worker) }}
---
kind: Service
apiVersion: v1
metadata:
  name: worker-{{ $i }}
spec:
  selector:
    job: worker
    task: {{ $i | quote }}
{{- end }}
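To make the effect of the loop concrete, here is a small Python sketch of what the template is expected to produce for a given worker count. It does not call Helm; the function name `render_worker_services` is hypothetical, the `---` separators are added so that each Service is its own YAML document, and the `task` label is quoted because Kubernetes label values must be strings.

```python
def render_worker_services(worker: int) -> str:
    """Return one Service manifest per worker index, separated by '---'."""
    docs = []
    for i in range(worker):  # mirrors `until (int $worker)`: 0 .. worker - 1
        docs.append(
            "kind: Service\n"
            "apiVersion: v1\n"
            "metadata:\n"
            f"  name: worker-{i}\n"
            "spec:\n"
            "  selector:\n"
            "    job: worker\n"
            f'    task: "{i}"\n'
        )
    return "---\n".join(docs)


# Equivalent of `--set worker=2`: two Services, worker-0 and worker-1.
manifests = render_worker_services(2)
print(manifests)
```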
Deep Learning Frameworks
› Fall 2015, Google
› “library for high performance numerical computation”
› ML/DL support
› TensorBoard
› Parameter Server
Carnegie Mellon University, Baidu, Google: “Scaling Distributed Machine Learning with the Parameter Server” (2014)
[Figure: three workers and a parameter server]
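The worker/parameter-server arrangement in the figure can be illustrated with a single-process Python simulation. This is a sketch under assumptions, not the design from the cited paper or TensorFlow's implementation: the class `ParameterServer`, the function `worker_step`, the linear model, and the learning rate are all hypothetical, and the network communication of a real deployment is replaced by direct method calls.

```python
class ParameterServer:
    """Holds the shared weights; workers pull them and push gradients."""

    def __init__(self, weights, lr=0.05):
        self.weights = list(weights)
        self.lr = lr

    def pull(self):
        """Return a copy of the current weights."""
        return list(self.weights)

    def push(self, gradient):
        """Apply one gradient-descent update with the pushed gradient."""
        self.weights = [w - self.lr * g for w, g in zip(self.weights, gradient)]


def worker_step(server, batch):
    """One worker iteration: pull weights, compute a gradient, push it back."""
    w = server.pull()
    n = len(batch)
    # Gradient of the mean squared error for the model y = w[0] * x + w[1].
    g0 = sum(2.0 * (w[0] * x + w[1] - y) * x for x, y in batch) / n
    g1 = sum(2.0 * (w[0] * x + w[1] - y) for x, y in batch) / n
    server.push([g0, g1])


# Three workers, each with its own batch drawn from y = 2 * x + 1.
server = ParameterServer([0.0, 0.0])
batches = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(1.0, 3.0), (2.0, 5.0)],
]
for _ in range(2000):
    for batch in batches:  # workers take turns here instead of running concurrently
        worker_step(server, batch)

print([round(w, 3) for w in server.weights])
```

Because the workers never talk to each other, adding a worker only requires that it knows how to reach the server.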
› Multi-CPU/GPU, multi-node
› Infrastructure: no prerequisites
› IP addresses/hostnames + port
› Parameter Server
Carnegie Mellon University, Baidu, Google: “Scaling Distributed Machine Learning with the Parameter Server” (2014)
worker_hosts
worker_hosts=
{{- range $k := until (int $worker) -}}
tensorflow-worker-{{ $k }}:5000
{{- if lt (int $k | add 1) (int $worker) }},{{ end -}}
{{- end }}
Rendered with worker=2:
worker_hosts=tensorflow-worker-0:5000,tensorflow-worker-1:5000
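The template logic (emit each host, and a comma after every host except the last) amounts to a string join. The following Python sketch reproduces the rendered value; the function name `worker_hosts` and its default arguments are assumptions chosen to match the output shown on the slide.

```python
def worker_hosts(worker: int, prefix: str = "tensorflow-worker", port: int = 5000) -> str:
    """Build the comma-separated host:port list for `worker` workers."""
    return ",".join(f"{prefix}-{k}:{port}" for k in range(worker))


hosts = worker_hosts(2)
print(f"worker_hosts={hosts}")

# Splitting the string again yields a job-name-to-addresses mapping,
# the shape that tf.train.ClusterSpec accepts.
cluster = {"worker": hosts.split(",")}
print(cluster)
```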
› Distributed (Deep) Machine Learning Community (DMLC)
› “A flexible and efficient library for deep learning.”
› Amazon's framework of choice
› (TensorBoard support)
› Distributed KVStore
[Figure: GPU 1, GPU 2, GPU 1, GPU 2]
T. Chen et al.: “MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems” (2015)
› Multi-CPU/GPU, multi-node
› Infrastructure: SSH / MPI / YARN / SGE
› Hostfile with IP addresses/hostnames
› Distributed KVStore
T. Chen et al.: “MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems” (2015)
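As a rough illustration of how a distributed key-value store can combine gradients from several GPUs on several machines, the sketch below aggregates first within each machine and then across machines, loosely following the two-level idea described in the cited MXNet paper. It is not MXNet's API: the class `KVStore`, the function `aggregate_two_level`, and the layout of two machines with two GPUs each are assumptions for illustration.

```python
class KVStore:
    """Toy key-value store: push() accumulates values, pull() reads the sum."""

    def __init__(self):
        self._store = {}

    def push(self, key, value):
        current = self._store.get(key)
        if current is None:
            self._store[key] = list(value)
        else:
            self._store[key] = [a + b for a, b in zip(current, value)]

    def pull(self, key):
        return list(self._store[key])


def aggregate_two_level(machines, key="grad"):
    """Sum gradients per machine first, then across machines.

    `machines` is a list of machines, each a list of per-GPU gradient vectors.
    """
    global_store = KVStore()  # level 2: across machines
    for gpu_grads in machines:
        local_store = KVStore()  # level 1: within one machine
        for grad in gpu_grads:
            local_store.push(key, grad)
        # Only one aggregated vector per machine crosses the network.
        global_store.push(key, local_store.pull(key))
    return global_store.pull(key)


machines = [
    [[1.0, 2.0], [3.0, 4.0]],  # machine 1: GPU 1, GPU 2
    [[5.0, 6.0], [7.0, 8.0]],  # machine 2: GPU 1, GPU 2
]
total = aggregate_two_level(machines)
print(total)  # [16.0, 20.0]
```

Aggregating locally first means the amount of data sent between machines does not grow with the number of GPUs per machine.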
hosts
data:
  hosts: |-
    {{- range $i := until (int $worker) }}
    mxnet-worker-{{ $i }}
    {{- end }}
› Spring 2015, Microsoft
› “A deep learning tool that balances efficiency, performance and flexibility”
› TensorBoard support
› 1-bit SGD / Block-Momentum SGD / Parameter Server
› Multi-CPU/GPU, multi-node
› Infrastructure: (Open)MPI
› Hostfile with IP addresses/hostnames
Microsoft: “Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-block Parallel Optimization and Blockwise Model-Update Filtering” (2016)
Microsoft: “1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs” (2014)
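The core idea behind 1-bit SGD, sending only one bit per gradient value and carrying the quantization error over to the next step, can be sketched as follows. This is a simplified illustration, not CNTK's implementation: it uses a single reconstruction scale (the mean absolute value) for the whole vector, and the function names `quantize_1bit` and `dequantize_1bit` are hypothetical.

```python
def quantize_1bit(gradient, residual):
    """Quantize gradient plus carried-over residual to one bit per element.

    Returns (bits, scale, new_residual); each element is reconstructed
    as +scale if its bit is set and -scale otherwise.
    """
    corrected = [g + r for g, r in zip(gradient, residual)]
    scale = sum(abs(c) for c in corrected) / len(corrected)
    bits = [c >= 0.0 for c in corrected]
    decoded = [scale if b else -scale for b in bits]
    # Error feedback: whatever the quantization lost is added to the next gradient.
    new_residual = [c - d for c, d in zip(corrected, decoded)]
    return bits, scale, new_residual


def dequantize_1bit(bits, scale):
    """Reconstruct the approximate gradient on the receiving side."""
    return [scale if b else -scale for b in bits]


gradient = [0.5, -0.25, 0.125, -1.0]
residual = [0.0] * len(gradient)
bits, scale, residual = quantize_1bit(gradient, residual)
decoded = dequantize_1bit(bits, scale)
print(bits, scale, decoded)
```

Only `bits` and `scale` would need to be transmitted; the residual stays on the sending worker and is passed back into `quantize_1bit` on the next step.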
hosts
data:
  hosts: |-
    {{- range $i := until (int $worker) }}
    cntk-worker-{{ $i }}
    {{- end }}
VS.
CNN
› MNIST dataset
› LeNet-5
› Simpler networks
LSTM
› PTB dataset
› 200 neurons, 2 layers
› More complex networks
Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals: “Recurrent Neural Network Regularization” (2015)
› TensorFlow, MXNet, CNTK
› 3 different CNNs
› CNTK scales more efficiently
› Reveals bottlenecks
S. Shi, X. Chu: “Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs” (2017)
› Install the device plugin
› Base image: nvidia/cuda
› Use GPU resources
resources:
  limits:
    nvidia.com/gpu: {{ $numGpus }}
› Kubeflow
  › Similar to our platform
  › ksonnet for templating (vs. Helm alone)
  › Argo for workflow orchestration (possibly as well)
› Horovod (Uber)
  › MPI for TensorFlow (better performance)
  › Better scaling mechanism
› TensorFlow Operator (Jakob Karalus, data2day ‘17)
  › jinja2 for templating
Sebastian Jäger
@se_jaeger
Hans-Peter Zorn
@data_hpz
inovex GmbH
Ludwig-Erhard-Allee 6
76131 Karlsruhe

Scaling Deep Learning Frameworks with Cloud Infrastructure and Kubernetes
