Debugging and Performance tricks for MXNet Gluon

Notebook available here: https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon

The presentation covers tips and tricks for optimizing the samples/sec throughput of MXNet Gluon training scripts, and for visualizing the key indicators of system and training performance.

  1. Debugging MXNet Gluon models and other performance tricks. Thomas Delteil, Applied Scientist @ AWS Deep Engine. APJC Tech Summit 2018, Macau. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Remote debugging with PyCharm; Visualizing deep learning; Performance tricks and gotchas
  3. Apache History: CMU project of PhD students in 2015 and the Distributed Machine Learning Community (DMLC); 2017: MXNet Gluon imperative API is released. Tianqi Chen (UW), Mu Li (Amazon AI), Yutian Li (Stanford), Min Lin (MILA), Naiyan Wang (TuSimple), Minjie Wang (NYU CS), Tianjun Xiao (Tesla), Bing Xu (Apple AI), Chiyuan Zhang (Google Brain), Zheng Zhang (MSR Asia)
  4. Imperative vs symbolic computational graphs. Symbolic: define, compile, run. Imperative: define-by-run in the host language. Inception model
  5. Imperative > Symbolic: debuggable, fast to prototype, hybridizable
  6. Interactive Debugging
  7. Shapes, Values, Gradients
  8. (no text)
  9. YouTube tutorial
  10. Visualizing Deep Learning
  11. Network Architecture
  12. (no text)
  13. MXNet native code (#1): print(net)
  14. MXNet native code (#2): mx.viz.plot_network(sym)
  15. MXNet native code (#3): mx.viz.print_summary(sym)
  16. Netron (online tool)
  17. MXBoard: sw.add_graph(net)
  18. System performance
  19. GPU: gpu_monitor (GitHub)
  20. CPU / RAM: > top i
  21. Training metrics
  22. MXBoard
  23. MXBoard Scalars
  24. MXBoard Images
  25. Console
  26. Performance Tips and tricks
  27. 130 samples/sec; 1.25x, 2.41x, 2.46x, 2.53x, 3.84x; GPU utilization
  28. (no text)
  29. Environment: mxnet-mkl (32x) vs mxnet
  30. I/O Bound → GPU Starvation
  31. #1 Asynchronously pre-fetching data (low CPU) (1.25x): DataLoader(num_workers=CPU_COUNT-3); #2 Offline preprocessing (full CPU)
  32. GPU → CPU memcopy synchronization idling; #3 Smart synchronization calls (2.46x) → Small networks
  33. Timeline diagram: Copy to GPU, Forward/Backward, Metric (repeated for each batch)
  34. Execution engine: Imperative → Symbolic (2.41x): net.hybridize()
  35. Hyperparameters: Batch size (2.56x)
  36. Optimizer Performance: Time to accuracy
  37. Mixed precision training: float32 → float16 (3.84x): net.cast("float16")
  38. Profiling: profiler.set_state('run') … profiler.set_state('stop')
  39. (no text)
  40. Conclusion: Use Gluon to debug and iterate quickly; Hybridize and optimize for speed; Know your model: Visualize performance
  41. Thank you! Follow-up: tdelteil@ Github.com/thomasdelteil AWS Deep Engine, Vancouver
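
Slides 31 to 33 are about keeping the GPU fed (DataLoader workers) and about not forcing a GPU → CPU synchronization on every batch. The sketch below illustrates both ideas against duck-typed objects, so it runs without a GPU. The names `pick_num_workers`, `train_epoch`, `step_fn` and `log_every`, and the `reserved=3` default, are illustrative assumptions and are not taken from the talk's notebook. The only MXNet behavior relied on is that NDArray operations are queued asynchronously and that `.asscalar()` blocks until the value is available.

```python
import os
import time


def pick_num_workers(cpu_count=None, reserved=3):
    """Worker count for DataLoader(num_workers=...), mirroring CPU_COUNT-3 on slide 31.

    `reserved` and the floor of 0 are illustrative choices; in Gluon,
    num_workers=0 means batches are loaded in the main process.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return max(0, cpu_count - reserved)


def train_epoch(batches, step_fn, batch_size, log_every=50, clock=time.perf_counter):
    """Run `step_fn` on each batch, syncing with the device once per `log_every` batches.

    `step_fn(batch)` is a hypothetical callable that copies the batch to the GPU,
    runs forward/backward and the trainer step, and returns a one-element loss
    supporting `+` and `.asscalar()` (an MXNet NDArray does).
    Returns a list of (mean_loss, samples_per_sec) tuples, one per logging window.
    """
    history = []
    running = None             # accumulated on the device, never read per batch
    window_start = clock()
    for i, batch in enumerate(batches, start=1):
        loss = step_fn(batch)  # with MXNet this queues work and returns immediately
        running = loss if running is None else running + loss
        if i % log_every == 0:
            # .asscalar() copies GPU -> CPU and blocks until the queued work is done,
            # so it is called once per window instead of once per batch.
            mean_loss = running.asscalar() / log_every
            elapsed = clock() - window_start
            rate = log_every * batch_size / elapsed if elapsed > 0 else float("inf")
            history.append((mean_loss, rate))
            running = None
            window_start = clock()
    return history
```

With Gluon, the worker count would be passed as `gluon.data.DataLoader(dataset, batch_size=batch_size, num_workers=pick_num_workers())`, and `step_fn` would wrap the usual `autograd.record()`, `loss.backward()` and `trainer.step(batch_size)` sequence and return `loss.mean()`. A larger `log_every` means fewer stalls but coarser logging.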
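
Slides 34 and 37 name two one-line switches, `net.hybridize()` and `net.cast("float16")`. A minimal sketch of how they might be applied together is shown below. `prepare_for_speed` and `optimizer_params` are hypothetical helpers, and pairing float16 with the optimizer's `multi_precision` option is an assumption about common practice, not something stated on the slides.

```python
def prepare_for_speed(net, sample_batch, dtype="float16"):
    """Cast `net` to `dtype`, hybridize it, and run one warm-up forward pass.

    `net` is assumed to be an initialized Gluon HybridBlock and `sample_batch`
    an NDArray on the same device; both are supplied by the caller.
    Returns the sample batch converted to `dtype`.
    """
    net.cast(dtype)                      # slide 37: float32 -> float16 parameters
    net.hybridize()                      # slide 34: imperative -> symbolic
    batch = sample_batch.astype(dtype)   # inputs must match the parameter dtype
    net(batch)                           # first call after hybridize() builds the cached graph
    return batch


def optimizer_params(learning_rate, dtype="float16"):
    """Optimizer keyword arguments for a Gluon Trainer.

    For float16 training, `multi_precision=True` asks the optimizer to keep a
    float32 master copy of the weights (assumed setup, see the note above).
    """
    params = {"learning_rate": learning_rate}
    if dtype == "float16":
        params["multi_precision"] = True
    return params
```

With Gluon, `optimizer_params(0.1)` would be handed to `gluon.Trainer(net.collect_params(), 'sgd', optimizer_params(0.1))`. Every batch fed to the network afterwards needs the same `astype` conversion as the warm-up batch.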

Editor's notes

  • Data loading issue
    NaN values
    Loss exploding suddenly
  • Explain ssh tunnel and tensorboard
  • 22M$ GPU
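
The failure modes in the first note (NaN values, a loss that suddenly explodes) are the kind of condition that can be guarded with a cheap check on the logged loss, for example as the condition of a debugger breakpoint. The helper below is a hypothetical sketch; `check_loss` and the `explode_factor=10.0` threshold are illustrative and do not come from the talk.

```python
import math


def check_loss(loss_value, previous=None, explode_factor=10.0):
    """Return a short diagnosis string, or None when the loss looks sane.

    `loss_value` and `previous` are plain Python floats, i.e. values that were
    already copied back from the device (for instance with .asscalar()).
    """
    if math.isnan(loss_value):
        return "nan"
    if math.isinf(loss_value):
        return "inf"
    # Flag a jump of more than `explode_factor` times the previous logged loss.
    if previous is not None and previous > 0 and loss_value > explode_factor * previous:
        return "exploding"
    return None
```

Since the check needs a host-side float, it is cheapest to run it only where the loss is already being read back for logging.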