Optimizing training on
Apache MXNet
Julien Simon, AI Evangelist, EMEA
@julsimon
What to expect from this session
• Techniques and tips to optimize training on Apache MXNet
• Infrastructure performance: storage and I/O, GPU throughput, distributed
training, CPU-based training, cost
• Model performance: data augmentation, initializers, optimizers, etc.
• Level 666: you should be familiar with Deep Learning and MXNet
Optimizing Infrastructure Performance
Deploying data sets to instances
• Deep Learning training sets are often very large, with a huge number of files
• How can we deploy them quickly, easily and reliably to instances?
• We strongly recommend packing the training set in a RecordIO file (a minimal packing sketch follows this list)
• https://mxnet.incubator.apache.org/architecture/note_data_loading.html
• https://mxnet.incubator.apache.org/how_to/recordio.html
• Only one file to move around!
• Worth the effort: pack once, train many times
• In any case, you need to copy your data set to a central location
• Let’s look at Amazon EBS, Amazon S3 and Amazon EFS
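A minimal packing sketch with mx.recordio (the bundled tools/im2rec.py script does the same job with more options; the folder layout and numeric class labels below are assumptions for illustration):
import os
import cv2  # OpenCV, required by mx.recordio.pack_img
import mxnet as mx

# Hypothetical layout: ./train/<numeric_class_id>/<image>.jpg
record = mx.recordio.MXIndexedRecordIO('train.idx', 'train.rec', 'w')
idx = 0
for class_id in sorted(os.listdir('train')):
    class_dir = os.path.join('train', class_id)
    for filename in os.listdir(class_dir):
        img = cv2.imread(os.path.join(class_dir, filename))
        # flag=0: single label; id2 is unused here
        header = mx.recordio.IRHeader(flag=0, label=float(class_id), id=idx, id2=0)
        record.write_idx(idx, mx.recordio.pack_img(header, img, quality=95, img_fmt='.jpg'))
        idx += 1
record.close()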
Storing data sets in Amazon EBS
1. Prepare your data set on a dedicated EBS volume
2. Take a snapshot
3. Deploying to a new instance only takes a few seconds
a. Create a volume from the snapshot
b. Attach the volume to the instance
c. Mount the volume
• Easy to automate, including at boot time (UserData or cfn-init); see the boto3 sketch after this list
• Easy to scale to many instances, even in different accounts
• Large choice of EBS volume types (cost vs. performance)
• Caveat: no sharing for distributed training, copying is required
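To make step 3 concrete, here is a hedged boto3 sketch (snapshot, instance and device identifiers are placeholders); the same calls can run from a UserData script at boot:
import boto3

ec2 = boto3.client('ec2')

# a. Create a volume from the data set snapshot
volume = ec2.create_volume(SnapshotId='snap-0123456789abcdef0',
                           AvailabilityZone='us-east-1a',
                           VolumeType='gp2')
ec2.get_waiter('volume_available').wait(VolumeIds=[volume['VolumeId']])

# b. Attach it to the training instance
ec2.attach_volume(VolumeId=volume['VolumeId'],
                  InstanceId='i-0123456789abcdef0',
                  Device='/dev/xvdf')

# c. Mount it on the instance itself, e.g. 'mount /dev/xvdf /data'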
Storing data sets in Amazon S3
• MXNet has an S3 connector → build option USE_S3=1
https://mxnet.incubator.apache.org/how_to/s3_integration.html
• Best durability (11 9’s)
• Distributed training possible
• Caveats
• Lower performance than EBS-optimized instances
• Beware of hot spots if a lot of instances are running
https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
train_dataiter = mx.io.MNISTIter(
    image="s3://bucket-name/training-data/train-images-idx3-ubyte",
    label="s3://bucket-name/training-data/train-labels-idx1-ubyte", ...)
Storing data sets in Amazon EFS
1. Copy your data set on an EFS volume
2. Mount the volume on instances
• Simple way to set up distributed training (no copying required)
• Caveats
• You probably want the “Max I/O” performance mode, but I’d test both
to see if latency is an issue or not
• EFS is more expensive than S3 and EBS: use it for training only, not
for long-term storage
Maximizing GPU usage
• GPUs need a high-throughput, stable flow of training data to run at top speed
• Large datasets cannot fit in RAM
• Adding more GPUs requires more throughput
• How can we check that training is running at full speed?
• Keep track of performance indicators from previous trainings (images / sec, etc.)
• Look at performance indicators and benchmarks reported by others
• Use nvidia-smi (a small polling helper is sketched after this list)
• Look at power consumption, GPU utilization and GPU RAM
• All these values should be maxed out and stable
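A small helper to watch those indicators from Python; it just shells out to nvidia-smi (the query fields below are standard, but check them against your driver version):
import subprocess
import time

QUERY = 'utilization.gpu,power.draw,memory.used,memory.total'

def gpu_stats():
    # One CSV line per GPU: utilization %, power draw (W), memory used/total (MiB)
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=' + QUERY, '--format=csv,noheader'])
    return out.decode().strip().split('\n')

while True:  # stop with Ctrl-C
    for line in gpu_stats():
        print(line)
    time.sleep(5)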
Maximizing GPU usage: batch size
• Picking a batch size is a tradeoff between training speed and accuracy
• Larger batch size is more computationally efficient
• Smaller batch size helps find a better minimum
• Smaller data sets, few classes (MNIST, CIFAR)
• Start with 32*GPU_COUNT
• 1024 is probably the largest reasonable batch size
• Large data sets, lot of classes (ImageNet)
• Use the largest possible batch size
• Start at 32*GPU_COUNT and increase it until MXNet OOMs (see the probing loop after this list)
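A sketch of that probing loop, assuming build_iter() and build_module() are your own helpers (MXNet raises MXNetError when a GPU allocation fails; in practice you may want to restart the process between attempts, as GPU memory is not always fully reclaimed):
import mxnet as mx

GPU_COUNT = 4
ctx = [mx.gpu(i) for i in range(GPU_COUNT)]

batch_size = 32 * GPU_COUNT
while True:
    try:
        train_iter = build_iter(batch_size)   # placeholder helpers
        mod = build_module(ctx)
        mod.fit(train_iter, num_epoch=1)      # one epoch is enough to probe memory
        print('batch size %d fits' % batch_size)
        batch_size *= 2
    except mx.base.MXNetError:
        print('OOM at batch size %d, keep the previous one' % batch_size)
        break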
Maximizing GPU usage: compute & I/O
• Check power consumption and GPU usage after each modification
• If they’re not maxed out, GPUs are probably stalling
• Can the Python process keep up? Loading images, pre-processing, etc.
• Use top to check load and count threads
• Use RecordIO and add more decoding threads
• Can the I/O layer keep up? (benchmark it in isolation, as sketched after this list)
• Use iostat to look at volume stats
• Use faster storage: SSD or even a ramdisk!
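One way to separate I/O from compute is to time the data pipeline on its own, without any training; compare the result to the samples/sec your GPUs need (a minimal sketch, file name and shapes are illustrative):
import time
import mxnet as mx

data_iter = mx.io.ImageRecordIter(path_imgrec='train.rec',
                                  data_shape=(3, 224, 224),
                                  batch_size=128,
                                  preprocess_threads=8)  # more decoding threads

start, samples = time.time(), 0
for batch in data_iter:
    samples += batch.data[0].shape[0]
print('%.0f samples/sec from the I/O pipeline alone'
      % (samples / (time.time() - start)))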
Using distributed training
• MXNet scales almost linearly up to 256 GPUs
http://www.allthingsdistributed.com/2016/11/mxnet-default-framework-deep-learning-aws.html
• Easy to set up
https://mxnet.incubator.apache.org/how_to/multi_devices.html
• Blog post + AWS CloudFormation template
https://aws.amazon.com/blogs/compute/distributed-deep-learning-made-easy/
• Master node must have SSH access to slave nodes
• Data set must be accessible on all nodes
• Shared storage: great!
• No shared storage → automatic copy with rsync
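On the MXNet side, distributed training mostly comes down to creating a distributed key-value store and passing it to fit() (a sketch; symbol, train_iter and val_iter stand for your own network and data, and the launch.py tool in the MXNet repository starts the scheduler, servers and workers for you):
import mxnet as mx

# 'dist_sync' = synchronous SGD across workers; 'dist_async' also exists
kv = mx.kvstore.create('dist_sync')

mod = mx.mod.Module(symbol, context=[mx.gpu(i) for i in range(4)])
mod.fit(train_iter,
        eval_data=val_iter,
        num_epoch=10,
        kvstore=kv)  # gradients are aggregated through the kvstore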
What about CPU training?
• Several libraries help speed up Deep Learning on CPUs
• Fast implementation of math primitives
• Dedicated instruction sets, e.g. Intel AVX or ARM NEON
• Fast memory allocation
• Intel Math Kernel Library https://software.intel.com/en-us/mkl → USE_MKL = 1
• NNPACK https://github.com/Maratyszcza/NNPACK → USE_NNPACK = 1
• Libjpeg-turbo https://www.libjpeg-turbo.org/ → USE_TURBO_JPEG = 1
• Jemalloc http://jemalloc.net/ → USE_JEMALLOC = 1
• Google Perf Tools https://github.com/gperftools → USE_GPERFTOOLS = 1
Intel® MKL-DNN: Math Kernel Library for Deep Neural Networks
For developers of deep learning frameworks featuring optimized performance on Intel hardware
http://github.com/01org/mkl-dnn
• Distribution details: open source, Apache 2.0 License
• Common DNN APIs across all Intel hardware
• Rapid release cycles, iterated with the DL community, to best support industry framework integration
• Highly vectorized & threaded for maximal performance, based on the popular Intel® MKL library
• Example primitives: direct 2D convolution, rectified linear unit neuron activation (ReLU), maximum pooling, inner product, local response normalization (LRN)
Optimizing cost
• Use Spot instances
https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/
• Sharing is caring: it’s easy to share an
instance for multiple jobs
mod = mx.mod.Module(lenet, context=(mx.gpu(7), mx.gpu(8), mx.gpu(9)))
Example: p2.16xlarge at an 89% Spot discount
Demo: C5 + Intel MKL = ♥ ♥ ♥
Optimizing Model Performance
Using data augmentation
• Data augmentation lets you add more samples to smaller data sets
• Even a large data set may benefit from it and generalize better
• The ImageRecordIter object lets you do that easily from a RecordIO image file
• Images: crop, rotate, change colors, etc.
• https://mxnet.incubator.apache.org/api/python/io.html#mxnet.io.ImageRecordIter
• Careful: this processing is performed by the Python process: add more threads!
data_iter = mx.io.ImageRecordIter(path_imgrec="./data/caltech_train.rec",
                                  data_shape=(3, 227, 227),
                                  batch_size=4,
                                  resize=256,
                                  # you can add more augmentation options here,
                                  # e.g. rand_crop=True, rand_mirror=True;
                                  # use help(mx.io.ImageRecordIter) to see all possible choices
                                  )
Picking an initializer
• MXNet supports many different initializers
https://mxnet.incubator.apache.org/api/python/optimization.html
• Initial weights should be neither “too large” nor “too small”
• There seems to be some sort of consensus on:
https://www.quora.com/What-are-good-initial-weights-in-a-neural-network
• Xavier for Convolutional Neural Networks
• Random values between 0 and 1 for everything else
• I wouldn’t use anything else unless I really knew better (a minimal Xavier sketch follows)
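In MXNet, that advice translates to something like this (a minimal sketch; the Xavier arguments below are the documented defaults, and symbol / train_iter stand for your own network and data):
import mxnet as mx

# Xavier for a CNN; mx.init.Uniform or mx.init.Normal cover the other cases
init = mx.init.Xavier(rnd_type='uniform', factor_type='avg', magnitude=3)

mod = mx.mod.Module(symbol, context=mx.gpu(0))
mod.bind(data_shapes=train_iter.provide_data,
         label_shapes=train_iter.provide_label)
mod.init_params(initializer=init)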
Managing the learning rate
• The learning rate is probably the most discussed parameter in Deep Learning
• Too small: your model may never converge
• Too large: your model may never reach a minimum
• Try keeping a large learning rate for a long time, then reduce it
• Here are common techniques you could use with MXNet:
1. Use a fixed learning rate
2. Use steps: scale the learning rate
• once a number of batches have been completed,
• after each epoch,
• once specific epochs have been completed
3. Use an optimizer which automatically adapts the learning rate
Scaling the learning rate with steps
• Steps per epoch = number of samples / batch size / number of distributed workers
• FactorScheduler object: update the learning rate after ‘n’ steps
• MultiFactorScheduler object: update the learning rate after specific step counts
• MXNet scripts let you use command-line parameters (--step-epochs)
https://github.com/apache/incubator-mxnet/tree/master/example/image-classification
lr_sch = mx.lr_scheduler.FactorScheduler(step=100, factor=0.9)
mod.init_optimizer( ... optimizer='sgd', optimizer_params=(('learning_rate', 0.1),
('lr_scheduler', lr_sch)))
steps = [100, 200, 250, 300, 325, 350]  # steps must be positive and increasing
lr_sch = mx.lr_scheduler.MultiFactorScheduler(step=steps, factor=0.9)
mod.init_optimizer( ... optimizer='sgd', optimizer_params=(('learning_rate', 0.1),
('lr_scheduler', lr_sch)))
Picking an optimizer
• MXNet supports many different optimizers
https://mxnet.incubator.apache.org/api/python/optimization.html
http://ruder.io/optimizing-gradient-descent/
• It’s unlikely that a single one will work best every time. Experiment!
• Several SGD variants adapt the learning rate during training
• Some of them even use a specific learning rate for each parameter
Example: learning MNIST with the LeNet CNN (20 epochs)
Algorithm            SGD     NAG     Adam    NAdam   AdaGrad  AdaMax
Time / epoch         2.5s    2.55s   18.5s   15.1s   5.7s     7.5s
Validation accuracy  98.5%   98.5%   98.3%   98.4%   99.2%    98.55%
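Swapping optimizers is a one-line change in fit(), which makes this kind of comparison cheap to run (a sketch; build_module() is a placeholder and the hyper-parameters are illustrative):
for opt, params in [('sgd',     {'learning_rate': 0.1, 'momentum': 0.9}),
                    ('adagrad', {'learning_rate': 0.01}),
                    ('adam',    {'learning_rate': 0.001})]:
    mod = build_module()  # rebuild the model from scratch for each run
    mod.fit(train_iter, eval_data=val_iter, num_epoch=20,
            optimizer=opt, optimizer_params=params)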
Reducing model size
• Complex neural networks are too large for resource-constrained environments
• MXNet supports Mixed Precision Training
• Use float16 instead of float32
• Almost 2x reduction in memory consumption, no loss of accuracy
• https://devblogs.nvidia.com/parallelforall/mixed-precision-training-deep-neural-networks/
• http://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#mxnet
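A minimal float16 sketch along the lines of the NVIDIA guides above: cast the input to float16 and cast back to float32 before the loss for numerical stability (the network itself is illustrative):
import mxnet as mx

data = mx.sym.Variable('data')
data = mx.sym.cast(data, dtype='float16')  # compute in half precision
net = mx.sym.Convolution(data, kernel=(3, 3), num_filter=64)
net = mx.sym.Activation(net, act_type='relu')
net = mx.sym.FullyConnected(mx.sym.flatten(net), num_hidden=10)
net = mx.sym.cast(net, dtype='float32')    # back to float32 for the loss
net = mx.sym.SoftmaxOutput(net, name='softmax')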
• BMXNet: Binary Neural Network Implementation
• Use binary values for weights and activations
• 20x to 30x reduction in model size, with limited loss
• https://github.com/hpi-xnor/BMXNet
Monitoring the training process
• You can run callbacks at the end of each batch and at the end of each epoch.
• This allows you to display training speed…
• … and save parameters after each epoch
module.fit(iterator, num_epoch=n_epoch, ...
batch_end_callback=mx.callback.Speedometer(64, 10))
Epoch[0] Batch [10] Speed: 1910.41 samples/sec Train-accuracy=0.200000
Epoch[0] Batch [20] Speed: 1764.83 samples/sec Train-accuracy=0.400000
module.fit(iterator, num_epoch=n_epoch, ...
epoch_end_callback = mx.callback.do_checkpoint("mymodel", 1))
Start training with [cpu(0)]
Epoch[0] Resetting Data Iterator
Epoch[0] Time cost=0.100 Saved checkpoint to "mymodel-0001.params"
Epoch[1] Resetting Data Iterator
Epoch[1] Time cost=0.060 Saved checkpoint to "mymodel-0002.params"
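Those checkpoints can be reloaded later to resume training or to deploy (a sketch using mx.model.load_checkpoint):
import mxnet as mx

# Reload epoch 2 from the 'mymodel' prefix saved above
sym, arg_params, aux_params = mx.model.load_checkpoint('mymodel', 2)

mod = mx.mod.Module(sym, context=mx.cpu())
mod.fit(train_iter,
        num_epoch=10,
        arg_params=arg_params,
        aux_params=aux_params,
        begin_epoch=2)  # resume where the checkpoint left off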
Early stopping
[Chart: accuracy and loss vs. epochs. Training accuracy keeps climbing towards 100% while validation accuracy plateaus and the validation loss starts rising again: that is overfitting. The best checkpoint is the one saved just before validation performance degrades.]
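MXNet has no built-in early stopping, but the idea is easy to sketch on top of the callbacks above: score on validation data after each epoch, keep the best checkpoint, stop when accuracy has stopped improving (mod, train_iter and val_iter are assumed from the previous slides):
best_acc, wait, patience = 0.0, 0, 3

for epoch in range(100):
    # train one epoch at a time; parameters persist across fit() calls
    mod.fit(train_iter, begin_epoch=epoch, num_epoch=epoch + 1)
    acc = dict(mod.score(val_iter, 'acc'))['accuracy']
    if acc > best_acc:
        best_acc, wait = acc, 0
        mod.save_checkpoint('best', epoch)  # this is the "best checkpoint"
    else:
        wait += 1
        if wait >= patience:  # validation accuracy has plateaued
            print('stopping early at epoch %d' % epoch)
            break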
Conclusion
• There is a lot of literature on selecting and tweaking hyper-parameters
• You should definitely read it but please experiment with your own data
• Train 1,000 models and pick the best one
• Optimizing infrastructure is all the more important, then!
• Make sure all parts are firing on all cylinders
• Spot instances!
• I hope this was useful. Please don’t forget to send your feedback
• Go build cool stuff and let me know! Happy to share and retweet
Resources
https://aws.amazon.com/ai
https://aws.amazon.com/blogs/ai
https://mxnet.io
https://github.com/apache/incubator-mxnet
https://github.com/gluon-api
https://aws.amazon.com/blogs/machine-learning/speeding-up-apache-mxnet-using-the-nnpack-library/
https://medium.com/@julsimon/speeding-up-apache-mxnet-part-3-lets-smash-it-with-c5-and-intel-mkl-90ab153b8cc1
https://medium.com/@julsimon/imagenet-part-1-going-on-an-adventure-c0a62976dc72
https://medium.com/@julsimon/imagenet-part-2-the-road-goes-ever-on-and-on-578f09a749f9
Thank you!
Julien Simon, AI Evangelist, EMEA
@julsimon
THANK YOU!
Julien Simon, Principal AI/ML Evangelist, EMEA
@julsimon
Editor's notes
• ImageNet: 1.2 million files, 152 GB
• Intel® MKL-DNN (Math Kernel Library for Deep Neural Networks) is highly optimized using industry leading techniques and low level assembly code where appropriate. The API has been developed with feedback and interaction with the major framework owners, and as an open source project will track new and emerging trends in these frameworks. Intel is using this internally for our work in optimizing industry frameworks, as well as supporting the industry in their optimizations.