Passionate on Parallel REU 2015
This REU is co-funded by the ASSURE program of the Department of
Defense in partnership with the National Science Foundation REU Site
Program under Award No. 1263145.
Accelerating Convolutional Neural Network Learning with MPI
Dustyn Tubbs, Saginaw Valley State University
Patrick Streifel, St. Mary's College of Maryland
Advisors: Dr. Deming Chen and Ashutosh Dhar
Background
Convolutional Neural Networks (CNNs) are a type of feed-forward neural network specialized for classifying objects in images. A CNN predicts the class of an object in an image by applying multiple convolution filters across regions of the image. To train a CNN, these filters are adjusted based on each prediction's error, so that the updated filters produce slightly more accurate classifications. This method of supervised learning typically takes place over tens of thousands of labeled images.
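To make the filter-update step concrete, below is a minimal sketch (in NumPy, not the authors' Torch7 code) of one gradient-descent update of a single 3x3 convolution filter against a squared-error loss; the image size, target, loss, and learning rate are illustrative assumptions.

```python
# Minimal sketch: one gradient-descent update of a single convolution filter.
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D convolution (really cross-correlation, as in most CNNs)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

rng = np.random.default_rng(0)
image  = rng.standard_normal((8, 8))   # one training image (toy size)
target = rng.standard_normal((6, 6))   # desired feature map (hypothetical label)
kernel = rng.standard_normal((3, 3))   # the filter being learned
lr = 0.01                              # learning rate (assumed)

# Forward pass, prediction error, and the gradient of the squared-error
# loss with respect to each filter weight.
pred = conv2d_valid(image, kernel)
err  = pred - target
grad = np.zeros_like(kernel)
for i in range(3):
    for j in range(3):
        grad[i, j] = np.sum(err * image[i:i + 6, j:j + 6])

kernel -= lr * grad  # adjust the filter so the next prediction is slightly better
```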
Future Work
Having implemented data parallelism, our future work consists of acquiring training-time metrics for published architectures (such as the GoogLeNet architecture in [1]) and implementing model parallelism as a second form of CNN parallelism. Once this is accomplished, we intend to release a package to the Torch7 community that will let researchers easily apply both model and data parallelism in their own work.
Motivation
Ever since 2012, when the first CNN to win the annual ImageNet Challenge (a visual recognition contest) outperformed its closest competitor by 11%, CNNs have become the state of the art for object identification. They are used today by companies such as Google and Facebook to improve image search, auto-tag people in photographs, and power countless other applications.
Given the rapid development in CNN research, the need to exploit parallelization techniques that allow for both larger networks and faster training is apparent.
References and Acknowledgements
[1] Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
[2] Krizhevsky, Alex. "One weird trick for parallelizing convolutional neural networks." arXiv preprint arXiv:1404.5997 (2014).
This research project could not have been accomplished without the dedicated support of our mentors Deming Chen and Ashutosh Dhar, our REU leaders and organizers Jill Peckham, Craig Zilles, and Matthew West, and the University of Illinois at Urbana-Champaign. Special thanks to the Circuits Research Group at CSL for hosting us for the duration of our project.
Data-Parallelism
Data parallelism in Convolutional Neural Networks is accomplished by distributing both the network and the data across multiple nodes. Traditionally, a CNN is trained serially on a single node, which loads the entire data set. With data parallelism, instead of training on one image at a time, each node trains on its local chunk of the training data and communicates the necessary changes to every copy of the network across the cluster. This is superior in that it lets the researcher quickly train the network, adjust the topology, and train again.
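As an illustration of this scheme (a sketch, not our Torch7 implementation), the following uses mpi4py and NumPy; the weight vector, the chunk_<rank>.npy data files, and the local_gradient stand-in for backpropagation are all hypothetical.

```python
# Minimal sketch of data-parallel training: every rank holds an identical
# model copy, computes a gradient on its local data chunk, and all ranks
# average their gradients with an allreduce before updating.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

weights = np.zeros(1000)                     # identical model copy on every node
local_batch = np.load(f"chunk_{rank}.npy")   # hypothetical node-local data chunk

def local_gradient(w, batch):
    """Stand-in for backpropagation over this node's batch (illustrative only)."""
    return batch.mean(axis=0) - w

for step in range(100):
    grad = local_gradient(weights, local_batch)
    # Communicate the necessary changes to every network copy:
    # sum the local gradients across all nodes, then average.
    total = np.empty_like(grad)
    comm.Allreduce(grad, total, op=MPI.SUM)
    weights -= 0.01 * (total / size)         # every copy applies the same update
```

Launched with, e.g., mpirun -n 4, every rank applies the same averaged update each step, so all copies of the model stay synchronized without any central parameter server.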
Goal
We used Torch7, a framework for building and training CNNs. During our research, we discovered that Torch7 had no convenient utilities for MPI parallelization, so we made it our goal to implement MPI parallelization for Torch CNNs.
Parallelization Methods
There are two paradigms for CNN parallelization: model parallelism and data parallelism [2]. In model parallelism, the CNN architecture is divided among several workers, which train on the same batch of images; this technique allows much larger CNNs to be trained. In data parallelism, identical copies of the CNN are maintained by each worker and trained in parallel on separate batches of images; this allows a CNN to be trained on many more images at one time.
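To contrast the two paradigms, here is a minimal sketch of model parallelism in mpi4py/NumPy (again, not our Torch7 implementation): the layers of one network are split across two ranks, and only the activations cross the node boundary. The two-rank split and layer sizes are assumptions for illustration.

```python
# Minimal sketch of model parallelism: one network's layers are split across
# two MPI ranks; rank 0 runs the first half and sends its activations to
# rank 1, which runs the second half. A full implementation would also pass
# gradients back during training.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

x = np.ones(512, dtype=np.float64)           # input features (toy size)

if rank == 0:
    W0 = np.random.default_rng(0).standard_normal((256, 512)) * 0.01
    h = np.maximum(W0 @ x, 0.0)              # first half of the network: linear + ReLU
    comm.Send(h, dest=1, tag=0)              # hand the activations to the next worker
elif rank == 1:
    h = np.empty(256, dtype=np.float64)
    comm.Recv(h, source=0, tag=0)
    W1 = np.random.default_rng(1).standard_normal((10, 256)) * 0.01
    logits = W1 @ h                          # second half produces the class scores
    print("class scores on rank 1:", logits)
```

Because each rank only ever holds its own slice of the weights, a network too large for one node's memory can still be trained, at the cost of the inter-node activation traffic shown above.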