Deep learning has revolutionized computer vision by significantly increasing the accuracy of recognition systems. This session will discuss how the Amazon Fulfillment Technologies Computer Vision Research team has harnessed deep learning to identify inventory defects in Amazon’s warehouses. Beginning with a brief overview of how orders on Amazon.com are fulfilled, the talk will describe a combination of hardware and software that uses computer vision and deep learning to visually examine bins of Amazon inventory and locate possible mismatches between the physical inventory and the inventory records. With the growth of deep learning, the emphasis of new system design is shifting from clever algorithms to innovative ways of harnessing available data.
2. What to Expect from the Session
• Description of how Amazon Fulfillment Technologies has used computer vision to improve our processes.
• Walk-through of how we combined deep learning and traditional computer vision to automate an industrial process.
• What challenges and opportunities do deep learning classifiers create?
13. General strategy
• We want to take advantage of deep learning.
• The cameras capture images of an entire pod, but we need data at the bin level.
• We will use a two-step process:
1. Extract bins from the pod images
2. Analyze the bin images
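The two-step process can be sketched as a small pipeline: crop each bin out of the pod image, then run a per-bin analysis on every crop. This is a minimal illustration, not the team's code. It assumes bins are axis-aligned boxes with known pixel coordinates (in practice the bin boundaries come from the template fit described on the following slides and need not be axis-aligned), and the names `extract_bins`, `process_pod`, and `mean_intensity` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

Image = List[List[int]]          # grayscale image stored as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right are exclusive
T = TypeVar("T")


def extract_bins(pod_image: Image, bin_boxes: Dict[str, Box]) -> Dict[str, Image]:
    """Step 1: crop one sub-image per bin out of the full pod image."""
    return {
        bin_id: [row[left:right] for row in pod_image[top:bottom]]
        for bin_id, (top, left, bottom, right) in bin_boxes.items()
    }


def process_pod(pod_image: Image, bin_boxes: Dict[str, Box],
                analyze_bin: Callable[[Image], T]) -> Dict[str, T]:
    """Step 2: run a per-bin analysis on every extracted bin image."""
    return {bin_id: analyze_bin(crop)
            for bin_id, crop in extract_bins(pod_image, bin_boxes).items()}


def mean_intensity(img: Image) -> float:
    """Placeholder analysis; a real system would call a classifier here."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


# Hypothetical 4x4 pod face split into an upper bin "A" and a lower bin "B".
pod = [[10, 10, 10, 10],
       [10, 10, 10, 10],
       [50, 50, 50, 50],
       [50, 50, 50, 50]]
boxes = {"A": (0, 0, 2, 4), "B": (2, 0, 4, 4)}
results = process_pod(pod, boxes, mean_intensity)
```

Keeping the analysis as a pluggable callable means the same extraction step can feed a change detector, a counter, or a free-space estimator.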
24. Fit a line (similar process for the other side)
Amazon Confidential 24
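The slides do not say how the line is fit. A minimal sketch, assuming the input is a set of edge points already detected along one side of the pod, is an ordinary least-squares fit. Regressing x on y keeps the slope finite for near-vertical pod sides; a robust estimator such as RANSAC would tolerate stray edge points better. The function name and sample points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels


def fit_line_x_of_y(points: List[Point]) -> Tuple[float, float]:
    """Least-squares fit of x = a*y + b.

    Regressing x on y (rather than y on x) keeps the slope finite for the
    near-vertical edges of a pod side.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if syy == 0:
        raise ValueError("points share a single y value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / syy
    return a, mean_x - a * mean_y


# Hypothetical edge points detected along the left side of a pod.
edge_points = [(20.0, 0.0), (30.0, 100.0), (40.0, 200.0), (50.0, 300.0)]
slope, intercept = fit_line_x_of_y(edge_points)
```

The same function would be run on the edge points of the other side of the pod.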
25. We can detect trays in the same way
26. We can detect trays in the same way
Now we have locations to tie the virtual template to the image!
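One way to turn the detected pod sides and trays into anchor locations is to intersect them: each side/tray crossing is an image point that corresponds to a known physical point on the pod. This sketch assumes sides are fit as x = a*y + b and trays as y = c*x + d (near-vertical and near-horizontal lines respectively); the function name and line parameters are hypothetical.

```python
from typing import Tuple


def intersect(side: Tuple[float, float], tray: Tuple[float, float]) -> Tuple[float, float]:
    """Intersect a near-vertical side x = a*y + b with a near-horizontal tray y = c*x + d."""
    a, b = side
    c, d = tray
    # Substituting y = c*x + d into x = a*y + b gives x * (1 - a*c) = a*d + b.
    denom = 1.0 - a * c
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel")
    x = (a * d + b) / denom
    return x, c * x + d


# Hypothetical fitted lines, in image pixels.
side_line = (0.1, 20.0)    # x = 0.1*y + 20
tray_line = (0.05, 300.0)  # y = 0.05*x + 300
corner = intersect(side_line, tray_line)
```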
27. The transformation between image coordinates and the pod’s physical coordinates is called a homography
We can verify that it works by calculating the boundary of each bin in the image and coloring it in.
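A homography is a 3×3 matrix H that maps a physical point (X, Y) on the pod face to an image point (u, v) via [u', v', w] = H [X, Y, 1], with u = u'/w and v = v'/w. The sketch below estimates H from exactly four correspondences (for example, pod corners located from side/tray intersections) by fixing h33 = 1 and solving the resulting 8×8 linear system, then projects a bin outline into the image. It is a minimal, dependency-free illustration under those assumptions; the pod and bin dimensions are hypothetical. With more than four (noisy) correspondences, OpenCV's `cv2.findHomography` solves the same problem in a least-squares or robust sense.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Direct linear solve for H (with h33 = 1) mapping each src point to its dst point."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h: Matrix, p: Point) -> Point:
    """Apply H to a physical point and divide by the homogeneous coordinate."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical pod face corners in cm and where they were found in the image (pixels).
pod_corners_cm = [(0.0, 0.0), (100.0, 0.0), (100.0, 200.0), (0.0, 200.0)]
image_corners_px = [(50.0, 40.0), (450.0, 60.0), (440.0, 860.0), (60.0, 840.0)]
H = homography_from_4_points(pod_corners_cm, image_corners_px)

# Outline of one hypothetical bin spanning 0-50 cm across and 0-40 cm down.
bin_outline_px = [project(H, p) for p in [(0.0, 0.0), (50.0, 0.0), (50.0, 40.0), (0.0, 40.0)]]
```

The projected `bin_outline_px` polygon is what would be drawn and filled in to check the fit visually.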
28. How can we use computer vision?
• Automatic identification of every item?
29. How can we use computer vision?
• Automatic identification of every item? (TOO HARD)
• Automatic counting of every item?
30. What does computer vision need to tell us?
• Automatic identification of every item? (TOO HARD)
• Automatic counting of every item? (TOO HARD)
31. Instead, we can look for changes
[Figure panels: Inbound to the Station; Outbound from the Station]
32. Our first attempt was with hand-engineered computer vision
33. It’s hard!
It must be robust to items rolling or shuffling inside the bin, illumination changes, specularity, etc.
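The slides do not describe the hand-engineered method, so the following is only a generic illustration of why pixel comparison is brittle. A naive mean absolute difference flags a pure lighting change as a "change"; subtracting the global mean fixes that one case, but an item that merely slid over still looks like a change even though the inventory is the same. The images, the 2×2 "item", and the threshold of 10 are all hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def mean_abs_diff(a: Image, b: Image) -> float:
    """Average absolute per-pixel difference between two same-sized images."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def mean_centered(img: Image) -> Image:
    """Subtract the global mean, cancelling a uniform brightness shift."""
    m = sum(map(sum, img)) / (len(img) * len(img[0]))
    return [[p - m for p in row] for row in img]


def naive_changed(before: Image, after: Image, threshold: float = 10.0) -> bool:
    return mean_abs_diff(before, after) > threshold


def centered_changed(before: Image, after: Image, threshold: float = 10.0) -> bool:
    return mean_abs_diff(mean_centered(before), mean_centered(after)) > threshold


def scene(item_row: int, item_col: int, background: float = 100.0) -> Image:
    """4x4 bin with a bright 2x2 'item' whose top-left corner is at (item_row, item_col)."""
    img = [[background] * 4 for _ in range(4)]
    for r in range(item_row, item_row + 2):
        for c in range(item_col, item_col + 2):
            img[r][c] = 200.0
    return img


before = scene(0, 0)
brighter = [[p + 30.0 for p in row] for row in before]  # lights changed, nothing moved
shuffled = scene(2, 2)                                    # same item, just slid over
```

Each fix handles one nuisance factor and leaves the others, which is the pattern that makes hand-engineered change detection hard to finish.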
34. The big insight
• We realized our problem was just binary classification.
• Two images in, one label out.
• Why not try this deep-learning thing?
35. We did the simplest thing possible
• Take the first image, convert it to grayscale, and put it in the red channel of a new image
• Take the second image and put it in the blue channel
• Now we have a single image to pass to the neural network
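The channel-packing trick above can be sketched with plain lists of RGB tuples. Several details are assumptions not stated on the slide: the grayscale conversion uses ITU-R BT.601 luma weights, the second image is also converted to grayscale, and the unused green channel is set to zero. The function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBImage = List[List[RGB]]
GrayImage = List[List[int]]


def to_gray(img: RGBImage) -> GrayImage:
    """Luma conversion with ITU-R BT.601 weights (an assumption; the slides
    do not say which grayscale conversion was used)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in img]


def stack_pair(first: RGBImage, second: RGBImage) -> RGBImage:
    """Pack two same-sized images into one: first -> red, second -> blue.

    The green channel is left at zero (an assumption; the slides only
    mention the red and blue channels).
    """
    if len(first) != len(second) or any(len(a) != len(b) for a, b in zip(first, second)):
        raise ValueError("images must have the same size")
    g1, g2 = to_gray(first), to_gray(second)
    return [[(p, 0, q) for p, q in zip(r1, r2)] for r1, r2 in zip(g1, g2)]


white, black = (255, 255, 255), (0, 0, 0)
inbound = [[white, black]]
outbound = [[white, white]]
combined = stack_pair(inbound, outbound)
```

Where a pixel is unchanged, red and blue agree; where it differs, the pixel tints toward red or blue, which gives the network a single input in which "before" and "after" are spatially aligned. If the images are held as OpenCV arrays, note that OpenCV orders channels as BGR rather than RGB.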
38. Implementation details
• Implemented in OpenCV in Python
• C++ extensions for some steps
• Neural net uses Caffe
• Trained on G2 instances
• Runs on CPUs in the fulfillment center (FC) server room
• Can tolerate the latency in our current usage pattern
39. Software architecture
[Architecture diagram spanning the site server room and AWS. Components: edge device(s) (camera controller, file pusher, barcode extraction); local storage service; capture event data router; bin extraction process; auto count process; Inventory Event Correlator (EC2); VBI Service (EC2); Remote Count Website for defect detection (EC2); Inventory Bin Count Elimination (EC2); DynamoDB; applications. Components communicate through SNS, SQS, and HTTP POST. Labeled operations: Put Pod Face Images, Put Bin Images, Get Pod Images, Get Bin Image, Get Bin Defect Result, Get Bin Space Available, Get Work for Remote Counting.]
40. How can we use computer vision?
• Automatic identification of every item? (TOO HARD)
• Automatic counting of every item?
41. Could we just count the number of items in the bin?
• At this point, we have lots of data.
• Some of it has errors from inventory defects, but networks have proven resilient to this kind of label noise.
• Why not just train a network to count the items in a bin directly?
42. Using a convolutional neural network
• We used the Caffe implementation of GoogLeNet [1]
[1] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
43. Maps cleanly onto classification paradigm
• Treat it as a multi-class classification problem
[Figure: bin image → neural network → one score per possible item count]
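Treating counting as multi-class classification means the network emits one score per count class, a softmax turns the scores into probabilities, and the predicted count is the most probable class. This is a minimal sketch, assuming class k means "k items in the bin"; the logits are hypothetical, and a real label set might cap the range with an "N or more" class.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_count(logits: List[float]) -> Tuple[int, float]:
    """Return (predicted item count, its probability), assuming class k = k items."""
    probs = softmax(logits)
    count = max(range(len(probs)), key=probs.__getitem__)
    return count, probs[count]


# Hypothetical network outputs for the classes 0, 1, 2, and 3 items.
count, confidence = predict_count([0.0, 1.0, 3.0, 2.0])
```

Framing it this way lets a standard classification network and loss be reused unchanged; the only new piece is the mapping from class index to item count.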
44. This saved the project
• Hit the targets we needed
• Eliminated a lot of hardware (no more before/after shots needed)
• Made the project cost effective
• Here is what we learned:
• Don’t focus on algorithms, focus on DATA
45. How else can we use this data?
• We want to find free space in the bin without having to label data.
• We can guess it from the dimensions of the items.
• But where is the space?
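Guessing free space from item dimensions amounts to simple volume arithmetic: subtract the summed item volumes from the bin volume. The sketch below is a hypothetical illustration of that idea, assuming box-shaped items with known dimensions. Because it ignores packing gaps, orientation, and compressible items, the result is only a rough proxy label, which is why such an estimate is noisy.

```python
from typing import Iterable, Tuple

Dims = Tuple[float, float, float]  # width, height, depth (any consistent unit)


def volume(d: Dims) -> float:
    w, h, l = d
    return w * h * l


def estimated_free_fraction(bin_dims: Dims, item_dims: Iterable[Dims]) -> float:
    """Fraction of the bin volume not taken up by the listed items.

    Ignores packing gaps, item orientation, and compressible items, so the
    result is only a rough, noisy proxy label. Clamped at zero.
    """
    bin_volume = volume(bin_dims)
    occupied = sum(volume(d) for d in item_dims)
    return max(0.0, 1.0 - occupied / bin_volume)


# Hypothetical bin and item dimensions in cm.
free = estimated_free_fraction((40.0, 30.0, 25.0), [(10.0, 10.0, 10.0), (20.0, 15.0, 5.0)])
```

Here the items occupy 1000 + 1500 = 2500 cm³ of a 30000 cm³ bin, so the estimate is about 0.92 free, with no indication of where in the bin that space is.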
46. Train model to predict emptiness from an image
[Diagram: GoogLeNet → Conv (3×3) → average pooling → emptiness score]
This is a noisy, probably incorrect estimate!
47. But we can use layers in the network to find where the space actually is!
[Diagram: GoogLeNet → Conv (3×3) → average pooling → emptiness score; annotations: 1024 channels, 3×3]
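Because the network ends in average pooling, the bin-level emptiness score is just the mean of a spatial map computed one layer earlier, so each cell of that map shows how much its region contributes. This is similar in spirit to class activation mapping. The sketch assumes the convolution produces a single-channel 3×3 map (the slide's "3×3" label could refer to the kernel or the map size); the map values and function names are hypothetical.

```python
from typing import List, Tuple

ScoreMap = List[List[float]]


def global_average_pool(score_map: ScoreMap) -> float:
    """Collapse a spatial map to one scalar, as an average-pooling layer does."""
    cells = [v for row in score_map for v in row]
    return sum(cells) / len(cells)


def emptiest_cell(score_map: ScoreMap) -> Tuple[int, int]:
    """(row, col) of the cell contributing most to the pooled emptiness score."""
    rows, cols = len(score_map), len(score_map[0])
    return max(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: score_map[rc[0]][rc[1]])


# Hypothetical 3x3 per-region scores taken just before the pooling layer.
score_map = [[0.1, 0.2, 0.9],
             [0.0, 0.1, 0.8],
             [0.0, 0.0, 0.7]]
bin_score = global_average_pool(score_map)
where = emptiest_cell(score_map)
```

The network is trained only on the pooled, bin-level score, yet the pre-pooling map localizes the space (here, the right-hand column) without any location labels.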
50. Takeaways
• We have great pattern recognition machinery now.
• Focus on the data:
• How can you get lots of it?
• What can you get for free?
• How much labeling do you really need?
• Is there a proxy problem?