Biologically-inspired Active Vision System for Object Recognition
Martin Peniak, Davide Marocco
University of Plymouth
Ron Babich, John Tran
NVIDIA Research
Outline
1. Introduction
a. Biological vision vs. computer vision
b. The role of active perception
c. Neural networks and Genetic Algorithms
2. Background
a. Presentation of related research (Marocco, Floreano, etc.)
3. Preliminary Experiments
a. Method (neural networks + genetic algorithms on GPU)
b. Results (video of evolved controllers)
4. Conclusions
A long-standing challenge in robotics is the development of a truly robust and general-purpose vision system
suitable for object identification, navigation, and other tasks. An unconventional but promising approach for
tackling this challenge relies on the concept of active perception, inspired by the observation that biological
organisms interact with the world in order to make sense of it. In the context of vision, this argues for a system
that takes in only a small part of the scene at a time (mimicking the region captured by the fovea in the human eye),
moving from one such part to another in rapid succession. By leveraging a neural network for control, it is possible
to evolve an active vision system with the desired characteristics.
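The foveated sampling described above can be sketched as follows. This is a minimal illustration, not the actual system: the patch size, image dimensions, and the placeholder controller policy are all assumptions for the sake of the example.

```python
import numpy as np

def foveal_patch(image, cx, cy, size=5):
    """Extract a small square patch centered at the gaze point (cx, cy),
    clamping the center so the patch stays inside the image."""
    h, w = image.shape
    half = size // 2
    cx = min(max(cx, half), w - half - 1)
    cy = min(max(cy, half), h - half - 1)
    return image[cy - half:cy + half + 1, cx - half:cx + half + 1]

def controller(patch, state):
    """Placeholder gaze policy; in the real system this would be the
    evolved neural network. Here it simply drifts rightward."""
    dx, dy = 3, 0
    return dx, dy, state

# Active-vision loop: the controller never sees the whole scene, only the
# current foveal patch, and decides where to look next.
image = np.random.rand(64, 64)
cx, cy, state = 32, 32, None
for _ in range(20):  # 20 time-steps, matching the experiments below
    patch = foveal_patch(image, cx, cy)
    dx, dy, state = controller(patch, state)
    cx = min(max(cx + dx, 0), 63)
    cy = min(max(cy + dy, 0), 63)
```

The clamping keeps the gaze inside the image; the recurrent state threaded through `controller` is what lets the system integrate observations across fixations.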
Prior work has relied on very small arrays of photoreceptors (e.g., 5x5), applied to simple identification tasks such
as distinguishing a triangle from a square. Although such systems are valuable as proofs of concept, tackling
real-world problems will require much larger systems backed by much larger neural networks, whose training cost
grows super-linearly with size. We therefore turn to an efficient CUDA implementation, scalable to many GPUs in parallel.
Our system is based on an Elman-type recurrent neural network with a biologically-inspired retina. The neural
network is evolved through a genetic algorithm incorporating the island model, which involves segregated
populations whose members migrate between “islands” only infrequently. This design both facilitates parallel
scaling and improves the quality of the final solution by avoiding convergence to local optima.
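The island model can be sketched as follows. This is a toy illustration in Python rather than CUDA, and the truncation selection, mutation scheme, ring migration topology, and toy fitness function are all assumptions standing in for the real evolutionary setup:

```python
import numpy as np

def evolve_islands(fitness, genome_len, n_islands=4, pop_size=20,
                   n_gens=50, migrate_every=10, rng=None):
    """Island-model GA sketch: each island evolves its own segregated
    population; every `migrate_every` generations, the best individual of
    each island replaces the worst individual of the next island in a ring."""
    rng = rng or np.random.default_rng(0)
    islands = [rng.standard_normal((pop_size, genome_len))
               for _ in range(n_islands)]
    for gen in range(n_gens):
        for i, pop in enumerate(islands):
            scores = np.array([fitness(g) for g in pop])
            order = np.argsort(scores)[::-1]           # best first
            elite = pop[order[:pop_size // 2]]         # truncation selection
            children = elite + rng.normal(0, 0.05, elite.shape)  # mutation
            islands[i] = np.vstack([elite, children])
        if (gen + 1) % migrate_every == 0:
            # Infrequent migration: each island exports its champion.
            bests = [pop[np.argmax([fitness(g) for g in pop])].copy()
                     for pop in islands]
            for i, pop in enumerate(islands):
                scores = np.array([fitness(g) for g in pop])
                pop[np.argmin(scores)] = bests[(i - 1) % n_islands]
    return max((g for pop in islands for g in pop), key=fitness)

# Toy fitness: maximize negative squared distance to a target genome.
target = np.ones(8)
best = evolve_islands(lambda g: -np.sum((g - target) ** 2), genome_len=8)
```

Because migration is rare, each island searches a different region of the space before solutions mix, which is what reduces the risk of the whole population collapsing onto a single local optimum; it also maps naturally onto one island per GPU.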
The active vision system was required to learn to recognize five objects from the Amsterdam Library of
Object Images (ALOI). During the evolutionary process, these objects were presented to the system under 16
different illumination conditions and at 36 different rotation angles. Each neural network controller explored
all of these variations in parallel on the GPU, making the evolutionary process significantly faster than a
multi-threaded CPU implementation. At the end of evolution, the controllers with the highest fitness successfully
recognized all of the objects within 20 time-steps. Our preliminary results suggest that the system is tolerant to
variations in object rotation, position, and scale.
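The evaluation over variations could be organized as sketched below. Here `controller_step` and `get_view` are hypothetical stand-ins for the evolved controller and the ALOI image lookup, and the nested loops make the structure explicit; on the GPU, each (object, illumination, rotation) triple would instead be evaluated by its own thread or block in parallel:

```python
# Experimental dimensions from the text: 5 objects, 16 illumination
# conditions, 36 rotation angles, 20 time-steps per episode.
N_OBJECTS, N_ILLUM, N_ROT, N_STEPS = 5, 16, 36, 20

def evaluate(controller_step, get_view):
    """Score one controller across every variation of every object.
    `controller_step(view, state)` returns (guess, new_state);
    `get_view(obj, illum, rot, t)` returns the sensory input at step t."""
    correct = 0
    for obj in range(N_OBJECTS):
        for illum in range(N_ILLUM):
            for rot in range(N_ROT):
                state = None
                for t in range(N_STEPS):
                    view = get_view(obj, illum, rot, t)
                    guess, state = controller_step(view, state)
                correct += (guess == obj)  # judged after the final step
    return correct / (N_OBJECTS * N_ILLUM * N_ROT)
```

The returned fraction of correctly identified variations is one natural choice of fitness value to feed back into the genetic algorithm, though the actual fitness function used in the experiments may differ.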