Automatic Selection of Object
Recognition Methods Using
Reinforcement Learning
Reinaldo A.C. Bianchi†, Arnau Ramisa‡,
and Ramón López de Mántaras‡
†Centro Universitário da FEI, Brazil
‡Artificial Intelligence Research Institute, Spain
Presenter: Shunta SAITO
Authors
• Reinaldo A.C. Bianchi
‣ full professor at the Electrical Engineering Department of the Centro Universitário da FEI, São Bernardo do Campo, São Paulo, Brazil
• Arnau Ramisa
‣ postdoc with the Perception and Manipulation team at the Industrial Robotics Institute (IRI UPC-CSIC), Universitat Politecnica de Catalunya
• Ramón López de Mántaras
‣ Director of the IIIA (Artificial Intelligence Research Institute) of the CSIC (Spanish National Research Council)
Abstract
Question:
Which algorithm should be used to recognize objects?
Goal:
Automatically select the best algorithm from two state-of-the-art object recognition algorithms
Methodology:
Using Reinforcement Learning
Background:
The robot should be able to decide by itself which object recognition method should be used, depending on the current conditions of the world
Reinforcement Learning
The RL problem is meant to be a straightforward framing of the problem of learning from interaction to achieve a goal (Sutton and Barto, 1998)
Formulation (assuming that the environment satisfies the Markov property):
- a finite set of states $s \in S$ that the agent can reach;
- a finite set of possible actions $a \in A$ that the agent can perform;
- a state transition function $T : S \times A \to \Pi(S)$, where $\Pi(S)$ is a probability distribution over $S$;
- a finite set of bounded reinforcements (payoffs) $R : S \times A \to \mathbb{R}$.
Task: find a stationary policy of actions $\pi^* : S \to A$
Reinforcement Learning
Formulation:
$Q^*(s, a)$: the optimal state-action function.
$Q^*(s, a)$ is the reward received upon performing action $a$ in state $s$, plus the discounted value of following the optimal policy thereafter:
$$Q^*(s, a) \equiv R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V^*(s')$$
where $T(s, a, s')$ is the transition probability. Expanding $V^*$ gives the recursive form:
$$Q^*(s, a) \equiv R(s, a) + \gamma \sum_{s' \in S} T(s, a, s') \max_{a'} Q^*(s', a')$$
The optimal policy is $\pi^* \equiv \arg\max_a Q^*(s, a)$.
Q-Learning
Formulation:
Let $\hat{Q}$ be the learner's estimate of $Q^*(s, a)$.
The Q-learning algorithm iteratively approximates $\hat{Q}$ with the backup:
$$\hat{Q}(s, a) \leftarrow \hat{Q}(s, a) + \alpha \left[ r + \gamma \max_{a'} \hat{Q}(s', a') - \hat{Q}(s, a) \right]$$
where $\gamma$ is the discount factor $(0 \le \gamma < 1)$ and $\alpha$ is the step-size parameter, decayed with the total number of times this state-action pair has been visited:
$$\alpha = \frac{1}{1 + \mathrm{visits}(s, a)}$$
Under these conditions, the values of $\hat{Q}$ will converge with probability 1 to $Q^*$.
(backup diagram omitted: $\hat{Q}(s, a)$ is backed up from the successor pair $\hat{Q}(s', a')$)
Reinforcement Learning
Q-learning Algorithm:

Q.initialize(arbitrarily)
Episodes.each do
  s.initialize
  until s.is_terminal?
    a = epsilon_greedy(s) # using policy derived from Q
    reward, next_state = take_action(a)
    Q.update(s, a, reward, next_state)
    s = next_state
  end
end

where Q.update applies:
$$\hat{Q}(s, a) \leftarrow \hat{Q}(s, a) + \alpha \left[ r + \gamma \max_{a'} \hat{Q}(s', a') - \hat{Q}(s, a) \right]$$
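For concreteness, here is a minimal tabular Q-learning sketch in Python that implements the update rule above; the state/action representation and the helper names are illustrative assumptions, not the authors' code.

import random
from collections import defaultdict

GAMMA = 0.9    # discount factor, 0 <= gamma < 1
EPSILON = 0.1  # exploration rate of the epsilon-greedy policy

Q = defaultdict(float)     # Q[(state, action)] -> estimated value
visits = defaultdict(int)  # visit counts, used for the decaying step size

def epsilon_greedy(state, actions):
    # explore with probability EPSILON, otherwise act greedily w.r.t. Q
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    # one Q-learning backup, with alpha = 1 / (1 + visits(s, a))
    visits[(state, action)] += 1
    alpha = 1.0 / (1.0 + visits[(state, action)])
    target = reward + GAMMA * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])

An episode loop would call epsilon_greedy, apply the chosen action in the environment, then call q_update, exactly as in the pseudocode above.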
RL applications in Computer Vision
Active Vision
Whitehead and Ballard (1991) => Machine Learning (Jnl.)
described an adaptive control architecture to integrate active sensory-
motor systems with RL based decision systems
Minut and Mahadevan (2001) => ICAA
proposed a model of selective attention for visual search tasks, such as
deciding where to fixate next in order to reach the region where an object is
most likely to be found
Darrell and Pentland (1996a,b) => NIPS, ICPR
proposed a gesture recognition system that guides an active camera to
foveate salient features based on a RL paradigm
(foveate: to angle one's eyes such that the foveae are directed at an object in one's field of view)
RL applications in Computer Vision
Active Vision
Darrell (1998) => NIPS
concisely represented active recognition behavior derived from hidden-
state reinforcement learning techniques
Paletta and Pinz (2000) => Robotics and Autonomous Systems (Jnl.)
applied RL in an active object recognition system, to learn how to move
the camera to informative viewpoints, defining the recognition process as a
sequential decision problem with the objective of disambiguating initial
object hypotheses
For these authors, "Reinforcement Learning provides an efficient method to autonomously develop near-optimal decision strategies in terms of sensori-motor mappings" (Paletta et al, 1998)
RL applications in Computer Vision
Active Vision
Borotschnig, et al. (1999) => Image and Vision Computing (Jnl.)
built a system that learns to reposition the camera to capture additional
views to improve the image classification result which is obtained from a
single view
Paletta, et al. (2005) => ICML
proposed the use of Q-learning to associate shift of attention actions to
cumulative reward with respect to object recognition
Image Segmentation
Peng and Bhanu (1998) => IEEE Trnsc. on PAMI
used RL to learn to adapt the image segmentation params of a specific
algorithm to the changing environmental conditions
RL applications in Computer Vision
Image Segmentation and Object Recognition
Peng and Bhanu (1998, 2000) => IEEE Trnsc. on SMC
improved the recognition results by using the output at the highest level as
feedback for the learning system
Taylor (2004) => MSc Thesis
proposed a general framework for applying RL to parameter selection
problem in vision
Parameter Selection in Vision Problem
Tizhoosh and Taylor (2006) => Int. Jnl. of Image and Graphics
proposed an automated technique for obtaining a subjectively ideal image enhancement
RL applications in Computer Vision
Parameter Selection in Vision Problem
Shokri and Tizhoosh (2003∼2008) => some Int. Jnl.s...
proposed a reinforcement agent for finding an optimal threshold in order
to segment digital images
Yin (2002) => Signal Processing (Jnl.)
designed a general framework for an intelligent system to extract one object of interest from ultrasound images based on reinforcement learning
Sahba, et al. (2008) => Expert Systems with Applications (Jnl.)
proposed a RL system for adaptive tropical cyclone pattern segmentation and feature extraction from satellite imagery
Hossain, et al. (1999) => IEEE SMC
introduced a closed-loop tropical cyclone forecast system based on RL
RL applications in Computer Vision
Object Recognition
Draper, et al. (1999) => Jnl. of Computer Vision Research
modeled the object recognition problem as a Markov Decision Problem, and proposed a theoretically sound method for constructing object recognition strategies by combining CV algorithms to perform segmentation (the result is a system called ADORE (Adaptive Object Recognition) that automatically learns object-recognition strategies from training data)
There are many applications of RL in Computer Vision
Summary of RL in CV
Whitehead and Ballard (1991)
adaptive control architecture: active sensory-motor systems + RL based decision systems
diversified to...
- optimize the performance of active vision systems
- decide where the focus of attention should be
- learn how to move a camera to more informative viewpoints
- optimize parameters of existing and new CV algorithms
Limitations caused by RL
In Object Recognition Tasks
the reward value associated with a situation
‣ is usually not directly available
‣ requires a certain amount of knowledge about the world to be defined
the large state space
‣ makes it difficult for the learning to converge
‣ raises performance issues for RL algorithms
Two Object Recognition Methods
Lowe's Feature Matching method (Lowe, 2004)
‣ was proposed together with SIFT
‣ is a single-view object detection and recognition system
‣ matches features between a test image and model images, as below
Vocabulary Tree Algorithm (Nistér and Stewénius, 2006)
‣ uses visual words (Bag-of-Words) to classify images
Lowe's Method
(figure: identification — features of a model image are matched against an input image to produce the recognition result)
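As an illustration of this kind of identification, the sketch below matches SIFT features between a model image and an input image with OpenCV, keeping only matches that pass Lowe's distance-ratio test; the file names and the 0.8 ratio threshold are assumptions in the spirit of Lowe (2004), not the paper's exact setup.

import cv2

model = cv2.imread("model.png", cv2.IMREAD_GRAYSCALE)
test = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
_, des_model = sift.detectAndCompute(model, None)
_, des_test = sift.detectAndCompute(test, None)

# for each model descriptor, find its two nearest neighbours in the
# test image and keep the match only if it passes the ratio test
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des_model, des_test, k=2)
        if m.distance < 0.8 * n.distance]
print(len(good), "matches survive the ratio test")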
Vocabulary Tree Algorithm
(figure: classification — "which class?" a query image is assigned to one of the learned classes)
Preprocessing
• segmentation (a sketch of this pipeline follows)
1. bilateral filtering is applied to remove texture from the image
2. the Canny edge detector is applied to define the edges in the image
3. mathematical morphology operators are applied in order to close the contours that remained open
4. a flood-fill algorithm is used to fill connected areas divided by the edges
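A rough OpenCV sketch of the four steps above; all parameter values (filter size, Canny thresholds, kernel size) and the input file name are illustrative assumptions, not the settings used in the paper.

import cv2
import numpy as np

img = cv2.imread("frame.png")

# 1. bilateral filtering removes texture while preserving edges
smooth = cv2.bilateralFilter(img, 9, 75, 75)

# 2. Canny edge detection on the smoothed image
gray = cv2.cvtColor(smooth, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# 3. morphological closing seals contours that remained open
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))

# 4. flood-fill from a corner marks the background; inverting it keeps
#    the connected areas that the edges divide
mask = np.zeros((closed.shape[0] + 2, closed.shape[1] + 2), np.uint8)
filled = closed.copy()
cv2.floodFill(filled, mask, (0, 0), 255)
segments = cv2.bitwise_not(filled) | closed  # candidate object regions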
Weaknesses of two methods
Lowe's Feature Matching Method
performs poorly when recognizing sparsely textured objects or objects with repetitive patterns
Vocabulary Tree Algorithm
needs an accurate segmentation stage prior to classification, which can be very time consuming, and it depends on the quality of the segmentation stage to provide good results
Learning to Select Object Recognition Methods
1st stage: 1. decide to use the image for recognition
‣ because the image contains an object
2. decide the image should be discarded
‣ because the image does not contain objects
2nd stage: 1. decide to use Lowe's algorithm
2. decide to use the Vocabulary Tree (VT) algorithm
use Reinforcement Learning as a classification method (a schematic sketch follows)
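A self-contained schematic of the two-stage decision, using a toy classification table indexed by discretized image statistics; the bin size, the table entries, and the class ids (0 = discard, 1 = Lowe, 2 = VT) are illustrative assumptions, not values from the paper.

import numpy as np

BIN = 10  # discretization step for the mean and std of image intensity

# toy table learned in advance: (mean bin, std bin) -> class id
classification_table = {(5, 1): 0,   # background-like statistics: discard
                        (12, 4): 1,  # textured image: Lowe's method
                        (9, 2): 2}   # sparsely textured image: VT algorithm

def select_method(image):
    mean, std = float(np.mean(image)), float(np.std(image))
    cls = classification_table.get((int(mean) // BIN, int(std) // BIN), 0)
    if cls == 0:
        return "discard"  # 1st stage: the image contains no object
    return "lowe" if cls == 1 else "vocabulary_tree"  # 2nd stage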
State
State space: attributes extracted from the images + the possible classification of the image
State definition example in the 1st stage:
$s = [I, \sigma, c]$
I: mean image intensity
σ: standard deviation of image intensity
c: class ID
Action
Update action (not a real action happening in the world):
update the value of a state-action pair $Q(s, a)$ at one state using the value of a neighbor pair
(an update action is an action through the state space of $Q(s, a)$ values)
Example: if the state space is composed of [I, σ] (2D), an update action can overwrite the centre cell with its left neighbour's value (a sketch follows):

before action a           after action a
(rows: σ, columns: I)
0.1 0.7 1.2 3.1 1.8       0.1 0.7 1.2 3.1 1.8
0.5 1.1 0.3 2.6 4.1       0.5 1.1 0.3 2.6 4.1
1.4 2.3 3.2 0.9 2.7   →   1.4 2.3 2.3 0.9 2.7
0.7 4.3 2.7 1.4 3.9       0.7 4.3 2.7 1.4 3.9
3.2 4.6 1.3 1.7 0.7       3.2 4.6 1.3 1.7 0.7
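A minimal sketch of the update action on a 2D grid of Q values (axes I and σ), reproducing the 5x5 example above, where the centre cell 3.2 is overwritten with its left neighbour's value 2.3; in the full method the plain copy would be replaced by the Q-learning backup, so this is an illustration, not the authors' implementation.

import numpy as np

q = np.array([[0.1, 0.7, 1.2, 3.1, 1.8],
              [0.5, 1.1, 0.3, 2.6, 4.1],
              [1.4, 2.3, 3.2, 0.9, 2.7],
              [0.7, 4.3, 2.7, 1.4, 3.9],
              [3.2, 4.6, 1.3, 1.7, 0.7]])

OFFSETS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def update_action(q, cell, direction):
    # copy the neighbour's value into the current state-action cell
    di, dj = OFFSETS[direction]
    i, j = cell
    q[i, j] = q[i + di, j + dj]

update_action(q, cell=(2, 2), direction="left")  # 3.2 -> 2.3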
Reward
If the learning agent reaches a state where a training image exists, and the state corresponds to the correct classification of the image, the agent receives a reward
Example (a sketch follows):
a training image (mean = 50, std = 10, contains no objects)
a state (mean = 50, std = 10, classification = discard)
does such a training image exist? yes → reward > 0; no → reward = 0
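The reward rule above amounts to a lookup against the pre-classified training set; a minimal sketch, assuming the +10 value shown in the reward-table slide later on.

# (mean, std) of training images -> their correct classification
training_set = {(50, 10): "discard"}

def reward(state):
    mean, std, classification = state
    if training_set.get((mean, std)) == classification:
        return 10.0  # a matching training image exists: reward > 0
    return 0.0       # no such image, or wrong classification

reward((50, 10, "discard"))  # -> 10.0
reward((50, 10, "lowe"))     # -> 0.0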
MDP Definition
• The set of update actions $a \in A$ that the agent can perform, defined as "update the Q value using the value of a neighbor";
• The finite set of states $s \in S$ is, in this case, the n-dimensional space of values of the attributes extracted from the images, plus its classification;
• The state transition function allows updates to be made between any pair of neighbors in the set of states;
• The reinforcements $R : S \times A \to \mathbb{R}$ are defined using a set of training images.
Training phase
Reinforcement learning is performed over a set of pre-classified images
→ learn a mapping from images to image classes
(figure: the learning algorithm)
Training phase
What is happening during the learning phase?
(figure: grid of states with a goal state)
Every time the robot finds a goal state and receives a reward, the state-action pair where the robot was before reaching the goal state is updated
Every time the robot moves, it iteratively updates the origin state-action pair
Training phase
(figure: learning over the [I, σ, class id] state space)
class id 0: no such image in the dataset
class id 1: an image that should be recognized by Lowe's method is found
class id 2: an image that should be recognized by the VT algorithm is found
Training phase
(figure: learning over the [I, σ, class id] state space)
The right image shows a table where the classification was spread over to states where there are no prior examples, which allows the classification of other images
Learning and Test database
image dataset of nine typical household objects (Ramisa et al, 2008)
(figure: objects with repetitive texture, textured objects, non-textured objects)
3 categories, each consisting of 3 different objects; each object has approximately 20 training images
Learning and Test database
image dataset of nine typical household objects
(Ramisa et al, 2008)
test images
include occlusions, illumination changes, blur and other typical nuisances that will be encountered while navigating with a mobile robot
Learning and Test database
image dataset of nine typical household objects
(Ramisa et al, 2008)
background images
that do not contain objects to be recognized
State Space Descriptors (a sketch follows)
MS: mean and standard deviation of the image intensity
MSE: mean and standard deviation of the image intensity, plus the entropy of the image
MSI: mean and standard deviation of the image intensity, plus the number of interest points detected by the Difference of Gaussians operator
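A sketch of the three descriptors with NumPy and OpenCV; SIFT's keypoint detector stands in for the Difference-of-Gaussians operator, and the histogram-based entropy is an assumption about how "entropy of the image" is computed, not the paper's stated procedure.

import cv2
import numpy as np

def state_descriptors(gray):
    mean, std = float(gray.mean()), float(gray.std())

    # Shannon entropy of the grey-level histogram (for MSE)
    p, _ = np.histogram(gray, bins=256, range=(0, 256), density=True)
    entropy = float(-np.sum(p[p > 0] * np.log2(p[p > 0])))

    # number of DoG interest points, via SIFT's detector (for MSI)
    n_dog = len(cv2.SIFT_create().detect(gray, None))

    return {"MS": (mean, std),
            "MSE": (mean, std, entropy),
            "MSI": (mean, std, n_dog)}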
Experiments
Reward table
(figure: reward table, with rewards +10 and -10)
legend:
x: in the training set there exists an image with this combination of mean and std dev values
.: represents images that do not contain objects (backgrounds)
(whitespace): represents the absence of images
Classification Table
(figure: classification table — the result of applying the RL algorithm during the learning phase)
legend:
x: in the training set there exists an image with this combination of mean and std dev values
.: represents images that do not contain objects (backgrounds)
(whitespace): represents the absence of images
Experiments
image dataset of nine typical household objects (Ramisa et al, 2008):
object images (3 categories), test images (with nuisances), background images (with no objects)
Experiment phases:
1. the training of the RL agent
‣ 40 test images, from which approximately 160 images containing objects were segmented and previously classified
‣ 360 background images, also resulting from the segmentation process
2. the execution phase, where training quality can be verified
Correctly Classified Images (%), with $\alpha = 0.1$, $\gamma = 0.9$:

             Full Img                 Small Img             Expert
        MS      MSE     MSI     MS      MSE     MSI
Back    91.9    100.0   98.0    92.6    100.0   98.9      100.0
Lowe    84.5    100.0   44.4    76.0    98.4    38.1      93.2
Incorrect Classification (%), with $\alpha = 0.1$, $\gamma = 0.9$:

             Full Img                 Small Img             Expert
        MS      MSE     MSI     MS      MSE     MSI
Back    12.8    1.8     14.2    20.4    2.4     25.3      8.2
Lowe    11.6    1.9     7.9     15.8    1.9     9.9       10.8
Conclusion
• Reinforcement Learning has been widely used in the
Computer Vision field
• In this paper we presented a method that uses
Reinforcement Learning to decide which algorithm
should be used to recognize objects seen by a mobile
robot in an indoor environment
• Another important contribution of this work is a
method that allows the use of a Reinforcement
Learning algorithm as a Classifier