This presentation discusses using reinforcement learning to teach modular robots locomotion. The key challenges are the high-dimensional state and action spaces, as well as the lack of domain knowledge. The presentation proposes using policy gradient reinforcement learning with finite differences to learn locomotion policies from raw sensor data. It suggests that incorporating domain knowledge through task manifolds and curriculum learning could help address the "curse of dimensionality" and speed up the learning process. The goals are to apply these techniques to learn locomotion, map tasks to policies, and develop a "robot school" curriculum.
3. Project Goals
• Combine deliberative and reactive algorithms
• Show stability and completeness
• Demonstrate multi-robot coverage on iCreate robots.
4. Coverage Problem
• Cover Entire Area
• Deliberative Algorithm plans next point to visit.
• Reactive Algorithm pushes robot to that point.
• Reactive Algorithm adds 2 constraints:
• Maintain Communication Distance
• Collision Avoidance
6. Demo for single vehicle
• Implemented on iCreate.
• 5 points to visit.
• Deliberative Algorithm selects point.
• Reactive Algorithm uses potential field to reach point.
• Point reached when within some minimum distance.
VIDEO
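In code, the two-layer scheme above might look like the following sketch, assuming a standard attractive/repulsive potential field; the gains, radii, waypoint list, and integration step are illustrative, not taken from the actual implementation.

```python
import numpy as np

def reactive_step(pos, goal, obstacles, k_att=1.0, k_rep=0.5,
                  influence=0.6, reach_tol=0.1):
    """One reactive update: descend a potential field toward `goal`.
    Returns (velocity_command, reached). Gains/radii are placeholders."""
    # Attractive term pulls the robot toward the goal.
    force = k_att * (goal - pos)
    # Repulsive terms push away from obstacles inside the influence radius.
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 0 < d < influence:
            force += k_rep * (1.0 / d - 1.0 / influence) * diff / d**3
    reached = np.linalg.norm(goal - pos) < reach_tol
    return force, reached

# Deliberative layer: visit 5 waypoints in a fixed order.
waypoints = [np.array(p) for p in [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]]
pos = np.array([0.2, 0.8])
for goal in waypoints:            # deliberative: select next point
    while True:                   # reactive: push robot to that point
        v, reached = reactive_step(pos, goal, obstacles=[])
        pos = pos + 0.1 * v       # simple Euler integration step
        if reached:
            break
```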
7. Multi-robot Case
• 2 Robot Coverage
• Blue is free to move.
• Green must stay in communication range.
• MATLAB Simulation.
VIDEO
9. Positioning System
• Problems with Stargazer.
• Periods of no measurement
• Occasional Bad Measurements
• State Estimation (SPF)
• Combine Stargazer with Odometry
• Reject Bad Measurements
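A common way to implement the bad-measurement rejection is a chi-square gate on the filter innovation; this sketch assumes that approach (the function name, gate threshold, and interface are illustrative, not from the original code).

```python
import numpy as np

def accept_measurement(z, z_pred, S, gate=9.21):
    """Gate on the innovation: reject a Stargazer fix whose squared
    Mahalanobis distance exceeds `gate` (9.21 ~ 99% for 2 DOF).
    `z_pred` and innovation covariance `S` come from the filter's
    prediction step."""
    innovation = z - z_pred
    d2 = innovation @ np.linalg.solve(S, innovation)
    return d2 < gate
```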
10. SPF Explanation
• Sigma Point Filter uses Stargazer and Odometry measurements to predict robot position.
• Non-Gaussian Noise
• Implemented and Tested on robot platform.
• Performs very well in the presence of no measurements or bad measurements.
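For reference, a generic sigma-point prediction step looks roughly like the following (a standard unscented-transform sketch in the Julier–Uhlmann form, not the code used on the robot; `f` stands in for the odometry motion model).

```python
import numpy as np

def sigma_points(x, P, kappa=2.0):
    """Generate 2n+1 sigma points and weights for mean x, covariance P."""
    n = len(x)
    S = np.linalg.cholesky((n + kappa) * P)
    pts = [x] + [x + S[:, i] for i in range(n)] + [x - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return np.array(pts), w

def unscented_predict(f, x, P, Q):
    """Propagate sigma points through motion model f (e.g. odometry)
    and recover the predicted mean and covariance."""
    pts, w = sigma_points(x, P)
    fx = np.array([f(p) for p in pts])
    x_pred = w @ fx
    P_pred = Q + sum(wi * np.outer(d, d) for wi, d in zip(w, fx - x_pred))
    return x_pred, P_pred
```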
12. Roomba Pac-Man
• Implemented 5 Robot Demo along with Jack Elston.
• Re-creation of Pac-Man Game.
• Demonstrates NetUAS system.
• Showcases most of the concepts from class.
25. Introduction
Robot State Machine
Gradients for “Grasping” the Object
Gradient for Moving the Object
Convergence Simulation Results
Continuing Work
26. Place a single beacon on an object and another at the object’s destination. Multiple robots cooperate to move the object.
Goals:
Minimal/No Robot Communication
Object has an Unknown Geometry
Use Gradients for Reactive Navigation
28. Each Robot Knows:
◦ Distance/Direction to Object
◦ Distance/Direction to Destination
◦ Distance/Direction to All Other Robots
◦ Bumper Sensor to Detect Collision
Robots Do Not Know
◦ Object Geometry
◦ Actions other Robots are taking
30. Related “Grasping” Work:
◦ Grasping with hand – Maximize torque [Liu et al]
◦ Cage objects for pushing [Fink et al]
◦ Tug Boats Manipulating Barge [Esposito]
◦ ALL require known geometry
My Hybrid Approach:
◦ Even distribution around object
◦ Alternate between Convergence and Repulsion Gradients
◦ Similar to Cow Herding example from class.
31. Pull towards object:
γ = ‖ri − robj‖
Avoid nearby robots:
β = ∏_{j=1}^{N} [ 1 − ((sign(dc − ‖ri − rj‖) + 1)/2) · ((1 + dc⁴)(‖ri − rj‖² − dc²)²) / (dc⁴((‖ri − rj‖² − dc²)² + 1)) ]
33. Repel from all robots:
β = ∏_{j=1}^{N} (‖ri − rj‖² − dr²)
Cost = 1 / (1 + β)^(1/κr)
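The cost terms can be written down directly; this sketch assumes the formulas as reconstructed above (the slides' equations were partly garbled), and the parameter values `d_r` and `kappa_r` are illustrative.

```python
import numpy as np

def grasp_cost(r_i, r_obj, robots, d_r=0.3, kappa_r=2.0):
    """gamma pulls robot i toward the object, beta repels it from the
    other robots, and the combined cost falls as beta grows."""
    gamma = np.linalg.norm(r_i - r_obj)              # pull towards object
    beta = np.prod([np.linalg.norm(r_i - r_j)**2 - d_r**2
                    for r_j in robots])              # repel from all robots
    cost = 1.0 / (1.0 + beta) ** (1.0 / kappa_r)
    return gamma, beta, cost

# Illustrative call: one robot at the origin, object at (1, 0).
g, b, c = grasp_cost(np.zeros(2), np.array([1.0, 0.0]),
                     [np.array([0.5, 0.5])])
```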
34.
35. Related Work
◦ Formations [Tanner and Kumar]
◦ Flocking [Lindhé et al]
◦ Pushing objects [Fink et al, Esposito]
◦ No catastrophic failure if out of position.
My Approach:
◦ Head towards destination in steps
◦ Keep close to object.
◦ Communicate “through” object
◦ Maintain orientation.
Assuming forklift on Robot can rotate 360º
36. Next Step Vector:
rγi = rideal,i + dm (rObjCenter − rObjDest) / ‖rObjCenter − rObjDest‖
Pull to destination:
γ1 = ‖ri − rγi‖
37. Valley Perpendicular to Travel Vector:
m = −(rObjCenter,x − rObjDest,x) / (rObjCenter,y − rObjDest,y + 0.0001)
γ2 = (m·ri,x − ri,y − m·rγ,x + rγ,y)² / (m² + 1)
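A direct transcription of these two terms; variable names mirror the slides, the epsilon guard is the 0.0001 shown above, and `d_m` is an illustrative step size.

```python
import numpy as np

def transport_gradients(r_i, r_ideal_i, obj_center, obj_dest,
                        d_m=0.2, eps=1e-4):
    """Next-step target offset along the travel vector, plus a 'valley'
    term penalizing squared distance from the line of travel."""
    travel = obj_center - obj_dest
    r_gamma = r_ideal_i + d_m * travel / np.linalg.norm(travel)
    gamma1 = np.linalg.norm(r_i - r_gamma)           # pull to destination
    # Valley: squared point-to-line distance, with slope m perpendicular
    # to the travel vector (eps guards against division by zero).
    m = -(obj_center[0] - obj_dest[0]) / (obj_center[1] - obj_dest[1] + eps)
    gamma2 = (m * r_i[0] - r_i[1] - m * r_gamma[0] + r_gamma[1])**2 / (m**2 + 1)
    return gamma1, gamma2
```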
44. A Young Modular Robot’s Guide to Locomotion
Ben Pearre
Computer Science
University of Colorado at Boulder, USA
December 6, 2009
45. Outline
Modular Robots
Learning
The Problem
The Policy Gradient
Domain Knowledge
Contributions
Going forward
Steering
Curriculum Development
Conclusion
46. Modular Robots
How to get these to move?
47. The Learning Problem
Given unknown sensations and actions, learn a task:
◮ Sensations s ∈ Rn
◮ State x ∈ Rd
◮ Action u ∈ Rp
◮ Reward r ∈ R
◮ Policy π(x, θ) = Pr(u|x, θ) : R^|θ| × R^|u|
Example policy:
u(x, θ) = θ0 + Σi θi (x − bi)ᵀ Di (x − bi) + N(0, σ)
What does that mean for locomotion?
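As a concrete reading of the example policy, here is a single-output sketch; the shapes and the additive Gaussian noise follow the formula above, while all names and dimensions are illustrative.

```python
import numpy as np

def policy(x, theta0, theta, b, D, sigma=0.1, rng=np.random.default_rng()):
    """Quadratic-feature policy: u = theta0 + sum_i theta_i
    (x - b_i)^T D_i (x - b_i) + N(0, sigma).
    theta: (k,), b: (k, d), D: (k, d, d)."""
    u = theta0
    for th_i, b_i, D_i in zip(theta, b, D):
        diff = x - b_i
        u += th_i * diff @ D_i @ diff    # one quadratic bump per center
    return u + rng.normal(0.0, sigma)    # exploration noise
```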
48. Policy Gradient Reinforcement Learning: Finite Difference
Vary θ:
◮ Measure performance J0 of π(θ)
◮ Measure performance J1...n of π(θ + ∆1...n θ)
◮ Solve regression, move θ along gradient.
gradient = (∆Θᵀ ∆Θ)⁻¹ ∆Θᵀ Ĵ
where ∆Θ = [∆θ1ᵀ; …; ∆θnᵀ] and Ĵ = [J1 − J0, …, Jn − J0]ᵀ
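This regression is a few lines of NumPy; the sketch below assumes `J` is any callable that runs one rollout and returns its score, and the perturbation count, scale, and toy objective are illustrative.

```python
import numpy as np

def fd_policy_gradient(J, theta, n_perturb=16, eps=0.05,
                       rng=np.random.default_rng()):
    """Finite-difference policy gradient: perturb theta, measure returns,
    and solve the least-squares regression g = (dT d)^-1 dT Jhat."""
    J0 = J(theta)
    dTheta = eps * rng.standard_normal((n_perturb, len(theta)))
    Jhat = np.array([J(theta + d) for d in dTheta]) - J0
    g, *_ = np.linalg.lstsq(dTheta, Jhat, rcond=None)
    return g

# Illustrative usage: ascend the gradient of a toy return function.
theta = np.zeros(8)
for _ in range(100):
    theta += 0.1 * fd_policy_gradient(lambda th: -np.sum((th - 1.0)**2),
                                      theta)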
49. Policy Gradient Reinforcement Learning: Likelihood Ratio
Vary u:
◮ Measure performance J(π(θ)) of π(θ) with noise. . .
◮ Compute log-probability of generated trajectory Pr(τ |θ)
Gradient = ( Σ_{k=0}^{H} ∇θ log πθ(uk|xk) ) ( Σ_{l=0}^{H} rl )
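A per-episode version of this estimator, in the sum-of-scores times sum-of-rewards form shown above; averaging over many episodes and subtracting a baseline are omitted for brevity.

```python
import numpy as np

def reinforce_gradient(grad_logp, rewards):
    """Likelihood-ratio gradient for one episode. `grad_logp` is an
    (H, |theta|) array of per-step score vectors grad log pi(u_k|x_k);
    `rewards` is the length-H reward sequence."""
    return np.sum(grad_logp, axis=0) * np.sum(rewards)
```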
50. Why is RL slow?
“Curse of Dimensionality”
◮ Exploration
◮ Learning rate
◮ Domain representation
◮ Policy representation
◮ Over- and under-actuation
◮ Domain knowledge
51. Domain Knowledge
Infinite space of policies to explore.
◮ RL is model-free. So what?
◮ Representation is bias.
◮ Bias search towards “good” solutions
◮ Learn all of physics. . . and apply it?
◮ Previous experience in this domain?
◮ Policy implemented by <programmer, agent> “autonomous”?
How would knowledge of this domain help?
52. Dimensionality Reduction
Task learning as domain-knowledge acquisition:
◮ Experience with a domain
◮ Skill at completing some task
◮ Skill at completing some set of tasks?
◮ Taskspace Manifold
53. Goals
1. Apply PGRL to a new domain.
2. Learn mapping from task manifold to policy manifold.
3. Robot school?
54. 1: Learning to locomote
◮ Sensors: Force feedback on servos? Or not.
◮ Policy: u ∈ R8 controls servos, ui = N(θi, σ)
◮ Reward: forward speed
◮ Domain knowledge: none
Demo?
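A rollout for this setup might look like the following sketch, with `step` standing in for the simulator or robot interface; the horizon and noise scale are illustrative.

```python
import numpy as np

def rollout(theta, step, horizon=100, sigma=0.1,
            rng=np.random.default_rng()):
    """Episode sketch: each of the 8 servo commands is drawn as
    u_i = N(theta_i, sigma), and the return is mean forward speed.
    `step(u) -> forward_speed` is a placeholder for the platform."""
    speeds = []
    for _ in range(horizon):
        u = rng.normal(theta, sigma)   # u in R^8, one command per servo
        speeds.append(step(u))
    return float(np.mean(speeds))
```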
55. 1: Learning to locomote
[Figure: “Learning to move” — top panel: policy parameters θ per servo (steer bow, steer stern, bow, port fwd, stbd fwd, port aft, stbd aft, stern) vs. time (s); bottom panel: effort and 10-step forward speed v vs. time (s).]
56. 2: Learning to get to a target
◮ Sensors: Bearing to goal.
◮ Policy: u ∈ R8 controls servos
◮ Policy parameters: θ ∈ R16
µi(x, θ) = θi · s  (1)
         = [ θi,0 θi,1 ] [ 1 φ ]ᵀ  (2)
ui = N(µi, σ)  (3)
∇θi log π(x, θ) = (1/σ²)(ui − θi · s) · s  (4)
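These four equations translate directly; in this sketch `theta` holds one (bias, gain) pair per servo and `phi` is the bearing to the goal (the noise scale is illustrative).

```python
import numpy as np

def steer_policy(phi, theta, sigma=0.1, rng=np.random.default_rng()):
    """Linear-Gaussian policy: s = [1, phi], theta has shape (8, 2),
    and the per-servo score vector is (1/sigma^2)(u_i - theta_i . s) s."""
    s = np.array([1.0, phi])             # bearing-to-goal feature
    mu = theta @ s                        # mean command per servo, eq. (1-2)
    u = rng.normal(mu, sigma)             # sampled commands, eq. (3)
    grad_log_pi = ((u - mu) / sigma**2)[:, None] * s[None, :]  # eq. (4)
    return u, grad_log_pi
```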
57. 2: Task space → policy space
◮ 16-DOF learning FAIL!
◮ Try simpler task:
◮ Learn to locomote with θ ∈ R16
◮ Try bootstrapping:
1. Learn to locomote with 8 DOF
2. Add new sensing and control DOF
◮ CHEATING! Why?
[Figure: “Time to complete task” — seconds vs. task.]
58. Curriculum development for manifold discovery?
◮ Étude in Locomotion
◮ Task-space manifold for locomotion: θ ∈ ξ · [ 0 0 1 −1 1 −1 1 1 ]ᵀ
◮ Stop exploring in task nullspace
◮ FAST!
◮ Étude in Steering
◮ Can task be completed on locomotion manifold?
◮ One possible approximate solution uses the bases
[ 0 0 1 −1 1 −1 1 1 ]ᵀ and [ 1 −1 0 0 0 0 0 0 ]ᵀ
◮ Can second basis be learned?
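One way to realize “stop exploring in the task nullspace” is to search over manifold coordinates and map back to policy parameters; this sketch uses the locomotion basis from the slide together with the candidate steering basis (the noise scale is illustrative).

```python
import numpy as np

# Locomotion basis from the slide, plus the candidate steering basis.
B = np.array([[0, 0, 1, -1, 1, -1, 1, 1],
              [1, -1, 0, 0, 0, 0, 0, 0]], dtype=float).T   # shape (8, 2)

def explore_on_manifold(xi, sigma=0.1, rng=np.random.default_rng()):
    """Perturb the 2-vector of manifold coordinates xi and map back to
    the 8 policy parameters, so exploration never leaves the manifold."""
    xi_new = xi + sigma * rng.standard_normal(xi.shape)
    return B @ xi_new                     # theta in R^8 on the manifold
```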
59. 3: How to teach a robot?
How to teach an animal?
1. Reward basic skills
2. Develop control along useful DOFs
3. Make skill more complex
4. A good solution NOW!
60. Conclusion
Exorcising the Curse of Dimensionality
◮ PGRL works for low-DOF problems.
◮ Task-space dimension < state-space dimension.
◮ Learn f: task-space manifold → policy-space manifold.