Combining Reactive and
Deliberative Algorithms

    CSCI7000: Final Presentation
         Maciej Stachura
           Dec. 4, 2009
Outline

• Project Overview

• Positioning System

• Hardware Demo
Project Goals
• Combine deliberative and reactive
  algorithms

• Show stability and completeness

• Demonstrate multi-robot coverage on
  iCreate robots.
Coverage Problem
• Cover the entire area.
• Deliberative algorithm plans the
  next point to visit.
• Reactive algorithm pushes the
  robot to that point.
• Reactive algorithm adds two
  constraints:
   •   Maintain communication distance
   •   Collision avoidance
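A minimal sketch of the scheme these bullets describe, assuming a 2-D point robot: the deliberative layer picks the next coverage point, and the reactive layer takes potential-field steps toward it while enforcing the communication-distance and collision constraints. All names, gains, ranges, and the arrival threshold are illustrative assumptions, not values from the project.

```python
import math

COMM_RANGE = 5.0    # assumed maximum distance to the communication partner
AVOID_RADIUS = 0.5  # assumed minimum separation from other robots
ARRIVAL_DIST = 0.1  # a point counts as visited within this distance
STEP = 0.05         # reactive step length

def reactive_step(pos, goal, partner, obstacles):
    """One potential-field step: attract to the goal, repel from nearby
    robots, and pull back toward the partner if out of comm range."""
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    for ox, oy in obstacles:                      # collision avoidance
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 1e-9 < d < AVOID_RADIUS:
            gx += dx / d * (AVOID_RADIUS - d) * 10.0
            gy += dy / d * (AVOID_RADIUS - d) * 10.0
    dx, dy = partner[0] - pos[0], partner[1] - pos[1]
    d = math.hypot(dx, dy)
    if d > COMM_RANGE:                            # communication constraint
        gx += dx / d * (d - COMM_RANGE) * 10.0
        gy += dy / d * (d - COMM_RANGE) * 10.0
    n = math.hypot(gx, gy)
    if n < 1e-9:
        return pos
    return (pos[0] + STEP * gx / n, pos[1] + STEP * gy / n)

def cover(points, pos, partner=(0.0, 0.0), obstacles=()):
    """Deliberative layer: drive to each planned coverage point in turn."""
    visited = []
    for goal in points:
        for _ in range(10000):
            if math.hypot(goal[0] - pos[0], goal[1] - pos[1]) < ARRIVAL_DIST:
                visited.append(goal)
                break
            pos = reactive_step(pos, goal, partner, obstacles)
    return visited
```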
Proof of Stability

[Stability proof shown as equation images on the original slide; not
recoverable in this transcription.]

Therefore the system is stable; moreover, the error decays.
Demo for single vehicle

• Implemented on iCreate.
• 5 points to visit.
• Deliberative algorithm
  selects the point.
• Reactive algorithm uses a
  potential field to reach the point.
• Point is reached when within
  some minimum distance.
                                    VIDEO
Multi-robot Case
• 2-robot coverage

• Blue is free to move          VIDEO

• Green must stay in
  communication range.

• MATLAB simulation.
Positioning System
• Problems with the Stargazer:
   •   Periods of no measurement
   •   Occasional bad measurements

• State estimation (SPF):
   •   Combine Stargazer with odometry
   •   Reject bad measurements
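One common way to realize the "reject bad measurements" bullet is innovation gating: accept a Stargazer fix only when it is statistically consistent with the odometry prediction. This scalar sketch with a 3-sigma gate is an illustrative assumption, not the presenter's exact filter.

```python
def fuse(pred, var_pred, meas, var_meas, gate_sigmas=3.0):
    """Gate and fuse one scalar position fix with the odometry prediction.
    Returns (estimate, variance, accepted)."""
    innovation = meas - pred
    innovation_var = var_pred + var_meas
    if innovation ** 2 > gate_sigmas ** 2 * innovation_var:
        return pred, var_pred, False      # implausible fix: keep odometry
    k = var_pred / innovation_var         # scalar Kalman gain
    return pred + k * innovation, (1.0 - k) * var_pred, True
```

For example, a fix of 1.05 near a prediction of 1.0 is fused, while a fix of 3.0 with the same small uncertainties is rejected and the odometry estimate is kept.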
SPF Explanation
• The Sigma Point Filter uses
  Stargazer and odometry
  measurements to estimate the
  robot's position.
• Handles non-Gaussian noise.
• Implemented and tested on the
  robot platform.
• Performs very well in the
  presence of missing or bad
  measurements.
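The sigma-point idea can be illustrated in one dimension: a small deterministic set of points carries the mean and variance through a nonlinear motion model, which is what lets this filter family cope better than a linearized filter. The scalar state and weight choice below are simplifying assumptions.

```python
import math

def unscented_predict(mean, var, f, kappa=2.0):
    """Propagate a scalar Gaussian (mean, var) through a nonlinear f using
    three deterministically chosen sigma points."""
    spread = math.sqrt((1.0 + kappa) * var)   # n = 1 state dimension
    points = [mean, mean + spread, mean - spread]
    w0 = kappa / (1.0 + kappa)                # weight of the central point
    wi = 1.0 / (2.0 * (1.0 + kappa))          # weight of each spread point
    weights = [w0, wi, wi]
    ys = [f(p) for p in points]               # push points through the model
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var
```

For a linear model the transform is exact: f(x) = 2x + 1 maps (mean 1, var 0.25) to (mean 3, var 1).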
Roomba Pac-Man
• Implemented a 5-robot demo along
  with Jack Elston.

• Re-creation of the Pac-Man game.

• Demonstrates the NetUAS system.

• Showcases most of the concepts
  from class.
Video
Roomba Pac-Man
• Reactive algorithms:
   •   Walls of the maze
   •   Potential field

• Deliberative algorithms:
   •   Ghost planning (enumerate states)
   •   Collision avoidance
   •   Game modes

• Decentralized:
   •   Each ghost ran the planning algorithm
   •   Collaborated on positions

• Communication:
   •   802.11b ad-hoc network
   •   AODV, no centralized node
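The "ghost planning (enumerate states)" bullet can be illustrated with a breadth-first enumeration of maze cells, where walls act as hard constraints. The maze encoding and API are hypothetical, not the actual NetUAS code.

```python
from collections import deque

def plan(maze, start, goal):
    """Enumerate reachable maze states breadth-first to find a shortest
    route of cells from start to goal. maze: list of strings, '#' = wall.
    Returns a list of (row, col) cells, or None if unreachable."""
    rows, cols = len(maze), len(maze[0])
    prev = {start: None}
    q = deque([start])
    while q:
        cell = q.popleft()
        if cell == goal:                      # reconstruct the route
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and maze[nr][nc] != '#' and (nr, nc) not in prev:
                prev[(nr, nc)] = (r, c)
                q.append((nr, nc))
    return None
```

A ghost would replan toward Pac-Man's current cell each game tick, with the maze walls limiting the enumerated states.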
Roomba Pac-Man
• Simulation:
   •   Multi-threaded simulation of robots
   •   Combines software with hardware

• Probabilistic modelling:
   •   Sigma Point Filter

• Human/robot interaction:
   •   Limited human control of Pac-Man
   •   Autonomous ghosts

• Hardware implementation:
   •   SBCs running Gentoo
   •   Experimental verification
Left to Do
• Implement the inter-robot potential field.

• Conduct experiments.

• Generalize the theory?
End

       Questions?




http://pacman.elstonj.com
A Gradient-Based Approach

              Greg Brown
  Introduction
  Robot State Machine

  Gradients for “Grasping” the Object
  Gradient for Moving the Object

  Convergence Simulation Results
  Continuing Work
Place a single beacon on an object and
 another at the object’s destination. Multiple
 robots cooperate to move the object.

Goals:
  Minimal/No Robot Communication
  Object has an Unknown Geometry

  Use Gradients for Reactive Navigation
    Each Robot Knows:
     ◦  Distance/Direction to Object
     ◦  Distance/Direction to Destination
     ◦  Distance/Direction to All Other Robots
     ◦  Bumper Sensor to Detect Collision

    Robots Do Not Know:
     ◦  Object Geometry
     ◦  Actions other Robots are taking
    Related “Grasping” Work:
     ◦  Grasping with hand – Maximize torque [Liu et al]
     ◦  Cage objects for pushing [Fink et al]
     ◦  Tug Boats Manipulating Barge [Esposito]
     ◦  ALL require known geometry
    My Hybrid Approach
     ◦  Even distribution around object
     ◦  Alternate between Convergence and Repulsion
        Gradients

     ◦  Similar to Cow Herding example from class.
Pull towards object:

    \gamma = \| r_i - r_{obj} \|

Avoid nearby robots:

    \beta = \prod_{j=1}^{N} \left[ 1 - \frac{(1 + d_c^4)\,(\| r_i - r_j \|^2 - d_c^2)^2}{d_c^4 \left( (\| r_i - r_j \|^2 - d_c^2)^2 + 1 \right)} \right]^{(\mathrm{sign}(d_c - \| r_i - r_j \|) + 1)/2}

Combined cost function:

    \mathrm{Cost} = \frac{\gamma^2}{(\gamma^{\kappa_c} + \beta)^{1/\kappa_c}}
Repel from all robots:

    \beta = \prod_{j=1}^{N} \left( \| r_i - r_j \|^2 - d_r^2 \right)

    \mathrm{Cost} = \frac{1}{(1 + \beta)^{1/\kappa_r}}
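A numeric sketch of the grasping-phase cost terms above, assuming Euclidean norms and illustrative values for d_c, d_r, κ_c, and κ_r; the reconstruction of the slide's β product is itself an interpretation of the garbled original.

```python
import math

def norm(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def grasp_cost(ri, robj, others, dc=1.0, kc=2.0):
    """Convergence cost: pull toward the object, avoid robots within dc.
    The avoidance exponent switches each factor off beyond range dc."""
    gamma = norm(ri, robj)
    beta = 1.0
    for rj in others:
        d = norm(ri, rj)
        active = (math.copysign(1.0, dc - d) + 1.0) / 2.0   # 1 inside dc, else 0
        x2 = (d ** 2 - dc ** 2) ** 2
        term = 1.0 - (1.0 + dc ** 4) * x2 / (dc ** 4 * (x2 + 1.0))
        beta *= term ** active
    return gamma ** 2 / (gamma ** kc + beta) ** (1.0 / kc)

def repel_cost(ri, others, dr=1.0, kr=2.0):
    """Repulsion cost: spread out from all robots."""
    beta = 1.0
    for rj in others:
        beta *= norm(ri, rj) ** 2 - dr ** 2
    return 1.0 / (1.0 + beta) ** (1.0 / kr)
```

Note the gating behavior: a robot farther away than d_c contributes a factor of exactly 1, so only nearby robots change the grasping cost.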
    Related Work
     ◦  Formations [Tanner and Kumar]
     ◦  Flocking [Lindhé et al]
     ◦  Pushing objects [Fink et al, Esposito]
     ◦  No catastrophic failure if out of position.

    My Approach:
     ◦  Head towards destination in steps
     ◦  Keep close to object.
     ◦  Communicate “through” object
     ◦  Maintain orientation.

    Assuming the forklift on the robot can rotate 360º
Next-step vector:

    r_{\gamma_i} = r_{ideal_i} + d_m \, \frac{r_{ObjCenter} - r_{ObjDest}}{\| r_{ObjCenter} - r_{ObjDest} \|}

Pull to destination:

    \gamma_1 = \| r_i - r_{\gamma_i} \|
Valley perpendicular to the travel vector:

    m = -\frac{r_{ObjCenter_x} - r_{ObjDest_x}}{r_{ObjCenter_y} - r_{ObjDest_y} + 0.0001}

    \gamma_2 = \frac{(m \, r_{i_x} - r_{i_y} - m \, r_{\gamma_x} + r_{\gamma_y})^2}{m^2 + 1}

    \mathrm{Cost} = \gamma_1^{\kappa_1} \, \gamma_2^{\kappa_2}
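The moving-phase cost above can be evaluated directly: each robot heads for a waypoint offset d_m along the object-to-destination line (γ₁) while the "valley" term (γ₂) penalizes drifting off the line perpendicular to the travel vector. Positions, d_m, and the exponents below are illustrative.

```python
import math

def move_cost(ri, r_ideal, obj_center, obj_dest, dm=0.5, k1=1.0, k2=1.0):
    ux = obj_center[0] - obj_dest[0]
    uy = obj_center[1] - obj_dest[1]
    L = math.hypot(ux, uy)
    # Next-step waypoint: ideal position offset dm along the travel vector.
    rg = (r_ideal[0] + dm * ux / L, r_ideal[1] + dm * uy / L)
    g1 = math.hypot(ri[0] - rg[0], ri[1] - rg[1])   # pull to the waypoint
    m = -ux / (uy + 1e-4)                           # valley slope (regularized)
    g2 = (m * ri[0] - ri[1] - m * rg[0] + rg[1]) ** 2 / (m * m + 1.0)
    return g1 ** k1 * g2 ** k2
```

The cost vanishes when the robot sits exactly on its next-step waypoint and grows as it falls behind or leaves the valley.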
[Figure: histogram of convergence results — Number of Occurrences (0–60) vs.
Time Steps (≈500–5000) for 6, 5, 4, and 3 bots.]
  Resolve Convergence Problems
  Noise in Sensing

  Noise in Actuation
[Figure: histogram of convergence results — Number of Occurrences (0–60) vs.
Time Steps (≈250–5000) for 6, 5, 4, and 3 bots.]
A Young Modular Robot’s Guide to Locomotion

                       Ben Pearre

                     Computer Science
          University of Colorado at Boulder, USA


                  December 6, 2009







Outline

   Modular Robots

   Learning
      The Problem
      The Policy Gradient
      Domain Knowledge

   Contributions
      Going forward
      Steering
      Curriculum Development

   Conclusion





Modular Robots




  How to get these to move?





The Learning Problem

   Given unknown sensations and actions, learn a task:
     ◮   Sensations s ∈ Rn
     ◮   State x ∈ Rd
     ◮   Action u ∈ Rp
     ◮   Reward r ∈ R
     ◮   Policy π(x, θ) = Pr(u|x, θ) : R|θ| × R|u|
   Example policy:

            u(x, \theta) = \theta_0 + \sum_i \theta_i (x - b_i)^T D_i (x - b_i) + N(0, \sigma)

   What does that mean for locomotion?




Policy Gradient Reinforcement Learning: Finite Difference

   Vary θ:
     ◮   Measure performance J0 of π(θ)
     ◮   Measure performance J1...n of π(θ + ∆1...n θ)
     ◮   Solve regression, move θ along gradient.
                 \mathrm{gradient} = (\Delta\Theta^T \Delta\Theta)^{-1} \Delta\Theta^T \hat{J}

   where the rows of \Delta\Theta are the perturbations \Delta\theta_1, \ldots, \Delta\theta_n
   and \hat{J} = [J_1 - J_0, \ldots, J_n - J_0]^T.
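The finite-difference scheme above can be sketched on a toy objective: perturb θ, measure performance differences against J₀, and solve the least-squares regression (ΔΘᵀΔΘ)⁻¹ΔΘᵀĴ for the gradient. The perturbation size, sample count, and quadratic test objective are illustrative assumptions.

```python
import random

def fd_gradient(J, theta, n_perturb=20, eps=0.01):
    """Estimate grad J at theta from random perturbations via the normal
    equations of the regression (dTheta^T dTheta)^-1 dTheta^T J_hat."""
    J0 = J(theta)
    d = len(theta)
    ata = [[0.0] * d for _ in range(d)]   # accumulates dTheta^T dTheta
    atj = [0.0] * d                       # accumulates dTheta^T J_hat
    for _ in range(n_perturb):
        dtheta = [random.uniform(-eps, eps) for _ in range(d)]
        jhat = J([t + dt for t, dt in zip(theta, dtheta)]) - J0
        for a in range(d):
            atj[a] += dtheta[a] * jhat
            for b in range(d):
                ata[a][b] += dtheta[a] * dtheta[b]
    return solve(ata, atj)

def solve(A, b):
    """Gaussian elimination with partial pivoting for the small system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x
```

Repeatedly stepping θ along the estimated gradient climbs a toy objective such as J(θ) = −(θ₀−1)² − (θ₁+2)².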






Policy Gradient Reinforcement Learning: Likelihood Ratio



   Vary u:
      ◮   Measure performance J(π(θ)) of π(θ) with noise...
      ◮   Compute the log-probability of the generated trajectory Pr(τ |θ)

              \mathrm{Gradient} = \sum_{k=0}^{H} \nabla_\theta \log \pi_\theta(u_k | x_k) \sum_{l=0}^{H} r_l
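The likelihood-ratio estimator above can be shown in its simplest form: a one-step Gaussian policy u ~ N(θ, σ), for which ∇_θ log π = (u − θ)/σ². The one-dimensional toy reward peaking at u = 3 is an illustrative assumption.

```python
import random

def lr_gradient(theta, sigma=0.5, episodes=200):
    """Average of grad-log-likelihood times reward over sampled episodes."""
    total = 0.0
    for _ in range(episodes):
        u = random.gauss(theta, sigma)          # act with exploration noise
        r = -(u - 3.0) ** 2                     # toy reward, peak at u = 3
        total += (u - theta) / sigma ** 2 * r   # grad log pi(u|theta) * reward
    return total / episodes

def train(theta=0.0, steps=300, alpha=0.01):
    """Ascend the estimated gradient; theta should approach 3."""
    for _ in range(steps):
        theta += alpha * lr_gradient(theta)
    return theta
```

Unlike the finite-difference scheme, the perturbations here come from the policy's own action noise, so no separate rollouts of perturbed parameters are needed.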







Why is RL slow?


   “Curse of Dimensionality”
     ◮   Exploration
     ◮   Learning rate
     ◮   Domain representation
     ◮   Policy representation
     ◮   Over- and under-actuation
     ◮   Domain knowledge







Domain Knowledge


  Infinite space of policies to explore.
    ◮   RL is model-free. So what?
    ◮   Representation is bias.
    ◮   Bias search towards “good” solutions
    ◮   Learn all of physics. . . and apply it?
    ◮   Previous experience in this domain?
    ◮   Policy implemented by <programmer, agent> “autonomous”?
  How would knowledge of this domain help?







Dimensionality Reduction



   Task learning as domain-knowledge acquisition:
     ◮   Experience with a domain
     ◮   Skill at completing some task
     ◮   Skill at completing some set of tasks?
     ◮   Taskspace Manifold







Goals




    1. Apply PGRL to a new domain.
    2. Learn mapping from task manifold to policy manifold.
    3. Robot school?







1: Learning to locomote

    ◮   Sensors: Force feedback on
        servos? Or not.
    ◮   Policy: u ∈ R8 controls
        servos
        ui = N (θi , σ)
    ◮   Reward: forward speed
    ◮   Domain knowledge: none


   Demo?






1: Learning to locomote

[Figure: "Learning to move" — top panel: the eight policy parameters θ (steer
bow, steer stern, bow, port fwd, stbd fwd, port aft, stbd aft, stern) over
≈2500 s of learning; bottom panel: effort and 10-step forward speed v over
the same interval.]



2: Learning to get to a target

     ◮   Sensors: Bearing to goal.
     ◮   Policy: u ∈ R8 controls servos
     ◮   Policy parameters: θ ∈ R16

                 \mu_i(x, \theta) = \theta_i \cdot s                                        (1)
                                  = [\theta_{i,0} \;\; \theta_{i,1}] \, [1 \;\; \phi]^T     (2)
                 u_i = N(\mu_i, \sigma)                                                     (3)
                 \nabla_{\theta_i} \log \pi(x, \theta) = \frac{1}{\sigma^2} (u_i - \theta_i \cdot s) \, s   (4)
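Equations (1)–(4) transcribe almost directly to code for a single servo, with the sensation s = [1, φ] built from the bearing φ to the goal; the numeric values in the note below are illustrative.

```python
import random

def mu(theta_i, s):
    """Eq. (1)-(2): mean action as a linear function of the sensation."""
    return theta_i[0] * s[0] + theta_i[1] * s[1]

def act(theta_i, s, sigma):
    """Eq. (3): Gaussian action around the mean."""
    return random.gauss(mu(theta_i, s), sigma)

def grad_log_pi(theta_i, s, u, sigma):
    """Eq. (4): gradient of the log-likelihood of the taken action u."""
    err = u - mu(theta_i, s)
    return [err * s[0] / sigma ** 2, err * s[1] / sigma ** 2]
```

With θ_i = [0.5, −1.0], s = [1.0, 0.2] (bearing φ = 0.2), σ = 0.5, and an observed action u = 0.8, the mean is 0.3 and the gradient is [2.0, 0.4].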






2: Task space → policy space

    ◮   16-DOF learning FAIL!
    ◮   Try simpler task:
          ◮   Learn to locomote with θ ∈ R16
    ◮   Try bootstrapping:
         1. Learn to locomote with 8 DOF
         2. Add new sensing and control DOF
    ◮   CHEATING! Why?

[Figure: "Time to complete task" — seconds (0–300) vs. task number (0–120).]



Curriculum development for manifold discovery?
    ◮   Étude in Locomotion
          ◮   Task-space manifold for locomotion

                        θ ∈ ξ · [ 0  0  1  −1  1  −1  1  1 ]^T

          ◮   Stop exploring in the task nullspace
          ◮   FAST!
    ◮   Étude in Steering
          ◮   Can the task be completed on the locomotion manifold?
          ◮   One possible approximate solution uses the bases

                        [ 0  0  1  −1  1  −1  1  1 ]^T  and  [ 1  −1  0  0  0  0  0  0 ]^T

    ◮   Can the second basis be learned?
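One way the restriction could be realized (an assumption, not necessarily the author's implementation): explore only in the low-dimensional coordinates ξ of the slide's basis vectors, mapping them onto the eight policy parameters so perturbations never leave the task manifold.

```python
# Basis rows taken from the slide; combining them as theta = xi[0]*B[0] +
# xi[1]*B[1] is an assumed realization of "stop exploring in the nullspace".
B = [
    [0, 0, 1, -1, 1, -1, 1, 1],   # locomotion basis
    [1, -1, 0, 0, 0, 0, 0, 0],    # steering basis
]

def theta_from_manifold(xi):
    """Map low-dimensional exploration coordinates xi onto the 8 policy
    parameters."""
    return [sum(x * row[k] for x, row in zip(xi, B)) for k in range(8)]
```

Learning then happens over two parameters instead of eight or sixteen, which is the dimensionality reduction the slide is after.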




3: How to teach a robot?



   How to teach an animal?
    1. Reward basic skills
    2. Develop control along useful DOFs
    3. Make skill more complex
    4. A good solution NOW!







Conclusion



   Exorcising the Curse of Dimensionality
     ◮   PGRL works for low-DOF problems.
     ◮   Task-space dimension < state-space dimension.
     ◮   Learn f: task-space manifold → policy-space manifold.




Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Dernier (20)

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

December 4, Project

  • 1. Combining Reactive and Deliberative Algorithms CSCI7000: Final Presentation Maciej Stachura Dec. 4, 2009
  • 2. Outline • Project Overview • Positioning System • Hardware Demo
  • 3. Project Goals • Combine deliberative and reactive algorithms • Show stability and completeness • Demonstrate multi-robot coverage on iCreate robots.
  • 4. Coverage Problem • Cover Entire Area • Deliberative Algorithm Plans Next Point to visit. • Reactive Algorithm pushes robot to that point. • Reactive Algorithm Adds 2 constraints: • Maintain Communication Distance • Collision Avoidance
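The reactive layer on this slide can be sketched as a single potential-field update: attraction to the goal, repulsion from nearby robots, and a pull back toward a communication anchor. The gains, distances, and function names below are illustrative assumptions, not the controller actually run on the iCreates.

```python
import math

def potential_step(pos, goal, neighbors, comm_anchor=None,
                   k_att=1.0, k_rep=0.5, d_safe=0.5, d_comm=3.0, step=0.1):
    """One 2-D reactive update: attract to the goal, repel from robots
    closer than d_safe, and pull back toward an anchor once the
    communication range d_comm is exceeded (all constants illustrative)."""
    fx = k_att * (goal[0] - pos[0])               # attractive term
    fy = k_att * (goal[1] - pos[1])
    for nx, ny in neighbors:                      # collision avoidance
        d = math.hypot(pos[0] - nx, pos[1] - ny)
        if 1e-9 < d < d_safe:
            s = k_rep * (1.0 / d - 1.0 / d_safe) / d
            fx += s * (pos[0] - nx)
            fy += s * (pos[1] - ny)
    if comm_anchor is not None:                   # communication constraint
        d = math.hypot(pos[0] - comm_anchor[0], pos[1] - comm_anchor[1])
        if d > d_comm:
            fx += k_att * (comm_anchor[0] - pos[0]) * (d - d_comm) / d
            fy += k_att * (comm_anchor[1] - pos[1]) * (d - d_comm) / d
    return (pos[0] + step * fx, pos[1] + step * fy)
```

With a neighbor directly in the robot's path, the repulsive term slows progress toward the goal without reversing it.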
  • 5. Proof of Stability Therefore the system is stable; also, the error decays.
  • 6. Demo for single vehicle • Implemented on iCreate. • 5 points to visit. • Deliberative Algorithm Selects Point. • Reactive Algorithm uses potential field to reach point. • Point reached when within some minimum distance. VIDEO
  • 7. Multi-robot Case • 2 Robot Coverage • Blue is free to move VIDEO • Green must stay in communication range. • Matlab Simulation.
  • 8. Outline • Project Overview • Positioning System • Hardware Demo
  • 9. Positioning System • Problems with Stargazer. • Periods of no measurement • Occasional Bad Measurements • State Estimation (SPF) • Combine Stargazer with Odometry • Reject Bad Measurements
  • 10. SPF Explanation • Sigma Point Filter uses Stargazer and Odometry measurements to predict robot position. • Non-Gaussian Noise • Implemented and Tested on robot platform. • Performs very well in the presence of no measurements or bad measurements.
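The combine-and-reject idea on this slide can be illustrated with a 1-D Kalman-style stand-in for the sigma point filter: predict with odometry, then accept a Stargazer fix only if its innovation passes a chi-square-style gate. All names, noise values, and the gate threshold are illustrative assumptions.

```python
def fuse(x, P, odo, Q, z=None, R=1.0, gate=9.0):
    """Predict state x (variance P) with an odometry increment odo
    (noise Q), then optionally update with a position fix z (noise R).
    The fix is rejected when the squared, variance-normalized
    innovation exceeds the gate (1-D sketch, illustrative numbers)."""
    x, P = x + odo, P + Q                 # odometry prediction
    if z is not None:
        nu, S = z - x, P + R              # innovation and its variance
        if nu * nu / S <= gate:           # accept only plausible fixes
            K = P / S                     # Kalman gain
            x, P = x + K * nu, (1 - K) * P
    return x, P
```

A wildly inconsistent fix leaves the odometry-only estimate untouched, which is the behavior described for bad Stargazer measurements.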
  • 11. Outline • Project Overview • Positioning System • Hardware Demo
  • 12. Roomba Pac-Man • Implemented 5 Robot Demo along with Jack Elston. • Re-creation of Pac-Man Game. • Demonstrate NetUAS system. • Showcase most of concepts from class.
  • 13. Video
  • 14. Roomba Pac-Man • Reactive Algorithms: • Walls of maze • Potential Field • Deliberative Algorithms • Ghost Planning (Enumerate States) • Collision Avoidance • Game modes • Decentralized • Each ghost ran planning algorithm • Collaborated on positions • Communication • 802.11b Ad-hoc Network • AODV, no centralized node
  • 18. Roomba Pac-Man • Simulation • Multi-threaded Sim. of Robots • Combine Software with Hardware • Probabilistic Modelling • Sigma Point Filter • Human/Robot Interaction • Limited Human Control of Pac-Man • Autonomous Ghosts • Hardware Implementation • SBC's running Gentoo • Experimental Verification
  • 22. Left to Do • Implement inter-robot potential field. • Conduct Experiments • Generalize Theory?
  • 23. End Questions? http://pacman.elstonj.com
  • 24. A Gradient Based Approach Greg Brown
  • 25. • Introduction • Robot State Machine • Gradients for “Grasping” the Object • Gradient for Moving the Object • Convergence Simulation Results • Continuing Work
  • 26. Place a single beacon on an object and another at the object’s destination. Multiple robots cooperate to move the object. Goals: • Minimal/No Robot Communication • Object has an Unknown Geometry • Use Gradients for Reactive Navigation
  • 27.
  • 28. • Each Robot Knows: ◦ Distance/Direction to Object ◦ Distance/Direction to Destination ◦ Distance/Direction to All Other Robots ◦ Bumper Sensor to Detect Collision • Robots Do Not Know ◦ Object Geometry ◦ Actions other Robots are taking
  • 29.
  • 30. • Related “Grasping” Work: ◦ Grasping with hand – Maximize torque [Liu et al] ◦ Cage objects for pushing [Fink et al] ◦ Tug Boats Manipulating Barge [Esposito] ◦ ALL require known geometry • My Hybrid Approach ◦ Even distribution around object ◦ Alternate between Convergence and Repulsion Gradients ◦ Similar to Cow Herding example from class.
  • 31. Pull towards object: γ = ‖r_i − r_obj‖. Avoid nearby robots: β = ∏_{j=1}^{N} [ 1 − ((sign(d_c − ‖r_i − r_j‖) + 1)/2) · (1 + d_c⁴)(‖r_i − r_j‖² − d_c²)² / (d_c⁴(‖r_i − r_j‖² − d_c²)² + 1) ]
  • 32. Combined Cost Function: Cost = γ² / (γ^{κ_c} + β)^{1/κ_c}
  • 33. Repel from all robots: β = ∏_{j=1}^{N} (‖r_i − r_j‖² − d_r²), Cost = 1 / (1 + β)^{1/κ_r}
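For given γ and β, the convergence and repulsion costs on slides 32–33 reduce to two small functions. The κ values below are illustrative, not the presentation's actual choices.

```python
def converge_cost(gamma, beta, kappa_c=4.0):
    """Convergence-phase cost, Cost = gamma^2 / (gamma^kappa_c + beta)^(1/kappa_c):
    with beta = 0 this is simply gamma, a pure pull toward the object;
    as beta grows (crowded robots) the pull is damped."""
    return gamma**2 / (gamma**kappa_c + beta)**(1.0 / kappa_c)

def repel_cost(beta, kappa_r=4.0):
    """Repulsion-phase cost, Cost = 1 / (1 + beta)^(1/kappa_r):
    decreases as the robots spread out around the object."""
    return 1.0 / (1.0 + beta)**(1.0 / kappa_r)
```

Alternating descent on these two surfaces is what distributes the robots evenly around the object.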
  • 34.
  • 35. • Related Work ◦ Formations [Tanner and Kumar] ◦ Flocking [Lindhé et al] ◦ Pushing objects [Fink et al, Esposito] ◦ No catastrophic failure if out of position. • My Approach: ◦ Head towards destination in steps ◦ Keep close to object. ◦ Communicate “through” object ◦ Maintain orientation. • Assuming forklift on Robot can rotate 360º
  • 36. Next Step Vector: r_{γi} = r_{ideal,i} + d_m · (r_ObjCenter − r_ObjDest)/‖r_ObjCenter − r_ObjDest‖. Pull to destination: γ₁ = ‖r_i − r_{γi}‖
  • 37. Valley Perpendicular to Travel Vector: m = −(r_ObjCenter,x − r_ObjDest,x)/(r_ObjCenter,y − r_ObjDest,y + 0.0001), γ₂ = |m·r_{i,x} − r_{i,y} − m·r_{γ,x} + r_{γ,y}| / √(m² + 1)
  • 38. Cost = γ₁^{κ₁} · γ₂^{κ₂}
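Slides 36–38 combine into a single moving-phase cost: γ₁ pulls each robot toward its next-step target, and γ₂ is the "valley" distance from the line through that target perpendicular to the travel vector. A sketch, where the exponents and the helper name are assumptions:

```python
import math

def step_cost(r, r_goal, center, dest, k1=1.0, k2=2.0):
    """Cost = gamma1^k1 * gamma2^k2. gamma1 is distance to the
    next-step target r_goal; gamma2 is point-to-line distance to the
    valley of slope m through r_goal, with the 1e-4 term guarding
    against division by zero (exponents k1, k2 illustrative)."""
    g1 = math.hypot(r[0] - r_goal[0], r[1] - r_goal[1])
    m = -(center[0] - dest[0]) / (center[1] - dest[1] + 1e-4)
    g2 = abs(m * r[0] - r[1] - m * r_goal[0] + r_goal[1]) / math.sqrt(m * m + 1)
    return g1**k1 * g2**k2
```

For horizontal travel the valley is nearly vertical, so γ₂ reduces to the robot's x-offset from the target line.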
  • 39.
  • 40.
  • 41. [Histogram: number of occurrences (0–60) of convergence times vs. time steps (≈500–5000), for 3, 4, 5, and 6 robots.]
  • 42. • Resolve Convergence Problems • Noise in Sensing • Noise in Actuation
  • 43. [Histogram: number of occurrences (0–60) of completion times vs. time steps (≈250–5000), for 3, 4, 5, and 6 robots.]
  • 44. Modular Robots Learning Contributions Conclusion A Young Modular Robot’s Guide to Locomotion Ben Pearre Computer Science University of Colorado at Boulder, USA December 6, 2009 Ben Pearre A Young Modular Robot’s Guide to Locomotion
  • 45. Outline Modular Robots Learning The Problem The Policy Gradient Domain Knowledge Contributions Going forward Steering Curriculum Development Conclusion
  • 46. Modular Robots How to get these to move?
  • 47. The Learning Problem Given unknown sensations and actions, learn a task: ◮ Sensations s ∈ Rⁿ ◮ State x ∈ R^d ◮ Action u ∈ R^p ◮ Reward r ∈ R ◮ Policy π(x, θ) = Pr(u|x, θ) : R^|θ| × R^|u| Example policy: u(x, θ) = θ₀ + Σᵢ θᵢ (x − bᵢ)ᵀ Dᵢ (x − bᵢ) + N(0, σ) What does that mean for locomotion?
  • 48. Policy Gradient Reinforcement Learning: Finite Difference Vary θ: ◮ Measure performance J₀ of π(θ) ◮ Measure performance J₁…ₙ of π(θ + ∆₁…ₙθ) ◮ Solve regression, move θ along gradient: gradient = (∆Θᵀ∆Θ)⁻¹ ∆Θᵀ Ĵ, where ∆Θ = [∆θ₁; …; ∆θₙ] and Ĵ = [J₁ − J₀; …; Jₙ − J₀]
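A simplified instance of the finite-difference scheme above: perturbing one parameter at a time makes the regression matrix diagonal, so the estimate reduces to coordinate-wise differences. This is a hedged sketch of the idea, not the presentation's exact estimator.

```python
def fd_gradient(J, theta, delta=1e-3):
    """Coordinate-wise finite-difference estimate of grad J(theta):
    perturb each parameter by delta and difference the returns."""
    J0 = J(theta)
    grad = []
    for i in range(len(theta)):
        t = list(theta)
        t[i] += delta
        grad.append((J(t) - J0) / delta)
    return grad

def ascend(J, theta, alpha=0.1, iters=100):
    """Move theta along the estimated gradient to increase J."""
    for _ in range(iters):
        g = fd_gradient(J, theta)
        theta = [t + alpha * gi for t, gi in zip(theta, g)]
    return theta
```

On a concave quadratic return, the iterates converge to the maximizer up to a small finite-difference bias of order delta.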
  • 49. Policy Gradient Reinforcement Learning: Likelihood Ratio Vary u: ◮ Measure performance J(π(θ)) of π(θ) with noise. . . ◮ Compute log-probability of generated trajectory Pr(τ|θ) Gradient = Σ_{k=0}^{H} ∇θ log πθ(u_k|x_k) · Σ_{l=0}^{H} r_l
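For a one-parameter Gaussian policy with single-step episodes, the likelihood-ratio gradient above reduces to the familiar REINFORCE update, using ∇θ log N(u; θ, σ) = (u − θ)/σ². The hyperparameters below are illustrative assumptions; a baseline would reduce the (considerable) variance.

```python
import random

def reinforce_gaussian(episode_fn, theta, sigma=0.3, alpha=0.002, episodes=4000):
    """Likelihood-ratio (REINFORCE) ascent for u ~ N(theta, sigma^2):
    sample an action, observe the episode return, and step theta
    along grad log pi(u|theta) times the return."""
    for _ in range(episodes):
        u = random.gauss(theta, sigma)        # exploratory action
        r = episode_fn(u)                     # total return of the episode
        theta += alpha * (u - theta) / sigma**2 * r
    return theta
```

With the return peaked at u = 2, the parameter drifts toward 2 on average, though individual updates are noisy.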
  • 50. Why is RL slow? “Curse of Dimensionality” ◮ Exploration ◮ Learning rate ◮ Domain representation ◮ Policy representation ◮ Over- and under-actuation ◮ Domain knowledge
  • 51. Domain Knowledge Infinite space of policies to explore. ◮ RL is model-free. So what? ◮ Representation is bias. ◮ Bias search towards “good” solutions ◮ Learn all of physics. . . and apply it? ◮ Previous experience in this domain? ◮ Policy implemented by <programmer, agent> “autonomous”? How would knowledge of this domain help?
  • 52. Dimensionality Reduction Task learning as domain-knowledge acquisition: ◮ Experience with a domain ◮ Skill at completing some task ◮ Skill at completing some set of tasks? ◮ Taskspace Manifold
  • 53. Goals 1. Apply PGRL to a new domain. 2. Learn mapping from task manifold to policy manifold. 3. Robot school?
  • 54. 1: Learning to locomote ◮ Sensors: Force feedback on servos? Or not. ◮ Policy: u ∈ R⁸ controls servos, uᵢ = N(θᵢ, σ) ◮ Reward: forward speed ◮ Domain knowledge: none Demo?
  • 55. 1: Learning to locomote [Plots, “Learning to move”: the eight servo parameters θ (steer bow, steer stern, bow/stern port/starboard fore/aft) and the 10-step forward speed v with control effort, each vs. time over 0–2500 s.]
  • 56. 2: Learning to get to a target ◮ Sensors: Bearing to goal. ◮ Policy: u ∈ R⁸ controls servos ◮ Policy parameters: θ ∈ R¹⁶ µᵢ(x, θ) = θᵢ · s (1), θᵢ = [θᵢ,₀ θᵢ,₁], s = [φ 1]ᵀ (2), uᵢ = N(µᵢ, σ) (3), ∇θᵢ log π(x, θ) = (uᵢ − θᵢ · s) · s / σ² (4)
  • 57. 2: Task space → policy space ◮ 16-DOF learning FAIL! ◮ Try simpler task: learn to locomote with θ ∈ R¹⁶ ◮ Try bootstrapping: 1. Learn to locomote with 8 DOF 2. Add new sensing and control DOF ◮ CHEATING! Why? [Plot: time to complete task (seconds, ≈50–300) vs. task number (0–120).]
  • 58. Curriculum development for manifold discovery? ◮ Étude in Locomotion ◮ Task-space manifold for locomotion: θ ∈ ξ · [0 0 1 −1 1 −1 1 1]ᵀ ◮ Stop exploring in task nullspace ◮ FAST! ◮ Étude in Steering ◮ Can task be completed on locomotion manifold? ◮ One possible approximate solution uses the bases [0 0 1 −1 1 −1 1 1]ᵀ and [1 −1 0 0 0 0 0 0]ᵀ ◮ Can second basis be learned?
  • 59. 3: How to teach a robot? How to teach an animal? 1. Reward basic skills 2. Develop control along useful DOFs 3. Make skill more complex 4. A good solution NOW!
  • 60. Conclusion Exorcising the Curse of Dimensionality ◮ PGRL works for low-DOF problems. ◮ Task-space dimension < state-space dimension. ◮ Learn f: task-space manifold → policy-space manifold.