Write Less Robot-Specific Code
Created on January 6 | Last edited on January 6
The dream is to find an old dusty robot in an abandoned warehouse, upload a piece of code to it, and have it wake up.
[gif from movie: Château dans le ciel! or SD image]
This means we need robot-agnostic code. We don’t want to spend days disassembling the robot and analyzing it (actually maybe we do, but I’m no control engineer yet lol). We need the robot now!!
[explain the broader project]
Our old code contains logic that I find a bit too specific and arbitrary.
First are state limits. These are arbitrary bounds on the state space, e.g. we say the robot can't go past 140 deg on the vertical axis or we end the episode.
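As a concrete sketch (the names and the exact check are mine, not our actual code), such a state limit boils down to a termination test like:

```python
import math

# Hypothetical position limit on the vertical axis (the 140 deg mentioned above).
THETA_LIMIT = math.radians(140.0)

def is_out_of_bounds(theta: float) -> bool:
    """Return True when the arm angle exceeds the arbitrary position limit,
    which would end the episode."""
    return abs(theta) > THETA_LIMIT
```

In a Gym-style environment this would set the `terminated` flag in `step()`.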
Without state limits we have two issues.
Our simulation is, well, a simulation, and thus simpler than real life: if the simulated robot spins at Mach 10 it's not going to break. This means the numbers in the simulated state can get really big and eventually go to NaN.
On the other hand, on the real robot, if we don't restrict e.g. the speed, the robot might break for real. This is annoying because we need to set arbitrary limits, and I fear they might hinder training.
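A minimal sketch of what such a speed guard could look like (the constant and the cut-off strategy are illustrative, not our actual implementation):

```python
MAX_SPEED = 50.0  # rad/s, a hypothetical safety limit

def safe_action(action: float, motor_speed: float) -> float:
    """Refuse to push the motor further in the same direction once the
    measured speed exceeds the limit. This protects the real hardware
    and keeps the simulated state from blowing up to NaN."""
    if abs(motor_speed) > MAX_SPEED and action * motor_speed > 0:
        return 0.0  # cut the command that would accelerate past the limit
    return action   # everything else passes through unchanged
```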
(Second is the Action Limiter. We borrowed it from the Quanser Qube, a similar robot that can't do full revolutions. Because of this mechanical restriction they limit the robot's actions near its joint limits. We initially had similar limitations but ended up using a slip ring, so our robot isn't constrained. I want to get rid of it to make the system simpler, less arbitrary, and more portable. But I'm curious whether it is making learning easier.)
[show weird quanser code: I ain’t reading that; maybe adjust the vibe]
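I'm not reproducing the Quanser code here, but the gist of such an action limiter is roughly this: taper the commanded action toward zero as the arm approaches its mechanical joint limits, while always allowing motion back toward center. All names and constants below are my own guesses, not Quanser's:

```python
import math

JOINT_LIMIT = math.radians(170.0)  # hypothetical mechanical joint limit
SOFT_ZONE = math.radians(20.0)     # start limiting this far before the limit

def limit_action(action: float, theta: float) -> float:
    """Scale down actions that push the arm further toward a joint limit."""
    margin = JOINT_LIMIT - abs(theta)
    if margin >= SOFT_ZONE:
        return action  # far from the limit: pass through
    pushing_outward = action * math.copysign(1.0, theta) > 0
    if not pushing_outward:
        return action  # moving back toward center is always allowed
    scale = max(margin, 0.0) / SOFT_ZONE
    return action * scale  # taper to zero at the hard limit
```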
Section 1
Sweep: by5t7qxh 1
Having state (especially position) limits seemed to help training. It prevented the robot from spinning on itself, which we don't want: we want it to balance its pole near theta 0. But cutting off the episode when the robot is out of bounds might not be the best solution. I feel it would be better to encode this in the reward.
Conclusion from the sweep above: limiting only the position produced a working solution fastest. However, limiting position + speed produced better solutions.
For now I'm going to include a reward for the position, as important as the pendulum reward, and limit the motor speed to 50 rad/s. This should prevent the robot from breaking (it worked before) and will prevent the sim from going NaN.
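This isn't our exact reward function, just a sketch of the shape I have in mind, with the arm-position term weighted as heavily as the pendulum term (the cosine form and the 50/50 split are my assumptions):

```python
import math

def reward(theta: float, alpha: float) -> float:
    """Hypothetical shaped reward: keep the pendulum up (alpha near 0)
    and the arm centered (theta near 0), with equal weight on both."""
    pendulum_reward = math.cos(alpha)  # +1 upright, -1 hanging down
    position_reward = math.cos(theta)  # +1 centered, falls off toward the bounds
    return 0.5 * pendulum_reward + 0.5 * position_reward
```

The appeal is that the position constraint becomes a smooth gradient instead of a hard episode cut-off.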
I hope the new reward will help training, while having only one limit on the system's state will be less arbitrary and easier to understand/set up.
That said, a reward on theta could hinder training, as the robot needs to get away from theta 0 to swing up the pendulum.
Later I'd like to use readings from the motor's current sensor to maybe add an energy penalty to the reward.
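A hypothetical version of that penalty, assuming we can read the motor current each step (the function, its signature, and the weight are all illustrative and would need tuning):

```python
def reward_with_energy(base_reward: float, current: float, voltage: float,
                       dt: float, energy_weight: float = 0.01) -> float:
    """Subtract a penalty proportional to the electrical energy spent this
    step. `current` would come from the motor's current sensor."""
    energy = abs(current * voltage) * dt  # joules spent this step
    return base_reward - energy_weight * energy
```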
Run set
-> adding theta 0
Sweep: theta reward 1
Sweep: theta reward 2
-> theta + alpha reward: more samples needed/trains for longer before reward thresh
-> theta + alpha: stays in the center but spins the pendulum
-> realizing I'm trying to do two things at once:
- make the system more general
- increase sample efficiency
they are somewhat conflicting