SAC+CURL Results (from RGB)
To serve as a baseline.
Created on August 20|Last edited on August 22
RL Results, Scooping
It looks like SAC does well on scooping, but not on pouring (see the other plots). These runs use a replay buffer capacity of 1M.
Overall summary:
- For scooping, these runs use a 1M replay buffer capacity and 1x reward scaling (the identity, so rewards are effectively unscaled).
- Success rates are high, but I think we need to argue that the purpose of the 4DoF actions was not to optimize for success but to test how well it could imitate a policy. For optimizing success rates, we conveniently now have the 6DoF scooping results.
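To make the two settings above concrete, here is a minimal sketch of a fixed-capacity replay buffer with reward scaling. It is not the code used for these runs: the `ReplayBuffer` class, its method names, and the choice to scale rewards at insertion time are all assumptions for illustration. The capacities and scales are the ones quoted in this report (1M and 1x for scooping, 100K and 20x for pouring).

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity FIFO buffer that scales rewards on insertion."""

    def __init__(self, capacity, reward_scale=1.0):
        # deque(maxlen=...) evicts the oldest transition once capacity is reached.
        self.storage = deque(maxlen=capacity)
        self.reward_scale = reward_scale

    def add(self, obs, action, reward, next_obs, done):
        # With reward_scale=1.0 the stored reward is unchanged (identity).
        scaled = reward * self.reward_scale
        self.storage.append((obs, action, scaled, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


# Settings quoted in this report; observations are placeholders here.
scooping = ReplayBuffer(capacity=1_000_000, reward_scale=1.0)
pouring = ReplayBuffer(capacity=100_000, reward_scale=20.0)
scooping.add(obs=None, action=None, reward=0.5, next_obs=None, done=False)
pouring.add(obs=None, action=None, reward=0.5, next_obs=None, done=False)
```

An actual implementation might instead apply the scale inside the critic update; the effect on the learning signal is the same as long as it is applied consistently.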
[Plot: SAC/CURL, 4DoF ScoopBall (3 runs)]
[Plot: SAC/CURL, 6DoF ScoopBall (3 runs)]
RL Results, Pouring
The numbers here are very low, but that is due to the difference between reporting a binary success rate and reporting the number of particles that are inside the target cup.
For the paper, we should report Pouring 3D, 100K cap., 20X scale. That is the same as SAC/CURL, 3DoF PourWater in the plot below.
We should also report Pouring 6D, 100K cap., 20X scale. That is the same as SAC/CURL, 6DoF PourWater in the plot below.
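The gap between the two pouring metrics mentioned above can be illustrated with a small sketch. Everything here is hypothetical: the function names, the 0.75 success threshold, and the particle counts are made up for illustration and are not taken from SoftGym or from these runs.

```python
def fraction_in_cup(num_in_cup, num_total):
    """Continuous metric: share of particles that ended up in the target cup."""
    return num_in_cup / num_total


def binary_success(num_in_cup, num_total, threshold=0.75):
    """Binary metric: 1 only if the share clears an assumed threshold."""
    return fraction_in_cup(num_in_cup, num_total) >= threshold


# Made-up episodes as (particles in cup, total particles).
episodes = [(70, 100), (60, 100), (74, 100)]

mean_fraction = sum(fraction_in_cup(n, t) for n, t in episodes) / len(episodes)
success_rate = sum(binary_success(n, t) for n, t in episodes) / len(episodes)
```

In this toy case the average fraction of particles in the cup is about 0.68, yet the binary success rate is 0 because no episode clears the threshold. That is one way a pouring curve can look very low even when the rollouts pour most of the water.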
[Plot: SAC/CURL, 3DoF PourWater (3 runs)]
[Plot: SAC/CURL, 6DoF PourWater (3 runs)]
[Plot: Pouring 3D, 1M cap., 20X scale (3 runs)]
[Plot: Pouring 3D, 1M cap., 1X scale (3 runs)]
GIFs, Scooping
4DoF
As expected, this relies on the assumption that we can quickly knock the ball against the wall, which seems a bit unnatural.
Here are all 3 seeds after 1M steps.



6DoF
Here are all 3 seeds after 1M steps.



GIFs, Pouring
3DoF
This actually isn't too bad, and it roughly matches what we see in the GIFs on the SoftGym website. Here are all 3 seeds after 1M steps.



6DoF
After 1M steps, all 3 seeds. Yikes.


