SAC+CURL Results (from RGB)

To serve as a baseline.
Created on August 20 | Last edited on August 22
﻿
RL Results, Scooping

It looks like RL (SAC/CURL) does well here, but not for pouring (see the other plots). These runs use a 1M replay buffer capacity.

Overall summary:
For scooping, these runs use a 1M replay buffer capacity and 1x reward scaling (i.e., the identity, so rewards are effectively unscaled).
Success rates are high, but again I think we need to argue that the purpose of the 4D actions was not to optimize for success but to test how well it could imitate a policy. For optimizing success rates, we conveniently now have the 6D scooping results.
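To make the two settings in the summary above concrete, here is a minimal sketch of how a reward scale and a replay buffer capacity could be expressed. The `RunConfig` class, the dictionary keys, and both helper functions are hypothetical names introduced for illustration; they are not taken from the actual SAC/CURL training code, and where that code applies the reward scale is not stated in these notes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the two settings named in these notes."""
    replay_capacity: int  # maximum number of transitions kept in the buffer
    reward_scale: float   # multiplier applied to the environment reward


# Settings as listed in this report; the dictionary keys are made up.
CONFIGS = {
    "scooping_1m_1x": RunConfig(1_000_000, 1.0),
    "pouring_100k_20x": RunConfig(100_000, 20.0),
    "pouring_1m_20x": RunConfig(1_000_000, 20.0),
    "pouring_1m_1x": RunConfig(1_000_000, 1.0),
}


def scale_reward(reward: float, cfg: RunConfig) -> float:
    """Scale a raw reward; with reward_scale == 1.0 this is the identity."""
    return cfg.reward_scale * reward


def make_buffer(cfg: RunConfig) -> deque:
    """FIFO buffer: once full, appending evicts the oldest transition."""
    return deque(maxlen=cfg.replay_capacity)
```

With the scooping setting, `scale_reward` returns its input unchanged, which is what "1x reward scaling" means here. A smaller capacity such as 100K simply means older transitions are evicted sooner.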
﻿
SAC/CURL, 4DoF ScoopBall (3 runs)
SAC/CURL, 6DoF ScoopBall (3 runs)
﻿
RL Results, Pouring

The numbers here are VERY LOW! But that is due to the difference between reporting a binary success rate and reporting the number of particles that end up inside the target cup.
For the paper, we should report Pouring 3D, 100K cap., 20X scale. That is the same as SAC/CURL, 3DoF PourWater in the plot below.
We should also report Pouring 6D, 100K cap., 20X scale. That is the same as SAC/CURL, 6DoF PourWater in the plot below.
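The gap between a binary success rate and a particle-count metric, noted above, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names, the 0.9 threshold, and the episode numbers are invented and are not the actual success criterion or results from these runs.

```python
def fraction_in_target(in_target: int, total: int) -> float:
    """Fraction of particles that ended up inside the target cup."""
    if total <= 0:
        raise ValueError("total must be positive")
    return in_target / total


def summarize(episodes, threshold):
    """Return (binary success rate, mean particle fraction).

    episodes:  list of (particles_in_target, total_particles) pairs
    threshold: hypothetical fraction required to count an episode as a success
    """
    fractions = [fraction_in_target(k, n) for k, n in episodes]
    successes = [f >= threshold for f in fractions]
    success_rate = sum(successes) / len(successes)
    mean_fraction = sum(fractions) / len(fractions)
    return success_rate, mean_fraction


# Made-up episodes: each pours a fair amount of water into the cup,
# but only one of them clears the (hypothetical) 0.9 threshold.
episodes = [(50, 100), (75, 100), (100, 100)]
success_rate, mean_fraction = summarize(episodes, threshold=0.9)
print(f"binary success rate: {success_rate:.2f}")
print(f"mean fraction in cup: {mean_fraction:.2f}")
```

In this toy example the binary success rate is one in three while the mean fraction in the cup is three quarters, so a policy can look poor under the binary metric even when most particles land in the target.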
﻿
SAC/CURL, 3DoF PourWater (3 runs)
SAC/CURL, 6DoF PourWater (3 runs)
Pouring 3D, 1M cap., 20X scale (3 runs)
Pouring 3D, 1M cap., 1X scale (3 runs)
﻿
GIFs, Scooping

4DoF
As expected, this relies on being able to quickly knock against the wall, which looks a bit unnatural.
Here are all 3 seeds after 1M steps.
[GIFs: 3 seeds]

6DoF
Here are all 3 seeds after 1M steps.
[GIFs: 3 seeds]
﻿
GIFs, Pouring

3DoF
This actually isn't too bad and roughly matches what we see in the GIFs on the SoftGym website. Here are all 3 seeds after 1M steps.
[GIFs: 3 seeds]

6DoF
Here are all 3 seeds after 1M steps. Yikes.
[GIFs: 3 seeds]