
BC #02: MMOneSphere, "Naive" Method, Regress Straight to 3DoF

This report analyzes the "naive" way of doing BC with 3DoF data / actions: we take the segmented point cloud, pass it as input to a classification PN++, and regress with an MSE loss straight on the 3DoF translation. I also used this setup to debug and investigate MSE values, and I test with both 100 and 500 training demos.
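To make the setup concrete, here's a minimal sketch of the baseline in PyTorch. The encoder below is a simple PointNet-style stand-in (shared per-point MLP plus max pooling) rather than the actual classification PN++ backbone, and the names, dimensions, and hyperparameters are illustrative assumptions rather than the real training code.

```python
# Minimal sketch of the "naive" BC baseline (assumptions noted above).
import torch
import torch.nn as nn

class NaiveBCPolicy(nn.Module):
    def __init__(self, in_dim=3, feat_dim=256):
        super().__init__()
        # Stand-in for the classification PN++ encoder: per-point MLP + global max pool.
        self.point_mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Regression head straight to the 3DoF translation action.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 3),
        )

    def forward(self, pts):
        # pts: (B, N, 3) segmented point cloud.
        feat = self.point_mlp(pts).max(dim=1).values  # (B, feat_dim) global feature
        return self.head(feat)                        # (B, 3) predicted translation

# One training step: plain MSE on the 3DoF translation targets (dummy data here).
policy = NaiveBCPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
pts = torch.randn(8, 1200, 3)   # batch of segmented point clouds
act = torch.randn(8, 3)         # 3DoF translation actions from the demos
loss = nn.functional.mse_loss(policy(pts), act)
opt.zero_grad(); loss.backward(); opt.step()
```

The point is just the structure: a single global point-cloud feature feeds a small head that regresses the 3DoF translation directly, and the MSE on that translation is the entire loss.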
What are the following plots showing?
  • BC #01: this is the first batch of experiment settings, used just to check whether the collision detection code change (from BC#01 to #02) affected things. See this report for those results in more detail. (This also kept the first 90% of each train demo and used 114 train demos instead of 100, due to an error on my part. It also used fewer validation configs than I use now (I have 100 starting validation configs in BC#02), so I think the BC#02 experiments are more informative.)
  • BC#02: 100 train demos, keep first 75% of each demo (see the preprocessing sketch below). I think keeping the first 75% is what we should do by default, since the demonstrator doesn't do anything during the last 25% (but we leave that time in since the learned policies might need it).
  • BC#02: 100 train demos, keep first 90% of each demo.
  • BC#02: 500 train demos, keep first 75% of each demo. This is to test how results scale. The testing is still done on the same held-out set of 100 validation configs.
The curves are averages over 3 random seeds.
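For reference, the "keep first X%" preprocessing just truncates each demonstration before training. Here is a hypothetical sketch; the demo format (a list of (obs, action) pairs per trajectory) and the helper names are my assumptions, not the actual data-loading code.

```python
# Hypothetical sketch of the "keep first X%" demo preprocessing.
# Assumes each demo is a list of (obs, action) pairs; not the actual loader.
def truncate_demos(demos, keep_frac=0.75):
    truncated = []
    for traj in demos:
        keep_len = max(1, int(len(traj) * keep_frac))
        truncated.append(traj[:keep_len])  # drop the tail where the demonstrator is idle
    return truncated

# BC#02-style usage: 100 (or 500) demos, keeping the first 75% of each.
# train_demos = truncate_demos(all_demos[:100], keep_frac=0.75)
```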
TL;DR: it seems that for this setting we actually need 500 training demos instead of 100 to get something working. Of course, the non-naive methods can manage with 100 demos, which is good for us. :-) Even then, we get a 0.441 +/- 0.3 success rate, which is worse than what we get with 100 demos using SVD pointwise! Broken down into individual runs, two did very well and the third did not learn at all; the results are actually:
(0.6859 + 0.6074 + 0.03007) / 3 = 0.4411
In contrast, for the 100-demo case (keeping the first 75% of each training demo, as in the 500-demo case) we get:
0.029, 0.028, 0.023 (i.e., very bad)
For an actual paper we'd run more seeds, but these settings give us confidence in the results. I could do 2 more runs for each setting (seeds 103, 104) to get 5 runs per setting.
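As a quick sanity check on the reported 0.441 +/- 0.3: it matches the per-seed success rates listed above if the spread is numpy's default (population-style) standard deviation; the exact reporting code isn't shown here.

```python
import numpy as np

# Per-seed success rates from the 500-demo, keep-first-75% runs (listed above).
success = np.array([0.6859, 0.6074, 0.03007])
print(success.mean())  # ~0.441
print(success.std())   # ~0.29, i.e. the reported +/- 0.3
```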

Results Across Different Settings


[Plot panels for the following run sets, 3 runs each:]
  • BC#01 (Old Settings)
  • BC#02 (100 demos, keep first 75% of each)
  • BC#02 (100 demos, keep first 90% of each)
  • BC#02 (500 demos, keep first 75% of each)


Comparison with Older Runs:

Old report with just the naive runs from earlier:
Here are GIFs of the current policy.

New BC #02, Keeping first 75%:

All results are after 250 epochs of training for the 3 seeds. This now looks like it runs into the same issue with sideways movement that plagued the earlier flow-based experiments.

[GIFs of policy rollouts, one per seed]

New BC #02, Keeping first 90%:

All results are after 250 epochs of training for the 3 seeds:

[GIFs of policy rollouts, one per seed]

New BC #02, Keeping first 75%, now with 500 (instead of 100) demos:

This looks a lot better, which is a good sanity check that the env is actually doable, but it also shows that the naive method suffers from sample efficiency issues. The first seed still seems problematic, but the other two are working well! So even with 500 demos the naive way is quite problematic.

[GIFs of policy rollouts, one per seed]

Notes to Self on BC #01 to #02

Any issues in the train settings?