
BC04 Multi-Sphere Results (4DoF Demo Data)

Results with 4DoF demo data (a second attempt, in progress). See the other report for 3DoF demo data.
Created on July 13 | Last edited on July 26
Why is this environment not producing good results? Demonstrator success is 0.623 here; the single-sphere environment has 0.632.


Overall Comments and Updates

NOTE: some runs used 2000 epochs, but we should stick with 1000 going forward. The runs that used 2000 epochs reached their "best" epochs within the first 1000, so in my Notion I'm just going to report 1000 epochs to reduce confusion.
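To apply the "best epoch within the first 1000" rule the same way to every run set, something like the following could be used. This is only a minimal sketch: the function name `best_epoch`, the per-seed dict layout, and the numbers are all hypothetical, not the actual logging format or results.

```python
from statistics import mean


def best_epoch(curves, max_epoch=1000):
    """Return (epoch, mean_success) for the best epoch averaged over seeds.

    curves: one dict per seed mapping epoch -> eval success rate
            (hypothetical layout, not the real logging format).
    Only epochs <= max_epoch that every seed logged are considered.
    """
    shared = set.intersection(*(set(c) for c in curves))
    candidates = sorted(e for e in shared if e <= max_epoch)
    if not candidates:
        raise ValueError("no shared epoch at or below max_epoch")
    # Ties go to the earliest epoch because max() keeps the first maximum.
    return max(((e, mean(c[e] for c in curves)) for e in candidates),
               key=lambda pair: pair[1])


# Made-up numbers, purely for illustration.
demo_curves = [
    {25: 0.10, 50: 0.50, 1025: 0.90},
    {25: 0.30, 50: 0.40, 1025: 0.90},
]
print(best_epoch(demo_curves))                  # capped at 1000 epochs
print(best_epoch(demo_curves, max_epoch=2000))  # full 2000-epoch curve
```

Capping at 1000 epochs keeps the 2000-epoch runs comparable with the 1000-epoch ones, since both are then scored over the same range.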

(07/12/2022) Here's what the statistics mean:
  • eval/info_done_final: whether we retrieved the target item. (Ignores the distractor, so it also counts cases where both are retrieved.) This is what we normally report.
  • eval/info_dist_1_done_final: whether we retrieved the distractor. (Ignores the target, so it also counts cases where both are retrieved.)
  • eval/info_done_and_dist_1_final: whether we retrieved BOTH items.
  • eval/info_done_no_dist_1_final: whether we retrieved the target and AVOIDED the distractor.
  • eval/info_no_done_dist_1_final: whether we retrieved the distractor but NOT the target!
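The relationship between these five statistics can be sketched from two per-episode booleans. This is a rough illustration assuming each statistic is a plain logical combination of "target retrieved" and "distractor retrieved", as the descriptions above suggest; the helper names `episode_stats` and `average_stats` are hypothetical and not taken from the actual evaluation code.

```python
def episode_stats(target_retrieved, distractor_retrieved):
    """Map two end-of-episode outcomes to the five reported statistics."""
    t, d = bool(target_retrieved), bool(distractor_retrieved)
    return {
        "eval/info_done_final": t,                      # target, distractor ignored
        "eval/info_dist_1_done_final": d,               # distractor, target ignored
        "eval/info_done_and_dist_1_final": t and d,     # both items
        "eval/info_done_no_dist_1_final": t and not d,  # target only
        "eval/info_no_done_dist_1_final": d and not t,  # distractor only
    }


def average_stats(episodes):
    """Average each statistic over a list of (target, distractor) outcomes."""
    rows = [episode_stats(t, d) for t, d in episodes]
    return {key: sum(row[key] for row in rows) / len(rows) for key in rows[0]}


# Made-up outcomes for four evaluation episodes.
stats = average_stats([(True, False), (True, True), (False, True), (False, False)])
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under this assumption, eval/info_done_final equals the sum of eval/info_done_and_dist_1_final and eval/info_done_no_dist_1_final, which is a quick sanity check on logged values.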
(07/17/2022) Trained longer, still not seeing improvements...
(07/19/2022) Adding 2 more random seeds for both versions of point clouds, so that we can compare more fairly with the other settings. In progress ... (edit: done).
(07/20/2022) Adding Direct Vector (MSE) baseline for both types of point clouds.
(07/23/2022) Runs have finished, let's record them (in my Notion) and add some GIFs.
  • From looking at the 2000 epoch results, I think it's safe to assume that we can stick with 1000 epochs for now.
  • Why is Direct Vector (MSE) with g.t. PCL so bad? It does not seem to be learning anything. From the logs, it's using the right dataset (the BC data is the same as in the ToolFlowNet case with g.t. PCL) and the right observation type (i.e., point_cloud_gt_v01). It's using the same 'scaling of 100x' for the rotation hack that I've been using. From the training curves, it seems to have failed completely. Why? Next step: try to overfit to 1 demo and see what's up.
  • Unfortunately, one of the Direct Vector (MSE) runs with observed PCL had the 'exploding water' problem. :( So we will have 4 seeds instead of 5 for now.
  • Also, after adding the 2 extra random seeds for ToolFlowNet (for both point cloud variants tried), the results aren't actually that good: normalized success drops to 0.616, and I can barely see any difference between the two point clouds. So we can try 1 more point cloud variant to see what we get.
(07/24/2022) progress:
  • The g.t. PCL v02 runs are in progress (will take another day or so to run). (Edit a few days later: it seems not to be that helpful, interesting...)

Results


Run sets (number of runs in each):
  • ToolFlowNet, observed PCL (2000 epochs): 5
  • ToolFlowNet, g.t. PCL v01 (2000 epochs): 5
  • ToolFlowNet, g.t. PCL v02: 5
  • Direct Vector (MSE), observed PCL: 4
  • Direct Vector (MSE), g.t. PCL v01: 5



Example GIFs

Direct Vector (MSE), Observed PCL

This method sometimes succeeded! But it doesn't go straight down and then rotate (it seems to have trouble with this sequential aspect). Interestingly, it doesn't get the water; note that the point clouds do not contain any water.
Here's one of the seeds after 1000 epochs:

This one seems to most closely imitate what the demonstrator was actually doing (but success is very low):

UPDATE: ah, we have to discard BC04_MMMultiSphere_v02_ntrain_0100_PCL_PNet2_eepose_4DoF_ar_8_hor_100_rawPCL_scaleTarg_2022_07_20_11_42_14_0004 because the physics got weird. I hate it when this happens (it seems entirely out of my control):


Direct Vector (MSE), Ground Truth PCL v01

This method completely failed, not sure why. All 5 of the seeds end up looking like this after 1000 epochs:


ToolFlowNet, Observed PCL

This is at epoch 775, which seemed to do the best (on average).
Yeah it got a lot of the red one here...


ToolFlowNet, Ground Truth PCL v01

This is at epoch 825, which seemed to do the best (on average). Results show that it definitely has a lot of room to improve, but it seems way better than the direct vector method. So that's good. :)
However, this one is really bad; it got a lot of the red one ...

This one did a bit better I think.


ToolFlowNet, Ground Truth PCL v02