SpheresLadleEnv (3DoF and 4DoF)
How do these compare?
Created on July 27 | Last edited on July 28
Compare with the Mixed Media 3DoF and 4DoF results, which are in separate wandb reports (for the spheres, I put everything in this one report):
Overall Timeline
(07/27/2022) Ran some experiments last night and am now putting the results together here. The statistics should be the same as in the other Mixed Media (MM) results, i.e.:
- eval/info_done_final: whether we retrieved the target item. (Ignores the distractor, so it also counts episodes where both were retrieved.) This is what we normally report.
- eval/info_dist_1_done_final: whether we retrieved the distractor. (Ignores the target, so it also counts episodes where both were retrieved.)
- eval/info_done_and_dist_1_final: whether we retrieved BOTH items.
- eval/info_done_no_dist_1_final: whether we retrieved the target and AVOIDED the distractor.
- eval/info_no_done_dist_1_final: whether we retrieved the distractor but NOT the target!
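To make the relationship between the five statistics above concrete, here is a minimal sketch of how they could be derived from two per-episode booleans. This is not the actual environment or logging code: the `EpisodeOutcome` class, its field names, and the `final_stats` and `success_rates` helpers are hypothetical and only illustrate the definitions in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeOutcome:
    """Hypothetical per-episode result; field names are assumptions."""
    target_retrieved: bool
    distractor_retrieved: bool


def final_stats(outcome: EpisodeOutcome) -> dict:
    """Map one episode outcome to the five eval statistics listed above."""
    t = outcome.target_retrieved
    d = outcome.distractor_retrieved
    return {
        "eval/info_done_final": t,                    # target, regardless of distractor
        "eval/info_dist_1_done_final": d,             # distractor, regardless of target
        "eval/info_done_and_dist_1_final": t and d,   # both items
        "eval/info_done_no_dist_1_final": t and not d,  # target only
        "eval/info_no_done_dist_1_final": d and not t,  # distractor only
    }


def success_rates(outcomes: list) -> dict:
    """Average each statistic over a list of evaluation episodes."""
    if not outcomes:
        raise ValueError("need at least one episode")
    totals: dict = {}
    for outcome in outcomes:
        for key, value in final_stats(outcome).items():
            totals[key] = totals.get(key, 0) + int(value)
    return {key: count / len(outcomes) for key, count in totals.items()}
```

Under these definitions, eval/info_done_final is always the sum of eval/info_done_and_dist_1_final and eval/info_done_no_dist_1_final, which is a handy sanity check when reading the plots.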
TODO
Results
NOTE: for eval/info_done_final, all the results are clustered near 1, so I have a duplicate plot which only shows the 80-100% range.
[Plots omitted. Run sets shown in the legend, with the count displayed next to each:]
- 1 Sphere, Direct Vector (obs PCL): 5
- 1 Sphere, Direct Vector (gt PCL): 5
- 2 Spheres, Direct Vector (obs PCL): 5
- 2 Spheres, Direct Vector (gt PCL): 5
- 1 Sphere, TFN No SVD (obs PCL): 5
- 1 Sphere, TFN No SVD (gt PCL): 5
- 2 Spheres, TFN No SVD (obs PCL): 5
- 2 Spheres, TFN No SVD (gt PCL): 5
- 1 Sphere, TFN (obs PCL): 5
- 1 Sphere, TFN (gt PCL): 0
- 2 Spheres, TFN (obs PCL): 5
- 2 Spheres, TFN (gt PCL): 5
Conclusions?
- Yeah, definitely seems like performance is way up across the board.
Example GIFs
0 Distractors
Direct Vector MSE (observed PCL)
Direct Vector MSE (gt PCL)
ToolFlowNet No SVD, MSE (observed PCL)
ToolFlowNet No SVD, MSE (gt PCL)
ToolFlowNet, Pointwise (observed PCL)
100% success.

ToolFlowNet, Pointwise (gt PCL)
1 Distractor
1 Distr., Direct Vector MSE (observed PCL)
Example with 24/25 success.

1 Distr., Direct Vector MSE (gt PCL)
Example with 23/25 success.

1 Distr., ToolFlowNet No SVD, MSE (observed PCL)
Example with 24/25 success.

1 Distr., ToolFlowNet No SVD, MSE (gt PCL)
Example with 25/25 success.

1 Distr., ToolFlowNet, Pointwise (observed PCL)
100% success.

1 Distr., ToolFlowNet, Pointwise (gt PCL)