
Behavior Cloning

Top level hypothesis: Giving scene graph information when trying to distill scene mesh information will help.
Created on November 29 | Last edited on December 20

Debug baseline

Fractal Environment

Jump Environment


The jump environment that we are familiar with. There are two versions, Fixed and Random. In Fixed, the agent and goal positions are, well, fixed at the start of every episode; in Random, they are randomized.
The experiment procedure is as follows.
  1. Train a reinforcement learning agent. (SAC or PPO)
  2. Roll out the trained policy.
  3. Create 25_000 (state, action) pairs.
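The collection step above can be sketched as follows. This is a minimal sketch, assuming a Gym-style environment API; `env`, `policy`, and the observation/action shapes are stand-ins for the actual Jump environment and the trained SAC/PPO agent.

```python
import numpy as np

def collect_pairs(env, policy, n_pairs=25_000):
    """Roll out a trained policy and record (state, action) pairs.

    `env` is assumed to follow the classic Gym API (reset/step);
    `policy` maps an observation to an action.
    """
    states, actions = [], []
    obs = env.reset()
    while len(states) < n_pairs:
        action = policy(obs)
        states.append(np.asarray(obs))
        actions.append(np.asarray(action))
        obs, reward, done, info = env.step(action)
        if done:
            obs = env.reset()  # start a fresh episode and keep collecting
    return np.stack(states), np.stack(actions)
```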
In Jump, the trained agent observes all 505 observation values during RL training. During behavior cloning, different subsets of these values are used.

MLP Baseline

We use the same 3-layer, width-16 model as before:
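For concreteness, a numpy sketch of that baseline shape. The input/output dimensions, ReLU activations, and He-style initialization are assumptions; only the "3 hidden layers of width 16" part comes from the text.

```python
import numpy as np

def init_mlp(in_dim, hidden=16, out_dim=2, n_hidden=3, seed=0):
    """Three hidden layers of width 16 -- the baseline architecture.

    in_dim/out_dim are placeholders for the Jump observation/action sizes.
    """
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * n_hidden + [out_dim]
    return [(rng.standard_normal((a, b)) * np.sqrt(2 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(params, x):
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)  # ReLU hidden layers (assumed)
    W, b = params[-1]
    return x @ W + b                    # linear output head
```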

Of course, in this task simply looking at and moving towards the goal is not sufficient. Instead, the agent needs to memorize which action to take at each position in order to move around the obstacles.
We also compute the error values of a random agent to ground the abs_error / mean_absolute_percentage_error. I created the random agent by setting the learning rate to 0.

[W&B panel: run set, 5 runs]

The agent is doing better than random but not by that much. The capacity seems to be the bottleneck given the small size of the network. Let's increase the size.

[W&B panel: run set, 7 runs]

Not surprisingly, bigger is better in this case. With 4 layers of width 1024 we may be approaching overfitting; however, we can still see the parking behavior. The point isn't to solve the environment by brute force, so I will stop at a hidden size of 512. Hopefully we will see improvement with the GNN.
Hypothesis: Dropout will help with parking behavior.

[W&B panel: run set, 8 runs]


Sweep over Random Parameters.

I am leaving the house for a few hours, so let's run a sweep.
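A sweep like this is usually driven by a W&B sweep configuration. The sketch below is hypothetical: the parameter names, value ranges, and metric name are assumptions, not the ones actually used in the 174-run sweep.

```python
# Hypothetical W&B sweep config for a random search over the MLP
# hyperparameters discussed above. Every name and range here is an
# illustrative assumption.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "hidden_size":   {"values": [16, 64, 256, 512]},
        "n_layers":      {"values": [2, 3, 4]},
        "dropout":       {"distribution": "uniform", "min": 0.0, "max": 0.5},
        "learning_rate": {"distribution": "log_uniform_values",
                          "min": 1e-5, "max": 1e-2},
    },
}
```

Such a config would then be registered with `wandb.sweep(...)` and executed with `wandb.agent(...)`.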

[W&B panel: run set, 174 runs]




[New] Graph Neural Network -- Debug Graph

Graph Neural Network -- Jump Graph -- Flat Tree



There is no strong reason to expect this graph structure to work; it is tested only because the implementation was already present.

[W&B panel: run set, 8 runs]


Adding the more complicated graph structure made things worse.

Data Flow / Engineering

In order to handle graphs of different types I needed to refactor/update a lot of the code. We will have to further update the code to handle dynamic graphs. I decided to take this moment to map out the dataflow to better re-evaluate how to structure things. Excalidraw to the rescue.

There is a lot of code duplication, and the current structure doesn't allow dynamic graphs. In general we need the following transformations:
  • Unity Scene in Unity Editor --> Scene Graph Description + Base Features (JSON)
  • Scene Graph Description + Base Features (JSON) --> DGL PrototypeGraph
  • Observation + DGL PrototypeGraph --> Feature Updated DGL Graph
  • Observation + DGL PrototypeGraph --> Connection Updated DGL Graph
  • Batching in certain cases.
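The middle transformations can be sketched without DGL: the topology is parsed once from the JSON description into a prototype, and each new observation only overwrites node-feature slices. The JSON schema, field names, and update rule below are illustrative assumptions, not the real exporter format.

```python
import json
import numpy as np

# Hypothetical scene-graph description, standing in for the JSON that
# the Unity editor step would export.
scene_json = json.dumps({
    "nodes": ["agent", "goal", "obstacle_0"],
    "edges": [["agent", "goal"], ["agent", "obstacle_0"]],
    "base_features": {"agent": [0, 0], "goal": [5, 0], "obstacle_0": [2, 1]},
})

def build_prototype(desc):
    """Scene Graph Description (JSON) -> static topology + base features."""
    d = json.loads(desc)
    idx = {name: i for i, name in enumerate(d["nodes"])}
    src = [idx[a] for a, b in d["edges"]]
    dst = [idx[b] for a, b in d["edges"]]
    feats = np.array([d["base_features"][n] for n in d["nodes"]], float)
    return idx, (src, dst), feats

def update_features(proto_feats, idx, observation):
    """Observation + prototype -> feature-updated copy; topology untouched."""
    feats = proto_feats.copy()
    feats[idx["agent"]] = observation["agent_pos"]  # illustrative update
    return feats
```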
The first problem to solve is indexed access into the observation array.
The first approach researched was a container class wrapping numpy arrays.

However, there was a lot of boilerplate code to implement, and it wasn't trivial to ensure numpy array functions worked as intended.
The second option is to use Structured Arrays as implemented in NumPy.

However, this wasn't flexible: you can only define one view into the data, and the slicing was unintuitive and slow. Structured arrays are really meant for working with C-style structs in Python.
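To illustrate the limitation: the single view is baked into the dtype up front, so a second, different grouping of the same memory isn't available. The field names below are invented for the example.

```python
import numpy as np

# One structured record per observation; the dtype IS the one and only
# "view" you get into the data.
obs_dtype = np.dtype([("agent_pos", "f4", (2,)),
                      ("goal_pos", "f4", (2,))])
batch = np.zeros(4, dtype=obs_dtype)
batch["agent_pos"] = 1.0  # field access works fine...

# ...but re-slicing the same memory under a different grouping, or
# treating the record as one flat float vector, is not straightforward.
```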
The third thing researched was Xarray.

Frankly, I couldn't figure out the coordinate data structure, and it didn't seem to solve my problem.
The last thing I tried, and the one that succeeded, was using subclasses.

I overrode the __getitem__ function: whenever a str key is passed, it returns the corresponding slice from the slice dict. Is it slow? Yes, by 10x. Do I care? No. Why don't I care? Because the actual heavy lifting is done in tensor land.
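A minimal sketch of that subclassing approach. The class name and the exact slice-dict layout are assumptions; the real class likely handles more cases.

```python
import numpy as np

class SlicedArray(np.ndarray):
    """ndarray subclass that resolves string keys through a slice dict,
    e.g. arr["goal"] -> arr[..., slice(3, 5)]."""

    def __new__(cls, data, slices):
        obj = np.asarray(data).view(cls)
        obj.slices = slices
        return obj

    def __array_finalize__(self, obj):
        # Propagate the slice dict through views and slicing results.
        self.slices = getattr(obj, "slices", {})

    def __getitem__(self, key):
        if isinstance(key, str):
            # String key: look up the named slice along the last axis.
            return super().__getitem__((..., self.slices[key]))
        return super().__getitem__(key)
```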

KNN Regression Baseline


Certain trained models act as if they don't respond to the observation at all. In response, I decided to implement a "Mean Baseline": the base-rate prediction we would make if we knew nothing about the particular datapoint but still had access to the dataset.
One thing led to another, and a lazy-inference baseline seemed better: we look up the closest value(s) in the dataset and return the actions associated with them.
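That lazy-inference baseline can be sketched in a few lines of numpy; Euclidean distance and mean-aggregation over the neighbors are assumptions.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=1):
    """Lazy-inference baseline: return the mean action of the k training
    states closest (in Euclidean distance) to the query observation.

    With k = len(train_X) this degenerates into the mean baseline.
    """
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_y[nearest].mean(axis=0)
```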

Let's see the performance:

[W&B panels: run set, 18 runs; run set 2, 9 runs]

The success rate is pretty much what I expected, and once the loss is plotted this isn't surprising. One thing that suggested I had a bug was that k=1 didn't reach zero loss. I investigated:


The reason for the non-zero loss with k=1 is that in certain cases the observations were identical, yet the actions were wildly different. This made me go back to the MLP baseline, adding velocity to the observation and bringing in the SHAP values.

MLP Baseline with Different Observations



[W&B panel: run set, 21 runs]


Goal Edge to Closest Point



[W&B panel: run set, 23 runs]