
Behavior Cloning

Top level hypothesis: Giving scene graph information when trying to distill scene mesh information will help.
Created on November 29 | Last edited on December 20

Debug baseline

Fractal Environment

Jump Environment


The jump environment that we are familiar with. There are two versions, Fixed and Random. In Fixed, the agent and goal positions are, well, fixed at the start of every episode; in Random, they are randomized.
The experiment procedure is as follows.
  1. Train a reinforcement learning agent. (SAC or PPO)
  2. Roll out the trained policy.
  3. Create 25_000 (state, action) pairs.
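The collection step above can be sketched as follows. This is a minimal sketch, assuming a Gym-style environment API; `env`, `policy`, and the observation/action shapes are stand-ins for the actual Jump environment and the trained SAC/PPO agent.

```python
import numpy as np

def collect_pairs(env, policy, n_pairs=25_000):
    """Roll out a trained policy and record (state, action) pairs.

    `env` is assumed to follow the classic Gym API (reset/step);
    `policy` maps an observation to an action.
    """
    states, actions = [], []
    obs = env.reset()
    while len(states) < n_pairs:
        action = policy(obs)
        states.append(np.asarray(obs))
        actions.append(np.asarray(action))
        obs, reward, done, info = env.step(action)
        if done:
            obs = env.reset()  # start a fresh episode and keep collecting
    return np.stack(states), np.stack(actions)
```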
In Jump, the trained agent observes all 505 observation values during RL training. During behavior cloning, different subsets of these values are used.

MLP Baseline

We use the same 3-layer, width-16 model as before:
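For concreteness, a numpy sketch of that baseline shape. The input/output dimensions, ReLU activations, and He-style initialization are assumptions; only the "3 hidden layers of width 16" part comes from the text.

```python
import numpy as np

def init_mlp(in_dim, hidden=16, out_dim=2, n_hidden=3, seed=0):
    """Three hidden layers of width 16 -- the baseline architecture.

    in_dim/out_dim are placeholders for the Jump observation/action sizes.
    """
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * n_hidden + [out_dim]
    return [(rng.standard_normal((a, b)) * np.sqrt(2 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(params, x):
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)  # ReLU hidden layers (assumed)
    W, b = params[-1]
    return x @ W + b                    # linear output head
```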

Of course, in this task simply looking at and moving towards the goal is not sufficient. Instead, the agent needs to memorize which action to take at each position in order to move around the obstacles.
We also compute the error values of a random agent to ground the abs_error / mean_absolute_percentage_error. I created the random agent by setting the learning rate to 0.

[W&B panel: run set, 5 runs]

The agent is doing better than random but not by that much. The capacity seems to be the bottleneck given the small size of the network. Let's increase the size.

[W&B panel: run set, 7 runs]

Not surprisingly, bigger is better in this case. With 4 layers of width 1024 we may be approaching overfitting; however, we can still see the parking behavior. The point isn't to solve the environment by brute force, so I will stop at a hidden size of 512. Hopefully we will see improvement with the GNN.
Hypothesis: Dropout will help with parking behavior.

[W&B panel: run set, 8 runs]


Sweep over Random Parameters.

I am leaving the house for a few hours, so let's run a sweep.
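A sweep like this is usually driven by a W&B sweep configuration. The sketch below is hypothetical: the parameter names, value ranges, and metric name are assumptions, not the ones actually used in the 174-run sweep.

```python
# Hypothetical W&B sweep config for a random search over the MLP
# hyperparameters discussed above. Every name and range here is an
# illustrative assumption.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "hidden_size":   {"values": [16, 64, 256, 512]},
        "n_layers":      {"values": [2, 3, 4]},
        "dropout":       {"distribution": "uniform", "min": 0.0, "max": 0.5},
        "learning_rate": {"distribution": "log_uniform_values",
                          "min": 1e-5, "max": 1e-2},
    },
}
```

Such a config would then be registered with `wandb.sweep(...)` and executed with `wandb.agent(...)`.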

[W&B panel: run set, 174 runs]




[New] Graph Neural Network -- Debug Graph

Graph Neural Network -- Jump Graph -- Flat Tree



There is no strong reason to expect this graph structure to work; it is tested only because the implementation was already present.

[W&B panel: run set, 8 runs]


Adding the more complicated graph structure made things worse.

Data Flow / Engineering

In order to handle graphs of different types I needed to refactor/update a lot of the code. We will have to further update the code to handle dynamic graphs. I decided to take this moment to map out the dataflow to better re-evaluate how to structure things. Excalidraw to the rescue.

There is a lot of code duplication, and the current structure doesn't allow dynamic graphs. In general we need the following transformations:
  • Unity Scene in Unity Editor --> Scene Graph Description + Base Features (JSON)
  • Scene Graph Description + Base Features (JSON) --> DGL PrototypeGraph
  • Observation + DGL PrototypeGraph --> Feature Updated DGL Graph
  • Observation + DGL PrototypeGraph --> Connection Updated DGL Graph
  • Batching in certain cases.
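The middle transformations can be sketched without DGL: the topology is parsed once from the JSON description into a prototype, and each new observation only overwrites node-feature slices. The JSON schema, field names, and update rule below are illustrative assumptions, not the real exporter format.

```python
import json
import numpy as np

# Hypothetical scene-graph description, standing in for the JSON that
# the Unity editor step would export.
scene_json = json.dumps({
    "nodes": ["agent", "goal", "obstacle_0"],
    "edges": [["agent", "goal"], ["agent", "obstacle_0"]],
    "base_features": {"agent": [0, 0], "goal": [5, 0], "obstacle_0": [2, 1]},
})

def build_prototype(desc):
    """Scene Graph Description (JSON) -> static topology + base features."""
    d = json.loads(desc)
    idx = {name: i for i, name in enumerate(d["nodes"])}
    src = [idx[a] for a, b in d["edges"]]
    dst = [idx[b] for a, b in d["edges"]]
    feats = np.array([d["base_features"][n] for n in d["nodes"]], float)
    return idx, (src, dst), feats

def update_features(proto_feats, idx, observation):
    """Observation + prototype -> feature-updated copy; topology untouched."""
    feats = proto_feats.copy()
    feats[idx["agent"]] = observation["agent_pos"]  # illustrative update
    return feats
```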
The first problem to solve is indexed access into the observation array.
The first approach researched was a container class wrapping numpy arrays.

However, there was a lot of boilerplate code to implement, and it wasn't trivial to ensure numpy array functions worked as intended.
The second option is to use Structured Arrays as implemented in NumPy.

However, this wasn't flexible: you can only define one view into the data, and the slicing was unintuitive and slow. Structured arrays are really meant for working with C-style structs in Python.
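To illustrate the limitation: the single view is baked into the dtype up front, so a second, different grouping of the same memory isn't available. The field names below are invented for the example.

```python
import numpy as np

# One structured record per observation; the dtype IS the one and only
# "view" you get into the data.
obs_dtype = np.dtype([("agent_pos", "f4", (2,)),
                      ("goal_pos", "f4", (2,))])
batch = np.zeros(4, dtype=obs_dtype)
batch["agent_pos"] = 1.0  # field access works fine...

# ...but re-slicing the same memory under a different grouping, or
# treating the record as one flat float vector, is not straightforward.
```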
The third thing researched was Xarray.

Frankly, I couldn't figure out the coordinate data structure, and it didn't seem to solve my problem.
The last thing I tried, and the one that succeeded, was using subclasses.

I overrode the __getitem__ function: whenever a str key is passed, it returns the corresponding slice from the slice dict. Is it slow? Yes, by 10x. Do I care? No. Why don't I care? Because the actual heavy lifting is done in tensor land.
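A minimal sketch of that subclassing approach. The class name and the exact slice-dict layout are assumptions; the real class likely handles more cases.

```python
import numpy as np

class SlicedArray(np.ndarray):
    """ndarray subclass that resolves string keys through a slice dict,
    e.g. arr["goal"] -> arr[..., slice(3, 5)]."""

    def __new__(cls, data, slices):
        obj = np.asarray(data).view(cls)
        obj.slices = slices
        return obj

    def __array_finalize__(self, obj):
        # Propagate the slice dict through views and slicing results.
        self.slices = getattr(obj, "slices", {})

    def __getitem__(self, key):
        if isinstance(key, str):
            # String key: look up the named slice along the last axis.
            return super().__getitem__((..., self.slices[key]))
        return super().__getitem__(key)
```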

KNN Regression Baseline


Certain trained models act as if they don't respond to the observation at all. In response, I decided to implement a "Mean Baseline": the base-rate prediction we would make if we knew nothing about the particular datapoint but still had access to the dataset.
One thing led to another, and a lazy-inference baseline seemed better: we look up the closest value(s) in the dataset and return the actions associated with them.
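That lazy-inference baseline can be sketched in a few lines of numpy; Euclidean distance and mean-aggregation over the neighbors are assumptions.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=1):
    """Lazy-inference baseline: return the mean action of the k training
    states closest (in Euclidean distance) to the query observation.

    With k = len(train_X) this degenerates into the mean baseline.
    """
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_y[nearest].mean(axis=0)
```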

Let's see the performance:

[W&B panels: run set, 18 runs; run set 2, 9 runs]

The success rate is pretty much what I expected, and once the loss is plotted this isn't surprising. One thing that suggested I had a bug was that k=1 didn't reach zero loss. I investigated:


The reason for the non-zero loss with k=1 is that in certain cases the observations were identical, yet the actions were wildly different. This made me go back to the MLP baseline, adding velocity to the observation and bringing in the SHAP values.

MLP Baseline with Different Observations



[W&B panel: run set, 21 runs]


Goal Edge to Closest Point



[W&B panel: run set, 23 runs]