MBRL-Lib and Duckietown Results
General Comments: The experiments are still very crude. For example, the average episode reward currently includes both training and test episodes; ideally we would compare performance on test episodes only, but for lack of time and compute we were unable to gather enough results under that criterion.
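For future runs, one way to make this comparison cleaner is to log evaluation episodes under a separate metric, so the reward panels can be filtered to test-time behavior only. A minimal sketch, assuming an active wandb run; `evaluate_policy` is a hypothetical helper that rolls out the current policy without exploration noise or training updates:

```python
import wandb

def log_episode(reward: float, step: int, is_eval: bool) -> None:
    # Log training and evaluation rewards under separate keys so report
    # panels can plot test-time performance on its own.
    key = "eval/episode_reward" if is_eval else "train/episode_reward"
    wandb.log({key: reward}, step=step)

# Illustrative usage inside the training loop (evaluate_policy is hypothetical):
# if episode_idx % eval_interval == 0:
#     eval_reward = evaluate_policy(env, agent, num_episodes=1)
#     log_episode(eval_reward, step=global_step, is_eval=True)
```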
Dreamer: Duckietown vs Cheetah Run
Comments: These graphs show that while learning is taking place in the Duckietown environment, we need tuning or observation pre-processing before the learning can really be effective. The curves for Cheetah Run are much more stable than the curves for Duckietown.
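One concrete piece of pre-processing that would likely help is bringing Duckietown's camera frames down to the small observations these world models were designed around (e.g., 64x64 pixels). A minimal sketch, assuming the standard gym-duckietown environment and OpenCV; the target resolution is an illustrative choice, not a tuned value:

```python
import gym
import numpy as np
import cv2


class ResizeObservation(gym.ObservationWrapper):
    """Downscale Duckietown camera frames to the 64x64 images PlaNet/Dreamer expect."""

    def __init__(self, env, size=64):
        super().__init__(env)
        self.size = size
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(size, size, 3), dtype=np.uint8
        )

    def observation(self, obs):
        # cv2.resize takes (width, height); Duckietown frames are HxWx3 uint8 arrays.
        return cv2.resize(obs, (self.size, self.size), interpolation=cv2.INTER_AREA)


# Illustrative usage; the exact environment id depends on the gym-duckietown version.
# env = ResizeObservation(gym.make("Duckietown-loop_empty-v0"), size=64)
```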
[Run set: 2 runs]
PlaNet: Duckietown vs Cheetah Run
Comments: As expected, it is "easier" for PlaNet to model and learn behavior for Cheetah Run than for Duckietown.
[Run set: 2 runs]
Duckietown: Dreamer vs PlaNet
Comments:
There are some important differences between the two algorithms here. For example, the observation loss for Dreamer is roughly 400x higher than PlaNet's.
It is important to note again that Dreamer contains essentially another implementation of PlaNet, which is different from the PlaNet implementation in MBRL-Lib. Initially, we wanted to use MBRL-Lib's PlaNet as Dreamer's world model and build Dreamer on top of it, keeping the hyperparameters and other settings fixed, in order to get a fairer comparison between the two.
Since our Dreamer implementation uses a different (non-MBRL-Lib) implementation of PlaNet, it is possible not only that the loss metrics are computed differently, but also that the hyperparameters being used are different.
This is not to say that the hyperparameters of MBRL-Lib's PlaNet and Dreamer's PlaNet should be identical. For example, Dreamer estimates the value of states at the end of the planning horizon with an actor-critic (see the formula below), so it would deliberately use a shorter planning horizon; the comparison would still be appropriate.
These are all good experiments for future work to target. Ideally, we would run both versions of PlaNet on Duckietown and Cheetah Run to compare them, or substitute Dreamer's PlaNet module with MBRL-Lib's PlaNet using the same parameters, and update this report with more results.
As expected, Dreamer seems to perform better on Duckietown than PlaNet.
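To make the planning-horizon point concrete: PlaNet's CEM planner scores an action sequence only by the rewards predicted within the horizon, whereas Dreamer bootstraps with a learned value at the end of the imagined trajectory. Ignoring the lambda-return weighting Dreamer actually uses, the estimate is roughly

$$
V(s_t) \;\approx\; \mathbb{E}\left[\sum_{\tau=t}^{t+H-1} \gamma^{\tau-t}\, r_\tau \;+\; \gamma^{H}\, v_\psi(s_{t+H})\right],
$$

where $H$ is the imagination horizon and $v_\psi$ is the learned critic, so information beyond the horizon still influences the policy and a shorter $H$ remains reasonable for Dreamer.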
Cheetah Run: Dreamer vs PlaNet
Comments: As expected, Dreamer performs better than PlaNet on Cheetah Run.
PlaNet: Hyper-parameter Search on Duckietown
Conclusions:
A higher model learning rate leads to better performance.
It is interesting that none of the other parameters had a large impact on reward. For example, we tried planning horizons from 50 to 500 and all of them achieved good rewards, which is something we did not expect.
Neither the CEM elite ratio nor the CEM number of iterations had a significant impact either.
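For reference, a sweep over these parameters can be expressed directly as a W&B sweep configuration. A minimal sketch; the parameter names and value ranges are illustrative stand-ins, not the exact MBRL-Lib config keys used in sweep ninz7ab2, and `train_planet` is a hypothetical training entry point:

```python
import wandb

# Illustrative sweep definition mirroring the parameters discussed above.
sweep_config = {
    "method": "random",
    "metric": {"name": "episode_reward", "goal": "maximize"},
    "parameters": {
        "model_lr": {"values": [1e-4, 3e-4, 1e-3]},     # world-model learning rate
        "planning_horizon": {"min": 50, "max": 500},    # range we explored
        "cem_elite_ratio": {"values": [0.05, 0.1, 0.2]},
        "cem_num_iters": {"values": [3, 5, 10]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="duckietown-planet")
# wandb.agent(sweep_id, function=train_planet)  # train_planet: hypothetical entry point
```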
[Sweep ninz7ab2: run set 1 (43 runs), run set 2 (0 runs)]