Reports
Created by
Created On
Last edited
Agents trained with Imitation Learning Reward
Note that the DSSMs that provide the IL reward were trained on datasets containing negatives for all states. The following experiments therefore serve more as a proof of concept. Moreover, only MountainCar and Acrobot are considered, since the agent learns to solve CartPole as long as the reward is positive.
0
2021-03-12
Temperature and LR for LogSoftMax Reward
This report considers various values of temperature and learning rate for LogSoftMax Reward.
0
2021-03-24
Negatives from Experience Replay vs Negatives from Expert
The size of the Experience Replay is 10. A fixed subset of size 10 is chosen from the expert dataset.
0
2021-03-12