fantomas

This report considers various negative sampling techniques.

2021-03-25

Agents trained with Imitation Learning Reward

Note that the DSSMs that provide the IL reward were trained on datasets containing negatives for all states, so the following experiments serve mainly as a proof of concept. Moreover, only MountainCar and Acrobot are considered, because the agent learns to solve CartPole as long as the reward is positive.

fantomas

2021-03-12

Temperature and LR for LogSoftMax Reward

This report considers various values of the temperature and learning rate for the LogSoftMax reward.

fantomas

2021-03-24

Imitation Learning on CartPole

Agents trained on CartPole with different rewards.

fantomas

2021-03-15