GAIL-Gym-steps
Learning to imitate several OpenAI Gym and MuJoCo control tasks from demonstrations: Pendulum-v0 (-200), CartPole-v1 (500), LunarLanderContinuous-v2 (200), Hopper-v2 (3500), and HalfCheetah-v2 (4800). Imitation is defined as matching the learned model's score (mean, std) to the expert's score (mean, std). We look at the imitation accuracy of Generative Adversarial Imitation Learning (GAIL), which learns to imitate from just 5 demonstrations per task (Lunar Lander takes 10 demos), demonstrating sample efficiency.
Created on September 28 | Last edited on November 4
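The imitation criterion described above (matching the mean and standard deviation of the learner's score to the expert's) could be checked roughly as follows. This is a minimal sketch, not the report's actual evaluation code: the function names, the relative tolerance `rel_tol`, and the sample returns are all assumptions made for illustration.

```python
import numpy as np


def imitation_gap(learner_returns, expert_returns):
    """Absolute gaps between learner and expert in mean and std of episode returns."""
    learner = np.asarray(learner_returns, dtype=float)
    expert = np.asarray(expert_returns, dtype=float)
    mean_gap = abs(learner.mean() - expert.mean())
    std_gap = abs(learner.std() - expert.std())
    return float(mean_gap), float(std_gap)


def matches_expert(learner_returns, expert_returns, rel_tol=0.1):
    """True if both gaps fall within rel_tol of the expert's mean return.

    rel_tol is a hypothetical threshold; the report does not state one.
    """
    mean_gap, std_gap = imitation_gap(learner_returns, expert_returns)
    # Scale the tolerance by the magnitude of the expert's mean return.
    scale = max(abs(float(np.mean(expert_returns))), 1e-8)
    return bool(mean_gap <= rel_tol * scale and std_gap <= rel_tol * scale)


# Made-up episode returns for a CartPole-v1-style task (expert score 500).
expert_returns = [500, 500, 500, 500, 500]
learner_returns = [500, 480, 500, 490, 500]
gaps = imitation_gap(learner_returns, expert_returns)
ok = matches_expert(learner_returns, expert_returns)
```

Scaling the tolerance by the expert's mean return keeps one threshold usable across tasks whose scores differ by orders of magnitude, such as Pendulum-v0 and HalfCheetah-v2.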
Section 1
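| Section | Key | Run 1 | Run 2 | Run 3 | Run 4 |
|---|---|---|---|---|---|
| meta | runtime | 1d 9m 7s | 14h 5m 33s | 3h 49m 26s | 3h 26m 1s |
| config | | sac | trpo | dqn | sac |
| config | | 0.1 | 0.1 | 0.001 | 0.0000235 |
| config | | 10 | 15 | 10 | - |
| config | | 0 | 0.01 | 0 | 0.01118 |
| config | | LunarLanderContinuous-v2 | Hopper-v2 | CartPole-v1 | Pendulum-v0 |
| config | | 0.995 | 0.99 | 0.99 | 0.99 |
| config | | false | false | - | - |
| config | | - | {} | {} | {} |
| config | | 0.98 | 0.95 | 1 | 0.9 |
| config | | 0.01 | 0.005 | 0.001 | 0.000193 |
| config | | 11.66667 | 11.66667 | 5 | 11.66667 |
| config | | 1024 | 2048 | 512 | 1024 |
| config | | 5 | 5 | 3 | 10 |
| config | | 0.001 | 0.001 | 0.0001 | 0.00428 |
| summary | advantage | -1.45419 | -4.5815 | -1.15931 | -2.98559 |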
Run set: 10