
GAIL-Gym-steps

Learning to imitate OpenAI Gym and MuJoCo control tasks from demonstrations of each task: Pendulum-v0 (-200), CartPole-v1 (500), LunarLanderContinuous-v2 (200), Hopper-v2 (3500), and HalfCheetah-v2 (4800). Imitation is defined as matching the learned model's score (mean, std) to the expert's score (mean, std). We look at the imitation accuracy of Generative Adversarial Imitation Learning (GAIL), which learns to imitate each task from just 5 demonstrations (Lunar Lander takes 10), demonstrating its sample efficiency.
Created on September 28 | Last edited on November 4
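
The imitation criterion stated in the summary (matching the learner's mean and std score to the expert's) could be checked along the following lines. This is only a minimal sketch and not the report's evaluation code: the function names, the `rel_tol` threshold, and the sample returns are all hypothetical and chosen for illustration.

```python
from statistics import mean, pstdev


def score_summary(returns):
    """Return (mean, std) of a list of per-episode returns."""
    return mean(returns), pstdev(returns)


def imitation_gap(learner_returns, expert_returns):
    """Absolute gap between learner and expert for both mean and std."""
    l_mean, l_std = score_summary(learner_returns)
    e_mean, e_std = score_summary(expert_returns)
    return abs(l_mean - e_mean), abs(l_std - e_std)


def matches_expert(learner_returns, expert_returns, rel_tol=0.05):
    """Hypothetical criterion: both gaps must fall within rel_tol of the
    expert's absolute mean score (assumed threshold, not from the report)."""
    e_mean, _ = score_summary(expert_returns)
    mean_gap, std_gap = imitation_gap(learner_returns, expert_returns)
    budget = rel_tol * max(abs(e_mean), 1.0)
    return mean_gap <= budget and std_gap <= budget


# Made-up episode returns, for illustration only (not results from the runs).
expert = [500.0, 500.0, 498.0, 500.0, 499.0]
learner = [497.0, 500.0, 500.0, 495.0, 500.0]

print(imitation_gap(learner, expert))
print(matches_expert(learner, expert))
```

A relative tolerance is used here so that one threshold can apply across tasks whose scores differ by orders of magnitude; a per-task absolute tolerance would be an equally reasonable choice.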

Section 1

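Run table: one column per run. Row labels other than "advantage" were not preserved in the export; labels in parentheses are inferred from the values, and the column header repeats the environment row.

                  LunarLanderContinuous-v2  Hopper-v2    CartPole-v1  Pendulum-v0
meta
(run duration)    1d 9m 7s                  14h 5m 33s   3h 49m 26s   3h 26m 1s
config
(algorithm)       sac                       trpo         dqn          sac
(unlabeled)       0.1                       0.1          0.001        0.0000235
(unlabeled)       10                        15           10           -
(unlabeled)       0                         0.01         0            0.01118
(environment)     LunarLanderContinuous-v2  Hopper-v2    CartPole-v1  Pendulum-v0
(unlabeled)       0.995                     0.99         0.99         0.99
(unlabeled)       false                     false        -            -
(unlabeled)       -                         {}           {}           {}
(unlabeled)       0.98                      0.95         1            0.9
(unlabeled)       0.01                      0.005        0.001        0.000193
(unlabeled)       11.66667                  11.66667     5            11.66667
(unlabeled)       1024                      2048         512          1024
(unlabeled)       5                         5            3            10
(unlabeled)       0.001                     0.001        0.0001       0.00428
summary
advantage         -1.45419                  -4.5815      -1.15931     -2.98559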

Parameter importance with respect to mean true reward (IL)

[Panel with columns Config parameter, Importance, Correlation; values were not loaded in this export.]
[Line plot: x-axis Step; runs grouped by environment: LunarLanderContinuous-v2, Hopper-v2, CartPole-v1, Pendulum-v0.]
Run set: 10