
Pendulum-BC-GAIL-steps

Learning to imitate the OpenAI Gym Pendulum-v0 control task from expert demonstrations. The goal is to balance an inverted pendulum upright, achieving a score of at least -200. Imitation is measured by how closely the learned policy's score (mean, std) matches the expert's score (mean, std). Here, we compare Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL) on this task using 20, 10, and 5 demonstrations. GAIL learns to imitate the expert from just 5 demonstrations, which shows that it is sample-efficient in terms of expert data. In wall-clock time, however, each GAIL run took several times longer than the corresponding BC run.
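The success criterion above can be sketched as a check on evaluation returns, assuming a "score" is the mean undiscounted episode return. The helper names and the tolerances `mean_tol` and `std_tol` are hypothetical; the report does not say how closely the learner's (mean, std) must match the expert's.

```python
import statistics

# Threshold stated in the report: Pendulum-v0 is solved at a score of at least -200.
SOLVED_THRESHOLD = -200.0


def return_stats(episode_returns):
    """Mean and population standard deviation of per-episode returns."""
    return statistics.mean(episode_returns), statistics.pstdev(episode_returns)


def imitates(expert_returns, learner_returns, mean_tol=20.0, std_tol=20.0):
    """Hypothetical imitation check.

    The learner must reach the solved threshold, and its (mean, std) must lie
    within mean_tol / std_tol of the expert's (mean, std). The default
    tolerances are illustrative assumptions, not values from the report.
    """
    e_mean, e_std = return_stats(expert_returns)
    l_mean, l_std = return_stats(learner_returns)
    solved = l_mean >= SOLVED_THRESHOLD
    close = abs(l_mean - e_mean) <= mean_tol and abs(l_std - e_std) <= std_tol
    return solved and close
```

For example, `imitates([-150.0, -160.0], [-155.0, -170.0])` returns `True`: the learner's mean of -162.5 clears -200 and both statistics are within the assumed tolerances.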
Created on September 28 | Last edited on November 4

Section 1




[Chart: Pendulum-v0 runs BC-20, GAIL-20, BC-10, GAIL-10, BC-5, GAIL-5; y-axis from −300 to 0]
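Each BC-N run in the chart clones the expert from N demonstration trajectories. Behavioral cloning treats imitation as supervised learning on the expert's (state, action) pairs. The sketch below is a minimal illustration, assuming a linear policy trained by stochastic gradient descent on squared error; the model and optimizer used by train_bc.py are not shown in this report, and the parameters `num_trajs` and `iters` are hypothetical stand-ins for the -num-trajs and -iters flags. Predicted actions are clipped to Pendulum-v0's torque range of [-2, 2].

```python
import random


def behavioral_cloning(trajectories, num_trajs, iters=1000, lr=0.05, seed=0):
    """Fit a linear policy a = w . s + b to expert (state, action) pairs.

    trajectories: list of trajectories, each a list of (state, action) tuples,
    where state is a sequence of floats and action is a single float.
    """
    # Keep only the first num_trajs demonstrations (hypothetical selection rule).
    pairs = [pair for traj in trajectories[:num_trajs] for pair in traj]
    rng = random.Random(seed)
    w = [0.0] * len(pairs[0][0])
    b = 0.0
    for _ in range(iters):
        s, a = rng.choice(pairs)  # sample one expert pair
        err = sum(wi * si for wi, si in zip(w, s)) + b - a
        # SGD step on the squared-error loss 0.5 * err**2
        w = [wi - lr * err * si for wi, si in zip(w, s)]
        b -= lr * err

    def policy(state):
        torque = sum(wi * si for wi, si in zip(w, state)) + b
        return max(-2.0, min(2.0, torque))  # clip to Pendulum-v0's torque range

    return policy
```

For Pendulum-v0, each state would be the 3-dimensional observation (cos θ, sin θ, angular velocity) and each action a single torque value. A real BC setup would typically use a neural-network policy and minibatches, but the structure (fit actions to states, then act with the fitted model) is the same.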

[Panel: Parameter importance with respect to mean true reward (IL). Columns: Config parameter, Importance, Correlation]
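The panel's Correlation column relates each config parameter to mean true reward (IL) across the runs. As a rough illustration, assuming it is a Pearson (linear) correlation, it can be computed as below; `pearson_correlation` is a hypothetical helper, not a W&B API, and the Importance column is computed separately by W&B.

```python
def pearson_correlation(xs, ys):
    """Pearson correlation between a config value and a metric across runs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        # A parameter that never changes carries no linear signal.
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

In this run set, nearly every logged config value is constant within each script's runs, so a correlation is undefined for those parameters; only the number of trajectories (20, 10, 5) varies.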
Run set (6 runs)

Run      Program      Runtime     Arguments
BC-20    train_bc.py  7m 28s      --env-id Pendulum-v0 --algo sac --exp-id 1 --train-BC -wandb -iters 1e5 -num-trajs 20
GAIL-20  train.py     51m 37s     --env Pendulum-v0 --algo sac --exp-id 1 -il -eval -best -tb -wandb --num-trajs 20
BC-10    train_bc.py  5m 36s      --env-id Pendulum-v0 --algo sac --exp-id 1 --train-BC -wandb -iters 1e5 -num-trajs 10
GAIL-10  train.py     45m 22s     --env Pendulum-v0 --algo sac --exp-id 1 -il -eval -best -tb -wandb --num-trajs 10
BC-5     train_bc.py  9m 13s      --env-id Pendulum-v0 --algo sac --exp-id 1 --train-BC -wandb -iters 1e5 -num-trajs 5
GAIL-5   train.py     1h 5m 57s   --env Pendulum-v0 --algo sac --exp-id 1 -il -eval -best -tb -wandb --num-trajs 5

config
BC-20        GAIL-20      BC-10        GAIL-10      BC-5         GAIL-5
1e5          -            1e5          -            1e5          -
-            0.0000235    -            0.0000235    -            0.0000235
-            false        -            false        -            false
models       -            models       -            models       -
-            0.01118      -            0.01118      -            0.01118
-            Pendulum-v0  -            Pendulum-v0  -            Pendulum-v0
Pendulum-v0  -            Pendulum-v0  -            Pendulum-v0  -
-            true         -            true         -            true
-            0.99         -            0.99         -            0.99
-            0.9          -            0.9          -            0.9
false        -            false        -            false        -
logs         -            logs         -            logs         -
-            0.000193     -            0.000193     -            0.000193
-            20           -            10           -            5
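The GAIL runs (train.py with the -il flag) instead train a discriminator to tell expert (state, action) pairs from the policy's own pairs, and reward the policy for fooling it. Below is a minimal sketch of the discriminator half only, assuming a logistic-regression discriminator and the common reward convention r = -log(1 - D(s, a)), where D is the probability that a pair came from the expert. The policy update, which GAIL delegates to a reinforcement learning algorithm, is omitted; the runs log --algo sac, but this report does not say whether SAC produced the expert or drives the GAIL policy. The class and method names are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class Discriminator:
    """Logistic-regression D(x) = P(x came from the expert), where x is a
    concatenated (state, action) feature vector."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_expert(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, expert_batch, policy_batch):
        """One gradient-ascent step on the binary log-likelihood,
        with expert pairs labeled 1 and policy pairs labeled 0."""
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        labeled = [(x, 1.0) for x in expert_batch] + [(x, 0.0) for x in policy_batch]
        for x, label in labeled:
            err = label - self.prob_expert(x)  # derivative of log-likelihood w.r.t. the logit
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        n = len(labeled)
        self.w = [wi + self.lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b += self.lr * grad_b / n

    def reward(self, x):
        """Surrogate reward -log(1 - D(x)): larger when x looks expert-like.
        The 1e-8 floor caps the reward when D(x) is numerically 1."""
        return -math.log(max(1.0 - self.prob_expert(x), 1e-8))
```

In a full GAIL loop, each iteration would collect fresh policy rollouts, call `update` with expert and policy pairs, and then improve the policy on `reward`. That need for repeated environment interaction is consistent with the GAIL runs taking much longer than the BC runs in the table above.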