Pendulum-BC-GAIL-steps
This report covers learning to imitate the OpenAI Gym Pendulum-v0 control task from expert demonstrations. The goal is to balance an inverted pendulum upright, achieving a score of at least -200. Imitation counts as successful when the learned model's score (mean, std) matches the expert's score (mean, std). We compare Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL) on this task with 20, 10, and 5 demonstration trajectories (--num-trajs). GAIL learns to imitate with just 5 demonstrations, which shows its sample efficiency.
Created on September 28|Last edited on November 4
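The imitation criterion in the description (matching the learner's score mean and std against the expert's, with a target score of at least -200) can be sketched as below. This is only an illustration: the function names, the tolerances, and the episode returns are hypothetical and are not taken from the runs in this report.

```python
import statistics

# Target score stated in the report description.
SOLVED_THRESHOLD = -200.0


def score_summary(returns):
    """Return (mean, population std) of a list of per-episode returns."""
    return statistics.fmean(returns), statistics.pstdev(returns)


def matches_expert(learner_returns, expert_returns, mean_tol=20.0, std_tol=20.0):
    """Check whether learner and expert scores agree within tolerances.

    mean_tol and std_tol are assumed values chosen for illustration;
    the report does not state how close the scores must be.
    """
    l_mean, l_std = score_summary(learner_returns)
    e_mean, e_std = score_summary(expert_returns)
    return abs(l_mean - e_mean) <= mean_tol and abs(l_std - e_std) <= std_tol


def is_solved(returns):
    """True when the mean return reaches the -200 target."""
    mean, _ = score_summary(returns)
    return mean >= SOLVED_THRESHOLD


# Made-up episode returns, NOT results from this report.
expert = [-150.0, -160.0, -140.0, -155.0]
learner = [-152.0, -158.0, -145.0, -150.0]
print(matches_expert(learner, expert), is_solved(learner))
```

Comparing both mean and std, rather than the mean alone, also penalizes a learner that reaches the expert's average score only by mixing very good and very bad episodes.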
Section 1
meta
Run | Program | Runtime | Args
1 | train_bc.py | 7m 28s | ["--env-id","Pendulum-v0","--algo","sac","--exp-id","1","--train-BC","-wandb","-iters","1e5","-num-trajs","20"]
2 | train.py | 51m 37s | ["--env","Pendulum-v0","--algo","sac","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","20"]
3 | train_bc.py | 5m 36s | ["--env-id","Pendulum-v0","--algo","sac","--exp-id","1","--train-BC","-wandb","-iters","1e5","-num-trajs","10"]
4 | train.py | 45m 22s | ["--env","Pendulum-v0","--algo","sac","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","10"]
5 | train_bc.py | 9m 13s | ["--env-id","Pendulum-v0","--algo","sac","--exp-id","1","--train-BC","-wandb","-iters","1e5","-num-trajs","5"]
6 | train.py | 1h 5m 57s | ["--env","Pendulum-v0","--algo","sac","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","5"]
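The logged program names and argument lists can be turned back into shell commands for re-running an experiment. The sketch below does this for the two 20-trajectory runs. The `python` interpreter prefix and the helper name `command_line` are assumptions; the report records only the script name and its arguments.

```python
import shlex

# Program and args copied from the meta section (20-trajectory runs).
runs = [
    ("train_bc.py", ["--env-id", "Pendulum-v0", "--algo", "sac", "--exp-id", "1",
                     "--train-BC", "-wandb", "-iters", "1e5", "-num-trajs", "20"]),
    ("train.py", ["--env", "Pendulum-v0", "--algo", "sac", "--exp-id", "1",
                  "-il", "-eval", "-best", "-tb", "-wandb", "--num-trajs", "20"]),
]


def command_line(program, args, interpreter="python"):
    """Join interpreter, script, and args into one shell-safe string.

    The interpreter name is an assumption; adjust it to the local setup.
    """
    return shlex.join([interpreter, program, *args])


for program, args in runs:
    print(command_line(program, args))
```

`shlex.join` (Python 3.8+) quotes any argument that needs it, so the same helper keeps working if an argument later contains spaces.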
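config (key names not captured; rows in source order; runs 1 to 6 in the same order as in meta; "-" = not set for that run)
Row | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6
1 | 1e5 | - | 1e5 | - | 1e5 | -
2 | - | 0.0000235 | - | 0.0000235 | - | 0.0000235
3 | - | false | - | false | - | false
4 | models | - | models | - | models | -
5 | - | 0.01118 | - | 0.01118 | - | 0.01118
6 | - | Pendulum-v0 | - | Pendulum-v0 | - | Pendulum-v0
7 | Pendulum-v0 | - | Pendulum-v0 | - | Pendulum-v0 | -
8 | - | true | - | true | - | true
9 | - | 0.99 | - | 0.99 | - | 0.99
10 | - | 0.9 | - | 0.9 | - | 0.9
11 | false | - | false | - | false | -
12 | logs | - | logs | - | logs | -
13 | - | 0.000193 | - | 0.000193 | - | 0.000193
14 | - | 20 | - | 10 | - | 5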
Run set: 6