Hopper-BC-GAIL-steps
Learning to imitate the MuJoCo Hopper-v2 control task from demonstrations. The goal is to make a 2D one-legged robot hop forward as fast as possible, achieving a score of at least 3000. Here, imitation roughly means matching the learned model's score (mean, std) to the expert's score (mean, std). We compare the performance of Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL) on this task. GAIL learns to imitate from just 5 demonstrations, which shows its sample efficiency.
Created on September 28 | Last edited on November 4
Section 1
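The report description treats imitation as matching the learner's score (mean, std) to the expert's. A minimal sketch of that comparison follows, assuming episode returns are available as plain lists. The helper names `summarize_returns` and `imitation_gap` are hypothetical, and the numbers are placeholders for illustration only, not results from these runs.

```python
import statistics


def summarize_returns(returns):
    """Return (mean, population std) of a list of episode returns."""
    return statistics.mean(returns), statistics.pstdev(returns)


def imitation_gap(learner_returns, expert_returns):
    """Absolute gap in mean and in std between learner and expert returns."""
    l_mean, l_std = summarize_returns(learner_returns)
    e_mean, e_std = summarize_returns(expert_returns)
    return abs(l_mean - e_mean), abs(l_std - e_std)


# Placeholder returns for illustration only; NOT taken from the runs in this report.
expert = [3000.0, 3100.0, 3200.0]
learner = [2900.0, 3100.0, 3300.0]

mean_gap, std_gap = imitation_gap(learner, expert)
print(f"mean gap = {mean_gap:.1f}, std gap = {std_gap:.1f}")
```

A small gap in both numbers suggests the learner reproduces the expert's score distribution; a matching mean with a much larger std, as in the placeholder data, would indicate a less consistent policy.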
meta

| Script | Arguments | Runtime |
| --- | --- | --- |
| train.py | `["--env","Hopper-v2","--algo","trpo","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","20"]` | 4h 38m |
| train_bc.py | `["--env","Hopper-v2","--algo","trpo","--exp-id","1","--train-BC","-wandb","-iters","5e5","--num-trajs","20","--batch-size","256"]` | 36m 52s |
| train.py | `["--env","Hopper-v2","--algo","trpo","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","10"]` | 2h 47m 32s |
| train_bc.py | `["--env","Hopper-v2","--algo","trpo","--exp-id","1","--train-BC","-wandb","-iters","5e5","--num-trajs","10","--batch-size","256"]` | 45m 58s |
| train.py | `["--env","Hopper-v2","--algo","trpo","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","5"]` | 5h 44m 34s |
| train_bc.py | `["--env","Hopper-v2","--algo","trpo","--exp-id","1","--train-BC","-wandb","-iters","5e5","--num-trajs","5","--batch-size","256"]` | 1h 3m 41s |
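config

| train.py (--num-trajs 20) | train_bc.py (--num-trajs 20) | train.py (--num-trajs 10) | train_bc.py (--num-trajs 10) | train.py (--num-trajs 5) | train_bc.py (--num-trajs 5) |
| --- | --- | --- | --- | --- | --- |
| - | 256 | - | 256 | - | 256 |
| - | 5e5 | - | 5e5 | - | 5e5 |
| 0.1 | - | 0.1 | - | 0.1 | - |
| 15 | - | 15 | - | 15 | - |
| false | - | false | - | false | - |
| - | models | - | models | - | models |
| 0.01 | - | 0.01 | - | 0.01 | - |
| Hopper-v2 | - | Hopper-v2 | - | Hopper-v2 | - |
| - | Hopper-v2 | - | Hopper-v2 | - | Hopper-v2 |
| true | - | true | - | true | - |
| 0.99 | - | 0.99 | - | 0.99 | - |
| false | - | false | - | - | - |
| 0.95 | - | 0.95 | - | 0.95 | - |
| - | false | - | false | - | false |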
Run set: 6 runs
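The six argument lists in the run set differ only in the script and in `--num-trajs`, so they can be generated rather than typed by hand. The sketch below rebuilds them using only the flags that appear in the run set. The helper names `train_args` and `train_bc_args` are hypothetical, and the printed `python <script> ...` invocation is an assumption about how the scripts are launched.

```python
def train_args(num_trajs):
    """Argument list for train.py, copied from the flags recorded in the run set."""
    return ["--env", "Hopper-v2", "--algo", "trpo", "--exp-id", "1",
            "-il", "-eval", "-best", "-tb", "-wandb",
            "--num-trajs", str(num_trajs)]


def train_bc_args(num_trajs):
    """Argument list for train_bc.py, copied from the flags recorded in the run set."""
    return ["--env", "Hopper-v2", "--algo", "trpo", "--exp-id", "1",
            "--train-BC", "-wandb", "-iters", "5e5",
            "--num-trajs", str(num_trajs), "--batch-size", "256"]


# Same order as the run set: 20, 10, then 5 demonstrations, train.py before train_bc.py.
runs = []
for n in (20, 10, 5):
    runs.append(("train.py", train_args(n)))
    runs.append(("train_bc.py", train_bc_args(n)))

# Print the command lines only; launching them (assumed via `python`) is left to the reader.
for script, args in runs:
    print("python", script, " ".join(args))
```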