
Hopper-BC-GAIL-steps

This report covers learning to imitate the MuJoCo Hopper-v2 control task from expert demonstrations. The goal is to make a 2D one-legged robot hop forward as fast as possible, achieving a score of at least 3000. Here, imitation roughly means matching the learned model's score (mean, std) to the expert's score (mean, std). We compare Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL) on this task, training each with 20, 10, and 5 demonstrations. GAIL learns to imitate with just 5 demonstrations, which indicates that it is sample-efficient.
Created on September 28 | Last edited on November 4
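
The imitation criterion in the description (matching the learner's score mean and std to the expert's) can be sketched as follows. This is a minimal illustration, not the report's evaluation code: the function names, the `rel_tol` threshold, and the example returns are all hypothetical placeholders rather than measured results.

```python
from statistics import mean, pstdev


def score_summary(episode_returns):
    """Return (mean, std) of a list of per-episode returns."""
    return mean(episode_returns), pstdev(episode_returns)


def matches_expert(learner_returns, expert_returns, rel_tol=0.1):
    """Hypothetical check: learner mean and std are both within
    rel_tol * |expert mean| of the expert's mean and std."""
    l_mean, l_std = score_summary(learner_returns)
    e_mean, e_std = score_summary(expert_returns)
    budget = rel_tol * abs(e_mean)
    return abs(l_mean - e_mean) <= budget and abs(l_std - e_std) <= budget


# Made-up returns for illustration only (not results from these runs).
expert = [3100.0, 3050.0, 3150.0]
close_learner = [3000.0, 3100.0, 3200.0]
weak_learner = [1000.0, 1200.0, 800.0]

print(matches_expert(close_learner, expert))
print(matches_expert(weak_learner, expert))
```

A relative tolerance is only one possible choice; a statistical test over many evaluation episodes would be a stricter alternative.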

Section 1





[Plot. Legend: Hopper-v2: GAIL-20, Hopper-v2: BC-20, Hopper-v2: GAIL-10, Hopper-v2: BC-10, Hopper-v2: GAIL-5, Hopper-v2: BC-5. Axis ticks: 0, 1,000, 2,000, 3,000.]

[Panel: Parameter importance with respect to mean true reward (IL). Columns: Config parameter, Importance, Correlation. Values not loaded.]
meta

| Run | Program | Runtime | Args |
| --- | --- | --- | --- |
| Hopper-v2: GAIL-20 | train.py | 4h 38m | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","20"] |
| Hopper-v2: BC-20 | train_bc.py | 36m 52s | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","--train-BC","-wandb","-iters","5e5","--num-trajs","20","--batch-size","256"] |
| Hopper-v2: GAIL-10 | train.py | 2h 47m 32s | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","10"] |
| Hopper-v2: BC-10 | train_bc.py | 45m 58s | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","--train-BC","-wandb","-iters","5e5","--num-trajs","10","--batch-size","256"] |
| Hopper-v2: GAIL-5 | train.py | 5h 44m 34s | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","5"] |
| Hopper-v2: BC-5 | train_bc.py | 1h 3m 41s | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","--train-BC","-wandb","-iters","5e5","--num-trajs","5","--batch-size","256"] |
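
The meta entries pair each run's program with its argument list, so a command line can be rebuilt from them. The sketch below does this for the GAIL run with 20 trajectories. The `build_command` helper and the `python` interpreter name are assumptions for illustration; the report does not show how the scripts were actually launched.

```python
import shlex


def build_command(program, args, interpreter="python"):
    """Join an interpreter, a program name, and an argument list into one
    shell-safe command string (interpreter name is an assumption)."""
    return shlex.join([interpreter, program, *args])


# Program and args copied from the meta entry for the GAIL run, 20 trajectories.
gail_20_args = ["--env", "Hopper-v2", "--algo", "trpo", "--exp-id", "1",
                "-il", "-eval", "-best", "-tb", "-wandb", "--num-trajs", "20"]
gail_20_cmd = build_command("train.py", gail_20_args)
print(gail_20_cmd)
```

`shlex.join` (Python 3.8+) quotes any argument that would otherwise be split or interpreted by the shell, which keeps the rebuilt command safe to paste into a terminal.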
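config

Parameter names were not captured. Rows are in their original order, "-" means the value is not set for that run, and the last column lists a CLI flag only where the value matches one in the meta args.

| Row | GAIL-20 | BC-20 | GAIL-10 | BC-10 | GAIL-5 | BC-5 | Matching CLI flag |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | - | 256 | - | 256 | - | 256 | --batch-size |
| 2 | - | 5e5 | - | 5e5 | - | 5e5 | -iters |
| 3 | 0.1 | - | 0.1 | - | 0.1 | - | |
| 4 | 15 | - | 15 | - | 15 | - | |
| 5 | false | - | false | - | false | - | |
| 6 | - | models | - | models | - | models | |
| 7 | 0.01 | - | 0.01 | - | 0.01 | - | |
| 8 | Hopper-v2 | - | Hopper-v2 | - | Hopper-v2 | - | --env |
| 9 | - | Hopper-v2 | - | Hopper-v2 | - | Hopper-v2 | --env |
| 10 | true | - | true | - | true | - | |
| 11 | 0.99 | - | 0.99 | - | 0.99 | - | |
| 12 | false | - | false | - | - | - | |
| 13 | 0.95 | - | 0.95 | - | 0.95 | - | |
| 14 | - | false | - | false | - | false | |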
Run set: 6 runs