Hopper-BC-GAIL-steps

Learning to imitate the MuJoCo Hopper-v2 control task from expert demonstrations. The goal is to make a 2D one-legged robot hop forward as fast as possible, achieving a score of at least 3000. Here, imitation roughly means that the learned model's score (mean, std) matches the expert's score (mean, std). We compare Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL) on this task, using 5, 10, and 20 expert trajectories. GAIL learns to imitate with just 5 demonstrations, which suggests it is sample-efficient in the number of demonstrations it needs.

Prabhasa Kalkur

Created on September 28 | Last edited on November 4

[Figure: test score (mean) for each run (Hopper-v2: GAIL-20, BC-20, GAIL-10, BC-10, GAIL-5, BC-5); score axis from 0 to 3,000]
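The description defines imitation as matching the learned model's score (mean, std) with the expert's. A minimal sketch of such a check follows, assuming per-episode test returns are available as plain lists. The function names, the relative tolerance `rel_tol`, and the example numbers are hypothetical and are not taken from the report's code or results.

```python
from statistics import mean, pstdev


def summarize(returns):
    """Return (mean, std) of a list of per-episode returns."""
    return mean(returns), pstdev(returns)


def matches_expert(learner_returns, expert_returns, rel_tol=0.1):
    """Hypothetical criterion: the learner's mean and std must each lie
    within rel_tol * |expert mean| of the expert's mean and std."""
    l_mean, l_std = summarize(learner_returns)
    e_mean, e_std = summarize(expert_returns)
    slack = rel_tol * abs(e_mean)
    return abs(l_mean - e_mean) <= slack and abs(l_std - e_std) <= slack


if __name__ == "__main__":
    # Illustrative numbers only; these are not results from the runs in this report.
    expert = [3000.0, 3100.0, 2900.0]
    print(matches_expert([2950.0, 3050.0, 3000.0], expert))
    print(matches_expert([1000.0, 1200.0, 800.0], expert))
```

Scaling the tolerance by the expert's mean keeps the check meaningful across tasks with different score ranges; a stricter comparison (for example, a statistical test over many evaluation episodes) may be more appropriate in practice.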
[Panel: parameter importance with respect to mean true reward (IL). Config parameters shown (1-10 of 25): num_trajs, timesteps_per_batch, save_best_model, check_callback, eval_callback, timesteps_RL, timesteps_IL, vf_stepsize, tensorboard, cg_damping. Columns: Config parameter, Importance, Correlation.]
Run comparison (diff only)

meta

| Run | args | codePath | program | runtime |
| --- | --- | --- | --- | --- |
| Hopper-v2: GAIL-20 | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","20"] | "train.py" | train.py | 4h 38m |
| Hopper-v2: BC-20 | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","--train-BC","-wandb","-iters","5e5","--num-trajs","20","--batch-size","256"] | "train_bc.py" | train_bc.py | 36m 52s |
| Hopper-v2: GAIL-10 | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","10"] | "train.py" | train.py | 2h 47m 32s |
| Hopper-v2: BC-10 | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","--train-BC","-wandb","-iters","5e5","--num-trajs","10","--batch-size","256"] | "train_bc.py" | train_bc.py | 45m 58s |
| Hopper-v2: GAIL-5 | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","-il","-eval","-best","-tb","-wandb","--num-trajs","5"] | "train.py" | train.py | 5h 44m 34s |
| Hopper-v2: BC-5 | ["--env","Hopper-v2","--algo","trpo","--exp-id","1","--train-BC","-wandb","-iters","5e5","--num-trajs","5","--batch-size","256"] | "train_bc.py" | train_bc.py | 1h 3m 41s |
config

| Parameter | Hopper-v2: GAIL-20 | Hopper-v2: BC-20 | Hopper-v2: GAIL-10 | Hopper-v2: BC-10 | Hopper-v2: GAIL-5 | Hopper-v2: BC-5 |
| --- | --- | --- | --- | --- | --- | --- |
| batch_size | - | 256 | - | 256 | - | 256 |
| BC_max_iter | - | 5e5 | - | 5e5 | - | 5e5 |
| cg_damping | 0.1 | - | 0.1 | - | 0.1 | - |
| cg_iters | 15 | - | 15 | - | 15 | - |
| check_callback | false | - | false | - | false | - |
| checkpoint_dir | - | models | - | models | - | models |
| entcoeff | 0.01 | - | 0.01 | - | 0.01 | - |
| env | Hopper-v2 | - | Hopper-v2 | - | Hopper-v2 | - |
| env_id | - | Hopper-v2 | - | Hopper-v2 | - | Hopper-v2 |
| eval_callback | true | - | true | - | true | - |
| gamma | 0.99 | - | 0.99 | - | 0.99 | - |
| generate_expert | false | - | false | - | - | - |
| lam | 0.95 | - | 0.95 | - | 0.95 | - |
| load_sample | - | false | - | false | - | false |
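The six runs differ only in the training script and the value passed to `--num-trajs`. As a rough sketch, the argument lists recorded under `args` could be regenerated as below. The flags and script names are copied from the run metadata; the helper names (`gail_args`, `bc_args`, `sweep`) and the `python <script>` invocation in the usage lines are assumptions for illustration, not part of the original code.

```python
def gail_args(num_trajs):
    """Argument list recorded for the GAIL runs (train.py)."""
    return ["--env", "Hopper-v2", "--algo", "trpo", "--exp-id", "1",
            "-il", "-eval", "-best", "-tb", "-wandb",
            "--num-trajs", str(num_trajs)]


def bc_args(num_trajs):
    """Argument list recorded for the BC runs (train_bc.py)."""
    return ["--env", "Hopper-v2", "--algo", "trpo", "--exp-id", "1",
            "--train-BC", "-wandb", "-iters", "5e5",
            "--num-trajs", str(num_trajs), "--batch-size", "256"]


def sweep(traj_counts=(20, 10, 5)):
    """Return (script, args) pairs in the order the runs are listed."""
    runs = []
    for n in traj_counts:
        runs.append(("train.py", gail_args(n)))
        runs.append(("train_bc.py", bc_args(n)))
    return runs


if __name__ == "__main__":
    # Assumed invocation style; adjust to however the scripts are actually launched.
    for script, args in sweep():
        print("python", script, " ".join(args))
```

Generating the lists from a single loop makes it harder for the GAIL and BC runs to drift apart in any setting other than the demonstration count.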