SAC-without-AirSim-steps
Learning to perform several OpenAI Gym and MuJoCo control tasks (goal scores in parentheses): Pendulum-v0 (-200), CartPole-v1 (500), LunarLanderContinuous-v2 (200), Hopper-v2 (3500), and HalfCheetah-v2 (4800). Soft Actor-Critic (SAC) is trained on each task's reward function (the CartPole-v1 run uses DQN instead, since SAC requires a continuous action space). The agents reach the goal scores, as verified in the "mean episode reward (RL)" chart. The trained SAC models are then used to generate expert demonstrations of each task, and GAIL is trained on these demonstrations to imitate the tasks. A minimal sketch of this pipeline follows below.
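The sketch below assumes the stable-baselines (2.x) implementations of SAC and GAIL, which the policy names in the run table (CustomSACPolicy, MlpPolicy) suggest; the environment, timestep budgets, and file names are illustrative, not the exact settings of these runs.

```python
import gym
from stable_baselines import SAC, GAIL
from stable_baselines.gail import ExpertDataset, generate_expert_traj

# 1) Train SAC on the task's reward (Pendulum-v0 used as an example).
sac_model = SAC("MlpPolicy", "Pendulum-v0", verbose=1)
sac_model.learn(total_timesteps=50000)  # illustrative budget, not the runs' value
sac_model.save("sac_pendulum")

# 2) Roll out the trained SAC policy to record expert demonstrations.
#    Saves observations/actions/rewards to expert_pendulum.npz.
generate_expert_traj(sac_model, "expert_pendulum", n_timesteps=0, n_episodes=10)

# 3) Train GAIL to imitate the demonstrations (the task reward is not used).
dataset = ExpertDataset(expert_path="expert_pendulum.npz", traj_limitation=-1, verbose=1)
gail_model = GAIL("MlpPolicy", "Pendulum-v0", dataset, verbose=1)
gail_model.learn(total_timesteps=100000)
gail_model.save("gail_pendulum")
```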
Section 1
Run configurations and runtimes:

| env | algo | policy | runtime | learning_rate | batch_size | buffer_size | ent_coef | learning_starts | gamma |
|---|---|---|---|---|---|---|---|---|---|
| HalfCheetah-v2 | sac | CustomSACPolicy | 7h 27m 59s | 0.0003 | 256 | 1000000 | auto | 10000 | 0.99 |
| Hopper-v2 | sac | CustomSACPolicy | 7h 6m 25s | lin_3e-4 | 256 | 1000000 | 0.01 | 1000 | - |
| LunarLanderContinuous-v2 | sac | MlpPolicy | 2h 18m 37s | - | 256 | - | - | 1000 | - |
| CartPole-v1 | dqn | CustomDQNPolicy | 21m 5s | 0.001 | - | 50000 | - | - | - |
| Pendulum-v0 | sac | MlpPolicy | 15m 35s | - | - | - | - | 1000 | - |
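For context, the HalfCheetah-v2 row maps onto the stable-baselines SAC constructor roughly as sketched below. This is an assumption-laden illustration: the runs were presumably launched via rl-baselines-zoo, and its CustomSACPolicy (a wider MLP) is replaced here by the stock MlpPolicy; the timestep budget is not taken from the table.

```python
from stable_baselines import SAC

# Sketch: the HalfCheetah-v2 run's config expressed as a direct SAC call.
# "MlpPolicy" stands in for the zoo's CustomSACPolicy.
model = SAC(
    "MlpPolicy",
    "HalfCheetah-v2",           # requires mujoco-py
    batch_size=256,
    buffer_size=1000000,
    ent_coef="auto",            # automatic entropy-coefficient tuning
    learning_rate=0.0003,
    learning_starts=10000,
    gamma=0.99,
    verbose=1,
)
model.learn(total_timesteps=int(2e6))  # illustrative; the actual step budget is not shown in the table
model.save("sac_halfcheetah")
```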