Sebastian-dittert's workspace
Runs: 29 (3 visualized)

| Name | Subgroups (run_name) | Runs |
|---|---|---|
| env: CartPole-v0 | 3 | 10 |
| env: Pendulum-v1 | 1 | 1 |
| env: Pendulum-v0 | 4 | 12 |
| env: LunarLander-v2 | 2 | 6 |
| Field | CartPole-v0 | Pendulum-v1 | Pendulum-v0 | LunarLander-v2 |
|---|---|---|---|---|
| State | Crashed | Finished | Failed | Finished |
| Notes | - | - | - | - |
| User | sebastian-dittert | sebastian-dittert | sebastian-dittert | sebastian-dittert |
| Tags | | | | |
| Created | | | | |
| Runtime | 25mo 13d 17h 11m 30s | 26m 28s | 3mo 14d 23h 13m 58s | 21d 4h 21m 10s |
| Sweep | - | - | - | - |
| buffer_size | 100000 | 100000 | 100000 | 100000 |
| env | CartPole-v0 | Pendulum-v1 | Pendulum-v0 | LunarLander-v2 |
| episodes | 260 | 300 | 300 | 600 |
| eps_frames | 10000 | - | - | - |
| log_video | 0 | 0 | 0 | 0 |
| min_eps | 0.01 | - | - | - |
| run_name | ["CQL-DQN","SAC","new_discrete_test"] | SAC_CQL_new_test_pendulum | ["SAC_Base1_","SAC_CQL_new_test_pendulum","SAC_CQL_new_test_pendulum_w_lagrange","SAC_CQL_wo_alpha"] | ["CQL-SAC-discrete","SAC"] |
| save_every | 100 | 100 | 100 | 100 |
| seed | 19.2 | 1 | 2 | 2 |
| batch_size | 256 | 256 | 256 | 256 |
| cql_weight | - | 1 | 1 | - |
| eval_every | - | 1 | 1 | - |
| hidden_size | - | 256 | 256 | - |
| learning_rate | - | 0.0003 | 0.0003 | - |
| target_action_gap | - | 10 | 7.5 | - |
| tau | - | 0.005 | 0.005 | - |
| temperature | - | 1 | 1 | - |
| with_lagrange | - | 0 | 0.5 | - |
| Alpha | 0.018321 | 0.20738 | 0.27596 | 1.7446e-11 |
| Alpha Loss | 0.038516 | -0.0043685 | -0.011701 | 7.0787e-11 |
| Average10 | 192.88 | -240.97059 | -272.11186 | 245.19747 |
| Bellmann error | 2.94296 | - | - | - |
| Bellmann error 1 | 2.39325 | 245.24261 | 245.74181 | 5.94456 |
| Bellmann error 2 | 2.40848 | 247.39452 | 247.24236 | 6.12994 |
| Buffer size | 26272.6 | 23200 | 28500 | 100000 |
| CQL Loss | 0.87646 | - | - | - |
| CQL1 Loss | 1.10801 | 4.29181 | 2.00746 | 1.44344 |
| CQL2 Loss | 1.109 | 4.25729 | 2.00546 | 1.44874 |
| Episode | 211.3 | 66 | 92.5 | 600 |
| Epsilon | 0.01 | - | - | - |
| Policy Loss | -64.52016 | 198.48547 | 183.42378 | -56.93027 |
| Q Loss | 2.34794 | - | - | - |
| Reward | 188.9 | -0.64611 | -264.17485 | 265.8966 |
| Steps | 18972.6 | 13200 | 18500 | 368350.83333 |
| Lagrange Alpha | 0 | 0 | 0.0080333 | 0 |
| Lagrange Alpha Loss | 0 | 0 | 0.0011573 | 0 |
| Test Reward | - | -283.09445 | -282.27001 | - |