CleanRL SAC (jax)
Adapted from the SBX (SB3 + JAX) implementation
Created on October 23 | Last edited on October 24
Performance (sample efficiency)
HalfCheetah-v2 (plot; run set: 104)
Hopper-v2 (plot; run set: 108)
Walker2d-v2 (plot; run set: 106)
Runtime
Note: the SAC (jax) runs were executed several at a time on the same machine, so the reported number of steps per second is a lower bound.
Further optimization is possible, at the cost of a bit more complexity:
- perform multiple gradient steps in a single call every n environment steps, instead of one gradient step per environment step
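The bullet above can be illustrated with a minimal JAX sketch. It is not the CleanRL or SBX code: a toy regression loss stands in for the SAC actor and critic losses, and the names `loss_fn`, `one_update`, `multi_update`, and `LEARNING_RATE` are hypothetical. The idea it shows is that n pre-sampled batches are stacked along a leading axis and consumed by one jitted `jax.lax.scan`, so Python and dispatch overhead is paid once per n updates rather than once per update.

```python
import jax
import jax.numpy as jnp

LEARNING_RATE = 0.1  # assumed value, chosen only for this toy problem


def loss_fn(params, batch):
    # Toy mean-squared-error loss; a stand-in for the SAC losses.
    x, y = batch
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)


def one_update(params, batch):
    # A single plain SGD step; returns (new carry, per-step output) for scan.
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    new_params = jax.tree_util.tree_map(
        lambda p, g: p - LEARNING_RATE * g, params, grads
    )
    return new_params, loss


@jax.jit
def multi_update(params, batches):
    # `batches` is a tuple (xs, ys) whose leading axis is the number of
    # gradient steps; scan slices one batch per step and threads params through.
    return jax.lax.scan(one_update, params, batches)


# Usage: pretend n = 8 environment steps were just collected, then run
# 8 gradient steps in one jitted call on 8 sampled batches.
gradient_steps, batch_size, dim = 8, 32, 3
key = jax.random.PRNGKey(0)
xs = jax.random.normal(key, (gradient_steps, batch_size, dim))
true_w = jnp.array([1.0, -2.0, 0.5])
ys = xs @ true_w  # shape (gradient_steps, batch_size)

params = {"w": jnp.zeros(dim), "b": jnp.zeros(())}
params, losses = multi_update(params, (xs, ys))
```

In a training loop, `multi_update` would be called once every n environment steps with n batches drawn from the replay buffer. This keeps the ratio of gradient steps to environment steps at one while reducing the number of host-to-device round trips.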
HalfCheetah-v2 (plot; run set: 104)
Hopper-v2 (plot; run set: 108)
Walker2d-v2 (plot; run set: 106)