[Offline] Decision Transformer
Created on September 27 | Last edited on June 6
Results are averaged over 4 seeds. For each dataset we plot the D4RL normalized score.
Locomotion reference scores are taken from "Offline Reinforcement Learning with Implicit Q-Learning". Maze2d reference scores are taken from "Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL".
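As a rough illustration of the metric plotted in the panels, the sketch below applies the standard D4RL normalization, where 0 corresponds to a random policy and 100 to an expert policy, and then averages over 4 seeds. The function name, the reference returns, and the per-seed returns are all hypothetical placeholders and are not values from these runs. Real reference returns for each environment ship with the D4RL package.

```python
from statistics import mean


def normalized_score(raw_return, random_return, expert_return):
    """D4RL-style normalization: 0 = random policy, 100 = expert policy."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns (illustrative only, not real D4RL values).
RANDOM_RETURN = 10.0
EXPERT_RETURN = 110.0

# Hypothetical raw evaluation returns, one per seed (4 seeds).
seed_returns = [60.0, 70.0, 50.0, 80.0]

# Normalize each seed's return, then average across seeds.
seed_scores = [
    normalized_score(r, RANDOM_RETURN, EXPERT_RETURN) for r in seed_returns
]
mean_score = mean(seed_scores)
print(mean_score)
```

Normalizing each seed before averaging gives the same mean as normalizing the averaged return, because the transformation is affine. Keeping the per-seed scores around also makes it straightforward to report a spread next to the mean.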
Locomotion
Maze2d
AntMaze
umaze-v2
umaze-diverse-v2
medium-play-v2
medium-diverse-v2
large-play-v2
large-diverse-v2
Adroit
Pen
Human-v1
Cloned-v1
Expert-v1
Door
Human-v1
Cloned-v1
Expert-v1
Hammer
Human-v1
Cloned-v1
Expert-v1
Relocate
Human-v1
Cloned-v1
Expert-v1