
[Offline] TD3 + BC

Created on September 7 | Last edited on September 7
Results are averaged over 4 seeds. For each dataset, we plot the D4RL normalized score.
Locomotion reference scores are taken from "Offline Reinforcement Learning with Implicit Q-Learning".
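
As a rough illustration of how a normalized score averaged over seeds can be computed, here is a minimal sketch. It assumes the usual D4RL convention, where 0 corresponds to a random policy and 100 to an expert policy. The function names and the reference returns used below are hypothetical placeholders, not values from D4RL or from this report.

```python
from statistics import mean, pstdev


def normalized_score(raw_return, random_return, expert_return):
    """Map a raw episode return to a 0-100 scale (0 = random, 100 = expert)."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


def average_over_seeds(per_seed_returns, random_return, expert_return):
    """Normalize each seed's return, then report mean and population std."""
    scores = [
        normalized_score(r, random_return, expert_return)
        for r in per_seed_returns
    ]
    return mean(scores), pstdev(scores)


# Hypothetical raw returns from 4 seeds; the reference returns (0 and 100)
# are placeholders chosen only to make the arithmetic easy to follow.
seed_returns = [0.0, 50.0, 50.0, 100.0]
avg, std = average_over_seeds(seed_returns, random_return=0.0, expert_return=100.0)
print(f"normalized score: {avg:.1f} +/- {std:.1f}")
```

In practice the random and expert reference returns are specific to each environment, so they would need to be looked up per dataset rather than hard-coded as above.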

Locomotion

Halfcheetah

medium-v2

Reference score:

[Plot: D4RL normalized score vs. Step (200k to 1M)]



medium-replay-v2

Reference score:




medium-expert-v2

Reference score:




Walker2d

medium-v2

Reference score:



medium-replay-v2

Reference score:



medium-expert-v2

Reference score:



Hopper

medium-v2

Reference score:



medium-replay-v2

Reference score:



medium-expert-v2

Reference score:



Maze2d

umaze-v1

Reference score: -



medium-v1

Reference score: -



large-v1

Reference score: -



AntMaze

umaze-v2

Reference score: NaN



umaze-diverse-v2

Reference score: NaN



medium-play-v2

Reference score: NaN



medium-diverse-v2

Reference score: NaN



large-play-v2

Reference score: NaN



large-diverse-v2

Reference score: NaN



Adroit

Pen

Human-v1

Reference score: NaN



Cloned-v1

Reference score: NaN



Expert-v1

Reference score: NaN



Door

Human-v1

Reference score: NaN



Cloned-v1

Reference score: NaN



Expert-v1

Reference score: NaN




Hammer

Human-v1

Reference score: NaN



Cloned-v1

Reference score: NaN



Expert-v1

Reference score: NaN




Relocate

Human-v1

Reference score: NaN



Cloned-v1

Reference score: NaN



Expert-v1

Reference score: NaN
