[Offline-to-Online] SPOT

Created on March 27|Last edited on June 9
Results are averaged over 4 seeds. For each dataset, we plot the D4RL normalized score. Offline pretraining runs for 1M updates, followed by 1M updates of online fine-tuning.
Reference scores for Adroit are not available.
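As a rough illustration of how a seed-averaged D4RL normalized score can be computed: the D4RL convention rescales a raw return so that 0 corresponds to a random policy and 100 to an expert policy. The sketch below assumes that convention; the reference returns and per-seed values are hypothetical placeholders, not numbers from this report.

```python
import numpy as np


def d4rl_normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw return so that 0 = random policy and 100 = expert policy."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns and per-seed raw returns (illustrative only).
RANDOM_RETURN = 0.0
EXPERT_RETURN = 1.0
per_seed_returns = np.array([0.90, 0.95, 1.00, 0.85])  # one entry per seed, 4 seeds

# Normalize each seed separately, then average across seeds.
per_seed_scores = d4rl_normalized_score(per_seed_returns, RANDOM_RETURN, EXPERT_RETURN)
mean_score = float(per_seed_scores.mean())
print(f"mean normalized score over {len(per_seed_scores)} seeds: {mean_score:.1f}")
```

Normalizing before averaging keeps each seed on the same 0 to 100 scale, so a single outlier seed is easy to spot in the per-seed array.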

AntMaze

umaze-v2

Reference score: 93.2 -> 99.2, regret: NaN
Our: 91.0 -> 99.5, regret: 0.01
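This report does not define the regret values listed next to each score. One common choice in offline-to-online evaluation is one minus the average normalized score (rescaled to the range 0 to 1) over the evaluations run during online tuning. The sketch below assumes that definition; the function name and the evaluation scores are hypothetical.

```python
import numpy as np


def online_regret(normalized_scores):
    """Assumed definition: mean of (1 - score / 100) over online evaluations."""
    scores = np.asarray(normalized_scores, dtype=float) / 100.0
    return float(np.mean(1.0 - scores))


# Hypothetical normalized scores logged at successive online evaluation points.
eval_scores = [90.0, 95.0, 100.0, 100.0, 100.0]
regret = online_regret(eval_scores)
print(f"regret: {regret:.2f}")
```

Under this assumed definition, a policy that stays near a score of 100 throughout online tuning has regret near 0, and one that stays near 0 has regret near 1.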

[Figure: two plots against Step, one spanning 1.2M to 2M steps and one spanning 500k to 2M steps.]

umaze-diverse-v2

Reference score: 41.6 -> 96.0, regret: NaN
Our: 36.2 -> 95.0, regret: 0.21


medium-play-v2

Reference score: 75.2 -> 97.4, regret: NaN
Our: 67.2 -> 97.2, regret: 0.05


medium-diverse-v2

Reference score: 73.0 -> 96.2, regret: NaN
Our: 73.7 -> 94.5, regret: 0.05


large-play-v2

Reference score: 40.8 -> 89.4, regret: NaN
Our: 31.5 -> 87.0, regret: 0.29


large-diverse-v2

Reference score: 44.0 -> 90.8, regret: NaN
Our: 17.5 -> 81.0, regret: 0.23
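To compare the AntMaze numbers above at a glance, the reported offline and fine-tuned scores can be tabulated and differenced. This is a small sketch that uses only the scores listed above; the dictionary and variable names are illustrative.

```python
# (offline score, score after online tuning), copied from the AntMaze results above.
reference = {
    "umaze-v2": (93.2, 99.2),
    "umaze-diverse-v2": (41.6, 96.0),
    "medium-play-v2": (75.2, 97.4),
    "medium-diverse-v2": (73.0, 96.2),
    "large-play-v2": (40.8, 89.4),
    "large-diverse-v2": (44.0, 90.8),
}
ours = {
    "umaze-v2": (91.0, 99.5),
    "umaze-diverse-v2": (36.2, 95.0),
    "medium-play-v2": (67.2, 97.2),
    "medium-diverse-v2": (73.7, 94.5),
    "large-play-v2": (31.5, 87.0),
    "large-diverse-v2": (17.5, 81.0),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Per-dataset gap between our run and the reference (positive means ours is higher).
for name, (ref_off, ref_on) in reference.items():
    our_off, our_on = ours[name]
    print(f"{name:18s} offline gap {our_off - ref_off:+6.1f}   online gap {our_on - ref_on:+6.1f}")

mean_ref_online = mean(on for _, on in reference.values())
mean_our_online = mean(on for _, on in ours.values())
print(f"mean score after online tuning: reference {mean_ref_online:.1f}, ours {mean_our_online:.1f}")
```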


Adroit

Pen

Cloned

Reference: NaN
Our: 6.1 -> 43.6, regret: 0.58


Door

Cloned

Reference: NaN
Our: -0.2 -> 0.0, regret: 0.99


Relocate

Cloned

Reference: NaN
Our: -0.2 -> -0.1, regret: 1.0


Hammer

Cloned

Reference: NaN
Our: 3.9 -> 3.7, regret: 0.97
