
[Offline-to-Online] IQL

Created on March 6 | Last edited on June 6
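Results are averaged over 4 seeds. For each dataset we plot the D4RL normalized score. Offline pretraining runs for 1M updates, followed by online fine-tuning for another 1M updates. Each entry below reads "score after offline pretraining -> score after online fine-tuning".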
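As a rough illustration of how such numbers are typically produced, the sketch below applies the standard D4RL normalization (0 for a random policy, 100 for an expert policy) and then averages the per-seed curves pointwise. The function names, the reference returns, and the per-seed data are hypothetical placeholders, not values taken from these runs.

```python
from statistics import mean


def d4rl_normalized_score(raw_return, random_return, expert_return):
    """Map a raw episode return to the D4RL scale: 0 = random, 100 = expert."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


def mean_over_seeds(curves):
    """Average per-seed evaluation curves pointwise.

    curves: list of equal-length lists, one list per seed.
    """
    return [mean(points) for points in zip(*curves)]


# Hypothetical reference returns and raw evaluation returns (4 seeds, 2 evals each).
RANDOM_RETURN, EXPERT_RETURN = 0.0, 1.0
raw_returns_per_seed = [
    [0.5, 0.9],
    [0.7, 1.0],
    [0.6, 0.8],
    [0.6, 0.9],
]

normalized = [
    [d4rl_normalized_score(r, RANDOM_RETURN, EXPERT_RETURN) for r in seed]
    for seed in raw_returns_per_seed
]
averaged = mean_over_seeds(normalized)
print(averaged)
```

In practice the random and expert reference returns come from the D4RL benchmark itself for each environment; they are hard-coded here only to keep the sketch self-contained.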


AntMaze

umaze-v2

Reference score: 85.4 -> 96.2, regret: NaN
Ours: 77.0 -> 96.5, regret: 0.07

Run set
2016
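The report does not define the regret column. One plausible reading, assumed here rather than confirmed by the source, is the average shortfall from a perfect normalized score of 100 over the evaluations made during online fine-tuning, scaled so that scores in [0, 100] give a regret in [0, 1]. The function name and the example scores below are hypothetical.

```python
def cumulative_regret(normalized_scores):
    """Mean shortfall from a perfect score of 100, scaled by 1/100.

    Assumed definition, not taken from the report.
    normalized_scores: evaluation scores collected during online fine-tuning.
    """
    shortfalls = [1.0 - score / 100.0 for score in normalized_scores]
    return sum(shortfalls) / len(shortfalls)


# Hypothetical evaluation scores from one online fine-tuning run.
online_scores = [80.0, 90.0, 100.0]
regret = cumulative_regret(online_scores)
print(round(regret, 3))
```

Under this reading, lower is better, and a run that reaches a high score early accumulates less regret than one that reaches the same final score late.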


umaze-diverse-v2

Reference score: 70.8 -> 62.2, regret: NaN
Ours: 59.5 -> 78.0, regret: 63.7



medium-play-v2

Reference score: 68.6 -> 89.8, regret: 0.10
Ours: 71.7 -> 89.7, regret: 0.09



medium-diverse-v2

Reference score: 73.4 -> 90.2, regret: 0.09
Ours: 64.2 -> 92.2, regret: 0.10



large-play-v2

Reference score: 40.0 -> 78.6, regret: 0.52
Ours: 38.5 -> 64.5, regret: 0.33



large-diverse-v2

Reference score: 40.4 -> 73.4, regret: 0.46
Ours: 26.7 -> 64.2, regret: 0.41



Adroit

Pen

Cloned

Reference: NaN
Ours: 83.7 -> 102.0, regret: 0.36



Door

Cloned

Reference: NaN
Ours: 1.1 -> 20.3, regret: 0.83




Relocate

Cloned

Reference: NaN
Ours: 0.0 -> 0.3, regret: 0.99




Hammer

Cloned

Reference: NaN
Ours: 1.3 -> 57.2, regret: NaN
