
Evidence: Different stopping times for training

Created on April 12 | Last edited on April 12
Number of seeds: 5

SMAC v1: 3m


[Plot: evaluation curves against Trainer Steps (eval), x-axis 0 to 50k, y-axis 5 to 20. Systems: dbc, maicq, idrqn+bcq, idrqn+cql, qmix+bcq, qmix+cql, idrqn, qmix. Run set: 40 runs.]

We could stop training at various points. Most notably:
  • qmix+cql degrades massively
  • qmix+bcq only becomes competitive later in training
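
Comparing systems at different stopping points can be sketched as below. This is a minimal illustration, not the code behind this report: the helper names `value_at_cut` and `mean_at_cut` are hypothetical, and the curves are made-up placeholder numbers, not results from these runs. It assumes each seed logs evaluation values at increasing trainer steps, reads off the last value logged at or before the cut, and averages over seeds.

```python
from bisect import bisect_right
from statistics import mean


def value_at_cut(steps, values, cut):
    """Return the last evaluation value logged at or before `cut` trainer steps.

    `steps` must be sorted in increasing order and aligned with `values`.
    """
    idx = bisect_right(steps, cut) - 1
    if idx < 0:
        raise ValueError(f"no evaluation logged at or before step {cut}")
    return values[idx]


def mean_at_cut(curves, cut):
    """Average `value_at_cut` over seeds.

    `curves` is a list of (steps, values) pairs, one pair per seed.
    """
    return mean(value_at_cut(steps, values, cut) for steps, values in curves)


# Placeholder curves for two seeds of one hypothetical system.
# These numbers are invented for illustration only.
toy_curves = [
    ([0, 10_000, 20_000], [1.0, 4.0, 2.0]),
    ([0, 10_000, 20_000], [3.0, 6.0, 4.0]),
]

for cut in (10_000, 20_000):
    print(cut, mean_at_cut(toy_curves, cut))
```

A system whose curve peaks early and then degrades ranks well at an early cut and poorly at a late one, which is why the choice of stopping time matters for the comparison.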

SMAC v1: 8m


Run set: 40 runs

Here maicq, dbc and idrqn+cql are the only competitive algorithms.
Notably, maicq only becomes competitive after about 12k timesteps.