Lifelong Hanabi Experiments
Created on January 9|Last edited on February 21
As described in the paper, Lifelong Hanabi consists of three phases: 1- Pre-training, 2- Continual training, and 3- Testing. In this notebook, we report the results of each phase with more details on the experiment settings.
1- Pre-training
2- Continual training
Next, we take a pre-trained agent (in this case, IQL_210) as the learner and train it sequentially with a set of partners (Hard partners and Easy partners), using different state-of-the-art lifelong learning algorithms. What we plot here is the learner's performance during training.
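The sequential protocol described above can be sketched roughly as follows. This is a minimal illustration, not the training code behind these results: `train_step` and `evaluate` are hypothetical callables standing in for one learner update against a partner and one evaluation of the learner with that partner, and `steps_per_task` and `eval_every` are placeholder settings.

```python
from typing import Callable, List, Sequence, Tuple


def continual_train(
    partners: Sequence[str],
    train_step: Callable[[str], None],
    evaluate: Callable[[str], float],
    steps_per_task: int,
    eval_every: int,
) -> List[Tuple[int, str, float]]:
    """Train one learner against each partner in turn (one partner = one task).

    Returns (global_step, partner, score) records that form the training curve.
    """
    curve: List[Tuple[int, str, float]] = []
    global_step = 0
    for partner in partners:  # tasks are visited one after another, never mixed
        for _ in range(steps_per_task):
            train_step(partner)  # hypothetical: one update of the learner
            global_step += 1
            if global_step % eval_every == 0:
                # record the learner's score with the partner it is training with
                curve.append((global_step, partner, evaluate(partner)))
    return curve
```

A lifelong learning method such as EWC or ER would live inside `train_step` (for example as an extra regularization term or a replay batch); the outer loop over partners stays the same in this sketch.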
Hard partners:
Learner --- IQL (Type-2)
Partners --- {VDN+OP (Type-3), VDN (Type-4), VDN (Type-5), IQL+OP (Type-3), VDN (Type-3)}.

Cross-Play matrix of the learner and its Hard partners
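A cross-play matrix of this kind can be assembled by pairing every agent with every other agent and averaging the resulting game scores. The sketch below assumes a hypothetical `play(a, b)` function that returns the score of one game between the pair; the function name and the `num_games` parameter are illustrative and not taken from the report.

```python
from typing import Callable, List, Sequence


def cross_play_matrix(
    agents: Sequence[str],
    play: Callable[[str, str], float],
    num_games: int = 1,
) -> List[List[float]]:
    """Entry [i][j] is the mean score when agents[i] is paired with agents[j]."""
    matrix: List[List[float]] = []
    for a in agents:
        row: List[float] = []
        for b in agents:
            # average over repeated games to reduce variance from the deck shuffle
            scores = [play(a, b) for _ in range(num_games)]
            row.append(sum(scores) / num_games)
        matrix.append(row)
    return matrix
```

The diagonal holds self-play scores, and the off-diagonal entries show how well two independently trained agents coordinate.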
Training-curve panels (Hard partners): EWC_online_SGD, EWC_online_Adam, MTL_Adam, Naive_Adam.
Easy partners:
Learner --- IQL (Type-2)
Partners --- {IQL (Type-1), VDN (Type-3), VDN (Type-5), IQL+OP (Type-2), VDN+OP (Type-5)}.

Cross-Play matrix of the learner and its Easy partners
Training-curve panels (Easy partners): MTL_SGD, MTL_Adam, Naive_Adam, Naive_SGD, EWC_offline_SGD, EWC_offline_Adam, EW_Online_Adam, EWC_online_SGD, ER_SGD, ER_Adam, AGEM_Adam, AGEM_SGD.
3- Testing
The goal of this section is to measure how well the learner generalizes to unseen agents. These unseen agents can be drawn from agents pre-trained with the same MARL algorithm as the learner (Intra-CP) or from a broader pool of agents pre-trained with different MARL algorithms (Inter-CP). Here we show how the Inter-CP score improves during continual training, measured at the end of each task.
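One way to compute such scores is to average the learner's cross-play score over each pool of unseen agents. The sketch below is an assumption about the aggregation (a plain mean); `evaluate` is a hypothetical callable returning the learner's score with a given unseen agent, and the pool names are illustrative.

```python
from typing import Callable, Dict, Sequence


def mean_cross_play(evaluate: Callable[[str], float], unseen: Sequence[str]) -> float:
    """Mean score of the learner over a pool of unseen agents (assumed aggregation)."""
    return sum(evaluate(agent) for agent in unseen) / len(unseen)


def generalization_scores(
    evaluate: Callable[[str], float],
    intra_pool: Sequence[str],
    inter_pool: Sequence[str],
) -> Dict[str, float]:
    """Intra-CP uses agents from the learner's own MARL algorithm,
    Inter-CP uses the broader pool spanning different MARL algorithms."""
    return {
        "intra_cp": mean_cross_play(evaluate, intra_pool),
        "inter_cp": mean_cross_play(evaluate, inter_pool),
    }
```

Calling `generalization_scores` once at the end of each task yields one Inter-CP point per task, which matches how the curves in this section are described.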
Inter-CP panels: ER_SGD, ER_SGD_Hard, EWC_offline_Adam, EWC_online_Adam, EWC_online_SGD, EWC_online_SGD_hard, EWC_offline_SGD_hard, EWC_online_Adam_hard, MTL_SGD, MTL_Adam, MTL_Adam_hard, ER_SGD_AUX, IQL_ER_SGD, SAD+OP+AUX, IQL_EWC_online_Adam, IQL_ER_AUX_SGD.
Ablation Studies:
Ep Memory:
Panels: ER_Adam_Hard_8k, ER_SGD_Hard_8k, ER_Adam_Hard_2k, ER_SGD_Hard_2k, ER_Adam_Hard_32k, ER_SGD_Hard_32k.
Evaluation steps:
Panels: eval_10, eval_200, eval_2, eval_50.
Testing