
Lifelong Hanabi Experiments

Created on January 9|Last edited on February 21
As described in the paper, Lifelong Hanabi consists of three phases: 1- Pre-training, 2- Continual training, and 3- Testing. In this notebook, we report the results of each phase and give more detail on the experimental settings.

1- Pre-training:

2- Continual training

Next, we take a pre-trained agent (in this case, IQL_210) as the learner and train it sequentially with a set of partners (Hard partners and Easy partners), using different state-of-the-art lifelong learning algorithms. The plots here show the learner's performance during training.
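The sequential protocol described above can be sketched as a plain loop over partners. This is a minimal illustration, not the code used for these experiments: `continual_training`, `train_step`, and `evaluate` are hypothetical names, the toy learner below is a stand-in with made-up dynamics, and a lifelong learning method such as EWC, ER, or A-GEM would live inside `train_step`.

```python
from typing import Callable, Dict, List, Sequence


def continual_training(
    partners: Sequence[str],
    train_step: Callable[[str], None],
    evaluate: Callable[[str], float],
    steps_per_task: int,
    eval_every: int,
) -> Dict[str, List[float]]:
    """Train with each partner in order; periodically log the score with every partner."""
    if steps_per_task <= 0 or eval_every <= 0:
        raise ValueError("steps_per_task and eval_every must be positive")
    history: Dict[str, List[float]] = {p: [] for p in partners}
    for current in partners:  # one task per partner, visited sequentially
        for step in range(1, steps_per_task + 1):
            train_step(current)  # hypothetical: one update while playing with `current`
            if step % eval_every == 0:
                for p in partners:  # evaluate with all partners, past and future
                    history[p].append(evaluate(p))
    return history


# Toy stand-in for a real learner (assumption, for illustration only):
# training with one partner raises the score with it and halves the others.
skill = {"partner_A": 0.0, "partner_B": 0.0}


def toy_train_step(partner: str) -> None:
    for p in skill:
        if p == partner:
            skill[p] += 1.0
        else:
            skill[p] *= 0.5


def toy_evaluate(partner: str) -> float:
    return skill[partner]


history = continual_training(
    list(skill), toy_train_step, toy_evaluate, steps_per_task=4, eval_every=2
)
```

Logging the score with every partner at each evaluation point, rather than only the current one, is what makes forgetting of earlier partners visible in the training curves.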

Hard partners:
Learner --- IQL (Type-2)
Partners --- {VDN+OP (Type-3), VDN (Type-4), VDN (Type-5), IQL+OP (Type-3), VDN (Type-3)}.
Cross-Play matrix of the learner and its Hard partners


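[Plot panels: learner performance during continual training with the Hard partners. Run sets (legend count in parentheses): EWC_online_SGD (5), EWC_offline_SGD (5), EWC_online_Adam (5), EWC_offline_Adam (5), ER_SGD (5), ER_Adam (5), MTL_SGD (5), AGEM_Adam (5), MTL_Adam (6), AGEM_SGD (6), Naive_Adam (5), Naive_SGD (5).]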

Easy partners:
Learner --- IQL (Type-2)
Partners --- {IQL (Type-1), VDN (Type-3), VDN (Type-5), IQL+OP (Type-2), VDN+OP (Type-5)}.
Cross-Play matrix of the learner and its Easy partners


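[Plot panels: learner performance during continual training with the Easy partners. Run sets (legend count in parentheses): MTL_SGD (5), MTL_Adam (5), Naive_Adam (5), Naive_SGD (6), EWC_offline_SGD (5), EWC_offline_Adam (5), EW_Online_Adam (5), EWC_online_SGD (5), ER_SGD (5), ER_Adam (6), AGEM_Adam (5), AGEM_SGD (4).]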


3- Testing

The goal of this section is to measure how well the learner generalizes to unseen agents. These unseen agents can be chosen from agents pre-trained with the same MARL algorithm as the learner (Intra-CP) or from a broader group of agents pre-trained with different MARL algorithms (Inter-CP). Here we show how the Inter-CP score improves over the course of continual training, measured at the end of each task.
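As a rough illustration of the two scores, the sketch below averages the learner's score over a pool of unseen agents. It assumes a cross-play score is the plain mean over the chosen pool; the helper name `mean_cross_play`, the agent names, and the numbers are placeholders for illustration, not results from these experiments.

```python
from typing import Iterable, Mapping


def mean_cross_play(scores: Mapping[str, float], pool: Iterable[str]) -> float:
    """Average the learner's score over the unseen agents in `pool`."""
    pool = list(pool)
    if not pool:
        raise ValueError("pool of unseen agents is empty")
    return sum(scores[agent] for agent in pool) / len(pool)


# Placeholder scores of the learner paired with each unseen agent (not real data).
scores = {"IQL_a": 20.0, "IQL_b": 22.0, "VDN_a": 10.0, "SAD_a": 12.0}

# Intra-CP: unseen agents from the same MARL algorithm as the learner (here IQL).
intra_cp = mean_cross_play(scores, ["IQL_a", "IQL_b"])

# Inter-CP: a broader pool of unseen agents from different MARL algorithms.
inter_cp = mean_cross_play(scores, ["IQL_a", "IQL_b", "VDN_a", "SAD_a"])
```

Passing the pool explicitly leaves the choice of which agents count as Intra-CP or Inter-CP to the caller, since the exact pool membership is not specified here.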


[Plot panels: Inter-CP score at the end of each task during continual training. Run sets (legend count 1 each): ER_SGD, ER_SGD_Hard, ER_Adam, ER_Adam_hard, AGEM_Adam, AGEM_Adam_hard, AGEM_SGD_hard, AGEM_SGD, RWC_offline_SGD, EWC_offline_Adam, EWC_online_Adam, EWC_online_SGD, EWC_online_SGD_hard, EWC_offline_SGD_hard, EWC_online_Adam_hard, EWC_offline_Adam, MTL_SGD, MTL_SGD_hard, MTL_Adam, MTL_Adam_hard, ER_SGD_AUX.]



[Plot panel. Run sets (legend count 1 each): IQL_ER_SGD, SAD+OP+AUX, IQL_EWC_online_Adam, IQL_ER_AUX_SGD.]



Ablation Studies:

Ep Memory:


[Plot panels. Run sets (legend count in parentheses): ER_Adam_Hard_8k (3), ER_SGD_Hard_8k (3), ER_Adam_Hard_2k (3), ER_SGD_Hard_2k (3), ER_Adam_Hard_32k (5), ER_SGD_Hard_32k (6).]


Evaluation steps:




[Plot panels. Run sets (legend count in parentheses): eval_10 (1), eval_200 (1), eval_2 (1), eval_50 (3).]


Testing