Run set 1:
Grey: Old GAT
Purple: GAT with the augmentation (mean-pool the attentions at each layer, mean-aggregate edges). Unclear why this variant has more parameters.
Green: GraphSAGE
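A minimal NumPy sketch contrasting the two aggregation schemes. It assumes "mean pool the attentions" means averaging the per-head outputs instead of concatenating them. It models only node-side attention: the edge aggregation used in the Purple run is not reproduced here. All function names are hypothetical, and bias terms are left out of the parameter counts. Counting parameters per layer in this way may help locate where the extra parameters in the Purple run come from.

```python
import numpy as np


def gat_layer_mean_heads(x, adj, w, a_src, a_dst, slope=0.2):
    """GAT-style layer whose H attention heads are averaged, not concatenated.

    x: (N, F_in) node features
    adj: (N, N) bool, adj[i, j] True if j sends a message to i (include self-loops)
    w: (H, F_in, F_out) per-head projections
    a_src, a_dst: (H, F_out) per-head attention vectors
    """
    h = np.einsum("nf,hfo->hno", x, w)                  # (H, N, F_out)
    e_src = np.einsum("hno,ho->hn", h, a_src)           # (H, N)
    e_dst = np.einsum("hno,ho->hn", h, a_dst)           # (H, N)
    # Raw score for edge j -> i: LeakyReLU(a_dst . h_i + a_src . h_j)
    scores = e_dst[:, :, None] + e_src[:, None, :]      # (H, N, N)
    scores = np.where(scores > 0, scores, slope * scores)
    scores = np.where(adj[None], scores, -np.inf)       # mask non-edges
    scores = scores - scores.max(axis=-1, keepdims=True)
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=-1, keepdims=True)   # softmax over neighbours
    out = np.einsum("hij,hjo->hio", alpha, h)           # (H, N, F_out)
    return out.mean(axis=0)                             # mean-pool the heads


def sage_layer_mean(x, adj, w_self, w_neigh):
    """GraphSAGE layer with the mean aggregator (adj without self-loops)."""
    a = adj.astype(x.dtype)
    deg = a.sum(axis=1, keepdims=True)
    neigh_mean = (a @ x) / np.maximum(deg, 1.0)         # isolated nodes get zeros
    return x @ w_self + neigh_mean @ w_neigh


def gat_param_count(f_in, f_out, heads):
    # per-head projection + two attention vectors per head (bias omitted)
    return heads * f_in * f_out + 2 * heads * f_out


def sage_param_count(f_in, f_out):
    # separate self and neighbour weight matrices (bias omitted)
    return 2 * f_in * f_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, f_in, f_out, heads = 5, 16, 8, 4
    x = rng.normal(size=(n, f_in))
    adj = rng.random((n, n)) < 0.4
    np.fill_diagonal(adj, False)
    gat_out = gat_layer_mean_heads(
        x, adj | np.eye(n, dtype=bool),
        rng.normal(size=(heads, f_in, f_out)),
        rng.normal(size=(heads, f_out)), rng.normal(size=(heads, f_out)),
    )
    sage_out = sage_layer_mean(
        x, adj, rng.normal(size=(f_in, f_out)), rng.normal(size=(f_in, f_out))
    )
    print(gat_out.shape, sage_out.shape)                # (5, 8) (5, 8)
    print(gat_param_count(f_in, f_out, heads), sage_param_count(f_in, f_out))  # 576 256
```

With these simplified formulas, a GAT layer's weight count grows linearly with the number of heads, while the GraphSAGE mean layer has a fixed two matrices. Any edge-feature projection added by the augmentation would come on top of this.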
Run set 2: composer task. The data is better here, so the comparison is more reliable.
SAGE learns faster and overfits less.
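One possible way to make "learns faster" and "overfits less" measurable from the logged per-epoch train and validation losses. The three metrics and any threshold passed in are assumptions chosen for illustration, and the function names are hypothetical. They are not how the observation above was produced.

```python
def epochs_to_threshold(val_losses, threshold):
    """First epoch (0-based) whose validation loss is <= threshold, or None."""
    for epoch, loss in enumerate(val_losses):
        if loss <= threshold:
            return epoch
    return None


def final_generalization_gap(train_losses, val_losses):
    """Validation minus training loss at the last epoch; larger means more overfitting."""
    return val_losses[-1] - train_losses[-1]


def val_loss_rebound(val_losses):
    """How far validation loss has risen above its best value; 0 means no rebound."""
    return val_losses[-1] - min(val_losses)
```

Computed for both models with the same threshold, a lower `epochs_to_threshold` would support "learns faster", and a smaller gap and rebound would support "overfits less".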
Conclusion: GAT may not be improving results.