
Improving GNN architectures

A trail to despair
From the matrix and sequence results, we know this task should reach >90% on this set. Right now it's around 70-80% (with 2M params). Try to improve it!
TODOs:
- Hetero layer aggregation with stack and embedding (see the sketch below)
- Try layers other than GAT (e.g. GIN)
- The features!
- Residual connections
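
For the first TODO, here is a minimal sketch of what "stack and embedding" aggregation could look like: stack the per-edge-type outputs and combine them with a learned per-type weight. The module name, the scalar gate, and the softmax normalization are my assumptions, not the actual model:

```python
import torch
import torch.nn as nn

class StackEmbedAgg(nn.Module):
    """Hypothetical sketch: combine per-edge-type outputs via learned weights."""
    def __init__(self, num_edge_types: int):
        super().__init__()
        # one learned scalar embedding per edge type (assumption: scalar gate)
        self.type_weight = nn.Parameter(torch.zeros(num_edge_types, 1, 1))

    def forward(self, per_type_out: list) -> torch.Tensor:
        # per_type_out: one [num_nodes, hid_dim] tensor per edge type
        stacked = torch.stack(per_type_out, dim=0)        # [T, N, hid_dim]
        weights = torch.softmax(self.type_weight, dim=0)  # normalize over types
        return (weights * stacked).sum(dim=0)             # [N, hid_dim]
```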

Dark green: old best. Only 2 GAT layers, 107K params.
Light green: no better with 2 more layers and double the hidden dim. 2M params.
Pink: back to two layers (and back to SAGE), with linear-layer aggregation (7*hid_dim -> hid_dim) over the edge types (see the first sketch after this list). 568K params. It doesn't seem to learn at all at first — okay, it does, just much later, and it struggles as well.
Magenta: cancelled the linear edge aggregation to make sure nothing else was going wrong. 107K params. This is plain SAGE now, directly comparable with the dark green run.
Grey: changed edge_agg into an individual linear embedding layer per edge type (reference), and added one extra layer with a residual connection (reference), i.e. adding a copy of the input back after the conv (see the second sketch after this list). This one not only fails to learn, it gets worse right from the start. What's weird is that it fits the train set better than the others — it looks like dramatic overfitting from the very first epoch. 1M params.
Purple: gave up on edge_agg entirely and just used 'max', and also dropped the residual connection, but changed the layers to GIN. 573K params.
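
For reference, the pink run's linear aggregation amounts to concatenating the seven per-edge-type outputs and projecting them back down to the hidden dim. A minimal sketch, assuming 7 edge types and a hidden dim of 64 (both placeholder values):

```python
import torch
import torch.nn as nn

hid_dim, num_edge_types = 64, 7  # placeholders matching the 7*hid_dim note
edge_agg = nn.Linear(num_edge_types * hid_dim, hid_dim)

# outs: one [num_nodes, hid_dim] tensor per edge type, e.g. from a hetero conv
outs = [torch.randn(100, hid_dim) for _ in range(num_edge_types)]
x = edge_agg(torch.cat(outs, dim=-1))  # [num_nodes, hid_dim]
```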
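And a sketch combining the grey run's residual idea ("adding a copy of the input back after the conv") with the purple run's GIN layer, using PyTorch Geometric's GINConv. The MLP shape and the ReLU placement are assumptions:

```python
import torch.nn as nn
from torch_geometric.nn import GINConv

hid_dim = 64  # placeholder
conv = GINConv(nn.Sequential(
    nn.Linear(hid_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim)
))

def gin_residual_layer(x, edge_index):
    # residual: keep a copy of the input and add it back after the conv
    return x + conv(x, edge_index).relu()
```

As for the purple run's 'max' edge aggregation, PyTorch Geometric's HeteroConv wrapper accepts aggr='max' directly, so no custom aggregation module is needed there.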



