Dependency Parser GCN
Method and Results
All models were trained on GQA-train-balanced and evaluated on GQA-val-balanced. Every model shares the same underlying structure: it takes the dependency graph of each question as input, with 300-dim GloVe embeddings as the features for each node. No images or image-derived data were used, so we would generally expect poor performance; even so, accuracy falls 8% short of the global prior results reported in Hudson and Manning's GQA paper. Given the current successes of dependency parsing in the VQA field, it's almost impressive that the models are this bad.
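To make that setup concrete, here's a minimal sketch of the pipeline, assuming spaCy for parsing (standing in for the EWT-trained parser actually used; see Discussion) and PyTorch Geometric for the GCN. The class names, two-layer depth, hidden width, and mean pooling are my own illustrative choices, not the project's actual code:

```python
import numpy as np
import spacy
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool

nlp = spacy.load("en_core_web_md")  # 300-dim vectors, standing in for GloVe here

def question_to_graph(question: str, label: int) -> Data:
    """One node per token, with undirected edges between each token and its head."""
    doc = nlp(question)
    edges = [(t.i, t.head.i) for t in doc if t.i != t.head.i]
    edges += [(h, d) for d, h in edges]  # add reverse edges to make the graph undirected
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    x = torch.from_numpy(np.stack([t.vector for t in doc]))  # (num_tokens, 300)
    return Data(x=x, edge_index=edge_index, y=torch.tensor([label]))

class QuestionGCN(torch.nn.Module):
    def __init__(self, num_answers: int, hidden: int = 600):
        super().__init__()
        self.conv1 = GCNConv(300, hidden)
        self.conv2 = GCNConv(hidden, num_answers)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        # Pool node states into one vector per question, giving answer logits
        return global_mean_pool(x, batch)
```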
Note that the graphs below log the loss at each logging step but accumulate correct/total counts over each epoch, which is why the accuracy jumps at the start of each epoch, when the counts reset.
[Line charts: loss per logging step and accumulated accuracy per epoch]
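In code, that logging scheme looks roughly like the following; this is my own reconstruction of the behaviour described, not the project's actual training loop, and `log` stands in for whatever logger was used:

```python
def run_epoch(model, loader, criterion, optimiser, log, log_every=50):
    """One epoch: loss is reported at each logging step, while correct/total
    accumulate from the epoch start, so the accuracy curve resets (and jumps)
    at each epoch boundary."""
    correct, total = 0, 0  # reset every epoch -> the 'jump' in the charts
    for step, batch in enumerate(loader):
        optimiser.zero_grad()
        logits = model(batch.x, batch.edge_index, batch.batch)
        loss = criterion(logits, batch.y)
        loss.backward()
        optimiser.step()
        correct += (logits.argmax(dim=1) == batch.y).sum().item()
        total += batch.y.size(0)
        if step % log_every == 0:
            log({"loss": loss.item(), "accuracy": correct / total})
```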
Discussion
Despite the overall poor performance of the models, I learned the importance of weight decay for GCNs in a text-processing context, which will come in handy when optimising parameters later, along with a few techniques for parameter optimisation and performance visualisation.
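For reference, weight decay is a one-argument change on the optimiser; the exact values below are assumptions (5e-4 is the common GCN default from Kipf & Welling), not the settings used in these runs:

```python
import torch

# Illustrative only: the post doesn't give the exact optimiser settings.
# weight_decay adds an L2 penalty on the parameters; 1843 output classes
# matches the final layer size quoted later in the post.
model = QuestionGCN(num_answers=1843)  # reusing the sketch from earlier
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
```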
The next obvious steps are to:
- Investigate the effects of GATs versus GCNs (see the sketch after this list)
- Investigate how the layer structure of GCNs affects performance
- Try learning the embeddings end-to-end rather than using fixed GloVe vectors
- Try a different pre-trained dependency parser; the EWT-trained parser may not cover some of the vocabulary, and inspecting some samples manually, its output is sometimes a bit off.
- Incorporate image signals into the training data to address the true VQA challenge of multimodal fusion.
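On the GAT point, PyTorch Geometric's GATConv is close to a drop-in replacement for GCNConv; a hedged sketch of the swap, with illustrative hidden size and head count:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, global_mean_pool

class QuestionGAT(torch.nn.Module):
    """Same shape as the GCN sketch, but with attention-weighted neighbour
    aggregation; hidden size and head count are illustrative choices."""
    def __init__(self, num_answers: int, hidden: int = 600, heads: int = 4):
        super().__init__()
        self.conv1 = GATConv(300, hidden, heads=heads)  # concatenates head outputs
        self.conv2 = GATConv(hidden * heads, num_answers, heads=1)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        return global_mean_pool(x, batch)
```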
Experimenting with a larger GCN with layer sizes (300, 600, 900, 1200, 1500, 1843), I found that the existing models weren't big enough to capture the complexity of the problem at hand. This is clearly shown in the graph below, which compares the best smaller-GCN run with the larger GCN. Note that the two runs were performed on different machines; that said, we can expect the larger GCN to take about 1.6-1.7 times longer to train than the smaller ones.
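For reference, the larger architecture would look something like this; the ReLU between layers (and the lack of dropout) are assumptions on my part:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class LargeGCN(torch.nn.Module):
    """Stacked GCN with the layer widths quoted above."""
    def __init__(self, dims=(300, 600, 900, 1200, 1500, 1843)):
        super().__init__()
        self.convs = torch.nn.ModuleList(
            [GCNConv(d_in, d_out) for d_in, d_out in zip(dims, dims[1:])]
        )

    def forward(self, x, edge_index):
        for conv in self.convs[:-1]:
            x = F.relu(conv(x, edge_index))
        return self.convs[-1](x, edge_index)  # final layer produces the logits
```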