HINT experimental report
Created on May 29 | Last edited on June 16
Joint learning of perception, syntax, semantics
[Panel grid — run sets: symbol (35 runs), image (35 runs)]
For the Transformer, relative positional encoding is much better than absolute positional encoding.
Sharing parameters across layers (Universal Transformer) further improves accuracy.
Transformer models achieve better accuracy than RNN models: TRAN.relative_universal (a Universal Transformer with relative positional encoding) has higher accuracy than LSTM_attn on I, SS, and LS, and similar accuracy on SL and LL.
Relative positional encoding is essential for the Transformer to generalize to longer expressions (LS).
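As a minimal sketch (not the report's actual model code) of what parameter sharing across layers looks like, the snippet below reuses a single nn.TransformerEncoderLayer for every depth step. The class name SharedLayerEncoder and all hyperparameter values are illustrative, and the relative positional encoding used by TRAN.relative_universal is omitted for brevity.

```python
import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    """Universal-Transformer-style encoder: one layer's weights reused at every depth step.

    Hypothetical sketch; TRAN.relative_universal additionally uses relative
    positional encoding inside self-attention, which is omitted here.
    """
    def __init__(self, d_model=128, nhead=4, dim_feedforward=256, num_steps=6):
        super().__init__()
        # A single layer whose parameters are shared across all depth steps.
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead,
            dim_feedforward=dim_feedforward, batch_first=True)
        self.num_steps = num_steps

    def forward(self, x, src_key_padding_mask=None):
        # Applying the same layer repeatedly is what ties the parameters.
        for _ in range(self.num_steps):
            x = self.layer(x, src_key_padding_mask=src_key_padding_mask)
        return x

if __name__ == "__main__":
    enc = SharedLayerEncoder()
    tokens = torch.randn(2, 10, 128)   # (batch, seq_len, d_model)
    print(enc(tokens).shape)           # torch.Size([2, 10, 128])
```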
Few-shot learning and generalization
[Panel grid — run sets: fewshot (36 runs), max_op_train_xy (22 runs), max_op_train_abcd (44 runs)]
TRAN.relative_universal performs much better than LSTM_attn in the few-shot learning experiments, and the performance gap mainly comes from the test subsets that require generalization over syntax and semantics.
For LSTM_attn fine-tuning, lr=0.001 works best.
[Panel grids for Sweep: jqd53oz9]
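A hedged sketch of the fine-tuning setup implied above: a pretrained model is fine-tuned on the few-shot set with Adam at lr=0.001. The model, dataloader, and cross-entropy loss here are placeholders, not the report's actual training code.

```python
import torch
import torch.nn as nn

def finetune_fewshot(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-3):
    """Fine-tune a pretrained model on a few-shot set (lr=0.001 per the finding above).

    `model` and `loader` are assumed to exist; the loss and target format are
    placeholder choices for illustration only.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            logits = model(inputs)          # (batch, num_classes)
            loss = criterion(logits, targets)
            loss.backward()
            optimizer.step()
    return model
```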
Parameter sweeps
For the Transformer, hid_dim is a more important hyperparameter than emb_dim and nhead.
[Panel grids for Sweep: g86a824r]
For the Transformer, the number of encoder layers matters more than the number of decoder layers.
[Panel grid for Sweep: xn3zyit4]
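For reference, a hypothetical W&B sweep configuration in the spirit of the sweeps above. The parameter names (hid_dim, emb_dim, nhead, enc_layers, dec_layers), value ranges, and project name are assumptions, not the exact settings of g86a824r or xn3zyit4.

```python
import wandb

# Hypothetical sweep configuration; names and ranges are illustrative only.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "hid_dim":    {"values": [128, 256, 512]},  # most influential in our runs
        "emb_dim":    {"values": [64, 128, 256]},
        "nhead":      {"values": [2, 4, 8]},
        "enc_layers": {"values": [2, 4, 6]},        # matters more than dec_layers
        "dec_layers": {"values": [2, 4, 6]},
    },
}

if __name__ == "__main__":
    sweep_id = wandb.sweep(sweep_config, project="HINT")  # project name is illustrative
    # wandb.agent(sweep_id, function=train)  # `train` would read wandb.config and run one trial
```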