byte2word module
Created on March 22 | Last edited on April 30
Todos
- mean pooling baseline
- more layers in byte2word module
- more heads in byte-encoder
- charformer baseline
- charaware baseline
Blood, toil, tears and sweat
- b2w module doesn't work at all on the small dataset (WikiText)
- self-attention in the word-decoder doesn't necessarily help (performance slightly degraded)
- more heads in the byte-encoder don't really help
- mean pooling is inferior; hooray for seq2seq b2w
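For reference, a minimal PyTorch sketch of the seq2seq b2w idea these findings refer to: a tiny encoder over a word's bytes plus a learned query that cross-attends to the byte states. The class name `Byte2Word`, the dimensions, and the single-query-per-word design are assumptions; the notes don't pin them down.

```python
import torch
import torch.nn as nn

class Byte2Word(nn.Module):
    """Pool a word's byte sequence into one word embedding via a tiny
    byte encoder + a learned query that cross-attends to the byte states.
    (Sketch; sizes and names are assumptions, not the exact runs below.)"""
    def __init__(self, d_model=768, enc_heads=1, dec_heads=16):
        super().__init__()
        self.byte_emb = nn.Embedding(256, d_model)
        layer = nn.TransformerEncoderLayer(d_model, enc_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        # One learned query per word; cross-attention pools the byte states.
        self.word_query = nn.Parameter(torch.randn(1, 1, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, dec_heads, batch_first=True)

    def forward(self, byte_ids):                  # (num_words, word_len) long ids
        h = self.encoder(self.byte_emb(byte_ids))
        q = self.word_query.expand(byte_ids.size(0), -1, -1)
        w, _ = self.cross_attn(q, h, h)           # (num_words, 1, d_model)
        return w.squeeze(1)                       # one vector per word, fed to BERT
```

In the "w/ self-attention" runs the word-decoder presumably also self-attends over the word queries before cross-attending; per the finding above, that variant slightly degraded performance.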
Chronicles
2022-4-30
2022-4-29 # Bugfixes and cleanups
2022-4-1 # CharAware
2022-3-31 # Charformer, NaN
2022-3-27 # Baseline selection - CharAware Embedding & Charformer
Both baselines need some adaptation before use.
CharAware Embedding
- modernize hyperparams
- parallel small conv layers
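A hedged sketch of what "parallel small conv layers" could look like, in the spirit of character-aware CNN embeddings (parallel filters of several widths, max-pooled over time and concatenated). The widths, filter counts, and the `CharAwareEmbedding` name are illustrative, not the values used in these experiments.

```python
import torch
import torch.nn as nn

class CharAwareEmbedding(nn.Module):
    def __init__(self, n_chars=256, char_dim=16, widths=(1, 2, 3, 4), n_filters=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Parallel small convolutions, one per filter width.
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_filters, kernel_size=w, padding=w // 2)
            for w in widths
        )

    def forward(self, char_ids):                      # (num_words, word_len)
        x = self.char_emb(char_ids).transpose(1, 2)   # (N, char_dim, word_len)
        # Max-pool each filter bank over time, then concatenate.
        feats = [conv(x).amax(dim=-1) for conv in self.convs]
        return torch.cat(feats, dim=-1)               # (N, n_filters * len(widths))
```

A linear projection to the BERT hidden size (plus the original model's highway layers) would follow; omitted here for brevity.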
Charformer
- upsampling method (not mentioned in the original paper)
- subword span masking
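Since the upsampling step isn't specified, one naive option (an assumption here, not taken from the Charformer paper) is to repeat each pooled block state back over its block:

```python
import torch

def upsample(h: torch.Tensor, block_size: int, target_len: int) -> torch.Tensor:
    """h: (batch, ceil(seq_len / block_size), d_model) -> (batch, target_len, d_model).
    Repeats each pooled block state over its block, then trims padding
    introduced by the blockwise downsampling."""
    return h.repeat_interleave(block_size, dim=1)[:, :target_len]
```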
2022-3-25 # b2w module (encoder: 1 layer, 1 head; decoder: 1 layer, 16 heads; w/ and w/o self-attention) + BERT, pretrained on enwiki+bookcorpus (20 GB)
2022-3-23 # byte-encoder (16 heads, mean pooling; see the sketch after this list) + BERT, pretrained on enwiki+bookcorpus (20 GB), training early-exited (24 h budget) on 1x3090
2022-3-23 # b2w module (encoder: 1 layer, 8 heads; decoder: 1 layer, 16 heads; w/o self-attention) + BERT, pretrained on enwiki+bookcorpus (20 GB), training early-exited
2022-3-22 # b2w module (encoder: 1 layer, 1 head; decoder: 1 layer, 16 heads; w/ and w/o self-attention) + BERT, pretrained on enwiki+bookcorpus (20 GB), training early-exited (24 h budget)
2022-3-21 # b2w module (encoder: 1 layer, 1 head; decoder: 1 layer, 16 heads; w/ self-attention) + BERT, pretrained on enwiki+bookcorpus (20 GB), training early-exited
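For contrast with the seq2seq pooling sketched under the findings above, the mean-pooling byte-encoder from the 2022-3-23 run reduces to something like this (a sketch reusing the `Byte2Word` pieces):

```python
import torch.nn as nn

def mean_pool_b2w(encoder: nn.Module, byte_emb: nn.Embedding, byte_ids):
    # Average the encoded byte states instead of learning a query.
    h = encoder(byte_emb(byte_ids))   # (num_words, word_len, d_model)
    return h.mean(dim=1)              # one embedding per word
```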