
byte2word module

Created on March 22 | Last edited on April 30

Todos

    • mean pooling baseline
    • more layers in byte2word module
    • more heads in byte-encoder
    • Charformer baseline
    • CharAware baseline

Blood, toil, tears and sweat

  • b2w module doesn't work at all on a small dataset (WikiText)
  • self-attn in the word-decoder doesn't necessarily help (performance slightly degraded)
  • more heads in the byte-encoder don't really help
  • mean-pooling is inferior, hooray for the seq2seq b2w (see the sketch below)
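A minimal PyTorch sketch of the b2w idea, assuming the wiring below (byte-encoder: 1 layer, 1 head; word-decoder: 16-head cross-attention from one learned query per word; 768-d hidden size as in BERT-base). Names and dimensions are illustrative, not the exact module.

```python
import torch
import torch.nn as nn

class Byte2Word(nn.Module):
    """Encode the bytes of a word, then cross-attend from a learned query
    to pool them into a single word embedding fed to BERT."""

    def __init__(self, d_model=768, enc_heads=1, dec_heads=16):
        super().__init__()
        self.byte_emb = nn.Embedding(256, d_model)  # one embedding per byte value
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=enc_heads, batch_first=True)
        self.byte_encoder = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.word_query = nn.Parameter(torch.randn(1, 1, d_model))  # learned "word" query
        # the "w/o self-attention" variant: a bare cross-attention block as word-decoder
        self.word_decoder = nn.MultiheadAttention(d_model, num_heads=dec_heads, batch_first=True)

    def forward(self, byte_ids):                               # (batch, n_bytes)
        h = self.byte_encoder(self.byte_emb(byte_ids))         # (batch, n_bytes, d_model)
        q = self.word_query.expand(byte_ids.size(0), -1, -1)   # (batch, 1, d_model)
        word_vec, _ = self.word_decoder(q, h, h)               # (batch, 1, d_model)
        return word_vec.squeeze(1)                             # one vector per word

# Example: embed the word "hello" from its raw UTF-8 bytes.
b2w = Byte2Word()
ids = torch.tensor([list("hello".encode("utf-8"))])            # (1, 5)
print(b2w(ids).shape)                                          # torch.Size([1, 768])
```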

Chronicles

2022-4-30


2022-4-29 Doing bugfixes and cleanups


2022-4-1 # CharAware


2022-3-31 # Charformer, NaN


2022-3-27 # Baseline selection - CharAware Embedding & Charformer
Both baselines need some adaptation before they can be used.


CharAware Embedding

  • modernize hyperparams
  • parallel small conv layers (see the sketch below)
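A rough sketch of the "parallel small conv layers" idea (char-CNN style, as in Kim et al.'s character-aware embeddings): per-character embeddings go through several narrow 1-D convolutions in parallel, each max-pooled over time and concatenated. The kernel widths and channel counts are placeholders, not the settings actually used.

```python
import torch
import torch.nn as nn

class CharAwareEmbedding(nn.Module):
    def __init__(self, n_chars=256, char_dim=16, widths=(1, 2, 3, 4, 5), channels=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # one small conv per kernel width, applied in parallel
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, channels, kernel_size=w, padding=w - 1) for w in widths)

    def forward(self, char_ids):                     # (batch, n_chars_per_word)
        x = self.char_emb(char_ids).transpose(1, 2)  # (batch, char_dim, n_chars)
        pooled = [conv(x).amax(dim=-1) for conv in self.convs]  # max-over-time per filter
        return torch.cat(pooled, dim=-1)             # (batch, channels * len(widths))

emb = CharAwareEmbedding()
ids = torch.tensor([list("hello".encode("utf-8"))])
print(emb(ids).shape)                                # torch.Size([1, 320])
```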

Charformer

  • upsampling method (not mentioned in the original paper; one possibility is sketched below)
  • subword span masking
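Since the original paper leaves upsampling unspecified, one simple option, sketched here under our own assumptions, is to repeat each downsampled vector back to byte resolution and trim the block padding:

```python
import torch

def repeat_upsample(down, rate, n_bytes):
    """down: (batch, ceil(n_bytes / rate), d_model) -> (batch, n_bytes, d_model)."""
    up = down.repeat_interleave(rate, dim=1)  # copy each pooled vector `rate` times
    return up[:, :n_bytes]                    # trim the padding of the last block

x = torch.randn(2, 8, 16)                     # 30 bytes padded to 32, downsampled by rate 4
print(repeat_upsample(x, rate=4, n_bytes=30).shape)  # torch.Size([2, 30, 16])
```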

2022-3-25 # b2w module (encoder: 1 layer, 1 head; decoder: 1 layer, 16 heads; w/ and w/o self-attention) + BERT, pretrained on enwiki+bookcorpus (20GB)
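One way the "w/ self-attention" word-decoder variant could be realized is with a full nn.TransformerDecoderLayer (self-attention over the word queries plus cross-attention to the byte states) instead of a bare cross-attention block; the batch and sequence sizes below are made up for illustration.

```python
import torch
import torch.nn as nn

d_model, dec_heads = 768, 16
decoder = nn.TransformerDecoderLayer(d_model, nhead=dec_heads, batch_first=True)

queries = torch.randn(2, 10, d_model)      # learned queries, one per word position
byte_states = torch.randn(2, 64, d_model)  # byte-encoder outputs for the whole sequence
word_vecs = decoder(tgt=queries, memory=byte_states)  # self-attn over queries + cross-attn to bytes
print(word_vecs.shape)                     # torch.Size([2, 10, 768])
```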


2022-3-23 # byte-encoder (16 heads, mean pooling) + BERT, pretrained on enwiki+bookcorpus (20GB), training early-exited (24h budget) on 1x3090
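In the mean-pooling variant of this run, the word-decoder is replaced by a plain (masked) average over the byte-encoder outputs; a minimal sketch with assumed shapes:

```python
import torch

def mean_pool(byte_hidden, byte_mask):
    """byte_hidden: (batch, n_bytes, d_model); byte_mask: (batch, n_bytes), 1 = real byte."""
    mask = byte_mask.unsqueeze(-1).float()
    return (byte_hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)

h = torch.randn(2, 12, 768)                      # byte-encoder outputs
m = torch.tensor([[1] * 5 + [0] * 7, [1] * 12])  # padding mask
print(mean_pool(h, m).shape)                     # torch.Size([2, 768])
```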


2022-3-23 # b2w module (encoder: 1 layer, 8 heads; decoder: 1 layer, 16 heads; w/o self-attention) + BERT, pretrained on enwiki+bookcorpus (20GB), training early-exited


2022-3-22 # b2w module (encoder: 1 layer, 1 head; decoder: 1 layer, 16 heads; w/ and w/o self-attention) + BERT, pretrained on enwiki+bookcorpus (20GB), training early-exited (24h budget)


2022-3-21 # b2w module (encoder: 1 layer, 1 head; decoder: 1 layer, 16 heads; w/ self-attention) + BERT, pretrained on enwiki+bookcorpus (20GB), training early-exited