byte2word module
Created on March 22 | Last edited on April 30
Todos
- mean pooling baseline
- more layers in byte2word module
- more heads in byte-encoder
- charformer baseline
- charaware baseline
Blood, toil, tears and sweat
- b2w module doesn't work at all on the small dataset (WikiText)
- self-attention in the word-decoder doesn't necessarily help (performance slightly degraded)
- more heads in the byte-encoder don't really help
- mean pooling is inferior; hooray for seq2seq b2w
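For reference, a minimal PyTorch sketch of the seq2seq b2w idea these findings refer to: a tiny encoder over a word's bytes plus a learned query that cross-attends to the byte states. The class name `Byte2Word`, the dimensions, and the single-query-per-word design are assumptions; the notes don't pin them down.

```python
import torch
import torch.nn as nn

class Byte2Word(nn.Module):
    """Pool a word's byte sequence into one word embedding via a tiny
    byte encoder + a learned query that cross-attends to the byte states.
    (Sketch; sizes and names are assumptions, not the exact runs below.)"""
    def __init__(self, d_model=768, enc_heads=1, dec_heads=16):
        super().__init__()
        self.byte_emb = nn.Embedding(256, d_model)
        layer = nn.TransformerEncoderLayer(d_model, enc_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        # One learned query per word; cross-attention pools the byte states.
        self.word_query = nn.Parameter(torch.randn(1, 1, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, dec_heads, batch_first=True)

    def forward(self, byte_ids):                  # (num_words, word_len) long ids
        h = self.encoder(self.byte_emb(byte_ids))
        q = self.word_query.expand(byte_ids.size(0), -1, -1)
        w, _ = self.cross_attn(q, h, h)           # (num_words, 1, d_model)
        return w.squeeze(1)                       # one vector per word, fed to BERT
```

In the "w/ self-attention" runs the word-decoder presumably also self-attends over the word queries before cross-attending; per the finding above, that variant slightly degraded performance.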
Chronicles
2022-4-30
2022-4-29 # Bugfixes and cleanups
2022-4-1 # CharAware
2022-3-31 # Charformer, NaN
2022-3-27 # Baseline selection - CharAware Embedding & Charformer
Both baselines need some adaptation before use.
CharAware Embedding
- modernize hyperparams
- parallel small conv layers
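A hedged sketch of what "parallel small conv layers" could look like, in the spirit of character-aware CNN embeddings (parallel filters of several widths, max-pooled over time and concatenated). The widths, filter counts, and the `CharAwareEmbedding` name are illustrative, not the values used in these experiments.

```python
import torch
import torch.nn as nn

class CharAwareEmbedding(nn.Module):
    def __init__(self, n_chars=256, char_dim=16, widths=(1, 2, 3, 4), n_filters=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Parallel small convolutions, one per filter width.
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_filters, kernel_size=w, padding=w // 2)
            for w in widths
        )

    def forward(self, char_ids):                      # (num_words, word_len)
        x = self.char_emb(char_ids).transpose(1, 2)   # (N, char_dim, word_len)
        # Max-pool each filter bank over time, then concatenate.
        feats = [conv(x).amax(dim=-1) for conv in self.convs]
        return torch.cat(feats, dim=-1)               # (N, n_filters * len(widths))
```

A linear projection to the BERT hidden size (plus the original model's highway layers) would follow; omitted here for brevity.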
Charformer
- upsampling method (not mentioned in the original paper)
- subword span masking
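Since the upsampling step isn't specified, one naive option (an assumption here, not taken from the Charformer paper) is to repeat each pooled block state back over its block:

```python
import torch

def upsample(h: torch.Tensor, block_size: int, target_len: int) -> torch.Tensor:
    """h: (batch, ceil(seq_len / block_size), d_model) -> (batch, target_len, d_model).
    Repeats each pooled block state over its block, then trims padding
    introduced by the blockwise downsampling."""
    return h.repeat_interleave(block_size, dim=1)[:, :target_len]
```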
2022-3-25 # b2w module (encoder: 1 layer, 1 head; decoder: 1 layer, 16 heads; w/ and w/o self-attention) + BERT, pretrained on enwiki+bookcorpus (20 GB)
2022-3-23 # byte-encoder (16 heads, mean pooling; see the sketch after this list) + BERT, pretrained on enwiki+bookcorpus (20 GB), training early-exited (24 h budget) on 1x3090
2022-3-23 # b2w module (encoder: 1 layer, 8 heads; decoder: 1 layer, 16 heads; w/o self-attention) + BERT, pretrained on enwiki+bookcorpus (20 GB), training early-exited
2022-3-22 # b2w module (encoder: 1 layer, 1 head; decoder: 1 layer, 16 heads; w/ and w/o self-attention) + BERT, pretrained on enwiki+bookcorpus (20 GB), training early-exited (24 h budget)
2022-3-21 # b2w module (encoder: 1 layer, 1 head; decoder: 1 layer, 16 heads; w/ self-attention) + BERT, pretrained on enwiki+bookcorpus (20 GB), training early-exited
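For contrast with the seq2seq pooling sketched under the findings above, the mean-pooling byte-encoder from the 2022-3-23 run reduces to something like this (a sketch reusing the `Byte2Word` pieces):

```python
import torch.nn as nn

def mean_pool_b2w(encoder: nn.Module, byte_emb: nn.Embedding, byte_ids):
    # Average the encoded byte states instead of learning a query.
    h = encoder(byte_emb(byte_ids))   # (num_words, word_len, d_model)
    return h.mean(dim=1)              # one embedding per word
```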