nbroad

8-bit Adam vs 32-bit Adam tests

I ran Longformer base for 3 epochs on the Feedback Prize data with 3 different seeds and 3 configurations:

1. 8-bit Adam and 8-bit embeddings
2. 8-bit Adam and 32-bit embeddings
3. 32-bit Adam and 32-bit embeddings

Every run used the same hyperparameters and data. Within a configuration, the only difference between runs was the seed. For full hyperparameter details, see the table at the very end.
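8-bit Adam saves memory by storing Adam's two moment buffers as 8-bit codes with one scale per block, instead of as 32-bit floats. The sketch below illustrates that idea and the resulting state-size arithmetic. It assumes a bitsandbytes-style block-wise scheme and is not the code used for these runs. It is simplified in two ways: it uses plain linear absmax quantization, whereas bitsandbytes uses a non-linear dynamic code, and its block size (`BLOCK_SIZE = 2048`) is a hypothetical choice. The helper names are also hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical block size for this sketch; real libraries choose their own.
BLOCK_SIZE = 2048


def quantize_blockwise(values: List[float], block_size: int = BLOCK_SIZE) -> Tuple[List[int], List[float]]:
    """Map floats to signed 8-bit codes in [-127, 127] with one absmax scale per block.

    Simplified linear quantization; bitsandbytes uses a non-linear (dynamic) code instead.
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # guard against all-zero blocks
        scales.append(absmax)
        codes.extend(round(v / absmax * 127) for v in block)
    return codes, scales


def dequantize_blockwise(codes: List[int], scales: List[float], block_size: int = BLOCK_SIZE) -> List[float]:
    """Recover approximate floats by rescaling each code with its block's absmax."""
    return [c / 127 * scales[i // block_size] for i, c in enumerate(codes)]


def adam_state_bytes(num_params: int, state_bits: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes for Adam's two moment buffers (m and v); weights and gradients are not counted."""
    per_buffer = num_params * state_bits // 8
    if state_bits == 8:
        # One float32 scale per block of 8-bit codes.
        per_buffer += math.ceil(num_params / block_size) * 4
    return 2 * per_buffer
```

For a hypothetical 100M-parameter model, `adam_state_bytes(100_000_000, 32)` gives 800,000,000 bytes of optimizer state, while the 8-bit version comes to roughly 200 MB. The per-block scale keeps one outlier from crushing the precision of every other value in the tensor.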

nbroad

2022-01-25


8-bit Adam vs 32-bit Adam

A comparison of training with 8-bit Adam versus 32-bit Adam. I had a slight error in the per-discourse-type recall calculations, so those scores are not shown. This was only one run, so the results are not conclusive. On other projects I've seen larger differences in training time between the two, so your mileage may vary!

nbroad

2022-01-17


[Feedback Prize] Bigbird-base NER fine-tuning

It turns out I was calculating my F1 score incorrectly, so my CV values are now much higher. As a result, there aren't as many runs now.
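Because the F1 calculation is where things went wrong, here is a sketch of the overlap-based F1 described for the Feedback Prize competition. It is based on my reading of the public evaluation description, not on the author's code. A prediction matches a ground-truth span of the same discourse type in the same essay when the shared words cover at least half of each span. Each discourse type is scored as TP / (TP + 0.5 * (FP + FN)), and the scores are macro-averaged. The span representation (`Span` as a set of word indices), the greedy one-to-one tie-breaking, and the function names are assumptions. Edge cases may differ from the official Kaggle scorer.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (essay_id, discourse_type, set of word indices); spans are non-empty.
Span = Tuple[str, str, Set[int]]


def overlaps_enough(gt_words: Set[int], pred_words: Set[int], threshold: float = 0.5) -> bool:
    """A pair counts as a match only if the overlap covers >= threshold of BOTH spans."""
    overlap = len(gt_words & pred_words)
    return overlap / len(gt_words) >= threshold and overlap / len(pred_words) >= threshold


def per_class_f1(ground_truth: List[Span], predictions: List[Span]) -> Dict[str, float]:
    """F1 = TP / (TP + 0.5 * (FP + FN)) for each discourse type present in either list."""
    classes = {c for _, c, _ in ground_truth} | {c for _, c, _ in predictions}
    scores: Dict[str, float] = {}
    for cls in sorted(classes):
        gts = [(e, w) for e, c, w in ground_truth if c == cls]
        preds = [(e, w) for e, c, w in predictions if c == cls]
        # Collect every qualifying (ground truth, prediction) pair within the same essay.
        candidates = []
        for gi, (g_essay, g_words) in enumerate(gts):
            for pi, (p_essay, p_words) in enumerate(preds):
                if g_essay == p_essay and overlaps_enough(g_words, p_words):
                    inter = len(g_words & p_words)
                    score = max(inter / len(g_words), inter / len(p_words))
                    candidates.append((score, gi, pi))
        # Greedily keep the highest-overlap pairs so each span is matched at most once.
        used_gt, used_pred = set(), set()
        for _, gi, pi in sorted(candidates, reverse=True):
            if gi not in used_gt and pi not in used_pred:
                used_gt.add(gi)
                used_pred.add(pi)
        tp = len(used_gt)
        fn = len(gts) - tp  # unmatched ground-truth spans
        fp = len(preds) - tp  # unmatched predictions
        denom = tp + 0.5 * (fp + fn)
        scores[cls] = tp / denom if denom else 0.0
    return scores


def macro_f1(ground_truth: List[Span], predictions: List[Span]) -> float:
    """Average the per-class F1 scores."""
    scores = per_class_f1(ground_truth, predictions)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

Per-class recall, TP / (TP + FN), falls out of the same matching step, which is why an error in the matching logic affects every per-type score at once.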

nbroad

2021-12-26


BigBird Base In-Domain Pre-training Results

Using Masked Language Modeling to adapt the model to the domain of high school essays.
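Masked language modeling hides a fraction of the input tokens and trains the model to recover them from context. Continuing this on the essays nudges the model's representations toward that domain. The report doesn't state its masking settings, so the sketch below assumes the BERT defaults: select 15% of non-special tokens, then replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged. These are also the defaults of Hugging Face's `DataCollatorForLanguageModeling`. The function name and the token IDs used with it are hypothetical.

```python
import random
from typing import List, Set, Tuple

IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips this label value by default


def mask_tokens(
    token_ids: List[int],
    mask_token_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_probability: float = 0.15,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking: select ~15% of non-special tokens, then 80% mask, 10% random, 10% unchanged."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mlm_probability:
            continue  # not selected: no loss on this position
        labels[i] = tok  # the model is trained to recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token_id
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # Otherwise keep the original token.
    return inputs, labels
```

Only selected positions carry a label, so the loss is computed on roughly 15% of tokens per sequence. The random and unchanged replacements keep the model from relying on the mask token always being present at prediction time.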

nbroad

2021-12-20