8-bit Adam vs 32-bit Adam
A comparison of training with 8-bit Adam versus 32-bit Adam. Due to a small error in the per-discourse-type recall calculations, those scores are not shown. This was only a single run, so the results are not conclusive; on other projects I've seen a larger gap in training times, so your mileage may vary!
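The only difference between the two runs is the optimizer itself. Here's a minimal sketch of the swap, assuming the bitsandbytes library (the `build_optimizer` helper is mine for illustration, not from the actual training code):

```python
import bitsandbytes as bnb
import torch

def build_optimizer(model, use_8bit: bool, lr: float = 3e-5, weight_decay: float = 0.01):
    """Return standard 32-bit Adam or the bitsandbytes 8-bit variant."""
    if use_8bit:
        # 8-bit Adam quantizes the optimizer state (the first/second moment
        # tensors) to 8 bits, cutting optimizer memory roughly 4x; the model
        # weights and gradients themselves are untouched.
        return bnb.optim.Adam8bit(model.parameters(), lr=lr, weight_decay=weight_decay)
    return torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
```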
Training parameters and results
Parameter / metric | 32-bit Adam | 8-bit Adam |
---|---|---|
model name | allenai/longformer-large-4096 | allenai/longformer-large-4096 |
max seq length | 2048 | 2048 |
runtime (s) | 7267 | 7232 |
seed | 18 | 18 |
per_device_train_batch_size | 1 | 1 |
fp16 | TRUE | TRUE |
weight_decay | 0.01 | 0.01 |
learning_rate | 0.00003 | 0.00003 |
warmup_ratio | 0.1 | 0.1 |
lr_scheduler_type | linear | linear |
num_train_epochs | 1 | 1 |
gradient_accumulation_steps | 8 | 8 |
eval/accuracy | 0.7950 | 0.7943 |
eval/f1 | 0.2372 | 0.2362 |
eval/precision | 0.1824 | 0.1829 |
eval/recall | 0.3391 | 0.3331 |
eval/Claim_CV_F1 | 0.4869 | 0.4844 |
eval/Claim_CV_Precision | 0.6373 | 0.6464 |
eval/Concluding Statement_CV_F1 | 0.7531 | 0.7632 |
eval/Concluding Statement_CV_Precision | 0.7582 | 0.7639 |
eval/Counterclaim_CV_F1 | 0.4820 | 0.4796 |
eval/Counterclaim_CV_Precision | 0.5369 | 0.5212 |
eval/Evidence_CV_F1 | 0.6158 | 0.6220 |
eval/Evidence_CV_Precision | 0.7343 | 0.7293 |
eval/Lead_CV_F1 | 0.7968 | 0.8006 |
eval/Lead_CV_Precision | 0.8015 | 0.8006 |
eval/Overall_CV_F1 | 0.5946 | 0.5972 |
eval/Position_CV_F1 | 0.6565 | 0.6494 |
eval/Position_CV_Precision | 0.7551 | 0.7483 |
eval/Rebuttal_CV_F1 | 0.3709 | 0.3812 |
eval/Rebuttal_CV_Precision | 0.4801 | 0.4859 |
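For reference, here's a hedged sketch of how these hyperparameters map onto the Hugging Face `TrainingArguments`, with the 8-bit optimizer passed in through `Trainer`'s `optimizers` argument; the dataset variables are placeholders, not the actual competition pipeline:

```python
import bitsandbytes as bnb
from transformers import AutoModelForTokenClassification, Trainer, TrainingArguments

model = AutoModelForTokenClassification.from_pretrained(
    "allenai/longformer-large-4096"  # num_labels would depend on the labeling scheme
)

args = TrainingArguments(
    output_dir="out",
    seed=18,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fp16=True,
    learning_rate=3e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    num_train_epochs=1,
)

# The 8-bit run swaps in bitsandbytes' Adam; the 32-bit run uses torch.optim.Adam.
optimizer = bnb.optim.Adam8bit(
    model.parameters(), lr=args.learning_rate, weight_decay=args.weight_decay
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,        # placeholder: dataset tokenized to max length 2048
    eval_dataset=eval_ds,          # placeholder
    optimizers=(optimizer, None),  # None -> Trainer builds the linear warmup schedule
)
trainer.train()
```

Newer transformers versions also expose this directly via `TrainingArguments(optim="adamw_bnb_8bit")`, though note that option uses the AdamW variant rather than plain Adam.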
Plotted Results
[Interactive W&B panels: run set of 2 runs (32-bit vs. 8-bit Adam)]