
8-bit Adam vs 32-bit Adam

A comparison of training with 8-bit Adam versus 32-bit Adam. I made a small error in the recall calculation for each discourse type, so those scores are not shown. This is a single run of each configuration, so the results are not conclusive. On other projects I've seen a larger difference in training time than here, so your mileage may vary!
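To put the training-time difference in numbers for this comparison, the sketch below computes it from the two `runtime` values reported in the results. It assumes those values are in seconds, which is how Weights & Biases reports run duration.

```python
# Runtime values copied from the results; assumed to be in seconds (W&B run duration).
runtime_32bit = 7267
runtime_8bit = 7232


def relative_saving(baseline: float, candidate: float) -> float:
    """Fraction of the baseline runtime saved by the candidate run."""
    return (baseline - candidate) / baseline


saved_seconds = runtime_32bit - runtime_8bit
saved_fraction = relative_saving(runtime_32bit, runtime_8bit)
print(f"8-bit Adam saved {saved_seconds} s ({saved_fraction:.2%}) on this single run")
```

With the values from this comparison it prints `8-bit Adam saved 35 s (0.48%) on this single run`.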
Created on January 17 | Last edited on January 17

Training parameters and results


The listed settings were identical for both runs:

| Parameter | Value (both runs) |
|---|---|
| model name | allenai/longformer-large-4096 |
| max seq length | 2048 |
| seed | 18 |
| per_device_train_batch_size | 1 |
| fp16 | TRUE |
| weight_decay | 0.01 |
| learning_rate | 0.00003 |
| warmup_ratio | 0.1 |
| lr_scheduler_type | linear |
| num_train_epochs | 1 |
| gradient_accumulation_steps | 8 |

| Metric | 32-bit Adam | 8-bit Adam |
|---|---|---|
| runtime (s) | 7267 | 7232 |
| eval/accuracy | 0.7950 | 0.7943 |
| eval/f1 | 0.2372 | 0.2362 |
| eval/precision | 0.1824 | 0.1829 |
| eval/recall | 0.3391 | 0.3331 |
| eval/Overall_CV_F1 | 0.5946 | 0.5972 |

Per-discourse-type CV scores (logged as eval/<Type>_CV_F1 and eval/<Type>_CV_Precision):

| Discourse type | CV F1, 32-bit | CV F1, 8-bit | CV precision, 32-bit | CV precision, 8-bit |
|---|---|---|---|---|
| Claim | 0.4869 | 0.4844 | 0.6373 | 0.6464 |
| Concluding Statement | 0.7531 | 0.7632 | 0.7582 | 0.7639 |
| Counterclaim | 0.4820 | 0.4796 | 0.5369 | 0.5212 |
| Evidence | 0.6158 | 0.6220 | 0.7343 | 0.7293 |
| Lead | 0.7968 | 0.8006 | 0.8015 | 0.8006 |
| Position | 0.6565 | 0.6494 | 0.7551 | 0.7483 |
| Rebuttal | 0.3709 | 0.3812 | 0.4801 | 0.4859 |

Metric values are rounded to four decimal places.
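The reported eval/Overall_CV_F1 values appear to be the unweighted (macro) mean of the seven per-type CV F1 scores: averaging the per-type values above reproduces 0.594571... for the 32-bit run and 0.5972 for the 8-bit run. A quick check in plain Python, which also lists the per-type differences between the two runs:

```python
from statistics import mean

# Per-discourse-type CV F1 scores copied from the results above.
cv_f1 = {
    "32bit": {"Claim": 0.4869, "Concluding Statement": 0.7531, "Counterclaim": 0.4820,
              "Evidence": 0.6158, "Lead": 0.7968, "Position": 0.6565, "Rebuttal": 0.3709},
    "8bit": {"Claim": 0.4844, "Concluding Statement": 0.7632, "Counterclaim": 0.4796,
             "Evidence": 0.6220, "Lead": 0.8006, "Position": 0.6494, "Rebuttal": 0.3812},
}

# Macro F1: every discourse type counts equally, regardless of how often it occurs.
overall = {run: mean(list(scores.values())) for run, scores in cv_f1.items()}
for run, score in overall.items():
    print(f"{run}: Overall_CV_F1 = {score:.4f}")

# Per-type difference (8-bit minus 32-bit); positive means the 8-bit run scored higher.
deltas = {t: cv_f1["8bit"][t] - cv_f1["32bit"][t] for t in cv_f1["32bit"]}
for discourse_type, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{discourse_type:>22}: {delta:+.4f}")
```

Because the overall score is a plain average, a gain on a rare type such as Rebuttal moves it as much as the same gain on a common type such as Claim.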
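The parameter names above (per_device_train_batch_size, gradient_accumulation_steps, warmup_ratio, lr_scheduler_type) match Hugging Face `TrainingArguments`. Assuming the runs used the `Trainer` on a single GPU (the GPU count is not reported), each optimizer step saw 1 × 8 = 8 sequences, and the linear schedule with `warmup_ratio=0.1` ramps the learning rate up over the first 10% of optimizer steps, then decays it linearly to zero. The sketch below reproduces that schedule shape in plain Python for illustration only. `num_gpus` and `total_steps` are hypothetical values, because the report gives neither the hardware nor the dataset size.

```python
import math

# Values from the training-parameter table.
learning_rate = 3e-5
warmup_ratio = 0.1
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 1  # hypothetical: not reported

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

total_steps = 1000  # hypothetical number of optimizer steps in the single epoch
warmup_steps = math.ceil(warmup_ratio * total_steps)


def linear_schedule_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = total_steps - step
    return learning_rate * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(f"effective batch size: {effective_batch_size}, warmup steps: {warmup_steps}")
for step in (0, 50, 100, 550, 1000):
    print(step, f"{linear_schedule_lr(step):.2e}")
```

With one epoch, the peak learning rate of 3e-5 is reached after 10% of training and the second half of the decay happens during the last ~45% of steps, so both optimizers saw exactly the same schedule.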


Plotted results


Figure: eval CV F1 per discourse type (Claim, Concluding Statement, Counterclaim, Evidence, Lead, Position, Rebuttal) and eval/Overall_CV_F1, plotted against train/epoch for the runs longformer-2k-32bit-test and longformer-2k-8bit-test.


