8-bit Adam vs 32-bit Adam
A comparison of training with 8-bit Adam versus 32-bit Adam. Due to a small error in the per-discourse-type recall calculations, those scores are not shown. This was only a single run, so the results are not conclusive; on other projects I've seen a larger gap in training times, so your mileage may vary!
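The only difference between the two runs is the optimizer itself. Here's a minimal sketch of the swap, assuming the bitsandbytes library (the `build_optimizer` helper is mine for illustration, not from the actual training code):

```python
import bitsandbytes as bnb
import torch

def build_optimizer(model, use_8bit: bool, lr: float = 3e-5, weight_decay: float = 0.01):
    """Return standard 32-bit Adam or the bitsandbytes 8-bit variant."""
    if use_8bit:
        # 8-bit Adam quantizes the optimizer state (the first/second moment
        # tensors) to 8 bits, cutting optimizer memory roughly 4x; the model
        # weights and gradients themselves are untouched.
        return bnb.optim.Adam8bit(model.parameters(), lr=lr, weight_decay=weight_decay)
    return torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
```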
Training parameters and results
Parameter / metric | 32-bit Adam | 8-bit Adam |
---|---|---|
model name | allenai/longformer-large-4096 | allenai/longformer-large-4096 |
max seq length | 2048 | 2048 |
runtime (s) | 7267 | 7232 |
seed | 18 | 18 |
per_device_train_batch_size | 1 | 1 |
fp16 | TRUE | TRUE |
weight_decay | 0.01 | 0.01 |
learning_rate | 0.00003 | 0.00003 |
warmup_ratio | 0.1 | 0.1 |
lr_scheduler_type | linear | linear |
num_train_epochs | 1 | 1 |
gradient_accumulation_steps | 8 | 8 |
eval/accuracy | 0.7950 | 0.7943 |
eval/f1 | 0.2372 | 0.2362 |
eval/precision | 0.1824 | 0.1829 |
eval/recall | 0.3391 | 0.3331 |
eval/Claim_CV_F1 | 0.4869 | 0.4844 |
eval/Claim_CV_Precision | 0.6373 | 0.6464 |
eval/Concluding Statement_CV_F1 | 0.7531 | 0.7632 |
eval/Concluding Statement_CV_Precision | 0.7582 | 0.7639 |
eval/Counterclaim_CV_F1 | 0.4820 | 0.4796 |
eval/Counterclaim_CV_Precision | 0.5369 | 0.5212 |
eval/Evidence_CV_F1 | 0.6158 | 0.6220 |
eval/Evidence_CV_Precision | 0.7343 | 0.7293 |
eval/Lead_CV_F1 | 0.7968 | 0.8006 |
eval/Lead_CV_Precision | 0.8015 | 0.8006 |
eval/Overall_CV_F1 | 0.5946 | 0.5972 |
eval/Position_CV_F1 | 0.6565 | 0.6494 |
eval/Position_CV_Precision | 0.7551 | 0.7483 |
eval/Rebuttal_CV_F1 | 0.3709 | 0.3812 |
eval/Rebuttal_CV_Precision | 0.4801 | 0.4859 |
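For reference, here's a hedged sketch of how these hyperparameters map onto the Hugging Face `TrainingArguments`, with the 8-bit optimizer passed in through `Trainer`'s `optimizers` argument; the dataset variables are placeholders, not the actual competition pipeline:

```python
import bitsandbytes as bnb
from transformers import AutoModelForTokenClassification, Trainer, TrainingArguments

model = AutoModelForTokenClassification.from_pretrained(
    "allenai/longformer-large-4096"  # num_labels would depend on the labeling scheme
)

args = TrainingArguments(
    output_dir="out",
    seed=18,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fp16=True,
    learning_rate=3e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    num_train_epochs=1,
)

# The 8-bit run swaps in bitsandbytes' Adam; the 32-bit run uses torch.optim.Adam.
optimizer = bnb.optim.Adam8bit(
    model.parameters(), lr=args.learning_rate, weight_decay=args.weight_decay
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,        # placeholder: dataset tokenized to max length 2048
    eval_dataset=eval_ds,          # placeholder
    optimizers=(optimizer, None),  # None -> Trainer builds the linear warmup schedule
)
trainer.train()
```

Newer transformers versions also expose this directly via `TrainingArguments(optim="adamw_bnb_8bit")`, though note that option uses the AdamW variant rather than plain Adam.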
Plotted Results
[Interactive W&B panels: run set of 2 runs (32-bit vs. 8-bit Adam)]