
QLoRA memory consumption

A comparison of memory consumption when fine-tuning Mistral 7B with axolotl under different configs

We fine-tune Mistral 7B on a sample of the Alpaca dataset.

We use the same effective batch size in all experiments:
$$\text{Effective\_bs} = \text{micro\_bs} \times \text{grad\_accumulation\_steps} = 16$$

  • If micro_bs = 1, then grad_accum = 16.
  • If micro_bs = 4, then grad_accum = 4.

This ensures the same number of optimizer steps and weight updates across all runs.
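As a concrete illustration, here is a minimal sketch of the two batch-size settings, using only the axolotl keys that appear in the run table below; everything else in the config is assumed to stay fixed:

```yaml
# micro_bs = 1: effective_bs = 1 * 16 = 16
micro_batch_size: 1
gradient_accumulation_steps: 16
---
# micro_bs = 4: effective_bs = 4 * 4 = 16
micro_batch_size: 4
gradient_accumulation_steps: 4
```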




Run comparison (axolotl config, runtime, and final summary value per run):

| | bs=1, gc=true, target=linear+head | bs=1, gc=true, target=linear | bs=1, gc=false, target=linear | bs=1, gc=false, target=q,v | bs=4, gc=true, target=linear |
|---|---|---|---|---|---|
| runtime | 19m 31s | 19m 26s | 13m 33s | 11m 40s | 15m 11s |
| micro_batch_size | 1 | 1 | 1 | 1 | 4 |
| gradient_accumulation_steps | 16 | 16 | 16 | 16 | 4 |
| gradient_checkpointing | true | true | false | false | true |
| gradient_checkpointing_kwargs | false | false | - | - | false |
| lora_target_linear | true | true | true | - | true |
| lora_target_modules | - | - | - | ["q_proj","v_proj"] | - |
| lora_modules_to_save | ["lm_head"] | - | - | - | ["lm_head"] |
| logging dir | ./qlora-out/runs/Jan07_12-32-12_1996c6d5ac6c | ./qlora-out/runs/Jan07_12-09-04_1996c6d5ac6c | ./qlora-out/runs/Jan07_11-05-34_1996c6d5ac6c | ./qlora-out/runs/Jan05_17-13-12_64dcd0883459 | ./qlora-out/runs/Jan05_15-43-47_64dcd0883459 |
| train (summary) | 0.85 | 0.86 | 0.85 | 0.85 | 0.97 |
[Chart: per-run curves (y-axis 0.0–1.1) for the five runs: bs=1, gc=true, target=linear+head; bs=1, gc=true, target=linear; bs=1, gc=false, target=linear; bs=1, gc=false, target=q,v; bs=4, gc=true, target=linear]
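For readers less familiar with axolotl's LoRA options, here is a minimal sketch of the three adapter-targeting variants compared above. It uses only keys that appear in the run table; all other settings are assumed fixed:

```yaml
# Variant 1: adapt every linear layer, and additionally keep lm_head
# fully trainable (it is trained and saved in full, not as an adapter).
lora_target_linear: true
lora_modules_to_save: ["lm_head"]
---
# Variant 2: adapt every linear layer only.
lora_target_linear: true
---
# Variant 3: adapt only the attention query/value projections
# (the smallest set of trainable adapter weights in this comparison).
lora_target_modules: ["q_proj", "v_proj"]
```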