QLoRA memory consumption
Comparison of memory consumption for Mistral 7B fine-tuned with axolotl under different QLoRA configs
All runs fine-tune Mistral 7B on a sample of the Alpaca dataset.
We use the same effective batch size (16) in all experiments:
- if micro_bs=1, then grad_accum=16;
- if micro_bs=4, then grad_accum=4.
This keeps the number of training steps and optimizer updates identical across runs.
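As a quick sanity check, here is a minimal sketch in Python (assuming single-GPU training, so the effective batch size is simply micro_batch_size × gradient_accumulation_steps):

```python
# Both settings yield the same effective batch size, so every run
# performs the same number of optimizer updates.
configs = [
    {"micro_batch_size": 1, "gradient_accumulation_steps": 16},
    {"micro_batch_size": 4, "gradient_accumulation_steps": 4},
]

for cfg in configs:
    effective_bs = cfg["micro_batch_size"] * cfg["gradient_accumulation_steps"]
    assert effective_bs == 16, cfg
    print(cfg, "-> effective batch size:", effective_bs)
```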
Run table for the five runs (reconstructed from the exported run set; the names of the two LoRA module columns are inferred from the exported axolotl_config values, and the label of the final summary metric did not survive the export):

| Run | Runtime | micro_batch_size | gradient_accumulation_steps | gradient_checkpointing | gradient_checkpointing_kwargs | lora_target_modules | lora_modules_to_save | Summary metric | Output dir |
|-----|---------|------------------|-----------------------------|------------------------|-------------------------------|---------------------|----------------------|----------------|------------|
| 1 | 19m 31s | 1 | 16 | true | false | - | ["lm_head"] | 0.85 | ./qlora-out/runs/Jan07_12-32-12_1996c6d5ac6c |
| 2 | 19m 26s | 1 | 16 | true | false | - | - | 0.86 | ./qlora-out/runs/Jan07_12-09-04_1996c6d5ac6c |
| 3 | 13m 33s | 1 | 16 | false | - | - | - | 0.85 | ./qlora-out/runs/Jan07_11-05-34_1996c6d5ac6c |
| 4 | 11m 40s | 1 | 16 | false | - | ["q_proj","v_proj"] | - | 0.85 | ./qlora-out/runs/Jan05_17-13-12_64dcd0883459 |
| 5 | 15m 11s | 4 | 4 | true | false | - | ["lm_head"] | 0.97 | ./qlora-out/runs/Jan05_15-43-47_64dcd0883459 |
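For orientation, here is a minimal sketch of what these config fields correspond to outside axolotl, using transformers + peft + bitsandbytes. The model name, LoRA rank, and alpha are illustrative assumptions, not values taken from the runs above:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization: the "Q" in QLoRA; base weights are stored in 4 bit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # illustrative; any causal LM works
    quantization_config=bnb_config,
    device_map="auto",
)

# gradient_checkpointing: true -> trade extra compute for lower activation memory.
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}  # assumed kwargs
)

# lora_target_modules / lora_modules_to_save as in the table above;
# r and lora_alpha are placeholder values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    modules_to_save=["lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Restricting target_modules to q_proj and v_proj trains fewer adapter weights, while modules_to_save keeps a full trainable copy of the listed layer (here lm_head), which adds optimizer state and memory on top of the adapters.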
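Since the comparison is ultimately about memory, here is a minimal sketch of how peak GPU memory could be read around a training step with plain PyTorch (W&B's built-in system metrics work differently; this is only a manual cross-check):

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run one or more training steps here ...

peak_alloc = torch.cuda.max_memory_allocated() / 2**30    # tensors actually allocated
peak_reserved = torch.cuda.max_memory_reserved() / 2**30  # held by the caching allocator
free, total = torch.cuda.mem_get_info()                   # device-level view, in bytes

print(f"peak allocated: {peak_alloc:.2f} GiB, peak reserved: {peak_reserved:.2f} GiB")
print(f"device usage: {(total - free) / total:.0%} of {total / 2**30:.0f} GiB")
```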