H100 vs A100 & BF16 vs FP8
Hindi Data Pretraining on Llama2 7B
Bharat Giddwani
Created on December 1 | Last edited on December 1
Loss and Train Time
[Figure: six panels, each plotted against training step (0 to 2k): global_step, reduced_train_loss, grad_norm, train_backward_timing (s), train_step_timing (s), and consumed_samples.]
Run set: 3 runs