Kastan's group workspace

Aug-05__12:37


q_allBF16_gpt_8B_PP2_TP8_3d

Notes
Tags
Aug-05__12:37
BATCH_SIZE=16
MICRO_BATCH_SIZE=4
NUM_EPOCHS=3
NUM_MICRO_BATCHES=16
PP=2
SLURM=513717
TP=8
WORLD_SIZE=64
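
The tags above describe the parallel layout of the run. As a rough sketch, assuming the common decomposition in which the world size is the product of the data-, pipeline-, and tensor-parallel sizes, the implied data-parallel size can be worked out as follows. The function names are illustrative and do not come from the run's code. The cube check reflects the requirement, stated in Colossal-AI's documentation for its 3D tensor mode, that the tensor-parallel size be a perfect cube; treat it as an assumption here.

```python
# Values copied from the run tags.
WORLD_SIZE = 64
PP = 2  # pipeline-parallel stages
TP = 8  # tensor-parallel size (mode "3d")


def data_parallel_size(world_size: int, pp: int, tp: int) -> int:
    """Return DP, assuming world_size == DP * PP * TP (an assumption)."""
    model_parallel = pp * tp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size {world_size} is not divisible by PP*TP = {model_parallel}"
        )
    return world_size // model_parallel


def is_perfect_cube(n: int) -> bool:
    """Integer-only check that n == k**3 for some positive integer k."""
    return any(k ** 3 == n for k in range(1, n + 1))


dp = data_parallel_size(WORLD_SIZE, PP, TP)
print(f"data-parallel size: {dp}")
print(f"TP={TP} is a perfect cube: {is_perfect_cube(TP)}")
```

Under these assumptions the run uses 4 data-parallel replicas, each spanning 2 × 8 = 16 GPUs, and TP=8 (2³) passes the cube check.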
Author
State
Crashed
Start time
August 5th, 2022 5:38:42 PM
Runtime
12s
Tracked hours
10s
Run path
kastan/LLM-Distributed-Quantization/uc90j7ew
OS
Linux-4.18.0-305.49.1.el8_4.x86_64-x86_64-with-glibc2.28
Python version
3.9.12
Command
/u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/v2_train.py --config /u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/configs/q_allBF16_gpt_8B_PP2_TP8_3d.py --host gpub036 --port 29500 --world_size 64 --rank 36
System Hardware
CPU count 64
GPU count 4
GPU type NVIDIA A40
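
The command was launched with `--world_size 64 --rank 36`, and this host reports 4 GPUs. A small sketch of how a global rank could map onto a node and a local GPU, assuming every node has the same 4 GPUs and that ranks are assigned contiguously node by node. Neither assumption is confirmed by this page, and `locate_rank` and `node_count` are hypothetical helpers.

```python
# Values copied from the command line and the hardware summary.
WORLD_SIZE = 64
RANK = 36
GPUS_PER_NODE = 4  # assumption: all nodes match this host


def node_count(world_size: int, gpus_per_node: int) -> int:
    """Number of nodes needed if every node contributes gpus_per_node ranks."""
    if world_size % gpus_per_node != 0:
        raise ValueError("world_size is not a multiple of gpus_per_node")
    return world_size // gpus_per_node


def locate_rank(rank: int, gpus_per_node: int) -> tuple[int, int]:
    """Return (node_index, local_rank), assuming contiguous rank assignment."""
    return divmod(rank, gpus_per_node)


nodes = node_count(WORLD_SIZE, GPUS_PER_NODE)
node_index, local_rank = locate_rank(RANK, GPUS_PER_NODE)
print(f"{nodes} nodes; rank {RANK} -> node {node_index}, local GPU {local_rank}")
```

Under these assumptions the job spans 16 nodes, and rank 36 would be the first GPU on the tenth node (node index 9).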
W&B CLI Version
0.13.0
Config

Config parameters are your model's inputs.

  • {} 29 keys
    • 16
    • 1
    • "col_ai_quant"
    • "/u/kastanday/LLM-Distributed-Quantization/datasets/small-gpt-dataset.json"
    • {} 1 key
      • "AMP_TYPE.NAIVE"
    • 4
    • 0.00015
    • "./quant_gpt2_3d_tp8_bs16_lr0.00015/"
    • {} 1 key
      • "titans.loss.lm_loss.gpt_lmloss.GPTLMLoss"
    • 4
    • {} 7 keys
      • {} 4 keys
        • "torch.float16"
        • "torch.float16"
        • "torch.float16"
        • "torch.float16"
      • 3
      • "4"
      • 16
      • {} 2 keys
        • 0.00015
        • 0.01
      • {} 2 keys
        • 2
        • {} 2 keys
          • "3d"
          • 8
      • 2
      • "titans.model.quant_gpt.quant_gpt.quant_gpt2_8B"
      • "titans.model.quant_gpt.quant_gpt.quant_gpt2_xl"
      • {} 4 keys
        • 16
        • true
        • [] 3 items
          • 4
          • 1,024
          • 3,072
        • "colossalai.engine.schedule._pipeline_schedule.PipelineSchedule"
      • 1,024
      • "3d"
      • 8
      • 64
      • "64"
      • 50,304
      • 1
      • 0.01
Summary

Summary metrics are your model's outputs.

  • {} 0 keys