Kastan's group workspace

Aug-05__13:54

gpt2_PP2_TP4_2d

Notes
Tags
Aug-05__13:54
BATCH_SIZE=32
MICRO_BATCH_SIZE=4
NUM_EPOCHS=20
NUM_MICRO_BATCHES=8
PP=2
SLURM=513916
TP=4
WORLD_SIZE=8
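
The tags above can be cross-checked against each other. The following is a minimal sketch, not part of the run's code: `check_layout` is a hypothetical helper, and it assumes that the run uses no data parallelism (so WORLD_SIZE should equal PP × TP), that BATCH_SIZE is split evenly into NUM_MICRO_BATCHES micro-batches, and that 2D tensor parallelism needs a perfect-square TP size.

```python
import math

# Values copied from the run's tags.
tags = {
    "BATCH_SIZE": 32,
    "MICRO_BATCH_SIZE": 4,
    "NUM_MICRO_BATCHES": 8,
    "PP": 2,
    "TP": 4,
    "WORLD_SIZE": 8,
}


def check_layout(t):
    """Return a list of inconsistencies found in the tags (empty if none).

    Hypothetical helper; the three rules below are assumptions, not
    checks taken from the training script.
    """
    problems = []
    # Assumption: data-parallel size is 1, so every rank is accounted
    # for by pipeline stages times tensor-parallel ranks.
    if t["PP"] * t["TP"] != t["WORLD_SIZE"]:
        problems.append("PP * TP != WORLD_SIZE")
    # Assumption: the batch is split evenly into micro-batches.
    if t["MICRO_BATCH_SIZE"] * t["NUM_MICRO_BATCHES"] != t["BATCH_SIZE"]:
        problems.append("MICRO_BATCH_SIZE * NUM_MICRO_BATCHES != BATCH_SIZE")
    # Assumption: a 2D tensor-parallel mesh is q x q, so TP must be q**2.
    q = math.isqrt(t["TP"])
    if q * q != t["TP"]:
        problems.append("TP is not a perfect square")
    return problems


problems = check_layout(tags)
print(problems or "tags are consistent")
```

Under these assumptions the tags agree with each other (2 × 4 = 8 ranks, 4 × 8 = 32 samples per batch), so the crash is unlikely to come from a mismatch among these values alone.
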
Author
State
Crashed
Start time
August 5th, 2022 6:55:24 PM
Runtime
2m 15s
Tracked hours
29s
Run path
kastan/LLM-Distributed-Quantization/jtjcwm6i
OS
Linux-4.18.0-305.49.1.el8_4.x86_64-x86_64-with-glibc2.28
Python version
3.9.12
Command
/u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/v2_train.py --config /u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/configs/gpt2_PP2_TP4_2d.py --host gpub076 --port 29500 --world_size 8 --rank 3
System Hardware
CPU count
64
GPU count
4
GPU type
NVIDIA A40
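
The command was launched with --world_size 8 and --rank 3 on a machine reporting 4 GPUs, which suggests the job spans two nodes. As a rough sketch, assuming every node has the same GPU count and global ranks are assigned contiguously node by node (neither is confirmed by this page), the node and local GPU for a rank could be derived as follows. `locate_rank` is a hypothetical helper.

```python
def locate_rank(rank, gpus_per_node):
    """Map a global rank to (node_index, local_rank).

    Assumes contiguous rank assignment: ranks 0..gpus_per_node-1 on
    node 0, the next gpus_per_node ranks on node 1, and so on.
    """
    return divmod(rank, gpus_per_node)


world_size = 8      # from --world_size
gpus_per_node = 4   # from "GPU count"

# Ceiling division: number of nodes needed to host all ranks.
num_nodes = -(-world_size // gpus_per_node)

node_index, local_rank = locate_rank(3, gpus_per_node)
print(num_nodes, node_index, local_rank)
```

With these inputs, rank 3 would be the last GPU on the first of two nodes.
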
W&B CLI Version
0.13.0
Config

Config parameters are your model's inputs.

  • {} 26 keys
    • 32
    • 1
    • "col_ai_quant"
    • "/u/kastanday/LLM-Distributed-Quantization/datasets/small-gpt-dataset.json"
    • {} 1 key
      • "AMP_TYPE.NAIVE"
    • "titans.model.gpt.gpt.gpt2_medium"
    • 1
    • 0.00015
    • {} 1 key
      • "titans.loss.lm_loss.gpt_lmloss.GPTLMLoss"
    • 4
    • {} 4 keys
      • true
      • "torch.float16"
      • 1,024
      • 50,304
    • 20
    • "4"
    • 8
    • {} 2 keys
      • 0.00015
      • 0.01
    • {} 2 keys
      • 2
      • {} 2 keys
        • "2d"
        • 4
    • 2
    • {} 4 keys
      • 8
      • true
      • [] 3 items
        • 4
        • 1,024
        • 3,072
      • "colossalai.engine.schedule._pipeline_schedule.PipelineSchedule"
    • 1,024
    • "2d"
    • 4
    • 32
    • "8"
    • 50,304
    • 1
    • 0.01
Summary

Summary metrics are your model's outputs.

  • {} 3 keys
    • 0
    • null
    • 0.000025
Artifact Outputs

This run produced these artifacts as outputs. Total: 1.