Kastan's group workspace

Aug-05__13:58

gpt2_PP2_TP4_2d

Notes
Tags
Aug-05__13:58
BATCH_SIZE=32
MICRO_BATCH_SIZE=4
NUM_EPOCHS=20
NUM_MICRO_BATCHES=8
PP=2
SLURM=513923
TP=4
WORLD_SIZE=16
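
The tags above can be cross-checked against one another. The sketch below is a minimal consistency check, not code from the run itself. It assumes the conventional decomposition WORLD_SIZE = data-parallel size x PP x TP and assumes that NUM_MICRO_BATCHES is BATCH_SIZE divided by MICRO_BATCH_SIZE. The helper names `data_parallel_size` and `num_micro_batches` are hypothetical and exist only for illustration.

```python
# Values copied from the run's tags.
BATCH_SIZE = 32
MICRO_BATCH_SIZE = 4
NUM_MICRO_BATCHES = 8
PP = 2           # pipeline-parallel stages
TP = 4           # tensor-parallel size
WORLD_SIZE = 16  # total number of ranks


def data_parallel_size(world_size: int, pp: int, tp: int) -> int:
    """Ranks left over for data parallelism, assuming world = DP * PP * TP."""
    model_parallel = pp * tp
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by pp * tp")
    return world_size // model_parallel


def num_micro_batches(batch_size: int, micro_batch_size: int) -> int:
    """Micro-batches per step, assuming the batch splits evenly."""
    if batch_size % micro_batch_size != 0:
        raise ValueError("batch_size must be divisible by micro_batch_size")
    return batch_size // micro_batch_size


if __name__ == "__main__":
    print("data-parallel size:", data_parallel_size(WORLD_SIZE, PP, TP))
    print("micro-batches:", num_micro_batches(BATCH_SIZE, MICRO_BATCH_SIZE))
```

Under these assumptions the tags are self-consistent: 16 ranks split into 2 pipeline stages of 4 tensor-parallel ranks leave a data-parallel size of 2, and a batch of 32 split into micro-batches of 4 gives the tagged 8 micro-batches. Whether BATCH_SIZE is global or per data-parallel replica is not stated on this page.
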
Author
State
Crashed
Start time
August 5th, 2022 6:59:45 PM
Runtime
28m 3s
Tracked hours
-
Run path
kastan/LLM-Distributed-Quantization/3nsgkzdv
OS
Linux-4.18.0-305.49.1.el8_4.x86_64-x86_64-with-glibc2.28
Python version
3.9.12
Command
/u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/v2_train.py --config /u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/configs/gpt2_PP2_TP4_2d.py --host gpub007 --port 29500 --world_size 16 --rank 6
System Hardware
CPU count
64
GPU count
4
GPU type
NVIDIA A40
W&B CLI Version
0.13.0
Config

Config parameters are your model's inputs.

  • {} 26 keys
    • 32
    • 1
    • "col_ai_quant"
    • "/u/kastanday/LLM-Distributed-Quantization/datasets/small-gpt-dataset.json"
    • {} 1 key
      • "AMP_TYPE.NAIVE"
    • "titans.model.gpt.gpt.gpt2_medium"
    • 1
    • 0.00015
    • {} 1 key
      • "titans.loss.lm_loss.gpt_lmloss.GPTLMLoss"
    • 4
    • {} 4 keys
      • true
      • "torch.float16"
      • 1,024
      • 50,304
    • 20
    • "4"
    • 8
    • {} 2 keys
      • 0.00015
      • 0.01
    • {} 2 keys
      • 2
      • {} 2 keys
        • "2d"
        • 4
    • 2
    • {} 4 keys
      • 8
      • true
      • [] 3 items
        • 4
        • 1,024
        • 3,072
      • "colossalai.engine.schedule._pipeline_schedule.PipelineSchedule"
    • 1,024
    • "2d"
    • 4
    • 32
    • "16"
    • 50,304
    • 1
    • 0.01
Summary

Summary metrics are your model's outputs.

No summary metrics saved for this run.


Artifact Outputs

This run produced these artifacts as outputs. Total: 1.
