
Kastan's group workspace

Aug-05__19:41

Tags
Aug-05__19:41
BATCH_SIZE=1280
NUM_EPOCHS=60
NUM_MICRO_BATCHES=8
SLURM=513928
TP=16
WORLD_SIZE=32
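The tags above follow a KEY=VALUE pattern, so they can be turned into a lookup table for comparing runs. The sketch below is only an illustration: `parse_tags` is a hypothetical helper, not part of the W&B API, and the tag list is copied by hand from this page. Reading WORLD_SIZE divided by TP as the number of tensor-parallel groups is plain arithmetic on the tags, not something the page states.

```python
def parse_tags(tags):
    """Split KEY=VALUE tags into a dict; tags without '=' map to None."""
    parsed = {}
    for tag in tags:
        key, sep, value = tag.partition("=")
        if not sep:
            # Free-form tag such as the run timestamp.
            parsed[tag] = None
        elif value.isdigit():
            parsed[key] = int(value)
        else:
            parsed[key] = value
    return parsed


# Tags copied from this run page (hand-entered, hypothetical input list).
run_tags = [
    "Aug-05__19:41",
    "BATCH_SIZE=1280",
    "NUM_EPOCHS=60",
    "NUM_MICRO_BATCHES=8",
    "SLURM=513928",
    "TP=16",
    "WORLD_SIZE=32",
]

tags = parse_tags(run_tags)

# 32 ranks split into groups of 16 tensor-parallel ranks each.
tp_groups = tags["WORLD_SIZE"] // tags["TP"]
print(tags["BATCH_SIZE"], tags["TP"], tp_groups)
```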
Author
State
Crashed
Start time
August 6th, 2022 12:43:08 AM
Runtime
27m 52s
Tracked hours
-
Run path
kastan/LLM-Distributed-Quantization/xahx7ef3
OS
Linux-4.18.0-305.49.1.el8_4.x86_64-x86_64-with-glibc2.28
Python version
3.9.12
Command
/u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/v2_train.py --config /u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/configs/gpt2_8b_2p5d_256.py --host gpua010 --port 29500 --world_size 32 --rank 25
System Hardware
CPU count: 64
GPU count: 4
GPU type: NVIDIA A100-SXM4-40GB
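As a rough cross-check of the launch layout, the command's `--world_size 32` and `--rank 25` can be combined with the 4 GPUs reported for this host. The sketch below assumes one rank per GPU, that every node has the same 4 GPUs, and that ranks are assigned to nodes in contiguous blocks. None of these is confirmed by the page, and `locate_rank` is a hypothetical helper.

```python
WORLD_SIZE = 32     # --world_size from the command
GPUS_PER_NODE = 4   # "GPU count" reported for this host
RANK = 25           # --rank from the command


def locate_rank(rank, gpus_per_node):
    """Return (node index, local GPU index), assuming contiguous rank blocks."""
    return divmod(rank, gpus_per_node)


# Assumes every node contributes the same number of GPUs.
num_nodes = WORLD_SIZE // GPUS_PER_NODE
node_index, local_gpu = locate_rank(RANK, GPUS_PER_NODE)

print(f"nodes={num_nodes} node_index={node_index} local_gpu={local_gpu}")
```

Under these assumptions the job spans 8 nodes, and this run's process (rank 25) would sit on the seventh node (index 6) as its second GPU (index 1).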
W&B CLI Version
0.13.0
Config

Config parameters are your model's inputs.

  • {} 25 keys
    • 1,280
    • 1
    • "col_ai_quant"
    • "/u/kastanday/LLM-Distributed-Quantization/datasets/small-gpt-dataset.json"
    • {} 1 key
      • "AMP_TYPE.NAIVE"
    • "titans.model.gpt.gpt.gpt2_8B"
    • "titans.model.gpt.gpt.gpt2_xl"
    • 1
    • 0.00015
    • "./gpt2_2.5d_tp16_bs1280_lr0.00015_accum1_clip_grad1.0/"
    • {} 1 key
      • "titans.loss.lm_loss.gpt_lmloss.GPTLMLoss"
    • {} 5 keys
      • 60
      • "4"
      • 8
      • {} 2 keys
        • 0.00015
        • 0.01
      • {} 2 keys
        • 2
        • {} 3 keys
          • 1
          • "2.5d"
          • 16
      • 1,024
      • "2.5d"
      • 16
      • 1,280
      • "32"
      • 50,304
      • 21
      • 0.01
Summary

Summary metrics are your model's outputs.

No summary metrics saved for this run.

Artifact Outputs

This run produced these artifacts as outputs. Total: 1.