Chilli's group workspace

2024-06-26 14:25:13         "stage": 1,
2024-06-26 14:25:13         "allgather_partitions": true,
2024-06-26 14:25:13         "allgather_bucket_size": 1.260000e+09,
2024-06-26 14:25:13         "overlap_comm": true,
2024-06-26 14:25:13         "reduce_scatter": true,
2024-06-26 14:25:13         "reduce_bucket_size": 1.260000e+09,
2024-06-26 14:25:13         "contiguous_gradients": true,
2024-06-26 14:25:13         "cpu_offload": false
2024-06-26 14:25:13     }
2024-06-26 14:25:13 }
2024-06-26 14:25:13 Time to load utils op: 0.0005292892456054688 seconds
2024-06-26 14:25:13  > number of parameters on model parallel rank 0: 128217088
2024-06-26 14:25:13  > total params: 128,217,088
2024-06-26 14:25:13 >>> muP Coord Check: Running Model with width: 1024 on seed: 300000
2024-06-26 14:25:13 >>> muP Coord Check: mup_width_multiplier set to 4.0
2024-06-26 14:26:39 [2024-06-26 14:26:39,286] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=0, lr=[0.0025, 0.01], mom=[[0.9, 0.95], [0.9, 0.95]]
2024-06-26 14:26:39 [2024-06-26 14:26:39,303] [INFO] [timer.py:215:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=7.3524728543693705, CurrSamplesPerSec=7.2924598942553684, MemAllocated=0.7GB, MaxMemAllocated=5.44GB
2024-06-26 14:26:49 Saved coord check plots... exiting