Atmallen8's group workspace

2023-02-24 14:41:55
ip-26-0-143-199:1698549:1700131 [0] NCCL INFO Channel 00/0 : 60[101c0] -> 62[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
2023-02-24 14:41:55
ip-26-0-143-199:1698549:1700131 [0] NCCL INFO Channel 02/0 : 30[101c0] -> 62[101c0] [receive] via NET/AWS Libfabric/0/GDRDMA
2023-02-24 14:41:55
ip-26-0-143-199:1698549:1700131 [0] NCCL INFO Channel 02/0 : 62[101c0] -> 30[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
2023-02-24 14:41:55
wandb: 429 encountered (Filestream rate limit exceeded, retrying in 2.4472460678036136 seconds), retrying request
2023-02-24 14:41:55
wandb: 429 encountered (Filestream rate limit exceeded, retrying in 4.109637409088528 seconds), retrying request
2023-02-24 14:41:55
/fsx/gpt-neox/conda/envs/neox_deeperspeed_new/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead
2023-02-24 14:41:55
  warnings.warn(
2023-02-24 14:41:57
ip-26-0-143-199:1698549:1700131 [0] NCCL INFO Channel 00/0 : 62[101c0] -> 60[101c0] [send] via NET/AWS Libfabric/0/GDRDMA
2023-02-24 14:41:57
ip-26-0-143-199:1698549:1700131 [0] NCCL INFO Connected all trees
2023-02-24 14:41:57
ip-26-0-143-199:1698549:1700131 [0] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512
2023-02-24 14:41:57
ip-26-0-143-199:1698549:1700131 [0] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer
2023-02-24 14:41:57
ip-26-0-143-199:1698549:1700131 [0] NCCL INFO comm 0x4ff5e5c0 rank 62 nranks 64 cudaDev 0 busId 101c0 - Init COMPLETE
2023-02-24 14:41:59
wandb: 429 encountered (Filestream rate limit exceeded, retrying in 9.915408356975092 seconds), retrying request