eleutherai

Atmallen8's group workspace

Group: v2-70M-deduped_16a3198e

2023-02-23 00:39:51

  File "/fsx/dashiell/gpt-neox/megatron/model/gpt2_model.py", line 123, in __init__
    super().__init__(
  File "/fsx/dashiell/miniconda3/envs/neox_base/lib/python3.9/site-packages/deepspeed/runtime/pipe/module.py", line 195, in __init__
    self._build()
  File "/fsx/dashiell/miniconda3/envs/neox_base/lib/python3.9/site-packages/deepspeed/runtime/pipe/module.py", line 246, in _build
    module = layer.build()
  File "/fsx/dashiell/miniconda3/envs/neox_base/lib/python3.9/site-packages/deepspeed/runtime/pipe/module.py", line 69, in build
    return self.typename(*self.module_args, **self.module_kwargs)
  File "/fsx/dashiell/gpt-neox/megatron/model/transformer.py", line 618, in __init__
    self.attention = ParallelSelfAttention(
  File "/fsx/dashiell/gpt-neox/megatron/model/transformer.py", line 219, in __init__
    self.query_key_value = mpu.ColumnParallelLinear(
  File "/fsx/dashiell/gpt-neox/megatron/mpu/layers.py", line 412, in __init__
    torch.empty(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 39.59 GiB total capacity; 50.00 MiB already allocated; 19.12 MiB free; 50.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
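The figures in the final error line are worth reading together. Of 39.59 GiB on GPU 0, PyTorch in this process has reserved only 50 MiB and just 19.12 MiB is free, so roughly 39.5 GiB is held outside this process's caching allocator. The most plausible explanation is another process (for example a leftover job) still occupying the GPU. Because reserved equals allocated, the `max_split_size_mb` fragmentation hint in the message probably does not apply. The sketch below is a hypothetical stdlib-only helper that parses the message and does this arithmetic; the function names and the 2x "fragmentation" threshold are illustrative assumptions, not part of PyTorch.

```python
import re

# The OOM message from the log above (parenthesized part), verbatim.
MSG = ("CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 39.59 GiB total capacity; "
       "50.00 MiB already allocated; 19.12 MiB free; 50.00 MiB reserved in total by PyTorch)")

# Conversion factors to MiB for the units PyTorch prints.
_UNITS = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def _mib(label: str, text: str) -> float:
    """Return the quantity printed just before `label`, converted to MiB (hypothetical helper)."""
    m = re.search(r"([\d.]+) (KiB|MiB|GiB) " + re.escape(label), text)
    if m is None:
        raise ValueError(f"field not found: {label!r}")
    return float(m.group(1)) * _UNITS[m.group(2)]


def summarize_oom(text: str) -> dict:
    """Split device memory into free / held by PyTorch / held elsewhere (hypothetical helper)."""
    total = _mib("total capacity", text)
    allocated = _mib("already allocated", text)
    free = _mib("free", text)
    reserved = _mib("reserved in total by PyTorch", text)
    return {
        "total_mib": total,
        "free_mib": free,
        "reserved_mib": reserved,
        # In use on the device but not held by this process's caching allocator
        # (CUDA context, other processes, ...).
        "outside_pytorch_mib": total - free - reserved,
        # The max_split_size_mb hint targets reserved >> allocated; 2x is an assumed cutoff.
        "fragmentation_suspected": reserved > 2 * allocated,
    }


if __name__ == "__main__":
    for key, value in summarize_oom(MSG).items():
        print(f"{key}: {value}")
```

For this log the helper reports about 40471 MiB (about 39.5 GiB) held outside PyTorch and no sign of fragmentation. On a live node, `nvidia-smi` would show which process holds that memory.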