Schoelkopf's group workspace
mamba-tp2-bs16-fusedinner_3yv14en0_7pcsp4pn
Tags
ip-10-0-251-210-0
Notes
Author
State
Crashed
Start time
March 13th, 2024 11:25:01 AM
Runtime
20m 30s
Tracked hours
-
Run path
eleutherai/mamba-neox-tp-memsavings/sdndl855
OS
Linux-5.15.0-1037-aws-x86_64-with-glibc2.31
Python version
3.9.18
Git repository
git clone https://github.com/EleutherAI/gpt-neox
Git state
git checkout -b "ip-10-0-251-210-0" 696454f00c71c702a10f5a0e2f28ebbe065e2704
Command
/weka/hailey/mistral-support-neox/gpt-neox/train.py --deepspeed_config eyJ0cmFpbl9iYXRjaF9zaXplIjogMTAyNCwgInRyYWluX21pY3JvX2JhdGNoX3NpemVfcGVyX2dwdSI6IDE2LCAiZ3JhZGllbnRfYWNjdW11bGF0aW9uX3N0ZXBzIjogNCwgIm9wdGltaXplciI6IHsidHlwZSI6ICJBZGFtIiwgInBhcmFtcyI6IHsibHIiOiAwLjAwMDYsICJiZXRhcyI6IFswLjksIDAuOTVdLCAiZXBzIjogMWUtMDh9fSwgImZwMTYiOiB7ImZwMTYiOiB0cnVlLCAiZW5hYmxlZCI6IHRydWUsICJsb3NzX3NjYWxlIjogMCwgImxvc3Nfc2NhbGVfd2luZG93IjogMTAwMCwgImluaXRpYWxfc2NhbGVfcG93ZXIiOiAxMiwgImh5c3RlcmVzaXMiOiAyLCAibWluX2xvc3Nfc2NhbGUiOiAxfSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDEsICJhbGxnYXRoZXJfcGFydGl0aW9ucyI6IHRydWUsICJhbGxnYXRoZXJfYnVja2V0X3NpemUiOiA1MDAwMDAwMDAsICJvdmVybGFwX2NvbW0iOiB0cnVlLCAicmVkdWNlX3NjYXR0ZXIiOiB0cnVlLCAicmVkdWNlX2J1Y2tldF9zaXplIjogNTAwMDAwMDAwLCAiY29udGlndW91c19ncmFkaWVudHMiOiB0cnVlLCAiY3B1X29mZmxvYWQiOiBmYWxzZX0sICJ3YWxsX2Nsb2NrX2JyZWFrZG93biI6IHRydWV9 --megatron_config eyJsYXVuY2hlciI6ICJzbHVybSIsICJub19zc2hfY2hlY2siOiB0cnVlLCAidHJhaW5fYmF0Y2hfc2l6ZSI6IDEwMjQsICJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiAxNiwgImdyYWRpZW50X2FjY3VtdWxhdGlvbl9zdGVwcyI6IDQsICJvcHRpbWl6ZXIiOiB7InR5cGUiOiAiQWRhbSIsICJwYXJhbXMiOiB7ImxyIjogMC4wMDA2LCAiYmV0YXMiOiBbMC45LCAwLjk1XSwgImVwcyI6IDFlLTA4fX0sICJmcDE2IjogeyJmcDE2IjogdHJ1ZSwgImVuYWJsZWQiOiB0cnVlLCAibG9zc19zY2FsZSI6IDAsICJsb3NzX3NjYWxlX3dpbmRvdyI6IDEwMDAsICJpbml0aWFsX3NjYWxlX3Bvd2VyIjogMTIsICJoeXN0ZXJlc2lzIjogMiwgIm1pbl9sb3NzX3NjYWxlIjogMX0sICJ6ZXJvX29wdGltaXphdGlvbiI6IHsic3RhZ2UiOiAxLCAiYWxsZ2F0aGVyX3BhcnRpdGlvbnMiOiB0cnVlLCAiYWxsZ2F0aGVyX2J1Y2tldF9zaXplIjogNTAwMDAwMDAwLCAib3ZlcmxhcF9jb21tIjogdHJ1ZSwgInJlZHVjZV9zY2F0dGVyIjogdHJ1ZSwgInJlZHVjZV9idWNrZXRfc2l6ZSI6IDUwMDAwMDAwMCwgImNvbnRpZ3VvdXNfZ3JhZGllbnRzIjogdHJ1ZSwgImNwdV9vZmZsb2FkIjogZmFsc2V9LCAid2FsbF9jbG9ja19icmVha2Rvd24iOiB0cnVlLCAicHJlY2lzaW9uIjogImZwMTYiLCAibnVtX2xheWVycyI6IDI0LCAiaGlkZGVuX3NpemUiOiA3NjgsICJudW1fYXR0ZW50aW9uX2hlYWRzIjogMTIsICJzZXFfbGVuZ3RoIjogMjA0OCwgIm1heF9wb3NpdGlvbl9lbWJlZGRpbmdzIjogMjA0OCwgIm5vcm0iOiAicm1zbm9ybSIsICJybXNfbm9ybV9lcHNpbG9uIjogMWUtMD
UsICJwb3NfZW1iIjogInJvdGFyeSIsICJub193ZWlnaHRfdHlpbmciOiB0cnVlLCAiYXR0ZW50aW9uX2NvbmZpZyI6IFsibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiLCAibWFtYmEiXSwgInNwYXJzaXR5X2NvbmZpZyI6IHt9LCAiYWN0aXZhdGlvbiI6ICJzaWx1IiwgInJvdGFyeV9wY3QiOiAwLjI1LCAib3V0cHV0X2xheWVyX2luaXRfbWV0aG9kIjogInNpbmdsZV9yZXNpZHVhbF9zY2FsZWRfbm9ybWFsIiwgImdwdF9qX3Jlc2lkdWFsIjogdHJ1ZSwgIm1hbWJhX3NlbGVjdGl2ZV9zY2FuX2Z1c2lvbiI6IHRydWUsICJtYW1iYV9jYXVzYWxfY29udl9mdXNpb24iOiB0cnVlLCAibWFtYmFfaW5uZXJfZnVuY19mdXNpb24iOiB0cnVlLCAibHJfZGVjYXlfc3R5bGUiOiAiY29zaW5lIiwgImxyX2RlY2F5X2l0ZXJzIjogMTQzMDAwLCAibWluX2xyIjogNmUtMDUsICJvcHRpbWl6ZXJfdHlwZSI6ICJBZGFtIiwgInplcm9fc3RhZ2UiOiAxLCAiemVyb19yZWR1Y2Vfc2NhdHRlciI6IHRydWUsICJ6ZXJvX2NvbnRpZ3VvdXNfZ3JhZGllbnRzIjogdHJ1ZSwgInplcm9fcmVkdWNlX2J1Y2tldF9zaXplIjogNTAwMDAwMDAwLCAiemVyb19hbGxnYXRoZXJfYnVja2V0X3NpemUiOiA1MDAwMDAwMDAsICJsciI6IDAuMDAwNiwgInRva2VuaXplcl90eXBlIjogIkhGVG9rZW5pemVyIiwgInRyYWluX2RhdGFfcGF0aHMiOiBbIi93ZWthL3BpbGUvcGlsZV8yMEJfdG9rZW5pemVyX3RleHRfZG9jdW1lbnQiXSwgInRlc3RfZGF0YV9wYXRocyI6IFsiL3dla2EvcGlsZS9waWxlXzIwQl90b2tlbml6ZXJfdGV4dF9kb2N1bWVudCJdLCAidmFsaWRfZGF0YV9wYXRocyI6IFsiL3dla2EvcGlsZS9waWxlXzIwQl90b2tlbml6ZXJfdGV4dF9kb2N1bWVudCJdLCAidHJhaW5fZGF0YV93ZWlnaHRzIjogWzEuMF0sICJ2YWxpZF9kYXRhX3dlaWdodHMiOiBbMS4wXSwgInRlc3RfZGF0YV93ZWlnaHRzIjogWzEuMF0sICJkYXRhX2ltcGwiOiAibW1hcCIsICJjb25maWdfZmlsZXMiOiB7Im1hbWJhLTE2MG0ueW1sIjogIntcbiAgXCJwaXBlX3BhcmFsbGVsX3NpemVcIjogMCxcbiAgXCJtb2RlbF9wYXJhbGxlbF9zaXplXCI6IDIsXG5cbiAgXCJudW1fbGF5ZXJzXCI6IDI0LFxuICBcImhpZGRlbl9zaXplXCI6IDc2OCxcbiAgXCJudW1fYXR0ZW50aW9uX2hlYWRzXCI6IDEyLFxuICBcInNlcV9sZW5ndGhcIjogMjA0OCxcbiAgXCJtYXhfcG9zaXRpb25fZW1iZWRkaW5nc1wiOiAyMDQ4LFxuICBcInBvc19lbWJcIjogXCJyb3RhcnlcIixcbiAgXCJyb3RhcnlfcGN0XCI6IDAuMjUsXG4gIFwibm9fd2VpZ2h0X3R5aW5nXCI6IHRydWUsXG4gIFwiZ3B0X2pfcmVzaW
R1YWxcIjogdHJ1ZSxcbiAgXCJvdXRwdXRfbGF5ZXJfcGFyYWxsZWxpc21cIjogXCJjb2x1bW5cIixcblxuICBcImF0dGVudGlvbl9jb25maWdcIjogW1tbXCJtYW1iYVwiXSwgMjRdXSxcblxuICAjIFwic2NhbGVkX3VwcGVyX3RyaWFuZ19tYXNrZWRfc29mdG1heF9mdXNpb25cIjogdHJ1ZSxcbiAgIyBcImJpYXNfZ2VsdV9mdXNpb25cIjogdHJ1ZSxcblxuICBcIm1hbWJhX3NlbGVjdGl2ZV9zY2FuX2Z1c2lvblwiOiB0cnVlLFxuICBcIm1hbWJhX2NhdXNhbF9jb252X2Z1c2lvblwiOiB0cnVlLFxuICBcIm1hbWJhX2lubmVyX2Z1bmNfZnVzaW9uXCI6IHRydWUsXG4gIFwibWFtYmFfc2VsZWN0aXZlX2ZwMzJfcGFyYW1zXCI6IHRydWUsXG5cbiAgXCJhY3RpdmF0aW9uXCI6IFwic2lsdVwiLFxuICBcIm5vcm1cIjogXCJybXNub3JtXCIsXG4gIFwicm1zX25vcm1fZXBzaWxvblwiOiAxLjBlLTUsXG5cbiAgXCJvdXRwdXRfbGF5ZXJfaW5pdF9tZXRob2RcIjogXCJzaW5nbGVfcmVzaWR1YWxfc2NhbGVkX25vcm1hbFwiLFxuXG5cbiAgIyBcImluaXRfbWV0aG9kXCI6IFwic21hbGxfaW5pdFwiLFxuICAjIFwib3V0cHV0X2xheWVyX2luaXRfbWV0aG9kXCI6IFwid2FuZ19pbml0XCIsXG5cbiAgXCJvcHRpbWl6ZXJcIjoge1xuICAgIFwidHlwZVwiOiBcIkFkYW1cIixcbiAgICBcInBhcmFtc1wiOiB7XG4gICAgICBcImxyXCI6IDAuMDAwNixcbiAgICAgIFwiYmV0YXNcIjogWzAuOSwgMC45NV0sXG4gICAgICBcImVwc1wiOiAxLjBlLThcbiAgICB9XG4gIH0sXG4gIFwibWluX2xyXCI6IDAuMDAwMDYsXG5cbiAgXCJ6ZXJvX29wdGltaXphdGlvblwiOiB7XG4gICAgXCJzdGFnZVwiOiAxLFxuICAgIFwiYWxsZ2F0aGVyX3BhcnRpdGlvbnNcIjogdHJ1ZSxcbiAgICBcImFsbGdhdGhlcl9idWNrZXRfc2l6ZVwiOiA1MDAwMDAwMDAsXG4gICAgXCJvdmVybGFwX2NvbW1cIjogdHJ1ZSxcbiAgICBcInJlZHVjZV9zY2F0dGVyXCI6IHRydWUsXG4gICAgXCJyZWR1Y2VfYnVja2V0X3NpemVcIjogNTAwMDAwMDAwLFxuICAgIFwiY29udGlndW91c19ncmFkaWVudHNcIjogdHJ1ZSxcbiAgICBcImNwdV9vZmZsb2FkXCI6IGZhbHNlXG4gIH0sXG5cbiAgXCJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHVcIjogMTYsXG4gIFwiZ3JhZGllbnRfYWNjdW11bGF0aW9uX3N0ZXBzXCI6IDQsXG4gIFwiZGF0YV9pbXBsXCI6IFwibW1hcFwiLFxuICBcIm51bV93b3JrZXJzXCI6IDEsXG5cbiAgXCJjaGVja3BvaW50X2FjdGl2YXRpb25zXCI6IHRydWUsXG4gIFwiY2hlY2twb2ludF9udW1fbGF5ZXJzXCI6IDEsXG4gIFwicGFydGl0aW9uX2FjdGl2YXRpb25zXCI6IHRydWUsXG4gIFwic3luY2hyb25pemVfZWFjaF9sYXllclwiOiB0cnVlLFxuXG4gIFwiZ3JhZGllbnRfY2xpcHBpbmdcIjogMS4wLFxuICBcIndlaWdodF9kZWNheVwiOiAwLjEsXG4gIFwiaGlkZGVuX2Ryb3BvdXRcIjogMCxcbiAgXCJhdHRlbnRpb25fZHJvcG91dFwiOiAwLFxuXG4gIFwiZnAxNl
wiOiB7XG4gICAgXCJmcDE2XCI6IHRydWUsXG4gICAgXCJlbmFibGVkXCI6IHRydWUsXG4gICAgXCJsb3NzX3NjYWxlXCI6IDAsXG4gICAgXCJsb3NzX3NjYWxlX3dpbmRvd1wiOiAxMDAwLFxuICAgIFwiaW5pdGlhbF9zY2FsZV9wb3dlclwiOiAxMixcbiAgICBcImh5c3RlcmVzaXNcIjogMixcbiAgICBcIm1pbl9sb3NzX3NjYWxlXCI6IDFcbiAgfSxcblxuICBcInRyYWluX2l0ZXJzXCI6IDE0MzAwMSxcbiAgXCJscl9kZWNheV9pdGVyc1wiOiAxNDMwMDAsXG4gIFwiZGlzdHJpYnV0ZWRfYmFja2VuZFwiOiBcIm5jY2xcIixcbiAgXCJscl9kZWNheV9zdHlsZVwiOiBcImNvc2luZVwiLFxuICBcIndhcm11cFwiOiAwLjAxLFxuICBcImNoZWNrcG9pbnRfZmFjdG9yXCI6IDI1MCxcbiAgIyBcImV4dHJhX3NhdmVfaXRlcnNcIjogWzAsMSwyLDQsOCwxNiwzMiw2NCwxMjgsMjU2LDUxMl0sXG4gIFwiZXZhbF9pbnRlcnZhbFwiOiAxNDMwMDAsXG4gIFwiZXZhbF9pdGVyc1wiOiAxMCxcblxuICBcImxvZ19pbnRlcnZhbFwiOiAxMCxcbiAgXCJzdGVwc19wZXJfcHJpbnRcIjogMTAsXG4gIFwid2FsbF9jbG9ja19icmVha2Rvd25cIjogdHJ1ZSxcblxuICBcInRva2VuaXplcl90eXBlXCI6IFwiSEZUb2tlbml6ZXJcIixcbiAgXCJ2b2NhYl9maWxlXCI6IFwiL3dla2EvcGlsZS8yMEJfdG9rZW5pemVyLmpzb25cIixcblxuICAjIFwic2F2ZVwiOiBcIi93ZWthL2hhaWxleS9tYW1iYS1ja3B0cy9tYW1iYS0xNjBtLXB5dGhpYS10ZXN0LWNvbnYtYmlhc1wiLFxuICAjIFwibG9hZFwiOiBcIi93ZWthL2hhaWxleS9tYW1iYS1ja3B0cy9tYW1iYS0xNjBtLXB5dGhpYS10ZXN0LWNvbnYtYmlhc1wiLFxuXG4gICMgXCJzM19wYXRoXCI6IFwiczM6Ly9zLWVhaS1uZW94LXdlc3QvaGFpbGV5L21hbWJhL3Rlc3QtY2twdHMvbWFtYmEtMTYwbS1weXRoaWEtdGVzdC1jb252LWJpYXNcIixcblxuICAjIFwia2VlcF9sYXN0X25fY2hlY2twb2ludHNcIjogMixcblxuICBcInRyYWluX2RhdGFfcGF0aHNcIjogW1wiL3dla2EvcGlsZS9waWxlXzIwQl90b2tlbml6ZXJfdGV4dF9kb2N1bWVudFwiXSxcbiAgXCJ2YWxpZF9kYXRhX3BhdGhzXCI6IFtcIi93ZWthL3BpbGUvcGlsZV8yMEJfdG9rZW5pemVyX3RleHRfZG9jdW1lbnRcIl0sXG4gIFwidGVzdF9kYXRhX3BhdGhzXCI6IFtcIi93ZWthL3BpbGUvcGlsZV8yMEJfdG9rZW5pemVyX3RleHRfZG9jdW1lbnRcIl0sXG5cbiAgXCJsYXVuY2hlclwiOiBcInNsdXJtXCIsIFxuICBcImRlZXBzcGVlZF9zbHVybVwiOiB0cnVlLFxuICAjICBcImFjY291bnRcIjogXCJlbGV1dGhlclwiLFxuICBcIm5vX3NzaF9jaGVja1wiOiB0cnVlLFxuXG4gIFwidXNlX3dhbmRiXCI6IHRydWUsXG4gIFwid2FuZGJfZ3JvdXBcIjogXCJtYW1iYS10cDItYnMxNi1mdXNlZGlubmVyXCIsXG4gIFwid2FuZGJfdGVhbVwiOiBcImVsZXV0aGVyYWlcIixcbiAgXCJ3YW5kYl9wcm9qZWN0XCI6IFwibWFtYmEtbmVveC10cC1tZW1zYXZpbmdzXCIsXG
59XG4ifSwgImNoZWNrcG9pbnRfZmFjdG9yIjogMjUwLCAiYmF0Y2hfc2l6ZSI6IDE2LCAidHJhaW5faXRlcnMiOiAxNDMwMDEsICJldmFsX2l0ZXJzIjogMTAsICJldmFsX2ludGVydmFsIjogMTQzMDAwLCAidm9jYWJfZmlsZSI6ICIvd2VrYS9waWxlLzIwQl90b2tlbml6ZXIuanNvbiIsICJudW1fd29ya2VycyI6IDEsICJjaGVja3BvaW50X2FjdGl2YXRpb25zIjogdHJ1ZSwgInN5bmNocm9uaXplX2VhY2hfbGF5ZXIiOiB0cnVlLCAicGFydGl0aW9uX2FjdGl2YXRpb25zIjogdHJ1ZSwgImR5bmFtaWNfbG9zc19zY2FsZSI6IHRydWUsICJtb2RlbF9wYXJhbGxlbF9zaXplIjogMiwgIndvcmxkX3NpemUiOiAzMiwgInVzZV93YW5kYiI6IHRydWUsICJ3YW5kYl9ncm91cCI6ICJtYW1iYS10cDItYnMxNi1mdXNlZGlubmVyXzN5djE0ZW4wXzdwY3NwNHBuIiwgIndhbmRiX3RlYW0iOiAiZWxldXRoZXJhaSIsICJ3YW5kYl9wcm9qZWN0IjogIm1hbWJhLW5lb3gtdHAtbWVtc2F2aW5ncyIsICJsb2dfaW50ZXJ2YWwiOiAxMCwgInRleHRfZ2VuX3R5cGUiOiAidW5jb25kaXRpb25hbCIsICJsb2NhbF9yYW5rIjogMCwgInJhbmsiOiAwLCAiZGVlcHNwZWVkX3NsdXJtIjogdHJ1ZSwgInVzZXJfc2NyaXB0IjogIi93ZWthL2hhaWxleS9taXN0cmFsLXN1cHBvcnQtbmVveC9ncHQtbmVveC90cmFpbi5weSIsICJzYXZlX2l0ZXJzIjogWzI1MCwgNTAwLCA3NTAsIDEwMDAsIDEyNTAsIDE1MDAsIDE3NTAsIDIwMDAsIDIyNTAsIDI1MDAsIDI3NTAsIDMwMDAsIDMyNTAsIDM1MDAsIDM3NTAsIDQwMDAsIDQyNTAsIDQ1MDAsIDQ3NTAsIDUwMDAsIDUyNTAsIDU1MDAsIDU3NTAsIDYwMDAsIDYyNTAsIDY1MDAsIDY3NTAsIDcwMDAsIDcyNTAsIDc1MDAsIDc3NTAsIDgwMDAsIDgyNTAsIDg1MDAsIDg3NTAsIDkwMDAsIDkyNTAsIDk1MDAsIDk3NTAsIDEwMDAwLCAxMDI1MCwgMTA1MDAsIDEwNzUwLCAxMTAwMCwgMTEyNTAsIDExNTAwLCAxMTc1MCwgMTIwMDAsIDEyMjUwLCAxMjUwMCwgMTI3NTAsIDEzMDAwLCAxMzI1MCwgMTM1MDAsIDEzNzUwLCAxNDAwMCwgMTQyNTAsIDE0NTAwLCAxNDc1MCwgMTUwMDAsIDE1MjUwLCAxNTUwMCwgMTU3NTAsIDE2MDAwLCAxNjI1MCwgMTY1MDAsIDE2NzUwLCAxNzAwMCwgMTcyNTAsIDE3NTAwLCAxNzc1MCwgMTgwMDAsIDE4MjUwLCAxODUwMCwgMTg3NTAsIDE5MDAwLCAxOTI1MCwgMTk1MDAsIDE5NzUwLCAyMDAwMCwgMjAyNTAsIDIwNTAwLCAyMDc1MCwgMjEwMDAsIDIxMjUwLCAyMTUwMCwgMjE3NTAsIDIyMDAwLCAyMjI1MCwgMjI1MDAsIDIyNzUwLCAyMzAwMCwgMjMyNTAsIDIzNTAwLCAyMzc1MCwgMjQwMDAsIDI0MjUwLCAyNDUwMCwgMjQ3NTAsIDI1MDAwLCAyNTI1MCwgMjU1MDAsIDI1NzUwLCAyNjAwMCwgMjYyNTAsIDI2NTAwLCAyNjc1MCwgMjcwMDAsIDI3MjUwLCAyNzUwMCwgMjc3NTAsIDI4MDAwLCAyODI1MCwgMjg1MDAsIDI4NzUwLCAyOTAwMCwgMjkyNTAsIDI5NTAwLCAyOTc1MCwgMzAwMDAsIDMwMjUwLC
AzMDUwMCwgMzA3NTAsIDMxMDAwLCAzMTI1MCwgMzE1MDAsIDMxNzUwLCAzMjAwMCwgMzIyNTAsIDMyNTAwLCAzMjc1MCwgMzMwMDAsIDMzMjUwLCAzMzUwMCwgMzM3NTAsIDM0MDAwLCAzNDI1MCwgMzQ1MDAsIDM0NzUwLCAzNTAwMCwgMzUyNTAsIDM1NTAwLCAzNTc1MCwgMzYwMDAsIDM2MjUwLCAzNjUwMCwgMzY3NTAsIDM3MDAwLCAzNzI1MCwgMzc1MDAsIDM3NzUwLCAzODAwMCwgMzgyNTAsIDM4NTAwLCAzODc1MCwgMzkwMDAsIDM5MjUwLCAzOTUwMCwgMzk3NTAsIDQwMDAwLCA0MDI1MCwgNDA1MDAsIDQwNzUwLCA0MTAwMCwgNDEyNTAsIDQxNTAwLCA0MTc1MCwgNDIwMDAsIDQyMjUwLCA0MjUwMCwgNDI3NTAsIDQzMDAwLCA0MzI1MCwgNDM1MDAsIDQzNzUwLCA0NDAwMCwgNDQyNTAsIDQ0NTAwLCA0NDc1MCwgNDUwMDAsIDQ1MjUwLCA0NTUwMCwgNDU3NTAsIDQ2MDAwLCA0NjI1MCwgNDY1MDAsIDQ2NzUwLCA0NzAwMCwgNDcyNTAsIDQ3NTAwLCA0Nzc1MCwgNDgwMDAsIDQ4MjUwLCA0ODUwMCwgNDg3NTAsIDQ5MDAwLCA0OTI1MCwgNDk1MDAsIDQ5NzUwLCA1MDAwMCwgNTAyNTAsIDUwNTAwLCA1MDc1MCwgNTEwMDAsIDUxMjUwLCA1MTUwMCwgNTE3NTAsIDUyMDAwLCA1MjI1MCwgNTI1MDAsIDUyNzUwLCA1MzAwMCwgNTMyNTAsIDUzNTAwLCA1Mzc1MCwgNTQwMDAsIDU0MjUwLCA1NDUwMCwgNTQ3NTAsIDU1MDAwLCA1NTI1MCwgNTU1MDAsIDU1NzUwLCA1NjAwMCwgNTYyNTAsIDU2NTAwLCA1Njc1MCwgNTcwMDAsIDU3MjUwLCA1NzUwMCwgNTc3NTAsIDU4MDAwLCA1ODI1MCwgNTg1MDAsIDU4NzUwLCA1OTAwMCwgNTkyNTAsIDU5NTAwLCA1OTc1MCwgNjAwMDAsIDYwMjUwLCA2MDUwMCwgNjA3NTAsIDYxMDAwLCA2MTI1MCwgNjE1MDAsIDYxNzUwLCA2MjAwMCwgNjIyNTAsIDYyNTAwLCA2Mjc1MCwgNjMwMDAsIDYzMjUwLCA2MzUwMCwgNjM3NTAsIDY0MDAwLCA2NDI1MCwgNjQ1MDAsIDY0NzUwLCA2NTAwMCwgNjUyNTAsIDY1NTAwLCA2NTc1MCwgNjYwMDAsIDY2MjUwLCA2NjUwMCwgNjY3NTAsIDY3MDAwLCA2NzI1MCwgNjc1MDAsIDY3NzUwLCA2ODAwMCwgNjgyNTAsIDY4NTAwLCA2ODc1MCwgNjkwMDAsIDY5MjUwLCA2OTUwMCwgNjk3NTAsIDcwMDAwLCA3MDI1MCwgNzA1MDAsIDcwNzUwLCA3MTAwMCwgNzEyNTAsIDcxNTAwLCA3MTc1MCwgNzIwMDAsIDcyMjUwLCA3MjUwMCwgNzI3NTAsIDczMDAwLCA3MzI1MCwgNzM1MDAsIDczNzUwLCA3NDAwMCwgNzQyNTAsIDc0NTAwLCA3NDc1MCwgNzUwMDAsIDc1MjUwLCA3NTUwMCwgNzU3NTAsIDc2MDAwLCA3NjI1MCwgNzY1MDAsIDc2NzUwLCA3NzAwMCwgNzcyNTAsIDc3NTAwLCA3Nzc1MCwgNzgwMDAsIDc4MjUwLCA3ODUwMCwgNzg3NTAsIDc5MDAwLCA3OTI1MCwgNzk1MDAsIDc5NzUwLCA4MDAwMCwgODAyNTAsIDgwNTAwLCA4MDc1MCwgODEwMDAsIDgxMjUwLCA4MTUwMCwgODE3NTAsIDgyMDAwLCA4MjI1MCwgODI1MDAsIDgyNzUwLCA4MzAwMCwgODMyNTAsIDgzNTAwLCA4Mzc1MCwgOD
QwMDAsIDg0MjUwLCA4NDUwMCwgODQ3NTAsIDg1MDAwLCA4NTI1MCwgODU1MDAsIDg1NzUwLCA4NjAwMCwgODYyNTAsIDg2NTAwLCA4Njc1MCwgODcwMDAsIDg3MjUwLCA4NzUwMCwgODc3NTAsIDg4MDAwLCA4ODI1MCwgODg1MDAsIDg4NzUwLCA4OTAwMCwgODkyNTAsIDg5NTAwLCA4OTc1MCwgOTAwMDAsIDkwMjUwLCA5MDUwMCwgOTA3NTAsIDkxMDAwLCA5MTI1MCwgOTE1MDAsIDkxNzUwLCA5MjAwMCwgOTIyNTAsIDkyNTAwLCA5Mjc1MCwgOTMwMDAsIDkzMjUwLCA5MzUwMCwgOTM3NTAsIDk0MDAwLCA5NDI1MCwgOTQ1MDAsIDk0NzUwLCA5NTAwMCwgOTUyNTAsIDk1NTAwLCA5NTc1MCwgOTYwMDAsIDk2MjUwLCA5NjUwMCwgOTY3NTAsIDk3MDAwLCA5NzI1MCwgOTc1MDAsIDk3NzUwLCA5ODAwMCwgOTgyNTAsIDk4NTAwLCA5ODc1MCwgOTkwMDAsIDk5MjUwLCA5OTUwMCwgOTk3NTAsIDEwMDAwMCwgMTAwMjUwLCAxMDA1MDAsIDEwMDc1MCwgMTAxMDAwLCAxMDEyNTAsIDEwMTUwMCwgMTAxNzUwLCAxMDIwMDAsIDEwMjI1MCwgMTAyNTAwLCAxMDI3NTAsIDEwMzAwMCwgMTAzMjUwLCAxMDM1MDAsIDEwMzc1MCwgMTA0MDAwLCAxMDQyNTAsIDEwNDUwMCwgMTA0NzUwLCAxMDUwMDAsIDEwNTI1MCwgMTA1NTAwLCAxMDU3NTAsIDEwNjAwMCwgMTA2MjUwLCAxMDY1MDAsIDEwNjc1MCwgMTA3MDAwLCAxMDcyNTAsIDEwNzUwMCwgMTA3NzUwLCAxMDgwMDAsIDEwODI1MCwgMTA4NTAwLCAxMDg3NTAsIDEwOTAwMCwgMTA5MjUwLCAxMDk1MDAsIDEwOTc1MCwgMTEwMDAwLCAxMTAyNTAsIDExMDUwMCwgMTEwNzUwLCAxMTEwMDAsIDExMTI1MCwgMTExNTAwLCAxMTE3NTAsIDExMjAwMCwgMTEyMjUwLCAxMTI1MDAsIDExMjc1MCwgMTEzMDAwLCAxMTMyNTAsIDExMzUwMCwgMTEzNzUwLCAxMTQwMDAsIDExNDI1MCwgMTE0NTAwLCAxMTQ3NTAsIDExNTAwMCwgMTE1MjUwLCAxMTU1MDAsIDExNTc1MCwgMTE2MDAwLCAxMTYyNTAsIDExNjUwMCwgMTE2NzUwLCAxMTcwMDAsIDExNzI1MCwgMTE3NTAwLCAxMTc3NTAsIDExODAwMCwgMTE4MjUwLCAxMTg1MDAsIDExODc1MCwgMTE5MDAwLCAxMTkyNTAsIDExOTUwMCwgMTE5NzUwLCAxMjAwMDAsIDEyMDI1MCwgMTIwNTAwLCAxMjA3NTAsIDEyMTAwMCwgMTIxMjUwLCAxMjE1MDAsIDEyMTc1MCwgMTIyMDAwLCAxMjIyNTAsIDEyMjUwMCwgMTIyNzUwLCAxMjMwMDAsIDEyMzI1MCwgMTIzNTAwLCAxMjM3NTAsIDEyNDAwMCwgMTI0MjUwLCAxMjQ1MDAsIDEyNDc1MCwgMTI1MDAwLCAxMjUyNTAsIDEyNTUwMCwgMTI1NzUwLCAxMjYwMDAsIDEyNjI1MCwgMTI2NTAwLCAxMjY3NTAsIDEyNzAwMCwgMTI3MjUwLCAxMjc1MDAsIDEyNzc1MCwgMTI4MDAwLCAxMjgyNTAsIDEyODUwMCwgMTI4NzUwLCAxMjkwMDAsIDEyOTI1MCwgMTI5NTAwLCAxMjk3NTAsIDEzMDAwMCwgMTMwMjUwLCAxMzA1MDAsIDEzMDc1MCwgMTMxMDAwLCAxMzEyNTAsIDEzMTUwMCwgMTMxNzUwLCAxMzIwMDAsIDEzMjI1MCwgMTMyNTAwLCAxMzI3NT
AsIDEzMzAwMCwgMTMzMjUwLCAxMzM1MDAsIDEzMzc1MCwgMTM0MDAwLCAxMzQyNTAsIDEzNDUwMCwgMTM0NzUwLCAxMzUwMDAsIDEzNTI1MCwgMTM1NTAwLCAxMzU3NTAsIDEzNjAwMCwgMTM2MjUwLCAxMzY1MDAsIDEzNjc1MCwgMTM3MDAwLCAxMzcyNTAsIDEzNzUwMCwgMTM3NzUwLCAxMzgwMDAsIDEzODI1MCwgMTM4NTAwLCAxMzg3NTAsIDEzOTAwMCwgMTM5MjUwLCAxMzk1MDAsIDEzOTc1MCwgMTQwMDAwLCAxNDAyNTAsIDE0MDUwMCwgMTQwNzUwLCAxNDEwMDAsIDE0MTI1MCwgMTQxNTAwLCAxNDE3NTAsIDE0MjAwMCwgMTQyMjUwLCAxNDI1MDAsIDE0Mjc1MCwgMTQzMDAwXSwgImdsb2JhbF9udW1fZ3B1cyI6IDMyfQ==
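The `--deepspeed_config` and `--megatron_config` values in the command above both begin with `eyJ`, which is how base64 renders the opening `{"` of a JSON object, so they appear to be base64-encoded JSON. A minimal sketch of decoding such an argument follows. The helper names `encode_config` and `decode_config` are hypothetical, and the URL-safe base64 alphabet is an assumption rather than something the page states. The command as shown here is wrapped across several lines, so the pieces would need to be joined into one string before decoding.

```python
import base64
import json


def encode_config(cfg: dict) -> str:
    """Serialize a config dict to JSON, then base64-encode it (hypothetical helper)."""
    raw = json.dumps(cfg).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_config(blob: str) -> dict:
    """Reverse of encode_config: base64-decode, then parse the JSON (hypothetical helper)."""
    raw = base64.urlsafe_b64decode(blob)
    return json.loads(raw.decode("utf-8"))


# Round trip on a small sample that reuses two keys from this run's config.
sample = {"train_micro_batch_size_per_gpu": 16, "gradient_accumulation_steps": 4}
blob = encode_config(sample)
decoded = decode_config(blob)
print(decoded)
```

Passing the joined argument string to `decode_config` should yield the nested dictionary that the Config section summarizes.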
System Hardware
| CPU count | 48 |
| Logical CPU count | 96 |
| GPU count | 8 |
| GPU type | NVIDIA A100-SXM4-40GB |
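The hardware table and the run configuration together determine the global batch size. The sketch below works through that arithmetic. It takes `world_size` (32) and `train_batch_size` (1024) from the decoded command arguments and the remaining values from the config file shown under Config. The node count assumes every node has the 8 GPUs reported for this machine, which the page does not confirm.

```python
# Values taken from this run's configuration.
world_size = 32            # total GPUs, from the decoded command arguments
model_parallel_size = 2    # tensor-parallel degree ("tp2" in the run name)
pipe_parallel_size = 0     # 0 here means no pipeline parallelism
micro_batch_per_gpu = 16   # train_micro_batch_size_per_gpu
grad_accum_steps = 4       # gradient_accumulation_steps
seq_length = 2048
gpus_per_node = 8          # assumption: all nodes match the table above

# Treat "no pipeline parallelism" as a single pipeline stage.
pipeline_stages = max(pipe_parallel_size, 1)

# Each model replica spans model_parallel_size * pipeline_stages GPUs.
data_parallel_size = world_size // (model_parallel_size * pipeline_stages)

# Global batch = micro batch * accumulation steps * number of replicas.
global_batch = micro_batch_per_gpu * grad_accum_steps * data_parallel_size
tokens_per_step = global_batch * seq_length
nodes = world_size // gpus_per_node

print(data_parallel_size, global_batch, tokens_per_step, nodes)
```

The result matches the `train_batch_size` of 1024 recorded in the command arguments.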
W&B CLI Version
0.16.3
Config
Config parameters are your model's inputs.
- {} 258 keys
- null
- "silu"
- null
- false
- 1,000
- null
- false
- [] 24 items
- 0
- false
- null
- null
- null
- 16
- null
- false
- false
- false
- null
- true
- 250
- false
- 1
- "linear"
- false
- 1
- null
- null
- null
- null
- {} 1 key
- "{
  "pipe_parallel_size": 0,
  "model_parallel_size": 2,
  "num_layers": 24,
  "hidden_size": 768,
  "num_attention_heads": 12,
  "seq_length": 2048,
  "max_position_embeddings": 2048,
  "pos_emb": "rotary",
  "rotary_pct": 0.25,
  "no_weight_tying": true,
  "gpt_j_residual": true,
  "output_layer_parallelism": "column",
  "attention_config": [[["mamba"], 24]],
  # "scaled_upper_triang_masked_softmax_fusion": true,
  # "bias_gelu_fusion": true,
  "mamba_selective_scan_fusion": true,
  "mamba_causal_conv_fusion": true,
  "mamba_inner_func_fusion": true,
  "mamba_selective_fp32_params": true,
  "activation": "silu",
  "norm": "rmsnorm",
  "rms_norm_epsilon": 1.0e-5,
  "output_layer_init_method": "single_residual_scaled_normal",
  # "init_method": "small_init",
  # "output_layer_init_method": "wang_init",
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.0006,
      "betas": [0.9, 0.95],
      "eps": 1.0e-8
    }
  },
  "min_lr": 0.00006,
  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "allgather_bucket_size": 500000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 500000000,
    "contiguous_gradients": true,
    "cpu_offload": false
  },
  "train_micro_batch_size_per_gpu": 16,
  "gradient_accumulation_steps": 4,
  "data_impl": "mmap",
  "num_workers": 1,
  "checkpoint_activations": true,
  "checkpoint_num_layers": 1,
  "partition_activations": true,
  "synchronize_each_layer": true,
  "gradient_clipping": 1.0,
  "weight_decay": 0.1,
  "hidden_dropout": 0,
  "attention_dropout": 0,
  "fp16": {
    "fp16": true,
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 12,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "train_iters": 143001,
  "lr_decay_iters": 143000,
  "distributed_backend": "nccl",
  "lr_decay_style": "cosine",
  "warmup": 0.01,
  "checkpoint_factor": 250,
  # "extra_save_iters": [0,1,2,4,8,16,32,64,128,256,512],
  "eval_interval": 143000,
  "eval_iters": 10,
  "log_interval": 10,
  "steps_per_print": 10,
  "wall_clock_breakdown": true,
  "tokenizer_type": "HFTokenizer",
  "vocab_file": "/weka/pile/20B_tokenizer.json",
  # "save": "/weka/hailey/mamba-ckpts/mamba-160m-pythia-test-conv-bias",
  # "load": "/weka/hailey/mamba-ckpts/mamba-160m-pythia-test-conv-bias",
  # "s3_path": "s3://s-eai-neox-west/hailey/mamba/test-ckpts/mamba-160m-pythia-test-conv-bias",
  # "keep_last_n_checkpoints": 2,
  "train_data_paths": ["/weka/pile/pile_20B_tokenizer_text_document"],
  "valid_data_paths": ["/weka/pile/pile_20B_tokenizer_text_document"],
  "test_data_paths": ["/weka/pile/pile_20B_tokenizer_text_document"],
  "launcher": "slurm",
  "deepspeed_slurm": true,
  # "account": "eleuther",
  "no_ssh_check": true,
  "use_wandb": true,
  "wandb_group": "mamba-tp2-bs16-fusedinner",
  "wandb_team": "eleutherai",
  "wandb_project": "mamba-neox-tp-memsavings",
}"
- false
- false
- true
- null
- null
- 0
- null
- "mmap"
- null
- null
- false
- null
- true
- true
- null
- {} 8 keys
- 500,000,000
- true
- 1
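The config file above sets a peak learning rate of 0.0006, a `min_lr` of 0.00006, `lr_decay_style` of "cosine", `lr_decay_iters` of 143000, and a `warmup` fraction of 0.01. The sketch below shows one common way to turn those values into a per-step learning rate: linear warmup followed by cosine decay down to the floor. It is a textbook formulation written for illustration, with a hypothetical function name `lr_at`. The training code's own scheduler may differ in details such as how the warmup steps are counted.

```python
import math


def lr_at(step, peak_lr=6e-4, min_lr=6e-5, decay_iters=143_000, warmup_frac=0.01):
    """Learning rate at a given step: linear warmup, then cosine decay to min_lr.

    Illustrative only; not taken from the training code behind this run.
    """
    warmup_iters = round(warmup_frac * decay_iters)  # 1430 steps with these values
    if step < warmup_iters:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_iters
    # Fraction of the decay phase completed, clamped at 1 once decay has finished.
    progress = min((step - warmup_iters) / (decay_iters - warmup_iters), 1.0)
    # Cosine interpolation between peak_lr (progress 0) and min_lr (progress 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 715, 1430, 143_000):
    print(step, lr_at(step))
```

Under these assumptions the rate ramps up over the first 1430 steps and settles at `min_lr` from step 143000 onward.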
Summary
Summary metrics are your model's outputs.
No summary metrics saved for this run.