dense-reward-carper v. main
dense-reward-carper @6bc1c54: fix(ppo_randomwalks): `reward_fn` signature to accommodate tokenizer (2023-07-17)
main @5c5abca: feat(readme): add instructions to avoid OOMs with hyperparameters (#470) (2023-07-13)
Created on July 17 | Last edited on July 17
ppo_hh/pythia-6B-static-sft/7gpus
ppo_sentiments/gpt2-imdb/1gpu
ppo_randomwalks/randomwalks/1gpu
ilql_randomwalks/GPT2Config/1gpu
ppo_sentiments_t5/t5-imdb/1gpu
sft_sentiments/gpt2/1gpu
ilql_sentiments/gpt2/1gpu