
trlx: Add `bitsandbytes` optimizer support #133

Report for the following PR: https://github.com/CarperAI/trlx/pull/133
Created on January 4|Last edited on January 4
This report presents results from PPO training on the sentiments task, run to observe the GPU memory savings from the `bitsandbytes` 8-bit optimizer.
  • Setup: 8 × A100 80 GB, of which only 2 devices were used for testing.
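As a rough back-of-envelope estimate of why an 8-bit optimizer helps here (an assumption for illustration, not a measurement from this report): standard Adam/AdamW keeps two fp32 states per trainable parameter (8 bytes), while the 8-bit variant keeps two int8 states (about 2 bytes, ignoring the small per-block quantization constants).

```python
# Back-of-envelope estimate of Adam optimizer-state memory (assumption,
# not a measured number from this report).
def adam_state_bytes(num_params: int, bytes_per_state: int) -> int:
    # Adam/AdamW tracks two moment estimates per parameter.
    return 2 * num_params * bytes_per_state

# Upper bound using all ~6B parameters of GPT-J; only the 8 unfrozen
# layers actually train in this config, so the real saving is smaller.
params = 6_000_000_000
fp32_gb = adam_state_bytes(params, 4) / 1e9  # fp32 states: 48.0 GB
int8_gb = adam_state_bytes(params, 1) / 1e9  # int8 states: 12.0 GB
print(fp32_gb, int8_gb)
```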
  • Command:
torchrun --nproc_per_node 2 examples/ppo_sentiments.py
  • Config:
train:
  seq_length: 48
  epochs: 10
  total_steps: 80000
  batch_size: 8

  checkpoint_interval: 10000
  eval_interval: 100

  pipeline: "PromptPipeline"
  orchestrator: "PPOOrchestrator"
  trainer: "AcceleratePPOTrainer"

model:
  model_path: "EleutherAI/gpt-j-6B"
  tokenizer_path: "gpt2"
  num_layers_unfrozen: 8

optimizer:
  name: "adamw_8bit_bnb"  # "adamw"
  kwargs:
    lr: 1.0e-5
    betas: [0.9, 0.95]
    eps: 1.0e-8
    weight_decay: 1.0e-6

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 10000  # train.total_steps
    eta_min: 1.0e-4

method:
  name: "ppoconfig"
  num_rollouts: 8
  chunk_size: 8
  ppo_epochs: 4
  init_kl_coef: 0.2
  target: 6
  horizon: 10000
  gamma: 1
  lam: 0.95
  cliprange: 0.2
  cliprange_value: 0.2
  vf_coef: 0.2
  scale_reward: False
  ref_mean: null
  ref_std: null
  cliprange_reward: 10
  gen_kwargs:
    max_new_tokens: 40
    top_k: 0
    top_p: 0.7
    do_sample: True
    temperature: 1.0
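The `optimizer.name` field selects between the standard optimizer and the `bitsandbytes` one (the inline comment shows `"adamw"` as the baseline alternative). A minimal sketch of how such a name could be resolved to an optimizer class path — the registry and helper below are hypothetical, but `bitsandbytes.optim.AdamW8bit` is the real 8-bit counterpart to `torch.optim.AdamW`:

```python
# Hypothetical registry mapping config optimizer names to class paths.
# The names mirror the config above; only the mapping itself is a sketch.
OPTIMIZER_REGISTRY = {
    "adamw": "torch.optim.AdamW",
    "adamw_8bit_bnb": "bitsandbytes.optim.AdamW8bit",
}

def resolve_optimizer(name: str) -> str:
    """Return the dotted class path for a config optimizer name."""
    try:
        return OPTIMIZER_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown optimizer: {name!r}")

print(resolve_optimizer("adamw_8bit_bnb"))  # bitsandbytes.optim.AdamW8bit
```

Both classes accept the same `lr`, `betas`, `eps`, and `weight_decay` kwargs used in the config, which is what makes the swap a one-line change.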


Results

[Interactive W&B run-set chart panels; plots not reproduced in this text export.]