trlx: Add `bitsandbytes` optimizer support #133

Report for the following PR: https://github.com/CarperAI/trlx/pull/133
Created on January 4 | Last edited on January 11
This report presents results from PPO training on the sentiments task, to observe the GPU memory savings from the bitsandbytes 8-bit optimizer.
NOTE: The reference model is placed on the CPU; its frozen weights are only needed to compute the KL penalty, so keeping it off the GPU frees additional memory for the trained policy.
  • Setup: 2 x A100 80GB
  • Command: torchrun --nproc_per_node=2 examples/ppo_sentiments.py
  • Config (a Python sketch of the optimizer and scheduler blocks follows the config):
train:
  seq_length: 48
  epochs: 10
  total_steps: 80000
  batch_size: 8

  checkpoint_interval: 10000
  eval_interval: 100

  pipeline: "PromptPipeline"
  orchestrator: "PPOOrchestrator"
  trainer: "AcceleratePPOTrainer"

model:
  model_path: "EleutherAI/gpt-j-6B"
  tokenizer_path: "gpt2"
  num_layers_unfrozen: 8

optimizer:
  name: "adamw_8bit_bnb" # "adamw"
  kwargs:
    lr: 1.4e-5
    betas: [0.9, 0.95]
    eps: 1.0e-8
    weight_decay: 1.0e-6

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 80000 # train.total_steps
    eta_min: 1.0e-4

method:
  name: "ppoconfig"
  num_rollouts: 8
  chunk_size: 8
  ppo_epochs: 4
  init_kl_coef: 0.2
  target: 6
  horizon: 10000
  gamma: 1
  lam: 0.95
  cliprange: 0.2
  cliprange_value: 0.2
  vf_coef: 0.2
  scale_reward: "running"
  ref_mean: null
  ref_std: null
  cliprange_reward: 10
  gen_kwargs:
    max_new_tokens: 40
    top_k: 0
    top_p: 0.7
    do_sample: True
    temperature: 1.0
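
For reference, here is a minimal Python sketch of what the optimizer and scheduler blocks above correspond to, assuming the "adamw_8bit_bnb" name resolves to bitsandbytes' AdamW8bit; the model loading and reference-model placement are illustrative stand-ins, not trlx's actual wiring:

import bitsandbytes as bnb
import torch
from transformers import AutoModelForCausalLM

# Policy being trained (stand-in for trlx's model setup).
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B").cuda()

# Frozen reference model for the KL penalty, left on the CPU as noted above.
ref_model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B").eval()

# AdamW8bit stores its first/second-moment state in 8 bits, roughly a 4x
# reduction in optimizer-state memory vs. 32-bit AdamW.
optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=1.4e-5,
    betas=(0.9, 0.95),
    eps=1.0e-8,
    weight_decay=1.0e-6,
)

# Cosine annealing over the full run, mirroring the scheduler block.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=80000, eta_min=1.0e-4
)

Note that with num_layers_unfrozen: 8 only the unfrozen subset of parameters accumulates optimizer state, so the 8-bit savings apply to that subset.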

Results



[Eleven W&B chart panels, each over a run set of 2 runs; the plot contents are not recoverable from this text export.]