
trlx: Add `bitsandbytes` optimizer support #133

Report for the following PR: https://github.com/CarperAI/trlx/pull/133
Created on January 4|Last edited on January 4
This report presents results from PPO training on the sentiments task, run to observe the GPU memory savings from the `bitsandbytes` 8-bit optimizer.
  • Setup: 8 × A100 80 GB, of which only 2 devices were used for testing.
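As a rough back-of-envelope estimate of why an 8-bit optimizer helps here (an assumption for illustration, not a measurement from this report): standard Adam/AdamW keeps two fp32 states per trainable parameter (8 bytes), while the 8-bit variant keeps two int8 states (about 2 bytes, ignoring the small per-block quantization constants).

```python
# Back-of-envelope estimate of Adam optimizer-state memory (assumption,
# not a measured number from this report).
def adam_state_bytes(num_params: int, bytes_per_state: int) -> int:
    # Adam/AdamW tracks two moment estimates per parameter.
    return 2 * num_params * bytes_per_state

# Upper bound using all ~6B parameters of GPT-J; only the 8 unfrozen
# layers actually train in this config, so the real saving is smaller.
params = 6_000_000_000
fp32_gb = adam_state_bytes(params, 4) / 1e9  # fp32 states: 48.0 GB
int8_gb = adam_state_bytes(params, 1) / 1e9  # int8 states: 12.0 GB
print(fp32_gb, int8_gb)
```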
  • Command:
torchrun --nproc_per_node 2 examples/ppo_sentiments.py
  • Config:
train:
  seq_length: 48
  epochs: 10
  total_steps: 80000
  batch_size: 8

  checkpoint_interval: 10000
  eval_interval: 100

  pipeline: "PromptPipeline"
  orchestrator: "PPOOrchestrator"
  trainer: "AcceleratePPOTrainer"

model:
  model_path: "EleutherAI/gpt-j-6B"
  tokenizer_path: "gpt2"
  num_layers_unfrozen: 8

optimizer:
  name: "adamw_8bit_bnb"  # "adamw"
  kwargs:
    lr: 1.0e-5
    betas: [0.9, 0.95]
    eps: 1.0e-8
    weight_decay: 1.0e-6

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 10000  # train.total_steps
    eta_min: 1.0e-4

method:
  name: "ppoconfig"
  num_rollouts: 8
  chunk_size: 8
  ppo_epochs: 4
  init_kl_coef: 0.2
  target: 6
  horizon: 10000
  gamma: 1
  lam: 0.95
  cliprange: 0.2
  cliprange_value: 0.2
  vf_coef: 0.2
  scale_reward: False
  ref_mean: null
  ref_std: null
  cliprange_reward: 10
  gen_kwargs:
    max_new_tokens: 40
    top_k: 0
    top_p: 0.7
    do_sample: True
    temperature: 1.0
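The `optimizer.name` field selects between the standard optimizer and the `bitsandbytes` one (the inline comment shows `"adamw"` as the baseline alternative). A minimal sketch of how such a name could be resolved to an optimizer class path — the registry and helper below are hypothetical, but `bitsandbytes.optim.AdamW8bit` is the real 8-bit counterpart to `torch.optim.AdamW`:

```python
# Hypothetical registry mapping config optimizer names to class paths.
# The names mirror the config above; only the mapping itself is a sketch.
OPTIMIZER_REGISTRY = {
    "adamw": "torch.optim.AdamW",
    "adamw_8bit_bnb": "bitsandbytes.optim.AdamW8bit",
}

def resolve_optimizer(name: str) -> str:
    """Return the dotted class path for a config optimizer name."""
    try:
        return OPTIMIZER_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown optimizer: {name!r}")

print(resolve_optimizer("adamw_8bit_bnb"))  # bitsandbytes.optim.AdamW8bit
```

Both classes accept the same `lr`, `betas`, `eps`, and `weight_decay` kwargs used in the config, which is what makes the swap a one-line change.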


Results

[Interactive W&B run-set chart panels; plots not reproduced in this text export.]