
trlx: `accelerate` Multi-Node DDP Benchmark

PPO Sentiments Benchmark on a Multi-Node DDP Setup
Created on December 7 | Last edited on December 8
EDIT: THE PERF TIMINGS ARE OFF BECAUSE OF A POORLY CONFIGURED mpirun
Multi-Node DDP Setup:
PPO-Benchmark Setup (`ppo-benchmark/{model_name}`)
  • CoreWeave Cluster
  • 8 x A100 80GB
  • Num Nodes = 1
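
For reference, a run like this is typically driven by an `accelerate` config file replicated on every node. The sketch below is a hypothetical example of such a config, not the one used for this benchmark; the IP, port, and machine counts are placeholders.

```yaml
# Hypothetical accelerate config for a 2-node DDP run (all values are placeholders).
# Every node uses the same file, except machine_rank is set per node.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
machine_rank: 0            # 0 on the main node, 1..N-1 on the workers
main_process_ip: 10.0.0.1  # placeholder: IP of the main node
main_process_port: 29500   # placeholder rendezvous port
num_machines: 2            # total nodes
num_processes: 16          # total GPUs across all nodes (8 per node x 2)
mixed_precision: 'no'
use_cpu: false
```

Each node then starts training with `accelerate launch --config_file <path> <script>`; the EDIT above attributes the skewed timings to a poorly configured `mpirun` launch instead.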
Config:
model:
  model_path: "facebook/opt-2.7b"      # Name of hf model to load
  tokenizer_path: "facebook/opt-2.7b"  # Name of hf tokenizer to load
  model_type: "AcceleratePPOModel"     # Name of accelerate model type to load
  num_layers_unfrozen: -1              # Number of layers left unfrozen during training (-1 = all)

train:
  seq_length: 48          # Size of LM context
  epochs: 10              # Train for max(epochs, total_steps)
  total_steps: 80000      # Train for max(epochs, total_steps)
  batch_size: 8           # Batch size

  # Large Model Settings
  lr_init: 1.04e-5        # Initial learning rate
  lr_target: 1.04e-5      # Target final learning rate
  opt_betas: [0.9, 0.95]  # Adam betas
  opt_eps: 1.0e-8         # Adam epsilon
  weight_decay: 1.0e-6    # Weight decay param

  checkpoint_interval: 1000000  # Checkpoint interval
  eval_interval: 16             # Eval interval

  pipeline: "PromptPipeline"       # Prompt pipeline to load
  orchestrator: "PPOOrchestrator"  # Orchestrator to load

method:
  name: 'ppoconfig'       # Name of RL method config
  num_rollouts: 8         # Number of rollouts to collect per epoch
  # WARNING: VERY SMALL CHUNK SIZE BECAUSE OF SLOW GENERATION!
  chunk_size: 1           # Number of rollouts to collect in one loop of orchestrator
  ppo_epochs: 4           # Number of PPO epochs
  init_kl_coef: 0.2       # Initial KL coefficient
  target: 6               # Target KL value for adaptive KL control
  horizon: 10000          # PPO horizon
  gamma: 1                # PPO discount
  lam: 0.95               # PPO lambda
  cliprange: 0.2          # Policy clip range
  cliprange_value: 0.2    # Value function clip range
  vf_coef: 0.2            # Value term weight
  scale_reward: "running" # False|"ref"|"running" estimate against which to scale rewards
  cliprange_reward: 10
  ref_mean: null
  ref_std: null
  gen_kwargs:
    max_length: 48        # LM max sample gen length
    min_length: 48        # LM min sample gen length
    top_k: 0.0            # Top-k
    top_p: 0.7            # Top-p
    do_sample: True       # Sample
    temperature: 1.0
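
For context on how a config like this is consumed, here is a minimal driver sketch in the style of trlx's ppo_sentiments example. It assumes the `TRLConfig.load_yaml` helper and the `trlx.train` entry point from the trlx repo; the reward function and prompts are placeholders, not the ones used in this benchmark.

```python
# Minimal sketch of a trlx PPO driver (reward and prompts are placeholders).
import trlx
from trlx.data.configs import TRLConfig

def reward_fn(samples):
    # Placeholder reward: in the actual sentiments benchmark this would score
    # each generated sample with a sentiment classifier. Returns one float
    # per sample.
    return [float(len(sample)) for sample in samples]

config = TRLConfig.load_yaml("configs/ppo_config.yml")  # the YAML above

model = trlx.train(
    reward_fn=reward_fn,
    prompts=["I really enjoyed this movie because"] * 64,  # placeholder prompts
    config=config,
)
```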


Results



[W&B run-set panels: training metric charts]

NOTE: Multi-Node DDP leads to a >2x slowdown across training.
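
One way to quantify that slowdown independently of the W&B charts is to time the training step directly on each rank. Below is a minimal timing harness, assuming a generic PyTorch training loop rather than trlx internals.

```python
# Hypothetical per-step timing harness for comparing single- vs multi-node runs.
import time

import torch

def mean_seconds_per_step(step_fn, n_steps=50, warmup=5):
    """Return mean wall-clock seconds per call to step_fn, after warmup."""
    for _ in range(warmup):
        step_fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # flush queued kernels before starting the clock
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_steps
```

The ratio of this number between the multi-node and single-node runs gives the slowdown factor directly.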

