trlx: `accelerate` Multi-Node DDP Benchmark
PPO sentiments benchmark on a multi-node DDP setup
Created on December 7 | Last edited on December 8
EDIT: The performance timings below are off because of a poorly configured mpirun; take the absolute numbers with caution.
Multi-Node DDP Setup (a hypothetical launch-config sketch follows this list):
- CoreWeave Cluster
- 8 x A100 80GB
- Num Nodes = 2
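The report doesn't include the actual launch configuration, and the EDIT above notes the run went through a misconfigured mpirun. For orientation only, here is a minimal sketch of what an `accelerate` config for this 2-node DDP topology could look like. It assumes 8 GPUs per node (16 processes total), and the IP address and port are placeholders, not values from the actual CoreWeave cluster.

```yaml
# Hypothetical accelerate config for the 2-node DDP run (not from the original report).
# Assumes 8 x A100 per node, i.e. 16 processes total; IP/port are placeholders.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 2
num_processes: 16          # total worker processes across both nodes
machine_rank: 0            # set to 1 when launching on the second node
main_process_ip: 10.0.0.1  # placeholder for the rank-0 node's address
main_process_port: 29500
mixed_precision: 'no'
use_cpu: false
```

Each node would run `accelerate launch` with its own copy of this file, identical except for `machine_rank`.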
PPO-Benchmark Setup (`ppo-benchmark/{model_name}`), with a matching single-node sketch after this list:
- CoreWeave Cluster
- 8 x A100 80GB
- Num Nodes = 1
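By the same token, a hypothetical single-node baseline config for the `ppo-benchmark/{model_name}` runs, assuming all 8 A100s sit in one machine:

```yaml
# Hypothetical accelerate config for the single-node baseline (not from the original report).
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1
num_processes: 8   # one process per A100 on the single node
machine_rank: 0
mixed_precision: 'no'
use_cpu: false
```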
Config:
```yaml
model:
  model_path: "facebook/opt-2.7b"  # Name of hf model to load
  tokenizer_path: "facebook/opt-2.7b"  # Name of hf tokenizer to load
  model_type: "AcceleratePPOModel"  # Name of accelerate model type to load
  num_layers_unfrozen: -1  # Number of bottom layers to freeze during training

train:
  seq_length: 48  # Size of LM context
  epochs: 10  # Train for max(epochs, total_steps)
  total_steps: 80000  # Train for max(epochs, total_steps)
  batch_size: 8  # batch size

  # Large Model Settings
  lr_init: 1.04e-5  # init learning rate
  lr_target: 1.04e-5  # target final learning rate
  opt_betas: [0.9, 0.95]  # adam betas
  opt_eps: 1.0e-8  # adam eps
  weight_decay: 1.0e-6  # weight decay param

  checkpoint_interval: 1000000  # checkpoint interval
  eval_interval: 16  # eval interval

  pipeline: "PromptPipeline"  # prompt pipeline to load
  orchestrator: "PPOOrchestrator"  # orchestrator to load

method:
  name: 'ppoconfig'  # Name of RL method config
  num_rollouts: 8  # Number of rollouts to collect per epoch
  # WARNING: VERY SMALL CHUNK SIZE BECAUSE OF SLOW GENERATION!
  chunk_size: 1  # Number of rollouts to collect in one loop of orchestrator
  ppo_epochs: 4  # Number of ppo epochs
  init_kl_coef: 0.2  # init kl coefficient
  target: 6  # target kl coefficient
  horizon: 10000  # PPO horizon
  gamma: 1  # PPO discount
  lam: 0.95  # PPO lambda
  cliprange: 0.2  # clip range
  cliprange_value: 0.2  # clip range
  vf_coef: 0.2  # value term weight
  scale_reward: "running"  # False|"ref"|"running" estimate against which to scale rewards
  cliprange_reward: 10
  ref_mean: null
  ref_std: null
  gen_kwargs:
    max_length: 48  # LM max sample gen length
    min_length: 48  # LM min sample gen length
    top_k: 0.0  # top k
    top_p: 0.7  # top p
    do_sample: True  # sample
    temperature: 1.0
```
Results
NOTE: Multi-node DDP shows >2x slowdowns across training relative to the single-node baseline (though, per the EDIT above, the absolute timings are skewed by the mpirun misconfiguration).