trlx: Repro `bnb` perf after t5 update
Created on January 10 | Last edited on January 10
Example Used:
Config:
train:
  seq_length: 48
  epochs: 10
  total_steps: 80000
  batch_size: 8
  checkpoint_interval: 10000
  eval_interval: 100
  pipeline: "PromptPipeline"
  orchestrator: "PPOOrchestrator"
  trainer: "AcceleratePPOTrainer"
  entity_name: "jon-tow"

model:
  model_path: "EleutherAI/gpt-j-6B"
  tokenizer_path: "gpt2"
  num_layers_unfrozen: 8

optimizer:
  # name: "adamw"
  # 8-bit Optimizer Settings
  name: "adamw_8bit_bnb"
  kwargs:
    lr: 1.4e-5
    betas: [0.9, 0.95]
    eps: 1.0e-8
    weight_decay: 1.0e-6

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 80000 # train.total_steps
    eta_min: 1.0e-4

method:
  name: "ppoconfig"
  num_rollouts: 8
  chunk_size: 8
  ppo_epochs: 4
  init_kl_coef: 0.2
  target: 6
  horizon: 10000
  gamma: 1
  lam: 0.95
  cliprange: 0.2
  cliprange_value: 0.2
  vf_coef: 0.2
  scale_reward: "running"
  ref_mean: null
  ref_std: null
  cliprange_reward: 10
  gen_kwargs:
    max_new_tokens: 40
    top_k: 0
    top_p: 0.7
    do_sample: True
    temperature: 1.0
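To sanity-check the scheduler settings in the config above, here is a minimal sketch of the closed-form cosine annealing curve. It assumes that `cosine_annealing` follows the standard formula used by PyTorch's `CosineAnnealingLR`, which is not confirmed by this report. The function name `cosine_annealing_lr` is hypothetical. Under that assumption, `eta_min` (1.0e-4) is larger than the starting `lr` (1.4e-5), so the learning rate would rise over the run rather than decay.

```python
import math


def cosine_annealing_lr(step, base_lr=1.4e-5, eta_min=1.0e-4, t_max=80000):
    """Closed-form cosine annealing (assumed to match the config's scheduler).

    lr(step) = eta_min + (base_lr - eta_min) * (1 + cos(pi * step / t_max)) / 2
    Defaults are copied from the optimizer and scheduler kwargs in the config.
    """
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2


if __name__ == "__main__":
    # Print the learning rate at the start, midpoint, and end of training.
    for step in (0, 40000, 80000):
        print(f"step {step:>6}: lr = {cosine_annealing_lr(step):.3e}")
```

If a decaying schedule was intended, `eta_min` would need to be smaller than `lr`; the report does not say which was meant.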
Result
Run set
2