trlx: Add `LORA` support #110
Report for the following PR: https://github.com/CarperAI/trlx/pull/110
Created on January 11 | Last edited on January 11
This report provides results from PPO training on the sentiments task to observe the GPU memory-saving behavior of LoRA.
NOTE: The reference model is placed on the CPU.
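For reference, here is a minimal sketch of that idea, assuming a Hugging Face causal LM: the frozen reference model lives on the CPU, so the GPU holds only the trained policy. This is illustrative rather than trlx's exact implementation; `ref_logprobs` is a hypothetical helper, and only the model name is taken from the config below.

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative only: keep the frozen reference model on the CPU so GPU
# memory is reserved for the trained (LoRA) policy.
ref_model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
ref_model.to("cpu").eval()

@torch.no_grad()
def ref_logprobs(input_ids: torch.Tensor) -> torch.Tensor:
    """Per-token log-probs of the observed tokens under the reference model."""
    ids = input_ids.cpu()
    logits = ref_model(ids).logits[:, :-1, :]   # position t predicts token t+1
    logprobs = torch.log_softmax(logits, dim=-1)
    return logprobs.gather(-1, ids[:, 1:, None]).squeeze(-1)
```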
- Setup: 2 x A100 80GB
train:
  seq_length: 48
  epochs: 10
  total_steps: 80000
  batch_size: 8
  checkpoint_interval: 10000
  eval_interval: 100
  pipeline: "PromptPipeline"
  orchestrator: "PPOOrchestrator"
  trainer: "AcceleratePPOTrainer"
  entity_name: "jon-tow"

model:
  model_path: "EleutherAI/gpt-j-6B"
  tokenizer_path: "gpt2"
  num_layers_unfrozen: 8
  # LoRA Settings
  delta_kwargs:
    delta_type: "lora"
    modified_modules: "all"
    lora_r: 4

optimizer:
  name: "adamw"
  kwargs:
    lr: 1.4e-5
    betas: [0.9, 0.95]
    eps: 1.0e-8
    weight_decay: 1.0e-6

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 80000  # train.total_steps
    eta_min: 1.0e-4

method:
  name: "ppoconfig"
  num_rollouts: 8
  chunk_size: 8
  ppo_epochs: 4
  init_kl_coef: 0.2
  target: 6
  horizon: 10000
  gamma: 1
  lam: 0.95
  cliprange: 0.2
  cliprange_value: 0.2
  vf_coef: 0.2
  scale_reward: "running"
  ref_mean: null
  ref_std: null
  cliprange_reward: 10
  gen_kwargs:
    max_new_tokens: 40
    top_k: 0
    top_p: 0.7
    do_sample: True
    temperature: 1.0
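For intuition about what `delta_type: "lora"` with `lora_r: 4` does, below is a minimal, self-contained sketch of a LoRA-wrapped linear layer. trlx configures this through the OpenDelta-style `delta_kwargs` above rather than through a class like this; the `alpha` scaling and the initialization here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update W + B @ A."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the original weights
        # Low-rank factors A (r x in) and B (out x r); only these are trained.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

# With r=4 on a 4096x4096 projection, trainable parameters drop from ~16.8M
# to 2 * 4 * 4096 = 32768 per wrapped layer.
layer = LoRALinear(nn.Linear(4096, 4096), r=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # 32768
```

Because only the low-rank factors receive gradients and optimizer state, this is what drives the GPU memory savings measured below.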
Results
[Embedded W&B charts omitted: each panel compares a run set of 2 runs.]