trlx: LoRA support
Results for the ILQL sentiment task with LoRA, observing training dynamics and memory savings.
Note that the LoRA-adapted model's loss_awac appears to "converge" to a stable value quickly, while the baseline (fully learnable) version continues an almost step-wise decrease throughout training.
Config:
```yaml
train:
  seq_length: 64
  batch_size: 128
  epochs: 100
  total_steps: 1000
  checkpoint_interval: 1000
  eval_interval: 100
  pipeline: "PromptPipeline"
  orchestrator: "OfflineOrchestrator"
  trainer: "AccelerateILQLTrainer"
  seed: 1000

model:
  model_path: "EleutherAI/pythia-2.7b"
  tokenizer_path: "EleutherAI/pythia-2.7b"
  num_layers_unfrozen: -1
  # Comment the delta configs to remove OpenDelta adapters
  delta_method: "lora"
  delta_modified_modules: "all"

optimizer:
  name: "adamw"
  kwargs:
    lr: 5.0e-5
    betas: [0.9, 0.95]
    eps: 1.0e-8
    weight_decay: 1.0e-6

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 1000  # train.total_steps
    eta_min: 5.0e-5

method:
  name: "ilqlconfig"
  tau: 0.7
  gamma: 0.99
  cql_scale: 0.1
  awac_scale: 1
  alpha: 0.001
  steps_for_target_q_sync: 5
  two_qs: true
  gen_kwargs:
    max_new_tokens: 56
    top_k: 20
    beta: 4
    temperature: 1.0
```
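As a rough illustration of how such a config is used, here is a minimal sketch for launching the offline ILQL run. It assumes the config above is saved as `ilql_lora_config.yml` (a hypothetical filename) and that the installed trlx version exposes `TRLConfig.load_yaml` and a `trlx.train(samples=..., rewards=..., config=...)` entry point; the toy samples and rewards below are placeholders, not the actual sentiment dataset.

```python
# Sketch only: launching offline ILQL training with the LoRA config above.
import trlx
from trlx.data.configs import TRLConfig

# Load the YAML config shown in this report (hypothetical path).
config = TRLConfig.load_yaml("ilql_lora_config.yml")

# Toy offline dataset: each text sample paired with a scalar reward.
# The real sentiment example scores IMDB reviews with a sentiment classifier.
samples = ["the movie was surprisingly good", "a dull and lifeless film"]
rewards = [1.0, 0.0]

trlx.train(
    samples=samples,
    rewards=rewards,
    config=config,
)
```

To reproduce the fully learnable baseline from the same script, comment out the `delta_method` and `delta_modified_modules` lines in the config, as the comment in the `model` section notes.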
Section 1
[W&B chart panels for the run set of 2 runs: the LoRA-adapted model vs. the fully learnable baseline.]