
trlx: LoRA support

Results for the ILQL sentiment task with LoRA adapters, comparing training dynamics and memory savings against a fully fine-tuned baseline.
Note that the LoRA-adapted model's loss_awac appears to converge to a stable value quickly, while the baseline (fully-learnable) model continues an almost step-wise decrease throughout training.
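
The memory saving comes from training only the injected LoRA weights while the backbone stays frozen. Below is a minimal sketch (not taken from the report) using OpenDelta, which trlx relies on when the delta_* fields are set; the smaller pythia-160m checkpoint and the count_params helper are illustrative stand-ins, not the report's setup.

from transformers import AutoModelForCausalLM
from opendelta import LoraModel

def count_params(model):
    # Count trainable vs. total parameters to see the effect of freezing.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total

# Smaller stand-in for EleutherAI/pythia-2.7b so the sketch runs quickly.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")

delta_model = LoraModel(backbone_model=model)  # inject LoRA adapters into the backbone
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)  # freeze all non-adapter weights

trainable, total = count_params(model)
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")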
Config:
train:
  seq_length: 64
  batch_size: 128
  epochs: 100
  total_steps: 1000

  checkpoint_interval: 1000
  eval_interval: 100

  pipeline: "PromptPipeline"
  orchestrator: "OfflineOrchestrator"
  trainer: "AccelerateILQLTrainer"

  seed: 1000

model:
  model_path: "EleutherAI/pythia-2.7b"
  tokenizer_path: "EleutherAI/pythia-2.7b"
  num_layers_unfrozen: -1
  # Comment the delta configs to remove OpenDelta adapters
  delta_method: "lora"
  delta_modified_modules: "all"

optimizer:
  name: "adamw"
  kwargs:
    lr: 5.0e-5
    betas: [0.9, 0.95]
    eps: 1.0e-8
    weight_decay: 1.0e-6

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 1000  # train.total_steps
    eta_min: 5.0e-5

method:
  name: "ilqlconfig"
  tau: 0.7
  gamma: 0.99
  cql_scale: 0.1
  awac_scale: 1
  alpha: 0.001
  steps_for_target_q_sync: 5
  two_qs: true
  gen_kwargs:
    max_new_tokens: 56
    top_k: 20
    beta: 4
    temperature: 1.0
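
For context, a config like this is consumed by trlx's offline ILQL sentiment example. The sketch below is illustrative rather than the exact example script: the config path, the choice of IMDB labels as offline rewards, and the eval prompt are assumptions, and the exact keyword names accepted by trlx.train have varied between trlx versions. With the delta_* fields present, trlx injects OpenDelta LoRA adapters; commenting them out (as noted in the config) trains the fully-learnable baseline.

from datasets import load_dataset

import trlx
from trlx.data.configs import TRLConfig

# Assumed path to the YAML shown above.
config = TRLConfig.load_yaml("configs/ilql_config.yml")

imdb = load_dataset("imdb", split="train")

trlx.train(
    samples=imdb["text"],
    rewards=[float(label) for label in imdb["label"]],  # 1.0 = positive review, used as the offline reward
    eval_prompts=["I really enjoyed this movie because"] * 64,  # illustrative eval prompt
    config=config,
)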


Section 1


[Six W&B panels over a run set of 2 runs (LoRA vs. fully-learnable baseline).]