trlx: LoRA support
Results for the ILQL sentiment task with LoRA, observing training dynamics and memory savings.
Note that the LoRA-adapted model's loss_awac appears to "converge" to a stable value quickly, while the baseline (fully learnable) version continues an almost step-wise decrease throughout training.
Config:
```yaml
train:
  seq_length: 64
  batch_size: 128
  epochs: 100
  total_steps: 1000
  checkpoint_interval: 1000
  eval_interval: 100
  pipeline: "PromptPipeline"
  orchestrator: "OfflineOrchestrator"
  trainer: "AccelerateILQLTrainer"
  seed: 1000

model:
  model_path: "EleutherAI/pythia-2.7b"
  tokenizer_path: "EleutherAI/pythia-2.7b"
  num_layers_unfrozen: -1
  # Comment the delta configs to remove OpenDelta adapters
  delta_method: "lora"
  delta_modified_modules: "all"

optimizer:
  name: "adamw"
  kwargs:
    lr: 5.0e-5
    betas: [0.9, 0.95]
    eps: 1.0e-8
    weight_decay: 1.0e-6

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 1000  # train.total_steps
    eta_min: 5.0e-5

method:
  name: "ilqlconfig"
  tau: 0.7
  gamma: 0.99
  cql_scale: 0.1
  awac_scale: 1
  alpha: 0.001
  steps_for_target_q_sync: 5
  two_qs: true
  gen_kwargs:
    max_new_tokens: 56
    top_k: 20
    beta: 4
    temperature: 1.0
```
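As a rough illustration of how such a config is used, here is a minimal sketch for launching the offline ILQL run. It assumes the config above is saved as `ilql_lora_config.yml` (a hypothetical filename) and that the installed trlx version exposes `TRLConfig.load_yaml` and a `trlx.train(samples=..., rewards=..., config=...)` entry point; the toy samples and rewards below are placeholders, not the actual sentiment dataset.

```python
# Sketch only: launching offline ILQL training with the LoRA config above.
import trlx
from trlx.data.configs import TRLConfig

# Load the YAML config shown in this report (hypothetical path).
config = TRLConfig.load_yaml("ilql_lora_config.yml")

# Toy offline dataset: each text sample paired with a scalar reward.
# The real sentiment example scores IMDB reviews with a sentiment classifier.
samples = ["the movie was surprisingly good", "a dull and lifeless film"]
rewards = [1.0, 0.0]

trlx.train(
    samples=samples,
    rewards=rewards,
    config=config,
)
```

To reproduce the fully learnable baseline from the same script, comment out the `delta_method` and `delta_modified_modules` lines in the config, as the comment in the `model` section notes.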
Section 1
[W&B chart panels for the run set of 2 runs: the LoRA-adapted model vs. the fully learnable baseline.]