
Master LLM finetuning for Python code tasks

Understanding LLM finetuning

This section introduces what LLM finetuning is, how it differs from pre-training, and why the handling of instructions during training matters.

What is LLM finetuning?

LLM finetuning adapts a pre-trained language model to solve targeted, often narrower tasks—think classification, code completion, or dialogue—instead of the broad language understanding learned during pre-training. During pre-training, the model absorbs statistical patterns from vast, generic datasets. Finetuning hones those capabilities on smaller, task-specific datasets. This process brings the model closer to your desired application, boosting both accuracy and relevance. For developers and researchers with limited resources, parameter-efficient methods, which only modify a fraction of the model’s weights, are vital to making finetuning accessible and scalable.

The impact of masking instructions during instruction finetuning

When finetuning LLMs for tasks involving instructions (such as prompts or code comments), how the instructions are handled during training meaningfully affects performance. If instructions are masked (ignored in loss computation), models may not learn to follow them robustly, leading to lower quality outputs. Recent work, including the "Instruction Tuning With Loss Over Instructions" paper, demonstrates that preserving instructions within the loss calculation—allowing the model to model instructions as part of its output—consistently improves generalization and downstream performance. As you develop instruction-based applications, it’s usually best not to mask instructions, unless you have a strong task-specific reason.
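
To make the two choices concrete, here is a minimal sketch, assuming a HuggingFace-style tokenizer; the gpt2 tokenizer and the tiny instruction/response pair are purely illustrative. The key detail is that -100 is the ignore index for PyTorch's cross-entropy loss, so masked positions contribute no gradient.

# Minimal sketch: masking instruction tokens out of the loss vs. keeping them in
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any causal-LM tokenizer works for the illustration

instruction = "# Write a function that adds two numbers.\n"
response = "def add(a, b):\n    return a + b\n"

encoded = tokenizer(instruction + response, return_tensors="pt")
input_ids = encoded["input_ids"]
instruction_len = len(tokenizer(instruction)["input_ids"])  # number of tokens in the instruction prefix

# Option A: mask the instruction, so only the response is scored by the loss
masked_labels = input_ids.clone()
masked_labels[:, :instruction_len] = -100

# Option B: keep the instruction in the loss (the setting the paper finds generalizes better)
unmasked_labels = input_ids.clone()

print("Instruction tokens excluded from the loss:", int((masked_labels == -100).sum()))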

Exploring parameter-efficient finetuning methods

This section compares LoRA with full finetuning and looks at what recent research says about the tradeoffs.

How does LoRA compare to full finetuning?

LoRA (Low-Rank Adaptation) is a breakthrough technique that enables finetuning by injecting compact, trainable matrices into specific layers of a pre-trained model. Instead of updating the entire model (which can be hundreds of millions or even billions of parameters), LoRA keeps most weights frozen, only learning a small number of additional parameters. This yields several practical benefits:

  • Significantly fewer trainable parameters, enabling finetuning on modest hardware.
  • Comparable or even improved performance on many domain-specific and general tasks.
  • Easier storage and transfer of trained adapters, since only the LoRA parameters need to be saved.

For domains like code generation, LoRA has demonstrated nearly equivalent learning capacity to full finetuning, but with a fraction of compute and memory footprint.
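
As a rough sketch of the mechanism (not the actual peft implementation), a LoRA layer keeps the original weight frozen and adds a scaled low-rank update on top of it; the layer sizes below are arbitrary:

# Illustrative LoRA layer: frozen linear weight plus a trainable low-rank update
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: the update starts at zero
        self.scaling = alpha / r

    def forward(self, x):
        # y = W x + (alpha / r) * B A x
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512), r=8)
y = layer(torch.randn(4, 512))  # drop-in replacement for the frozen layer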

Implications of the "LoRA Learns Less and Forgets Less" study

According to the "LoRA Learns Less and Forgets Less" study, LoRA’s low-rank adapters focus learning on only the most relevant subspaces of the model. As a result, LoRA often causes:

  • Slower overwriting of general knowledge from pre-training (what’s called “less forgetting”)
  • Greater stability across incremental finetuning tasks, with less risk of catastrophic forgetting

In practice, this means a LoRA-finetuned model is less prone to losing original language or reasoning abilities, even as it acquires strong new task skills.

How does LoRA reduce trainable parameters and GPU memory requirements?

LoRA leverages mathematical decomposition: it represents changes to the model’s weights using low-rank matrices. Each adapted layer learns two small matrices instead of a massive weight tensor—greatly decreasing both the number of new parameters and required memory bandwidth. Technically, this involves:

  • Introducing pairs of small matrices (A and B, with rank r typically 4-16) at points in the Transformer layer (such as the query/key/value projections or feed-forward linear layers).
  • During training, only these matrices are optimized; the vast majority of pre-trained weights remain fixed. This targeted update reduces VRAM requirements and enables rapid experimentation and transfer, perfect for modern ML workflows; the quick calculation after this list makes the savings concrete.
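
A back-of-the-envelope calculation shows the scale of the savings; the 4096x4096 projection below is illustrative rather than Gemma's actual layer shape:

# Illustrative parameter count: full update vs. rank-8 LoRA for one projection matrix
d_in, d_out, r = 4096, 4096, 8

full_update = d_in * d_out            # 16,777,216 trainable values
lora_update = r * d_in + d_out * r    # A (r x d_in) plus B (d_out x r) = 65,536 values

print(f"Full finetuning: {full_update:,} params per layer")
print(f"LoRA (r={r}):    {lora_update:,} params per layer "
      f"({100 * lora_update / full_update:.2f}% of full)")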

Practical tutorial: Fine-tuning Gemma 3 270M using Weights & Biases

This section walks through the full workflow step by step, from environment setup to logging the trained adapter as a W&B artifact.

Step-by-step guide to implementing LoRA

Now you’ll put theory into practice: fine-tune Google’s Gemma 3 270M model on a sample Python code dataset, using LoRA for efficient adaptation and Weights & Biases to track, version, and compare experiments.

Step 1: Set up your environment

First, ensure you have a CUDA-capable GPU or Google Colab session. Then, install required packages.

# Step 1: Install dependencies
!pip install datasets transformers accelerate peft wandb weave

Expected output:

Successfully installed datasets transformers accelerate peft wandb weave ...

💡 Tip: If running on Colab, make sure your notebook is set to GPU by clicking Runtime > Change runtime type > GPU.
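
Optionally, confirm that the runtime actually sees a GPU before moving on. This short check assumes PyTorch is already present, as it is on Colab:

# Quick sanity check that a CUDA device is visible
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))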

Step 2: Log in to Weights & Biases

Configure your Weights & Biases project. This also enables seamless use of Weave, W&B’s toolkit for inspecting models, runs, and artifacts.

# Step 2: Login to Weights & Biases
import wandb

wandb.login()  # This will ask you to paste an API key from https://wandb.ai/authorize

Expected output:

wandb: Paste an API key from your profile at https://wandb.ai/authorize
wandb: Appending key to your netrc file: /root/.netrc

Step 3: Prepare your Python code dataset

You’ll use a small public Python code dataset for demonstration, though you can use your own.

# Step 3: Load a Python code dataset
from datasets import load_dataset

dataset = load_dataset("codeparrot/github-code", split="train[:2000]") # Small sample for quick demo

print("Sample code snippet:", dataset['content'][:200])

Expected output (truncated):

Sample code snippet: import torch
import torch.nn as nn
class MyModel:
 ...

💡 Tip: Explore other code datasets such as "bigcode/the-stack" or upload your own scripts for domain adaptation.
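
If you want to adapt the model to your own code instead, one possible sketch is to load local .py files with the generic text loader and rename the column so the tokenization step below works unchanged; the file paths here are placeholders:

# Sketch: build a dataset from your own Python files (paths are placeholders)
from datasets import load_dataset

my_dataset = load_dataset(
    "text",
    data_files={"train": ["my_project/utils.py", "my_project/models.py"]},
    sample_by="document",  # one example per file instead of per line
)["train"]

# Rename the column so the tokenization function below works unchanged
my_dataset = my_dataset.rename_column("text", "content")
print(my_dataset)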

⚠️ Troubleshooting:

  • Some datasets require dataset-specific credentials or large downloads. For quick tests, use small splits (train[:2000]).
  • If you encounter ValueError regarding unavailable splits, adjust split specification.

Step 4: Prepare your training pipeline with Transformers and PEFT

You’ll combine the Hugging Face Transformers and PEFT (for LoRA) libraries, enable model evaluation, and set up W&B tracking for each run.

# Step 4: Define your LoRA finetuning pipeline for Gemma 3 270M
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer, DataCollatorForLanguageModeling
from peft import get_peft_model, LoraConfig, TaskType
import torch

model_name = "google/gemma-3-270m"  # Gemma 3 270M, the smallest Gemma variant on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Tokenization function tailored for code data
def tokenize_function(example):
    return tokenizer(example["content"], padding="max_length", truncation=True, max_length=128)

tokenized_dataset = dataset.map(tokenize_function, batched=True)

# Define LoRA configuration
lora_config = LoraConfig(
    r=8,  # Rank (tradeoff between adaptivity and efficiency)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # Common for Transformer-based LLMs
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

# Load base model and apply LoRA
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
model = get_peft_model(model, lora_config)

# Display parameter count difference
def print_trainable_params(model):
    trainable, total = 0, 0
    for param in model.parameters():
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    print(f"Trainable params: {trainable} | Total params: {total} | Percentage: {100 * trainable / total:.4f}%")

print_trainable_params(model)

Expected output:

Trainable params: X | Total params: Y | Percentage: ~0.05%

(Value varies depending on model; LoRA dramatically reduces percentage of trainable parameters.)

💡 Tip: Use LoraConfig to tune r and target_modules for your dataset and hardware. Higher r means more adaptation but larger memory use.
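
As an illustration of that tradeoff, a heavier configuration might look like the sketch below; the values and the extra projection names are illustrative assumptions, not a recommendation tuned for this model:

# Illustrative: a larger-capacity LoRA setup targeting more projection layers
from peft import LoraConfig, TaskType

heavier_config = LoraConfig(
    r=32,  # more adaptation capacity, more VRAM
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)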

Step 5: Set up W&B experiment tracking and Weave model registry

Track your experiment and manage model runs. W&B’s Weave enables you to visualize, filter, and debug your training runs and results interactively.

# Step 5: Initialize W&B run and logging
run = wandb.init(
    project="gemma-python-code-finetune",
    config={
        "lora_rank": 8,
        "train_size": len(tokenized_dataset),
        "base_model": model_name,
        "batch_size": 2,
        "epochs": 1
    },
    job_type="finetuning"
)

# Sample training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=200,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    logging_steps=50,
    report_to=["wandb"],  # sends all logs, metrics, and plots to your W&B dashboard
    fp16=True,
    gradient_accumulation_steps=8,
    weight_decay=0.01,
    run_name=run.name,
    push_to_hub=False
)

# Collator ensures padding is correct for language modeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=False
)

Expected output:

wandb: Currently logged in as: <account>

💡 Tip: Each training run appears in your W&B project dashboard. From there, use Weave to compare learning curves, filter runs by lora_rank or model version, and inspect the most promising checkpoints.
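
You can also query the same runs programmatically with the wandb public API; this sketch assumes a placeholder entity name and that your runs logged lora_rank in their config:

# Sketch: fetch runs from this project and filter by a config value (entity is a placeholder)
import wandb

api = wandb.Api()
runs = api.runs("your-entity/gemma-python-code-finetune", filters={"config.lora_rank": 8})

for candidate in runs:
    print(candidate.name, candidate.config.get("lora_rank"))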

Step 6: Launch training

Now perform LoRA-based finetuning, saving all metrics and artifacts with W&B.

# Step 6: Train the model with LoRA adapters
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset.shuffle(seed=42).select(range(1000)),  # Use only a subset for speed; remove 'select' for full set
    eval_dataset=tokenized_dataset.shuffle(seed=42).select(range(1000, 1200)),  # held-out slice, disjoint from the training subset
    data_collator=data_collator,
)

trainer.train()
trainer.evaluate()

Expected output (truncated):

***** Running training *****
  Num examples = 1000
  Num Epochs = 1
  ...
wandb: Synced 1 W&B file(s), 0 media file(s), 0 artifact file(s)

💡 Tip: After training completes, open your W&B dashboard, explore logs and visualizations, and use Weave to slice, dice, and review run and model artifact data more interactively.

⚠️ Troubleshooting:

  • CUDA Out-of-Memory: Lower the batch size (set per_device_train_batch_size=1) or reduce the sequence length (max_length in the tokenize function).
  • ImportError: If a module is missing, re-run pip install or restart the kernel.
  • Run not logged: Ensure that wandb.init() is called before training and that report_to=["wandb"] is set in your TrainingArguments.

Step 7: Save and interact with LoRA-finetuned model in W&B Models

After training, log the LoRA adapter weights as a W&B Artifact. Use the Weights & Biases Models UI to share and deploy models, or reproduce results and compare across runs.

# Step 7: Save and log LoRA adapter weights as a W&B artifact
adapter_save_path = "./lora_adapter"
model.save_pretrained(adapter_save_path)

artifact = wandb.Artifact(
    name=f"gemma-3-270m-lora-adapter-{run.id}",
    type="model",
    description="LoRA adapter for Gemma 3 270M finetuned on Python code data",
    metadata=run.config.as_dict()
)
artifact.add_dir(adapter_save_path)
run.log_artifact(artifact)

Expected output:

wandb: Adding directory to artifact (lora_adapter)...
wandb: Logging artifact artifact:gemma-3-270m-lora-adapter-<run-id>

💡 Tip: Go to your W&B Models Registry, find the uploaded Artifact, and explore the model card—making it versioned, accessible, and ready for deployment.
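
Later, you can pull the adapter back down in a fresh session. Here is a sketch using the wandb public API, with the entity and run id left as placeholders:

# Sketch: download the logged LoRA adapter from W&B in a new session
import wandb

api = wandb.Api()
artifact = api.artifact(
    "your-entity/gemma-python-code-finetune/gemma-3-270m-lora-adapter-<run-id>:latest",
    type="model",
)
adapter_dir = artifact.download()  # local directory containing the adapter files
print("Adapter downloaded to:", adapter_dir)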

Exercise: Test your finetuned model

Challenge yourself to generate Python code with your adapter:

# Exercise: Use the finetuned model to generate code from a prompt
from transformers import AutoModelForCausalLM, pipeline
from peft import PeftModel
import torch

# Load the base model and apply the finetuned LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
adapter = PeftModel.from_pretrained(base_model, adapter_save_path)
generator = pipeline("text-generation", model=adapter, tokenizer=tokenizer)

prompt = "# Write a Python function to compute the Fibonacci sequence.\ndef fibonacci(n):"
output = generator(prompt, max_length=64, num_return_sequences=1)

print("Model output:\n", output[0]["generated_text"])

Expected output (example):

Model output:
# Write a Python function to compute the Fibonacci sequence.
def fibonacci(n):
    a, b = 0, 1
    result = []
    for i in range(n):
        result.append(a)
        a, b = b, a + b
    return result

Challenge: Try other Python prompts or adapt for different coding styles!

Alternative use cases and applications

LoRA is versatile and well-suited for:

  • Adapting LLMs to other programming languages (JavaScript, C++, Go, etc.)
  • Task-specific instruction tuning (API documentation, comment generation)
  • Direct natural language tasks (question answering, summarization)
  • Domain adaptation for legal, financial, or scientific text

Its lightweight adapters make it simple to share improvements, revert to base models, or iterate rapidly across new datasets.
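
For example, because the base weights stay frozen, several adapters can share one model and be swapped at inference time. Here is a sketch with peft, where the second adapter path is a hypothetical placeholder:

# Sketch: one frozen base model, multiple LoRA adapters (paths are placeholders)
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m")

model = PeftModel.from_pretrained(base, "./lora_adapter", adapter_name="python_code")
model.load_adapter("./lora_adapter_docs", adapter_name="docstrings")  # hypothetical second adapter

model.set_adapter("docstrings")   # route generation through the docstring adapter
model.set_adapter("python_code")  # ...or switch back

# Reverting to the plain pre-trained model needs no re-download
with model.disable_adapter():
    pass  # inside this block, outputs come from the original base weights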

Conclusion

You have seen how parameter-efficient finetuning methods like LoRA bring practical, scalable LLM adaptation within reach. By leveraging LoRA with the Gemma 3 270M model, tracking through Weights & Biases, and managing artifacts with Weave and Models, you now have a robust toolkit for any domain or language task. Experiment with other data, tweak LoRA hyperparameters, and explore the latest research to push your language model projects further.
