Fine-tuning Gemma 3 270M: Master Python code adaptation
Fine-tuning large language models like Gemma 3 270M on Python code data can dramatically improve their effectiveness for code generation, understanding, and completion tasks. This tutorial teaches you how to adapt Gemma 3 270M to a Python domain using cutting-edge techniques like instruction tuning, parameter-efficient methods such as LoRA, and advanced domain adaptation—all powered by Weights & Biases tools including Weave and W&B Models. You will follow clear, hands-on steps, run complete and reproducible code, monitor results, and get practical tips and troubleshooting advice throughout the process.
LLM finetuning refers to the process of taking a large, pre-trained language model and making small, targeted adjustments so it performs better on a specific downstream task or within a new data domain. Lightweight approaches such as LayerNorm Tuning, which adjusts only the normalization layers, can boost the model's reasoning abilities and output quality in specialized scenarios with less overfitting and more efficient updates.
When you finetune, you effectively give the model new expertise while keeping its core knowledge intact. This is particularly valuable for adapting to new programming languages, domain-specific jargon, or unique data distributions in real-world applications.
Instruction-based finetuning tactics affect how well the LLM generalizes and follows task directions. Instruction tuning typically masks the instruction tokens out of the training loss, so the model is penalized only on its response and concentrates on producing the task output. In contrast, instruction modeling computes the loss over the instruction text as well, which preserves context and helps the model internalize nuanced directives.
Recent research suggests that instruction modeling, where instruction tokens are never masked from the loss, often improves the model's ability to follow complex instructions. Keeping the instruction in the training signal provides more to learn from, especially for tasks that require nuanced, instruction-sensitive behavior.
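To make the difference concrete, here is a minimal sketch of how the two labeling schemes differ, assuming a Hugging Face tokenizer (the model id matches Step 3 below) and PyTorch's convention that label -100 is ignored by the cross-entropy loss:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m")
prompt = "Write a Python function to reverse a string.\n"
response = "def reverse(s):\n    return s[::-1]\n"
prompt_ids = tokenizer(prompt)["input_ids"]
response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]
input_ids = prompt_ids + response_ids
# Instruction modeling: the loss covers every token, instruction included.
labels_modeling = list(input_ids)
# Instruction tuning: instruction tokens get -100, which cross-entropy ignores,
# so only the response tokens drive the loss.
labels_tuning = [-100] * len(prompt_ids) + list(response_ids)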
Parameter-efficient finetuning is crucial when resources are limited or when working with very large models. These approaches reduce the number of parameters that need to be updated during finetuning, decreasing memory usage and speeding up training.
LoRA introduces the idea of low-rank adaptation matrices. Instead of updating all model parameters, LoRA injects small trainable matrices in strategic locations (such as attention layers) while freezing the rest of the model. This drastically cuts down both the number of trainable parameters and GPU memory requirements. For example, rather than retraining millions (or billions) of weights, LoRA can adapt the model with a tiny fraction of new parameters, often well under one percent of the total, enabling personalized and domain-specific finetuning even on modest hardware.
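The parameter math is easy to see in a toy example. Below is an illustrative sketch of a LoRA update on a single weight matrix; the dimensions are made up for the demo, not Gemma's actual shapes:
import torch
d, k, r = 1024, 1024, 8
W = torch.randn(d, k)          # frozen pretrained weight
A = torch.randn(r, k) * 0.01   # trainable, small random init
B = torch.zeros(d, r)          # trainable, zero init so training starts from W's behavior
alpha = 32
def lora_forward(x):
    # Equivalent to multiplying by W + (alpha / r) * B @ A, without ever forming it.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T
print("full-layer params:", W.numel())        # 1,048,576
print("LoRA params:", A.numel() + B.numel())  # 16,384, a 64x reduction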
LoRA vs. full finetuning
LoRA’s targeted updates mean it learns less overall but also retains more of the original model’s knowledge, which helps prevent catastrophic forgetting. Full finetuning, while powerful, risks overfitting and is much more computationally expensive.
Research suggests that LoRA's tradeoff, slightly reduced capacity on outlier cases in exchange for stable and robust domain adaptation, makes it a superior choice for many specialized applications, including those involving code data or instruction-following tasks.
This section walks you through setting up a LoRA finetuning workflow for Gemma 3 270M using Hugging Face Transformers, PEFT, and integrating each step with Weights & Biases for tracking experiments, managing models, and running evaluation pipelines with Weave. All code examples are complete and ready to run.
Step 1: Set up your environment
Begin by installing the required packages. You’ll use Hugging Face Transformers, PEFT for adapter-based finetuning, and Weights & Biases for experiment tracking and model management. Optionally add Weave for analysis pipelines.
Install the dependencies below in your environment; skip this step if you already have the packages installed.
!pip install wandb transformers peft datasets accelerate bitsandbytes weave
Expected output:
Successfully installed...
💡 Tip: Always create a new Python virtual environment for experimentation to avoid package version conflicts.
⚠️ Troubleshooting:
- If pip install fails due to a permissions error, try adding --user or consider using a virtual environment like venv or conda.
- For CUDA errors, double-check your GPU’s compatibility with installed package versions.
Step 2: Initialize Weights & Biases
Sign in to your W&B account and initialize a new project for tracking all experiments and results.
import wandb
wandb.login()  # Will prompt for an API key if not already logged in
wandb.init(project="gemma-python-finetune", name="01-lora-baseline")
Expected output:
Tracking run with wandb. Run URL: https://wandb.ai/<your-username>/gemma-python-finetune/runs/<run-id>
💡 Tip: Use descriptive names for runs to make comparisons easier in the W&B dashboard.
Step 3: Load Gemma 3 270M and prepare data
Fetch the pre-trained Gemma 3 270M model (if available through Hugging Face or your team’s private Model Registry) and load a sample Python code dataset for finetuning.
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
model_name = "google/gemma-3-270m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("codeparrot/codecomplex", split="train[:1000]")  # Use a small subset for this demo
print("Sample data:", dataset[0]["code"])  # Field names depend on the dataset; adjust if needed
Expected output:
Sample data: def example_function(x): ...
💡 Tip: Ensure your dataset is clean and matches the style you want the model to learn. For larger scale, consider using the full train split.
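As a quick sanity pass, you might filter out empty or extreme samples with the datasets API (the "code" field name follows this demo's dataset; adjust for yours):
# Hypothetical cleaning step: drop rows with empty or very long code samples.
dataset = dataset.filter(lambda ex: 0 < len(ex["code"]) < 4000)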
Step 4: Prepare data for supervised instruction tuning
Tokenize the code data and structure it for instruction modeling: the model sees the full instruction (for example, "Write a Python function...") and the code.
def preprocess(example):
    # Assumes the dataset provides "instruction" and "code" fields; adapt to your schema.
    instruction = "Write a Python function to: "
    src = instruction + example["instruction"]
    tgt = "\n" + example["code"]
    text = src + tgt
    return tokenizer(text, truncation=True, padding="max_length", max_length=256)
dataset = dataset.map(preprocess)
print("Tokenized input IDs:", dataset[0]["input_ids"][:10])
Expected output:
Tokenized input IDs: [394, 1939, 194, 10391, ...]
💡 Tip: Experiment with prompt templates to best match your downstream application (e.g., conversational agent vs. code completion).
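For a conversational agent, a chat-style template might look like the sketch below. The turn markers follow Gemma's chat format, but verify against your tokenizer's chat template (tokenizer.apply_chat_template); field names follow this demo's dataset:
def preprocess_chat(example):
    # Chat-style variant of the preprocess function above.
    text = (
        "<start_of_turn>user\n" + example["instruction"] + "<end_of_turn>\n"
        "<start_of_turn>model\n" + example["code"] + "<end_of_turn>\n"
    )
    return tokenizer(text, truncation=True, padding="max_length", max_length=256)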
Step 5: Apply LoRA adapters
Use the PEFT library to add LoRA adapters to the model. This will inject small trainable projections in attention layers, reducing memory and computation needs.
from peft import LoraConfig, get_peft_model
import torch
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto"
)
lora_config = LoraConfig(
    r=8,  # Rank of the LoRA matrices
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # Tune query and value projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
Expected output:
trainable params: ~1,000,000 || all params: ~270,000,000 || trainable%: <1 (exact counts depend on the model and LoRA config)
💡 Tip: Adjust the rank (r) and alpha to trade off between parameter efficiency and performance.
⚠️ Troubleshooting:
- If you get a ValueError about 8-bit mode, try load_in_4bit or loading in default precision.
- Make sure target_modules match actual model layer names (check the model architecture if unsure, as in the snippet below).
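One quick way to inspect candidate layer names (the projection suffixes here are typical of attention blocks, but treat them as assumptions until you check your model):
# List module names that look like attention projections; adjust the suffixes
# to match whatever your model architecture actually uses.
for name, module in model.named_modules():
    if name.endswith(("q_proj", "k_proj", "v_proj", "o_proj")):
        print(name)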
Step 6: Set up the training loop with W&B tracking
Use Hugging Face’s Trainer API to train your LoRA-augmented model, logging all relevant metrics to W&B. Optionally, use Accelerate for optimized multi-GPU or mixed precision training.
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    num_train_epochs=2,
    learning_rate=2e-4,
    fp16=True,
    evaluation_strategy="steps",
    eval_steps=50,
    logging_steps=10,
    output_dir="./results",
    report_to="wandb"  # Report all logs and metrics to W&B
)
import torch
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, dataset):
        self.dataset = dataset
    def __len__(self):
        return len(self.dataset)
    def __getitem__(self, idx):
        item = {key: torch.tensor(val) for key, val in self.dataset[idx].items()
                if key in ["input_ids", "attention_mask"]}
        # Cloning input_ids as labels trains on the full sequence, i.e.,
        # instruction modeling: instruction tokens contribute to the loss.
        item["labels"] = item["input_ids"].clone()
        return item
train_dataset = CustomDataset(dataset)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # For demonstration; use a separate validation set for real projects
)
trainer.train()
Expected output in notebook and W&B run page:
Running training
...
wandb: Synced metrics: loss, val_loss, steps, etc.
💡 Tip: Always monitor loss curves and learning rate schedules in the W&B UI. Sudden loss spikes can indicate data or optimizer problems.
⚠️ Troubleshooting:
- If you see CUDA OOM errors, lower the batch size or try fp16/mixed-precision training.
- If logs don’t appear in W&B, make sure WANDB_API_KEY is set and internet access is available.
Step 7: Save, register, and share your model with W&B Models
Once finetuning completes, save your LoRA-adapted model and push it to the W&B Model Registry for easy versioning, collaboration, and downstream deployment.
import os
SAVE_PATH = "./lora_gemma3_code_python"
os.makedirs(SAVE_PATH, exist_ok=True)
model.save_pretrained(SAVE_PATH)
tokenizer.save_pretrained(SAVE_PATH)
artifact = wandb.Artifact(
    name="gemma3-python-lora",
    type="model",
    description="Gemma 3 270M finetuned on Python code with LoRA",
    metadata={"epochs": 2, "lora_rank": 8}
)
artifact.add_dir(SAVE_PATH)
wandb.log_artifact(artifact)
Expected output:
wandb: Adding directory ./lora_gemma3_code_python to artifact
wandb: Waiting for artifact upload to finish...
...
💡 Tip: Add clear metadata and a README to the model artifact for reproducibility and team collaboration.
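To pull the adapter back down in a later session, a sketch like this works (it assumes an active wandb run and the artifact name registered above):
# Download the registered adapter for reuse; requires wandb.init() to have been called.
artifact = wandb.use_artifact("gemma3-python-lora:latest")
adapter_dir = artifact.download()
print("Adapter files in:", adapter_dir)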
Step 8: Evaluate and analyze results with W&B Weave
Use W&B Weave to build reproducible analysis pipelines: plot learning curves, compare model versions, and query logged results directly. The exact Weave query syntax varies by version, so the sketch below fetches loss histories through the W&B public API, which you can then explore in Weave dashboards or a notebook.
import weave
import wandb
# Initialize Weave so objects and evaluations for this project are browsable in the Weave UI.
weave.init("gemma-python-finetune")
# Fetch training-loss histories for every run in the project via the W&B public API.
api = wandb.Api()
runs = api.runs("<your-username>/gemma-python-finetune")
for run in runs:
    history = run.history(keys=["train/loss"])  # may be logged as "loss" on older Trainer versions
    print(run.name, history.tail(3))
Expected output: The name of each run in your W&B project alongside its recent training-loss values, which you can plot as loss curves or assemble into Weave dashboards to reveal convergence patterns and performance differences.
💡 Tip: With Weave, you can quickly explore relationships between hyperparameters and results, and build custom dashboards to share with your team.
With the core workflow in place, two advanced finetuning concepts are worth understanding before you push domain adaptation further: domain adaptation strategies and LayerNorm tuning.
Domain adaptation and LayerNorm tuning
Domain adaptation means modifying an LLM so it works better on a specific data distribution or task—here, Python code. LayerNorm Tuning is a strategy where, rather than adjusting many hundreds of millions of weights, you tune only the LayerNorm parameters. These often control much of the model’s domain sensitivity and stability.
Combining LayerNorm Tuning with LoRA or in isolation can further enhance domain transfer, particularly if training resources or labeled data are limited.
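As a minimal sketch of LayerNorm tuning with the model from Step 5 (the "norm" substring match is an assumption; inspect model.named_parameters() for your architecture's actual names):
# Freeze everything, then unfreeze only normalization-layer parameters.
for name, param in model.named_parameters():
    param.requires_grad = "norm" in name.lower()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable params after LayerNorm tuning: {trainable:,}")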
Exploring MoRA: high-rank updating
MoRA (Modified Rank Adaptation) builds on LoRA but relaxes the low-rank constraint. Instead of forcing parameter updates to be of very low rank, MoRA allows higher-rank (and thus potentially more expressive) updates while still remaining more parameter-efficient than full finetuning.
MoRA may outperform LoRA in cases where task complexity demands richer transformations, or where the domain shift is significant.
Exercise: Try substituting a MoRA-style adapter (e.g., increase the rank parameter to 64 or higher) in the LoRA-config step above and compare the resulting validation loss and sample generations.
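One way to run the exercise, reusing Step 5's setup. Note this is only a MoRA-style proxy, plain LoRA at a higher rank, not the full MoRA method:
from peft import LoraConfig
# Higher-rank adapter config for the exercise; pass to get_peft_model as before.
high_rank_config = LoraConfig(
    r=64,  # much higher rank than the baseline r=8
    lora_alpha=128,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)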
Fine-tuning Gemma 3 270M on Python code with parameter-efficient methods like LoRA offers a powerful, practical path to domain adaptation without the heavy computational footprint of classic approaches. Using Weights & Biases tools, including Weave and W&B Models, ensures your experiments are trackable, results are reproducible, and collaboration is seamless.
As LLM research advances, expect new adapter-based and normalization-centric strategies such as MoRA and LayerNorm Tuning to offer even finer control and improved efficiency. Stay updated by exploring and experimenting with these evolving techniques.