Fine-tune Gemma 3 270M for Python code with LoRA
Fine-tuning large language models such as Gemma 3 270M for specialized tasks like Python code generation unlocks higher performance, adaptability, and relevance to unique data. By using parameter-efficient strategies such as LoRA, you can adapt these powerful models for niche applications on modest hardware while keeping training costs manageable. This tutorial will show you, step by step, how to fine-tune Gemma 3 270M for Python code using LoRA and highlight how Weights & Biases (W&B) Weave and Models provide transparency, monitoring, and comprehensive experiment tracking. You will set up your environment, implement LoRA, track everything with W&B, and end up with a trained, task-ready model.
Understanding LLM finetuning
LLM finetuning means taking a general-purpose large language model and giving it extra training so it performs better for your chosen domain, data, or instructions. By updating the model’s weights based on new examples (like Python code snippets), you tailor the model’s behavior to produce domain-specific, high-quality output. Finetuning exposes the model to vocabularies, coding conventions, and domain knowledge it may not have seen much of during pretraining.
When you finetune, you make a choice: retrain just some of the model’s parameters (for efficiency), or allow all of them to adapt (which can be computationally costly and risky for overfitting). This leads to innovations like parameter-efficient finetuning, including the use of adapters and LoRA, that only update a fraction of the model while maintaining strong performance. Instruction tuning is another strategy, where the model learns to better follow user prompts and tasks.
How does instruction tuning impact LLM performance?
Instruction tuning is a process where an LLM learns to pay attention to specific task instructions and respond accordingly. During finetuning, if you don’t mask (hide) the instructions, the model can memorize them as fixed text and not truly learn their utility. Masking forces the model to generalize from the instruction, improving its ability to follow varied prompts.
When the loss is also computed on instruction tokens (no masking), the model spends capacity learning to reproduce instruction text rather than mapping instructions to solutions, which can reduce its usefulness for real-world queries. With proper instruction tuning (masking the instruction tokens out of the loss), the model better grasps the relationship between instructions and solutions, boosting its practical effectiveness. A sketch of this masking follows below.
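To make the masking concrete, here is a minimal sketch of how instruction tokens can be excluded from the loss, assuming a Hugging Face-style tokenizer; the helper name build_masked_labels is hypothetical:
import torch

def build_masked_labels(tokenizer, instruction, response, max_length=512):
    # Tokenize instruction and response separately so we know where each begins
    prompt_ids = tokenizer(instruction, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]
    input_ids = torch.tensor(prompt_ids + response_ids)[:max_length]
    labels = input_ids.clone()
    # -100 is PyTorch's cross-entropy ignore_index, so no loss is computed on
    # instruction tokens and the model is never trained to reproduce them
    labels[:len(prompt_ids)] = -100
    return {"input_ids": input_ids, "labels": labels}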
Exploring Low-Rank Adaptation (LoRA)
Low-rank adaptation, or LoRA, is a parameter-efficient technique for adapting LLMs to custom tasks. Instead of updating massive weight matrices throughout the neural network, LoRA introduces small, trainable matrices (often called adapters) that decompose the update into low-rank components. This greatly reduces the number of trainable parameters, leading to faster training and lower computational needs, while retaining most of the performance benefits.
By injecting these lightweight adapters into key layers (such as the attention projections inside transformer blocks), LoRA tweaks the model just enough to adapt to your data, leaving the bulky original weights frozen. This approach is especially compelling when you want multiple, task-specific models without retraining the whole LLM for each one.
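To see where the savings come from: LoRA freezes a weight matrix W of shape d × k and learns an update BA, where B is d × r and A is r × k for a small rank r. A quick back-of-the-envelope calculation (the dimensions below are illustrative, not Gemma's actual sizes):
d, k, r = 2048, 2048, 8      # layer dimensions and LoRA rank (illustrative)
full_update = d * k          # parameters a full finetune would touch in this layer
lora_update = d * r + r * k  # parameters in the low-rank factors B and A
print(full_update, lora_update, lora_update / full_update)
# 4194304 32768 0.0078 -> the adapter trains under 1% of this layer's parameters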
How does LoRA compare to full finetuning?
Full finetuning updates every parameter in the LLM — this means very high computational and memory requirements, and the risk of destroying general knowledge the pre-trained model possessed. LoRA, on the other hand, only adjusts a small fraction of parameters via the adapter matrices. While it can’t always match the absolute peak performance of full finetuning for some tasks, it often achieves nearly as good results for much less cost.
LoRA is ideal for domain adaptation and instruction tuning where dataset size or task similarity don’t justify total retraining. You get:
- Dramatically lower compute and memory demand
- Faster experimentation and easier model sharing (only the LoRA adapter weights need to be distributed)
- The ability to rapidly swap or combine domain-specific adapters
But full finetuning may outperform LoRA when:
- Massive, diverse datasets are available
- Maximum possible performance is non-negotiable
- The target task is drastically different from pretraining data
Tutorial: Fine-tuning Gemma 3 270M using LoRA
Fine-tuning Gemma 3 270M with LoRA involves five main steps:
Step 1: Set up your environment
Step 2: Prepare your Python code dataset
Step 3: Load Gemma 3 270M and configure LoRA
Step 4: Integrate Weights & Biases, including Weave for experiment tracking and visualization
Step 5: Train, monitor, and evaluate your tuned model
You’ll see code examples for each, with clear commentary, expected outputs, W&B integration, and troubleshooting.
Step 1: Set up your environment
You need recent Python, PyTorch, Hugging Face Transformers, PEFT (for LoRA), and W&B installed.
# Step 1.1: Install dependencies
# (Run in your notebook/terminal if not already installed)
!pip install torch==2.1.2 transformers==4.39.3 peft==0.8.2 datasets==2.18.0 wandb==0.16.6 weave==0.50.0
# Step 1.2: Import libraries
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from datasets import load_dataset
import wandb
import weave
print("All packages imported successfully.")
Expected output:
All packages imported successfully.
💡 Tip: Always use matching versions—mismatched PEFT and Transformers can cause subtle compatibility issues.
⚠️ Troubleshooting: If installation fails, check for conflicting packages, update pip, and ensure you’re using a Python 3.8+ virtual environment. If running on Colab, restart the kernel after pip install.
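Before moving on, a quick sanity check can save debugging time later. This optional snippet just confirms the installed versions and whether a GPU is visible:
# Optional: confirm library versions and GPU availability before training
import torch, transformers, peft
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
print("CUDA available:", torch.cuda.is_available())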
Step 2: Prepare your Python code dataset
For hands-on learning, let’s use a publicly available Python dataset from Hugging Face’s Datasets library, such as a subset of the codeparrot-clean dataset.
# Step 2.1: Load a sample Python code dataset
dataset = load_dataset('codeparrot/codeparrot-clean', split='train[:2000]')  # Small, fast for demo
print("Sample:", dataset[0]['content'][:200])  # Print a snippet of the first sample
# Step 2.2: Preview the dataset
print(f"Dataset size: {len(dataset)} samples")
Expected output:
Sample: def add(x, y):
return x + y
...
Dataset size: 2000 samples
💡 Tip: For real projects, prepare your own code snippets or combine several datasets for richer examples.
⚠️ Troubleshooting: Dataset download errors may be due to network issues or Hugging Face authentication. Try setting the environment variable HF_HOME or logging in with huggingface-cli login.
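If you do combine several sources, the Datasets library makes this straightforward. A sketch, where the second dataset name is a hypothetical placeholder for your own data:
from datasets import load_dataset, concatenate_datasets

extra = load_dataset("your-org/your-python-snippets", split="train")  # hypothetical dataset
combined = concatenate_datasets([dataset, extra])  # columns must match across datasets
# Drop very short samples, which tend to add noise rather than signal
combined = combined.filter(lambda ex: len(ex["content"]) > 50)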
Step 3: Load Gemma 3 270M and configure LoRA
Gemma 3 270M is available on Hugging Face. Let’s load the model, set up the tokenizer, and prepare LoRA adapters.
# Step 3.1: Load the model and tokenizer
model_id = "google/gemma-2b"  # For demonstration; swap in the Gemma 3 270M checkpoint if you have access
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
# Step 3.2: Prepare text formatting (prompt templates)
def format_example(example):
    return {"input_ids": tokenizer(
        example['content'],
        truncation=True,
        padding='max_length',
        max_length=512,
        return_tensors="pt"
    )['input_ids'].squeeze(0)}

dataset = dataset.map(format_example)
dataset.set_format(type="torch", columns=["input_ids"])
print("Tokenization sample:", dataset[0]['input_ids'][:10])
# Step 3.3: Set up LoRA via PEFT
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                 # Rank of LoRA update matrices
    lora_alpha=16,                       # Scaling factor
    target_modules=["q_proj", "v_proj"], # Typical attention modules in transformers
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)

# Show number of trainable parameters
def print_trainable_params(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"Trainable params: {trainable} | Total params: {total} | Ratio: {trainable/total:.4f}")

print_trainable_params(model)
Expected output (numbers will vary):
Tokenization sample: tensor([ 3882, 325, 345, 313, 8, 1052, 2136, 310, 26, 1])
Trainable params: 3145728 | Total params: 2670632960 | Ratio: 0.0012
💡 Tip: Carefully select target_modules based on your base model’s architecture. For Gemma or other models, check which layers are available for LoRA injection.
⚠️ Troubleshooting: If you get attribute errors about q_proj or v_proj, double-check the architecture; not all transformer models use the same layer names. Inspect model.named_modules() to find suitable modules.
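A quick way to do that inspection on the freshly loaded base model (before applying get_peft_model) is to list the unique names of its linear layers and pick the attention projections from among them:
import torch.nn as nn

linear_names = {name.split(".")[-1] for name, module in model.named_modules()
                if isinstance(module, nn.Linear)}
print(linear_names)  # e.g. {'q_proj', 'k_proj', 'v_proj', 'o_proj', ...}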
Step 4: Integrate Weights & Biases for experiment tracking
Now connect Weights & Biases to log your training process, metrics, and artifacts. Use W&B Weave for visualization and data lineage.
# Step 4.1: Log in to W&B (Paste your API key when prompted)
wandb.login()
# Step 4.2: Initialize a run and configure W&B Models
run = wandb.init(
    project="gemma-lora-finetune",
    name="gemma-270m-python-lora",
    config={
        "model": model_id,
        "lora_rank": lora_config.r,
        "max_length": 512,
        "lr": 5e-5,
        "epochs": 1
    }
)
# Step 4.3: (Optional) Set up Weave dashboard for live experiment exploration
weave.init("gemma-lora-finetune")
print(f"View your W&B run and Weave dashboard at: {run.url}")
# Step 4.4: Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    evaluation_strategy="no",
    save_steps=100,
    save_total_limit=1,
    logging_steps=10,
    report_to=["wandb"],
    learning_rate=5e-5,
    fp16=torch.cuda.is_available(),
    remove_unused_columns=False,
    run_name="gemma-270m-python-lora",
    do_eval=False
)
# Step 4.5: Use a simple collate_fn for model input
def collate_fn(batch):
    input_ids = torch.stack([item["input_ids"] for item in batch])
    attention_mask = (input_ids != tokenizer.pad_token_id).long()
    labels = input_ids.clone()
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}
# Step 4.6: Set up and start the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    data_collator=collate_fn
)
trainer.train()
trainer.save_model("./results")  # Saves the LoRA adapter weights
Expected output snippets:
View your W&B run and Weave dashboard at: https://wandb.ai/your-username/gemma-lora-finetune/runs/xxxxxx
...
TrainOutput(global_step=100, training_loss=3.515, ...)
And in your W&B dashboard: real-time loss curves, hyperparameters, and code/artifact lineage visible in Weave.
💡 Tip: Use W&B Weave’s dashboard link to track and compare runs, link artifacts, and visualize evaluation metrics instantly.
⚠️ Troubleshooting:
- If training hangs on GPU: check memory usage with nvidia-smi; reduce batch_size if needed (see the sketch below).
- If logging doesn’t appear in W&B: check your credentials and internet connection, and ensure report_to=["wandb"] is set in your Trainer arguments.
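If you do run out of GPU memory, a common mitigation is to shrink the per-device batch and compensate with gradient accumulation; the values below are assumptions to tune for your hardware:
# Reduced-memory variant of the Step 4.4 arguments (values are assumptions)
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # keeps an effective batch size of 8
    # ...remaining arguments as in Step 4.4...
)
model.gradient_checkpointing_enable()  # trades extra compute for lower memory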
Step 5: Evaluate and use your fine-tuned model
After training finishes, generate Python code from prompts and review completion quality. Let’s generate a short code snippet.
# Step 5.1: Reload model and tokenizer (if in a fresh session)
from peft import PeftModel
tokenizer = AutoTokenizer.from_pretrained(model_id)
base = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(base, "./results")  # adapter saved by trainer.save_model
# Step 5.2: Provide a prompt and generate code
prompt = "Write a Python function to merge two lists.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, top_k=40, do_sample=True)
generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Prompt:\n{prompt}\n\nGenerated:\n{generated_code}")
Expected output (actual text will vary based on model and data):
Prompt:
Write a Python function to merge two lists.
Generated:
Write a Python function to merge two lists.
def merge_lists(list1, list2):
return list1 + list2
💡 Tip: Always inspect generated code for correctness. Use tools like W&B Tables for organized evaluation and error tracking (see the sketch below).
⚠️ Troubleshooting: If outputs are not coherent Python code, check your dataset cleanliness, training steps, and training/evaluation tokenization match.
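As one way to organize that inspection, you can log prompt/generation pairs to a W&B Table, assuming the run from Step 4 is still active (or after a fresh wandb.init):
# Sketch: collect a few generations in a W&B Table for side-by-side review
table = wandb.Table(columns=["prompt", "generation"])
for p in ["Write a Python function to merge two lists.\n",
          "Write a Python function to reverse a string.\n"]:
    ids = tokenizer(p, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=64, top_k=40, do_sample=True)
    table.add_data(p, tokenizer.decode(out[0], skip_special_tokens=True))
wandb.log({"generation_samples": table})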
Practical exercise
Try this: Change the dataset to another language (e.g., Java) using a dataset such as codeparrot-java and see how the prompt and model outputs change. Document your results using W&B Weave snapshot functionality.
Alternative use cases for LoRA
LoRA is not limited to code generation in Python. It’s flexible enough to support:
- Programming language adaptation (Java, C++, Rust, etc.)
- Domain-specific conversation (financial, legal, medical dialogue) by finetuning with conversational datasets
- Custom assistant behavior—incorporate company procedures or brand guidelines into customer-facing chatbots
- Natural language to SQL translation for database querying tools
Example: Switching the dataset in Step 2 to domain-specific documentation or question-answering pairs will adapt your LLM for technical support chatbots or domain Q&A. The low resource requirements make it easy to experiment with many verticals using the same base LLM and swap adapters based on user context.
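Swapping adapters at serving time is also cheap with PEFT. A sketch, where the adapter paths are hypothetical and each adapter was trained as in the tutorial above:
from peft import PeftModel

# base_model is the plain pretrained LLM, loaded as in Step 3
model = PeftModel.from_pretrained(base_model, "adapters/python-code", adapter_name="python")
model.load_adapter("adapters/nl-to-sql", adapter_name="sql")

model.set_adapter("python")  # route a Python code-generation request
model.set_adapter("sql")     # route a natural-language-to-SQL request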
Conclusion
LoRA brings a practical, hardware-friendly solution to the challenge of adapting large language models for specific domains or tasks. With this tutorial, you have learned how to fine-tune Gemma 3 270M for Python code generation, harnessing both LoRA’s parameter efficiency and W&B’s experiment tracking and data lineage tools. As LoRA and related parameter-efficient methods evolve, they will further democratize NLP by making task-specific adaptation of the latest LLMs feasible on more modest compute.
Sources
- https://huggingface.co/docs/datasets
- https://huggingface.co/docs/transformers
- https://github.com/huggingface/peft
- https://wandb.ai
- https://docs.wandb.ai/guides/weave
- https://arxiv.org/abs/2106.09685 (LoRA paper)
- https://huggingface.co/google/gemma-2b
- https://huggingface.co/datasets/codeparrot/codeparrot-clean
You now have all the tools to fine-tune Gemma 3 270M on your own code data, monitor results with W&B, and share/adapt models for almost any specialized scenario.