Fine-tuning Gemma 3 270M for Python code proficiency
Fine-tuning large language models (LLMs) like Gemma 3 270M on specialized datasets, such as Python code, can significantly boost their ability to perform tailored tasks. Parameter-efficient approaches like Low-Rank Adaptation (LoRA) and instruction tuning enable this specialization without prohibitive resource costs. This tutorial will guide you, step by step, through understanding, implementing, and evaluating parameter-efficient LLM finetuning for Python code data, with practical examples and the integration of Weights & Biases tools such as Weave and W&B Models throughout.
Understanding LLM finetuning
Fine-tuning adapts a pre-trained language model to a particular dataset or task by updating its parameters in response to new task-specific examples. For LLMs like Gemma 3 270M, this process is crucial for tasks such as generating or correcting Python code. Finetuning updates the internal weights so the model better aligns with your target data.
Standard finetuning updates all model weights, but this is computationally expensive and prone to overfitting or catastrophic forgetting. Alternative methods focus on a small, stable subset of components, for example tuning only the LayerNorm parameters, the normalization layers inside each transformer block. By specializing only certain modules, such approaches can capture new domain knowledge efficiently without retraining the entire network.
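To make the idea of selective tuning concrete, here is a minimal sketch, assuming a Hugging Face causal LM and that normalization parameters can be identified by "norm" in their names (true for Gemma-style checkpoints, but worth verifying for your model), that freezes everything else:
# Illustrative sketch: train only the normalization layers of a causal LM.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")  # model name is an assumption

# Freeze everything, then re-enable only parameters whose name contains "norm".
for name, param in model.named_parameters():
    param.requires_grad = "norm" in name.lower()

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Training {trainable:,} of {total:,} parameters")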
Instruction tuning and its impact
Instruction tuning involves training LLMs to better follow explicit instructions phrased in natural language. This enhances their performance on tasks where detailed directives guide the desired output, such as “Write a Python function to reverse a linked list.”
Recent research shows keeping the instruction text visible (not masked or hidden) during finetuning improves model comprehension and output accuracy. This "instruction modeling" boosts performance, especially for code tasks where understanding detailed directives is critical. Ignoring or masking instructions during training can hinder the LLM’s ability to interpret user prompts contextually.
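As a minimal sketch of the difference (the tokenizer and prompt format here are illustrative assumptions, not a specific training recipe): masking assigns the instruction tokens a label of -100 so the loss ignores them, while instruction modeling keeps them in the loss.
# Illustrative: two ways to build labels for one instruction/answer pair.
from transformers import AutoTokenizer

# google/gemma-2b is an assumption; any causal-LM tokenizer works for this illustration.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

prompt = "Instruction: Write a Python function to reverse a linked list.\nAnswer:\n"
answer = "def reverse(head):\n    ..."

prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]
input_ids = prompt_ids + answer_ids

# Option A: mask the instruction so the loss only covers the answer tokens
# (-100 is the index ignored by PyTorch's cross-entropy loss).
masked_labels = [-100] * len(prompt_ids) + answer_ids

# Option B: instruction modeling keeps the instruction tokens in the loss.
full_labels = list(input_ids)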
Parameter-efficient finetuning techniques
Large models with billions of parameters are costly to retrain. Parameter-efficient finetuning addresses this by modifying only a small subset of parameters, preserving general knowledge and lowering computational load.
Comparing LoRA to full finetuning
With LoRA, most of the model remains unchanged (frozen), while a lightweight adapter learns the new task. Research demonstrates that LoRA adaptation:
- Preserves the model's general capabilities better than full finetuning, reducing the risk of catastrophic forgetting.
- Maintains competitive or superior performance on downstream tasks compared to full finetuning, while using less GPU memory and training time.
- Provides efficient scaling as tasks or domains become more varied.
This efficiency enables rapid experimentation and domain transfer, critical for diverse code tasks.
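As a rough sketch of the mechanism itself (not the peft library's implementation), a LoRA layer keeps the original weight frozen and adds a trainable low-rank update B·A scaled by alpha/r:
# Illustrative LoRA wrapper: frozen base weight plus a low-rank trainable update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        # A starts small and random, B starts at zero so the initial output equals the base layer's.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512), r=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 8*512*2 = 8192 trainable params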
Implementing LoRA in practice
Let’s walk through a hands-on example of fine-tuning Gemma 3 270M on a small Python code dataset using LoRA, PyTorch, Hugging Face libraries, and Weights & Biases tools.
Step 1: Setup environment and dependencies
First, ensure you have the necessary libraries installed: PyTorch, Hugging Face Transformers, BitsAndBytes, W&B, and peft (for LoRA).
# Step 1: Install dependencies
# Run this in your terminal or as a notebook cell
!pip install torch==2.2.0 bitsandbytes==0.41.0 wandb transformers==4.40.1 datasets peft==0.10.0 weave
# Imports
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset, Dataset
import wandb
import weave
import os
Expected output:
Successfully installed torch, bitsandbytes, wandb, transformers, datasets, peft, weave
💡 Tip: Pin library versions to avoid compatibility issues.
⚠️ Troubleshooting: If you encounter CUDA or library errors, make sure your machine has enough GPU memory (ideally more than 16 GB of VRAM) and that your installed torch build matches your CUDA version.
Step 2: Log in to Weights & Biases and set up your project
W&B helps you track, compare, and visualize experiments. Log in to W&B and initialize your project.
# Step 2: Log in and configure W&B
wandb.login()  # Enter your API key in the prompted interface
run = wandb.init(project="gemma3-270m-code-finetune")
Expected output:
Successfully logged in to Weights & Biases!
💡 Tip: Set the WANDB_PROJECT environment variable so every training run is grouped under the same project.
⚠️ Troubleshooting: If wandb.login() hangs, ensure your internet connection is active and your API key is correct.
Step 3: Prepare (or mock) a Python code instruction dataset
For most code tasks, datasets contain pairs of (instruction, code output):
# Sample small Python code generation dataset
examples = [
{"instruction": "Write a Python function to compute factorial.", "output": "def factorial(n):\n return 1 if n == 0 else n * factorial(n-1)"},
{"instruction": "Create a function to sum a list.", "output": "def sum_list(lst):\n return sum(lst)"},
{"instruction": "Write a function to check for palindromes.", "output": "def is_palindrome(s):\n return s == s[::-1]"},
]
# Convert to Hugging Face Dataset
train_dataset = Dataset.from_list([
{"text": f"Instruction: {ex['instruction']}\nAnswer:\n{ex['output']}"} for ex in examples
])
Expected output:
Dataset({
features: ['text'],
num_rows: 3
})
💡 Tip: For real projects, use larger datasets, e.g., CodeAlpaca or your own annotated scripts.
⚠️ Troubleshooting: If Dataset.from_list fails, check that all entries are dictionaries with string fields.
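If you want to try a public corpus, something like the following may work; the dataset ID shown is an assumption, so verify it on the Hugging Face Hub (and check its column names) before relying on it.
# Hypothetical example: load a public code-instruction dataset from the Hub.
from datasets import load_dataset, Dataset

code_alpaca = load_dataset("sahil2801/CodeAlpaca-20k", split="train")  # dataset ID is an assumption
train_dataset = Dataset.from_list([
    {"text": f"Instruction: {ex['instruction']}\nAnswer:\n{ex['output']}"}
    for ex in code_alpaca.select(range(1000))  # subsample for quick experiments
])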
Step 4: Load Gemma 3 270M via Hugging Face and set up tokenizer
# Step 4: Load model checkpoint and tokenizer
model_name = "google/gemma-2b" # Substitute with "gemma-3-270m" when available (use 2B here for demonstration)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
Expected output:
Downloading (...) to cache...
Loading weights...
💡 Tip: Use device_map="auto" for automatic CUDA/CPU allocation.
⚠️ Troubleshooting: If the model does not fit in memory, use a smaller checkpoint or a GPU with more VRAM (for example, in Colab).
Step 5: Apply LoRA via the peft library
# Step 5: Configure and apply LoRA
lora_config = LoraConfig(
    r=8,                                  # Rank of the low-rank decomposition
    lora_alpha=16,                        # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Common attention projections in transformer-based LLMs
    lora_dropout=0.05,                    # Dropout applied to the LoRA layers
    bias="none",
    task_type=TaskType.CAUSAL_LM
)
# Inject LoRA adapters into the model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Display how many parameters will be trained
Expected output:
trainable params: 700,416 || all params: 2,000,000,000 || trainable%: 0.035%
💡 Tip: Modify r and lora_alpha for trade-offs between capacity and speed.
⚠️ Troubleshooting: If get_peft_model errors, check that your peft version supports your installed transformers version.
Step 6: Preprocess data for model input
# Step 6: Tokenize dataset for model input
def preprocess(batch):
    result = tokenizer(
        batch["text"],
        padding="max_length",
        truncation=True,
        max_length=256,
        return_tensors="pt"
    )
    # Causal LM: labels mirror input_ids; the model shifts them internally when computing the loss
    result["labels"] = result["input_ids"].clone()
    return result
tokenized_dataset = train_dataset.map(preprocess, batched=True)
tokenized_dataset.set_format(
    type="torch",
    columns=["input_ids", "attention_mask", "labels"]
)
Expected output:
100%|██████████| 1/1 [00:00<00:00, 53.03ba/s]
💡 Tip: Adjust max_length depending on your prompt and answer lengths.
⚠️ Troubleshooting: Ensure no samples exceed max_length or are truncated unduly.
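To verify that nothing important is being truncated, a quick check like this (an illustrative snippet reusing the tokenizer and examples from earlier steps) prints the token count of each formatted sample:
# Illustrative check: report token counts before committing to a max_length.
for ex in examples:
    text = f"Instruction: {ex['instruction']}\nAnswer:\n{ex['output']}"
    n_tokens = len(tokenizer(text)["input_ids"])
    print(f"{n_tokens:4d} tokens | {ex['instruction'][:40]}")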
Step 7: Track datasets, models, and runs with W&B Models and Weave
Let’s log and version your data and model with W&B’s tools for reuse and reproducibility.
# Step 7.1: Initialize Weave for the project and log your training dataset as a versioned W&B artifact
import weave
weave.init("your-wandb-entity/gemma3-270m-code-finetune")

# Save the dataset locally, then log that directory as a dataset artifact
train_dataset.save_to_disk("./python-instruct-code-dataset")
artifact = wandb.Artifact(
    name="python-instruct-code-dataset",
    type="dataset",
    description="Demo Python code-instruction pairs"
)
artifact.add_dir("./python-instruct-code-dataset")
run.log_artifact(artifact)
# Step 7.2: Register your model checkpoint as a W&B Model artifact after (or during) training below.
Expected output:
Saved artifact 'python-instruct-code-dataset' with version index 0
💡 Tip: Use W&B Artifacts to maintain a versioned history of all data and models, and Weave to trace and inspect model calls, ensuring reproducibility.
⚠️ Troubleshooting: Use correct naming conventions (entity/project) to avoid conflicts in your W&B workspace.
Step 8: Train the model using Trainer
# Step 8: Set up Hugging Face Trainer with W&B integration
training_args = TrainingArguments(
    per_device_train_batch_size=1,
    num_train_epochs=5,
learning_rate=1e-4,
logging_steps=1,
output_dir="./results",
report_to="wandb",
save_strategy="no" # For demonstration, save model manually later
)
trainer = Trainer(
model=model,
args=training_args,
    train_dataset=tokenized_dataset,
tokenizer=tokenizer
)
# Start training
trainer.train()
Expected output:
Step Training Loss
1 1.82
2 1.33
... ...
💡 Tip: Use report_to="wandb" to stream training loss and other metrics live to your W&B dashboard.
⚠️ Troubleshooting: If CUDA runs out of memory, reduce batch size or use gradient accumulation.
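For example, one way to keep the effective batch size while holding fewer samples in memory is gradient accumulation; the values below are illustrative:
# Illustrative: effective batch size of 8 while only 1 sample sits in GPU memory at a time.
training_args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # accumulate gradients over 8 steps before each optimizer update
    num_train_epochs=5,
    learning_rate=1e-4,
    output_dir="./results",
    report_to="wandb",
)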
Step 9: Save, version, and share the trained LoRA model using W&B Models
# Step 9: Save and log the fine-tuned model as a W&B Model artifact
output_path = "./lora-gemma3-270m-python"
model.save_pretrained(output_path)
tokenizer.save_pretrained(output_path)
# Log to W&B
model_artifact = wandb.Artifact("lora-gemma3-270m-python", type="model")
model_artifact.add_dir(output_path)
run.log_artifact(model_artifact)
print("Model saved to W&B!")
Expected output:
Model saved to W&B!
💡 Tip: The artifact can be loaded in future projects or shared with team members securely.
⚠️ Troubleshooting: If add_dir fails, check that output_path contains the saved adapter and tokenizer files.
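To reuse the logged checkpoint later, a sketch like the following (the entity/project path is a placeholder you should replace) downloads the artifact in a new run:
# Sketch: pull the versioned adapter back down in a later run.
import wandb

run = wandb.init(project="gemma3-270m-code-finetune")
artifact = run.use_artifact(
    "your-wandb-entity/gemma3-270m-code-finetune/lora-gemma3-270m-python:latest",
    type="model",
)
adapter_dir = artifact.download()  # local path containing the LoRA adapter and tokenizer files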
Step 10: Test the model and visualize results with Weave
# Step 10: Load the model for inference and generate completion
from transformers import pipeline
# Load adapter-enabled model for inference
pipe = pipeline(
"text-generation",
model=output_path,
tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1
)
prompt = "Instruction: Write a function to reverse a string.\nAnswer:\n"
gen = pipe(prompt, max_length=80, num_return_sequences=1, do_sample=True)
print("Generated code:", gen[0]["generated_text"])
Expected output:
Generated code: Instruction: Write a function to reverse a string.
Answer:
def reverse_string(s):
return s[::-1]
💡 Tip: Use Weave panels to visualize string match between reference and generated code, or to build custom dashboards for code evaluation.
⚠️ Troubleshooting: If inference is slow or produces errors, ensure the adapter weights are correctly loaded and not corrupted.
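If you also want each generation traced in Weave so you can inspect and compare outputs across prompts and model versions, a minimal sketch, assuming the same project used in Step 7, looks like this:
# Minimal sketch: wrap generation in a weave op so every call is traced and inspectable.
import weave

weave.init("your-wandb-entity/gemma3-270m-code-finetune")  # entity/project is a placeholder

@weave.op()
def generate_code(instruction: str) -> str:
    prompt = f"Instruction: {instruction}\nAnswer:\n"
    return pipe(prompt, max_length=80, do_sample=True)[0]["generated_text"]

print(generate_code("Write a function to reverse a string."))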
Exercise: Try extending the dataset
Challenge: Add more instruction-output pairs to the dataset above (at least 5 total). Retrain the model and observe improvements in generated results via your W&B tracking. Try instructions such as "Return the nth Fibonacci number", "Check if a year is a leap year", etc.
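One possible starting point for the extension, written as a sketch you can adapt, appends to the examples list from Step 3:
# Sketch: additional instruction-output pairs to append before re-running Steps 3-8.
examples += [
    {"instruction": "Return the nth Fibonacci number.",
     "output": "def fib(n):\n    a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b\n    return a"},
    {"instruction": "Check if a year is a leap year.",
     "output": "def is_leap(year):\n    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)"},
]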
Domain adaptation and model training
Adapting pre-trained language models like Gemma 3 270M to specific domains requires techniques that transfer generalized knowledge while specializing for tasks like Python code completion. LoRA and instruction tuning allow you to retain broad language abilities of LLMs while adapting to niche contexts using limited training data.
A key aspect is the careful formatting of prompts and outputs, ensuring clear separation between instruction and code. This enables the model to generalize from your dataset schema to broader real-world queries.
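A small formatting helper keeps that separation explicit and ensures training and inference use the same schema; this is a sketch built around the delimiters used earlier in this tutorial:
# Sketch: one place to define the prompt schema so training and inference stay in sync.
def format_example(instruction: str, output: str = "") -> str:
    """Format an instruction (and optional answer) with a clear delimiter."""
    return f"Instruction: {instruction}\nAnswer:\n{output}"

# Training text includes the answer; inference prompts stop at "Answer:\n".
train_text = format_example("Create a function to sum a list.", "def sum_list(lst):\n    return sum(lst)")
inference_prompt = format_example("Write a function to reverse a string.")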
Training models efficiently
Parameter-efficient finetuning, such as with LoRA, makes it feasible to train strong adapters on modest hardware. By updating only a fraction of weights (e.g., <1%), you greatly reduce both memory consumption and compute cost.
Matrix decomposition (in LoRA's case, using two small matrices instead of one large one) lies at the heart of this efficiency. It enables rapid iteration and deployment for a variety of code tasks, from docstring generation to bug detection, relying on W&B Weave to version and audit your artifacts across projects.
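To see why the decomposition is cheap, compare parameter counts for a single projection; the dimensions below are illustrative:
# Illustrative arithmetic: a full d x d weight update vs. a rank-r LoRA update for one layer.
d, r = 2048, 8
full_update = d * d              # 4,194,304 parameters to train
lora_update = r * d + d * r      # 32,768 parameters (A: r x d, B: d x r)
print(f"LoRA trains {lora_update / full_update:.2%} of the full update")  # ~0.78%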
Conclusion
This tutorial walked through finetuning Gemma 3 270M on Python code data, leveraging parameter-efficient methods like LoRA and instruction tuning. Using Hugging Face, peft, and W&B tools (Weave and Models), you can efficiently specialize large LLMs for niche domains, tracking all experiments for reproducibility and future sharing.
Future directions include exploring more advanced adapters (like QLoRA), mixed-precision training, instruction-based reward optimization, and prompt-based finetuning strategies. Continue your journey by experimenting with larger datasets and integrating evaluation pipelines using W&B and Weave visualizations.
Sources
- https://wandb.ai
- https://wandb.ai/site/weave
- https://github.com/huggingface/peft
- https://huggingface.co/docs/transformers/v4.40.1/en/main_classes/trainer
- https://huggingface.co/docs/datasets
- https://github.com/huggingface/transformers
- https://arxiv.org/abs/2106.09685 (LoRA)
- https://arxiv.org/abs/2405.09673 (LoRA Learns Less and Forgets Less)
- https://arxiv.org/abs/2405.14394 (Instruction Tuning With Loss Over Instructions)
- https://huggingface.co/google/gemma
Continue exploring the official documentation for peft, Hugging Face, and Weights & Biases for the latest methods and integrations.