# Fine-tune Gemma 3 270M for Python code efficiency
Fine-tuning a large language model like Gemma 3 270M on Python code adapts it to perform coding tasks more effectively. The process continues training on task-relevant data, and techniques like instruction tuning and parameter-efficient finetuning keep the cost manageable. Fine-tuning empowers you to adapt models to your unique problem space, whether that's answering complex queries, generating high-quality code, or tackling specialized workflows. In this tutorial, you'll learn exactly how to fine-tune Gemma 3 270M on Python code using parameter-efficient methods and Weights & Biases (W&B) tools, with a hands-on, code-first approach.
## Understanding LLM finetuning
Large Language Model (LLM) finetuning adapts a pre-trained model to better solve specific tasks or perform well within a particular data domain. Pre-trained LLMs, such as Gemma 3 270M, are originally trained for general language understanding. Finetuning continues training the model on a curated dataset aligned with your task, improving task-specific accuracy and relevance.
Instruction tuning is a specific form of finetuning where the model learns to correctly interpret and follow explicit instructions in data. Parameter-efficient finetuning, especially methods like LoRA (Low-Rank Adaptation), limits the number of trainable parameters and memory requirements—making it possible to adapt even large models with modest hardware.
- Key point: Finetuning unlocks LLM personalization without huge compute costs by leveraging efficient adaptation strategies like LoRA.
## The impact of masking instructions during instruction finetuning
Traditionally, instruction finetuning involves masking or hiding the instruction portion of training data when calculating loss. This aims to force the model to focus only on generating the answer, assuming instructions are always input and not something to be modeled.
Recent research indicates that not masking instructions—known as "instruction modeling"—helps models learn how to follow instructions more robustly. By modeling instructions alongside answers, the model gains a better contextual understanding of input-output relationships.
- Key point: Modeling (not masking) instructions during finetuning generally leads to improved performance, as the model learns both what to do and how to respond in context.
💡 Tip: Experiment with both strategies if your use-case requires nuanced instruction-following.
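To see what the two strategies look like in practice, here is a minimal sketch of how labels are typically built for a causal LM, where `-100` is the label index that PyTorch's cross-entropy loss ignores. The helper name and token lists are illustrative, not from any specific library:

```python
# Minimal sketch: building labels for masked vs. modeled instructions.
# -100 is the index ignored by PyTorch's cross-entropy loss.
def build_labels(instruction_ids, answer_ids, mask_instruction=True):
    input_ids = instruction_ids + answer_ids
    if mask_instruction:
        # Classic instruction tuning: loss is computed on the answer only
        labels = [-100] * len(instruction_ids) + answer_ids
    else:
        # Instruction modeling: loss is computed on the full sequence
        labels = list(input_ids)
    return input_ids, labels

# Example: instruction tokens [1, 2, 3], answer tokens [4, 5]
print(build_labels([1, 2, 3], [4, 5], mask_instruction=True))
# -> ([1, 2, 3, 4, 5], [-100, -100, -100, 4, 5])
print(build_labels([1, 2, 3], [4, 5], mask_instruction=False))
# -> ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
```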
## Exploring parameter-efficient finetuning techniques
Large models can contain billions of parameters. Full finetuning (updating every weight) is resource-intensive. Parameter-efficient methods like LoRA make adaptation far cheaper and accessible to more practitioners and organizations.
The snippet below is a simplified illustration of how a LoRA adapter works; real workflows with `peft` are covered in the practical guide later in this article.

```python
import torch.nn as nn

class SimpleLoRAAdapter(nn.Module):
    """Low-rank adapter: approximates a weight update as the product B @ A."""

    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        # Two low-rank projections; their composition has rank <= `rank`
        self.A = nn.Linear(in_features, rank, bias=False)
        self.B = nn.Linear(rank, out_features, bias=False)

    def forward(self, x):
        return self.B(self.A(x))
```
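A quick shape check shows the adapter in action; the dimensions here are arbitrary, chosen only for the example:

```python
import torch

# The adapter maps a 512-dim activation back to 512 dims through rank 8
adapter = SimpleLoRAAdapter(in_features=512, out_features=512, rank=8)
x = torch.randn(1, 512)
print(adapter(x).shape)  # torch.Size([1, 512])
```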
LoRA’s real impact is in integration with common frameworks like Hugging Face Transformers and training infrastructure such as Weights & Biases.
## Comparing LoRA to full finetuning
LoRA-based finetuning trains far fewer parameters than full finetuning, often with minimal performance degradation. This means:
- Key point: LoRA can reach similar accuracy and generalization with much lower GPU, time, and storage needs.
- Key point: LoRA allows quick and targeted adaptation, so you can perform domain adaptation on code data (like Python) or switch between tasks with minimal effort.
For example, full finetuning Gemma 3 270M might require training and saving hundreds of megabytes of weights, whereas LoRA adapters are typically only a few megabytes.
💡 Tip: For most application-level finetuning, start with LoRA or related adapter-based techniques before considering full-model updates.
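To make the savings concrete, here is a rough back-of-envelope comparison for a single weight matrix. The dimensions are illustrative, not taken from Gemma 3 270M:

```python
# Rough parameter-count comparison for one weight matrix.
# Dimensions are illustrative, not Gemma-specific.
d_in, d_out, rank = 1024, 1024, 8

full_params = d_in * d_out            # updated in full finetuning
lora_params = rank * (d_in + d_out)   # updated with a LoRA adapter

print(f"Full finetuning: {full_params:,} params")            # 1,048,576
print(f"LoRA (r=8):      {lora_params:,} params")            # 16,384
print(f"Reduction:       {full_params / lora_params:.0f}x")  # 64x
```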
# Practical guide to finetuning with Weights & Biases
Finetuning Gemma 3 270M on Python code data can be made reproducible, monitorable, and shareable using Weights & Biases. This hands-on walkthrough uses W&B for experiment tracking, with Weave available for richer evaluation and dashboarding.
We'll demonstrate:
1. Environment setup
2. Preparing Python code data
3. Loading Gemma 3 270M with a LoRA adapter
4. Training and evaluating with W&B tracking
5. Inspecting results in the W&B Dashboard
## Step 1: Environment setup
You'll need `transformers`, `peft` (for LoRA), `datasets`, and `wandb`. If you haven't already, install these with:
```python
# Step 1: Install required libraries
# Run this in a notebook cell or shell
!pip install transformers peft datasets wandb weave
```

Expected output (truncated for brevity):

```
Collecting transformers ...
Collecting peft ...
Collecting datasets ...
Collecting wandb ...
Collecting weave ...
...
Successfully installed peft-... transformers-... datasets-... wandb-... weave-...
```
## Step 2: Initialize Weights & Biases (W&B) and Weave
Set up your W&B API key and import necessary packages.
```python
# Step 2: Import packages and login
import wandb
import weave
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset

# Login for W&B
wandb.login()
```
You will be prompted for your W&B API key the first time.
💡 Tip: You can generate a free API key from your [W&B account settings](https://wandb.ai/settings). Set it as the `WANDB_API_KEY` environment variable for non-interactive environments, as shown below.
⚠️ Troubleshooting:
- If you see an authentication error, check your internet connection and re-run `wandb.login()`.
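For non-interactive environments (CI jobs, scheduled training), a minimal sketch of the environment-variable approach mentioned in the tip above; the placeholder key is, of course, hypothetical:

```python
import os
import wandb

# wandb.login() reads WANDB_API_KEY from the environment when present.
# Replace the placeholder with a real key, ideally via a secrets manager.
os.environ.setdefault("WANDB_API_KEY", "<your-api-key>")
wandb.login()
```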
## Step 3: Prepare your Python code data
Let's use The Stack dataset, which contains code in multiple languages. We'll filter it to Python and use a small subset for demonstration.
```python
# Step 3: Load Python code data
dataset = load_dataset("bigcode/the-stack", data_dir="data/python", split="train", streaming=True)

# For demonstration, sample 500 Python files only
python_samples = []
for idx, example in enumerate(dataset):
    python_samples.append(example["content"])
    if idx >= 499:
        break

print("Sample Python code snippet:")
print(python_samples[0][:200])  # print first 200 chars of the first sample
```
Expected output (first 200 characters of a Python file):

```
Sample Python code snippet:
def foo(x):
    # Some example code
    return x**2

class Bar:
    def __init__(self, y):
        self.y = y
```
💡 Tip: For larger experiments, stream more data or load the full dataset as needed; see the sketch below.
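For example, the streaming API's `filter` and `take` helpers let you scale up the sample while dropping trivial files. The sample size and length cutoff here are arbitrary choices:

```python
# Stream a larger, lightly filtered sample: drop very short files and
# take 2,000 examples instead of 500. Both thresholds are arbitrary.
larger_stream = dataset.filter(lambda ex: len(ex["content"]) > 200)
python_samples = [ex["content"] for ex in larger_stream.take(2000)]
print("Collected samples:", len(python_samples))
```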
## Step 4: Tokenization and dataset formatting
Transformers models require tokenized input. We'll convert our Python code samples into a training dataset with labels matching the input (self-supervised code modeling).
```python
# Step 4: Tokenize samples
import torch
from torch.utils.data import Dataset

model_name = "google/gemma-3-270m"  # Gemma 3 270M on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(example):
    return tokenizer(
        example, padding="max_length", max_length=512, truncation=True, return_tensors="pt"
    )

tokenized_texts = [tokenize_function(code)["input_ids"].squeeze(0) for code in python_samples]

class PythonDataset(Dataset):
    def __init__(self, input_ids):
        self.input_ids = input_ids

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        x = self.input_ids[idx]
        # Self-supervised causal LM: labels are a copy of the inputs
        return {"input_ids": x, "labels": x.clone()}

train_dataset = PythonDataset(tokenized_texts)
print("Number of Python code samples:", len(train_dataset))
```
Expected output:

```
Number of Python code samples: 500
```
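Before training, it's worth confirming the tokenizer round-trips code cleanly; a quick sanity check on the first sample:

```python
# Decode one tokenized sample back to text; special/padding tokens are
# skipped, so the output should resemble the original Python source.
sample = train_dataset[0]
print(tokenizer.decode(sample["input_ids"], skip_special_tokens=True)[:200])
```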
## Step 5: Apply LoRA adapters
Wrap the base model with LoRA adapters. We'll use a lightweight configuration for this example.
```python
# Step 5: Load model and apply LoRA
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

lora_config = LoraConfig(
    r=8,                                  # LoRA rank
    lora_alpha=16,                        # Scaling
    target_modules=["q_proj", "v_proj"],  # Typical for transformer models
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Load base model
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
model = prepare_model_for_kbit_training(model)  # Optional: enables 8-bit/4-bit finetuning

# Wrap with LoRA adapters
model = get_peft_model(model, lora_config)
print("Trainable parameters:", sum(p.numel() for p in model.parameters() if p.requires_grad))
```
Expected output (trainable parameter count; far fewer than the full model):

```
Trainable parameters: 393216
```
💡 Tip: Choose target modules like `q_proj`, `v_proj`, or similar based on your model architecture for best efficiency.
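`peft` also ships a one-line helper that prints the same breakdown:

```python
# Prints trainable params, total params, and the trainable percentage
model.print_trainable_parameters()
```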
## Step 6: Set up training with W&B Weave tracking
Leverage W&B to log, visualize, and compare all your runs. Here's a minimal Trainer integration:
```python
# Step 6: Training setup with W&B
from transformers import Trainer, TrainingArguments

wandb.init(project="gemma-python-finetuning", config={
    "epochs": 1,
    "lora_rank": lora_config.r,
    "train_size": len(train_dataset),
})

training_args = TrainingArguments(
    output_dir="./finetuned",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    save_steps=10,
    logging_steps=5,
    report_to="wandb",             # Key: logs to W&B
    run_name="gemma-python-lora",  # illustrative run name
    fp16=True,
    learning_rate=5e-4,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)

print("Starting training...")
trainer.train()
print("Training complete!")
wandb.finish()
```
Expected output (truncated; loss and metrics are logged to the W&B dashboard):

```
Starting training...
/wandb/run-xxx
Running training
  Num examples = 500
  Num Epochs = 1
...
Training complete!
```
💡 Tip: Monitor your run live at the W&B link output above: you’ll see loss curves, advanced metrics, and parameter configs.
⚠️ Troubleshooting:
- If you get an out-of-memory error, reduce `per_device_train_batch_size`, or set `fp16=False` if your GPU doesn't support mixed precision; see the sketch below for another option.
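If reducing the batch size alone isn't enough, gradient accumulation trades memory for extra optimizer-step latency. A variant of the training arguments above that keeps the effective batch size at 2:

```python
# Halve the per-device batch and accumulate gradients over 2 steps so
# the effective batch size matches the original configuration.
training_args = TrainingArguments(
    output_dir="./finetuned",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    report_to="wandb",
    fp16=True,
    learning_rate=5e-4,
)
```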
## Step 7: Evaluate, save, and share your finetuned model
After training, evaluate the results and log the model directly to W&B Models for reproducible sharing.
```python
# Step 7: Save, evaluate, and log to W&B Models
save_path = "./finetuned"
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

# Quick evaluation: generate Python code given a docstring
prompt = "# Write a Python function to compute the factorial of a number.\ndef "
tokenized_prompt = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(tokenized_prompt, max_new_tokens=40)
generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated Python code:")
print(generated_code)

# Log model to W&B Models Registry
artifact = wandb.Artifact(
    "gemma-python-lora", type="model",
    description="Gemma 3 270M finetuned on Python code with LoRA",
    metadata={"framework": "Transformers", "method": "LoRA"},
)
artifact.add_dir(save_path)
run = wandb.init(project="gemma-python-finetuning")
run.log_artifact(artifact)
run.finish()
```
Expected output (generated code should reflect Pythonic style; may be simple):

```
Generated Python code:
# Write a Python function to compute the factorial of a number.
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
```
💡 Tip: Your model is now accessible and shareable through W&B Models Registry—include eval notebooks or documentation for collaborators!
⚠️ Troubleshooting:
- If your code output is poor, train for more epochs, use more data, or tweak the LoRA config.
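When you or a collaborator want to reuse the adapter later, here is a minimal sketch of reloading it onto the base model; the paths and model name follow the values used earlier in this tutorial:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the frozen base model, then attach the saved LoRA adapter weights
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m", device_map="auto")
model = PeftModel.from_pretrained(base, "./finetuned")
tokenizer = AutoTokenizer.from_pretrained("./finetuned")
```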
## Practical exercise
- Try finetuning Gemma on code for a different language (e.g., JavaScript, Go) by changing your data filter and following the same steps.
- Experiment with instruction-formatting: prepend clear task descriptions or docstrings to your code to see if instruction modeling helps.
## Alternative use cases and applications
The LoRA finetuning process applies seamlessly to many domains, not just Python code. Use cases include:
- Adapting LLMs for industry-specific code generation, technical documentation, or natural language question answering
- Finetuning for different programming paradigms (object-oriented vs. functional)
- Instruction tuning for customer service bots, data analysis assistants, or scientific research tools
- Domain adaptation for medical, legal, or financial data
You can use W&B Models and Weave to monitor, compare, and deploy any of these model adaptations, bringing transparency and reproducibility to your workflow.
💡 Tip: W&B Weave enables you to compose data apps and experiment dashboards to visualize evaluation results and facilitate cross-team collaboration.
## Conclusion
LLM finetuning offers significant advantages: models become better at specific tasks, you save on compute and storage costs, and you maintain flexibility to adapt to new domains. Techniques like LoRA make this possible even with modest resources.
Try applying these techniques to your own data. Leverage W&B experiment tracking, reproducible artifact logging, and Weave for dashboarding, and you’ll accelerate both research and production deployments.
## Sources

- https://wandb.ai/site
- https://huggingface.co/docs/transformers/main/en/main_classes/model
- https://github.com/huggingface/peft
- https://huggingface.co/datasets/bigcode/the-stack
- https://weave-docs.wandb.ai/
- https://wandb.ai/settings