LLM Fine-tuning Hands-on
In this report, we'll walk through the process of LLM fine-tuning with W&B. There is also a free training course on LLM fine-tuning, "Training and Fine-tuning Large Language Models (LLMs)". Please check it out too.
Assets to learn about W&B
In addition to the official W&B documentation, you can leverage many other assets. Visit "For those who started to use W&B" to see the learning resources provided by W&B:
- Training video
- YouTube
- Use case
- Chatbot
- Sample code
W&B setup
1. W&B login
For W&B setup, please refer to this document. If you are using W&B Dedicated Cloud, VPC, or on-prem deployments, please refer to this and the following figure.

2. Experiment tracking 101 and the W&B dashboard
You can catch up on the basics of W&B experiment tracking in this 15-minute course (W&B 101). Sample code is here; a Colab notebook is here.
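As a quick refresher, the core tracking loop looks roughly like this (a minimal sketch; the project name and the logged metric are placeholders):

import wandb

# Minimal experiment-tracking sketch: hyperparameters go into config,
# metrics are streamed with wandb.log(). Values here are placeholders.
run = wandb.init(project="my-first-project", config={"learning_rate": 1e-3, "epochs": 3})
for epoch in range(run.config["epochs"]):
    train_loss = 1.0 / (epoch + 1)                       # dummy metric for illustration
    wandb.log({"epoch": epoch, "train/loss": train_loss})
run.finish()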
Please note that if you register your API key as an environment variable, you can log in without entering it each time.
You can find your API key on your user settings page or at https://wandb.ai/authorize (multi-tenant SaaS only).
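For example, with the API key exported as the WANDB_API_KEY environment variable, login works without an interactive prompt; for Dedicated Cloud or on-prem deployments you can also point the client at your instance (the URL below is a placeholder):

import os
import wandb

os.environ["WANDB_API_KEY"] = "<your-api-key>"           # usually set in your shell, not in code
wandb.login()                                            # multi-tenant SaaS
# wandb.login(host="https://your-instance.wandb.io")     # Dedicated Cloud / on-prem (placeholder URL)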
3. Team collaboration

W&B experiments are organized in a hierarchy of entity => project => run. An entity is a team unit. By default, runs go to your personal entity, but you can create a team entity and manage a shared project within the team. However, for personal or academic use, you can only participate in one entity other than your personal one. The unit under the entity is the project; as the name suggests, use one project per ML or DL project. You will run many experiments within a project, and runs are managed under it. Note that entities and projects are created manually, while runs are created automatically with each execution.
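Concretely, the entity and project are just arguments to wandb.init(); a minimal sketch with placeholder names:

import wandb

# Runs land under <entity>/<project>; if entity is omitted, your default (personal) entity is used.
run = wandb.init(entity="my-team", project="llm-finetuning-handson", name="baseline")
wandb.log({"demo_metric": 1.0})
run.finish()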
LLM fine-tuning
💡 Please use a V100 or A100 GPU for this hands-on.
1. Data versioning (Artifacts) and visualization (Tables)
In this hands-on, we will use the Alpaca dataset. The Alpaca dataset is a synthetic dataset developed by Stanford researchers, who used the OpenAI davinci model to generate instruction/output pairs and then fine-tuned LLaMA on them. The dataset covers a diverse list of user-oriented instructions, including email writing, social media, and productivity tools.
We'll use an updated version that uses GPT-4 instead of davinci-003 (GPT-3) to get an even better model! You can find more details in the official dataset repo.
The Alpaca-GPT4 dataset is just a single JSON file, alpaca_gpt4_data.json, which contains 52K instruction-following examples generated by GPT-4 using the prompts from Alpaca. This JSON file has the same format as the Alpaca data, except the output is generated by GPT-4.
An example record has three fields:
- instruction: str, describes the task the model should perform. Each of the 52K instructions is unique.
- input: str, optional context or input for the task.
- output: str, the answer to the instruction as generated by GPT-4.
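To get a feel for the data, here is a minimal sketch of loading the file with the standard library, assuming it has already been downloaded locally as alpaca_gpt4_data.json (the dataset_file and alpaca names match the logging snippet further below):

import json

dataset_file = "alpaca_gpt4_data.json"   # assumes the JSON file was downloaded locally
with open(dataset_file) as f:
    alpaca = json.load(f)                # a list of dicts with instruction/input/output keys

print(len(alpaca))    # roughly 52K records
print(alpaca[0])      # inspect a single instruction/input/output example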
Table
Let's log the dataset as a W&B Table to be able to inspect the different instruction/output pairs quickly. We will use Tables later on to inspect our model predictions and compare different training recipes. Check the table below using the arrows and hovering over the instruction/input/output tooltips:
If you want to learn more about Tables, please check "Beyond experiment tracking and best practice of wandb" and the Tables section of the W&B official documentation.
Artifacts


import wandb

# log to wandb
wandb_project = "llm-finetuning-handson"

with wandb.init(project=wandb_project):
    # log the raw dataset file as an Artifact (dataset_file and alpaca come from loading the JSON above)
    at = wandb.Artifact(
        name="alpaca_gpt4",
        type="dataset",
        description="A GPT4 generated Alpaca like dataset for instruction finetuning",
        metadata={"url": "https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM#how-good-is-the-data"},
    )
    at.add_file(dataset_file)
    wandb.log_artifact(at)

    # log as a table
    table = wandb.Table(columns=list(alpaca[0].keys()))
    for row in alpaca:
        table.add_data(*row.values())
    wandb.log({"alpaca_gpt4_table": table})

2. Training (Experiment tracking)
Let's fine-tune the model! Thanks to the Hugging Face integration (report_to="wandb"), you don't have to write run.log(loss) calls in your training script yourself.
config = {"BASE_MODEL":"facebook/opt-125m","lora_config":{"r":32,"lora_alpha":16,'target_modules': [f"model.decoder.layers.{i}.self_attn.{proj}_proj" for i in range(31) for proj in ['q', 'k', 'v']],"lora_dropout":.1,"bias":"none","task_type":"CAUSAL_LM"},"training_args":{"dataloader_num_workers":16,"evaluation_strategy":"steps","per_device_train_batch_size":8,"max_steps": 50,"gradient_accumulation_steps":2,"report_to":"wandb",#wandb integration"warmup_steps":10,"num_train_epochs":1,"learning_rate":2e-4,"fp16":True,"logging_steps":10,"save_steps":10,"output_dir":'./outputs'}}with wandb.init(project=wandb_project, config=config, job_type="training") as run:# track datarun.use_artifact('wandb-public/llm-finetuning-handson/alpaca_gpt4_splitted:v0')# Setup for LoRalora_config = LoraConfig(**wandb.config["lora_config"])model_peft = get_peft_model(model, lora_config)model_peft.print_trainable_parameters()model_peft.config.use_cache = Falsetrainer = transformers.Trainer(model=model_peft,data_collator= collator,args=transformers.TrainingArguments(**wandb.config["training_args"]),train_dataset=train_dataset,eval_dataset=val_dataset)trainer.train()run.log_code()
3. Visualize input/output pairs & evaluation (Tables)
Let's do some inference with the fine-tuned model and the evaluation dataset.
Although we don't cover it in this hands-on, you can learn about model-based evaluation methods in the evaluation reports listed in the References below.
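A minimal sketch of what such an evaluation step could look like, assuming model_peft and a tokenizer from the training step and a small eval_samples list with instruction/input/output fields (the prompt format and generation settings are illustrative assumptions, not the exact recipe used here):

import wandb

with wandb.init(project=wandb_project, job_type="evaluation") as run:
    table = wandb.Table(columns=["instruction", "input", "target", "prediction"])
    for example in eval_samples:                                     # small slice of the eval split
        prompt = f"{example['instruction']}\n{example['input']}"     # illustrative prompt format
        inputs = tokenizer(prompt, return_tensors="pt").to(model_peft.device)
        out = model_peft.generate(**inputs, max_new_tokens=128)
        prediction = tokenizer.decode(out[0], skip_special_tokens=True)
        table.add_data(example["instruction"], example["input"], example["output"], prediction)
    run.log({"eval_predictions": table})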
With Tables, you can visualize not only tabular data but also images and videos!
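For example, an image column can be added with wandb.Image (a hedged sketch; the file path is a placeholder):

import wandb

with wandb.init(project="llm-finetuning-handson", job_type="media-demo") as run:
    table = wandb.Table(columns=["caption", "image"])
    table.add_data("an example image", wandb.Image("example.png"))   # placeholder image path
    run.log({"media_table": table})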

4. Hyperparameter tuning (Sweeps)
With Sweeps, you can launch many runs across a defined hyperparameter space and get nice visualizations to understand which combination of hyperparameters yields the best performance.
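A minimal sketch of what a sweep could look like for this hands-on, assuming a train() function that builds the LoRA config and Trainer from run.config (the searched parameters and ranges below are illustrative, not the exact sweep shown in the panels):

import wandb

sweep_config = {
    "method": "random",                                   # or "grid" / "bayes"
    "metric": {"name": "eval/loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 5e-4},
        "lora_r": {"values": [8, 16, 32]},
        "lora_dropout": {"values": [0.05, 0.1]},
    },
}

def train():
    with wandb.init() as run:
        cfg = run.config                                  # sweep parameters arrive here
        # build LoraConfig / TrainingArguments from cfg and call trainer.train() here
        pass

sweep_id = wandb.sweep(sweep_config, project="llm-finetuning-handson")
wandb.agent(sweep_id, function=train, count=10)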
5. Share results (Reports)
Let's share your insights with a nice reporting tool, Reports! In fact, this very page is written with Reports!

6. Manage models within the team and create automated processes (Model Registry / Automations / Launch)
You can learn about the Model Registry with a nice video and a training course.
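As an illustration (not taken from this hands-on), a fine-tuned checkpoint can be logged as a model Artifact and linked to the Model Registry; the artifact and registered model names below are assumptions:

import wandb

with wandb.init(project="llm-finetuning-handson", job_type="model-registration") as run:
    model_art = wandb.Artifact(name="opt-125m-alpaca-lora", type="model")   # illustrative name
    model_art.add_dir("./outputs")                        # checkpoint directory from training
    run.log_artifact(model_art)
    # Link the artifact to a registered model so the team can discover and promote it
    run.link_artifact(model_art, "model-registry/llm-finetuning-demo")      # assumed registry name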
References
A Gentle Introduction to LLM APIs
In this article, we dive into how large language models (LLMs) work, starting with tokenization and sampling, before exploring how to use them in your applications.
Prompt Engineering LLMs with LangChain and W&B
Join us for tips and tricks to improve your prompt engineering for LLMs. Then, stick around and find out how LangChain and W&B can make your life a whole lot easier.
How to Evaluate, Compare, and Optimize LLM Systems
This article provides an interactive look into how to go about evaluating your large language model (LLM) systems and how to approach optimizing the hyperparameters.
How to Evaluate an LLM, Part 1: Building an Evaluation Dataset for our LLM System
Building gold standard questions for evaluating our QA bot based on production data.
How to Fine-Tune an LLM Part 1: Preparing a Dataset for Instruction Tuning
Learn how to fine-tune an LLM on an instruction dataset! We'll cover how to format the data and train a model like Llama 2, Mistral, etc. in this minimal example in (almost) pure PyTorch.
How to evaluate an LLM Part 3: LLMs evaluating LLMs
Employing auto-evaluation strategies to evaluate different components of our Wandbot RAG-based support system.
How to Fine-Tune an LLM Part 2: Instruction Tuning Llama 2
In part 1, we prepped our dataset. In part 2, we train our model.

Training and Fine-tuning Large Language Models (LLMs)
Explore the architecture, training techniques, and fine-tuning methods for creating powerful LLMs. Gain theory and hands-on experience from Jonathan Frankle (MosaicML) and other industry leaders, and learn cutting-edge techniques like LoRA and RLHF.

Building LLM-Powered Apps
Learn how to build LLM-powered applications using LLM APIs, Langchain and W&B Prompts. This course will guide you through the entire process of designing, experimenting, and evaluating LLM-based apps.