Fine-Tuning Mistral 7B on Previous ChatGPT Conversations With a Single GPU!
A tutorial for fine-tuning Mistral 7B on your own personal data!
Created on September 29|Last edited on September 30
The world of large language models is undergoing rapid evolution, with new models pushing the boundaries of both performance and efficiency. One such significant advancement has come from Mistral AI, which has unveiled its latest marvel, Mistral 7B.
This state-of-the-art model houses 7.3 billion parameters and has been engineered to deliver top-tier performance across many benchmarks. But what truly distinguishes Mistral 7B is its unique blend of efficiency and adaptability. Remarkably, it challenges the performance of considerably larger models like Llama 2 13B and Llama 1 34B, setting a new standard in the field.

While Mistral 7B is impressive out of the box, there's huge potential in its capacity for fine-tuning. This tutorial aims to guide you through the process of fine-tuning Mistral 7B for a specific use-case—personalized ChatGPT conversations. We will leverage powerful tools like Hugging Face's Transformers library, DeepSpeed for optimization, and Choline for streamlined deployment on Vast.ai.
What We'll Cover
HuggingFace and DeepSpeed Integration
Weights & Biases Integration
Choline
Let's Get Coding!
The Data
The Hardware
Reserving an Instance
The Training Script
Time to Spin Up Some GPUs!
So if you're interested in harnessing the power of Mistral 7B for your unique applications, you've come to the right place. Read on to discover how you can unlock the model's full potential!
HuggingFace and DeepSpeed Integration
HuggingFace's Transformers library has become the go-to platform for working with state-of-the-art NLP models like Mistral 7B. For this tutorial, we will utilize the Transformers library to handle tasks like tokenization and model initialization. However, training a model as large as Mistral 7B can be resource-intensive, both in terms of time and computational power.
To deal with these computational constraints, we will leverage DeepSpeed—a library designed to accelerate deep learning training. Specifically, we'll use DeepSpeed's ZeRO "Stage 2" optimization, which allows for offloading optimizer states to the CPU, reducing VRAM requirements.
Weights & Biases Integration
Keeping track of machine learning experiments can get overwhelming quickly, especially when you have multiple models and hyperparameters to consider. To alleviate these challenges, I've integrated W&B into our workflow for streamlined experiment tracking.
If this is your first time hearing about it, W&B is a platform that automatically logs your model's metrics, hyperparameters, and even the architecture. It acts as a centralized dashboard for all your machine learning experiments, helping you compare, analyze, and reproduce models with ease.
With W&B, you can focus on model development while it takes care of the tracking.
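To give a sense of how little wiring that takes with the Hugging Face Trainer, here's a minimal sketch. The project name is just a placeholder, and the environment variables are the ones the Trainer's built-in W&B callback reads; the actual logging is switched on by the report_to='wandb' argument you'll see in the training script later.

import os

# Name the W&B project before training starts (placeholder name, pick your own).
os.environ["WANDB_PROJECT"] = "mistral7b-chatgpt-finetune"
# Don't upload full model checkpoints to W&B as artifacts; they are large.
os.environ["WANDB_LOG_MODEL"] = "false"

# With report_to='wandb' set in TrainingArguments, the Trainer will then log
# loss, learning rate, and evaluation metrics to this project automatically.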
Choline
Getting access to GPUs can be a little tedious at times, and finding the best prices can be time-consuming. To solve these issues, I've been working on an open-source library called Choline. Choline is a cloud resource manager that automates the entire process of setting up and managing computational resources for machine learning workloads. It serves as a middleware between you and the Vast.ai cloud service (one of the cheapest GPU cloud providers), helping you focus more on building models and less on machine setup.
Let's Get Coding!
In addition to the repo, other requirements include a Vast AI account with a few bucks in credits, a W&B account, and a Mac/Linux machine. Note that I have mainly tested Choline on Mac, but it should work fine on Linux. It will probably run on Windows as well; I just haven't tested it yet.
The Data
First things first, we will focus on preparing our dataset for training. Basically, we will be converting the ChatGPT data export JSON file into a JSONL format that can be used to train our model.
For some reason, the ChatGPT JSON export format is a bit complex (at least more complex than I envisioned it would be). You can find a conversion script at data/gen_json_ds.py, which can be run using python gen_json_ds.py val_pct gpt_json_path, where val_pct is the percentage (integer value) of your data that you would like to use for validation and gpt_json_path is the path to your ChatGPT JSON export. NOTE: Vast AI is a peer-to-peer GPU sharing service, meaning that if you use it, there is a possibility your data could be stolen.
If data privacy is a concern, I recommend using more secure clouds like GCP or AWS.
Basically, the script converts the JSON data into a JSONL file where the text wrapped between <s> and </s> fits within the MAX_SEQ_LEN value specified in the gen_json_ds.py script. For this tutorial, we will be using a MAX_SEQ_LEN of 2048. Each example looks like this:
<s>[INST] What is your favourite condiment? [/INST]Well, I'm quite partial to a good squeeze of fresh lemon juice.</s>
I think it's important to note that the data format used in this tutorial may or may not be optimal. That's something that will require further testing! I used the maximum amount of previous context available for each input example while leaving the full GPT response intact. The gen_json_ds.py script should be pretty easy to modify for anyone looking to change the format used for adapting to new models or changing context length.
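To make the idea concrete without digging through the repo, here is a heavily simplified sketch of what the conversion does. This is not the actual gen_json_ds.py (which walks the export's conversation structure and packs in as much prior context as will fit); the function names and the assumption that you already have plain (user, assistant) message pairs are mine, purely for illustration.

import json

MAX_SEQ_LEN = 2048  # the real script measures length against the model's sequence limit

def to_example(user_msg, assistant_msg):
    # Wrap a single user/assistant exchange in the Mistral instruction format.
    return {"text": f"<s>[INST] {user_msg} [/INST]{assistant_msg}</s>"}

def write_jsonl(pairs, out_path):
    # pairs: list of (user_message, assistant_message) tuples pulled from the export
    with open(out_path, "w") as f:
        for user_msg, assistant_msg in pairs:
            example = to_example(user_msg, assistant_msg)
            if len(example["text"]) <= MAX_SEQ_LEN:  # crude length filter
                f.write(json.dumps(example) + "\n")

write_jsonl(
    [("What is your favourite condiment?",
      "Well, I'm quite partial to a good squeeze of fresh lemon juice.")],
    "output_examples_train.jsonl",
)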
After running the script, drag the two JSONL dataset files into the train directory. Now, we are ready to dive into configuring the hardware we will need.
The Hardware
To use Choline, you will need to set up a Vast AI account and set up RSA keys in your local environment. More info can be found here.
After setting up your Vast account, the next step is configuring your choline.yaml file in the train directory. I generated this choline.yaml using the init.py script within choline/simple_startup/, but since Choline is still in early development, I recommend manually tweaking the config for your various needs.
conda_version: 23.7.2
hardware_filters:
  cpu_ram: '>150'
  disk_space: '>250'
  gpu_name: RTX_3090
image: nvidia/cuda:12.0.0-devel-ubuntu20.04
local_cuda_version: '12.0'
onStart: deepspeed train.py
python_version: 3.10.10
upload_locations:
- /Users/brettyoung/Desktop/mistral7b/train
setup_script: |
  #!/bin/bash
  # Download Miniconda installer
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
  # Install Miniconda
  bash miniconda.sh -b -p $HOME/miniconda
  # Initialize conda
  . $HOME/miniconda/bin/activate
  conda init
  # Create environment
  conda create --name choline python=3.10.10 -y
  # Activate environment
  conda activate choline
  # Install vim
  sudo apt install vim -y
  # Set Wandb API key without user interaction
  export WANDB_API_KEY=YOUR_API_KEY
  pip install accelerate==0.23.0 || conda install accelerate==0.23.0 -y
  pip install Twisted==22.10.0 || conda install Twisted==22.10.0 -y
  pip install typing_extensions==4.8.0 || conda install typing_extensions==4.8.0 -y
  pip install tzdata==2023.3 || conda install tzdata==2023.3 -y
  pip install urllib3==2.0.5 || conda install urllib3==2.0.5 -y
  pip install w3lib==2.1.2 || conda install w3lib==2.1.2 -y
  pip install wandb==0.15.11 || conda install wandb==0.15.11 -y
  pip install peft==0.5.0 || conda install peft==0.5.0 -y
  pip install datasets==2.14.5 || conda install datasets==2.14.5 -y
  pip install deepspeed || conda install deepspeed -y
  pip install transformers git+https://github.com/huggingface/transformers.git@ab37b801b14d8b9c3186548e6e118aff623e6aa1 || conda install transformers git+https://github.com/huggingface/transformers.git@ab37b801b14d8b9c3186548e6e118aff623e6aa1 -y
  pip install trl==0.7.1 || conda install trl==0.7.1 -y
  deepspeed train.py
You will see that the beginning of the file contains information related to hardware parameters:
hardware_filters:
  cpu_ram: '>150'
  disk_space: '>250'
  gpu_name: RTX_3090
Here, we are using a single Nvidia RTX 3090 with 250 GB of disk space and 150 GB of CPU RAM. Note the high amount of CPU RAM requested, which is important for DeepSpeed's optimizer offloading and gradient checkpointing. We will get into that a bit later.
It's also important to note you will need a substantial amount of disk space in order to store model checkpoints. I would probably recommend at least 500 GB if you are storing a few checkpoints of the model.
Moving down the file a bit, you will see a key called "upload_locations." This essentially tells Choline which files to send to the instance during setup. You will need to replace my train directory path with the path to your own.
upload_locations:
- /Users/brettyoung/Desktop/mistral7b/train
Underneath this is a script that sets up some dependencies on your instance and also allows you to add your W&B API key for seamless logging. You can add it here:
export WANDB_API_KEY=YOUR_API_KEY
At the end of the file is the run command, which will be run automatically after your instance is set up and all necessary data has been synced. It can be seen below.
deepspeed train.py
After changing these config values, you are almost ready to begin training!
Reserving an Instance
Go ahead and 'cd' into your train directory within the repo. At this point, the train directory should contain a choline.yaml file, a train.py script, a zero2_deepspeed.json config file, and your two train and validation dataset JSONL files. We are now ready to reserve an instance! In order to use Choline to reserve your instance, you can run the simple_startup.py script from inside your train directory.
This script will read from the choline.yaml file you modified and find hardware matching the filters you provided. You should see something like this, which will allow you to select a machine:

Press the number for the machine you would like to reserve, and Choline will reserve it. Choline will automatically send your train directory to the machine and run the setup script specified in the choline.yaml file.
Note that not all machine startups will succeed. Given the peer-to-peer nature of Vast, there are still some bugs that will prevent certain instances from launching the specified Docker container. My best advice is to try a different instance if your original instance doesn't start up within 5-10 minutes. I'm hoping to integrate Choline with cloud providers other than Vast to deal with issues like data privacy and hardware inconsistencies. Feel free to check out the main Choline repo here.
The Training Script
Below is the training script used. I'm using HuggingFace for my data loading and model initialization.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from datasets import Dataset
from trl import SFTTrainer
import torch
import json

# Initialize the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1", torch_dtype=torch.float16)

MAX_SEQ_LEN = 2048

# Function to read a JSONL file into a list
def read_jsonl(path):
    data = []
    with open(path, 'r') as f:
        for line in f:
            data.append(json.loads(line))
    return data

# Add a pad token and resize the embeddings so the new token has an entry
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model.resize_token_embeddings(len(tokenizer))

# Read data from JSONL files and create datasets
train_data_list = read_jsonl('./output_examples_train.jsonl')
valid_data_list = read_jsonl('./output_examples_val.jsonl')

tr_data_dict = {'text': [item['text'] for item in train_data_list]}
val_data_dict = {'text': [item['text'] for item in valid_data_list]}

train_dataset = Dataset.from_dict(tr_data_dict)
valid_dataset = Dataset.from_dict(val_data_dict)

training_args = TrainingArguments(
    output_dir="./output",
    overwrite_output_dir=True,
    num_train_epochs=100,
    per_device_train_batch_size=1,
    save_steps=20,
    save_total_limit=2,
    fp16=True,
    bf16=False,
    report_to='wandb',  # Add this line to enable Wandb logging
    logging_steps=10,  # Log every 10 steps
    gradient_checkpointing=True,
    deepspeed="./zero2_deepspeed.json"  # Add this line for DeepSpeed
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,  # Added for validation
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LEN,
    tokenizer=tokenizer,
)

trainer.train()
trainer.save_model("./output/final_model")
A key element of this code is the arguments passed to the SFTTrainer. Here, we specify various parameters for our training run and use W&B to log our results. Along with this, we set gradient_checkpointing to True.
training_args = TrainingArguments(
    output_dir="./output",
    overwrite_output_dir=True,
    num_train_epochs=100,
    per_device_train_batch_size=1,
    save_steps=20,
    save_total_limit=2,
    fp16=True,
    bf16=False,
    report_to='wandb',  # Add this line to enable Wandb logging
    logging_steps=10,  # Log every 10 steps
    gradient_checkpointing=True,
    deepspeed="./zero2_deepspeed.json"  # Add this line for DeepSpeed
)
The deepspeed argument specifies where our DeepSpeed config lives. Here is the DeepSpeed config we use:
{"fp16": {"enabled": "auto","loss_scale": 0,"loss_scale_window": 1000,"initial_scale_power": 16,"hysteresis": 2,"min_loss_scale": 1},"bf16":{"enabled":"auto"},"optimizer": {"type": "AdamW","params": {"lr": "auto","betas": "auto","eps": "auto","weight_decay": "auto"}},"scheduler": {"type": "WarmupLR","params": {"warmup_min_lr": "auto","warmup_max_lr": "auto","warmup_num_steps": "auto"}},"zero_optimization": {"stage": 2,"allgather_partitions": true,"allgather_bucket_size": 2e8,"overlap_comm": true,"reduce_scatter": true,"reduce_bucket_size": 2e8,"contiguous_gradients": true,"round_robin_gradients": true,"cpu_offload": true},"gradient_accumulation_steps": "auto","gradient_clipping": "auto","steps_per_print": 2000,"train_batch_size": "auto","train_micro_batch_size_per_gpu": "auto","wall_clock_breakdown": false}
You can see we use ZeRO Stage 2 optimization with cpu_offload enabled, which allows offloading optimizer states to the CPU, conserving GPU VRAM for the model parameters. DeepSpeed handles much of the parameter optimization under the hood, which is awesome! There are probably many more optimizations I could take advantage of, which will be coming in a future tutorial!
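For a rough, back-of-the-envelope sense of why that offload matters on a single 24 GB RTX 3090 (these numbers are approximate and ignore activations, gradients, and fragmentation):

params = 7.3e9  # Mistral 7B parameter count

fp16_weights_gb = params * 2 / 1e9                 # ~14.6 GB of model weights on the GPU
# Mixed-precision AdamW keeps fp32 master weights plus two fp32 moment buffers:
optimizer_states_gb = params * (4 + 4 + 4) / 1e9   # ~87.6 GB of optimizer state

print(f"fp16 weights:     {fp16_weights_gb:.1f} GB")
print(f"optimizer states: {optimizer_states_gb:.1f} GB")

# Offloading that ~90 GB of optimizer state to system memory is what lets the
# 24 GB card hold the fp16 model, and it's why the choline.yaml hardware filter
# asks for more than 150 GB of CPU RAM.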
With all of this covered, you should be set to begin training! If you used Choline, the training script should start automatically once all other automated setup is complete. You can monitor the status of your machine using the status.py script in the choline/simple_startup/ directory. This will cat out the logs for your machine, which will eventually contain console output from your training script. Additionally, you can log into your Wandb account to view the status and eval logs of your training run.

In order to retrieve the model you trained, use the sync.py script at choline/simple_startup/sync.py, which will download all of the logged models to your local system.
This will download the models you trained to your local system! Do note, though, that these are BIG files, and you will need plenty of space on your local machine. Another option is to select files individually on your instance by logging in via the command line and scp'ing the files directly.
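Once the weights are on a machine you can run inference on, a quick sanity check looks something like the sketch below. This assumes the final checkpoint landed in ./output/final_model as in the training script, that the tokenizer was saved alongside it (otherwise load it from the base Mistral repo), and that you have a GPU with enough VRAM for fp16 inference; the prompt is just an example.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./output/final_model"  # path used by trainer.save_model() above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).to("cuda")

# Use the same [INST] format the training data was built with; the tokenizer
# adds the <s> BOS token for us.
prompt = "[INST] What is your favourite condiment? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)

print(tokenizer.decode(output[0], skip_special_tokens=True))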
Time to Spin Up Some GPUs!
If you made it this far, I think you're ready to train your very own Mistral 7B on ChatGPT conversation data! As time goes on, personal chat data will become increasingly important to keep track of, and it will be essential for AI systems to understand the full context of you and the work you do!
Thanks for reading, and feel free to comment if you have any questions/suggestions or run into any bugs!