
A Guide to Fine-Tuning CodeLlama with Weights & Biases for Trading Strategies

CodeLlama is an advanced language model tailored for generating programming code, offering capabilities in code completion, debugging, and writing code from scratch. It represents a significant leap in AI-assisted coding, enabling more efficient development workflows and solving complex problems with ease.
Today, we're going to look at using Code Llama—an LLM that specializes in coding and debugging—alongside W&B to hone trading strategies.

Overview of Code Llama

Unlike generic LLMs, code LLMs are designed to bridge the gap between natural language processing (NLP) and software development. Among recent releases, Code Llama, a state-of-the-art code LLM built on the Llama 2 architecture and focused on code-related tasks, has been made publicly available by Meta. CodeLlama excels in code infilling and is fine-tuned to understand and generate programming code, offering assistance in code completion, debugging, and even writing code from scratch based on user prompts.
The family includes CodeLlama 7B, a smaller, more resource-efficient model suited to environments with limited computational capacity and to less complex coding tasks, and CodeLlama 34B, a significant leap in parameter count that brings enhanced code generation capabilities.
The latest milestone is the release of the 70-billion-parameter (70B) model, which sets a new standard in the field. It leverages a deep understanding of multiple programming languages, enabling it to handle more complex queries and provide more accurate, context-aware suggestions.
Source: Author

Code LLMs

Code LLMs work by ingesting prompts or questions from users, interpreting these prompts within the context of programming, and then generating code snippets, or explanations based on learned patterns from vast amounts of programming-related data.

How do Code LLMs work?

Code LLMs involve several steps typical of LLMs, such as tokenization, embedding, processing through a transformer model, and generating outputs token by token.

Tokenization

Tokenization is a critical preliminary step in processing text, including code. It involves converting code into a sequence of tokens (e.g., operators, identifiers, and syntax) that the model can understand, enabling the model to capture the structural nuances of various programming languages.
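To make this concrete, here is a minimal sketch of tokenizing a small code snippet with the Hugging Face tokenizer for the model used later in this article (assuming the transformers library is installed and the tokenizer can be downloaded):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")

code = "def moving_average(prices, window):\n    return prices.rolling(window).mean()"
tokens = tokenizer.tokenize(code)    # sub-word tokens; identifiers and operators get split into pieces
token_ids = tokenizer.encode(code)   # integer IDs that the model actually consumes
print(tokens[:10])
print(token_ids[:10])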

Embeddings

Once the code is tokenized, each token is converted into a high-dimensional vector using embeddings. These embeddings capture semantic and syntactic information about the tokens, facilitating the model's understanding of programming concepts and their relationships, and they are learned during training. Code Llama employs rotary positional embeddings (RoPE) rather than the absolute positional embeddings used by many general-purpose LLMs. Rotary embeddings integrate positional information by applying a rotation in the embedding space, which encodes how far apart tokens are in the sequence and thus captures relationships between elements more effectively.
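As an illustrative sketch (not Meta's implementation), rotary embeddings can be pictured as rotating each pair of embedding dimensions by an angle proportional to the token's position, so that the interaction between two tokens depends on their relative distance:
import numpy as np

def rotary_embed(x, position, base=10000.0):
    # Simplified RoPE sketch: rotate pairs of dimensions by position-dependent angles.
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)          # one frequency per dimension pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

token_embedding = np.random.randn(8)                   # toy 8-dimensional embedding
rotated = rotary_embed(token_embedding, position=5)    # the same vector, encoded at position 5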

Transformer Model

LLMs generally employ a transformer architecture built around attention mechanisms, which allow Code LLMs to focus on the relevant parts of the code when generating outputs. This is crucial for understanding dependencies in code, such as which variables are being referenced or how different functions interact. Attention enables the model to dynamically adjust its focus based on the input context, improving its ability to generate coherent and contextually appropriate code.
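The core operation can be sketched in a few lines of generic scaled dot-product attention (an illustration of the mechanism, not Code Llama's internals): each token is scored against every other token, and those scores weight the information that flows forward.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (sequence_length, head_dim) matrices for a single attention head
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over each row
    return weights @ V                                 # context-weighted mix of values

Q = K = V = np.random.randn(6, 16)                     # toy sequence of 6 tokens
context = scaled_dot_product_attention(Q, K, V)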


Code LLM performance is heavily dependent on the quantity and quality of training data. These models are trained on a vast corpus of code from various sources, including public code repositories, forums, and documentation. The main focus of the Llama developers was to improve performance by training on more data rather than by increasing the parameter count, as OpenAI's models do, reasoning that the dominant cost for LLMs comes from running inference on the trained model rather than from the training process itself. Hence the lower costs.
Additionally, Code LLMs use the SwiGLU activation function instead of GeLU, employing learnable parameters to control the gating mechanism. This allows a more dynamic range of gating behaviors than a standard GLU, potentially leading to better performance on certain tasks. Moreover, while traditional LLMs use standard layer normalization, Code Llama uses root-mean-square (RMS) layer normalization.
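For readers who want to see the shapes involved, here is a minimal PyTorch sketch of a SwiGLU feed-forward block and RMS layer normalization (illustrative only; hidden sizes and details differ from Meta's implementation):
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    # Gated feed-forward block: silu(x @ W1) acts as a learnable gate on x @ W3
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)   # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)   # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)   # project back to the model dimension
    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class RMSNorm(nn.Module):
    # Normalize by the root-mean-square of the activations (no mean subtraction)
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps
    def forward(self, x):
        rms = torch.sqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x / rms

x = torch.randn(2, 10, 512)                              # (batch, sequence, model dim) toy tensor
out = SwiGLU(512, 1376)(RMSNorm(512)(x))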

Challenges and Solutions in AI-Powered Coding

Data Privacy and Security

Ensuring the security of code and data, especially when using cloud-based AI models, is crucial. There may be concerns about sensitive information leakage.

Resource Requirements

Advanced AI models are resource-intensive, requiring significant computational power, which can lead to increased costs and infrastructure demands.

Scope of Application and Specialization

Although models like Llama 2 excel at broad language-related tasks, their effectiveness can differ greatly in specialized or niche areas, which may necessitate fine-tuning or tailored solutions to meet domain-specific requirements.

Model Bias and Errors

AI models, including Code Llama, can generate biased or incorrect code based on their training data, leading to potential bugs or security vulnerabilities.

Explainability and Transparency

The complexity of large language models such as Llama 2 often results in a lack of clear insight into how decisions or outputs are generated. Achieving a level of transparency and explainability in the model's outputs is essential, especially in industries subject to strict regulations.

What can Code Llama do?

Beyond generating code, Code Llama excels at understanding natural language instructions. This means developers can describe the functionality they need in plain English, and Code Llama can translate those instructions into functional code. Code Llama enhances coding productivity through features like code completion, infilling, and conversational instructions:
  • Code Completion: Code Llama predicts and fills in the next parts of code based on the existing context, streamlining the coding process and reducing typing effort.
  • Infilling: It can intelligently fill in missing pieces of code or expand partial code snippets, helping to resolve errors and improve code quality.
  • Conversational Instructions: Code Llama interprets natural language instructions and converts them into functional code, enabling a more intuitive and accessible coding experience.
Moreover, Code Llama supports multiple programming languages, making it a versatile tool for a wide range of development environments. Whether working in Python, JavaScript, Java, or other languages, Code Llama can provide appropriate coding solutions, adhere to language-specific conventions, and offer insights based on best practices.
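To illustrate infilling concretely, here is a minimal sketch using the Hugging Face transformers integration, where a <FILL_ME> marker in the prompt indicates the span the base model should complete (the model ID and marker follow the Hugging Face CodeLlama documentation; exact behavior may vary between library versions, and running it requires enough memory for the 7B weights):
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "codellama/CodeLlama-7b-hf"                  # base (non-instruct) model supports infilling
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The <FILL_ME> marker is where the model should generate the missing code
prompt = "def sma(prices, window):\n    <FILL_ME>\n    return result"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
new_tokens = output[0][inputs["input_ids"].shape[1]:]   # keep only the generated infill
print(tokenizer.decode(new_tokens, skip_special_tokens=True))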

The significance of Code Llama’s specialized models for Python and instruct-based coding tasks

In addition to the size variants, CodeLlama comes in specialized versions: the base model, Python, and Instruct. The base model handles a wide range of coding tasks across different programming languages, while the Python model is specialized for Python-based tasks. Finally, CodeLlama-Instruct is designed to follow natural language instructions and convert them into code. This model is particularly useful for users who may not be familiar with specific programming syntax but can describe what they want to achieve in natural language.
CodeLlama-Instruct is trained to interpret these instructions and produce the corresponding code, making it a valuable tool for teaching, rapid prototyping, and bridging the gap between non-programmers and software development. The CodeLlama specialization pipeline can be seen below along with the different stages of fine-tuning annotated with the number of tokens seen during training.

Fine-tuning Code Llama along with Weights & Biases

To optimize an AI model for a specific task, you can first test the pre-trained model to validate its performance on that task. If the results are not satisfactory, the model can be fine-tuned on relevant datasets to improve performance.
Let’s explore a similar case below where we test CodeLlama 7b for working with trading strategies. For this, we will be using the quant trading instructions dataset. The role of a quant trader is to use mathematical computations and analyze historical financial data to identify trading opportunities.
The dataset, therefore, holds user queries (prompts) to write code related to analyzing and building trading strategies along with the system response in the form of code. We will track the results of our analysis in W&B to analyze the performance and maintain a comprehensive workflow.

1. Importing the Libraries and Setup W&B

# Install dependencies (notebook environment)
!pip install openai wandb

import os
import sys
import json
from datetime import datetime

import openai
import pandas as pd
import wandb
from IPython.display import Markdown, display

# Start a W&B run to track prompts, completions, and fine-tuning results
wandb.init(project="together", name="togethercomputer/CodeLlama")

2. Setup Together.ai

Since CodeLlama, as distributed on Hugging Face, is not directly accessible through a hosted API the way OpenAI models such as GPT are, we would normally need to host the model ourselves for inference and fine-tuning. To work around limited computational resources while experimenting, we will use Together.ai, which hosts these models for us and exposes them through an OpenAI-compatible API at little to no cost.
We pass our Together.ai API key to an instance of the OpenAI client class.
os.environ["OPENAI_API_KEY"] = 'your_api_key'   # your Together.ai API key

# Together.ai exposes an OpenAI-compatible endpoint, so the OpenAI client can be reused
client = openai.OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://api.together.xyz/v1",
)

!pip install --upgrade together
import together
together.api_key = os.environ["OPENAI_API_KEY"]

3. Evaluate Performance using the Pre-Trained Model

We first point to the CodeLlama 7B Instruct model and create a function that generates code from the user content provided. We also pass in a system message instructing the model to act as a quant programmer.
results = []

def get_code_completion(user_content, base_model, model_name):
    # System message instructs the model to act as a quant programmer
    messages = [
        {
            "role": "system",
            "content": "You are an expert quant programmer that helps to write Python trading strategies code based on the user request. Don't be too verbose.",
        },
        {
            "role": "user",
            "content": user_content,
        },
    ]
    response = client.chat.completions.create(
        model=base_model,
        messages=messages,
        temperature=0.1
    )

    # Store each prompt/completion pair so we can log a comparison table to W&B later
    results.append({"user_prompt": user_content, str(model_name): response.choices[0].message.content})
    return response.choices[0].message.content


base_model = "codellama/CodeLlama-7b-Instruct-hf"

Next, we pass a user prompt asking for backtesting code. The output will be analyzed at the end of this article.
user_prompt = "Please write a Python script that performs a simple moving average (SMA) crossover trading strategy backtest."


chat_completion = get_code_completion(user_prompt, base_model, 'pretrained-7b')
print("Prompt:", user_prompt)
print("Completion:", chat_completion)

4. Load and Preprocess the Dataset

To fine-tune the model, we load the quant trading instructions dataset. It has three columns: question, context, and answer, where the answer is code.
!pip install datasets
from datasets import load_dataset
dataset = load_dataset('lumalik/Quant-Trading-Instruct')
We then convert the dataset from a dictionary into a pandas DataFrame for easier manipulation, and take only a small portion, the first 100 rows, for fine-tuning.
dataset_df = dataset['train'].to_pandas()
dataset_df_100 = dataset_df.iloc[:100]

5. Convert Data to JSON for Fine-Tuning and Upload to Together.ai

The 7B Instruct model requires the prompt format shown below:
"""<s>[INST] <<SYS>>\n{context}\n<</SYS>>\n\n{question} [/INST] {answer} </s>"""
We convert the DataFrame rows into this format and save them as a JSONL file to pass to the model for fine-tuning.
output_file_path = 'output_data1.jsonl'

instruct_template = \
"""<s>[INST] <<SYS>>\n{context}\n<</SYS>>\n\n{question} [/INST] {answer} </s>"""

with open(output_file_path, 'w') as file:
    for index, row in dataset_df_100.iterrows():
        # Fill the instruct template with each row's context, question, and answer
        formatted_text = instruct_template.format(
            context=row['context'],
            question=row['question'],
            answer=row['answer']
        )
        json_object = {"text": formatted_text}
        file.write(json.dumps(json_object) + '\n')

print(f"Data successfully written to {output_file_path}")
The format of the dataset can be checked using the code below to ensure it follows the expected standard; the returned is_check_passed field should be True.
resp = together.Files.check(file="/content/output_data1.jsonl")
print(resp)
The data is then uploaded to together.ai:
together.Files.upload(file="/content/output_data1.jsonl")

6. Fine-Tune the Model and Log Results to W&B

The model can now be fine-tuned using the uploaded file's ID, and the results can be logged directly to W&B by passing in your W&B API key.
resp = together.Finetune.create(
    training_file = 'training_file_id',   # file ID returned by the upload step
    model = 'togethercomputer/CodeLlama-7b-Instruct',
    n_epochs = 5,
    n_checkpoints = 1,
    batch_size = 32,
    learning_rate = 1e-5,
    suffix = 'codeLlama-7b-finetune-trading',
    wandb_api_key = 'your_wandb_api_key',
)
fine_tune_id = resp['id']

7. Evaluate the Performance using our Fine-Tuned Model

The fine-tuned model can now be loaded using:
finetuned_model = 'model_id_here'
model_list = together.Models.list()
print(f"{len(model_list)} models available")
available_model_names = [model_dict['name'] for model_dict in model_list]
print(finetuned_model in available_model_names)   # should print True once the fine-tuned model is listed
We will deploy the model and check if it's available for inference using:
together.Models.start(finetuned_model)
together.Models.ready(finetuned_model)
Now, we will repeat the same prompts used for the pre-trained model and compare the results.
user_prompt = "Please write a Python script that performs a simple moving average (SMA) crossover trading strategy backtest."
chat_completion = get_code_completion(user_prompt, finetuned_model, 'finetuned-7b')
print("Prompt:", user_prompt)
print("Completion:", chat_completion)
The resulting code is logged into W&B for reference.
df_results = pd.DataFrame(results)
wandb.log({"results_table": wandb.Table(dataframe=df_results)})
The training metrics logged in W&B show the decrease in loss and the learning rate over the course of fine-tuning.

Source: Author
Below, we can compare the logged results for the pre-trained and fine-tuned models. The first (pre-trained) code snippet is not ideal because it incorrectly simulates the trading strategy's impact on portfolio equity: it multiplies the entire equity (100,000 units) by the stock's price for every buy or sell signal, which doesn't reflect realistic trading actions, where you buy or sell a certain number of shares rather than applying the stock price as a multiplier to the whole portfolio value. This approach can produce exaggerated and misleading results, because it doesn't track changes in portfolio value based on actual trading quantities.
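For contrast, below is a minimal sketch (written for this article, not generated by either model) of a more realistic equity simulation for an SMA crossover backtest, where returns are accrued only while a position is held instead of multiplying the whole portfolio by the price:
import numpy as np
import pandas as pd

def backtest_sma_crossover(prices, short=20, long=50, initial_equity=100_000):
    # Toy backtest: fully invested while the short SMA is above the long SMA, otherwise in cash
    df = pd.DataFrame({"close": prices})
    df["sma_short"] = df["close"].rolling(short).mean()
    df["sma_long"] = df["close"].rolling(long).mean()
    # Signal: 1 = long, 0 = flat; shifted one bar to avoid look-ahead bias
    df["signal"] = (df["sma_short"] > df["sma_long"]).astype(int).shift(1).fillna(0)
    # Strategy return per bar = signal * asset return; equity compounds from there
    df["returns"] = df["close"].pct_change().fillna(0)
    df["equity"] = initial_equity * (1 + df["signal"] * df["returns"]).cumprod()
    return df

prices = pd.Series(100 + np.cumsum(np.random.randn(500)))   # synthetic price series
result = backtest_sma_crossover(prices)
print(result["equity"].iloc[-1])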


Conclusion

The advent of LLMs has drastically transformed the landscape of efficiency and automation. Among these innovations, CodeLlama stands out as a prime example of how artificial intelligence can assist in coding practices. With its specialized capabilities for code completion, infilling, and interpreting conversational instructions, CodeLlama revolutionizes the coding process, making it more efficient and accessible.
In this article, we explored how CodeLlama, particularly the 7B version, can serve as a quantitative programmer to backtest trading strategies. The comparison between the pre-trained and fine-tuned versions of the model shows that fine-tuning CodeLlama can lead to significantly improved outcomes, tailoring the model's responses to fit specific tasks more accurately. The evolution from traditional programming to AI-assisted development is well underway, with tools like CodeLlama leading the charge toward a more efficient and innovative future in coding.
Iterate on AI agents and models faster. Try Weights & Biases today.