
Fine-Tuning a Legal Copilot Using Azure OpenAI and W&B

Building a tool to demystify the intricacies of legal contracts with AI!

Introduction

Recently, Microsoft Azure rolled out its own service for fine-tuning the OpenAI language models, but the choice between Azure OpenAI and OpenAI's fine-tuning service depends on your specific needs and context.
For enterprises requiring robust security, compliance, and integration with other Azure services, Azure OpenAI is a great choice. Conversely, individual developers or smaller teams looking for direct model access and community support might find OpenAI's fine-tuning service more suitable.
Both services offer powerful capabilities, but their suitability varies based on the user's technical requirements, scale, and operational context. In addition, custom fine-tuned models on the Azure OpenAI service are billed hourly for training and hosting, rather than per token, which could also be an important factor in deciding which fine-tuning service to use. In this tutorial, we'll be fine-tuning GPT-3.5 Turbo on subsets of the LegalBench dataset in order to build a legal contract copilot!




Specifically, we'll be fine-tuning GPT-3.5 Turbo on the contract subsets of the LegalBench dataset. This targeted approach is designed to augment the model's proficiency in understanding legal language, particularly in the domain of contracts.
By focusing on these specialized datasets, we aim to enhance the model's accuracy and reliability in legal contexts, making a useful tool that could potentially prevent legal blunders before they occur. Fine-tuning is a useful step in customizing the model for specific use cases, and past legal data can be used to enhance it further. The dataset we use may or may not improve the model for your specific use case, and you will likely need to experiment with different datasets that fit your needs.
It's important to clarify that fine-tuning GPT-3.5 Turbo on the contract subsets of LegalBench is primarily for demonstration purposes. While this initial step is promising and will provide valuable insights, a more robust and comprehensive dataset is likely required to develop a fully viable legal copilot. This project serves as a solid foundation, offering a glimpse into the potential capabilities of AI in the legal domain.

Accessing the Azure OpenAI API

In order to access the Azure OpenAI fine-tuning service, you will need to fill out a form outlining your use case for the API. One important note about this step is that you will need to have a company email address in order to be approved. You can access the form here.
After your request is approved, you will be able to create a new instance in the Azure AI services tab, and access your API keys and Endpoint URL in the Azure dashboard.
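Once you have both values, a convenient pattern (and the one the scripts below assume) is to expose them as environment variables rather than hard-coding them. For example, in your shell (the values here are placeholders):
export AZURE_OPENAI_ENDPOINT="https://YOUR-RESOURCE-NAME.openai.azure.com/"  # placeholder endpoint
export AZURE_OPENAI_KEY="YOUR-API-KEY"  # placeholder key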

The Data

In order to fine-tune our model, we first need to find a relevant dataset. I chose to use a few subsets of the LegalBench dataset. It's important to note that the samples selected from LegalBench were originally intended for the task of natural language inference (NLI). NLI is a fundamental task in natural language processing (NLP) that involves determining the relationship between a premise and a hypothesis, typically classifying it as entailment, contradiction, or neutrality. In our case, the dataset consists simply of 'yes' or 'no' questions for the model to answer.

Converting From HuggingFace to JSONL

We'll first write a script whose primary objective is to convert the Hugging Face LegalBench dataset into a JSONL format that is compatible with the Azure fine-tuning API. In essence, it translates the structured data from the LegalBench dataset, which contains various subsets relating to contract law, into a format that an AI training system can easily interpret and learn from.
This conversion process involves several key steps. First, the script selects specific subsets from the LegalBench dataset, each focusing on a different aspect of contract law, such as clauses on confidentiality, permissible disclosures, or the survival of obligations. It then creates tailored prompts for each subset, which are essential in guiding the AI to identify whether certain legal clauses are present in a piece of contract text.
As it processes each subset of the dataset, the script constructs a dialogue-like format for each entry. This format includes the predefined prompt, the text of the legal clause, and the correct answer (e.g., yes or no) indicating whether the clause matches the criteria set out in the prompt. The dialogue format is crucial as it mimics the interactive style in which AI models are often trained, allowing for a more natural understanding and processing of the data.
Finally, the script outputs this dialogue-formatted data into JSONL files, splitting them into training and validation datasets. The validation set lets us evaluate the model's accuracy and comprehension on data it has never been trained on.
from datasets import load_dataset
import json

# List of subsets to load
subsets = [
    "contract_nli_confidentiality_of_agreement",
    "contract_nli_explicit_identification",
    "contract_nli_inclusion_of_verbally_conveyed_information",
    "contract_nli_limited_use",
    "contract_nli_no_licensing",
    "contract_nli_notice_on_compelled_disclosure",
    "contract_nli_permissible_acquirement_of_similar_information",
    "contract_nli_permissible_copy",
    "contract_nli_permissible_development_of_similar_information",
    "contract_nli_permissible_post-agreement_possession",
    "contract_nli_return_of_confidential_information",
    "contract_nli_sharing_with_employees",
    "contract_nli_sharing_with_third-parties",
    "contract_nli_survival_of_obligations",
    "contract_qa"
]

# Dictionary to hold subsets as keys and their prompts as values
subsets_prompts = {
    "contract_nli_confidentiality_of_agreement": "Identify if the clause provides that the Receiving Party shall not disclose the fact that Agreement was agreed or negotiated.",
    "contract_nli_explicit_identification": "Identify if the clause provides that all Confidential Information shall be expressly identified by the Disclosing Party.",
    "contract_nli_inclusion_of_verbally_conveyed_information": "Identify if the clause provides that Confidential Information may include verbally conveyed information.",
    "contract_nli_limited_use": "Identify if the clause provides that the Receiving Party shall not use any Confidential Information for any purpose other than the purposes stated in Agreement.",
    "contract_nli_no_licensing": "Identify if the clause provides that the Agreement shall not grant Receiving Party any right to Confidential Information.",
    "contract_nli_notice_on_compelled_disclosure": "Identify if the clause provides that the Receiving Party shall notify Disclosing Party in case Receiving Party is required by law, regulation or judicial process to disclose any Confidential Information.",
    "contract_nli_permissible_acquirement_of_similar_information": "Identify if the clause provides that the Receiving Party may acquire information similar to Confidential Information from a third party.",
    "contract_nli_permissible_copy": "Identify if the clause provides that the Receiving Party may create a copy of some Confidential Information in some circumstances.",
    "contract_nli_permissible_development_of_similar_information": "Identify if the clause provides that the Receiving Party may independently develop information similar to Confidential Information.",
    "contract_nli_permissible_post-agreement_possession": "Identify if the clause provides that the Receiving Party may retain some Confidential Information even after the return or destruction of Confidential Information.",
    "contract_nli_return_of_confidential_information": "Identify if the clause provides that the Receiving Party shall destroy or return some Confidential Information upon the termination of Agreement.",
    "contract_nli_sharing_with_employees": "Identify if the clause provides that the Receiving Party may share some Confidential Information with some of Receiving Party's employees.",
    "contract_nli_sharing_with_third-parties": "Identify if the clause provides that the Receiving Party may share some Confidential Information with some third-parties (including consultants, agents and professional advisors).",
    "contract_nli_survival_of_obligations": "Identify if the clause provides that some obligations of Agreement may survive termination of Agreement.",
    "contract_qa": "Answer questions about whether contractual clauses discuss particular issues."
}

# Paths to save the JSONL files
output_file_train = './legalbench_subsets_train.jsonl'
output_file_val = './legalbench_subsets_val.jsonl'

def write_to_file(subset, output_file, split):
    with open(output_file, 'a') as file:
        dataset = load_dataset("nguha/legalbench", subset, split=split)

        for example in dataset:
            # Construct the dialogue format with the prompt and example data
            prompt = subsets_prompts.get(subset, "")
            dialogue = {
                "messages": [
                    {"role": "system", "content": "Answer yes/no " + prompt},
                    {"role": "user", "content": example['text']},
                    {"role": "assistant", "content": example['answer']}
                ]
            }
            file.write(json.dumps(dialogue) + '\n')

# Write training and validation datasets to separate files
for subset in subsets:
    # Write the HF test split to our training file (the splits are reversed on Hugging Face)
    write_to_file(subset, output_file_train, "test")

    # Write the HF train split to our validation file
    write_to_file(subset, output_file_val, "train")
As you can see in the above script, each subset has its own corresponding prompt, which is used as input to the model along with the context.
Now that we have created the dataset files, we can move on to the next step: uploading the files to Azure so that they can be used for fine-tuning. You might notice that the subsets have substantially more examples in the test set than the train set.
Initially, I thought this was a bug; however, the NLI task this dataset is usually used for is typically intended to evaluate a model's ability to learn from just a small number of training samples. For this project, we will leverage the test set for training and the train set for validation.
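If you want to verify this imbalance yourself, a quick sketch like the one below prints the split sizes for one subset (contract_qa here, chosen arbitrarily; the exact counts vary by subset):
from datasets import load_dataset

# Print the number of examples in each split of one subset
for split in ["train", "test"]:
    ds = load_dataset("nguha/legalbench", "contract_qa", split=split)
    print(f"{split}: {len(ds)} examples")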
Here is an example of what one of the samples looks like:
{
  "messages": [
    {"role": "system", "content": "Answer yes/no Identify if the clause provides that Confidential Information may include verbally conveyed information."},
    {"role": "user", "content": "2. Every contract party can disclose confidential information to the other contract party orally or in writing. "},
    {"role": "assistant", "content": "Yes"}
  ]
}

Uploading the Data

The process of uploading the training data to Azure is relatively simple, and you can modify this template to upload your training and validation files:
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2023-12-01-preview"  # This API version or later is required to access fine-tuning for turbo/babbage-002/davinci-002
)

# File names for training and validation datasets
training_file_name = './legalbench_subsets_train.jsonl'
validation_file_name = './legalbench_subsets_val.jsonl'

training_response = client.files.create(
    file=open(training_file_name, "rb"), purpose="fine-tune"
)
training_file_id = training_response.id

validation_response = client.files.create(
    file=open(validation_file_name, "rb"), purpose="fine-tune"
)
validation_file_id = validation_response.id

print("Training file ID:", training_file_id)
print("Validation file ID:", validation_file_id)


Fine-Tuning With Azure and W&B

If you've made it this far, the heavy lifting is basically done! All that's left is to train our model using the OpenAI Python package, along with our training and validation file IDs (which were printed to the console in our previous example).
response = client.fine_tuning.jobs.create(
    training_file=training_file_id,
    validation_file=validation_file_id,
    model="gpt-35-turbo-0613",  # Enter base model name. Note that in Azure OpenAI the model name contains dashes and cannot contain dot/period characters.
)
job_id = response.id

# You can use the job ID to monitor the status of the fine-tuning job.
# The fine-tuning job will take some time to start and complete.
print("Job ID:", job_id)
print(response.model_dump_json(indent=2))
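Since jobs can take a while to queue and run, a simple way to wait for completion (and grab the resulting model name, which you'll need for deployment later) is to poll the job. A minimal sketch using the same client:
import time

# Poll the fine-tuning job until it reaches a terminal state
while True:
    job = client.fine_tuning.jobs.retrieve(job_id)
    print("Status:", job.status)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)

print("Fine-tuned model:", job.fine_tuned_model)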
As with any machine learning experiment, we will need a way to visualize the results of our training run. Luckily, Weights and Biases provides excellent integration for this, and with only two lines of code, we can sync the results from the training run with our Weights and Biases account! The code below demonstrates this.
import os
import argparse

import openai
from openai import AzureOpenAI
from wandb.integration.openai.fine_tuning import WandbLogger

def main():
    # Parse command line arguments
    parser = argparse.ArgumentParser(description='Sync Wandb with OpenAI Fine-Tune')
    parser.add_argument('--id', type=str, help='Fine Tune Job ID')
    parser.add_argument('--project', type=str, help='Project Name')
    parser.add_argument('--overwrite', action='store_true', help='Overwrite existing logs (defaults to False)')
    args = parser.parse_args()

    # Retrieve OpenAI API key from environment variable
    openai_api_key = os.getenv("OPENAI_API_KEY")
    if not openai_api_key:
        raise ValueError("OPENAI_API_KEY environment variable not set")

    # Initialize OpenAI with the API key
    openai.api_key = openai_api_key

    # Initialize AzureOpenAI client (modify the endpoint if necessary)
    client = AzureOpenAI(
        azure_endpoint="YOUR_AZURE_ENDPOINT_URL",  # Replace with your Azure endpoint URL
        api_key=openai_api_key,
        api_version="2023-12-01-preview"
    )

    # Sync Wandb with the fine-tune job, logging to the given project
    WandbLogger.sync(
        fine_tune_job_id=args.id,
        openai_client=client,
        project=args.project,
        overwrite=args.overwrite
    )

if __name__ == "__main__":
    main()
Now, we can run the script with:
python wandb_sync.py --id ftjob-theID --project OpenAI-Fine-Tune --overwrite
...which will sync our training and validation logs with Weights & Biases. Here are the logs from my training run: the training loss, the validation loss, and a few W&B Tables showing sample outputs from the model on the train and validation sets, which give us a better feel for how the model is performing. Tables are nice because they offer more insight into the model's true abilities than a loss metric alone, which isn't always the best indicator of performance in a real-world setting.

[W&B panels from the run set: training loss, validation loss, and sample-output Tables for the train and validation sets]
Model Deployment

There are multiple methods available for deploying your fine-tuned model. The first two involve using a Python script or the CLI to deploy the model, and both are well documented in the Azure OpenAI docs.
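As a rough illustration of the CLI route, a deployment command looks something like the following; the resource group, resource name, and fine-tuned model ID are placeholders, and you should double-check the flags against the current Azure OpenAI docs:
az cognitiveservices account deployment create \
  --resource-group YOUR_RESOURCE_GROUP \
  --name YOUR_AZURE_OPENAI_RESOURCE \
  --deployment-name deploy_V1 \
  --model-name YOUR_FINE_TUNED_MODEL_ID \
  --model-version "1" \
  --model-format OpenAI \
  --sku-capacity 1 \
  --sku-name "Standard"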
For my use case, I found it was easiest to use the Azure OpenAI Studio, and with only a few clicks I was able to deploy the model. Simply navigate to the Azure OpenAI Studio, and click the deployments tab. Next, select the 'create new deployment' button. This will bring up the following window, which allows you to name your deployment and launch the model!



Accuracy Evaluation

I went ahead and benchmarked my model against the stock GPT-3.5 Turbo model on the validation set, and I was able to improve the performance of the model!
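If you'd like to reproduce this kind of comparison, a minimal exact-match evaluation over the validation file might look like the sketch below (reusing the client from earlier; "gpt-35-turbo" is a placeholder deployment name for the stock model, and this is an illustration of the approach rather than my exact script):
import json

def accuracy(deployment, path="./legalbench_subsets_val.jsonl"):
    correct = total = 0
    with open(path) as f:
        for line in f:
            msgs = json.loads(line)["messages"]
            # Send only the system prompt and clause text, withholding the gold answer
            resp = client.chat.completions.create(model=deployment, messages=msgs[:2])
            pred = resp.choices[0].message.content.strip().lower()
            gold = msgs[2]["content"].strip().lower()
            correct += int(pred.startswith(gold))
            total += 1
    return correct / total

print("fine-tuned:", accuracy("deploy_V1"))
print("baseline:", accuracy("gpt-35-turbo"))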




The App

Now that we have fine-tuned and deployed our model, we are ready to create a front end for using it. For this app, we want a front end that allows the user to upload a PDF file containing the contract.
We'll use Flask and some simple HTML for the app. Here's the code for the API that will make requests to our model. The API has an endpoint that receives the legal contract as a PDF, loops through each page, and queries the model with the contents of each page. Here is the code:
from flask import Flask, request, jsonify, send_from_directory
import PyPDF2
import io
import os

from openai import AzureOpenAI

app = Flask(__name__)

# Set debug mode (set to True for testing without OpenAI calls)
DEBUG_MODE = False

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2023-12-01-preview"
)


def getCompletion(prompt):
    chat_completion = client.chat.completions.create(
        model="deploy_V1",
        messages=[
            {"role": "system", "content": "You are a helpful assistant reviewing legal documents."},
            {"role": "user", "content": prompt}
        ]
    )
    return chat_completion


@app.route('/')
def index():
    # Serve the HTML file
    return send_from_directory('', 'index.html')


@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return jsonify({"error": "No file part"}), 400

    file = request.files['file']
    if file.filename == '':
        return jsonify({"error": "No selected file"}), 400

    if file:
        pdfReader = PyPDF2.PdfReader(io.BytesIO(file.read()))
        combined_response = ""

        for page_number, page in enumerate(pdfReader.pages):
            text = page.extract_text()
            if DEBUG_MODE:
                # In debug mode, return the first 10 characters of each page
                response_text = text[:10]
            else:
                # Construct the prompt for the model
                prompt = f"Review this legal document excerpt for any elements that seem legally unfair under US law \"{text}\""

                response = getCompletion(prompt=prompt)
                # Extract the response text from the completion
                response_text = response.choices[0].message.content

            # Collect the response for each page
            combined_response += f"Page {page_number + 1}: {response_text}\n\n"

        return jsonify({"text": combined_response})

    return jsonify({"error": "Invalid file"}), 400

if __name__ == '__main__':
    app.run(debug=True)
In this code, we receive an API request with the PDF document and loop through each page. As seen here, we make a call to the API, asking the model whether any elements of the document should be reviewed by a legal team. Note that you can adjust this prompt depending on your exact use case. This application could simply act as an extra safety measure on top of an existing legal team, reducing the chance of human error when analyzing contracts.
for page_number, page in enumerate(pdfReader.pages):
    text = page.extract_text()
    if DEBUG_MODE:
        # In debug mode, return the first 10 characters of each page
        response_text = text[:10]
    else:
        # Construct the prompt for the model
        prompt = f"Review this legal document excerpt for any elements that seem legally unfair under US law \"{text}\""

        response = getCompletion(prompt=prompt)
        # Extract the response text from the completion
        response_text = response.choices[0].message.content
I added a simple DEBUG_MODE flag for testing the app without a model (it simply returns the first few characters of each page). After adding the HTML file, we can run the script and it will host our app locally at http://127.0.0.1:5000.
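You can also exercise the /upload endpoint directly, without the browser UI. For example, assuming a local contract.pdf and the requests package:
import requests

# POST a PDF to the locally running Flask app
with open("contract.pdf", "rb") as f:
    resp = requests.post("http://127.0.0.1:5000/upload", files={"file": f})
print(resp.json()["text"])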
After launching the script and visiting our localhost address, we will see our app!

All that's left to do is test our app! I uploaded a sample contract PDF to see what the model thinks. Here is what the model said for my sample landscaping contract, which contains an unfair clause on page 3.


Conclusion

The future of AI in the legal sector is promising, with potential for further development and refinement. As AI technology continues to evolve, it's exciting to think about the myriad ways it could be applied to enhance the efficiency and accessibility of legal services. I hope you learned something new in this tutorial, and as always, feel free to drop a comment if you have any questions about the project! You can find the full code used in this tutorial at the GitHub repo here. Stay tuned for more AI tutorials on W&B and, if you'd like to check out our OpenAI docs, we've got you covered.