Fine-Tune Evaluation of LLMs on the Test Dataset
Running evaluations on the test dataset
Created on October 31|Last edited on April 26
Direct lineage view of the split_dataset artifact.
Llama2-7b-chat, pretrained (no fine-tuning)
prompt
[INST] <<SYS>>
You are an AI that converts human requests into API calls.
You have a set of functions:
-news(topic="[topic]") asks for latest headlines about a topic.
-math(question="[question]") asks a math question in python format.
-notes(action="add|list", note="[note]") lets a user take simple notes.
-openai(prompt="[prompt]") asks openai a question.
-runapp(program="[program]") runs a program locally.
-story(description="[description]") lets a user ask for a story.
-timecheck(location="[location]") asks for the time at a location. If no location is given it's assumed to be the current location.
-timer(duration="[duration]") sets a timer for duration written out as a string.
-weather(location="[location]") asks for the weather at a location. If there's no location string the location is assumed to be where the user is.
-other() should be used when none of the other commands apply.
Reply with the corresponding function call, be brief.
<</SYS>>
{user}[/INST]
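The template above can be assembled programmatically before generation. Below is a minimal sketch; `build_prompt` and `SYSTEM_PROMPT` are names of our own choosing (not from the report), and the system prompt is abbreviated to two of the functions listed above.

```python
# Sketch: wrap a user request in the Llama-2 chat template shown above.
# SYSTEM_PROMPT is abbreviated here; in practice it holds the full
# function list from the report.

SYSTEM_PROMPT = (
    "You are an AI that converts human requests into API calls.\n"
    "You have a set of functions:\n"
    '-weather(location="[location]") asks for the weather at a location.\n'
    "-other() should be used when none of the other commands apply.\n"
    "Reply with the corresponding function call, be brief."
)

def build_prompt(user: str) -> str:
    """Format a single-turn Llama-2 chat prompt with a system message."""
    return f"[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n{user}[/INST]"

print(build_prompt("what's the weather in Paris?"))
```

The resulting string is what gets fed to both the pretrained and fine-tuned models, so any drift from this exact token layout (e.g. missing `<</SYS>>`) degrades chat-model behavior.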
MistralAI-instruct (no fine-tuning)
Fine-tuning really helps
Llama 7b fine-tuned
We can make the case that, with enough data, the model becomes genuinely aligned to our task; creating more data might make this fine-tuned base model as good as the chat models.
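Since the expected output is a single function call, one simple way to score these runs on the test dataset is exact-match accuracy after light normalization. This is a sketch under our own assumptions; the helper names and example calls are illustrative, not taken from the report.

```python
# Sketch: exact-match accuracy for generated API calls.
# Normalization (strip whitespace, lowercase) forgives cosmetic
# differences but not semantic ones.

def normalize(call: str) -> str:
    """Remove all whitespace and lowercase a function-call string."""
    return "".join(call.split()).lower()

def exact_match_accuracy(preds: list[str], refs: list[str]) -> float:
    """Fraction of predictions that match the reference call exactly
    after normalization."""
    assert len(preds) == len(refs)
    hits = sum(normalize(p) == normalize(r) for p, r in zip(preds, refs))
    return hits / len(refs)

preds = ['weather(location="Paris")', 'timer(duration="five minutes")']
refs = ['weather(location="Paris")', 'timer(duration="5 minutes")']
print(exact_match_accuracy(preds, refs))  # 0.5: the second call differs
```

A stricter variant could parse the call into a function name and argument dict and compare those, which avoids penalizing argument reordering.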
MistralAI-instruct (fine-tuned)