A-sh0ts's workspace
Runs
35
| Name | State | Notes | User | Tags | Created | Runtime | Sweep | CHAT_MODEL_NAME | DEBUG | EVAL_ARTIFACT | EVAL_MODEL_NAME | chat_prompt_artifact | faiss_artifact | human_template | hyde_prompt_artifact | max_retries | model_name | retry_delay | system_template | average_string_distance | chat_accuracy | human_accuracy_score | model_accuracy_score | retrieval_accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | Finished | - | darek |  |  | 15s | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 0.74249 | 0.45923 | - |
|  | Failed | - | darek |  |  | 18s | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
|  | Finished | - | darek |  |  | 18s | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
|  | Failed | - | darek |  |  | 14s | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
|  | Finished | - | darek |  |  | 13s | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
|  | Finished | - | darek |  |  | 13s | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
|  | Finished | - | darek |  |  | 13s | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
|  | Failed | - | darek |  |  | 8s | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
|  | Failed | - | darek |  |  | 8s | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
|  | Crashed | - | megatruong |  |  | 7s | - | - | - | - | - | parambharat/wandb_docs_bot/system_prompt:latest | parambharat/wandb_docs_bot/faiss_store:latest | - | parambharat/wandb_docs_bot/hyde_prompt:latest | - | gpt-4 | - | - | - | - | - | - | - |
|  | Crashed | - | megatruong |  |  | 1d 11h 13m 44s | - | - | - | - | - | parambharat/wandb_docs_bot/system_prompt:latest | parambharat/wandb_docs_bot/faiss_store:latest | - | parambharat/wandb_docs_bot/hyde_prompt:latest | - | gpt-4 | - | - | - | - | - | - | - |
|  | Crashed | - | megatruong |  |  | 1d 8h 36m 44s | - | - | - | - | - | parambharat/wandb_docs_bot/system_prompt:latest | parambharat/wandb_docs_bot/faiss_store:latest | - | parambharat/wandb_docs_bot/hyde_prompt:latest | - | gpt-4 | - | - | - | - | - | - | - |
|  | Finished | - | darek |  |  | 10s | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
|  | Finished | - | darek |  |  | 1m 51s | - | gpt-3.5-turbo | true | wandbot/wandbbot/eval_dataset:v0 | gpt-3.5-turbo | parambharat/wandb_docs_bot/system_prompt:latest | parambharat/wandb_docs_bot/faiss_store:latest | evaluator prompt (full text below the table) | parambharat/wandb_docs_bot/hyde_prompt:latest | 3 | gpt-4 | 10 | You are a helpful assistant. | 45 | 0 | - | - | 1 |
|  | Finished | - | darek |  |  | 3m 14s | - | gpt-3.5-turbo | true | wandbot/wandbbot/eval_dataset:v0 | gpt-3.5-turbo | parambharat/wandb_docs_bot/system_prompt:latest | parambharat/wandb_docs_bot/faiss_store:latest | evaluator prompt (full text below the table) | parambharat/wandb_docs_bot/hyde_prompt:latest | 3 | gpt-4 | 10 | You are a helpful assistant. | 47.5 | 0 | - | - | 0.5 |
|  | Failed | - | darek |  |  | 1m 18s | - | gpt-3.5-turbo | true | wandbot/wandbbot/eval_dataset:v0 | gpt-3.5-turbo | parambharat/wandb_docs_bot/system_prompt:latest | parambharat/wandb_docs_bot/faiss_store:latest | evaluator prompt (full text below the table) | parambharat/wandb_docs_bot/hyde_prompt:latest | 3 | gpt-4 | 10 | You are a helpful assistant. | 78.33333 | - | - | - | 0.66667 |
|  | Failed | - | darek |  |  | 1m 35s | - | gpt-3.5-turbo | true | wandbot/wandbbot/eval_dataset:v0 | gpt-3.5-turbo | parambharat/wandb_docs_bot/system_prompt:latest | parambharat/wandb_docs_bot/faiss_store:latest | evaluator prompt (full text below the table) | parambharat/wandb_docs_bot/hyde_prompt:latest | 3 | gpt-4 | 10 | You are a helpful assistant. | 46.33333 | - | - | - | 0.66667 |
|  | Failed | - | darek |  |  | 1m 42s | - | gpt-3.5-turbo | true | wandbot/wandbbot/eval_dataset:v0 | gpt-3.5-turbo | parambharat/wandb_docs_bot/system_prompt:latest | parambharat/wandb_docs_bot/faiss_store:latest | evaluator prompt (full text below the table) | parambharat/wandb_docs_bot/hyde_prompt:latest | 3 | gpt-4 | 10 | You are a helpful assistant. | 82 | - | - | - | 1 |
|  | Failed | - | darek |  |  | 1m 41s | - | gpt-3.5-turbo | true | wandbot/wandbbot/eval_dataset:v0 | gpt-3.5-turbo | parambharat/wandb_docs_bot/system_prompt:latest | parambharat/wandb_docs_bot/faiss_store:latest | evaluator prompt (full text below the table) | parambharat/wandb_docs_bot/hyde_prompt:latest | 3 | gpt-4 | 10 | You are a helpful assistant. | 78.33333 | 0 | - | - | 1 |
|  | Killed | - | darek |  |  | 1m 48s | - | gpt-3.5-turbo | false | wandbot/wandbbot/eval_dataset:v0 | gpt-3.5-turbo | parambharat/wandb_docs_bot/system_prompt:latest | parambharat/wandb_docs_bot/faiss_store:latest | evaluator prompt (full text below the table) | parambharat/wandb_docs_bot/hyde_prompt:latest | 3 | gpt-4 | 10 | You are a helpful assistant. | - | - | - | - | - |

human_template (identical in every run that sets it):

You are an evaluator for the W&B chatbot.
You are given a question, the chatbot's answer, and the original answer, and are asked to score the chatbot's answer as either CORRECT or INCORRECT.
Note that sometimes, the original answer is not the best answer, and sometimes the chatbot's answer is not the best answer.
You are evaluating the chatbot's answer only.
Example Format:
QUESTION: question here
CHATBOT ANSWER: student's answer here
ORIGINAL ANSWER: original answer here
GRADE: CORRECT or INCORRECT here
Please remember to grade them based on being factually accurate. Begin!
QUESTION: {query}
CHATBOT ANSWER: {result}
ORIGINAL ANSWER: {answer}
GRADE:
1-20 of 35
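
The human_template value logged in these runs is a grading prompt with three placeholders: {query}, {result}, and {answer}. As a minimal sketch of how such a template might be filled in and how a grader reply might be read back, the Python below uses plain `str.format`. The function names `build_eval_prompt` and `parse_grade` are hypothetical, and the parsing rule (the reply starts with CORRECT or INCORRECT) is an assumption; the runs do not show how the grade was actually extracted or which library made the model call.

```python
# The template text is copied from the human_template config value in the runs.
HUMAN_TEMPLATE = """You are an evaluator for the W&B chatbot.
You are given a question, the chatbot's answer, and the original answer, and are asked to score the chatbot's answer as either CORRECT or INCORRECT.
Note that sometimes, the original answer is not the best answer, and sometimes the chatbot's answer is not the best answer.
You are evaluating the chatbot's answer only.
Example Format:
QUESTION: question here
CHATBOT ANSWER: student's answer here
ORIGINAL ANSWER: original answer here
GRADE: CORRECT or INCORRECT here
Please remember to grade them based on being factually accurate. Begin!
QUESTION: {query}
CHATBOT ANSWER: {result}
ORIGINAL ANSWER: {answer}
GRADE:"""


def build_eval_prompt(query: str, result: str, answer: str) -> str:
    """Fill the three placeholders of the grading template (hypothetical helper)."""
    return HUMAN_TEMPLATE.format(query=query, result=result, answer=answer)


def parse_grade(reply: str):
    """Map a grader reply to True, False, or None if it is unrecognized.

    Assumption: the reply begins with CORRECT or INCORRECT. INCORRECT is
    checked first so it is not mistaken for a different label.
    """
    text = reply.strip().upper()
    if text.startswith("INCORRECT"):
        return False
    if text.startswith("CORRECT"):
        return True
    return None


if __name__ == "__main__":
    prompt = build_eval_prompt(
        query="How do I log a metric?",
        result="Use wandb.log.",
        answer="Call wandb.log with a dictionary of metrics.",
    )
    print(prompt)
    print(parse_grade(" correct"))
```

Sending the prompt to the model named in EVAL_MODEL_NAME and logging the resulting scores is left out here, since the table does not record how that step was done.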