Evaluations
Score summary
General
Metric   Value    Change
Cost     $5.39    -$5.25
Tokens   524.48K  -510.48K
Latency  16m40s   -1m56s

model_output
Metric                                                          Value  Change
prompt_tokens.mean                                              0      +0
time_taken.mean                                                 0      +0
api_call_statuses.embedding_api_success.true_count              98     -94
api_call_statuses.embedding_api_success.true_fraction           1      +0
api_call_statuses.chat_success.true_count                       98     -94
api_call_statuses.chat_success.true_fraction                    1      +0.02
api_call_statuses.web_search_success.true_count                 0      +0
api_call_statuses.web_search_success.true_fraction              0      +0
api_call_statuses.chat_error_info.has_error.true_count          0      -4
api_call_statuses.chat_error_info.has_error.true_fraction       0      -0.02
api_call_statuses.reranker_api_success.true_count               98     -94
api_call_statuses.reranker_api_success.true_fraction            1      +0
api_call_statuses.query_enhancer_llm_api_success.true_count     98     -94
api_call_statuses.query_enhancer_llm_api_success.true_fraction  1      +0
total_tokens.mean                                               0      +0
completion_tokens.mean                                          0      +0
has_error.true_count                                            0      +0
has_error.true_fraction                                         0      +0

WandbotCorrectnessScorer
Metric                        Value  Change
answer_correct.true_count     75     -59
answer_correct.true_fraction  0.77   +0.08
score.mean                    2.65   +0.13
has_error.true_count          0      +0
has_error.true_fraction       0      +0

model_latency
Metric  Value  Change
mean    94.12  -277.98

NewWeaveBiasScorer
Metric                              Value  Change
metadata.gender_bias_score.mean     0      +0
metadata.racial_bias.true_count     3      +0
metadata.racial_bias.true_fraction  0.33   +0
metadata.racial_bias_score.mean     0.29   +0
metadata.gender_bias.true_count     0      +0
metadata.gender_bias.true_fraction  0      +0
passed.true_count                   6      +0
passed.true_fraction                0.67   +0

NewWeaveContextRelevanceScorer
Metric                Value  Change
metadata.score.mean   0.43   +0
passed.true_count     3      +0
passed.true_fraction  0.33   +0
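
The true_count and true_fraction pairs in the score summary look like a simple aggregation of per-example boolean results: the number of true values, and that number divided by the number of examples scored. The sketch below illustrates that relationship only. The function name summarize_bools is hypothetical, the sample size of 98 is an assumption inferred from the API success counts, and this is not taken from the dashboard's actual implementation.

```python
def summarize_bools(values):
    """Return the count and fraction of truthy entries in a list of results."""
    values = list(values)
    if not values:
        # No examples scored: report zeros rather than dividing by zero.
        return {"true_count": 0, "true_fraction": 0.0}
    true_count = sum(1 for v in values if v)
    return {"true_count": true_count, "true_fraction": true_count / len(values)}


# Hypothetical data: 75 correct answers out of an assumed 98 examples.
results = [True] * 75 + [False] * 23
summary = summarize_bools(results)
print(summary["true_count"], round(summary["true_fraction"], 2))  # 75 0.77
```

Under the same assumption, 3 true values out of 9 examples round to 0.33 and 6 out of 9 round to 0.67, which is consistent with the bias and context relevance rows if those scorers ran on 9 examples.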