Weave: mtbench_leaderboard_table (25/07/01 09:11:26)
Created on July 1 | Last edited on July 1
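runs.summary["leaderboard_table"]
Showing 3 of 99 rows (run set: 99)

| # | model_name | model_size_category | 汎用的言語性能(GLP)_AVG (general language performance) | アラインメント(ALT)_AVG (alignment) | TOTAL_AVG | model_size | model_release_date |
|---|---|---|---|---|---|---|---|
| 16 | o3-2025-04-16 | api | 0.8064 | 0.8578 | 0.8321 | - | 2025-04-16 00:00:00 |
| 6 | anthropic/claude-sonnet-4 | api | 0.7712 | 0.8797 | 0.8255 | - | 2025-05-23 00:00:00 |
| 40 | o1-2024-12-17 | api | 0.7971 | 0.8518 | 0.8244 | - | 2024-12-17 00:00:00 |

Additional columns in the table (values not captured in this view):
GLP_表現 (expression)
GLP_翻訳 (translation)
GLP_情報検索 (information retrieval)
GLP_推論 (reasoning)
GLP_数学的推論 (mathematical reasoning)
GLP_抽出 (extraction)
GLP_知識・質問応答 (knowledge and question answering)
GLP_英語 (English)
GLP_意味解析 (semantic analysis)
GLP_構文解析 (syntactic analysis)
ALT_制御性 (controllability)
ALT_倫理・道徳 (ethics and morality)
ALT_毒性 (toxicity)
ALT_バイアス (bias)
ALT_堅牢性 (robustness)
ALT_真実性 (truthfulness)
AVG_jaster_0shot
AVG_jaster_2shots
AVG_mtbench
AVG_lctg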
https://wandb.ai/wandb-japan/llm-leaderboard3/reports/Weave-mtbench_leaderboard_table-25-07-01-09-11-26---VmlldzoxMzQwOTI0MQ