Weave: mtbench_leaderboard_table (25/07/01 02:57:37)
Yuya Yamamoto
Created on June 30 | Last edited on June 30
runs.summary["leaderboard_table"]
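| # | model_name | model_size_category | General language performance (GLP)_AVG | Alignment (ALT)_AVG | TOTAL_AVG | model_size | model_release_date |
|---|---|---|---|---|---|---|---|
| 16 | o3-2025-04-16 | api | 0.8064 | 0.8578 | 0.8321 | - | 2025-04-16 00:00:00 |
| 6 | anthropic/claude-sonnet-4 | api | 0.7712 | 0.8797 | 0.8255 | - | 2025-05-23 00:00:00 |
| 40 | o1-2024-12-17 | api | 0.7971 | 0.8518 | 0.8244 | - | 2024-12-17 00:00:00 |

Original column names for the two category averages: 汎用的言語性能(GLP)_AVG and アラインメント(ALT)_AVG.

Further columns (values not shown):
GLP_expression (GLP_表現)
GLP_translation (GLP_翻訳)
GLP_information retrieval (GLP_情報検索)
GLP_reasoning (GLP_推論)
GLP_mathematical reasoning (GLP_数学的推論)
GLP_extraction (GLP_抽出)
GLP_knowledge and question answering (GLP_知識・質問応答)
GLP_English (GLP_英語)
GLP_semantic analysis (GLP_意味解析)
GLP_syntactic analysis (GLP_構文解析)
ALT_controllability (ALT_制御性)
ALT_ethics and morality (ALT_倫理・道徳)
ALT_toxicity (ALT_毒性)
ALT_bias (ALT_バイアス)
ALT_robustness (ALT_堅牢性)
ALT_truthfulness (ALT_真実性)
AVG_jaster_0shot
AVG_jaster_2shots
AVG_mtbench
AVG_lctg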
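The page does not say how TOTAL_AVG is derived. As a rough sanity check, the sketch below tests one assumption: that TOTAL_AVG is the unweighted mean of the GLP and ALT averages. The three visible rows are consistent with this within four-decimal rounding, but that is an observation about these rows, not a documented rule. The dictionary keys (`glp_avg`, `alt_avg`, `total_avg`) and the helper names are hypothetical, chosen for illustration only.

```python
# Values copied from the three visible leaderboard rows.
# Key names are illustrative, not the table's actual column identifiers.
rows = [
    {"model_name": "o3-2025-04-16", "glp_avg": 0.8064, "alt_avg": 0.8578, "total_avg": 0.8321},
    {"model_name": "anthropic/claude-sonnet-4", "glp_avg": 0.7712, "alt_avg": 0.8797, "total_avg": 0.8255},
    {"model_name": "o1-2024-12-17", "glp_avg": 0.7971, "alt_avg": 0.8518, "total_avg": 0.8244},
]


def mean_of_categories(row):
    """Unweighted mean of the two category averages (an assumption, see above)."""
    return (row["glp_avg"] + row["alt_avg"]) / 2


def max_deviation(rows):
    """Largest absolute gap between the assumed mean and the reported TOTAL_AVG."""
    return max(abs(mean_of_categories(r) - r["total_avg"]) for r in rows)


def rank_by_total(rows):
    """Model names ordered by reported TOTAL_AVG, highest first."""
    ordered = sorted(rows, key=lambda r: r["total_avg"], reverse=True)
    return [r["model_name"] for r in ordered]


if __name__ == "__main__":
    print("max deviation from assumed mean:", max_deviation(rows))
    print("ranking:", rank_by_total(rows))
```

A deviation below 1e-4 is what rounding of four-decimal inputs alone could produce, so a larger gap on other rows would suggest the total is computed differently.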
Run set: 99