Weave: leaderboard_table (24/12/02 11:58:51)

Created on December 2 | Last edited on December 2

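| # | model_name | model_size_category | TOTAL_AVG | 범용적언어성능(GLP)_AVG (general language performance) | Alignment(ALT)_AVG | GLP_표현 (expression) | GLP_번역 (translation) | GLP_정보검색 (information retrieval) |
|---|---|---|---|---|---|---|---|---|
| 28 | o1-2024-12-17 | api | 0.8505 | 0.8592 | 0.8418 | 0.865 | 0.9088 | 0.7852 |
| 26 | o1-preview-2024-09-12 | api | 0.8474 | 0.8533 | 0.8416 | 0.85 | 0.9124 | 0.7998 |
| 17 | o3-2025-04-16 | api | 0.8451 | 0.8649 | 0.8253 | 0.8967 | 0.9128 | 0.7454 |
| 21 | gpt-4.5-preview-2025-02-27 | api | 0.8342 | 0.8396 | 0.8288 | 0.8783 | 0.9157 | 0.7929 |
| 35 | gpt-4o-2024-11-20 | api | 0.8299 | 0.8174 | 0.8425 | 0.8667 | 0.9137 | 0.8045 |
| 18 | gpt-4.1-2025-04-14 | api | 0.8282 | 0.8284 | 0.828 | 0.87 | 0.9118 | 0.8235 |

Further columns (values not shown in this view):

GLP_추론 (reasoning)
GLP_수학적추론 (mathematical reasoning)
GLP_추출 (extraction)
GLP_지식・질의응답 (knowledge and question answering)
GLP_영어 (English)
GLP_의미해석 (semantic analysis)
GLP_구문해석 (syntactic analysis)
ALT_제어성 (controllability)
ALT_윤리・도덕 (ethics and morality)
ALT_독성 (toxicity)
ALT_사회적편견 (social bias)
ALT_모델강건성 (model robustness)
ALT_진실성 (truthfulness)
AVG_kaster_0shot
AVG_kaster_2shots
AVG_mtbench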
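The first three scores of each row above appear to be related: in every row, the TOTAL_AVG value matches the mean of the GLP and ALT averages to within display rounding. The page does not state how TOTAL_AVG is computed, so the sketch below only checks that assumption against the six listed rows. The function names and the tolerance are illustrative choices, not part of the leaderboard.

```python
# Scores copied from the rows above, as (TOTAL_AVG, GLP_AVG, ALT_AVG).
rows = [
    (0.8505, 0.8592, 0.8418),
    (0.8474, 0.8533, 0.8416),
    (0.8451, 0.8649, 0.8253),
    (0.8342, 0.8396, 0.8288),
    (0.8299, 0.8174, 0.8425),
    (0.8282, 0.8284, 0.8280),
]

# Displayed values are rounded to four decimal places, so allow for
# rounding error in both the inputs and the reported total.
TOLERANCE = 1e-4


def mean_of_categories(glp_avg, alt_avg):
    """Assumed aggregation: unweighted mean of the two category averages."""
    return (glp_avg + alt_avg) / 2


def max_deviation(score_rows):
    """Largest gap between a reported TOTAL_AVG and the assumed mean."""
    return max(
        abs(total - mean_of_categories(glp, alt))
        for total, glp, alt in score_rows
    )


if __name__ == "__main__":
    deviation = max_deviation(rows)
    print(f"max deviation: {deviation:.6f}")
    print("consistent with assumption:", deviation < TOLERANCE)
```

A check like this only shows that the listed rows are consistent with an unweighted mean. It does not confirm that the leaderboard computes TOTAL_AVG this way.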