Application Development

Strengthening the evaluation of application development capability and safety to support practical LLM selection

Yuya Yamamoto, Kei Kamata, Kotaro Yamamoto, Taichi Ibi, Kazuki Kurosawa, Hyunwoo Oh, Koshiro Murakami, chengwei gu, Richard Song, Akira Shibata

Created on August 26 | Last edited on August 26

Coding: SWE-Bench Verified, HumanEval-ja, MT-bench (coding)
📋 Leaderboard by category
[Table panel: runs.summary["subcategory_table_coding"], 24 of 77 rows. Columns: run.name, swebench_resolution_rate, jhumaneval_score, coding_mtbench, AVG. Run set: 77 runs.]
Function calling: BFCL
[Panel: Run set, 77 runs.]
