Nejumi LLM Leaderboard 4
Enhancing Evaluation of Application Development Capabilities and AI Safety to Support Practical LLM Selection

With Nejumi Leaderboard 4, we set out to raise the resolution of evaluation in response to the growing saturation of existing benchmarks.
📄 For those who believe that “evaluating models is important, but evaluating generative AI applications is even more critical,” here is our latest white paper.
Evaluation Taxonomy
Main Leaderboard
Models are sorted by total score (the number on the left is the evaluation job ID, not a rank).
Breakdown of Each Model’s Characteristics by Category
In the model list below, use the 👁️ icon to select which models are displayed.