Evaluation Comparison Report - test
Comparing evaluations
Created on February 8 | Last edited on February 8
[Figure: 12 chart panels, each plotting hellaswag/acc and hellaswag/acc_norm_stderr]
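The panels pair an accuracy metric with a standard-error metric. The sketch below shows how such a pair can be derived from per-example 0/1 correctness scores. It assumes the reported standard error is the standard error of the mean (sample standard deviation with an n - 1 denominator, divided by sqrt(n)). The function name `accuracy_and_stderr` and the toy scores are hypothetical and are not taken from this report or from the evaluation harness's code.

```python
import math
from statistics import mean, stdev


def accuracy_and_stderr(correct):
    """Return (mean accuracy, standard error of the mean) for 0/1 scores.

    Assumption: the standard error is the sample standard deviation
    (n - 1 denominator) divided by sqrt(n).
    """
    n = len(correct)
    if n < 2:
        raise ValueError("need at least two examples to estimate a standard error")
    acc = mean(correct)
    # statistics.stdev uses the sample (n - 1) denominator
    stderr = stdev(correct) / math.sqrt(n)
    return acc, stderr


# Hypothetical per-example correctness flags, not results from this report
scores = [1, 0, 1, 1, 0, 1, 1, 0]
acc, se = accuracy_and_stderr(scores)
print(f"acc={acc:.3f} stderr={se:.3f}")
```

The same function applies to a normalized-accuracy variant such as acc_norm: pass it that metric's per-example scores instead.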
Run sets: 2 (1 run each)
Created with ❤️ on Weights & Biases.
https://wandb.ai/ayush-thakur/lm-eval-harness-integration/reports/Evaluation-Comparison-Report-test--Vmlldzo2NzY3ODky