Evaluation Comparison Report - test

Comparing evaluations
Created on February 8 | Last edited on February 8

[Chart: hellaswag/acc and hellaswag/acc_norm_stderr for runs volcanic-violet-123 and super-donkey-122; axis ticks 0.00, 0.10, 0.20, 0.30, 0.40]
Run set
1