
Evaluation Comparison Report - test

Comparing evaluations
Created on February 7 | Last edited on February 7

[Bar chart (first 10 bars): volcanic-violet-123, resplendent-dragon-181, twinkling-paper-175, red-horse-174, alight-noodles-173, lambent-festival-172, flashing-lamp-171, abundant-rat-170, crimson-noodles-169, fortuitous-paper-168; value axis 0.00-0.40]
meta
CLI arguments per run, numbered in report column order. All 11 runs share --model hf, --model_args pretrained=microsoft/phi-2,trust_remote_code=True, --device cuda:0, --wandb_args project=lm-eval-harness-integration, and --log_samples; they differ only in tasks, batch size, limit, and output path:
- Run 1: --tasks hellaswag,mmlu_abstract_algebra --batch_size 32 --limit 10 --output_path output/phi-2
- Runs 2-4 and 7-11: --tasks ai2_arc --batch_size 4 --limit 2 --output_path output/phi-2-mmlu-arc
- Run 5: --tasks mmlu,ai2_arc --batch_size 4 --limit 2 --output_path output/phi-2-mmlu-arc
- Run 6: --tasks arc_fr --batch_size 4 --limit 2 --output_path output/phi-2-mmlu-arc
{"remote":"https://github.com/ayulockin/lm-evaluation-harness","commit":"241e0f1129631335e2e346dd8d24f5174c689f87","__typename":"GitInfo"}
{"remote":"https://github.com/ayulockin/lm-evaluation-harness","commit":"9279b05e0639dbc43b2fa1c3c35a68e2b08216b9","__typename":"GitInfo"}
{"remote":"https://github.com/ayulockin/lm-evaluation-harness","commit":"89503de1916d2c807c75e23241f4b450e22ed671","__typename":"GitInfo"}
{"remote":"https://github.com/ayulockin/lm-evaluation-harness","commit":"89503de1916d2c807c75e23241f4b450e22ed671","__typename":"GitInfo"}
{"remote":"https://github.com/ayulockin/lm-evaluation-harness","commit":"89503de1916d2c807c75e23241f4b450e22ed671","__typename":"GitInfo"}
{"remote":"https://github.com/ayulockin/lm-evaluation-harness","commit":"89503de1916d2c807c75e23241f4b450e22ed671","__typename":"GitInfo"}
{"remote":"https://github.com/ayulockin/lm-evaluation-harness","commit":"06b22f17a1b85b0f9d076b5cf5b75e452be0ba1c","__typename":"GitInfo"}
{"remote":"https://github.com/ayulockin/lm-evaluation-harness","commit":"06b22f17a1b85b0f9d076b5cf5b75e452be0ba1c","__typename":"GitInfo"}
{"remote":"https://github.com/ayulockin/lm-evaluation-harness","commit":"a8094500ec842cc467bd18f74c546495651cabbc","__typename":"GitInfo"}
{"remote":"https://github.com/ayulockin/lm-evaluation-harness","commit":"a8094500ec842cc467bd18f74c546495651cabbc","__typename":"GitInfo"}
{"remote":"https://github.com/ayulockin/lm-evaluation-harness","commit":"a8094500ec842cc467bd18f74c546495651cabbc","__typename":"GitInfo"}
Runtime per run (Runs 1-11): 11s, 25s, 9s, 12s, 2h 52m 19s, 4s, 5s, 4s, 5s, 4s, 6s
config
cli_configs
- batch_size: 32 (Run 1); 4 (Runs 2-11)
- limit: 10 (Run 1); 2 (Runs 2-11)
task_configs
arc_challenge
Logged by every run except Runs 1 and 6, which did not evaluate ai2_arc; the config is identical across the remaining runs:
- metadata: version 1
- dataset: ARC-Challenge (allenai/ai2_arc)
- answer choices: {{choices.text}}
- prompt: Question: {{question}} Answer:
- target: {{choices.label.index(answerKey)}}
- decontamination query: Question: {{question}} Answer:
- task group: ["ai2_arc"]
arc_easy
metadata