OpenAI Evals Profiling Report: gpt-3.5-turbo: aba_mrpc_true_false

2023-04-20 03:08:39
Created on April 20|Last edited on April 20
﻿
Leaderboard and Summary

Config and summary fields (diff only; 4 meta fields collapsed):

Run                    artifact_path                                 model     step_by_step  _wandb.runtime
robust-resonance-319   wandb/shawn_rl_r1repro/run-v4mn20ec-model:v0  v4mn20ec  false         559
eager-star-318         wandb/shawn_rl_r1repro/run-v4mn20ec-model:v0  v4mn20ec  false         481
firm-wind-315          wandb/shawn_rl_r1repro/run-v4mn20ec-model:v0  v4mn20ec  true          481
dandy-violet-314       wandb/shawn_rl_r1repro/run-v4mn20ec-model:v0  v4mn20ec  true          481
peachy-dragon-313      wandb/shawn_rl_r1repro/run-v4mn20ec-model:v0  v4mn20ec  true          481
skilled-night-312      -                                             -         -             481
firm-hill-309          -                                             -         -             482
snowy-resonance-308    -                                             -         -             482
serene-dragon-307      -                                             -         -             484
generous-sound-306     -                                             -         -             241

Mean of accuracy: 0.4764, 0.02614
Run set: 304
﻿
Data Lineage﻿
﻿
Create your own custom eval!
1. Download the registry artifact
    wandb artifact get wandb/jobs/openai_evals_registry:latest --root openai_evals_registry
2. Set up custom eval
Modify registry/modelgraded/custom.yaml (left panel below).
Modify registry/data/custom/samples_labeled.jsonl (right panel below). Each row should be a JSON object that contains at least these keys:
input: the query submitted to your LLM
completion: the LLM's response
choice: the correct choice among the options
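The row format described above can be sketched in Python. This is only a minimal sketch: the helper names `validate_row` and `write_samples`, the example row, and its choice value "Y" are hypothetical and chosen for illustration. Only the three key names and the file path come from the steps above.

```python
import json
from pathlib import Path

# The three keys every row must contain, per the instructions above.
REQUIRED_KEYS = {"input", "completion", "choice"}


def validate_row(row: dict) -> None:
    """Raise ValueError if a row lacks any of the required keys."""
    missing = REQUIRED_KEYS - row.keys()
    if missing:
        raise ValueError(f"row is missing keys: {sorted(missing)}")


def write_samples(rows, path) -> Path:
    """Validate all rows, then write them as JSON Lines (one object per line)."""
    rows = list(rows)
    for row in rows:
        validate_row(row)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


if __name__ == "__main__":
    # Hypothetical example row; replace with your own labeled samples.
    example_rows = [
        {
            "input": "Are these two sentences paraphrases of each other? ...",
            "completion": "Yes, both sentences describe the same event.",
            "choice": "Y",
        }
    ]
    write_samples(example_rows, "registry/data/custom/samples_labeled.jsonl")
```

Rows may carry extra keys beyond these three; the check only enforces the minimum stated above.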
﻿
3. Upload your modified registry artifact
    wandb artifact put openai_evals_registry
4. Launch job using new registry
{
    "run_config": {
        "eval": "custom-meta",
        "model": "wandb/jobs/openai_evals_model:v0",
        "registry": "your_entity/your_project/openai_evals_registry:latest",
        "oaieval_settings": {
            "max_samples": 10
        }
    }
}
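If you launch several jobs against different registries, the config above can be generated rather than edited by hand. The following is a rough sketch: the function name `build_launch_config` and its validation rules (three slash-separated parts in the registry path, a positive `max_samples`) are assumptions for illustration, not part of any W&B or OpenAI Evals API. How the resulting JSON is submitted is not covered here.

```python
import json


def build_launch_config(
    registry: str,
    eval_name: str = "custom-meta",
    model: str = "wandb/jobs/openai_evals_model:v0",
    max_samples: int = 10,
) -> dict:
    """Return a dict with the same shape as the launch config shown above."""
    # Assumption: the registry is addressed as entity/project/name:alias,
    # mirroring "your_entity/your_project/openai_evals_registry:latest".
    if registry.count("/") != 2:
        raise ValueError("expected registry as 'entity/project/name:alias'")
    if max_samples < 1:
        raise ValueError("max_samples must be a positive integer")
    return {
        "run_config": {
            "eval": eval_name,
            "model": model,
            "registry": registry,
            "oaieval_settings": {"max_samples": max_samples},
        }
    }


if __name__ == "__main__":
    config = build_launch_config(
        "your_entity/your_project/openai_evals_registry:latest"
    )
    # Print the JSON so it can be pasted into the job's config.
    print(json.dumps(config, indent=4))
```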
﻿