
Baseline Report Template

2023-04-18 06:04:03
Created on April 18|Last edited on April 18


Leaderboard and Summary


[Leaderboard table: config and summary columns for run v4mn20ec, model artifact wandb/shawn_rl_r1repro/run-v4mn20ec-model:v0. Cell values are not recoverable.]
Run set
304


Data Lineage




Create your own custom eval!

1. Download the registry artifact

wandb artifact get wandb/jobs/openai_evals_registry:latest --root openai_evals_registry

2. Set up custom eval

  1. Modify registry/modelgraded/custom.yaml (left panel below)

  2. Modify registry/data/custom/samples_labeled.jsonl (right panel below). Each row should be a JSON object containing at least these keys:

    1. input: the query submitted to your LLM
    2. completion: the LLM's response
    3. choice: the correct choice among the options
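
As a minimal sketch of the row format described above, the following Python writes a samples_labeled.jsonl file and checks that every row carries the three required keys. The helper names (validate_row, write_samples, read_samples) and the example row, including the "Y" value used for choice, are assumptions for illustration; the valid choice values depend on what your custom.yaml defines.

```python
import json
from pathlib import Path

# Keys every row must contain, per the step above.
REQUIRED_KEYS = {"input", "completion", "choice"}


def validate_row(row: dict) -> None:
    """Raise ValueError if a row lacks any of the required keys."""
    missing = REQUIRED_KEYS - row.keys()
    if missing:
        raise ValueError(f"row is missing required keys: {sorted(missing)}")


def write_samples(rows, path) -> int:
    """Validate rows and write them as JSON Lines (one object per line)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            validate_row(row)
            f.write(json.dumps(row) + "\n")
            count += 1
    return count


def read_samples(path):
    """Read a JSON Lines file back, validating each non-empty line."""
    rows = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                row = json.loads(line)
                validate_row(row)
                rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical example row; replace with your own labeled data.
    example = [
        {
            "input": "What is the capital of France?",
            "completion": "Paris",
            "choice": "Y",  # assumed label; must match your eval's options
        }
    ]
    write_samples(example, "registry/data/custom/samples_labeled.jsonl")
```

Running the script from inside the downloaded openai_evals_registry directory would overwrite the sample file in place, so keep a copy of the original if you want to compare against it.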


3. Upload your modified registry artifact

wandb artifact put openai_evals_registry

4. Launch job using new registry

{
  "run_config": {
    "eval": "custom-meta",
    "model": "wandb/jobs/openai_evals_model:v0",
    "registry": "your_entity/your_project/openai_evals_registry:latest",
    "oaieval_settings": {
      "max_samples": 10
    }
  }
}
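
If you prefer to generate the launch config rather than edit it by hand, a small Python sketch like the one below may help. The function build_launch_config is a hypothetical helper, not part of wandb; it only reproduces the JSON shown above and checks that the registry string looks like entity/project/name:alias, which is an assumption about the expected format based on the example value.

```python
import json


def build_launch_config(
    registry: str,
    eval_name: str = "custom-meta",
    model: str = "wandb/jobs/openai_evals_model:v0",
    max_samples: int = 10,
) -> dict:
    """Build the run_config dict shown above (hypothetical helper)."""
    # Assumed format: entity/project/artifact_name:alias
    parts = registry.split("/")
    if len(parts) != 3 or not all(parts) or ":" not in parts[2]:
        raise ValueError(
            "registry should look like 'entity/project/name:alias', "
            f"got {registry!r}"
        )
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    return {
        "run_config": {
            "eval": eval_name,
            "model": model,
            "registry": registry,
            "oaieval_settings": {"max_samples": max_samples},
        }
    }


if __name__ == "__main__":
    config = build_launch_config(
        "your_entity/your_project/openai_evals_registry:latest"
    )
    # Print JSON that can be pasted into the launch job configuration.
    print(json.dumps(config, indent=2))
```

Replace your_entity and your_project with the entity and project where you uploaded the modified registry in step 3.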
