OpenAI Evals Profiling Report: gpt-3.5-turbo: emotional-intelligence

2023-06-08 16:29:05
Created on June 8 | Last edited on June 8


Leaderboard and Summary


[Leaderboard table panel: columns grouped under "config" and "summary"; all rows reference run v4mn20ec and model artifact wandb/shawn_rl_r1repro/run-v4mn20ec-model:v0. Column labels for the individual cells did not survive extraction. Run set: 304]


Data Lineage




Create your own custom eval!

1. Download the registry artifact

wandb artifact get wandb/jobs/openai_evals_registry:latest --root openai_evals_registry

2. Set up custom eval

  1. Modify registry/modelgraded/custom.yaml (left panel below)

  2. Modify registry/data/custom/samples_labeled.jsonl (right panel below). Each row should be a JSON object that contains at least these keys:

    1. input: the query submitted to your LLM
    2. completion: the LLM's response
    3. choice: the correct choice among the options
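
Before uploading, it can help to confirm that every row in samples_labeled.jsonl carries the three keys listed above. The following is a minimal sketch using only the Python standard library; the helper name validate_samples and the example rows are hypothetical and not part of the registry or of OpenAI Evals.

```python
import json

# Keys the report says every row must contain at minimum.
REQUIRED_KEYS = {"input", "completion", "choice"}


def validate_samples(lines):
    """Return (line_number, missing_keys) pairs for rows lacking required keys.

    `lines` is any iterable of JSONL strings, e.g. an open file handle.
    Blank lines are skipped; line numbers are 1-based.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        missing = REQUIRED_KEYS - set(row)
        if missing:
            problems.append((number, sorted(missing)))
    return problems


# Hypothetical rows for illustration only; real rows come from your own data.
example_rows = [
    {"input": "How are you?", "completion": "I am fine.", "choice": "A"},
    {"input": "How are you?", "completion": "I am fine."},  # no "choice"
]
example_lines = [json.dumps(row) for row in example_rows]
problems = validate_samples(example_lines)
print(problems)
```

To check a real file, pass an open handle instead of the example list, for instance `validate_samples(open("registry/data/custom/samples_labeled.jsonl"))`.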


3. Upload your modified registry artifact

wandb artifact put openai_evals_registry

4. Launch job using new registry

{
  "run_config": {
    "eval": "custom-meta",
    "model": "wandb/jobs/openai_evals_model:v0",
    "registry": "your_entity/your_project/openai_evals_registry:latest",
    "oaieval_settings": {
      "max_samples": 10
    }
  }
}
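
If you launch several evals against different registries, generating this configuration from a script avoids hand-editing the JSON. The sketch below assumes nothing beyond the fields shown above; build_run_config is a hypothetical helper, and how the resulting JSON is submitted to the job is not covered by this report.

```python
import json


def build_run_config(registry, eval_name="custom-meta", max_samples=10):
    """Build the launch configuration shown above as a Python dict.

    `registry` is the full artifact path of your uploaded registry.
    The model path is copied verbatim from the report.
    """
    return {
        "run_config": {
            "eval": eval_name,
            "model": "wandb/jobs/openai_evals_model:v0",
            "registry": registry,
            "oaieval_settings": {"max_samples": max_samples},
        }
    }


# Placeholder entity and project, as in the report; replace with your own.
config = build_run_config("your_entity/your_project/openai_evals_registry:latest")
config_json = json.dumps(config, indent=2)
print(config_json)
```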