OpenAI Evals Profiling Report: gpt-3.5-turbo: manga-translation-bubble

2023-04-24 20:41:06
Created on April 24 | Last edited on April 24

Leaderboard and Summary

| Key | daily-waterfall-33 | mythical-fleet-32 | forgotten-cruiser-31 | elegant-shuttle-30 | carbonite-council-29 | grievous-wookie-28 | carbonite-master-27 | dark-dew-26 | helpful-yogurt-25 | deep-sea-11 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| config | | | | | | | | | | |
| model | | | | | | | | | | |
| name | gpt-3.5-turbo | - | - | - | - | - | - | - | - | - |
| override_prompt | Given a text representing speech of manga in Japanese, generate a high-quality English translation that accurately conveys the meaning and emotion of the original text. Please do not provide any explanation in the output other than the translation itself. | - | - | - | - | - | - | - | - | - |
| oaieval_settings | | | | | | | | | | |
| max_samples | 10 | - | - | - | - | - | - | - | - | - |
| batch_size | - | 16 | 16 | 16 | 16 | 16 | 16 | 32 | 16 | 16 |
| epochs | - | 100 | 100 | 100 | 100 | 100 | 100 | 300 | 200 | 100 |
| eval | manga-translation-bubble | - | - | - | - | - | - | - | - | - |
| lr | - | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| registry | wandb/jobs/openai_evals_registry:latest | - | - | - | - | - | - | - | - | - |
| summary | | | | | | | | | | |
| _wandb | | | | | | | | | | |
| runtime | 48 | 2 | 2 | 2 | 2 | 2 | - | 3 | 2 | 2 |
| spec | | | | | | | | | | |
| run_config | | | | | | | | | | |
| eval_spec | | | | | | | | | | |
| args | | | | | | | | | | |
| samples_jsonl | manga-translation/bubbles.jsonl | - | - | - | - | - | - | - | - | - |
| cls | evals.elsuite.translate:Translate | - | - | - | - | - | - | - | - | - |
| group | manga-translation | - | - | - | - | - | - | - | - | - |
| key | manga-translation-bubble.dev.v0 | - | - | - | - | - | - | - | - | - |
| initial_settings | | | | | | | | | | |
| visible | false | - | - | - | - | - | - | - | - | - |
| command | /usr/local/bin/oaieval gpt-3.5-turbo manga-translation-bubble --max_samples=10 --record_path temp.jsonl | - | - | - | - | - | - | - | - | - |
| completion_fns | ["gpt-3.5-turbo"] | - | - | - | - | - | - | - | - | - |
| max_samples | 10 | - | - | - | - | - | - | - | - | - |
| seed | 20220722 | - | - | - | - | - | - | - | - | - |
| base_eval | manga-translation-bubble | - | - | - | - | - | - | - | - | - |
| completion_fns | ["gpt-3.5-turbo"] | - | - | - | - | - | - | - | - | - |
| created_at | 2023-04-24 20:41:01.392611 | - | - | - | - | - | - | - | - | - |
| eval_name | manga-translation-bubble.dev.v0 | - | - | - | - | - | - | - | - | - |
| run_id | 230424204101SWQ4B2HT | - | - | - | - | - | - | - | - | - |
| split | dev | - | - | - | - | - | - | - | - | - |
| _runtime | 36.10414 | 1 | 1 | 1 | 1 | 2 | 1 | 4 | 3 | 1 |
| _step | 1 | 99 | 99 | 99 | 99 | 99 | 99 | 299 | 199 | 99 |
| _timestamp | 1682368865.987 | 1651698584 | 1651677022 | 1651677010 | 1651676992 | 1651674576 | 1651674551 | 1651609070 | 1651608448 | 1651254262 |
| accuracy | 0.4 | - | - | - | - | - | - | - | - | - |
| evals_table | table-file | - | - | - | - | - | - | - | - | - |
| loss | - | 0.5853 | 0.94743 | 0.18038 | 0.52171 | 0.0077617 | 0.93894 | 0.406 | 0.40302 | 0.15443 |
| model | gpt-3.5-turbo | - | - | - | - | - | - | - | - | - |
| override_prompt | Given a text representing speech of manga in Japanese, generate a high-quality English translation that accurately conveys the meaning and emotion of the original text. Please do not provide any explanation in the output other than the translation itself. | - | - | - | - | - | - | - | - | - |
| sacrebleu_score | 14.59248 | - | - | - | - | - | - | - | - | - |
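
The `command` value recorded for daily-waterfall-33 can be rebuilt programmatically. The following is a minimal sketch, not the report's own tooling: the helper name `build_oaieval_command` is hypothetical, and the executable path is simply the one recorded for that run. The sketch only assembles and prints the command; running it would require the OpenAI Evals package and an API key.

```python
import shlex


def build_oaieval_command(model, eval_name, max_samples, record_path,
                          executable="/usr/local/bin/oaieval"):
    """Return an argv list mirroring the recorded oaieval invocation.

    Hypothetical helper: the flags are the ones that appear in the
    recorded command, nothing else is assumed about the CLI.
    """
    return [
        executable,
        model,
        eval_name,
        f"--max_samples={max_samples}",
        "--record_path",
        record_path,
    ]


argv = build_oaieval_command(
    model="gpt-3.5-turbo",
    eval_name="manga-translation-bubble",
    max_samples=10,
    record_path="temp.jsonl",
)

# shlex.join quotes arguments only where needed, so the result can be
# pasted into a shell as-is.
command_line = shlex.join(argv)
print(command_line)

# To actually launch the eval (assumes oaieval is installed):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting issues if the eval name or record path ever contains spaces.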

Mean of sacrebleu_score: 14.592, NaN
Mean of accuracy: 0.4, NaN
$ Donated to OpenAI: 0.001384

Data Lineage

Create your own custom eval!
1. Download the registry artifact
    wandb artifact get wandb/jobs/openai_evals_registry:latest --root openai_evals_registry
2. Set up custom eval
Modify registry/modelgraded/custom.yaml.
Modify registry/data/custom/samples_labeled.jsonl. Each row should be a JSON object that contains at least these keys:
input: the query submitted to your LLM
completion: the LLM's response
choice: the correct choice among the options
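
As an illustration of the row format described above, here is a minimal sketch that writes a `samples_labeled.jsonl` file and checks that every row carries the three required keys. The helper `write_samples`, the example rows, and the `"Y"`/`"N"` choice labels are all hypothetical; the valid choices depend on what your `custom.yaml` defines. The example writes into a temporary directory so it does not touch a real registry.

```python
import json
import tempfile
from pathlib import Path

# Keys the report says every row must contain.
REQUIRED_KEYS = ("input", "completion", "choice")


def write_samples(rows, path):
    """Validate rows, then write them as one JSON object per line."""
    for i, row in enumerate(rows):
        missing = [k for k in REQUIRED_KEYS if k not in row]
        if missing:
            raise ValueError(f"row {i} is missing keys: {missing}")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            # ensure_ascii=False keeps Japanese text readable in the file.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


# Hypothetical rows; "Y"/"N" are placeholder labels, not values from the report.
example_rows = [
    {"input": "Translate: こんにちは", "completion": "Hello", "choice": "Y"},
    {"input": "Translate: ありがとう", "completion": "Goodbye", "choice": "N"},
]

out_path = write_samples(
    example_rows,
    Path(tempfile.mkdtemp()) / "registry" / "data" / "custom" / "samples_labeled.jsonl",
)
print(out_path.read_text(encoding="utf-8"))
```

Validating all rows before opening the file means a malformed row never leaves a half-written file behind.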
﻿
3. Upload your modified registry artifact
    wandb artifact put openai_evals_registry
4. Launch job using new registry
{
    "run_config": {
        "eval": "custom-meta",
        "model": "wandb/jobs/openai_evals_model:v0",
        "registry": "your_entity/your_project/openai_evals_registry:latest",
        "oaieval_settings": {
            "max_samples": 10
        }
    }
}
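
If you launch several jobs against different registries, the configuration above can be generated rather than edited by hand. This is a minimal sketch under that assumption: `make_launch_config` is a hypothetical helper, its defaults are copied from the JSON above, and the registry string is still the placeholder from the report that you would replace with your own entity and project.

```python
import json


def make_launch_config(registry,
                       eval_name="custom-meta",
                       model="wandb/jobs/openai_evals_model:v0",
                       max_samples=10):
    """Build a dict with the same shape as the launch config shown above."""
    return {
        "run_config": {
            "eval": eval_name,
            "model": model,
            "registry": registry,
            "oaieval_settings": {
                "max_samples": max_samples,
            },
        }
    }


# Placeholder registry path from the report; substitute your own.
config = make_launch_config("your_entity/your_project/openai_evals_registry:latest")

# indent=4 matches the layout used in the report.
config_json = json.dumps(config, indent=4)
print(config_json)
```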
﻿