OpenAI Evals Profiling Report: gpt-3.5-turbo: manga-translation-bubble

2023-04-24 20:41:06
Created on April 24 | Last edited on April 24


Leaderboard and Summary


Each row below lists one value per run, in column order (10 runs); "-" marks a run with no value.

config
model
gpt-3.5-turbo | - | - | - | - | - | - | - | - | -
Given a text representing speech of manga in Japanese, generate a high-quality English translation that accurately conveys the meaning and emotion of the original text. Please do not provide any explanation in the output other than the translation itself. | - | - | - | - | - | - | - | - | -
oaieval_settings
10 | - | - | - | - | - | - | - | - | -
- | 16 | 16 | 16 | 16 | 16 | 16 | 32 | 16 | 16
- | 100 | 100 | 100 | 100 | 100 | 100 | 300 | 200 | 100
manga-translation-bubble | - | - | - | - | - | - | - | - | -
- | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05
wandb/jobs/openai_evals_registry:latest | - | - | - | - | - | - | - | - | -

summary
_wandb
48 | 2 | 2 | 2 | 2 | 2 | - | 3 | 2 | 2
spec
run_config
eval_spec
args
manga-translation/bubbles.jsonl | - | - | - | - | - | - | - | - | -
evals.elsuite.translate:Translate | - | - | - | - | - | - | - | - | -
manga-translation | - | - | - | - | - | - | - | - | -
manga-translation-bubble.dev.v0 | - | - | - | - | - | - | - | - | -
initial_settings
false | - | - | - | - | - | - | - | - | -
/usr/local/bin/oaieval gpt-3.5-turbo manga-translation-bubble --max_samples=10 --record_path temp.jsonl | - | - | - | - | - | - | - | - | -
["gpt-3.5-turbo"] | - | - | - | - | - | - | - | - | -
10 | - | - | - | - | - | - | - | - | -
20220722 | - | - | - | - | - | - | - | - | -
manga-translation-bubble | - | - | - | - | - | - | - | - | -
["gpt-3.5-turbo"] | - | - | - | - | - | - | - | - | -
2023-04-24 20:41:01.392611 | - | - | - | - | - | - | - | - | -
manga-translation-bubble.dev.v0 | - | - | - | - | - | - | - | - | -
230424204101SWQ4B2HT | - | - | - | - | - | - | - | - | -
dev | - | - | - | - | - | - | - | - | -
36.10414 | 1 | 1 | 1 | 1 | 2 | 1 | 4 | 3 | 1
1 | 99 | 99 | 99 | 99 | 99 | 99 | 299 | 199 | 99
1682368865.987 | 1651698584 | 1651677022 | 1651677010 | 1651676992 | 1651674576 | 1651674551 | 1651609070 | 1651608448 | 1651254262
0.4 | - | - | - | - | - | - | - | - | -
table-file | - | - | - | - | - | - | - | - | -
- | 0.5853 | 0.94743 | 0.18038 | 0.52171 | 0.0077617 | 0.93894 | 0.406 | 0.40302 | 0.15443
gpt-3.5-turbo | - | - | - | - | - | - | - | - | -
Given a text representing speech of manga in Japanese, generate a high-quality English translation that accurately conveys the meaning and emotion of the original text. Please do not provide any explanation in the output other than the translation itself. | - | - | - | - | - | - | - | - | -
14.59248 | - | - | - | - | - | - | - | - | -
14.592
NaN
0.4
NaN
0.001384
Run set
10
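
The run configuration in the leaderboard records the oaieval command line used for the gpt-3.5-turbo run. As a minimal sketch, the same argument list could be assembled in Python as shown here. The helper name build_oaieval_command and its parameter names are hypothetical and chosen only for illustration; the executable path, model, eval name, and flags are copied from the recorded command.

```python
import shlex


def build_oaieval_command(model, eval_name, max_samples, record_path,
                          executable="/usr/local/bin/oaieval"):
    """Return the argv list for an oaieval run (hypothetical helper)."""
    if max_samples <= 0:
        raise ValueError("max_samples must be positive")
    return [
        executable,
        model,
        eval_name,
        f"--max_samples={max_samples}",
        "--record_path",
        record_path,
    ]


# Values taken from the command recorded in the report.
cmd = build_oaieval_command(
    "gpt-3.5-turbo", "manga-translation-bubble", 10, "temp.jsonl"
)

# shlex.join quotes arguments only where the shell would need it.
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to pass to subprocess.run without going through a shell, assuming oaieval is installed at that path.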


Data Lineage



Create your own custom eval!

1. Download the registry artifact

wandb artifact get wandb/jobs/openai_evals_registry:latest --root openai_evals_registry

2. Set up custom eval

  1. Modify registry/modelgraded/custom.yaml (left panel below)

  2. Modify registry/data/custom/samples_labeled.jsonl (right panel below). Each row should be a JSON object that contains at least these keys:

    1. input: the query submitted to your LLM
    2. completion: the LLM's response
    3. choice: the correct choice among the options
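
Before uploading, it may help to check that every row of samples_labeled.jsonl carries the three required keys. The sketch below does that with the standard library only. The function names and the example rows (including the "Y" value for choice) are hypothetical placeholders, not content from the registry.

```python
import json

# Keys the report says each row must contain at minimum.
REQUIRED_KEYS = {"input", "completion", "choice"}


def missing_keys(line):
    """Parse one JSONL line and return the required keys it lacks, sorted."""
    row = json.loads(line)
    if not isinstance(row, dict):
        raise ValueError("each row must be a JSON object")
    return sorted(REQUIRED_KEYS - row.keys())


def check_jsonl(text):
    """Return (line_number, missing_keys) for every incomplete row."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        missing = missing_keys(line)
        if missing:
            problems.append((number, missing))
    return problems


# Hypothetical rows: the first is complete, the second lacks "choice".
example_rows = [
    {"input": "example query", "completion": "example response", "choice": "Y"},
    {"input": "another query", "completion": "another response"},
]
sample_text = "\n".join(json.dumps(row, ensure_ascii=False) for row in example_rows)

print(check_jsonl(sample_text))
```

To check a real file, read it with open(path, encoding="utf-8").read() and pass the text to check_jsonl.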


3. Upload your modified registry artifact

wandb artifact put openai_evals_registry

4. Launch job using new registry

{
  "run_config": {
    "eval": "custom-meta",
    "model": "wandb/jobs/openai_evals_model:v0",
    "registry": "your_entity/your_project/openai_evals_registry:latest",
    "oaieval_settings": {
      "max_samples": 10
    }
  }
}
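
If the launch configuration is generated rather than typed by hand, a small helper can fill in the registry path and sample count and emit valid JSON. This is a sketch under stated assumptions: make_launch_config is a hypothetical name, and the default eval and model values are simply the ones shown in the configuration above.

```python
import json


def make_launch_config(registry, max_samples,
                       eval_name="custom-meta",
                       model="wandb/jobs/openai_evals_model:v0"):
    """Build the run_config dictionary shown in step 4 (hypothetical helper)."""
    if max_samples <= 0:
        raise ValueError("max_samples must be positive")
    return {
        "run_config": {
            "eval": eval_name,
            "model": model,
            "registry": registry,
            "oaieval_settings": {"max_samples": max_samples},
        }
    }


# Replace the placeholder entity and project with your own.
config = make_launch_config(
    "your_entity/your_project/openai_evals_registry:latest", 10
)
print(json.dumps(config, indent=2))
```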
