Changma's workspace
Runs: 1

State: Finished
Notes: An evaluation of LLM agent
User: changma
Tags:
Created:
Runtime: 18h 38m 53s
Sweep: -
agent_config.memory_size: 100
agent_config.name: VanillaAgent
agent_config.need_goal: true
env_config.alfworld.base_config: /home/v-changma1/llm-agent-eval-release-version/environment/alfworld/base_config.yaml
env_config.alfworld.batch_size: 1
env_config.alfworld.check_actions: check valid actions
env_config.alfworld.check_inventory: true
env_config.alfworld.init_prompt_path: /home/v-changma1/llm-agent-eval-release-version/prompts/VanillaAgent/alfworld_base.json
env_config.alfworld.label_path: /home/v-changma1/llm-agent-eval-release-version/data/alfworld/test.jsonl
env_config.alfworld.name: alfworld
env_config.alfworld.split: eval_out_of_distribution
env_config.babyai.check_actions: check valid actions
env_config.babyai.check_inventory: inventory
env_config.babyai.env_num_per_task: 4
env_config.babyai.game_level: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,24,25,27,28,29,31,32]
env_config.babyai.init_prompt_path: /home/v-changma1/llm-agent-eval-release-version/prompts/VanillaAgent/babyai_vanilla_prompt.json
env_config.babyai.label_path: /home/v-changma1/llm-agent-eval-release-version/data/babyai/test.jsonl
env_config.babyai.name: babyai
env_config.babyai.seed: 1234
env_config.jericho.check_actions: check valid actions
env_config.jericho.check_inventory: inventory
env_config.jericho.game_dir: /home/v-changma1/llm-agent-eval-release-version/data/jericho/z-machine-games-master/jericho-game-suite
env_config.jericho.game_name: ["905","acorncourt","afflicted","balances","dragon","jewel","library","omniquest","reverb","snacktime","zenon","zork1","zork2","zork3","detective","night","pentari","weapon","huntdark","loose"]
env_config.jericho.init_prompt_path: /home/v-changma1/llm-agent-eval-release-version/prompts/VanillaAgent/jericho_vanilla_prompt.json
env_config.jericho.label_path: /home/v-changma1/llm-agent-eval-release-version/data/jericho/test.jsonl
env_config.jericho.name: jericho
env_config.pddl.check_actions: check valid actions
env_config.pddl.env_num_per_task: 20
env_config.pddl.game_name: ["gripper","blockworld","barman","tyreworld"]
env_config.pddl.init_prompt_path: /home/v-changma1/llm-agent-eval-release-version/prompts/VanillaAgent/pddl_vanilla_prompt.json
env_config.pddl.label_path: /home/v-changma1/llm-agent-eval-release-version/data/pddl/test.jsonl
env_config.pddl.name: pddl
env_config.scienceworld.check_actions: check valid actions
env_config.scienceworld.check_inventory: true
env_config.scienceworld.envStepLimit: 30
env_config.scienceworld.init_prompt_path: /home/v-changma1/llm-agent-eval-release-version/prompts/VanillaAgent/scienceworld_base.json
env_config.scienceworld.label_path: /home/v-changma1/llm-agent-eval-release-version/data/scienceworld/test.jsonl
env_config.scienceworld.name: scienceworld
env_config.scienceworld.seed: 0
env_config.tool-operation.check_actions: check_valid_actions
env_config.tool-operation.dataset_dir: /home/v-changma1/llm-agent-eval-release-version/data
env_config.tool-operation.init_prompt_path: /home/v-changma1/llm-agent-eval-release-version/prompts
env_config.tool-operation.name: tool-operation
env_config.tool-operation.result_dir: /home/v-changma1/llm-agent-eval-release-version/results/operation
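
The dotted column names (agent_config.*, env_config.<environment>.*) read like a nested configuration that was flattened for display. A minimal sketch of rebuilding that nesting follows, assuming this interpretation is right. The helper name unflatten is hypothetical and not part of the evaluated code base, only a handful of columns are copied in, and the value types (integers, lists) are guesses from how the cells are displayed.

```python
from typing import Any


def unflatten(flat: dict[str, Any]) -> dict[str, Any]:
    """Turn dotted keys such as 'env_config.pddl.name' into nested dicts."""
    nested: dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        # Walk (or create) one dict level per dotted segment before the last.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# A few columns copied from the run listed above; types are assumed.
flat_columns = {
    "agent_config.name": "VanillaAgent",
    "agent_config.memory_size": 100,
    "env_config.pddl.env_num_per_task": 20,
    "env_config.pddl.game_name": ["gripper", "blockworld", "barman", "tyreworld"],
    "env_config.scienceworld.envStepLimit": 30,
    "env_config.tool-operation.name": "tool-operation",
}

config = unflatten(flat_columns)
print(config["env_config"]["pddl"]["game_name"])
```

Grouping by environment this way makes it easy to read all settings for one benchmark together, for example config["env_config"]["scienceworld"].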