Group: debug_actor_mcp_tir_again

Author

apiche

State

Crashed

Start time

August 21st, 2025 4:51:59 PM

Runtime

3h 1m 16s

Tracked hours

2h 54m 6s

Run path

apiche/pipeline-rl/debug_actor_mcp_tir_again_actor

OS

Linux-5.15.0-1067-nvidia-x86_64-with-glibc2.39

Python version

CPython 3.11.11

Git repository

git clone git@github.com:ServiceNow/pipelinerl.git

Git state

git checkout -b "debug_actor_mcp_tir_again/actor" f93d7560e6a6449f9e6e8f7b947d9414c59d7e7b

Command

-m pipelinerl.entrypoints.run_actor --config-dir results/debug_actor_mcp_tir_again/conf --config-name exp_config output_dir=results/debug_actor_mcp_tir_again hydra.run.dir=results/debug_actor_mcp_tir_again/actor +me.llm_urls=http://localhost:8080
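The command above mixes interpreter flags with `key=value` overrides. Since it sets `hydra.run.dir`, the overrides look like Hydra syntax, where a leading `+` adds a key that is not already in the config. The sketch below only separates the overrides from the flags so they can be compared against the config parameters; the helper name `split_overrides` is hypothetical and not part of pipelinerl or Hydra.

```python
import shlex

# Command string copied from the run page above.
COMMAND = (
    "-m pipelinerl.entrypoints.run_actor "
    "--config-dir results/debug_actor_mcp_tir_again/conf "
    "--config-name exp_config "
    "output_dir=results/debug_actor_mcp_tir_again "
    "hydra.run.dir=results/debug_actor_mcp_tir_again/actor "
    "+me.llm_urls=http://localhost:8080"
)


def split_overrides(command: str) -> dict[str, str]:
    """Hypothetical helper: collect key=value tokens from a command line.

    A leading '+' is kept as part of the key so appended keys stay visible.
    """
    overrides = {}
    for token in shlex.split(command):
        # Skip flags (-m, --config-dir, ...) and their plain values.
        if token.startswith("-") or "=" not in token:
            continue
        key, value = token.split("=", 1)
        overrides[key] = value
    return overrides


for key, value in split_overrides(COMMAND).items():
    print(f"{key} -> {value}")
```

This is plain string handling rather than Hydra's own override parser, so it does not cover quoting, lists, or deletion (`~key`) syntax.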

System Hardware

CPU count	112
Logical CPU count	224
GPU count	1
GPU type	NVIDIA H100 80GB HBM3

W&B CLI Version

0.19.11


Config parameters are your model's inputs.

Config parameters: 196 keys
- accelerate_config:
  null
- actor.discount_factor:
  1
- actor.llm_max_rollouts:
  1
- actor.log_each_n_secs:
  0
- actor.problem_queue_size:
  64
- actor.result_queue_size:
  64
- actor.rollout_policy:
  "pipelinerl.domains.tir_mcp.generate_math_rollout2"
- actor.rollout_workers:
  1
- actor.shared_memory_entry_size:
  10,000,000
- actor.system_prompt:
  "Please reason step by step, and put your final answer within \boxed{}."
- actor.task_template:
  "{task}"
- actor.throughput_window_size:
  50
- agent_max_loops:
  3
- agent._target_:
  "tapeagents.agent.Agent"
- agent.max_iterations:
  3
- agent.name:
  "mcp_agent"
- agent.nodes:
  [] 5 items
- agent.store_llm_calls:
  true
- agent.templates.allowed_steps:
  "You have access to the following tools: {tools_description} "
- agent.templates.allowed_tools:
  "You have access to the following tools: {tools_description} "
- agent.templates.format:
  "Output only a single JSON dict. Do not repeat the last thought again. If the last action does not change the observation, do not repeat it! DO NOT OUTPUT ANYTHING BESIDES THE JSON! DO NOT PLACE ANY COMMENTS INSIDE THE JSON. It will break the system that processes the output. "
- agent.templates.system_prompt:
  "You are an expert AI Agent trained to assist users with complex information processing tasks. Your role is to understand user queries and respond in a helpful and accurate manner. Keep your replies concise and direct. Prioritize clarity and avoid over-elaboration. Do not express emotions or opinions about user questions. "
- agent.templates.thought_format:
  "Important! Respond with the plain text, do not include any JSON or code. Do not output anything besides what I asked in this message. "
- attempts:
  1
- dataset_loader:
  "pipelinerl.domains.math.load_datasets"
- debug.mode:
  "actor"
- debug.place_inference_workers:
  true
- debug.streams_from:
  null
- debug.use_existing_llms:
  false
- deepspeed_config:
  "deepspeed_stage3_bf16"
- environment._target_:
  "pipelinerl.domains.tir_mcp.env_server.MCPEnvironmentServer"
- environment.env_call_timeout:
  600
- environment.exp_path:
  "results/debug_actor_mcp_tir_again/env_server"
- environment.host:
  "0.0.0.0"
- environment.math_target:
  "pipelinerl.domains.math.MathEnvironment"
- environment.mcp_config_path:
  "/home/toolkit/research-now-reasoner/pipelinerl/conf/mcp/python.json"
- environment.mcp_read_timeout_seconds:
  3,000
- environment.mcp_target:
  "tapeagents.mcp.MCPEnvironment"
- environment.mcp_tools_whitelist:
  - "run_python_code"
- environment.n_envs:
  8
- environment.n_envs_math:
  1
- environment.n_envs_mcp:
  7
- eval_every_n_versions:
  78,000
- finetune.also_save_steps:
  []
- finetune.attempts:
  1
- finetune.attn_implementation:
  "flash_attention_2"
- world.environment_start_port:
  7777
- world.finetune_fraction:
  4
- world.preprocessor_fraction:
  0
- world.replicas:
  1

Summary metrics are your model's outputs.

Summary metrics: 58 keys
- actor/always_success:
  0
- actor/finished_groups:
  1
- actor/latency_mean:
  244.29058939917013
- actor/model_version_mean:
  0
- actor/never_success:
  1
- actor/no_answer_mean:
  1
- actor/no_error_mean:
  1
- actor/num_python_calls_mean:
  1
- actor/num_steps_mean:
  12
- actor/num_turns_mean:
  7
- actor/open_reasoner_zero_57k/no_answer_mean:
  1
- actor/open_reasoner_zero_57k/no_error_mean:
  1
- actor/open_reasoner_zero_57k/num_python_calls_mean:
  0
- actor/open_reasoner_zero_57k/num_steps_mean:
  5
- actor/open_reasoner_zero_57k/num_turns_mean:
  3
- actor/open_reasoner_zero_57k/output_tokens_max:
  1,755
- actor/open_reasoner_zero_57k/output_tokens_mean:
  1,048.3333333333333
- actor/open_reasoner_zero_57k/output_tokens_min:
  518
- actor/open_reasoner_zero_57k/output_tokens_var:
  270,574.8888888889
- actor/open_reasoner_zero_57k/overflow_mean:
  0
- actor/open_reasoner_zero_57k/prompt_tokens_max:
  1,237
- actor/open_reasoner_zero_57k/prompt_tokens_mean:
  769.3333333333334
- actor/open_reasoner_zero_57k/prompt_tokens_min:
  495
- actor/open_reasoner_zero_57k/prompt_tokens_var:
  110,449.55555555556
- actor/open_reasoner_zero_57k/reward_mean:
  0
- actor/open_reasoner_zero_57k/success_mean:
  0
- actor/open_reasoner_zero_extended_72k/no_answer_mean:
  1
- actor/open_reasoner_zero_extended_72k/no_error_mean:
  1
- actor/open_reasoner_zero_extended_72k/num_python_calls_mean:
  1
- actor/open_reasoner_zero_extended_72k/num_steps_mean:
  12
- actor/open_reasoner_zero_extended_72k/num_turns_mean:
  7
- actor/open_reasoner_zero_extended_72k/output_tokens_max:
  8,192
- actor/open_reasoner_zero_extended_72k/output_tokens_mean:
  2,745.4285714285716
- actor/open_reasoner_zero_extended_72k/output_tokens_min:
  468
- actor/open_reasoner_zero_extended_72k/output_tokens_var:
  7,600,905.673469388
- actor/open_reasoner_zero_extended_72k/overflow_mean:
  0
- actor/open_reasoner_zero_extended_72k/prompt_tokens_max:
  10,389
- actor/open_reasoner_zero_extended_72k/prompt_tokens_mean:
  8,242.714285714286
- actor/open_reasoner_zero_extended_72k/prompt_tokens_min:
  531
- actor/open_reasoner_zero_extended_72k/prompt_tokens_var:
  10,135,128.489795918
- actor/open_reasoner_zero_extended_72k/reward_mean:
  0
- actor/open_reasoner_zero_extended_72k/success_mean:
  0
- actor/output_tokens_max:
  8,192
- actor/output_tokens_mean:
  2,745.4285714285716
- actor/output_tokens_min:
  468
- actor/output_tokens_var:
  7,600,905.673469388
- actor/reward_mean:
  0
- actor/sometimes_success:
  0
- actor/success_mean:
  0
- actor/time_since_start:
  245.42809915542605
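The `*_var` metrics above are in squared tokens, which makes them hard to compare with the `*_mean`, `*_min`, and `*_max` values. A minimal sketch that converts the reported variances to standard deviations; the values are copied from the list above, and the `_std` names are illustrative, not metrics logged by the run.

```python
import math

# Variances copied from the summary metrics above (units: tokens squared).
variances = {
    "open_reasoner_zero_57k/output_tokens": 270_574.8888888889,
    "open_reasoner_zero_57k/prompt_tokens": 110_449.55555555556,
    "open_reasoner_zero_extended_72k/output_tokens": 7_600_905.673469388,
    "open_reasoner_zero_extended_72k/prompt_tokens": 10_135_128.489795918,
}

# Standard deviation is the square root of the variance (units: tokens).
std_devs = {name: math.sqrt(var) for name, var in variances.items()}

for name, std in std_devs.items():
    print(f"{name}_std: {std:.1f}")
```

For `open_reasoner_zero_extended_72k`, the output-token standard deviation comes out on the same order as the mean (2,745.43), which is consistent with the wide spread between the reported minimum (468) and maximum (8,192).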

This run produced these artifacts as outputs. Total: 1.

wandb-history

run-debug_actor_mcp_tir_again_actor-history:v0