apiche

Apiche's group workspace

Group: debug_actor_test_logging_july_23rd_1_guessing

1-3

of 3

Tags

Notes

Author

apiche

State

Crashed

Start time

July 23rd, 2025 2:53:27 PM

Runtime

5m 31s

Tracked hours

5m 9s

Run path

apiche/pipeline-rl/debug_actor_test_logging_july_23rd_1_guessing_finetune

Linux-5.15.0-1067-nvidia-x86_64-with-glibc2.39

Python version

CPython 3.11.11

Git repository

git clone git@github.com:ServiceNow/pipelinerl.git

Git state

git checkout -b "debug_actor_test_logging_july_23rd_1_guessing/finetune" 3a6aca9a40fe429fafb5d0da5edb0450a11cb497

Command

pipelinerl/entrypoints/run_finetune.py --config-dir results/debug_actor_test_logging_july_23rd_1_guessing/conf --config-name exp_config output_dir=results/debug_actor_test_logging_july_23rd_1_guessing hydra.run.dir=results/debug_actor_test_logging_july_23rd_1_guessing/finetune +me.weight_update_group_init_method=tcp://localhost:9000 +me.weight_update_group_world_size=2 +me.llm_urls=http://localhost:8080

System Hardware

CPU count	112
Logical CPU count	224
GPU count	4
GPU type	NVIDIA H100 80GB HBM3

W&B CLI Version

0.19.11

Group

debug_actor_test_logging_july_23rd_1_guessing

Config parameters are your model's inputs. Learn more

▶
Config parameters:{} 181 keys
- _cpu:
  "False"
- _mixed_precision:
  "no"
- accelerate_config:
  null
- actor.discount_factor:
  1
- actor.llm_max_rollouts:
  64
- actor.log_each_n_secs:
  0
- actor.problem_queue_size:
  64
- actor.result_queue_size:
  64
- actor.rollout_policy:
  "pipelinerl.domains.guessing.generate_guessing_rollout"
- actor.rollout_workers:
  1
- actor.shared_memory_entry_size:
  10,000,000
- actor.throughput_window_size:
  50
- attempts:
  1
- backend:
  "nccl"
- dataset_loader:
  "pipelinerl.domains.guessing.load_problems"
- debug:
  "False"
- debug.mode:
  ""
- debug.place_inference_workers:
  true
- debug.streams_from:
  null
- debug.use_existing_llms:
  false
- deepspeed_config:
  "deepspeed_stage3_bf16"
- deepspeed_plugins:
  "DeepSpeedPlugin(hf_ds_config=<accelerate.utils.deepspeed.HfDeepSpeedConfig object at 0x7ffc75c1a3d0>, gradient_accumulation_steps=1, gradient_clipping='auto', zero_stage=3, is_train_batch_min=True, offload_optimizer_device='none', offload_param_device='none', offload_optimizer_nvme_path='none', offload_param_nvme_path='none', zero3_init_flag=True, zero3_save_16bit_model=True, transformer_moe_cls_names=None, enable_msamp=False, msamp_opt_level='O1')"
- device:
  "cuda:0"
- distributed_type:
  "DistributedType.DEEPSPEED"
- dynamo_plugin:
  "TorchDynamoPlugin(backend=<DynamoBackend.NO: 'NO'>, mode='default', fullgraph=False, dynamic=False, options=None, disable=False)"
- environment:
  null
- eval_every_n_versions:
  20,000
- finetune.also_save_steps:[] 0 items
- finetune.attn_implementation:
  "flash_attention_2"
- finetune.auto_device_map:
  false
- finetune.config_name:
  "Qwen/Qwen2.5-7B-Instruct"
- finetune.cuda_empty_cache:
  true
- finetune.data:
  null
- finetune.eval_callback._target_:
  "tapeagents.finetune.eval.dummy_eval_callback"
- finetune.eval_callback.config_name:
  ""
- finetune.force_restart:
  true
- finetune.gradient_accumulation_passes:
  1,026
- finetune.gradient_checkpointing:
  true
- finetune.gradient_clipping_threshold:
  0.3
- finetune.input:
  "training_data"
- finetune.interrupt_train_steps:
  -1
- finetune.keep_intermediate_checkpoints:
  true
- finetune.learning_rate:
  0.000001
- finetune.load_as_bf16:
  true
- finetune.log_each_n_steps:
  1
- finetune.lora.alpha:
  16
- world.environment_start_port:
  7,777
- world.finetune_fraction:
  6
- world.preprocessor_fraction:
  0
- world.replicas:
  1

Summary metrics are your model's outputs. Learn more

▶
Summary metrics:{} 69 keys
- rl/advantage:
  -1.529497146606445
- rl/clamp_log_ratio_new_old_indicator:
  0
- rl/clamp_log_ratio_ref_new_indicator:
  0
- rl/entropy:
  0.07191143184900284
- rl/entropy_bonus_coef:
  0
- rl/ess:
  0.996103138464522
- rl/input_size:
  120.97270965576172
- rl/kl:
  0.004619069863110781
- rl/kl_coef:
  0
- rl/loss:
  0.3180975914001465
- rl/max_advantage:
  1.807812452316284
- rl/max_kl:
  0.9836704730987548
- rl/max_loss:
  0.02510165050625801
- rl/max_reward:
  1.2999999523162842
- rl/max_token_weight:
  0.000974658876657486
- rl/min_advantage:
  -2.4585938453674316
- rl/min_kl:
  -0.00000005960464477539
- rl/min_loss:
  -0.019365161657333377
- rl/min_reward:
  -1.899999976158142
- rl/min_token_weight:
  0.000974658876657486
- rl/new_logprobs:
  -0.08967616409063339
- rl/num_nans:
  0
- rl/num_output_tokens_sum:
  9,486
- rl/old_logprobs:
  -0.08333136886358261
- rl/policy_loss:
  0.12415313720703124
- rl/ratio_new_old:
  0.9969342947006226
- rl/ratio_new_old_squared_sum:
  9,523.6376953125
- rl/ratio_new_old_sum:
  9,486.2626953125
- rl/ratio_ref_new:
  1.010963797569275
- rl/ratio_ref_old:
  1
- rl/ref_logprobs:
  -0.08333136886358261
- rl/reward:
  -1.3786550760269165
- rl/surr1:
  0
- rl/surr2:
  0
- rl/token_weight:
  0.000974658876657486
- rl/value_loss:
  0.011088640429079533
- rl/value_max:
  0.71875
- rl/value_mean:
  0.15084204077720642
- rl/value_min:
  -0.67578125
- rl/value_mse:
  2.4945549964904785
- stats/completed_steps:
  15
- stats/epoch:
  0
- stats/grad_norm:
  0.21287604865756085
- stats/lag:
  1,026
- stats/lr:
  0.00000028
- stats/max_actor_version:
  14,364
- throughput/tokens_per_micro_batch:
  3,829.909090909091
- throughput/tokens_per_sec:
  8,303.85229503267
- throughput/tokens_per_step:
  126,387
- throughput/tokens_perGPU_per_sec:
  2,767.95076501089

This run produced these artifacts as outputs. Total: 1. Learn more

wandb-history

run-debug_actor_test_logging_july_23rd_1_guessing_finetune-history:v0