[Figure panels: throughput/samples_per_sec; preprocessor/samples_per_second; actor/samples_per_second; system/gpu.6.memoryAllocated; actor/output_tokens_per_second; rl/clamp_log_ratio_new_old_indicator; actor/math_500_output_tokens_mean; actor/math_500_success_mean; actor/sometimes_success; actor/reward_mean, actor/aime_2024_success_mean; actor/always_success; actor/never_success; stats/time_waiting_for_data]
[Run groups: dzmitry/reason/openreas_qwen32b_r21; dzmitry/reason/openreas_qwen7b_r18]
Created with ❤️ on Weights & Biases.
https://wandb.ai/apiche/pipeline-rl/reports/Pipeline-RL-blog-post-results--VmlldzoxMjQ0MjcwNg