
Regression Report: wandb

[['?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with&metrics=env/reward_mean&metrics=env/reward_std&metrics=objective/kl_coef&metrics=objective/kl&metrics=objective/entropy&metrics=ppo/std_scores&metrics=ppo/mean_scores&metrics=ppo/learning_rate&metrics=ppo/mean_non_score_reward&metrics=ppo/loss/value&metrics=ppo/loss/total&metrics=ppo/loss/policy&metrics=ppo/policy/advantages_mean&metrics=ppo/policy/approxkl&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/returns/var', 'wandb?tag=calculator_few_shots_env&tag=pr-429&cl=calculator_env (various improvement)']]
Created on June 15 | Last edited on June 15

[Figure: five line-chart panels, each with x-axis "Steps" (ticks at 100, 200, 300) and y-axis "Episodic Return". One panel shows no data for the selected runs (current X axis: _step). Legend: calculator_env (various improvement).]