Reports
Regression Report: td3_continuous_action
[['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metrics=charts/episodic_return&metrics=charts/episodic_length&metrics=charts/SPS&metrics=losses/actor_loss&metrics=losses/qf1_values&metrics=losses/qf1_loss', 'td3_continuous_action?tag=rlops-pilot', 'td3_continuous_action?tag=pr-377']]
0
2023-10-11
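Each record's query list follows the filter syntax used by the openrlbenchmark CLI: the first element is a URL-style query string naming the wandb entity (`we`), project (`wpn`), the config key holding the environment id (`ceik`), the config key holding the experiment name (`cen`), and one `metrics` entry per chart; the remaining elements name the experiments to compare, optionally refined by `tag` and relabeled with `cl`. As a minimal sketch (field meanings inferred from openrlbenchmark's conventions), the shared spec of the entry above can be decoded with the standard library:

```python
from urllib.parse import parse_qs

# Shared filter spec from the td3_continuous_action record above:
# we = wandb entity, wpn = wandb project name, ceik = config key holding
# the env id, cen = config key holding the experiment name, and one
# `metrics` entry per chart to plot.
spec = (
    "?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name"
    "&metrics=charts/episodic_return&metrics=charts/episodic_length"
    "&metrics=charts/SPS&metrics=losses/actor_loss"
    "&metrics=losses/qf1_values&metrics=losses/qf1_loss"
)

# parse_qs collects repeated keys (here `metrics`) into one list.
fields = parse_qs(spec.lstrip("?"))
print(fields["we"])       # ['openrlbenchmark']
print(fields["metrics"])  # six chart keys, in order
```

Repeated `metrics` keys are why `parse_qs` (which returns lists) is the right decoder here rather than a plain dict of the query pairs.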
Regression Report: sentiment_tuning_gpt2xl_grad_accu
[['?we=huggingface&wpn=trl&xaxis=_step&ceik=trl_ppo_trainer_config.value.reward_model&cen=trl_ppo_trainer_config.value.exp_name&metrics=env/reward_mean&metrics=objective/kl', 'sentiment_tuning?tag=v0.4.7-55-g110e672&cl=sentiment lvwerra/gpt2-imdb (PR-662)', 'sentiment_tuning_gpt2?tag=v0.4.7-55-g110e672&cl=sentiment gpt2 (PR-662)', 'sentiment_tuning_falcon_rw_1b?tag=v0.4.7-55-g110e672&cl=sentiment tiiuae/falcon-rw-1b (PR-662)', 'sentiment_tuning_gpt2xl_grad_accu?tag=v0.4.7-55-g110e672&cl=sentiment gpt2xl (PR-662)']]
0
2023-09-22
Regression Report: train_policy_accelerate
[['?we=openrlbenchmark&wpn=lm-human-preferences&xaxis=_step&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl&metrics=ppo/objective/entropy&metrics=ppo/objective/score_total&metrics=ppo/objective/kl_coef&metrics=ppo/ppo/loss/total&metrics=ppo/ppo/loss/value&metrics=ppo/ppo/loss/policy&metrics=ppo/ppo/policy/clipfrac&metrics=ppo/ppo/policy/entropy&metrics=ppo/ppo/returns/mean&metrics=ppo/ppo/policy/approxkl&metrics=ppo/ppo/val/clipfrac&metrics=ppo/ppo/val/error&metrics=ppo/ppo/val/mean&metrics=ppo/ppo/returns/var&metrics=ppo/ppo/val/vpred', '124M'], ['?we=openrlbenchmark&wpn=lm_human_preference_details&xaxis=_step&ceik=rewards.value.label_dataset&cen=exp_name&metrics=objective/scores&metrics=objective/kl&metrics=objective/entropy&metrics=objective/score_total&metrics=objective/kl_coef&metrics=ppo/loss/total&metrics=ppo/loss/value&metrics=ppo/loss/policy_avg&metrics=ppo/policy/clipfrac_avg&metrics=ppo/policy/entropy_avg&metrics=ppo/returns/mean&metrics=ppo/policy/approxkl_avg&metrics=ppo/val/clipfrac_avg&metrics=ppo/val/error&metrics=ppo/val/mean&metrics=ppo/returns/var&metrics=ppo/val/vpred', 'train_policy_accelerate?tag=v0.1.0-58-g4f42012&tag=tf_adam&tag=gpt2&cl=tf_adam,gpt2']]
0
2023-08-10
TriviaQA Final Experiments
[['?we=costa-huang&wpn=trl&xaxis=_step&ceik=trl_ppo_trainer_config.value.tracker_project_name&cen=trl_ppo_trainer_config.value.log_with&metrics=env/reward_mean&metrics=objective/kl', 'wandb?tag=longer_tool_response_newprompt&tag=prod&tag=triviaqa&tag=v0.4.7-74-ga993d12&cl=TriviaQA']]
0
2023-08-29
Regression Report: train_reward_jax
[['?we=openrlbenchmark&wpn=lm-human-preferences&xaxis=_step&ceik=task_id&cen=task.value.policy.initial_model&metrics=train_reward/minibatch/error', '124M'], ['?we=openrlbenchmark&wpn=lm_human_preference_details&xaxis=_step&ceik=label_dataset&cen=exp_name&metrics=train/loss', 'train_reward_accelerate?tag=v0.1.0-76-gfbf1f0c&tag=tf_adam&tag=gpt2&cl=tf_adam,gpt2', 'train_reward_jax?tag=v0.1.0-75-g8cc6065&tag=tf_adam&tag=gpt2&cl=jax,tf_adam,gpt2']]
0
2023-08-27
Regression Report: train_reward_accelerate
[['?we=openrlbenchmark&wpn=lm-human-preferences&xaxis=_step&ceik=task_id&cen=task.value.policy.initial_model&metrics=train_reward/minibatch/error', '124M'], ['?we=openrlbenchmark&wpn=lm_human_preference_details&xaxis=_step&ceik=label_dataset&cen=exp_name&metrics=train/loss', 'train_reward_accelerate?tag=v0.1.0-68-g2f3aa38&tag=tf_adam&tag=gpt2&cl=tf_adam,gpt2'], ['?we=tliu&wpn=cleanrl&xaxis=_step&ceik=label_dataset&cen=exp_name&metrics=train/loss', 'train_reward_jax', 'train_reward_accelerate']]
0
2023-08-27
Regression Report: train_reward_accelerate
[['?we=openrlbenchmark&wpn=lm-human-preferences&xaxis=_step&ceik=task_id&cen=task.value.policy.initial_model&metrics=train_reward/minibatch/error', '124M'], ['?we=openrlbenchmark&wpn=lm_human_preference_details&xaxis=_step&ceik=rewards.value.label_dataset&cen=exp_name&metrics=objective/scores&metrics=objective/kl&metrics=objective/entropy&metrics=objective/score_total&metrics=objective/kl_coef&metrics=ppo/loss/total&metrics=ppo/loss/value&metrics=ppo/loss/policy_avg&metrics=ppo/policy/clipfrac_avg&metrics=ppo/policy/entropy_avg&metrics=ppo/returns/mean&metrics=ppo/policy/approxkl_avg&metrics=ppo/val/clipfrac_avg&metrics=ppo/val/error&metrics=ppo/val/mean&metrics=ppo/returns/var&metrics=ppo/val/vpred', 'train_policy_accelerate?tag=v0.1.0-68-g2f3aa38&tag=tf_adam&tag=gpt2&cl=tf_adam,gpt2'], ['?we=tliu&wpn=cleanrl&xaxis=_step&ceik=label_dataset&cen=exp_name&metrics=train/loss', 'train_reward_jax', 'train_reward_accelerate']]
0
2023-08-27
Regression Report: train_reward_accelerate
[['?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=train_reward/minibatch/error', '124M'], ['?we=openrlbenchmark&wpn=lm_human_preference_details&ceik=label_dataset&cen=exp_name&metrics=train/loss', 'train_reward_accelerate?tag=v0.1.0-58-g4f42012&tag=tf_adam&tag=gpt2&cl=tf_adam,gpt2']]
0
2023-08-14
Regression Report: train_policy_accelerate
[['?we=openrlbenchmark&wpn=lm-human-preferences&xaxis=_step&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl&metrics=ppo/objective/entropy&metrics=ppo/objective/score_total&metrics=ppo/objective/kl_coef&metrics=ppo/ppo/loss/total&metrics=ppo/ppo/loss/value&metrics=ppo/ppo/loss/policy&metrics=ppo/ppo/policy/clipfrac&metrics=ppo/ppo/policy/entropy&metrics=ppo/ppo/returns/mean&metrics=ppo/ppo/policy/approxkl&metrics=ppo/ppo/val/clipfrac&metrics=ppo/ppo/val/error&metrics=ppo/ppo/val/mean&metrics=ppo/ppo/returns/var&metrics=ppo/ppo/val/vpred', '124M'], ['?we=openrlbenchmark&wpn=lm_human_preference_details&xaxis=_step&ceik=rewards.value.label_dataset&cen=exp_name&metrics=objective/scores&metrics=objective/kl&metrics=objective/entropy&metrics=objective/score_total&metrics=objective/kl_coef&metrics=ppo/loss/total&metrics=ppo/loss/value&metrics=ppo/loss/policy_avg&metrics=ppo/policy/clipfrac_avg&metrics=ppo/policy/entropy_avg&metrics=ppo/returns/mean&metrics=ppo/policy/approxkl_avg&metrics=ppo/val/clipfrac_avg&metrics=ppo/val/error&metrics=ppo/val/mean&metrics=ppo/returns/var&metrics=ppo/val/vpred', 'train_policy_accelerate?tag=v0.1.0-58-g4f42012&tag=tf_adam&tag=gpt2-large&cl=tf_adam,gpt2-large', 'train_policy_accelerate?tag=v0.1.0-58-g4f42012&tag=pt_adam&tag=gpt2-large&cl=pt_adam,gpt2-large']]
0
2023-08-12
Regression Report: train_reward_accelerate
[['?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=train_reward/minibatch/error', '124M'], ['?we=openrlbenchmark&wpn=lm_human_preference_details&ceik=label_dataset&cen=exp_name&metrics=train/loss', 'train_reward_accelerate?tag=v0.1.0-49-g98820bb&tag=tf_adam&tag=gpt2-large&cl=tf_adam,gpt2-large', 'train_reward_accelerate?tag=v0.1.0-49-g98820bb&tag=pt_adam&tag=gpt2-large&cl=pt_adam,gpt2-large']]
0
2023-08-10
Regression Report: train_reward_accelerate
[['?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=train_reward/minibatch/error', '124M'], ['?we=openrlbenchmark&wpn=lm_human_preference_details&ceik=label_dataset&cen=exp_name&metrics=train/loss', 'train_reward_accelerate?tag=v0.1.0-49-g98820bb&tag=tf_adam&tag=gpt2&cl=tf_adam,gpt2', 'train_reward_accelerate?tag=v0.1.0-49-g98820bb&tag=pt_adam&tag=gpt2&cl=pt_adam,gpt2']]
0
2023-08-10
Regression Report: train_policy_adamw
[['?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl&metrics=ppo/objective/entropy&metrics=ppo/objective/score_total&metrics=ppo/objective/kl_coef&metrics=ppo/ppo/loss/total&metrics=ppo/ppo/loss/value&metrics=ppo/ppo/loss/policy&metrics=ppo/ppo/policy/clipfrac&metrics=ppo/ppo/policy/entropy&metrics=ppo/ppo/returns/mean&metrics=ppo/ppo/policy/approxkl&metrics=ppo/ppo/val/clipfrac&metrics=ppo/ppo/val/error&metrics=ppo/ppo/val/mean&metrics=ppo/ppo/returns/var&metrics=ppo/ppo/val/vpred', '124M'], ['?we=costa-huang&wpn=cleanrl&ceik=rewards.value.label_dataset&cen=exp_name&metrics=objective/scores&metrics=objective/kl&metrics=objective/entropy&metrics=objective/score_total&metrics=objective/kl_coef&metrics=ppo/loss/total&metrics=ppo/loss/value&metrics=ppo/loss/policy&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/policy/approxkl&metrics=ppo/val/clipfrac&metrics=ppo/val/error&metrics=ppo/val/mean&metrics=ppo/returns/var&metrics=ppo/val/vpred', 'train_policy_adamw?tag=v0.1.0-26-ge5aae95']]
0
2023-07-17
Regression Report: train_policy_accelerate
[['?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl&metrics=ppo/objective/entropy&metrics=ppo/objective/score_total&metrics=ppo/objective/kl_coef&metrics=ppo/ppo/loss/total&metrics=ppo/ppo/loss/value&metrics=ppo/ppo/loss/policy&metrics=ppo/ppo/policy/clipfrac&metrics=ppo/ppo/policy/entropy&metrics=ppo/ppo/returns/mean&metrics=ppo/ppo/policy/approxkl&metrics=ppo/ppo/val/clipfrac&metrics=ppo/ppo/val/error&metrics=ppo/ppo/val/mean&metrics=ppo/ppo/returns/var&metrics=ppo/ppo/val/vpred', '124M'], ['?we=costa-huang&wpn=cleanrl&ceik=rewards.value.label_dataset&cen=exp_name&metrics=objective/scores&metrics=objective/kl&metrics=objective/entropy&metrics=objective/score_total&metrics=objective/kl_coef&metrics=ppo/loss/total&metrics=ppo/loss/value&metrics=ppo/loss/policy&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/policy/approxkl&metrics=ppo/val/clipfrac&metrics=ppo/val/error&metrics=ppo/val/mean&metrics=ppo/returns/var&metrics=ppo/val/vpred', 'train_policy_accelerate?tag=v0.1.0-20-gd63c6c3']]
0
2023-07-16
Regression Report: train_policy_accelerate
[['?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl&metrics=ppo/objective/entropy&metrics=ppo/objective/kl_coef&metrics=ppo/ppo/loss/total&metrics=ppo/ppo/loss/value&metrics=ppo/ppo/loss/policy&metrics=ppo/ppo/policy/clipfrac&metrics=ppo/ppo/policy/entropy&metrics=ppo/ppo/returns/mean&metrics=ppo/ppo/policy/approxkl&metrics=ppo/ppo/val/clipfrac&metrics=ppo/ppo/val/error&metrics=ppo/ppo/val/mean&metrics=ppo/ppo/returns/var&metrics=ppo/ppo/val/vpred', '124M'], ['?we=costa-huang&wpn=cleanrl&ceik=rewards.value.label_dataset&cen=exp_name&metrics=objective/scores&metrics=objective/kl&metrics=objective/entropy&metrics=objective/kl_coef&metrics=ppo/loss/total&metrics=ppo/loss/value&metrics=ppo/loss/policy&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/policy/approxkl&metrics=ppo/val/clipfrac&metrics=ppo/val/error&metrics=ppo/val/mean&metrics=ppo/returns/var&metrics=ppo/val/vpred', 'train_policy_accelerate?tag=v0.1.0-20-gd63c6c3']]
0
2023-07-16
Regression Report: train_policy_adam5e-4
[['?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl&metrics=ppo/objective/entropy&metrics=ppo/objective/kl_coef&metrics=ppo/ppo/loss/total&metrics=ppo/ppo/loss/value&metrics=ppo/ppo/loss/policy&metrics=ppo/ppo/policy/clipfrac&metrics=ppo/ppo/policy/entropy&metrics=ppo/ppo/returns/mean&metrics=ppo/ppo/policy/approxkl&metrics=ppo/ppo/val/clipfrac&metrics=ppo/ppo/val/error&metrics=ppo/ppo/val/mean&metrics=ppo/ppo/returns/var&metrics=ppo/ppo/val/vpred', '124M'], ['?we=costa-huang&wpn=cleanrl&ceik=base_model&cen=exp_name&metrics=objective/scores&metrics=objective/kl&metrics=objective/entropy&metrics=objective/kl_coef&metrics=ppo/loss/total&metrics=ppo/loss/value&metrics=ppo/loss/policy&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/policy/approxkl&metrics=ppo/val/clipfrac&metrics=ppo/val/error&metrics=ppo/val/mean&metrics=ppo/returns/var&metrics=ppo/val/vpred', 'train_policy_adam5e-4?tag=v0.1.0-9-gc56a4aa']]
0
2023-07-12
Regression Report: wandb
[['?we=costa-huang&wpn=trl&xaxis=_step&ceik=trl_ppo_trainer_config.value.tracker_project_name&cen=trl_ppo_trainer_config.value.log_with&metrics=env/reward_mean&metrics=objective/kl', 'wandb?tag=gpt2-sentiment&tag=rlops-pilot&cl=sentiment analysis (PR-410)', 'wandb?tag=gpt2-sentiment&tag=pr-457&tag=sgd&cl=sentiment analysis SGD', 'wandb?tag=gpt2-sentiment&tag=pr-457&tag=adam&tag=mideps&cl=sentiment analysis Adam w/ eps=4e-3']]
0
2023-07-10
Regression Report: 124M
[['?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl&metrics=ppo/ppo/loss/policy&metrics=ppo/ppo/val/mean&metrics=ppo/ppo/policy/entropy&metrics=ppo/ppo/policy/approxkl&metrics=ppo/ppo/val/error&metrics=ppo/ppo/loss/total&metrics=ppo/ppo/returns/mean&metrics=train_reward/minibatch/loss&metrics=ppo/ppo/val/vpred&metrics=ppo/ppo/loss/value&metrics=ppo/ppo/val/var_explained&metrics=ppo/objective/score_total&metrics=train_reward/minibatch/error&metrics=ppo/elapsed/fps&metrics=ppo/global_step&metrics=ppo/ppo/policy/clipfrac&metrics=ppo/ppo/val/var&metrics=ppo/ppo/val/clipfrac&metrics=ppo/objective/entropy&metrics=ppo/ppo/returns/var&metrics=ppo/objective/kl_coef&metrics=ppo/elapsed/time', '124M']]
0
2023-07-09
Regression Report: wandb
[['?we=costa-huang&wpn=trl&xaxis=_step&ceik=trl_ppo_trainer_config.value.tracker_project_name&cen=trl_ppo_trainer_config.value.log_with&metrics=env/reward_mean&metrics=objective/kl', 'wandb?tag=gpt2-sentiment&tag=rlops-pilot&cl=sentiment analysis (PR-410)', 'wandb?tag=gpt2-sentiment&tag=pr-423&cl=sentiment analysis (PR-423)', 'wandb?tag=gpt2-sentiment&tag=pr-457&tag=sgd&cl=sentiment analysis SGD']]
0
2023-07-06
Regression Report: wandb
[['?we=costa-huang&wpn=trl&xaxis=_step&ceik=trl_ppo_trainer_config.value.tracker_project_name&cen=trl_ppo_trainer_config.value.log_with&metrics=env/reward_mean&metrics=objective/kl', 'wandb?tag=gpt2-sentiment&tag=rlops-pilot&cl=sentiment analysis (PR-410)', 'wandb?tag=gpt2-sentiment&tag=pr-423&cl=sentiment analysis (PR-423)', 'wandb?tag=gpt2-sentiment&tag=v0.4.6-18-gbbc7eeb&cl=sentiment analysis SGD']]
0
2023-07-06
Regression Report: wandb
[['?we=costa-huang&wpn=trl&xaxis=_step&ceik=trl_ppo_trainer_config.value.tracker_project_name&cen=trl_ppo_trainer_config.value.log_with&metrics=env/reward_mean&metrics=objective/kl', 'wandb?tag=gpt2-sentiment&tag=pr-423&cl=sentiment analysis (PR-423)']]
0
2023-06-28
Regression Report: wandb
[['?we=costa-huang&wpn=trl&xaxis=_step&ceik=trl_ppo_trainer_config.value.tracker_project_name&cen=trl_ppo_trainer_config.value.log_with&metrics=env/reward_mean&metrics=objective/kl', 'wandb?tag=gpt2-sentiment&tag=pr-423&cl=sentiment analysis (PR-423)', 'wandb?tag=gpt2-sentiment-1-nminibs&cl=sentiment analysis (no minibatches, target kl=6)']]
0
2023-06-28
Regression Report: wandb
[['?we=costa-huang&wpn=trl&xaxis=_step&ceik=trl_ppo_trainer_config.value.tracker_project_name&cen=trl_ppo_trainer_config.value.log_with&metrics=env/reward_mean&metrics=objective/kl', 'wandb?tag=gpt2-sentiment&tag=rlops-pilot&cl=sentiment analysis (PR-410)', 'wandb?tag=gpt2-sentiment&tag=pr-423&cl=sentiment analysis (PR-423)', 'wandb?tag=gpt2-sentiment-1-nminibs&cl=sentiment analysis (no minibatches, target kl=6)']]
0
2023-06-28
Regression Report: 124M
[['?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl&metrics=ppo/ppo/loss/policy&metrics=ppo/ppo/val/mean&metrics=ppo/ppo/policy/entropy&metrics=ppo/ppo/policy/approxkl&metrics=ppo/ppo/val/error&metrics=ppo/ppo/loss/total&metrics=ppo/ppo/returns/mean&metrics=train_reward/minibatch/loss&metrics=ppo/ppo/val/vpred&metrics=ppo/ppo/loss/value&metrics=ppo/ppo/val/var_explained&metrics=ppo/objective/score_total&metrics=train_reward/minibatch/error&metrics=ppo/elapsed/fps&metrics=ppo/global_step&metrics=ppo/ppo/policy/clipfrac&metrics=ppo/ppo/val/var&metrics=ppo/ppo/val/clipfrac&metrics=ppo/objective/entropy&metrics=ppo/ppo/returns/var&metrics=ppo/objective/kl_coef&metrics=ppo/elapsed/time', '124M']]
0
2023-06-26
Regression Report: wandb
[['?we=costa-huang&wpn=trl&xaxis=_step&ceik=trl_ppo_trainer_config.value.tracker_project_name&cen=trl_ppo_trainer_config.value.log_with&metrics=env/reward_mean&metrics=objective/kl', 'wandb?tag=calculator_mask&cl=calculator_mask', 'wandb?tag=calculator_mask_direct_rewrad&cl=calculator_mask_direct_rewrad']]
0
2023-06-26
Regression Report: 124M
[['?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl', '124M']]
0
2023-06-23
Regression Report: wandb
[['?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with&metrics=env/reward_mean', 'wandb?tag=calculator_few_shots_env3&tag=pr-429&cl=calculator_env (various improvement 2)']]
0
2023-06-22
Regression Report: wandb
[['?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with&metrics=env/reward_mean', 'wandb?tag=calculator_few_shots_env_no_training&tag=pr-429&cl=baseline (no training at all)', 'wandb?tag=calculator_few_shots_env&tag=pr-429&cl=calculator_env (various improvement)']]
0
2023-06-15
Regression Report: wandb
[['?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with&metrics=env/reward_mean&metrics=objective/kl&metrics=objective/entropy', 'wandb?tag=calculator_few_shots_env&tag=pr-429&cl=calculator_env (various improvement)']]
0
2023-06-15
Regression Report: wandb
[['?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with&metrics=env/reward_mean&metrics=env/reward_std&metrics=objective/kl_coef&metrics=objective/kl&metrics=objective/entropy&metrics=ppo/std_scores&metrics=ppo/mean_scores&metrics=ppo/learning_rate&metrics=ppo/mean_non_score_reward&metrics=ppo/loss/value&metrics=ppo/loss/total&metrics=ppo/loss/policy&metrics=ppo/policy/advantages_mean&metrics=ppo/policy/approxkl&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/returns/var', 'wandb?tag=calculator_few_shots_env&tag=pr-429&cl=calculator_env (various improvement)']]
0
2023-06-15
Regression Report: wandb
[['?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with&metrics=env/reward_mean&metrics=env/reward_std&metrics=objective/kl_coef&metrics=objective/kl&metrics=objective/entropy&metrics=ppo/std_scores&metrics=ppo/mean_scores&metrics=ppo/learning_rate&metrics=ppo/mean_non_score_reward&metrics=ppo/loss/value&metrics=ppo/loss/total&metrics=ppo/loss/policy&metrics=ppo/policy/advantages_mean&metrics=ppo/policy/approxkl&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/returns/var', 'wandb?tag=gpt2-sentiment&tag=rlops-pilot&cl=sentiment analysis (PR-410)', 'wandb?tag=gpt2-sentiment&tag=pr-423&cl=sentiment analysis (PR-423)']]
0
2023-06-09
Regression Report: wandb
[['?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with&metrics=env/reward_mean&metrics=env/reward_std&metrics=objective/kl_coef&metrics=objective/kl&metrics=objective/entropy&metrics=ppo/std_scores&metrics=ppo/mean_scores&metrics=ppo/learning_rate&metrics=ppo/mean_non_score_reward&metrics=ppo/loss/value&metrics=ppo/loss/total&metrics=ppo/loss/policy&metrics=ppo/policy/advantages_mean&metrics=ppo/policy/approxkl&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/returns/var', 'wandb?tag=gpt2-sentiment&cl=sentiment analysis (PR-410)']]
0
2023-06-08
Regression Report: wandb
[['?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with&metrics=env/reward_mean&metrics=env/reward_std&metrics=objective/kl_coef&metrics=objective/kl&metrics=objective/entropy&metrics=ppo/std_scores&metrics=ppo/mean_scores&metrics=ppo/learning_rate&metrics=ppo/mean_non_score_reward&metrics=ppo/loss/value&metrics=ppo/loss/total&metrics=ppo/loss/policy&metrics=ppo/policy/advantages_mean&metrics=ppo/policy/approxkl&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/returns/var', 'wandb?tag=calculator_few_shots&cl=calculator few shots']]
0
2023-06-07
Regression Report: wandb
[['?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with&metrics=env/reward_mean&metrics=env/reward_std&metrics=objective/kl_coef&metrics=objective/kl&metrics=objective/entropy&metrics=ppo/std_scores&metrics=ppo/mean_scores&metrics=ppo/learning_rate&metrics=ppo/mean_non_score_reward&metrics=ppo/loss/value&metrics=ppo/loss/total&metrics=ppo/loss/policy&metrics=ppo/policy/advantages_mean&metrics=ppo/policy/approxkl&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/returns/var', 'wandb?tag=calculator', 'wandb?tag=calculator2&cl=with min_length=1, eos_token_id=-1']]
0
2023-06-07
Regression Report: openai/baselines PPO2
[['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'ppo_continuous_action?tag=v1.0.0-27-gde3f410&cl=CleanRL PPO'], ['?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return', 'baselines-ppo2-mlp?cl=openai/baselines PPO2']]
0
2023-06-05
Regression Report: openrlbenchmark/cleanrl/ddpg_continuous_action_jax ({'tag': ['pr-298']})
[['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'ddpg_continuous_action?tag=pr-371', 'ddpg_continuous_action?tag=pr-299', 'ddpg_continuous_action?tag=rlops-pilot', 'ddpg_continuous_action_jax?tag=pr-371-jax', 'ddpg_continuous_action_jax?tag=pr-298']]
0
2023-06-02
Regression Report: sac_continuous_action
[['?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean', 'a2c', 'ddpg', 'ppo_lstm?cl=PPO w/ LSTM', 'sac', 'td3', 'ppo', 'trpo'], ['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'sac_continuous_action?tag=rlops-pilot&cl=SAC']]
0
2023-05-05
Regression Report: ddpg_continuous_action_jax
[['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'ddpg_continuous_action?tag=pr-371', 'ddpg_continuous_action_jax?tag=pr-371-jax']]
0
2023-05-03
Regression Report: dqn_jax
[['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'dqn?tag=pr-370', 'dqn_jax?tag=pr-370-jax', 'dqn?tag=rlops-pilot', 'dqn_jax?tag=rlops-pilot']]
0
2023-05-03
Regression Report: dqn_atari_jax
[['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'dqn_atari_jax?tag=rlops-pilot', 'dqn_atari_jax?tag=pr-370-atari-jax']]
0
2023-05-03
Regression Report: moolib_impala_envpool_machado
[['?we=openrlbenchmark&wpn=moolib-atari&ceik=env_id&cen=exp_name&metric=global/mean_episode_return', 'moolib_impala_envpool_machado?cl=Moolib (Resnet CNN, 1st set 3 seeds) 1 A100, 10 CPU'], ['?we=costa-huang&wpn=moolib-atari-2&ceik=env_id&cen=exp_name&metric=global/mean_episode_return', 'moolib_impala_envpool_machado?cl=Moolib (Resnet CNN, 2nd set 3 seeds) 1 A100, 10 CPU']]
0
2023-04-18
Regression Report: ppo_atari_envpool_symlog
[['?we=ryan-colab&wpn=PPO-v3&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'ppo_atari_envpool_unclipped?tag=v0.0.1-61-g381cb14&cl=reward_clip=False', 'ppo_atari_envpool?tag=v0.0.1-61-g381cb14&cl=baseline (reward_clip=True)', 'ppo_atari_envpool_symlog?tag=v0.0.1-61-g381cb14&cl=symlog_on_reward']]
0
2023-03-29
Regression Report: ddpg_continuous_action
[['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'ddpg_continuous_action?tag=pr-299', 'ddpg_continuous_action?tag=rlops-pilot']]
0
2023-03-26
Regression Report: moolib_impala_alepy_8gpu
[['?we=openrlbenchmark&wpn=cleanba&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return', 'cleanba_ppo_envpool_impala_atari_wrapper?tag=v0.0.1-16-g32dbf31&cl=baseline (8 A100)'], ['?we=openrlbenchmark&wpn=moolib-atari&ceik=env_id&cen=exp_name&metric=global/mean_episode_return', 'moolib_impala_alepy_4gpu', 'moolib_impala_alepy_8gpu']]
0
2023-03-18
Regression Report: cleanba_ppo_envpool_procgen
[['?we=openrlbenchmark&wpn=cleanba&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return', 'cleanba_ppo_envpool_procgen?tag=v0.0.1-1-gf0c2e8c']]
0
2023-02-23
Regression Report: ppo_dmc_envpool
[['?we=dream-team-v3&wpn=PPO-v3&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'ppo_dmc_envpool?tag=v0.0.1-14-gb2aee2d']]
0
2023-02-22
Regression Report: ppo_atari_envpool
[['?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return', 'baselines-ppo2-cnn'], ['?we=dream-team-v3&wpn=PPO-v3&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'ppo_atari_envpool?tag=v0.0.1-5-g61d4028']]
0
2023-02-22
Regression Report: ppo_atari_envpool
[['?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return', 'baselines-ppo2-cnn'], ['?we=dream-team-v3&wpn=PPO-v3&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'ppo_atari_envpool']]
0
2023-02-22
Regression Report: ppo_continuous_action_envpool_xla_jax_scan
[['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'ppo_continuous_action_8M?tag=v1.0.0-13-gcbd83f6'], ['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return', 'ppo_continuous_action_envpool_xla_jax_scan?tag=v1.0.0-jax-ca-be3113b']]
1
2023-02-04
Regression Report: dqn_atari_jax
[['?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean', 'dqn', 'ppo_lstm'], ['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'c51_atari_jax', 'dqn_atari_jax']]
0
2023-01-17
Regression Report: ppo_atari_envpool_xla_jax_scan
[['?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean', 'ppo', 'ppo_lstm'], ['?we=tianshou&wpn=atari.benchmark&ceik=task&cen=algo_name&metric=test/reward', 'iqn', 'ppo', 'rainbow', 'fqf', 'c51', 'dqn', 'qrdqn'], ['?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return', 'baselines-ppo2-cnn'], ['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return', 'ppo_atari_envpool_xla_jax_scan?tag=pr-328']]
0
2023-01-02
Regression Report: ppo_atari_envpool_xla_jax_truncation
[['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return', 'ppo_atari_envpool_xla_jax_scan?tag=pr-328&user=51616', 'ppo_atari_envpool_xla_jax?tag=pr-328&user=51616'], ['?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return', 'baselines-ppo2-cnn'], ['?we=openrlbenchmark&wpn=envpool-atari&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return', 'ppo_atari_envpool_xla_jax_truncation?user=costa-huang']]
0
2022-12-21
Regression Report: ppo_atari_envpool_xla_jax_scan
[['?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return', 'baselines-ppo2-cnn'], ['?we=openrlbenchmark&wpn=envpool-atari&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return', 'ppo_atari_envpool_xla_jax_truncation'], ['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return', 'ppo_atari_envpool_xla_jax_scan?tag=pr-328']]
0
2022-12-20
Regression Report: ppo_atari_envpool_xla_jax_scan
[['?we=openrlbenchmark&wpn=envpool-atari&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return', 'ppo_atari_envpool_xla_jax_truncation'], ['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return', 'ppo_atari_envpool_xla_jax_scan?tag=pr-328']]
0
2022-12-20
Regression Report: sac_continuous_action
[['?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean', 'a2c', 'ddpg', 'ppo_lstm', 'sac', 'td3', 'ppo', 'trpo'], ['?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return', 'sac_continuous_action?tag=rlops-pilot']]
0
2022-12-16
Regression Report: ppo_continuous_action
['ddpg_continuous_action_jax?user=joaogui1&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=joaogui1&tag=pr-298', 'ddpg_continuous_action_jax?user=costa-huang&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=costa-huang&tag=pr-298', 'ddpg_continuous_action?user=costa-huang&tag=pr-299', 'ppo_continuous_action?user=costa-huang&tag=rlops-pilot']
0
2022-12-08
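The records from late 2022 onward use a shorter form: each list element is just an experiment name, optionally followed by `?`-separated refinements such as `user`, `tag`, `metric`, or a custom legend label `cl`. A minimal sketch of splitting one such entry (field meanings inferred, as above):

```python
from urllib.parse import parse_qs

# Per-experiment entry from the record above: the experiment name comes
# before the '?'; the query part narrows which runs are selected.
entry = "ddpg_continuous_action_jax?user=joaogui1&tag=rlops-pilot"
name, _, query = entry.partition("?")
overrides = parse_qs(query)
print(name)       # ddpg_continuous_action_jax
print(overrides)  # {'user': ['joaogui1'], 'tag': ['rlops-pilot']}
```

`str.partition` is used instead of `split("?")` so that an entry with no refinements (e.g. a bare `'sac'`) still yields the name plus an empty override dict.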
Regression Report: ppo_atari_envpool_xla_vclip_jax
['ppo_atari_envpool_xla_jax?metric=charts/avg_episodic_return', 'ppo_atari_envpool_xla_vclip_jax?metric=charts/avg_episodic_return']
0
2022-12-08
Regression Report: sac_jax
['sac_continuous_action_jax?tag=pr-300', 'sac_jax?tag=rlops-pilot']
0
2022-11-22
Regression Report: ppo_atari_envpool_xla_vclip_jax
['baselines-ppo2-cnn?wpn=baselines&we=openrlbenchmark&ceik=gym_id', 'ppo_atari_envpool_xla_jax_truncation?metric=charts/avg_episodic_return', 'ppo_atari_envpool_xla_jax?metric=charts/avg_episodic_return', 'ppo_atari_envpool_xla_vclip_jax?metric=charts/avg_episodic_return']
0
2022-11-12
Regression Report: ppo_continuous_action
['ddpg_continuous_action_jax?user=joaogui1&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=joaogui1&tag=pr-298', 'ddpg_continuous_action_jax?user=costa-huang&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=costa-huang&tag=pr-298', 'ddpg_continuous_action?user=costa-huang&tag=pr-299', 'ppo_continuous_action?user=costa-huang&tag=rlops-pilot']
0
2022-11-10
Regression Report: ppo_continuous_action
['ddpg_continuous_action_jax?user=joaogui1&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=joaogui1&tag=pr-298', 'ddpg_continuous_action_jax?user=costa-huang&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=costa-huang&tag=pr-298', 'ddpg_continuous_action?user=costa-huang&tag=pr-299', 'ppo_continuous_action?user=costa-huang&tag=rlops-pilot']
0
2022-11-08
Regression Report: ddpg_continuous_action_jax
['ddpg_continuous_action_jax?user=joaogui1&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=joaogui1&tag=pr-298', 'ddpg_continuous_action_jax?user=costa-huang&tag=rlops-pilot']
0
2022-11-08
Regression Report: ppo_continuous_action
['ddpg_continuous_action_jax?user=joaogui1&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=joaogui1&tag=pr-298', 'ddpg_continuous_action_jax?user=costa-huang&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=costa-huang&tag=pr-298', 'ddpg_continuous_action?user=costa-huang&tag=pr-299', 'ppo_continuous_action?user=costa-huang&tag=rlops-pilot']
0
2022-11-08
Regression Report: ddpg_continuous_action
['ddpg_continuous_action_jax?user=joaogui1&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=joaogui1&tag=pr-298', 'ddpg_continuous_action_jax?user=costa-huang&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=costa-huang&tag=pr-298', 'ddpg_continuous_action?user=costa-huang&tag=pr-299']
0
2022-11-08
Regression Report: ddpg_continuous_action_jax
['ddpg_continuous_action_jax?user=joaogui1&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=joaogui1&tag=pr-298', 'ddpg_continuous_action_jax?user=costa-huang&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=costa-huang&tag=pr-298', 'ddpg_continuous_action?user=costa-huang&tag=pr-299', 'ppo_continuous_action?user=costa-huang&tag=rlops-pilot']
0
2022-11-08
Regression Report: ddpg_continuous_action
['ddpg_continuous_action_jax?user=joaogui1&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=joaogui1&tag=pr-298', 'ddpg_continuous_action_jax?user=costa-huang&tag=rlops-pilot', 'ddpg_continuous_action_jax?user=costa-huang&tag=pr-298', 'ddpg_continuous_action?user=costa-huang&tag=pr-299']
0
2022-11-08
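The export flattens each report into four lines that appear to map onto the column headers at the top of the page: title, filter list, an edit count (plausibly "Last edited"), and the "Created On" date. The field names in the sketch below are assumptions based on that mapping; `group_reports` is a hypothetical helper for regrouping such a flat dump back into records:

```python
from typing import Iterator

def group_reports(lines: list[str]) -> Iterator[dict]:
    """Regroup flattened 4-line report rows: title, filters, edit count, date."""
    for i in range(0, len(lines), 4):
        title, filters, edits, created = lines[i:i + 4]
        yield {
            "title": title.removeprefix("Regression Report: "),
            "filters": filters,
            "last_edited": int(edits),
            "created_on": created,
        }

rows = [
    "Regression Report: sac_jax",
    "['sac_continuous_action_jax?tag=pr-300', 'sac_jax?tag=rlops-pilot']",
    "0",
    "2022-11-22",
]
reports = list(group_reports(rows))
```

With structured records in hand, the listing can be sorted or deduplicated by `(title, filters, created_on)` rather than by raw lines.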
(17 untitled report entries, created between 2022-10-06 and 2022-11-03)
Archived - Atari: CleanRL's PPO
A comparison of the performance of CleanRL's PPO on Atari games.
0
2022-06-02