
sebulba various settings

Created on January 27 | Last edited on January 28
Key findings:
  • Adding a timeout significantly reduces the actors' params_queue get time, especially when paired with a dedicated GPU. However, it comes with a side effect: on a timeout the actor cannot pull the latest params from the learner and acts with its old (stale) params instead.
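The timeout behavior above can be sketched as follows. This is a minimal, hypothetical illustration (the function name and timeout value are my own, not from the actual sebulba code): the actor tries to fetch fresh params from the learner's queue, and on timeout it falls back to the stale copy it already has.

```python
import queue

def get_params(params_queue, current_params, timeout=0.02):
    """Try to pull fresh params from the learner; on timeout, keep the stale copy."""
    try:
        return params_queue.get(timeout=timeout), True
    except queue.Empty:
        # Side effect noted above: the actor keeps acting with old params.
        return current_params, False

params_queue = queue.Queue(maxsize=1)
# Learner hasn't published yet: the actor times out and keeps its old params.
params, fresh = get_params(params_queue, {"step": 0})
# Learner publishes new params: the next fetch picks them up immediately.
params_queue.put({"step": 1})
params, fresh2 = get_params(params_queue, params)
```

The trade-off is exactly the one noted above: a short timeout keeps the actor busy, at the cost of occasionally rolling out with outdated params.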

1. rollout is faster than training


[Chart: stats/rollout_time and stats/training_time vs. global_step for sebulba_ppo_envpool_thpt_rollout_is_faster on Breakout-v5]

1.1 rollout is faster than training w/ timeout

Runs compared (with and without the params_queue timeout):
  • sebulba_ppo_envpool_1gpu_rollout_is_faster
  • sebulba_ppo_envpool_a0_l1_rollout_is_faster
  • sebulba_ppo_envpool_a0_l01_rollout_is_faster
  • sebulba_ppo_envpool_a0_l12_rollout_is_faster
  • sebulba_ppo_envpool_1gpu_rollout_is_faster_timeout
  • sebulba_ppo_envpool_a0_l1_rollout_is_faster_timeout
  • sebulba_ppo_envpool_a0_l01_rollout_is_faster_timeout
  • sebulba_ppo_envpool_a0_l12_rollout_is_faster_timeout




1.2 rollout is much faster than training (heavy, slow learner updates) w/ timeout in params queue

💡 Note that sebulba_ppo_envpool_a0_l01_rollout_is_faster performs the same as sebulba_ppo_envpool_a0_l12_rollout_is_faster; the difference only becomes significant once we make training even slower.

[Run set: sebulba_ppo_envpool_thpt_rollout_is_faster]

Note that in sebulba_ppo_envpool_a0_l01_rollout_is_much_faster the params_queue is full, yet the actor can't roll out quickly because it shares GPU0 with the learner, which uses both GPU0 and GPU1. In sebulba_ppo_envpool_a0_l12_rollout_is_much_faster, by contrast, the actor steps faster because it has a dedicated GPU.
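The run names encode the device layout; my reading of the suffix (an assumption, not confirmed by the source) is a<actor GPU>_l<learner GPUs>, so a0_l01 puts the actor on GPU0 while the learner occupies GPU0 and GPU1, whereas a0_l12 gives the actor GPU0 to itself. A small sketch decoding that hypothetical convention makes the contention explicit:

```python
def parse_devices(run_suffix):
    """Decode the hypothetical a<actor>_l<learners> run-name convention."""
    actor_part, learner_part = run_suffix.split("_")
    actor_gpu = int(actor_part[1:])                    # e.g. "a0" -> 0
    learner_gpus = [int(c) for c in learner_part[1:]]  # e.g. "l01" -> [0, 1]
    shares_gpu = actor_gpu in learner_gpus             # contention if shared
    return actor_gpu, learner_gpus, shares_gpu

# a0_l01: actor and learner both use GPU0 -> contention slows the rollout.
print(parse_devices("a0_l01"))  # -> (0, [0, 1], True)
# a0_l12: actor has GPU0 to itself -> it can step at full speed.
print(parse_devices("a0_l12"))  # -> (0, [1, 2], False)
```

Under this reading, the throughput gap between the two runs is exactly the shares_gpu flag: the a0_l01 actor competes with the learner for GPU0, while the a0_l12 actor does not.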

[Charts: throughput for sebulba_ppo_envpool_a0_l01_rollout_is_much_faster and sebulba_ppo_envpool_a0_l12_rollout_is_much_faster]


2. training is faster than rollout


💡 Warning: I was actually wrong here. Training is not really faster than rollout: once we account for the time the actor spends transferring data to the learner's device, roughly 0.06 seconds, the actor thread and learner thread run at roughly the same speed.
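A back-of-envelope check of that accounting (the rollout and training times below are illustrative placeholders; only the ~0.06 s transfer figure comes from the note above):

```python
rollout_time = 0.10    # hypothetical actor rollout time per iteration (s)
transfer_time = 0.06   # actor -> learner device transfer (from the note above)
training_time = 0.16   # hypothetical learner update time per iteration (s)

# Charging the transfer against the actor makes both threads take ~0.16 s
# per iteration, so neither side stalls waiting on the other.
actor_iteration_time = rollout_time + transfer_time
balanced = abs(actor_iteration_time - training_time) < 1e-9
```

This is why the section title is misleading: the learner only looks faster if the transfer cost is ignored.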

[Run set: sebulba_ppo_envpool_thpt_rollout_is_faster]



[Charts: throughput]
Runs compared:
  • sebulba_ppo_envpool_1gpu_training_is_faster
  • sebulba_ppo_envpool_a0_l1_training_is_faster
  • sebulba_ppo_envpool_a0_l01_training_is_faster
  • sebulba_ppo_envpool_a0_l12_training_is_faster


3. rollout is roughly as fast as training


[Run set: sebulba_ppo_envpool_thpt_rollout_is_faster]