sebulba various settings
Key findings:
- Adding a timeout significantly reduces the params_queue get time for the actors, especially when paired with a dedicated GPU. However, it comes with a side effect: if the get times out, the actor cannot pull the latest params from the learner and reuses the old params instead (see the sketch below).
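As a concrete illustration, here is a minimal sketch of the timeout on the actor side, assuming the learner publishes params through a standard Python queue.Queue; the function name get_params and the default timeout value are illustrative, not the actual script's API.

```python
import queue


def get_params(params_queue: queue.Queue, current_params, timeout: float = 1.0):
    """Fetch fresh params if the learner has published any; otherwise reuse the old ones."""
    try:
        # With a timeout, the actor waits at most `timeout` seconds instead of
        # blocking until the learner finishes its update.
        return params_queue.get(timeout=timeout)
    except queue.Empty:
        # The side effect noted above: the rollout proceeds with stale params,
        # so the collected data becomes slightly more off-policy.
        return current_params
```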
1. rollout is faster than training
[Panel: sebulba_ppo_envpool_thpt_rollout_is_faster]
1.1 rollout is faster than training w/ timeout
[Panel: sebulba_ppo_envpool_1gpu_rollout_is_faster]
[Panel: sebulba_ppo_envpool_a0_l1_rollout_is_faster]
[Panel: sebulba_ppo_envpool_a0_l01_rollout_is_faster]
[Panel: sebulba_ppo_envpool_a0_l12_rollout_is_faster]
[Panel: sebulba_ppo_envpool_1gpu_rollout_is_faster_timeout]
[Panel: sebulba_ppo_envpool_a0_l1_rollout_is_faster_timeout]
[Panel: sebulba_ppo_envpool_a0_l01_rollout_is_faster_timeout]
[Panel: sebulba_ppo_envpool_a0_l12_rollout_is_faster_timeout]
1.2 rollout is much faster than training (heavily slowed learner updates) w/ timeout in params queue
Note that sebulba_ppo_envpool_a0_l01_rollout_is_faster performs the same as sebulba_ppo_envpool_a0_l12_rollout_is_faster. However, as the training is slowed down further, the difference becomes more significant.
[Panel: sebulba_ppo_envpool_thpt_rollout_is_faster]
Note that in sebulba_ppo_envpool_a0_l01_rollout_is_much_faster the params_queue is full, yet the actor still cannot roll out quickly because it shares GPU0 with the learner (which uses both GPU0 and GPU1). In sebulba_ppo_envpool_a0_l12_rollout_is_much_faster, by contrast, the actor steps faster because it has its own GPU.
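For reference, below is a rough sketch of the device placement that the run names suggest (a0 = actor on GPU 0; l01 / l12 = learner on GPUs 0 and 1, or 1 and 2), written with plain JAX device handles. The variable names are illustrative and assume at least three visible GPUs.

```python
import jax

gpus = jax.devices("gpu")

# "a0_l01": the actor shares GPU 0 with the learner, so rollout inference
# and the learner's backward passes compete for the same device.
actor_device_shared = gpus[0]
learner_devices_shared = [gpus[0], gpus[1]]

# "a0_l12": the actor keeps GPU 0 to itself while the learner trains on
# GPUs 1 and 2, so rollout stepping is not slowed down by the update.
actor_device_dedicated = gpus[0]
learner_devices_dedicated = [gpus[1], gpus[2]]

# After each update, the learner's params are moved back to the actor's
# device before the next rollout, e.g. jax.device_put(params, actor_device_dedicated).
```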
[Panel: sebulba_ppo_envpool_a0_l01_rollout_is_much_faster]
[Panel: sebulba_ppo_envpool_a0_l12_rollout_is_much_faster]
2. training is faster than rollout
Warning: I was actually wrong here; training is not really faster than rollout. However, once we account for the time the actor spends transferring data to the learner device, roughly 0.06 s, the actor thread and the learner thread end up running at roughly the same speed.
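A quick way to sanity-check that ~0.06 s figure is to time the actor-to-learner device transfer directly. The sketch below assumes a JAX setup with at least two devices and uses an illustrative batch shape rather than the exact rollout size from these runs.

```python
import time

import jax
import jax.numpy as jnp

devices = jax.devices()
actor_device, learner_device = devices[0], devices[-1]

# Illustrative rollout-sized batch of uint8 Atari frames (not the exact shape used here).
obs = jax.device_put(jnp.zeros((64, 128, 84, 84, 4), dtype=jnp.uint8), actor_device)
obs.block_until_ready()

start = time.perf_counter()
obs_on_learner = jax.device_put(obs, learner_device)
obs_on_learner.block_until_ready()  # wait for the asynchronous transfer to finish
print(f"actor -> learner transfer took {time.perf_counter() - start:.3f}s")
```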
[Panel: sebulba_ppo_envpool_thpt_rollout_is_faster]
[Panel: sebulba_ppo_envpool_1gpu_training_is_faster]
[Panel: sebulba_ppo_envpool_a0_l1_training_is_faster]
[Panel: sebulba_ppo_envpool_a0_l01_training_is_faster]
[Panel: sebulba_ppo_envpool_a0_l12_training_is_faster]
3. rollout is roughly as fast as training
[Panel: sebulba_ppo_envpool_thpt_rollout_is_faster]