The Double Buffer Method uses two replay buffers: the first stores experiences generated by robot actions and the second stores experiences generated by human actions. The proportion of experiences sampled from each buffer is controlled by a parameter called Alpha.
When Alpha is set to 1.0, the model learns exclusively from experiences sampled from the Robot Action Buffer.
With Alpha at 0.9, the model draws 90% of each training batch from the Robot Action Buffer and the remaining 10% from the Human Action Buffer.
Similarly, with Alpha at 0.8, 80% of each batch is sampled from the Robot Action Buffer and 20% from the Human Action Buffer. A minimal sampling sketch is given below.
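As a rough illustration of the mixing rule, the sketch below draws an Alpha fraction of each batch from the robot buffer and the rest from the human buffer. The names `ReplayBuffer` and `sample_double_buffer`, and the assumption of uniform sampling within each buffer, are hypothetical and not taken from the project code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Simple FIFO replay buffer storing (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def add(self, transition):
        self.memory.append(transition)

    def sample(self, n):
        # Uniformly sample up to n transitions (fewer if the buffer is not yet full).
        return random.sample(self.memory, min(n, len(self.memory)))

    def __len__(self):
        return len(self.memory)


def sample_double_buffer(robot_buffer, human_buffer, batch_size, alpha):
    """Draw a mixed batch: a fraction `alpha` from the robot buffer,
    the remainder from the human buffer (alpha=1.0 means robot-only)."""
    n_robot = int(round(alpha * batch_size))
    n_human = batch_size - n_robot
    batch = robot_buffer.sample(n_robot) + human_buffer.sample(n_human)
    random.shuffle(batch)  # mix the two sources within the batch
    return batch
```

For example, with a batch size of 64 and Alpha = 0.9, roughly 58 transitions would come from the Robot Action Buffer and 6 from the Human Action Buffer.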
Five Seeds with Grouping (5000 Episodes):
[Line chart: Average Reward (logged as average_score) across the grouped runs.]