Figure: Distractor gift reward variance. … variance=0; Expt (2): phase 2 rewards [3, 7] => variance=1.33; Expt (3): phase 2 rewards [0, 10] => variance=8.33.
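The variances quoted for Expt (2) and Expt (3) match what a continuous uniform distribution over each reward range would give, namely (high - low)^2 / 12. The sketch below checks that arithmetic. The uniform-distribution reading is an assumption inferred from the numbers, not something stated in the source, and `uniform_variance` is a hypothetical helper name.

```python
def uniform_variance(low: float, high: float) -> float:
    """Variance of a continuous uniform distribution on [low, high].

    Assumption: gift rewards are drawn uniformly from the quoted range.
    """
    return (high - low) ** 2 / 12.0


# Reward ranges quoted for Expt (2) and Expt (3); both have mean 5.
experiments = {"Expt (2)": (3, 7), "Expt (3)": (0, 10)}

for label, (low, high) in experiments.items():
    mean = (low + high) / 2
    var = uniform_variance(low, high)
    print(f"{label}: rewards [{low}, {high}] -> mean={mean:.1f}, variance={var:.2f}")
```

Under this assumption, widening the range leaves the mean gift reward at 5 and changes only the variance.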

Section 1

The environment has 3 phases. In phase 1, the agent should pick up a key (no immediate reward). It then teleports to phase 2, where it can collect 4 gifts, each giving an immediate reward with a mean of 5 points. Finally, it teleports to phase 3, where it should go to the goal. If the agent is carrying the key when it reaches the goal, it receives an extra reward for having the key.

Max possible reward in the last phase: 5 (goal) + 15 (key) = 20

Max possible total reward: 20 (average reward in the middle phase: 4 gifts × 5 points) + 20 (max reward in the last phase) = 40
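The reward accounting above can be sketched as a small function. This is only an illustration of the arithmetic: the constant and function names are hypothetical, and it assumes the key bonus is paid only when the agent actually reaches the goal.

```python
GOAL_REWARD = 5        # phase 3: reaching the goal
KEY_BONUS = 15         # phase 3: extra reward for carrying the key at the goal
NUM_GIFTS = 4          # phase 2: number of gifts
MEAN_GIFT_REWARD = 5   # phase 2: mean reward per gift


def expected_return(picked_up_key: bool,
                    gifts_collected: int = NUM_GIFTS,
                    reached_goal: bool = True) -> int:
    """Expected total reward for one episode (hypothetical helper)."""
    # Phase 1 (key pickup) gives no immediate reward.
    phase2 = gifts_collected * MEAN_GIFT_REWARD
    phase3 = 0
    if reached_goal:
        phase3 += GOAL_REWARD
        if picked_up_key:
            phase3 += KEY_BONUS  # assumption: bonus requires reaching the goal
    return phase2 + phase3


print(expected_return(picked_up_key=True))   # 40
print(expected_return(picked_up_key=False))  # 25
```

In this accounting, the key contributes 15 of the 40 points, and that reward only arrives in phase 3, after all the gift rewards.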
