Section 1

The environment has 3 phases. In phase 1 the agent should pick up a key (no immediate reward). It then teleports to phase 2, where it receives an immediate reward of 3 points per gift (4 gifts total). Finally it teleports to phase 3, where it should go to the goal. If the agent is carrying the key when it reaches the goal, it receives an extra reward for having the key.

Max possible reward in the last phase: 5 (goal) + 15 (key) = 20

The 2nd phase adds up to 4 gifts * 3 points/gift = 12 gift points ==>

total 32 points max (20 + 12) when the time spent in the gifts env != 0.
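
The arithmetic above can be checked with a small sketch. The constant and function names here are hypothetical (they do not come from the environment's code); only the point values are taken from the description, and the key bonus is assumed to be paid only when the goal is reached, as the text states.

```python
# Point values from the description above; all names are illustrative assumptions.
GOAL_REWARD = 5   # reaching the goal in phase 3
KEY_BONUS = 15    # extra reward for carrying the key at the goal
GIFT_REWARD = 3   # per gift collected in phase 2
NUM_GIFTS = 4     # gifts available in phase 2


def episode_return(has_key: bool, gifts_collected: int, reached_goal: bool) -> int:
    """Total undiscounted reward for one episode under the rules described above."""
    if not 0 <= gifts_collected <= NUM_GIFTS:
        raise ValueError("gifts_collected must be between 0 and NUM_GIFTS")
    total = GIFT_REWARD * gifts_collected  # phase 2: immediate gift rewards
    if reached_goal:
        total += GOAL_REWARD               # phase 3: goal reward
        if has_key:
            total += KEY_BONUS             # phase 3: delayed payoff for the phase 1 key
    return total


if __name__ == "__main__":
    print(episode_return(has_key=True, gifts_collected=NUM_GIFTS, reached_goal=True))
```

With the key, all 4 gifts, and the goal reached, this gives the 32-point maximum; skipping the gifts entirely gives 20.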
