
Initial Training in Mode 2 w/ Control2

This is training done on the High Speed Ring in a white MR2 (AT), with:

obs_space = spaces.Tuple((rState, eClutch, eSpeed, eBoost, eGear, vSpeed, vSteer, vDir, vColl, rLeftSlip, rRightSlip, fLeftSlip, fRightSlip, fLWheel, fRWheel, rLWheel, rRWheel, images))

Digital accelerator (no brakes) and digital steering
64 x 64 CNN
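For readers unfamiliar with these channels, the observation space could be assembled with gymnasium spaces roughly as sketched below; the bounds, dtypes, and channel interpretations are illustrative assumptions, not the exact values used in this run.

import numpy as np
from gymnasium import spaces

def scalar(low, high):
    # One telemetry value as a 1-element Box; all bounds below are guesses for illustration.
    return spaces.Box(low=low, high=high, shape=(1,), dtype=np.float32)

rState, eClutch, vColl = scalar(0, 1), scalar(0, 1), scalar(0, 1)            # race state, clutch, collision flag
eSpeed, eBoost, eGear = scalar(0, 10000), scalar(0, 2), spaces.Discrete(7)   # rpm, boost, gear
vSpeed, vSteer, vDir = scalar(0, 400), scalar(-1, 1), spaces.Discrete(2)     # speed, steering, direction flag
rLeftSlip = rRightSlip = fLeftSlip = fRightSlip = scalar(0, 1)               # per-tyre slip
fLWheel = fRWheel = rLWheel = rRWheel = scalar(0, 500)                       # per-wheel speeds
images = spaces.Box(low=0, high=255, shape=(64, 64, 1), dtype=np.uint8)      # 64 x 64 greyscale frame for the CNN

obs_space = spaces.Tuple((rState, eClutch, eSpeed, eBoost, eGear, vSpeed, vSteer, vDir, vColl,
                          rLeftSlip, rRightSlip, fLeftSlip, fRightSlip,
                          fLWheel, fRWheel, rLWheel, rRWheel, images))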

Reward Mechanism

self.reward = (best_index - self.cur_idx) / self.fudgeFactor
if vSpeed < 20 and vDir == 1:  # going slow backwards
    self.reward = self.reward - 0.02  # this should have been higher
else:  # constant penalty that can only be overcome by going forward
    self.reward = self.reward - 0.05
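As a rough illustration of how the progress term and the per-step penalty combine (fudgeFactor and the telemetry values here are made-up numbers, not this run's settings):

def step_reward(best_index, cur_idx, vSpeed, vDir, fudgeFactor=10.0):
    # Progress term plus the per-step penalty from the snippet above.
    reward = (best_index - cur_idx) / fudgeFactor
    if vSpeed < 20 and vDir == 1:
        reward -= 0.02
    else:
        reward -= 0.05
    return reward

# Gaining 5 track indexes at speed: 5 / 10 - 0.05 = 0.45
print(step_reward(best_index=105, cur_idx=100, vSpeed=120, vDir=1))
# No progress while crawling (vSpeed < 20, vDir == 1): only the -0.02 penalty remains.
print(step_reward(best_index=100, cur_idx=100, vSpeed=5, vDir=1))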

Training & Testing

Training was initially restricted to 500 steps of 0.05 s each, which is enough time to drive off the start-finish line and approach the first corner. At first the agent zig-zags and often gets into a spin, but eventually it learns to mostly hold a straight line.
The second stage of training was restricted to 750 steps, which is enough time to get round the first high-speed banked left corner. The agent quickly learns that it should turn left, but struggles to avoid the grass and the wall. Because the corner is banked, the dynamics become harder to control once the car starts to slide. However, the agent is then able to learn to control the slide if and when it occurs.
The part that is inconsistent is the short straight section after the first corner: the agent struggles to consistently recognise the grey wall on the right-hand side, which seems to confuse it.
The third stage of training allows 1000 steps, which is enough time to complete the second left corner and reach the first right corner. It is fascinating that the agent has generalised enough to clumsily navigate the left corner but struggles on the right turn, because that is a scenario it has never encountered before.
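One way to implement this staged schedule is to truncate episodes with a step-limit wrapper; the sketch below uses gymnasium's TimeLimit and is an assumption about how the harness could be wired, not the actual training code.

from gymnasium.wrappers import TimeLimit

# Hypothetical curriculum: each stage extends the episode just far enough
# to reach the next track feature (first corner, second corner, first right-hander).
STAGE_STEPS = [500, 750, 1000]   # steps of 0.05 s each -> 25 s, 37.5 s, 50 s of driving

def wrap_for_stage(base_env, stage):
    # Truncate every episode at the step budget of the current curriculum stage.
    return TimeLimit(base_env, max_episode_steps=STAGE_STEPS[stage])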
During training at around epoch 337, the emulator crashed, so unfortunately it is unclear what the agent was learning (if anything).

[Training charts for run: GTAI_mode2_control_2]

System Parameters

[System parameter charts for run: GTAI_mode2_control_2]

# The reward is then proportional to the number of passed indexes (i.e., track distance):
terminated = False  # assumed default here; only the commented-out bad-direction check below would set it
self.reward = (best_index - self.cur_idx) / self.fudgeFactor
# if vColl > 0:  # hit
#     self.reward = self.reward - 0.02
# if self.reward < 0:
#     self.badDirectionSteps = self.badDirectionSteps + 1
if vSpeed < 20 and vDir == 1:  # going slow
    self.reward = self.reward - 0.02
else:  # going back
    self.reward = self.reward - 0.05
self.cur_idx = best_index
if self.firstLoop:  # hack in case the car does not start at index 0
    self.reward = 0.0
    self.firstLoop = False
# if self.badDirectionSteps > self.maxBadDirectionSteps:
#     terminated = True
self.reward = self.reward / 1.0  # final scaling (currently a no-op)

return self.reward, terminated
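For context, here is a hedged sketch of how a reward computed this way typically feeds back into a gymnasium-style step loop; every method and attribute name below, other than the reward logic itself, is a made-up placeholder.

def step(self, action):
    # Hypothetical env.step(): apply the control, read fresh telemetry,
    # then score progress with the reward logic shown above.
    obs = self._apply_action_and_observe(action)             # placeholder helper
    reward, terminated = self._compute_reward(obs)           # placeholder wrapper around the snippet above
    self.step_count += 1
    truncated = self.step_count >= self.max_episode_steps    # 500 / 750 / 1000 depending on the stage
    return obs, reward, terminated, truncated, {}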