
Initial Training in Mode 2 w/ Control2

This is training done on the High Speed Ring in a white MR2 (AT), with:

obs_space = spaces.Tuple((rState, eClutch, eSpeed, eBoost, eGear, vSpeed, vSteer, vDir, vColl, rLeftSlip, rRightSlip, fLeftSlip, fRightSlip, fLWheel, fRWheel, rLWheel, rRWheel, images))

Digital accelerator (no brakes) and digital steering
64 x 64 CNN
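For readers unfamiliar with these channels, the observation space could be assembled with gymnasium spaces roughly as sketched below; the bounds, dtypes, and channel interpretations are illustrative assumptions, not the exact values used in this run.

import numpy as np
from gymnasium import spaces

def scalar(low, high):
    # One telemetry value as a 1-element Box; all bounds below are guesses for illustration.
    return spaces.Box(low=low, high=high, shape=(1,), dtype=np.float32)

rState, eClutch, vColl = scalar(0, 1), scalar(0, 1), scalar(0, 1)            # race state, clutch, collision flag
eSpeed, eBoost, eGear = scalar(0, 10000), scalar(0, 2), spaces.Discrete(7)   # rpm, boost, gear
vSpeed, vSteer, vDir = scalar(0, 400), scalar(-1, 1), spaces.Discrete(2)     # speed, steering, direction flag
rLeftSlip = rRightSlip = fLeftSlip = fRightSlip = scalar(0, 1)               # per-tyre slip
fLWheel = fRWheel = rLWheel = rRWheel = scalar(0, 500)                       # per-wheel speeds
images = spaces.Box(low=0, high=255, shape=(64, 64, 1), dtype=np.uint8)      # 64 x 64 greyscale frame for the CNN

obs_space = spaces.Tuple((rState, eClutch, eSpeed, eBoost, eGear, vSpeed, vSteer, vDir, vColl,
                          rLeftSlip, rRightSlip, fLeftSlip, fRightSlip,
                          fLWheel, fRWheel, rLWheel, rRWheel, images))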

Reward Mechanism

self.reward = (best_index - self.cur_idx) / self.fudgeFactor
if vSpeed < 20 and vDir == 1:  # going slow backwards
    self.reward = self.reward - 0.02  # this should have been higher
else:  # constant penalty that can only be overcome by going forward
    self.reward = self.reward - 0.05
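As a rough illustration of how the progress term and the per-step penalty combine (fudgeFactor and the telemetry values here are made-up numbers, not this run's settings):

def step_reward(best_index, cur_idx, vSpeed, vDir, fudgeFactor=10.0):
    # Progress term plus the per-step penalty from the snippet above.
    reward = (best_index - cur_idx) / fudgeFactor
    if vSpeed < 20 and vDir == 1:
        reward -= 0.02
    else:
        reward -= 0.05
    return reward

# Gaining 5 track indexes at speed: 5 / 10 - 0.05 = 0.45
print(step_reward(best_index=105, cur_idx=100, vSpeed=120, vDir=1))
# No progress while crawling (vSpeed < 20, vDir == 1): only the -0.02 penalty remains.
print(step_reward(best_index=100, cur_idx=100, vSpeed=5, vDir=1))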

Training & Testing

Training was initially restricted to 500 steps of 0.05 s each, which is enough time to drive off the start-finish line and approach the first corner. At first the agent zig-zags and often gets into a spin, but eventually it learns to mostly hold a straight line.
The second stage of training was restricted to 750 steps, which is enough time to get round the first high-speed banked left corner. The agent quickly learns that it should turn left, but struggles to avoid the grass and the wall. Because the corner is banked, the dynamics become harder to control once the car starts to slide. However, the agent is then able to learn to control the slide if and when it occurs.
The part that is inconsistent is the short straight section after the first corner: the agent struggles to consistently recognise the grey wall on the right-hand side, which seems to confuse it.
The third stage of training allows 1000 steps, which is enough time to complete the second left corner and reach the first right corner. It is fascinating that the agent has generalised enough to clumsily navigate the left corner but struggles on the right turn, because that is a scenario it has never encountered before.
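One way to implement this staged schedule is to truncate episodes with a step-limit wrapper; the sketch below uses gymnasium's TimeLimit and is an assumption about how the harness could be wired, not the actual training code.

from gymnasium.wrappers import TimeLimit

# Hypothetical curriculum: each stage extends the episode just far enough
# to reach the next track feature (first corner, second corner, first right-hander).
STAGE_STEPS = [500, 750, 1000]   # steps of 0.05 s each -> 25 s, 37.5 s, 50 s of driving

def wrap_for_stage(base_env, stage):
    # Truncate every episode at the step budget of the current curriculum stage.
    return TimeLimit(base_env, max_episode_steps=STAGE_STEPS[stage])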
During training at around epoch 337, the emulator crashed, so unfortunately it is unclear what the agent was learning (if anything).

[Training charts for run: GTAI_mode2_control_2]

System Parameters

[System parameter charts for run: GTAI_mode2_control_2]

# The reward is then proportional to the number of passed indexes (i.e., track distance):
terminated = False  # assumed default here; only the commented-out bad-direction check below would set it
self.reward = (best_index - self.cur_idx) / self.fudgeFactor
# if vColl > 0:  # hit
#     self.reward = self.reward - 0.02
# if self.reward < 0:
#     self.badDirectionSteps = self.badDirectionSteps + 1
if vSpeed < 20 and vDir == 1:  # going slow
    self.reward = self.reward - 0.02
else:  # going back
    self.reward = self.reward - 0.05
self.cur_idx = best_index
if self.firstLoop:  # hack in case the car does not start at index 0
    self.reward = 0.0
    self.firstLoop = False
# if self.badDirectionSteps > self.maxBadDirectionSteps:
#     terminated = True
self.reward = self.reward / 1.0  # final scaling (currently a no-op)

return self.reward, terminated
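For context, here is a hedged sketch of how a reward computed this way typically feeds back into a gymnasium-style step loop; every method and attribute name below, other than the reward logic itself, is a made-up placeholder.

def step(self, action):
    # Hypothetical env.step(): apply the control, read fresh telemetry,
    # then score progress with the reward logic shown above.
    obs = self._apply_action_and_observe(action)             # placeholder helper
    reward, terminated = self._compute_reward(obs)           # placeholder wrapper around the snippet above
    self.step_count += 1
    truncated = self.step_count >= self.max_episode_steps    # 500 / 750 / 1000 depending on the stage
    return obs, reward, terminated, truncated, {}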