
JSRL: A Method For Initializing A Reinforcement Learning Effort

Ikechukwu Uchendu et al. have released a research paper describing what they call Jump-Start Reinforcement Learning, a method that aims to make the early stages of reinforcement learning experiments more efficient.
One of the main problems with reinforcement learning is that an agent can fail to gain any footing at all when the environment is too complex. A new paper by Ikechukwu Uchendu et al. proposes a method for solving just that.
Their goal is to improve reinforcement learning in its early stages in complex environments, and they've appropriately called the technique "Jump-Start Reinforcement Learning" (JSRL). Simply put, JSRL is aimed at situations where the model isn't able to get started by itself.

How does JSRL work?

JSRL uses a mix of two policies: a guide policy and an exploration policy.
The guide policy is any policy capable of completing the task, whatever form it may take. The exploration policy, meanwhile, is your standard start-from-scratch randomized reinforcement learning policy. The guide policy helps the exploration policy learn from a good position, mitigating the fumbling that comes with a randomly acting actor in a complex environment.
The way these two policies are mixed to drive the exploration policy's learning is what makes this method effective. At first, the guide policy controls the actor; once it hits a defined hand-off point, the exploration policy takes over and tries to figure out what to do from there with its signature randomness.
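To make the hand-off concrete, here is a minimal Python sketch of what a single JSRL-style rollout could look like. It illustrates the idea rather than the authors' implementation: `env` is assumed to follow a Gym-style `reset`/`step` API, and `guide_policy` and `exploration_policy` are assumed to be callables mapping an observation to an action.

```python
def jsrl_rollout(env, guide_policy, exploration_policy, guide_steps, max_steps=1000):
    """Roll out one episode: the guide acts for the first `guide_steps` steps,
    then the exploration policy takes over for the rest."""
    obs = env.reset()
    transitions = []
    for t in range(max_steps):
        # Early steps are driven by the guide; later steps by the exploration policy.
        policy = guide_policy if t < guide_steps else exploration_policy
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        # Keep every transition so the exploration policy can be trained on it.
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return transitions
```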
Once the exploration policy is successful from the jump-started point, we begin to reel back how long the guide policy stays in control, until eventually the exploration policy has learned to complete the whole task from start to finish without the need for the guide policy's helpful jump-start.

Letting the guide policy take control at the start lets the exploration policy incrementally learn the whole process in reverse, starting from the easiest states near the end of the task and working back to the hardest states at the beginning. Without this aptly named jump-start, the model might never stumble onto the goal at all, let alone learn to complete the task reliably.
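Below is an equally rough sketch of that curriculum, building on the `jsrl_rollout` function from the previous snippet: the guide's share of the episode only shrinks once the exploration policy is reliably successful from the current hand-off point. The helpers `update_policy` and `evaluate_success_rate` are hypothetical stand-ins for your own RL update and evaluation code, and the numbers are arbitrary defaults, not values from the paper.

```python
def jsrl_training_loop(env, guide_policy, exploration_policy,
                       horizon=100, n_stages=10,
                       success_threshold=0.8, rollouts_per_stage=50):
    # Hand-off points march backwards through the episode: horizon, horizon - step, ..., 0.
    for guide_steps in range(horizon, -1, -(horizon // n_stages)):
        success_rate = 0.0
        # Stay at this hand-off point until the exploration policy is reliable from it.
        while success_rate < success_threshold:
            for _ in range(rollouts_per_stage):
                transitions = jsrl_rollout(env, guide_policy, exploration_policy, guide_steps)
                update_policy(exploration_policy, transitions)  # hypothetical RL update
            success_rate = evaluate_success_rate(               # hypothetical evaluation helper
                env, guide_policy, exploration_policy, guide_steps)
    # By the final stage (guide_steps == 0) the exploration policy runs the whole task alone.
    return exploration_policy
```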

Should you be using JSRL?

JSRL is useful in cases where a basic RL policy can't make any progress without some help, even if the task starts out with something conceptually simple, like reaching a robot arm forward.
Because the guide policy is so flexible in how it can be constructed, there's really not much downside to using JSRL, though some tasks might be better suited to the simpler route of imitation learning.
If the start of your task is simple enough that a regular RL model can begin learning without help, then JSRL likely won't be too useful to you.

Find out more

The full paper, "Jump-Start Reinforcement Learning," by Ikechukwu Uchendu et al. is available on arXiv.
