
New RL Method Inspired by Animals

Is biology the blueprint for AI?
Curiosity seems to be a key part of intelligence: it drives people to dig deeper into subjects, develop unique insights, and adapt to an ever-changing world. Curiosity is not just important for humans, it also matters for artificial intelligence (AI) systems, which need to understand and adapt to their environments just as we do. At Stanford, researchers Isaac Kauvar and Chris Doyle set up a test: they placed a mouse and an AI agent in similar environments and introduced a red ball. The mouse was much quicker to interact with the ball than the AI system, which led them to look for new ways to build curiosity into reinforcement learning. They call their new method Curious Replay.
Initial Animal Studies from the paper

RL

Reinforcement learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with its environment. In RL, the agent's goal is to learn a policy – a strategy to select an action based on its current state – that maximizes the expected cumulative reward.
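To make that loop concrete, here is a minimal sketch of the agent-environment interaction in plain Python. The toy environment, the random policy, and all names below are illustrative assumptions, not code from the paper.

```python
import random

class ToyEnv:
    """A tiny illustrative environment: reach state 5 to earn a reward."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 or +1; the episode ends at state 5 (success) or -5 (failure)
        self.state += action
        reward = 1.0 if self.state == 5 else 0.0
        done = self.state == 5 or self.state <= -5
        return self.state, reward, done

def random_policy(state):
    # A real agent would learn this mapping from state to action.
    return random.choice([-1, 1])

env = ToyEnv()
state = env.reset()
cumulative_reward, done = 0.0, False
while not done:
    action = random_policy(state)           # policy: state -> action
    state, reward, done = env.step(action)  # environment transition
    cumulative_reward += reward             # the quantity RL tries to maximize
print("episode return:", cumulative_reward)
```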

Experience Replay

An essential component of many RL algorithms, especially those based on deep learning, is the experience replay buffer. This buffer serves as a repository of the agent's past experiences, each typically represented as a tuple of the agent's state, the action it performed, the reward it received, and the subsequent state it ended up in. This collected data allows the RL agent to learn from its previous experiences, contributing to the efficiency and stability of learning.
The replay buffer operates on a principle akin to human memory, allowing the agent to remember and learn from past actions and their outcomes. When the buffer's capacity is reached, older experiences are typically discarded to make room for newer ones, implementing a sort of "forgetting" mechanism.
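As an illustration, the sketch below implements a minimal replay buffer of (state, action, reward, next state) tuples with uniform sampling. The class and method names are our own, not from any particular library.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of (state, action, reward, next_state) transitions."""
    def __init__(self, capacity=10_000):
        # A deque with maxlen discards the oldest experience once capacity is reached,
        # which is the "forgetting" mechanism described above.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling: every stored experience is equally likely to be replayed
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```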

Changing Environments

In some scenarios, the agent's environment may change over time. These changes could be a result of factors external to the agent, like a light in the environment being turned on, or because of achievements by the agent, such as entering a new area. In such changing environments, observations may now be novel, actions may now have a different effect, and the agent must adapt.
Adapting to changing environments can be challenging. For example, a method named Dreamer demonstrates promising results in many cases, but struggles to adapt quickly when the environment changes.

Curious Replay

To improve the adaptability of such systems, the researchers developed Curious Replay. It encourages the world model to be more adaptive by prioritizing training on the experiences that are either least accurately modeled or have been trained on the fewest times. The method combines two main strategies (a simplified sketch combining them follows the list):
Count-based Replay: This strategy ensures that the model trains on new data by prioritizing experiences that have been encountered fewer times. It keeps track of how many times each experience in the replay buffer has been used for training and biases its sampling towards experiences that have been used less frequently.
Adversarial Replay: This strategy prioritizes experiences that the world model does not predict accurately. It uses the model's loss on each experience as a measure of how challenging that experience is for the model to learn. By favoring experiences with higher losses, Adversarial Replay encourages the model to improve its predictions on these challenging experiences.
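The sketch below is a simplified, illustrative take on this idea: it tracks a per-experience training count and the model's last-seen loss, and samples experiences with probability proportional to a combined priority. The weighting scheme and the names used here (`CuriousReplayBuffer`, the count and loss terms) are assumptions for illustration, not the paper's exact formulation.

```python
import random
from collections import deque

class CuriousReplayBuffer:
    """Prioritized buffer favoring rarely trained and poorly modeled experiences."""
    def __init__(self, capacity=10_000, count_weight=1.0, loss_weight=1.0):
        self.buffer = deque(maxlen=capacity)   # stores [experience, train_count, last_loss]
        self.count_weight = count_weight
        self.loss_weight = loss_weight

    def add(self, experience):
        # New experiences start with zero training count and a high default loss,
        # so they are sampled soon after being added.
        self.buffer.append([experience, 0, 1.0])

    def _priority(self, train_count, last_loss):
        # Count-based term shrinks as the experience is trained on more often;
        # adversarial term grows with the world model's loss on that experience.
        count_term = self.count_weight / (1 + train_count)
        loss_term = self.loss_weight * last_loss
        return count_term + loss_term

    def sample(self, batch_size):
        priorities = [self._priority(count, loss) for _, count, loss in self.buffer]
        # Sample indices with probability proportional to their priority
        idxs = random.choices(range(len(self.buffer)), weights=priorities, k=batch_size)
        return idxs, [self.buffer[i][0] for i in idxs]

    def update(self, idxs, losses):
        # After a training step, record the new model losses and bump the counts
        for i, loss in zip(idxs, losses):
            self.buffer[i][1] += 1
            self.buffer[i][2] = float(loss)
```

In a training loop, the agent would draw a batch with `sample`, compute the world model's loss on each experience, and call `update` so that the next batch is biased toward whatever the model still predicts poorly or has rarely seen.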


Results


Results of Curious Replay on Crafter, a game with a constantly changing environment
In summary, Curious Replay enhances the RL agent's ability to adapt to changing environments by modifying the way experiences are sampled from the replay buffer. By giving priority to less-understood or less-frequently sampled experiences, Curious Replay ensures the agent learns to handle novel situations more effectively.

Tags: ML News