On-Chip Biological Neurons Outperform Modern RL Algorithms
A new study reveals how powerful biological neurons can be!
In recent years, the quest to understand and enhance learning efficiency in artificial intelligence has led researchers to draw inspiration from biological systems. A groundbreaking study titled "Biological Neurons Compete with Deep Reinforcement Learning in Sample Efficiency in a Simulated Gameworld" explores how biological neural networks can rival state-of-the-art deep reinforcement learning (RL) algorithms in terms of learning efficiency. This study compares in vitro biological neurons with advanced deep RL models, such as DQN, A2C, and PPO, within the context of a simplified 'Pong' game.
DishBrain
The research utilizes the DishBrain system, a novel technology that integrates in vitro neural networks with in silico computation through high-density multi-electrode arrays (HD-MEAs). This system facilitates a real-time closed-loop interaction between biological neurons and a virtual environment. By stimulating and recording neural activity, DishBrain allows researchers to compare the learning rates and performance of biological systems with those of deep RL algorithms.
The neuronal cultures themselves are sourced from either embryonic rodent cortical cells or human induced pluripotent stem cells (hiPSCs). Approximately one million cells are plated and maintained in BrainPhys™ Neuronal Medium, supplemented with 1% penicillin-streptomycin during testing, and integrated onto HD-MEAs to enable precise stimulation and recording of neural activity.
Learning Via Stimulation
Neurons receive sensory input as electrical pulses that encode the state of the game, in this case the ball's position, using a combination of rate coding and place coding. Rate coding uses pulse frequencies between 4 Hz and 40 Hz to encode the ball's x-axis position, while place coding uses specific, topographically arranged electrodes to encode its y-axis position. When the neurons move the paddle to intercept the ball, predictable feedback is delivered as a reward signal: a simultaneous stimulus across all eight stimulation electrodes that reinforces the neural connections involved in the successful action.
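As a rough illustration of this encoding scheme, the sketch below maps a normalized ball position to a pulse frequency (rate code for x) and an electrode index (place code for y). The function name, the normalization, and the assumption that the eight stimulation sites double as the place code are illustrative; only the 4 Hz to 40 Hz band and the rate/place split come from the study.

```python
# Illustrative encoding of the ball position into stimulation parameters.
RATE_MIN_HZ, RATE_MAX_HZ = 4.0, 40.0   # rate-coding band reported in the study
N_PLACE_SITES = 8                       # assumption: the eight stimulation electrodes form the place code

def encode_ball_position(x_norm: float, y_norm: float):
    """Map a normalized ball position (both axes in [0, 1]) to stimulation parameters.

    x -> pulse frequency (rate coding), y -> electrode index (place coding).
    """
    freq_hz = RATE_MIN_HZ + x_norm * (RATE_MAX_HZ - RATE_MIN_HZ)
    electrode = min(int(y_norm * N_PLACE_SITES), N_PLACE_SITES - 1)
    return freq_hz, electrode

# Example: ball halfway along x, near the top of the playing field.
freq, site = encode_ball_position(0.5, 0.9)
print(f"stimulate electrode {site} at {freq:.1f} Hz")
```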
Unpredictable sensory feedback, by contrast, is delivered as a penalty signal when the neurons miss the ball: random stimulation at 150 mV and 5 Hz over a period of four seconds. Because this feedback is random, it does not consistently activate the same neural pathways, so the synapses associated with the failed action are not strengthened.
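Putting the two feedback modes together, a minimal sketch of reward and penalty delivery might look like the following. The `stimulate` call is a hypothetical stand-in for the HD-MEA driver, not the real DishBrain API, and the reward pulse parameters are placeholders; only the eight stimulation sites and the 150 mV, 5 Hz, four-second random stimulation for misses come from the description above.

```python
import random

STIM_ELECTRODES = list(range(8))   # the eight stimulation sites described above

def deliver_feedback(hit: bool, stimulate):
    """Deliver predictable (reward) or unpredictable (penalty) sensory feedback.

    `stimulate(sites, voltage_mv, freq_hz, duration_s)` is a hypothetical driver
    interface used for illustration only.
    """
    if hit:
        # Predictable feedback: a simultaneous stimulus across all eight electrodes.
        # Pulse voltage and frequency are placeholders; the article does not state them.
        stimulate(STIM_ELECTRODES, voltage_mv=75, freq_hz=100, duration_s=0.01)
    else:
        # Unpredictable feedback: random stimulation at 150 mV and 5 Hz for 4 seconds,
        # so the same pathway is never consistently driven after a miss.
        random_sites = [random.choice(STIM_ELECTRODES) for _ in range(4 * 5)]
        stimulate(random_sites, voltage_mv=150, freq_hz=5, duration_s=4.0)
```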

Playing Pong
The neurons control a virtual paddle in a simplified Pong game, with the paddle's movement determined by the level of electrophysiological activity recorded from predefined motor regions in the neuronal network. The system adjusts the paddle movement in real time based on neural activity, with a spike-to-stimulus latency of approximately 5 ms. Each training session lasts 20 minutes, during which the neurons interact with the game environment. On average, each culture undergoes approximately 70 episodes per session, matching the training duration provided to the RL algorithms for comparison.
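A minimal sketch of that closed loop is below, under the assumption that paddle velocity is simply proportional to the difference in activity between two predefined "up" and "down" motor regions. The region definitions, read-out window, and gain are illustrative, not the study's actual decoder.

```python
import numpy as np

def paddle_velocity(up_spike_counts: np.ndarray, down_spike_counts: np.ndarray,
                    gain: float = 1.0) -> float:
    """Decode motor-region activity into a paddle velocity.

    Hypothetical decoder: the paddle moves toward whichever predefined motor
    region fired more during the last read-out window.
    """
    up, down = up_spike_counts.sum(), down_spike_counts.sum()
    return gain * (up - down) / max(up + down, 1)

# One pass through the (simplified) loop, repeated with ~5 ms spike-to-stimulus latency:
#   1. read spikes from the two motor regions          (recording)
#   2. move the paddle by paddle_velocity(...)         (actuation)
#   3. encode the new ball position and stimulate      (sensory input, as above)
```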
How are Biological Neurons "Trained"?
Understanding why predictable sensory feedback leads to learning in biological neurons involves synaptic plasticity and Hebbian learning. Synaptic plasticity refers to the ability of synapses to strengthen or weaken over time in response to activity. Predictable sensory feedback reinforces the neural connections involved in successful actions through Hebbian learning, where "neurons that fire together wire together." Consequently, the synapses that are repeatedly and persistently activated by the predictable feedback become stronger, making it more likely that similar neural activities will occur in response to the same sensory input in the future.
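A toy simulation makes this contrast concrete. It is not a model of the cultures' biophysics, just the bare Hebbian update applied under two regimes: a repeated (predictable) input pattern versus a fresh random pattern on every trial.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.05          # learning rate
n_synapses = 8      # toy synapses onto one output neuron

def hebbian_step(w, pre, post):
    """'Neurons that fire together wire together': co-active pre/post pairs strengthen."""
    return w + eta * pre * post

# Predictable feedback: the same sensory pattern keeps co-occurring with the
# successful motor output, so the same few synapses are strengthened every time.
successful_pattern = np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
w_pred = np.zeros(n_synapses)
for _ in range(100):
    w_pred = hebbian_step(w_pred, pre=successful_pattern, post=1.0)

# Unpredictable feedback: a different random pattern arrives each time, so no
# particular synapse is repeatedly reinforced and no single pathway dominates.
w_rand = np.zeros(n_synapses)
for _ in range(100):
    random_pattern = (rng.random(n_synapses) < 0.25).astype(float)
    w_rand = hebbian_step(w_rand, pre=random_pattern, post=1.0)

print("predictable  :", np.round(w_pred, 2))   # two synapses strongly potentiated
print("unpredictable:", np.round(w_rand, 2))   # weak, diffuse changes everywhere
```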
Unpredictable sensory feedback plays a crucial role in disrupting ineffective activity patterns. It serves as an error signal indicating that the action taken was incorrect or suboptimal. Unlike predictable feedback, which reinforces successful actions, unpredictable feedback acts as a punishment, discouraging repetition of the recent activity pattern by signaling that it did not lead to a desirable outcome. This prevents the strengthening of synapses associated with failure and promotes the exploration of new patterns of activity.
The role of unpredictable feedback is to introduce variability in neural responses, preventing the neurons from overfitting to a specific, potentially suboptimal pattern of activity. This variability is crucial for enabling the neural network to explore different strategies and discover more effective actions. By providing random stimulation when the neurons fail to intercept the ball, the system ensures that no specific neural pathway is consistently activated, thereby preventing any one pathway from being strengthened.
Comparing Biology with Technology
To ensure a fair comparison, the two kinds of networks were matched in size: the number of neurons and the complexity of their connections were kept as similar as possible. The biological cultures were grown on HD-MEAs, allowing precise stimulation and recording across roughly as many neurons as the artificial networks in the deep RL agents contained units. This gave both systems a comparable capacity for processing and learning from the input data, making the comparison between biological and artificial learning more meaningful.
One of the key findings of this study is that biological neurons exhibit higher sample efficiency and faster learning rates compared to deep RL agents when given a limited number of samples. This suggests that biological systems possess an inherent ability to learn more effectively from fewer experiences, a characteristic that remains a significant challenge in artificial intelligence.
The study also highlights the challenges faced by deep RL algorithms, particularly in terms of sample efficiency. While RL algorithms have achieved remarkable success in various game environments, they often require a vast number of samples to learn effectively. This limitation contrasts sharply with the rapid learning observed in biological neurons, which can adapt and improve their performance with relatively few samples. This discrepancy underscores the potential advantages of biologically inspired learning systems, which can offer valuable insights into developing more efficient artificial learning algorithms.
Nature is Hard to Beat
In conclusion, the comparison between biological neurons and deep reinforcement learning algorithms in the DishBrain study reveals significant differences in learning efficiency and adaptability. Biological neurons demonstrate a remarkable ability to learn quickly from limited samples, a characteristic that deep RL algorithms still struggle to achieve. Understanding and harnessing the mechanisms underlying biological learning, such as synaptic plasticity and Hebbian learning, could pave the way for developing more efficient and adaptable artificial intelligence systems. The study's findings emphasize the importance of exploring biologically inspired approaches to enhance the learning capabilities of artificial neural networks, ultimately bridging the gap between biological and artificial intelligence.
Tags: ML News