
DeepMind's Answer to GPT-4: Gemini

Known for using reinforcement learning to master games like Go, DeepMind is looking to combine large language models with RL to surpass systems like GPT-4.
DeepMind, a Google-owned artificial intelligence lab, has a history of monumental breakthroughs. The lab's most striking success was the creation of AlphaGo, a system that stunned the world by defeating a world champion player of the board game Go in 2016, a feat many believed to be decades away. Using reinforcement learning, where the AI system improves by making repeated attempts and receiving feedback on its performance, and a method called Monte Carlo tree search (MCTS) to search ahead and evaluate possible moves, AlphaGo mastered a game of enormous complexity.
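To make the MCTS idea concrete, here is a minimal, self-contained sketch of a single round of Monte Carlo tree search for a generic two-player game. It is purely illustrative: AlphaGo's actual search is guided by neural networks rather than random rollouts, and the GameState interface assumed here (legal_moves, apply, is_terminal, winner) is a hypothetical stand-in.

```python
import math
import random

# Illustrative MCTS round (not DeepMind's implementation).
# Assumes a hypothetical GameState with: legal_moves(), apply(move),
# is_terminal(), and winner() returning +1 / -1 / 0.

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.value = 0.0     # accumulated reward seen through this node

def ucb1(child, parent_visits, c=1.4):
    # Balance exploitation (mean value) against exploration (visit counts).
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts_round(root):
    # 1. Selection: walk down the tree, always taking the highest-UCB child.
    node = root
    while node.children and not node.state.is_terminal():
        _, node = max(node.children.items(), key=lambda kv: ucb1(kv[1], node.visits))
    # 2. Expansion: add one unexplored child, if the game isn't over.
    if not node.state.is_terminal():
        for move in node.state.legal_moves():
            if move not in node.children:
                node.children[move] = Node(node.state.apply(move), parent=node)
                node = node.children[move]
                break
    # 3. Simulation: play random moves to the end of the game.
    state = node.state
    while not state.is_terminal():
        state = state.apply(random.choice(state.legal_moves()))
    reward = state.winner()
    # 4. Backpropagation: update visit/value statistics back up to the root.
    while node is not None:
        node.visits += 1
        node.value += reward
        reward = -reward   # flip perspective between the two players (simplified)
        node = node.parent
```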

DeepMind's innovation extended beyond board games: its reinforcement learning agents also turned in impressive performances mastering video games. The lab's work in this space has been highly influential in advancing the capabilities of AI, particularly in complex decision-making tasks.

Current Systems

State-of-the-art language models, such as OpenAI's ChatGPT, are trained on vast amounts of curated text, using statistical patterns to predict the most likely next piece of text. A significant driver of LLM success has been reinforcement learning from human feedback (RLHF), where the model's behavior is refined based on feedback from human evaluators. Learning from this feedback has greatly enhanced these models, making it possible to engage in more natural, coherent, and contextually relevant conversations.
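As a rough illustration, the pre-training objective behind these models can be written as a standard next-token cross-entropy loss. The sketch below (assuming PyTorch) shows the general recipe rather than any lab's actual training code; `model` is a hypothetical network mapping token ids to vocabulary logits.

```python
import torch
import torch.nn.functional as F

# Illustrative next-token prediction objective (the standard LLM pre-training loss).
# `model` is assumed to map token ids [batch, seq_len] to logits [batch, seq_len, vocab].

def next_token_loss(model, token_ids):
    inputs = token_ids[:, :-1]    # the model sees tokens 0..n-1
    targets = token_ids[:, 1:]    # and is asked to predict tokens 1..n
    logits = model(inputs)        # [batch, seq_len - 1, vocab_size]
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```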
Despite their impressive performance, current LLMs mainly capture statistical trends and are not built to learn from large amounts of real-world experience, unlike systems such as AlphaGo. This is a limitation, especially considering that learning from experience is a crucial aspect of human and animal intelligence.

The Challenges

DeepMind's new project, Gemini, is seeking to integrate the successful components of their previous innovations, taking large language models a step further. CEO Demis Hassabis has hinted that Gemini will incorporate techniques from AlphaGo into a language model similar to GPT-4. The goal is to grant Gemini capabilities beyond what LLMs currently offer, such as planning and problem-solving.
While DeepMind's Gemini aims to leverage reinforcement learning, as AlphaGo did, to improve performance from feedback and refine its decision-making, applying RL to natural language processing (NLP) tasks is not without challenges. One of the most significant issues is the sheer size and complexity of language, combined with the context-dependent nature of text, which makes a reinforcement learning reward signal far more intricate to define and evaluate.

Language as a Game?

Unlike a game such as Go, with clearly defined rules and win conditions, NLP lacks such concrete parameters, making it difficult for the AI to know when its output is "correct" or "optimal." Additionally, the concept of "self-play," which was a critical part of AlphaGo's learning process, does not translate directly to NLP tasks. In the context of a game, the AI can play against itself to generate new data and learn from it. But in NLP, it's unclear how an equivalent "self-conversation" would work or what it would accomplish, especially since the language model needs to react to unpredictable human inputs rather than more predictable AI-generated responses.
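For contrast, here is roughly what self-play looks like in a game setting: the same agent plays both sides, and the final outcome labels every move it made. This is a hypothetical sketch using the same made-up GameState interface as the MCTS example above; `choose_move` stands in for whatever policy (for example, an MCTS-guided one) the caller supplies.

```python
# Illustrative self-play data generation for a two-player game (not AlphaGo's pipeline).

def self_play_game(initial_state, choose_move):
    history = []
    state = initial_state
    while not state.is_terminal():
        move = choose_move(state)   # the same agent plays both sides
        history.append((state, move))
        state = state.apply(move)
    outcome = state.winner()        # +1 / -1 / 0 from the first player's point of view
    # Each (state, move) pair is labeled with the final outcome and becomes training data.
    return [(s, m, outcome) for s, m in history]
```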
The application of Monte Carlo tree search, used so successfully in AlphaGo, also faces unique challenges in the context of NLP. MCTS requires a well-defined notion of state and action, as well as a simulation of the environment, all of which are much less clear for language generation than for a game like Go or chess.
In a board game like Go, the rules are discrete and well-defined, with each move leading to a clear change in the game state. The reward function in this scenario is therefore concrete: winning the game is the ultimate reward. Decisions that lead to victory are positively reinforced, while those leading to loss are negatively reinforced.
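A toy comparison makes the gap between the two settings clear. The helper names below are hypothetical and exist only for exposition.

```python
# Illustrative contrast between a board-game reward and a language "reward".

def go_reward(final_state, player):
    # Concrete and unambiguous: the game ends in a win, a loss, or a draw.
    winner = final_state.winner()   # +1 / -1 / 0
    if winner == 0:
        return 0.0
    return 1.0 if winner == player else -1.0

def language_reward(prompt, response):
    # No equivalent ground truth exists: "good" depends on correctness, tone,
    # context, and the reader. In practice it is approximated by a learned
    # reward model trained on human preference data (see the RLHF sketch below).
    raise NotImplementedError("there is no objective win condition for text")
```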
RLHF creates an approximation of a reward function by relying on human feedback to rank or rate responses. However, this approximation is inherently imperfect, as it depends on the subjective judgment of human evaluators. In addition, it may not capture all the nuances or variables in different conversational contexts. Despite these limitations, RLHF currently provides one of the most viable approaches to guiding NLP models in generating improved responses.
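A common way to build that approximation is to fit a reward model on pairs of responses that humans have ranked, using a pairwise (Bradley-Terry style) loss. The sketch below (again assuming PyTorch) shows the general idea, not Gemini's actual objective; `reward_model` is a hypothetical network that assigns a single scalar score to a tokenized response.

```python
import torch
import torch.nn.functional as F

# Illustrative pairwise preference loss for fitting a reward model from human rankings.
# `reward_model` is assumed to map a batch of token ids to one score per sequence.

def preference_loss(reward_model, chosen_ids, rejected_ids):
    r_chosen = reward_model(chosen_ids)       # [batch] scores for the preferred responses
    r_rejected = reward_model(rejected_ids)   # [batch] scores for the rejected responses
    # Bradley-Terry: maximize the probability that the preferred response scores higher.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```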
DeepMind has also hinted at introducing new innovations. While specifics are yet to be released, it's clear that DeepMind is aiming to push the boundaries of what is currently possible with LLMs, drawing from their extensive experience in RL.

Conclusion

While Gemini's development is still ongoing, DeepMind's ambitious project is being closely watched by the community. DeepMind's reputation for building RL systems positions it well to advance the capabilities of LLMs, and potentially to help Google leapfrog OpenAI. It will be interesting to see what DeepMind unveils.