AI Expert Speculates on GPT-4 Architecture
George Hotz offers his thoughts on the secretive architecture of OpenAI's GPT-4
Created on June 21 | Last edited on June 21
George Hotz, renowned for his expertise in artificial intelligence, recently offered his thoughts on the secretive architecture of OpenAI's GPT-4. Hotz is a co-founder of Comma.ai, a self-driving car startup, and is also well known for being the first hacker to unlock the iPhone and the Sony PS3. OpenAI's decision to keep GPT-4's architecture under wraps has been controversial, and there has been little public discussion of the model's true specifics.
GPT-4 Architecture
Hotz didn't mince words when discussing the architecture of GPT-4. He suggested that the model is actually a set of eight distinct models, each with roughly 220 billion parameters, for a total of about 1.76 trillion parameters. Beyond its sheer size, the key detail is that GPT-4 reportedly uses a Mixture of Experts architecture, meaning different components, or 'experts', within the model work together, each contributing to the final output.
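As a quick sanity check, the rumored numbers do add up (the figures below are Hotz's speculated estimates, not values confirmed by OpenAI):

```python
# Back-of-the-envelope check of the rumored GPT-4 figures (speculative, per Hotz).
num_experts = 8
params_per_expert = 220e9  # 220 billion parameters per expert

total_params = num_experts * params_per_expert
print(f"{total_params / 1e12:.2f} trillion parameters")  # 1.76 trillion
```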
Mixture of Experts
The Mixture of Experts (MoE) approach, as explained by Hotz, is a practical strategy adopted when simply scaling up a single model starts to hit its limits. MoE operates on a straightforward principle: assemble a panel of simpler models, or 'experts', each catering to specific aspects of the data, and draw on their collective intelligence.
In practice, this is similar to tackling a complex project such as building a house. You'd recruit professionals: architects, engineers, and interior designers, each with a unique specialization. Likewise, an MoE model consists of multiple 'experts', each skilled in a particular domain, collaborating to deliver better results.
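To make the idea concrete, here is a minimal sketch of a top-k-routed MoE layer in PyTorch. The layer sizes, expert count, and gating scheme are illustrative placeholders for the general technique, not a reconstruction of whatever OpenAI actually built:

```python
# Minimal mixture-of-experts sketch. All dimensions and the routing scheme are
# illustrative assumptions, not details of GPT-4.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureOfExperts(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The gate scores every token against every expert.
        self.gate = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gate_logits = self.gate(x)                                # (batch, seq, num_experts)
        weights, indices = gate_logits.topk(self.top_k, dim=-1)   # keep the k best experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., k] == e                       # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out


# Usage: route a batch of token embeddings through the layer.
layer = MixtureOfExperts(d_model=64, d_hidden=256)
tokens = torch.randn(2, 10, 64)
print(layer(tokens).shape)  # torch.Size([2, 10, 64])
```

The point of top-k routing is that only a few experts run for each token, which is how an MoE can carry a huge total parameter count while keeping per-token compute manageable.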

Added Benefits
Moreover, implementing a Mixture of Experts architecture could offer practical benefits in training the model. It could, for instance, make it easier to delegate tasks to different engineering teams, each focusing on enhancing a specific 'expert'. This approach could also allow for concentrating on certain problems with a dedicated model, thereby breaking up the work into multiple smaller pieces that are more manageable with existing computational resources.
The MoE architecture potentially breaks the monumental task of improving a massive model like GPT-4 into a series of more achievable challenges. One could imagine a specific expert dedicated to fact-checking or avoiding hallucinations, which could simplify model versioning and reduce regressions in performance.
Iterative Inference
Hotz also introduced the concept of iterative inference as a potential mechanism in GPT-4. Unlike models that generate a single output in one pass, iterative inference works in a cycle, repeatedly refining the output over several iterations.
According to Hotz, GPT-4 might run 16 iterations of inference, revising and fine-tuning the output at each pass.
This iterative process would allow the model to adjust its predictions based on the insights gleaned from previous iterations. Although not mentioned specifically by Hotz, this iterative process could also allow for using specific experts throughout various parts of a single task, which could be advantageous for generating longer answers to complex questions.
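A hypothetical sketch of what such a loop could look like, with `generate_draft` and `refine` standing in for unspecified model calls (the 16-pass count is Hotz's speculated figure, not a confirmed detail):

```python
# Hypothetical iterative-inference loop: feed the model's own draft back in and refine it.
# `generate_draft` and `refine` are placeholders; nothing here reflects OpenAI's actual system.
from typing import Callable


def iterative_inference(
    prompt: str,
    generate_draft: Callable[[str], str],
    refine: Callable[[str, str], str],
    num_iterations: int = 16,  # the figure Hotz speculated for GPT-4
) -> str:
    draft = generate_draft(prompt)
    for _ in range(num_iterations):
        # Each pass conditions on the prompt plus the previous draft and returns an improved draft.
        draft = refine(prompt, draft)
    return draft


# Toy usage with dummy functions; a real system would call the model here.
answer = iterative_inference(
    "Explain mixture of experts.",
    generate_draft=lambda p: f"Draft answer to: {p}",
    refine=lambda p, d: d + " (refined)",
)
print(answer)
```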
Educated Guess
While Hotz's conjectures about GPT-4's potential architecture seem plausible, it's important to note that these are unconfirmed ideas. Until OpenAI releases official details about GPT-4, we rely on educated guesswork. Nevertheless, these speculations offer interesting avenues for discussion and exploration within the AI community. It would be particularly intriguing to see how open-source projects might adapt and implement these ideas in their models and algorithms.
Tags: ML News