
OpenAI Updates: Model Regression Claims and Future Plans

Future Plans of OpenAI and Concerns around ChatGPT
Created on June 2 | Last edited on June 2
Raza Habib, founder of the startup Humanloop, recently had the chance to interview Sam Altman about OpenAI's future plans. Interestingly, the article has since been deleted from the Humanloop website. Fortunately, it was archived on the Internet Archive's Wayback Machine, which is linked below if you are interested in reading it.
Find the article at [1]

Technical Constraints

The conversation revealed that OpenAI is currently grappling with GPU limitations, which affect the speed and reliability of their API. The shortage is also delaying several short-term plans. It is hindering the rollout of the longer 32k-token context window that many users are anticipating, and it limits the availability of the finetuning API, which is currently very compute-intensive because OpenAI is not yet using efficient finetuning methods such as adapters or LoRA. Furthermore, OpenAI's dedicated-capacity offering, which provides customers with a private copy of the model, is also restricted by the GPU shortage.
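To see why methods like LoRA make finetuning so much cheaper, here is a minimal sketch of the core idea: instead of updating a full weight matrix, you train two small low-rank factors whose product is added to the frozen weights. The dimensions below are illustrative, not taken from any OpenAI model.

```python
import numpy as np

# Illustrative sizes: a single d x d weight matrix, adapted at rank r.
d, r = 1024, 8

full_params = d * d            # parameters updated by full finetuning
lora_params = d * r + r * d    # parameters updated by LoRA (B and A only)

rng = np.random.default_rng(0)
W0 = rng.standard_normal((d, d))   # pretrained weights, kept frozen
B = np.zeros((d, r))               # zero-init so training starts exactly at W0
A = rng.standard_normal((r, d))

# The effective weight during finetuning is W0 + B @ A; only B and A
# receive gradients, so the trainable-parameter count drops drastically.
W_eff = W0 + B @ A

print(full_params, lora_params, lora_params / full_params)
```

At rank 8 the trainable parameters shrink to about 1.6% of the full matrix, which is roughly the kind of saving that makes a finetuning API far less compute-hungry to serve.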

Efficiency

OpenAI’s current top priority is a cheaper and faster GPT-4. The company is also considering extending context windows to as many as 1 million tokens and improving the finetuning API. Additionally, they plan to release a stateful API that remembers conversation history, aimed at increasing efficiency. By 2024, multimodality, a feature demoed as part of the GPT-4 release, is expected to be extended to all users.
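The efficiency argument for a stateful API is easiest to see from how clients use today's stateless chat APIs: the full transcript must be resent on every turn, so the cost of each request grows with conversation length. The sketch below is illustrative client-side bookkeeping, not a real SDK; the server-side state retention Altman describes would make the resending unnecessary.

```python
# Hedged sketch: with a stateless chat API, the client keeps the history
# and resends all of it on every call. All names here are illustrative.

class Conversation:
    def __init__(self, system_prompt):
        self.messages = [{"role": "system", "content": system_prompt}]

    def send(self, user_text, model_fn):
        """Append the user turn, call the model with the ENTIRE history,
        and record the reply."""
        self.messages.append({"role": "user", "content": user_text})
        reply = model_fn(self.messages)   # every prior token is resent
        self.messages.append({"role": "assistant", "content": reply})
        return reply

    def resent_tokens(self):
        # Rough proxy for request size: word count of everything resent.
        return sum(len(m["content"].split()) for m in self.messages)
```

Each call to `send` pays for the whole transcript again; a stateful API would let the server retain `messages` so the client sends only the new turn.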

Plugins

Altman also shared insights on the development of plugins, stating that they currently lack product-market fit and are therefore unlikely to be integrated into the API in the near future. He further addressed developers' concerns about potential competition from OpenAI, clarifying that the company would refrain from releasing more products beyond ChatGPT so as not to compete with its API users.

Open Source GPT-3

While Altman advocates regulation of future AI models, he does not perceive current models as a threat and believes in the importance of open source. He also mentioned that OpenAI is contemplating open-sourcing GPT-3, while expressing skepticism that many individuals and companies have the ability to host and serve large language models (LLMs).

Scaling Laws

Lastly, Altman addressed the prevalent claim that the "age of giant AI models" is coming to an end, arguing that such reports misinterpret the current state of AI. He emphasized that the scaling laws for model performance continue to hold: making models larger will continue to improve performance. Although the exponential growth in model size seen in recent years cannot be sustained, Altman is confident that OpenAI will persist in growing its models, just at a slower pace. This continued scaling could hint at shorter timelines for the development of artificial general intelligence (AGI).


GPT Regression Concerns

After every ChatGPT update, it seems you hear complaints that the model performs worse than the previous version. Benchmarking the performance of LLMs is a complex task and requires a wide variety of evaluation data to do correctly; whether users' complaints are valid is yet to be verified, and the evidence is mostly anecdotal. Recently on Twitter, OpenAI employee Logan Kilpatrick asserted that the GPT-4 API is currently not changing; however, it is unclear how much the ChatGPT models change across updates. He also mentioned that performance inconsistencies in the API could be related to its stochastic nature.

As mentioned in the interview above, Sam Altman is clear that efficiency is a main focus for the company, and it seems likely that ChatGPT is continually moving to smaller models with similar performance on OpenAI's internal eval datasets. This raises an important question around model deployment, and it points toward the fact that evaluation of LLMs is quite subjective and varies across users. Perhaps in the future we will see tools that allow users to test models on their own data in order to select the best-performing model for their needs. Time will tell.
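The kind of tool imagined above could be very simple at its core: score each candidate model on the user's own prompt/answer pairs and pick the winner. The sketch below assumes an exact-match metric and stand-in model functions; a real harness would use richer metrics and call actual model endpoints.

```python
# Hedged sketch of a user-side model comparison: every name here is a
# placeholder, not a real evaluation service or API.

def exact_match(pred, target):
    """Crude metric: case-insensitive exact match after trimming."""
    return pred.strip().lower() == target.strip().lower()

def evaluate(model_fn, dataset):
    """Fraction of (prompt, target) pairs the model answers correctly."""
    hits = sum(exact_match(model_fn(prompt), target)
               for prompt, target in dataset)
    return hits / len(dataset)

def pick_best(models, dataset):
    """Score each named model on the user's data; return the winner
    and the full score table."""
    scores = {name: evaluate(fn, dataset) for name, fn in models.items()}
    return max(scores, key=scores.get), scores
```

With a harness like this, "the best model" becomes a per-user, per-dataset answer rather than a single leaderboard number, which is exactly why regression complaints are so hard to adjudicate from anecdotes.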


Sources: