
PaLM 2, MaMMUT, Explaining Neurons, Anthropic 100k Context Window

A round-up of Google's PaLM 2, a large language model to compete with OpenAI's GPT-4; MaMMUT, a vision-encoder, text-decoder model; OpenAI's work on explaining neurons with GPT-4; and a major context-window upgrade to Anthropic's Claude model.
Created on May 12 | Last edited on May 13

PaLM 2

Google released PaLM 2, a large language model (LLM) built to compete with GPT-4. Its authors boast that it not only codes and reasons like GPT-4 but is also strongly multilingual, having been trained on text spanning over 100 languages. PaLM 2 comes in four sizes: Gecko, Otter, Bison, and Unicorn. Google I/O 2023 featured AI integrated across a wealth of Google products, with many of these applications backed by PaLM 2.
Google plans to bring these LLMs into Workspace apps like Docs, Sheets, and Slides under the name Duet AI. A different flavor of PaLM, Med-PaLM 2, targets medical question answering, while Sec-PaLM is aimed at cybersecurity. Google is also working on Gemini, a multimodal LLM designed for efficient tool and API integration.
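
For developers, Google also exposes PaLM 2 through its PaLM API. Below is a minimal sketch of querying the Bison-backed text model with the google.generativeai Python package; the model name and response fields reflect the public API at launch, so treat them as assumptions and check the current docs:

```python
import google.generativeai as palm

# Assumes an API key from Google's PaLM API / MakerSuite program.
palm.configure(api_key="YOUR_API_KEY")

# "text-bison-001" is the publicly exposed, Bison-sized PaLM 2 text model.
response = palm.generate_text(
    model="models/text-bison-001",
    prompt="Explain in two sentences why the sky is blue.",
    temperature=0.2,
    max_output_tokens=128,
)

print(response.result)  # generated text, or None if the prompt was filtered
```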



MaMMUT

MaMMUT is a vision-encoder, text-decoder model that achieves state-of-the-art (SOTA) results in image-text retrieval, text-image retrieval, video question answering (VideoQA), and more.

As shown in the figure above, MaMMUT trains on image-text pairs jointly with both a contrastive objective and a text-generation objective, a combination that previous architectures struggled to support in a single model.
This joint training is what lets one compact model set SOTA across retrieval, captioning, and VideoQA benchmarks.
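
As a rough sketch of what such a joint objective could look like, here is an illustrative PyTorch reconstruction (not MaMMUT's actual training code; the shapes, equal loss weighting, and temperature are assumptions):

```python
import torch
import torch.nn.functional as F

def joint_loss(image_emb, text_emb, caption_logits, caption_targets, temperature=0.07):
    """Combine a CLIP-style contrastive loss with a captioning loss.

    image_emb:       (B, D) image embeddings from the vision encoder
    text_emb:        (B, D) text embeddings from the text decoder
    caption_logits:  (B, T, V) next-token logits from the text decoder
    caption_targets: (B, T) ground-truth caption token ids
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    sims = image_emb @ text_emb.t() / temperature        # (B, B) similarity matrix
    labels = torch.arange(sims.size(0), device=sims.device)

    # Symmetric InfoNCE: match each image to its caption and vice versa.
    contrastive = (F.cross_entropy(sims, labels) +
                   F.cross_entropy(sims.t(), labels)) / 2

    # Generative objective: standard next-token cross-entropy on captions.
    generative = F.cross_entropy(caption_logits.flatten(0, 1),
                                 caption_targets.flatten())

    return contrastive + generative  # equal weighting assumed for simplicity
```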


Explaining Neurons

OpenAI, as part of its alignment research, is using GPT-4 to explain the behavior of individual neurons in GPT-2. The approach has three steps:
  • Show GPT-4 text sequences along with the activations they produce in a given GPT-2 neuron, and have GPT-4 write an explanation of what the neuron responds to.
  • Use GPT-4 to simulate that neuron's activations, conditioned only on the explanation.
  • Score the explanation by how well the simulated activations match the neuron's real activations (a toy sketch of this scoring step follows).
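
To make the scoring step concrete, here is a simple correlation-based stand-in; OpenAI's actual scorer compares simulated and real activations over many text excerpts, and all names below are illustrative:

```python
import numpy as np

def score_explanation(real_activations, simulated_activations):
    """Toy stand-in for the explanation scorer.

    real_activations: the GPT-2 neuron's activations on held-out tokens.
    simulated_activations: activations GPT-4 predicts for the same tokens,
    given only the natural-language explanation of the neuron.
    """
    real = np.asarray(real_activations, dtype=float)
    sim = np.asarray(simulated_activations, dtype=float)
    # A perfect explanation lets the simulator track the neuron exactly
    # (correlation 1.0); an uninformative one scores near 0.
    return float(np.corrcoef(real, sim)[0, 1])

# Hypothetical example: a neuron that fires on negation words.
real = [0.1, 0.9, 0.0, 0.8]
sim = [0.0, 1.0, 0.1, 0.7]
print(score_explanation(real, sim))  # close to 1.0
```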
They argue that ML can itself be used to explain and interpret models, noting that:
  • iterating on explanations (e.g., asking GPT-4 for counterexamples to what the neuron appears to be doing, then revising) improves the score described above
  • architecture and model size typically affect the score

Anthropic 100k Context Window

Anthropic's Claude model has just received a massive upgrade: its context window expanded from 9k to 100k tokens, dwarfing the 65k context of MosaicML's recent MPT-7B-StoryWriter model. At this scale, Claude can take entire documents as input, making it a valuable information-retrieval tool for long reports, PDFs, and more. If anything, it's very reminiscent of GPT-4 doing taxes!
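
As a sketch of what this looks like in practice, here is how one might pass a long document to Claude with Anthropic's Python SDK as it existed at the time; the 100k model identifier and file name are assumptions, so check Anthropic's docs for current names:

```python
import anthropic

client = anthropic.Client(api_key="YOUR_API_KEY")

# Hypothetical long document, potentially tens of thousands of tokens.
with open("annual_report.txt") as f:
    document = f.read()

response = client.completion(
    model="claude-v1-100k",  # assumed identifier for the 100k-context model
    prompt=(
        f"{anthropic.HUMAN_PROMPT} Here is a document:\n\n{document}\n\n"
        f"Summarize the key findings in five bullet points.{anthropic.AI_PROMPT}"
    ),
    max_tokens_to_sample=500,
)

print(response["completion"])
```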

Tags: ML News