
Giraffe, Dolma & OLMo, Midjourney Image In-Painting, IDEFICS, and More!

Created on August 26 | Last edited on August 27

Giraffe: Long Context LLMs

From Abacus.AI, an AI-assisted data science platform, Giraffe is a new family of LLMs based on LLaMA and LLaMA 2. The release includes models at three context lengths: 4k, 16k, and 32k. What distinguishes Giraffe from the rest of the LLM zoo? The accompanying paper explores context length extrapolation.
Context length extrapolation means taking an LLM trained on a short context length and evaluating it on longer context lengths, without any further training on those longer contexts.
The main motivation for extrapolation is compute: standard attention scales quadratically in memory and compute with input length. On a related note, check out FlashAttention, a fast and memory-efficient attention implementation!
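To make the idea concrete, below is a minimal sketch of linear position interpolation, one of the positional-encoding tricks explored in this line of work. The function name, dimensions, and scale factor are illustrative assumptions, not Giraffe's actual implementation.

# A minimal sketch of linear position interpolation (illustrative, not Giraffe's exact code).
import torch

def rope_angles(dim: int, max_pos: int, scale: float = 1.0, base: float = 10000.0):
    """Rotary-embedding angle table; scale > 1 squeezes longer contexts back into
    the position range the model saw during training (e.g. scale=4 maps 0..16k to 0..4k)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(max_pos).float() / scale  # interpolate the position indices
    return torch.outer(positions, inv_freq)            # (max_pos, dim // 2) angles

# Hypothetical numbers: trained at 4k context, evaluated at 16k with 4x interpolation.
angles = rope_angles(dim=128, max_pos=16384, scale=4.0)
print(angles.shape)  # torch.Size([16384, 64])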
Along with their new suite of models, they have released three new tasks/benchmarks for evaluating long-context models: LongChat-Lines, FreeFormQA, and AlteredQA. In their paper, the authors demonstrate that perplexity falls short in reflecting a model's long-context performance, which is what prompted these new benchmarks.
All their code is available on GitHub here!

AI2 Dolma Dataset & OLMo

The Allen Institute for AI (AI2) recently released the Dolma dataset, which, according to their blog, is a diverse mix of web content, academic publications, code, books, and encyclopedic materials. They provide a datasheet describing the dataset here. Built from a combination of established NLP datasets and web sources, Dolma's purpose is to serve as the pretraining corpus for OLMo, AI2's open-source LLM.


Midjourney Image In-Painting

First off, what is in-painting? Think of Photoshop: you select part of an image and regenerate just that region. You're painting within the image. Out-painting, by contrast, extends the image beyond its borders.
Midjourney recently released Vary, their text-driven in-painting feature. To try it out, join the Midjourney Discord and query the Midjourney bot!
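Midjourney's feature lives entirely inside Discord, but the underlying idea is easy to sketch with open-source tools. Below is a minimal example of text-driven in-painting, assuming the diffusers library and the runwayml/stable-diffusion-inpainting checkpoint; the image paths and prompt are placeholders, and this is not how Midjourney implements Vary.

# Text-driven in-painting with an open-source pipeline (illustrative only).
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.png").convert("RGB")  # original image (placeholder path)
mask = Image.open("mask.png").convert("RGB")    # white pixels mark the region to repaint

result = pipe(
    prompt="a bouquet of sunflowers in a glass vase",
    image=image,
    mask_image=mask,
).images[0]
result.save("inpainted.png")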

IDEFICS: Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS

IDEFICS is an open-access multimodal model based on DeepMind's Flamingo. What is the purpose of IDEFICS? As part of HuggingFace's commitment to open source, they strive to provide the AI research community with open implementations of proprietary models and papers. More information about the model can be found here. To try the model out, check out this HuggingFace Space!
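If you'd rather run it yourself, here is a rough sketch of querying IDEFICS through transformers, assuming the HuggingFaceM4/idefics-9b-instruct checkpoint and a recent transformers release; the prompt and image URL below are placeholders, so check the model card for the canonical example.

# Querying IDEFICS via transformers (sketch; see the model card for exact usage).
import torch
from transformers import AutoProcessor, IdeficsForVisionText2Text

checkpoint = "HuggingFaceM4/idefics-9b-instruct"
processor = AutoProcessor.from_pretrained(checkpoint)
model = IdeficsForVisionText2Text.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prompts interleave text with images (URLs or PIL images).
prompts = [
    [
        "User: What is shown in this image?",
        "https://example.com/cat.png",  # placeholder image URL
        "<end_of_utterance>",
        "\nAssistant:",
    ]
]
inputs = processor(prompts, return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])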


OpenAI GPT-3.5 Turbo Fine-Tuning

You can now fine-tune GPT-3.5 Turbo. Fine-tuning in what sense? Granular training on an entire dataset? Not quite. From their fine-tuning guide and blog post, it seems you're essentially baking meta-prompts into the model ahead of inference to steer it in the right direction. This simplifies prompt engineering, and prompts can be much shorter: the role/persona assignment is decoupled from the actual input data or question.
# An example prompt.
"""
<Assign role/persona; set the tone and specify formats>
<Context>

<Input Data>
Output: <Output>
"""

# An example prompt after fine-tuning the model.
"""
<Context>

<Input Data>
Output: <Output>
"""
Fine-tuning encapsulates creating a dataset of examples, uploading it and starting the job, and loading in the fine-tuned model. All done in just a couple of lines of code or commands!
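Here is what that workflow looks like with the openai Python package, following the calls shown in OpenAI's announcement; the file name and the fine-tuned model id are placeholders.

# The fine-tuning workflow with the openai package (v0.27-style API; ids are placeholders).
import openai

# 1. Training data: a JSONL file where each line is one chat example, e.g.
#    {"messages": [{"role": "system", "content": "..."},
#                  {"role": "user", "content": "..."},
#                  {"role": "assistant", "content": "..."}]}
training_file = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# 2. Kick off the fine-tuning job on GPT-3.5 Turbo.
job = openai.FineTuningJob.create(training_file=training_file.id, model="gpt-3.5-turbo")

# 3. Once the job finishes, call the fine-tuned model like any other chat model.
response = openai.ChatCompletion.create(
    model="ft:gpt-3.5-turbo:my-org:custom-suffix:id",  # placeholder model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)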

Repo Spotlight & Other News


  • ollama, run LLaMA 2 locally and offline from the command line
  • GodMode, an app for chatting with ChatGPT, Claude 2, Perplexity, Bing, and more in one place
  • llama-gpt, a self-hosted, offline, ChatGPT-style interface to LLaMA 2, similar in spirit to ollama

References

Giraffe: Long Context LLMs
AI2 Dolma Dataset
Midjourney Image In-Painting
IDEFICS
OpenAI GPT-3.5 Turbo Fine-Tuning
HuggingFace Raises $235M
Platypus
llama2.py
ollama
GodMode
llama-gpt
Tags: ML News