Mojo, Scaling Transformers, YOLO-NAS, MPT & OpenLLaMA
A hot new Python superset language, scaling the context length of transformers to 1M tokens, YOLO-NAS achieving SOTA, new foundation models, and more!
Mojo: A Newer, Faster Python Superset Language
Mojo, a superset of Python designed for AI developers, was announced by Modular. It's still a work in progress, but developers can sign up for early access to a hosted Jupyter playground.

Scaling Transformers to 1M Tokens
All language models have a limited context length; Bing Chat, for example, only lets you enter 4,000 characters per prompt. Yannic Kilcher has a great video breaking down this paper.
This paper builds on an earlier paper, Recurrent Memory Transformer. In short: when an input is too long for a given model (and the available hardware) to process at once, it is split into chunks that are fed to the model sequentially. With each chunk, the model reads in a small, fixed set of memory tokens and writes out an updated set that is carried over to the next chunk. Essentially, it brings the recurrence of RNNs to transformers.
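Here's a minimal sketch of that segment-level recurrence, assuming a plain PyTorch nn.TransformerEncoder as a stand-in for the language model; the chunk length, number of memory tokens, and the way memory is read back are illustrative choices, not the paper's exact setup.

import torch
import torch.nn as nn

class RecurrentMemoryEncoder(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_layers=2, num_mem=8, chunk_len=128):
        super().__init__()
        self.num_mem = num_mem
        self.chunk_len = chunk_len
        # Learned initial memory tokens, shape (1, num_mem, d_model)
        self.init_memory = nn.Parameter(torch.randn(1, num_mem, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):
        # x: (batch, seq_len, d_model), where seq_len can be far longer than chunk_len
        memory = self.init_memory.expand(x.size(0), -1, -1)
        outputs = []
        # Process the long sequence chunk by chunk, carrying memory tokens across chunks
        for chunk in x.split(self.chunk_len, dim=1):
            segment = torch.cat([memory, chunk], dim=1)   # prepend memory to the chunk
            encoded = self.encoder(segment)
            memory = encoded[:, :self.num_mem, :]         # updated memory for the next chunk
            outputs.append(encoded[:, self.num_mem:, :])
        return torch.cat(outputs, dim=1), memory

# Usage: a 1,024-token "long" input processed as eight 128-token chunks
model = RecurrentMemoryEncoder()
tokens = torch.randn(2, 1024, 256)
out, final_memory = model(tokens)
print(out.shape, final_memory.shape)  # (2, 1024, 256) and (2, 8, 256)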

YOLO-NAS
YOLO-NAS, released by Deci, outperforms previous YOLO variants on the accuracy-latency tradeoff. NAS stands for Neural Architecture Search, which Deci used to discover the architecture. Running pretrained inference takes only a few lines of super_gradients code:

import super_gradients

# Load the large YOLO-NAS variant with COCO-pretrained weights and move it to the GPU
yolo_nas = super_gradients.training.models.get("yolo_nas_l", pretrained_weights="coco").cuda()

# Run inference directly on an image URL and display the detections
yolo_nas.predict("https://deci-pretrained-models.s3.amazonaws.com/sample_images/beatles-abbeyroad.jpg").show()
They also have example notebooks for finetuning and quantization.
MosaicML MPT Models & OpenLLaMA
MosaicML, as part of their foundation model series, released the MosaicML Pretrained Transformer (MPT) family of LLMs. You can read more on their blog and interact with the models in their Hugging Face Spaces; a minimal loading sketch follows the model list below.
They have 4 models:
- MPT-7B Base
- MPT-7B-StoryWriter-65k+
- MPT-7B-Instruct
- MPT-7B-Chat
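As a quick, hedged illustration (a minimal sketch, not MosaicML's official example): the base model lives on the Hugging Face Hub as mosaicml/mpt-7b and ships its own modeling code, so trust_remote_code=True is required, and the model card points to the EleutherAI/gpt-neox-20b tokenizer.

from transformers import AutoModelForCausalLM, AutoTokenizer

# MPT-7B reuses the GPT-NeoX tokenizer rather than shipping its own
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# MPT's model class lives in the repo, not in transformers, hence trust_remote_code=True
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    trust_remote_code=True,
    torch_dtype="auto",
)

inputs = tokenizer("Open source LLMs are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))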
MosaicML is part of a larger cohort of organizations releasing foundation models: StableLM from Stability AI, Pythia from EleutherAI, LLaMA from Meta, OpenLLaMA from BAIR, and more.
It's incredible how fast open source is progressing in the LLM space!
Interesting Resources