Mojo, Scaling Transformers, YOLO-NAS, MPT & OpenLLaMA
A hot new Python superset language, scaling the context length of transformers to 1M tokens, YOLO-NAS achieving SOTA, new foundation models, and more!
Mojo: A Newer, Faster Python Superset Language
Mojo, a superset of Python designed for AI developers, was announced by Modular. It's still a work in progress, but developers can sign up for early access to a hosted Jupyter playground.

Scaling Transformers to 1M Tokens
All language models have a limited context length; Bing Chat, for example, only lets you enter 4,000 characters per prompt. Yannic Kilcher has a great video breaking down this paper.
This paper builds on an earlier paper, Recurrent Memory Transformer. In short: when an input is too long for a given model (and the available hardware) to process at once, it is split into chunks that are fed to the model sequentially. With each chunk, the model reads in a small, fixed set of memory tokens and writes out an updated set that is carried over to the next chunk. Essentially, it brings the recurrence of RNNs to transformers.
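Here's a minimal sketch of that segment-level recurrence, assuming a plain PyTorch nn.TransformerEncoder as a stand-in for the language model; the chunk length, number of memory tokens, and the way memory is read back are illustrative choices, not the paper's exact setup.

import torch
import torch.nn as nn

class RecurrentMemoryEncoder(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_layers=2, num_mem=8, chunk_len=128):
        super().__init__()
        self.num_mem = num_mem
        self.chunk_len = chunk_len
        # Learned initial memory tokens, shape (1, num_mem, d_model)
        self.init_memory = nn.Parameter(torch.randn(1, num_mem, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):
        # x: (batch, seq_len, d_model), where seq_len can be far longer than chunk_len
        memory = self.init_memory.expand(x.size(0), -1, -1)
        outputs = []
        # Process the long sequence chunk by chunk, carrying memory tokens across chunks
        for chunk in x.split(self.chunk_len, dim=1):
            segment = torch.cat([memory, chunk], dim=1)   # prepend memory to the chunk
            encoded = self.encoder(segment)
            memory = encoded[:, :self.num_mem, :]         # updated memory for the next chunk
            outputs.append(encoded[:, self.num_mem:, :])
        return torch.cat(outputs, dim=1), memory

# Usage: a 1,024-token "long" input processed as eight 128-token chunks
model = RecurrentMemoryEncoder()
tokens = torch.randn(2, 1024, 256)
out, final_memory = model(tokens)
print(out.shape, final_memory.shape)  # (2, 1024, 256) and (2, 8, 256)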

YOLO-NAS
YOLO-NAS, released by Deci, outperforms previous YOLO variants on the accuracy-latency tradeoff. NAS stands for Neural Architecture Search, which Deci used to discover the architecture. Running pretrained inference takes only a few lines of super_gradients code:

import super_gradients

# Load the large YOLO-NAS variant with COCO-pretrained weights and move it to the GPU
yolo_nas = super_gradients.training.models.get("yolo_nas_l", pretrained_weights="coco").cuda()

# Run inference directly on an image URL and display the detections
yolo_nas.predict("https://deci-pretrained-models.s3.amazonaws.com/sample_images/beatles-abbeyroad.jpg").show()
They also have example notebooks for finetuning and quantization.
MosaicML MPT Models & OpenLLaMA
MosaicML, as part of their foundation model series, released the MosaicML Pretrained Transformer (MPT) family of LLMs. You can read more on their blog and interact with the models in their Hugging Face Spaces; a minimal loading sketch follows the model list below.
They have 4 models:
- MPT-7B Base
- MPT-7B-StoryWriter-65k+
- MPT-7B-Instruct
- MPT-7B-Chat
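As a quick, hedged illustration (a minimal sketch, not MosaicML's official example): the base model lives on the Hugging Face Hub as mosaicml/mpt-7b and ships its own modeling code, so trust_remote_code=True is required, and the model card points to the EleutherAI/gpt-neox-20b tokenizer.

from transformers import AutoModelForCausalLM, AutoTokenizer

# MPT-7B reuses the GPT-NeoX tokenizer rather than shipping its own
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# MPT's model class lives in the repo, not in transformers, hence trust_remote_code=True
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    trust_remote_code=True,
    torch_dtype="auto",
)

inputs = tokenizer("Open source LLMs are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))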
MosaicML is part of a larger cohort of organizations releasing foundation models: StableLM from Stability AI, Pythia from EleutherAI, LLaMA from Meta, OpenLLaMA from BAIR, and more.
It's incredible how fast open source is progressing in the LLM space!
Interesting Resources