Skip to main content

Gorilla, SDXL, MetaGPT & More

Using tools with the Gorilla LLM family, generating higher quality images with SDXL and software engineering with MetaGPT!
Created on August 6|Last edited on August 6

Gorilla



Gorilla is an LLM trained on over 1,600 API calls. Given a command, it's capable of generating a set of API calls or code. On Hugging Face, they provide their dataset APIBench and 6 different models! Models denoted with a hf use Hugging Face APIs. Models with tf and th use TensorFlow v2 and Torch Hub, respectively. Querying the model is as simple as making an OpenAI call with a specific model!

Check out their paper and their recently released Gorilla CLI that helps you write commands in the terminal with support for AWS, Kubernetes, GCP, Azure, GitHub, and more!

SDXL

SDXL, StabilityAI's newest diffusion model, is their best yet.

SDXL is a 2.6B model with a base UNet model, a refinement UNet model, a VAE decoder, and CLIP ViT-L + OpenCLIP ViT-BigG.


Additionally, the authors of SDXL conditioned their model on the original image sizes (as opposed to images of a certain size, often done with Latent Diffusion Models). With SD 1.5 and 2.1, they used random cropping for data augmentation which unintentionally bled into the model's generated images in the form of cut-off pictures. The authors of SDXL avoided this by conditioning the model on the crop parameters.

Training was done on multiple aspect ratios to reflect the diversity in image sizes in the real world. Their last improvement was to their autoencoder which they increased the batch size from 9 to 256 leading to an incremental improvement.


MetaGPT

Check out the MetaGPT post if you haven't already!
In short, MetaGPT is like Gorilla, but generalized to the project-level scope, meaning it can not only write API calls and code, but also act as the tester, designer, and product manager. It's like a software engineering team of sorts! What does this look like?

It's "Design a RecSys" crazy. This multi-agent framework doesn't just spit out code when you interact with it like ChatGPT. Its new mediums allow it to tackle other aspects of the project process like designing, testing, infrastructure, and more.

Other Interesting News

  • CoreWeave secures a $2.3B loan to drive their ever-growing GPU cluster.
  • ollama is a library/collection of open-source LLMs (queried through cli)


References