ESM Atlas: Meta AI's New Open-Source 600M+ Metagenomic Structure Database

Meta AI has released a massive database of metagenomic protein structure predictions, generated with its own ESM protein structure prediction models.
Created on November 1 | Last edited on November 1
ESMFold, Meta AI's model for protein structure prediction, has been used to create the most extensive collection of protein structures to date: the ESM Metagenomic Atlas, a new database of more than 600 million predicted metagenomic protein structures.

Metagenomics is the study of genetic material collected directly from environmental samples, such as soil or ocean water. These samples contain countless microorganisms, each with its own genome and its own proteins whose structures can be studied.
Proteins are represented as sequences of amino acids, written as strings of letters, much as language is built from sequences of words. ESMFold builds on Meta's ESM family of protein language models, which learn the "language" of protein sequences; ESMFold applies what these models learn to predict how each protein folds into its three-dimensional shape.
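To make the sequence-as-language analogy concrete, here is a minimal Python sketch, assuming a toy vocabulary of the 20 standard amino acids with one integer ID per letter. The names `STANDARD_AMINO_ACIDS`, `VOCAB`, and `tokenize` are hypothetical and used only for illustration; ESM's own tokenizer uses a larger alphabet that includes special tokens and is not reproduced here.

```python
# Toy illustration (NOT ESM's real tokenizer): a protein is a string over
# the 20 standard amino-acid letters, and a language model reads it one
# residue at a time, much as a text model reads a sentence token by token.

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: one integer ID per amino-acid letter.
VOCAB = {aa: idx for idx, aa in enumerate(STANDARD_AMINO_ACIDS)}


def tokenize(sequence: str) -> list[int]:
    """Map each residue letter in `sequence` to its integer ID in VOCAB."""
    sequence = sequence.strip().upper()
    unknown = sorted(set(sequence) - set(VOCAB))
    if unknown:
        # This toy vocabulary has no entry for non-standard residue codes.
        raise ValueError(f"non-standard residues: {unknown}")
    return [VOCAB[aa] for aa in sequence]


if __name__ == "__main__":
    # A short peptide fragment used purely for illustration.
    peptide = "MKTAYIAK"
    print(tokenize(peptide))  # prints [10, 8, 16, 0, 19, 7, 0, 8]
```

Treating each residue as one token mirrors how a text model treats each word; a real protein language model then learns embeddings for these tokens, which is the step this sketch leaves out.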
The ESM Atlas was built by running ESMFold on nearly all of the proteins in the MGnify database, yielding more than 600 million predicted structures. Half of these proteins are known to exist, yet their function remains unknown; ESMFold's predicted structures give biologists a helping hand in working out what these elusive proteins do.
The ESM Atlas can be explored on its dedicated website, and the dataset can be downloaded by following the instructions in the ESM GitHub repository. As always, the ESM models used to create this database are open source, and their weights can be downloaded from the same repository.

Tags: ML News