ESM Atlas: Meta AI's New Open-Source 600M+ Metagenomic Structure Database
Meta AI has released a massive database of metagenomic protein structure predictions, generated with its own ESM protein structure prediction models.
Created on November 1|Last edited on November 1
ESMFold, Meta AI's model for protein structure prediction, has been used to create the largest collection of predicted protein structures to date: a new database of more than 600 million metagenomic protein structures called the ESM Metagenomic Atlas.
Metagenomics is the study of organism genomes collected directly from an environmental sample, such as from soil or the ocean. Within these samples are countless microorganisms, all with their own unique genomes and protein structures to study.
Proteins can be written as strings of characters, one letter per amino acid, much as language is built from sequences of words. ESMFold and its related models are language models developed by Meta to learn the "language" of protein sequences and use it to predict how those proteins fold in 3D space.
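To make the "strings of characters" idea concrete, here is a minimal sketch of how a protein sequence might be turned into integer tokens before a language model sees it. The vocabulary, the `tokenize` helper, and the example sequence are all made up for illustration; this is not ESM's actual tokenizer, which uses its own alphabet and special tokens.

```python
# The 20 standard amino acids, each written as a single letter.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary (an assumption, not ESM's real one):
# ID 0 is reserved for unknown residues, the rest are numbered from 1.
UNKNOWN_ID = 0
TOKEN_IDS = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence: str) -> list[int]:
    """Map each residue letter to an integer ID, one token per amino acid."""
    return [TOKEN_IDS.get(residue, UNKNOWN_ID) for residue in sequence.upper()]


# A short, made-up sequence used purely as an example.
example = "MKTAYIAK"
tokens = tokenize(example)
print(tokens)
```

A real protein language model would then learn an embedding for each token ID, in the same way a text model learns embeddings for words or word pieces.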
The ESM Atlas was built using ESMFold models and contains predicted structures for nearly all proteins in the MGnify database, more than 600 million in total. Half of these proteins are known to exist, yet their function remains unknown. With ESMFold's predicted structures, biologists have a helping hand in discovering the function of these elusive proteins.
The ESM Atlas can be explored on its dedicated website, and the dataset can be downloaded by following the instructions on the ESM GitHub. As always, the ESM models used to create this database are open source, and their weights are downloadable from the same GitHub repository.
ESMFold, Meta's Rival To AlphaFold, Gets New Public Releases
ESMFold, Meta AI's horse in the protein folding race, gets a new public model set release that gets comparable results at significantly faster speeds.
AlphaFold's Database Grows Over 200x To Cover Nearly All Known Proteins
AlphaFold's protein database has expanded from 1 million to over 200 million catalogued protein structures - nearly all proteins known to science.
Tags: ML News