Skip to main content

OpenFold: A PyTorch Reproduction Of DeepMind's AlphaFold

OpenFold hits the ground running with it's version 1.0 release. This ML model is a PyTorch reproduction of Deepmind's AlphaFold, a model built for protein folding.
Created on June 23|Last edited on June 24
Protein folding is a problem scientists have been facing for decades - knowing the specific way particular proteins are complexly configured could lead to rapid breakthroughs in medicine and biological understanding across the board. However, the ways that proteins fold around on themselves is extremely complex, far too complex for a human to compute in any timely manner.
AlphaFold is a machine learning model built by DeepMind to compete in CASP14 (the 2020 iteration of the biannual Critical Assessment of Structure Prediction experiment, an event to gauge progress in protein folding tech). Revolutionary at the time, AlphaFold broke records and set the bar for future research.
Now, OpenFold, a PyTorch reproduction of AlphaFold, has had its version 1.0 release.


What is OpenFold?

OpenFold is a PyTorch reproduction of DeepMind's AlphaFold, a machine learning model built to automatically process protein folding experiments. OpenFold is not the first of its kind, but it's by far the most complete and boasts ability equal to or greater than AlphaFold.
Like AlphaFold, and following in its own name, OpenFold is totally open-source and provided under very generously permissive licensing. Both have their parameters easily downloadable and licensed under CC BY 4.0, while their code available through GitHub is licensed under Apache 2.0. This means that OpenFold is ready to be used for just about any purpose by anyone interested.
The clearest difference between OpenFold and AlphaFold is that while AlphaFold was developed for a JAX workflow, OpenFold bases all of its code on a PyTorch environment. OpenFold is also trainable, meaning variants may be created for specialized research, unlike AlphaFold.
OpenFold was trained for around 100K compute hours on NVIDIA A100 Tensor Core GPUs, though the researchers found that 90% of final accuracy was attained in only the first 3K compute hours. After the initial jump, accuracy gain significantly slowed, though it still did climb gradually.

How does OpenFold compare to AlphaFold?

Overall, OpenFold's accuracy is comparable, if slightly higher on average, to AlphaFold. OpenFold's inference was also found to be twice as fast when compared to AlphaFold for short proteins, though the advantage shrinks on longer proteins. With considerable optimizations to memory usage compared to AlphaFold, OpenFold is able to handle much longer protein sequences - up to around 4,600 residues on a single 40GB A100.

Find out more

Tags: ML News
Iterate on AI agents and models faster. Try Weights & Biases today.