Molecule optimization with MolMIM and WandB
What is MolMIM?
MolMIM is a latent variable model developed by NVIDIA that is trained in an unsupervised manner on a large-scale dataset of molecules represented as SMILES strings. MolMIM uses a transformer architecture to learn an informative, fixed-size latent space via Mutual Information Machine (MIM) learning, a framework for latent variable models that promotes informative and clustered latent codes. MolMIM can be used to sample novel molecules from the model's latent space. Specifically, MolMIM:
- Allows users to generate molecules similar to a seed molecule in SMILES format by randomly perturbing the latent code encoded from the seed molecule (e.g., by adding zero-centered Gaussian noise with a desired variance) and decoding it back into SMILES.
- Performs optimization with the CMA-ES algorithm in the model's latent space and samples molecules with improved values of a desired scoring function; a minimal sketch follows this list. (From the official BioNeMo documentation: https://docs.nvidia.com/bionemo-framework/latest/models/molmim.html)
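The sketch below illustrates both ideas. It assumes hypothetical `encode` and `decode` helpers standing in for MolMIM's SMILES-to-latent interface (they are not the actual BioNeMo API); CMA-ES comes from the `cma` (pycma) package, and RDKit's QED is used here only as an example scoring function.

```python
# Minimal sketch of MolMIM-style latent-space sampling and CMA-ES optimization.
# `encode` and `decode` are hypothetical placeholders for the model's
# SMILES <-> latent-code interface; QED drug-likeness is an example score.
import numpy as np
import cma
from rdkit import Chem
from rdkit.Chem import QED


def encode(smiles: str) -> np.ndarray:
    """Hypothetical: map a SMILES string to a fixed-size latent code."""
    raise NotImplementedError


def decode(latent: np.ndarray) -> str:
    """Hypothetical: map a latent code back to a SMILES string."""
    raise NotImplementedError


def score(smiles: str) -> float:
    """Example scoring function: QED drug-likeness (higher is better)."""
    mol = Chem.MolFromSmiles(smiles)
    return QED.qed(mol) if mol is not None else 0.0


def sample_similar(seed_smiles: str, n: int = 10, sigma: float = 0.1) -> list[str]:
    """Perturb the seed's latent code with zero-centered Gaussian noise and decode."""
    z = encode(seed_smiles)
    return [decode(z + np.random.normal(0.0, sigma, size=z.shape)) for _ in range(n)]


def optimize(seed_smiles: str, iterations: int = 50, sigma: float = 0.5) -> str:
    """Maximize the score by running CMA-ES directly in the latent space."""
    z0 = encode(seed_smiles)
    es = cma.CMAEvolutionStrategy(z0, sigma)
    best_smiles, best_score = seed_smiles, score(seed_smiles)
    for _ in range(iterations):
        candidates = es.ask()                      # propose latent codes
        smiles = [decode(np.asarray(c)) for c in candidates]
        scores = [score(s) for s in smiles]
        es.tell(candidates, [-s for s in scores])  # CMA-ES minimizes, so negate
        if max(scores) > best_score:
            best_score = max(scores)
            best_smiles = smiles[int(np.argmax(scores))]
    return best_smiles
```

In this setup, the perturbation variance `sigma` trades off similarity to the seed molecule against the diversity of the generated molecules.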
Tips
- MIM is used to avoid the main caveat of VAEs, a phenomenon called posterior collapse, in which the learned encoding distribution closely matches the prior and the latent codes carry little information (Razavi et al., 2019); this is stated compactly after this list.
- MolMIM is the first application of MIM learning to molecular data.
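As a quick formal reminder (notation assumed here for illustration, not taken from the MolMIM documentation), posterior collapse means the approximate posterior degenerates to the prior, so the latent code carries essentially no information about the input:

```latex
% Posterior collapse: the encoder matches the prior for (almost) every input,
% so the mutual information between data x and latent code z vanishes.
q_\phi(z \mid x) \approx p(z) \;\; \text{for (almost) all } x
\quad \Longrightarrow \quad
I(x; z) \approx 0
```

MIM learning instead encourages high mutual information between $x$ and $z$, which is what keeps the latent codes informative enough for the sampling and optimization described above.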
How to optimize a molecule
Problem setting
Conclusion