Molecule optimization with MolMIM and WandB
What is MolMIM?
MolMIM is a latent variable model developed by NVIDIA that is trained in an unsupervised manner on a large-scale dataset of molecules represented as SMILES strings. MolMIM uses a transformer architecture to learn an informative, fixed-size latent space via Mutual Information Machine (MIM) learning, a framework for latent variable models that promotes informative and clustered latent codes. MolMIM can be used to sample novel molecules from the model's latent space. Specifically, MolMIM:
- Allows users to generate molecules similar to a seed molecule in SMILES format by randomly perturbing the latent code encoded from the seed molecule (e.g., by adding zero-centered Gaussian noise with a desired variance) and decoding it back into SMILES.
- Performs optimization with the CMA-ES algorithm in the model's latent space and samples molecules with improved values of a desired scoring function; a minimal sketch follows this list. (From the official BioNeMo documentation: https://docs.nvidia.com/bionemo-framework/latest/models/molmim.html)
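The sketch below illustrates both ideas. It assumes hypothetical `encode` and `decode` helpers standing in for MolMIM's SMILES-to-latent interface (they are not the actual BioNeMo API); CMA-ES comes from the `cma` (pycma) package, and RDKit's QED is used here only as an example scoring function.

```python
# Minimal sketch of MolMIM-style latent-space sampling and CMA-ES optimization.
# `encode` and `decode` are hypothetical placeholders for the model's
# SMILES <-> latent-code interface; QED drug-likeness is an example score.
import numpy as np
import cma
from rdkit import Chem
from rdkit.Chem import QED


def encode(smiles: str) -> np.ndarray:
    """Hypothetical: map a SMILES string to a fixed-size latent code."""
    raise NotImplementedError


def decode(latent: np.ndarray) -> str:
    """Hypothetical: map a latent code back to a SMILES string."""
    raise NotImplementedError


def score(smiles: str) -> float:
    """Example scoring function: QED drug-likeness (higher is better)."""
    mol = Chem.MolFromSmiles(smiles)
    return QED.qed(mol) if mol is not None else 0.0


def sample_similar(seed_smiles: str, n: int = 10, sigma: float = 0.1) -> list[str]:
    """Perturb the seed's latent code with zero-centered Gaussian noise and decode."""
    z = encode(seed_smiles)
    return [decode(z + np.random.normal(0.0, sigma, size=z.shape)) for _ in range(n)]


def optimize(seed_smiles: str, iterations: int = 50, sigma: float = 0.5) -> str:
    """Maximize the score by running CMA-ES directly in the latent space."""
    z0 = encode(seed_smiles)
    es = cma.CMAEvolutionStrategy(z0, sigma)
    best_smiles, best_score = seed_smiles, score(seed_smiles)
    for _ in range(iterations):
        candidates = es.ask()                      # propose latent codes
        smiles = [decode(np.asarray(c)) for c in candidates]
        scores = [score(s) for s in smiles]
        es.tell(candidates, [-s for s in scores])  # CMA-ES minimizes, so negate
        if max(scores) > best_score:
            best_score = max(scores)
            best_smiles = smiles[int(np.argmax(scores))]
    return best_smiles
```

In this setup, the perturbation variance `sigma` trades off similarity to the seed molecule against the diversity of the generated molecules.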
Tips
- MIM is used to avoid the main caveat of VAEs, a phenomenon called posterior collapse, in which the learned encoding distribution closely matches the prior and the latent codes carry little information (Razavi et al., 2019); this is stated compactly after this list.
- MolMIM is the first application of MIM learning to molecular data.
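As a quick formal reminder (notation assumed here for illustration, not taken from the MolMIM documentation), posterior collapse means the approximate posterior degenerates to the prior, so the latent code carries essentially no information about the input:

```latex
% Posterior collapse: the encoder matches the prior for (almost) every input,
% so the mutual information between data x and latent code z vanishes.
q_\phi(z \mid x) \approx p(z) \;\; \text{for (almost) all } x
\quad \Longrightarrow \quad
I(x; z) \approx 0
```

MIM learning instead encourages high mutual information between $x$ and $z$, which is what keeps the latent codes informative enough for the sampling and optimization described above.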
How to optimize a molecule
Problem setting
Conclusion