DeepSAFT
Contents: Old Experiments · New experiment · Pretraining Experiments · Dipole moment · Pretraining on Combisolv · Experiments on Regressed Experimental Data · Hyperparameter grid search for PyG MPNN model · Hyperparameter sweep using quasi random search · PyG pretraining
Old Experiments
New experiment
I made the following changes and ran the experiments again:
- Filter non-associating molecules by zeroing their association-parameter terms in the loss (see the sketch after this list)
- Always use vector/tensorial dipole moment representation in PaiNN
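As a minimal sketch of the filtering idea, assuming a boolean `is_associating` flag per molecule and that the association parameters sit in known output columns (the function name, column indices, and parameter layout here are all hypothetical):

```python
import torch


def masked_saft_loss(pred, target, is_associating, assoc_cols=(3, 4)):
    """MSE over PC-SAFT parameters, zeroing the association-parameter
    error terms for molecules flagged as non-associating.

    pred, target:   (batch, n_params) tensors
    is_associating: (batch,) boolean tensor
    assoc_cols:     column indices of the association parameters
                    (hypothetical output layout)
    """
    mask = torch.ones_like(target)
    for col in assoc_cols:
        # Zero the association columns for non-associating molecules.
        mask[~is_associating, col] = 0.0
    se = (pred - target) ** 2 * mask
    # Average only over the entries that were kept.
    return se.sum() / mask.sum().clamp(min=1)
```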
Below, I plot distributions of the mean absolute error for PaiNN and the GNN. These experiments suggest that PaiNN only gives better predictions for the dipole moment; the remaining parameters should use the GNN.
Below, I show the validation predictions of the PaiNN model, ordered from highest to lowest error in the association parameter.
Pretraining Experiments
Dipole moment
Pretraining on QM9 and then fine-tuning on data from SEPP, while allowing the whole network to be updated, seems to work best.
Below, I compare several PaiNN models for predicting dipole moment.
Previously, I tried training the DMPNN on QM9 dipole moments (see this run) and then freezing everything up to the first feedforward layer to predict only the dipole moments. That worked terribly!
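For reference, the two fine-tuning regimes being compared could be set up roughly as below. This is only a sketch: the checkpoint path, the `readout` module name, and the learning rate are placeholders, not the actual configuration.

```python
import torch

# Load a checkpoint produced by QM9 pretraining (path is a placeholder).
model = torch.load("painn_qm9_pretrained.pt")

FREEZE_BACKBONE = False  # full fine-tuning worked best in these experiments

if FREEZE_BACKBONE:
    # Variant that worked badly for the DMPNN: freeze everything up to the
    # first feedforward (readout) layer and only train the head.
    for name, param in model.named_parameters():
        if not name.startswith("readout"):  # hypothetical module name
            param.requires_grad = False

# Only parameters with requires_grad=True are updated during fine-tuning.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```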
Pretraining on Combisolv
I used the 1M-point solvation energy dataset from this paper for pretraining and then fine-tuned on the SEPP data. I thought solvation energy might have some relationship to PC-SAFT parameters (e.g., see this paper). Unfortunately, the experiment did not seem to help at all.
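One way to set up this kind of transfer is to keep the pretrained message-passing backbone and swap the single-output solvation-energy head for a multi-output PC-SAFT head. The sketch below assumes the head is a `Linear` module named `head` and uses a placeholder output size; both are assumptions, not the actual model layout.

```python
import torch
from torch import nn

# Hypothetical: backbone pretrained to predict a single solvation energy.
pretrained = torch.load("mpnn_combisolv_pretrained.pt")

# Replace the 1-dimensional solvation-energy head with a head that predicts
# the PC-SAFT parameters (attribute name and output size are assumptions).
hidden_dim = pretrained.head.in_features
pretrained.head = nn.Linear(hidden_dim, 5)

# Fine-tune the whole network on the SEPP data from here.
```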
Experiments on Regressed Experimental Data
Hyperparameter grid search for PyG MPNN model
I ran a grid search over the following (a sketch of enumerating the grid follows the list):
- Whether or not to use the GRU layer
- Activation: ReLU or LeakyReLU
- Number of convolutions (i.e., message passing steps): 1, 2, 3, 4
- Fingerprint dimension: 32, 64, 128
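A minimal way to enumerate that grid; the hyperparameter names are illustrative, not the actual config keys.

```python
from itertools import product

# Hypothetical grid matching the search above; each combination is one run.
grid = {
    "use_gru": [True, False],
    "activation": ["relu", "leaky_relu"],
    "num_convs": [1, 2, 3, 4],
    "fingerprint_dim": [32, 64, 128],
}

configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 2 * 2 * 4 * 3 = 48 configurations
```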
Takeaways:
- Increasing the fingerprint dimension causes a slight improvement in validation loss
- Increasing the number of convolutions seems to have a negative impact. I'm not 100% sure about this, but it makes some sense because it also increases the number of parameters.
- Increasing dropout acts as a form of regularization and seems to reduce validation loss.
- Using the GRU layer has a slightly positive impact.
- ReLU has a slight advantage over LeakyReLU, but only a very slight one.
Before running another search, I did some testing and found that training ran faster with num_workers=0 in the data loader.
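For reference, the only change is the `num_workers` argument of the PyTorch Geometric DataLoader; the dataset and batch size below are just placeholders (QM9 is used as an example).

```python
from torch_geometric.datasets import QM9
from torch_geometric.loader import DataLoader

# Example with QM9; any of the datasets used here would work the same way.
dataset = QM9(root="data/QM9")

# num_workers=0 keeps data loading in the main process; for these small
# molecular graphs that was faster than spawning worker processes.
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=0)
```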
Hyperparameter sweep using quasi random search
I ran a more extensive hyperparameter sweep over 11 scientific hyperparameters, while also optimizing the "nuisance" hyperparameters, as suggested by the Google tuning playbook.
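As a sketch of the quasi-random part: a scrambled Sobol sequence spreads configurations evenly over the search space instead of sampling them independently. The actual sweep covered 11 hyperparameters; the three dimensions and ranges below are purely illustrative.

```python
from scipy.stats import qmc

# Illustrative continuous search space:
# log10(learning rate), dropout rate, hidden dimension.
l_bounds = [-5.0, 0.0, 32]
u_bounds = [-2.0, 0.5, 256]

sampler = qmc.Sobol(d=3, scramble=True, seed=0)
unit_points = sampler.random_base2(m=5)            # 2**5 = 32 quasi-random points
points = qmc.scale(unit_points, l_bounds, u_bounds)

configs = [
    {"lr": 10 ** p[0], "dropout": p[1], "hidden_dim": int(round(p[2]))}
    for p in points
]
```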
Main conclusions:
- Optimizer and scheduler are most significant
- Filtering non-associating molecules inside the loss function might give slightly improved performance
- Dropout helps
It seems that Noam (Adam with the Noam learning-rate schedule) and plain Adam work best, and both need hyperparameter tuning.
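For context, the Noam schedule from the Transformer paper can be wrapped around Adam with a `LambdaLR`; `d_model` and `warmup` are the knobs that need tuning. The model, `d_model=256`, and `warmup=4000` below are placeholders, not the values used in the sweep.

```python
import torch


def noam_lambda(d_model: int, warmup: int):
    # lr multiplier: d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    def fn(step: int) -> float:
        step = max(step, 1)  # avoid division by zero at step 0
        return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
    return fn


model = torch.nn.Linear(16, 1)  # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98))
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=noam_lambda(d_model=256, warmup=4000)
)

# Inside the training loop: call optimizer.step(), then scheduler.step().
```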
PyG pretraining
- This works okay, but not great!
- The dropout rate seems to be most important.
- A lower learning rate is better