DeepSAFT
Contents: Old Experiments · New experiment · Pretraining Experiments · Dipole moment · Pretraining on Combisolv · Experiments on Regressed Experimental Data · Hyperparameter grid search for PyG MPNN model · Hyperparameter sweep using quasi random search · PyG pretraining
Old Experiments
New experiment
I made the following changes and ran the experiments again:
- Filter non-associating molecules by zeroing their association-parameter terms in the loss (see the sketch after this list)
- Always use vector/tensorial dipole moment representation in PaiNN
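As a minimal sketch of the filtering idea, assuming a boolean `is_associating` flag per molecule and that the association parameters sit in known output columns (the function name, column indices, and parameter layout here are all hypothetical):

```python
import torch


def masked_saft_loss(pred, target, is_associating, assoc_cols=(3, 4)):
    """MSE over PC-SAFT parameters, zeroing the association-parameter
    error terms for molecules flagged as non-associating.

    pred, target:   (batch, n_params) tensors
    is_associating: (batch,) boolean tensor
    assoc_cols:     column indices of the association parameters
                    (hypothetical output layout)
    """
    mask = torch.ones_like(target)
    for col in assoc_cols:
        # Zero the association columns for non-associating molecules.
        mask[~is_associating, col] = 0.0
    se = (pred - target) ** 2 * mask
    # Average only over the entries that were kept.
    return se.sum() / mask.sum().clamp(min=1)
```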
Below, I plot distributions of the mean absolute error for PaiNN and the GNN. These experiments suggest that PaiNN only gives better predictions for the dipole moment; the remaining parameters should use the GNN.
Below, I show the validation predictions of the PaiNN model, ordered from highest to lowest error in the association parameter.
Pretraining Experiments
Dipole moment
Pretraining on QM9 and then fine-tuning on data from SEPP, while allowing the whole network to be updated, seems to work best.
Below, I compare several PaiNN models for predicting dipole moment.
Previously, I tried training the DMPNN on QM9 dipole moments (see this run) and then freezing everything up to the first feedforward layer to predict only the dipole moments. That worked terribly!
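For reference, the two fine-tuning regimes being compared could be set up roughly as below. This is only a sketch: the checkpoint path, the `readout` module name, and the learning rate are placeholders, not the actual configuration.

```python
import torch

# Load a checkpoint produced by QM9 pretraining (path is a placeholder).
model = torch.load("painn_qm9_pretrained.pt")

FREEZE_BACKBONE = False  # full fine-tuning worked best in these experiments

if FREEZE_BACKBONE:
    # Variant that worked badly for the DMPNN: freeze everything up to the
    # first feedforward (readout) layer and only train the head.
    for name, param in model.named_parameters():
        if not name.startswith("readout"):  # hypothetical module name
            param.requires_grad = False

# Only parameters with requires_grad=True are updated during fine-tuning.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```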
Pretraining on Combisolv
I used the 1M-point solvation energy dataset from this paper for pretraining and then fine-tuned on the SEPP data. I thought solvation energy might have some relationship to PC-SAFT parameters (e.g., see this paper). Unfortunately, the experiment did not seem to help at all.
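One way to set up this kind of transfer is to keep the pretrained message-passing backbone and swap the single-output solvation-energy head for a multi-output PC-SAFT head. The sketch below assumes the head is a `Linear` module named `head` and uses a placeholder output size; both are assumptions, not the actual model layout.

```python
import torch
from torch import nn

# Hypothetical: backbone pretrained to predict a single solvation energy.
pretrained = torch.load("mpnn_combisolv_pretrained.pt")

# Replace the 1-dimensional solvation-energy head with a head that predicts
# the PC-SAFT parameters (attribute name and output size are assumptions).
hidden_dim = pretrained.head.in_features
pretrained.head = nn.Linear(hidden_dim, 5)

# Fine-tune the whole network on the SEPP data from here.
```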
Experiments on Regressed Experimental Data
Hyperparameter grid search for PyG MPNN model
I ran a grid search over the following (a sketch of enumerating the grid follows the list):
- Whether or not to use the GRU layer
- Activation: ReLU or LeakyReLU
- Number of convolutions (i.e., message passing steps): 1, 2, 3, 4
- Fingerprint dimension: 32, 64, 128
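A minimal way to enumerate that grid; the hyperparameter names are illustrative, not the actual config keys.

```python
from itertools import product

# Hypothetical grid matching the search above; each combination is one run.
grid = {
    "use_gru": [True, False],
    "activation": ["relu", "leaky_relu"],
    "num_convs": [1, 2, 3, 4],
    "fingerprint_dim": [32, 64, 128],
}

configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 2 * 2 * 4 * 3 = 48 configurations
```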
Takeaways:
- Increasing the fingerprint dimension causes a slight improvement in validation loss
- Increasing the number of convolutions seems to have a negative impact. I'm not 100% sure about this, but it makes some sense because it also increases the number of parameters.
- Increasing dropout acts as a form of regularization and seems to reduce validation loss.
- Using the GRU layer has a slightly positive impact.
- ReLU has a slight advantage over LeakyReLU, but only a very slight one.
Before running another search, I did some testing and found that training ran faster with num_workers=0 in the data loader.
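For reference, the only change is the `num_workers` argument of the PyTorch Geometric DataLoader; the dataset and batch size below are just placeholders (QM9 is used as an example).

```python
from torch_geometric.datasets import QM9
from torch_geometric.loader import DataLoader

# Example with QM9; any of the datasets used here would work the same way.
dataset = QM9(root="data/QM9")

# num_workers=0 keeps data loading in the main process; for these small
# molecular graphs that was faster than spawning worker processes.
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=0)
```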
Hyperparameter sweep using quasi random search
I ran a more extensive hyperparameter sweep over 11 scientific hyperparameters, while also optimizing the "nuisance" hyperparameters, as suggested by the Google tuning playbook.
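As a sketch of the quasi-random part: a scrambled Sobol sequence spreads configurations evenly over the search space instead of sampling them independently. The actual sweep covered 11 hyperparameters; the three dimensions and ranges below are purely illustrative.

```python
from scipy.stats import qmc

# Illustrative continuous search space:
# log10(learning rate), dropout rate, hidden dimension.
l_bounds = [-5.0, 0.0, 32]
u_bounds = [-2.0, 0.5, 256]

sampler = qmc.Sobol(d=3, scramble=True, seed=0)
unit_points = sampler.random_base2(m=5)            # 2**5 = 32 quasi-random points
points = qmc.scale(unit_points, l_bounds, u_bounds)

configs = [
    {"lr": 10 ** p[0], "dropout": p[1], "hidden_dim": int(round(p[2]))}
    for p in points
]
```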
Main conclusions:
- Optimizer and scheduler are most significant
- Filtering non-associating molecules inside the loss function might give slightly improved performance
- Dropout helps
It seems that Noam (Adam with the Noam learning-rate schedule) and plain Adam work best, and both need hyperparameter tuning.
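For context, the Noam schedule from the Transformer paper can be wrapped around Adam with a `LambdaLR`; `d_model` and `warmup` are the knobs that need tuning. The model, `d_model=256`, and `warmup=4000` below are placeholders, not the values used in the sweep.

```python
import torch


def noam_lambda(d_model: int, warmup: int):
    # lr multiplier: d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    def fn(step: int) -> float:
        step = max(step, 1)  # avoid division by zero at step 0
        return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
    return fn


model = torch.nn.Linear(16, 1)  # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98))
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=noam_lambda(d_model=256, warmup=4000)
)

# Inside the training loop: call optimizer.step(), then scheduler.step().
```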
PyG pretraining
- This works okay, but not great!
- The dropout rate seems to be most important.
- A lower learning rate is better