
DeepSAFT

Created on December 7 | Last edited on March 17


Old Experiments

New Experiment

I made the following changes and ran the experiments again:
  • Zero out the association parameters of non-associating molecules (sketched below)
  • Always use the vector/tensorial dipole moment representation in PaiNN
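For reference, here is a minimal sketch of how that filtering might look as a masked loss. The layout (association parameters in the last two output columns) and the `is_associating` mask are assumptions for illustration, not the exact implementation used in these runs.

```python
import torch

def masked_association_loss(pred, target, is_associating):
    """MSE loss that zeroes the association-parameter error for non-associating
    molecules. Assumes the association parameters sit in the last two output
    columns; this layout is an assumption for illustration.
    """
    mask = torch.ones_like(pred)
    mask[~is_associating, -2:] = 0.0       # drop the epsilon_AB / kappa_AB terms
    diff = (pred - target) * mask          # masked residuals
    return diff.pow(2).sum() / mask.sum()  # mean over the unmasked entries
```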


Run set (5 runs)


Below, I plot distributions of the mean absolute error for PaiNN and the GNN only. These experiments suggest that PaiNN gives better predictions only for the dipole moment; the remaining parameters should use the GNN.
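A sketch of how such a comparison could be plotted from exported errors is below; the file name and the `model`/`parameter`/`abs_error` column layout are hypothetical placeholders rather than the actual W&B export.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical export of per-molecule validation errors.
errors = pd.read_csv("validation_errors.csv")  # columns: model, parameter, abs_error

params = sorted(errors["parameter"].unique())
fig, axes = plt.subplots(1, len(params), figsize=(3 * len(params), 3))
for ax, param in zip(axes, params):
    subset = errors[errors["parameter"] == param]
    ax.boxplot(
        [subset.loc[subset["model"] == m, "abs_error"] for m in ("PaiNN", "GNN")],
        labels=["PaiNN", "GNN"],
    )
    ax.set_title(param)
    ax.set_ylabel("Absolute error")
fig.tight_layout()
plt.show()
```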

Run set (2 runs)


Below, I show the validation predictions of the PaiNN model, ordered from highest to lowest error in the association parameter $\epsilon_{AB}$.

Run set (1 run)


Pretraining Experiments

Dipole moment

Pretraining on QM9 and then fine-tuning on data from SEPP while allowing the whole network to be adjusted seems to work best.
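A minimal sketch of that recipe, assuming the caller supplies the PaiNN model and the SEPP data loader; the checkpoint path, learning rate, and epoch count are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def finetune_whole_network(model: torch.nn.Module, pretrained_ckpt: str,
                           train_loader, epochs: int = 50, lr: float = 1e-4):
    """Load QM9-pretrained weights and fine-tune *all* parameters on SEPP data."""
    model.load_state_dict(torch.load(pretrained_ckpt))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # nothing is frozen
    model.train()
    for _ in range(epochs):
        for batch in train_loader:
            optimizer.zero_grad()
            # Assumes the loader yields PyG batches with targets in `batch.y`.
            loss = F.mse_loss(model(batch).squeeze(), batch.y.squeeze())
            loss.backward()
            optimizer.step()
    return model
```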

Run set (7 runs)

Below, I compare several PaiNN models for predicting dipole moment.

Run set (5 runs)


Previously, I trained the DMPNN on QM9 dipole moments (see this run) and then froze everything up to the first feedforward layer to predict only the dipole moments. That worked terribly!
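For contrast, here is a sketch of that freezing approach; the `head_prefix` used to pick out the feed-forward layers is a guess and would need to match the actual DMPNN parameter names.

```python
import torch

def freeze_up_to_head(model: torch.nn.Module, head_prefix: str = "ffn"):
    """Freeze every parameter except the feed-forward head."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_prefix)
    # Only the unfrozen head parameters are handed to the optimizer.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-3)
```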

Run set (2 runs)


Pretraining on Combisolv

I used the 1M solvation energies dataset from this paper for pretraining and then fine-tuned on SEPP data. I thought solvation energy might have some relationship to PC-SAFT parameters (e.g., see this paper). Unfortunately, the experiment did not seem to help at all.

Run set (3 runs)


Experiments on Regressed Experimental Data


Run set (5 runs)


Hyperparameter grid search for PyG MPNN model

  • I did a simple grid search over the following hyperparameters (MR in GitLab here; a minimal sketch of the grid follows this list):
    • Whether or not to use the GRU layer
    • Activation: ReLU or LeakyReLU
    • Number of convolutions (i.e., message passing steps): 1, 2, 3, 4
    • Fingerprint dimension: [32, 64, 128]
  • Takeaways
    • Increasing the fingerprint dimension gives a slight improvement in validation loss.
    • Increasing the number of convolutions seems to have a negative impact. I'm not 100% sure about this, but it makes some sense because it increases the number of parameters.
    • Increasing dropout acts as a form of regularization and seems to reduce validation loss.
    • Using the GRU has a slightly positive impact.
    • ReLU has an advantage over LeakyReLU, but only a very slight one.
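A minimal sketch of that grid, assuming a `train_and_validate` function that runs one configuration and returns its validation loss; the exact ranges and any extra parameters in the real search (e.g., dropout) may differ.

```python
from itertools import product

# The grid from the list above; values are taken directly from the bullets.
grid = {
    "use_gru": [True, False],
    "activation": ["relu", "leaky_relu"],
    "num_convs": [1, 2, 3, 4],
    "fingerprint_dim": [32, 64, 128],
}

def run_grid_search(train_and_validate):
    results = []
    for values in product(*grid.values()):
        config = dict(zip(grid.keys(), values))
        val_loss = train_and_validate(**config)
        results.append((config, val_loss))
    # Sort configurations by validation loss, best first.
    return sorted(results, key=lambda item: item[1])
```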

Run set (241 runs)



Before running another search, I did some testing and found that I could run faster using num_workers=0.
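Concretely, this just means constructing the PyG loaders with in-process data loading; the batch size below is illustrative.

```python
from torch_geometric.loader import DataLoader

def make_loader(dataset, batch_size: int = 64):
    # num_workers=0 keeps loading in the main process, which turned out faster
    # here than spawning worker processes for small molecular graphs.
    return DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0)
```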

Run set (4 runs)

I ran a more extensive hyperparameter sweep over 11 scientific hyperparameters, also optimizing the "nuisance" parameters as suggested by the Google tuning playbook (a sweep sketch follows the conclusions below).
Main conclusions:
  • Optimizer and scheduler are the most significant.
  • Filtering non-associating molecules inside the loss function might give slightly improved performance.
  • Dropout helps.
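Here is a sketch of how such a sweep can be set up with the W&B sweeps API; the parameter names, ranges, sweep method, and project name below are illustrative and not the configuration actually used for these runs.

```python
import wandb

# Illustrative sweep definition (placeholders, not the real 11-parameter sweep).
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "optimizer": {"values": ["adam", "noam"]},
        "scheduler": {"values": ["none", "plateau", "noam"]},
        "dropout": {"values": [0.0, 0.1, 0.2]},
        "filter_non_associating": {"values": [True, False]},
    },
}

def train_one_config(cfg) -> float:
    """Placeholder for the real training routine; should return the validation loss."""
    raise NotImplementedError

def sweep_trial():
    # W&B injects the sampled hyperparameters into run.config for each trial.
    with wandb.init() as run:
        val_loss = train_one_config(run.config)
        run.log({"val_loss": val_loss})

sweep_id = wandb.sweep(sweep_config, project="DeepSAFT")
wandb.agent(sweep_id, function=sweep_trial, count=50)
```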

Run set (305 runs)


It seems that Noam and Adam are the best optimizers, and both need hyperparameter tuning.
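For concreteness, here is a sketch of the Noam learning-rate schedule (Vaswani et al., 2017) paired with Adam via PyTorch's `LambdaLR`; the `model_dim`, warmup length, and Adam settings are illustrative choices, not the values used in these runs.

```python
import torch

def noam_schedule(optimizer, model_dim: int, warmup_steps: int = 4000):
    """Noam schedule: lr(step) = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5),
    scaled by the optimizer's base learning rate."""
    def lr_lambda(step: int) -> float:
        step = max(step, 1)  # avoid division by zero at step 0
        return (model_dim ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Example wiring with Adam; a base lr of 1.0 lets the schedule set the scale.
# optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
# scheduler = noam_schedule(optimizer, model_dim=128)
# ...call scheduler.step() after each optimizer.step()
```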

Run set (152 runs)


PyG pretraining

  • From this group of runs: sweep_9xt0sw
  • This works okay, but not great!
  • The dropout rate seems to be the most important hyperparameter.
  • A lower learning rate is better.

Run set (46 runs)