Predict molecular binding with machine learning

In biochemistry, a ligand is a small molecule that binds to a protein. Can we predict how a protein and a ligand will interact, and specifically how tightly they will bind, from their chemical structures alone? Below are some example structures from the Protein Data Bank (PDB) Binding dataset, followed by random forest models trained to predict the binding affinity (Ki) of the ligand to the protein.

Protein-ligand binding examples

More examples

Random Forests

Learning curves are similar

Varying the number of estimators and the max features setting doesn't affect the learning curves much. The curves are almost identical across configurations: the models overfit the training set quickly and improve gradually as the training set size increases.
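As a rough illustration of how such learning curves can be produced, here is a minimal sketch using scikit-learn. It assumes a synthetic regression dataset as a stand-in for the featurized protein-ligand pairs, and the two configurations, the cross-validation setup, and the baseline max features value are illustrative assumptions rather than the settings used in this report.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

# Assumption: synthetic data standing in for featurized protein-ligand pairs.
X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)

# Assumption: two illustrative configurations, not the report's exact sweep.
configs = {
    "10 trees, all features": {"n_estimators": 10, "max_features": None},
    "100 trees, sqrt features": {"n_estimators": 100, "max_features": "sqrt"},
}

curves = {}
for name, params in configs.items():
    model = RandomForestRegressor(random_state=0, **params)
    # Fit on growing fractions of the training folds and score with R^2.
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, train_sizes=np.linspace(0.2, 1.0, 5), cv=3, scoring="r2"
    )
    # Average over the cross-validation folds at each training set size.
    curves[name] = (sizes, train_scores.mean(axis=1), val_scores.mean(axis=1))

for name, (sizes, train_mean, val_mean) in curves.items():
    print(name)
    for n, tr, va in zip(sizes, train_mean, val_mean):
        print(f"  n={n:4d}  train R^2={tr:.3f}  validation R^2={va:.3f}")
```

A large gap between the training and validation scores at every size is the signature of overfitting described above. Plotting the two means against the training set size gives the usual learning curve figure.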

Larger range in R^2 on validation data

The R^2 on the validation data is more informative: increasing the number of estimators from 10 to 100 and setting the max features to "sqrt" (the square root of the number of features) improves the validation R^2 by up to 11 points.
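The comparison can be sketched along the following lines with scikit-learn. This is only an illustration under stated assumptions: the data is synthetic, the train/validation split is arbitrary, and the baseline max features value (all features) is a guess, so the scores it prints say nothing about the binding affinity results reported here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic data standing in for featurized protein-ligand pairs.
X, y = make_regression(n_samples=400, n_features=20, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumption: the baseline uses all features (max_features=None).
settings = [
    {"n_estimators": 10, "max_features": None},
    {"n_estimators": 100, "max_features": "sqrt"},
]

val_r2 = {}
for params in settings:
    model = RandomForestRegressor(random_state=0, **params)
    model.fit(X_train, y_train)
    # Score only on held-out data, which the model never saw during fitting.
    key = (params["n_estimators"], str(params["max_features"]))
    val_r2[key] = r2_score(y_val, model.predict(X_val))

for (n_estimators, max_features), score in val_r2.items():
    print(f"n_estimators={n_estimators:3d}  max_features={max_features:4s}  "
          f"validation R^2={score:.3f}")
```

Comparing configurations on held-out data, rather than on the training score, is what makes the difference between them visible.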
