DeepChem: Molecular Interaction
Predict molecular binding with machine learning
In biochemistry, a ligand is a small molecule that binds to a protein. Can we predict how a protein and a ligand will interact, and in particular how tightly they will bind, from their chemical structures alone? Below are some example structures from the Protein Data Bank (PDB) Binding dataset, followed by random forest models trained to predict the binding affinity (Ki) of each ligand to its protein.
Protein-ligand binding examples
Left: Protein, Middle: Ligand, Right: Protein-ligand complex
More examples
Random Forests
Learning curves are similar
Varying the number of estimators and the max features has little effect on the learning curves. You can see below that the curves are almost identical: the models overfit the training set quickly and improve gradually as the training set size increases.
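For readers who want to reproduce this kind of curve, here is a minimal sketch using scikit-learn's `learning_curve`. It is not the code behind the plots in this report: the data is a synthetic stand-in generated with `make_regression` (an assumption, since the featurized protein-ligand data is not shown here), and the sample count, feature count, and fold count are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

# Synthetic regression data standing in for featurized protein-ligand
# complexes (assumption: the real features and Ki labels are not used here).
X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)

model = RandomForestRegressor(n_estimators=10, max_features="sqrt", random_state=0)

# Fit the model on growing fractions of the training folds and score
# R^2 on both the training folds and the held-out folds.
train_sizes, train_scores, valid_scores = learning_curve(
    model,
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=3,
    scoring="r2",
)

# Average across the cross-validation folds for each training set size.
mean_train = train_scores.mean(axis=1)
mean_valid = valid_scores.mean(axis=1)

for n, tr, va in zip(train_sizes, mean_train, mean_valid):
    print(f"train size {int(n):4d}: train R^2 = {tr:.3f}, validation R^2 = {va:.3f}")
```

A persistent gap between the training and validation scores is the signature of the overfitting described above. Repeating the run with a different `n_estimators` or `max_features` lets you check how much the curves move.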
Larger range in R^2 on validation data
The R^2 on the validation data is more informative: increasing the number of estimators from 10 to 100 and setting the max features to "sqrt" (the square root of the number of features) improves the results by up to 11 points.
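The comparison can be sketched as a small sweep over the two hyperparameters, scoring each model on a held-out validation split. This is an illustrative sketch rather than the experiment code: the data is synthetic (`make_regression`), the baseline of `max_features=None` (all features) is an assumption because the report does not state the starting value, and the split sizes are arbitrary.

```python
from itertools import product

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the featurized binding data (assumption).
X, y = make_regression(n_samples=400, n_features=30, noise=10.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Sweep the two settings discussed in the text.
# max_features=None means every feature is considered at each split
# (assumed baseline); "sqrt" considers sqrt(n_features) per split.
results = {}
for n_estimators, max_features in product([10, 100], [None, "sqrt"]):
    model = RandomForestRegressor(
        n_estimators=n_estimators,
        max_features=max_features,
        random_state=0,
    )
    model.fit(X_train, y_train)
    results[(n_estimators, max_features)] = r2_score(y_valid, model.predict(X_valid))

# Print configurations from best to worst validation R^2.
for (n_estimators, max_features), r2 in sorted(
    results.items(), key=lambda item: item[1], reverse=True
):
    print(f"n_estimators={n_estimators:3d}, max_features={max_features!s:>4}: R^2 = {r2:.3f}")
```

On synthetic data the ranking of configurations need not match the one reported here; the sketch only shows how such a validation comparison can be set up.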