“Model ensembles are a pretty much guaranteed way to gain 2% of accuracy on anything.” - Andrej Karpathy.

I absolutely agree! However, deploying an ensemble of heavyweight models is not always feasible. Sometimes even a single model can be so large (GPT-3, for example) that deploying it in resource-constrained environments is simply not possible. This is why we have been going over some model optimization recipes - Quantization and Pruning. This report, the last one in the series, discusses another compelling model optimization technique - knowledge distillation. I have structured the report into the following sections -

Check out the code on GitHub →

Run the experiments using the Google Colab Notebook →

What is Softmax Telling Us?

When working with a classification problem, it is very typical to use softmax as the last activation unit in your neural network. Why is that? Because a softmax function takes a set of logits and spits out a probability distribution over the discrete classes your network is being trained on. Figure 1 presents an example of this.

-> Figure 1: Predictions of a neural network on an input image. <-

In Figure 1, our imaginary neural network is highly confident that the given image is a $1$. However, it also thinks there is a slight chance the image could be a $7$. That is quite reasonable, isn’t it? The given image does have subtle seven-ish characteristics. This information would not have been available if we were only dealing with hard one-hot encoded labels like [1, 0] (where 1 and 0 are the probabilities of the image being a one and a seven, respectively).

Humans are well equipped to exploit this kind of relative similarity - think of a cat-ish dog, a brownish red, a cat-ish tiger, and so on. These are still valid comparisons, as Hinton et al. opine in [1] -

An image of a BMW, for example, may only have a minimal chance of being mistaken for a garbage truck, but that mistake is still many times more probable than mistaking it for a carrot.

This very knowledge helps us generalize remarkably well out there in the wild.

This thought process helps us dig deeper into what our models might be thinking about the input data, which should be somewhat consistent with the way we would think about it. Figure 1 establishes this again - to our eyes, that image looks like a one, but it has some traits of a seven.

So, what now? An immediate question that may come to mind is - what is the best way to use this knowledge in neural networks? Let us find out in the next section.

Using the Softmax Information for Teaching - Knowledge Distillation

The softmax information is far more useful than plain hard one-hot encoded labels. At this stage, we have access to a trained network, and we are now interested in using the output probabilities it produces.

Consider teaching someone the English digits with the MNIST dataset. It is highly likely that a student would ask - doesn’t that one look like a seven? If so, that is definitely good news, because your student certainly knows what a one and a seven look like. As a teacher, you have been able to transfer your knowledge of English digits to your student. It is possible to extend this idea to neural networks as well.

High-Level Mechanics of Knowledge Distillation

So, here is the deal at a high level - you first train a large, heavyweight teacher model on your dataset, and you then train a smaller student model to match the output distribution produced by the teacher rather than only the hard labels. This workflow briefly formulates the idea of knowledge distillation.

Why smaller? Isn’t this what we want - to deploy a lightweight model to production that is performant enough?

An Image Classification Case Study

Disclaimer: For the sake of brevity and simplicity, I will demonstrate the following sections with a computer vision example. Note that these ideas are independent of the domain.

For an image classification example, we can extend the earlier high-level idea -

-> Figure 2: A high-level overview of knowledge distillation. <-

Why are we training the student model on soft-labels?

Remember that our student model is smaller than the teacher model in terms of capacity. So, if your dataset is complex enough, then the smaller student model may not be well suited to capture the hidden representations required for the training objective. We train the student model on soft-labels to compensate for this, which provides more meaningful information than the one-hot encoded labels. In a sense, we are training the student model to imitate the teacher model’s outputs by giving a little bit of exposure to the training dataset.

Hopefully, this provided you with an intuitive understanding of knowledge distillation. In the next section, we will be taking a more detailed look at the student model's training mechanics.

Loss Functions in Knowledge Distillation

In order to train the student model, we can still use the regular cross-entropy loss between the soft-labels from the teacher and the predicted labels from the student. However, a well-trained teacher model is highly likely to be very confident about many of the input data points and to predict probability distributions like the following -

-> Figure 3: Highly confident predictions. <-

Extended Softmax

The problem with these weak probabilities (marked in red in Figure 3) is that they do not capture the information the student model needs to learn effectively. For example, it is almost impossible to transfer the knowledge that the image has seven-ish traits if the probability distribution is like [0.99, 0.01].

Hinton et al. address this problem by scaling the raw logits of the teacher model by some temperature ($\tau$) before they get passed to softmax [1] (known as extended softmax or temperature-scaled softmax). That way, the distribution gets spread more evenly across the available class labels. The same temperature is then used to train the student model. I have presented this idea in Figure 4.

-> Figure 4: Softened predictions. <-
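
To make the effect of the temperature concrete, here is a tiny sketch (the logits below are made up purely for illustration) -

import tensorflow as tf

# Made-up logits for an image that is a "1" with some seven-ish traits.
logits = tf.constant([8.0, 2.0])

print(tf.nn.softmax(logits).numpy())        # ~[0.998, 0.002] - very peaky
print(tf.nn.softmax(logits / 5.0).numpy())  # ~[0.769, 0.231] - much softer

The higher the temperature, the flatter the distribution - and the more of those relative, seven-ish traits survive in the soft-labels.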

We can write the student model’s modified loss function in the form of this equation -

-> $\mathcal{L}_{CE}^{KD} = -\sum_{i} p_{i} \log s_{i}$, <-

where $p_i$ is the softened probability distribution of the teacher model and $s_i$ is the student model’s temperature-scaled softmax output - $s_{i} = \frac{\exp \left(z_{i} / \tau\right)}{\sum_{j} \exp \left(z_{j} / \tau\right)}$, with $z_i$ denoting the student model’s raw logits.

def get_kd_loss(student_logits, teacher_logits, 
                true_labels, temperature,
                alpha, beta):
    # Soften the teacher's predictions with the temperature.
    teacher_probs = tf.nn.softmax(teacher_logits / temperature)
    # Cross-entropy between the softened teacher distribution and the
    # temperature-scaled student logits. (true_labels, alpha, and beta are
    # unused in this variant but kept for a consistent signature.)
    kd_loss = tf.keras.losses.categorical_crossentropy(
        teacher_probs, student_logits / temperature, 
        from_logits=True)
    
    return kd_loss

Incorporating the Hard-Labels With Extended Softmax

In [1], Hinton et al. also explore the idea of using the conventional cross-entropy loss between the true target labels (typically one-hot encoded) and the student model’s predictions. This especially helps when the training dataset is small and there isn’t enough signal in the soft-labels for the student model to pick up.

This approach works significantly better when it is combined with the extended softmax, and the overall loss function becomes a weighted average of the two -

-> $\mathcal{L} = \frac{\alpha \cdot \mathcal{L}_{CE}^{KD} + \beta \cdot \mathcal{L}_{CE}}{\alpha + \beta}$ <-

def get_kd_loss(student_logits, teacher_logits, 
                true_labels, temperature,
                alpha, beta):
    # Distillation term: softened teacher distribution vs. the
    # temperature-scaled student logits.
    teacher_probs = tf.nn.softmax(teacher_logits / temperature)
    kd_loss = tf.keras.losses.categorical_crossentropy(
        teacher_probs, student_logits / temperature, 
        from_logits=True)
    
    # Hard-label term: regular cross-entropy against the ground-truth labels.
    ce_loss = tf.keras.losses.sparse_categorical_crossentropy(
        true_labels, student_logits, from_logits=True)
    
    # Weighted average of the two terms.
    total_loss = (alpha * kd_loss) + (beta * ce_loss)
    return total_loss / (alpha + beta)

It is recommended to weight $\beta$ considerably lower than $\alpha$.

Operating on the Raw Logits

Caruana and his collaborators operate on the raw logits instead of the softmax values [2]. The student is trained to regress the (fixed) logits of the teacher with a mean squared error loss -

-> $\mathcal{L}_{MSE}^{KD} = \sum_{i}\left\| z_{i}^{\text{student}} - z_{i}^{\text{teacher}} \right\|^{2}$ <-

mse = tf.keras.losses.MeanSquaredError()

def mse_kd_loss(teacher_logits, student_logits):
    # Regress the student's raw logits onto the teacher's raw logits.
    return mse(teacher_logits, student_logits)

One potential disadvantage of this loss function is its unconstrained nature - the raw logits can capture noise that a small model may not be able to fit properly. This is why, for this loss function to work well in the distillation regime, the student model needs to be a bit bigger.

Tang et al. explore the idea of interpolating between the two losses - the extended softmax and the MSE loss [3]. Mathematically, it would look like the following -

-> $\mathcal{L} = (1-\alpha) \cdot \mathcal{L}_{MSE}^{KD} + \alpha \cdot \mathcal{L}_{CE}^{KD}$ <-

Empirically, they found that the best performance (on NLP tasks) is achieved when $\alpha$ is equal to 0.
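
Here is a rough sketch of what this interpolation could look like in code, following the conventions of the earlier loss functions (the function name and the exact reductions are mine, not from [3]) -

def interpolated_kd_loss(student_logits, teacher_logits, temperature, alpha):
    # MSE between the raw logits of the teacher and the student.
    mse_loss = tf.keras.losses.MeanSquaredError()(teacher_logits, student_logits)
    # Extended softmax cross-entropy, averaged over the batch.
    teacher_probs = tf.nn.softmax(teacher_logits / temperature)
    ce_loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(
        teacher_probs, student_logits / temperature, from_logits=True))
    # Interpolate between the two; alpha = 0 recovers the pure MSE loss.
    return (1.0 - alpha) * mse_loss + alpha * ce_loss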

If you’re feeling a bit overwhelmed at this point, don’t sweat it. Hopefully, things will become clearer with the code.

A Few Training Recipes

In this section, I will provide you with a few training recipes that you can consider while working with knowledge distillation.

Using Data Augmentation

This idea is explored by Tang et al. in [3]. They demonstrate it on NLP datasets, but it is applicable to other domains as well. Using data augmentation can help to better guide the student model’s training, especially when you are dealing with less data. Since we typically keep the student model much smaller than the teacher model, the hope is that, with more diverse data, the student model gets to capture the domain better.
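
Here is a minimal sketch of how light augmentation could be applied to the training pipeline with tf.image; the specific augmentations are my own picks for illustration, not necessarily the ones used in the experiments -

def augment(image, label):
    # Light, label-preserving augmentations.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, label

augmented_train_ds = train_ds.map(
    augment, num_parallel_calls=tf.data.experimental.AUTOTUNE)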

Using Labeled and Unlabeled Data to Train the Student Model

In works like Noisy Student Training [4] and SimCLRv2 [5], the authors use additional unlabeled data when training the student model. You use your teacher model to generate the ground-truth distribution on the unlabeled dataset, which helps to increase the generalizability of the student to a great extent. This approach is only feasible when unlabeled data is available in the domain of the dataset you’re dealing with, which may not always be the case (healthcare, for example). In [4], Xie et al. explore techniques like data balancing and data filtering to mitigate the issues that may arise when incorporating unlabeled data into the student model’s training.
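
As a rough sketch of the idea, the trained teacher can be used to produce soft pseudo-labels for an unlabeled dataset before mixing it into the student’s training data. Here, unlabeled_ds is a hypothetical tf.data.Dataset yielding image batches only -

def pseudo_label(images):
    # Soft pseudo-labels from the (frozen) teacher model.
    teacher_probs = tf.nn.softmax(teacher_model(images, training=False))
    return images, teacher_probs

pseudo_labeled_ds = unlabeled_ds.map(pseudo_label)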

Don’t use Label-Smoothing When Training the Teacher Model

Label-smoothing is a technique used to relax the high-confidence predictions produced by models. It helps to reduce overfitting, but since the teacher’s logits are anyway scaled by a temperature during distillation, label-smoothing is typically not recommended when training the teacher model. You can check out this article to learn more about label-smoothing.
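
In Keras, label-smoothing is usually switched on through the loss; the point of this recipe is simply to leave it at its default of 0 when training the teacher -

# Keep label_smoothing at its default (0.0) for the teacher; a non-zero value
# would already flatten the teacher's distribution before temperature scaling.
teacher_loss = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True, label_smoothing=0.0)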

Using Higher Temperature Values

Hinton et al. recommend using higher temperature values to soften the distributions predicted by the teacher model, so that the soft-labels contain even more information for the student model. This is especially useful when dealing with small datasets; for larger datasets, the information becomes available by means of the sheer number of training examples. Refer to the extended softmax section again if it is not clear why a higher temperature softens the predicted distributions.

We will be exploring these recipes shortly in the next section.

Experimental Results

Let’s first review the experimental setup. I used the Flowers dataset for my experiments. Unless otherwise specified, I used the following configurations -

Here’s the Colab Notebook you can follow along with →

Baseline Student Model

To make the performance comparisons fair, let's also train the shallow CNN from scratch and observe its performance. Note that in this case, I used Adam as the optimizer with a learning rate of 1e-3.
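
For reference, here is a rough sketch of what such a shallow CNN student could look like; the exact get_student_model() used in the experiments is defined in the Colab Notebook, so the layer sizes and the 224x224 input resolution below are assumptions -

def get_student_model(num_classes=5):  # the Flowers dataset has 5 classes
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu",
                               input_shape=(224, 224, 3)),  # assumed input size
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes)  # raw logits, no softmax
    ])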

The Training Loop

Before we see the results, I wanted to shed some light on the training loop and how I was able to wrap it inside the classic model.fit() call. This is what the training loop looks like -

def train_step(self, data):
    images, labels = data
    # Run the (frozen) teacher in inference mode to get its logits.
    teacher_logits = self.trained_teacher(images, training=False)

    with tf.GradientTape() as tape:
        student_logits = self.student(images)
        # get_kd_loss() can be any of the loss functions discussed above; here
        # I assume the temperature, alpha, and beta hyperparameters are
        # defined globally.
        loss = get_kd_loss(student_logits, teacher_logits,
                           labels, temperature, alpha, beta)
    gradients = tape.gradient(loss, self.student.trainable_variables)
    self.optimizer.apply_gradients(zip(gradients, self.student.trainable_variables))

    # train_loss and train_acc are tf.keras.metrics objects defined outside
    # this snippet.
    train_loss.update_state(loss)
    train_acc.update_state(labels, tf.nn.softmax(student_logits))
    t_loss, t_acc = train_loss.result(), train_acc.result()
    train_loss.reset_states()
    train_acc.reset_states()
    return {"loss": t_loss, "accuracy": t_acc}

The train_step() function should be an easy read if you are already familiar with how to customize a training loop in TensorFlow 2. Notice the get_kd_loss() function - this can be any of the loss functions we have discussed so far. We are using a trained teacher model here, the model we fine-tuned earlier. With this training loop, we can create an entire model that can be trained with a .fit() call.

First, create a class extending tf.keras.Model -

class Student(tf.keras.Model):
    def __init__(self, trained_teacher, student):
        super(Student, self).__init__()
        self.trained_teacher = trained_teacher
        self.student = student

When you extend the tf.keras.Model class, you can put your custom training logic inside the train_step() method (which is provided by the class). So, in its entirety, the Student class would look like this -

class Student(tf.keras.Model):
    def __init__(self, trained_teacher, student):
        super(Student, self).__init__()
        self.trained_teacher = trained_teacher
        self.student = student

    def train_step(self, data):
        images, labels = data
        # Run the (frozen) teacher in inference mode to get its logits.
        teacher_logits = self.trained_teacher(images, training=False)

        with tf.GradientTape() as tape:
            student_logits = self.student(images)
            # temperature, alpha, and beta are assumed to be defined globally.
            loss = get_kd_loss(student_logits, teacher_logits,
                               labels, temperature, alpha, beta)
        gradients = tape.gradient(loss, self.student.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.student.trainable_variables))

        train_loss.update_state(loss)
        train_acc.update_state(labels, tf.nn.softmax(student_logits))
        t_loss, t_acc = train_loss.result(), train_acc.result()
        train_loss.reset_states()
        train_acc.reset_states()
        return {"train_loss": t_loss, "train_accuracy": t_acc}

You can even write a test_step() to customize the evaluation behavior of the model; if you are interested in that, along with the complete train_step() utility, check out this Colab Notebook. Our model can now be trained in the following manner -

student = Student(teacher_model, get_student_model())
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
student.compile(optimizer)

student.fit(train_ds, 
            validation_data=validation_ds,
            epochs=10)
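
As mentioned above, you can also override test_step() inside the Student class to customize evaluation; the exact version is in the Colab Notebook, but a minimal sketch that evaluates the student on the hard labels could look like this -

    def test_step(self, data):
        images, labels = data
        student_logits = self.student(images, training=False)
        # Evaluate against the ground-truth labels.
        loss = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
            labels, student_logits, from_logits=True))
        acc = tf.reduce_mean(tf.keras.metrics.sparse_categorical_accuracy(
            labels, tf.nn.softmax(student_logits)))
        return {"loss": loss, "accuracy": acc}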

One potential advantage of this method is that one can easily incorporate other capabilities like distributed training, custom callbacks, mixed precision, and so on.

Training the Student Model with $\mathcal{L}_{C E}^{K D}$

Upon training our shallow student model with this loss function we get ~74% validation accuracy. We see that the losses start to increase after epoch 8. This suggests stronger regularization might have helped. Also, note that the hyperparameter tuning process has a significant impact here. In my experiments, I did not do rigorous hyperparameter tuning. In order to do faster experimentation, I kept the training schedules short.

Training the Student Model with $\frac{\alpha \cdot \mathcal{L}_{CE}^{KD} + \beta \cdot \mathcal{L}_{CE}}{\alpha + \beta}$

Let's now see if incorporating the ground-truth labels in the distillation training objective helps. With $\beta$ = 0.1 and $\alpha$ = 0.1, we get ~71% validation accuracy. The training dynamics again suggest that stronger regularization with a longer training schedule would have helped.

Training the Student Model with $\mathcal{L}_{M S E}^{K D}$

With the MSE loss, the validation accuracy drops sharply to ~56%. The same kind of loss behavior is present in this setting as well, again suggesting the need for regularization.

Note that this loss function is completely unconstrained, and our shallow student model may not be capable of handling the noise that comes with it. Let's try a deeper student model.

Using Data Augmentation While Training the Student Model

As mentioned earlier, the student models are of smaller capacity than the teacher model. When dealing with less data, data augmentation can be helpful to train the student model. Let's verify.

Effect of Temperature ($\tau$)

In this experiment, let's study the effect of temperature on the student model. In this setting, I used the same shallow CNN.
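
The sweep itself is straightforward; here is a rough sketch of how it could be run (the temperature values are arbitrary, and the loss hyperparameters are assumed to be global, as before) -

for temperature in [1, 5, 10]:
    # Train a fresh student per temperature value.
    student = Student(teacher_model, get_student_model())
    student.compile(tf.keras.optimizers.Adam(learning_rate=0.01))
    student.fit(train_ds, validation_data=validation_ds, epochs=10)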

Finally, I wanted to study whether the choice of the base model used for fine-tuning the teacher has a significant effect on the student model.

Lastly, if you are wondering what kind of improvement one could get out of knowledge distillation for production purposes, the table below shows that for us. Without any hyperparameter tuning, we are able to get a decent model that is significantly more lightweight than the other models shown in the table.

The first row corresponds to the default student model trained with the weighted average loss, while the other rows correspond to EfficientNet B0 and MobileNetV2, respectively. Note that I did not include the results obtained with data augmentation during student training.

Conclusion and Further Thoughts

This concludes the report and also the series I have been developing on model optimization. Knowledge distillation is a very promising technique, specifically suited for deployment purposes. A very good thing about it is that it can be combined with quantization and pruning quite seamlessly to further reduce the size of your production models without having to compromise on accuracy.

We studied the idea of knowledge distillation in an image classification setting. If you are wondering whether it extends to other areas like NLP or even GANs, I recommend going over the following resources -

You can also check out some of the CVPR 2020 papers on knowledge distillation here.

Another trend you will see is the use of a student model that is as large as, or larger than, the teacher. This has been studied very systematically in [4].

References

  1. Hinton, Geoffrey, et al. “Distilling the Knowledge in a Neural Network.” ArXiv:1503.02531 [Cs, Stat], Mar. 2015. arXiv.org, http://arxiv.org/abs/1503.02531.
  2. Buciluǎ, Cristian, et al. “Model Compression.” Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, 2006, pp. 535–541. ACM Digital Library, doi:10.1145/1150402.1150464.
  3. Tang, Raphael, et al. “Distilling Task-Specific Knowledge from BERT into Simple Neural Networks.” ArXiv:1903.12136 [Cs], Mar. 2019. arXiv.org, http://arxiv.org/abs/1903.12136.
  4. Xie, Qizhe, et al. “Self-Training with Noisy Student Improves ImageNet Classification.” ArXiv:1911.04252 [Cs, Stat], June 2020. arXiv.org, http://arxiv.org/abs/1911.04252.
  5. Chen, Ting, et al. “Big Self-Supervised Models Are Strong Semi-Supervised Learners.” ArXiv:2006.10029 [Cs, Stat], June 2020. arXiv.org, http://arxiv.org/abs/2006.10029.

Acknowledgements

I am grateful to Aakash Kumar Nain for providing valuable feedback on the code.