This report is the successor to my previous report on Quantization. In this report, we're going to go over the mechanics of model pruning in the context of deep learning. Model pruning is the art of discarding the weights that contribute little to a model's performance. Careful pruning enables us to compress our workhorse neural networks and deploy them onto mobile phones and other resource-constrained devices.
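To make the idea concrete before we dive in, here is a minimal sketch of magnitude pruning, one common heuristic for deciding which weights to discard: weights whose absolute value falls below a threshold are zeroed out. The function name `magnitude_prune` and the 50% sparsity level are illustrative choices, not something prescribed by this report.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude weights, keeping the rest unchanged."""
    k = int(weights.size * sparsity)                     # number of weights to drop
    threshold = np.sort(np.abs(weights), axis=None)[k]   # magnitude cutoff (ties may shift the exact count)
    mask = np.abs(weights) >= threshold                  # keep only the larger-magnitude weights
    return weights * mask

w = np.random.randn(4, 4)
print(magnitude_prune(w, sparsity=0.5))  # roughly half the entries become zero
```

The intuition is that small-magnitude weights tend to have a small effect on the network's output, so zeroing them usually costs little accuracy while making the weight tensors sparse and compressible.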
This report is structured into the following sections: