ReLU vs. Sigmoid Function in Deep Neural Networks
Why is ReLU so prevalent? What's all the fuss about using it anyway?
What We're Exploring
Most state-of-the-art models use rectified linear units (ReLU) as the non-linearity in deep neural networks instead of the sigmoid function. The question is: why? That's what we're here to find out.
Sigmoid vs ReLU Activation Functions
We should start with a little context: historically, training deep neural nets was notoriously difficult with sigmoid-like activation functions.
It was ReLU (among other things, admittedly) that made training deeper nets practical, and ever since, ReLU has been the default activation function for hidden layers. So what exactly makes ReLU a better choice than sigmoid?
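For reference, here's what the two activations look like written out as plain NumPy functions (a minimal sketch; the helper names are just for illustration):

```python
import numpy as np

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x)): squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # relu(x) = max(0, x): passes positive inputs through, zeroes out the rest
    return np.maximum(0.0, x)

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # ~[0.119, 0.5, 0.881]
print(relu(np.array([-2.0, 0.0, 2.0])))     # [0., 0., 2.]
```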

Let's set up a simple experiment to see the effects of the ReLU and sigmoid activation functions. We'll train a vanilla CNN classifier on the CIFAR-10 dataset. Specifically, we'll first train our classifier with sigmoid activations in the hidden layers, then train the same classifier with ReLU activations.
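Here's a minimal PyTorch sketch of that setup. The architecture, optimizer, and hyperparameters below are illustrative assumptions rather than the exact ones used in this report; the point is simply that the only thing changing between the two runs is the activation.

```python
# A small CNN for CIFAR-10 where the hidden-layer activation is swappable.
# Layer sizes, Adam, and 5 epochs are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

def make_cnn(activation: nn.Module) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), activation, nn.MaxPool2d(2),   # -> 32 x 16 x 16
        nn.Conv2d(32, 64, 3, padding=1), activation, nn.MaxPool2d(2),  # -> 64 x 8 x 8
        nn.Flatten(),
        nn.Linear(64 * 8 * 8, 128), activation,
        nn.Linear(128, 10),
    )

def train(model, loader, epochs=5):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=T.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Same classifier, trained twice; only the activation differs.
sigmoid_model = train(make_cnn(nn.Sigmoid()), loader)
relu_model = train(make_cnn(nn.ReLU()), loader)
```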
Try Out The Experiments Below In Our Colab
Training Time Of ReLU vs Sigmoid Function
Now we get to the meat of it ...
How And Why Does ReLU Beat The Sigmoid Function?
Computational Speed
ReLUs are much simpler computationally. The forward and backward passes through ReLU are both just a simple "if" statement.
Sigmoid activation, in comparison, requires computing an exponent.
This advantage is huge when dealing with big networks with many neurons, and can significantly reduce both training and evaluation times.
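Actual timings depend on hardware and framework, but even a toy NumPy micro-benchmark (a rough sketch, not a measurement of the training runs in this report) shows the gap between a comparison and an exponential:

```python
import timeit
import numpy as np

x = np.random.randn(1_000_000).astype(np.float32)

# Forward passes only: ReLU is a single element-wise comparison,
# sigmoid needs an exponential plus a division.
relu_time = timeit.timeit(lambda: np.maximum(0.0, x), number=100)
sigmoid_time = timeit.timeit(lambda: 1.0 / (1.0 + np.exp(-x)), number=100)

print(f"ReLU:    {relu_time:.3f} s for 100 forward passes")
print(f"sigmoid: {sigmoid_time:.3f} s for 100 forward passes")
```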
The graph above clearly shows the stark difference in training times: using sigmoid took more than double the time.
Vanishing Gradient
Additionally, sigmoid activations saturate more easily. There is only a comparatively narrow range of inputs for which the sigmoid's derivative is sufficiently far from zero. In other words, once a sigmoid unit reaches either its left or right plateau, a backward pass through it contributes almost nothing, since the derivative is very close to 0.
ReLU, on the other hand, only saturates when the input is less than 0, and even that saturation can be removed by using leaky ReLUs. For very deep networks, saturation hampers learning, and ReLU provides a nice workaround.
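To make saturation concrete, here's a quick NumPy check of the three gradients at a few sample inputs (the leaky-ReLU slope of 0.01 is just a common default, used here for illustration):

```python
import numpy as np

z = np.array([-10.0, -2.0, 0.5, 10.0])  # sample pre-activations

s = 1.0 / (1.0 + np.exp(-z))
sigmoid_grad = s * (1.0 - s)                  # ~4.5e-05 at |z| = 10: the plateaus pass almost no gradient
relu_grad = (z > 0).astype(float)             # 0 for negative inputs, 1 otherwise
leaky_relu_grad = np.where(z > 0, 1.0, 0.01)  # keeps a small gradient even when z < 0

print("sigmoid:   ", sigmoid_grad)
print("ReLU:      ", relu_grad)
print("leaky ReLU:", leaky_relu_grad)
```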

To see the effect of ReLU in practice, check out Visualizing and Debugging Neural Networks with PyTorch and W&B.
Convergence Speed
With a standard sigmoid activation, the gradient of the sigmoid is some fraction between 0 and 1 (in fact, never more than 0.25). If you have many layers, these fractions multiply and can give an overall gradient that is exponentially small, so each step of gradient descent makes only a tiny change to the weights, leading to slow convergence (the vanishing gradient problem).
In contrast, with ReLU activation, the gradient of the ReLU is either 0 or 1. That means that often, after many layers, the gradient includes the product of a bunch of 1's, so the overall gradient is neither too small nor too large.
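A toy calculation makes the contrast concrete. Here we multiply only the activation's own gradient factor across 30 layers along a single active path, ignoring the weight matrices (which also contribute in a real network):

```python
import numpy as np

n_layers = 30
# Pre-activations along one path, all in the "active" (positive) region,
# so the comparison isolates the activation's contribution to the gradient.
z = np.full(n_layers, 1.0)

s = 1.0 / (1.0 + np.exp(-z))
sigmoid_factors = s * (1.0 - s)    # ~0.197 each, and never more than 0.25
relu_factors = np.ones(n_layers)   # ReLU's derivative is exactly 1 for positive inputs

print("sigmoid:", np.prod(sigmoid_factors))  # ~6e-22: the gradient has all but vanished
print("ReLU:   ", np.prod(relu_factors))     # 1.0: the gradient passes through intact
```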
Observations
- The model trained with ReLU converged quickly and thus took much less time than the model trained with the sigmoid function.
- We can clearly see overfitting in the model trained with ReLU; because it converges faster, it fits the training data sooner and begins to overfit earlier.
- The model performance is significantly better when trained with ReLU.
In other words, training with ReLU gets us better model performance and faster convergence. It's hard to argue with that.
Additional Resources
Want to read more on the advantages of ReLU over the Sigmoid function?