I am reproducing a network that was implemented in Caffe. The last layers of the network are
(Caffe) block (n) → BatchNorm → ReLU → SoftmaxWithLoss
I want to reproduce it in PyTorch using CrossEntropyLoss. Is it right to remove the ReLU layer before the loss, since CrossEntropyLoss already applies the (log-)softmax internally, i.e.
(Pytorch) block (n) → BatchNorm → CrossEntropyLoss
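For reference, here is a minimal sketch of what I have in mind. The feature size, `num_classes`, and the `nn.Linear` standing in for block (n) are just placeholders, not the actual layers of my network:

```python
import torch
import torch.nn as nn

num_classes = 10
batch_size = 4

head = nn.Sequential(
    nn.Linear(128, num_classes),   # placeholder for "block (n)"
    nn.BatchNorm1d(num_classes),   # BatchNorm, as in the Caffe model
    # no ReLU here: nn.CrossEntropyLoss applies LogSoftmax internally,
    # so the loss receives the raw (possibly negative) logits
)

criterion = nn.CrossEntropyLoss()

features = torch.randn(batch_size, 128)
targets = torch.randint(0, num_classes, (batch_size,))

logits = head(features)
loss = criterion(logits, targets)
loss.backward()
print(loss.item())
```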