For an input feature of size torch.Size([1, 64, 32, 32, 32]), the Caffe output size is 1, 64, 16, 16, 16, while the PyTorch output size is 1, 64, 15, 15, 15. What is happening?
When implementing neural nets, particularly convnets, I always recommend writing down the dimensions you expect to get (e.g., via comments in code or good old pen & paper) and checking whether they match what you actually get. This is really important for making sure the network is doing what you intend it to do.
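As an illustration of writing the expected dimensions down in code, here is a minimal sketch. `TinyNet` is a hypothetical module, and the pooling settings (kernel 3, stride 2, no padding) are an assumption that matches the numbers discussed in this thread. The expected shapes live in comments, and an assert fails loudly if reality disagrees.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical module used only to show shape bookkeeping."""

    def __init__(self) -> None:
        super().__init__()
        # Assumed layer: 3x3x3 max pooling with stride 2 and no padding
        self.pool = nn.MaxPool3d(kernel_size=3, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # expected in:  (N, 64, 32, 32, 32)
        x = self.pool(x)
        # expected out: (N, 64, 15, 15, 15), since floor((32 - 3) / 2) + 1 = 15
        assert x.shape[1:] == (64, 15, 15, 15), f"unexpected shape {tuple(x.shape)}"
        return x

out = TinyNet()(torch.randn(1, 64, 32, 32, 32))
print(out.shape)  # torch.Size([1, 64, 15, 15, 15])
```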
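That being said, assuming a kernel size of 3, a stride of 2 and no padding (the values implied by the formula below), your expected output size per spatial dimension should be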
(32 - 3) / 2 + 1 = 15.5, which is not a whole number.
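The arithmetic above can be spelled out as a small helper. This is a sketch assuming kernel size 3, stride 2 and no padding (the values implied by the formula); `pool_output_size` is a hypothetical name, and its `ceil_mode` flag simply switches between rounding the fractional part down or up.

```python
import math

def pool_output_size(size: int, kernel: int, stride: int,
                     padding: int = 0, ceil_mode: bool = False) -> int:
    """Output length of one spatial dimension of a pooling layer.

    Computes (size + 2*padding - kernel) / stride + 1, rounded down
    (ceil_mode=False) or up (ceil_mode=True). This simple version ignores
    the edge case frameworks handle where a rounded-up last window would
    start entirely outside the input.
    """
    span = size + 2 * padding - kernel
    steps = math.ceil(span / stride) if ceil_mode else span // stride
    return steps + 1

exact = (32 - 3) / 2 + 1                                # 15.5, not a whole number
floor_size = pool_output_size(32, 3, 2)                 # rounds down: 15
ceil_size = pool_output_size(32, 3, 2, ceil_mode=True)  # rounds up: 16
print(exact, floor_size, ceil_size)                     # 15.5 15 16
```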
I suppose Caffe is rounding up (to 16) whereas PyTorch is rounding down (to 15).
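If the layer in question is max pooling with kernel 3 and stride 2 (an assumption inferred from the formula above), both rounding conventions can be reproduced directly in PyTorch: `nn.MaxPool3d` rounds down by default, and its `ceil_mode=True` argument rounds up instead, which gives the Caffe-style size for this input.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32, 32)  # same shape as in the question

# Default behaviour (ceil_mode=False): floor((32 - 3) / 2) + 1 = 15
pool_floor = nn.MaxPool3d(kernel_size=3, stride=2)
# ceil_mode=True rounds up instead: ceil((32 - 3) / 2) + 1 = 16
pool_ceil = nn.MaxPool3d(kernel_size=3, stride=2, ceil_mode=True)

print(pool_floor(x).shape)  # torch.Size([1, 64, 15, 15, 15])
print(pool_ceil(x).shape)   # torch.Size([1, 64, 16, 16, 16])
```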
PS: I would try to design your nets so that these divisions come out even, because fractional output sizes are handled differently from framework to framework, which can cause a lot of confusion.
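One way to follow that advice is to check, while designing the network, that (size + 2*padding - kernel) is divisible by the stride, so that rounding down and rounding up agree. A minimal sketch; `divides_evenly` is a hypothetical helper, and the candidate settings are only illustrative examples for a spatial size of 32.

```python
def divides_evenly(size: int, kernel: int, stride: int, padding: int = 0) -> bool:
    """True if the output size is a whole number, so the rounding mode doesn't matter."""
    return (size + 2 * padding - kernel) % stride == 0

# Illustrative (kernel, stride, padding) settings for a spatial size of 32
candidates = [
    (3, 2, 0),  # (32 - 3) / 2 = 14.5      -> ambiguous (15 vs 16)
    (2, 2, 0),  # (32 - 2) / 2 = 15        -> 16 whether rounding down or up
    (3, 2, 1),  # (32 + 2 - 3) / 2 = 15.5  -> ambiguous
    (3, 1, 1),  # (32 + 2 - 3) / 1 = 31    -> 32 whether rounding down or up
]
for kernel, stride, padding in candidates:
    print(kernel, stride, padding, divides_evenly(32, kernel, stride, padding))
```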
Differences in the implementation of pooling: in Keras, the half-windows are discarded, whereas Caffe produces an additional output for the half-windows.
Differences in padding schemes: the 'same' padding in Keras can sometimes result in different padding values for top and bottom (or left and right). Caffe always pads evenly on both sides, so the top and bottom (or left and right) padding values are always equal.
Here is another explanation, this one comparing Caffe and TensorFlow. I guess PyTorch follows Caffe's rule.
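To make the padding point concrete, here is a sketch of how TensorFlow/Keras 'same' padding is commonly computed: the output size is ceil(size / stride), and if the total padding needed is odd, the extra pixel goes on the bottom/right side. Caffe, by contrast, applies the same `pad` value to both sides. `same_padding_tf` is a hypothetical helper name.

```python
import math

def same_padding_tf(size: int, kernel: int, stride: int) -> tuple[int, int]:
    """(before, after) padding for TensorFlow/Keras-style 'same' padding."""
    out = math.ceil(size / stride)
    total = max((out - 1) * stride + kernel - size, 0)
    before = total // 2
    return before, total - before  # any odd extra pixel goes after

print(same_padding_tf(32, 3, 2))  # (0, 1): asymmetric
print(same_padding_tf(32, 3, 1))  # (1, 1): symmetric
```

For size 32, kernel 3 and stride 2, this puts a single pixel on the bottom/right only, and the resulting output size is 16.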
Yeah, this sounds correct; it's basically the "rounding up." So is your goal to match Caffe's output exactly, or is either way fine with you and the question was more about understanding what's going on? If the former, you would need to create a custom padding layer (or simply append 1 pixel of padding to the end of each spatial dimension before max pooling). I suppose setting padding=1 won't give you the Caffe results because it would add 1 pixel on each side.
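A sketch of the "append one pixel" idea, assuming max pooling with kernel 3 and stride 2: pad only the end of each spatial dimension with `-inf` (so the extra pixel can never win the max, which also keeps the result correct for negative activations) and then pool normally. For this input it should match `nn.MaxPool3d(..., ceil_mode=True)`, PyTorch's built-in way of rounding up. The last lines check the padding=1 variant: same shape, but shifted windows.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 64, 32, 32, 32)

# F.pad takes (before, after) pairs starting from the last dimension:
# (W_left, W_right, H_top, H_bottom, D_front, D_back).
# Pad one pixel at the end of D, H and W only, using -inf so it never affects the max.
x_padded = F.pad(x, (0, 1, 0, 1, 0, 1), value=float("-inf"))

manual = nn.MaxPool3d(kernel_size=3, stride=2)(x_padded)  # (33 - 3) / 2 + 1 = 16
builtin = nn.MaxPool3d(kernel_size=3, stride=2, ceil_mode=True)(x)

print(manual.shape)                  # torch.Size([1, 64, 16, 16, 16])
print(torch.equal(manual, builtin))  # True: same windows, same maxima

# padding=1 also gives 16 per dim, but it pads both sides, so every window
# is shifted by one pixel and the values differ from the result above.
shifted = nn.MaxPool3d(kernel_size=3, stride=2, padding=1)(x)
print(shifted.shape)                  # torch.Size([1, 64, 16, 16, 16])
print(torch.equal(shifted, builtin))  # False
```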