According to Google’s PyTorch implementation of Big Transfer (BiT), there is a subtle difference between the following two approaches. Could anyone explain the difference? Is it a different strategy for boundary pixels?
What’s the purpose of splitting the padding parameter out of nn.MaxPool2d and making it a separate nn.Pad layer before the pooling?
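
To make the comparison concrete, here is a minimal sketch of the two variants as I understand them. The specific layer and arguments are my assumptions: I use nn.ConstantPad2d(1, 0) as the separate "nn.Pad" layer, with kernel_size=3 and stride=2 for the pooling, and the all-negative input is only a toy tensor chosen to expose the border behaviour. The PyTorch docs describe the padding argument of nn.MaxPool2d as implicit negative-infinity padding, whereas a constant pad layer inserts real zeros that can win the max.

```python
import torch
import torch.nn as nn

# Variant A: padding handled inside the pooling layer.
# MaxPool2d pads implicitly with -inf, so padded cells never win the max.
pool_builtin = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

# Variant B: explicit zero padding, then pooling without padding.
# (Assumed stand-in for the separate "nn.Pad" layer: nn.ConstantPad2d(1, 0).)
pool_split = nn.Sequential(
    nn.ConstantPad2d(1, 0.0),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=0),
)

# Toy input where every value is negative, so a zero in the border matters.
x_neg = torch.full((1, 1, 4, 4), -1.0)
out_builtin = pool_builtin(x_neg)  # every window keeps -1
out_split = pool_split(x_neg)      # windows touching the border become 0

print(out_builtin)
print(out_split)

# On non-negative input (e.g. right after a ReLU) the two variants agree,
# because a padded 0 can never exceed the true window maximum.
x_pos = torch.rand(1, 1, 4, 4)
same_on_nonneg = torch.equal(pool_builtin(x_pos), pool_split(x_pos))
print(same_on_nonneg)
```

So in this sketch the two variants only differ in windows that overlap the border and contain nothing but negative values, which can only happen when the pooling input is not guaranteed to be non-negative.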