A question about `padding` in `nn.MaxPool2d`

According to Google’s PyTorch implementation of Big Transfer (BiT), there is a subtle difference between the following two approaches. Could anyone explain the difference? Is it a different strategy for handling boundary pixels?

What’s the purpose of splitting the `padding` parameter out of `nn.MaxPool2d` and making it a separate padding layer before the pooling?
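
To make the question concrete, here is a minimal sketch of the two approaches as I understand them. It assumes the separate padding layer is `nn.ConstantPad2d(1, 0)` and that the pooling uses `kernel_size=3, stride=2`; those values and the toy input are my assumptions for illustration, not copied from the repository. It relies on the documented behaviour that `nn.MaxPool2d` pads implicitly with negative infinity, whereas an explicit constant pad inserts zeros.

```python
import torch
import torch.nn as nn

# Toy all-negative input (assumed for illustration), so that zero padding
# and -inf padding lead to different maxima at the borders.
x = torch.full((1, 1, 4, 4), -1.0)

# Approach A: explicit zero padding, then pooling with no padding.
pad_then_pool = nn.Sequential(
    nn.ConstantPad2d(1, 0.0),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=0),
)

# Approach B: padding handled inside MaxPool2d (implicit -inf padding).
pool_with_padding = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

out_a = pad_then_pool(x)
out_b = pool_with_padding(x)

print(out_a)  # windows that touch the border pick up the padded zeros
print(out_b)  # padded cells never win the max, so only real values appear
```

With non-negative inputs (for example, right after a ReLU) the two variants give the same result, so any difference only shows up where a border window contains nothing but negative values.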