nn.functional.interpolate vs. manually initialized bilinear weights

Originally, I was using a PyTorch port of the bilinear_u function from MatConvNet when implementing Hypercolumns for object detection. This is basically a ConvTranspose2d layer whose weights are fixed during training and initialized to a bilinear filter. On the COCO benchmark, this gives me an mAP of 0.19.
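
For reference, this is roughly what that fixed-bilinear upsampling layer looks like (a sketch, not my exact code: the channel count of 256 and the kernel size/stride are placeholders, and bilinear_kernel is my reimplementation of what bilinear_u produces, not the actual port):

    import torch
    import torch.nn as nn

    def bilinear_kernel(in_channels, out_channels, kernel_size):
        # Standard FCN-style bilinear upsampling filter (an approximation of
        # MatConvNet's bilinear_u, not an exact port).
        factor = (kernel_size + 1) // 2
        if kernel_size % 2 == 1:
            center = factor - 1
        else:
            center = factor - 0.5
        og = torch.arange(kernel_size, dtype=torch.float32)
        filt = 1 - torch.abs(og - center) / factor
        filt_2d = filt[:, None] * filt[None, :]
        weight = torch.zeros(in_channels, out_channels, kernel_size, kernel_size)
        for i in range(min(in_channels, out_channels)):
            weight[i, i] = filt_2d
        return weight

    # 2x upsampling with the weights fixed to a bilinear filter
    upsample = nn.ConvTranspose2d(256, 256, kernel_size=4, stride=2,
                                  padding=1, bias=False)
    with torch.no_grad():
        upsample.weight.copy_(bilinear_kernel(256, 256, 4))
    upsample.weight.requires_grad = False  # keep the filter fixed during training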

However, if I use the nn.functional.interpolate function like this:

score4 = nn.functional.interpolate(score_res4, size=score_res3.shape[2:], mode='bilinear', align_corners=False)

the mAP drops to 0.16. Setting align_corners=True does not help either.

Theoretically, both of these operations should perform bilinear interpolation to upsample the feature map, so they should be equivalent. I am confused as to why the PyTorch interpolate function causes a drop in performance. Any ideas or suggestions as to what I am doing wrong?
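
This is the kind of sanity check I was planning to run to compare the two operations directly (a minimal single-channel sketch with placeholder shapes, not my actual feature maps):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(1, 1, 8, 8)

    # Bilinear interpolation, as in the interpolate call above
    out_interp = F.interpolate(x, scale_factor=2, mode='bilinear',
                               align_corners=False)

    # Intended equivalent: fixed-weight transposed convolution with a 4x4
    # bilinear kernel, stride 2, padding 1 (2x upsampling, single channel)
    filt_1d = torch.tensor([1., 3., 3., 1.]) / 4.0
    kernel = (filt_1d[:, None] * filt_1d[None, :]).view(1, 1, 4, 4)
    deconv = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2,
                                padding=1, bias=False)
    with torch.no_grad():
        deconv.weight.copy_(kernel)

    out_deconv = deconv(x)
    print((out_interp - out_deconv).abs().max())  # mismatches, if any, sit at the borders

If the two only disagree at the border pixels, I would guess the discrepancy comes down to how each operation handles the edges of the feature map, but I have not confirmed that this explains the mAP gap.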

The full model code can be found here.