We ported the weights of a model consisting of a VGG16 encoder and a reversed VGG16 decoder, but the results differ. The output is a saliency map in both cases; however, the ported version shows a drop in performance. We have repeatedly cross-checked that there are no mistakes in the architecture. The operations involved are essentially Convolution, Upsampling, ReLU and a Sigmoid.
The decoder part is a repetition of the following commands, only at different scales:
Lasagne
Conv2DLayer(net['conv5_3'], 512, 3, pad=1)  # applies ReLU by default (nonlinearity=rectify)
Upscale2DLayer(net['uconv5_1'], scale_factor=2)
PyTorch
Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
ReLU()
Upsample(scale_factor=2, mode='nearest'),
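For what it's worth, the upsampling step should be numerically identical in both libraries: Lasagne's Upscale2DLayer and PyTorch's Upsample(mode='nearest') with an integer scale factor both amount to repeating every pixel along the spatial axes. A minimal NumPy sketch of that operation (not code from either library, just an illustration):

```python
import numpy as np

def upsample_nearest_2x(x):
    # Nearest-neighbour 2x upsampling: repeat each pixel along both
    # spatial axes. This is what both Upscale2DLayer(scale_factor=2)
    # and Upsample(scale_factor=2, mode='nearest') compute.
    return np.repeat(np.repeat(x, 2, axis=-2), 2, axis=-1)

x = np.array([[1., 2.],
              [3., 4.]])
print(upsample_nearest_2x(x))
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]
```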
The encoder part is in both cases loaded from VGG16:
Lasagne
encoder = vgg16.build(input_height, input_width, input_var)
PyTorch
original_vgg16 = vgg16()
# select only convolutional layers
encoder = torch.nn.Sequential(*list(original_vgg16.features)[:30])
Since the skeleton should be the same, I was wondering what might explain this behavior. Is anything known to happen differently, even slightly, between these two libraries in any of the operations I mentioned? My search didn't turn up anything in this regard.
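One subtle difference that may be worth checking (I can't confirm it is the cause here): Lasagne's Conv2DLayer defaults to flip_filters=True, i.e. it performs true convolution, while PyTorch's Conv2d computes cross-correlation. If the weights were copied over unchanged, every decoder filter would effectively be applied spatially mirrored. A small NumPy sketch of the relationship (the two loops are illustrations, not the libraries' actual kernels):

```python
import numpy as np

def correlate2d_valid(x, k):
    # Cross-correlation: slide the kernel over the input WITHOUT
    # flipping it. This is what PyTorch's Conv2d computes.
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def convolve2d_valid(x, k):
    # True convolution (Lasagne's Conv2DLayer with flip_filters=True):
    # cross-correlation with the kernel flipped in both spatial dims.
    return correlate2d_valid(x, k[::-1, ::-1])

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
k = rng.standard_normal((3, 3))

# The two operations agree only once the kernel is flipped, so
# Lasagne weights would need k[::-1, ::-1] before loading into Conv2d.
assert np.allclose(convolve2d_valid(x, k),
                   correlate2d_valid(x, k[::-1, ::-1]))
```

If this applies to your port, flipping each conv filter along both spatial dimensions before copying it into the PyTorch state dict would be the fix to try first.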